
11.065 Parallel box plots

Lesson

Parallel box plots

Parallel box plots are used to compare two sets of data visually. When comparing box plots, the values in each five number summary are the main features to compare. Remember that the five number summary includes:

  • the lowest data point
  • the lower quartile, $Q_1$
  • the median
  • the upper quartile, $Q_3$
  • the highest data point
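As a rough illustration, the five number summary can be computed with a short Python sketch. The function name `five_number_summary` and the task times are made up for this example. The sketch assumes the common classroom convention of taking each quartile as the median of the lower or upper half of the sorted data, leaving out the middle value when the number of data points is odd. Other conventions exist and can give slightly different quartiles.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data, excluding the middle value when the count is odd.
    """
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    half = n // 2
    lower = values[:half]
    # Skip the middle value when n is odd so both halves have equal size.
    upper = values[half + 1:] if n % 2 else values[half:]
    return (values[0], median(lower), median(values), median(upper), values[-1])


# Hypothetical task times in seconds (made up for illustration).
times = [12, 15, 16, 18, 19, 21, 22, 24, 30]
summary = five_number_summary(times)
print(summary)  # (12, 15.5, 19, 23.0, 30)
```

These five values are exactly the points marked on a box plot: the ends of the whiskers, the edges of the box, and the line inside the box.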

Just like when we look at back-to-back stem plots, we can compare the spread of data in two box plots. We call these parallel box plots because they are drawn parallel to each other along the same number line. They must therefore use the same scale so that a visual comparison is straightforward.

It is important to clearly label each box plot. Here we have plotted two sets of data, comparing the time it took two different groups of people to complete an online task. 

Time taken to complete an online task

 

Notice that overall the under $30$s were faster at completing the task. Both the under $30$s box plot and the over $30$s box plot are slightly negatively skewed. Over $75\%$ of the under $30$s completed the task in under $22$ seconds, which is the median time taken by the over $30$s. All of the under $30$s had finished the task before $75\%$ of the over $30$s had completed it.
Overall the under $30$s performed better and had a smaller spread of times. The over $30$s group was more spread out, with a range of $24$ seconds compared to $20$ seconds for the under $30$s.

 

Key comparisons

There are many things to keep in mind when comparing two sets of data, and a good place to start is often to compare the information in the five number summaries for each set. A few of the most important questions to ask yourself are:

  • How do the spreads of data compare?
  • How do the skews compare? Is one set of data more symmetrical? 
  • Is there a big difference in the medians?
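The questions above can be sketched in Python as a comparison of two five number summaries. The helper `describe` and both summaries are hypothetical, invented for this example. The shape label is only a rough indicator: it compares the two parts of the box on either side of the median and ignores the whiskers, so it should be checked against the plot itself.

```python
def describe(summary):
    """Turn a five number summary into the measures used for comparison."""
    low, q1, med, q3, high = summary
    lower_part = med - q1  # width of the box to the left of the median
    upper_part = q3 - med  # width of the box to the right of the median
    if upper_part > lower_part:
        shape = "positively skewed"
    elif upper_part < lower_part:
        shape = "negatively skewed"
    else:
        shape = "roughly symmetrical"
    return {"median": med, "range": high - low, "iqr": q3 - q1, "shape": shape}


# Hypothetical summaries (min, Q1, median, Q3, max), made up for illustration.
group_a = (10, 14, 18, 20, 30)
group_b = (12, 20, 22, 27, 36)

a = describe(group_a)
b = describe(group_b)
print(a)
print(b)
print("Difference in medians:", b["median"] - a["median"])
print("Smaller spread (by IQR):", "A" if a["iqr"] < b["iqr"] else "B")
```

Here the range and interquartile range answer the spread question, the shape label answers the skew question, and the difference in medians answers the last one.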

 

Worked example

Example 1

The box plots show the distances, in centimetres, jumped by two high jumpers.

(a) Who has a higher median jump?

Think: The median is shown by the line in the middle of the box. Whose median line has a higher value?

Do: The middle line for John is at $120$ cm, while the middle line for Bill is at $110$ cm. So John has a higher median jump.

(b) Who made the highest jump?

Think: The highest jump is the value furthest to the right for each person.

Do: Notice that Bill doesn't have an upper whisker, so his highest jump was $120$ cm, the same as his upper quartile height. On the other hand, John's highest jump was $150$ cm, so John had the highest jump overall.

(c) Who made the lowest jump?

Think: The lowest jump is the value furthest to the left for each person. 

Do: Both John and Bill had a lowest jump of $60$ cm.

 

Practice questions

QUESTION 1

The box plots drawn below show the number of repetitions of a $70$ kg bar that two weightlifters can lift. They both record their repetitions over $30$ days.

  1. Which weightlifter has the more consistent results?

    Weightlifter A.

    A

    Weightlifter B.

    B
  2. What statistical evidence supports your answer?

    The mean.

    A

    The range.

    B

    The mode.

    C

    The graph is positively skewed.

    D
  3. Which statistic is the same for each weightlifter?

    The median.

    A

    The mean.

    B

    The mode.

    C
  4. Which weightlifter can do the most repetitions of $70$ kg?

    Weightlifter A.

    A

    Weightlifter B.

    B

QUESTION 2

The box plots show the monthly profits (in thousands of dollars) of two financial traders over a year.

Box plots of monthly profits for Ned (top) and Tobias (bottom), each above a number line marked from 5 to 60 in steps of 5.

Two box plots are displayed above horizontal number lines. The upper box plot is Ned's and the lower one is Tobias's. The number lines have major tick marks at intervals of $5$, ranging from $5$ to $60$. Between each pair of major tick marks there are four minor tick marks, each representing an increment of $1$ unit. On Ned's box plot, the box spans from $25$, representing the first quartile, to $41$, representing the third quartile, with a vertical line dividing the box at $32$, representing the median. Thin horizontal lines extend from the edges of Ned's box to $14$ on the left and $55$ on the right, each ending in a vertical line that marks the minimum and maximum data points, respectively. On Tobias's box plot, the box spans from $27$, representing the first quartile, to $40$, representing the third quartile, with a vertical line dividing the box at $33$, representing the median. Thin horizontal lines extend from the edges of Tobias's box to $15$ on the left and $50$ on the right, each ending in a vertical line that marks the minimum and maximum data points, respectively.

  1. Who made a higher median monthly profit?

    Ned

    A

    Tobias

    B
  2. Whose profits had a higher interquartile range?

    Tobias

    A

    Ned

    B
  3. Whose profits had a higher range?

    Ned

    A

    Tobias

    B
  4. How much more did Ned make in his most profitable month than Tobias did in his most profitable month?

QUESTION 3

The two box plots below show the data collected by two manufacturers on the lifespan of their light bulbs, measured in thousands of hours.

Two box plots are displayed above a horizontal number line. The upper box plot is Manufacturer $A$'s and the lower one is Manufacturer $B$'s. The number line is titled "Thousands of hours" and has major tick marks at intervals of $1$, ranging from $0$ to $8$. Between each pair of major tick marks there is one minor tick mark representing half a unit. On Manufacturer $A$'s box plot, the box spans from $2.5$, representing the first quartile, to $4.5$, representing the third quartile, with a vertical line dividing the box at $4$, representing the median. Thin horizontal lines extend from the edges of Manufacturer $A$'s box to $1$ on the left and $5$ on the right, each ending in a vertical line that marks the minimum and maximum data points, respectively. On Manufacturer $B$'s box plot, the box spans from $3.5$, representing the first quartile, to $6$, representing the third quartile, with a vertical line dividing the box at $5$, representing the median. Thin horizontal lines extend from the edges of Manufacturer $B$'s box to $1.5$ on the left and $8$ on the right, each ending in a vertical line that marks the minimum and maximum data points, respectively.

  1. Complete the following table using the two box plots. Write each answer in terms of hours.

                           Manufacturer A    Manufacturer B
    Median                 ____              $5000$
    Lower Quartile         ____              ____
    Upper Quartile         $4500$            ____
    Range                  ____              $6500$
    Interquartile Range    ____              ____
  2. Which manufacturer produces light bulbs with the best lifespan?

    Manufacturer A.

    A

    Manufacturer B.

    B

QUESTION 4

The box plots below represent the daily sales made by Carl and Angelina over the course of one month.

Box plots of daily sales for Angelina (top) and Carl (bottom), each above a number line marked from 0 to 70 in steps of 10.

Two box plots are displayed above horizontal number lines. The upper box plot shows Angelina's sales and the lower one shows Carl's sales. The number lines have major tick marks at intervals of $10$, ranging from $0$ to $70$. Between each pair of major tick marks there are nine minor tick marks, each representing an increment of $1$ unit. On Angelina's box plot, the box spans from $16$, representing the first quartile, to $42$, representing the third quartile, with a vertical line dividing the box at $30$, representing the median. Thin horizontal lines extend from the edges of Angelina's box to $2$ on the left and $51$ on the right, each ending in a vertical line that marks the minimum and maximum data points, respectively. On Carl's box plot, the box spans from $30$, representing the first quartile, to $49$, representing the third quartile, with a vertical line dividing the box at $42$, representing the median. Thin horizontal lines extend from the edges of Carl's box to $14$ on the left and $64$ on the right, each ending in a vertical line that marks the minimum and maximum data points, respectively.

  1. What is the range in Angelina's sales?

  2. What is the range in Carl’s sales?

  3. By how much did Carl’s median sales exceed Angelina's?

  4. Considering the middle $50\%$ of sales for both salespeople, whose sales were more consistent?

    Carl

    A

    Angelina

    B
  5. Which salesperson had a more successful sales month?

    Angelina

    A

    Carl

    B

Outcomes

MS11-7

develops and carries out simple statistical processes to answer questions posed
