
7.09 Box plots

Lesson

Box-and-whisker plots are a great way of displaying quantitative (numerical) data, as they clearly show all the quartiles in a data set. Since statisticians are interested in what's "normal," they assume that most scores will be somewhere in the middle. As such, the "box" in a box-and-whisker plot indicates the middle half of the scores. This chapter will explain how box-and-whisker plots give us a clear picture of a data set's central tendency and spread.

Let's take a minute to get familiar with the features of box-and-whisker plots.

 

Features of a box-and-whisker plot

We start with a number line that displays the values in our data set.

Above that, you'll see that there are two lines or "whiskers" that extend from the box outwards. The two end points of these lines show the maximum (greatest) and minimum (least) scores in the data set.

The two vertical edges of the box show the quartiles of the data set. The left-hand side of the box is the lower quartile (Q1) and the right-hand side of the box is the upper quartile (Q3).

Finally, the vertical line inside the box shows the median (the middle score), sometimes called Q2, of the data.

The diagram below shows a nice summary of all this information:

Each quartile represents $25\%$ of the data set.

In other words, from the minimum score to the lower quartile is $25\%$ of the data, from the lower quartile to the median is another $25\%$, from the median to the upper quartile is another $25\%$, and from the upper quartile to the maximum score is another $25\%$.

We can add these quartiles together to find out what percentage of the data lies in different portions of the box-and-whisker plot. For example, $50\%$ of the scores in a data set lie in the box, between the lower and upper quartiles. This is the middle $50\%$ of the data, which are often considered the "normal" values.

We can also find the range of a data set, which is the distance between the minimum and maximum values, by subtracting the smallest value from the largest value.

Along those same lines, the interquartile range (IQR) is the distance between the lower and upper quartiles. To find the IQR, calculate $Q_3-Q_1$.

A list of the minimum, lower quartile, median, upper quartile, and maximum values is often called the five number summary.
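As a minimal sketch of these two subtractions, the Python snippet below stores a five number summary and computes the range and the IQR from it. The class name `FiveNumberSummary`, the method names, and the sample values are illustrative assumptions, not part of the lesson.

```python
from dataclasses import dataclass


@dataclass
class FiveNumberSummary:
    """Hypothetical container for the five values a box plot displays."""
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float

    def data_range(self) -> float:
        # Range = greatest score minus least score.
        return self.maximum - self.minimum

    def iqr(self) -> float:
        # Interquartile range = upper quartile minus lower quartile.
        return self.q3 - self.q1


# Made-up values chosen only for illustration.
summary = FiveNumberSummary(minimum=2, q1=5, median=7, q3=11, maximum=14)
print(summary.data_range())  # 12
print(summary.iqr())         # 6
```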

Creating a box-and-whisker plot

  1. Put the data in ascending order (from smallest to largest).
  2. Find the median (middle value) of the data.
  3. To divide the data into quarters, find the median (middle value) between the minimum value and the median, as well as between the median and the maximum value. 

If there are lots of scores in a data set, it may be easier to work out the positions of the scores that represent the median and the upper and lower quartiles, rather than counting through every score.
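The three steps above can be sketched in Python as follows. This is one possible reading of the method, under a stated assumption: when the number of scores is odd, the middle score is left out of both halves before their medians are found. Other quartile conventions exist and can give slightly different values. The function names and the sample data are hypothetical.

```python
def median(sorted_values):
    """Middle value of a sorted list (mean of the two middle values if the count is even)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using medians of the two halves.

    Assumes at least two scores. For an odd count, the middle score is
    excluded from both halves (an assumption, not stated in the lesson).
    """
    data = sorted(values)            # Step 1: ascending order
    n = len(data)
    half = n // 2
    lower = data[:half]              # scores below the median position
    upper = data[half + (n % 2):]    # scores above the median position
    # Steps 2 and 3: median of all the data, then median of each half.
    return data[0], median(lower), median(data), median(upper), data[-1]


# Hypothetical data set, for illustration only.
scores = [7, 3, 12, 9, 5, 10, 15, 8]
print(five_number_summary(scores))  # (3, 6.0, 8.5, 11.0, 15)
```

The five values returned are exactly the ones needed to draw the whiskers, the box edges, and the median line.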

 

Worked examples

Question 1

For the box-and-whisker plot above, find the:

a) least score

Think: The least score is at the end of the left whisker.

Do: $3$

 

b) greatest score

Think: The greatest score is at the end of the right whisker.

Do: $18$

 

c) range

Think: The range is the difference between the greatest value and the least value.

Do: $18-3=15$

 

d) median

Think: The median is shown by the line inside the box on the graph.

Do: $10$

 

e) interquartile range (IQR)

Think: The IQR is the difference between the upper quartile and the lower quartile.

Do: $15-8=7$

 

Question 2

Using the box-and-whisker plot above:

a) what percentage of scores lie between:

$10.9$ and $11.2$

$10.8$ and $10.9$

$11.1$ and $11.3$

$10.9$ and $11.3$

$10.8$ and $11.2$

Think: For these five questions, think about how many quartiles are in that range. Remember that one quartile represents $25\%$ of the data set.

Do:

$50\%$ of scores lie between Q1 and Q3.

$25\%$ of scores lie between the least score and Q1.

$50\%$ of scores lie between the median and the greatest score.

$75\%$ of scores lie between Q1 and the greatest score.

$75\%$ of scores lie between the least score and Q3.
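The counting in this question can be sketched in Python. The plot itself is not reproduced here, so the five boundary values below are inferred from the answers above (for example, $10.9$ to $11.2$ being Q1 to Q3); the function name `percent_between` is a hypothetical helper.

```python
# Five number summary inferred from the answers above:
# least score, Q1, median, Q3, greatest score.
boundaries = [10.8, 10.9, 11.1, 11.2, 11.3]


def percent_between(low, high):
    """Percentage of scores between two box plot boundaries (25% per quarter)."""
    quarters = boundaries.index(high) - boundaries.index(low)
    return 25 * quarters


pairs = [(10.9, 11.2), (10.8, 10.9), (11.1, 11.3), (10.9, 11.3), (10.8, 11.2)]
for low, high in pairs:
    print(f"{low} to {high}: {percent_between(low, high)}%")
```

This only works for values that sit exactly on one of the five boundaries, since a box plot does not show how scores are distributed inside a quarter.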

 

b) In which quartile (or quartiles) is the data the most spread out?

Think: Which quartile takes up the longest space on the graph?

Do: The second quartile is the most spread out.
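A short Python check of this comparison, assuming the same five boundary values inferred from part a) (least score $10.8$, Q1 $10.9$, median $11.1$, Q3 $11.2$, greatest score $11.3$). The variable names are illustrative.

```python
# least score, Q1, median, Q3, greatest score (inferred from part a)
boundaries = [10.8, 10.9, 11.1, 11.2, 11.3]

# Width of each quarter: the gap between neighboring boundaries.
widths = [high - low for low, high in zip(boundaries, boundaries[1:])]

# Quarters are numbered 1 to 4 from left to right.
widest_quarter = widths.index(max(widths)) + 1
print(widest_quarter)  # 2
```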

 

Question 3

Below are the luggage weights of $30$ passengers.

Weight (kg)    Frequency
$16$           $5$
$17$           $5$
$18$           $2$
$19$           $4$
$20$           $6$
$21$           $4$
$22$           $4$

a) What is the mean check-in weight? Round your answer to two decimal places if needed.

Think: We need to add up all the scores and divide the total by the number of scores.

Do: 

$\text{Mean weight} = \frac{5\times16+5\times17+2\times18+4\times19+6\times20+4\times21+4\times22}{30}$
  $= \frac{569}{30}$
  $= 18.9666\ldots$
  $= 18.97$ kg
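The same calculation can be sketched in Python, with the frequency table from this question stored as a dictionary. The variable names are illustrative.

```python
# Weight (kg) -> frequency, copied from the table in Question 3.
luggage = {16: 5, 17: 5, 18: 2, 19: 4, 20: 6, 21: 4, 22: 4}

# Each weight counts once per passenger, so multiply by its frequency.
total_weight = sum(weight * count for weight, count in luggage.items())
passengers = sum(luggage.values())
mean_weight = total_weight / passengers

print(total_weight, passengers)  # 569 30
print(round(mean_weight, 2))     # 18.97
```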

b) Determine the:

i) Median

Think: The median is the $\frac{n+1}{2}$th score.

Do: The median is the $\frac{30+1}{2}$th score, which is the $15.5$th score. The $15$th and $16$th scores are both $19$ kg, so the median weight is $19$ kg.

 

ii) Lower Quartile

Think: The lower quartile is the $\frac{n+1}{4}$th score.

Do: The $\frac{31}{4}$th score is the $7.75$th score. The $7$th and $8$th scores are both $17$ kg, so the lower quartile is $17$ kg.

 

iii) Upper Quartile

Think: The upper quartile is the $\frac{3\left(n+1\right)}{4}$th score.

Do: The $\frac{3\times31}{4}$th score is the $23.25$th score. The $23$rd and $24$th scores are both $21$ kg, so the upper quartile is $21$ kg.
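Locating a score by its position can be sketched in Python by walking through the cumulative frequencies. The helper names `kth_score` and `score_at` are hypothetical, and averaging the two neighboring scores at a fractional position is an assumption; in this question both neighbors are equal each time, so the choice of rule does not affect the answers.

```python
# Weight (kg) -> frequency, copied from the table in Question 3.
luggage = {16: 5, 17: 5, 18: 2, 19: 4, 20: 6, 21: 4, 22: 4}


def kth_score(freq, k):
    """Return the k-th score (counting from 1) in ascending order."""
    running = 0
    for value in sorted(freq):
        running += freq[value]  # cumulative frequency so far
        if k <= running:
            return value
    raise ValueError("k is larger than the number of scores")


def score_at(freq, position):
    """Score at a possibly fractional position, e.g. 15.5 uses the 15th and 16th scores.

    Assumption: the two neighboring scores are averaged.
    """
    below = int(position)  # whole-number part of the position
    above = below if position == below else below + 1
    return (kth_score(freq, below) + kth_score(freq, above)) / 2


n = sum(luggage.values())
print(score_at(luggage, (n + 1) / 2))      # 19.0 (median)
print(score_at(luggage, (n + 1) / 4))      # 17.0 (lower quartile)
print(score_at(luggage, 3 * (n + 1) / 4))  # 21.0 (upper quartile)
```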

 

c) In which quartile does the mean lie?

Think: The mean lies between the lower quartile and the median.

Do: The mean lies in the second quartile.
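As a quick check, the comparison can be written in Python using the values found in parts a) and b). The function name `quartile_containing` is a hypothetical helper; if a value sits exactly on a boundary, this sketch reports the lower of the two quarters.

```python
def quartile_containing(value, boundaries):
    """Return which quarter (1 to 4) of the five number summary contains value."""
    pairs = zip(boundaries, boundaries[1:])
    for quarter, (low, high) in enumerate(pairs, start=1):
        if low <= value <= high:
            return quarter
    raise ValueError("value lies outside the data range")


mean_weight = 569 / 30
# least score, lower quartile, median, upper quartile, greatest score (kg)
boundaries = [16, 17, 19, 21, 22]

print(quartile_containing(mean_weight, boundaries))  # 2
```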

 

 

Practice questions

Question 4

Question 5

Two groups of people, athletes and non-athletes, had their resting heart rate measured. The results were displayed in a pair of box plots.

Athletes

[Box plot on a number line from 30 to 90 beats per minute, labeled at intervals of 10 units, with minor ticks at intervals of 2 units. The box plot ranges from 40 to 70.]

Non-athletes

[Box plot on a number line from 30 to 90 beats per minute, labeled at intervals of 10 units, with minor ticks at intervals of 2 units. The box plot ranges from 46 to 90.]

  1. What is the median heart rate of athletes?

  2. What is the median heart rate of the non-athletes?

  3. Using this measure, which group has the lower heart rates?

    A. Non-athletes

    B. Athletes

  4. What is the interquartile range of the athletes' heart rates?

  5. What is the interquartile range of the non-athletes' heart rates?

  6. Using this measure, which group has more consistent heart rate measures?

    A. Non-athletes

    B. Athletes

 

 

Outcomes

6.SP.4

Display numerical data in plots on a number line, including dot plots, histograms, and box plots. Choose the most appropriate graph/plot for the data collected.

6.SP.5

Summarize numerical data sets in relation to their context, such as by:

6.SP.5.a

Reporting the number of observations

6.SP.5.b

Describing the nature of the attribute under investigation, including how it was measured and its units of measurement.

6.SP.5.c

Giving quantitative measures of center (median and/or mean) and variability (interquartile range and/or mean absolute deviation), as well as describing any overall pattern and any striking deviations (for example, outliers) from the overall pattern with reference to the context in which the data was gathered.
