7. Statistics

Lesson

Box-and-whisker plots are a great way of displaying quantitative (numerical) data, as they clearly show all the quartiles in a data set. Since statisticians are interested in what's "normal," they assume that most scores will be somewhere in the middle. As such, the "box" in a box-and-whisker plot indicates the middle half of the scores. This chapter will explain how box-and-whisker plots give us a clear picture of a data set's central tendency and spread.

Let's take a minute to get familiar with the features of box-and-whisker plots.

We start with a number line that displays the values in our data set.

Above that, you'll see that there are two lines or "whiskers" that extend from the box outwards. The two end points of these lines show the maximum (greatest) and minimum (least) scores in the data set.

The two vertical edges of the box show the quartiles of the data range. The left hand side of the box is the lower quartile (Q1) and the right hand side of the box is the upper quartile (Q3).

Finally, the vertical line inside the box shows the median (the middle score), sometimes called Q2, of the data.

The diagram below shows a nice summary of all this information:

Each quartile represents $25\%$ of the data set.

In other words, from the minimum score to the lower quartile is $25\%$ of the data, from the lower quartile to the median is another $25\%$, from the median to the upper quartile is another $25\%$, and from the upper quartile to the maximum score is another $25\%$.

We can add these quartiles together to find out what percentage of the data lies in different portions of the box-and-whisker plot. For example, $50\%$ of the scores in a data set lie in the box portion between the lower and upper quartiles. More specifically, this is the middle $50\%$ of the data, which are often considered the *normal* values of the data.

We can also find the range of a data set, which is the distance between the minimum and maximum values, by subtracting the smallest value from the largest value.

Along those same lines, the interquartile range (IQR) is the distance between the lower and upper quartiles. To find the IQR, subtract the lower quartile from the upper quartile: $Q_3-Q_1$.

A list of the minimum, lower quartile, median, upper quartile, and maximum values is often called the five-number summary. To find these values:

- Put the data in ascending order (from smallest to largest).
- Find the median (middle value) of the data.
- To divide the data into quarters, find the median (middle value) between the minimum value and the median, as well as between the median and the maximum value.
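The steps above can be sketched in Python. This is a minimal illustration rather than a definitive method: the function name `five_number_summary` and the sample scores are made up for this example, and it assumes the convention that, when there is an odd number of scores, the median itself is left out of both halves (other conventions exist and can give slightly different quartiles).

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) by the median-of-halves method.

    Assumes at least two scores. For an odd number of scores, the middle
    score is excluded from both halves (an assumed convention).
    """
    ordered = sorted(data)            # Step 1: ascending order
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]       # scores below the median position
    upper_half = ordered[n - half:]   # scores above the median position
    return (
        ordered[0],
        median(lower_half),           # Step 3: median of the lower half
        median(ordered),              # Step 2: median of all scores
        median(upper_half),           # Step 3: median of the upper half
        ordered[-1],
    )


# Hypothetical scores, invented purely for illustration.
scores = [7, 3, 12, 9, 5, 15, 10]
print(five_number_summary(scores))  # (3, 5, 9, 12, 15)
```

Sorting first matters: the median and quartiles are defined by position in the ordered list, not by the order in which the scores were recorded.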

If there are lots of scores in a data set, it may be easier to work out which positions in the ordered list hold the median and the upper and lower quartiles, rather than counting through every score.

For the box-and-whisker plot above, find the:

a) least score

**Think:** The least score is at the end of the left whisker.

**Do:** $3$

b) greatest score

**Think:** The greatest score is at the end of the right whisker.

**Do:** $18$

c) range

**Think:** The range is the difference between the greatest value and the least value.

**Do:** $18-3=15$

d) median

**Think:** The median is shown by the line inside the box on the graph.

**Do:** $10$

e) interquartile range (IQR)

**Think:** The IQR is the difference between the upper quartile and the lower quartile.

**Do:** $15-8=7$
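The answers to this example can be checked with a short Python sketch. The five values are the ones read off the plot in parts a) to e); the class name `FiveNumberSummary` and its property names are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FiveNumberSummary:
    """The five values read from a box-and-whisker plot (illustrative helper)."""
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float

    @property
    def value_range(self):
        # Range = greatest score minus least score
        return self.maximum - self.minimum

    @property
    def iqr(self):
        # Interquartile range = upper quartile minus lower quartile
        return self.q3 - self.q1


# Values taken from the worked answers above.
plot = FiveNumberSummary(minimum=3, q1=8, median=10, q3=15, maximum=18)
print(plot.value_range)  # 15
print(plot.iqr)          # 7
```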

Using the box-and-whisker plot above:

a) what percentage of scores lie between:

- $10.9$ and $11.2$
- $10.8$ and $10.9$
- $11.1$ and $11.3$
- $10.9$ and $11.3$
- $10.8$ and $11.2$

**Think:** For these five questions, think about how many quartiles are in that range. Remember that one quartile represents $25\%$ of the data set.

**Do:**

- $50\%$ of scores lie between Q1 and Q3.
- $25\%$ of scores lie between the least score and Q1.
- $50\%$ of scores lie between the median and the greatest score.
- $75\%$ of scores lie between Q1 and the greatest score.
- $75\%$ of scores lie between the least score and Q3.
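The counting idea behind these answers can be sketched in Python: each step from one five-number-summary value to the next covers one quartile, or $25\%$ of the scores. The five boundary values below are an assumption, inferred from the questions and answers above because the plot itself is not reproduced here; the function name `percent_between` is likewise made up for this illustration.

```python
def percent_between(boundaries, low, high):
    """Percentage of scores between two five-number-summary values.

    `boundaries` lists [minimum, Q1, median, Q3, maximum] in order.
    Each gap between neighbouring boundaries holds 25% of the scores.
    """
    start = boundaries.index(low)
    end = boundaries.index(high)
    return 25 * (end - start)


# Assumed plot values (inferred from the worked answers, not read from a figure).
summary = [10.8, 10.9, 11.1, 11.2, 11.3]

print(percent_between(summary, 10.9, 11.2))  # 50
print(percent_between(summary, 10.8, 10.9))  # 25
print(percent_between(summary, 11.1, 11.3))  # 50
print(percent_between(summary, 10.9, 11.3))  # 75
print(percent_between(summary, 10.8, 11.2))  # 75
```

This only works when both endpoints are themselves five-number-summary values; a box plot does not show how scores are distributed inside a quartile.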

b) In which quartile (or quartiles) is the data the most spread out?

**Think:** Which quartile takes up the longest space on the graph?

**Do:** The second quartile is the most spread out.

Below are the luggage weights of $30$ passengers.

Weight (kg) | Frequency |
---|---|
$16$ | $5$ |
$17$ | $5$ |
$18$ | $2$ |
$19$ | $4$ |
$20$ | $6$ |
$21$ | $4$ |
$22$ | $4$ |

a) What is the mean check-in weight? Give your answer to two decimal places if needed.

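**Think:** We need to add up all the scores and divide the total by the number of scores. Since each weight occurs several times, we multiply each weight by its frequency before adding.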

**Do:**

$\text{Mean weight}=\frac{5\times16+5\times17+2\times18+4\times19+6\times20+4\times21+4\times22}{30}$

$=\frac{569}{30}$

$=18.9666\ldots$

$\approx18.97$ kg
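The same calculation can be sketched in Python as a check. The frequencies come from the table above; the function name `mean_from_frequencies` and the dictionary layout are assumptions made for this illustration.

```python
def mean_from_frequencies(table):
    """Mean of a frequency table given as {score: frequency}."""
    total = sum(score * freq for score, freq in table.items())  # sum of all scores
    count = sum(table.values())                                 # number of scores
    return total / count


# Luggage weights (kg) and their frequencies, copied from the table above.
luggage = {16: 5, 17: 5, 18: 2, 19: 4, 20: 6, 21: 4, 22: 4}

print(sum(luggage.values()))                     # 30
print(round(mean_from_frequencies(luggage), 2))  # 18.97
```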

b) Determine the:

i) Median

**Think:** The median is the $\frac{n+1}{2}$th score.

**Do:** The median is the $\frac{30+1}{2}$th score, which is the $15.5$th score. The $15$th and $16$th scores are both $19$, so the median weight is $19$ kg.

ii) Lower Quartile

**Think:** The lower quartile is the $\frac{n+1}{4}$th score.

**Do:** The $\frac{31}{4}$th score is the $7.75$th score. The $7$th and $8$th scores are both $17$, so the lower quartile is $17$ kg.

iii) Upper Quartile

**Think:** The upper quartile is the $\frac{3\left(n+1\right)}{4}$th score.

**Do:** The $\frac{3\times31}{4}$th score is the $23.25$th score. The $23$rd and $24$th scores are both $21$, so the upper quartile is $21$ kg.
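The position formulas used in part b) can be sketched in Python. The helper `score_at` is hypothetical, and how it handles fractional positions is an assumption: it interpolates linearly between the two neighbouring scores. For this data set the neighbouring scores are equal in every case, so the choice does not affect the answers.

```python
from math import ceil, floor


def score_at(ordered, position):
    """Score at a 1-based position in an ordered list.

    Fractional positions are resolved by linear interpolation between the
    neighbouring scores (an assumed convention for this sketch).
    """
    below = ordered[floor(position) - 1]
    above = ordered[ceil(position) - 1]
    return below + (position - floor(position)) * (above - below)


# Expand the luggage frequency table into an ordered list of 30 scores.
luggage = {16: 5, 17: 5, 18: 2, 19: 4, 20: 6, 21: 4, 22: 4}
scores = sorted(w for w, freq in luggage.items() for _ in range(freq))
n = len(scores)

median = score_at(scores, (n + 1) / 2)          # 15.5th score
lower_q = score_at(scores, (n + 1) / 4)         # 7.75th score
upper_q = score_at(scores, 3 * (n + 1) / 4)     # 23.25th score

print(median, lower_q, upper_q)  # 19.0 17.0 21.0
```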

c) In which quartile does the mean lie?

**Think:** The mean lies between the lower quartile and the median.

**Do:** The mean lies in the second quartile.
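As a quick check of this answer, the comparison can be written as a small Python sketch. The function name `quartile_containing` is made up for this illustration, and assigning a value that sits exactly on a boundary to the lower of the two quartiles is an assumption. The five boundary values are the minimum and maximum from the table and the quartiles found in part b).

```python
def quartile_containing(value, minimum, q1, median, q3, maximum):
    """Return 1, 2, 3 or 4 for the quartile in which `value` lies.

    A value exactly on a boundary is counted in the lower quartile
    (an assumed convention for this sketch).
    """
    bounds = [minimum, q1, median, q3, maximum]
    for k in range(1, 5):
        if bounds[k - 1] <= value <= bounds[k]:
            return k
    raise ValueError("value lies outside the data range")


# Mean of 18.97 kg with minimum 16, Q1 17, median 19, Q3 21, maximum 22.
print(quartile_containing(18.97, 16, 17, 19, 21, 22))  # 2
```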

You have been asked to represent this data in a box plot. Answer the following questions:

$20, 36, 52, 56, 24, 16, 40, 4, 28$

Complete the table for the given data:

Statistic | Value |
---|---|
Minimum | |
Lower Quartile | |
Median | |
Upper Quartile | |
Maximum | |

Construct a box plot for the data.

[Number line labelled "Data", marked from $0$ to $60$ in steps of $10$]

Two groups of people, athletes and non-athletes, had their resting heart rate measured. The results were displayed in a pair of box plots.

[Pair of box plots, one for Athletes and one for Non-athletes, each drawn on a number line from $30$ to $90$ beats per minute]

What is the median heart rate of athletes?

What is the median heart rate of the non-athletes?

Using this measure, which group has the lower heart rates?

A. Non-athletes

B. Athletes

What is the interquartile range of the athletes' heart rates?

What is the interquartile range of the non-athletes' heart rates?

Using this measure, which group has more consistent heart rate measures?

A. Non-athletes

B. Athletes

Display numerical data in plots on a number line, including dot plots (line plots), histograms, and box plots. (GAISE Model, step 3)

Summarize numerical data sets in relation to their context.

Report the number of observations.

Describe the nature of the attribute under investigation, including how it was measured and its units of measurement.

Find the quantitative measures of center (median and/or mean) for a numerical data set and recognize that this value summarizes the data set with a single number. Interpret mean as an equal or fair share. Find measures of variability (range and interquartile range) as well as informally describe the shape and the presence of clusters, gaps, peaks, and outliers in a distribution.