Box-and-whisker plots are a great way of displaying quantitative (numerical) data as they clearly show all the quartiles in a data set. Since statisticians are interested in what's "normal," they assume that most scores will be somewhere in the middle. As such, the "box" in a box-and-whisker plot indicates the middle half of the scores. This chapter will explain how box-and-whisker plots give us a clear picture of a data set's central tendency and spread.
Let's take a minute to get familiar with the features of box-and-whisker plots.
We start with a number line that displays the values in our data set.
Above that sits a box, with two lines or "whiskers" that extend outwards from it. The end points of these lines show the minimum (least) and maximum (greatest) scores in the data set.
The two vertical edges of the box show the quartiles of the data set. The left-hand side of the box is the lower quartile (Q1) and the right-hand side of the box is the upper quartile (Q3).
Finally, the vertical line inside the box shows the median (the middle score), sometimes called Q2, of the data.
The diagram below shows a nice summary of all this information:
Each quartile represents $25\%$ of the data set.
In other words, $25\%$ of the data lies between the minimum score and the lower quartile, another $25\%$ between the lower quartile and the median, another $25\%$ between the median and the upper quartile, and the final $25\%$ between the upper quartile and the maximum score.
We can add these quartiles together to find out what percentage of the data lies in different portions of the box-and-whisker plot. For example, $50\%$ of the scores in a data set lie in the box, between the lower and upper quartiles. This middle $50\%$ of the data is often considered to hold the typical values of the data set.
We can also find the range of a data set, which is the distance between the minimum and maximum values, by subtracting the smallest value from the largest value.
Along those same lines, the interquartile range (IQR) is the distance between the lower and upper quartiles. To find the IQR, calculate $Q_3-Q_1$.
A list of the minimum, lower quartile, median, upper quartile, and maximum values is often called the five number summary.
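If you'd like to check these values with a computer, here is a minimal Python sketch. It rests on a few assumptions: the data set is made up for illustration, the helper name `five_number_summary` is hypothetical, and the quartiles come from `statistics.quantiles` with its `"exclusive"` method, which places the quartiles at the $\frac{n+1}{4}$, $\frac{n+1}{2}$ and $\frac{3(n+1)}{4}$ positions and interpolates between neighbouring scores. Other calculators and software may use slightly different quartile conventions.

```python
import statistics

def five_number_summary(scores):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(scores)
    # "exclusive" puts the quartiles at positions k(n+1)/4 in the ordered list.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="exclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

# Made-up data set, used only for illustration.
scores = [7, 2, 13, 4, 9, 4, 16, 5, 12, 8, 11]

minimum, q1, median, q3, maximum = five_number_summary(scores)
data_range = maximum - minimum  # distance between the two whisker ends
iqr = q3 - q1                   # width of the box

print("Five number summary:", (minimum, q1, median, q3, maximum))
print("Range:", data_range)
print("IQR:", iqr)
```

For this made-up data the box would run from $4$ to $12$, with the median line at $8$ and whiskers reaching out to $2$ and $16$.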
If there are lots of scores in a data set, it may be easier to calculate the positions of the median and the upper and lower quartiles rather than counting through every score.
For the box-and-whisker plot above, find the:
a) least score
Think: The least score is at the end of the left whisker.
Do: $3$
b) greatest score
Think: The greatest score is at the end of the right whisker.
Do: $18$
c) range
Think: The range is the difference between the greatest value and the least value.
Do: $18-3=15$
d) median
Think: The median is shown by the line inside the box on the graph.
Do: $10$
e) interquartile range (IQR)
Think: The IQR is the difference between the upper quartile and the lower quartile.
Do: $15-8=7$
Using the box-and-whisker plot above:
a) what percentage of scores lie between:
i) $10.9$ and $11.2$
ii) $10.8$ and $10.9$
iii) $11.1$ and $11.3$
iv) $10.9$ and $11.3$
v) $10.8$ and $11.2$
Think: For these five questions, think about how many quartiles are in that range. Remember that one quartile represents $25\%$ of the data set.
Do:
i) $50\%$ of scores lie between Q1 and Q3.
ii) $25\%$ of scores lie between the least score and Q1.
iii) $50\%$ of scores lie between the median and the greatest score.
iv) $75\%$ of scores lie between Q1 and the greatest score.
v) $75\%$ of scores lie between the least score and Q3.
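The quartile-counting idea can be sketched in a few lines of Python. Treat this as an illustration only: the five values in `summary` are an assumption, inferred from the questions and answers rather than read from the plot, and `percent_between` is a hypothetical helper name.

```python
# Five number summary assumed for this plot (inferred from the questions and
# answers, not read from the plot): min, Q1, median, Q3, max.
summary = [10.8, 10.9, 11.1, 11.2, 11.3]

def percent_between(low, high, summary):
    """Percentage of scores between two of the five marked values."""
    # Each gap between neighbouring marks is one quartile, i.e. 25% of the data.
    quartiles_spanned = summary.index(high) - summary.index(low)
    return 25 * quartiles_spanned

questions = [(10.9, 11.2), (10.8, 10.9), (11.1, 11.3), (10.9, 11.3), (10.8, 11.2)]
for low, high in questions:
    print(f"{low} to {high}: {percent_between(low, high, summary)}%")
```

The helper only works for values that sit exactly on one of the five marks. A box-and-whisker plot does not tell us how the scores are distributed inside a quartile, so percentages between other values cannot be read off.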
b) In which quartile (or quartiles) is the data the most spread out?
Think: Which quartile takes up the longest space on the graph?
Do: The second quartile is the most spread out.
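To compare the spread of each quartile numerically, we can measure the distance between neighbouring marks on the plot. The sketch below assumes the five marked values are $10.8$, $10.9$, $11.1$, $11.2$ and $11.3$, which is inferred from the question rather than read from the plot.

```python
# Assumed five number summary for the plot (inferred, not read from the plot):
# min, Q1, median, Q3, max.
summary = [10.8, 10.9, 11.1, 11.2, 11.3]

# Width of each quartile = distance between neighbouring marks.
widths = [high - low for low, high in zip(summary, summary[1:])]

# Quartiles are numbered 1 to 4 from left to right.
widest = widths.index(max(widths)) + 1

print("Quartile widths:", [round(w, 2) for w in widths])
print("Most spread out: quartile", widest)
```

Every quartile holds the same share of the scores, so the widest one is where those scores are most spread out.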
Below is the luggage weight of $30$ passengers.
Weight (kg) | Frequency |
---|---|
$16$ | $5$ |
$17$ | $5$ |
$18$ | $2$ |
$19$ | $4$ |
$20$ | $6$ |
$21$ | $4$ |
$22$ | $4$ |
a) What is the mean luggage weight? Give your answer to two decimal places if needed.
Think: We need to add up all the scores and divide the total by the number of scores. Each weight must be counted as many times as its frequency.
Do:
$$\begin{aligned}
\text{Mean weight} &= \frac{5\times16+5\times17+2\times18+4\times19+6\times20+4\times21+4\times22}{30} \\
&= \frac{569}{30} \\
&= 18.9666\ldots \\
&\approx 18.97\text{ kg}
\end{aligned}$$
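The same calculation can be checked in Python. This is a minimal sketch in which the frequency table is stored as a dictionary; the variable names are chosen here for illustration.

```python
# Luggage weights (kg) and how many passengers had each weight, from the table.
frequencies = {16: 5, 17: 5, 18: 2, 19: 4, 20: 6, 21: 4, 22: 4}

# Multiply each weight by its frequency and add the results.
total_weight = sum(weight * count for weight, count in frequencies.items())

# The number of scores is the sum of the frequencies.
passengers = sum(frequencies.values())

mean_weight = total_weight / passengers
print(total_weight, passengers)
print(round(mean_weight, 2))
```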
b) Determine the:
i) Median
Think: The median is the $\frac{n+1}{2}$th score.
Do: The median is the $\frac{30+1}{2}$th score, which is the $15.5$th score. This lies halfway between the $15$th and $16$th scores, which are both $19$ kg, so the median weight is $19$ kg.
ii) Lower Quartile
Think: The lower quartile is the $\frac{n+1}{4}$th score.
Do: The $\frac{31}{4}$th score is the $7.75$th score. This lies between the $7$th and $8$th scores, which are both $17$ kg, so the lower quartile is $17$ kg.
iii) Upper Quartile
Think: The upper quartile is the $\frac{3\left(n+1\right)}{4}$th score.
Do: The $\frac{3\times31}{4}$th score is the $23.25$th score. This lies between the $23$rd and $24$th scores, which are both $21$ kg, so the upper quartile is $21$ kg.
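The position formulas used in part b) can be checked in Python. This is a sketch under one stated assumption: it uses `statistics.quantiles` with `method="exclusive"`, which places the quartiles at the same $\frac{k(n+1)}{4}$ positions and interpolates between the two neighbouring scores. Here the neighbouring scores are equal each time, so no rounding decision is needed.

```python
import statistics

# Frequency table from the question: weight (kg) -> number of passengers.
frequencies = {16: 5, 17: 5, 18: 2, 19: 4, 20: 6, 21: 4, 22: 4}

# Expand the table into an ordered list of all 30 scores.
scores = [w for w, count in sorted(frequencies.items()) for _ in range(count)]
n = len(scores)

# Positions of Q1, the median and Q3 in the ordered list.
positions = [(n + 1) / 4, (n + 1) / 2, 3 * (n + 1) / 4]

# Quartiles at those positions, interpolating between neighbouring scores.
q1, median, q3 = statistics.quantiles(scores, n=4, method="exclusive")

print("Positions:", positions)
print("Q1, median, Q3:", q1, median, q3)
```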
c) In which quartile does the mean lie?
Think: The mean, $18.97$ kg, lies between the lower quartile ($17$ kg) and the median ($19$ kg).
Do: The mean lies in the second quartile.
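Locating a value within the quartiles can also be sketched in Python. The function name `quartile_containing` is hypothetical, and the five number summary is taken from the table and the worked answers for this question.

```python
# Five number summary for the luggage weights (kg): min, Q1, median, Q3, max.
summary = [16, 17, 19, 21, 22]
mean_weight = 569 / 30

def quartile_containing(value, summary):
    """Return 1-4 for the quartile a value falls in, or None if outside the data."""
    for k in range(1, 5):
        # Quartile k runs from summary[k - 1] up to summary[k].
        if summary[k - 1] <= value <= summary[k]:
            return k
    return None

print(quartile_containing(mean_weight, summary))
```

A value that sits exactly on a quartile boundary is reported in the lower-numbered quartile, which is a choice made for this sketch.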
Two groups of people, athletes and non-athletes, had their resting heart rate measured. The results were displayed in a pair of box plots.
Figure: two box plots of resting heart rate, one for athletes and one for non-athletes.
What is the median heart rate of athletes?
What is the median heart rate of the non-athletes?
Using this measure, which group has the lower heart rates: non-athletes or athletes?
What is the interquartile range of the athletes' heart rates?
What is the interquartile range of the non-athletes' heart rates?
Using this measure, which group has more consistent heart rate measures: non-athletes or athletes?