We previously looked at the quartiles of a data set and found the first quartile, the median, and the third quartile. Remember that the quartiles give some insight into how the data are spread internally, whereas the range uses only the two extreme data points, the maximum and the minimum. We can combine the quartiles with the two extremes of a data set to condense the data into a five number summary: the minimum, the first (lower) quartile, the median, the third (upper) quartile, and the maximum.
The five numbers in the five number summary break a set of scores into four parts, with approximately $25\%$ of the scores in each part. Have a look at the diagram here:
So knowing these five key numbers can help us identify regions such as the top $25\%$, $50\%$, and $75\%$ of the scores.
The table shows the number of points scored by a basketball team in each game of their previous season.
$59$ | $67$ | $73$ | $82$ | $91$ | $58$ | $79$ | $88$ |
$69$ | $84$ | $55$ | $80$ | $98$ | $64$ | $82$ |
Sort the data in ascending order.
State the maximum value of the set.
State the minimum value of the set.
Find the median value.
Find the lower quartile.
Find the upper quartile.
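One way to check your answers to these questions is the minimal Python sketch below. It assumes one common convention for quartiles: sort the data, then take the medians of the lower and upper halves, leaving the median itself out of both halves when the number of values is odd. Other conventions (for example, the interpolation used by some spreadsheet and statistics software) can give slightly different quartiles. The function name `five_number_summary` is our own. `statistics.median` comes from Python's standard library.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Assumed convention: the quartiles are the medians of the lower and
    upper halves of the sorted data; when the number of values is odd,
    the middle value (the median) is left out of both halves.
    """
    s = sorted(data)
    n = len(s)
    lower_half = s[: n // 2]          # values before the middle position
    upper_half = s[(n + 1) // 2 :]    # values after the middle position
    return s[0], median(lower_half), median(s), median(upper_half), s[-1]


# Points scored in each game, copied from the table above.
scores = [59, 67, 73, 82, 91, 58, 79, 88,
          69, 84, 55, 80, 98, 64, 82]

print(sorted(scores))
minimum, q1, med, q3, maximum = five_number_summary(scores)
print("Maximum:", maximum)          # 98
print("Minimum:", minimum)          # 55
print("Median:", med)               # 79
print("Lower quartile:", q1)        # 64
print("Upper quartile:", q3)        # 84
```

With $15$ scores, the median is the $8$th score in sorted order, and each quartile is the median of the $7$ scores on either side of it.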
Creating a box plot:
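A box plot is drawn directly from the five number summary. The whiskers run from the minimum to the maximum, the box runs from the lower quartile to the upper quartile, and a line inside the box marks the median. The text-only Python sketch below illustrates this. It uses the summary from the example that follows (minimum $3$, lower quartile $8$, median $10$, upper quartile $15$, maximum $18$). The function names and the choice of characters are our own assumptions, not a standard notation.

```python
def to_col(x, lo, hi, width):
    """Map a value on the scale [lo, hi] to a character column 0..width-1."""
    return round((x - lo) / (hi - lo) * (width - 1))


def ascii_box_plot(summary, lo, hi, width):
    """Draw one horizontal box plot as a single line of text.

    '|' marks the whisker ends and the median, '[' and ']' mark the
    quartiles (the ends of the box), '=' fills the box, '-' the whiskers.
    """
    minimum, q1, med, q3, maximum = summary

    def c(x):
        return to_col(x, lo, hi, width)

    row = [" "] * width
    for i in range(c(minimum), c(maximum) + 1):
        row[i] = "-"                  # whiskers cover the whole data range
    for i in range(c(q1), c(q3) + 1):
        row[i] = "="                  # box covers the middle 50%
    row[c(minimum)] = row[c(maximum)] = "|"
    row[c(q1)], row[c(q3)] = "[", "]"
    row[c(med)] = "|"                 # median line inside the box
    return "".join(row).rstrip()


def axis_labels(lo, hi, step, width):
    """Number line labels, each starting at the column of its tick."""
    chars = [" "] * (width + len(str(hi)))
    for v in range(lo, hi + 1, step):
        label = str(v)
        start = to_col(v, lo, hi, width)
        chars[start:start + len(label)] = label
    return "".join(chars).rstrip()


# Summary from the example: minimum, Q1, median, Q3, maximum.
print(ascii_box_plot((3, 8, 10, 15, 18), lo=0, hi=20, width=21))
print(axis_labels(0, 20, 5, 21))
```

With one character per unit on a scale from $0$ to $20$, the first line prints as `   |----[=|====]--|`.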
For the box plot above, find the:
(a) Range
Think: The range is the difference between the highest score and the lowest score. That is, the difference between the scores at the ends of the whiskers.
Do: For this data set, the range is $18-3=15$.
(b) Median
Think: The median is shown by the line inside the rectangular box.
Do: For this data set, the median line is at the score $10$.
(c) Interquartile range (IQR)
Think: The IQR is the difference between the upper quartile and the lower quartile.
Do: For this data set, the lower quartile (at the left end of the box) is $8$, while the upper quartile (at the right end of the box) is $15$. This means that the IQR is $15-8=7$.
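Both measures of spread in parts (a) and (c) are simple differences between values of the five number summary. The small Python sketch below (the function name `spread` is our own) shows this using the values read from the box plot in this example.

```python
def spread(summary):
    """Return (range, interquartile range) from a five number summary."""
    minimum, q1, _median, q3, maximum = summary
    data_range = maximum - minimum   # distance between the whisker ends
    iqr = q3 - q1                    # width of the box (the middle 50%)
    return data_range, iqr


print(spread((3, 8, 10, 15, 18)))  # (15, 7)
```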
(d) What percentage of scores lie between $8$ and $18$ inclusive?
Think: $8$ is the first quartile and $18$ is the maximum value. Approximately $25\%$ of the data lies between each pair of consecutive values in the five number summary.
Do: Approximately $75\%$ of the data lies between these values.
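The reasoning in part (d) amounts to counting the quarters between two points of the five number summary. The sketch below expresses this in Python. The list `SUMMARY_POINTS` and the function `percent_between` are hypothetical names chosen for illustration. The result is approximate because a real data set rarely splits into exact quarters.

```python
# The five values of the summary, in order from smallest to largest.
SUMMARY_POINTS = ["minimum", "lower quartile", "median", "upper quartile", "maximum"]


def percent_between(start, end):
    """Approximate percentage of scores lying between two summary points.

    Consecutive points are roughly 25% of the data apart, so the answer
    is 25% times the number of gaps between the two points.
    """
    gaps = abs(SUMMARY_POINTS.index(end) - SUMMARY_POINTS.index(start))
    return 25 * gaps


print(percent_between("lower quartile", "maximum"))         # 75
print(percent_between("lower quartile", "upper quartile"))  # 50
```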
Parallel box plots are used to compare two or more sets of data visually. The box plots are drawn parallel to each other above a common number line, so they must use the same scale, which makes a visual comparison straightforward. It is important to label each box plot clearly.
Key Comparisons:
When comparing two sets of data we can compare the location of each value of the five number summary. We can also ask ourselves the following questions:
The parallel box plot below shows two sets of data, comparing the time it took two different groups of people to complete an online task.
(a) Which group was generally faster?
Think: Which box plot has its main values further to the left? Is this consistent for all of the values in the five number summary? Are the differences significant? In particular, note the difference between the medians.
Do: We can see that overall the under $30$s were faster at completing the task. Each of the numbers in the five number summary is smaller for the under $30$s, and their median time is $4$ seconds shorter than that of the over $30$s. Also, over $75\%$ of the under $30$s completed the task in under $22$ seconds, which is the median time taken by the over $30$s. In fact, $100\%$ of the under $30$s had finished the task before $75\%$ of the over $30$s had completed it.
(b) Which group had more consistent completion times?
Think: To judge consistency, compare the ranges and the interquartile ranges. Recall that the smaller a measure of spread, the more consistent the scores are.
Do: Overall, the under $30$s had a smaller spread of completion times. There was more variation within the over $30$ group, with a range of $24$ seconds compared to $20$ seconds for the under $30$s. The interquartile range was also $3$ seconds smaller for the under $30$s.
The box plots below represent the daily sales made by Carl and Angelina over the course of one month.
Two box plots are displayed above horizontal number lines. The upper box plot represents Angelina's sales and the lower one represents Carl's sales. Each number line runs from $0$ to $70$, with major tick marks every $10$ units and nine minor tick marks between consecutive major ticks, so each minor interval is $1$ unit. On Angelina's box plot, the box spans from $16$ (the first quartile) to $42$ (the third quartile), with a vertical line at $30$ marking the median. Its whiskers extend to $2$ on the left (the minimum) and $51$ on the right (the maximum). On Carl's box plot, the box spans from $30$ (the first quartile) to $49$ (the third quartile), with a vertical line at $42$ marking the median. Its whiskers extend to $14$ on the left (the minimum) and $64$ on the right (the maximum).
What is the range of Angelina's sales?
What is the range of Carl's sales?
By how much did Carl's median sales exceed Angelina's?
Considering the middle $50\%$ of sales for both salespeople, whose sales were more consistent?
Carl
Angelina
Which salesperson had a more successful sales month?
Angelina
Carl
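The questions above can be checked with the Python sketch below. It works from the five number summaries read from the box plot description. It follows the criteria used earlier in this section: a smaller interquartile range means the middle $50\%$ of sales is more consistent, and a higher median (here, every value of the summary is higher) indicates the more successful month. The variable and function names are our own. The sketch does not handle ties; in that case it falls back to "Angelina".

```python
# Five number summaries read from the box plot description:
# (minimum, lower quartile, median, upper quartile, maximum)
angelina = (2, 16, 30, 42, 51)
carl = (14, 30, 42, 49, 64)


def describe(summary):
    """Range, median and IQR of a five number summary."""
    minimum, q1, med, q3, maximum = summary
    return {"range": maximum - minimum, "median": med, "iqr": q3 - q1}


a, c = describe(angelina), describe(carl)
print("Range of Angelina's sales:", a["range"])                          # 49
print("Range of Carl's sales:", c["range"])                              # 50
print("Carl's median exceeds Angelina's by:", c["median"] - a["median"])  # 12

# Smaller IQR: the middle 50% of sales is more consistent (IQRs 26 and 19).
more_consistent = "Carl" if c["iqr"] < a["iqr"] else "Angelina"
# Higher median: more sales on a typical day.
more_successful = "Carl" if c["median"] > a["median"] else "Angelina"
print(more_consistent, more_successful)                # Carl Carl
print(all(x < y for x, y in zip(angelina, carl)))      # True
```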