VCE 11 General 2023

1.07 Box plots

Lesson

Five number summary

A previous lesson defined the quartiles of a data set and showed how to calculate the first quartile, the median, and the third quartile. The quartiles give some basic insight into the internal spread of the data, whereas the range uses only the difference between the two extreme data points, the maximum and the minimum. Combining the quartiles with these two extremes simplifies the data into a five number summary.

Five number summary

A five number summary for a data set consists of:

$\text{Min}, Q_1, \text{Median}, Q_3, \text{Max}$

 

The five numbers from the five number summary break up a set of scores into four parts, as shown:

So, knowing these five key numbers can help identify regions of $25\%$, $50\%$, and $75\%$ of the scores.
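As a rough illustration, the five number summary can be computed with a few lines of Python. This is only a sketch: the function name `five_number_summary` and the sample scores are made up for illustration, and it assumes the convention that $Q_1$ and $Q_3$ are the medians of the lower and upper halves of the sorted data, with the middle score left out when the number of scores is odd. Other quartile conventions exist and can give slightly different values.

```python
from statistics import median

def five_number_summary(scores):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers.

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves, leaving the middle score out when the count is odd.
    """
    data = sorted(scores)
    n = len(data)
    half = n // 2
    lower = data[:half]            # scores below the median position
    upper = data[half + n % 2:]    # scores above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Made-up scores, for illustration only.
summary = five_number_summary([7, 1, 9, 3, 5, 13, 11])
print(summary)  # (1, 3, 7, 11, 13)
```

Each pair of neighbouring values in the result marks off one quarter of the scores.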

Practice question

Question 1

The table shows the number of points scored by a basketball team in each game of their previous season.

$59$  $67$  $73$  $82$  $91$  $58$  $79$  $88$
$69$  $84$  $55$  $80$  $98$  $64$  $82$
  1. Sort the data in ascending order.

  2. State the maximum value of the set.

  3. State the minimum value of the set.

  4. Find the median value.

  5. Find the lower quartile.

  6. Find the upper quartile.

 

Box plots

Box plots, sometimes called box-and-whisker plots, are a useful way of displaying quantitative (numerical) data because they clearly show the five values from the five number summary of a data set. In particular, a box plot highlights the middle $50\%$ of the scores in the data set, between $Q_1$ and $Q_3$. Box plots provide a clear picture of the central tendency and spread of a set of data.

 

Features of a box plot

Start with a number line that covers the full range of values in the data set. Next, plot the values from the five number summary on the number line, and connect them in a certain way to create a box plot. Here is an example:

The two vertical edges of the box show the quartiles of the data set. The left-hand side of the box is at $Q_1$ and the right-hand side is at $Q_3$. The vertical line inside the box shows the median (the middle score) of the data.

Then there are two lines that extend from the box outwards. The endpoint of the left line is at the minimum score, while the endpoint of the right line is at the maximum score.
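The construction described above can be sketched as a one-line text drawing. This is a minimal sketch, not a standard plotting routine: the function name `text_box_plot` and the choice of characters are made up for illustration, it assumes all five values are whole numbers, and it uses the values $3$, $8$, $10$, $15$, $18$ that are read off the box plot in Example 1 below.

```python
def text_box_plot(minimum, q1, med, q3, maximum):
    """Draw a one-line box plot, one character per unit on the number line."""
    chars = []
    for x in range(minimum, maximum + 1):
        if x == med:
            chars.append("|")      # median line inside the box
        elif x in (q1, q3):
            chars.append("#")      # vertical edges of the box at Q1 and Q3
        elif q1 < x < q3:
            chars.append("=")      # inside the box
        else:
            chars.append("-")      # whiskers out to the minimum and maximum
    return "".join(chars)

# Five number summary read from the box plot in Example 1.
plot = text_box_plot(3, 8, 10, 15, 18)
print(plot)  # -----#=|====#---
```

The first character sits at the minimum score and the last at the maximum, so the whiskers, the box, and the median line appear in the same order as on the number line.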

Worked examples

Example 1

For the box plot above, find the:

(a) Lowest score

Think: The lowest score is the furthest left point of the plot.

Do: So in this case, the lowest score is $3$.

(b) Highest score

Think: The highest score is the furthest right point of the plot.

Do: So in this case, the highest score is $18$.

(c) Range

Think: The range is the difference between the highest score and the lowest score.

Do: For this data set, the range is $18-3=15$.

(d) Median

Think: The median is shown by the line inside the rectangular box.

Do: For this data set, the median line is at the score $10$.

(e) Interquartile range (IQR)

Think: The IQR is the difference between the upper quartile and the lower quartile.

Do: For this set, the lower quartile (at the left end of the box) is $8$, while the upper quartile (at the right end of the box) is $15$. This means that the IQR is $15-8=7$.
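The two subtractions in parts (c) and (e) can be checked with a short Python sketch. The variable names are chosen here for illustration; the values are the five read from the box plot in parts (a) to (e).

```python
# Five number summary read from the box plot in this example.
minimum, q1, med, q3, maximum = 3, 8, 10, 15, 18

data_range = maximum - minimum   # highest score minus lowest score
iqr = q3 - q1                    # upper quartile minus lower quartile

print(data_range, iqr)  # 15 7
```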

Example 2

Using the box plot above:

(a) What percentage of scores lie in each of the following regions?

  1. Between $10.9$ and $11.2$
  2. Between $10.8$ and $10.9$
  3. Between $11.1$ and $11.3$
  4. Between $10.9$ and $11.3$
  5. Between $10.8$ and $11.2$

Think: For each of these five regions, count how many quarters of the data fall in that region. Remember that each quarter (the interval between two neighbouring values of the five number summary) contains $25\%$ of the scores.

Do:

  1. $50\%$ of scores lie between $Q_1$ and $Q_3$.
  2. $25\%$ of scores lie between the lowest score and $Q_1$.
  3. $50\%$ of scores lie between the median and the highest score.
  4. $75\%$ of scores lie between $Q_1$ and the highest score.
  5. $75\%$ of scores lie between the lowest score and $Q_3$.
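The counting of quarters can be sketched in Python. The helper name `percent_between` is made up for illustration, and the five number summary below is an assumption inferred from the regions and answers above (lowest score $10.8$, $Q_1=10.9$, median $11.1$, $Q_3=11.2$, highest score $11.3$). The sketch only handles regions whose endpoints are values of the five number summary.

```python
def percent_between(summary, low, high):
    """Percentage of scores between two values of a five number summary.

    Each step from one summary value to the next covers 25% of the scores.
    """
    start = summary.index(low)
    end = summary.index(high)
    return 25 * (end - start)

# Assumed summary, inferred from the worked answers: (min, Q1, median, Q3, max).
summary = (10.8, 10.9, 11.1, 11.2, 11.3)
regions = [(10.9, 11.2), (10.8, 10.9), (11.1, 11.3), (10.9, 11.3), (10.8, 11.2)]

results = [percent_between(summary, low, high) for low, high in regions]
print(results)  # [50, 25, 50, 75, 75]
```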

 

Practice question

Question 2

 

Outcomes

U1.AoS1.5

construct and interpret graphical displays of data, and describe the distributions of the variables involved and interpret in the context of the data
