
8.09 Box plots

Introduction

Box and whisker plots, or box plots, are a useful way to display numerical data because they show the quartiles of a data set at a glance. Values near the middle of a data set are often the most typical, and the "box" in a box and whisker plot shows the middle half of the data set. Let's look at how box plots give us a clear picture of a data set's central tendency and spread.

Box and whisker plots

We start with a number line that displays the values in our data set. Above it, two lines, or "whiskers," extend outward from the box. The end points of these whiskers show the maximum (upper extreme) and minimum (lower extreme) values in the data set.

A box plot with its different parts: minimum, lower quartile, median, upper quartile, and maximum.

The two vertical edges of the box show the quartiles of the data set. The left-hand side of the box is the lower quartile (Q1) and the right-hand side of the box is the upper quartile (Q3).

Remember that about 25\% of the data lies between the minimum value and the lower quartile, another 25\% between the lower quartile and the median, another 25\% between the median and the upper quartile, and the final 25\% between the upper quartile and the maximum value.

About 50\% of the values in a data set lie between Q1 and Q3, which is the box portion of the plot. This middle 50\% of the data is often thought of as the typical values of the data set.

Finally, the vertical line inside the box shows the median (the middle value), sometimes called Q2.

The box and whisker plot gives a compact summary of all this information.

We can also find the range of a data set, which is the distance between the minimum and maximum values, by subtracting the smallest value from the largest: \text{Range} = \text{Maximum} - \text{Minimum}.

Similarly, the interquartile range (IQR) is the distance between the lower and upper quartiles. To find the IQR, subtract the lower quartile from the upper quartile: \text{IQR} = Q3 - Q1.

A list of the minimum, lower quartile, median, upper quartile, and maximum values is often called the five number summary.
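If you want to check your work with a computer, the range and IQR are just two subtractions on the five number summary. Here is a minimal Python sketch; the class and property names are our own choices, not part of any library. It uses the values read from the box plot in Example 1.

```python
from dataclasses import dataclass


@dataclass
class FiveNumberSummary:
    """The five values a box plot is drawn from."""
    minimum: float
    lower_quartile: float   # Q1
    median: float           # Q2
    upper_quartile: float   # Q3
    maximum: float

    @property
    def data_range(self) -> float:
        # Range: largest value minus smallest value
        return self.maximum - self.minimum

    @property
    def iqr(self) -> float:
        # Interquartile range: upper quartile minus lower quartile
        return self.upper_quartile - self.lower_quartile


# Values read from the box plot in Example 1
summary = FiveNumberSummary(3, 7, 10, 15, 18)
print(summary.data_range)  # 15
print(summary.iqr)         # 8
```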

Creating a box plot:

  1. Put the data in ascending order (from smallest to largest).

  2. Find the median (middle value) of the data.

  3. To divide the data into quarters, find the median of the values below the median (this is the lower quartile, Q1) and the median of the values above the median (this is the upper quartile, Q3).
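The three steps above can be sketched in Python as follows. This sketch makes one assumption: when there is an odd number of values, the median is left out of both halves before the quartiles are found, as in Example 2. Other textbooks, calculators, and software use slightly different quartile rules, so they may give slightly different quartiles for the same data. The function names are hypothetical.

```python
def median(sorted_values):
    """Middle value of a sorted list; the mean of the two middle values if the count is even."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    data = sorted(values)          # Step 1: put the data in ascending order
    n = len(data)
    q2 = median(data)              # Step 2: find the median of all the data
    half = n // 2                  # size of each half; drops the middle value when n is odd
    q1 = median(data[:half])       # Step 3: median of the lower half...
    q3 = median(data[n - half:])   # ...and median of the upper half
    return data[0], q1, q2, q3, data[-1]


print(five_number_summary([20, 36, 52, 56, 24, 16, 40, 4, 28]))
# (4, 18.0, 28, 46.0, 56)
```

The output matches the table in Example 2: minimum 4, lower quartile 18, median 28, upper quartile 46, and maximum 56.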

If there are lots of values in a data set, it may be easier to work out which positions in the ordered list hold the median and the upper and lower quartiles, rather than counting in from each end.
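For a large ordered data set with n values, the median is at position (n+1)/2, and each quartile is at the middle position of its half. A minimal Python sketch, assuming the same rule as in Example 2 (the median is left out of both halves when n is odd); the function names are hypothetical:

```python
def summary_positions(n):
    """1-based positions of Q1, the median, and Q3 in an ordered list of n values.

    A position ending in .5 means "average the two values on either side".
    Assumes the median is left out of both halves when n is odd.
    """
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2                           # number of values in each half
    median_pos = (n + 1) / 2
    q1_pos = (half + 1) / 2                 # middle of positions 1 .. half
    q3_pos = (n - half) + (half + 1) / 2    # middle of positions n - half + 1 .. n
    return q1_pos, median_pos, q3_pos


def value_at(sorted_values, pos):
    """Read the value at a 1-based position, averaging two neighbors for a .5 position."""
    i = int(pos)
    if pos == i:
        return sorted_values[i - 1]
    return (sorted_values[i - 1] + sorted_values[i]) / 2


print(summary_positions(9))     # (2.5, 5.0, 7.5)
print(summary_positions(100))   # (25.5, 50.5, 75.5)

ordered = [4, 16, 20, 24, 28, 36, 40, 52, 56]   # Example 2, already in order
print([value_at(ordered, p) for p in summary_positions(len(ordered))])  # [18.0, 28, 46.0]
```

For the 9 values in Example 2, the quartiles sit at positions 2.5 and 7.5, so each one is the average of two neighboring values.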

Examples

Example 1

For the following box plot:

[Box plot drawn above a number line from 0 to 20, marked in steps of 2.]
a

Find the lowest value.

Worked Solution
Create a strategy

The lowest value is at the end of the left whisker.

Apply the idea

\text{Lowest value}=3

b

Find the highest value.

Worked Solution
Create a strategy

The highest value is at the end of the right whisker.

Apply the idea

\text{Highest value}=18

c

Find the range.

Worked Solution
Create a strategy

The range is the difference between the highest value and the lowest value.

Apply the idea
\begin{aligned}
\text{Range} &= 18-3 && \text{Find the difference of the values} \\
&= 15 && \text{Evaluate the subtraction}
\end{aligned}
d

Find the median.

Worked Solution
Create a strategy

The median is marked by the vertical line inside the box, between the lower and upper quartiles.

Apply the idea

\text{Median}=10

e

Find the interquartile range (\text{IQR}).

Worked Solution
Create a strategy

The interquartile range (\text{IQR}) is the difference between the upper quartile and the lower quartile.

Apply the idea
\begin{aligned}
\text{Interquartile range (IQR)} &= 15-7 && \text{Find the difference between the quartiles} \\
&= 8 && \text{Evaluate the subtraction}
\end{aligned}

Example 2

You have been asked to represent this data in a box plot: 20,\,36,\,52,\,56,\,24,\,16,\,40,\,4,\,28

a

Complete the table for the given data.

Minimum: ____
Lower quartile: ____
Median: ____
Upper quartile: ____
Maximum: ____
Interquartile range: ____
Worked Solution
Create a strategy

To find the minimum, median, and maximum, order the numbers from smallest to largest and find the first, middle, and last values. Then find the quartiles.

Apply the idea

To find the minimum, median, and maximum, order the values: 4,\,16,\,20,\,24,\,28,\,36,\,40,\,52,\,56

\text{Minimum}=4

\text{Maximum}=56

The middle value is: 28

\text{Median}=28

To find the lower quartile, find the median of the lower half of the values: 4,\,16,\,20,\,24

The middle values are: 16,\,20

\begin{aligned}
\text{Lower quartile} &= \dfrac{16+20}{2} && \text{Find the average of the middle values} \\
&= \dfrac{36}{2} && \text{Evaluate the addition} \\
&= 18 && \text{Evaluate the division}
\end{aligned}

To find the upper quartile, find the median of the upper half of the values: 36,\,40,\,52,\,56

The middle values are: 40,\,52

\begin{aligned}
\text{Upper quartile} &= \dfrac{40+52}{2} && \text{Find the average of the middle values} \\
&= \dfrac{92}{2} && \text{Evaluate the addition} \\
&= 46 && \text{Evaluate the division}
\end{aligned}
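The interquartile range is the upper quartile minus the lower quartile: \text{IQR} = 46 - 18 = 28

Minimum: 4
Lower quartile: 18
Median: 28
Upper quartile: 46
Maximum: 56
Interquartile range: 28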
b

Construct a box plot for the data.

Worked Solution
Create a strategy

Use the answer from part (a) to construct a box plot.

Apply the idea
[Box plot of the data drawn above a number line from 0 to 60, marked in steps of 10, using the values from part (a).]
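Without graph paper, a rough text version of a box plot can be drawn from the five number summary. The hypothetical Python helper below uses one character for every `step` units on a scale from 0 to `axis_max`, with | at the minimum, median, and maximum, [ and ] at the quartiles, dashes for the whiskers, and = inside the box. It is only a sketch and leaves out the number line labels.

```python
def text_box_plot(minimum, q1, median, q3, maximum, axis_max, step=1):
    """Draw a one-line text box plot on a scale from 0 to axis_max.

    Each character stands for `step` units (an integer). Markers:
    | at the minimum, median, and maximum; [ and ] at the quartiles;
    - for the whiskers; = inside the box.
    """
    if not 0 <= minimum <= q1 <= median <= q3 <= maximum <= axis_max:
        raise ValueError("values must be in order and lie between 0 and axis_max")

    width = axis_max // step + 1
    cells = [" "] * width

    def col(value):
        # Nearest character column (Python's round sends exact halves to the even column)
        return round(value / step)

    # Whiskers: dashes from the minimum to the maximum
    for c in range(col(minimum), col(maximum) + 1):
        cells[c] = "-"
    # Box: fill from Q1 to Q3, overwriting the dashes there
    for c in range(col(q1), col(q3) + 1):
        cells[c] = "="
    # End markers and the median line
    cells[col(minimum)] = "|"
    cells[col(maximum)] = "|"
    cells[col(q1)] = "["
    cells[col(q3)] = "]"
    cells[col(median)] = "|"
    return "".join(cells).rstrip()


# Five number summary from Example 2, part (a), on a 0 to 60 scale
print(text_box_plot(4, 18, 28, 46, 56, axis_max=60, step=2))
#   |------[====|========]----|
```

Each character covers 2 units here, so the box starts at column 9 (the value 18) and ends at column 23 (the value 46).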
Idea summary

The features of a box plot are shown below:

A box plot with its different parts: minimum, lower quartile, median, upper quartile, and maximum.

A list of the minimum, lower quartile, median, upper quartile, and maximum values is often called the five number summary.

Each quarter of the data, for example from the minimum to the lower quartile, represents about 25\% of the data set.

Creating a box plot:

  1. Put the data in ascending order (from smallest to largest).

  2. Find the median (middle value) of the data.

  3. To divide the data into quarters, find the median of the values below the median (this is the lower quartile, Q1) and the median of the values above the median (this is the upper quartile, Q3).

Outcomes

6.SP.B.4

Display numerical data in plots on a number line, including dot plots, histograms, and box plots.

6.SP.B.5

Summarize numerical data sets in relation to their context, such as by:

6.SP.B.5.A

Reporting the number of observations

6.SP.B.5.B

Describing the nature of the attribute under investigation, including how it was measured and its units of measurement.

6.SP.B.5.C

Giving quantitative measures of center (median and/or mean) and variability (interquartile range and/or mean absolute deviation), as well as describing any overall pattern and any striking deviations from the overall pattern with reference to the context in which the data was gathered.

6.SP.B.5.D

Relating the choice of measures of center and variability to the shape of the data distribution and the context in which the data was gathered
