The **range** is a **measure of spread** based on the minimum and maximum of a data set, but it does not tell us about the spread of the data that falls between these two values. To find the range of a data set, subtract the smallest value from the largest value.
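As a minimal sketch of this calculation, the short Python function below computes the range. The name `data_range` and both data sets are made up for illustration and do not come from the lesson. The two sets share the same range even though their middle values are spread very differently, which is the limitation described above.

```python
def data_range(values):
    """Return the range: the largest value minus the smallest value."""
    return max(values) - min(values)


# Hypothetical data sets (assumed for illustration only)
clustered_low = [1, 2, 2, 3, 20]
spread_out = [1, 10, 11, 12, 20]

print(data_range(clustered_low))  # 20 - 1 = 19
print(data_range(spread_out))     # 20 - 1 = 19
```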

The **median** is a **measure of center**, and tells us where the middle of the data set is.

Investigate how to divide the data into various percentages by checking the boxes. To investigate with different sets of data, click the "New data" button.

What percentage of the data lies below Q_1? Hence, what does Q_1 represent?

What percentage of the data lies below the median? Hence, what does the median represent?

What percentage of the data lies below Q_3? Hence, what does Q_3 represent?

What percentage of the data lies between the minimum value and the maximum value?

Which section of the data is the most spread out?

- From the minimum to Q_1
- From Q_1 to the median
- From the median to Q_3
- From Q_3 to the maximum

To get a better picture of the internal spread in a data set, we can find the **quartiles** of a data set. Quartiles are scores at particular locations in the data set. Instead of dividing a data set into halves, like the median, they divide a data set into 4 quarters, where each quarter contains the same number of data values.

Let's look at how we would divide up a data set into quarters. When runners train for a marathon, they gradually increase the number of miles they run in the months before the marathon. This data set represents the number of miles Alessia ran each week of training.

Now there are four values in each half of the data set, so we will split each group of four values in half to find the quartiles.

We can now summarize the data by looking at these five critical points:

- **Lower extreme** (minimum): lowest value
- **Lower quartile** (Q_1): about 25\% of the data is below this value
- Median (Q_2): about 50\% of the data lies on either side of this value
- **Upper quartile** (Q_3): about 25\% of the data is above this value
- **Upper extreme** (maximum): highest value

Lower extreme | 1 |
---|---|
Lower quartile | 3.5 |
Median | 9 |
Upper quartile | 13 |
Upper extreme | 19 |

These values, known as the five-number summary, can be easily displayed in a **boxplot** or box-and-whisker plot.

The number line at the bottom helps us read the values in the boxplot. Above that, you will see that there are two lines or "whiskers" that extend from the box outwards. The box and the whiskers help us easily identify the four different quartiles. Each quartile represents approximately 25\% of the data set.
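Because each section of a boxplot holds roughly a quarter of the data, the approximate percentage between any two of the five summary points can be found by counting the sections between them. The sketch below illustrates that counting. The labels in `POINTS` and the helper `percent_between` are hypothetical names chosen for this example.

```python
# The five summary points in order. Each adjacent pair bounds about 25% of the data.
POINTS = ["minimum", "Q1", "median", "Q3", "maximum"]


def percent_between(start, end):
    """Approximate percentage of the data lying between two summary points."""
    quarters = abs(POINTS.index(end) - POINTS.index(start))
    return 25 * quarters


print(percent_between("Q1", "Q3"))             # the box: about 50
print(percent_between("minimum", "Q3"))        # about 75
print(percent_between("minimum", "maximum"))   # all of the data: 100
```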

The **interquartile range** (IQR) is the difference between the third quartile and the first quartile. About 50\% of the values lie between these two quartiles.

Since it focuses on the middle 50\% of the data set, the interquartile range often gives a better indication of the internal spread than the range does, and it is less affected by outliers.

IQR = \text{Upper quartile} - \text{Lower quartile}

In the previous example, the IQR of Alessia's training data is 13-3.5 = 9.5. This tells us that the middle 50\% of the data spans 9.5 miles.
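The sketch below repeats this arithmetic in Python using the five-number summary from the table, and compares the IQR with the range. The dictionary keys and the function name `spread` are assumptions made for this example. The second summary is hypothetical: it imagines one unusually long run of 40 miles replacing the maximum, to show that the range changes while the IQR does not.

```python
def spread(summary):
    """Return (range, IQR) for a five-number summary stored in a dict."""
    data_range = summary["max"] - summary["min"]
    iqr = summary["q3"] - summary["q1"]
    return data_range, iqr


# Five-number summary of Alessia's training data, taken from the table
alessia = {"min": 1, "q1": 3.5, "median": 9, "q3": 13, "max": 19}
print(spread(alessia))  # (18, 9.5)

# Hypothetical variation: the largest value becomes an outlier of 40 miles
with_outlier = dict(alessia, max=40)
print(spread(with_outlier))  # (39, 9.5)
```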

To create a boxplot:

Put the data in ascending order (from smallest to largest).

Find the median (middle value) of the data.

To divide the data into quarters, find the middle value of the lower half (between the minimum value and the median) and the middle value of the upper half (between the median and the maximum value). These are the lower and upper quartiles.
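The steps above can be sketched in Python as follows. This is one possible implementation, not a definitive one. It assumes the quartiles are the medians of the lower and upper halves, and that the median itself is left out of both halves when the number of values is odd. Statistical software may use other quartile conventions and return slightly different results. The function names and the sample data are hypothetical.

```python
def median(sorted_values):
    """Middle value of an already sorted list (mean of the two middle values if even)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    data = sorted(values)             # Step 1: ascending order
    n = len(data)
    lower_half = data[: n // 2]       # values below the median position
    upper_half = data[(n + 1) // 2:]  # values above the median position
    return (
        data[0],
        median(lower_half),           # Step 3: lower quartile
        median(data),                 # Step 2: median
        median(upper_half),           # Step 3: upper quartile
        data[-1],
    )


# Hypothetical data set with 8 values (assumed for illustration only)
print(five_number_summary([12, 3, 7, 9, 15, 5, 10, 1]))
# (1, 4.0, 8.0, 11.0, 15)
```

With an odd number of values, such as `[2, 4, 6, 8, 10, 12, 14]`, the middle value 8 is the median and is excluded from both halves, giving `(2, 4, 8, 12, 14)`.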

When working through the data cycle, boxplots are a useful tool for answering statistical questions related to the spread of the data.

For the following boxplot:

a. Find the lower extreme.

b. Find the upper extreme.

c. Find the range.

d. Find the median.

e. Find the interquartile range (IQR).

You have been asked to represent this data in a boxplot: 20,\,36,\,52,\,56,\,24,\,16,\,40,\,4,\,28

a. Complete the table for the given data.

Minimum | |
---|---|
Lower quartile | |
Median | |
Upper quartile | |
Maximum | |
Interquartile range | |

b. Construct a boxplot for the data.

The box-and-whisker plot represents the thickness of the glass on various dining tables.

a. Which formulated question could be answered by analyzing the given boxplot?

A. How many dining room tables have a thickness of 11.1\text{ mm}?

B. How does the size of a dining room table affect the thickness of the glass?

C. What proportion of glass tables have a thickness of 11\text{ mm}?

D. What range of thickness is most common for the glass of dining room tables?

b. What percentage of values lie between:

- 10.9 and 11.2
- 10.8 and 10.9
- 11.1 and 11.3
- 10.9 and 11.3
- 10.8 and 11.2

c. In which quartile (or quartiles) is the data the most spread out?

Lucille wants to track the amount of time she spends on her phone or tablet outside of school. Her goal is to only spend one to two hours on her devices each day. The statistical question she writes for her study is "How does the amount of time I spend on my phone or tablet vary each day?"

a. Determine the data Lucille must collect to answer her statistical question.

b. Which method of data collection would lead to the least amount of statistical bias for Lucille's study?

A. Collecting data on the amount of time she spent on her phone and tablet in the past three weeks

B. Collecting data on the amount of time she spends on her phone and tablet over the next three weeks

c. According to Lucille's phone and tablet settings, the total amount of time she spent on her devices (in hours) each day over the past three weeks is shown:

\{3.2,\, 7.5,\, 6.1,\, 8.0,\, 1.8,\, 2.5,\, 4.8,\, 5.0,\, 3.2,\, 2.0,\, 0.5,\, 1.2,\, 2.8,\, 4.5,\, 3.6,\, 5.5,\, 7.1,\, 6.2,\, 4.2,\, 2.3\}

Represent the data in a boxplot.

d. How does the amount of time Lucille spends on her phone or tablet vary each day?

Idea summary

A list of the minimum, lower quartile, median, upper quartile, and maximum values is often called the **five-number summary**.

- The **lower extreme** (minimum) is the smallest value in the data set.
- The **lower quartile** (Q_{1} or the **first quartile**) is the middle score in the bottom half of the data set.
- The **median** (Q_{2} or the **second quartile**) is the middle value of a data set.
- The **upper quartile** (Q_{3} or the **third quartile**) is the middle score in the top half of the data set.
- The **upper extreme** (maximum) is the largest value in the data set.

Each of the four sections between these values contains about 25\% of the data set.

These features are shown in a boxplot:

Creating a boxplot:

Put the data in ascending order (from smallest to largest).

Find the median (middle value) of the data.

To divide the data into quarters, find the middle value of the lower half (between the minimum value and the median) and the middle value of the upper half (between the median and the maximum value). These are the lower and upper quartiles.

To calculate the interquartile range:

\displaystyle IQR=Q_{3}-Q_{1}

- \bm{IQR} is the interquartile range
- \bm{Q_{1}} is the first quartile
- \bm{Q_{3}} is the third quartile