The range is a measure of spread based on the minimum and maximum of a data set, but it does not tell us about the spread of the data that falls between these two values. To find the range of a data set, subtract the smallest value from the largest value.
The median is a measure of center, and tells us where the middle of the data set is.
Investigate how to divide the data into various percentages by checking the boxes. To investigate with different sets of data, click the "New data" button.
What percentage of the data lies below Q_1? Hence, what does Q_1 represent?
What percentage of the data lies below the median? Hence, what does the median represent?
What percentage of the data lies below Q_3? Hence, what does Q_3 represent?
What percentage of the data lies between the minimum value and the maximum value?
Which section of the data is the most spread out?
From the minimum to Q_1
From Q_1 to the median
From the median to Q_3
From Q_3 to the maximum
To get a better picture of the internal spread in a data set, we can find the quartiles of a data set. Quartiles are scores at particular locations in the data set. Instead of dividing a data set into halves, like the median, they divide a data set into 4 quarters, where each quarter contains the same number of data values.
Let's look at how we would divide up a data set into quarters. When runners train for a marathon, they gradually increase the number of miles they run in the months before the marathon. This data set represents the number of miles Alessia ran each week of training.
Now there are four values in each half of the data set, so we will split each group of four values in half to find the quartiles.
We can now summarize the data by looking at these five critical points:
Statistic | Value (miles) |
---|---|
Lower extreme | 1 |
Lower quartile | 3.5 |
Median | 9 |
Upper quartile | 13 |
Upper extreme | 19 |
These values, known as the five-number summary, can be easily displayed in a boxplot or box-and-whisker plot.
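The halving procedure described above can be sketched in Python. This is a minimal sketch, not the only way to compute quartiles: it assumes the convention of leaving the median out of both halves when the number of values is odd, and the function name `five_number_summary` is made up for illustration. The sample list is a hypothetical set of eight values chosen so that its summary matches the table above. It is not Alessia's actual training data.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) by taking medians of the halves.

    Assumption: when the number of values is odd, the middle value is left
    out of both halves. Needs at least two values.
    """
    data = sorted(values)            # ascending order
    n = len(data)
    lower = data[: n // 2]           # bottom half
    upper = data[n // 2 + n % 2 :]   # top half (skips the middle value if n is odd)
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Hypothetical data, chosen only to reproduce the summary in the table above.
sample = [1, 3, 4, 8, 10, 12, 14, 19]
print(five_number_summary(sample))
```

Other quartile conventions exist (statistical software often interpolates), so results from a calculator or spreadsheet may differ slightly from this method.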
The number line at the bottom helps us read the values in the boxplot. Above that, you will see that there are two lines or "whiskers" that extend from the box outwards. The box and the whiskers help us easily identify the four different quartiles. Each quartile represents approximately 25\% of the data set.
The interquartile range (IQR) is the difference between the third quartile and the first quartile. About 50\% of the scores lie between these two quartiles.
Since it focuses on the middle 50\% of the data set, the interquartile range often gives a better indication of the internal spread than the range does, and it is less affected by outliers.
IQR = \text{Upper quartile} - \text{Lower quartile}
In the previous example, the IQR of Alessia's training data is 13 - 3.5 = 9.5. This tells us that the middle 50\% of the data spans 9.5 miles.
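To see why the IQR is less affected by outliers than the range, the short sketch below compares both measures on two lists. The data are hypothetical (not Alessia's recorded miles), the helper names are invented for illustration, and the quartiles use the median-of-halves convention, leaving out the middle value when the count is odd.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the bottom and top halves."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]
    upper = data[n // 2 + n % 2 :]   # skips the middle value if n is odd
    return median(lower), median(upper)


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def iqr(values):
    """Upper quartile minus lower quartile."""
    q1, q3 = quartiles(values)
    return q3 - q1


# Hypothetical data: the second list swaps the largest value for an extreme one.
typical = [1, 3, 4, 8, 10, 12, 14, 19]
with_outlier = typical[:-1] + [60]

print("range:", value_range(typical), "->", value_range(with_outlier))
print("IQR:  ", iqr(typical), "->", iqr(with_outlier))
```

In this made-up case the range jumps from 18 to 59 while the IQR stays at 9.5, because the extreme value sits outside the middle half of the data.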
To create a boxplot:
Put the data in ascending order (from smallest to largest).
Find the median (middle value) of the data.
To divide the data into quarters, find the middle value between the minimum value and the median, as well as between the median and the maximum value.
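Once the five values are known, drawing the plot is a matter of placing them on a number line. The sketch below prints a rough text version of a boxplot, using the five-number summary from Alessia's example (1, 3.5, 9, 13, 19). The function name `ascii_boxplot` and the symbols used for the box and whiskers are arbitrary choices for illustration, not a standard notation.

```python
def ascii_boxplot(minimum, q1, median, q3, maximum, width=37):
    """Return a one-line text boxplot: |---[===M===]---|

    Assumption: maximum is greater than minimum, so the values can be
    scaled onto `width` character columns.
    """
    if maximum <= minimum:
        raise ValueError("maximum must be greater than minimum")
    span = maximum - minimum

    def col(x):
        # Scale a data value to a column index between 0 and width - 1.
        return round((x - minimum) * (width - 1) / span)

    row = ["-"] * width                    # whisker line from minimum to maximum
    for i in range(col(q1), col(q3) + 1):
        row[i] = "="                       # the box covers Q1 to Q3
    row[0] = row[-1] = "|"                 # lower and upper extremes
    row[col(q1)] = "["
    row[col(q3)] = "]"
    row[col(median)] = "M"                 # median marker inside the box
    return "".join(row)


plot = ascii_boxplot(1, 3.5, 9, 13, 19)
print(plot)
```

With the default width of 37 columns, each mile takes up two characters, so a longer whisker or box section shows directly that the data in that quarter are more spread out.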
When working through the data cycle, boxplots are a useful tool for answering statistical questions related to the spread of the data.
For the following boxplot:
Find the lower extreme.
Find the upper extreme.
Find the range.
Find the median.
Find the interquartile range (IQR).
You have been asked to represent this data in a boxplot: 20,\,36,\,52,\,56,\,24,\,16,\,40,\,4,\,28
Complete the table for the given data.
Statistic | Value |
---|---|
Minimum | |
Lower quartile | |
Median | |
Upper quartile | |
Maximum | |
Interquartile range | |
Construct a boxplot for the data.
The box-and-whisker plot represents the thickness of the glass on various dining tables.
Which formulated question could be answered by analyzing the given boxplot?
What percentage of values lie between:
10.9 and 11.2
10.8 and 10.9
11.1 and 11.3
10.9 and 11.3
10.8 and 11.2
In which quartile (or quartiles) is the data the most spread out?
Lucille wants to track the amount of time she spends on her phone or tablet outside of school. Her goal is to only spend one to two hours on her devices each day. The statistical question she writes for her study is "How does the amount of time I spend on my phone or tablet vary each day?"
Determine the data Lucille must collect to answer her statistical question.
Which method of data collection would lead to the least amount of statistical bias for Lucille's study?
According to Lucille's phone and tablet settings, the total amount of time she spent on her devices (in hours) each day over the past three weeks is shown: \{3.2,\, 7.5,\, 6.1,\, 8.0,\, 1.8,\, 2.5,\, 4.8,\, 5.0,\, 3.2,\, 2.0,\, 0.5,\, 1.2,\, 2.8,\, 4.5,\, 3.6,\, 5.5,\, 7.1,\, 6.2,\, 4.2,\, 2.3\}
Represent the data in a boxplot.
How does the amount of time Lucille spends on her phone or tablet vary each day?
A list of the minimum, lower quartile, median, upper quartile, and maximum values is often called the five-number summary.
The lower quartile (Q_{1} or the first quartile) is the middle score in the bottom half of the data set.
The median (Q_{2} or the second quartile) is the middle value of a data set.
The upper quartile (Q_{3} or the third quartile) is the middle score in the top half of the data set.
One quartile represents 25\% of the data set.
These features are shown in a boxplot:
Creating a boxplot:
Put the data in ascending order (from smallest to largest).
Find the median (middle value) of the data.
To divide the data into quarters, find the middle value between the minimum value and the median, as well as between the median and the maximum value.
To calculate the interquartile range:
IQR = \text{Upper quartile} - \text{Lower quartile}