Measures of spread describe whether the values in a numerical data set are similar and clustered together, or vary widely and are spread out. In this section, we will look at the range and the interquartile range as measures of spread.
The range is the simplest measure of spread in a numerical data set. It is the difference between the maximum and minimum values in a data set.
Two bus drivers, Kenji and Bjorn, track how many passengers board their buses each day for a week. Their results are displayed in this table:
Driver | M | T | W | T | F |
---|---|---|---|---|---|
Kenji | 10 | 13 | 14 | 16 | 11 |
Bjorn | 2 | 27 | 13 | 5 | 17 |
Both data sets have the same median and the same mean, but the sets are quite different. To calculate the range, we start by finding the greatest and least number of passengers for each driver:
Driver | Greatest | Least |
---|---|---|
Kenji | 16 | 10 |
Bjorn | 27 | 2 |
Now we subtract the least from the greatest to find the difference, which is the range:
Driver | Calculation | Range |
---|---|---|
Kenji | 16 - 10 | 6 |
Bjorn | 27 - 2 | 25 |
Notice how Kenji's range is quite small compared to Bjorn's. We might say that Kenji's route is more predictable and that Bjorn's route is much more variable. We can see that the range does not say anything about the size of the values, just their spread.
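The comparison above can be checked with a short Python sketch. It uses the passenger counts from the table; the dictionary name `passengers` and the helper `value_range` are illustrative names chosen here, not part of the lesson.

```python
from statistics import mean, median

# Daily passenger counts (Monday to Friday), copied from the table above.
passengers = {
    "Kenji": [10, 13, 14, 16, 11],
    "Bjorn": [2, 27, 13, 5, 17],
}

def value_range(values):
    """Return the range: the greatest value minus the least value."""
    return max(values) - min(values)

for driver, counts in passengers.items():
    print(driver,
          "median:", median(counts),
          "mean:", mean(counts),
          "range:", value_range(counts))
```

Both drivers share a median of 13 and a mean of 12.8, yet the ranges come out as 6 and 25, which is the difference the range is designed to expose.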
The range of a numerical data set is the difference between the greatest and the least value in the data set.
Which of the following data sets has the largest range?
To get a better picture of the internal spread in a data set, it is often more useful to find the set's quartiles, from which the interquartile range (IQR) can be calculated.
Quartiles are values at particular locations in the data set. They are similar to the median, but instead of dividing a data set into halves, they divide it into quarters. Let's look at how we would divide some data sets into quarters.
Make sure the data set is ordered before finding the quartiles or the median.
In a data set with 8 values, first locate the median, which lies between the $4\text{th}$ and $5\text{th}$ values:
Now there are four values in each half of the data set, so we split each group of four values in half to find the quartiles. The first quartile, $Q_{1}$, is between the $2\text{nd}$ and $3\text{rd}$ values; that is, there are two values of the lower half on either side of $Q_{1}$. Similarly, the third quartile, $Q_{3}$, is between the $6\text{th}$ and $7\text{th}$ values:
To find $Q_{1}$ for this data set, we find the mean of the $2\text{nd}$ and $3\text{rd}$ values, 3 and 4, which is 3.5. To find $Q_{3}$, we find the mean of the $6\text{th}$ and $7\text{th}$ values, 12 and 14, which is 13.
Now let's look at a data set with 9 values:
This time, the $5\text{th}$ value is the median, and there are four values on either side of it. The lower half is the $1\text{st}$ to $4\text{th}$ values, so $Q_{1}$ is between the $2\text{nd}$ and $3\text{rd}$ values. The upper half is the $6\text{th}$ to $9\text{th}$ values, so $Q_{3}$ is between the $7\text{th}$ and $8\text{th}$ values. Again, we would find the mean of the $2\text{nd}$ and $3\text{rd}$ values, and the mean of the $7\text{th}$ and $8\text{th}$ values, to find $Q_{1}$ and $Q_{3}$.
Finally, let's look at a set with 10 values:
For this set, the median is between the $5\text{th}$ and $6\text{th}$ values. This time, there are 5 values on either side of the median. So $Q_{1}$ is the $3\text{rd}$ value and $Q_{3}$ is the $8\text{th}$ value.
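The three cases above (8, 9, and 10 values) can be sketched in Python. This is a minimal illustration of the median-of-halves method described in this section, in which the median is left out of both halves when the number of values is odd. Other conventions for quartiles exist and may give different results. The function name `quartiles` is an illustrative choice, and the data sets are simply the numbers 1 to n, so that each result can be read as a position in the ordered set.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Assumes at least two values. For an odd count, the middle value
    is excluded from both the lower and the upper half.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]       # values below the median position
    upper = ordered[n - half:]   # values above the median position
    return median(lower), median(ordered), median(upper)

# With the values 1..n, a result of 2.5 means "between the 2nd and 3rd values".
for n in (8, 9, 10):
    print(n, "values:", quartiles(range(1, n + 1)))
```

For 8 values this prints `(2.5, 4.5, 6.5)`, for 9 values `(2.5, 5, 7.5)`, and for 10 values `(3, 5.5, 8)`.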
The quartiles divide the data set into four parts, each containing approximately $25\%$ of the values. The least value to the first quartile covers approximately $25\%$ of the data, the first quartile to the median another $25\%$, the median to the third quartile another $25\%$, and the third quartile to the greatest value the last $25\%$. We can combine these sections: for example, approximately $50\%$ of the values in a data set lie between the first and third quartiles.
$Q_{1}$ is the first quartile (sometimes called the lower quartile). It is the middle value in the bottom half of the data set.
$Q_{2}$ is the second quartile, and is usually called the median, which we have already learned about.
$Q_{3}$ is the third quartile (sometimes called the upper quartile). It is the middle value in the top half of the data set.
The interquartile range (IQR) is the difference between the third quartile and the first quartile: $\text{IQR} = Q_{3} - Q_{1}$. Approximately $50\%$ of the values lie between $Q_{1}$ and $Q_{3}$, because this interval covers the quarter of the data from the first quartile to the median and the quarter from the median to the third quartile. Since it focuses on the middle $50\%$ of the data set, the interquartile range often gives a better indication of the internal spread than the range does. It is also less affected by outliers, which are individual values that are unusually high or low.
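To illustrate why the interquartile range is less affected by outliers than the range, here is a small Python sketch. The two data sets are made up for this illustration and do not come from the lesson. They differ only in their greatest value. The helper `iqr` is an illustrative name and uses the median-of-halves method described in this section.

```python
from statistics import median

def iqr(values):
    """Return Q3 - Q1, with quartiles taken as the medians of the two halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])                  # median of the lower half
    q3 = median(ordered[len(ordered) - half:])   # median of the upper half
    return q3 - q1

# Hypothetical data: identical except that the last value becomes an outlier.
typical = [4, 5, 6, 7, 8, 9, 10, 11]
with_outlier = [4, 5, 6, 7, 8, 9, 10, 60]

for data in (typical, with_outlier):
    print("range:", max(data) - min(data), "IQR:", iqr(data))
```

The range jumps from 7 to 56 when the outlier is introduced, while the IQR stays at 4.0 in both cases.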
Consider the following set of values: $33,\,38,\,50,\,12,\,33,\,48,\,41$
Sort the values in ascending order.
Find the number of values.
Find the median.
Find the first quartile of the set of values.
Find the third quartile of the set of values.
Find the interquartile range.
Interquartile range is the difference between the third quartile and the first quartile.
To find the first quartile, find the median of the lower half of the ordered data set. To find the third quartile, find the median of the upper half. When the number of values is odd, the median itself is not included in either half.
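One way to check your answers to the steps above is the following Python sketch. It follows the same order as the questions and uses the median-of-halves method from this section. The variable names are illustrative.

```python
from statistics import median

values = [33, 38, 50, 12, 33, 48, 41]

ordered = sorted(values)               # ascending order
count = len(ordered)                   # number of values
middle = median(ordered)               # the median (the 4th of 7 values)

half = count // 2                      # size of each half, median excluded
q1 = median(ordered[:half])            # median of the lower half
q3 = median(ordered[count - half:])    # median of the upper half
iqr = q3 - q1                          # interquartile range

print("ordered:", ordered)
print("count:", count, "median:", middle)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

The ordered set is 12, 33, 33, 38, 41, 48, 50, which gives a median of 38, a first quartile of 33, a third quartile of 48, and an interquartile range of 15.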