iGCSE (2021 Edition)

18.09 Interquartile range

Lesson

Quartiles

When we find the median value of a data set, we are finding a central value with the property that there are as many data points below this value as there are above it. That is, we are finding a value that splits the data set into two equal parts.

We then extended this idea in order to define the quartiles. These are three numbers that split a data set into four equal parts - so the first quartile is the median of the lower half of the set, and the third quartile is the median of the upper half. (The second quartile is, of course, just the median of the whole set.)

Extending this idea further still leads to the concept of quantiles. If there are $N$ numbers in the data set and we want to divide the set into $k$ equal parts, the first quantile is a number that has, as nearly as possible, $\frac{N}{k}$ of the numbers below it in the ordered set, with the remaining data points above it. Each later quantile has a further $\frac{N}{k}$ of the numbers below it.

Most often we choose either $k=10$, in which case the resulting quantiles are called deciles, or $k=100$, in which case they are called percentiles.

Clearly, the median should be the same as the $50$th percentile, the first quartile should be the same as the $25$th percentile and the third quartile should be the same as the $75$th percentile. Similarly, for example, the $4$th decile is the same as the $40$th percentile. Thus, if we know how to calculate the percentiles, we automatically have a way of determining the quartiles and the deciles.

There are different methods for determining the percentiles of a data set, each giving slightly different results. The differences become negligible when the data sets are large.

The simplest method is the following: to find the $p$th percentile of a data set with $N$ elements, calculate $\frac{p}{100}\times N$. The smallest integer that is greater than or equal to the result is the rank of the number in the ordered data that will be taken to be the required percentile.

 

Worked example

Example 1

Find the $30$th percentile of the following set of nine numbers: $14,19,23,24,31,33,40,42,56$.

Think: Note that the data set is already arranged in ascending order. So the $30$th percentile can be found $\frac{30}{100}$ of the way along the data set. Remember that there are $N=9$ scores.

Do:

$\text{Position}=\frac{30}{100}\times 9=2.7$, so we are looking for the "$2.7$th" score.

The smallest integer greater than or equal to $2.7$ is $3$. So, we take the third score to be the $30$th percentile.

So, for this data set, the $30$th percentile is $23$.

Reflect: Note that the $25$th percentile would also be $23$ for this set of scores, because $\frac{25}{100}\times 9=2.25$ also rounds up to the third score. This happens because the data set is so small. If the data set were much larger, the two percentiles would likely be different.
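
The rule used in Example 1 can be sketched in a few lines of Python. This is only an illustration of the "smallest integer greater than or equal to" rule described above, not the only way to compute percentiles; the function name `nearest_rank_percentile` is made up for this sketch.

```python
import math


def nearest_rank_percentile(data, p):
    """Return the p-th percentile using the rank rule from this lesson.

    Assumes data is non-empty and 0 < p <= 100.
    """
    ordered = sorted(data)  # the rule only works on ordered data
    n = len(ordered)
    # Rank = smallest integer greater than or equal to (p / 100) * n.
    rank = max(1, math.ceil(p * n / 100))
    return ordered[rank - 1]  # ranks start at 1, list indices at 0


scores = [14, 19, 23, 24, 31, 33, 40, 42, 56]
print(nearest_rank_percentile(scores, 30))  # 23
print(nearest_rank_percentile(scores, 25))  # 23
```

Both calls return $23$, matching the worked example and the reflection that follows it.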

 

Note: In this course you may use your GDC to find the mean, median and quartiles of a discrete data set. 

 

Practice question

QUESTION 1

Consider the data set $9,5,6,3,9,8,4,2,3,2$.

  1. Calculate the mean to two decimal places.

  2. Calculate the median.

  3. Calculate the value of the first quartile.

  4. Calculate the value of the third quartile.

  5. Calculate the value of the $2$nd decile.

  6. Calculate the value of the $8$th decile.

  7. Calculate the value of the $43$rd percentile.

  8. Calculate the value of the $88$th percentile.

 

 

Interquartile range

Whilst the range is very simple to calculate, it is based on only two numbers in the data set, so it does not tell us about the spread of data between these two values. To get a better picture of the internal spread in a data set, it is often more useful to find the set's quartiles, from which the interquartile range (IQR) can be calculated.

Quartiles are scores at particular locations in the data set. They are similar to the median, but instead of dividing a data set into halves, they divide a data set into quarters. Let's look at how we would divide up some data sets into quarters now.

Careful!

Make sure the data set is ordered before finding the quartiles or the median.

 

Exploration

  • Here is a data set with $8$ scores:
$1 \quad 3 \quad 4 \quad 7 \quad 11 \quad 12 \quad 14 \quad 19$

First locate the median, between the $4$th and $5$th scores (a bar marks a position that falls between two scores):

$1 \quad 3 \quad 4 \quad 7 \quad \underset{\text{Median}}{\mid} \quad 11 \quad 12 \quad 14 \quad 19$

Now there are four scores in each half of the data set, so split each group of four scores in half to find the quartiles. We can see the first quartile, $Q_1$, is between the $2$nd and $3$rd scores. That is, there are two scores on either side of $Q_1$ in the lower half. Similarly, the third quartile, $Q_3$, is between the $6$th and $7$th scores:

$1 \quad 3 \quad \underset{Q_1}{\mid} \quad 4 \quad 7 \quad \underset{\text{Median}}{\mid} \quad 11 \quad 12 \quad \underset{Q_3}{\mid} \quad 14 \quad 19$

 

  • Now let's look at a situation with $9$ scores (a bar marks a position that falls between two scores):
$8 \quad 8 \quad \underset{Q_1}{\mid} \quad 10 \quad 11 \quad \underset{\text{Median}}{13} \quad 14 \quad 18 \quad \underset{Q_3}{\mid} \quad 22 \quad 25$

 

This time, the $5$th term is the median. There are four terms on either side of the median, just as in the set with eight scores. So $Q_1$ is still between the $2$nd and $3$rd scores, but $Q_3$ is now between the $7$th and $8$th scores, which are the $2$nd and $3$rd scores of the upper half.

 

  • Finally, let's look at a set with $10$ scores (a bar marks a position that falls between two scores):
$12 \quad 13 \quad \underset{Q_1}{14} \quad 19 \quad 19 \quad \underset{\text{Median}}{\mid} \quad 21 \quad 22 \quad \underset{Q_3}{22} \quad 28 \quad 30$

 

For this set, the median is between the $5$th and $6$th scores. This time, however, there are $5$ scores on either side of the median. So $Q_1$ is the $3$rd term and $Q_3$ is the $8$th term.
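
The three cases above follow one pattern: find the median, then find the median of each half, leaving the median itself out of the halves when the number of scores is odd. A minimal Python sketch of that pattern, assuming at least two scores, is shown below. The helper names `middle` and `quartiles` are made up for this illustration.

```python
def middle(ordered):
    """Median of an already ordered, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of middle two


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    if len(data) < 2:
        raise ValueError("need at least two scores")
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]                   # scores before the median position
    upper = ordered[len(ordered) - half:]    # scores after the median position
    return middle(lower), middle(ordered), middle(upper)


print(quartiles([1, 3, 4, 7, 11, 12, 14, 19]))              # (3.5, 9.0, 13.0)
print(quartiles([8, 8, 10, 11, 13, 14, 18, 22, 25]))        # (9.0, 13, 20.0)
print(quartiles([12, 13, 14, 19, 19, 21, 22, 22, 28, 30]))  # (14, 20.0, 22)
```

For the set with $10$ scores the sketch returns $Q_1=14$ and $Q_3=22$, the $3$rd and $8$th terms, as described above.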

 

Naming the quartiles

  • $Q_1$ is the first quartile (sometimes called the lower quartile). It is the middle score in the bottom half of the data set and it represents the $25$th percentile. About $25\%$ of scores are less than the lower quartile.
  • $Q_2$ is the second quartile, and is usually called the median, which we have already learnt about. It represents the $50$th percentile of the data set. About $50\%$ of scores are less than the median.
  • $Q_3$ is the third quartile (sometimes called the upper quartile). It is the middle score in the top half of the data set, and represents the $75$th percentile. About $75\%$ of scores are less than the upper quartile.

 

Calculating the interquartile range

The interquartile range (IQR) is the difference between the third quartile and the first quartile. About $50\%$ of the scores lie between $Q_1$ and $Q_3$, because this interval contains the quarter of the data between the first quartile and the median as well as the quarter between the median and the third quartile.

Since it focuses on the middle $50\%$ of the data set, the interquartile range often gives a better indication of the internal spread than the range does. It is also less affected by individual scores that are unusually high or low, known as outliers.

 

To calculate the interquartile range

Subtract the first quartile from the third quartile. That is,

$\text{IQR}=Q_3-Q_1$

Note: In this course you may use your GDC to calculate the quartiles and median of a data set.

Worked example

Example 2

Consider the following set of data: $1,1,3,5,7,9,9,10,15$.

(a) Identify the median.

Think: There are nine numbers in the set, so we can say that $n=9$. We can also see that the data set is already arranged in ascending order. We identify the median as the middle score either by the "cross-out" method or as the $\frac{n+1}{2}$th score.

Do:

$\text{Position of median}=\frac{9+1}{2}$ (substituting $n=9$ into $\frac{n+1}{2}$)

$=5$th score (simplifying the fraction)

Counting through the set to the $5$th score gives us $7$ as the median.

(b) Identify $Q_1$ (the lower quartile) and $Q_3$ (the upper quartile).

Think: We identify $Q_1$ and $Q_3$ as the middle scores in the lower and upper halves of the data set respectively. We can use the "cross-out" method, or any other method that we use to find the median, applied to just the lower or upper half of the data set.

Do: The lower half of the data set is all the scores to the left of the median, which is $1,1,3,5$. There are four scores here, so $n=4$. So we can find the position of $Q_1$ as follows:

$\text{Position of }Q_1=\frac{4+1}{2}$ (substituting $n=4$ into $\frac{n+1}{2}$)

$=2.5$th score (simplifying the fraction)

$Q_1$ is therefore the mean of the $2$nd and $3$rd scores. So we see that:

$Q_1=\frac{1+3}{2}$ (taking the average of the $2$nd and $3$rd scores)

$=2$ (simplifying the fraction)

The upper half of the data set is all the scores to the right of the median, which is $9,9,10,15$. Since there are also $n=4$ scores here, $Q_3$ will be the mean of the $2$nd and $3$rd scores in this upper half.

$Q_3=\frac{9+10}{2}$ (taking the average of the $2$nd and $3$rd scores in the upper half)

$=9.5$ (simplifying the fraction)

 

(c) Calculate the $\text{IQR}$ of the data set.

Think: Remember that $\text{IQR}=Q_3-Q_1$, and we just found $Q_1$ and $Q_3$.

Do:

$\text{IQR}=9.5-2$ (substituting $Q_3=9.5$ and $Q_1=2$ into the formula)

$=7.5$ (simplifying the subtraction)
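
As a quick check of Example 2, the same steps can be sketched in Python. The sketch uses `median` from Python's standard `statistics` module on each half of the ordered data; the function name `quartiles_and_iqr` is made up for this illustration, and it assumes the median-of-halves method used in this lesson (other tools, including some calculators, may use slightly different quartile rules).

```python
from statistics import median


def quartiles_and_iqr(data):
    """Return (Q1, Q3, IQR) using the median-of-halves method.

    Assumes data has at least two scores.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = median(ordered[:half])                  # middle of the lower half
    q3 = median(ordered[len(ordered) - half:])   # middle of the upper half
    return q1, q3, q3 - q1


data = [1, 1, 3, 5, 7, 9, 9, 10, 15]
print(median(data))             # 7
print(quartiles_and_iqr(data))  # (2.0, 9.5, 7.5)
```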

 

How do outliers affect the range and IQR?

Remember, the range only changes if the highest or lowest score in a data set is changed; otherwise it will remain the same. An outlier is always the highest or lowest score in a data set. Therefore including an outlier will increase the range.

The IQR does not use the highest or lowest score directly, so including an outlier will have little or no effect on the IQR. Any small change comes from the extra score shifting the positions of the quartiles.
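
To see how each measure responds in practice, the sketch below adds a made-up outlier of $100$ to the data from Example 2 and compares the range and the IQR before and after. The value $100$ and the function names are assumptions chosen for illustration, and the quartiles are found with the median-of-halves method from this lesson.

```python
from statistics import median


def data_range(data):
    """Range: highest score minus lowest score."""
    return max(data) - min(data)


def iqr(data):
    """IQR using the median-of-halves method (needs at least two scores)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    return median(ordered[len(ordered) - half:]) - median(ordered[:half])


scores = [1, 1, 3, 5, 7, 9, 9, 10, 15]
with_outlier = scores + [100]  # 100 is a hypothetical outlier

print(data_range(scores), iqr(scores))              # 14 7.5
print(data_range(with_outlier), iqr(with_outlier))  # 99 7
```

In this run the range jumps from $14$ to $99$, while the IQR moves only from $7.5$ to $7$. The small shift in the IQR happens because the extra score changes which scores fall in each half.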


Practice questions

Question 2

Answer the following, given this set of scores:

$33,38,50,12,33,48,41$

  1. Sort the scores in ascending order.

  2. Find the number of scores.

  3. Find the median.

  4. Find the first quartile of the set of scores.

  5. Find the third quartile of the set of scores.

  6. Find the interquartile range.

Question 3

The stem plot shows the number of hours students spent studying during an entire semester.

Stem | Leaf
$6$ | $2$ $7$
$7$ | $1$ $2$ $2$ $4$ $7$ $9$
$8$ | $0$ $1$ $2$ $5$ $7$
$9$ | $0$ $1$

Key: $5 \mid 2 = 52$
  1. Find the first quartile of the set of scores.

  2. Find the third quartile of the set of scores.

  3. Find the interquartile range.

 

Outcomes

0607C11.4B

Quartiles from lists of discrete data.

0607E11.4B

Quartiles from lists of discrete data.
