# 10.03 Measuring the centre and spread

## Measures of centre

### Mean

The mean is often referred to as the average. To calculate the mean, add all the scores in a data set, then divide the total by the number of scores.

To find the mean from a graphical representation, we can use a frequency table to list the values shown on the graph. Consider the histogram below:

We can construct a frequency table like the one below:

| Score ($x$) | Frequency ($f$) | $xf$ |
|---|---|---|
| $1$ | $3$ | $3$ |
| $2$ | $8$ | $16$ |
| $3$ | $5$ | $15$ |
| $4$ | $3$ | $12$ |
| $5$ | $1$ | $5$ |

The mean will be calculated by dividing the sum of the last column by the sum of the second column, $\frac{51}{20}=2.55$.
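The same calculation can be sketched in Python. This is only an illustration: the dictionary layout and the function name `mean_from_frequencies` are assumptions made for the example, not part of the lesson.

```python
# Frequency table from the lesson, stored as {score: frequency}.
frequency_table = {1: 3, 2: 8, 3: 5, 4: 3, 5: 1}


def mean_from_frequencies(table):
    """Return (sum of x*f) / (sum of f) for a {score: frequency} table."""
    total_xf = sum(score * freq for score, freq in table.items())  # sum of the xf column
    total_f = sum(table.values())                                  # sum of the f column
    return total_xf / total_f


print(mean_from_frequencies(frequency_table))  # 2.55
```

Multiplying each score by its frequency avoids writing out all twenty individual scores.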

### Median

The median is one way of describing the middle or the centre of a data set using a single value. The median is the middle score in a data set.

Suppose we have five numbers in our data set: $4$, $11$, $15$, $20$ and $24$.

The median would be $15$ because it is the value right in the middle. There are two numbers on either side of it.

$4,11,\boxed{15},20,24$

If we have an even number of terms, we will need to find the average of the middle two terms. Suppose we wanted to find the median of the set $2,3,6,9$. We want the value halfway between $3$ and $6$. The average of $3$ and $6$ is $\frac{3+6}{2}=\frac{9}{2}$, or $4.5$, so the median is $4.5$.

$2,3,\boxed{4.5},6,9$
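The two cases (odd and even number of terms) can be sketched in Python as follows. The function name `median` is chosen for this example only, and the sketch sorts the data first because the middle position only makes sense for ordered scores.

```python
def median(scores):
    """Return the middle score, or the mean of the two middle scores."""
    if not scores:
        raise ValueError("median needs at least one score")
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of terms: a single middle score exists.
        return ordered[mid]
    # Even number of terms: average the two middle scores.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([4, 11, 15, 20, 24]))  # 15
print(median([2, 3, 6, 9]))         # 4.5
```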

If we have a larger data set, however, we may not be able to see right away which term is in the middle. We can use the "cross out" method.

### The "cross out" method

Once a data set is ordered, we can cross out numbers in pairs (one high number and one low number) until there is only one number left. Let's check out this process using an example. Here is a data set with nine numbers:

1. Check that the data is sorted in ascending order (i.e. in order from smallest to largest).

2. Cross out the smallest and the largest number, like so:

3. Repeat step 2, working from the outside in, taking the smallest number and the largest number each time until there is only one term left. We can see in this example that the median is $7$:

Note that this process will only leave one term if there is an odd number of terms to start with. If there is an even number of terms, this process will leave two terms instead. If you cross them all out, you've gone too far! To find the median of a set with an even number of terms, we can then take the mean of these two remaining middle terms.

The idea behind the cross out method can also be used with graphical representations by crossing off data points from each side.
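As a rough sketch, the cross out method can be written as a loop that removes one score from each end until one or two scores remain. The function name `cross_out_median` is hypothetical, and the nine-score data set used here is borrowed from the worked example later in this lesson.

```python
def cross_out_median(scores):
    """Cross out the smallest and largest scores in pairs, then average what is left."""
    if not scores:
        raise ValueError("need at least one score")
    remaining = sorted(scores)          # step 1: make sure the data is ordered
    while len(remaining) > 2:
        remaining = remaining[1:-1]     # steps 2 and 3: cross out one low and one high score
    # One score is left for an odd count, two for an even count.
    return sum(remaining) / len(remaining)


print(cross_out_median([1, 1, 3, 5, 7, 9, 9, 10, 15]))  # 7.0
print(cross_out_median([2, 3, 6, 9]))                    # 4.5
```

Stopping when two or fewer scores remain is what prevents "crossing them all out" for an even number of terms.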

### Mode

The mode describes the most frequently occurring score.

Suppose that $10$ people were asked how many pets they had. $2$ people said they didn't own any pets, $6$ people had one pet and $2$ people said they had two pets.

In this data set, the most common number of pets that people have is one pet, and so the mode of this data set is $1$.

A data set can have more than one mode, if two or more scores are equally tied as the most frequently occurring.
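A short Python sketch of finding the mode, including the tied case, is shown below. The function name `modes` and the second (tied) data set are made up for illustration; the pets data follows the example above.

```python
from collections import Counter


def modes(scores):
    """Return every score that occurs with the highest frequency."""
    counts = Counter(scores)                 # score -> how many times it occurs
    highest = max(counts.values())
    return sorted(score for score, count in counts.items() if count == highest)


# 2 people with no pets, 6 people with one pet, 2 people with two pets.
pets = [0] * 2 + [1] * 6 + [2] * 2
print(modes(pets))             # [1]

# Hypothetical data where two scores are tied as most frequent.
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```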

Measures of centre

Mean

• The numerical average of a data set: the sum of the data values divided by the number of data values
• Appropriate for sets of data where there are no values much higher or lower than those in the rest of the data set

Median

• The middle value of a data set ranked in order
• A good choice when data sets have a couple of values much higher or lower than most of the others

Mode

• The data value that occurs most frequently
• A good descriptor to use when the set of data has some identical values, when data is non-numeric (categorical) or when data reflects the most popular item

#### Practice questions

##### Question 1

Find the median of this set of scores:

$3$, $8$, $13$, $17$, $19$, $24$, $26$, $27$

##### Question 2

Select the data set from each of the options below that has:

1. The lowest mode.

   A. $87,2,20,20,8,10$

   B. $11,8,8,48,2,17$

2. The highest median.

   A. $2,8,11,17$

   B. $8,20,20,48,87$

##### Question 3

Consider the table below.

| Score | Frequency |
|---|---|
| $1$ - $4$ | $1$ |
| $5$ - $8$ | $5$ |
| $9$ - $12$ | $10$ |
| $13$ - $16$ | $5$ |
| $17$ - $20$ | $3$ |

1. Use the midpoint of each class interval to determine the mean of this sample distribution, correct to one decimal place.

2. Which is the modal group?

   A. $5$ - $8$

   B. $13$ - $16$

   C. $1$ - $4$

   D. $9$ - $12$

   E. $17$ - $20$

### Range

The range of a numerical data set is the difference between the smallest and largest scores in the set. The range is one type of measure of spread.

For example, at one school the ages of students in Year $7$ vary between $11$ and $14$. So the range for this set is $14-11=3$.

As a different example, if we looked at the ages of people waiting at a bus stop, the youngest person might be a $7$ year old and the oldest person might be a $90$ year old. The range of this set of data is $90-7=83$, which is a much larger range of ages.

The range of a numerical data set is given by:

$\text{Range}=\text{maximum score}-\text{minimum score}$
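The formula translates directly into code. In this sketch the function name `data_range` is an assumption, and the ages between the minimum and maximum are hypothetical values added only to make complete data sets; the lesson itself gives just the youngest and oldest ages.

```python
def data_range(scores):
    """Return maximum score minus minimum score."""
    return max(scores) - min(scores)


# Hypothetical Year 7 ages between 11 and 14.
year_7_ages = [11, 12, 12, 13, 13, 14]
# Hypothetical bus stop ages between 7 and 90.
bus_stop_ages = [7, 23, 35, 41, 68, 90]

print(data_range(year_7_ages))    # 3
print(data_range(bus_stop_ages))  # 83
```

Only the smallest and largest values affect the result, which is why the values in between could be chosen freely here.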

#### Practice question

##### Question 4

Find the range of the following set of scores:

$20,19,3,19,18,3,16,3$

### Interquartile range

Whilst the range is very simple to calculate, it is based on only two numbers in the data set, so it does not tell us about the spread of data between these two values. To get a better picture of the internal spread in a data set, it is often more useful to find the set's quartiles, from which the interquartile range (IQR) can be calculated.

Quartiles are scores at particular locations in the data set, similar to the median, but instead of dividing a data set into halves, they divide a data set into quarters. Let's look at how we would divide up some data sets into quarters now.

Careful!

Make sure the data set is ordered before finding the quartiles or the median.

#### Exploration

• Here is a data set with $8$ scores:

$\boxed{1}\ \boxed{3}\ \boxed{4}\ \boxed{7}\ \boxed{11}\ \boxed{12}\ \boxed{14}\ \boxed{19}$

First locate the median, between the $4$th and $5$th scores:

$\begin{array}{ccccccccc} & & & & \text{Median} & & & & \\ & & & & \downarrow & & & & \\ \boxed{1} & \boxed{3} & \boxed{4} & \boxed{7} & & \boxed{11} & \boxed{12} & \boxed{14} & \boxed{19} \end{array}$

Now there are four scores in each half of the data set, so split each group of four scores in half to find the quartiles. We can see the first quartile, $Q_1$, is between the $2$nd and $3$rd scores; that is, within the lower half there are two scores on either side of $Q_1$. Similarly, the third quartile, $Q_3$, is between the $6$th and $7$th scores:

$\begin{array}{ccccccccccc} & & Q_1 & & & \text{Median} & & & Q_3 & & \\ & & \downarrow & & & \downarrow & & & \downarrow & & \\ \boxed{1} & \boxed{3} & & \boxed{4} & \boxed{7} & & \boxed{11} & \boxed{12} & & \boxed{14} & \boxed{19} \end{array}$

• Now let's look at a situation with $9$ scores:

$\begin{array}{ccccccccccc} & & Q_1 & & & \text{Median} & & & Q_3 & & \\ & & \downarrow & & & \downarrow & & & \downarrow & & \\ \boxed{8} & \boxed{8} & & \boxed{10} & \boxed{11} & \boxed{13} & \boxed{14} & \boxed{18} & & \boxed{22} & \boxed{25} \end{array}$

This time, the $5$th term is the median. There are four terms on either side of the median, like for the set with eight scores. So $Q_1$ is still between the $2$nd and $3$rd scores, and $Q_3$ is between the $7$th and $8$th scores (the $2$nd and $3$rd scores of the upper half).

• Finally, let's look at a set with $10$ scores:

$\begin{array}{ccccccccccc} & & Q_1 & & & \text{Median} & & & Q_3 & & \\ & & \downarrow & & & \downarrow & & & \downarrow & & \\ \boxed{12} & \boxed{13} & \boxed{14} & \boxed{19} & \boxed{19} & & \boxed{21} & \boxed{22} & \boxed{22} & \boxed{28} & \boxed{30} \end{array}$

For this set, the median is between the $5$th and $6$th scores. This time, however, there are $5$ scores on either side of the median. So $Q_1$ is the $3$rd term and $Q_3$ is the $8$th term.
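The three explorations follow one rule: find the median, then find the median of each half, leaving the median itself out of both halves when the number of scores is odd. Below is a minimal Python sketch of that rule. The function names are chosen for this example, and other tools (spreadsheets, statistics libraries) may use different quartile conventions and give slightly different values.

```python
def median(ordered):
    """Median of an already ordered list."""
    if not ordered:
        raise ValueError("median needs at least one score")
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(scores):
    """Return (Q1, median, Q3) using the median-of-each-half method."""
    ordered = sorted(scores)            # quartiles only make sense for ordered data
    n = len(ordered)
    lower = ordered[: n // 2]           # scores below the median position
    upper = ordered[(n + 1) // 2 :]     # scores above it (skips the median if n is odd)
    return median(lower), median(ordered), median(upper)


print(quartiles([1, 3, 4, 7, 11, 12, 14, 19]))             # (3.5, 9.0, 13.0)
print(quartiles([8, 8, 10, 11, 13, 14, 18, 22, 25]))       # (9.0, 13, 20.0)
print(quartiles([12, 13, 14, 19, 19, 21, 22, 22, 28, 30])) # (14, 20.0, 22)
```

The sketch needs at least two scores, since each half must contain at least one score.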

### What do the quartiles represent?

The quartiles divide the data set into four sections, each holding approximately $25\%$ of the data. The lowest score to the first quartile is approximately $25\%$ of the data, the first quartile to the median is another $25\%$, the median to the third quartile is another $25\%$, and the third quartile to the highest score represents the last $25\%$ of the data. We can combine these sections together. For example, $50\%$ of the scores in a data set lie between the first and third quartiles.

These quartiles are sometimes referred to as percentiles. A percentile is a value below which a given percentage of the observations in a group fall. For example, if a score is at the $75$th percentile in a statistical test, it is higher than $75\%$ of all other scores. The median represents the $50$th percentile, or the halfway point in a data set.

### Naming the quartiles

• $Q_1$ is the first quartile (sometimes called the lower quartile). It is the middle score in the bottom half of data and it represents the $25$th percentile. $25\%$ of scores are less than the lower quartile.
• $Q_2$ is the second quartile, and is usually called the median, which we have already learnt about. It represents the $50$th percentile of the data set. $50\%$ of scores are less than the median.
• $Q_3$ is the third quartile (sometimes called the upper quartile). It is the middle score in the top half of the data set, and represents the $75$th percentile. $75\%$ of scores are less than the upper quartile.

### Calculating the interquartile range

The interquartile range (IQR) is the difference between the third quartile and the first quartile. $50\%$ of scores lie within the IQR because it contains the data between the first quartile and the median, as well as between the median and the third quartile.

Since it focuses on the middle $50\%$ of the data set, the interquartile range often gives a better indication of the internal spread than the range does, and it is less affected by outliers (individual scores that are unusually high or low).

To calculate the interquartile range

Subtract the first quartile from the third quartile. That is,

$\text{IQR}=Q_3-Q_1$

#### Worked example

##### Example 1

Consider the following set of data: $1,1,3,5,7,9,9,10,15$.

(a) Identify the median.

Think: There are nine numbers in the set, so we can say that $n=9$. We can also see that the data set is already arranged in ascending order. We identify the median as the middle score either by the "cross-out" method or as the $\frac{n+1}{2}$th score.

Do:

$\begin{aligned} \text{Position of median} &= \frac{9+1}{2} && \text{Substituting } n=9 \text{ into } \frac{n+1}{2} \\ &= 5\text{th score} && \text{Simplifying the fraction} \end{aligned}$

Counting through the set to the $5$th score gives us $7$ as the median.

(b) Identify $Q_1$ (the lower quartile) and $Q_3$ (the upper quartile).

Think: We identify $Q_1$ and $Q_3$ as the middle scores in the lower and upper halves of the data set respectively. We can use the "cross-out" method, or any other method for finding the median, applied to just the lower or upper half of the data set.

Do: The lower half of the data set is all the scores to the left of the median, which is $1,1,3,5$. There are four scores here, so $n=4$. So we can find the position of $Q_1$ as follows:

$\begin{aligned} \text{Position of } Q_1 &= \frac{4+1}{2} && \text{Substituting } n=4 \text{ into } \frac{n+1}{2} \\ &= 2.5\text{th score} && \text{Simplifying the fraction} \end{aligned}$

$Q_1$ is therefore the mean of the $2$nd and $3$rd scores. So we see that:

$\begin{aligned} Q_1 &= \frac{1+3}{2} && \text{Taking the average of the 2nd and 3rd scores} \\ &= 2 && \text{Simplifying the fraction} \end{aligned}$

The upper half of the data set is all the scores to the right of the median, which is $9,9,10,15$. Since there are also $n=4$ scores, $Q_3$ will be the mean of the $2$nd and $3$rd scores in this upper half.

$\begin{aligned} Q_3 &= \frac{9+10}{2} && \text{Taking the average of the 2nd and 3rd scores in the upper half} \\ &= 9.5 && \text{Simplifying the fraction} \end{aligned}$

(c) Calculate the $\text{IQR}$ of the data set.

Think: Remember that $\text{IQR}=Q_3-Q_1$, and we just found $Q_1$ and $Q_3$.

Do:

$\begin{aligned} \text{IQR} &= 9.5-2 && \text{Substituting } Q_1=2 \text{ and } Q_3=9.5 \text{ into the formula} \\ &= 7.5 && \text{Simplifying the subtraction} \end{aligned}$
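The worked example can be checked with a small Python sketch that uses the $\frac{n+1}{2}$ position rule directly. The helper names `score_at_position` and `middle_score` are hypothetical, introduced only for this illustration.

```python
def score_at_position(ordered, position):
    """Return the score at a 1-based position; a position ending in .5
    means the mean of the two neighbouring scores."""
    index = int(position) - 1                 # convert to a 0-based list index
    if position == int(position):
        return ordered[index]
    return (ordered[index] + ordered[index + 1]) / 2


def middle_score(ordered):
    """Apply the (n + 1) / 2 position rule to an ordered list."""
    return score_at_position(ordered, (len(ordered) + 1) / 2)


data = [1, 1, 3, 5, 7, 9, 9, 10, 15]          # already in ascending order
n = len(data)

median = middle_score(data)                   # 5th score
q1 = middle_score(data[: n // 2])             # lower half: 1, 1, 3, 5
q3 = middle_score(data[(n + 1) // 2 :])       # upper half: 9, 9, 10, 15
iqr = q3 - q1

print(median, q1, q3, iqr)  # 7 2.0 9.5 7.5
```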

### How do outliers affect the range and IQR?

Remember, the range only changes if the highest or lowest score in a data set is changed; otherwise it will remain the same. An outlier is always the highest or lowest score in a data set. Therefore including an outlier will increase the range.

The IQR does not use the highest or lowest score directly, so including an outlier will have little or no effect on the IQR.
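To see this numerically, the sketch below reuses the data set from Example 1 and introduces a hypothetical outlier of $50$ in two ways: replacing the highest score, and adding it as an extra score. The function names are assumptions for this example, and quartiles are found as the medians of the lower and upper halves.

```python
def median(ordered):
    """Median of an already ordered, non-empty list."""
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def spread(scores):
    """Return (range, IQR) for a data set with at least two scores."""
    ordered = sorted(scores)
    n = len(ordered)
    q1 = median(ordered[: n // 2])            # middle of the lower half
    q3 = median(ordered[(n + 1) // 2 :])      # middle of the upper half
    return ordered[-1] - ordered[0], q3 - q1


original = [1, 1, 3, 5, 7, 9, 9, 10, 15]
replaced = [1, 1, 3, 5, 7, 9, 9, 10, 50]      # highest score changed to 50
appended = original + [50]                    # 50 added as an extra score

print(spread(original))  # (14, 7.5)
print(spread(replaced))  # (49, 7.5)
print(spread(appended))  # (49, 7)
```

In both cases the range jumps from $14$ to $49$. The IQR is unchanged when the highest score is replaced, and shifts only slightly when an extra score is added, because the quartile positions move by half a place.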

#### Practice questions

##### Question 5

Answer the following, given this set of scores:

$33,38,50,12,33,48,41$

1. Sort the scores in ascending order.

2. Find the number of scores.

3. Find the median.

4. Find the first quartile of the set of scores.

5. Find the third quartile of the set of scores.

6. Find the interquartile range.

##### Question 6

The stem plot shows the number of hours students spent studying during an entire semester.

| Stem | Leaf |
|---|---|
| $6$ | $2$ $7$ |
| $7$ | $1$ $2$ $2$ $4$ $7$ $9$ |
| $8$ | $0$ $1$ $2$ $5$ $7$ |
| $9$ | $0$ $1$ |

Key: $5\mid 2=52$

1. Find the first quartile of the set of scores.

2. Find the third quartile of the set of scores.

3. Find the interquartile range.

##### Question 7

The column graph shows the number of pets that each student in a class owns.

1. Find the first quartile of the set of scores.

2. Find the third quartile of the set of scores.

3. Find the interquartile range.