9. Centre and Spread

Lesson

Measures of central tendency attempt to summarise a set of data with a single value that describes the centre or middle of the scores.

The three main measures of central tendency are the mean, median, and mode. Which one is best depends on other characteristics of the particular data set. We will look further at the suitability of the different measures, and at the effects of outliers, in our next lesson.

Measures of centre

**Mean:** Often referred to as the average, this is the sum of the scores divided by the number of scores.

**Median:** The middle value of an ordered set of data, or the value that separates the bottom half and top half of the scores.

**Mode:** The most frequently occurring value. For continuous data, or data grouped in class intervals, we talk about the modal class (the most frequently occurring class) rather than a mode.
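As a quick illustration of the three definitions above, here is a minimal sketch using Python's standard `statistics` module. The list `scores` is a hypothetical data set chosen only for this example; it does not come from the lesson.

```python
from statistics import mean, median, mode

# Hypothetical scores, chosen only to illustrate the three measures.
scores = [2, 3, 3, 5, 7]

print(mean(scores))    # sum of the scores (20) divided by the number of scores (5)
print(median(scores))  # middle value of the ordered scores
print(mode(scores))    # most frequently occurring value
```

For this list the mean is $4$, while the median and the mode are both $3$, so the three measures need not agree.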

The median is one way of describing the middle or the centre of a data set using a single value. The median is the **middle score** in a data set.

The data must be ordered (usually in ascending order) before calculating the median.

Suppose we have five numbers in our data set: $4$, $11$, $15$, $20$ and $24$.

The median would be $15$ because it is the value right in the middle. There are two numbers on either side of it.

$4,11,\boxed{15},20,24$

If we have a larger data set, however, we may not be able to see straight away which term is in the middle. There are two methods we can use to help us work this out.

Once a data set is ordered, we can cross out numbers in pairs (one high number and one low number) until there is only one number left. Let's check out this process using an example. Here is a data set with nine numbers: $1,1,3,5,7,9,9,10,15$.

- Check that the data is sorted in ascending order (i.e. in order from smallest to largest).

- Cross out the smallest and the largest number (here $1$ and $15$), leaving $1,3,5,7,9,9,10$.

- Repeat the previous step, working from the outside in and taking the smallest number and the largest number each time, until there is only one term left. We can see in this example that the median is $7$.

Note that this process will only leave one term if there is an odd number of terms to start with. If there is an even number of terms, this process will leave two terms instead. If you cross them all out, you've gone too far! To find the median of a set with an even number of terms, we take the mean of these two remaining middle terms.
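The cross-out process described above can be sketched in Python as follows. This is only an illustration: the function name `median_by_crossing_out` is made up for this example, and the nine scores are the ones this lesson uses for its worked example.

```python
def median_by_crossing_out(scores):
    """Find the median by repeatedly removing the smallest and largest scores."""
    if not scores:
        raise ValueError("the data set must contain at least one score")
    remaining = sorted(scores)        # the data must be in ascending order first
    while len(remaining) > 2:         # stop before everything is crossed out
        remaining = remaining[1:-1]   # cross out one low and one high number
    # One term is left for an odd count, two for an even count.
    # Taking the mean of what remains covers both cases.
    return sum(remaining) / len(remaining)


print(median_by_crossing_out([1, 1, 3, 5, 7, 9, 9, 10, 15]))  # 7.0
```

The loop stops once one or two terms remain, which is the code equivalent of not crossing out every number.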

We can also work out which term will be the middle number by considering whether there is an odd or even number of scores, and then using a formula.

We summarise the formulas below.

Finding the median position

Let $n$ be the number of terms.

- If $n$ is odd, then the median is the middle term, which is the $\frac{n+1}{2}$th term.
- If $n$ is even, then the median is the average of the two middle terms, that being the $\frac{n}{2}$th and $\left(\frac{n}{2}+1\right)$th terms.

Let's use the same set of nine numbers from the previous example, $1,1,3,5,7,9,9,10,15$. We can see that there is an odd number of scores, $n=9$, so the position of the median is:

$$\begin{aligned}
\text{Position of median} &= \frac{9+1}{2} && \text{(using } \frac{n+1}{2}\text{)} \\
&= 5\text{th term} && \text{(simplifying the fraction)}
\end{aligned}$$

This means the **fifth** term will be the median: $1,1,3,5,\boxed{7},9,9,10,15$.

So again, we find that the median is $7$.

Let's now try this with an even number of terms. Here is a data set with four terms: $8,12,17,20$. This time, we have $n=4$. What would happen if we used the same procedure as above?

$$\begin{aligned}
\text{Position of median} &= \frac{4+1}{2} && \text{(using } \frac{n+1}{2}\text{)} \\
&= 2.5\text{th term} && \text{(simplifying the fraction)}
\end{aligned}$$

What does the "$2.5$th term" mean? Well, just like when we used the "cross-out" method, the $2.5$th term means the average (mean) of the $2$nd and $3$rd terms. This is why, when the number of scores, $n$, is even, we find the average of the $\frac{n}{2}$th and $\left(\frac{n}{2}+1\right)$th terms.

Again, remember that the data must be in order before counting along to the median position. So in this example, the median will be the average of $12$ and $17$.

$$\begin{aligned}
\text{Median} &= \frac{12+17}{2} && \text{(taking the average of the } 2\text{nd and } 3\text{rd scores)} \\
&= 14.5 && \text{(simplifying the fraction)}
\end{aligned}$$
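The position rule and the two worked examples above can be checked with a short Python sketch. The function name `median_by_position` is a label invented for this illustration. Python lists count from $0$, so the sketch subtracts $1$ from each of the lesson's $1$-based term positions.

```python
def median_by_position(scores):
    """Find the median using the position formulas for odd and even n."""
    ordered = sorted(scores)              # order the data before counting along
    n = len(ordered)
    if n == 0:
        raise ValueError("the data set must contain at least one score")
    if n % 2 == 1:
        position = (n + 1) // 2           # the (n + 1)/2 th term
        return ordered[position - 1]      # shift to 0-based indexing
    lower = ordered[n // 2 - 1]           # the (n/2)th term
    upper = ordered[n // 2]               # the (n/2 + 1)th term
    return (lower + upper) / 2            # average of the two middle terms


print(median_by_position([1, 1, 3, 5, 7, 9, 9, 10, 15]))  # 7
print(median_by_position([8, 12, 17, 20]))                # 14.5
```

Both results match the values found by hand: $7$ for the nine-score set and $14.5$ for the four-score set.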

Consider the following scores:

$23,25,13,9,11,21,24,17,20$

Sort the scores in ascending order.

Calculate the median.

Write down $4$ consecutive odd numbers whose median is $40$.

Write all solutions on the same line separated by a comma.

Determine the following using the histogram:

The total number of scores.

The median.

The mode is another **measure of central tendency** - that is, it's a third way of describing a value that represents the centre of the data set. The **mode** describes the **most frequently occurring score**. For continuous data or data grouped in class intervals we talk about the **modal class** - the most frequently occurring class, rather than a mode.

Let's say we ask $10$ people how many pets they have. $2$ people say no pets, $6$ people say one pet and $2$ people say they have two pets. What is the most common number of pets for people to have? In this case, the most common number is **one** pet, because the largest number of people, which was $6$, had one pet. So the mode of this data set is $1$.

Data can have more than one mode when several outcomes have the same highest frequency. When the data has two or more modes we refer to it as being **multimodal** and if it has exactly two modes it is called bimodal.

**Note:** We can also refer to the general shape of the data as being bimodal if the data has two clear peaks. When talking about the general shape, the peaks do not need to be of exactly the same height.
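To see the pets example and the idea of more than one mode in code, here is a small sketch using `multimode` from Python's standard `statistics` module (available from Python 3.8). The `pets` list encodes the survey described above. The `two_modes` list is a hypothetical data set added only to show a bimodal result.

```python
from statistics import multimode

# Pets survey from the text: 2 people with no pets, 6 with one pet, 2 with two pets.
pets = [0] * 2 + [1] * 6 + [2] * 2
print(multimode(pets))       # [1]

# Hypothetical data set where two scores share the highest frequency.
two_modes = [4, 4, 6, 7, 7, 9]
print(multimode(two_modes))  # [4, 7]
```

`multimode` always returns a list, so a data set with a single mode gives a list with one entry.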

A statistician organised a set of data into the frequency table shown below. Find the mode of the data.

Score ($x$) | Frequency ($f$) |
---|---|
$10$ | $26$ |
$20$ | $10$ |
$30$ | $18$ |
$40$ | $18$ |
$50$ | $15$ |

**Think:** The mode is the score that occurs **most frequently**.

**Do:** The highest number in the frequency column is $26$. This corresponds to the score of $10$, and therefore the mode is $10$.

**Reflect:** At a glance, it may seem unusual that $10$ is the mode, since the mode measures central tendency, and $10$ is far from being the centre of the numbers that we saw between $10$ and $50$.

The mode measures central tendency, but a different kind of central tendency. It tells us where the data likes to "bunch up", which gives us an approximation for what score we're likely to draw if we sample from the data set.
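The worked example above can be reproduced with a short Python sketch. It assumes the frequency table is stored as a dictionary mapping each score to its frequency. The variable names are illustrative only.

```python
# Frequency table from the worked example: score -> frequency.
frequency_table = {10: 26, 20: 10, 30: 18, 40: 18, 50: 15}

# The mode is the score (or scores) with the highest frequency.
highest_frequency = max(frequency_table.values())
modes = [
    score
    for score, frequency in frequency_table.items()
    if frequency == highest_frequency
]

print(highest_frequency)  # 26
print(modes)              # [10]
```

Collecting the modes in a list means the same sketch would also report every mode if several scores tied for the highest frequency.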

Find the mode of the following scores:

$8,18,5,2,2,10,8,5,14,14,8,8,10,18,14,5$


Find the mode from the histogram shown.