
11.01 Measures of central tendency

Lesson

Measures of central tendency attempt to summarise a set of data with a single value that describes the centre or middle of the scores.

The main measures of central tendency are the mean, median, and mode. Which one is best to use depends on other characteristics of the particular data set, and we will look at this further in chapter 11.05.

 

The mean

The mean is often described as the average of the numbers in a data set. It is defined as the sum of the scores divided by the number of scores.

The symbol for the mean of a sample is $\overline{x}$, whilst the population mean is represented by the symbol $\mu$ (Greek letter 'mu').

To calculate the mean, we add up all the scores in a data set, then divide this total by the frequency (the number of scores).
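The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `mean` is our own choice, and the scores $4, 11, 15, 20, 24$ are simply sample data.

```python
def mean(scores):
    """Return the mean: the sum of the scores divided by the number of scores."""
    if not scores:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(scores) / len(scores)


# Sample data, used only for illustration.
data = [4, 11, 15, 20, 24]
print(mean(data))  # 74 / 5 = 14.8
```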

Hint

To find the sum of all the scores, we can either add up each individual score, or, if certain scores are repeated, we can add the products of the scores and their frequencies (that is, add the products $f\times x=fx$). We will see this in action in Example 2.
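As a quick check that the two ways of finding the total agree, the sketch below adds the products $fx$ for a small table of repeated scores and compares the result with adding every score individually. The helper name `sum_of_scores` is hypothetical, and the table uses the repeated scores from Example 1 ($31$ three times, $58$ and $90$ twice each).

```python
def sum_of_scores(freq_table):
    """Sum the scores by adding f * x for each distinct score x with frequency f."""
    return sum(f * x for x, f in freq_table.items())


# Repeated scores from Example 1, stored as score -> frequency.
repeated = {31: 3, 58: 2, 90: 2}

# The same scores written out one at a time.
individual = [31, 31, 31, 58, 58, 90, 90]

print(sum_of_scores(repeated))  # 389
print(sum(individual))          # 389
```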

Now let's have a go at calculating the mean of data sets ourselves.

 

Worked examples

Example 1

Find the mean from the data in the stem plot below:

Stem | Leaf
$2$ | $3$ $8$
$3$ | $1$ $1$ $1$
$4$ | $0$ $3$
$5$ | $0$ $3$ $8$ $8$
$6$ | $2$ $2$ $9$
$7$ | $1$ $8$
$8$ | $3$
$9$ | $0$ $0$ $1$

 

Think: We can find the mean by adding up all of the scores, then dividing the total by the number of scores.

Do:

$\text{Mean} = \frac{23+28+3\times31+40+43+50+53+2\times58+2\times62+69+71+78+83+2\times90+91}{20}$
$\phantom{\text{Mean}} = \frac{1142}{20}$
$\phantom{\text{Mean}} = 57.1$
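To check this working, the sketch below reads the stem plot back into a list of scores (each stem is the tens digit and each leaf a units digit, so stem $3$ with leaf $1$ is $31$) and then finds the mean. The dictionary layout and the name `scores_from_stem_plot` are illustrative choices, not a standard format.

```python
# Stem plot from Example 1: stem (tens digit) -> list of leaves (units digits).
stem_plot = {
    2: [3, 8],
    3: [1, 1, 1],
    4: [0, 3],
    5: [0, 3, 8, 8],
    6: [2, 2, 9],
    7: [1, 8],
    8: [3],
    9: [0, 0, 1],
}


def scores_from_stem_plot(plot):
    """Expand a stem plot with tens-digit stems into a list of scores."""
    return [10 * stem + leaf for stem, leaves in plot.items() for leaf in leaves]


scores = scores_from_stem_plot(stem_plot)
total = sum(scores)
print(len(scores), total, total / len(scores))  # 20 1142 57.1
```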

 

Example 2

A statistician has organised a set of data into the frequency table shown:

Score ($x$) | Frequency ($f$)
$44$ | $8$
$46$ | $10$
$48$ | $6$
$50$ | $18$
$52$ | $5$

 

(a) Complete the frequency distribution table by adding the $fx$ column:

Think: For each score ($x$ value), the value in the $fx$ column will be that score multiplied by its frequency $f$.

Score ($x$) | Frequency ($f$) | $fx$
$44$ | $8$ | $352$
$46$ | $10$ | $460$
$48$ | $6$ | $288$
$50$ | $18$ | $900$
$52$ | $5$ | $260$
Totals | $47$ | $2260$

(b) Calculate the mean of this data set, correct to two decimal places.

Think: We calculate the mean by dividing the sum of the scores (that is, the total $fx$) by the number of scores (the total $f$).

Do:

$\text{Mean} = \frac{2260}{47}$
$\phantom{\text{Mean}} = 48.0851\ldots$
$\phantom{\text{Mean}} = 48.09$ ($2$ d.p.)
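Both parts of Example 2 can be reproduced with a short sketch: build the $fx$ column, total the $f$ and $fx$ columns, then divide. Storing the table as a list of (score, frequency) pairs is just one convenient assumption.

```python
# Frequency table from Example 2 as (score x, frequency f) pairs.
table = [(44, 8), (46, 10), (48, 6), (50, 18), (52, 5)]

# (a) The fx column: each score multiplied by its frequency.
fx_column = [x * f for x, f in table]
total_f = sum(f for _, f in table)
total_fx = sum(fx_column)

# (b) Mean = total fx divided by total f.
mean = total_fx / total_f

print(fx_column)          # [352, 460, 288, 900, 260]
print(total_f, total_fx)  # 47 2260
print(round(mean, 2))     # 48.09
```

Python's built-in `round` rounds an exact half to the nearest even digit and works on binary floating-point values, so it can occasionally differ from rounding by hand; here it gives $48.09$ as expected.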

 

Practice questions

Question 1

Find the mean of the following scores:

$8$, $15$, $6$, $27$, $3$.

Question 2

The mean of $4$ scores is $21$. If three of the scores are $17$, $3$ and $8$, find the $4$th score, $x$.

  1. Enter each line of working as an equation.

 

The median

The median is another measure of central tendency. It is one way of describing a value that represents the middle or the centre of a data set. The median (which kind of sounds like medium) is the middle score in a data set. 

Remember!

The data must be ordered (usually in ascending order) before calculating the median.

 

Which term is in the middle?

Suppose we have five numbers in our data set: $4$, $11$, $15$, $20$ and $24$.

The median would be $15$ because it is the value right in the middle. There are two numbers on either side of it.

$4,11,\boxed{15},20,24$

If we have a larger data set, however, we may not be able to see straight away which term is in the middle. There are two methods we can use to help us work this out.

 

The "cross out" method

Once a data set is ordered, we can cross out numbers in pairs (one high number and one low number) until there is only one number left. Let's check out this process using an example. Here is a data set with nine numbers: $1,1,3,5,7,9,9,10,15$.

  1. Check that the data is sorted in ascending order (i.e. in order from smallest to largest).

  2. Cross out the smallest and the largest number. Here we cross out $1$ and $15$, leaving $1,3,5,7,9,9,10$.

  3. Repeat step 2, working from the outside in: cross out the smallest and the largest remaining number each time until there is only one term left. Crossing out $1$ and $10$, then $3$ and $9$, then $5$ and $9$ leaves $7$, so the median is $7$.

Note that this process will leave only one term if there is an odd number of terms to start with. If there is an even number of terms, it will leave two terms instead (if you cross them all out, you've gone too far!). To find the median of a set with an even number of terms, we then take the mean of these two remaining middle terms.
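The "cross out" method can be sketched as a loop that sorts the data, then removes one smallest and one largest score at a time until one or two scores remain. The function name is hypothetical, and the loop is written to mirror the hand method rather than to be the fastest way to find a median.

```python
def median_by_crossing_out(scores):
    """Find the median by repeatedly crossing out the smallest and largest scores."""
    remaining = sorted(scores)  # the data must be ordered first
    if not remaining:
        raise ValueError("the median of an empty data set is undefined")
    # Cross out one low and one high score at a time until 1 or 2 scores remain.
    while len(remaining) > 2:
        remaining = remaining[1:-1]
    # One score left (odd count): it is the median.
    # Two scores left (even count): the median is their mean.
    return sum(remaining) / len(remaining)


print(median_by_crossing_out([1, 1, 3, 5, 7, 9, 9, 10, 15]))  # 7.0
print(median_by_crossing_out([8, 12, 17, 20]))                # 14.5
```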

 

The "counting terms" method

We can also work out which term will be the middle number by using the following formula:

Finding the median position

Let $n$ be the number of terms. Then the middle term of the ordered data is the $\frac{n+1}{2}$th term.

So if we use the same set of nine numbers from the previous example, $1,1,3,5,7,9,9,10,15$, we see that:

$\text{Middle term} = \frac{9+1}{2}$
$\phantom{\text{Middle term}} = 5$

This means the fifth term will be the median: $1,1,3,5,\boxed{7},9,9,10,15$.

So again, we find that the median is $7$.

 

Let's now try this with an even number of terms. Here is a data set with four terms: $8,12,17,20$. This time, we get:

$\text{Middle term} = \frac{4+1}{2}$
$\phantom{\text{Middle term}} = 2.5\text{th term}$

What does the "$2.5$th term" mean? Well, just like when we used the "cross out" method, the $2.5$th term means the average (mean) of the second and third terms. Again, remember that the data must be in order before counting along to the median position. So in this example, the median will be the average of $12$ and $17$.

$\text{Median} = \frac{12+17}{2}$
$\phantom{\text{Median}} = 14.5$
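The "counting terms" method translates directly into code: sort the data, compute the position $\frac{n+1}{2}$, and either read off that term or, when the position ends in $.5$, average the terms on either side. The function names below are our own; Python's standard `statistics.median` function should give the same results and is a handy cross-check.

```python
def median_position(n):
    """Position of the middle term in an ordered data set of n terms: (n + 1) / 2."""
    return (n + 1) / 2


def median_by_position(scores):
    """Find the median by counting along to the (n + 1) / 2 th term."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    position = median_position(n)
    if position.is_integer():
        # Odd n: a single middle term (positions are counted from 1).
        return ordered[int(position) - 1]
    # Even n: the position ends in .5, so average the terms either side of it.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


print(median_position(9), median_by_position([1, 1, 3, 5, 7, 9, 9, 10, 15]))  # 5.0 7
print(median_position(4), median_by_position([8, 12, 17, 20]))                # 2.5 14.5
```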
 

Practice questions

Question 3

Consider the following scores:

$23,25,13,9,11,21,24,17,20$

  1. Sort the scores in ascending order.

  2. Calculate the median.

Question 4

Write down $4$ consecutive odd numbers whose median is $40$.

  1. Write all solutions on the same line separated by a comma.

Question 5

Determine the following using the histogram:


A bar graph that represents the distribution of scores. The x-axis is titled "Score" and ranges from $44$ to $48$, labelled in intervals of $1$. The y-axis is titled "Frequency" and ranges from $5$ to $25$, labelled in major intervals of $5$ and minor intervals of $1$. The height of the column for score $44$ is $20$. The height of the column for score $45$ is $17$. The height of the column for score $46$ is $6$. The height of the column for score $47$ is $11$. The height of the column for score $48$ is $13$. The frequency of each score is shown in the graph but not explicitly labelled.

 

  1. The total number of scores.

  2. The median.

Question 6

Find the median from the frequency distribution table:

Score | Frequency
$23$ | $2$
$24$ | $26$
$25$ | $37$
$26$ | $24$
$27$ | $25$

 

The mode

The mode is another measure of central tendency; that is, it's a third way of describing a value that represents the centre of the data set. The mode is the most frequently occurring score. The words "mode" and "most" start with the same two letters, which can help you remember this.

Let's say we ask $10$ people how many pets they have. $2$ people say they have no pets, $6$ people say they have one pet and $2$ people say they have two pets. What is the most common number of pets for people to have? In this case, the most common number is one pet, because the largest number of people $\left(\frac{6}{10}\right)$ had one pet. So the mode of this data set is $1$.
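Finding the mode amounts to counting how often each score occurs and picking the most frequent one. The sketch below uses `collections.Counter` from Python's standard library on the pets data. The lesson does not discuss ties, so returning every score that shares the highest count is an assumption made here for data sets with more than one mode.

```python
from collections import Counter


def modes(scores):
    """Return every score that occurs most frequently (there may be a tie)."""
    counts = Counter(scores)
    if not counts:
        raise ValueError("the mode of an empty data set is undefined")
    highest = max(counts.values())
    return sorted(score for score, count in counts.items() if count == highest)


# Pets example: 2 people with 0 pets, 6 with 1 pet, 2 with 2 pets.
pets = [0] * 2 + [1] * 6 + [2] * 2
print(modes(pets))  # [1]
```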

Remember!

The mode is the most frequently occurring score.

For data that has been grouped into class intervals, we no longer have a frequency for individual scores. Instead, we have the frequency for each class interval. In these situations, we call the class interval with the highest frequency the modal class. So for grouped data, the modal class is the equivalent of the mode.

Remember!

For grouped data, the modal class is the class interval with the highest frequency.

 

Worked example

Example 3

A statistician organised a set of data into the frequency table shown:

Score ($x$) | Frequency ($f$)
$10$ | $18$
$20$ | $10$
$30$ | $26$
$40$ | $18$
$50$ | $15$

Think: The mode is the score that occurs the most frequently.

Do: The highest number in the frequency column is $26$. This corresponds to the score of $30$, and therefore the mode is $30$.
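When the data is already in a frequency table, the mode is the entry with the largest frequency, which Python's built-in `max` can find by comparing frequencies. The same lookup gives the modal class when the table is keyed by class intervals instead of scores. The function name and the grouped intervals below are hypothetical; note that if two entries tie, `max` simply returns the first one it meets.

```python
def mode_from_table(freq_table):
    """Return the score (or class interval) with the highest frequency.

    If two entries tie for the highest frequency, max() returns the first one.
    """
    return max(freq_table, key=freq_table.get)


# Frequency table from Example 3: score -> frequency.
example_3 = {10: 18, 20: 10, 30: 26, 40: 18, 50: 15}
print(mode_from_table(example_3))  # 30

# The same lookup gives the modal class for grouped data.
# These class intervals and frequencies are hypothetical.
grouped = {"0-9": 4, "10-19": 11, "20-29": 7}
print(mode_from_table(grouped))  # 10-19
```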

 

Practice questions

Question 7

Find the mode of the following scores:

$8,18,5,2,2,10,8,5,14,14,8,8,10,18,14,5$


Question 8

Find the mode from the histogram shown.

Histogram of Frequency (vertical axis, marked from $5$ to $30$) against Scores (horizontal axis, $68$ to $72$).

Outcomes

MS11-7

develops and carries out simple statistical processes to answer questions posed
