
7.03 Measures of centre

Lesson

Measures of central tendency attempt to summarise a set of data with a single value that describes the centre or middle of the scores.

The three main measures of central tendency are the mean, median, and mode. Deciding which one is best depends on other characteristics of the particular set of data, and we will look further into the suitability of the different measures in our lesson on describing distributions.

 

Measures of centre

Mean: Often referred to as the average, this is the sum of the scores divided by the number of scores.

Median: The middle value of an ordered set of data, or the value that separates the bottom half and top half of the scores.

Mode: The most frequently occurring value. For continuous data or data grouped in class intervals we talk about the modal class - the most frequently occurring class, rather than a mode.

 

Mean

The mean is described as the average of the numbers in a data set. It is defined as the sum of the scores divided by the number of scores.

We can use the interactive tool below to visualise the position of the mean for different data sets, and also how the mean changes as we move one of the scores around.

 

The symbol for the mean of a sample is $\overline{x}$, whilst the population mean is represented by the symbol $\mu$ (Greek letter 'mu'). We typically don't have data for every member of the population, so we usually don't know $\mu$ exactly, but we can estimate it by using the sample mean, $\overline{x}$, from a well-designed survey.

If certain scores are repeated, such as when information is given in a frequency table, then we can find the total sum of all scores by multiplying each unique score by its frequency, then adding them all up.

We summarise the calculation of the mean below.

Mean

The mean of a set of data is calculated by:

$\text{Mean}=\frac{\text{Total sum of all scores}}{\text{Number of scores}}$

If certain scores are repeated, then:

$\text{Total sum of all scores}=\text{sum of}\ \left(\text{Unique score}\times\text{Frequency}\right)$

Now let's look at a few examples of calculating the mean of different data sets.

 

Worked examples

Example 1

Find the mean from the data in the stem plot below.

Stem | Leaf
$2$ | $3$ $8$
$3$ | $1$ $1$ $1$
$4$ | $0$ $3$
$5$ | $0$ $3$ $8$ $8$
$6$ | $2$ $2$ $9$
$7$ | $1$ $8$
$8$ | $3$
$9$ | $0$ $0$ $1$

 

Think: We can find the mean by adding up all of the scores, then dividing the total by the number of scores.

Do:

$\text{Mean}=\frac{\text{Total of all scores}}{\text{Number of scores}}$
$=\frac{23+28+3\times31+40+43+50+53+2\times58+2\times62+69+71+78+83+2\times90+91}{20}$
$=\frac{1142}{20}$
$=57.1$
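
The calculation in Example 1 can be checked with a short Python sketch. The names `stem_plot` and `expand_stem_plot` are illustrative choices, not part of the lesson; the sketch assumes each stem is a tens digit and each leaf a units digit.

```python
# Stem plot from Example 1: each stem (tens digit) maps to its leaves (units digits).
stem_plot = {
    2: [3, 8],
    3: [1, 1, 1],
    4: [0, 3],
    5: [0, 3, 8, 8],
    6: [2, 2, 9],
    7: [1, 8],
    8: [3],
    9: [0, 0, 1],
}


def expand_stem_plot(plot):
    """Turn a stem plot into a flat list of scores (stem * 10 + leaf)."""
    return [stem * 10 + leaf for stem, leaves in plot.items() for leaf in leaves]


scores = expand_stem_plot(stem_plot)
total = sum(scores)   # total of all scores
count = len(scores)   # number of scores
mean = total / count

print(total, count, mean)  # 1142 20 57.1
```

Expanding the plot into a plain list first keeps the mean calculation identical to the formula: one sum, one count, one division.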

 

Example 2

A statistician has organised a set of data into the frequency table shown.

Score ($x$)   Frequency ($f$)
$44$   $8$
$46$   $10$
$48$   $6$
$50$   $18$
$52$   $5$

 

(a) Complete the frequency distribution table by adding a column showing the total sum for each unique score.

Think: For each unique score ($x$-value), multiply it by the number of times that score appears. In other words, multiply the unique score by its frequency $\left(f\right)$ to find the total sum for that score.

Do: So for a score of $44$, which occurred $8$ times, the total score is $44\times8=352$. Completing the entire column, we get the following table.

Score ($x$)   Frequency ($f$)   $fx$
$44$   $8$   $352$
$46$   $10$   $460$
$48$   $6$   $288$
$50$   $18$   $900$
$52$   $5$   $260$
Totals   $47$   $2260$

(b) Calculate the mean of this data set. Round your answer to two decimal places.

Think: We calculate the mean by dividing the sum of the scores (that is, the sum of all the $fx$ values) by the number of scores (the total frequency).

Do:

$\text{Mean}=\frac{\text{Total of all scores}}{\text{Number of scores}}$
$=\frac{2260}{47}$
$=48.09$ ($2$ d.p.)
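
As a sketch of the same frequency-table method in Python, the table from Example 2 can be stored as a dictionary. The names `frequency_table` and `mean_from_frequencies` are hypothetical and used only for illustration.

```python
# Frequency table from Example 2: score -> frequency.
frequency_table = {44: 8, 46: 10, 48: 6, 50: 18, 52: 5}


def mean_from_frequencies(table):
    """Mean = sum of (score * frequency) divided by the total frequency."""
    total_of_scores = sum(score * freq for score, freq in table.items())
    number_of_scores = sum(table.values())
    return total_of_scores / number_of_scores


mean = mean_from_frequencies(frequency_table)
print(round(mean, 2))  # 48.09
```

The two sums inside the function correspond to the two totals in the bottom row of the completed table: $2260$ for the $fx$ column and $47$ for the frequency column.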

 

Practice questions

Question 1

Find the mean of the following scores:

$8$, $15$, $6$, $27$, $3$.

Question 2

In each game of the season, a basketball team recorded the number of 'three-point shots' they scored. The results for the season are represented in the given dot plot.

  1. What was the total number of points scored from three-point shots during the season?

  2. What was the mean number of points scored from three-point shots each game? Round to two decimal places if necessary.

  3. What was the mean number of three-point shots per game this season? Round to two decimal places if necessary.

Question 3

The mean of $4$ scores is $21$. If three of the scores are $17$, $3$ and $8$, find the $4$th score, $x$.

  1. Enter each line of working as an equation.

 

Median

The median is one way of describing the middle or the centre of a data set using a single value. The median is the middle score in a data set.

The data must be ordered (usually in ascending order) before calculating the median.

Which term is in the middle?

Suppose we have five numbers in our data set: $4$, $11$, $15$, $20$ and $24$.

The median would be $15$ because it is the value right in the middle. There are two numbers on either side of it.

$4,11,\boxed{15},20,24$

If we have a larger data set, however, we may not be able to see straight away which term is in the middle. There are two methods we can use to help us work this out.

 

The "cross out" method

Exploration

Once a data set is ordered, we can cross out numbers in pairs (one high number and one low number) until there is only one number left. Let's check out this process using an example. Here is a data set with nine numbers: $1,1,3,5,7,9,9,10,15$.

  1. Check that the data is sorted in ascending order (i.e. in order from smallest to largest).

  2. Cross out the smallest and the largest number, which leaves $1,3,5,7,9,9,10$.

  3. Repeat step 2, working from the outside in and taking the smallest number and the largest number each time, until there is only one term left. In this example the last remaining term is $7$, so the median is $7$.

Note that this process will only leave one term if there is an odd number of terms to start with. If there is an even number of terms, the process will leave two terms instead (if you cross them all out, you've gone too far!). To find the median of a set with an even number of terms, we take the mean of these two remaining middle terms.
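
The "cross out" process can be sketched in Python as a loop that removes one low and one high score at a time. This is only an illustration: the function name `median_by_crossing_out` is an assumption, and the sketch assumes the list contains at least one score.

```python
def median_by_crossing_out(scores):
    """Cross out the smallest and largest remaining scores until one or two are left."""
    remaining = sorted(scores)  # the data must be in order first
    while len(remaining) > 2:
        remaining = remaining[1:-1]  # cross out one low and one high score
    # One score left (odd count) or two left (even count): take their mean.
    return sum(remaining) / len(remaining)


print(median_by_crossing_out([1, 1, 3, 5, 7, 9, 9, 10, 15]))  # 7.0
print(median_by_crossing_out([8, 12, 17, 20]))                # 14.5
```

Stopping the loop once two or fewer scores remain is what prevents "going too far" when the number of scores is even.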

 

The "counting terms" method

We can also work out which term will be the middle number by considering whether there is an odd or even number of scores, and then using a formula.

We summarise the formulas below.

Finding the median position

Let $n$ be the number of terms.

  • If $n$ is odd, then the median is the middle term, which is the $\frac{n+1}{2}$th term.
  • If $n$ is even, then the median is the average of the two middle terms, that being the $\frac{n}{2}$th and $\left(\frac{n}{2}+1\right)$th terms.

Exploration

Let's use the same set of nine numbers from the previous example, $1,1,3,5,7,9,9,10,15$. We can see that there is an odd number of scores, $n=9$, so the position of the median is:

$\text{Position of median}=\frac{9+1}{2}$ (using $\frac{n+1}{2}$)
$=5$th term (simplifying the fraction)

 

This means the fifth term will be the median: $1,1,3,5,\boxed{7},9,9,10,15$.

So again, we find that the median is $7$.

Let's now try this with an even number of terms. Here is a data set with four terms: $8,12,17,20$. This time, we have $n=4$. What would happen if we used the same procedure as above?

$\text{Position of median}=\frac{4+1}{2}$ (using $\frac{n+1}{2}$ again)
$=2.5$th term (simplifying the fraction)

 

What does the "$2.5$th term" mean? Just as with the "cross out" method, the $2.5$th term means the average (mean) of the $2$nd and $3$rd terms. This is why, when the number of scores, $n$, is even, we find the average of the $\frac{n}{2}$th and $\left(\frac{n}{2}+1\right)$th terms.

Again, remember that the data must be in order before counting along to the median position. So in this example, the median will be the average of $12$ and $17$.

$\text{Median}=\frac{12+17}{2}$ (taking the average of the $2$nd and $3$rd scores)
$=14.5$ (simplifying the fraction)
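
The "counting terms" method can also be written as a short Python sketch. The function name `median_by_position` is illustrative. Note that Python lists are indexed from $0$, so the $k$th term of the ordered data sits at index $k-1$.

```python
def median_by_position(scores):
    """Find the median using the position formulas for odd and even n."""
    ordered = sorted(scores)  # the data must be in order before counting
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the median is the (n + 1) / 2 th term.
        position = (n + 1) // 2
        return ordered[position - 1]
    # Even n: average the (n / 2)th and (n / 2 + 1)th terms.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median_by_position([1, 1, 3, 5, 7, 9, 9, 10, 15]))  # 7
print(median_by_position([8, 12, 17, 20]))                # 14.5
```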

Practice questions

Question 4

Consider the following scores:

$23,25,13,9,11,21,24,17,20$

  1. Sort the scores in ascending order.

  2. Calculate the median.

Question 5

Write down $4$ consecutive odd numbers whose median is $40$.

  1. Write all solutions on the same line separated by a comma.

Question 6

Determine the following using the bar graph:

[Bar graph of Frequency (axis ticks $5$ to $25$) against Score ($44$ to $48$)]

  1. The total number of scores.

  2. The median.

 

The mode

The mode is another measure of central tendency - that is, it's a third way of describing a value that represents the centre of the data set. The mode describes the most frequently occurring score. For continuous data or data grouped in class intervals we talk about the modal class - the most frequently occurring class, rather than a mode.

Let's say we ask $10$ people how many pets they have. $2$ people say no pets, $6$ people say one pet and $2$ people say they have two pets. What is the most common number of pets for people to have? In this case, the most common number is one pet, because the largest number of people, which was $6$, had one pet. So the mode of this data set is $1$.

Data can have more than one mode when several outcomes have the same highest frequency. When the data has two or more modes we refer to it as being multimodal, and if it has exactly two modes it is called bimodal.

Note: We can also refer to the general shape of the data as being bimodal if the data has two clear peaks. When talking about the general shape the peaks do not need to be of exactly the same height.
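
A minimal Python sketch of finding the mode, allowing for more than one mode, is given here. The function name `modes` and the second data set are hypothetical; the first data set is the pets survey described above.

```python
from collections import Counter


def modes(scores):
    """Return every score that shares the highest frequency, in sorted order."""
    counts = Counter(scores)
    highest = max(counts.values())
    return sorted(score for score, freq in counts.items() if freq == highest)


# Pets survey: 2 people with no pets, 6 with one pet, 2 with two pets.
pets = [0] * 2 + [1] * 6 + [2] * 2
print(modes(pets))  # [1]

# Hypothetical data in which two scores are equally common (bimodal).
print(modes([3, 3, 5, 5, 8]))  # [3, 5]
```

Returning a list rather than a single value means bimodal and multimodal data are reported in full instead of one mode being picked arbitrarily.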

 

Worked example

Example 3

A statistician organised a set of data into the frequency table shown below. Find the mode of the data.

Score ($x$)   Frequency ($f$)
$10$   $26$
$20$   $10$
$30$   $18$
$40$   $18$
$50$   $15$

Think: The mode is the score that occurs most frequently.

Do: The highest number in the frequency column is $26$. This corresponds to the score of $10$, and therefore the mode is $10$.

Reflect: At a glance, it may seem unusual that $10$ is the mode, since the mode measures central tendency, and $10$ is far from being the centre of the numbers that we saw between $10$ and $50$.

The mode measures central tendency, but a different kind of central tendency. It tells us where the data likes to "bunch up", which gives us an approximation for what score we're likely to draw if we sample from the data set.

 

Practice questions

Question 7

Find the mode of the following scores:

$8,18,5,2,2,10,8,5,14,14,8,8,10,18,14,5$

  1. Mode = $\editable{}$

Question 8

Find the mode from the histogram shown.

[Histogram of Frequency (axis ticks $5$ to $30$) against Scores ($68$ to $72$)]

 

Measures of centre for interval grouped data

 

For data grouped in intervals, such as continuous data, we cannot find the exact measures of centre as we do not have the individual scores. We can however find approximate measures by representing all scores in an interval by the class centre (midpoint) of the given interval. 

Worked example

Example 4

Estimate the mean for the data represented in the grouped frequency table:

Class   Frequency
$30-<40$   $12$
$40-<50$   $16$
$50-<60$   $25$
$60-<70$   $7$

Think: To estimate the mean for the data we first need to determine the class centres, which will be used to represent all the scores in a class. For instance, the class centre for the first interval is $\frac{30+40}{2}=35$.

We then use: $\text{Total sum of all scores}\approx\text{sum of}\ \left(\text{Class centre}\times\text{Frequency}\right)$

Do:

Class   Class centre   Frequency
$30-<40$   $35$   $12$
$40-<50$   $45$   $16$
$50-<60$   $55$   $25$
$60-<70$   $65$   $7$

$\text{Mean}=\frac{\text{Total sum of all scores}}{\text{Number of scores}}$
$\approx\frac{35\times12+45\times16+55\times25+65\times7}{60}$
$=49.5$

Thus, the mean for this data set is approximately $49.5$.
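
The class-centre estimate from Example 4 can be sketched in Python as follows. The names `classes` and `estimate_grouped_mean` are illustrative, and the frequencies are the ones used in the worked solution ($12$, $16$, $25$ and $7$).

```python
# Grouped data from Example 4: (lower bound, upper bound, frequency).
classes = [(30, 40, 12), (40, 50, 16), (50, 60, 25), (60, 70, 7)]


def estimate_grouped_mean(class_table):
    """Estimate the mean by treating every score as its class centre."""
    total = 0
    count = 0
    for lower, upper, freq in class_table:
        centre = (lower + upper) / 2  # class centre (midpoint)
        total += centre * freq        # approximate sum of scores in this class
        count += freq
    return total / count


print(estimate_grouped_mean(classes))  # 49.5
```

The result is only an estimate, because every score in a class is replaced by the class centre rather than its actual value.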

Practice questions

Question 9

Consider the table below.

Score   Frequency
$1$ - $4$   $2$
$5$ - $8$   $7$
$9$ - $12$   $15$
$13$ - $16$   $5$
$17$ - $20$   $1$
  1. Use the midpoint of each class interval to determine an estimate for the mean of the following sample distribution. Round your answer to one decimal place.

  2. Which is the modal group?

    A  $1$ - $4$

    B  $17$ - $20$

    C  $13$ - $16$

    D  $5$ - $8$

    E  $9$ - $12$

Question 10

Consider the table below.

Score ($x$)   Frequency
$0\le x<20$   $4$
$20\le x<40$   $15$
$40\le x<60$   $23$
$60\le x<80$   $73$
$80\le x<100$   $45$
  1. Use the midpoint of each class interval to determine an estimate for the mean of the following sample distribution. Round your answer to one decimal place.

  2. Which is the modal group?

    A  $0\le x<20$

    B  $60\le x<80$

    C  $20\le x<40$

    D  $40\le x<60$

    E  $80\le x<100$

 

Measures of centre using technology 

Throughout this chapter, and in particular for moderate to large data sets, you should use appropriate technology, such as a calculator or a statistics program on your computer.

Tips:

  • Familiarise yourself with the program and the types of calculations and graphs it is capable of creating.
  • Ensure settings are correct for the data given. This is particularly important when changing between data in a simple list and data in a frequency table.
  • Take note of the different symbols used for the different calculations we will encounter.

Select your brand of calculator below to work through an example of finding the measures of centre using technology.

Casio ClassPad

Calculator example coming soon.

TI Nspire

Calculator example coming soon.

Outcomes

2.3.1.7

determine the mean and standard deviation (using technology) of a dataset and use statistics as measures of location and spread of a data distribution, being aware of the significance of the size of the standard deviation
