
3.015 Analysing data

Lesson

Analysing data: measures of central tendency

Measures of central tendency, or measures of location, are statistical quantities that tell us where the middle of the scores is. There are three of these measures: the mean, the median and the mode. Each of them can be referred to as an "average" of the data set.

Mean

The mean is what we typically consider to be the average of all the scores.

You calculate the mean by adding up all the scores, then dividing the total by the number of scores.
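As a minimal sketch of this calculation in Python, the function below adds up the scores and divides by how many there are. The function name `mean_of_scores` and the scores themselves are hypothetical, chosen only for illustration.

```python
def mean_of_scores(scores):
    """Add up all the scores, then divide the total by the number of scores."""
    return sum(scores) / len(scores)


# Hypothetical scores, for illustration only.
example_scores = [4, 8, 6, 10, 2]

# (4 + 8 + 6 + 10 + 2) / 5 = 30 / 5
example_mean = mean_of_scores(example_scores)
print(example_mean)
```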

Median

The median is the middle score in a data set when the scores are arranged in numerical order.

There are two ways you can find the median:

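  1. Write the numbers in the data set in ascending order, then find the middle score by crossing out a number at each end until you are left with one score in the middle. If you are left with two scores, the median is the mean of those two scores.
  2. Calculate the position of the middle score using the formula $\text{middle term}=\frac{n+1}{2}$, where $n$ is the number of scores, then count up in ascending order until you reach the score in that position. If the position is not a whole number, the median is the mean of the two scores on either side of it.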
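The second method can be sketched in Python as follows. This is only an illustration: the function name `median_by_position` and the two small data sets are hypothetical, and the handling of an even number of scores (averaging the two scores either side of the middle position) is the usual convention.

```python
def median_by_position(scores):
    """Find the median using the (n + 1) / 2 middle-term formula."""
    ordered = sorted(scores)              # ascending order
    n = len(ordered)
    position = (n + 1) / 2                # 1-based position of the middle term

    if position.is_integer():
        # Odd number of scores: one score sits exactly in the middle.
        return ordered[int(position) - 1]

    # Even number of scores: the position ends in .5, so average the
    # scores on either side of it.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


# Hypothetical data sets, for illustration only.
print(median_by_position([7, 3, 9, 5, 1]))   # middle term is the 3rd score
print(median_by_position([7, 3, 9, 5]))      # middle term is 2.5
```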

Mode

The mode is the most frequently occurring score.

To find the mode, determine which score you see most frequently in your data set.
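One way to sketch this in Python is to count how often each score occurs and keep the score with the highest count. The function name `modes` and the data are hypothetical. Returning every tied score when two or more scores share the highest frequency is an assumption made here, not something this lesson specifies.

```python
from collections import Counter


def modes(scores):
    """Return every score that occurs with the highest frequency."""
    counts = Counter(scores)                  # score -> number of occurrences
    highest = max(counts.values())
    return [score for score, count in counts.items() if count == highest]


# Hypothetical scores, for illustration only.
single_mode = modes([3, 5, 5, 8, 8, 8, 9])    # 8 occurs three times
tied_modes = modes([1, 1, 2, 2, 3])           # 1 and 2 both occur twice
print(single_mode, tied_modes)
```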

Worked example

Example 1 

Given the following set of scores:

$65.2$, $64.3$, $71.6$, $63.2$, $45.2$, $62.2$, $46.8$, $58.7$

A) Sort the scores in ascending order

Think: Ascending means lowest to highest.

Do:

$45.2,46.8,58.7,62.2,63.2,64.3,65.2,71.6$

 

B) Calculate the median, writing your answer as a decimal.

Think: Which term will be in the middle?

Do:

$\text{Middle term}=\frac{n+1}{2}=\frac{8+1}{2}=4.5$
 

This means that the median lies between the fourth and fifth scores.

$\frac{62.2+63.2}{2}=62.7$

The median is $62.7$.
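If you would like to check Example 1 by computer, the sketch below repeats the same steps using Python's standard `statistics` module. The variable names are our own, and the median is rounded to one decimal place only to hide floating-point rounding.

```python
import statistics

scores = [65.2, 64.3, 71.6, 63.2, 45.2, 62.2, 46.8, 58.7]

# Part A: sort the scores in ascending order.
ordered = sorted(scores)

# Part B: the middle term is (n + 1) / 2, which lies between the 4th and 5th scores.
middle_term = (len(ordered) + 1) / 2
median = statistics.median(scores)

print(ordered)
print(middle_term)
print(round(median, 1))
```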

Analysing data: measures of spread

The range, interquartile range, variance and standard deviation are all measures of spread. They tell us about how spread out the scores are. We will look at the interquartile range in a later lesson.

Range

The range is the difference between the highest score and the lowest score.

To calculate the range, you need to subtract the lowest score from the highest score.
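In Python this subtraction can be sketched in one line. The function name `score_range` and the scores are hypothetical.

```python
def score_range(scores):
    """Subtract the lowest score from the highest score."""
    return max(scores) - min(scores)


# Hypothetical scores, for illustration only: highest is 15, lowest is 3.
example_range = score_range([12, 7, 3, 15, 9])
print(example_range)
```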

Variance 

Variance measures how far the scores in a data set lie from the mean of the data set. It is found by taking the average of the squared differences (deviations) between each score and the mean. Squaring the differences ensures that every term is non-negative, so deviations above and below the mean do not cancel each other out.

This gives us the following formula, with $\mu$ for the mean, $n$ for the number of scores and $x$ for the scores we have in our set:

$\sigma^2=\frac{1}{n}\Sigma(x-\mu)^2$
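To see the formula at work, here is a minimal Python sketch that follows it term by term and then compares the result with `statistics.pvariance` from the standard library. The function name `population_variance` and the scores are hypothetical.

```python
import statistics


def population_variance(scores):
    """Average of the squared deviations from the mean (divides by n)."""
    n = len(scores)
    mu = sum(scores) / n                              # the mean
    return sum((x - mu) ** 2 for x in scores) / n     # (1/n) * sum of (x - mu)^2


# Hypothetical scores, for illustration only. Their mean is 5 and the
# squared deviations add up to 32, so the variance is 32 / 8 = 4.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

by_formula = population_variance(scores)
by_library = statistics.pvariance(scores)
print(by_formula, by_library)
```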

The calculator's statistical mode can calculate variance more easily. 

We will be learning more about variance when we look at probability distributions.

Standard Deviation 

Standard deviation is the square root of the variance. Taking the square root gives us a measure of spread in the same units as the scores.

We can calculate the standard deviation for a population or a sample. In this course, we will be finding the population standard deviation most of the time.

The symbols used are:

$\text{Population Standard Deviation}=\sigma$ (lowercase sigma)
$\text{Sample Standard Deviation}=s$

 

In statistics mode on a calculator, the following symbols might be used:

$\text{Population Standard Deviation}=\sigma_n$
$\text{Sample Standard Deviation}=\sigma_{n-1}$
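The subscripts on the calculator symbols hint at the difference between the two measures: the population version divides the sum of squared deviations by $n$, while the sample version divides by $n-1$. As a sketch, Python's standard `statistics` module offers both as `pstdev` and `stdev`. The scores below are hypothetical.

```python
import math
import statistics

# Hypothetical scores, for illustration only.
# Mean = 5 and the squared deviations add up to 32.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

population_sd = statistics.pstdev(scores)   # sqrt(32 / 8), divides by n
sample_sd = statistics.stdev(scores)        # sqrt(32 / 7), divides by n - 1

print(round(population_sd, 2))
print(round(sample_sd, 2))
```

Because the sample version divides by a smaller number, it is always slightly larger than the population version for the same scores (unless every score is identical).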

 

When using the calculator to find the standard deviation, ensure the settings are correct for the data given. This is particularly important when changing between data in a simple list and data in a frequency table.
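The same care is needed when working by computer. The sketch below uses a hypothetical frequency table (stored as a Python dictionary of score to frequency) and shows that each score must be counted as many times as its frequency. Treating the score column as a simple list gives a different, incorrect mean.

```python
import statistics

# Hypothetical frequency table, for illustration only: score -> frequency.
frequency_table = {0: 2, 1: 3, 2: 4, 3: 1}

# Expand the table into a simple list: each score repeated `frequency` times.
expanded = [score
            for score, frequency in frequency_table.items()
            for _ in range(frequency)]

n = sum(frequency_table.values())                                   # 10 scores
mean = sum(score * f for score, f in frequency_table.items()) / n   # 14 / 10
population_sd = statistics.pstdev(expanded)

# Common mistake: ignoring the frequencies and averaging the score column.
wrong_mean = statistics.mean(list(frequency_table))                 # (0+1+2+3) / 4

print(mean, wrong_mean, round(population_sd, 2))
```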

Standard deviation is a very powerful way of comparing the spread of different data sets, particularly if the data sets have different means and different numbers of scores.

Practice questions

Question 1

Find the sample standard deviation of the following set of scores, correct to two decimal places, by using the statistics mode on the calculator:

$9,5,-14,8,1,3,-6,-16,8,-17$

Question 2

The table shows the number of goals scored by a football team in each game of the year.

Score ($x$)    Frequency ($f$)
$0$            $3$
$1$            $1$
$2$            $5$
$3$            $1$
$4$            $5$
$5$            $5$

  1. In how many games were $0$ goals scored?

  2. Determine the median number of goals scored. Leave your answer to one decimal place if necessary.

  3. Calculate the mean number of goals scored each game. Leave your answer to two decimal places if necessary.

  4. Use your calculator to find the population standard deviation. Leave your answer to two decimal places if necessary.

Question 3

Languages and Mathematics are very different disciplines, and so to compare results in the two subjects, the standard deviation is used. The mean and standard deviation of exam results in each subject are given.

               Mean    Std. Deviation
Languages      $60$    $7$
Mathematics    $67$    $8$

  1. A student receives a mark of $81$ in Languages. How many standard deviations away from the mean is this mark?

  2. What mark in Mathematics would be equivalent to a mark of $81$ in Languages?

  3. A student receives a mark of $86.2$ in Mathematics. How many standard deviations away from the mean is this mark? Leave your answer to one decimal place if needed.

  4. What mark in Languages would be equivalent to a mark of $86.2$ in Mathematics? Leave your answer to one decimal place if necessary.

Question 4

Find the mean of the following scores:

$8$, $15$, $6$, $27$, $3$.

Question 5

Find the mode of the following scores:

$2,2,6,7,7,7,7,11,11,11,13,13,16,16$

Question 6

Find the range of the following set of scores:

$10,19,19,7,20,14,2,11$

Comparing data sets

While calculating the measures of central tendency and measures of spread can tell us a lot about a data set, these calculations can also be very powerful in comparing and contrasting two different data sets.

Worked example

Example 2

The number of minutes spent exercising per day for $10$ days is recorded for two people who have just signed up for a new gym membership.

Person A:  $45$  $50$  $50$  $55$  $55$  $60$  $60$  $65$  $65$  $65$

Person B:  $20$  $30$  $45$  $55$  $60$  $60$  $65$  $70$  $70$  $70$

(a)  Calculate the mean, median, mode and range for Person A

mode = $65$

median = $57.5$

mean = $57$

range = $20$

 

(b)  Calculate the mean, median, mode and range for Person B

mode = $70$

median = $60$

mean = $54.5$

range = $50$

(c)  Which person is the most consistent with their exercise?

Person A

(d)  Which statistical measure supports your answer to part (c)?

The range. The smaller range for Person A indicates that the number of minutes they exercise each day is more consistent than that of Person B. 

The range for Person B is more than double that of Person A, indicating more inconsistency in their exercise routine.

(e)  Which person seems to train more overall?

Person B

(f)  Which statistical measure(s) supports your answer to (e)?

The mode and median for Person B are both larger than for Person A. 

While the mean for Person B is slightly lower than that for Person A, this is due to the negative skew of their data.

Overall, the larger mode and median for Person B indicate that they exercise for longer overall.
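The values in parts (a) and (b) can be checked with a short Python sketch using the standard `statistics` module. The helper name `summarise` is our own and is only for illustration.

```python
import statistics

person_a = [45, 50, 50, 55, 55, 60, 60, 65, 65, 65]
person_b = [20, 30, 45, 55, 60, 60, 65, 70, 70, 70]


def summarise(minutes):
    """Collect the four measures used in Example 2 for one person."""
    return {
        "mean": statistics.mean(minutes),
        "median": statistics.median(minutes),
        "mode": statistics.mode(minutes),
        "range": max(minutes) - min(minutes),
    }


summary_a = summarise(person_a)
summary_b = summarise(person_b)
print("Person A:", summary_a)
print("Person B:", summary_b)
```

Setting the two summaries side by side makes the comparison in parts (c) to (f) easier to see: Person A has the smaller range, while Person B has the larger median and mode but the smaller mean.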

Practice questions

Question 7

The beaks of two groups of birds are measured, in mm, to determine whether they might be of the same species.

Length of beaks of two groups of birds (in mm)
Group 1:  $33$  $39$  $31$  $27$  $22$  $37$  $30$  $24$  $24$  $28$
Group 2:  $29$  $44$  $45$  $34$  $31$  $44$  $44$  $33$  $37$  $34$

  1. Calculate the range for Group 1.

  2. Calculate the range for Group 2.

  3. Calculate the mean for Group 1. Give your answer as a decimal.

  4. Calculate the mean for Group 2. Give your answer as a decimal.

  5. Choose the most appropriate statement that describes the set of data.

    A. Although the ranges are similar, the mean values are significantly different, indicating that these two groups of birds are of the same species.

    B. Although the ranges are similar, the mean values are significantly different, indicating that these two groups of birds are not of the same species.

    C. Although the mean values are similar, the ranges are significantly different, indicating that these two groups of birds are not of the same species.

    D. Although the mean values are similar, the ranges are significantly different, indicating that these two groups of birds are of the same species.

Misuse of statistics in the media

Statistics are usually included in media to support facts, reinforce arguments or provide additional information to the viewer. However, we must not forget that they are a powerful tool of persuasion, and must be interpreted with caution, as statistics can be deliberately manipulated or skewed by the author to shape the opinions of viewers.

One example of how statistics may be misused (intentionally or unintentionally) is in the choice of measure of central tendency: the mean, median or mode. The mean is usually the most appropriate measure as it takes every score into account. However, it is very sensitive to outliers, which can distort the mean so that it is not an accurate representation of the data set. In these situations, the median is more appropriate as it is far less affected by outliers. Examples of data sets where the median is more appropriate than the mean include house prices and incomes.
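To illustrate how strongly one outlier can pull the mean, here is a small Python sketch. The incomes are hypothetical values made up for this illustration, not real data.

```python
import statistics

# Hypothetical annual incomes, for illustration only.
incomes = [52_000, 55_000, 58_000, 61_000, 64_000]
with_outlier = incomes + [900_000]          # add one very high income

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(mean_before, median_before)
print(round(mean_after), median_after)
```

In this made-up data set, the single outlier drags the mean above every one of the original five incomes, while the median moves only to the midpoint of the two middle scores.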

Here is an article that highlights another media source's misuse of the mean and median: Average Australian wages revealed.

Practice question

Question 8

The selling prices of recently sold houses are:

$\$467000$, $\$413000$, $\$410000$, $\$456000$, $\$487000$, $\$929000$

  1. What is the mean selling price, rounded to the nearest thousand dollars?

  2. Which of the selling prices raises the mean so that it is not reflective of most of the prices?

  3. Recalculate the mean selling price excluding the outlier.

Outcomes

MA12-8

solves problems using appropriate statistical processes
