Level 5

How the shape affects choice of centre and spread

Lesson

Choosing a measure of centre

We've already learned about three measures of central tendency: mean, median and mode. These measures all give us a sense of the average score in the set. However, certain features of a data set, such as outliers or repeated scores, can significantly affect measures of central tendency.

So how do we know which measure of central tendency is most appropriate for a data set?

Mode

Remember the mode is the most frequently occurring score. So if we notice that a data set has a number of repeated scores, then the mode would be a good measure of centre.

Mean

If the range of scores is reasonably small and there are no outliers, then the mean is the most appropriate average.

Median

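Unlike the mean, the median is hardly affected by outliers, because it depends only on the middle of the ordered scores. And unlike the mode, it does not depend on which scores happen to be repeated. So the median is a good measure of central tendency if a data set has outliers or a large range.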
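To see the effect of an outlier on the mean and the median, here is a minimal Python sketch using the standard `statistics` module. The two score lists are made up for this illustration and do not come from the lesson.

```python
from statistics import mean, median

# Hypothetical scores, invented for this illustration.
scores = [4, 5, 6, 7, 8]           # no outlier
with_outlier = [4, 5, 6, 7, 48]    # top score replaced by an outlier

mean_before, median_before = mean(scores), median(scores)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("Without outlier: mean =", mean_before, "median =", median_before)
print("With outlier:    mean =", mean_after, "median =", median_after)
```

In this example the outlier drags the mean from 6 up to 14, while the median stays at 6.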

Choosing a measure of centre from a graph

The shape of the data may also determine which measure of central tendency is the most appropriate for a data set.

If a data set is symmetrical with a single peak, the mean, median, and mode will all be equal, or very close to equal.

When data is negatively skewed, the mode is usually the highest measure of central tendency and the mean is usually the lowest.

For negatively skewed data:
MEAN < MEDIAN < MODE

When data is positively skewed, the mean is usually the highest measure of central tendency and the mode is usually the lowest.

For positively skewed data:
MODE < MEDIAN < MEAN

In both cases the median sits between the other two measures. Therefore, in skewed data, the most appropriate measure of central tendency will be the median.
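The ordering of the three measures in skewed data can be checked on small examples. The sketch below uses two made-up data sets (one with a long tail to the right, one with a long tail to the left) and Python's standard `statistics` module. It only illustrates the typical pattern and does not prove that the pattern holds for every skewed data set.

```python
from statistics import mean, median, mode

# Hypothetical data sets, invented for this illustration.
positively_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9]   # long tail to the right
negatively_skewed = [1, 5, 6, 7, 7, 8, 8, 8, 9]   # long tail to the left

def centres(data):
    """Return the (mean, median, mode) of a list of scores."""
    return mean(data), median(data), mode(data)

pos_mean, pos_median, pos_mode = centres(positively_skewed)
neg_mean, neg_median, neg_mode = centres(negatively_skewed)

# Positively skewed: expect MODE < MEDIAN < MEAN
print(pos_mode, pos_median, round(pos_mean, 2))

# Negatively skewed: expect MEAN < MEDIAN < MODE
print(round(neg_mean, 2), neg_median, neg_mode)
```

In both examples the median lies between the mean and the mode.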

Summary

Here is a basic summary of selecting an appropriate measure of central tendency (even though sometimes it can be helpful to consider more than one measure).

Data set...	Most appropriate measure
has outliers	Median
has repeated values	Mode
has a relatively small range	Mean
is skewed	Median

But of course, sometimes the context of the data we are analysing lends itself to particular measures as well.

Now let's have a go at choosing a good measure of central tendency for some data sets.

Worked Examples

Question 1

Which measure of centre would be best for the following data set?

$15, 13, 16, 17, 15, 15, 15$

A. Mean
B. Mode
C. Median

Question 2

Which measure of centre would be best for the following data set?

$8, 10, 14, 18, 19, 91$

A. Mean
B. Median
C. Mode

Question 3

Which measure of centre should we use for the following set of data?

$12, 15, 16, 21, 22, 25$

A. Median
B. Mean
C. Mode

Question 4

Which measure of centre would be most appropriate to represent the data in this graph?

A. Median
B. Mode
C. Mean

Choosing a measure of spread

Certain features in a data set can also significantly affect measures of spread. In particular, outliers can have a big effect on measures of spread. In this chapter, we will look at two measures of spread: range and mean absolute deviation (MAD).

Range

The range is the difference between the highest and lowest scores in a data set. The range gives us a quick picture of the spread of the scores. However, because it depends only on the two most extreme scores, the range is heavily affected by outliers. If all the scores in a data set are clustered except for one outlier, then the range may not be the best measure of spread.
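As a small illustration of how one outlier changes the range, here is a Python sketch. The scores are made up for this example, and `score_range` is a helper name chosen here, not something from the lesson.

```python
def score_range(data):
    """Range = highest score minus lowest score."""
    return max(data) - min(data)

# Hypothetical scores, invented for this illustration.
clustered = [10, 11, 12, 12, 13]
with_outlier = clustered + [40]    # same scores plus one outlier

range_clustered = score_range(clustered)
range_outlier = score_range(with_outlier)

print("Range without outlier:", range_clustered)
print("Range with outlier:   ", range_outlier)
```

Here a single extra score increases the range from 3 to 30, even though the other five scores are still tightly clustered.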

Mean Absolute Deviation (MAD)

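The mean absolute deviation (MAD) is the mean of the absolute deviations of the scores from the data's mean. Because every score contributes to the MAD, a single extreme score has less influence on it than on the range, which depends only on the highest and lowest scores. This makes the MAD a better measure of spread than the range if there are outliers in a data set.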
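The calculation of the MAD can be sketched in a few lines of Python. This is a minimal illustration that assumes made-up scores. The function name `mean_absolute_deviation` is chosen here for clarity and is not a built-in of the `statistics` module.

```python
from statistics import mean

def mean_absolute_deviation(data):
    """Mean of the absolute deviations of each score from the mean."""
    centre = mean(data)
    return mean([abs(x - centre) for x in data])

# Hypothetical scores, invented for this illustration.
clustered = [10, 11, 12, 13, 14]
with_outlier = clustered + [42]    # same scores plus one outlier

mad_clustered = mean_absolute_deviation(clustered)
mad_outlier = mean_absolute_deviation(with_outlier)

# Range of each set, for comparison with the MAD.
range_clustered = max(clustered) - min(clustered)
range_outlier = max(with_outlier) - min(with_outlier)

print("MAD:  ", round(mad_clustered, 2), "->", round(mad_outlier, 2))
print("Range:", range_clustered, "->", range_outlier)
```

In this example both measures increase when the outlier is added, but the MAD rises by much less than the range does.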

Worked example

Question 5

Look at the dot plot below.

What would the most appropriate measure of spread be?

Think: This data set has an outlier. What measure of spread is best for data sets with outliers?

Do: The range is heavily affected by outliers so the MAD will be the best measure of spread.


Outcomes

S5-1

Plan and conduct surveys and experiments using the statistical enquiry cycle:
– determining appropriate variables and measures;
– considering sources of variation;
– gathering and cleaning data;
– using multiple displays, and re-categorising data to find patterns, variations, relationships, and trends in multivariate data sets;
– comparing sample distributions visually, using measures of centre, spread, and proportion;
– presenting a report of findings

S5-2

Evaluate statistical investigations or probability activities undertaken by others, including data collection methods, choice of measures, and validity of findings