Data Analysis

We've already learned about three measures of central tendency: mean, median and mode. These measures all give us a sense of the average score in the set. However, certain features in a data set can significantly affect measures of central tendency.

So how do we know which measure of central tendency is most appropriate for a data set?

Remember the mode is the most frequently occurring score. So if we notice that a data set has **a number of repeated scores**, then the mode would be a good measure of centre.

If the range of scores is reasonably **small** and there are **no outliers**, then the mean is the most appropriate average.

Unlike the mean, the median is barely affected by outliers, because it depends only on the middle of the ordered scores. So the median is a good measure of central tendency if a data set has **outliers or a large range**.
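To see how an outlier affects each measure, here is a minimal sketch using Python's built-in `statistics` module. The scores and the helper name `summarise` are made up for illustration and are not part of the lesson's data.

```python
from statistics import mean, median, mode

# Hypothetical scores, invented for illustration only.
scores = [4, 5, 5, 6, 7, 9]
# The same scores with one extreme value (an outlier) added.
with_outlier = scores + [48]


def summarise(data):
    """Return (mean, median, mode) for a list of scores."""
    return mean(data), median(data), mode(data)


base = summarise(scores)
shifted = summarise(with_outlier)

print("without outlier:", base)     # mean 6, median 5.5, mode 5
print("with outlier:   ", shifted)  # mean 12, median 6, mode 5
```

In this example the single outlier doubles the mean, while the median moves by only half a score.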

The shape of the data may also determine which measure of central tendency is the most appropriate measure of a data set.

If a data set is symmetrical with a single peak, the mean, median and mode will all be equal, or very close to each other.

When data is negatively skewed, the tail of low scores pulls the mean down. The mode is typically the highest measure of central tendency and the mean is typically the lowest.

For negatively skewed data:

MEAN < MEDIAN < MODE

When data is positively skewed, the tail of high scores pulls the mean up. The mean is typically the highest measure of central tendency and the mode is typically the lowest.

For positively skewed data:

MODE < MEDIAN < MEAN

Therefore, in skewed data, the most appropriate measure of central tendency will be the median.
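The two orderings can be checked on a small example. This is only a sketch: both data sets are hypothetical, and the helper name `centre` is an assumption chosen for illustration.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed scores: most are low, with a tail of high scores.
positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]
# Mirroring the scores gives a negatively skewed set with a tail of low scores.
negative_skew = [11 - x for x in positive_skew]


def centre(data):
    """Return the three measures of central tendency as a dictionary."""
    return {"mean": mean(data), "median": median(data), "mode": mode(data)}


pos = centre(positive_skew)
neg = centre(negative_skew)

print("positive skew:", pos)  # mode 2 < median 3 < mean 4
print("negative skew:", neg)  # mean 7 < median 8 < mode 9
```

In both sets the median sits between the other two measures, which is why it is usually a sensible choice for skewed data.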

Here is a basic summary of selecting an appropriate measure of central tendency (even though sometimes it can be helpful to consider more than one measure).

Data set... | Mean | Median | Mode |
---|---|---|---|
has outliers | | ✓ | |
has repeated values | | | ✓ |
has a relatively small range | ✓ | | |
is skewed | | ✓ | |

But of course, sometimes the context of the data we are analysing lends itself to particular measures as well.

Now let's have a go at choosing a good measure of central tendency for some data sets.

Which measure of centre would be best for the following data set?

$15, 13, 16, 17, 15, 15, 15$

A. Mean

B. Mode

C. Median

Which measure of centre would be best for the following data set?

$8, 10, 14, 18, 19, 91$

A. Mean

B. Median

C. Mode

Which measure of centre should we use for the following set of data?

$12, 15, 16, 21, 22, 25$

A. Median

B. Mean

C. Mode

What measure of centre would be most appropriate to use to represent the data in this graph?

A. Median

B. Mode

C. Mean

Certain features in a data set can also significantly affect measures of spread. In particular, outliers can have a big effect on measures of spread. In this chapter, we will look at two measures of spread: range and mean absolute deviation (MAD).

The range is the difference between the highest and lowest scores in a data set. The range gives us a quick picture of the spread of the scores. However, the range is heavily affected by outliers. If all the scores in a data set are clustered except for one outlier, then the range may not be the best measure of spread.

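The mean absolute deviation (MAD) is a good measure of spread if there are outliers in a data set. MAD is the mean of the absolute deviations from the data's mean: we find how far each score is from the mean, ignore the sign, and average those distances. Because every score contributes to the MAD, a single extreme score affects it less than it affects the range, which depends only on the highest and lowest scores.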
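Both measures of spread can be computed in a few lines. The sketch below uses hypothetical scores, and the function names `data_range` and `mean_absolute_deviation` are assumptions chosen for illustration.

```python
from statistics import mean


def data_range(data):
    """Difference between the highest and lowest scores."""
    return max(data) - min(data)


def mean_absolute_deviation(data):
    """Mean of the absolute distances of each score from the data's mean."""
    centre = mean(data)
    return mean([abs(x - centre) for x in data])


# Hypothetical clustered scores, invented for illustration only.
clustered = [10, 11, 12, 12, 13, 14]
# The same scores with one outlier added.
with_outlier = clustered + [40]

for label, data in [("clustered", clustered), ("with outlier", with_outlier)]:
    print(label, "range =", data_range(data),
          "MAD =", round(mean_absolute_deviation(data), 2))
```

For these scores the outlier raises the range from 4 to 30, while the MAD rises from 1 to about 6.86, because the outlier's distance from the mean is shared across all seven scores.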

Look at the dot plot below.

What would the most appropriate measure of spread be?

**Think**: This data set has an outlier. What measure of spread is best for data sets with outliers?

**Do**: The range is heavily affected by outliers so the MAD will be the best measure of spread.
