10.05 Outliers

Lesson

Outliers

In statistics, we tend to assume that our data will fit some kind of trend and that most values will fall into a "normal" range. This is why we look at measures of central tendency, such as the mean, median and mode.

An outlier is an event that is very different from the norm and results in a score that is far above or far below average. For example, if four people in a group of five are between 120 \text{ cm} and 130 \text{ cm} tall, whereas the fifth, Jim, is 165 \text{ cm} tall, Jim would be an outlier as he is much taller than everyone else in the group.

Examples

Example 1

The dot plot shows the temperature (\degree \text{C}) in a town over a period of several weeks. Identify the temperature that is an outlier.

The image shows a dot plot with the data for temperature in a town.

Worked Solution
Create a strategy

Identify the temperature that is much greater or much smaller than the other scores.

Apply the idea

We can see that the dot for 21 is far away from the rest of the dots.

\text{Outlier}=21 \degree \text{C}

Idea summary

An outlier is an event that is very different from the norm and results in a score that is far above or far below average.

Effects of outliers

Outliers can skew or change the shape of our data. This can be a problem (especially for small data sets) because the mean, median and range might not properly represent the situation. We can counteract this by removing outliers. Removing outliers will have the following effects:

Removing a really low outlier | Removing a really high outlier
The range will decrease. | The range will decrease.
The median might increase. | The median might decrease.
The mean will increase. | The mean will decrease.
The mode will not change. | The mode will not change.
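The left-hand column can be checked with a small sketch. The scores below are hypothetical (they do not come from the lesson) and were chosen so that 2 is a really low outlier; `data_range` is a helper name introduced here for illustration.

```python
from statistics import mean, median, mode


def data_range(data):
    """Range = highest score - lowest score."""
    return max(data) - min(data)


# Hypothetical scores, not taken from the lesson; 2 is the really low outlier.
with_outlier = [2, 48, 50, 50, 52, 55, 57]
without_outlier = [s for s in with_outlier if s != 2]

# Print each statistic before and after removing the low outlier.
for name, stat in [("Range", data_range), ("Median", median),
                   ("Mean", mean), ("Mode", mode)]:
    print(name, stat(with_outlier), "->", stat(without_outlier))
```

For this particular data set the range decreases, the median and mean increase, and the mode stays the same, which matches the table. A different data set could leave the median unchanged, which is why the table says "might".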

Examples

Example 2

Consider the following set of data: 53,\,46,\,25,\,50,\,30,\,30,\,40,\,30,\,47,\,109

a

Find the mean, median, mode, and range.

Worked Solution
Create a strategy

To find the mean, use the formula: \text{Mean}=\dfrac{\text{Sum of scores}}{\text{Number of scores}}

To find the median, find the middle score. To find the mode, find the most frequent score.

To find the range, use the formula: \text{Range}=\text{Highest score}-\text{Lowest score}

Apply the idea
\begin{aligned}
\text{Mean} &= \dfrac{53+46+25+50+30+30+40+30+47+109}{10} && \text{Use the formula} \\
&= \dfrac{460}{10} && \text{Evaluate the addition} \\
&= 46 && \text{Evaluate the division}
\end{aligned}

To find the median, order the scores: 25,\,30,\,30,\,30,\,40,\,46,\,47,\,50,\,53,\,109

The middle scores are: 40,\,46

\begin{aligned}
\text{Median} &= \dfrac{40+46}{2} && \text{Find the average of the middle values} \\
&= 43 && \text{Evaluate}
\end{aligned}

To find the mode, choose the score which occurs most often.

\text{Mode}=30

To find the range:

\begin{aligned}
\text{Range} &= 109-25 && \text{Substitute the values} \\
&= 84 && \text{Evaluate the subtraction}
\end{aligned}
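
As a cross-check on the hand calculations in part (a), the same four statistics can be computed with Python's standard `statistics` module. This is only a sketch; the variable names are chosen here for illustration.

```python
from statistics import mean, median, mode

# Data set from Example 2
scores = [53, 46, 25, 50, 30, 30, 40, 30, 47, 109]

data_mean = mean(scores)                # sum of scores / number of scores
data_median = median(scores)            # average of the two middle scores (even count)
data_mode = mode(scores)                # most frequent score
data_range = max(scores) - min(scores)  # highest score - lowest score

print(data_mean, data_median, data_mode, data_range)
```

The values agree with the worked solution: a mean of 46, a median of 43, a mode of 30 and a range of 84.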
b

Which data value is an outlier?

Worked Solution
Create a strategy

Choose the value that is much greater or much smaller than the rest of the data set.

Apply the idea

\text{Outlier}=109
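
The lesson identifies the outlier by eye. One informal way to back up that choice, which is an assumption made here and not a rule from the lesson, is to find the score that lies furthest from the median.

```python
from statistics import median

# Data set from Example 2
scores = [53, 46, 25, 50, 30, 30, 40, 30, 47, 109]

centre = median(scores)  # middle of the ordered data


def distance_from_centre(score):
    """How far a score lies from the median (informal check only)."""
    return abs(score - centre)


# The score furthest from the median is the outlier candidate.
candidate = max(scores, key=distance_from_centre)
print(candidate, distance_from_centre(candidate))
```

Here 109 is 66 away from the median, while every other score is within 18 of it, so 109 stands out clearly. This check only picks the most extreme score; deciding whether that score really counts as an outlier is still a judgement.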

c

Find the mean, median, mode, and range after removing the outlier.

Worked Solution
Apply the idea
\begin{aligned}
\text{Mean} &= \dfrac{25+30+30+30+40+46+47+50+53}{9} && \text{Substitute all the scores} \\
&= \dfrac{351}{9} && \text{Evaluate the addition} \\
&= 39 && \text{Evaluate the division}
\end{aligned}

To find the median, order the scores: 25,\,30,\,30,\,30,\,40,\,46,\,47,\,50,\,53

The middle score is 40.

\text{Median}=40

To find the mode, choose the score which occurs most often.

\text{Mode}=30

To find the range:

\begin{aligned}
\text{Range} &= 53-25 && \text{Substitute the values} \\
&= 28 && \text{Evaluate the subtraction}
\end{aligned}
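
The calculations in part (c) can be reproduced by dropping 109 from the list and recomputing. This is a minimal sketch; `trimmed` is a name introduced here for the data set without the outlier.

```python
from statistics import mean, median, mode

# Data set from Example 2 and the outlier identified in part (b)
scores = [53, 46, 25, 50, 30, 30, 40, 30, 47, 109]
outlier = 109

# Copy of the data with the outlier left out
trimmed = [s for s in scores if s != outlier]

trimmed_mean = mean(trimmed)
trimmed_median = median(trimmed)  # single middle score (odd count)
trimmed_mode = mode(trimmed)
trimmed_range = max(trimmed) - min(trimmed)

print(trimmed_mean, trimmed_median, trimmed_mode, trimmed_range)
```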
d

Let A be the original data set and B be the data set without the outlier.

Complete the table using the symbols >, < and = to compare the statistics before and after removing the outlier.

 | With outlier | | Without outlier
Mean: | A | | B
Median: | A | | B
Mode: | A | | B
Worked Solution
Create a strategy

Compare the statistics in part (a) and in part (c).

Apply the idea

Statistics from parts (a) and (c):

 | With outlier | Without outlier
Mean | 46 | 39
Median | 43 | 40
Mode | 30 | 30
Range | 84 | 28

Comparison table:

 | With outlier | | Without outlier
Mean: | A | > | B
Median: | A | > | B
Mode: | A | = | B
Range: | A | > | B
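
The comparison can also be generated programmatically. In the sketch below, `summary` and `symbol` are helper names introduced here for illustration; A and B are the data sets with and without the outlier, as in the question.

```python
from statistics import mean, median, mode


def summary(data):
    """Collect the four summary statistics used in this example."""
    return {
        "Mean": mean(data),
        "Median": median(data),
        "Mode": mode(data),
        "Range": max(data) - min(data),
    }


def symbol(a, b):
    """Return the symbol that makes 'a ? b' a true statement."""
    if a > b:
        return ">"
    if a < b:
        return "<"
    return "="


A = [53, 46, 25, 50, 30, 30, 40, 30, 47, 109]  # original data set
B = [s for s in A if s != 109]                 # data set without the outlier

stats_a, stats_b = summary(A), summary(B)
comparison = {name: symbol(stats_a[name], stats_b[name]) for name in stats_a}

for name, sym in comparison.items():
    print(f"{name}: A {sym} B")
```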
Idea summary

Removing outliers will have the following effects on the summary statistics:

A really low outlier | A really high outlier
The range will decrease | The range will decrease
The median might increase | The median might decrease
The mean will increase | The mean will decrease
The mode will not change | The mode will not change

Outcomes

VCMSP300

Investigate the effect of individual data values, including outliers, on the range, mean and median
