Univariate Data

NZ Level 5

Identify Outliers

Lesson

In statistics, we tend to assume that our data will fit some kind of trend and that most things will fit into a "normal" range, even though we expect a certain amount of variability. This is why we look at measures of central tendency, such as the mean, median and mode, and talk about the normal distribution, which is a nice symmetrical graph shaped like a bell.

An outlier is an event that is very different from the norm and results in a score that is really above or below average. For example, if there are 10 people in a long jump contest and nine people can jump between 4 and 5 metres, and one person only jumped 1 metre, they would be an outlier as their jump is *much* shorter than everyone else in the group.

Outliers can affect our measures of central tendency and variability, which we will learn about later in Things Out of the Norm. However, let's look at some examples of how to identify outliers now.

Identify the outlier(s) in the data set $\left\{73,77,81,86,131\right\}${73,77,81,86,131}.

The graph shows the annual net profit (in millions) of a company over a several year period. Identify the year in which the annual net profit is an outlier.

VO_{2} Max is a measure of how efficiently your body uses oxygen during exercise. The more physically fit you are, the higher your VO_{2} Max. Here are some people’s results when their VO_{2} Max was measured.

$46,27,32,46,30,25,41,24,26,29,21,21,26,47,21,30,41,26,28,26,76$46,27,32,46,30,25,41,24,26,29,21,21,26,47,21,30,41,26,28,26,76

Sort the values into ascending order.

Determine the median VO

_{2}Max.Determine the upper quartile value. Leave your answer as a decimal if necessary.

Determine the lower quartile value. Leave your answer as a decimal if necessary.

Calculate $1.5\times IQR$1.5×

`I``Q``R`, where IQR is the interquartile range. Leave your answer as a decimal if necessary.An outlier is a score that is more than $1.5\times IQR$1.5×

`I``Q``R`above or below the Upper Quartile or Lower Quartile respectively. State the outlier.Here is a box and whisker plot for the data.

An average untrained healthy person has a VO

_{2}Max between $30$30 and $40$40. The majority of this group of people are likely to:do moderate amounts of exercise

Abe professional athletes

Bdo none to moderate amounts of exercise

Cdo moderate amounts of exercise

Abe professional athletes

Bdo none to moderate amounts of exercise

C

Plan and conduct surveys and experiments using the statistical enquiry cycle:– determining appropriate variables and measures;– considering sources of variation;– gathering and cleaning data;– using multiple displays, and re-categorising data to find patterns, variations, relationships, and trends in multivariate data sets;– comparing sample distributions visually, using measures of centre, spread, and proportion;– presenting a report of findings