8.03 Introduction to z-scores

Lesson

The shape or spread of a normal distribution is affected by the standard deviation of the data set. Some sets of data are quite spread out and have a large standard deviation, whereas others have similar values that lie closely around the mean and have a small standard deviation.

To compare several normally distributed data sets, we need a common unit of measurement called the $z$-score, which tells us the number of standard deviations a value lies above or below the mean.

In the normal distribution below, the points marked on the horizontal axis are separated by $1$ standard deviation. Whatever the standard deviation may be, the values at the endpoints of the highlighted region have $z$-scores of $-1$ and $1$ respectively. Values in the highlighted region lie within one standard deviation of the mean, and values outside it lie further away.

A positive $z$-score indicates the score is above the mean.
A $z$-score of $0$ indicates the score is equal to the mean.
A negative $z$-score indicates the score is below the mean.
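The three sign rules above can be captured in a few lines of Python. This is a minimal sketch; the function name `describe_position` is hypothetical and chosen only for illustration.

```python
def describe_position(z: float) -> str:
    """Say whether a value with z-score `z` lies above, at, or below the mean."""
    if z > 0:
        return "above the mean"
    if z < 0:
        return "below the mean"
    return "equal to the mean"


# A few sample z-scores and how their sign is read
for z in (1.5, 0.0, -2.0):
    print(z, describe_position(z))
```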

Exploration

The heights of office buildings in a large city have a mean of $50$ m. If we pick out two buildings, one that's $45$ m tall and one that's $60$ m tall, we can quickly calculate that the first building is $5$ m below the mean and the second is $10$ m above the mean.

Are most buildings far from being $50$ m tall? Or are almost all buildings very close to $50$ m tall? To answer this question we need one more piece of information: the standard deviation of the heights, which is $5$ m. We need to know both the mean and the standard deviation to calculate $z$-scores, and we use the following formula to calculate them.

Formula for calculating z-scores

If a data set is approximately normally distributed with mean $\overline{x}$ and standard deviation $s$, then the $z$-score corresponding to a value $x$ from the data set is defined as:

$z=\frac{x-\overline{x}}{s}$
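The formula translates directly into a small function. This is a sketch: the name `z_score` and the check that the standard deviation is positive are illustrative choices, not part of the lesson. The usage line applies it to a hypothetical $55$ m building, using the mean ($50$ m) and standard deviation ($5$ m) from the Exploration.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Return the z-score of x: how many standard deviations x lies from the mean.

    Implements z = (x - mean) / sd, where `mean` and `sd` are the data set's
    mean (x-bar) and standard deviation (s).
    """
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Hypothetical 55 m building: mean 50 m, standard deviation 5 m
print(z_score(55, 50, 5))  # 1.0
```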

Let's calculate the $z$-score for the office building of height $45$ m first:

$z=\frac{x-\overline{x}}{s}$	(write down the formula)
$z=\frac{45-50}{5}$	(substitute in our values)
$z=-\frac{5}{5}$
$z=-1$

The $z$-score of $-1$ tells us that the building is exactly one standard deviation below the mean height. Let's use the same formula on the office building that has height $60$ m:

$z=\frac{x-\overline{x}}{s}$	(write down the formula)
$z=\frac{60-50}{5}$	(substitute in our values)
$z=\frac{10}{5}$
$z=2$

The $z$-score of $2$ tells us that the height of the building is two standard deviations above the mean height. Here are the heights of the two buildings represented as endpoints of the shaded region.
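To connect each number back to the wording used above, here is a hypothetical helper `describe` (the name is chosen for illustration) that computes the $z$-score and states its size and direction as a sentence. It assumes the standard deviation is positive.

```python
def describe(x: float, mean: float, sd: float) -> str:
    """Phrase the z-score of x as 'k standard deviation(s) above/below the mean'."""
    z = (x - mean) / sd
    if z == 0:
        return f"{x} is equal to the mean"
    direction = "above" if z > 0 else "below"
    # abs(z) gives the size; :g drops a trailing ".0" (2.0 prints as 2)
    return f"{x} is {abs(z):g} standard deviation(s) {direction} the mean"


# The two office buildings: mean 50 m, standard deviation 5 m
for height in (45, 60):
    print(describe(height, 50, 5))
```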

The mean and standard deviation of z-scores

The $z$-scores are a common unit of measure for any data set that is approximately normally distributed. When they are calculated from the data set's own mean and standard deviation, the set of $z$-scores will always have a mean of $0$ and a standard deviation of $1$. We can see why by noting that the formula for $z$-scores starts by subtracting the mean $\overline{x}$ from each score. In other words, we shift the data set so that the scores are centred about a new mean of $0$. Say, for instance, we obtained the following data set.

Subtracting the mean from every score translates the histogram so that the scores are centred around $0$.

We can see that $0$ is always the mean of $z$-scores.
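The centring step can be checked numerically. This sketch uses Python's standard `statistics` module and a small hypothetical data set of five heights (not the data set pictured in the lesson). It shows that subtracting the mean moves the mean to $0$ while leaving the spread unchanged.

```python
from statistics import mean, pstdev

# Hypothetical data set (not from the lesson): heights in metres
heights = [40, 45, 50, 55, 60]

m = mean(heights)                   # mean of the original scores
centred = [h - m for h in heights]  # subtract the mean from every score

print(mean(centred))                     # the centred scores have mean 0
print(pstdev(heights), pstdev(centred))  # the spread is unchanged by the shift
```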

Remember that the standard deviation, $s$, measures the typical distance between the scores and the mean. If we divide all the centred scores by $s$, every distance from the mean is scaled by the same factor, so the standard deviation of the new scores is $\frac{s}{s}=1$. This is why we divide by the standard deviation when computing $z$-scores.
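Putting both steps together gives a standardising sketch. It assumes the population standard deviation (`statistics.pstdev`, which divides by $n$); this is the version that reproduces the $2.32$ cm quoted for the Year $12$ heights in Question 3. If the sample standard deviation (`statistics.stdev`, dividing by $n-1$) were used instead, the mean would still be $0$ and the resulting $z$-scores would have a sample standard deviation of $1$. The function name `standardise` and the data set are hypothetical.

```python
from statistics import mean, pstdev


def standardise(data):
    """Convert every value to a z-score using the data set's own mean and
    population standard deviation."""
    m = mean(data)
    s = pstdev(data)
    return [(x - m) / s for x in data]


heights = [40, 45, 50, 55, 60]  # hypothetical data set
z = standardise(heights)

# Mean about 0 and standard deviation about 1 (up to floating-point rounding)
print(mean(z), pstdev(z))
```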

Practice questions

QUESTION 1

The marks in a recent English exam are approximately normally distributed with a mean of $57$ and a standard deviation of $5$.

Find the value of the $z$-score that corresponds to an English mark of $67$.

QUESTION 2

A data set is approximately normally distributed.

If a value that belongs to the data set has a $z$-score of $4.29$, how many standard deviations is the value away from the mean?
Is the value above or below the mean?
A. Above
B. Below

QUESTION 3

The heights of a group of Year $12$ students (in cm) are approximately normally distributed. The heights are shown below.

$167, 161, 159, 164, 161, 164, 162, 162, 166, 162$

Given that the mean is approximately $162.8$ cm and the standard deviation is approximately $2.32$ cm, calculate the following $z$-scores to two decimal places.

Height (cm)	$159$	$161$	$161$	$162$	$162$	$162$	$164$	$164$	$166$	$167$
$z$-score	____	____	____	____	____	____	____	____	____	____

What is the mean of the $z$-scores, correct to the nearest integer?
What is the standard deviation of the $z$-scores, correct to the nearest integer?
Is the following statement true or false?

"All data sets that are approximately normally distributed have $z$-scores with a mean of $0$ and a standard deviation of $1$."
A. True
B. False

Outcomes

MS2-12-2

analyses representations of data in order to make inferences, predictions and draw conclusions

MS2-12-7

solves problems requiring statistical processes, including the use of the normal distribution and the correlation of bivariate data