
7.07 Z-scores

Z-scores

To directly compare multiple normally distributed data sets, we need a common unit of measurement. In statistics involving the normal distribution, we use the number of standard deviations away from the mean as a standardized unit of measurement called a z-score.

z-score

The number of standard deviations an element is away from the mean: z\text{-score }(z)=\dfrac{x-\mu}{\sigma}, where x is an element of the data set, \mu is the mean of the data set, and \sigma is the standard deviation of the data set.

We can use z-scores along with the standard normal curve to compare values from different sets of data.

The standard normal distribution

The distribution of the z-scores of a normally distributed data set. A standard normal distribution has a mean of 0 and a standard deviation of 1, which allows normal data sets with different means and standard deviations to be compared.

The standard normal distribution curve, with a mean of 0.

For example, a normally distributed data set with a mean of 1010 and a standard deviation of 20 can be standardized with z-scores. This allows us to compare it with other sets of similar data that have a different mean and standard deviation.
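To illustrate what standardizing does, here is a minimal Python sketch using a small hypothetical data set (not from the lesson, and far too small to be normal) whose mean is 1010 and whose population standard deviation is 20. After every value is converted to a z-score, the new values have a mean of 0 and a standard deviation of 1.

```python
from statistics import mean, pstdev

# Hypothetical data set (not from the lesson), chosen so that its mean is 1010
# and its population standard deviation is 20.
data = [980, 1000, 1010, 1020, 1040]

mu = mean(data)       # 1010
sigma = pstdev(data)  # 20.0, the population standard deviation

# Standardize: replace every value x by its z-score (x - mu) / sigma.
z_scores = [(x - mu) / sigma for x in data]

print(z_scores)          # [-1.5, -0.5, 0.0, 0.5, 1.5]
print(mean(z_scores))    # 0.0
print(pstdev(z_scores))  # 1.0
```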

Two curves. The left curve, titled A Normal Distribution, has a mean of 1,010. An arrow labeled Standardize points from the left curve to the right curve, titled The Standard Normal Distribution, which has a mean of 0.

To find the z-score of a data value, we must know the mean and standard deviation. If we know those values, we can use the following formula to find the equivalent z-score.

\displaystyle z=\dfrac{x-\mu}{\sigma}

where z is the z-score, x is the data value, \mu is the population mean, and \sigma is the population standard deviation.
  • A positive z-score indicates the data value is above the mean.
  • A z-score of 0 indicates the data value is equal to the mean.
  • A negative z-score indicates the data value is below the mean.
  • The larger the magnitude (absolute value) of the z-score, the farther the data value is from the mean.
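The formula and the sign rules above can be sketched in Python as follows. The helper names z_score and describe are hypothetical, and the sample values are illustrative, reusing the mean of 1010 and standard deviation of 20 from the earlier example.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma


def describe(z: float) -> str:
    """Interpret the sign of a z-score, following the bullet points above."""
    if z > 0:
        return "above the mean"
    if z < 0:
        return "below the mean"
    return "equal to the mean"


# Illustrative values from a distribution with mean 1010 and standard deviation 20.
for x in (1050, 1010, 975):
    z = z_score(x, 1010, 20)
    print(x, z, describe(z))
```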

The empirical rule can also be used to estimate the percentage of data within 1, 2, and 3 standard deviations of the mean in the standard normal curve.

A normal distribution curve. Below the curve is a horizontal axis with the following evenly spaced marks from left to right: -3, -2, -1, 0, 1, 2, 3. The peak of the curve is at 0. Vertical lines are drawn from the curve to each mark on the horizontal axis. The area under the curve between -3 and -2 is labeled 2.35%, between -2 and -1 labeled 13.5%, between -1 and 0 labeled 34%, between 0 and 1 labeled 34%, between 1 and 2 labeled 13.5%, and between 2 and 3 labeled 2.35%. Below the horizontal axis, three brackets are shown: a bracket connecting -1 and 1 is labeled 68%, a bracket connecting -2 and 2 is labeled 95%, and a bracket connecting -3 and 3 is labeled 99.7%.
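The percentages in the empirical rule are rounded areas under the standard normal curve. A minimal Python sketch, using the standard identity P(-k < Z < k) = erf(k/\sqrt{2}) and the standard library function math.erf, reproduces them:

```python
import math


def fraction_within(k: float) -> float:
    """Proportion of a standard normal distribution between -k and k.

    Uses the identity P(-k < Z < k) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.2%}")
```

This prints about 68.27%, 95.45%, and 99.73%, which round to the 68%, 95%, and 99.7% shown in the diagram.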

Examples

Example 1

Brock is applying to different colleges across America and needs to decide if he should emphasize his SAT score, ACT score, or both. The test scores for both the SAT and ACT are normally distributed. The data is summarized in the table provided.

Test    Brock's score    Mean    Standard deviation
SAT     1450             1051    211
ACT     30               20.8    5.7
a

Calculate and interpret the z-score for Brock's SAT score.

Worked Solution
Create a strategy

The formula for finding the z-score is z=\dfrac{x-\mu}{\sigma} where x is a test score, \mu is the mean, and \sigma is the standard deviation.

Apply the idea

For the SAT, we are given x=1450,\, \mu=1051,\, and \sigma=211:

\begin{aligned}
z &= \dfrac{x-\mu}{\sigma} && \text{Formula for z-scores}\\
z &= \dfrac{1450-1051}{211} && \text{Substitute the known values}\\
z &\approx 1.89 && \text{Evaluate}
\end{aligned}

Brock's z-score for his SAT test is z\approx 1.89, which means Brock scored about 1.89 standard deviations above the mean.

b

Calculate and interpret the z-score for Brock's ACT score.

Worked Solution
Create a strategy

We will use the formula for z-scores again, but this time with the given values for the ACT.

Apply the idea

For the ACT, we are given x=30,\, \mu=20.8,\, and \sigma=5.7:

\begin{aligned}
z &= \dfrac{x-\mu}{\sigma} && \text{Formula for z-scores}\\
z &= \dfrac{30-20.8}{5.7} && \text{Substitute the known values}\\
z &\approx 1.61 && \text{Evaluate}
\end{aligned}

Brock's z-score for his ACT test is z\approx 1.61, which means Brock scored about 1.61 standard deviations above the mean.

c

Determine which test Brock did better on relative to all other SAT and ACT test takers.

Worked Solution
Create a strategy

Compare the z-scores found in parts (a) and (b). The higher Brock's z-score, the better he did relative to the other test takers.

Apply the idea

Relative to all people who took the SAT and ACT, Brock did slightly better on the SAT than on the ACT, since his SAT z-score (1.89) is higher than his ACT z-score (1.61).
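The comparison in parts (a) to (c) can be checked with a short Python sketch. The numbers come from the table; the dictionary layout is only an illustrative choice.

```python
# Values from the Example 1 table: (Brock's score, mean, standard deviation).
tests = {
    "SAT": (1450, 1051, 211),
    "ACT": (30, 20.8, 5.7),
}

# z-score for each test: (x - mu) / sigma.
z = {name: (x - mu) / sigma for name, (x, mu, sigma) in tests.items()}
for name, value in z.items():
    print(f"{name}: z = {value:.2f}")

# The test with the larger z-score is the one Brock did better on, relatively.
best = max(z, key=z.get)
print("Relatively better on:", best)
```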

Reflect and check

Both of Brock's scores were above average, and his two z-scores are close to each other, so he can report either test score when applying to colleges. On college applications, usually only one test score is required.

Example 2

Three sprinters are training for a national competition. The data collected on each of their running times (in seconds) is approximately normal. The table shows each sprinter's mean, standard deviation, one practice 400\text{ m} sprint time, and the corresponding z-score, with one value missing for each sprinter.

Sprinter   \mu (s)   \sigma (s)   z-score   Practice time (s)
Lina       65        3            -1.27     ?
Aurelia    62        ?            0.85      65.4
Mariana    ?         2            -0.5      59.5
a

Find the 400\text{ m} sprint time Lina ran during practice.

Worked Solution
Create a strategy

To find the practice time that had a z-score of -1.27, we can use the formula z=\dfrac{x-\mu}{\sigma} with the given mean, standard deviation, and z-score, then solve for the practice time in seconds.

Apply the idea

From the given information, we know \mu=65, \sigma=3, and z=-1.27.

\begin{aligned}
z &= \dfrac{x-\mu}{\sigma} && \text{Formula for z-scores}\\
-1.27 &= \dfrac{x-65}{3} && \text{Substitute the known values}\\
-3.81 &= x-65 && \text{Multiply both sides by 3}\\
61.19 &= x && \text{Add 65 to both sides}
\end{aligned}

Lina's 400\text{ m} practice time was about 61.2 seconds.
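Solving the z-score formula for the data value gives x=\mu+z\sigma. A small Python sketch of that rearrangement (the function name value_from_z is hypothetical) confirms Lina's time:

```python
def value_from_z(z: float, mu: float, sigma: float) -> float:
    """Rearranged z-score formula: x = mu + z * sigma."""
    return mu + z * sigma


lina_time = value_from_z(-1.27, 65, 3)
print(round(lina_time, 2))  # 61.19
```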

Reflect and check

A 400\text{ m} time of 61.2 seconds is 1.27 standard deviations below Lina's average time of 65 seconds. Since a lower time is faster, she ran faster in that practice run than she normally does.

b

Find the standard deviation of Aurelia's times.

Worked Solution
Create a strategy

To find the standard deviation, we can use the z-score formula with the given mean, z-score, and practice time, then solve for the standard deviation.

Apply the idea

From the given information, we know \mu=62, z=0.85, and x=65.4.

\begin{aligned}
z &= \dfrac{x-\mu}{\sigma} && \text{Formula for z-scores}\\
0.85 &= \dfrac{65.4-62}{\sigma} && \text{Substitute the known values}\\
0.85 &= \dfrac{3.4}{\sigma} && \text{Evaluate the numerator}\\
0.85\sigma &= 3.4 && \text{Multiply both sides by } \sigma\\
\sigma &= 4 && \text{Divide both sides by 0.85}
\end{aligned}

Aurelia's 400\text{ m} times have a standard deviation of 4 seconds.
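Rearranging the formula for the standard deviation gives \sigma=\dfrac{x-\mu}{z}. This only works when z\neq 0: if z=0, then x=\mu and the data value says nothing about the spread. A minimal Python sketch, with the hypothetical function name sigma_from_z:

```python
def sigma_from_z(x: float, mu: float, z: float) -> float:
    """Rearranged z-score formula: sigma = (x - mu) / z, valid only when z != 0."""
    if z == 0:
        raise ValueError("a z-score of 0 does not determine the standard deviation")
    return (x - mu) / z


aurelia_sigma = sigma_from_z(65.4, 62, 0.85)
print(round(aurelia_sigma, 2))  # 4.0
```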

c

Find the average 400\text{ m} sprint time for Mariana.

Worked Solution
Create a strategy

To find the mean, we can use the z-score formula with the given standard deviation, z-score, and practice time, then solve for the mean.

Apply the idea

From the given information, we know \sigma=2, z=-0.5, and x=59.5.

\begin{aligned}
z &= \dfrac{x-\mu}{\sigma} && \text{Formula for z-scores}\\
-0.5 &= \dfrac{59.5-\mu}{2} && \text{Substitute the known values}\\
-1 &= 59.5-\mu && \text{Multiply both sides by 2}\\
-60.5 &= -\mu && \text{Subtract 59.5 from both sides}\\
60.5 &= \mu && \text{Multiply both sides by } -1
\end{aligned}

Mariana's average 400\text{ m} time is 60.5 seconds.
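Rearranging the formula for the mean gives \mu=x-z\sigma. As a quick check, here is a minimal Python sketch (mean_from_z is a hypothetical name):

```python
def mean_from_z(x: float, z: float, sigma: float) -> float:
    """Rearranged z-score formula: mu = x - z * sigma."""
    return x - z * sigma


mariana_mean = mean_from_z(59.5, -0.5, 2)
print(mariana_mean)  # 60.5
```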

Reflect and check

Of the three sprinters, Mariana has the fastest average 400\text{ m} time (60.5 seconds), and her times are the most consistent because her standard deviation of 2 seconds is the smallest.

Example 3

An extreme amusement park ride only allows riders over 60 inches tall to ride. Colette was not allowed to ride because she did not meet the height requirement, but her younger brother Gavin was able to ride because he was taller than the height requirement. This led her to ask the question, "How do the heights of men compare to the heights of women?"

a

Describe a method Colette can use to collect data.

Worked Solution
Create a strategy

First, use Colette's statistical question to determine the type of data that needs to be collected. Then, consider whether the data can be collected by research, a survey, an observation, or a scientific experiment.

Apply the idea

Colette needs to collect data on the heights of men and women. Since most people know their heights, Colette can use a survey or poll to collect the data.

Reflect and check

Colette could also research the average heights of men and women. While researching, she would need to make sure that the sample is representative of the population, and the data collection process did not introduce bias.

b

The data Colette collected on the heights (in inches) of men and women are shown.

Female heights: 66, 61, 62, 64, 60, 62, 64, 63, 58, 64, 60, 68, 62, 59, 64, 60, 64, 66, 62, 62

Male heights: 71, 69, 71, 66, 69, 77, 74, 72, 75, 71, 68, 72, 70, 64, 73, 68, 66, 70, 67, 73

Use technology to create a smooth curve to model each distribution and describe the shape of each curve.

Worked Solution
Create a strategy

Using technology, we can follow these steps to create a smooth curve of the data:

  1. Enter the data into a single column using the GeoGebra Statistics calculator.

  2. Highlight the data and select One Variable Analysis.

  3. In the settings menu (represented by the gear icon), change the frequency type to Normalized. This will adjust the values on the y-axis to reflect a probability distribution.

  4. Check the box to show the normal curve. To see the smooth curve on its own, uncheck the histogram box.

Apply the idea

First, we will create the smooth curve that approximates the women's heights.

A screenshot of the GeoGebra statistics tool showing how to display the smooth curve that models a given set of data.

The curve is symmetric and bell-shaped, meaning the data is approximately normal.

Next, we will create the smooth curve that approximates the men's heights.

A screenshot of the GeoGebra statistics tool showing how to display the smooth curve that models a given set of data.

To see the full curve, we can adjust the settings by selecting the Graph tab and unchecking the automatic dimensions. Then, we can adjust the y-Max to 0.13, which will allow us to see the top of the curve.

A screenshot of the GeoGebra statistics tool showing how to adjust the scales used in plotting the smooth curve.

Again, the curve is symmetric and bell-shaped, meaning the data is approximately normal.

Reflect and check

Although both data sets are approximately normally distributed, they have different measures of center and spread. This means the curves have a similar bell shape, but they are centered around different values, and the curve with the smaller standard deviation is narrower and taller.
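As a cross-check on the curves, a Python sketch can compute each data set's mean and standard deviation and the height of the matching normal curve at its peak, 1/(\sigma\sqrt{2\pi}). This assumes the curve is built from the population standard deviation; GeoGebra may use the sample standard deviation instead, which would lower both peaks slightly.

```python
import math
from statistics import mean, pstdev

# Heights in inches, copied from Colette's table in part (b).
female = [66, 61, 62, 64, 60, 62, 64, 63, 58, 64,
          60, 68, 62, 59, 64, 60, 64, 66, 62, 62]
male = [71, 69, 71, 66, 69, 77, 74, 72, 75, 71,
        68, 72, 70, 64, 73, 68, 66, 70, 67, 73]


def peak_height(sigma: float) -> float:
    """Height of a normal density curve at its mean: 1 / (sigma * sqrt(2 * pi))."""
    return 1 / (sigma * math.sqrt(2 * math.pi))


for label, data in (("women", female), ("men", male)):
    mu = mean(data)
    sigma = pstdev(data)  # assumption: population standard deviation
    print(f"{label}: mean = {mu:.2f} in, SD = {sigma:.2f} in, "
          f"peak = {peak_height(sigma):.3f}")
```

Under this assumption the women's curve peaks at about 0.16 and the men's just under 0.13, which is consistent with the y-Max of 0.13 used above and with the narrower women's curve being the taller one.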

c

Answer the statistical question that Colette formulated.

Worked Solution
Create a strategy

Colette's statistical question was, "How do the heights of men compare to the heights of women?" We can answer this by analyzing the average heights of men and women, which are represented by the centers of the normal curves.

Apply the idea

Looking at the smooth curves from the previous part, we can see that the curve that approximates the women's heights is centered at around 63 inches, while the curve that approximates the men's heights is centered at around 70 inches.

This tells us that, on average, men are taller than women.

Reflect and check

Rather than using the curves to compare the means, we could have calculated the mean of each data set using technology, then compared the values.

d

Since both data sets are normally distributed, Colette wanted to further investigate men's and women's heights relative to the height requirement for the ride. Her new statistical question is, "How does the percentage of male riders who can ride this ride compare to the percentage of female riders who can ride?"

Find and interpret the z-scores for the 60-inch height requirement relative to the average American female heights and average American male heights.

Worked Solution
Create a strategy

The average heights of men and women are different, and so are the standard deviations of their heights. We can compare the heights of men and women by using z-scores to standardize the measurements.

To find the z-score, we can use the formula z=\dfrac{x-\mu}{\sigma} with the given height requirement. Then, we can use technology to find the mean and standard deviation of each set.

Apply the idea

By selecting "Show summary statistics" (\Sigma\text{x} icon), we can find the mean and standard deviation of women's heights.

A screenshot of the GeoGebra statistics tool showing how to display related statistics of a given set of data.

From this information, we see \mu\approx 62.5 and \sigma\approx 2.5, and we were given x=60.

The z-score for women's height is z=\dfrac{60-62.5}{2.5}=-1.

For women, the height restriction of 60 inches is only 1 standard deviation below the average female height.

Next, we will find the mean and standard deviation of men's heights.

A screenshot of the GeoGebra statistics tool showing how to display related statistics of a given set of data.

From this information, we see \mu\approx 70, and \sigma\approx 3.

The z-score for men's height is z=\dfrac{60-70}{3}\approx -3.33.

The height restriction of 60 inches is more than three standard deviations below the average male height.
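To see how much the rounding matters, a Python sketch can compute the z-scores for the 60-inch requirement from the unrounded means and standard deviations. It assumes the population standard deviation (pstdev); using the sample standard deviation would change the results slightly.

```python
from statistics import mean, pstdev

# Heights in inches, copied from Colette's table in part (b).
female = [66, 61, 62, 64, 60, 62, 64, 63, 58, 64,
          60, 68, 62, 59, 64, 60, 64, 66, 62, 62]
male = [71, 69, 71, 66, 69, 77, 74, 72, 75, 71,
        68, 72, 70, 64, 73, 68, 66, 70, 67, 73]
requirement = 60  # the ride's height requirement, in inches

z_by_group = {}
for label, data in (("women", female), ("men", male)):
    mu, sigma = mean(data), pstdev(data)  # unrounded statistics
    z_by_group[label] = (requirement - mu) / sigma

for label, z in z_by_group.items():
    print(f"{label}: z = {z:.2f}")
```

This gives about -1.04 for women and -3.22 for men. Both still support the conclusions above: the requirement is about 1 standard deviation below the women's mean and more than 3 standard deviations below the men's mean.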

Reflect and check

Notice that the means and standard deviations of the data sets were rounded to "nice" values, which makes the curves easier to sketch. However, we should not round this way if it would change the values by a relatively large amount.

We can use the normal distribution curves of the data to check our answers.

Two normal distribution curves plotted on a number line, labeled Women's heights and Men's heights.

The data value of 60 does lie 1 standard deviation below the mean of the women's heights and 3\frac{1}{3} standard deviations below the mean of the men's heights.

e

Compare the percentage of male riders who can ride this ride to the percentage of female riders who can ride.

Worked Solution
Create a strategy

Because we have the z-scores, we can graph the position of the height requirement relative to the men's heights and the women's heights on the same standard normal curve. Then we can use the empirical rule to find and compare the percentages.

Apply the idea
A standard normal curve with Probability density on the y-axis with numbers 0.0 to 0.4 and z score on the x axis with number negative 3 to 3. An arrow labeled men z equals negative 3.33 is pointed at the left side of the negative 3 mark on the x axis. Another arrow labeled women z equals negative 1 is pointed at the negative 1 mark on the x axis.

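Using the empirical rule, the area to the right of z=-1 is 34\%+50\%=84\%, so only about 84\% of women will be able to ride. The area to the right of z=-3 is 2.35\%+13.5\%+34\%+50\%=99.85\%, and z\approx -3.33 lies even farther to the left, so more than 99.85\% of men will be able to ride this ride.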
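The empirical rule only gives areas at whole numbers of standard deviations. As a cross-check, a Python sketch can compute the area to the right of each z-score under a standard normal model using the identity 1-\Phi(z)=\tfrac{1}{2}\operatorname{erfc}(z/\sqrt{2}) and the standard library function math.erfc. The men's value -10/3 is the unrounded form of -3.33 from part (d).

```python
import math


def fraction_above(z: float) -> float:
    """Proportion of a standard normal distribution above z, i.e. 1 - Phi(z).

    Phi(z) = 0.5 * (1 + erf(z / sqrt(2))), so 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)).
    """
    return 0.5 * math.erfc(z / math.sqrt(2))


women = fraction_above(-1)
men = fraction_above(-10 / 3)
print(f"women: {women:.2%}, men: {men:.2%}")
```

The result is about 84% for women and well above 99.85% for men, matching the empirical-rule estimates.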

Idea summary

Data that is normally distributed can be standardized using z-scores. This allows us to compare data sets that have different means and standard deviations.

To find the z-score of a data value, we must know the mean and standard deviation. If we know those values, we can use the following formula to find the equivalent z-score.

\displaystyle z=\dfrac{x-\mu}{\sigma}

where z is the z-score, x is the data value, \mu is the population mean, and \sigma is the population standard deviation.

Outcomes

A2.ST.1

The student will apply the data cycle (formulate questions; collect or acquire data; organize and represent data; and analyze data and communicate results) with a focus on univariate quantitative data represented by a smooth curve, including a normal curve.

A2.ST.1b

Collect or acquire univariate data through research, or using surveys, observations, scientific experiments, polls, or questionnaires.

A2.ST.1f

Calculate and interpret the z-score for a value in a data set.

A2.ST.1g

Compare two data points from two different distributions using z-scores.

A2.ST.1h

Determine the solution to problems involving the relationship of the mean, standard deviation, and z-score of a data set represented by a smooth or normal curve.

A2.ST.1i

Apply the Empirical Rule to answer investigative questions.
