1. Inferential Statistics

Lesson

We've already been introduced to the normal distribution and its symmetrical bell shape. Now we are going to delve further into how scores are spread across this curve.

The empirical rule, also known as the 68-95-99.7% rule, is an estimate of the spread of data in a normal distribution. As a general rule, almost all scores lie within three standard deviations of the mean. More specifically:

- $68\%$ of scores lie within $1$ standard deviation of the mean.

- $95\%$ of scores lie within $2$ standard deviations of the mean.

- $99.7\%$ of scores lie within $3$ standard deviations of the mean.
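The three percentages in the rule are rounded values of areas under the normal curve. As a rough check, the sketch below computes those areas with Python's standard `math.erf` function. The helper name `fraction_within` is our own choice for illustration and is not part of the lesson.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Compare the computed areas with the rounded values in the empirical rule.
for k in (1, 2, 3):
    print(f"within {k} SD: {100 * fraction_within(k):.2f}%")
```

The printed values are slightly different from $68\%$, $95\%$ and $99.7\%$, which is why the empirical rule is described as an estimate.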

Since our normal distribution is so beautifully symmetrical, we can actually divide these regions up further. For example, as $95\%$ of scores lie within $2$ standard deviations of the mean, $47.5\%$ (half of $95\%$) of scores will lie between the mean and $2$ standard deviations above the mean, as shown in the picture below.

This same principle applies for any of the empirical rule values and we can use this information to work out the spread of scores. For example, we can say $81.5\%$ ($68\%+13.5\%$) of scores lie between $2$ standard deviations below and $1$ standard deviation above the mean.
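One way to organize this reasoning is to split the curve into one-standard-deviation bands and add up the bands you need. The sketch below does this in Python. The band values are derived from the empirical rule and symmetry (for example, $(95\%-68\%)\div2=13.5\%$ and $(99.7\%-95\%)\div2=2.35\%$), and the names `BANDS` and `percent_between` are our own, chosen for illustration.

```python
# Approximate percentage of scores in each one-standard-deviation band,
# derived from the 68-95-99.7 rule and the symmetry of the curve.
# Keys are (lower, upper) bounds measured in standard deviations from the mean.
BANDS = {
    (-3, -2): 2.35,
    (-2, -1): 13.5,
    (-1, 0): 34.0,
    (0, 1): 34.0,
    (1, 2): 13.5,
    (2, 3): 2.35,
}

def percent_between(lower, upper):
    """Add the bands between two whole-number standard deviation marks (-3 to 3)."""
    return sum(pct for (lo, hi), pct in BANDS.items() if lower <= lo and hi <= upper)

print(percent_between(-2, 1))  # 13.5 + 34 + 34 = 81.5
print(percent_between(0, 2))   # 34 + 13.5 = 47.5
```

This only works for endpoints that fall on whole numbers of standard deviations, which is all the empirical rule covers.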


Watch out!

Because the normal distribution is bell shaped, scores are not spread evenly across the curve. Regions of the same width do not necessarily contain the same percentage of scores, so the percentage amounts can't be transferred between regions.

For example, $68\%$ of scores lie between $1$ standard deviation below and $1$ standard deviation above the mean. However, only $47.5\%$ of scores lie between $2$ standard deviations below the mean and the mean itself, even though both regions are $2$ standard deviations wide.

Standard deviation is a measure of spread that we can apply to everyday contexts. For example, let's say the mean score in a test was $67$ and the standard deviation was $7$ points. This means that:

- a person who was one standard deviation above the mean would have received a score of $74$ (as this is $67+7$).
- a person who was two standard deviations below the mean would have received a score of $53$ (as this is $67-2\times7$).

Remember!

If the score is above the mean, you add the standard deviations to the mean.

If the score is below the mean, you subtract the standard deviations from the mean.
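The add-or-subtract idea can be written as a single calculation if we treat "below the mean" as a negative number of standard deviations. Here is a minimal Python sketch using the test example above (mean $67$, standard deviation $7$). The function name `score_at` is hypothetical, chosen only for illustration.

```python
def score_at(mean, sd, k):
    """Score that sits k standard deviations from the mean (negative k means below)."""
    return mean + k * sd

print(score_at(67, 7, 1))   # one standard deviation above the mean: 74
print(score_at(67, 7, -2))  # two standard deviations below the mean: 53
```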

The grades in a test are approximately normally distributed. The mean mark is $60$ with a standard deviation of $2$.

Between which two scores does approximately $68\%$ of the results lie symmetrically about the mean? Write both scores on the same line, separated by a comma.

Between which two scores does approximately $95\%$ of the results lie symmetrically about the mean? Write both scores on the same line, separated by a comma.

Between which two scores does approximately $99.7\%$ of the results lie symmetrically about the mean? Write both scores on the same line, separated by a comma.

The following figure shows the approximate percentage of scores lying within various standard deviations from the mean of a normal distribution. The heights of $600$ boys are found to approximately follow such a distribution, with a mean height of $145$ cm and a standard deviation of $20$ cm. Find the number of boys with heights between:

$125$ cm and $165$ cm

$105$ cm and $185$ cm

$85$ cm and $205$ cm (to the nearest whole number)

$145$ cm and $165$ cm

$165$ cm and $185$ cm (to the nearest whole number)

In a normal distribution, what percentage of scores lie between $2$ standard deviations below and $3$ standard deviations above the mean? Use the empirical rule to find your answer.

As we discovered previously, the shape or spread of a normal distribution is affected by the standard deviation, which varies depending on the data set. Just like in every branch of mathematics, to directly compare multiple normally distributed data sets, we need a common unit of measurement. In statistics involving the normal distribution, we use the number of standard deviations away from the mean as a standardized unit of measurement called a $z$-score.

As mentioned above, a $z$-score is a value that shows how many standard deviations a score is above or below the mean. In other words, it's indicative of how an individual's score deviates from the population mean, as shown in the picture below.

- A positive $z$-score indicates the score was above the mean.
- A $z$-score of $0$ indicates the score was equal to the mean.
- A negative $z$-score indicates the score was below the mean.

Careful!

What's really important to remember is that $z$-scores can only be defined if the population parameters (i.e. the mean and standard deviation of the population) are known.

Remember a "population" just means every member of a group is counted. It doesn't have to be people. For example, it may be the Australian population, all the students in Year 10 in a school or all the chickens on a farm.

$z$-scores are used to compare various normally distributed data sets. For example, let's say Sam got $75$ on his biology exam and $80$ on his chemistry exam. At first glance, it would seem that he did better on his chemistry exam. However, then he received this information from his teacher:

| | Mean | S. D. |
|---|---|---|
| Chemistry | 75 | 6 |
| Biology | 70 | 3 |

What does this mean for Sam?

To really understand how Sam performed in his exams, we need to calculate his z-score for both of them. Let's do that now!

There is a formula we can use for calculating the $z$-score of a score from a population.

Formula for calculating z-scores from a population

$z=\frac{x-\mu}{\sigma}$

This means:

$\text{standardized z score}=\frac{\text{raw score}-\text{population mean score}}{\text{standard deviation}}$
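As a small illustration, the formula translates directly into code. The sketch below is in Python and uses made-up test data for demonstration: a mean of $67$ and a standard deviation of $7$. The function name `z_score` is our own choice.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that the raw score x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - mu) / sigma

# Assumed example: a test with mean 67 and standard deviation 7.
print(z_score(74, 67, 7))  # above the mean, so positive: 1.0
print(z_score(67, 67, 7))  # equal to the mean: 0.0
print(z_score(53, 67, 7))  # below the mean, so negative: -2.0
```

Rearranging the same formula gives $x=\mu+z\sigma$, which is useful when you know a $z$-score and want to recover the raw score.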

So let's start by calculating Sam's $z$-score for biology:

$$\begin{aligned}
z &= \frac{x-\mu}{\sigma} \\
&= \frac{75-70}{3} \\
&= 1.6666\ldots \\
z &= 1.67 \text{ (to 2 d.p.)}
\end{aligned}$$

This means he is $1.67$ standard deviations above the mean in biology.

Now let's calculate his $z$-score for chemistry:

$$\begin{aligned}
z &= \frac{x-\mu}{\sigma} \\
&= \frac{80-75}{6} \\
&= 0.8333\ldots \\
z &= 0.83 \text{ (to 2 d.p.)}
\end{aligned}$$

This means he is $0.83$ standard deviations above the mean in chemistry.

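His $z$-score for biology was twice what it was for chemistry! So although his raw mark was higher in chemistry, he performed better in biology relative to the rest of the class. We'll learn more about comparing scores later, but if you think about the empirical rule, this is a significant difference.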
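The comparison can also be carried out in a few lines of code. This Python sketch uses Sam's scores and the class means and standard deviations from the table above. The dictionary layout and the names `exams` and `z_score` are our own, chosen for illustration.

```python
# Sam's results, with each exam's class mean and standard deviation.
exams = {
    "Biology": {"score": 75, "mean": 70, "sd": 3},
    "Chemistry": {"score": 80, "mean": 75, "sd": 6},
}

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

z_scores = {name: z_score(e["score"], e["mean"], e["sd"]) for name, e in exams.items()}

for name, z in z_scores.items():
    print(f"{name}: z = {z:.2f}")

# The exam with the larger z-score is the stronger result relative to the class.
best = max(z_scores, key=z_scores.get)
print("Better relative performance:", best)
```

Comparing $z$-scores rather than raw marks is what puts both exams on the same scale.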

A general ability test has a mean score of $100$ and a standard deviation of $15$.

If Paul received a score of $102$ in the test, what was his $z$-score correct to two decimal places?

If Georgia had a $z$-score of $3.13$, what was her score in the test, correct to the nearest integer?

Kathleen scored $83.4$ in her Biology exam, in which the mean score and standard deviation were $81$ and $2$ respectively. She also scored $60$ in her Geography exam, in which the mean score was $46$ and the standard deviation was $4$.

Find Kathleen’s $z$-score in Biology. Give your answer to one decimal place if needed.

Find Kathleen’s $z$-score in Geography. Give your answer to one decimal place if needed.

Which exam did Kathleen do better in?

A. Biology

B. Geography

The number of runs scored by Maximilian in each of his innings is listed below.

$34,33,31,33,32,32,33,31,33,33$

What was his batting average correct to two decimal places?

What was his (sample) standard deviation correct to two decimal places?

What was the $z$-score of his final innings score, correct to two decimal places?

What was the $z$-score of his greatest score, correct to two decimal places?

When you get your grades back for an assessment, I bet you often ask your teacher for the average, or the mean grade. Let's take a look now at why it would also be beneficial for you to ask your teacher for the standard deviation of the results as well.

Calculating z-scores from a population

$z=\frac{x-\mu}{\sigma}$

This means:

$\text{standardized z score}=\frac{\text{raw score}-\text{population mean score}}{\text{standard deviation}}$

We also spoke a little about how you can use them to compare results. Let's delve into this a bit further.

Let's say that you have completed two math tests this year. In each test you scored $70\%$. In each test the mean mark was $60\%$.

At first glance it appears that you performed just as well in both tests when comparing yourself to the mean of the group. The standard deviation of each test gives a truer picture of your performance.

Let's say that the standard deviation for Test $1$ was $10\%$ and the standard deviation for Test $2$ was $15\%$.

We can see that for Test $1$ you achieved a mark one whole standard deviation above the mean. Well done!

For Test $2$, however, you achieved a mark less than one whole standard deviation above the mean, which indicates that your achievement in Test $2$ is not as strong as your achievement in Test $1$.

We can calculate the $z$-scores to gain a little more insight.

$z$-score for Test $1$:

$$\begin{aligned}
z &= \frac{x-\mu}{\sigma} \\
&= \frac{70-60}{10} \\
&= 1
\end{aligned}$$

$z$-score for Test $2$:

$$\begin{aligned}
z &= \frac{x-\mu}{\sigma} \\
&= \frac{70-60}{15} \\
&= 0.66666\ldots \\
z &= 0.67 \text{ (to 2 decimal places)}
\end{aligned}$$

Our calculations support our observations about the achievement in both math tests.

Marge scored $43$ in her Mathematics exam, in which the mean score was $49$ and the standard deviation was $5$. She also scored $92.2$ in her Philosophy exam, in which the mean score was $98$ and the standard deviation was $2$.

Find Marge’s $z$-score in Mathematics.

Find Marge’s $z$-score in Philosophy.

Which exam did Marge do better in, compared to the rest of her class?

A. Philosophy

B. Mathematics

A factory packages two types of cereal: Rainbow Crispies in a $600$ g box and Honey Combs in a $650$ g box. A box of Rainbow Crispies has a mean mass of $600$ g with a standard deviation of $2.2$ g. A box of Honey Combs has a mean mass of $650$ g with a standard deviation of $1.4$ g.

A box of Rainbow Crispies was selected at random for quality control. It had a mass of $613.2$ g. Calculate the $z$-score of this box.

A box of Honey Combs was selected at random for quality control. It had a mass of $653.08$ g. Calculate the $z$-score of this box.

Based on the $z$-scores, which box of cereal is closer to its marked mean mass?

A. Honey Combs

B. Rainbow Crispies

Use the mean and standard deviation of a data set to fit it to a normal distribution and to estimate population percentages. Recognize that there are data sets for which such a procedure is not appropriate. Use calculators, spreadsheets, and tables to estimate areas under the normal curve.

Use data from a sample survey to estimate a population mean or proportion; develop a margin of error through the use of simulation models for random sampling.

Use data from a randomized experiment to compare two treatments; use simulations to decide if differences between parameters are significant.