1. Inferential Statistics

Lesson

We've already been introduced to the normal distribution. In this chapter, we are going to focus on the standard normal distribution which is the simplest form of the normal distribution. It has three key features:

- its graph is bell shaped
- it has a mean, median and mode of $0$ (i.e. $\mu=0$)
- it has a standard deviation of $1$ (i.e. $\sigma=1$)

The standard normal distribution is of great interest because any data set with some other normal distribution can be rescaled so that it has the standard normal distribution. This means that we only need to know the properties of the standard normal distribution in order to make predictions about data from infinitely many other normal distributions.

The statistic known as a *z-score* is often used in relation to the normal probability distribution, although it can be used with other distributions. The process of obtaining a z-score is called standardization.

In this process, the raw scores are shifted by subtracting a fixed number, the mean, from all of them so that the new mean becomes zero. Then, each of these numbers is divided by the standard deviation so that the new standard deviation is one. Thus, each of the raw scores has a corresponding z-score and, taken as a whole, they have a mean of $0$ and a standard deviation of $1$.

The advantage of the standardization procedure is that scores of the same kind from different experiments can be compared.

To obtain a z-score, it is necessary to know the mean $\mu$ and the standard deviation $\sigma$ of a complete population. If $x$ is a raw score, the corresponding z-score is $z=\frac{x-\mu}{\sigma}$.

Each number in the data set is a single measurement of some feature of an experimental subject. (They might be test scores from a group of high school students, for example.) To standardize a set of scores we first calculate the mean and standard deviation of the raw data.

The following scores were obtained from a mathematics class.

$35,44,51,52,59,64,64,67,69,69,70,70,71,73,73,75,78,79,83,91$

The average is $66.85$ and the standard deviation is $13.02$ (the population standard deviation, treating the $20$ scores as the complete population).

We subtract the mean from each score:

| $-31.85$ | $-22.85$ | $-15.85$ | $-14.85$ | $-7.85$ |
| $-2.85$ | $-2.85$ | $0.15$ | $2.15$ | $2.15$ |
| $3.15$ | $3.15$ | $4.15$ | $6.15$ | $6.15$ |
| $8.15$ | $11.15$ | $12.15$ | $16.15$ | $24.15$ |

We divide each of these numbers by the standard deviation:

| $-2.45$ | $-1.75$ | $-1.22$ | $-1.14$ | $-0.60$ |
| $-0.22$ | $-0.22$ | $0.01$ | $0.17$ | $0.17$ |
| $0.24$ | $0.24$ | $0.32$ | $0.47$ | $0.47$ |
| $0.63$ | $0.86$ | $0.93$ | $1.24$ | $1.85$ |

These are the standardized scores. A negative z-score is below the average and a positive z-score is above.
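The standardization steps above can be sketched in Python. This is a minimal illustration, assuming the class is treated as the complete population (so `pstdev`, which divides by $n$, is used rather than the sample version); the variable names are illustrative only.

```python
from statistics import fmean, pstdev

# Raw scores from the mathematics class.
scores = [35, 44, 51, 52, 59, 64, 64, 67, 69, 69,
          70, 70, 71, 73, 73, 75, 78, 79, 83, 91]

# Population mean and population standard deviation (divides by n, not n - 1).
mu = fmean(scores)
sigma = pstdev(scores)

# Standardize: subtract the mean, then divide by the standard deviation.
z_scores = [(x - mu) / sigma for x in scores]

print(round(mu, 2), round(sigma, 2))
print([round(z, 2) for z in z_scores])
```

Taken together, the resulting z-scores have a mean of $0$ and a standard deviation of $1$, up to floating-point rounding.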

If a particular set of scores is from a normal probability distribution, then approximately 68% of the observations will be within one standard deviation from the mean, 95% of the observations will be within 2 standard deviations of the mean and 99.7% will be closer than 3 standard deviations from the mean. The observations tend to be densest near the mean and the density falls off with distance from the mean. The typical density curve has a shape as in the following diagram.

If the scores above really are from a normal distribution, we would expect $68\%$ of the z-scores to lie somewhere between $-1$ and $1$. In fact, there are $14$ in this region, which is $70\%$ of $20$ and close to the expected $68\%$. Only $1$ of the z-scores is further than $2$ standard deviations from the mean, which is $5\%$ of $20$ and in line with the roughly $5\%$ expected from a normal distribution.
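This comparison can be checked with a short sketch that counts how many z-scores fall within $1$, $2$ and $3$ standard deviations of the mean and sets those counts beside the proportions given by the standard normal model. It assumes Python 3.8 or later for `statistics.NormalDist`; the dictionary names are illustrative.

```python
from statistics import NormalDist, fmean, pstdev

scores = [35, 44, 51, 52, 59, 64, 64, 67, 69, 69,
          70, 70, 71, 73, 73, 75, 78, 79, 83, 91]

mu, sigma = fmean(scores), pstdev(scores)
z_scores = [(x - mu) / sigma for x in scores]

std_normal = NormalDist()  # mean 0, standard deviation 1

# Number of observations with |z| < k, for k = 1, 2, 3.
observed = {k: sum(abs(z) < k for z in z_scores) for k in (1, 2, 3)}

# Proportion the standard normal model places within k standard deviations.
expected = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k in (1, 2, 3):
    print(f"within {k} SD: {observed[k]} of {len(scores)} observed, "
          f"model proportion {expected[k]:.4f}")
```

With only $20$ observations, small departures from the model proportions are to be expected, so agreement of this kind is suggestive rather than conclusive.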

If a student is chosen at random from the mathematics class above, what is the probability that the student had a z-score greater than $1$ in the test?

In a normal distribution, $100\%-68\%=32\%$ of scores are more than $1$ standard deviation from the mean. Of these, half, or $16\%$, should be above the mean. So, the probability is $16\%$ or $0.16$ that the chosen student has a z-score of more than $1$.
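As a rough check on this estimate, the sketch below compares the value obtained from the $68\%$ rule with the value computed from the standard normal cumulative distribution function. It assumes Python 3.8 or later for `statistics.NormalDist`.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Estimate from the 68% rule: half of the remaining 32% lies above z = 1.
p_rule = (1 - 0.68) / 2

# Value from the standard normal CDF: P(Z > 1) = 1 - P(Z <= 1).
p_exact = 1 - std_normal.cdf(1)

print(f"rule of thumb: {p_rule:.2f}, normal CDF: {p_exact:.4f}")
```

The two values agree to two decimal places, which is as much precision as the rounded $68\%$ figure supports.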

This probability estimate should be treated with some caution, however, because the data may not be strictly normally distributed. Notice, for example, that there are more positive z-scores than negative. In a normal distribution, the observations would be arranged more symmetrically about the mean. Since there are more positive z-scores than negative in the data, it may be that the probability we seek is a little higher than the calculated $0.16$.

Tables are available to determine probabilities associated with non-whole number z-scores from the standard normal distribution. Using these, one can answer questions like, "What is the probability that an observed value will be at least $z$?" or "What is the probability that an observation will fall between $0$ and $z$?"
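The sketch below shows how a table entry giving the area between $0$ and $z$ relates to the other probabilities. It uses `statistics.NormalDist` (Python 3.8 or later) in place of a printed table; the helper `area_zero_to` and the value $z=1.25$ are illustrative choices, not taken from the lesson.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def area_zero_to(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return std_normal.cdf(z) - 0.5


z = 1.25  # illustrative z-score

table_value = area_zero_to(z)

# Half of the total area lies below 0, so add it for "less than z".
p_below = 0.5 + table_value

# Half of the total area lies above 0, so subtract for "at least z".
p_at_least = 0.5 - table_value

print(f"area from 0 to {z}: {table_value:.4f}")
print(f"P(Z < {z}) = {p_below:.4f}, P(Z >= {z}) = {p_at_least:.4f}")
```

For a negative z-score, the symmetry of the curve means the same table entry for $|z|$ can be used with the roles of "below" and "above" swapped.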

The table below shows the area under the standard normal curve between $0$ and a given $z$-score. Use this table to find the probability that a variable has a $z$-score less than $0.85$.

Give your answer to four decimal places.

A sprinter is training for a national competition. She runs 400 m in an average time of $75$ seconds, with a standard deviation of $6$ seconds.

Use the table below showing the area under the standard normal curve between $0$ and a given $z$-score to answer the following questions.

- Determine the $z$-score of a time of $67$ seconds. Round your answer to two decimal places.
- Find $P(X<67)$. Round your answer to four decimal places.
- The value $0.0918$ represents the probability that:
  - A. The sprinter will run 400 m in less than $67$ seconds.
  - B. The sprinter will run 400 m in exactly $67$ seconds.
  - C. The sprinter will run 400 m in more than $67$ seconds.
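The calculation behind the value $0.0918$ can be sketched as follows, assuming the sprinter's times are normally distributed and using `statistics.NormalDist` (Python 3.8 or later) in place of the table. The variable names are illustrative.

```python
from statistics import NormalDist

mean_time = 75  # seconds
sd_time = 6     # seconds

# z-score of a 67-second run, rounded to two decimals as a table lookup requires.
z = round((67 - mean_time) / sd_time, 2)

# Probability of a z-score below the rounded value.
p_less = NormalDist().cdf(z)

# Same probability without rounding z first; this differs slightly.
p_unrounded = NormalDist(mean_time, sd_time).cdf(67)

print(z, round(p_less, 4), round(p_unrounded, 4))
```

Rounding the z-score before the lookup is what reproduces the table-based value; software that works with the unrounded z-score gives a slightly different fourth decimal place.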

The mean height of an adult male is $1.78$ m, with a standard deviation of $9$ cm.

- Determine the $z$-score of a height of $1.69$ m.
- If $700$ males are chosen at random, approximately how many males will be taller than $1.69$ m? Round your answer to the nearest whole number of people.

Use the examples and videos below to work through how to obtain normal distribution probabilities with your graphing or scientific calculator. You may need to do a quick search or receive some guidance on where to find the normal distribution functions on your particular model of calculator.

Using your calculator, find the area under the normal curve between $z=-1.23$ and $z=-1.55$.

Give your answer to four decimal places.

Using your calculator, find the probability that a $z$-score is at most $1.60$ given that it is greater than $-0.69$ in the standard normal distribution.

Give your answer correct to $4$ decimal places.
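For readers without a graphing calculator, the two calculator exercises above can be approximated in Python. This is a sketch only: the helper names `prob_between` and `prob_at_most_given_greater` are hypothetical, and the conditional probability relies on the identity $P(Z\le b \mid Z>a)=\frac{P(a<Z\le b)}{P(Z>a)}$.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def prob_between(lower, upper):
    """P(lower < Z < upper) for a standard normal Z."""
    return std_normal.cdf(upper) - std_normal.cdf(lower)


def prob_at_most_given_greater(upper, lower):
    """P(Z <= upper | Z > lower) = P(lower < Z <= upper) / P(Z > lower)."""
    return prob_between(lower, upper) / (1 - std_normal.cdf(lower))


# Area under the curve between z = -1.55 and z = -1.23.
area = prob_between(-1.55, -1.23)

# Probability that z is at most 1.60, given that it is greater than -0.69.
conditional = prob_at_most_given_greater(1.60, -0.69)

print(round(area, 4), round(conditional, 4))
```

`prob_between` plays the same role as the lower-bound and upper-bound normal distribution function found on many calculators, so the results can be compared directly against a calculator's output.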

Use the mean and standard deviation of a data set to fit it to a normal distribution and to estimate population percentages. Recognize that there are data sets for which such a procedure is not appropriate. Use calculators, spreadsheets, and tables to estimate areas under the normal curve.

Use data from a sample survey to estimate a population mean or proportion; develop a margin of error through the use of simulation models for random sampling.

Use data from a randomized experiment to compare two treatments; use simulations to decide if differences between parameters are significant.