NZ Level 7 (NZC) Level 2 (NCEA)

Applications of Standard Deviation Using a Sample

Lesson

*Standard deviation* is a measure commonly used to describe how widely the observations in a set of data are spread.

If most of the observations are clustered close to the mean, the standard deviation is small. If many observations are scattered far from the mean, the standard deviation is large.
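As a quick illustration, here is a minimal Python sketch comparing two small, made-up sets of scores (the lists `clustered` and `scattered` are hypothetical). Both have the same mean but a different spread. It uses `pstdev` from Python's standard `statistics` module, which divides by $n$ as in the population formula below.

```python
import statistics

# Hypothetical scores: both sets have a mean of 50.
clustered = [48, 49, 50, 50, 51, 52]   # observations close to the mean
scattered = [20, 35, 50, 50, 65, 80]   # observations far from the mean

for name, scores in [("clustered", clustered), ("scattered", scattered)]:
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # population standard deviation (divides by n)
    print(f"{name}: mean = {mean}, standard deviation = {sd:.2f}")
```

The clustered set gives a standard deviation of about $1.3$ and the scattered set about $19.4$, even though both means are $50$.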

Education authorities want to compare students' performances in different subjects within a school, and also between schools, even when the tests have different averages and differ in how well they distinguish between students of different abilities.

To do this, each score from a school test is expressed as the number of standard deviations it lies above or below the mean. These numbers are called *z*-scores. A student who achieves a *z*-score of $1.75$ has scored well above the class average, while a student with a *z*-score of $-0.62$ has scored below average.
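A *z*-score is calculated as $z=\frac{x-\text{mean}}{\text{standard deviation}}$. The Python sketch below assumes a hypothetical class mean of $60$ and standard deviation of $8$, chosen so that the two *z*-scores quoted above come out; the function name `z_score` is illustrative only.

```python
def z_score(score, mean, sd):
    """Number of standard deviations that `score` lies above (+) or below (-) the mean."""
    return (score - mean) / sd

# Hypothetical class results: mean 60, standard deviation 8.
print(z_score(74, 60, 8))     # 1.75
print(z_score(55.04, 60, 8))  # approximately -0.62
```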

When *z*-scores are used for a school test in this way, the standard deviation is calculated from the entire population of interest: every student who sat the test. A *population standard deviation* is denoted by the symbol $\sigma$.

We calculate $\sigma$ as follows.

Suppose the observed scores are $\left\{x_1,x_2,x_3,\ldots,x_n\right\}$. We begin by calculating the mean of the scores.

$\mu=\frac{1}{n}\sum_{i=1}^n x_i$

Then $\sigma$ is the square root of the average of the squared distances of the observations from the mean. That is,

$\sigma=\sqrt{\frac{1}{n}\sum_{i=1}^n\left(\mu-x_i\right)^2}$
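The two steps above (find $\mu$, then take the square root of the average squared distance from $\mu$) can be sketched in Python as follows. The function name `population_stats` and the data are hypothetical; the intermediate value is the population variance $\sigma^2$, which is described next.

```python
import math

def population_stats(scores):
    """Return (mu, variance, sigma) using the divide-by-n population formulas."""
    n = len(scores)
    mu = sum(scores) / n                               # mean of the scores
    variance = sum((mu - x) ** 2 for x in scores) / n  # average squared distance from mu
    sigma = math.sqrt(variance)                        # population standard deviation
    return mu, variance, sigma

# Hypothetical scores
print(population_stats([2, 4, 4, 4, 5, 5, 7, 9]))  # (5.0, 4.0, 2.0)
```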

Statistical calculations often use a quantity called the population *variance*. This is just the square of the standard deviation, and it is denoted $\sigma^2$.

In other situations, a researcher will infer information about a population by analysing a sample drawn randomly from the population.

The mean of a random sample drawn from a population is likely to be close to the population mean if the sample is large enough. However, if the spread of a sample is calculated with the formula above (dividing by $n$), it tends to underestimate the population standard deviation.
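This underestimate can be seen in a small simulation. The Python sketch below uses a hypothetical setup: the "population" is every whole-number score from $0$ to $100$ (population variance $850$), and samples of $5$ are drawn at random with replacement. For each sample it divides the sum of squared deviations from the sample mean by $n$ and by $n-1$, then averages each estimate over many samples. The function names are illustrative only.

```python
import random

def variance_estimates(sample):
    """Return (divide-by-n, divide-by-(n-1)) estimates of the variance."""
    n = len(sample)
    mean = sum(sample) / n
    squared_deviations = sum((x - mean) ** 2 for x in sample)
    return squared_deviations / n, squared_deviations / (n - 1)

def average_estimates(population, sample_size, trials, seed=1):
    """Average both variance estimates over many random samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_n = total_n_minus_1 = 0.0
    for _ in range(trials):
        sample = rng.choices(population, k=sample_size)  # random sample, with replacement
        by_n, by_n_minus_1 = variance_estimates(sample)
        total_n += by_n
        total_n_minus_1 += by_n_minus_1
    return total_n / trials, total_n_minus_1 / trials

population = list(range(101))  # hypothetical population of scores 0..100
true_variance = sum((x - 50) ** 2 for x in population) / len(population)  # 850.0
avg_by_n, avg_by_n_minus_1 = average_estimates(population, sample_size=5, trials=20000)
print(true_variance, round(avg_by_n), round(avg_by_n_minus_1))
```

Dividing by $n$ should average out at roughly $\frac{n-1}{n}\times 850=680$, while dividing by $n-1$ should land close to $850$.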

It is shown in more advanced statistics textbooks that, to get an unbiased estimate of the population variance $\sigma^2$ from a sample, it is necessary to divide by $n-1$ rather than by $n$ in the averaging operation (this is known as Bessel's correction). Taking the square root then gives the usual estimate of the population standard deviation. Also, we use the sample mean $\overline{x}$ as an estimator for the population mean $\mu$. (The sample mean is calculated in the same way as the population mean.)

Thus, we obtain a sample standard deviation $s$ with the formula

$s=\sqrt{\frac{1}{n-1}\sum_{i=1}^n\left(\overline{x}-x_i\right)^2}$
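Written out in Python, the sample formula differs from the population one only in the divisor. This is a minimal sketch; the function name `sample_sd` and the scores are hypothetical.

```python
import math

def sample_sd(scores):
    """Sample standard deviation s: divide the squared deviations by n - 1."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    x_bar = sum(scores) / n                                     # sample mean
    squared_deviations = sum((x_bar - x) ** 2 for x in scores)  # sum of (x_bar - x_i)^2
    return math.sqrt(squared_deviations / (n - 1))

# Hypothetical sample of scores
print(sample_sd([2, 4, 4, 4, 5, 5, 7, 9]))  # about 2.14
```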

Suppose the same mathematics test was administered to $10000$ Year $11$ students. Education authorities are interested in how well the test distinguished between different levels of ability among the students. They consider that a better test would have a wider spread of scores than an inferior one.

A random sample of $50$ student results is drawn from the $10000$. It is standard practice to use digital technology in some form to calculate the mean and the standard deviation of a set of scores. In doing so, it is important in this case to choose the *sample* standard deviation option rather than the population standard deviation.
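In Python's standard `statistics` module the two options are separate functions: `statistics.stdev` divides by $n-1$ (sample) and `statistics.pstdev` divides by $n$ (population). Spreadsheets make the same distinction with `STDEV.S` and `STDEV.P`. The five scores in this sketch are hypothetical.

```python
import math
import statistics

scores = [58, 64, 67, 71, 75]  # hypothetical sample of test scores (%)

s = statistics.stdev(scores)       # sample standard deviation (divides by n - 1): use this here
sigma = statistics.pstdev(scores)  # population standard deviation (divides by n)

n = len(scores)
print(f"sample s = {s:.2f}, population formula = {sigma:.2f}")
# The two results differ only by the factor sqrt((n - 1) / n).
print(math.isclose(sigma, s * math.sqrt((n - 1) / n)))  # True
```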

If the mean of the sample is $67\%$ and the sum of the squared differences from the mean is $1035$, the sample standard deviation is

$s=\sqrt{\frac{1}{50-1}\times1035}=\sqrt{\frac{1035}{49}}\approx4.6$
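The arithmetic can be checked in Python; the variable names are illustrative. The sample formula divides by $n-1=49$; dividing by $n=50$ is included only to show that it gives a slightly smaller value.

```python
import math

n = 50
sum_sq = 1035  # sum of squared differences from the sample mean (given)

s = math.sqrt(sum_sq / (n - 1))      # sample standard deviation
divide_by_n = math.sqrt(sum_sq / n)  # population formula, for comparison only
print(round(s, 2), round(divide_by_n, 2))  # 4.6 4.55
```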

Assuming that the $10000$ scores have an approximately normal distribution, we expect the majority of scores (about $68\%$) to lie within $1$ standard deviation of the mean, roughly between $62\%$ and $72\%$, and nearly all of them (about $99.7\%$) to lie within $3$ standard deviations of the mean, roughly between $53\%$ and $81\%$. This suggests that the test had enough easy questions for nearly all students to score above about $53\%$, but also some very hard questions, so that hardly anyone scored better than about $81\%$.
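The intervals and proportions can be sketched with `statistics.NormalDist` (Python 3.8 or later). This assumes, as the text does, that the scores are approximately normal and that the sample value $s\approx4.6$ is a reasonable estimate of $\sigma$.

```python
import math
from statistics import NormalDist

mean = 67
s = math.sqrt(1035 / 49)        # sample standard deviation from the worked example (about 4.6)
standard_normal = NormalDist()  # mean 0, standard deviation 1

for k in (1, 2, 3):
    lower, upper = mean - k * s, mean + k * s
    # Proportion of a normal distribution within k standard deviations of its mean
    proportion = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} sd: {lower:.1f}% to {upper:.1f}%, about {proportion:.1%} of scores")
```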

The authorities may have preferred a test that produced a wider spread of marks than this one did.

The mean of a set of scores is $77$ and the standard deviation is $29$. Find the value of:

$\text{Mean}-\text{Standard Deviation}$

$\text{Mean}+2\times\text{Standard Deviation}$

$\text{Mean}-\frac{2\times\text{Standard Deviation}}{3}$

$\text{Mean}+\frac{4\times\text{Standard Deviation}}{5}$

$\text{Mean}+3\times\text{Standard Deviation}$

$\text{Mean}-2\times\text{Standard Deviation}$
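Each expression above has the form $\text{Mean}+k\times\text{Standard Deviation}$ for some multiplier $k$ (negative for values below the mean, and $k=-\frac{2}{3}$ for the third one). A minimal Python sketch for checking your answers is below; the function name is hypothetical, and `Fraction` keeps the thirds and fifths exact.

```python
from fractions import Fraction

def mean_plus_k_sd(mean, sd, k):
    """Value lying k standard deviations from the mean (k < 0 means below the mean)."""
    return mean + k * sd

mean, sd = 77, 29
multipliers = [-1, 2, Fraction(-2, 3), Fraction(4, 5), 3, -2]  # one per expression, in order
for k in multipliers:
    value = mean_plus_k_sd(mean, sd, k)
    print(k, value, float(value))
```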

The following table shows the mean and standard deviation of the marks in two subjects.

Subject | Mean | Standard Deviation
---|---|---
Science | $89$ | $11$
English | $72$ | $12$

Find the mark in Science that is 2 standard deviations below the mean.

Find the mark in English that is 1.5 standard deviations above the mean.

Find the mark in Science that is 0.5 standard deviations above the mean.

Find the mark in English that is 1 standard deviation below the mean.

A batsman’s mean number of runs is $62$ and the standard deviation is $13$. In the next match he makes $50$ runs. If this score is added to the existing scores, which of the following is true of the new mean and standard deviation?

A. Mean $>62$, with standard deviation $<13$

B. Mean $>62$, with standard deviation $>13$

C. Mean $<62$, with standard deviation $<13$

D. Mean $<62$, with standard deviation $>13$
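One way to check your reasoning numerically is the incremental (Welford-style) update: the sum of squared deviations grows by $(x-\text{old mean})(x-\text{new mean})$ when a score $x$ is added. This sketch assumes the quoted standard deviation uses the divide-by-$n$ formula, and since the number of innings $n$ is not given, it tries a few hypothetical values. The function name `add_score` is illustrative only.

```python
import math

def add_score(n, mean, sd, x):
    """Update the mean and population standard deviation when one new score x is added.

    n, mean and sd describe the existing n scores (sd uses the divide-by-n formula).
    """
    sum_sq = n * sd ** 2                   # sum of squared deviations of the old scores
    new_mean = (n * mean + x) / (n + 1)
    sum_sq += (x - mean) * (x - new_mean)  # incremental (Welford-style) update
    return new_mean, math.sqrt(sum_sq / (n + 1))

# The number of innings n is not given, so try a few hypothetical values.
for n in (2, 10, 100):
    new_mean, new_sd = add_score(n, 62, 13, 50)
    print(n, round(new_mean, 2), round(new_sd, 2))
```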

S7-4 Investigate situations that involve elements of chance: (A) comparing theoretical continuous distributions, such as the normal distribution, with experimental distributions; (B) calculating probabilities, using such tools as two-way tables, tree diagrams, simulations, and technology.

Apply probability methods in solving problems