NZ Level 7 (NZC) Level 2 (NCEA)

Standard Deviation using sample population

Lesson

When we wish to know certain statistics to do with large numbers of subjects, usually whole populations of some sort, it can be too expensive or it may be impossible for other reasons to collect data on every subject. For this reason, techniques have been developed to make it possible to estimate the population statistics on the basis of random samples drawn from the whole population.

In the case of the statistic we call *standard deviation*, we use a modified version of the formula given previously for *population standard deviation* to produce an estimate which we call the *sample standard deviation*.

In the following formula, the numbers $x_i$ are the values in the sample data. There is one value for each subscript $i$.

There are $n$ numbers $x_i$ in the sample, so $i$ goes from $1$ to $n$ in the summation.

The symbol $\overline{x}$ (pronounced '$x$ bar') is the sample mean.

The letter $s$ is used for the sample standard deviation and $s^2$ is used for the sample variance.

The symbol $\Sigma$ (upper case sigma) is the summation symbol.

The formula below for the sample variance $s^2$ is the one a calculator uses; taking its square root gives the sample standard deviation $s$. It differs from the population formula in having a smaller divisor, $(n-1)$ instead of $n$. This makes the estimate slightly larger than it would otherwise be. The change is needed to make $s^2$ what is called an *unbiased* estimator of the population variance $\sigma^2$.

$s^2=\frac{1}{n-1}\Sigma\left(x_i-\overline{x}\right)^2$
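To see the effect of the smaller divisor, the short Python sketch below computes the sum of squared deviations for a small made-up sample (the values are illustrative only, not from this lesson) and divides it once by $n$ and once by $n-1$. Both results are cross-checked against `statistics.pvariance` and `statistics.variance` from Python's standard library, which use those two divisors.

```python
import statistics

# Hypothetical sample, chosen so the arithmetic is easy to check by hand
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)
mean = sum(sample) / n                       # x-bar = 5.0
ss = sum((x - mean) ** 2 for x in sample)    # sum of squared deviations = 32.0

divide_by_n = ss / n                         # population-style divisor: 32 / 8
divide_by_n_minus_1 = ss / (n - 1)           # sample variance s^2: 32 / 7

# The standard library uses the same two divisors
assert divide_by_n == statistics.pvariance(sample)
assert abs(divide_by_n_minus_1 - statistics.variance(sample)) < 1e-12

print(f"divide by n:     {divide_by_n:.4f}")
print(f"divide by n - 1: {divide_by_n_minus_1:.4f}")
```

The $n-1$ version is always the larger of the two whenever the values are not all equal, which is the "slightly larger" estimate described above.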

As before, there are several steps in the calculation. It is usually best to automate the process with the help of a calculator or with a computer application.

STEPS

- Calculate the sample mean. $\overline{x}=\frac{1}{n}\Sigma_{i=1}^n\ x_i$
- Find the difference from the mean for each score. $x_i-\overline{x}$
- Square each of the differences. $\left(x_i-\overline{x}\right)^2$
- Sum the squared differences. $\Sigma\left(x_i-\overline{x}\right)^2$
- Divide the sum by one less than the number of scores. $\frac{1}{n-1}\Sigma\left(x_i-\overline{x}\right)^2$
- Take the square root. $s=\sqrt{\frac{1}{n-1}\Sigma\left(x_i-\overline{x}\right)^2}$
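The six steps can be followed one at a time in code. This is a minimal Python sketch; the function name `sample_standard_deviation` and the data are hypothetical, and the result is cross-checked against the standard library's `statistics.stdev`, which also divides by $n-1$.

```python
import math
import statistics

def sample_standard_deviation(scores):
    """Compute s by following the six steps in order."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores (the divisor is n - 1)")
    mean = sum(scores) / n                      # step 1: sample mean
    differences = [x - mean for x in scores]    # step 2: x_i - x-bar for each score
    squared = [d ** 2 for d in differences]     # step 3: square each difference
    total = sum(squared)                        # step 4: sum the squared differences
    variance = total / (n - 1)                  # step 5: divide by n - 1, giving s^2
    return math.sqrt(variance)                  # step 6: square root, giving s

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
s = sample_standard_deviation(sample)
print(round(s, 4))

# Cross-check against the standard library's sample standard deviation
assert math.isclose(s, statistics.stdev(sample))
```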

Statistical displays, including histograms, dot plots, stem-and-leaf plots and others, are developed from the sample data. The same raw data is used to calculate statistics including the sample mean, the sample standard deviation and also the range, the median, the quartiles and the interquartile range.

Before calculators with statistical functions became readily available, it was sometimes necessary to derive these quantities from a histogram or some other display. Techniques to do this are still sometimes taught although little used in practice.

In the case of a histogram, we can read off the number of observations in each class (histogram column) and use the class centre as an estimate of the average value for that class. Multiplying the number of observations by the class centre gives the approximate total for the class. Summing these totals over all classes gives the approximate total for all observations in the data set. The mean is then this total divided by the total number of observations.
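A sketch of this histogram method in Python, assuming the histogram is described by hypothetical class boundaries and frequencies (not taken from this lesson). Because every value in a class is replaced by its class centre, the result is only an estimate of the true mean.

```python
# Hypothetical histogram: (lower boundary, upper boundary, number of observations)
classes = [
    (0, 10, 4),
    (10, 20, 7),
    (20, 30, 6),
    (30, 40, 3),
]

def estimate_mean(classes):
    """Estimate the mean from grouped data using class centres."""
    total_count = 0
    approx_total = 0.0
    for lower, upper, freq in classes:
        centre = (lower + upper) / 2      # class centre stands in for every value in the class
        approx_total += freq * centre     # approximate total for this class
        total_count += freq
    return approx_total / total_count     # approximate grand total / number of observations

print(estimate_mean(classes))   # 19.0
```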

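The median can be found from a *cumulative frequency histogram* by finding the value on the horizontal axis that corresponds to the point on the inscribed polygon (the cumulative frequency polygon drawn on the histogram) that is half-way up the vertical range, that is, at half the total number of observations.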
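Reading the median off the polygon amounts to linear interpolation along one of its line segments. The Python sketch below assumes the polygon starts at zero at the lower boundary of the first class and then joins the top-right corner of each column; the function name `median_from_ogive` and the data are hypothetical, and a reading taken from a hand-drawn graph may differ slightly.

```python
from itertools import accumulate

def median_from_ogive(boundaries, frequencies):
    """Estimate the median from a cumulative frequency polygon.

    boundaries:  class boundaries b0 < b1 < ... < bk (k + 1 values)
    frequencies: number of observations in each of the k classes
    Assumes the polygon runs from (b0, 0) through (b_j, cumulative frequency
    up to class j), i.e. the top-right corner of each column.
    """
    cumulative = list(accumulate(frequencies))
    points = [(boundaries[0], 0)] + list(zip(boundaries[1:], cumulative))
    half = cumulative[-1] / 2                     # half-way up the vertical range
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < y1 and y0 <= half <= y1:          # this segment crosses the half-way line
            # go across to the polygon, then down to the horizontal axis
            return x0 + (half - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("polygon never reaches half the total frequency")

# Hypothetical histogram with classes 0-10, 10-20, 20-30, 30-40
print(median_from_ogive([0, 10, 20, 30, 40], [4, 8, 6, 2]))   # 17.5
```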

In a sample of scores it was found that:

- $10$ subjects had the value $23$,
- $13$ subjects had the value $24$,
- $12$ subjects had the value $25$,
- $8$ subjects had the value $26$, and
- $3$ subjects had the value $27$.

The following statistics are needed: range, mean, sample standard deviation, median, first and third quartiles, interquartile range.

The smallest score is $23$ and the largest $27$. Therefore the range is $27-23=4$.

There are $10+13+12+8+3=46$ subjects and their scores come to

$10\times23+13\times24+12\times25+8\times26+3\times27=1131$.

Therefore, the mean is $\frac{1131}{46}\approx24.59$.

By spreadsheet: $s\approx1.2$.
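The spreadsheet value can be reproduced in Python by expanding the frequency table into the full list of $46$ scores and passing it to `statistics.stdev`, which uses the $n-1$ divisor. A minimal sketch that also recomputes the range and mean:

```python
import statistics

# Frequency table from the example: value -> number of subjects
frequencies = {23: 10, 24: 13, 25: 12, 26: 8, 27: 3}

# Expand the table into the raw list of scores
scores = [value for value, count in frequencies.items() for _ in range(count)]

n = len(scores)                          # number of subjects
data_range = max(scores) - min(scores)   # largest minus smallest
mean = sum(scores) / n                   # total of scores / number of subjects
s = statistics.stdev(scores)             # sample standard deviation (divisor n - 1)

print(n, data_range, round(mean, 2), round(s, 2))   # 46 4 24.59 1.2
```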

With $46$ scores, the median is the average of the $23$rd and $24$th scores, $24$ and $25$, which is $24.5$.

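By spreadsheet (these values match the exclusive quartile method, as used by the QUARTILE.EXC function; other quartile methods can give slightly different values):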

$Q_1=24$

$Q_3=25.25$

$Q_3-Q_1=25.25-24=1.25$
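Spreadsheets and calculators do not all use the same rule for quartiles. The quartiles above agree with the *exclusive* method, which is the default in Python's `statistics.quantiles`; the sketch below also shows the *inclusive* method, which gives $Q_3=25$ for these data.

```python
import statistics

# Frequency table from the example, expanded into a sorted list of 46 scores
frequencies = {23: 10, 24: 13, 25: 12, 26: 8, 27: 3}
scores = sorted(value for value, count in frequencies.items() for _ in range(count))

median = statistics.median(scores)              # average of the 23rd and 24th scores
q1, q2, q3 = statistics.quantiles(scores, n=4)  # default method="exclusive"
iqr = q3 - q1

print(median, q1, q3, iqr)   # 24.5 24.0 25.25 1.25

# The inclusive method interpolates at different positions
print(statistics.quantiles(scores, n=4, method="inclusive"))   # [24.0, 24.5, 25.0]
```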

Find the following based on this set of scores:

$19,18,14,19,10$

Find the mean.

Complete the following table.

| Score ($x$) | $x-\text{mean}$ | $(x-\text{mean})^2$ |
|---|---|---|
| $19$ | | |
| $18$ | | |
| $14$ | | |
| $19$ | | |
| $10$ | | |

Thus, find the sample standard deviation, correct to 2 decimal places.

Find the range of the set of scores.

Find the sample standard deviation of the following set of scores, correct to $2$ decimal places, by using the statistics mode on the calculator:

$-14,5,1,-7,8,-17,-6,8,5,3$

Use the cumulative frequency histogram given to answer the following.

Determine the range of scores.

Determine the mode.

Determine the median score.

Calculate the mean. Give your answer correct to two decimal places.

Use your calculator to find the sample standard deviation correct to one decimal place.

S7-4 Investigate situations that involve elements of chance: A. comparing theoretical continuous distributions, such as the normal distribution, with experimental distributions; B. calculating probabilities, using such tools as two-way tables, tree diagrams, simulations, and technology.

Apply probability methods in solving problems