NZ Level 7 (NZC) Level 2 (NCEA) Standard Deviation using sample population
Lesson

When we wish to know certain statistics to do with large numbers of subjects, usually whole populations of some sort, it can be too expensive or it may be impossible for other reasons to collect data on every subject. For this reason, techniques have been developed to make it possible to estimate the population statistics on the basis of random samples drawn from the whole population.

In the case of the statistic we call standard deviation, we use a modified version of the formula given previously for population standard deviation to produce an estimate, which we call the sample standard deviation.

In the following formula, the numbers $x_i$ are the values in the sample data. There is one value for each subscript $i$.
There are $n$ numbers $x_i$ in the sample. So, $i$ goes from $1$ to $n$ in the summation.
The symbol $\overline{x}$ (pronounced '$x$ bar') is the sample mean.
The letter $s$ is used for the sample standard deviation and $s^2$ is used for the sample variance.
The symbol $\Sigma$ (upper case sigma) is the summation symbol.

This is the formula by which a calculator calculates the sample standard deviation. It differs from the population standard deviation formula in that it has a smaller divisor $(n-1)$ than in the population formula. This makes the estimate slightly larger than it would be otherwise. The change is needed to make $s^2$ what is called an unbiased estimator of the population variance $\sigma^2$.

$s^2=\frac{1}{n-1}\Sigma\left(x_i-\overline{x}\right)^2$
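The effect of the divisor can be checked on a very small case. The sketch below is an illustration only: the population `[1, 2, 3]`, the sample size of 2, and the helper name `variance` are all assumptions made for this example, not part of the lesson. It lists every possible ordered sample of size 2 (drawn with replacement) and averages the variance estimates using each divisor.

```python
from fractions import Fraction
from itertools import product

# Hypothetical tiny population, chosen only so every sample can be listed.
population = [1, 2, 3]
mu = Fraction(sum(population), len(population))
# Population variance: divide by the population size.
sigma_sq = sum((x - mu) ** 2 for x in population) / len(population)


def variance(sample, divisor):
    """Sum of squared differences from the sample mean, over the given divisor."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / divisor


n = 2
# All 9 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=n))

avg_with_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)
avg_with_n = sum(variance(s, n) for s in samples) / len(samples)

print(sigma_sq, avg_with_n_minus_1, avg_with_n)  # 2/3 2/3 1/3
```

In this small case, the estimates that use the divisor $n-1$ average exactly to the population variance, while the estimates that use $n$ average to a smaller value.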

As before, there are several steps in the calculation. It is usually best to automate the process with the help of a calculator or with a computer application.

STEPS

1. Calculate the sample mean. $\overline{x}=\frac{1}{n}\Sigma_{i=1}^n\ x_i$
2. Find the difference from the mean for each score. $x_i-\overline{x}$
3. Square each of the differences. $\left(x_i-\overline{x}\right)^2$
4. Sum the squared differences. $\Sigma\left(x_i-\overline{x}\right)^2$
5. Divide the sum by one less than the number of scores. $\frac{1}{n-1}\Sigma\left(x_i-\overline{x}\right)^2$
6. Take the square root. $s=\sqrt{\frac{1}{n-1}\Sigma\left(x_i-\overline{x}\right)^2}$
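The six steps can be sketched in a few lines of Python. This is a minimal illustration, not the routine any particular calculator uses. The function name `sample_standard_deviation` and the list of scores are made up for the example.

```python
import math


def sample_standard_deviation(scores):
    """Follow the six steps above for a list of numeric scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two scores are needed")
    mean = sum(scores) / n                     # step 1: sample mean
    differences = [x - mean for x in scores]   # step 2: differences from the mean
    squared = [d ** 2 for d in differences]    # step 3: square each difference
    total = sum(squared)                       # step 4: sum the squares
    variance = total / (n - 1)                 # step 5: divide by n - 1
    return math.sqrt(variance)                 # step 6: square root


# Made-up scores, for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
s = sample_standard_deviation(scores)
print(round(s, 2))  # 2.14
```

Python's standard library function `statistics.stdev` uses the same $n-1$ divisor, so it can be used to check the result.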

Statistical displays including histograms, dot plots, stem-and-leaf plots and others, are developed from the sample data. The same raw data is used to calculate statistics including the sample mean, the sample standard deviation and also the range, the median, the quartiles and the interquartile range.

Before calculators with statistical functions became readily available, it was sometimes necessary to derive these quantities from a histogram or some other display. Techniques to do this are still sometimes taught although little used in practice.

In the case of a histogram, we can read off the number of observations in each class (histogram column) and then use the class centre as an estimate for the average value in the class. Multiplying the number of observations by the class centre gives the approximate total for the class. Summing these totals over all classes gives the approximate total for all observations in the data set. The estimated mean is then this number divided by the total number of observations.
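As a rough sketch of the histogram procedure just described, the code below estimates a mean from class boundaries and frequencies. The histogram data is hypothetical, invented only to show the arithmetic.

```python
# Hypothetical histogram, as (class lower bound, class upper bound, frequency).
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 5), (30, 40, 2)]

# Total number of observations across all classes.
total_observations = sum(freq for _, _, freq in classes)

# Class centre times frequency approximates the total for each class.
approx_total = sum(((low + high) / 2) * freq for low, high, freq in classes)

estimated_mean = approx_total / total_observations
print(estimated_mean)  # 17.5
```

The result is only an estimate, because every observation in a class is treated as if it sat at the class centre.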

The median can be found from a cumulative frequency histogram by finding the number on the horizontal axis that corresponds with the point on the inscribed polygon that is half-way up the vertical range.
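Reading the median from the polygon amounts to linear interpolation inside the class where the cumulative frequency passes the half-way mark. The sketch below assumes the polygon joins the corners of the cumulative columns with straight lines. The function name `median_from_cumulative` and the histogram data are hypothetical.

```python
def median_from_cumulative(classes):
    """Estimate the median from (lower bound, upper bound, frequency) classes."""
    total = sum(freq for _, _, freq in classes)
    halfway = total / 2
    cumulative = 0
    for low, high, freq in classes:
        if freq > 0 and cumulative + freq >= halfway:
            # Fraction of the way through this class at which the
            # cumulative frequency reaches the half-way mark.
            fraction = (halfway - cumulative) / freq
            return low + fraction * (high - low)
        cumulative += freq
    raise ValueError("no observations")


# Hypothetical histogram, for illustration only.
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 5), (30, 40, 2)]
estimated_median = median_from_cumulative(classes)
print(round(estimated_median, 2))  # 16.67
```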

#### Example

In a sample of scores it was found that
$10$ subjects had the value $23$
$13$ subjects had the value $24$
$12$ subjects had the value $25$
$8$ subjects had the value $26$, and
$3$ subjects had the value $27$.

The following statistics are needed: range, mean, sample standard deviation, median, first and third quartiles, interquartile range.

### range

The smallest score is $23$ and the largest $27$. Therefore the range is $27-23=4$.

### sample mean

There are $10+13+12+8+3=46$ subjects and their scores sum to
$10\times23+13\times24+12\times25+8\times26+3\times27=1131$.
Therefore, the mean is $\frac{1131}{46}\approx24.59$.

### sample standard deviation

By spreadsheet: $s\approx1.2$.
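The same value can be reached by hand. In the working below, $f_j$ stands for the number of subjects with the score $x_j$ (a symbol introduced here for convenience, not used elsewhere in this lesson), and the mean is taken as $24.587$.

$\Sigma f_j\left(x_j-\overline{x}\right)^2\approx10(23-24.587)^2+13(24-24.587)^2+12(25-24.587)^2+8(26-24.587)^2+3(27-24.587)^2\approx65.15$

$s^2\approx\frac{65.15}{46-1}\approx1.448$

$s\approx\sqrt{1.448}\approx1.20$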

### median

The median is the average of the $23$rd and $24$th scores, which is $24.5$.

### quartiles

$Q_1=24$
$Q_3=25.25$

### interquartile range

$Q_3-Q_1=25.25-24=1.25$
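The statistics in this example can be checked with Python's standard `statistics` module (Python 3.8 or later). This is a sketch, not the spreadsheet method used above. Quartile values depend on the convention chosen: the default `method="exclusive"` of `statistics.quantiles` happens to reproduce the quartiles quoted here, whereas `method="inclusive"` gives $25$ for the third quartile on the same data.

```python
import statistics

# Frequency table from the example: score -> number of subjects.
frequency_table = {23: 10, 24: 13, 25: 12, 26: 8, 27: 3}

# Expand the table into the full list of 46 scores.
scores = [value for value, count in frequency_table.items() for _ in range(count)]

score_range = max(scores) - min(scores)
mean = statistics.mean(scores)
s = statistics.stdev(scores)          # sample standard deviation (divisor n - 1)
median = statistics.median(scores)
q1, _, q3 = statistics.quantiles(scores, n=4)  # default method="exclusive"
iqr = q3 - q1

print(score_range, round(mean, 2), round(s, 1), median, q1, q3, iqr)
```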

#### Worked Examples

##### Question 1

Find the following based on this set of scores:

$19,18,14,19,10$

1. Find the mean.

2. Complete the following table.

| Score ($x$) | $(x-\text{mean})$ | $(x-\text{mean})^2$ |
|---|---|---|
| $19$ | | |
| $18$ | | |
| $14$ | | |
| $19$ | | |
| $10$ | | |

3. Thus, find the sample standard deviation, correct to 2 decimal places.

4. Find the range of the set of scores.

##### Question 2

Find the sample standard deviation of the following set of scores correct to $2$ decimal places by using the statistics mode on the calculator:

$-14,5,1,-7,8,-17,-6,8,5,3$

##### Question 3

Use the cumulative frequency histogram given to answer the following.

1. Determine the range of scores.

2. Determine the mode.

3. Determine the median score.

4. Calculate the mean. Give your answer correct to two decimal places.

5. Use your calculator to find the sample standard deviation correct to one decimal place.

### Outcomes

#### S7-4

Investigate situations that involve elements of chance:
A. comparing theoretical continuous distributions, such as the normal distribution, with experimental distributions
B. calculating probabilities, using such tools as two-way tables, tree diagrams, simulations, and technology.

#### 91267

Apply probability methods in solving problems