In 1.01 Statistical design, we learned that a sample must be representative of a population before we make inferences about the population. However, even with a well-designed study, results will vary from sample to sample. In this lesson we look at the margin of error in sample results, which lets us quantify our level of confidence when predicting a population parameter from a sample.
When we take a random sample from a population, we can calculate a sample statistic such as the sample mean or sample proportion. These can be used to estimate the corresponding population parameter. A sample only provides an estimate of the population parameter. With repeated sampling, the estimates vary, and a sampling distribution can be created to model the variation.
The notation \overline{x} is commonly used to represent the sample mean, while \mu represents the population mean. Similarly, \hat{p} is commonly used to represent the sample proportion, while p represents the population proportion.
Let's explore how much a sample statistic might vary from the population parameter by creating a sampling distribution of the proportion of blue chocolate candies in a large jar of Randy's candies.
Use the applet to simulate 200 samples of 20 candies.
What was the mean of the sampling distribution? How does this relate to the population of candy in the jar?
By how much did the lowest and highest samples vary from the population proportion?
Does the sampling distribution appear to be approximately normally distributed? Explain your answer.
With repeated sampling from a population, a sampling distribution can be created for a sample statistic. For a large enough sample size, the sampling distribution will be approximately normally distributed regardless of the shape of the original distribution, and the mean of a sampling distribution is equal to the mean of the population.
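The applet exploration above can be imitated with a short simulation. The sketch below is only illustrative: the population proportion of 0.2 is an assumed value (the lesson does not state the true proportion of blue candies), and the function name `simulate_sample_proportions` is made up for this example. Because the jar is large, each candy is treated as an independent draw.

```python
import random
import statistics


def simulate_sample_proportions(population_proportion, sample_size, num_samples, seed=0):
    """Return the sample proportion from each of `num_samples` random samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Each candy drawn is blue with probability `population_proportion`.
        blue_count = sum(
            rng.random() < population_proportion for _ in range(sample_size)
        )
        proportions.append(blue_count / sample_size)
    return proportions


# ASSUMPTION: 0.2 is a hypothetical proportion of blue candies in the jar.
proportions = simulate_sample_proportions(0.2, sample_size=20, num_samples=200)

print("mean of the sample proportions:", round(statistics.mean(proportions), 3))
print("lowest sample proportion:", min(proportions))
print("highest sample proportion:", max(proportions))
```

The mean of the 200 sample proportions should land close to the assumed population proportion, even though individual samples can be noticeably higher or lower.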
A grocery store owner wants to sell apples in bags of 10 and wants to find the average weight of an apple to label the bags.
A random sample of 10 apples is taken and their weights, in grams, were as follows: 79.3, \,\, 81.5,\,\, 81.1,\,\, 86.6,\,\, 82.7,\,\, 81.2,\,\, 82.7,\,\, 79.2,\,\, 84.1,\,\, 86.4
Calculate the sample mean.
A sampling distribution is created by taking 100 samples of size 10. This distribution has a mean of 80 \text{ g} and a standard deviation of 3.85 \text{ g}. A graph of this distribution is shown below.
What percentage of sample means are within 2 standard deviations of the mean for this data? Does this match what we expect from approximately normally distributed data?
The grocery store owner decides to label the bags of apples as 10 times the mean weight of an apple in the population. Determine what the label on the bag should be.
With repeated sampling from a population, a sampling distribution can be created for a sample statistic. For a large enough sample size, the sampling distribution will be approximately normally distributed, and the mean of a sampling distribution is equal to the mean of the population.
When we take many repeated samples, the mean of the sampling distribution gives a reliable estimate of the population parameter. However, we do not generally have the luxury of taking multiple samples. How confident can we be when using a single sample as an estimate for a population parameter?
Estimates for population characteristics are usually given along with a margin of error.
Using repeated sampling or simulation, we can estimate the margin of error from a sampling distribution by creating an interval about the mean that contains most of the sample outcomes. A commonly used guideline for 'most' is that the interval should contain approximately 95\% of the outcomes. The margin of error will then be half the width of such an interval.
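As a rough sketch of this idea, the code below simulates a sampling distribution of sample means and takes half the width of the interval holding the central 95\% of outcomes. The population (normal, mean 50, standard deviation 6), the sample size of 25, and the helper name `margin_from_outcomes` are all assumptions chosen for illustration.

```python
import random
import statistics


def margin_from_outcomes(outcomes, coverage=0.95):
    """Half the width of the central interval holding about `coverage` of outcomes."""
    ordered = sorted(outcomes)
    # Number of outcomes to leave out in each tail.
    cut = int(len(ordered) * (1 - coverage) / 2)
    lower = ordered[cut]
    upper = ordered[len(ordered) - 1 - cut]
    return (upper - lower) / 2, lower, upper


# ASSUMPTION: a hypothetical normal population with mean 50 and standard deviation 6.
rng = random.Random(1)
sample_means = [
    statistics.mean([rng.gauss(50, 6) for _ in range(25)])
    for _ in range(1000)
]

margin, lower, upper = margin_from_outcomes(sample_means)
print(f"about 95% of sample means lie between {lower:.2f} and {upper:.2f}")
print(f"estimated margin of error: {margin:.2f}")
```

Trimming the same number of outcomes from each tail keeps the interval roughly centered on the mean of the distribution.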
To estimate the margin of error from a single outcome, we need to understand the variation between individual samples taken from a given population. We can observe the variation by creating sampling distributions for a very large number of samples using simulation and probability. These models will approach the theoretical sampling distribution as we increase the number of samples taken.
The theoretical distribution tells us that the samples vary in a predictable way, dependent on the standard deviation of the population and the size of the individual samples. Let's explore how the sample size and standard deviation are related for sampling distributions.
Here is a table of standard deviations that were obtained when creating sampling distributions with different sample sizes. The population the samples were taken from had a mean of 50 and a standard deviation of 6.
Sample size | Standard deviation of the sampling distribution |
---|---|
4 | 3.00 |
10 | 1.90 |
25 | 1.20 |
50 | 0.85 |
100 | 0.60 |
Can you determine a pattern in how the standard deviation changes in relation to the sample size?
Why would you expect to see less variation as the sample size becomes larger?
Estimate the standard deviation if the sample size was 400.
When the sample size is large enough, the theoretical sampling distribution will be approximately normally distributed with:
A mean equal to the population mean
A standard deviation equal to the population standard deviation divided by the square root of the sample size, \dfrac{\sigma}{\sqrt{n}}
This gives us important information about how the samples can vary and we can use this to create a range of likely values for a population parameter based on a single sample.
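The relationship \dfrac{\sigma}{\sqrt{n}} can be checked against the table of standard deviations given earlier, where the population standard deviation was 6. The short sketch below recomputes each entry and extends the pattern to a sample size of 400; the function name `sd_of_sample_means` is just an illustrative choice.

```python
import math


def sd_of_sample_means(population_sd, sample_size):
    """Standard deviation of the sampling distribution: sigma / sqrt(n)."""
    return population_sd / math.sqrt(sample_size)


population_sd = 6  # population standard deviation stated with the table

for n in [4, 10, 25, 50, 100, 400]:
    print(f"n = {n:>3}: standard deviation = {sd_of_sample_means(population_sd, n):.2f}")
```

Because the sample size sits under a square root, quadrupling the sample size only halves the standard deviation of the sampling distribution.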
Recall that in a normal distribution, approximately 95\% of outcomes lie within 2 standard deviations of the mean. Using 2 standard deviations of the sampling distribution as the margin of error will create an interval about the sample statistic that would capture the population mean for approximately 95\% of samples. We often refer to a margin of error calculated this way as the 95\% confidence level margin of error.
We often do not know the value of the population standard deviation. In such cases, we can use the sample standard deviation, s, as an estimate for the population standard deviation. Using this, we can estimate the margin of error from a single sample of size n as follows:
\text{Margin of error} \approx 2 \cdot \dfrac{s}{\sqrt{n}}
The margin of error gives a sense of how accurately a sample statistic predicts the population parameter. For example, a poll may quote the support for a political candidate as 52\% with a margin of error of 5\%. This means we are confident that the actual support for the candidate in the wider population is between 47\% and 57\%.
A council conducted a survey to gauge the opinion of local residents on a proposed rule enforcing an evening curfew for cats. Of the responses, 42\% were in favor of the new rule. The survey reported a margin of error of 3.5\%. Interpret the margin of error in this context.
The following is a simulation of a sampling distribution for 160 samples of size 20. The mean of the distribution is 0.65. Determine a reasonable estimate for the margin of error for samples of size 20 from the same population.
A sample of 25 phones of a particular model is taken to estimate how long the phone's battery lasts under typical usage. In the sample, the batteries lasted an average of 8.5 hours with a standard deviation of 0.8 hours.
Give an estimate of the margin of error for the battery life.
If we wanted to reduce the margin of error by 50\% but keep the same confidence level, how many phones would we need to sample?
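One way to reason through the phone battery questions is sketched below. It assumes the 95\% confidence level margin of error is taken as 2 standard deviations of the sampling distribution, with the sample standard deviation standing in for the population standard deviation. It also assumes the sample standard deviation would stay at 0.8 hours in a larger sample, which is not guaranteed. The function name `margin_of_error_95` is hypothetical.

```python
import math


def margin_of_error_95(sample_sd, sample_size):
    """Approximate 95% margin of error: 2 * s / sqrt(n)."""
    return 2 * sample_sd / math.sqrt(sample_size)


sample_sd = 0.8   # hours, from the sample of phones
sample_size = 25

current_margin = margin_of_error_95(sample_sd, sample_size)
print(f"margin of error with n = {sample_size}: {current_margin:.2f} hours")

# Halving the margin means doubling sqrt(n), so n must be multiplied by 4.
# ASSUMPTION: the sample standard deviation stays at 0.8 hours.
larger_size = sample_size * 4
halved_margin = margin_of_error_95(sample_sd, larger_size)
print(f"margin of error with n = {larger_size}: {halved_margin:.2f} hours")
```

Under these assumptions, the estimate of battery life would be reported as 8.5 hours give or take the first margin printed, and quadrupling the sample size cuts that margin in half.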
Before an election, a poll of 200 voters was conducted and candidate X was ahead with 43\% of the vote compared to candidate Y with 38\%. If the poll reported a margin of error of 4\%, could the two candidates actually have an equal level of support in the wider population of voters? Justify your answer.
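A question like this can be approached by building the range of likely values for each candidate and checking whether the two ranges share any values. The sketch below does this in percentage points; the helper names `likely_range` and `ranges_overlap` are illustrative only.

```python
def likely_range(estimate, margin):
    """Range of likely population values: estimate plus or minus the margin."""
    return (estimate - margin, estimate + margin)


def ranges_overlap(first, second):
    """True if the two (low, high) ranges share at least one value."""
    return first[0] <= second[1] and second[0] <= first[1]


margin = 4  # percentage points, as reported by the poll

candidate_x = likely_range(43, margin)
candidate_y = likely_range(38, margin)

print("candidate X:", candidate_x)
print("candidate Y:", candidate_y)
print("ranges overlap:", ranges_overlap(candidate_x, candidate_y))
```

If the ranges overlap, the poll cannot rule out the two candidates having equal support in the wider population.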
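The margin of error reflects the accuracy of an estimate in predicting the value of a population parameter. At a 95\% confidence level, the margin of error can be estimated from a single sample of size n with sample standard deviation s using:
\text{Margin of error} \approx 2 \cdot \dfrac{s}{\sqrt{n}}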