
1.04 Margin of error

Introduction

In 1.01 Statistical design, we learned that a sample must be representative of a population before we make inferences about the population. However, even with a well-designed study, we will see some variation in results. In this lesson we look at the margin of error in sample results. This way, we can quantify the level of confidence when making a prediction about a population parameter from a sample.

Variation in samples

When we take a random sample from a population, we can calculate a sample statistic such as the sample mean or sample proportion. These can be used to estimate the corresponding population parameter. A sample only provides an estimate of the population parameter. With repeated sampling, the estimates vary, and a sampling distribution can be created to model the variation.

Sample mean

The average of a sample taken from a larger population. Often used as an estimate of the population mean.

Sample proportion

The proportion of a sample taken from a larger population that matches a given criterion. Often used as an estimate of the population proportion.

Sampling distribution

A probability distribution of a statistic that is obtained through repeated sampling of a specific population.

The notation \overline{x} is commonly used to represent the sample mean, while \mu represents the population mean. Similarly, \hat{p} is commonly used to represent the sample proportion, while p represents the population proportion.

Exploration

Let's explore how much a sample statistic might vary from the population parameter by creating a sampling distribution of the proportion of blue chocolate candies in a large jar of Randy's candies.


Use the applet to simulate 200 samples of 20 candies.

  1. What was the mean of the sampling distribution? How does this relate to the population of candy in the jar?

  2. By how much did the lowest and highest samples vary from the population proportion?

  3. Does the sampling distribution appear to be approximately normally distributed? Explain your answer.

With repeated sampling from a population, a sampling distribution can be created for a sample statistic. For a large enough sample size, the sampling distribution will be approximately normally distributed regardless of the shape of the original distribution, and the mean of a sampling distribution is equal to the mean of the population.
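The repeated sampling described above can be sketched in a few lines of Python. This is only an illustration, not the applet itself: the population proportion of blue candies (0.25) is an assumed value, and each candy is drawn independently, which is a reasonable approximation for a large jar.

```python
import random
import statistics

# Assumption for illustration only: 25% of the candies in the jar are blue.
POPULATION_PROPORTION = 0.25
SAMPLE_SIZE = 20      # candies per sample
NUM_SAMPLES = 200     # number of repeated samples

rng = random.Random(1)  # fixed seed so the run is repeatable


def sample_proportion(rng, p, n):
    """Draw n candies and return the proportion that are blue."""
    blue = sum(1 for _ in range(n) if rng.random() < p)
    return blue / n


# One sample proportion per repeated sample: a simulated sampling distribution.
proportions = [
    sample_proportion(rng, POPULATION_PROPORTION, SAMPLE_SIZE)
    for _ in range(NUM_SAMPLES)
]

mean_of_proportions = statistics.mean(proportions)
print(f"Mean of sample proportions: {mean_of_proportions:.3f}")
print(f"Lowest sample proportion:   {min(proportions):.2f}")
print(f"Highest sample proportion:  {max(proportions):.2f}")
```

The mean of the simulated sample proportions should land close to the assumed population proportion, while individual samples can sit noticeably above or below it.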

Examples

Example 1

A grocery store owner wants to sell apples in bags of 10 and wants to find the average weight of an apple to label the bags.

a

A random sample of 10 apples is taken and their weights, in grams, were as follows: 79.3, \,\, 81.5,\,\, 81.1,\,\, 86.6,\,\, 82.7,\,\, 81.2,\,\, 82.7,\,\, 79.2,\,\, 84.1,\,\, 86.4

Calculate the sample mean.

Worked Solution
Create a strategy

To find the mean of the sample, we use the formula: \text{Mean}=\dfrac{\text{Sum of the scores}}{\text{Number of scores}}

Apply the idea
\text{Sum of scores} = 79.3+ 81.5+ 81.1+ 86.6+ 82.7+ 81.2+ 82.7+ 79.2+ 84.1+ 86.4
= 824.8
\text{Sample mean} = \dfrac{\text{Sum of the scores}}{\text{Number of scores}}    (Formula)
= \dfrac{824.8}{10}    (Substitute in values)
= 82.48    (Evaluate the division)

The sample mean is 82.48 g.

Reflect and check

Check that the figure seems reasonable given the scores. The apples were between 79.2 \text{ g} and 86.6 g. So the figure of 82.48 g seems reasonable. We could use this as an estimate for the mean weight of an apple in the population. How accurate do you think this estimate would be?
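As a quick check of the arithmetic, the same calculation can be done with Python's standard `statistics` module. The variable names here are our own choice.

```python
import statistics

# Weights of the 10 sampled apples, in grams (from part a).
weights_g = [79.3, 81.5, 81.1, 86.6, 82.7, 81.2, 82.7, 79.2, 84.1, 86.4]

# Sample mean = sum of the scores / number of scores.
sample_mean = statistics.mean(weights_g)
print(f"Sample mean: {sample_mean:.2f} g")
```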

b

A sampling distribution is created by taking 100 samples of size 10. This distribution has a mean of 80 \text { g} and a standard deviation of 3.85 \text{ g}. A graph of this distribution is shown below.

A histogram titled Distribution of sample means with Frequency on the y-axis, with numbers 0 through 16, and Mean weight in grams on the x-axis, with bars labeled at the left endpoint 70 through 90. The 70 through 71 bar goes to 1 on the y-axis, 71 through 72 goes to 2, 73 through 74 goes to 1, 74 through 75 goes to 2, 75 through 76 goes to 9, 76 through 77 goes to 6, 77 through 78 goes to 10, 78 through 79 goes to 9, 79 through 80 goes to 5, 80 through 81 goes to 16, 81 through 82 goes to 9, 82 through 83 goes to 6, 83 through 84 goes to 8, 84 through 85 goes to 5, 85 through 86 goes to 5, 86 through 87 goes to 3, 88 through 89 goes to 2, and 89 through 90 goes to 1.

What percentage of sample means are within 2 standard deviations of the mean for this data? Does this match what we expect from approximately normally distributed data?

Worked Solution
Create a strategy

Add and subtract 2 standard deviations from the mean and count the outcomes that fall inside of this interval.

Apply the idea

Mean plus 2 standard deviations = 80 + 2\left(3.85\right) = 87.7

Mean minus 2 standard deviations = 80 - 2\left(3.85\right) = 72.3

There are 6 data values outside this range, so there are 100-6=94 inside this range.

94\% of the sample means are within 2 standard deviations of the mean of the data set. This is close to the expected 95\% for a normal distribution and may approximate a normal distribution more closely if we were to take more samples.

Reflect and check

The distribution has a main central peak and tapers off to the sides but doesn't closely resemble a normal distribution. This is due to variation in the relatively small number of samples taken. To observe that the sampling distribution closely approximates a normal distribution for a larger number of samples we could simulate taking more samples using technology.
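The count in part (b) can be reproduced from the histogram. The sketch below assumes each bar covers a 1-gram interval starting at its labeled left endpoint, and it checks that no bar straddles either boundary before trusting the count.

```python
# Histogram bars from part (b): left endpoint of each 1 g bin -> frequency.
counts = {
    70: 1, 71: 2, 73: 1, 74: 2, 75: 9, 76: 6, 77: 10, 78: 9, 79: 5,
    80: 16, 81: 9, 82: 6, 83: 8, 84: 5, 85: 5, 86: 3, 88: 2, 89: 1,
}

mean, sd = 80, 3.85
lower, upper = mean - 2 * sd, mean + 2 * sd

total = sum(counts.values())
# Bins lying entirely inside the interval [lower, upper].
inside = sum(c for left, c in counts.items()
             if left >= lower and left + 1 <= upper)
# Bins lying entirely outside the interval.
outside = sum(c for left, c in counts.items()
              if left + 1 <= lower or left >= upper)
# Any remaining sample means sit in bins that cross a boundary.
straddling = total - inside - outside

percent_inside = 100 * inside / total
print(f"Interval: {lower:.1f} g to {upper:.1f} g")
print(f"Inside: {inside}, outside: {outside}, straddling: {straddling}")
print(f"Percentage within 2 standard deviations: {percent_inside:.0f}%")
```

Because the bins that contain the two boundaries are empty, the histogram alone is enough to give an exact count.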

c

The grocery store owner decides to label the bags of apples as 10 times the mean weight of an apple in the population. Determine what the label on the bag should be.

Worked Solution
Create a strategy

The mean of the sampling distribution is equal to the mean of the population. We can use the mean of the distribution given in part (b).

Apply the idea
\text{Label} = 10\cdot 80    (Using the mean of the sampling distribution)
= 800    (Evaluate the multiplication)

The bags should be labeled as 800 g.

Reflect and check

Do you think this is a reasonable choice for a label? What percentage of customers would receive a bag that weighed less than the labeled weight?

Idea summary

With repeated sampling from a population, a sampling distribution can be created for a sample statistic. For a large enough sample size, the sampling distribution will be approximately normally distributed, and the mean of a sampling distribution is equal to the mean of the population.

Margin of error

When taking repeated samples, we can accurately predict the population parameter. However, we do not generally have the luxury of taking multiple samples. How confident can we be when using a single sample as an estimate for a population parameter?

Estimates for population characteristics are usually given along with a margin of error.

Margin of error

The maximum expected difference between the estimate of the population characteristic and the actual population characteristic.

Using repeated sampling or simulation, we can estimate the margin of error from a sampling distribution by creating an interval about the mean that contains most of the sample outcomes. A commonly used guideline for 'most' is that the interval should contain approximately 95\% of the outcomes. The margin of error will then be half the size of such an interval.

To estimate the margin of error from a single outcome, we need to understand the variation between individual samples taken from a given population. We can observe the variation by creating sampling distributions for a very large number of samples using simulation and probability. These models will approach the theoretical sampling distribution as we increase the number of samples taken.

The theoretical distribution tells us that the samples vary in a predictable way, dependent on the standard deviation of the population and the size of the individual samples. Let's explore how the sample size and standard deviation are related for sampling distributions.

Exploration

Here is a table of standard deviations that were obtained when creating sampling distributions with different sample sizes. The population the samples were taken from had a mean of 50 and a standard deviation of 6.

Sample size    Standard deviation
4              3.00
10             1.90
25             1.20
50             0.85
100            0.60

  1. Can you determine a pattern in how the standard deviation changes in relation to the sample size?

  2. Why would you expect to see less variation as the sample size becomes larger?

  3. Estimate the standard deviation if the sample size was 400.

When the sample size is large enough, the theoretical sampling distribution will be approximately normally distributed with:

  • A mean equal to the population mean

  • A standard deviation equal to the population standard deviation divided by the square root of the sample size, \dfrac{\sigma}{\sqrt{n}}

This gives us important information about how the samples can vary and we can use this to create a range of likely values for a population parameter based on a single sample.
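To see how this rule lines up with the table in the exploration, we can compute \dfrac{\sigma}{\sqrt{n}} for each sample size, using the population standard deviation of 6. The function name below is our own and is only for illustration.

```python
import math

POPULATION_SD = 6  # population standard deviation from the exploration


def sampling_distribution_sd(sigma, n):
    """Standard deviation of the sampling distribution of the mean."""
    return sigma / math.sqrt(n)


for n in [4, 10, 25, 50, 100]:
    sd = sampling_distribution_sd(POPULATION_SD, n)
    print(f"Sample size {n:>3}: standard deviation {sd:.2f}")
```

Each printed value can be compared with the corresponding row of the table.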

Recall that in a normal distribution, approximately 95\% of outcomes lie within 2 standard deviations from the mean. Using 2 standard deviations of the sampling distribution as the margin of error will create an interval about the sample statistic that would capture the population mean for approximately 95\% of samples. We often refer to a margin of error calculated this way as the 95\% confidence level margin of error.
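We can test this claim with a simulation. The sketch below assumes a normally distributed population with mean 50 and standard deviation 6, a sample size of 25, and 10,000 repeated samples. All of these are choices made for illustration. For each sample it checks whether the interval of the sample mean plus or minus the margin of error contains the population mean.

```python
import math
import random
import statistics

POP_MEAN, POP_SD = 50, 6   # assumed population (normal, for illustration)
SAMPLE_SIZE = 25           # assumed sample size
NUM_SAMPLES = 10_000       # assumed number of repeated samples

rng = random.Random(2)     # fixed seed so the run is repeatable

# Margin of error: 2 standard deviations of the sampling distribution.
margin = 2 * POP_SD / math.sqrt(SAMPLE_SIZE)

captured = 0
for _ in range(NUM_SAMPLES):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    x_bar = statistics.fmean(sample)
    # The interval x_bar ± margin contains the population mean
    # exactly when x_bar is within `margin` of it.
    if abs(x_bar - POP_MEAN) <= margin:
        captured += 1

coverage = captured / NUM_SAMPLES
print(f"Margin of error: {margin:.2f}")
print(f"Share of intervals containing the population mean: {coverage:.1%}")
```

The share should come out close to 95\%, though it will vary a little from run to run if the seed is changed.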

We often do not know the value of the population standard deviation. In such cases, we can use the sample standard deviation as an estimate for the population standard deviation. Using this, we can estimate the margin of error from a single sample as follows:

\text{Margin of error}\approx 2\dfrac{s}{\sqrt{n}}

where:

  • s is the sample standard deviation

  • n is the sample size
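As a rough sketch of how this formula works on raw data, we can apply it to the 10 apple weights from Example 1. The function name is our own, and with a sample this small the result should be treated as a loose estimate only.

```python
import math
import statistics


def margin_of_error(s, n):
    """Approximate 95% margin of error: 2 * s / sqrt(n)."""
    return 2 * s / math.sqrt(n)


# Apple weights in grams from Example 1, part (a).
weights_g = [79.3, 81.5, 81.1, 86.6, 82.7, 81.2, 82.7, 79.2, 84.1, 86.4]

x_bar = statistics.mean(weights_g)
s = statistics.stdev(weights_g)   # sample standard deviation (divides by n - 1)
moe = margin_of_error(s, len(weights_g))

print(f"Sample mean:      {x_bar:.2f} g")
print(f"Sample std dev:   {s:.2f} g")
print(f"Margin of error:  {moe:.2f} g")
print(f"Interval:         {x_bar - moe:.2f} g to {x_bar + moe:.2f} g")
```

Note that `statistics.stdev` computes the sample standard deviation, which is the quantity s in the formula, whereas `statistics.pstdev` would treat the data as a whole population.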

Margin of error gives you a sense of how accurately a sample statistic predicts the population parameter. For example, a poll may quote the support for a political candidate as 52\% with a margin of error of 5\%. This means we are confident that the actual support for the candidate in the wider population is between 47\% and 57\%.

Examples

Example 2

A council conducted a survey to gauge the opinion of local residents on a proposed rule enforcing an evening curfew for cats. Of the responses, 42\% were in favor of the new rule. The survey reported a margin of error of 3.5\%. Interpret the margin of error in this context.

Worked Solution
Create a strategy

Add and subtract the margin of error to create an interval that we are confident the population proportion lies within.

Apply the idea

Sample proportion plus margin of error = 42\% + 3.5\% = 45.5\%

Sample proportion minus margin of error = 42\%-3.5\% = 38.5\%

The council can be confident that the true population proportion of residents in favor of the curfew is between 38.5\% and 45.5\%.

Reflect and check

While we are confident the interval contains the population proportion, even with a well-designed survey, including a random sample, the given interval may not contain the population proportion. For example, if the margin of error was created using a 95\% confidence level, there would be a 5\% (1 in 20) chance that a random sample would result in a proportion that was more than 2 standard deviations from the population proportion.

Example 3

The following is a simulated sampling distribution for 160 samples of size 20. The mean of the distribution is 0.65. Determine a reasonable estimate for the margin of error for samples of size 20 from the same population.

A dot plot titled Distribution of sample proportions with Frequency on the y-axis, with numbers 0 through 35, and Proportion on the x-axis, with numbers 0 through 1 in steps of 0.05. The number of dots is as follows: at 0.35, 3; at 0.4, 2; at 0.45, 5; at 0.5, 11; at 0.55, 13; at 0.6, 28; at 0.65, 34; at 0.7, 24; at 0.75, 21; at 0.8, 12; at 0.85, 3; at 0.9, 3; at 0.95, 1.
Worked Solution
Create a strategy

We want to create an estimate for the margin of error from a simulated sampling distribution. We should select a margin of error that would contain most of the sample proportions about the mean of the given distribution. This ensures we are reasonably confident that the actual population proportion is within the range of plus or minus the margin of error from a typical sample proportion.

Apply the idea

Let's start at the mean and expand out until we capture approximately 95\% of the sample proportions.

A dot plot titled Distribution of sample proportions with Frequency on the y-axis, with numbers 0 through 35, and Proportion on the x-axis, with numbers 0 through 1 in steps of 0.05. The number of dots is as follows: at 0.35, 3; at 0.4, 2; at 0.45, 5; at 0.5, 11; at 0.55, 13; at 0.6, 28; at 0.65, 34; at 0.7, 24; at 0.75, 21; at 0.8, 12; at 0.85, 3; at 0.9, 3; at 0.95, 1. Vertical dashed lines are drawn at the 0.45 and 0.85 mark. The 0.65 mark is labeled Mean. The distance from the vertical dashed line at the 0.45 mark to the Mean is labeled minus 0.2. The distance from the Mean to vertical dashed line at the 0.85 mark is labeled plus 0.2.

If we create an interval of \pm 0.2 from the mean, we contain all but 9 samples, so the interval contains 94.375\% of the samples. This gives us a reasonable estimate for margin of error of 0.2.

Reflect and check

Check that creating an interval of \pm 0.2 about any of the sample proportions within the boundary lines drawn would include the mean of the distribution and therefore include the population proportion.
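The "expand out from the mean" strategy can also be sketched in code. The dot counts below are read from the plot, and proportions are stored in hundredths (so 65 means 0.65) to avoid rounding problems when comparing decimals. That storage choice is ours, not part of the original example.

```python
# Dot plot from Example 3: sample proportion (in hundredths) -> number of samples.
dot_counts = {
    35: 3, 40: 2, 45: 5, 50: 11, 55: 13, 60: 28, 65: 34,
    70: 24, 75: 21, 80: 12, 85: 3, 90: 3, 95: 1,
}
MEAN = 65  # mean of the distribution, 0.65, in hundredths

total = sum(dot_counts.values())


def share_within(half_width):
    """Fraction of samples within half_width (hundredths) of the mean."""
    inside = sum(count for value, count in dot_counts.items()
                 if abs(value - MEAN) <= half_width)
    return inside / total


# Widen the interval one step (0.05) at a time and report the coverage.
for half_width in range(5, 35, 5):
    print(f"±{half_width / 100:.2f}: {share_within(half_width):.3%}")
```

Reading down the output shows which interval first captures roughly 95\% of the sample proportions.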

Example 4

A sample of 25 phones of a particular model are taken to estimate how long the phone's battery lasts under typical usage. In the sample, the batteries lasted an average of 8.5 hours with a standard deviation of 0.8 hours.

a

Give an estimate of the margin of error for the battery life.

Worked Solution
Create a strategy

We want to estimate the margin of error from a single sample. To do this we can use the equation \text{Margin of error}\approx 2\dfrac{s}{\sqrt{n}}, where s is the sample standard deviation and n is the sample size.

Apply the idea
\text{Margin of error} \approx 2\dfrac{s}{\sqrt{n}}    (Formula)
= 2\left(\dfrac{0.8}{\sqrt{25}}\right)    (Substitute s=0.8 and n=25)
= 2\left(0.16\right)    (Evaluate the division)
= 0.32    (Evaluate the multiplication)

The phone batteries will last 8.5 hours with a margin of error of 0.32 hours.

Reflect and check

The margin of error we estimated here was at a 95\% confidence level. By using a margin of error of 2 standard deviations, we create an interval of values that will include the true mean battery life approximately 95\% of the time.

b

If we wanted to reduce the margin of error by 50\% but keep the same confidence level, how many phones would we need to sample?

Worked Solution
Create a strategy

We can use the equation \text{Margin of error}\approx 2\dfrac{s}{\sqrt{n}}, using the same standard deviation and half the margin of error from our previous calculation. Then, we can solve for n.

Apply the idea

50\% of previous margin of error is 0.5\left(0.32\right)=0.16 hours.

\text{Margin of error} \approx 2\dfrac{s}{\sqrt{n}}    (Formula)
0.16 = 2\dfrac{0.8}{\sqrt{n}}    (Substitute margin of error=0.16 and s=0.8)
0.16\sqrt{n} = 2\left(0.8\right)    (Multiply both sides by \sqrt{n})
\sqrt{n} = \dfrac{1.6}{0.16}    (Evaluate the multiplication and divide both sides by 0.16)
\sqrt{n} = 10    (Evaluate the division)
n = 100    (Square both sides of the equation)

We would need a sample of approximately 100 phones to reduce the margin of error by 50\%.

Reflect and check

Notice that to reduce the margin of error by \dfrac{1}{2} we needed to increase the sample size by a factor of 4. To decrease the margin of error to \dfrac{1}{3} of the original size, we would need to increase the sample size by a factor of 9. Can you spot a pattern? Can you prove it?
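Rearranging \text{Margin of error}\approx 2\dfrac{s}{\sqrt{n}} for n gives n \approx \left(\dfrac{2s}{\text{Margin of error}}\right)^2. A minimal sketch of that rearrangement is shown below. The function name is our own, and the result is rounded before taking the ceiling so that tiny floating-point errors do not push a whole-number answer up by one.

```python
import math


def required_sample_size(s, target_margin):
    """Smallest n with 2 * s / sqrt(n) <= target_margin (approximately)."""
    n_exact = (2 * s / target_margin) ** 2
    # Round first to absorb floating-point noise, then round up to a whole phone.
    return math.ceil(round(n_exact, 6))


s = 0.8                  # sample standard deviation in hours
original_margin = 0.32   # margin of error from part (a), n = 25

for fraction in [1, 1 / 2, 1 / 3]:
    target = original_margin * fraction
    n = required_sample_size(s, target)
    print(f"Target margin {target:.3f} h -> sample size {n}")
```

Trying other fractions of the original margin is one way to explore the pattern before attempting a proof.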

Example 5

Before an election a poll of 200 voters was conducted and candidate X was ahead with 43\% of the vote compared to candidate Y with 38\%. If the poll reported a margin of error of 4\%, could the two candidates actually have an equal level of support in the wider population of voters? Justify your answer.

Worked Solution
Create a strategy

We can use the margin of error to create a range of possible values we are confident each candidate's support percentage lies within. Then we can look to see if the intervals share the same possible value.

Apply the idea

Candidate X:

Sample proportion plus margin of error = 43\% + 4\% = 47\%

Sample proportion minus margin of error = 43\%-4\% = 39\%

Candidate Y:

Sample proportion plus margin of error = 38\% + 4\% = 42\%

Sample proportion minus margin of error = 38\%-4\% = 34\%

The population proportion in support of candidate X could range from 39\% to 47\%. The population proportion in support of candidate Y could range from 34\% to 42\%. As these two intervals overlap, the candidates could be tied in the wider voting population.

Reflect and check

To be confident in whether or not Candidate X was ahead in voter support, we would require the intervals not to overlap. This could be achieved by taking a larger sample to reduce the margin of error.
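The overlap check in this example can be written as a short sketch. The helper names are our own, and percentages are kept as whole numbers for simplicity.

```python
def interval(estimate, margin):
    """Return (low, high) for estimate ± margin."""
    return (estimate - margin, estimate + margin)


def overlap(a, b):
    """Return the shared range of two intervals, or None if they do not overlap."""
    low = max(a[0], b[0])
    high = min(a[1], b[1])
    return (low, high) if low <= high else None


MARGIN = 4                          # reported margin of error, in percent
candidate_x = interval(43, MARGIN)  # poll result for candidate X
candidate_y = interval(38, MARGIN)  # poll result for candidate Y

shared = overlap(candidate_x, candidate_y)
print(f"Candidate X: {candidate_x[0]}% to {candidate_x[1]}%")
print(f"Candidate Y: {candidate_y[0]}% to {candidate_y[1]}%")
if shared is None:
    print("The intervals do not overlap.")
else:
    print(f"The intervals overlap from {shared[0]}% to {shared[1]}%.")
```

Any value in the shared range is a level of support that is consistent with both poll results, which is why a tie cannot be ruled out.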

Idea summary

The margin of error reflects the accuracy of an estimate in predicting the value of a population parameter. At a 95\% confidence level, the margin of error can be estimated from a single sample using:

\text{Margin of error}\approx 2\dfrac{s}{\sqrt{n}}

where:

  • s is the sample standard deviation

  • n is the sample size

Outcomes

S.IC.A.2

Decide if a specified model is consistent with results from a given data-generating process, e.g., using simulation.

S.IC.B.4

Use data from a sample survey to estimate a population mean or proportion; develop a margin of error through the use of simulation models for random sampling.

S.IC.B.5

Use data from a randomized experiment to compare two treatments; use simulations to decide if differences between parameters are significant.

S.IC.B.6

Evaluate reports based on data.
