
16.07 Confidence intervals

Lesson

Confidence intervals for sample proportions

We see and hear a lot of statistics in the media, published by polls or researchers. However, as we have discovered, these are often taken from samples of a population.

For example, in 2009, the World Bank reported that $59.45\%$ of adult males in Russia smoke. This would have been calculated from a sample of males from across Russia, not the whole population. So how reliable is this? Perhaps the males sampled were much more likely to smoke than the population as a whole. Can we really infer from the sample that almost $60\%$ of adult males in Russia are smokers? Can a single value accurately estimate the prevalence of male smokers?

We have seen that when the sample size is large, the sample proportion has an approximately normal distribution. So instead of a single value, we can look for a range of values that we can be fairly certain contains the true population proportion.

Exploration

Use this GeoGebra app to explore what happens when you change the population proportion, the sample size and the confidence level.

  1. Start by dragging one slider at a time, then answer the questions below.

Note: the true proportion is represented by the vertical line. See how changing the sliders changes how many of the sample confidence intervals contain the true population proportion.

Guiding questions

  1. What happens when you change the population proportion? What else is affected?
  2. What happens when you change the sample size? What else is affected?
  3. What happens when you change the confidence level? What else is affected?

 

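An approximate $95\%$ confidence interval for the true population proportion $p$ is given by

$\hat{p}-1.96\sqrt{\frac{\hat{p}\left(1-\hat{p}\right)}{n}}<p<\hat{p}+1.96\sqrt{\frac{\hat{p}\left(1-\hat{p}\right)}{n}}$

where:

$\hat{p}$ is a calculated value of the sample proportion

$n$ is the size of the sample from which $\hat{p}$ was calculated.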

So if we use a $95\%$ confidence level and repeat the sampling many times, we expect approximately $95\%$ of the resulting intervals to contain $p$. However, we do not know whether the particular confidence interval we obtained is one of the $95\%$ that contain $p$ or one of the $5\%$ that do not.

Looking at it another way, we can say we are $5\%$ confident that the population proportion lies outside this interval, either above or below it. Because the normal distribution is symmetric, halving this percentage tells us that we can be $2.5\%$ confident that it lies above the interval and $2.5\%$ confident that it lies below it.
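The repeated-sampling interpretation can be checked by simulation. The sketch below is an illustration rather than part of the lesson: it assumes a hypothetical true proportion $p=0.4$ and sample size $n=200$, draws many samples, builds the approximate $95\%$ interval from each (the formula above, often called the Wald interval), and reports the fraction of intervals that contain $p$. The function names `wald_interval` and `coverage` are our own.

```python
import math
import random


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage(p: float, n: int, reps: int, seed: int = 1) -> float:
    """Fraction of simulated approximate 95% intervals that contain the true p."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(reps):
        # One sample of size n: each subject is a 'success' with probability p
        successes = sum(rng.random() < p for _ in range(n))
        lower, upper = wald_interval(successes, n)
        if lower <= p <= upper:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Hypothetical true proportion and sample size, chosen for illustration
    print(coverage(p=0.4, n=200, reps=4000))
```

The printed fraction should be close to $0.95$ but not exactly equal to it, since each run of samples is random and the interval is only approximate.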

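If we wanted to be more confident, we could calculate at a $99\%$ confidence level:

$\hat{p}-2.576\sqrt{\frac{\hat{p}\left(1-\hat{p}\right)}{n}}<p<\hat{p}+2.576\sqrt{\frac{\hat{p}\left(1-\hat{p}\right)}{n}}$

If we are fine with being less confident, we can go down to a $90\%$ confidence level:

$\hat{p}-1.645\sqrt{\frac{\hat{p}\left(1-\hat{p}\right)}{n}}<p<\hat{p}+1.645\sqrt{\frac{\hat{p}\left(1-\hat{p}\right)}{n}}$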

Note that the higher we set the confidence level, the wider the confidence interval we calculate, and the lower we set it, the narrower the interval. The wider we cast the net, the more sure we can be that the true proportion lies within it.
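The effect of the confidence level on the width of the interval can be seen by computing all three intervals for the same sample. This is a minimal sketch assuming a hypothetical sample with $\hat{p}=0.4$ and $n=100$; the $z$-values are the ones used in the formulas above, and `proportion_ci` is a name chosen for this illustration.

```python
import math

# z-values for common confidence levels, as used in the formulas above
Z_VALUES = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}


def proportion_ci(p_hat: float, n: int, z: float) -> tuple[float, float]:
    """Return (lower, upper) for p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


if __name__ == "__main__":
    # Hypothetical sample: p_hat = 0.4 from a sample of size n = 100
    for level, z in Z_VALUES.items():
        lower, upper = proportion_ci(0.4, 100, z)
        print(f"{level:.0%}: ({lower:.3f}, {upper:.3f}), width {upper - lower:.3f}")
```

Only $z$ changes between the three lines, so the width grows in proportion to $z$ as the confidence level rises.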

However, it is wrong to say that

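“The probability that the true proportion lies in the interval

$\hat{p}-1.645\sqrt{\frac{\hat{p}\left(1-\hat{p}\right)}{n}}<p<\hat{p}+1.645\sqrt{\frac{\hat{p}\left(1-\hat{p}\right)}{n}}$

is $90\%$”.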

There is a big difference between "probability" and "confidence"! The true value of $p$ doesn't take on a range of likely values: it's not a variable at all, but a fixed number (even though we cannot know it). Here are two analogies that may help you understand.

1. What is the probability that the first day of 2017 was a Tuesday?

You may think at first that it is $\frac{1}{7}$, since there are $7$ days in a week. However, there is no question of likelihood in this statement: either the first of January 2017 was a Tuesday, or it wasn’t. So we can say that this probability is either $0$ or $1$.
(In this case it’s $0$, since it was a Sunday.)

2. You pick any real number, and I mark out a finite interval on the real line, as big as I like but still finite. From your perspective, what is the probability that the interval I just marked out contains your number?

Well, as soon as I mark out my interval, you will know for sure whether my guess is correct or not. Either I’m right (so the probability is $1$), or I’m wrong (so the probability is $0$).

In summary, we can be reasonably confident that the true proportion lies within the confidence interval, with a degree of confidence equal to the confidence level. Additional, or larger, samples will allow us to narrow the confidence interval around the fixed value of the true proportion.

 

Worked Examples

question 1

For a random sample of $33$ lollies, $12$ were found to contain food coloring.

a) Find a point estimate for $p$, the proportion of lollies that contain food coloring.

Think: this is a simple, single number that shows the proportion of lollies that contain food coloring, which you can express as a fraction.

Do: $\hat{p}=\frac{12}{33}\approx0.364$

b) Calculate a $90\%$ confidence interval for $p$.

Think: substitute the relevant values into the $90\%$ version of the formula, using $\hat{p}=\frac{12}{33}$, $n=33$ and $z\approx1.65$.

Do: $\frac{12}{33}\pm1.65\sqrt{\frac{\frac{12}{33}\left(1-\frac{12}{33}\right)}{33}}$

$=\frac{12}{33}\pm1.65\sqrt{\frac{\frac{12}{33}\left(\frac{21}{33}\right)}{33}}$

$=\frac{12}{33}\pm1.65\sqrt{\frac{\frac{252}{1089}}{33}}$

$=\frac{12}{33}\pm1.65\sqrt{\frac{252}{35937}}$

$=\frac{12}{33}\pm1.65\sqrt{\frac{28}{3993}}$

$=\frac{12}{33}\pm1.65\left(0.083739306\right)$

$=\frac{12}{33}\pm0.138169855$

$=\left(0.225466507,\ 0.501806219\right)$

This means we can be $90\%$ confident that the true population proportion of lollies which contain coloring is somewhere between around $0.23$ and $0.50$.
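The arithmetic in this worked example can be reproduced in a few lines. This sketch uses the same rounded value $z=1.65$ as the working above; `interval_from_counts` is a hypothetical helper name, not something defined in the lesson.

```python
import math


def interval_from_counts(successes: int, n: int, z: float) -> tuple[float, float]:
    """Approximate interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error


# 12 of 33 lollies contained food coloring; z = 1.65 as in the working above
lower, upper = interval_from_counts(12, 33, 1.65)
print(f"({lower:.4f}, {upper:.4f})")  # → (0.2255, 0.5018)
```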

Practice questions

QUESTION 2

A sample of size $170$ is taken from the population, and the sample proportion is found to be $0.55$.

$$\begin{array}{c|c}
\text{Standard normal probability } P(Z<z) & z\text{-value} \\ \hline
0.9 & 1.282 \\
0.925 & 1.440 \\
0.95 & 1.645 \\
0.975 & 1.960 \\
0.99 & 2.326 \\
0.995 & 2.576
\end{array}$$
  1. State the $z$-value that corresponds to a $90\%$ confidence interval.

  2. Use the table of values to calculate the $90\%$ confidence interval for the true proportion.

    Express your answer in the form $(a,b)$, and give your answer to two decimal places.

  3. Which of the following statements about the confidence interval are correct? Select all that apply.

    A. There is a $90\%$ probability that the true proportion lies between $0.49$ and $0.61$.

    B. The probability that the true proportion lies within $\left(0.49,0.61\right)$ is $0$ or $1$.

    C. We have $90\%$ confidence that the true proportion lies between $0.49$ and $0.61$.

    D. The true proportion lies between $0.49$ and $0.61$.
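The $z$-values in these tables come from the standard normal distribution. For a confidence level $C$, the two tails outside the interval have total area $1-C$, shared equally, so we need the $z$ with $P(Z<z)=\frac{1+C}{2}$. For a $90\%$ interval that is $P(Z<z)=0.95$, which is why the $0.95$ row of the table is used. The sketch below computes this with Python's `statistics.NormalDist` instead of a table; `z_for_confidence` is our own name for the helper.

```python
from statistics import NormalDist


def z_for_confidence(level: float) -> float:
    """z such that the central `level` of N(0, 1) lies between -z and z.

    The area outside the interval is 1 - level, split equally between
    the two tails, so we need P(Z < z) = (1 + level) / 2.
    """
    return NormalDist().inv_cdf((1 + level) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_for_confidence(level):.3f}")
# prints z = 1.645, 1.960 and 2.576, matching the rows 0.95, 0.975 and 0.995
```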

QUESTION 3

In a sample of $60$ students from a school, $24$ of them would prefer different school hours.

$$\begin{array}{c|c}
\text{Standard normal probability } P(Z<z) & z\text{-value} \\ \hline
0.9 & 1.282 \\
0.925 & 1.440 \\
0.95 & 1.645 \\
0.975 & 1.960 \\
0.99 & 2.326 \\
0.995 & 2.576
\end{array}$$
  1. Estimate the probability that a student in the school would prefer different school hours.

  2. Estimate the standard deviation ($\hat{\sigma}$) of the sampling distribution.

    Round your answer to 2 decimal places.

  3. Use the table of values and the result of the previous part to find the $90\%$ confidence interval for the probability of a student at the school preferring different school hours.

    Express your answer in the form $(a,b)$, and give your answer to two decimal places.

  4. Which of the following statements about the proportion of students at the school preferring different school hours are correct? Select all that apply.

    A. There is a $90\%$ probability that the true proportion lies between $0.30$ and $0.50$.

    B. The true proportion lies between $0.30$ and $0.50$.

    C. The probability that the true proportion lies within $\left(0.30,0.50\right)$ is $0$ or $1$.

    D. We have $90\%$ confidence that the true proportion lies between $0.30$ and $0.50$.

QUESTION 4

In a car manufacturing plant, the brake pads of $100$ cars are tested and $10$ of them fail the test. State the $95\%$ confidence interval for the proportion of cars produced in the plant whose brakes fail the test.

Express your answer in the form $(a,b)$, and give your answer to two decimal places.

$$\begin{array}{c|c}
\text{Standard normal probability } P(Z<z) & z\text{-value} \\ \hline
0.9 & 1.282 \\
0.925 & 1.440 \\
0.95 & 1.645 \\
0.975 & 1.960 \\
0.99 & 2.326 \\
0.995 & 2.576
\end{array}$$

 

Margins of error

Whenever measurements are made, there is always some uncertainty about the true value of the quantity being measured.

If ten people measure the same physical quantity with as much precision as they can manage, there could be as many as ten slightly different results. Similarly, if several groups of, say, thirty people are surveyed as to their voting intentions in a coming election, the proportions planning to vote for the different parties are likely to be different from group to group.

We assume that the mean of the measurements, or the mean of the survey results, is close to the true or population mean. We might then present the experimental mean as the result of our research. However, we should acknowledge the uncertainty in the result by stating that the true mean is probably within some specified small distance of the result, together with our level of confidence that this is so.

Thus, after a survey of voting intentions, we might be told that party A would receive $52\%$ of the vote if the election were held tomorrow, but that there was a $2\%$ margin of error in the survey. This usually means that we can be $95\%$ confident that the true proportion of voters for party A is between $50\%$ and $54\%$.

The margin of error arises from the fact that a sample is being used to estimate a population parameter. The ten measurements of a physical quantity constitute a sample drawn from the infinitely many measurements that could be made on the quantity and the survey group of thirty people is a sample of the population of eligible voters.

Sample means or proportions vary unpredictably from sample to sample. Thus, they are random variables and, therefore, have an associated probability distribution. The size of the margin of error depends on the variance of the sampling distribution: the larger the sample, the smaller the variance of the sampling distribution and, hence, the margin of error, and vice versa. Techniques exist for determining how large a sample should be in order to keep the margin of error within certain bounds.
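The claim that a larger sample gives a smaller margin of error can be made concrete using the standard deviation of a sample proportion, $\sqrt{\hat{p}(1-\hat{p})/n}$, which is derived in the Binomial samples section. This sketch assumes a hypothetical survey with $\hat{p}=0.5$ and the $95\%$ value $z=1.96$; `margin_of_error` is a name chosen for the illustration.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width z * sqrt(p_hat * (1 - p_hat) / n) of the approximate interval."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical survey with p_hat = 0.5; each sample size is 4 times the last
for n in (30, 120, 480, 1920):
    print(n, round(margin_of_error(0.5, n), 4))
```

Because $n$ appears under a square root, quadrupling the sample size only halves the margin of error.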

 

Binomial samples

Statisticians speak of confidence intervals rather than margins of error. To construct a $95\%$ confidence interval, for example, is to construct bounds using a method that, over repeated sampling, captures the true value of a parameter $95\%$ of the time.

In the case of sample surveys in which there are two possible responses, the aim is to determine the proportion of responses in each category. This corresponds to estimating the probabilities $p$ and $1-p$ of observing one or other response from a particular subject. Thus, the estimate $\hat{p}$ is given by $\hat{p}=\frac{X}{n}$, where $X$ is a binomial random variable representing the number of 'successes' and $n$ is the number of subjects in the sample, that is, the number of trials.

The estimator $\hat{p}$ is itself a random variable since it varies from sample to sample. It can be shown that it has approximately a normal distribution when $n$ is large. (The sampling distribution of $\hat{p}$ cannot be exactly normal, since $\hat{p}$ can only take values between $0$ and $1$, whereas the normal distribution has a domain going from $-\infty$ to $\infty$. In particular, the approximation is poor when $p$ is near $0$ or near $1$.)

Now,

$E\left(\hat{p}\right)=E\left(\frac{X}{n}\right)=\frac{np}{n}=p$,

which makes $\hat{p}$ an unbiased estimator of $p$, and

$\mathrm{Var}\left(\hat{p}\right)=\mathrm{Var}\left(\frac{X}{n}\right)=\frac{1}{n^2}\mathrm{Var}\left(X\right)=\frac{1}{n^2}np\left(1-p\right)=\frac{p\left(1-p\right)}{n}$
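These two results can be checked numerically. The sketch below assumes a hypothetical true proportion $p=0.3$ and sample size $n=50$, simulates many binomial samples, and compares the mean and variance of the simulated values of $\hat{p}$ with $p$ and $\frac{p(1-p)}{n}$. The helper `simulate_p_hats` is a name made up for this illustration.

```python
import random
import statistics


def simulate_p_hats(p: float, n: int, reps: int, seed: int = 0) -> list[float]:
    """Draw `reps` samples of size n and return each sample proportion X / n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    # X counts the 'successes' among n independent trials with probability p
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


p, n = 0.3, 50  # hypothetical true proportion and sample size
p_hats = simulate_p_hats(p, n, reps=20000)

print("mean of p_hat:", round(statistics.fmean(p_hats), 4), " theory:", p)
print("var of p_hat: ", round(statistics.pvariance(p_hats), 5), " theory:", round(p * (1 - p) / n, 5))
```

The simulated values will not match the theory exactly, but they should be close, and they get closer as the number of repetitions grows.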

Understanding how these ideas are used to build a confidence interval of a desired size requires some theory about confidence intervals and the normal distribution.

We state here, without explanation, that to obtain a $95\%$ margin of error of size $e$ in a binomial sample of size $n$, we can use the formula

$n=\frac{1.96^2\hat{p}\left(1-\hat{p}\right)}{e^2}$
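Although the formula is stated without explanation, one way to see where it comes from is sketched below. It assumes the margin of error $e$ is the half-width of the approximate $95\%$ confidence interval given earlier in this lesson; setting that half-width equal to $e$ and solving for $n$ gives:

$$\begin{aligned}
e &= 1.96\sqrt{\frac{\hat{p}\left(1-\hat{p}\right)}{n}} \\
e^2 &= \frac{1.96^2\,\hat{p}\left(1-\hat{p}\right)}{n} \\
n &= \frac{1.96^2\,\hat{p}\left(1-\hat{p}\right)}{e^2}
\end{aligned}$$

Since $\hat{p}$ is not known before the survey is carried out, a guessed value is substituted in practice.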

Worked example

Question 5

Suppose the voting intentions, on a two-party preferred basis, are thought to be about equally divided. We wish to know what number of potential voters to include in a random sample to have a $95\%$ confidence level that the survey result will be within $2\%$ of the true proportion.

The error value $e$ is $0.02$ and we guess the value of $\hat{p}$ to be $0.5$. Thus, $n=\frac{1.96^2\times0.5\times0.5}{0.02^2}=2401$.

Thus, the survey should include at least $2401$ respondents.
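The calculation in this worked example can be wrapped in a small function. This is a sketch rather than part of the lesson: `sample_size` is a hypothetical name, and it rounds $n$ up, since a sample must contain a whole number of people and rounding down would give a margin of error slightly larger than $e$.

```python
import math


def sample_size(p_guess: float, e: float, z: float = 1.96) -> int:
    """Smallest whole n for which z * sqrt(p_guess * (1 - p_guess) / n) <= e."""
    n = z**2 * p_guess * (1 - p_guess) / e**2
    # Strip tiny floating-point noise (e.g. 2401.0000000000005) before
    # rounding up, so an exact whole-number answer is not bumped up by one.
    return math.ceil(round(n, 6))


print(sample_size(0.5, 0.02))  # → 2401, as in the worked example
print(sample_size(0.3, 0.02))  # a smaller n, since 0.3 * 0.7 < 0.5 * 0.5
```

Because $\hat{p}(1-\hat{p})$ is largest when $\hat{p}=0.5$, guessing $0.5$ gives the largest, most cautious sample size; any other guess gives a smaller $n$.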

Practice questions

Question 6

Using the table provided, calculate the margin of error for a $90\%$ confidence interval for a sample of size $140$ that has a sample proportion of $0.63$. Give your answer to two decimal places.

$$\begin{array}{c|c}
\text{Standard normal probability } P(Z<z) & z\text{-value} \\ \hline
0.9 & 1.282 \\
0.925 & 1.440 \\
0.95 & 1.645 \\
0.975 & 1.960 \\
0.99 & 2.326 \\
0.995 & 2.576
\end{array}$$

Question 7

Use the table provided to find the confidence level corresponding to a margin of error of $1.6$ in a standard normal variable.

Express your answer as a percentage correct to one decimal place.

$$\begin{array}{c|c}
z & \text{Area under the standard normal curve to the left of } z \\ \hline
1.50 & 0.9332 \\
1.60 & 0.9452 \\
1.70 & 0.9554 \\
1.80 & 0.9641 \\
1.90 & 0.9713 \\
2.00 & 0.9772 \\
2.10 & 0.9821 \\
2.20 & 0.9861 \\
2.30 & 0.9893 \\
2.40 & 0.9918 \\
2.50 & 0.9938 \\
2.60 & 0.9953 \\
2.70 & 0.9965 \\
2.80 & 0.9974 \\
2.90 & 0.9981 \\
3.00 & 0.9987
\end{array}$$
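A table like the one above gives $P(Z<z)$. Because the standard normal curve is symmetric, the area between $-z$ and $z$, which is the confidence level, equals $2P(Z<z)-1$. The sketch below computes this with Python's `statistics.NormalDist` in place of the table; `confidence_level` is a hypothetical helper, and $z=1.96$ is only an illustrative value.

```python
from statistics import NormalDist


def confidence_level(z: float) -> float:
    """Central area P(-z < Z < z) = 2 * P(Z < z) - 1 for Z ~ N(0, 1)."""
    return 2 * NormalDist().cdf(z) - 1


print(f"{confidence_level(1.96):.1%}")  # → 95.0%
```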

Question 8

 

Applications of confidence intervals

There are many real life applications for confidence intervals. Several of them are explored in the problems below.

Practice questions

QUESTION 9

QUESTION 10

QUESTION 11

Lucy and Adam run an ice cream shop and they are wondering whether they should also sell coffee at their shop.

  1. What is the $z$-score required to find a $90\%$ confidence interval? Give your answer correct to three decimal places.

  2. Lucy thinks that the proportion of customers who would also buy coffee is $0.3$.

    Calculate the size of the sample required for Lucy to achieve a margin of error of $0.05$ in an approximate $90\%$ confidence interval for this proportion.

    Use your answer from part 1 and round your answer to the nearest whole number.

  3. Adam thinks that the proportion of customers who would also buy coffee is $0.4$.

    Calculate the size of the sample required for Adam to achieve a margin of error of $0.05$ in an approximate $90\%$ confidence interval for this proportion.

    Use your answer from part 1 and round your answer to the nearest whole number.

  4. Suppose that Adam's estimate of the proportion of customers wanting coffee is correct, but a sample is performed using the sample size from Lucy's estimate.

    What will be the margin of error for the actual proportion $p$, using a $90\%$ confidence interval? Give your answer as a decimal, rounded to three decimal places.
