Lesson

Whenever measurements are made, there is always some uncertainty about the true value of the quantity being measured.

If ten people measure the same physical quantity with as much precision as they can manage, there could be as many as ten slightly different results. Similarly, if several groups of, say, thirty people are surveyed as to their voting intentions in a coming election, the proportions planning to vote for the different parties are likely to be different from group to group.

We assume that the mean of the measurements or of the survey results is close to the true or population mean. We might then present the experimental mean as the result of our research, but we should acknowledge the uncertainty involved by saying that the true mean is *probably* within some specified small distance of the result, and by giving the probability that this is so.

Thus, after a survey of voting intentions, we might be told that party **A** would receive $52\%$ of the vote if the election were to be held tomorrow but that there was a $2\%$ margin of error in the survey. This usually means that, with a probability of $0.95$, the true proportion of voters for party **A** is between $50\%$ and $54\%$.

The margin of error arises from the fact that a *sample* is being used to estimate a population parameter. The ten measurements of a physical quantity constitute a sample drawn from the infinitely many measurements that could be made on the quantity, and the survey group of thirty people is a sample of the population of eligible voters.

Sample means or proportions vary unpredictably from sample to sample. Thus, they are random variables and, therefore, have an associated probability distribution. The size of the margin of error depends on the variance of the sampling distribution: the larger the sample, the smaller the variance of the sampling distribution and, hence, the margin of error, and *vice versa*. Techniques exist for determining how large a sample should be in order to keep the margin of error within certain bounds.
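The effect of sample size on the spread of the sampling distribution can be illustrated with a small simulation. This is only a sketch: the true proportion of $0.52$, the sample sizes, the number of repetitions and the function name `simulate_sample_proportions` are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_sample_proportions(n, p, repetitions, rng):
    """Draw `repetitions` samples of size n and return each sample proportion."""
    proportions = []
    for _ in range(repetitions):
        # Each subject answers 'yes' with probability p (an assumed true value).
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


rng = random.Random(0)  # fixed seed so the run is repeatable
spreads = {}
for n in (30, 300, 3000):
    props = simulate_sample_proportions(n, 0.52, 400, rng)
    # Standard deviation (square root of the variance) of the simulated proportions.
    spreads[n] = statistics.pstdev(props)
    print(n, round(spreads[n], 4))
```

The printed standard deviations should shrink as the sample size grows, which is why larger samples give smaller margins of error.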

Statisticians speak of confidence intervals rather than margins of error. To construct a $95\%$ confidence interval, for example, is to construct bounds that, with a probability of $0.95$, contain the true value of a parameter.

In the case of sample surveys in which there are two possible responses, the aim is to determine the proportion of responses in each category. This corresponds to estimating the probabilities $p$ and $1-p$ of observing one or other response from a particular subject. Thus, the estimate $\hat{p}$ is given by $\hat{p}=\frac{X}{n}$, where $X$ is a binomial random variable representing the number of 'successes' and $n$ is the number of subjects in the sample, that is, the number of trials.

The estimator $\hat{p}$ is itself a random variable since it varies from sample to sample. It can be shown that it has approximately the *normal* distribution when $n$ is large. (The sampling distribution of $\hat{p}$ cannot be exactly normal, since $\hat{p}$ takes values only between $0$ and $1$ whereas the normal distribution has a domain going from $-\infty$ to $\infty$. In particular, the approximation is poor when $p$ is near $0$ or near $1$.)

Now,

$E\left(\hat{p}\right)=E\left(\frac{X}{n}\right)=\frac{np}{n}=p$,

which makes $\hat{p}$ an unbiased estimator of $p$, and

$Var\left(\hat{p}\right)=Var\left(\frac{X}{n}\right)=\frac{1}{n^2}Var\left(X\right)=\frac{1}{n^2}np\left(1-p\right)=\frac{p\left(1-p\right)}{n}$.
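These two results can be checked numerically by summing over the binomial distribution directly. The sketch below assumes illustrative values $n=30$ and $p=0.52$; the helper names `binomial_pmf` and `p_hat_mean_and_variance` are made up for this example.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def p_hat_mean_and_variance(n, p):
    """Mean and variance of p_hat = X / n, summed over every possible X."""
    values = [k / n for k in range(n + 1)]
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(v * w for v, w in zip(values, probs))
    variance = sum((v - mean) ** 2 * w for v, w in zip(values, probs))
    return mean, variance


n, p = 30, 0.52  # assumed values, for illustration only
mean, variance = p_hat_mean_and_variance(n, p)
print(mean, p)                    # the mean of p_hat against p
print(variance, p * (1 - p) / n)  # the variance against p(1 - p)/n
```

Up to floating-point rounding, each printed pair should agree.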

Understanding how these ideas are used to build a confidence interval of a desired size requires some theory about confidence intervals and the normal distribution.
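One piece of that theory is that about $95\%$ of a normal distribution lies within $1.96$ standard deviations of its mean. As a rough check that $\hat{p}$ behaves this way for a large sample, the sketch below computes from the binomial distribution the probability that $\hat{p}$ falls within $1.96\sqrt{p(1-p)/n}$ of $p$. The values $n=2401$ and $p=0.5$ and the function name `prob_within` are assumptions for illustration.

```python
from math import exp, lgamma, log, sqrt


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), with 0 < p < 1.

    Logarithms are used so that large n does not overflow.
    """
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def prob_within(n, p, z=1.96):
    """Probability that p_hat = X / n lies within z standard deviations of p."""
    half_width = z * sqrt(p * (1 - p) / n)
    return sum(binomial_pmf(k, n, p)
               for k in range(n + 1)
               if abs(k / n - p) <= half_width)


coverage = prob_within(2401, 0.5)
print(round(coverage, 3))
```

For a sample this large the result should be close to $0.95$. Trying a small $n$, or a $p$ near $0$ or $1$, shows how far the normal approximation can drift.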

We state here, without explanation, that to obtain a margin of error of size $e$ at the $95\%$ confidence level, the required size $n$ of a binomial sample is given by the formula

$n=\frac{1.96^2\hat{p}\left(1-\hat{p}\right)}{e^2}$
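A brief sketch of where this formula comes from, assuming that $\hat{p}$ is approximately normal and that its variance $\frac{p\left(1-p\right)}{n}$ is estimated by replacing $p$ with $\hat{p}$: since about $95\%$ of a normal distribution lies within $1.96$ standard deviations of its mean, the margin of error is roughly

$e=1.96\sqrt{\frac{\hat{p}\left(1-\hat{p}\right)}{n}}$

Squaring both sides gives

$e^2=\frac{1.96^2\hat{p}\left(1-\hat{p}\right)}{n}$

and solving for $n$ gives the formula stated.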

Suppose the voting intentions, on a two-party preferred basis, are thought to be about equally divided. We wish to know what number of potential voters to include in a random sample to have a $95\%$ confidence level that the survey result will be within $2\%$ of the true proportion.

The error value $e$ is $0.02$ and we guess the value of $\hat{p}$ to be $0.5$. Thus, $n=\frac{1.96^2\times0.5\times0.5}{0.02^2}=2401$.

Thus, the survey should include at least $2401$ respondents.
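The same calculation can be wrapped in a small function. This is a sketch only: the name `required_sample_size` and its parameters are hypothetical, and the inputs are converted to exact fractions (an implementation choice, not part of the formula) so that floating-point rounding cannot push the answer up or down by one.

```python
from fractions import Fraction
from math import ceil


def required_sample_size(e, p_guess, z=1.96):
    """Smallest whole n with z^2 * p(1 - p) / e^2 <= n.

    e       -- desired margin of error, e.g. 0.02
    p_guess -- guessed value of the sample proportion, e.g. 0.5
    z       -- normal multiplier; 1.96 corresponds to 95% confidence
    """
    # Convert decimal inputs such as 0.02 to exact fractions (here 1/50).
    e, p_guess, z = (Fraction(str(x)) for x in (e, p_guess, z))
    n_exact = z**2 * p_guess * (1 - p_guess) / e**2
    return ceil(n_exact)  # round up, since n must be a whole number


print(required_sample_size(0.02, 0.5))  # 2401
```

Because $\hat{p}\left(1-\hat{p}\right)$ is largest when $\hat{p}=0.5$, guessing $0.5$ gives the most cautious (largest) sample size for a given $e$.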

Investigate situations that involve elements of chance: (A) calculating probabilities of independent, combined, and conditional events; (B) calculating and interpreting expected values and standard deviations of discrete random variables; (C) applying distributions such as the Poisson, binomial, and normal.

Apply probability distributions in solving problems