When sampling, we are sometimes interested in obtaining a sub-group's proportion of an overall population, rather than in obtaining a sample estimate of a mean value $\mu$.

For example, suppose we are interested in finding the proportion of salmon in a lake full of fish. Unless we drain the lake, the exact proportion of salmon is probably unknowable. Perhaps our best option is to somehow obtain a random sample of fish (using fishing nets or some other device), and use the sample proportion as our estimate of the true proportion.

Suppose, for example, that there are $S+O$ fish in the lake, where $S$ is the total number of salmon and $O$ is the total number of other fish. Our random sample might contain $360$ fish, of which $90$ were salmon. The sample proportion, which we could call $\hat{p}$, would be $\frac{90}{360}=0.25$, and this would be an estimate of the population proportion $p=\frac{S}{S+O}$.
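
The arithmetic in this example can be sketched in a few lines of Python. The function name `sample_proportion` is an illustrative choice rather than part of any library, and the counts match the fraction $\frac{90}{360}$ used in the text.

```python
def sample_proportion(successes, sample_size):
    """Return the sample proportion: successes divided by sample size."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= successes <= sample_size:
        raise ValueError("successes must lie between 0 and sample_size")
    return successes / sample_size


# Fish example: 90 salmon in a random sample of 360 fish.
p_hat = sample_proportion(90, 360)
print(p_hat)  # 0.25
```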

Of course, the difficulty with taking a sample like this is that we simply don't know the value of $p$. In the fish sample, for example, how could we tell whether the obtained sample proportion $\hat{p}=0.25$ was representative of $p$?

This is a difficult question to answer, but at its heart, the answer has to do with experimental design. Clearly the larger the sample taken, the more accurate we expect $\hat{p}$ to be. When thinking about the sample size, we might consider the size of the lake and whether all fish have access to all parts of the lake. We might even decide to take a number of samples from different points around the lake and develop an average proportion.

What sample size should we use?

There are various statistical methods available to determine an appropriate sample size. A rather crude method involves using a preliminary test sample first, and from that determining a sufficient sample size to use in a more rigorous test.

For example, suppose we use the sample proportion $\hat{p}=0.25$ as our preliminary proportion.

Then, if we want to be $95\%$ sure that the true proportion $p$ is within $e=0.05$ of $\hat{p}$, we simply calculate:

$n=\frac{1.96^2\times(0.25)(1-0.25)}{0.05^2}=288.12$

The number $1.96$ in the formula corresponds to the chosen $95\%$ confidence level (a different confidence level would use a different value), and the result is only meant to be a ballpark figure.
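
Written in general form, the calculation appears to follow the standard sample-size formula for a proportion. Here $z$ is the critical value for the chosen confidence level ($z\approx1.96$ for $95\%$), $\hat{p}$ is the preliminary proportion and $e$ is the tolerance. The symbol $z$ is introduced here for illustration only and is not used elsewhere in this section:

$n=\frac{z^2\,\hat{p}(1-\hat{p})}{e^2}$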

If we take notice of the result, a rigorous estimate of $p$ might be obtained with a sample size of around $300$ fish. Note that the formula contains the square of the tolerance $e$ in the denominator, which means that if we want more accuracy, we will need a far larger sample: halving $e$ quadruples the required sample size.
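
As a rough sketch, the same calculation can be wrapped in a small Python function. The name `required_sample_size` and the default `z=1.96` (taken from the $95\%$ calculation in the text) are assumptions made for illustration. The final lines check the effect of halving the tolerance.

```python
import math


def required_sample_size(p_prelim, tolerance, z=1.96):
    """Rough sample size so the estimate lies within `tolerance` of p.

    z=1.96 corresponds to roughly 95% confidence (an assumed default).
    """
    if not 0 < p_prelim < 1:
        raise ValueError("p_prelim must be strictly between 0 and 1")
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    return z ** 2 * p_prelim * (1 - p_prelim) / tolerance ** 2


# Preliminary proportion 0.25, tolerance 0.05, as in the text.
n_raw = required_sample_size(0.25, 0.05)
print(round(n_raw, 2))   # 288.12
print(math.ceil(n_raw))  # 289, rounding up to a whole number of fish

# Halving the tolerance multiplies the required sample size by four.
n_tighter = required_sample_size(0.25, 0.025)
print(round(n_tighter / n_raw, 2))  # 4.0
```

Rounding up rather than to the nearest integer is a common convention, since a slightly larger sample never reduces the precision.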

Comparing a sample proportion and a claimed proportion

In certain instances we can determine the true proportion using rational argument. We know for example that the probability of obtaining a head in a single flip of a fair coin is $\frac{1}{2}$. The coin has two sides and each side is equally likely to fall uppermost.

In these instances we think of the population proportion as a probability $p$. It is often referred to as the $p$-value of the experiment. We can then compare any particular sample proportion $\hat{p}$ with its $p$-value and make a judgment on either the coin's fairness or the robustness of the sample.

Judgements like this are at the heart of sampling theory.
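
To get a feel for how a sample proportion compares with a known $p$, the coin example can be simulated. This is a minimal sketch using Python's standard `random` module. The function name, the seed and the sample sizes are arbitrary choices made for illustration. Each run prints the sample size, the simulated $\hat{p}$ and its distance from $p=\frac{1}{2}$.

```python
import random


def simulate_sample_proportion(p, n, rng):
    """Simulate n independent trials with success probability p
    and return the observed sample proportion."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n


rng = random.Random(0)  # fixed seed, an arbitrary choice for repeatability
for n in (10, 100, 10_000):
    p_hat = simulate_sample_proportion(0.5, n, rng)
    print(n, p_hat, abs(p_hat - 0.5))
```

Larger samples will typically, though not always, give a $\hat{p}$ closer to $p$. A $\hat{p}$ that stays far from $\frac{1}{2}$ in a large sample would cast doubt on the coin's fairness.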

Examples

Example 1

A random sample of $200$ people was asked whether they preferred chocolate ice-cream over vanilla ice-cream, and $88$ preferred chocolate. The sample proportion is $\hat{p}=\frac{88}{200}=0.44$.

Example 2

Two independent samples were conducted across a city's households to ascertain the proportion of households with more than $1$ bathroom. Sample A (a random sample of $130$ households) found a proportion of $0.3154$ and Sample B (a random sample of $210$ different households) found a proportion of $0.4095$.

Sample B would be more robust than sample A because the sample size of B is larger. However, knowing the sample sizes, and knowing that the samples used different households, allows us to combine the independent results. From sample A it looks like $130\times0.3154\approx41$ households have more than $1$ bathroom, and similarly from sample B, $210\times0.4095\approx86$ households have more than $1$ bathroom. This means that from the two samples, we know that of $340$ households, $127$ have more than $1$ bathroom. This is an overall sample proportion of $\frac{127}{340}\approx0.3735$.
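
The pooling step can be sketched in Python as follows. The function name `pooled_proportion` and the choice to recover each count by rounding are assumptions made for illustration. Rounding is needed because the reported proportions are themselves rounded to four decimal places.

```python
def pooled_proportion(samples):
    """Combine independent samples given as (sample_size, proportion) pairs.

    Counts are recovered by rounding size * proportion to the nearest
    whole number, since the reported proportions are themselves rounded.
    """
    total_size = sum(size for size, _ in samples)
    total_count = sum(round(size * prop) for size, prop in samples)
    return total_count, total_size, total_count / total_size


# Sample A: 130 households, proportion 0.3154.
# Sample B: 210 households, proportion 0.4095.
count, size, p_hat = pooled_proportion([(130, 0.3154), (210, 0.4095)])
print(count, size, round(p_hat, 4))  # 127 340 0.3735
```

The pooled value is a weighted average of the two sample proportions, so it sits closer to the result from the larger sample B than a simple average of $0.3154$ and $0.4095$ would.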

Worked Examples

QUESTION 1

A survey of $115$ randomly selected people in Busan found that $6$ of them were aged over $55$.

A second survey of $2183$ randomly selected people in Busan found that $475$ of them were aged over $55$.

Considering the first survey, what is the sample proportion of people in Busan over the age of $55$?
Considering the second survey, what is the sample proportion of people in Busan over the age of $55$?
Which sample proportion is likely to be the better estimate of the population proportion?
A. Neither sample. The sample size does not matter, since both come from the same population.
B. The first sample. The smaller the sample the more reliable the results and the more rigorous the sampling.
C. The second sample. The larger the sample size, the closer the parameters of the sample resemble the parameters of the population.

QUESTION 2

A population is sampled several times to investigate the proportion that purchase a new pot plant each year. Each sample is equal in size and the values of the sample proportions are recorded in the graph below.

How many samples were taken?
Hence, estimate the population proportion of customers who purchased a new pot plant. Give your answer as a decimal, correct to three decimal places.

QUESTION 3

A census for a particular country showed that $94\%$ of people used public transport at some point during a regular week.

At about the same time as the census, a sample of $2420$ people in a region of the country showed that $1001$ of those people used public transport at some point during a regular week.

Determine $p$, the population proportion of the residents who use public transport at least once a week. Express your answer as a decimal.
Determine $\hat{p}$, the sample proportion of the residents who use public transport at least once a week. Express your answer as a decimal, correct to two decimal places.
Comparing the population and sample proportions, which of the following statements is true?
A. The sample exhibits no bias and seems to be indicative of the larger population.
B. The sample exhibits bias and does not seem to adequately represent the larger population.
