Australia, VIC
VCE 12 Methods 2023

10.02 Distribution of sample proportions

Lesson

Sample proportion

As the name suggests, a sample proportion is the proportion of a random sample that exhibits the particular characteristic we are studying in the population. It can be expressed as a fraction, decimal or percentage, and its observed value is denoted by $\hat{p}$.

Calculating the sample proportion

The sample proportion is calculated by $\hat{P}=\frac{X}{n}$,

where $X$ is the number of members of the random sample exhibiting the particular characteristic and $n$ is the size of the sample taken.

While the value of $\hat{P}$ varies randomly, because $X$ can change from one sample of size $n$ to the next, the population proportion, $p$, remains constant.

Why would we look at a sample proportion in the first place? While we might prefer to survey the entire population about the characteristic we are researching, in reality this is often not possible, so we take a random sample of the population instead. The next best thing might be to take many random samples to get a good idea of what proportion of the population exhibits the characteristic. Due to time and financial constraints, however, a more common practice is to take one large sample and use its observed value of $\hat{P}$, denoted by the lowercase $\hat{p}$, as a point estimate of the population proportion $p$.
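To see that $\hat{P}$ behaves as a random variable while $p$ stays fixed, the sketch below simulates several random samples of size $n=200$ from a very large population. The population proportion `p = 0.45`, the seed and the function name `simulate_sample_proportions` are assumptions made purely for illustration; each simulated sample gives its own observed $\hat{p}$.

```python
import random

def simulate_sample_proportions(p, n, num_samples, seed=1):
    """Hypothetical helper: draw num_samples random samples of size n from a
    very large population in which a proportion p has the characteristic,
    and return the observed sample proportion of each sample."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    proportions = []
    for _ in range(num_samples):
        # Each member independently has the characteristic with probability p
        # (reasonable when the population is very large).
        x = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(x / n)  # p-hat = x / n
    return proportions

p_hats = simulate_sample_proportions(p=0.45, n=200, num_samples=5)
for i, p_hat in enumerate(p_hats, start=1):
    print(f"Sample {i}: p-hat = {p_hat:.3f}")
```

The population proportion is the same in every run; only the sample changes, and with it the value of $\hat{p}$.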

Worked example

Example 1

A random sample of $200$ people were asked whether they preferred chocolate ice-cream over vanilla ice-cream. Of these, $88$ preferred chocolate over vanilla. Determine the sample proportion.

Think: The sample size is $200$ and the number who said they preferred chocolate was $88$. We can express our sample proportion as a fraction or a decimal.

Do: $\hat{p}=\frac{88}{200}=0.44$.

Practice questions

question 1

A survey of $115$ randomly selected people in Busan found that $6$ of them were aged over $55$.

A second survey of $2183$ randomly selected people in Busan found that $475$ of them were aged over $55$.

  1. Considering the first survey, what is the sample proportion of people in Busan over the age of $55$?

  2. Considering the second survey, what is the sample proportion of people in Busan over the age of $55$?

  3. Which sample proportion is likely to be the better estimate of the population proportion?

    A. Neither sample. The sample size does not matter, since both come from the same population.

    B. The first sample. The smaller the sample, the more reliable the results and the more rigorous the sampling.

    C. The second sample. The larger the sample size, the more closely the sample statistics resemble the parameters of the population.

question 2

A census for a particular country showed that $94\%$ of people used public transport at some point during a regular week.

At about the same time as the census, a sample of $2420$ people in a region of the country showed that $1001$ of those people used public transport at some point during a regular week.

  1. Determine $p$, the population proportion of the residents who use public transport at least once a week. Express your answer as a decimal.

  2. Determine $\hat{p}$, the sample proportion of the residents who use public transport at least once a week. Express your answer as a decimal, correct to two decimal places.

  3. Comparing the population and sample proportions, which of the following statements is true?

    A. The sample exhibits no bias and seems to be indicative of the larger population.

    B. The sample exhibits bias and does not seem to adequately represent the larger population.

 

The distribution of the sample proportions

As we saw in the previous chapter, when we compare one sample to another, we notice some variability in our samples.

When we focus on one particular characteristic occurring in the sample, for example whether a student walks to school, each member of the sample is either a success (has the characteristic) or a failure (does not). This focus on a single characteristic, together with our previous knowledge, enables us to model the distribution of the sample proportions using a discrete random variable.

 

Sampling from small populations

Using our knowledge of probability and possible selections we can create a probability distribution for the number of successes and hence, the proportion of successes in a sample.

Worked example

Example 2

A bag contains $4$ blue and $3$ green marbles. Three marbles are selected at random without replacement. Let $X$ be the number of blue marbles selected.

(a) Represent $X$ in a probability distribution table.

Think: By considering a tree diagram or the number of possible combinations, we can use selections to calculate the probabilities.

Do:

| $x$ | $0$ | $1$ | $2$ | $3$ |
|---|---|---|---|---|
| Calculation | $\frac{3\times2\times1}{7\times6\times5}$ | $3\times\frac{4\times3\times2}{7\times6\times5}$ | $3\times\frac{4\times3\times3}{7\times6\times5}$ | $\frac{4\times3\times2}{7\times6\times5}$ |
| $P\left(X=x\right)$ | $\frac{1}{35}$ | $\frac{12}{35}$ | $\frac{18}{35}$ | $\frac{4}{35}$ |
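Counting unordered selections gives the same table: $P\left(X=x\right)=\frac{\binom{4}{x}\binom{3}{3-x}}{\binom{7}{3}}$, a hypergeometric distribution. The sketch below assumes Python 3.8+ for `math.comb`; the function name `blue_marble_distribution` is only illustrative.

```python
from fractions import Fraction
from math import comb

def blue_marble_distribution(blue=4, green=3, draws=3):
    """Hypothetical helper: P(X = x) for X = number of blue marbles when
    `draws` marbles are taken WITHOUT replacement (hypergeometric)."""
    total = comb(blue + green, draws)  # number of equally likely selections
    return {
        # choose x blue marbles and the remaining draws - x from the green ones
        x: Fraction(comb(blue, x) * comb(green, draws - x), total)
        for x in range(draws + 1)
    }

dist = blue_marble_distribution()
for x, prob in dist.items():
    print(x, prob)
```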

(b) Instead of the number of blue marbles in a selection, represent the distribution of the proportion of blue marbles in a selection, $\hat{P}$.

Think: The possible number of blue marbles in each selection was $0$, $1$, $2$ or $3$. Hence, the possible proportion of blue marbles in a selection is $0$, $\frac{1}{3}$, $\frac{2}{3}$ or $\frac{3}{3}=1$. Thus if we define $\hat{P}$ as the proportion of blue marbles in a selection, we can convert the distribution above to one representing the distribution of sample proportions.

Do:

| $x$ | $0$ | $1$ | $2$ | $3$ |
|---|---|---|---|---|
| $\hat{p}$ | $0$ | $\frac{1}{3}$ | $\frac{2}{3}$ | $1$ |
| $P\left(\hat{P}=\hat{p}\right)$ | $\frac{1}{35}$ | $\frac{12}{35}$ | $\frac{18}{35}$ | $\frac{4}{35}$ |
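Converting the table is just relabelling: each value $x$ becomes $\hat{p}=\frac{x}{3}$ and its probability carries over unchanged. The sketch below does this with exact fractions and, as an extra check, computes $E\left(\hat{P}\right)$ from the new table; it comes out as $\frac{4}{7}$, the proportion of blue marbles in the bag.

```python
from fractions import Fraction

# Distribution of X from part (a): number of blue marbles in 3 draws.
dist_x = {0: Fraction(1, 35), 1: Fraction(12, 35), 2: Fraction(18, 35), 3: Fraction(4, 35)}
n = 3  # number of marbles selected

# Each value of X maps to p-hat = x / n; the probabilities are unchanged.
dist_p_hat = {Fraction(x, n): prob for x, prob in dist_x.items()}

for p_hat, prob in dist_p_hat.items():
    print(p_hat, prob)

# Expected proportion of blue marbles in a selection
mean_p_hat = sum(p_hat * prob for p_hat, prob in dist_p_hat.items())
print(mean_p_hat)  # 4/7
```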

(c) How would the distribution change if the three marbles were selected with replacement after each selection?

Think: Because each marble is replaced, the bag always contains $4$ blue and $3$ green marbles, so the probability of selecting a blue marble is the same on every draw. The number of blue marbles in a selection therefore follows a binomial distribution with probability of success $p=\frac{4}{7}$ and $n=3$. As above, we can convert this to a table involving proportions by dividing each number of blue marbles, $x$, by $n=3$.

Do:

| $x$ | $0$ | $1$ | $2$ | $3$ |
|---|---|---|---|---|
| $\hat{p}$ | $0$ | $\frac{1}{3}$ | $\frac{2}{3}$ | $1$ |
| $P\left(\hat{P}=\hat{p}\right)$ (3 s.f.) | $0.0787$ | $0.315$ | $0.420$ | $0.187$ |
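With replacement, the table comes from $P\left(X=x\right)=\binom{3}{x}\left(\frac{4}{7}\right)^x\left(\frac{3}{7}\right)^{3-x}$. The sketch below evaluates this exactly with fractions and then rounds, which reproduces the decimal values above; `binomial_pmf` is a small illustrative helper, not a library function.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """Illustrative helper: P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 3, Fraction(4, 7)  # 3 draws with replacement, P(blue) = 4/7 every time
for x in range(n + 1):
    prob = binomial_pmf(x, n, p)
    # show p-hat = x/n, the exact probability and a 3 s.f. decimal
    print(f"p-hat = {Fraction(x, n)}: {prob} ~ {float(prob):.3g}")
```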

 

Sampling from large populations

When sampling from a large population, we can treat the situation as if we are selecting with replacement and assume the probability of success remains the same for each selection. In doing so, the probability distribution of the sample proportions can be obtained from the binomial distribution, where the probability of success is $p$ and the number of trials is the sample size, $n$. Let's look at an example below.
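A quick numerical check of why this approximation is reasonable: the sketch below compares exact probabilities for drawing $5$ students without replacement from an assumed school of $10\,000$ students (of whom $2000$, or $20\%$, walk) with the binomial probabilities for $B\left(5,0.2\right)$. The population size and the helper names are hypothetical, chosen only to illustrate the claim.

```python
from math import comb

def hypergeometric_pmf(x, population, successes, n):
    """Illustrative helper: P(X = x) when n items are drawn WITHOUT replacement
    from `population` items, of which `successes` have the characteristic."""
    return comb(successes, x) * comb(population - successes, n - x) / comb(population, n)

def binomial_pmf(x, n, p):
    """Illustrative helper: P(X = x) for X ~ B(n, p), i.e. drawing WITH replacement."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Assumed illustration: a school of 10 000 students, 20% of whom walk.
population, successes, n = 10_000, 2_000, 5
p = successes / population

for x in range(n + 1):
    exact = hypergeometric_pmf(x, population, successes, n)
    approx = binomial_pmf(x, n, p)
    print(f"x = {x}: without replacement {exact:.5f}, binomial {approx:.5f}")
```

Because removing a handful of students barely changes the proportion left in a large population, the two columns agree to about three decimal places.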

Worked example

Example 3

At a particular school, records show that $20\%$ of students walk to school. In a random sample of $5$ students, let $X$ be the number of students who walked to school.

(a) Represent $X$ in a probability distribution table.

Think: As we recall from our work on the binomial random variable, students either walk to school (success) or they do not (failure), and the number of students out of $5$ who walked to school varies from $0$ to $5$. We can use technology to complete a table of values. Note that here we are looking at the number of students who walked, not the proportion.

Do:

| $x$ | $0$ | $1$ | $2$ | $3$ | $4$ | $5$ |
|---|---|---|---|---|---|---|
| $P\left(X=x\right)$ | $0.32768$ | $0.4096$ | $0.2048$ | $0.0512$ | $0.0064$ | $0.00032$ |

(b) Instead of the number of students who walked to school, represent the distribution of the proportion of students who walked to school in this sample, $\hat{p}$.

Think: If the number of students who walked to school was $0$, $1$, $2$, $3$, $4$ or $5$, then the proportion of students who walked to school is $0$, $\frac{1}{5}$, $\frac{2}{5}$, $\frac{3}{5}$, $\frac{4}{5}$ or $\frac{5}{5}=1$. Thus if we define $\hat{P}$ as the proportion of students who walked to school in this sample, we can represent the distribution in a table very similar to the one above.

Do:

| $\hat{p}$ | $0$ | $0.2$ | $0.4$ | $0.6$ | $0.8$ | $1$ |
|---|---|---|---|---|---|---|
| $P\left(\hat{P}=\hat{p}\right)$ | $0.32768$ | $0.4096$ | $0.2048$ | $0.0512$ | $0.0064$ | $0.00032$ |
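The "use technology" step can be sketched in a few lines of Python: compute $P\left(X=x\right)$ for $X\sim B\left(5,0.2\right)$ and pair each probability with $\hat{p}=\frac{x}{5}$. The helper `binomial_pmf` is illustrative; a CAS calculator's binomial PDF command would play the same role.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Illustrative helper: P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 5, 0.2  # sample of 5 students; 20% of the school walks
# Pair each sample proportion x/n with P(X = x)
table = [(x / n, binomial_pmf(x, n, p)) for x in range(n + 1)]
for p_hat, prob in table:
    print(f"p-hat = {p_hat:.1f}: {prob:.5f}")
```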

Practice question

question 3

Three marbles are randomly drawn from a bag containing five black and six grey marbles.

Let $X$ be the number of black marbles drawn, with replacement.

  1. What is $p$, the proportion of black marbles in the bag?

  2. If $3$ marbles are drawn, with replacement, then the number of black marbles drawn, $X$, can be $0$, $1$, $2$ or $3$.

    What are the values of the sample proportions, $\hat{P}$, of black marbles associated with each outcome of $X$?

    Simplify your answers where possible.

    If $X=0$: $\hat{P}=$ ______

    If $X=1$: $\hat{P}=$ ______

    If $X=2$: $\hat{P}=$ ______

    If $X=3$: $\hat{P}=$ ______

  3. Construct the probability distribution for $X$ and $\hat{P}$ below.

    Write each probability correct to four decimal places.

    | $x$ | $0$ | $1$ | $2$ | $3$ |
    |---|---|---|---|---|
    | $P\left(X=x\right)$ | $0.1623$ | ______ | ______ | ______ |
    | $\hat{p}$ | $0$ | $\frac{1}{3}$ | $\frac{2}{3}$ | $1$ |
    | $P\left(\hat{P}=\hat{p}\right)$ | ______ | $0.4057$ | ______ | ______ |

  4. Use your answers from part 3 to determine $P\left(\hat{P}<\frac{1}{2}\right)$, correct to four decimal places.

 

Mean and standard deviation of the distribution of sample proportions

Before we construct the distribution of the sample proportions, we first construct a distribution for the number of successes, that is, the number of members of our sample exhibiting the particular characteristic. Hence, we let $X$ be the number of successes in our random sample. When the population is large or the sampling is done with replacement, $X$ can be modelled by a binomial distribution such that $X\sim B\left(n,p\right)$.

When we think about how we obtained the $\hat{p}$ values in our example above, we simply divided each $x$ by $n$, the number of trials, which is the size of our sample.

We can relate this back to what we learnt about the linear change of scale and origin, and we can therefore state that $\hat{P}=\frac{X}{n}$ and hence:

$$\begin{aligned}
E\left(\hat{P}\right) &= \frac{E\left(X\right)}{n} && \text{using the linear change of scale}\\
&= \frac{n\times p}{n} && \text{substituting } E\left(X\right)=np \text{ from the binomial distribution}\\
&= p && \text{simplifying}
\end{aligned}$$

 

Similarly we can apply the same linear change of scale and see how we calculate the standard deviation.

$$\begin{aligned}
SD\left(\hat{P}\right) &= \frac{\sqrt{n\times p\times\left(1-p\right)}}{n} && \text{using the linear change of scale and the binomial standard deviation}\\
&= \sqrt{\frac{n\times p\times\left(1-p\right)}{n^2}} && \text{bringing the } n \text{ into the square root by squaring}\\
&= \sqrt{\frac{p\times\left(1-p\right)}{n}} && \text{cancelling a common factor of } n
\end{aligned}$$
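Both derivations can be checked exactly for particular values of $n$ and $p$: build the table of $\hat{P}=\frac{X}{n}$ from $X\sim B\left(n,p\right)$, then compute the mean and variance directly from the table, as for any discrete random variable. The sketch compares the variance (the square of the standard deviation) so everything stays in exact fractions; the test cases and the helper name are arbitrary choices for illustration.

```python
from fractions import Fraction
from math import comb

def p_hat_mean_and_variance(n, p):
    """Illustrative helper: compute E(P-hat) and Var(P-hat) directly from the
    table of P-hat = X/n, where X ~ B(n, p), using exact fractions."""
    dist = {Fraction(x, n): comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}
    mean = sum(v * prob for v, prob in dist.items())
    variance = sum((v - mean) ** 2 * prob for v, prob in dist.items())
    return mean, variance

for n, p in [(3, Fraction(4, 7)), (5, Fraction(1, 5)), (12, Fraction(3, 10))]:
    mean, variance = p_hat_mean_and_variance(n, p)
    # Compare with the derived results E(P-hat) = p and SD(P-hat)^2 = p(1 - p)/n
    print(n, p, mean == p, variance == p * (1 - p) / n)
```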

 

Mean and standard deviation

When selecting a random sample of size $n$ from a large population, the distribution of sample proportions has:

  • Mean: $E\left(\hat{P}\right)=p$
  • Standard deviation: $SD\left(\hat{P}\right)=\sqrt{\frac{p\left(1-p\right)}{n}}$
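A simulation makes the formula for $SD\left(\hat{P}\right)$ visible: as $n$ grows, simulated sample proportions cluster more tightly around $p$. The sketch below assumes $p=0.2$, $2000$ simulated samples for each sample size and a fixed seed, all arbitrary choices; it compares the simulated standard deviation with $\sqrt{\frac{p\left(1-p\right)}{n}}$.

```python
import random
import statistics

def simulate_p_hats(p, n, num_samples, rng):
    """Illustrative helper: return num_samples simulated sample proportions
    for samples of size n from a very large population with proportion p."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(num_samples)]

rng = random.Random(2023)  # seeded for reproducibility
p = 0.2
for n in (5, 50, 500):
    p_hats = simulate_p_hats(p, n, num_samples=2000, rng=rng)
    print(
        f"n = {n:>3}: simulated mean {statistics.mean(p_hats):.3f}, "
        f"simulated SD {statistics.stdev(p_hats):.4f}, "
        f"formula SD {(p * (1 - p) / n) ** 0.5:.4f}"
    )
```

The simulated means stay close to $p$ for every $n$, while the spread shrinks roughly in proportion to $\frac{1}{\sqrt{n}}$.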

Worked example

Example 4

Let's return to Example 3, about the random sample of $5$ students and the proportion of the sample walking to school, where it is known that $20\%$ of students from this school walk to school.

(a) Calculate the mean of the distribution of the sample proportions.

Think: We could calculate the mean or expected value in the same way we did for any discrete random variable, directly from the table. However, since the values in the table for $P\left(\hat{P}=\hat{p}\right)$ are directly related to the values for $P\left(X=x\right)$, which in turn are obtained from the binomial distribution, we can use what we know about the expected value of the binomial distribution.

Do: From our work above, $E\left(\hat{P}\right)=p$, which is $0.2$ for this example.

(b) Calculate the standard deviation of the distribution of the sample proportions.

Think: Again, we could calculate the standard deviation directly from the table, or we can use the relationship we now know between the distribution of the sample proportions and the binomial distribution and get straight into our calculation.

Do: Using our simplified formula from above:

$$\begin{aligned}
SD\left(\hat{P}\right) &= \sqrt{\frac{0.2\times0.8}{5}}\\
&= \frac{2\sqrt{5}}{25}
\end{aligned}$$
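As a numerical cross-check of Example 4, the sketch below evaluates the two formulas for $n=5$ and $p=0.2$ and confirms the exact form $\frac{2\sqrt{5}}{25}\approx0.1789$.

```python
from math import isclose, sqrt

n, p = 5, 0.2  # sample of 5 students; 20% of the school walks
mean_p_hat = p                      # E(P-hat) = p
sd_p_hat = sqrt(p * (1 - p) / n)    # SD(P-hat) = sqrt(p(1 - p)/n)

print(mean_p_hat)                            # 0.2
print(round(sd_p_hat, 4))                    # 0.1789
print(isclose(sd_p_hat, 2 * sqrt(5) / 25))   # True
```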

Practice questions

question 4

$15\%$ of all customers at a book store bought at least one autobiography. In a random sample of $100$ customers, determine:

  1. The expected number of customers who purchased an autobiography.

  2. The expected sample proportion of customers who purchased an autobiography.

  3. The standard deviation of the sample proportions of customers who purchased an autobiography.

    Round your answer to three decimal places.

question 5

Three marbles are drawn with replacement from a bag containing six black and five grey marbles.

  1. Let $X$ be the number of black marbles drawn.

    Determine $E\left(X\right)$.

  2. Determine the standard deviation of $X$.
    Leave your answer in exact form.

  3. Let $\hat{P}$ be the proportion of black marbles drawn.

    Determine $E\left(\hat{P}\right)$.

  4. Determine the standard deviation of $\hat{P}$.
    Leave your answer in exact form.

Outcomes

U34.AoS4.7

the definition of sample proportion as a random variable and key features of the distribution of sample proportions

U34.AoS4.4

statistical inference, including definition and distribution of sample proportions, simulations and confidence intervals:

  • distinction between a population parameter and a sample statistic and the use of the sample statistic to estimate the population parameter
  • simulation of random sampling, for a variety of values of $p$ and a range of sample sizes, to illustrate the distribution of $\hat{P}$ and variations in confidence intervals between samples
  • concept of the sample proportion $\hat{P}=\frac{X}{n}$ as a random variable whose value varies between samples, where $X$ is a binomial random variable which is associated with the number of items that have a particular characteristic and $n$ is the sample size
  • approximate normality of the distribution of $\hat{P}$ for large samples and, for such a situation, the mean $p$ (the population proportion) and standard deviation $\sqrt{\frac{p\left(1-p\right)}{n}}$
  • determination and interpretation of, from a large sample, an approximate confidence interval for a population proportion where $z$ is the appropriate quantile for the standard normal distribution, in particular the $95\%$ confidence interval as an example of such an interval where $z\approx1.96$ (the term standard error may be used but is not required).

U34.AoS4.12

simulate repeated random sampling and interpret the results, for a variety of population proportions and a range of sample sizes, to illustrate the distribution of sample proportions and variations in confidence intervals
