
10.02 Distribution of sample proportions

Lesson

Sample proportion

As the name suggests, a sample proportion is the proportion of a random sample that exhibits a particular characteristic we are observing in the population. The proportion in a particular sample can be expressed as a fraction, decimal or percentage, and is denoted by $\hat{p}$.

Calculating the sample proportion

The sample proportion is calculated by $\hat{P}=\frac{X}{n}$,

where $X$ is the number of members of a random sample exhibiting the particular characteristic and $n$ is the size of the sample taken.

While the value of $\hat{P}$ varies randomly, because $X$ (and possibly $n$) changes from one sample to the next, the population proportion, $p$, remains constant.
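The variation described above can be illustrated with a short simulation. This is only a sketch: the population proportion $p=0.3$, the sample size of 200 and the helper name `simulate_sample_proportion` are all hypothetical choices made for illustration, and each member of the sample is assumed to show the characteristic independently with probability $p$.

```python
import random

def simulate_sample_proportion(p, n, rng):
    """Draw one random sample of size n and return its sample proportion X/n."""
    # Assumption: each member shows the characteristic independently
    # with probability p (reasonable for a very large population).
    x = sum(1 for _ in range(n) if rng.random() < p)
    return x / n

rng = random.Random(1)   # fixed seed so the run is repeatable
p = 0.3                  # hypothetical population proportion (stays constant)
p_hats = [simulate_sample_proportion(p, 200, rng) for _ in range(5)]
print(p_hats)            # five observed values of the sample proportion
```

Each run of the helper gives one observed value $\hat{p}$ of the random variable $\hat{P}$, while `p` never changes.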

Why would we end up looking at a sample proportion in the first place? While we might prefer to survey the entire population about the particular characteristic we're researching, in reality this is often not possible. We therefore have to take a random sample of the population. Perhaps the next best thing is to take a large number of samples from the population to get a good idea of what proportion exhibits the characteristic. Due to time and financial constraints, a more common practice is to take one large sample and use that particular value of $\hat{P}$, which we denote by the lowercase $\hat{p}$, as a point estimate of the population proportion $p$.

Worked example

Example 1

A random sample of $200$ people were asked whether they preferred chocolate ice-cream over vanilla ice-cream. $88$ preferred chocolate over vanilla. Determine the sample proportion.

Think: The sample size is $200$ and the number who said they preferred chocolate was $88$. We can express our sample proportion as a fraction or a decimal.

Do: $\hat{p}=\frac{88}{200}=0.44$.
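The same arithmetic can be checked with a few lines of Python. This is a minimal sketch; the function name `sample_proportion` is a hypothetical helper, not part of the lesson.

```python
from fractions import Fraction

def sample_proportion(x, n):
    """Return the sample proportion x/n as an exact fraction."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    return Fraction(x, n)

p_hat = sample_proportion(88, 200)
print(p_hat, float(p_hat))   # 11/25 0.44
```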

Practice questions

Question 1

A survey of $115$ randomly selected people in Busan found that $6$ of them were aged over $55$.

A second survey of $2183$ randomly selected people in Busan found that $475$ of them were aged over $55$.

  1. Considering the first survey, what is the sample proportion of people in Busan over the age of $55$?

  2. Considering the second survey, what is the sample proportion of people in Busan over the age of $55$?

  3. Which sample proportion is likely to be the better estimate of the population proportion?

    A. Neither sample. The sample size does not matter, since both come from the same population.

    B. The first sample. The smaller the sample the more reliable the results and the more rigorous the sampling.

    C. The second sample. The larger the sample size, the closer the parameters of the sample resemble the parameters of the population.

Question 2

A census for a particular country showed that $94\%$ of people used public transport at some point during a regular week.

At about the same time as the census, a sample of $2420$ people in a region of the country showed that $1001$ of those people used public transport at some point during a regular week.

  1. Determine $p$, the population proportion of the residents who use public transport at least once a week. Express your answer as a decimal.

  2. Determine $\hat{p}$, the sample proportion of the residents who use public transport at least once a week. Express your answer as a decimal, correct to two decimal places.

  3. Comparing the population and sample proportions, which of the following statements is true?

    A. The sample exhibits no bias and seems to be indicative of the larger population.

    B. The sample exhibits bias and does not seem to adequately represent the larger population.

 

The distribution of the sample proportions

As we saw in the previous chapter, when we compare one sample to another, we notice some variability in our samples.

When we focus on one particular characteristic occurring in the sample, say the number of students walking to school, each member of the sample is either a success or a failure. Our previous knowledge, together with this focus on one particular characteristic, enables us to model the distribution of the sample proportions using a discrete random variable.

 

Sampling from small populations

Using our knowledge of probability and possible selections we can create a probability distribution for the number of successes and hence, the proportion of successes in a sample.

Worked example

Example 2

A bag contains $4$ blue and $3$ green marbles. Three marbles are selected at random without replacement. Let $X$ be the number of blue marbles selected.

(a) Represent $X$ in a probability distribution table.

Think: By considering a tree diagram or the number of possible combinations, we can use selections to calculate the probabilities.

Do:

| $x$ | $0$ | $1$ | $2$ | $3$ |
| --- | --- | --- | --- | --- |
| Calculation | $\frac{3\times2\times1}{7\times6\times5}$ | $3\times\frac{4\times3\times2}{7\times6\times5}$ | $3\times\frac{4\times3\times3}{7\times6\times5}$ | $\frac{4\times3\times2}{7\times6\times5}$ |
| $P\left(X=x\right)$ | $\frac{1}{35}$ | $\frac{12}{35}$ | $\frac{18}{35}$ | $\frac{4}{35}$ |
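The probabilities in the table can also be obtained by counting combinations rather than multiplying along the branches of a tree diagram. The sketch below is one way to do this; the function name `blue_count_distribution` is a hypothetical helper, and it uses exact fractions so the results can be compared directly with the table.

```python
from fractions import Fraction
from math import comb

def blue_count_distribution(blue, green, draws):
    """P(X = x) for the number of blue marbles drawn without replacement."""
    total = comb(blue + green, draws)   # number of equally likely selections
    return {
        # choose x of the blue marbles and the remaining draws from the green
        x: Fraction(comb(blue, x) * comb(green, draws - x), total)
        for x in range(draws + 1)
    }

dist = blue_count_distribution(4, 3, 3)
for x, prob in dist.items():
    print(x, prob)
```

For the bag in this example the four probabilities come out as $\frac{1}{35}$, $\frac{12}{35}$, $\frac{18}{35}$ and $\frac{4}{35}$, which sum to $1$.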

(b) Instead of the number of blue marbles in a selection, represent the distribution of the proportion of blue marbles in a selection, $\hat{P}$.

Think: The possible number of blue marbles in each selection was $0$, $1$, $2$ or $3$. Hence, the possible proportion of blue marbles in a selection is $0$, $\frac{1}{3}$, $\frac{2}{3}$ or $\frac{3}{3}=1$. Thus if we define $\hat{P}$ as the proportion of blue marbles in a selection, we can convert the distribution above to one representing the distribution of sample proportions.

Do:

| $x$ | $0$ | $1$ | $2$ | $3$ |
| --- | --- | --- | --- | --- |
| $\hat{p}$ | $0$ | $\frac{1}{3}$ | $\frac{2}{3}$ | $1$ |
| $P\left(\hat{P}=\hat{p}\right)$ | $\frac{1}{35}$ | $\frac{12}{35}$ | $\frac{18}{35}$ | $\frac{4}{35}$ |

(c) How would the distribution change if the three marbles were selected with replacement after each selection?

Think: The number of marbles in the bag, and hence the proportion that are blue, would remain unchanged on each selection. The number of blue marbles in a selection would therefore follow a binomial distribution with probability of success $p=\frac{4}{7}$ and $n=3$. Similar to the above, we can convert this to a table involving proportions by dividing the number in the selection, $x$, by $n=3$.

Do:

| $x$ | $0$ | $1$ | $2$ | $3$ |
| --- | --- | --- | --- | --- |
| $\hat{p}$ | $0$ | $\frac{1}{3}$ | $\frac{2}{3}$ | $1$ |
| $P\left(\hat{P}=\hat{p}\right)$ ($3$ s.f.) | $0.0787$ | $0.315$ | $0.420$ | $0.187$ |
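The with-replacement probabilities in part (c) can be reproduced from the binomial formula. The sketch below is illustrative only; `proportion_distribution` is a hypothetical helper name, and exact fractions are used so that the decimal values are rounded only at the end.

```python
from fractions import Fraction
from math import comb

def proportion_distribution(n, p):
    """Map each sample proportion x/n to its binomial probability."""
    return {
        Fraction(x, n): comb(n, x) * p**x * (1 - p)**(n - x)
        for x in range(n + 1)
    }

dist = proportion_distribution(3, Fraction(4, 7))
for p_hat, prob in dist.items():
    print(p_hat, prob, round(float(prob), 4))
```

The exact values are $\frac{27}{343}$, $\frac{108}{343}$, $\frac{144}{343}$ and $\frac{64}{343}$, which agree with the rounded entries in the table.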

 

Sampling from large populations

When considering sampling from a large population, we can treat the situation as if we are selecting with replacement and assume the probability remains the same for each selection. In doing so, the probability distribution of the sample proportions can be obtained using the binomial distribution where the probability of success is $p$ and the number of trials will be the number of objects in the sample. Let's look at an example below.
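Before the worked example, the claim that a large population can be treated as sampling with replacement can be checked numerically. The sketch below compares exact without-replacement probabilities with binomial ones. The population of $10\,000$ with $2000$ successes and the sample size of $5$ are hypothetical values chosen for illustration, as are the function names.

```python
from math import comb

def hypergeometric_pmf(x, population, successes, n):
    """P(X = x) when n items are drawn without replacement."""
    return (comb(successes, x) * comb(population - successes, n - x)
            / comb(population, n))

def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

population, successes, n = 10_000, 2_000, 5   # hypothetical values
p = successes / population

gaps = [
    abs(hypergeometric_pmf(x, population, successes, n) - binomial_pmf(x, n, p))
    for x in range(n + 1)
]
max_gap = max(gaps)
print(max_gap)   # largest difference between the two models
```

With these values the two sets of probabilities differ by less than $0.001$ for every outcome, which is why the binomial model is a reasonable approximation when the population is much larger than the sample.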

Worked example

Example 3

At a particular school, records show that $20\%$ of students walk to school. In a random sample of $5$ students, let $X$ be the number of students who walked to school.

(a) Represent $X$ in a probability distribution table.

Think: As we recall from our work on the binomial random variable, students either walk to school (success) or they do not (failure), and the number of students out of $5$ who walked to school varies from $0$ to $5$. We can use technology to complete a table of values. Note that here we are looking at the number of students who walked, not the proportion.

Do:

| $x$ | $0$ | $1$ | $2$ | $3$ | $4$ | $5$ |
| --- | --- | --- | --- | --- | --- | --- |
| $P\left(X=x\right)$ | $0.32768$ | $0.4096$ | $0.2048$ | $0.0512$ | $0.0064$ | $0.00032$ |
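
Each entry can also be obtained by hand from the binomial probability formula with $n=5$ and $p=0.2$. As an illustration, the entry for $x=2$ works out as:

$P\left(X=2\right)=\binom{5}{2}\left(0.2\right)^2\left(0.8\right)^3=10\times0.04\times0.512=0.2048$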

(b) Instead of the number of students who walked to school, represent the distribution of the proportion of students who walked to school in this sample, $\hat{P}$.

Think: If the number of students who walked to school was $0$, $1$, $2$, $3$, $4$ or $5$, then the proportion of students who walked to school is $0$, $\frac{1}{5}$, $\frac{2}{5}$, $\frac{3}{5}$, $\frac{4}{5}$ or $\frac{5}{5}$. Thus if we define $\hat{P}$ as the proportion of students who walked to school in this sample, we can represent the distribution in a table very similar to the one above.

Do:

| $\hat{p}$ | $0$ | $0.2$ | $0.4$ | $0.6$ | $0.8$ | $1$ |
| --- | --- | --- | --- | --- | --- | --- |
| $P\left(\hat{P}=\hat{p}\right)$ | $0.32768$ | $0.4096$ | $0.2048$ | $0.0512$ | $0.0064$ | $0.00032$ |

Practice question

Question 3

Three marbles are randomly drawn from a bag containing five black and six grey marbles.

Let $X$ be the number of black marbles drawn, with replacement.

  1. What is $p$, the proportion of black marbles in the bag?

  2. If $3$ marbles are drawn, with replacement, then the number of black marbles drawn, $X$, can be $0$, $1$, $2$ or $3$.

    What are the values of the sample proportions, $\hat{P}$, of black marbles associated with each outcome of $X$?

    Simplify your answers where possible.

    If $X=0$: $\hat{P}=$ ____

    If $X=1$: $\hat{P}=$ ____

    If $X=2$: $\hat{P}=$ ____

    If $X=3$: $\hat{P}=$ ____

  3. Construct the probability distribution for $X$ and $\hat{P}$ below.

    Write each probability correct to four decimal places.

    | $x$ | $0$ | $1$ | $2$ | $3$ |
    | --- | --- | --- | --- | --- |
    | $P(X=x)$ | $0.1623$ | ____ | ____ | ____ |
    | $\hat{p}$ | $0$ | $\frac{1}{3}$ | $\frac{2}{3}$ | $1$ |
    | $P(\hat{P}=\hat{p})$ | ____ | $0.4057$ | ____ | ____ |

  4. Use your answers from part 3 to determine $P(\hat{P}<\frac{1}{2})$, correct to four decimal places.

 

Outcomes

4.5.2.1

understand the concept of the sample proportion $\hat{p}$ as a random variable whose value varies between samples, and the formulas for the mean $p$ and standard deviation $\sqrt{\frac{p(1-p)}{n}}$ of the sample proportion $\hat{p}$
