Populations and Samples

NZ Level 8 (NZC) Level 3 (NCEA) [In development]

The exact distribution of the sample proportions


We introduce the idea of the probability distribution of the sample proportions with an example.

Imagine a computer is programmed to randomly select four whole numbers between $1$ and $10$ inclusive. The program is allowed to select any number more than once. In combinatoric language, the selections are made with replacement.

A success occurs when the number drawn is prime (there are four primes available, namely $2,3,5,7$). The six non-primes ($1,4,6,8,9,10$) are deemed failures.

So, for example, if the computer selects the numbers $2,2,6,9$ then we record the selection as $2$ successes. Likewise $1,6,9,10$ would be recorded as $0$ successes and $5,5,1,5$ would be recorded as $3$ successes.

This program is an example of a Bernoulli process, where exactly two outcomes are possible (either a success or a failure) and the probability of a success $p$ remains fixed for each of the four selections. The selection of each number is completely independent of the selection of any other number.
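The program described above can be sketched in Python. This is only an illustration under stated assumptions, not the lesson's own code: the names `PRIMES`, `draw_sample` and `count_successes` are hypothetical labels introduced here.

```python
import random

# The four primes between 1 and 10; every other number counts as a failure.
PRIMES = {2, 3, 5, 7}

def draw_sample(size=4):
    """Select `size` whole numbers from 1 to 10 with replacement."""
    return [random.randint(1, 10) for _ in range(size)]

def count_successes(numbers):
    """Count how many of the selected numbers are prime."""
    return sum(1 for n in numbers if n in PRIMES)

# One run of the program: four independent selections and their success count.
sample = draw_sample()
print(sample, "has", count_successes(sample), "successes")
```

Because each call to `random.randint` is independent of the others, repeated numbers are possible, which matches selection with replacement.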

Since there are four primes between $1$ and $10$, the probability of a success is given by $p=\frac{4}{10}=0.4$ and therefore the probability of a failure, $q=1-p$, is $0.6$.

Using the binomial probability formula, the probability of $x$ successes in the four numbers selected becomes:

$P(x)=\binom{4}{x}(0.4)^x(0.6)^{4-x}$ for $x=0,1,2,3,4$

These probabilities can be listed in a table like this:

$x$ | $P(x)$ |
---|---|
$0$ | $0.1296$ |
$1$ | $0.3456$ |
$2$ | $0.3456$ |
$3$ | $0.1536$ |
$4$ | $0.0256$ |
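The tabulated probabilities can be checked numerically. The following is a minimal Python sketch that evaluates the binomial formula given above with $n=4$ and $p=0.4$; the function name `binomial_pmf` is a label chosen here for illustration.

```python
from math import comb

def binomial_pmf(x, n=4, p=0.4):
    """P(x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Probability of x primes among the four selected numbers, to four decimal places.
table = {x: round(binomial_pmf(x), 4) for x in range(5)}

for x, prob in table.items():
    print(x, prob)
```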

From the table it is clear that the most likely outcome is either $1$ or $2$ successes (primes) in any set of four numbers selected.

Indeed, the mean or expected value is $4\times0.4=1.6$ and the variance is $4\times0.4\times0.6=0.96$.
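As a check on these two values, the mean and variance can also be computed directly from the definition, summing over the five possible outcomes. This is a sketch only; the variable names are assumptions made for the example.

```python
from math import comb

n, p = 4, 0.4

# Binomial probabilities P(0), ..., P(4).
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

# Mean: sum of x * P(x). Variance: sum of (x - mean)^2 * P(x).
mean = sum(x * prob for x, prob in enumerate(pmf))
variance = sum((x - mean) ** 2 * prob for x, prob in enumerate(pmf))

print(round(mean, 4), round(variance, 4))
```

Up to floating-point rounding, the results agree with the shortcuts $np$ and $npq$ used above.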

We now imagine that the computer generates a sample of four numbers.

The proportion of successes $\hat{p}$ in our sample can only be one of five possibilities. If there are no primes, then there are no successes, and the proportion is $\frac{0}{4}=0$. If there is exactly one prime, the proportion is $\frac{1}{4}$, and so on. The probabilities of these proportions are the same as the binomial probabilities shown in the table above.

We create a new table showing the binomial probabilities alongside these proportions as follows:

$\hat{p}$ | $P(\hat{p})$ |
---|---|
$0$ | $0.1296$ |
$0.25$ | $0.3456$ |
$0.5$ | $0.3456$ |
$0.75$ | $0.1536$ |
$1$ | $0.0256$ |

This table is the exact probability distribution of the sample proportion $\hat{p}$ for this particular computer program.
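The step from counts to proportions can be sketched in Python: each possible count $x$ is mapped to the proportion $\frac{x}{4}$ while keeping its binomial probability. Exact fractions are used here to avoid rounding; the name `phat_dist` is an assumption made for the example.

```python
from fractions import Fraction
from math import comb

n = 4
p = Fraction(4, 10)  # probability that a single selected number is prime

# Map each possible sample proportion x/n to its exact binomial probability.
phat_dist = {
    Fraction(x, n): comb(n, x) * p**x * (1 - p) ** (n - x)
    for x in range(n + 1)
}

for phat, prob in phat_dist.items():
    print(float(phat), float(prob))
```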

We can answer a number of probability questions about the distribution of sample proportions.

For example, referring to the computer program scenario above, we might ask what is the probability that $\hat{p}$ is less than $0.6$.

From the table, the answer is given by the sum $0.1296+0.3456+0.3456=0.8208$.
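The same sum can be written as a small Python sketch that adds the probabilities of every proportion satisfying a condition. The dictionary below simply copies the values from the distribution table; the helper name `prob` is hypothetical.

```python
# Exact distribution of the sample proportion, copied from the table.
phat_dist = {0.0: 0.1296, 0.25: 0.3456, 0.5: 0.3456, 0.75: 0.1536, 1.0: 0.0256}

def prob(event):
    """Sum the probabilities of every proportion for which event(phat) is true."""
    return sum(pr for phat, pr in phat_dist.items() if event(phat))

# P(phat < 0.6): only the proportions 0, 0.25 and 0.5 qualify.
print(round(prob(lambda phat: phat < 0.6), 4))
```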

As another example, to find $P(\hat{p}<0.8\mid\hat{p}\ge0.25)$ we use the conditional probability law as follows:

$P(\hat{p}<0.8\mid\hat{p}\ge0.25)$ | $=$ | $\frac{P\left((\hat{p}<0.8)\cap(\hat{p}\ge0.25)\right)}{P(\hat{p}\ge0.25)}$ |
---|---|---|
 | $=$ | $\frac{0.3456+0.3456+0.1536}{0.3456+0.3456+0.1536+0.0256}$ |
 | $=$ | $\frac{0.8448}{0.8704}$ |
 | $\approx$ | $0.9706$ |
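The conditional probability calculation can also be sketched in Python. The dictionary copies the values from the distribution table, and the helper names `prob` and `conditional` are assumptions introduced for this example.

```python
# Exact distribution of the sample proportion, copied from the table.
phat_dist = {0.0: 0.1296, 0.25: 0.3456, 0.5: 0.3456, 0.75: 0.1536, 1.0: 0.0256}

def prob(event):
    """Sum the probabilities of every proportion for which event(phat) is true."""
    return sum(pr for phat, pr in phat_dist.items() if event(phat))

def conditional(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    joint = prob(lambda phat: event(phat) and given(phat))
    return joint / prob(given)

answer = conditional(lambda phat: phat < 0.8, lambda phat: phat >= 0.25)
print(round(answer, 4))
```

The numerator keeps only the proportions $0.25$, $0.5$ and $0.75$, while the denominator also includes $1$, matching the working shown above.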

A dog has three puppies.

Let $M$ represent the number of male puppies in this litter.

(a) If a dog has $3$ puppies, then the number of male puppies, $M$, can be $0$, $1$, $2$ or $3$. What are the values of the proportion, $\hat{P}$, of male puppies in the litter associated with each outcome of $M$?

If $M=0$: $\hat{P}=\editable{}$

If $M=1$: $\hat{P}=\editable{}$

If $M=2$: $\hat{P}=\editable{}$

If $M=3$: $\hat{P}=\editable{}$

(b) Construct the probability distribution for $M$ and $\hat{P}$ below.

$m$ | $0$ | $1$ | $2$ | $3$ |
---|---|---|---|---|
$P(M=m)$ | $\frac{1}{8}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ |
$\hat{p}$ | $0$ | $\frac{1}{3}$ | $\frac{2}{3}$ | $1$ |
$P(\hat{P}=\hat{p})$ | $\editable{}$ | $\frac{3}{8}$ | $\editable{}$ | $\editable{}$ |

(c) Use your answers from part (b) to determine $P(\hat{P}>\frac{1}{2})$.

In Western Australia it has been shown that $40\%$ of all voters are in favour of daylight saving. A sample of $5$ voters is selected from Western Australia at random.

(a) What are the possible values of the sample proportion, $\hat{P}$, of individuals in the sample who are in favour of daylight saving? Write your answers from smallest to largest in the empty boxes below, simplifying where possible.

$\hat{P}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ |
---|---|---|---|---|---|---|

(b) Construct a probability distribution table which summarises the sample proportion of individuals from Western Australia who favour daylight saving.

Give your answers correct to four decimal places.

$\hat{p}$ | $0$ | $\frac{1}{5}$ | $\frac{2}{5}$ | $\frac{3}{5}$ | $\frac{4}{5}$ | $1$ |
---|---|---|---|---|---|---|
$P(\hat{P}=\hat{p})$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ |

(c) Determine $P(\hat{P}<\frac{3}{5})$, using the results of part (b). Round your answer to four decimal places.

Two dice are rolled and the absolute value of the difference between the numbers appearing uppermost is recorded.

(a) Complete the table below that represents the sample space.

Die $1$ (rows), Die $2$ (columns) | **1** | **2** | **3** | **4** | **5** | **6** |
---|---|---|---|---|---|---|
**1** | $0$ | $\editable{}$ | $\editable{}$ | $3$ | $\editable{}$ | $\editable{}$ |
**2** | $1$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ |
**3** | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $2$ | $\editable{}$ |
**4** | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ |
**5** | $4$ | $\editable{}$ | $2$ | $\editable{}$ | $\editable{}$ | $\editable{}$ |
**6** | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ |

(b) Let $X$ be defined as the absolute value of the difference between the two dice. Construct the probability distribution for $X$ using the table below. Enter the values of $x$ from left to right in ascending order, and simplify each probability.

$x$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ |
---|---|---|---|---|---|---|
$P(X=x)$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ |

(c) What is the probability, $p$, that $X>3$?

(d) Two dice were rolled $3$ times and their absolute difference was recorded each time.

Let $Y$ be the number of times the absolute difference was greater than $3$. Then $Y$ can be $0$, $1$, $2$ or $3$. What is $\hat{P}$, the sample proportion of absolute differences greater than $3$, associated with each outcome of $Y$?

If $Y=0$: $\hat{P}=\editable{}$

If $Y=1$: $\hat{P}=\editable{}$

If $Y=2$: $\hat{P}=\editable{}$

If $Y=3$: $\hat{P}=\editable{}$

(e) Construct the probability distribution for $Y$ and $\hat{P}$ below. Write each probability correct to four decimal places.

$y$ | $0$ | $1$ | $2$ | $3$ |
---|---|---|---|---|
$P(Y=y)$ | $\editable{}$ | $\editable{}$ | $0.0694$ | $\editable{}$ |
$\hat{p}$ | $0$ | $\frac{1}{3}$ | $\frac{2}{3}$ | $1$ |
$P(\hat{P}=\hat{p})$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $0.0046$ |

(f) Use the results of part (e) to determine $P(\hat{P}<1)$. Round your answer to four decimal places.

Make inferences from surveys and experiments: A determining estimates and confidence intervals for means, proportions, and differences, recognising the relevance of the central limit theorem B using methods such as resampling or randomisation to assess

Use statistical methods to make a formal inference