10. Sampling and estimation

Lesson

Before we launch into some applications of confidence intervals, let's recap some of the important calculations and interpretations from the previous section.

Calculating a confidence interval

A confidence interval is calculated in the following way:

$\left(\hat{p}-k\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p}+k\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$

Here, $\hat{p}$ is the sample proportion taken from our particular sample, $n$ is the size of our sample, and $k$ is the $z$-score associated with the level of confidence we wish to achieve.
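The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `proportion_confidence_interval` and the sample values (60 successes out of 150) are assumptions made for the example and do not come from the lesson.

```python
from math import sqrt

def proportion_confidence_interval(p_hat, n, k):
    """Return (lower, upper) for the approximate confidence interval.

    p_hat: sample proportion, n: sample size, k: z-score for the confidence level.
    """
    margin = k * sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - margin, p_hat + margin)

# Illustrative values only: 60 successes in a sample of 150, with k = 1.960 (95%)
lower, upper = proportion_confidence_interval(60 / 150, 150, 1.960)
print(f"({lower:.3f}, {upper:.3f})")  # (0.322, 0.478)
```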

Common confidence levels and their associated $z$-scores:

- $90\%$ confidence interval has $k\approx1.645$
- $95\%$ confidence interval has $k\approx1.960$
- $99\%$ confidence interval has $k\approx2.576$
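If we want to see where these $z$-scores come from, one option is Python's standard library, which can invert the standard normal distribution. The helper name `z_score_for_confidence` is an assumption for this sketch. It finds the value of $k$ that leaves half of the remaining probability in each tail.

```python
from statistics import NormalDist

def z_score_for_confidence(level):
    """Return k so that the central `level` of the standard normal lies in (-k, k)."""
    tail = (1 - level) / 2          # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} -> k = {z_score_for_confidence(level):.3f}")
# 90% -> k = 1.645
# 95% -> k = 1.960
# 99% -> k = 2.576
```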

Calculating the margin of error

The margin of error is the distance from $\hat{p}$ to either end of the confidence interval. This means we can calculate it in a number of ways.

- If we have the confidence interval $(a,b)$, we can simply calculate the margin of error as $\frac{b-a}{2}$.
- If we have $\hat{p}$ and the confidence interval $(a,b)$, we can calculate the margin of error as $b-\hat{p}$ or $\hat{p}-a$.
- If we don't yet have the confidence interval, we can use the part of the confidence interval calculation that gives the margin of error, which is $k\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, where $k$ and $n$ are as noted above.
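As a quick check that the three approaches agree, here is a small Python sketch. The values $\hat{p}=0.4$, $n=150$ and $k=1.960$, and the interval $(0.3216,0.4784)$ they produce, are illustrative assumptions rather than figures from the lesson.

```python
from math import sqrt

p_hat, n, k = 0.4, 150, 1.960   # illustrative values, not from the lesson
a, b = 0.3216, 0.4784           # the confidence interval these values produce

from_interval = (b - a) / 2                        # method 1: half the interval width
from_upper = b - p_hat                             # method 2: upper end minus p_hat
from_lower = p_hat - a                             # method 2: p_hat minus lower end
from_formula = k * sqrt(p_hat * (1 - p_hat) / n)   # method 3: the formula directly

for value in (from_interval, from_upper, from_lower, from_formula):
    print(round(value, 4))  # each line prints 0.0784
```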

Interpreting a confidence interval and the margin of error

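Recall that a confidence interval is an interval estimate for the true population proportion, $p$. The confidence level describes how confident we are that an interval calculated in this way contains $p$: for example, about $95\%$ of the $95\%$ confidence intervals built from repeated samples will contain $p$.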

The higher the level of confidence, the larger the margin of error we must tolerate, due to the "wider net" we have created to "catch" the true proportion $p$.

The size of the margin of error is influenced by the level of confidence, the size of the sample and the value of the sample proportion.
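To see each of these three influences separately, we can vary one input at a time in the margin of error formula. The sketch below uses a hypothetical helper `margin_of_error` and made-up inputs chosen only to make the comparisons easy to read.

```python
from math import sqrt

def margin_of_error(p_hat, n, k):
    """Margin of error k * sqrt(p_hat * (1 - p_hat) / n)."""
    return k * sqrt(p_hat * (1 - p_hat) / n)

# Higher confidence level (larger k) -> larger margin of error
print(round(margin_of_error(0.5, 100, 1.645), 4))  # 90%
print(round(margin_of_error(0.5, 100, 2.576), 4))  # 99%

# Larger sample -> smaller margin of error (four times the sample halves it)
print(round(margin_of_error(0.5, 100, 1.960), 4))  # 0.098
print(round(margin_of_error(0.5, 400, 1.960), 4))  # 0.049

# Sample proportion nearer 0.5 -> larger margin of error
print(round(margin_of_error(0.1, 100, 1.960), 4))
print(round(margin_of_error(0.5, 100, 1.960), 4))
```

Note that the sample size sits under a square root, so halving the margin of error requires roughly four times as many observations.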

As confidence intervals form an interval estimate of the population proportion, we can use them as a rudimentary tool for assessing claims about the population proportion.

A coin is tossed $250$ times and lands heads $105$ times.

**(a)** Find the sample proportion $\hat{p}$ of tosses that landed heads.

**Think:** $\hat{p}=\frac{x}{n}$, where $x$ is the number of 'heads' and $n$ is the sample size.

**Do:**

$\hat{p}=\frac{x}{n}=\frac{105}{250}=\frac{21}{50}$

**(b)** Create a $95\%$ confidence interval for the proportion of heads that we expect to appear when using this coin.

**Think:** Use the calculator with $x=105$, $n=250$ and confidence level $=95\%$ to obtain the confidence interval for the population proportion.

**Do:**

$95\%$ confidence interval: $\left(0.359,0.481\right)$
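If a calculator is not to hand, the same interval can be reproduced with a short Python sketch. The function name `one_prop_z_interval` is an assumption chosen for this example. It takes the number of successes, the sample size and the confidence level, much as the calculator does.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(x, n, level):
    """Approximate confidence interval for a proportion from x successes in n trials."""
    p_hat = x / n
    k = NormalDist().inv_cdf(1 - (1 - level) / 2)   # z-score for the chosen level
    margin = k * sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - margin, p_hat + margin)

lower, upper = one_prop_z_interval(105, 250, 0.95)
print(f"({lower:.3f}, {upper:.3f})")  # (0.359, 0.481)
```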

**(c)** Assess whether or not the coin is fair.

**Think:** Look to see if the expected population proportion of heads for a fair coin lies within the confidence interval.

**Do:** If the coin were fair, the population proportion would be $0.5$. However, this value is not within our $95\%$ confidence interval. This could mean either that we have an unusual sample, since $5\%$ of such intervals created from samples will not contain the population proportion, or that the coin is biased and is less likely to show heads than a fair coin.

At a $95\%$ confidence level, the coin does not appear to be fair.

**(d)** A second coin is tossed $250$ times and lands heads $127$ times. Use a $95\%$ confidence interval to assess whether or not the coin is fair.

**Think:** Use the calculator to create the $95\%$ confidence interval and look to see if the expected proportion of $0.5$ heads lies within the confidence interval.

**Do:** Using the calculator with $x=127$, $n=250$ and confidence level $95\%$, we obtain:

$95\%$ confidence interval: $\left(0.446,0.570\right)$

If the coin were fair, the population proportion would be $0.5$, and this value is within our $95\%$ confidence interval. However, the population proportion may be anywhere within this range, so we cannot be sure the coin is not in fact biased. Instead, we can say we have insufficient evidence to refute the claim that the coin is fair at a $95\%$ confidence level.

**Reflect:** At a given confidence level we can refute a claim if the asserted proportion does not lie within the confidence interval. However, we cannot accept a claim that a proportion is a certain value merely because that value lies within the confidence interval. We can simply state that there is insufficient evidence to refute the claim.
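The decision rule in this reflection can be written as a small Python sketch. The function name `assess_claim` and the wording of its return values are assumptions for illustration. The two intervals are the ones found for the coins above.

```python
def assess_claim(claimed_p, interval):
    """Compare a claimed proportion with a confidence interval (lower, upper)."""
    lower, upper = interval
    if lower <= claimed_p <= upper:
        # Inside the interval: the claim is not confirmed, only not refuted
        return "insufficient evidence to refute the claim"
    return "evidence to refute the claim"

print(assess_claim(0.5, (0.359, 0.481)))  # first coin: evidence to refute the claim
print(assess_claim(0.5, (0.446, 0.570)))  # second coin: insufficient evidence to refute the claim
```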

In a sample of $350$ people, it is found that only $1$ has blood type B-negative. Let $p$ represent the proportion of the population that have blood type B-negative.

**(a)** Find an estimate for $p$.

**(b)** Find an approximate two-sided $95\%$ confidence interval for $p$. Give your answer as an interval in the form $\left(a,b\right)$, rounding all values to four decimal places.

**(c)** Select the most appropriate interpretation of the confidence interval found in part (b).

- A. We are $95\%$ confident that the probability that a person has blood type B-negative is contained within this interval.
- B. The probability that a person has blood type B-negative is not contained within this interval.
- C. The probability that a person has blood type B-negative is contained within this interval.
- D. There is a $95\%$ chance that the probability that a person has blood type B-negative is contained within this interval.

**(d)** One measure of the validity of a confidence interval is that the product of the sample size $n$ and the population proportion $p$ is greater than $5$. Estimate this product for the blood type sample.

**(e)** Given the result of part (d), select the most appropriate statement below.

- A. Since $np<5$ for our estimate, we cannot be sure that the sampling distribution is approximately normal and so the confidence interval is not valid.
- B. Since $np>5$ for our estimate, we know that the sampling distribution is approximately normal and so the confidence interval is valid.

A random survey was conducted to estimate the proportion of people who favoured reading using an e-reader over a standard book. It was found that $286$ out of $419$ people surveyed preferred the e-reader.

**(a)** Determine the sample proportion $\hat{p}$ of those in the survey who preferred to use an e-reader. Round your answer to two decimal places.

**(b)** Working with a two-sided confidence interval of $90\%$, estimate the minimum sample size necessary to ensure a margin of error of at most $0.05$ if the sample proportion remains the same.

**(c)** Using the sample proportion $\hat{p}$ from the initial survey, the $95\%$ confidence interval for $p$ is $0.64\le p\le0.72$. Considering this interval, which of the following surveys is more likely to be representative of the total population?

- A. A random sample of $79$ at a book store found that $31$ had a preference for e-readers.
- B. A random sample of $365$ at an inner city park found that $256$ had a preference for e-readers.
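The minimum sample size asked for in the e-reader question comes from rearranging the margin of error formula for $n$. One way to set out the rearrangement is shown below, where $E$ stands for the largest margin of error we are willing to accept (the symbol $E$ is borrowed from the outcome statements at the end of this section).

```latex
\begin{aligned}
k\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} &\le E \\
\frac{\hat{p}(1-\hat{p})}{n} &\le \frac{E^2}{k^2} \\
n &\ge \frac{k^2\,\hat{p}(1-\hat{p})}{E^2}
\end{aligned}
```

Because $n$ must be a whole number, we round the right-hand side up to the next integer.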

The proportion of the population of the United States thought to have Celiac disease is $p$. A sample of $2000$ Americans was surveyed for the disease and a confidence interval for the population proportion was calculated as $\left(0.0089,0.0121\right)$.

**(a)** How many people in this sample had the disease?

**(b)** Use the margin of error to find the $z$-score, $z$, for this confidence interval. Round your answer to three decimal places.

**(c)** What is the level of confidence for this sample? Give your answer as a percentage and round to the nearest percent.
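Working backwards from an interval, as this question requires, uses the same formulas in reverse: the midpoint gives $\hat{p}$, half the width gives the margin of error, and dividing the margin by $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ gives the $z$-score. The sketch below is one possible way to code this. The function name `recover_from_interval` is an assumption, and it is demonstrated on the rounded interval from the first coin example rather than on the Celiac data.

```python
from math import sqrt
from statistics import NormalDist

def recover_from_interval(a, b, n):
    """Recover the sample proportion, z-score and confidence level from (a, b)."""
    p_hat = (a + b) / 2                          # midpoint of the interval
    margin = (b - a) / 2                         # half the interval width
    z = margin / sqrt(p_hat * (1 - p_hat) / n)   # margin divided by the standard error
    level = 2 * NormalDist().cdf(z) - 1          # central area between -z and z
    return p_hat, z, level

# First coin example: 95% interval (0.359, 0.481) from 250 tosses
p_hat, z, level = recover_from_interval(0.359, 0.481, 250)
print(round(p_hat * 250))   # 105 heads
print(round(z, 2), round(level, 2))
```

Because the interval endpoints were rounded to three decimal places, the recovered $z$-score is close to, but not exactly, $1.960$.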

A chocolate company claims that $24\%$ of their chocolate drops are blue. Quiana buys a packet to test the claim, and out of $210$ candies $49$ were blue.

**(a)** State the sample proportion $\hat{p}$ of blue chocolate drops.

**(b)** Construct an approximate two-sided $95\%$ confidence interval for the population proportion of blue chocolate drops. Give your answer in the form $\left(a,b\right)$, rounding each endpoint to three decimal places.

**(c)** Is there evidence to refute the claim?

- A. No, because the confidence interval for the proportion of blue chocolate drops expected contains the claimed proportion.
- B. No, because the confidence interval for the proportion of blue chocolate drops expected does not contain the claimed proportion.
- C. Yes, because the confidence interval for the proportion of blue chocolate drops expected does not contain the claimed proportion.
- D. Yes, because the confidence interval for the proportion of blue chocolate drops expected contains the claimed proportion.

**(d)** Quiana is not convinced and buys a larger bag. This time, out of $2140$ chocolate drops, $472$ were blue. Does this sample offer evidence to refute the claim at a $95\%$ confidence level?

- A. Yes, because the confidence interval for the proportion of blue chocolate drops expected contains the claimed proportion.
- B. No, because the confidence interval for the proportion of blue chocolate drops expected does not contain the claimed proportion.
- C. Yes, because the confidence interval for the proportion of blue chocolate drops expected does not contain the claimed proportion.
- D. No, because the confidence interval for the proportion of blue chocolate drops expected contains the claimed proportion.

use the approximate confidence interval $\left(\hat{p}-z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p}+z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$ as an interval estimate for $p$, where $z$ is the appropriate quantile for the standard normal distribution

define the approximate margin of error $E=z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ and understand the trade-off between margin of error and level of confidence

use simulation to illustrate variations in confidence intervals between samples and to show that most but not all confidence intervals contain p
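A simulation of this kind might look like the following Python sketch. The function name `simulate_coverage`, the seed and the choice of a fair coin with $250$ tosses per sample are assumptions for illustration. It repeatedly draws a sample, builds the approximate confidence interval, and records how often the interval contains the true $p$.

```python
import random
from math import sqrt

def simulate_coverage(p, n, k, trials, seed=0):
    """Fraction of simulated confidence intervals that contain the true proportion p."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and count the successes
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = k * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / trials

coverage = simulate_coverage(p=0.5, n=250, k=1.960, trials=1000)
print(f"{coverage:.1%} of the simulated intervals contain p")
```

With $k=1.960$ we would expect the printed percentage to be near $95\%$, which illustrates that most, but not all, of the intervals contain $p$.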