8. Probability & Statistics

Lesson

We can compare samples from two different populations to draw inferences about those populations without having to gather data on every individual in each population.

By using the measures of central tendency of a data set (that is, the **mean, median,** and **mode**), as well as measures of spread (such as the **range, interquartile range** and **mean absolute deviation**), we can make clear comparisons and contrasts between different groups.

We can also benefit from examining the shape of the distribution of two sets of data when comparing them.

Suppose you want to know whether children's cereals available in your local grocery store have more sugar than adult cereals. You randomly select $20$ boxes of children's cereals and $20$ boxes of adult cereals and measure the percent of the weight per serving that is sugar. Your results can be summarized in the following double box plot:

Cereal | Sample median (%) | IQR (%) |
---|---|---|
Adult's cereal | $11$ | $12.5$ |
Kid's cereal | $46$ | $6.5$ |

Consider the answers to the following questions:

- Are there any adult cereals that have more sugar than kid's cereals? Explain how you know.
- Which sample has a distribution that is not approximately symmetric?
- What is the difference between the sample medians for the two groups?
- Express the difference between the two sample medians as a multiple of the larger interquartile range.
- Do you think there is a meaningful difference between the percent of sugar in adult cereals vs. the percent of sugar in children's cereals? Explain your reasoning.

In the exploration above, the difference in sample medians ($46-11=35$) was much larger than the larger interquartile range ($12.5$). It was almost three times larger, in fact, since $35\div12.5=2.8$. This supports the conclusion that there is a meaningful difference between the populations.

In general, if the difference in centers between samples of two populations is at least $2$ times the measure of variability, we can say that there is likely a meaningful difference between the populations. Otherwise, we do not have sufficient evidence to support a difference between the populations.

Meaningful difference between populations

In general, we can say that there is likely a meaningful difference between two populations if

- the difference in medians between samples of the two populations is at least $2$ times the larger interquartile range (IQR)
- the difference in means between samples of the two populations is at least $2$ times the larger mean absolute deviation (MAD)

If measurements from the samples do not show either of the above, then no conclusion can be drawn.
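The rule of thumb above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `is_meaningful_difference` and its `threshold` parameter are assumptions made for this example, not part of the lesson. The sample call uses the cereal medians and IQRs from the exploration.

```python
def is_meaningful_difference(center_a, center_b, spread_a, spread_b, threshold=2.0):
    """Apply the informal rule: compare the gap between the centers
    with the larger of the two measures of variability.

    Returns a pair (ratio, verdict), where ratio is the difference in
    centers expressed as a multiple of the larger spread.
    """
    larger_spread = max(spread_a, spread_b)
    if larger_spread <= 0:
        raise ValueError("the larger measure of variability must be positive")
    ratio = abs(center_a - center_b) / larger_spread
    return ratio, ratio >= threshold


# Cereal example: medians 11 and 46, IQRs 12.5 and 6.5 (all in percent)
ratio, meaningful = is_meaningful_difference(11, 46, 12.5, 6.5)
print(f"{ratio:.1f} {meaningful}")  # prints: 2.8 True
```

The same function can be used with two means and two MADs in place of the medians and IQRs.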

The following box-and-whisker plot shows the number of points scored by two basketball teams in each of their matches last season.

[Box plots: Team A, on a Scores axis from 30 to 60; Team B, on a Scores axis from 30 to 70]

What is the median score of Team A?

What is the median score of Team B?

What is the range of Team A’s scores?

What is the range of Team B’s scores?

What is the interquartile range of Team A’s scores?

What is the interquartile range of Team B’s scores?

The boxplots summarize results from a medical study. The treatment group received an experimental drug to relieve cold symptoms, and the control group received a placebo. The boxplots show the number of days each group continued to report symptoms.

Which of the following statements are true?

[Box plots: Control group and Treatment group, each on a Days axis from 0 to 20]

- There is an outlier in the treatment group of $16$.
- Only the control group plot is skewed to the right.
- The skew is more prominent in the treatment group.
- In the treatment group, cold symptoms lasted $0$ to $13$ days ($\text{range}=13$) versus $4$ to $12$ days ($\text{range}=8$) for the control group.
- It appears that the drug had a positive effect on patient recovery.

A scientist examined $10$ crickets and $10$ katydids one night, and collected data on how many chirps they made per minute. His observations are presented in the table.

Insect | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ | $7$ | $8$ | $9$ | $10$ |
---|---|---|---|---|---|---|---|---|---|---|
Crickets | $53$ | $48$ | $53$ | $51$ | $51$ | $51$ | $47$ | $53$ | $49$ | $47$ |
Katydids | $106$ | $106$ | $113$ | $106$ | $112$ | $111$ | $110$ | $109$ | $113$ | $110$ |

Calculate the mean number of chirps per minute made by the crickets. Round your answer to one decimal place if needed.

Hence, calculate the mean absolute deviation (MAD) of the number of cricket chirps per minute.
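A minimal sketch of the two calculations, assuming the data are typed in as a Python list. The helper names `mean` and `mean_absolute_deviation` are chosen for this example only. The katydid row of the table is used as a check, because the exercise states its mean and MAD as $109.6$ and $2.28$; swapping in the cricket row gives the values asked for above.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


def mean_absolute_deviation(values):
    """Average distance of each value from the mean of the data set."""
    m = mean(values)
    return sum(abs(v - m) for v in values) / len(values)


# Katydid chirps per minute, copied from the table
katydids = [106, 106, 113, 106, 112, 111, 110, 109, 113, 110]

print(round(mean(katydids), 1), round(mean_absolute_deviation(katydids), 2))
# prints: 109.6 2.28
```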

Crickets are known for their ability to predict temperature. The air temperature (in degrees Fahrenheit) can be approximated using the formula $T=N+40$, where $N$ is the number of chirps per minute. What temperature is being predicted by this group of crickets?

Katydids can also be used for the same purpose, though the formula converting their chirps per minute to temperature is slightly more complicated: $T=\frac{N+161}{3}$. Calculate the temperature being predicted by the group of katydids if the mean and MAD of their chirps are $109.6$ and $2.28$ respectively. Round your answer to one decimal place.

If the actual temperature is $90$°F, did both approximations perform well?

- Yes
- No

If you were going to use the observation from a single cricket or a single katydid to predict the temperature, which would be better to use according to the MAD of each group?

- Katydid
- Cricket
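The two conversion formulas, $T=N+40$ for crickets and $T=\frac{N+161}{3}$ for katydids, can be written as small Python functions. This is a sketch only: the function names are hypothetical, and the chirp rates in the sample calls are arbitrary illustrative values rather than data from the table.

```python
def cricket_temperature(chirps_per_minute):
    """Approximate air temperature in degrees Fahrenheit: T = N + 40."""
    return chirps_per_minute + 40


def katydid_temperature(chirps_per_minute):
    """Approximate air temperature in degrees Fahrenheit: T = (N + 161) / 3."""
    return (chirps_per_minute + 161) / 3


# Illustrative chirp rates only (not taken from the table)
print(cricket_temperature(50))   # prints: 90
print(katydid_temperature(109))  # prints: 90.0
```

Calling either function with a group's mean chirp rate gives the temperature predicted by that group.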

Informally assess the degree of visual overlap of two numerical data distributions with similar variabilities, measuring the difference between the centers by expressing it as a multiple of a measure of variability.

Use measures of center and measures of variability for numerical data from random samples to draw informal comparative inferences about two populations.