We can compare samples of two different populations to draw inferences about the populations without having to gather data on every individual in the population.
By using measures of central tendency of a data set (the mean, median, and mode) as well as measures of spread (such as the range, interquartile range, and mean absolute deviation), we can make clear comparisons and contrasts between different groups.
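The Python sketch below shows how each of these measures is computed, using only the standard library. It finds quartiles by taking the median of the lower and upper halves of the sorted data, leaving out the overall median when the count is odd. This is one common classroom convention and is an assumption here, because other software may use a slightly different quartile method. The names `quartiles` and `summarize` and the sample list are hypothetical and only for illustration.

```python
from statistics import mean, median, multimode


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumed convention: when the number of values is odd, the middle
    value is left out of both halves. Requires at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(lower), median(upper)


def summarize(values):
    """Compute measures of center and spread for a list of numbers."""
    q1, q3 = quartiles(values)
    avg = mean(values)
    return {
        "mean": avg,
        "median": median(values),
        "mode": multimode(values),  # a list, since a data set can have several modes
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        # mean absolute deviation: the average distance of each value from the mean
        "mad": mean(abs(x - avg) for x in values),
    }


# Hypothetical data set, for illustration only
print(summarize([2, 4, 4, 5, 7, 9, 11]))
```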
We can also examine the shape of the distribution of two sets of data when comparing them.
Suppose you want to know whether children's cereals available in your local grocery store have more sugar than adult cereals. You randomly select 20 boxes of children's cereal and 20 boxes of adult cereal and measure what percentage of the weight of each serving is sugar. Your results can be summarized in a double box plot; the sample medians and interquartile ranges (IQRs) are:
| Cereal | Sample median (%) | IQR (%) |
|---|---|---|
| Adult cereal | 11 | 12.5 |
| Children's cereal | 46 | 6.5 |
In the example above, the samples of the two populations had a difference in medians (46 - 11 = 35 percentage points) that was much larger than either interquartile range: about 2.8 times the larger IQR of 12.5, or almost three times bigger. This is evidence of a meaningful difference between the populations.
In general, if the difference between the centers of two samples is at least 2 times the measure of variability, we can say that there is likely a meaningful difference between the populations. Otherwise, we do not have enough evidence to conclude that the populations differ.
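This rule of thumb can be written as a short check. The sketch assumes that the measure of variability is the larger of the two sample IQRs. That is a conservative choice, because the rule as stated does not say which IQR to use. It plugs in the cereal medians and IQRs from the table. The function name `likely_meaningful_difference` is hypothetical.

```python
def likely_meaningful_difference(center_a, center_b, spread_a, spread_b, threshold=2):
    """Rule of thumb: is the gap between the two centers at least
    `threshold` times the measure of variability?

    Assumption: the larger of the two spreads (e.g. IQRs) is used as the
    measure of variability, which makes the check harder to pass.
    Returns the ratio and whether it meets the threshold.
    """
    difference = abs(center_a - center_b)
    variability = max(spread_a, spread_b)
    ratio = difference / variability
    return ratio, ratio >= threshold


# Cereal samples from the table: medians and IQRs, in percent sugar
ratio, meaningful = likely_meaningful_difference(11, 46, 12.5, 6.5)
print(f"ratio = {ratio:.1f}, likely meaningful difference: {meaningful}")
# prints: ratio = 2.8, likely meaningful difference: True
```

Using the smaller IQR (6.5) would give a ratio above 5, so the conclusion for the cereal samples does not depend on this choice. It would matter only for borderline cases.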
The following box-and-whisker plot shows the number of points scored by two basketball teams in each of their matches last season.
What is the median score of Team A?
What is the median score of Team B?
What is the range of Team A’s scores?
What is the range of Team B’s scores?
What is the interquartile range of Team A’s scores?
What is the interquartile range of Team B’s scores?
The box plots summarize results from a medical study. The treatment group received an experimental drug to relieve cold symptoms, and the control group received a placebo. The box plots show the number of days each group continued to report symptoms.
Which of the following statements are true?
There is an outlier of 16 in the treatment group.
Only the control group plot is skewed to the right.
The skew is more prominent in the treatment group.
In the treatment group, cold symptoms lasted 0 to 13 days (range = 13), versus 4 to 12 days (range = 8) for the control group.
It appears that the drug had a positive effect on patient recovery.
In general, we can say that there is likely a meaningful difference between two populations if:
If measurements from the samples do not show either of the above, then no conclusion can be drawn.