9.03 Comparative inferences

We can compare samples of two different populations to draw inferences about the populations without having to gather data on every individual in the population.

By using the measures of central tendency of a data set (that is, the mean, median, and mode), as well as measures of spread (such as the range, interquartile range and mean absolute deviation), we can make clear comparisons and contrasts between different groups.

We can also examine the shape of the distribution of two sets of data when comparing them.

Suppose you want to know whether children's cereals available in your local grocery store have more sugar than adult cereals. You randomly select 20 boxes of children's cereals and 20 boxes of adult cereals and measure the percentage of each serving's weight that is sugar. Your results can be summarized in the following double box plot and table:

[Figure: a double box plot showing the percentage of sugar in adult's and kid's cereal.]
                  Sample median (%)    IQR (%)
Adult's cereal    11                   12.5
Kid's cereal      46                   6.5
  1. Are there any adult cereals that have more sugar than kid's cereals? Explain how you know.
  2. Which sample has a distribution that is not approximately symmetric?
  3. What is the difference between the sample medians for the two groups?
  4. Express the difference between the two sample medians as a multiple of the larger interquartile range.
  5. Do you think there is a meaningful difference between the percent of sugar in adult cereals vs. the percent of sugar in children's cereals? Explain your reasoning.

In the exploration above, the difference between the sample medians was much larger than the larger interquartile range: almost three times as large, in fact (35 \div 12.5 = 2.8). This supports the conclusion that there is a meaningful difference between the two populations.

In general, if the difference in centers between samples of two populations is at least 2 times the measure of variability, we can say that there is likely a meaningful difference between the populations. Otherwise, we do not have sufficient evidence to support a difference in the populations.
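As a quick check of this rule on the cereal exploration, the following Python sketch expresses the difference in sample medians as a multiple of the larger IQR. The function name `center_gap_ratio` is our own choice for illustration and is not part of any library.

```python
def center_gap_ratio(center_a, center_b, spread_a, spread_b):
    """Difference in centers as a multiple of the larger measure of spread."""
    larger_spread = max(spread_a, spread_b)
    return abs(center_a - center_b) / larger_spread

# Values from the cereal table: medians 11% and 46%, IQRs 12.5% and 6.5%.
ratio = center_gap_ratio(11, 46, 12.5, 6.5)

print(ratio)       # 2.8
print(ratio >= 2)  # True, so a meaningful difference is likely
```

Using the larger of the two spreads makes the comparison more cautious: the gap between the centers has to be big relative to the more variable sample.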

Examples

Example 1

The following box-and-whisker plots show the number of points scored by two basketball teams in each of their matches last season.

[Figure: two box-and-whisker plots of points scored per match. The Team A axis is marked from 30 to 60 and the Team B axis from 30 to 70.]

a

What is the median score of Team A?

Worked Solution
Create a strategy

The median is indicated by the line inside the box.

Apply the idea

\text{Median}= 46

b

What is the median score of Team B?

Worked Solution
Create a strategy

The median is indicated by the line inside the box.

Apply the idea

\text{Median}= 41

c

What is the range of Team A’s scores?

Worked Solution
Create a strategy

The range is the difference between the greatest value and lowest value indicated by the ends of the two whiskers.

Apply the idea
\begin{aligned}
\text{Range} &= 56 - 33 && \text{Subtract the lowest score from the greatest score} \\
&= 23 && \text{Evaluate}
\end{aligned}
d

What is the range of Team B’s scores?

Worked Solution
Create a strategy

The range is the difference between the greatest value and lowest value indicated by the ends of the two whiskers.

Apply the idea
\begin{aligned}
\text{Range} &= 65 - 30 && \text{Subtract the lowest score from the greatest score} \\
&= 35 && \text{Evaluate}
\end{aligned}
e

What is the interquartile range of Team A’s scores?

Worked Solution
Create a strategy

The interquartile range is the difference between the upper and lower quartile indicated by the right and left edges of the box.

Apply the idea
\begin{aligned}
\text{IQR} &= 50 - 43 && \text{Subtract the lower quartile from the upper quartile} \\
&= 7 && \text{Evaluate}
\end{aligned}
f

What is the interquartile range of Team B’s scores?

Worked Solution
Create a strategy

The interquartile range is the difference between the upper and lower quartile indicated by the right and left edges of the box.

Apply the idea
\begin{aligned}
\text{IQR} &= 62 - 36 && \text{Subtract the lower quartile from the upper quartile} \\
&= 26 && \text{Evaluate}
\end{aligned}
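
The calculations in this example can be collected into one short Python sketch. The five-number summaries below are the values read from the box plots in the worked solutions; the dictionary layout and the helper names `data_range` and `iqr` are our own, chosen only for illustration. The last step applies the lesson's rule of thumb, which the example itself does not ask for.

```python
# Five-number summaries as read from the box plots in Example 1.
team_a = {"min": 33, "q1": 43, "median": 46, "q3": 50, "max": 56}
team_b = {"min": 30, "q1": 36, "median": 41, "q3": 62, "max": 65}

def data_range(summary):
    """Greatest value minus lowest value (the ends of the whiskers)."""
    return summary["max"] - summary["min"]

def iqr(summary):
    """Upper quartile minus lower quartile (the edges of the box)."""
    return summary["q3"] - summary["q1"]

# Difference in medians as a multiple of the larger IQR.
median_gap = abs(team_a["median"] - team_b["median"])
larger_iqr = max(iqr(team_a), iqr(team_b))
ratio = median_gap / larger_iqr

print(data_range(team_a), data_range(team_b))  # 23 35
print(iqr(team_a), iqr(team_b))                # 7 26
print(round(ratio, 2))                         # 0.19
```

Here the difference in medians is only about 0.19 times the larger IQR, well below 2, so by the rule of thumb these samples do not give enough evidence of a meaningful difference in the teams' typical scores.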

Example 2

The box plots summarize results from a medical study. The treatment group received an experimental drug to relieve cold symptoms, and the control group received a placebo. The box plots show the number of days each group continued to report symptoms.

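[Figure: two box plots of the number of days symptoms were reported, one for the control group and one for the treatment group, each on an axis marked from 0 to 20.]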

Which of the following statements are true?

a

There is an outlier of 16 in the treatment group.

Worked Solution
Create a strategy

An outlier in a set of data is an observation that lies away from the remainder of that set of data.

Apply the idea

There is no data point at 16 in the treatment group. So this statement is false.

b

Only the control group plot is skewed to the right.

Worked Solution
Create a strategy

A data set is skewed to the right if the longer part of the box is to the right of (or above) the median.

Apply the idea

The longer part of the box for the control group is to the left of the median, so it is not skewed to the right. So this statement is false.

c

The skew is more prominent in the treatment group.

Worked Solution
Create a strategy

Skewed data will have a lop-sided box plot, where the median will cut the box in two unequal pieces.

Apply the idea

The right side of the box plot for the treatment group appears more stretched than either side of the control group's box plot. So this statement is true.

d

In the treatment group, cold symptoms lasted 0 to 13 days (\text{range}=13) versus 4 to 12 days (\text{range}=8) for the control group.

Worked Solution
Create a strategy

The range is the difference between the greatest value and lowest value indicated by the ends of the two whiskers.

Apply the idea

For the control group:

\begin{aligned}
\text{Range} &= 12 - 4 && \text{Subtract the least number of days from the greatest} \\
&= 8 && \text{Evaluate}
\end{aligned}

For the treatment group:

\begin{aligned}
\text{Range} &= 12 - 0 && \text{Subtract the least number of days from the greatest} \\
&= 12 && \text{Evaluate}
\end{aligned}

The control group's range matches the statement, but the treatment group's symptoms lasted 0 to 12 days (a range of 12), not 0 to 13 days. So this statement is false.

e

It appears that the drug had a positive effect on patient recovery.

Worked Solution
Create a strategy

If the drug had a positive effect on patient recovery, the recovery times should be noticeably shorter.

Apply the idea

This is true since the median is noticeably lower for the treatment group.

Idea summary

In general, we can say that there is likely a meaningful difference between two populations if:

  • the difference in medians between samples of the two populations is at least 2 times the larger interquartile range (IQR)
  • the difference in means between samples of the two populations is at least 2 times the larger mean absolute deviation (MAD)

If measurements from the samples do not show either of the above, then no conclusion can be drawn.
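
The mean and MAD version of the rule can be sketched in Python as follows. The two samples are made-up numbers used purely for illustration (they do not come from this lesson), and the helper names `mean` and `mean_absolute_deviation` are our own.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def mean_absolute_deviation(values):
    """Average distance of each value from the mean."""
    m = mean(values)
    return mean([abs(v - m) for v in values])

# Hypothetical samples, invented for this sketch only.
sample_a = [2, 4, 4, 6, 9]
sample_b = [9, 11, 12, 13, 15]

mean_gap = abs(mean(sample_a) - mean(sample_b))
larger_mad = max(mean_absolute_deviation(sample_a),
                 mean_absolute_deviation(sample_b))
ratio = mean_gap / larger_mad

print(mean_gap)    # 7.0
print(larger_mad)  # 2.0
print(ratio)       # 3.5
print(ratio >= 2)  # True, so a meaningful difference is likely
```

For these invented samples the means differ by 3.5 times the larger MAD, so the rule of thumb would point to a likely meaningful difference between the populations they were drawn from.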

Outcomes

7.SP.B.3

Informally assess the degree of visual overlap of two numerical data distributions with similar variabilities, measuring the difference between the centers by expressing it as a multiple of a measure of variability.

7.SP.B.4

Use measures of center and measures of variability for numerical data from random samples to draw informal comparative inferences about two populations.
