9.08 Compare data sets

Lesson

We have seen that parallel box plots are a useful way to compare numerical data sets, because they display the sets on the same scale. We can make a similar comparison with a back-to-back stem plot. Comparing the shapes of histograms also provides a visual way to compare two or more sets of data.

Back-to-back stem plots

The advantage of using a back-to-back stem plot to compare two sets of data is that the original data is retained and we can calculate the mean, mode and other statistics exactly. A disadvantage is that it is only suitable for small to medium data sets and we can only compare two sets of data at a time.

Two sets of data can be displayed side-by-side using a back-to-back stem plot. In the example below, the pulse rates of $18$ students were recorded before and after exercise.

Reading a back-to-back stem plot is very similar to reading a regular stem plot.

Referring to the example above:

  • The central column displays the stems, with the leaf values on each side.
  • The values on the left are the pulse rates of the students before exercise, while the values on the right are their pulse rates after exercise.
  • In this example, the fourth row of the plot, $4\ 3\ 0 \mid 8 \mid 2\ 2\ 6$, displays pulse rates of $80$, $83$ and $84$ before exercise and pulse rates of $82$, $82$ and $86$ after exercise. They are not necessarily the pulse rates of the same students.
  • On both sides of the stem column, the leaves are displayed in ascending order with the lowest value closest to the stem.
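
As a minimal sketch of how a row is read, the Python snippet below decodes the fourth row from the example. The function name `decode_row` is hypothetical, and the sketch assumes that the stem counts tens and each leaf is a single units digit.

```python
def decode_row(left_leaves, stem, right_leaves):
    """Turn one row of a back-to-back stem plot into two lists of values.

    Assumes each leaf is a single units digit and the stem counts tens.
    left_leaves is given as printed (lowest leaf closest to the stem),
    so it is reversed to return the values in ascending order.
    """
    left_values = [10 * stem + leaf for leaf in reversed(left_leaves)]
    right_values = [10 * stem + leaf for leaf in right_leaves]
    return left_values, right_values


# Fourth row of the pulse-rate example: 4 3 0 | 8 | 2 2 6
before, after = decode_row([4, 3, 0], 8, [2, 2, 6])
print(before)  # [80, 83, 84]
print(after)   # [82, 82, 86]
```

The left-hand leaves are read outward from the stem, which is why the printed order $4\ 3\ 0$ corresponds to $80$, $83$ and $84$.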

To create a stem plot, it is usually easier to arrange all of the data values in ascending order before placing them in the plot.
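
The sort-then-place approach can be sketched in Python as follows. This is an illustration only: the function names `leaves_by_stem` and `back_to_back` are hypothetical, the two data sets are invented, and the sketch assumes non-negative whole numbers with the tens as the stem and the units as the leaf.

```python
from collections import defaultdict


def leaves_by_stem(values):
    """Group units digits (leaves) under their tens (stems), in ascending order."""
    rows = defaultdict(list)
    for value in sorted(values):  # sorting first keeps each row's leaves ordered
        rows[value // 10].append(value % 10)
    return rows


def back_to_back(left, right):
    """Return a text back-to-back stem plot as a list of lines."""
    left_rows, right_rows = leaves_by_stem(left), leaves_by_stem(right)
    all_values = list(left) + list(right)
    stems = range(min(all_values) // 10, max(all_values) // 10 + 1)
    # Left leaves are reversed so the lowest leaf sits closest to the stem.
    left_texts = {s: " ".join(map(str, reversed(left_rows[s]))) for s in stems}
    width = max(len(text) for text in left_texts.values())
    lines = []
    for s in stems:
        right_text = " ".join(map(str, right_rows[s]))
        lines.append(f"{left_texts[s]:>{width}} | {s} | {right_text}".rstrip())
    return lines


# Hypothetical data, invented purely for illustration.
group_a = [31, 12, 25, 18, 24]
group_b = [22, 9, 35, 27, 21, 14]

for line in back_to_back(group_a, group_b):
    print(line)
```

Every stem between the smallest and largest is listed, even when one side has no leaves for it, so gaps in either data set remain visible.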

 

Practice questions

Question 1

Ten participants had their pulse rates measured before and after exercise, with the results shown in the stem-and-leaf plot below.

Key: 6 | 1 | 2 $=$ 12 and 16
  1. What is the mode pulse rate after exercise?

  2. What is the range of pulse rates before exercise?

  3. What is the range of pulse rates after exercise?

  4. Calculate the mean pulse rate before exercise.

  5. What is the mean pulse rate after exercise?

  6. What can you conclude from the measures of centre and spread that you have just calculated?

    A. The mode pulse rate is the best comparison of pulse rates before and after exercise.

    B. The range of pulse rates decreases after exercise.

    C. The range of pulse rates increasing after exercise shows that some people are fitter than others.

    D. The range of pulse rates and the mean pulse rate increase after exercise.

Question 2

The data below shows the results of a survey conducted on the price of concert tickets locally and the price of the same concerts at an international venue.

Local

Stem | Leaf
   6 | 0 4 6 7
   7 | 3 5 6 6 7
   8 | 2 4 4 5 7
   9 | 1 4 6 7 9
  10 | 4

International

Stem | Leaf
   6 | 0 7
   7 | 0 0 3 4
   8 | 0 5 6 6
   9 | 1 1 3 4 6
  10 | 1 4 4 5 6

Key: $1 \mid 2 = 12$
  1. What was the most expensive ticket price at the international venue, in dollars?

  2. What was the median ticket price at the international venue? Give your answer to two decimal places if needed.

  3. What percentage of local ticket prices were cheaper than the international median?

  4. At the international venue, what percentage of tickets cost between $\$90$ and $\$110$ (inclusive)?

  5. At the local venue, what percentage of tickets cost between $\$90$ and $\$100$ (inclusive)?

Question 3

Two friends have been growing sunflowers. They have measured the height of their sunflowers to the nearest cm, with their results shown below:

Quentin $= 39, 18, 14, 44, 37, 18, 23, 28$

Tricia $= 49, 25, 42, 5, 47, 12, 15, 8, 35, 22, 28, 6, 21$

  1. Display the data on the stem-and-leaf plot.

    Quentin   Tricia
      0   $\editable{}$   $\editable{}$   $\editable{}$
    $\editable{}$   $\editable{}$   $\editable{}$   1   $\editable{}$   $\editable{}$
    $\editable{}$   $\editable{}$   2   $\editable{}$   $\editable{}$   $\editable{}$   $\editable{}$
    $\editable{}$   $\editable{}$   3   $\editable{}$
    $\editable{}$   4   $\editable{}$   $\editable{}$   $\editable{}$
  2. What is the median length of Tricia's sunflowers?

  3. What is the median length of Quentin's sunflowers?

  4. Which of these statements is true?

    A. Quentin's flowers have a higher median length and larger range of lengths, which shows that Quentin has taller flowers overall.

    B. Tricia's flowers have a higher median length and larger range of lengths, which shows that Tricia has taller flowers overall.

    C. Tricia's flowers have a higher median length and smaller range of lengths, which shows that Tricia has taller flowers overall.

    D. Quentin's flowers have a higher median length and smaller range of lengths, which shows that Quentin has taller flowers overall.

 

Comparing histograms and column graphs

Histograms and column graphs can be compared in a number of ways. The columns can be drawn side by side or the entire graphs can be drawn back to back as shown in the diagrams below. When comparing histograms we often comment on differences in features such as location, spread, skewness and modality.

[Figures: side-by-side column graph; back-to-back histogram]
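
The features named above (location, spread, skewness and modality) can also be summarised numerically when the frequencies behind each graph are known. The Python sketch below is one possible way to do this, not a prescribed method: the function name `summarise` and both frequency tables are hypothetical, and skewness is judged only by the sign of the third central moment.

```python
from fractions import Fraction
from statistics import median, multimode


def summarise(frequencies):
    """Summarise a frequency table given as {value: count}."""
    data = [value for value, count in sorted(frequencies.items()) for _ in range(count)]
    centre = Fraction(sum(data), len(data))  # exact mean, avoids rounding error
    # Sign of the third central moment: positive suggests a longer right tail.
    third_moment = sum((x - centre) ** 3 for x in data)
    if third_moment > 0:
        skew = "positive"
    elif third_moment < 0:
        skew = "negative"
    else:
        skew = "symmetric"
    return {
        "mean": float(centre),             # location
        "median": median(data),            # location
        "range": max(data) - min(data),    # spread
        "modes": multimode(data),          # modality (more than one entry if tied)
        "skew": skew,
    }


# Hypothetical frequency tables (value: frequency), invented for illustration.
group_1 = {1: 6, 2: 4, 3: 2, 4: 1}
group_2 = {1: 1, 2: 2, 3: 4, 4: 6}

for name, table in (("Group 1", group_1), ("Group 2", group_2)):
    print(name, summarise(table))
```

In this invented example both groups have the same range, so the difference between them shows up in the location and the direction of skew rather than in the spread.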
   

Practice questions

Question 4

Two Science classes, each with $20$ students, were given a $10$-question True/False test. The results for each class are shown below.

[Figures: histograms for Class 1 and Class 2, each showing Questions Correct against Number of Students]
  1. Do you think Class 1 studied for their test?

    A. Yes

    B. No
  2. Which statistical piece of evidence supports your answer?

    A. The mean.

    B. The range.

    C. The positive skew of the graph.
  3. Do you think Class 2 studied for their test?

    A. Yes

    B. No
  4. Which statistical piece of evidence supports your answer?

    A. The mean.

    B. The range.

    C. The graph is negatively skewed.
  5. Which statistic is the same for each class?

    A. The mode.

    B. The range.
  6. Calculate the mean for Class 1, correct to one decimal place.

  7. Calculate the mean for Class 2, correct to one decimal place.

Question 5

Consider the column graph that shows the number of blood donations per month in a given year.

  1. Which state had the most donations?

    A. NSW

    B. VIC
  2. Both states show a period with a lower number of donations, when cold and flu symptoms make donors ineligible. Which period is this?

    A. Autumn months

    B. Summer months

    C. Spring months

    D. Winter months
  3. Which month had the highest monthly donations?

    A. May

    B. January

    C. November

    D. April
  4. If the total donated over a year across all states is $763542$ units and $2\%$ is used for trauma and road accidents, how many units of blood are required in a year for these incidents?

Question 6
Consider the back-to-back column chart below, which compares the ages of male and female students in a primary school.

  1. Which distribution is positively skewed?

    A. Male
    B. Female
  2. Which distribution has the highest mode?
    A. Female
    B. Male
  3. Which distribution has the highest mean?

    A. Female
    B. Male

 

Outcomes

2.1.14

compare back to back stem plots for different data sets

2.1.17

compare the characteristics of the shape of histograms using symmetry, skewness and bimodality
