iGCSE (2021 Edition)

22.08 Comparing data sets


Key comparisons

There are many things to keep in mind when comparing two sets of data. A few of the most important questions to ask yourself are:

  • How do the spreads of data compare?
  • How do the skews compare? Is one set of data more symmetrical? 
  • Is there a big difference in the medians?

 

Back-to-back stem plots

A back-to-back stem plot is very similar to a regular stem plot, in that the "stem" is used to group the scores and each "leaf" indicates an individual score within its group.

In a back-to-back stem-and-leaf plot, however, two sets of data are displayed simultaneously. One set of data is displayed with its leaves on the left, and the other with its leaves on the right. The "leaf" values are still written in ascending order from the stem outwards.

This allows us to compare the shape and location of the two distributions side by side.
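As a rough illustration of the layout described above, the following Python sketch builds the rows of a back-to-back stem plot. The function name `back_to_back` and both data sets are made up for this example, and the sketch assumes the scores are two-digit whole numbers.

```python
from collections import defaultdict

def back_to_back(left, right):
    """Return (left leaves, stem, right leaves) rows for two lists of two-digit integers."""
    left_leaves = defaultdict(list)
    right_leaves = defaultdict(list)
    for value in left:
        left_leaves[value // 10].append(value % 10)
    for value in right:
        right_leaves[value // 10].append(value % 10)

    lowest = min(min(left), min(right)) // 10
    highest = max(max(left), max(right)) // 10
    rows = []
    for stem in range(lowest, highest + 1):
        # Left-hand leaves are written largest first, so that reading from
        # the stem outwards (right to left) they are in ascending order.
        left_text = " ".join(str(leaf) for leaf in sorted(left_leaves[stem], reverse=True))
        right_text = " ".join(str(leaf) for leaf in sorted(right_leaves[stem]))
        rows.append((left_text, stem, right_text))
    return rows

# Hypothetical test scores for two classes (invented for illustration).
class_a = [41, 45, 52, 53, 58, 60, 67]
class_b = [48, 55, 56, 61, 62, 69, 73]

rows = back_to_back(class_a, class_b)
width = max(len(left) for left, _, _ in rows)
for left, stem, right in rows:
    print(f"{left:>{width}} | {stem} | {right}")
```

Right-aligning the left-hand leaves keeps them pressed against the stem, which is what makes the two shapes easy to compare by eye.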

Histogram comparison

Consider the following histograms, which show the heights of students in two basketball teams. We know that one graph represents a team of Year $12$ students and the other a team of Year $8$ students. Which one corresponds to the Year $12$ team?

[Histogram: Team A]

[Histogram: Team B]

We can still compare these distributions, even though there is clearly a different number of students in the teams, because we are only interested in the shape and location of the data.

In our comparison we want to mention the most significant differences, and also describe relevant characteristics that are the same, or similar.

In this case we can observe these important similarities and differences:

  • Both distributions are approximately symmetrical and unimodal.
  • The modal class for Team A, $170-175$ cm, is much higher than the modal class for Team B, $150-155$ cm.
  • If we ignore the outlier values in the $195-200$ cm class for Team A, the ranges of the two distributions are similar: $35$ cm for Team A and $30$ cm for Team B.
  • Student heights have a greater spread overall for Team A.
  • The heights for Team A appear to be concentrated around the modal class, so we can say that they are clustered at $170-180$ cm.

Based on these observations, we could confidently say that Team A is the team of Year $12$ students. In this case, the decision is clear because of the difference between the modal classes: we would expect the modal height to be significantly higher for the older students.
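To make the idea of a modal class concrete, here is a minimal Python sketch that groups heights into $5$ cm classes and reports the class with the most values. The helper `modal_class` and the height lists are hypothetical. The values were invented to echo the modal classes described above and are not read from the histograms.

```python
from collections import Counter

def modal_class(heights, width=5):
    """Return the (lower, upper) bounds of the class containing the most values.

    If two classes tie, the one encountered first in the data is returned.
    """
    counts = Counter((h // width) * width for h in heights)
    lower, _ = max(counts.items(), key=lambda item: item[1])
    return lower, lower + width

# Hypothetical heights in cm (invented for illustration).
team_a = [162, 168, 171, 172, 173, 174, 176, 178, 183, 197]
team_b = [141, 147, 151, 152, 153, 154, 157, 158, 163]

print(modal_class(team_a))  # (170, 175)
print(modal_class(team_b))  # (150, 155)
```

The two teams have different numbers of players, but the modal class depends only on where the values pile up, which is why the comparison is still fair.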

Why do we compare?

It is important to be able to compare data sets because it helps us make conclusions or judgements about the data. For example, suppose Jim scores $\frac{5}{10}$ in a geography test and $\frac{6}{10}$ in a history test. Based on those marks alone, it makes sense to say that he did better in history.

But what if everyone else in his geography class scored $\frac{4}{10}$, while everyone else in his history class scored $\frac{8}{10}$? Now we know that Jim had the highest score in the class in geography, and the lowest score in the class in history. With this extra information, it makes more sense to say that he did better in geography.
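Jim's situation can be sketched in a few lines of Python by measuring each of his marks against the class median. The lesson does not say how many students are in each class, so the class size of five used here is purely an assumption.

```python
from statistics import median

# Assumption: each class has four other students (the class sizes are not given).
classes = {
    "geography": [4, 4, 4, 4, 5],  # Jim scored 5, everyone else scored 4
    "history": [8, 8, 8, 8, 6],    # Jim scored 6, everyone else scored 8
}
jim = {"geography": 5, "history": 6}

# How far Jim's mark sits above (+) or below (-) the class median.
differences = {
    subject: jim[subject] - median(scores)
    for subject, scores in classes.items()
}

for subject, diff in differences.items():
    print(f"{subject}: {diff:+} marks relative to the class median")
```

Under this assumption Jim is one mark above the median in geography and two marks below it in history, which matches the conclusion that he did better in geography.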

By comparing the measures of central tendency in a data set (the mean, median and mode), as well as measures of spread (the range and interquartile range), we can make comparisons between different groups and draw conclusions about our data.
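The measures named above can be computed with Python's standard `statistics` module, as in the sketch below. The function `summarise` and both data sets are hypothetical. Note also that there are several conventions for finding quartiles: the default method of `statistics.quantiles` is used here and may not match the method taught in every course.

```python
from statistics import mean, median, multimode, quantiles

def summarise(data):
    """Return measures of central tendency and spread for a list of numbers."""
    q1, _, q3 = quantiles(data, n=4)  # quartiles, using the default 'exclusive' method
    return {
        "mean": mean(data),
        "median": median(data),
        "modes": multimode(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
    }

# Hypothetical data sets (invented for illustration).
group_1 = [4, 5, 5, 6, 7, 8, 8, 8, 12]
group_2 = [1, 3, 6, 6, 7, 9, 12, 13, 15]

for name, data in [("Group 1", group_1), ("Group 2", group_2)]:
    print(name, summarise(data))
```

In this made-up example the two groups share the same median, but Group 2 has the larger range and interquartile range, so its values are more spread out.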

Practice questions

Question 1

The numbers of goals scored by Team 1 and Team 2 in a football tournament were recorded.

Match   Team 1   Team 2
A       $2$      $3$
B       $4$      $2$
C       $5$      $2$
D       $3$      $5$
E       $3$      $4$

  1. Find the total number of goals scored by both teams in Match C.

  2. What is the total number of goals scored by Team 1 across all the matches?

  3. What is the mean number of goals scored by Team 1?

  4. What is the mean number of goals scored by Team 2?

Question 2

The weights (in kilograms) of a group of men and a group of women were recorded and presented in a stem-and-leaf plot as shown.

    Women | Stem | Men
    7 6 3 |  5   |
8 6 3 1 1 |  6   | 2 3 3 8 9
      3 1 |  7   | 1 2 4 8
          |  8   | 3

Key: $8 \mid 3 = 83$
  1. What is the mean weight of the group of men? Express your answer in decimal form.

  2. What is the mean weight of the group of women? Express your answer in decimal form.

  3. Which group is heavier?

    A. Women

    B. Men

Question 3

A science class of $20$ students was given two different $10$-question True/False tests, one about dinosaurs and one about nanotechnology. The results for each topic test are shown below.

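[Two graphs, titled "Dinosaurs" and "Nanotechnology", each showing Number of Students against Questions Correct. The Dinosaurs axis runs from $4$ to $7$ questions correct and the Nanotechnology axis from $7$ to $10$.]
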
  1. Which topic did the class know more about?

    A. Dinosaurs

    B. Nanotechnology

  2. Which statistical piece of evidence supports your answer?

    A. The positive skew of the graph.

    B. The mean.

    C. The range.

  3. Which statistic is the same for each topic?

    A. The mode.

    B. The range.

  4. Calculate the mean for the dinosaur topic test. Give your answer correct to one decimal place.

  5. Calculate the mean for the nanotechnology topic test. Give your answer correct to one decimal place.

Outcomes

0580C9.2

Read, interpret and draw simple inferences from tables and statistical diagrams. Compare sets of data using tables, graphs and statistical measures. Appreciate restrictions on drawing conclusions from given data.

0580E9.2

Read, interpret and draw simple inferences from tables and statistical diagrams. Compare sets of data using tables, graphs and statistical measures. Appreciate restrictions on drawing conclusions from given data.
