There are many things to keep in mind when comparing two sets of data. A few of the most important questions to ask yourself are:
How do the spreads of data compare?
How do the skews compare? Is one set of data more symmetrical?
Is there a big difference in the medians?
A back-to-back stem-and-leaf plot is very similar to a regular stem-and-leaf plot, in that the "stem" is used to group the scores and each "leaf" indicates the individual scores within each group.
Group A | Stem | Group B |
---|---|---|
7\ 3 | 1 | 0\ 3\ 6 |
5\ 0 | 2 | 1\ 6\ 7\ 8 |
6\ 5\ 5 | 3 | 5\ 5\ 6 |
1\ 1 | 4 | 1\ 1\ 5\ 6\ 9 |
8\ 4\ 3 | 5 | 0\ 3\ 6\ 8 |
Key 5\vert 2 = 25 | Key 2\vert 1 = 21 |
This allows us to compare the shape and location of the two distributions side by side.
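As a rough sketch of how the plot above can be read back into raw scores, the following Python snippet decodes each row using the keys and then compares the median and range of the two groups. The names `rows` and `decode` are illustrative choices, not part of the lesson.

```python
from statistics import median

# Each row of the plot: (Group A leaves as written, stem, Group B leaves as written)
rows = [
    ("7 3", 1, "0 3 6"),
    ("5 0", 2, "1 6 7 8"),
    ("6 5 5", 3, "5 5 6"),
    ("1 1", 4, "1 1 5 6 9"),
    ("8 4 3", 5, "0 3 6 8"),
]

def decode(stem, leaves):
    """Turn a stem and its leaves into scores, e.g. stem 2 with leaf 5 is 25."""
    return [10 * stem + int(leaf) for leaf in leaves.split()]

group_a = sorted(v for a, stem, _ in rows for v in decode(stem, a))
group_b = sorted(v for _, stem, b in rows for v in decode(stem, b))

for name, data in (("Group A", group_a), ("Group B", group_b)):
    print(name, "median:", median(data), "range:", max(data) - min(data))
# Group A median: 35.5 range: 45
# Group B median: 36 range: 48
```

On this reading, the two groups have very similar medians and ranges, even though Group B contains more scores.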
The weights (in kilograms) of a group of men and a group of women were recorded and presented in a back-to-back stem-and-leaf plot, as shown.
Women | Stem | Men |
---|---|---|
7\ 6\ 3 | 5 | |
8\ 6\ 3\ 1\ 1 | 6 | 2\ 3\ 3\ 8\ 9 |
3\ 1 | 7 | 1\ 2\ 4\ 8 |
| | 8 | 3 |
Key 1\vert 6 = 61 | Key 6\vert 2 = 62 |
What is the mean weight of the group of men?
What is the mean weight of the group of women?
Which group is heavier?
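One way to check answers to these questions is to list the weights from the plot and compute each mean. The sketch below assumes the final row of the plot is read as stem 8 with a single leaf 3 on the men's side (that is, 83 kg); the variable names are illustrative.

```python
from statistics import mean

# Weights in kg, read from the plot using the keys (1|6 = 61 for women, 6|2 = 62 for men).
women = [53, 56, 57, 61, 61, 63, 66, 68, 71, 73]
# Assumption: the last row is stem 8 with the leaf 3 belonging to the men (83 kg).
men = [62, 63, 63, 68, 69, 71, 72, 74, 78, 83]

mean_women = mean(women)
mean_men = mean(men)

print(f"Mean weight of men:   {mean_men:.1f} kg")    # 70.3 kg
print(f"Mean weight of women: {mean_women:.1f} kg")  # 62.9 kg
print("Heavier group on average:", "men" if mean_men > mean_women else "women")
```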
In a back-to-back stem-and-leaf plot, two sets of data are displayed simultaneously. One set of data is displayed with its leaves on the left of the stems, and the other with its leaves on the right. On both sides, the "leaf" values are still written in ascending order from the stem outwards.
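The layout described above can be sketched in Python as follows. This is a minimal illustration for two-digit whole-number data only; the function name `back_to_back` is a made-up helper, and the data are the Group A and Group B scores from the first plot.

```python
def back_to_back(left, right):
    """Return the rows of a back-to-back stem-and-leaf plot for two-digit data."""
    both = left + right
    rows = []
    for stem in range(min(both) // 10, max(both) // 10 + 1):
        # Left leaves are written in descending order so they ascend outwards from the stem.
        left_leaves = sorted((v % 10 for v in left if v // 10 == stem), reverse=True)
        right_leaves = sorted(v % 10 for v in right if v // 10 == stem)
        rows.append((" ".join(map(str, left_leaves)), stem, " ".join(map(str, right_leaves))))
    width = max(len(l) for l, _, _ in rows)  # right-align the left-hand leaves
    return [f"{l:>{width}} | {stem} | {r}" for l, stem, r in rows]

group_a = [13, 17, 20, 25, 35, 35, 36, 41, 41, 53, 54, 58]
group_b = [10, 13, 16, 21, 26, 27, 28, 35, 35, 36, 41, 41, 45, 46, 49, 50, 53, 56, 58]

lines = back_to_back(group_a, group_b)
print("\n".join(lines))
```

Every stem between the smallest and largest is included, so a stem with no leaves on one side still appears as a row.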
Consider the following histograms that show the height of students in two basketball teams. We know that one graph represents a team made up of Year 12 students and the other represents a Year 8 team. Which one corresponds to the Year 12 team?
Team A:
Team B:
We can still compare these distributions, even though there is clearly a different number of students in the teams, because we are only interested in the shape and location of the data.
In our comparison we want to mention the most significant differences, and also describe relevant characteristics that are the same, or similar.
In this case we can observe these important similarities and differences:
The modal class for Team A of 170-175 cm is much higher than the modal class 150-155 cm for Team B.
If we ignore the outlier values in the 195-200 class for Team A, then the range of both distributions is similar, at 35 cm for Team A and 30 cm for Team B.
Student heights have greater spread overall for Team A.
The heights for Team A appear to be concentrated around the modal class, so we can say that the data are clustered at 170-180 cm.
Based on these observations, we could confidently say that Team A is the team of Year 12 students. In this case, the decision is clear because of the difference in the height for the modal class, which we would expect to be significantly higher for the older students.
It is important to be able to compare data sets because it helps us make conclusions or judgements about the data. For example, suppose Jim scores \dfrac{5}{10} in a geography test and \dfrac{6}{10} in a history test. Based on those marks alone, it makes sense to say that he did better in history.
But what if everyone else in his geography class scored \dfrac{4}{10}, while everyone else in his history class scored \dfrac{8}{10}? Now we know that Jim had the highest score in the class in geography, and the lowest score in the class in history. With this extra information, it makes more sense to say that he did better in geography.
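The reasoning in Jim's example can be sketched as a short calculation: compare each mark with the mark the rest of the class received, rather than looking at the raw marks alone. The dictionary names below are illustrative.

```python
# Jim's marks out of 10, and the mark every other student received in each class.
jim = {"geography": 5, "history": 6}
classmates = {"geography": 4, "history": 8}

# How far Jim is above (+) or below (-) the rest of the class in each subject.
relative = {subject: jim[subject] - classmates[subject] for subject in jim}

best_raw = max(jim, key=jim.get)                 # judged on marks alone
best_relative = max(relative, key=relative.get)  # judged against the class

print(relative)                                    # {'geography': 1, 'history': -2}
print("Better on raw marks:", best_raw)            # history
print("Better relative to class:", best_relative)  # geography
```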
By comparing the measures of central tendency in a data set (the mean, median and mode), as well as measures of spread (the range and interquartile range), we can make comparisons between different groups and draw conclusions about our data.
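As an illustration, the sketch below computes these measures for the men's and women's weights from the earlier plot. It rests on two assumptions: the last row of that plot is read as 83 kg for the men, and the quartiles are found as the medians of the lower and upper halves of the sorted data (other quartile conventions can give slightly different values). The helper name `summarise` is hypothetical.

```python
from statistics import mean, median, multimode

def summarise(data):
    """Measures of centre and spread for a data set with at least two values."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]   # lower half of the sorted data
    upper = values[-half:]  # upper half (the middle value is left out when n is odd)
    return {
        "mean": round(mean(values), 1),
        "median": median(values),
        "mode": multimode(values),  # list of all most frequent values
        "range": values[-1] - values[0],
        "iqr": median(upper) - median(lower),
    }

women = [53, 56, 57, 61, 61, 63, 66, 68, 71, 73]
men = [62, 63, 63, 68, 69, 71, 72, 74, 78, 83]  # assumes the last plot row is 83

for name, data in (("Women", women), ("Men", men)):
    print(name, summarise(data))
```

Under these assumptions the two groups have almost the same spread (ranges of 20 kg and 21 kg, and the same interquartile range), while the men's mean and median are noticeably higher.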
A science class of 20 students was given two different 10-question True/False tests, one about dinosaurs and one about nanotechnology. The results for each topic test are shown below:
Which topic did the class know more about?
Which statistical piece of evidence supports your answer?
Which statistic is the same for each topic?
Calculate the mean for the dinosaur topic test. Give your answer correct to one decimal place.
Calculate the mean for the nanotechnology topic test. Give your answer correct to one decimal place.