The table below shows the different displays that can be used depending on the type of response and explanatory variables.
Response variable | Explanatory variable | Display |
---|---|---|
Categorical | Categorical | Two-way frequency table, Segmented bar chart |
Numerical | Categorical (two categories only) | Back-to-back stem plot, Parallel box plot, Parallel dot plot |
Numerical | Categorical | Parallel box plot, Parallel dot plot |
There are many things to keep in mind when comparing two sets of data. A few of the most important questions to ask yourself are:
How do the spreads of data compare?
How do the skews compare? Is one set of data more symmetrical?
Is there a big difference in the medians?
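Once the raw values are known, these questions can also be checked numerically. The Python sketch below is illustrative only: the function name `summarise` and the two small data sets are made up for the example. It uses the range as the measure of spread, and it compares the mean with the median as a rough skew indicator. That comparison is a rule of thumb rather than a definition of skew, so always check it against the shape of the display.

```python
from statistics import mean, median


def summarise(values):
    """Return (median, range, skew hint) for one data set.

    The skew hint compares the mean with the median: a mean above the
    median often goes with a positive skew, and a mean below it with a
    negative skew. This is a rule of thumb only.
    """
    med = median(values)
    spread = max(values) - min(values)
    avg = mean(values)
    if avg > med:
        hint = "positive"
    elif avg < med:
        hint = "negative"
    else:
        hint = "no clear skew"
    return med, spread, hint


# Hypothetical data sets for illustration only (not from this page).
set_a = [2, 4, 4, 5, 7, 9]
set_b = [1, 5, 6, 6, 7]

# Summarise both sets the same way, then compare the results side by side.
for name, values in (("A", set_a), ("B", set_b)):
    print(name, summarise(values))
```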
A back-to-back stem plot is very similar to a regular stem plot, in that the "stem" is used to group the scores and each "leaf" indicates the individual scores within each group.
In a back-to-back stem-and-leaf plot, however, two sets of data are displayed at once, sharing a single stem. One set has its leaves on the left of the stem and the other has its leaves on the right. On both sides, the leaves are written in ascending order from the stem outwards.
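The reading rule for the two sides can be captured in a short Python sketch. The function name `decode_back_to_back` and the row format `(left_leaves, stem, right_leaves)` are hypothetical choices for illustration. The sketch assumes single-digit leaves and that each value is stem × 10 + leaf, which matches the keys used on this page.

```python
def decode_back_to_back(rows, stem_unit=10):
    """Turn back-to-back stem plot rows into two sorted lists of values.

    Each row is (left_leaves, stem, right_leaves), with the leaves listed
    exactly as printed, read left to right. Leaves on the left increase
    towards the stem, so they are reversed to put them in ascending order.
    Assumes single-digit leaves, with value = stem * stem_unit + leaf.
    """
    left, right = [], []
    for left_leaves, stem, right_leaves in rows:
        left.extend(stem * stem_unit + leaf for leaf in reversed(left_leaves))
        right.extend(stem * stem_unit + leaf for leaf in right_leaves)
    # The final sort guards against rows being supplied out of stem order.
    return sorted(left), sorted(right)


# The key "6 | 1 | 2 = 16 and 12" written as a single row.
print(decode_back_to_back([([6], 1, [2])]))
```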
The back-to-back stem plot below shows the results of a survey of concert ticket prices at a local venue and the prices of the same concerts at an international venue.
Local leaves | Stem | International leaves |
---|---|---|
7\ 5\ 2\ 2 | 6 | 0\ 5 |
9\ 6\ 5\ 4\ 0 | 7 | 2\ 3\ 8\ 8 |
9\ 6\ 5\ 3\ 0 | 8 | 2\ 3\ 7\ 8 |
8\ 7\ 4\ 3\ 1 | 9 | 0\ 1\ 6\ 7\ 9 |
5 | 10 | 0\ 2\ 3\ 5\ 8 |
\text{Key: } 6 \vert 1 \vert 2 = \$16 \text{ and } \$12
What was the most expensive ticket price at the international venue?
What was the median ticket price at the international venue? Give your answer to two decimal places if needed.
What percentage of local ticket prices were cheaper than the international median?
At the international venue, what percentage of tickets cost between \$90 and \$110 (inclusive)?
At the local venue, what percentage of tickets cost between \$90 and \$100 (inclusive)?
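Once the plot is decoded, each question above reduces to a maximum, a median or a count. The Python sketch below transcribes the leaves from the table (worth double-checking against the original plot) and computes each quantity. The helper name `percent` is hypothetical. Python's `statistics.median` averages the two middle values when the count is even, as it is here.

```python
from statistics import median

# Leaves transcribed from the concert-ticket plot above:
# (local leaves as printed, stem, international leaves).
# Key: 6 | 1 | 2 = $16 (local) and $12 (international), so price = stem * 10 + leaf.
ROWS = [
    ([7, 5, 2, 2], 6, [0, 5]),
    ([9, 6, 5, 4, 0], 7, [2, 3, 8, 8]),
    ([9, 6, 5, 3, 0], 8, [2, 3, 7, 8]),
    ([8, 7, 4, 3, 1], 9, [0, 1, 6, 7, 9]),
    ([5], 10, [0, 2, 3, 5, 8]),
]

local = sorted(stem * 10 + leaf for left, stem, _ in ROWS for leaf in left)
international = sorted(stem * 10 + leaf for _, stem, right in ROWS for leaf in right)


def percent(values, condition):
    """Percentage of values for which condition(value) is true."""
    return 100 * sum(1 for v in values if condition(v)) / len(values)


intl_median = median(international)  # 20 prices, so the two middle ones are averaged
print("Most expensive international ticket:", max(international))
print("International median:", intl_median)
print("Local cheaper than international median (%):",
      percent(local, lambda v: v < intl_median))
print("International $90 to $110 inclusive (%):",
      percent(international, lambda v: 90 <= v <= 110))
print("Local $90 to $100 inclusive (%):",
      percent(local, lambda v: 90 <= v <= 100))
```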
The back-to-back stem plot shows the number of pieces of paper used each day, over several days, by Maximillian's and Charlie's students.
Maximillian's students | Stem | Charlie's students |
---|---|---|
7 | 0 | 7 |
3 | 1 | 1\ 2\ 3 |
8 | 2 | 8 |
4\ 3 | 3 | 2\ 3\ 4 |
7\ 6\ 5 | 4 | 9 |
3\ 2 | 5 | 2 |
\text{Key: } 6 \vert 1 \vert 2 = 16 \text{ and } 12
Which of the following statements are true?
I. Maximillian's students did not use 7 pieces of paper on any day.
II. Charlie's median is higher than Maximillian’s median.
III. The median is greater than the mean in both groups.
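Each statement can be tested directly from the decoded values. The minimal Python sketch below assumes the leaves are transcribed correctly from the plot; the variable names are illustrative.

```python
from statistics import mean, median

# Leaves transcribed from the paper-usage plot above:
# (Maximillian's leaves as printed, stem, Charlie's leaves).
# Key: 6 | 1 | 2 = 16 and 12, so value = stem * 10 + leaf.
ROWS = [
    ([7], 0, [7]),
    ([3], 1, [1, 2, 3]),
    ([8], 2, [8]),
    ([4, 3], 3, [2, 3, 4]),
    ([7, 6, 5], 4, [9]),
    ([3, 2], 5, [2]),
]

maximillian = sorted(stem * 10 + leaf for left, stem, _ in ROWS for leaf in left)
charlie = sorted(stem * 10 + leaf for _, stem, right in ROWS for leaf in right)

# I. Maximillian's students never used exactly 7 pieces of paper in a day.
statement_i = 7 not in maximillian
# II. Charlie's median is higher than Maximillian's median.
statement_ii = median(charlie) > median(maximillian)
# III. The median is greater than the mean in both groups.
statement_iii = all(median(g) > mean(g) for g in (maximillian, charlie))

print("I:", statement_i)
print("II:", statement_ii)
print("III:", statement_iii)
```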
Parallel box plots are used to compare two or more sets of data visually. Remember that a box plot displays the five-number summary, so these are the values to compare:
Minimum
Q1 (lower quartile)
Median
Q3 (upper quartile)
Maximum
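The five values above can be computed from raw data with a short Python sketch. It assumes one common textbook convention: Q1 and Q3 are the medians of the lower and upper halves, with the overall median left out of both halves when the number of values is odd. Other software may use a different quartile convention and give slightly different Q1 and Q3. The function name `five_number_summary` is illustrative.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: quartiles are the medians of the lower and upper halves,
    leaving out the overall median when the number of values is odd.
    Requires at least two values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


# Hypothetical data for illustration only.
print(five_number_summary([3, 7, 8, 5, 12, 14, 21, 13, 18]))
```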
Parallel box plots are drawn parallel to each other along the same horizontal scale. Because they share a scale, comparing them visually is straightforward. It is important to label each box plot clearly.
Here is an example:
Looking at the parallel box plots, we can see that overall the under-30s were faster at completing the task. Both box plots are slightly negatively skewed. Over 75\% of the under-30s completed the task in under 22 seconds, which is the median time taken by the over-30s. In fact, 100\% of the under-30s had finished the task before 75\% of the over-30s had completed it. The under-30s also had a smaller spread of times: the range for the over-30s was 24 seconds, compared with 20 seconds for the under-30s.
The box plots show the monthly profits (in thousands of dollars) of two derivatives traders over a year:
Who made a higher median monthly profit?
Whose profits had a higher interquartile range?
Whose profits had a higher range?
How much more did Ned make in his most profitable month than Tobias did in his most profitable month?
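Each of these questions uses only values that can be read off a box plot. The short Python sketch below assumes the five values are read from the plot in the order (minimum, Q1, median, Q3, maximum); the helper name `box_plot_stats` and the example tuple are hypothetical. For the last question, subtract one trader's maximum from the other's.

```python
def box_plot_stats(summary):
    """Median, IQR and range from a five-number summary read off a box plot.

    summary: (minimum, Q1, median, Q3, maximum), in that order.
    """
    minimum, q1, med, q3, maximum = summary
    return {"median": med, "iqr": q3 - q1, "range": maximum - minimum}


# Hypothetical summary for illustration only; replace with values read off the plots.
print(box_plot_stats((2, 5, 8, 11, 20)))
```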
The box plots below represent the daily sales made by Carl and Angelina over the course of one month.
What is the range in Angelina's sales?
What is the range in Carl's sales?
By how much did Carl's median sales exceed Angelina's?
Considering the middle 50\% of sales for both salespeople, whose sales were more consistent?
Which salesperson had a more successful sales month?
Parallel box plots are used to compare two or more sets of data visually. These box plots are presented parallel to each other along the same number line using the same scale.
Parallel dot plots are another way to compare two or more sets of data. They must be plotted against the same scale using the same units, which makes the comparison between the data sets easy and ensures it isn't misleading. When creating a parallel dot plot, take the time to line each dot up with its value on the shared scale.
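Lining the plots up is what a shared scale makes possible. As an illustration only, the Python sketch below draws a plain-text parallel dot plot in which every group uses the same column for each value, so equal values line up vertically. The function name, the `o` marker, the integer-only scale and the example data are all assumptions made to keep the sketch short.

```python
from collections import Counter


def parallel_dot_plot(groups, lo, hi, col_width=3):
    """Return a text dot plot of several groups on one shared integer scale.

    groups: dict mapping a label to a list of integers.
    Values outside lo..hi are not drawn. Every group uses the same column
    for each value, and dots stack upwards for repeated values.
    """
    lines = []
    for label, values in groups.items():
        counts = Counter(values)
        tallest = max(counts.values())
        lines.append(label)
        for level in range(tallest, 0, -1):
            row = "".join(
                ("o" if counts[x] >= level else " ").center(col_width)
                for x in range(lo, hi + 1)
            )
            lines.append(row.rstrip())
        # Print the shared scale under each group so the columns can be checked.
        lines.append("".join(str(x).center(col_width) for x in range(lo, hi + 1)))
    return "\n".join(lines)


# Hypothetical data for illustration only.
print(parallel_dot_plot({"A": [1, 2, 2], "B": [3]}, lo=1, hi=3))
```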
A class completed 40 questions for homework. The time each boy and girl took to finish them was recorded, and the data was presented as a parallel dot plot:
Comparing boys and girls, which gender had the higher median time?
Which gender had the larger range?
Which group had the higher-valued mode?
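On a dot plot the mode is the value with the tallest stack of dots, and a group can have more than one. The minimal Python sketch below uses the standard library's `statistics.multimode` (Python 3.8 or later) to list every mode; the helper name `highest_mode` and the example times are hypothetical.

```python
from statistics import multimode


def highest_mode(values):
    """Return the largest mode, since a data set can have several modes."""
    return max(multimode(values))


# Hypothetical times (minutes) for illustration; two values tie as most frequent.
times = [20, 25, 25, 30, 30]
print(multimode(times), highest_mode(times))
```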
Isabelle did an experiment to see how well plants grow in different conditions.
She grew 8 plants in the sunshine and 8 in the shade. After 2 months she measured each plant's height in centimetres and recorded the data as a parallel dot plot.
Which group of plants had a larger range of heights?
Which dot plot shows a positive skew?
How much higher is the median height of plants grown in the sunshine than the median height of plants grown in the shade?