Stem-and-leaf plots were popularized by John W. Tukey in his 1977 book Exploratory Data Analysis. They provide a quick way of revealing the distribution of a moderately sized data set.
A single stem-and-leaf plot is used to display univariate data, while a double (back-to-back) stem-and-leaf plot is used when two data sets of the same variable are to be compared.
Data sets are compared, for example, when we wish to know whether a treatment has had an effect, or whether two samples come from the same population or from different populations. Comparing data sets might also suggest a possible causal relationship between the quantities being measured.
Stem-and-leaf plots give a good overview of the shape of the data. They are like histograms turned on their sides. We can identify any skew, outliers and/or clustering. Further, since each individual score is recorded in a stem-and-leaf plot, in ascending order, it is easy to locate the median and the quartiles. The plot also makes it easy to identify the mode in a data set.
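Because every score appears in the plot in ascending order, the median, quartiles and mode can be read off by position. The Python sketch below does that reading for the scores used in the example that follows. The quartile convention is an assumption (Q1 and Q3 are taken as the medians of the lower and upper halves, leaving out the middle score when the count is odd); textbooks and software packages differ here, so other methods can give slightly different quartiles.

```python
from collections import Counter


def median(sorted_scores):
    """Median of a list that is already in ascending order."""
    n = len(sorted_scores)
    mid = n // 2
    if n % 2 == 1:
        return sorted_scores[mid]  # odd count: the single middle score
    return (sorted_scores[mid - 1] + sorted_scores[mid]) / 2  # even count: average of the two middle scores


def quartiles(sorted_scores):
    """(Q1, Q3) as the medians of the lower and upper halves.

    Assumed convention: when the count is odd, the middle score is left out of both halves.
    """
    n = len(sorted_scores)
    lower = sorted_scores[: n // 2]
    upper = sorted_scores[(n + 1) // 2 :]
    return median(lower), median(upper)


def modes(scores):
    """All scores that occur most often (a data set can have more than one mode).

    If every score occurs once, this returns them all; many textbooks would say there is no mode.
    """
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(s for s, c in counts.items() if c == top)


scores = [10, 13, 16, 21, 26, 27, 28, 35, 35, 36,
          41, 41, 45, 46, 49, 50, 53, 56, 58]
print(median(scores))     # 36
print(quartiles(scores))  # (26, 49)
print(modes(scores))      # [35, 41]
```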
In a stem-and-leaf plot, we split the last digit of each score from the digit(s) before it. The leading digit(s) become the "stem" and the last digit becomes the "leaf." For example, the score 46 has stem 4 and leaf 6.
The scores $10, 13, 16, 21, 26, 27, 28, 35, 35, 36, 41, 41, 45, 46, 49, 50, 53, 56, 58$ are displayed in the stem-and-leaf plot below.
Stem | Leaf
1 | 0 3 6
2 | 1 6 7 8
3 | 5 5 6
4 | 1 1 5 6 9
5 | 0 3 6 8
Key: 1 | 0 = 10
The "stem" is a column and the stem values are written downwards in that column. The "leaf" values are written across in rows corresponding to the "stem" value. The "leaf" values are written in ascending order from the stem outwards.
The "stem" is used to group the scores and each "leaf" indicates the individual scores within each group. This is like a histogram with bin widths of $10$, with the advantage that the contents of the bins are not lost.
All the scores are written in ascending order. When you create your own stem-and-leaf plot, it is best to arrange all your scores in order from smallest to largest before you start putting them into the plot.
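A minimal Python sketch of the procedure just described: sort the scores, split each one into a stem (tens digit) and a leaf (units digit), and list the leaves against their stems. It assumes non-negative whole-number scores with the last digit as the leaf; the function names are illustrative. Stems with no leaves are kept so that gaps in the data remain visible. The example feeds in the scores above deliberately out of order.

```python
from collections import defaultdict


def stem_and_leaf(scores):
    """Return {stem: [leaves]} for non-negative whole numbers, with the last digit as the leaf."""
    ordered = sorted(scores)              # put the scores in ascending order first
    rows = defaultdict(list)
    for s in ordered:
        stem, leaf = divmod(s, 10)        # e.g. 46 -> stem 4, leaf 6
        rows[stem].append(leaf)           # leaves arrive already in ascending order
    first, last = ordered[0] // 10, ordered[-1] // 10
    # Include every stem in the range, even empty ones, so gaps stay visible.
    return {stem: rows[stem] for stem in range(first, last + 1)}


def format_plot(rows):
    """Lay out the rows as 'stem | leaves' text."""
    lines = ["Stem | Leaf"]
    for stem, leaves in rows.items():
        line = f"{stem:>4} | " + " ".join(str(leaf) for leaf in leaves)
        lines.append(line.rstrip())       # no trailing space on empty rows
    return "\n".join(lines)


scores = [58, 10, 41, 27, 35, 13, 49, 21, 56, 16, 36, 26, 45, 35, 28, 53, 41, 50, 46]
print(format_plot(stem_and_leaf(scores)))
# Stem | Leaf
#    1 | 0 3 6
#    2 | 1 6 7 8
#    3 | 5 5 6
#    4 | 1 1 5 6 9
#    5 | 0 3 6 8
```

Counting the leaves in each row gives the same frequencies that a histogram with bin width 10 would show, which is the sense in which the plot is a histogram that keeps its bin contents.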
To display two data sets simultaneously, a back-to-back stem-and-leaf plot can be used. This type of display allows comparisons between data sets.
Reading a back-to-back stem-and-leaf plot is very similar to reading a regular stem-and-leaf plot. The "stem" is used to group the scores and each "leaf" indicates the individual scores within each group.
The "stem" is a column and the stem values are written downwards in that column. The "leaf" values are written across in the rows corresponding to the "stem" value. In a back-to-back stem-and-leaf plot, one data set is displayed to the left of the stem and the other to the right. The "leaf" values are still written in ascending order from the stem outwards, so the leaves on the left-hand side increase from right to left.
If you have to create your own back-to-back stem-and-leaf plot, it is easier to write each data set in ascending order before you start placing the scores in the plot.
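The same idea extends to a back-to-back plot. In this Python sketch (the function name is hypothetical; whole-number scores and last-digit leaves are assumed), the left-hand leaves are listed in reverse so that both sides increase outwards from the stem. It is applied to the sunflower heights from the final exercise of this section, so it can also be used to check that answer.

```python
def back_to_back(left, right):
    """Rows of a back-to-back stem-and-leaf plot for two lists of whole numbers.

    The last digit is the leaf; left-hand leaves are reversed so that both
    sides increase outwards from the stem.
    """
    def leaves_by_stem(scores):
        rows = {}
        for s in sorted(scores):
            stem, leaf = divmod(s, 10)
            rows.setdefault(stem, []).append(leaf)
        return rows

    left_rows, right_rows = leaves_by_stem(left), leaves_by_stem(right)
    stems = list(left_rows) + list(right_rows)
    table = []
    for stem in range(min(stems), max(stems) + 1):  # keep empty stems too
        left_text = " ".join(str(x) for x in reversed(left_rows.get(stem, [])))
        right_text = " ".join(str(x) for x in right_rows.get(stem, []))
        table.append((left_text, stem, right_text))

    width = max(len(lt) for lt, _, _ in table)  # right-align the left-hand leaves
    return [f"{lt:>{width}} | {s} | {rt}".rstrip() for lt, s, rt in table]


quentin = [39, 18, 14, 44, 37, 18, 23, 28]
tricia = [49, 25, 42, 5, 47, 12, 15, 8, 35, 22, 28, 6, 21]
for line in back_to_back(quentin, tricia):
    print(line)
```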
The stem-and-leaf plot below shows the ages of the people who entered through the gates of a concert in the first $5$ seconds.
Stem | Leaf
1 | 1 2 4 5 6 6 7 9 9
2 | 2 3 5 5 7
3 | 1 3 8 9
4 |
5 | 8
How many people passed through the gates in the first $5$ seconds?
What was the age of the youngest person?
The youngest person was ____ years old.
What was the age of the oldest person?
The oldest person was ____ years old.
What proportion of the concert-goers were under $20$ years old?
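Reading a plot works in reverse: each stem and leaf pair rebuilds one score. This Python sketch transcribes the concert plot into a dictionary and recovers the ages from it. Because the plot above shows no key, it assumes the stem is the tens digit and the leaf the units digit; the helper name is hypothetical.

```python
from fractions import Fraction

# Transcription of the concert plot; assumption: stem = tens digit, leaf = units digit.
concert = {1: [1, 2, 4, 5, 6, 6, 7, 9, 9], 2: [2, 3, 5, 5, 7], 3: [1, 3, 8, 9], 4: [], 5: [8]}


def scores_from_plot(rows):
    """Rebuild the scores from {stem: leaves}; they come out ascending if the leaves are ascending."""
    return [10 * stem + leaf for stem in sorted(rows) for leaf in rows[stem]]


ages = scores_from_plot(concert)
count = len(ages)                      # one leaf per person
youngest, oldest = ages[0], ages[-1]   # first and last leaves in the plot
under_20 = Fraction(sum(1 for a in ages if a < 20), count)
print(count, youngest, oldest, under_20)
```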
Ten participants had their pulse rate measured before and after exercise, with the results shown in the back-to-back stem-and-leaf plot below.
Key: 6 | 1 | 2 = 16 (left) and 12 (right)
What is the mode pulse rate after exercise?
How many modes are there for the pulse rate before exercise?
What is the range of pulse rates before exercise?
What is the range of pulse rates after exercise?
Calculate the mean pulse rate before exercise.
What is the mean pulse rate after exercise?
What can you conclude from the measures of centre and spread that you have just calculated?
The range of pulse rates decreases after exercise.
The range of pulse rates and the mean pulse rate increase after exercise.
The range of pulse rates increasing after exercise shows that some people are fitter than others.
The mode pulse rate is the best comparison of pulse rates before and after exercise.
Two friends have been growing sunflowers. They have measured the height of their sunflowers to the nearest cm, with their results shown below:
Quentin: $39, 18, 14, 44, 37, 18, 23, 28$
Tricia: $49, 25, 42, 5, 47, 12, 15, 8, 35, 22, 28, 6, 21$
Display the data in the back-to-back stem-and-leaf plot below.
Quentin | Stem | Tricia
 | 0 | _ _ _
_ _ _ | 1 | _ _
_ _ | 2 | _ _ _ _
_ _ | 3 | _
_ | 4 | _ _ _
What is the median height of Tricia's sunflowers?
What is the median height of Quentin's sunflowers?
Which of these statements is true?
Quentin's flowers have a higher median height and a larger range of heights, which shows that Quentin has taller flowers overall.
Tricia's flowers have a higher median height and a larger range of heights, which shows that Tricia has taller flowers overall.
Tricia's flowers have a higher median height and a smaller range of heights, which shows that Tricia has taller flowers overall.
Quentin's flowers have a higher median height and a smaller range of heights, which shows that Quentin has taller flowers overall.
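To check the medians and ranges, Python's standard statistics.median function can be used; it sorts the data itself and averages the two middle values when there is an even number of scores. The lists below are the sunflower heights given above.

```python
from statistics import median

quentin = [39, 18, 14, 44, 37, 18, 23, 28]
tricia = [49, 25, 42, 5, 47, 12, 15, 8, 35, 22, 28, 6, 21]

for name, heights in [("Quentin", quentin), ("Tricia", tricia)]:
    # range = largest height minus smallest height
    print(name, "median:", median(heights), "range:", max(heights) - min(heights))
```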