Stem-and-leaf plots were introduced by John W. Tukey in his book Exploratory Data Analysis in 1977. They provide a quick way of revealing the distribution properties of a moderate-sized data set.
A single stem-and-leaf plot can be used to display univariate data, while a double (back-to-back) stem-and-leaf plot is used when two data sets are to be compared.
Data sets are compared, for example, when we wish to know whether a treatment has had an effect, or whether two samples come from the same population or from different populations. Comparing data sets might also suggest the possibility of a causal relationship between the quantities being measured.
The following is based on material about single and double stem-and-leaf plots from earlier chapters.
Stem-and-leaf plots give a good overview of the shape of the data; they are like histograms turned on their sides. We can identify any skew, outliers and/or clustering. Further, because every individual score is recorded in the plot in ascending order, it is easy to locate the median and the quartiles. The plot also makes it easy to identify the mode of a data set.
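Because the scores in a stem-and-leaf plot are already in ascending order, the median is simply the middle score (or the mean of the two middle scores), and the mode is whichever value repeats most often. The sketch below illustrates this using the scores from the example that follows; the function names `median` and `modes` are hypothetical helpers, and quartiles are left out because textbooks use slightly different conventions for them.

```python
from collections import Counter

def median(sorted_scores):
    """Median of an already-sorted list: the middle value, or the mean of the two middle values."""
    n = len(sorted_scores)
    mid = n // 2
    if n % 2 == 1:
        return sorted_scores[mid]
    return (sorted_scores[mid - 1] + sorted_scores[mid]) / 2

def modes(scores):
    """All values that occur most often (a data set can have more than one mode).

    Note: if every value occurs exactly once, this returns every value;
    some texts say such a data set has no mode.
    """
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

# Scores from the worked example in this section, already in ascending order.
scores = [10, 13, 16, 21, 26, 27, 28, 35, 35, 36,
          41, 41, 45, 46, 49, 50, 53, 56, 58]

print(median(scores))  # 36 (the 10th of 19 scores)
print(modes(scores))   # [35, 41]
```

With $19$ scores the median is the $10$th one, which you can find by counting leaves along the rows of the plot.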
In a stem-and-leaf plot, we split each score into its leading digit(s) and its final digit. The leading digit(s) become the "stem" and the final digit becomes the "leaf." For example, the score 26 has stem 2 and leaf 6.
The scores $10, 13, 16, 21, 26, 27, 28, 35, 35, 36, 41, 41, 45, 46, 49, 50, 53, 56, 58$ are displayed in the stem-and-leaf plot below.

Stem | Leaf
1 | 0 3 6
2 | 1 6 7 8
3 | 5 5 6
4 | 1 1 5 6 9
5 | 0 3 6 8

Key: 1 | 0 = 10
The "stem" is a column and the stem values are written downwards in that column. The "leaf" values are written across in rows corresponding to the "stem" value. The "leaf" values are written in ascending order from the stem outwards.
The "stem" is used to group the scores, and each "leaf" indicates an individual score within its group. This is like a histogram with bin widths of $10$, with the advantage that the individual values in each bin are not lost.
All the scores are written in ascending order. When you create your own stem-and-leaf plot, it is best to arrange all your scores in order, from smallest to largest, before you start putting them into the plot.
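The construction described above (sort the scores, split each into a tens stem and a units leaf, then list the leaves beside each stem) can be sketched in a few lines. This is a minimal illustration assuming non-negative whole-number scores and stems of width $10$; `stem_and_leaf` is a hypothetical helper name, not a standard library function.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Return text rows 'stem | leaves' for non-negative integer scores.

    The stem is every digit except the last (value // 10) and the leaf is
    the last digit (value % 10). Empty stems between the smallest and
    largest stem are kept so that gaps in the data stay visible.
    """
    groups = defaultdict(list)
    for value in sorted(scores):  # order the scores first, as recommended above
        groups[value // 10].append(value % 10)
    rows = []
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups[stem])
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows

scores = [10, 13, 16, 21, 26, 27, 28, 35, 35, 36,
          41, 41, 45, 46, 49, 50, 53, 56, 58]

for row in stem_and_leaf(scores):
    print(row)
```

Sorting inside the function means the leaves come out in ascending order even if the scores are entered unsorted.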
To display two data sets simultaneously, a back-to-back stem-and-leaf plot can be used. This type of display allows comparisons between data sets.
Reading a back-to-back stem-and-leaf plot is very similar to reading a regular stem-and-leaf plot. The "stem" is used to group the scores, and each "leaf" indicates an individual score within its group.
The "stem" is a column, and the stem values are written downwards in that column. The "leaf" values are written across in the rows corresponding to each stem value. In a back-to-back stem-and-leaf plot, one data set is displayed on the left of the stem and the other on the right. The "leaf" values are still written in ascending order from the stem outwards, so on the left-hand side they increase from right to left.
If you have to create your own back-to-back stem-and-leaf plot, it's easier to write each data set in ascending order before you start putting the scores into the plot.
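A back-to-back plot uses the same grouping as a single plot, except that the left-hand leaves are reversed so they still ascend outwards from the stem. The sketch below assumes non-negative whole numbers with tens as stems; `group_leaves` and `back_to_back` are hypothetical helper names, and the data are the sunflower heights from the exercise at the end of this section.

```python
from collections import defaultdict

def group_leaves(scores):
    """Map each stem (tens digit) to its leaves (units digits), in ascending order."""
    groups = defaultdict(list)
    for value in sorted(scores):
        groups[value // 10].append(value % 10)
    return groups

def back_to_back(left, right):
    """Rows of 'left leaves | stem | right leaves' for two lists of non-negative integers."""
    left_groups, right_groups = group_leaves(left), group_leaves(right)
    all_stems = list(left_groups) + list(right_groups)
    stems = range(min(all_stems), max(all_stems) + 1)
    # Leaves ascend outwards from the stem, so the left-hand leaves are reversed.
    left_texts = [" ".join(str(leaf) for leaf in reversed(left_groups[s])) for s in stems]
    width = max(len(text) for text in left_texts)  # right-align the left column
    rows = []
    for stem, left_text in zip(stems, left_texts):
        right_text = " ".join(str(leaf) for leaf in right_groups[stem])
        rows.append(f"{left_text:>{width}} | {stem} | {right_text}".rstrip())
    return rows

quentin = [39, 18, 14, 44, 37, 18, 23, 28]
tricia = [49, 25, 42, 5, 47, 12, 15, 8, 35, 22, 28, 6, 21]

for row in back_to_back(quentin, tricia):
    print(row)
```

Reading the left-hand side, a row such as `8 8 4 | 1` stands for the scores 14, 18 and 18.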
The stem-and-leaf plot below shows the ages of the people who entered through the gates of a concert in the first $5$ seconds.
Stem | Leaf
1 | 1 2 4 5 6 6 7 9 9
2 | 2 3 5 5 7
3 | 1 3 8 9
4 |
5 | 8

Key: 1 | 1 = 11

How many people passed through the gates in the first $5$5 seconds?
What was the age of the youngest person?
The youngest person was ___ years old.
What was the age of the oldest person?
The oldest person was ___ years old.
What proportion of the concert-goers were under $20$ years old?
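To check your answers, each value can be rebuilt from the plot as stem × 10 + leaf. This sketch assumes the key 1 | 1 = 11 (stems are tens of years); the dictionary `concert_plot` simply transcribes the plot above, and `Fraction` keeps the proportion exact rather than as a rounded decimal.

```python
from fractions import Fraction

# Leaves for each stem, transcribed from the concert plot (stem 4 has no leaves).
concert_plot = {1: [1, 2, 4, 5, 6, 6, 7, 9, 9],
                2: [2, 3, 5, 5, 7],
                3: [1, 3, 8, 9],
                4: [],
                5: [8]}

# Rebuild each age as stem * 10 + leaf; the plot is already in ascending order.
ages = [stem * 10 + leaf for stem in sorted(concert_plot) for leaf in concert_plot[stem]]

count = len(ages)                      # one leaf per person
youngest, oldest = ages[0], ages[-1]   # first and last leaves of the plot
under_20 = Fraction(sum(1 for age in ages if age < 20), count)

print(count, youngest, oldest, under_20)  # 19 11 58 9/19
```

Every leaf on stem 1 is a teenager, so the proportion under $20$ is just the number of leaves in that row divided by the total number of leaves.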
Ten participants had their pulse rates measured before and after exercise, with the results shown in the back-to-back stem-and-leaf plot below.
Key: 6 | 1 | 2 = 12 and 16
What is the mode pulse rate after exercise?
___
How many modes are there for the pulse rate before exercise?
___
What is the range of pulse rates before exercise?
What is the range of pulse rates after exercise?
Calculate the mean pulse rate before exercise.
What is the mean pulse rate after exercise?
What can you conclude from the measures of centre and spread that you have just calculated?
The range of pulse rates decreases after exercise.
The range of pulse rates and the mean pulse rate increase after exercise.
The range of pulse rates increasing after exercise shows that some people are fitter than others.
The mode pulse rate is the best comparison of pulse rates before and after exercise.
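The pulse plot itself is not reproduced in this section, so as a stand-in this sketch applies the mean, range and mode calculations to the concert ages from the previous exercise; for the pulse questions you would enter the before and after values read from the plot instead. `summarise` is a hypothetical helper name, and it reports every mode because a data set can have more than one.

```python
from collections import Counter

def summarise(values):
    """Mean, range and list of modes for a list of numbers."""
    counts = Counter(values)
    top = max(counts.values())
    return {
        "mean": sum(values) / len(values),
        "range": max(values) - min(values),
        "modes": sorted(v for v, c in counts.items() if c == top),
    }

# Concert ages, read from the stem-and-leaf plot in the previous exercise.
ages = [11, 12, 14, 15, 16, 16, 17, 19, 19,
        22, 23, 25, 25, 27,
        31, 33, 38, 39,
        58]

stats = summarise(ages)
print(round(stats["mean"], 2), stats["range"], stats["modes"])  # 24.21 47 [16, 19, 25]
```

Here three values (16, 19 and 25) each occur twice, so this data set has three modes; the range is the largest value minus the smallest.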
Two friends have been growing sunflowers. They measured the heights of their sunflowers to the nearest cm, and their results are shown below:
Quentin: $39, 18, 14, 44, 37, 18, 23, 28$
Tricia: $49, 25, 42, 5, 47, 12, 15, 8, 35, 22, 28, 6, 21$
Display the data on the back-to-back stem-and-leaf plot below.
Quentin | Stem | Tricia
| 0 | ___ ___ ___
___ ___ ___ | 1 | ___ ___
___ ___ | 2 | ___ ___ ___ ___
___ ___ | 3 | ___
___ | 4 | ___ ___ ___
What is the median height of Tricia's sunflowers?
What is the median height of Quentin's sunflowers?
Which of these statements is true?
Quentin's flowers have a higher median height and a larger range of heights, which shows that Quentin has taller flowers overall.
Tricia's flowers have a higher median height and a larger range of heights, which shows that Tricia has taller flowers overall.
Tricia's flowers have a higher median height and a smaller range of heights, which shows that Tricia has taller flowers overall.
Quentin's flowers have a higher median height and a smaller range of heights, which shows that Quentin has taller flowers overall.
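To check the median and range calculations, Python's standard `statistics.median` sorts the data itself and averages the two middle values when there is an even number of them. This is a quick sketch using the heights listed above; it computes only the median and the range (largest minus smallest), the two measures the statements compare.

```python
from statistics import median

quentin = [39, 18, 14, 44, 37, 18, 23, 28]
tricia = [49, 25, 42, 5, 47, 12, 15, 8, 35, 22, 28, 6, 21]

for name, heights in [("Quentin", quentin), ("Tricia", tricia)]:
    # median() sorts internally; range is the largest minus the smallest height.
    print(name, median(heights), max(heights) - min(heights))
# Quentin 25.5 30
# Tricia 22 44
```

Quentin has $8$ flowers, so his median is the mean of the $4$th and $5$th heights, $(23 + 28)/2 = 25.5$; Tricia has $13$, so hers is the $7$th height, $22$.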
Plan and conduct investigations using the statistical enquiry cycle:
A. justifying the variables and measures used
B. managing sources of variation, including through the use of random sampling
C. identifying and communicating features in context (trends, relationships between variables, and differences within and between distributions), using multiple displays
D. making informal inferences about populations from sample data
E. justifying findings, using displays and measures.
Investigate a given multivariate data set using the statistical enquiry cycle
Investigate bivariate numerical data using the statistical enquiry cycle