6.4: Stem and Leaf Plots
Histograms are one of the most widely used methods for displaying the observed values for a variable. They’re simple, pretty, and very informative. However, they do take a little bit of effort to draw. Sometimes it can be quite useful to make use of simpler, if less visually appealing, options. One such alternative is the stem and leaf plot. To a first approximation you can think of a stem and leaf plot as a kind of text-based histogram. Stem and leaf plots aren’t used as widely these days as they were 30 years ago, since it’s now just as easy to draw a histogram as it is to draw a stem and leaf plot. Not only that, they don’t work very well for larger data sets. As a consequence you probably won’t have as much of a need to use them yourself, though you may run into them in older publications. These days, the only real-world situation where I use them is if I have a small data set with 20-30 data points and I don’t have a computer handy, because it’s pretty easy to quickly sketch a stem and leaf plot by hand.
With all that as background, let’s have a look at stem and leaf plots. The AFL margins data contains 176 observations, which is at the upper end for what you can realistically plot this way. The function in R for drawing stem and leaf plots is called stem(), and if we ask for a stem and leaf plot of the afl.margins data, here’s what we get:
stem( afl.margins )
##
## The decimal point is 1 digit(s) to the right of the |
##
## 0 | 001111223333333344567788888999999
## 1 | 0000011122234456666899999
## 2 | 00011222333445566667788999999
## 3 | 01223555566666678888899
## 4 | 012334444477788899
## 5 | 00002233445556667
## 6 | 0113455678
## 7 | 01123556
## 8 | 122349
## 9 | 458
## 10 | 148
## 11 | 6
The values to the left of the | are called stems and the values to the right are called leaves. If you just look at the shape that the leaves make, you can see something that looks a lot like a histogram made out of numbers, just rotated by 90 degrees. But if you know how to read the plot, there’s quite a lot of additional information here. In fact, it’s also giving you the actual values of all of the observations in the data set. To illustrate, let’s have a look at the last line in the stem and leaf plot, namely 11 | 6. Specifically, let’s compare this to the largest value in the afl.margins data set:
> max( afl.margins )
[1] 116
Hm… 11 | 6 versus 116. Obviously the stem and leaf plot is trying to tell us that the largest value in the data set is 116. Similarly, when we look at the line that reads 10 | 148, the way we interpret it is to note that the stem and leaf plot is telling us that the data set contains observations with values 101, 104 and 108. Finally, when we see something like 5 | 00002233445556667, the four 0s in the stem and leaf plot are telling us that there are four observations with value 50. In other words, each leaf is the final digit of one observation, and the stem supplies the digits that come before it.
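If it helps to see that reading rule spelled out, here’s a minimal sketch. The margins vector below is made up for illustration (it is not the real afl.margins data), and splitting each value with %/% and %% is just one way of mimicking the plot for whole numbers like these; it isn’t meant to describe how stem() works internally.

```r
# Made-up values, chosen to echo a few lines of the plot above
margins <- c(50, 50, 50, 50, 52, 101, 104, 108, 116)

# For whole numbers on this scale, the stem is everything except the
# last digit, and the leaf is the last digit
stems  <- margins %/% 10   # integer division
leaves <- margins %% 10    # remainder

# Paste together the leaves that share a stem, one string per stem
plot.lines <- tapply(leaves, stems, paste, collapse = "")
print(plot.lines)

# Reading the plot is the reverse step: stem * 10 + leaf
rebuilt <- stems * 10 + leaves
print(rebuilt)
```

The rebuilt vector matches the values we started with, which is the sense in which the plot shows you every observation rather than just the overall shape.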
I won’t talk about them in a lot of detail, but I should point out that some customisation options are available for stem and leaf plots in R. The two arguments that you can use to do this are:
- scale. Changing the scale of the plot (default value is 1) is analogous to changing the number of breaks in a histogram. Reducing the scale causes R to reduce the number of stem values (i.e., the number of breaks, if this were a histogram) that the plot uses.
- width. The second way that you can customise a stem and leaf plot is to alter the width (default value is 80). Changing the width alters the maximum number of leaf values that can be displayed for any given stem.
However, since stem and leaf plots aren’t as important as they used to be, I’ll leave it to the interested reader to investigate these options. Try the following two commands to see what happens:
> stem( x = afl.margins, scale = .25 )
> stem( x = afl.margins, width = 20 )
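If you don’t have the AFL data loaded, here’s a small self-contained sketch of the same two arguments. The fake.margins vector is invented (every whole number from 0 to 116, three times over), and count.lines() is a hypothetical helper that simply counts how many lines stem() prints, so treat this as an illustration rather than anything definitive.

```r
# Invented data: every whole number from 0 to 116, repeated three times
fake.margins <- rep(0:116, each = 3)

# Hypothetical helper: capture what stem() prints and count the lines
count.lines <- function(x, scale = 1) {
  length(capture.output(stem(x, scale = scale)))
}

lines.default <- count.lines(fake.margins)               # default scale of 1
lines.coarse  <- count.lines(fake.margins, scale = .25)  # smaller scale

# A smaller scale should give fewer stems, and so fewer printed lines
print(c(default = lines.default, coarse = lines.coarse))

# With a narrow width, long rows of leaves get cut short
stem(fake.margins, width = 20)
```

Counting lines is a crude measure, but it makes the effect of scale easy to check without reading the plots side by side.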
The only other thing to note about stem and leaf plots is the line in which R tells you where the decimal point is. If our data set had included only the numbers .11, .15, .23, .35 and .59 and we’d drawn a stem and leaf plot of these data, then R would move the decimal point: the stem values would be 1, 2, 3, 4 and 5, but R would tell you that the decimal point has moved to the left of the | symbol. If you want to see this in action, try the following command:
> stem( x = afl.margins / 1000 )
The stem and leaf plot itself will look identical to the original one we drew, except for the fact that R will tell you that the decimal point has moved.
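The five small numbers mentioned above can also be plotted directly, which is a quick way to see the decimal point message change. This is only a sketch: the variable names are mine, and capture.output() is used purely so that the header line can be looked at on its own.

```r
# The five values mentioned in the text
small.values <- c(.11, .15, .23, .35, .59)

# Draw the plot; the header line reports where the decimal point sits
stem(small.values)

# Keep the printed lines so the header can be inspected separately
printed <- capture.output(stem(small.values))
header  <- printed[grepl("decimal point", printed)]
print(header)
```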