18.5: Implicit Loops
- Last updated
- Apr 27, 2023
- Save as PDF
- Page ID
- 36234
( \newcommand{\kernel}{\mathrm{null}\,}\)
There’s one last topic I want to discuss in this chapter. In addition to providing the explicit looping structures via while
and for
, R also provides a collection of functions for implicit loops. What I mean by this is that these are functions that carry out operations very similar to those that you’d normally use a loop for. However, instead of typing out the whole loop, the whole thing is done with a single command. The main reason why this can be handy is that – due to the way that R is written – these implicit looping functions are usually about to do the same calculations much faster than the corresponding explicit loops. In most applications that beginners might want to undertake, this probably isn’t very important, since most beginners tend to start out working with fairly small data sets and don’t usually need to undertake extremely time consuming number crunching. However, because you often see these functions referred to in other contexts, it may be useful to very briefly discuss a few of them.
The first and simplest of these functions is sapply()
. The two most important arguments to this function are X
, which specifies a vector containing the data, and FUN
, which specifies the name of a function that should be applied to each element of the data vector. The following example illustrates the basics of how it works:
words <- c("along", "the", "loom", "of", "the", "land")
sapply( X = words, FUN = nchar )
## along the loom of the land
## 5 3 4 2 3 4
Notice how similar this is to the second example of a for
loop in Section 8.2.2. The sapply()
function has implicitly looped over the elements of words
, and for each such element applied the nchar()
function to calculate the number of letters in the corresponding word.
The second of these functions is tapply()
, which has three key arguments. As before X
specifies the data, and FUN
specifies a function. However, there is also an INDEX
argument which specifies a grouping variable.140 What the tapply()
function does is loop over all of the different values that appear in the INDEX
variable. Each such value defines a group: the tapply()
function constructs the subset of X
that corresponds to that group, and then applies the function FUN
to that subset of the data. This probably sounds a little abstract, so let’s consider a specific example, using the nightgarden.Rdata
file that we used in Chapter 7.
gender <- c( "male","male","female","female","male" )
age <- c( 10,12,9,11,13 )
tapply( X = age, INDEX = gender, FUN = mean )
## female male
## 10.00000 11.66667
In this extract, what we’re doing is using gender
to define two different groups of people, and using their ages
as the data. We then calculate the mean()
of the ages, separately for the males and the females. A closely related function is by()
. It actually does the same thing as tapply()
, but the output is formatted a bit differently. This time around the three arguments are called data
, INDICES
and FUN
, but they’re pretty much the same thing. An example of how to use the by()
function is shown in the following extract:
by( data = age, INDICES = gender, FUN = mean )
## gender: female
## [1] 10
## --------------------------------------------------------
## gender: male
## [1] 11.66667
The tapply()
and by()
functions are quite handy things to know about, and are pretty widely used. However, although I do make passing reference to the tapply()
later on, I don’t make much use of them in this book.
Before moving on, I should mention that there are several other functions that work along similar lines, and have suspiciously similar names: lapply
, mapply
, apply
, vapply
, rapply
and eapply
. However, none of these come up anywhere else in this book, so all I wanted to do here is draw your attention to the fact that they exist.