Skip to main content
Statistics LibreTexts

8.5: Implicit Loops

  • Page ID
    3990
  • There’s one last topic I want to discuss in this chapter. In addition to providing the explicit looping structures via while and for, R also provides a collection of functions for implicit loops. What I mean by this is that these are functions that carry out operations very similar to those that you’d normally use a loop for. However, instead of typing out the whole loop, the whole thing is done with a single command. The main reason why this can be handy is that – due to the way that R is written – these implicit looping functions are usually about to do the same calculations much faster than the corresponding explicit loops. In most applications that beginners might want to undertake, this probably isn’t very important, since most beginners tend to start out working with fairly small data sets and don’t usually need to undertake extremely time consuming number crunching. However, because you often see these functions referred to in other contexts, it may be useful to very briefly discuss a few of them.

    The first and simplest of these functions is sapply(). The two most important arguments to this function are X, which specifies a vector containing the data, and FUN, which specifies the name of a function that should be applied to each element of the data vector. The following example illustrates the basics of how it works:

    words <- c("along", "the", "loom", "of", "the", "land")
    sapply( X = words, FUN = nchar )
    ## along   the  loom    of   the  land 
    ##     5     3     4     2     3     4

    Notice how similar this is to the second example of a for loop in Section 8.2.2. The sapply() function has implicitly looped over the elements of words, and for each such element applied the nchar() function to calculate the number of letters in the corresponding word.

    The second of these functions is tapply(), which has three key arguments. As before X specifies the data, and FUN specifies a function. However, there is also an INDEX argument which specifies a grouping variable.140 What the tapply() function does is loop over all of the different values that appear in the INDEX variable. Each such value defines a group: the tapply() function constructs the subset of X that corresponds to that group, and then applies the function FUN to that subset of the data. This probably sounds a little abstract, so let’s consider a specific example, using the nightgarden.Rdata file that we used in Chapter 7.

    gender <- c( "male","male","female","female","male" )
    age <- c( 10,12,9,11,13 )
    tapply( X = age, INDEX = gender, FUN = mean )
    ##   female     male 
    ## 10.00000 11.66667

    In this extract, what we’re doing is using gender to define two different groups of people, and using their ages as the data. We then calculate the mean() of the ages, separately for the males and the females. A closely related function is by(). It actually does the same thing as tapply(), but the output is formatted a bit differently. This time around the three arguments are called data, INDICES and FUN, but they’re pretty much the same thing. An example of how to use the by() function is shown in the following extract:

    by( data = age, INDICES = gender, FUN = mean )
    ## gender: female
    ## [1] 10
    ## -------------------------------------------------------- 
    ## gender: male
    ## [1] 11.66667

    The tapply() and by() functions are quite handy things to know about, and are pretty widely used. However, although I do make passing reference to the tapply() later on, I don’t make much use of them in this book.

    Before moving on, I should mention that there are several other functions that work along similar lines, and have suspiciously similar names: lapply, mapply, apply, vapply, rapply and eapply. However, none of these come up anywhere else in this book, so all I wanted to do here is draw your attention to the fact that they exist.