Statistics LibreTexts

3.10: Indexing Vectors

    One last thing to add before finishing up this chapter. So far, whenever I've had to get information out of a vector, all I've done is type something like months[4], and when I do this, R prints out the fourth element of the months vector. In this section, I'll show you two additional tricks for getting information out of a vector.
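The examples in this section rely on two vectors created earlier in the chapter, months and sales.by.month. If you want to run them on their own, a minimal setup might look like the sketch below. Two assumptions are worth flagging. First, the month names come from R's built-in month.name constant instead of being typed out with c(). Second, the sales figures are reconstructed from the output printed in this section, which shows sales only in February through May.

```r
# Month names: month.name is a built-in R constant ("January" ... "December")
months <- month.name

# Monthly book sales, reconstructed from the printed output in this section
sales.by.month <- c(0, 100, 200, 50, 25, 0, 0, 0, 0, 0, 0, 0)

# Indexing with a single number returns a single element
months[4]
## [1] "April"
sales.by.month[4]
## [1] 50
```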

    3.10.1 Extracting multiple elements

    One very useful thing we can do is pull out more than one element at a time. In the previous example, we used only a single number (e.g., 4) to indicate which element we wanted. Alternatively, we can use a vector of numbers. So, suppose I wanted the data for February, March and April. I could use the vector c(2,3,4) to tell R which elements to pull out. That is, I'd type this:

    sales.by.month[ c(2,3,4) ]
    ## [1] 100 200  50

    Notice that the order matters here. If I asked for the data in the reverse order (i.e., April first, then March, then February) by using the vector c(4,3,2), then R outputs the data in the reverse order:

    sales.by.month[ c(4,3,2) ]
    ## [1]  50 200 100
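Because the index is just an ordinary vector, you can store it in a variable and reuse it, and the order of its elements controls the order of the output. In the sketch below, the variable name spring is made up for illustration, month.name is R's built-in vector of month names, and the sales figures are reconstructed from the output shown in this section. rev() is the base R function that reverses a vector.

```r
months <- month.name  # built-in month names
sales.by.month <- c(0, 100, 200, 50, 25, 0, 0, 0, 0, 0, 0, 0)

# Store the index vector in a (hypothetical) variable and reuse it
spring <- c(2, 3, 4)
months[spring]
## [1] "February" "March"    "April"
sales.by.month[spring]
## [1] 100 200  50

# Reversing the extracted values matches asking for c(4,3,2) directly
rev(sales.by.month[spring])
## [1]  50 200 100
```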

    A second thing to be aware of is that R provides you with handy shortcuts for very common situations. For instance, suppose that I wanted to extract everything from the 2nd month through to the 8th month. One way to do this is to do the same thing I did above, and use the vector c(2,3,4,5,6,7,8) to indicate the elements that I want. That works just fine

    sales.by.month[ c(2,3,4,5,6,7,8) ]
    ## [1] 100 200  50  25   0   0   0

    but it’s kind of a lot of typing. To help make this easier, R lets you use 2:8 as shorthand for c(2,3,4,5,6,7,8), which makes things a lot simpler. First, let’s just check that this is true:

    2:8
    ## [1] 2 3 4 5 6 7 8

    Next, let's check that we can use the 2:8 shorthand to pull out the 2nd through 8th elements of sales.by.month:

    sales.by.month[2:8]
    ## [1] 100 200  50  25   0   0   0

     So that’s kind of neat.
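The colon operator builds a sequence of whole numbers, so you can use it anywhere an index vector is expected. The sketch below checks the claim with all(), which compares the two vectors element by element. identical() would report FALSE here because 2:8 is stored as integers while c(2,3,...) is stored as doubles. The sketch also shows that counting down reverses the output. As before, month.name is R's built-in vector of month names, and the sales figures are reconstructed from this section's output.

```r
months <- month.name  # built-in month names
sales.by.month <- c(0, 100, 200, 50, 25, 0, 0, 0, 0, 0, 0, 0)

# 2:8 contains the same numbers as c(2,3,4,5,6,7,8)
all(2:8 == c(2, 3, 4, 5, 6, 7, 8))
## [1] TRUE

# Counting down (5:2) extracts the elements in reverse order
sales.by.month[5:2]
## [1]  25  50 200 100

# The shorthand works just as well on a character vector
months[10:12]
## [1] "October"  "November" "December"
```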

    3.10.2 Logical indexing

    At this point, I can introduce an extremely useful tool called logical indexing. In the last section, I created a logical vector any.sales.this.month, whose elements are TRUE for any month in which I sold at least one book, and FALSE for all the others. However, that big long list of TRUEs and FALSEs is a little bit hard to read, so what I’d like to do is to have R select the names of the months for which I sold any books. Earlier on, I created a vector months that contains the names of each of the months. This is where logical indexing is handy. What I need to do is this:

    months[ sales.by.month > 0 ]
    ## [1] "February" "March"    "April"    "May"

    To understand what’s happening here, it’s helpful to notice that sales.by.month > 0 is the same logical expression that we used to create the any.sales.this.month vector in the last section. In fact, I could have just done this:

    months[ any.sales.this.month ]
    ## [1] "February" "March"    "April"    "May"

    and gotten exactly the same result. In order to figure out which elements of months to include in the output, what R does is look to see if the corresponding element in any.sales.this.month is TRUE. Thus, since element 1 of any.sales.this.month is FALSE, R does not include "January" as part of the output; but since element 2 of any.sales.this.month is TRUE, R does include "February" in the output. Note that there’s no reason why I can’t use the same trick to find the actual sales numbers for those months. The command to do that would just be this:

    sales.by.month[ sales.by.month > 0 ]
    ## [1] 100 200  50  25
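It can help to look at the logical vector alongside the positions R actually keeps. The sketch below rebuilds any.sales.this.month from the sales figures, which are reconstructed from this section's output. It then uses base R's which() to list the positions of the TRUE elements. Indexing by those positions picks out the same months as indexing with the logical vector directly. month.name is R's built-in vector of month names.

```r
months <- month.name  # built-in month names
sales.by.month <- c(0, 100, 200, 50, 25, 0, 0, 0, 0, 0, 0, 0)

# One TRUE/FALSE per month: FALSE for January, TRUE for February to May,
# FALSE for the remaining months
any.sales.this.month <- sales.by.month > 0

# which() returns the positions of the TRUE elements
which(any.sales.this.month)
## [1] 2 3 4 5

# Indexing by those positions gives the same months as logical indexing
months[which(any.sales.this.month)]
## [1] "February" "March"    "April"    "May"
```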

    In fact, we can do the same thing with text. Here's an example. Suppose that, to continue the saga of the textbook sales, I later find out that the bookshop did not have sufficient stock for the whole year. They tell me that early in the year they had "high" stocks, which then dropped to "low" levels, and in fact for two months they were "out" of copies of the book before they were able to replenish them. Thus I might have a variable called stock.levels which looks like this:

    stock.levels <- c("high", "high", "low", "out", "out", "high",
                      "high", "high", "high", "high", "high", "high")
    
    stock.levels
    ##  [1] "high" "high" "low"  "out"  "out"  "high" "high" "high" "high" "high"
    ## [11] "high" "high"

    Thus, if I want to know the months for which the bookshop was out of my book, I could apply the logical indexing trick, but with the character vector stock.levels, like this:

    months[stock.levels == "out"]
    ## [1] "April" "May"

     Alternatively, if I want to know when the bookshop was either low on copies or out of copies, I could do this:

    months[stock.levels == "out" | stock.levels == "low"]
    ## [1] "March" "April" "May"

     or this

    months[stock.levels != "high"]
    ## [1] "March" "April" "May"

     Either way, I get the answer I want.
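When a condition lists several acceptable values, chaining == and | gets long. Base R's %in% operator tests each element for membership in a set of values, which gives a third way to write the same query. The sketch below checks that all three versions agree. month.name is assumed as a stand-in for the months vector.

```r
months <- month.name  # built-in month names
stock.levels <- c("high", "high", "low", "out", "out", "high",
                  "high", "high", "high", "high", "high", "high")

# Three ways to ask "when was stock low or out?"
v1 <- months[stock.levels == "out" | stock.levels == "low"]
v2 <- months[stock.levels != "high"]
v3 <- months[stock.levels %in% c("low", "out")]

v3
## [1] "March" "April" "May"
identical(v1, v2) && identical(v2, v3)
## [1] TRUE
```

The != version agrees with the other two only because stock.levels contains nothing but "high", "low" and "out". If another level such as "medium" appeared, != "high" would include it too, while the other two queries would not.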

    At this point, I hope you can see why logical indexing is such a useful thing. It's a very basic yet very powerful way to manipulate data. We'll talk a lot more about how to manipulate data in Chapter 7, since it's a critical skill for real-world research that is often overlooked in introductory research methods classes (or at least, that's been my experience). It does take a bit of practice to become completely comfortable using logical indexing, so it's a good idea to play around with these sorts of commands. Try creating a few different variables of your own, and then ask yourself questions like "how do I get R to spit out all the elements that are [blah]". Practice makes perfect, and it's only by practicing logical indexing that you'll perfect the art of yelling frustrated insults at your computer.
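Here is one example of the kind of practice suggested above. The sketch combines two conditions with & (both must be TRUE) and uses logical indexing inside sum(). The two questions are made up for illustration. The sales figures are reconstructed from the output printed in this section, and month.name is R's built-in vector of month names.

```r
months <- month.name  # built-in month names
sales.by.month <- c(0, 100, 200, 50, 25, 0, 0, 0, 0, 0, 0, 0)
stock.levels <- c("high", "high", "low", "out", "out", "high",
                  "high", "high", "high", "high", "high", "high")

# Months in which stock was high AND at least one book sold
months[stock.levels == "high" & sales.by.month > 0]
## [1] "February"

# Total sales in months when the shop was low on, or out of, stock
sum(sales.by.month[stock.levels != "high"])
## [1] 275
```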