3.10: Indexing Vectors
One last thing to add before finishing up this chapter. So far, whenever I’ve had to get information out of a vector, all I’ve done is type something like months[4], and when I do this R prints out the fourth element of the months vector. In this section, I’ll show you two additional tricks for getting information out of a vector.
Extracting multiple elements
One very useful thing we can do is pull out more than one element at a time. In the previous example, we only used a single number (i.e., 4) to indicate which element we wanted. Alternatively, we can use a vector. So, suppose I wanted the data for February, March and April. What I could do is use the vector c(2,3,4) to indicate which elements I want R to pull out. That is, I’d type this:
sales.by.month[ c(2,3,4) ]
## [1] 100 200 50
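If you’d like to try this on your own machine, here is a minimal sketch. It assumes sales.by.month contains the twelve monthly figures from earlier in the chapter. The particular values (0 in January, then 100, 200, 50 and 25 for February to May, then 0 for the rest of the year) are reconstructed from the outputs printed in this section. The name feb.to.apr is just one I made up for illustration.

```r
# Assumed data: monthly sales figures, January through December
sales.by.month <- c(0, 100, 200, 50, 25, 0, 0, 0, 0, 0, 0, 0)

# Index with a vector of positions to pull out several elements at once
feb.to.apr <- sales.by.month[ c(2,3,4) ]
print(feb.to.apr)

# The result is an ordinary vector, so it can be stored and reused
print(length(feb.to.apr))
print(sum(feb.to.apr))
```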
Notice that the order matters here. If I asked for the data in the reverse order (i.e., April first, then March, then February) by using the vector c(4,3,2), then R outputs the data in the reverse order:
sales.by.month[ c(4,3,2) ]
## [1] 50 200 100
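You can confirm that only the order changes by comparing the two results with R’s built-in rev() function, which reverses a vector. As before, this sketch assumes the sales figures reconstructed from this section’s output. The names forward and backward are purely illustrative.

```r
# Assumed data, as above
sales.by.month <- c(0, 100, 200, 50, 25, 0, 0, 0, 0, 0, 0, 0)

forward  <- sales.by.month[ c(2,3,4) ]   # February, March, April
backward <- sales.by.month[ c(4,3,2) ]   # April, March, February

# Same three values, returned in whatever order the index vector lists them
print(backward)
print(identical(backward, rev(forward)))
```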
A second thing to be aware of is that R provides you with handy shortcuts for very common situations. For instance, suppose that I wanted to extract everything from the 2nd month through to the 8th month. One way to do this is to do the same thing I did above, and use the vector c(2,3,4,5,6,7,8) to indicate the elements that I want. That works just fine:
sales.by.month[ c(2,3,4,5,6,7,8) ]
## [1] 100 200 50 25 0 0 0
but it’s kind of a lot of typing. To help make this easier, R lets you use 2:8 as shorthand for c(2,3,4,5,6,7,8), which makes things a lot simpler. First, let’s just check that this is true:
2:8
## [1] 2 3 4 5 6 7 8
Next, let’s check that we can use the 2:8 shorthand as a way to pull out the 2nd through 8th elements of sales.by.month:
sales.by.month[2:8]
## [1] 100 200 50 25 0 0 0
So that’s kind of neat.
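Here is a small check, using the same assumed sales figures, that the long and short versions pick out the same elements. One subtlety is worth knowing about. 2:8 creates an integer vector, whereas c(2,3,4,5,6,7,8) creates a double (ordinary numeric) vector. The two index vectors are therefore not identical() even though they contain the same numbers, but that difference doesn’t matter for indexing. The sketch also shows that the colon operator counts downwards when the first number is larger, which ties back to the reverse-order example. The names long.way and short.way are illustrative only.

```r
# Assumed data, as above
sales.by.month <- c(0, 100, 200, 50, 25, 0, 0, 0, 0, 0, 0, 0)

long.way  <- sales.by.month[ c(2,3,4,5,6,7,8) ]
short.way <- sales.by.month[ 2:8 ]
print(identical(long.way, short.way))   # same seven values either way

# The index vectors hold the same numbers but differ in storage type
print(class(2:8))                        # "integer"
print(class(c(2,3,4,5,6,7,8)))           # "numeric"
print(all(2:8 == c(2,3,4,5,6,7,8)))

# 8:2 counts down from 8 to 2, giving the 8th through 2nd elements in reverse
print(sales.by.month[ 8:2 ])
```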
Logical indexing
At this point, I can introduce an extremely useful tool called logical indexing. In the last section, I created a logical vector any.sales.this.month, whose elements are TRUE for any month in which I sold at least one book, and FALSE for all the others. However, that big long list of TRUEs and FALSEs is a little bit hard to read, so what I’d like to do is to have R select the names of the months for which I sold any books. Earlier on, I created a vector months that contains the names of each of the months. This is where logical indexing is handy. What I need to do is this:
months[ sales.by.month > 0 ]
## [1] "February" "March" "April" "May"
To understand what’s happening here, it’s helpful to notice that sales.by.month > 0 is the same logical expression that we used to create the any.sales.this.month vector in the last section. In fact, I could have just done this:
months[ any.sales.this.month ]
## [1] "February" "March" "April" "May"
and gotten exactly the same result. In order to figure out which elements of months to include in the output, what R does is look to see if the corresponding element in any.sales.this.month is TRUE. Thus, since element 1 of any.sales.this.month is FALSE, R does not include "January" as part of the output; but since element 2 of any.sales.this.month is TRUE, R does include "February" in the output. Note that there’s no reason why I can’t use the same trick to find the actual sales numbers for those months. The command to do that would just be this:
sales.by.month[ sales.by.month > 0 ]
## [1] 100 200 50 25
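The element-by-element matching described above can be spelled out explicitly. This sketch only illustrates the idea; it is not how R implements indexing internally. It assumes months holds the twelve month names, taken here from R’s built-in month.name constant, and that sales.by.month holds the same figures used throughout this section. The variable selected is an illustrative name.

```r
# Assumed data, matching the outputs shown in this section
months <- month.name
sales.by.month <- c(0, 100, 200, 50, 25, 0, 0, 0, 0, 0, 0, 0)
any.sales.this.month <- sales.by.month > 0
print(any.sales.this.month)   # one TRUE/FALSE flag per month

# Walk through the months, keeping each one whose flag is TRUE
selected <- character(0)
for (i in seq_along(months)) {
  if (any.sales.this.month[i]) {
    selected <- c(selected, months[i])
  }
}
print(selected)

# which() turns the logical vector into the positions of its TRUE elements,
# so ordinary numeric indexing picks out the same months
print(which(any.sales.this.month))
print(identical(selected, months[ which(any.sales.this.month) ]))
print(identical(selected, months[ any.sales.this.month ]))
```

In practice you would just write months[ any.sales.this.month ]. The loop is only there to make the matching visible.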
In fact, we can do the same thing with text. Here’s an example. Suppose that, to continue the saga of the textbook sales, I later find out that the bookshop didn’t always have enough copies in stock. They tell me that early in the year they had "high" stocks, which then dropped to "low" levels, and in fact for a couple of months they were "out" of copies of the book before they were able to replenish them. Thus I might have a variable called stock.levels which looks like this:
stock.levels <- c("high", "high", "low", "out", "out", "high",
                  "high", "high", "high", "high", "high", "high")
stock.levels
## [1] "high" "high" "low" "out" "out" "high" "high" "high" "high" "high"
## [11] "high" "high"
Thus, if I want to know the months for which the bookshop was out of my book, I could apply the logical indexing trick, but with the character vector stock.levels, like this:
months[stock.levels == "out"]
## [1] "April" "May"
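This works just like the numeric case. Comparing a character vector with a single string produces one TRUE or FALSE per element, and that logical vector does the selecting. The sketch below recreates the two vectors, again taking months from R’s built-in month.name. The name out.of.stock is illustrative. String comparison is exact and case-sensitive, so "Out" would not match "out".

```r
# Assumed data, as in the text
months <- month.name
stock.levels <- c("high", "high", "low", "out", "out", "high",
                  "high", "high", "high", "high", "high", "high")

# One TRUE/FALSE per month: TRUE where the stock level is exactly "out"
out.of.stock <- stock.levels == "out"
print(out.of.stock)
print(months[ out.of.stock ])

# Matching is exact and case-sensitive
print("Out" == "out")
```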
Alternatively, if I want to know when the bookshop was either low on copies or out of copies, I could do this:
months[stock.levels == "out" | stock.levels == "low"]
## [1] "March" "April" "May"
or this:
months[stock.levels != "high"]
## [1] "March" "April" "May"
Either way, I get the answer I want.
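If you prefer, R’s %in% operator gives a third option: it checks each element for membership in a set of values. The sketch below uses the same assumed vectors and confirms that all three approaches agree. The names by.or, by.not and by.in are illustrative. One design note: the != "high" version only works because there happen to be exactly three stock levels. If the bookshop started reporting a hypothetical "medium" level, that version would silently include those months too. The other two versions would still ask specifically for "low" or "out".

```r
# Assumed data, as in the text
months <- month.name
stock.levels <- c("high", "high", "low", "out", "out", "high",
                  "high", "high", "high", "high", "high", "high")

# Three ways to ask for the months when stock was "low" or "out"
by.or  <- months[ stock.levels == "out" | stock.levels == "low" ]
by.not <- months[ stock.levels != "high" ]
by.in  <- months[ stock.levels %in% c("low", "out") ]

print(by.in)
print(identical(by.or, by.not) && identical(by.or, by.in))
```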
At this point, I hope you can see why logical indexing is such a useful thing. It’s a very basic, yet very powerful way to manipulate data. We’ll talk a lot more about how to manipulate data in Chapter 7, since it’s a critical skill for real world research that is often overlooked in introductory research methods classes (or at least, that’s been my experience). It does take a bit of practice to become completely comfortable using logical indexing, so it’s a good idea to play around with these sorts of commands. Try creating a few different variables of your own, and then ask yourself questions like “how do I get R to spit out all the elements that are [blah]”. Practice makes perfect, and it’s only by practicing logical indexing that you’ll perfect the art of yelling frustrated insults at your computer.