One of the most important things to be able to do in R (or any programming language, for that matter) is to store information in variables. Variables in R aren’t exactly the same thing as the variables we talked about in the last chapter on research methods, but they are similar. At a conceptual level you can think of a variable as a label for a certain piece of information, or even several different pieces of information. When doing statistical analysis in R, all of your data (the variables you measured in your study) will be stored as variables in R, but as we’ll see later in the book, you’ll find that you end up creating variables for other things too. However, before we delve into all the messy details of data sets and statistical analysis, let’s look at the very basics of how we create variables and work with them.
Variable assignment using <- and ->
Since we’ve been working with numbers so far, let’s start by creating variables to store our numbers. And since most people like concrete examples, let’s invent one. Suppose I’m trying to calculate how much money I’m going to make from this book. There are several different numbers I might want to store. Firstly, I need to figure out how many copies I’ll sell. This isn’t exactly Harry Potter, so let’s assume I’m only going to sell one copy per student in my class. That’s 350 sales, so let’s create a variable called sales. What I want to do is assign a value to my variable sales, and that value should be 350. We do this by using the assignment operator, which is <-. Here’s how we do it:
sales <- 350
When you hit enter, R doesn’t print out any output.23 It just gives you another command prompt. However, behind the scenes R has created a variable called sales and given it a value of 350. You can check that this has happened by asking R to print the variable on screen. And the simplest way to do that is to type the name of the variable and hit enter.24
sales
## [1] 350
So that’s nice to know. Anytime you can’t remember what R has got stored in a particular variable, you can just type the name of the variable and hit enter.
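At the console, typing a bare name asks R to print the value for you, and for a simple number this looks the same as calling the print() function explicitly. The minimal sketch below shows both, plus a common idiom: wrapping an assignment in parentheses makes R store the value and display it in one step. It reuses the sales example from above and only base R.

```r
sales <- 350     # plain assignment: nothing is printed
print(sales)     # explicit equivalent of typing `sales` at the prompt
# [1] 350
(sales <- 350)   # parentheses: assign AND print in one step
# [1] 350
```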
Okay, so now we know how to assign variables. Actually, there’s a bit more you should know. Firstly, one of the curious features of R is that there are several different ways of making assignments. In addition to the <- operator, we can also use -> and =, and it’s pretty important to understand the differences between them.25 Let’s start by considering ->, since that’s the easy one (we’ll discuss the use of = in Section 3.5.1). As you might expect from just looking at the symbol, it’s almost identical to <-. It’s just that the arrow (i.e., the assignment) goes from left to right. So if I wanted to define my sales variable using ->, I would write it like this:
350 -> sales
This has the same effect, and it still means that I’m only going to sell 350 copies. Sigh. Apart from this superficial difference, <- and -> are identical. In fact, as far as R is concerned, they’re actually the same operator, just in a “left form” and a “right form.”26
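A quick way to convince yourself that the two arrows are the same operator is sketched below. The names a and b are purely illustrative. The first part shows that both forms store the same value; the second uses quote(), which returns an expression without evaluating it, to show that R’s parser rewrites the right form as the left form.

```r
a <- 350         # left form
350 -> b         # right form
identical(a, b)  # both variables hold exactly the same value
# [1] TRUE

# quote() shows how R parsed the expression, without running it:
quote(350 -> sales)
# sales <- 350
```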
Doing calculations using variables
Okay, let’s get back to my original story. In my quest to become rich, I’ve written this textbook. To figure out how good a strategy this is, I’ve started creating some variables in R. In addition to defining a sales variable that counts the number of copies I’m going to sell, I can also create a variable called royalty, indicating how much money I get per copy. Let’s say that my royalties are about $7 per book:
sales <- 350
royalty <- 7
The nice thing about variables (in fact, the whole point of having variables) is that we can do anything with a variable that we ought to be able to do with the information that it stores. That is, since R allows me to multiply
350 * 7
## [1] 2450
it also allows me to multiply
sales * royalty
## [1] 2450
As far as R is concerned, the sales * royalty command is the same as the 350 * 7 command. Not surprisingly, I can assign the output of this calculation to a new variable, which I’ll call revenue. And when we do this, the new variable revenue gets the value 2450. So let’s do that, and then get R to print out the value of revenue so that we can verify that it’s done what we asked:
revenue <- sales * royalty
revenue
## [1] 2450
That’s fairly straightforward. A slightly more subtle thing we can do is reassign the value of my variable, based on its current value. For instance, suppose that one of my students (no doubt under the influence of psychotropic drugs) loves the book so much that he or she donates me an extra $550. The simplest way to capture this is by a command like this:
revenue <- revenue + 550
revenue
## [1] 3000
In this calculation, R has taken the old value of revenue (i.e., 2450) and added 550 to that value, producing a value of 3000. This new value is assigned to the revenue variable, overwriting its previous value. In any case, we now know that I’m expecting to make $3000 off this. Pretty sweet, I thinks to myself. Or at least, that’s what I thinks until I do a few more calculations and work out what the implied hourly wage I’m making off this looks like.
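One consequence of this overwriting behaviour is worth spelling out: a variable stores the value that was computed at the moment of assignment, not the formula that produced it. The sketch below illustrates this; the figure of 400 copies is a hypothetical number chosen only for the example. Changing sales afterwards does not update revenue until the calculation is run again.

```r
sales <- 350
royalty <- 7
revenue <- sales * royalty   # computed now, from the current values: 2450
sales <- 400                 # hypothetical: I somehow sell 50 more copies
revenue                      # unchanged, because revenue holds a value, not a formula
# [1] 2450
revenue <- sales * royalty   # rerun the calculation to pick up the new sales
revenue
# [1] 2800
```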
Rules and conventions for naming variables
In the examples that we’ve seen so far, my variable names (sales and revenue) have just been English-language words written using lowercase letters. However, R allows a lot more flexibility when it comes to naming your variables, as the following list of rules27 illustrates:
- Variable names can only use the upper case alphabetic characters A-Z as well as the lower case characters a-z. You can also include numeric characters 0-9 in the variable name, as well as the period . or underscore _ character. In other words, you can use SaL.e_s as a variable name (though I can’t think why you would want to), but you can’t use Sales?, because ? is not one of the allowed characters.
- Variable names cannot include spaces: therefore my sales is not a valid name, but my.sales is.
- Variable names are case sensitive: that is, Sales and sales are different variable names.
- Variable names must start with a letter, or with a period that is not followed by a number. You can’t use something like _sales or 1sales as a variable name. You can use .sales as a variable name if you want, but it’s not usually a good idea: by convention, variables starting with a . are used for special purposes, so you should avoid doing so.
- Variable names cannot be one of the reserved keywords. These are special names that R needs to keep “safe” from us mere users, so you can’t use them as the names of variables. The keywords are: if, else, repeat, while, function, for, in, next, break, TRUE, FALSE, NULL, Inf, NaN, NA, NA_integer_, NA_real_, NA_complex_, and finally, NA_character_. Don’t feel especially obliged to memorise these: if you make a mistake and try to use one of the keywords as a variable name, R will complain about it like the whiny little automaton it is.
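If you want to check a candidate name against these rules, base R’s make.names() function can help. It converts a character string into a syntactically valid name: invalid characters become periods, an X is prepended when the name starts badly, and reserved words get a period appended. So if make.names() hands a string back unchanged, that string was already a valid name. The sketch below tests the example names from the list above; is_valid_name() is a hypothetical helper written for illustration, not a built-in R function.

```r
candidates <- c("SaL.e_s", "Sales?", "my sales", "my.sales", "1sales", ".sales", "if")

# make.names() returns a syntactically valid version of each string:
# "SaL.e_s" "Sales." "my.sales" "my.sales" "X1sales" ".sales" "if."
make.names(candidates)

# Hypothetical helper: a name is valid if make.names() leaves it unchanged
is_valid_name <- function(x) make.names(x) == x
is_valid_name(candidates)
# TRUE FALSE FALSE TRUE FALSE TRUE FALSE

# Case sensitivity: these are two separate variables
Sales <- 1
sales <- 2
Sales == sales
# [1] FALSE
```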
In addition to those rules that R enforces, there are some informal conventions that people tend to follow when naming variables. One of them you’ve already seen: i.e., don’t use variable names that start with a period. But there are several others. You aren’t obliged to follow these conventions, and there are many situations in which it’s advisable to ignore them, but it’s generally a good idea to follow them when you can:
- Use informative variable names. As a general rule, using meaningful names like revenue is preferred over arbitrary ones like variable2. Otherwise it’s very hard to remember what the contents of different variables are, and it becomes hard to understand what your commands actually do.
- Use short variable names. Typing is a pain and no-one likes doing it. So we much prefer to use a name like sales over a name like sales.for.this.book.that.you.are.reading. Obviously there’s a bit of a tension between using informative names (which tend to be long) and using short names (which tend to be meaningless), so use a bit of common sense when trading off these two conventions.
- Use one of the conventional naming styles for multi-word variable names. Suppose I want to name a variable that stores “my new salary”. Obviously I can’t include spaces in the variable name, so how should I do this? There are three different conventions that you sometimes see R users employing. Firstly, you can separate the words using periods, which would give you my.new.salary as the variable name. Alternatively, you could separate words using underscores, as in my_new_salary. Finally, you could use capital letters at the beginning of each word (except the first one), which gives you myNewSalary as the variable name. I don’t think there’s any strong reason to prefer one over the other,28 but it’s important to be consistent.
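One reason consistency matters: R treats the three styles as completely unrelated names, so mixing them in one script can leave you with near-identical variables holding different values. The small sketch below makes this concrete; the values 1, 2 and 3 are arbitrary placeholders, not salaries.

```r
my.new.salary <- 1   # words separated by periods
my_new_salary <- 2   # words separated by underscores
myNewSalary   <- 3   # capital letter on each word after the first

# Three separate variables, despite the similar-looking names
c(my.new.salary, my_new_salary, myNewSalary)
# [1] 1 2 3
```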