3.4: Storing a Number As a Variable

Last updated
Save as PDF

Page ID: 3954

$ \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } $ $ \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} $$\newcommand{\id}{\mathrm{id}}$ $ \newcommand{\Span}{\mathrm{span}}$ $ \newcommand{\kernel}{\mathrm{null}\,}$ $ \newcommand{\range}{\mathrm{range}\,}$ $ \newcommand{\RealPart}{\mathrm{Re}}$ $ \newcommand{\ImaginaryPart}{\mathrm{Im}}$ $ \newcommand{\Argument}{\mathrm{Arg}}$ $ \newcommand{\norm}[1]{\| #1 \|}$ $ \newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $ \newcommand{\Span}{\mathrm{span}}$ $\newcommand{\id}{\mathrm{id}}$ $ \newcommand{\Span}{\mathrm{span}}$ $ \newcommand{\kernel}{\mathrm{null}\,}$ $ \newcommand{\range}{\mathrm{range}\,}$ $ \newcommand{\RealPart}{\mathrm{Re}}$ $ \newcommand{\ImaginaryPart}{\mathrm{Im}}$ $ \newcommand{\Argument}{\mathrm{Arg}}$ $ \newcommand{\norm}[1]{\| #1 \|}$ $ \newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $ \newcommand{\Span}{\mathrm{span}}$$\newcommand{\AA}{\unicode[.8,0]{x212B}}$

One of the most important things to be able to do in R (or any programming language, for that matter) is to store information in variables. Variables in R aren’t exactly the same thing as the variables we talked about in the last chapter on research methods, but they are similar. At a conceptual level you can think of a variable as label for a certain piece of information, or even several different pieces of information. When doing statistical analysis in R all of your data (the variables you measured in your study) will be stored as variables in R, but as well see later in the book you’ll find that you end up creating variables for other things too. However, before we delve into all the messy details of data sets and statistical analysis, let’s look at the very basics for how we create variables and work with them.

Variable assignment using `<->`

Since we’ve been working with numbers so far, let’s start by creating variables to store our numbers. And since most people like concrete examples, let’s invent one. Suppose I’m trying to calculate how much money I’m going to make from this book. There’s several different numbers I might want to store. Firstly, I need to figure out how many copies I’ll sell. This isn’t exactly Harry Potter, so let’s assume I’m only going to sell one copy per student in my class. That’s 350 sales, so let’s create a variable called sales. What I want to do is assign a value to my variablesales, and that value should be 350. We do this by using the assignment operator, which is <-. Here’s how we do it:

sales <- 350

When you hit enter, R doesn’t print out any output.²³ It just gives you another command prompt. However, behind the scenes R has created a variable called sales and given it a value of 350. You can check that this has happened by asking R to print the variable on screen. And the simplest way to do that is to type the name of the variable and hit enter²⁴.

sales

## [1] 350

So that’s nice to know. Anytime you can’t remember what R has got stored in a particular variable, you can just type the name of the variable and hit enter.

Okay, so now we know how to assign variables. Actually, there’s a bit more you should know. Firstly, one of the curious features of R is that there are several different ways of making assignments. In addition to the <- operator, we can also use -> and =, and it’s pretty important to understand the differences between them.²⁵ Let’s start by considering ->, since that’s the easy one (we’ll discuss the use of = in Section 3.5.1. As you might expect from just looking at the symbol, it’s almost identical to <-. It’s just that the arrow (i.e., the assignment) goes from left to right. So if I wanted to define my sales variable using ->, I would write it like this:

350 -> sales

This has the same effect: and it still means that I’m only going to sell 350 copies. Sigh. Apart from this superficial difference, <- and -> are identical. In fact, as far as R is concerned, they’re actually the same operator, just in a “left form” and a “right form.”²⁶

Doing calculations using variables

Okay, let’s get back to my original story. In my quest to become rich, I’ve written this textbook. To figure out how good a strategy is, I’ve started creating some variables in R. In addition to defining a sales variable that counts the number of copies I’m going to sell, I can also create a variable called royalty, indicating how much money I get per copy. Let’s say that my royalties are about $7 per book:

sales <- 350
royalty <- 7

The nice thing about variables (in fact, the whole point of having variables) is that we can do anything with a variable that we ought to be able to do with the information that it stores. That is, since R allows me to multiply 350 by 7

350 * 7

## [1] 2450

it also allows me to multiply sales by royalty

sales * royalty

## [1] 2450

As far as R is concerned, the sales * royalty command is the same as the 350 * 7 command. Not surprisingly, I can assign the output of this calculation to a new variable, which I’ll call revenue. And when we do this, the new variable revenue gets the value 2450. So let’s do that, and then get R to print out the value of revenue so that we can verify that it’s done what we asked:

revenue <- sales * royalty
revenue

## [1] 2450

That’s fairly straightforward. A slightly more subtle thing we can do is reassign the value of my variable, based on its current value. For instance, suppose that one of my students (no doubt under the influence of psychotropic drugs) loves the book so much that he or she donates me an extra $550. The simplest way to capture this is by a command like this:

revenue <- revenue + 550
revenue

## [1] 3000

In this calculation, R has taken the old value of revenue (i.e., 2450) and added 550 to that value, producing a value of 3000. This new value is assigned to the revenue variable, overwriting its previous value. In any case, we now know that I’m expecting to make $3000 off this. Pretty sweet, I thinks to myself. Or at least, that’s what I thinks until I do a few more calculation and work out what the implied hourly wage I’m making off this looks like.

Rules and conventions for naming variables

In the examples that we’ve seen so far, my variable names (sales and revenue) have just been English-language words written using lowercase letters. However, R allows a lot more flexibility when it comes to naming your variables, as the following list of rules²⁷ illustrates:

Variable names can only use the upper case alphabetic characters A-Z as well as the lower case characters a-z. You can also include numeric characters 0-9 in the variable name, as well as the period . or underscore _ character. In other words, you can use SaL.e_s as a variable name (though I can’t think why you would want to), but you can’t use Sales?.
Variable names cannot include spaces: therefore my sales is not a valid name, but my.sales is.
Variable names are case sensitive: that is, Sales and sales are different variable names.
Variable names must start with a letter or a period. You can’t use something like _sales or 1sales as a variable name. You can use .sales as a variable name if you want, but it’s not usually a good idea. By convention, variables starting with a . are used for special purposes, so you should avoid doing so.
Variable names cannot be one of the reserved keywords. These are special names that R needs to keep “safe” from us mere users, so you can’t use them as the names of variables. The keywords are: if, else, repeat, while, function, for, in, next, break, TRUE, FALSE, NULL, Inf, NaN, NA, Rtextverb#NA_integer_#, Rtextverb#NA_real_#, NA_complex_, and finally, NA_character_. Don’t feel especially obliged to memorise these: if you make a mistake and try to use one of the keywords as a variable name, R will complain about it like the whiny little automaton it is.

In addition to those rules that R enforces, there are some informal conventions that people tend to follow when naming variables. One of them you’ve already seen: i.e., don’t use variables that start with a period. But there are several others. You aren’t obliged to follow these conventions, and there are many situations in which it’s advisable to ignore them, but it’s generally a good idea to follow them when you can:

Use informative variable names. As a general rule, using meaningful names like sales and revenue is preferred over arbitrary ones like variable1 and variable2. Otherwise it’s very hard to remember what the contents of different variables are, and it becomes hard to understand what your commands actually do.
Use short variable names. Typing is a pain and no-one likes doing it. So we much prefer to use a name like sales over a name like sales.for.this.book.that.you.are.reading. Obviously there’s a bit of a tension between using informative names (which tend to be long) and using short names (which tend to be meaningless), so use a bit of common sense when trading off these two conventions.
Use one of the conventional naming styles for multi-word variable names. Suppose I want to name a variable that stores “my new salary”. Obviously I can’t include spaces in the variable name, so how should I do this? There are three different conventions that you sometimes see R users employing. Firstly, you can separate the words using periods, which would give you my.new.salary as the variable name. Alternatively, you could separate words using underscores, as in my_new_salary. Finally, you could use capital letters at the beginning of each word (except the first one), which gives you myNewSalary as the variable name. I don’t think there’s any strong reason to prefer one over the other,²⁸ but it’s important to be consistent.

Variable assignment using <->

Doing calculations using variables

Rules and conventions for naming variables

Variable assignment using `<->`