In this section I discuss R packages, since almost all of the functions you might want to use in R come in packages. A package is basically just a big collection of functions, data sets and other R objects that are all grouped together under a common name. Some packages are already installed when you put R on your computer, but the vast majority of them of R packages are out there on the internet, waiting for you to download, install and use them.

When I first started writing this book, Rstudio didn’t really exist as a viable option for using R, and as a consequence I wrote a very lengthy section that explained how to do package management using raw R commands. It’s not actually terribly hard to work with packages that way, but it’s clunky and unpleasant. Fortunately, we don’t have to do things that way anymore. In this section, I’ll describe how to work with packages using the Rstudio tools, because they’re so much simpler. Along the way, you’ll see that whenever you get Rstudio to do something (e.g., install a package), you’ll actually see the R commands that get created. I’ll explain them as we go, because I think that helps you understand what’s going on.

A package must be installed before it can be loaded.

A package must be loaded before it can be used.

This two step process might seem a little odd at first, but the designers of R had very good reasons to do it this way,43 and you get the hang of it pretty quickly.

## 4.2.1 The package panel in Rstudio

Right, lets get started. The first thing you need to do is look in the lower right hand panel in Rstudio. You’ll see a tab labelled “Packages”. Click on the tab, and you’ll see a list of packages that looks something like Figure 4.1. Every row in the panel corresponds to a different package, and every column is a useful piece of information about that package.44 Going from left to right, here’s what each column is telling you:

• The check box on the far left column indicates whether or not the package is loaded.
• The one word of text immediately to the right of the check box is the name of the package.
• The short passage of text next to the name is a brief description of the package.
• The number next to the description tells you what version of the package you have installed.
• The little x-mark next to the version number is a button that you can push to uninstall the package from your computer (you almost never need this).

That seems straightforward enough, so let’s try loading and unloading packades. For this example, I’ll use the foreign package. The foreign package is a collection of tools that are very handy when R needs to interact with files that are produced by other software packages (e.g., SPSS). It comes bundled with R, so it’s one of the ones that you have installed already, but it won’t be one of the ones loaded. Inside the foreign package is a function called read.spss(). It’s a handy little function that you can use to import an SPSS data file into R, so let’s pretend we want to use it. Currently, the foreign package isn’t loaded, so if I ask R to tell me if it knows about a function called read.spss() it tells me that there’s no such thing…

exists( "read.spss" )
## [1] FALSE

Now let’s load the package. In Rstudio, the process is dead simple: go to the package tab, find the entry for the foreign package, and check the box on the left hand side. The moment that you do this, you’ll see a command like this appear in the R console:

library("foreign", lib.loc="/Library/Frameworks/R.framework/Versions/3.0/Resources/library")

The lib.loc bit will look slightly different on Macs versus on Windows, because that part of the command is just Rstudio telling R where to look to find the installed packages. What I’ve shown you above is the Mac version. On a Windows machine, you’ll probably see something that looks like this:

library("foreign", lib.loc="C:/Program Files/R/R-3.0.2/library")

But actually it doesn’t matter much. The lib.loc bit is almost always unnecessary. Unless you’ve taken to installing packages in idiosyncratic places (which is something that you can do if you really want) R already knows where to look. So in the vast majority of cases, the command to load the foreign package is just this:

library("foreign")

Throughout this book, you’ll often see me typing in library() commands. You don’t actually have to type them in yourself: you can use the Rstudio package panel to do all your package loading for you. The only reason I include the library() commands sometimes is as a reminder to you to make sure that you have the relevant package loaded. Oh, and I suppose we should check to see if our attempt to load the package actually worked. Let’s see if R now knows about the existence of the read.spss() function…

exists( "read.spss" )
## [1] TRUE

Yep. All good.

Sometimes, especially after a long session of working with R, you find yourself wanting to get rid of some of those packages that you’ve loaded. The Rstudio package panel makes this exactly as easy as loading the package in the first place. Find the entry corresponding to the package you want to unload, and uncheck the box. When you do that for the foreign package, you’ll see this command appear on screen:

detach("package:foreign", unload=TRUE)

And the package is unloaded. We can verify this by seeing if the read.spss() function still exists():

exists( "read.spss" )
## [1] FALSE

Nope. Definitely gone.

## 4.2.4 A few extra comments

Sections 4.2.2 and 4.2.3 cover the main things you need to know about loading and unloading packages. However, there’s a couple of other details that I want to draw your attention to. A concrete example is the best way to illustrate. One of the other packages that you already have installed on your computer is the Matrix package, so let’s load that one and see what happens:

library( Matrix )

## Loading required package: lattice

This is slightly more complex than the output that we got last time, but it’s not too complicated. The Matrix package makes use of some of the tools in the lattice package, and R has kept track of this dependency. So when you try to load the Matrix package, R recognises that you’re also going to need to have the lattice package loaded too. As a consequence, both packages get loaded, and R prints out a helpful little note on screen to tell you that it’s done so.

R is pretty aggressive about enforcing these dependencies. Suppose, for example, I try to unload the lattice package while the Matrix package is still loaded. This is easy enough to try: all I have to do is uncheck the box next to “lattice” in the packages panel. But if I try this, here’s what happens:

detach("package:lattice", unload=TRUE)

## Error: package lattice' is required by Matrix' so will not be detached


R refuses to do it. This can be quite useful, since it stops you from accidentally removing something that you still need. So, if I want to remove both Matrix and lattice, I need to do it in the correct order

Something else you should be aware of. Sometimes you’ll attempt to load a package, and R will print out a message on screen telling you that something or other has been “masked”. This will be confusing to you if I don’t explain it now, and it actually ties very closely to the whole reason why R forces you to load packages separately from installing them. Here’s an example. Two of the package that I’ll refer to a lot in this book are called car and psych. The car package is short for “Companion to Applied Regression” (which is a really great book, I’ll add), and it has a lot of tools that I’m quite fond of. The car package was written by a guy called John Fox, who has written a lot of great statistical tools for social science applications. The psych package was written by William Revelle, and it has a lot of functions that are very useful for psychologists in particular, especially in regards to psychometric techniques. For the most part, car and psych are quite unrelated to each other. They do different things, so not surprisingly almost all of the function names are different. But… there’s one exception to that. The car package and the psych package both contain a function called logit().45 This creates a naming conflict. If I load both packages into R, an ambiguity is created. If the user types in logit(100), should R use the logit() function in the car package, or the one in the psych package? The answer is: R uses whichever package you loaded most recently, and it tells you this very explicitly. Here’s what happens when I load the car package, and then afterwards load the psych package:

library(car)
## Loading required package: carData

library(psych)
##
## Attaching package: 'psych'

## The following object is masked from 'package:car':
##
##   logit

The output here is telling you that the logit object (i.e., function) in the car package is no longer accessible to you. It’s been hidden (or “masked”) from you by the one in the psych package.46

One of the main selling points for R is that there are thousands of packages that have been written for it, and these are all available online. So whereabouts online are these packages to be found, and how do we download and install them? There is a big repository of packages called the “Comprehensive R Archive Network” (CRAN), and the easiest way of getting and installing a new package is from one of the many CRAN mirror sites. Conveniently for us, R provides a function called install.packages() that you can use to do this. Even more conveniently, the Rstudio team runs its own CRAN mirror and Rstudio has a clean interface that lets you install packages without having to learn how to use the install.packages() command47

Using the Rstudio tools is, again, dead simple. In the top left hand corner of the packages panel (Figure 4.1) you’ll see a button called “Install Packages”. If you click on that, it will bring up a window like the one shown in Figure 4.2.

There are a few different buttons and boxes you can play with. Ignore most of them. Just go to the line that says “Packages” and start typing the name of the package that you want. As you type, you’ll see a dropdown menu appear (Figure 4.3), listing names of packages that start with the letters that you’ve typed so far.

You can select from this list, or just keep typing. Either way, once you’ve got the package name that you want, click on the install button at the bottom of the window. When you do, you’ll see the following command appear in the R console:

install.packages("psych")

This is the R command that does all the work. R then goes off to the internet, has a conversation with CRAN, downloads some stuff, and installs it on your computer. You probably don’t care about all the details of R’s little adventure on the web, but the install.packages() function is rather chatty, so it reports a bunch of gibberish that you really aren’t all that interested in:

trying URL 'http://cran.rstudio.com/bin/macosx/contrib/3.0/psych_1.4.1.tgz'
Content type 'application/x-gzip' length 2737873 bytes (2.6 Mb)
opened URL
==================================================

/var/folders/cl/thhsyrz53g73q0w1kb5z3l_80000gn/T//RtmpmQ9VT3/downloaded_packages

Despite the long and tedious response, all thar really means is “I’ve installed the psych package”. I find it best to humour the talkative little automaton. I don’t actually read any of this garbage, I just politely say “thanks” and go back to whatever I was doing.

## 4.2.6 Updating R and R packages

Every now and then the authors of packages release updated versions. The updated versions often add new functionality, fix bugs, and so on. It’s generally a good idea to update your packages periodically. There’s an update.packages() function that you can use to do this, but it’s probably easier to stick with the Rstudio tool. In the packages panel, click on the “Update Packages” button. This will bring up a window that looks like the one shown in Figure 4.4. In this window, each row refers to a package that needs to be updated. You can to tell R which updates you want to install by checking the boxes on the left. If you’re feeling lazy and just want to update everything, click the “Select All” button, and then click the “Install Updates” button. R then prints out a lot of garbage on the screen, individually downloading and installing all the new packages. This might take a while to complete depending on how good your internet connection is. Go make a cup of coffee. Come back, and all will be well.

• psych. This package, written by William Revelle, includes a lot of tools that are of particular use to psychologists. In particular, there’s several functions that are particularly convenient for producing analyses or summaries that are very common in psych, but less common in other disciplines.
• car. This is the Companion to Applied Regression package, which accompanies the excellent book of the same name by (Fox and Weisberg 2011). It provides a lot of very powerful tools, only some of which we’ll touch in this book.
Besides these three, there are a number of packages that I use in a more limited fashion: gplots, sciplot, foreign, effects, R.matlab, gdata, lmtest, and probably one or two others that I’ve missed. There are also a number of packages that I refer to but don’t actually use in this book, such as reshape, compute.es, HistData and multcomp among others. Finally, there are a number of packages that provide more advanced tools that I hope to talk about in future versions of the book, such as sem, ez, nlme and lme4. In any case, whenever I’m using a function that isn’t in the core packages, I’ll make sure to note this in the text.