2.5: Accessing a Data Frame
We access the individual elements in a data frame using square brackets to identify a specific cell. For instance, the following accesses the data in the cell in row 15, column 12:
> int92.dat[15,12]
[1] 180
We can also access cells by name by putting quotes around the name:
> int92.dat["71","perf"]
[1] 105.1
This expression returns the data in the row labeled 71 and the column labeled perf. Note that this is not row 71, but rather the row that contains the data for the processor whose name is 71.
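The difference between a row's position and its label can be seen in a minimal sketch. The data frame toy.dat and its values are hypothetical stand-ins, not part of int92.dat:

```r
# Hypothetical toy data frame; the row names stand in for processor labels.
toy.dat <- data.frame(
  perf  = c(36.7, 105.1, 366.857),
  clock = c(100, 125, 166),
  row.names = c("12", "71", "3")
)

by.position <- toy.dat[2, 1]          # row 2, column 1
by.name     <- toy.dat["71", "perf"]  # row labeled "71", column labeled "perf"

# The label "71" happens to sit in the second row, so both forms reach the same cell.
identical(by.position, by.name)
```

Here the quoted "71" is matched against the row names, so it would still find the right processor if the rows were reordered.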
We can access an entire column by leaving the first parameter in the square brackets empty. For instance, the following prints the value in every row for the column labeled clock:
> int92.dat[,"clock"]
[1] 100 125 166 175 190 ...
Similarly, this expression prints the values in all of the columns for row 36:
> int92.dat[36,]
nperf perf clock threads cores ...
36 13.07378 79.86399 80 1 1 ...
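The two outputs above have different shapes: a single column comes back as a plain vector, while a single row remains a data frame with its column names. A small sketch with a hypothetical data frame (toy.dat is an assumption made for illustration) makes this visible:

```r
# Hypothetical toy data frame used only for illustration.
toy.dat <- data.frame(perf = c(36.7, 105.1, 366.857), clock = c(100, 125, 166))

clock.col  <- toy.dat[, "clock"]  # empty first index: every row of one column
second.row <- toy.dat[2, ]        # empty second index: every column of one row

is.numeric(clock.col)       # the column is returned as a numeric vector
is.data.frame(second.row)   # the row is still a data frame, so names are kept
```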
The functions nrow() and ncol() return the number of rows and columns, respectively, in the data frame:
> nrow(int92.dat)
[1] 78
> ncol(int92.dat)
[1] 16
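One common use of these two functions is to index relative to the end of a data frame without hard-coding its size. The following sketch assumes a hypothetical data frame named toy.dat:

```r
# Hypothetical toy data frame used only for illustration.
toy.dat <- data.frame(perf = c(36.7, 105.1, 366.857), clock = c(100, 125, 166))

last.row  <- toy.dat[nrow(toy.dat), ]               # final row, whatever the row count
last.cell <- toy.dat[nrow(toy.dat), ncol(toy.dat)]  # bottom-right cell
```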
Because R functions can typically operate on a vector of any length, we can use built-in functions to quickly compute some useful results. For example, the following expressions compute the minimum, maximum, mean, and standard deviation of the perf column in the int92.dat data frame:
> min(int92.dat[,"perf"])
[1] 36.7
> max(int92.dat[,"perf"])
[1] 366.857
> mean(int92.dat[,"perf"])
[1] 124.2859
> sd(int92.dat[,"perf"])
[1] 78.0974
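Note that sd() computes the sample standard deviation, which divides by n - 1 rather than n. The sketch below checks this by hand on a short hypothetical vector (the values in vals are made up for illustration and are not the int92.dat column):

```r
# Hypothetical values chosen only to illustrate the calculation.
vals <- c(36.7, 105.1, 366.857)
n <- length(vals)

# Sample standard deviation written out explicitly, with the n - 1 denominator.
manual.sd <- sqrt(sum((vals - mean(vals))^2) / (n - 1))

# Compare with the built-in function, allowing for floating-point rounding.
all.equal(manual.sd, sd(vals))
```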
This square-bracket notation can become cumbersome when you do a substantial amount of interactive computation within the R environment. R provides an alternative notation using the $ symbol to more easily access a column. Repeating the previous example using this notation:
> min(int92.dat$perf)
[1] 36.7
> max(int92.dat$perf)
[1] 366.857
> mean(int92.dat$perf)
[1] 124.2859
> sd(int92.dat$perf)
[1] 78.0974
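That the two notations return the same column can be checked directly. This is a minimal sketch on a hypothetical data frame, toy.dat, rather than on int92.dat itself:

```r
# Hypothetical toy data frame used only for illustration.
toy.dat <- data.frame(perf = c(36.7, 105.1, 366.857), clock = c(100, 125, 166))

bracket.col <- toy.dat[, "perf"]  # square-bracket notation
dollar.col  <- toy.dat$perf       # $ notation

# Both forms yield the same numeric vector.
identical(bracket.col, dollar.col)
```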
This notation says to use the data in the column named perf from the data frame named int92.dat. We can make yet a further simplification using the attach function. This function adds the data frame to R's search path, so that its columns can be referred to directly by name, thereby eliminating the need to use the potentially awkward $ or square-bracket indexing notation. The following example shows how this works:
> attach(int92.dat)
> min(perf)
[1] 36.7
> max(perf)
[1] 366.857
> mean(perf)
[1] 124.2859
> sd(perf)
[1] 78.0974
To change to a different data frame within your local workspace, you must first detach the current data frame:
> detach(int92.dat)
> attach(fp00.dat)
> min(perf)
[1] 87.54153
> max(perf)
[1] 3369
> mean(perf)
[1] 1217.282
> sd(perf)
[1] 787.4139
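The attach and detach sequence can be tried on small data first. In this sketch, toy.int and toy.fp are hypothetical stand-ins for int92.dat and fp00.dat, and it assumes a fresh session with no other object named perf in the workspace:

```r
# Hypothetical stand-ins for two data frames that share a column name.
toy.int <- data.frame(perf = c(36.7, 105.1, 366.857))
toy.fp  <- data.frame(perf = c(87.5, 1217.3, 3369))

attach(toy.int)
max.int <- max(perf)   # perf resolves to toy.int$perf
detach(toy.int)

attach(toy.fp)
max.fp <- max(perf)    # perf now resolves to toy.fp$perf
detach(toy.fp)

# With nothing attached, the bare name perf no longer resolves
# (assuming no other object called perf exists in the session).
perf.visible <- exists("perf")
```

Detaching before attaching the next data frame keeps it unambiguous which perf column a bare name refers to.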
Now that we have the necessary data available in the R environment, and some understanding of how to access and manipulate this data, we are ready to generate our first regression model.