7.10: Coercing Data from One Class to Another
Sometimes you want to change the variable class. This can happen for all sorts of reasons. Sometimes when you import data from files, it can come to you in the wrong format: numbers sometimes get imported as text, dates usually get imported as text, and many other possibilities besides. Regardless of how you’ve ended up in this situation, there’s a very good chance that sometimes you’ll want to convert a variable from one class into another one. Or, to use the correct term, you want to coerce the variable from one class into another. Coercion is a little tricky, and so I’ll only discuss the very basics here, using a few simple examples.
Firstly, let’s suppose we have a variable
x
that is
supposed
to be representing a number, but the data file that you’ve been given has encoded it as text. Let’s imagine that the variable is something like this:
x <- "100" # the variable
class(x) # what class is it?
## [1] "character"
Obviously, if I want to do calculations using
x
in its current state, R is going to get very annoyed at me. It thinks that
x
is text, so it’s not going to allow me to try to do mathematics using it! Obviously, we need to coerce
x
from character to numeric. We can do that in a straightforward way by using the
as.numeric()
function:
x <- as.numeric(x) # coerce the variable
class(x) # what class is it?
## [1] "numeric"
x + 1 # hey, addition works!
## [1] 101
Not surprisingly, we can also convert it back again if we need to. The function that we use to do this is the
as.character()
function:
x <- as.character(x) # coerce back to text
class(x) # check the class:
## [1] "character"
However, there’s some fairly obvious limitations: you can’t coerce the string
"hello world"
into a number because, well, there’s isn’t a number that corresponds to it. Or, at least, you can’t do anything useful:
as.numeric( "hello world" ) # this isn't going to work.
## Warning: NAs introduced by coercion
## [1] NA
In this case R doesn’t give you an error message; it just gives you a warning, and then says that the data is missing (see Section 4.6.1 for the interpretation of
NA
).
That gives you a feel for how to change between numeric and character data. What about logical data? To cover this briefly, coercing text to logical data is pretty intuitive: you use the
as.logical()
function, and the character strings
"T"
,
"TRUE"
,
"True"
and
"true"
all convert to the logical value of
TRUE
. Similarly
"F"
,
"FALSE"
,
"False"
, and
"false"
all become
FALSE
. All other strings convert to
NA
. When you go back the other way using
as.character()
,
TRUE
converts to
"TRUE"
and
FALSE
converts to
"FALSE"
. Converting numbers to logicals – again using
as.logical()
– is straightforward. Following the convention in the study of logic, the number
0
converts to
FALSE
. Everything else is
TRUE
. Going back using
as.numeric()
,
FALSE
converts to
0
and
TRUE
converts to
1
.