# 5.2: Creating or Modifying Variables Using mutate()

Often we will want to either create a new variable based on an existing variable, or modify the value of an existing variable. Within the tidyverse, we do this using a function called mutate(). Let’s start with a toy example by creating a data frame containing a single variable.

library(tidyverse)  # provides mutate(), glimpse(), and %>%

toy_df <- data.frame(x = c(1,2,3,4))
glimpse(toy_df)
## Observations: 4
## Variables: 1
## $ x <dbl> 1, 2, 3, 4

Let’s say that we wanted to create a new variable called y that would contain the value of x multiplied by 10. We could do this using mutate() and then assign the result back to the same data frame:

toy_df <- toy_df %>%
  # create a new variable called y that contains x*10
  mutate(y = x*10)
glimpse(toy_df)
## Observations: 4
## Variables: 2
## $ x <dbl> 1, 2, 3, 4
## $ y <dbl> 10, 20, 30, 40

We could also overwrite a variable with a new value:

toy_df2 <- toy_df %>%
  # overwrite the existing variable y with y + 1
  mutate(y = y + 1)
glimpse(toy_df2)
## Observations: 4
## Variables: 2
## $ x <dbl> 1, 2, 3, 4
## $ y <dbl> 11, 21, 31, 41
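As a further sketch, a single mutate() call can create or modify several variables at once, and a later expression can use a variable defined earlier in the same call. The data frame name sketch_df and the variable z below are illustrative only and are not part of the example above.

```r
library(dplyr)

# illustrative data frame with the same values as toy_df
sketch_df <- data.frame(x = c(1, 2, 3, 4))

sketch_df <- sketch_df %>%
  mutate(
    y = x * 10,  # new variable computed from x
    z = y + x    # can refer to y, which was created on the line above
  )

glimpse(sketch_df)
```

Because mutate() returns a new data frame rather than changing its input, the result has to be assigned (here back to sketch_df) for the new variables to be kept.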

We will use mutate() often, so it’s an important function to understand.

Here we can use it with our example data frame to create a new variable that is the sum of several other variables.

myDataFrame <-
  myDataFrame %>%
  mutate(total = x + y + z)

kable(myDataFrame)

| n      | x | y | z  | total |
|:-------|--:|--:|---:|------:|
| russ   | 1 | 4 | 7  | 12    |
| lucy   | 2 | 5 | 8  | 15    |
| jaclyn | 3 | 6 | 9  | 18    |
| tyler  | 4 | 7 | 10 | 21    |

mutate() adds a new variable to a data frame (or overwrites an existing one), computing it from the existing variables. In this case, it creates a variable called total that is the sum of the existing variables x, y, and z, computed separately for each row.
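The construction of myDataFrame is not shown in this section, so the sketch below rebuilds it from the printed table. That reconstruction is an assumption, and it uses the hypothetical name exampleDF so as not to overwrite the real data frame. It then checks that total is the row-by-row sum of x, y, and z.

```r
library(dplyr)

# assumed reconstruction of the example data, copied from the printed table
exampleDF <- data.frame(
  n = c("russ", "lucy", "jaclyn", "tyler"),
  x = 1:4,
  y = 4:7,
  z = 7:10
)

exampleDF <- exampleDF %>%
  # arithmetic inside mutate() is vectorized, so each row gets its own sum
  mutate(total = x + y + z)

# compare against base R's rowSums() over the same three columns
all(exampleDF$total == rowSums(exampleDF[, c("x", "y", "z")]))
```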

# 5.2.1 Remove a column using the select() function

Placing a minus sign in front of a variable name within select() removes that variable and keeps all of the others.

myDataFrame <-
  myDataFrame %>%
  dplyr::select(-total)

kable(myDataFrame)

| n      | x | y | z  |
|:-------|--:|--:|---:|
| russ   | 1 | 4 | 7  |
| lucy   | 2 | 5 | 8  |
| jaclyn | 3 | 6 | 9  |
| tyler  | 4 | 7 | 10 |
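
The same idea extends to dropping several variables at once, and the result can equally be written by naming only the variables to keep. The following is a minimal sketch, again using a hypothetical data frame exampleDF rebuilt from the printed table (an assumption, since the original construction of myDataFrame is not shown here).

```r
library(dplyr)

# assumed reconstruction of the example data, including the total column
exampleDF <- data.frame(
  n = c("russ", "lucy", "jaclyn", "tyler"),
  x = 1:4, y = 4:7, z = 7:10
) %>%
  mutate(total = x + y + z)

# drop a single variable
no_total <- exampleDF %>% dplyr::select(-total)

# drop several variables at once by wrapping them in c()
only_n_x <- exampleDF %>% dplyr::select(-c(y, z, total))

# equivalent result: name the variables to keep instead
only_n_x2 <- exampleDF %>% dplyr::select(n, x)

names(no_total)
names(only_n_x)
```

Writing dplyr::select() rather than select() ensures that the dplyr version is called even when another loaded package, such as MASS, defines its own select() function.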