Statistics LibreTexts

5.2: Creating or Modifying Variables Using Mutate()

  Page ID: 8727
    Often we will want to either create a new variable based on an existing variable, or modify the value of an existing variable. Within the tidyverse, we do this using a function called mutate(). Let’s start with a toy example by creating a data frame containing a single variable.

    library(tidyverse)  # loads dplyr, which provides mutate(), glimpse(), and the %>% pipe
    
    toy_df <- data.frame(x = c(1, 2, 3, 4))
    glimpse(toy_df)
    ## Observations: 4
    ## Variables: 1
    ## $ x <dbl> 1, 2, 3, 4

    Let’s say that we wanted to create a new variable called y that would contain the value of x multiplied by 10. We could do this using mutate() and then assign the result back to the same data frame:

    toy_df <- toy_df %>%
      # create a new variable called y that contains x*10
      mutate(y = x*10)
    glimpse(toy_df)
    ## Observations: 4
    ## Variables: 2
    ## $ x <dbl> 1, 2, 3, 4
    ## $ y <dbl> 10, 20, 30, 40
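mutate() can also create several variables in a single call, with the arguments separated by commas. A later argument can refer to a variable created earlier in the same call. The following is a minimal sketch that assumes dplyr is installed. The names multi_df and w are hypothetical and used only for illustration.

```r
library(dplyr)

# hypothetical data frame, so the toy_df above is left untouched
multi_df <- data.frame(x = c(1, 2, 3, 4))

multi_df <- multi_df %>%
  # y is created first, so w can be computed from y within the same call
  mutate(y = x * 10,
         w = y + 1)

glimpse(multi_df)
```

Because the arguments are evaluated in order, reversing them (putting w before y) would fail, since y would not yet exist when w is computed.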

    We could also overwrite a variable with a new value:

    toy_df2 <- toy_df %>%
      # overwrite the existing variable y with y + 1
      mutate(y = y + 1)
    glimpse(toy_df2)
    ## Observations: 4
    ## Variables: 2
    ## $ x <dbl> 1, 2, 3, 4
    ## $ y <dbl> 11, 21, 31, 41
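Note that mutate() does not change a data frame in place. It returns a modified copy, and the original changes only if we assign the result back to it. Because the result above was assigned to toy_df2, toy_df still holds the old values of y. The sketch below rebuilds both data frames to check this. It assumes dplyr is installed.

```r
library(dplyr)

toy_df <- data.frame(x = c(1, 2, 3, 4)) %>%
  mutate(y = x * 10)

# the result of mutate() is stored in a new object; toy_df itself is not modified
toy_df2 <- toy_df %>%
  mutate(y = y + 1)

toy_df$y   # 10 20 30 40
toy_df2$y  # 11 21 31 41
```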

    We will use mutate() often, so it’s an important function to understand.

    Here we use it with our example data frame, myDataFrame, to create a new variable that is the sum of several other variables.

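    library(knitr)  # provides kable()
    
    myDataFrame <- 
      myDataFrame %>%
      # add a column called total that is the sum of x, y, and z in each row
      mutate(total = x + y + z)
    
    kable(myDataFrame)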
    n       x  y   z  total
    russ    1  4   7     12
    lucy    2  5   8     15
    jaclyn  3  6   9     18
    tyler   4  7  10     21

    mutate() creates new variables in a data frame (or, as shown earlier, modifies existing ones) using the values of the existing variables. In this case, it creates a variable called total that is the sum of the existing variables x, y, and z.
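For comparison, the same column could be added in base R by assigning directly to a new column with $. The following is a minimal sketch that assumes dplyr is installed. example_df is a hypothetical copy of the example data, rebuilt from the values in the table above. A separate name is used so that myDataFrame is not overwritten.

```r
library(dplyr)

# hypothetical copy of the example data, rebuilt from the values in the table
example_df <- data.frame(
  n = c("russ", "lucy", "jaclyn", "tyler"),
  x = c(1, 2, 3, 4),
  y = c(4, 5, 6, 7),
  z = c(7, 8, 9, 10)
)

# tidyverse version: mutate() returns a copy with the new column added
with_mutate <- example_df %>%
  mutate(total = x + y + z)

# base R version: assign the row-wise sum to a new column with $
with_base <- example_df
with_base$total <- with_base$x + with_base$y + with_base$z

with_mutate$total                              # 12 15 18 21
identical(with_mutate$total, with_base$total)  # TRUE
```

Both approaches give the same values. mutate() has two advantages here: it refers to columns by bare name, and it fits naturally into a %>% pipeline.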

    5.2.1 Remove a column using the select() function

    Adding a minus sign before the name of a variable within the select() function removes that variable and keeps all of the others.

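    myDataFrame <- 
      myDataFrame %>%
      # the minus sign drops total and keeps every other column; the dplyr:: prefix
      # ensures dplyr's select() is used even if another package (e.g., MASS) masks it
      dplyr::select(-total)
    
    kable(myDataFrame)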
    n       x  y   z
    russ    1  4   7
    lucy    2  5   8
    jaclyn  3  6   9
    tyler   4  7  10
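The minus sign also works with several columns at once. You can either negate each name separately or negate a vector of names built with c(). The following is a minimal sketch that assumes dplyr is installed. example_df is a hypothetical copy of the data frame shown above.

```r
library(dplyr)

# hypothetical copy of the example data frame
example_df <- data.frame(
  n = c("russ", "lucy", "jaclyn", "tyler"),
  x = c(1, 2, 3, 4),
  y = c(4, 5, 6, 7),
  z = c(7, 8, 9, 10)
)

# drop y and z by negating each name
drop_each <- example_df %>%
  dplyr::select(-y, -z)

# equivalent: negate a vector of column names
drop_vector <- example_df %>%
  dplyr::select(-c(y, z))

names(drop_each)    # "n" "x"
names(drop_vector)  # "n" "x"
```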