Statistics LibreTexts

5.4: Looking at Individual Variables Using pull() and head()

    The NHANES data frame contains a large number of variables, but usually we are only interested in one particular variable. We can extract a single variable from a data frame using the pull() function. Let’s say that we want to extract the variable PhysActive. We could do this by piping the data frame into pull(), which returns a vector of many thousands of values. Instead of printing out that entire vector, we pipe the result into the head() function, which returns only the first few values (six by default; here we ask for the first 10), and then into kable(), which formats them as a table. Because we are not assigning the result to a variable, it is simply printed to the screen.

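    library(NHANES)  # provides the NHANES data frame
    library(dplyr)   # provides %>% and pull()
    library(knitr)   # provides kable()

    NHANES %>%
      # extract the PhysActive variable
      pull(PhysActive) %>%
      # extract the first 10 values
      head(10) %>%
      # format the values as a table
      kable()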
    x
    No
    NA
    No
    NA
    NA
    Yes
    Yes
    Yes
    Yes
    NA

    There are two important things to notice here. The first is that three different values appear in the answers: “Yes”, “No”, and NA, which means that the value is missing for this person (perhaps they didn’t want to answer that question on the survey). When we work with data we generally need to remove missing values, as we will see below.
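
To see how many of the answers are missing, a short base R sketch may help. It assumes a small hypothetical factor, phys_active, built from the ten values printed above rather than from the full NHANES data, and uses is.na() and table() to count each answer. Note that table() leaves NA out by default, so useNA = "ifany" is needed to include a count for the missing values.

```r
# Hypothetical stand-in for NHANES$PhysActive: the ten values printed above
phys_active <- factor(c("No", NA, "No", NA, NA, "Yes", "Yes", "Yes", "Yes", NA))

# is.na() returns TRUE for each missing entry; summing the TRUEs counts them
n_missing <- sum(is.na(phys_active))
print(n_missing)

# table() drops NA unless asked; useNA = "ifany" adds a count for missing values
counts <- table(phys_active, useNA = "ifany")
print(counts)

# One base R way to keep only the non-missing answers (an illustration only)
phys_active_complete <- phys_active[!is.na(phys_active)]
print(length(phys_active_complete))
```

The same counting works on the full variable, for example by applying sum(is.na(...)) to the result of pull(PhysActive).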

    The second thing to notice is that this variable is defined as a particular kind of variable in R known as a factor; if you print it directly rather than through kable(), R also prints a list of its “Levels”. You can think of a factor as a categorical variable with a specific set of levels. Missing data are not treated as a level, so it can be useful to make the missing values explicit, which can be done using the fct_explicit_na() function in the forcats package. Let’s add a line to do that:

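    library(forcats)  # provides fct_explicit_na()

    NHANES %>%
      # turn NA into an explicit "(Missing)" level; forcats >= 1.0.0 deprecates
      # this in favour of fct_na_value_to_level(PhysActive, level = "(Missing)")
      mutate(PhysActive = fct_explicit_na(PhysActive)) %>%
      # extract the PhysActive variable
      pull(PhysActive) %>%
      # extract the first 10 values
      head(10) %>%
      kable()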
    x
    No
    (Missing)
    No
    (Missing)
    (Missing)
    Yes
    Yes
    Yes
    Yes
    (Missing)

    This new line overwrites the old value of PhysActive with a version processed by fct_explicit_na(), which converts the missing (NA) values into an explicit level called “(Missing)”. Now missing values are treated as a level of the factor, which will be useful later.
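
To see what making missing values explicit does to the levels of a factor, here is a base R sketch that mimics the effect of fct_explicit_na() without the forcats package. It is an illustration of the idea, not how forcats implements it, and it again assumes a small hypothetical factor, phys_active, built from the ten values printed earlier.

```r
# Hypothetical stand-in for NHANES$PhysActive: the ten values printed earlier
phys_active <- factor(c("No", NA, "No", NA, NA, "Yes", "Yes", "Yes", "Yes", NA))

# Printing a factor also prints its "Levels" line; NA is not among the levels
print(phys_active)
print(levels(phys_active))

# Base R imitation of fct_explicit_na(): first add a "(Missing)" level ...
explicit <- phys_active
levels(explicit) <- c(levels(explicit), "(Missing)")
# ... then relabel every NA entry with that new level
explicit[is.na(explicit)] <- "(Missing)"

# "(Missing)" is now an ordinary level, so it shows up in levels() and table()
print(levels(explicit))
print(table(explicit))
```

Because "(Missing)" is a real level after this step, functions that summarize by level count it alongside “Yes” and “No” instead of silently dropping it.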

    Now we are ready to start summarizing data!