9.1: Mean


The mean is defined as the sum of values divided by the number of values being summed:

\(\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}\)
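
As a small worked illustration of the formula, using three made-up values (2, 4, and 9) rather than anything from a real dataset, we have \(n = 3\) and:

\(\bar{X} = \frac{2 + 4 + 9}{3} = \frac{15}{3} = 5\)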

Let’s say that we want to obtain the mean height for adults in the NHANES dataset (stored in the variable Height). We would sum the individual heights (using the sum() function) and then divide by the number of values (using the length() function):

library(NHANES)  # assumed source of the NHANES data frame
sum(NHANES$Height)/length(NHANES$Height)

## [1] NA

This returns NA because Height is missing for some rows, and sum() returns NA whenever any of its inputs is NA. To address this, we could use drop_na() to drop the rows with NA values for this variable, and then pull out the remaining heights as a vector:

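library(tidyverse)  # provides %>%, drop_na(), and pull()

# drop rows with a missing Height, then extract Height as a vector
height_noNA <- NHANES %>%
  drop_na(Height) %>%
  pull(Height)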

sum(height_noNA)/length(height_noNA)

## [1] 160
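
To see on a small scale why the first attempt failed, here is a minimal sketch using a hypothetical four-element vector x (made-up values, not NHANES data). It suggests that a single NA is enough to make sum() return NA, that length() still counts the missing entry, and that removing the NA values by hand with is.na() plays the same role for a vector that drop_na() plays for the rows of a data frame:

```r
# Hypothetical toy vector (not NHANES data) with one missing value
x <- c(150, 160, NA, 170)

total_with_na <- sum(x)   # NA: one missing value propagates through the sum
n_with_na <- length(x)    # 4: length() counts the NA entry as well

# Keep only the non-missing entries before computing the mean by hand
x_complete <- x[!is.na(x)]
manual_mean <- sum(x_complete) / length(x_complete)
manual_mean               # (150 + 160 + 170) / 3 = 160
```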

There is, of course, a built-in function in R called mean() that will compute the mean. Like the sum() function, mean() will return NA if there are any NA values in the data:

mean(NHANES$Height)

## [1] NA

The mean() function includes an optional argument called na.rm that will remove NA values if it is set to TRUE:

mean(NHANES$Height, na.rm=TRUE)

## [1] 160
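
The behavior of na.rm can also be checked on a small example. The sketch below again assumes a hypothetical toy vector x (made-up values, not NHANES data). It also illustrates one possible pitfall when computing the mean by hand: sum() accepts na.rm as well, but length() still counts the missing entries, so the denominator has to be the number of non-missing values instead:

```r
# Hypothetical toy vector (not NHANES data) with one missing value
x <- c(150, 160, NA, 170)

mean_default <- mean(x)             # NA, because x contains a missing value
mean_narm <- mean(x, na.rm = TRUE)  # 160, computed from the 3 non-missing values

# Pitfall: removing NA values only in the numerator gives the wrong answer
wrong_mean <- sum(x, na.rm = TRUE) / length(x)       # 480 / 4 = 120
right_mean <- sum(x, na.rm = TRUE) / sum(!is.na(x))  # 480 / 3 = 160
```

Here sum(!is.na(x)) counts the non-missing entries, so right_mean agrees with mean(x, na.rm = TRUE).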