# 6.1: Finding Distributions of Functions of Random Variables

\section{Moments}

In this section we look at various numerical characteristics of random variables. These give us a way of classifying and comparing random variables.

\subsection{Expected Value of Random Variables}
\begin{framed}
\begin{defn}
\rule{0pt}{0pt}
\begin{enumerate}
\item If $X$ is a discrete random variable with possible values $x_1, x_2, \ldots, x_i, \ldots$, and frequency function $p(x)$, then the \define{expected value} (or \define{mean}) of $X$ is given by
$$\mu = \mu_X = \expec{X} = \sum_i x_i\cdot p(x_i).$$
\item If $X$ is a continuous random variable with density function $f(x)$, then the \define{expected value} (or \define{mean}) of $X$ is given by
$$\mu = \mu_X = \expec{X} = \int\limits^{\infty}_{-\infty}\! x\cdot f(x)\, dx.$$
\end{enumerate}
\end{defn}
\end{framed}

The expected value of a random variable has many interpretations. First, looking at the formulas in Definition 27 for computing expected value, note that it is essentially a \emph{weighted average}. Specifically, for a discrete random variable, the expected value is computed by ``weighting'' each value of the random variable, $x_i$, by the probability that the random variable takes that value, $p(x_i)$, and then summing over all possible values. The formula for the expected value of a continuous random variable is the continuous analogue, where instead of summing over all possible values we integrate. This interpretation of the expected value as a weighted average explains why it is also referred to as the mean of the random variable.
\vskip1ex

The expected value of a random variable is also interpreted as the \emph{long-run value} of the random variable. In other words, if we repeat the underlying random experiment several times and take the average of the values of the random variable corresponding to the outcomes, we would get the expected value, approximately. Again, we see that the expected value is related to an average value of the random variable. Given the interpretation of the expected value as an average, either ``weighted'' or ``long-run'', the expected value is often referred to as a \emph{measure of center} of the random variable.
\vskip1ex

Finally, the expected value of a random variable has a graphical interpretation. The expected value gives the \emph{center of mass} of the frequency function in the discrete case and the pdf in the continuous case.

\begin{ex}
Consider again the context of Example 1, where we recorded the sequence of heads and tails in two tosses of a fair coin. In Example 9 we defined the discrete random variable $X$ to denote the number of heads obtained. In Example 11 we found the frequency function of $X$. We now apply Definition 27, part 1, and compute the expected value of $X$:
$$\expec{X} = 0\cdot p(0) + 1\cdot p(1) + 2\cdot p(2) = 0 + 0.5 + 0.5 = 1.$$
Thus, we expect that the number of heads obtained in two tosses of a fair coin will be 1 in the long-run or on average. Figure 6 demonstrates the graphical representation of the expected value as the center of mass of the frequency function.

\begin{figure}[h]
\centering
\includegraphics[scale=0.5]{expec1.jpg}
\caption{Histogram of $X$: The red arrow represents the center of mass, or the expected value of $X$.}
\end{figure}
\end{ex}
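The weighted-average computation in this example is easy to verify numerically. The following is a minimal sketch in Python (the dictionary and variable names are our own):

```python
# Frequency function of X = number of heads in two tosses of a fair coin.
p = {0: 0.25, 1: 0.5, 2: 0.25}

# E[X] = sum over the possible values x_i of x_i * p(x_i).
expected_value = sum(x * px for x, px in p.items())
print(expected_value)  # 1.0
```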

\begin{ex}
Consider again the context of Example 17, where we defined the continuous random variable $X$ to denote the time a person waits for an elevator to arrive. The pdf of $X$ was given by
$$f(x) = \left\{\begin{array}{l l} x, & \text{for}\ 0\leq x\leq 1 \\ 2-x, & \text{for}\ 1< x\leq 2 \\ 0, & \text{otherwise} \end{array}\right.$$
Applying Definition 27, part 2, we compute the expected value of $X$:
$$\expec{X} = \int\limits^1_0\! x\cdot x\, dx + \int\limits^2_1\! x\cdot (2-x)\, dx = \int\limits^1_0\! x^2\, dx + \int\limits^2_1\! (2x - x^2)\, dx = \frac{1}{3} + \frac{2}{3} = 1.$$
Thus, we expect a person will wait 1 minute for the elevator on average. Figure 7 demonstrates the graphical representation of the expected value as the center of mass of the pdf.

\begin{figure}[h]
\centering
\includegraphics[scale=0.1]{expec2.jpg}
\caption{Graph of $f$: The red arrow represents the center of mass, or the expected value of $X$.}
\end{figure}
\end{ex}
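The integral in this example can be checked with a simple numerical approximation. The sketch below uses a midpoint Riemann sum; the function name \texttt{f} and the grid size are our own choices:

```python
# The pdf of the elevator waiting time from the example above.
def f(x):
    if 0 <= x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0.0

# Approximate E[X] = integral of x * f(x) dx with a midpoint Riemann sum.
n = 100_000
dx = 2 / n
mean = 0.0
for i in range(n):
    x = (i + 0.5) * dx  # midpoint of the i-th subinterval
    mean += x * f(x) * dx
print(round(mean, 6))  # ≈ 1.0
```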

For many of the common probability distributions, the expected value is given by a parameter of the distribution. For example, if discrete random variable $X$ has a Poisson distribution with parameter $\lambda$, then $\expec{X} = \lambda$. This can be derived directly from Definition 27, but we will derive it another way in Section 4.3 below. As another example, if continuous random variable $X$ has a normal distribution with parameters $\mu$ and $\sigma$, then $\expec{X} = \mu$. The normal case is why the notation $\mu$ is often used for the expected value. Again, this fact can be derived using Definition 27; however, the integral calculation requires many tricks.
\vskip1ex

The expected value may not be exactly equal to a parameter of the probability distribution; rather, it may be a function of the parameters, as the next example with the uniform distribution shows.

\begin{ex}
Suppose the random variable $X$ has a uniform distribution on the interval $[a,b]$. Then the pdf of $X$ is given by
$$f(x) = \frac{1}{b-a}, \quad\text{for}\ a\leq x\leq b.$$
Applying Definition 27, part 2, we compute the expected value of $X$:
$$\expec{X} = \int\limits^b_a\! x\cdot\frac{1}{b-a}\, dx = \frac{b^2 - a^2}{2}\cdot\frac{1}{b-a} = \frac{(b-a)(b+a)}{2}\cdot\frac{1}{b-a} = \frac{b+ a}{2}.$$
Thus, the expected value of the uniform$[a,b]$ distribution is given by the average of the parameters $a$ and $b$, or the midpoint of the interval $[a,b]$. This is readily apparent when looking at a graph of the pdf. Since the pdf is constant over $[a,b]$, the center of mass is simply given by the midpoint.
\end{ex}
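The closed form $(a+b)/2$ can likewise be checked against a direct numerical integration; the particular endpoints below are arbitrary:

```python
# E[X] for a uniform[a, b] random variable, two ways.
a, b = 3.0, 7.0
closed_form = (a + b) / 2  # the midpoint derived above

# Midpoint Riemann sum of x * (1 / (b - a)) over [a, b].
n = 100_000
dx = (b - a) / n
numeric = 0.0
for i in range(n):
    x = a + (i + 0.5) * dx
    numeric += x / (b - a) * dx
print(closed_form, round(numeric, 6))  # both 5.0
```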

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsubsection{Expected Value of Functions of Random Variables}

In many applications, we may not be interested in the value of a random variable itself, but rather in the value of some function applied to the random variable or to a collection of random variables. For example, we may be interested in the value of $X^2$. The following theorems, which we state without proof, demonstrate how to calculate the expected value of functions of random variables.

\begin{framed}
\begin{thm}
Let $X$ be a random variable and let $g$ be a real-valued function. Define the random variable $Y = g(X)$.
\begin{enumerate}
\item If $X$ is a discrete random variable with possible values $x_1, x_2, \ldots, x_i, \ldots$, and frequency function $p(x)$, then the expected value of $Y$ is given by
$$\expec{Y} = \sum_i g(x_i)\cdot p(x_i).$$
\item If $X$ is a continuous random variable with pdf $f(x)$, then the expected value of $Y$ is given by
$$\expec{Y} = \int\limits^{\infty}_{-\infty}\! g(x)\cdot f(x)\, dx.$$
\end{enumerate}
\end{thm}
\end{framed}

To put it simply, Theorem 1 states that to find the expected value of a function of a random variable, we apply the function to the possible values of the random variable in the definition of expected value. Before stating an important special case of Theorem 1, we offer a word of caution regarding order of operations. Note that, in general,
$$\boxed{\expec{g(X)} \neq g\left(\expec{X}\right)\text{!}}$$
However, as the next theorem states, there are exceptions.
\vskip1ex

\begin{framed}
\noindent\textbf{Special Case of Theorem 1:} Let $X$ be a random variable. If $g$ is a linear function, i.e., $g(x) = ax + b$, then
$$\expec{g(X)} = \expec{aX + b} = a\expec{X} + b.$$
\end{framed}

The above special case is referred to as the \emph{linearity} of expected value.
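Linearity can be confirmed on the coin-toss distribution from earlier; the constants $a$ and $b$ below are arbitrary:

```python
# E[aX + b] = a E[X] + b, checked on the two-coin-toss distribution.
p = {0: 0.25, 1: 0.5, 2: 0.25}
a, b = 3, 5

lhs = sum((a * x + b) * px for x, px in p.items())  # E[g(X)] via Theorem 1
rhs = a * sum(x * px for x, px in p.items()) + b    # a E[X] + b
print(lhs, rhs)  # both 8.0
```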

\begin{framed}
\begin{thm}
Suppose $X_1, \ldots, X_n$ are jointly distributed random variables, and let $Y = g(X_1, \ldots, X_n)$.
\begin{enumerate}
\item If $X_1, \ldots, X_n$ are discrete random variables with joint frequency function $p(x_1, \ldots, x_n)$, then the expected value of $Y$ is given by
$$\expec{Y} = \sum_{x_1, \ldots, x_n} g(x_1, \ldots, x_n)\cdot p(x_1, \ldots, x_n),$$
where the sum is over all possible combinations of possible values for the random variables $X_1, \ldots, X_n$.
\item If $X_1, \ldots, X_n$ are continuous random variables with joint density function $f(x_1, \ldots, x_n)$, then the expected value of $Y$ is given by
$$\expec{Y} = \int\limits^{\infty}_{-\infty}\!\cdots\int\limits^{\infty}_{-\infty}\! g(x_1, \ldots, x_n)\cdot f(x_1, \ldots, x_n)\, dx_1\, \ldots\, dx_n.$$
\end{enumerate}
\end{thm}
\end{framed}

Theorem 2 allows us to extend the linearity property of expected value to linear combinations of jointly distributed random variables.
\vskip1ex

\begin{framed}
\noindent\textbf{Extension of Special Case of Theorem 1:} Let $X_1, \ldots, X_n$ be jointly distributed random variables, and let $a_1, \ldots, a_n, b$ be constants. Then, the following holds:
$$\expec{a_1X_1 + \cdots + a_nX_n + b} = a_1\expec{X_1} + \cdots + a_n\expec{X_n} + b.$$
\end{framed}

As a corollary to Theorem 2, we obtain an easy way of finding the expected value of products of functions of independent random variables.

\begin{framed}
\begin{cor}
If $X$ and $Y$ are independent random variables, then
$$\expec{g(X)\cdot h(Y)} = \expec{g(X)} \cdot \expec{h(Y)}.$$
\end{cor}
\end{framed}

Corollary 1 implies that, for independent random variables, $\expec{XY} = \expec{X}\expec{Y}$.
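Corollary 1 can be checked on a small discrete example in which the joint frequency function factors as a product of the marginals, which is precisely independence; the distributions below are arbitrary:

```python
# Marginal frequency functions of two independent discrete random variables.
px = {1: 0.2, 2: 0.8}
qy = {0: 0.5, 3: 0.5}

# E[XY] computed from the joint frequency function p(x) * q(y).
e_xy = sum(x * y * pv * qv for x, pv in px.items() for y, qv in qy.items())
e_x = sum(x * v for x, v in px.items())
e_y = sum(y * v for y, v in qy.items())
print(e_xy, e_x * e_y)  # both 2.7, up to floating-point rounding
```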

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Variance of Random Variables}

We now look at our second numerical characteristic associated to random variables.

\begin{framed}
\begin{defn}
The \define{variance} of a random variable $X$ is given by
$$\sigma^2 = \var{X} = \expec{(X-\mu)^2},$$
where $\mu$ denotes the expected value of $X$. The \define{standard deviation} of $X$ is given by
$$\sigma = \text{SD}(X) = \sqrt{\var{X}}.$$
\end{defn}
\end{framed}

In words, the variance of a random variable is the average of the squared deviations of the random variable from its mean (or expected value). Notice that the variance of a random variable will result in a number with units squared, but the standard deviation will have the same units as the random variable. Thus, the standard deviation is easier to interpret, which is why we make a point to define it. The variance and standard deviation give us a \emph{measure of spread} for random variables. The standard deviation is interpreted as a measure of how ``spread out'' the possible values of $X$ are with respect to the mean of $X$.
\vskip1ex

As with expected values, for many of the common probability distributions, the variance is given by a parameter or a function of the parameters for the distribution. For example, if continuous random variable $X$ has a normal distribution with parameters $\mu$ and $\sigma$, then $\var{X} = \sigma^2$, i.e., the parameter $\sigma$ gives the standard deviation. Again, the normal case explains the notation used for variance and standard deviation.

\begin{ex}
Suppose $X_1\sim\text{normal}(0, 2^2)$ and $X_2\sim\text{normal}(0, 3^2)$. So, $X_1$ and $X_2$ are both normally distributed random variables with the same mean, but $X_2$ has a larger standard deviation. Given our interpretation of standard deviation, this implies that the possible values of $X_2$ are more ``spread out'' from the mean. This is easily seen by looking at the graphs of the pdf's corresponding to $X_1$ and $X_2$ given in Figure 8.

\begin{figure}[h]
\centering
\includegraphics[scale=0.3]{normal.pdf}
\caption{Graph of normal pdf's: $X_1\sim\text{normal}(0,2^2)$ in blue, $X_2\sim\text{normal}(0,3^2)$ in red}
\end{figure}
\end{ex}

Theorem 1 tells us how to compute variance, since it is given by finding the expected value of a function applied to the random variable. First, if $X$ is a discrete random variable with possible values $x_1, x_2, \ldots, x_i, \ldots$, and frequency function $p(x)$, then the variance of $X$ is given by
$$\boxed{\var{X} = \sum_{i} (x_i - \mu)^2\cdot p(x_i).}$$
If $X$ is continuous with pdf $f(x)$, then
$$\boxed{\var{X} = \int\limits^{\infty}_{-\infty}\! (x-\mu)^2\cdot f(x)\, dx.}$$
The above formulas follow directly from Definition 28. However, there is an alternate formula for calculating variance, given by the following theorem, that is often easier to use.

\begin{framed}
\begin{thm}
$\var{X} = \expec{X^2} - \mu^2$
\end{thm}
\end{framed}

\begin{ex}
Continuing in the context of Example 23, we calculate the variance and standard deviation of the random variable $X$ denoting the number of heads obtained in two tosses of a fair coin. Using the alternate formula for variance, we need to first calculate $\expec{X^2}$, for which we use Theorem 1:
$$\expec{X^2} = 0^2\cdot p(0) + 1^2\cdot p(1) + 2^2\cdot p(2) = 0 + 0.5 + 1 = 1.5.$$
In Example 23, we found that $\mu = \expec{X} = 1$. Thus, we find
\begin{align*}
\var{X} &= \expec{X^2} - \mu^2 = 1.5 - 1 = 0.5 \\
\Rightarrow\ \text{SD}(X) &= \sqrt{\var{X}} = \sqrt{0.5} \approx 0.707
\end{align*}
\end{ex}
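For the coin-toss example just worked, both the defining formula and the alternate formula can be computed directly; the variable names are our own:

```python
# Var(X) for the number of heads in two tosses, two ways.
p = {0: 0.25, 1: 0.5, 2: 0.25}
mu = sum(x * px for x, px in p.items())                      # E[X] = 1

var_def = sum((x - mu) ** 2 * px for x, px in p.items())     # definition
var_alt = sum(x ** 2 * px for x, px in p.items()) - mu ** 2  # E[X^2] - mu^2
print(var_def, var_alt)  # both 0.5
```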

\begin{ex}
Continuing with Example 24, we calculate the variance and standard deviation of the random variable $X$ denoting the time a person waits for an elevator to arrive. Again, we use the alternate formula for variance and first find $\expec{X^2}$ using Theorem 1:
$$\expec{X^2} = \int\limits^1_0\! x^2\cdot x\, dx + \int\limits^2_1\! x^2\cdot (2-x)\, dx = \int\limits^1_0\! x^3\, dx + \int\limits^2_1\! (2x^2 - x^3)\, dx = \frac{1}{4} + \frac{11}{12} = \frac{7}{6}.$$
In Example 24, we found that $\mu = \expec{X} = 1$. Thus, we have
\begin{align*}
\var{X} &= \expec{X^2} - \mu^2 = \frac{7}{6} - 1 = \frac{1}{6} \\
\Rightarrow\ \text{SD}(X) &= \sqrt{\var{X}} = \frac{1}{\sqrt{6}} \approx 0.408
\end{align*}
\end{ex}

Given that the variance of a random variable is defined to be the expected value of \emph{squared} deviations from the mean, variance is not linear, unlike expected value. We do, however, have the following useful property of variance.

\begin{framed}
\begin{thm}
Let $X$ be a random variable, and $a, b$ be constants. Then the following holds:
$$\var{aX + b} = a^2\var{X}.$$
\end{thm}
\end{framed}

Theorem 4 follows from a short algebraic manipulation. Note that the ``$+\,b$'' disappears in the formula. There is an intuitive reason for this: the ``$+\,b$'' corresponds to a \emph{horizontal shift} of the frequency function or pdf of the random variable, and such a shift does not affect the \emph{spread}, i.e., the variance will not change.
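Theorem 4 is easy to confirm numerically on the coin-toss distribution; the helper function and the constants below are our own choices:

```python
# Var(aX + b) = a^2 Var(X): the shift b drops out, the scale a is squared.
p = {0: 0.25, 1: 0.5, 2: 0.25}
a, b = 3, 5

def var(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * px for x, px in dist.items())
    return sum((x - mu) ** 2 * px for x, px in dist.items())

# Distribution of aX + b: each value x is mapped to a*x + b.
shifted = {a * x + b: px for x, px in p.items()}
print(var(shifted), a ** 2 * var(p))  # both 4.5
```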

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Moment-Generating Functions}

The expected value and variance of a random variable are actually special cases of a more general class of numerical characteristics for random variables given by moments.

\begin{framed}
\begin{defn}
The \define{r$^{\text{th}}$ moment} of a random variable $X$ is given by
$$\expec{X^r}.$$
The \define{r$^{\text{th}}$ central moment} of a random variable $X$ is given by
$$\expec{(X-\mu)^r},$$
where $\mu = \expec{X}$.
\end{defn}
\end{framed}

Note that the expected value of a random variable is given by the first moment, i.e., when $r=1$. Also, the variance of a random variable is given by the second central moment.
\vskip1ex

As with expected value and variance, the moments of a random variable are used to characterize the distribution of the random variable and to compare the distribution to that of other random variables. Moments can be calculated directly from the definition, but, even for moderate values of $r$, this approach becomes cumbersome. The next definition and theorem provide an easier way to generate moments.

\begin{framed}
\begin{defn}
The \define{moment-generating function (mgf)} of a random variable $X$ is given by
$$M_X(t) = \expec{e^{tX}}, \quad\text{for}\ t\in\mathbb{R}.$$
\end{defn}
\end{framed}

\begin{framed}
\begin{thm}
If random variable $X$ has mgf $M_X(t)$, then
$$M^{(r)}_X(0) = \frac{d^r}{dt^r}\left[M_X(t)\right]_{t=0} = \expec{X^r}.$$
In other words, the $r^{\text{th}}$ derivative of the mgf evaluated at $t=0$ gives the value of the $r^{\text{th}}$ moment.
\end{thm}
\end{framed}

Theorem 1 tells us how to derive the mgf of a random variable, since the mgf is given by taking the expected value of a function applied to the random variable:
$$\boxed{M_X(t) = \expec{e^{tX}} = \left\{\begin{array}{c l} \text{discrete:} & \displaystyle{\sum_i e^{tx_i}\cdot p(x_i)} \\ & \\ \text{continuous:} & \displaystyle{\int\limits^{\infty}_{-\infty}\! e^{tx}\cdot f(x)\, dx} \end{array}\right.}$$
We can now derive the first moment of the Poisson distribution, i.e., derive the fact we mentioned in Section 4.1 that the expected value is given by the parameter. We also find the variance.

\begin{ex}
Let $X\sim\text{Poisson}(\lambda)$. Then, the frequency function of $X$ is given by
$$p(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad\text{for}\ x=0,1,2,\ldots.$$
Before we derive the mgf for $X$, we recall from calculus the Taylor series expansion of the exponential function $e^y$:
$$e^y = \sum_{x=0}^{\infty} \frac{y^x}{x!}.$$
Using this fact, we find
$$M_X(t) = \expec{e^{tX}} = \sum^{\infty}_{x=0} e^{tx}\cdot\frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda}\sum^{\infty}_{x=0} \frac{(e^t\lambda)^x}{x!} = e^{-\lambda}e^{e^t\lambda} = e^{\lambda(e^t - 1)}.$$
Now we take the first and second derivative of $M_X(t)$. Remember we are differentiating with respect to $t$:
\begin{align*}
M'_X(t) &= \frac{d}{dt}\left[e^{\lambda(e^t - 1)}\right] = \lambda e^te^{\lambda(e^t - 1)} \\
M''_X(t) &= \frac{d}{dt}\left[\lambda e^te^{\lambda(e^t - 1)}\right] = \lambda e^te^{\lambda(e^t - 1)} + \lambda^2 e^{2t}e^{\lambda(e^t - 1)}
\end{align*}
Next we evaluate the derivatives at $t=0$ to find the first and second moments of $X$:
\begin{align*}
\expec{X} = M'_X(0) &= \lambda e^0e^{\lambda(e^0 - 1)} = \lambda \\
\expec{X^2} = M''_X(0) &= \lambda e^0e^{\lambda(e^0 - 1)} + \lambda^2 e^{0}e^{\lambda(e^0 - 1)} = \lambda + \lambda^2
\end{align*}
Finally, in order to find the variance, we use the alternate formula:
$$\var{X} = \expec{X^2} - \left(\expec{X}\right)^2 = \lambda + \lambda^2 - \lambda^2 = \lambda.$$
Thus, we have shown that both the mean and the variance of the Poisson$(\lambda)$ distribution are given by the parameter $\lambda$.
\end{ex}
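The mgf-derived moments can be cross-checked against the Poisson frequency function directly; the value of $\lambda$ and the truncation point of the series below are arbitrary:

```python
import math

lam = 2.5

# First and second moments computed directly from the frequency function,
# truncating the series once the remaining terms are negligible.
m1 = sum(x * math.exp(-lam) * lam ** x / math.factorial(x) for x in range(100))
m2 = sum(x ** 2 * math.exp(-lam) * lam ** x / math.factorial(x) for x in range(100))

print(round(m1, 6), round(m2, 6))  # lambda = 2.5 and lambda + lambda^2 = 8.75
```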

Note that the mgf of a random variable is a \emph{function} of $t$. The main application of mgf's is to find the moments of a random variable, as the previous example demonstrated. There are more properties of mgf's that allow us to find moments for functions of random variables.

\begin{framed}
\begin{thm}
Let $X$ be a random variable with mgf $M_X(t)$, and let $a,b$ be constants. If random variable $Y= aX + b$, then the mgf of $Y$ is given by
$$M_Y(t) = e^{bt}M_X(at).$$
\end{thm}
\end{framed}

\begin{framed}
\begin{thm}
If $X_1, \ldots, X_n$ are independent random variables with mgf's $M_{X_1}(t), \ldots, M_{X_n}(t)$, respectively, then the mgf of random variable $Y = X_1 + \cdots + X_n$ is given by
$$M_Y(t) = M_{X_1}(t) \cdots M_{X_n}(t).$$
\end{thm}
\end{framed}

Recall that a binomially distributed random variable can be written as a sum of independent Bernoulli random variables. We use this and Theorem 7 to derive the mean and variance for a binomial distribution. First, we find the mean and variance of a Bernoulli distribution.

\begin{ex}
Recall that $X$ has a Bernoulli$(p)$ distribution if it is assigned the value of 1 with probability $p$ and the value of 0 with probability $1-p$. Thus, the frequency function of $X$ is given by
$$p(x) = \left\{\begin{array}{l l} 1-p, & \text{if}\ x=0 \\ p, & \text{if}\ x=1 \end{array}\right.$$
In order to find the mean and variance of $X$, we first derive the mgf:
$$M_X(t) = \expec{e^{tX}} = e^{t(0)}(1-p) + e^{t(1)}p = 1 - p + e^tp.$$
Now we differentiate $M_X(t)$ with respect to $t$:
\begin{align*}
M'_X(t) &= \frac{d}{dt}\left[1 - p + e^tp\right] = e^tp \\
M''_X(t) &= \frac{d}{dt}\left[e^tp\right] = e^tp
\end{align*}
Next we evaluate the derivatives at $t=0$ to find the first and second moments:
$$M'_X(0) = M''_X(0) = e^0p = p.$$
Thus, the expected value of $X$ is $\expec{X} = p$. Finally, we use the alternate formula for calculating variance:
$$\var{X} = \expec{X^2} - \left(\expec{X}\right)^2 = p - p^2 = p(1-p).$$
\end{ex}

\begin{ex}
Let $X\sim\text{binomial}(n,p)$. If $X_1, \ldots, X_n$ denote $n$ independent Bernoulli$(p)$ random variables, then we can write
$$X = X_1 + \cdots + X_n.$$
In Example 30, we found the mgf for a Bernoulli$(p)$ random variable. Thus, we have
$$M_{X_i}(t) = 1 - p + e^tp, \quad\text{for}\ i=1, \ldots, n.$$
Using Theorem 7, we derive the mgf for $X$:
$$M_X(t) = M_{X_1}(t) \cdots M_{X_n}(t) = (1-p+e^tp) \cdots (1-p+e^tp) = (1-p+e^tp)^n.$$
Now we can use the mgf of $X$ to find the moments:
\begin{align*}
M'_X(t) &= \frac{d}{dt}\left[(1-p+e^tp)^n\right] = n(1-p+e^tp)^{n-1}e^tp \\
&\Rightarrow M'_X(0) = np \\
M''_X(t) &= \frac{d}{dt}\left[n(1-p+e^tp)^{n-1}e^tp\right] = n(n-1)(1-p+e^tp)^{n-2}(e^tp)^2 + n(1-p+e^tp)^{n-1}e^tp \\
&\Rightarrow M''_X(0) = n(n-1)p^2 + np
\end{align*}
Thus, the expected value of $X$ is $\expec{X} = np$, and the variance is
$$\var{X} = \expec{X^2} - (\expec{X})^2 = n(n-1)p^2 + np - (np)^2 = np(1-p).$$
\end{ex}
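The binomial mean and variance derived above can be checked directly from the frequency function; the particular $n$ and $p$ are arbitrary:

```python
import math

n, p = 10, 0.3

# Binomial(n, p) frequency function.
pmf = {k: math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

mean = sum(k * v for k, v in pmf.items())
var = sum(k ** 2 * v for k, v in pmf.items()) - mean ** 2
print(round(mean, 6), round(var, 6))  # np = 3.0 and np(1-p) = 2.1
```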

We end with a final property of mgf's that relates to the comparison of the distribution of random variables.

\begin{framed}
\begin{thm}
The mgf $M_X(t)$ of random variable $X$ uniquely determines the probability distribution of $X$. In other words, if random variables $X$ and $Y$ have the same mgf, $M_X(t) = M_Y(t)$, then $X$ and $Y$ have the same probability distribution.
\end{thm}
\end{framed}

The main application of Theorem 8 is to derive the probability distribution of functions of random variables. One of the most important such applications is the following result, which states that the sum of independent normally distributed random variables is also normally distributed.

\begin{framed}
\noindent\textbf{Sum of Normal Random Variables:} If $X_1, \ldots, X_n$ are independent random variables with $X_i\sim\text{normal}(\mu_i,\sigma_i^2)$, for $i=1,\ldots,n$, then the random variable given by their sum is also normally distributed. More specifically, if $Y = X_1 + \cdots + X_n$, then
$$Y \sim\text{normal}(\mu,\sigma^2), \quad\text{where}\ \mu = \mu_1 + \cdots + \mu_n\ \text{and}\ \sigma^2 = \sigma_1^2 + \cdots + \sigma_n^2.$$
\end{framed}
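The result can be illustrated by simulation; the parameters, sample size, and random seed below are arbitrary choices:

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is reproducible
params = [(1.0, 2.0), (-3.0, 1.0), (2.0, 0.5)]  # (mu_i, sigma_i) pairs

# Draw Y = X_1 + X_2 + X_3 many times and inspect its first two moments.
samples = [sum(rng.gauss(mu, sd) for mu, sd in params) for _ in range(100_000)]

print(round(statistics.mean(samples), 2))      # near mu = 1 - 3 + 2 = 0
print(round(statistics.variance(samples), 2))  # near 2^2 + 1^2 + 0.5^2 = 5.25
```

A full check of normality (rather than just the moments) would compare the empirical distribution of the samples against the normal$(0, 5.25)$ cdf.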

The above result is extremely useful in the study of statistics.