3.8: Moment-Generating Functions (MGFs) for Discrete Random Variables
The expected value and variance of a random variable are actually special cases of a more general class of numerical characteristics for random variables given by moments.
Definition \(\PageIndex{1}\)
The rth moment of a random variable \(X\) is given by
$$\text{E}[X^r].\notag$$
The rth central moment of a random variable \(X\) is given by
$$\text{E}[(X-\mu)^r],\notag$$
where \(\mu = \text{E}[X]\).
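As a quick illustration of these definitions, here is a minimal Python sketch that computes moments directly from a pmf. It assumes a finite pmf stored as a dictionary `{value: probability}`; the function names `raw_moment` and `central_moment` and the fair-die example are our own illustrative choices.

```python
from fractions import Fraction

def raw_moment(pmf, r):
    """r-th moment E[X^r] of a finite discrete pmf given as {value: probability}."""
    return sum(p * x**r for x, p in pmf.items())

def central_moment(pmf, r):
    """r-th central moment E[(X - mu)^r], where mu = E[X] is the first moment."""
    mu = raw_moment(pmf, 1)
    return sum(p * (x - mu)**r for x, p in pmf.items())

# Illustrative example: a fair six-sided die, using exact rational probabilities.
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(raw_moment(die, 1))      # 7/2   (the mean, r = 1)
print(central_moment(die, 2))  # 35/12 (the variance, second central moment)
```

Using `Fraction` keeps the arithmetic exact, so the printed values are the true moments rather than floating-point approximations.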
Note that the expected value of a random variable is the first moment, i.e., the case \(r=1\). Also, the variance of a random variable is the second central moment.
As with expected value and variance, the moments of a random variable are used to characterize the distribution of the random variable and to compare the distribution to that of other random variables. Moments can be calculated directly from the definition, but, even for moderate values of \(r\), this approach becomes cumbersome. The next definition and theorem provide an easier way to generate moments.
Definition \(\PageIndex{2}\)
The moment-generating function (mgf) of a random variable \(X\) is given by
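$$M_X(t) = \text{E}[e^{tX}], \quad\text{for}\ t\in\mathbb{R},\notag$$
provided this expectation is finite; the results below assume it is finite for all \(t\) in some open interval containing 0.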
Theorem \(\PageIndex{1}\)
If random variable \(X\) has mgf \(M_X(t)\), then
$$M^{(r)}_X(0) = \frac{d^r}{dt^r}\left[M_X(t)\right]_{t=0} = \text{E}[X^r].\notag$$
In other words, the \(r^{\text{th}}\) derivative of the mgf evaluated at \(t=0\) gives the value of the \(r^{\text{th}}\) moment.
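Theorem 3.8.1 can be checked numerically. The sketch below is only an illustration: it approximates the first two derivatives of an mgf at \(t=0\) with central finite differences (a numerical stand-in for exact differentiation, with an arbitrarily chosen step `h`) and uses the mgf of a fair six-sided die, which is our own example rather than one from the text.

```python
import math

def die_mgf(t):
    """mgf of a fair six-sided die: E[e^{tX}] = (e^t + e^{2t} + ... + e^{6t}) / 6."""
    return sum(math.exp(t * x) for x in range(1, 7)) / 6

def derivative_at_zero(f, r, h=1e-3):
    """Approximate the r-th derivative of f at 0 (r = 1 or 2) by central differences."""
    if r == 1:
        return (f(h) - f(-h)) / (2 * h)
    if r == 2:
        return (f(h) - 2 * f(0.0) + f(-h)) / h**2
    raise ValueError("only r = 1 and r = 2 are implemented")

print(derivative_at_zero(die_mgf, 1))  # close to 3.5, which is E[X]
print(derivative_at_zero(die_mgf, 2))  # close to 91/6, which is E[X^2]
```

The finite-difference error shrinks like \(h^2\), so the printed values agree with the exact moments to several decimal places but not exactly.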
To derive the mgf of a discrete random variable, we apply the definition directly. Since the mgf is the expected value of the function \(e^{tX}\) of the random variable, we weight each value \(e^{tx_i}\) by the pmf and sum over the possible values \(x_i\) of \(X\):
$$M_X(t) = \text{E}[e^{tX}] = \sum_i e^{tx_i}\cdot p(x_i).\notag$$
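The sum above translates directly into code. This is a minimal sketch assuming a finite pmf stored as a dictionary `{x_i: p(x_i)}`; the helper name `mgf` and the three-point pmf are hypothetical, chosen only for illustration.

```python
import math

def mgf(pmf, t):
    """Evaluate M_X(t) = sum_i e^{t x_i} p(x_i) for a finite pmf {x_i: p(x_i)}."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())

# Hypothetical example pmf (values and probabilities chosen only for illustration).
pmf = {-1: 0.2, 0: 0.5, 2: 0.3}

print(mgf(pmf, 0.0))  # equals 1 up to rounding: M_X(0) is the total probability
print(mgf(pmf, 1.0))  # 0.2*e^(-1) + 0.5 + 0.3*e^2
```

A useful sanity check for any mgf is \(M_X(0) = \text{E}[e^0] = 1\).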
We can now derive the first moment of the Poisson distribution, that is, prove the fact mentioned (but left as an exercise) in Section 3.6 that its expected value is the parameter \(\lambda\). We also find the variance.
Example \(\PageIndex{1}\)
Let \(X\sim\text{Poisson}(\lambda)\). Then, the pmf of \(X\) is given by
$$p(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad\text{for}\ x=0,1,2,\ldots.\notag$$
Before we derive the mgf for \(X\), we recall from calculus the Taylor series expansion of the exponential function \(e^y\):
$$e^y = \sum_{x=0}^{\infty} \frac{y^x}{x!}.\notag$$
Using this fact, we find
$$M_X(t) = \text{E}[e^{tX}] = \sum^{\infty}_{x=0} e^{tx}\cdot\frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda}\sum^{\infty}_{x=0} \frac{(e^t\lambda)^x}{x!} = e^{-\lambda}e^{e^t\lambda} = e^{\lambda(e^t - 1)}.\notag$$
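As a numerical check of this closed form, the sketch below compares a truncated version of the defining series with \(e^{\lambda(e^t - 1)}\). The truncation at 100 terms is an assumption that works well for moderate \(\lambda\) and \(t\); the function names are our own.

```python
import math

def poisson_mgf_series(lam, t, terms=100):
    """Partial sum of sum_x e^{tx} e^{-lam} lam^x / x!, i.e. the mgf definition truncated."""
    total = 0.0
    term = math.exp(-lam)  # the x = 0 term of e^{-lam} (e^t lam)^x / x!
    for x in range(terms):
        total += term
        term *= math.exp(t) * lam / (x + 1)  # ratio of consecutive terms
    return total

def poisson_mgf_closed(lam, t):
    """Closed form derived in the text: exp(lam * (e^t - 1))."""
    return math.exp(lam * (math.exp(t) - 1))

for lam, t in [(3.0, 0.5), (1.0, -1.0)]:
    print(lam, t, poisson_mgf_series(lam, t), poisson_mgf_closed(lam, t))  # columns agree
```

Building each term from the previous one avoids computing large powers and factorials separately.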
Now we take the first and second derivatives of \(M_X(t)\). Remember we are differentiating with respect to \(t\):
\begin{align*}
M'_X(t) &= \frac{d}{dt}\left[e^{\lambda(e^t - 1)}\right] = \lambda e^te^{\lambda(e^t - 1)} \\
M''_X(t) &= \frac{d}{dt}\left[\lambda e^te^{\lambda(e^t - 1)}\right] = \lambda e^te^{\lambda(e^t - 1)} + \lambda^2 e^{2t}e^{\lambda(e^t - 1)}
\end{align*}
Next we evaluate the derivatives at \(t=0\) to find the first and second moments of \(X\):
\begin{align*}
\text{E}[X] = M'_X(0) &= \lambda e^0e^{\lambda(e^0 - 1)} = \lambda \\
\text{E}[X^2] = M''_X(0) &= \lambda e^0e^{\lambda(e^0 - 1)} + \lambda^2 e^{0}e^{\lambda(e^0 - 1)} = \lambda + \lambda^2
\end{align*}
Finally, in order to find the variance, we use the alternate formula:
$$\text{Var}(X) = \text{E}[X^2] - \left(\text{E}[X]\right)^2 = \lambda + \lambda^2 - \lambda^2 = \lambda.\notag$$
Thus, we have shown that both the mean and variance of the Poisson\((\lambda)\) distribution are given by the parameter \(\lambda\).
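To cross-check the mgf result, the sketch below computes \(\text{E}[X]\) and \(\text{E}[X^2]\) for a Poisson distribution by summing the pmf directly (truncated at 200 terms, an assumption suitable only for moderate \(\lambda\), since \(e^{-\lambda}\) underflows for very large \(\lambda\)). The value \(\lambda = 2.5\) is an arbitrary example.

```python
import math

def poisson_moments(lam, terms=200):
    """Return (E[X], E[X^2]) for Poisson(lam) by summing x*p(x) and x^2*p(x) (truncated)."""
    p = math.exp(-lam)  # p(0)
    m1 = m2 = 0.0
    for x in range(terms):
        m1 += x * p
        m2 += x * x * p
        p *= lam / (x + 1)  # p(x+1) = p(x) * lam / (x+1)
    return m1, m2

lam = 2.5
m1, m2 = poisson_moments(lam)
print(m1, m2, m2 - m1**2)  # close to lam, lam + lam^2, and lam
```

The three printed values should match \(\lambda\), \(\lambda + \lambda^2\), and \(\lambda\), the first moment, second moment, and variance found above.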
Note that the mgf of a random variable is a function of \(t\). The main application of mgf's is to find the moments of a random variable, as the previous example demonstrated. Further properties of mgf's allow us to find the moments of functions of random variables.
Theorem \(\PageIndex{2}\)
Let \(X\) be a random variable with mgf \(M_X(t)\), and let \(a,b\) be constants. If random variable \(Y= aX + b\), then the mgf of \(Y\) is given by
$$M_Y(t) = e^{bt}M_X(at).\notag$$
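Theorem 3.8.2 follows from \(\text{E}[e^{t(aX+b)}] = e^{bt}\text{E}[e^{(at)X}]\), and it can be checked numerically on a small example. The pmf, the constants \(a=2\) and \(b=-1\), and the helper names below are illustrative assumptions.

```python
import math

def mgf(pmf, t):
    """M(t) = sum_x e^{tx} p(x) for a finite pmf {x: p(x)}."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())

def linear_transform(pmf, a, b):
    """pmf of Y = aX + b (assumes a != 0, so distinct x values map to distinct y values)."""
    return {a * x + b: p for x, p in pmf.items()}

pmf_x = {0: 0.3, 1: 0.45, 3: 0.25}  # illustrative pmf
a, b = 2.0, -1.0
pmf_y = linear_transform(pmf_x, a, b)

for t in (-0.7, 0.0, 0.4):
    lhs = mgf(pmf_y, t)                          # M_Y(t) computed from the pmf of Y
    rhs = math.exp(b * t) * mgf(pmf_x, a * t)    # e^{bt} M_X(at), Theorem 3.8.2
    print(t, lhs, rhs)
```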
Theorem \(\PageIndex{3}\)
If \(X_1, \ldots, X_n\) are independent random variables with mgf's \(M_{X_1}(t), \ldots, M_{X_n}(t)\), respectively, then the mgf of random variable \(Y = X_1 + \cdots + X_n\) is given by
$$M_Y(t) = M_{X_1}(t) \cdots M_{X_n}(t).\notag$$
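For two independent random variables, Theorem 3.8.3 can be illustrated by building the pmf of the sum (each pair of values contributes the product of their probabilities, by independence) and comparing its mgf with the product of the individual mgf's. The two pmfs and the function names are our own examples.

```python
import math
from collections import defaultdict

def mgf(pmf, t):
    """M(t) = sum_x e^{tx} p(x) for a finite pmf {x: p(x)}."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())

def pmf_of_sum(pmf1, pmf2):
    """pmf of X1 + X2 for independent X1, X2: accumulate P(X1=x1) P(X2=x2) on x1 + x2."""
    out = defaultdict(float)
    for x1, p1 in pmf1.items():
        for x2, p2 in pmf2.items():
            out[x1 + x2] += p1 * p2  # independence: joint pmf is the product
    return dict(out)

pmf1 = {0: 0.5, 1: 0.5}
pmf2 = {1: 0.2, 2: 0.3, 5: 0.5}
pmf_y = pmf_of_sum(pmf1, pmf2)

for t in (-0.5, 0.3):
    print(mgf(pmf_y, t), mgf(pmf1, t) * mgf(pmf2, t))  # the two values agree
```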
Recall that a binomially distributed random variable can be written as a sum of independent Bernoulli random variables. We use this and Theorem 3.8.3 to derive the mean and variance for a binomial distribution. First, we find the mean and variance of a Bernoulli distribution.
Example \(\PageIndex{2}\)
Recall that \(X\) has a Bernoulli\((p)\) distribution if it is assigned the value of 1 with probability \(p\) and the value of 0 with probability \(1-p\). Thus, the pmf of \(X\) is given by
$$p(x) = \left\{\begin{array}{l l}
1-p, & \text{if}\ x=0 \\
p, & \text{if}\ x=1
\end{array}\right.\notag$$
In order to find the mean and variance of \(X\), we first derive the mgf:
$$M_X(t) = \text{E}[e^{tX}] = e^{t(0)}(1-p) + e^{t(1)}p = 1 - p + e^tp.\notag$$
Now we differentiate \(M_X(t)\) with respect to \(t\):
\begin{align*}
M'_X(t) &= \frac{d}{dt}\left[1 - p + e^tp\right] = e^tp \\
M''_X(t) &= \frac{d}{dt}\left[e^tp\right] = e^tp
\end{align*}
Next we evaluate the derivatives at \(t=0\) to find the first and second moments:
$$M'_X(0) = M''_X(0) = e^0p = p.\notag$$
Thus, \(\text{E}[X] = p\) and \(\text{E}[X^2] = p\). Finally, we use the alternate formula for calculating variance:
$$\text{Var}(X) = \text{E}[X^2] - \left(\text{E}[X]\right)^2 = p - p^2 = p(1-p).\notag$$
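Because \(X\) takes only the values 0 and 1, \(X^r = X\) for every \(r \geq 1\), so every moment of a Bernoulli\((p)\) random variable equals \(p\). This matches the mgf, since every derivative of \(1 - p + e^tp\) is \(e^tp\). A short exact check in Python, with \(p = 3/10\) chosen only as an example:

```python
from fractions import Fraction

def bernoulli_pmf(p):
    """pmf of Bernoulli(p) as {value: probability}."""
    return {0: 1 - p, 1: p}

def moment(pmf, r):
    """E[X^r] computed directly from the pmf."""
    return sum(prob * x**r for x, prob in pmf.items())

p = Fraction(3, 10)
pmf = bernoulli_pmf(p)
print([moment(pmf, r) for r in range(1, 5)])  # every moment equals p = 3/10
var = moment(pmf, 2) - moment(pmf, 1) ** 2
print(var, p * (1 - p))  # 21/100 21/100
```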
Example \(\PageIndex{3}\)
Let \(X\sim\text{binomial}(n,p)\). If \(X_1, \ldots, X_n\) denote \(n\) independent Bernoulli\((p)\) random variables, then we can write
$$X = X_1 + \cdots + X_n.\notag$$
In Example 3.8.2, we found the mgf for a Bernoulli\((p)\) random variable. Thus, we have
$$M_{X_i}(t) = 1 - p + e^tp, \quad\text{for}\ i=1, \ldots, n.\notag$$
Using Theorem 3.8.3, we derive the mgf for \(X\):
$$M_X(t) = M_{X_1}(t) \cdots M_{X_n}(t) = (1-p+e^tp) \cdots (1-p+e^tp) = (1-p+e^tp)^n.\notag$$
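As a check on this product formula, the sketch below evaluates \(\text{E}[e^{tX}]\) by summing over the binomial pmf \(\binom{n}{x}p^x(1-p)^{n-x}\) and compares it with \((1-p+e^tp)^n\); the agreement is an instance of the binomial theorem. The parameter values are arbitrary examples, and `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_mgf_direct(n, p, t):
    """E[e^{tX}] summed over the binomial(n, p) pmf C(n, x) p^x (1-p)^(n-x)."""
    return sum(math.exp(t * x) * math.comb(n, x) * p**x * (1 - p)**(n - x)
               for x in range(n + 1))

def binomial_mgf_product(n, p, t):
    """Product of n Bernoulli(p) mgfs, as in Theorem 3.8.3: (1 - p + e^t p)^n."""
    return (1 - p + math.exp(t) * p) ** n

for n, p, t in [(5, 0.3, 0.7), (12, 0.9, -0.4)]:
    print(binomial_mgf_direct(n, p, t), binomial_mgf_product(n, p, t))  # agree up to rounding
```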
Now we can use the mgf of \(X\) to find the moments:
\begin{align*}
M'_X(t) &= \frac{d}{dt}\left[(1-p+e^tp)^n\right] = n(1-p+e^tp)^{n-1}e^tp \\
&\Rightarrow M'_X(0) = np \\
M''_X(t) &= \frac{d}{dt}\left[n(1-p+e^tp)^{n-1}e^tp\right] = n(n-1)(1-p+e^tp)^{n-2}(e^tp)^2 + n(1-p+e^tp)^{n-1}e^tp \\
&\Rightarrow M''_X(0) = n(n-1)p^2 + np
\end{align*}
Thus, the expected value of \(X\) is \(\text{E}[X] = np\), and the variance is
$$\text{Var}(X) = \text{E}[X^2] - (\text{E}[X])^2 = n(n-1)p^2 + np - (np)^2 = np(1-p).\notag$$
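These formulas can be confirmed exactly for particular parameters by summing over the pmf with rational arithmetic. The choice \(n = 10\), \(p = 1/4\) below is just an example, and the function name is our own.

```python
from fractions import Fraction
from math import comb

def binomial_mean_var(n, p):
    """Exact E[X] and Var(X) for binomial(n, p), summed directly over the pmf."""
    pmf = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}
    m1 = sum(x * q for x, q in pmf.items())
    m2 = sum(x * x * q for x, q in pmf.items())
    return m1, m2 - m1**2

n, p = 10, Fraction(1, 4)
mean, var = binomial_mean_var(n, p)
print(mean, var)                # 5/2 15/8
print(n * p, n * p * (1 - p))   # 5/2 15/8, i.e. np and np(1-p)
```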
We end with a final property of mgf's, which is useful for comparing the distributions of random variables.
Theorem \(\PageIndex{4}\)
The mgf \(M_X(t)\) of random variable \(X\) uniquely determines the probability distribution of \(X\). In other words, if random variables \(X\) and \(Y\) have the same mgf, i.e., \(M_X(t) = M_Y(t)\) for all \(t\) in some open interval containing 0, then \(X\) and \(Y\) have the same probability distribution.
Exercise \(\PageIndex{1}\)
Suppose the random variable \(X\) has the following mgf:
$$M_X(t) = \left(0.85 + 0.15e^t\right)^{33}\notag$$
What is the distribution of \(X\)?
Hint: Use Theorem 3.8.4 and look at Example 3.8.3.
Answer:
We found in Example 3.8.3 that the mgf for a binomial distribution is
$$M_X(t) = (1-p+e^tp)^n,\notag$$ which matches the given mgf with \(p=0.15\) and \(n=33\). By Theorem 3.8.4, \(X\sim \text{binomial}(33, 0.15)\).
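Once \(X\) is identified as binomial\((33, 0.15)\), the results of Example 3.8.3 give its mean and variance immediately. A small exact computation, using `fractions.Fraction` so that 0.15 is represented exactly as 3/20:

```python
from fractions import Fraction

n, p = 33, Fraction(15, 100)   # read off from (0.85 + 0.15 e^t)^33
mean = n * p                   # E[X] = np
var = n * p * (1 - p)          # Var(X) = np(1-p)
print(mean, float(mean))       # 99/20 4.95
print(var, float(var))         # 1683/400 4.2075
```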