# 10.1: Generating Functions for Discrete Distributions

- Page ID
- 3168

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)

( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)

\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)

\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

\( \newcommand{\Span}{\mathrm{span}}\)

\( \newcommand{\id}{\mathrm{id}}\)

\( \newcommand{\Span}{\mathrm{span}}\)

\( \newcommand{\kernel}{\mathrm{null}\,}\)

\( \newcommand{\range}{\mathrm{range}\,}\)

\( \newcommand{\RealPart}{\mathrm{Re}}\)

\( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

\( \newcommand{\Argument}{\mathrm{Arg}}\)

\( \newcommand{\norm}[1]{\| #1 \|}\)

\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)

\( \newcommand{\vectorA}[1]{\vec{#1}} % arrow\)

\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow\)

\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vectorC}[1]{\textbf{#1}} \)

\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)

\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)

\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

\(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)So far we have considered in detail only the two most important attributes of a random variable, namely, the mean and the variance. We have seen how these attributes enter into the fundamental limit theorems of probability, as well as into all sorts of practical calculations. We have seen that the mean and variance of a random variable contain important information about the random variable, or, more precisely, about the distribution function of that variable. Now we shall see that the mean and variance do contain the available information about the density function of a random variable. To begin with, it is easy to give examples of different distribution functions which have the same mean and the same variance. For instance, suppose \(X\) and \(Y\) are random variables, with distributions

\[p_X = \pmatrix{ 1 & 2 & 3 & 4 & 5 & 6\cr 0 & 1/4 & 1/2 & 0 & 0 & 1/4\cr},\] \[p_Y = \pmatrix{ 1 & 2 & 3 & 4 & 5 & 6\cr 1/4 & 0 & 0 & 1/2 & 1/4 & 0\cr}.\]

Then with these choices, we have \(E(X) = E(Y) = 7/2\) and \(V(X) = V(Y) = 9/4\), and yet certainly \(p_X\) and \(p_Y\) are quite different density functions.

This raises a question: If \(X\) is a random variable with range \(\{x_1, x_2, \ldots\}\) of at most countable size, and distribution function \(p = p_X\), and if we know its mean \(\mu = E(X)\) and its variance \(\sigma^2 = V(X)\), then what else do we need to know to determine \(p\) completely?

## Moments

A nice answer to this question, at least in the case that \(X\) has finite range, can be given in terms of the of \(X\), which are numbers defined as follows: \[\begin{aligned} \mu_k &=& k \mbox{th}\,\,\mbox{moment~of}\,\, X\\ &=& E(X^k) \\ &=& \sum_{j = 1}^\infty (x_j)^k p(x_j)\ ,\end{aligned}\] provided the sum converges. Here \(p(x_j) = P(X = x_j)\).

In terms of these moments, the mean \(\mu\) and variance \(\sigma^2\) of \(X\) are given simply by

\[\begin{aligned} \mu &=& \mu_1, \\ \sigma^2 &=& \mu_2 - \mu_1^2\ ,\end{aligned}\] so that a knowledge of the first two moments of \(X\) gives us its mean and variance. But a knowledge of the moments of \(X\) determines its distribution function \(p\) completely.

## Moment Generating Functions

To see how this comes about, we introduce a new variable \(t\), and define a function \(g(t)\) as follows:

\[\begin{aligned} g(t) &=& E(e^{tX}) \\ &=& \sum_{k = 0}^\infty \frac {\mu_k t^k}{k!} \\ &=& E\left(\sum_{k = 0}^\infty \frac {X^k t^k}{k!} \right) \\ &=& \sum_{j = 1}^\infty e^{tx_j} p(x_j)\ .\end{aligned}\] We call \(g(t)\) the for \(X\), and think of it as a convenient bookkeeping device for describing the moments of \(X\). Indeed, if we differentiate \(g(t)\) \(n\) times and then set \(t = 0\), we get \(\mu_n\):

\[\begin{aligned} \left. \frac {d^n}{dt^n} g(t) \right|_{t = 0} &=& g^{(n)}(0) \\ &=& \left. \sum_{k = n}^\infty \frac {k!\, \mu_k t^{k - n}} {(k - n)!\, k!} \right|_{t = 0} \\ &=& \mu_n\ .\end{aligned}\] It is easy to calculate the moment generating function for simple examples.

## Examples

Suppose \(X\) has range \(\{1,2,3,\ldots,n\}\) and \(p_X(j) = 1/n\) for \(1 \leq j \leq n\) (uniform distribution). Then

\[\begin{aligned} g(t) &=& \sum_{j = 1}^n \frac 1n e^{tj} \\ &=& \frac 1n (e^t + e^{2t} +\cdots+ e^{nt}) \\ &=& \frac {e^t (e^{nt} - 1)} {n (e^t - 1)}\ .\end{aligned}\] If we use the expression on the right-hand side of the second line above, then it is easy to see that

\[\begin{aligned} \mu_1 &=& g'(0) = \frac 1n (1 + 2 + 3 + \cdots + n) = \frac {n + 1}2, \\ \mu_2 &=& g''(0) = \frac 1n (1 + 4 + 9+ \cdots + n^2) = \frac {(n + 1)(2n + 1)}6\ ,\end{aligned}\] and that \(\mu = \mu_1 = (n + 1)/2\) and \(\sigma^2 = \mu_2 - \mu_1^2 = (n^2 - 1)/12\).

Suppose now that \(X\) has range \(\{0,1,2,3,\ldots,n\}\) and \(p_X(j) = {n \choose j} p^j q^{n - j}\) for \(0 \leq j \leq n\) (binomial distribution). Then \[\begin{aligned} g(t) &=& \sum_{j = 0}^n e^{tj} {n \choose j} p^j q^{n - j} \\ &=& \sum_{j = 0}^n {n \choose j} (pe^t)^j q^{n - j} \\ &=& (pe^t + q)^n\ .\end{aligned}\] Note that \[\begin{aligned} \mu_1 = g'(0) &=& \left. n(pe^t + q)^{n - 1}pe^t \right|_{t = 0} = np\ , \\ \mu_2 = g''(0) &=& n(n - 1)p^2 + np\ ,\end{aligned}\] so that \(\mu = \mu_1 = np\), and \(\sigma^2 = \mu_2 - \mu_1^2 = np(1 - p)\), as expected.

Suppose \(X\) has range \(\{1,2,3,\ldots\}\) and \(p_X(j) = q^{j - 1}p\) for all \(j\) (geometric distribution). Then \[\begin{aligned} g(t) &=& \sum_{j = 1}^\infty e^{tj} q^{j - 1}p \\ &=& \frac {pe^t}{1 - qe^t}\ .\end{aligned}\] Here \[\begin{aligned} \mu_1 &=& g'(0) = \left. \frac {pe^t}{(1 - qe^t)^2} \right|_{t = 0} = \frac 1p\ , \\ \mu_2 &=& g''(0) = \left. \frac {pe^t + pqe^{2t}}{(1 - qe^t)^3} \right|_{t = 0} = \frac {1 + q}{p^2}\ ,\end{aligned}\] \(\mu = \mu_1 = 1/p\), and \(\sigma^2 = \mu_2 - \mu_1^2 = q/p^2\), as computed in Example [exam 6.21].

Let \(X\) have range \(\{0,1,2,3,\ldots\}\) and let \(p_X(j) = e^{-\lambda}\lambda^j/j!\) for all \(j\) (Poisson distribution with mean \(\lambda\)). Then \[\begin{aligned} g(t) &=& \sum_{j = 0}^\infty e^{tj} \frac {e^{-\lambda}\lambda^j}{j!} \\ &=& e^{-\lambda} \sum_{j = 0}^\infty \frac {(\lambda e^t)^j}{j!} \\ &=& e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}\ .\end{aligned}\] Then \[\begin{aligned} \mu_1 &=& g'(0) = \left. e^{\lambda(e^t - 1)}\lambda e^t \right|_{t = 0} = \lambda\ ,\\ \mu_2 &=& g''(0) = \left. e^{\lambda(e^t - 1)} (\lambda^2 e^{2t} + \lambda e^t) \right|_{t = 0} = \lambda^2 + \lambda\ ,\end{aligned}\] \(\mu = \mu_1 = \lambda\), and \(\sigma^2 = \mu_2 - \mu_1^2 = \lambda\).

The variance of the Poisson distribution is easier to obtain in this way than directly from the definition (as was done in Exercise [sec 6.2].[exer 6.2.100]).

## Moment Problem

Using the moment generating function, we can now show, at least in the case of a discrete random variable with finite range, that its distribution function is completely determined by its moments.

Add exercises text here. For the automatic number to work, you need to Let \(X\) be a discrete random variable with finite range \(\{x_1,x_2,\ldots, x_n\}\), distribution function \(p\), and moment generating function \(g\). Then \(g\) is uniquely determined by \(p\), and conversely.

**Proof**-
We know that \(p\) determines \(g\), since \[g(t) = \sum_{j = 1}^n e^{tx_j} p(x_j)\ .\] Conversely, assume that \(g(t)\) is known. We wish to determine the values of \(x_j\) and \(p(x_j)\), for \(1 \le j \le n\). We assume, without loss of generality, that \(p(x_j) > 0\) for \(1 \le j \le n\), and that \[x_1 < x_2 < \ldots < x_n\ .\] We note that \(g(t)\) is differentiable for all \(t\), since it is a finite linear combination of exponential functions. If we compute \(g'(t)/g(t)\), we obtain \[

(click for details)\ .\] Dividing both top and bottom by \(e^{tx_n}\), we obtain the expression \[`Callstack: at (Bookshelves/Probability_Theory/Book:_Introductory_Probability_(Grinstead_and_Snell)/10:_Generating_Functions/10.01:_Generating_Functions_for_Discrete_Distributions), /content/body/div[4]/section/div/dl/dd/p/span[1], line 1, column 5`

(click for details)\ .\] Since \(x_n\) is the largest of the \(x_j\)’s, this expression approaches \(x_n\) as \(t\) goes to \(\infty\). So we have shown that \[x_n = \lim_{t \rightarrow \infty} {{g'(t)}\over{g(t)}}\ .\] To find \(p(x_n)\), we simply divide \(g(t)\) by \(e^{tx_n}\) and let \(t\) go to \(\infty\). Once \(x_n\) and \(p(x_n)\) have been determined, we can subtract \(p(x_n) e^{tx_n}\) from \(g(t)\), and repeat the above procedure with the resulting function, obtaining, in turn, \(x_{n-1}, \ldots, x_1\) and \(p(x_{n-1}), \ldots, p(x_1)\).`Callstack: at (Bookshelves/Probability_Theory/Book:_Introductory_Probability_(Grinstead_and_Snell)/10:_Generating_Functions/10.01:_Generating_Functions_for_Discrete_Distributions), /content/body/div[4]/section/div/dl/dd/p/span[2], line 1, column 5`

If we delete the hypothesis that \(X\) have finite range in the above theorem, then the conclusion is no longer necessarily true.

## Ordinary Generating Functions

In the special but important case where the \(x_j\) are all nonnegative integers, \(x_j = j\), we can prove this theorem in a simpler way.

In this case, we have \[g(t) = \sum_{j = 0}^n e^{tj} p(j)\ ,\] and we see that \(g(t)\) is a in \(e^t\). If we write \(z = e^t\), and define the function \(h\) by \[h(z) = \sum_{j = 0}^n z^j p(j)\ ,\] then \(h(z)\) is a polynomial in \(z\) containing the same information as \(g(t)\), and in fact \[\begin{aligned} h(z) &=& g(\log z)\ , \\ g(t) &=& h(e^t)\ .\end{aligned}\] The function \(h(z)\) is often called the for \(X\). Note that \(h(1) = g(0) = 1\), \(h'(1) = g'(0) = \mu_1\), and \(h''(1) = g''(0) - g'(0) = \mu_2 - \mu_1\). It follows from all this that if we know \(g(t)\), then we know \(h(z)\), and if we know \(h(z)\), then we can find the \(p(j)\) by Taylor’s formula: \[\begin{aligned} p(j) &=& \mbox{coefficient~of}\,\, z^j \,\, \mbox{in}\,\, h(z) \\ &=& \frac{h^{(j)}(0)}{j!}\ .\end{aligned}\]

For example, suppose we know that the moments of a certain discrete random variable \(X\) are given by \[\begin{aligned} \mu_0 &=& 1\ , \\ \mu_k &=& \frac12 + \frac{2^k}4\ , \qquad \mbox{for}\,\, k \geq 1\ .\end{aligned}\] Then the moment generating function \(g\) of \(X\) is \[\begin{aligned} g(t) &=& \sum_{k = 0}^\infty \frac{\mu_k t^k}{k!} \\ &=& 1 + \frac12 \sum_{k = 1}^\infty \frac{t^k}{k!} + \frac14 \sum_{k = 1}^\infty \frac{(2t)^k}{k!} \\ &=& \frac14 + \frac12 e^t + \frac14 e^{2t}\ .\end{aligned}\] This is a polynomial in \(z = e^t\), and \[h(z) = \frac14 + \frac12 z + \frac14 z^2\ .\] Hence, \(X\) must have range \(\{0,1,2\}\), and \(p\) must have values \(\{1/4,1/2,1/4\}\).

## Properties

Both the moment generating function \(g\) and the ordinary generating function \(h\) have many properties useful in the study of random variables, of which we can consider only a few here. In particular, if \(X\) is any discrete random variable and \(Y = X + a\), then \[\begin{aligned} g_Y(t) &=& E(e^{tY}) \\ &=& E(e^{t(X + a)}) \\ &=& e^{ta} E(e^{tX}) \\ &=& e^{ta} g_X(t)\ ,\end{aligned}\] while if \(Y = bX\), then \[\begin{aligned} g_Y(t) &=& E(e^{tY}) \\ &=& E(e^{tbX}) \\ &=& g_X(bt)\ .\end{aligned}\] In particular, if \[X^* = \frac{X - \mu}\sigma\ ,\] then (see Exercise [exer 10.1.14]) \[g_{x^*}(t) = e^{-\mu t/\sigma} g_X\left( \frac t\sigma \right)\ .\]

If \(X\) and \(Y\) are random variables and \(Z = X + Y\) is their sum, with \(p_X\), \(p_Y\), and \(p_Z\) the associated distribution functions, then we have seen in Chapter [chp 7] that \(p_Z\) is the of \(p_X\) and \(p_Y\), and we know that convolution involves a rather complicated calculation. But for the generating functions we have instead the simple relations \[\begin{aligned} g_Z(t) &=& g_X(t) g_Y(t)\ , \\ h_Z(z) &=& h_X(z) h_Y(z)\ ,\end{aligned}\] that is, \(g_Z\) is simply the of \(g_X\) and \(g_Y\), and similarly for \(h_Z\).

To see this, first note that if \(X\) and \(Y\) are independent, then \(e^{tX}\) and \(e^{tY}\) are independent (see Exercise [sec 5.2].[exer 5.2.38]), and hence \[E(e^{tX} e^{tY}) = E(e^{tX}) E(e^{tY})\ .\] It follows that \[\begin{aligned} g_Z(t) &=& E(e^{tZ}) = E(e^{t(X + Y)}) \\ &=& E(e^{tX}) E(e^{tY}) \\ &=& g_X(t) g_Y(t)\ ,\end{aligned}\] and, replacing \(t\) by \(\log z\), we also get \[h_Z(z) = h_X(z) h_Y(z)\ .\]

If \(X\) and \(Y\) are independent discrete random variables with range \(\{0,1,2,\ldots,n\}\) and binomial distribution \[p_X(j) = p_Y(j) = {n \choose j} p^j q^{n - j}\ ,\] and if \(Z = X + Y\), then we know (cf. Section [sec 7.1]) that the range of \(X\) is \[\{0,1,2,\ldots,2n\}\] and \(X\) has binomial distribution \[p_Z(j) = (p_X * p_Y)(j) = {2n \choose j} p^j q^{2n - j}\ .\] Here we can easily verify this result by using generating functions. We know that \[\begin{aligned} g_X(t) = g_Y(t) &=& \sum_{j = 0}^n e^{tj} {n \choose j} p^j q^{n - j} \\ &=& (pe^t + q)^n\ ,\end{aligned}\] and \[h_X(z) = h_Y(z) = (pz + q)^n\ .\] Hence, we have \[g_Z(t) = g_X(t) g_Y(t) = (pe^t + q)^{2n}\ ,\] or, what is the same, \[\begin{aligned} h_Z(z) &=& h_X(z) h_Y(z) = (pz + q)^{2n} \\ &=& \sum_{j = 0}^{2n} {2n \choose j} (pz)^j q^{2n - j}\ ,\end{aligned}\] from which we can see that the coefficient of \(z^j\) is just \(p_Z(j) = {2n \choose j} p^j q^{2n - j}\).

If \(X\) and \(Y\) are independent discrete random variables with the non-negative integers \(\{0,1,2,3,\ldots\}\) as range, and with geometric distribution function \[p_X(j) = p_Y(j) = q^j p\ ,\] then \[g_X(t) = g_Y(t) = \frac p{1 - qe^t}\ ,\] and if \(Z = X + Y\), then \[\begin{aligned} g_Z(t) &=& g_X(t) g_Y(t) \\ &=& \frac{p^2}{1 - 2qe^t + q^2 e^{2t}}\ .\end{aligned}\] If we replace \(e^t\) by \(z\), we get \[\begin{aligned} h_Z(z) &=& \frac{p^2}{(1 - qz)^2} \\ &=& p^2 \sum_{k = 0}^\infty (k + 1) q^k z^k\ ,\end{aligned}\] and we can read off the values of \(p_Z(j)\) as the coefficient of \(z^j\) in this expansion for \(h(z)\), even though \(h(z)\) is not a polynomial in this case. The distribution \(p_Z\) is a negative binomial distribution (see Section [sec 5.1]).

Here is a more interesting example of the power and scope of the method of generating functions.

## Heads or Tails

In the coin-tossing game discussed in Example [exam 1.3], we now consider the question “When is Peter first in the lead?"

**Answer**-
Let \(X_k\) describe the outcome of the \(k\)th trial in the game \[X_k = \left \{ \matrix{ +1, &\mbox{if}\,\, k{\rm th}\,\, \mbox{toss~is~heads}, \cr -1, &\mbox{if}\,\, k{\rm th}\,\, \mbox{toss~is~tails.}\cr}\right.\] Then the \(X_k\) are independent random variables describing a Bernoulli process. Let \(S_0 = 0\), and, for \(n \geq 1\), let \[S_n = X_1 + X_2 + \cdots + X_n\ .\] Then \(S_n\) describes Peter’s fortune after \(n\) trials, and Peter is first in the lead after \(n\) trials if \(S_k \leq 0\) for \(1 \leq k < n\) and \(S_n = 1\).

Now this can happen when \(n = 1\), in which case \(S_1 = X_1 = 1\), or when \(n > 1\), in which case \(S_1 = X_1 = -1\). In the latter case, \(S_k = 0\) for \(k = n - 1\), and perhaps for other \(k\) between 1 and \(n\). Let \(m\) be the such value of \(k\); then \(S_m = 0\) and \(S_k < 0\) for \(1 \leq k < m\). In this case Peter loses on the first trial, regains his initial position in the next \(m - 1\) trials, and gains the lead in the next \(n - m\) trials.

Let \(p\) be the probability that the coin comes up heads, and let \(q = 1-p\). Let \(r_n\) be the probability that Peter is first in the lead after \(n\) trials. Then from the discussion above, we see that \[\begin{aligned} r_n &=& 0\ , \qquad \mbox{if}\,\, n\,\, \mbox{even}, \\ r_1 &=& p \qquad (= \mbox{probability~of~heads~in~a~single~toss)}, \\ r_n &=& q(r_1r_{n-2} + r_3r_{n-4} +\cdots+ r_{n-2}r_1)\ , \qquad \mbox{if} \ n > 1,\ n\ \mbox{odd}.\end{aligned}\] Now let \(T\) describe the time (that is, the number of trials) required for Peter to take the lead. Then \(T\) is a random variable, and since \(P(T = n) = r_n\), \(r\) is the distribution function for \(T\).

We introduce the generating function \(h_T(z)\) for \(T\):

\[h_T(z) = \sum_{n = 0}^\infty r_n z^n\ .\]

Then, by using the relations above, we can verify the relation

\[h_T(z) = pz + qz(h_T(z))^2\ .\]

If we solve this quadratic equation for \(h_T(z)\), we get \[h_T(z) = \frac{1 \pm \sqrt{1 - 4pqz^2}}{2qz} = \frac{2pz}{1 \mp \sqrt{1 - 4pqz^2}}\ .\] Of these two solutions, we want the one that has a convergent power series in \(z\) (i.e., that is finite for \(z = 0\)). Hence we choose \[h_T(z) = \frac{1 - \sqrt{1 - 4pqz^2}}{2qz} = \frac{2pz}{1 + \sqrt{1 - 4pqz^2}}\ .\] Now we can ask: What is the probability that Peter is in the lead? This probability is given by (see Exercise \(\PageIndex{10}\))

\[\begin{aligned} \sum_{n = 0}^\infty r_n &=& h_T(1) = \frac{1 - \sqrt{\mathstrut1 - 4pq}}{2q} \\ &=& \frac{1 - |p - q|}{2q} \\ &=& \left \{ \begin{array}{ll} p/q, & \mbox{if $p < q$}, \\ 1, & \mbox{if $p \geq q$}, \end{array}\right. \end{aligned}\] so that Peter is sure to be in the lead eventually if \(p \geq q\).

How long will it take? That is, what is the expected value of \(T\)? This value is given by \[E(T) = h_T'(1) = \left \{ \matrix { 1/(p - q), & \mbox{if}\,\, p > q, \cr \infty, & \mbox{if}\,\, p = q.\cr}\right.\] This says that if \(p > q\), then Peter can expect to be in the lead by about \(1/(p - q)\) trials, but if \(p = q\), he can expect to wait a long time.

A related problem, known as the Gambler’s Ruin problem, is studied in Exercise [exer 11.2.22] and in Section 12.2.

## Exercises

#### Exercise \(\PageIndex{1}\)

Find the generating functions, both ordinary \(h(z)\) and moment \(g(t)\), for the following discrete probability distributions.

- The distribution describing a fair coin.
- The distribution describing a fair die.
- The distribution describing a die that always comes up 3.
- The uniform distribution on the set \(\{n,n+1,n+2,\ldots,n+k\}\).
- The binomial distribution on \(\{n,n+1,n+2,\ldots,n+k\}\).
- The geometric distribution on \(\{0,1,2,\ldots,\}\) with \(p(j) = 2/3^{j + 1}\).

#### Exercise \(\PageIndex{2}\)

For each of the distributions (a) through (d) of Exercise \(\PageIndex{1}\) calculate the first and second moments, \(\mu_1\) and \(\mu_2\), directly from their definition, and verify that \(h(1) = 1\), \(h'(1) = \mu_1\), and \(h''(1) = \mu_2 - \mu_1\).

#### Exercise \(\PageIndex{3}\)

Let \(p\) be a probability distribution on \(\{0,1,2\}\) with moments \(\mu_1 = 1\), \(\mu_2 = 3/2\).

- Find its ordinary generating function \(h(z)\).
- Using (a), find its moment generating function.
- Using (b), find its first six moments.
- Using (a), find \(p_0\), \(p_1\), and \(p_2\).

#### Exercise \(\PageIndex{4}\)

In Exercise \(\PageIndex{3}\) the probability distribution is completely determined by its first two moments. Show that this is always true for any probability distribution on \(\{0,1,2\}\). : Given \(\mu_1\) and \(\mu_2\), find \(h(z)\) as in Exercise \(\PageIndex{3}\) and use \(h(z)\) to determine \(p\).

#### Exercise \(\PageIndex{5}\)

Let \(p\) and \(p'\) be the two distributions

\[p = \pmatrix{ 1 & 2 & 3 & 4 & 5 \cr 1/3 & 0 & 0 & 2/3 & 0 \cr}\ ,\]

\[p' = \pmatrix{ 1 & 2 & 3 & 4 & 5 \cr 0 & 2/3 & 0 & 0 & 1/3 \cr}\ .\]

- Show that \(p\) and \(p'\) have the same first and second moments, but not the same third and fourth moments.
- Find the ordinary and moment generating functions for \(p\) and \(p'\).

#### Exercise \(\PageIndex{6}\)

Let \(p\) be the probability distribution

\[p = \pmatrix{ 0 & 1 & 2 \cr 0 & 1/3 & 2/3 \cr}\ ,\] and let \(p_n = p * p * \cdots * p\) be the \(n\)-fold convolution of \(p\) with itself.

- Find \(p_2\) by direct calculation (see Definition 7.1.1).
- Find the ordinary generating functions \(h(z)\) and \(h_2(z)\) for \(p\) and \(p_2\), and verify that \(h_2(z) = (h(z))^2\).
- Find \(h_n(z)\) from \(h(z)\).
- Find the first two moments, and hence the mean and variance, of \(p_n\) from \(h_n(z)\). Verify that the mean of \(p_n\) is \(n\) times the mean of \(p\).
- Find those integers \(j\) for which \(p_n(j) > 0\) from \(h_n(z)\).

#### Exercise \(\PageIndex{7}\)

Let \(X\) be a discrete random variable with values in \(\{0,1,2,\ldots,n\}\) and moment generating function \(g(t)\). Find, in terms of \(g(t)\), the generating functions for

- \(-X\).
- \(X + 1\).
- \(3X\).
- \(aX + b\).

#### Exercise \(\PageIndex{8}\)

Let \(X_1\), \(X_2\), …, \(X_n\) be an independent trials process, with values in \(\{0,1\}\) and mean \(\mu = 1/3\). Find the ordinary and moment generating functions for the distribution of

- \(S_1 = X_1\). : First find \(X_1\) explicitly.
- \(S_2 = X_1 + X_2\).
- \(S_n = X_1 + X_2 +\cdots+ X_n\).

#### Exercise \(\PageIndex{9}\)

Let \(X\) and \(Y\) be random variables with values in \(\{1,2,3,4,5,6\}\) with distribution functions \(p_X\) and \(p_Y\) given by \[\begin{aligned} p_X(j) &=& a_j\ , \\ p_Y(j) &=& b_j\ .\end{aligned}\]

- Find the ordinary generating functions \(h_X(z)\) and \(h_Y(z)\) for these distributions.
- Find the ordinary generating function \(h_Z(z)\) for the distribution \(Z = X + Y\).
- Show that \(h_Z(z)\) cannot ever have the form \[h_Z(z) = \frac{z^2 + z^3 +\cdots+ z^{12}}{11}\ .\]

: \(h_X\) and \(h_Y\) must have at least one nonzero root, but \(h_Z(z)\) in the form given has no nonzero real roots.

It follows from this observation that there is no way to load two dice so that the probability that a given sum will turn up when they are tossed is the same for all sums (i.e., that all outcomes are equally likely).

#### Exercise \(\PageIndex{10}\)

Show that if \[h(z) = \frac{1 - \sqrt{1 - 4pqz^2}}{2qz}\ ,\] then \[h(1) = \left \{ \begin{array}{ll} p/q, & \mbox{if $p \leq q,$} \\ 1, & \mbox{if $p \geq q,$} \end{array}\right.\] and \[h'(1) = \left \{ \begin{array}{ll} 1/(p - q), & \mbox{if $p > q,$}\\ \infty, & \mbox{if $p = q.$} \end{array}\right.\]

#### Exercise \(\PageIndex{11}\)

Show that if \(X\) is a random variable with mean \(\mu\) and variance \(\sigma^2\), and if \(X^* = (X - \mu)/\sigma\) is the standardized version of \(X\), then \[g_{X^*}(t) = e^{-\mu t/\sigma} g_X\left( \frac t\sigma \right)\ .\]