14.4: The Poisson Distribution

Last updated
Save as PDF

Page ID: 10269

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

\(\newcommand{\P}{\mathbb{P}}\) \(\newcommand{\E}{\mathbb{E}}\) \(\newcommand{\R}{\mathbb{R}}\) \(\newcommand{\N}{\mathbb{N}}\) \(\newcommand{\bs}{\boldsymbol}\) \(\newcommand{\var}{\text{var}}\) \(\newcommand{\sd}{\text{sd}}\) \(\newcommand{\skw}{\text{skew}}\) \(\newcommand{\kur}{\text{kurt}}\)

Basic Theory

Recall that in the Poisson model, \(\bs{X} = (X_1, X_2, \ldots)\) denotes the sequence of inter-arrival times, and \(\bs{T} = (T_0, T_1, T_2, \ldots)\) denotes the sequence of arrival times. Thus \(\bs{T}\) is the partial sum process associated with \(\bs{X}\): \[ T_n = \sum_{i=0}^n X_i, \quad n \in \N \] Based on the strong renewal assumption, that the process restarts at each fixed time and each arrival time, independently of the past, we now know that \(\bs{X}\) is a sequence of independent random variables, each with the exponential distribution with rate parameter \(r \), for some \( r \in (0, \infty) \). We also know that \( \bs{T} \) has stationary, independent increments, and that for \( n \in \N_+ \), \(T_n\) has the gamma distribution with rate parameter \(r\) and scale parameter \(n\). Both of the statements characterize the Poisson process with rate \( r \).

Recall that for \(t \ge 0\), \(N_t\) denotes the number of arrivals in the interval \((0, t]\), so that \( N_t = \max\{n \in \N: T_n \le t\} \). We refer to \(\bs{N} = (N_t: t \ge 0)\) as the counting process. In this section we will show that \( N_t \) has a Poisson distribution, named for Simeon Poisson, one of the most important distributions in probability theory. Our exposition will alternate between properties of the distribution and properties of the counting process. The two are intimately intertwined. It's not too much of an exaggeration to say that wherever there is a Poisson distribution, there is a Poisson process lurking in the background.

Probability density function.

Recall that the probability density function of the \( n \)th arrival time \(T_n\) is \[ f_n(t) = r^n \frac{t^{n-1}}{(n-1)!} e^{-r t}, \quad 0 \le t \lt \infty \] We can find the distribution of \(N_t\) because of the inverse relation between \(\bs{N}\) and \(\bs{T}\). In particular, recall that \[ N_t \ge n \iff T_n \le t, \quad t \in (0, \infty), \; n \in \N_+ \] since both events mean that there are at least \(n\) arrivals in \((0, t]\).

For \( t \in [0, \infty) \), the probability density function of \(N_t\) is given by

\[ \P(N_t = n) = e^{-r t} \frac{(r t)^n}{n!}, \quad n \in \N \]

Proof

Using the inverse relationship noted above, and integration by parts, we have \[ \P(N_t \ge n) = \P(T_n \le t) = \int_0^t f_n(s) \, ds = 1 - \sum_{k=0}^{n-1} e^{-r t} \frac{(r t)^k}{k!}, \quad n \in \N \] For \(n \in \N\) we have \(\P(N_t = n) = \P(N_t \ge n) - \P(N_t \ge n + 1)\). Simplifying gives the result.

Note that the distribution of \(N_t\) depends on the paramters \(r\) and \(t\) only through the product \(r t\). The distribution is called the Poisson distribution with parameter \(r t\).

In the Poisson experiment, vary \(r\) and \(t\) with the scroll bars and note the shape of the probability density function. For various values of \(r\) and \(t\), run the experiment 1000 times and compare the relative frequency function to the probability density function.

In general, a random variable \(N\) taking values in \(\N\) is said to have the Poisson distribution with parameter \(a \in (0, \infty)\) if it has the probability density function \[ g(n) = e^{-a} \frac{a^n}{n!}, \quad n \in \N \]

\(g(n - 1) \lt g(n)\) if and only if \(n \lt a\).
If \(a \notin \N_+\), there is a single mode at \(\lfloor a \rfloor\).
If \(a \in \N_+\), there are consecutive modes at \(a - 1\) and \(a\).

Proof

Part (a) follows from simple algebra, and similarly, \( g(n - 1) = g(n) \) if and only if \( n = a \) (and thus \( a \in \N_+ \)). Parts (b) and (c) then follow.

The Poisson distribution does not have simple closed-form distribution or quantile functions. Trivially, we can write the distribution function as a sum of the probability density function.

The Poisson distribution with parameter \( a \) has distribution function \( G \) given by \[ G(n) = \sum_{k=0}^n e^{-a} \frac{a^k}{k!}, \quad n \in \N \]

Open the special distribution calculator, select the Poisson distribution, and select CDF view. Vary the parameter and note the shape of the distribution and quantile functions. For various values of the parameter, compute the quartiles.

Sometimes it's convenient to allow the parameter \( a \) to be 0. This degenerate Poisson distribution is simply point mass at 0. That is, with the usual conventions regarding nonnegative integer powers of 0, the probability density function \( g \) above reduces to \( g(0) = 1 \) and \( g(n) = 0 \) for \( n \in \N_+ \).

Moments

Suppose that \(N\) has the Poisson distribution with parameter \(a \gt 0\). Naturally we want to know the mean, variance, skewness and kurtosis, and the probability generating function of \(N\). The easiest moments to compute are the factorial moments. For this result, recall the falling power notation for the number of permutations of size \(k\) chosen from a population of size \(n\): \[ n^{(k)} = n (n - 1) \cdots [n - (k + 1)] \]

The factorial moment of \(N\) of order \(k \in \N\) is \(\E\left[N^{(k)}\right] = a^k\).

Proof

Using the standard change of variables formula for expected value, \[ \E[N^{(k)}] = \sum_{n=0}^\infty n^{(k)} e^{-a} \frac{a^n}{n!} = e^{-a} a^k \sum_{n=k}^\infty \frac{a^{n-k}}{(n-k)!} = e^{-a} a^k e^a = a^k \]

The mean and variance of \(N\) are the parameter \(a\).

\(\E(N) = a\)
\(\var(N) = a\)

Proof

This follows directly from the first factorial moment: \(\E(N) = \E\left[N^{(1)}\right] = a\).
Note that \(\E(N^2) = \E\left[N^{(2)}\right] + \E(N) = a^2 + a\).

Open the special distribution simulator and select the Poisson distribution. Vary the parameter and note the location and size of the mean\( \pm \)standard deviation bar. For selected values of the parameter, run the simulation 1000 times and compare the empirical mean and standard deviation to the distribution mean and standard deviation.

The skewness and kurtosis of \( N \) are

\( \skw(N) = 1 / \sqrt{a} \)
\( \kur(N) = 3 + 1 / a \)

Proof

These results follow from the computational formulas for skewness and kurtosis and the results for factorial moments above. Specifically,

\( \E(N^3) = a^3 + 3 a^2 + a \) and \( \skw(N) = [\E(N^3) - 3 a^2 - a^3] / a^{3/2} \)
\( \E(N^4) = a^4 + 6 a^3 + 7 a^2 + a \) and \(\kur(N) = [\E(N^4) - 4 a \E(N^3) + 6 a^3 + 3 a^4] / a^2 \)

Note that the Poisson distribution is positively skewed, but \( \skw(N) \to 0 \) as \( n \to \infty \). Recall also that the excess kurtosis is \( \kur(N) - 3 = +1 /a \to 0 \) as \( n \to \infty \). This limit is related to the convergence of the Poisson distribution to the normal, discussed below.

Open the special distribution simulator and select the Poisson distribution. Vary the parameter and note the shape of the probability density function in the context of the results on skewness and kurtosis above.

The probability generating function \( P \) of \(N\) is given by \[ P(s) = \E\left(s^N\right) = e^{a(s - 1)}, \quad s \in \R \]

Proof

Using the change of variables formula again, \[ \E(s^N) = \sum_{n=0}^\infty s^n e^{-a} \frac{a^n}{n!} = e^{-a} \sum_{n=0}^\infty \frac{(a s)^n}{n!} = e^{-a} e^{a s}, \quad s \in \R \]

Returning to the Poisson counting process \(\bs{N} = (N_t: t \ge 0)\) with rate parameter \(r\), it follows that \(\E(N_t) = r t\) and \(\var(N_t) = r t\) for \(t \ge 0\). Once again, we see that \(r\) can be interpreted as the average arrival rate. In an interval of length \(t\), we expect about \(r t\) arrivals.

In the Poisson experiment, vary \(r\) and \(t\) with the scroll bars and note the location and size of the mean\( \pm \)standard deviation bar. For various values of \(r\) and \(t\), run the experiment 1000 times and compare the sample mean and standard deviation to the distribution mean and standard deviation, respectively.

Estimating the Rate

Suppose again that we have a Poisson process with rate \( r \in (0, \infty) \). In many practical situations, the rate \(r\) in unknown and must be estimated based on observing data. For fixed \(t \gt 0\), a natural estimator of the rate \(r\) is \(N_t / t\).

The mean and variance of \(N_t / t\) are

\(\E(N_t / t) = r\)
\(\var(N_t / t) = r / t\)

Proof

These result follow easily from \(\E(N_t) = \var(N_t) = r t\) and basic properties of expected value and variance.

Part (a) means that the estimator is unbiased. Since this is the case, the variance in part (b) gives the mean square error. Since \(\var(N_t)\) decreases to 0 as \(t \to \infty\), the estimator is consistent.

Additional Properties and Connections

Increments and Characterizations

Let's explore the basic renewal assumption of the Poisson model in terms of the counting process \(\bs{N} = (N_t: t \ge 0)\). Recall that \(N_t\) is the number of arrivals in the interval \((0, t]\), so it follows that if \(s, \; t \in [0, \infty)\) with \(s \lt t\), then \(N_t - N_s\) is the number of arrivals in the interval \((s, t]\). Of course, the arrival times have continuous distributions, so the probability that an arrival occurs at a specific point \(t\) is 0. Thus, it does not really matter if we write the interval above as \((s, t]\), \((s, t)\), \([s, t)\) or \([s, t]\).

The process \(\bs{N}\) has stationary, independent increments.

If \(s, \, t \in [0, \infty)\) with \(s \lt t\) then \(N_t - N_s\) has the same distribution as \(N_{t-s}\), namely Poisson with parameter \(r (t - s)\).
If \(t_1, \, t_2, \, t_3 \ldots \in [0, \infty)\) with \(t_1 \lt t_2 \lt t_3 \lt \cdots\) then \(\left(N_{t_1}, N_{t_2} - N_{t_1}, N_{t_3} - N_{t_2}, \ldots\right)\) is an independent sequence.

Statements about the increments of the counting process can be expressed more elegantly in terms of our more general counting process. Recall that for \(A \subseteq [0, \infty)\) (measurable of course), \(N(A)\) denotes the number of random points in \(A\): \[ N(A) = \#\left\{n \in \N_+: T_n \in A\right\} \] and so in particular, \( N_t = N(0, t] \). Thus, note that \( t \mapsto N_t \) is a (random) distribution function and \( A \mapsto N(A) \) is the (random) measure associated with this distribution function. Recall also that \(\lambda\) denotes the standard length (Lebesgue) measure on \([0, \infty)\). Here is our third characterization of the Poisson process.

A process of random points in time is a Poisson process with rate \( r \in (0, \infty) \) if and only if the following properties hold:.

If \( A \subseteq [0, \infty) \) is measurable then \( N(A) \) has the Poisson distribution with parameter \( r \lambda(A) \).
if \( \{A_i: i \in I\} \) is a countable, disjoint collection of measurable sets in \( [0, \infty) \) then \( \{N(A_i): i \in I\} \) is a set of independent variables.

From a modeling point of view, the assumptions of stationary, independent increments are ones that might be reasonably made. But the assumption that the increments have Poisson distributions does not seem as clear. Our next characterization of the process is more primitive in a sense, because it just imposes some limiting assumptions (in addition to stationary, independent increments.

A process of random points in time is a Poisson process with rate \( r \in (0, \infty) \) if and only if the following properties hold:

If \( A, \, B \subseteq [0, \infty) \) are measurable and \(\lambda(A) = \lambda(B)\), then \(N(A)\) and \(N(B)\) have the same distribution.
if \( \{A_i: i \in I\} \) is a countable, disjoint collection of measurable sets in \( [0, \infty) \) then \( \{N(A_i): i \in I\} \) is a set of independent variables.
If \( A_n \subseteq [0, \infty) \) is measurable and \(\lambda(A_n) \gt 0\) for \(n \in \N_+\), and if \(\lambda(A_n) \to 0\) as \(n \to \infty\) then \begin{align} &\frac{\P\left[N(A_n) = 1\right]}{\lambda(A_n)} \to r \text{ as } n \to \infty \\ &\frac{\P\left[N(A_n) \gt 1\right]}{\lambda(A_n)} \to 0 \text{ as } n \to \infty \end{align}

Proof

As usual, let \(N_t = N(0, t]\), the number of arrivals in \( (0, t] \), and in addition let \(P_n(t) = \P(N_t = n)\) for \(t \ge 0\) and \(n \in \N\). Note first that \(P_0\) satisfies the following differential equation and initial condition: \[ P_0^\prime(t) = -r P_0(t), \; t \gt 0; \quad P_0(0) = 1 \] Hence \(P_0(t) = e^{-r t}\) for \(t \ge 0\). Next for \(n \in \N_+\), \(P_n\) satisfies the following differential equation and initial condition \[P_n^\prime(t) = -r P_n(t) + r P_{n-1}(t), \; t \gt 0; \quad P_n(0) = 0 \] Hence \(P_n(t) = e^{-r t} (r t)^n / n!\) for \(t \ge 0\) and therefore \(N_t\) has the Poisson distribution with parameter \(r t\).

Of course, part (a) is the stationary assumption and part (b) the independence assumption. The first limit in (c) is sometimes called the rate property and the second limit the sparseness property. In a small time interval of length \( dt \), the probability of a single random point is approximately \( r \, dt \), and the probability of two or more random points is negligible.

Sums

Suppose that \(N\) and \(M\) are independent random variables, and that \(N\) has the Poisson distribution with parameter \(a \in (0, \infty)\) and \(M\) has the Poisson distribution with parameter \(b \in (0, \infty)\). Then \(N + M\) has the Poisson distribution with parameter \(a + b\).

Proof from the Poisson process

There are several ways to prove this result, but the one that gives the most insight is a probabilistic proof based on the Poisson process. Thus suppose that \(\bs{N} = (N_t: t \ge 0)\) is a Poisson counting process with rate 1. We can associate \(N\) with \(N_a\) and \(M\) with \(N_{a + b} - N_a\), since these have the correct distributions and are independent. But then \(N + M\) is \(N_{a + b}\).

Proof from probability generating functions

From our result above, \( M \) has PGF \( P(s) = e^{a (s - 1)} \) for \( s \in \R \), and \( N \) has PGF \( Q(s) = e^{b(s-1)} \) for \( s \in \R \). Hence \( M + N \) has PGF \( P(s) Q(s) = e^{(a+b)(s-1)} \) for \( s \in \R \). But this is the PGF of the Poisson distribution with parameter \( a + b \).

Proof from convolution

From our results above, \( M \) has PDF \( g(n) = e^{-a} a^n / n! \) for \( n \in \N \), and \( N \) has PDF \( h(n) = e^{-b} b^n / n! \) for \( n \in \N \). Hence the PDF of \( M + N \) is the convolution \( g * h \). For \( n \in \N \), \[ (g * h)(n) = \sum_{k=0}^n g(k) h(n - k) = \sum_{k=0}^n e^{-a} \frac{a^k}{k!} e^{-b} \frac{b^{n-k}}{(n - k)!} = e^{-(a+b)} \frac{1}{n!} \sum_{k=0}^n \frac{n!}{k! (n - k)!} a^k b^{n-k}\] By the binomial theorem, the last sum is \( (a + b)^n \).

From the last theorem, it follows that the Poisson distribution is infinitely divisible. That is, a Poisson distributed variable can be written as the sum of an arbitrary number of independent, identically distributed (in fact also Poisson) variables.

Suppose that \( N \) has the Poisson distribution with parameter \( a \in (0, \infty) \). Then for \( n \in \N_+ \), \( N \) has the same distribution as \( \sum_{i=1}^n N_i \) where \( (N_1, N_2, \ldots, N_n) \) are independent, and each has the Poisson distribution with parameter \( a / n \).

Normal Approximation

Because of the representation as a sum of independent, identically distributed variables, it's not surprising that the Poisson distribution can be approximated by the normal.

Suppose that \(N_t\) has the Poisson distribution with parameter \(t \gt 0\). Then the distribution of the variable below converges to the standard normal distribution as \(t \to \infty\). \[ Z_t = \frac{N_t - t}{\sqrt{t}} \]

Proof

As usual, we can assume that \((N_t: t \ge 0)\) is the Poisson counting process with rate 1. Note that \(Z_t\) is simply the standard score associated with \(N_t\). For \(n \in \N_+\), \(N_n\) is the sum of \(n\) independent variables, each with the Poisson distribution with parameter 1. Thus, from the central limit theorem, the distribution of \(Z_n\) converges to the standard normal distribution as \(n \to \infty\). For general \(t \in [0, \infty)\), it's possible to write \(Z_t = Z_n + W_t\) where \(n = \lfloor t \rfloor\) and \(W_t \to 0\) as \(t \to \infty\) (in probability and hence in distribution).

Thus, if \(N\) has the Poisson distribution with parameter \(a\), and \(a\) is large, then the distribution of \(N\) is approximately normal with mean \(a\) and standard deviation \(\sqrt{a}\). When using the normal approximation, we should remember to use the continuity correction, since the Poisson is a discrete distribution.

In the Poisson experiment, set \(r = t = 1\). Increase \(t\) and note how the graph of the probability density function becomes more bell-shaped.

General Exponential

The Poisson distribution is a member of the general exponential family of distributions. This fact is important in various statistical procedures.

Suppose that \(N\) has the Poisson distribution with parameter \(a \in (0, \infty)\). This distribution is a one-parameter exponential family with natural parameter \(\ln(a)\) and natural statistic \(N\).

Proof

This follows from the form of the Poisson PDF: \[ g(n) = e^{-a} \frac{a^n}{n!} = \frac{e^{-a}}{n!} \exp\left[n \ln (a)\right], \quad n \in \N \]

The Uniform Distribution

The Poisson process has some basic connections to the uniform distribution. Consider again the Poisson process with rate \(r \gt 0\). As usual, \(\bs{T} = (T_0, T_1, \ldots)\) denotes the arrival time sequence and \(\bs{N} = (N_t: t \ge 0)\) the counting process.

For \(t \gt 0\), the conditional distribution of \(T_1\) given \(N_t = 1\) is uniform on the interval \((0, t]\).

Proof

Given \(N_t = 1\) (one arrival in \((0, t]\)) the arrival time \(T_1\) takes values in \((0, t]\). From independent and stationary increments properties, \[ \P(T_1 \le s \mid N_t = 1) = \P(N_s = 1, N_t - N_s = 0 \mid N_t = 1) = \frac{\P(N_s = 1, N_t - N_s = 0)}{\P(N_t = 0)} = \frac{\P(N_s = 1) \P(N_t - N_s = 0)}{\P(N_t = 1)}\] Hence using the Poisson distribution, \[ \P(T_1 \le s \mid N_t = 1) = \frac{e^{-r\,s} s e^{-r(t - s)}}{e^{-r\,t} t} = \frac{s}{t}, \quad 0 \lt s \le t \]

More generally, for \(t \gt 0\) and \(n \in \N_+\), the conditional distribution of \((T_1, T_2, \ldots, T_n)\) given \(N_t = n\) is the same as the distribution of the order statistics of a random sample of size \(n\) from the uniform distribution on the interval \((0, t]\).

Heuristic proof

Suppose that \( 0 \lt t_1 \lt t_2 \lt t_n \lt t \). On the event \( N_t = n \), the probability density of \( (T_1, T_2, \ldots, T_n) \) at \( (t_1, t_2, \ldots, t_n) \) is the probability density function of independent inter-arrival times \( \left(t_1, t_2 - t_1, \ldots, t_n - t_{n-1}\right) \) times the probability of no arrivals in the interval \((t_n, t) \). Hence given \( N_t = n \), the conditional density of \( (T_1, T_2, \ldots, T_n) \) at \( (t_1, t_2, \ldots, t_n) \) is \[ \frac{r e^{- r t_1} r e^{-r (t_2 - t_1)} \cdots r e^{-r(t_n - t_{n-1})} e^{-r(t - t_n)}}{e^{-r t} (r t)^n \big/n!} = \frac{n!}{t^n} \] But this is the PDF of the order statistics from a sample of size \( n \) from the uniform distribution on \( [0, t] \).

Note that the conditional distribution in the last result is independent of the rate \(r\). This means that, in a sense, the Poisson model gives the most random distribution of points in time.

The Binomial Distribution

The Poisson distribution has important connections to the binomial distribution. First we consider a conditional distribution based on the number of arrivals of a Poisson process in a given interval, as we did in the last subsection.

Suppose that \( (N_t: t \in [0, \infty)) \) is a Poisson counting process with rate \( r \in (0, \infty) \). If \( s, \, t \in (0, \infty) \) with \( s \lt t \), and \(n \in \N_+\), then the conditional distribution of \(N_s\) given \(N_t = n\) is binomial with trial parameter \(n\) and success parameter \(p = s / t\).

Proof

Note that given \(N_t = n\), the number of arrivals \(N_s\) in \((0, s]\) takes values in \(\{0, 1 \ldots, n\}\). Again, the stationary and independent increments properties are critical for the proof. \[ \P(N_s = k \mid N_t = n) = \frac{\P(N_s = k, N_t = n)}{\P(N_t = n)} = \frac{\P(N_s = k, N_t - N_s = n - k)}{\P(N_t = n)} = \frac{\P(N_s = k) \P(N_t - N_s = n - k)}{\P(N_t = n)} \] Subsitituting into the Poisson PDFs gives \[ \P(N_s = k \mid N_t = n) = \frac{\left(e^{-r s}(r s)^k / k!\right)\left(e^{-r(t-s)}[(r(t - s)]^{n-k}/(n - k)!\right)}{e^{-r t}(r t)^n / n!} = \frac{n!}{k! (n - k)!} \left(\frac{s}{t}\right)^k \left(1 - \frac{s}{t}\right)^{n-k}\]

Note again that the conditional distribution in the last result does not depend on the rate \(r\). Given \( N_t = n \), each of the \( n \) arrivals, independently of the others, falls into the interval \( (0, s] \) with probability \( s / t \) and into the interval \( (s, t] \) with probability \( 1 - s / t = (t - s) / t \). Here is essentially the same result, outside of the context of the Poisson process.

Suppose that \( M \) has the Poisson distribution with parameter \( a \in (0, \infty) \), \( N \) has the Poisson distribution with parameter \( b \in (0, \infty) \), and that \( M \) and \( N \) are independent. Then the conditional distribution of \( M \) given \( M + N = n \) is binomial with parameters \( n \) and \( p = a / (a + b) \).

Proof

The proof is essentially the same as the previous theorem, with minor modifications. First recall from the result above that \( M + N \) has the Poisson distribution with parameter \( a + b \). For \( n, \, k \in \N \) with \( k \le n \), \[ \P(M = k \mid M + N = n) = \frac{\P(M = k, M + N = n)}{\P(M + N = n)} = \frac{\P(M = k, N = n - k)}{\P(N + M = n)} = \frac{\P(M = k) \P(N = n - k)}{\P(M + N = n)} \] Subsitituting into the Poisson PDFs gives \[ \P(M = k \mid M + N = n) = \frac{\left(e^{-a} a^k / k!\right)\left(e^{-b} b^{n-k}/(n - k)!\right)}{e^{-(a + b)}(a + b)^n / n!} = \frac{n!}{k! (n - k)!} \left(\frac{a}{a + b}\right)^k \left(1 - \frac{a}{a + b}\right)^{n-k}\]

More importantly, the Poisson distribution is the limit of the binomial distribution in a certain sense. As we will see, this convergence result is related to the analogy between the Bernoulli trials process and the Poisson process that we discussed in the Introduction, the section on the inter-arrival times, and the section on the arrival times.

Suppose that \( p_n \in (0, 1) \) for \( n \in \N_+ \) and that \( n p_n \to a \in (0, \infty) \) as \( n \to \infty \). Then the binomial distribution with parameters \(n\) and \(p_n\) converges to the Poisson distribution with parameter \(a\) as \(n \to \infty\). That is, for fixed \(k \in \N\), \[ \binom{n}{k} p_n^k (1 - p_n)^{n-k} \to e^{-a} \frac{a^k}{k!} \text{ as } n \to \infty \]

Direct proof

The binomial PDF with parameters \( n \) and \( p_n \) at \( k \in \{0, 1, \ldots, n\} \) can be written as \[ \frac{1}{k!}\left[n p_n\right]\left[(n - 1) p_n\right] \cdots \left[(n - k + 1) p_n\right] \left(1 - n \frac{p_n}{n}\right)^{n-k} \] But \((n - j) p_n \to a\) as \(n \to \infty\) for fixed \(j\). Also, using a basic theorem from calculus, \(\left(1 - n p_n / n\right)^{n-k} \to e^{-a}\) as \(n \to \infty\).

Proof from generating functions

An easier proof uses probability generating functions. For \(s \in \R\), using the same basic limit from calculus, \[ \left[(1 - p_n) + p_n s\right]^n \to e^{a(s - 1)} \text{ as } n \to \infty \] The left side is the PGF of the binomial distribution with parameters \( n \) and \( p_n \), while the right side is the PGF of the Poisson distribution with parameter \( a \).

The mean and variance of the binomial distribution converge to the mean and variance of the limiting Poisson distribution, respectively.

\(n p_n \to a\) as \(n \to \infty\)
\(n p_n (1 - p_n) \to a\) as \(n \to \infty\)

Of course the convergence of the means is precisely our basic assumption, and is further evidence that this is the essential assumption. But for a deeper look, let's return to the analogy between the Bernoulli trials process and the Poisson process. Recall that both have the strong renewal property that at each fixed time, and at each arrival time, the process stochastically starts over, independently of the past. The difference, of course, is that time is discrete in the Bernoulli trials process and continuous in the Poisson process. The convergence result is a special case of the more general fact that if we run Bernoulli trials at a faster and faster rate but with a smaller and smaller success probability, in just the right way, the Bernoulli trials process converges to the Poisson process. Specifically, suppose that we have a sequence of Bernoulli trials processes. In process \(n\) we perform the trials at a rate of \(n\) per unit time, with success probability \(p_n\). Our basic assumption is that \(n p_n \to r\) as \(n \to \infty\) where \(r \gt 0\). Now let \(Y_{n,t}\) denote the number of successes in the time interval \((0,t]\) for Bernoulli trials process \(n\), and let \(N_t\) denote the number of arrivals in this interval for the Poisson process with rate \( r \). Then \( Y_{n,t} \) has the binomial distribution with parameters \(\lfloor n t \rfloor\) and \(p_n\), and of course \( N_t \) has the Poisson distribution with parameter \( r t \).

For \(t \gt 0\), the distribution of \(Y_{n,t}\) converges to the distribution of \(N_t\) as \(n \to \infty\).

Proof

Note that \( n t - 1 \lt \lfloor n t \rfloor \le n t \) and hence \( n p_n t - p_n \lt \lfloor n t \rfloor p_n \le n p_n t \). Since \( n p_n \to r \) and \( p_n \to 0 \) as \( n \to \infty \), it follows from the squeeze theorem for limits that \( \lfloor n t \rfloor p_n \to r t \) as \( n \to \infty \). Thus, the result follows from our previous convergence theorem.

Compare the Poisson experiment and the binomial timeline experiment.

Open the Poisson experiment and set \( r = 1 \) and \( t = 5 \). Run the experiment a few times and note the general behavior of the random points in time. Note also the shape and location of the probability density function and the mean\( \pm \)standard deviation bar.
Now open the binomial timeline experiment and set \( n = 100 \) and \( p = 0.05 \). Run the experiment a few times and note the general behavior of the random points in time. Note also the shape and location of the probability density function and the mean\( \pm \)standard deviation bar.

From a practical point of view, the convergence of the binomial distribution to the Poisson means that if the number of trials \(n\) is large and the probability of success \(p\) small, so that \(n p^2\) is small, then the binomial distribution with parameters \(n\) and \(p\) is well approximated by the Poisson distribution with parameter \(r = n p\). This is often a useful result, because the Poisson distribution has fewer parameters than the binomial distribution (and often in real problems, the parameters may only be known approximately). Specifically, in the approximating Poisson distribution, we do not need to know the number of trials \(n\) and the probability of success \(p\) individually, but only in the product \(n p\). The condition that \(n p^2\) be small means that the variance of the binomial distribution, namely \(n p (1 - p) = n p - n p^2\) is approximately \(r = n p\), the variance of the approximating Poisson distribution.

Recall that the binomial distribution can also be approximated by the normal distribution, by virtue of the central limit theorem. The normal approximation works well when \(n p\) and \(n (1 - p)\) are large; the rule of thumb is that both should be at least 5. The Poisson approximation works well, as we have already noted, when \(n\) is large and \( n p^2 \) small.

Computational Exercises

Suppose that requests to a web server follow the Poisson model with rate \(r = 5\) per minute. Find the probability that there will be at least 8 requests in a 2 minute period.

Answer

0.7798

Defects in a certain type of wire follow the Poisson model with rate 1.5 per meter. Find the probability that there will be no more than 4 defects in a 2 meter piece of the wire.

Answer

0.8153

Suppose that customers arrive at a service station according to the Poisson model, at a rate of \(r = 4\). Find the mean and standard deviation of the number of customers in an 8 hour period.

Answer

32, 5.657

In the Poisson experiment, set \(r = 5\) and \(t = 4\). Run the experiment 1000 times and compute the following:

\(\P(15 \le N_4 \le 22) \)
The relative frequency of the event \(\{15 \le N_4 \le 22\}\).
The normal approximation to \(\P(15 \le N_4 \le 22)\).

Answer

0.6157
0.6025

Suppose that requests to a web server follow the Poisson model with rate \(r = 5\) per minute. Compute the normal approximation to the probability that there will be at least 280 requests in a 1 hour period.

Answer

0.8818

Suppose that requests to a web server follow the Poisson model, and that 1 request comes in a five minute period. Find the probability that the request came during the first 3 minutes of the period.

Answer

0.6

Suppose that requests to a web server follow the Poisson model, and that 10 requests come during a 5 minute period. Find the probability that at least 4 requests came during the first 3 minutes of the period.

Answer

0.9452

In the Poisson experiment, set \(r = 3\) and \(t = 5\). Run the experiment 100 times.

For each run, compute the estimate of \(r\) based on \(N_t\).
Over the 100 runs, compute the average of the squares of the errors.
Compare the result in (b) with \( \var(N_t) \).

Suppose that requests to a web server follow the Poisson model with unknown rate \(r\) per minute. In a one hour period, the server receives 342 requests. Estimate \(r\).

Answer

\(r = 5.7\) per minute

In the binomial experiment, set \(n = 30\) and \(p = 0.1\), and run the simulation 1000 times. Compute and compare each of the following:

\(\P(Y_{30} \le 4)\)
The relative frequency of the event \(\{Y_{30} \le 4\}\)
The Poisson approximation to \(\P(Y_{30} \le 4)\)

Answer

0.8245
0.8153

Suppose that we have 100 memory chips, each of which is defective with probability 0.05, independently of the others. Approximate the probability that there are at least 3 defectives in the batch.

Answer

0.7350

In the binomial timeline experiment, set \(n = 40\) and \(p = 0.1\) and run the simulation 1000 times. Compute and compare each of the following:

\(\P(Y_{40} \gt 5)\)
The relative frequency of the event \(\{Y_{40} \gt 5\}\)
The Poisson approximation to \(\P(Y_{40} \gt 5)\)
The normal approximation to \(\P(Y_{40} \gt 5)\)

Answer

0.2063
0.2149
0.2146

In the binomial timeline experiment, set \(n = 100\) and \(p = 0.1\) and run the simulation 1000 times. Compute and compare each of the following:

\(\P(8 \lt Y_{100} \lt 15)\)
The relative frequency of the event \(\{8 \lt Y_{100} \lt 15\}\)
The Poisson approximation to \(\P(8 \lt Y_{100} \lt 15)\)
The normal approximation to \(\P(8 \lt Y_{100} \lt 15)\)

Answer

0.6066
0.5837
0.6247

A text file contains 1000 words. Assume that each word, independently of the others, is misspelled with probability \(p\).

If \(p = 0.015\), approximate the probability that the file contains at least 20 misspelled words.
If \(p = 0.001\), approximate the probability that the file contains at least 3 misspelled words.

Answer

The true distribution of the number of misspelled words is binomial, with \( n = 1000 \) and \( p \).

The normal approximation (with \( \mu = n p = 15 \) and \( \sigma^2 = n p (1 - p) = 14.775 \)) is 0.120858. The Poisson approximation (with parameter \( n p = 15 \)) is 0.124781. The true binomial probability is 0.123095.
The Poisson approximation (with parameter \( n p = 1 \)) is 0.0803014. The true binomial probability is 0.0802093.