14.2: The Exponential Distribution
Basic Theory
The Memoryless Property
Recall that in the basic model of the Poisson process, we have points
that occur randomly in time. The sequence of inter-arrival times is \(\bs{X} = (X_1, X_2, \ldots)\). The strong renewal assumption states that at each arrival time and at each fixed time, the process must probabilistically restart, independent of the past. The first part of that assumption implies that \(\bs{X}\) is a sequence of independent, identically distributed variables. The second part of the assumption implies that if the first arrival has not occurred by time \(s\), then the time remaining until the arrival occurs must have the same distribution as the first arrival time itself. This is known as the memoryless property and can be stated in terms of a general random variable as follows:
Suppose that \( X \) takes values in \( [0, \infty) \). Then \( X \) has the memoryless property if the conditional distribution of \(X - s\) given \(X \gt s\) is the same as the distribution of \(X\) for every \( s \in [0, \infty) \). Equivalently, \[ \P(X \gt t + s \mid X \gt s) = \P(X \gt t), \quad s, \; t \in [0, \infty) \]
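The memoryless property is easy to check by simulation. The following sketch (in Python, with an arbitrary rate \(r = 0.5\) and arbitrary values of \(s\) and \(t\)) estimates \(\P(X \gt t + s \mid X \gt s)\) and \(\P(X \gt t)\) and confirms that they agree up to simulation error:

```python
import random

# Illustrative sketch: the rate and the values of s and t are arbitrary choices.
r, s, t, n = 0.5, 2.0, 3.0, 100_000
sample = [random.expovariate(r) for _ in range(n)]  # expovariate takes the rate parameter

exceed_s = [x for x in sample if x > s]
p_cond = sum(x > s + t for x in exceed_s) / len(exceed_s)  # estimate of P(X > t + s | X > s)
p_uncond = sum(x > t for x in sample) / n                  # estimate of P(X > t)

print(f"conditional: {p_cond:.4f}, unconditional: {p_uncond:.4f}")
```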
The memoryless property determines the distribution of \(X\) up to a positive parameter, as we will see now.
Distribution functions
Suppose that \(X\) takes values in \( [0, \infty) \) and satisfies the memoryless property.
\(X\) has a continuous distribution and there exists \(r \in (0, \infty)\) such that the distribution function \(F\) of \(X\) is \[ F(t) = 1 - e^{-r\,t}, \quad t \in [0, \infty) \]
Proof
Let \(F^c = 1 - F\) denote the right-tail distribution function of \(X\) (also known as the reliability function), so that \(F^c(t) = \P(X \gt t)\) for \(t \ge 0\). From the definition of conditional probability, the memoryless property is equivalent to the law of exponents: \[ F^c(t + s) = F^c(s) F^c(t), \quad s, \; t \in [0, \infty) \] Let \(a = F^c(1)\). Implicit in the memoryless property is \(\P(X \gt t) \gt 0\) for \(t \in [0, \infty)\), so \(a \gt 0\). If \(n \in \N_+\) then \[ F^c(n) = F^c\left(\sum_{i=1}^n 1\right) = \prod_{i=1}^n F^c(1) = \left[F^c(1)\right]^n = a^n \] Next, if \(n \in \N_+\) then \[ a = F^c(1) = F^c\left(\frac{n}{n}\right) = F^c\left(\sum_{i=1}^n \frac{1}{n}\right) = \prod_{i=1}^n F^c\left(\frac{1}{n}\right) = \left[F^c\left(\frac{1}{n}\right)\right]^n \] so \(F^c\left(\frac{1}{n}\right) = a^{1/n}\). Now suppose that \(m \in \N\) and \(n \in \N_+\). Then \[ F^c\left(\frac{m}{n}\right) = F^c\left(\sum_{i=1}^m \frac{1}{n}\right) = \prod_{i=1}^m F^c\left(\frac{1}{n}\right) = \left[F^c\left(\frac{1}{n}\right)\right]^m = a^{m/n} \] Thus we have \(F^c(q) = a^q\) for rational \(q \in [0, \infty)\). For \(t \in [0, \infty)\), there exists a sequence of rational numbers \((q_1, q_2, \ldots)\) with \(q_n \downarrow t\) as \(n \uparrow \infty\). We have \(F^c(q_n) = a^{q_n}\) for each \(n \in \N_+\). But \(F^c\) is continuous from the right, so taking limits gives \(a^t = F^c(t) \). Now let \(r = -\ln(a)\). Then \(F^c(t) = e^{-r\,t}\) for \(t \in [0, \infty)\).
The probability density function of \(X\) is \[ f(t) = r \, e^{-r\,t}, \quad t \in [0, \infty) \]
- \( f \) is decreasing on \( [0, \infty) \).
- \( f \) is concave upward on \( [0, \infty) \).
- \( f(t) \to 0 \) as \( t \to \infty \).
Proof
This follows since \( f = F^\prime \). Parts (a)–(c) follow directly from the form of \( f \) by elementary calculus.
A random variable with the distribution function above, or equivalently the probability density function in the last theorem, is said to have the exponential distribution with rate parameter \(r\). The reciprocal \(\frac{1}{r}\) is known as the scale parameter (as will be justified below). Note that the mode of the distribution is 0, regardless of the parameter \( r \), so the mode is not very helpful as a measure of center.
In the gamma experiment, set \(n = 1\) so that the simulated random variable has an exponential distribution. Vary \(r\) with the scroll bar and watch how the shape of the probability density function changes. For selected values of \(r\), run the experiment 1000 times and compare the empirical density function to the probability density function.
The quantile function of \(X\) is \[ F^{-1}(p) = \frac{-\ln(1 - p)}{r}, \quad p \in [0, 1) \]
- The median of \(X\) is \(\frac{1}{r} \ln(2) \approx 0.6931 \frac{1}{r}\)
- The first quartile of \(X\) is \(\frac{1}{r}[\ln(4) - \ln(3)] \approx 0.2877 \frac{1}{r}\)
- The third quartile of \(X\) is \(\frac{1}{r} \ln(4) \approx 1.3863 \frac{1}{r}\)
- The interquartile range is \(\frac{1}{r} \ln(3) \approx 1.0986 \frac{1}{r}\)
Proof
The formula for \( F^{-1} \) follows easily from solving \( p = F(t) \) for \( t \) in terms of \( p \).
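As a quick numerical illustration (a sketch with the arbitrary choice \(r = 1\)), the quartiles above can be computed directly from the formula for \(F^{-1}\):

```python
import math

def exp_quantile(p, r):
    """Quantile function F^{-1}(p) = -ln(1 - p) / r of the exponential distribution."""
    return -math.log(1.0 - p) / r

r = 1.0
q1, q2, q3 = (exp_quantile(p, r) for p in (0.25, 0.5, 0.75))
print(q1, q2, q3, q3 - q1)  # approximately 0.2877, 0.6931, 1.3863, 1.0986 (each times 1/r)
```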
In the special distribution calculator, select the exponential distribution. Vary the scale parameter (which is \( 1/r \)) and note the shape of the distribution/quantile function. For selected values of the parameter, compute a few values of the distribution function and the quantile function.
Returning to the Poisson model, we have our first formal definition:
A process of random points in time is a Poisson process with rate \( r \in (0, \infty) \) if and only if the inter-arrival times are independent, and each has the exponential distribution with rate \( r \).
Constant Failure Rate
Suppose now that \(X\) has a continuous distribution on \([0, \infty)\) and is interpreted as the lifetime of a device. If \(F\) denotes the distribution function of \(X\), then \(F^c = 1 - F\) is the reliability function of \(X\). If \(f\) denotes the probability density function of \(X\) then the failure rate function \( h \) is given by \[ h(t) = \frac{f(t)}{F^c(t)}, \quad t \in [0, \infty) \] If \(X\) has the exponential distribution with rate \(r \gt 0\), then from the results above, the reliability function is \(F^c(t) = e^{-r t}\) and the probability density function is \(f(t) = r e^{-r t}\), so trivially \(X\) has constant rate \(r\). The converse is also true.
If \(X\) has constant failure rate \(r \gt 0\) then \(X\) has the exponential distribution with parameter \(r\).
Proof
Recall that in general, the distribution of a lifetime variable \(X\) is determined by the failure rate function \(h\). Specifically, if \(F^c = 1 - F\) denotes the reliability function, then \((F^c)^\prime = -f\), so \(-h = (F^c)^\prime / F^c\). Integrating and then taking exponentials gives \[ F^c(t) = \exp\left(-\int_0^t h(s) \, ds\right), \quad t \in [0, \infty) \] In particular, if \(h(t) = r\) for \(t \in [0, \infty)\), then \(F^c(t) = e^{-r t}\) for \(t \in [0, \infty)\).
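The identity \( F^c(t) = \exp\left(-\int_0^t h(s) \, ds\right) \) is easy to check numerically. The sketch below (with an arbitrary rate \(r = 1.5\)) approximates the integral of a constant failure rate with a simple Riemann sum and compares the result with \(e^{-r t}\):

```python
import math

def reliability_from_hazard(h, t, steps=10_000):
    """Approximate F^c(t) = exp(-integral of h over [0, t]) with a left Riemann sum."""
    dt = t / steps
    integral = sum(h(i * dt) for i in range(steps)) * dt
    return math.exp(-integral)

r, t = 1.5, 2.0
print(reliability_from_hazard(lambda s: r, t))  # numerical approximation
print(math.exp(-r * t))                          # exact value e^{-rt}
```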
The memoryless and constant failure rate properties are the most famous characterizations of the exponential distribution, but are by no means the only ones. Indeed, entire books have been written on characterizations of this distribution.
Moments
Suppose again that \(X\) has the exponential distribution with rate parameter \(r \gt 0\). Naturally, we want to know the mean, variance, and various other moments of \(X\).
If \(n \in \N\) then \(\E\left(X^n\right) = n! \big/ r^n\).
Proof
By the change of variables theorem for expected value, \[ \E\left(X^n\right) = \int_0^\infty t^n r e^{-r\,t} \, dt\] Integrating by parts gives \(\E\left(X^n\right) = \frac{n}{r} \E\left(X^{n-1}\right)\) for \(n \in \N_+\). Of course \(\E\left(X^0\right) = 1\) so the result now follows by induction.
More generally, \(\E\left(X^a\right) = \Gamma(a + 1) \big/ r^a\) for every \(a \in [0, \infty)\), where \(\Gamma\) is the gamma function.
In particular,
- \(\E(X) = \frac{1}{r}\)
- \(\var(X) = \frac{1}{r^2}\)
- \(\skw(X) = 2\)
- \(\kur(X) = 9\)
In the context of the Poisson process, the parameter \(r\) is known as the rate of the process. On average, there are \(1 / r\) time units between arrivals, so the arrivals come at an average rate of \(r\) per unit time. The Poisson process is completely determined by the sequence of inter-arrival times, and hence is completely determined by the rate \( r \).
Note also that the mean and standard deviation are equal for an exponential distribution, and that the median is always smaller than the mean. Recall also that skewness and kurtosis are standardized measures, and so do not depend on the parameter \(r\) (which is the reciprocal of the scale parameter).
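A Monte Carlo sketch (with the arbitrary choice \(r = 2\)) comparing the sample mean, standard deviation, skewness, and kurtosis with the values \(1/r\), \(1/r\), 2, and 9:

```python
import random, math

r, n = 2.0, 200_000
x = [random.expovariate(r) for _ in range(n)]

mean = sum(x) / n
var = sum((v - mean) ** 2 for v in x) / n
sd = math.sqrt(var)
skew = sum((v - mean) ** 3 for v in x) / n / sd ** 3
kurt = sum((v - mean) ** 4 for v in x) / n / sd ** 4

print(mean, sd)    # both approximately 1/r = 0.5
print(skew, kurt)  # approximately 2 and 9 (kurtosis here is the raw, not the excess, kurtosis)
```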
The moment generating function of \(X\) is \[ M(s) = \E\left(e^{s X}\right) = \frac{r}{r - s}, \quad s \in (-\infty, r) \]
Proof
By the change of variables theorem \[ M(s) = \int_0^\infty e^{s t} r e^{-r t} \, dt = \int_0^\infty r e^{(s - r)t} \, dt \] The integral evaluates to \( \frac{r}{r - s} \) if \( s \lt r \) and to \( \infty \) if \( s \ge r \).
In the gamma experiment, set \(n = 1\) so that the simulated random variable has an exponential distribution. Vary \(r\) with the scroll bar and watch how the mean\( \pm \)standard deviation bar changes. For various values of \(r\), run the experiment 1000 times and compare the empirical mean and standard deviation to the distribution mean and standard deviation, respectively.
Additional Properties
The exponential distribution has a number of interesting and important mathematical properties. First, and not surprisingly, it's a member of the general exponential family.
Suppose that \( X \) has the exponential distribution with rate parameter \( r \in (0, \infty) \). Then \( X \) has a one parameter general exponential distribution, with natural parameter \( -r \) and natural statistic \( X \).
Proof
This follows directly from the form of the PDF, \( f(x) = r e^{-r x} \) for \( x \in [0, \infty) \), and the definition of the general exponential family.
The Scaling Property
As suggested earlier, the exponential distribution is a scale family, and \(1/r\) is the scale parameter.
Suppose that \(X\) has the exponential distribution with rate parameter \(r \gt 0\) and that \(c \gt 0\). Then \(c X\) has the exponential distribution with rate parameter \(r / c\).
Proof
For \(t \ge 0\), \(\P(c\,X \gt t) = \P(X \gt t / c) = e^{-r (t / c)} = e^{-(r / c) t}\).
Recall that multiplying a random variable by a positive constant frequently corresponds to a change of units (minutes into hours for a lifetime variable, for example). Thus, the exponential distribution is preserved under such changes of units. In the context of the Poisson process, this has to be the case, since the memoryless property, which led to the exponential distribution in the first place, clearly does not depend on the time units.
In fact, the exponential distribution with rate parameter 1 is referred to as the standard exponential distribution. From the previous result, if \( Z \) has the standard exponential distribution and \( r \gt 0 \), then \( X = \frac{1}{r} Z \) has the exponential distribution with rate parameter \( r \). Conversely, if \( X \) has the exponential distribution with rate \( r \gt 0 \) then \( Z = r X \) has the standard exponential distribution.
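The following sketch simulates a standard exponential variable with the quantile method (\(Z = -\ln(1 - U)\) for uniform \(U\)), scales by \(1/r\) (with the arbitrary choice \(r = 3\)), and checks the mean and a tail probability against \(1/r\) and \(e^{-r}\):

```python
import random, math

def standard_exponential():
    """Simulate Z with the standard exponential distribution via the quantile method."""
    return -math.log(1.0 - random.random())

r, n = 3.0, 100_000
x = [standard_exponential() / r for _ in range(n)]  # X = Z / r has rate parameter r

print(sum(x) / n)                   # approximately 1/r
print(sum(v > 1.0 for v in x) / n)  # approximately e^{-r}
print(math.exp(-r))
```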
Similarly, the Poisson process with rate parameter 1 is referred to as the standard Poisson process. If \( Z_i \) is the \( i \)th inter-arrival time for the standard Poisson process for \( i \in \N_+ \), then letting \( X_i = \frac{1}{r} Z_i \) for \( i \in \N_+ \) gives the inter-arrival times for the Poisson process with rate \( r \). Conversely if \( X_i \) is the \( i \)th inter-arrival time of the Poisson process with rate \( r \gt 0 \) for \( i \in \N_+ \), then \( Z_i = r X_i \) for \( i \in \N_+ \) gives the inter-arrival times for the standard Poisson process.
Relation to the Geometric Distribution
In many respects, the geometric distribution is a discrete version of the exponential distribution. In particular, recall that the geometric distribution on \( \N_+ \) is the only distribution on \(\N_+\) with the memoryless and constant rate properties. So it is not surprising that the two distributions are also connected through various transformations and limits.
Suppose that \(X\) has the exponential distribution with rate parameter \(r \gt 0\). Then
- \(\lfloor X \rfloor\) has the geometric distribution on \(\N\) with success parameter \(1 - e^{-r}\).
- \(\lceil X \rceil\) has the geometric distribution on \(\N_+\) with success parameter \(1 - e^{-r}\).
Proof
- For \(n \in \N\) note that \(\P(\lfloor X \rfloor = n) = \P(n \le X \lt n + 1) = F(n + 1) - F(n)\). Substituting into the distribution function and simplifying gives \(\P(\lfloor X \rfloor = n) = (e^{-r})^n (1 - e^{-r})\).
- For \(n \in \N_+\) note that \(\P(\lceil X \rceil = n) = \P(n - 1 \lt X \le n) = F(n) - F(n - 1)\). Substituting into the distribution function and simplifying gives \(\P(\lceil X \rceil = n) = (e^{-r})^{n - 1} (1 - e^{-r})\).
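A short simulation sketch (with the arbitrary choice \(r = 0.7\)) comparing the empirical distribution of \(\lfloor X \rfloor\) with the geometric distribution on \(\N\) with success parameter \(p = 1 - e^{-r}\):

```python
import random, math
from collections import Counter

r, n = 0.7, 100_000
p = 1.0 - math.exp(-r)                                 # success parameter of the geometric distribution
floors = Counter(int(random.expovariate(r)) for _ in range(n))  # int() is the floor for nonnegative values

for k in range(5):
    empirical = floors[k] / n
    geometric = (1.0 - p) ** k * p                     # P(floor(X) = k) on {0, 1, 2, ...}
    print(k, round(empirical, 4), round(geometric, 4))
```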
The following connection between the two distributions is interesting by itself, but will also be very important in the section on splitting Poisson processes. In words, a random, geometrically distributed sum of independent, identically distributed exponential variables is itself exponential.
Suppose that \(\bs{X} = (X_1, X_2, \ldots)\) is a sequence of independent variables, each with the exponential distribution with rate \(r\). Suppose that \(U\) has the geometric distribution on \(\N_+\) with success parameter \(p\) and is independent of \(\bs{X}\). Then \(Y = \sum_{i=1}^U X_i\) has the exponential distribution with rate \(r p\).
Proof
Recall that the moment generating function of \(Y\) is \(P \circ M\) where \(M\) is the common moment generating function of the terms in the sum, and \(P\) is the probability generating function of the number of terms \(U\). But \(M(s) = r \big/ (r - s)\) for \(s \lt r\) and \(P(s) = p s \big/ \left[1 - (1 - p)s\right]\) for \(s \lt 1 \big/ (1 - p)\). Thus, \[ (P \circ M)(s) = \frac{p r \big/ (r - s)}{1 - (1 - p) r \big/ (r - s)} = \frac{pr}{pr - s}, \quad s \lt pr \] It follows that \(Y\) has the exponential distribution with parameter \(p r\).
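A simulation sketch of this result (with arbitrary choices \(r = 2\) and \(p = 0.3\)): the mean of the random sum should be close to \(1/(rp)\) and a tail probability close to \(e^{-rp t}\).

```python
import random, math

r, p, n = 2.0, 0.3, 100_000

def geometric_sum():
    """Sum a geometric (on N_+, success parameter p) number of exponential(r) terms."""
    u = 1
    while random.random() >= p:  # count trials until the first success
        u += 1
    return sum(random.expovariate(r) for _ in range(u))

y = [geometric_sum() for _ in range(n)]
print(sum(y) / n, 1.0 / (r * p))                       # mean should be approximately 1/(rp)
print(sum(v > 1.0 for v in y) / n, math.exp(-r * p))   # P(Y > 1) should be approximately e^{-rp}
```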
The next result explores the connection between the Bernoulli trials process and the Poisson process that was begun in the Introduction.
For \( n \in \N_+ \), suppose that \( U_n \) has the geometric distribution on \( \N_+ \) with success parameter \( p_n \), where \( n p_n \to r \gt 0 \) as \( n \to \infty \). Then the distribution of \( U_n / n \) converges to the exponential distribution with parameter \( r \) as \( n \to \infty \).
Proof
Let \( F_n \) denote the CDF of \( U_n / n \). Then for \( x \in [0, \infty) \) \[ F_n(x) = \P\left(\frac{U_n}{n} \le x\right) = \P(U_n \le n x) = \P\left(U_n \le \lfloor n x \rfloor\right) = 1 - \left(1 - p_n\right)^{\lfloor n x \rfloor} \] But by a famous limit from calculus, \( \left(1 - p_n\right)^n = \left(1 - \frac{n p_n}{n}\right)^n \to e^{-r} \) as \( n \to \infty \), and hence \( \left(1 - p_n\right)^{n x} \to e^{-r x} \) as \( n \to \infty \). But by definition, \( \lfloor n x \rfloor \le n x \lt \lfloor n x \rfloor + 1\) or equivalently, \( n x - 1 \lt \lfloor n x \rfloor \le n x \) so it follows that \( \left(1 - p_n \right)^{\lfloor n x \rfloor} \to e^{- r x} \) as \( n \to \infty \). Hence \( F_n(x) \to 1 - e^{-r x} \) as \( n \to \infty \), which is the CDF of the exponential distribution.
To understand this result more clearly, suppose that we have a sequence of Bernoulli trials processes. In process \( n \), we run the trials at a rate of \( n \) per unit time, with probability of success \( p_n \). Thus, the actual time of the first success in process \( n \) is \( U_n / n \). The last result shows that if \( n p_n \to r \gt 0 \) as \( n \to \infty \), then the sequence of Bernoulli trials processes converges to the Poisson process with rate parameter \( r \) as \( n \to \infty \). We will return to this point in subsequent sections.
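The convergence can be seen numerically. The sketch below (with the arbitrary choices \(r = 1.5\), \(x = 1\), and \(p_n = r/n\)) evaluates the exact distribution function of \(U_n / n\) and compares it with the exponential limit:

```python
import math

def cdf_un_over_n(x, n, r):
    """P(U_n / n <= x) where U_n is geometric on N_+ with success parameter p_n = r / n."""
    p_n = r / n
    return 1.0 - (1.0 - p_n) ** math.floor(n * x)

r, x = 1.5, 1.0
for n in (10, 100, 1000, 10_000):
    print(n, cdf_un_over_n(x, n, r))
print("limit:", 1.0 - math.exp(-r * x))  # exponential CDF at x
```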
Orderings and Order Statistics
Suppose that \(X\) and \(Y\) have exponential distributions with parameters \(a\) and \(b\), respectively, and are independent. Then \[ \P(X \lt Y) = \frac{a}{a + b} \]
Proof
This result can be proved in a straightforward way by integrating the joint PDF of \((X, Y)\) over \(\{(x, y): 0 \lt x \lt y \lt \infty\}\). A more elegant proof uses conditioning and the moment generating function above: \[ \P(Y \gt X) = \E\left[\P(Y \gt X \mid X)\right] = \E\left(e^{-b X}\right) = \frac{a}{a + b}\]
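A two-line Monte Carlo check of this formula (a sketch with arbitrary parameters \(a = 1\) and \(b = 3\)):

```python
import random

a, b, n = 1.0, 3.0, 200_000
count = sum(random.expovariate(a) < random.expovariate(b) for _ in range(n))
print(count / n, a / (a + b))  # both approximately 0.25
```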
The following theorem gives an important random version of the memoryless property.
Suppose that \(X\) and \(Y\) are independent variables taking values in \([0, \infty)\) and that \(Y\) has the exponential distribution with rate parameter \(r \gt 0\). Then \(X\) and \(Y - X\) are conditionally independent given \(X \lt Y\), and the conditional distribution of \(Y - X\) is also exponential with parameter \(r\).
Proof
Suppose that \(A \subseteq [0, \infty)\) (measurable of course) and \(t \ge 0\). Then \[ \P(X \in A, Y - X \ge t \mid X \lt Y) = \frac{\P(X \in A, Y - X \ge t)}{\P(X \lt Y)} \] But conditioning on \(X\) we can write the numerator as \[ \P(X \in A, Y - X \gt t) = \E\left[\P(X \in A, Y - X \gt t \mid X)\right] = \E\left[\P(Y \gt X + t \mid X), X \in A\right] = \E\left[e^{-r(t + X)}, X \in A\right] = e^{-rt} \E\left(e^{-r\,X}, X \in A\right) \] Similarly, conditioning on \(X\) gives \(\P(X \lt Y) = \E\left(e^{-r\,X}\right)\). Thus \[ \P(X \in A, Y - X \gt t \mid X \lt Y) = e^{-r\,t} \frac{\E\left(e^{-r\,X}, X \in A\right)}{\E\left(e^{-rX}\right)} \] Letting \(A = [0, \infty)\) we have \(\P(Y \gt t) = e^{-r\,t}\) so given \(X \lt Y\), the variable \(Y - X\) has the exponential distribution with parameter \(r\). Letting \(t = 0\), we see that given \(X \lt Y\), variable \(X\) has the distribution \[ A \mapsto \frac{\E\left(e^{-r\,X}, X \in A\right)}{\E\left(e^{-r\,X}\right)} \] Finally, because of the factoring, \(X\) and \(Y - X\) are conditionally independent given \(X \lt Y\).
For our next discussion, suppose that \(\bs{X} = (X_1, X_2, \ldots, X_n)\) is a sequence of independent random variables, and that \(X_i\) has the exponential distribution with rate parameter \(r_i \gt 0\) for each \(i \in \{1, 2, \ldots, n\}\).
Let \(U = \min\{X_1, X_2, \ldots, X_n\}\). Then \(U\) has the exponential distribution with parameter \(\sum_{i=1}^n r_i\).
Proof
Recall that in general, \(\{U \gt t\} = \{X_1 \gt t, X_2 \gt t, \ldots, X_n \gt t\}\) and therefore by independence, \(F^c(t) = F^c_1(t) F^c_2(t) \cdots F^c_n(t)\) for \(t \ge 0\), where \(F^c\) is the reliability function of \(U\) and \(F^c_i\) is the reliability function of \(X_i\) for each \(i\). When \(X_i\) has the exponential distribution with rate \(r_i\) for each \(i\), we have \(F^c(t) = \exp\left[-\left(\sum_{i=1}^n r_i\right) t\right]\) for \(t \ge 0\).
In the context of reliability, if a series system has independent components, each with an exponentially distributed lifetime, then the lifetime of the system is also exponentially distributed, and the failure rate of the system is the sum of the component failure rates. In the context of random processes, if we have \(n\) independent Poisson processes, then the new process obtained by combining the random points in time is also Poisson, and the rate of the new process is the sum of the rates of the individual processes (we will return to this point later).
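The sketch below (with arbitrary rates 0.5, 1, and 2.5) simulates the minimum of independent exponential variables and compares its tail probability with that of an exponential variable whose rate is the sum of the rates:

```python
import random, math

rates = [0.5, 1.0, 2.5]
n = 100_000
u = [min(random.expovariate(r) for r in rates) for _ in range(n)]

total_rate = sum(rates)
t = 0.4
print(sum(v > t for v in u) / n)   # empirical P(U > t)
print(math.exp(-total_rate * t))   # exponential tail with the summed rate
```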
Let \(V = \max\{X_1, X_2, \ldots, X_n\}\). Then \(V\) has distribution function \( F \) given by \[ F(t) = \prod_{i=1}^n \left(1 - e^{-r_i t}\right), \quad t \in [0, \infty) \]
Proof
Recall that in general, \(\{V \le t\} = \{X_1 \le t, X_2 \le t, \ldots, X_n \le t\}\) and therefore by independence, \(F(t) = F_1(t) F_2(t) \cdots F_n(t)\) for \(t \ge 0\), where \(F\) is the distribution function of \(V\) and \(F_i\) is the distribution function of \(X_i\) for each \(i\).
Consider the special case where \( r_i = r \in (0, \infty) \) for each \( i \in \{1, 2, \ldots, n\} \). In statistical terms, \(\bs{X}\) is a random sample of size \( n \) from the exponential distribution with parameter \( r \). From the last couple of theorems, the minimum \(U\) has the exponential distribution with rate \(n r\) while the maximum \(V\) has distribution function \(F(t) = \left(1 - e^{-r t}\right)^n\) for \(t \in [0, \infty)\). Recall that \(U\) and \(V\) are the first and last order statistics, respectively.
In the order statistic experiment, select the exponential distribution.
- Set \(k = 1\) (this gives the minimum \(U\)). Vary \(n\) with the scroll bar and note the shape of the probability density function. For selected values of \(n\), run the simulation 1000 times and compare the empirical density function to the true probability density function.
- Vary \(n\) with the scroll bar, set \(k = n\) each time (this gives the maximum \(V\)), and note the shape of the probability density function. For selected values of \(n\), run the simulation 1000 times and compare the empirical density function to the true probability density function.
Curiously, the distribution of the maximum of independent, identically distributed exponential variables is also the distribution of the sum of independent exponential variables, with rates that grow linearly with the index.
Suppose that \( r_i = i r \) for each \( i \in \{1, 2, \ldots, n\} \) where \( r \in (0, \infty) \). Then \( Y = \sum_{i=1}^n X_i \) has distribution function \( F \) given by \[ F(t) = (1 - e^{-r t})^n, \quad t \in [0, \infty) \]
Proof
By assumption, \( X_k \) has PDF \( f_k \) given by \( f_k(t) = k r e^{-k r t} \) for \( t \in [0, \infty) \). We want to show that \( Y_n = \sum_{i=1}^n X_i\) has PDF \( g_n \) given by \[ g_n(t) = n r e^{-r t} (1 - e^{-r t})^{n-1}, \quad t \in [0, \infty) \] The PDF of a sum of independent variables is the convolution of the individual PDFs, so we want to show that \[ f_1 * f_2 * \cdots * f_n = g_n, \quad n \in \N_+ \] The proof is by induction on \( n \). Trivially \( f_1 = g_1 \), so suppose the result holds for a given \( n \in \N_+ \). Then \begin{align*} g_n * f_{n+1}(t) & = \int_0^t g_n(s) f_{n+1}(t - s) ds = \int_0^t n r e^{-r s}(1 - e^{-r s})^{n-1} (n + 1) r e^{-r (n + 1) (t - s)} ds \\ & = r (n + 1) e^{-r(n + 1)t} \int_0^t n(1 - e^{-rs})^{n-1} r e^{r n s} ds \end{align*} Now substitute \( u = e^{r s} \) so that \( du = r e^{r s} ds \) or equivalently \(r ds = du / u\). After some algebra, \begin{align*} g_n * f_{n+1}(t) & = r (n + 1) e^{-r (n + 1)t} \int_1^{e^{rt}} n (u - 1)^{n-1} du \\ & = r(n + 1) e^{-r(n + 1) t}(e^{rt} - 1)^n = r(n + 1)e^{-rt}(1 - e^{-rt})^n = g_{n+1}(t) \end{align*}
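A simulation sketch of this curiosity (with the arbitrary choices \(r = 1\) and \(n = 4\)): the empirical distribution functions of the maximum of four independent exponential(\(r\)) variables and of the sum of independent exponentials with rates \(r, 2r, 3r, 4r\) should agree.

```python
import random

r, n_terms, n = 1.0, 4, 100_000

max_sample = [max(random.expovariate(r) for _ in range(n_terms)) for _ in range(n)]
sum_sample = [sum(random.expovariate(i * r) for i in range(1, n_terms + 1)) for _ in range(n)]

t = 1.5
print(sum(v <= t for v in max_sample) / n)  # empirical CDF of the maximum at t
print(sum(v <= t for v in sum_sample) / n)  # empirical CDF of the sum at t; both approximately (1 - e^{-rt})^4
```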
This result has an application to the Yule process, named for George Yule. The Yule process, which has some parallels with the Poisson process, is studied in the chapter on Markov processes. We can now generalize the order probability above:
For \(i \in \{1, 2, \ldots, n\}\), \[ \P\left(X_i \lt X_j \text{ for all } j \ne i\right) = \frac{r_i}{\sum_{j=1}^n r_j} \]
Proof
First, note that \(X_i \lt X_j\) for all \(i \ne j\) if and only if \(X_i \lt \min\{X_j: j \ne i\}\). But the minimum on the right is independent of \(X_i\) and, by the result on minimums above, has the exponential distribution with parameter \(\sum_{j \ne i} r_j\). The result now follows from the order probability for two variables above.
Suppose that for each \(i\), \(X_i\) is the time until an event of interest occurs (the arrival of a customer, the failure of a device, etc.) and that these times are independent and exponentially distributed. Then the first time \(U\) that one of the events occurs is also exponentially distributed, and the probability that the first event to occur is event \(i\) is proportional to the rate \(r_i\).
The probability of a total ordering is \[ \P(X_1 \lt X_2 \lt \cdots \lt X_n) = \prod_{i=1}^n \frac{r_i}{\sum_{j=i}^n r_j} \]
Proof
Let \( A = \left\{X_1 \lt X_j \text{ for all } j \in \{2, 3, \ldots, n\}\right\} \). Then \[ \P(X_1 \lt X_2 \lt \cdots \lt X_n) = \P(A, X_2 \lt X_3 \lt \cdots \lt X_n) = \P(A) \P(X_2 \lt X_3 \lt \cdots \lt X_n \mid A) \] But \( \P(A) = \frac{r_1}{\sum_{i=1}^n r_i} \) from the previous result, and \( \{X_2 \lt X_3 \lt \cdots \lt X_n\} \) is independent of \( A \). Thus we have \[ \P(X_1 \lt X_2 \lt \cdots \lt X_n) = \frac{r_1}{\sum_{i=1}^n r_i} \P(X_2 \lt X_3 \lt \cdots \lt X_n) \] so the result follows by induction.
Of course, the probabilities of other orderings can be computed by permuting the parameters appropriately in the formula on the right.
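A simulation sketch of the total ordering formula (with the arbitrary rates \(r_1 = 2\), \(r_2 = 1\), \(r_3 = 0.5\)):

```python
import random

rates = [2.0, 1.0, 0.5]  # r_1, r_2, r_3
n = 200_000

def ordered():
    x = [random.expovariate(r) for r in rates]
    return x[0] < x[1] < x[2]

empirical = sum(ordered() for _ in range(n)) / n
exact = 1.0
for i in range(len(rates)):
    exact *= rates[i] / sum(rates[i:])  # product of r_i / (r_i + ... + r_n)
print(empirical, exact)
```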
The result on minimums and the order probability result above are very important in the theory of continuous-time Markov chains. But for that application and others, it's convenient to extend the exponential distribution to two degenerate cases: point mass at 0 and point mass at \( \infty \) (so the first is the distribution of a random variable that takes the value 0 with probability 1, and the second the distribution of a random variable that takes the value \( \infty \) with probability 1). In terms of the rate parameter \( r \) and the distribution function \( F \), point mass at 0 corresponds to \( r = \infty \) so that \( F(t) = 1 \) for \( 0 \lt t \lt \infty \). Point mass at \( \infty \) corresponds to \( r = 0 \) so that \( F(t) = 0 \) for \( 0 \lt t \lt \infty \). The memoryless property, as expressed in terms of the reliability function \( F^c \), still holds for these degenerate cases on \( (0, \infty) \): \[ F^c(s) F^c(t) = F^c(s + t), \quad s, \, t \in (0, \infty) \] We also need to extend some of the results above for a finite number of variables to a countably infinite number of variables. So for the remainder of this discussion, suppose that \( \{X_i: i \in I\} \) is a countable collection of independent random variables, and that \( X_i \) has the exponential distribution with parameter \( r_i \in (0, \infty) \) for each \( i \in I \).
Let \( U = \inf\{X_i: i \in I\} \). Then \( U \) has the exponential distribution with parameter \( \sum_{i \in I} r_i \)
Proof
The proof is almost the same as the one above for a finite collection. Note that \( \{U \ge t\} = \{X_i \ge t \text{ for all } i \in I\} \) and so \[ \P(U \ge t) = \prod_{i \in I} \P(X_i \ge t) = \prod_{i \in I} e^{-r_i t} = \exp\left[-\left(\sum_{i \in I} r_i\right)t \right] \] If \( \sum_{i \in I} r_i \lt \infty \) then \( U \) has a proper exponential distribution with the sum as the parameter. If \( \sum_{i \in I} r_i = \infty \) then \( \P(U \ge t) = 0 \) for all \( t \in (0, \infty) \) so \( \P(U = 0) = 1 \).
For \(i \in I\), \[ \P\left(X_i \lt X_j \text{ for all } j \ne i\right) = \frac{r_i}{\sum_{j \in I} r_j} \]
Proof
First note that since the variables have continuous distributions and \( I \) is countable, \[ \P\left(X_i \lt X_j \text{ for all } j \in I - \{i\} \right) = \P\left(X_i \le X_j \text{ for all } j \in I - \{i\}\right)\] Next note that \(X_i \le X_j\) for all \(j \in I - \{i\}\) if and only if \(X_i \le U_i \) where \(U_i = \inf\left\{X_j: j \in I - \{i\}\right\}\). But \( U_i \) is independent of \(X_i\) and, by the previous result, has the exponential distribution with parameter \(s_i = \sum_{j \in I - \{i\}} r_j\). If \( s_i = \infty \), then \( U_i \) is 0 with probability 1, and so \( \P(X_i \le U_i) = 0 = r_i / s_i \). If \( s_i \lt \infty \), then \( X_i \) and \( U_i \) have proper exponential distributions, and so the result now follows from the order probability for two variables above.
We need one last result in this setting: a condition that ensures that the sum of an infinite collection of exponential variables is finite with probability one.
Let \( Y = \sum_{i \in I} X_i \) and \( \mu = \sum_{i \in I} 1 / r_i \). Then \( \mu = \E(Y) \) and \( \P(Y \lt \infty) = 1 \) if and only if \( \mu \lt \infty \).
Proof
The result is trivial if \( I \) is finite, so assume that \( I = \N_+ \). Recall that \( \E(X_i) = 1 / r_i \) and hence \( \mu = \E(Y) \). Trivially if \( \mu \lt \infty \) then \( \P(Y \lt \infty) = 1 \). Conversely, suppose that \( \P(Y \lt \infty) = 1 \). Then \( \P(e^{-Y} \gt 0) = 1 \) and hence \( \E(e^{-Y}) \gt 0 \). Using independence and the moment generating function above, \[ \E(e^{-Y}) = \E\left(\prod_{i=1}^\infty e^{-X_i}\right) = \prod_{i=1}^\infty \E(e^{-X_i}) = \prod_{i=1}^\infty \frac{r_i}{r_i + 1} \gt 0\] Next recall that if \( p_i \in (0, 1) \) for \( i \in \N_+ \) then \[ \prod_{i=1}^\infty p_i \gt 0 \text{ if and only if } \sum_{i=1}^\infty (1 - p_i) \lt \infty \] Hence it follows that \[ \sum_{i=1}^\infty \left(1 - \frac{r_i}{r_i + 1}\right) = \sum_{i=1}^\infty \frac{1}{r_i + 1} \lt \infty \] In particular, this means that \( 1/(r_i + 1) \to 0 \) as \( i \to \infty \) and hence \( r_i \to \infty \) as \( i \to \infty \). But then \[ \frac{1/(r_i + 1)}{1/r_i} = \frac{r_i}{r_i + 1} \to 1 \text{ as } i \to \infty \] By the comparison test for infinite series, it follows that \[ \mu = \sum_{i=1}^\infty \frac{1}{r_i} \lt \infty \]
Computational Exercises
Show directly that the exponential probability density function is a valid probability density function.
Solution
Clearly \( f(t) = r e^{-r t} \gt 0 \) for \( t \in [0, \infty) \). Simple integration shows that \[ \int_0^\infty r e^{-r t} \, dt = 1 \]
Suppose that the length of a telephone call (in minutes) is exponentially distributed with rate parameter \(r = 0.2\). Find each of the following:
- The probability that the call lasts between 2 and 7 minutes.
- The median, the first and third quartiles, and the interquartile range of the call length.
Answer
Let \(X\) denote the call length.
- \(\P(2 \lt X \lt 7) = 0.4237\)
- \(q_1 = 1.4384\), \(q_2 = 3.4657\), \(q_3 = 6.9315\), \(q_3 - q_1 = 5.4931\)
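The answers above can be reproduced with a short computation (a sketch using the distribution function and quantile function formulas from this section):

```python
import math

r = 0.2
cdf = lambda t: 1.0 - math.exp(-r * t)
quantile = lambda p: -math.log(1.0 - p) / r

print(cdf(7) - cdf(2))                  # P(2 < X < 7), approximately 0.4237
print(quantile(0.25), quantile(0.5), quantile(0.75))
print(quantile(0.75) - quantile(0.25))  # interquartile range, approximately 5.4931
```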
Suppose that the lifetime of a certain electronic component (in hours) is exponentially distributed with rate parameter \(r = 0.001\). Find each of the following:
- The probability that the component lasts at least 2000 hours.
- The median, the first and third quartiles, and the interquartile range of the lifetime.
Answer
Let \(T\) denote the lifetime
- \(\P(T \ge 2000) = 0.1353\)
- \(q_1 = 287.682\), \(q_2 = 693.147\), \(q_3 = 1386.294\), \(q_3 - q_1 = 1098.612\)
Suppose that the time between requests to a web server (in seconds) is exponentially distributed with rate parameter \(r = 2\). Find each of the following:
- The mean and standard deviation of the time between requests.
- The probability that the time between requests is less that 0.5 seconds.
- The median, the first and third quartiles, and the interquartile range of the time between requests.
Answer
Let \(T\) denote the time between requests.
- \(\E(T) = 0.5\), \(\sd(T) = 0.5\)
- \(\P(T \lt 0.5) = 0.6321\)
- \(q_1 = 0.1438\), \(q_2 = 0.3466\), \(q_3 = 0.6931\), \(q_3 - q_1 = 0.5493\)
Suppose that the lifetime \(X\) of a fuse (in 100 hour units) is exponentially distributed with \(\P(X \gt 10) = 0.8\). Find each of the following:
- The rate parameter.
- The mean and standard deviation.
- The median, the first and third quartiles, and the interquartile range of the lifetime.
Answer
Let \(X\) denote the lifetime.
- \(r = 0.02231\)
- \(\E(X) = 44.814\), \(\sd(X) = 44.814\)
- \(q_1 = 12.8922\), \(q_2 = 31.0628\), \(q_3 = 62.1257\), \(q_3 - q_1 = 49.2334\)
The position \(X\) of the first defect on a digital tape (in cm) has the exponential distribution with mean 100. Find each of the following:
- The rate parameter.
- The probability that \(X \lt 200\) given \(X \gt 150\).
- The standard deviation.
- The median, the first and third quartiles, and the interquartile range of the position.
Answer
Let \(X\) denote the position of the first defect.
- \(r = 0.01\)
- \(\P(X \lt 200 \mid X \gt 150) = 0.3935\)
- \(\sd(X) = 100\)
- \(q_1 = 28.7682\), \(q_2 = 69.3147\), \(q_3 = 138.6294\), \(q_3 - q_1 = 109.8612\)
Suppose that \( X, \, Y, \, Z \) are independent, exponentially distributed random variables with respective parameters \( a, \, b, \, c \in (0, \infty) \). Find the probability of each of the 6 orderings of the variables.
Answer
- \( \P(X \lt Y \lt Z) = \frac{a}{a + b + c} \frac{b}{b + c} \)
- \( \P(X \lt Z \lt Y) = \frac{a}{a + b + c} \frac{c}{b + c} \)
- \( \P(Y \lt X \lt Z) = \frac{b}{a + b + c} \frac{a}{a + c} \)
- \( \P(Y \lt Z \lt X) = \frac{b}{a + b + c} \frac{c}{a + c} \)
- \( \P(Z \lt X \lt Y) = \frac{c}{a + b + c} \frac{a}{a + b} \)
- \( \P(Z \lt Y \lt X) = \frac{c}{a + b + c} \frac{b}{a + b} \)
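These six probabilities can also be checked by simulation. A sketch with the arbitrary parameters \(a = 1\), \(b = 2\), \(c = 3\), comparing the empirical frequency of one ordering with its formula:

```python
import random

a, b, c, n = 1.0, 2.0, 3.0, 300_000
counts = {"XYZ": 0, "XZY": 0, "YXZ": 0, "YZX": 0, "ZXY": 0, "ZYX": 0}

for _ in range(n):
    x, y, z = random.expovariate(a), random.expovariate(b), random.expovariate(c)
    order = "".join(name for _, name in sorted([(x, "X"), (y, "Y"), (z, "Z")]))
    counts[order] += 1

print(counts["XYZ"] / n, a / (a + b + c) * b / (b + c))  # compare with part (a)
```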