# 5.37: The Wald Distribution

- Page ID
- 10470

The Wald distribution, named for Abraham Wald, is important in the study of Brownian motion. Specifically, the distribution governs the first time that a Brownian motion with positive drift hits a fixed, positive value. In Brownian motion, the distribution of the *random position* at a *fixed time* has a normal (Gaussian) distribution, and thus the Wald distribution, which governs the *random time* at a *fixed position*, is sometimes called the inverse Gaussian distribution.

## The Basic Wald Distribution

### Distribution Functions

As usual, let \( \Phi \) denote the standard normal distribution function.

The basic Wald distribution with shape parameter \( \lambda \in (0, \infty) \) is a continuous distribution on \( (0, \infty) \) with distribution function \( G \) given by \[ G(u) = \Phi\left[\sqrt{\frac{\lambda}{u}}(u - 1)\right] + e^{2 \lambda} \Phi\left[-\sqrt{\frac{\lambda}{u}} (u + 1)\right], \quad u \in (0, \infty) \] The special case \( \lambda = 1 \) gives the standard Wald distribution.

## Proof

Note that as \( u \to \infty \), \( \sqrt{\frac{\lambda}{u}}(u - 1) \to \infty \) and \( -\sqrt{\frac{\lambda}{u}}(u + 1) \to -\infty \), and hence \( G(u) \to 1 \). As \( u \downarrow 0 \), \(\sqrt{\frac{\lambda}{u}}(u - 1) \to -\infty \) and \( -\sqrt{\frac{\lambda}{u}}(u + 1) \to -\infty\), and hence \( G(u) \to 0 \). Of course, \( G \) is clearly continuous on \( (0, \infty) \), so it remains to show that \( G \) is increasing on this interval. Differentiating gives \[ G^\prime(u) = \phi\left[\sqrt{\frac{\lambda}{u}}(u - 1)\right] \left[\frac{\sqrt{\lambda}}{2}\left(u^{-1/2} + u^{-3/2}\right)\right] + e^{2 \lambda} \phi\left[-\sqrt{\frac{\lambda}{u}}(u + 1)\right] \left[-\frac{\sqrt{\lambda}}{2}\left(u^{-1/2} - u^{-3/2}\right)\right] \] where \( \phi(z) = \frac{1}{\sqrt{2 \pi}} e^{-z^2/2} \) is the standard normal PDF. Simple algebra shows that \[ \phi\left[\sqrt{\frac{\lambda}{u}}(u - 1)\right] = e^{2 \lambda} \phi\left[-\sqrt{\frac{\lambda}{u}}(u + 1)\right] = \frac{1}{\sqrt{2 \pi}} \exp\left[-\frac{\lambda}{2 u} (u - 1)^2\right]\] so simplifying further gives \[ G^\prime(u) = \sqrt{\frac{\lambda}{2 \pi}} u^{-3/2} \exp\left[-\frac{\lambda}{2 u} (u - 1)^2\right] \gt 0, \quad u \in (0, \infty)\]

The probability density function \( g \) is given by \[ g(u) = \sqrt{\frac{\lambda}{2 \pi u^3}} \exp\left[-\frac{\lambda}{2 u}(u - 1)^2\right], \quad u \in (0, \infty) \]

- \( g \) increases and then decreases with mode \( u_0 = \sqrt{1 + \left(\frac{3}{2 \lambda}\right)^2} - \frac{3}{2 \lambda} \)
- \( g \) is concave upward then downward then upward again.

## Proof

The formula for the PDF follows immediately from the proof of the CDF above, since \( g = G^\prime \). The first order properties come from \[ g^\prime(u) = \sqrt{\frac{\lambda}{8 \pi u^7}} \exp\left[-\frac{\lambda}{2 u}(u - 1)^2\right] \left[\lambda(1 - u^2) - 3 u\right], \quad u \in (0, \infty) \] and the second order properties from \[ g^{\prime\prime}(u) = \sqrt{\frac{\lambda}{32 \pi u^{11}}} \exp\left[-\frac{\lambda}{2 u}(u - 1)^2\right]\left[15 u^2 + \lambda^2 (u^2 - 1)^2 + 2 \lambda u(3 u^2 - 5)\right], \quad u \in (0, \infty) \]

So \( g \) has the classic unimodal shape, but the inflection points are very complicated functions of \( \lambda \). For the mode, note that \( u_0 \downarrow 0 \) as \( \lambda \downarrow 0 \) and \( u_0 \uparrow 1 \) as \( \lambda \uparrow \infty \). The probability density function of the standard Wald distribution is \[ g(u) = \sqrt{\frac{1}{2 \pi u^3}} \exp\left[-\frac{1}{2 u}(u - 1)^2\right], \quad u \in (0, \infty) \]

Open the special distribution simulator and select the Wald distribution. Vary the shape parameter and note the shape of the probability density function. For various values of the parameter, run the simulation 1000 times and compare the empirical density function to the probability density function.

The quantile function of the standard Wald distribution does not have a simple closed form, so the median and other quantiles must be approximated.

Open the special distribution calculator and select the Wald distribution. Vary the shape parameter and note the shape of the distribution and probability density functions. For selected values of the parameter, compute approximate values of the first quartile, the median, and the third quartile.

### Moments

Suppose that random variable \( U \) has the standard Wald distribution with shape parameter \( \lambda \in (0, \infty) \).

\( U \) has moment generating function \( m \) given by \[ m(t) = \E\left(e^{t U}\right) = \exp\left[\lambda \left(1 - \sqrt{1 - \frac{2 t}{\lambda}}\right)\right], \quad t \lt \frac{\lambda}{2} \]

## Proof

The proof requires some facts about the modified Bessel function of the second kind, denoted \( K_\alpha \) where the parameter \( \alpha \in \R \). This function is one of the two linearly independent solutions of the differential equation \[ x^2 \frac{d^2 y}{d x^2} + x \frac{d y}{dx} - (x^2 + \alpha^2) y = 0 \] The other solution, appropriately enough, is the modified Bessel function of the first kind. The function of the second kind, the one that we care about here, is the solution that decays exponentially as \( x \to \infty \). The first fact we need is that \[ K_{-1/2}(x) = \frac{1}{\sqrt{x}} e^{-x}, \quad x \in (0, \infty) \] which you can verify be direct substitution into the differential equation. The second fact that we need is the identity \[ \int_0^\infty x^{p-1} \exp\left[-\frac{1}{2}\left(a x + \frac{b}{x}\right)\right] dx = \frac{2 K_p(\sqrt{a b})}{(a / b)^{p/2}}, \quad a, \, b \in (0, \infty); \; p \in \R \] Now, for the moment generating function of \( U \) we have \[m(t) = \int_0^\infty e^{t x} \sqrt{\frac{\lambda}{2 \pi x^3}} \exp\left[-\frac{\lambda}{2x}(x - 1)^2\right] dx\] Combining the exponentials and doing some algebra, we can rewrite this as \[ m(t) = \sqrt{\frac{\lambda}{2 \pi}} e^\lambda \int_0^\infty x^{-3/2} \exp\left[-\frac{1}{2}(\lambda - 2 t) x - \frac{1}{2} \frac{\lambda}{x}\right] dx \] The integral now has the form of the identity given above with \( p = -1/2 \), \( a = \lambda - 2 t \), and \( b = \lambda \). Hence we have \[ m(t) = \sqrt{\frac{\lambda}{2 \pi}} e^\lambda \frac{2 K_{-1/2}\left[\sqrt{\lambda (\lambda - 2 t)}\right]}{[(\lambda - 2 t) / \lambda]^{-1/4}} \] Using the explicit form of \( K_{-1/2} \) given above and doing more algebra we get \[ m(t) = \exp\left[\lambda - \sqrt{\lambda (\lambda - 2 t)}\right] = \exp\left[\lambda\left(1 - \sqrt{1 - \frac{2 t}{\lambda}}\right)\right] \]

Since the moment generating function is finite in an interval containing 0, the basic Wald distribution has moments of all orders.

The mean and variance of \( U \) are

- \( \E(U) = 1 \)
- \( \var(U) = \frac{1}{\lambda} \)

## Proof

Differentiating gives \begin{align} m^\prime(t) & = m(t) \left(1 - \frac{2 t}{\lambda}\right)^{-1/2} \\ m^{\prime\prime}(t) &= m(t) \left[\left(1 - \frac{2 t}{\lambda}\right)^{-1} + \frac{1}{\lambda} \left(1 - \frac{2 t}{\lambda}\right)^{-3/2}\right] \end{align} and hence \( \E(U) = m^\prime(0) = 1 \) and \( \E\left(U^2\right) = m^{\prime\prime}(0) = 1 + \frac{1}{\lambda} \).

So interestingly, the mean is 1 for all values of the shape parameter, while \( \var(U) \to \infty \) as \( \lambda \downarrow 0 \) and \( \var(U) \to 0 \) as \( \lambda \to \infty \).

Open the special distribution simulator and select the Wald distribution. Vary the shape parameter and note the size and location of the mean \( \pm \) standard deviation bar. For various values of the parameters, run the simulation 1000 times and compare the empirical mean and standard deviation to the distribution mean and standard deviation.

The skewness and kurtosis of \( U \) are

- \( \skw(U) = 3 / \sqrt{\lambda} \)
- \( \kur(U) = 3 + 15 / \lambda \)

## Proof

The main tool is the differential equation for the moment generating function that we used in computing the mean and variance: \[ m^\prime(t) = m(t) \left(1 - \frac{2 t}{\lambda}\right)^{-1/2} \] Using this recursively, we can find the first four moments of \( U \). We already know the first two: \( m^\prime(0) = \E(U) = 1 \), \( m^{\prime \prime}(0) = \E(U^2) = 1 + 1 / \lambda \). The third and fourth are \begin{align} m^{(3)}(0) = & E(U^3) = 1 + 3 / \lambda + 3 / \lambda^2 \\ m^{(4)}(0) = & E(U^4) = 1 + 6 / \lambda + 15 / \lambda^2 + 15 / \lambda^3 \end{align} The results then follow from the standard computational formulas for the skewness and kurtosis in terms of the moments.

It follows that the excess kurtosis is \( \kur(U) - 3 = 15 / \lambda \). Note that \( \skw(U) \to \infty \) as \( \lambda \to 0 \) and \( \skw(U) \to 0 \) as \( \lambda \to \infty \). Similarly, \( \kur(U) \to \infty \) as \( \lambda \to 0 \) and \( \kur(U) \to 3 \) as \( \lambda \to \infty \).

## The General Wald Distribution

The basic Wald distribution is generalized into a scale family. Scale parameters often correspond to a change of units, and so are of basic importance.

Suppose that \( \lambda, \, \mu \in (0, \infty) \) and that \( U \) has the basic Wald distribution with shape parameter \( \lambda / \mu \). Then \( X = \mu U \) has the Wald distribution with shape parameter \( \lambda \) and mean \( \mu \).

Justification for the name of the parameter \( \mu \) as the *mean* is given below. Note that the generalization is consistent—when \( \mu = 1 \) we have the basic Wald distribution with shape parameter \( \lambda \).

### Distribution Functions

Suppose that \( X \) has the Wald distribution with shape parameter \( \lambda \in (0, \infty) \) and mean \( \mu \in (0, \infty) \). Again, we let \( \Phi \) denote the standard normal distribution function.

\( X \) has distribution function \( F \) given by \[ F(x) = \Phi\left[\sqrt{\frac{\lambda}{x}} \left(\frac{x}{\mu} - 1\right)\right] + \exp\left(\frac{2 \lambda}{\mu}\right) \Phi\left[-\sqrt{\frac{\lambda}{x}} \left(\frac{x}{\mu} + 1\right)\right], \quad x \in (0, \infty) \]

## Proof

Recall that the CDF \( F \) of \( X \) is related to the CDF \( G \) of \( U \) by \[ F(x) = G\left(\frac{x}{\mu}\right), \quad x \in (0, \infty) \] so the result follows from the CDF above, with \( \lambda \) replaced by \( \lambda / \mu \), and \( x \) with \( x / \mu \).

\( X \) has probability density function \( f \) given by \[ f(x) = \sqrt{\frac{\lambda}{2 \pi x^3}} \exp\left[\frac{-\lambda (x - \mu)^2}{2 \mu^2 x}\right], \quad x \in (0, \infty) \]

- \( f \) increases and then decreases with mode \( x_0 = \mu\left[\sqrt{1 + \left(\frac{3 \mu}{2 \lambda}\right)^2} - \frac{3 \mu}{2 \lambda}\right] \)
- \( f \) is concave upward then downward then upward again.

## Proof

Recall that the PDF \( f \) of \( X \) is related to the PDF \( g \) of \( U \) by \[ f(x) = \frac{1}{\mu} g\left(\frac{x}{\mu}\right), \quad x \in (0, \infty) \] Hence the result follows from the PDF above with \( \lambda \) replaced by \( \lambda / \mu \) and \( x \) with \( x / \mu \).

Once again, the graph of \( f \) has the classic unimodal shape, but the inflection points are complicated functions of the parameters.

Open the special distribution simulator and select the Wald distribution. Vary the parameters and note the shape of the probability density function. For various values of the parameters, run the simulation 1000 times and compare the empirical density function to the probability density function.

Again, the quantile function cannot be expressed in a simple closed form, so the median and other quantiles must be approximated.

Open the special distribution calculator and select the Wald distribution. Vary the parameters and note the shape of the distribution and density functions. For selected values of the parameters, compute approximate values of the first quartile, the median, and the third quartile.

### Moments

Suppose again that \( X \) has the Wald distribution with shape parameter \( \lambda \in (0, \infty) \) and mean \( \mu \in (0, \infty) \). By definition, we can take \( X = \mu U \) where \( U \) has the basic Wald distribution with shape parameter \( \lambda / \mu \).

\( X \) has moment generating function \( M \) given by \[ M(t) = \exp\left[\frac{\lambda}{\mu} \left(1 - \sqrt{1 - \frac{2 \mu^2 t}{\lambda}}\right)\right], \quad t \lt \frac{\lambda}{2 \mu^2} \]

## Proof

Recall that the MGF \( M \) of \( X \) is related to the MGF \( m \) of \( U \) by \( M(t) = m(t \mu) \). Hence the result follows from the result MFG above with \( \lambda \) replaced by \( \lambda / \mu \) and \( t \) with \( t \mu \).

As promised, the parameter \( \mu \) is the mean of Wald distribution.

The mean and variance of \( X \) are

- \( \E(X) = \mu \)
- \( \var(X) = \mu^3 / \lambda \)

## Proof

From the results for the mean and variance above and basic properties of expected value and variance, we have \( \E(X) = \mu \E(U) = \mu \cdot 1 \) and \( \var(X) = \mu^2 \var(U) = \mu^2 \frac{\mu}{\lambda} \).

Open the special distribution simulator and select the Wald distribution. Vary the parameters and note the size and location of the mean \( \pm \) standard deviation bar. For various values of the parameters, run the simulation 1000 times and compare the empirical mean and standard deviation to the distribution mean and standard deviation.

The skewness and kurtosis of \( X \) are

- \( \skw(X) = 3 \sqrt{\mu / \lambda} \)
- \( \kur(U) = 3 + 15 \mu / \lambda \)

## Proof

Skewness and kurtosis are invariant under scale transformations, so \( \skw(X) = \skw(U) \) and \( \kur(X) = \kur(U) \). The results then follow from the skewness and kurtosis above, with \( \lambda \) replaced by \( \lambda / \mu \).

### Related Distribution

As noted earlier, the Wald distribution is a scale family, although neither of the parameters is a scale parameter.

Suppose that \( X \) has the Wald distribution with shape parameters \( \lambda \in (0, \infty) \) and mean \( \mu \in (0, \infty) \) and that \( c \in (0, \infty) \). Then \( Y = c X \) has the Wald distribution with shape parameter \( c \lambda \) and mean \( c \mu \).

## Proof

By definition, we can take \( X = \mu U \) where \( U \) has the basic Wald distribution with shape parameter \( \lambda / \mu \). Then \( Y = c X = (c \mu) U \). Since \( U \) has shape parameter \( c \lambda / c \mu \), the result follows from the definition.

For the next result, it's helpful to re-parameterize the Wald distribution with the mean \( \mu \) and the ratio \( r = \lambda / \mu^2 \in (0, \infty) \). This parametrization is clearly equivalent, since we can recover the shape parameter from the mean and ratio as \( \lambda = r \mu^2 \). Note also that \( r = \E(X) \big/ \var(X) \), the ratio of the mean to the variance. Finally, note that the moment generating function above becomes \[ M(t) = \exp\left[r \mu \left(1 - \sqrt{1 - \frac{2}{r}t}\right)\right], \quad t \lt \frac{r}{2} \] and of course, this function characterizes the Wald distribution with this parametrization. Our next result is that the Wald distribution is closed under convolution (corresponding to sums of independent variables) when the ratio is fixed.

Suppose that \( X_1 \) has the Wald distribution with mean \( \mu_1 \in (0, \infty) \) and ratio \( r \in (0, \infty) \); \( X_2 \) has the Wald distribution with mean \( \mu_2 \in (0, \infty) \) and ratio \( r \); and that \( X_1 \) and \( X_2 \) are independent. Then \( Y = X_1 + X_2 \) has the Wald distribution with mean \( \mu_1 + \mu_2 \) and ratio \( r \).

## Proof

For \( i \in \{1, 2\} \), the MGF of \( X_i \) is \[ M_i(t) = \exp\left[r \mu_i \left(1 - \sqrt{1 - \frac{2}{r}t}\right)\right], \quad t \lt \frac{r}{2} \] Hence the MGF of \( Y = X_i + X_2 \) is \[ M(t) = M_1(t) M_2(t) = \exp\left[r\left(\mu_1 + \mu_2\right) \left(1 - \sqrt{1 - \frac{2}{r}t}\right)\right], \quad t \lt \frac{r}{2} \] Hence \( Y \) has the Wald distribution with mean \( \mu_1 + \mu_2 \) and ratio \( r \).

In the previous result, note that the shape parameter of \( X_1 \) is \(r \mu_1^2\), the shape parameter of \( X_2 \) is \( r \mu_2^2 \), and the shape parameter of \( Y \) is \( \lambda = r (\mu_1 + \mu_2)^2 \). Also, of course, the result generalizes to a sum of any finite number of independent Wald variables, as long as the ratio is fixed. The next couple of results are simple corollaries.

Suppose that \( (X_1, X_2, \ldots, X_n) \) is a sequence of independent variables, each with the Wald distribution with shape parameter \( \lambda \in (0, \infty) \) and mean \( \mu \in (0, \infty) \). Then

- \( Y_n = \sum_{i=1}^n X_i \) has the Wald distribution with shape parameter \( n^2 \lambda \) and mean \( n \mu \).
- \( M_n = \frac{1}{n} \sum_{i=1}^n X_i \) has the Wald distribution with shape parameter \( n \lambda \) and mean \( \mu \).

## Proof

- This follows from the previous result and induction. The mean of \( Y_n \) of course is \( n \mu \). The common ratio is \( r = \lambda / \mu^2 \), and hence the shape parameter of \( Y_n \) is \( (\lambda / \mu^2) (n \mu)^2 = n^2 \lambda \).
- This follows from (a) and the scaling result above. The mean of \( M_n \) of course is \( \mu \) and the shape parameter is \( (1 / n) (n^2 \lambda) = n \lambda \).

In the context of the previous theorem, \( (X_1, X_2, \ldots, X_n) \) is a random sample of size \( n \) from the Wald distribution, and \( \frac{1}{n} \sum_{i=1}^n X_i \) is the sample mean. The Wald distribution is infinitely divisible:

Suppose that \( X \) has the Wald distribution with shape parameter \( \lambda \in (0, \infty) \) and mean \( \mu \in (0, \infty)\). For every \( n \in \N_+ \), \( X \) has the same distribution as \( \sum_{i=1}^n X_i \) where \( (X_1, X_2, \ldots, X_n) \) are independent, and each has the Wald distribution with shape parameter \( \lambda / n^2 \) and mean \( \mu / n \).

The Lévy distribution is related to the Wald distribution, not surprising since the Lévy distribution governs the first time that a standard Brownian motion hits a fixed positive value.

For fixed \( \lambda \in (0, \infty) \), the Wald distribution with shape parameter \( \lambda \) and mean \( \mu \in (0, \infty) \) converges to the Lévy distribution with location parameter 0 and scale parameter \(\lambda \) as \( \mu \to \infty \).

## Proof

From the formula for the CDF above, note that \[ F(x) \to \Phi\left(-\sqrt{\frac{\lambda}{x}}\right) + \Phi\left(-\sqrt{\frac{\lambda}{x}}\right) = 2 \Phi\left(-\sqrt{\frac{\lambda}{x}}\right) = 2\left[1 - \Phi\left(\sqrt{\frac{\lambda}{x}}\right)\right] \text{ as } \mu \to \infty \] But the last expression is the distribution function of the Lévy distribution with location parameter 0 and shape parameter \( \lambda \).

The other limiting distribution, this time with the mean fixed, is less interesting.

For fixed \( \mu \in (0, \infty) \), the Wald distribution with shape parameter \( \lambda \in (0, \infty) \) and mean \( \mu \) converges to point mass at \( \mu \) and variance 1 as \( \lambda \to \infty \).

## Proof

This time, it's better to use \( M \), the moment generating function above. By rationalizing we see that for fixed \( \mu \in (0, \infty)\) and \( t \in \R \) \[ \frac{\lambda}{\mu} \left(1 - \sqrt{1 - \frac{2 \mu^2 t}{\lambda}}\right) = \frac{2 \mu t}{1 + \sqrt{1 - 2 \mu^2 t / \lambda}} \to \mu t \text{ as } \lambda \to \infty\] Hence \( M(t) \to e^{\mu t} \) as \( \lambda \to \infty \) and the limit is the MGF of the constant random variable \(\mu\).

Finally, the Wald distribution is a member of the general exponential family of distributions.

The Wald distribution is a general exponential distribution with natural parameters \( -\lambda / (2 \mu^2) \) and \( -\lambda / 2 \), and natural statistics \( X \) and \( 1 / X \).

## Proof

This follows from the PDF \( f \) above. If we expand the square and simplify, we can write \( f \) in the form \[ f(x) = \sqrt{\frac{\lambda}{2 \pi}} \exp\left(\frac{\lambda}{2 \mu}\right) x^{-3/2} \exp\left(-\frac{\lambda}{2 \mu^2} x - \frac{\lambda}{2} \frac{1}{x}\right), \quad x \in (0, \infty) \]