5.12: The Lognormal Distribution
Basic Theory
Definition
Suppose that \(Y\) has the normal distribution with mean \(\mu \in \R\) and standard deviation \(\sigma \in (0, \infty)\). Then \(X = e^Y\) has the lognormal distribution with parameters \(\mu\) and \(\sigma\).
- The parameter \( \sigma \) is the shape parameter of the distribution.
- The parameter \( e^\mu\) is the scale parameter of the distribution.
If \(Z\) has the standard normal distribution then \(W = e^Z\) has the standard lognormal distribution.
So equivalently, if \(X\) has a lognormal distribution then \(\ln X\) has a normal distribution, hence the name. The lognormal distribution is a continuous distribution on \((0, \infty)\) and is used to model random quantities when the distribution is believed to be skewed, such as certain income and lifetime variables.
It's easy to write a general lognormal variable in terms of a standard lognormal variable. Suppose that \(Z\) has the standard normal distribution and let \(W = e^Z\) so that \(W\) has the standard lognormal distribution. If \(\mu \in \R\) and \(\sigma \in (0, \infty)\) then \(Y = \mu + \sigma Z\) has the normal distribution with mean \(\mu\) and standard deviation \(\sigma\) and hence \(X = e^Y\) has the lognormal distribution with parameters \(\mu\) and \(\sigma\). But \[X = e^Y = e^{\mu + \sigma Z} = e^\mu \left(e^Z\right)^\sigma = e^\mu W^\sigma\]
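The representation \(X = e^\mu W^\sigma\) translates directly into a simulation recipe. The following is a minimal Python sketch using only the standard library; the helper name `lognormal_from_standard`, the parameter values, and the seed are illustrative choices, not part of the text. It draws \(Z\) with `random.Random.gauss`, forms \(W = e^Z\), returns \(e^\mu W^\sigma\), and then checks two consequences: \(\ln X\) should have sample mean near \(\mu\), and the sample median of \(X\) should be near \(e^\mu\) (the median of \(\ln X\) is \(\mu\) and the exponential function is increasing).

```python
import math
import random
import statistics

def lognormal_from_standard(mu, sigma, rng):
    """Draw one lognormal(mu, sigma) value as e^mu * W^sigma with W = e^Z."""
    z = rng.gauss(0.0, 1.0)        # Z has the standard normal distribution
    w = math.exp(z)                # W = e^Z has the standard lognormal distribution
    return math.exp(mu) * w ** sigma

rng = random.Random(12345)         # arbitrary fixed seed, for reproducibility
mu, sigma, n = 1.0, 0.5, 100_000   # illustrative parameter values
sample = [lognormal_from_standard(mu, sigma, rng) for _ in range(n)]

log_mean = statistics.fmean(math.log(x) for x in sample)
median = statistics.median(sample)
print(f"mean of ln X: {log_mean:.4f} (mu = {mu})")
print(f"median of X:  {median:.4f} (e^mu = {math.exp(mu):.4f})")
```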
Distribution Functions
Suppose that \(X\) has the lognormal distribution with parameters \(\mu \in \R\) and \(\sigma \in (0, \infty)\).
The probability density function \(f\) of \(X\) is given by \[ f(x) = \frac{1}{\sqrt{2 \pi} \sigma x} \exp \left[-\frac{\left(\ln x - \mu\right)^2}{2 \sigma^2} \right], \quad x \in (0, \infty) \]
- \( f \) increases and then decreases with mode at \( x = \exp\left(\mu - \sigma^2\right) \).
- \( f \) is concave upward then downward then upward again, with inflection points at \( x = \exp\left(\mu - \frac{3}{2} \sigma^2 \pm \frac{1}{2} \sigma \sqrt{\sigma^2 + 4}\right) \).
- \( f(x) \to 0 \) as \( x \downarrow 0 \) and as \( x \to \infty \).
Proof
The form of the PDF follows from the change of variables theorem. Let \( g \) denote the PDF of the normal distribution with mean \( \mu \) and standard deviation \( \sigma \), so that \[ g(y) = \frac{1}{\sqrt{2 \pi} \sigma} \exp\left[-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2\right], \quad y \in \R \] The mapping \( x = e^y \) maps \( \R \) one-to-one onto \( (0, \infty) \) with inverse \( y = \ln x \). Hence the PDF \( f \) of \( X = e^Y \) is \[ f(x) = g(y) \frac{dy}{dx} = g\left(\ln x\right) \frac{1}{x} \] Substituting gives the result. The properties of \( f \) listed above (monotonicity and mode, concavity and inflection points, and the limits at the endpoints) follow from standard calculus.
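The PDF formula, the change-of-variables form \(g(\ln x) / x\), and the locations of the mode and inflection points can all be checked numerically. The Python sketch below is one way to do so, assuming the parameter values shown (any \(\mu\) and \(\sigma\) would do); `lognormal_pdf` and `second_difference` are our own illustrative helpers, and \(g\) is taken from the standard library's `statistics.NormalDist`. The second derivative is only approximated by a central finite difference, so the check is of the sign of \(f''\) on either side of each inflection point, not of its exact value.

```python
import math
from statistics import NormalDist

def lognormal_pdf(x, mu, sigma):
    """f(x) from the formula above, for x > 0."""
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (
        math.sqrt(2 * math.pi) * sigma * x)

def second_difference(func, x, rel_step=1e-4):
    """Central finite-difference approximation of func''(x)."""
    h = rel_step * x
    return (func(x + h) - 2 * func(x) + func(x - h)) / h ** 2

mu, sigma = 0.5, 0.8                      # illustrative parameter values
g = NormalDist(mu, sigma)                 # normal PDF with mean mu, sd sigma

def f(x):
    return lognormal_pdf(x, mu, sigma)

# Change of variables: f(x) = g(ln x) / x.
for x in (0.1, 0.5, 1.0, 3.0):
    print(x, f(x), g.pdf(math.log(x)) / x)

mode = math.exp(mu - sigma ** 2)
shift = 0.5 * sigma * math.sqrt(sigma ** 2 + 4)
lower = math.exp(mu - 1.5 * sigma ** 2 - shift)
upper = math.exp(mu - 1.5 * sigma ** 2 + shift)
print("mode:", mode, "inflection points:", lower, upper)
# Expect f'' > 0 below `lower`, f'' < 0 between the two points, f'' > 0 above `upper`.
for x in (0.99 * lower, 1.01 * lower, 0.99 * upper, 1.01 * upper):
    print(x, second_difference(f, x))
```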
In the special distribution simulator, select the lognormal distribution. Vary the parameters and note the shape and location of the probability density function. For selected values of the parameters, run the simulation 1000 times and compare the empirical density function to the true probability density function.
Let \(\Phi\) denote the standard normal distribution function, so that \(\Phi^{-1}\) is the standard normal quantile function. Recall that values of \(\Phi\) and \(\Phi^{-1}\) can be obtained from the special distribution calculator, as well as standard mathematical and statistical software packages, and in fact these functions are considered to be special functions in mathematics. The following two results show how to compute the lognormal distribution function and quantiles in terms of the standard normal distribution function and quantiles.
The distribution function \(F\) of \(X\) is given by \[ F(x) = \Phi \left( \frac{\ln x - \mu}{\sigma} \right), \quad x \in (0, \infty) \]
Proof
Once again, write \( X = e^{\mu + \sigma Z} \) where \( Z \) has the standard normal distribution. For \( x \gt 0 \), since \( \ln \) is strictly increasing and \( \sigma \gt 0 \), \[ F(x) = \P(X \le x) = \P(\mu + \sigma Z \le \ln x) = \P\left(Z \le \frac{\ln x - \mu}{\sigma}\right) = \Phi \left( \frac{\ln x - \mu}{\sigma} \right) \]
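In code, the distribution function needs nothing beyond a standard normal CDF. The Python sketch below uses the standard library's `statistics.NormalDist` for \(\Phi\); the function name `lognormal_cdf`, the parameters, and the seed are illustrative assumptions. It compares \(F\) with the empirical distribution function of a sample drawn with `random.Random.lognormvariate`, which the Python documentation describes as a distribution whose natural logarithm is normal with mean `mu` and standard deviation `sigma`, matching the parameterization used here.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def lognormal_cdf(x, mu, sigma):
    """F(x) = Phi((ln x - mu) / sigma) for x > 0, and 0 for x <= 0."""
    if x <= 0:
        return 0.0
    return PHI.cdf((math.log(x) - mu) / sigma)

mu, sigma, n = 1.0, 0.75, 50_000        # illustrative values
rng = random.Random(2024)               # arbitrary seed
sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]

for x in (1.0, math.e, 5.0, 10.0):
    empirical = sum(s <= x for s in sample) / n
    print(f"x = {x:.3f}: F(x) = {lognormal_cdf(x, mu, sigma):.4f}, empirical = {empirical:.4f}")
```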
The quantile function of \(X\) is given by \[ F^{-1}(p) = \exp\left[\mu + \sigma \Phi^{-1}(p)\right], \quad p \in (0, 1) \]
Proof
This follows by solving \( p = F(x) \) for \( x \) in terms of \( p \).
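The quantile formula also gives the random quantile (inverse transform) method of simulation: if \(U\) is uniform on \((0, 1)\) then \(F^{-1}(U)\) has the lognormal distribution with parameters \(\mu\) and \(\sigma\). The Python sketch below, with illustrative names and parameter values of our own choosing, checks the round trip \(F\left(F^{-1}(p)\right) = p\) and then simulates by the random quantile method. `NormalDist.inv_cdf` plays the role of \(\Phi^{-1}\); since `random.Random.random` can return 0, which has no quantile, zero draws are rejected.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def lognormal_quantile(p, mu, sigma):
    """F^{-1}(p) = exp(mu + sigma * Phi^{-1}(p)) for 0 < p < 1."""
    return math.exp(mu + sigma * PHI.inv_cdf(p))

def lognormal_cdf(x, mu, sigma):
    """F(x) = Phi((ln x - mu) / sigma) for x > 0."""
    return PHI.cdf((math.log(x) - mu) / sigma)

def open_uniform(rng):
    """A uniform draw on (0, 1); random() lies in [0, 1), so reject 0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

mu, sigma = -0.5, 1.2                  # illustrative values
for p in (0.01, 0.25, 0.5, 0.9, 0.999):
    x = lognormal_quantile(p, mu, sigma)
    print(f"p = {p}: F^-1(p) = {x:.5f}, F(F^-1(p)) = {lognormal_cdf(x, mu, sigma):.12f}")

rng = random.Random(7)                 # arbitrary seed
draws = [lognormal_quantile(open_uniform(rng), mu, sigma) for _ in range(20_000)]
```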
In the special distribution calculator, select the lognormal distribution. Vary the parameters and note the shape and location of the probability density function and the distribution function. With \(\mu = 0\) and \(\sigma = 1\), find the median and the first and third quartiles.
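For checking the answer to this exercise without the calculator, the quantile formula can be evaluated directly. A minimal Python sketch, relying on the standard library's `statistics.NormalDist` for \(\Phi^{-1}\):

```python
import math
from statistics import NormalDist

mu, sigma = 0.0, 1.0                   # the values in the exercise
PHI = NormalDist()                     # standard normal distribution
q1, median, q3 = (math.exp(mu + sigma * PHI.inv_cdf(p)) for p in (0.25, 0.5, 0.75))
print(f"Q1 = {q1:.4f}, median = {median:.4f}, Q3 = {q3:.4f}")
```

Because \(\Phi^{-1}(1 - p) = -\Phi^{-1}(p)\), with \(\mu = 0\) the first and third quartiles are reciprocals of each other, and the median is \(e^0 = 1\).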
Moments
The moments of the lognormal distribution can be computed from the moment generating function of the normal distribution. Once again, we assume that \(X\) has the lognormal distribution with parameters \(\mu \in \R\) and \(\sigma \in (0, \infty)\).
For \( t \in \R \), \[ \E\left(X^t\right) = \exp \left( \mu t + \frac{1}{2} \sigma^2 t^2 \right) \]
Proof
Recall that if \( Y \) has the normal distribution with mean \( \mu \in \R \) and standard deviation \( \sigma \in (0, \infty) \), then \( Y \) has moment generating function given by \[ \E\left(e^{t Y}\right) = \exp\left(\mu t + \frac{1}{2} \sigma^2 t^2\right), \quad t \in \R \] Hence the result follows immediately since \( \E\left(X^t\right) = \E\left(e^{t Y}\right) \).
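As a numerical sanity check of the moment formula, \(\E\left(X^t\right) = \E\left(e^{tY}\right)\) can be computed as an integral over \(y\) and compared with the closed form. The Python sketch below uses a plain trapezoid rule on a truncated interval; the truncation width, the number of steps, and the test values of \(t\), \(\mu\), \(\sigma\) are our own assumptions and are adequate only when \(|t| \sigma\) is moderate, since the integrand is centered at \(\mu + t \sigma^2\).

```python
import math

def moment_closed_form(t, mu, sigma):
    """E(X^t) = exp(mu t + sigma^2 t^2 / 2)."""
    return math.exp(mu * t + 0.5 * sigma ** 2 * t ** 2)

def moment_numeric(t, mu, sigma, half_width=40.0, steps=40_000):
    """Trapezoid-rule value of the integral of e^{t y} g(y) over [mu - w, mu + w],
    w = half_width * sigma, where g is the normal(mu, sigma) density."""
    a = mu - half_width * sigma
    h = 2 * half_width * sigma / steps
    norm = math.sqrt(2 * math.pi) * sigma

    def integrand(y):
        # combine the exponents before calling exp to avoid overflow/underflow
        return math.exp(t * y - 0.5 * ((y - mu) / sigma) ** 2) / norm

    interior = math.fsum(integrand(a + k * h) for k in range(1, steps))
    ends = 0.5 * (integrand(a) + integrand(a + steps * h))
    return h * (interior + ends)

mu, sigma = 0.3, 0.7                    # illustrative values
for t in (-2.0, -0.5, 1.0, 2.0, 3.0):
    print(t, moment_numeric(t, mu, sigma), moment_closed_form(t, mu, sigma))
```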
In particular, the mean and variance of \(X\) are
- \(\E(X) = \exp\left(\mu + \frac{1}{2} \sigma^2\right)\)
- \(\var(X) = \exp\left[2 (\mu + \sigma^2)\right] - \exp\left(2 \mu + \sigma^2\right)\)
In the special distribution simulator, select the lognormal distribution. Vary the parameters and note the shape and location of the mean\( \pm \)standard deviation bar. For selected values of the parameters, run the simulation 1000 times and compare the empirical moments to the true moments.
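A scripted analogue of this experiment, assuming Python's standard library (`random.Random.lognormvariate` uses the same \(\mu\), \(\sigma\) parameterization as this section), draws a sample and compares the empirical mean and standard deviation with the formulas for \(\E(X)\) and \(\var(X)\). The parameter values, sample size, and seed are arbitrary choices.

```python
import math
import random

mu, sigma, n = 0.0, 0.5, 100_000       # illustrative values
mean_true = math.exp(mu + 0.5 * sigma ** 2)
var_true = math.exp(2 * (mu + sigma ** 2)) - math.exp(2 * mu + sigma ** 2)

rng = random.Random(99)                # arbitrary seed
sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]
mean_emp = math.fsum(sample) / n
var_emp = math.fsum((x - mean_emp) ** 2 for x in sample) / (n - 1)  # sample variance

print(f"mean: true {mean_true:.4f}, empirical {mean_emp:.4f}")
print(f"sd:   true {math.sqrt(var_true):.4f}, empirical {math.sqrt(var_emp):.4f}")
```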
From the general formula for the moments, we can also compute the skewness and kurtosis of the lognormal distribution.
The skewness and kurtosis of \(X\) are
- \( \skw(X) = \left(e^{\sigma^2} + 2\right) \sqrt{e^{\sigma^2} - 1} \)
- \(\kur(X) = e^{4 \sigma^2} + 2 e^{3 \sigma^2} + 3 e^{2 \sigma^2} - 3\)
Proof
These results follow from the first 4 moments of the lognormal distribution and the standard computational formulas for skewness and kurtosis.
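The computation mentioned in the proof can be carried out mechanically: take the raw moments \(\E\left(X^k\right)\), \(k = 1, 2, 3, 4\), from the moment formula, convert them to central moments, and standardize. The Python sketch below (function names are ours) does this and compares the result with the closed forms for a few illustrative parameter values; it also shows numerically that the answer does not change with \(\mu\).

```python
import math

def raw_moment(k, mu, sigma):
    """E(X^k) = exp(mu k + sigma^2 k^2 / 2)."""
    return math.exp(mu * k + 0.5 * sigma ** 2 * k ** 2)

def skew_kurt_from_moments(mu, sigma):
    m1, m2, m3, m4 = (raw_moment(k, mu, sigma) for k in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    third = m3 - 3 * m1 * m2 + 2 * m1 ** 3                      # E[(X - m1)^3]
    fourth = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4  # E[(X - m1)^4]
    return third / var ** 1.5, fourth / var ** 2

def skew_kurt_closed_form(sigma):
    e = math.exp(sigma ** 2)
    return (e + 2) * math.sqrt(e - 1), e ** 4 + 2 * e ** 3 + 3 * e ** 2 - 3

for sigma in (0.25, 0.5, 1.0):
    for mu in (-2.0, 0.0, 3.0):
        print(sigma, mu, skew_kurt_from_moments(mu, sigma), skew_kurt_closed_form(sigma))
```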
The skewness and kurtosis do not depend on \( \mu \) because \( e^\mu \) is a scale parameter. Recall that skewness and kurtosis are defined in terms of the standard score and so are independent of location and scale parameters. Since \( \skw(X) \gt 0 \) for every \( \sigma \), the lognormal distribution is always positively skewed. Finally, note that the excess kurtosis is \[ \kur(X) - 3 = e^{4 \sigma^2} + 2 e^{3 \sigma^2} + 3 e^{2 \sigma^2} - 6 \]
Even though the lognormal distribution has finite moments of all orders, the moment generating function is infinite at any positive number. This property is one of the reasons for the fame of the lognormal distribution.
\(\E\left(e^{t X}\right) = \infty\) for every \(t \gt 0\).
Proof
By definition, \(X = e^Y\) where \(Y\) has the normal distribution with mean \(\mu\) and standard deviation \(\sigma\). Using the change of variables formula for expected value we have \[\E\left(e^{t X}\right) = \E\left(e^{t e^Y}\right) = \int_{-\infty}^\infty \exp(t e^y) \frac{1}{\sqrt{2 \pi} \sigma} \exp\left[-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2\right] dy = \frac{1}{\sqrt{2 \pi} \sigma} \int_{-\infty}^\infty \exp\left[t e^y - \frac{1}{2} \left(\frac{y - \mu}{\sigma}\right)^2\right] dy\] If \(t \gt 0\), the exponent \(t e^y - \frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2\) tends to \(\infty\) as \(y \to \infty\), since \(e^y\) grows faster than any polynomial in \(y\). Hence the integrand in the last integral diverges to \(\infty\) as \(y \to \infty\), so the integral cannot converge.
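The divergence is easy to see numerically. The short Python sketch below tabulates the exponent \(t e^y - \frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2\) of the last integrand for a deliberately small \(t\) (the values \(t = 0.001\), \(\mu = 0\), \(\sigma = 1\) are illustrative). It does not attempt to evaluate the integral; it only shows that, after an initial dip, the exponential term overwhelms the quadratic one.

```python
import math

def log_integrand(y, t, mu, sigma):
    """Exponent of the integrand: t e^y - ((y - mu) / sigma)^2 / 2."""
    return t * math.exp(y) - 0.5 * ((y - mu) / sigma) ** 2

t, mu, sigma = 0.001, 0.0, 1.0
for y in (0, 5, 10, 15, 20, 30):
    print(f"y = {y:2d}: exponent = {log_integrand(y, t, mu, sigma):.4g}")
```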
Related Distributions
The most important relations are the ones between the lognormal and normal distributions in the definition: if \(X\) has a lognormal distribution then \(\ln X\) has a normal distribution; conversely if \(Y\) has a normal distribution then \(e^Y\) has a lognormal distribution. The lognormal distribution is also a scale family.
Suppose that \( X \) has the lognormal distribution with parameters \( \mu \in \R \) and \( \sigma \in (0, \infty) \) and that \( c \in (0, \infty) \). Then \( c X \) has the lognormal distribution with parameters \( \mu + \ln c\) and \( \sigma \).
Proof
From the definition, we can write \( X = e^Y \) where \( Y \) has the normal distribution with mean \( \mu \) and standard deviation \( \sigma \). Hence \[ c X = c e^Y = e^{\ln c} e^Y = e^{\ln c + Y} \] But \( \ln c + Y \) has the normal distribution with mean \( \ln c + \mu \) and standard deviation \( \sigma \).
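Because the result is a statement about distributions, it can be checked exactly through distribution functions: \(\P(c X \le x) = F(x / c)\) should agree with the lognormal distribution function with parameters \(\mu + \ln c\) and \(\sigma\). A Python sketch, with `lognormal_cdf` as our own helper and arbitrary parameter values:

```python
import math
from statistics import NormalDist

def lognormal_cdf(x, mu, sigma):
    """Phi((ln x - mu) / sigma) for x > 0."""
    return NormalDist().cdf((math.log(x) - mu) / sigma)

mu, sigma, c = 0.4, 0.9, 3.5            # illustrative values
for x in (0.1, 1.0, 2.0, 10.0, 50.0):
    lhs = lognormal_cdf(x / c, mu, sigma)            # P(cX <= x) = P(X <= x / c)
    rhs = lognormal_cdf(x, mu + math.log(c), sigma)  # lognormal(mu + ln c, sigma)
    print(f"x = {x}: {lhs:.12f} vs {rhs:.12f}")
```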
The reciprocal of a lognormal variable is also lognormal.
If \(X\) has the lognormal distribution with parameters \(\mu \in \R\) and \(\sigma \in (0, \infty)\) then \(1 / X\) has the lognormal distribution with parameters \(-\mu\) and \(\sigma\).
Proof
Again from the definition, we can write \( X = e^Y \) where \(Y\) has the normal distribution with mean \(\mu\) and standard deviation \(\sigma\). Hence \(1 / X = e^{-Y}\). But \(-Y\) has the normal distribution with mean \(-\mu\) and standard deviation \(\sigma\).
The lognormal distribution is closed under nonzero powers of the underlying variable. The following result generalizes the previous one.
Suppose that \(X\) has the lognormal distribution with parameters \(\mu \in \R\) and \(\sigma \in (0, \infty)\) and that \(a \in \R \setminus \{0\}\). Then \(X^a\) has the lognormal distribution with parameters \(a \mu\) and \(|a| \sigma\).
Proof
Again from the definition, we can write \( X = e^Y \) where \(Y\) has the normal distribution with mean \(\mu\) and standard deviation \(\sigma\). Hence \(X^a = e^{a Y}\). But \(a Y\) has the normal distribution with mean \(a \mu\) and standard deviation \(|a| \sigma\).
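The same distribution-function check works for powers, including the case \(a = -1\) of the reciprocal result. For \(a \gt 0\), \(X^a \le x\) if and only if \(X \le x^{1/a}\); for \(a \lt 0\) the inequality reverses, so \(\P\left(X^a \le x\right) = 1 - F\left(x^{1/a}\right)\). The Python sketch below (helper names and parameter values are ours) compares this with the lognormal distribution function with parameters \(a \mu\) and \(|a| \sigma\).

```python
import math
from statistics import NormalDist

def lognormal_cdf(x, mu, sigma):
    """Phi((ln x - mu) / sigma) for x > 0."""
    return NormalDist().cdf((math.log(x) - mu) / sigma)

def cdf_of_power(x, a, mu, sigma):
    """P(X^a <= x) for x > 0 and a != 0, where X is lognormal(mu, sigma)."""
    root = x ** (1.0 / a)
    if a > 0:
        return lognormal_cdf(root, mu, sigma)        # X^a <= x  iff  X <= x^(1/a)
    return 1.0 - lognormal_cdf(root, mu, sigma)      # a < 0 reverses the inequality

mu, sigma = 0.7, 0.6                    # illustrative values
for a in (2.0, 0.5, -1.0, -3.0):
    for x in (0.2, 1.0, 4.0):
        print(a, x, cdf_of_power(x, a, mu, sigma), lognormal_cdf(x, a * mu, abs(a) * sigma))
```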
Since the normal distribution is closed under sums of independent variables, it's not surprising that the lognormal distribution is closed under products of independent variables.
Suppose that \(n \in \N_+\) and that \((X_1, X_2, \ldots, X_n)\) is a sequence of independent variables, where \(X_i\) has the lognormal distribution with parameters \(\mu_i \in \R\) and \(\sigma_i \in (0, \infty)\) for \(i \in \{1, 2, \ldots, n\}\). Then \(\prod_{i=1}^n X_i\) has the lognormal distribution with parameters \(\mu\) and \(\sigma\) where \(\mu = \sum_{i=1}^n \mu_i\) and \(\sigma^2 = \sum_{i=1}^n \sigma_i^2\).
Proof
Again from the definition, we can write \( X_i = e^{Y_i} \) where \(Y_i\) has the normal distribution with mean \(\mu_i\) and standard deviation \(\sigma_i\) for \(i \in \{1, 2, \ldots, n\}\) and where \((Y_1, Y_2, \ldots, Y_n)\) is an independent sequence. Hence \(\prod_{i=1}^n X_i = \exp\left(\sum_{i=1}^n Y_i\right)\). But \(\sum_{i=1}^n Y_i\) has the normal distribution with mean \(\sum_{i=1}^n \mu_i\) and variance \(\sum_{i=1}^n \sigma_i^2\).
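A quick Monte Carlo check of the product rule, assuming Python's standard library: multiply independent lognormal draws and verify that the logarithm of the product has mean close to \(\sum \mu_i\) and standard deviation close to \(\sqrt{\sum \sigma_i^2}\). The three parameter pairs, the sample size, and the seed are arbitrary illustrative choices. This checks only the first two moments of \(\ln \prod X_i\), not its full normality.

```python
import math
import random

params = [(0.2, 0.3), (-1.0, 0.5), (0.5, 0.4)]   # (mu_i, sigma_i), illustrative
mu = sum(m for m, _ in params)                   # sum of the mu_i
sigma = math.sqrt(sum(s ** 2 for _, s in params))

rng = random.Random(314)                         # arbitrary seed
n = 50_000
logs = [math.log(math.prod(rng.lognormvariate(m, s) for m, s in params))
        for _ in range(n)]

mean_log = math.fsum(logs) / n
sd_log = math.sqrt(math.fsum((v - mean_log) ** 2 for v in logs) / n)
print(f"ln(product): mean {mean_log:.4f} (mu = {mu:.4f}), sd {sd_log:.4f} (sigma = {sigma:.4f})")
```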
Finally, the lognormal distribution belongs to the family of general exponential distributions.
Suppose that \( X \) has the lognormal distribution with parameters \( \mu \in \R \) and \( \sigma \in (0, \infty) \). The distribution of \( X \) is a 2-parameter exponential family with natural parameters and natural statistics, respectively, given by
- \(\left( -\frac{1}{2 \sigma^2}, \frac{\mu}{\sigma^2} \right)\)
- \(\left(\ln^2(X), \ln X\right)\)
Proof
This follows from the definition of the general exponential family, since we can write the lognormal PDF in the form \[ f(x) = \frac{1}{\sqrt{2 \pi} \sigma} \exp\left(-\frac{\mu^2}{2 \sigma^2}\right) \frac{1}{x} \exp\left[-\frac{1}{2 \sigma^2} \ln^2(x) + \frac{\mu}{\sigma^2} \ln x\right], \quad x \in (0, \infty) \]
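The factorization in the proof can be confirmed numerically by evaluating both forms of the density. In the Python sketch below (function names and parameter values are ours), `eta` holds the natural parameters and `T` the natural statistics listed above.

```python
import math

def lognormal_pdf(x, mu, sigma):
    """Lognormal PDF in its original form, for x > 0."""
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (
        math.sqrt(2 * math.pi) * sigma * x)

def lognormal_pdf_expfam(x, mu, sigma):
    """The same PDF in exponential-family form: c * h(x) * exp(eta . T(x))."""
    eta = (-1.0 / (2 * sigma ** 2), mu / sigma ** 2)    # natural parameters
    T = (math.log(x) ** 2, math.log(x))                 # natural statistics
    c = math.exp(-mu ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
    h = 1.0 / x
    return c * h * math.exp(eta[0] * T[0] + eta[1] * T[1])

for mu, sigma in ((0.0, 1.0), (2.0, 0.5), (-1.0, 1.5)):
    for x in (0.05, 0.5, 1.0, 7.0, 50.0):
        print(mu, sigma, x, lognormal_pdf(x, mu, sigma), lognormal_pdf_expfam(x, mu, sigma))
```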
Computational Exercises
Suppose that the income \(X\) of a randomly chosen person in a certain population (in $1000 units) has the lognormal distribution with parameters \(\mu = 2\) and \(\sigma = 1\). Find \(\P(X \gt 20)\).
Answer
\(\P(X \gt 20) \approx 0.1597\)
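The computation uses the distribution function formula: since \(\sigma = 1\), \(\P(X \gt 20) = 1 - F(20) = 1 - \Phi(\ln 20 - 2)\). A minimal Python sketch, with the standard library's `statistics.NormalDist` standing in for the calculator:

```python
import math
from statistics import NormalDist

mu, sigma = 2.0, 1.0                     # income in $1000 units
z = (math.log(20) - mu) / sigma          # standard score of ln 20
p = 1 - NormalDist().cdf(z)              # P(X > 20) = 1 - F(20)
print(f"P(X > 20) = {p:.4f}")
```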
Suppose that the income \(X\) of a randomly chosen person in a certain population (in $1000 units) has the lognormal distribution with parameters \(\mu = 2\) and \(\sigma = 1\). Find each of the following:
- \(\E(X)\)
- \(\var(X)\)
Answer
- \(\E(X) = e^{5/2} \approx 12.1825\)
- \(\var(X) = e^6 - e^5 \approx 255.0156\), so \(\sd(X) = \sqrt{e^6 - e^5} \approx 15.9692\)
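The answers can be computed directly from the formulas for the mean and variance. A short Python sketch (income in $1000 units):

```python
import math

mu, sigma = 2.0, 1.0
mean = math.exp(mu + 0.5 * sigma ** 2)                                 # e^{5/2}
var = math.exp(2 * (mu + sigma ** 2)) - math.exp(2 * mu + sigma ** 2)  # e^6 - e^5
sd = math.sqrt(var)
print(f"E(X) = {mean:.4f}, var(X) = {var:.4f}, sd(X) = {sd:.4f}")
```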