5.17: The Beta Distribution
- Page ID
- 10357
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)
( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\id}{\mathrm{id}}\)
\( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\kernel}{\mathrm{null}\,}\)
\( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\)
\( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\)
\( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)
\( \newcommand{\vectorA}[1]{\vec{#1}} % arrow\)
\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow\)
\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vectorC}[1]{\textbf{#1}} \)
\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)
\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)
\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)In this section, we will study the beta distribution, the most important distribution that has bounded support. But before we can study the beta distribution we must study the beta function.
The Beta Function
Definition
The beta function \( B \) is defined as follows: \[ B(a, b) = \int_0^1 u^{a-1} (1 - u)^{b - 1} du; \quad a, \, b \in (0, \infty) \]
Proof that \( B \) is well defined
We need to show that \(B(a, b) \lt \infty\) for every \(a, \, b \in (0, \infty)\). The integrand is positive on \( (0, 1) \), so the integral exists, either as a real number or \( \infty \). If \( a \ge 1 \) and \( b \ge 1 \), the integrand is continuous on \( [0, 1] \), so of course the integral is finite. Thus, the only cases of interest are when \( 0 \lt a \lt 1 \) or \( 0 \lt b \lt 1 \). Note that \[ \int_0^1 u^{a-1} (1 - u)^{b - 1} du = \int_0^{1/2} u^{a-1} (1 - u)^{b - 1} du + \int_{1/2}^1 u^{a-1} (1 - u)^{b - 1} du \] If \(0 \lt a \lt 1\), \((1 - u)^{b-1}\) is bounded on \(\left(0, \frac{1}{2}\right]\) and \( \int_0^{1/2} u^{a - 1} \, du = \frac{1}{a 2^a} \). Hence the first integral on the right in the displayed equation is finite. Similarly, If \(0 \lt b \lt 1\), \(u^{a-1}\) is bounded on \(\left[\frac{1}{2}, 1\right)\) and \( \int_{1/2}^1 (1 - u)^{b-1} \, du = \frac{1}{b 2^b} \). Hence the second integral on the right in the displayed equation is also finite.
The beta function was first introduced by Leonhard Euler.
Properties
The beta function satisfies the following properties:
- \(B(a, b) = B(b, a)\) for \( a, \, b \in (0, \infty) \), so \( B \) is symmetric.
- \(B(a, 1) = \frac{1}{a}\) for \( a \in (0, \infty) \)
- \( B(1, b) = \frac{1}{b} \) for \( b \in (0, \infty) \)
Proof
- Using the substitution \( v = 1 - u \) we have \[ B(a, b) = \int_0^1 u^{a-1}(1 - u)^{b-1} du = \int_0^1 (1 - v)^{a-1} v^{b-1} dv = B(b, a) \]
- \( B(a, 1) = \int_0^1 u^{a-1} du = \frac{1}{a} \)
- This follows from (a) and (b).
The beta function has a simple expression in terms of the gamma function:
If \( a, \, b \in (0, \infty) \) then \[ B(a, b) = \frac{\Gamma(a) \Gamma(b)}{\Gamma(a + b)} \]
Proof
From the definitions, we can express \(\Gamma(a + b) B(a, b)\) as a double integral: \[ \Gamma(a + b) B(a, b) = \int_0^\infty x^{a + b -1} e^{-x} dx \int_0^1 y^{a-1} (1 - y)^{b-1} dy = \int_0^\infty \int_0^1 (x y)^{a-1}[x (1 - y)]^{b-1} x e^{-x} dx \, dy \] Next we use the transformation \(w = x y\), \(z = x (1 - y)\) which maps \((0, \infty) \times (0, 1) \) one-to-one onto \( (0, \infty) \times (0, \infty) \). The inverse transformation is \( x = w + z \), \( y = w \big/(w + z) \) and the absolute value of the Jacobian is \[ \left|\det\frac{\partial(x, y)}{\partial(w, z)}\right| = \frac{1}{(w + z)} \] Thus, using the change of variables theorem for multiple integrals, the integral above becomes \[ \int_0^\infty \int_0^\infty w^{a-1} z^{b-1} (w + z) e^{-(w + z)} \frac{1}{w + z} dw \, dz \] which after simplifying is \( \Gamma(a) \Gamma(b) \).
Recall that the gamma function is a generalization of the factorial function. Here is the corresponding result for the beta function:
If \(j, \, k \in \N_+\) then \[ B(j, k) = \frac{(j - 1)! (k - 1)!}{(j + k - 1)!} \]
Proof
Recall that \( \Gamma(n) = (n - 1)! \) for \( n \in \N_+ \), so this result follows from the previous one.
Let's generalize this result. First, recall from our study of combinatorial structures that for \(a \in \R\) and \(j \in \N\), the ascending power of base \( a \) and order \( j \) is \[ a^{[j]} = a (a + 1) \cdots [a + (j - 1)] \]
If \(a, \, b \in (0, \infty)\), and \(j, \, k \in \N\), then \[ \frac{B(a + j, b + k)}{B(a, b)} = \frac{a^{[j]} b^{[k]}}{(a + b)^{[j + k]}} \]
Proof
Recall that \( \Gamma(a + j) = a^{[j]} \Gamma(a) \), so the result follows from the representation above for the beta function in terms of the gamma function.
\(B\left(\frac{1}{2}, \frac{1}{2}\right) = \pi\).
Proof
The Incomplete Beta Function
The integral that defines the beta function can be generalized by changing the interval of integration from \((0, 1)\) to \((0, x)\) where \(x \in [0, 1]\).
The incomplete beta function is defined as follows \[ B(x; a, b) = \int_0^x u^{a-1} (1 - u)^{b-1} du, \quad x \in (0, 1); \; a, \, b \in (0, \infty) \]
Of course, the ordinary (complete) beta function is \(B(a, b) = B(1; a, b)\) for \( a, \, b \in (0, \infty) \).
The Standard Beta Distribution
Distribution Functions
The beta distributions are a family of continuous distributions on the interval \( (0, 1) \).
The (standard) beta distribution with left parameter \( a \in (0, \infty) \) and right parameter \( b \in (0, \infty) \) has probability density function \( f \) given by \[ f(x) = \frac{1}{B(a, b)} x^{a-1} (1 - x)^{b-1}, \quad x \in (0, 1) \]
Of course, the beta function is simply the normalizing constant, so it's clear that \( f \) is a valid probability density function. If \( a \ge 1 \), \( f \) is defined at 0, and if \( b \ge 1 \), \( f \) is defined at 1. In these cases, it's customary to extend the domain of \( f \) to these endpoints. The beta distribution is useful for modeling random probabilities and proportions, particularly in the context of Bayesian analysis. The distribution has just two parameters and yet a rich variety of shapes (so in particular, both parameters are shape parameters). Qualitatively, the first order properties of \( f \) depend on whether each parameter is less than, equal to, or greater than 1.
For \( a, \, b \in (0, \infty) \) with \( a + b \ne 2 \), define \[ x_0 = \frac{a - 1}{a + b - 2} \]
- If \(0 \lt a \lt 1\) and \(0 \lt b \lt 1\), \( f \) decreases and then increases with minimum value at \(x_0 \) and with \( f(x) \to \infty \) as \( x \downarrow 0 \) and as \( x \uparrow 1 \).
- If \(a = 1\) and \(b = 1\), \( f \) is constant.
- If \(0 \lt a \lt 1\) and \(b \ge 1\), \( f \) is decreasing with \( f(x) \to \infty \) as \( x \downarrow 0 \).
- If \(a \ge 1\) and \(0 \lt b \lt 1\), \( f \) is increasing with \( f(x) \to \infty \) as \( x \uparrow 1 \).
- If \(a = 1\) and \(b \gt 1\), \( f \) is decreasing with mode at \( x = 0 \).
- If \(a \gt 1\) and \(b = 1\), \( f \) is increasing with mode at \( x = 1 \).
- If \(a \gt 1\) and \(b \gt 1\), \( f \) increases and then decreases with mode at \( x_0 \).
Proof
These results follow from standard calculus. The first derivative is \[ f^\prime(x) = \frac{1}{B(a, b)} x^{a - 2}(1 - x)^{b - 2} [(a - 1) - (a + b - 2) x], \quad 0 \lt x \lt 1 \]
From part (b), note that the special case \(a = 1\) and \(b = 1\) gives the continuous uniform distribution on the interval \((0, 1)\) (the standard uniform distribution). Note also that when \(a \lt 1\) or \(b \lt 1\), the probability density function is unbounded, and hence the distribution has no mode. On the other hand, if \(a \ge 1\), \(b \ge 1\), and one of the inequalites is strict, the distribution has a unique mode at \( x_0 \). The second order properties are more complicated.
For \( a, \, b \in (0, \infty) \) with \( a + b \notin \{2, 3\} \) and \( (a - 1)(b - 1)(a + b - 3) \ge 0 \), define \begin{align} x_1 &= \frac{(a - 1)(a + b - 3) - \sqrt{(a - 1)(b - 1)(a + b - 3)}}{(a + b - 3)(a + b - 2)}\\ x_2 &= \frac{(a - 1)(a + b - 3) + \sqrt{(a - 1)(b - 1)(a + b - 3)}}{(a + b - 3)(a + b - 2)} \end{align} For \( a \lt 1 \) and \( a + b = 2 \) or for \( b \lt 1 \) and \( a + b = 2 \), define \( x_1 = x_2 = 1 - a / 2 \).
- If \( a \le 1 \) and \( b \le 1 \), or if \( a \le 1 \) and \( b \ge 2 \), or if \( a \ge 2 \) and \( b \le 1 \), \( f \) is concave upward.
- If \( a \le 1 \) and \( 1 \lt b \lt 2 \), \( f \) is concave upward and then downward with inflection point at \( x_1 \).
- If \( 1 \lt a \lt 2 \) and \( b \le 1 \), \( f \) is concave downward and then upward with inflection point at \( x_2 \).
- If \( 1 \lt a \le 2 \) and \( 1 \lt b \le 2 \), \( f \) is concave downward.
- If \( 1 \lt a \le 2 \) and \( b \gt 2 \), \( f \) is concave downward and then upward with inflection point at \( x_2 \).
- If \( a \gt 2 \) and \( 1 \lt b \le 2 \), \( f \) is concave upward and then downward with inflection point at \( x_1 \).
- If \( a \gt 2 \) and \( b \gt 2 \), \( f \) is concave upward, then downward, then upward again, with inflection points at \( x_1 \) and \( x_2 \).
Proof
These results follow from standard (but very tedious) calculus. The second derivative is \[ f^{\prime\prime}(x) = \frac{1}{B(a, b)} x^{a - 3}(1 - x)^{b - 3} \left[(a + b - 2)(a + b - 3) x^2 - 2 (a - 1)(a + b - 3) x + (a - 1)(a - 2)\right] \]
In the special distribution simulator, select the beta distribution. Vary the parameters and note the shape of the beta density function. For selected values of the parameters, run the simulation 1000 times and compare the empirical density function to the true density function.
The special case \(a = \frac{1}{2}\), \(b = \frac{1}{2}\) is the arcsine distribution, with probability density function given by \[ f(x) = \frac{1}{\pi \, \sqrt{x (1 - x)}}, \quad x \in (0, 1) \] This distribution is important in a number of applications, and so the arcsine distribution is studied in a separate section.
The beta distribution function \(F\) can be easily expressed in terms of the incomplete beta function. As usual \(a\) denotes the left parameter and \(b\) the right parameter.
The beta distribution function \( F \) with parameters \(a, \, b \in (0, \infty)\) is given by \[ F(x) = \frac{B(x; a, b)}{B(a, b)}, \quad x \in (0, 1) \]
The distribution function \( F \) is sometimes known as the regularized incomplete beta function. In some special cases, the distribution function \(F\) and its inverse, the quantile function \(F^{-1}\), can be computed in closed form, without resorting to special functions.
If \(a \in (0, \infty)\) and \(b = 1\) then
- \(F(x) = x^a\) for \(x \in (0, 1)\)
- \(F^{-1}(p) = p^{1/a}\) for \(p \in (0, 1)\)
If \(a = 1\) and \(b \in (0, \infty)\) then
- \(F(x) = 1 - (1 - x)^b\) for \(x \in (0, 1)\)
- \(F^{-1}(p) = 1 - (1 - p)^{1/b}\) for \(p \in (0, 1)\)
If \(a = b = \frac{1}{2}\) (the arcsine distribution) then
- \(F(x) = \frac{2}{\pi} \arcsin\left(\sqrt{x}\right)\) for \(x \in (0, 1)\)
- \(F^{-1}(p) = \sin^2\left(\frac{\pi}{2} p\right)\) for \(p \in (0, 1)\)
There is an interesting relationship between the distribution functions of the beta distribution and the binomial distribution, when the beta parameters are positive integers. To state the relationship we need to embellish our notation to indicate the dependence on the parameters. Thus, let \(F_{a, b}\) denote the beta distribution function with left parameter \(a \in (0, \infty)\) and right parameter \(b \in (0, \infty)\), and let \(G_{n,p}\) denote the binomial distribution function with trial parameter \(n \in \N_+\) and success parameter \(p \in (0, 1)\).
If \(j, \, k \in \N_+\) and \(x \in (0, 1)\) then \[ F_{j,k}(x) = G_{j + k - 1, 1 - x}(k - 1) \]
Proof
By definition \[ F_{j,k}(x) = \frac{1}{B(j,k)} \int_0^x t^{j-1} (1 - t)^{k-1} dt \] Integrate by parts with \( u = (1 - t)^{k-1} \) and \( dv = t^{j-1} dt \), so that \( du = -(k -1) (1 - t)^{k-2} \) and \( v = t^j / j \). The result is \[ F_{j,k}(x) = \frac{1}{j B(j, k)} (1 - x)^{k-1} x^j + \frac{k-1}{j B(j, k)} \int_0^x t^j (1 - t)^{k-2} dt\] But by the property of the beta function above, \( B(j, k) = (j - 1)! (k - 1)! \big/(j + k - 1)! \). Hence \( 1 \big/ j B(j, k) = \binom{j + k - 1}{k - 1} \) and \( (k - 1) \big/ j B(j, k) = 1 \big/ B(j + 1, k - 1) \). Thus, the last displayed equation can be rewritten as \[F_{j,k}(x) = \binom{j + k - 1}{k - 1} (1 - x)^{k-1} x^j + F_{j + 1, k - 1}(x)\] Recall from the special case above that \(F_{j + k - 1, 1}(x) = x^{j + k - 1}\). Iterating the last displayed equation gives the result.
In the special distribution calculator, select the beta distribution. Vary the parameters and note the shape of the density function and the distribution function. In each of the following cases, find the median, the first and third quartiles, and the interquartile range. Sketch the boxplot.
- \(a = 1\), \(b = 1\)
- \(a = 1\), \(b = 3\)
- \(a = 3\), \(b = 1\)
- \(a = 2\), \(b = 4\)
- \(a = 4\), \(b = 2\)
- \(a = 4\), \(b = 4\)
Moments
The moments of the beta distribution are easy to express in terms of the beta function. As before, suppose that \(X\) has the beta distribution with left parameter \(a \in (0, \infty)\) and right parameter \(b \in (0, \infty)\).
If \(k \in [0, \infty)\) then \[ \E\left(X^k\right) = \frac{B(a + k, b)}{B(a, b)} \] In particular, if \( k \in \N \) then \[ \E\left(X^k\right) = \frac{a^{[k]}}{(a + b)^{[k]}} \]
Proof
Note that \[ \E\left(X^k\right) = \int_0^1 x^k \frac{1}{B(a, b)} x^{a-1} (1 - x)^{b - 1} dx = \frac{1}{B(a, b)} \int_0^1 x^{a + k - 1} (1 - x)^{b - 1} dx = \frac{B(a + k, b)}{B(a, b)}\] If \( k \in \N \), the formula simplifies by the property of the beta function above.
From the general formula for the moments, it's straightforward to compute the mean, variance, skewness, and kurtosis.
The mean and variance of \( X \) are \begin{align} \E(X) &= \frac{a}{a + b} \\ \var(X) &= \frac{a b}{(a + b)^2 (a + b + 1)} \end{align}
Proof
The formula for the mean and variance follow from the formula for the moments and the computational formula \( \var(X) = \E(X^2) - [\E(X)]^2 \)
Note that the variance depends on the parameters \( a \) and \( b \) only through the product \( a b \) and the sum \( a + b \).
Open the special distribution simulator and select the beta distribution. Vary the parameters and note the size and location of the mean\(\pm\)standard deviation bar. For selected values of the parameters, run the simulation 1000 times and compare the sample mean and standard deviation to the distribution mean and standard deviation.
The skewness and kurtosis of \( X \) are \begin{align} \skw(X) &= \frac{2 (b - a) \sqrt{a + b + 1}}{(a + b + 2) \sqrt{a b}} \\ \kur(X) &= \frac{3 a^3 b + 3 a b^3 + 6 a^2 b^2 + a^3 + b^3 + 13 a^2 b + 13 a b^2 + a^2 + b^2 + 14 a b}{a b (a + b + 2) (a + b + 3)} \end{align}
Proof
These results follow from the computational formulas that give the skewness and kurtosis in terms of \( \E(X^k) \) for \( k \in \{1, 2, 3, 4\} \), and the formula for the moments above.
In particular, note that the distribution is positively skewed if \( a \lt b \), unskewed if \( a = b \) (the distribution is symmetric about \( x = \frac{1}{2} \) in this case) and negatively skewed if \( a \gt b \).
Open the special distribution simulator and select the beta distribution. Vary the parameters and note the shape of the probability density function in light of the previous result on skewness. For various values of the parameters, run the simulation 1000 times and compare the empirical density function to the true probability density function.
Related Distributions
The beta distribution is related to a number of other special distributions.
If \(X\) has the beta distribution with left parameter \(a \in (0, \infty)\) and right parameter \(b \in (0, \infty)\) then \(Y = 1 - X\) has the beta distribution with left parameter \(b\) and right parameter \(a\).
Proof
This follows from the standard change of variables formula. If \( f \) and \( g \) denote the PDFs of \( X \) and \( Y \) respectively, then \[ g(y) = f(1 - y) = \frac{1}{B(a, b)} (1 - y)^{a - 1} y^{b - 1} = \frac{1}{B(b, a)} y^{b - 1} (1 - y)^{a - 1}, \quad y \in (0, 1)\]
The beta distribution with right parameter 1 has a reciprocal relationship with the Pareto distribution.
Suppose that \( a \in (0, \infty) \).
- If \(X\) has the beta distribution with left parameter \(a \) and right parameter 1 then \(Y = 1 / X\) has the Pareto distribution with shape parameter \(a\).
- If \( Y \) has the Pareto distribution with shape parameter \( a \) then \( X = 1 / Y \) has the beta distribution with left parameter \( a \) and right parameter 1.
Proof
The two results are equivalent. In (a), suppose that \( X \) has the beta distribution with parameters \( a \) and 1. The transformation \( y = 1 / x \) maps \( (0, 1) \) one-to-one onto \( (0, \infty) \). The inverse is \( x = 1 / y \) with \( dx/dy = -1/y^2 \). Recall also that \( B(a, 1) = 1 / a \). By the change of variables formula, the PDF \( g \) of \( Y = 1 / X \) is given by \[ g(y) = f\left(\frac{1}{y}\right) \frac{1}{y^2} = a \left(\frac{1}{y}\right)^{a-1} \frac{1}{y^2} = \frac{a}{y^{a+1}}, \quad y \in (0, \infty) \] We recognize \( g \) as the PDF of the Pareto distribution with shape parameter \( a \).
The following result gives a connection between the beta distribution and the gamma distribution.
Suppose that \(X\) has the gamma distribution with shape parameter \(a \in (0, \infty)\) and rate parameter \(r \in (0, \infty)\), \(Y\) has the gamma distribution with shape parameter \(b \in (0, \infty)\) and rate parameter \(r\), and that \(X\) and \(Y\) are independent. Then \(V = X \big/ (X + Y)\) has the beta distribution with left parameter \(a\) and right parameter \(b\).
Proof
Let \( U = X + Y \) and \( V = X \big/ (X + Y) \). We will actually prove stronger results: \( U \) and \( V \) are independent, \( U \) has the gamma distribution with shape parameter \( a + b \) and rate parameter \( r \), and \( V \) has the beta distribution with parameters \( a \) and \( b \). First note that \( (X, Y) \) has joint PDF \( f \) given by \[ f(x, y) = \frac{r^a}{\Gamma(a)} x^{a-1} e^{-r x} \frac{r^b}{\Gamma(b)} y^{b-1} e^{-r y} = \frac{r^{a+b}}{\Gamma(a) \Gamma(b)} x^{a-1} y^{b-1} e^{-r(x + y)}; \quad x, \, y \in (0, \infty) \] The transformation \( u = x + y \) and \( v = x \big/ (x + y) \) maps \( (0, \infty) \times (0, \infty) \) one-to-one onto \( (0, \infty) \times (0, 1) \). The inverse is \( x = u v \), \( y = u(1 - v) \) and the absolute value of the Jacobian is \[ \left|\det \frac{\partial(x, y)}{\partial(u, v)}\right| = u \] Hence by the multivariate change of variables theorem, the PDF \( g \) of \( (U, V) \) is given by \begin{align} g(u, v) & = f[u v, u(1 - v)] u = \frac{r^{a+b}}{\Gamma(a) \Gamma(b)} (u v)^{a-1} [u(1 - v)]^{b-1} e^{-ru} u \\ & = \frac{r^{a+b}}{\Gamma(a) \Gamma(b)} u^{a+b-1} e^{-ru} v^{a-1} (1 - v)^{b-1} \\ & = \frac{r^{a+b}}{\Gamma(a + b)} u^{a+b-1} e^{-ru} \frac{\Gamma(a + b)}{\Gamma(a) \Gamma(b)} v^{a-1} (1 - v)^{b-1}; \quad u \in (0, \infty), v \in (0, 1) \end{align} The results now follow from the factorization theorem. The factor in \( u \) is the gamma PDF with shape parameter \( a + b \) and rate parameter \( r \) while the factor in \( v \) is the beta PDF with parameters \( a \) and \( b \).
The following result gives a connection between the beta distribution and the \(F\) distribution. This connection is a minor variation of the previous result.
If \(X\) has the \(F\) distribution with \(n \in (0, \infty)\) degrees of freedom in the numerator and \(d \in (0, \infty)\) degrees of freedom in the denominator then \[ Y = \frac{(n / d) X}{1 + (n / d)X} \] has the beta distribution with left parameter \(a = n / 2\) and right parameter \(b = d / 2\).
Proof
If \(X\) has the \(F\) distribution with \(n \gt 0\) degrees of freedom in the numerator and \(d \gt 0\) degrees of freedom in the denominator then \( X \) can be written as \[ X = \frac{U / n}{V / d} \] where \( U \) has the chi-square distribution with \( n \) degrees of freedom, \( V \) has the chi-square distribution with \( d \) degrees of freedom, and \( U \) and \( V \) are independent. Hence \[ Y = \frac{(n / d) X}{1 + (n / d) X} = \frac{U / V}{1 + U / V} = \frac{U}{U + V} \] But the chi-square distribution is a special case of the gamma distribution. Specifically, \( U \) has the gamma distribution with shape parameter \( n / 2 \) and rate parameter \( 1/2 \), \( V \) has the gamma distribution with shape parameter \( d / 2 \) and rate parameter \( 1/2 \), and again \( U \) and \( V \) are independent. Hence by the previous result, \( Y \) has the beta distribution with left parameter \( n/2 \) and right parameter \( d/2 \).
Our next result is that the beta distribution is a member of the general exponential family of distributions.
Suppose that \(X\) has the beta distribution with left parameter \(a \in (0, \infty)\) and right parameter \(b \in (0, \infty)\). Then the distribution is a two-parameter exponential family with natural parameters \(a - 1\) and \(b - 1\), and natural statistics \(\ln(X)\) and \(\ln(1 - X)\).
Proof
This follows from the definition of the general exponential distribution, since the PDF \( f \) of \( X \) can be written as \[ f(x) = \frac{1}{B(a, b)} \exp\left[(a - 1) \ln(x) + (b - 1) \ln(1 - x)\right], \quad x \in (0, 1) \]
The beta distribution is also the distribution of the order statistics of a random sample from the standard uniform distribution.
Suppose \( n \in \N_+ \) and that \( (X_1, X_2, \ldots, X_n) \) is a sequence of independent variables, each with the standard uniform distribution. For \( k \in \{1, 2, \ldots, n\} \), the \( k \)th order statistics \( X_{(k)} \) has the beta distribution with left parameter \( a = k \) and right parameter \( b = n - k + 1 \).
Proof
See the section on order statistics.
One of the most important properties of the beta distribution, and one of the main reasons for its wide use in statistics, is that it forms a conjugate family for the success probability in the binomial and negative binomial distributions.
Suppose that \( P \) is a random probability having the beta distribution with left parameter \( a \in (0, \infty) \) and right parameter \( b \in (0, \infty) \). Suppose also that \( X \) is a random variable such that the conditional distribution of \( X \) given \( P = p \in (0, 1) \) is binomial with trial parameter \( n \in \N_+ \) and success parameter \( p \). Then the conditional distribution of \( P \) given \( X = k \) is beta with left parameter \( a + k \) and right parameter \( b + n - k \).
Proof
The joint PDF \( f \) of \( (P, X) \) on \( (0, 1) \times \{0, 1, \ldots n\} \) is given by \[ f(p, k) = \frac{1}{B(a, b)} p^{a-1} (1 - p)^{b-1} \binom{n}{k} p^k (1 - p)^{n-k} = \frac{1}{B(a, b)} \binom{n}{k} p^{a + k - 1} (1 - p)^{b + n - k - 1} \] The conditional PDF of \( P \) given \( X = k \) is simply the normalized version of the function \( p \mapsto f(p, k) \). We can tell from the functional form that this distribution is beta with the parameters given in the theorem.
Suppose again that \( P \) is a random probability having the beta distribution with left parameter \( a \in (0, \infty) \) and right parameter \( b \in (0, \infty) \). Suppose also that \( N \) is a random variable such that the conditional distribution of \( N \) given \( P = p \in (0, 1) \) is negative binomial with stopping parameter \( k \in \N_+ \) and success parameter \( p \). Then the conditional distribution of \( P \) given \( N = n \) is beta with left parameter \( a + k \) and right parameter \( b + n - k \).
Proof
The joint PDF \( f \) of \( (P, N) \) on \( (0, 1) \times \{k, k + 1, \ldots\} \) is given by \[ f(p, n) = \frac{1}{B(a, b)} p^{a-1} (1 - p)^{b-1} \binom{n - 1}{k - 1} p^k (1 - p)^{n-k} = \frac{1}{B(a, b)} \binom{n - 1}{k - 1} p^{a + k - 1} (1 - p)^{b + n - k - 1} \] The conditional PDF of \( P \) given \( N = n \) is simply the normalized version of the function \( p \mapsto f(p, n) \). We can tell from the functional form that this distribution is beta with the parameters given in the theorem.
in both cases, note that in the posterior distribution of \( P \), the left parameter is increased by the number of successes and the right parameter by the number of failures. For more on this, see the section on Bayesian estimation in the chapter on point estimation.
The General Beta Distribution
The beta distribution can be easily generalized from the support interval \((0, 1)\) to an arbitrary bounded interval using a linear transformation. Thus, this generalization is simply the location-scale family associated with the standard beta distribution.
Suppose that \(Z\) has the standard beta distibution with left parameter \(a \in (0, \infty)\) and right parameter \(b \in (0, \infty)\). For \(c \in \R\) and \(d \in (0, \infty)\) random variable \(X = c + d Z\) has the beta distribution with left parameter \( a \), right parameter \( b \), location parameter \( c \) and scale parameter \( d \).
For the remainder of this discussion, suppose that \( X \) has the distribution in the definition above.
\(X\) has probability density function
\[ f(x) = \frac{1}{B(a, b) d^{a + b - 1}} (x - c)^{a - 1} (c + d - x)^{b - 1}, \quad x \in (c, c + d) \]Proof
This follows from a standard result for location-scale families. If \( g \) denotes the standard beta PDF of \( Z \), then \( X \) has PDF \( f \) given by \[ f(x) = \frac{1}{d} g\left(\frac{x - c}{d}\right), \quad x \in (c, c + d) \]
Most of the results in the previous sections have simple extensions to the general beta distribution.
The mean and variance of \( X \) are
- \(\E(X) = c + d \frac{a}{a + b}\)
- \(\var(X) = d^2 \frac{a b}{(a + b)^2 (a + b + 1)}\)
Proof
This follows from the standard mean and variance and basic properties of expected value and variance.
- \( \E(X) = c + d \E(Z)\)
- \( \var(X) = d^2 \var(Z) \)
Recall that skewness and variance are defined in terms of standard scores, and hence are unchanged under location-scale transformations. Hence the skewness and kurtosis of \( X \) are just as for the standard beta distribution.