
16.5: Beta-Binomial Regression


    There is yet another option for modeling dependent variables that are counts with an upper bound. The Binomial distribution is rather restrictive in that its dispersion parameter must be 1. We were able to relax this requirement by adding a dispersion parameter estimated from the data. All this did, however, was admit that the Binomial distribution is not the correct distribution. Maximum quasi-likelihood estimation (MQLE) allowed us to estimate the standard errors better, but we still clung to the Binomial as the "true" distribution of the data.

    It is better to select an entirely new distribution than to continue twisting the Binomial into something that it is not. This new distribution must be known, and it must allow for non-unit dispersion.

    Thankfully, we have such a distribution: the Beta-Binomial distribution. Here is its probability mass function:

    \begin{equation}
    f(y;\ n,\alpha,\beta) = \binom{n}{y} \frac{B(y+\alpha, n-y+\beta)}{B(\alpha,\beta)}
    \end{equation}

    Here \(B(x,y)\) is the "beta function" defined as

    \begin{equation}
    B(x,y) = \frac{\Gamma(x) \Gamma(y)}{\Gamma(x+y)}
    \end{equation}
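    Base R happens to provide this function as beta(). As a quick sanity check (the values of x and y below are arbitrary), it agrees with the Gamma-function form above:

    ```r
    # Base R provides B(x, y) as beta(); verify it matches the Gamma form.
    x <- 2.5
    y <- 4
    beta(x, y)                            # B(x, y)
    gamma(x) * gamma(y) / gamma(x + y)    # same value
    ```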

    Unsurprisingly, the expected value is

    \begin{equation}
    E[Y] = n \frac{\alpha}{\alpha+\beta}
    \end{equation}

    Thankfully, the variance is

    \begin{align}
    V[Y] &= n \frac{\alpha\beta(\alpha+\beta+n)}{(\alpha+\beta)^2(\alpha+\beta+1)} \\[2em]
    &= E[Y] \frac{\beta(\alpha+\beta+n)}{(\alpha+\beta)(\alpha+\beta+1)}
    \end{align}

    Why "thankfully"? Quite simply because dispersion is not an issue. Like the Gaussian, the Beta-Binomial has enough parameters in its probability mass function to estimate the dispersion from the data without forcing it to take a specific value (as in the Binomial and Poisson cases).
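    To make the formulas concrete, here is a small sketch in base R that builds the probability mass function directly from the definition and checks the mean and variance formulas; the values \(n = 10\), \(\alpha = 2\), and \(\beta = 3\) are arbitrary choices for illustration:

    ```r
    # Beta-Binomial pmf straight from the definition, using base R's beta().
    dbetabin <- function(y, n, a, b) {
      choose(n, y) * beta(y + a, n - y + b) / beta(a, b)
    }

    n <- 10; a <- 2; b <- 3                 # arbitrary illustrative values
    p <- dbetabin(0:n, n, a, b)

    sum(p)                                  # probabilities sum to 1
    sum((0:n) * p)                          # mean: n * a / (a + b) = 4
    sum((0:n)^2 * p) - sum((0:n) * p)^2     # empirical variance matches
    n * a * b * (a + b + n) /
      ((a + b)^2 * (a + b + 1))             # the formula: = 6
    ```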

    With all of that good news said, it is unfortunate that the Beta-Binomial is not a member of the Exponential Class. Thus, one cannot use the techniques defined by Nelder and Wedderburn (1972). However, one can still estimate the parameters of interest using maximum likelihood estimation.
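    Even without the exponential-class machinery, the likelihood can be maximized numerically. The following sketch simulates Beta-Binomial data (the sample size, trial count, and true parameter values are all arbitrary, and this is not the chapter's data set) and recovers \(\alpha\) and \(\beta\) with optim:

    ```r
    # Direct maximum likelihood for an intercept-only Beta-Binomial model.
    set.seed(42)
    m  <- 200                          # number of observations (arbitrary)
    n  <- 25                           # trials per observation (arbitrary)
    pr <- rbeta(m, 2, 6)               # latent per-observation probabilities
    y  <- rbinom(m, n, pr)             # observed counts

    # Negative log-likelihood; theta holds (log alpha, log beta) so the
    # optimizer works on an unconstrained scale.
    negll <- function(theta) {
      a <- exp(theta[1]); b <- exp(theta[2])
      -sum(lchoose(n, y) + lbeta(y + a, n - y + b) - lbeta(a, b))
    }

    fit <- optim(c(0, 0), negll)
    exp(fit$par)                       # estimates of (alpha, beta)
    ```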

    While it is clear that the mathematics of the problem will be more complicated, the coding will be about the same as when we used OLS and GLMs. As an example, compare and contrast the following three lines of code. Determine what each does.

    lm(y~x)
    glm(y~x, family=binomial)
    vglm(y~x, family=betabinomial)
    

    So, what are the differences? Happily, the researcher who created Vector Generalized Linear Models (VGLMs) followed good programming practice (Yee 2010, 2015), so the coding is very similar across all three. The vglm function, however, requires the VGAM package. If you do not already have it, please install it from CRAN. Do not forget to load it before using it:

    library(VGAM)
    

    After running the vglm line, perform the usual summary of the estimated model. Note that this will take a while: the summary call performs many estimation steps. As a result, a few additional pieces appear in the regression output.

    To see this, here is the output from the Binomial regression using MQLE.

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  -6.6636     6.7370  -0.989    0.336
    averageAge    0.3766     0.3469   1.086    0.292
    

    Here is the output from the Beta-Binomial regression (using MLE).

    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
    (Intercept):1  -5.8076     5.7313  -1.013    0.311
    (Intercept):2  -0.3309     0.3335  -0.992    0.321
    averageAge      0.3233     0.2944   1.098    0.272
    

    The effect of the average age is interpreted in the same manner (note that the logit link was used in both cases). The difference between the estimated effects in the two models is slight. The big difference is that the intercept is estimated using two values in the VGLM. This is because the Beta-Binomial distribution has two parameters to estimate. (By default, VGAM estimates them as a mean and a correlation parameter, each with its own logit-scale linear predictor, rather than as \(\alpha\) and \(\beta\) directly.)
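    If one wants \(\alpha\) and \(\beta\) themselves, they can be recovered from the fitted linear predictors. In VGAM's default parameterization of the Beta-Binomial, the two predictors are \(\mathrm{logit}(\mu)\) and \(\mathrm{logit}(\rho)\), where \(\mu = \alpha/(\alpha+\beta)\) and \(\rho = 1/(\alpha+\beta+1)\). The snippet below inverts that mapping using the reported intercepts, evaluated at an average age of zero (a pure extrapolation, used only to illustrate the arithmetic):

    ```r
    # Invert VGAM's (mu, rho) parameterization back to (alpha, beta).
    # The eta values come from the regression output above, evaluated at
    # averageAge = 0 (an extrapolation used purely for illustration).
    eta1 <- -5.8076 + 0.3233 * 0    # linear predictor for logit(mu)
    eta2 <- -0.3309                 # linear predictor for logit(rho)

    mu  <- plogis(eta1)             # alpha / (alpha + beta)
    rho <- plogis(eta2)             # 1 / (alpha + beta + 1)

    alpha.hat <- mu * (1 - rho) / rho
    beta.hat  <- (1 - mu) * (1 - rho) / rho
    c(alpha = alpha.hat, beta = beta.hat)
    ```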

    Compare Figure \(\PageIndex{1}\) below, which was fit using the Beta-Binomial model, to Figure 3 in the O Canada example, which was fit using the Binomial distribution. Note that the story told by the Beta-Binomial is approximately the same. The only difference I can see is that the Humanities and Social Sciences are not as close together in this model as they were in the Binomial model.

    Why Beta-Binomial

    The key reason to choose the Beta-Binomial over the Binomial is that the observed overdispersion guarantees that the Binomial is not the right distribution. The Beta-Binomial, which accommodates that overdispersion, is therefore the logically better choice.

    Beta-Binomial regression figure
    Figure \(\PageIndex{1}\): The estimated results of modeling the "O Canada" data using the Beta-Binomial distribution. Compare to the earlier figure that used the Binomial distribution.

    This page titled 16.5: Beta-Binomial Regression is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Ole Forsberg.
