
16.5: Beta-Binomial Regression


    This section introduces the beta-binomial distribution as an alternative modeling framework for dependent variables that are counts with a fixed upper bound, particularly when the data exhibit overdispersion relative to a simple binomial model. It explains that while the binomial distribution assumes a fixed dispersion parameter of 1, real-world data often violate this assumption, leading to the use of quasi-binomial methods as a patch. The beta-binomial distribution offers a more principled solution by incorporating an additional parameter to explicitly model the dispersion, allowing it to better capture the true variability in the data. Although the beta-binomial is not a member of the exponential family and therefore cannot be fit using standard GLM procedures, it can be estimated via maximum likelihood using specialized R packages like VGAM, with code that remains reassuringly familiar to users of lm and glm.

     

    Learning Objectives

    By the end of this section, you will be able to:

    1. Explain the limitation of the binomial distribution for modeling count data, specifically that it forces the dispersion parameter to be 1, and recognize when overdispersion makes the beta-binomial distribution a more appropriate alternative.
    2. Describe the key properties of the beta-binomial distribution, noting that its expected value is \(n\pi\) and its variance includes an extra parameter (\(\rho\)) to model dispersion, allowing it to accommodate overdispersed data.
    3. Fit a beta-binomial regression model in R using the vglm function from the VGAM package, specifying family=betabinomial, and compare the coefficient estimates and standard errors to those from a quasi-binomial GLM.
    4. Interpret the output from a beta-binomial regression, understanding that the model estimates two parameters for the intercept (related to the mean and dispersion) and that the interpretation of the slope coefficients remains on the log-odds scale.

     


     

    There is yet another option for modeling dependent variables that are a count with an upper bound. The Binomial distribution is rather restrictive in that its dispersion parameter must be 1. We were able to relax this requirement by introducing a new parameter, the level of overdispersion, to be estimated from the data. All that did, however, was admit that the Binomial is not the correct distribution. Using maximum quasi-likelihood estimation (MQLE) allowed us to estimate the standard errors better, but we still clung to the Binomial as the "true" distribution of the data.

    It is better to select an entirely new distribution than continue to twist the Binomial into something that it is not. This new distribution must be known, and it must allow for non-unit dispersion.

    Thankfully, we have such a distribution — the Beta-Binomial distribution. Here is its probability mass function:

    \begin{equation}
    f(y;\ n,\alpha,\beta) = \binom{n}{y} \frac{B(y+\alpha, n-y+\beta)}{B(\alpha,\beta)}
    \end{equation}

    Here \(B(x,y)\) is the "beta function" defined as

    \begin{equation}
    B(x,y) = \frac{\Gamma(x) \Gamma(y)}{\Gamma(x+y)}
    \end{equation}

    Unsurprisingly, the expected value is

    \begin{equation}
    E[Y] = n \frac{\alpha}{\alpha+\beta}
    \end{equation}

    Thankfully, the variance is

    \begin{align}
    V[Y] &= n \frac{\alpha\beta(\alpha+\beta+n)}{(\alpha+\beta)^2(\alpha+\beta+1)} \\[2em]
    &= E[Y] \frac{\beta(\alpha+\beta+n)}{(\alpha+\beta)(\alpha+\beta+1)}
    \end{align}

    Why "thankfully"? Quite simply because dispersion is no longer an issue. As with the Gaussian, there are enough parameters in the probability mass function to estimate the dispersion from the data without forcing it to take a specific value (as in the Binomial and Poisson cases).
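These formulas can be checked numerically. The short Python sketch below (self-contained, standard library only; the values \(n=10\), \(\alpha=2\), \(\beta=3\) are arbitrary choices for illustration) computes the probability mass function directly and verifies that it sums to 1 and that its mean and variance match the closed forms above. It also compares the variance to the Binomial variance \(n\pi(1-\pi)\) at the same mean, which is the overdispersion the extra parameter buys us.

```python
import math

def log_beta(x, y):
    # log B(x, y) via log-gamma: B(x,y) = Gamma(x) Gamma(y) / Gamma(x+y)
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def bb_pmf(y, n, a, b):
    # Beta-Binomial pmf: C(n, y) * B(y + a, n - y + b) / B(a, b)
    return math.comb(n, y) * math.exp(log_beta(y + a, n - y + b) - log_beta(a, b))

n, a, b = 10, 2.0, 3.0                       # arbitrary illustrative values
pmf = [bb_pmf(y, n, a, b) for y in range(n + 1)]

total = sum(pmf)                             # should be 1
mean = sum(y * p for y, p in enumerate(pmf))
var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))

mean_closed = n * a / (a + b)                # n * alpha / (alpha + beta) = 4
var_closed = (n * a * b * (a + b + n)) / ((a + b) ** 2 * (a + b + 1))  # = 6
pi = a / (a + b)
binom_var = n * pi * (1 - pi)                # Binomial variance at the same mean = 2.4
```

The Beta-Binomial variance (6) is well above the Binomial variance (2.4), even though both distributions have the same mean.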

     

    With all of that good news being said, it is unfortunate that the Beta-Binomial is not a member of the Exponential Class. Thus, one cannot use the techniques defined by Nelder and Wedderburn (1972). However, one can still estimate the parameters of interest using maximum likelihood estimation.
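To make the maximum-likelihood idea concrete, here is a minimal Python sketch. It is not the algorithm VGAM uses; it is only a crude grid search over the two parameters, offered as a stand-in for a real optimizer. It simulates data from a known Beta-Binomial (draw \(\pi_i\) from a Beta, then \(y_i\) from a Binomial) and finds the \((\alpha, \beta)\) pair on the grid that maximizes the log-likelihood.

```python
import math
import random

def log_beta(x, y):
    # log B(x, y) computed via log-gamma for numerical stability
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def loglik(a, b, data, n):
    # Beta-Binomial log-likelihood for iid counts out of n trials each
    return sum(math.log(math.comb(n, y))
               + log_beta(y + a, n - y + b) - log_beta(a, b)
               for y in data)

# Simulate overdispersed counts: pi_i ~ Beta(2, 5), then y_i ~ Binomial(20, pi_i).
random.seed(1)
n, true_a, true_b = 20, 2.0, 5.0
data = []
for _ in range(300):
    p = random.betavariate(true_a, true_b)
    data.append(sum(random.random() < p for _ in range(n)))

# Crude MLE: evaluate the log-likelihood on a grid of (alpha, beta) values
# and keep the best pair -- a stand-in for a proper optimizer.
grid = [k / 2 for k in range(1, 21)]          # 0.5, 1.0, ..., 10.0
best_ll, a_hat, b_hat = max(
    (loglik(a, b, data, n), a, b) for a in grid for b in grid)
```

Because the true pair \((2.0, 5.0)\) lies on the grid, the maximized log-likelihood is guaranteed to be at least the log-likelihood at the truth; with a reasonable sample size the estimates land near the generating values.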

     

    While it is clear that the mathematics of the problem will be more complicated, the coding will be about the same as when we used OLS and GLMs. As an example, compare and contrast the following three lines of code. Determine what each does.

    lm(y ~ x)
    glm(y ~ x, family = binomial)
    vglm(y ~ x, family = betabinomial)
    

     

    So, what are the differences? Happily, the researcher who created Vector Generalized Linear Models (VGLMs) believed in following good programming practice (Yee 2010, 2015), so the coding is very similar across the three. The vglm function, however, requires the VGAM package. If you do not already have it, please install it from CRAN. Do not forget to load it before using it:

    library(VGAM)
    

    After running the vglm line, perform the usual summary of the estimated model. Note that this will take a while; the function performs many estimation steps inside that summary call. As a result, a few additional items appear in the regression output.

     

    To see this, here is the output from the Binomial regression using MQLE.

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  -6.6636     6.7370  -0.989    0.336
    averageAge    0.3766     0.3469   1.086    0.292
    

     

    Here is the output from the Beta-Binomial regression (using MLE).

    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
    (Intercept):1  -5.8076     5.7313  -1.013    0.311
    (Intercept):2  -0.3309     0.3335  -0.992    0.321
    averageAge      0.3233     0.2944   1.098    0.272
    

     

    The effect of the average age is interpreted in the same manner (note that the logit link was used in both cases). The difference in the estimated effects in the two models is slight. The big difference is that the intercept is estimated using two values in the VGLM. This is because the Beta-Binomial distribution has two parameters to estimate: one controlling the mean and one controlling the dispersion (equivalently, \(\alpha\) and \(\beta\)).
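As an illustration of how the two intercepts can be read, the following Python sketch back-transforms the Beta-Binomial output above, assuming VGAM's default \((\mu, \rho)\) parameterization with logit links for both: (Intercept):1 is the intercept of \(\operatorname{logit}(\mu)\) and (Intercept):2 is \(\operatorname{logit}(\rho)\), the dispersion. The predictor value used (age of 20) is purely hypothetical.

```python
import math

def expit(x):
    # inverse logit
    return 1.0 / (1.0 + math.exp(-x))

# Estimates copied from the Beta-Binomial output above.
b0 = -5.8076          # (Intercept):1 -- intercept of logit(mu)
rho_logit = -0.3309   # (Intercept):2 -- logit of the dispersion parameter rho
b1 = 0.3233           # averageAge slope, on the log-odds scale

rho = expit(rho_logit)        # estimated dispersion, about 0.42

age = 20                      # hypothetical value of averageAge
mu = expit(b0 + b1 * age)     # fitted mean proportion at that age, about 0.66
```

The slope is still read on the log-odds scale, exactly as in the Binomial GLM; only the second intercept, the dispersion, is new.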

    Compare Figure \(\PageIndex{1}\) below, which was fit using the Beta-Binomial model, to Figure 3 in the O Canada example, which was fit using the Binomial distribution. Note that the story told by the Beta-Binomial is approximately the same. The only difference I can see is that the Humanities and the Social Sciences are not as close together in this model as they were in the Binomial model.

     

    Why Beta-Binomial

    The key reason to choose the Beta-Binomial over the Binomial is that overdispersion guarantees the Binomial is not the right distribution: its variance is forced to be \(n\pi(1-\pi)\), which the data exceed. The Beta-Binomial, by contrast, includes a parameter for the dispersion, so it can match the variability actually observed in the data.

     

     

    Beta-Binomial regression figure
    Figure \(\PageIndex{1}\): The estimated results of modeling the "O Canada" data using the Beta-Binomial distribution. Compare to the earlier figure that used the Binomial distribution.

     

     

     


    This page titled 16.5: Beta-Binomial Regression is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Ole Forsberg.
