Skip to main content
Statistics LibreTexts

17.8: Bayesian Regression

  • Page ID
    8313
  • \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

    \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

    \( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)

    ( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)

    \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

    \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)

    \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

    \( \newcommand{\Span}{\mathrm{span}}\)

    \( \newcommand{\id}{\mathrm{id}}\)

    \( \newcommand{\Span}{\mathrm{span}}\)

    \( \newcommand{\kernel}{\mathrm{null}\,}\)

    \( \newcommand{\range}{\mathrm{range}\,}\)

    \( \newcommand{\RealPart}{\mathrm{Re}}\)

    \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

    \( \newcommand{\Argument}{\mathrm{Arg}}\)

    \( \newcommand{\norm}[1]{\| #1 \|}\)

    \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

    \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)

    \( \newcommand{\vectorA}[1]{\vec{#1}}      % arrow\)

    \( \newcommand{\vectorAt}[1]{\vec{\text{#1}}}      % arrow\)

    \( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

    \( \newcommand{\vectorC}[1]{\textbf{#1}} \)

    \( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)

    \( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)

    \( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)

    \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

    \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

    \(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)

    Okay, so now we’ve seen Bayesian equivalents to orthodox chi-square tests and t-tests. What’s next? If I were to follow the same progression that I used when developing the orthodox tests you’d expect to see ANOVA next, but I think it’s a little clearer if we start with regression.

    quick refresher

    In Chapter 15 I used the parenthood data to illustrate the basic ideas behind regression. To remind you of what that data set looks like, here’s the first six observations:

    load("./rbook-master/data/parenthood.Rdata")
    head(parenthood)
    ##   dan.sleep baby.sleep dan.grump day
    ## 1      7.59      10.18        56   1
    ## 2      7.91      11.66        60   2
    ## 3      5.14       7.92        82   3
    ## 4      7.71       9.61        55   4
    ## 5      6.68       9.75        67   5
    ## 6      5.99       5.04        72   6

    Back in Chapter 15 I proposed a theory in which my grumpiness (dan.grump) on any given day is related to the amount of sleep I got the night before (dan.sleep), and possibly to the amount of sleep our baby got (baby.sleep), though probably not to the day on which we took the measurement. We tested this using a regression model. In order to estimate the regression model we used the lm() function, like so:

    model <- lm( 
      formula = dan.grump ~ dan.sleep + day + baby.sleep,
      data = parenthood
    )

    The hypothesis tests for each of the terms in the regression model were extracted using the summary() function as shown below:

    summary(model) 
    ## 
    ## Call:
    ## lm(formula = dan.grump ~ dan.sleep + day + baby.sleep, data = parenthood)
    ## 
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max 
    ## -10.906  -2.284  -0.295   2.652  11.880 
    ## 
    ## Coefficients:
    ##               Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept) 126.278707   3.242492  38.945   <2e-16 ***
    ## dan.sleep    -8.969319   0.560007 -16.016   <2e-16 ***
    ## day          -0.004403   0.015262  -0.288    0.774    
    ## baby.sleep    0.015747   0.272955   0.058    0.954    
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 4.375 on 96 degrees of freedom
    ## Multiple R-squared:  0.8163, Adjusted R-squared:  0.8105 
    ## F-statistic: 142.2 on 3 and 96 DF,  p-value: < 2.2e-16

    When interpreting the results, each row in this table corresponds to one of the possible predictors. The (Intercept) term isn’t usually interesting, though it is highly significant. The important thing for our purposes is the fact that dan.sleep is significant at p<.001 and neither of the other variables are.

    Bayesian version

    Okay, so how do we do the same thing using the BayesFactor package? The easiest way is to use the regressionBF() function instead of lm(). As before, we use formula to indicate what the full regression model looks like, and the data argument to specify the data frame. So the command is:

    regressionBF(
      formula = dan.grump ~ dan.sleep + day + baby.sleep,
      data = parenthood
    )
    ## Bayes factor analysis
    ## --------------
    ## [1] dan.sleep                    : 1.622545e+34 ±0.01%
    ## [2] day                          : 0.2724027    ±0%
    ## [3] baby.sleep                   : 10018411     ±0%
    ## [4] dan.sleep + day              : 1.016576e+33 ±0%
    ## [5] dan.sleep + baby.sleep       : 9.77022e+32  ±0%
    ## [6] day + baby.sleep             : 2340755      ±0%
    ## [7] dan.sleep + day + baby.sleep : 7.835625e+31 ±0%
    ## 
    ## Against denominator:
    ##   Intercept only 
    ## ---
    ## Bayes factor type: BFlinearModel, JZS

    So that’s pretty straightforward: it’s exactly what we’ve been doing throughout the book. The output, however, is a little different from what you get from lm(). The format of this is pretty familiar. At the bottom we have some techical rubbish, and at the top we have some information about the Bayes factors. What’s new is the fact that we seem to have lots of Bayes factors here. What’s all this about?

    The trick to understanding this output is to recognise that if we’re interested in working out which of the 3 predictor variables are related to dan.grump, there are actually 8 possible regression models that could be considered. One possibility is the intercept only model, in which none of the three variables have an effect. At the other end of the spectrum is the full model in which all three variables matter. So what regressionBF() does is treat the intercept only model as the null hypothesis, and print out the Bayes factors for all other models when compared against that null. For example, if we look at line 4 in the table, we see that the evidence is about 1033 to 1 in favour of the claim that a model that includes both dan.sleep and day is better than the intercept only model. Or if we look at line 1, we can see that the odds are about 1.6×1034 that a model containing the dan.sleep variable (but no others) is better than the intercept only model.

    Finding the best model

    In practice, this isn’t super helpful. In most situations the intercept only model is one that you don’t really care about at all. What I find helpful is to start out by working out which model is the best one, and then seeing how well all the alternatives compare to it. Here’s how you do that. In this case, it’s easy enough to see that the best model is actually the one that contains dan.sleep only (line 1), because it has the largest Bayes factor. However, if you’ve got a lot of possible models in the output, it’s handy to know that you can use the head() function to pick out the best few models. First, we have to go back and save the Bayes factor information to a variable:

    models <- regressionBF(
      formula = dan.grump ~ dan.sleep + day + baby.sleep,
      data = parenthood
    )

    Let’s say I want to see the best three models. To do this, I use the head() function specifying n=3, and here’s what I get as the result:

    head( models, n = 3)
    
    ## Bayes factor analysis
    ## --------------
    ## [1] dan.sleep              : 1.622545e+34 ±0.01%
    ## [2] dan.sleep + day        : 1.016576e+33 ±0%
    ## [3] dan.sleep + baby.sleep : 9.77022e+32  ±0%
    ## 
    ## Against denominator:
    ##   Intercept only 
    ## ---
    ## Bayes factor type: BFlinearModel, JZS

    This is telling us that the model in line 1 (i.e., dan.grump ~ dan.sleep) is the best one. That’s almost what I’m looking for, but it’s still comparing all the models against the intercept only model. That seems silly. What I’d like to know is how big the difference is between the best model and the other good models. For that, there’s this trick:

    head( models/max(models), n = 3)
    ## Bayes factor analysis
    ## --------------
    ## [1] dan.sleep              : 1         ±0%
    ## [2] dan.sleep + day        : 0.0626532 ±0.01%
    ## [3] dan.sleep + baby.sleep : 0.0602154 ±0.01%
    ## 
    ## Against denominator:
    ##   dan.grump ~ dan.sleep 
    ## ---
    ## Bayes factor type: BFlinearModel, JZS

    Notice the bit at the bottom showing that the “denominator” has changed. What that means is that the Bayes factors are now comparing each of those 3 models listed against the dan.grump ~ dan.sleep model. Obviously, the Bayes factor in the first line is exactly 1, since that’s just comparing the best model to itself. More to the point, the other two Bayes factors are both less than 1, indicating that they’re all worse than that model. The Bayes factors of 0.06 to 1 imply that the odds for the best model over the second best model are about 16:1. You can work this out by simple arithmetic (i.e., 0.06/1≈16), but the other way to do it is to directly compare the models. To see what I mean, here’s the original output:

    models
    ## Bayes factor analysis
    ## --------------
    ## [1] dan.sleep                    : 1.622545e+34 ±0.01%
    ## [2] day                          : 0.2724027    ±0%
    ## [3] baby.sleep                   : 10018411     ±0%
    ## [4] dan.sleep + day              : 1.016576e+33 ±0%
    ## [5] dan.sleep + baby.sleep       : 9.77022e+32  ±0%
    ## [6] day + baby.sleep             : 2340755      ±0%
    ## [7] dan.sleep + day + baby.sleep : 7.835625e+31 ±0%
    ## 
    ## Against denominator:
    ##   Intercept only 
    ## ---
    ## Bayes factor type: BFlinearModel, JZS

    The best model corresponds to row 1 in this table, and the second best model corresponds to row 4. All you have to do to compare these two models is this:

    models[1] / models[4]
    ## Bayes factor analysis
    ## --------------
    ## [1] dan.sleep : 15.96088 ±0.01%
    ## 
    ## Against denominator:
    ##   dan.grump ~ dan.sleep + day 
    ## ---
    ## Bayes factor type: BFlinearModel, JZS

    And there you have it. You’ve found the regression model with the highest Bayes factor (i.e., dan.grump ~ dan.sleep), and you know that the evidence for that model over the next best alternative (i.e., dan.grump ~ dan.sleep + day) is about 16:1.

    Extracting Bayes factors for all included terms

    Okay, let’s say you’ve settled on a specific regression model. What Bayes factors should you report? In this example, I’m going to pretend that you decided that dan.grump ~ dan.sleep + baby.sleep is the model you think is best. Sometimes it’s sensible to do this, even when it’s not the one with the highest Bayes factor. Usually this happens because you have a substantive theoretical reason to prefer one model over the other. However, in this case I’m doing it because I want to use a model with more than one predictor as my example!

    Having figured out which model you prefer, it can be really useful to call the regressionBF() function and specifying whichModels="top". You use your “preferred” model as the formula argument, and then the output will show you the Bayes factors that result when you try to drop predictors from this model:

    regressionBF( 
     formula = dan.grump ~ dan.sleep + baby.sleep,
     data = parenthood,
     whichModels = "top"
    )
    ## Bayes factor top-down analysis
    ## --------------
    ## When effect is omitted from dan.sleep + baby.sleep , BF is...
    ## [1] Omit baby.sleep : 16.60705     ±0.01%
    ## [2] Omit dan.sleep  : 1.025403e-26 ±0.01%
    ## 
    ## Against denominator:
    ##   dan.grump ~ dan.sleep + baby.sleep 
    ## ---
    ## Bayes factor type: BFlinearModel, JZS

    Okay, so now you can see the results a bit more clearly. The Bayes factor when you try to drop the dan.sleep predictor is about 10−26, which is very strong evidence that you shouldn’t drop it. On the other hand, the Bayes factor actually goes up to 17 if you drop baby.sleep, so you’d usually say that’s pretty strong evidence for dropping that one.


    This page titled 17.8: Bayesian Regression is shared under a CC BY-SA 4.0 license and was authored, remixed, and/or curated by Danielle Navarro via source content that was edited to the style and standards of the LibreTexts platform.

    • Was this article helpful?