17.10: Summary
The first half of this chapter was focused primarily on the theoretical underpinnings of Bayesian statistics. I introduced the mathematics for how Bayesian inference works (Section 17.1), and gave a very basic overview of how Bayesian hypothesis testing is typically done (Section 17.2). Finally, I devoted some space to talking about why I think Bayesian methods are worth using (Section 17.3).
The second half of the chapter was a lot more practical, and focused on tools provided by the BayesFactor package. Specifically, I talked about using the contingencyTableBF() function to do Bayesian analogs of chi-square tests (Section 17.6), the ttestBF() function to do Bayesian t-tests (Section 17.7), the regressionBF() function to do Bayesian regressions, and finally the anovaBF() function for Bayesian ANOVA.
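
As a quick reminder of what these four functions look like in use, here is a minimal sketch. It is not taken from the chapter's own examples: it assumes the BayesFactor package is installed, and it borrows data sets that ship with R or with the package (raceDolls, sleep, attitude, ToothGrowth) purely for illustration. The sampling plan chosen for the contingency table is likewise just one plausible option.

```r
# Assumes the package is installed: install.packages("BayesFactor")
library(BayesFactor)

# Contingency table: raceDolls is an example data set shipped with BayesFactor.
# The sampling plan (column totals fixed) is an assumption made for this sketch.
data(raceDolls)
bf_table <- contingencyTableBF(raceDolls, sampleType = "indepMulti",
                               fixedMargin = "cols")

# t-test: paired comparison using R's built-in sleep data
bf_t <- ttestBF(x = sleep$extra[sleep$group == 1],
                y = sleep$extra[sleep$group == 2],
                paired = TRUE)

# Regression: compares sub-models of rating ~ . in R's built-in attitude data
bf_reg <- regressionBF(rating ~ ., data = attitude, progress = FALSE)

# ANOVA: ToothGrowth is built in; dose is stored as a number, so convert it
tooth <- ToothGrowth
tooth$dose <- factor(tooth$dose)
bf_anova <- anovaBF(len ~ supp * dose, data = tooth, progress = FALSE)

print(bf_table)
print(bf_t)
print(bf_reg)
print(bf_anova)
```

Each call returns an object that prints the Bayes factor for the alternative against the null model, along with the estimated margin of error.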
If you’re interested in learning more about the Bayesian approach, there are many good books you could look into. John Kruschke’s book Doing Bayesian Data Analysis is a pretty good place to start (Kruschke 2011), and is a nice mix of theory and practice. His approach is a little different to the “Bayes factor” approach that I’ve discussed here, so you won’t be covering the same ground. If you’re a cognitive psychologist, you might want to check out Michael Lee and E.J. Wagenmakers’ book Bayesian Cognitive Modeling (Lee and Wagenmakers 2014). I picked these two because I think they’re especially useful for people in my discipline, but there are a lot of good books out there, so look around!
References
Jeffreys, Harold. 1961. The Theory of Probability. 3rd ed. Oxford.
Kass, Robert E., and Adrian E. Raftery. 1995. “Bayes Factors.” Journal of the American Statistical Association 90: 773–95.
Fisher, R. 1925. Statistical Methods for Research Workers. Edinburgh, UK: Oliver & Boyd.
Johnson, Valen E. 2013. “Revised Standards for Statistical Evidence.” Proceedings of the National Academy of Sciences, no. 48: 19313–7.
Morey, Richard D., and Jeffrey N. Rouder. 2015. BayesFactor: Computation of Bayes Factors for Common Designs. http://CRAN.R-project.org/package=BayesFactor.
Gunel, Erdogan, and James Dickey. 1974. “Bayes Factors for Independence in Contingency Tables.” Biometrika, 545–57.
Kruschke, J. K. 2011. Doing Bayesian Data Analysis: A Tutorial with R and BUGS. Burlington, MA: Academic Press.
Lee, Michael D., and Eric-Jan Wagenmakers. 2014. Bayesian Cognitive Modeling: A Practical Course. Cambridge University Press.

It’s a leap of faith, I know, but let’s run with it, okay?

Um. I hate to bring this up, but some statisticians would object to me using the word “likelihood” here. The problem is that the word “likelihood” has a very specific meaning in frequentist statistics, and it’s not quite the same as what it means in Bayesian statistics. As far as I can tell, Bayesians didn’t originally have any agreed upon name for the likelihood, and so it became common practice for people to use the frequentist terminology. This wouldn’t have been a problem, except for the fact that the way that Bayesians use the word turns out to be quite different to the way frequentists do. This isn’t the place for yet another lengthy history lesson, but to put it crudely: when a Bayesian says “a likelihood function” they’re usually referring to one of the rows of the table. When a frequentist says the same thing, they’re referring to the same table, but to them “a likelihood function” almost always refers to one of the columns. This distinction matters in some contexts, but it’s not important for our purposes.
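
To make the rows-versus-columns distinction concrete, here is a small sketch in R. The numbers are illustrative assumptions rather than a quotation of the table in the main text. All that matters is the layout: one row per hypothesis, one column per possible observation.

```r
# Illustrative (assumed) values of P(d | h): rows are hypotheses, columns are data
lik <- matrix(c(0.30, 0.70,
                0.05, 0.95),
              nrow = 2, byrow = TRUE,
              dimnames = list(hypothesis = c("rainy", "dry"),
                              data = c("umbrella", "no umbrella")))

# A row fixes the hypothesis and varies the data.
# Each row is a proper probability distribution, so it sums to 1.
lik["rainy", ]
rowSums(lik)

# A column fixes the observed data and varies the hypothesis.
# A column is not a probability distribution and need not sum to 1.
lik[, "umbrella"]
colSums(lik)
```

Reading along a row answers "if this hypothesis is true, how probable is each possible observation?". Reading down a column answers "given what I actually observed, how well does each hypothesis predict it?".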

If we were being a bit more sophisticated, we could extend the example to accommodate the possibility that I’m lying about the umbrella. But let’s keep things simple, shall we?

You might notice that this equation is actually a restatement of the same basic rule I listed at the start of the last section. If you multiply both sides of the equation by P(d), then you get P(d) P(h|d) = P(d,h), which is the rule for how joint probabilities are calculated. So I’m not actually introducing any “new” rules here, I’m just using the same rule in a different way.
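
Written out step by step, the algebra in this footnote looks like the following. The first line is the equation the footnote refers to, as implied by the multiplication step described above. The last line is the joint probability rule written with h as the conditioning event, which is assumed here to be the form used in the earlier section.

```latex
\begin{align*}
P(h \mid d) &= \frac{P(d, h)}{P(d)}
  && \text{(the equation in question)} \\
P(d)\, P(h \mid d) &= P(d, h)
  && \text{(multiply both sides by } P(d)\text{)} \\
P(h)\, P(d \mid h) &= P(d, h)
  && \text{(the same rule, conditioning on } h \text{ instead)}
\end{align*}
```

Setting the two expressions for P(d,h) equal to each other and dividing by P(d) gives Bayes' rule, P(h|d) = P(d|h) P(h) / P(d).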

Obviously, this is a highly simplified story. All the complexity of real life Bayesian hypothesis testing comes down to how you calculate the likelihood P(d|h) when the hypothesis h is a complex and vague thing. I’m not going to talk about those complexities in this book, but I do want to highlight that although this simple story is true as far as it goes, real life is messier than I’m able to cover in an introductory stats textbook.

http://www.imdb.com/title/tt0093779/quotes. I should note in passing that I’m not the first person to use this quote to complain about frequentist methods. Rich Morey and colleagues had the idea first. I’m shamelessly stealing it because it’s such an awesome pull quote to use in this context and I refuse to miss any opportunity to quote The Princess Bride.

http://about.abc.net.au/reportspublications/appreciationsurveysummaryreport2013/

In the interests of being completely honest, I should acknowledge that not all orthodox statistical tests rely on this silly assumption. There are a number of sequential analysis tools that are sometimes used in clinical trials and the like. These methods are built on the assumption that data are analysed as they arrive, and these tests aren’t horribly broken in the way I’m complaining about here. However, sequential analysis methods are constructed in a very different fashion to the “standard” version of null hypothesis testing. They don’t make it into any introductory textbooks, and they’re not very widely used in the psychological literature. The concern I’m raising here is valid for every single orthodox test I’ve presented so far, and for almost every test I’ve seen reported in the papers I read.
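
For readers who want to see the problem rather than take it on trust, here is a minimal simulation sketch in base R. It assumes that the "silly assumption" is that the sample size is fixed before anyone looks at the data. The sample sizes, the number of simulated experiments and the seed are all arbitrary choices for illustration, and this is not the simulation used in the main text.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# One simulated experiment in which the null hypothesis is true.
# The researcher runs a t-test after every new observation and stops
# as soon as p < .05. Returns TRUE if the null was (wrongly) rejected.
peek_until_significant <- function(n_min = 10, n_max = 50) {
  x <- rnorm(n_max)  # population mean really is 0
  for (n in n_min:n_max) {
    if (t.test(x[1:n])$p.value < 0.05) return(TRUE)
  }
  FALSE
}

# The same kind of experiment, analysed once at the planned sample size
test_once <- function(n = 50) {
  t.test(rnorm(n))$p.value < 0.05
}

n_sims <- 500
rate_peeking <- mean(replicate(n_sims, peek_until_significant()))
rate_fixed   <- mean(replicate(n_sims, test_once()))

# Proportion of experiments that rejected a true null under each strategy
print(c(peeking = rate_peeking, fixed = rate_fixed))
```

The fixed-sample strategy should reject a true null in roughly 5% of experiments, as advertised, while the peeking strategy rejects it noticeably more often.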

A related problem: http://xkcd.com/1478/

Some readers might wonder why I picked 3:1 rather than 5:1, given that Johnson (2013) suggests that p = .05 lies somewhere in that range. I did so in order to be charitable to the p-value. If I’d chosen a 5:1 Bayes factor instead, the results would look even better for the Bayesian approach.

Okay, I just know that some knowledgeable frequentists will read this and start complaining about this section. Look, I’m not dumb. I absolutely know that if you adopt a sequential analysis perspective you can avoid these errors within the orthodox framework. I also know that you can explicitly design studies with interim analyses in mind. So yes, in one sense I’m attacking a “straw man” version of orthodox methods. However, the straw man that I’m attacking is the one that is used by almost every single practitioner. If it ever reaches the point where sequential methods become the norm among experimental psychologists and I’m no longer forced to read 20 extremely dubious ANOVAs a day, I promise I’ll rewrite this section and dial down the vitriol. But until that day arrives, I stand by my claim that default Bayes factor methods are much more robust in the face of data analysis practices as they exist in the real world. Default orthodox methods suck, and we all know it.

If you’re desperate to know, you can find all the gory details in Gunel and Dickey (1974). However, that’s a pretty technical paper. The help documentation for the contingencyTableBF() function gives this explanation: “the argument priorConcentration indexes the expected deviation from the null hypothesis under the alternative, and corresponds to Gunel and Dickey’s (1974) a parameter.” As I write this I’m about halfway through the Gunel and Dickey paper, and I agree that setting a = 1 is a pretty sensible default choice, since it corresponds to an assumption that you have very little a priori knowledge about the contingency table.

In some of the later examples, you’ll see that this number is not always 0%. This is because the BayesFactor package often has to run some simulations to compute approximate Bayes factors. So the answers you get won’t always be identical when you run the command a second time. That’s why the output of these functions tells you what the margin for error is.

Apparently this omission is deliberate. I have this vague recollection that I spoke to Jeff Rouder about this once, and his opinion was that when homogeneity of variance is violated the results of a t-test are uninterpretable. I can see the argument for this, but I’ve never really held a strong opinion myself. (Jeff, if you never said that, I’m sorry.)

Just in case you’re interested: the “JZS” part of the output relates to how the Bayesian test expresses the prior uncertainty about the variance σ², and it’s short for the names of three people: “Jeffreys-Zellner-Siow”. See Rouder et al. (2009) for details.

Again, in case you care … the null hypothesis here specifies an effect size of 0, since the two means are identical. The alternative hypothesis states that there is an effect, but it doesn’t specify exactly how big the effect will be. The r value here relates to how big the effect is expected to be according to the alternative. You can type ?ttestBF to get more details.

Again, guys, sorry if I’ve misread you.

I don’t even disagree with them: it’s not at all obvious why a Bayesian ANOVA should reproduce (say) the same set of model comparisons that the Type II testing strategy uses. It’s precisely because of the fact that I haven’t really come to any strong conclusions that I haven’t added anything to the lsr package to make Bayesian Type II tests easier to produce.