9.7: Summary
 Page ID
 8231
In this chapter we’ve talked about probability. We’ve talked what probability means, and why statisticians can’t agree on what it means. We talked about the rules that probabilities have to obey. And we introduced the idea of a probability distribution, and spent a good chunk of the chapter talking about some of the more important probability distributions that statisticians work with. The section by section breakdown looks like this:
 Probability theory versus statistics (Section 9.1)
 Frequentist versus Bayesian views of probability (Section 9.2)
 Basics of probability theory (Section 9.3)
 Binomial distribution (Section 9.4), normal distribution (Section 9.5), and others (Section 9.6)
As you’d expect, my coverage is by no means exhaustive. Probability theory is a large branch of mathematics in its own right, entirely separate from its application to statistics and data analysis. As such, there are thousands of books written on the subject and universities generally offer multiple classes devoted entirely to probability theory. Even the “simpler” task of documenting standard probability distributions is a big topic. I’ve described five standard probability distributions in this chapter, but sitting on my bookshelf I have a 45chapter book called “Statistical Distributions” Evans, Hastings, and Peacock (2011) that lists a lot more than that. Fortunately for you, very little of this is necessary. You’re unlikely to need to know dozens of statistical distributions when you go out and do real world data analysis, and you definitely won’t need them for this book, but it never hurts to know that there’s other possibilities out there.
Picking up on that last point, there’s a sense in which this whole chapter is something of a digression. Many undergraduate psychology classes on statistics skim over this content very quickly (I know mine did), and even the more advanced classes will often “forget” to revisit the basic foundations of the field. Most academic psychologists would not know the difference between probability and density, and until recently very few would have been aware of the difference between Bayesian and frequentist probability. However, I think it’s important to understand these things before moving onto the applications. For example, there are a lot of rules about what you’re “allowed” to say when doing statistical inference, and many of these can seem arbitrary and weird. However, they start to make sense if you understand that there is this Bayesian/frequentist distinction. Similarly, in Chapter 13 we’re going to talk about something called the ttest, and if you really want to have a grasp of the mechanics of the ttest it really helps to have a sense of what a tdistribution actually looks like. You get the idea, I hope.
References
Fisher, R. 1922b. “On the Mathematical Foundation of Theoretical Statistics.” Philosophical Transactions of the Royal Society A 222: 309–68.
Meehl, P. H. 1967. “Theory Testing in Psychology and Physics: A Methodological Paradox.” Philosophy of Science 34: 103–15.
Evans, M., N. Hastings, and B. Peacock. 2011. Statistical Distributions (3rd Ed). Wiley.

This doesn’t mean that frequentists can’t make hypothetical statements, of course; it’s just that if you want to make a statement about probability, then it must be possible to redescribe that statement in terms of a sequence of potentially observable events, and the relative frequencies of different outcomes that appear within that sequence.

Note that the term “success” is pretty arbitrary, and doesn’t actually imply that the outcome is something to be desired. If θ referred to the probability that any one passenger gets injured in a bus crash, I’d still call it the success probability, but that doesn’t mean I want people to get hurt in bus crashes!

Since computers are deterministic machines, they can’t actually produce truly random behaviour. Instead, what they do is take advantage of various mathematical functions that share a lot of similarities with true randomness. What this means is that any random numbers generated on a computer are pseudorandom, and the quality of those numbers depends on the specific method used. By default R uses the “Mersenne twister” method. In any case, you can find out more by typing
?Random
, but as usual the R help files are fairly dense. 
In practice, the normal distribution is so handy that people tend to use it even when the variable isn’t actually continuous. As long as there are enough categories (e.g., Likert scale responses to a questionnaire), it’s pretty standard practice to use the normal distribution as an approximation. This works out much better in practice than you’d think.

For those readers who know a little calculus, I’ll give a slightly more precise explanation. In the same way that probabilities are nonnegative numbers that must sum to 1, probability densities are nonnegative numbers that must integrate to 1 (where the integral is taken across all possible values of X). To calculate the probability that X falls between a and b we calculate the definite integral of the density function over the corresponding range, \(\int_{a}^{b} p(x) d x\). If you don’t remember or never learned calculus, don’t worry about this. It’s not needed for this book.