Okay, time for a little bit of a digression. I’ve been lying to you a little bit so far. There’s a tiny change that you need to make to your calculations whenever you only have 1 degree of freedom. It’s called the “continuity correction”, or sometimes the Yates correction. Remember what I pointed out earlier: the $\chi^2$ test is based on an approximation, specifically on the assumption that the binomial distribution starts to look like a normal distribution for large $N$. One problem with this is that the approximation often doesn’t quite work, especially when you’ve only got 1 degree of freedom (e.g., when you’re doing a test of independence on a $2 \times 2$ contingency table). The main reason for this is that the true sampling distribution for the $X^2$ statistic is actually discrete (because you’re dealing with categorical data!) but the $\chi^2$ distribution is continuous. This can introduce systematic problems. Specifically, when $N$ is small and when $df = 1$, the goodness of fit statistic tends to be “too big”, meaning that you actually have a bigger $\alpha$ value than you think (or, equivalently, the $p$ values are a bit too small). Yates (1934) suggested a simple fix, in which you redefine the goodness of fit statistic as:

$$
X^2 = \sum_{i} \frac{(|E_i - O_i| - 0.5)^2}{E_i}
$$
Basically, he just subtracts 0.5 from the absolute difference between each observed and expected frequency before squaring it. As far as I can tell from reading Yates’ paper, the correction is basically a hack. It’s not derived from any principled theory: rather, it’s based on an examination of the behaviour of the test, and observing that the corrected version seems to work better. I feel obliged to explain this because you will sometimes see R (or any other software for that matter) introduce this correction, so it’s kind of useful to know what it’s about. You’ll know when it happens, because the R output will explicitly say that it has used a “continuity correction” or “Yates’ correction”.
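To make the effect of the correction concrete, here is a minimal sketch that computes the statistic with and without it for a test of independence on a 2×2 table. It is written in plain Python rather than R, and it is not how any particular package implements the test: the function name `chi_square_2x2` and the counts in `observed` are made up purely for illustration.

```python
import math


def chi_square_2x2(table, yates=False):
    """Return (X2, p) for a test of independence on a 2x2 table of counts.

    Illustrative sketch only. With yates=True, 0.5 is subtracted from each
    |E - O| before squaring; this assumes every |E - O| is at least 0.5.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    x2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected frequency under independence.
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(expected - table[i][j])
            if yates:
                diff -= 0.5
            x2 += diff ** 2 / expected

    # With df = 1, the upper-tail chi-square probability is erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(x2 / 2))
    return x2, p


observed = [[12, 5], [6, 14]]  # hypothetical counts, not real data

x2_plain, p_plain = chi_square_2x2(observed)
x2_yates, p_yates = chi_square_2x2(observed, yates=True)

print(f"uncorrected: X2 = {x2_plain:.3f}, p = {p_plain:.4f}")
print(f"corrected:   X2 = {x2_yates:.3f}, p = {p_yates:.4f}")
```

For a table like this one, the corrected statistic comes out smaller and its p value larger, which is exactly the direction of adjustment described above.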