18.1: The Undiscovered Statistics
First, I’m going to talk a bit about some of the content that I wish I’d had the chance to cram into this version of the book, just so that you can get a sense of what other ideas are out there in the world of statistics. I think this would be important even if this book were getting close to a final product: one thing that students often fail to realise is that their introductory statistics classes are just that: an introduction. If you want to go out into the wider world and do real data analysis, you have to learn a whole lot of new tools that extend the content of your undergraduate lectures in all sorts of different ways. Don’t assume that something can’t be done just because it wasn’t covered in undergrad. Don’t assume that something is the right thing to do just because it {} covered in an undergrad class. To stop you from falling victim to that trap, I think it’s useful to give a bit of an overview of some of the other ideas out there.
Omissions within the topics covered
Even within the topics that I have covered in the book, there are a lot of omissions that I’d like to redress in future version of the book. Just sticking to things that are purely about statistics (rather than things associated with R), the following is a representative but not exhaustive list of topics that I’d like to expand on in later versions:
- Other types of correlations In Chapter 5 I talked about two types of correlation: Pearson and Spearman. Both of these methods of assessing correlation are applicable to the case where you have two continuous variables and want to assess the relationship between them. What about the case where your variables are both nominal scale? Or when one is nominal scale and the other is continuous? There are actually methods for computing correlations in such cases (e.g., polychoric correlation), but I just haven’t had time to write about them yet.
- More detail on effect sizes In general, I think the treatment of effect sizes throughout the book is a little more cursory than it should be. In almost every instance, I’ve tended just to pick one measure of effect size (usually the most popular one) and describe that. However, for almost all tests and models there are multiple ways of thinking about effect size, and I’d like to go into more detail in the future.
- Dealing with violated assumptions In a number of places in the book I’ve talked about some things you can do when you find that the assumptions of your test (or model) are violated, but I think that I ought to say more about this. In particular, I think it would have been nice to talk in a lot more detail about how you can tranform variables to fix problems. I talked a bit about this in Sections 7.2, 7.3 and 15.9.4, but the discussion isn’t detailed enough I think.
- Interaction terms for regression In Chapter 16 I talked about the fact that you can have interaction terms in an ANOVA, and I also pointed out that ANOVA can be interpreted as a kind of linear regression model. Yet, when talking about regression in Chapter 15 I made no mention of interactions at all. However, there’s nothing stopping you from including interaction terms in a regression model. It’s just a little more complicated to figure out what an “interaction” actually means when you’re talking about the interaction between two continuous predictors, and it can be done in more than one way. Even so, I would have liked to talk a little about this.
- Method of planned comparison As I mentioned this in Chapter 16, it’s not always appropriate to be using post hoc correction like Tukey’s HSD when doing an ANOVA, especially when you had a very clear (and limited) set of comparisons that you cared about ahead of time. I would like to talk more about this in a future version of book.
- Multiple comparison methods Even within the context of talking about post hoc tests and multiple comparisons, I would have liked to talk about the methods in more detail, and talk about what other methods exist besides the few options I mentioned.