# 12.6: The Most Typical Way to Do Chi-square Tests in R

When discussing how to do a chi-square goodness of fit test (Section 12.1.7) and the chi-square test of independence (Section 12.2.2), I introduced you to two separate functions in the `lsr` package. We ran our goodness of fit tests using the `goodnessOfFitTest()` function, and our tests of independence (or association) using the `associationTest()` function. Both of those functions produce quite detailed output, showing you the relevant descriptive statistics, printing out explicit reminders of what the hypotheses are, and so on. When you’re first starting out, it can be very handy to be given this sort of guidance. However, once you become a bit more proficient in statistics and in R, it can start to get very tiresome. A real statistician hardly needs to be told what the null and alternative hypotheses for a chi-square test are, and if an advanced R user wants the descriptive statistics to be printed out, they know how to produce them!

For this reason, the basic `chisq.test()` function in R is a lot more terse in its output, and because the mathematics that underpins the goodness of fit test and the test of independence is basically the same in each case, it can run either test depending on what kind of input it is given. First, here’s the goodness of fit test. Suppose you have the frequency table `observed` that we used earlier:

`observed`

```
##
##    clubs diamonds   hearts   spades
##       35       51       64       50
```

If you want to run the goodness of fit test against the hypothesis that all four suits are equally likely to appear, then all you need to do is input this frequency table to the `chisq.test()` function:

`chisq.test( x = observed )`

```
##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 8.44, df = 3, p-value = 0.03774
```
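To see where these numbers come from, the statistic can be reproduced by hand. The following is only a sketch: it rebuilds `observed` as a named vector (a stand-in for the frequency table created earlier from the raw data), and the names `expected`, `X2` and `p.value.by.hand` are illustrative rather than anything `chisq.test()` itself uses.

```r
# Stand-in for the frequency table used in the text (an assumption:
# the original was tabulated from the raw data, not typed in by hand)
observed <- c( clubs = 35, diamonds = 51, hearts = 64, spades = 50 )

# Under the null hypothesis, all four suits are equally likely
expected <- sum( observed ) * rep( 1/4, 4 )

# Pearson's goodness of fit statistic: sum of (O - E)^2 / E
X2 <- sum( (observed - expected)^2 / expected )

# Upper-tail probability from a chi-square distribution with k - 1 df
p.value.by.hand <- pchisq( X2, df = length( observed ) - 1, lower.tail = FALSE )

print( X2 )                # same value as X-squared in the output above
print( p.value.by.hand )   # same value as the p-value in the output above
```

With 200 observations and four equally likely suits, every expected frequency is 50, which is why the statistic works out to (225 + 1 + 196 + 0) / 50 = 8.44.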

Notice that the output is very compressed in comparison to the `goodnessOfFitTest()` function. It doesn’t bother to give you any descriptive statistics, it doesn’t tell you what null hypothesis is being tested, and so on. And as long as you already understand the test, that’s not a problem. Once you start getting familiar with R and with statistics, you’ll probably find that you prefer this simple output to the rather lengthy output that `goodnessOfFitTest()` produces. Anyway, if you want to change the null hypothesis, it’s exactly the same as before: just specify the probabilities using the `p` argument. For instance:

`chisq.test( x = observed, p = c(.2, .3, .3, .2) )`

```
##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 4.7417, df = 3, p-value = 0.1917
```

Again, these are the same numbers that the `goodnessOfFitTest()` function reports at the end of the output. It just hasn’t included any of the other details.
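If you do want some of those other details, they are still available, because `chisq.test()` returns an object that you can store and inspect. The sketch below assumes a hand-built `observed` vector standing in for the earlier frequency table, and `result` is just an illustrative variable name.

```r
# Stand-in for the frequency table used in the text (an assumption)
observed <- c( clubs = 35, diamonds = 51, hearts = 64, spades = 50 )

# Store the test instead of printing it
result <- chisq.test( x = observed, p = c(.2, .3, .3, .2) )

result$statistic   # the X-squared value
result$parameter   # the degrees of freedom
result$p.value     # the p-value
result$expected    # expected frequencies under the null: 40, 60, 60, 40
result$residuals   # Pearson residuals, (observed - expected) / sqrt(expected)
```

The probabilities passed to `p` need to sum to 1. If they don’t, `chisq.test()` stops with an error unless you also set `rescale.p = TRUE`, in which case it rescales them for you.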

What about a test of independence? As it turns out, the `chisq.test()` function is pretty clever.^{180} If you input a *cross-tabulation* rather than a simple frequency table, it realises that you’re asking for a test of independence and not a goodness of fit test. Recall that we already have this cross-tabulation stored as the `chapekFrequencies` variable:

`chapekFrequencies`

```
##         species
## choice   robot human
##   puppy     13    15
##   flower    30    13
##   data      44    65
```

To get the test of independence, all we have to do is feed this frequency table into the `chisq.test()` function like so:

`chisq.test( chapekFrequencies )`

```
##
## Pearson's Chi-squared test
##
## data: chapekFrequencies
## X-squared = 10.722, df = 2, p-value = 0.004697
```
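As with the goodness of fit test, it can be reassuring to check the calculation by hand. The sketch below is illustrative only: it types the cross-tabulation in as a matrix (in the text it was built from the raw data), and `expected`, `X2`, `df` and `p.value.by.hand` are names chosen for this example.

```r
# Stand-in for the cross-tabulation used in the text (an assumption).
# matrix() fills column by column, so the first three numbers are the
# robot column and the last three are the human column.
chapekFrequencies <- matrix(
    c( 13, 30, 44,
       15, 13, 65 ),
    nrow = 3,
    dimnames = list( choice  = c( "puppy", "flower", "data" ),
                     species = c( "robot", "human" ) )
)

# Expected frequencies under independence: row total * column total / N
expected <- outer( rowSums( chapekFrequencies ),
                   colSums( chapekFrequencies ) ) / sum( chapekFrequencies )

# Pearson's statistic, with (rows - 1) * (columns - 1) degrees of freedom
X2 <- sum( (chapekFrequencies - expected)^2 / expected )
df <- (nrow( chapekFrequencies ) - 1) * (ncol( chapekFrequencies ) - 1)
p.value.by.hand <- pchisq( X2, df = df, lower.tail = FALSE )

# Compare against the built-in function
result <- chisq.test( chapekFrequencies )
print( c( by.hand = X2, built.in = unname( result$statistic ) ) )
```

The two values agree here because the table is 3 × 2. For a 2 × 2 table, `chisq.test()` applies the Yates continuity correction by default (`correct = TRUE`), so a hand calculation like this one would only match if you set `correct = FALSE`.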

Again, the numbers are the same as last time; it’s just that the output is very terse and doesn’t really explain what’s going on in the rather tedious way that `associationTest()` does. As before, my intuition is that when you’re just getting started it’s easier to use something like `associationTest()` because it shows you more detail about what’s going on, but later on you’ll probably find that `chisq.test()` is more convenient.