16.6: Practice Chi-Square Test of Independence - College Sports
We will be using the frequencies of watching college sports and how much college sports teams affect the choice of college attendance again, so here is the contingency table of the Observed frequencies and the table with the Expected frequencies that have been presented before. They are being presented again so that you don't have to go back and forth to find all of the data.
Here's the contingency table with the Observed frequencies:
College Sports  Primary Affect  Somewhat Affected  Did Not Affect Decision  Total 
Watched  47  26  14  87 
Did Not Watch  21  23  37  81 
Total  68  49  51  168 
and the Expected frequencies:
College Sports  Primary Affect  Somewhat Affected  Did Not Affect Decision  Total 
Watched  35.21  25.38  26.41  87 
Did Not Watch  32.79  23.63  24.59  81 
Total  68  49  51  168 
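As a quick check on the Expected table, here is a minimal Python sketch that rebuilds it from the Observed frequencies. It assumes the usual rule for a Test of Independence (row total times column total, divided by the grand total); the variable names are illustrative only.

```python
# Observed frequencies from the contingency table above.
# Rows: Watched, Did Not Watch. Columns: Primary, Somewhat, Did Not Affect.
observed = [
    [47, 26, 14],
    [21, 23, 37],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Assumed rule: expected cell frequency = (row total * column total) / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(["Watched", "Did Not Watch"], expected):
    print(label, [f"{e:.3f}" for e in row])
```

One cell works out to exactly 23.625, which can show up as 23.62 or 23.63 depending on the rounding rule used.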
Now we're ready to follow the same 4-step procedure that you've come to know.
Step 1: State the Hypotheses
Chi-Square tests look for patterns of relationship, and this doesn't change when we have more than one variable. The research hypothesis does not need to specify how each cell relates to every other cell, as it would in a factorial ANOVA, because we generally don't do pairwise comparisons for Chi-Square analyses. Instead, you can describe a general pattern of relationship between the two variables.
Example \(\PageIndex{1}\)
What is the research hypothesis in words for this scenario? Make sure to describe a general pattern.
Solution
 Research hypothesis in words: There will be a pattern of difference such that there will be more people whose college decision was affected by college sports AND who watched college sports.
Remember, in the hypotheses in symbols we can only say that the probabilities will not be equal. To determine that, let's figure out what the probability would be if all of the cells were equally likely. To find that out, we divide a probability of 100% by the number of cells. There are six cells (Affected Decision = 3; Watched = 2; \(2\times 3 = 6\)).
\[\dfrac{100}{6} = 16.67 \nonumber \]
So the probability that any random participant will fall into a specific cell is 0.167 for each cell.
Example \(\PageIndex{2}\)
What is the research hypothesis in symbols for this scenario?
Solution
 Research hypothesis in symbols: \(P_{EachCell}\neq 0.167\).
If this research hypothesis in symbols doesn't make sense, it might be easier to start with a null hypothesis in words and symbols, then figure out how that works out for the research hypothesis. But honestly, it's not a huge deal if you don't get the probability part of the hypotheses. The important point is that you are testing the null hypothesis that all frequencies will be similar (no pattern of relationship), but that you actually expect a particular pattern (research hypothesis).
Example \(\PageIndex{3}\)
What is the null hypothesis in words and symbols for this scenario?
Solution
 Null hypothesis in words: There is no pattern of relationship between watching college sports and how much college sports affected college decisions.
 Null hypothesis in symbols: \(P_{EachCell} = 0.167\).
Before we move on to an easier step, let's stop and remind ourselves that, just like correlations, Chi-Square tests cannot show that watching sports caused people to choose their college differently. This scenario is set up to make you think that the IV is watching college sports and the DV is the person's choice of college, but a Chi-Square can't test whether one thing causes another; these statistical analyses can only show if there's a pattern of relationship or not. The design of the experiment (how we collect the data to rule out alternative causes) is how we can show that one variable causes changes in another. This scenario is basically asking the participants if they think that college sports affected their choice of college. It's a small distinction, but an important one.
Step 2: Find the Critical Value
Okay, moving on from the "correlation doesn't equal causation" rant applied to Chi-Square!
Our critical value will come from the same table that we used for the Goodness of Fit Chi-Square test, but our degrees of freedom will change. Because we now have rows and columns (instead of just columns), our new Degrees of Freedom use information on both. This is described at the bottom of the Critical Value of Chi-Square Table page, and looks like this:
 \(\chi_{ToI}^2\) Test of Independence: \((R-1)\times(C-1) \)
 R is the number of rows
 C is the number of columns
What this means is that the number of rows minus one is multiplied by the number of columns minus one. In our example:
\[df=(2-1)\times(3-1)=1 \times 2 = 2 \nonumber \]
Exercise \(\PageIndex{1}\)
What is the critical value for this scenario from the p = 0.05 column?
 Answer

With our df = 2 (\(df=(2-1)\times(3-1)=1 \times 2 = 2 \)), the critical value is 5.991.
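If you'd like to double-check the table lookup, here is a small Python sketch. It leans on a shortcut that is an assumption worth flagging: only when df = 2 does the Chi-Square upper-tail probability simplify to \(e^{-x/2}\), so the critical value is \(-2\ln(\alpha)\). For any other df, use the table (or statistical software) instead.

```python
import math

rows, cols = 2, 3   # Watched (2 levels) by Affected Decision (3 levels)
alpha = 0.05

# Degrees of freedom for a Test of Independence: (R - 1) x (C - 1).
df = (rows - 1) * (cols - 1)

# Shortcut valid ONLY for df == 2: upper-tail probability is exp(-x / 2),
# so solving exp(-x / 2) = alpha gives x = -2 * ln(alpha).
critical_value = -2 * math.log(alpha)

print(f"df = {df}, critical value = {critical_value:.3f}")
```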
Step 3: Calculate the Test Statistic
You probably won't believe it, but you finally have caught a break in learning formulas. The formula for Chi-Square's Goodness of Fit test is the same formula for the Chi-Square Test of Independence!
\[\chi^{2}=\sum_{Each}\left(\dfrac{\left(E-O\right)^{2}}{E} \right)\nonumber \]
If you find a way to combine Table \(\PageIndex{1}\) and Table \(\PageIndex{2}\) with the Differences, the Differences Squared, and the Squared Differences divided by the Expected frequencies into one table, then you are a data visualization wizard! For now, we'll create a new table for each step of the formula.
Example \(\PageIndex{4}\)
Use the previous two tables (Table \(\PageIndex{1}\) and Table \(\PageIndex{2}\) ) to create a table of differences by subtracting the Observed frequencies from the Expected frequencies for each cell.
Solution
Here is a table of differences:
College Sports  Primary Affect  Somewhat Affected  Did Not Affect Decision  Total 
Watched  35.21-47=-11.79  25.38-26=-0.62 (or -0.63)  26.41-14=12.41  0 
Did Not Watch  32.79-21=11.79  23.63-23=0.63  24.59-37=-12.41  0 
Total  0  0 (or 0.01)  0  0 (or 0.01) 
Notice that the row and column Totals are zero (or nearly so, depending on rounding). This is another good calculation check!
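The zero-totals check can also be sketched in Python. This is only an illustration under the assumption that Expected frequencies are computed as row total times column total divided by N; with unrounded Expected values, every row and column of differences should sum to zero, apart from tiny floating-point error.

```python
observed = [[47, 26, 14], [21, 23, 37]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Expected minus Observed, the same order used in the table of differences.
differences = [
    [e - o for e, o in zip(e_row, o_row)]
    for e_row, o_row in zip(expected, observed)
]

row_sums = [sum(row) for row in differences]
col_sums = [sum(col) for col in zip(*differences)]

print("Row sums:", [round(s, 2) for s in row_sums])
print("Column sums:", [round(s, 2) for s in col_sums])
```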
So far, we've accomplished the part in the parentheses of the formula:
\[\chi^{2}=\sum_{Each}\left(\dfrac{\left(E-O\right)^{2}}{E} \right)\nonumber \]
What is the next step in this formula?
Example \(\PageIndex{4}\)
Square the difference scores in each cell of Table \(\PageIndex{3}\).
Solution
College Sports  Primary Affect  Somewhat Affected  Did Not Affect Decision 
Watched  139.00  0.38  154.01 
Did Not Watch  139.00  0.40  154.01 
The row and column for the Total was removed because those sums aren't used for anything. You can calculate them for completeness, but it won't help you finish the formula. Unless it helps you not get lost!
Now we've finished the numerator of the formula:
\[\chi^{2}=\sum_{Each}\left(\dfrac{\left(E-O\right)^{2}}{E} \right)\nonumber \]
What is the next step?
Example \(\PageIndex{5}\)
Using the squared differences in Table \(\PageIndex{4}\), complete the formula by dividing each cell by its own Expected frequency (found in Table \(\PageIndex{2}\)). Then add up the row Totals (or, equivalently, the column Totals) to get the calculated \(\chi^2\).
Solution
College Sports  Primary Affect  Somewhat Affected  Did Not Affect Decision  Total 
Watched  \(\dfrac{139}{35.21} = 3.95 \)  \(\dfrac{0.38}{25.38} = 0.01 \)  \(\dfrac{154.01}{26.41} = 5.83 \)  \(\sum_{Row} = 9.79 \) 
Did Not Watch  \(\dfrac{139}{32.79} = 4.24 \)  \(\dfrac{0.40}{23.63} = 0.02 \)  \(\dfrac{154.01}{24.59} = 6.26 \)  \(\sum_{Row} = 10.52 \) 
Total  \(\sum_{Column} = 8.19 \)  \(\sum_{Column} = 0.03 \)  \(\sum_{Column} = 12.09 \)  \(\sum_{Column} = 20.31 \) \(\sum_{Row} = 20.31 \) 
The Total for summing the Totals for the columns is the same as the sum of the Totals for the rows, so we did it correctly!
Okay, let's see if, for the Test of Independence, doing your calculations in the formula might be easier.
Example \(\PageIndex{6}\)
Use the Chi-Square formula to calculate the \(\chi^2\) statistic.
Solution
Using the information in Table \(\PageIndex{1}\) and Table \(\PageIndex{2}\), we find:
\[\begin{aligned} \chi^{2} &=\dfrac{(35.21-47)^{2}}{35.21}+\dfrac{(25.38-26)^{2}}{25.38}+\dfrac{(26.41-14)^{2}}{26.41}+ \dfrac{(32.79-21)^{2}}{32.79}+\dfrac{(23.63-23)^{2}}{23.63}+\dfrac{(24.59-37)^{2}}{24.59} \end{aligned} \nonumber \]
\[\begin{aligned} \chi_{Diff}^{2} &=\dfrac{(-11.79)^{2}}{35.21}+\dfrac{(-0.62)^{2}}{25.38}+\dfrac{(12.41)^{2}}{26.41}+ \dfrac{(11.79)^{2}}{32.79}+\dfrac{(0.63)^{2}}{23.63}+\dfrac{(-12.41)^{2}}{24.59} \end{aligned} \nonumber \]
\[\begin{aligned} \chi_{Diff Squared}^{2} &=\dfrac{139.00}{35.21}+\dfrac{0.38}{25.38}+\dfrac{154.01}{26.41}+ \dfrac{139.00}{32.79}+\dfrac{0.40}{23.63}+\dfrac{154.01}{24.59} \end{aligned} \nonumber \]
\[\begin{aligned} \chi_{Division}^{2} &=3.95+0.01+5.83+ 4.24+0.02+6.26 = 20.31 \end{aligned} \nonumber \]
\[ \chi^{2} = 20.31 \nonumber \]
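A third route is to let a short script do the arithmetic. The sketch below is not part of the original procedure; it assumes Expected frequencies are computed as row total times column total divided by N and carries them unrounded, so intermediate values can differ slightly from the hand-rounded tables while the final statistic still comes out at about 20.31.

```python
observed = [[47, 26, 14], [21, 23, 37]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Each cell contributes (E - O)^2 / E; the statistic is the sum over all cells.
contributions = [
    [(e - o) ** 2 / e for e, o in zip(e_row, o_row)]
    for e_row, o_row in zip(expected, observed)
]
chi_square = sum(sum(row) for row in contributions)

for label, row in zip(["Watched", "Did Not Watch"], contributions):
    print(label, [round(c, 2) for c in row])
print("Chi-square:", round(chi_square, 2))
```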
What do you think? Was it easier to do the calculations in five different tables, or to do it all in one formula? There's no right answer for this; it really comes down to what's easier for you.
But now, we're ready to make a decision!
Step 4: Make the Decision
What is the final decision?
Exercise \(\PageIndex{2}\)
Should the null hypothesis be retained or rejected?
 Answer

Our calculated \(\chi^2\)=20.31, and the critical \(\chi^2\) was 5.991, so we would reject the null hypothesis. Our calculated value is so extreme that we would expect it less than 5% of the time if there really was no pattern of relationship between the two qualitative variables.
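The decision rule can be written out as a short Python sketch. As a loudly labeled assumption, the closed-form expressions below are valid only for df = 2, where the Chi-Square upper-tail probability is \(e^{-x/2}\); for other df you would compare against the table instead.

```python
import math

chi_square = 20.31   # calculated value from Step 3
df = 2               # (2 - 1) x (3 - 1)
alpha = 0.05

# Valid ONLY for df == 2: upper-tail probability is exp(-x / 2).
critical_value = -2 * math.log(alpha)
p_value = math.exp(-chi_square / 2)

# Reject the null hypothesis when the calculated value exceeds the critical value.
reject_null = chi_square > critical_value

print("Reject the null hypothesis" if reject_null else "Retain the null hypothesis")
print("p < .05" if p_value < alpha else "p > .05")
```

Comparing the calculated value to the critical value and comparing p to .05 are two views of the same decision, so they always agree.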
So what would the statistical sentence look like?
Exercise \(\PageIndex{3}\)
What would our results look like in the statistical sentence?
 Answer

\(\chi^2\)(2)=20.31, p<.05
Let's use all that we've done to let people know what we found in...
The Write-Up
Can you write this up with the four requirements for reporting results but without descriptive statistics? You can include all of the Observed frequencies, but that gets clunky. A good way around that is to refer to the original table of Observed frequencies.
Example \(\PageIndex{7}\)
Report the results in a concluding paragraph that includes the four requirements.
Solution
The research hypothesis was that there would be a pattern of difference such that there would be more people whose college decision was affected by college sports AND who watched college sports. A pattern of difference was found (\(\chi^2\)(2)=20.31, p<.05). As can be seen in Table \(\PageIndex{1}\), this research hypothesis was supported. People who watched college sports seem to believe that they used that to choose their college, and people who didn't watch college sports seem to believe that they did not use college sports to make their decision about which college to choose.
Did you notice all of the "seem to" and "believe" in that concluding paragraph? Yeah, that's how scientists write. Because science is cumulative, each one of us adds one piece of evidence to a pile that supports one idea. In this case, the idea was that people thought that their choice of college was affected by whether they watched college sports or not. In the Goodness of Fit example, the idea that was supported was that there are about the same number of people who like and dislike pineapple on pizza. But one study is never conclusive. Instead, many, many scientists conduct many, many studies. Some of them show reality, but some of them (p<.05) find results from their sample that do not match the reality in the population. It can be hard for nonscientists because they might just see us being wishy-washy about our results when we are really following the guidelines of the null hypothesis significance testing procedure.
Let's try one more example so that we've got this Chi-Square thing down.
Contributors and Attributions
Foster et al. (University of Missouri-St. Louis, Rice University, & University of Houston, Downtown Campus)
