
15.2: Chi-Square Test of Independence

    • Chanler Hilley, Kennesaw State University
    • University of Missouri System


    The test for goodness of fit is a useful tool for assessing a single categorical variable. However, what is more common is wanting to know if two categorical variables are related to one another. This type of analysis is similar to a correlation, the only difference being that we are working with nominal data, which violates the assumptions of traditional correlation coefficients. This is where the \(\chi^2 \) test for independence comes in handy.

    As noted above, our only description for nominal data is frequency, so we will again present our observations in a frequency table. When we have two categorical variables, our frequency table is crossed. That is, each combination of levels from each categorical variable is presented. This type of frequency table is called a contingency table because it shows the frequency of each category in one variable, contingent upon the specific level of the other variable.

    An example contingency table is shown in Table \(\PageIndex{1}\), which displays whether or not 168 college students watched college sports growing up (Yes/No) and whether the students’ final choice of which college to attend was influenced by the college’s sports teams (Yes, primary; Yes, somewhat; No).

    Table \(\PageIndex{1}\): Contingency table of watching college sports and college decision making

                                  Yes, primary   Yes, somewhat   No    Total
    Watched as a child                 47              26         14     87
    Did not watch as a child           21              23         37     81
    Total                              68              49         51    168

    In contrast to the frequency table for our test for goodness of fit, our contingency table does not contain expected values, only observed data. Within our table, wherever our rows and columns cross, we have a cell. A cell contains the frequency of observing its corresponding specific levels of each variable at the same time. The top left cell in Table \(\PageIndex{1}\) shows us that 47 people in our study watched college sports as a child and had college sports as their primary deciding factor in which college to attend.

    Cells are numbered based on which row they are in (rows are numbered top to bottom) and which column they are in (columns are numbered left to right). We always name the cell using (R,C), with the row first and the column second. A quick and easy way to remember the order is that the brand RC Cola exists, but CR Cola does not. Based on this convention, the top left cell containing our 47 participants who watched college sports as a child and had sports as a primary criterion is cell (1,1). Next to it, which has 26 people who watched college sports as a child but had sports only somewhat affect their decision, is cell (1,2), and so on. We only number the cells where our categories cross. We do not number our total cells, which have their own special name: marginal values.

    Marginal values are the total values for a single category of one variable, added up across levels of the other variable. In Table \(\PageIndex{1}\), these marginal values have been made bold for ease of explanation, though this is not normally the case. We can see that, in total, 87 of our participants (47 + 26 + 14) watched college sports growing up, and 81 (21 + 23 + 37) did not. The total of these two marginal values is 168, the total number of people in our study. Likewise, 68 people used sports as a primary criterion for deciding which college to attend, 49 considered it somewhat, and 51 did not consider it at all. The total of these marginal values is also 168, our total number of people. The marginal values for rows and columns will always both add up to the total number of participants, N, in the study. If they do not, then a calculation error was made, and you must go back and check your work.
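    The marginal-value check described above is easy to automate. The following is a minimal sketch using NumPy; the variable names are illustrative, not from the text:

```python
import numpy as np

# Observed frequencies from Table 15.2.1
# rows: watched as a child / did not watch as a child
# columns: Yes, primary / Yes, somewhat / No
observed = np.array([[47, 26, 14],
                     [21, 23, 37]])

row_marginals = observed.sum(axis=1)  # [87, 81]
col_marginals = observed.sum(axis=0)  # [68, 49, 51]
N = observed.sum()                    # 168

# Row and column marginals must both add up to N;
# if they do not, a calculation error was made.
assert row_marginals.sum() == col_marginals.sum() == N
print(row_marginals, col_marginals, N)
```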

    Expected Values of Contingency Tables

    Our expected values for contingency tables are based on the same logic as they were for frequency tables, but now we must incorporate information about how frequently each row and column was observed (the marginal values) and how many people were in the sample overall (N) to find what random chance would have made the frequencies out to be. Specifically:

    \[\Large E_{ij} = \frac{R_iC_j}{N} \nonumber \]

    The subscripts \(i\) and \(j\) indicate which row and column, respectively, correspond to the cell we are calculating the expected frequency for, and \(R_i\) and \(C_j\) are the row and column marginal values, respectively. \(N\) is still the total sample size. Using the data from Table \(\PageIndex{1}\), we can calculate the expected frequency for cell (1,1), the college sport watchers who used sports as their primary criterion, to be:

    \[\Large E_{1,1} = \frac{(87)(68)}{168} = 35.21 \nonumber \]

    We can follow the same math to find all the expected values for this table:

    Table \(\PageIndex{2}\): Expected values from the contingency table of watching college sports and college decision making

                                  Yes, primary   Yes, somewhat   No      Total
    Watched as a child               35.21           25.38       26.41     87
    Did not watch as a child         32.79           23.62       24.59     81
    Total                            68              49          51       168

    Notice that the marginal values still add up to the same totals as before. This is because the expected frequencies are just row and column averages simultaneously. Our total N will also add up to the same value.
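    The entire table of expected values can be produced in one step as the outer product of the row and column marginals, divided by \(N\). A sketch using NumPy (variable names are illustrative):

```python
import numpy as np

observed = np.array([[47, 26, 14],
                     [21, 23, 37]])

row_marginals = observed.sum(axis=1)  # [87, 81]
col_marginals = observed.sum(axis=0)  # [68, 49, 51]
N = observed.sum()                    # 168

# E_ij = (R_i * C_j) / N, computed for every cell at once
expected = np.outer(row_marginals, col_marginals) / N
print(np.round(expected, 2))

# The expected marginals match the observed marginals,
# just as Table 15.2.2 shows.
assert np.allclose(expected.sum(axis=1), row_marginals)
assert np.allclose(expected.sum(axis=0), col_marginals)
```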

    The observed and expected frequencies can be used to calculate the same \(\chi^2 \) statistic as we calculated for the test for goodness of fit. Before we get there, though, we should look at the hypotheses and degrees of freedom used for contingency tables.

    Test for Independence

    The \(\chi^2 \) test performed on contingency tables is known as the test for independence. In this analysis, we are looking to see if the values of each categorical variable (that is, the frequency of their levels) are related to or independent of the values of the other categorical variable. Because we are still doing a \(\chi^2 \) test, which is nonparametric, we still do not have mathematical versions of our hypotheses. The actual interpretations of the hypotheses are quite simple: the null hypothesis says that the variables are independent or not related, and the alternative hypothesis says that they are not independent or that they are related. Using this setup and the data provided in Table \(\PageIndex{1}\), let’s formally test whether watching college sports as a child is related to using sports as a criterion for selecting a college to attend.

    Example College Sports

    We will follow the same four-step procedure as we have since Chapter 7.

    Step 1: State the Hypotheses

    Our null hypothesis of no difference will state that there is no relationship between our variables, and our alternative will state that our variables are related.

    \[
    \begin{aligned}
    H_0:&\ \text{College choice criteria are independent of college sports viewership as a child} \\
    H_A:&\ \text{College choice criteria are related to college sports viewership as a child} \nonumber
    \end{aligned}
    \]

    Step 2: Find the Critical Value

    Our critical value will come from the same table that we used for the test for goodness of fit (in section 16.5), but our degrees of freedom will change. Because we now have rows and columns (instead of just columns), our new degrees of freedom use information from both:

    \[\Large df = (R - 1)(C - 1) \nonumber \]

    In our example:

    \[\Large df = (2 - 1)(3 - 1) = (1)(2) = 2 \nonumber \]

    Based on our 2 degrees of freedom, our critical value from our table is 5.991.
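    If a \(\chi^2 \) table is not at hand, the same critical value can be pulled from the chi-squared distribution in SciPy. A small sketch (assumes `scipy` is installed):

```python
from scipy.stats import chi2

alpha = 0.05
df = (2 - 1) * (3 - 1)              # (R - 1)(C - 1) for Table 15.2.1
critical = chi2.ppf(1 - alpha, df)  # inverse CDF at 1 - alpha
print(df, round(critical, 3))       # 2 5.991
```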

    Step 3: Calculate the Test Statistic and Effect Size

    The same formula for \(\chi^2 \) is used once again:

    \[
    \Large
    \begin{array}{rl}
    \chi^2 &= \displaystyle \sum \frac{(O - E)^2}{E} \\[2.5ex]
    &= \displaystyle \frac{(47 - 35.21)^2}{35.21} + \frac{(26 - 25.38)^2}{25.38} + \frac{(14 - 26.41)^2}{26.41} + \\
    &\quad \displaystyle \frac{(21 - 32.79)^2}{32.79} + \frac{(23 - 23.62)^2}{23.62} + \frac{(37 - 24.59)^2}{24.59} \\[2.5ex]
    &= \displaystyle 3.94 + 0.02 + 5.83 + 4.24 + 0.02 + 6.26 \\[2.5ex]
    &= 20.31
    \end{array}
    \nonumber
    \]
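    Step 3 also calls for an effect size. The conventional effect size for a contingency table is Cramér's V, where \(k\) is the smaller of the number of rows and the number of columns:

    \[\Large V = \sqrt{\frac{\chi^2}{N(k-1)}} = \sqrt{\frac{20.31}{168(2-1)}} = .348 \nonumber \]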

    Step 4: Make the Decision

    The final decision for our test of independence is still based on our observed value (20.31) and our critical value (5.991). Because our observed value is greater than our critical value, we can reject the null hypothesis.

    Reject \(H_0\). Based on our data from 168 people, we can say that there is a statistically significant relationship between whether someone watches college sports growing up and the influence a college’s sports teams have on that person’s decision on which college to attend, and the effect size was moderate, \(\chi^2 \)(2, N = 168) = 20.31, p < .05, V = .348. Figure \(\PageIndex{2}\) shows the output from JASP for this example.

    [Image: JASP output showing the contingency table with observed and expected counts, along with the chi-square statistic, p value, and Cramér's V.]
    Figure \(\PageIndex{2}\): Output from JASP for the \(\chi^2 \) test for independence described in the College Sports example. (“JASP chi-square independence” by Rupa G. Gordon/Judy Schmitt is licensed under CC BY-NC-SA 4.0.)
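    The JASP results can also be reproduced in a few lines with SciPy's `chi2_contingency`, which returns the test statistic, p value, degrees of freedom, and expected frequencies in a single call. A sketch; Cramér's V is computed by hand here because SciPy versions differ in how they expose effect sizes:

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[47, 26, 14],
                     [21, 23, 37]])

# correction=False: the Yates correction only applies to 2x2 tables anyway
stat, p, df, expected = chi2_contingency(observed, correction=False)

# Cramer's V, with k = the smaller of the number of rows and columns
N = observed.sum()
k = min(observed.shape)
V = np.sqrt(stat / (N * (k - 1)))

print(round(stat, 2), df, p < .05, round(V, 3))  # 20.31 2 True 0.348
```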
    Video: Chi-square test for association (independence)

    Chi-square test for association (independence) on YouTube.



    This page titled 15.2: Chi-Square Test of Independence is shared under an undeclared license and was authored, remixed, and/or curated by Chanler Hilley, Kennesaw State University via source content that was edited to the style and standards of the LibreTexts platform.