Statistics LibreTexts

6.4: Testing for independence in two-way tables

    We all buy used products – cars, computers, textbooks, and so on – and we sometimes assume the sellers of those products will be forthright about any underlying problems with what they’re selling. This is not something we should take for granted. Researchers recruited 219 participants for a study in which each would sell a used iPod (see footnote 3) that was known to have frozen twice in the past. The participants were incentivized to get as much money as they could for the iPod, since they would receive a 5% cut of the sale on top of $10 for participating. The researchers wanted to understand what types of questions would prompt the seller to disclose the freezing issue.

    Unbeknownst to the participants who were the sellers in the study, the buyers were collaborating with the researchers to evaluate the influence of different questions on the likelihood of getting the sellers to disclose the past issues with the iPod. The scripted buyers started with “Okay, I guess I’m supposed to go first. So you’ve had the iPod for 2 years ...” and ended with one of three questions:

    • General: What can you tell me about it?
    • Positive Assumption: It doesn’t have any problems, does it?
    • Negative Assumption: What problems does it have?

    The question is the treatment given to the sellers, and the response is whether the question prompted them to disclose the freezing issue with the iPod. The results are shown in Figure [ipod_ask_data_summary], and the data suggest that asking the Negative Assumption question, “What problems does it have?”, was the most effective at getting the seller to disclose the past freezing issues. However, you should also be asking yourself: could we see these results due to chance alone, or is this in fact evidence that some questions are more effective for getting at the truth?

                      General  Positive Assumption  Negative Assumption  Total
    Disclose Problem        2                   23                   36     61
    Hide Problem           71                   50                   37    158
    Total                  73                   73                   73    219

    [ipod_ask_data_summary]

    Differences of one-way tables vs two-way tables

    A one-way table describes counts for each outcome in a single variable. A two-way table describes counts for combinations of outcomes for two variables. When we consider a two-way table, we often would like to know: are these variables related in any way? That is, are they dependent (versus independent)?

    The hypothesis test for the iPod experiment is really about assessing whether there is statistically significant evidence that the questions differed in how successfully they got participants to disclose the problem with the iPod. In other words, the goal is to check whether the buyer’s question was independent of whether the seller disclosed a problem.

    Expected counts in two-way tables

    As with one-way tables, we will need to compute expected counts for each cell in a two-way table.

    From the experiment, we can compute the proportion of all sellers who disclosed the freezing problem as \(61/219 = 0.2785\). If there really is no difference among the questions and 27.85% of sellers were going to disclose the freezing problem no matter the question that was put to them, how many of the 73 people in the General group would we have expected to disclose the freezing problem? [iPodExComputeExpAA]

    We would predict that \(0.2785 \times 73 = 20.33\) sellers would disclose the problem. We observed far fewer than this (only 2), though it is not yet clear whether that is due to chance variation or because the questions vary in how effective they are at getting to the truth.

    [iPodExComputeExpBB] If the questions were actually equally effective, meaning about 27.85% of respondents would disclose the freezing issue regardless of what question they were asked, about how many sellers in the Positive Assumption group would we expect to hide the freezing problem?
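
    One way to work Guided Practice [iPodExComputeExpBB], as a sketch: if 27.85% of sellers would disclose the problem regardless of the question, the remaining share would hide it, and the Positive Assumption group contains 73 sellers. Under that assumption the expected count is

    \[\begin{aligned} (1 - 0.2785) \times 73 = 0.7215 \times 73 = 52.67\end{aligned}\]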

    We can compute the number of sellers we would expect to disclose or hide the freezing issue in every group, if the questions had no impact on what they disclosed, using the same strategy employed in Example [iPodExComputeExpAA] and Guided Practice [iPodExComputeExpBB]. These expected counts were used to construct Figure [ipod_ask_data_summary_expected], which is the same as Figure [ipod_ask_data_summary], except now the expected counts have been added in parentheses.

                         General  Positive Assumption  Negative Assumption  Total
    Disclose Problem   2 (20.33)           23 (20.33)           36 (20.33)     61
    Hide Problem      71 (52.67)           50 (52.67)           37 (52.67)    158
    Total                     73                   73                   73    219

    [ipod_ask_data_summary_expected]

    The examples and exercises above provided some help in computing expected counts. In general, expected counts for a two-way table may be computed using the row totals, column totals, and the table total. For instance, if there was no difference between the groups, then about 27.85% of each column should be in the first row:

    \[\begin{aligned} 0.2785\times (\text{column 1 total}) &= 20.33 \\ 0.2785\times (\text{column 2 total}) &= 20.33 \\ 0.2785\times (\text{column 3 total}) &= 20.33\end{aligned}\]

    Looking back to how 0.2785 was computed – as the fraction of sellers who disclosed the freezing issue (\(61/219\)) – these three expected counts could have been computed as

    \[\begin{aligned} \left(\frac{\text{row 1 total}}{\text{table total}}\right) \text{(column 1 total)} &= 20.33 \\ \left(\frac{\text{row 1 total}}{\text{table total}}\right) \text{(column 2 total)} &= 20.33 \\ \left(\frac{\text{row 1 total}}{\text{table total}}\right) \text{(column 3 total)} &= 20.33\end{aligned}\]

    This leads us to a general formula for computing expected counts in a two-way table when we would like to test whether there is strong evidence of an association between the column variable and row variable.

    Computing expected counts in a two-way table

    To identify the expected count for the \(i^{th}\) row and \(j^{th}\) column, compute

    \[\begin{aligned} \text{Expected Count}_{\text{row }i,\text{ col }j} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{table total}} \end{aligned}\]
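
    The expected-count formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the original text: the function name `expected_counts` and the list-of-rows layout are assumptions chosen for the example, and the data are the observed iPod counts from Figure [ipod_ask_data_summary].

```python
def expected_counts(observed):
    """Expected counts under independence for a two-way table of counts.

    `observed` is a list of rows; each row is a list of cell counts.
    (Hypothetical helper, named here only for illustration.)
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    # Expected count for row i, column j:
    # (row i total) * (column j total) / (table total)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


# iPod study: rows are Disclose / Hide; columns are
# General / Positive Assumption / Negative Assumption.
ipod_observed = [[2, 23, 36],
                 [71, 50, 37]]

ipod_expected = expected_counts(ipod_observed)
for row in ipod_expected:
    print([round(cell, 2) for cell in row])
```

    Because every column total is 73, each cell in the first row comes out to 20.33 and each cell in the second row to 52.67, matching the values used in the text.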

    The chi-square test for two-way tables

    The chi-square test statistic for a two-way table is found the same way it is found for a one-way table. For each table count, compute

    \[\begin{aligned} &\text{General formula} && \frac{(\text{observed count } - \text{expected count})^2} {\text{expected count}} \\ &\text{Row 1, Col 1} && \frac{(2- 20.33)^2}{20.33} = 16.53 \\ &\text{Row 1, Col 2} && \frac{(23- 20.33)^2}{20.33} = 0.35 \\ & \hspace{9mm}\vdots && \hspace{13mm}\vdots \\ &\text{Row 2, Col 3} && \frac{(37- 52.67)^2}{52.67} = 4.66\end{aligned}\]

    Adding the computed value for each cell gives the chi-square test statistic \(X^2\):

    \[\begin{aligned} X^2 = 16.53 + 0.35 + \dots + 4.66 = 40.13\end{aligned}\]
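
    The cell-by-cell sum can be checked with a short Python sketch. The function name `chi_square_statistic` is an assumption made for this illustration; it computes each expected count from the row, column, and table totals and then adds up the six contributions.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell.

    (Hypothetical helper, named here only for illustration.)
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence for this cell.
            exp = row_totals[i] * col_totals[j] / table_total
            statistic += (obs - exp) ** 2 / exp
    return statistic


# Observed iPod counts: rows Disclose / Hide, columns are the three questions.
ipod_observed = [[2, 23, 36],
                 [71, 50, 37]]

x2 = chi_square_statistic(ipod_observed)
print(round(x2, 2))
```

    Run on the iPod table, the result agrees with the \(X^2 = 40.13\) reported above to two decimal places.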

    Just like before, this test statistic follows a chi-square distribution. However, the degrees of freedom are computed a little differently for a two-way table (see footnote 4). For two-way tables, the degrees of freedom equal

    \[\begin{aligned} df = \text{(number of rows minus 1)}\times \text{(number of columns minus 1)}\end{aligned}\]

    In our example, the degrees of freedom parameter is

    \[\begin{aligned} df = (2-1)\times (3-1) = 2\end{aligned}\]

    If the null hypothesis is true (i.e. the questions had no impact on the sellers in the experiment), then the test statistic \(X^2 = 40.13\) closely follows a chi-square distribution with 2 degrees of freedom. Using this information, we can compute the p-value for the test, which is depicted in Figure [iPodChiSqTail].

    Computing degrees of freedom for a two-way table

    When applying the chi-square test to a two-way table, we use

    \[\begin{aligned} df = (R-1)\times (C-1) \end{aligned}\]

    where \(R\) is the number of rows in the table and \(C\) is the number of columns.

    When analyzing 2-by-2 contingency tables, one guideline is to use the two-proportion methods introduced in Section 2.

    Visualization of the p-value for \(X^2 = 40.13\) when \(df = 2\).

    Find the p-value and draw a conclusion about whether the question affects the sellers’ likelihood of reporting the freezing problem.

    Using a computer, we can compute a very precise value for the tail area above \(X^2 = 40.13\) for a chi-square distribution with 2 degrees of freedom: 0.000000002. (If using the table in Appendix [chiSquareProbabilityTable], we would find that the p-value is smaller than 0.001.) Using a significance level of \(\alpha=0.05\), the null hypothesis is rejected since the p-value is smaller. That is, the data provide convincing evidence that the question asked did affect a seller’s likelihood to tell the truth about problems with the iPod.
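
    The tail area can be reproduced with a small Python sketch. It relies on a special case: a chi-square distribution with exactly 2 degrees of freedom has upper-tail area \(e^{-x/2}\), so only the standard library is needed. The function names are assumptions for this illustration; for other degrees of freedom a statistics library would be the usual route (for example, `scipy.stats.chi2.sf`).

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df = (R - 1) * (C - 1) for a two-way table."""
    return (n_rows - 1) * (n_cols - 1)


def chi_square_upper_tail_df2(x2):
    """Upper-tail area for a chi-square distribution with exactly 2 df.

    With 2 degrees of freedom the tail area has the closed form exp(-x/2).
    This shortcut does NOT apply to other degrees of freedom.
    """
    return math.exp(-x2 / 2)


df = degrees_of_freedom(2, 3)  # 2 rows (Disclose/Hide), 3 columns (questions)
if df != 2:
    raise ValueError("closed-form tail area used here requires df = 2")

p_value = chi_square_upper_tail_df2(40.13)
alpha = 0.05
reject_null = p_value < alpha
print(f"df = {df}, p-value = {p_value:.1e}, reject H0: {reject_null}")
```

    The computed tail area is about \(2 \times 10^{-9}\), in line with the 0.000000002 quoted above, so the null hypothesis is rejected at the 0.05 level.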

    Figure [diabetes2ExpMetRosiLifestyleSummary] summarizes the results of an experiment evaluating three treatments for Type 2 Diabetes in patients aged 10-17 who were being treated with metformin. The three treatments considered were continued treatment with metformin, treatment with metformin combined with rosiglitazone, or a lifestyle intervention program. Each patient had a primary outcome: the patient either lacked glycemic control (failure) or did not lack that control (success). What are appropriate hypotheses for this test? [diabetes2ExpMetRosiLifestyleIntroExample]

    • \(H_0\): There is no difference in the effectiveness of the three treatments.
    • \(H_A\): There is some difference in effectiveness between the three treatments, e.g. perhaps one treatment performed better than another.

             Failure   Success   Total
                 109       125     234
                 120       112     232
                  90       143     233
    Total        319       380     699

    [diabetes2ExpMetRosiLifestyleSummary]

    A chi-square test for a two-way table may be used to test the hypotheses in Example [diabetes2ExpMetRosiLifestyleIntroExample]. As a first step, compute the expected values for each of the six table cells.

    Compute the chi-square test statistic for the data in Figure [diabetes2ExpMetRosiLifestyleSummary].
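
    One way to carry out both steps, the six expected counts and then the test statistic, is sketched below in Python. The variable names are assumptions for this illustration. The rows are kept in the order they appear in the table, since the treatment labels are not shown there, and the columns are Failure and Success.

```python
# Observed counts: one row per treatment (in table order),
# columns are Failure, Success.
diabetes_observed = [[109, 125],
                     [120, 112],
                     [90, 143]]

row_totals = [sum(row) for row in diabetes_observed]
col_totals = [sum(col) for col in zip(*diabetes_observed)]
table_total = sum(row_totals)

# Expected count = (row total) * (column total) / (table total).
diabetes_expected = [[r * c / table_total for c in col_totals]
                     for r in row_totals]

# Chi-square statistic: add (observed - expected)^2 / expected over all cells.
diabetes_x2 = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(diabetes_observed, diabetes_expected)
    for obs, exp in zip(obs_row, exp_row)
)

for exp_row in diabetes_expected:
    print([round(e, 2) for e in exp_row])
print(round(diabetes_x2, 2))
```

    The first expected count works out to \(234 \times 319 / 699 = 106.79\), and the statistic agrees with the \(X^2 = 8.16\) used in the next exercise.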

    Because there are 3 rows and 2 columns, the degrees of freedom for the test is \(df = (3 - 1) \times (2 - 1) = 2\). Using \(X^2 = 8.16\) and \(df = 2\), evaluate whether to reject the null hypothesis at a significance level of 0.05.
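
    A sketch of one way to evaluate this: with exactly 2 degrees of freedom, the upper-tail area of the chi-square distribution has the closed form \(e^{-x/2}\). This shortcut does not apply to other degrees of freedom. Applying it here gives

    \[\begin{aligned} \text{p-value} = P(\chi^2_{2} > 8.16) = e^{-8.16/2} = e^{-4.08} \approx 0.017\end{aligned}\]

    Since 0.017 is smaller than 0.05, this calculation points to rejecting the null hypothesis: the data provide evidence that the three treatments are not equally effective.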


    1. For an example of a two-proportion hypothesis test that does not require the success-failure condition to be met, see Section [caseStudyMalariaVaccine].
    2. Using some of the rules learned in earlier chapters, we might think that the standard error would be \(\sqrt{np(1-p)}\), where \(n\) is the sample size and \(p\) is the proportion in the population. This would be correct if we were looking only at one count. However, we are computing many standardized differences and adding them together. It can be shown – though not here – that the square root of the count is a better way to standardize the count differences.
    3. For readers not as old as the authors, an iPod is basically an iPhone without any cellular service, assuming it was one of the later generations. Earlier generations were more basic.
    4. Recall: in the one-way table, the degrees of freedom was the number of cells minus 1.

    This page titled 6.4: Testing for independence in two-way tables is shared under a CC BY-SA 3.0 license and was authored, remixed, and/or curated by David Diez, Christopher Barr, & Mine Çetinkaya-Rundel via source content that was edited to the style and standards of the LibreTexts platform.
