14.5: Testing Independence with Chi-Squared
Testing a Hypothesis of Independence
The chi-squared test of independence is also known as Pearson’s chi-squared test. This version of chi-squared is used to compare the counts of outcomes under different conditions, so cause and effect can sometimes be inferred with this technique when the design and the corresponding hypothesis support it. The formula is the same as the one for a goodness of fit test; where these techniques diverge is in how the expected counts are computed. Thus, this section focuses first on the computations for the chi-squared test of independence and then on completing the steps of hypothesis testing using an example.
A chi-squared test of independence can be used to test whether the counts for one variable are dependent on another variable. The test compares the observed frequencies to those that would be expected if the variables were independent. When the observed counts are similar to the expected counts and, thus, the result is non-significant, it indicates that the variables are independent. When the observed counts are significantly dissimilar to the expected counts, it indicates that the variables are dependent.
Suppose that you hypothesize that the count of customers who make a purchase depends upon whether they are greeted when entering a store. Suppose that to test this, every other customer is greeted when they enter the store until 80 customers have been observed. The summary data for the observations can be organized into a cross tabulation (crosstabs) table.
Counts for Test of Independence

| | Purchased | Did Not Purchase | Row Total |
|---|---|---|---|
| Greeted | 15 | 25 | 40 |
| Not Greeted | 8 | 32 | 40 |
| Column Total | 23 | 57 | \(N=80\) |
Determining Expected Counts
The expected counts are computed for each category using the following formula: \[f_e=\dfrac{\text { row total } \times \text { column total }}{N} \nonumber \]
The expected counts for Data Set 14.2 for this scenario are as follows:
| | Observed | Expected |
|---|---|---|
| **Greeted** | | |
| Purchased | 15 | \(f_e=\dfrac{40 \times 23}{80}=11.50\) |
| Did Not Purchase | 25 | \(f_e=\dfrac{40 \times 57}{80}=28.50\) |
| **Not Greeted** | | |
| Purchased | 8 | \(f_e=\dfrac{40 \times 23}{80}=11.50\) |
| Did Not Purchase | 32 | \(f_e=\dfrac{40 \times 57}{80}=28.50\) |
The Chi-Squared Formula
The chi-squared formula is the same for the tests of independence and goodness of fit. Thus, the formula for computing \(\chi^2\) is still:
\[\chi^2=\Sigma \dfrac{\left(f_o-f_e\right)^2}{f_e} \nonumber \]
Recall that the steps to using this formula are as follows:
- Find the difference between \(f_o\) (frequency observed in the data) and \(f_e\) (the frequency expected) for each category.
- Square the difference for each category.
- Divide the squared difference by \(f_e\) for each category.
- Sum the results of step 3 to get the \(\chi^2\) value.
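The four steps above can be sketched directly in code. This is a minimal illustration using the observed and expected counts for Data Set 14.2 (the variable names are illustrative):

```python
# Observed and expected counts for Data Set 14.2, in matching order:
# greeted/purchased, greeted/not, not-greeted/purchased, not-greeted/not.
observed = [15, 25, 8, 32]
expected = [11.5, 28.5, 11.5, 28.5]

chi_squared = 0.0
for f_o, f_e in zip(observed, expected):
    diff = f_o - f_e               # step 1: difference for each category
    squared = diff ** 2            # step 2: square the difference
    chi_squared += squared / f_e   # steps 3 and 4: divide by f_e and sum

# chi_squared is about 2.9901, or 2.99 rounded to the hundredths place
```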
Example Using Chi-Squared Formula
Let’s complete the computations for Data Set 14.2 using the table method. The computations are as follows:
| Subgroups | Observed | Expected | Differences \(f_o-f_e\) | Squared \((f_o-f_e)^2\) | Divided \(\dfrac{\left(f_o-f_e\right)^2}{f_e}\) |
|---|---|---|---|---|---|
| **Greeted** | | | | | |
| Purchased | 15 | 11.50 | 3.50 | 12.25 | 12.25/11.50 = 1.0652… |
| Did Not Purchase | 25 | 28.50 | \(-3.50\) | 12.25 | 12.25/28.50 = 0.4298… |
| **Not Greeted** | | | | | |
| Purchased | 8 | 11.50 | \(-3.50\) | 12.25 | 12.25/11.50 = 1.0652… |
| Did Not Purchase | 32 | 28.50 | 3.50 | 12.25 | 12.25/28.50 = 0.4298… |
| Total | 80 | | | | \(\chi^2=2.9900 \ldots\) |
Steps in Hypothesis Testing
The computations for Data Set 14.2 are already shown above, so this section focuses on the steps of hypothesis testing, with the computations abbreviated. In order to test a hypothesis, we must follow these steps:
1. State the hypothesis.
A summary of the research hypothesis can be stated as follows: It is hypothesized that the count of customers who make a purchase depends upon whether they are greeted when entering a store.
The null hypothesis for this example would state that the counts of customers who make a purchase does not depend on whether they are greeted when entering a store. Keep in mind that the expected counts are what will occur if counts of purchases are not dependent on greetings. If the result is significant it will support the research hypothesis. However, if the result is not significant, it will not support the research hypothesis and the null will be retained.
2. Choose the inferential test (formula) that best fits the hypothesis.
The counts of categories for a qualitative variable are being tested to see whether they are independent of another variable, so the appropriate test is the chi-squared test of independence.
3. Determine the critical value.
In order to determine the critical value for chi-squared, we need to know the alpha level and the degrees of freedom. The alpha level is often set at .05. For a test of independence, the degrees of freedom are based on the number of rows (\(R\)) and columns (\(C\)) of categories in the crosstabs table. With two rows (greeted, not greeted) and two columns (purchased, did not purchase), the degrees of freedom are as follows:
\[\begin{gathered}
df=(R-1)(C-1) \\
df=(2-1)(2-1) \\
df=1
\end{gathered} \nonumber \]
The alpha level and \(df\) are used to determine the critical value for the test. The tables of the critical values for \(\chi^2\) are located earlier in this chapter.
The critical value for this example is 3.841. The obtained \(\chi^2\)-value must be greater than 3.841 to be declared significant when using Data Set 14.2.
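The lookup can be sketched as a small table of the standard alpha = .05 critical values for \(\chi^2\) (the dictionary here is a stand-in for the fuller critical-value tables earlier in the chapter):

```python
# Standard chi-squared critical values at alpha = .05 for df = 1 through 4.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

rows, cols = 2, 2  # greeted/not greeted by purchased/did not purchase
df = (rows - 1) * (cols - 1)
critical_value = CRITICAL_05[df]
# df = 1, so the obtained chi-squared must exceed 3.841 to be significant
```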
4. Calculate the test statistic.
In order to use a test of independence, we first must find the observed and expected counts. Observed counts come from the data set; expected counts, however, must be computed. Computing the expected counts is the key feature that distinguishes a test of independence from a goodness of fit test when using chi-squared. To find the expected counts, the following formula is used:
\[f_e=\dfrac{\text { row total } \times \text { column total }}{N} \nonumber \]
The details of using this formula to compute the expected counts for Data Set 14.2 are shown in the prior section. Once found, the observed and expected counts are plugged into the same formula that was used for goodness of fit tests. As a reminder, that formula is as follows:
\[\chi^2=\Sigma \dfrac{\left(f_o-f_e\right)^2}{f_e} \nonumber \]
The computations for Data Set 14.2 are shown step by step in Table 14.2 in the previous section of this chapter. The result, rounded to the hundredths place, is \(\chi^2 = 2.99\).
5. Apply a decision rule and determine whether the result is significant.
Assess whether the obtained value exceeds the critical value as follows:
The critical value is 3.841
The obtained \(\chi^2\)-value is 2.99
The obtained \(\chi^2\)-value does not exceed (i.e., is less than) the critical value and, thus, the result is not significant.
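The decision rule can also be checked with a \(p\)-value. For one degree of freedom, the upper-tail probability of the chi-squared distribution reduces to \(P(\chi^2_1 > x) = \operatorname{erfc}\!\left(\sqrt{x/2}\right)\), which the standard library can compute directly. This is a sketch of that check, assuming the df = 1 case of this 2×2 example:

```python
import math

chi_squared = 2.99      # obtained value for Data Set 14.2
critical_value = 3.841  # alpha = .05, df = 1

# For df = 1 only: the upper-tail chi-squared probability is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_squared / 2))

significant = chi_squared > critical_value
# significant is False and p_value is roughly 0.08 (greater than .05),
# agreeing with the decision above
```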
6. Calculate the effect sizes and any secondary analyses.
When a chi-squared test of independence result is significant, post-hoc tests are sometimes desired. When three or more category counts are being compared, various post-hoc tests (such as a secondary chi-squared test of independence with a Bonferroni correction to compare each pair of categories) may be used. However, because the current test was not significant, post-hoc analyses are not warranted.
7. Report the results in American Psychological Association (APA) format.
The same formatting guidelines for reporting a goodness of fit result apply to a test of independence. In addition, in this example the expected counts were not even and this needs to be made clear. Thus, it can be useful to include the expected counts in the summary. The results for our hypothesis with Data Set 14.2 can be written as shown in the summary example below.
A chi-squared test of independence was used to test the hypothesis that the count of customers who make a purchase depends upon whether they are greeted when entering a store. Contrary to the hypothesis, the counts of those who were greeted who did (\(n\) = 15) and did not make purchases (\(n\) = 25) and the counts of those who were not greeted who did (\(n\) = 8) and did not make purchases (\(n\) = 32) were not significantly different from expected counts of 11.50, 28.50, 11.50, and 28.50, respectively, \(\chi^2(1, N = 80) = 2.99\), \(p > .05\). Thus, the data do not support the hypothesis that purchasing depends upon being greeted.
Reading Review 14.4
- How is \(df\) calculated for a chi-squared test of independence?
- How are expected counts for each category calculated when using a chi-squared test of independence?
- What detail about each category should be included in the APA-formatted summary for a chi-squared test of independence?