14.2: Chi-squared Goodness of Fit
There are times when a hypothesis makes a statement about whether the counts or proportions of subgroups in a sample will be approximately equal to specified counts or proportions. In these situations, a chi-squared goodness of fit test is appropriate. The chi-squared goodness of fit test assesses whether the counts of subgroups of a variable fit specified, expected proportions or counts. One thing that makes the chi-squared test quite unusual is that the null hypothesis specifies the expected counts and, thus, a null result is sometimes the desired outcome. This is because, in a goodness of fit test, the null hypothesis states that the counts observed in the data match those specified, and the alternative hypothesis states that the counts in the data do not match those specified.
Usually the chi-squared goodness of fit is used to test whether the distribution of data from a sample is similar to the distribution that is known or presumed true about a population. The chi-squared goodness of fit test, therefore, assesses whether the data are a good fit to (i.e. are similar or dissimilar to) the counts or proportions specified in the null hypothesis. When a result is significant, it means that the counts are significantly different from those specified by the null hypothesis. When a result is not significant, it means that the counts are approximately equal to those specified by the null.
Let’s take a look at an example. Suppose that among the population of college students, 25% were business majors, 25% were psychology majors, 20% were nursing majors, and 30% were statistics majors. Suppose that you wanted to test whether the proportions of students in each of these majors at a small college in Statistonia matched those of college students overall. In this case, you would expect the proportions of majors at Statistonia College to be approximately 25% business majors, 25% psychology majors, 20% nursing majors, and 30% statistics majors. These percentages would be used as the specified proportions, and a non-significant result would mean that the counts observed at Statistonia College were not significantly different from those in the population. A significant result would mean that the counts observed at Statistonia College were significantly different from those in the population. This is because a significant result occurs when counts observed in the data are different from the expected counts. (Note: We will see how to test this specific example in the SPSS section.)
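The text walks through this example in SPSS; as a rough illustration only, the same comparison can be sketched in Python using `scipy.stats.chisquare`. The observed counts below are hypothetical (a made-up sample of 200 Statistonia students), not data from the text:

```python
from scipy.stats import chisquare

# Hypothetical observed counts for a sample of 200 students, in the order
# (business, psychology, nursing, statistics) -- illustrative only.
observed = [60, 45, 35, 60]

# Expected counts come from the population proportions 25%, 25%, 20%, 30%.
n = sum(observed)  # 200
expected = [n * p for p in (0.25, 0.25, 0.20, 0.30)]  # [50, 50, 40, 60]

# The test statistic compares observed counts to expected counts.
stat, p_value = chisquare(f_obs=observed, f_exp=expected)
```

With four categories the test has 3 degrees of freedom; the statistic here (3.125) falls well below the 0.05 critical value (about 7.81), so these hypothetical counts would not be significantly different from the population proportions.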
There are two kinds of counts considered in a chi-squared goodness of fit: Observed counts and expected counts. Observed counts refer to the counts that exist in the sample data. The expected counts are based on either known population proportions or on a hypothesis about what counts are expected to be. We will review these in detail later in this chapter. For now, let’s focus on being clear on the general idea of this test. Below shows a summary of how to interpret results from a chi-squared goodness of fit. Refer back to this as needed to remind yourself what the test is comparing and what a significant result means compared to what a non-significant result means.
Significant Result | Counts observed in the sample are significantly different from those expected based on the population or a hypothesis. | \(Observed\; Counts \neq Expected\; Counts\)
---|---|---
Non-Significant Result | Counts observed in the sample are not significantly different from those expected based on the population or a hypothesis. Alternative wording: Counts observed in the sample are approximately equal to those expected based on the population or a hypothesis. | \(Observed\; Counts \approx Expected\; Counts\)
Variables
One or more grouping/categorical variables are used in chi-square. These variables can include either nominal or ordinal data, which are separated into categories for the purposes of hypothesis testing. It is also possible to create categories from interval or ratio data using intervals; however, this is rarely done because other tests are generally a better fit for those data and their corresponding hypotheses.
In a univariate chi-square, one variable with at least two categories or levels is used. It is also possible to use a version that examines the counts of categories of two or more variables at once, such as when testing whether the counts by gender and race/ethnicity in a sample are proportionally similar to those in the population. The focus of this section of the chapter, however, will be univariate chi-square.
Data and Assumptions
Each statistical test has some assumptions which must be met in order for the formula to function properly. In keeping with this, there are a few assumptions about the data which must be met before a chi-squared goodness of fit is used. The most important is that no category can have an expected count lower than 5. Thus, sample sizes should be large enough that each expected count is 5 or higher.
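The expected-count assumption can be checked before running the test. A minimal sketch, using hypothetical proportions and a hypothetical sample size (the cutoff of 5 comes from the assumption stated above):

```python
# Hypothetical specified proportions (must sum to 1) and sample size.
proportions = [0.25, 0.25, 0.20, 0.30]
n = 40

# The expected count for each category is the sample size times its proportion.
expected = [n * p for p in proportions]  # [10.0, 10.0, 8.0, 12.0]

# The goodness of fit test assumes every expected count is at least 5.
assumption_met = all(e >= 5 for e in expected)
```

Here even a sample of 40 satisfies the assumption, because the smallest expected count (8, for nursing majors) is above 5; with more categories or more uneven proportions, a larger sample would be needed.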
Hypotheses
Hypotheses for chi-squared goodness of fit focus on whether observed counts are similar to or significantly different from those expected based on populations or theories (and their corresponding hypotheses). We will focus on the non-directional hypothesis for this chapter because the goodness of fit test functions as an omnibus test. It can indicate whether counts are or are not as expected but cannot, on its own, indicate which counts are different from expected and which are not. When there are only two subgroups, only the chi-squared goodness of fit test is needed to know whether both groups were significantly different from expected. When there are three or more groups and a significant result, however, some ambiguity remains about the pairwise comparisons. Thus, what the goodness of fit test tells us is, overall, whether counts of subgroups are as expected or not. The non-directional research hypothesis and corresponding null hypothesis for chi-squared goodness of fit can be summarized as follows:
Research hypothesis | Counts observed in the sample are significantly different from those expected based on the population or a theory. | \(H_A: f_{\text{observed}} \neq f_{\text{expected}}\)
---|---|---
Null hypothesis | Counts observed in the sample are not significantly different from those expected based on the population or a theory. | \(H_0: f_{\text{observed}} = f_{\text{expected}}\)
In this example \(f_{\text{observed}}\) refers to the frequency of counts observed in the data for each subgroup or category of the test variable and \(f_{\text{expected}}\) refers to the frequency of counts expected in the data for each subgroup or category of the test variable based on what is known or theorized about the population. When the counts in the sample are similar or equal to those expected, the result will be non-significant and, thus, favor the null hypothesis. Conversely, when the counts in the sample are notably dissimilar from those expected, the result will be significant and, thus, support the alternative hypothesis.
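Although the computation is developed later in the chapter, the comparison of \(f_{\text{observed}}\) and \(f_{\text{expected}}\) is conventionally summarized by the statistic \(\chi^2 = \sum \frac{(f_{\text{observed}} - f_{\text{expected}})^2}{f_{\text{expected}}}\). A minimal Python sketch, using made-up counts:

```python
def chi_squared_statistic(observed, expected):
    """Standard goodness-of-fit statistic: for each category, square the
    difference between the observed and expected counts, divide by the
    expected count, and sum across categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up counts for four categories (illustrative only).
identical = chi_squared_statistic([50, 50, 40, 60], [50, 50, 40, 60])
different = chi_squared_statistic([60, 45, 35, 60], [50, 50, 40, 60])
```

When observed counts exactly equal expected counts the statistic is 0, and it grows as the counts diverge, which is why large values favor the alternative hypothesis.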
Limitations
A limitation of chi-squared goodness of fit is that it cannot always be used to determine cause-and-effect relationships; however, under some conditions this may be appropriate using a different form of chi-squared known as the chi-squared test for independence (which we will cover later in this chapter). In addition, the chi-squared goodness of fit test can be hard to interpret when there are many categories and may not be an option when any group has an expected count of less than 5. It is otherwise a good option for comparing group counts when sufficient data are available.
Reading Review 14.1
- What kinds of variables are tested using chi-squared goodness of fit tests?
- What does a significant result indicate when using chi-squared goodness of fit tests?
- What does a non-significant result indicate when using chi-squared goodness of fit tests?