14.3: Testing Goodness of Fit with Chi-Squared
Testing a Goodness of Fit Hypothesis
This test is used to see whether counts of subgroups or categories are approximately as expected (based on population data, underlying theory, and/or hypothesis) or are significantly different than expected. When the counts observed in the data are similar to the expected counts, the expected counts are said to be a good fit to the data. When the observed counts are significantly dissimilar to the expected counts, the expected counts are a poor fit to the data.
Determining Expected Counts
Expected counts are set based on what is either known or expected to be true about populations. Let’s review each of these ways of setting expected counts. When a sample is being compared to a known population, the population proportions or counts are used to set expectations. When this is done, chi-squared is often being used to see if a sample is an appropriate match to, or representative of, the population from which it was drawn. This can be very useful if you want to establish that a sample represents a population on key demographics such as age, gender, ethnicity, and/or socioeconomic status before using the sample to test other hypotheses about the population. If the sample is not significantly different from the population on any of the demographic variables, it may be said to be a good representation of the population.
In this same way, a chi-squared goodness of fit can be used to test whether a sample is significantly different from a population or another sample when this may be relevant. For example, it can be used to see whether the counts of different ethnic identities in a sample of students are similar to (or significantly different from) the proportions of those identities in the rest of the student body they are meant to represent. If the counts are similar, it may be determined that the sample is a good representation of the population on this specific demographic. If the counts are significantly different from the population of students, however, it may be determined that the sample is biased and/or that generalizations should be limited.
A chi-squared goodness of fit can also be used to see if a change has occurred in a population. For example, suppose that a company does quality control testing each year and in the prior year, 80% of the products passed the quality control test but 20% failed and needed to be discarded. Suppose that since then the company has made substantial changes to its process to try to reduce quality issues. Suppose that after these changes, the company does a sample quality control test of 100 products and finds that 96 of them (equivalent to 96% of the sample) passed the quality control test and 4 (equivalent to 4%) failed and needed to be discarded. The company could test whether this is a significant improvement over what would be expected before the changes were made. In this example, a significant result would be desirable as it would be taken to indicate that the changes to the process resulted in significantly fewer items needing to be discarded due to quality issues.
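As a rough sketch of how the quality control example could be set up, the snippet below applies the goodness of fit formula given later in this section to the counts stated above. The variable names are illustrative assumptions. It computes only the \(\chi^2\) value; deciding whether that value is significant requires a further step that this section does not cover.

```python
# Observed counts from the sample of 100 products tested after the process change.
observed = {"passed": 96, "failed": 4}

# Expected proportions taken from the prior year's results (80% passed, 20% failed).
expected_proportions = {"passed": 0.80, "failed": 0.20}

n = sum(observed.values())  # sample size (100)

chi_squared = 0.0
for category, f_o in observed.items():
    f_e = expected_proportions[category] * n  # expected count for this category
    chi_squared += (f_o - f_e) ** 2 / f_e     # (f_o - f_e)^2 / f_e, summed over categories

print(round(chi_squared, 2))  # 16.0
```

Here the expected counts are 80 and 20, so the two terms are 256/80 = 3.2 and 256/20 = 12.8.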
The other way expected counts can be set is based on something theorized, hypothesized, or otherwise desired. For example, suppose that a company is launching a new drink and wants to know which of two versions to release. Based on the success of other companies, the company believes a sweeter version of the drink will be preferred and sell better than a less sweet, healthier version. Suppose the company wanted to know whether the number of customers who preferred the sweet version would be different from the number of customers who preferred the healthier version. A chi-squared goodness of fit could be used to test whether the counts of those who preferred each version were even, indicating no clear preference for either version, or significantly uneven, indicating a preference for one version over the other. Knowing whether there is a significant preference for one version over the other can be very valuable information for the company when deciding which version to release.
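When the expectation is "no preference," the expected counts are simply the sample size split evenly across the categories. The sketch below illustrates this; the function name and the sample size of 50 customers are hypothetical and do not come from the text.

```python
def even_expected_counts(n, categories):
    """Return the expected count per category when an even split is hypothesized."""
    share = n / len(categories)
    return {category: share for category in categories}

# Hypothetical sample of 50 customers choosing between the two drink versions.
expected = even_expected_counts(50, ["sweet", "healthier"])
print(expected)  # {'sweet': 25.0, 'healthier': 25.0}
```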
Computing Expected Counts from Percents
When expected counts are based on percentages, those percentages need to be converted to counts before they can be used in the goodness of fit formula. To convert from a percentage to an expected count, the following formula is used:
\[f_e=p n \nonumber \]
Where:
\(f_e\) stands for frequency (i.e. count) expected for a category
\(p\) stands for the proportion or percent expected for a category in decimal form
\(n\) stands for the sample size for the data set
For example, suppose there was a sample of 40 adults, of which 40% were expected to have children and 60% were expected to not have children. The expected counts for each subgroup would be calculated as follows:
Has Children (subgroup 1):
\[f_{e 1}=.40(40)=16 \nonumber \]
Does Not Have Children (subgroup 2):
\[f_{e 2}=.60(40)=24 \nonumber \]
Thus, the expected count for those who had children would be 16 persons and the expected count for those who did not have children would be 24 persons. Note that when you sum the expected counts for all groups, the total will be equal to the sample size (here, 16 + 24 = 40).
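The conversion above can also be sketched in a few lines of code. The function name `expected_count` is an assumption made for illustration; the values are the ones from the example.

```python
def expected_count(p, n):
    """Compute f_e = p * n, with p as a decimal proportion and n as the sample size."""
    return p * n

n = 40                           # sample size
f_e1 = expected_count(0.40, n)   # Has Children
f_e2 = expected_count(0.60, n)   # Does Not Have Children

print(f_e1, f_e2)                # 16.0 24.0
print(f_e1 + f_e2)               # 40.0, the same as the sample size
```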
Determining Observed Counts
Observed counts are based on sample data. Each category for a variable is identified and the participants in each of those categories are simply counted to get the observed counts. These are called observed counts because data are examples of what has been observed. The symbol used for frequency observed is \(f_o\). Numbers or names can be added to the subscript to differentiate subgroups. For example, if there were two subgroups, \(f_{o 1}\) could be used to refer to the observed counts for subgroup 1 and \(f_{o 2}\) could be used to refer to the observed counts of subgroup 2.
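Counting participants per category is straightforward to sketch in code. The raw responses below are hypothetical; they are constructed only so that the tallies match the counts used in this section's worked example (18 with children, 22 without).

```python
from collections import Counter

# Hypothetical raw responses, one entry per participant.
responses = ["has children"] * 18 + ["no children"] * 22

observed = Counter(responses)    # tallies how many times each category appears
f_o1 = observed["has children"]  # observed count for subgroup 1
f_o2 = observed["no children"]   # observed count for subgroup 2

print(f_o1, f_o2)  # 18 22
```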
The Goodness of Fit Formula
The chi-squared goodness of fit formula is fairly simple and is as follows:
\[\chi^2=\Sigma \dfrac{\left(f_o-f_e\right)^2}{f_e} \nonumber \]
The steps to using this formula are as follows:
1. Find the difference between \(f_o\) (frequency observed in the data) and \(f_e\) (the frequency expected) for each category (i.e., subgroup).
2. Square the difference for each category.
3. Divide the squared difference by \(f_e\) for each category.
4. Sum the results of step 3 to get the \(\chi^2\) value.
Thus, the \(\dfrac{\left(f_o-f_e\right)^2}{f_e}\) portion must be computed for each category or subgroup before these values are summed. If there are two categories, \(\dfrac{\left(f_o-f_e\right)^2}{f_e}\) is computed twice, once for each category, before their results are summed. If there are three categories, \(\dfrac{\left(f_o-f_e\right)^2}{f_e}\) is computed three times, once for each category, before their results are summed, and so on.
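The steps of the formula can be sketched as a small function that works for any number of categories. The function name and the three-category counts in the usage lines are hypothetical and are included only to show the "computed once per category, then summed" pattern.

```python
def chi_squared_gof(observed, expected):
    """Sum (f_o - f_e)^2 / f_e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have one count per category")
    total = 0.0
    for f_o, f_e in zip(observed, expected):
        difference = f_o - f_e    # find the difference
        squared = difference ** 2 # square the difference
        divided = squared / f_e   # divide by the expected count
        total += divided          # sum across categories
    return total

# Hypothetical three-category data: 60 participants, 20 expected per category.
print(chi_squared_gof([10, 20, 30], [20, 20, 20]))  # 10.0
```

In this hypothetical case the three terms are 100/20 = 5, 0/20 = 0, and 100/20 = 5.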
Example Using the Goodness of Fit Formula
Let’s try using the formula to test whether the counts of sample members who did and did not have children were approximately equal to what would be expected at 40% and 60%, respectively. Suppose that in the sample of 40 adults, 22 did not have children and 18 did. These would be used for the observed counts. The data can be organized as follows:
Subgroups | Observed | Expected |
---|---|---|
Has Children | 18 | 16 |
Does Not Have Children | 22 | 24 |
There are two ways you can organize and compute the \(\chi^2\) using the goodness of fit formula: table version and formula version. I really like the table format for organizing and conducting computations but some find the formula version easier to understand. The computations and steps are the same in both versions. The only difference is how the information is laid out. Thus, both versions are equally useful and are shown here so you can choose the format that is clearest to you.
Table format builds from the summary counts table above to organize and complete the chi-squared formula steps. The observed and expected counts are the preparatory work, which is then used in the steps of the chi-squared goodness of fit formula. For the above data, the table format of steps and results is as follows:
Preparation | | | Steps | | |
---|---|---|---|---|---|
Subgroups | Observed | Expected | Differences \(f_o-f_e\) | Squared \((f_o-f_e)^2\) | Divided \(\dfrac{\left(f_o-f_e\right)^2}{f_e}\) |
Has Children | 18 | 16 | 2 | 4 | 4/16 = 0.25 |
Does Not Have Children | 22 | 24 | (-2) | 4 | 4/24 = 0.1667 |
Total | Summed 40 | | | | Summed \(\chi^2=0.4167\) |
The result when rounded to the hundredths place is: \(\chi^2=0.42\).
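For readers who want to check the table by computer, the following sketch reproduces each column of the table for the two subgroups. The variable names are illustrative assumptions; the counts are the ones from the example.

```python
# (subgroup, observed count, expected count) for each row of the table.
rows = [
    ("Has Children", 18, 16),
    ("Does Not Have Children", 22, 24),
]

chi_squared = 0.0
for name, f_o, f_e in rows:
    difference = f_o - f_e     # Differences column
    squared = difference ** 2  # Squared column
    divided = squared / f_e    # Divided column
    chi_squared += divided     # running total for the Summed cell
    print(f"{name}: difference={difference}, squared={squared}, divided={divided:.4f}")

print(f"chi-squared = {chi_squared:.4f}")  # chi-squared = 0.4167
```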
The same preparation and steps can be used in formula format. For this format, the parts of the formula are filled in and computed using order of operations. Here is what it looks like to compute the \(\chi^2\) in formula format:
| | | Has Children | | Does Not Have Children |
---|---|---|---|---|
\(\chi^2\) | = | \(\dfrac{\left(f_o-f_e\right)^2}{f_e}\) | + | \(\dfrac{\left(f_o-f_e\right)^2}{f_e}\) |
\(\chi^2\) | = | \(\dfrac{(18-16)^2}{16}\) | + | \(\dfrac{(22-24)^2}{24}\) |
\(\chi^2\) | = | \(\dfrac{(2)^2}{16}\) | + | \(\dfrac{(-2)^2}{24}\) |
\(\chi^2\) | = | \(\dfrac{4}{16}\) | + | \(\dfrac{4}{24}\) |
\(\chi^2\) | = | 0.25 | + | 0.1667 |
\(\chi^2\) | = | 0.4167 |
The result when rounded to the hundredths place is: \(\chi^2=0.42\).
Notice that the steps and results are the same whether you use the table format or the formula format. This is because each is just a different way of showing the same steps and computations.
Reading Review 14.2
- What is an observed count based on?
- What is an expected count based on?
- What does \(f_o\) represent?
- What does \(f_e\) represent?
- What are the steps to computing \(\chi^2\) goodness of fit for two categories?