
15.1: Chi-Square Goodness of Fit Test


    We come at last to our final topic: chi-square (\(\chi^2 \)). This test is a special form of analysis called a nonparametric test, so its structure will look a little different from what we have done so far. However, the logic of hypothesis testing remains unchanged. The purpose of chi-square is to understand the frequency distribution of a single categorical variable or to find a relationship between two categorical variables, which is frequently a very useful way to look at our data.

    Categories and Frequency Tables

    Our data for the \(\chi^2 \) test are categorical—specifically, nominal—variables. Recall from Unit 1 that nominal variables have no specified order and can only be described by their names and the frequencies with which they occur in the dataset. Thus, unlike the other variables we have tested, we cannot describe our data for the \(\chi^2 \) test using means and standard deviations. Instead, we will use frequency tables.

    Table \(\PageIndex{1}\) gives an example of a frequency table used for a \(\chi^2 \) test. The columns represent the different categories within our single variable, which in this example is pet preference. The \(\chi^2 \) test can assess as few as two categories, and there is no technical upper limit on how many categories can be included in our variable, although, as with ANOVA, having too many categories makes our computations long and our interpretation difficult. The final column in the table is the total number of observations, or N. The \(\chi^2 \) test assumes that each observation comes from only one person and that each person will provide only one observation, so our total observations will always equal our sample size.

    Table \(\PageIndex{1}\): Pet preferences
                 Cat   Dog   Other   Total
    Observed      14    17       5      36
    Expected      12    12      12      36

    There are two rows in this table. The first row gives the observed frequencies of each category from our dataset; in this example, 14 people reported preferring cats as pets, 17 people reported preferring dogs, and 5 people reported a different animal. The second row gives expected values; expected values are what would be found if each category had equal representation. The calculation for an expected value is:

    \[\Large E = \frac{N}{C} \nonumber \]

    where N is the total number of people in our sample and C is the number of categories in our variable (also the number of columns in our table). The expected values correspond to the null hypothesis for \(\chi^2 \) tests: equal representation of categories. Our first of two \(\chi^2 \) tests, the test for goodness of fit, will assess how well our data line up with, or deviate from, this assumption.
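
    To make this concrete, here is a minimal Python sketch (our addition, not part of the original text; the response list is constructed to match Table \(\PageIndex{1}\)) that tallies observed frequencies from raw nominal data and computes the expected value:

        from collections import Counter

        # Illustrative raw responses reproducing the observed counts in Table 1
        responses = ["cat"] * 14 + ["dog"] * 17 + ["other"] * 5

        observed = Counter(responses)   # observed frequency of each category
        N = sum(observed.values())      # total number of observations: 36
        C = len(observed)               # number of categories: 3
        E = N / C                       # expected value per category: 36 / 3 = 12.0

        print(observed)                 # Counter({'dog': 17, 'cat': 14, 'other': 5})
        print(E)                        # 12.0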

    Test for Goodness of Fit

    The test for goodness of fit assesses one categorical variable against a null hypothesis of equally sized frequencies. Equal frequency distributions are what we would expect to get if categorization were completely random. We could, in theory, also test against a specific distribution of category sizes if we have a good reason to. If we have information about how a population is distributed, we could compare our observed sample distribution to the expected values if the sample followed the same distribution as the population. For example, if we know that in the population of a small liberal arts college, 15% of students are international students, while 85% are domestic students, we would then calculate expected values for our sample using these percentages. In that case, we would be testing against the null hypothesis of 15% international students. This is less common, so we will not deal with more examples of this sort in this text.
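
    For that less common case, the expected value for each category is the sample size multiplied by the hypothesized population proportion, rather than N/C. As a hedged sketch (the student counts below are hypothetical, invented for illustration), scipy's chisquare function accepts custom expected frequencies:

        from scipy.stats import chisquare

        # Hypothetical sample of 200 students; counts are invented for illustration
        observed = [36, 164]                      # international, domestic
        proportions = [0.15, 0.85]                # hypothesized population distribution
        N = sum(observed)
        expected = [p * N for p in proportions]   # [30.0, 170.0]

        # Compare observed counts to the expected counts implied by the proportions
        stat, p_value = chisquare(observed, f_exp=expected)
        print(stat, p_value)                      # statistic ~ 1.41, p ~ 0.23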

    Hypotheses

    All \(\chi^2 \) tests, including the test for goodness of fit, are nonparametric tests. This means that there is no population parameter we are estimating or testing against; we are working only with our sample data. Because of this, there are no mathematical statements for \(\chi^2 \) hypotheses. This should make sense because the mathematical hypothesis statements were always about population parameters (e.g., \(\mu \)), so if we are nonparametric, we have no parameters and therefore no mathematical statements.

    We do, however, still state our hypotheses verbally. For \(\chi^2 \) tests for goodness of fit, our null hypothesis is that there is an equal number of observations in each category. That is, there is no difference between the categories in how prevalent they are. Our alternative hypothesis says that the categories do differ in their frequency. We do not have specific directions or one-tailed tests for \(\chi^2 \), matching our lack of mathematical statements.

    Degrees of Freedom and the \(\chi^2 \) Table

    Our degrees of freedom for the \(\chi^2 \) test are based on the number of categories in our variable, not on the number of people or observations as in our other tests. Luckily, they are just as simple to calculate:

    \[\Large df = C - 1 \nonumber \]

    So, for our pet preference example, we have 3 categories, thus we have 2 degrees of freedom. Our degrees of freedom, along with our significance level (still defaulted to \(\alpha = .05 \)) are used to find our critical values in the \(\chi^2 \) table, a portion of which is shown in Table \(\PageIndex{2}\). (The complete \(\chi^2 \) table can be found in section 16.5.) Because we do not have directional hypotheses for \(\chi^2 \) tests, we do not need to differentiate between critical values for one- or two-tailed tests. In fact, just like our F tests for regression and ANOVA, all \(\chi^2 \) tests are one-tailed tests.

    Table \(\PageIndex{2}\): Critical values for chi-square (\(\chi^2\)). Column headings give the proportion in the critical region (right tail).
    df      .10      .05      .02      .01     .005
     1    2.706    3.841    5.024    6.635    7.879
     2    4.605    5.991    7.378    9.210   10.597
     3    6.251    7.815    9.348   11.345   12.838
     4    7.779    9.488   11.143   13.277   14.860
     5    9.236   11.070   12.833   15.086   16.750
     6   10.645   12.592   14.449   16.812   18.548
     7   12.017   14.067   16.013   18.475   20.278
     8   13.362   15.507   17.535   20.090   21.955
     9   14.684   16.919   19.023   21.666   23.589
    10   15.987   18.307   20.483   23.209   25.188
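
    If software is handy, these critical values can also be computed directly rather than looked up. A small sketch using scipy (our addition): because the column headings above are right-tail areas, the quantile we need is \(1 - \alpha\).

        from scipy.stats import chi2

        alpha = 0.05
        df = 2                                    # pet-preference example: C - 1 = 3 - 1

        critical_value = chi2.ppf(1 - alpha, df)  # right-tail critical value
        print(round(critical_value, 3))           # 5.991, matching the df = 2 row above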

    \(\chi^2 \) Statistic

    The calculations for our test statistic in \(\chi^2 \) tests combine our information from our observed frequencies (O) and our expected frequencies (E) for each level of our categorical variable. For each cell (category), we find the difference between the observed and expected values, square it, and divide by the expected value. We then sum these values across cells to get our test statistic. This is shown in the formula:

    \[\Large \chi^2 = \sum{\frac{(O-E)^2}{E}} \nonumber \]

    For our pet preference data, we would have:

    \[\Large \chi^2 = \frac{(14-12)^2}{12} + \frac{(17-12)^2}{12} + \frac{(5-12)^2}{12} = 0.33+2.08+4.08 = 6.49 \nonumber \]

    Notice that, for each cell’s calculation, the expected value in the numerator and the expected value in the denominator are the same value. A quick software check of this calculation is shown below; after that, let’s take a look at an example from start to finish.
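
    Here is that check in Python (a sketch, our addition): the formula applied directly, and scipy's built-in test, which defaults to equal expected frequencies when none are supplied. Note that the unrounded statistic is 6.50; the 6.49 above comes from rounding each term before summing.

        from scipy.stats import chisquare

        observed = [14, 17, 5]
        expected = [12, 12, 12]             # E = N / C = 36 / 3

        # Direct application of the formula: sum of (O - E)^2 / E over cells
        chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        print(round(chi_sq, 2))             # 6.5

        # scipy assumes equal expected frequencies when f_exp is omitted
        stat, p = chisquare(observed)
        print(round(stat, 2), round(p, 3))  # 6.5 0.039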

    Example: Pineapple on Pizza

    There is a very passionate and ongoing debate about whether pineapple should go on pizza. Being the objective, rational data analysts that we are, we will collect empirical data to see if we can settle this debate once and for all. We gather data from a group of adults, asking for a simple yes-or-no answer.

    Step 1: State the Hypotheses

    We start, as always, with our hypotheses. Our null hypothesis of no difference states that an equal number of people will say they do and do not like pineapple on pizza, and our alternative hypothesis will be that one side wins out over the other:

    \[
    \begin{aligned}
    H_0:&\ \text{An equal number of people do and do not like pineapple on pizza} \\
    H_A:&\ \text{A significant majority of people agree one way or the other} \nonumber
    \end{aligned}
    \]

    Step 2: Find the Critical Value

    To avoid any potential bias in this crucial analysis, we will leave \(\alpha \) at its typical level. We have two options in our data (Yes or No), which will give us two categories. Based on this, we will have 1 degree of freedom. From our \(\chi^2 \) table, we find a critical value of 3.84.

    Step 3: Calculate the Test Statistic and Effect Size

    The results of the data collection are presented in Table \(\PageIndex{3}\). We had data from 45 people in all and 2 categories, so our expected values are E = 45/2 = 22.50.

    Table \(\PageIndex{3}\): Pineapple-on-pizza preferences
                 Yes     No   Total
    Observed      26     19      45
    Expected   22.50  22.50      45

    We can use these to calculate our \(\chi^2 \) statistic:

    \[\Large \chi^2 = \frac{(26-22.50)^2}{22.50} + \frac{(19-22.50)^2}{22.50} = 0.54+0.54 = 1.08 \nonumber \]
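
    The same check in Python for the pineapple data (a sketch, our addition). The unrounded statistic is 1.09; the 1.08 above reflects rounding each term to 0.54 before summing.

        observed = [26, 19]
        expected = [22.5, 22.5]          # E = N / C = 45 / 2

        chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        print(round(chi_sq, 2))          # 1.09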

    Effect Size for \(\chi^2 \)

    Like all other significance tests, \(\chi^2 \) tests (both for goodness of fit and for independence) have effect sizes that can and should be calculated. There are many options for which effect size to use, and the ultimate decision is based on the type of data, the structure of your frequency or contingency table, and the types of conclusions you would like to draw. For the purposes of our introductory course, we will focus on a single effect size that is simple and flexible: Cramer’s V.

    Cramer’s V is a type of correlation coefficient that can be computed on categorical data. Like any other correlation coefficient (e.g., Pearson’s r), the cutoffs for small, medium, and large effect sizes of Cramer’s V are .10, .30, and .50, respectively. The calculation of Cramer’s V is very simple:

    \[\Large V = \sqrt{\frac{\chi^2}{N(k-1)}} \nonumber \]

    For this calculation, k is the smaller of R (the number of rows) and C (the number of columns). The numerator is simply the test statistic we calculated during Step 3 of the hypothesis-testing procedure. For our pineapple example, the table had 2 rows (observed and expected) and 2 columns (Yes and No), so k = 2:

    \[\Large V = \sqrt{\frac{\chi^2}{N(k-1)}} = \sqrt{\frac{1.08}{45(2-1)}} = \sqrt{\frac{1.08}{45}} = \sqrt{.024} = .15 \nonumber \]

    A V of .15 indicates a small effect: the imbalance between the two response categories in our sample is weak and, as we will see in Step 4, not statistically significant.
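
    A minimal Python helper for Cramer’s V (our sketch; the function name is illustrative), applied to the values above:

        import math

        def cramers_v(chi_sq, n, k):
            """Cramer's V: sqrt(chi^2 / (N * (k - 1))), where k is the
            smaller of the table's row and column counts."""
            return math.sqrt(chi_sq / (n * (k - 1)))

        print(round(cramers_v(1.08, 45, 2), 2))   # 0.15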

    Step 4: Make the Decision

    Our observed test statistic had a value of 1.08, and our critical value was 3.84. Our test statistic was smaller than our critical value, so we fail to reject the null hypothesis, and the debate rages on. Figure \(\PageIndex{1}\) shows the output from JASP for this example.

    [Image: JASP output table for a multinomial test: \(\chi^2\) = 1.091, df = 1, p = 0.297; observed counts No = 19, Yes = 25; expected counts under \(H_0\): No = 22, Yes = 22.]
    Figure \(\PageIndex{1}\): Output from JASP for the \(\chi^2 \) test for goodness of fit described in the Pineapple on Pizza example. (“JASP chi-square goodness of fit” by Rupa G. Gordon/Judy Schmitt is licensed under CC BY-NC-SA 4.0.)
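
    Putting the whole test together in one sketch (our addition), this time making the decision by comparing the statistic to the critical value, or equivalently the p-value to \(\alpha \). The printed p-value agrees with the JASP output in Figure \(\PageIndex{1}\), up to rounding:

        from scipy.stats import chi2, chisquare

        observed = [26, 19]                       # yes, no
        alpha = 0.05

        stat, p_value = chisquare(observed)       # equal expected frequencies by default
        critical = chi2.ppf(1 - alpha, df=len(observed) - 1)

        print(round(stat, 2), round(p_value, 3))  # 1.09 0.297
        if stat > critical:                       # equivalently: p_value < alpha
            print("Reject the null hypothesis")
        else:
            print("Fail to reject the null hypothesis")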
    Video: Pearson's chi square test (goodness of fit) on YouTube.



    This page titled 15.1: Chi-Square Goodness of Fit Test is shared under a not declared license and was authored, remixed, and/or curated by Chanler Hilley, Kennesaw State University via source content that was edited to the style and standards of the LibreTexts platform.