Statistics LibreTexts

14.1: Categories and Frequency Tables


    Our data for the \(\chi^{2}\) test are categorical variables, specifically nominal variables. Recall from unit 1 that nominal variables have no specified order and can only be described by their names and the frequencies with which they occur in the dataset. Thus, unlike the variables we have tested so far, we cannot describe our data for the \(\chi^{2}\) test using means and standard deviations. Instead, we will use frequency tables.

    Table \(\PageIndex{1}\): Pet Preferences

                   Cat   Dog   Other   Total
      Observed      14    17       5      36
      Expected      12    12      12      36

    Table \(\PageIndex{1}\) gives an example of a frequency table used for a \(\chi^{2}\) test. The first three columns represent the categories within our single variable, which in this example is pet preference. The \(\chi^{2}\) test can assess as few as two categories, and there is no technical upper limit on how many categories our variable can include. However, as with ANOVA, having too many categories makes our computations long and our interpretation difficult. The final column in the table is the total number of observations, or \(N\). The \(\chi^{2}\) test assumes that each observation comes from only one person and that each person provides only one observation. As a result, our total number of observations will always equal our sample size.
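    The observed row of a frequency table is simply a tally of raw nominal responses. Below is a minimal Python sketch of that tallying step. The list `responses` is hypothetical: it was built only so that its counts match Table \(\PageIndex{1}\). With real data, each entry would be one person's single answer.

```python
from collections import Counter

# Hypothetical raw data: one nominal answer per person, constructed so the
# counts match the observed row of the Pet Preferences table.
responses = ["Cat"] * 14 + ["Dog"] * 17 + ["Other"] * 5

categories = ["Cat", "Dog", "Other"]  # the categories (columns) of our single variable
observed = Counter(responses)          # frequency of each category

# Each person contributes exactly one observation, so the total equals N.
N = sum(observed[c] for c in categories)

print("Category  Observed")
for c in categories:
    print(f"{c:<8}  {observed[c]:>8}")
print(f"{'Total':<8}  {N:>8}")
```

    Because every person gives exactly one answer, `N` here is both the total of the observed row and `len(responses)`, the sample size.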

    There are two rows in this table. The first row gives the observed frequency of each category in our dataset. In this example, 14 people reported preferring cats as pets, 17 people reported preferring dogs, and 5 people reported preferring some other animal. The second row gives the expected values, which are the frequencies we would find if every category were equally represented. The expected value for each category is calculated as:

    \[E=\dfrac{N}{k}\]

    where \(N\) is the total number of people in our sample and \(k\) is the number of categories in our variable (also the number of category columns in our table, not counting Total). In Table \(\PageIndex{1}\), \(E = 36/3 = 12\) for every category. The expected values correspond to the null hypothesis for \(\chi^{2}\) tests: equal representation of categories. The first of our two \(\chi^{2}\) tests, the Goodness-of-Fit test, will assess how well our data line up with, or deviate from, this assumption.
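    The expected-value formula \(E = N/k\) is easy to check directly. This is a minimal Python sketch using the observed counts from Table \(\PageIndex{1}\). It computes \(E\) and then the raw difference between each observed and expected frequency. The variable names are illustrative only. The differences simply show how far each category departs from equal representation; they are not the \(\chi^{2}\) statistic itself.

```python
# Observed frequencies from the Pet Preferences table.
observed = {"Cat": 14, "Dog": 17, "Other": 5}

N = sum(observed.values())  # total number of people in the sample
k = len(observed)           # number of categories (Total is not a category)

# Under the null hypothesis of equal representation, every category
# has the same expected frequency E = N / k.
E = N / k

for category, count in observed.items():
    # Positive: more people than expected; negative: fewer than expected.
    print(f"{category}: observed={count}, expected={E:g}, difference={count - E:+g}")
```

    For these data, the differences are +2, +5, and -7. They sum to zero because the observed and expected rows both total \(N\). Note that \(E\) need not be a whole number. With 37 people and 3 categories, for example, each expected value would be about 12.33.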

    Contributors and Attributions

    • Foster et al. (University of Missouri-St. Louis, Rice University, & University of Houston, Downtown Campus)


    This page titled 14.1: Categories and Frequency Tables is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Foster et al. (University of Missouri’s Affordable and Open Access Educational Resources Initiative) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.
