4.2: Categorical Distributions

Categorical distributions describe categorical (nominal) variables. Usually, these are demographic variables. Examples include gender: males, females (assuming a binary world, of course); diagnosis: anxiety, addiction, PTSD; race: Black, White, Hispanic, Asian (assuming these are the only racial categories); and academic major: Liberal Arts, Physical Sciences.

For categorical variables, the data consist of counts: the number of times you observe someone or something in each type or group within the nominal category. The distribution is simply this set of counts across the categories.

The numbers assigned to each category type are just codes. Gender: 0 = males, 1 = females (assuming a binary world, of course); Diagnosis: 1 = Anxiety, 2 = Addiction, 3 = PTSD; Race: 1 = Black, 2 = White, 3 = Hispanic, 4 = Asian (assuming these are the only racial categories); Academic Major: 0 = Liberal Arts, 1 = Physical Sciences.

Notice the codes. Some start with 0 (males) and some with 1 (Anxiety). You can number the codes any way you see fit. For general ease, it seems easiest to start with 1, then 2, and so forth. For the purpose of entering the codes into a statistical processor, the number mostly does not matter. I say "mostly" because, in later analyses, the choice of codes can matter for comparisons. One such statistical procedure is dummy coding, in which one category serves as the reference group and is coded 0, and each remaining category is flagged with a 1 on its own indicator variable. We will cover this issue in regression analyses and other analyses. For now, let us keep it simple and say that, as codes only, the numbers assigned to each group within a categorical variable can be anything, and for conventional purposes it is easy to number them 1, 2, 3.
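To see that the codes really are arbitrary labels, here is a minimal Python sketch. The two coding schemes, the five patients, and all variable names are made up for illustration; only the Diagnosis group names come from the text.

```python
from collections import Counter

# Two hypothetical coding schemes for the same Diagnosis variable.
scheme_a = {1: "Anxiety", 2: "Addiction", 3: "PTSD"}
scheme_b = {0: "PTSD", 7: "Anxiety", 42: "Addiction"}

# Five made-up patients, entered under each scheme.
patients = ["Anxiety", "PTSD", "Anxiety", "Addiction", "PTSD"]
label_to_a = {label: code for code, label in scheme_a.items()}
label_to_b = {label: code for code, label in scheme_b.items()}
coded_a = [label_to_a[p] for p in patients]
coded_b = [label_to_b[p] for p in patients]

# Decoding either column back to labels gives identical group counts.
counts_a = Counter(scheme_a[c] for c in coded_a)
counts_b = Counter(scheme_b[c] for c in coded_b)

print(coded_a)               # [1, 3, 1, 2, 3]
print(coded_b)               # [7, 0, 7, 42, 0]
print(counts_a == counts_b)  # True
```

The entered numbers differ completely between the two schemes, yet the group counts are the same, which is the sense in which the codes carry no value.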

You would be forgiven for confusing the codes assigned to each group within a categorical variable with the data summaries. The data for nominal variables are summarized as frequency counts, and it is important to distinguish between the two. For our examples, the summarized frequency counts are as follows. Gender: males, n = 16; females, n = 16. Diagnosis: Anxiety, n = 35; Addiction, n = 23; PTSD, n = 16. Race: Black, n = 345; White, n = 125; Hispanic, n = 163; Asian, n = 142. Academic Major: Liberal Arts, n = 93; Physical Sciences, n = 76. These counts are results, not the codes for each group. Side note: the n is lowercase and italicized to represent the sample size for a group, while N represents the total sample size (APA manual, section 6.44, p. 187).
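The difference between codes and counts can be sketched in a few lines of Python. The raw data column below is reconstructed from the Diagnosis counts in the text (35, 23, 16), and the variable names are assumptions for illustration.

```python
from collections import Counter

# Codes are labels: they identify the group, nothing more.
codes = {1: "Anxiety", 2: "Addiction", 3: "PTSD"}

# A raw data column rebuilt to match the Diagnosis counts in the text.
data = [1] * 35 + [2] * 23 + [3] * 16

# Counts are results: how many observations carry each code.
group_n = Counter(data)
total_N = sum(group_n.values())

for code, label in codes.items():
    print(f"code {code} = {label}: n = {group_n[code]}")
print(f"N = {total_N}")  # N = 74
```

The code for PTSD is 3, but its count is 16; the first is a label you chose, the second is something you observed.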

Recall that these variables vary by type or group, that the groups are mutually exclusive, and that the variables are discrete. In Figure 1, the frequency plot, the groups are laid out horizontally along the X-axis. There are no values, and no group or type is higher or lower than another. The numbers assigned to each group are just codes; no value or valence is attached to any number.

Why is it important to recognize that the numbers are codes without values? Because it means the types can appear in any order along the X-axis. The order is arbitrary, and if the order is arbitrary, the shape of the distribution can be anything.
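A small Python sketch makes the point about arbitrary order concrete. It uses the Race counts from the text; the function name `bar_heights` and the choice of an alphabetical reordering are assumptions for illustration.

```python
# Race counts from the text.
counts = {"Black": 345, "White": 125, "Hispanic": 163, "Asian": 142}

def bar_heights(order):
    """Return the bar heights, left to right, for a given X-axis order."""
    return [counts[group] for group in order]

# Same data, two equally valid X-axis orders.
by_code = bar_heights(["Black", "White", "Hispanic", "Asian"])
alphabetical = bar_heights(sorted(counts))

print(by_code)       # [345, 125, 163, 142]
print(alphabetical)  # [142, 345, 163, 125]
print(sorted(by_code) == sorted(alphabetical))  # True
```

The two sequences of bar heights trace different shapes even though they contain exactly the same counts.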

Why do you give a rat’s ass about that? In contrast to categorical distributions, the shape of a continuous distribution does matter, and we hope to see a normal bell curve distribution. However, categorical variables do not have to follow that distribution because the order of the types is arbitrary.

4.2.1: Creating the Table

The first step is to use SPSS to create a frequency table that lists the count for each group within the nominal variable.
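For readers without SPSS at hand, the same kind of table can be sketched in Python. This is an illustrative stand-in, not SPSS output: the function name `frequency_table` and the column layout are assumptions, while the Academic Major counts come from the text.

```python
def frequency_table(counts):
    """Return rows of (group, n, percent of N) for a dict of group counts."""
    total = sum(counts.values())
    return [(group, n, 100 * n / total) for group, n in counts.items()]

# Academic Major counts from the text.
major_counts = {"Liberal Arts": 93, "Physical Sciences": 76}
table = frequency_table(major_counts)

# Print a simple three-column table with a total row.
print(f"{'Group':<20}{'n':>5}{'Percent':>10}")
for group, n, pct in table:
    print(f"{group:<20}{n:>5}{pct:>10.1f}")
print(f"{'Total':<20}{sum(major_counts.values()):>5}{100.0:>10.1f}")
```

Each row pairs a group with its count n and its share of the total N, which here is 169.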

4.2.2: Creating the Plot

The second step is to create a distribution plot showing each group within the nominal variable. To repeat from above, the distribution plot consists of an X-axis and a Y-axis. The X-axis lists all the groups for the nominal variable in the order of their codes. The Y-axis is a frequency count of how many participants or observations fall in each group.
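As a rough sketch of what goes into such a plot, the Python below builds the X-axis labels in code order and the matching Y-axis counts, then draws a crude text version. The function name `plot_data` is hypothetical, the data column is rebuilt from the Diagnosis counts in the text, and the text bars run sideways rather than upward.

```python
from collections import Counter

def plot_data(codes, data):
    """Return X-axis labels (in code order) and Y-axis frequency counts."""
    tally = Counter(data)
    ordered = sorted(codes)
    return [codes[c] for c in ordered], [tally[c] for c in ordered]

codes = {1: "Anxiety", 2: "Addiction", 3: "PTSD"}
data = [1] * 35 + [2] * 23 + [3] * 16  # rebuilt from the counts in the text
x_labels, y_counts = plot_data(codes, data)

# Crude text rendering: one "#" per observation, bars drawn sideways.
for label, count in zip(x_labels, y_counts):
    print(f"{label:<10}{'#' * count} {count}")
```

A plotting package would draw the same labels and counts as vertical bars; the underlying data are just these two lists.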

4.2.3: Evaluating the Plot

This page emphasizes the evaluation of categorical distribution plots to ensure sample characteristics reflect population traits. It discusses the impact of confounds and the necessity of comparing sample data to population expectations, highlighting the importance of balanced demographic sampling and the option to collapse groups with insufficient data. Quality assessments of distributions require alignment with expected patterns rather than just frequency count shapes.
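Collapsing groups with insufficient data can be sketched as follows. Everything here is an assumption for illustration: the function name `collapse_small_groups`, the cutoff of 10, the "Other" label, and the two small groups (Phobia, OCD), which are made up and do not appear in the text's data.

```python
def collapse_small_groups(counts, min_n, other_label="Other"):
    """Merge every group with fewer than min_n observations into one group."""
    collapsed = {}
    for group, n in counts.items():
        key = group if n >= min_n else other_label
        collapsed[key] = collapsed.get(key, 0) + n
    return collapsed

# First three counts are from the text; Phobia and OCD are hypothetical.
sample = {"Anxiety": 35, "Addiction": 23, "PTSD": 16, "Phobia": 3, "OCD": 2}
collapsed = collapse_small_groups(sample, min_n=10)
print(collapsed)
# {'Anxiety': 35, 'Addiction': 23, 'PTSD': 16, 'Other': 5}
```

The total N is unchanged by collapsing; only the number of groups shrinks. What counts as "insufficient" depends on the analysis, so the cutoff is a judgment call rather than a fixed rule.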
