18.1: Nominal Dependent Variables


    A nominal variable is a categorical variable whose categories have no meaningful ordering. Examples include job type, presidential vote (and non-vote), and beer brand choice. These variables are categorical, not numeric: White Collar is not "greater than" Professional, voting Monarchist is not "more than" voting Republican, and Widmer is not "more than" Coors.

    How do we model such dependent variables?

    Of course, there may be a time when you are predicting the Republican vote by examining an underlying level of conservatism. In such a case, Monarchist–Republican would be ordered. Thus, it really depends on what you are modeling (as always).

    There are a couple of ways of doing this. The first, easiest, and most understandable is to model the variable as a series of binary dependent variables. We already understand how this works, the testing of the model is conceptually familiar, and it usually works (see the caution below).

    Caution

    Usually. Nothing in statistics is always best. As you have seen by now, there are always methods that work better, but with trade-offs. The science here is to weigh the strengths against the weaknesses and balance them to get closer to the true process you are trying to model.

    The Multinomial Distribution

    As with the simple binary dependent variable case, let us lay out the mathematical background for the nominal dependent variable case. As in the binary case, we are actually modeling the underlying probabilities of each of the outcomes. Also, as in the binary case, there are five requirements for the random variable to follow a Multinomial distribution (cf. Section 22.4: Probability Distributions):

    1. the number of trials, \(n\), is known;
    2. each trial has \(J\) possible outcomes;
    3. the probability of each outcome, \(\{\pi_1, \pi_2, \ldots, \pi_J\}\), is constant across trials;
    4. each trial is independent from the others; and
    5. the random variable is the number of each type of outcome in those \(n\) trials.

    Thus, if we let \(\pi_j\) be the probability that category \(j\) is selected, then the following two conditions must hold:

    \begin{align}
    & 0 < \pi_j < 1 \qquad \text{ for all } j \in \{1, 2, \ldots, J\} \label{eq:multi-probreq1} \\[1em]
    & \sum_{j=1}^{J}\ \pi_j = 1 \label{eq:multi-probreq2}
    \end{align}

    Condition \(\ref{eq:multi-probreq1}\) must hold because we are dealing with probabilities bounded by 0 and 1, and Condition \(\ref{eq:multi-probreq2}\) holds because one of the \(J\) possible outcomes must happen. In the binary case, our two probabilities were \(\pi_1 = \pi\) and \(\pi_2 = 1-\pi\), which satisfies the second condition by construction and the first because it makes no sense to study phenomena that always or never occur.

    When we generalize the binary case, we need to select an appropriate probability distribution, one that can model \(J\) possible outcomes with \(J\) different probabilities. That distribution is called the multinomial distribution.

    Recall that the distribution in the binary case was the binomial distribution.

    The probability mass function for the multinomial distribution in the general case is

    \begin{equation}
    f_{\mathbf{X}}(\mathbf{x})\ =\ \frac{n!}{x_1!\ x_2! \cdots x_J!}\ \ \pi_1^{x_1}\ \pi_2^{x_2}\ \cdots\ \pi_J^{x_J} \label{eq:multi-multinomialpdf}
    \end{equation}

    Here, each \(x_j\) is a non-negative integer and \(\sum_{j=1}^{J} x_j = n\). The expected value of this distribution for a specified outcome is

    \begin{equation}
    E[X_j] = n\pi_j
    \end{equation}

    and the covariance between two distinct outcomes (\(i \ne j\)) is

    \begin{equation}
    Cov[X_i, X_j] = -n\pi_i\pi_j
    \end{equation}

    Multinomial Outcomes

    Be aware that \(\mathbf{X}\) is a vector. So, if \(n=1\) and \(J=4\), the following could be outcomes from the Multinomial distribution:
    \[
    x = \left[ \begin{array}{c}0 \\ 1\\0\\0 \end{array}\right]; \qquad x = \left[ \begin{array}{c} 1\\0\\0\\0 \end{array}\right]; \qquad x = \left[ \begin{array}{c} 0\\0\\1\\0 \end{array}\right]
    \]

    In the first example, a 2 came up; in the second, a 1; in the third, a 3. Note that in each case, the sum of the entries is \(n\) and the number of entries is \(J\).

    Now, if \(n=4\) and \(J=3\), the following could be outcomes from a Multinomial distribution:
    \[
    x = \left[ \begin{array}{c} 0\\3\\1 \end{array}\right]; \qquad x = \left[ \begin{array}{c} 2\\1\\1 \end{array}\right]; \qquad x = \left[ \begin{array}{c} 0\\0\\4 \end{array}\right]
    \]

    In the first example, three 2s and a 3 came up; in the second, two 1s, a 2, and a 3 came up; in the last, four 3s came up.

    Be aware that the sum of the entries in each outcome vector is \(n\) and that the number of entries is \(J\).

    If the random variable \(\mathbf{X}\) follows a Multinomial distribution with \(n=3\) and \(\boldsymbol{\pi} = [0.1, 0.5, 0.4]^\prime\), then we could write it as
    \begin{equation}
    \mathbf{X} \sim Multi\left(n=3,\ \boldsymbol{\pi} = \left[ \begin{array}{c} 0.1\\0.5\\0.4 \end{array}\right]^{\phantom{!}} \right)
    \end{equation}

    and the expected value of \(\mathbf{X}\) would be

    \begin{equation}
    E[\mathbf{X}] = \left[ \begin{array}{c} 0.3\\1.5\\1.2 \end{array}\right]
    \end{equation}

    The expected value of \(X_3\) would be \(E[X_3]=1.2\).
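
    If you would like to check this expected value empirically, base R can simulate from the distribution with its built-in rmultinom() function. This is a minimal sketch; the sample means of the outcome counts should be close to \(E[\mathbf{X}]\) above.

    # Simulate 10,000 draws from Multi(n = 3, pi = (0.1, 0.5, 0.4));
    # rmultinom() returns a J-by-10000 matrix, one draw per column
    draws <- rmultinom(10000, size = 3, prob = c(0.1, 0.5, 0.4))

    # The row means estimate E[X], which should be near (0.3, 1.5, 1.2)
    rowMeans(draws)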

    Note

    Make sure you see that this is just an extension of the Binomial distribution, where

    \begin{equation}
    f_X(x) = \frac{n!}{x! (n-x)!}\ \pi^{x} (1-\pi)^{n-x}
    \end{equation}

    with

    \begin{equation}
    \mathbf{X} = \left[ \begin{array}{c} x \\ n-x \end{array} \right] \qquad \text{and} \qquad
    \boldsymbol{\pi} = \left[ \begin{array}{c} \pi \\ 1-\pi \end{array} \right]
    \end{equation}
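
    A one-line check in R illustrates this correspondence: the binomial probability of \(x\) successes in \(n\) trials equals the multinomial probability of the vector \((x,\ n-x)\) with probabilities \((\pi,\ 1-\pi)\).

    # Both of these return 0.3087
    dbinom(2, size = 5, prob = 0.3)
    dmultinom(c(2, 3), size = 5, prob = c(0.3, 0.7))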

    Using the Multinomial

    Let us look at how to use the Multinomial distribution for modeling with two examples.

    Example \(\PageIndex{1}\): Rolling Three Dice

    Let us illustrate the multinomial distribution with a typical "rolling a die" example. Assuming the die is fair, the probability of rolling each of the six outcomes is \(1/6\). If we roll a fair die 3 times, what is the probability the outcome is \([1, 0, 1, 0, 0, 1]^\prime\) (that is, a 1, a 3, and a 6 come up)? What is the expected value of \(\mathbf{X}\)?

    Solution.
    This is a multinomial experiment: the number of trials is fixed (\(n=3\)), each roll has six possible outcomes, the rolls are independent, the probabilities of each outcome are constant (they do not change as we roll the die), and the probabilities sum to one. As such, we know the probability mass function is
    \begin{equation}
    f(\mathbf{x}) = \frac{3!}{x_1!\ x_2!\ x_3!\ x_4!\ x_5!\ x_6!\ }\ \left(\frac{1}{6}\right)^{x_1}\ \left(\frac{1}{6}\right)^{x_2}\ \left(\frac{1}{6}\right)^{x_3}\ \left(\frac{1}{6}\right)^{x_4}\ \left(\frac{1}{6}\right)^{x_5}\ \left(\frac{1}{6}\right)^{x_6}
    \end{equation}

    Thus,
    \begin{align}
    P\left[ \mathbf{X}=\left[ \begin{array}{c} 1\\0\\1\\0\\0\\1 \end{array}\right]^{\phantom{X}} \right] &= \frac{3!}{1!\ 0!\ 1!\ 0!\ 0!\ 1!}\ \left(\frac{1}{6}\right)^{1}\ \left(\frac{1}{6}\right)^{0}\ \left(\frac{1}{6}\right)^{1}\ \left(\frac{1}{6}\right)^{0}\ \left(\frac{1}{6}\right)^{0}\ \left(\frac{1}{6}\right)^{1} \\
    &= 6 \left(\frac{1}{6}\right)^3 \\[1em]
    &= \frac{1}{36}
    \end{align}

    Thinking through the problem directly gets us to the same point: there are \(3! = 6\) orders in which a 1, a 3, and a 6 can appear, and each order has probability \((1/6)^3 = 1/216\), giving \(6/216 = 1/36\).

    Finally, we know the expected value is
    \begin{equation}
    E[\mathbf{X}] = n\boldsymbol{\pi} = 3 \left[ \begin{array}{c} 1/6\\1/6\\1/6\\1/6\\1/6\\1/6 \end{array}\right] = \left[ \begin{array}{c} 0.5\\0.5\\0.5\\0.5\\0.5\\0.5 \end{array}\right]
    \end{equation}

    \(\blacksquare\)
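
    Base R can confirm the probability calculation directly with its dmultinom() function:

    # P[X = (1, 0, 1, 0, 0, 1)] for three rolls of a fair die
    dmultinom(c(1, 0, 1, 0, 0, 1), size = 3, prob = rep(1/6, 6))
    # returns 0.02777778, which is 1/36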

    Estimating \(\pi_j\)

    As we have a formula for our expected value, we have our mechanism for estimating the several \(\pi_j\): in an experiment (or set of data), count the number of times outcome \(j\) occurred and divide by the total number of trials (or records). This is actually the maximum likelihood estimator for \(\pi_j\). Thus, our linear predictor is
    \begin{equation}
    \mathrm{logit}(\pi_j) = \beta_{j,0} + \beta_{j,1} x_1 + \beta_{j,2} x_2 + \cdots +\beta_{j,k} x_k
    \end{equation}

    Notice that this linear predictor has \(k+1\) parameters to estimate for each of the \(J\) categories. Thus, you will need more than \(J(k+1)\) pieces of data to fit it. There are ways to reduce the dimensionality of the problem (reduce the number of parameters in need of estimation); however, these are beyond the scope of this book.

    We need the logit link (or something just like it) to force our linear predictions into the range \(\pi_j \in (0, 1)\). As any link that maps \(g: \mathbb{R} \to (0,1)\) is acceptable, we could use the log-log link, the complementary log-log link, the probit link, or any of an infinite number of others... in theory. As before, the choice of link function is largely a matter of tradition. If you deviate from tradition, the burden of proof is on you to justify the selection. Furthermore, the differences are usually slight; if the differences are large, then there is something wrong with your research model. Because of this, it would behoove you to fit your research model using a couple of different (appropriate) link functions to help determine the stability (robustness) of your results.

    Thus, there are two things that you need to take away from this discussion. First, we are able to fit the entire model at once because we have a distribution that can produce the necessary nominal results. Second, as in the binary case, we model the underlying probabilities, not the actual outcomes.

    Modeling Nominal Variables

    To see all of this in action, let us look at an extended example.

    Example \(\PageIndex{2}\): Modeling Occupation

    The General Social Survey (GSS) at the University of Chicago conducts an extensive survey of adult Americans every year. The data is freely available from NORC. In this very small subset of the data, gssocc, I would like to predict a person's occupation category (occ) based on race (white), years of education (ed), and years of experience (exper).

    Representative Sample?

    Before getting started, let us examine the variables involved. The race variable is binary, with a 1 representing the person identifying as "white" and a 0 otherwise. As a side note, this is a race variable, not an ethnicity variable; thus, Hispanics may self-identify as either white or not-white. Also note that this is a self-identification variable; that is, the individual being surveyed decided his or her reported race. Looking at a frequency count, a full 91.69% of the respondents stated they were white. This is substantially higher than the population at large, where approximately 80% of Americans were white when the survey was conducted. When we do the final analysis, we need to keep this in mind, as the sample is not necessarily representative of the nation as a whole.

    The median number of years of education in the sample is 12 years, which corresponds to graduating from high school. The mean number of years is 13.09, which indicates the sample is right skewed (the Hildebrand ratio is +0.37). Furthermore, it is interesting to note that 51.0% of the sample only graduated from high school. Additionally, 23.4% of the sample received a bachelor's degree or more, which is close to the 27% of the population who have received a bachelor's degree or higher. Finally, 18.7% of the sample did not graduate from high school, which is close to the 15% estimate for the population. From this, it appears as though the sample is representative of the population in terms of educational attainment.

    Note

    Representativeness was a safe assumption with respect to the education variable, but not with respect to the white variable. For the experience variable, discussed next, there is no population benchmark, so you are unable to check its representativeness at all.

    The third independent variable is the years of experience in the job. There are no general statistics for the population, so we will have to make a large assumption that the sample represents the population. In the sample, the years of experience vary widely, from 2 to 66 years. The median is 17 years and the mean is 20.5 years; thus, the sample is also right skewed. This makes sense, as experience is a count variable, and count variables tend to be right skewed because they cannot take on negative values. In fact, there is nothing in the distribution of the experience variable that looks wrong. With that said, one still needs to mention the caveat that we cannot tell whether it is representative of the population.
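
    For completeness, here is a sketch of how these descriptive checks could be reproduced, assuming the subset has been loaded as a data frame named gssocc with the variables named as above:

    # Proportion identifying as white, and summaries of the two counts
    mean(gssocc$white)      # about 0.9169
    summary(gssocc$ed)      # median 12, mean about 13.09
    summary(gssocc$exper)   # range 2 to 66, median 17, mean about 20.5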

    Correlations

    Looking at the correlations amongst the independent variables can help us avoid unpleasant surprises due to collinearity and multicollinearity. The correlation matrix below does not show any significant hint of multicollinearity; in fact, it suggests that the three variables have at most weak linear relationships with each other. Pearson's product-moment correlation test does indicate that the correlation between education and experience is statistically significant at the \(\alpha=0.05\) level (\(t=-5.2152\), \(df=335\), \(p \ll 0.0001\)), but that just means we have sufficient evidence that \(\rho \ne 0\); the estimated correlation of \(-0.2740\) is still low.

                      White  Education   Experience
        White        1.0000     0.0243      -0.0794
        Education    0.0243     1.0000      -0.2740
        Experience  -0.0794    -0.2740       1.0000
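
    These correlations, and the significance test mentioned above, can be computed as follows (again assuming the data frame gssocc is loaded):

    # Pairwise correlations amongst the three predictors
    cor(gssocc[, c("white", "ed", "exper")])

    # Test whether the education-experience correlation differs from zero
    cor.test(gssocc$ed, gssocc$exper)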
    

    Nominal Regression Thoughts

    Now, let us model the outcome variable with the three independent variables. Actually, we first need to step back and really think about what we mean by "model the outcome." Do I want to predict the probability that a person will be Blue Collar given the input variables? Or do I want to predict the job category given the input variables? These are different questions, and they require slightly different methods.

    Option 1: Blue Collar vs. All Others

    The first question actually asks a binary question: What is the probability that a person will be Blue Collar (compared to all of the other job categories)? This is very much like the questions asked in Chapter 15: Binary Dependent Variables. Here, the dependent variable takes on values 1 (Blue Collar) and 0 (not Blue Collar).

    To answer this question, we need to create a variable called bluecol as an indicator variable for being Blue Collar. Thus, the model we fit will be
    \[
    \texttt{bluecol} \sim \texttt{white} + \texttt{ed} + \texttt{exper}
    \]

    We would fit it using a generalized linear model with a binomial family and a logit link. The results of the regression are in the table below. From this model, we can perform all of the goodness-of-fit measures from Chapter 15: Binary Dependent Variables.

                             Estimate   Std.Err   z-value   p-value
      Intercept                3.1036    1.0110      3.07    0.0021
      White                    0.7090    0.6213      1.14    0.2538
      Years of education      -0.3721    0.0640     -5.81    0.0000
      Years of experience     -0.0259    0.0113     -2.30    0.0215
    

    Looking at the results from running the model, we see that greater levels of education and greater levels of experience are associated with a lower probability of being a blue collar worker. For Bob, an individual who responded that he was white, had 20 years of education, and had 10 years of experience in his current job, the probability of being a blue collar worker is approximately 2% (as compared to not being a blue collar worker).
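
    A sketch of this fit and of Bob's predicted probability follows. The object name model.bc is illustrative, bluecol is assumed to have been added to gssocc as the indicator described above, and BOB (which reappears later in this section) is constructed here from Bob's description:

    # Binary logistic regression: Blue Collar vs. all other categories
    model.bc <- glm(bluecol ~ white + ed + exper,
                    family = binomial(link = "logit"), data = gssocc)

    # Bob: white, 20 years of education, 10 years of experience
    BOB <- data.frame(white = 1, ed = 20, exper = 10)
    predict(model.bc, newdata = BOB, type = "response")
    # about 0.02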

    Option 2: All vs. All

    The second option is to try to estimate Bob's actual occupation type, not just whether he is Blue Collar. Note that answering this question is not as simple as running five logistic regressions. This is because the last parenthetical part in the previous section, "as compared to not being a blue collar worker", is subtle and extremely important.

    Note

    Here is why:

    What is the probability that Bob is a white collar worker? If we do the same steps above, we get that the probability that Bob is a white collar worker (as compared to not being a white collar worker) is 13.1%. Similarly, if we continue performing separate logistic regressions, the probability that Bob is a professional is 96.9%; menial, 2.3%; and craft, 7.9%.

    Note that all of these probabilities add up to more than 100%.

    If we are trying to determine Bob's occupation, then there is something seriously wrong here, since the probability that Bob holds one of these five job types cannot be greater than 100%.

    The problem is that we kept changing the base category in our probability calculations. In Chapter 15: Binary Dependent Variables, we never mentioned the need to specify the base category since it always defaulted to the opposite of what we were modeling. In other words, we were actually measuring the probability of an event as compared to the probability of "not the event" (a.k.a. the odds of the event). This ensured that the probabilities always added up to 100%.

    The Lesson

    Comparing probabilities of events is not as easy as when we were only working in the binary realm. It is doable — easily so, with one small change. We need to select a base category that does not change throughout our analysis. The choice is up to you, as all choices are equally acceptable from a statistics standpoint.

    Practically speaking, to perform this modeling, you will have to load the nnet package. Since this ships with the standard distribution of R, there is no need to install it. Once it is loaded with the library(nnet) command, fit the model with the R command

    model.mn1 <- multinom(occ ~ white + ed + exper, data = gssocc)
    

    Because of the large amount of output, the regression table is structured slightly differently. The coefficients (in logit units) and the standard errors are still presented; the p-values are not. However, a quick rule of thumb is that a variable is statistically significant (at the \(\alpha=0.05\) level) if its parameter estimate is more than twice its standard error. The table below presents the output in the form R gives it.

     Coefficients:
                     Constant     White   Educ Level   Experience
     Craft            -1.8328   -0.7642       0.1933       0.0230
     Menial           -0.7412   -1.2365       0.0994      -0.0074
     Prof            -12.2595    0.5376       0.8783       0.0309
     WhiteCol         -6.9800    0.3349       0.4526       0.0299
     
     
     Std. Errors:
                     Constant     White   Educ Level   Experience
     Craft             1.1861    0.6324       0.0775       0.0126
     Menial            1.5195    0.1996       0.1023       0.0174
     Prof              1.6681    0.7996       0.1005       0.0144
     WhiteCol          1.7144    0.9340       0.1023       0.0153
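
    The rule of thumb can be applied mechanically. This sketch divides the estimates by their standard errors, using the fitted model stored above as model.mn1:

    # Wald z ratios: estimates divided by their standard errors
    s <- summary(model.mn1)
    z <- s$coefficients / s$standard.errors

    # Approximate two-sided p-values from the Normal distribution
    # (the rule of thumb in the text is deliberately more conservative)
    2 * pnorm(-abs(z))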
    

    Note that one of the five job types is missing: Blue Collar. This is because all of the logits are measured with respect to Blue Collar. Thus, these estimates are directly comparable (after transforming from logit units).

    The Coefficients part of the regression table provides you with four separate equations. For instance, the above says that the Craft equation is

    \begin{equation}
    \mathrm{logit}\left(\frac{\texttt{Craft}}{\texttt{BlueCol}}\right) = -1.8328 - 0.7642\texttt{White} + 0.1933\texttt{Educ} + 0.0230\texttt{Experience}
    \end{equation}

    That is rather tedious to calculate by hand. R is nice in that, by default, predicting from a multinomial model gives you the category with the highest probability. Thus, according to this model, Bob will most likely be a Professional (which was our conclusion above). If we want the probabilities for each of the possible job types for Bob, we need to add a type="probs" argument to our function call:

    # BOB holds Bob's covariates: white = 1, ed = 20, exper = 10
    predict(model.mn1, newdata = BOB, type = "probs")
    

    Such a call gives us the following probabilities (which sum to one, as they should):

      BlueCol   Craft   Menial     Prof   WhiteCol
       0.0020  0.0091   0.0020   0.9565     0.0304
    

    Interpreting Estimates

    The interpretation of the coefficients (parameter estimates) is the same as in the binary dependent variable case. Just remember that the coefficients are in "logit units," so you will want to speak of log-odds. In R, however, this package does not require you to back-transform your predictions. To see this, just look at the output above: it is already in proportions (a quick check is that they sum to one).

    Goodness of fit

    The first check of the goodness of the model is the relative accuracy (see also the accuracy discussion in Chapter 15: Binary Dependent Variables). The accuracy is the number of correct predictions divided by the number of cases. The relative accuracy divides this number by the accuracy of always selecting the modal category (the null model). For this dataset, the modal category is Professional, with 140 out of 337 cases belonging to Professionals. Thus, the relative accuracy is \(\frac{169}{337} / \frac{140}{337} = 1.207\); this model improves accuracy by 21% over the null model. Is this good? It depends on your other models.
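
    A sketch of this calculation in R, assuming model.mn1 and gssocc as above:

    # In-sample predicted categories (the default predict type is "class")
    preds <- predict(model.mn1)

    # Model accuracy, null-model accuracy, and relative accuracy
    acc <- mean(preds == gssocc$occ)                   # 169/337
    acc.null <- max(table(gssocc$occ)) / nrow(gssocc)  # 140/337
    acc / acc.null                                     # about 1.207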

    Also, using the basic accuracy score means that you are asserting that your data are representative of the population. Without this assertion, the accuracy means virtually nothing!

    As Maximum Likelihood Estimation is used, the Akaike Information Criterion (AIC) score is also reported. For this model, \(AIC = 885\). Is this good? Again, it depends on your other models; in other words, model comparison needs another model. I leave it as an exercise to see that the null model has \(AIC = 1027\). Thus, our model is much better than the null model.
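
    The AIC comparison can be reproduced directly, since multinom model objects support AIC(); the name model.null below is illustrative:

    # AIC for the fitted model and for the intercept-only (null) model
    AIC(model.mn1)                                   # about 885
    model.null <- multinom(occ ~ 1, data = gssocc)
    AIC(model.null)                                  # about 1027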

    Interpreting the Model

    Now that we have looked at our model, let us look at the parameter estimates. According to our model, Whites have a higher probability of being Professionals and White Collar workers than of being Craft or Menial laborers. As for education, higher levels of education are associated with higher odds of being a Professional or a White Collar worker than a Blue Collar worker (both estimates are statistically significant). Finally, the evidence for years of experience is weak: only the experience estimate in the Prof equation exceeds twice its standard error (\(0.0309/0.0144 \approx 2.1\)), and only barely, so under the conservative rule of thumb discussed below we do not treat experience as a clearly significant predictor of job type.

    So, we have a picture of Professionals and White Collar workers, when compared to Blue Collar workers: they are White and well educated. Not an earth-shattering conclusion, but it is encouraging to see that our conclusions do seem to reflect reality.

    Note on the Rule of Thumb

    This rule of thumb comes from the fact that, for a Normally distributed estimator, the ratio needs to exceed 1.96 to be statistically significant at the \(\alpha=0.05\) level. These parameter estimates are not guaranteed to be Normally distributed; as such, the rule of thumb is deliberately more conservative.

    Even with the rule of thumb, do not bet the farm.


    This page titled 18.1: Nominal Dependent Variables is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Ole Forsberg.
