17.2: The Mathematics


    Count models have dependent variables that can take on only non-negative integers. Back in the time of OLS, we handled the non-negative aspect by taking the logarithm of the dependent variable (adding 1 before taking the logarithm if any values were 0). However, OLS does not allow for discrete dependent variables. The discrete aspect must be handled through Generalized Linear Models (GLMs).

    Recall that using GLMs requires that we explicitly specify three things.

    1. First, we need to know the linear predictor, \(\eta\).
    2. Second, we need to know the distribution of the dependent variable, conditioned on the independent variables.
    3. Finally, we need to know the link function that appropriately connects the two of them.

    The linear predictor is the same as always: the weighted sum of our independent variables. The canonical link function is the logarithm function. Finally, the distribution we will use is the Poisson distribution. Note that the Poisson is not the only option for such count dependent variables. The Negative Binomial distribution can also be used, but as the Negative Binomial distribution is a bit more complicated than the Poisson, we will motivate this chapter with the Poisson and save the Negative Binomial for later.

    The Poisson Distribution

    The Poisson distribution has the following probability mass function (pmf):

    \[ f(y;\ \lambda) = \frac{\displaystyle e^{-\lambda} \lambda^y}{\displaystyle y!} \qquad y \in \{0, 1, 2, 3, \ldots\}, \quad \lambda \in (0, \infty) \]

    Again, the probability mass function (pmf) is not as important as the expected value of this distribution. Why? Remember that the Generalized Linear Model paradigm models the expected value, \(\mathrm{E}[Y \mid X]\), not the actual outcomes.
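
    Base R implements this pmf as dpois(). As a quick sanity check, here is a comparison against the formula written out by hand, using an arbitrary \(\lambda = 4\):

    lambda = 4
    # dpois() and the hand-written pmf agree
    dpois(2, lambda)
    exp(-lambda) * lambda^2 / factorial(2)    # both are 0.1465251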

    Calculating the expected value of the Poisson distribution is not as easy as it was for the Binomial; it requires a trick:

    \begin{align}
    E[Y] &\stackrel{\text{def}}{=} \sum_{y=0}^{\infty} y\, f(y) \\[1em]
    &= \sum_{y=0}^{\infty} y\, \frac{e^{-\lambda} \lambda^y}{y!} \\[1em]
    &= \sum_{y=1}^{\infty} y\, \frac{e^{-\lambda} \lambda^y}{y!} \\[1em]
    &= \sum_{y=1}^{\infty} \frac{e^{-\lambda} \lambda^y}{(y-1)!} \\[1em]
    &= \lambda \sum_{y=1}^{\infty} \frac{e^{-\lambda} \lambda^{(y-1)}}{(y-1)!}
    \end{align}

    Let us define \(z \stackrel{\text{def}}{=} y-1\). This leads to the next step:

    \begin{align}
    E[Y] & = \lambda \sum_{z=0}^{\infty} \frac{e^{-\lambda} \lambda^{z}}{z!}
    \end{align}

    The summation is now over the entire Poisson pmf, and any pmf summed over its full support equals 1. Thus, the expected value of a Poisson random variable is \(E[Y]=\lambda\).
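
    We can verify this result numerically in R. This is only an illustrative sketch: the infinite sum is truncated at \(y = 100\), where the remaining tail mass is negligible for a small \(\lambda\) such as 4.

    lambda = 4
    y = 0:100                     # truncate the infinite sum
    sum( y * dpois(y, lambda) )   # approximately 4, i.e., lambda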

    Note

    Recall that one of the assumptions of Ordinary Least Squares is that the variance is constant with respect to the (expected value of the) dependent variable. When the outcomes are distributed as Poisson random variables, we can actually prove that the variance is not constant with respect to the predicted outcomes. To see this, let \(Y \sim Pois(\lambda)\). With this, and with the probability mass function above, we can use the definition to calculate the variance of \(Y\). Without proof, the variance of \(Y\) is \(V[Y]=\lambda\). Yes, the variance is the same as the expected value.

    Thus, the variance is a function of the expected value, and the homoskedasticity requirement of OLS is violated.
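
    A short simulation illustrates this mean-variance equality (a sketch, again with an arbitrary \(\lambda = 4\)):

    # For Poisson draws, the sample variance tracks the sample mean
    y = rpois(1e6, lambda=4)
    mean(y)    # approximately 4
    var(y)     # also approximately 4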

    That the variance is a function of the expected value also creates a problem. Quite often, we will be dealing with data in which the variance is not equal to, but is greater than, the expected value. Such data is termed overdispersed. When we encounter it, we will discuss what it means and what we should do.

    By the way, this is the same overdispersion as we discussed in the section on binomial overdispersion. It is caused by the same structures and is tested in the same manner. I encourage you to revisit that section.

    Now that we understand our choice of distribution a bit better, and the resulting expected value, let us examine the third facet: the link function. First, note that \(\lambda\) is bounded below; \(\lambda \in (0,\infty)\). Thus, we need a function that takes a variable bounded below by zero and transforms it into an unbounded variable. We have already met a link function that can handle this: the logarithm function.

    Note

    Again, note that we are modeling \(\lambda = E[Y]\), not the observed count. As \(\lambda\) is continuous and bounded below by zero (but never equal to zero), we can use the logarithm function as our transformation link.

    And so, we have the three necessary components to use Generalized Linear Models for count data:

    • the linear predictor, \[ \eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k \]
    • the conditional distribution of the dependent variable, \[ Y\ |\ x \sim Pois(\lambda) \] with the formula for the expected value, \[ \mu = \lambda \]
    • and the link function, \[ \log(\mu) = \eta \]
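
    To see the link in action, note that its inverse, the exponential function, maps any value of the unbounded linear predictor back into \(\lambda\)'s range of \((0, \infty)\):

    # exp() is the inverse of the log link: every eta yields a valid, positive lambda
    eta = c(-5, 0, 5)
    exp(eta)    # 0.006737947   1.000000000 148.413159103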

    Here is what you need to take away from this section: The distribution must fit the possible outcomes. The link must translate the bounds on the parameter to the lack of bounds on the linear predictor. Both require you to know some distributions.

    Deriving the Canonical Link

    In the chapter on GLMs, we mentioned that each distribution has a canonical link. Let us derive the canonical link for the Poisson distribution. The steps to determine the canonical link are the same for the Poisson as they were for the Gaussian, Bernoulli, and Binomial:

    1. Write the probability mass function (pmf).
    2. Write the probability mass function in the required form.
    3. Read off the canonical link.

    For this distribution, this results in:
    \begin{align}
    \text{pmf}: \quad f(y \mid \lambda) &=\frac{e^{-\lambda} \lambda^y}{y!} \\[1em]
    &= \exp \left[ \log \left( e^{-\lambda}\right) + \log \left( \lambda^y \right) - \log\left(y!\right) \right] \\[1em]
    &= \exp \left[ -\lambda + y \log(\lambda) - \log(y!) \right] \\[1em]
    &= \exp \left[ \frac{y \log(\lambda) - \lambda}{1} - \log(y!) \right]
    \end{align}

    This is in the required form:
    \begin{equation}
    f(y) = \exp \left[ \frac{y \theta - b(\theta)}{a(\phi)} + c(y, \theta) \right]
    \end{equation}

    Thus, reading off the standard form, we have the following:

    • \(y = y\)
    • \(\theta = \log(\lambda)\)
    • \(a(\phi)=1\)
    • \(b(\theta) = \lambda = \exp(\theta)\)
    • \(c(y,\phi) = -\log(y!)\)

    As such, the canonical link is the log function. I leave it as an exercise to show \(\mathrm{E}[Y]=\lambda\) and \(\mathrm{V}[Y]=\lambda\) using the methods of the GLM mathematics section.
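
    As a hint, these are the standard exponential-family identities (assuming the GLM mathematics section states them in this form):

    \begin{align}
    \mathrm{E}[Y] &= b'(\theta) \\[0.5em]
    \mathrm{V}[Y] &= a(\phi)\, b''(\theta)
    \end{align}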

    Example \(\PageIndex{1}\): Citizens' Initiative

    The people in many US states have the ability to formulate binding laws by placing them directly before the voters. This process is called the Citizens' Initiative. Extant theory suggests that states with a higher population will use the initiative process more often than states with a lower population. Let us test this hypothesis with data (crime datafile).

    Solution.
    As we are performing GLM modeling, we need to determine the three needed components. First, since the dependent variable is a count of the number of initiatives placed before the voters, we will assume that the dependent variable has a Poisson distribution:

    \[ \text{inituse} \mid \lambda \sim \mathrm{Poisson}(\lambda) \]

    The linear predictor will use our explanatory variable:

    \[ \eta = \beta_0 + \beta_1 \text{pop90} \]

    The link function will be the logarithm function:

    \[ \log(\lambda) = \eta \]

    With this, we use these commands to load and analyze the data:

    # Load the crime data
    vcr = read.csv("http://rur.kvasaheim.com/data/crime.csv")
    
    # Fit the Poisson regression of initiative use on 1990 state population,
    # excluding the state with ccode 93 (California; see the Note below)
    m2 = glm( inituse~pop90, family=poisson(link=log),
        data=vcr, subset=(ccode!=93) )
    

    Now, summary(m2) tells us that there is a statistically significant relationship between the state's population in 1990 and its use of initiatives in the 1990s. Unfortunately, the relationship is negative (\(\hat{\beta}_1 = -7.433 \times 10^{-8}\)), which is the opposite of what the original hypothesis predicted. These data do not support the hypothesis.

    Let us now predict the number of initiatives that Utah would have had during the 1990s, using the fact that the population of Utah in 1990 was 1,722,850. We can do this by hand, or we can use the predict function. In either case, we must remember to back-transform using the inverse of the logarithm function, the exponential function. Using the latter method gives a prediction on the log scale of about 2.0; exponentiating, the model predicts 7.44 initiatives for Utah in the 1990s. The actual value is 3.

    # Utah's 1990 population
    UTAH = data.frame(pop90=1722850)
    
    # predict() returns the prediction on the scale of the linear
    # predictor (the log scale), so we back-transform with exp()
    prL = predict(m2, newdata=UTAH)
    exp(prL)
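
    Equivalently, predict() can apply the inverse link for us through its type argument, avoiding the manual back-transformation:

    # Request the prediction directly on the response (count) scale
    predict(m2, newdata=UTAH, type="response")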
    

    Note

    The glm function used here includes an additional parameter that we have not discussed: subset. This parameter allows us to explicitly specify which data to include in the analysis. Here, I removed the state with ccode equal to 93 (California) from the analysis. The reason I did this is that a plot of the entire data suggested that California was an influential point.
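
    While the decision here rests on a plot of the raw data, one common numerical check for influential points is Cook's distance. A minimal sketch, assuming a full-data fit named m2full (a name introduced here for illustration):

    # Fit with all states, then look for points with outsized influence
    m2full = glm( inituse~pop90, family=poisson(link=log), data=vcr )
    plot( cooks.distance(m2full) )   # an influential state should stand apart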

    Figure \(\PageIndex{1}\): A plot of initiative use against the population of the state in 1990, with the Poisson regression curve superimposed.

    The interesting thing is that the graph visually calls into question the results of the GLM regression above. While the effect direction does definitely appear to be negative, it is hard to believe that this effect has such a high level of significance (\(p \ll 0.0001\)). There is a lot of variance in the data. What is happening?

    The problem is that the model/data are overdispersed.
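
    We can make this suspicion concrete with a standard dispersion diagnostic: the Pearson statistic divided by the residual degrees of freedom, which should be near 1 for a well-fitting Poisson model. A sketch (whether this is the exact test from the binomial overdispersion section is an assumption):

    # Values much greater than 1 suggest overdispersion
    sum( residuals(m2, type="pearson")^2 ) / df.residual(m2)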


    This page titled 17.2: The Mathematics is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Ole Forsberg.
