14.2: The Requirements for GLMs
The Generalized Linear Model (GLM) is a paradigm that extends the CLM and many of its adjustments. To accomplish this feat, the model is separated into named parts that are examined individually. Those three parts are the linear predictor, the conditional distribution of the dependent variable, and the link function. While we have already mentioned all three of these concepts, let us explore them in greater detail before we derive the mathematical results.
There is a modeling paradigm termed General Linear Models, which merely extends the CLM to allow multiple independent variables; technically, the CLM uses only one independent variable. General Linear Models are rarely discussed separately from the CLM, so there is no standardized abbreviation for them. However, authors who do discuss General Linear Models frequently abbreviate them as GLM. These same authors abbreviate Generalized Linear Models as GLZ.
Upshot: When searching for information on GLMs, make sure you are reading about Generalized Linear Models and not General Linear Models.
The Linear Predictor
Of the three knowledge requirements for using generalized linear models (GLMs), the linear predictor is the most familiar. It is merely the weighted sum of your chosen explanatory variables that you used throughout the classical linear model chapters:
\begin{align}
\eta &\stackrel{\text{def}}{=} \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k \\[1ex]
&= \mathbf{X}\boldsymbol{\beta}
\end{align}
The only difference is that we are providing a name for the weighted sum (\(\eta\), the Greek letter "eta") and we are calling it the "linear predictor." It is a "linear" predictor because the expression is linear in each of the coefficients (\(\beta_i\)). It is a predictor because it is used to predict the expected value of the dependent variable from the independent variables.
Note that the values produced by the linear predictor are unbounded; that is, \(\eta \in \mathbb{R}\). This is very important to realize, especially when we get to the third requirement: the link function.
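To make the linear predictor concrete, here is a minimal sketch in Python; the design matrix and coefficient values are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical design matrix: a column of ones (intercept) plus two predictors.
X = np.array([[1.0, 2.0, -1.0],
              [1.0, 0.5,  3.0],
              [1.0, 4.0,  0.0]])

# Hypothetical coefficient vector (beta_0, beta_1, beta_2).
beta = np.array([0.5, -1.2, 2.0])

# The linear predictor eta = X @ beta: one unbounded real value per observation.
eta = X @ beta
print(eta)  # the values can be any real number, positive or negative
```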
The Conditional Distribution
The first "new" addition is the conditional distribution of the dependent variable (its distribution, conditional on the values of the independent variables). Choosing it is usually not as difficult as it may seem; a few rules of thumb are very helpful. The distribution chosen reflects your knowledge of the domain of the dependent variable. If the dependent variable can take on all Real values (as before), then an appropriate distribution is the Gaussian distribution (as before). If the dependent variable can take on only values of 0 and 1, then an appropriate distribution is the Bernoulli distribution. And so forth. The table below provides appropriate distributions for several different types of dependent variables (and the chapter in which we discuss them). This is not an exhaustive list, nor are the listed distributions always correct. They are just a good place to start.
The Gaussian distribution is named for Johann Carl Friedrich Gauss (1777–1855); we already know it as the normal distribution. That we are using the name Gaussian reflects standard terminology in GLMs and a desire to give credit where it is due. In Francophone areas, the distribution is known as the Gauss-Laplace distribution to give appropriate credit to Pierre-Simon, Marquis de Laplace (1749–1827). However, Laplace also has his own distribution.
Both the Gaussian and the Laplace distribution were created to describe errors in measurement.
| Dependent Variable is... | Default Distribution | Canonical Link | Treated In |
|---|---|---|---|
| Continuous, Unbounded | Gaussian | Identity | 14: Generalized Linear Models |
| Continuous, Bounded by 0 | Gamma | Inverse | Not in this book |
| Discrete, Dichotomous | Bernoulli | Logit | 15: Binary Dependent Variables |
| Discrete, Bounded Count | Binomial | Logit | 16: Binomial Dependent Variables |
| Discrete, Unbounded Count | Poisson | Log | 17: Count Dependent Variables |
| Discrete, Unordered | Multinomial | Logit | 18.1: Nominal Dependent Variables |
| Discrete, Ordered (not interval) | Multinomial | Logit | 18.2: Ordinal Dependent Variables |
All of these distributions have something in common: They are members of the exponential family of distributions (or exponential class of distributions). The Mathematics section (below) discusses why this family of distributions was selected and which distributions belong to it.
The distribution is important in that its expected value automatically restricts the outcome to appropriate values of the dependent variable. Note that we are explicitly modeling the expected value of the dependent variable (given the values of the independent variables). That we are modeling the expected value may sound odd, but we did this previously with the linear models: Our prediction line was a line of the conditional expectation of the dependent variable, \(E[Y \ |\ x]\). The same is true for GLMs: The fitting routine predicts the expected value of the distribution, \(E[Y\ |\ x]\), not the observed value.
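As a sketch of how this choice appears in practice, the statsmodels library in Python asks you to name the conditional distribution as a `family`; the data below are hypothetical:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical data: 100 observations, one predictor plus an intercept.
x = rng.normal(size=100)
X = sm.add_constant(x)

# A 0/1 outcome, so the Bernoulli/Binomial family is appropriate.
y = rng.binomial(1, 0.5, size=100)

# Naming the conditional distribution selects the GLM family; swapping in
# sm.families.Poisson() or sm.families.Gaussian() would match a count or an
# unbounded continuous outcome instead (see the table above).
model = sm.GLM(y, X, family=sm.families.Binomial())
result = model.fit()
print(result.params)  # estimated coefficients on the linear-predictor scale
```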
The Link Function
The third aspect you need to know in order to use the GLM framework is the link function, which links the linear predictor and the expected value of the distribution. If we symbolize the expected value of the distribution as \(\mu\) and the linear predictor as \(\eta\), then the link function is \(g(\cdot)\), such that
\begin{equation*}g(\mu)=\eta\end{equation*}
The most important requirement for the link function is that it maps the (possibly bounded) domain of the expected value of \(Y\) onto the unbounded domain of the linear predictor, \(\eta\). An additional requirement is that it is a bijection; that is, the link and its inverse are both functions. It is also usual to make the link a strictly increasing function. This forces the direction of the effect of a variable to match the sign of its estimated coefficient: if the coefficient estimate is positive, then the variable has a positive effect on the dependent variable.
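A quick numerical sketch of this mapping, using the logit link from the table above (scipy's `logit` and `expit` implement the link and its inverse):

```python
import numpy as np
from scipy.special import logit, expit  # the logit link and its inverse

# Expected values near the boundaries of (0, 1)...
mu = np.array([0.001, 0.25, 0.5, 0.75, 0.999])

# ...map to unbounded values on the linear-predictor scale,
eta = logit(mu)
print(eta)  # roughly [-6.9, -1.1, 0.0, 1.1, 6.9]

# and the inverse link recovers the original expected values (a bijection).
print(expit(eta))
```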
The table of distributions, above, lists the canonical link functions for each of the provided distributions. One can use links that are not canonical — and often should — but the canonical link is the default link function used. In subsequent chapters, when an alternate link function is appropriate, we will discuss why.
The Mathematics
Nelder and Wedderburn (1972) formulated the GLM paradigm to unify modeling techniques for several different classes of problems, including logistic regression, count regression, and linear regression. Starting with a member of the exponential family of distributions, Nelder and Wedderburn created an estimation method called iteratively re-weighted least squares (IRLS), which computes the maximum likelihood estimates of the parameter effects through an iterative procedure. MLE remains the primary method of fitting GLMs, but other approaches are used, including maximum quasi-likelihood estimation (which we'll see later), Bayesian estimation, and several variance stabilization methods.
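As a minimal sketch of IRLS, here it is specialized to logistic regression with hypothetical data; production code would add convergence and numerical-stability checks:

```python
import numpy as np
from scipy.special import expit

def irls_logistic(X, y, n_iter=25):
    """Fit a logistic-regression GLM by iteratively re-weighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta          # linear predictor
        mu = expit(eta)         # inverse logit link: E[Y | x]
        w = mu * (1 - mu)       # IRLS weights (the Bernoulli variance)
        z = eta + (y - mu) / w  # working response
        # Weighted least squares step: solve (X'WX) beta = X'Wz.
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, X.T @ (w * z))
    return beta

# Hypothetical data to exercise the routine.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = rng.binomial(1, expit(X @ np.array([-0.5, 1.0])))
print(irls_logistic(X, y))  # estimates near the true values (-0.5, 1.0)
```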
Their choice of MLE was simply one of computing ease. Remember that the early 1970s were a time of sartorial splendor, stagflation, and ABBA — not of cheap computing power. However, even though MLE was chosen for ease, these estimates have some helpful properties. As such, this is still the most widely used method for fitting GLMs, just as OLS has been the preferred method for fitting CLMs for many decades.
Exponential Class of Distributions
The one and only requirement on the distribution is that it belongs to the exponential class of distributions (Nelder and Wedderburn 1972; Wood 2006). Many of the distributions we encounter belong to this class, so it is not an issue. Examples of distributions in this class are
- Beta
- Chi-squared
- Exponential
- Gamma
- Gaussian (Normal)
- Geometric
- Poisson
- standard Uniform
Specifically, to be a member of this family, the probability density function (or probability mass function, if discrete) must be expressible in the following form:
\begin{equation} \label{eq:glms-expfamily}
f(y) = \exp \left[ \frac{y \theta - b(\theta)}{a(\phi)} + c(y,\phi) \right]
\end{equation}
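To see a concrete instance of this form, consider the Bernoulli distribution with success probability \(p\). Its probability mass function can be rewritten as

\begin{align*}
f(y) &= p^y (1-p)^{1-y} \\
&= \exp \left[ y \log p + (1-y) \log(1-p) \right] \\
&= \exp \left[ y \log \left( \frac{p}{1-p} \right) + \log(1-p) \right]
\end{align*}

which matches the exponential-family form with \(\theta = \log \left[ \frac{p}{1-p} \right]\), \(b(\theta) = \log(1 + e^\theta) = -\log(1-p)\), \(a(\phi) = 1\), and \(c(y,\phi) = 0\).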
Let's look at a few features of this form to better understand what each of the parts indicates.
The Mean
The expected value of the distribution is just
\begin{equation}
E[Y] = b^\prime(\theta)
\end{equation}
Recall that the expected value is important, as it is what we actually model in GLMs.
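Continuing the Bernoulli example from above, \(b(\theta) = \log(1 + e^\theta)\), so

\begin{equation*}
E[Y] = b^\prime(\theta) = \frac{e^\theta}{1 + e^\theta} = p
\end{equation*}

which is indeed the Bernoulli mean.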
The Variance
The variance is
\begin{equation}
V[Y] = b^{\prime\prime}(\theta) \cdot a(\phi)
\end{equation}
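For the Bernoulli example, \(b^{\prime\prime}(\theta) = \frac{e^\theta}{(1 + e^\theta)^2} = p(1-p)\) and \(a(\phi) = 1\), so \(V[Y] = p(1-p)\), the familiar Bernoulli variance.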
The \(a(\phi)\) is called the "dispersion parameter." Infrequently, the chosen distribution forces it to be a specific value; usually, however, it is free to be estimated from the data. For those distributions that force it to a specific number (the Binomial and Poisson), we either need to use quasi-likelihood to fit the model or we need to test this assumption.
Canonical Link
Next, the \(\theta\) determines the canonical link function: the canonical link is the function \(g(\cdot)\) that expresses \(\theta\) in terms of the mean of the chosen distribution, \(g(\mu) = \theta\). In the Gaussian case, the canonical link is the identity function, \(\mu = \eta\). In the Bernoulli (and Binomial when \(n\) is known) case, the canonical link is the logit function, \(\mathrm{logit}(\mu) = \eta\), where the logit function is defined as
\begin{equation}
\mathrm{logit}(\mu) \stackrel{\text{def}}{=} \log \left[ \frac{\mu}{1-\mu} \right]
\end{equation}
Nuisance Parameters
Finally, \(c(y,\phi)\) is a term that adds flexibility to the exponential family of distributions. Without the \(c(\cdot)\) function, far fewer distributions would belong to this family. Further, note that the \(c(\cdot)\) function affects neither the expected value nor the variance. For all intents and purposes, the \(c(\cdot)\) function is ignored and treated as a nuisance parameter.
However, it would be interesting to see what \(c(\cdot)\) can tell us about the distribution.
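For example, writing the Gaussian density with mean \(\mu\) and dispersion \(\phi = \sigma^2\) in the exponential-family form gives \(\theta = \mu\), \(b(\theta) = \theta^2 / 2\), and

\begin{equation*}
c(y, \phi) = -\frac{y^2}{2\phi} - \frac{1}{2} \log(2 \pi \phi)
\end{equation*}

so the \(c(\cdot)\) term carries everything in the density that does not involve \(\theta\).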


