15.6: Modeling with Other Links
This section emphasizes the importance of exploring alternative link functions when modeling binary data, rather than defaulting to the canonical logit link without consideration. Here, I argue that while the logit link is sufficient for basic model fitting, a deeper understanding of the data-generating process can be gained by comparing results across different links. The section introduces two asymmetric link functions, the complementary log-log (a.k.a. cloglog) and the log-log, and explains how their distinct shapes (one rising steeply to its maximum, the other rising steeply from its minimum) can capture different types of underlying probability processes.
By the end of this section, you will be able to:
- Explain the rationale for using multiple link functions (beyond the canonical logit) when modeling binary data, noting that consistent conclusions across different links strengthen confidence in the model, while discrepancies may indicate model misspecification.
- Fit logistic regression models in R using asymmetric link functions, specifically the complementary log-log link (family=binomial(link="cloglog")) and the log-log link (family=binomial(link=make.link("loglog"))).
- Recognize that coefficient estimates from models with different link functions are not directly comparable because they are on different scales, and that model comparisons should instead focus on the stability of predictions and overall conclusions about the direction and significance of effects.
✦•················• ✦ •··················•✦
The logistic (or "logit") regression we did above is quite sufficient if all you want to do is fit the data. If, on the other hand, you want to better understand the process that gave you the data, you will want to try different link functions to determine whether any of the alternative links does an appreciably better job of fitting your data. The logit link is symmetric. You should also use the probit link, a second symmetric link, as a check on your model: If the results are comparable, then the conclusions are strengthened; if not, the model may be misspecified.
In addition to using a second symmetric link function, you should use the two main asymmetric link functions: the complementary log-log and the log-log.
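As a rough sketch of this kind of cross-link check, the code below fits the same model under each link that binomial() accepts by name in base R and lines up the slope, the residual deviance, and one prediction. The data frame is simulated here purely for illustration (it is a hypothetical stand-in, not the coin data from the text), and the log-log link is left out because base R does not supply it by name.

```r
# Simulated stand-in for the coin data (hypothetical; NOT the data set from the text)
set.seed(42)
coin <- data.frame(trial = 1:200)
coin$head <- rbinom(200, size = 1, prob = plogis(-2 + 0.02 * coin$trial))

# Links that binomial() accepts by name in base R
links <- c("logit", "probit", "cauchit", "cloglog")

# Fit the same linear predictor under each link
fits <- lapply(links, function(lk) {
  glm(head ~ trial, family = binomial(link = lk), data = coin)
})
names(fits) <- links

# Coefficients are on different scales, so compare direction, fit, and predictions
comparison <- data.frame(
  link     = links,
  slope    = sapply(fits, function(f) coef(f)[["trial"]]),
  deviance = sapply(fits, deviance),
  p_trial1 = sapply(fits, function(f) {
    as.numeric(predict(f, newdata = data.frame(trial = 1), type = "response"))
  })
)
print(comparison)
```

If the sign of the slope and the predicted probabilities broadly agree across the rows, the conclusions do not hinge on the choice of link.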
The Complementary Log-Log Link (cloglog)
As mentioned earlier, there are several other available link functions beyond the logit link (see the earlier table of link functions). Actually, for binary response variables, all that is required of the link function is for it to be increasing, to smoothly map \(g \colon (0, 1) \to \mathbb{R}\), and to have an inverse that smoothly maps \(g^{-1} \colon \mathbb{R} \to (0, 1)\). As mentioned earlier, the logit link is symmetric. If you are dealing with rare-events data, you may not want to use a symmetric link function. The complementary log-log link is asymmetric and is often useful (see Figure \(\PageIndex{1}\) below).
The formula for the complementary log-log is
\[ g(\pi) \stackrel{\text{def}}{=} \log\Big(-\log(1-\pi)\Big) \]
Its inverse is
\[ g^{-1}(\eta) = 1 - \exp\Big( -\exp(\eta) \Big) \]
The plot of the complementary log-log function is shown in Figure \(\PageIndex{1}\), overlaid with the same plot for the logit link. Note the difference in shapes. Recall that the logit link is symmetric. The complementary log-log is not; it approaches its maximum value more steeply than the logit.
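The two formulas and the asymmetry can be checked numerically. The sketch below writes the link and its inverse as plain R functions (the names cloglog and cloglog_inv are chosen here for illustration) and uses the fact that a symmetric link satisfies \(g^{-1}(-\eta) + g^{-1}(\eta) = 1\).

```r
# Complementary log-log link and its inverse, written out from the formulas
cloglog     <- function(p)   log(-log(1 - p))
cloglog_inv <- function(eta) 1 - exp(-exp(eta))

# The inverse undoes the link
p <- c(0.01, 0.1, 0.5, 0.9, 0.99)
print(all.equal(cloglog_inv(cloglog(p)), p))

# The hand-written link agrees with the one built into base R
print(all.equal(make.link("cloglog")$linkfun(p), cloglog(p)))

# Symmetry check: for a symmetric link these sums are exactly 1
eta <- c(0.5, 1, 2)
print(plogis(-eta) + plogis(eta))            # logit: all equal to 1
print(cloglog_inv(-eta) + cloglog_inv(eta))  # cloglog: not equal to 1
```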
Because of this asymmetry, it will fit models differently. Let us fit the coin data with a complementary log-log link. The command is
glm(head~trial, family=binomial(link="cloglog"), data=coin)
Note that the only change is in the link clause. The results of this new model are provided in the table below. Note that the direction of effect is the same in both models. Unfortunately, as the first model is in logit units and the second model is in complementary log-log units, comparing the magnitude of the coefficients tells us nothing. Comparing predictions tells us much more.
Using the logit model, the prediction for \(\pi_1\) was 0.095. Using the complementary log-log model, the prediction is \(\pi_1 = 0.122\), which is closer to the true value of \(\pi_1=0.150\).
| | Estimate | Std. Error | z value | \(\Pr(> \lvert z \rvert)\) |
|---|---|---|---|---|
| Constant term | -2.0651 | 0.4353 | -4.74 | \(\ll 0.0001\) |
| Trial number | 0.0244 | 0.0063 | 3.86 | 0.0001 |
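The prediction of 0.122 can be recovered by hand from the coefficients in the table and the inverse link \(g^{-1}(\eta) = 1 - \exp(-\exp(\eta))\). The short calculation below assumes that \(\pi_1\) refers to trial number 1; that assumption is not stated in this section, but it reproduces the reported value.

```r
# Coefficients from the complementary log-log table
b0 <- -2.0651   # constant term
b1 <-  0.0244   # trial number

# Linear predictor, ASSUMING pi_1 corresponds to trial = 1
eta1 <- b0 + b1 * 1

# Inverse complementary log-log link
pi1 <- 1 - exp(-exp(eta1))
print(round(pi1, 3))   # 0.122
```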
The Log-Log Link
A second useful asymmetric link function is the log-log link (see Figure \(\PageIndex{2}\) below). Note that the asymmetric log-log link rises to its maximum much more slowly than either the symmetric logit link or the asymmetric complementary log-log link. Because of this functional shape, it will fit certain data sets better than the other link functions discussed.
In reality, there is a functional relationship between the complementary log-log and the log-log link functions: they are \(180^{\circ}\) rotations of each other. Thus, statistical programs either support neither or support only one. Like most statistics packages, R has native support for only one of the two. For R, it is the complementary log-log link.
This is actually a decision of history. From how I (and most) have presented the binary dependent variable models, it seems as though we statisticians started with the logit. The first use of this type of regression, however, used the complementary log-log function (Fisher, 1922). It was not pretty, but it was a fantastic step in the right direction!
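The command to perform the log-log regression on this data is the same as before, except for the link parameter. Because the make.link function in base R does not recognize "loglog" on its own, this assumes that a log-log link has already been defined in your session. The link parameter is now
link=make.link("loglog")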
With this, I leave it as an exercise for you to show that the effect of trials in the log-log model is 0.0233 and that the predicted probability of a head for Coin 1 using this model is 0.0498.
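Since base R's make.link() does not recognize "loglog" by name, one way to make such a link available is to build a "link-glm" object by hand. The sketch below is only one possible construction, not necessarily the one used for the results above: it assumes the increasing form \(g(\pi) = -\log(-\log(\pi))\), the function name loglog_link is hypothetical, and the data are simulated rather than the coin data.

```r
# Hand-built log-log link (hypothetical helper; assumes g(pi) = -log(-log(pi)))
loglog_link <- function() {
  eps <- .Machine$double.eps
  structure(
    list(
      linkfun  = function(mu)  -log(-log(mu)),
      # inverse link, clamped away from 0 and 1 for numerical safety
      linkinv  = function(eta) pmax(pmin(exp(-exp(-eta)), 1 - eps), eps),
      # derivative of the inverse link with respect to eta
      mu.eta   = function(eta) pmax(exp(-eta - exp(-eta)), eps),
      valideta = function(eta) TRUE,
      name     = "loglog"
    ),
    class = "link-glm"
  )
}

# Simulated stand-in data (NOT the coin data from the text)
set.seed(1)
sim <- data.frame(trial = 1:200)
sim$head <- rbinom(200, size = 1, prob = exp(-exp(-(-1 + 0.02 * sim$trial))))

# Same call as before; only the link changes
fit_ll <- glm(head ~ trial, family = binomial(link = loglog_link()), data = sim)
print(coef(fit_ll))
print(predict(fit_ll, newdata = data.frame(trial = 1), type = "response"))
```

Passing the object directly to binomial() avoids any dependence on which names make.link() happens to know.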
So, which link should I use?
Now that you know about these five link functions, which should you use? An answer other than "All of them!" requires subject matter knowledge that scientists rarely have. However, here are some comments on the five links. Take them with a grain of salt.
The logit link is the canonical link function for the Bernoulli distribution. It is symmetric and has relatively thin tails. Because it is the canonical link, its mathematical properties are attractive, and its coefficients can be interpreted in terms of changes in the log-odds of the outcome. The logit link is the default choice in many fields, particularly the social sciences, and is most appropriate when the outcomes are balanced (neither extremely rare nor extremely frequent).
The probit link is also symmetric and is based on the inverse of the standard normal cumulative distribution function (a.k.a. the quantile function). Because the Normal distribution is so familiar, the probit link is frequently used in biostatistics and fields with deep roots in the Normal distribution. In practice, predictions made with the probit link are usually very close to those made with the logit link. The coefficient estimates will typically differ by a roughly constant factor of approximately 1.67, with the logit coefficients being the larger in magnitude, and the levels of statistical significance will generally be similar.
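A quick way to see both points (near-identical predictions, rescaled coefficients) is to fit the two symmetric links to the same data. The sketch below uses simulated data, so the exact ratio it prints is specific to this made-up sample and should only be read as being in the general neighborhood of the factor quoted above.

```r
# Simulated data for illustration only
set.seed(123)
x <- rnorm(2000)
y <- rbinom(2000, size = 1, prob = plogis(-0.5 + 1 * x))

fit_logit  <- glm(y ~ x, family = binomial(link = "logit"))
fit_probit <- glm(y ~ x, family = binomial(link = "probit"))

# Ratio of logit to probit slope estimates
slope_ratio <- coef(fit_logit)[["x"]] / coef(fit_probit)[["x"]]
print(slope_ratio)

# Largest disagreement between the two sets of fitted probabilities
max_diff <- max(abs(fitted(fit_logit) - fitted(fit_probit)))
print(max_diff)
```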
The cauchit link is a symmetric link based on the Cauchy distribution, which has much heavier tails than the logistic or Normal distributions. This means the cauchit link should be considered when the underlying probability process may involve extreme values or when the transition from failure to success is very abrupt. Because of its heavy tails, the cauchit link can produce model fits that are noticeably different from the others, and it can be a useful diagnostic tool for checking the robustness of your conclusions.
The complementary log-log link is an asymmetric link. It is the quantile function (the inverse of the cumulative distribution function) of the Gumbel (minimum extreme-value) distribution. This link function is characterized by rising very slowly from zero for low values of the linear predictor, but then approaching one very sharply. It is most appropriate when the probability of a success is asymmetric in this direction, meaning that the event of interest is initially very unlikely but, once conditions become favorable, its probability increases rapidly.
The log-log link is also asymmetric and is essentially the mirror image of the complementary log-log link. Fitting a model with a cloglog link to the probability of a failure (\(1-\pi\)) is equivalent to fitting a log-log link to the probability of a success (\(\pi\)). This means the log-log link is appropriate when the probability of success is asymmetric in the opposite direction: it increases very sharply from zero at first, but then approaches one very slowly as the linear predictor increases.
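This equivalence gives a way to obtain log-log predictions using only the cloglog link that base R supplies. The sketch below uses simulated data and assumes the increasing form of the log-log link, \(g(\pi) = -\log(-\log(\pi))\); under that convention the log-log coefficients are the cloglog-on-failure coefficients with their signs reversed.

```r
# Simulated data for illustration only (generated from a log-log model)
set.seed(7)
x <- seq(-2, 2, length.out = 300)
y <- rbinom(300, size = 1, prob = exp(-exp(-(0.3 + 1.2 * x))))

# Model the probability of a FAILURE with the built-in cloglog link
fit_fail <- glm(I(1 - y) ~ x, family = binomial(link = "cloglog"))

# Predicted probability of a SUCCESS is the complement of the fitted values
p_success <- unname(1 - fitted(fit_fail))

# Implied log-log coefficients: same magnitudes, opposite signs
b_loglog <- unname(-coef(fit_fail))
eta_ll   <- b_loglog[1] + b_loglog[2] * x

# Inverse log-log link applied to the sign-flipped predictor gives the same predictions
print(all.equal(p_success, exp(-exp(-eta_ll))))
```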


