15.4: Modeling with the Logit
For a binary response variable, the canonical link function is the logit link. This link is characterized by being symmetric and having relatively thin tails (see the earlier figure on symmetric link functions). The symmetry may be important when you are dealing with balanced events, those that are neither rare nor frequent. The tail thickness may be important when you think there is a sharp transition between success and failure in your data. In reality, current social science theory is rarely clear enough to tell you which link function to use. As such, try several and see which one gives the best fit.
There is another reason to try several link functions. Since the "population" link function is not known, the predictions of the model should be robust to the choice of link function: Test several and see if the predictions are stable. If not, then the quality of your model depends heavily on something you cannot measure.
Of course, if there is a traditional link function in your field, you should use it as the default. Thus, social scientists should start with the logit, while health science researchers should start with the probit.
Since binary dependent variable regression is so important to understand, let us look at it from a different direction:
Let us imagine an experiment where we have a series of 100 coins. Were these coins all fair, then the probability of getting a Head on any throw would be \(\pi=\frac{1}{2}\). However, let us assume these coins are not necessarily fair, and that they are weighted in a very specific manner: Coin \(i\) has a probability of flipping a Head of \(\pi_i\), which increases as \(i\) increases. That is,
\[ \pi_{i+1} > \pi_{i}, \quad \forall i \]

Now, if we were allowed to flip each coin only once, how can we estimate \(\pi_1\) from the data?
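Before working through the solution, it may help to see how such data could arise. The following R sketch simulates the experiment; the intercept \(-2.3\) and slope \(0.035\) are illustrative assumptions, chosen only so that \(\pi_i\) increases with \(i\). The names trial, head, and coin match those used in the analysis below.

# Simulate 100 weighted coins, each flipped exactly once
set.seed(42)                              # for reproducibility
trial = 1:100                             # coin number i
pi_i = plogis(-2.3 + 0.035*trial)         # assumed head probabilities, increasing in i
head = rbinom(100, size=1, prob=pi_i)     # one flip per coin (1 = Head)
coin = data.frame(trial, head)            # plays the role of the coinflips data file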
Solution.
As we have no evidence to the contrary, let us use the canonical link function, the logit. Our steps are quite similar to the steps we performed when we had to transform the dependent variable:
- Read in the data
- Model the dependent variable using the GLM paradigm (specify the distribution, the linear estimator, and the link function)
- Predict outcomes using your model
- Back-transform the predictions using the inverse of your chosen link function
One step is missing compared to when we previously transformed our dependent variable: We do not have to transform the dependent variable ourselves. Generalized linear modeling does that for us in R. We do, however, have to back-transform the predictions. Be aware of this! The sketch below shows the whole pipeline at a glance.
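Here is a minimal sketch of the four steps in R; the file name coinflips.csv is an assumption about how the data are stored. Each step is discussed in detail below.

# Step 1: read in the data (file name is an assumption)
coin = read.csv("coinflips.csv")
# Step 2: specify the distribution (binomial), the linear estimator (head ~ trial),
#         and the link function (logit)
m1 = glm(head ~ trial, family=binomial(link="logit"), data=coin)
# Step 3: predict outcomes; these are on the logit scale
preds = predict(m1)
# Step 4: back-transform with the inverse link (the logistic)
probs = plogis(preds)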
In R, the general form of the command, showing the most important parameters, is
glm(formula, family(link), data)
Only formula is required. If family is missing, the Gaussian (Normal) distribution is assumed. If link is missing, the canonical link for that family is used. If data is missing, R searches for the variables in the current environment, such as in an attached data frame.
For binary response variables, the family will need to be the Binomial distribution. Thus, for the example using the coinflips data file, the command will be
m1 = glm(head~trial,
family=binomial(link="logit"),
data=coin)
I used the data parameter because I did not attach the data earlier; if you have attached the data, you do not need to include it. I also included link="logit", even though this is the default for the binomial family, to remind myself which link function I used in this analysis.
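In R, the table of estimates, standard errors, and test statistics is printed with summary():

summary(m1)   # displays the coefficient table for the fitted model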
|               | Estimate | Std. Err | z-value | p-value   |
|---------------|----------|----------|---------|-----------|
| Constant Term | -2.2929  | 0.5384   | -4.26   | << 0.0001 |
| Trial Number  | 0.0345   | 0.0087   | 3.97    | 0.0001    |
The results from this command are summarized in the table above. Again, note that the parameter estimates (and predictions) will be in "logit units." You will have to use the logistic function (the inverse of the logit) to get the predictions in units of probability. Recall that the interpretation here would be

"For every increase of 1 in the coin number, the odds of a head increase by about 3.5 percent (since \(e^{0.0345} \approx 1.035\))."
Recall that the original question asked us to determine \(\pi_1\), the probability of getting a Head on the first coin. There are a couple of ways of doing that; the best will depend on the numbers involved. Since we want \(\pi_1\), we know it is equal to the logistic of the intercept plus one times the coefficient:
\begin{align}
\pi_x &= \text{logistic}\big(-2.2929 + 0.0345\, x \big) \\[1em]
\pi_1 &= \text{logistic}\big(-2.2929 + 0.0345\, (1) \big) \\[1em]
&= 0.0946
\end{align}
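In R, the logistic function is available as plogis(), so this back-transformation is a one-liner:

plogis(-2.2929 + 0.0345 * 1)   # = 0.0946, our estimate of pi_1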
The other way is to use the predict function and take the logistic of the value it returns. You will get the same answer (within rounding error). The function call is
predict(m1, newdata=data.frame(trial=1))
This gives an answer of \(-2.2584\). The logistic of \(-2.2584\) is our estimate: \(\pi_1 = 0.0946\).
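Alternatively, predict() can perform the back-transformation itself: the type="response" option returns predictions directly in units of probability.

predict(m1, newdata=data.frame(trial=1), type="response")   # = 0.0946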
If we so desire, we can also plot the probability curve on a graph of the outcomes (see Figure \(\PageIndex{1}\) above). With such a graph, we could estimate which coin is most fair and get a feel for how well the model represents the data.
The curve graphed in the figure above represents the linear predictor. Note, however, that the curve itself is not linear: it is the logistic of the linear predictor.
With that said, the curve is linear in the transform space. If you graph the coin number against the logit of the head probability, the line of best fit is, indeed, a line (see the earlier schematic figure on latent variable modeling).
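A graph along the lines of Figure \(\PageIndex{1}\) can be drawn as follows; this is a sketch that assumes the coin data frame and the fitted model m1 from above.

# Plot the raw outcomes (0 or 1) against the coin number
plot(coin$trial, coin$head, xlab="Coin number", ylab="Pr[Head]")
# Overlay the fitted probability curve: the logistic of the linear predictor
curve(plogis(coef(m1)[1] + coef(m1)[2]*x), add=TRUE)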