14.7: End-of-Chapter Materials
R Functions
In this chapter, we were introduced to several R functions that will be useful in the future. These are listed here.
Packages
There were no packages used in this chapter.
Statistics
lm(formula)
This function performs linear regression on the data, using ordinary least squares to estimate the parameters. Because the returned object contains a great deal of information, you will want to save the results in a variable.
glm(formula)
This function performs generalized linear model estimation on the given formula using maximum likelihood estimation. There are three additional parameters that can (and often should) be specified.
- The family parameter specifies the distributional family of the dependent variable. Options include gaussian (this chapter), binomial, quasibinomial, poisson, quasipoisson, and Gamma. If this parameter is not specified, R defaults to gaussian.
- The link parameter specifies the link function for the distribution. In R it is given as an argument to the family function, as in binomial(link = "logit"). If none is specified, the canonical link is assumed.
- Finally, the data parameter specifies the data from which the formula variables come. This is the same parameter as in the lm function. This is important if we are working with several data sets and need to specify which set of variables is to be used.
predict(model, newdata)
As with almost all statistical packages, R has a predict function. It takes two parameters: the model and a data frame of the independent-variable values from which you want to predict. If you omit newdata, it predicts from the independent variables in the original data, which can be used to calculate residuals. The data frame must list all independent variables with their associated new values. You can specify multiple new values for a single independent variable.
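The three functions above can be tried together in a short sketch. The data here are simulated, and the names dat, fit_ols, fit_glm, and new_x are illustrative assumptions rather than objects from the chapter. The sketch fits the same model with lm and with glm (Gaussian family, identity link) and then predicts with and without newdata.

```r
# Simulated data; every name and value here is an illustrative assumption.
set.seed(1)
dat <- data.frame(x = 1:20)
dat$y <- 2 + 0.5 * dat$x + rnorm(20, sd = 1)

# Ordinary least squares; save the fit so its contents can be inspected later.
fit_ols <- lm(y ~ x, data = dat)

# The same model as a GLM: Gaussian family with its canonical (identity) link.
fit_glm <- glm(y ~ x, family = gaussian(link = "identity"), data = dat)

# The two sets of coefficient estimates agree up to numerical tolerance.
same_coef <- all.equal(coef(fit_ols), coef(fit_glm))

# Predict at new values of the independent variable.
new_x <- data.frame(x = c(21, 25))
pred_new <- predict(fit_glm, newdata = new_x)

# Omitting newdata returns fitted values for the original data,
# from which the residuals follow.
res <- dat$y - predict(fit_glm)
```

Calling summary(fit_glm) on the saved object shows the estimates, standard errors, and deviance in one place, which is the main reason for storing the fit in a variable.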
Exercises
This section offers suggestions on things you can practice from just the information in this chapter. As the purpose of this chapter was to introduce Generalized Linear Models and emphasize that everything we have done thus far can be done with GLMs, all of the extension questions are from previous chapters. For each of these, use the Generalized Linear Model paradigm (and the glm function).
Summary
- What are the three aspects of your model that must be known before using generalized linear models?
- When doing ordinary least squares regression, what were these three aspects?
- How does the canonical link function differ from a link function?
- What is \(a(\phi)\) for the Gaussian distribution?
Data
- Now, note that the value for the Reka stat is 46% weekly church attendance. If, in the year 2012, the voters of Reka were faced with a ballot measure limiting the number of cows in the city limits, but not restricting chickens, what is the probability that it will pass?
- Calculate a 95% confidence interval, with the transformed Cow Vote model, for predicting Děčín's vote. Is the actual outcome within the 95% confidence interval?
- The logit transformation is not the only possible choice as a link for proportion data; there is also the asymmetric "complementary log-log" transformation (cloglog). Use this function as the link function to predict Děčín's vote, its 95% confidence interval, and the probability of the Cow ballot measure passing. The inverse of the complementary log-log transform has no name, but the R function is cloglog.inv.
- Estimate the GDP per capita for Papua New Guinea. For this problem, use the untransformed model. Also, calculate a 95% confidence interval for this estimate. How close is this estimate to the real answer, and is the real answer within the predicted confidence interval?
- Estimate the GDP per capita for Papua New Guinea. For this problem, use the transformed model. Also, calculate a 95% confidence interval for this estimate. How close is this estimate to the real answer, and is the real answer within the predicted confidence interval?
- Compare and contrast the results of your two Papua New Guinea estimates above. Which model works best for Papua New Guinea? Which model works best overall?
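As a starting point for the complementary log-log exercise, the mechanics can be sketched on made-up data. Everything below is an assumption for illustration: the data frame votes, its columns, and the helper inv_cloglog are hypothetical and are not the chapter's Cow Vote data or its cloglog.inv function. The inverse link is written out by hand so the sketch does not depend on where cloglog.inv is defined, and the interval is a Wald-type interval built on the link scale and then back-transformed, which is one common approach rather than the only one.

```r
# Hypothetical proportion data; names and values are illustrative only.
set.seed(2)
votes <- data.frame(attend = seq(0.1, 0.9, by = 0.1), n = 200)
eta_true <- -1.5 + 2 * votes$attend
votes$yes <- rbinom(nrow(votes), size = votes$n,
                    prob = 1 - exp(-exp(eta_true)))

# Binomial GLM with the complementary log-log link.
fit_cll <- glm(cbind(yes, n - yes) ~ attend,
               family = binomial(link = "cloglog"), data = votes)

# Inverse of the complementary log-log transform (hypothetical helper name).
inv_cloglog <- function(eta) 1 - exp(-exp(eta))

# Prediction on the link scale (the default for predict on a glm fit),
# together with its standard error.
new_obs <- data.frame(attend = 0.5)
pr <- predict(fit_cll, newdata = new_obs, se.fit = TRUE)

# Wald-type 95% interval on the link scale, then back-transformed.
ci_link <- pr$fit + c(-1, 1) * qnorm(0.975) * pr$se.fit
est <- inv_cloglog(pr$fit)
ci <- inv_cloglog(ci_link)
```

Because the inverse link is increasing, back-transforming the endpoints preserves their order, so ci brackets est on the proportion scale.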
Applied Readings
- Denise Gammonley, Ning Jackie Zhang, Kathryn Frahm, and Seung Chun Paek (2009). "Social Service Staffing in U.S. Nursing Homes." Social Service Review. 83(4): 633–50.
- Katarina A. McDonnell and Neil J. Holbrook (2004). "A Poisson Regression Model of Tropical Cyclogenesis for the Australian–Southwest Pacific Ocean Region." Weather & Forecasting. 19(2): 440–455.
- Michael A. Neblo (2009). "Meaning and Measurement: Reorienting the Race Politics Debate." Political Research Quarterly. 62(3): 474–84.
- Weiren Wang and Felix Famoye (1997). "Modeling Household Fertility Decisions with Generalized Poisson Regression." Journal of Population Economics. 10(3): 273–83.
Theory Readings
- Hirotugu Akaike (1974). "A New Look at the Statistical Model Identification." IEEE Transactions on Automatic Control. 19(6): 716–23.
- Hirotugu Akaike (1977). "On Entropy Maximization Principle." In: P. R. Krishnaiah (Editor). Applications of Statistics: Proceedings of the Symposium Held at Wright State University, Dayton, Ohio, 14-18 June 1976. New York: North Holland Publishing, 27–41.
- George Casella and Roger L. Berger (2002). Statistical Inference, Second edition. New York: Duxbury.
- Carl F. Gauss (1809). "Theoria motus corporum coelestium." Werke, 7, Göttingen: K. Gesellschaft Wissenschaft.
- Peter J. Huber (1967). "The Behavior of Maximum Likelihood Estimates under Nonstandard Conditions." Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. 221–233.
- Pierre-Simon Laplace (1812). "Théorie analytique des probabilités." Paris.
- Peter McCullagh and John A. Nelder (1989). Generalized Linear Models. London: Chapman and Hall.
- John A. Nelder and Robert W. Wedderburn (1972). "Generalized Linear Models." Journal of the Royal Statistical Society Series A (General). 135(3): 370–84.
- Samuel S. Wilks (1938). "The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses." The Annals of Mathematical Statistics. 9(1): 60–62.
- Simon N. Wood (2006). Generalized Additive Models: An introduction with R. New York: Chapman & Hall.
- Halbert White (1980). "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." Econometrica. 48(4): 817–838.


