15.9: End-of-Chapter Material
R Functions
In this chapter, we were introduced to several R functions that will be useful in the future. These are listed here.
Packages
KnoxStats
This package adds much general functionality to R, especially in terms of accuracy measures.
Epi
This package adds several functions and procedures related to epidemiology. Because it is not part of the base R installation, you will need to install it before you can load it with library(Epi). It is the support package for the book Epidemiology with R.
Statistics
lm(formula)
This function performs linear regression on the data using the supplied formula. Because the result contains a great deal of information, you will want to save it in a variable.
glm(formula)
This function fits a generalized linear model to the given formula. Three additional parameters can (and often should) be specified.
- The family parameter specifies the distributional family of the dependent variable; options include gaussian, binomial, poisson, Gamma, quasibinomial, and quasipoisson. If this parameter is not specified, R assumes gaussian. For this chapter, we relied on the binomial family.
- The link argument specifies the link function for the distribution; it is given inside the family function, as in family = binomial(link = "logit"). If none is specified, the canonical link is used. The canonical link for the Binomial distribution is the logit function.
- Finally, the data parameter specifies the data from which the formula variables come. This is the same parameter as in the lm function.
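As a sketch of the call, here is a binomial fit on made-up data (the data frame and variable names are hypothetical, not from the chapter):

```r
# Hypothetical data: number of heads out of 20 flips for five coins
flips <- data.frame(
  heads  = c(8, 12, 9, 11, 10),
  trials = rep(20, 5),
  weight = c(1.0, 1.2, 0.9, 1.1, 1.0)
)

# Binomial family with the canonical (logit) link; a two-column
# cbind(successes, failures) response is the standard form for grouped data
mod <- glm(cbind(heads, trials - heads) ~ weight,
           family = binomial(link = "logit"),
           data   = flips)

summary(mod)  # coefficients, null and residual deviance, AIC
```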
predict(model, newdata)
As with almost all statistical packages, R has a predict function. It takes two parameters: the model and a dataframe of the independent-variable values from which you want to predict. If you omit newdata, it predicts from the data used to fit the model, which can be used to calculate residuals. The dataframe must list all independent variables with their associated new values. You can specify multiple new values for a single independent variable.
AIC(model)
This function calculates the Akaike Information Criterion score for the provided model. The model needs to have been fit using maximum likelihood estimation.
BIC(model)
This function calculates Schwarz's Bayesian Information Criterion (BIC) for the provided model. The model needs to have been fit using maximum likelihood estimation.
deviance(model)
This function returns the deviance of the model, a value that is useful in the likelihood ratio test. The model needs to have been fit using maximum likelihood estimation.
pchisq(x)
This function gives the value of the cumulative distribution function (CDF) of the Chi-squared distribution. The necessary parameter is the number of degrees of freedom, df=. By default, it returns the lower-tail probability. Usually we want the upper-tail probability, so we use the lower.tail=FALSE parameter.
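For instance, a likelihood ratio test comparing two nested models can be pieced together from deviance and pchisq. This is a sketch on simulated data, so the variable names are hypothetical:

```r
# Simulated data: the outcome depends on x but not on z
set.seed(42)
d <- data.frame(x = rnorm(50), z = rnorm(50))
d$y <- rbinom(50, 1, plogis(0.5 * d$x))

small <- glm(y ~ x,     family = binomial, data = d)
large <- glm(y ~ x + z, family = binomial, data = d)

# The drop in deviance is approximately Chi-squared, with degrees of
# freedom equal to the number of added parameters (here, one)
lrt <- deviance(small) - deviance(large)
pchisq(lrt, df = 1, lower.tail = FALSE)  # upper-tail p-value
```

A large p-value here would indicate that adding z does not significantly improve the fit.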
var.test(x,y)
This function performs an F test, which compares the variances of two samples (x and y) from Normal populations. It can only compare two samples. If you need to compare more than two samples for equality of variance, you will need to perform either a Bartlett test or a Fligner-Killeen test.
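A minimal sketch of the call, using simulated samples with deliberately different spreads:

```r
# Two samples from Normal populations with different standard deviations
set.seed(7)
x <- rnorm(30, mean = 0, sd = 1)
y <- rnorm(30, mean = 0, sd = 2)

res <- var.test(x, y)  # F test of equal variances
res$statistic          # ratio of the two sample variances
res$p.value            # small values suggest the variances differ
```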
Accuracy
accuracy(model)
This function, from the KnoxStats package, determines the predictive accuracy of a provided model. It takes four necessary parameters: data, truth, model, and threshold. It has an optional parameter, rate=FALSE, for returning the number of correct classifications rather than the rate. The raw count should not be used unless your sample is representative of the population in terms of positive and negative realities... which is never true if you try to balance your data.
Here is a list of other accuracy measures you may want to use. Make sure you know what each is measuring and what it means to use that particular measure of accuracy.
- Fscore: F-score
- F1score: F1 score (balanced F-score)
- MCC: Matthews correlation coefficient (the phi coefficient)
- phiCoef: Phi coefficient (mean square contingency coefficient)
- precision: Precision rate (positive predictive value)
- recall: Recall rate (sensitivity)
- fnr: False negative rate (miss rate)
- fpr: False positive rate (Type I error rate)
Graphics
ROC(formula)
This function, in the Epi package, performs ROC analysis on the data. It produces a ROC graph as well as some statistical values, including the AUC measure of accuracy.
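A minimal call might look like the following, assuming the Epi package is installed; the data here are simulated, so the names are hypothetical:

```r
library(Epi)

# Simulated binary outcome with one continuous predictor
set.seed(1)
d <- data.frame(x = rnorm(100))
d$y <- rbinom(100, 1, plogis(d$x))

# Draws the ROC curve and reports the AUC on the plot
ROC(form = y ~ x, data = d, plot = "ROC")
```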
Programming
for
This command is one of the basic control constructs in the R language (as in most programming languages). The usual form is for(var in seq) expr, where var is the looping variable (the variable that holds the current value of the loop). The parameter seq is a vector of values; usually seq is something like 1:100, which is the vector of values from 1 to 100. Finally, expr is the expression (or series of expressions) performed for each value in the seq vector.
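A small worked example of the construct:

```r
# Sum the squares of 1 through 10 with a for loop
total <- 0
for (i in 1:10) {
  total <- total + i^2  # i takes each value of the vector in turn
}
total  # 385
```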
Exercises
This section offers suggestions on things you can practice from this chapter.
- In Example \ref{ex:bdv-badols}, we suggested that you fit the provided pseudo data with linear regression and the OLS method. Please do so now.
- From Section \ref{sec:bdv-log-log}, please fit the coin data with the formula head ~ trial and the log-log link. What is the predicted probability of getting a Head on Coin 15?
- Use the coinflip data (coinflips.csv) to estimate which coin is closest to being fair (a probability of producing a head that is closest to 0.500). Use multiple link functions and select the one you think is best.
- Let us revisit the cows data. One of the variables is passed, which is a binary variable indicating whether the ballot measure passed. Your job is to predict the proportion of voters in Děčín who will vote in favor of the bill to limit cows. Do not use the pctFavor variable. Decide which model you are supposed to use. Prove that your model is the best model available. Make your prediction of the vote share. Include graphs if you would like, but only if the graph helps to illustrate your point.
Applied Readings
- Judi Bartfeld and Myoung Kim (2010). "Participation in the School Breakfast Program: New Evidence from the ECLS-K." Social Service Review. 84(4): 541–62.
- Regina P. Branton (2009). "The Importance of Race and Ethnicity in Congressional Primary Elections." Political Research Quarterly. 62(3): 459–73.
- Denise Gammonley, Ning Jackie Zhang, Kathryn Frahm, and Seung Chun Paek. (2009) "Social Service Staffing in U.S. Nursing Homes." Social Service Review. 83(4): 633–50.
- Michael A. Neblo (2009). "Meaning and Measurement: Reorienting the Race Politics Debate." Political Research Quarterly. 62(3): 474–84.
- Lenna Nepomnyaschy and Irwin Garfinkel (2011). "Fathers' Involvement with Their Nonresident Children and Material Hardship." Social Service Review. 85(1): 3–38.
- Joseph G. Pickard, Megumi Inoue, Letha A. Chadiha, and Sharon Johnson (2011). "The Relationship of Social Support to African American Caregivers' Help-Seeking for Emotional Problems." Social Service Review. 85(2): 247–66.
- Brian Kelleher Richter, Krislert Samphantharak, and Jeffrey F. Timmons (2009). "Lobbying and Taxes." American Journal of Political Science. 53(4): 893–909.
- Lori E. Ross, Rachel Epstein, Corrie Goldfinger, and Christina Yager (2009). "Policy and Practice regarding Adoption by Sexual and Gender Minority People in Ontario." Canadian Public Policy / Analyse de Politiques. 35(4): 451–67.
Theory Readings
- Hirotugu Akaike (1974). "A New Look at the Statistical Model Identification." IEEE Transactions on Automatic Control. 19(6): 716–723.
- Hirotugu Akaike (1977). "On Entropy Maximization Principle." In P. R. Krishnaiah (Editor). Applications of Statistics: Proceedings of the Symposium Held at Wright State University, Dayton, Ohio, 14-18 June 1976. New York: North Holland Publishing, 27–41.
- George Casella and Roger L. Berger (2002). Statistical Inference. Second edition. New York: Duxbury.
- Peter McCullagh and John A. Nelder (1989). Generalized Linear Models. London: Chapman and Hall.
- Gideon E. Schwarz (1978). "Estimating the Dimension of a Model." Annals of Statistics. 6(2): 461–64.
- Samuel S. Wilks (1938). "The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses." The Annals of Mathematical Statistics. 9(1): 60–62.


