7.5: End-of-Chapter Materials
Here are the expected materials to supplement the chapter.
R Functions
In this chapter, we were introduced to many R functions that will be useful in regression. In fact, this chapter uses more R functions than any other chapter in this book. Here they are.
Packages
car
This package provides several statistical tests used in the book An R Companion to Applied Regression by J. Fox and S. Weisberg. It is a great package that provides a lot of additional functionality for R.
lawstat
This package provides several statistical tests used in law and public policy analysis. It provides the basic runs.test function for us.
lmtest
This package provides many tests related to linear models. It provides an implementation of the Breusch-Pagan test, bptest, which tests for heteroskedasticity in the residuals.
KnoxStats
This package adds much general functionality to R. Specifically, it improves upon the runs test in the lawstat package.
Statistics
source(filename)
This function runs an R script from a separate file. That file may be local or on the Internet.
lm(formula)
This is the function that performs ordinary least squares estimation on linear models.
aov(formula)
This function also performs ordinary least squares estimation on linear models, presenting the results in analysis of variance form.
summary(x)
This produces the six-number summary or a frequency table of the provided variable, depending on the type of variable.
summary.lm(mod)
When applied to a linear model fit using either the aov function or the lm function, this provides estimates of the effects of the numeric variables and the levels of the categorical variables in the model.
summary.aov(mod)
When applied to a linear model fit using either the aov function or the lm function, this provides estimates of the statistical significance of the variables in the model.
shapiroTest(E)
This tests the null hypothesis that the variable E comes from a Normal distribution. It is based on the shapiro.test function in the base R installation. It adds the capability to test Normality in several groups.
fligner.test(formula)
This tests for heteroskedasticity when the independent variable is categorical.
bptest(mod)
This function from the lmtest package performs the Breusch-Pagan test for heteroskedasticity.
runs.test(E, order)
This alteration to the lawstat function tests whether the variable E, as ordered by order, exhibits fit issues.
vif(model)
This function calculates the variance inflation factor (VIF) for each of the independent variables in the model.
predict(mod)
This predicts the values of the dependent variable at each point in the dataset or for the values specified.
confint(mod)
This calculates confidence intervals for the parameters in ordinary least squares regression.
set.base(var,level)
This function redefines the base category of the variable var to be level. By default, the base category is the first level in alphabetical order.
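As a rough illustration of how several of these functions fit together, here is a minimal sketch that uses only base R. The data are simulated, and the names dat, x, group, and y are made up for this example; they do not come from the chapter. Functions that need extra packages (vif, bptest, and the KnoxStats versions of shapiroTest and runs.test) are left out so the sketch runs on a plain R installation.

```r
# Simulated data (hypothetical; not from the chapter)
set.seed(370)
n <- 60
x <- runif(n, min = 0, max = 10)
group <- factor(rep(c("a", "b", "c"), length.out = n))
y <- 2 + 0.5 * x + rnorm(n, mean = 0, sd = 1)
dat <- data.frame(y, x, group)

# Fit the same linear model two ways
mod  <- lm(y ~ x + group, data = dat)
mod2 <- aov(y ~ x + group, data = dat)

summary(mod)    # uses summary.lm: estimated effects
summary(mod2)   # uses summary.aov: significance of each variable

# Confidence intervals for the parameters (95% by default)
ci <- confint(mod)

# Prediction at a new point; the new values are made up
new_point <- data.frame(x = 5, group = factor("a", levels = levels(dat$group)))
pred <- predict(mod, newdata = new_point)

# Residual checks available in base R
dat$E <- residuals(mod)
shapiro.test(dat$E)                   # Normality of the residuals
fligner.test(E ~ group, data = dat)   # equal variance across the groups
```

Because the formula includes one numeric variable and one three-level factor, confint(mod) returns four rows: the intercept, the slope for x, and one row for each non-base level of group.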
Probability
set.seed(x)
This sets the random number seed. Doing so makes replication possible.
rexp(n, rate)
This generates \(n\) random values from an Exponential distribution with the specified rate parameter.
rnorm(n, mean, sd)
This generates \(n\) random values from a Normal distribution with specified mean and standard deviation. By default, the mean is 0 and the standard deviation is 1.
runif(n, min, max)
This generates \(n\) random values from a Uniform distribution with specified minimum and maximum values. By default, the minimum is 0 and the maximum is 1.
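A short sketch may help show what "makes replication possible" means in practice. The seed value and sample sizes below are arbitrary choices for illustration only.

```r
# Resetting the seed reproduces exactly the same draws
set.seed(42)
a <- rnorm(5)
set.seed(42)
b <- rnorm(5)
identical(a, b)   # TRUE

# Uniform draws stay between the stated minimum and maximum
u <- runif(1000, min = 2, max = 3)
range(u)

# Exponential draws are non-negative; their mean is close to 1/rate
w <- rexp(1000, rate = 4)
mean(w)
```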
Mathematics
head(x)
This returns the first six values in the variable.
foot(x)
This returns the last six values in the variable.
seq(from, to, by, length)
This returns a vector of sequential values, where by indicates the step size and length specifies the vector length. Only one of these two should be provided. If neither is provided, then by defaults to 1.
length(x)
This calculates the length of a vector (variable), which is the sample size, \(n\).
residuals(mod)
This calculates the residuals in the model, which are the differences between the observed and the predicted values.
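The following sketch illustrates the by and length arguments of seq, along with head, length, and residuals. The numbers are arbitrary, and the small model at the end is fit to made-up data purely to show that the residuals equal observed minus predicted.

```r
# Same sequence specified two ways: by step size or by vector length
s1 <- seq(from = 0, to = 1, by = 0.25)    # 0.00 0.25 0.50 0.75 1.00
s2 <- seq(from = 0, to = 1, length = 5)   # 'length' is matched to length.out

# With neither given, the step size is 1
x <- seq(from = 1, to = 20)
head(x)      # 1 2 3 4 5 6
length(x)    # 20

# Residuals are observed minus predicted (made-up data)
set.seed(7)
y <- 3 + 2 * x + rnorm(length(x))
mod <- lm(y ~ x)
all.equal(residuals(mod), y - predict(mod), check.attributes = FALSE)   # TRUE
```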
Graphics
qqnorm(x)
This creates a Normal quantile-quantile plot for the given values.
qqline(x)
This adds the diagonal line to the quantile-quantile plot.
plot(x,y)
This produces a scatter plot of the y-values against the x-values.
overlay(x)
This produces a histogram with a Normal curve overlaying it. Technically, there are several possible overlays, but the Normal curve is the default.
par(...)
This sets parameters for the next graphic started. Look through the help page for this function to see all you can specify.
plot.new()
Creates a blank, new plot.
plot.window(xlim, ylim)
Specifies the limits for the x- and y-axes.
axis(side)
When a plot is already drawn, this adds values along axis number side.
title(...)
When a plot is already drawn, this adds the x- and y-labels.
lines(x,y)
When a plot is already drawn, this draws lines between each subsequent (x, y) pair.
points(x,y)
When a plot is already drawn, this draws points at each (x, y) pair.
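The low-level functions above are meant to be used together, so a sketch of building one plot piece by piece may be useful. The data are simulated and the labels are placeholders. The call to pdf(NULL) is an assumption made so the sketch also runs where no display is available; remove it (and the matching dev.off()) to draw on screen.

```r
pdf(NULL)   # null graphics device; nothing is written to disk

# Made-up data for illustration
set.seed(1)
x <- seq(from = 0, to = 10, by = 0.5)
y <- 1 + 2 * x + rnorm(length(x))

par(mar = c(4, 4, 1, 1))                        # margins for the next graphic
plot.new()                                      # blank plot
plot.window(xlim = range(x), ylim = range(y))   # axis limits
axis(side = 1)                                  # x-axis (bottom)
axis(side = 2)                                  # y-axis (left)
title(xlab = "x", ylab = "y")                   # axis labels
points(x, y)                                    # the data
lines(x, 1 + 2 * x)                             # the line used to simulate the data

# Normal quantile-quantile plot of the simulated errors
e <- y - (1 + 2 * x)
qqnorm(e)
qqline(e)

dev.off()   # close the device
```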
Programming
library(package)
This loads an external package that you have already installed on your computer. It allows access to all functions and data sets in the package.
attach(dataframe)
This allows you to access the variables in the dataframe without having to prefix each with dataframe$.
as.character(x)
This changes the values in variable x to be characters.
as.numeric(x)
This changes the values in variable x to be numbers.
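A brief sketch of attach and the two conversion functions follows. The data frame dat and its variable height are hypothetical. The factor example shows a common pitfall: as.numeric applied directly to a factor returns the internal level codes, so converting through as.character first is the usual way to recover the printed numbers.

```r
# Converting between numbers and characters
x  <- c(3, 1, 2)
ch <- as.character(x)    # "3" "1" "2"
num <- as.numeric(ch)    # 3 1 2

# Factors: convert to character first to recover the labels as numbers
f <- factor(c("10", "20", "10"))
codes  <- as.numeric(f)                 # 1 2 1 (level codes)
values <- as.numeric(as.character(f))   # 10 20 10

# attach() lets us drop the dataframe$ prefix (hypothetical data frame)
dat <- data.frame(height = c(1.6, 1.8))
m1 <- mean(dat$height)
attach(dat)
m2 <- mean(height)
detach(dat)              # undo the attach when finished
```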
Exercises
- In the two panels in Figure 7.3.2, the lines of best fit do not go beyond the data. Why?
- Section 7.3 ended by stating that there was a really big problem with those results. Run the following code.
mean(outcome>1) + mean(outcome<0)
What value is given, what does it mean, and why does it imply there is something fundamentally wrong with the analysis?


