R Tutorial for ANOVA and Linear Regression

Last updated: Aug 17, 2020

ANOVA table

Let's say we have collected data, and our X values have been entered in R as a vector called data.X, and our Y values as a vector called data.Y. Now we want to produce the ANOVA table for the simple linear regression of Y on X. We can do this in two steps:

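First, we fit a linear model to the data:

> data.lm = lm(data.Y ~ data.X)

Next, we ask R to produce the ANOVA table for that model by typing:

> anova(data.lm)

The table lists the degrees of freedom (Df), sums of squares (Sum Sq), mean squares (Mean Sq), F value, and p-value (Pr(>F)) for the regression and for the residuals.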
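As a sketch of the two steps end to end, the block below uses hypothetical simulated data in place of your own data.X and data.Y (the values 1:20, the true line 3 + 2X, and the noise level are all assumptions for illustration). It also checks that the sums of squares in the table add up to the total sum of squares.

```r
# Hypothetical simulated data standing in for data.X and data.Y
set.seed(1)
data.X = 1:20
data.Y = 3 + 2 * data.X + rnorm(20, sd = 2)

# Fit the simple linear regression and build its ANOVA table
data.lm = lm(data.Y ~ data.X)
data.anova = anova(data.lm)
print(data.anova)

# Df column: 1 for the regression, n - 2 = 18 for the residuals
data.anova$Df

# SSR + SSE equals the total sum of squares SSTO
all.equal(sum(data.anova[["Sum Sq"]]), sum((data.Y - mean(data.Y))^2))
```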

Fitted Values

To obtain the fitted values of the model from our previous example, we type:

> data.fit = fitted(data.lm)

This gives us a vector called "data.fit" that contains the fitted values of data.lm.
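To make clear what fitted() returns, the sketch below recomputes each fitted value as $b_0 + b_1 X_i$ from the estimated coefficients. The simulated data.X and data.Y are hypothetical stand-ins for your own data, and `b` and `by.hand` are illustrative names.

```r
# Hypothetical simulated data and fit
set.seed(1)
data.X = 1:20
data.Y = 3 + 2 * data.X + rnorm(20, sd = 2)
data.lm = lm(data.Y ~ data.X)

data.fit = fitted(data.lm)

# Each fitted value is b0 + b1 * X_i using the estimated coefficients
b = coef(data.lm)
by.hand = b[1] + b[2] * data.X
all.equal(unname(data.fit), unname(by.hand))
```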

Residuals

Now we want to obtain the residuals of the model:

> data.res = resid(data.lm)

This gives us a vector, "data.res", containing the residuals.
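The residuals are the observed values minus the fitted values, $e_i = Y_i - \hat{Y}_i$. The sketch below checks this on hypothetical simulated data, and also shows that the residuals sum to zero (up to rounding) when the model has an intercept.

```r
# Hypothetical simulated data and fit
set.seed(1)
data.X = 1:20
data.Y = 3 + 2 * data.X + rnorm(20, sd = 2)
data.lm = lm(data.Y ~ data.X)

data.res = resid(data.lm)

# Residuals are observed minus fitted values
all.equal(unname(data.res), data.Y - unname(fitted(data.lm)))

# With an intercept in the model, the residuals sum to (numerically) zero
sum(data.res)
```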

Hypothesis testing

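If you have already found the ANOVA table for your data, you can calculate your test statistic from the numbers given; for example, the F statistic is the regression mean square divided by the residual mean square.
Let's say we want to find the F quantile $\large \mathbf{F}(.95; 3, 24)$. We can find this by typing:

> qf(.95, 3, 24)

To find the t quantile $\large \mathbf{t}(.975; 19)$, we would type:

> qt(.975, 19)

The t distribution has a single degrees-of-freedom parameter. The third argument of qt() is a noncentrality parameter, so qt(.975, 1, 19) would not give this quantile.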
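A sketch of the full decision rule follows. It assumes a hypothetical model with three predictors and n = 28 observations, so the overall F test has 3 and 24 degrees of freedom and matches $\mathbf{F}(.95; 3, 24)$. The predictors X1, X2, X3 and the simulated response are illustrative only. Note that qt() takes a single degrees-of-freedom argument.

```r
# Critical values from the quantile functions
f.crit = qf(0.95, 3, 24)   # F(.95; 3, 24)
t.crit = qt(0.975, 19)     # t(.975; 19): one df argument only

# Hypothetical data: 3 predictors and n = 28 give df = (3, 24)
set.seed(2)
n = 28
X1 = rnorm(n); X2 = rnorm(n); X3 = rnorm(n)
Y = 1 + 0.5 * X1 - 0.3 * X2 + rnorm(n)
fit = lm(Y ~ X1 + X2 + X3)

# F* = MSR / MSE; the sequential sums of squares of X1..X3 add up to SSR
tab = anova(fit)
msr = sum(tab[1:3, "Sum Sq"]) / 3
mse = tab[nrow(tab), "Mean Sq"]
f.star = msr / mse

# Decision rule: reject H0 (beta1 = beta2 = beta3 = 0) if F* > F(.95; 3, 24)
f.star > f.crit
```

The same F* appears as the "F-statistic" line of summary(fit).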

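P-values

To get the p-value for an observed F statistic of, say, 2.84, with 3 and 24 degrees of freedom, we need the upper-tail probability P(F > 2.84), so we type:

> pf(2.84, 3, 24, lower.tail = FALSE)

On its own, pf(2.84, 3, 24) returns the lower-tail probability P(F ≤ 2.84); the p-value is 1 - pf(2.84, 3, 24).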
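Because pf() returns the lower-tail probability by default, the p-value of an upper-tail F test is its complement. The sketch below computes it both ways and checks it against the Pr(>F) column that anova() reports; the simulated data.X and data.Y are hypothetical.

```r
# Upper-tail probability P(F > 2.84) for F with (3, 24) degrees of freedom
p.value = pf(2.84, 3, 24, lower.tail = FALSE)
p.value

# Equivalent form: one minus the cumulative (lower-tail) probability
all.equal(p.value, 1 - pf(2.84, 3, 24))

# anova() reports the same kind of upper-tail p-value in "Pr(>F)"
set.seed(1)
data.X = 1:20
data.Y = 3 + 2 * data.X + rnorm(20, sd = 2)
tab = anova(lm(data.Y ~ data.X))
all.equal(tab[1, "Pr(>F)"], pf(tab[1, "F value"], 1, 18, lower.tail = FALSE))
```

Since 2.84 is below $\mathbf{F}(.95; 3, 24)$, this p-value is larger than 0.05.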

Normal Q-Q plot

We want to obtain the normal probability (Q-Q) plot for the standardized residuals of our model, "data.lm".
We have already fit the model, but we now need the standardized residuals:

> data.stdres = rstandard(data.lm)

Now, we make the plot by typing:

> qqnorm(data.stdres)

To add the reference line, type:

> qqline(data.stdres)
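rstandard() divides each residual by its estimated standard deviation, $r_i = e_i / (s\sqrt{1 - h_{ii}})$, where $s$ is the residual standard error and $h_{ii}$ is the leverage of observation $i$. The sketch below checks this on hypothetical simulated data and then draws the Q-Q plot; pdf(NULL) is used only so the example runs without a screen (it writes no file), and you can omit it and the dev.off() call when working interactively.

```r
# Hypothetical simulated data and fit
set.seed(1)
data.X = 1:20
data.Y = 3 + 2 * data.X + rnorm(20, sd = 2)
data.lm = lm(data.Y ~ data.X)
data.stdres = rstandard(data.lm)

# By hand: residual / (s * sqrt(1 - leverage))
s = summary(data.lm)$sigma
h = hatvalues(data.lm)
by.hand = resid(data.lm) / (s * sqrt(1 - h))
all.equal(data.stdres, by.hand)

# Q-Q plot with reference line, drawn on a null device
pdf(NULL)
qqnorm(data.stdres)
qqline(data.stdres)
invisible(dev.off())
```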

More on Linear Regression

Fitting a Model

Let's say we have two X variables in our data, and we want to fit a multiple regression model. Once again, let's say our Y values have been saved as a vector named "data.Y". Now, let's assume that the X values for the first variable are saved as "data.X1", and those for the second variable as "data.X2".
If we want to fit our data to the model $\large Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \epsilon_i$ (lm() includes the intercept $\beta_0$ by default), we can type:

> data.lm.mult = lm(data.Y ~ data.X1 + data.X2)

This gives us a model to work with, named "data.lm.mult".
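The sketch below fits the two-predictor model on hypothetical simulated data and confirms that lm() returns three coefficients, the intercept plus one slope per predictor. It also reproduces the least-squares estimates from the normal equations $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$, and shows the formula term `- 1`, which drops the intercept if you really want a model through the origin.

```r
# Hypothetical simulated data for data.Y, data.X1, and data.X2
set.seed(3)
data.X1 = runif(30, 0, 10)
data.X2 = runif(30, 0, 10)
data.Y = 5 + 1.5 * data.X1 - 0.8 * data.X2 + rnorm(30)

data.lm.mult = lm(data.Y ~ data.X1 + data.X2)
coef(data.lm.mult)

# Same estimates from the normal equations (design matrix with a column of 1s)
X = cbind(1, data.X1, data.X2)
b = solve(t(X) %*% X, t(X) %*% data.Y)
all.equal(unname(coef(data.lm.mult)), as.vector(b))

# "- 1" removes the intercept, giving Y_i = beta1 X_i1 + beta2 X_i2 + e_i
data.lm.noint = lm(data.Y ~ data.X1 + data.X2 - 1)
coef(data.lm.noint)
```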

Summary of Model

We can now see our model by typing:

> summary(data.lm.mult)

The summary lists the estimate, standard error, t value, and p-value (Pr(>|t|)) for the intercept and for each variable. It also reports the residual standard error, the multiple and adjusted R-squared values, and the overall F statistic with its p-value.
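The pieces of the summary can also be pulled out by name, which is handy for further calculations. This sketch uses hypothetical simulated data; `data.sum` and `cf` are illustrative names. It checks two relationships: each t value is the estimate divided by its standard error, and the adjusted R-squared is $1 - (1 - R^2)(n - 1)/(n - p)$, here with n = 30 and p = 3 parameters.

```r
# Hypothetical simulated data and fit
set.seed(3)
data.X1 = runif(30, 0, 10)
data.X2 = runif(30, 0, 10)
data.Y = 5 + 1.5 * data.X1 - 0.8 * data.X2 + rnorm(30)
data.lm.mult = lm(data.Y ~ data.X1 + data.X2)

data.sum = summary(data.lm.mult)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
cf = data.sum$coefficients
cf
all.equal(cf[, "t value"], cf[, "Estimate"] / cf[, "Std. Error"])

# Residual standard error, R-squared, adjusted R-squared, overall F
data.sum$sigma
data.sum$r.squared
data.sum$adj.r.squared
data.sum$fstatistic
```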

Pairwise Comparison Scatterplot Matrix

Let's say we have three variables, named "data.X", "data.Y", and "data.Z". We can compare the variables against each other in a scatterplot matrix by typing:

> pairs(cbind(data.X, data.Y, data.Z))

If the variables are stored together in one data frame (let's say it's called "data.XYZ"), we can get the same matrix by typing:

> pairs(data.XYZ)
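Both forms work because pairs() accepts either a numeric matrix or a data frame, and the column names (taken from the variable names) label the panels. The sketch below uses hypothetical simulated variables; `data.mat` is an illustrative name, and pdf(NULL) only lets the example run without a screen, so omit it and dev.off() when working interactively.

```r
# Hypothetical simulated variables
set.seed(4)
data.X = rnorm(50)
data.Y = 2 * data.X + rnorm(50)
data.Z = data.X - data.Y + rnorm(50)

# A matrix from cbind() and a data frame hold the same three columns
data.mat = cbind(data.X, data.Y, data.Z)
data.XYZ = data.frame(data.X, data.Y, data.Z)
colnames(data.mat)
names(data.XYZ)

# Either object produces the same 3 x 3 scatterplot matrix
pdf(NULL)
pairs(data.mat)
pairs(data.XYZ)
invisible(dev.off())
```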

Further Questions

If you would like instructions for other R tasks added to this page, please comment, noting what you would like to see, and we will work on putting up the information as soon as possible.

Contributors

Valerie Regalia
Debashis Paul
