# R Tutorial for ANOVA and Linear Regression

## ANOVA table

• Suppose we have collected data, with the X values stored in R as a vector called data.X and the Y values as a vector called data.Y. To obtain the ANOVA table for the regression of Y on X, follow these steps:
1. First, fit a linear model to the data: > data.lm = lm(data.Y~data.X)
2. Next, ask R for the ANOVA table of the fitted model: > anova(data.lm)
3. R prints the ANOVA table, listing the degrees of freedom, sums of squares, mean squares, the F statistic, and its p-value.
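
The steps above can be tried end to end with a small sketch. The data here are simulated purely for illustration (the seed, sample size, and coefficients are assumptions, not part of the original tutorial); only `lm()` and `anova()` come from the text.

```r
# Simulated data for illustration only (not from the tutorial)
set.seed(1)
data.X <- 1:20
data.Y <- 2 + 0.5 * data.X + rnorm(20)

data.lm  <- lm(data.Y ~ data.X)   # step 1: fit the simple linear regression
data.aov <- anova(data.lm)        # step 2: build the ANOVA table
print(data.aov)                   # step 3: rows for data.X and Residuals

# The F statistic is the ratio of the two mean squares in the table
F.stat <- data.aov["data.X", "Mean Sq"] / data.aov["Residuals", "Mean Sq"]
F.stat
```

The value computed by hand in `F.stat` should match the `F value` column that `anova()` prints.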

### Fitted Values

• To obtain the fitted values of the model from our previous example, we type: > data.fit = fitted(data.lm)
• This gives us a vector called "data.fit" that contains one fitted value of data.lm for each observation.

### Residuals

• Now we want to obtain the residuals of the model: > data.res = resid(data.lm)
• This gives us a vector called "data.res" that contains the residuals (observed minus fitted values).
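
As a quick check on the fitted values and residuals, the following sketch verifies that each observed Y equals its fitted value plus its residual. The data are simulated for illustration only (an assumption, not part of the original tutorial).

```r
# Simulated data for illustration only
set.seed(1)
data.X <- 1:20
data.Y <- 2 + 0.5 * data.X + rnorm(20)
data.lm <- lm(data.Y ~ data.X)

data.fit <- fitted(data.lm)   # fitted values, one per observation
data.res <- resid(data.lm)    # residuals: observed minus fitted

# Observed = fitted + residual, so this difference is zero up to rounding
max(abs(data.Y - (data.fit + data.res)))

# With an intercept in the model, the residuals sum to (numerically) zero
sum(data.res)
```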

### Hypothesis testing

• If you have already found the ANOVA table for your data, you can calculate the F test statistic from it as the ratio of the regression mean square to the residual mean square (R also reports it in the "F value" column).
• Let's say we want to find the F-quantile given by $$\large \mathbf{F} (.95; 3 , 24)$$. We can find this by typing > qf(.95, 3, 24)
• The t distribution has a single degrees-of-freedom parameter. To find the t-quantile given by $$\large \mathbf{t} (.975; 19)$$ , we would type: > qt(.975, 19)
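
A minimal sketch of how these quantiles serve as critical values. Note that `qt()` takes a single degrees-of-freedom argument (a third positional argument would be read as a noncentrality parameter), so the t quantile with 19 degrees of freedom is written with `df = 19`. The observed statistic `F.stat` below is a hypothetical value chosen for illustration.

```r
F.crit <- qf(0.95, df1 = 3, df2 = 24)   # upper 5% critical value of F(3, 24)
t.crit <- qt(0.975, df = 19)            # two-sided 5% critical value of t(19)
F.crit
t.crit

# Hypothetical observed F statistic, compared against the critical value
F.stat <- 2.84
F.stat > F.crit   # TRUE would mean rejecting H0 at the 5% level
```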

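### P-values

• To get the p-value for an observed F statistic of, say, 2.84, with degrees of freedom 3 and 24, we need the upper-tail probability, so we would type > pf(2.84, 3, 24, lower.tail = FALSE)
• Typing pf(2.84, 3, 24) alone returns the lower-tail probability, which is one minus the p-value.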
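
By default `pf()` returns the lower-tail probability P(F ≤ x), whereas the p-value of an F test is the upper tail. The sketch below shows two equivalent ways to get it for the statistic 2.84 with 3 and 24 degrees of freedom.

```r
F.stat <- 2.84

# Upper-tail probability P(F >= 2.84): the p-value of the F test
p.upper <- pf(F.stat, df1 = 3, df2 = 24, lower.tail = FALSE)

# Equivalent: one minus the cumulative probability
p.check <- 1 - pf(F.stat, df1 = 3, df2 = 24)

p.upper
p.check
```

The `lower.tail = FALSE` form is usually preferable because it avoids loss of precision when the p-value is very small.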

### Normal Q-Q plot

• We want to obtain the normal probability plot for the standardized residuals of our fitted model, "data.lm".
• We have already fit our data to a model, but we now need its standardized residuals:

> data.stdres = rstandard(data.lm)

• Now, we make the plot by typing: > qqnorm(data.stdres)
• To add the reference line to the plot, type: > qqline(data.stdres)
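
The plotting steps can be sketched as follows on simulated data (the data are an assumption made for illustration). The last lines recompute the standardized residuals by hand, dividing each residual by s·sqrt(1 − h_ii), to show what `rstandard()` returns.

```r
# Simulated data for illustration only
set.seed(1)
data.X <- 1:20
data.Y <- 2 + 0.5 * data.X + rnorm(20)
data.lm <- lm(data.Y ~ data.X)

data.stdres <- rstandard(data.lm)   # standardized residuals

qqnorm(data.stdres, main = "Normal Q-Q plot of standardized residuals")
qqline(data.stdres)                 # reference line through the quartiles

# Manual version: residual / (s * sqrt(1 - leverage))
s <- summary(data.lm)$sigma         # residual standard error
h <- hatvalues(data.lm)             # leverages h_ii
manual <- resid(data.lm) / (s * sqrt(1 - h))
```

Points falling close to the reference line suggest that the normality assumption for the errors is reasonable.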

## More on Linear Regression

### Fitting a Model

• Let's say we have two X variables in our data, and we want to fit a multiple regression model. Once again, the Y values are saved as a vector named "data.Y", the X values for the first variable as "data.X1", and those for the second variable as "data.X2".
• If we want to fit our data to the model $$\large Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \epsilon_i$$ (R includes the intercept $$\large \beta_0$$ by default), we can type:

> data.lm.mult = lm(data.Y ~ data.X1 + data.X2)

• This has given us a model to work with, named "data.lm.mult".
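
A small sketch of the multiple regression fit on simulated data (the coefficients and sample size are assumptions chosen for illustration). It also shows that `lm()` adds an intercept unless the formula explicitly removes it with `0 +`.

```r
# Simulated data for illustration only
set.seed(2)
data.X1 <- runif(30, 0, 10)
data.X2 <- runif(30, 0, 5)
data.Y  <- 1 + 2 * data.X1 - 3 * data.X2 + rnorm(30)

# Intercept is included by default
data.lm.mult <- lm(data.Y ~ data.X1 + data.X2)
coef(data.lm.mult)

# Dropping the intercept has to be requested explicitly
data.lm.noint <- lm(data.Y ~ 0 + data.X1 + data.X2)
coef(data.lm.noint)
```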

### Summary of Model

• We can now inspect our model by typing > summary(data.lm.mult)
• The summary lists the estimate, standard error, t-value, and p-value for each coefficient. It also reports the residual standard error, the multiple and adjusted R-squared values, and the overall F statistic with its p-value.
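
The pieces of the summary can also be extracted individually, which is convenient for further calculations. The sketch below uses simulated data (an assumption for illustration); the component names are those of the object returned by `summary()` for an `lm` fit.

```r
# Simulated data for illustration only
set.seed(2)
data.X1 <- runif(30, 0, 10)
data.X2 <- runif(30, 0, 5)
data.Y  <- 1 + 2 * data.X1 - 3 * data.X2 + rnorm(30)
data.lm.mult <- lm(data.Y ~ data.X1 + data.X2)

data.sum <- summary(data.lm.mult)

data.sum$coefficients    # Estimate, Std. Error, t value, Pr(>|t|)
data.sum$sigma           # residual standard error
data.sum$r.squared       # multiple R-squared
data.sum$adj.r.squared   # adjusted R-squared
data.sum$fstatistic      # overall F statistic with its degrees of freedom
```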

### Pairwise Comparison Scatterplot Matrix

• Let's say we have three different variables (named "data.X", "data.Y", and "data.Z"). We can plot each variable against each other in a scatterplot matrix by typing:

> pairs(cbind(data.X, data.Y, data.Z))

• If the variables are listed together in one data frame (let's say it's called "data.XYZ"), we can get the same matrix by typing: > pairs(data.XYZ)
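
Both forms of the call can be sketched on simulated data (the three variables below are assumptions made for illustration). The correlation matrix at the end is an optional numeric companion to the plot, not part of the original instructions.

```r
# Simulated data for illustration only
set.seed(3)
data.X <- rnorm(25)
data.Y <- 2 * data.X + rnorm(25)
data.Z <- runif(25)

# Scatterplot matrix from separate vectors
pairs(cbind(data.X, data.Y, data.Z))

# Same matrix from a data frame holding the three variables
data.XYZ <- data.frame(data.X, data.Y, data.Z)
pairs(data.XYZ)

# Pairwise correlations for the same variables
data.cor <- cor(data.XYZ)
data.cor
```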

## Contributors

• Valerie Regalia
• Debashis Paul

This page titled R Tutorial for ANOVA and Linear Regression is shared under a not declared license and was authored, remixed, and/or curated by Debashis Paul.