Statistics LibreTexts

R Tutorial for ANOVA and Linear Regression


    ANOVA table

    • Let's say we have collected data, and our X values have been entered in R as a vector called data.X, and our Y values as a vector called data.Y. Now, we want to find the ANOVA table for the regression of Y on X. We can do this through the following steps:
    1. First, we should fit our data to a model. > data.lm = lm(data.Y~data.X)
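    2. Next, we can get R to produce an ANOVA table by typing: > anova(data.lm)
    3. R now prints the ANOVA table, with one row for data.X and one for the residuals, giving the degrees of freedom, sums of squares, mean squares, F value, and p-value.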
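The steps above can be tried end to end with a small sketch. The data here are simulated purely for illustration (the seed, sample size, and coefficients are assumptions, not values from this tutorial); only lm() and anova() come from the steps themselves.

```r
# Simulated data for illustration only (not from the tutorial).
set.seed(1)
data.X = runif(25, min = 0, max = 10)
data.Y = 2 + 0.5 * data.X + rnorm(25, sd = 1)

# Step 1: fit the simple linear regression of Y on X.
data.lm = lm(data.Y ~ data.X)

# Step 2: produce the ANOVA table and keep it for later use.
data.aov = anova(data.lm)
print(data.aov)

# The table is a data frame, so individual entries can be extracted by name.
data.aov[["Sum Sq"]]   # regression and residual sums of squares
data.aov[["Df"]]       # 1 and n - 2 degrees of freedom
```

Storing the table in a variable (here the hypothetical name data.aov) makes it easy to reuse the sums of squares and mean squares in later calculations.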

    Fitted Values

    • To obtain the fitted values of the model from our previous example, we type: > data.fit = fitted(data.lm)
    • This gives us a vector called "data.fit" that contains the fitted values of data.lm, one for each observation.

    Residuals

    • Now we want to obtain the residuals of the model: > data.res = resid(data.lm)
    • Now we have a vector of the residuals, "data.res", one for each observation.
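As a quick check on how fitted values and residuals relate, the following sketch (on simulated data, which is an assumption made only for illustration) verifies that each observed Y equals its fitted value plus its residual.

```r
# Simulated data for illustration only.
set.seed(2)
data.X = runif(20, min = 0, max = 10)
data.Y = 1 + 2 * data.X + rnorm(20)

data.lm  = lm(data.Y ~ data.X)
data.fit = fitted(data.lm)   # fitted values, one per observation
data.res = resid(data.lm)    # residuals, one per observation

# Each observation decomposes as observed = fitted + residual.
all.equal(unname(data.fit + data.res), data.Y)

# With an intercept in the model, the least-squares residuals
# sum to zero up to floating-point error.
sum(data.res)
```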

    Hypothesis testing

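    • If you have already found the ANOVA table for your data, you can calculate your test statistic from the numbers given. For the simple linear regression above, the F statistic is the ratio of the two entries in the Mean Sq column (regression over residual), which R also reports as the F value.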
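    • Let's say we want to find the F-quantile given by \( \large \mathbf{F} (.95; 3 , 24) \). We can find this by typing: > qf(.95, 3, 24)
    • To find the t-quantile given by \( \large \mathbf{t} (.975; 19) \), we would type: > qt(.975, 19)
    • Note that the t distribution has a single degrees-of-freedom parameter. A third positional argument to qt is read as a noncentrality parameter, so qt(.975, 1, 19) would not return the quantile for 19 degrees of freedom.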
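A minimal sketch of an F test built from the ANOVA table, assuming simulated data with n = 21 observations so that the error degrees of freedom are 19 (the data and the variable names F.stat, F.crit, and t.crit are illustrative assumptions). It also checks the standard identity that the squared t-quantile with 19 degrees of freedom equals the F-quantile with 1 and 19 degrees of freedom.

```r
# Simulated data for illustration only: n = 21, so error df = n - 2 = 19.
set.seed(3)
data.X = runif(21, min = 0, max = 10)
data.Y = 3 + 0.8 * data.X + rnorm(21)

data.lm  = lm(data.Y ~ data.X)
data.aov = anova(data.lm)

# Test statistic F* = MSR / MSE, taken from the "Mean Sq" column.
F.stat = data.aov[["Mean Sq"]][1] / data.aov[["Mean Sq"]][2]

# Critical value F(.95; 1, 19) for a test at level 0.05.
F.crit = qf(.95, 1, 19)

# Reject H0: slope = 0 when the statistic exceeds the critical value.
reject = F.stat > F.crit
reject

# The two-sided t critical value squares to the F critical value.
t.crit = qt(.975, 19)
all.equal(t.crit^2, F.crit)
```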

    P-values

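    • To get the p-value for an observed F statistic of, say, 2.84, with degrees of freedom 3 and 24, we would type: > pf(2.84, 3, 24, lower.tail = FALSE)
    • By default, pf(2.84, 3, 24) returns the lower-tail probability \( P(F \le 2.84) \); the p-value of the F test is the upper-tail probability, which equals 1 - pf(2.84, 3, 24).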
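The difference between the two tails can be seen in a short sketch. The p-value of an F test is the upper-tail probability, so the example below computes both tails for the same statistic; the variable names are illustrative.

```r
F.obs = 2.84   # observed F statistic from the example above

# Lower-tail probability P(F <= 2.84): this is the CDF, not the p-value.
p.lower = pf(F.obs, 3, 24)

# Upper-tail probability P(F > 2.84): the p-value of the F test.
p.value = pf(F.obs, 3, 24, lower.tail = FALSE)

# The two tails are complementary.
all.equal(p.value, 1 - p.lower)

# qf inverts pf: the (1 - p.value) quantile recovers the statistic.
qf(1 - p.value, 3, 24)

p.value
```

Because 2.84 is slightly below the critical value returned by qf(.95, 3, 24), the p-value comes out slightly above 0.05.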

    Normal Q-Q plot

    • We want to obtain the normal probability plot for the standardized residuals of our model, "data.lm".
    • We have already fit our data to a model, but we now need the standardized residuals:

    > data.stdres = rstandard(data.lm)

    • Now, we make the plot by typing: > qqnorm(data.stdres)
    • Now, to see the line, type: > qqline(data.stdres)
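Putting the plotting steps together, a sketch on simulated data (an assumption for illustration) might look like the following. The plot title and the variable qq, which captures the coordinates that qqnorm returns invisibly, are illustrative additions.

```r
# Simulated data for illustration only.
set.seed(4)
data.X = runif(30, min = 0, max = 10)
data.Y = 1 + 0.5 * data.X + rnorm(30)

data.lm = lm(data.Y ~ data.X)

# Standardized residuals: each residual divided by its estimated
# standard deviation.
data.stdres = rstandard(data.lm)

# Draw the normal Q-Q plot; qqnorm also returns the plotted
# coordinates (theoretical quantiles x, sample values y).
qq = qqnorm(data.stdres, main = "Normal Q-Q plot of standardized residuals")

# Add the reference line through the first and third quartiles.
qqline(data.stdres)
```

If the points fall close to the line, the normality assumption for the errors looks reasonable; systematic curvature or heavy tails suggest otherwise.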

    More on Linear Regression

    Fitting a Model

    • Let's say we have two X variables in our data, and we want to find a multiple regression model. Once again, let's say our Y values have been saved as a vector titled "data.Y". Now, let's assume that the X values for the first variable are saved as "data.X1", and those for the second variable as "data.X2".
    • If we want to fit our data to the model \( \large Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \epsilon_i \) (lm includes the intercept \( \beta_0 \) by default), we can type:

    > data.lm.mult = lm(data.Y ~ data.X1 + data.X2)

    • This has given us a model to work with, titled "data.lm.mult"
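A minimal sketch of this fit on simulated data (the data and the name data.lm.noint are assumptions for illustration). It also shows that lm adds an intercept unless the formula removes it explicitly.

```r
# Simulated data for illustration only.
set.seed(5)
data.X1 = runif(40, min = 0, max = 10)
data.X2 = rnorm(40, mean = 5, sd = 2)
data.Y  = 1 + 2 * data.X1 - 0.5 * data.X2 + rnorm(40)

# Multiple regression with an intercept (the default).
data.lm.mult = lm(data.Y ~ data.X1 + data.X2)
coef(data.lm.mult)    # (Intercept), data.X1, data.X2

# To fit a model without an intercept, add 0 to the formula.
data.lm.noint = lm(data.Y ~ 0 + data.X1 + data.X2)
coef(data.lm.noint)   # data.X1, data.X2 only
```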

    Summary of Model

    • We can now see our model by typing > summary(data.lm.mult)
    • The summary lists the estimate, standard error, t-value, and p-value for each coefficient. It also lists the Residual Standard Error, the Multiple and Adjusted R-squared values, and the overall F-statistic with its p-value.
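The quantities printed by summary can also be extracted individually, which is convenient for further calculations. The sketch below uses simulated data and the illustrative name data.sum; the component names are those of R's summary.lm object.

```r
# Simulated data for illustration only.
set.seed(6)
data.X1 = runif(40, min = 0, max = 10)
data.X2 = rnorm(40, mean = 5, sd = 2)
data.Y  = 1 + 2 * data.X1 - 0.5 * data.X2 + rnorm(40)

data.lm.mult = lm(data.Y ~ data.X1 + data.X2)
data.sum = summary(data.lm.mult)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
data.sum$coefficients

data.sum$sigma           # residual standard error
data.sum$r.squared       # multiple R-squared
data.sum$adj.r.squared   # adjusted R-squared
data.sum$fstatistic      # overall F statistic with its two df
```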

    Pairwise Comparison Scatterplot Matrix

    • Let's say we have a model with three different variables (the variables are named "data.X", "data.Y", and "data.Z"). We can compare the variables against each other in a scatterplot matrix easily by typing:

    > pairs(cbind(data.X, data.Y, data.Z))

    • If the variables are listed together in one data frame (let's say it's called "data.XYZ"), we can get the same matrix by typing: > pairs(data.XYZ)
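Both calls can be tried on simulated data, as in the sketch below (the data are an assumption for illustration). The final cor() call is an optional numeric companion to the plot and is not part of the original instructions.

```r
# Simulated data for illustration only.
set.seed(7)
data.X = rnorm(50)
data.Y = 2 * data.X + rnorm(50)
data.Z = runif(50)

# Scatterplot matrix from separate vectors bound into a matrix.
pairs(cbind(data.X, data.Y, data.Z))

# The same matrix from a data frame holding the three variables.
data.XYZ = data.frame(data.X, data.Y, data.Z)
pairs(data.XYZ)

# Pairwise correlations, to read alongside the scatterplots.
cor(data.XYZ)
```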

    Further Questions

    • If you would like more information on any R instructions to be added to this page, please comment, noting what you would like to see, and we will work on putting up the information as soon as possible.

    Contributors

    • Valerie Regalia
    • Debashis Paul


    This page titled R Tutorial for ANOVA and Linear Regression is shared under a not declared license and was authored, remixed, and/or curated by Debashis Paul.
