Statistics LibreTexts

Analysis of variance approach to regression


We divide the total variability in the observed data into two parts: one coming from the errors, the other coming from the predictor.

ANOVA Decomposition

The following decomposition

$$Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i), \qquad i = 1, 2, \ldots, n,$$

represents the deviation of the observed response from the mean response as the sum of the deviation of the fitted value from the mean plus the residual.

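Squaring both sides and summing over $i$, the cross-product term $2\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i)$ vanishes (the least-squares residuals sum to zero and are orthogonal to the fitted values), so we have:

$$\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2,$$

or

$$SSTO = SSR + SSE,$$

where

$$SSTO = \sum_{i=1}^{n}(Y_i - \bar{Y})^2, \qquad SSR = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2, \qquad SSE = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2.$$

This identity is referred to as the ANOVA decomposition of the variation in the response. Note that, since $\hat{Y}_i - \bar{Y} = b_1(X_i - \bar{X})$,

$$SSR = b_1^2 \sum_{i=1}^{n}(X_i - \bar{X})^2.$$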
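As a quick numerical sanity check of the decomposition, the following minimal sketch fits a least-squares line to a small set of made-up toy data (the `x` and `y` values and the helper name `anova_sums` are hypothetical, chosen only for illustration) and confirms that $SSTO = SSR + SSE$ and $SSR = b_1^2 \sum_i (X_i - \bar{X})^2$ up to floating-point error.

```python
# Minimal sketch: verify SSTO = SSR + SSE and SSR = b1^2 * Sxx numerically.
# The x/y values below are made-up toy data, not taken from this page.

def anova_sums(x, y):
    """Return (SSTO, SSR, SSE, b1, Sxx) for a simple least-squares line fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # least-squares slope
    b0 = y_bar - b1 * x_bar             # least-squares intercept
    y_hat = [b0 + b1 * xi for xi in x]  # fitted values
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return ssto, ssr, sse, b1, sxx

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
ssto, ssr, sse, b1, sxx = anova_sums(x, y)
print(abs(ssto - (ssr + sse)) < 1e-9)   # decomposition holds
print(abs(ssr - b1 ** 2 * sxx) < 1e-9)  # SSR = b1^2 * Sxx
```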

Degrees of freedom

The degrees of freedom of the terms in the decomposition $SSTO = SSR + SSE$ are

$$\text{d.f.}(SSTO) = n - 1, \qquad \text{d.f.}(SSR) = 1, \qquad \text{d.f.}(SSE) = n - 2.$$

So,

$$\text{d.f.}(SSTO) = \text{d.f.}(SSR) + \text{d.f.}(SSE).$$

Expected value and distribution

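$E(SSE) = (n-2)\sigma^2$ and $E(SSR) = \sigma^2 + \beta_1^2 \sum_{i=1}^{n}(X_i - \bar{X})^2$. Also, under the normal regression model and under $H_0: \beta_1 = 0$,

$$\frac{SSR}{\sigma^2} \sim \chi^2_1, \qquad \frac{SSE}{\sigma^2} \sim \chi^2_{n-2},$$

and these two are independent. (The distribution of $SSE/\sigma^2$ does not require $H_0$; it holds under the normal regression model for any value of $\beta_1$.)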
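The expected values above can be illustrated by simulation. The sketch below is a Monte Carlo illustration under assumptions of our own: the design points $X = 1, \ldots, 10$, the parameters $\beta_0 = 1$, $\beta_1 = 0.5$, $\sigma = 2$, the number of replications, and the helper name `simulate` are all arbitrary choices, not values from the text. For this design $\sum_i (X_i - \bar{X})^2 = 82.5$, so the averages of $SSE$ and $SSR$ should land near $(n-2)\sigma^2 = 32$ and $\sigma^2 + \beta_1^2 \cdot 82.5 = 24.625$.

```python
import random

# Monte Carlo sketch (illustrative only): simulate the normal regression model
# Y_i = beta0 + beta1 * X_i + eps_i, eps_i ~ N(0, sigma^2), and compare the
# average SSE and SSR with (n - 2) * sigma^2 and sigma^2 + beta1^2 * Sxx.
# All parameter values below are arbitrary choices for the illustration.

def simulate(x, beta0, beta1, sigma, reps, seed=0):
    rng = random.Random(seed)
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sse_total = ssr_total = 0.0
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        y_bar = sum(y) / n
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        b1 = sxy / sxx
        ssr = b1 ** 2 * sxx                        # SSR = b1^2 * Sxx
        ssto = sum((yi - y_bar) ** 2 for yi in y)
        sse_total += ssto - ssr                    # SSE = SSTO - SSR
        ssr_total += ssr
    return sse_total / reps, ssr_total / reps, sxx

x = [float(i) for i in range(1, 11)]  # n = 10 design points
sigma, beta1 = 2.0, 0.5
mean_sse, mean_ssr, sxx = simulate(x, beta0=1.0, beta1=beta1, sigma=sigma, reps=20000)
print(mean_sse, (len(x) - 2) * sigma ** 2)      # average SSE vs. (n - 2) * sigma^2
print(mean_ssr, sigma ** 2 + beta1 ** 2 * sxx)  # average SSR vs. sigma^2 + beta1^2 * Sxx
```

Because the seed is fixed the run is reproducible; increasing `reps` tightens the agreement with the theoretical values.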

Mean squares

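$$MSE = \frac{SSE}{\text{d.f.}(SSE)} = \frac{SSE}{n-2}, \qquad MSR = \frac{SSR}{\text{d.f.}(SSR)} = \frac{SSR}{1}.$$

Also, $E(MSE) = \sigma^2$ and $E(MSR) = \sigma^2 + \beta_1^2 \sum_{i=1}^{n}(X_i - \bar{X})^2$.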

F ratio

For testing $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$, the following test statistic, called the F ratio, can be used:

$$F = \frac{MSR}{MSE}.$$

The reason is that $E(MSR)/E(MSE) = 1 + \beta_1^2 \sum_{i=1}^{n}(X_i - \bar{X})^2 / \sigma^2$, so the ratio fluctuates around 1 when $H_0$ holds and tends to exceed 1 when $\beta_1 \neq 0$. So a significantly large value of $F$ provides evidence against $H_0$ and for $H_1$.

Under $H_0$, $F$ has the F distribution with paired degrees of freedom $(\text{d.f.}(SSR), \text{d.f.}(SSE)) = (1, n-2)$, written $F \sim F_{1,n-2}$. Thus, the test rejects $H_0$ at level of significance $\alpha$ if $F > F(1-\alpha; 1, n-2)$, where $F(1-\alpha; 1, n-2)$ is the $(1-\alpha)$ quantile of the $F_{1,n-2}$ distribution.
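The decision rule can be sketched in a few lines, assuming SciPy is available: `scipy.stats.f.ppf` gives the quantile $F(1-\alpha; 1, n-2)$ and `scipy.stats.f.sf` gives the upper-tail probability used as a p-value. The function name `f_test` is hypothetical; the usage line plugs in $SSR = 352.91$, $SSE = 203.17$ and $n = 19$ from the housing example in this section.

```python
from scipy import stats

def f_test(ssr, sse, n, alpha=0.05):
    """Sketch of the ANOVA F-test of H0: beta1 = 0 in simple linear regression."""
    msr = ssr / 1.0                              # d.f.(SSR) = 1
    mse = sse / (n - 2)                          # d.f.(SSE) = n - 2
    f_stat = msr / mse                           # F ratio
    f_crit = stats.f.ppf(1 - alpha, 1, n - 2)    # F(1 - alpha; 1, n - 2)
    p_value = stats.f.sf(f_stat, 1, n - 2)       # P(F_{1,n-2} > f_stat)
    reject = f_stat > f_crit
    return f_stat, f_crit, p_value, reject

# Values from the housing example (Example 1) in this section.
f_stat, f_crit, p_value, reject = f_test(ssr=352.91, sse=203.17, n=19)
print(f"F = {f_stat:.3f}, critical value = {f_crit:.2f}, p = {p_value:.2g}, reject H0: {reject}")
```

Reporting the p-value alongside the critical-value comparison is a design choice here; the text itself only uses the critical value.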

Relation between F-test and t-test

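One can check that $F = (t^*)^2$, where $t^* = b_1 / s(b_1)$ is the test statistic for testing $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$. Indeed, $MSR = b_1^2 \sum_{i=1}^{n}(X_i - \bar{X})^2$ and $s^2(b_1) = MSE / \sum_{i=1}^{n}(X_i - \bar{X})^2$, so $F = b_1^2 / s^2(b_1) = (t^*)^2$. Moreover, $F(1-\alpha; 1, n-2) = [t(1-\alpha/2; n-2)]^2$, so the F-test is equivalent to the two-sided t-test in this case.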
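A short numerical check of $F = (t^*)^2$, using made-up toy data (the data values and the helper name `f_and_t` are hypothetical): the F ratio is computed from $MSR/MSE$ and the t statistic from $b_1 / \sqrt{MSE / \sum_i (X_i - \bar{X})^2}$, and the two are compared.

```python
import math

# Sketch: check numerically that the ANOVA F ratio equals the square of the
# t statistic b1 / s(b1), where s(b1) = sqrt(MSE / Sxx). Toy data are made up.

def f_and_t(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)  # SSTO
    b1 = sxy / sxx
    ssr = b1 ** 2 * sxx                       # SSR = b1^2 * Sxx
    sse = syy - ssr                           # SSE = SSTO - SSR
    mse = sse / (n - 2)
    f_ratio = (ssr / 1) / mse                 # MSR / MSE
    t_stat = b1 / math.sqrt(mse / sxx)        # b1 / s(b1)
    return f_ratio, t_stat

f_ratio, t_stat = f_and_t([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                          [1.8, 4.1, 5.9, 8.3, 9.7, 12.2])
print(math.isclose(f_ratio, t_stat ** 2))
```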

ANOVA table

The ANOVA table summarizes the quantities used in testing $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$. It has the form:

| Source | df | SS | MS | F* |
|---|---|---|---|---|
| Regression | d.f.(SSR) = 1 | SSR | MSR | MSR/MSE |
| Error | d.f.(SSE) = n - 2 | SSE | MSE | |
| Total | d.f.(SSTO) = n - 1 | SSTO | | |
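The table can be assembled mechanically from $SSR$, $SSE$ and $n$. The sketch below uses hypothetical helper names `anova_table` and `format_table`; the column widths and decimal places are arbitrary formatting choices. The usage line reproduces the housing example (Example 1) in this section.

```python
def anova_table(ssr, sse, n):
    """Return ANOVA-table rows (source, df, SS, MS, F) for simple linear regression."""
    df_r, df_e, df_t = 1, n - 2, n - 1
    msr, mse = ssr / df_r, sse / df_e
    return [
        ("Regression", df_r, ssr, msr, msr / mse),
        ("Error", df_e, sse, mse, None),
        ("Total", df_t, ssr + sse, None, None),   # SSTO = SSR + SSE
    ]

def format_table(rows):
    """Render the rows as fixed-width text; empty cells are left blank."""
    lines = [f"{'Source':<12}{'df':>4}{'SS':>10}{'MS':>10}{'F*':>10}"]
    for source, df, ss, ms, f in rows:
        ms_txt = f"{ms:10.2f}" if ms is not None else " " * 10
        f_txt = f"{f:10.3f}" if f is not None else " " * 10
        lines.append(f"{source:<12}{df:>4}{ss:10.2f}{ms_txt}{f_txt}")
    return "\n".join(lines)

rows = anova_table(ssr=352.91, sse=203.17, n=19)
print(format_table(rows))
```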

Example 1: housing price data

We consider a data set on housing prices. Here $Y$ = selling price of a house (in thousands of dollars) and $X$ = size of the house (in hundreds of square feet). The summary statistics are given below:

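$$n = 19, \qquad \bar{X} = 15.719, \qquad \bar{Y} = 75.211,$$

$$\sum_i (X_i - \bar{X})^2 = 40.805, \qquad \sum_i (Y_i - \bar{Y})^2 = 556.078, \qquad \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) = 120.001.$$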

(Example) - Estimates of β1 and β0

$$b_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} = \frac{120.001}{40.805} = 2.941$$

and

$$b_0 = \bar{Y} - b_1 \bar{X} = 75.211 - (2.941)(15.719) = 28.981.$$
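The same estimates can be computed directly from the summary statistics quoted above. This sketch keeps the unrounded slope when computing the intercept, so $b_0$ comes out as 28.984 rather than the text's 28.981, which plugs in the rounded slope 2.941.

```python
# Sketch: least-squares estimates from the housing-data summary statistics
# quoted in the text.
x_bar, y_bar = 15.719, 75.211
sxx, sxy = 40.805, 120.001

b1 = sxy / sxx            # slope: Sxy / Sxx
b0 = y_bar - b1 * x_bar   # intercept: Y-bar - b1 * X-bar (unrounded b1)
print(round(b1, 3), round(b0, 3))  # 2.941 28.984
```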

(Example) - MSE

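The error degrees of freedom are $\text{d.f.}(SSE) = n - 2 = 17$, and

$$SSE = \sum_i (Y_i - \bar{Y})^2 - b_1^2 \sum_i (X_i - \bar{X})^2 = 203.17.$$

So,

$$MSE = \frac{SSE}{n-2} = \frac{203.17}{17} = 11.95.$$

Also, $SSTO = 556.08$, $SSR = SSTO - SSE = 352.91$, and $MSR = SSR/1 = 352.91$.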

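$F = MSR/MSE = 29.529 = (t^*)^2$, where $t^* = b_1 / s(b_1) = 2.941 / 0.5412 = 5.434$. Also, $F(0.95; 1, 17) = 4.45$ and $t(0.975; 17) = 2.11$. Since $F = 29.529 > 4.45$ (equivalently, $|t^*| = 5.434 > 2.11$), we reject $H_0: \beta_1 = 0$ at level $\alpha = 0.05$. The ANOVA table is given below.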

| Source | df | SS | MS | F* |
|---|---|---|---|---|
| Regression | 1 | 352.91 | 352.91 | 29.529 |
| Error | 17 | 203.17 | 11.95 | |
| Total | 18 | 556.08 | | |
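The example's ANOVA quantities can be reproduced from the summary statistics alone. In this sketch the critical value 4.45 is the table value $F(0.95; 1, 17)$ quoted in the text; everything else is computed without intermediate rounding, so the printed $SSR$ and $F$ may differ from the table in the last digit.

```python
import math

# Sketch: reproduce the housing-example ANOVA quantities from the summary
# statistics given in the text. The critical value 4.45 is the text's
# F(0.95; 1, 17); intermediate values are not rounded here.
n = 19
sxx, syy, sxy = 40.805, 556.078, 120.001

b1 = sxy / sxx
ssr = b1 ** 2 * sxx          # SSR = b1^2 * Sxx
sse = syy - ssr              # SSE = SSTO - SSR
mse = sse / (n - 2)          # MSE = SSE / (n - 2)
msr = ssr / 1                # MSR = SSR / 1
f_ratio = msr / mse
s_b1 = math.sqrt(mse / sxx)  # estimated standard error of b1
t_stat = b1 / s_b1

print(f"SSE={sse:.2f} MSE={mse:.2f} SSR={ssr:.2f} F={f_ratio:.3f} t={t_stat:.3f}")
print("reject H0" if f_ratio > 4.45 else "do not reject H0")
```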

Contributors

  • Valerie Regalia
  • Debashis Paul

This page titled Analysis of variance approach to regression is shared under a not declared license and was authored, remixed, and/or curated by Debashis Paul.
