Test for Lack of Fit
Lack of Fit
When we have repeated measurements for different values of the predictor variables X, it is possible to test whether a linear model fits the data.
Suppose that we have data that can be expressed in the form:
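\( \{(X_j, Y_{ij}) : i = 1, \ldots, n_j;\ j = 1, \ldots, c\} \), where \(c > 2\).
Assume that the data come from the model:
\( Y_{ij} = \mu_j + \varepsilon_{ij}, \quad i = 1, \ldots, n_j;\ j = 1, \ldots, c. \qquad (1) \)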
The null hypothesis, under which the linear model holds, is \(H_0: \mu_j = \beta_0 + \beta_1 X_j\) for all \(j = 1, \ldots, c\).
Here (1) is the full model and the model specified by \(H_0\) is the reduced model. We follow the usual ANOVA procedure: compute the error sum of squares for the full model and for the reduced model, then compare them.
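Let \(\bar{Y}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} Y_{ij}\) and \(\bar{Y} = \frac{1}{n}\sum_{j=1}^{c} n_j \bar{Y}_j = \frac{1}{n}\sum_{j=1}^{c}\sum_{i=1}^{n_j} Y_{ij}\), where \(n = \sum_{j=1}^{c} n_j\).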
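\( SSTO = \sum_{j=1}^{c}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y})^2 \) and
\( SSPE = SSE_{full} = \sum_{j=1}^{c}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2 = \sum_{j=1}^{c}\sum_{i=1}^{n_j} Y_{ij}^2 - \sum_{j=1}^{c} n_j \bar{Y}_j^2 \)
\( SSE_{red} = SSE = \sum_{j=1}^{c}\sum_{i=1}^{n_j} (Y_{ij} - \hat{\beta}_0 - \hat{\beta}_1 X_j)^2 \), where \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are the least squares estimates of \(\beta_0\) and \(\beta_1\)
\( SSLF = SSE_{red} - SSE_{full}. \)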
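The difference \(SSE_{red} - SSE_{full}\) can also be written directly in terms of the group means. The short derivation below is a sketch; the symbol \(\hat{Y}_j = \hat{\beta}_0 + \hat{\beta}_1 X_j\) (the fitted value of the reduced model at level \(j\), with \(\hat{\beta}_0, \hat{\beta}_1\) the least squares estimates) is notation introduced here for illustration. Split each residual of the reduced model into a within-group part and a between-group part:

\[
Y_{ij} - \hat{Y}_j = (Y_{ij} - \bar{Y}_j) + (\bar{Y}_j - \hat{Y}_j).
\]

Squaring and summing over \(i\) and \(j\), the cross term vanishes because \(\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j) = 0\) for every \(j\):

\[
\sum_{j=1}^{c}\sum_{i=1}^{n_j} (Y_{ij} - \hat{Y}_j)^2
= \sum_{j=1}^{c}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2
+ \sum_{j=1}^{c} n_j (\bar{Y}_j - \hat{Y}_j)^2 .
\]

The left side is \(SSE_{red}\) and the first term on the right is \(SSPE\), so \(SSLF = \sum_{j=1}^{c} n_j (\bar{Y}_j - \hat{Y}_j)^2 \geq 0\): it measures how far the group means fall from the fitted line.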
Degrees of freedom
\( d.f.(SSPE) = n - c; \quad d.f.(SSLF) = d.f.(SSE_{red}) - d.f.(SSPE) = (n - 2) - (n - c) = c - 2. \)
Source | d.f. | SS | MS = SS/d.f. | F-statistic |
Regression | 1 | SSR | MSR | MSR/MSE |
Error | \(n-2\) | SSE = SSLF + SSPE | MSE | |
Lack of fit | \(c-2\) | SSLF | MSLF | MSLF/MSPE |
Pure error | \(n-c\) | SSPE | MSPE | |
Total | \(n-1\) | SSTO = SSR + SSLF + SSPE | | |
Reject \(H_0: \mu_j = \beta_0 + \beta_1 X_j\) for all \(j\) at level \(\alpha\) if \(F^*_{LF} = \frac{MSLF}{MSPE} > F(1-\alpha;\ c-2,\ n-c)\).
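The procedure above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function name `lack_of_fit_test`, the input layout (one `(x, [y values])` pair per distinct level of X), and the data in `hypothetical_groups` are all assumptions made for this example. The resulting statistic still has to be compared with the critical value \(F(1-\alpha;\ c-2,\ n-c)\) from a table or a statistics package.

```python
def lack_of_fit_test(groups):
    """Lack-of-fit F statistic for simple linear regression.

    groups: list of (x_j, [y_1j, y_2j, ...]) pairs, one per distinct X level.
    Needs c > 2 levels and n > c observations (at least one replicated level).
    """
    c = len(groups)
    xs = [x for x, g in groups for _ in g]   # X_j repeated n_j times
    ys = [y for _, g in groups for y in g]
    n = len(ys)
    if c <= 2 or n <= c:
        raise ValueError("need c > 2 levels and at least one replicated level")

    # Least squares fit of the reduced model Y = b0 + b1 * X.
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # Error sums of squares for the reduced and full models.
    sse_red = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    sspe = sum((y - sum(g) / len(g)) ** 2 for _, g in groups for y in g)
    sslf = sse_red - sspe

    df_lf, df_pe = c - 2, n - c
    f_lf = (sslf / df_lf) / (sspe / df_pe)
    return {"b0": b0, "b1": b1, "sse_red": sse_red, "sspe": sspe,
            "sslf": sslf, "df_lf": df_lf, "df_pe": df_pe, "f_lf": f_lf}


# Hypothetical data (not from the text): four X levels, two replicates each,
# with a clearly curved mean response.
hypothetical_groups = [
    (1, [2.0, 2.2]),
    (2, [4.1, 3.9]),
    (3, [9.0, 9.4]),
    (4, [15.8, 16.2]),
]
result = lack_of_fit_test(hypothetical_groups)
print(result["df_lf"], result["df_pe"], round(result["f_lf"], 2))
```

A large value of `f_lf` relative to the tabulated critical value indicates that the group means deviate from the fitted line by more than the within-group (pure error) variation can explain.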
Example: Growth rate data
In the following example, data are available on the effect of a dietary supplement on the growth rates of rats. Here \(X\) is the dose of the dietary supplement and \(Y\) is the growth rate. The following table presents the data in a form suitable for the analysis, with one column per dose level \(j\).
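\(j = 1\) | \(j = 2\) | \(j = 3\) | \(j = 4\) | \(j = 5\) | \(j = 6\) |
\(Y_{11} = 73\) | \(Y_{12} = 85\) | \(Y_{13} = 90\) | \(Y_{14} = 87\) | \(Y_{15} = 75\) | \(Y_{16} = 65\) |
\(Y_{21} = 78\) | \(Y_{22} = 88\) | \(Y_{23} = 91\) | \(Y_{24} = 86\) | | \(Y_{26} = 63\) |
 | | | \(Y_{34} = 91\) | | |
\(\bar{Y}_1 = 75.5\) | \(\bar{Y}_2 = 86.5\) | \(\bar{Y}_3 = 90.5\) | \(\bar{Y}_4 = 88\) | \(\bar{Y}_5 = 75\) | \(\bar{Y}_6 = 64\) |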
So, for these data, \(c = 6\) and \(n = \sum_{j=1}^{c} n_j = 12\).
\( SSTO = \sum_{j}\sum_{i} (Y_{ij} - \bar{Y})^2 = 1096.00 \)
\( SSPE = \sum_{j}\sum_{i} Y_{ij}^2 - \sum_{j} n_j \bar{Y}_j^2 = 79828 - 79794.5 = 33.50 \)
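\( SSE_{red} = \sum_{j}\sum_{i} (Y_{ij} - \hat{\beta}_0 - \hat{\beta}_1 X_j)^2 = 891.73 \) (Note: the least squares estimates are \(\hat{\beta}_0 = 92.003\), \(\hat{\beta}_1 = -0.498\).)
\( SSLF = SSE_{red} - SSPE = 891.73 - 33.50 = 858.23 \)
\( d.f.(SSPE) = n - c = 6 \)
\( d.f.(SSLF) = c - 2 = 4 \)
\( MSLF = \frac{SSLF}{c-2} = 214.5575 \)
\( MSPE = \frac{SSPE}{n-c} = 5.5833 \)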
ANOVA Table:
Source | d.f. | SS | MS | \(F^*\) |
Regression | 1 | 204.27 | 204.27 | MSR/MSE = 2.29 |
Error | 10 | 891.73 | 89.173 | |
Lack of fit | 4 | 858.23 | 214.56 | MSLF/MSPE = 38.43 |
Pure error | 6 | 33.50 | 5.583 | |
Total | 11 | 1096.00 | | |
\(F(0.95;\ 4,\ 6) = 4.534\). Since \(F^*_{LF} = 38.43 > 4.534\), we reject \(H_0: \mu_j = \beta_0 + \beta_1 X_j\) for all \(j\) at the 5% level of significance.
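Also, if you test \(H'_0: \beta_1 = 0\) against \(H'_1: \beta_1 \neq 0\), assuming that the linear model holds, as in the usual ANOVA for linear regression, then the corresponding test statistic is \(F^* = \frac{MSR}{MSE} = 2.29\), which is less than \(F(0.95;\ 1,\ n-2) = F(0.95;\ 1,\ 10) = 4.964\). So, if one assumes that the linear model holds, the test cannot reject \(H'_0\) at the 5% level of significance. Taken together, the two tests show that a non-significant slope here reflects the poor fit of the straight line, not the absence of a relationship between dose and growth rate.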
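The arithmetic of this example can be checked with a short Python script. This is only a sketch: the doses \(X_j\) are not listed in the text, so the reduced-model error sum of squares is taken as the reported value 891.73 and is not refitted, and the two critical values are the ones quoted above. The variable names are chosen for this illustration.

```python
# Growth-rate responses grouped by dose level j = 1, ..., 6 (from the data table).
y_groups = [[73, 78], [85, 88], [90, 91], [87, 86, 91], [75], [65, 63]]

# Reported in the text; cannot be recomputed here because the doses are not given.
sse_red_reported = 891.73
f_crit_lf = 4.534    # F(0.95; 4, 6) as quoted in the text
f_crit_reg = 4.964   # F(0.95; 1, 10) as quoted in the text

all_y = [y for g in y_groups for y in g]
n, c = len(all_y), len(y_groups)
grand_mean = sum(all_y) / n

# Total and pure-error sums of squares, computed from the raw responses.
ssto = sum((y - grand_mean) ** 2 for y in all_y)
sspe = sum((y - sum(g) / len(g)) ** 2 for g in y_groups for y in g)

# Remaining entries of the ANOVA table.
sslf = sse_red_reported - sspe
ssr = ssto - sse_red_reported
mslf = sslf / (c - 2)
mspe = sspe / (n - c)
f_lf = mslf / mspe
f_reg = (ssr / 1) / (sse_red_reported / (n - 2))

print(f"n = {n}, c = {c}")                          # n = 12, c = 6
print(f"SSTO = {ssto:.2f}, SSPE = {sspe:.2f}")      # SSTO = 1096.00, SSPE = 33.50
print(f"F*_LF = {f_lf:.2f}, reject H0: {f_lf > f_crit_lf}")
print(f"F* = {f_reg:.2f}, reject H0': {f_reg > f_crit_reg}")
```

The script reproduces \(F^*_{LF} = 38.43\) and \(F^* = 2.29\), so the lack-of-fit test rejects \(H_0\) while the slope test, taken on its own, does not reject \(H'_0\).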
Contributors
- Joy Wei
- Debashis Paul