Statistics LibreTexts

Test for Lack of Fit

Lack of Fit 

When we have repeated measurements of the response \(Y\) at several distinct values of the predictor variable \(X\), it is possible to test whether a linear model fits the data.

Suppose that the data can be written in the form \(\{(X_j, Y_{ij}) : i = 1, \ldots, n_{j};\ j = 1, \ldots, c\}\), where \(c > 2\) is the number of distinct values of \(X\) and \(n_j\) is the number of observations at \(X_j\).

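Assume that the data come from the model

\[Y_{ij} = \mu_{j} + \varepsilon_{ij}, \quad i = 1, \ldots, n_{j};\ j = 1, \ldots, c, \qquad (1)\]

where the errors \(\varepsilon_{ij}\) are independent \(N(0, \sigma^2)\) random variables.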

The null hypothesis that the linear model holds is \(H_{0}: \mu_{j} = \beta_{0} + \beta_{1}X_{j}\) for all \(j = 1, \ldots, c\).

Here (1) is the full model, and the model specified by \(H_{0}\) is the reduced model. As in the usual ANOVA procedure, we compute the error sum of squares under both the full and the reduced models.

Let \(\bar{Y}_{j} = \frac{1}{n_{j}} \sum_{i = 1}^{n_{j}} Y_{ij}\) be the mean of the observations at \(X_j\), and let \(\bar{Y} = \frac{1}{n}\sum_{j=1}^{c}n_{j}\bar{Y}_{j} = \frac{1}{n}\sum_{j=1}^{c}\sum_{i=1}^{n_j}Y_{ij}\) be the overall mean, where \(n = \sum_{j=1}^{c}n_{j}\).
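The identity \(\bar{Y} = \frac{1}{n}\sum_{j} n_j \bar{Y}_j\) says the overall mean is a weighted average of the group means, with weights \(n_j\). A minimal Python sketch, assuming the grouped data are stored as a dict mapping each distinct \(X_j\) to its list of responses (a hypothetical layout, with made-up numbers):

```python
# Hypothetical grouped data: distinct predictor value X_j -> replicates Y_1j, ..., Y_{n_j j}.
groups = {
    1.0: [2.0, 4.0],        # n_1 = 2
    2.0: [5.0, 6.0, 7.0],   # n_2 = 3
    3.0: [9.0],             # n_3 = 1
}


def group_means(groups):
    """Return {X_j: Ybar_j}, the mean of the replicates at each X_j."""
    return {x: sum(ys) / len(ys) for x, ys in groups.items()}


def grand_mean(groups):
    """Overall mean Ybar = (1/n) * sum_j n_j * Ybar_j, where n = sum_j n_j."""
    n = sum(len(ys) for ys in groups.values())
    means = group_means(groups)
    return sum(len(groups[x]) * means[x] for x in groups) / n


print(group_means(groups))  # {1.0: 3.0, 2.0: 6.0, 3.0: 9.0}
print(grand_mean(groups))   # (2*3 + 3*6 + 1*9) / 6 = 5.5
```

The weighted form gives the same value as averaging all \(n\) raw observations directly.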

\(SSTO = \sum_{j=1}^{c}\sum_{i=1}^{n_j}(Y_{ij} - \bar{Y})^2\) and

\[SSPE = SSE_{full} = \sum_{j=1}^{c}\sum_{i=1}^{n_j}(Y_{ij} - \bar{Y}_{j})^2 = \sum_{j=1}^{c}\sum_{i=1}^{n_j}Y_{ij}^2 - \sum_{j=1}^{c}n_{j}\bar{Y}_{j}^2\]

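\[SSE_{red} = SSE = \sum_{j=1}^{c}\sum_{i=1}^{n_j}(Y_{ij} - \hat{\beta}_{0} - \hat{\beta}_{1}X_{j})^2,\]

where \(\hat{\beta}_{0}\) and \(\hat{\beta}_{1}\) are the least-squares estimates under the reduced model, and

\[SSLF = SSE_{red} - SSE_{full}.\]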
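The sums of squares above can be computed directly from grouped data. The sketch below is illustrative only: the function name `lack_of_fit_ss` and the dict layout \(\{X_j: [Y_{1j}, \ldots]\}\) are assumptions, not part of the original text. \(SSPE\) uses each group mean as the full-model fit, and \(SSE_{red}\) uses the ordinary least-squares line fitted to all \(n\) points.

```python
def lack_of_fit_ss(groups):
    """Return (SSTO, SSPE, SSE_red, SSLF) for data stored as {X_j: [Y_1j, ..., Y_{n_j j}]}."""
    xs = [x for x, ys_j in groups.items() for _ in ys_j]   # X_j repeated n_j times
    ys = [y for ys_j in groups.values() for y in ys_j]
    n = len(ys)
    xbar, ybar = sum(xs) / n, sum(ys) / n

    ssto = sum((y - ybar) ** 2 for y in ys)

    # Full model: mu_j is estimated by the group mean Ybar_j.
    sspe = 0.0
    for ys_j in groups.values():
        m = sum(ys_j) / len(ys_j)
        sspe += sum((y - m) ** 2 for y in ys_j)

    # Reduced model: least-squares line b0 + b1 * X fitted to all n points.
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse_red = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

    return ssto, sspe, sse_red, sse_red - sspe   # SSLF = SSE_red - SSE_full


# Hypothetical data whose group means (2, 4, 6) lie exactly on a line, so SSLF = 0.
print(lack_of_fit_ss({1: [1, 3], 2: [3, 5], 3: [5, 7]}))  # (22.0, 6.0, 6.0, 0.0)
```

When the group means fall exactly on a straight line, the reduced model loses nothing relative to the full model, so \(SSE_{red} = SSPE\) and \(SSLF = 0\).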

Degrees of freedom

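\[d.f.(SSPE) = n - c; \quad d.f.(SSLF) = d.f.(SSE_{red}) - d.f.(SSPE) = (n - 2) - (n - c) = c - 2.\]

ANOVA Table

\[
\begin{array}{lcccc}
\text{Source} & \text{d.f.} & SS & MS = SS/\text{d.f.} & F\text{-statistic} \\
\hline
\text{Regression} & 1 & SSR & MSR & MSR/MSE \\
\text{Error} & n-2 & SSE = SSLF + SSPE & MSE & \\
\quad\text{Lack of fit} & c-2 & SSLF & MSLF & MSLF/MSPE \\
\quad\text{Pure error} & n-c & SSPE & MSPE & \\
\hline
\text{Total} & n-1 & SSTO = SSR + SSLF + SSPE & &
\end{array}
\]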
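The table can be assembled mechanically once \(SSR\), \(SSLF\) and \(SSPE\) are known. A minimal sketch; the helper `anova_table` and its argument names are hypothetical, chosen for illustration. It shows how the error degrees of freedom split as \((n-2) = (c-2) + (n-c)\) and how each mean square is \(SS/\text{d.f.}\).

```python
def anova_table(ssr, sslf, sspe, n, c):
    """Assemble the lack-of-fit ANOVA table (d.f., SS, MS, F) from SSR, SSLF and SSPE.

    Hypothetical helper: n is the total number of observations and
    c the number of distinct X values.
    """
    if not (c > 2 and n > c):
        raise ValueError("need c > 2 distinct X values and n > c observations")
    sse = sslf + sspe                                   # SSE = SSLF + SSPE
    df = {"Regression": 1, "Error": n - 2, "Lack of fit": c - 2,
          "Pure error": n - c, "Total": n - 1}
    ss = {"Regression": ssr, "Error": sse, "Lack of fit": sslf,
          "Pure error": sspe, "Total": ssr + sse}       # SSTO = SSR + SSLF + SSPE
    ms = {k: ss[k] / df[k]
          for k in ("Regression", "Error", "Lack of fit", "Pure error")}
    f = {"Regression": ms["Regression"] / ms["Error"],         # MSR / MSE
         "Lack of fit": ms["Lack of fit"] / ms["Pure error"]}  # MSLF / MSPE
    return df, ss, ms, f


# Hypothetical inputs, only to show the degrees-of-freedom partition.
df, ss, ms, f = anova_table(ssr=10.0, sslf=4.0, sspe=6.0, n=9, c=4)
print(df)  # {'Regression': 1, 'Error': 7, 'Lack of fit': 2, 'Pure error': 5, 'Total': 8}
```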

 

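Reject \(H_{0}: \mu_{j} = \beta_{0} + \beta_{1}X_{j}\) for all \(j\) at level \(\alpha\) if \(F^*_{LF} = \frac{MSLF}{MSPE} > F(1 - \alpha; c - 2, n - c)\), where \(MSLF = SSLF/(c-2)\), \(MSPE = SSPE/(n-c)\), and \(F(1 - \alpha; c - 2, n - c)\) is the \(100(1-\alpha)\) percentile of the \(F\) distribution with \(c - 2\) and \(n - c\) degrees of freedom.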
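The decision rule needs the percentile \(F(1-\alpha; c-2, n-c)\). Python's standard library has no \(F\) quantile function, so this sketch (an assumption of this example, not the text's method) computes the \(F\) CDF through the regularized incomplete beta function, \(P(F \le f) = I_{d_1 f/(d_1 f + d_2)}(d_1/2, d_2/2)\), evaluated with a modified-Lentz continued fraction, and inverts it by bisection. All function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued-fraction part of the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),            # even term
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd term
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, d1, d2):
    """P(F <= f) for F ~ F(d1, d2)."""
    if f <= 0.0:
        return 0.0
    return reg_inc_beta(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def f_quantile(p, d1, d2, tol=1e-10):
    """F(p; d1, d2): the value f with f_cdf(f) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < p:   # expand the bracket until it contains the quantile
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lack_of_fit_test(mslf, mspe, n, c, alpha=0.05):
    """Return (F*_LF, F(1 - alpha; c - 2, n - c), whether H0 is rejected)."""
    f_star = mslf / mspe
    crit = f_quantile(1.0 - alpha, c - 2, n - c)
    return f_star, crit, f_star > crit


# Mean squares from the growth-rate example: F* is about 38.43, the critical value about 4.534.
print(lack_of_fit_test(mslf=214.56, mspe=5.583, n=12, c=6))
```

If SciPy is available, `scipy.stats.f.ppf(1 - alpha, c - 2, n - c)` returns the same percentile; the hand-rolled version only keeps the sketch free of dependencies.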

Example: Growth rate data

In the following example, data are available on the effect of a dietary supplement on the growth rates of rats. Here \(X\) is the dose of the dietary supplement and \(Y\) is the growth rate. The following table presents the data in a form suitable for the analysis.

\[
\begin{array}{c|cccccc}
j & 1 & 2 & 3 & 4 & 5 & 6 \\
\hline
X_{j} & 10 & 15 & 20 & 25 & 30 & 35 \\
n_{j} & 2 & 2 & 2 & 3 & 1 & 2 \\
\hline
Y_{1j} & 73 & 85 & 90 & 87 & 75 & 65 \\
Y_{2j} & 78 & 88 & 91 & 86 & & 63 \\
Y_{3j} & & & & 91 & & \\
\hline
\bar{Y}_{j} & 75.5 & 86.5 & 90.5 & 88 & 75 & 64
\end{array}
\]

So, for these data, \(c = 6\) and \(n = \sum_{j = 1}^{c}n_{j} = 12\).
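To reproduce the example, the growth-rate table can be entered as a Python dict mapping each dose \(X_j\) to its replicate growth rates. This layout is an assumption of the sketch, not part of the original analysis; the code recovers \(c\), \(n\) and the group means from the table.

```python
# Growth-rate data: dose X_j -> replicate growth rates Y_ij.
growth = {
    10: [73, 78],
    15: [85, 88],
    20: [90, 91],
    25: [87, 86, 91],
    30: [75],
    35: [65, 63],
}

c = len(growth)                               # number of distinct doses
n = sum(len(ys) for ys in growth.values())    # total number of observations
ybar_j = {x: sum(ys) / len(ys) for x, ys in growth.items()}
ybar = sum(sum(ys) for ys in growth.values()) / n

print(c, n)     # 6 12
print(ybar_j)   # {10: 75.5, 15: 86.5, 20: 90.5, 25: 88.0, 30: 75.0, 35: 64.0}
print(ybar)     # 81.0
```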

\(SSTO = \sum_{j}\sum_{i}(Y_{ij} - \bar{Y})^2 = 1096.00\)

\(SSPE = \sum_{j}\sum_{i}Y_{ij}^2 - \sum_{j}n_{j}\bar{Y}_{j}^2 = 79828 - 79794.5 = 33.50\)
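As a check on the arithmetic, this sketch computes \(SSTO\) and evaluates \(SSPE\) two ways: with the computational shortcut \(\sum_j\sum_i Y_{ij}^2 - \sum_j n_j \bar{Y}_j^2\) and with the deviation form \(\sum_j\sum_i (Y_{ij} - \bar{Y}_j)^2\). The dict layout is the same assumed encoding of the data table.

```python
growth = {10: [73, 78], 15: [85, 88], 20: [90, 91],
          25: [87, 86, 91], 30: [75], 35: [65, 63]}

ys = [y for ys_j in growth.values() for y in ys_j]
n = len(ys)
ybar = sum(ys) / n

ssto = sum((y - ybar) ** 2 for y in ys)

# Computational shortcut for SSPE.
sum_sq = sum(y * y for y in ys)                                  # sum of Y_ij^2
weighted_means_sq = sum(len(ys_j) * (sum(ys_j) / len(ys_j)) ** 2
                        for ys_j in growth.values())              # sum of n_j * Ybar_j^2
sspe_short = sum_sq - weighted_means_sq

# Deviation form for SSPE: squared deviations from each group mean.
sspe_dev = sum((y - sum(ys_j) / len(ys_j)) ** 2
               for ys_j in growth.values() for y in ys_j)

print(ssto)                        # 1096.0
print(sum_sq, weighted_means_sq)   # 79828 79794.5
print(sspe_short, sspe_dev)        # 33.5 33.5
```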

\(SSE_{red} = \sum_{j}\sum_{i}(Y_{ij} - \hat{\beta}_{0} - \hat{\beta}_{1}X_{j})^2 = 891.73\), with least-squares estimates \(\hat{\beta}_{0} = 92.003\) and \(\hat{\beta}_{1} = -0.498\).
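The least-squares estimates come from the usual formulas \(\hat{\beta}_1 = S_{xy}/S_{xx}\) and \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}\), applied to all 12 \((X, Y)\) pairs, with each dose repeated once per replicate. A minimal sketch using the same assumed dict encoding of the data:

```python
growth = {10: [73, 78], 15: [85, 88], 20: [90, 91],
          25: [87, 86, 91], 30: [75], 35: [65, 63]}

# One (X, Y) pair per observation.
pairs = [(x, y) for x, ys in growth.items() for y in ys]
n = len(pairs)
xbar = sum(x for x, _ in pairs) / n
ybar = sum(y for _, y in pairs) / n

sxx = sum((x - xbar) ** 2 for x, _ in pairs)
sxy = sum((x - xbar) * (y - ybar) for x, y in pairs)
b1 = sxy / sxx                # slope estimate
b0 = ybar - b1 * xbar         # intercept estimate

sse_red = sum((y - b0 - b1 * x) ** 2 for x, y in pairs)

print(round(b0, 3), round(b1, 3), round(sse_red, 2))  # 92.003 -0.498 891.73
```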

\(SSLF = SSE_{red} - SSPE = 891.73 - 33.50 = 858.23\)

\(d.f.(SSPE) = n - c = 6\)

\(d.f.(SSLF) = c - 2 = 4\)

\(MSLF = \frac{SSLF}{c - 2} = 214.5575\)

\(MSPE = \frac{SSPE}{n -c} = 5.5833\)
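The remaining steps are plain arithmetic on the sums of squares reported above. A short sketch reproducing \(SSLF\), the two degrees of freedom, and the mean squares \(MSLF\) and \(MSPE\):

```python
# Sums of squares as reported in the example.
sspe, sse_red = 33.50, 891.73
n, c = 12, 6

sslf = sse_red - sspe          # SSLF = SSE_red - SSPE
df_pe, df_lf = n - c, c - 2    # d.f.(SSPE), d.f.(SSLF)
mslf = sslf / df_lf
mspe = sspe / df_pe

print(round(sslf, 2), df_pe, df_lf)    # 858.23 6 4
print(round(mslf, 4), round(mspe, 4))  # 214.5575 5.5833
```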

ANOVA Table:

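\[
\begin{array}{lcccc}
\text{Source} & \text{d.f.} & SS & MS & F^{*} \\
\hline
\text{Regression} & 1 & 204.27 & 204.27 & MSR/MSE = 2.29 \\
\text{Error} & 10 & 891.73 & 89.173 & \\
\quad\text{Lack of fit} & 4 & 858.23 & 214.56 & MSLF/MSPE = 38.43 \\
\quad\text{Pure error} & 6 & 33.50 & 5.583 & \\
\hline
\text{Total} & 11 & 1096.00 & &
\end{array}
\]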

Since \(F(0.95; 4, 6) = 4.534\) and \(F^{*}_{LF} = 38.43 > 4.534\), we reject \(H_{0}: \mu_{j} = \beta_{0} + \beta_{1}X_{j}\) for all \(j\) at the 5% level of significance; that is, the data show significant lack of fit of the linear model.

Also, suppose we test \(H_{0}': \beta_{1} = 0\) against \(H_{1}': \beta_{1} \neq 0\) assuming that the linear model holds, as in the usual ANOVA for linear regression. The corresponding test statistic is \(F^{*} = \frac{MSR}{MSE} = 2.29\), which is less than \(F(0.95; 1, n-2) = F(0.95; 1, 10) = 4.965\). So, if one assumes that the linear model holds, the test cannot reject \(H_{0}'\) at the 5% level of significance.
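The two tests can be put side by side using the sums of squares from the example's ANOVA table and the tabled percentiles \(F(0.95; 4, 6) \approx 4.534\) and \(F(0.95; 1, 10) \approx 4.965\). This is a sketch of the arithmetic only. One reading of the contrast is that the slope test's conclusion rests on the linear model, which the lack-of-fit test rejects.

```python
# Sums of squares and sizes from the growth-rate ANOVA table.
ssr, sse, sslf, sspe = 204.27, 891.73, 858.23, 33.50
n, c = 12, 6

f_lf = (sslf / (c - 2)) / (sspe / (n - c))   # MSLF / MSPE
f_slope = (ssr / 1) / (sse / (n - 2))        # MSR / MSE

# Tabled 95th percentiles: F(0.95; 4, 6) and F(0.95; 1, 10).
crit_lf, crit_slope = 4.534, 4.965

print(round(f_lf, 2), f_lf > crit_lf)            # 38.43 True
print(round(f_slope, 2), f_slope > crit_slope)   # 2.29 False
```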

Contributors

  • Joy Wei
  • Debashis Paul