Test for Lack of Fit

Lack of Fit

When we have repeated measurements of the response at several values of the predictor variable $$X$$, it is possible to test whether a linear model fits the data.

Suppose that we have data that can be expressed in the form:

$$\{(X_j,Y_{ij}) : i = 1, \dots, n_{j}; \; j = 1, \dots, c\}$$, where $$c > 2$$.

Assume that the data come from the model:

$Y_{ij} = \mu_{j} + \varepsilon_{ij}, \quad i = 1, \dots, n_{j}; \; j = 1, \dots, c. \qquad (1)$

The null hypothesis in which the linear model holds is: $$H_{0}: \mu_{j} = \beta_{0} + \beta_{1}X_{j}$$, for all $$j = 1, ..., c$$.

Here (1) is the full model and the model specified by $$H_{0}$$ is the reduced model. We follow the usual procedure for the ANOVA, by computing the sum of squares due to errors for the full and reduced models.

Let $$\bar{Y}_{j} = \frac{1}{n_{j}} \sum_{i = 1}^{n_{j}} Y_{ij}$$ be the mean response at $$X_{j}$$, and let $$\bar{Y} = \frac{1}{n}\sum_{j=1}^{c}n_{j}\bar{Y}_{j} = \frac{1}{n}\sum_{j=1}^{c}\sum_{i=1}^{n_j}Y_{ij}$$ be the overall mean, where $$n = \sum_{j=1}^{c}n_{j}$$.

$$SSTO = \sum_{j=1}^{c}\sum_{i=1}^{n_j}(Y_{ij} - \bar{Y})^2$$ and

$SSPE = SSE_{full} = \sum_{j=1}^{c}\sum_{i=1}^{n_j}(Y_{ij} - \bar{Y}_{j})^2 = \sum_{j=1}^{c}\sum_{i=1}^{n_j}Y_{ij}^2 - \sum_{j=1}^{c}n_{j}\bar{Y}_{j}^2$

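$SSE_{red} = SSE = \sum_{j=1}^{c}\sum_{i=1}^{n_j}(Y_{ij} - \hat{\beta}_{0} - \hat{\beta}_{1}X_{j})^2$, where $$\hat{\beta}_{0}$$ and $$\hat{\beta}_{1}$$ are the least-squares estimates under the reduced model.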

$SSLF = SSE_{red} - SSE_{full}$.
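A short derivation, added here as a sketch, shows why $$SSLF$$ measures departure from linearity. The symbol $$\hat{Y}_{j}$$ is notation introduced for this sketch and denotes the fitted value of the reduced model at $$X_{j}$$, computed from the least-squares estimates of $$\beta_{0}$$ and $$\beta_{1}$$. Write each residual of the reduced model as

$$Y_{ij} - \hat{Y}_{j} = (Y_{ij} - \bar{Y}_{j}) + (\bar{Y}_{j} - \hat{Y}_{j}).$$

Squaring and summing over $$i$$ and $$j$$, the cross term vanishes because $$\sum_{i=1}^{n_j}(Y_{ij} - \bar{Y}_{j}) = 0$$ for every $$j$$, so

$$SSE_{red} = \sum_{j=1}^{c}\sum_{i=1}^{n_j}(Y_{ij} - \bar{Y}_{j})^2 + \sum_{j=1}^{c}n_{j}(\bar{Y}_{j} - \hat{Y}_{j})^2 = SSPE + \sum_{j=1}^{c}n_{j}(\bar{Y}_{j} - \hat{Y}_{j})^2.$$

Hence $$SSLF = \sum_{j=1}^{c}n_{j}(\bar{Y}_{j} - \hat{Y}_{j})^2$$, which is nonnegative and is small when the group means lie close to the fitted line.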

Degrees of freedom

$d.f.(SSPE) = n - c; d.f.(SSLF) = d.f.(SSE_{red}) - d.f.(SSPE) = (n - 2) - (n - c) = c - 2.$

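| Source | d.f. | SS | MS = SS/d.f. | F-statistic |
|---|---|---|---|---|
| Regression | $$1$$ | $$SSR$$ | $$MSR$$ | $$MSR/MSE$$ |
| Error | $$n-2$$ | $$SSE = SSLF + SSPE$$ | $$MSE$$ | |
| Lack of fit | $$c-2$$ | $$SSLF$$ | $$MSLF$$ | $$MSLF/MSPE$$ |
| Pure error | $$n-c$$ | $$SSPE$$ | $$MSPE$$ | |
| Total | $$n-1$$ | $$SSTO = SSR + SSLF + SSPE$$ | | |

Reject $$H_{0} : \mu_{j} = \beta_{0} + \beta_{1}X_{j}$$ for all $$j$$ at level $$\alpha$$ if $$F^*_{LF} = \frac{MSLF}{MSPE} > F(1 - \alpha; c - 2, n - c)$$.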
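The test statistic can be computed directly from the definitions above. The following is a minimal sketch using only the Python standard library. The function name `lack_of_fit`, its dictionary input layout, and the toy data are illustrative assumptions and not part of the original notes. The critical value $$F(1 - \alpha; c - 2, n - c)$$ still has to be looked up separately.

```python
from statistics import fmean


def lack_of_fit(groups):
    """Lack-of-fit F statistic for a simple linear regression.

    `groups` maps each distinct predictor value X_j to the list of
    responses Y_ij observed there (hypothetical input layout).
    Assumes SSPE > 0, i.e. the replicates are not all identical.
    """
    c = len(groups)
    n = sum(len(ys) for ys in groups.values())
    if c <= 2 or n <= c:
        raise ValueError("need c > 2 levels and at least one replicated level")

    # Flatten to one (x, y) pair per observation.
    xs = [x for x, ys in groups.items() for _ in ys]
    all_y = [y for ys in groups.values() for y in ys]
    x_bar, y_bar = fmean(xs), fmean(all_y)

    # Least-squares estimates for the reduced (linear) model.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, all_y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # Pure error: variation of the responses around their own group mean.
    sspe = sum((y - fmean(ys)) ** 2 for ys in groups.values() for y in ys)
    # Error of the reduced model: variation around the fitted line.
    sse_red = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, all_y))
    sslf = sse_red - sspe

    mslf = sslf / (c - 2)
    mspe = sspe / (n - c)
    return {"SSPE": sspe, "SSLF": sslf, "df_LF": c - 2, "df_PE": n - c,
            "F_LF": mslf / mspe}


# Made-up toy data: the group means 2, 4, 6 lie exactly on a line,
# so the lack-of-fit sum of squares should be zero.
toy = {1: [1, 3], 2: [3, 5], 3: [5, 7]}
result = lack_of_fit(toy)
print(result)
```

In the toy data all of the residual variation is pure error, so $$SSLF = 0$$ and the statistic gives no evidence against the linear model.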

Example: Growth rate data

In the following example, data are available on the effect of a dietary supplement on the growth rates of rats. Here $$X =$$ dose of dietary supplement and $$Y =$$ growth rate. The following table presents the data in a form suitable for the analysis.

| $$j = 1$$ | $$j = 2$$ | $$j = 3$$ | $$j = 4$$ | $$j = 5$$ | $$j = 6$$ |
|---|---|---|---|---|---|
| $$X_{1} = 10$$ | $$X_{2} = 15$$ | $$X_{3} = 20$$ | $$X_{4} = 25$$ | $$X_{5} = 30$$ | $$X_{6} = 35$$ |
| $$n_{1} = 2$$ | $$n_{2} = 2$$ | $$n_{3} = 2$$ | $$n_{4} = 3$$ | $$n_{5} = 1$$ | $$n_{6} = 2$$ |
| $$Y_{11} = 73$$ | $$Y_{12} = 85$$ | $$Y_{13} = 90$$ | $$Y_{14} = 87$$ | $$Y_{15} = 75$$ | $$Y_{16} = 65$$ |
| $$Y_{21} = 78$$ | $$Y_{22} = 88$$ | $$Y_{23} = 91$$ | $$Y_{24} = 86$$ | | $$Y_{26} = 63$$ |
| | | | $$Y_{34} = 91$$ | | |
| $$\bar{Y}_{1} = 75.5$$ | $$\bar{Y}_{2} = 86.5$$ | $$\bar{Y}_{3} = 90.5$$ | $$\bar{Y}_{4} = 88$$ | $$\bar{Y}_{5} = 75$$ | $$\bar{Y}_{6} = 64$$ |

So, for these data, $$c = 6$$ and $$n = \sum_{j = 1}^{c}n_{j} = 12$$.

$$SSTO = \sum_{j}\sum_{i}(Y_{ij} - \bar{Y})^2 = 1096.00$$

$$SSPE = \sum_{j}\sum_{i}Y_{ij}^2 - \sum_{j}n_{j}\bar{Y}_{j}^2 = 79828 - 79794.5 = 33.50$$

$$SSE_{red} = \sum_{j}\sum_{i}(Y_{ij} - \hat{\beta}_{0} - \hat{\beta}_{1}X_{j})^2 = 891.73$$ (Note: the least-squares estimates are $$\hat{\beta}_{0} = 92.003$$ and $$\hat{\beta}_{1} = -0.498$$.)

$$SSLF = SSE_{red} - SSPE = 891.73 - 33.50 = 858.23$$

$$d.f.(SSPE) = n - c = 6$$

$$d.f.(SSLF) = c - 2 = 4$$

$$MSLF = \frac{SSLF}{c - 2} = 214.5575$$

$$MSPE = \frac{SSPE}{n -c} = 5.5833$$

ANOVA Table:

| Source | d.f. | SS | MS | F* |
|---|---|---|---|---|
| Regression | 1 | 204.27 | 204.27 | $$\frac{MSR}{MSE} =$$ 2.29 |
| Error | 10 | 891.73 | 89.173 | |
| Lack of fit | 4 | 858.23 | 214.56 | $$\frac{MSLF}{MSPE} =$$ 38.43 |
| Pure error | 6 | 33.50 | 5.583 | |
| Total | 11 | 1096.00 | | |

$$F(0.95;4,6) = 4.534$$. Since $$F_{LF}^\star = 38.43 > 4.534$$, reject $$H_{0} : \mu_{j} = \beta_{0} + \beta_{1}X_{j}$$ for all $$j$$ at the 5% level of significance.

Also, if you test $$H_{0}' : \beta_{1} = 0$$ against $$H_{1}' : \beta_{1} \neq 0$$ assuming that the linear model holds, as in the usual ANOVA for linear regression, the corresponding test statistic is $$F^\star = \frac{MSR}{MSE} = 2.29$$, which is less than $$F(0.95;1,n-2) = F(0.95;1,10) = 4.965$$. So, if one assumes that the linear model holds, the test cannot reject $$H_{0}'$$ at the 5% level of significance. Since the lack-of-fit test rejects the linear model for these data, this result should not be read as evidence that the dose has no effect on growth rate.
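The arithmetic in this example can be checked with a short script. This is a sketch using only the Python standard library. The variable names are illustrative choices, and the data are the growth rate observations from the table above.

```python
# Growth rate data from the example: dose X_j -> growth rates Y_ij.
data = {10: [73, 78], 15: [85, 88], 20: [90, 91],
        25: [87, 86, 91], 30: [75], 35: [65, 63]}

# One (x, y) pair per observation.
xs = [x for x, group in data.items() for _ in group]
ys = [y for group in data.values() for y in group]
n, c = len(ys), len(data)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Least-squares fit of the reduced (linear) model.
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Sums of squares as defined in the text.
ssto = sum((y - y_bar) ** 2 for y in ys)
sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
ssr = ssto - sse
sspe = sum((y - sum(g) / len(g)) ** 2 for g in data.values() for y in g)
sslf = sse - sspe

# F statistics for the slope test and the lack-of-fit test.
f_reg = (ssr / 1) / (sse / (n - 2))
f_lf = (sslf / (c - 2)) / (sspe / (n - c))

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"SSTO = {ssto:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"SSPE = {sspe:.2f}, SSLF = {sslf:.2f}")
print(f"F* = {f_reg:.2f}, F*_LF = {f_lf:.2f}")
```

The printed values agree with the ANOVA table to the reported precision: $$SSTO = 1096.00$$, $$SSPE = 33.50$$, $$SSLF = 858.23$$, $$F^\star = 2.29$$ and $$F_{LF}^\star = 38.43$$.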

Contributors

• Joy Wei
• Debashis Paul