Diagnostics for residuals (continued)

Nonnormality of errors

This can be studied graphically with the normal probability plot, or Q-Q (quantile-quantile) plot. In this plot, the ordered residuals (the observed quantiles) are plotted against the quantiles expected if the $$\epsilon_i$$'s are independent and approximately normal with mean 0 and variance MSE. That is, the k-th smallest residual $$e_{(k)}$$ is plotted against

$$\sqrt{MSE}\; z\left(\dfrac{k-0.375}{n+0.25}\right),$$

where $$z(q)$$ is the $$q$$-th quantile of the $$N(0,1)$$ distribution ($$0 < q < 1$$) and $$n$$ is the number of observations. If the errors are normally distributed, the points in the plot should lie approximately along the diagonal line. Systematic departures from it may indicate skewness or a heavier-tailed distribution than the normal.
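A minimal Python sketch of the expected values above, using only the standard library (`statistics.NormalDist().inv_cdf` plays the role of $$z(q)$$). The function names are illustrative, and the correlation helper is an added summary rather than something defined in the text: a correlation between the ordered residuals and their expected values close to 1 is consistent with normal errors.

```python
import math
from statistics import NormalDist

def expected_normal_scores(n, mse):
    """Expected value of the k-th smallest residual under normality,
    sqrt(MSE) * z((k - 0.375) / (n + 0.25)), for k = 1, ..., n."""
    z = NormalDist().inv_cdf  # quantile function of N(0, 1)
    root_mse = math.sqrt(mse)
    return [root_mse * z((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]

def qq_points(residuals, mse):
    """(expected value, observed residual) pairs for a normal Q-Q plot."""
    ordered = sorted(residuals)  # ordered[k - 1] is the k-th smallest residual
    return list(zip(expected_normal_scores(len(ordered), mse), ordered))

def qq_correlation(residuals, mse):
    """Pearson correlation between ordered residuals and their expected values."""
    pts = qq_points(residuals, mse)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pts)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

# Usage with a small, made-up set of residuals (they sum to 0, as OLS residuals do).
residuals = [0.4, -1.1, 0.2, 0.9, -0.3, -0.1]
mse = sum(e * e for e in residuals) / (len(residuals) - 2)  # n - 2 degrees of freedom
for expected, observed in qq_points(residuals, mse):
    print(f"{expected:7.3f} {observed:7.3f}")
print(round(qq_correlation(residuals, mse), 3))
```

Plotting the printed pairs (expected on the horizontal axis) gives the normal probability plot described above.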

(a) True model: $$Y = 2 + 3X + \epsilon$$, where $$\epsilon \sim N(0,1)$$; 100 observations with $$X_i = i/10$$, $$i = 1, \ldots, 100$$.

Coefficients Estimate Std. Error t-statistic P-value
Intercept 1.5413 0.2196 7.02 $$2.92 \times 10^{-10}$$
Slope 3.08907 0.03775 81.84 $$< 2 \times 10^{-16}$$

$$\sqrt{MSE} = 1.09$$, $$R^2 = 0.9856$$.
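The setting of example (a) can be reproduced approximately by simulation. The sketch below (Python, standard library only) fits the line by least squares and computes the quantities reported in the table; `ols_fit` is a hypothetical helper name, and because the random draws differ from the ones used in the text, the printed numbers will be close to, but not equal to, the reported ones.

```python
import math
import random

def ols_fit(x, y):
    """Least-squares fit of y = b0 + b1*x with the usual summary statistics."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in residuals)
    sst = sum((yi - ybar) ** 2 for yi in y)
    mse = sse / (n - 2)  # n - 2 degrees of freedom
    se_b0 = math.sqrt(mse * (1 / n + xbar ** 2 / sxx))
    se_b1 = math.sqrt(mse / sxx)
    return {
        "b0": b0, "se_b0": se_b0, "t_b0": b0 / se_b0,
        "b1": b1, "se_b1": se_b1, "t_b1": b1 / se_b1,
        "root_mse": math.sqrt(mse), "r2": 1 - sse / sst,
        "residuals": residuals,
    }

# Model (a): Y = 2 + 3X + eps, eps ~ N(0, 1), X_i = i/10, i = 1, ..., 100.
rng = random.Random(2024)  # arbitrary seed
x = [i / 10 for i in range(1, 101)]
y = [2 + 3 * xi + rng.gauss(0, 1) for xi in x]
fit = ols_fit(x, y)
for key in ("b0", "se_b0", "t_b0", "b1", "se_b1", "t_b1", "root_mse", "r2"):
    print(key, round(fit[key], 4))
```

The returned residuals can be passed straight to a Q-Q plot routine.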

(b) True model: $$Y = 2 + 3X + \epsilon$$, where $$\epsilon \sim t_5$$ (heavier tails than the normal); 100 observations with $$X_i = i/10$$, $$i = 1, \ldots, 100$$.

Coefficients Estimate Std. Error t-statistic P-value
Intercept 2.11144 0.28279 7.467 $$3.42 \times 10^{-11}$$
Slope 2.97458 0.04862 61.185 $$< 2 \times 10^{-16}$$

$$\sqrt{MSE} = 1.403$$, $$R^2 = 0.9745$$.

(c) True model: $$Y = 2 + 3X + \epsilon$$, where $$\epsilon \sim \chi^2_5 - 5$$ (mean 0, skewed to the right); 100 observations with $$X_i = i/10$$, $$i = 1, \ldots, 100$$.

Coefficients Estimate Std. Error t-statistic P-value
Intercept 2.4615 0.6533 3.768 0.000281
Slope 2.9894 0.1123 26.617 $$< 2 \times 10^{-16}$$

$$\sqrt{MSE} = 3.242$$, $$R^2 = 0.8785$$.
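The error distributions in examples (b) through (d) can be generated with Python's `random` module. This sketch relies on two standard facts: a $$\chi^2_5$$ variable is a gamma variable with shape 5/2 and scale 2 (mean 5, variance 10), and $$Z/\sqrt{V/5}$$ with independent $$Z \sim N(0,1)$$ and $$V \sim \chi^2_5$$ has a $$t_5$$ distribution. The sample skewness shows the right skew of $$\chi^2_5 - 5$$ and the left skew of its mirror image $$5 - \chi^2_5$$; the helper names are illustrative.

```python
import math
import random

rng = random.Random(7)  # arbitrary seed

def chi2_5():
    # chi-square with 5 df = Gamma(shape 5/2, scale 2): mean 5, variance 10
    return rng.gammavariate(2.5, 2.0)

def t5_error():
    # model (b): t with 5 df, built as Z / sqrt(V / 5)
    return rng.gauss(0, 1) / math.sqrt(chi2_5() / 5)

def right_skewed_error():
    # model (c): chi2_5 - 5, mean 0, long right tail
    return chi2_5() - 5

def left_skewed_error():
    # model (d): 5 - chi2_5, mean 0, long left tail
    return 5 - chi2_5()

def sample_skewness(values):
    """Moment estimate m3 / m2^(3/2)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

for name, draw in [("t5", t5_error), ("chi2_5 - 5", right_skewed_error),
                   ("5 - chi2_5", left_skewed_error)]:
    sample = [draw() for _ in range(20000)]
    mean = sum(sample) / len(sample)
    print(f"{name:12s} mean {mean:6.3f}  skewness {sample_skewness(sample):6.2f}")
```

The skewness printed for $$t_5$$ should be near 0 but can vary noticeably from run to run, because the $$t_5$$ distribution has heavy tails.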

(d) True model: $$Y = 2 + 3X + \epsilon$$, where $$\epsilon \sim 5 - \chi^2_5$$ (mean 0, skewed to the left); 100 observations with $$X_i = i/10$$, $$i = 1, \ldots, 100$$.

Coefficients Estimate Std. Error t-statistic P-value
Intercept 2.7402 0.4694 5.838 $$6.87 \times 10^{-8}$$
Slope 2.9896 0.0807 37.048 $$< 2 \times 10^{-16}$$

$$\sqrt{MSE} = 2.329$$, $$R^2 = 0.9334$$.
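For this design the $$R^2$$ values can be cross-checked from the slope and $$\sqrt{MSE}$$ alone, since $$SSR = b_1^2 S_{xx}$$, $$SSE = (n-2)\,MSE$$ and $$R^2 = SSR/(SSR + SSE)$$, with $$S_{xx} = \sum (X_i - \bar X)^2 = 833.25$$ for $$X_i = i/10$$. A short Python check using the slope estimates and $$\sqrt{MSE}$$ values reported in examples (a) through (d):

```python
n = 100
x = [i / 10 for i in range(1, n + 1)]
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)  # 833.25 for this design

def r2_from_summary(b1, root_mse):
    """R^2 = SSR / (SSR + SSE), with SSR = b1^2 * Sxx and SSE = (n - 2) * MSE."""
    ssr = b1 ** 2 * sxx
    sse = (n - 2) * root_mse ** 2
    return ssr / (ssr + sse)

# (slope estimate, sqrt(MSE)) as reported in examples (a)-(d)
reported = {"(a)": (3.08907, 1.09), "(b)": (2.97458, 1.403),
            "(c)": (2.9894, 3.242), "(d)": (2.9896, 2.329)}
for label, (b1, root_mse) in reported.items():
    print(label, round(r2_from_summary(b1, root_mse), 4))
# prints 0.9856, 0.9745, 0.8785 and 0.9334 for (a)-(d)
```

Because $$R^2$$ must lie between 0 and 1, this identity is also a quick way to catch transcription errors in a results summary.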

Heteroscedasticity

Heteroscedasticity, or unequal variance: the variance of the error $$\epsilon_i$$ may sometimes depend on the value of $$X_i$$. This is often reflected in the plot of residuals versus X as an unequal spread of the residuals along the X-axis.

One possibility is that the variance increases or decreases with X. This is common in financial data, where transaction volume often affects market uncertainty. Another possibility is that the data come from different strata with different variabilities; for example, measuring instruments of different precision may have been used.
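Beyond inspecting the residual plot, one common numerical check for a variance that changes with X is the Brown-Forsythe (modified Levene) test: split the residuals into a low-X and a high-X group, take absolute deviations from each group's median, and compare the group means of those deviations with a two-sample t statistic. The sketch below is a minimal Python version; the simulated model with error standard deviation $$0.5X$$ is a hypothetical example, not one of the models in this section. Values of the statistic well beyond the $$t_{n-2}$$ critical value (about 1.98 at the 5% level for $$n = 100$$) suggest unequal variances.

```python
import math
import random
from statistics import median

def ls_residuals(x, y):
    """Residuals from a least-squares straight-line fit."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def brown_forsythe_t(x, residuals):
    """Two-group Brown-Forsythe statistic; positive when the residual spread
    is larger for high X than for low X."""
    cut = median(x)
    low = [e for xi, e in zip(x, residuals) if xi <= cut]
    high = [e for xi, e in zip(x, residuals) if xi > cut]
    med_low, med_high = median(low), median(high)
    d_low = [abs(e - med_low) for e in low]     # absolute deviations from group median
    d_high = [abs(e - med_high) for e in high]
    n1, n2 = len(d_low), len(d_high)
    m1, m2 = sum(d_low) / n1, sum(d_high) / n2
    pooled = (sum((d - m1) ** 2 for d in d_low)
              + sum((d - m2) ** 2 for d in d_high)) / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(pooled * (1 / n1 + 1 / n2))

rng = random.Random(11)  # arbitrary seed
x = [i / 10 for i in range(1, 101)]
y_equal = [2 + 3 * xi + rng.gauss(0, 1) for xi in x]            # constant variance
y_growing = [2 + 3 * xi + rng.gauss(0, 0.5 * xi) for xi in x]   # hypothetical: sd = 0.5 * X
print("equal variance:  ", round(brown_forsythe_t(x, ls_residuals(x, y_equal)), 2))
print("growing variance:", round(brown_forsythe_t(x, ls_residuals(x, y_growing)), 2))
```

Splitting at the median of X is a simple choice for a single predictor; with strata such as different instruments, the groups would instead be the strata themselves.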

(a) True model: $$Y = 2 + 3X + \epsilon$$, where $$\epsilon \sim 5 - \chi^2_5$$; 100 observations with $$X_i = i/10$$, $$i = 1, \ldots, 100$$.

Coefficients Estimate Std. Error t-statistic P-value
Intercept 1.0074 0.9729 1.035 0.303
Slope 3.3382 0.1673 19.958 $$< 2 \times 10^{-16}$$

$$\sqrt{MSE} \approx 4.83$$, $$R^2 \approx 0.80$$.
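Because $$SE(b_1) = \sqrt{MSE/S_{xx}}$$ and $$SE(b_0) = \sqrt{MSE\,(1/n + \bar X^2/S_{xx})}$$, the standard errors in a table like the one above determine $$\sqrt{MSE}$$, and with the slope they also determine $$R^2$$. A small Python check, applied first to the slope of example (d) in the nonnormality section as a sanity check of the formulas, then to this table:

```python
import math

n = 100
x = [i / 10 for i in range(1, n + 1)]
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)  # 833.25 for X_i = i/10

def root_mse_from_se_slope(se_b1):
    return se_b1 * math.sqrt(sxx)

def root_mse_from_se_intercept(se_b0):
    return se_b0 / math.sqrt(1 / n + xbar ** 2 / sxx)

def r2_from_summary(b1, root_mse):
    ssr = b1 ** 2 * sxx            # regression sum of squares
    sse = (n - 2) * root_mse ** 2  # error sum of squares
    return ssr / (ssr + sse)

print(round(root_mse_from_se_slope(0.0807), 3))     # example (d): about 2.329
root_mse = root_mse_from_se_slope(0.1673)           # this table
print(round(root_mse, 2))                           # about 4.83
print(round(root_mse_from_se_intercept(0.9729), 2)) # about 4.83
print(round(r2_from_summary(3.3382, root_mse), 2))  # about 0.8
```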

Contributors

• Chengcheng Zhang