Statistics LibreTexts

Regression through the origin

Sometimes, because of the nature of the problem (e.g. (i) a physical law in which one variable is proportional to another and the goal is to determine the constant of proportionality; (ii) X = sales, Y = profit from sales), or because of empirical considerations (in the full regression model the intercept \(\beta_0\) turns out to be insignificant), one may fit the model \(Y_i = \beta_1X_i + \varepsilon_i\), where the \(\varepsilon_i\) are assumed to be uncorrelated with mean 0 and variance \(\sigma^2\). The least-squares estimates are then:

$$\widehat \beta_1 = \widetilde b_1 = \frac{\sum_{i=1}^n X_iY_i}{\sum_{i=1}^n X_i^2},        \qquad \widetilde{SSE} = \sum_{i=1}^n (Y_i - \widetilde b_1 X_i)^2 = \sum_i Y_i^2 - \widetilde b_1^2 \sum_i X_i^2. $$
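
The estimate \(\widetilde b_1\) comes from minimizing the least-squares criterion for the no-intercept model. A short derivation sketch, where the symbol \(Q\) is notation introduced here for illustration:

$$ Q(\beta_1) = \sum_{i=1}^n (Y_i - \beta_1 X_i)^2, \qquad \frac{dQ}{d\beta_1} = -2\sum_{i=1}^n X_i (Y_i - \beta_1 X_i) = 0 \quad \Longrightarrow \quad \widetilde b_1 = \frac{\sum_{i=1}^n X_iY_i}{\sum_{i=1}^n X_i^2}. $$

The shortcut form of \(\widetilde{SSE}\) follows by expanding the square and substituting \(\sum_i X_iY_i = \widetilde b_1 \sum_i X_i^2\):

$$ \sum_i (Y_i - \widetilde b_1 X_i)^2 = \sum_i Y_i^2 - 2\widetilde b_1 \sum_i X_iY_i + \widetilde b_1^2 \sum_i X_i^2 = \sum_i Y_i^2 - \widetilde b_1^2 \sum_i X_i^2. $$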

Under this model, \(E(\widetilde b_1) = \beta_1\) and \(E(\widetilde{SSE}) = (n-1)\sigma^2\), so that \(\widetilde{MSE} = \frac{1}{n-1}\widetilde{SSE}\) is an unbiased estimator of \(\sigma^2\), with d.f.\((\widetilde{MSE}) = n-1\) since only one parameter is estimated. The variance of the slope estimate is Var\((\widetilde b_1) = \frac{\sigma^2}{\sum_i X_i^2}\), which is estimated by \(s^2(\widetilde b_1) = \frac{\widetilde{MSE}}{\sum_i X_i^2}\).
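
The estimates above can be computed directly from the raw sums. The following is a minimal sketch in plain Python; the function name `fit_through_origin` and the four data points are hypothetical and chosen only to illustrate the arithmetic.

```python
import math

def fit_through_origin(x, y):
    """Least-squares fit of Y = beta1 * X + error (no intercept term)."""
    n = len(x)
    sum_x2 = sum(xi * xi for xi in x)               # sum of X_i^2 (uncentered)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # sum of X_i * Y_i
    b1 = sum_xy / sum_x2                            # slope estimate
    sse = sum((yi - b1 * xi) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 1)                             # n - 1 d.f.: one parameter estimated
    se_b1 = math.sqrt(mse / sum_x2)                 # estimated standard error of b1
    return {"b1": b1, "sse": sse, "mse": mse, "se_b1": se_b1, "df": n - 1}

# Hypothetical data, roughly proportional (Y is about 2 * X).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
fit = fit_through_origin(x, y)
print(fit)
```

Note that the sums are not centered at the sample means, which is the main computational difference from the model with an intercept.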

  • 100(1 - \(\alpha\))% confidence interval for \(\beta_1\) : \(\widetilde b_1 \pm t(1-\alpha/2; n-1) s(\widetilde b_1)\).
  • Estimate of mean response for \(X = X_h\) : \(\widetilde Y_h = \widetilde b_1 X_h\) with estimated standard error \(s(\widetilde Y_h) = \sqrt{\widetilde{MSE} \frac{X_h^2}{\sum_i X_i^2}}\).
  • 100(1 - \(\alpha\))% confidence interval for mean response : \(\widetilde Y_h \pm t(1-\alpha/2; n-1) s(\widetilde Y_h)\).
  • ANOVA decomposition : \(\widetilde{SSTO} =  \widetilde{SSR} + \widetilde{SSE}\), where \(\widetilde{SSTO} = \sum_i Y_i^2\) with d.f.\((\widetilde{SSTO}) = n\), and \(\widetilde{SSR} = \widetilde b_1^2 \sum_i X_i^2\) with d.f.\((\widetilde{SSR}) = 1\), so that \(\widetilde{MSR} = \widetilde{SSR}/1\). Reject \(H_0 : \beta_1 = 0\) if the F-ratio \(F^* = \frac{\widetilde{MSR}}{\widetilde{MSE}} > F(1-\alpha;1,n-1)\).
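
The interval and test formulas in the list above can be sketched in a few lines of plain Python. The function name `origin_inference`, the data, and the value \(X_h = 2.5\) are hypothetical. The critical values are passed in as arguments rather than computed; the ones used here, \(t(0.975; 3) = 3.182\) and \(F(0.95; 1, 3) = 10.13\), are taken from standard tables for n = 4.

```python
import math

def origin_inference(x, y, x_h, t_crit, f_crit):
    """Intervals and F-test for the no-intercept model Y = beta1 * X + error."""
    n = len(x)
    sum_x2 = sum(xi * xi for xi in x)
    b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum_x2
    ssto = sum(yi * yi for yi in y)       # uncorrected total sum of squares, d.f. n
    ssr = b1 ** 2 * sum_x2                # regression sum of squares, d.f. 1
    sse = ssto - ssr                      # error sum of squares, d.f. n - 1
    mse = sse / (n - 1)
    se_b1 = math.sqrt(mse / sum_x2)
    y_h = b1 * x_h                        # estimated mean response at X = x_h
    se_y_h = math.sqrt(mse * x_h ** 2 / sum_x2)
    f_star = (ssr / 1) / mse              # MSR / MSE
    return {
        "ci_b1": (b1 - t_crit * se_b1, b1 + t_crit * se_b1),
        "ci_mean": (y_h - t_crit * se_y_h, y_h + t_crit * se_y_h),
        "f_star": f_star,
        "t_star": b1 / se_b1,
        "reject_h0": f_star > f_crit,
    }

# Hypothetical data; critical values are table look-ups for n - 1 = 3 d.f.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
out = origin_inference(x, y, x_h=2.5, t_crit=3.182, f_crit=10.13)
print(out)
```

With a single slope parameter, \(F^*\) equals the square of the t-statistic \(\widetilde b_1 / s(\widetilde b_1)\), so the F-test and the two-sided t-test give the same decision.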

Inverse prediction, or calibration problem

In some experimental studies it is important to know the value of X needed to obtain (on average) a pre-specified value of Y. The following example illustrates such a situation.

X    10   15   15   20   20   20   25   25   28   30
Y   160  171  175  182  184  181  188  193  195  200

Here Y = tensile strength of paper, X = amount (percentage) of hardwood in the pulp.

We want to find \(X_{h(new)}\) for a given value of \(Y_{h(new)}\).

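The point estimate is \(\widehat X_{h(new)} = \frac{Y_{h(new)} - b_0}{b_1}\), where \(b_0\) and \(b_1\) are the least-squares estimates from the model with intercept. The estimated standard error of prediction is \(s(\widehat X_{h(new)})\), where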
$$ s^2(\widehat X_{h(new)}) = \frac{MSE}{b_1^2} \left[1+ \frac{1}{n} + \frac{(\widehat X_{h(new)} - \overline{X})^2}{\sum_i(X_i - \overline{X})^2}\right]. $$

A 100(1 - \(\alpha\))% prediction interval for \(X_{h(new)}\) is then given by \(\widehat X_{h(new)} \pm t(1-\alpha/2;n-2) s(\widehat X_{h(new)})\).

For the data above (n = 10), the fitted model is \(\widehat Y = 143.8244 + 1.8786X\), with SSE = 38.8328, SSTO = 1300.9, SSR = 1262.1, \(R^2\) = 0.9701, \(\sum_i (X_i - \overline{X})^2\) = 357.6, MSE = 4.8541, \(\overline{X}\) = 20.8, \(\overline{Y}\) = 182.9.
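
The calibration formulas can be applied to the tensile-strength data as follows. This is a minimal sketch in plain Python: the function name `inverse_prediction` is hypothetical, and the target value \(Y_{h(new)} = 185\) is an assumption made only for illustration, since the text does not specify one. The critical value \(t(0.975; 8) = 2.306\) is taken from a standard t table.

```python
import math

# Data from the table above: X = % hardwood in the pulp, Y = tensile strength.
X = [10, 15, 15, 20, 20, 20, 25, 25, 28, 30]
Y = [160, 171, 175, 182, 184, 181, 188, 193, 195, 200]

def inverse_prediction(x, y, y_new, t_crit):
    """Estimate the X that gives response y_new, with an approximate interval."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                        # slope of the model with intercept
    b0 = y_bar - b1 * x_bar               # intercept
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)                   # n - 2 d.f.: two parameters estimated
    x_hat = (y_new - b0) / b1             # point estimate of X_h(new)
    s2 = (mse / b1 ** 2) * (1 + 1 / n + (x_hat - x_bar) ** 2 / sxx)
    half_width = t_crit * math.sqrt(s2)
    return {"b0": b0, "b1": b1, "mse": mse, "x_hat": x_hat,
            "se": math.sqrt(s2),
            "interval": (x_hat - half_width, x_hat + half_width)}

# y_new = 185 is a hypothetical target; t_crit is a table look-up for 8 d.f.
result = inverse_prediction(X, Y, y_new=185.0, t_crit=2.306)
print(result)
```

The values of `b0`, `b1`, and `mse` returned by the sketch should agree with the fitted-model summary above, which gives a quick check that the data were entered correctly.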

Contributors

  • Yingwen Li 
  • Debashis Paul