# Inference in Simple Linear Regression



• Fact: Under the normal error regression model, $$(b_0,b_1)$$ and $$SSE$$ are independently distributed, and
$$\frac{b_0 - \beta_0}{s(b_0)} \sim t_{n-2}, \qquad \frac{b_1 - \beta_1}{s(b_1)} \sim t_{n-2}, \qquad \frac{SSE}{\sigma^2} \sim \chi_{n-2}^2.$$

• Confidence intervals for $$\beta_0$$ and $$\beta_1$$: the $$100(1-\alpha)\%$$ (two-sided) confidence interval for $$\beta_i$$ is
$$\left(b_i - t(1-\alpha/2;n-2)\, s(b_i),\; b_i + t(1-\alpha/2;n-2)\, s(b_i)\right)$$

for $$i=0,1$$, where $$t(1-\alpha/2;n-2)$$ is the $$(1-\alpha/2)$$ quantile (upper cut-off point) of the $$t_{n-2}$$ distribution; i.e., $$P(t_{n-2} > t(1-\alpha/2;n-2)) = \alpha/2$$.
• Hypothesis tests for $$\beta_0$$ and $$\beta_1$$: $$H_0 : \beta_i = \beta_{i0}$$ ($$i=0$$ or $$1$$).
Test statistic: $$T_i = \frac{b_i - \beta_{i0}}{s(b_i)}$$.
1. Alternative: $$H_1 : \beta_i > \beta_{i0}$$. Reject $$H_0$$ at level $$\alpha$$ if $$T_i > t(1-\alpha;n-2)$$, or equivalently if the P-value $$P(t_{n-2} > T_i^{observed}) < \alpha$$.
2. Alternative: $$H_1 : \beta_i < \beta_{i0}$$. Reject $$H_0$$ at level $$\alpha$$ if $$T_i < t(\alpha;n-2)$$, or equivalently if the P-value $$P(t_{n-2} < T_i^{observed}) < \alpha$$.
3. Alternative: $$H_1 : \beta_i \neq \beta_{i0}$$. Reject $$H_0$$ at level $$\alpha$$ if $$\left|T_i\right| > t(1-\alpha/2;n-2)$$, or equivalently if the P-value $$P(|t_{n-2}| > |T_i^{observed}|) < \alpha$$.
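As a concrete illustration, the two-sided test in case 3 can be sketched in Python with `scipy.stats`. The numbers ($$n=19$$, $$b_1 = 2.941$$, $$s(b_1) = 0.5412$$) are taken from the housing-price example later in this section; the null value $$\beta_{10} = 0$$ is chosen for illustration.

```python
from scipy import stats

# Summary numbers from the housing-price example: n = 19 observations,
# slope estimate b1 = 2.941 with standard error s(b1) = 0.5412.
n = 19
b1, s_b1 = 2.941, 0.5412
beta10 = 0.0                       # H0: beta_1 = beta_10 = 0 (illustrative choice)

T = (b1 - beta10) / s_b1           # observed test statistic T_1

# Two-sided P-value: P(|t_{n-2}| > |T|)
p_value = 2 * stats.t.sf(abs(T), df=n - 2)

# Critical-value form: reject H0 at level alpha if |T| > t(1 - alpha/2; n-2)
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
reject = abs(T) > t_crit
```

The two decision rules agree: `reject` is true exactly when `p_value < alpha`.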

Inference for mean response at $$X = X_h$$

• Point estimate: $$\widehat Y_h = b_0 + b_1 X_h$$.

Fact: $$E(\widehat Y_h) = \beta_0 + \beta_1 X_h = E(Y_h)$$, $$Var(\widehat Y_h) = \sigma^2(\widehat Y_h) = \sigma^2\left[\frac{1}{n} + \frac{(X_h - \overline{X})^2}{\sum_i (X_i - \overline{X})^2}\right]$$. Estimated variance is $$s^2(\widehat Y_h) = MSE \left[\frac{1}{n} + \frac{(X_h - \overline{X})^2}{\sum_i (X_i - \overline{X})^2}\right]$$.

Distribution: $$\frac{\widehat Y_h - E(Y_h)}{s(\widehat Y_h)} \sim t_{n-2}$$.

Confidence interval: $$100(1-\alpha)$$% confidence interval for $$E(Y_h)$$ is $$(\widehat Y_h - t(1-\alpha/2;n-2) s(\widehat Y_h),\widehat Y_h + t(1-\alpha/2;n-2) s(\widehat Y_h))$$.
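A minimal sketch of this interval for $$E(Y_h)$$, plugging in the summary statistics from the housing-price example below ($$n$$, $$MSE$$, $$\overline{X}$$, $$\sum_i(X_i-\overline{X})^2$$, and $$X_h = 18.5$$):

```python
import math
from scipy import stats

# Summary statistics from the housing-price example.
n, Xbar, Sxx, MSE = 19, 15.719, 40.805, 11.95
b0, b1 = 28.981, 2.941
Xh = 18.5

Yh_hat = b0 + b1 * Xh                                      # point estimate of E(Y_h)
s_Yh = math.sqrt(MSE * (1 / n + (Xh - Xbar) ** 2 / Sxx))   # s(Y_h hat)

# 95% confidence interval for E(Y_h)
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
ci = (Yh_hat - t_crit * s_Yh, Yh_hat + t_crit * s_Yh)
```

Note that $$s(\widehat Y_h)$$ grows as $$X_h$$ moves away from $$\overline{X}$$, so the interval is narrowest at the mean of the observed $$X$$ values.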

Prediction of a new observation $$Y_{h(new)}$$ at $$X = X_h$$

• Prediction: $$\widehat Y_{h(new)} = \widehat Y_h = b_0 + b_1 X_h$$.

Error in prediction: $$Y_{h(new)} - \widehat Y_{h(new)} = Y_{h(new)} - \widehat Y_h$$.

Fact: $$\sigma^2(Y_{h(new)} - \widehat Y_h) = \sigma^2(Y_{h(new)}) + \sigma^2(\widehat Y_h) = \sigma^2 + \sigma^2(\widehat Y_h) = \sigma^2\left[1+\frac{1}{n}+ \frac{(X_h - \overline{X})^2}{\sum_i (X_i - \overline{X})^2}\right]$$.

Estimate of $$\sigma^2(Y_{h(new)} - \widehat Y_h)$$: $$s^2(Y_{h(new)} - \widehat Y_h) = MSE \left[1+\frac{1}{n}+ \frac{(X_h - \overline{X})^2}{\sum_i (X_i - \overline{X})^2}\right]$$.

Distribution: $$\frac{Y_{h(new)} - \widehat Y_h}{s(Y_{h(new)} -\widehat Y_h)} \sim t_{n-2}$$.

Prediction interval: the $$100(1-\alpha)$$% prediction interval for $$Y_{h(new)}$$ is $$(\widehat Y_h - t(1-\alpha/2;n-2)\, s(Y_{h(new)}-\widehat Y_h),\; \widehat Y_h + t(1-\alpha/2;n-2)\, s(Y_{h(new)}-\widehat Y_h))$$.
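The prediction interval differs from the confidence interval for the mean response only by the extra "$$1+$$" inside the variance, which accounts for the new observation's own error. A sketch with the same summary numbers from the housing-price example:

```python
import math
from scipy import stats

# Same summary statistics as in the housing-price example.
n, Xbar, Sxx, MSE = 19, 15.719, 40.805, 11.95
b0, b1, Xh = 28.981, 2.941, 18.5

Yh_hat = b0 + b1 * Xh
# Note the extra "1 +": the new observation's own error contributes sigma^2.
s_pred = math.sqrt(MSE * (1 + 1 / n + (Xh - Xbar) ** 2 / Sxx))

# 95% prediction interval for Y_h(new)
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
pi = (Yh_hat - t_crit * s_pred, Yh_hat + t_crit * s_pred)
```

Because of the extra variance term, the prediction interval is always wider than the confidence interval for $$E(Y_h)$$ at the same $$X_h$$.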
• Confidence band for the regression line: at $$X=X_h$$, the $$100(1-\alpha)$$% confidence band for the regression line is given by $$\widehat Y_h \pm w_\alpha\, s(\widehat Y_h), \qquad \text{where } w_\alpha = \sqrt{2F(1-\alpha; 2, n-2)}.$$

Here $$F(1-\alpha;2,n-2)$$ is the $$1-\alpha$$ upper cut-off point (or, $$(1-\alpha)$$ quantile) for the $$F_{2,n-2}$$ distribution ($$F$$ distribution with d.f. $$(2,n-2)$$).
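The band multiplier $$w_\alpha$$ (this is the Working–Hotelling band) can be computed from scipy's $$F$$ quantile function; comparing it with the pointwise $$t$$ cut-off shows why the simultaneous band is wider than a confidence interval at a single $$X_h$$. The sample size $$n = 19$$ is borrowed from the example below for illustration.

```python
from scipy import stats

n = 19          # sample size from the housing-price example (illustrative)
alpha = 0.05

# Band multiplier: w_alpha = sqrt(2 * F(1 - alpha; 2, n-2))
w = (2 * stats.f.ppf(1 - alpha, dfn=2, dfd=n - 2)) ** 0.5

# Pointwise t cut-off used for a confidence interval at a single X_h
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
```

Here `w` exceeds `t_crit`, the price paid for coverage over the whole line rather than at one point.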

### Example 1: Simple linear regression

We consider a data set on housing prices. Here $$Y=$$ selling price of a house (in \$1000) and $$X=$$ size of the house (in 100 square feet). The summary statistics are given below: $$n = 19$$, $$\overline{X} = 15.719$$, $$\overline{Y} = 75.211$$, $$\sum_i(X_i - \overline{X})^2 = 40.805$$, $$\sum_i (Y_i - \overline{Y})^2 = 556.078$$, $$\sum_i (X_i - \overline{X})(Y_i - \overline{Y}) = 120.001$$.

• Estimates of $$\beta_1$$ and $$\beta_0$$: $$b_1 = \frac{\sum_i (X_i - \overline{X})(Y_i - \overline{Y})}{\sum_i(X_i - \overline{X})^2} = \frac{120.001}{40.805} = 2.941$$ and $$b_0 = \overline{Y} - b_1 \overline{X} = 75.211 - (2.941)(15.719) = 28.981$$.

• Fit and Prediction: the fitted regression line is $$\widehat Y = 28.981 + 2.941 X$$. When $$X_h = 18.5$$, the predicted value, which is an estimate of the mean selling price (in \$1000) when the size of the house is 1850 sq. ft., is $$\widehat Y_h = 28.981 + (2.941)(18.5) = 83.39$$.
• MSE: The degrees of freedom (df) $$= n-2 = 17$$. $$SSE = \sum_i(Y_i - \overline{Y})^2 - b_1^2\sum_i(X_i - \overline{X})^2 = 203.17$$. So, $$MSE = \frac{SSE}{n-2} = \frac{203.17}{17} = 11.95$$.
• Standard Error Estimates: $$s^2(b_0) = MSE \left[\frac{1}{n} + \frac{\overline{X}^2}{\sum_i(X_i - \overline{X})^2} \right] = 73.00$$, $$\qquad s(b_0) = \sqrt{s^2(b_0)} = 8.544$$.
$$s^2(b_1) = \frac{MSE}{\sum_i(X_i - \overline{X})^2} = 0.2929$$, $$\qquad s(b_1) = \sqrt{s^2(b_1)} = 0.5412$$.
• Confidence Intervals: We assume that the errors are normal to find confidence intervals for the parameters $$\beta_0$$ and $$\beta_1$$. We use the fact that $$\frac{b_0 - \beta_0}{s(b_0)} \sim t_{n-2}$$ and $$\frac{b_1 - \beta_1}{s(b_1)} \sim t_{n-2}$$, where $$t_{n-2}$$ denotes the $$t$$-distribution with $$n-2$$ degrees of freedom. Since $$t(0.975;17) = 2.1098$$, the 95% two-sided confidence interval for $$\beta_1$$ is $$2.941 \pm (2.1098)(0.5412) = (1.80, 4.08)$$. Since $$t(0.95;17) = 1.740$$, the 90% two-sided confidence interval for $$\beta_0$$ is $$28.981 \pm (1.740)(8.544) = (14.12, 43.84)$$.
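The entire worked example can be reproduced from the summary statistics alone. The sketch below follows the handout's formulas step by step and recovers the estimates, standard errors, and both confidence intervals:

```python
import math
from scipy import stats

# Summary statistics from the handout's housing-price data.
n = 19
Xbar, Ybar = 15.719, 75.211
Sxx, Syy, Sxy = 40.805, 556.078, 120.001

b1 = Sxy / Sxx                          # slope estimate
b0 = Ybar - b1 * Xbar                   # intercept estimate

SSE = Syy - b1 ** 2 * Sxx               # error sum of squares
MSE = SSE / (n - 2)                     # mean squared error, df = n - 2 = 17

s_b1 = math.sqrt(MSE / Sxx)                          # s(b1)
s_b0 = math.sqrt(MSE * (1 / n + Xbar ** 2 / Sxx))    # s(b0)

# 95% CI for beta_1 and 90% CI for beta_0, as in the handout
t975 = stats.t.ppf(0.975, df=n - 2)     # = t(0.975; 17)
t95 = stats.t.ppf(0.95, df=n - 2)       # = t(0.95; 17)
ci_b1 = (b1 - t975 * s_b1, b1 + t975 * s_b1)
ci_b0 = (b0 - t95 * s_b0, b0 + t95 * s_b0)
```

Running this recovers the handout's values (up to rounding): $$b_1 \approx 2.941$$, $$MSE \approx 11.95$$, $$s(b_1) \approx 0.5412$$, and the intervals $$(1.80, 4.08)$$ and $$(14.12, 43.84)$$.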

## Contributors

• Agnes Oshiro
(Source: Spring 2012 STA108 Handout 4)

This page titled Inference in Simple Linear Regression is shared under a not declared license and was authored, remixed, and/or curated by Debashis Paul.