Inference in Simple Linear Regression


Fact : Under the normal regression model, $(b_0,b_1)$ and $SSE$ are independently distributed, and
$\frac{b_0 - \beta_0}{s(b_0)} \sim t_{n-2}$, $\qquad \frac{b_1 - \beta_1}{s(b_1)} \sim t_{n-2}$, $\qquad SSE \sim \sigma^2 \chi_{n-2}^2$.
Confidence interval for $\beta_0$ and $\beta_1$ : The $100(1-\alpha)\%$ (two-sided) confidence interval for $\beta_i$ is
$\left(b_i - t(1-\alpha/2;n-2)\, s(b_i),\; b_i + t(1-\alpha/2;n-2)\, s(b_i)\right)$

for $i=0,1$, where $t(1-\alpha/2;n-2)$ is the $(1-\alpha/2)$ quantile (equivalently, the upper $\alpha/2$ cut-off point) of the $t_{n-2}$ distribution; i.e., $P(t_{n-2} > t(1-\alpha/2;n-2)) = \alpha/2$.
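The quantile $t(1-\alpha/2;n-2)$ is usually read from a table or obtained from a statistics library (for example `scipy.stats.t.ppf`). As a rough, standard-library-only sketch, it can also be approximated by integrating the $t$ density numerically and bisecting on the resulting CDF. The function names below (`t_pdf`, `t_cdf`, `t_quantile`, `coef_confidence_interval`) are illustrative and not part of the handout; the numerical scheme (Simpson's rule plus bisection) is one possible choice, not the method used to produce the handout's tables.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(t_df <= x), via symmetry and Simpson's rule on [0, |x|]."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    half = total * h / 3  # integral of the density over [0, |x|]
    return 0.5 + half if x > 0 else 0.5 - half


def t_quantile(p: float, df: int) -> float:
    """t(p; df): the p quantile, located by bisection on the CDF."""
    lo, hi = -100.0, 100.0  # assumed bracket; adequate for usual p and df
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def coef_confidence_interval(b: float, s_b: float, n: int, alpha: float = 0.05):
    """Two-sided 100(1 - alpha)% interval b +/- t(1 - alpha/2; n - 2) * s(b)."""
    t_crit = t_quantile(1 - alpha / 2, n - 2)
    return b - t_crit * s_b, b + t_crit * s_b


if __name__ == "__main__":
    # Inputs taken from the housing example later in this handout (n = 19).
    print(t_quantile(0.975, 17))
    print(coef_confidence_interval(2.941, 0.5412, n=19))
```

With $n = 19$ the first call should agree closely with the tabulated value $t(0.975;17) = 2.1098$ quoted in the example at the end of this handout.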

Hypothesis tests for $\beta_0$ and $\beta_1$ : $H_0 : \beta_i = \beta_{i0}$ ($i=0$ or $1$).
Test statistic : $T_i = \frac{b_i - \beta_{i0}}{s(b_i)}$.

Alternative: $H_1 : \beta_i > \beta_{i0}$. Reject $H_0$ at level $\alpha$ if $\frac{b_i - \beta_{i0}}{s(b_i)} > t(1-\alpha;n-2)$, or equivalently if P-value = $P(t_{n-2} > T_i^{observed}) < \alpha$.
Alternative: $H_1 : \beta_i < \beta_{i0}$. Reject $H_0$ at level $\alpha$ if $\frac{b_i - \beta_{i0}}{s(b_i)} < t(\alpha;n-2)$, or equivalently if P-value = $P(t_{n-2} < T_i^{observed}) < \alpha$.
Alternative: $H_1 : \beta_i \neq \beta_{i0}$. Reject $H_0$ at level $\alpha$ if $\left|\frac{b_i - \beta_{i0}}{s(b_i)}\right| > t(1-\alpha/2;n-2)$, or equivalently if P-value = $P(|t_{n-2}| > |T_i^{observed}|) < \alpha$.
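The three rejection rules above can be sketched as a single function that takes the critical value from a $t$ table. This is a minimal illustration: the function name `t_test_decision` and its arguments are hypothetical, and the null value $\beta_{10} = 0$ used in the usage lines is chosen only for illustration (the handout does not carry out this particular test). The lower-tail rule uses the symmetry of the $t$ distribution, $t(\alpha;n-2) = -t(1-\alpha;n-2)$.

```python
def t_test_decision(b: float, beta_null: float, s_b: float,
                    t_crit: float, alternative: str):
    """Return (T, reject) for H0: beta_i = beta_null.

    t_crit must be t(1 - alpha; n - 2) for the one-sided alternatives
    and t(1 - alpha/2; n - 2) for the two-sided alternative.
    """
    T = (b - beta_null) / s_b  # test statistic T_i
    if alternative == "greater":      # H1: beta_i > beta_null
        reject = T > t_crit
    elif alternative == "less":       # H1: beta_i < beta_null
        reject = T < -t_crit          # t(alpha; n-2) = -t(1-alpha; n-2)
    elif alternative == "two-sided":  # H1: beta_i != beta_null
        reject = abs(T) > t_crit
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return T, reject


if __name__ == "__main__":
    # b1, s(b1) and t(0.975; 17) are taken from the housing example below;
    # testing beta_1 = 0 is an illustrative choice, not from the handout.
    T, reject = t_test_decision(2.941, 0.0, 0.5412, 2.1098, "two-sided")
    print(T, reject)
```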

Inference for mean response at $X = X_h$

Point estimate: $\widehat Y_h = b_0 + b_1 X_h$.

Fact: $E(\widehat Y_h) = \beta_0 + \beta_1 X_h = E(Y_h)$, $Var(\widehat Y_h) = \sigma^2(\widehat Y_h) = \sigma^2\left[\frac{1}{n} + \frac{(X_h - \overline{X})^2}{\sum_i (X_i - \overline{X})^2}\right]$. Estimated variance is $s^2(\widehat Y_h) = MSE \left[\frac{1}{n} + \frac{(X_h - \overline{X})^2}{\sum_i (X_i - \overline{X})^2}\right]$.

Distribution: $\frac{\widehat Y_h - E(Y_h)}{s(\widehat Y_h)} \sim t_{n-2}$.

Confidence interval: $100(1-\alpha)$% confidence interval for $E(Y_h)$ is $(\widehat Y_h - t(1-\alpha/2;n-2) s(\widehat Y_h),\widehat Y_h + t(1-\alpha/2;n-2) s(\widehat Y_h))$.
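As a small sketch of the interval for the mean response, the function below evaluates $\widehat Y_h$, $s(\widehat Y_h)$ and the confidence limits from summary quantities. The function name `mean_response_ci` and its argument names are hypothetical. The usage lines plug in the rounded values from the housing example later in this handout ($X_h = 18.5$, $MSE = 11.95$, $t(0.975;17) = 2.1098$); the resulting interval is a computation from those numbers and is not quoted in the handout itself.

```python
import math


def mean_response_ci(x_h: float, b0: float, b1: float, mse: float,
                     n: int, x_bar: float, sxx: float, t_crit: float):
    """Point estimate, standard error and CI for E(Y_h) at X = x_h.

    sxx is sum_i (X_i - X_bar)^2; t_crit is t(1 - alpha/2; n - 2).
    """
    y_hat = b0 + b1 * x_h
    # s^2(Y_hat_h) = MSE * [1/n + (X_h - X_bar)^2 / Sxx]
    s_fit = math.sqrt(mse * (1 / n + (x_h - x_bar) ** 2 / sxx))
    return y_hat, s_fit, (y_hat - t_crit * s_fit, y_hat + t_crit * s_fit)


if __name__ == "__main__":
    y_hat, s_fit, ci = mean_response_ci(
        x_h=18.5, b0=28.981, b1=2.941, mse=11.95,
        n=19, x_bar=15.719, sxx=40.805, t_crit=2.1098)
    print(y_hat, s_fit, ci)
```

The standard error grows with $(X_h - \overline{X})^2$, so the interval is narrowest at $X_h = \overline{X}$.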

Prediction of a new observation $Y_{h(new)}$ at $X = X_h$

Prediction : $\widehat Y_{h(new)} = \widehat Y_h = b_0 + b_1 X_h$.

Error in prediction : $Y_{h(new)} - \widehat Y_{h(new)} = Y_{h(new)} - \widehat Y_h$.

Fact : Because $Y_{h(new)}$ is independent of the observations used to compute $\widehat Y_h$, the variances add: $\sigma^2(Y_{h(new)} - \widehat Y_h) = \sigma^2(Y_{h(new)}) + \sigma^2(\widehat Y_h) = \sigma^2 + \sigma^2(\widehat Y_h) = \sigma^2\left[1+\frac{1}{n}+ \frac{(X_h - \overline{X})^2}{\sum_i (X_i - \overline{X})^2}\right]$.

Estimate of $\sigma^2(Y_{h(new)} - \widehat Y_h)$ is $s^2(Y_{h(new)} - \widehat Y_h) = MSE \left[1+\frac{1}{n}+ \frac{(X_h - \overline{X})^2}{\sum_i (X_i - \overline{X})^2}\right]$.

Distribution : $\frac{Y_{h(new)} - \widehat Y_h}{s(Y_{h(new)} -\widehat Y_h)} \sim t_{n-2}$.

Prediction interval : $100(1-\alpha)$% prediction interval for $Y_{h(new)}$ is $(\widehat Y_h - t(1-\alpha/2;n-2) s(Y_{h(new)}-\widehat Y_h),\widehat Y_h + t(1-\alpha/2;n-2) s(Y_{h(new)}-\widehat Y_h))$.
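The prediction interval differs from the interval for the mean response only through the extra "$1+$" term in the variance. The sketch below makes that comparison explicit; the function name `prediction_interval` is hypothetical, and the usage lines reuse the rounded summary values from the housing example later in this handout, so the printed limits are derived numbers rather than results stated in the handout.

```python
import math


def prediction_interval(x_h: float, b0: float, b1: float, mse: float,
                        n: int, x_bar: float, sxx: float, t_crit: float):
    """Prediction, standard error of prediction and PI for Y_h(new).

    sxx is sum_i (X_i - X_bar)^2; t_crit is t(1 - alpha/2; n - 2).
    """
    y_hat = b0 + b1 * x_h
    leverage = 1 / n + (x_h - x_bar) ** 2 / sxx
    s_fit = math.sqrt(mse * leverage)         # s(Y_hat_h), mean response
    s_pred = math.sqrt(mse * (1 + leverage))  # s(Y_h(new) - Y_hat_h)
    interval = (y_hat - t_crit * s_pred, y_hat + t_crit * s_pred)
    return y_hat, s_fit, s_pred, interval


if __name__ == "__main__":
    y_hat, s_fit, s_pred, pi = prediction_interval(
        x_h=18.5, b0=28.981, b1=2.941, mse=11.95,
        n=19, x_bar=15.719, sxx=40.805, t_crit=2.1098)
    # s_pred is always larger than s_fit, so the prediction interval
    # is wider than the confidence interval for E(Y_h).
    print(s_fit, s_pred, pi)
```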

Confidence band for the regression line : At $X=X_h$ the $100(1-\alpha)$% confidence band for the regression line is given by $\widehat Y_h \pm w_\alpha s(\widehat Y_h), \qquad \mbox{where } w_\alpha = \sqrt{2F(1-\alpha; 2, n-2)}$.

Here $F(1-\alpha;2,n-2)$ is the $(1-\alpha)$ quantile (equivalently, the upper $\alpha$ cut-off point) of the $F_{2,n-2}$ distribution ($F$ distribution with d.f. $(2,n-2)$).
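When the numerator degrees of freedom equal 2, the $F$ quantile has a closed form, because $P(F_{2,m} > x) = (1 + 2x/m)^{-m/2}$; solving for the upper $\alpha$ point gives $F(1-\alpha;2,m) = \frac{m}{2}\left(\alpha^{-2/m} - 1\right)$. The sketch below uses this identity to compute $w_\alpha$ without tables. The function names `f_quantile_2` and `band_multiplier` are hypothetical, and the comparison with $t(0.975;17) = 2.1098$ uses the value quoted in the housing example later in this handout.

```python
import math


def f_quantile_2(p: float, df2: int) -> float:
    """F(p; 2, df2): p quantile of the F distribution with d.f. (2, df2).

    Uses the closed form available when the numerator d.f. is 2:
    P(F > x) = (1 + 2x/df2)^(-df2/2).
    """
    alpha = 1 - p
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


def band_multiplier(alpha: float, n: int) -> float:
    """w_alpha = sqrt(2 F(1 - alpha; 2, n - 2)) for the confidence band."""
    return math.sqrt(2 * f_quantile_2(1 - alpha, n - 2))


if __name__ == "__main__":
    w = band_multiplier(0.05, n=19)
    # The band multiplier exceeds the pointwise t multiplier 2.1098,
    # so the band is wider than the interval for a single X_h.
    print(f_quantile_2(0.95, 17), w, w > 2.1098)
```

The band is wider than the pointwise confidence interval because it is meant to cover the whole regression line simultaneously rather than the mean response at a single $X_h$.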

Example 1: Simple linear regression

We consider a data set on housing prices. Here $Y$ = selling price of a house (in \$1000) and $X$ = size of the house (in 100 square feet). The summary statistics are given below:

$n = 19$, $\overline{X} = 15.719$, $\overline{Y} = 75.211$

$\sum_i(X_i - \overline{X})^2 = 40.805$, $\sum_i (Y_i - \overline{Y})^2 = 556.078$, $\sum_i (X_i - \overline{X})(Y_i - \overline{Y}) = 120.001$.

Estimates of $\beta_1$ and $\beta_0$ :

\[b_1 = \frac{\sum_i (X_i - \overline{X})(Y_i - \overline{Y})}{\sum_i(X_i - \overline{X})^2} = \frac{120.001}{40.805} = 2.941\]

and

\[b_0 = \overline{Y} - b_1 \overline{X} = 75.211 - (2.941)(15.719) = 28.981.\]

Fit and Prediction: The fitted regression line is $\widehat Y = 28.981 + 2.941 X$. When $X = X_h = 18.5$, the predicted value, which is an estimate of the mean selling price (in \$1000) for a house of size 1850 sq. ft., is $\widehat Y_h = 28.981 + (2.941) (18.5) = 83.39$.
MSE: The degrees of freedom (df) $= n-2 = 17$. $SSE = \sum_i(Y_i - \overline{Y})^2 - b_1^2\sum_i(X_i - \overline{X})^2 = 203.17$. So, $MSE = \frac{SSE}{n-2} = \frac{203.17}{17} = 11.95$.
Standard Error Estimates: $s^2(b_0) = MSE \left[\frac{1}{n} + \frac{\overline{X}^2}{\sum_i(X_i - \overline{X})^2} \right] = 73.00$, $\qquad s(b_0) = \sqrt{s^2(b_0)} = 8.544$.
$s^2(b_1) = \frac{MSE}{\sum_i(X_i - \overline{X})^2} = 0.2929$, $\qquad s(b_1) = \sqrt{s^2(b_1)} = 0.5412$.
Confidence Intervals: We assume that the errors are normal to find confidence intervals for the parameters $\beta_0$ and $\beta_1$. We use the fact that $\frac{b_0 - \beta_0}{s(b_0)} \sim t_{n-2}$ and $\frac{b_1 - \beta_1}{s(b_1)} \sim t_{n-2}$, where $t_{n-2}$ denotes the $t$-distribution with $n-2$ degrees of freedom. Since $t(0.975;17) = 2.1098$, it follows that the 95% two-sided confidence interval for $\beta_1$ is $2.941 \pm (2.1098)(0.5412) = (1.80, 4.08)$. Since $t(0.95;17) = 1.740$, the 90% two-sided confidence interval for $\beta_0$ is $28.981\pm (1.740)(8.544) = (14.12,43.84)$.
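The calculations in this example can be reproduced from the summary statistics alone. The following is a minimal sketch: the variable and function names are illustrative, the critical values $2.1098$ and $1.740$ are copied from the text rather than computed, and intermediate quantities are not rounded, so the last digit of some results may differ slightly from the hand calculation above.

```python
import math


def fit_from_summary(n, x_bar, y_bar, sxx, syy, sxy):
    """Least-squares estimates and standard errors from summary statistics.

    sxx = sum (X_i - X_bar)^2, syy = sum (Y_i - Y_bar)^2,
    sxy = sum (X_i - X_bar)(Y_i - Y_bar).
    """
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = syy - b1 ** 2 * sxx
    mse = sse / (n - 2)
    s_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))
    s_b1 = math.sqrt(mse / sxx)
    return {"b0": b0, "b1": b1, "sse": sse, "mse": mse,
            "s_b0": s_b0, "s_b1": s_b1}


def interval(estimate, std_err, t_crit):
    """estimate +/- t_crit * std_err."""
    return estimate - t_crit * std_err, estimate + t_crit * std_err


if __name__ == "__main__":
    fit = fit_from_summary(n=19, x_bar=15.719, y_bar=75.211,
                           sxx=40.805, syy=556.078, sxy=120.001)
    for name, value in fit.items():
        print(f"{name} = {value:.4f}")
    # Critical values quoted in the text: t(0.975; 17) and t(0.95; 17).
    print("95% CI for beta_1:", interval(fit["b1"], fit["s_b1"], 2.1098))
    print("90% CI for beta_0:", interval(fit["b0"], fit["s_b0"], 1.740))
```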

Contributors

Agnes Oshiro

(Source: Spring 2012 STA108 Handout 4)
