# Simple Linear Regression (with one predictor)


## Model

\(X\) and \(Y\) are the predictor and response variables, respectively. We fit the model \[ Y_i = \beta_0+\beta_1X_i+\epsilon_i, \quad i = 1,2,\ldots,n, \]

where \( \epsilon_1, \ldots, \epsilon_n \) are **uncorrelated**, with \( E(\epsilon_i)=0 \) and \( \operatorname{Var}(\epsilon_i)=\sigma^2 \) for every \(i\).
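To make the model concrete, here is a minimal sketch that generates data from it. The function name `simulate`, the parameter values, and the choice of independent Gaussian errors are all assumptions made for illustration; the model itself only requires uncorrelated errors with mean zero and common variance \(\sigma^2\).

```python
import random


def simulate(beta0, beta1, sigma, xs, seed=0):
    """Draw Y_i = beta0 + beta1 * X_i + eps_i for each X_i in xs.

    Assumption: the errors are independent Gaussian draws, which is one
    convenient way to obtain uncorrelated, mean-zero, equal-variance errors.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical parameter values and predictor values.
xs = [float(i) for i in range(1, 11)]
ys = simulate(beta0=1.0, beta1=2.0, sigma=0.5, xs=xs)

for x, y in zip(xs, ys):
    print(f"X = {x:4.1f}   Y = {y:6.3f}")
```

Setting `sigma=0` removes the error term, so every point falls exactly on the line \(\beta_0 + \beta_1 X\).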

## Interpretation

Look at the scatter plot of \(Y\) (vertical axis) versus \(X\) (horizontal axis). Consider narrow vertical strips around the different values of \(X\):

- Means (measure of center) of the points falling in the vertical strips lie (approximately) on a straight line with slope \(\beta_1\) and intercept \(\beta_0\).
- Standard deviations (measure of spread) of the points falling in each vertical strip are (roughly) the same.
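The strip picture can be checked on simulated data. The sketch below assumes hypothetical parameter values and Gaussian errors, and it lets each distinct value of \(X\) stand in for one narrow vertical strip; with real data the strips would be small intervals of \(X\) instead.

```python
import random
import statistics

beta0, beta1, sigma = 1.0, 2.0, 1.0  # hypothetical parameter values
rng = random.Random(42)

# Each distinct X value plays the role of one narrow vertical strip.
strips = {}
for x in [1.0, 2.0, 3.0, 4.0, 5.0]:
    strips[x] = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for _ in range(400)]

for x, ys in strips.items():
    center = statistics.mean(ys)   # should be close to beta0 + beta1 * x
    spread = statistics.stdev(ys)  # should be close to sigma in every strip
    print(f"X = {x}: mean = {center:.2f} (line: {beta0 + beta1 * x:.2f}), sd = {spread:.2f}")
```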

## Estimation of \(\beta_0 \) and \( \beta_1 \)

We employ the method of least squares to estimate \(\beta_0\) and \(\beta_1\). That is, we minimize the sum of squared errors \(Q(\beta_0,\beta_1) = \sum_{i=1}^n(Y_i-\beta_0-\beta_1X_i)^2\). This involves differentiating \(Q(\beta_0,\beta_1)\) with respect to the *parameters* \(\beta_0\) and \(\beta_1\) and setting the derivatives to zero. Writing \(b_0\) and \(b_1\) for the minimizing values, this gives us the **normal equations:** \[nb_0 + b_1\sum_{i=1}^nX_i = \sum_{i=1}^nY_i\] \[b_0\sum_{i=1}^nX_i+b_1\sum_{i=1}^nX_i^2 = \sum_{i=1}^nX_iY_i\] Solving these equations, we have: \[b_1=\frac{\sum_{i=1}^nX_iY_i-n\overline{X}\,\overline{Y}}{\sum_{i=1}^nX_i^2-n\overline{X}^2} = \frac{\sum_{i=1}^n(X_i-\overline{X})(Y_i-\overline{Y})}{\sum_{i=1}^n(X_i-\overline{X})^2}, \qquad b_0 = \overline{Y}-b_1\overline{X}\] \(b_0\) and \(b_1\) are the *estimates* of \(\beta_0\) and \(\beta_1\), respectively, and are sometimes denoted as \(\widehat\beta_0\) and \(\widehat\beta_1\).
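The closed-form solution can be sketched in a few lines of code. The function name `least_squares` and the five data points are made up for illustration; the sketch uses the centered form of \(b_1\) and then confirms that the result satisfies both normal equations.

```python
def least_squares(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Centered sums of cross-products and squares.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up data, chosen so the arithmetic is easy to follow by hand.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = least_squares(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")

# Both normal equations should hold up to floating-point rounding.
n = len(xs)
eq1 = n * b0 + b1 * sum(xs) - sum(ys)
eq2 = b0 * sum(xs) + b1 * sum(x * x for x in xs) - sum(x * y for x, y in zip(xs, ys))
print(f"normal equation residuals: {eq1:.2e}, {eq2:.2e}")
```

For these data \(\overline{X} = 3\), \(\overline{Y} = 4\), \(\sum(X_i-\overline{X})(Y_i-\overline{Y}) = 6\) and \(\sum(X_i-\overline{X})^2 = 10\), so \(b_1 = 0.6\) and \(b_0 = 4 - 0.6 \cdot 3 = 2.2\).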

## Prediction

The **fitted regression line** is given by the equation \[\widehat{Y} = b_0 + b_1X\] and is used to predict the value of \(Y\) given a value of \(X\).
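As a small illustration, prediction amounts to evaluating the fitted line at a new value of \(X\). The helper `predict` and the coefficient values below are hypothetical and are not estimates from any real data set.

```python
def predict(x, b0, b1):
    """Value of the fitted line Y_hat = b0 + b1 * x at a given x."""
    return b0 + b1 * x


b0, b1 = 2.2, 0.6  # illustrative coefficient values (assumed, not from real data)
for x_new in [0.0, 3.0, 6.0]:
    print(f"X = {x_new:.1f}  ->  predicted Y = {predict(x_new, b0, b1):.2f}")
```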

## Residuals

These are the quantities \(e_i = Y_i - \widehat{Y}_i = Y_i - (b_0 + b_1X_i)\), where \(\widehat{Y}_i = b_0 + b_1X_i\). Note that \(\epsilon_i = Y_i - \beta_0 - \beta_1X_i\), so the \(e_i\)'s estimate the \(\epsilon_i\)'s. Some properties of the regression line and residuals are:

- \(\sum_{i}e_i = 0\).
- \(\sum_{i}e_i^2 \leq \sum_{i}(Y_i - u_0 - u_1X_i)^2\) for any \((u_0, u_1)\) (with equality when \((u_0, u_1) = (b_0, b_1)\)).
- \(\sum_{i}Y_i = \sum_{i}\widehat{Y}_i\).
- \(\sum_{i}X_ie_i = 0\).
- \(\sum_{i}\widehat{Y}_ie_i = 0\).
- The regression line passes through the point \((\overline{X},\overline{Y})\).
- The slope \(b_1\) of the regression line can be expressed as \(b_1 = r_{XY}\frac{s_Y}{s_X}\), where \(r_{XY}\) is the correlation coefficient between \(X\) and \(Y\) and \(s_X\) and \(s_Y\) are the standard deviations of \(X\) and \(Y\).
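The properties listed above can be checked numerically. The following sketch uses a small made-up data set, and all variable names are illustrative. Sums that should vanish are printed in scientific notation, since floating-point rounding may leave tiny nonzero values.

```python
import math
import statistics

xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in xs]
resid = [y - f for y, f in zip(ys, fitted)]


def sse(u0, u1):
    """Sum of squared deviations of the data from the line u0 + u1 * x."""
    return sum((y - u0 - u1 * x) ** 2 for x, y in zip(xs, ys))


checks = {
    "sum of residuals": sum(resid),
    "sum(Y) - sum(Y_hat)": sum(ys) - sum(fitted),
    "sum of X_i * e_i": sum(x * e for x, e in zip(xs, resid)),
    "sum of Y_hat_i * e_i": sum(f * e for f, e in zip(fitted, resid)),
    "Y_bar - (b0 + b1 * X_bar)": y_bar - (b0 + b1 * x_bar),
}
for name, value in checks.items():
    print(f"{name:28s} {value: .2e}")  # each should be zero up to rounding

# Any other line has a sum of squares at least as large as the fitted line.
for u0, u1 in [(2.0, 0.6), (2.2, 0.7), (0.0, 1.0)]:
    print(f"SSE({u0}, {u1}) = {sse(u0, u1):.3f}  >=  SSE(b0, b1) = {sse(b0, b1):.3f}")

# Slope recovered from the correlation coefficient and the standard deviations.
r_xy = s_xy / math.sqrt(s_xx * s_yy)
slope_from_r = r_xy * statistics.stdev(ys) / statistics.stdev(xs)
print(f"b1 = {b1:.3f}, r_XY * s_Y / s_X = {slope_from_r:.3f}")
```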

The **error sum of squares**, denoted \(SSE\), is given by \[SSE = \sum_{i=1}^ne_i^2 = \sum_{i=1}^n(Y_i - \overline{Y})^2 - b_1^2\sum_{i=1}^n(X_i-\overline{X})^2.\]
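As a quick numerical check of this identity, the sketch below computes \(SSE\) both ways on a small made-up data set. The variable names are illustrative only.

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Left-hand side: sum of squared residuals.
sse_from_residuals = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
# Right-hand side: total variation in Y minus b1^2 times the variation in X.
sse_from_identity = s_yy - b1 ** 2 * s_xx

print(f"sum of squared residuals: {sse_from_residuals:.3f}")
print(f"identity:                 {sse_from_identity:.3f}")
```

Here the residuals are \(-0.8, 0.6, 1.0, -0.6, -0.2\), whose squares sum to \(2.4\), and the right-hand side is \(6 - 0.6^2 \cdot 10 = 2.4\).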

## Estimation of \(\sigma^2\)

It can be shown that \(E(SSE) = (n-2)\sigma^2.\) Therefore, \(\sigma^2\) is estimated by the **mean squared error**, i.e., \(MSE = \frac{SSE}{n-2}.\) This also justifies the statement that the **degrees of freedom** of the errors is \(n-2\), which is the sample size \((n)\) minus the number of regression coefficients (\(\beta_0\) and \(\beta_1\)) being estimated.

## Contributors

- Debashis Paul (UCD)
- Scott Brunstein (UCD)