# Parameter Estimation in Simple Linear Regression

### Parameter estimation in simple linear regression

• **Model**: \(X\) and \(Y\) are the *predictor* and *response* variables, respectively. Fit the model,

\[Y_i = \beta_0 + \beta_1X_i + \epsilon_i, i = 1,...,n \tag{1}\]

where \(\epsilon_1,...,\epsilon_n\) are uncorrelated, with \(E(\epsilon_i) = 0\) and \(\text{Var}(\epsilon_i) = \sigma^2\) for all \(i\).

• **Estimates of the parameters**: We have the following estimates for \(\beta_0, \beta_1,\) and \(\sigma^2\), respectively.

\[b_0 = \overline{Y} - b_1 \overline{X},\]

\[b_1 = \dfrac{\sum_{i=1}^n(X_i - \overline{X})(Y_i - \overline{Y})}{\sum_{i=1}^n(X_i - \overline{X})^2},\]

\[\widehat{\sigma}^2 = MSE = \frac{SSE}{n-2}, \tag{2}\]

where

\[SSE = \sum_{i=1}^n(Y_i - b_0 - b_1X_i)^2 = \sum_{i=1}^n(Y_i - \overline{Y})^2 - b_1^2\sum_{i=1}^n(X_i - \overline{X})^2 . \]
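The formulas above translate almost line by line into code. The following is a minimal sketch in plain Python; the function name `fit_simple_linear` and the small data set are made up for illustration and are not part of the original notes.

```python
def fit_simple_linear(x, y):
    """Least-squares estimates for the model Y = b0 + b1*X + error.

    Returns (b0, b1, sse, mse). Needs at least 3 points and non-constant x.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length inputs with at least 3 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope estimate
    b0 = y_bar - b1 * x_bar             # intercept estimate
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)                 # estimate of sigma^2, equation (2)
    return b0, b1, sse, mse


# Illustrative data (hypothetical values chosen for easy hand-checking).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, sse, mse = fit_simple_linear(x, y)
print(b0, b1, sse, mse)  # approximately 2.2, 0.6, 2.4, 0.8

# Check the second expression for SSE: Syy - b1^2 * Sxx.
y_bar = sum(y) / len(y)
x_bar = sum(x) / len(x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sse_alt = syy - b1 ** 2 * sxx
```

For this data set, \(\sum_i(X_i - \overline{X})(Y_i - \overline{Y}) = 6\) and \(\sum_i(X_i - \overline{X})^2 = 10\), so \(b_1 = 0.6\) and \(b_0 = 4 - 0.6 \times 3 = 2.2\); both expressions for \(SSE\) give 2.4 up to floating-point rounding.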

• **Prediction**: The predicted value of \(Y\), given \(X = X_h\), is \(\widehat{Y}_h = b_0 + b_1X_h = \overline{Y} + b_1(X_h - \overline{X})\).

• **Expected values and variances**: Under the assumptions of the simple linear regression model, we have \(E(b_0) = \beta_0\), \(E(b_1) = \beta_1\), and \(E(\widehat{\sigma}^2) = E(MSE) = \sigma^2\). In other words, the estimators \(b_0, b_1, \widehat{\sigma}^2\) are *unbiased*. Also, \(E(\widehat{Y}_h|X_h) = \beta_0 + \beta_1X_h\).
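Unbiasedness can be checked informally by simulation. The sketch below is an illustration only, with assumed values \(\beta_0 = 1\), \(\beta_1 = 2\), \(\sigma^2 = 1\) and a fixed design \(X_i = 1, \ldots, 8\). It deliberately draws the errors from a uniform distribution rather than a normal one, since model (1) only requires uncorrelated errors with mean zero and constant variance. The function name `average_estimates` is hypothetical.

```python
import math
import random


def average_estimates(n_reps=5000, seed=1):
    """Average b0, b1 and MSE over repeated samples from model (1)."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, 9)]   # fixed (non-random) predictor values
    n = len(x)
    beta0, beta1 = 1.0, 2.0               # assumed true parameters
    a = math.sqrt(3.0)                    # Uniform(-a, a) has variance a^2/3 = 1
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sum_b0 = sum_b1 = sum_mse = 0.0
    for _ in range(n_reps):
        y = [beta0 + beta1 * xi + rng.uniform(-a, a) for xi in x]
        y_bar = sum(y) / n
        b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
        b0 = y_bar - b1 * x_bar
        sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
        sum_b0 += b0
        sum_b1 += b1
        sum_mse += sse / (n - 2)
    return sum_b0 / n_reps, sum_b1 / n_reps, sum_mse / n_reps


mean_b0, mean_b1, mean_mse = average_estimates()
print(mean_b0, mean_b1, mean_mse)  # should be close to 1, 2 and 1
```

The averages are only close to the true values, not equal to them, because a finite number of replications leaves some Monte Carlo error.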

Assuming that \(X_1,...,X_n\) are *non-random*, the variances of \(b_0\) and \(b_1\) are given by:

\[\sigma^2(b_0) = \sigma^2\left [ \frac{1}{n} + \frac{\overline{X}^2}{\sum_i(X_i-\overline{X})^2} \right ], \]

and

\[ \sigma^2(b_1) = \frac{\sigma^2}{ \sum_{i=1}^n (X_i - \overline{X})^2 }. \tag{3} \]

Replacing \(\sigma^2\) by \(MSE\), we obtain the estimates of the variances of \(b_0\) and \(b_1\), and these are denoted by

\[s^2(b_0) = MSE \left [ \frac{1}{n} + \frac{\overline{X}^2}{\sum_i(X_i - \overline{X})^2} \right ],\]

and

\[s^2(b_1) = \frac{MSE}{\sum_{i=1}^n(X_i - \overline{X})^2}, \tag{4}\]

respectively. Thus, \(s(b_0)\) and \(s(b_1)\) are the estimated **standard errors** of the estimators of \(\beta_0\) and \(\beta_1\), respectively.

Similarly, the variance of \(\widehat{Y}_h\) and its estimate are

\[\sigma^2(\widehat{Y}_h) = \sigma^2 \left [ \dfrac{1}{n} + \dfrac{(X_h - \overline{X})^2}{\sum_i(X_i-\overline{X})^2} \right ], \]

\[s^2(\widehat{Y}_h) = MSE \left [ \dfrac{1}{n} + \dfrac{(X_h-\overline{X})^2}{\sum_i(X_i - \overline{X})^2} \right ], \tag{5}\]

respectively.
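Equations (4) and (5) can be evaluated directly once \(MSE\) is known. The following sketch assumes \(MSE\) has already been computed; the function name `standard_errors` and the numbers used (five design points and \(MSE = 0.8\)) are hypothetical values chosen so the result is easy to verify by hand.

```python
import math


def standard_errors(x, mse, x_h):
    """Estimated standard errors s(b0), s(b1) and s(Y_hat_h) at X = x_h."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    s2_b0 = mse * (1.0 / n + x_bar ** 2 / sxx)             # s^2(b0)
    s2_b1 = mse / sxx                                      # equation (4)
    s2_yhat = mse * (1.0 / n + (x_h - x_bar) ** 2 / sxx)   # equation (5)
    return math.sqrt(s2_b0), math.sqrt(s2_b1), math.sqrt(s2_yhat)


# Hypothetical inputs: X = 1..5, MSE = 0.8, prediction at X_h = 5.
x = [1, 2, 3, 4, 5]
s_b0, s_b1, s_yhat = standard_errors(x, mse=0.8, x_h=5)
print(s_b0 ** 2, s_b1 ** 2, s_yhat ** 2)  # approximately 0.88, 0.08, 0.48
```

Note that \(s^2(\widehat{Y}_h)\) is smallest at \(X_h = \overline{X}\), where it reduces to \(MSE/n\), and grows as \(X_h\) moves away from \(\overline{X}\).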

### Normal linear regression model

In the model specified by (1), if the random variables \(\epsilon_1, ..., \epsilon_n\) are independent and identically distributed as \(N(0,\sigma^2)\), then we have a *normal linear regression model*. This means that for each fixed value of \(X\), the conditional distribution of \(Y\) given \(X\) is \(N(\beta_0 + \beta_1X, \sigma^2)\).

#### Maximum likelihood estimation

Under this model, one can also obtain estimates of \(\beta_0, \beta_1,\) and \(\sigma^2\) by the method of **maximum likelihood**. This means that one treats the joint probability density function of \(Y_1, ..., Y_n\) given \(X_1, ..., X_n\),

\[f(Y_1, ..., Y_n|X_1, ..., X_n;\beta_0,\beta_1,\sigma^2) = \dfrac{1}{(\sigma\sqrt{2\pi})^n} \exp \left(\dfrac{-1}{2\sigma^2}\sum_{i=1}^n(Y_i - \beta_0 - \beta_1X_i)^2 \right ),\]

as a function, say \(L(\beta_0,\beta_1,\sigma^2)\), of the parameters, and then maximizes this function with respect to the parameters by solving the equations:

\[\frac{\partial(\log L)}{\partial\beta_0} = 0,\]

\[\frac{\partial(\log L)}{\partial\beta_1} = 0,\]

\[\frac{\partial(\log L)}{\partial\sigma^2} = 0 \tag{6}\]

to obtain the *maximum likelihood estimates*:

\[\widehat{\beta}_1 = \frac{\sum_{i=1}^n(X_i - \overline{X})(Y_i - \overline{Y})}{\sum_{i=1}^n(X_i - \overline{X})^2} = b_1,\]

\[\widehat{\beta}_0 = \overline{Y} - \widehat{\beta}_1\overline{X} = b_0,\]

\[\widehat{\sigma^2} = \frac{1}{n}\sum_{i=1}^n(Y_i - \widehat{\beta}_0 - \widehat{\beta}_1X_i)^2 = \frac{n - 2}{n}MSE. \tag{7}\]
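As a sketch of the intermediate step between (6) and (7), the log-likelihood and its partial derivatives can be written out explicitly. This is the standard calculation, spelled out here for convenience:

\[\log L = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n(Y_i - \beta_0 - \beta_1X_i)^2,\]

\[\frac{\partial(\log L)}{\partial\beta_0} = \frac{1}{\sigma^2}\sum_{i=1}^n(Y_i - \beta_0 - \beta_1X_i),\]

\[\frac{\partial(\log L)}{\partial\beta_1} = \frac{1}{\sigma^2}\sum_{i=1}^nX_i(Y_i - \beta_0 - \beta_1X_i),\]

\[\frac{\partial(\log L)}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n(Y_i - \beta_0 - \beta_1X_i)^2.\]

Setting the first two derivatives to zero gives the same pair of equations that the least-squares estimates satisfy, which is why \(\widehat{\beta}_0 = b_0\) and \(\widehat{\beta}_1 = b_1\). Setting the third to zero and substituting these values gives \(\widehat{\sigma^2} = SSE/n\), which differs from \(MSE = SSE/(n-2)\) only in the divisor.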

#### Exact distribution

Under the normality assumption, we can compute the exact distributions of certain random variables that are very important for conducting tests of hypotheses about the different parameters. Specifically, \(SSE\) and \((b_0,b_1)\) are independently distributed, and

\[SSE \sim \sigma^2\chi^2_{n-2},\]

\[\dfrac{b_0 - \beta_0}{s(b_0)} \sim t_{n-2},\]

and

\[\dfrac{b_1 - \beta_1}{s(b_1)} \sim t_{n-2}, \tag{8}\]

where \(\chi^2_k\) and \(t_k\) denote the chi-square and *t*-distributions, respectively, with *k* **degrees of freedom**.
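These distributional results can be probed with a small Monte Carlo experiment. The sketch below is illustrative only: it assumes \(\beta_0 = 1\), \(\beta_1 = 2\), \(\sigma = 1.5\) and a fixed design with \(n = 10\), and checks two consequences of the results above, namely that \(SSE/\sigma^2\) has mean \(n - 2\) (the mean of a \(\chi^2_{n-2}\) variable) and that the studentized slope has mean zero. The function name `simulate` is hypothetical.

```python
import math
import random


def simulate(n_reps=4000, seed=0):
    """Draw repeated samples from the normal linear regression model."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, 11)]   # fixed design, n = 10
    n = len(x)
    beta0, beta1, sigma = 1.0, 2.0, 1.5    # assumed true parameters
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    scaled_sse, t_stats = [], []
    for _ in range(n_reps):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        y_bar = sum(y) / n
        b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
        b0 = y_bar - b1 * x_bar
        sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
        mse = sse / (n - 2)
        scaled_sse.append(sse / sigma ** 2)                  # SSE / sigma^2
        t_stats.append((b1 - beta1) / math.sqrt(mse / sxx))  # (b1 - beta1) / s(b1)
    return n, scaled_sse, t_stats


n, scaled_sse, t_stats = simulate()
mean_scaled_sse = sum(scaled_sse) / len(scaled_sse)
mean_t = sum(t_stats) / len(t_stats)
print(mean_scaled_sse, n - 2)  # the two numbers should be close
print(mean_t)                  # should be close to 0
```

A simulation like this does not prove the exact distributions; comparing the simulated values against \(\chi^2_{n-2}\) and \(t_{n-2}\) quantiles would be a sharper check.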

### Contributors

- Scott Brunstein
- Debashis Paul