
3.3: Our First Assumptions


    This was fun. We were able to determine the correct formula for the line of best fit, given our particular definition of "best." Those equations led to other equations. All of these are mathematical results for our given sample.

    Cool mule.

    ✦•················• ✦ •··················•✦

    Until this point, we have only required variation in the independent variable. If we make three additional assumptions, we obtain further results.

    Note

    This is how mathematical statistics progresses. Assumptions are made, then we play with the equations to learn about the consequences of those assumptions. Then, when we have exhausted our efforts, we make additional assumptions... ad infinitum.

    Recall that the data model is

    \begin{equation}
    y = \beta_0 + \beta_1 x + \varepsilon \label{eq:dataModel}
    \end{equation}

    With this, here are the three assumptions we will make, all about the residuals:

    • The first assumption is that the residuals are realizations of a random variable (\(\varepsilon\) has a distribution).
    • The second is that the expected value of the residuals is zero, \(E[\varepsilon]=0\) (the measurements are not systematically biased).
    • The third is that the residuals are independent and have a finite and constant variance, \(V[\varepsilon]=\sigma^2 < \infty\) (the residuals are homoskedastic).

    The above simple assumptions lead to several additional interesting results. Some are proven here, some are left as exercises.
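    To make the setup concrete, here is a minimal simulation sketch (an illustration, not from the text). It generates one sample from the data model above under the three assumptions and computes the OLS estimates from the closed-form formulas. The parameter values, the sample size, and the use of a normal error distribution are arbitrary choices; the assumptions themselves do not require normality, only a mean of zero, constant finite variance, and independence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population values chosen only for illustration
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 50

x = rng.uniform(0, 10, size=n)       # x must vary (our original requirement)
eps = rng.normal(0, sigma, size=n)   # E[eps] = 0, constant variance sigma^2, independent
y = beta0 + beta1 * x + eps          # the data model y = beta0 + beta1*x + eps

xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
Sxy = np.sum((x - xbar) * (y - ybar))

b1 = Sxy / Sxx            # OLS slope
b0 = ybar - b1 * xbar     # OLS intercept
print(b0, b1)             # close to beta0 and beta1, but not exactly equal
```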

    Theorem \(\PageIndex{1}\)

    The OLS estimator for \(\beta_1\) is unbiased. That is,

    \begin{equation}
    E[b_1] = \beta_1
    \end{equation}


    Proof:
    To prove this, we will start with the formula for \(b_1\), treat the values \(x_i\) as fixed, and simplify until we obtain the result.

    \begin{align}
    E[b_1] &= E\left[ \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}\right] \\[1em]
    &= E\left[ \frac{\sum_{i=1}^n (x_i - \bar{x})y_i}{\sum_{i=1}^n (x_i - \bar{x})^2} \right] \\[1em]
    &= \frac{\sum_{i=1}^n (x_i - \bar{x})E[y_i]}{\sum_{i=1}^n (x_i - \bar{x})^2} \\[1em]
    &= \frac{\sum_{i=1}^n (x_i - \bar{x})\left(\beta_0 + x_i\beta_1 + E[\varepsilon]\right)}{\sum_{i=1}^n (x_i - \bar{x})^2} \\[1em]
    %
    &= \frac{\sum_{i=1}^n (x_i - \bar{x})\beta_0}{\sum_{i=1}^n (x_i - \bar{x})^2} + \frac{\sum_{i=1}^n (x_i - \bar{x}) x_i\beta_1 }{\sum_{i=1}^n (x_i - \bar{x})^2} + \frac{\sum_{i=1}^n (x_i - \bar{x})E[\varepsilon]}{\sum_{i=1}^n (x_i - \bar{x})^2} \\[1em]
    %
    &= \frac{\beta_0\sum_{i=1}^n (x_i - \bar{x})}{\sum_{i=1}^n (x_i - \bar{x})^2} + \frac{\beta_1 \sum_{i=1}^n (x_i - \bar{x})x_i}{\sum_{i=1}^n (x_i - \bar{x})^2} + \frac{E[\varepsilon] \sum_{i=1}^n (x_i - \bar{x})}{\sum_{i=1}^n (x_i - \bar{x})^2} \\[1em]
    %
    &= \beta_0\ \frac{0}{\sum_{i=1}^n (x_i - \bar{x})^2} + \beta_1\ \frac{\sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})}{\sum_{i=1}^n (x_i - \bar{x})^2} + 0\ \frac{0}{\sum_{i=1}^n (x_i - \bar{x})^2} \\[1em]
    %
    &= \beta_1
    \end{align}

    Thus, the OLS estimator of \(\beta_1\) is unbiased. This is a nice property: on average, the estimate \(b_1\) equals the parameter \(\beta_1\) it is estimating.
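    As a quick sanity check on this theorem, the following sketch (again an illustration, not from the text) simulates many samples with the same \(x\) values and averages the resulting slope estimates. The parameter values and the normal error draws are arbitrary choices that satisfy the three assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma, n = 2.0, 0.5, 1.0, 50   # hypothetical values for illustration

x = rng.uniform(0, 10, size=n)   # keep the x values fixed across replications
xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)

b1_draws = []
for _ in range(20_000):
    eps = rng.normal(0, sigma, size=n)   # any mean-zero, constant-variance, independent draw works
    y = beta0 + beta1 * x + eps
    b1_draws.append(np.sum((x - xbar) * (y - y.mean())) / Sxx)

print(np.mean(b1_draws))   # approximately 0.5 = beta1, illustrating E[b1] = beta1
```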

    Note

    Did you notice where we used these three results?

    \begin{align}
    & \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^n (x_i - \bar{x})y_i \text{,} \\
    & \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x}) = \sum_{i=1}^n (x_i - \bar{x})x_i \text{, and} \\
    & \sum_{i=1}^n (x_i - \bar{x}) = 0
    \end{align}

    All three are just simple algebra.
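    If you would like to convince yourself numerically, this small sketch (an illustration, not from the text) checks all three identities on an arbitrary sample; both sides agree up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=25)   # arbitrary sample, for illustration only
y = rng.uniform(0, 10, size=25)
xbar, ybar = x.mean(), y.mean()

# First identity: sum (x - xbar)(y - ybar) = sum (x - xbar) y
print(np.allclose(np.sum((x - xbar) * (y - ybar)), np.sum((x - xbar) * y)))   # True
# Second identity: sum (x - xbar)(x - xbar) = sum (x - xbar) x
print(np.allclose(np.sum((x - xbar) * (x - xbar)), np.sum((x - xbar) * x)))   # True
# Third identity: sum (x - xbar) = 0
print(np.isclose(np.sum(x - xbar), 0.0))                                      # True
```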


    ───── ⋆⋅☆⋅⋆ ─────

    Here are a few other results for you to prove:

    Theorem \(\PageIndex{2}\)

    \( E[b_0] = \beta_0 \)

    Theorem \(\PageIndex{3}\)

    \( V[b_1] = \frac{\displaystyle \sigma^2}{\displaystyle S_{xx}} \)

    Theorem \(\PageIndex{4}\)

    \( V[b_0] = \sigma^2\left(\frac{\displaystyle 1}{\displaystyle n} + \frac{\displaystyle \bar{x}^2}{\displaystyle S_{xx}}\right) \)

    Theorem \(\PageIndex{5}\)

    \( Cov[b_0, b_1] = -\sigma^2 \frac{\displaystyle \bar{x}}{\displaystyle S_{xx}} \)
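    A simulation can make these plausible before you prove them. The sketch below (an illustration with arbitrary parameter values and normal error draws, one convenient way to satisfy the three assumptions) compares the simulated variances and covariance of \(b_0\) and \(b_1\) with the formulas in the theorems above.

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1, sigma, n = 2.0, 0.5, 1.0, 50   # hypothetical values for illustration

x = rng.uniform(0, 10, size=n)   # fixed x values across replications
xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)

b0s, b1s = [], []
for _ in range(20_000):
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)
    b1 = np.sum((x - xbar) * (y - y.mean())) / Sxx
    b0s.append(y.mean() - b1 * xbar)
    b1s.append(b1)

print(np.var(b1s), sigma**2 / Sxx)                        # V[b1] = sigma^2 / Sxx
print(np.var(b0s), sigma**2 * (1/n + xbar**2 / Sxx))      # V[b0] = sigma^2 (1/n + xbar^2 / Sxx)
print(np.cov(b0s, b1s)[0, 1], -sigma**2 * xbar / Sxx)     # Cov[b0, b1] = -sigma^2 xbar / Sxx
```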

    ───── ⋆⋅☆⋅⋆ ─────

    Finally, let us define the mean square error (MSE) in the case of simple linear regression (SLR), in which there is one dependent and one independent variable.

    Definition: Mean Square Error (MSE)

    \( \text{MSE} = \frac{\displaystyle 1}{\displaystyle n-2}\ \displaystyle \sum_{i=1}^n e_i^2 \)

    Theorem \(\PageIndex{6}\)

    \( E[\text{MSE}] = \sigma^2 \)

    In other words, the MSE as defined above is an unbiased estimator of the variance of the residuals; this is why we divide by \(n-2\) rather than \(n\).

    Note

    Be aware that the above definition of the MSE only holds in the case of simple linear regression (SLR); that is, it holds when there is just one dependent and one independent variable. It can be shown that the general definition of the MSE is

    \( \text{MSE} = \frac{\displaystyle 1}{\displaystyle n-p}\ \displaystyle \sum_{i=1}^n e_i^2 \)

    Here, \(p\) is the number of parameters being estimated, which is usually the number of independent variables plus one for the constant term. In SLR, \(p = 2\) (the intercept and the slope), which recovers the \(n-2\) above.
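    The following sketch (an illustration with arbitrary parameter values, not from the text) checks the SLR case: across many simulated samples, the MSE with its \(n-2\) divisor averages out to \(\sigma^2\).

```python
import numpy as np

rng = np.random.default_rng(4)
beta0, beta1, sigma, n = 2.0, 0.5, 1.0, 50   # hypothetical values for illustration

x = rng.uniform(0, 10, size=n)
xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)

mses = []
for _ in range(20_000):
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)
    b1 = np.sum((x - xbar) * (y - y.mean())) / Sxx
    b0 = y.mean() - b1 * xbar
    e = y - (b0 + b1 * x)                 # residuals e_i
    mses.append(np.sum(e**2) / (n - 2))   # SLR MSE with the n - 2 divisor

print(np.mean(mses))   # approximately sigma^2 = 1.0, illustrating E[MSE] = sigma^2
```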

     


    This page titled 3.3: Our First Assumptions is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Ole Forsberg.
