
3: Intro to Linear Regression



    Regression is a set of methods that seek to learn the specific relationship between one or more influenced (dependent, response) variables and one or more influencing (independent, predictor) variables. There are many existing regression methods, each focusing on different ways of determining how best to quantify that relationship.

    As is tradition, this chapter starts with our first definition of "best fit" and derives many results from that definition. The chapter is purely mathematical in the sense that probability distributions are not considered until Chapter 5.


    [Figure: scatter plot of the sample data with a fitted line]
    Figure \(\PageIndex{1}\): Sample data and a line of best fit for that data. Note that the slope of the line is negative. This indicates that increasing values of \(x\) tend to correspond to lower values of \(y\). Regression detects such trends.

    Let \(x\) and \(y\) be numeric variables. The linear relationship between \(x\) and \(y\) can be summarized by a line that "best" fits the observed data. That is, we can summarize the relationship between \(x\) and \(y\) using a linear equation:

    \begin{equation}
    y = \beta_0 + \beta_1 x + \varepsilon \label{eq:ch2a-lobf}
    \end{equation}

    Here, parameter \(\beta_1\) represents the slope and parameter \(\beta_0\) represents the y-intercept (the value of \(y\) on the line when \(x=0\)). The slope is usually the only thing in the equation that is interesting; it is the effect of \(x\) on \(y\). The \(\varepsilon\) represents the vertical distance between the observation and the population line of best fit. It contains all of the things that affect \(y\) that are not included in \(x\).
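
    To make Equation \ref{eq:ch2a-lobf} concrete, here is a minimal simulation sketch. The parameter values, the sample size, and the normal distribution used for \(\varepsilon\) are hypothetical choices for illustration only; the chapter itself makes no distributional assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population parameters (in practice these are unknown).
beta0 = 5.0    # y-intercept: value of y on the line when x = 0
beta1 = -2.0   # slope: the effect of x on y

# epsilon collects everything affecting y that is not included in x.
# The normal distribution here is purely illustrative.
x = rng.uniform(0, 10, size=50)
eps = rng.normal(0, 1.5, size=50)

# The population relationship: y = beta0 + beta1 * x + epsilon
y = beta0 + beta1 * x + eps
```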

    We said that the line given in equation \ref{eq:ch2a-lobf} "best" fits the observed data. What we mean by "best" determines where we go from here. In thinking about "best," it may help to see some sample data and the "line of best fit" for it (Figure \(\PageIndex{1}\), above).

    A good statistician will ask:

    What makes this line the "best"?

    A good statistician will answer:

    It depends.

    Note that there are at least three definitions of "best" that we can use:

    1. Maximize the likelihood that the data were generated
    2. Minimize the sum of the absolute value of the residuals
    3. Minimize the sum of the square of the residuals

    All three definitions are entirely legitimate — as are many other definitions. However, each leads to different estimation methods and estimators.
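
    Written out for a candidate line with intercept \(b_0\) and slope \(b_1\), the second and third definitions ask for the values that minimize, respectively,

    \begin{equation*}
    \sum_{i=1}^{n} \left| y_i - (b_0 + b_1 x_i) \right|
    \qquad \text{and} \qquad
    \sum_{i=1}^{n} \left( y_i - (b_0 + b_1 x_i) \right)^2 .
    \end{equation*}

    The first definition additionally requires a probability model for the data, which is why it waits until Chapter 12.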

    Note

    While different definitions of "best" will usually give different estimates, the substantive conclusions will rarely differ meaningfully in a well-formed model.

    Note that the result will be a line represented by

    \begin{equation}
    \hat{y} = b_0 + b_1 x \label{eq:lm2a-model}
    \end{equation}

    Using Latin characters indicates that these are based on your particular sample; they are sample estimates. Contrast this with the Greek characters used for population parameters. The "hat" on the \(y\) indicates that it is an estimate. Taken together, this is our model equation: the equation of the line of best fit based on the data you collected.
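
    As a small illustration, suppose a particular sample produced the (hypothetical) estimates \(b_0 = 4.8\) and \(b_1 = -1.9\). The fitted values \(\hat{y}\) then come straight from Equation \ref{eq:lm2a-model}, and each residual is the observed \(y\) minus its fitted value. A minimal sketch, with made-up observations:

```python
import numpy as np

# Hypothetical sample estimates (Latin letters: computed from the sample).
b0, b1 = 4.8, -1.9

# A few made-up observations.
x = np.array([1.0, 3.0, 5.0])
y = np.array([3.1, -1.2, -4.6])

y_hat = b0 + b1 * x     # fitted values from the model equation
resid = y - y_hat       # residuals: observed minus fitted
```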

    The first definition leads to "maximum likelihood estimation," which will be covered in Chapter 12. It is an excellent technique that can be generalized to many more settings than can ordinary least squares. Its greatest strength is that it makes use of the researcher's greater understanding of the data-generating process (Chapter 14 to Chapter 18). Its greatest weakness is the mathematics involved.

    The second definition leads to a type of robust regression frequently termed "median regression." This method is helpful when there are outliers in the data that you cannot (or should not) remove. The drawback is that estimating the two parameters (\(\beta_0\) and \(\beta_1\)) has no closed-form solution. In other words, it requires an iterative sequence of steps that can only approximate those estimates, and the approximation process is computationally intensive. Because of this, median regression saw little use until recently, and the statistical theory behind it is not as well explored as that of other methods. We will see this method in Chapter 11: Quantile Regression.
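
    A minimal numerical sketch of that iterative idea, assuming NumPy and SciPy are available and using made-up data: the sum of absolute residuals is handed to a general-purpose optimizer, which approximates the minimizing \(b_0\) and \(b_1\) through repeated steps rather than an exact formula.

```python
import numpy as np
from scipy.optimize import minimize

# Made-up data for illustration only.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 5.0 - 2.0 * x + rng.normal(0, 1.5, size=50)

def sum_abs_residuals(params):
    b0, b1 = params
    return np.sum(np.abs(y - (b0 + b1 * x)))

# Nelder-Mead copes with the non-differentiable absolute value;
# the answer is an iterative approximation, not an exact solution.
fit = minimize(sum_abs_residuals, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
b0_hat, b1_hat = fit.x
```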

    The most popular definition of "best," and the one that starts our journey, is the final definition. It leads to an estimation method called "ordinary least squares" (OLS). It is straightforward to minimize a sum of squared values using differential calculus. One strength is that the process yields an equation: a closed-form solution with no need for iteration. This means that the process returns mathematically exact values. The drawback is that OLS is limited in the types of processes it can model.
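
    For comparison, here is a sketch of that closed-form idea, again assuming NumPy and made-up data. The slope and intercept come from the well-known least-squares formulas, with NumPy's built-in polynomial fit as a cross-check; no iteration is involved.

```python
import numpy as np

# Made-up data for illustration only.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 5.0 - 2.0 * x + rng.normal(0, 1.5, size=50)

# Closed-form least-squares estimates: exact given the data, no iteration.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Cross-check with NumPy's least-squares polynomial fit (degree 1),
# which returns the slope and intercept in that order.
b1_check, b0_check = np.polyfit(x, y, deg=1)
```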

    We start exploring ordinary least squares immediately.


    This page titled 3: Intro to Linear Regression is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Ole Forsberg.
