Statistics LibreTexts

Simple linear regression


    The basic problem in regression analysis is to understand the relationship between a response variable, denoted by \(Y\), and one or more predictor variables, denoted by \(X\). The relationship is typically empirical or statistical, as opposed to functional or mathematical. The goal is to describe this relationship as a functional dependence of the mean value of \(Y\), given any value of \(X\), using paired observations \( \{(X_i,Y_i) : i=1,\ldots,n\} \).

    A basic linear regression model for the response \(Y\) on the predictor \(X\) is given by

    $$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i=1,\ldots,n,$$

    where the noise terms \( \varepsilon_1, \ldots, \varepsilon_n \) are uncorrelated, with \( \mathrm{Mean}(\varepsilon_i) = 0 \) and \( \mathrm{Variance}(\varepsilon_i) = \sigma^2 \) for every \(i\).
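As a minimal sketch of what the model above describes, the following Python snippet generates responses from \(Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i\) and checks that the noise terms have sample mean near 0 and sample variance near \(\sigma^2\). The parameter values, the grid of \(X\) values, the function name `simulate_linear_model`, and the choice of Gaussian noise are all assumptions made for illustration; the model itself only requires uncorrelated noise with mean 0 and common variance.

```python
import random
import statistics

def simulate_linear_model(x_values, beta0, beta1, sigma, seed=0):
    """Draw Y_i = beta0 + beta1 * X_i + eps_i with independent noise terms."""
    rng = random.Random(seed)
    # Gaussian noise is an assumption of this sketch; the model only asks for
    # uncorrelated noise with mean 0 and variance sigma^2.
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in x_values]

# Hypothetical parameter values, chosen only for illustration.
beta0, beta1, sigma = 2.0, 0.5, 1.0
x_values = [i / 100 for i in range(1000)]  # X from 0.00 to 9.99
y_values = simulate_linear_model(x_values, beta0, beta1, sigma)

# Subtract the line to recover the noise terms, then summarize them.
noise = [y - (beta0 + beta1 * x) for x, y in zip(x_values, y_values)]
print("sample mean of noise:", statistics.fmean(noise))
print("sample variance of noise:", statistics.variance(noise))
```

With real data the parameters are unknown, so the noise terms cannot be recovered exactly as they are here; the sketch only illustrates the data-generating assumptions.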


    To interpret the model, look at the scatter plot of \(Y\) (vertical axis) vs. \(X\) (horizontal axis) and consider narrow vertical strips around different values of \(X\):

    • The means (measure of center) of the points falling in the vertical strips lie (approximately) on a straight line with slope \( \beta_1 \) and intercept \( \beta_0 \).
    • The standard deviations (measure of spread) of the points falling in the vertical strips are (roughly) the same.
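The vertical-strip description above can be checked numerically. The sketch below simulates data from the model, then computes the mean and standard deviation of \(Y\) within narrow strips around several values of \(X\). The parameter values, strip centers, strip half-width, and the helper name `strip_summary` are hypothetical choices for illustration, and Gaussian noise is again an assumption.

```python
import random
import statistics

rng = random.Random(1)
beta0, beta1, sigma = 2.0, 0.5, 1.0  # hypothetical values for illustration

# Simulated paired observations (X_i, Y_i) from the linear model.
xs = [rng.uniform(0.0, 10.0) for _ in range(5000)]
ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

def strip_summary(xs, ys, center, half_width=0.25):
    """Mean and standard deviation of Y over points with |X - center| <= half_width."""
    in_strip = [y for x, y in zip(xs, ys) if abs(x - center) <= half_width]
    return statistics.fmean(in_strip), statistics.stdev(in_strip)

for center in [1.0, 3.0, 5.0, 7.0, 9.0]:
    mean_y, sd_y = strip_summary(xs, ys, center)
    on_line = beta0 + beta1 * center  # height of the line at the strip center
    print(f"X near {center}: mean of Y = {mean_y:.2f} "
          f"(line: {on_line:.2f}), SD of Y = {sd_y:.2f}")
```

In each strip the mean of \(Y\) should fall close to \(\beta_0 + \beta_1 X\) evaluated at the strip center, and the standard deviations should all be close to \(\sigma\), up to sampling variability.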

    This page titled Simple linear regression is shared under a not declared license and was authored, remixed, and/or curated by Debashis Paul.