Statistics LibreTexts

Simple linear regression

The basic problem in regression analysis is to understand the relationship between a response variable, denoted by \(Y\), and one or more predictor variables, denoted by \(X\). The relationship is typically empirical or statistical, as opposed to functional or mathematical. The goal is to describe this relationship as a functional dependence of the mean value of \(Y\) on the value of \(X\), using paired observations \( \{(X_i,Y_i) : i=1,\ldots,n\} \).

A basic linear regression model for the response \(Y\) on the predictor \(X\) is given by 

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i=1,\ldots,n,$$

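where the noise terms \( \varepsilon_1, \ldots, \varepsilon_n \) are uncorrelated, with \(\text{Mean}(\varepsilon_i) = 0\) and \(\text{Variance}(\varepsilon_i) = \sigma^2\). Consequently, at a given value of \(X_i\), the response \(Y_i\) has mean \(\beta_0 + \beta_1 X_i\) and variance \(\sigma^2\).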
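As a minimal sketch of what the model says, the following Python snippet simulates pairs \((X_i, Y_i)\) from it and checks that the simulated noise has mean close to 0 and variance close to \(\sigma^2\). The function name `simulate_linear_model`, the parameter values, the uniform distribution of the predictor, and the Gaussian noise are all illustrative assumptions; the model itself only requires uncorrelated noise with mean 0 and common variance.

```python
import random
import statistics


def simulate_linear_model(n, beta0, beta1, sigma, seed=0):
    """Draw n pairs (X_i, Y_i) with Y_i = beta0 + beta1 * X_i + eps_i."""
    rng = random.Random(seed)
    # Assumption: predictors spread uniformly over [0, 10].
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    # Assumption: independent Gaussian noise with mean 0 and SD sigma.
    eps = [rng.gauss(0.0, sigma) for _ in range(n)]
    ys = [beta0 + beta1 * x + e for x, e in zip(xs, eps)]
    return xs, ys, eps


# Hypothetical parameter values chosen only for illustration.
xs, ys, eps = simulate_linear_model(n=5000, beta0=1.0, beta1=2.0, sigma=1.5)
print("sample mean of noise:    ", statistics.fmean(eps))     # should be near 0
print("sample variance of noise:", statistics.variance(eps))  # should be near 1.5**2
```

With real data the noise terms are not observed, so this check is only possible in a simulation where \(\beta_0\) and \(\beta_1\) are known.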


To interpret the model, look at the scatter plot of \(Y\) (vertical axis) vs. \(X\) (horizontal axis) and consider narrow vertical strips around different values of \(X\):
  • The means (measure of center) of the points falling in the vertical strips lie (approximately) on a straight line with slope \(\beta_1\) and intercept \(\beta_0\).
  • The standard deviations (measure of spread) of the points falling in the vertical strips are (roughly) the same.
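
The vertical-strip description can be checked numerically. The sketch below simulates data under assumed parameter values, cuts the \(X\) range into narrow equal-width strips, and reports the mean and standard deviation of \(Y\) in each strip. The helper `strip_summaries`, the choice of 20 strips, and the simulated data are hypothetical and serve only as an illustration.

```python
import random
import statistics


def strip_summaries(xs, ys, n_strips, lo, hi):
    """Return (strip centre, mean of Y, SD of Y) for equal-width strips on [lo, hi]."""
    width = (hi - lo) / n_strips
    groups = [[] for _ in range(n_strips)]
    for x, y in zip(xs, ys):
        k = min(int((x - lo) / width), n_strips - 1)  # put x == hi in the last strip
        groups[k].append(y)
    return [
        (lo + (k + 0.5) * width, statistics.fmean(g), statistics.stdev(g))
        for k, g in enumerate(groups)
        if len(g) >= 2  # the sample SD needs at least two points
    ]


# Hypothetical simulated data: uniform predictor, Gaussian noise (both assumptions).
rng = random.Random(1)
beta0, beta1, sigma = 1.0, 2.0, 1.5
xs = [rng.uniform(0.0, 10.0) for _ in range(20000)]
ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

for centre, mean_y, sd_y in strip_summaries(xs, ys, n_strips=20, lo=0.0, hi=10.0):
    line = beta0 + beta1 * centre  # value of the true line at the strip centre
    print(f"x near {centre:5.2f}: mean(Y) = {mean_y:6.2f}, line = {line:6.2f}, SD(Y) = {sd_y:.2f}")
```

In each strip the mean of \(Y\) should sit close to the line and the standard deviation close to \(\sigma\). The strip standard deviation tends to be slightly larger than \(\sigma\) because \(X\) still varies a little within a strip; narrower strips reduce this effect.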