Simple linear regression

The basic problem in regression analysis is to understand the relationship between a response variable, denoted by $$Y$$, and one or more predictor variables, denoted by $$X$$. The relationship is typically empirical or statistical rather than functional or mathematical. The goal is to describe this relationship as a functional dependence of the mean value of $$Y$$ on the value of $$X$$, using paired observations $$\{(X_i,Y_i) : i=1,\ldots,n\}$$.

A basic linear regression model for the response $$Y$$ on the predictor $$X$$ is given by

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i=1,\ldots,n,$$

where the noise terms $$\varepsilon_1, \ldots, \varepsilon_n$$ are uncorrelated, with $$\mathrm{Mean}(\varepsilon_i) = 0$$ and $$\mathrm{Variance}(\varepsilon_i) = \sigma^2$$.
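To make the model concrete, the following is a minimal simulation sketch. It rests on several assumptions that are not part of the model itself: the noise is drawn from a normal distribution (the model only requires uncorrelated noise with mean zero and constant variance), the predictor values are drawn uniformly from $$[0, 10]$$, and the parameter values $$\beta_0 = 1$$, $$\beta_1 = 2$$, $$\sigma = 0.5$$ are chosen purely for illustration. The function name `simulate_linear_model` is hypothetical.

```python
import random
import statistics


def simulate_linear_model(n, beta0, beta1, sigma, seed=0):
    """Draw n pairs (X_i, Y_i) with Y_i = beta0 + beta1 * X_i + eps_i.

    Assumptions for illustration only: X_i is uniform on [0, 10] and
    eps_i is independent normal noise with mean 0 and SD sigma.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


# Illustrative parameter values (not taken from any data set).
xs, ys = simulate_linear_model(n=500, beta0=1.0, beta1=2.0, sigma=0.5)

# Recover the noise terms by subtracting the true line from each Y_i.
noise = [y - (1.0 + 2.0 * x) for x, y in zip(xs, ys)]

# The sample mean should be near 0 and the sample SD near sigma.
print(statistics.mean(noise), statistics.pstdev(noise))
```

In practice $$\beta_0$$, $$\beta_1$$, and $$\sigma$$ are unknown, so the noise terms cannot be recovered this way; the sketch only shows what the model assumptions say about data generated from it.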

Interpretation

Look at the scatter plot of $$Y$$ (vertical axis) vs. $$X$$ (horizontal axis). Consider narrow vertical strips around the different values of $$X$$:

• The means (measure of center) of the points falling in the vertical strips lie (approximately) on a straight line with slope $$\beta_1$$ and intercept $$\beta_0$$.
• The standard deviations (measure of spread) of the points falling in each vertical strip are (roughly) the same.
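The vertical-strip interpretation can be checked numerically. The sketch below simulates data under assumed values ($$\beta_0 = 1$$, $$\beta_1 = 2$$, $$\sigma = 0.5$$, normal noise, and $$X$$ uniform on $$[0, 10]$$, all chosen only for illustration), cuts the horizontal axis into narrow strips of width 0.2, and reports the mean and standard deviation of $$Y$$ within each strip. The helper name `strip_summaries` is hypothetical.

```python
import random
import statistics


def strip_summaries(xs, ys, edges):
    """Summarize Y within each vertical strip [edges[k], edges[k+1]).

    Returns a list of (strip midpoint, mean of Y, sample SD of Y),
    skipping strips with fewer than two points.
    """
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_strip = [y for x, y in zip(xs, ys) if lo <= x < hi]
        if len(in_strip) >= 2:
            mid = (lo + hi) / 2
            out.append((mid, statistics.mean(in_strip), statistics.stdev(in_strip)))
    return out


# Simulated data under assumed, illustrative parameter values.
rng = random.Random(1)
beta0, beta1, sigma = 1.0, 2.0, 0.5
xs = [rng.uniform(0.0, 10.0) for _ in range(5000)]
ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# 50 narrow strips of width 0.2 covering [0, 10].
edges = [k / 5 for k in range(51)]
summaries = strip_summaries(xs, ys, edges)

for mid, mean_y, sd_y in summaries:
    # Compare each strip mean with the line beta0 + beta1 * mid.
    print(f"x ~ {mid:4.1f}  mean(Y) = {mean_y:6.2f}  "
          f"line = {beta0 + beta1 * mid:6.2f}  SD(Y) = {sd_y:4.2f}")
```

The strips need to be narrow: within a strip of positive width, the spread of $$Y$$ also picks up a small contribution from the variation of $$X$$ inside the strip, so wider strips would show standard deviations somewhat larger than $$\sigma$$.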