# Simple linear regression

The basic problem in regression analysis is to understand the relationship between a **response variable**, denoted by \(Y\), and one or more **predictor variables**, denoted by \(X\). The relationship is typically empirical or statistical rather than functional or mathematical. The goal is to describe this relationship as a functional dependence of the mean value of \(Y\) on any given value of \(X\), estimated from paired observations \( \{(X_i,Y_i) : i=1,\ldots,n\} \).

A basic linear regression model for the response \(Y\) on the predictor \(X\) is given by

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i=1,\ldots,n,$$

where the *noise* terms \( \varepsilon_1, \ldots, \varepsilon_n \) are uncorrelated, with \( \mathrm{Mean}(\varepsilon_i) = 0 \) and \( \mathrm{Variance}(\varepsilon_i) = \sigma^2 \).

### Interpretation
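
In terms of the response, the noise assumptions translate directly into statements about the mean and variance of \(Y_i\). The short derivation below assumes the \(X_i\) are treated as fixed, so that \( \beta_0 + \beta_1 X_i \) is a constant:

$$\mathrm{Mean}(Y_i) = \beta_0 + \beta_1 X_i + \mathrm{Mean}(\varepsilon_i) = \beta_0 + \beta_1 X_i, \qquad \mathrm{Variance}(Y_i) = \mathrm{Variance}(\varepsilon_i) = \sigma^2.$$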

Look at the scatter plot of \(Y\) (vertical axis) vs. \(X\) (horizontal axis). Consider narrow vertical strips around the different values of \(X\):

- The means (a measure of center) of the points falling in the vertical strips lie (approximately) on a straight line with **slope** \( \beta_1 \) and **intercept** \( \beta_0 \).
- The standard deviations (a measure of spread) of the points falling in each vertical strip are (roughly) the same.
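
The vertical-strip reading can be checked numerically on simulated data. The sketch below is only an illustration under stated assumptions: the parameter values (`beta0 = 1`, `beta1 = 2`, `sigma = 1`), the uniform predictor on \([0, 10]\), the normally distributed noise, and the helper name `strip_summaries` are all choices made here, not part of the model, which only requires uncorrelated noise with mean zero and constant variance.

```python
import numpy as np


def strip_summaries(x, y, n_strips=40):
    """Split the x-range into equal-width vertical strips and return
    the strip centers plus the mean and standard deviation of y in each strip.
    Assumes every strip contains at least two points."""
    edges = np.linspace(x.min(), x.max(), n_strips + 1)
    # digitize gives 1-based strip numbers; clip so the largest x
    # falls in the last strip instead of a strip of its own
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_strips - 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    means = np.array([y[idx == k].mean() for k in range(n_strips)])
    sds = np.array([y[idx == k].std(ddof=1) for k in range(n_strips)])
    return centers, means, sds


# Illustrative (assumed) parameter values for the simulation
beta0, beta1, sigma = 1.0, 2.0, 1.0
n = 20_000

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=n)
# Normal noise is an assumption made here for convenience
y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)

centers, means, sds = strip_summaries(x, y)

# A straight line through the strip means should roughly recover beta1 and beta0
slope, intercept = np.polyfit(centers, means, deg=1)

print(f"line through strip means: slope={slope:.3f}, intercept={intercept:.3f}")
print(f"strip SDs: min={sds.min():.3f}, max={sds.max():.3f}")
```

With narrow strips, the strip means should fall close to the line \( \beta_0 + \beta_1 X \) and the strip standard deviations should all be close to \( \sigma \). Wider strips inflate the standard deviations somewhat, because \(X\) itself varies within each strip.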