# 12.1: Theoretical Specification

As with simple regression, the theoretical multiple regression model contains a **systematic** component, $Y_i = \alpha + \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_k X_{ik}$, and a **stochastic** component, $\epsilon_i$. The overall theoretical model is expressed as:

$$Y_i = \alpha + \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_k X_{ik} + \epsilon_i$$

where

- $\alpha$ is the constant term
- $\beta_1$ through $\beta_k$ are the parameters of independent variables (IVs) 1 through $k$
- $k$ is the number of IVs
- $\epsilon_i$ is the error term

In matrix form the theoretical model can be expressed much more simply as $y = X\beta + \epsilon$.
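To make the equivalence between the scalar and matrix forms concrete, here is a minimal sketch in plain Python. All numbers are hypothetical values chosen for illustration, not taken from the text. It assumes the usual convention that the matrix $X$ carries a leading column of ones, so that the constant $\alpha$ becomes the first element of the vector $\beta$.

```python
# Hypothetical values for illustration only.
alpha = 2.0
betas = [0.5, -1.0]                 # beta_1, beta_2 (k = 2 IVs)
X_raw = [[1.0, 3.0],
         [2.0, 0.0],
         [4.0, 1.0]]                # n = 3 observations, one row per case
eps = [0.1, -0.2, 0.0]              # made-up error terms

# Scalar form: Y_i = alpha + beta_1 * X_i1 + beta_2 * X_i2 + eps_i
y_scalar = [alpha + sum(b * x for b, x in zip(betas, row)) + e
            for row, e in zip(X_raw, eps)]

# Matrix form: y = X beta + eps, with a column of ones prepended to X
# so that alpha is absorbed into the coefficient vector.
X = [[1.0] + row for row in X_raw]
beta = [alpha] + betas
y_matrix = [sum(x * b for x, b in zip(row, beta)) + e
            for row, e in zip(X, eps)]

print(y_scalar)
print(y_matrix)
```

Both lists hold the same three values of $Y_i$, up to floating-point rounding; the matrix form simply packs the $n$ scalar equations into one expression.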

The empirical model that will be estimated can be expressed as:

$$Y_i = A + B_1 X_{i1} + B_2 X_{i2} + \ldots + B_k X_{ik} + E_i = \hat{Y}_i + E_i$$

Therefore, the residual sum of squares (RSS) for the model is expressed as:

$$RSS = \sum E_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum \left(Y_i - (A + B_1 X_{i1} + B_2 X_{i2} + \ldots + B_k X_{ik})\right)^2$$
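The RSS formula can be sketched directly in plain Python. The data and candidate coefficients below are hypothetical, and the helper names `fitted_values` and `rss` are introduced here only for illustration. The sketch evaluates the RSS for two candidate sets of coefficients; it does not carry out the estimation itself.

```python
def fitted_values(X, A, B):
    """Return Y-hat_i = A + B_1 * X_i1 + ... + B_k * X_ik for each row of X."""
    return [A + sum(b * x for b, x in zip(B, row)) for row in X]

def rss(Y, X, A, B):
    """Residual sum of squares: the sum of (Y_i - Y-hat_i)^2 over all cases."""
    return sum((y - y_hat) ** 2
               for y, y_hat in zip(Y, fitted_values(X, A, B)))

# Hypothetical data: n = 4 observations, k = 2 IVs.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
Y = [4.5, 5.0, 9.0, 11.0]

rss_first = rss(Y, X, A=1.0, B=[2.0, 0.5])    # one candidate set of coefficients
rss_second = rss(Y, X, A=0.0, B=[2.0, 0.5])   # same slopes, different constant

print(rss_first, rss_second)
```

For these made-up numbers the first candidate produces the smaller RSS. OLS chooses the values of $A$ and $B_1$ through $B_k$ that make this quantity as small as possible.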

## 12.1.1 Assumptions of OLS Regression

There are several important assumptions necessary for multiple regression. These assumptions include linearity, fixed $X$s, and errors that are normally distributed.

OLS Assumptions

Systematic Component

- Linearity
- Fixed $X$

Stochastic Component

- Errors have identical distributions
- Errors are independent of $X$ and other $\epsilon_i$
- Errors are normally distributed

### Linearity

When OLS is used, it is assumed that a linear functional form is the correct specification for the model being estimated. Note that linearity is assumed in the *parameters* (that is, for the $B$s); therefore, the expected value of the dependent variable is a linear function of the parameters, not necessarily of the variables themselves. So, as we will discuss in later chapters, it is possible to transform the variables (the $X$s) to introduce non-linearity into the model while retaining linear estimated coefficients. For example, a model with a squared $X$ term can be estimated with OLS:

$$Y_i = A + B X_i^2 + E_i$$

However, a model with a squared $B$ term cannot.
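The point about linearity in the parameters can be illustrated with a small sketch in plain Python. It treats the squared variable as a new regressor, $Z_i = X_i^2$, and applies the standard closed-form OLS estimates for a one-predictor model. The data are hypothetical and generated without error, so the estimates should recover the assumed values of $A$ and $B$.

```python
# Hypothetical "true" coefficients and data, for illustration only.
A_true, B_true = 1.0, 3.0
X = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
Y = [A_true + B_true * x ** 2 for x in X]   # no error term, to keep the check exact

# Transform the variable, not the parameter: Z_i = X_i^2.
Z = [x ** 2 for x in X]

# Closed-form OLS estimates for Y_i = A + B * Z_i + E_i.
n = len(Z)
z_bar = sum(Z) / n
y_bar = sum(Y) / n
B_hat = (sum((z - z_bar) * (y - y_bar) for z, y in zip(Z, Y))
         / sum((z - z_bar) ** 2 for z in Z))
A_hat = y_bar - B_hat * z_bar

print(A_hat, B_hat)
```

The model is curved in $X$ but still a straight line in $Z$, which is why the usual OLS formulas apply unchanged.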

### Fixed X

The assumption of fixed values of $X$ means that the value of $X$ in our observations is not systematically related to the value of the other $X$s. We can see this most clearly in an experimental setting where the researcher can manipulate the experimental variable while controlling for all other possible $X$s through random assignment to a treatment and control group. In that case, the value of the experimental treatment is completely unrelated to the value of the other $X$s or, put differently, the treatment variable is orthogonal to the other $X$s. This assumption is carried through to observational studies as well. Note that if $X$ is assumed to be fixed, then changes in $Y$ are assumed to be a result of the independent variations in the $X$s and error (and nothing else).
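A rough simulation can illustrate why random assignment makes the treatment approximately orthogonal to the other $X$s. The sketch below uses only Python's standard library. The sample size, the seed, and the self-selection rule are arbitrary assumptions for illustration, and `pearson` is a helper defined here, not a function from the text.

```python
import random

def pearson(u, v):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

rng = random.Random(12)          # arbitrary seed, for reproducibility
n = 10_000
x_other = [rng.gauss(0.0, 1.0) for _ in range(n)]   # some other X

# Random assignment: treatment status ignores x_other entirely.
treat_random = [rng.randint(0, 1) for _ in range(n)]

# Hypothetical self-selection: cases with high x_other opt into treatment.
treat_selected = [1 if value > 0 else 0 for value in x_other]

corr_random = pearson(treat_random, x_other)
corr_selected = pearson(treat_selected, x_other)
print(corr_random, corr_selected)
```

Under random assignment the correlation should be close to zero, while under the self-selection rule it is large. The second case shows the kind of systematic relationship among the $X$s that random assignment rules out.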