12.1: Theoretical Specification


As with simple regression, the theoretical multiple regression model contains a systematic component, \(Y_i = \alpha + \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_k X_{ik}\), and a stochastic component, \(\epsilon_i\). The overall theoretical model is expressed as:

\[Y_i = \alpha + \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_k X_{ik} + \epsilon_i\]

where

- \(\alpha\) is the constant term
- \(\beta_1\) through \(\beta_k\) are the parameters of IVs 1 through \(k\)
- \(k\) is the number of IVs
- \(\epsilon_i\) is the error term

In matrix form the theoretical model can be expressed much more simply as: \(y = X\beta + \epsilon\).
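Written out, the matrix form corresponds to the expansion below. This is a sketch under one assumption that this section does not spell out: the constant \(\alpha\) is absorbed into the coefficient vector through a leading column of ones in \(X\), which is the usual convention. The symbol \(n\), for the number of observations, is introduced here for illustration only.

\[
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix}
1 & X_{11} & X_{12} & \cdots & X_{1k} \\
1 & X_{21} & X_{22} & \cdots & X_{2k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & X_{n1} & X_{n2} & \cdots & X_{nk}
\end{bmatrix}
\begin{bmatrix} \alpha \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}
+
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}
\]

Under this convention \(y\) and \(\epsilon\) are \(n \times 1\), \(X\) is \(n \times (k+1)\), and \(\beta\) is \((k+1) \times 1\), so row \(i\) of the product reproduces the scalar model for observation \(i\).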

The empirical model that will be estimated can be expressed as:

\[Y_i = A + B_1 X_{i1} + B_2 X_{i2} + \ldots + B_k X_{ik} + E_i = \hat{Y}_i + E_i\]

Therefore, the residual sum of squares (RSS) for the model is expressed as:

\[RSS = \sum E_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - (A + B_1 X_{i1} + B_2 X_{i2} + \ldots + B_k X_{ik}))^2\]
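The empirical model and its RSS can be illustrated with a minimal sketch in Python using NumPy. Everything here is an assumption made for illustration: the data are simulated, and the names `fit_ols`, `true_coefs`, `n`, and `k` are hypothetical rather than taken from the text. The estimates of \(A\) and the \(B\)s come from NumPy's least-squares solver, `np.linalg.lstsq`.

```python
import numpy as np

def fit_ols(X, y):
    """Fit Y_i = A + B_1 X_i1 + ... + B_k X_ik + E_i by least squares.

    X is an (n, k) array of IVs without a constant column; a column of
    ones is prepended so the first coefficient is the intercept A.
    Returns (coefficients, fitted values, residuals, RSS).
    """
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ coefs           # fitted values, Y-hat
    resid = y - y_hat                # residuals, E_i
    rss = float(np.sum(resid ** 2))  # residual sum of squares
    return coefs, y_hat, resid, rss

# Simulated data (illustrative values only)
rng = np.random.default_rng(0)
n, k = 200, 2
X = rng.normal(size=(n, k))
true_coefs = np.array([1.0, 2.0, -0.5])  # alpha, beta_1, beta_2
y = true_coefs[0] + X @ true_coefs[1:] + rng.normal(scale=1.0, size=n)

coefs, y_hat, resid, rss = fit_ols(X, y)
print("A, B1, B2:", coefs)
print("RSS:", rss)
```

Because OLS chooses \(A\) and the \(B\)s to minimize RSS, the RSS at the estimated coefficients is no larger than the RSS obtained by plugging in any other coefficient values, including the ones used to simulate the data.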

12.1.1 Assumptions of OLS Regression

There are several important assumptions necessary for multiple regression. These assumptions include linearity, fixed \(X\)’s, and errors that are normally distributed.

OLS Assumptions

Systematic Component

- Linearity
- Fixed \(X\)

Stochastic Component

- Errors have identical distributions
- Errors are independent of \(X\) and other \(\epsilon_i\)
- Errors are normally distributed

Linearity

When OLS is used, it is assumed that a linear functional form is the correct specification for the model being estimated. Note that linearity is assumed in the parameters (that is, for the \(B\)s). The expected value of the dependent variable is therefore a linear function of the parameters, not necessarily of the variables themselves. So, as we will discuss in later chapters, it is possible to transform the variables (the \(X\)s) to introduce non-linearity into the model while keeping the model linear in the estimated coefficients. For example, a model with a squared \(X\) term can be estimated with OLS:

\[Y_i = A + B X_i^2 + E_i\]

However, a model with a squared \(B\) term cannot.
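The squared-\(X\) case can be sketched in Python with NumPy. This is an illustration on simulated data, and the names `a_true`, `b_true`, `a_hat`, and `b_hat` are hypothetical. The point is that squaring \(X\) happens before estimation, so the fitting step still solves an ordinary linear least-squares problem in \(A\) and \(B\).

```python
import numpy as np

# Simulated data with a curved relationship between x and y
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(-3, 3, size=n)
a_true, b_true = 2.0, 1.5  # illustrative values only
y = a_true + b_true * x**2 + rng.normal(scale=0.5, size=n)

# Transform the variable first; the model stays linear in A and B
design = np.column_stack([np.ones(n), x**2])
coefs = np.linalg.lstsq(design, y, rcond=None)[0]
a_hat, b_hat = coefs

print("A estimate:", a_hat)
print("B estimate:", b_hat)
```

The relationship between `x` and `y` is a parabola, yet the estimates recover the simulated coefficients closely because the only transformation applied is to the variable, not to the parameters.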

Fixed \(X\)

The assumption of fixed values of \(X\) means that the value of \(X\) in our observations is not systematically related to the value of the other \(X\)’s. We can see this most clearly in an experimental setting where the researcher can manipulate the experimental variable while controlling for all other possible \(X\)s through random assignment to a treatment and control group. In that case, the value of the experimental treatment is completely unrelated to the value of the other \(X\)s; put differently, the treatment variable is orthogonal to the other \(X\)s. This assumption is carried through to observational studies as well. Note that if \(X\) is assumed to be fixed, then changes in \(Y\) are assumed to be a result of the independent variations in the \(X\)’s and error (and nothing else).
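The effect of random assignment can be sketched with a small simulation in Python with NumPy. The variables `covariate`, `treatment`, and `selected` are hypothetical, and the self-selection rule is an invented contrast case that does not come from the text. The sketch compares the correlation between a treatment indicator and another \(X\) when treatment is randomly assigned and when it depends on that \(X\).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

covariate = rng.normal(size=n)  # some other X, e.g. a background trait

# Random assignment: 0 = control group, 1 = treatment group
treatment = rng.integers(0, 2, size=n)
corr_random = np.corrcoef(treatment, covariate)[0, 1]

# Contrast case (assumed for illustration): units with a higher
# covariate value are more likely to end up treated
selected = (covariate + rng.normal(size=n) > 0).astype(int)
corr_selected = np.corrcoef(selected, covariate)[0, 1]

print("correlation under random assignment:", corr_random)
print("correlation under self-selection:  ", corr_selected)
```

Under random assignment the correlation is close to zero, which is the orthogonality described above. In the contrast case the treatment indicator is clearly correlated with the other \(X\), so the two cannot be treated as varying independently.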