11.3: OLS Regression in Matrix Form
As was the case with simple regression, we want to minimize the sum of the squared errors, \(e\). In matrix notation, the OLS model is \(y = Xb + e\), where \(e = y - Xb\). The sum of the squared \(e\) is:
\[\sum e_i^2 = \begin{bmatrix} e_1 & e_2 & \cdots & e_n \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} = e'e \tag{11.1}\]
Therefore, we want to find the \(b\) that minimizes this function:
\[e'e = (y - Xb)'(y - Xb) = y'y - b'X'y - y'Xb + b'X'Xb = y'y - 2b'X'y + b'X'Xb\]
(The two middle terms combine because \(y'Xb\) is a scalar and therefore equals its own transpose, \(b'X'y\).)
To do this we take the derivative of \(e'e\) with respect to \(b\) and set it equal to \(0\):
\[\frac{\partial e'e}{\partial b} = -2X'y + 2X'Xb = 0\]
To solve this we subtract \(2X'Xb\) from both sides:
\[-2X'Xb = -2X'y\]
Then, to remove the \(-2\)s, we multiply each side by \(-1/2\). This leaves us with:
\[(X'X)b = X'y\]
To solve for \(b\), we multiply both sides by the inverse of \(X'X\), written \((X'X)^{-1}\). For matrices, multiplying by an inverse plays the role that division plays in scalar algebra. Therefore:
\[b = (X'X)^{-1}X'y \tag{11.2}\]
The \(X'X\) matrix is square, so its inverse can exist, but it is not guaranteed to: the \(X'X\) matrix is non-invertible (i.e., singular) if \(n < k\), that is, if the number of parameters \(k\) exceeds the number of observations \(n\), or if one or more of the independent variables is perfectly correlated with another independent variable. The latter condition is termed perfect multicollinearity and will be discussed in more detail in Chapter 14. Also note that the \(X'X\) matrix contains the basis for all the necessary means, variances, and covariances among the \(X\)'s.
\[X'X = \begin{bmatrix} n & \sum X_1 & \sum X_2 & \sum X_3 \\ \sum X_1 & \sum X_1^2 & \sum X_1X_2 & \sum X_1X_3 \\ \sum X_2 & \sum X_2X_1 & \sum X_2^2 & \sum X_2X_3 \\ \sum X_3 & \sum X_3X_1 & \sum X_3X_2 & \sum X_3^2 \end{bmatrix}\]
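To see the singularity condition concretely, here is a minimal sketch using made-up toy values (not from the example below): when one column of \(X\) is an exact linear function of another, \(X'X\) cannot be inverted, and R's solve function reports a singularity error.
X1 <- c(4, 7, 2, 1, 3)           # toy data (hypothetical)
X2 <- 2 * X1                     # X2 is an exact multiple of X1
X.toy <- cbind(1, X1, X2)        # first column of 1s for the intercept
try(solve(crossprod(X.toy)))     # crossprod(X) computes X'X; solve() fails here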
Regression in Matrix Form
Assume a model using \(n\) observations, \(k\) parameters, and \(k-1\) independent (\(X_i\)) variables.
\[y = Xb + e\]
\[\hat{y} = Xb\]
\[b = (X'X)^{-1}X'y\]
- \(y\): an \(n \times 1\) column vector of observations of the DV, \(Y\)
- \(\hat{y}\): an \(n \times 1\) column vector of predicted \(Y\) values
- \(X\): an \(n \times k\) matrix of observations of the IVs; the first column is all \(1\)s
- \(b\): a \(k \times 1\) column vector of regression coefficients; the first row is the intercept, \(A\)
- \(e\): an \(n \times 1\) column vector of \(n\) residual values
Using the following steps, we will use R to calculate \(b\), a vector of regression coefficients; \(\hat{y}\), a vector of predicted \(y\) values; and \(e\), a vector of residuals.
We want to fit the model \(y = Xb + e\) to the following matrices:
\[y = \begin{bmatrix} 6 \\ 11 \\ 4 \\ 3 \\ 5 \\ 9 \\ 10 \end{bmatrix} \qquad X = \begin{bmatrix} 1 & 4 & 5 & 4 \\ 1 & 7 & 2 & 3 \\ 1 & 2 & 6 & 4 \\ 1 & 1 & 9 & 6 \\ 1 & 3 & 4 & 5 \\ 1 & 7 & 3 & 4 \\ 1 & 8 & 2 & 5 \end{bmatrix}\]
Create two objects, the \(y\) matrix and the \(X\) matrix.
y <- matrix(c(6,11,4,3,5,9,10),7,1)
y
## [,1]
## [1,] 6
## [2,] 11
## [3,] 4
## [4,] 3
## [5,] 5
## [6,] 9
## [7,] 10
X <- matrix(c(1,1,1,1,1,1,1,4,7,2,1,3,7,8,5,2,6,9,4,3,2,4,3,4,6,5,4,5),7,4)
X
## [,1] [,2] [,3] [,4]
## [1,] 1 4 5 4
## [2,] 1 7 2 3
## [3,] 1 2 6 4
## [4,] 1 1 9 6
## [5,] 1 3 4 5
## [6,] 1 7 3 4
## [7,] 1 8 2 5
Calculate \(b\): \(b = (X'X)^{-1}X'y\).
We can calculate this in R in just a few steps. First, we transpose \(X\) to get \(X'\).
X.prime <- t(X)
X.prime
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 1 1 1 1 1 1 1
## [2,] 4 7 2 1 3 7 8
## [3,] 5 2 6 9 4 3 2
## [4,] 4 3 4 6 5 4 5
Then we multiply \(X'\) by \(X\) to get \(X'X\).
X.prime.X <- X.prime %*% X
X.prime.X
## [,1] [,2] [,3] [,4]
## [1,] 7 32 31 31
## [2,] 32 192 104 134
## [3,] 31 104 175 146
## [4,] 31 134 146 143
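Note how this output matches the general form of \(X'X\) shown earlier: the upper-left entry is \(n = 7\), the rest of the first row and first column are the column sums \(\sum X_i\), and the remaining entries are the sums of squares and cross-products.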
Next, we find the inverse of \(X'X\): \((X'X)^{-1}\).
X.prime.X.inv<-solve(X.prime.X)
X.prime.X.inv
## [,1] [,2] [,3] [,4]
## [1,] 12.2420551 -1.04528602 -1.01536017 -0.63771186
## [2,] -1.0452860 0.12936970 0.13744703 -0.03495763
## [3,] -1.0153602 0.13744703 0.18697034 -0.09957627
## [4,] -0.6377119 -0.03495763 -0.09957627 0.27966102
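A quick optional sanity check: multiplying the inverse by \(X'X\) should return the \(4 \times 4\) identity matrix, up to floating-point rounding.
round(X.prime.X.inv %*% X.prime.X, 10)   # should print the identity matrix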
Then, we multiply \((X'X)^{-1}\) by \(X'\).
X.prime.X.inv.X.prime<-X.prime.X.inv %*% X.prime
X.prime.X.inv.X.prime
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.43326271 0.98119703 1.50847458 -1.7677436 1.8561970 -0.6718750
## [2,] 0.01959746 0.03032309 -0.10169492 0.1113612 -0.2821769 0.1328125
## [3,] 0.07097458 0.02198093 -0.01694915 0.2073623 -0.3530191 0.1093750
## [4,] -0.15677966 -0.24258475 -0.18644068 0.1091102 0.2574153 -0.0625000
## [,7]
## [1,] -1.33951271
## [2,] 0.08977754
## [3,] -0.03972458
## [4,] 0.28177966
Finally, to obtain the \(b\) vector, we multiply \((X'X)^{-1}X'\) by \(y\).
b<-X.prime.X.inv.X.prime %*% y
b
## [,1]
## [1,] 3.96239407
## [2,] 1.06064619
## [3,] 0.04396186
## [4,] -0.48516949
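As an aside on computation: in practice, forming the explicit inverse is usually avoided. Base R's solve(a, b) solves the normal equations \((X'X)b = X'y\) directly, which is faster and more numerically stable. A minimal sketch using the objects already created:
# Solve (X'X) b = X'y in one step, without an explicit inverse;
# this reproduces the b vector above
solve(X.prime.X, X.prime %*% y)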
We can use the lm function in R to check whether our “by hand” matrix approach gets the same result as the “canned” multiple regression routine. (The 0 + term in the formula suppresses the intercept that lm would otherwise add automatically, since our \(X\) matrix already contains a column of \(1\)s.)
lm(y~0+X)
##
## Call:
## lm(formula = y ~ 0 + X)
##
## Coefficients:
## X1 X2 X3 X4
## 3.96239 1.06065 0.04396 -0.48517
Calculate \(\hat{y}\): \(\hat{y} = Xb\).
To calculate the \(\hat{y}\) vector in R, simply multiply X and b.
y.hat <- X %*% b
y.hat
## [,1]
## [1,] 6.484110
## [2,] 10.019333
## [3,] 4.406780
## [4,] 2.507680
## [5,] 4.894333
## [6,] 9.578125
## [7,] 10.109640
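An equivalent route to \(\hat{y}\), shown here as a brief sketch, uses the projection matrix \(H = X(X'X)^{-1}X'\), often called the “hat matrix” because it puts the hat on \(y\):
# The hat matrix H maps the observed y to the fitted y.hat
H <- X %*% X.prime.X.inv %*% X.prime
H %*% y   # identical to y.hat above, up to rounding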
Calculate \(e\).
To calculate \(e\), the vector of residuals, simply subtract the vector \(\hat{y}\) from the vector \(y\).
e <- y-y.hat
e
## [,1]
## [1,] -0.4841102
## [2,] 0.9806674
## [3,] -0.4067797
## [4,] 0.4923199
## [5,] 0.1056674
## [6,] -0.5781250
## [7,] -0.1096398
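As a final check, the normal equations imply that the residuals are orthogonal to every column of \(X\): from \((X'X)b = X'y\) it follows that \(X'e = X'(y - Xb) = 0\). We can verify this numerically:
# X'e should be a vector of zeros, up to floating-point rounding
crossprod(X, e)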