11.3: OLS Regression in Matrix Form


As was the case with simple regression, we want to minimize the sum of the squared errors, \( e \). In matrix notation, the OLS model is \( y = Xb + e \), where \( e = y - Xb \). The sum of the squared \( e \) is:

\[ \sum e_i^2 = \begin{bmatrix} e_1 & e_2 & \cdots & e_n \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} = e'e \tag{11.1} \]
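As a quick numeric illustration of Equation 11.1, here is a toy residual vector in R (the values are arbitrary, chosen only for this sketch): summing the squared elements and forming the matrix product \( e'e \) give the same number.

e.toy <- c(1, -2, 3)   # arbitrary toy residuals
sum(e.toy^2)           # sum of squared residuals
## [1] 14
t(e.toy) %*% e.toy     # the same quantity written as e'e (a 1 x 1 matrix)
##      [,1]
## [1,]   14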

Therefore, we want to find the \( b \) that minimizes this function:

\[ e'e = (y - Xb)'(y - Xb) = y'y - b'X'y - y'Xb + b'X'Xb = y'y - 2b'X'y + b'X'Xb \]

where the last step uses the fact that \( y'Xb \) is a scalar and therefore equal to its transpose, \( b'X'y \).

To do this we take the derivative of \( e'e \) with respect to \( b \) and set it equal to \( 0 \):

\[ \frac{\partial e'e}{\partial b} = -2X'y + 2X'Xb = 0 \]

To solve this we subtract \( 2X'Xb \) from both sides:

\[ -2X'Xb = -2X'y \]

Then, to remove the \( -2 \)'s, we multiply each side by \( -1/2 \). This leaves us with:

\[ (X'X)b = X'y \]

To solve for \( b \) we multiply both sides by the inverse of \( X'X \), denoted \( (X'X)^{-1} \). Note that for matrices this is the analogue of dividing each side by \( X'X \). Therefore:

\[ b = (X'X)^{-1}X'y \tag{11.2} \]

The \( X'X \) matrix is square, which is necessary (but not sufficient) for the inverse to exist. The \( X'X \) matrix is non-invertible (i.e., singular) if \( n < k \), that is, if the number of independent variables \( k \) exceeds the sample size \( n \), or if one or more of the independent variables is perfectly correlated with another independent variable. This is termed perfect multicollinearity and will be discussed in more detail in Chapter 14. Also note that the \( X'X \) matrix contains the basis for all the necessary means, variances, and covariances among the \( X \)'s.

\[ X'X = \begin{bmatrix} n & \sum X_1 & \sum X_2 & \sum X_3 \\ \sum X_1 & \sum X_1^2 & \sum X_1X_2 & \sum X_1X_3 \\ \sum X_2 & \sum X_2X_1 & \sum X_2^2 & \sum X_2X_3 \\ \sum X_3 & \sum X_3X_1 & \sum X_3X_2 & \sum X_3^2 \end{bmatrix} \]
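To see why perfect multicollinearity breaks the inversion, consider a minimal sketch (with made-up data, not the matrices used below) in which one column of \( X \) is an exact multiple of another; \( X'X \) is then singular and solve() throws an error:

X.bad <- matrix(c(1, 1, 1, 1,
                  2, 4, 6, 8,
                  1, 2, 3, 4), 4, 3)  # column 3 is exactly half of column 2
try(solve(t(X.bad) %*% X.bad))        # fails: X'X is exactly singular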

    Regression in Matrix Form

Assume a model using \( n \) observations, \( k \) parameters, and \( k-1 \) independent (\( X_i \)) variables:

\[ y = Xb + e \qquad \hat{y} = Xb \qquad b = (X'X)^{-1}X'y \]

• \( y \): \( n \times 1 \) column vector of observations of the DV, \( Y \)
• \( \hat{y} \): \( n \times 1 \) column vector of predicted \( Y \) values
• \( X \): \( n \times k \) matrix of observations of the IVs; the first column is all \( 1 \)s
• \( b \): \( k \times 1 \) column vector of regression coefficients; the first element is the intercept, \( A \)
• \( e \): \( n \times 1 \) column vector of \( n \) residual values

Using the following steps, we will use R to calculate \( b \), a vector of regression coefficients; \( \hat{y} \), a vector of predicted \( y \) values; and \( e \), a vector of residuals.

We want to fit the model \( y = Xb + e \) to the following matrices:

\[ y = \begin{bmatrix} 6 \\ 11 \\ 4 \\ 3 \\ 5 \\ 9 \\ 10 \end{bmatrix} \qquad X = \begin{bmatrix} 1 & 4 & 5 & 4 \\ 1 & 7 & 2 & 3 \\ 1 & 2 & 6 & 4 \\ 1 & 1 & 9 & 6 \\ 1 & 3 & 4 & 5 \\ 1 & 7 & 3 & 4 \\ 1 & 8 & 2 & 5 \end{bmatrix} \]

Create two objects, the \( y \) matrix and the \( X \) matrix.

y <- matrix(c(6,11,4,3,5,9,10), 7, 1)  # 7 x 1 column vector of DV values
    y
    ##      [,1]
    ## [1,]    6
    ## [2,]   11
    ## [3,]    4
    ## [4,]    3
    ## [5,]    5
    ## [6,]    9
    ## [7,]   10
X <- matrix(c(1,1,1,1,1,1,1, 4,7,2,1,3,7,8, 5,2,6,9,4,3,2, 4,3,4,6,5,4,5), 7, 4)  # filled column-wise: 1s first, then each IV
    X
    ##      [,1] [,2] [,3] [,4]
    ## [1,]    1    4    5    4
    ## [2,]    1    7    2    3
    ## [3,]    1    2    6    4
    ## [4,]    1    1    9    6
    ## [5,]    1    3    4    5
    ## [6,]    1    7    3    4
    ## [7,]    1    8    2    5

Calculate \( b \): \( b = (X'X)^{-1}X'y \).

We can calculate this in R in just a few steps. First, we transpose \( X \) to get \( X' \).

    X.prime <- t(X)
    X.prime
    ##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
    ## [1,]    1    1    1    1    1    1    1
    ## [2,]    4    7    2    1    3    7    8
    ## [3,]    5    2    6    9    4    3    2
    ## [4,]    4    3    4    6    5    4    5

Then we premultiply \( X \) by \( X' \) to obtain \( X'X \).

    X.prime.X <- X.prime %*% X
    X.prime.X
    ##      [,1] [,2] [,3] [,4]
    ## [1,]    7   32   31   31
    ## [2,]   32  192  104  134
    ## [3,]   31  104  175  146
    ## [4,]   31  134  146  143
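As an aside, base R's crossprod() function forms \( X'X \) in a single call; it is simply a more direct way to compute the same matrix:

crossprod(X)   # equivalent to t(X) %*% X
##      [,1] [,2] [,3] [,4]
## [1,]    7   32   31   31
## [2,]   32  192  104  134
## [3,]   31  104  175  146
## [4,]   31  134  146  143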

Next, we find the inverse of \( X'X \): \( (X'X)^{-1} \).

    X.prime.X.inv<-solve(X.prime.X)
    X.prime.X.inv
    ##            [,1]        [,2]        [,3]        [,4]
    ## [1,] 12.2420551 -1.04528602 -1.01536017 -0.63771186
    ## [2,] -1.0452860  0.12936970  0.13744703 -0.03495763
    ## [3,] -1.0153602  0.13744703  0.18697034 -0.09957627
    ## [4,] -0.6377119 -0.03495763 -0.09957627  0.27966102
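A quick sanity check: multiplying the inverse by \( X'X \) should return the \( 4 \times 4 \) identity matrix, up to floating-point rounding:

round(X.prime.X.inv %*% X.prime.X, 10)  # identity matrix, as expected
##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]    0    1    0    0
## [3,]    0    0    1    0
## [4,]    0    0    0    1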

Then, we multiply \( (X'X)^{-1} \) by \( X' \).

    X.prime.X.inv.X.prime<-X.prime.X.inv %*% X.prime
    X.prime.X.inv.X.prime
    ##             [,1]        [,2]        [,3]       [,4]       [,5]       [,6]
    ## [1,]  0.43326271  0.98119703  1.50847458 -1.7677436  1.8561970 -0.6718750
    ## [2,]  0.01959746  0.03032309 -0.10169492  0.1113612 -0.2821769  0.1328125
    ## [3,]  0.07097458  0.02198093 -0.01694915  0.2073623 -0.3530191  0.1093750
    ## [4,] -0.15677966 -0.24258475 -0.18644068  0.1091102  0.2574153 -0.0625000
    ##             [,7]
    ## [1,] -1.33951271
    ## [2,]  0.08977754
    ## [3,] -0.03972458
    ## [4,]  0.28177966

Finally, to obtain the \( b \) vector we multiply \( (X'X)^{-1}X' \) by \( y \).

    b<-X.prime.X.inv.X.prime %*% y
    b
    ##             [,1]
    ## [1,]  3.96239407
    ## [2,]  1.06064619
    ## [3,]  0.04396186
    ## [4,] -0.48516949

We can use the lm function in R to check whether our “by hand” matrix approach gets the same result as the “canned” multiple regression routine (the 0 in the formula suppresses lm’s automatic intercept, since our \( X \) already contains a column of 1s):

    lm(y~0+X)
    ## 
    ## Call:
    ## lm(formula = y ~ 0 + X)
    ## 
    ## Coefficients:
    ##       X1        X2        X3        X4  
    ##  3.96239   1.06065   0.04396  -0.48517
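As an aside, production code rarely forms \( (X'X)^{-1} \) explicitly; it is numerically more stable to solve the normal equations \( (X'X)b = X'y \) directly. A minimal base-R sketch, giving the same coefficients:

solve(crossprod(X), crossprod(X, y))  # solves (X'X) b = X'y without inverting
##             [,1]
## [1,]  3.96239407
## [2,]  1.06064619
## [3,]  0.04396186
## [4,] -0.48516949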

Calculate \( \hat{y} \): \( \hat{y} = Xb \).

To calculate the \( \hat{y} \) vector in R, simply multiply X and b.

    y.hat <- X %*% b 
    y.hat
    ##           [,1]
    ## [1,]  6.484110
    ## [2,] 10.019333
    ## [3,]  4.406780
    ## [4,]  2.507680
    ## [5,]  4.894333
    ## [6,]  9.578125
    ## [7,] 10.109640
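These predictions can also be checked against the fitted values from the lm() call above; the two should agree element by element:

fitted(lm(y ~ 0 + X))  # matches y.hat up to printing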

Calculate \( e \).

To calculate \( e \), the vector of residuals, simply subtract the vector \( \hat{y} \) from the vector \( y \).

    e <- y-y.hat
    e
    ##            [,1]
    ## [1,] -0.4841102
    ## [2,]  0.9806674
    ## [3,] -0.4067797
    ## [4,]  0.4923199
    ## [5,]  0.1056674
    ## [6,] -0.5781250
    ## [7,] -0.1096398
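Because \( b \) satisfies the normal equations \( (X'X)b = X'y \), the residuals are orthogonal to every column of \( X \); in particular, since the first column of \( X \) is all 1s, the residuals sum to zero. Both properties can be verified numerically (zero up to floating-point rounding):

round(sum(e), 10)      # residuals sum to zero (X includes a column of 1s)
## [1] 0
round(t(X) %*% e, 10)  # X'e = 0: residuals orthogonal to each column of X
##      [,1]
## [1,]    0
## [2,]    0
## [3,]    0
## [4,]    0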

    This page titled 11.3: OLS Regression in Matrix Form is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Jenkins-Smith et al. (University of Oklahoma Libraries) via source content that was edited to the style and standards of the LibreTexts platform.
