4: Matrices and Linear Regression
In the previous chapter, we were introduced to the classical linear model and to estimating its parameters using ordinary least squares regression. All of the work was done using a scalar representation of the data. When moving beyond simple linear regression, the estimators are more difficult to determine: the calculus remains almost as simple, but solving the resulting system of equations becomes prohibitive.
The usual remedy for a complicated system of equations is to use a matrix representation of the problem. That is what this chapter does. Along the way, we will discover more about linear models than we expected.
✦•················• 😺 •··················•✦
As in the previous chapter, let \(x\) and \(y\) be numeric variables. The linear relationship between \(x\) and \(y\) can be summarized by a line that "best" fits the observed data. That is, we can (and will) summarize the relationship between \(x\) and \(y\) using a linear equation:
\begin{equation}y = \beta_0 + \beta_1 x \end{equation}
The above holds in the case of simple linear regression (SLR). So, what do we do when there are more independent variables? Here is that representation:
\begin{equation}
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k \label{eq:lm2b-lobf}
\end{equation}
Here, \(\beta_0\) is the y-intercept (still), and \(\beta_j\) is the effect of variable \(x_j\) on the dependent variable, holding all of the other variables constant. By the way, this is called the ceteris paribus assumption. If the independent variables are independent of each other, then this requirement is met. However, there is frequently some correlation among the independent variables. Read on to see what to do about this.
We say that the "line" given by Equation \ref{eq:lm2b-lobf} best fits the observed data. However, with two independent variables it is not a line but a plane; with three or more, a hyperplane. Clearly, meaningfully visualizing an entire four-variable model is quite difficult.
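To make the multiple-regression equation concrete before we turn to matrices, here is a minimal sketch in Python with NumPy (one possible tool, not necessarily the software used elsewhere in this text; the data values below are invented purely for illustration). It fits a two-predictor version of Equation \ref{eq:lm2b-lobf} by least squares and reports the estimated intercept and slopes.

```python
import numpy as np

# Invented illustration data: n = 6 observations of two predictors and a response.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([4.1, 4.9, 8.2, 8.8, 12.1, 12.9])

# Design matrix with a leading column of ones for the intercept (beta_0).
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares fit of y = beta_0 + beta_1*x1 + beta_2*x2.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

print("intercept (b0):", b[0])
print("slope for x1 (b1):", b[1])
print("slope for x2 (b2):", b[2])
```

Each reported slope is interpreted ceteris paribus: the change in \(y\) for a one-unit change in that predictor, with the other predictor held fixed.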
By the end of this chapter, you should be able to:
- Matrix Representation of Regression Models
  - Express a multiple linear regression model with \(k\) predictors and \(n\) observations using compact matrix notation: \(\mathbf{Y} = \mathbf{XB} + \mathbf{E}\).
  - Identify the dimensions and interpret the components of the design matrix (\(\mathbf{X}\)), parameter vector (\(\mathbf{B}\)), response vector (\(\mathbf{Y}\)), and error vector (\(\mathbf{E}\)).
  - Translate between scalar equations for individual observations and the unified matrix representation of the entire model.
- Matrix Solution & the Hat Matrix
  - Derive and apply the Ordinary Least Squares (OLS) solution in matrix form: \(\mathbf{b} = \mathbf{(X^\prime X)^{-1} X^\prime Y}\) (a computational sketch follows this list).
  - Define the hat matrix (\(\mathbf{H} = \mathbf{X(X^\prime X)^{-1} X^\prime}\)) and explain its role in generating predicted values: \(\mathbf{\hat{Y}} = \mathbf{HY}\).
  - Interpret the diagonal elements of the hat matrix (the leverages, \(h_{i,i}\)) as measures of an observation's influence on its own prediction.
- Geometric Interpretation of OLS
  - Conceptualize the column space of \(\mathbf{X}\) as the set of all possible linear combinations of the predictor variables.
  - Explain the OLS solution geometrically: as the orthogonal projection of the observed vector \(\mathbf{Y}\) onto the column space of \(\mathbf{X}\).
  - Relate the residual vector (\(\mathbf{E} = \mathbf{Y} - \mathbf{\hat{Y}}\)) to this geometric picture, recognizing it as perpendicular to the column space of \(\mathbf{X}\).
- Multicollinearity: Detection & Consequences
  - Define multicollinearity as high correlation among the predictor variables in the design matrix \(\mathbf{X}\).
  - Explain why perfect multicollinearity makes \(\mathbf{(X^\prime X)}\) singular and the OLS estimates \(\mathbf{b} = \mathbf{(X^\prime X)^{-1} X^\prime Y}\) impossible to compute.
  - Describe the practical consequences of high (but not perfect) multicollinearity: inflated standard errors, unstable coefficient estimates, and difficulty in assessing individual predictor importance.
- Incorporating Categorical Predictors
  - Construct a proper design matrix \(\mathbf{X}\) for a regression model that includes categorical independent variables (factors) using dummy coding (e.g., treatment or indicator coding); a small coding sketch follows this list.
  - Interpret the regression coefficients for dummy-coded variables relative to the chosen reference category.
  - Extend the dummy coding approach to factors with more than two levels.
- Synthesis: Connecting Theory to Practice
  - Implement the matrix formulation of OLS using statistical software (in concept) to solve for \(\mathbf{b}\) given \(\mathbf{X}\) and \(\mathbf{Y}\).
  - Diagnose potential multicollinearity in a model's design matrix by examining correlations or calculating variance inflation factors (VIFs); a VIF sketch also follows this list.
  - Articulate the advantages of the matrix approach for deriving theoretical properties and generalizing regression concepts to multiple predictors.
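As a preview of where the chapter is headed, the following sketch (again Python/NumPy, with a made-up design matrix and response) computes the OLS estimates from the matrix formula \(\mathbf{b} = \mathbf{(X^\prime X)^{-1} X^\prime Y}\), forms the hat matrix, extracts the leverages, and checks the geometric fact that the residuals are orthogonal to every column of \(\mathbf{X}\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: n = 8 observations, an intercept column plus k = 2 predictors.
n = 8
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
Y = rng.normal(size=n)

# OLS estimates from the normal equations: b = (X'X)^{-1} X'Y.
# (solve() is used instead of an explicit inverse for numerical stability.)
XtX = X.T @ X
b = np.linalg.solve(XtX, X.T @ Y)

# Hat matrix H = X (X'X)^{-1} X' and fitted values Y_hat = H Y.
H = X @ np.linalg.solve(XtX, X.T)
Y_hat = H @ Y

# Leverages are the diagonal elements h_ii of the hat matrix.
leverage = np.diag(H)

# Residuals are orthogonal to the column space of X, so X'e should be (numerically) zero.
e = Y - Y_hat
print("coefficients:", b)
print("leverages:", leverage)
print("X'e (should be ~0):", X.T @ e)
```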
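The multicollinearity objectives can be previewed the same way. This is a rough sketch (Python/NumPy, made-up data) of a variance inflation factor: the VIF for predictor \(j\) is \(1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing \(x_j\) on the other predictors. The function name and data are illustrative, not part of any library.

```python
import numpy as np

def vif(X_pred):
    """Variance inflation factors for the columns of X_pred (predictors only, no intercept)."""
    n, k = X_pred.shape
    vifs = []
    for j in range(k):
        xj = X_pred[:, j]
        others = np.delete(X_pred, j, axis=1)
        # Regress x_j on the remaining predictors (with an intercept).
        Z = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(Z, xj, rcond=None)
        fitted = Z @ coef
        r2 = 1 - np.sum((xj - fitted) ** 2) / np.sum((xj - xj.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Made-up predictors in which x2 is nearly a copy of x1, so its VIF should be large.
rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.05, size=50)
x3 = rng.normal(size=50)
print(vif(np.column_stack([x1, x2, x3])))
```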
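Finally, here is a small sketch of dummy (treatment) coding for a factor with three levels; the level names and response values are hypothetical. The reference level gets no column of its own, and each remaining level gets a 0/1 indicator column in the design matrix.

```python
import numpy as np

# Hypothetical factor with three levels; "low" is chosen as the reference category.
group = np.array(["low", "mid", "high", "mid", "low", "high"])
y = np.array([2.0, 3.5, 5.1, 3.4, 1.9, 5.0])

# Treatment (indicator) coding: one 0/1 column for each non-reference level.
d_mid = (group == "mid").astype(float)
d_high = (group == "high").astype(float)

# Design matrix: intercept, indicator for "mid", indicator for "high".
X = np.column_stack([np.ones(len(y)), d_mid, d_high])

# OLS fit: b[0] is the mean of the reference group ("low"),
# while b[1] and b[2] are the differences of "mid" and "high" from that reference.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)
```

Extending this to a factor with more levels just means adding one more indicator column per additional non-reference level.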
· · ─ ·✶· ─ · ·


