20.4: Other Terms and Operations
There exist other helpful operations on matrices. Already, we have come across the determinant as being especially helpful in determining if a matrix is invertible or singular. However, there are more. I'm not sure where to put them, so I'm putting all of them in this one section.
Enjoy!
✦•················• ✦ •··················•✦
Trace
Another useful function is the trace. For a square matrix \(\mathbf{A}\) with \(r\) rows (and columns), it is just the sum of the diagonal elements. That is:
\begin{equation}
\text{tr}\ \mathbf{A} = \sum_{i=1}^r a_{ii}
\end{equation}
The formula is simple... perhaps deceptively so. In fact, one may wonder what the trace actually tells us about a matrix. Quite simply, it provides another fundamental scalar descriptor of the matrix. While it is less intuitive than the determinant, it offers several key insights:
- Invariant Under Change of Basis: The trace is a similarity invariant, meaning that even if you change the coordinate system (e.g., perform a similarity transformation like \( \mathbf{P^{-1}AP} \)), the trace of the matrix remains unchanged. This points to it capturing some intrinsic property of the underlying data, independent of how it is represented.
- Sum of Eigenvalues: The trace is equal to the sum of the matrix's eigenvalues. This is a profoundly important connection, as the eigenvalues represent the fundamental scaling factors of the transformation in its principal directions. Therefore, the trace gives you a single number that summarizes the total "spread" of these scaling factors.
[Recall that the determinant is the product of the eigenvalues.]
- A Measure of "Total Effect": In certain contexts, the trace can be interpreted as a measure of the total variance explained. For example, in a projection matrix used in regression, the trace (which equals its rank) indicates the number of dimensions in the projected subspace. In other words, the trace of the hat matrix is the number of degrees of freedom needed to estimate the parameters.
In essence, while the determinant tells you about the multiplicative scaling of volume, the trace tells you about the additive sum of the matrix's core scaling components. Its invariance makes it a crucial tool for understanding the essential nature of a linear transformation.
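As a small illustration of these properties, here is a minimal sketch in R; the matrices A and P below are arbitrary examples, not from any dataset:
A <- matrix(c(2, 1, 0,
              1, 3, 1,
              0, 1, 4), ncol=3, byrow=TRUE)   # an arbitrary 3x3 matrix
sum(diag(A))                      # the trace: sum of the diagonal elements
sum(eigen(A)$values)              # equals the sum of the eigenvalues
P <- matrix(c(1, 0, 1,
              0, 1, 2,
              1, 1, 0), ncol=3, byrow=TRUE)   # an arbitrary invertible matrix
sum(diag(solve(P) %*% A %*% P))   # unchanged by the similarity transformation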
Transpose
The transpose of a matrix is just the matrix where the rows and columns are switched. Thus, if \(\mathbf{B}\) is the transpose of \(\mathbf{A}\), then \(b_{ij} = a_{ji}\). In symbols, we indicate the transpose as
\begin{equation}
\mathbf{B} = \mathbf{A}^\prime
\end{equation}
In some sources, it is also symbolized as
\begin{equation}
\mathbf{B} = \mathbf{A}^{\top}
\end{equation}
There are three quick results that you will need to prove:
- \(\mathbf{A}^\prime\mathbf{A}\) and \(\mathbf{A}\mathbf{A}^\prime\) both exist and are symmetric.
- \(\text{rank}\, \mathbf{A}^\prime\mathbf{A} = \text{rank}\, \mathbf{A}\mathbf{A}^\prime = \text{rank}\, \mathbf{A}\); the matrix and its transpose contain the same amount of information.
- \( \left( \mathbf{A}^\prime \right)^{-1} = \left( \mathbf{A}^{-1} \right)^\prime \); the inverse of a transposed matrix is the transpose of the inverted matrix. This presupposes that \( \mathbf{A}^{-1}\) exists.
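Here is a quick numerical check of the first two results (a sketch; A is an arbitrary non-square matrix):
A <- matrix(c(1, 4,
              2, 5,
              3, 7), ncol=2, byrow=TRUE)   # an arbitrary 3x2 matrix
isSymmetric( t(A) %*% A )   # TRUE: A'A is symmetric (2x2)
isSymmetric( A %*% t(A) )   # TRUE: AA' is symmetric (3x3)
qr(A)$rank                  # 2
qr( t(A) %*% A )$rank       # 2: same rank as A
qr( A %*% t(A) )$rank       # 2: same rank as A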
Symmetric
A matrix is symmetric if it is equal to its transpose, \(\mathbf{A} = \mathbf{A}^\prime\).
\begin{equation}
\left[ a_{ij}\right] = \left[ a_{ji}\right]
\end{equation}
Note that only square matrices can be symmetric. Symmetric matrices have some nice properties with respect to calculations.
Also note that one can "symmetrize" any square matrix. That is, one can form a symmetric matrix \(\mathbf{X}\) from a square matrix \(\mathbf{A}\) as
\begin{equation}
\mathbf{X} = \frac{\mathbf{A} + \mathbf{A}^\prime}{2}
\end{equation}
I leave it as an exercise to prove that \(\mathbf{X}\) is symmetric.
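Here is a quick numerical check of the symmetrizing step (a sketch; A is an arbitrary non-symmetric square matrix):
A <- matrix(c(1, 4, 2,
              0, 3, 5,
              7, 1, 6), ncol=3, byrow=TRUE)   # not symmetric
X <- (A + t(A)) / 2   # the symmetrized version
isSymmetric(X)        # TRUE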
One important feature of symmetric matrices is that they can be transformed into a diagonal matrix. If \(\mathbf{A}\) is symmetric, then there exists an orthogonal matrix \(\mathbf{Q}\) such that \(\mathbf{Q}^\prime\mathbf{A}\mathbf{Q}\) is diagonal.
Why is this helpful?
First, remember that multiplication of diagonal matrices is commutative.
Second, as you will see in the text, a diagonal covariance matrix indicates that the variables are uncorrelated (and, for normally distributed variables, independent). This means that any set of variables can be linearly transformed into a set of uncorrelated variables. This fact is the basis for a procedure called "principal component analysis."
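Here is a small sketch of the diagonalization result (the symmetric matrix A is an arbitrary example, and Q is built from its eigendecomposition):
A <- matrix(c(4, 1, 1,
              1, 3, 0,
              1, 0, 2), ncol=3, byrow=TRUE)   # an arbitrary symmetric matrix
Q <- eigen(A)$vectors          # columns are orthonormal eigenvectors
round( t(Q) %*% A %*% Q, 10 )  # Q'AQ is diagonal; the diagonal holds the eigenvalues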
Idempotency
A matrix \(\mathbf{A}\) is idempotent if \(\mathbf{A}\mathbf{A} = \mathbf{A}\). This means that multiplying the matrix by itself yields the original matrix; i.e., once the transformation is applied a first time, applying it again will not change the result.
The concept of idempotency is the formal mathematical expression of a projection (see below). Imagine projecting a 3D object onto a 2D wall — the shadow is a "flattened" version of the object. If you then try to project that 2D shadow onto the same wall again, it remains unchanged. The idempotent matrix captures this exact property: it "squashes" vectors onto a subspace, and any vector already in that subspace is left untouched by a subsequent application of the transformation.
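A tiny sketch of this property (P below projects 2-dimensional vectors onto the horizontal axis):
P <- matrix(c(1, 0,
              0, 0), ncol=2)   # projection onto the x-axis
all( P %*% P == P )   # TRUE: P is idempotent
v <- c(3, 5)
P %*% v               # (3, 0): the "shadow" of v
P %*% (P %*% v)       # projecting the shadow again changes nothing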
The Determinant
The determinant is a mathematical concept that serves as a diagnostic tool for assessing the fundamental viability of multivariate models. In regression, the determinant of the \(\mathbf{X^{\prime}X}\) matrix determines whether the ordinary least squares solution exists at all. If the determinant equals zero, the matrix is singular, indicating perfect multicollinearity among predictors, which prevents inversion and makes estimation impossible. Beyond mere existence, the magnitude of the determinant also conveys the overall collinearity in the data; a near-zero determinant suggests high multicollinearity, even if not perfect, leading to unstable coefficient estimates with inflated standard errors. Thus, the determinant acts as a gatekeeper for model validity, a gauge of data stability, and a key component in many advanced statistical procedures.
- det M = 0: M is rank deficient; its inverse does not exist.
- det M ≠ 0: M is full rank; it has an inverse.
- |det(A)|: this is the "volume scaling factor" of the linear transformation (of the matrix).
Calculating the determinant of a general square matrix by hand is a bit beyond the needs of this course. However, the determinant of a \(2 \times 2\) matrix is useful:
\begin{equation}
\text{det}\ \mathbf{A} = a_{1,1}a_{2,2} - a_{1,2}a_{2,1}
\end{equation}
In R, the function is det:
A = matrix( c(1,2,3,4,5,6,7,8,9), ncol=3)   # a 3x3 example (this particular matrix is singular)
det(A)                                      # 0, up to rounding error
There are a couple of ways to indicate the determinant of a matrix. Some sources use |A| to indicate the determinant of the matrix A. This can be problematic if one also is taking absolute values.
Because of this possible confusion, I prefer det, where det A indicates the determinant of the matrix A.
As always, be aware of the symbols and terminology used by the author you are reading.
Matrix Rank
In statistics, especially when we move into multivariable methods like linear regression, we rely heavily on linear algebra. At the heart of many of these techniques is a key operation: solving for a set of unknown parameters. The rank of a matrix is the concept that tells us whether a unique and sensible solution exists.
Think of it this way: the rank of a matrix is the number of linearly independent rows or columns it contains. "Linearly independent" means that no row or column can be created by a linear combination of the others. It's a measure of the matrix's true informational content.
Its importance becomes clear in two fundamental areas:
- Invertibility: Can We Compute Our Estimates?
In ordinary least squares (OLS) regression, we estimate the coefficients using the formula:
\(\mathbf{b} = \mathbf{(X^{\prime}X)^{-1} X^{\prime}Y}\)
The critical step is inverting the matrix \(X^{\prime}X\). A matrix can be inverted if and only if it is "full rank."
- Full Rank: The rank equals the number of columns in \(\mathbf{X}\). This means all our predictor variables are linearly independent. \(\mathbf{X^{\prime}X}\) is invertible, the formula works, and we get unique estimates for all our coefficients.
- Rank Deficient (Less-Than-Full Rank): The rank is less than the number of columns. This signals perfect multicollinearity — one predictor is an exact linear combination of others (e.g., including variables for "weight in kg" and "weight in lbs," or the sum of all dummy variables for a categorical factor). In this case, \(\mathbf{X^{\prime}X}\) is singular; it cannot be inverted. The formula breaks down, and no unique set of coefficients exists—the computer will either throw an error or drop a variable arbitrarily.
This leads to a very important take-away:
- The rank is a diagnostic for perfect multicollinearity. It tells us if our model's design is fundamentally flawed before we even look at the data.
- Dimensionality: What Are We Actually Measuring?
Rank also tells us about the effective dimensionality of our data. Imagine you have 10 survey questions, but due to redundant phrasing, they only capture 3 distinct underlying attitudes. Your data matrix has 10 columns, but its rank might be close to 3. This has two major (and useful) implications:
- Principal Component Analysis (PCA): PCA finds new, uncorrelated variables (components). The number of components with non-zero variance you can extract is exactly the rank of your standardized data matrix; it is the true number of dimensions needed to capture the variation.
- Linear Models and Hypothesis Testing: When testing complex hypotheses (e.g., in ANOVA or MANOVA), we compare models using matrices. The rank of these hypothesis matrices determines the degrees of freedom for the test. It essentially counts how many independent restrictions we are placing on our model parameters.
This leads to another very important take-away:
- Rank moves beyond just inversion. It quantifies the true, non-redundant information in your dataset, guiding dimensionality reduction and validating the structure of statistical tests.
Determining Matrix Rank
Remember that the rank of a matrix is the maximum number of linearly independent rows or columns in the matrix. In simpler terms, it tells us how many "truly unique" dimensions of information exist in your data, free from redundancy. For a data matrix, the rank has a practical interpretation: it's the maximum number of uncorrelated variables you could theoretically extract from your data.
Columns are linearly dependent if one of them can be expressed as an exact linear combination of the others. When this happens, the matrix is rank deficient.
As an example, let us say we have data on 5 students with 4 variables:
| Student ID | Math Score (X₁) | Reading Score (X₂) | Total Score (X₃) | Double Math (X₄) |
|---|---|---|---|---|
| A | 80 | 85 | 165 | 160 |
| B | 90 | 80 | 170 | 180 |
| C | 70 | 90 | 160 | 140 |
| D | 85 | 75 | 160 | 170 |
| E | 95 | 85 | 180 | 190 |
From this data, our 5×4 data matrix \(\mathbf{X}\) is:
X = matrix( c(80, 85, 165, 160,
              90, 80, 170, 180,
              70, 90, 160, 140,
              85, 75, 160, 170,
              95, 85, 180, 190), ncol=4, byrow=TRUE)
So, how do we determine the rank of this matrix? There are a few ways, some easier than others. I will only show the first one in detail. The others can be found in your notes from Linear Algebra.
Method 1: Finding Dependencies
Looking at our variables and how they are defined, we see
- Total Score (X₃) = Math Score (X₁) + Reading Score (X₂)
  165 = 80 + 85
  170 = 90 + 80
  So X₃ = X₁ + X₂ → a linearly dependent group
- Double Math (X₄) = 2 × Math Score (X₁)
  160 = 2 × 80
  180 = 2 × 90
  So X₄ = 2X₁ → a linearly dependent group
So we have 4 columns, but:
- X₃ depends on X₁ and X₂
- X₄ depends on X₁
So, we really only have two variables giving information. Thus, rank \(\mathbf{X}\) = 2.
Mathematically, we can say
\begin{equation}
\sum_{n=1}^4 a_n\mathbf{X}_n = \mathbf{0}
\end{equation}
for the vectors
\begin{align*}
a = \big[ 1, 1, -1, 0 \big]^{\prime} \\[1em]
a = \big[ 2, 0, 0, -1 \big]^{\prime}
\end{align*}
Because we found (at least) one non-zero vector \(a\) satisfying this equation, we know that \(\mathbf{X}\) is rank deficient.
Note that these vectors are not unique. Any scalar multiple of these will give the same result.
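We can verify this in R with the matrix X defined above (a quick check; a1 and a2 are the two vectors just given):
a1 <- c(1, 1, -1, 0)   # X1 + X2 - X3 = 0
a2 <- c(2, 0, 0, -1)   # 2*X1 - X4 = 0
X %*% a1               # a column of zeros
X %*% a2               # a column of zeros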
Other Methods
Finding dependencies like this is not the only approach. While it makes it easy to show that a matrix is rank deficient, it is not so good at determining the actual rank of the matrix. These are methods for determining the matrix rank more directly:
- Gaussian Elimination:
  The rank is the number of non-zero rows remaining after reduction to row echelon form.
- Invertible Submatrices:
  The matrix's rank equals the size of the largest square sub-matrix with a non-zero determinant.
- Singular Value Decomposition:
  In practice, software uses the SVD. The rank equals the number of non-zero singular values.
Rank in R
Of course, R has a built-in function that can give the rank of a matrix: qr, which computes the QR decomposition. The function returns a lot of information, including the rank. If all you want is the rank of the matrix, just access rank from the output. Here is an example.
# Define a matrix
M <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), ncol = 3)
# Calculate the rank
matrix_rank <- qr(M)$rank
print(matrix_rank)   # Output: 2
Note that the $rank part of qr(M)$rank just returns the rank of the matrix. This is because qr(M) is a list-type variable with several variables included (just like with a data frame). The $ accesses the variables internal to the list.
Positive Definite Matrices
A matrix \(\mathbf{A}\) is positive definite (pd) if \(\mathbf{q}^\prime \mathbf{A} \mathbf{q} > 0\) for all non-zero vectors \(\mathbf{q}\). It is usually difficult to determine by inspection whether a matrix is positive definite. However, once you know it is, there are some important properties, which we look at in the next section.
Their Importance
So, why are positive definite matrices important?
Positive definite matrices are important because they guarantee the stability, uniqueness, and interpretability of many core multivariate procedures. They ensure that quadratic forms — such as those appearing in Mahalanobis distance, likelihood functions, and other optimization criteria — yield positive values for any non-zero vector, which is essential for meaningful variance estimation and probability calculations.
- In practical terms, the covariance matrix (of any non-degenerate random vector) must be positive definite; if it were not, it would imply zero or negative variance for some linear combination of the variables, which is nonsensical.
This property also underpins the feasibility of methods like Cholesky decomposition for simulation, the convergence of optimization algorithms in maximum likelihood estimation, and the validity of confidence ellipsoids in multivariate inference. In essence, positive definiteness acts as a mathematical certificate that a multivariate model is well-posed and that the underlying data structure is coherent and admissible.
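One practical way to check positive definiteness is through the eigenvalues: a symmetric matrix is positive definite exactly when all of its eigenvalues are positive. Here is a small sketch (S is an arbitrary covariance-style matrix, and q is an arbitrary non-zero vector):
S <- matrix(c(4, 2,
              2, 3), ncol=2)   # an arbitrary symmetric matrix
eigen(S)$values                # both positive, so S is positive definite
q <- c(1, -2)                  # an arbitrary non-zero vector
t(q) %*% S %*% q               # the quadratic form is positive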
Projection Matrices
A matrix \(\mathbf{P}\) is a projection matrix if it is idempotent. The purpose of a projection matrix is to project a higher-dimensional space onto a subspace. If \(\mathbf{P}\) is also symmetric, it is called an orthogonal projection matrix. This means it projects the larger space orthogonally (perpendicularly) onto the subspace. Think of shining a flashlight on a plant. If you put the flashlight directly over the plant, it will project the plant orthogonally onto the floor. If you do it at an angle, then the projection is called oblique. Here are comments on both types and why they matter to us.
Oblique Projections
A projection matrix is defined by the property \( \mathbf{P}^2 = \mathbf{P} \) (idempotency). This simply means that once you project a vector onto a subspace, projecting it again doesn't change anything. An oblique projection satisfies this condition but does not require the projection to be at a right angle. You can project a vector onto a line or plane by "casting a shadow" from a non-right angle. The resulting projection matrix for an oblique projection is idempotent but not symmetric.
Orthogonal Projections
An orthogonal projection is a special case of a projection where the "shadow" is cast at a right angle onto the subspace. This is the type of projection most frequently used in statistics and linear algebra for tasks like least squares regression. For a matrix to be an orthogonal projection matrix, it must be both idempotent and symmetric. The symmetry condition is what guarantees that the projection is perpendicular.
Why do we care?
In linear regression, the famous "hat matrix" \( \mathbf{H} = \mathbf{X}(\mathbf{X}^\prime \mathbf{X})^{-1}\mathbf{X}^\prime \) is the orthogonal projection matrix that projects the observed data vector \( \mathbf{Y} \) onto the column space of the design matrix \( \mathbf{X} \) to obtain the fitted values \( \mathbf{\hat{Y}} \).
The key in both instances is that you are simplifying a complicated reality (3-D object) onto a simpler model (2-D shadow).
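Here is a small sketch of the hat matrix at work (the data are made up; X contains a column of 1s for the intercept and a single predictor):
x <- c(1, 2, 3, 4, 5)                     # a made-up predictor
Y <- c(2.1, 3.9, 6.2, 8.1, 9.8)           # a made-up response
X <- cbind(1, x)                          # design matrix with intercept
H <- X %*% solve( t(X) %*% X ) %*% t(X)   # the hat matrix
isSymmetric(H)            # TRUE: an orthogonal projection
max( abs(H %*% H - H) )   # essentially 0: idempotent
H %*% Y                   # the fitted values, Y-hat
sum(diag(H))              # 2: the trace equals the number of estimated parameters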
Three Special Numeric Matrices
There are three special matrices that are used in computer matrix calculations. These are interesting in themselves, and there are some exercises for you based on these.
- The vector \(\mathbf{j}\) is a vector of 1s. It is used to calculate column sums (if pre-multiplying by \(\mathbf{j}^\prime\)) or row sums (if post-multiplying by \(\mathbf{j}\)).
- The matrix \(\mathbf{J}\) is a matrix of 1s. It does what \(\mathbf{j}\) does, but puts the sums in a matrix.
- The vector \(\mathbf{e}_i\) is a vector of 0s, with a 1 in the \(i^{\text{th}}\) position. It is used in proofs, as it can be used to select an individual row, column, or element of a matrix.
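A brief sketch of all three in R (A is an arbitrary 3 × 2 matrix):
A <- matrix(c(1, 4,
              2, 5,
              3, 6), ncol=2, byrow=TRUE)   # an arbitrary 3x2 matrix
j <- rep(1, 3)          # the vector j (here of length 3)
t(j) %*% A              # column sums (pre-multiplying by j')
A %*% rep(1, 2)         # row sums (post-multiplying by j of length 2)
J <- matrix(1, 3, 3)    # the matrix J of 1s
J %*% A                 # each row holds the column sums
e2 <- c(0, 1, 0)        # e_2: selects the 2nd row
t(e2) %*% A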
Other Linear Algebra Topics
There are a few remaining topics that I cannot put elsewhere. Here they are:
- Matrices \(\mathbf{A}\) and \(\mathbf{B}\) are orthogonal, \(\mathbf{A} \perp \mathbf{B}\), if \(\mathbf{A}^\prime \mathbf{B}=\mathbf{0}\). That is, if every column of \(\mathbf{A}\) has a zero inner product with every column of \(\mathbf{B}\), then the matrices themselves are orthogonal.
- The eigenvalues of a matrix \(\mathbf{A}\) are those values \(\lambda\) that solve the equation \(\mathbf{A}\mathbf{v} = \lambda\mathbf{v}\) for some non-zero vector \(\mathbf{v}\).
- The non-zero vector \(\mathbf{v}\) corresponding to each eigenvalue is called its eigenvector.
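In R, the eigen function returns both parts at once (a quick sketch with an arbitrary symmetric matrix):
A <- matrix(c(2, 1,
              1, 2), ncol=2)   # an arbitrary symmetric matrix
ev <- eigen(A)
ev$values                      # the eigenvalues (3 and 1)
ev$vectors                     # the corresponding eigenvectors, one per column
# Check the defining equation A v = lambda v for the first pair
A %*% ev$vectors[, 1]
ev$values[1] * ev$vectors[, 1]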


