20.3: Multiplication
Matrix multiplication is not a single operation but rather a family of distinct products, each with its own rules and purpose. The most fundamental is the "usual" matrix product for linear transformations. However, there are others. Alongside the usual, we have the Hadamard product for element-wise multiplication, the Kronecker product for creating block matrices, and scalar multiplication, which uniformly scales every element in a matrix.
This diversity of products is the computational bedrock of linear models. Without these specific operations, the elegant, compact matrix equations that define regression would be impossible, forcing us to rely on cumbersome and inefficient scalar notations... which we will experience.
There are many, many, many types of multiplication with matrices. As expected, the version used depends on the need. As only the scalar product and the matrix product are typically seen in an undergraduate linear models course, we will only discuss those two here.
Scalar Product
As in arithmetic, the scalar product arose from needing to repeatedly add a matrix to itself. Thus, instead of writing \(\mathbf{A} + \mathbf{A} + \mathbf{A} + \mathbf{A} + \mathbf{A} + \mathbf{A} + \mathbf{A} + \mathbf{A}\), one could write \(8\mathbf{A}\), where \(8\) is a scalar. This was quickly generalized to non-integer values for the scalar multiple, just as \(3 \times a\) was quickly generalized to things like \(4.25 \times a\).
Scalar multiplication is defined as
\begin{equation}
c\mathbf{A} = \left[ ca_{ij} \right]
\end{equation}
Scalar products do not change the dimension of the matrix. That is, if \(c\) is a scalar and \(\mathbf{A} \in \mathcal{M}_{r \times c}\), then \(c\mathbf{A} \in \mathcal{M}_{r \times c}\).
Scalar products are commutative. That is, if \(c\) is a scalar and \(\mathbf{A}\) is a matrix, then \(c\mathbf{A} = \mathbf{A}c\). This will come in handy later, so be aware of it. Note that \(c\) does not need to be a natural number.
Scalar products are also associative. That is, if \(c\) is a scalar, then the following are equivalent:
- \( c\mathbf{A}\mathbf{B}\)
- \( \mathbf{A}c\mathbf{B}\)
- \(\mathbf{AB}c\)
- \( \left(c\mathbf{A}\right)\mathbf{B}\)
- \( c\left(\mathbf{A}\mathbf{B}\right)\)
Scalar multiplication is also distributive over matrix addition. Thus, \(c\left( \mathbf{A} + \mathbf{B}\right) = c\mathbf{A} + c\mathbf{B}\).
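Scalar products are also trivial to compute. In R, the operator * applied to a matrix works element-wise, which is exactly the scalar product when one operand is a single number. A minimal sketch (the matrix entries below are arbitrary):

A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, byrow = TRUE)   # an arbitrary 2 x 3 matrix
8 * A                     # the same as adding A to itself eight times
4.25 * A                  # non-integer scalars work the same way
all.equal(8 * A, A * 8)   # TRUE: scalar products commute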
Matrix Product
The matrix product is the multiplication that is (usually) meant when one just says "matrix multiplication." Its definition arises from linear algebra and repeated linear transformations. It has many nice properties. Calculation is not one of them.
Let us define two matrices \(\mathbf{A}\) and \(\mathbf{B}\) such that the number of columns of \(\mathbf{A}\) equals the number of rows of \(\mathbf{B}\). Their product is defined as
\begin{equation}
\mathbf{A}\mathbf{B} = \left[ ab_{i,j}\right] = \left[ \sum_k a_{i,k}b_{k,j} \right]
\end{equation}
Here, \(k\) ranges between 1 and the number of columns of \(\mathbf{A}\). The dimension of the product is the number of rows of \(\mathbf{A}\) by the number of columns of \(\mathbf{B}\). That is, let \(\mathbf{A} \in \mathcal M_{r_1 \times c_1}\) and \(\mathbf{B} \in \mathcal M_{r_2 \times c_2}\). Then, \(\mathbf{A}\) and \(\mathbf{B}\) are commensurate for matrix multiplication if \(c_1 = r_2\). The dimension of the product is \(r_1 \times c_2\).
Let us define our two matrices as
\begin{equation}
\mathbf{A} = \left[ \begin{matrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{matrix} \right]
\end{equation}
and
\begin{equation}
\mathbf{B} = \left[ \begin{matrix} a & b & c \\ d & e & f \\ g & h & m \end{matrix} \right]
\end{equation}
Find the product \(\mathbf{A}\mathbf{B}\).
Solution:
The first step is to check that the matrices are commensurate for multiplication: Does the number of columns of \(\mathbf{A}\) equal the number of rows of \(\mathbf{B}\)? Note that \(\mathbf{A} \in \mathcal{M}_{2 \times 3}\) and \(\mathbf{B} \in \mathcal{M}_{3 \times 3}\). Because the "inners" of \(\mathbf{AB}\) are equal to each other (both are 3), the two matrices are commensurate.
Second, we determine the dimension of the product. It is the number of rows of \(\mathbf{A}\) by the number of columns of \(\mathbf{B}\): \(2 \times 3\), the "outers."
Third, since we know the dimension of the product, we just have to fill in the blanks in the matrix:
\begin{equation}
\mathbf{A}\mathbf{B} = \left[ \begin{matrix} - & - & - \\ - & - & - \end{matrix} \right]
\end{equation}
The top-left element in the product matrix is element \(1,1\). Thus, by our definition, it equals
\begin{align}
ab_{1,1} &= \sum_k a_{1,k}b_{k,1} \\[1em]
&= a_{1,1}b_{1,1} + a_{1,2}b_{2,1} + a_{1,3}b_{3,1} \\[1em]
&= 1a + 2d + 3g
\end{align}
The top-middle element, \(ab_{1,2}\), is
\begin{align}
ab_{1,2} &= \sum_k a_{1,k}b_{k,2} \\[1em]
&= a_{1,1}b_{1,2} + a_{1,2}b_{2,2} + a_{1,3}b_{3,2} \\[1em]
&= 1b + 2e + 3h
\end{align}
Note what is happening here. The element in the "top-left" cell is the inner product of the top row of \(\mathbf{A}\) and the left column of \(\mathbf{B}\). Similarly, the bottom-center element is the inner product of the bottom row and the center column.
I leave it as an exercise for you to finish the multiplication. Here is the final product:
\begin{equation}
\mathbf{A}\mathbf{B} = \left[ \begin{matrix} 1a + 2d + 3g & 1b + 2e + 3h & 1c + 2f + 3m \\
4a + 5d + 6g & 4b + 5e + 6h & 4c + 5f + 6m \end{matrix} \right]
\end{equation}
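Numeric products are easily checked in R, where %*% denotes the matrix product. A minimal sketch, using the same \(\mathbf{A}\) and an arbitrary numeric stand-in for \(\mathbf{B}\):

A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, byrow = TRUE)                     # 2 x 3
B <- matrix(c(10, 20, 30, 40, 50, 60, 70, 80, 90), nrow = 3, byrow = TRUE)   # 3 x 3, standing in for a, b, ..., m
A %*% B        # the matrix product, a 2 x 3 result
dim(A %*% B)   # 2 3, the "outers"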
In scalar arithmetic, we have a multiplicative identity and multiplicative inverses, and multiplication is commutative, associative, and distributive over addition. These properties carry over to matrix multiplication, with the exception of commutativity (and, as we will see, not every matrix has an inverse). In general, \(\mathbf{A}\mathbf{B} \ne \mathbf{B}\mathbf{A}\), even when both products make sense.
The multiplicative identity, which we will symbolize by \(\mathbf{I}\), has the property that \(\mathbf{A}\mathbf{I} = \mathbf{I}\mathbf{A} = \mathbf{A}\). We will call \(\mathbf{I}\) the identity matrix.
Technically, this statement is only true if \(\mathbf{A}\) is square. If it is not square, then the two identity matrices will have different dimension. We will restrict ourselves to square matrices. The "generalized inverse" is beyond the scope of this text.
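In R, the identity matrix of a given size is produced by diag(). A quick sketch:

I3 <- diag(3)                                          # the 3 x 3 identity matrix
A  <- matrix(c(2, 4, 6, 1, 3, 5, 7, 8, 9), nrow = 3)   # an arbitrary 3 x 3 matrix
A %*% I3    # returns A unchanged
I3 %*% A    # also returns A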
Once we have a multiplicative identity, we can define a multiplicative inverse. The inverse of a matrix \(\mathbf{A}\), denoted \(\mathbf{A}^{-1}\), is a matrix that satisfies these two requirements:
\begin{equation}
\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}
\end{equation}
Not all matrices have inverses. Those that do not are called singular. Those that do are called invertible.
Only square matrices can be invertible. However, not all square matrices are invertible. From linear algebra, a matrix is invertible if and only if it is square and is of full rank.
A consequence of this is that a square matrix is invertible if and only if its determinant is non-zero. In general, calculating the determinant and the inverse is computationally intensive. However, both are rather straightforward for \(2 \times 2\) matrices. Since much of the initial exploration of linear models takes place in the realm of one dependent and one independent variable, it behooves us to focus a bit on \(2 \times 2\) matrices.
Let \(\mathbf{A} \in \mathcal M_{2\times2}\). Then, the determinant of \(\mathbf{A}\) is
\(\det \mathbf{A} = a_{11}a_{22} - a_{12}a_{21}\)
From your linear algebra class, you may have learned that the determinant of a matrix is a single scalar value that reveals the matrix's scaling factor for area (or volume or hyper-volume) when it acts as a linear transformation. A determinant of zero indicates that the transformation compresses the space into a lower dimension, making the matrix non-invertible, while a non-zero determinant confirms the transformation preserves the full dimensionality of the space. Since the rank of a matrix is important to statisticians, this last feature is important in terms of the data (information).
If \(\mathbf{A} \in \mathcal M_{2\times 2}\) then its inverse is
\begin{equation}
\mathbf{A}^{-1} = \frac{1}{\det \mathbf{A}} \left[ \begin{matrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{matrix} \right]
\end{equation}
Proof
To prove this, we will show that \(\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}\) and that \(\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}\).
\begin{align}
\mathbf{A}\mathbf{A}^{-1} &= \left[ \begin{matrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{matrix} \right] \qquad \frac{1}{\det \mathbf{A}} \left[ \begin{matrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{matrix} \right] \\[1em]
&= \frac{1}{\det \mathbf{A}} \left[ \begin{matrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{matrix} \right] \left[ \begin{matrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{matrix} \right] \\[1em]
&= \frac{1}{a_{11}a_{22} - a_{12}a_{21}}
\left[ \begin{matrix}
a_{11}a_{22} - a_{12}a_{21} & -a_{11}a_{12} + a_{11}a_{12} \\[1ex]
a_{21}a_{22} - a_{21}a_{22} & -a_{12}a_{21} + a_{11}a_{22} \\
\end{matrix} \right] \\[1em]
&= \frac{1}{a_{11}a_{22} - a_{12}a_{21}}
\left[ \begin{matrix}
a_{11}a_{22} - a_{12}a_{21} & 0 \\[1ex]
0& -a_{12}a_{21} + a_{11}a_{22} \\
\end{matrix} \right] \\[1em]
&= \left[ \begin{matrix}
1 & 0 \\ 0 & 1 \\ \end{matrix} \right] \\[1em]
&= \mathbf{I}
\end{align}
Thus, we have shown that \(\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}\). This is just one-half of the proof. I leave it as an exercise for you to prove the second half: \(\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}\).
The reason I provide the mathematics for \(2 \times 2\) matrices is that in our study of simple linear regression, many of the important calculations will be done with \(2 \times 2\) matrices. See Chapter 4: Matrices and Linear Regression.
Formulas do exist for \(3 \times 3\) matrices, too. However, once we move beyond that, hand calculations are time-prohibitive. At the end of this section, I provide some examples of performing these calculations in R.
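To preview what those calculations look like, here is a minimal sketch verifying the \(2 \times 2\) formula numerically; in R, det() computes the determinant and solve(), given a single matrix, returns its inverse (the entries below are arbitrary):

A <- matrix(c(3, 1, 2, 7), nrow = 2, byrow = TRUE)   # an arbitrary invertible 2 x 2 matrix
det(A)           # 3*7 - 1*2 = 19
solve(A)         # the inverse
A %*% solve(A)   # the 2 x 2 identity (up to floating-point rounding)
(1 / det(A)) * matrix(c(7, -1, -2, 3), nrow = 2, byrow = TRUE)   # the formula above; it matches solve(A)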
Now that we have a mechanism to calculate a multiplicative inverse, let us see that not all square matrices have one.
Let \(\mathbf{A} \in \mathcal M_{2\times2}\) be defined as
\begin{equation}
\mathbf{A} = \left[ \begin{matrix} 1 & 3 \\ 2 & 6 \end{matrix} \right]
\end{equation}
Calculate \(\mathbf{A}^{-1}\).
Solution:
Let us calculate \(\mathbf{A}^{-1}\) using our formula,
\begin{equation}
\mathbf{A}^{-1} = \frac{1}{\det \mathbf{A}} \left[ \begin{matrix} \phantom{-}a_{22} & -a_{12} \\ -a_{21} & \phantom{-}a_{11} \end{matrix} \right]
\end{equation}
Applying this formula is straightforward:
\begin{equation}
\mathbf{A}^{-1} = \frac{1}{0} \left[ \begin{matrix} \phantom{-}6 & -3 \\ -2 & \phantom{-}1 \end{matrix} \right]
\end{equation}
Yes, the determinant of \(\mathbf{A}\) is \(\det \mathbf{A} = 1 \times 6 - 3 \times 2 = 6 - 6 = 0\). Since the determinant is \(0\), the inverse does not exist (one cannot divide by \(0\)). The matrix is singular.
What is it about the \(\mathbf{A}\) matrix that makes it singular?
Note that the second column is just 3 times the first column (or the second row is twice the first). This means the matrix is not full rank. The columns are not linearly independent. When we get to using these matrices with real data, we will see this as the second column gives us no knowledge about the world that is not already contained in the first column. The second column is redundant.
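R reflects the same fact: the determinant is zero, the rank is deficient, and solve() refuses to invert the matrix. A quick sketch:

A <- matrix(c(1, 3, 2, 6), nrow = 2, byrow = TRUE)   # the singular matrix above
det(A)         # 0 (possibly with a tiny rounding error)
qr(A)$rank     # 1, not full rank
# solve(A)     # uncommenting this line produces an error because the matrix is singular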
So far, we have seen the multiplicative identity and the multiplicative inverse. It is time to note that matrix multiplication is not commutative, in general.
Matrix multiplication is not commutative. That is, there exist \(\mathbf{A}\) and \(\mathbf{B}\) such that \(\mathbf{A}\mathbf{B} \ne \mathbf{B}\mathbf{A}\).
Note that this is not saying that it is impossible for \(\mathbf{A}\mathbf{B} = \mathbf{B}\mathbf{A}\). It is merely saying that we cannot assume it. There are definitely times when the product is commutative. In fact, knowing those times will help us simplify some of the more-complicated models.
Proof:
The proof is simple. It is simply a counter-example. Let
\begin{equation}
\mathbf{A} = \left[\begin{matrix} 3&1\\ 2&7\\ \end{matrix}\right]
\end{equation}
and
\begin{equation}
\mathbf{B} = \left[\begin{matrix} 1&1\\ 1&1\\ \end{matrix}\right]
\end{equation}
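Carrying out both products makes the failure of commutativity concrete:
\begin{equation}
\mathbf{A}\mathbf{B} = \left[\begin{matrix} 4 & 4 \\ 9 & 9 \end{matrix}\right]
\qquad
\mathbf{B}\mathbf{A} = \left[\begin{matrix} 5 & 8 \\ 5 & 8 \end{matrix}\right]
\end{equation}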
Note that \(\mathbf{A}\mathbf{B} \ne \mathbf{B}\mathbf{A}\).
Since we have found a counter-example, we have shown that matrix multiplication is not commutative, in general.
While technically correct, this proof leaves us feeling a little empty. So we found one counter-example. Cool beans. But we learned precious little about commutativity with matrix multiplication. Let us explore a bit and see if we can determine when matrix multiplication is commutative. We may learn something interesting.
Exploring Matrices:
First, let us assume \(\mathbf{A}\) and \(\mathbf{B}\) are square and commensurate. This ensures that \(\mathbf{A}\mathbf{B}\) and \(\mathbf{B}\mathbf{A}\) can be calculated. For instance, let both be \(2 \times 2\) matrices. Then, \(\mathbf{A}\mathbf{B}\) is
\begin{align}
\mathbf{A}\mathbf{B} &= \left[ \begin{matrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{matrix} \right]
\left[ \begin{matrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{matrix} \right] \\[1em]
&= \left[ \begin{matrix}
a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\[1ex]
a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22}
\end{matrix} \right]
\end{align}
and \(\mathbf{B}\mathbf{A}\) is
\begin{align}
\mathbf{B}\mathbf{A} &= \left[ \begin{matrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{matrix} \right]
\left[ \begin{matrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{matrix} \right]\\[1em]
&= \left[ \begin{matrix}
b_{11}a_{11} + b_{12}a_{21} & b_{11}a_{12} + b_{12}a_{22} \\[1ex]
b_{21}a_{11} + b_{22}a_{21} & b_{21}a_{12} + b_{22}a_{22}
\end{matrix} \right]\\[1em]
&= \left[ \begin{matrix}
a_{11}b_{11} + a_{21}b_{12} & a_{12}b_{11} + a_{22}b_{12} \\[1ex]
a_{11}b_{21} + a_{21}b_{22} & a_{12}b_{21} + a_{22}b_{22}
\end{matrix} \right]
\end{align}
By comparing the two product matrices, \(\mathbf{A}\mathbf{B}\) and \(\mathbf{B}\mathbf{A}\), we can determine some restrictions under which multiplication is commutative.
For instance, if \(a_{12}=a_{21}=0\) and \(b_{12}=b_{21}=0\), then the two product matrices are the same. In other words, if \(\mathbf{A}\) and \(\mathbf{B}\) are both diagonal matrices, multiplication will be commutative. That's an interesting consequence we would have missed if we had just stopped with our proof.
In fact, now that we know what to look for, it is trivial to show that multiplication of diagonal matrices is commutative in general... as we do here:
Let \(\mathbf{A}\) and \(\mathbf{B}\) be diagonal matrices of the same size. The product is commutative; that is, \(\mathbf{AB} = \mathbf{BA}\).
Proof:
Since \(\mathbf{A}\) and \(\mathbf{B}\) are diagonal and of the same shape, they can be represented as
\begin{equation}
\mathbf{A} = \left[
\begin{matrix}
a_1 & 0 & 0 & \cdots & 0 \\
0 & a_2 & 0 & \cdots & 0 \\
0 & 0 & a_3 & & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a_n \\
\end{matrix}
\right]
\end{equation}
and
\begin{equation}
\mathbf{B} = \left[
\begin{matrix}
b_1 & 0 & 0 & \cdots & 0 \\
0 & b_2 & 0 & \cdots & 0 \\
0 & 0 & b_3 & & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & b_n \\
\end{matrix}
\right]
\end{equation}
By definition, the product \(\mathbf{AB}\) is
\begin{align}
\mathbf{A}\mathbf{B} &= \left[
\begin{matrix}
a_1 & 0 & 0 & \cdots & 0 \\
0 & a_2 & 0 & \cdots & 0 \\
0 & 0 & a_3 & & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a_n \\
\end{matrix}
\right] \ \left[
\begin{matrix}
b_1 & 0 & 0 & \cdots & 0 \\
0 & b_2 & 0 & \cdots & 0 \\
0 & 0 & b_3 & & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & b_n \\
\end{matrix}
\right] \\[2em]
&= \left[
\begin{matrix}
a_1b_1 & 0 & 0 & \cdots & 0 \\
0 & a_2b_2 & 0 & \cdots & 0 \\
0 & 0 & a_3b_3 & & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a_nb_n \\
\end{matrix}
\right]
\end{align}
Similarly, the product \(\mathbf{BA}\) is
\begin{align}
\mathbf{B}\mathbf{A} &= \left[
\begin{matrix}
b_1 & 0 & 0 & \cdots & 0 \\
0 & b_2 & 0 & \cdots & 0 \\
0 & 0 & b_3 & & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & b_n \\
\end{matrix}
\right]\ \left[
\begin{matrix}
a_1 & 0 & 0 & \cdots & 0 \\
0 & a_2 & 0 & \cdots & 0 \\
0 & 0 & a_3 & & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a_n \\
\end{matrix}
\right] \\[2em]
&= \left[
\begin{matrix}
b_1a_1 & 0 & 0 & \cdots & 0 \\
0 & b_2a_2 & 0 & \cdots & 0 \\
0 & 0 & b_3a_3 & & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & b_na_n \\
\end{matrix}
\right] \\[2em]
&= \left[
\begin{matrix}
a_1b_1 & 0 & 0 & \cdots & 0 \\
0 & a_2b_2 & 0 & \cdots & 0 \\
0 & 0 & a_3b_3 & & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a_nb_n \\
\end{matrix}
\right] \\[2em]
&= \mathbf{A}\mathbf{B}
\end{align}
Thus, we have shown \(\mathbf{AB} = \mathbf{BA}\) for two diagonal matrices of the same size.
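A quick numerical check in R, using diag() to construct diagonal matrices (the diagonal entries are arbitrary):

A <- diag(c(1, 2, 3))          # a 3 x 3 diagonal matrix
B <- diag(c(4, 5, 6))          # another 3 x 3 diagonal matrix
all.equal(A %*% B, B %*% A)    # TRUE: diagonal matrices commute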
Finally, I leave it as an exercise for you to prove that matrix multiplication is associative (when the multiplication can be done). That is, if \(\mathbf{A}\mathbf{B}\mathbf{C}\) can be calculated, then it can be calculated as either \(\left(\mathbf{A}\mathbf{B}\right)\mathbf{C}\) or as \(\mathbf{A}\left(\mathbf{B}\mathbf{C}\right)\).
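If you would like a numerical check before attempting that proof, here is a small R sketch with arbitrary commensurate matrices:

A <- matrix(c(1, 2, 3, 4), nrow = 2)           # 2 x 2
B <- matrix(c(5, 6, 7, 8, 9, 10), nrow = 2)    # 2 x 3
C <- matrix(1:12, nrow = 3)                    # 3 x 4
all.equal((A %*% B) %*% C, A %*% (B %*% C))    # TRUE (up to floating-point rounding)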


