© 2015-2016 Jacob Ström, Kalle Åström, and Tomas Akenine-Möller

# Chapter 6: The Matrix

Enter the matrix.

Matrices are a very powerful tool for manipulating data. As can be seen in the example shown in Interactive Illustration 6.1, matrices can be used to transform images in different ways. After the theory has been presented, the text will connect back to this example.
Interactive Illustration 6.1: Each pixel (short for "picture element") of an image consists of three components, namely, a red ($r$), green ($g$), and a blue ($b$) component. Hence, each pixel can be thought of as a (column) vector, $\vc{p} = \left( \begin{smallmatrix} r \\ g \\ b \end{smallmatrix} \right)$. Below the caption, there is a matrix, which is denoted by $\mx{M}$, and it is simply $3\times 3$ values. Depending on the numbers there, you will get a different result on the image to the right above. The matrix, $\mx{M}$, is applied to each pixel's color component vector, $\vc{p}$, of the image to the left. The reader may want to try the following matrices $\left( \begin{smallmatrix} 1 & 1 & 1 \\ 0 & 0 & 0\\ 0 & 0 & 0 \end{smallmatrix} \right)$, $\left( \begin{smallmatrix} 0.3 & 0.6 & 0.1 \\ 0.3 & 0.6 & 0.1\\ 0.3 & 0.6 & 0.1 \end{smallmatrix} \right)$, and $\left( \begin{smallmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{smallmatrix} \right)$, as well as matrices of your own.
In Section 6.10, we will connect back to this introductory example. Now, let us start with a definition of the matrix, and then see how they can be used.

As we saw in Chapter 5, a typical linear system of equations can look like
 $$\begin{cases} \begin{array}{rcrcrcl} -2x_1 & + & 4x_2 & - & 2x_3 & = & 16, \\ -x_1 & - & 7x_2 & + & 2x_3 & = & -27, \\ & & 3x_2 & - & 6x_3 & = & -21. \\ \end{array} \end{cases}$$ (6.1)
To the left of the equal signs, there are a number of constants being multiplied by either $x_1$, $x_2$, or $x_3$. All these constants can be taken out and inserted into a structure called a matrix. For Equation (6.1) this results in
 $$\left(\begin{array}{rrr} -2 & 4 & -2 \\ -1 & -7 & 2 \\ 0 & 3 & -6 \end{array}\right) .$$ (6.2)
As can be seen, there are large parentheses surrounding the array of numbers. The size of the matrix above is $3\times 3$, i.e., three rows and three columns. However, in general, a matrix can have any size. This leads to the following definition.

Definition 6.1: Matrix
A matrix, $\mx{A}$, is a two-dimensional array of scalars, $a_{ij}$, with $r$ rows and $c$ columns, e.g.,
 $$\left( \begin{array}{cccc} a_{11} & a_{12} & \dots & a_{1c} \\ a_{21} & a_{22} & \dots & a_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ a_{r1} & a_{r2} & \dots & a_{rc} \end{array} \right).$$ (6.3)
The size of the matrix is $r \times c$, i.e., the number of rows times the number of columns. The matrix is called square if $r=c$. A short-hand notation for the elements in the matrix $\mx{A}$ is $[ a_{ij} ]$, which is convenient when dealing with operations on matrices, as we will see.
Note that the notation for a matrix is upper-case bold letters, e.g., $\mx{A}$, and as usual, all scalars are lower-case italic letters, $a_{ij}$, where $i$ is the row and $j$ is the column of the matrix element. Sometimes, it is convenient to extract either a particular column of scalars, or a particular row. Note that for an $r \times c$ matrix, $\mx{A}$, there are $r$ different row vectors and $c$ different column vectors. The column vectors for a matrix, $\mx{A}$, are
 \begin{align} \mx{A}=& \left( \begin{array}{rrr} -2 & 4 & -2 \\ -1 & -7 & 2 \\ 0 & 3 & -6 \end{array} \right) = \left( \begin{array}{ccc} \vert & \vert & \vert \\ \vc{a}_{,1} & \vc{a}_{,2} & \vc{a}_{,3} \\ \vert & \vert & \vert \end{array} \right), \\ &\\ &\\ &\text{where } \vc{a}_{,1} = \left( \begin{array}{rrr} -2 \\ -1\\ 0 \end{array} \right), \ \ \vc{a}_{,2} = \left( \begin{array}{rrr} 4 \\ -7\\ 3 \end{array} \right), \ \ \text{and } \vc{a}_{,3} = \left( \begin{array}{rrr} -2 \\ 2\\ -6 \end{array} \right). \end{align} (6.4)
Note that the vertical lines ($\vert$) are there to illustrate that $\vc{a}_{,i}$ are column vectors that extend both upwards and downwards in the matrix. The corresponding row vectors are
 \begin{align} \mx{A}=& \left( \begin{array}{rrr} -2 & 4 & -2 \\ -1 & -7 & 2 \\ 0 & 3 & -6 \end{array} \right) = \left( \begin{array}{c} -\,\,\, \vc{a}_{1,}^\T - \\ -\,\,\, \vc{a}_{2,}^\T - \\ -\,\,\, \vc{a}_{3,}^\T - \end{array} \right), \\ &\text{where } \vc{a}_{1,} = \left( \begin{array}{rrr} -2 \\ 4\\ -2 \end{array} \right), \ \ \vc{a}_{2,} = \left( \begin{array}{rrr} -1 \\ -7\\ 2 \end{array} \right), \ \ \text{and } \vc{a}_{3,} = \left( \begin{array}{rrr} 0 \\ 3\\ -6 \end{array} \right). \end{align} (6.5)
Recall that vectors are by default column vectors in this book, and therefore, we have transposed the $\vc{a}_{i,}$ vectors above, in order to turn them into row vectors. The horizontal lines ($-$) are there to illustrate that $\vc{a}_{i,}^\T$ are row vectors that extend both to the left and to the right in the matrix. As can be seen, this is consistent with our notation for vectors, which use lower-case bold letters. This notation is summarized in the following definition.

Definition 6.2: Row and Column Vectors from a Matrix
The $i$:th row vector of an $r \times c$ matrix, $\mx{A}$, is denoted by $\vc{a}_{i,}^\T$, and it has $c$ scalar elements in it.
The $i$:th column vector of $\mx{A}$ is denoted by $\vc{a}_{,i}$, which has $r$ scalar elements in it. Using vectors, a matrix can thus be written in the following two ways,
 $$\mx{A} = \bigl(\vc{a}_{,1} \,\,\, \vc{a}_{,2} \,\,\,\dots\,\,\, \vc{a}_{,c}\bigr) = \left( \begin{array}{c} \vc{a}_{1,}^\T\\ \vc{a}_{2,}^\T\\ \vdots \\ \vc{a}_{r,}^\T\\ \end{array} \right).$$ (6.6)
In the definition above, we have omitted the vertical and horizontal lines (as used in (6.4) and (6.5)) in Equation (6.6). Note that the row vector is denoted $\vc{a}_{i,}^\T$, i.e., it is a column vector ($\vc{a}_{i,}$), which has been transposed into a row vector.
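Although the book itself contains no code, the row/column notation of Definition 6.2 is easy to mirror in a short program. The following is a minimal sketch in plain Python (nested lists, function names of our own choosing) that extracts the $i$:th row vector and $i$:th column vector of the matrix $\mx{A}$ from (6.4) and (6.5).

```python
# The matrix A from (6.4)/(6.5), stored as a list of row lists.
A = [[-2,  4, -2],
     [-1, -7,  2],
     [ 0,  3, -6]]

def row(A, i):
    """Return the i:th row vector a_i, (1-indexed, as in the text)."""
    return A[i - 1]

def column(A, i):
    """Return the i:th column vector a_,i (1-indexed, as in the text)."""
    return [r[i - 1] for r in A]

print(row(A, 1))     # [-2, 4, -2]
print(column(A, 1))  # [-2, -1, 0]
```

Note that a $3\times 3$ matrix stored this way has exactly the three row vectors and three column vectors discussed above.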

There are also two special constant matrices called the identity matrix, which is denoted by $\mx{I}$, and the zero matrix, which is denoted by $\mx{O}$. The former plays a role which is similar to the number 1 in plain algebra, and the latter plays the role of the 0.

Definition 6.3: Identity Matrix
An identity matrix, $\mx{I}$, of size $n \times n$ has zeroes everywhere except in the diagonal that goes from the upper left to the lower right, where there are ones, i.e.,
 $$\mx{I} = \left( \begin{array}{cccc} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{array} \right).$$ (6.7)
Hence, a $2 \times 2$ identity matrix is $\mx{I} =\bigl( \begin{smallmatrix} 1 & 0\\ 0 & 1\end{smallmatrix} \bigr)$, and a $3\times 3$ identity matrix is $\mx{I} =\Bigl( \begin{smallmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{smallmatrix} \Bigr)$. As can be seen, we have used $\mx{I}$ for both these matrices. Next follows the definition of the zero matrix.

Definition 6.4: Zero Matrix
A zero matrix, $\mx{O}$, has all its matrix elements equal to zero.
In most cases, the size of $\mx{I}$ and $\mx{O}$ can be determined from the context in which they are used, and otherwise, we will mention what the size is.
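Definitions 6.3 and 6.4 can likewise be sketched in a few lines of plain Python (nested lists; the function names are our own). The identity matrix has ones on the main diagonal and zeroes elsewhere, and the zero matrix has zeroes everywhere.

```python
def identity(n):
    # 1 on the main diagonal (upper left to lower right), 0 elsewhere.
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def zero(r, c):
    # An r x c matrix with every element equal to zero.
    return [[0] * c for _ in range(r)]

print(identity(3))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(zero(2, 3))   # [[0, 0, 0], [0, 0, 0]]
```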

Note also that if the number of columns of a matrix, $\mx{A}$, is 1, i.e., $c=1$, then we have a column vector, and we may denote it as a vector instead, i.e., $\vc{a}$. Furthermore, if the number of rows is 1, i.e., $r=1$, in a matrix, $\mx{B}$, then we have a row vector, and we may denote it by a transposed column vector instead, i.e., $\vc{b}^\T$. Below, we show a $3\times 1$ matrix (column vector) and a $1\times 3$ matrix (row vector), which is written out as a transposed column vector.
 $$\underbrace{ \mx{A} }_{3\times 1} = \left( \begin{array}{c} 3 \\ 2 \\ 6 \end{array} \right) = \underbrace{\vc{a}}_{\begin{array}{c} \text{column} \\ \text{vector} \end{array}} ,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \underbrace{ \mx{B} }_{1\times 3} = \bigl(5\,\,\, 1 \,\,\, 4\bigr) = \underbrace{ \vc{b}^\T }_{\begin{array}{c} \text{transposed} \\ \text{column} \\ \text{vector} \end{array}}$$ (6.8)
It should be pointed out that sometimes it is more natural to use row vectors just as a row instead of a transposed column. This is really up to the reader.

As we have already seen in Chapter 2, vectors can be transposed. This means that a column vector becomes a row vector and vice versa. A matrix can also be transposed as defined below.

Definition 6.5: Matrix Transpose
The transpose of an $r\times c$ matrix, $\mx{A}=[a_{ij}]$, is denoted by $\mx{A}^\T$ (of size $c\times r$) and is formed by making the columns of $\mx{A}$ into rows in $\mx{A}^\T$ (or rows into columns, which is equivalent). This can also be expressed using the shorthand notation for a matrix as
 $$\mx{A}^\T = [a_{ji}].$$ (6.9)
Note that the order of the indices has changed from $ij$ to $ji$.
A square matrix is said to be symmetric if reflecting it along the main diagonal (from the upper left down to the lower right) leaves it unchanged. This is summarized in the following definition.

Definition 6.6: Symmetric Matrix
A square matrix is called symmetric if $\mx{A}=\mx{A}^\T$.
Next follow some examples of matrix transposes.

Example 6.1: Matrix Transposes
Assume we have the following matrices,
 $$\mx{A}= \left( \begin{array}{rrr} 1 & 6 & 5 \\ 6 & 2 & 4 \\ 5 & 4 & 3 \end{array} \right) ,\spc\spc \mx{B}= \left( \begin{array}{rr} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{array} \right) ,\spc\spc \mx{C}=\bigl(1\,\,\, 2\,\,\, 3 \bigr).$$ (6.10)
Their corresponding transposes are
 $$\mx{A}^\T= \left( \begin{array}{rrr} 1 & 6 & 5 \\ 6 & 2 & 4 \\ 5 & 4 & 3 \end{array} \right) ,\spc\spc \mx{B}^\T= \left( \begin{array}{rrr} 1 & 2 & 3 \\ 4 & 5 & 6 \end{array} \right) ,\spc\spc \mx{C}^\T= \left( \begin{array}{c} 1 \\ 2 \\ 3 \end{array} \right)$$ (6.11)
Note that $\mx{A}=\mx{A}^\T$, which means that $\mx{A}$ is symmetric (Definition 6.6). It is also worth noting that the size of $\mx{B}$ is $3\times 2$, while the size of $\mx{B}^\T$ is $2\times 3$, which makes sense, since the rows turn into columns when transposing. Finally, $\mx{C}$ is a single row, which turns into a single column in $\mx{C}^\T$. This is similar to how a transposed column vector becomes a row vector.
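Definition 6.5 and the symmetry test of Definition 6.6 can be sketched in plain Python as follows (nested lists; names of our own choosing), using the matrices $\mx{A}$ and $\mx{B}$ of Example 6.1.

```python
def transpose(A):
    # Element (i, j) of A^T is element (j, i) of A, so an r x c
    # matrix becomes a c x r matrix.
    return [[A[j][i] for j in range(len(A))] for i in range(len(A[0]))]

def is_symmetric(A):
    # A must be square and equal to its own transpose.
    return len(A) == len(A[0]) and transpose(A) == A

A = [[1, 6, 5], [6, 2, 4], [5, 4, 3]]
B = [[1, 4], [2, 5], [3, 6]]

print(transpose(B))     # [[1, 2, 3], [4, 5, 6]]
print(is_symmetric(A))  # True
```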
With these definitions in place, it is time to attempt to visualize a matrix with geometry. This is not done in any books that we have seen; however, it makes for a deeper understanding in some cases. See Interactive Illustration 6.2.
Interactive Illustration 6.2: In this interactive illustration, we visualize a $2\times 2$ matrix, $\mx{A}$, as the two column vectors, $\vc{a}_{,1}$ and $\vc{a}_{,2}$, that it consists of, i.e., $\mx{A} = \bigl(\textcolor{#aa0000}{\vc{a}_{,1}}\,\, \textcolor{#00aa00}{\vc{a}_{,2}} \bigr)$. Note that the column vectors can be moved in this illustration. As an exercise, see if you can create the identity matrix, $\bigl( \begin{smallmatrix} 1 & 0\\ 0 & 1 \end{smallmatrix} \bigr)$, for example. Now, press Forward to see what a $3\times 3$ matrix may look like.
Interactive Illustration 6.2: Here, we visualize the three column vectors, $\hid{\textcolor{#aa0000}{\vc{a}_{,1}}}$, $\hid{\textcolor{#00aa00}{\vc{a}_{,2}}}$, and $\hid{\textcolor{#0000aa}{\vc{a}_{,3}}}$ of a $\hid{3\times 3}$ matrix, $\hid{\mx{A}}$. Note that the gray, dashed lines are only in this illustration to help reveal the three-dimensional locations of the vectors. Again, as an exercise, it may be useful to attempt to move the vectors so that they form the identity matrix, $\hid{\Bigl( \begin{smallmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{smallmatrix} \Bigr)}$, for example. Recall that you can change the viewpoint by right-clicking, pressing and moving the mouse, or slide with two fingers pressed on a tablet.
Next, a handful of matrix operations are presented.

There are three fundamental operations on matrices. These are
• matrix multiplication by a scalar,
• matrix addition,
• matrix-matrix multiplication.
These are presented in the following subsections.

#### 6.3.1 Matrix Multiplication by a Scalar

Matrix multiplication by a scalar is quite similar to vector multiplication by a scalar (Section 2.3), as can be seen in the following definition.

Definition 6.7: Matrix Multiplication by a Scalar
A matrix $\mx{A}$ can be multiplied by a scalar $k$ to form a new matrix $\mx{S} = k \mx{A}$, which is of the same size as $\mx{A}$.
 $$\mx{S}= \left( \begin{array}{cccc} s_{11} & s_{12} & \dots & s_{1c} \\ s_{21} & s_{22} & \dots & s_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ s_{r1} & s_{r2} & \dots & s_{rc} \end{array} \right) = \left( \begin{array}{cccc} k a_{11} & k a_{12} & \dots & k a_{1c} \\ k a_{21} & k a_{22} & \dots & k a_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ k a_{r1} & k a_{r2} & \dots & k a_{rc} \end{array} \right)$$ (6.12)
This is more compactly expressed as: $[ s_{ij} ] = k[ a_{ij} ] = [ k a_{ij} ]$.
A short example of matrix multiplication by a scalar follows.

Example 6.2: Matrix Multiplication by a Scalar
A $2\times 2$ matrix $\mx{A}$ is
 $$\mx{A}= \left( \begin{array}{rr} 5 & -2 \\ 3 & 8 \end{array} \right).$$ (6.13)
If we want to multiply this matrix by a scalar, $k=4$, we get
 $$k\mx{A}=4 \left( \begin{array}{rr} 5 & -2 \\ 3 & 8 \end{array} \right) = \left( \begin{array}{rr} 4\cdot 5 & 4\cdot (-2) \\ 4\cdot 3 & 4\cdot 8 \end{array} \right) = \left( \begin{array}{rr} 20 & -8 \\ 12 & 32 \end{array} \right).$$ (6.14)
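Since Definition 6.7 is simply $s_{ij} = k a_{ij}$, it can be sketched in one line of plain Python (nested lists; the function name is our own), reproducing Example 6.2 with $k=4$.

```python
def scale(k, A):
    # Multiply every element of A by the scalar k.
    return [[k * a for a in row] for row in A]

A = [[5, -2], [3, 8]]
print(scale(4, A))  # [[20, -8], [12, 32]]
```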

#### 6.3.2 Matrix Addition

Definition 6.8: Matrix Addition
If two matrices $\mx{A}$ and $\mx{B}$ have the same size, then the two matrices can be added to form a new matrix, $\mx{S}=\mx{A} + \mx{B}$, of the same size, where each element $s_{ij}$ is the sum of the elements in the same position in $\mx{A}$ and $\mx{B}$, i.e.,
 $$\mx{S}= \left( \begin{array}{cccc} s_{11} & s_{12} & \dots & s_{1c} \\ s_{21} & s_{22} & \dots & s_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ s_{r1} & s_{r2} & \dots & s_{rc} \end{array} \right) = \left( \begin{array}{cccc} a_{11}+b_{11} & a_{12}+b_{12} & \dots & a_{1c}+b_{1c} \\ a_{21}+b_{21} & a_{22}+b_{22} & \dots & a_{2c}+b_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ a_{r1}+b_{r1} & a_{r2}+b_{r2} & \dots & a_{rc}+b_{rc} \end{array} \right).$$ (6.15)
This is more compactly expressed as: $[ s_{ij} ] = [ a_{ij} ] + [ b_{ij} ] = [ a_{ij} + b_{ij} ]$.
With the help from Definition 6.7 (matrix multiplication by a scalar), and with the definition of matrix addition, we can easily subtract two matrices. The difference $\mx{D}$ between $\mx{A}$ and $\mx{B}$ becomes
 \begin{gather} \mx{D} = \mx{A} + (-1)\mx{B} = \mx{A} - \mx{B} \\ \Longleftrightarrow \\ [d_{ij}] = [a_{ij}]+ (-1)[b_{ij}] = [a_{ij}]+ [-b_{ij}] = [a_{ij} - b_{ij}], \end{gather} (6.16)
where, on the first row, we have used scalar multiplication (by $-1$) and matrix addition.

A short example on matrix addition follows.

Example 6.3: Matrix Addition
Assume we have two $2\times 2$ matrices, $\mx{A}$ and $\mx{B}$, which are set as
 $$\mx{A}= \left( \begin{array}{rr} 5 & -2 \\ 3 & 8 \end{array} \right) \,\,\,\,\,\,\text{and}\,\,\,\,\,\, \mx{B}= \left( \begin{array}{rr} -1 & 2 \\ 4 & -6 \end{array} \right).$$ (6.17)
The matrix addition, $\mx{S}=\mx{A}+\mx{B}$, is
 $$\mx{S}=\mx{A}+\mx{B}= \left( \begin{array}{rr} 5 & -2 \\ 3 & 8 \end{array} \right) + \left( \begin{array}{rr} -1 & 2 \\ 4 & -6 \end{array} \right) = \left( \begin{array}{rr} 5-1 & -2+2 \\ 3+4 & 8-6 \end{array} \right) = \left( \begin{array}{rr} 4 & 0 \\ 7 & 2 \end{array} \right).$$ (6.18)
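Matrix addition, $s_{ij} = a_{ij} + b_{ij}$, and subtraction via $(-1)\mx{B}$ as in (6.16), can be sketched in plain Python as follows (nested lists; names of our own choosing), reproducing the addition in (6.18).

```python
def add(A, B):
    # Element-wise sum; A and B must have the same size.
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    # A - B is A + (-1) * B, exactly as in (6.16).
    return add(A, [[-b for b in row] for row in B])

A = [[5, -2], [3, 8]]
B = [[-1, 2], [4, -6]]
print(add(A, B))  # [[4, 0], [7, 2]]
print(sub(A, B))  # [[6, -4], [-1, 14]]
```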

#### 6.3.3 Matrix-Matrix Multiplication

While matrix multiplication by a scalar and matrix addition are rather straightforward, the matrix-matrix multiplication may not be so at first. However, as we will see, it is an extremely powerful tool. The definition is below.

Definition 6.9: Matrix-Matrix Multiplication
If $\mx{A}$ is an $r \times s$ matrix and $\mx{B}$ is an $s\times t$ matrix, then the product matrix $\mx{P}=\mx{A}\mx{B}$, which is an $r \times t$ matrix, is defined as
 \begin{align} \mx{P} =& \mx{A}\mx{B} = \left( \begin{array}{ccc} a_{11} & \dots & a_{1s} \\ \vdots & \ddots & \vdots \\ a_{r1} & \dots & a_{rs} \end{array} \right) \left( \begin{array}{ccc} b_{11} & \dots & b_{1t} \\ \vdots & \ddots & \vdots \\ b_{s1} & \dots & b_{st} \end{array} \right)\\ &\\ =& \left( \begin{array}{ccc} \sum_{k=1}^s a_{1k} b_{k1} & \dots & \sum_{k=1}^s a_{1k} b_{kt} \\ \vdots & \ddots & \vdots \\ \sum_{k=1}^s a_{rk} b_{k1} & \dots & \sum_{k=1}^s a_{rk} b_{kt} \end{array} \right) = \left( \begin{array}{ccc} p_{11} & \dots & p_{1t} \\ \vdots & \ddots & \vdots \\ p_{r1} & \dots & p_{rt} \end{array} \right). \end{align} (6.19)
Note that there must be as many columns in $\mx{A}$ as there are rows in $\mx{B}$, otherwise the matrix-matrix multiplication is not defined. The matrix-matrix multiplication is more compactly expressed as $\bigl[p_{ij}\bigr] = \Bigl[\sum_{k=1}^s a_{ik} b_{kj}\Bigr]$.
The size of the product may be remembered more easily with this rule,
 $$(r \times \bcancel{s})\, (\bcancel{s} \times t) \longrightarrow (r \times t),$$ (6.20)
i.e., only the row size ($r$) of the first operand and the column size ($t$) of the second operand remains, and the column size ($s$) of the first operand and the row size ($s$) of the second operand must be equal.

Note that the sum in the product, $\sum_{k=1}^s a_{ik} b_{kj}$, reminds us of a dot product in an orthonormal basis (Definition 3.6). Hence, by using Definition 6.2, we can express the matrix-matrix multiplication in terms of the row vectors of $\mx{A}$ and the column vectors of $\mx{B}$ as
 \begin{align} \mx{P} = \mx{A}\mx{B} &= \left( \begin{array}{c} -\,\,\, \vc{a}_{1,}^\T - \\ \textcolor{#cc0000}{-}\,\,\, \textcolor{#cc0000}{\vc{a}_{2,}^\T} \textcolor{#cc0000}{-} \\ \vdots \\ -\,\,\, \vc{a}_{r,}^\T - \end{array} \right) \left( \begin{array}{ccccc} \vert & \vert & \textcolor{#cc0000}{\vert} & & \vert \\ \vc{b}_{,1} & \vc{b}_{,2} & \textcolor{#cc0000}{\vc{b}_{,3}} & \dots & \vc{b}_{,t} \\ \vert & \vert & \textcolor{#cc0000}{\vert} & & \vert \end{array} \right) \\ &\\ &= \left( \begin{array}{ccccc} \vc{a}_{1,}\cdot \vc{b}_{,1} & \vc{a}_{1,}\cdot \vc{b}_{,2} & \vc{a}_{1,}\cdot \vc{b}_{,3} & \dots & \vc{a}_{1,}\cdot \vc{b}_{,t} \\ \vc{a}_{2,}\cdot \vc{b}_{,1} & \vc{a}_{2,}\cdot \vc{b}_{,2} & \textcolor{#cc0000}{\vc{a}_{2,}\cdot \vc{b}_{,3}} & \dots & \vc{a}_{2,}\cdot \vc{b}_{,t} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \vc{a}_{r,}\cdot \vc{b}_{,1} & \vc{a}_{r,}\cdot \vc{b}_{,2} & \vc{a}_{r,}\cdot \vc{b}_{,3} & \dots & \vc{a}_{r,}\cdot \vc{b}_{,t} \\ \end{array} \right). \end{align} (6.21)
As can be seen on the red vectors above, you can produce a matrix element on the $i$:th row and $j$:th column in $\mx{P}$ by taking the dot product between the $i$:th row vector in $\mx{A}$ and the $j$:th column vector in $\mx{B}$, i.e.,
 $$[p_{ij}] = \Biggl[\sum_{k=1}^s a_{ik} b_{kj}\Biggr] = \bigl[ \vc{a}_{i,} \cdot \vc{b}_{,j} \bigr],$$ (6.22)
in shorthand notation. Note that if we relax the notion of the matrix, so that $\vc{a}_{i,}^\T$ is considered as a $1\times s$ matrix, and $\vc{b}_{,j}$ is an $s\times 1$ matrix, then we can rewrite Equation (6.22) as a matrix-matrix multiplication, that is,
 $$[p_{ij}] = \Biggl[ \sum_{k=1}^s a_{ik} b_{kj} \Biggr] = \biggl[ \vc{a}_{i,}^\T \vc{b}_{,j} \biggr] = \Biggl[ \Bigl(a_1\spc a_2 \dots a_s \Bigr) \left( \begin{array}{c} b_1\\ b_2\\ \vdots\\ b_s \end{array} \right) \Biggr].$$ (6.23)

Example 6.4: Matrix-Matrix Multiplication
In this example, we will multiply a $3 \times 2$ matrix, $\mx{M}$, by a $2\times 3$ matrix, $\mx{N}$, i.e.,
 \begin{gather} \mx{M} = \left( \begin{array}{rr} 4 & 2 \\ 3 & -2 \\ 0 & -1 \end{array} \right) ,\,\,\,\,\,\, \mx{N} = \left( \begin{array}{rrr} 2 & 1 & 3 \\ -1 & 5 & 8 \\ \end{array} \right) \\ \, \\ \mx{M}\mx{N} = \left( \begin{array}{rr} \textcolor{#00aaaa}{4} & \textcolor{#00aaaa}{2} \\ \textcolor{#aaaa00}{3} & \textcolor{#aaaa00}{-2} \\ \textcolor{#aa00aa}{0} & \textcolor{#aa00aa}{-1} \end{array} \right) \left( \begin{array}{rrr} \textcolor{#cc0000}{2} & \textcolor{#00cc00}{1} & \textcolor{#0000cc}{3} \\ \textcolor{#cc0000}{-1} & \textcolor{#00cc00}{5} & \textcolor{#0000cc}{8} \\ \end{array} \right) = \\ \, \\ \left( \begin{array}{rrr} \textcolor{#00aaaa}{4} \cdot \textcolor{#cc0000}{2} + \textcolor{#00aaaa}{2}\cdot (\textcolor{#cc0000}{-1}) & \textcolor{#00aaaa}{4} \cdot \textcolor{#00cc00}{1} + \textcolor{#00aaaa}{2}\cdot \textcolor{#00cc00}{5} & \textcolor{#00aaaa}{4} \cdot \textcolor{#0000cc}{3} + \textcolor{#00aaaa}{2}\cdot \textcolor{#0000cc}{8} \\ \textcolor{#aaaa00}{3} \cdot \textcolor{#cc0000}{2} \textcolor{#aaaa00}{-2}\cdot (\textcolor{#cc0000}{-1}) & \textcolor{#aaaa00}{3} \cdot \textcolor{#00cc00}{1} \textcolor{#aaaa00}{-2}\cdot \textcolor{#00cc00}{5} & \textcolor{#aaaa00}{3} \cdot \textcolor{#0000cc}{3} \textcolor{#aaaa00}{-2}\cdot \textcolor{#0000cc}{8} \\ \textcolor{#aa00aa}{0} \cdot \textcolor{#cc0000}{2} \textcolor{#aa00aa}{-1}\cdot (\textcolor{#cc0000}{-1}) & \textcolor{#aa00aa}{0} \cdot \textcolor{#00cc00}{1} \textcolor{#aa00aa}{-1}\cdot \textcolor{#00cc00}{5} & \textcolor{#aa00aa}{0} \cdot \textcolor{#0000cc}{3} \textcolor{#aa00aa}{-1}\cdot \textcolor{#0000cc}{8} \end{array} \right) = \\ \, \\ \left( \begin{array}{rrr} 6 & 14 & 28 \\ 8 & -7 & -7 \\ 1 & -5 & -8 \end{array} \right). \end{gather} (6.24)
Note that the rows in $\mx{M}$ and the columns in $\mx{N}$ have been color coded here to more easily see what is going on. The size of the product is (see Rule (6.20)): $(3 \times \bcancel{2})\, (\bcancel{2} \times 3) \longrightarrow (3 \times 3)$, i.e., the result is a $3\times 3$ matrix.
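Definition 6.9 translates directly into a triple loop, $p_{ij} = \sum_{k=1}^s a_{ik} b_{kj}$. The following is a minimal sketch in plain Python (nested lists; the function name is our own), checked against the product computed in Example 6.4.

```python
def matmul(A, B):
    # A is r x s, B is s x t; the product P is r x t.
    r, s, t = len(A), len(B), len(B[0])
    assert len(A[0]) == s, "columns of A must equal rows of B"
    # p_ij is the dot product of the i:th row of A and the j:th column of B.
    return [[sum(A[i][k] * B[k][j] for k in range(s)) for j in range(t)]
            for i in range(r)]

M = [[4, 2], [3, -2], [0, -1]]
N = [[2, 1, 3], [-1, 5, 8]]
print(matmul(M, N))  # [[6, 14, 28], [8, -7, -7], [1, -5, -8]]
```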
Per Definition 6.9, we know that the matrix-matrix multiplication, $\mx{M}\mx{N}$, is only defined if $\mx{M}$ is $r \times s$ and $\mx{N}$ is $s\times t$. This means that the number of columns ($s$) in $\mx{M}$ must be equal to the number of rows ($s$) in $\mx{N}$. Hence, $r$ and $t$ can be arbitrary values as long as they are $\geq 1$. If $t=1$, then $\mx{N}$ has only one column, and we do not really have a matrix, but rather a column vector, as exemplified in (6.8). This means that matrix-vector multiplication is a special case of the matrix-matrix multiplication. An example is given below.

Example 6.5: Matrix-Vector Multiplication
In this example, a $3\times 3$ matrix, $\mx{M}$, will be multiplied by a three-dimensional vector, $\vc{v}$, i.e.,
 \begin{gather} \mx{M} = \left( \begin{array}{rrr} 1 & 0 & 2 \\ 2 & -1 & 3 \\ 4 & -2 & -3 \end{array} \right) ,\,\,\,\,\,\, \vc{v} = \left( \begin{array}{r} -4\\ 5 \\ 6 \end{array} \right) \\ \, \\ \mx{M}\vc{v} = \left( \begin{array}{rrr} \textcolor{#00aaaa}{1} & \textcolor{#00aaaa}{0} & \textcolor{#00aaaa}{2} \\ \textcolor{#aaaa00}{2} & \textcolor{#aaaa00}{-1} & \textcolor{#aaaa00}{3} \\ \textcolor{#aa00aa}{4} & \textcolor{#aa00aa}{-2} & \textcolor{#aa00aa}{-3} \end{array} \right) \left( \begin{array}{r} \textcolor{#cc0000}{-4}\\ \textcolor{#00cc00}{5} \\ \textcolor{#0000cc}{6} \end{array} \right) = \left( \begin{array}{r} \textcolor{#00aaaa}{1} \cdot (\textcolor{#cc0000}{-4}) + \textcolor{#00aaaa}{0}\cdot \textcolor{#00cc00}{5} + \textcolor{#00aaaa}{2} \cdot \textcolor{#0000cc}{6}\\ \textcolor{#aaaa00}{2} \cdot (\textcolor{#cc0000}{-4}) \textcolor{#aaaa00}{- 1}\cdot \textcolor{#00cc00}{5} + \textcolor{#aaaa00}{3} \cdot \textcolor{#0000cc}{6}\\ \textcolor{#aa00aa}{4} \cdot (\textcolor{#cc0000}{-4}) \textcolor{#aa00aa}{- 2}\cdot \textcolor{#00cc00}{5} \textcolor{#aa00aa}{- 3} \cdot \textcolor{#0000cc}{6} \end{array} \right) = \left( \begin{array}{r} 8 \\ 5 \\ -44 \end{array} \right). \end{gather} (6.25)
Note that the rows in $\mx{M}$ and $\vc{v}$ have been color coded here to more easily see what is going on. The matrix-vector multiplication behaves exactly as the matrix-matrix multiplication, except here, the second operand $(\vc{v})$ has only one column. The size of the product is (see Rule (6.20)): $(3 \times \bcancel{3})\, (\bcancel{3} \times 1) \longrightarrow (3 \times 1)$, i.e., the result is a $3\times 1$ matrix, which is a three-dimensional column vector.
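The matrix-vector case ($t=1$) can be sketched even more compactly in plain Python (the function name is our own): each entry of the result is the dot product of one row of $\mx{M}$ with $\vc{v}$, reproducing Example 6.5.

```python
def matvec(M, v):
    # Each output entry is the dot product of a row of M with v.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

M = [[1, 0, 2], [2, -1, 3], [4, -2, -3]]
v = [-4, 5, 6]
print(matvec(M, v))  # [8, 5, -44]
```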
Note that the example in (6.1) is a linear system of equations that can be expressed using a matrix and two vectors, i.e.,
 \begin{gather} \begin{cases} \begin{array}{rcrcrcl} -2x_1 & + & 4x_2 & - & 2x_3 & = & 16 \\ -x_1 & - & 7x_2 & + & 2x_3 & = & -27 \\ & & 3x_2 & - & 6x_3 & = & -21 \\ \end{array} \end{cases} \\ \Longleftrightarrow \\ \underbrace{ \left( \begin{array}{rrr} -2 & 4 & -2 \\ -1 & -7 & 2 \\ 0 & 3 & -6 \end{array} \right) }_{\mx{A}} \, \underbrace{ \left( \begin{array}{c} x_1\\ x_2\\ x_3 \end{array} \right) }_{\vc{x}} = \underbrace{ \left( \begin{array}{r} 16\\ -27\\ -21 \end{array} \right) }_{\vc{b}} \\ \Longleftrightarrow \\ \mx{A} \vc{x} = \vc{b}. \end{gather} (6.26)
Now that we know this much, we can de-mystify where the definition of the matrix-matrix multiplication (Definition 6.9) really comes from. Let us assume that we have two systems of linear equations,
 \begin{gather} \begin{cases} \begin{array}{r} z_1 = a_{11} y_1 + a_{12} y_2 \\ z_2 = a_{21} y_1 + a_{22} y_2 \end{array} \end{cases} \,\,\,\,\,\,\,\,\text{and}\,\,\,\,\,\,\,\, \begin{cases} \begin{array}{r} y_1 = b_{11} x_1 + b_{12} x_2 \\ y_2 = b_{21} x_1 + b_{22} x_2 \end{array} \end{cases}. \end{gather} (6.27)
Now, what would the result be if we wanted to express $z_1$ and $z_2$ in terms of $x_1$ and $x_2$ instead of $y_1$ and $y_2$? This is done below as
 \begin{gather} \begin{cases} \begin{array}{r} z_1 = a_{11} (b_{11} x_1 + b_{12} x_2) + a_{12} (b_{21} x_1 + b_{22} x_2) \\ z_2 = a_{21} (b_{11} x_1 + b_{12} x_2) + a_{22} (b_{21} x_1 + b_{22} x_2) \end{array} \end{cases} \\ \Longleftrightarrow \\ \begin{cases} \begin{array}{r} z_1 = (a_{11} b_{11} + a_{12} b_{21}) x_1 + (a_{11} b_{12} + a_{12} b_{22}) x_2 \\ z_2 = (a_{21} b_{11} + a_{22} b_{21}) x_1 + (a_{21} b_{12} + a_{22} b_{22}) x_2 \end{array} \end{cases}. \end{gather} (6.28)
As can be seen, the coefficients in front of $x_1$ and $x_2$ are exactly the terms that we would get if we multiply the two matrices, $\mx{A}$ and $\mx{B}$. Both Equations (6.27) and (6.28) can be expressed in matrix/vector form as
 \begin{gather} \vc{z}=\mx{A}\vc{y} \spc\spc\text{and}\spc\spc \vc{y}=\mx{B}\vc{x} \\ \Longleftrightarrow \\ \vc{z}=\mx{A}\mx{B}\vc{x}, \end{gather} (6.29)
and this motivates why the matrix-matrix multiplication is defined the way it is.

As we saw in Equation (6.21), the first operand in the matrix-matrix multiplication, $\mx{A}\mx{B}$, can be seen as a set of row vectors, while the second operand can be seen as a set of column vectors. Now, we have just seen that a matrix times a vector produces a vector (if the sizes match). Hence, we can think of the second operand $(\mx{B})$ as a set of column vectors that are transformed by the first operand, namely, $\mx{A}$. This can be expressed as
 \begin{align} \mx{P} = \mx{A}\mx{B} = \mx{A} \left( \begin{array}{ccc} \vert & & \vert \\ \vc{b}_{,1} & \dots & \vc{b}_{,t} \\ \vert & & \vert \end{array} \right) = \left( \begin{array}{ccc} \vert & & \vert \\ \mx{A}\vc{b}_{,1} & \dots & \mx{A}\vc{b}_{,t} \\ \vert & & \vert \end{array} \right). \end{align} (6.30)
That is, we can see the matrix-matrix multiplication like this: we start with $t$ column vectors, $\vc{b}_{,1}\dots \vc{b}_{,t}$ (in this case), and each of these are transformed by $\mx{A}$, and inserted as column vectors in the product matrix.
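The motivation in (6.29) can be checked numerically: applying $\mx{B}$ and then $\mx{A}$ to a vector $\vc{x}$ gives the same result as applying the single matrix $\mx{A}\mx{B}$. The following plain-Python sketch uses arbitrary illustration values of our own choosing (the function names are not from the text).

```python
def matmul(A, B):
    # p_ij = sum_k a_ik * b_kj (Definition 6.9).
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, x):
    # Matrix-vector product: one dot product per row of A.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
x = [1, -1]

z_two_steps = matvec(A, matvec(B, x))  # z = A(Bx)
z_one_step = matvec(matmul(A, B), x)   # z = (AB)x
print(z_two_steps == z_one_step)       # True
```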

Next, we present a simple example where the dot product is expressed as matrix-matrix multiplication.

Example 6.6: Dot Product as Matrix-Matrix Multiplication
Note that since a (column) vector, $\vc{v}$, can be written as a matrix with a single column, $\mx{V}=(\vc{v})$, one can also express the dot product between two vectors, $\vc{u}$ and $\vc{v}$, as a matrix-matrix multiplication between a $1\times n$ matrix, $\vc{u}^\T$, and an $n\times 1$ matrix, $\vc{v}$, i.e.,
 \begin{gather} \vc{u} \cdot \vc{v} = \vc{u}^\T \vc{v} = \begin{pmatrix} u_1 & u_2 & \dots & u_n \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = \sum_{i=1}^n u_i v_i. \end{gather} (6.31)
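Equation (6.31) amounts to a single sum, which can be sketched in one line of plain Python (the function name is our own).

```python
def dot(u, v):
    # u . v = u^T v = sum_i u_i * v_i, as in (6.31).
    return sum(ui * vi for ui, vi in zip(u, v))

u = [1, 2, 3]
v = [4, -5, 6]
print(dot(u, v))  # 4 - 10 + 18 = 12
```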

Example 6.7: A 2000-Year-Old Chinese Problem
Linear systems of equations were studied in the classic Chinese textbook Nine Chapters on the Mathematical Art, compiled during the first two centuries BCE. Chapter 8 of the book, called The Rectangular Array, contains several problems that are systems of linear equations. One of them (problem 17) is:

"Now given 5 sheep, 4 dogs, 3 hens and 2 rabbits cost 1496 coins in total; 4 sheep, 2 dogs, 6 hens and 3 rabbits cost 1175 coins; 3 sheep, 1 dog, 7 hens and 5 rabbits cost 958 coins; 2 sheep, 3 dogs, 5 hens and 1 rabbit cost 861 coins. Tell: how much is each of them?"

Introduce a variable for the cost of each type of animal, so that each sheep costs $x_1$ coins, each dog $x_2$ coins, each hen $x_3$ coins, and each rabbit $x_4$ coins. Then the four statements can be written as
 $$\begin{cases} \begin{array}{rcrcrcrcl} 5 x_1 & + & 4 x_2 & + & 3 x_3 & + & 2 x_4 & = & 1496, \\ 4 x_1 & + & 2 x_2 & + & 6 x_3 & + & 3 x_4 & = & 1175, \\ 3 x_1 & + & x_2 & + & 7 x_3 & + & 5 x_4 & = & 958, \\ 2 x_1 & + & 3 x_2 & + & 5 x_3 & + & x_4 & = & 861. \\ \end{array} \end{cases}$$ (6.32)
Note that this can be rewritten in matrix-vector form as
 \begin{gather} \underbrace{ \begin{pmatrix} 5 & 4 & 3 & 2 \\ 4 & 2 & 6 & 3 \\ 3 & 1 & 7 & 5 \\ 2 & 3 & 5 & 1 \\ \end{pmatrix} }_{\mx{A}} \underbrace{ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} }_{\vc{x}} = \underbrace{ \begin{pmatrix} 1496 \\ 1175 \\ 958 \\ 861 \end{pmatrix} }_{\vc{y}} \\ \Longleftrightarrow \\ \mx{A}\vc{x} = \vc{y}. \end{gather} (6.33)
This can be solved to recover $x_1$, $x_2$, $x_3$, and $x_4$. Solving such systems of equations is the topic of Chapter 5, which is about Gaussian elimination, and it was also the topic of Chapter 8 of the Chinese textbook.

Next, we will also solve this system of equations using Gaussian elimination. However, to save some space, we do not write out the unknowns, and we write the left and right sides inside one big parenthesis with a vertical line in between. This results in
 \begin{align} \left( \begin{array}{rrrr|r} 5 & 4 & 3 & 2 & 1496 \\ 4 & 2 & 6 & 3 & 1175 \\ 3 & 1 & 7 & 5 & 958 \\ 2 & 3 & 5 & 1 & 861 \\ \end{array} \right). \end{align} (6.34)
This notation is also used in Example 6.11. The rules of Gaussian elimination (Theorem 5.2) can still be applied, since we have only changed the notation. For example, we can multiply row 2 by 5, subtract row 1 multiplied by 4, and put the result in row 2. This gives us
 \begin{align} \left( \begin{array}{rrrr|r} 5 & 4 & 3 & 2 & 1496 \\ 0 & -6 & 18 & 7 & -109 \\ 3 & 1 & 7 & 5 & 958 \\ 2 & 3 & 5 & 1 & 861 \\ \end{array} \right). \end{align} (6.35)
Next, we want to get rid of the 3 and the 2 in the left column. We multiply row 3 by 2, subtract row 4 multiplied by 3, and put the result in row 4. At the same time, we multiply row 1 by 3, subtract row 3 multiplied by 5, and put the resulting row in row 3, which gives us
 \begin{align} \left( \begin{array}{rrrr|r} 5 & 4 & 3 & 2 & 1496 \\ 0 & -6 & 18 & 7 & -109 \\ 0 & 7 & -26 & -19 & -302 \\ 0 & -7 & -1 & 7 & -667 \\ \end{array} \right). \end{align} (6.36)
The next step is to eliminate the 7 and the $-7$ in column 2. First, we simply add row 3 and row 4 and put the result in row 4. Second, we multiply row 2 by 7, add row 3 multiplied by 6, and put the resulting row in row 3, which gives us
 \begin{align} \left( \begin{array}{rrrr|r} 5 & 4 & 3 & 2 & 1496 \\ 0 & -6 & 18 & 7 & -109 \\ 0 & 0 & -30 & -65 & -2575 \\ 0 & 0 & -27 & -12 & -969 \\ \end{array} \right). \end{align} (6.37)
Finally, we multiply row 3 by 27, subtract row 4 multiplied by 30, and put the result in row 4. This results in
 \begin{align} \left( \begin{array}{rrrr|r} 5 & 4 & 3 & 2 & 1496 \\ 0 & -6 & 18 & 7 & -109 \\ 0 & 0 & -30 & -65 & -2575 \\ 0 & 0 & 0 & -1395 & -40455 \\ \end{array} \right), \end{align} (6.38)
which means that the last row says $-1395 x_4 = -40455$, i.e., $x_4 = 29$ coins (per rabbit). We can use that to recover $x_3 = 23$ coins (per hen), and then use that to recover $x_2 = 121$ coins (per dog), and finally also recover $x_1=177$ coins (per sheep).
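The elimination and back substitution carried out above can also be sketched in code. The following is a minimal illustration, not part of the text; the function name `solve` and the partial-pivoting strategy are our own choices:

```python
def solve(A, y):
    """Solve Ax = y by Gaussian elimination with back substitution."""
    n = len(A)
    # Build the augmented matrix, in the style of Equation (6.34).
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for col in range(n):
        # Pivot on the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from all rows below the pivot row.
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    # Back substitution on the triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

A = [[5, 4, 3, 2], [4, 2, 6, 3], [3, 1, 7, 5], [2, 3, 5, 1]]
y = [1496, 1175, 958, 861]
print([round(v) for v in solve(A, y)])  # → [177, 121, 23, 29]
```

The rounded output matches the prices derived above: 177 coins per sheep, 121 per dog, 23 per hen, and 29 per rabbit.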

In this section, a number of useful matrices will be presented. This includes matrices for rotation, scaling, and shearing in both two and three dimensions. It should be noted that these can be generalized to higher dimensions as well.

#### 6.4.1 Two Dimensions

There are many cases when it is desirable to apply a rotation. This can be to align a vector with another vector, or simply to rotate an object in order to animate it. The rotation matrix in two dimensions is simple to derive. A point $\vc{p}=(p_x,p_y)$ can be parameterized using a radius, $r$, and an angle, $\theta$, as $\vc{p}=(p_x,p_y)=(r\cos\theta, r\sin\theta)$. Rotating $\vc{p}$ by $\phi$ radians will produce a new vector $\vc{q}=(r\cos(\theta+\phi), r\sin(\theta+\phi))$, which can be rewritten as
 \begin{align} \vc{q}=& \begin{pmatrix} r\cos(\theta+\phi) \\ r\sin(\theta+\phi) \end{pmatrix} = \begin{pmatrix} r (\cos\theta \cos\phi - \sin\theta \sin\phi) \\ r (\sin\theta \cos\phi + \cos\theta \sin\phi) \end{pmatrix} \\ =& \underbrace{ \left(\begin{array}{rr} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{array}\right) }_{\mx{R}(\phi)} \underbrace{ \begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix} }_{\vc{p}}, \end{align} (6.39)
where we have used the angle sum relations $\cos(\theta+\phi)=\cos\theta \cos\phi - \sin\theta \sin\phi$ and $\sin(\theta+\phi) = \sin\theta \cos\phi + \cos\theta \sin\phi$. Note that we separated out a $2\times 2$ matrix $\mx{R}(\phi)$ and that the rotated vector $\vc{q}$ is that matrix multiplied by the vector $\vc{p}$, i.e.,
 \begin{align} \vc{q} = \mx{R}(\phi) \vc{p}. \end{align} (6.40)
This leads to the following definition.

Definition 6.10: Two-Dimensional Rotation Matrix
A $2\times 2$ rotation matrix is defined by
 \begin{align} \mx{R}(\phi) = & \left(\begin{array}{rr} \cos \phi & -\sin \phi \\ \sin \phi & \cos \phi \end{array}\right), \end{align} (6.41)
where $\phi$ is the number of radians that the matrix rotates by (counter-clockwise).
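As a small numeric aside (not part of the text), the definition above can be exercised directly; the helper name `rotate` is our own:

```python
import math

def rotate(p, phi):
    """Return R(phi) p, a counter-clockwise rotation by phi radians."""
    c, s = math.cos(phi), math.sin(phi)
    # Matrix-vector product with R(phi) = ((c, -s), (s, c)).
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

q = rotate((1.0, 0.0), math.pi / 2)  # rotate the x-axis a quarter turn
print(q)  # ≈ (0.0, 1.0)
```

Rotating $(1,0)$ by $\pi/2$ radians lands (up to floating-point error) on $(0,1)$, as expected for a counter-clockwise quarter turn.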
In Interactive Illustration 6.3, a rotation matrix is applied to the vertices of a rectangle, and the reader can change the angle, $\phi$, of the rotation matrix, $\mx{R}(\phi)$.
$\phi=$
Interactive Illustration 6.3: By pulling the slider above, you can apply a two-dimensional rotation matrix, $\mx{R}(\phi)$, where the angle $\phi$ is obtained from the slider. To the very left, the angle is $-\pi$ radians and to the very right it is $+\pi$ radians. Note that the rotation matrix is applied to the vertices (colored circles) of the rectangle, which generates rotated vertices. More specifically, vectors are created from the origin (located where the arrowed lines intersect) to the points of the rectangle corners and the matrix is actually applied to these vectors. The original rectangle and its vertices can be seen as a grey rectangle with lighter colored circles.
Interactive Illustration 6.3: By pulling the slider above, you can apply a two-dimensional rotation matrix, $\hid{\mx{R}(\phi)}$, where the angle $\hid{\phi}$ is obtained from the slider. To the very left, the angle is $\hid{-\pi}$ radians and to the very right it is $\hid{+\pi}$ radians. Note that the rotation matrix is applied to the vertices (colored circles) of the rectangle, which generates rotated vertices. More specifically, vectors are created from the origin (located where the arrowed lines intersect) to the points of the rectangle corners and the matrix is actually applied to these vectors. The original rectangle and its vertices can be seen as a grey rectangle with lighter colored circles.
A scaling matrix is very simple in that it has zeroes everywhere except in the diagonal elements. Hence, each diagonal element will be applied as a multiplicative factor to its respective dimension.

Definition 6.11: Two-Dimensional Scaling Matrix
A scaling matrix is defined by
 \begin{align} \mx{S}(f_x, f_y) = & \begin{pmatrix} f_x & 0 \\ 0 & f_y \end{pmatrix}, \end{align} (6.42)
where $f_x$ is the factor that is applied in the $x$-dimension and $f_y$ is applied to the $y$-dimension.
An example of a scaling matrix applied to a rectangle can be seen in Interactive Illustration 6.4.
$f_x=$
$f_y=$
Interactive Illustration 6.4: The two sliders above control the scaling factors $f_x$ and $f_y$, in the $x$- and $y$-direction, respectively.
Interactive Illustration 6.4: The two sliders above control the scaling factors $\hid{f_x}$ and $\hid{f_y}$, in the $\hid{x}$- and $\hid{y}$-direction, respectively.
The effect of a shear matrix is best seen before it is described in detail, so we recommend that the reader explores Interactive Illustration 6.5 first, and then a formal definition will be provided.
$s=$
Interactive Illustration 6.5: The slider above controls the shearing factor, $s$, for the shear transform. As can be seen, the $y$-coordinates remain constant, while the $x$-coordinates change more, the larger the absolute value of $y$ is.
Interactive Illustration 6.5: The slider above controls the shearing factor, $\hid{s}$, for the shear transform. As can be seen, the $\hid{y}$-coordinates remain constant, while the $\hid{x}$-coordinates change more, the larger the absolute value of $\hid{y}$ is.
Given the figure above, one can conclude that shearing is done using an identity matrix in which one of the zeroes has been replaced by a non-zero factor, $s$.

Definition 6.12: Two-Dimensional Shear Matrix
A two-dimensional shear matrix is defined by either of
 \begin{align} \mx{H}_{xy}(s) = \begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix} \ \ \ \mathrm{or} \ \ \ \mx{H}_{yx}(s) = \begin{pmatrix} 1 & 0 \\ s & 1 \end{pmatrix}. \end{align} (6.43)
Note that the first subscript of $\mx{H}$ refers to which coordinate is changed, and the second subscript refers to the coordinate that is used to scale by $s$ and add to that first coordinate.
As an example, $\mx{H}_{xy}(s)$ means that the $x$-coordinate will be sheared using $s$ times the $y$-coordinate. This is, in fact, exactly what is shown in Interactive Illustration 6.5.
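A quick sketch of this behavior (our own illustration; the helper name `shear_xy` is hypothetical): $\mx{H}_{xy}(s)$ adds $s$ times the $y$-coordinate to the $x$-coordinate, while $y$ passes through unchanged.

```python
def shear_xy(p, s):
    """Apply H_xy(s) = ((1, s), (0, 1)) to the point p."""
    x, y = p
    return (x + s * y, y)  # x picks up s*y; y is untouched

print(shear_xy((1.0, 2.0), 0.5))   # → (2.0, 2.0)
print(shear_xy((1.0, -2.0), 0.5))  # → (0.0, -2.0)
```

Note how the two points, which differ only in the sign of $y$, are pushed in opposite horizontal directions, exactly the behavior seen in Interactive Illustration 6.5.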

#### 6.4.2 Three Dimensions

In three dimensions, rotation, scaling, and shearing behave in pretty much the same way as in two dimensions. However, there are more ways to perform each of these. For example, rotation in two dimensions occurred in the plane, but in three dimensions, it is possible to rotate around each axis. Let us start with rotation matrices.

Definition 6.13: Three-Dimensional Rotation Matrices
Rotation around the three major axes are done using the following three rotation matrices.
 \begin{gather} \mx{R}_x(\phi) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos \phi & -\sin \phi \\ 0 & \sin \phi & \hid{-}\cos \phi \end{pmatrix}, \ \ \ \mx{R}_y(\phi) = \begin{pmatrix} \hid{-}\cos \phi & 0 & \sin \phi \\ 0 & 1 & 0 \\ -\sin \phi & 0 & \cos \phi \end{pmatrix}, \\ \mx{R}_z(\phi) = \begin{pmatrix} \cos \phi & -\sin \phi & 0 \\ \sin \phi & \hid{-}\cos \phi & 0 \\ 0 & 0 & 1 \end{pmatrix}, \end{gather} (6.44)
where $\phi$ is the number of radians that the matrix rotates by (counter-clockwise).
Note that a rotation matrix around an axis, $i$, that is, using $\mx{R}_i(\phi)$, leaves the $i$-coordinates unaffected while the two remaining coordinates are rotated around axis $i$. It should be noted that it is possible to create rotation matrices around any arbitrary axis as well.

Recall that we use right-handed coordinate systems if nothing else is mentioned. It is worth noting that $\mx{R}_x$ and $\mx{R}_z$ are similar to the two-dimensional rotation matrix (Definition 6.10), but $\mx{R}_y$ has the signs on the $\sin\phi$-terms flipped. This is because we want to use positive orientations of $\phi$ for all three rotation matrices. For $\mx{R}_x$, for example, imagine that you are looking down the negative $x$-axis, which means you will see the $y$- and $z$-axes positively oriented. This is not the case for the $x$- and $z$-axes when looking down the negative $y$-axis for $\mx{R}_y$. Note that since $\cos(-\phi)=\cos\phi$ and $\sin(-\phi)=-\sin\phi$, rotating in the negative direction flips the signs only on the $\sin\phi$-terms.
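The claim that $\mx{R}_i(\phi)$ leaves the $i$-coordinate unaffected is easy to check numerically. A small sketch (not from the text; the function names are our own) using the matrices of Definition 6.13:

```python
import math

def rot_x(phi, p):
    """Apply R_x(phi): the x-coordinate passes through unchanged."""
    c, s = math.cos(phi), math.sin(phi)
    x, y, z = p
    return (x, c * y - s * z, s * y + c * z)

def rot_y(phi, p):
    """Apply R_y(phi): note the flipped signs on the sin terms."""
    c, s = math.cos(phi), math.sin(phi)
    x, y, z = p
    return (c * x + s * z, y, -s * x + c * z)

p = (1.0, 2.0, 3.0)
print(rot_x(0.7, p)[0])  # → 1.0 (x is left unaffected by R_x)
print(rot_y(0.7, p)[1])  # → 2.0 (y is left unaffected by R_y)
```

Only the two remaining coordinates are rotated, as stated above.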

Scaling matrices in three dimensions are a simple extension of their two-dimensional counterpart, since they only add scaling along the $z$-axis. This leads to the following definition.

Definition 6.14: Three-Dimensional Scaling Matrix
A three-dimensional scaling matrix is defined by
 \begin{align} \mx{S}(f_x, f_y,f_z) = & \begin{pmatrix} f_x & 0 & 0\\ 0 & f_y & 0 \\ 0 & 0 & f_z \end{pmatrix}, \end{align} (6.45)
where $f_x$, $f_y$, and $f_z$ are the factors that are applied in the $x$, $y$, and $z$-dimensions, respectively.
In contrast to scaling in three dimensions, shearing can be done in several different ways. However, the matrices are still quite similar as shown in the following definition.

Definition 6.15: Three-Dimensional Shear Matrices with One Parameter
A three-dimensional shear matrix is defined by, for example,
 \begin{align} \mx{H}_{xy}(s) = \begin{pmatrix} 1 & s & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{pmatrix}, \ \ \ \mx{H}_{yx}(s) = \begin{pmatrix} 1 & 0 & 0 \\ s & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \ \ \ \mx{H}_{zy}(s) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & s & 1 \end{pmatrix}. \end{align} (6.46)
Similar to shearing in two dimensions (Definition 6.12), the first subscript of $\mx{H}$ refers to which coordinate is changed and the second subscript refers to the coordinate that is used to scale by $s$ and add to that first coordinate. The following combinations are also possible, $\mx{H}_{xz}(s)$, $\mx{H}_{yz}(s)$, and $\mx{H}_{zx}(s)$.
Sometimes, it is useful to have shear matrices with two parameters, which is a simple extension of the shear matrices with one parameter.

Definition 6.16: Three-Dimensional Shear Matrices with Two Parameters
A three-dimensional shear matrix with two parameters, $s$ and $t$, is defined by
 \begin{align} \mx{H}_{x}(s,t) = \begin{pmatrix} 1 & s & t \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{pmatrix}, \ \ \ \mx{H}_{y}(s,t) = \begin{pmatrix} 1 & 0 & 0 \\ s & 1 & t \\ 0 & 0 & 1 \end{pmatrix}, \ \ \ \mx{H}_{z}(s,t) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ s & t & 1 \end{pmatrix}. \end{align} (6.47)
Some of the matrices in this chapter will be used to illustrate some of the properties of matrix arithmetic, which is the topic of the next section. For example, we will show that rotating first and then shearing is not the same as shearing first and then rotating. Hence, matrix multiplication is not commutative.

Theorem 6.1: Matrix Arithmetic Properties
In the following, we assume that the sizes of the matrices are such that the operations are defined.
 $$\begin{array}{llr} (i) & k(l\mx{A}) = (kl)\mx{A} & \spc\text{(associativity)} \\ (ii) & (k+l)\mx{A} = k\mx{A} +l\mx{A} & \spc\text{(distributivity)} \\ (iii) & k(\mx{A}+\mx{B}) = k\mx{A} +k\mx{B} & \spc\text{(distributivity)} \\ (iv) & \mx{A} + \mx{B} = \mx{B} + \mx{A} & \spc\text{(commutativity)} \\ (v) & \mx{A}+(\mx{B}+\mx{C})=(\mx{A}+\mx{B})+\mx{C} & \spc\text{(associativity)} \\ (vi) & \mx{A}+ (-1)\mx{A} = \mx{O} & \spc\text{(additive inverse)} \\ (vii) & \mx{A}(\mx{B}+\mx{C})=\mx{A}\mx{B}+\mx{A}\mx{C} & \spc\text{(distributivity)} \\ (viii) & (\mx{A}+\mx{B})\mx{C}=\mx{A}\mx{C}+\mx{B}\mx{C} & \spc\text{(distributivity)} \\ (ix) & (\mx{A}\mx{B})\mx{C}=\mx{A}(\mx{B}\mx{C}) & \spc\text{(associativity)} \\ (x) & \mx{I}\mx{A}=\mx{A}\mx{I}=\mx{A} & \spc\text{(multiplicative one)} \\ (xi) & (k\mx{A})^\T=k\mx{A}^\T & \spc\text{(transpose rule 1)} \\ (xii) & (\mx{A}+\mx{B})^\T=\mx{A}^\T+\mx{B}^\T & \spc\text{(transpose rule 2)} \\ (xiii) & (\mx{A}^\T)^\T=\mx{A} & \spc\text{(transpose rule 3)} \\ (xiv) & (\mx{A}\mx{B})^\T=\mx{B}^\T\mx{A}^\T & \spc\text{(transpose rule 4)} \\ \end{array}$$ (6.48)
In addition, we have the following trivial set of rules: $1\mx{A}=\mx{A}$, $0\mx{A}=\mx{O}$, $k\mx{O}=\mx{O}$, and $\mx{A}+\mx{O}=\mx{A}$.

All of these are rather simple to prove by finding an expression for the matrix element at location $ij$ on the left hand side of the equal sign and then checking that the same expression appears on the right hand side. Rules $(i)$-$(vi)$ and $(x)$-$(xiii)$ are trivial to prove, and so are left as exercises to the reader. In the following, we will use the fact that an element in the product, $\mx{P}=\mx{A}\mx{B}$, can be expressed using dot products (see Equation (6.22)), i.e., $[p_{ij}] = \bigl[ \vc{a}_{i,} \cdot \vc{b}_{,j} \bigr]$.
$(vii)$ A matrix element in the left side of the equal sign becomes: $\bigl[ \vc{a}_{i,} \cdot (\vc{b}_{,j}+\vc{c}_{,j}) \bigr]$, and on the right side it becomes: $\bigl[ \vc{a}_{i,} \cdot \vc{b}_{,j} \bigr] + \bigl[ \vc{a}_{i,} \cdot \vc{c}_{,j} \bigr]=$ $\bigl[ \vc{a}_{i,} \cdot (\vc{b}_{,j}+\vc{c}_{,j}) \bigr]$, where we have used the law of distributivity for dot products (Theorem 3.1). This shows that a matrix element is the same for both the left side and the right side of the equal sign.
$(viii)$ Since matrix-matrix multiplication is not commutative (i.e., $\mx{A}\mx{B}\neq\mx{B}\mx{A}$ in general - see Example 6.9), we need to prove this one as well. A matrix element on the left side of the equal sign becomes: $\bigl[ (\vc{a}_{i,}+\vc{b}_{i,}) \cdot \vc{c}_{,j} \bigr]$, while the right hand side becomes: $\bigl[ \vc{a}_{i,} \cdot \vc{c}_{,j} \bigr] + \bigl[ \vc{b}_{i,} \cdot \vc{c}_{,j} \bigr]=$ $\bigl[ (\vc{a}_{i,} + \vc{b}_{i,}) \cdot \vc{c}_{,j} \bigr]$, where we have again used the distributivity of the dot product. This concludes the proof.
$(ix)$ We start with the left hand side, and after some work we find that a matrix element at position $ij$ can be expressed as: $\sum_k (\vc{a}_{i,}\cdot \vc{b}_{,k})c_{kj}$. Similarly, the right hand side becomes: $\sum_k a_{ik}(\vc{b}_{k,}\cdot \vc{c}_{,j})$, and after developing these two expressions (left and right hand sides), we can see that they are exactly the same. This is left as an exercise.
$(xiv)$ The left hand side is: $(\mx{A}\mx{B})^\T=$ $\bigl[ \vc{a}_{i,}^\T \vc{b}_{,j} \bigr]^\T=$ $\bigl[ \vc{a}_{j,}^\T \vc{b}_{,i} \bigr]$, where we have used Equation (6.23) for the matrix multiplication in the first step, and taken the transpose (swapping $i$ and $j$) in the second step. For the right hand side, we use a similar strategy: $\mx{B}^\T \mx{A}^\T=$ $\bigl[ b_{ij} \bigr]^\T \bigl[ a_{ij} \bigr]^\T=$ $\bigl[ b_{ji} \bigr] \bigl[ a_{ji} \bigr]=$ $\bigl[ \vc{b}_{,i} \vc{a}_{j,}^\T \bigr]$. In the last expression, we can change the order of the vectors, similar to how $\vc{a}\cdot \vc{b} = \vc{b} \cdot \vc{a}$, which makes the left and right sides equal, and that concludes the proof.
$\square$

Note that $(v)$ and $(ix)$ are particularly convenient, since we can write both $\mx{A}+\mx{B}+\mx{C}$ and $\mx{A}\mx{B}\mx{C}$ (i.e., without any parentheses), since the order does not matter. This is similar to how $1+2+3$ and $5\cdot 3 \cdot 2$ do not need any parentheses.
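Rules such as $(ix)$ and $(xiv)$ can also be spot-checked on concrete matrices. The following sketch (our own; the helpers `matmul` and `transpose` are hypothetical names) is a sanity check, not a proof:

```python
def matmul(A, B):
    """Row-column products, as in Equation (6.22)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Swap rows and columns."""
    return [list(col) for col in zip(*A)]

A = [[1, 2], [3, -1]]
B = [[0, 1], [4, 2]]
C = [[2, 0], [1, 5]]

assert matmul(matmul(A, B), C) == matmul(A, matmul(B, C))             # rule (ix)
assert transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))  # rule (xiv)
print("rules (ix) and (xiv) hold on this example")
```

Passing such checks on one example proves nothing in general, of course; the element-wise arguments above are what establish the rules.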

Next, we present an example with $\mx{A}\mx{B}=\mx{O}$ (zero matrix defined in Definition 6.4) without either of $\mx{A}$ or $\mx{B}$ being $\mx{O}$.

Example 6.8: Matrix-Matrix Multiplication equals to the Zero Matrix
Assume we have the matrices, $\mx{A}$ and $\mx{B}$, as shown below, and that $\mx{A}\mx{B}$ needs to be calculated.
 \begin{align} \mx{A}= \left( \begin{array}{rr} 2 & 1 \\ 6 & 3 \end{array} \right) \spc\spc\text{and}\spc\spc \mx{B}= \left( \begin{array}{rr} -2 & 3 \\ 4 & -6 \end{array} \right) \\ \mx{A}\mx{B}= \left( \begin{array}{rr} 2\cdot(-2) + 1\cdot 4 & 2\cdot 3 + 1\cdot (-6) \\ 6\cdot(-2) + 3\cdot 4 & 6\cdot 3 + 3\cdot (-6) \end{array} \right) = \left( \begin{array}{rr} 0 & 0 \\ 0 & 0 \end{array} \right) =\mx{O}. \end{align} (6.49)
As can be seen, we have $\mx{A}\mx{B}=\mx{O}$ and still, neither $\mx{A}$ nor $\mx{B}$ is $\mx{O}$. Note that this has some consequences that are not always intuitive. For example, assume we have the following,
 \begin{gather} \mx{A}\mx{B} = \mx{A}\mx{C} \\ \Longleftrightarrow \\ \mx{A}\mx{B} - \mx{A}\mx{C} = \mx{O} \\ \Longleftrightarrow \\ \mx{A}(\mx{B} -\mx{C}) = \mx{O}, \end{gather} (6.50)
and that we also know that $\mx{A}$ is not equal to $\mx{O}$. Normally, when we see expressions such as the first row in (6.50), we are used to concluding that $\mx{B}=\mx{C}$. That is certainly possible; however, as we saw in (6.49), neither of the factors in a matrix-matrix multiplication needs to be the zero matrix, $\mx{O}$, in order for the product to be the zero matrix. In this case, it means that $\mx{B} -\mx{C}$ does not need to be the zero matrix.
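This situation can be made concrete. In the sketch below (our own; the matrix `C` is a hypothetical choice, picked so that $\mx{B}-\mx{C}=-\mx{B}$, a multiple of $\mx{B}$), we get $\mx{A}\mx{B}=\mx{A}\mx{C}$ even though $\mx{B}\neq\mx{C}$:

```python
def matmul(A, B):
    """Row-column products, as in Equation (6.22)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# A and B from Equation (6.49): nonzero factors, zero product.
A = [[2, 1], [6, 3]]
B = [[-2, 3], [4, -6]]
print(matmul(A, B))  # → [[0, 0], [0, 0]], even though A, B != O

# Hypothetical C = -B, so B - C = 2B... here C = 2B works too; we use C = 2B.
C = [[-4, 6], [8, -12]]
assert C != B and matmul(A, C) == matmul(A, B)  # AB = AC yet B != C
```

So "cancelling" $\mx{A}$ from $\mx{A}\mx{B}=\mx{A}\mx{C}$ is not a valid step in general.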
It is also worth noting that Theorem 6.1 does not contain the rule $\mx{A}\mx{B}=\mx{B}\mx{A}$ and the reason is that this is not true in most cases. This is shown in the following example.

Example 6.9: Matrix-Matrix Multiplication is not Commutative
Assume we have two matrices
 $$\mx{A} = \left( \begin{array}{rr} 1 & 2 \\ 3 & -1 \\ \bstwo -2 & 0 \\ 0 & 4 \end{array} \right) \spc\spc\text{and}\spc\spc \mx{B} = \left( \begin{array}{rrrr} 5 & -3 & 0 & -2 \\ 3 & 1 & 2 & 6 \end{array} \right).$$ (6.51)
As can be seen, the size of $\mx{A}$ is $4\times 2$ and the size of $\mx{B}$ is $2\times 4$. Interestingly, both $\mx{A}\mx{B}$ and $\mx{B}\mx{A}$ are defined. Both are computed below.
 \begin{align} \mx{A}\mx{B} &= \left( \begin{array}{rr} 1 & 2 \\ 3 & -1 \\ \bstwo -2 & 0 \\ 0 & 4 \end{array} \right) \left( \begin{array}{rrrr} 5 & -3 & 0 & -2 \\ 3 & 1 & 2 & 6 \end{array} \right) \\ &= \left( \begin{array}{rrrr} 1\cdot 5 + 2\cdot 3 & 1\cdot(-3) + 2\cdot 1 & 1\cdot 0 + 2\cdot 2 & 1\cdot (-2) + 2\cdot 6\\ 3\cdot 5 - 1\cdot 3 & 3\cdot (-3) - 1\cdot 1 & 3\cdot 0 - 1\cdot 2 & 3\cdot (-2) - 1\cdot 6\\ -2\cdot 5 + 0\cdot 3 & -2\cdot (-3) + 0\cdot 1 & -2\cdot 0 + 0\cdot 2 & -2\cdot (-2) + 0\cdot 6\\ 0\cdot 5 + 4\cdot 3 & 0\cdot (-3) + 4\cdot 1 & 0\cdot 0 + 4\cdot 2 & 0\cdot (-2) + 4\cdot 6 \end{array} \right) \\ &= \left( \begin{array}{rrrr} 11 & -1 & 4 & 10 \\ 12 & -10 & -2 & -12 \\ \bstwo -10 & 6 & 0 & 4 \\ 12 & 4 & 8 & 24 \end{array} \right) \end{align} (6.52)
Using the Rule (6.20) to find out the size of the product, $\mx{A}\mx{B}$, we find: $(4 \times \bcancel{2})\, (\bcancel{2} \times 4) \longrightarrow (4 \times 4)$, i.e., the size is $4\times 4$. Next, we compute $\mx{B}\mx{A}$, i.e.,
 \begin{align} \mx{B}\mx{A} &= \left( \begin{array}{rrrr} 5 & -3 & 0 & -2 \\ 3 & 1 & 2 & 6 \end{array} \right) \left( \begin{array}{rr} 1 & 2 \\ 3 & -1 \\ \bstwo -2 & 0 \\ 0 & 4 \end{array} \right) \\ &= \left( \begin{array}{rr} 5 \cdot 1 - 3 \cdot 3 + 0 \cdot (-2) - 2 \cdot 0 & 5 \cdot 2 - 3 \cdot (-1) + 0 \cdot 0 - 2 \cdot 4 \\ 3 \cdot 1 + 1 \cdot 3 + 2 \cdot (-2) + 6 \cdot 0 & 3 \cdot 2 + 1 \cdot (-1) + 2 \cdot 0 + 6 \cdot 4 \end{array} \right) \\ &= \left( \begin{array}{rr} -4 & 5 \\ 2 & 29 \end{array} \right) \end{align} (6.53)
The size of $\mx{B}\mx{A}$ is: $(2 \times \bcancel{4})\, (\bcancel{4} \times 2) \longrightarrow (2 \times 2)$, i.e., the size is $2\times 2$. Hence, it is pretty clear that in general we have $\mx{A}\mx{B} \neq \mx{B}\mx{A}$, i.e., matrix-matrix multiplication is not commutative.
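The computations in this example can be reproduced with a few lines of code (a sketch of our own, reusing a hypothetical `matmul` helper):

```python
def matmul(A, B):
    """Row-column products, as in Equation (6.22)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [3, -1], [-2, 0], [0, 4]]       # 4x2
B = [[5, -3, 0, -2], [3, 1, 2, 6]]           # 2x4

AB = matmul(A, B)  # a 4x4 matrix
BA = matmul(B, A)  # a 2x2 matrix
print(AB[0])  # → [11, -1, 4, 10], first row of Equation (6.52)
print(BA)     # → [[-4, 5], [2, 29]], matching Equation (6.53)
```

Since $\mx{A}\mx{B}$ and $\mx{B}\mx{A}$ do not even have the same size here, they clearly cannot be equal.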
In Interactive Illustration 6.6, we show an example of how the order of two matrices in a matrix multiplication affects a rectangle when its vertices are interpreted as vectors and multiplied by a matrix.
$\mx{S}$
$\mx{R}$
$\mx{SR}$
$\mx{RS}$
Interactive Illustration 6.6: Top left: a rectangle with a shear matrix, $\mx{S}$, applied to the vertices of the rectangle. The vertices are interpreted as vectors $\vc{v}_i$ from the center to the rectangle corners and the matrix is multiplied from the left, e.g., $\mx{S}\vc{v}_i$. Bottom left: same as above, but with a rotation matrix, $\mx{R}$. Top right: here, $\mx{S}\mx{R}$ is used. Bottom right: here, $\mx{R}\mx{S}$ is used. Try to move the sliders around as well to see the effect. As can be seen, the rectangles in the top right and bottom right figures are not the same, and this implies that $\mx{S}\mx{R}\neq \mx{R}\mx{S}$, i.e., matrix multiplication does not commute.
Interactive Illustration 6.6: Top left: shows a rectangle with a shear matrix, $\hid{\mx{S}}$, applied to the vertices of the rectangle. The vertices are interpreted as vectors $\hid{\vc{v}_i}$ from the center to the rectangle corners and the matrix is multiplied from the right, e.g., $\hid{\mx{S}\vc{v}_i}$. Bottom left: same as above, but with a rotation matrix, $\hid{\mx{R}}$. Top right: here, $\hid{\mx{S}\mx{R}}$ is used. Bottom right: here, $\hid{\mx{R}\mx{S}}$ is used. Try to move the sliders around as well to see the effect. As can be seen the rectangles in the top right and bottom right figures are not the same, and this implies that $\hid{\mx{S}\mx{R}\neq \mx{R}\mx{S}}$, i.e., matrix multiplication does not commute.

Now that we have seen that matrix addition, matrix multiplication by a scalar, and matrix-matrix multiplication exist, it is reasonable to ask whether there also is a division-like operator. That is, how can we solve for $\mx{X}$ in
 $$\mx{A}\mx{X} = \mx{B}.$$ (6.54)
Now, let us take a step back, and start with a simpler expression using scalars only, i.e.,
 $$ax = b.$$ (6.55)
From algebra, the solution is trivially
 $$x = \frac{b}{a} = a^{-1}b.$$ (6.56)
Note that the right hand side, where $a^{-1}$ is used as an operator solving Equation (6.55), is only valid if $a \neq 0$. This is the same notation that will be used for the matrix inverse, as we will see. However, if $a=0$ and $b=0$, then every value of $x$ solves the equation (and if $a=0$ but $b\neq 0$, no value does).

We will focus only on square matrix inverses, i.e., if $\mx{A}$ is square, find a solution to Equation (6.54), such that
 $$\mx{X} = \mx{A}^{-1}\mx{B}.$$ (6.57)
Only square matrices can have an inverse. Non-square matrices can have something similar to an inverse, called a pseudo-inverse, but these are often less useful, and if nothing else is said, we refer to square matrices when discussing matrix inverses. The definition of the matrix inverse follows below.

Definition 6.17: Matrix Inverse
The square matrix $\mx{A}$ is said to be invertible if there exists a matrix $\mx{A}^{-1}$, which is called the inverse of $\mx{A}$, such that
 $$\mx{A}\mx{A}^{-1} = \mx{A}^{-1}\mx{A} = \mx{I}.$$ (6.58)
For $\mx{A}\mx{A}^{-1} = \mx{I}$, $\mx{A}^{-1}$ is called a right-side inverse, while $\mx{A}^{-1}$ is called a left-side inverse if $\mx{A}^{-1}\mx{A} = \mx{I}$.
In the following, we will present a theorem that shows that if a matrix is invertible, then there is only one inverse, and it acts both as a left-side and a right-side inverse.

Theorem 6.2: Matrix Inverse Existence
Let us call the left-side inverse, $\mx{A}_l^{-1}$ and the right-side inverse, $\mx{A}_r^{-1}$, i.e., $\mx{A}_l^{-1} \mx{A} = \mx{I}$ and $\mx{A}\mx{A}_r^{-1} = \mx{I}$. Then the following holds
$(i)$ If $\mx{A}_l^{-1} \mx{A} = \mx{A}\mx{A}_r^{-1} = \mx{I}$ then $\mx{A}_l^{-1} =\mx{A}_r^{-1}$.
$(ii)$ There is only one matrix inverse, $\mx{A}^{-1}$, to a matrix $\mx{A}$, i.e., $\mx{A}^{-1}= \mx{A}_l^{-1} =\mx{A}_r^{-1}$.

$(i)$ Assume that both $\mx{A}_l^{-1} \mx{A} = \mx{I}$ and $\mx{A}\mx{A}_r^{-1} = \mx{I}$ hold. Then it follows that
 $$\mx{A}_l^{-1} = \mx{A}_l^{-1} \mx{I} =\mx{A}_l^{-1} (\mx{A}\mx{A}_r^{-1}) = (\mx{A}_l^{-1} \mx{A})\mx{A}_r^{-1} = \mx{I}\mx{A}_r^{-1} = \mx{A}_r^{-1}.$$ (6.59)

$(ii)$ Assume that there are two different inverse matrices, $\mx{R}$ and $\mx{L}$, which can replace $\mx{A}^{-1}$ in Equation (6.58), i.e, we have $\mx{L}\mx{A}=\mx{I}$ and $\mx{A}\mx{R}=\mx{I}$. However, from $(i)$ above it follows that $\mx{L}=\mx{R}$, and as a consequence, there cannot be two different matrix inverses.
$\square$

Now that this theorem has been proved, we can omit the left-side and right-side matrix notation, and simply use $\mx{A}^{-1}$ as the matrix inverse notation. Next, a matrix inverse example follows.

Example 6.10: Matrix Inverses 1
For a two-dimensional rotation matrix $\mx{R}(\phi)$ (see Definition 6.10), it is reasonable to believe that the inverse rotation matrix is $\mx{R}(-\phi)$, i.e., a rotation in the opposite direction. If this is in fact true, then $\mx{R}(\phi)\mx{R}(-\phi)=\mx{I}$ per Definition 6.17. Hence, let us multiply these matrices and see if the result is the identity matrix. This is done below.
 \begin{align} \mx{R}(\phi)\mx{R}(-\phi) = & \begin{pmatrix} \cos \phi & -\sin \phi \\ \sin \phi & \hid{-}\cos \phi \end{pmatrix} \begin{pmatrix} \cos (-\phi) & -\sin(-\phi) \\ \sin (-\phi) & \hid{-}\cos(-\phi) \end{pmatrix} \\ =& \begin{pmatrix} \cos \phi & -\sin \phi \\ \sin \phi & \hid{-}\cos \phi \end{pmatrix} \begin{pmatrix} \hid{-}\cos (\phi) & \sin(\phi) \\ -\sin (\phi) & \cos(\phi) \end{pmatrix} \\ =& \begin{pmatrix} \cos^2 \phi + \sin^2\phi & \cos\phi \sin \phi - \sin \phi\cos\phi\\ \sin \phi\cos\phi - \cos\phi \sin \phi & \sin^2\phi + \cos^2 \phi \end{pmatrix} \\ =& \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \mx{I}. \end{align} (6.60)
Here, we have used $\cos(-\phi) = \cos\phi$, $\sin(-\phi) = -\sin\phi$, and $\cos^2 \phi + \sin^2\phi=1$.
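This identity can also be checked numerically for any particular angle. A short sketch (our own; the helper names are hypothetical):

```python
import math

def rot(phi):
    """The 2x2 rotation matrix R(phi) of Definition 6.10."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s], [s, c]]

def matmul(A, B):
    """2x2 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

P = matmul(rot(0.8), rot(-0.8))  # R(phi) R(-phi)
print(P)  # ≈ [[1.0, 0.0], [0.0, 1.0]], i.e., the identity matrix
```

Up to floating-point rounding, the product is the identity, so $\mx{R}(-\phi)$ is indeed the inverse of $\mx{R}(\phi)$.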

In the same manner, one can show that, for example, $\mx{H}^{-1}_{xy}(s) = \mx{H}_{xy}(-s)$ for shear matrices, and $\mx{S}^{-1}(f_x,f_y) = \mx{S}(1/f_x, 1/f_y)$ for scaling matrices. These latter two cases are left as exercises. In Interactive Illustration 6.7, these matrices and their inverses are illustrated.
Interactive Illustration 6.7: In this interactive illustration, we visualize a $2\times 2$ matrix, $\mx{A}$, as the two column vectors, $\vc{a}_{,1}$ and $\vc{a}_{,2}$, that it consists of, i.e., $\mx{A} = \bigl(\textcolor{#aa0000}{\vc{a}_{,1}}\,\, \textcolor{#00aa00}{\vc{a}_{,2}} \bigr)$. Both $\textcolor{#aa0000}{\vc{a}_{,1}}$ and $\textcolor{#00aa00}{\vc{a}_{,2}}$ can be moved in this illustration. In addition, the inverse matrix, $\mx{A}^{-1}$, is shown to the right, and its two column vectors are blue and yellow in the figure. This first shows an example where $\mx{A}$ is a rotation matrix using $\phi=\pi/6$, i.e., 30 degrees. You can see that the red vector then becomes $(\cos(\pi/6),\sin(\pi/6))=(\sqrt{3}/2,0.5)$. The parallelograms of the vector pairs are also shown. In the case of a rotation matrix, these become squares with area 1.0. This will be explained further in Chapter 7, where it will become clear that the area is related to the determinant of the respective matrix. Click/press Forward to continue to the next type of matrix. Remember to click/press Reset if you have moved the vectors and want the original matrices restored.
Interactive Illustration 6.7: In this interactive illustration, we visualize a $\hid{2\times 2}$ matrix, $\hid{\mx{A}}$, as the two column vectors, $\hid{\vc{a}_{,1}}$ and $\hid{\vc{a}_{,2}}$, that it consists of, i.e., $\hid{\mx{A} = \bigl(\textcolor{#aa0000}{\vc{a}_{,1}}\,\, \textcolor{#00aa00}{\vc{a}_{,2}} \bigr)}$. Both $\hid{\textcolor{#aa0000}{\vc{a}_{,1}}}$ and $\hid{\textcolor{#00aa00}{\vc{a}_{,2}}}$ can be moved in this illustration. In addition, the inverse matrix, $\hid{\mx{A}^{-1}}$, is shown to the right, and its two column vectors are blue and yellow in the figure. This first shows an example where $\hid{\mx{A}}$ is a rotation matrix using $\hid{\phi=\pi/6}$, i.e., 30 degrees. You can see that the red vector then becomes $\hid{(\cos(\pi/6),\sin(\pi/6))=(\sqrt{3}/2,0.5)}$. The parallelograms of the vector pairs are also shown. In the case of a rotation matrix, these become squares with area 1.0. This will be explained further in \linkref{Chapter}{ch_dt}, where it will become clear that the area is related to the determinant of the respective matrix. Click/press Forward to continue to the next type of matrix. Remember to click/press Reset if you have moved the vectors and want the original matrices restored.
$\mx{A} = \left(\begin{array}{l} \hid{1} \\ \hid{1} \end{array}\right.$
$\left.\begin{array}{l} \hid{1} \\ \hid{1} \end{array}\right)$
$\textcolor{#aa0000}{\vc{a}_{,1}}$
$\textcolor{#009000}{\vc{a}_{,2}}$
$\mx{A}^{-1} = \left(\begin{array}{l} \hid{1} \\ \hid{1} \end{array}\right.$
$\left.\begin{array}{l} \hid{1} \\ \hid{1} \end{array}\right)$
In Example 6.10, we saw that for rotation, scaling, and shear matrices, it is straightforward to obtain the corresponding matrix inverse. However, so far, nothing has been said about how this can be done for general matrices. There are several different ways to do this. For two-dimensional matrices, it is particularly simple, as shown in the following theorem.

Theorem 6.3: Two-Dimensional Matrix Inverse
For a $2\times 2$ matrix, $\mx{A}$, the inverse is
 $$\mx{A}^{-1} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}^{-1} = \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{pmatrix} \hid{-}a_{22} & -a_{12} \\ -a_{21} & \hid{-}a_{11} \end{pmatrix},$$ (6.61)
if $a_{11}a_{22} - a_{12}a_{21} \neq 0$, otherwise, the inverse does not exist.

Let us test what happens when $\mx{A}^{-1}$ and $\mx{A}$ are multiplied, i.e.,
 \begin{align} \mx{A}^{-1}\mx{A} &= \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{pmatrix} \hid{-}a_{22} & -a_{12} \\ -a_{21} & \hid{-}a_{11} \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \\ &= \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{pmatrix} a_{22}a_{11} - a_{12}a_{21} & a_{22}a_{12}-a_{12}a_{22} \\ -a_{21}a_{11} + a_{11}a_{21} & -a_{21}a_{12}+a_{11}a_{22} \end{pmatrix} \\ &= \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{pmatrix} a_{11}a_{22} - a_{12}a_{21} & 0 \\ 0 & a_{11}a_{22} - a_{12}a_{21} \end{pmatrix} = \mx{I}, \end{align} (6.62)
where in the last step, we simply used the definition of scalar-matrix multiplication (Definition 6.7). It is also clear that if $a_{11}a_{22} - a_{12}a_{21}=0$ then we get division by zero, and therefore, the inverse does not exist.
$\square$

In Chapter 7, which is about determinants, it will become clear that the denominator in Theorem 6.3 is, in fact, the determinant of the $2\times 2$ matrix.
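Theorem 6.3 translates directly into code. The sketch below (our own; the helper name `inv2` is hypothetical) returns the inverse of a $2\times 2$ matrix, or raises an error when $a_{11}a_{22}-a_{12}a_{21}=0$:

```python
def inv2(A):
    """Inverse of a 2x2 matrix via the formula in Theorem 6.3."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    # Swap the diagonal, negate the off-diagonal, divide by det.
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2.0, 1.0], [1.0, 1.0]]
print(inv2(A))  # → [[1.0, -1.0], [-1.0, 2.0]]
```

Multiplying the result by $\mx{A}$ gives the identity matrix, as Equation (6.62) predicts.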

Next, we will show how the inverse can be computed using Gaussian elimination (Chapter 5).

Example 6.11: Matrix Inverse via Gaussian Elimination
The inverse of the matrix
 \begin{align} \mx{A} &= \left( \begin{array}{rrr} 5 & 3 & 1\\ 1 & 0 & -2 \\ 1 & 2 & 5 \end{array} \right) \end{align} (6.63)
is desired. Now, let us set up the following system of equations,
 \begin{gather} \mx{A}\vc{x} = \vc{y} \\ \Longleftrightarrow \\ \mx{A}\vc{x} = \mx{I}\vc{y}, \end{gather} (6.64)
where $\mx{A}\vc{x}$ and $\mx{I}\vc{y}$ are column vectors with three elements. If we multiply both sides from the left with the inverse of $\mx{A}$, we get
 \begin{gather} \mx{A}^{-1}\mx{A}\vc{x}=\mx{A}^{-1}\mx{I}\vc{y} \\ \Longleftrightarrow \\ \mx{I}\vc{x}=\mx{A}^{-1}\vc{y}, \end{gather} (6.65)
that is, we have "moved" over the identity matrix in Equation (6.64) from the right side to the left side, and at the same time, the inverse matrix is suddenly alone on the right side. Hence, if we can get the identity matrix on the left side, then we will have the matrix inverse on the other. Writing out the entire matrix structures gives us
 \begin{align} \left( \begin{array}{rrr} 5 & 3 & 1\\ 1 & 0 & -2 \\ 1 & 2 & 5 \end{array} \right) \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}. \end{align} (6.66)
As we saw in Equation (6.26), a linear system of equations can be expressed in matrix form. This is actually rather convenient, since we do not need to write out $x_1$, $x_2$, and $x_3$ all the time. Note that instead of having a constant vector on the right side of the equal sign, we now have the identity matrix times $\vc{y}$. However, Gaussian elimination can still be done here due to the rules from Theorem 5.2. To save some paper, one may even assume that $x_1$, $x_2$, $x_3$, $y_1$, $y_2$, and $y_3$ are implicitly there, and avoid writing them out. The abbreviated form of the system of equations just above is then written as
 \begin{align} \left( \begin{array}{rrr|rrr} 5 & 3 & 1 & 1 & 0 & 0\\ 1 & 0 & -2 & 0 & 1 & 0\\ 1 & 2 & 5 & 0 & 0 & 1 \end{array} \right). \end{align} (6.67)
We can now perform the usual operations as done for Gaussian elimination. For example, we can subtract the middle row from the bottom row and place the result in the bottom row, which would result in
 \begin{align} \left( \begin{array}{rrr|rrr} 5 & 3 & 1 & 1 & 0 & 0\\ 1 & 0 & -2 & 0 & 1 & 0\\ 0 & 2 & 7 & 0 & -1 & 1 \end{array} \right). \end{align} (6.68)
Next, multiply the middle row by $5$ and subtract the first row from the result, and place that in the middle row, which results in
 \begin{align} \left( \begin{array}{rrr|rrr} 5 & 3 & 1 & 1 & 0 & 0\\ 0 & -3 & -11 & -1 & 5 & 0\\ 0 & 2 & 7 & 0 & -1 & 1 \end{array} \right). \end{align} (6.69)
Note that these operations are applied to the right side as well. Finally, we multiply the middle row by $2$ and the bottom row by $3$, add them, and then place the result in the bottom row, i.e.,
 \begin{align} \left( \begin{array}{rrr|rrr} 5 & 3 & 1 & 1 & 0 & 0\\ 0 & -3 & -11 & -1 & 5 & 0\\ 0 & 0 & -1 & -2 & 7 & 3 \end{array} \right). \end{align} (6.70)
In the next step, we do several operations at once. The bottom row is added to the first row, the bottom row is multiplied by $-11$ and added to the middle row, and the bottom row is multiplied by $-1$ and simply stored in the bottom row, which results in
 \begin{align} \left( \begin{array}{rrr|rrr} 5 & 3 & 0 & -1 & 7 & 3\\ 0 & -3 & 0 & 21 & -72 & -33\\ 0 & 0 & 1 & 2 & -7 & -3 \end{array} \right). \end{align} (6.71)
Next, we add the middle row to the top row, and then divide the top by $5$ and the middle row by $-3$, which gives us
 \begin{align} \left( \begin{array}{rrr|rrr} 1 & 0 & 0 & 4 & -13 & -6\\ 0 & 1 & 0 & -7 & 24 & 11\\ 0 & 0 & 1 & 2 & -7 & -3 \end{array} \right). \end{align} (6.72)
Hence, the inverse matrix, $\mx{A}^{-1}$ is
 \begin{align} \mx{A}^{-1} &= \left( \begin{array}{rrr} 4 & -13 & -6\\ -7 & 24 & 11\\ 2 & -7 & -3 \end{array} \right). \end{align} (6.73)
In practice, one may collapse some of the steps above in order to further save space. Note that this method works for square matrices of any size. Finally, if the system does not have a solution, then the matrix is not invertible. The reader may want to multiply $\mx{A}$ and $\mx{A}^{-1}$ to verify that the result is the identity matrix, $\mx{I}$.
Note that Chapter 7 will present other ways to compute the inverse as well.
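The elimination procedure in Example 6.11 can be automated. Below is a minimal Python sketch (all names are our own): augment $\mx{A}$ with $\mx{I}$, reduce the left half to the identity, and read off $\mx{A}^{-1}$ on the right. A pivot search is added so the sketch also works when a diagonal entry happens to be zero.

```python
def invert(A):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(A)
    # Build the augmented matrix (A | I), as in Equation (6.67).
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Pick the row with the largest pivot and swap it into place.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is not invertible")
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so the pivot becomes 1.
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        # Eliminate the pivot column from every other row.
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]    # the right half is now the inverse

A = [[5.0, 3.0, 1.0], [1.0, 0.0, -2.0], [1.0, 2.0, 5.0]]
print(invert(A))    # rounds to the matrix in Equation (6.73)
```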

Theorem 6.4: Matrix Inverse Properties
In the following, if the matrices $\mx{A}$ and $\mx{B}$ are invertible, then $\mx{A}^{\T}$, $\mx{A}^{-1}$, $\mx{B}^{-1}$, and $\mx{A}\mx{B}$ are also invertible, and
 $$\begin{array}{llr} (i) & (\mx{A}^{-1})^{-1} = \mx{A} & \spc\text{(inverse inverse)}, \\ (ii) & (\mx{A}\mx{B})^{-1} = \mx{B}^{-1}\mx{A}^{-1} & \spc\text{(product inverse)}, \\ (iii) & (\mx{A}^{-1})^{\T} = (\mx{A}^{\T})^{-1} & \spc\text{(inverse transpose)}. \\ \end{array}$$ (6.74)
Note the order in $(ii)$, that is, that the order of $\mx{A}$ and $\mx{B}$ is reversed on the right hand side, compared to the left side.

$(i)$ By Definition 6.17 $\mx{A}\mx{A}^{-1}=\mx{A}^{-1}\mx{A}=\mx{I}$, which means that the inverse of $\mx{A}^{-1}$ is $\mx{A}$, and hence $\mx{A}^{-1}$ is invertible.
$(ii)$ We know we have the inverse to a matrix if we can multiply it by the matrix itself and get $\mx{I}$. Assume we have guessed that the inverse to $(\mx{A}\mx{B})$ is $\invmx{B}\invmx{A}$ and that we then want to check that it is correct. We can now multiply $\mx{A}\mx{B}$ with $\invmx{B}\invmx{A}$ and see the result: $(\mx{A}\mx{B})(\invmx{B}\invmx{A}) =$ $\mx{A}(\mx{B}\invmx{B})\invmx{A} =$ $\mx{A}\mx{I}\invmx{A} =$ $\mx{A}\invmx{A} = \mx{I}$. Thus our guess was correct and the inverse of $(\mx{A}\mx{B})$ is indeed $\invmx{B}\invmx{A}$.
$(iii)$ Note that $\mx{I}=\mx{I}^{\T}$, and then we use $(xiv)$ from Theorem 6.1 to get $(\underbrace{\mx{A}\mx{A}^{-1}}_{\mx{I}})^{\T} =$ $(\mx{A}^{-1})^{\T} \mx{A}^{\T} = \mx{I}$, which means that $(\mx{A}^{-1})^{\T}$ is a left-hand inverse to $\mx{A}^{\T}$. Similarly, $(\underbrace{\mx{A}^{-1}\mx{A}}_{\mx{I}})^{\T} =$ $\mx{A}^{\T}(\mx{A}^{-1})^{\T} = \mx{I}$, which together shows the rule and that $\mx{A}^{\T}$ is invertible.
$\square$
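Rule $(ii)$ is easy to check numerically for two concrete $2\times 2$ matrices, here using the closed-form inverse from Theorem 6.3. The sketch below (helper names are our own) also shows that the wrongly ordered product $\mx{A}^{-1}\mx{B}^{-1}$ is generally not the inverse of $\mx{A}\mx{B}$.

```python
def matmul(A, B):
    # 2x2 matrix-matrix product.
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv(A):
    # Closed-form 2x2 inverse, Theorem 6.3.
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [1.0, 1.0]]

lhs = inv(matmul(A, B))           # (AB)^{-1}
rhs = matmul(inv(B), inv(A))      # B^{-1} A^{-1}, the correct order
wrong = matmul(inv(A), inv(B))    # A^{-1} B^{-1} is NOT (AB)^{-1} in general

print(lhs)
print(rhs)      # same as lhs
print(wrong)    # a different matrix
```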

Example 6.12: Matrix Product Inverse au Faux
As we have just seen above in Theorem 6.4, the inverse of a matrix-matrix product, say $\mx{R}(\phi)\mx{H}_{xy}(s)$, is $(\mx{R}(\phi)\mx{H}_{xy}(s))^{-1}=$ $\mx{H}^{-1}_{xy}(s)\mx{R}^{-1}(\phi)$. Here, we will explore what happens if we do not honor that rule of exchanging the order of the matrices. Checking whether one has computed a true inverse can be done by multiplying the matrix with its inverse, which should give the identity matrix, $\mx{I}$, i.e.,
 \begin{align} \bigl(\mx{R}(\phi)\mx{H}_{xy}(s)\bigr) \bigl(\mx{R}(\phi)\mx{H}_{xy}(s)\bigr)^{-1} = \mx{I}, \end{align} (6.75)
in our example. What if we were a little sloppy, and actually forgot that we should change the order of the two matrices? Then we would get a matrix, $\mx{M}$, as
 \begin{align} \mx{M} = \bigl(\mx{R}(\phi)\mx{H}_{xy}(s)\bigr) \bigl(\mx{R}^{-1}(\phi)\mx{H}^{-1}_{xy}(s)\bigr). \end{align} (6.76)
As we saw in Example 6.10, the inverses for the rotation and shear matrices are rather simple, i.e., $\mx{R}^{-1}(\phi) = \mx{R}(-\phi)$ and $\mx{H}^{-1}_{xy}(s) = \mx{H}_{xy}(-s)$. This means that
 \begin{align} \mx{M} = \mx{R}(\phi)\mx{H}_{xy}(s) \mx{R}(-\phi) \mx{H}_{xy}(-s). \end{align} (6.77)
The result of applying the matrix $\mx{M}$ to the vertices (interpreted as column vectors) of a unit square is shown in Interactive Illustration 6.8.
Interactive Illustration 6.8: The two sliders above control the rotation angle, $\phi$, and the shearing factor, $s$. The unit square is deformed using the following matrix, $\mx{M} = \mx{R}(\phi)\mx{H}_{xy}(s) \mx{R}(-\phi) \mx{H}_{xy}(-s)$, which is not quite the identity matrix, because the order of the matrices is not correct. Note however what happens when one of the variables is set to 0.
Interactive Illustration 6.8: The two sliders above control the rotation angle, $\hid{\phi}$, and the shearing factor, $\hid{s}$. The unit square is deformed using the following matrix, $\hid{\mx{M} = \mx{R}(\phi)\mx{H}_{xy}(s) \mx{R}(-\phi) \mx{H}_{xy}(-s)}$, which is not quite the identity matrix, because the order of the matrices is not correct. Note however what happens when one of the variables are set to 0.
This example has shown that it is very important to maintain the correct order of the matrices in a matrix multiplication. Otherwise, one may get a result like the one shown in the figure above, i.e., a matrix that is close to the identity matrix but not equal to it, which is not useful.
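The effect can also be checked numerically. The sketch below is our own two-dimensional version, assuming $\mx{R}(\phi)$ is the standard $2\times 2$ rotation matrix and $\mx{H}(s)$ the shear matrix with rows $(1, s)$ and $(0, 1)$. With the wrong order, $\mx{M}$ is close to, but not equal to, the identity; with $\phi = 0$, the pairs cancel and $\mx{M} = \mx{I}$.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def R(phi):
    # Standard 2x2 rotation matrix.
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s], [s, c]]

def H(s):
    # Shear along x by the factor s.
    return [[1.0, s], [0.0, 1.0]]

phi, s = 0.3, 0.5
M = matmul(matmul(matmul(R(phi), H(s)), R(-phi)), H(-s))
print(M)    # close to, but not, the identity matrix

# With phi = 0, the rotations vanish and the two shears cancel exactly.
M0 = matmul(matmul(matmul(R(0.0), H(s)), R(0.0)), H(-s))
print(M0)   # [[1.0, 0.0], [0.0, 1.0]]
```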

We saw in Chapter 5 that to test whether a set of vectors $\{\vc{u}_1, \vc{u}_2, \ldots, \vc{u}_q\}$ is linearly independent, or whether it spans $\R^p$, we have to study a set of $p$ equations in $q$ unknowns. In this chapter, we have seen that matrices can be used to conveniently express systems of linear equations.

Theorem 6.5: Matrices and Linear Independence
The following two statements are equivalent.
1. The column vectors of the matrix $\mx{A}$ are linearly independent.
2. The equation $\mx{A} \vc{x} = \vc{0}$ has only the solution $\vc{x}=\vc{0}$.

According to Definition 5.2, the column vectors $\vc{a}_1, \vc{a}_2, \ldots, \vc{a}_q$ are linearly independent if and only if $\sum_{i=1}^q x_i \vc{a}_i = \vc{0}$ has only the solution $x_1 = x_2 = \dots = x_q = 0$. If $\mx{A}$ is the $p \times q$ matrix with $\vc{a}_1, \vc{a}_2, \ldots, \vc{a}_q$ as columns, then $\mx{A}\vc{x}=\vc{0} \Leftrightarrow \sum_{i=1}^q x_i \vc{a}_i = \vc{0}$. This means that the two statements are equivalent.
$\square$

Theorem 6.6: Matrices and Linear Independence
If there exists a left-inverse $\mx{A}_l^{-1}$ to the matrix $\mx{A}$, then the columns of $\mx{A}$ are linearly independent.

Assume that there exists at least one left-inverse $\mx{A}_l^{-1}$ to the matrix $\mx{A}$. Then we can multiply the matrix equation $\mx{A} \vc{x} = \vc{0}$ with $\mx{A}_l^{-1}$ from the left to obtain $\mx{A}_l^{-1} \mx{A} \vc{x} = \mx{A}_l^{-1} \vc{0}$ or $\mx{I} \vc{x} = \vc{x} = \vc{0}$. This proves the theorem.
$\square$

Theorem 6.7: Matrices and Span
The following two statements are equivalent.
1. The column vectors of the matrix $\mx{A}$ span $\R^p$.
2. The equation $\mx{A} \vc{x} = \vc{y}$ has a solution for every $\vc{y}$.

According to Definition 5.3, the column vectors $\vc{a}_1, \vc{a}_2, \ldots, \vc{a}_q$ span $\R^p$ if and only if $\sum_{i=1}^q x_i \vc{a}_i = \vc{y}$ has a solution for every $\vc{y}$. If $\mx{A}$ is the $p \times q$ matrix with $\vc{a}_1, \vc{a}_2, \ldots, \vc{a}_q$ as columns, then $\mx{A}\vc{x}=\vc{y} \Longleftrightarrow \sum_{i=1}^q x_i \vc{a}_i = \vc{y}$. This means that the two statements are equivalent.
$\square$

Theorem 6.8: Matrices and Span
If the columns of $\mx{A}$ span $\R^p$, then there exists a right-inverse $\mx{A}_r^{-1}$ to the matrix $\mx{A}$.

If the columns of $\mx{A}$ span $\R^p$, then the matrix equation $\mx{A}\vc{x} = \vc{y}$ has a solution for every $\vc{y}$. Let $\vc{e}_i$ be the canonical basis vectors, and let $\vc{b}_i$ be a solution to the equation $\mx{A}\vc{b}_i = \vc{e}_i$. Now form the matrix $\mx{B} = (\vc{b}_1 \, \cdots \, \vc{b}_p)$ with the $\vc{b}_i$ as columns. Then $\mx{A} \mx{B} = (\mx{A} \vc{b}_1 \, \cdots \, \mx{A} \vc{b}_p) = (\vc{e}_1 \, \cdots \, \vc{e}_p) = \mx{I}$, so $\mx{B}$ is a right-inverse to $\mx{A}$.
$\square$

Theorem 6.9:
Let $\mx{A}$ be a square matrix. Then the following statements are equivalent:
1. The column vectors of the matrix $\mx{A}$ span $\R^p$.
2. The row vectors of the matrix $\mx{A}$ span $\R^p$.
3. The equation $\mx{A} \vc{x} = \vc{y}$ has a solution for every $\vc{y}$.
4. The column vectors of the matrix $\mx{A}$ are linearly independent.
5. The row vectors of the matrix $\mx{A}$ are linearly independent.
6. The equation $\mx{A} \vc{x} = \vc{0}$ has only the solution $\vc{x}=\vc{0}$.
7. The matrix $\mx{A}$ is invertible.

We have already shown that statements $(i)$ and $(iii)$ are equivalent (Theorem 6.7) and that $(iv)$ and $(vi)$ are equivalent (Theorem 6.5). Since the matrix is square, Theorem 5.5 gives the equivalence between $(i)$ and $(iv)$. We now need to link $(i), (iii), (iv), (vi)$ to $(vii)$. That the columns span $\R^p$ $(i)$ gives the existence of a right-inverse according to Theorem 6.8. Since the matrix is square, it must then be invertible $(vii)$, and it also has a left-inverse. Theorem 6.6 then gives that the columns are linearly independent $(iv)$. Thus $(i), (iii), (iv), (vi)$, and $(vii)$ are all equivalent. Finally, since $\mx{A}$ is invertible whenever $\mx{A}^{\T}$ is, the statements on the row vectors, $(ii)$ and $(v)$, follow.
$\square$

Section 5.10 has already outlined how change of basis can be done, where the second basis $\{\hat{\vc{e}}_1,\hat{\vc{e}}_2\}$ can be expressed in terms of a first basis $\{\vc{e}_1,\vc{e}_2\}$, i.e.,
 \begin{align} \hat{\vc{e}}_1 = b_{11} \vc{e}_1 + b_{21} \vc{e}_2,\\ \hat{\vc{e}}_2 = b_{12} \vc{e}_1 + b_{22} \vc{e}_2, \end{align} (6.78)
where we have changed from $x_{ij}$ to $b_{ij}$ above, compared to Equation (5.102). As we will see, this is much more powerful when expressed in matrix form, since you can then get the transform in the other direction by using the inverse of the corresponding matrix. In the following theorem, the change of basis is generalized to any dimension and written in matrix form.

Theorem 6.10: Change of Base
Given the following relationship between the two $\R^n$ bases $\{\vc{e}_1,\vc{e}_2,\dots,\vc{e}_n\}$ and $\{\hat{\vc{e}}_1,\hat{\vc{e}}_2,\dots,\hat{\vc{e}}_n\}$,
 \begin{align} \hat{\vc{e}}_1 &= b_{11} \vc{e}_1 + b_{21} \vc{e}_2 + \dots + b_{n,1} \vc{e}_n, \\ \hat{\vc{e}}_2 &= b_{12} \vc{e}_1 + b_{22} \vc{e}_2 + \dots + b_{n,2} \vc{e}_n, \\ &\dots \\ \hat{\vc{e}}_n &= b_{1,n} \vc{e}_1 + b_{2,n} \vc{e}_2 + \dots + b_{n,n} \vc{e}_n, \end{align} (6.79)
where a particular vector $\vc{v}$ has the following two representations
 \begin{align} \vc{v} &= v_1\vc{e}_1 + v_2\vc{e}_2 + v_3\vc{e}_3+\dots+v_n\vc{e}_n = \\ &= \hat{v}_1\hat{\vc{e}}_1 + \hat{v}_2\hat{\vc{e}}_2 + \hat{v}_3\hat{\vc{e}}_3+\dots+\hat{v}_n\hat{\vc{e}}_n, \end{align} (6.80)
and let $\mx{B}$ be the matrix with $\hat{\vc{e}}_i$ as column vectors, then it holds that
 \begin{align} \vc{v} = \mx{B}\hat{\vc{v}} = \begin{pmatrix} b_{11} & b_{12} & \dots & b_{1,n} \\ b_{21} & b_{22} & \dots & b_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n,1} & b_{n,2} & \dots & b_{n,n} \end{pmatrix} \vc{\hat{v}}, \end{align} (6.81)
where $\vc{v}=(v_1,v_2,v_3,\dots,v_n)$ and $\hat{\vc{v}}=(\hat{v}_1,\hat{v}_2,\hat{v}_3,\dots,\hat{v}_n)$.

We start by rewriting the representation of $\vc{v}$ as
 \begin{align} \vc{v} &= \sum_{i=1}^n v_i\vc{e}_i \\ &= \sum_{i=1}^n \hat{v}_i\hat{\vc{e}}_i, \end{align} (6.82)
and $\hat{\vc{e}}_j$ as
 \begin{align} \hat{\vc{e}}_j = \sum_{i=1}^n b_{ij}\vc{e}_i, \end{align} (6.83)
and then insert Equation (6.83) into line 2 of Equation (6.82), which results in
 \begin{align} \vc{v} = \sum_{j=1}^n \hat{v}_j\hat{\vc{e}}_j = \sum_{j=1}^n \hat{v}_j \Biggl( \sum_{i=1}^n b_{ij}\vc{e}_i \Biggr) = \sum_{i=1}^n \Biggl( \sum_{j=1}^n b_{ij} \hat{v}_j \Biggr) \vc{e}_i. \end{align} (6.84)
Since there is only a single set of coordinates (see Theorem 2.5 for the three-dimensional case) for a vector in a basis, and we have the following two ways of representing $\vc{v}$,
 \begin{align} \vc{v} &= \sum_{i=1}^n v_i\vc{e}_i \\ \vc{v} &= \sum_{i=1}^n \Biggl( \sum_{j=1}^n b_{ij} \hat{v}_j \Biggr) \vc{e}_i, \end{align} (6.85)
it must hold that
 \begin{gather} v_i = \sum_{j=1}^n b_{ij} \hat{v}_j \\ \Longleftrightarrow \\ \vc{v} = \mx{B}\hat{\vc{v}}. \end{gather} (6.86)
This concludes the proof.
$\square$

Note that per Equation (6.81), we have $\vc{v} = \mx{B}\vc{\hat{v}}$, which also means that $\vc{\hat{v}} = \mx{B}^{-1} \vc{v}$, assuming that $\mx{B}$ is invertible. It is often this expression that one is interested in.
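As a small numeric illustration of Equation (6.81), consider a hypothetical (not necessarily orthonormal) basis with $\hat{\vc{e}}_1 = 2\vc{e}_1 + \vc{e}_2$ and $\hat{\vc{e}}_2 = \vc{e}_1 + \vc{e}_2$; the helper names below are our own.

```python
def matvec(B, v):
    # Matrix-vector product for lists of lists.
    return [sum(B[i][j] * v[j] for j in range(len(v))) for i in range(len(B))]

# The columns of B hold the hatted basis vectors expressed in {e_1, e_2}.
B = [[2.0, 1.0],
     [1.0, 1.0]]

v_hat = [1.0, 1.0]        # coordinates in the hatted basis
v = matvec(B, v_hat)      # the same vector's coordinates in {e_1, e_2}
print(v)                  # [3.0, 2.0]

# Going the other way uses B^{-1} (written out by hand here, det(B) = 1).
B_inv = [[1.0, -1.0],
         [-1.0, 2.0]]
print(matvec(B_inv, v))   # back to [1.0, 1.0]
```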

A special set of matrices are the so-called orthogonal matrices, which have the convenient property that the inverse can be obtained by simply taking the transpose. They are important because, for example, they can describe a change of basis between two orthonormal bases. We start with the following definition.

Definition 6.18: Orthogonal Matrix
An orthogonal matrix $\mx{B}$ is a square matrix where the column vectors constitute an orthonormal basis.
Note that since the column vectors constitute an orthonormal basis (Definition 3.5), it would make more sense to call the matrix orthonormal, but the term "orthogonal matrix" has a lot of legacy, so we will use it here as well. Given this short definition, the following theorem can be proved.

Theorem 6.11: Orthogonal Matrix Equivalence
The following are equivalent
$\spc (i)$ The matrix $\mx{B}$ is orthogonal.
$\spc (ii)$ The column vectors of $\mx{B}$ constitute an orthonormal basis.
$\spc (iii)$ The row vectors of $\mx{B}$ constitute an orthonormal basis.
$\spc (iv)$ $\mx{B}^{-1} = \mx{B}^{\T}$

$(i)$ and $(ii)$ are simply the definition, and so need not be proved.
Next, we show that $(iv)$ and $(ii)$ are equivalent. Assume that we have an orthonormal basis (Definition 3.5) consisting of the following set of vectors, $\{\vc{b}_1,\vc{b}_2,\vc{b}_3,\dots,\vc{b}_n\}$. Let us put them as column vectors in a matrix, $\mx{B}$, i.e.,
 \begin{align} \mx{B} &= \begin{pmatrix} | & | & \dots & | \\ \vc{b}_{1} & \vc{b}_{2} & \dots & \vc{b}_{n} \\ | & | & \dots & | \\ \end{pmatrix}, \end{align} (6.87)
and now the transpose of $\mx{B}$ multiplied by itself then becomes
 \begin{align} \mx{B}^{\T} \mx{B} &= \begin{pmatrix} -\,\,\, \vc{b}_{1}^\T - \\ -\,\,\, \vc{b}_{2}^\T - \\ -\,\,\, \vc{b}_{3}^\T - \\ \vdots \\ -\,\,\, \vc{b}_{n}^\T - \end{pmatrix} \begin{pmatrix} | & | & | & \dots & | \\ \vc{b}_{1} & \vc{b}_{2} & \vc{b}_{3} & \dots & \vc{b}_{n} \\ | & | & | & \dots & | \\ \end{pmatrix}\\ &= \begin{pmatrix} \vc{b}_{1}^\T \vc{b}_{1} & \vc{b}_{1}^\T \vc{b}_{2} & \vc{b}_{1}^\T \vc{b}_{3} & \dots & \vc{b}_{1}^\T \vc{b}_{n} \\ \vc{b}_{2}^\T \vc{b}_{1} & \vc{b}_{2}^\T \vc{b}_{2} & \vc{b}_{2}^\T \vc{b}_{3} & \dots & \vc{b}_{2}^\T \vc{b}_{n} \\ \vc{b}_{3}^\T \vc{b}_{1} & \vc{b}_{3}^\T \vc{b}_{2} & \vc{b}_{3}^\T \vc{b}_{3} & \dots & \vc{b}_{3}^\T \vc{b}_{n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \vc{b}_{n}^\T \vc{b}_{1} & \vc{b}_{n}^\T \vc{b}_{2} & \vc{b}_{n}^\T \vc{b}_{3} & \dots & \vc{b}_{n}^\T \vc{b}_{n} \\ \end{pmatrix} \\ &= \begin{pmatrix} 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \\ \end{pmatrix} = \mx{I}, \end{align} (6.88)
where we arrive at the identity matrix, $\mx{I}$, since the definition of the orthonormal basis states that $\vc{b}_i\cdot\vc{b}_i=1$ and $\vc{b}_i\cdot\vc{b}_j=0$ when $i \neq j$. This means that $\mx{B}^{-1} = \mx{B}^{\T}$ when the column vectors constitute an orthonormal basis. Thus we have proved that $(ii) \rightarrow (iv)$. The proof for $(iv) \rightarrow (ii)$ is similar and left to the reader.
Now that we have proved the equivalence of $(i)$, $(ii)$, and $(iv)$, it only remains to show that $(iii)$ also is equivalent to either of $(i)$, $(ii)$, and $(iv)$. At this point, we know that $\mx{B}\mx{B}^\T=\mx{I}$ due to $(iv)$. We introduce $\mx{A} = \mx{B}^\T$ and evaluate $\mx{A}\mx{A}^\T$, i.e.,
 $$\mx{A}\mx{A}^\T = \mx{B}^\T(\mx{B}^\T)^{\T} = \mx{B}^\T\mx{B}=\mx{I}.$$ (6.89)
This means that $\mx{A}$ must be orthogonal since $\mx{A}\mx{A}^\T=\mx{I}$, i.e., per $(ii)$ we know its column vectors constitute an orthonormal basis. However, since $\mx{A} = \mx{B}^\T$, we know also that the row vectors of $\mx{B}$ constitute an orthonormal basis, which concludes the proof.
$\square$

This means that the inverse of an orthogonal matrix is simply its transpose, that is, $\mx{A}^{-1} = \mx{A}^{\T}$, which is very convenient since the transpose is trivial to compute, while the inverse of an arbitrary square matrix usually is not.

Example 6.13: Inverse of Rotation Matrix
A rotation matrix (see Section 6.4) by $\phi$ radians around the $z$-axis is
 $$\mx{R}_z(\phi) = \begin{pmatrix} \cos \phi & -\sin \phi & 0 \\ \sin \phi & \hid{-}\cos \phi & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$ (6.90)
As explained in Theorem 6.11, the inverse of an orthogonal matrix is its transpose. Hence, we should get $\mx{R}_z(\phi) \mx{R}^{\T}_z(\phi)=\mx{I}$ as a result if the rotation matrix is orthogonal, i.e.,
 \begin{align} \mx{R}_z(\phi)\mx{R}^{\T}_z(\phi) &= \begin{pmatrix} \cos \phi & -\sin \phi & 0 \\ \sin \phi & \hid{-}\cos \phi & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \hid{-}\cos \phi & \sin \phi & 0 \\ -\sin \phi & \cos \phi & 0 \\ 0 & 0 & 1 \end{pmatrix} \\ &= \begin{pmatrix} \cos \phi\cos \phi + \sin \phi\sin \phi & \cos \phi\sin \phi-\sin \phi\cos \phi & 0 \\ \sin \phi\cos \phi - \cos \phi\sin \phi & \sin \phi\sin \phi+ \cos \phi\cos \phi& 0 \\ 0 & 0 & 1 \end{pmatrix} \\ &= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} =\mx{I}, \end{align} (6.91)
where we have used the fact that $\cos^{2}\phi+\sin^{2}\phi=1$. This shows that $\mx{R}_z(\phi)$ is an orthogonal matrix, and in fact, it is possible to show that all rotation matrices are orthogonal.
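Equation (6.91) can also be verified numerically. The short Python sketch below (helper names are our own) builds $\mx{R}_z(\phi)$ and checks that $\mx{R}_z(\phi)\mx{R}^{\T}_z(\phi)$ equals the identity up to floating-point rounding.

```python
import math

def Rz(phi):
    # Rotation by phi radians around the z-axis, Equation (6.90).
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(col) for col in zip(*A)]

R = Rz(1.2)
P = matmul(R, transpose(R))
ok = all(abs(P[i][j] - (1.0 if i == j else 0.0)) < 1e-12
         for i in range(3) for j in range(3))
print(ok)    # True
```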

Theorem 6.12: Orthogonality and Length Preservation
If $\mx{B}$ is an orthogonal matrix, then $\ln{\mx{B}\vc{v}} = \ln{\vc{v}}$, and vice versa, i.e., the transform preserves length.

As seen in Example 6.6, the dot product between two vectors, $\vc{u}$ and $\vc{v}$, can be expressed as $\vc{u}\cdot\vc{v} = \vc{u}^\T \vc{v}$. Note that
 \begin{gather} \ln{\mx{B}\vc{v}} = \ln{\vc{v}} \\ \Longleftrightarrow \\ \ln{\mx{B}\vc{v}}^2 = \ln{\vc{v}}^2 \\ \Longleftrightarrow \\ \bigl(\mx{B}\vc{v}\bigr) \cdot \bigl(\mx{B}\vc{v}\bigr) = \vc{v} \cdot \vc{v} \\ \Longleftrightarrow \\ \bigl(\mx{B}\vc{v}\bigr)^\T \bigl(\mx{B}\vc{v}\bigr) = \vc{v}^\T \vc{v}. \end{gather} (6.92)
The left-hand side of the last expression can be simplified using the rules for transposes,
 \begin{gather} \bigl(\mx{B}\vc{v}\bigr)^\T \bigl(\mx{B}\vc{v}\bigr) = \vc{v}^\T\underbrace{\mx{B}^\T \mx{B}}_{\mx{I}} \vc{v} = \vc{v}^\T \mx{I} \vc{v} = \vc{v}^\T \vc{v} = \vc{v}\cdot \vc{v}, \end{gather} (6.93)
where we have exploited that $\mx{B}$ is orthogonal, and hence $\mx{B}^\T \mx{B} = \mx{I}$, and this concludes the proof.
$\square$

Theorem 6.13:
If $\mx{B}$ is an orthogonal matrix, then $(\mx{B} \vc{u}) \cdot(\mx{B}\vc{v}) = \vc{u}\cdot\vc{v}$, i.e., it does not matter in which basis one performs the dot product.

As we saw in Example 3.6, it holds that $\vc{u}\cdot \vc{v} = \frac{1}{4}\bigl( \ln{ \vc{u} + \vc{v} }^2 - \ln{ \vc{u} - \vc{v} }^2 \bigr)$, which means that
 \begin{align} (\mx{B} \vc{u}) \cdot(\mx{B}\vc{v}) &= \frac{1}{4}\Bigl( \ln{ \mx{B}\vc{u} + \mx{B}\vc{v} }^2 - \ln{ \mx{B}\vc{u} - \mx{B}\vc{v} }^2\Bigr) \\ &= \frac{1}{4}\Bigl( \ln{ \mx{B}\bigl(\vc{u} + \vc{v}\bigr) }^2 - \ln{ \mx{B}\bigl(\vc{u} - \vc{v}\bigr) }^2\Bigr) \\ &= \frac{1}{4}\Bigl( \ln{ \vc{u} + \vc{v} }^2 - \ln{ \vc{u} - \vc{v} }^2\Bigr) = \vc{u}\cdot\vc{v}, \\ \end{align} (6.94)
where we used Theorem 6.12 in the next-to-last row to arrive at the last row, i.e., orthogonal matrices preserve length.
$\square$
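Theorems 6.12 and 6.13 are easy to check numerically for a concrete orthogonal matrix, e.g., a $2\times 2$ rotation. The sketch below (helper names are our own) confirms that both lengths and dot products are preserved up to rounding.

```python
import math

def matvec(B, v):
    return [sum(B[i][j] * v[j] for j in range(2)) for i in range(2)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

phi = 0.7
c, s = math.cos(phi), math.sin(phi)
B = [[c, -s], [s, c]]      # orthogonal: the columns are orthonormal

u, v = [1.0, 2.0], [-3.0, 0.5]
Bu, Bv = matvec(B, u), matvec(B, v)

len_ok = abs(math.sqrt(dot(Bv, Bv)) - math.sqrt(dot(v, v))) < 1e-12
dot_ok = abs(dot(Bu, Bv) - dot(u, v)) < 1e-12
print(len_ok, dot_ok)    # True True
```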

Theorem 6.14:
If $\mx{A}$ and $\mx{B}$ are orthogonal matrices, then $\mx{A}\mx{B}$ is orthogonal as well.

Theorem 6.12 states that a matrix is orthogonal if and only if it preserves length. Since
 \begin{align} || \mx{A}\mx{B}\vc{v} || = || \mx{A} \left(\mx{B}\vc{v}\right)|| = || \mx{B}\vc{v} || = || \vc{v} ||, \end{align} (6.95)
we know that $\mx{A}\mx{B}$ preserves length, i.e., it is orthogonal.
$\square$

Example 6.14: Orthogonal Matrix Multiplication Visualization
As we saw in Theorem 6.14, $\mx{A}\mx{B}$ is orthogonal if $\mx{A}$ and $\mx{B}$ are orthogonal. In the following interactive illustration, we will visualize the matrix-matrix multiplication between two orthogonal matrices.
Interactive Illustration 6.9: In this interactive illustration, we visualize an orthogonal matrix, $\mx{A}$, of size $2\times 2$. The two column vectors of $\mx{A}$, i.e., $\textcolor{#aa0000}{\vc{a}_{,1}}$ and $\textcolor{#009000}{\vc{a}_{,2}}$, are shown as vectors. The topmost slider can be used to give $\mx{A}$ a different appearance. Click/press Forward to continue to the next step. Remember to click/press Reset if you want the original matrices restored.
Interactive Illustration 6.9: In this final step, we visualize the column vectors, $\hid{\vc{m}_{,1}}$ and $\hid{\vc{m}_{,2}}$, of $\hid{\mx{M} =\mx{A}\mx{B}}$. Note that per \linkref{Theorem}{theo_mtx_ortho_times_ortho_is_ortho}, $\hid{\mx{M}}$ is also orthogonal since $\hid{\mx{A}}$ and $\hid{\mx{B}}$ are orthogonal. As an exercise, the reader is encouraged to play with the sliders to get a deeper understanding of what is going on. For example, when will the result become the identity matrix, $\hid{\mx{I}}$? Also, does it make sense to assume that both $\hid{\mx{A}}$ and $\hid{\mx{B}}$ are rotation matrices? Can this be seen in $\hid{\mx{M}}$?

Example 6.15: Change of Base using Orthogonal Matrices
Assume we have two orthonormal bases, $\{\vc{e}_1, \vc{e}_2\}$ and $\{\hat{\vc{e}}_1, \hat{\vc{e}}_2\}$, defined as
 $$\vc{e}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \vc{e}_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \ \ \mathrm{and} \ \ \hat{\vc{e}}_1 = \begin{pmatrix} \frac{\sqrt{3}}{2} \\ \frac{1}{2} \end{pmatrix}, \hat{\vc{e}}_2 = \begin{pmatrix} -\frac{1}{2} \\ \frac{\sqrt{3}}{2} \end{pmatrix}.$$ (6.96)
It is easy to check that $\ln{\vc{e}_i}=1$ and $\ln{\hat{\vc{e}}_i}=1$ for $i\in\{1,2\}$, and that $\vc{e}_1 \cdot \vc{e}_2 = 0$ and $\hat{\vc{e}}_1 \cdot \hat{\vc{e}}_2 = 0$, i.e., we have two orthonormal bases, per Definition 3.5. Now, it is possible to use Theorem 6.10 to find the matrix that expresses the change between these bases. However, an alternative way when dealing with orthonormal bases is simply to imagine the vectors $(1,0)$ and $(0,1)$ expressed in $\{\hat{\vc{e}}_1, \hat{\vc{e}}_2\}$ and then figure out how to set up a matrix that transforms those vectors into $\{\vc{e}_1, \vc{e}_2\}$. It is rather simple, as seen below.
 \begin{align} \underbrace{ \begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2}\\  \frac{1}{2} & \frac{\sqrt{3}}{2} \end{pmatrix} }_{\mx{A}} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \frac{\sqrt{3}}{2} \\ \frac{1}{2} \end{pmatrix} \ \ \mathrm{and} \ \ \underbrace{ \begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2}\\  \frac{1}{2} & \frac{\sqrt{3}}{2} \end{pmatrix} }_{\mx{A}} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -\frac{1}{2} \\ \frac{\sqrt{3}}{2} \\ \end{pmatrix} \end{align} (6.97)
Now assume that we have a vector $\vc{v}=(3/4, 1/2)$ in $\{\hat{\vc{e}}_1, \hat{\vc{e}}_2\}$ and that we want to transform that vector into $\{\vc{e}_1, \vc{e}_2\}$. It is simply a matter of multiplying with the matrix $\mx{A}$ above, i.e.,
 $$\mx{A}\vc{v} = \begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2}\\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{pmatrix} \begin{pmatrix} \frac{3}{4} \\ \frac{1}{2} \end{pmatrix} = \begin{pmatrix} \frac{3\sqrt{3}-2}{8} \\ \frac{2\sqrt{3}+3}{8} \end{pmatrix}.$$ (6.98)
So $\mx{A}$ can be used to transform a vector in $\{\hat{\vc{e}}_1, \hat{\vc{e}}_2\}$ so that it instead is expressed in $\{\vc{e}_1, \vc{e}_2\}$. This indicates that $\mx{A}^\T$ can be used to transform a vector in $\{\vc{e}_1, \vc{e}_2\}$ so that it instead is expressed in $\{\hat{\vc{e}}_1, \hat{\vc{e}}_2\}$.
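The multiplication in Equation (6.98) can be carried out numerically as a check; the helper names below are our own.

```python
import math

def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]

r = math.sqrt(3.0)
A = [[r / 2.0, -0.5],
     [0.5, r / 2.0]]       # the matrix A from Equation (6.97)

v = [0.75, 0.5]            # the vector expressed in the hatted basis
w = matvec(A, v)           # the same vector expressed in {e_1, e_2}

print(w)
print([(3.0 * r - 2.0) / 8.0, (2.0 * r + 3.0) / 8.0])   # the same numbers
```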

Now assume we have yet another orthonormal basis, $\{\bar{\vc{e}}_1, \bar{\vc{e}}_2\}$, whose corresponding transform matrix is $\mx{B}$. This means that to take a vector from $\{\hat{\vc{e}}_1, \hat{\vc{e}}_2\}$ to $\{\bar{\vc{e}}_1, \bar{\vc{e}}_2\}$, you first use $\mx{A}$ to get to $\{\vc{e}_1, \vc{e}_2\}$ and then $\mx{B}^\T$ to get to $\{\bar{\vc{e}}_1, \bar{\vc{e}}_2\}$. If we apply this to a vector, $\vc{v}$, it would be expressed as
 $$\vc{v}' = \mx{B}^\T \mx{A} \vc{v},$$ (6.99)
where $\vc{v}'$ is expressed in $\{\bar{\vc{e}}_1, \bar{\vc{e}}_2\}$. The first steps of this are visualized in Interactive Illustration 6.10.
Interactive Illustration 6.10: In the basis $\{\hat{\vc{e}}_1, \hat{\vc{e}}_2\}$, we have a vector as indicated by the gray circle (arrow omitted for the sake of clarity), and its coordinates are indicated using the dashed lines. Click/touch Forward to continue.
Interactive Illustration 6.10: Here, we have introduced a new basis, namely, $\hid{\{\vc{e}_1, \vc{e}_2\}}$, and when performing a change of basis, the task is to find a transform (described using a matrix) which when applied to a vector in $\hid{\{\hat{\vc{e}}_1, \hat{\vc{e}}_2\}}$ gives us the coordinates of the same vector in $\hid{\{\vc{e}_1, \vc{e}_2\}}$.

In Section 6.1, we had one image to the left (the original) and another image to the right. The right image is the left image manipulated in a certain way using a matrix. A TV or computer display consists of a number (often millions) of pixels (picture elements), and each pixel has a red, a green, and a blue component. For each pixel, we can put these into a vector, i.e.,
 $$\vc{p} = \begin{pmatrix} r\\ g\\ b \end{pmatrix},$$ (6.100)
where $r$ is the red component of the pixel, $g$ is the green component, and $b$ is the blue component. In the example, we also had a $3\times 3$ matrix $\mx{M}$ that was applied to each pixel. This was done as
 \begin{align} \vc{p}' = \begin{pmatrix} r'\\ g'\\ b' \end{pmatrix}= \mx{M}\vc{p} = \begin{pmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{pmatrix} \begin{pmatrix} r\\ g\\ b \end{pmatrix}. \end{align} (6.101)
Note that we have used $r$, $g$, $b$ for the vector components in this example instead of $x$, $y$, $z$ or $p_x$, $p_y$, $p_z$. This is to make it clearer what we are manipulating. Using the rules for matrix-vector multiplication, we see, for example, that $r' = m_{11} r+ m_{12}g+ m_{13}b$, etc. Hence, if we use the identity matrix, $\mx{I}$, the original image is obtained, and using $\mx{M} = \left( \begin{smallmatrix} 1 & 1 & 1 \\ 0 & 0 & 0\\ 0 & 0 & 0 \end{smallmatrix} \right)$, we see that $r' = r+g+b$, while $g'=b'=0$, which results in a red image. Finally, if all rows in $\mx{M}$ are identical, we obtain $r'=g'=b'$, i.e., a gray image. There are many more ways to manipulate images, but matrices can take you a long way.
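The pixel transform in Equation (6.101) is a plain matrix-vector product. The sketch below (function name is our own) applies the first matrix suggested in Interactive Illustration 6.1, which sums the three channels into the red component.

```python
def apply(M, p):
    # Matrix-vector product: one transformed (r', g', b') pixel.
    return [sum(M[i][j] * p[j] for j in range(3)) for i in range(3)]

# This matrix puts r + g + b into the red channel and zeroes the others,
# so every pixel of the resulting image is a shade of pure red.
M = [[1.0, 1.0, 1.0],
     [0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]

p = [0.25, 0.5, 0.125]    # some pixel (r, g, b)
print(apply(M, p))        # [0.875, 0.0, 0.0] -> only red remains
```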
