
Enter the matrix.

Matrices are a very powerful tool for manipulating data. As can be seen in the example in Interactive Illustration 6.1, matrices can be used to transform images in different ways. After the theory has been presented, the text will connect back to this example.

As we saw in Chapter 5, a typical linear system of equations can look like

\begin{equation} \begin{cases} \begin{array}{rrrl} 2 & \!\!\!\!\!\! x_1 + 4 &\!\!\!\!\!\!\!x_2 - 2 &\!\!\!\!\!\!x_3 = \hid{-}16, \\ - & \!\!\!\!\!\! x_1 - 7 &\!\!\!\!\!\!x_2 + 2 &\!\!\!\!\!\!x_3 = -27, \\ & 3 &\!\!\!\!\!\!x_2 - 6 &\!\!\!\!\!\!x_3 = -21. \\ \end{array} \end{cases} \tag{6.1} \end{equation}

The coefficients in front of the unknowns $x_1$, $x_2$, and $x_3$ can be collected in a rectangular array of scalars, i.e., a matrix,

\begin{equation} \left(\begin{array}{rrr} 2 & 4 & -2 \\ -1 & -7 & 2 \\ 0 & 3 & -6 \end{array}\right) . \tag{6.2} \end{equation}

Definition 6.1:
Matrix

A matrix, $\mx{A}$, is a two-dimensional array of scalars, $a_{ij}$, with $r$ rows and $c$ columns, e.g.,

\begin{equation} \left( \begin{array}{cccc} a_{11} & a_{12} & \dots & a_{1c} \\ a_{21} & a_{22} & \dots & a_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ a_{r1} & a_{r2} & \dots & a_{rc} \end{array} \right). \tag{6.3} \end{equation}

The size of the matrix is $r \times c$, i.e., the number of rows times the number of columns.
The matrix is called square if $r=c$.
A short-hand notation for the elements in the matrix $\mx{A}$ is $[ a_{ij} ]$, which is convenient
when dealing with operations on matrices, as we will see.

Note that matrices are denoted by upper-case bold letters, e.g., $\mx{A}$, and
as usual, scalars are denoted by lower-case italic letters, e.g., $a_{ij}$, where $i$ is the row
and $j$ is the column of the matrix element.
Sometimes, it is convenient to extract either a particular column of scalars or a
particular row. Note that for an $r \times c$ matrix, $\mx{A}$,
there are $r$ different row vectors and $c$ different column vectors.
For example, if $\mx{A}$ is the matrix in (6.2), its column vectors are

\begin{align} \mx{A}=& \left( \begin{array}{rrr} 2 & 4 & -2 \\ -1 & -7 & 2 \\ 0 & 3 & -6 \end{array} \right) = \left( \begin{array}{ccc} \vert & \vert & \vert \\ \vc{a}_{,1} & \vc{a}_{,2} & \vc{a}_{,3} \\ \vert & \vert & \vert \end{array} \right), \\ &\\ &\\ &\text{where } \vc{a}_{,1} = \left( \begin{array}{rrr} 2 \\ -1\\ 0 \end{array} \right), \ \ \vc{a}_{,2} = \left( \begin{array}{rrr} 4 \\ -7\\ 3 \end{array} \right), \ \ \text{and } \vc{a}_{,3} = \left( \begin{array}{rrr} -2 \\ 2\\ -6 \end{array} \right). \tag{6.4} \end{align}

Similarly, its row vectors are

\begin{align} \mx{A}=& \left( \begin{array}{rrr} 2 & 4 & -2 \\ -1 & -7 & 2 \\ 0 & 3 & -6 \end{array} \right) = \left( \begin{array}{c} -\,\,\, \vc{a}_{1,}^\T - \\ -\,\,\, \vc{a}_{2,}^\T - \\ -\,\,\, \vc{a}_{3,}^\T - \end{array} \right), \\ &\text{where } \vc{a}_{1,} = \left( \begin{array}{rrr} 2 \\ 4\\ -2 \end{array} \right), \ \ \vc{a}_{2,} = \left( \begin{array}{rrr} -1 \\ -7\\ 2 \end{array} \right), \ \ \text{and } \vc{a}_{3,} = \left( \begin{array}{rrr} 0 \\ 3\\ -6 \end{array} \right). \tag{6.5} \end{align}

Definition 6.2:
Row and Column Vectors from a Matrix

The $i$:th row vector of an $r \times c$ matrix, $\mx{A}$, is denoted by $\vc{a}_{i,}^\T$, and it has $c$ scalar elements in it.

The $i$:th column vector of $\mx{A}$ is denoted by $\vc{a}_{,i}$, which has $r$ scalar elements in it. Using vectors, a matrix can thus be written in the following two ways,

\begin{equation} \mx{A} = \bigl(\vc{a}_{,1} \,\,\, \vc{a}_{,2} \,\,\,\dots\,\,\, \vc{a}_{,c}\bigr) = \left( \begin{array}{c} \vc{a}_{1,}^\T\\ \vc{a}_{2,}^\T\\ \vdots \\ \vc{a}_{r,}^\T\\ \end{array} \right). \tag{6.6} \end{equation}

In Equation (6.6), we have omitted the vertical and horizontal lines
used in (6.4) and (6.5).
Note that the row vector is denoted $\vc{a}_{i,}^\T$, i.e., it is a column vector, $\vc{a}_{i,}$,
which has been transposed into a row vector.
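To make the notation concrete, here is a minimal Python sketch, assuming that a matrix is stored as a plain list of rows (this storage choice is ours; the text does not prescribe one). The helper names `row` and `column` are hypothetical; they return the elements of $\vc{a}_{i,}$ and $\vc{a}_{,j}$ using 1-based indices to match the notation above.

```python
def row(A, i):
    """Return the elements of the i:th row vector a_{i,} (1-based index)."""
    return list(A[i - 1])


def column(A, j):
    """Return the elements of the j:th column vector a_{,j} (1-based index)."""
    return [r[j - 1] for r in A]


# The matrix A from (6.4) and (6.5), stored row by row.
A = [[2, 4, -2],
     [-1, -7, 2],
     [0, 3, -6]]

print(column(A, 1))  # [2, -1, 0]
print(row(A, 3))     # [0, 3, -6]
```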

There are also two special constant matrices, called the identity matrix and the zero matrix, which are defined next.

Definition 6.3:
Identity Matrix

An identity matrix, $\mx{I}$, of size $n \times n$ has zeros everywhere except on the diagonal that goes from the upper left to the lower right, where there are ones, i.e.,

\begin{equation} \mx{I} = \left( \begin{array}{cccc} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{array} \right). \tag{6.7} \end{equation}

Hence, a $2 \times 2$ identity matrix is
$\mx{I} =\bigl( \begin{smallmatrix} 1 & 0\\ 0 & 1\end{smallmatrix} \bigr)$,
and a $3\times 3$ identity matrix is
$\mx{I} =\Bigl( \begin{smallmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{smallmatrix} \Bigr)$. As can be seen, we use $\mx{I}$ for both of these matrices.
Next follows the definition of the zero matrix.

Definition 6.4:
Zero Matrix

A zero matrix, $\mx{O}$, has all its matrix elements equal to zero.

In most cases, the size of $\mx{I}$ and $\mx{O}$ can be determined from the context in which they are used;
otherwise, we will state the size explicitly.
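As a small sketch (again assuming matrices are stored as lists of rows, with hypothetical function names), the identity and zero matrices can be built as follows.

```python
def identity(n):
    """n x n identity matrix: ones on the main diagonal, zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def zero(r, c):
    """r x c zero matrix: every element equals zero."""
    # Build each row separately so that the rows are independent lists.
    return [[0] * c for _ in range(r)]


print(identity(2))  # [[1, 0], [0, 1]]
print(zero(2, 3))   # [[0, 0, 0], [0, 0, 0]]
```

Building each row inside the comprehension, rather than writing `[[0] * c] * r`, avoids having all rows refer to the same list object.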

Note also that if the number of columns of a matrix, $\mx{A}$, is 1, i.e., $c=1$, then we have a column vector, and we may denote it as a vector instead, i.e., $\vc{a}$. Furthermore, if the number of rows in a matrix, $\mx{B}$, is 1, i.e., $r=1$, then we have a row vector, and we may denote it by a transposed column vector instead, i.e., $\vc{b}^\T$. Below, we show a $3\times 1$ matrix (column vector) and a $1\times 3$ matrix (row vector), which is written as a transposed column vector.

\begin{equation} \underbrace{ \mx{A} }_{3\times 1} = \left( \begin{array}{c} 3 \\ 2 \\ 6 \end{array} \right) = \underbrace{\vc{a}}_{\begin{array}{c} \text{column} \\ \text{vector} \end{array}} ,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \underbrace{ \mx{B} }_{1\times 3} = \bigl(5\,\,\, 1 \,\,\, 4\bigr) = \underbrace{ \vc{b}^\T }_{\begin{array}{c} \text{transposed} \\ \text{column} \\ \text{vector} \end{array}} \tag{6.8} \end{equation}

As we have already seen in Chapter 2, vectors can be transposed. This means that a column vector becomes a row vector and vice versa. A matrix can also be transposed as defined below.

Definition 6.5:
Matrix Transpose

The transpose of an $r\times c$ matrix, $\mx{A}=[a_{ij}]$, is denoted by $\mx{A}^\T$ (of size $c\times r$) and is formed by making the columns of $\mx{A}$ into rows in $\mx{A}^\T$ (or, equivalently, the rows into columns). This can also be expressed using the shorthand notation for a matrix as

\begin{equation} \mx{A}^\T = [a_{ji}]. \tag{6.9} \end{equation}

Note that the order of the indices has changed from $ij$ to $ji$.

A square matrix is called symmetric if it remains unchanged when it is reflected across the main diagonal (from the upper left to the lower right).
This is summarized in the following definition.

Definition 6.6:
Symmetric Matrix

A square matrix is called symmetric if $\mx{A}=\mx{A}^\T$.

Next follow some examples of matrix transposes.

Example 6.1:
Matrix Transposes

Assume we have the following matrices,

\begin{equation} \mx{A}= \left( \begin{array}{rrr} 1 & 6 & 5 \\ 6 & 2 & 4 \\ 5 & 4 & 3 \end{array} \right) ,\spc\spc \mx{B}= \left( \begin{array}{rr} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{array} \right) ,\spc\spc \mx{C}=\bigl(1\,\,\, 2\,\,\, 3 \bigr). \tag{6.10} \end{equation}

Their corresponding transposes are

\begin{equation} \mx{A}^\T= \left( \begin{array}{rrr} 1 & 6 & 5 \\ 6 & 2 & 4 \\ 5 & 4 & 3 \end{array} \right) ,\spc\spc \mx{B}^\T= \left( \begin{array}{rrr} 1 & 2 & 3 \\ 4 & 5 & 6 \end{array} \right) ,\spc\spc \mx{C}^\T= \left( \begin{array}{c} 1 \\ 2 \\ 3 \end{array} \right). \tag{6.11} \end{equation}

Note that $\mx{A}=\mx{A}^\T$, which means that $\mx{A}$ is
symmetric (Definition 6.6).
It is also worth noting that the size of $\mx{B}$ is $3\times 2$, while the
size of $\mx{B}^\T$ is $2\times 3$, which makes sense, since the rows turn into columns when
transposing. Finally, $\mx{C}$ is a single row, which turns into a single column in $\mx{C}^\T$.
This is similar to how a transposed column vector becomes a row vector.

With these definitions, it is time to attempt to visualize a matrix with geometry. This
is not done in any of the books that we have seen; however, it can provide a deeper understanding in some
cases. See Interactive Illustration 6.2.
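Definitions 6.5 and 6.6 translate directly into code. The following is a minimal sketch, assuming matrices stored as lists of rows; `transpose` and `is_symmetric` are hypothetical names, and the test data are the matrices from Example 6.1.

```python
def transpose(A):
    """Return A^T = [a_ji]: the i:th row of A becomes the i:th column of A^T."""
    # zip(*A) groups the j:th element of every row, i.e., it yields the columns of A.
    return [list(col) for col in zip(*A)]


def is_symmetric(A):
    """A matrix is symmetric if it is square and equal to its transpose."""
    return len(A) == len(A[0]) and A == transpose(A)


A = [[1, 6, 5], [6, 2, 4], [5, 4, 3]]
B = [[1, 4], [2, 5], [3, 6]]
C = [[1, 2, 3]]

print(transpose(B))                    # [[1, 2, 3], [4, 5, 6]]
print(transpose(C))                    # [[1], [2], [3]]
print(is_symmetric(A), is_symmetric(B))  # True False
```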


There are three fundamental operations on matrices. These are

- matrix multiplication by a scalar,
- matrix addition, and
- matrix-matrix multiplication.

Matrix multiplication by a scalar is quite similar to vector multiplication by a scalar (Section 2.3), as can be seen in the following definition.

Definition 6.7:
Matrix Multiplication by a Scalar

A matrix $\mx{A}$ can be multiplied by a scalar $k$ to form a new matrix $\mx{S} = k \mx{A}$, which is of the same size as $\mx{A}$, i.e.,

\begin{equation} \mx{S}= \left( \begin{array}{cccc} s_{11} & s_{12} & \dots & s_{1c} \\ s_{21} & s_{22} & \dots & s_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ s_{r1} & s_{r2} & \dots & s_{rc} \end{array} \right) = \left( \begin{array}{cccc} k a_{11} & k a_{12} & \dots & k a_{1c} \\ k a_{21} & k a_{22} & \dots & k a_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ k a_{r1} & k a_{r2} & \dots & k a_{rc} \end{array} \right). \tag{6.12} \end{equation}

This is more compactly expressed as
$[ s_{ij} ] = k[ a_{ij} ] = [ k a_{ij} ]$.

A short example of multiplying a matrix by a scalar follows.

Example 6.2:
Matrix Multiplication by a Scalar

A $2\times 2$ matrix $\mx{A}$ is

\begin{equation} \mx{A}= \left( \begin{array}{rr} 5 & -2 \\ 3 & 8 \end{array} \right). \tag{6.13} \end{equation}

If we multiply this matrix by the scalar $k=4$, we get

\begin{equation} k\mx{A}=4 \left( \begin{array}{rr} 5 & -2 \\ 3 & 8 \end{array} \right) = \left( \begin{array}{rr} 4\cdot 5 & 4\cdot (-2) \\ 4\cdot 3 & 4\cdot 8 \end{array} \right) = \left( \begin{array}{rr} 20 & -8 \\ 12 & 32 \end{array} \right). \tag{6.14} \end{equation}
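Since $[ s_{ij} ] = [ k a_{ij} ]$, scalar multiplication is a single element-wise operation. A minimal sketch, assuming matrices stored as lists of rows and a hypothetical function name `scale`, reproducing Example 6.2:

```python
def scale(k, A):
    """Multiply every element of A by the scalar k, i.e., return [k a_ij]."""
    return [[k * a for a in row] for row in A]


A = [[5, -2],
     [3, 8]]
print(scale(4, A))  # [[20, -8], [12, 32]]
```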

Matrix addition is also similar to vector addition (Section 2.2).

Definition 6.8:
Matrix Addition

If two matrices $\mx{A}$ and $\mx{B}$ have the same size, then the two matrices can be added to form a new matrix, $\mx{S}=\mx{A} + \mx{B}$, of the same size, where each element $s_{ij}$ is the sum of the elements in the same position in $\mx{A}$ and $\mx{B}$, i.e.,

\begin{equation} \mx{S}= \left( \begin{array}{cccc} s_{11} & s_{12} & \dots & s_{1c} \\ s_{21} & s_{22} & \dots & s_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ s_{r1} & s_{r2} & \dots & s_{rc} \end{array} \right) = \left( \begin{array}{cccc} a_{11}+b_{11} & a_{12}+b_{12} & \dots & a_{1c}+b_{1c} \\ a_{21}+b_{21} & a_{22}+b_{22} & \dots & a_{2c}+b_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ a_{r1}+b_{r1} & a_{r2}+b_{r2} & \dots & a_{rc}+b_{rc} \end{array} \right). \tag{6.15} \end{equation}

This is more compactly expressed as
$[ s_{ij} ] = [ a_{ij} ] + [ b_{ij} ] = [ a_{ij} + b_{ij} ]$.

With the help of Definition 6.7 (matrix multiplication by a scalar)
and the definition of matrix addition, we can easily subtract two matrices. The difference, $\mx{D}$, between
$\mx{A}$ and $\mx{B}$ becomes

\begin{gather} \mx{D} = \mx{A} + (-1)\mx{B} = \mx{A} - \mx{B} \\ \Longleftrightarrow \\ [d_{ij}] = [a_{ij}]+ (-1)[b_{ij}] = [a_{ij}]+ [-b_{ij}] = [a_{ij} - b_{ij}]. \tag{6.16} \end{gather}

A short example on matrix addition follows.

Example 6.3:
Matrix Addition

Assume we have two $2\times 2$ matrices, $\mx{A}$ and $\mx{B}$, given by

\begin{equation} \mx{A}= \left( \begin{array}{rr} 5 & -2 \\ 3 & 8 \end{array} \right) \,\,\,\,\,\,\text{and}\,\,\,\,\,\, \mx{B}= \left( \begin{array}{rr} -1 & 2 \\ 4 & -6 \end{array} \right). \tag{6.17} \end{equation}

The matrix sum, $\mx{S}=\mx{A}+\mx{B}$, is

\begin{equation} \mx{S}=\mx{A}+\mx{B}= \left( \begin{array}{rr} 5 & -2 \\ 3 & 8 \end{array} \right) + \left( \begin{array}{rr} -1 & 2 \\ 4 & -6 \end{array} \right) = \left( \begin{array}{rr} 5-1 & -2+2 \\ 3+4 & 8-6 \end{array} \right) = \left( \begin{array}{rr} 4 & 0 \\ 7 & 2 \end{array} \right). \tag{6.18} \end{equation}
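Matrix addition and subtraction, as in Definition 6.8 and (6.16), can be sketched as below. This assumes matrices stored as lists of rows; `add` and `subtract` are hypothetical names, and raising `ValueError` for mismatched sizes is our own choice for signalling that the sum is undefined.

```python
def add(A, B):
    """Element-wise sum [a_ij + b_ij]; only defined for matrices of the same size."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same size")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def subtract(A, B):
    """A - B computed as A + (-1)B, following (6.16)."""
    return add(A, [[-b for b in row] for row in B])


# The matrices from Example 6.3.
A = [[5, -2], [3, 8]]
B = [[-1, 2], [4, -6]]
print(add(A, B))       # [[4, 0], [7, 2]]
print(subtract(A, B))  # [[6, -4], [-1, 14]]
```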

While matrix multiplication by a scalar and matrix addition are rather straightforward, matrix-matrix multiplication may not seem so at first. However, as we will see, it is an extremely powerful tool. The definition is given below.

Definition 6.9:
Matrix-Matrix Multiplication

If $\mx{A}$ is an $r \times s$ matrix and $\mx{B}$ is an $s\times t$ matrix, then the product matrix $\mx{P}=\mx{A}\mx{B}$, which is an $r \times t$ matrix, is defined as

\begin{align} \mx{P} =& \mx{A}\mx{B} = \left( \begin{array}{ccc} a_{11} & \dots & a_{1s} \\ \vdots & \ddots & \vdots \\ a_{r1} & \dots & a_{rs} \end{array} \right) \left( \begin{array}{ccc} b_{11} & \dots & b_{1t} \\ \vdots & \ddots & \vdots \\ b_{s1} & \dots & b_{st} \end{array} \right)\\ &\\ =& \left( \begin{array}{ccc} \sum_{k=1}^s a_{1k} b_{k1} & \dots & \sum_{k=1}^s a_{1k} b_{kt} \\ \vdots & \ddots & \vdots \\ \sum_{k=1}^s a_{rk} b_{k1} & \dots & \sum_{k=1}^s a_{rk} b_{kt} \end{array} \right) = \left( \begin{array}{ccc} p_{11} & \dots & p_{1t} \\ \vdots & \ddots & \vdots \\ p_{r1} & \dots & p_{rt} \end{array} \right). \tag{6.19} \end{align}

Note that there must be as many columns in $\mx{A}$ as there are rows in $\mx{B}$; otherwise, the
matrix-matrix multiplication is not defined.
The matrix-matrix multiplication is more compactly expressed as
$\bigl[p_{ij}\bigr] = \Bigl[\sum_{k=1}^s a_{ik} b_{kj}\Bigr]$.

The size of the product may be remembered more easily with this rule,

\begin{equation} (r \times \bcancel{s})\, (\bcancel{s} \times t) \longrightarrow (r \times t), \tag{6.20} \end{equation}

i.e., the two inner sizes must be equal, and they cancel.

Note that the sum in the product, $\sum_{k=1}^s a_{ik} b_{kj}$, resembles a dot product in an orthonormal basis (Definition 3.4). Hence, by using Definition 6.2, we can express the matrix-matrix multiplication in terms of the row vectors of $\mx{A}$ and the column vectors of $\mx{B}$ as

\begin{align} \mx{P} = \mx{A}\mx{B} &= \left( \begin{array}{c} -\,\,\, \vc{a}_{1,}^\T - \\ \textcolor{#cc0000}{-}\,\,\, \textcolor{#cc0000}{\vc{a}_{2,}^\T} \textcolor{#cc0000}{-} \\ \vdots \\ -\,\,\, \vc{a}_{r,}^\T - \end{array} \right) \left( \begin{array}{ccccc} \vert & \vert & \textcolor{#cc0000}{\vert} & & \vert \\ \vc{b}_{,1} & \vc{b}_{,2} & \textcolor{#cc0000}{\vc{b}_{,3}} & \dots & \vc{b}_{,t} \\ \vert & \vert & \textcolor{#cc0000}{\vert} & & \vert \end{array} \right) \\ &\\ &= \left( \begin{array}{ccccc} \vc{a}_{1,}\cdot \vc{b}_{,1} & \vc{a}_{1,}\cdot \vc{b}_{,2} & \vc{a}_{1,}\cdot \vc{b}_{,3} & \dots & \vc{a}_{1,}\cdot \vc{b}_{,t} \\ \vc{a}_{2,}\cdot \vc{b}_{,1} & \vc{a}_{2,}\cdot \vc{b}_{,2} & \textcolor{#cc0000}{\vc{a}_{2,}\cdot \vc{b}_{,3}} & \dots & \vc{a}_{2,}\cdot \vc{b}_{,t} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \vc{a}_{r,}\cdot \vc{b}_{,1} & \vc{a}_{r,}\cdot \vc{b}_{,2} & \vc{a}_{r,}\cdot \vc{b}_{,3} & \dots & \vc{a}_{r,}\cdot \vc{b}_{,t} \\ \end{array} \right). \tag{6.21} \end{align}

That is, element $p_{ij}$ of the product is the dot product between the $i$:th row of $\mx{A}$ and the $j$:th column of $\mx{B}$,

\begin{equation} [p_{ij}] = \Biggl[\sum_{k=1}^s a_{ik} b_{kj}\Biggr] = \bigl[ \vc{a}_{i,} \cdot \vc{b}_{,j} \bigr]. \tag{6.22} \end{equation}

Applying Definition 6.9 to a $1\times s$ matrix times an $s\times 1$ matrix gives exactly this sum, so each element can also be written as the product of a row vector and a column vector,

\begin{equation} [p_{ij}] = \Biggl[ \sum_{k=1}^s a_{ik} b_{kj} \Biggr] = \biggl[ \vc{a}_{i,}^\T \vc{b}_{,j} \biggr] = \Biggl[ \Bigl(a_{i1}\spc a_{i2} \spc \dots \spc a_{is} \Bigr) \left( \begin{array}{c} b_{1j}\\ b_{2j}\\ \vdots\\ b_{sj} \end{array} \right) \Biggr]. \tag{6.23} \end{equation}
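The definition $p_{ij} = \sum_{k=1}^s a_{ik} b_{kj}$ can be sketched directly as a triple loop. This is a minimal illustration, assuming matrices stored as lists of rows; `matmul` is a hypothetical name, and the checks use the matrices of Examples 6.4 and 6.5 below. A column vector is represented as an $s\times 1$ matrix, so matrix-vector multiplication is covered by the same function.

```python
def matmul(A, B):
    """Product P = AB of an r x s matrix A and an s x t matrix B (Definition 6.9)."""
    r, s, t = len(A), len(B), len(B[0])
    # The number of columns in A must equal the number of rows in B.
    if any(len(row) != s for row in A):
        raise ValueError("number of columns in A must equal number of rows in B")
    # p_ij is the dot product of row i of A and column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(s)) for j in range(t)]
            for i in range(r)]


M = [[4, 2], [3, -2], [0, -1]]   # 3 x 2
N = [[2, 1, 3], [-1, 5, 8]]      # 2 x 3
print(matmul(M, N))  # [[6, 14, 28], [8, -7, -7], [1, -5, -8]]
```

The result has size $3\times 3$, in agreement with Rule (6.20).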

Example 6.4:
Matrix-Matrix Multiplication

In this example, we will multiply a $3 \times 2$ matrix, $\mx{M}$, by a $2\times 3$ matrix, $\mx{N}$, i.e.,

\begin{gather} \mx{M} = \left( \begin{array}{rr} 4 & 2 \\ 3 & -2 \\ 0 & -1 \end{array} \right) ,\,\,\,\,\,\, \mx{N} = \left( \begin{array}{rrr} 2 & 1 & 3 \\ -1 & 5 & 8 \\ \end{array} \right) \\ \, \\ \mx{M}\mx{N} = \left( \begin{array}{rr} \textcolor{#00aaaa}{4} & \textcolor{#00aaaa}{2} \\ \textcolor{#aaaa00}{3} & \textcolor{#aaaa00}{-2} \\ \textcolor{#aa00aa}{0} & \textcolor{#aa00aa}{-1} \end{array} \right) \left( \begin{array}{rrr} \textcolor{#cc0000}{2} & \textcolor{#00cc00}{1} & \textcolor{#0000cc}{3} \\ \textcolor{#cc0000}{-1} & \textcolor{#00cc00}{5} & \textcolor{#0000cc}{8} \\ \end{array} \right) = \\ \, \\ \left( \begin{array}{rrr} \textcolor{#00aaaa}{4} \cdot \textcolor{#cc0000}{2} + \textcolor{#00aaaa}{2}\cdot (\textcolor{#cc0000}{-1}) & \textcolor{#00aaaa}{4} \cdot \textcolor{#00cc00}{1} + \textcolor{#00aaaa}{2}\cdot \textcolor{#00cc00}{5} & \textcolor{#00aaaa}{4} \cdot \textcolor{#0000cc}{3} + \textcolor{#00aaaa}{2}\cdot \textcolor{#0000cc}{8} \\ \textcolor{#aaaa00}{3} \cdot \textcolor{#cc0000}{2} \textcolor{#aaaa00}{-2}\cdot (\textcolor{#cc0000}{-1}) & \textcolor{#aaaa00}{3} \cdot \textcolor{#00cc00}{1} \textcolor{#aaaa00}{-2}\cdot \textcolor{#00cc00}{5} & \textcolor{#aaaa00}{3} \cdot \textcolor{#0000cc}{3} \textcolor{#aaaa00}{-2}\cdot \textcolor{#0000cc}{8} \\ \textcolor{#aa00aa}{0} \cdot \textcolor{#cc0000}{2} \textcolor{#aa00aa}{-1}\cdot (\textcolor{#cc0000}{-1}) & \textcolor{#aa00aa}{0} \cdot \textcolor{#00cc00}{1} \textcolor{#aa00aa}{-1}\cdot \textcolor{#00cc00}{5} & \textcolor{#aa00aa}{0} \cdot \textcolor{#0000cc}{3} \textcolor{#aa00aa}{-1}\cdot \textcolor{#0000cc}{8} \end{array} \right) = \\ \, \\ \left( \begin{array}{rrr} 6 & 14 & 28 \\ 8 & -7 & -7 \\ 1 & -5 & -8 \end{array} \right). \tag{6.24} \end{gather}

Note that the rows in $\mx{M}$ and the columns in $\mx{N}$ have been color coded here to make it easier to
see what is going on.
The size of the product is (see Rule (6.20))
$(3 \times \bcancel{2})\, (\bcancel{2} \times 3) \longrightarrow (3 \times 3)$, i.e.,
the result is a
$3\times 3$ matrix.

Per Definition 6.9, the matrix-matrix
multiplication, $\mx{M}\mx{N}$, is only defined if $\mx{M}$ is $r \times s$ and $\mx{N}$ is $s\times t$.
This means that the number of columns ($s$) in $\mx{M}$ must be equal to the number of rows ($s$) in $\mx{N}$.
Hence, $r$ and $t$ can be arbitrary values as long as they are $\geq 1$.
If $t=1$, then $\mx{N}$ has only one column, and we do not really have a matrix, but rather a column vector,
as exemplified in (6.8). This means that matrix-vector multiplication
is a special case of matrix-matrix multiplication. An example is given below.

Example 6.5:
Matrix-Vector Multiplication

In this example, a $3\times 3$ matrix, $\mx{M}$, will be multiplied by a three-dimensional vector, $\vc{v}$, i.e.,

\begin{gather} \mx{M} = \left( \begin{array}{rrr} 1 & 0 & 2 \\ 2 & -1 & 3 \\ 4 & -2 & -3 \end{array} \right) ,\,\,\,\,\,\, \vc{v} = \left( \begin{array}{r} -4\\ 5 \\ 6 \end{array} \right) \\ \, \\ \mx{M}\vc{v} = \left( \begin{array}{rrr} \textcolor{#00aaaa}{1} & \textcolor{#00aaaa}{0} & \textcolor{#00aaaa}{2} \\ \textcolor{#aaaa00}{2} & \textcolor{#aaaa00}{-1} & \textcolor{#aaaa00}{3} \\ \textcolor{#aa00aa}{4} & \textcolor{#aa00aa}{-2} & \textcolor{#aa00aa}{-3} \end{array} \right) \left( \begin{array}{r} \textcolor{#cc0000}{-4}\\ \textcolor{#00cc00}{5} \\ \textcolor{#0000cc}{6} \end{array} \right) = \left( \begin{array}{r} \textcolor{#00aaaa}{1} \cdot (\textcolor{#cc0000}{-4}) + \textcolor{#00aaaa}{0}\cdot \textcolor{#00cc00}{5} + \textcolor{#00aaaa}{2} \cdot \textcolor{#0000cc}{6}\\ \textcolor{#aaaa00}{2} \cdot (\textcolor{#cc0000}{-4}) \textcolor{#aaaa00}{- 1}\cdot \textcolor{#00cc00}{5} + \textcolor{#aaaa00}{3} \cdot \textcolor{#0000cc}{6}\\ \textcolor{#aa00aa}{4} \cdot (\textcolor{#cc0000}{-4}) \textcolor{#aa00aa}{- 2}\cdot \textcolor{#00cc00}{5} \textcolor{#aa00aa}{- 3} \cdot \textcolor{#0000cc}{6} \end{array} \right) = \left( \begin{array}{r} 8 \\ 5 \\ -44 \end{array} \right). \tag{6.25} \end{gather}

Note that the rows in $\mx{M}$ and the elements of $\vc{v}$ have been color coded here to make it easier to see what is going on.
The matrix-vector multiplication behaves exactly as the matrix-matrix multiplication, except
that here, the second operand, $\vc{v}$, has only one column.
The size of the product is (see Rule (6.20))
$(3 \times \bcancel{3})\, (\bcancel{3} \times 1) \longrightarrow (3 \times 1)$, i.e.,
the result is a
$3\times 1$ matrix, which is a three-dimensional column vector.

Note that the example in (6.1) is a linear system of equations that can be expressed using
a matrix and two vectors, i.e.,

\begin{gather} \begin{cases} \begin{array}{rrrl} 2 & \!\!\!\!\!\! x_1 + 4 &\!\!\!\!\!\!\!x_2 - 2 &\!\!\!\!\!\!x_3 = \hid{-}16 \\ - & \!\!\!\!\!\! x_1 - 7 &\!\!\!\!\!\!x_2 + 2 &\!\!\!\!\!\!x_3 = -27 \\ & 3 &\!\!\!\!\!\!x_2 - 6 &\!\!\!\!\!\!x_3 = -21 \\ \end{array} \end{cases} \\ \Longleftrightarrow \\ \underbrace{ \left( \begin{array}{rrr} 2 & 4 & -2 \\ -1 & -7 & 2 \\ 0 & 3 & -6 \end{array} \right) }_{\mx{A}} \, \underbrace{ \left( \begin{array}{c} x_1\\ x_2\\ x_3 \end{array} \right) }_{\vc{x}} = \underbrace{ \left( \begin{array}{r} 16\\ -27\\ -21 \end{array} \right) }_{\vc{b}} \\ \Longleftrightarrow \\ \mx{A} \vc{x} = \vc{b}. \tag{6.26} \end{gather}

Matrix-vector multiplication also explains why matrix-matrix multiplication is defined the way it is. Consider the two linear systems

\begin{gather} \begin{cases} \begin{array}{r} z_1 = a_{11} y_1 + a_{12} y_2 \\ z_2 = a_{21} y_1 + a_{22} y_2 \end{array} \end{cases} \,\,\,\,\,\,\,\,\text{and}\,\,\,\,\,\,\,\, \begin{cases} \begin{array}{r} y_1 = b_{11} x_1 + b_{12} x_2 \\ y_2 = b_{21} x_1 + b_{22} x_2 \end{array} \end{cases}. \tag{6.27} \end{gather}

Inserting the expressions for $y_1$ and $y_2$ into the first system gives

\begin{gather} \begin{cases} \begin{array}{r} z_1 = a_{11} (b_{11} x_1 + b_{12} x_2) + a_{12} (b_{21} x_1 + b_{22} x_2) \\ z_2 = a_{21} (b_{11} x_1 + b_{12} x_2) + a_{22} (b_{21} x_1 + b_{22} x_2) \end{array} \end{cases} \\ \Longleftrightarrow \\ \begin{cases} \begin{array}{r} z_1 = (a_{11} b_{11} + a_{12} b_{21}) x_1 + (a_{11} b_{12} + a_{12} b_{22}) x_2 \\ z_2 = (a_{21} b_{11} + a_{22} b_{21}) x_1 + (a_{21} b_{12} + a_{22} b_{22}) x_2 \end{array} \end{cases}. \tag{6.28} \end{gather}

The coefficients in front of $x_1$ and $x_2$ are exactly the elements of the product $\mx{A}\mx{B}$ according to Definition 6.9, where $\mx{A}=[a_{ij}]$ and $\mx{B}=[b_{ij}]$ are $2\times 2$ matrices. Hence, with $\vc{x}=(x_1,x_2)$, $\vc{y}=(y_1,y_2)$, and $\vc{z}=(z_1,z_2)$, we have

\begin{gather} \vc{z}=\mx{A}\vc{y} \spc\spc\text{and}\spc\spc \vc{y}=\mx{B}\vc{x} \\ \Longleftrightarrow \\ \vc{z}=\mx{A}\mx{B}\vc{x}. \tag{6.29} \end{gather}

That is, first transforming $\vc{x}$ by $\mx{B}$ and then transforming the result by $\mx{A}$ is the same as transforming $\vc{x}$ by the product $\mx{A}\mx{B}$.
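Equation (6.29) can be checked numerically. The sketch below uses arbitrary example matrices of our own choosing (they are not from the text), stores vectors as $2\times 1$ matrices, and uses a compact hypothetical `matmul` that implements $p_{ij} = \sum_k a_{ik} b_{kj}$.

```python
def matmul(A, B):
    """Matrix product via p_ij = sum_k a_ik b_kj (Definition 6.9)."""
    # zip(*B) yields the columns of B, so each entry is a row-column dot product.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Arbitrary example values, chosen only for illustration.
A = [[1, 2], [3, 4]]
B = [[0, 1], [-1, 2]]
x = [[5], [7]]  # column vector stored as a 2 x 1 matrix

y = matmul(B, x)                    # y = Bx
z = matmul(A, y)                    # z = Ay = A(Bx)
z_direct = matmul(matmul(A, B), x)  # z = (AB)x
print(z, z_direct)  # both are [[25], [57]]
```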

As we saw in Equation (6.21), the first operand in the matrix-matrix multiplication, $\mx{A}\mx{B}$, can be seen as a set of row vectors, while the second operand can be seen as a set of column vectors. Now, we have just seen that a matrix times a vector produces a vector (if the sizes match). Hence, we can think of the second operand $(\mx{B})$ as a set of column vectors that are transformed by the first operand, namely, $\mx{A}$. This can be expressed as

\begin{align} \mx{P} = \mx{A}\mx{B} = \mx{A} \left( \begin{array}{ccc} \vert & & \vert \\ \vc{b}_{,1} & \dots & \vc{b}_{,t} \\ \vert & & \vert \end{array} \right) = \left( \begin{array}{ccc} \vert & & \vert \\ \mx{A}\vc{b}_{,1} & \dots & \mx{A}\vc{b}_{,t} \\ \vert & & \vert \end{array} \right). \tag{6.30} \end{align}
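The column view in (6.30) suggests a second way to compute the product: transform each column $\vc{b}_{,j}$ by $\mx{A}$ and place the results side by side. A minimal sketch, assuming matrices stored as lists of rows and hypothetical function names, checked against the product from Example 6.4:

```python
def mat_vec(A, v):
    """Matrix-vector product: element i is the dot product of row i of A with v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def matmul_by_columns(A, B):
    """Compute AB by transforming each column b_{,j} of B with A, as in (6.30)."""
    columns = [mat_vec(A, list(col)) for col in zip(*B)]  # A b_{,1}, ..., A b_{,t}
    # The results are columns; zip(*...) turns them back into rows.
    return [list(row) for row in zip(*columns)]


M = [[4, 2], [3, -2], [0, -1]]
N = [[2, 1, 3], [-1, 5, 8]]
print(matmul_by_columns(M, N))  # [[6, 14, 28], [8, -7, -7], [1, -5, -8]]
```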

Next, we present a simple example where the dot product is expressed as matrix-matrix multiplication.

Example 6.6:
Dot Product as Matrix-Matrix Multiplication

Note that since a (column) vector, $\vc{v}$, can be written as a matrix with a single column, $\mx{V}=(\vc{v})$, one can also express the dot product between two vectors, $\vc{u}$ and $\vc{v}$, using matrix-matrix multiplication between two column vectors/matrices as

\begin{gather} \vc{u} \cdot \vc{v} = \vc{u}^\T \vc{v} = \begin{pmatrix} u_1 & u_2 & \dots & u_n \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = \sum_{i=1}^n u_i v_i. \tag{6.31} \end{gather}
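As a quick check of (6.31), the sketch below stores $\vc{u}^\T$ as a $1\times n$ matrix and $\vc{v}$ as an $n\times 1$ matrix, multiplies them with a compact hypothetical `matmul`, and compares the single element of the $1\times 1$ result with the ordinary dot product. The vectors are arbitrary example values.

```python
def matmul(A, B):
    """Matrix product via p_ij = sum_k a_ik b_kj (Definition 6.9)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


u = [1, 2, 3]   # arbitrary example vectors
v = [4, -5, 6]

uT = [u]                  # 1 x n matrix, i.e., the row vector u^T
V = [[vi] for vi in v]    # n x 1 matrix, i.e., the column vector v
P = matmul(uT, V)         # (1 x n)(n x 1) -> 1 x 1
dot = sum(ui * vi for ui, vi in zip(u, v))
print(P, dot)  # [[12]] 12
```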

Example 6.7:
2000 year old Chinese problem

Linear systems of equations were studied in the classic Chinese textbook *Nine Chapters on the Mathematical Art*.
The book was compiled during the first two centuries BCE. Chapter 8 of the book is called *The rectangular array*.
It contains several problems that are systems of linear equations. One of the problems (problem 17) is:

"Now given 5 sheep, 4 dogs, 3 hens and 2 rabbits cost 1496 coins in total; 4 sheep, 2 dogs, 6 hens and 3 rabbits cost 1175 coins; 3 sheep, 1 dog, 7 hens and 5 rabbits cost 958 coins; 2 sheep, 3 dogs, 5 hens and 1 rabbit cost 861 coins. Tell: how much is each of them?"

Introduce a variable for the cost of each type of animal, so that each sheep costs $x_1$ coins, each dog costs $x_2$, each hen costs $x_3$, and each rabbit costs $x_4$ coins. Then the four statements can be written

\begin{equation} \begin{cases} \begin{array}{rrrrrrrl} 5 x_1 & \bfm + & \bfm 4 x_2 & \bfm + & \bfm 3 x_3 & \bfm + & \bfm 2 x_4 & \bfm = 1496, \\ 4 x_1 & \bfm + & \bfm 2 x_2 & \bfm + & \bfm 6 x_3 & \bfm + & \bfm 3 x_4 & \bfm = 1175, \\ 3 x_1 & \bfm + & \bfm x_2 & \bfm + & \bfm 7 x_3 & \bfm + & \bfm 5 x_4 & \bfm = 958, \\ 2 x_1 & \bfm + & \bfm 3 x_2 & \bfm + & \bfm 5 x_3 & \bfm + & \bfm x_4 & \bfm = 861. \\ \end{array} \end{cases} \tag{6.32} \end{equation}

Note that this can be rewritten in matrix-vector form as

\begin{gather} \underbrace{ \begin{pmatrix} 5 & 4 & 3 & 2 \\ 4 & 2 & 6 & 3 \\ 3 & 1 & 7 & 5 \\ 2 & 3 & 5 & 1 \\ \end{pmatrix} }_{\mx{A}} \underbrace{ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} }_{\vc{x}} = \underbrace{ \begin{pmatrix} 1496 \\ 1175 \\ 958 \\ 861 \end{pmatrix} }_{\vc{y}} \\ \Longleftrightarrow \\ \mx{A}\vc{x} = \vc{y}. \tag{6.33} \end{gather}

This can be solved to recover
$x_1$, $x_2$, $x_3$, and $x_4$. Solving such systems of equations is the topic
of Chapter 5, which is about Gaussian elimination.
Solving such systems of equations was also the topic of Chapter 8 of the Chinese textbook.

Next, we will also solve this system of equations using Gaussian elimination. However, to save some space, we do not write out the unknowns, and we write the left and right sides inside one big parenthesis with a vertical line in between. This results in

\begin{align} \left( \begin{array}{rrrr|r} 5 & 4 & 3 & 2 & 1496 \\ 4 & 2 & 6 & 3 & 1175 \\ 3 & 1 & 7 & 5 & 958 \\ 2 & 3 & 5 & 1 & 861 \\ \end{array} \right). \tag{6.34} \end{align}

This notation is also used in Example 6.11.
The rules of Gaussian elimination (Theorem 5.2) can still be applied, since we have
only changed the notation.
For example, we can multiply row 2 by 5, subtract row 1 multiplied by 4, and put the result in row 2. This gives us

\begin{align} \left( \begin{array}{rrrr|r} 5 & 4 & 3 & 2 & 1496 \\ 0 & -6 & 18 & 7 & -109 \\ 3 & 1 & 7 & 5 & 958 \\ 2 & 3 & 5 & 1 & 861 \\ \end{array} \right). \tag{6.35} \end{align}

Next, we want to get rid of the 3 and the 2 in the left column. We multiply row 3 by 2, subtract row 4 times 3, and put
the result in row 4. At the same time, we multiply row 1 by 3, subtract row 3 times 5, and put the resulting row in row 3,
which gives us

\begin{align} \left( \begin{array}{rrrr|r} 5 & 4 & 3 & 2 & 1496 \\ 0 & -6 & 18 & 7 & -109 \\ 0 & 7 & -26 & -19 & -302 \\ 0 & -7 & -1 & 7 & -667 \\ \end{array} \right). \tag{6.36} \end{align}

The next step is to eliminate the 7 and $-7$ in column 2. First, we add row 3 and row 4 and put the result in row 4.
Second, we multiply row 2 by 7, add row 3 times 6, and put the resulting row in row 3, which gives us

\begin{align} \left( \begin{array}{rrrr|r} 5 & 4 & 3 & 2 & 1496 \\ 0 & -6 & 18 & 7 & -109 \\ 0 & 0 & -30 & -65 & -2575 \\ 0 & 0 & -27 & -12 & -969 \\ \end{array} \right). \tag{6.37} \end{align}

Finally, we multiply row 3 by 27, subtract row 4 times 30, and put the result in row 4. This results in

\begin{align} \left( \begin{array}{rrrr|r} 5 & 4 & 3 & 2 & 1496 \\ 0 & -6 & 18 & 7 & -109 \\ 0 & 0 & -30 & -65 & -2575 \\ 0 & 0 & 0 & -1395 & -40455 \\ \end{array} \right), \tag{6.38} \end{align}

which means that the last row says $-1395 x_4 = -40455$, i.e., $x_4 = 29$ coins (per rabbit).
We can use that to recover $x_3 = 23$ coins (per hen), then use that to recover $x_2 = 121$ coins (per dog), and finally
recover $x_1=177$ coins (per sheep).
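The elimination above can be automated. The sketch below is a plain textbook variant (eliminate below each pivot by subtracting a multiple of the pivot row, swap rows if a pivot is zero, then back-substitute), not the exact integer row combinations chosen in the example. It assumes a square, invertible coefficient matrix and uses Python's standard `fractions.Fraction` to keep the arithmetic exact; the function name `solve` is our own.

```python
from fractions import Fraction


def solve(A, y):
    """Solve Ax = y by Gaussian elimination on the augmented matrix (A | y).

    Assumes A is square and invertible; rows are swapped if a pivot is zero.
    """
    n = len(A)
    M = [[Fraction(a) for a in row] + [Fraction(b)] for row, b in zip(A, y)]
    for col in range(n):
        # Find a row with a non-zero pivot in this column and move it into place.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    # Back substitution, starting from the last row.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        rest = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - rest) / M[i][i]
    return x


A = [[5, 4, 3, 2], [4, 2, 6, 3], [3, 1, 7, 5], [2, 3, 5, 1]]
y = [1496, 1175, 958, 861]
print([int(v) for v in solve(A, y)])  # [177, 121, 23, 29]
```

Exact fractions avoid the rounding errors that floating-point division would introduce; for large systems, a numerical library would normally be used instead.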

In this section, a number of useful matrices will be presented. This includes matrices for rotation, scaling, and shearing in both two and three dimensions. It should be noted that these can be generalized to higher dimensions as well.

There are many cases when it is desirable to apply a rotation. One may want to align a vector with another vector or simply rotate an object in order to animate it. The rotation matrix in two dimensions is simple to derive. A point $\vc{p}=(p_x,p_y)$ can be parameterized using a radius, $r$, and an angle, $\theta$, as $\vc{p}=(p_x,p_y)=(r\cos\theta, r\sin\theta)$. Rotating $\vc{p}$ by $\phi$ radians will produce a new vector $\vc{q}=(r\cos(\theta+\phi), r\sin(\theta+\phi))$, which can be rewritten as

\begin{align} \vc{q}=& \begin{pmatrix} r\cos(\theta+\phi) \\ r\sin(\theta+\phi) \end{pmatrix} = \begin{pmatrix} r (\cos\theta \cos\phi - \sin\theta \sin\phi) \\ r (\sin\theta \cos\phi + \cos\theta \sin\phi) \end{pmatrix} \\ =& \underbrace{ \left(\begin{array}{rr} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{array}\right) }_{\mx{R}(\phi)} \underbrace{ \begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix} }_{\vc{p}}, \end{align} | (6.39) |

\begin{align} \vc{q} = \mx{R}(\phi) \vc{p}. \end{align} | (6.40) |
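The derivation above can be spot-checked numerically: rotating $\vc{p}=(r\cos\theta, r\sin\theta)$ with $\mx{R}(\phi)$ should give $(r\cos(\theta+\phi), r\sin(\theta+\phi))$. Below is a minimal Python sketch that stores matrices as lists of rows; the helper names and the test values are our own choices, and the comparison allows for floating-point rounding.

```python
import math

def rotation_2d(phi):
    """The 2x2 rotation matrix R(phi) from (6.39), stored as a list of rows."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s],
            [s,  c]]

def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

# Arbitrary test values: radius r, original angle theta, rotation angle phi.
r, theta, phi = 2.0, 0.4, 1.1
p = [r * math.cos(theta), r * math.sin(theta)]
q = mat_vec(rotation_2d(phi), p)
expected = [r * math.cos(theta + phi), r * math.sin(theta + phi)]

matches = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(q, expected))
print(matches)  # True
```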

Definition 6.10:
Two-Dimensional Rotation Matrix

A $2\times 2$ rotation matrix is defined by

\begin{align} \mx{R}(\phi) = & \left(\begin{array}{rr} \cos \phi & -\sin \phi \\ \sin \phi & \cos \phi \end{array}\right), \end{align} | (6.41) |

where $\phi$ is the number of radians that the matrix rotates by (counter-clockwise).

In Interactive Illustration 6.3, a rotation matrix is applied to the vertices of a rectangle,
and the reader can change the angle, $\phi$, of the rotation matrix, $\mx{R}(\phi)$.

A scaling matrix is very simple in that it has zeroes everywhere except in the diagonal elements.
Hence, each diagonal element will be applied as a multiplicative factor to its respective dimension.

Definition 6.11:
Two-Dimensional Scaling Matrix

A scaling matrix is defined by

\begin{align} \mx{S}(f_x, f_y) = & \begin{pmatrix} f_x & 0 \\ 0 & f_y \end{pmatrix}, \end{align} | (6.42) |

where $f_x$ is the factor that is applied in the $x$-dimension and $f_y$ is applied in the $y$-dimension.

An example of a scaling matrix applied to a rectangle can be seen
in Interactive Illustration 6.4.

The effect of a shear matrix is best seen before it is described in detail, so we recommend that
the reader first explore Interactive Illustration 6.5; a formal definition follows after that.
From that illustration, one can conclude that
shearing is done using an identity matrix where one of the zeroes has been replaced by a
non-zero factor, $s$.

Definition 6.12:
Two-Dimensional Shear Matrix

A two-dimensional shear matrix is defined by either of

\begin{align} \mx{H}_{xy}(s) = \begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix} \ \ \ \mathrm{or} \ \ \ \mx{H}_{yx}(s) = \begin{pmatrix} 1 & 0 \\ s & 1 \end{pmatrix}. \end{align} | (6.43) |

Note that the first subscript of $\mx{H}$ refers to which coordinate is changed, and the second
subscript refers to the coordinate that is scaled by $s$ and added to that first coordinate.

As an example, $\mx{H}_{xy}(s)$ means that the $x$-coordinate will be sheared using $s$ times the $y$-coordinate.
This is, in fact, exactly what is shown in Interactive Illustration 6.5.
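To see the scaling and shear matrices of Definitions 6.11 and 6.12 act on a shape, one can multiply each corner of the unit square by the matrix. This is a small Python sketch; the function names are illustrative and not taken from the text.

```python
def scaling_2d(fx, fy):
    """S(f_x, f_y) from Definition 6.11."""
    return [[fx, 0], [0, fy]]

def shear_xy(s):
    """H_xy(s): the x-coordinate gets s times the y-coordinate added."""
    return [[1, s], [0, 1]]

def shear_yx(s):
    """H_yx(s): the y-coordinate gets s times the x-coordinate added."""
    return [[1, 0], [s, 1]]

def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

# Corners of the unit square, as column vectors.
square = [[0, 0], [1, 0], [1, 1], [0, 1]]

scaled = [mat_vec(scaling_2d(2, 3), v) for v in square]
sheared_x = [mat_vec(shear_xy(0.5), v) for v in square]
sheared_y = [mat_vec(shear_yx(0.5), v) for v in square]

print(scaled)     # [[0, 0], [2, 0], [2, 3], [0, 3]]
print(sheared_x)  # bottom edge stays put; the top edge is shifted by 0.5 in x
print(sheared_y)  # left edge stays put; the right edge is shifted by 0.5 in y
```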

In three dimensions, rotation, scaling, and shearing behave in pretty much the same way as in two dimensions. However, there are more ways to perform each of these. For example, rotation in two dimensions occurred in the plane, but in three dimensions, it is possible to rotate around each axis. Let us start with rotation matrices.

Definition 6.13:
Three-Dimensional Rotation Matrices

Rotations around the three major axes are done using the following three rotation matrices:

\begin{gather} \mx{R}_x(\phi) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos \phi & -\sin \phi \\ 0 & \sin \phi & \hid{-}\cos \phi \end{pmatrix}, \ \ \ \mx{R}_y(\phi) = \begin{pmatrix} \hid{-}\cos \phi & 0 & \sin \phi \\ 0 & 1 & 0 \\ -\sin \phi & 0 & \cos \phi \end{pmatrix}, \\ \mx{R}_z(\phi) = \begin{pmatrix} \cos \phi & -\sin \phi & 0 \\ \sin \phi & \hid{-}\cos \phi & 0 \\ 0 & 0 & 1 \end{pmatrix}, \end{gather} | (6.44) |

where $\phi$ is the number of radians that the matrix rotates by (counter-clockwise).

Note that a rotation matrix around axis $i$, that is, $\mx{R}_i(\phi)$, leaves
the $i$-coordinate unaffected while the two remaining coordinates are rotated around axis $i$.
It should be noted that it is possible to create rotation matrices around any arbitrary axis as
well.

Recall that we use right-handed coordinate systems unless otherwise stated. It is worth noting that $\mx{R}_x$ and $\mx{R}_z$ are similar to the two-dimensional rotation matrix (Definition 6.10), but $\mx{R}_y$ has the signs of the $\sin\phi$-terms flipped. This is because we want positive values of $\phi$ to give positively oriented rotations for all three rotation matrices. For $\mx{R}_x$, for example, imagine that you are looking down the negative $x$-axis, which means you will see the $y$- and $z$-axes positively oriented. This is not the case for the $x$- and $z$-axes when looking down the negative $y$-axis for $\mx{R}_y$. Note that since $\cos(-\phi)=\cos\phi$ and $\sin(-\phi)=-\sin\phi$, rotating in the negative direction flips the signs of the $\sin\phi$-terms only.
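The sign convention can be checked with quarter turns. With a positive angle, $\mx{R}_x$ should turn the $y$-axis into the $z$-axis, $\mx{R}_y$ the $z$-axis into the $x$-axis, and $\mx{R}_z$ the $x$-axis into the $y$-axis, i.e., the cyclic order $x\to y\to z\to x$ of a right-handed system. The following is a minimal Python sketch of the matrices in (6.44); the helper names are our own.

```python
import math

def rot_x(phi):
    c, s = math.cos(phi), math.sin(phi)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(phi):
    c, s = math.cos(phi), math.sin(phi)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]  # note the flipped sin signs

def rot_z(phi):
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def close(u, v, tol=1e-12):
    """Compare two vectors up to floating-point rounding."""
    return all(math.isclose(a, b, abs_tol=tol) for a, b in zip(u, v))

ex, ey, ez = [1, 0, 0], [0, 1, 0], [0, 0, 1]
quarter = math.pi / 2

print(close(mat_vec(rot_x(quarter), ey), ez))  # True: y -> z
print(close(mat_vec(rot_y(quarter), ez), ex))  # True: z -> x
print(close(mat_vec(rot_z(quarter), ex), ey))  # True: x -> y
```

Had $\mx{R}_y$ used the same sign pattern as $\mx{R}_x$ and $\mx{R}_z$, the second check would have produced $(-1, 0, 0)$ instead, i.e., a rotation in the negative direction.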

Scaling matrices in three dimensions are simple extensions of the two-dimensional scaling matrix, since they only add scaling along the $z$-axis. This leads to the following definition.

Definition 6.14:
Three-Dimensional Scaling Matrix

A three-dimensional scaling matrix is defined by

\begin{align} \mx{S}(f_x, f_y,f_z) = & \begin{pmatrix} f_x & 0 & 0\\ 0 & f_y & 0 \\ 0 & 0 & f_z \end{pmatrix}, \end{align} | (6.45) |

where $f_x$, $f_y$, and $f_z$ are the factors that are applied in the $x$-, $y$-, and $z$-dimensions, respectively.

In contrast to scaling in three dimensions, shearing can be done in several different ways.
However, the matrices are still quite similar, as shown in the following definition.

Definition 6.15:
Three-Dimensional Shear Matrices with One Parameter

A three-dimensional shear matrix is defined by, for example,

\begin{align} \mx{H}_{xy}(s) = \begin{pmatrix} 1 & s & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{pmatrix}, \ \ \ \mx{H}_{yx}(s) = \begin{pmatrix} 1 & 0 & 0 \\ s & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \ \ \ \mx{H}_{zy}(s) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & s & 1 \end{pmatrix}. \end{align} | (6.46) |

Similar to shearing in two dimensions (Definition 6.12),
the first subscript of $\mx{H}$ refers to which coordinate is changed, and the second
subscript refers to the coordinate that is scaled by $s$ and added to that first coordinate.
The following combinations are also possible: $\mx{H}_{xz}(s)$, $\mx{H}_{yz}(s)$, and $\mx{H}_{zx}(s)$.

Sometimes, it is useful to have shear matrices with two parameters, which is a simple extension
of the shear matrices with one parameter.

Definition 6.16:
Three-Dimensional Shear Matrices with Two Parameters

A three-dimensional shear matrix with two parameters, $s$ and $t$, is defined by

\begin{align} \mx{H}_{x}(s,t) = \begin{pmatrix} 1 & s & t \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{pmatrix}, \ \ \ \mx{H}_{y}(s,t) = \begin{pmatrix} 1 & 0 & 0 \\ s & 1 & t \\ 0 & 0 & 1 \end{pmatrix}, \ \ \ \mx{H}_{z}(s,t) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ s & t & 1 \end{pmatrix}. \end{align} | (6.47) |

Some of the matrices in this chapter will be used to illustrate some of the properties of matrix
arithmetic, which is the topic of the next section. For example, we will show that rotating first and then
shearing is not the same as shearing first and then rotating. Hence, matrix multiplication is not commutative.
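As a small preview of that claim, take the two-dimensional matrices $\mx{R}(\pi/2)$, which has integer entries, and $\mx{H}_{xy}(1)$. Rotating a vector $\vc{v}$ first and then shearing it means computing $\mx{H}_{xy}(1)\bigl(\mx{R}(\pi/2)\vc{v}\bigr)$, while shearing first means $\mx{R}(\pi/2)\bigl(\mx{H}_{xy}(1)\vc{v}\bigr)$. The plain Python sketch below compares the two combined matrices; the helper `mat_mul` is our own.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

R = [[0, -1], [1, 0]]   # R(pi/2): cos = 0, sin = 1
H = [[1, 1], [0, 1]]    # H_xy(1)

rotate_then_shear = mat_mul(H, R)  # applied to v as H (R v)
shear_then_rotate = mat_mul(R, H)  # applied to v as R (H v)
print(rotate_then_shear)  # [[1, -1], [1, 0]]
print(shear_then_rotate)  # [[0, -1], [1, 1]]
```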

Theorem 6.1:
Matrix Arithmetic Properties

In the following, we assume that the sizes of the matrices are such that the operations are defined.

\begin{equation} \begin{array}{llr} (i) & k(l\mx{A}) = (kl)\mx{A} & \spc\text{(associativity)} \\ (ii) & (k+l)\mx{A} = k\mx{A} +l\mx{A} & \spc\text{(distributivity)} \\ (iii) & k(\mx{A}+\mx{B}) = k\mx{A} +k\mx{B} & \spc\text{(distributivity)} \\ (iv) & \mx{A} + \mx{B} = \mx{B} + \mx{A} & \spc\text{(commutativity)} \\ (v) & \mx{A}+(\mx{B}+\mx{C})=(\mx{A}+\mx{B})+\mx{C} & \spc\text{(associativity)} \\ (vi) & \mx{A}+ (-1)\mx{A} = \mx{O} & \spc\text{(additive inverse)} \\ (vii) & \mx{A}(\mx{B}+\mx{C})=\mx{A}\mx{B}+\mx{A}\mx{C} & \spc\text{(distributivity)} \\ (viii) & (\mx{A}+\mx{B})\mx{C}=\mx{A}\mx{C}+\mx{B}\mx{C} & \spc\text{(distributivity)} \\ (ix) & (\mx{A}\mx{B})\mx{C}=\mx{A}(\mx{B}\mx{C}) & \spc\text{(associativity)} \\ (x) & \mx{I}\mx{A}=\mx{A}\mx{I}=\mx{A} & \spc\text{(multiplicative one)} \\ (xi) & (k\mx{A})^\T=k\mx{A}^\T & \spc\text{(transpose rule 1)} \\ (xii) & (\mx{A}+\mx{B})^\T=\mx{A}^\T+\mx{B}^\T & \spc\text{(transpose rule 2)} \\ (xiii) & (\mx{A}^\T)^\T=\mx{A} & \spc\text{(transpose rule 3)} \\ (xiv) & (\mx{A}\mx{B})^\T=\mx{B}^\T\mx{A}^\T & \spc\text{(transpose rule 4)} \\ \end{array} \end{equation} | (6.48) |

In addition, we have the following trivial set of rules: $1\mx{A}=\mx{A}$, $0\mx{A}=\mx{O}$,
$k\mx{O}=\mx{O}$, and $\mx{A}+\mx{O}=\mx{A}$.

All of these are rather simple to prove by finding an expression for the matrix element at location $ij$ on the left-hand side of the equal sign and then checking that the same expression appears on the right-hand side. Rules $(i)-(vi)$ and $(x)-(xiii)$ are trivial to prove, and so are left as exercises to the reader. In the following, we will use the fact that an element in the product, $\mx{P}=\mx{A}\mx{B}$, can be expressed using dot products (see Equation (6.22)), i.e., $[p_{ij}] = \bigl[ \vc{a}_{i,} \cdot \vc{b}_{,j} \bigr]$.

$(vii)$ On the left-hand side of the equal sign, a matrix element becomes $\bigl[ \vc{a}_{i,} \cdot (\vc{b}_{,j}+\vc{c}_{,j}) \bigr]$, and on the right-hand side it becomes $\bigl[ \vc{a}_{i,} \cdot \vc{b}_{,j} \bigr] + \bigl[ \vc{a}_{i,} \cdot \vc{c}_{,j} \bigr]=$ $\bigl[ \vc{a}_{i,} \cdot (\vc{b}_{,j}+\vc{c}_{,j}) \bigr]$, where we have used the law of distributivity for dot products (Theorem 3.1). This shows that a matrix element is the same on both sides of the equal sign.

$(viii)$ Since matrix-matrix multiplication is not commutative (i.e., in general $\mx{A}\mx{B}\neq\mx{B}\mx{A}$; see Example 6.9), we need to prove this one as well. A matrix element on the left-hand side of the equal sign becomes $\bigl[ (\vc{a}_{i,}+\vc{b}_{i,}) \cdot \vc{c}_{,j} \bigr]$, while the right-hand side becomes $\bigl[ \vc{a}_{i,} \cdot \vc{c}_{,j} \bigr] + \bigl[ \vc{b}_{i,} \cdot \vc{c}_{,j} \bigr]=$ $\bigl[ (\vc{a}_{i,} + \vc{b}_{i,}) \cdot \vc{c}_{,j} \bigr]$, again by the distributivity of the dot product. This concludes the proof of $(viii)$.

$(ix)$ We start with the left-hand side, and after some work we find that a matrix element at position $ij$ can be expressed as $\sum_k (\vc{a}_{i,}\cdot \vc{b}_{,k})c_{kj}$. Similarly, the right-hand side becomes $\sum_k a_{ik}(\vc{b}_{k,}\cdot \vc{c}_{,j})$, and after expanding these two expressions, we can see that they are exactly the same. This is left as an exercise.

$(xiv)$ The left-hand side is $(\mx{A}\mx{B})^\T=$ $\bigl[ \vc{a}_{i,}^\T \vc{b}_{,j} \bigr]^\T=$ $\bigl[ \vc{a}_{j,}^\T \vc{b}_{,i} \bigr]$, where we have used Equation (6.23) for the matrix multiplication in the first step, and transposition (swapping $i$ and $j$) in the second step. For the right-hand side, we use a similar strategy: $\mx{B}^\T \mx{A}^\T=$ $\bigl[ b_{ij} \bigr]^\T \bigl[ a_{ij} \bigr]^\T=$ $\bigl[ b_{ji} \bigr] \bigl[ a_{ji} \bigr]=$ $\bigl[ \vc{b}_{,i}^\T \vc{a}_{j,} \bigr]$, since row $i$ of $\mx{B}^\T$ is column $i$ of $\mx{B}$, and column $j$ of $\mx{A}^\T$ is row $j$ of $\mx{A}$. In the last expression, we can change the order of the vectors, similar to how $\vc{a}\cdot \vc{b} = \vc{b} \cdot \vc{a}$, which makes the left and right sides equal, and that concludes the proof.

$\square$
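The rules that involve products can also be spot-checked numerically. The Python sketch below tests $(ix)$ and $(xiv)$ for one choice of small integer matrices, picked here for illustration (with integers, the arithmetic is exact). Such a check is only a sanity test, not a proof.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

A = [[1, 2, 0], [-1, 3, 4]]        # 2 x 3
B = [[2, 1], [0, -1], [5, 3]]      # 3 x 2
C = [[1, -2, 3], [4, 0, 1]]        # 2 x 3

assoc_ok = mat_mul(mat_mul(A, B), C) == mat_mul(A, mat_mul(B, C))            # rule (ix)
transpose_ok = transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A))  # rule (xiv)
print(assoc_ok, transpose_ok)  # True True
```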

Note that $(v)$ and $(ix)$ are particularly convenient, since they allow us to write $\mx{A}+\mx{B}+\mx{C}$ and $\mx{A}\mx{B}\mx{C}$ without any parentheses: the way the terms or factors are grouped does not matter. (For the product, the order of the factors still matters, since matrix multiplication is not commutative.) This is similar to how $1+2+3$ and $5\cdot 3 \cdot 2$ do not need any parentheses.

Next, we present an example with $\mx{A}\mx{B}=\mx{O}$ (the zero matrix, see Definition 6.4) where neither $\mx{A}$ nor $\mx{B}$ is $\mx{O}$.

Example 6.8:
Matrix-Matrix Multiplication equals to the Zero Matrix

Assume we have the matrices, $\mx{A}$ and $\mx{B}$, as shown below, and that $\mx{A}\mx{B}$ needs to be calculated.

\begin{align} \mx{A}= \left( \begin{array}{rr} 2 & 1 \\ 6 & 3 \end{array} \right) \spc\spc\text{and}\spc\spc \mx{B}= \left( \begin{array}{rr} -2 & 3 \\ 4 & -6 \end{array} \right) \\ \mx{A}\mx{B}= \left( \begin{array}{rr} 2\cdot(-2) + 1\cdot 4 & 2\cdot 3 + 1\cdot (-6) \\ 6\cdot(-2) + 3\cdot 4 & 6\cdot 3 + 3\cdot (-6) \end{array} \right) = \left( \begin{array}{rr} 0 & 0 \\ 0 & 0 \end{array} \right) =\mx{O}. \end{align} | (6.49) |

As can be seen, we have $\mx{A}\mx{B}=\mx{O}$, and still, neither $\mx{A}$ nor $\mx{B}$ is $\mx{O}$.
Note that this has some consequences that are not always intuitive.
For example, assume we have the following,

\begin{gather} \mx{A}\mx{B} = \mx{A}\mx{C} \\ \Longleftrightarrow \\ \mx{A}\mx{B} - \mx{A}\mx{C} = \mx{O} \\ \Longleftrightarrow \\ \mx{A}(\mx{B} -\mx{C}) = \mx{O}, \end{gather} | (6.50) |

and that we also know that $\mx{A}$ is not equal to $\mx{O}$.
Normally, when we see expressions such as the first row in
(6.50), we are used to concluding that $\mx{B}=\mx{C}$. That is certainly possible; however,
as we saw in (6.49), neither
of the factors in a matrix-matrix multiplication needs to be the zero matrix, $\mx{O}$, in order for
the product to be the zero matrix. In this case, it means that $\mx{B} -\mx{C}$ does not need
to be the zero matrix, so we cannot conclude that $\mx{B}=\mx{C}$.

It is also worth noting that Theorem 6.1 does not
contain the rule $\mx{A}\mx{B}=\mx{B}\mx{A}$, and the reason is that this is not true
in most cases. This is shown in the following example.
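Before turning to Example 6.9, the zero product (6.49) can be double-checked with a few lines of Python (the helper `mat_mul` is our own). The sketch also shows that the opposite order, $\mx{B}\mx{A}$, does not give the zero matrix, and that with $\mx{C}=\mx{O}$ we get $\mx{A}\mx{B}=\mx{A}\mx{C}$ even though $\mx{B}\neq\mx{C}$.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[2, 1], [6, 3]]
B = [[-2, 3], [4, -6]]
C = [[0, 0], [0, 0]]  # the zero matrix O

print(mat_mul(A, B))                   # [[0, 0], [0, 0]]
print(mat_mul(B, A))                   # [[14, 7], [-28, -14]]
print(mat_mul(A, B) == mat_mul(A, C))  # True, although B != C
```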

Example 6.9:
Matrix-Matrix Multiplication is not Commutative

Assume we have two matrices

\begin{equation} \mx{A} = \left( \begin{array}{rr} 1 & 2 \\ 3 & -1 \\ \bstwo -2 & 0 \\ 0 & 4 \end{array} \right) \spc\spc\text{and}\spc\spc \mx{B} = \left( \begin{array}{rrrr} 5 & -3 & 0 & -2 \\ 3 & 1 & 2 & 6 \end{array} \right). \end{equation} | (6.51) |

As can be seen, the size of $\mx{A}$ is $4\times 2$ and the size of $\mx{B}$ is $2\times 4$.
Interestingly, both $\mx{A}\mx{B}$ and $\mx{B}\mx{A}$ are defined. Both these are shown below.

\begin{align} \mx{A}\mx{B} &= \left( \begin{array}{rr} 1 & 2 \\ 3 & -1 \\ \bstwo -2 & 0 \\ 0 & 4 \end{array} \right) \left( \begin{array}{rrrr} 5 & -3 & 0 & -2 \\ 3 & 1 & 2 & 6 \end{array} \right) \\ &= \left( \begin{array}{rrrr} 1\cdot 5 + 2\cdot 3 & 1\cdot(-3) + 2\cdot 1 & 1\cdot 0 + 2\cdot 2 & 1\cdot (-2) + 2\cdot 6\\ 3\cdot 5 - 1\cdot 3 & 3\cdot (-3) - 1\cdot 1 & 3\cdot 0 - 1\cdot 2 & 3\cdot (-2) - 1\cdot 6\\ -2\cdot 5 + 0\cdot 3 & -2\cdot (-3) + 0\cdot 1 & -2\cdot 0 + 0\cdot 2 & -2\cdot (-2) + 0\cdot 6\\ 0\cdot 5 + 4\cdot 3 & 0\cdot (-3) + 4\cdot 1 & 0\cdot 0 + 4\cdot 2 & 0\cdot (-2) + 4\cdot 6 \end{array} \right) \\ &= \left( \begin{array}{rrrr} 11 & -1 & 4 & 10 \\ 12 & -10 & -2 & -12 \\ \bstwo -10 & 6 & 0 & 4 \\ 12 & 4 & 8 & 24 \end{array} \right) \end{align} | (6.52) |

Using Rule (6.20) to find the size of the
product, $\mx{A}\mx{B}$, we find:
$(4 \times \bcancel{2})\, (\bcancel{2} \times 4) \longrightarrow (4 \times 4)$, i.e., the size
is $4\times 4$. Next, we compute $\mx{B}\mx{A}$, i.e.,

\begin{align} \mx{B}\mx{A} &= \left( \begin{array}{rrrr} 5 & -3 & 0 & -2 \\ 3 & 1 & 2 & 6 \end{array} \right) \left( \begin{array}{rr} 1 & 2 \\ 3 & -1 \\ \bstwo -2 & 0 \\ 0 & 4 \end{array} \right) \\ &= \left( \begin{array}{rr} 5 \cdot 1 - 3 \cdot 3 + 0 \cdot (-2) - 2 \cdot 0 & 5 \cdot 2 - 3 \cdot (-1) + 0 \cdot 0 - 2 \cdot 4 \\ 3 \cdot 1 + 1 \cdot 3 + 2 \cdot (-2) + 6 \cdot 0 & 3 \cdot 2 + 1 \cdot (-1) + 2 \cdot 0 + 6 \cdot 4 \end{array} \right) \\ &= \left( \begin{array}{rr} -4 & 5 \\ 2 & 29 \end{array} \right) \end{align} | (6.53) |

The size of $\mx{B}\mx{A}$ is:
$(2 \times \bcancel{4})\, (\bcancel{4} \times 2) \longrightarrow (2 \times 2)$, i.e., the size
is $2\times 2$.
Since the two products do not even have the same size, it is clear that in general we have $\mx{A}\mx{B} \neq \mx{B}\mx{A}$, i.e., matrix-matrix
multiplication is not commutative.

In Interactive Illustration 6.6, we show an example of how the order of two
matrices in a matrix multiplication affects a rectangle when its vertices are interpreted
as vectors and multiplied by a matrix.
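The products in (6.52) and (6.53) are easy to reproduce in Python, using the matrix $\mx{B}$ with rows $(5, -3, 0, -2)$ and $(3, 1, 2, 6)$ from those computations. The helper `mat_mul` is our own; it refuses to multiply when the number of columns of the left factor differs from the number of rows of the right one, which is the size requirement behind Rule (6.20).

```python
def mat_mul(a, b):
    """Product of matrices stored as lists of rows; needs len(a[0]) == len(b)."""
    if len(a[0]) != len(b):
        raise ValueError("inner sizes do not match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[1, 2], [3, -1], [-2, 0], [0, 4]]  # 4 x 2
B = [[5, -3, 0, -2], [3, 1, 2, 6]]      # 2 x 4

AB = mat_mul(A, B)
BA = mat_mul(B, A)
print(len(AB), len(AB[0]))  # 4 4
print(len(BA), len(BA[0]))  # 2 2
print(BA)                   # [[-4, 5], [2, 29]]
```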

Now that we have seen that matrix addition, matrix multiplication by a scalar, and matrix-matrix multiplication exist, it is reasonable to ask whether there also is a division-like operator. That is, how can we solve for $\mx{X}$ in

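\begin{equation} \mx{A}\mx{X} = \mx{B}. \end{equation} | (6.54) |

For scalars, the corresponding equation is

\begin{equation} ax = b. \end{equation} | (6.55) |

If $a\neq 0$, its solution is

\begin{equation} x = \frac{b}{a} = a^{-1}b, \end{equation} | (6.56) |

where $a^{-1}$ is the multiplicative inverse of $a$, i.e., $a^{-1}a = 1$.
We will focus only on inverses of square matrices, i.e., if $\mx{A}$ is square, we want to find a solution to Equation (6.54) of the form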

\begin{equation} \mx{X} = \mx{A}^{-1}\mx{B}. \end{equation} | (6.57) |

Definition 6.17:
Matrix Inverse

The square matrix $\mx{A}$ is said to be invertible if there exists a matrix $\mx{A}^{-1}$, which is called the inverse of $\mx{A}$, such that

\begin{equation} \mx{A}\mx{A}^{-1} = \mx{A}^{-1}\mx{A} = \mx{I}. \end{equation} | (6.58) |

For $\mx{A}\mx{A}^{-1} = \mx{I}$, $\mx{A}^{-1}$ is called a right-side inverse, while
$\mx{A}^{-1}$ is called a left-side inverse if $\mx{A}^{-1}\mx{A} = \mx{I}$.

In the following, we will present a theorem that shows that if a matrix is invertible, then there is only
one such inverse, and it works as both a left-side and a right-side inverse.

Theorem 6.2:
Matrix Inverse Existence

Let us call the left-side inverse, $\mx{A}_l^{-1}$ and the right-side inverse, $\mx{A}_r^{-1}$, i.e., $\mx{A}_l^{-1} \mx{A} = \mx{I}$ and $\mx{A}\mx{A}_r^{-1} = \mx{I}$. Then the following holds

$(i)$ If $\mx{A}_l^{-1} \mx{A} = \mx{A}\mx{A}_r^{-1} = \mx{I}$ then $\mx{A}_l^{-1} =\mx{A}_r^{-1}$.

$(ii)$ There is only one matrix inverse, $\mx{A}^{-1}$, to a matrix $\mx{A}$, i.e., $\mx{A}^{-1}= \mx{A}_l^{-1} =\mx{A}_r^{-1}$.

$(i)$ Assume that both $\mx{A}_l^{-1} \mx{A} = \mx{I}$ and $\mx{A}\mx{A}_r^{-1} = \mx{I}$ hold. Then it follows that

\begin{equation} \mx{A}_l^{-1} = \mx{A}_l^{-1} \mx{I} =\mx{A}_l^{-1} (\mx{A}\mx{A}_r^{-1}) = (\mx{A}_l^{-1} \mx{A})\mx{A}_r^{-1} = \mx{I}\mx{A}_r^{-1} = \mx{A}_r^{-1}. \end{equation} | (6.59) |

$(ii)$ Assume that there are two different inverse matrices, $\mx{R}$ and $\mx{L}$, which can replace $\mx{A}^{-1}$ in Equation (6.58), i.e., we have $\mx{L}\mx{A}=\mx{I}$ and $\mx{A}\mx{R}=\mx{I}$. However, from $(i)$ above it follows that $\mx{L}=\mx{R}$, and as a consequence, there cannot be two different matrix inverses.

$\square$

Now that this theorem has been proved, we can omit the left-side and right-side matrix notation, and simply use $\mx{A}^{-1}$ as the matrix inverse notation. Next, a matrix inverse example follows.

Example 6.10:
Matrix Inverses 1

For a two-dimensional rotation matrix $\mx{R}(\phi)$ (see Definition 6.10), it is reasonable to believe that the inverse rotation matrix is $\mx{R}(-\phi)$, i.e., a rotation in the opposite direction. If this is in fact true, then $\mx{R}(\phi)\mx{R}(-\phi)=\mx{I}$ per Definition 6.17. Hence, let us multiply these matrices and see if the result is the identity matrix. This is done below.

\begin{align} \mx{R}(\phi)\mx{R}(-\phi) = & \begin{pmatrix} \cos \phi & -\sin \phi \\ \sin \phi & \hid{-}\cos \phi \end{pmatrix} \begin{pmatrix} \cos (-\phi) & -\sin(-\phi) \\ \sin (-\phi) & \hid{-}\cos(-\phi) \end{pmatrix} \\ =& \begin{pmatrix} \cos \phi & -\sin \phi \\ \sin \phi & \hid{-}\cos \phi \end{pmatrix} \begin{pmatrix} \hid{-}\cos (\phi) & \sin(\phi) \\ -\sin (\phi) & \cos(\phi) \end{pmatrix} \\ =& \begin{pmatrix} \cos^2 \phi + \sin^2\phi & \cos\phi \sin \phi - \sin \phi\cos\phi\\ \sin \phi\cos\phi - \cos\phi \sin \phi & \sin^2\phi + \cos^2 \phi \end{pmatrix} \\ =& \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \mx{I}. \end{align} | (6.60) |

Here, we have used $\cos(-\phi) = \cos\phi$, $\sin(-\phi) = -\sin\phi$, and $\cos^2 \phi + \sin^2\phi=1$.

In the same manner, one can show that $\mx{H}^{-1}_{xy}(s) = \mx{H}_{xy}(-s)$, for example, for shear matrices. For scaling matrices, $\mx{S}^{-1}(f_x,f_y) = \mx{S}(1/f_x, 1/f_y)$. These two latter cases are left as exercises. In Interactive Illustration 6.7, these matrices and their inverses are illustrated.

In Example 6.10, we saw that for rotation, scaling, and shear matrices, it is
straightforward to obtain the corresponding matrix inverse. However, so far, nothing has been said about how
this can be done for general matrices. There are several different ways to do this. For two-dimensional
matrices, it is particularly simple, as shown in the following theorem.
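Before stating that theorem, the result of Example 6.10 can be spot-checked numerically. The Python sketch below multiplies $\mx{R}(\phi)$ by $\mx{R}(-\phi)$, and $\mx{H}_{xy}(s)$ by $\mx{H}_{xy}(-s)$, for one arbitrarily chosen angle and shear factor, and tests whether the result is $\mx{I}$ up to floating-point rounding (the tolerance `1e-12` and all helper names are our own choices).

```python
import math

def rotation_2d(phi):
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s], [s, c]]

def shear_xy(s):
    return [[1, s], [0, 1]]

def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def is_identity(m, tol=1e-12):
    """True if m equals the identity matrix up to the tolerance."""
    n = len(m)
    return all(math.isclose(m[i][j], 1.0 if i == j else 0.0, abs_tol=tol)
               for i in range(n) for j in range(n))

phi, s = 0.7, 1.5  # arbitrary example values
print(is_identity(mat_mul(rotation_2d(phi), rotation_2d(-phi))))  # True
print(is_identity(mat_mul(shear_xy(s), shear_xy(-s))))            # True
```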


Theorem 6.3:
Two-Dimensional Matrix Inverse

For a $2\times 2$ matrix, $\mx{A}$, the inverse is

\begin{equation} \mx{A}^{-1} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}^{-1} = \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{pmatrix} \hid{-}a_{22} & -a_{12} \\ -a_{21} & \hid{-}a_{11} \end{pmatrix}, \end{equation} | (6.61) |

if $a_{11}a_{22} - a_{12}a_{21} \neq 0$; otherwise, the inverse does not exist.

Let us test what happens when $\mx{A}^{-1}$ and $\mx{A}$ are multiplied, i.e.,

\begin{align} \mx{A}^{-1}\mx{A} &= \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{pmatrix} \hid{-}a_{22} & -a_{12} \\ -a_{21} & \hid{-}a_{11} \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \\ &= \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{pmatrix} a_{22}a_{11} - a_{12}a_{21} & a_{22}a_{12}-a_{12}a_{22} \\ -a_{21}a_{11} + a_{11}a_{21} & -a_{21}a_{12}+a_{11}a_{22} \end{pmatrix} \\ &= \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{pmatrix} a_{11}a_{22} - a_{12}a_{21} & 0 \\ 0 & a_{11}a_{22} - a_{12}a_{21} \end{pmatrix} = \mx{I}. \end{align} | (6.62) |

A similar computation shows that $\mx{A}\mx{A}^{-1}=\mx{I}$ as well.

$\square$

In Chapter 7, which is about determinants, it will become clear that the denominator in Theorem 6.3 is, in fact, the determinant of the $2\times 2$ matrix.
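Theorem 6.3 translates directly into code. The following Python sketch (the function name `inverse_2x2` is hypothetical) uses exact fractions and returns `None` when $a_{11}a_{22} - a_{12}a_{21} = 0$.

```python
from fractions import Fraction

def inverse_2x2(a):
    """Inverse of a 2x2 matrix by Theorem 6.3, or None if it does not exist."""
    (a11, a12), (a21, a22) = a
    d = a11 * a22 - a12 * a21  # the denominator in (6.61)
    if d == 0:
        return None
    d = Fraction(d)
    return [[a22 / d, -a12 / d],
            [-a21 / d, a11 / d]]

print(inverse_2x2([[2, 1], [5, 3]]))  # entries 3, -1, -5, 2 (as Fractions)
print(inverse_2x2([[2, 1], [6, 3]]))  # None, since 2*3 - 1*6 = 0
```

The second matrix is $\mx{A}$ from Example 6.8. It cannot have an inverse: if it did, multiplying $\mx{A}\mx{B}=\mx{O}$ from the left by $\mx{A}^{-1}$ would give $\mx{B}=\mx{O}$, which is not the case there.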

Next, we will show how the inverse can be computed using Gaussian elimination (Chapter 5).

Example 6.11:

The inverse of the matrix

\begin{align} \mx{A} &= \left( \begin{array}{rrr} 5 & 3 & 1\\ 1 & 0 & -2 \\ 1 & 2 & 5 \end{array} \right) \end{align} | (6.63) |

is desired. Now, let us set up the following system of equations,

\begin{gather} \mx{A}\vc{x} = \vc{y} \\ \Longleftrightarrow \\ \mx{A}\vc{x} = \mx{I}\vc{y}, \end{gather} | (6.64) |

where $\mx{A}\vc{x}$ and $\mx{I}\vc{y}$ are column vectors with three elements. If we multiply both sides from the left with the
inverse of $\mx{A}$, we get

\begin{gather} \mx{A}^{-1}\mx{A}\vc{x}=\mx{A}^{-1}\mx{I}\vc{y} \\ \Longleftrightarrow \\ \mx{I}\vc{x}=\mx{A}^{-1}\vc{y}, \end{gather} | (6.65) |

that is, we have "moved" the identity matrix in Equation (6.64) from the right side to the left side, and at the same time, the inverse matrix is suddenly alone on
the right side. Hence, if we can get the identity matrix on the left side, then we will have
the matrix inverse on the other. Writing out the entire matrix structures gives us

\begin{align} \left( \begin{array}{rrr} 5 & 3 & 1\\ 1 & 0 & -2 \\ 1 & 2 & 5 \end{array} \right) \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}. \end{align} | (6.66) |

As we saw in Equation (6.26), a linear system of
equations can be expressed in matrix form. This is rather convenient, since we do not
need to write out $x_1$, $x_2$, and $x_3$ all the time. Note that instead of having a constant
vector on the right side of the equal sign, we now have the identity matrix times $\vc{y}$. However, Gaussian
elimination can still be done here due to the rules from Theorem 5.2.
To save some paper, one may even just assume that $x_1$, $x_2$, $x_3$, $y_1$, $y_2$, and $y_3$
are implicitly there, and avoid writing them out. The abbreviated form of the system of equations just above
is then written as

\begin{align} \left( \begin{array}{rrr|rrr} 5 & 3 & 1 & 1 & 0 & 0\\ 1 & 0 & -2 & 0 & 1 & 0\\ 1 & 2 & 5 & 0 & 0 & 1 \end{array} \right). \end{align} | (6.67) |

We can now perform the usual operations as done for Gaussian elimination. For example, we can subtract
the middle row from the bottom row and place the result in the bottom row, which results in

\begin{align} \left( \begin{array}{rrr|rrr} 5 & 3 & 1 & 1 & 0 & 0\\ 1 & 0 & -2 & 0 & 1 & 0\\ 0 & 2 & 7 & 0 & -1 & 1 \end{array} \right). \end{align} | (6.68) |

Next, multiply the middle row by $5$, subtract the first row from the result, and place that in the middle row,
which results in

\begin{align} \left( \begin{array}{rrr|rrr} 5 & 3 & 1 & 1 & 0 & 0\\ 0 & -3 & -11 & -1 & 5 & 0\\ 0 & 2 & 7 & 0 & -1 & 1 \end{array} \right). \end{align} | (6.69) |

Note that these operations are applied to the right side as well.
Then, we multiply the middle row by $2$ and the bottom row by $3$, add them, and place
the result in the bottom row, i.e.,

\begin{align} \left( \begin{array}{rrr|rrr} 5 & 3 & 1 & 1 & 0 & 0\\ 0 & -3 & -11 & -1 & 5 & 0\\ 0 & 0 & -1 & -2 & 7 & 3 \end{array} \right). \end{align} | (6.70) |

In the next step, we do several operations at once. The bottom row is added to the first row,
the bottom row is multiplied by $-11$ and added to the middle row, and the bottom row
is multiplied by $-1$ and simply stored in the bottom row, which results in

\begin{align} \left( \begin{array}{rrr|rrr} 5 & 3 & 0 & -1 & 7 & 3\\ 0 & -3 & 0 & 21 & -72 & -33\\ 0 & 0 & 1 & 2 & -7 & -3 \end{array} \right). \end{align} | (6.71) |

Next, we add the middle row to the top row, and then divide the top row by $5$ and the middle row by $-3$,
which gives us

\begin{align} \left( \begin{array}{rrr|rrr} 1 & 0 & 0 & 4 & -13 & -6\\ 0 & 1 & 0 & -7 & 24 & 11\\ 0 & 0 & 1 & 2 & -7 & -3 \end{array} \right). \end{align} | (6.72) |

Hence, the inverse matrix, $\mx{A}^{-1}$, is

\begin{align} \mx{A}^{-1} &= \left( \begin{array}{rrr} 4 & -13 & -6\\ -7 & 24 & 11\\ 2 & -7 & -3 \end{array} \right). \end{align} | (6.73) |

In practice, one may collapse some of the steps above in order to further save space. Note that this
method works for any size of a square matrix. Finally, if the left side cannot be reduced to the identity matrix (for example, because a row of zeros appears), then
the matrix is not invertible. The reader may try to multiply $\mx{A}$ and $\mx{A}^{-1}$ and make sure that
the result is the identity matrix, $\mx{I}$.

Note that Chapter 7 will present other ways to compute the inverse as well.
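The procedure of Example 6.11 can be automated. The Python sketch below (the name `invert` is hypothetical) performs Gauss-Jordan elimination on $(\mx{A}\,|\,\mx{I})$ with exact fractions. It uses a fixed, systematic order of row operations (find a pivot, scale it to $1$, clear the rest of its column) rather than the hand-picked steps above; since the inverse is unique (Theorem 6.2), the result is the same.

```python
from fractions import Fraction

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I].

    Returns None if the left half cannot be reduced to the identity matrix.
    """
    n = len(matrix)
    # Build the augmented matrix [A | I] with exact rational entries.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row at or below `col` with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None  # no pivot: A is not invertible
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from every other row (above and below).
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]

A = [[5, 3, 1], [1, 0, -2], [1, 2, 5]]
A_inv = invert(A)
print([[int(x) for x in row] for row in A_inv])
# [[4, -13, -6], [-7, 24, 11], [2, -7, -3]]
```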

Theorem 6.4:
Matrix Inverse Properties

If the matrices $\mx{A}$ and $\mx{B}$ are invertible (and of the same size), then $\mx{A}^{\T}$, $\mx{A}^{-1}$, $\mx{B}^{-1}$, and $\mx{A}\mx{B}$ are also invertible, and

\begin{equation} \begin{array}{llr} (i) & (\mx{A}^{-1})^{-1} = \mx{A} & \spc\text{(inverse inverse)}, \\ (ii) & (\mx{A}\mx{B})^{-1} = \mx{B}^{-1}\mx{A}^{-1} & \spc\text{(product inverse)}, \\ (iii) & (\mx{A}^{-1})^{\T} = (\mx{A}^{\T})^{-1} & \spc\text{(inverse transpose)}. \\ \end{array} \end{equation} | (6.74) |

Note that in $(ii)$, the order of $\mx{A}$ and $\mx{B}$ is reversed on the right-hand side compared to the left-hand side.

$(i)$ By Definition 6.17 $\mx{A}\mx{A}^{-1}=\mx{A}^{-1}\mx{A}=\mx{I}$, which means that the inverse of $\mx{A}^{-1}$ is $\mx{A}$, and hence $\mx{A}^{-1}$ is invertible.

$(ii)$ We know that we have the inverse of a matrix if we can multiply it by the matrix itself and get $\mx{I}$. Assume we have guessed that the inverse of $(\mx{A}\mx{B})$ is $\invmx{B}\invmx{A}$ and that we then want to check that it is correct. We can now multiply $\mx{A}\mx{B}$ with $\invmx{B}\invmx{A}$ and see the result: $(\mx{A}\mx{B})(\invmx{B}\invmx{A}) =$ $\mx{A}(\mx{B}\invmx{B})\invmx{A} = $ $\mx{A}\mx{I}\invmx{A} = $ $\mx{A}\invmx{A} = \mx{I}$. In the same way, $(\invmx{B}\invmx{A})(\mx{A}\mx{B}) =$ $\invmx{B}(\invmx{A}\mx{A})\mx{B} =$ $\invmx{B}\mx{B} = \mx{I}$. Thus our guess was correct and the inverse of $(\mx{A}\mx{B})$ is indeed $\invmx{B}\invmx{A}$.

$(iii)$ Note that $\mx{I}=\mx{I}^{\T}$, and then we use $(xiv)$ from Theorem 6.1 to get $(\underbrace{\mx{A}\mx{A}^{-1}}_{\mx{I}})^{\T} = $ $(\mx{A}^{-1})^{\T} \mx{A}^{\T} = \mx{I}$, which means that $(\mx{A}^{-1})^{\T}$ is a left-side inverse of $\mx{A}^{\T}$. Similarly, $(\underbrace{\mx{A}^{-1}\mx{A}}_{\mx{I}})^{\T} = $ $\mx{A}^{\T}(\mx{A}^{-1})^{\T} = \mx{I}$, so $(\mx{A}^{-1})^{\T}$ is also a right-side inverse of $\mx{A}^{\T}$. Together, these show that $\mx{A}^{\T}$ is invertible and that $(\mx{A}^{\T})^{-1} = (\mx{A}^{-1})^{\T}$.

$\square$
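Rules $(ii)$ and $(iii)$ can be spot-checked with exact arithmetic on small matrices. The Python sketch below reuses the $2\times 2$ inverse formula of Theorem 6.3; all function names and the two example matrices are our own, illustrative choices. It also shows that multiplying the inverses in the original order gives a different, wrong matrix.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def inverse_2x2(a):
    """Inverse of a 2x2 matrix by Theorem 6.3 (assumed invertible here)."""
    (a11, a12), (a21, a22) = a
    d = Fraction(a11 * a22 - a12 * a21)
    return [[a22 / d, -a12 / d], [-a21 / d, a11 / d]]

A = [[2, 1], [5, 3]]
B = [[1, 2], [3, 4]]

lhs = inverse_2x2(mat_mul(A, B))
print(lhs == mat_mul(inverse_2x2(B), inverse_2x2(A)))          # True: rule (ii)
print(lhs == mat_mul(inverse_2x2(A), inverse_2x2(B)))          # False: wrong order
print(inverse_2x2(transpose(A)) == transpose(inverse_2x2(A)))  # True: rule (iii)
```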

Example 6.12:
Matrix Product Inverse au Faux

As we have just seen in Theorem 6.4, the inverse of a matrix-matrix product, say $\mx{R}(\phi)\mx{H}_{xy}(s)$, is $(\mx{R}(\phi)\mx{H}_{xy}(s))^{-1}=\mx{H}^{-1}_{xy}(s)\mx{R}^{-1}(\phi)$. Here, we explore what happens if we do not honor the rule of reversing the order of the matrices. One can check whether a computed matrix is a true inverse by multiplying the original matrix with it. The result should be the identity matrix, $\mx{I}$, i.e.,

\begin{align} \bigl(\mx{R}(\phi)\mx{H}_{xy}(s)\bigr) \bigl(\mx{R}(\phi)\mx{H}_{xy}(s)\bigr)^{-1} = \mx{I}, \end{align} | (6.75) |

in our example.
What if we were a little sloppy and forgot to reverse the order of the two matrices?
Then we would get a matrix, $\mx{M}$, as

\begin{align} \mx{M} = \bigl(\mx{R}(\phi)\mx{H}_{xy}(s)\bigr) \bigl(\mx{R}^{-1}(\phi)\mx{H}^{-1}_{xy}(s)\bigr). \end{align} | (6.76) |

As we saw in Example 6.10, the inverses of the rotation and shear matrices
are rather simple, i.e., $\mx{R}^{-1}(\phi) = \mx{R}(-\phi)$ and $\mx{H}^{-1}_{xy}(s) = \mx{H}_{xy}(-s)$.
This means that

\begin{align} \mx{M} = \mx{R}(\phi)\mx{H}_{xy}(s) \mx{R}(-\phi) \mx{H}_{xy}(-s). \end{align} | (6.77) |

The result of applying the matrix $\mx{M}$ to the vertices (interpreted as column vectors) of a unit square
is shown in Interactive Illustration 6.8.
This example shows that it is very important to keep the matrices in the correct order in a matrix multiplication.
Otherwise, one may get a result like the one shown in the figure, i.e., not quite the identity matrix. It is quite
close, but it is not at all useful.
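Example 6.12 can also be reproduced numerically. The Python sketch below makes two assumptions. First, $\mx{R}(\phi)$ is the standard counterclockwise 2D rotation. Second, the shear has the form $\mx{H}_{xy}(s) = \begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix}$; this form is assumed for illustration, and any shear that does not commute with the rotation behaves the same way qualitatively. The parameter values are arbitrary.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rotation(phi):
    """2D counterclockwise rotation by phi radians."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s], [s, c]]

def shear_xy(s):
    """Assumed form of the shear matrix H_xy(s): x' = x + s*y, y' = y."""
    return [[1.0, s], [0.0, 1.0]]

def max_dev_from_identity(m):
    """Largest absolute difference between m and the identity matrix."""
    return max(abs(m[i][j] - (1.0 if i == j else 0.0))
               for i in range(len(m)) for j in range(len(m)))

phi, s = 0.5, 0.7  # arbitrary example parameters
RH = matmul(rotation(phi), shear_xy(s))

# Correct inverse, reversed order: H^{-1}(s) R^{-1}(phi) = H(-s) R(-phi).
correct = matmul(RH, matmul(shear_xy(-s), rotation(-phi)))
# Sloppy "inverse" with the order kept, which gives M in (6.77).
M = matmul(RH, matmul(rotation(-phi), shear_xy(-s)))

print(max_dev_from_identity(correct))  # tiny: floating-point round-off only
print(max_dev_from_identity(M))        # clearly non-zero
```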

We saw in Chapter 5 that to test whether a set of vectors $\{\vc{u}_1, \vc{u}_2, \ldots, \vc{u}_q\}$ is linearly independent or spans $\R^p$, we have to study a system of $p$ equations in $q$ unknowns. In this chapter, we have seen that matrices can be used to conveniently express systems of linear equations.

Theorem 6.5:
Matrices and Linear Independence

The following two statements are equivalent.

- The column vectors of the matrix $\mx{A}$ are linearly independent.
- The equation $\mx{A} \vc{x} = \vc{0}$ has only the solution $\vc{x}=\vc{0}$.

According to Definition 5.2, the column vectors $\vc{a}_1, \vc{a}_2, \ldots, \vc{a}_q$ are linearly independent if and only if the only solution to $\sum_{i=1}^q x_i \vc{a}_i = \vc{0}$ is $x_1 = x_2 = \dots = x_q = 0$. If $\mx{A}$ is the $p \times q$ matrix with $\vc{a}_1, \vc{a}_2, \ldots, \vc{a}_q$ as columns, then $\mx{A}\vc{x}=\vc{0} \Leftrightarrow \sum_{i=1}^q x_i \vc{a}_i = \vc{0}$. This means that the two statements are equivalent.

$\square$
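Theorem 6.5 turns a question about vectors into a question about a matrix equation. Gaussian elimination can decide it: $\mx{A}\vc{x}=\vc{0}$ has only the trivial solution exactly when every column receives a pivot. The Python sketch below is one way to do this with exact fractions. The function names are ours. The second matrix is a made-up example whose third column is the sum of the first two.

```python
from fractions import Fraction

def count_pivots(matrix):
    """Number of pivots found by Gaussian elimination (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0  # index of the next pivot row
    for c in range(cols):
        # Look for a non-zero entry in column c at or below row r.
        p = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, rows):       # eliminate below the pivot
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == rows:
            break
    return r

def columns_independent(matrix):
    """Theorem 6.5: A x = 0 has only x = 0 iff every column gets a pivot."""
    return count_pivots(matrix) == len(matrix[0])

A = [[2, 4, -2], [-1, -7, 2], [0, 3, -6]]   # coefficient matrix from (6.2)
D = [[1, 2, 3], [0, 1, 1], [4, 5, 9]]       # column 3 = column 1 + column 2
print(columns_independent(A), columns_independent(D))  # True False
```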

Theorem 6.6:
Matrices and Linear Independence

If there exists a left-inverse $\mx{A}_l^{-1}$ to the matrix $\mx{A}$, then the columns of $\mx{A}$ are linearly independent.

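Assume that there exists at least one left inverse $\mx{A}_l^{-1}$ to the matrix $\mx{A}$, and let $\vc{x}$ be any solution to $\mx{A} \vc{x} = \vc{0}$. Multiplying this equation by $\mx{A}_l^{-1}$ from the left gives $\mx{A}_l^{-1} \mx{A} \vc{x} = \mx{A}_l^{-1} \vc{0}$, i.e., $\mx{I} \vc{x} = \vc{x} = \vc{0}$. Hence $\mx{A} \vc{x} = \vc{0}$ has only the solution $\vc{x}=\vc{0}$. By Theorem 6.5, the columns of $\mx{A}$ are therefore linearly independent. This proves the theorem.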

$\square$

Theorem 6.7:
Matrices and Span

The following two statements are equivalent.

- The column vectors of the matrix $\mx{A}$ span $\R^p$.
- The equation $\mx{A} \vc{x} = \vc{y}$ has a solution for every $\vc{y}$.

According to Definition 5.3, the column vectors $\vc{a}_1, \vc{a}_2, \ldots, \vc{a}_q$ span $\R^p$ if and only if $\sum_{i=1}^q x_i \vc{a}_i = \vc{y}$ has a solution for every $\vc{y}$. If $\mx{A}$ is the $p \times q$ matrix with $\vc{a}_1, \vc{a}_2, \ldots, \vc{a}_q$ as columns, then $\mx{A}\vc{x}=\vc{y} \Longleftrightarrow \sum_{i=1}^q x_i \vc{a}_i = \vc{y}$. This means that the two statements are equivalent.

$\square$

Theorem 6.8:
Matrices and Linear Independence

If the columns of $\mx{A}$ span $\R^p$ then there exists a right-inverse $\mx{A}_r^{-1}$ to the matrix $\mx{A}$.

If the columns of $\mx{A}$ span $\R^p$, then the matrix equation $\mx{A}\vc{x} = \vc{y}$ has a solution for every $\vc{y}$ (Theorem 6.7). Let $\vc{e}_1, \ldots, \vc{e}_p$ be the canonical basis of $\R^p$, and let $\vc{b}_i$ be a solution to the equation $\mx{A}\vc{b}_i = \vc{e}_i$. Now form the matrix $\mx{B} = (\vc{b}_1 \; \ldots \; \vc{b}_p)$ with the vectors $\vc{b}_i$ as columns. Then $\mx{A} \mx{B} = (\mx{A} \vc{b}_1 \; \ldots \; \mx{A} \vc{b}_p) = (\vc{e}_1 \; \ldots \; \vc{e}_p) = \mx{I}$. So $\mx{B}$ is a right inverse to $\mx{A}$.

$\square$
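The proof of Theorem 6.8 is constructive: solve $\mx{A}\vc{b}_i = \vc{e}_i$ for each $i$ and use the solutions as columns. The sketch below follows that recipe in Python with exact fractions. Each system is reduced to reduced row echelon form, and every free variable is set to zero. This is only one choice among many, because a right inverse of a non-square matrix is generally not unique. The helper names and the example matrix are illustrative assumptions.

```python
from fractions import Fraction

def solve_particular(A, y):
    """Return one solution x of A x = y (free variables set to 0), or None."""
    rows, cols = len(A), len(A[0])
    # Augmented matrix [A | y] with exact rational entries.
    m = [[Fraction(v) for v in A[i]] + [Fraction(y[i])] for i in range(rows)]
    pivots, r = [], 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if p is None:
            continue                      # no pivot: x_c is a free variable
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]    # scale the pivot to 1
        for i in range(rows):             # clear column c in all other rows
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    # Rows r..rows-1 are zero on the A side; a non-zero right side means no solution.
    if any(m[i][cols] != 0 for i in range(r, rows)):
        return None
    x = [Fraction(0)] * cols
    for i, c in enumerate(pivots):
        x[c] = m[i][cols]
    return x

def right_inverse(A):
    """Stack solutions of A b_i = e_i as columns, as in the proof of Theorem 6.8."""
    p = len(A)
    columns = []
    for i in range(p):
        e_i = [1 if k == i else 0 for k in range(p)]
        b_i = solve_particular(A, e_i)
        if b_i is None:
            raise ValueError("the columns of A do not span R^p")
        columns.append(b_i)
    return [list(row) for row in zip(*columns)]   # b_1, ..., b_p as columns

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

A = [[1, 2, 0],
     [0, 1, 1]]          # 2 x 3 matrix whose columns span R^2
B = right_inverse(A)     # 3 x 2 matrix
print(matmul(A, B) == [[1, 0], [0, 1]])   # True
```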

Theorem 6.9:

Let $\mx{A}$ be a square matrix. Then the following statements are equivalent:

- $(i)$ The column vectors of the matrix $\mx{A}$ span $\R^p$.
- $(ii)$ The row vectors of the matrix $\mx{A}$ span $\R^p$.
- $(iii)$ The equation $\mx{A} \vc{x} = \vc{y}$ has a solution for every $\vc{y}$.
- $(iv)$ The column vectors of the matrix $\mx{A}$ are linearly independent.
- $(v)$ The row vectors of the matrix $\mx{A}$ are linearly independent.
- $(vi)$ The equation $\mx{A} \vc{x} = \vc{0}$ has only the solution $\vc{x}=\vc{0}$.
- $(vii)$ The matrix $\mx{A}$ is invertible.

We have already shown that statements $(i)$ and $(iii)$ are equivalent (Theorem 6.7) and that $(iv)$ and $(vi)$ are equivalent (Theorem 6.5). Since the matrix is square, Theorem 5.5 gives the equivalence between $(i)$ and $(iv)$. We now need to link $(i)$, $(iii)$, $(iv)$, and $(vi)$ to $(vii)$. If the columns span $\R^p$ $(i)$, then a right inverse exists according to Theorem 6.8. Since the matrix is square, Theorem 6.2 then gives that $\mx{A}$ is invertible $(vii)$. Conversely, if $\mx{A}$ is invertible $(vii)$, it has a left inverse, and Theorem 6.6 gives that the columns are linearly independent $(iv)$. Thus $(i)$, $(iii)$, $(iv)$, $(vi)$, and $(vii)$ are all equivalent. Finally, the row vectors of $\mx{A}$ are the column vectors of $\mx{A}^{\T}$, and $\mx{A}$ is invertible if and only if $\mx{A}^{\T}$ is invertible. Applying the equivalences above to $\mx{A}^{\T}$ therefore shows that the statements $(ii)$ and $(v)$ on the row vectors are equivalent to $(vii)$ as well.

$\square$
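For a square matrix, Theorem 6.9 means that a single elimination pass settles all seven statements at once. The Gauss-Jordan sketch below is our illustration, and its helper names are hypothetical. It reduces $(\mx{A}\,|\,\mx{I})$ and either returns $\mx{A}^{-1}$ or reports that some column has no pivot. In the latter case, the columns are linearly dependent and $\mx{A}$ is not invertible. The two matrices are the coefficient matrix from (6.2) and a made-up singular example.

```python
from fractions import Fraction

def inverse(A):
    """Gauss-Jordan elimination on [A | I]; returns A^{-1}, or None if singular."""
    n = len(A)
    m = [[Fraction(v) for v in A[i]] + [Fraction(1 if i == j else 0) for j in range(n)]
         for i in range(n)]
    for c in range(n):
        p = next((i for i in range(c, n) if m[i][c] != 0), None)
        if p is None:
            # Column c is a combination of the earlier columns: A is singular.
            return None
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return [row[n:] for row in m]

A = [[2, 4, -2], [-1, -7, 2], [0, 3, -6]]   # coefficient matrix from (6.2)
D = [[1, 2, 3], [0, 1, 1], [4, 5, 9]]       # column 3 = column 1 + column 2
Ainv = inverse(A)
print(Ainv is not None, inverse(D) is None)  # True True
```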

Section 5.10 has already outlined how change of basis can be done, where the second basis $\{\hat{\vc{e}}_1,\hat{\vc{e}}_2\}$ can be expressed in terms of a first basis $\{\vc{e}_1,\vc{e}_2\}$, i.e.,

\begin{align} \hat{\vc{e}}_1 = b_{11} \vc{e}_1 + b_{21} \vc{e}_2,\\ \hat{\vc{e}}_2 = b_{12} \vc{e}_1 + b_{22} \vc{e}_2, \end{align} | (6.78) |

Theorem 6.10:
Change of Base

Given the following relationship between the two $\R^n$ bases $\{\vc{e}_1,\vc{e}_2,\dots,\vc{e}_n\}$ and $\{\hat{\vc{e}}_1,\hat{\vc{e}}_2,\dots,\hat{\vc{e}}_n\}$,

\begin{align} \hat{\vc{e}}_1 &= b_{11} \vc{e}_1 + b_{21} \vc{e}_2 + \dots + b_{n,1} \vc{e}_n, \\ \hat{\vc{e}}_2 &= b_{12} \vc{e}_1 + b_{22} \vc{e}_2 + \dots + b_{n,2} \vc{e}_n, \\ &\dots \\ \hat{\vc{e}}_n &= b_{1,n} \vc{e}_1 + b_{2,n} \vc{e}_2 + \dots + b_{n,n} \vc{e}_n, \end{align} | (6.79) |

where a particular vector $\vc{v}$ has the following two representations

\begin{align} \vc{v} &= v_1\vc{e}_1 + v_2\vc{e}_2 + v_3\vc{e}_3+\dots+v_n\vc{e}_n \\ &= \hat{v}_1\hat{\vc{e}}_1 + \hat{v}_2\hat{\vc{e}}_2 + \hat{v}_3\hat{\vc{e}}_3+\dots+\hat{v}_n\hat{\vc{e}}_n, \end{align} | (6.80) |

and let $\mx{B}$ be the matrix with $\hat{\vc{e}}_i$ as column vectors, i.e., column $j$ of $\mx{B}$ holds the coordinates $(b_{1j}, b_{2j}, \dots, b_{nj})$ of $\hat{\vc{e}}_j$ in the first basis.
Then it holds that

\begin{align} \vc{v} = \mx{B}\hat{\vc{v}} = \begin{pmatrix} b_{11} & b_{12} & \dots & b_{1,n} \\ b_{21} & b_{22} & \dots & b_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n,1} & b_{n,2} & \dots & b_{n,n} \end{pmatrix} \vc{\hat{v}}, \end{align} | (6.81) |

where $\vc{v}=(v_1,v_2,v_3,\dots,v_n)$ and $\hat{\vc{v}}=(\hat{v}_1,\hat{v}_2,\hat{v}_3,\dots,\hat{v}_n)$.

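To keep the coordinates apart from the vector itself, we write $x_i$ and $\hat{x}_i$ for the coordinates of $\vc{v}$ in the two bases. Thus $\vc{x}$ and $\hat{\vc{x}}$ play the roles of $\vc{v}$ and $\hat{\vc{v}}$ in the theorem. We start by rewriting the representation of $\vc{v}$ as

\begin{align} \vc{v} &= \sum_{i=1}^n x_i\vc{e}_i \\ &= \sum_{i=1}^n \hat{x}_i\hat{\vc{e}}_i, \end{align} | (6.82) |

and the relationship between the bases in Equation (6.79) compactly as

\begin{align} \hat{\vc{e}}_j = \sum_{i=1}^n b_{ij}\vc{e}_i, \end{align} | (6.83) |

Inserting this into the second representation of $\vc{v}$ gives

\begin{align} \vc{v} = \sum_{j=1}^n \hat{x}_j\hat{\vc{e}}_j = \sum_{j=1}^n \hat{x}_j \Biggl( \sum_{i=1}^n b_{ij}\vc{e}_i \Biggr) = \sum_{i=1}^n \Biggl( \sum_{j=1}^n b_{ij} \hat{x}_j \Biggr) \vc{e}_i. \end{align} | (6.84) |

We now have two expressions for $\vc{v}$ in the basis $\{\vc{e}_1,\dots,\vc{e}_n\}$, namely

\begin{align} \vc{v} &= \sum_{i=1}^n x_i\vc{e}_i \\ \vc{v} &= \sum_{i=1}^n \Biggl( \sum_{j=1}^n b_{ij} \hat{x}_j \Biggr) \vc{e}_i, \end{align} | (6.85) |

Since the coordinates of a vector in a given basis are unique, the coefficients in front of each $\vc{e}_i$ must be equal for $i=1,\dots,n$, i.e.,

\begin{gather} x_i = \sum_{j=1}^n b_{ij} \hat{x}_j \\ \Longleftrightarrow \\ \vc{x} = \mx{B}\hat{\vc{x}}. \end{gather} | (6.86) |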

$\square$

Note that per Equation (6.81), we have $\vc{v} = \mx{B}\vc{\hat{v}}$, which also means that $\vc{\hat{v}} = \mx{B}^{-1} \vc{v}$. It is often this latter expression that one is interested in. Here, $\mx{B}$ is always invertible: its columns are the coordinates of basis vectors, so they are linearly independent (Theorem 6.9).
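The following is a small numeric illustration of $\vc{v} = \mx{B}\hat{\vc{v}}$ and $\hat{\vc{v}} = \mx{B}^{-1}\vc{v}$. The Python sketch uses a made-up second basis, $\hat{\vc{e}}_1 = \vc{e}_1 + \vc{e}_2$ and $\hat{\vc{e}}_2 = -\vc{e}_1 + 2\vc{e}_2$, whose coordinates form the columns of $\mx{B}$. The $2\times 2$ inverse is computed with the closed-form formula and exact fractions.

```python
from fractions import Fraction

def matvec(m, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix, using exact fractions."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("the basis vectors are linearly dependent")
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical second basis: e_hat1 = e1 + e2, e_hat2 = -e1 + 2 e2.
B = [[1, -1],
     [1,  2]]                     # columns hold the coordinates of e_hat1, e_hat2

v_hat = [2, 1]                    # coordinates of v in {e_hat1, e_hat2}
v = matvec(B, v_hat)              # v = B v_hat: coordinates in {e1, e2}
back = matvec(inverse_2x2(B), v)  # v_hat = B^{-1} v
print(v, back == v_hat)           # [1, 4] True
```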

A special and important class of matrices is the so-called orthogonal matrices.

Definition 6.18:
Orthogonal Matrix

An orthogonal matrix $\mx{B}$ is a square matrix where the column vectors constitute an orthonormal basis.

Note that since the column vectors constitute an orthonormal basis (Definition 3.3),
it would make more sense to call the matrix
orthonormal, but the term "orthogonal matrix" is well established, so we use it here as well.
Given this short definition, the following theorem can be proved.

Theorem 6.11:
Orthogonal Matrix Equivalence

The following are equivalent

$\spc (i)$ The matrix $\mx{B}$ is orthogonal.

$\spc (ii)$ The column vectors of $\mx{B}$ constitute an orthonormal basis.

$\spc (iii)$ The row vectors of $\mx{B}$ constitute an orthonormal basis.

$\spc (iv)$ $\mx{B}^{-1} = \mx{B}^{\T}$

The equivalence of $(i)$ and $(ii)$ is the definition itself, so it needs no proof.

Next, we show that $(ii)$ and $(iv)$ are equivalent. Assume that we have an orthonormal basis (Definition 3.3) consisting of the vectors $\{\vc{b}_1,\vc{b}_2,\vc{b}_3,\dots,\vc{b}_n\}$. Let us put them as column vectors in a matrix, $\mx{B}$, i.e.,

\begin{align} \mx{B} &= \begin{pmatrix} | & | & \dots & | \\ \vc{b}_{1} & \vc{b}_{2} & \dots & \vc{b}_{n} \\ | & | & \dots & | \\ \end{pmatrix}, \end{align} | (6.87) |

Each entry of the product $\mx{B}^{\T}\mx{B}$ is a dot product $\vc{b}_i^{\T}\vc{b}_j = \vc{b}_i\cdot\vc{b}_j$, which is $1$ when $i=j$ and $0$ otherwise, so

\begin{align} \mx{B}^{\T} \mx{B} &= \begin{pmatrix} -\,\,\, \vc{b}_{1}^\T - \\ -\,\,\, \vc{b}_{2}^\T - \\ -\,\,\, \vc{b}_{3}^\T - \\ \vdots \\ -\,\,\, \vc{b}_{n}^\T - \end{pmatrix} \begin{pmatrix} | & | & | & \dots & | \\ \vc{b}_{1} & \vc{b}_{2} & \vc{b}_{3} & \dots & \vc{b}_{n} \\ | & | & | & \dots & | \\ \end{pmatrix}\\ &= \begin{pmatrix} \vc{b}_{1}^\T \vc{b}_{1} & \vc{b}_{1}^\T \vc{b}_{2} & \vc{b}_{1}^\T \vc{b}_{3} & \dots & \vc{b}_{1}^\T \vc{b}_{n} \\ \vc{b}_{2}^\T \vc{b}_{1} & \vc{b}_{2}^\T \vc{b}_{2} & \vc{b}_{2}^\T \vc{b}_{3} & \dots & \vc{b}_{2}^\T \vc{b}_{n} \\ \vc{b}_{3}^\T \vc{b}_{1} & \vc{b}_{3}^\T \vc{b}_{2} & \vc{b}_{3}^\T \vc{b}_{3} & \dots & \vc{b}_{3}^\T \vc{b}_{n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \vc{b}_{n}^\T \vc{b}_{1} & \vc{b}_{n}^\T \vc{b}_{2} & \vc{b}_{n}^\T \vc{b}_{3} & \dots & \vc{b}_{n}^\T \vc{b}_{n} \\ \end{pmatrix} \\ &= \begin{pmatrix} 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \\ \end{pmatrix} = \mx{I}, \end{align} | (6.88) |

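This shows that $\mx{B}^{\T}$ is a left inverse of $\mx{B}$, and since $\mx{B}$ is square, $\mx{B}^{-1}=\mx{B}^{\T}$. Conversely, if $\mx{B}^{-1}=\mx{B}^{\T}$, then $\mx{B}^{\T}\mx{B}=\mx{I}$. Reading Equation (6.88) entry by entry then gives $\vc{b}_i\cdot\vc{b}_j=1$ for $i=j$ and $0$ otherwise, i.e., the columns are orthonormal. Now that we have proved the equivalence of $(i)$, $(ii)$, and $(iv)$, it only remains to show that $(iii)$ is equivalent to them as well. At this point, we know from $(iv)$ that $\mx{B}^\T\mx{B}=\mx{B}\mx{B}^\T=\mx{I}$. We introduce $\mx{A} = \mx{B}^\T$, whose column vectors are the row vectors of $\mx{B}$, and evaluate $\mx{A}\mx{A}^\T$, i.e.,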

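\begin{equation} \mx{A}\mx{A}^\T = \mx{B}^\T(\mx{B}^\T)^\T = \mx{B}^\T\mx{B}=\mx{I}. \end{equation} | (6.89) |

Hence $\mx{A}^{-1}=\mx{A}^{\T}$. By the equivalence of $(ii)$ and $(iv)$ applied to $\mx{A}$, the column vectors of $\mx{A}$, i.e., the row vectors of $\mx{B}$, constitute an orthonormal basis. Conversely, assume that the rows of $\mx{B}$ are orthonormal. The same argument applied to $\mx{A}=\mx{B}^{\T}$ gives $\mx{A}^{-1}=\mx{A}^{\T}$, i.e., $(\mx{B}^{\T})^{-1}=\mx{B}$, and hence $\mx{B}^{-1}=\mx{B}^{\T}$.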

$\square$
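Theorem 6.11 can be checked exactly on a concrete matrix. The Python sketch below uses a $3\times 3$ orthogonal matrix with rational entries. We chose it for illustration and deliberately made it non-symmetric, so that its rows and columns differ. The sketch verifies both $\mx{B}^{\T}\mx{B}=\mx{I}$ (orthonormal columns) and $\mx{B}\mx{B}^{\T}=\mx{I}$ (orthonormal rows).

```python
from fractions import Fraction as F

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def is_identity(m):
    n = len(m)
    return all(m[i][j] == (1 if i == j else 0) for i in range(n) for j in range(n))

# A non-symmetric orthogonal matrix with rational entries (illustrative choice).
B = [[F(2, 3), F(-2, 3), F(1, 3)],
     [F(1, 3), F(2, 3), F(2, 3)],
     [F(2, 3), F(1, 3), F(-2, 3)]]

print(is_identity(matmul(transpose(B), B)))  # True: columns orthonormal, B^{-1} = B^T
print(is_identity(matmul(B, transpose(B))))  # True: rows are orthonormal as well
```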

This means that the inverse of an orthogonal matrix is simply its transpose, that is, $\mx{B}^{-1} = \mx{B}^{\T}$. This is very convenient, since the transpose is trivial to compute, while the inverse of an arbitrary square matrix usually is not.

Example 6.13:
Inverse of Rotation Matrix

A rotation matrix (see Section 6.4) by $\phi$ radians around the $z$-axis is

\begin{equation} \mx{R}_z(\phi) = \begin{pmatrix} \cos \phi & -\sin \phi & 0 \\ \sin \phi & \hid{-}\cos \phi & 0 \\ 0 & 0 & 1 \end{pmatrix}. \end{equation} | (6.90) |

As shown in Theorem 6.11, the inverse of an orthogonal
matrix is its transpose. Hence, if the rotation matrix is orthogonal, we should get $\mx{R}_z(\phi) \mx{R}^{\T}_z(\phi)=\mx{I}$,
i.e.,

\begin{align} \mx{R}_z(\phi)\mx{R}^{\T}_z(\phi) &= \begin{pmatrix} \cos \phi & -\sin \phi & 0 \\ \sin \phi & \hid{-}\cos \phi & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \hid{-}\cos \phi & \sin \phi & 0 \\ -\sin \phi & \cos \phi & 0 \\ 0 & 0 & 1 \end{pmatrix} \\ &= \begin{pmatrix} \cos \phi\cos \phi + \sin \phi\sin \phi & \cos \phi\sin \phi-\sin \phi\cos \phi & 0 \\ \sin \phi\cos \phi - \cos \phi\sin \phi & \sin \phi\sin \phi+ \cos \phi\cos \phi& 0 \\ 0 & 0 & 1 \end{pmatrix} \\ &= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} =\mx{I}, \end{align} | (6.91) |

where we have used the fact that $\cos^{2}\phi+\sin^{2}\phi=1$.
Since $\mx{R}_z(\phi)$ is square, this shows that $\mx{R}^{-1}_z(\phi)=\mx{R}^{\T}_z(\phi)$, so $\mx{R}_z(\phi)$ is an orthogonal matrix. In fact, it is possible to show that
all rotation matrices are orthogonal.
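Equation (6.91) can also be checked numerically for a few angles. The Python sketch below builds $\mx{R}_z(\phi)$ from (6.90) in floating point. It compares against a small tolerance, because $\cos^2\phi + \sin^2\phi$ equals $1$ only up to round-off. It also checks that $\mx{R}^{\T}_z(\phi) = \mx{R}_z(-\phi)$, i.e., that the inverse is the rotation by $-\phi$. The angles and the tolerance are arbitrary choices.

```python
import math

def rot_z(phi):
    """Rotation by phi radians around the z-axis, as in Equation (6.90)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def max_abs_diff(a, b):
    """Largest entry-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for phi in (0.0, 0.3, math.pi / 3, 2.0, -1.2):
    R = rot_z(phi)
    # R R^T should be I, and R^T should equal the rotation by -phi.
    print(phi,
          max_abs_diff(matmul(R, transpose(R)), I) < 1e-12,
          max_abs_diff(transpose(R), rot_z(-phi)) < 1e-12)
```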

Theorem 6.12:
Orthogonality and Length Preservation

If $\mx{B}$ is an orthogonal matrix, then $\ln{\mx{B}\vc{v}} = \ln{\vc{v}}$ for every vector $\vc{v}$, i.e., the transform preserves length. Conversely, a square matrix that preserves the length of every vector is orthogonal.

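As seen in Example 6.6, the dot product between two vectors, $\vc{u}$ and $\vc{v}$, can be expressed as $\vc{u}\cdot\vc{v} = \vc{u}^\T \vc{v}$. Since lengths are non-negative, note that

\begin{gather} \ln{\mx{B}\vc{v}} = \ln{\vc{v}} \\ \Longleftrightarrow \\ \ln{\mx{B}\vc{v}}^2 = \ln{\vc{v}}^2 \\ \Longleftrightarrow \\ \bigl(\mx{B}\vc{v}\bigr) \cdot \bigl(\mx{B}\vc{v}\bigr) = \vc{v} \cdot \vc{v} \\ \Longleftrightarrow \\ \bigl(\mx{B}\vc{v}\bigr)^\T \bigl(\mx{B}\vc{v}\bigr) = \vc{v}^\T \vc{v}. \end{gather} | (6.92) |

Since $\mx{B}$ is orthogonal, $\mx{B}^\T\mx{B}=\mx{I}$ according to Theorem 6.11, and the left-hand side of the last line becomes

\begin{gather} \bigl(\mx{B}\vc{v}\bigr)^\T \bigl(\mx{B}\vc{v}\bigr) = \vc{v}^\T\underbrace{\mx{B}^\T \mx{B}}_{\mx{I}} \vc{v} = \vc{v}^\T \mx{I} \vc{v} = \vc{v}^\T \vc{v} = \vc{v}\cdot \vc{v}, \end{gather} | (6.93) |

which proves that $\mx{B}$ preserves length. Conversely, assume that $\ln{\mx{B}\vc{v}} = \ln{\vc{v}}$ for every $\vc{v}$. Example 3.6 gives the identity $\vc{u}\cdot \vc{v} = \frac{1}{4}\bigl( \ln{ \vc{u} + \vc{v} }^2 - \ln{ \vc{u} - \vc{v} }^2 \bigr)$. Together with $\mx{B}\vc{u} \pm \mx{B}\vc{v} = \mx{B}(\vc{u} \pm \vc{v})$, this yields $(\mx{B}\vc{u})\cdot(\mx{B}\vc{v}) = \vc{u}\cdot\vc{v}$ for all $\vc{u}$ and $\vc{v}$. Now choose $\vc{u}=\vc{e}_i$ and $\vc{v}=\vc{e}_j$ from the standard basis. Then $\mx{B}\vc{e}_i$ and $\mx{B}\vc{e}_j$ are the columns $\vc{b}_i$ and $\vc{b}_j$ of $\mx{B}$, so $\vc{b}_i\cdot\vc{b}_j = \vc{e}_i\cdot\vc{e}_j$, which is $1$ if $i=j$ and $0$ otherwise. Hence the columns of $\mx{B}$ are orthonormal, i.e., $\mx{B}$ is orthogonal.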

$\square$

Theorem 6.13:

If $\mx{B}$ is an orthogonal matrix, then $(\mx{B} \vc{u}) \cdot(\mx{B}\vc{v}) = \vc{u}\cdot\vc{v}$, i.e., it does not matter in which orthonormal basis the dot product is computed.

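As we saw in Example 3.6, it holds that $\vc{u}\cdot \vc{v} = \frac{1}{4}\bigl( \ln{ \vc{u} + \vc{v} }^2 - \ln{ \vc{u} - \vc{v} }^2 \bigr)$. Using this identity, the linearity $\mx{B}\vc{u} \pm \mx{B}\vc{v} = \mx{B}(\vc{u} \pm \vc{v})$, and Theorem 6.12 (in the third step), we get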

\begin{align} (\mx{B} \vc{u}) \cdot(\mx{B}\vc{v}) &= \frac{1}{4}\Bigl( \ln{ \mx{B}\vc{u} + \mx{B}\vc{v} }^2 - \ln{ \mx{B}\vc{u} - \mx{B}\vc{v} }^2\Bigr) \\ &= \frac{1}{4}\Bigl( \ln{ \mx{B}\bigl(\vc{u} + \vc{v}\bigr) }^2 - \ln{ \mx{B}\bigl(\vc{u} - \vc{v}\bigr) }^2\Bigr) \\ &= \frac{1}{4}\Bigl( \ln{ \vc{u} + \vc{v} }^2 - \ln{ \vc{u} - \vc{v} }^2\Bigr) = \vc{u}\cdot\vc{v}, \\ \end{align} | (6.94) |

$\square$
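Theorems 6.12 and 6.13 can be tried out numerically. In the Python sketch below, the orthogonal matrix and the shear used for comparison are our own illustrative choices. The vectors are pseudo-random with a fixed seed. Because of floating-point round-off, equality is tested with `math.isclose`.

```python
import math
import random

def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

# An orthogonal 3x3 matrix (orthonormal columns), chosen for illustration.
B = [[2/3, -2/3, 1/3],
     [1/3, 2/3, 2/3],
     [2/3, 1/3, -2/3]]
# A shear, which is not orthogonal, for comparison.
S = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

random.seed(0)
u = [random.uniform(-1, 1) for _ in range(3)]
v = [random.uniform(-1, 1) for _ in range(3)]

print(math.isclose(norm(matvec(B, v)), norm(v)))                                 # Theorem 6.12
print(math.isclose(dot(matvec(B, u), matvec(B, v)), dot(u, v), abs_tol=1e-12))  # Theorem 6.13
print(norm(matvec(S, [0.0, 1.0, 0.0])))  # sqrt(1.25): the shear changes length
```

An absolute tolerance is used for the dot product, since a dot product can be close to zero, where a purely relative tolerance is too strict.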

Theorem 6.14:

If $\mx{A}$ and $\mx{B}$ are orthogonal matrices, then $\mx{A}\mx{B}$ is orthogonal as well.

Theorem 6.12 states that a matrix is orthogonal if and only if it preserves length. Since

\begin{align} \ln{\mx{A}\mx{B}\vc{v}} = \ln{\mx{A}\bigl(\mx{B}\vc{v}\bigr)} = \ln{\mx{B}\vc{v}} = \ln{\vc{v}} \end{align} | (6.95) |

for every $\vc{v}$, the matrix $\mx{A}\mx{B}$ preserves length and is therefore orthogonal.

$\square$
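Here is a quick exact check of Theorem 6.14 in Python. The sketch multiplies a rotation and a reflection with rational entries, both chosen for illustration. It then verifies that the product still satisfies $(\mx{A}\mx{B})^{\T}(\mx{A}\mx{B})=\mx{I}$.

```python
from fractions import Fraction as F

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def is_orthogonal(m):
    """Exact check of M^T M = I, i.e., statement (iv) of Theorem 6.11."""
    p = matmul(transpose(m), m)
    n = len(m)
    return all(p[i][j] == (1 if i == j else 0) for i in range(n) for j in range(n))

A = [[F(3, 5), F(-4, 5)], [F(4, 5), F(3, 5)]]        # a rotation
B = [[F(5, 13), F(12, 13)], [F(12, 13), F(-5, 13)]]  # a reflection
AB = matmul(A, B)
print(is_orthogonal(A), is_orthogonal(B), is_orthogonal(AB))  # True True True
```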

Example 6.14:
Orthogonal Matrix Multiplication Visualization

As we saw in Theorem 6.14, $\mx{A}\mx{B}$ is orthogonal if $\mx{A}$ and $\mx{B}$ are orthogonal. In the following interactive illustration, we will visualize the matrix-matrix multiplication between two orthogonal matrices.

Example 6.15:
Change of Base using Orthogonal Matrices

Assume we have two orthonormal bases, $\{\vc{e}_1, \vc{e}_2\}$ and $\{\hat{\vc{e}}_1, \hat{\vc{e}}_2\}$, defined as

\begin{equation} \vc{e}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \vc{e}_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \ \ \mathrm{and} \ \ \hat{\vc{e}}_1 = \begin{pmatrix} \frac{\sqrt{3}}{2} \\ \frac{1}{2} \end{pmatrix}, \hat{\vc{e}}_2 = \begin{pmatrix} -\frac{1}{2} \\ \frac{\sqrt{3}}{2} \end{pmatrix}. \end{equation} | (6.96) |

It is easy to check that $\ln{\vc{e}_i}=1$ and $\ln{\hat{\vc{e}}_i}=1$ for $i\in\{1,2\}$
and that $\vc{e}_1 \cdot \vc{e}_2 = 0$ and $\hat{\vc{e}}_1 \cdot \hat{\vc{e}}_2 = 0$,
i.e., we have two orthonormal bases per Definition 3.3.
Now, it is possible to use Theorem 6.10 to
find the matrix that relates these bases. However, when dealing with orthonormal bases, an alternative
is simply to consider the vectors $(1,0)$ and $(0,1)$
expressed in $\{\hat{\vc{e}}_1, \hat{\vc{e}}_2\}$ and then figure out how to set up a matrix
that transforms those vectors into their coordinates in $\{\vc{e}_1, \vc{e}_2\}$. As seen below, this is rather simple:

\begin{align} \underbrace{ \begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2}\\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{pmatrix} }_{\mx{A}} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \frac{\sqrt{3}}{2} \\ \frac{1}{2} \end{pmatrix} \ \ \mathrm{and} \ \ \underbrace{ \begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2}\\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{pmatrix} }_{\mx{A}} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -\frac{1}{2} \\ \frac{\sqrt{3}}{2} \\ \end{pmatrix} \end{align} | (6.97) |

Now assume that we have a vector $\vc{v}$ with coordinates $(3/4, 1/2)$ in $\{\hat{\vc{e}}_1, \hat{\vc{e}}_2\}$
and that we want to express that vector in $\{\vc{e}_1, \vc{e}_2\}$. This is simply
a matter of multiplying by the matrix $\mx{A}$ above, i.e.,

\begin{equation} \mx{A}\vc{v} = \begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2}\\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{pmatrix} \begin{pmatrix} \frac{3}{4} \\ \frac{1}{2} \end{pmatrix} = \begin{pmatrix} \frac{3\sqrt{3}-2}{8} \\ \frac{2\sqrt{3}+3}{8} \end{pmatrix} \end{equation} | (6.98) |

So $\mx{A}$ can be used to transform a vector in $\{\hat{\vc{e}}_1, \hat{\vc{e}}_2\}$ so that it is instead
expressed in $\{\vc{e}_1, \vc{e}_2\}$. Since $\mx{A}$ is orthogonal, $\mx{A}^{-1}=\mx{A}^\T$, so $\mx{A}^\T$ can be used to transform a vector
in $\{\vc{e}_1, \vc{e}_2\}$ so that it is instead expressed in $\{\hat{\vc{e}}_1, \hat{\vc{e}}_2\}$.

Now assume we have yet another orthonormal basis, $\{\bar{\vc{e}}_1, \bar{\vc{e}}_2\}$, whose corresponding transform matrix is $\mx{B}$, i.e., $\mx{B}$ takes a vector expressed in $\{\bar{\vc{e}}_1, \bar{\vc{e}}_2\}$ to $\{\vc{e}_1, \vc{e}_2\}$. To take a vector from $\{\hat{\vc{e}}_1, \hat{\vc{e}}_2\}$ to $\{\bar{\vc{e}}_1, \bar{\vc{e}}_2\}$, you first use $\mx{A}$ to get to $\{\vc{e}_1, \vc{e}_2\}$ and then $\mx{B}^\T$ to get to $\{\bar{\vc{e}}_1, \bar{\vc{e}}_2\}$. Applied to a vector, $\vc{v}$, this is expressed as

\begin{equation} \vc{v}' = \mx{B}^\T \mx{A} \vc{v}, \end{equation} | (6.99) |

where $\vc{v}'$ is expressed in $\{\bar{\vc{e}}_1, \bar{\vc{e}}_2\}$.
The first steps of this are visualized in Interactive Illustration 6.10.
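The computations in Example 6.15 can be reproduced in a few lines of Python. The sketch below does three things. It checks Equation (6.98). It shows that $\mx{A}^{\T}$ takes the result back to the original coordinates. Finally, it applies Equation (6.99) with a hypothetical third basis whose matrix $\mx{B}$ is taken to be a rotation by $\pi/4$; that choice of $\mx{B}$ is ours and does not come from the text.

```python
import math

def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def transpose(m):
    return [list(row) for row in zip(*m)]

def rotation(phi):
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s], [s, c]]

r3 = math.sqrt(3)
A = [[r3 / 2, -1 / 2],
     [1 / 2, r3 / 2]]          # columns are e_hat1 and e_hat2 from (6.96)

v_hat = [3 / 4, 1 / 2]          # coordinates in {e_hat1, e_hat2}
v_std = matvec(A, v_hat)        # coordinates in {e1, e2}, Equation (6.98)
expected = [(3 * r3 - 2) / 8, (2 * r3 + 3) / 8]
print(all(math.isclose(a, b) for a, b in zip(v_std, expected)))

# The transpose takes the result back to the original coordinates.
print(all(math.isclose(a, b) for a, b in zip(matvec(transpose(A), v_std), v_hat)))

# Hypothetical third basis {e_bar1, e_bar2}: a rotation by pi/4 (illustrative choice).
B = rotation(math.pi / 4)
v_prime = matvec(transpose(B), v_std)   # Equation (6.99): v' = B^T A v
# Both coordinate vectors describe the same vector in the standard basis.
print(all(math.isclose(a, b) for a, b in zip(matvec(B, v_prime), v_std)))
```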

In Section 6.1, we had one image to the left (original) and another image to the right. The right image is the left image manipulated in a certain way using a matrix. A TV or computer display consists of a large number (often millions) of pixels (picture elements), and each pixel has a red, a green, and a blue component. For each pixel, we can put these into a vector, i.e.,

\begin{equation} \vc{p} = \begin{pmatrix} r\\ g\\ b \end{pmatrix}, \end{equation} | (6.100) |

and the manipulated pixel, $\vc{p}'$, is obtained by multiplying $\vc{p}$ by a $3 \times 3$ matrix, $\mx{M}$, i.e.,

\begin{align} \vc{p}' = \begin{pmatrix} r'\\ g'\\ b' \end{pmatrix}= \mx{M}\vc{p} = \begin{pmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{pmatrix} \begin{pmatrix} r\\ g\\ b \end{pmatrix}, \end{align} | (6.101) |
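Equation (6.101) is applied independently to every pixel. The Python sketch below stores a tiny image as a list of rows of $(r,g,b)$ tuples and applies a $3\times 3$ matrix to each pixel. The matrix used here swaps the red and blue components. It is a hypothetical example, not the matrix used in Section 6.1.

```python
def apply_color_matrix(M, image):
    """Return a new image where every pixel p = (r, g, b) is replaced by M p."""
    return [[tuple(sum(M[i][k] * p[k] for k in range(3)) for i in range(3))
             for p in row]
            for row in image]

# Hypothetical example matrix that swaps the red and blue components.
SWAP_RB = [[0, 0, 1],
           [0, 1, 0],
           [1, 0, 0]]

image = [[(255, 0, 0), (10, 20, 30)],   # a tiny 2 x 2 image
         [(0, 0, 255), (1, 2, 3)]]
print(apply_color_matrix(SWAP_RB, image))
# [[(0, 0, 255), (30, 20, 10)], [(255, 0, 0), (3, 2, 1)]]
```

For a general $\mx{M}$, the resulting components may be non-integers or fall outside the valid range of values. A practical implementation would then typically round and clamp them; that step is omitted in this sketch.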