Loading and building chapter...

In this chapter, we introduce the concepts of eigenvalues and eigenvectors of a square matrix. These concepts have numerous uses, for example, to better understand and visualize linear mappings, to understand the stability of mechanical constructions, for solving systems of differential equations, to recognize images, to interpret and visualize quadratic equations, and for image segmentation. First, let us start with two examples.

Example 10.1:
Eigenfaces

There are many methods for recognizing faces in images. State of the art methods typically use deep learning, for which linear algebra is an important part. One popular approach for understanding and compressing face images is to use, so called, eigenfaces. This is illustrated in Interactive Illustration 10.1, where we show an example of how new faces can be generated using two parameters. By just specifying two numbers, it is possible to synthesize a whole range of images as is shown in the figure. Thus the technique can be used for*synthesizing* new images. This can also bee seen as *image compression* since only two numbers need to be stored for every new image. The technique can also be used for
*image understanding* in the following way. For new images one may calculate what the corresponding coordinates $(x_1,x_2)$ are. Different regions in $R^2$ correspond to smiling face, open mouth etc.
The mean image, $\vc{O}$, and the two eigenfaces, $\vc{E}_1$ and $\vc{E}_2$, are shown below.
The images used to generated $\vc{O}$, $\vc{E}_1$, and $\vc{E}_2$ are shown in Figure 10.2. This process is explained later in a bit more detail in Example 10.18 once we have learnt more about eigenvalues and eigenvectors. Briefly, one can say that the images in Figure 10.2 are used to calculate the mean $O$ and the covariance matrix. The two eigenvectors of the covariance matrix corresponding to the two largest eigenvalues are used as $\vc{E}_1$ and $\vc{E}_2$.

There are many methods for recognizing faces in images. State of the art methods typically use deep learning, for which linear algebra is an important part. One popular approach for understanding and compressing face images is to use, so called, eigenfaces. This is illustrated in Interactive Illustration 10.1, where we show an example of how new faces can be generated using two parameters. By just specifying two numbers, it is possible to synthesize a whole range of images as is shown in the figure. Thus the technique can be used for

Example 10.2:
Pagerank

How does Google rank web pages? In this example, we will look at a simplified model for ranking web pages. The main idea is that a web page has a higher rank if many pages (with high rank) links to this page. The problem is recursive. In order to calculate the rank of one page, you need to know the ranks of the other pages. Another way to look at this is to think of a person that randomly clicks on one of the links of a homepage. After random browsing, the page rank is the probability that you are on a certain homepage. In order to connect all homepages one has the idea that with a small probability (say 15 percent) one jumps to a totally random homepage. Let's look at a toy example. Assume that there are four homepages on the internet, 1 - 'Immersive Math', 2 - 'Linear Algebra', 3 - 'Boolean Algebra' and 4 - 'Boring Algebra'.
The links from one homepage to another are illustrated as arrows in the figure.
Assume that the probability of being at a homepage is given by a vector $\vc{v} = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{pmatrix}$, where $v_i$ is the probability of being at homepage with number $i$. If you choose randomly among the links of one homepage, the probability of being on a homepage becomes $ \vc{w} = \mx{A} \vc{v}$,
where the transition matrix, $\mx{A}$, is

The first column has two entries. If you are at homepage $1$ there are two links, one to page 2 and one to page 3. There is thus a probability $0.5$ of ending up in homepage 2 and probability $0.5$ of ending up in homepage 3. Similarly, the entries in column $j$ depend on what links there are on homepage $j$.
For the page rank problem, one usually modifies the probability in the following way. You have a small probability, say $0.15$, that you choose a random homepage with equal probability and then a probability $0.85$ that you choose the next homepage according to the links. The transition matrix becomes

The rank of pages are connected to the so called stationary probability vector $\vc{p}$ that fulfills

This is a linear equation in the unknowns $\vc{p}$, which could be calculated using Gaussian elimination, but it is also connected to the concept of eigenvectors that we are discussing in this chapter.
In this example, the vector $\vc{p}$ is

So obviously 'Immersive Math' should be the top choice.
A continued discussion on how to solve the pagerank problem for really large networks (like the internet) will be given in Example 10.12 using what we will learn about eigenvalues and eigenvectors in this chapter.

How does Google rank web pages? In this example, we will look at a simplified model for ranking web pages. The main idea is that a web page has a higher rank if many pages (with high rank) links to this page. The problem is recursive. In order to calculate the rank of one page, you need to know the ranks of the other pages. Another way to look at this is to think of a person that randomly clicks on one of the links of a homepage. After random browsing, the page rank is the probability that you are on a certain homepage. In order to connect all homepages one has the idea that with a small probability (say 15 percent) one jumps to a totally random homepage. Let's look at a toy example. Assume that there are four homepages on the internet, 1 - 'Immersive Math', 2 - 'Linear Algebra', 3 - 'Boolean Algebra' and 4 - 'Boring Algebra'.

\begin{equation} \mx{A} = \begin{pmatrix} 0 & 1 & 1/2 & 1 \\ 1/2 & 0 & 1/2 & 0 \\ 1/2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} . \end{equation} | (10.1) |

\begin{equation} \mx{B} = 0.15 \begin{pmatrix} 1/4 & 1/4 & 1/4 & 1/4 \\ 1/4 & 1/4 & 1/4 & 1/4 \\ 1/4 & 1/4 & 1/4 & 1/4 \\ 1/4 & 1/4 & 1/4 & 1/4 \end{pmatrix} + 0.85 \begin{pmatrix} 0 & 1 & 1/2 & 1 \\ 1/2 & 0 & 1/2 & 0 \\ 1/2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} . \end{equation} | (10.2) |

\begin{equation} \mx{B} \vc{p } = \vc{p} . \end{equation} | (10.3) |

\begin{equation} \vc{p} = \begin{pmatrix} p_1\\ p_2\\ p_3\\ p_4 \end{pmatrix} = \begin{pmatrix} 0.43\\ 0.31\\ 0.22\\ 0.04 \end{pmatrix} \end{equation} | (10.4) |

Let us start with the definition of eigenvalues and eigenvectors.

Definition 10.1:
Eigenvalue and Eigenvector

Let $\mx{A}$ be a square matrix. A non-zero column vector $\vc{v}$ is called an eigenvector if $\mx{A}\vc{v}$ is parallell to $\vc{v}$, i.e., if there exists a scalar $\lambda$ such that

The scalar $\lambda$ is then called an eigenvalue.

In Chapter 9, we saw that a linear mapping can be represented as matrix multiplication. For linear mappings, we define eigenvectors and eigenvalues in the corresponding way, i.e., for a linear mapping $F$, we say that a non-zero vector $\vc{v}$ is an eigenvector if $F(\vc{v})$ is parallel to $\vc{v}$. This means that $F(\vc{v}) = \lambda \vc{v}$. In the same way, $\lambda$ is called the eigenvalue.
Let $\mx{A}$ be a square matrix. A non-zero column vector $\vc{v}$ is called an eigenvector if $\mx{A}\vc{v}$ is parallell to $\vc{v}$, i.e., if there exists a scalar $\lambda$ such that

\begin{equation} \mx{A}\vc{v} = \lambda \vc{v} . \end{equation} | (10.5) |

Example 10.3:
Find The Eigenvector Game

Notice that eigenvectors for linear mappings or transformations are well defined without knowing the coordinate system. We illustrate the basic concept in the following example and interactive illustration, which can be thought of as a game 'Find the Eigenvectors'. In Interactive Illustration 10.3, we also calculate $\frac{\ln{ F(\vc{v}) } }{\ln{\vc{v}}}$. Try to see how many eigenvectors you can find by interactively moving the input vector ${\vc{v}}$ and attempt to determine the eigenvalue to each eigenvector.

In Interactive Illustration 10.3,
you may have noticed that changing the length of the vector $\vc{v}$ does not change the fact that it is (or is not) an eigenvector. This follows from the linearity of the mapping, which is described in the following theorem.
Notice that eigenvectors for linear mappings or transformations are well defined without knowing the coordinate system. We illustrate the basic concept in the following example and interactive illustration, which can be thought of as a game 'Find the Eigenvectors'. In Interactive Illustration 10.3, we also calculate $\frac{\ln{ F(\vc{v}) } }{\ln{\vc{v}}}$. Try to see how many eigenvectors you can find by interactively moving the input vector ${\vc{v}}$ and attempt to determine the eigenvalue to each eigenvector.

Theorem 10.1:

If $\vc{v}$ is an eigenvector with eigenvalue $\lambda$, then $\mu \vc{v}$ is also an eigenvector with the same eigenvalue for each $\mu \neq 0$.

If $\vc{v}$ is an eigenvector with eigenvalue $\lambda$, then $\mu \vc{v}$ is also an eigenvector with the same eigenvalue for each $\mu \neq 0$.

If $\vc{v}$ is an eigenvector with eigenvalue $\lambda$, then $\vc{v} \neq 0$ and $F(\vc{v}) = \lambda \vc{v}$. For each non-zero multiple $\vc{u} = \mu \vc{v}$, we have that $\vc{u} \neq 0$ and that $F(\vc{u}) = F( \mu \vc{v}) = \mu F(\vc{v}) = \mu \lambda \vc{v} = \lambda \vc{u}$. This proves the theorem.

$\square$

So the eigenvectors can always be rescaled. However, remember that eigenvectors have to be non-zero.

Example 10.4:
Maximum Elongation of Linear Functions

In the Example 10.3, we illustrated eigenvectors and eigenvalues interactively. One interesting question is to study how much larger the output $F(\vc{v})$ of a linear mapping $F$ is in relation to its input vector $\vc{v}$. In the figure below, we keep the length of the input vector $\vc{v}$ to one. You may rotate the input vector using the slider below the figure. Notice that the tip of the output vector $F(\vc{v})$ lies on an ellipse. There is one direction which gives the largest elongation and perpendicular to this direction are the input vectors that gives the smallest elongation. Later in this chapter (in Section 10.6), we will see why this is so and how these directions can be calculated.

In the Example 10.3, we illustrated eigenvectors and eigenvalues interactively. One interesting question is to study how much larger the output $F(\vc{v})$ of a linear mapping $F$ is in relation to its input vector $\vc{v}$. In the figure below, we keep the length of the input vector $\vc{v}$ to one. You may rotate the input vector using the slider below the figure. Notice that the tip of the output vector $F(\vc{v})$ lies on an ellipse. There is one direction which gives the largest elongation and perpendicular to this direction are the input vectors that gives the smallest elongation. Later in this chapter (in Section 10.6), we will see why this is so and how these directions can be calculated.

In this section, we will describe how eigenvectors and eigenvalues can be calculated. (Note: The methods is based on finding roots to a polynomial. In practice, there are more efficient numerical methods for finding eigenvalues and eigenvectors. The connection between eigenvalues and roots are then used to calculate roots using eigenvalues and not the other way around).

Definition 10.2:
Characteristic Polynomial

Let $\mx{A}$ be a square matrix of size $n \times n$. The polynomial

is then called the * characteristic polynomial *.

Let $\mx{A}$ be a square matrix of size $n \times n$. The polynomial

\begin{equation} p_{\mx{A}}(\lambda) = \det(\lambda I - \mx{A}) \end{equation} | (10.6) |

Theorem 10.2:

The characteristic polynomial is a polynomial of degree $n$ in $\lambda$. The eigenvalues $\lambda$ to the matrix $\mx{A}$ are the roots of the characteristic polynomial $p_{\mx{A}}(\lambda)$.

The characteristic polynomial is a polynomial of degree $n$ in $\lambda$. The eigenvalues $\lambda$ to the matrix $\mx{A}$ are the roots of the characteristic polynomial $p_{\mx{A}}(\lambda)$.

Given a matrix $\mx{A}$, how can we calculate the eigenvectors $\vc{v}$ and the corresponding eigenvalues $\lambda$? The equation

\begin{equation} \mx{A}\vc{v} = \lambda \vc{v} , \vc{v} \neq 0 \end{equation} | (10.7) |

\begin{equation} \lambda \vc{v} - \mx{A}\vc{v} = ( \lambda I - \mx{A} ) \vc{v} = 0. \end{equation} | (10.8) |

$\square$

Example 10.5:
Calculating Eigenvalues and Eigenvectors using the Characteristic Polynomial

Calculate the eigenvalues and eigenvectors of

The eigenvalues are the roots to the characteristic polynomial

The two roots of $\lambda^2 - \frac{5}{2}\lambda + \frac{3}{2} = 0$ are

The two eigenvalues are thus $\lambda_1 = 1$ and $\lambda_2 = 3/2$.
For each eigenvalue, the corresponding eigenvectors can be found by solving
$( \lambda I - \mx{A} ) \vc{v} = 0$.

For the eigenvalue $\lambda_1 = 1$, we have

This system of equations has the parametric solution

Thus all vectors

are eigenvectors corresponding to eigenvalue $1$. Notice that we have to enforce $t \neq 0$, since a zero vector by definition is not an eigenvector.

For the eigenvalue $\lambda_2 = 3/2$, we have

This system of equations has the parametric solution

Thus all vectors

are eigenvectors corresponding to eigenvalue $3/2$.
Interactive Illustration 10.5 shows this linear mapping. By moving the red input arrow, it is possible to visually "prove" that these vectors are indeed eigenvectors since the blue output vector becomes parallel with the red input vector.

Calculate the eigenvalues and eigenvectors of

\begin{equation} \mx{A} = \begin{pmatrix} 5/2 & -3/4 \\ 2 & 0 \end{pmatrix} . \end{equation} | (10.9) |

\begin{equation} \det( \lambda I - \mx{A} ) = \begin{vmatrix} \lambda -5/2 & 3/4 \\ -2 & \lambda \end{vmatrix} = (\lambda-\frac{5}{2}) (\lambda) - \frac{3}{4}(-2) = \lambda^2 - \frac{5}{2}\lambda + \frac{3}{2} = 0 . \end{equation} | (10.10) |

\begin{equation} \lambda_{1,2} = \frac{5}{4} \pm \sqrt{ \frac{25}{4}-\frac{3}{2}} = \frac{5}{4} \pm \frac{1}{4} . \end{equation} | (10.11) |

For the eigenvalue $\lambda_1 = 1$, we have

\begin{equation} (1 I - \mx{A}) \vc{v} = \begin{pmatrix} -3/2 & 3/4 \\ -2 & 1 \end{pmatrix} \vc{v} = 0 . \end{equation} | (10.12) |

\begin{equation} \vc{v} = t \begin{pmatrix} 1\\ 2 \end{pmatrix} . \end{equation} | (10.13) |

\begin{equation} \vc{v} = t \begin{pmatrix} 1\\ 2 \end{pmatrix} , \quad t \neq 0 \end{equation} | (10.14) |

For the eigenvalue $\lambda_2 = 3/2$, we have

\begin{equation} (\frac{3}{2} I - \mx{A}) \vc{v} = \begin{pmatrix} -1 & 3/4 \\ -2 & 3/2 \end{pmatrix} \vc{v} = 0 . \end{equation} | (10.15) |

\begin{equation} \vc{v} = t \begin{pmatrix} 3\\ 4 \end{pmatrix} . \end{equation} | (10.16) |

\begin{equation} \vc{v} = t \begin{pmatrix} 3\\ 4 \end{pmatrix} , \quad t \neq 0 \end{equation} | (10.17) |

Eigenvectors and eigenvalues are connected to a concept of diagonalization. The idea here is that a linear mapping is particularly simple if one chooses the eigenvectors as basis vectors. We will need a definition of diagonalization for both a linear mapping and for a matrix. These two definitions are essentially the same.

For the first definition, we should remember that a linear mapping $F(\vc{v})$ can be defined even without a basis. An example is the linear mapping where the output vector is just the input vector but twice as long - this can be understood regardless of the basis chosen. However, if we do select a basis for the linear mapping, we can write the linear mapping on the form $F(\vc{v}) = \mx{A}\vc{v}$, where $\mx{A}$ is called the transformation matrix. The transformation matrix depends on the basis chosen. With this refresher in mind we are now ready for the definition of diagonalizability for a linear mapping.

Definition 10.3:

A linear mapping $F$ is diagonalizable if there is a basis, such that the transformation matrix is a diagonal matrix.

A linear mapping $F$ is diagonalizable if there is a basis, such that the transformation matrix is a diagonal matrix.

Definition 10.4:

A matrix $\mx{A}$ is diagonalizable, if there exists an invertible matrix $\mx{V}$ and a diagonal matrix $\mx{D}$ such that

The matrix $\mx{A}$ is factorized as a product of three matrices $\mx{V}$, $\mx{D}$, and
$\mx{V}^{-1}$. Such factorizations are extremely useful in many applications. We will give a few such applications later in this chapter.
A matrix $\mx{A}$ is diagonalizable, if there exists an invertible matrix $\mx{V}$ and a diagonal matrix $\mx{D}$ such that

\begin{equation} \mx{A} = \mx{V} \mx{D} \mx{V}^{-1} . \end{equation} | (10.18) |

To understand how diagonalization relates to eigenvectors and eigenvalues, we rearrange the matrix equation as

\begin{equation} \mx{A} \mx{V}= \mx{V} \mx{D} . \end{equation} | (10.19) |

\begin{equation} \mx{A} \begin{pmatrix} \vc{v}_1 & \ldots & \vc{v}_n \end{pmatrix} = \begin{pmatrix} \vc{v}_1 & \ldots & \vc{v}_n \end{pmatrix} \mx{D} \end{equation} | (10.20) |

\begin{equation} \begin{pmatrix} \mx{A} \vc{v}_1 & \ldots & \mx{A} \vc{v}_n \end{pmatrix} = \begin{pmatrix} \lambda_1 \vc{v}_1 & \ldots & \lambda_n \vc{v}_n \end{pmatrix}. \end{equation} | (10.21) |

\begin{equation} \mx{A} \vc{v}_k = \lambda_k \vc{v}_k, \end{equation} | (10.22) |

Theorem 10.3:

A matrix $\mx{A}$ is diagonalizable if and only if there exists a basis of eigenvectors. Furthermore, in the factorization $ \mx{A} = \mx{V} \mx{D} \mx{V}^{-1}$, the columns of $\mx{V}$ correspond to eigenvectors and the corresponding elements of the diagonal matrix $\mx{D}$ are the eigenvalues.

A matrix $\mx{A}$ is diagonalizable if and only if there exists a basis of eigenvectors. Furthermore, in the factorization $ \mx{A} = \mx{V} \mx{D} \mx{V}^{-1}$, the columns of $\mx{V}$ correspond to eigenvectors and the corresponding elements of the diagonal matrix $\mx{D}$ are the eigenvalues.

Example 10.6:
Single Roots and Diagonalizable Matrices

Check if the matrix

is diagonalizable and if so, calculate the matrices $\mx{V}$ and $\mx{D}$ so
that

In the previous example, we found two linearly independent eigenvectors

with eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 3/2$. According to Theorem 10.3,
$\mx{A}$ can be factorized using

and

For matrices of size $2 \times 2$, the characteristic polynomial is of degree 2 and there are therefore 2 roots (if you count complex roots and take multiplicity of roots into account). In Example 10.5, the two roots ($2$ and $3$) are different and there is a basis of eigenvectors. The same holds for general $n \times n$ matrices as is captured by the following theorem.
Check if the matrix

\begin{equation} \mx{A} = \begin{pmatrix} 5/2 & -3/4 \\ 2 & 0 \end{pmatrix} \end{equation} | (10.23) |

\begin{equation} \mx{A} = \mx{V} \mx{D} \mx{V}^{-1} . \end{equation} | (10.24) |

\begin{equation} \vc{v} _1= \begin{pmatrix} 1\\ 2 \end{pmatrix} \spc \text{and} \spc \vc{v} _2= \begin{pmatrix} 3\\ 4 \end{pmatrix} \end{equation} | (10.25) |

\begin{equation} \mx{V} = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix} \end{equation} | (10.26) |

\begin{equation} \mx{D} = \begin{pmatrix} 1 & 0 \\ 0 & \frac{3}{2} \end{pmatrix} . \end{equation} | (10.27) |

Theorem 10.4:

If the $n \times n$ matrix $\mx{A}$ has $n$ different eigenvalues, then $\mx{A}$ is diagonalizable. Furthermore, $n$ eigenvectors corresponding to different eigenvalues are always linearly independent.

If the $n \times n$ matrix $\mx{A}$ has $n$ different eigenvalues, then $\mx{A}$ is diagonalizable. Furthermore, $n$ eigenvectors corresponding to different eigenvalues are always linearly independent.

Denote the $n$ different eigenvalues $\lambda_1, \ldots, \lambda_n$. To each eigenvalue $\lambda_k$, there is at least one eigenvector $\vc{v}_k$. If these $n$ eigenvectors are linearly independent then these eigenvectors form a basis. According to Theorem 10.3, the matrix $\mx{A}$ is then diagonalizable. Thus we only need to show that the $n$ eigenvectors are linearly independent. To prove this, we will use recursion over the number $k$ of eigenvectors. For $k=2$, it is true since two vectors that are parallell have the same eigenvalue. Assume that $k-1$ eigenvectors with different eigenvalues are always linearly independent. We need to show that for $k$ eigenvectors with different eigenvalues, the equation

\begin{equation} z_1 \vc{v}_1 + z_2 \vc{v}_2 + \ldots + z_k \vc{v}_k = 0 \end{equation} | (10.28) |

\begin{equation} \mx{A} (z_1 \vc{v}_1 + z_2 \vc{v}_2 + \ldots + z_k \vc{v}_k) = z_1 \mx{A} \vc{v}_1 + z_2 \mx{A}\vc{v}_2 + \ldots + z_k \mx{A}\vc{v}_k = z_1 \lambda_1 \vc{v}_1 + z_2 \lambda_2 \vc{v}_2 + \ldots + z_k \lambda_k \vc{v}_k = 0 . \end{equation} | (10.29) |

\begin{equation} z_1 (\lambda_1-\lambda_k) \vc{v}_1 + z_2 (\lambda_2 -\lambda_k) \vc{v}_2 + \ldots + z_{k-1} (\lambda_{k-1} - \lambda_k) \vc{v}_{k-1} = 0 . \end{equation} | (10.30) |

\begin{equation} \begin{array}{ll} z_1 (\lambda_1-\lambda_k) & = 0 \\ z_2 (\lambda_2-\lambda_k) & = 0 \\ & \vdots \\ z_{k-1} (\lambda_{k-1} - \lambda_k) & = 0. \end{array} \end{equation} | (10.31) |

$\square$

So if all roots are different, the matrix is diagonalizable. If there are roots with higher multiplicity, the matrix could be either diagonalizable (Example 10.7) or not (Example 10.8).

Example 10.7:
Double Roots and Diagonalizable Matrices

Check if the matrix

is diagonalizable and if so, calculate the matrices $\mx{V}$ and $\mx{D}$ so
that

This example is a bit silly, since the matrix $\mx{A}$ already is diagonal, but it serves its purpose as being a matrix that is diagonalizable even though the characteristic polynomial has a double root. Although it is clearly diagonalizable, we will still go through with the calculations.
As in Example 10.5, we start by determining the eigenvalues.
The eigenvalues are the roots to the characteristic polynomial

The polynomial has a double root at $\lambda = 2$.

For the eigenvalue, $\lambda = 2$ we have

This system of equations has the parametric solution

In this case it is possible to find a basis of eigenvectors, e.g.

and

Thus the matrix is diagonalizable.

Actually, this should not come as a surprise, given that the matrix $\mx{A}$ already is diagonal. Hence, without calculation, we can choose $\mx{D} = \mx{A}$, $\mx{V} = \mx{I}$ and $\mx{V}^{-1} = \mx{I}^{-1} = \mx{I}$ and write $\mx{A}$ as

Check if the matrix

\begin{equation} \mx{A} = \begin{pmatrix} 2 & 0\\ 0 & 2 \end{pmatrix} \end{equation} | (10.32) |

\begin{equation} \mx{A} = \mx{V} \mx{D} \mx{V}^{-1} . \end{equation} | (10.33) |

\begin{equation} \det( \lambda I - \mx{A} ) = \mx{A} = \begin{vmatrix} \lambda - 2 & 0\\ 0 & \lambda -2 \end{vmatrix} = (\lambda-2) (\lambda-2) - (0)(0) = (\lambda-2) (\lambda-2) = 0 . \end{equation} | (10.34) |

For the eigenvalue, $\lambda = 2$ we have

\begin{equation} (2 I - \mx{A}) \vc{v} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \vc{v} = 0 . \end{equation} | (10.35) |

\begin{equation} \vc{v} = t_1\begin{pmatrix} 1\\ 0 \end{pmatrix} + t_2 \begin{pmatrix} 0\\ 1 \end{pmatrix} . \end{equation} | (10.36) |

\begin{equation} \vc{v}_1 = \begin{pmatrix} 1\\ 0 \end{pmatrix} \end{equation} | (10.37) |

\begin{equation} \vc{v}_1 = \begin{pmatrix} 0\\ 1 \end{pmatrix} . \end{equation} | (10.38) |

Actually, this should not come as a surprise, given that the matrix $\mx{A}$ already is diagonal. Hence, without calculation, we can choose $\mx{D} = \mx{A}$, $\mx{V} = \mx{I}$ and $\mx{V}^{-1} = \mx{I}^{-1} = \mx{I}$ and write $\mx{A}$ as

\begin{equation} \mx{A} = \mx{I}\mx{A}\mx{I}^{-1} = \mx{V} \mx{D} \mx{V}^{-1}. \end{equation} | (10.39) |

Example 10.8:
Double Roots and Non-diagonalizable Matrices

Check if the matrix

is diagonalizable and if so, calculate the matrices $\mx{V}$ and $\mx{D}$ so
that

As in Example 10.5, we start by determining the eigenvalues.
The eigenvalues are the roots to the characteristic polynomial

The polynomial has a double root at $\lambda = 1$.

For the eigenvalue $\lambda = 1$, we have

This system of equations has the parametric solution

Thus all vectors

are eigenvectors corresponding to eigenvalue $1$.
These are the only eigenvectors. Any choice of two such eigenvectors are linearly dependent. This means that it is not possible to diagonalize the matrix.

In Example 10.6, there were only single roots, and thus all eigenvalues are different. According to Theorem 10.4 such matrices are always diagonalizable. A matrix, whose characteristic polynomial has double roots (or roots with higher multiplicity) can be either diagonalizable as in
Example 10.7 or not diagonalizable as in
Example 10.8. Whether or not it works depends on how many linearly independent eigenvectors one can select from the nullspace of $(\lambda I - \mx{A})$ for each eigenvalue $\lambda$.
Here it is useful to introduce the notion of algebraic and geometric multiplicity.
Check if the matrix

\begin{equation} \mx{A} = \begin{pmatrix} 1 & 1\\ 0 & 1 \end{pmatrix} \end{equation} | (10.40) |

\begin{equation} \mx{A} = \mx{V} \mx{D} \mx{V}^{-1} . \end{equation} | (10.41) |

\begin{equation} \det( \lambda I - \mx{A} ) = \mx{A} = \begin{vmatrix} \lambda - 1 & -1\\ 0 & \lambda -1 \end{vmatrix} = (\lambda-1) (\lambda-1) - (0)(-1) = (\lambda-1) (\lambda-1) = 0 . \end{equation} | (10.42) |

For the eigenvalue $\lambda = 1$, we have

\begin{equation} (1 I - \mx{A}) \vc{v} = \begin{pmatrix} 0 & -1 \\ 0 & 0 \end{pmatrix} \vc{v} = 0 . \end{equation} | (10.43) |

\begin{equation} \vc{v} = t \begin{pmatrix} 1\\ 0 \end{pmatrix} . \end{equation} | (10.44) |

\begin{equation} \vc{v} = t \begin{pmatrix} 1\\ 0 \end{pmatrix} , \quad t \neq 0 \end{equation} | (10.45) |

Definition 10.5:
Algebraic Multiplicity, Eigenspace, and Geometric Multiplicity

Let $\lambda$ be an eigenvalue to the matrix $\mx{A}$. The multiplicity of $\lambda$ with respect to the characteristic polynomial is called the algebraic multiplicity of $\lambda$. The nullspace of $(\lambda I - \mx{A})$ is called the eigenspace of the matrix $\mx{A}$ associated with the eigenvalue $\lambda$. The geometric multiplicity of $\lambda$ is the dimension of the nullspace of $(\lambda I - \mx{A})$, i.e., the dimension of the eigenspace.

Let $\lambda$ be an eigenvalue to the matrix $\mx{A}$. The multiplicity of $\lambda$ with respect to the characteristic polynomial is called the algebraic multiplicity of $\lambda$. The nullspace of $(\lambda I - \mx{A})$ is called the eigenspace of the matrix $\mx{A}$ associated with the eigenvalue $\lambda$. The geometric multiplicity of $\lambda$ is the dimension of the nullspace of $(\lambda I - \mx{A})$, i.e., the dimension of the eigenspace.

Theorem 10.5:

The geometric multiplicity of an eigenvalue $\lambda$ is always less than or equal to the algebraic multiplicity.

The geometric multiplicity of an eigenvalue $\lambda$ is always less than or equal to the algebraic multiplicity.

Assume that the geometric multiplicity for the eigenvalue $\lambda_0$ is $k$. This means that the dimension of the nullspace of $(\lambda_0 I - \mx{A})$ is $k$, in other words, we can find $k$ linearly independent eigenvectors $\vc{e}_1, \ldots, \vc{e}_k$ to the eigenvalue $\lambda_0$. Choose a new basis according to $\vc{e}_1, \ldots, \vc{e}_k, \ldots \vc{e}_n$, so that the first $k$ basis vectors are given by the eigenvectors to $\lambda_0$. The transformation matrix $\mx{A}$ in this new basis has the form

\begin{equation} \mx{A} = \begin{pmatrix} \lambda_0 & 0 & \ldots & 0 & a_{1,k+1} & \ldots & a_{1,n}\\ 0 & \lambda_0 & \ldots & 0 & a_{2,k+1} & \ldots & a_{2,n}\\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & \ldots & \lambda_0 & a_{k,k+1} & \ldots & a_{k,n}\\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & \ldots & 0 & a_{n,k+1} & \ldots & a_{n,n}\\ \end{pmatrix} \end{equation} | (10.46) |

$\square$

Theorem 10.6:

A matrix $\mx{A}$ is diagonalizable if and only if the geometric and algebraic multiplicity is the same for each eigenvalue. or equivalently for an $n \times n$ matrix $\mx{A}$, if the sum of the dimensions of the eigenspaces is $n$, then $\mx{A}$ is diagonalizable.

A matrix $\mx{A}$ is diagonalizable if and only if the geometric and algebraic multiplicity is the same for each eigenvalue. or equivalently for an $n \times n$ matrix $\mx{A}$, if the sum of the dimensions of the eigenspaces is $n$, then $\mx{A}$ is diagonalizable.

Assume that there are $k$ different eigenvalues $\lambda_1, \ldots, \lambda_k$. If the algebraic and geometric multiplicities for each eigenvalue are the same. For each eigenvalue $\lambda_i$, let this multiplicity be $m_i$. From the algebraic multiplicity, we have $\sum_{i=1}^k m_i = n$. Since the geometric multiplicity is $m_i$ there are $m_i$ independent eigenvectors $\vc{v}_{i,1}, \ldots, \vc{v}_{i,m_i}$. All of these eigenvectors (from all eigenvalues) $\{ \vc{v}_{1,1}, \ldots, \vc{v}_{1,m_i}, \ldots, \vc{v}_{k,1}, \ldots, \vc{v}_{k,m_k} \}$ is a collection of $n$ vectors. We will show that these form a basis by showing that they are linearly independent. Assume that there are scalars $c_{i,j}$ with $1 \leq i \leq k$ and $1 \leq j \leq m_i$ so that

\begin{equation} \sum_{i=1}^k \sum_{j=1}^{m_i} c_{i,j} \vc{v}_{i,j} = 0 . \end{equation} | (10.47) |

\begin{equation} \vc{w}_{i} = \sum_{j=1}^{m_i} c_{i,j} \vc{v}_{i,j}. \end{equation} | (10.48) |

\begin{equation} \sum_{i=1}^k \vc{w}_i= 0 \end{equation} | (10.49) |

\begin{equation} \vc{w}_{i} = \sum_{j=1}^{m_i} c_{i,j} \vc{v}_{i,j} = 0 . \end{equation} | (10.50) |

\begin{equation} c_{i,j} = 0, \quad 1 \leq i \leq k, 1 \leq j \leq m_i \, . \end{equation} | (10.51) |

$\square$

In both Example 10.7 and Example 10.8, there was only one eigenvalue. In both examples, the algebraic multiplicity was 2. In Example 10.7, the geometric multiplicity was only 1 and the matrix was therefore not diagonalizable. However, in Example 10.8, the geometric multiplicity was 2 and the matrix was therefore diagonalizable. Note that for random matrices, for example, if each element of matrix was drawn from independent normal distributions, the chance of obtaining a non-diagonalizable matrix is extremely small. In fact, if a true continuous normal distribution is used rather than a (discrete) computer approximation, the chance is actually zero. So in some sense, non-diagonalizable matrices are the exception. However, there are practical applications matrices that have inherent structure that makes them non-diagonalizable.

We have mostly worked with real matrices and vectors, but it is useful to work with matrices and vectors that contain complex numbers. Even for real matrices, it is sometimes useful to consider complex eigenvalues and eigenvectors. For both real and complex matrices $\mx{A}$, the definition of complex eigenvalues and eigenvectors are similar.

Definition 10.6:
Complex Eigenvalue and Eigenvector

Let $\mx{A}$ be a square matrix with real or complex entries. A non-zero column vector $\vc{v}$ with complex entries is called an eigenvector if $\mx{A}\vc{v}$ is parallell to $\vc{v}$, i.e., if there exists a complex scalar $\lambda$, such that

The scalar $\lambda$ is then called an eigenvalue.

A useful property of symmetric matrices is summarized in the following theorem.
Let $\mx{A}$ be a square matrix with real or complex entries. A non-zero column vector $\vc{v}$ with complex entries is called an eigenvector if $\mx{A}\vc{v}$ is parallell to $\vc{v}$, i.e., if there exists a complex scalar $\lambda$, such that

\begin{equation} \mx{A}\vc{v} = \lambda \vc{v} . \end{equation} | (10.52) |

Theorem 10.7:

The eigenvalues of a symmetric matrix $\mx{A}$ are all real.

The eigenvalues of a symmetric matrix $\mx{A}$ are all real.

Assume we have an eigenvector $\vc{v}$, which may be complex, and corresponding eigenvalue $\lambda$, which also could be complex. First we show that $z = \bar{\vc{v}}^\T \mx{A} \vc{v}$ is a real number. To do this, we will use the complex conjugate of a complex number, $z=a+bi$, which is $\overline{z}=a-bi$. We get

\begin{equation} \bar{z}= \overline{\bar{\vc{v}}^\T \mx{A} \vc{v}} = \vc{v}^\T \bar{\mx{A}} \bar{\vc{v}} = (\vc{v}^\T \bar{\mx{A}} \bar{\vc{v}})^\T = \bar{\vc{v}} \bar{\mx{A}}^\T \vc{v} = \bar{\vc{v}}^\T \mx{A} \vc{v} = z . \end{equation} | (10.53) |

\begin{equation} z = \bar{\vc{v}}^\T \mx{A} \vc{v}= \bar{\vc{v}}^\T \lambda \vc{v} = \lambda (\bar{\vc{v}}^\T \vc{v}) . \end{equation} | (10.54) |

$\square$

Notice that in the proof, we needed the fact that $\bar{\mx{A}}^\T = \mx{A}$. Thus we can prove that all eigenvalues are real even for complex matrices if they fulfill the constraint $\bar{\mx{A}}^\T = \mx{A}$.

Theorem 10.8:

For symmetric matrix $\mx{A}$, it is possible to find an orthogonal matrix $\mx{U}$ and diagonal matrix $\mx{D}$ such that

In other words, it is possible to diagonalise using an orthonormal basis of eigenvectors.

For symmetric matrix $\mx{A}$, it is possible to find an orthogonal matrix $\mx{U}$ and diagonal matrix $\mx{D}$ such that

\begin{equation} \mx{A} = \mx{U} \mx{D} \mx{U}^{-1} = \mx{U} \mx{D} \mx{U}^{T} . \end{equation} | (10.55) |

We will prove this using induction over the size $k$ of the matrix $\mx{A}$. For $k = 1$, i.e., for $1 \times 1$ matrices $\mx{A}$, the theorem is true. One can then choose $\mx{U} = (1)$ and $\mx{D}= \mx{A}$. We assume that the theorem is true for $(k-1) \times (k-1)$ matrices. Let $\mx{A}$ be an $n \times n$ real symmetric matrix. We are going to prove that it is possible to diagonalise $\mx{A}$ using an orthonormal basis of eigenvectors. There is at least one eigenvector $\lambda$ with corresponding eigenvector $\vc{v}$. Set $\vc{q}_1 = \frac{\vc{v}}{||\vc{v}||}$. Choose an arbitrary $k \times k$ orthogonal matrix $\mx{Q} = \begin{pmatrix}\vc{q}_1 & \ldots & \vc{q}_k\end{pmatrix}$ with $\vc{q}_1$ as its first column. Multiplying $\mx{A}$ with $\mx{Q}$ gives

\begin{equation} \mx{A}\mx{Q} = \begin{pmatrix} \lambda \vc{q}_1 & \mx{A}\vc{q}_2 & \ldots & \mx{A} \vc{q}_k \end{pmatrix} = \mx{Q} \begin{pmatrix} \lambda & \vc{w}^\T \\ 0 & \tilde{\mx{A}} \end{pmatrix} . \end{equation} | (10.56) |

\begin{equation} \mx{Q}^\T \mx{A} \mx{Q} = \begin{pmatrix} \lambda & \vc{w}^\T \\ 0 & \tilde{\mx{A}} \end{pmatrix} . \end{equation} | (10.57) |

\begin{equation} (\mx{Q}^\T \mx{A} \mx{Q})^\T = \mx{Q}^\T \mx{A}^\T \mx{Q} = \mx{Q}^\T \mx{A} \mx{Q} = \begin{pmatrix} \lambda & 0 \\ \vc{w} & \tilde{\mx{A}}^\T \end{pmatrix} . \end{equation} | (10.58) |

\begin{equation} \tilde{\mx{A}} = \tilde{\mx{Q}} \tilde{\mx{D}} \tilde{\mx{Q}}^\T. \end{equation} | (10.59) |

\begin{equation} \mx{U} = \mx{Q} \begin{pmatrix} 1 & 0 \\ 0 & \tilde{\mx{Q}} \end{pmatrix} \end{equation} | (10.60) |

\begin{equation} \mx{D} = \begin{pmatrix} \lambda & 0 \\ 0 & \tilde{\mx{D}} \end{pmatrix}, \end{equation} | (10.61) |

$\square$

Definition 10.7:
Definiteness

A symmetric real $n \times n$ matrix $\mx{A}$ is said to be

A symmetric real $n \times n$ matrix $\mx{A}$ is said to be

\begin{align} \begin{array}{lll} &(1) & \text{positive definite if } & \vc{v}^T \mx{A} \vc{v} > 0, \, \, \forall \vc{v} \in R^n \setminus 0 \\ &(2) & \text{positive semi-definite if } & \vc{v}^T \mx{A} \vc{v} \geq 0, \, \, \forall \vc{v} \in R^n \setminus 0\\ &(3) & \text{negative definite if } & \vc{v}^T \mx{A} \vc{v} < 0, \, \, \forall \vc{v} \in R^n \setminus 0\\ &(4) & \text{negative semi-definite if } & \vc{v}^T \mx{A} \vc{v} \leq 0, \, \, \forall \vc{v} \in R^n \setminus 0\\ &(5) & \text{indefinite if } & \vc{v}^T \mx{A} \vc{v} \text{ takes on both positive and negative values} \\ \end{array} \end{align} | (10.62) |

Theorem 10.9:
Definiteness and eigenvalues

A symmetric real $n \times n$ matrix $\mx{A}$ is

A symmetric real $n \times n$ matrix $\mx{A}$ is

\begin{align} \begin{array}{ll} &(1) & \text{positive definite if and only if all eigenvalues are positive} \\ &(2) & \text{positive semi-definite if and only if all eigenvalues are non-negative} \\ &(3) & \text{negative definite if and only if all eigenvalues are negative} \\ &(4) & \text{negative semi-definite if and only if all eigenvalues are non-positive} \\ &(5) & \text{indefinite if and only if there are both positive and negative eigenvalues} \\ \end{array} \end{align} | (10.63) |

Since the matrix $\mx{A}$ is symmetric it is possible to diagonalize it using an orthonormal matrix $\mx{U}$, i.e. $\mx{A} = \mx{U}^\T \mx{D} \mx{U}$. A change of coordinates $\vc{w} = \mx{U} \vc{v}$ gives

\begin{equation} \vc{v}^T \mx{A} \vc{v} = \vc{v}^T \mx{U}^\T \mx{D} \mx{U} \vc{v} = \vc{w}^T \mx{D} \vc{w} \end{equation} | (10.64) |

$\square$

In Example 10.4, we studied how long the output vector $F(\vc{v})$ could be if we fixed the length of the input vector. Here we will use the tools from Theorem 10.8 and Theorem 10.7 to understand this better. Given a real matrix $\mx{A}$ and the corresponding linear map $F(\vc{v}) = \mx{A} \vc{v}$ for real vectors $\vc{v}$, which input $\vc{v}$ gives the largest and smallest elongation $s = \frac{|| \mx{A} \vc{v}||}{||\vc{v}||}$? It is easier to study the square of the elongation, i.e.,

\begin{equation} s^2 =\left(\frac{|| \mx{A} \vc{v}||}{||\vc{v}||}\right)^2= \frac{\vc{v}^\T \mx{A}^\T \mx{A} \vc{v}}{\vc{v}^\T \vc{v}}, \end{equation} | (10.65) |

\begin{equation} \mx{A}^\T \mx{A} = \mx{U} \mx{D} \mx{U}^\T . \end{equation} | (10.66) |

\begin{equation} s^2 = \frac{\vc{v}^\T \mx{A}^\T \mx{A} \vc{v}}{\vc{v}^\T \vc{v}} = \frac{\vc{v}^\T \mx{U} \mx{D} \mx{U}^\T \vc{v}}{\vc{v}^\T \mx{U} \mx{U}^\T \vc{v}} . \end{equation} | (10.67) |

\begin{equation} s^2 = \frac{\vc{w}^\T \mx{D} \vc{w}}{\vc{w}^\T \vc{w}}. \end{equation} | (10.68) |

\begin{equation} w_1^2 d_1 + w_2^2 d_2 + \ldots + w_n^2 d_n, \end{equation} | (10.69) |

\begin{equation} w_1^2 + w_2^2 + \ldots + w_n^2. \end{equation} | (10.70) |

If we want a formal proof for this intuition, we can proceed the following way. First we rewrite Equation (10.68) as

\begin{equation} s^2 = \frac{\sum_{i=1}^n d_i w_i^2 }{\sum_{i=1}^n w_i^2}. \end{equation} | (10.71) |

\begin{equation} \frac{\partial (s^2)}{\partial w_i} = \frac{ (2 d_i w_i) (\sum_{i=1}^n w_i^2) - (\sum_{i=1}^n d_i w_i^2 )(2 w_i)}{(\sum_{i=1}^n w_i^2)^2} . \end{equation} | (10.72) |

\begin{equation} \begin{pmatrix} d_1 w_1 & w_1 \\ d_2 w_2 & w_2 \\ \vdots & \vdots \\ d_n w_n & w_n \end{pmatrix} \begin{pmatrix} \sum_{i=1}^n w_i^2 \\ \sum_{i=1}^n d_i w_i^2 \end{pmatrix} = \vc{0}. \end{equation} | (10.73) |

\begin{equation} \det \begin{pmatrix} d_i w_i & w_i \\ d_j w_j & w_j \end{pmatrix} =(d_i-d_j) w_i w_j = 0 \end{equation} | (10.74) |

Example 10.9:
Max elongation for symmetric matrices

Study the linear mapping in Interactive Illustration 10.6, where the (red) input vector has length one. Try to change the input vector $\vc{v}$ and make the resulting vector $F(\vc{v})$ as long as possible and then as short as possible. In this example the mapping $\vc{v} \rightarrow \mx{A} \vc{v}$, where $\mx{A}$ is a symmetric matrix. For this case the interpretation is slightly more simple. In Chapter 9, we saw that a (finite dimensional) linear mapping $F$ could always be represented as a matrix multiplication $\vc{y} = \mx{A} \vc{x}$. If the matrix is diagonalizable, we know from Definition 10.4 that

This means that if we change coordinates according to $ \vc{x} = \mx{V} \tilde{\vc{x}}$ (and thus also $\vc{y} = \mx{V} \tilde{\vc{y}}$), we get a simplified relation $\tilde{\vc{y}} = \tilde{\mx{A}} \tilde{\vc{x}}$ in the new coordinate system, i.e.,

which means that

This means that the linear mapping is now represented by a diagonal matrix containing the eigenvalues.

In this example the matrix $\mx{A}$ is given by

In the general case (see above), it is the eigenvalues to $\mx{A}^T \mx{A}$ that gives the largest elongation, but since the matrix is symmetric, the eigenvectors to $\mx{A}$ are also eigenvectors to $\mx{A}^T \mx{A} = \mx{A}^2$.

Study the linear mapping in Interactive Illustration 10.6, where the (red) input vector has length one. Try to change the input vector $\vc{v}$ and make the resulting vector $F(\vc{v})$ as long as possible and then as short as possible. In this example the mapping $\vc{v} \rightarrow \mx{A} \vc{v}$, where $\mx{A}$ is a symmetric matrix. For this case the interpretation is slightly more simple. In Chapter 9, we saw that a (finite dimensional) linear mapping $F$ could always be represented as a matrix multiplication $\vc{y} = \mx{A} \vc{x}$. If the matrix is diagonalizable, we know from Definition 10.4 that

\begin{equation} \mx{A} = \mx{V} \mx{D} \mx{V}^{-1} . \end{equation} | (10.75) |

\begin{equation} \tilde{\vc{y}} = \mx{V}^{-1} \vc{y} = \mx{V}^{-1} \mx{V} \mx{D} \mx{V}^{-1} \vc{x} = \mx{V}^{-1} \mx{V} \mx{D} \mx{V}^{-1} \mx{V} \tilde{\vc{x}} \end{equation} | (10.76) |

\begin{equation} \tilde{\vc{y}} = \mx{D} \tilde{\vc{x}} . \end{equation} | (10.77) |

In this example the matrix $\mx{A}$ is given by

\begin{equation} \mx{A} = \frac{1}{10}\left(\begin{array}{cc} 8 & 6\\ 6 & 17\\ \end{array}\right). \end{equation} | (10.78) |

In this section, we provide some useful results and properties of eigenvalues and eigenvectors.

Theorem 10.10:

If $\vc{v}$ is an eigenvector with eigenvalue $\lambda$ to the matrix $\mx{A}$ and also eigenvector with eigenvalue $\mu$ to the matrix $\mx{B}$, then

If $\vc{v}$ is an eigenvector with eigenvalue $\lambda$ to the matrix $\mx{A}$ and also eigenvector with eigenvalue $\mu$ to the matrix $\mx{B}$, then

\begin{equation} \begin{array}{ll} (i) & \text{The vector}\ \vc{v} \ \text{is an eigenvector with eigenvalue}\ c\lambda \ \text{to the matrix}\ c\mx{A}. \\ (ii) & \text{The vector}\ \vc{v} \ \text{is an eigenvector with eigenvalue}\ \lambda+d \ \text{to the matrix}\ \mx{A}+d \mx{I}. \\ (iii) & \text{The vector}\ \vc{v} \ \text{is an eigenvector with eigenvalue}\ \lambda^n \ \text{to the matrix}\ \mx{A}^n. \\ (iv) & \text{The vector}\ \vc{v} \ \text{is an eigenvector with eigenvalue}\ 1/\lambda. \\ & \text{to the matrix}\ \mx{A}^{-1}, \ \text{if}\ \mx{A}\ \text{is invertible}. \\ (v) & \text{The vector}\ \vc{v} \ \text{is an eigenvector with eigenvalue}\ \lambda + \mu \ \text{to the matrix}\ \mx{A}+\mx{B}. \\ \end{array} \end{equation} | (10.79) |

Most of these statements are fairly simple to prove.

$(i)$: Multiply the matrix $c\mx{A}$ with the vector $\vc{v}$.

\begin{equation} (c \mx{A}) \vc{v}= c \mx{A} \vc{v} = c \lambda \vc{v} = (c \lambda) \vc{v} . \end{equation} | (10.80) |

$(ii)$: Multiply the matrix $\mx{A} + d\mx{I}$ with the vector $\vc{v}$.

\begin{equation} (\mx{A} + d\mx{I}) \vc{v} = \mx{A}\vc{v} + d \mx{I} \vc{v} = \lambda \vc{v} + d \vc{v} = (\lambda + d) \vc{v} . \end{equation} | (10.81) |

$(iii)$: Multiply the matrix $\mx{A}^n$ with the vector $\vc{v}$.

\begin{equation} \mx{A}^n \vc{v} = \mx{A}^{n-1} \mx{A} \vc{v} = \lambda \mx{A}^{n-1} \vc{v} = \lambda^2 \mx{A}^{n-2} \vc{v} = \ldots = \lambda^n \vc{v} . \end{equation} | (10.82) |

$(iv)$: Study the equation

\begin{equation} \mx{A} \vc{v} = \lambda \vc{v} . \end{equation} | (10.83) |

\begin{equation} \vc{v} = \lambda \mx{A}^{-1} \vc{v} . \end{equation} | (10.84) |

\begin{equation} \frac{1}{\lambda} \vc{v} = \mx{A}^{-1} \vc{v} . \end{equation} | (10.85) |

$(v)$: Multiply the matrix $\mx{A}+\mx{B}$ with the vector $\vc{v}$.

\begin{equation} ( \mx{A} + \mx{B}) \vc{v} = \mx{A} \vc{v} + \mx{B} \vc{v} = \lambda \vc{v} + \mu \vc{v} = (\lambda + \mu) \vc{v} . \end{equation} | (10.86) |

$\square$

Theorem 10.11:

The matrices $\mx{A}$ and $\mx{A}^\T$ have the same characteristic polynomial, i.e., $p_{\mx{A}}(\lambda) = p_{\mx{A}^\T}(\lambda)$.

The matrices $\mx{A}$ and $\mx{A}^\T$ have the same characteristic polynomial, i.e., $p_{\mx{A}}(\lambda) = p_{\mx{A}^\T}(\lambda)$.

This follows from the rules for determinants (Theorem 7.1).

\begin{equation} p_{\mx{A}^\T}(\lambda) = \det( \lambda \mx{I} - \mx{A}^\T ) = \det( (\lambda \mx{I} - \mx{A})^\T ) = \det(\lambda \mx{I} - \mx{A}) = p_\mx{A}(\lambda) . \end{equation} | (10.87) |

$\square$

This means that the eigenvalues of $ \mx{A}$ and $\mx{A}^\T$ are the same. However, the eigenvectors could be different.

Theorem 10.12:

If $\mx{V}$ is an invertible matrix, then the two matrices $\mx{A}$ and $\mx{V} \mx{A} \mx{V}^{-1}$ have the same chararecteristic polynomial, i.e., $p_{\mx{A}}(\lambda) = p_{\mx{V} \mx{A} \mx{V}^{-1}}(\lambda)$.

If $\mx{V}$ is an invertible matrix, then the two matrices $\mx{A}$ and $\mx{V} \mx{A} \mx{V}^{-1}$ have the same chararecteristic polynomial, i.e., $p_{\mx{A}}(\lambda) = p_{\mx{V} \mx{A} \mx{V}^{-1}}(\lambda)$.

This also follows from the rules for determinants (Theorem 7.1). We can write $\mx{I} = \mx{V}\mx{V}^{-1} = \mx{V}\mx{I}\mx{V}^{-1}$, and hence

\begin{equation} p_{\mx{V} \mx{A} \mx{V}^{-1}}(\lambda) = \det( \lambda \mx{I} - \mx{V} \mx{A} \mx{V}^{-1} ) = \det( \lambda \mx{V}\mx{I}\mx{V}^{-1} - \mx{V} \mx{A} \mx{V}^{-1} ). \end{equation} | (10.88) |

\begin{equation} p_{\mx{V} \mx{A} \mx{V}^{-1}}(\lambda) = \det( \mx{V} (\lambda \mx{I} - \mx{A}) \mx{V}^{-1} ) = \det( \mx{V}) \det(\lambda \mx{I} - \mx{A}) \det(\mx{V}^{-1} ) = \det(\lambda \mx{I} - \mx{A}) = p_{\mx{A}}(\lambda), \end{equation} | (10.89) |

$\square$

Theorem 10.13:

If $\mx{A}$ is an $n \times n$ matrix with eigenvalues $\lambda_1, \ldots, \lambda_n$, then

and

In other words, the determinant is the product of the eigenvalues and the trace, which is defined as the
sum of the diagonal values of a matrix, is the sum of
the eigenvalues.

If $\mx{A}$ is an $n \times n$ matrix with eigenvalues $\lambda_1, \ldots, \lambda_n$, then

\begin{equation} \det(\mx{A}) = \lambda_1 \cdot \lambda_2 \cdot \ldots \cdot \lambda_n . \end{equation} | (10.90) |

\begin{equation} \tr(\mx{A}) = \lambda_1 + \lambda_2 + \ldots + \lambda_n . \end{equation} | (10.91) |

The proofs of both of these theorems can be found by studying the characteristic polynomial in two different ways. The characteristic polynomial is on the one hand

\begin{equation} p_{\mx{A}}(\lambda) = (\lambda-\lambda_1) (\lambda-\lambda_2) \cdot (\lambda - \lambda_n) = \lambda^n - (\lambda_1 + \lambda_2 + \cdot + \lambda_n) \lambda^{n-1} + \ldots + (\lambda_1 \lambda_2 \ldots \lambda_n) (-1)^n . \end{equation} | (10.92) |

\begin{equation} p_{\mx{A}}(\lambda) = \det( \lambda \mx{I} - \mx{A} ) = \lambda^n - (a_{11} + a_{22} + \ldots + a_{nn}) \lambda^{n-1} + \ldots + (\det(\mx{A})) (-1)^n \end{equation} | (10.93) |

\begin{equation} p_{\mx{A}}(\lambda) = \det( \lambda \mx{I} - \mx{A} ) = \lambda^n - \tr(\mx{A}) \lambda^{n-1} + \ldots + (\det(\mx{A})) (-1)^n . \end{equation} | (10.94) |

$\square$

In this section, we show that eigenvalues and eigenvectors can be truly useful using a set of examples.

Example 10.10:
Calculation of $\mx{A}^n$

Diagonalization can be used to simplify and understand $\mx{A}^n$. If $\mx{A}$ is diagonalizable, then $\mx{A} = \mx{V} \mx{D} \mx{V}^{-1}$. Calculation of $\mx{A}^n$ can then be rewritten

In other word, calculation of $\mx{A}^n$ can be written explicitly using eigenvalue decomposition

Diagonalization can be used to simplify and understand $\mx{A}^n$. If $\mx{A}$ is diagonalizable, then $\mx{A} = \mx{V} \mx{D} \mx{V}^{-1}$. Calculation of $\mx{A}^n$ can then be rewritten

\begin{equation} \mx{A}^n = \mx{V} \mx{D} \mx{V}^{-1} \mx{V} \mx{D} \mx{V}^{-1} \mx{V} \mx{D} \mx{V}^{-1} \ldots \mx{V} \mx{D} \mx{V}^{-1} = \mx{V} \mx{D} \mx{D} \mx{D} \ldots \mx{D} \mx{V}^{-1} = \mx{V} \mx{D}^n \mx{V}^{-1}. \end{equation} | (10.95) |

Example 10.11:
Calculation of $\mx{A}^n$

What is

What happens when $n \rightarrow \infty$?
The matrix is symmetric so we known that the eigenvalues are going to be real and that it is possible to diagonalize with an orthogonal matrix. This is not so important here, but a factorization is

According to the previous example, we have

As $n$ tends to infinity $(0.5)^n$ tends to zero and

What is

\begin{equation} \begin{pmatrix} 0.68 & -0.24 \\ -0.24 & 0.82 \end{pmatrix}^n? \end{equation} | (10.96) |

\begin{equation} \begin{pmatrix} 0.68 & -0.24 \\ -0.24 & 0.82 \end{pmatrix} = \mx{U} \mx{D} \mx{U}^\T = \begin{pmatrix} 0.8 & -0.6 \\ 0.6 & 0.8 \end{pmatrix} \begin{pmatrix} 0.5 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0.8 & -0.6 \\ 0.6 & 0.8 \end{pmatrix}^\T \end{equation} | (10.97) |

\begin{equation} \begin{pmatrix} 0.68 & -0.24 \\ -0.24 & 0.82 \end{pmatrix}^n = \begin{pmatrix} 0.8 & -0.6 \\ 0.6 & 0.8 \end{pmatrix} \begin{pmatrix} (0.5)^n& 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0.8 & -0.6 \\ 0.6 & 0.8 \end{pmatrix}^\T. \end{equation} | (10.98) |

\begin{equation} \begin{pmatrix} 0.68 & -0.24 \\ -0.24 & 0.82 \end{pmatrix}^n \rightarrow \begin{pmatrix} 0.8 & -0.6 \\ 0.6 & 0.8 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0.8 & -0.6 \\ 0.6 & 0.8 \end{pmatrix}^\T = \begin{pmatrix} 0.36 & -0.48 \\ -0.48 & 0.64 \end{pmatrix} . \end{equation} | (10.99) |

Example 10.12:
Pagerank: Calculation of $\vc{p}$ so that $\mx{B}\vc{p}=\vc{p}$

In the beginning of the chapter, we studied how the importance or rank of webpages can be modelled as a vector $\vc{p}$, which can be found by the linear equation $\mx{B}\vc{p} = \vc{p}$. In practice, it is infeasible to calculate $\vc{p}$ using Gaussian elimination on the system of equations $\mx{B}\vc{p} = \vc{p}$. It turns out that it is much more efficient to calculate $\vc{p}$ by iteration. Start with any vector $\vc{w}_0$ and form

For this type of matrix $\mx{B}$, it is possible to show that $\vc{w}_k$ quickly converge to $\vc{p}$ such that $\mx{B}\vc{p} = \vc{p}$.
To understand this, one first shows that the matrix has one eigenvector $\vc{v}_1$ with eigenvalue $\lambda_1 = 1$ and that
all other eigenvectors $\vc{v}_2, \ldots, \vc{v}_n$ have eigenvalues with absolute magnitude smaller than 1.
This means that for any vector $\vc{w}_0$, we have $\vc{w}_k = \mx{B}^k \vc{w}_0$, with

where

when $k\rightarrow \infty$.
Introduce the vector $\vc{e}_1 = \begin{pmatrix} 1 & 0 & \ldots & 0 \end{pmatrix}$.
We see that

where $\mx{V} \vc{e}_1 = \vc{v}_1$ is the first eigenvector and where the unknown scalar
$\mu = \vc{e}_1^\T \mx{V}^{-1} \vc{w}_0 $. In other words, by choosing a random vector $\vc{w}_0$ and
iterating $\vc{w}_{k+1} = \mx{B} \vc{w}_k$, this vector will converge to a multiple of $\vc{v_1}$,
which in turn is a multiple of $\vc{p}$.
By construction, the matrix $\mx{B}$ is such that if the sum of the elements of a vector $\vc{w}$ is positive and sums to 1, then the vector $\mx{B} \vc{w}$ is also positive and sums to one, so by starting with such a probability vector $\vc{w}_k \rightarrow \vc{p}$.
In Example 10.2, there were four homepages on internet and the matrix $\mx{B}$ was

This matrix has four eigenvalues $1$, $0.42$, $-0.42$, and $0$. Since the absolute value of the eigenvalus $\lambda_j$ is smaller than $0.5$, we have that for each 10 iterations, the error $||\vc{w}_k - \vc{p}||$ will be decreased by a factor of 1000.
Even though the number of web pages on internet is large, it is actually possible to perform the iterations $\vc{w}_{k+1} = \mx{B} \vc{w}_k$ and since only a few iterations are needed, it is actually feasible to calculate $\vc{p}$ in this way.

In the beginning of the chapter, we studied how the importance or rank of webpages can be modelled as a vector $\vc{p}$, which can be found by the linear equation $\mx{B}\vc{p} = \vc{p}$. In practice, it is infeasible to calculate $\vc{p}$ using Gaussian elimination on the system of equations $\mx{B}\vc{p} = \vc{p}$. It turns out that it is much more efficient to calculate $\vc{p}$ by iteration. Start with any vector $\vc{w}_0$ and form

\begin{equation} \vc{w}_{k+1} = \mx{B} \vc{w}_k . \end{equation} | (10.100) |

\begin{equation} \vc{w}_k = \mx{B}^k \vc{w}_0 = (\mx{V} \mx{D} \mx{V}^{-1})^k \vc{w}_0 = \mx{V} \mx{D}^k \mx{V}^{-1} \vc{w}_0, \end{equation} | (10.101) |

\begin{equation} \mx{D}^k= \begin{pmatrix} \lambda_1^k & 0 & \ldots & 0 \\ 0 & \lambda_2^k & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \lambda_n^k \\ \end{pmatrix} \rightarrow \begin{pmatrix} 1& 0 & \ldots & 0 \\ 0 & 0 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & 0 \\ \end{pmatrix}, \end{equation} | (10.102) |

\begin{equation} \vc{w}_k = \mx{B}^k \vc{w}_0 \rightarrow \mx{V} \vc{e}_1 \vc{e}_1^\T \mx{V}^{-1} \vc{w}_0 = \vc{v}_1 \mu, \end{equation} | (10.103) |

\begin{equation} \mx{B} = 0.15 \begin{pmatrix} 1/4 & 1/4 & 1/4 & 1/4 \\ 1/4 & 1/4 & 1/4 & 1/4 \\ 1/4 & 1/4 & 1/4 & 1/4 \\ 1/4 & 1/4 & 1/4 & 1/4 \end{pmatrix} + 0.85 \begin{pmatrix} 0 & 1 & 1/2 & 1 \\ 1/2 & 0 & 1/2 & 0 \\ 1/2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} . \end{equation} | (10.104) |

Eigenvectors can be used to better understand linear mappings.

Example 10.13:
Interpretation of Linear Mappings

In Example 9.7, we studied the linear mapping used to find the shadow of a cube on a planar surface. In that example, we calculated the transformation matrix as

Eigenvectors and eigenvalues are useful for interpreting the linear mapping.
The characteristic polynomial is $(\lambda-1)^2 \lambda$. For the double root eigenvalue $\lambda_1 = \lambda_2 = 1$, we can choose two linearly independent eigenvectors, such as $\vc{v_1} = (1, 0, 0)$ and $\vc{v_2} = (0, 0, 1)$. For the eigenvalue $\lambda_3 = 0$, the eigenvector is
$\vc{v_3} = t (0.5, 1.0, 0.25), t \neq 0$.
Since the first two eigenvectors span the plane $y=0$, all vectors lying in this plane are unaffected by the linear mapping. This is due to the fact that such a vector $\vc{x} = (x_1, 0, x_2)$ can be written as a linear combination of the two eigenvectors $\vc{v}_1$ and $\vc{v}_2$:

and performing the linear mapping on such a vector thus results in

Conversely, all vectors parallel to $\vc{v}_3 = (0.5, 1.0, 0.25)$ are mapped to the zero vector: Such a vector can be written $\vc{x} = t \vc{v}_3$, and it is mapped to $\mx{A}\vc{x} = t\mx{A}\vc{v}_3 = t\lambda_3\vc{v}_3 = t 0 \vc{v}_3 = \vc{0}$.

In Interactive Illustration 10.7, we show how this can be applied to shadow casting. Assume we have a point represented by the vector $\vc{p}$ on the object. Since the three eigenvectors are linearly independent, we can express the vector $\vc{p}$ as a linear combination of the three eigenvectors,

This can be interpreted as having two components that specify a point on the "ground" (a point in the plane) and then one component that goes in the direction of the light source. We now see what happens when perform the linear mapping $\mx{A}\vc{p}$:

The first two terms still specify a point in the plane, and the last term vanishes since $\lambda_3=0$. This can be interpreted as the occluding point being projected down to the plane in the direction of the third eigenvector.

In Example 9.7, we studied the linear mapping used to find the shadow of a cube on a planar surface. In that example, we calculated the transformation matrix as

\begin{equation} \mx{A} = \left(\begin{array}{ccc} 1 & -0.5 & 0\\ 0 & 0 & 0\\ 0 & -0.25 & 1\\ \end{array}\right). \end{equation} | (10.105) |

\begin{equation} \vc{x} = x_1 \vc{v}_1 + x_2 \vc{v}_2, \end{equation} | (10.106) |

\begin{equation} \mx{A}\vc{x} = \mx{A}(x_1 \vc{v}_1 + x_2 \vc{v}_2)\\ = x_1 \mx{A}\vc{v}_1 + x_2 \mx{A} \vc{v}_2\\ = x_1 \lambda_1 \vc{v}_1 + x_2 \lambda_2 \vc{v}_2\\ = x_1 \vc{v}_1 + x_2 \vc{v}_2 = \vc{x}. \end{equation} | (10.107) |

In Interactive Illustration 10.7, we show how this can be applied to shadow casting. Assume we have a point represented by the vector $\vc{p}$ on the object. Since the three eigenvectors are linearly independent, we can express the vector $\vc{p}$ as a linear combination of the three eigenvectors,

\begin{equation} \vc{p} = \alpha_1 \vc{v}_1 + \alpha_2 \vc{v_2} + \alpha_3 \vc{v_3}. \end{equation} | (10.108) |

\begin{equation} \mx{A}\vc{p} = \alpha_1 \mx{A}\vc{v}_1 + \alpha_2 \mx{A}\vc{v_2} + \alpha_3 \mx{A}\vc{v_3} = \alpha_1 \lambda_1 \vc{v}_1 + \alpha_2 \lambda_2 \vc{v}_2 + \alpha_3 \lambda_3 \vc{v}_3. \end{equation} | (10.109) |

Example 10.14:
Waves and Vibrations 1

One of the many success stories of linear algebra is the use for understanding*partial differential equations*. In a series of three examples, we will study small vibrations. In the first example, we model small vibrations on a string as having one single weight of mass $m$ in the middle. Assume that the string is fastened at two points with a distance $l$ from each other. Introduce a variable $x$ as the horizontal distance to the mass from the mid-point as shown in Example 10.8.
The string is affected by the force $2 T \sin(\alpha)$, where $\alpha$ is the angle as illustrated in the figure. Assuming that the vibrations are small, we can approximate $\sin(\alpha)$ with $x/(l/2)$, which means that the force is approximately

Using Newtons first law that $F = ma$, we obtain a differential equation

Introduce a constant $q^2 = \frac{4T}{ml}$. This gives

which has the following solutions

where $a$ is an arbitrary amplitude and $\phi$ is an arbitrary phase offset, that can be determined by the initial conditions, for example how hard was the guitar string picked ($a$) and when was it released ($\phi$).

One of the many success stories of linear algebra is the use for understanding

\begin{equation} F = - \frac{4T}{l} x. \end{equation} | (10.110) |

\begin{equation} x'' = - \frac{4T}{ml} x . \end{equation} | (10.111) |

\begin{equation} x'' = - q^2 x , \end{equation} | (10.112) |

\begin{equation} x(t) = a \sin(qt+\phi) , \end{equation} | (10.113) |

Example 10.15:
Waves and Vibrations 2

In this example, we elaborate further on the previous example by refining the model slightly. Here, we model the string as having three small masses at equidistant points along the string, see Interactive Illustration 10.9. Assume, as before, that the string is fastened at two points with a distance $l$ from each other. Introduce three variables $(x_1, x_2, x_3)$ as the horizontal distances to the three masses as shown in the figure. Assuming as before that the vibrations are small, the differential equation for the position $x_1$ of the first mass is

For the other two distances, we have

and

Introduce $q = \frac{4T}{ml}$. The three equations can be written using matrix notation as

or simply

The differential equation is difficult to solve, primarily because the three functions $x_1(t)$, $x_2(t)$ and $x_3(t)$ are intertwined. It would be much easier if they were decoupled. This is precisely what the eigenvalue decomposition can help us with.
Since $\mx{A}$ is symmetric, we know that it is diagonalizable with a rotation (orthogonal) matrix $\mx{R}$. Thus we have

or

By changing coordinate system so that

we obtain a decoupled system of equations

or

Since the differential equations are decoupled, they can be solved one by one independently.
The solutions are

Finally, the solution can be expressed in the original coordinate system using $ \vc{x} = \mx{R} \vc{z}$

The motion is characterized by the three * eigenfrequencies*, which depend on the eigenvalues $\lambda_i$. For each eigenvalue, there is a characteristic spatial shape of the string. These are given by the eigenvectors. See Figure 10.9.

To get the full solution including the amplitudes ($a_1$, $a_2$, and $a_3$) and phases ($\phi_1$, $\phi_2$, and $\phi_3$) we need information about the initial conditions. In the interactive illustration we can set the position of the weights at time $t=0$, effectively assigning values to $x_1(0)$, $x_2(0)$ and $x_3(0)$. These can be used directly on Equation (10.125), but that equation is quite hairy and hard to solve. Instead, we again change the coordinate system of the constraints using

and use the newly obtained constraints $\vc{z}(0)$ on the equations in (10.124). However, we only have three equations and six unknowns (all amplitudes and phases). Hence, we need three more constraints. These are set in the interactive illustration by changing the inital velocities $\dot{x}_1(0)$, $\dot{x}_2(0)$, and $\dot{x}_3(0)$ (i.e., moving the yellow arrows). By differentiating the equations in (10.124), we get

Dividing each equation in (10.124) by the corresponding equation in (10.127) and setting $t$ to zero gives

where $\dot{z}(0)$ can again be obtained using $\dot{\vc{z}}(0) = \mx{R}^\T \dot{\vc{x}}(0)$. Since we have three equations and three unknowns ($\phi_1$, $\phi_2$ and $\phi_3$), these can be solved for. They can now be inserted in the equations in either (10.124) or (10.127) to obtain $a_1$, $a_2$, and $a_3$. This is what is automatically done in the interactive figure when the weights and arrows are changed by the user. In the next step of the illustration, Equation (10.125) is used to animate the weights.

In this example, we elaborate further on the previous example by refining the model slightly. Here, we model the string as having three small masses at equidistant points along the string, see Interactive Illustration 10.9. Assume, as before, that the string is fastened at two points with a distance $l$ from each other. Introduce three variables $(x_1, x_2, x_3)$ as the horizontal distances to the three masses as shown in the figure. Assuming as before that the vibrations are small, the differential equation for the position $x_1$ of the first mass is

\begin{equation} x_1'' = - \frac{8T}{ml} x_1 + \frac{4T}{ml} x_2 . \end{equation} | (10.114) |

\begin{equation} x_2'' = \frac{4T}{ml} x_1 - \frac{8T}{ml} x_2 + \frac{4T}{ml} x_3 \end{equation} | (10.115) |

\begin{equation} x_3'' = \frac{4T}{ml} x_2 - \frac{8T}{ml} x_3 . \end{equation} | (10.116) |

\begin{equation} \left(\begin{array}{c} x_1'' \\ x_2'' \\ x_3'' \\ \end{array}\right) = q^2 \left(\begin{array}{ccc} -2 & 1 & 0\\ 1 & -2 & 1\\ 0 & 1 & -2\\ \end{array}\right) \left(\begin{array}{c} x_1 \\ x_2 \\ x_3 \\ \end{array}\right) \end{equation} | (10.117) |

\begin{equation} \vc{x}'' = \mx{A} \vc{x} . \end{equation} | (10.118) |

\begin{equation} \vc{x}'' = \mx{R} \mx{D}\mx{R}^\T \vc{x} \end{equation} | (10.119) |

\begin{equation} \mx{R}^\T \vc{x}'' = \mx{D}\mx{R}^\T \vc{x} . \end{equation} | (10.120) |

\begin{equation} \vc{z} = \mx{R}^\T \vc{x}, \end{equation} | (10.121) |

\begin{equation} \vc{z}'' = \mx{D} \vc{z} \end{equation} | (10.122) |

\begin{equation} \begin{split} z_1'' = \lambda_1 z_1, \\ z_2'' = \lambda_2 z_2,\\ z_3'' = \lambda_3 z_3. \\ \end{split} \end{equation} | (10.123) |

\begin{equation} \begin{cases} \begin{array}{rrrl} z_1(t) = a_1 \sin( \sqrt{-\lambda_1} t + \phi_1), \\ z_2(t) = a_2 \sin( \sqrt{-\lambda_2} t + \phi_2), \\ z_3(t) = a_3 \sin( \sqrt{-\lambda_3} t + \phi_3). \end{array} \end{cases} \end{equation} | (10.124) |

\begin{equation} \begin{cases} \begin{array}{rrrl} x_1(t) = a_1 r_{11} \sin( \sqrt{-\lambda_1} t + \phi_1) + a_2 r_{12} \sin( \sqrt{-\lambda_2} t + \phi_2) + a_3 r_{13} \sin( \sqrt{-\lambda_3} t + \phi_3), \\ x_2(t) = a_1 r_{21} \sin( \sqrt{-\lambda_1} t + \phi_1) + a_2 r_{22} \sin( \sqrt{-\lambda_2} t + \phi_2) + a_3 r_{23} \sin( \sqrt{-\lambda_3} t + \phi_3), \\ x_3(t) = a_1 r_{31} \sin( \sqrt{-\lambda_1} t + \phi_1) + a_2 r_{32} \sin( \sqrt{-\lambda_2} t + \phi_2) + a_3 r_{33} \sin( \sqrt{-\lambda_3} t + \phi_3). \\ \end{array} \end{cases} \end{equation} | (10.125) |

To get the full solution including the amplitudes ($a_1$, $a_2$, and $a_3$) and phases ($\phi_1$, $\phi_2$, and $\phi_3$) we need information about the initial conditions. In the interactive illustration we can set the position of the weights at time $t=0$, effectively assigning values to $x_1(0)$, $x_2(0)$ and $x_3(0)$. These can be used directly on Equation (10.125), but that equation is quite hairy and hard to solve. Instead, we again change the coordinate system of the constraints using

\begin{equation} \vc{z}(0) = \mx{R}^\T \vc{x}(0) \end{equation} | (10.126) |

\begin{equation} \begin{cases} \begin{array}{rrrl} \dot{z}_1(t) = a_1 \sqrt{-\lambda_1}\cos( \sqrt{-\lambda_1} t + \phi_1), \\ \dot{z}_2(t) = a_2 \sqrt{-\lambda_1}\cos( \sqrt{-\lambda_2} t + \phi_2), \\ \dot{z}_3(t) = a_3 \sqrt{-\lambda_1}\cos( \sqrt{-\lambda_3} t + \phi_3). \end{array} \end{cases} \end{equation} | (10.127) |

\begin{equation} \begin{cases} \begin{array}{rrrl} \frac{z_1(0)}{\dot{z}_1(0)} = \frac{1}{\sqrt{-\lambda_1}}\tan( \phi_1), \\ \frac{z_2(0)}{\dot{z}_2(0)} = \frac{1}{\sqrt{-\lambda_1}}\tan( \phi_2), \\ \frac{z_3(0)}{\dot{z}_3(0)} = \frac{1}{\sqrt{-\lambda_1}}\tan( \phi_3), \end{array} \end{cases} \end{equation} | (10.128) |

Example 10.16:
Waves and Vibrations 3

In the previous example, we modelled a string that vibrates without any external forces. Using vector notation, the model is

If there are external forces these add to the acceleration. Thus, a reasonable model is

An interesting question is to study how the string $\vc{x}(t)$ will move if it is influenced by a sinusoidal external force $\vc{u}(t) = \cos(\omega t) \vc{u_0}$. If we assume that there is a particular solution $\vc{x}(t) = \cos(\omega t) \vc{x}_0$,
then we obtain

that is,

This means that we can calculate how large the shape and size of the vibration $\vc{x}_0$ are, depending on the input $\vc{u}_0$
and of the frequency $\omega$, as

if $(-\omega^2 \mx{I} - \mx{A})$ is invertible. It is not invertible when
$\det{(-\omega^2 \mx{I} - \mx{A})} = 0$, that is, if $-\omega^2$ is an eigenvalue to $\mx{A}$.
Without going into detail, these eigenvalues are called * poles * to the so called * transfer function *.
The transfer function is loosely speaking a description on how inputs $ \vc{u}_0$ and outputs $\vc{x}_0$ are related.
The size of the output, as characterized by the size of $\vc{x}_0$, is large when $-\omega^2$ is close to one of the eigenvalues of $\mx{A}$. When $-\omega^2$ is exactly equal to one of the eigenvalues of $\mx{A}$, then
$(-\omega^2 \mx{I} - \mx{A})$ is not invertible. We will not go into what will happen in this case here.
Understanding where the poles of a system are, is of utmost importance to understanding the stability of the system. If there are poles corresponding to frequencies that commonly appear in noise or disturbances, undesired results may arise.
On 12 April 1831, the Broughton Suspension Bridge outside Manchester collapsed due to resonance from input $u(t)$ induced by troops marching in step.
Another famous example is the collapse of the Tacoma Narrows Bridge, see
this link.
This might not be an example of resonance, but rather a more complicated phenomenon called aeroelastic flutter.

A plane was flying from Warsaw to Krakow. As they were going past some beautiful landmarks, the pilot came over the intercom and instructed all who were interested in seeing the landmark to look out the right side of the plane. Many passengers did so, and the plane promptly crashed. Why?
There were too many poles in the right hand plane.
In the previous example, we modelled a string that vibrates without any external forces. Using vector notation, the model is

\begin{equation} \vc{x}'' = \mx{A} \vc{x} . \end{equation} | (10.129) |

\begin{equation} \vc{x}''(t) = \mx{A} \vc{x}(t) + \vc{u}(t). \end{equation} | (10.130) |

\begin{equation} -\omega^2 \cos(\omega t) \vc{x}_0 = \mx{A} \cos(\omega t) \vc{x}_0 + \cos(\omega t) \vc{u}_0, \end{equation} | (10.131) |

\begin{equation} (-\omega^2 \mx{I} - \mx{A}) \vc{x}_0 = \vc{u}_0. \end{equation} | (10.132) |

\begin{equation} \vc{x}_0 = (-\omega^2 \mx{I} - \mx{A})^{-1} \vc{u}_0, \end{equation} | (10.133) |

Example 10.17:
Understanding Covariance Matrices

Assume that we have a number of points $\vc{x}_1, \ldots, \vc{x}_n$, where each point $\vc{x}_i$ is represented as a column vector. The mean is a vector

which is illustrated as a red point in the illustration below.
The covariance matrix is

The covariance matrix is symmetric (and positive definite) by definition, so it can be diagonalized using an orthonormal matrix $\mx{U}$,

This means that it is possible to choose an orthonormal basis of eigenvectors.
The columns of $\mx{U} = \begin{pmatrix} \vc{u}_1 & \ldots & \vc{u}_n \end{pmatrix}$ are the eigenvectors. Assume that the order has been chosen so that corresponding
eigenvalues $\lambda_1, \ldots, \lambda_n$ are in decreasing order, i.e., $\lambda_1 \geq \ldots \geq \lambda_n \geq 0$. This means that
$\vc{u}_1$ corresponds to the direction with the largest variation and $\sqrt{\lambda_1}$ corresponds to the standard deviation in this direction. In the figure, $\sqrt{\lambda_1}$ times $\vc{u}_1$ is the longer of the two arrows and
$\sqrt{\lambda_2}$ times $\vc{u}_2$ is the shorter one.
See Interactive Illustration 10.10.

Assume that we have a number of points $\vc{x}_1, \ldots, \vc{x}_n$, where each point $\vc{x}_i$ is represented as a column vector. The mean is a vector

\begin{equation} \vc{m} = \frac{1}{n} \sum_{i=1}^{n} \vc{x}_i, \end{equation} | (10.134) |

\begin{equation} \mx{C} = \frac{1}{n} \sum_{i=1}^{n} (\vc{x}_i - \vc{m}) (\vc{x}_i - \vc{m})^\T . \end{equation} | (10.135) |

\begin{equation} \mx{C} = \mx{U} \mx{D} \mx{U}^\T . \end{equation} | (10.136) |

Example 10.18:
Eigenfaces

The main idea is that one can*learn* how images of faces look like and use this information for *image* or *video compression*, for *face detection*, or *face recognition*. Briefly, this is done by first calculating the mean and the covariance matrix for a set of images. Then the *eigenvectors* and *eigenvalues* of the covariance matrix are calculated. The eigenvectors corresponding to the largest eigenvalues are the ones that best encode image changes.
This is an example of *machine learning*. In more general terms, *machine learning* are tools for learning good representations of data, or learning how to recognize images. Methods involving linear algebra, mathematical statistics, analysis, and optimization are used on (often large) collections of data.

In this example, we have collected a set of face images shown in Figure 10.11. Images can be thought of as elements of a linear vector space. Each image $\vc{W}$ has $m$ rows and $n$ columns of image intensities, i.e.,

However, now we want to think of $\vc{W}$ as an element of a linear vector space, that is, we do not think of $\vc{W}$ as a matrix that can be multiplied with a vector!
Introduce a linear mapping from images to column vectors according to column stacking, that is,

where $N = mn$. Also introduce the inverse linear mapping from column vector $\vc{x}$ to image $\vc{W}$, i.e.,

In other words $f$ takes an image and produces a very long column vector and $f^{-1}$ takes a column vector and produces an image.
Assume that we have a number of images $\vc{W}_1, \ldots, \vc{W}_{23}$ as in Figure 10.11.
The images are all of the same size $81 \times 41$ pixels. By mapping each image onto corresponding column vector through $\vc{x}_i = f(\vc{W}_i)$, we produce $23$ vectors $\vc{x}_1, \ldots, \vc{x}_{23}$, where each vector has
size $3321 \times 1$. The mean is a vector

which can be illustrated as an image after converting it from vector to image, i.e., $\vc{O} = f^{-1}(\vc{m)}$
The covariance matrix is

As in the previous example, we factorize the covariance matrix using an orthonormal matrix $\mx{U}$,

where we again sort the eigenvectors, i.e.,
the columns of $\mx{U} = \begin{pmatrix} \vc{u}_1 & \ldots & \vc{u}_N \end{pmatrix}$ so that corresponding
eigenvalues $\lambda_1, \ldots, \lambda_N$ are in decreasing order, i.e., $\lambda_1 \geq \ldots \geq \lambda_N \geq 0$. This means that
$\vc{u}_1$ correspond to the direction with the largest variation and $\sqrt{\lambda_1}$ corresponds to the standard deviation in this direction.
In the example, we chose the two vectors corresponding to the two largest eigenvalues as our eigenfaces (after converting from vector to image, i.e., $\vc{E}_1 = f^{-1}(\sqrt{\lambda_1} \vc{u}_1) $ and
$\vc{E}_2 = f^{-1}(\sqrt{\lambda_2} \vc{u}_2)$.
These three images $\vc{O}$, $\vc{E}_1$, and $\vc{E}_2$, are shown from left to right below.

The main idea is that one can

In this example, we have collected a set of face images shown in Figure 10.11. Images can be thought of as elements of a linear vector space. Each image $\vc{W}$ has $m$ rows and $n$ columns of image intensities, i.e.,

\begin{equation} \vc{W} = \begin{pmatrix} w_{1,1} & w_{1,2} & \ldots & w_{1,n} \\ w_{2,1} & w_{2,2} & \ldots & w_{2,n} \\ \vdots & \vdots & \vdots & \vdots \\ w_{m,1} & w_{m,2} & \ldots & w_{m,n} \end{pmatrix}. \end{equation} | (10.137) |

\begin{equation} \begin{pmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{m} \\ x_{m+1} \\ x_{m+2} \\ \vdots \\ x_{m+m} \\ \vdots \\ x_{m(n-1)+1} \\ x_{m(n-1)+2} \\ \vdots \\ x_{N} \end{pmatrix} = \vc{x} = f(\vc{W}) = \begin{pmatrix} w_{1,1} \\ w_{2,1} \\ \vdots \\ w_{m,1} \\ w_{1,2} \\ w_{2,2} \\ \vdots \\ w_{m,2} \\ \vdots \\ w_{1,n} \\ w_{2,n} \\ \vdots \\ w_{m,n} \end{pmatrix} , \end{equation} | (10.138) |

\begin{equation} \begin{pmatrix} w_{1,1} & w_{1,2} & \ldots & w_{1,n} \\ w_{2,1} & w_{2,2} & \ldots & w_{2,n} \\ \vdots & \vdots & \vdots & \vdots \\ w_{m,1} & w_{m,2} & \ldots & w_{m,n} \end{pmatrix} = \vc{w} = f^{-1}(\vc{x}) = \begin{pmatrix} x_{1} & x_{m+1} & \ldots & x_{m (n-1) + 1} \\ x_{2} & x_{m+2} & \ldots & x_{m (n-1) + 2} \\ \vdots & \vdots & \vdots & \vdots \\ x_{m} & x_{m+m} & \ldots & x_{m n} \end{pmatrix} . \end{equation} | (10.139) |

\begin{equation} \vc{m} = \frac{1}{n} \sum_{i=1}^{n} \vc{x}_i , \end{equation} | (10.140) |

\begin{equation} \mx{C} = \frac{1}{n} \sum_{i=1}^{n} (\vc{x}_i - \vc{m}) (\vc{x}_i - \vc{m})^\T . \end{equation} | (10.141) |

\begin{equation} \mx{C} = \mx{U} \mx{D} \mx{U}^\T, \end{equation} | (10.142) |

We round off this chapter with an example that illustrates that there is much more to learn on this topic.

Example 10.19:

The theory of eigenvectors and eigenvalues is much more general than that of square matrices. This example is a small detour, but is intended to show the generality of some of the concepts in this chapter.

One can study eigenvectors to general linear mappings or operators. One interesting example is the operation of taking the derivative of a function. This process is linear, but are there any "eigenvectors" (or eigenfunctions) of the derivative operator?

Think of "computing the derivative" as an operation $D$ that takes a function $g$ and returns a function $h$.

Study the exponential function $g$ defined as

Taking the derivative of $g$ gives

This means that $D(g) = \lambda g$. In other words, $g$ is an eigenvector to the derivative-operator with eigenvalue $\lambda$.

However, this takes us a bit away from the core of the course.

The theory of eigenvectors and eigenvalues is much more general than that of square matrices. This example is a small detour, but is intended to show the generality of some of the concepts in this chapter.

One can study eigenvectors to general linear mappings or operators. One interesting example is the operation of taking the derivative of a function. This process is linear, but are there any "eigenvectors" (or eigenfunctions) of the derivative operator?

Think of "computing the derivative" as an operation $D$ that takes a function $g$ and returns a function $h$.

\begin{equation} h = D(g) = \frac{d}{dx} g . \end{equation} | (10.143) |

\begin{equation} g(x) = e^{\lambda x} . \end{equation} | (10.144) |

\begin{equation} D(g) = \frac{d}{dx} g = \lambda e^{\lambda x} = \lambda g \end{equation} | (10.145) |

However, this takes us a bit away from the core of the course.