© 2015-2017 Jacob Ström, Kalle Åström, and Tomas Akenine-Möller Forum

# Chapter 9: Linear Mappings

This chapter is about linear mappings. A mapping is simply a function that takes a vector in and outputs another vector. A linear mapping is a special kind of function that is very useful since it is simple and yet powerful.

Example 9.1: Image Compresssion
Linear mappings are common in real world engineering problems. One example is in image or video compression. Here an image to be coded is broken down to blocks, such as the $4 \times 4$ pixel blocks as shown in Figure 9.1.

Instead of encoding the brightness of each pixel in the block directly, a linear transform is applied to each block. The first step is to make a vector out of the $4\times 4$ block, by stacking every column of the block until a 16-dimensional vector $\vc{x}$ is obtained. This vector is now multiplied by a transformation matrix $\mx{A}$, which gives a new 16-dimensional vector $\vc{y} =\mx{A}\vc{x}$. This can now be unstacked to form a $4 \times 4$ image block. As can be seen in the figure, these transformed blocks are now much more similar to each other. The first value of the block changes a lot, but most of the others are close to zero (gray in the image). This means that they can either be ignored (set to zero) or be represented cheaply in terms of bits. The decoder obtains $\hat{\vc{y}}$, which is an approximate version of $\vc{y}$ since some values may have been set to zero or otherwise been approximated. The decoder then performs the opposite transformation $\hat{\vc{x}} =\mx{A}^{-1} \hat{\vc{y}}$ recover an approximate version of the pixel brightness values.

A real encoder is more complicated than this picture, and contain many optimizations. For instance, the linear mapping is not implemented using a matrix multiplication, but in a faster way that is mathematically equivalent to it.
Figure 9.1: Simplified schematic of an image coder. The image is divided into blocks, and each block is transformed using a linear mapping. After the transformation the blocks contain many values close to zero (gray in the image) and thus become easy to compress.
Figure 9.1: Simplified schematic of an image coder. The image is divided into blocks, and each block is transformed using a linear mapping. After the transformation the blocks contain many values close to zero (gray in the image) and thus become easy to compress.
In Interactive Illustration 9.2, we show an example of how the variability changes as over an image.
Interactive Illustration 9.2: Left: original image. Middle: $4 \times 4$ pixel area to be transformed. Right: after transform. Drag the mouse over the image (or touch it on a mobile device) to change the $4\times 4$ area that is transformed. Notice how much less the transformed area changes compared to the pixels. This reduced variability translates into reduced bits after compression.
Interactive Illustration 9.2: Left: original image. Middle: $\hid{4 \times 4}$ pixel area to be transformed. Right: after transform. Drag the mouse over the image (or touch it on a mobile device) to change the $\hid{4\times 4}$ area that is transformed. Notice how much less the transformed area changes compared to the pixels. This reduced variability translates into reduced bits after compression.

We will start out this chapter by rehearsing what a mapping or a function for regular real values (scalars), and then move to vector mappings.

Definition 9.1: Mapping
A mapping $F$ is a rule that, for every item in one set $N$, provides one item in another set $M$
 $$F: N \rightarrow M.$$ (9.1)
This may look abstract, but in fact you have already been dealing with mappings, but under the name functions. Another way to state the same thing is
 $$y = F(x).$$ (9.2)
The form
 $$F: x \rightarrow y, x \in N.$$ (9.3)
is also used. For example the function $y = x^2$, shown in Figure 9.3, is a rule that, for every item in the set of real numbers $\mathbb{R}$, provides another item from the set of real numbers $\mathbb{R}$. Thus, in this example, both $N$ and $M$ equals $\mathbb{R}$.
$y = x^2$
Interactive Illustration 9.3: The function $y=x^2$ is an example of a mapping from the real numbers $N = \mathbb{R}$ to the set of real numbers $M = \mathbb{R}$.
Interactive Illustration 9.3: The function $\hid{y=x^2}$ is an example of a mapping from the real numbers $\hid{N = \mathbb{R}}$ to the set of real numbers $\hid{M = \mathbb{R}}$.
In linear algebra, the term mapping is traditionally used instead of function, but the meaning is the same, i.e., you start out with an item $x$, and you obtain an item $y$. We say that the value $x$ maps to the value $y$. Note that every $x$ only maps to one value $y$. This means that the curve in Figure 9.4 cannot be the graph of a function $y =f(x)$ since, for instance, the value $x=1$ is associated with two $y$-values, namely $+1$ and $-1$.
Interactive Illustration 9.4: The curve above cannot be the graph of a function $y=f(x)$ since there are two $y$-values for some $x$-values, such as $x=1$.
Interactive Illustration 9.4: The curve above cannot be the graph of a function $\hid{y=f(x)}$ since there are two $\hid{y}$-values for some $\hid{x}$-values, such as $\hid{x=1}$.
The set $N$ is called the domain of the mapping, whereas $M$ is called the codomain. The subset of the codomain that can be reached by the mapping is called the range of the codomain, or alternatively, the image of the codomain, and we denote it $V_F$. This is summarized in the following definition.

Definition 9.2: Domain, Codomain, and Range of a Mapping
Assume we have a mapping $y = F(x)$ where $x \in N$ and $y \in M$. Then $N$ is the domain of the mapping, and $M$ is the codomain of the mapping. The range (or alternatively, the image) of the mapping is the set $V_F$, where
 $$V_F = \{F(x) | x \in N\}.$$ (9.4)
The vertical bar should be read as "such that" or "with the property of". In this example, the expression can be read out as "$V_F$ is the set of all elements $F(x)$ such that $x$ belongs to the set $N$". For the example $y=x^2$, the range equals the set of positive real numbers including zero, $V_F = \mathbb{R}_{\geq 0}$. Therefore, in this case, we only reach a subset of the codomain, i.e., $V_F$ is a subset of $M$.

In linear algebra, the inputs and outputs of a function are vectors instead of scalar. Assume we have a coordinate system $\vc{e}_1,\vc{e}_2$ and that $\begin{pmatrix}x_1 \\ x_2 \\\end{pmatrix}$ is the coordinate for the vector $\vc{x}$. We can now have a function $\vc{y} = F( \vc{x} )$ that maps every $\vc{x}$ to a new vector $\vc{y} = \begin{pmatrix}y_1 \\ y_2 \\\end{pmatrix}$ according to, for instance,
 $$\begin{cases} y_1 = x_1 \\ y_2 = 0 \end{cases}$$ (9.5)
It is not possible to draw a simple graph for this mapping, since four dimensions would be needed for that (two for the input and two for the output). However, it is often possible to get an intuitive understanding of the mapping by drawing both the input and the output in the same diagram. Interactive Illustration 9.5 shows this for the mapping mentioned above. Note that you can move the red input arrow $\vc{x}$ and see how the blue output arrow $\vc{y}$ moves.
$\vc{x}$
$\vc{y}$
$\vc{e}_1$
$\vc{e}_2$
Interactive Illustration 9.5: An example of a mapping from $\vc{x}$ (red vector) to $\vc{y}$ (blue vector). As can be seen the vector is projected down on the $x$-axis.
Interactive Illustration 9.5: The range, or image, of the function $\hid{F}$ in this case is the x-axis (marked with green), since all outputs of the function land on the $\hid{x}$-axis.
As can be seen, the effect of the mapping is to project any input vector on to the $\vc{e}_1$-axis. Any vector in the plane can be used as input, hence the domain is $\mathbb{R}^2$. The codomain is also $\mathbb{R}^2$, since the output is a vector of two dimensions, but the range or image is the $\vc{e}_1$-axis. The range is marked with green in the second step of the figure.

A slightly more interesting example of a mapping is the following
 $$\begin{cases} y_1 = \cos(\frac{\pi}{3}) x_1 - \sin(\frac{\pi}{3}) x_2, \\ y_2 = \sin(\frac{\pi}{3}) x_1 + \cos(\frac{\pi}{3}) x_2. \end{cases}$$ (9.6)
As can be seen, the factors before the $x_1$s and $x_2$s resemble a rotation matrix (see Definition 6.10) by $\pi/3$ radians. This mapping is illustrated in the Interactive Illustration 9.6, where again the input vector is marked with red and the output vector is marked with blue.
$\vc{x}$
$\vc{y}$
$\vc{e}_1$
$\vc{e}_2$
Interactive Illustration 9.6: An example of a mapping from $\vc{x}$ (red vector) to $\vc{y}$ (blue vector). As can be seen, the vector is rotated $\frac{\pi}{3}$ around the origin.
Interactive Illustration 9.6: An example of a mapping from $\hid{\vc{x}}$ (red vector) to $\hid{\vc{y}}$ (blue vector). As can be seen, the vector is rotated $\hid{\frac{\pi}{3}}$ around the origin.
We have now learned that vector mappings are simply functions that take a vector as input and outputs another vector. Next we will explore when such mappings can be written using matrices.

As can be seen by playing around with Interactive Ilustration 9.6, the output vector is a rotated copy of the input vector with the rotation angle $\frac{\pi}{3}$. As a matter of fact, we can write Equation (9.6) in matrix form as
 $$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \left(\begin{array}{rr} \cos \frac{\pi}{3} & -\sin \frac{\pi}{3} \\ \sin \frac{\pi}{3} & \cos \frac{\pi}{3} \end{array}\right) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$ (9.7)
or, shorter,
 $$\vc{y} = \mx{A} \vc{x}.$$ (9.8)
It is now easy to see that the matrix $\mx{A}$ is just a two-dimensional rotation matrix as defined in Definition 6.10 in Chapter 6. When a mapping can be written in matrix form, i.e., in the form $\vc{y} = \mx{A} \vc{x}$, we call $\mx{A}$ the transformation matrix.

The example in Interactive Illustration 9.3 can also be written in matrix form,
 $$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \left(\begin{array}{rr} 1 & 0 \\ 0 & 0 \end{array}\right) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},$$ (9.9)
where the transformation matrix in this case equals $\left(\begin{array}{rr} 1 & 0 \\ 0 & 0 \end{array}\right)$. That raises the question whether all vector mappings be written on the form $\vc{y} = \mx{A} \vc{x}$ for some $\mx{A}$ with constant coefficients? The answer is no. As an example, the mapping
 $$\begin{cases} y_1 = x_1 x_2 + x_2\\ y_2 = x_1 + e^{x_2} \end{cases}$$ (9.10)
cannot be written as $\vc{y} = \mx{A}\vc{x}$. It is of course possible to write $\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \left(\begin{array}{rr} x_2 & 1 \\ 1 & \frac{e^{x_2}}{x_2} \end{array}\right) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$, but that violates the rule that $\mx{A}$ should consist of constant coefficients, i.e, independent of $\vc{x}$. To investigate which mappings can be written in this form, we first introduce the concept of a linear mapping.

Definition 9.3: Linear Mapping
A linear mapping is a mapping $F$, which satisfies
 $$\begin{cases} F( \vc{x}' + \vc{x}'') = F(\vc{x}') + F(\vc{x}''), \\ F( \lambda \vc{x} ) = \lambda F(\vc{x}). \\ \end{cases}$$ (9.11)
The intuitive description of the first condition is that it shouldn't matter if you take the function of the sum or the sum of the functions, you get the same result. The second condition states that amplifying the input by a certain amount gives the same result as amplifying the output by the same amount.

Example 9.2: Shopping Cart to Cost
Assume that a shop only sells packages of penne, jars of Arrabiata sauce, and bars of chocolate. The contents of your shopping cart can be modelled as a vector space. Introduce addition of two shopping carts as putting all of the items of both carts in one cart. Introduce multiplication of a scalar as multiplying the number of items in a shopping cart with that scalar. Notice that here, there are practical problems with multiplying a shopping cart with non-integer numbers or negative numbers, which makes the model less useful in practice. Introduce a set of basis shopping carts. Let $\vc{e}_1$ correspond to the shopping cart containing one package of penne, let $\vc{e}_2$ correspond to the shopping cart containing one jar of Arrabiata sauce, and let $\vc{e}_3$ correspond to the shopping cart containing one bar of chocolate. Then each shopping cart $\vc{x}$ can be described by three coordinates $(x_1, x_2, x_3)$ such that $\vc{x} = x_1 \vc{e}_1 + x_2 \vc{e}_2 + x_3 \vc{e}_3$.

There is a mapping from shopping carts $\vc{x}$ to price $y \in \R$. We can introduce the matrix $\vc{A} = \begin{pmatrix}a_{11} & a_{12} & a_{13} \end{pmatrix}$ where $a_{11}$ is the price of a package of penne, $a_{12}$ is the price of a jar of Arrabiata sauce and $a_{13}$ is the price of a bar of chocolate. The price $y$ can now be expressed as $y = \mx{A} \vc{x}$.

In real life this map is often non-linear, e.g., a shop might have campaigns saying 'buy 3 for the price of 2'. But modelling the mapping as a linear map is often a reasonable and useful model. Again (as is common with mathematical modelling) there is a discrepancy between mathematical model and reality. The results of mathematical analysis must always be used with reason and critical thinking. Even if the cost of a shopping cart of 1 package of penne is 10, it does not always mean that you can sell packages of penne to the store for 10 each.

Theorem 9.1: Matrix Form of Linear Mappings
A mapping $\vc{y} = F(\vc{x})$ can be written in the form $\vc{y} = \mx{A}\vc{x}$ (matrix form) if and only if it is linear.

To prove the theorem we need to prove both directions, that every linear mapping can be written as $\vc{y} = \mx{A}\vc{x}$, and that every mapping $\vc{y} = \mx{A}\vc{x}$ is linear. We will do this for the case when both $\vc{x}$ and $\vc{y}$ are two-dimensional mappings, but a similar proof works for any dimensionality of both $\vc{x}$ and $\vc{y}$.

Assume we have a basis $\vc{e}_1$,$\vc{e}_2$ in $N$ and $M$. We can then write the input $\vc{x}$ and the output $\vc{y}$ in this basis,
 \begin{align} \vc{x} & = x_1 \vc{e}_1 + x_2 \vc{e}_2,\\ \vc{y} & = y_1 \vc{e}_1 + y_2 \vc{e}_2.\\ \end{align} (9.12)
Inserting the expression for $\vc{x}$ in $\vc{y} = F(x)$, we get
 $$\vc{y} = F(\vc{x}) = F(x_1 \vc{e}_1 + x_2 \vc{e}_2),$$ (9.13)
and since $F$ is linear, we can apply the first and second conditions of linearity,
 $$\vc{y} = F(x_1 \vc{e}_1) + F(x_2 \vc{e}_2) = x_1F(\vc{e}_1) + x_2F(\vc{e}_2).$$ (9.14)
Since $F$ maps one vector to another vector, $F(\vc{e}_1)$ must also be a vector that can be expressed in the basis. Assume it has the coordinates $\begin{pmatrix}a_{11} \\ a_{21} \end{pmatrix}$ in the base $\vc{e}_1$, $\vc{e}_2$,
 $$F(\vc{e}_1) = a_{11}\vc{e}_1 + a_{21}\vc{e}_2.$$ (9.15)
Likewise, we assume
 $$F(\vc{e}_2) = a_{12}\vc{e}_1 + a_{22}\vc{e}_2.$$ (9.16)
We can now continue the expansion of $F(\vc{x})$ as
 $$\vc{y} = x_1(a_{11}\vc{e}_1 + a_{21}\vc{e}_2) + x_2(a_{12}\vc{e}_1 + a_{22}\vc{e}_2) = \\ (x_1 a_{11} + x_2 a_{12})\vc{e}_1 + (x_1 a_{21} + x_2 a_{22})\vc{e}_2 \\$$ (9.17)
Comparing this expression to the second row of Equation (9.12), we understand that $y_1$ must equal $a_{11}x_1 + a_{12}x_2$ and $y_2 = a_{21}x_1 + a_{22}x_2$. We have
 $$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \left(\begin{array}{rr} a_{11} & a_{12} \\ a_{21} & a_{22} \end{array}\right) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$ (9.18)
that is
 $$\vc{y} = \mx{A}\vc{x}.$$ (9.19)
Now we need to prove the converse, that if $\vc{y} = \mx{A}\vc{x}$, then the mapping is linear. Assume we have one input $\vc{x}'$ with coordinates $\vc{x}' = x_1' \vc{e}_1 + x_2' \vc{e}_2$ or in vector form $\vc{x}' = \begin{pmatrix}x'_1\\x'_2\end{pmatrix}$, and another input $\vc{x}''$ or in vector form $\vc{x}'' = \begin{pmatrix}x''_1\\x''_2\end{pmatrix}$. The first condition follows directly from rule $(vii)$ of matrix arithmetic properties in Theorem 6.1, that is,
 $$F(\vc{x}' + \vc{x}'') = \mx{A}(\vc{x}' + \vc{x}'') = \mx{A}\vc{x}' + \mx{A}\vc{x}'' = F(\vc{x}') + F(\vc{x}'').$$ (9.20)
The second condition also follows from matrix algebra
 $$F(\lambda \vc{x'}) = \mx{A}(\lambda \vc{x}') = \lambda \mx{A} \vc{x}' = \lambda F(\vc{x'})$$ (9.21)
since a scalar $\lambda$ can be placed on either side of a matrix ($\mx{A}\lambda = \lambda \mx{A}$). The proof is thus complete.
$\square$

It is often the case that you want to find the matrix $\mx{A}$ for a particular linear mapping. Then the following theorem can be of great help.

Theorem 9.2: Mapping of Basis Vectors
For a linear mapping $\vc{y} = F(\vc{x})$ written on the form $\vc{y} = \mx{A}\vc{x}$ in the base $\vc{e}_1, \ \vc{e}_2, \ \ldots, \ \vc{e}_n$, the column vectors of $\mx{A}$ are the images of the basis vectors, $\vc{a}_{,1} = F(\vc{e}_1), \ \vc{a}_{,2} = F(\vc{e}_2), \ \ldots, \ \vc{a}_{,n} = F(\vc{e}_n)$.

We will prove the case when $N = M = 3$, but the proof for other values of $M$ and $N$ is similar.

The first basis vector $\vc{e}_1$ can be written as $\vc{e}_1 = 1 \vc{e}_1 + 0\vc{e}_2 + 0 \vc{e}_3$ and thus has the coordinates $(1, 0, 0)$. Using $\vc{x} = \begin{pmatrix} 1 \\ 0 \\ 0\end{pmatrix}$ in the formula $\vc{y} = \mx{A}\vc{x}$ gives
 $$\begin{pmatrix} y_1 \\ y_2 \\ y_3\end{pmatrix} = \left(\begin{array}{rrr} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\\ \end{array}\right) \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \left(\begin{array}{r} 1 a_{11} + 0 a_{12} + 0 a_{13}\\ 1 a_{21} + 0 a_{22} + 0 a_{23}\\ 1 a_{31} + 0 a_{32} + 0 a_{33}\\ \end{array}\right) = \begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix},$$ (9.22)
which is the first column in $\mx{A}$. Thus the image $F(\vc{e}_1)$ of the basis vector $\vc{e}_1$ is the first column of $\mx{A}$, denoted $\vc{a}_{,1}$. Likewise, the second basis vector can be written $\vc{e}_2 = 0 \vc{e}_1 + 1 \vc{e}_2 + 0 \vc{e}_3$, and thus has the coordinates $(0, 1, 0)$. Its image is therefore
 $$\left(\begin{array}{rrr} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\\ \end{array}\right) \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \left(\begin{array}{r} 0 a_{11} + 1 a_{12} + 0 a_{13}\\ 0 a_{21} + 1 a_{22} + 0 a_{23}\\ 0 a_{31} + 1 a_{32} + 0 a_{33}\\ \end{array}\right) = \begin{pmatrix} a_{12} \\ a_{22} \\ a_{32} \end{pmatrix},$$ (9.23)
which is the second column vector $\vc{a}_{,2}$ of the matrix $\mx{A}$. Similarly for the third basis vector we get
 $$\left(\begin{array}{rrr} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\\ \end{array}\right) \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \left(\begin{array}{r} 0 a_{11} + 0 a_{12} + 1 a_{13}\\ 0 a_{21} + 0 a_{22} + 1 a_{23}\\ 0 a_{31} + 0 a_{32} + 1 a_{33}\\ \end{array}\right) = \begin{pmatrix} a_{13} \\ a_{23} \\ a_{33} \end{pmatrix},$$ (9.24)
which is the third column of $\mx{A}$. This can be extended to any numbers of $M$ and $N$.
$\square$

We can now use this fact to easily find the matrix $\mx{A}$ for a linear mapping by simply observing what happens to the basis function.

Example 9.3: Finding a Linear Mapping's Matrix
A linear mapping $\vc{y} = F(\vc{x})$ rotates a two-dimensional vector $\vc{x}$ counterclockwise 90 degrees. Find the transformation matrix $\mx{A}$ of the matrix form $\vc{y} = \mx{A} \vc{x}$ when an the standard orthonormal basis $\vc{e}_1=(1,0)$, $\vc{e}_2=(0,1)$ is used.

The first column vector of $\mx{A}$ will be the image of the first basis vector $\vc{e}_1$, which is the $x$-axis. Rotating that 90 degrees counterclockwise brings it parallel with the $y$-axis, which has coordinates $(0, 1)$. Thus the first column vector is $\vc{a}_{,1}= \begin{pmatrix} 0 \\ 1 \end{pmatrix}$.

The second column vector will be the image of the second basis vector, which is the $y$-axis. Rotating the $y$-axis 90 degrees counter clockwise makes it ending up in $(-1, 0)$. The second column vector is thus $\vc{a}_{,1} = \begin{pmatrix} -1 \\ 0 \end{pmatrix}$ and we can write $\mx{A}$ as
 $$\mx{A} = \left(\begin{array}{rr} 0 & -1 \\ 1 & 0 \\ \end{array}\right) .$$ (9.25)
This is shown in Interactive Illustration 9.7.
$\textcolor{#aa0000}{\vc{e}_1 = \left(\begin{array}{c} 1 \\ 0\\ \end{array}\right)}$
$\textcolor{#0000aa}{\vc{F}(\vc{e}_1) = \left(\begin{array}{c} 0 \\ 1\\ \end{array}\right)}$
$\begin{array}{rr} \textcolor{#0000aa}{0} & \hid{-1} \\ \textcolor{#0000aa}{1} & \hid{0} \\ \end{array}$
$\begin{array}{rr} \textcolor{#0000aa}{0} & \hid{-1} \\ \textcolor{#0000aa}{1} & \hid{0} \\ \end{array}$
$\begin{array}{rr} \hid{0} & \textcolor{#00aa00}{-1} \\ \hid{1} & \textcolor{#00aa00}{0} \\ \end{array}$
$\textcolor{#aa0000}{\vc{e}_2 = \left(\begin{array}{c} 0 \\ 1\\ \end{array}\right)}$
$\textcolor{#00aa00}{\vc{F}(\vc{e}_2) = \left(\begin{array}{r} -1 \\ 0\\ \end{array}\right)}$
$\vc{y} = \left(\begin{array}{rr} \hid{0} & \hid{-1} \\ \hid{1} & \hid{0} \\ \end{array}\right) \vc{x}$
Interactive Illustration 9.7: To get the transformation matrix $\mx{A}$ of a $90$-degree counterclockwise rotation, we start with the first basis vector $\vc{e}_1 = \left(\begin{array}{c}1\\0\\ \end{array}\right)$ marked in red.
Interactive Illustration 9.7: To get the transformation matrix $\hid{\mx{A}}$ of a $\hid{90}$-degree counterclockwise rotation, we start with the first basis vector $\hid{\vc{e}_1 = \left(\begin{array}{c}1\\0\\ \end{array}\right)}$ marked in red.

Assume we have a linear mapping $\vc{y} = F(\vc{x})$, and that the output vector $\vc{y}$ can be fed into another linear mapping $\vc{z}= G(\vc{y})$. For this to work we need the domain of $G$ to be equal the codomain of $F$. As an example, if $G(\vc{y})$ accepts two-dimensional vectors, i.e., the domain of $G$ is $\mathbb{R}^2$, then the output of $F$, must be a two-dimensional vector, i.e., the codomain of $F(\vc{x})$ must also be $\mathbb{R}^2$. We then call the mapping
 $$\vc{z} = G(F(\vc{x}))$$ (9.26)
a composite mapping.

Theorem 9.3: Composition of Linear Mappings
If $\vc{y} = F(\vc{x})$ and $\vc{z} = G(\vc{y})$ are both linear mappings then the composite mapping $\vc{z} = G(F(\vc{x}))$ is also linear.

Since $\vc{y} = F(\vc{x})$ is a linear mapping, it can be written on matrix from $\vc{y} = \mx{A}\vc{x}$, and likewise can $\vc{z} = G(\vc{y})$ be written as $\vc{z} = \mx{B}\vc{y}$. We can then write
 $$\vc{z} = \mx{B}\vc{y} = \mx{B}(\mx{A}\vc{x}) = (\mx{B}\mx{A})\vc{x} = \mx{C}\vc{x},$$ (9.27)
where $\mx{C}$ is a new matrix equal to $\mx{B}\mx{A}$. But this is just a matrix equation of the form $\vc{z} = \mx{C}\vc{x}$ which, according to Theorem 9.1 means that the mapping from $\vc{x}$ to $\vc{z}$ must also be a linear mapping.
$\square$

Example 9.4: Composition of Linear Mappings
Find a mapping $F$ that rotates a two-dimensional vector by 30 degrees, and then multiplies the $x$-coordinate by two.

We split this into two parts. First we find a mapping $\vc{y} =G(\vc{x})$ that rotates 30 degrees, and then another mapping $\vc{z} = H(\vc{y})$ that doubles the $x$-coordinate.

Let $\mx{A}$ be the transformation matrix for $G$. We know from Definition 6.10 that a rotation by $\phi$ is obtained by $\left(\begin{array}{rr} \cos \phi & -\sin \phi \\ \sin \phi & \cos \phi \end{array}\right).$ Thus, setting $\phi = \frac{\pi}{6}$, we get
 $$\mx{A} = \left(\begin{array}{rr} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{array}\right).$$ (9.28)
Let $\mx{B}$ be the transformation matrix for $H$. The task of this is to double the $x$-coordinate, but leave the $y$-coordinate intact. This can be solved by setting
 $$\mx{B} = \left(\begin{array}{rr} 2 & 0 \\ 0 & 1 \end{array}\right).$$ (9.29)
Finally, the transformation matrix $\mx{C}$ for the composite mapping $\vc{z} = H(G(\mx{x}))$ is
 $$\mx{C} = \mx{B}\mx{A} =\left(\begin{array}{rr} 2 & 0 \\ 0 & 1 \end{array}\right) \left(\begin{array}{rr} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{array}\right) = \left(\begin{array}{rr} \sqrt{3} & -1 \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{array}\right).$$ (9.30)

For some mappings, several inputs $\vc{x}$ can map to the same value $F(\vc{x})$. As an example, in Interactive Illustration 9.5, where we simply project a point onto the $x$-axis by setting the $y$-coordinate to zero, both $\vc{x}_1 = (1,5)$ and $\vc{x}_2 = (1,2)$ will map to the same point $(1,0) = F(\vc{x}_1) =F(\vc{x}_2)$. However, for some mappings, the result will be unique, so if we start out with two different $\vc{x}_1 \neq \vc{x}_2$, we know that $F(\vc{x}_1) \neq F(\vc{x}_2)$. We call such mappings injective.

Definition 9.4: Injective mappings
A mapping $y = F(x)$ is injective, if any two different vectors $\vc{x}_1 \neq \vc{x}_2$ will always give rise to two different images $F(\vc{x}_1) \neq F(\vc{x}_2)$.
Another way to state this is to say that an injective mapping has the property that if two images $F(\vc{x}_1)$ and $F(\vc{x}_2)$ are equal, then $\vc{x_1}$ must equal $\vc{x_2}$.

For some mappings $\vc{y} = F(\vc{x})$, we can reach every point $\vc{y}$ in the codomain by selecting a suitable $\vc{x}$. When the range of the mapping covers the codomain like this, we call the mapping surjective. A mapping that is both injective and surjective is called bijective.

Definition 9.5: Surjective mapping
A mapping $\vc{y} =F(\vc{x})$, where $\vc{x} \in N$ and $\vc{y} \in M$ is surjective if the range $V_F$ is equal to to the codomain $M$, $V_F = M$.

Definition 9.6: Bijective mapping
A mapping is bijective if it is both injective and surjective.
We will now go through a few examples of mappings and investigate whether they are injective, surjective, and bijective.

Example 9.5: Type of Mapping I
Consider the mapping $\vc{y} = F(\vc{x})$
 $$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = F\left(\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\right) = \left(\begin{array}{r} e^{x_1} \\ e^{x_2} \end{array}\right) ,$$ (9.31)
where $\vc{x}$ belongs to the domain $N = \mathbb{R}^2$ and the resulting vector $\vc{y}$ belongs to the codomain $M =\mathbb{R}^2$. This mapping is injective, since if we have two different input values $\vc{a}$ and $\vc{b}$, the output vector $\begin{pmatrix}e^{a_1}\\e^{a_2}\end{pmatrix}$ will be different from $\begin{pmatrix}e^{b_1}\\e^{b_2}\end{pmatrix}$ unless $\vc{a} =\vc{b}$. However, the mapping is not surjective, since it is impossible to reach negative values for $y_1$ and $y_2$ ($e^x > 0$ for all real values of $x$). The range $V_F$ is in this case only the quadrant $y_1 > 0, y_2 > 0$ and not equal to the codomain $M =\mathbb{R}^2$. Since the mapping is not surjective, it is also not bijective.

Example 9.6: Type of Mapping II
Investigate whether the mapping $\vc{y} = F(\vc{x})$
 $$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = F\left(\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\right) = \left(\begin{array}{r} 2 x_1 \\ 3 x_2 \end{array}\right)$$ (9.32)
is bijective. In this case both the domain $N$ and the codomain $M$ equals the plane of real numbers $\mathbb{R}^2$.

First we investigate whether it is injective. It is clear that it is, since two different input values $\vc{a}$ and $\vc{b}$ will generate two different output vectors $\begin{pmatrix}2 a_1\\3a_2\end{pmatrix} \neq \begin{pmatrix}2b_1\\3b_2\end{pmatrix}$ unless $\vc{a} = \vc{b}$.

Also all output vectors $\begin{pmatrix}y_1\\y_2\end{pmatrix}$ (the entire codomain $M = \mathbb{R}^2$) can be reached by using the input $\begin{pmatrix}x_1\\ x_2\end{pmatrix}= \begin{pmatrix}\frac{y_1}{2}\\\frac{y_2}{3}\end{pmatrix}$. This means that the mapping is also surjective. Since the mapping is both injective and surjective, it is bijective.
As it turns out, linear mappings that are bijective have many useful properties. As an example, it is simple to find the inverse mapping, as we will see in the following theorem.

Theorem 9.4: Equivalence of Inverse Mapping
For a linear mapping $\vc{y} = F(\vc{x})$ from $\vc{x} \in \mathbb{R}^n$ to $\vc{y} \in \mathbb{R}^n$, the following three statements are equivalent:
1. The mapping $F$ is bijective.
2. The transformation matrix for $F$ is invertible.
3. The images $F(\vc{e}_1), \ldots, F(\vc{e}_n)$ of the basis vectors $\vc{e}_1, \, \ldots, \, \vc{e}_n$ constitute a basis in $\mathbb{R}^n$.

i $\rightarrow$ ii: If the mapping $F$ is bijective, this means that the equation $F(\vc{x}) = \vc{y}$ has a unique solution $\vc{x}$ for every $\vc{y}$. Specifically, since $F$ is injective, we know that if $F(\vc{u})=F(\vc{v})$, $\vc{u}$ must be equal to $\vc{v}$. Also, since $F$ is surjective, every $\vc{y}$ can be reached with the proper $\vc{x}$. The matrix form of $F$ is $\vc{y} = \mx{A}\vc{x}$, and that this equation has a unique solution for every $\vc{y}$ is equivalent to saying that $\mx{A}$ is invertible according to Theorem 6.9.

ii $\rightarrow$ iii: According to Theorem 9.2, $F(\vc{e}_1)$ is just the first column of $\mx{A}$. Likewise, $F(\vc{e}_2)$ is the second column of $\mx{A}$, and so on. Theorem 6.9 tells us that if $\mx{A}$ is invertible, the columns of $\mx{A}$ span $\mathbb{R}^n$, which means that they form a basis in $\mathbb{R}^n$.

iii $\rightarrow$ i: We first prove that iii means that $F$ is injective, i.e., that if $F(\vc{u}) = F(\vc{v})$ then $\vc{u}$ must equal $\vc{v}$.

We can write $\vc{u} = u_1 \vc{e}_1 + u_2 \vc{e}_2 + \ldots + u_n\vc{e}_n$ and $\vc{v} = v_1 \vc{e}_1 + v_2 \vc{e}_2 + \ldots + v_n\vc{e}_n$. We know that $F(\vc{u}) = F(\vc{v})$, hence $F(\vc{u})-F(\vc{v}) = \vc{0}$, which is equivalent to
 $$F(u_1 \vc{e}_1 + u_2 \vc{e}_2 + \ldots + u_n\vc{e}_n) - F(v_1 \vc{e}_1 + v_2 \vc{e}_2 + \ldots + v_n\vc{e}_n) = \vc{0}.$$ (9.33)
Since $F$ is linear, this can be written
 $$u_1 F(\vc{e}_1) + u_2 F(\vc{e}_2) + \ldots + u_nF(\vc{e}_n) - \Big( v_1 F(\vc{e}_1) + v_2 F(\vc{e}_2) + \ldots + v_nF(\vc{e}_n)\Big) = \vc{0}.$$ (9.34)
Gathering terms gives us
 $$(u_1-v_1) F(\vc{e}_1) + (u_2 - v_2) F(\vc{e}_2) + \ldots + (u_n - v_n) F(\vc{e}_n) = \vc{0},$$ (9.35)
but since $F(\vc{e}_1), F(\vc{e}_2), \ldots, F(\vc{e}_n)$ are linearly independent, this only has the solution $(u_1-v_1) = 0, \,(u_2-v_2)=0, \, \ldots, \, (u_n-v_n)=0$ according to Theorem 5.2. Hence $\vc{u} = \vc{v}$ and $F$ is injective.

We then prove that $\vc{y} = F(\vc{x})$ is surjective, i.e., we can reach every $\vc{y}$ with a proper choice of $\vc{x}$.

We can write $\vc{x} = x_1\vc{e}_1 + x_2\vc{e}_2 + \ldots + x_n\vc{e}_n$, and hence $\vc{y} = F(\vc{x})$ can be written
 $$\vc{y} = F(\vc{x}) = F(x_1\vc{e}_1 + x_2\vc{e}_2 + \ldots + x_n\vc{e}_n).$$ (9.36)
Since $F$ is linear, this is equal to
 $$\vc{y} = x_1F(\vc{e}_1) + x_2F(\vc{e}_2) + \ldots x_n F(\vc{e}_n),$$ (9.37)
and since $F(\vc{e}_1), F(\vc{e}_2), \ldots, F(\vc{e}_n)$ constitue a basis in $\mathbb{R}^n$, these vectors span $\mathbb{R}^n$ and we can therefore reach any $\vc{y}$ in $\mathbb{R}^n$. Hence $F$ is surjective. Since $F$ is both surjective and injective, it is bijective, which concludes the proof.
$\square$

From this proof, the following theorem follows.

Theorem 9.5: Inverse Mapping Matrix
For a bijective linear mapping $\vc{y} = F(\vc{x})$ with transformation matrix $\mx{A}$, the inverse mapping $\vc{x} =F^{-1}(\vc{y})$ is linear and has the transformation matrix $\inv{\mx{A}}$.

We are seeking a mapping $\inv{F}$ with the property that $\inv{F}(F(\vc{x})) = \vc{x}$.

Consider $\vc{y} = F(\vc{x})$ on matrix from
 $$\vc{y} = \mx{A}\vc{x}.$$ (9.38)
Multiplying both sides with $\inv{\mx{A}}$ gives
 $$\inv{\mx{A}}\vc{y} = \inv{\mx{A}}\mx{A}\vc{x} = \mx{I} \vc{x} = \vc{x},$$ (9.39)
or
 $$\vc{x} = \inv{\mx{A}} \vc{y}.$$ (9.40)
This is exactly the mapping we were seeking above - if we input $\vc{y} = \mx{A}\vc{x}$ we get $\vc{x}$ as the output. Thus $\inv{\mx{A}}$ must be the transformation matrix of $\inv{F}$, and since $\inv{F}$ can be written on matrix form this way, it must be linear.
$\square$

In Interactive Illustration 9.8, we show how linear mappings can be used to generate a shadow effect.

Assume we have a plane that goes through the origin with the normal vector $\vc{n} = (0, 1, 0)$. Furthermore, assume the sun is infinitely far away from the origin, positioned in the direction $\vc{r} = (0.5,1.0, 0.25)$. We have a lot of points above the plane, and would like to know their shadows on the plane. Assume also that we have been reassured that this particular type of shadow projection can be expressed as a linear mapping.

It would be possible to calculate the shadow point by point, but it would be more convenient if we could create a linear mapping $\vc{y} =\mx{A}\vc{x}$ so that we could directly get the projected point $\vc{y}$ by simply multiplying the point $\vc{x}$ by the matrix $\mx{A}$. This should be possible, given that the problem is linear.

In this case, we take advantage of Theorem 9.2, i.e., to obtain $\mx{A}$ we only need to know what happens to the three unit vectors. The first column vector of the transformation matrix $\mx{A}$ is simply the image of the first unit vector $\vc{e_1} = (1,0,0)$, and so forth. Hence, if we find out where on the plane the shadow of the point $(1,0,0)$ lands, we have the first column of $\mx{A}$.

We can write the plane on the form $ax+by+cz+d = 0$, and since we know the normal is $(0, 1, 0)$ this simplifies to $y+d=0$. Furthermore, since the origin $(0, 0, 0)$ is contained in the plane, the equation further simplifies to $y=0$. We can now take the line that goes through the point $(p_x, p_y, p_z)$ and follow it in the direction of the sun, $(r_x, r_y, r_z)$,
 $$\begin{pmatrix} x \\ y \\ z \end{pmatrix}= \begin{pmatrix} p_x \\ p_y \\ p_z \end{pmatrix}+ \lambda \begin{pmatrix} r_x \\ r_y \\ r_z \end{pmatrix} = \begin{pmatrix} p_x \\ p_y \\ p_z \end{pmatrix}+ \lambda \begin{pmatrix} 0.5 \\ 1.0 \\ 0.25 \end{pmatrix}.$$ (9.41)
Inserting this line equation into the plane equation should give us the distance $\lambda$ that we need to travel from $P$ to get to the intersection. When we insert it into the plane equation $y = 0$ we get $p_y + 1.0 \lambda = 0$. For the first unit vector $\vc{e}_1=(1, 0, 0)$, we have $p_y = 0$, giving $0 + \lambda = 0$, which means that $\lambda = 0$. The intersection point is therefore at $(1, 0, 0) + 0\vc{r} = (1, 0, 0)$. This is therefore the first column of $\mx{A}$. For the second unit vector $\vc{e}_2 = (0, 1, 0)$, we have $p_y = 1$, giving the equation $1+\lambda = 0$, or $\lambda = -1$. Therefore the second column onf $\mx{A}$ is $(0, 1, 0) + (-1)(0.5, 1.0, 0.25) = (-0.5, 0, -0.25)$. The third unit vector $\vc{e}_3 = (0, 0, 1)$ gets $\lambda = 0$ and hence the last column of $\mx{A}$ is $(0, 0, 1) - 0\vc{r} = (0, 0, 1)$.

In summary, we have now created a linear mapping $\vc{y} = \mx{A}\vc{x}$ that takes a point $\vc{x}$ and maps it to its shadow $\vc{y}$. The matrix $\mx{A}$ equals
 $$\mx{A} = \left(\begin{array}{ccc} 1 & -0.5 & 0\\ 0 & 0 & 0\\ 0 & -0.25 & 1\\ \end{array}\right).$$ (9.42)
This is used in the figure below. Each point of the cube is just projected to the plane using $\vc{y} = \mx{A}\vc{x}$ with the matrix $\mx{A}$ from above. By drawing each face of the cube using the projected coordinates instead of the original coordinates, it is possible to draw the shadow of the cube.
$O$
$\vc{n}$
$\vc{r}$
$P$
$P'$
$\vc{x}$
$\vc{y}=\mx{A}\vc{x}$
Interactive Illustration 9.8: In this illustration, we will show how a linear mapping can be used to create shadow effects. We start by constructing a linear mapping that projects a point $P$ to $P$', which is the projection on to the plane along the vector $\vc{r}$.
Interactive Illustration 9.8: In this illustration, we will show how a linear mapping can be used to create shadow effects. We start by constructing a linear mapping that projects a point $\hid{P}$ to $\hid{P}$', which is the projection on to the plane along the vector $\hid{\vc{r}}$.
Now that we have this shadow example, we can investigate whether the mapping is bijective.

Examine whether the mapping in in Example 9.7 is bijective.

According to Theorem 9.4, a mapping $\vc{y} = \mx{A}\vc{x}$ bijective if and only if the transformation matrix $\mx{A}$ is invertible. We know from Theorem 7.10 from the chapter on determinants that a matrix is invertible only if the determinant is nonzero. However, the determinant
 $$\det(\mx{A}) = \left|\begin{array}{ccc} 1 & -0.5 & 0\\ 0 & 0 & 0\\ 0 & -0.25 & 1\\ \end{array}\right|$$ (9.43)
must be zero since the second row is zero (see Theorem 7.1$(iv)$ combined with $(vii)$). Hence the mapping cannot be bijective.

This makes sense. If you know a set of points and the direction of the sun, you can calculate their respective shadows. However, if you only have the shadows and the direction of the sun, you cannot recover the original positions since you only know the direction in which the point is located and not the distance to it. Hence the operation is not reversible, or in formal terms, bijective.

Example 9.9: Image compression revisited
In Example 9.1 we saw that a mapping $\vc{y} = \mx{A}\vc{x}$ was used as part of the compression process. Is this mapping likely to be bijective?

The answer is yes. In order to be useful, we want the decoded image to look the same as the original, at least if we spend enough bits. If we had a mapping that was not bijective, information would be lost in the transform step $\vc{y} = \mx{A}\vc{x}$. No matter how well we would preserve $\vc{y}$, i.e., no matter how many bits we would spend to describe $\vc{y}$, it would never be possible to recover $\vc{x}$. If the mapping is bijective, on the other hand, it is simple to recover $\vc{x}$ using $\vc{x} = \inv{\mx{A}}\vc{y}$.

 Chapter 8: Rank (previous) Chapter 10: Eigenvalues and Eigenvectors (next)