Definition 6.10: Two-Dimensional Rotation Matrix
A $2\times 2$ rotation matrix is defined by
\begin{align}
\mx{R}(\phi) = &
\left(\begin{array}{rr}
\cos \phi & -\sin \phi \\
\sin \phi & \cos \phi
\end{array}
\right),
\end{align}
where $\phi$ is the number of radians that the matrix rotates by (counter-clockwise).
Theorem 6.1: Matrix Arithmetic Properties
In the following, we assume that the sizes of the matrices are such that the
operations are defined.
\begin{equation}
\begin{array}{llr}
(i) & k(l\mx{A}) = (kl)\mx{A} & \spc\text{(associativity)} \\
(ii) & (k+l)\mx{A} = k\mx{A} +l\mx{A} & \spc\text{(distributivity)} \\
(iii) & k(\mx{A}+\mx{B}) = k\mx{A} +k\mx{B} & \spc\text{(distributivity)} \\
(iv) & \mx{A} + \mx{B} = \mx{B} + \mx{A} & \spc\text{(commutativity)} \\
(v) & \mx{A}+(\mx{B}+\mx{C})=(\mx{A}+\mx{B})+\mx{C} & \spc\text{(associativity)} \\
(vi) & \mx{A}+ (-1)\mx{A} = \mx{O} & \spc\text{(additive inverse)} \\
(vii) & \mx{A}(\mx{B}+\mx{C})=\mx{A}\mx{B}+\mx{A}\mx{C} & \spc\text{(distributivity)} \\
(viii) & (\mx{A}+\mx{B})\mx{C}=\mx{A}\mx{C}+\mx{B}\mx{C} & \spc\text{(distributivity)} \\
(ix) & (\mx{A}\mx{B})\mx{C}=\mx{A}(\mx{B}\mx{C}) & \spc\text{(associativity)} \\
(x) & \mx{I}\mx{A}=\mx{A}\mx{I}=\mx{A} & \spc\text{(multiplicative one)} \\
(xi) & (k\mx{A})^\T=k\mx{A}^\T & \spc\text{(transpose rule 1)} \\
(xii) & (\mx{A}+\mx{B})^\T=\mx{A}^\T+\mx{B}^\T & \spc\text{(transpose rule 2)} \\
(xiii) & (\mx{A}^\T)^\T=\mx{A} & \spc\text{(transpose rule 3)} \\
(xiv) & (\mx{A}\mx{B})^\T=\mx{B}^\T\mx{A}^\T & \spc\text{(transpose rule 4)} \\
\end{array}
\end{equation}
In addition, we have the following trivial set of rules: $1\mx{A}=\mx{A}$, $0\mx{A}=\mx{O}$,
$k\mx{O}=\mx{O}$, and $\mx{A}+\mx{O}=\mx{A}$.
Theorem 6.9: Let $\mx{A}$ be a square matrix. Then the following statements are equivalent:
The column vectors of the matrix $\mx{A}$ span $\R^p$.
The row vectors of the matrix $\mx{A}$ span $\R^p$.
The equation $\mx{A} \vc{x} = \vc{y}$ has a solution for every $\vc{y}$.
The column vectors of the matrix $\mx{A}$ are linearly independent.
The row vectors of the matrix $\mx{A}$ are linearly independent.
The equation $\mx{A} \vc{x} = \vc{0}$ has only the solution $\vc{x}=\vc{0}$.
The matrix $\mx{A}$ is invertible.
Definition 5.2: Linear Independence and Dependence
The set of vectors $\vc{v}_1,\dots,\vc{v}_n$ is said to be linearly independent if the equation
\begin{equation}
k_1\vc{v}_1 + k_2 \vc{v}_2 + \dots + k_n \vc{v}_n = \vc{0},
\end{equation}
only has a single solution, which is
\begin{equation}
k_1 = k_2 = \dots = k_n =0.
\end{equation}
If there is at least one other solution, then the set of vectors is linearly dependent.
Theorem 7.10: For a square matrix $\mx{A}$, the following statements are equivalent:
\begin{equation}
\begin{array}{ll}
(i) & \spc\text{The column vectors of}\spc \mx{A} \spc\text{form a basis} \\
(ii) & \spc\text{The row vectors of}\spc \mx{A} \spc\text{form a basis} \\
(iii) & \spc\text{The matrix equation} \spc \mx{A} \vc{x} = \vc{0} \spc\text{has only one solution} \spc \vc{x}=\vc{0} \\
(iv) & \spc\text{The matrix equation}\spc \mx{A} \vc{x} = \vc{y} \spc\text{has a solution for every} \spc \vc{y} \\
(v) & \spc\text{The matrix} \spc \mx{A} \spc\text{is invertible} \\
(vi) & \spc\det \mx{A} \neq 0 \\
\end{array}
\end{equation}
This chapter is about linear mappings. A mapping is simply a function
that takes a vector as input and outputs another vector. A linear mapping is
a special kind of function that is very useful since it is simple and
yet powerful.
Example 9.1: Image Compression
Linear mappings are common in real-world engineering problems. One
example is image or video compression. Here, an image
to be coded is broken down into blocks, such as the $4 \times 4$ pixel
blocks shown in Figure 9.1.
Instead of encoding the brightness of each pixel in the block
directly, a linear transform is applied to each block. The first step
is to make a vector out of the $4\times 4$ block, by stacking every
column of the block until a 16-dimensional vector $\vc{x}$ is
obtained. This vector is now multiplied by a transformation matrix
$\mx{A}$, which gives a new 16-dimensional vector $\vc{y} =\mx{A}\vc{x}$. This can now be unstacked to form a $4 \times 4$ image
block. As can be seen in the figure, these transformed blocks are now
much more similar to each other. The first value of the block changes
a lot, but most of the others are close to zero (gray in the
image). This means that they can either be ignored (set to zero) or be
represented cheaply in terms of bits. The decoder obtains
$\hat{\vc{y}}$, which is an approximate version of $\vc{y}$ since some
values may have been set to zero or otherwise been approximated. The
decoder then performs the opposite transformation $\hat{\vc{x}} =\mx{A}^{-1} \hat{\vc{y}}$ to recover an approximate version of the pixel
brightness values.
A real encoder is more complicated than this, and contains many
optimizations. For instance, the linear mapping is not implemented
using a matrix multiplication, but in a faster way that is
mathematically equivalent to it.
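To make this concrete on a much smaller scale, here is a sketch with a $2 \times 2$ block of made-up brightness values and a simple orthonormal transform (a Haar-like matrix, chosen only for illustration; a real codec uses a different and larger $\mx{A}$). Stacking the columns of the block into a 4-dimensional vector $\vc{x}$ and transforming gives
\begin{equation}
\vc{y} = \mx{A}\vc{x} = \frac{1}{2}\left(\begin{array}{rrrr}
1 & 1 & 1 & 1 \\
1 & 1 & -1 & -1 \\
1 & -1 & 1 & -1 \\
1 & -1 & -1 & 1
\end{array}\right)
\begin{pmatrix} 10 \\ 12 \\ 11 \\ 13 \end{pmatrix}
= \begin{pmatrix} 23 \\ -1 \\ -2 \\ 0 \end{pmatrix}.
\end{equation}
The first value is large, while the rest are close to zero, just as in the figure. If the decoder sets the small values to zero, $\hat{\vc{y}} = (23, 0, 0, 0)$, the reconstruction becomes $\hat{\vc{x}} = \inv{\mx{A}}\hat{\vc{y}} = (11.5, 11.5, 11.5, 11.5)$, which is still a reasonable approximation of the original block (for this particular matrix, $\inv{\mx{A}} = \mx{A}$).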
Figure 9.1:
Simplified schematic of an image coder. The image is divided into blocks, and each block is transformed using a linear mapping. After the transformation the blocks contain many values close to zero (gray in the image) and thus become easy to compress.
Interactive Illustration 9.2:
Left: original image. Middle: $4 \times 4$ pixel area to be transformed. Right: after transform. Drag the mouse over the image (or touch it on a mobile device) to change the $4\times 4$ area that is transformed. Notice how much less the transformed area changes compared to the pixels. This reduced variability translates into reduced bits after compression.
We will start out this chapter by reviewing what a mapping, or a
function, is for regular real values (scalars), and then move on to vector
mappings.
Definition 9.1: Mapping
A mapping $F$ is a rule that, for every item in one set $N$,
provides one item in another set $M$
\begin{equation}
F: N \rightarrow M.
\tag{9.1}
\end{equation}
This may look abstract, but in fact you have already been dealing with
mappings under the name functions. Another way to state
the same thing is
\begin{equation}
y = F(x).
\tag{9.2}
\end{equation}
The form
\begin{equation}
F: x \rightarrow y, \quad x \in N,
\tag{9.3}
\end{equation}
is also used. For example, the function $y = x^2$, shown in
Figure 9.3, is a rule that, for every item in
the set of real numbers $\mathbb{R}$, provides another item from the
set of real numbers $\mathbb{R}$. Thus, in this example, both $N$ and
$M$ equal $\mathbb{R}$.
Interactive Illustration 9.3:
The function $y=x^2$ is an example of a mapping from the real numbers $N = \mathbb{R}$ to the set of real numbers $M = \mathbb{R}$.
In linear algebra, the term mapping is traditionally used
instead of function, but the meaning is the same, i.e., you start out with
an item $x$, and you obtain an item $y$. We say that the value $x$
maps to the value $y$. Note that every $x$ only maps to one
value $y$. This means that the curve in
Figure 9.4 cannot be the graph of a function $y =f(x)$ since, for instance, the value $x=1$ is associated with two
$y$-values, namely $+1$ and $-1$.
Interactive Illustration 9.4:
The curve above cannot be the graph of a function $y=f(x)$ since there are two $y$-values for some $x$-values, such as $x=1$.
The set $N$ is called the domain of the mapping, whereas $M$ is
called the codomain. The subset of the codomain that can be
reached by the mapping is called the range of the mapping, or
alternatively, the image of the mapping, and we denote it
$V_F$. This is summarized in the following definition.
Definition 9.2: Domain, Codomain, and Range of a Mapping
Assume we have a mapping $y = F(x)$ where $x \in N$ and $y \in M$. Then $N$ is the domain of the mapping, and $M$ is the codomain of the mapping. The range (or alternatively, the image) of the mapping is the set $V_F$, where
\begin{equation}
V_F = \{F(x) \,|\, x \in N\}.
\tag{9.4}
\end{equation}
The vertical bar should be read as "such that" or "with the property of". In this example, the expression can be read out as
"$V_F$ is the set of all elements $F(x)$ such that $x$ belongs to the set $N$".
For the example $y=x^2$, the range equals the set of non-negative real
numbers, $V_F = \mathbb{R}_{\geq 0}$. Therefore, in this case, we only reach a subset of the codomain, i.e., $V_F$ is a subset of $M$.
In linear algebra, the inputs and outputs of a function are vectors
instead of scalars. Assume we have a coordinate system $\vc{e}_1,\vc{e}_2$ and that $\begin{pmatrix}x_1 \\ x_2 \\\end{pmatrix}$ are the
coordinates of the vector $\vc{x}$. We can now have a function
$\vc{y} = F( \vc{x} )$ that maps every $\vc{x}$ to a new vector
$\vc{y} = \begin{pmatrix}y_1 \\ y_2 \\\end{pmatrix}$ according to,
for instance,
\begin{align}
y_1 &= x_1, \\
y_2 &= 0 .
\end{align}
It is not possible to draw a simple graph for this mapping, since
four dimensions would be needed for that (two for the input and two for the
output). However, it is often possible to get an intuitive
understanding of the mapping by drawing both the input and the output
in the same diagram.
Interactive Illustration 9.5
shows this for the
mapping mentioned above. Note that you can move the red input arrow
$\vc{x}$ and see how the blue output arrow $\vc{y}$ moves.
Interactive Illustration 9.5:
An example of a mapping from $\vc{x}$ (red vector) to $\vc{y}$ (blue vector). As can be seen the vector is projected down on the $x$-axis.
Interactive Illustration 9.5:
The range, or image, of the function $F$ in this case is the x-axis (marked with green), since all outputs of the function land on the $x$-axis.
As can be seen, the effect of the mapping is to project any input
vector onto the $\vc{e}_1$-axis. Any vector in the plane can be used
as input, hence the domain is $\mathbb{R}^2$. The codomain is also
$\mathbb{R}^2$, since the output is a vector of two dimensions, but
the range or image is the $\vc{e}_1$-axis. The range is marked with green in the second step of the figure.
A slightly more interesting example of a mapping is the following:
\begin{align}
y_1 &= \frac{1}{2} x_1 - \frac{\sqrt{3}}{2} x_2, \\
y_2 &= \frac{\sqrt{3}}{2} x_1 + \frac{1}{2} x_2 .
\end{align}
As can be seen, the factors in front of $x_1$ and $x_2$ resemble a rotation matrix (see Definition 6.10)
with $\phi = \pi/3$ radians.
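For instance, the input vector with coordinates $(1, 0)$ is mapped to
\begin{equation}
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} =
\begin{pmatrix} \frac{1}{2} \cdot 1 - \frac{\sqrt{3}}{2} \cdot 0 \\ \frac{\sqrt{3}}{2} \cdot 1 + \frac{1}{2} \cdot 0 \end{pmatrix} =
\begin{pmatrix} \frac{1}{2} \\ \frac{\sqrt{3}}{2} \end{pmatrix},
\end{equation}
which is exactly $\vc{e}_1$ rotated $\pi/3$ radians counter-clockwise.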
This mapping is illustrated in Interactive Illustration 9.6,
where again the input
vector is marked with red and the output vector is marked with blue.
Interactive Illustration 9.6:
An example of a mapping from $\vc{x}$ (red vector) to $\vc{y}$ (blue vector). As can be seen, the vector is rotated $\frac{\pi}{3}$ around the origin.
We have now learned that vector mappings are simply functions that take a vector as input and output another vector. Next, we will explore when such mappings can be written using matrices.
As can be seen by playing around with Interactive Illustration 9.6, the output
vector is a rotated copy of the input vector, with rotation angle
$\frac{\pi}{3}$. As a matter of fact, we can write this mapping in matrix form as
\begin{equation}
\vc{y} = \mx{A}\vc{x}, \qquad \text{where} \quad
\mx{A} = \left(\begin{array}{rr} \frac{1}{2} & -\frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & \frac{1}{2} \end{array}\right)
= \left(\begin{array}{rr} \cos \frac{\pi}{3} & -\sin \frac{\pi}{3} \\ \sin \frac{\pi}{3} & \cos \frac{\pi}{3} \end{array}\right).
\end{equation}
It is now easy to see that the matrix $\mx{A}$ is just a
two-dimensional rotation matrix as defined in
Definition 6.10 in
Chapter 6. When a mapping can be written in matrix
form, i.e., in the form $\vc{y} = \mx{A} \vc{x}$, we call $\mx{A}$ the
transformation matrix.
Similarly, the projection mapping in Interactive Illustration 9.5 can be written as $\vc{y} = \mx{A}\vc{x}$,
where the transformation matrix in this case equals
$\left(\begin{array}{rr} 1 & 0 \\ 0 & 0 \end{array}\right)$. This
raises the question of whether all vector mappings can be written in the form
$\vc{y} = \mx{A} \vc{x}$ for some $\mx{A}$ with constant coefficients. The answer is no. As an
example, the mapping
\begin{align}
y_1 &= x_1 x_2 + x_2, \\
y_2 &= x_1 + e^{x_2}
\end{align}
cannot be written as $\vc{y} = \mx{A}\vc{x}$ with constant coefficients. It is of course possible to write
$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \left(\begin{array}{rr} x_2 & 1 \\ 1 & \frac{e^{x_2}}{x_2} \end{array}\right) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$,
but that violates the rule that $\mx{A}$ should consist of constant coefficients, i.e., be independent of $\vc{x}$.
To investigate which
mappings can be written in this form, we first introduce the concept
of a linear mapping.
Definition 9.3: Linear Mapping
A linear mapping is a mapping $F$ which satisfies
\begin{align}
(i) \quad & F(\vc{u}+\vc{v}) = F(\vc{u}) + F(\vc{v}), \\
(ii) \quad & F(\lambda \vc{u}) = \lambda F(\vc{u}),
\end{align}
for all vectors $\vc{u}$, $\vc{v}$ and all scalars $\lambda$.
The intuitive description of the first condition is that it should not
matter whether you take the function of the sum or the sum of the
functions; you get the same result. The second condition states that
amplifying the input by a certain amount gives the same result as
amplifying the output by the same amount.
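As a quick check of the second condition, the scalar function $y = F(x) = x^2$ from Interactive Illustration 9.3 is not a linear mapping, since amplifying the input by a factor $2$ gives
\begin{equation}
F(2x) = (2x)^2 = 4x^2 \neq 2x^2 = 2F(x)
\end{equation}
whenever $x \neq 0$.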
Example 9.2: Shopping Cart to Cost
Assume that a shop only sells packages of penne, jars of Arrabiata sauce, and bars of chocolate.
The contents of your shopping cart can be modelled as a vector space.
Introduce addition of two shopping carts as putting all of the items of both carts in one cart.
Introduce multiplication by a scalar as multiplying the number of each item in the shopping cart by that scalar.
Notice that here, there are practical problems with multiplying a shopping cart with non-integer numbers or negative numbers, which makes the model less useful in practice.
Introduce a set of basis shopping carts. Let $\vc{e}_1$ correspond to the shopping cart containing one package of penne, let
$\vc{e}_2$ correspond to the shopping cart containing one jar of Arrabiata sauce, and let $\vc{e}_3$ correspond to the shopping cart containing one bar of chocolate. Then each shopping cart $\vc{x}$ can be described by three coordinates $(x_1, x_2, x_3)$ such that
$\vc{x} = x_1 \vc{e}_1 + x_2 \vc{e}_2 + x_3 \vc{e}_3$.
There is a mapping from shopping carts $\vc{x}$ to price $y \in \R$. We can introduce the matrix $\mx{A} = \begin{pmatrix}a_{11} & a_{12} & a_{13} \end{pmatrix}$, where $a_{11}$ is the price of a package of penne, $a_{12}$ is the price of a jar of Arrabiata sauce, and $a_{13}$ is the price of a bar of chocolate. The price $y$ can now be expressed as $y = \mx{A} \vc{x}$.
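As a small numerical illustration with made-up prices, say that a package of penne costs 12, a jar of Arrabiata sauce 30, and a bar of chocolate 25. A cart with 2 packages of penne, 1 jar of sauce, and 3 bars of chocolate then costs
\begin{equation}
y = \mx{A}\vc{x} = \begin{pmatrix} 12 & 30 & 25 \end{pmatrix} \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}
= 12 \cdot 2 + 30 \cdot 1 + 25 \cdot 3 = 129.
\end{equation}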
In real life this map is often non-linear, e.g., a shop might have campaigns saying 'buy 3 for the price of 2'. But modelling the mapping as linear is often reasonable and useful.
Again (as is common with mathematical modelling) there is a discrepancy between mathematical model and reality.
The results of mathematical analysis must always be used with reason and critical thinking.
Even if the cost of a shopping cart of 1 package of penne is 10, it does not always mean that you can sell packages of penne to the store for 10 each.
Theorem 9.1: Matrix Form of Linear Mappings
A mapping $\vc{y} = F(\vc{x})$ can be written in the form $\vc{y} = \mx{A}\vc{x}$ (matrix form) if and only if it is linear.
To prove the theorem we need to prove both directions, that every
linear mapping can be written as $\vc{y} = \mx{A}\vc{x}$, and that
every mapping $\vc{y} = \mx{A}\vc{x}$ is linear. We will do this
for the case when both $\vc{x}$ and $\vc{y}$ are two-dimensional
vectors, but a similar proof works for any dimensionality of both
$\vc{x}$ and $\vc{y}$.
Assume we have a basis $\vc{e}_1$, $\vc{e}_2$ in $N$ and $M$. We can
then write the input $\vc{x}$ and the output $\vc{y}$ in this
basis,
\begin{equation}
\vc{x} = x_1 \vc{e}_1 + x_2 \vc{e}_2, \qquad \vc{y} = y_1 \vc{e}_1 + y_2 \vc{e}_2.
\end{equation}
Since $F$ maps one vector to another vector, $F(\vc{e}_1)$ must
also be a vector that can be expressed in the basis. Assume it has
the coordinates $\begin{pmatrix}a_{11} \\ a_{21} \end{pmatrix}$ in
the basis $\vc{e}_1$, $\vc{e}_2$,
\begin{equation}
F(\vc{e}_1) = a_{11}\vc{e}_1 + a_{21}\vc{e}_2,
\end{equation}
and similarly that $F(\vc{e}_2)$ has the coordinates $\begin{pmatrix}a_{12} \\ a_{22} \end{pmatrix}$, so that $F(\vc{e}_2) = a_{12}\vc{e}_1 + a_{22}\vc{e}_2$. Using the linearity of $F$, we can expand
\begin{equation}
\vc{y} = F(\vc{x}) = F(x_1\vc{e}_1 + x_2\vc{e}_2) = x_1 F(\vc{e}_1) + x_2 F(\vc{e}_2)
= (a_{11}x_1 + a_{12}x_2)\vc{e}_1 + (a_{21}x_1 + a_{22}x_2)\vc{e}_2.
\end{equation}
Comparing this expression to the expansion $\vc{y} = y_1\vc{e}_1 + y_2\vc{e}_2$ above, we understand that $y_1$ must equal $a_{11}x_1 + a_{12}x_2$ and $y_2 = a_{21}x_1 + a_{22}x_2$. We have
\begin{equation}
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \left(\begin{array}{rr} a_{11} & a_{12} \\ a_{21} & a_{22} \end{array}\right) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},
\end{equation}
that is, $\vc{y} = \mx{A}\vc{x}$, which proves the first direction.
Now we need to prove the converse, that if $\vc{y} = \mx{A}\vc{x}$,
then the mapping is linear. Assume we have one input $\vc{x}'$ with
coordinates $\vc{x}' = x_1' \vc{e}_1 + x_2' \vc{e}_2$ or in vector
form $\vc{x}' = \begin{pmatrix}x'_1\\x'_2\end{pmatrix}$, and
another input $\vc{x}''$ or in vector form $\vc{x}'' = \begin{pmatrix}x''_1\\x''_2\end{pmatrix}$. The first condition
follows directly from rule $(vii)$ of matrix arithmetic
properties in Theorem 6.1, that is,
\begin{equation}
F(\vc{x}' + \vc{x}'') = \mx{A}(\vc{x}' + \vc{x}'') = \mx{A}\vc{x}' + \mx{A}\vc{x}'' = F(\vc{x}') + F(\vc{x}'').
\end{equation}
The second condition follows from
\begin{equation}
F(\lambda\vc{x}') = \mx{A}(\lambda\vc{x}') = \lambda\mx{A}\vc{x}' = \lambda F(\vc{x}'),
\end{equation}
since a scalar $\lambda$ can be placed on either side of a matrix ($\mx{A}\lambda = \lambda \mx{A}$). The proof is thus complete.
$\square$
It is often the case that you want to find the matrix $\mx{A}$ for a
particular linear mapping. Then the following theorem can be of great
help.
Theorem 9.2: Mapping of Basis Vectors
For a linear mapping $\vc{y} = F(\vc{x})$ written in the form
$\vc{y} = \mx{A}\vc{x}$ in the basis $\vc{e}_1, \ \vc{e}_2, \ \ldots, \ \vc{e}_n$, the column vectors of $\mx{A}$ are the
images of the basis vectors, $\vc{a}_{,1} = F(\vc{e}_1), \ \vc{a}_{,2} = F(\vc{e}_2), \ \ldots, \ \vc{a}_{,n} = F(\vc{e}_n)$.
We will prove the case when $N = M = 3$, but the proof for other
values of $M$ and $N$ is similar.
The first basis vector $\vc{e}_1$ can be written as $\vc{e}_1 = 1 \vc{e}_1 + 0\vc{e}_2 + 0 \vc{e}_3$ and thus has the coordinates
$(1, 0, 0)$. Using $\vc{x} = \begin{pmatrix} 1 \\ 0 \\ 0\end{pmatrix}$ in the formula $\vc{y} = \mx{A}\vc{x}$ gives
\begin{equation}
\vc{y} = \mx{A}\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} =
\left(\begin{array}{rrr} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array}\right)
\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} =
\begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix},
\end{equation}
which is the first column in $\mx{A}$. Thus the image $F(\vc{e}_1)$
of the basis vector $\vc{e}_1$ is the first column of $\mx{A}$,
denoted $\vc{a}_{,1}$. Likewise, the second basis vector can be
written $\vc{e}_2 = 0 \vc{e}_1 + 1 \vc{e}_2 + 0 \vc{e}_3$, and thus
has the coordinates $(0, 1, 0)$. Its image is therefore
\begin{equation}
\mx{A}\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} =
\begin{pmatrix} a_{12} \\ a_{22} \\ a_{32} \end{pmatrix},
\end{equation}
which is the second column of $\mx{A}$. In the same way, the image of the third basis vector $\vc{e}_3 = (0, 0, 1)$ is the third column of $\mx{A}$. This can be extended to any values of $N$ and $M$.
$\square$
We can now use this fact to easily find the matrix $\mx{A}$ for a
linear mapping by simply observing what happens to the basis vectors.
Example 9.3: Finding a Linear Mapping's Matrix
A linear mapping $\vc{y} = F(\vc{x})$ rotates a two-dimensional vector
$\vc{x}$ counterclockwise 90 degrees. Find the transformation matrix
$\mx{A}$ of the matrix form $\vc{y} = \mx{A} \vc{x}$ when
the standard orthonormal basis $\vc{e}_1=(1,0)$, $\vc{e}_2=(0,1)$ is used.
The first column vector of $\mx{A}$ will be the image of the first
basis vector $\vc{e}_1$, which points along the $x$-axis. Rotating it 90 degrees
counterclockwise makes it parallel to the $y$-axis, so it gets the
coordinates $(0, 1)$. Thus the first column vector is $\vc{a}_{,1}= \begin{pmatrix} 0 \\ 1 \end{pmatrix}$.
The second column vector will be the image of the second basis vector,
which points along the $y$-axis. Rotating the $y$-axis 90 degrees counterclockwise
makes it end up at $(-1, 0)$. The second column vector is thus
$\vc{a}_{,2} = \begin{pmatrix} -1 \\ 0 \end{pmatrix}$ and we can write
$\mx{A}$ as
\begin{equation}
\mx{A} = \left(\begin{array}{rr} 0 & -1 \\ 1 & 0 \end{array}\right).
\end{equation}
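Note that this agrees with Definition 6.10: a counterclockwise rotation by 90 degrees is a rotation by $\phi = \pi/2$ radians, and
\begin{equation}
\mx{R}\left(\frac{\pi}{2}\right) = \left(\begin{array}{rr} \cos \frac{\pi}{2} & -\sin \frac{\pi}{2} \\ \sin \frac{\pi}{2} & \cos \frac{\pi}{2} \end{array}\right)
= \left(\begin{array}{rr} 0 & -1 \\ 1 & 0 \end{array}\right),
\end{equation}
which is the same matrix.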
Interactive Illustration 9.7:
To get the transformation matrix $\mx{A}$ of a $90$-degree counterclockwise rotation, we start with the first basis vector $\vc{e}_1 = \left(\begin{array}{c}1\\0\\ \end{array}\right)$ marked in red.
Interactive Illustration 9.7:
Using the function on this vector gives the blue vector $\vc{F}(\vc{e}_1) = \left(\begin{array}{c}0\\1\\ \end{array}\right)$, which is rotated $90$ degrees.
Interactive Illustration 9.7:
Since this is the image of the first basis vector, it is also the first column in the matrix $\mx{A}$.
Interactive Illustration 9.7:
Next we start with the second basis vector $\vc{e}_2 = \left(\begin{array}{c}0\\1\\ \end{array}\right)$ marked in red.
Interactive Illustration 9.7:
The image of this second basis vector is $\vc{F}(\vc{e}_2) = \left(\begin{array}{c}-1\\0\\ \end{array}\right)$ (marked in green).
Interactive Illustration 9.7:
This therefore becomes the second column in the matrix $\mx{A}$, and we have found the entire matrix $\mx{A}$.
Assume we have a linear mapping $\vc{y} = F(\vc{x})$, and that the
output vector $\vc{y}$ can be fed into another linear mapping $\vc{z}= G(\vc{y})$. For this to work, we need the domain of $G$ to be equal to the
codomain of $F$. As an example, if $G(\vc{y})$ accepts two-dimensional
vectors, i.e., the domain of $G$ is $\mathbb{R}^2$, then the output of
$F$ must be a two-dimensional vector, i.e., the codomain of
$F(\vc{x})$ must also be $\mathbb{R}^2$. We then call the mapping $\vc{z} = G(F(\vc{x}))$ the composite mapping of $G$ and $F$.
Theorem 9.3: Composition of Linear Mappings
If $\vc{y} = F(\vc{x})$ and $\vc{z} = G(\vc{y})$ are both linear mappings
then the composite mapping $\vc{z} = G(F(\vc{x}))$ is also linear.
Since $\vc{y} = F(\vc{x})$ is a linear mapping, it can be written in matrix form as $\vc{y} = \mx{A}\vc{x}$, and likewise
$\vc{z} = G(\vc{y})$ can be written as $\vc{z} = \mx{B}\vc{y}$.
We can then write
\begin{equation}
\vc{z} = \mx{B}\vc{y} = \mx{B}(\mx{A}\vc{x}) = (\mx{B}\mx{A})\vc{x} = \mx{C}\vc{x},
\end{equation}
where $\mx{C}$ is a new matrix equal to $\mx{B}\mx{A}$. But this is
just a matrix equation of the form $\vc{z} = \mx{C}\vc{x}$, which,
according to Theorem 9.1, means that the mapping
from $\vc{x}$ to $\vc{z}$ must also be a linear mapping.
$\square$
Example 9.4: Composition of Linear Mappings
Find a mapping $F$ that rotates a two-dimensional vector by 30
degrees, and then multiplies the $x$-coordinate by two.
We split this into two parts. First we find a mapping $\vc{y} = G(\vc{x})$ that rotates by 30 degrees, and then another mapping
$\vc{z} = H(\vc{y})$ that doubles the $x$-coordinate.
Let $\mx{A}$ be the transformation matrix for $G$. We know from
Definition 6.10 that a rotation by $\phi$ is
obtained by
$ \left(\begin{array}{rr} \cos \phi & -\sin \phi \\ \sin \phi & \cos \phi \end{array}\right).$
Thus, setting $\phi = \frac{\pi}{6}$, we get
\begin{equation}
\mx{A} = \left(\begin{array}{rr} \cos \frac{\pi}{6} & -\sin \frac{\pi}{6} \\ \sin \frac{\pi}{6} & \cos \frac{\pi}{6} \end{array}\right)
= \left(\begin{array}{rr} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{array}\right).
\end{equation}
Let $\mx{B}$ be the transformation matrix for $H$. Its task is
to double the $x$-coordinate but leave the $y$-coordinate intact. This can be solved by setting
\begin{equation}
\mx{B} = \left(\begin{array}{rr} 2 & 0 \\ 0 & 1 \end{array}\right).
\end{equation}
The sought mapping $F$ is the composition of the two, so its transformation matrix is
\begin{equation}
\mx{C} = \mx{B}\mx{A} = \left(\begin{array}{rr} 2 & 0 \\ 0 & 1 \end{array}\right)
\left(\begin{array}{rr} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{array}\right)
= \left(\begin{array}{rr} \sqrt{3} & -1 \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{array}\right),
\end{equation}
so that $\vc{z} = F(\vc{x}) = \mx{C}\vc{x}$.
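As a quick sanity check of this composite matrix, the first basis vector should first be rotated to $\left(\frac{\sqrt{3}}{2}, \frac{1}{2}\right)$ and then have its $x$-coordinate doubled:
\begin{equation}
\mx{C}\begin{pmatrix} 1 \\ 0 \end{pmatrix} =
\left(\begin{array}{rr} \sqrt{3} & -1 \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{array}\right)
\begin{pmatrix} 1 \\ 0 \end{pmatrix} =
\begin{pmatrix} \sqrt{3} \\ \frac{1}{2} \end{pmatrix},
\end{equation}
which is indeed $\left(2 \cdot \frac{\sqrt{3}}{2}, \ \frac{1}{2}\right)$.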
For some mappings, several inputs $\vc{x}$ can map to the same value
$F(\vc{x})$. As an example, in
Interactive Illustration 9.5,
where we simply project a point onto the $x$-axis by setting the
$y$-coordinate to zero, both $\vc{x}_1 = (1,5)$ and $\vc{x}_2 = (1,2)$
will map to the same point $(1,0) = F(\vc{x}_1) =F(\vc{x}_2)$. However, for some mappings, the result will be unique,
so if we start out with two different $\vc{x}_1 \neq \vc{x}_2$, we know
that $F(\vc{x}_1) \neq F(\vc{x}_2)$. We call such mappings injective.
Definition 9.4: Injective mapping
A mapping $\vc{y} = F(\vc{x})$ is injective if any two different vectors $\vc{x}_1 \neq \vc{x}_2$ always give rise to two different images $F(\vc{x}_1) \neq F(\vc{x}_2)$.
Another way to state this is to say that an injective mapping has the property that if two images $F(\vc{x}_1)$ and $F(\vc{x}_2)$ are equal, then $\vc{x}_1$ must equal $\vc{x}_2$.
For some mappings $\vc{y} = F(\vc{x})$, we can reach every point
$\vc{y}$ in the codomain by selecting a suitable $\vc{x}$. When the
range of the mapping covers the codomain like this, we call the
mapping surjective. A mapping that is both injective and
surjective is called bijective.
Definition 9.5: Surjective mapping
A mapping $\vc{y} =F(\vc{x})$, where $\vc{x} \in N$ and $\vc{y} \in M$, is
surjective if the range $V_F$ is equal to the codomain $M$,
$V_F = M$.
Definition 9.6: Bijective mapping
A mapping is bijective if it is both injective and surjective.
We will now go through a few examples of mappings and investigate whether they are injective, surjective, and bijective.
Example 9.5: Type of Mapping I
Consider the mapping
\begin{equation}
\vc{y} = F(\vc{x}) = \begin{pmatrix} e^{x_1} \\ e^{x_2} \end{pmatrix},
\end{equation}
where $\vc{x}$ belongs to the domain $N = \mathbb{R}^2$ and the
resulting vector $\vc{y}$ belongs to the codomain $M =\mathbb{R}^2$. This mapping is injective, since if we have two
different input values $\vc{a}$ and $\vc{b}$, the output vector
$\begin{pmatrix}e^{a_1}\\e^{a_2}\end{pmatrix}$ will be different from
$\begin{pmatrix}e^{b_1}\\e^{b_2}\end{pmatrix}$ unless $\vc{a} =\vc{b}$. However, the mapping is not surjective, since it is
impossible to reach negative values for $y_1$ and $y_2$ ($e^x > 0$ for
all real values of $x$). The range $V_F$ is in this case only the
quadrant $y_1 > 0, y_2 > 0$ and not equal to the codomain $M =\mathbb{R}^2$. Since the mapping is not surjective, it is also not
bijective.
Example 9.6: Type of Mapping II
Investigate whether the mapping
\begin{equation}
\vc{y} = F(\vc{x}) = \begin{pmatrix} 2 x_1 \\ 3 x_2 \end{pmatrix}
\end{equation}
is bijective. In this case, both the domain $N$ and the codomain $M$ equal the plane of real numbers $\mathbb{R}^2$.
First we investigate whether it is injective. It is clear that it is, since two different input values $\vc{a}$ and $\vc{b}$ will generate two different output vectors $\begin{pmatrix}2 a_1\\3a_2\end{pmatrix} \neq \begin{pmatrix}2b_1\\3b_2\end{pmatrix}$ unless $\vc{a} = \vc{b}$.
Also all output vectors $\begin{pmatrix}y_1\\y_2\end{pmatrix}$ (the
entire codomain $M = \mathbb{R}^2$) can be reached by using the
input $\begin{pmatrix}x_1\\ x_2\end{pmatrix}= \begin{pmatrix}\frac{y_1}{2}\\\frac{y_2}{3}\end{pmatrix}$. This
means that the mapping is also surjective. Since the mapping is
both injective and surjective, it is bijective.
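In matrix form this mapping reads
\begin{equation}
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \left(\begin{array}{rr} 2 & 0 \\ 0 & 3 \end{array}\right) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},
\end{equation}
and since the determinant of the transformation matrix is $2 \cdot 3 = 6 \neq 0$, Theorem 7.10 tells us that the matrix is invertible. As we will see in Theorem 9.4, this is another way of concluding that the mapping is bijective.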
As it turns out, linear mappings that are bijective have many useful properties. As an example, it is simple to find the inverse mapping, as we will see in the following theorem.
Theorem 9.4: Equivalence of Inverse Mapping
For a linear mapping $\vc{y} = F(\vc{x})$ from $\vc{x} \in \mathbb{R}^n$ to $\vc{y} \in \mathbb{R}^n$, the following three statements are equivalent:
The mapping $F$ is bijective.
The transformation matrix for $F$ is invertible.
The images $F(\vc{e}_1), \ldots, F(\vc{e}_n)$ of the basis vectors $\vc{e}_1, \, \ldots, \, \vc{e}_n$ constitute a basis in $\mathbb{R}^n$.
i $\rightarrow$ ii: If the mapping $F$ is bijective, this means that
the equation $F(\vc{x}) = \vc{y}$ has a unique solution $\vc{x}$ for
every $\vc{y}$. Specifically, since $F$ is injective, we know that if
$F(\vc{u})=F(\vc{v})$, $\vc{u}$ must be equal to $\vc{v}$. Also, since
$F$ is surjective, every $\vc{y}$ can be reached with the proper
$\vc{x}$. The matrix form of $F$ is $\vc{y} = \mx{A}\vc{x}$, and that
this equation has a unique solution for every $\vc{y}$ is equivalent
to saying that $\mx{A}$ is invertible according to
Theorem 6.9.
ii $\rightarrow$ iii: According to
Theorem 9.2, $F(\vc{e}_1)$ is just
the first column of $\mx{A}$. Likewise, $F(\vc{e}_2)$ is the second
column of $\mx{A}$, and so
on. Theorem 6.9 tells us that if
$\mx{A}$ is invertible, the columns of $\mx{A}$ span $\mathbb{R}^n$,
which means that they form a basis in $\mathbb{R}^n$.
iii $\rightarrow$ i: We first prove that iii means that $F$ is
injective, i.e., that if $F(\vc{u}) = F(\vc{v})$ then $\vc{u}$ must
equal $\vc{v}$.
We can write $\vc{u} = u_1 \vc{e}_1 + u_2 \vc{e}_2 + \ldots + u_n\vc{e}_n$ and $\vc{v} = v_1 \vc{e}_1 + v_2 \vc{e}_2 + \ldots + v_n\vc{e}_n$. We know that $F(\vc{u}) = F(\vc{v})$, hence
$F(\vc{u})-F(\vc{v}) = \vc{0}$. By the linearity of $F$, this is equivalent to
\begin{equation}
(u_1-v_1) F(\vc{e}_1) + (u_2-v_2) F(\vc{e}_2) + \ldots + (u_n-v_n) F(\vc{e}_n) = \vc{0},
\end{equation}
but since $F(\vc{e}_1), F(\vc{e}_2), \ldots, F(\vc{e}_n)$ are linearly
independent, this only has the solution $(u_1-v_1) = 0, \,(u_2-v_2)=0, \, \ldots, \, (u_n-v_n)=0$ according to
Theorem 5.2. Hence
$\vc{u} = \vc{v}$ and $F$ is injective.
We then prove that $\vc{y} = F(\vc{x})$ is surjective, i.e., we can
reach every $\vc{y}$ with a proper choice of $\vc{x}$.
We can write $\vc{x} = x_1\vc{e}_1 + x_2\vc{e}_2 + \ldots + x_n\vc{e}_n$, and hence $\vc{y} = F(\vc{x})$ can be written
\begin{equation}
\vc{y} = F(\vc{x}) = x_1 F(\vc{e}_1) + x_2 F(\vc{e}_2) + \ldots + x_n F(\vc{e}_n),
\end{equation}
and since $F(\vc{e}_1), F(\vc{e}_2), \ldots, F(\vc{e}_n)$ constitute a
basis in $\mathbb{R}^n$, these vectors span $\mathbb{R}^n$ and we can
therefore reach any $\vc{y}$ in $\mathbb{R}^n$. Hence $F$ is
surjective. Since $F$ is both surjective and injective, it is
bijective, which concludes the proof.
$\square$
From this proof, the following theorem follows.
Theorem 9.5: Inverse Mapping Matrix
For a bijective linear mapping $\vc{y} = F(\vc{x})$ with
transformation matrix $\mx{A}$, the inverse mapping $\vc{x} =F^{-1}(\vc{y})$ is linear and has the transformation matrix $\inv{\mx{A}}$.
Since $F$ is bijective, Theorem 9.4 tells us that the transformation matrix $\mx{A}$ is invertible. Multiplying both sides of $\vc{y} = \mx{A}\vc{x}$ from the left by $\inv{\mx{A}}$ gives $\inv{\mx{A}}\vc{y} = \inv{\mx{A}}\mx{A}\vc{x} = \vc{x}$. This is exactly the mapping we were seeking: if we input $\vc{y} = \mx{A}\vc{x}$, we get $\vc{x}$ as the output. Thus $\inv{\mx{A}}$ must be the transformation matrix of $\inv{F}$, and since $\inv{F}$ can be written in matrix form this way, it must be linear.
Example 9.7: Shadows
Assume we have a plane that goes through the origin with the normal
vector $\vc{n} = (0, 1, 0)$. Furthermore, assume the sun is infinitely
far away from the origin, positioned in the direction $\vc{r} = (0.5,1.0, 0.25)$. We have a lot of points above the plane, and would like
to know their shadows on the plane. Assume also that we have been
reassured that this particular type of shadow projection can be
expressed as a linear mapping.
It would be possible to calculate the shadow point by point, but it
would be more convenient if we could create a linear mapping $\vc{y} =\mx{A}\vc{x}$ so that we could directly get the projected point
$\vc{y}$ by simply multiplying the point $\vc{x}$ by the matrix
$\mx{A}$. This should be possible, given that the problem is linear.
In this case, we take advantage of
Theorem 9.2, i.e., to obtain $\mx{A}$ we
only need to know what happens to the three unit vectors. The first
column vector of the transformation matrix $\mx{A}$ is simply the
image of the first unit vector $\vc{e}_1 = (1,0,0)$, and so
forth. Hence, if we find out where on the plane the shadow of the
point $(1,0,0)$ lands, we have the first column of $\mx{A}$.
We can write the plane in the form $ax+by+cz+d = 0$, and since we know the normal is $(0, 1, 0)$, this simplifies to $y+d=0$. Furthermore, since the origin $(0, 0, 0)$ is contained in the plane, the equation further simplifies to $y=0$. We can now take the line that goes through the point $(p_x, p_y, p_z)$ and follow it in the direction of the sun, $(r_x, r_y, r_z)$,
\begin{equation}
(x, y, z) = (p_x, p_y, p_z) + \lambda (r_x, r_y, r_z).
\end{equation}
Inserting this line equation into the plane equation should give us the distance $\lambda$ that we need to travel from $P$ to get to the intersection. When we insert it into the plane equation $y = 0$ we get $p_y + 1.0 \lambda = 0$. For the first unit vector $\vc{e}_1=(1, 0, 0)$, we have $p_y = 0$, giving $0 + \lambda = 0$, which means that $\lambda = 0$. The intersection point is therefore at
$(1, 0, 0) + 0\vc{r} = (1, 0, 0)$. This is therefore the first column of $\mx{A}$. For the second unit vector $\vc{e}_2 = (0, 1, 0)$, we have $p_y = 1$, giving the equation $1+\lambda = 0$, or $\lambda = -1$. Therefore the second column of $\mx{A}$ is $(0, 1, 0) + (-1)(0.5, 1.0, 0.25) = (-0.5, 0, -0.25)$. The third unit vector $\vc{e}_3 = (0, 0, 1)$ gets $\lambda = 0$ and hence the last column of $\mx{A}$ is $(0, 0, 1) - 0\vc{r} = (0, 0, 1)$.
In summary, we have now created a linear mapping $\vc{y} = \mx{A}\vc{x}$ that takes a point $\vc{x}$ and maps it to its shadow $\vc{y}$. The matrix $\mx{A}$ equals
\begin{equation}
\mx{A} = \left(\begin{array}{rrr} 1 & -0.5 & 0 \\ 0 & 0 & 0 \\ 0 & -0.25 & 1 \end{array}\right).
\end{equation}
This is used in the figure below. Each point of the cube is simply projected to the plane using $\vc{y} = \mx{A}\vc{x}$ with the matrix $\mx{A}$ from above. By drawing each face of the cube using the projected coordinates instead of the original coordinates, it is possible to draw the shadow of the cube.
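As a check, take for instance the arbitrarily chosen point $(4, 2, 1)$. Its shadow is
\begin{equation}
\vc{y} = \mx{A}\vc{x} = \left(\begin{array}{rrr} 1 & -0.5 & 0 \\ 0 & 0 & 0 \\ 0 & -0.25 & 1 \end{array}\right)
\begin{pmatrix} 4 \\ 2 \\ 1 \end{pmatrix} =
\begin{pmatrix} 3 \\ 0 \\ 0.5 \end{pmatrix},
\end{equation}
which lies in the plane $y = 0$, and which is also what we get by following the line $(4, 2, 1) + \lambda(0.5, 1.0, 0.25)$ with $\lambda = -2$.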
Interactive Illustration 9.8:
In this illustration, we will show how a linear mapping can be used to create shadow effects. We start by constructing a linear mapping that projects a point $P$ to $P$', which is the projection on to the plane along the vector $\vc{r}$.
Interactive Illustration 9.8:
The linear mapping $\vc{y} = \mx{A}\vc{x}$ is constructed according to the example. The vector $\overrightarrow{OP}$ is then inserted as $\vc{x}$ into the mapping.
Interactive Illustration 9.8:
The resulting $\vc{y}$ will then equal the vector $\overrightarrow{OP'}$ that points to the projected point $P'$.
Interactive Illustration 9.8:
In this manner, we can conveniently project any point $P$ to the plane along $\vc{r}$.
Interactive Illustration 9.8:
Assume we have a polygon consisting of four points. By applying the linear mapping on all four points before drawing the polygon it is possible to draw the shadow of the polygon.
Interactive Illustration 9.8:
This is possible for any three-dimensional object. As long as it can be drawn using a set of polygons it is possible to draw its shadow by applying the linear mapping before drawing it.
Now that we have this shadow example, we can investigate whether the mapping is bijective.
Example 9.8: Bijectivity of Shadows
Examine whether the mapping in Example 9.7 is bijective.
According to Theorem 9.4, a mapping $\vc{y} = \mx{A}\vc{x}$ is bijective if and only if the transformation matrix $\mx{A}$ is invertible. We know from Theorem 7.10 in the chapter on determinants that a matrix is invertible only if its determinant is nonzero. However, the determinant of the shadow matrix,
\begin{equation}
\det \mx{A} = \left|\begin{array}{rrr} 1 & -0.5 & 0 \\ 0 & 0 & 0 \\ 0 & -0.25 & 1 \end{array}\right|,
\end{equation}
must be zero since the second row is zero (see Theorem 7.1 $(iv)$ combined with $(vii)$). Hence the mapping cannot be bijective.
This makes sense. If you know a set of points and the direction of the sun, you can calculate their respective shadows. However, if you only have the shadows and the direction of the sun, you cannot recover the original positions, since you
only know the direction in which each point is located and not the distance to it. Hence the operation is not reversible, or in formal terms, not bijective.
Example 9.9: Image compression revisited
In Example 9.1 we saw that a mapping
$\vc{y} = \mx{A}\vc{x}$ was used as part of the compression process. Is this mapping likely to be bijective?
The answer is yes. In order to be useful, we want the decoded image to look the same as the original, at least if we spend enough bits. If we had a mapping that was not bijective, information would be lost in the transform step $\vc{y} = \mx{A}\vc{x}$. No matter how well we would preserve $\vc{y}$, i.e., no matter how many bits we would spend to describe $\vc{y}$, it would never be possible to recover $\vc{x}$. If the mapping is bijective, on the other hand, it is simple to recover $\vc{x}$ using $\vc{x} = \inv{\mx{A}}\vc{y}$.
Popup-help:
A basis (bases in plural) is a set of linearly independent (basis) vectors, such that each vector in the space
can be described as a linear combination of the basis vectors.
If the basis is $\{\vc{e}_1,\vc{e}_2,\vc{e}_3\}$, then any vector $\vc{v}$ in the space can be described by
three numbers $(x,y,z)$ such that $\vc{v} = x\vc{e}_1 +y\vc{e}_2 +z\vc{e}_3$. Often the basis is implicit,
in which case we write $\vc{v} = (x,y,z)$, or to make it possible to distinguish between vector elements from
one vector $\vc{v}$ to another $\vc{u}$, we may use $\vc{v}=(v_x,v_y,v_z)$ and $\vc{u}=(u_x,u_y,u_z)$ as notation.
Popup-help:
A mapping $\vc{y} = F(\vc{x})$ is bijective if it is both
injective and surjective. Since it is surjective, it can reach
every $\vc{y}$ in the codomain, and since it is injective, going
from $\vc{y}$ back to $\vc{x}$ is well-defined.
If a linear mapping $\vc{y} = \mx{A}\vc{x}$ is bijective it means
that the transformation matrix $\mx{A}$ is invertible and that the
columns of $\mx{A}$ constitute a basis in the codomain.
Popup-help:
A mapping $F$ is a rule that is written as $F: N \rightarrow M$,
where for every item in one set $N$, the function $F$
provides one item in another set $M$. Here $M$ is the codomain.
Popup-help:
An $n$-dimensional column vector, $\vc{v}$, is represented with respect to a basis and consists of a column of $n$ scalar values.
The vector elements are sometimes denoted $v_1$, $v_2$,..., $v_n$. For two- and three-dimensional vectors, we sometimes
also use $v_x$, $v_y$, and $v_z$.
The notation is
\begin{equation}
\vc{u} = \begin{pmatrix} u_x \end{pmatrix}, \qquad
\vc{v} = \begin{pmatrix} v_x \\ v_y \end{pmatrix}, \qquad
\vc{w} = \begin{pmatrix} w_x \\ w_y \\ w_z \end{pmatrix},
\end{equation}
where $\vc{u} = u_x \vc{e}_1$, $\vc{v} = v_x \vc{e}_1 + v_y \vc{e}_2$,
and $\vc{w} = w_x \vc{e}_1 + w_y \vc{e}_2 + w_z \vc{e}_3$.
Note that $\vc{e}_i$ are the basis vectors.
In our text, we also use the shorthand notation $\vc{w} = \bigl(w_1,w_2,w_3\bigr)$, which means the same as above
(notice the commas between the vector elements).
A row vector does not have any commas between its elements, though.
Popup-help:
A mapping $F$ is a rule that is written as $F: N \rightarrow M$,
where for every item in one set $N$, the function $F$
provides one item in another set $M$. Here $N$ is the domain.
Popup-help:
A mapping $\vc{y} = F(\vc{x})$ is injective if going back
from $\vc{y}$ to $\vc{x}$ is well-defined.
More formally, if $\vc{y} = F(\vc{x})$ is injective, then two different vectors $\vc{x}_1 \neq \vc{x}_2$ must produce two different images, $F(\vc{x}_1) \neq F(\vc{x}_2)$.
Likewise, if $\vc{y} = F(\vc{x})$ is injective and two images $\vc{y}_1 = F(\vc{x}_1)$ and $\vc{y}_2 = F(\vc{x}_2)$ are the same, $\vc{y}_1 = \vc{y}_2$, then the vectors $\vc{x}_1$ and $\vc{x}_2$ must also be the same, $\vc{x}_1 = \vc{x}_2$.
Popup-help:
On parameterized form, a line is defined by a starting point, $S$, and a line direction $\vc{d}$, and a scalar
parameter, $t$, such that any point on the line can be obtained from the line equation, i.e.,
$P(t) = S + t\vc{d}$. This representation works in any dimension.
On implicit form, a line is defined by a starting point, $S$, and a normal vector, $\vc{n}$. For any point, $P$,
on the line, it holds that $\vc{n}\cdot (P-S) = 0$. This representation works only in two dimensions.
Popup-help:
A system of equations is called linear if it only contains polynomial terms of the zeroth and first order,
that is, either constants or first-order terms, such as $9x$, $-2y$, and $0.5z$.
Popup-help:
A mapping $F$ is linear if it satisfies
\begin{align}
&F(\vc{u}+\vc{v}) = F(\vc{u}) + F(\vc{v}), \\
&F(\lambda \vc{u}) = \lambda F(\vc{u}),
\end{align}
for all vectors $\vc{u}$, $\vc{v}$ and all scalars $\lambda$.
Popup-help:
A mapping $F$ is a rule that is written as $F: N \rightarrow M$,
where for every item in one set $N$, the function $F$
provides one item in another set $M$.
Popup-help:
A mapping $F$ is non-linear if it violates one or both of the rules
\begin{align}
&F(\vc{u}+\vc{v}) = F(\vc{u}) + F(\vc{v}), \\
&F(\lambda \vc{u}) = \lambda F(\vc{u}).
\end{align}
Popup-help:
A normal, $\vc{n}$, is part of the description of the two-dimensional line and the three-dimensional plane on implicit form.
A plane equation is
\begin{equation}
\vc{n}\cdot (P-S) = 0,
\end{equation}
where $\vc{n}$ is a vector that is orthogonal to the plane, $P$ is any point on the plane,
and $S$ is a known point on the plane.
Similarly, a line is $\vc{n}\cdot (P-S)=0$, where $\vc{n}$ is a vector orthogonal to the line, $P$ is any point on the line, and
$S$ is a particular point on the line. The plane can also be written as $ax+by+cz+d=0$ and a line can be written
as $ax+by+c=0$.
Popup-help:
An $n$-dimensional orthonormal basis consists of $n$ basis vectors $\vc{e}_i$, such that
$\vc{e}_i\cdot \vc{e}_j$ is 0 if $i\neq j$ and 1 if $i=j$. The standard basis is orthonormal.
In two dimensions, we can see that this is true
since $\vc{e}_1\cdot\vc{e}_2=$$(1,0)\cdot(0,1)=0$ and $\vc{e}_i\cdot\vc{e}_i=1$.
Popup-help:
On parameterized form (also called explicit form), a plane equation is given by $P(t_1, t_2) = S + t_1 \vc{d}_1 + t_2\vc{d}_2$,
where $S$ is a point in the plane and $\vc{d}_1$ and $\vc{d}_2$ are two non-parallel non-zero vectors
on the plane. Using the parameters $t_1$ and $t_2$, any point $P(t_1,t_2)$ can be reached.
On implicit form, a plane equation is given by $\vc{n}\cdot(P-S)=0$, where $\vc{n}$ is the normal of
the plane and $S$ is a point on the plane. Only for points, $P$, that lie in the plane is the
formula equal to $0$.
Popup-help:
Projection is a mapping from one set onto a smaller set (i.e., a subset) of that set. For example, if a 3D point
is projected to the closest point on a plane, then the starting set is all 3D points, which is reduced to
the points in the plane by the projection.
Popup-help:
The range or the image of a mapping $F$ is the set $V_F$, where
\begin{equation}
V_F = \{F(x) | x \in N\}.
\end{equation}
Popup-help:
A mapping $\vc{y} = F(\vc{x})$ is surjective if it is
possible to reach every possible $\vc{y}$ in the codomain. In
other words, if $\vc{x} \in N$ and $\vc{y} \in M$, then for every
$\vc{y}$ in $M$ there exists at least one $\vc{x}$ in $N$ so that
$F(\vc{x}) = \vc{y}$.
Popup-help:
A unit vector, $\vc{u}$, is a vector such that its length is 1, i.e., $||\vc{u}||=1$. The process of creating a unit
vector from another vector is called normalization, and a unit vector is therefore often also called a normalized vector.
To generate a unit vector from an arbitrary vector, $\vc{v}$, simply divide by its length: $\vc{u} = \frac{1}{||\vc{v}||}\vc{v}$.