© 2015-2020 Jacob Ström, Kalle Åström, and Tomas Akenine-Möller


Chapter 5: Gaussian Elimination





This chapter is about Gaussian Elimination which is a method for solving systems of linear equations. Such systems are often encountered when dealing with real problems, such as this computer vision problem: Given a number of images of an object, calculate a 3D model of the object.
Figure 5.1: Four out of 671 pictures of Örebro Castle. Images courtesy of Carl Olsson at Lund University.
Figure 5.1 shows four images of Örebro Castle. Using those, and 667 other similar images, it is possible to calculate the point cloud shown in Figure 5.2. These calculations include solving many systems of equations. Gaussian Elimination is a very good way to learn about such systems. For more information about this example see this YouTube clip.
Interactive Illustration 5.2: Point cloud calculated from 671 images. Press the rotation icon in the top left corner to rotate the point cloud, or right-click and drag to see the model from any orientation. Try looking at the castle from above to see the geometry!
5.1 Introduction


One way to familiarize oneself with systems of equations is to investigate what happens when two lines cross.

To describe a two-dimensional line in the plane, one of the first notations that many people encounter is
\begin{equation} y = kx + m, \end{equation} (5.1)
where $k$ denotes the slope of the line and $m$ its intersection with the $y$-axis. This notation works well for most lines, but not for all. No matter what value is used for $k$, it will not be possible to represent a completely vertical line ($x=4$, for example). Therefore, the mathematically more convenient notation
\begin{equation} ax + by + c = 0 \end{equation} (5.2)
is typically preferred. This works well for all lines of the form $y =kx + m$, just set $a=-k$, $b=1$ and $c=-m$. In addition, it also works for vertical lines. For instance, setting $a=1$, $b=0$ and $c=-4$ represents the line $x=4$ that goes through the point $(4,0)$ and is parallel to the $y$-axis. In Figure 5.3, we show two lines represented by the equations $2x + 3y + 5 = 0$ and $5x+2y-4=0$.
Interactive Illustration 5.3: An example of two lines described in the form $ax+by+c=0$. These lines intersect at $(2,-3)$.
The line $2x+3y+5=0$ consists of all the points $(x,y)$ for which $2x+3y+5=0$ holds. Likewise, the other line consists of all the $(x,y)$ pairs for which $5x+2y-4=0$ is satisfied. The intersection point of the two lines must be a coordinate $(x,y)$ which satisfies both equations. Hence to find the intersection point, we need to solve the system of equations
\begin{equation} \begin{cases} 2 x + 3 y = -5,\\ 5 x + 2 y = 4. \end{cases} \end{equation} (5.3)
The system of equations described above is called linear, since it only contains polynomial terms of the first degree, such as $2 x$ and $3 y$, and no higher order terms, such as $x^2$, $xy$, or $y^3$, or any other non-linear terms, such as $\cos x$ or $e^x$. Such systems have the nice property that they are easy to investigate. For instance, if there is a solution, it is always possible to find it. In case there is no solution, such as when the two lines are parallel and do not overlap, the investigation will show this. Finally, when there are many solutions, such as when the two lines are parallel and overlapping, the investigation will reveal this as well.

Interactive Illustration 5.4 shows how the point of intersection between two lines, i.e., the solution to systems of equations such as (5.3), changes interactively when the two line equations are changed.
Interactive Illustration 5.4: The reader may move the red and blue points on the lines in order to change the corresponding line equation. The illustration then uses the techniques in this chapter to find the point of intersection (green) between the two lines.


This chapter presents a systematic way, called Gaussian elimination, of analyzing and solving such systems. However, first we will start out with a few more examples of linear equations.
5.2 Examples


Example 5.1: Video Compression
Each pixel in a video frame is displayed on a computer screen by lighting up three subpixels, one for red, one for green, and one for blue. In the top row of Figure 5.5, these color components are shown independently to the right.
Figure 5.5: Top: the red, green, and blue components of an image shown as gray scale values. Bottom: the RGB color components have been transformed to $Y$ (roughly luminance) and $Cb$ & $Cr$ (chrominance), and also shown as gray scale values.
As can be seen, these color channels (RGB) are quite similar to each other, which is due to the fact that all three are affected by luminance, i.e., the overall brightness per pixel. Stating the same thing three times is inefficient, and therefore video coding schemes such as H.264 do not store the pixels as $(R, G, B)$. Instead $(Y, Cb, Cr)$ is stored, where
\begin{equation} \begin{cases} Y = 0.299 R + 0.587 G + 0.114 B,\\ Cb = -0.169 R - 0.331 G + 0.500 B + 128,\\ Cr = 0.500 R - 0.419 G - 0.081 B + 128. \end{cases} \end{equation} (5.4)
As can be seen in the lower row of Figure 5.5, the $Y$, $Cb$, and $Cr$ channels are now less similar to each other. The last two channels also seem more washed-out, and it turns out that this means fewer bits are needed to represent the image. On top of this, the human visual system is much more sensitive to errors in luminance (roughly $Y$) than in the two other channels. This is also exploited by representing the chrominance channels less accurately, which results in even fewer bits.

During decompression, we need to get back from $(Y, Cb, Cr)$ to $(R, G, B)$ in order to display the pixel on the screen. Assume we have computed that the color in the $(Y, Cb, Cr)$-space for a pixel is $Y=180$, $Cb=80$, and $Cr=150$. What $(R, G, B)$ color does that represent? The answer can be given by inserting the values for $Y$, $Cb$ and $Cr$ in the equations above and solving the resulting linear system of equations, i.e.,
\begin{equation} \begin{cases} 0.299 R + 0.587 G + 0.114 B = 180,\\ -0.169 R - 0.331 G + 0.500 B = -48,\\ 0.500 R - 0.419 G - 0.081 B = 22. \end{cases} \end{equation} (5.5)
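For readers who want to verify the numbers, the small system (5.5) can also be solved numerically. Below is a minimal sketch in Python using NumPy (our own choice of tool, not part of the original example); the rows of the matrix are the $R$, $G$, and $B$ coefficients from (5.5).

```python
import numpy as np

# Coefficient matrix of (5.5): rows hold the R, G, B coefficients of Y, Cb, Cr.
A = np.array([[ 0.299,  0.587,  0.114],
              [-0.169, -0.331,  0.500],
              [ 0.500, -0.419, -0.081]])

# Right-hand side: (Y, Cb - 128, Cr - 128) for Y = 180, Cb = 80, Cr = 150.
b = np.array([180.0, -48.0, 22.0])

R, G, B = np.linalg.solve(A, b)
print(R, G, B)   # the reconstructed (R, G, B) color for this pixel
```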

Example 5.2: Electronic Circuit
In the electronic circuit below, we would like to calculate the currents $I_1$, $I_2$, and $I_3$.
Figure 5.6: An example circuit. The currents $I_1$, $I_2$ and $I_3$ are unknown.
Kirchhoff's voltage law states that the sum of the voltages equals zero when going around a closed loop in the circuit. If we apply this to the left part of the circuit we get $5\mathrm{V} - I_2 \cdot 1\mathrm{k}\Omega - I_2 \cdot 2\mathrm{k}\Omega = 0$. If we apply it to the outer loop of the circuit we get $5\mathrm{V} - I_3 \cdot 4\mathrm{k}\Omega = 0$. Finally, Kirchhoff's current law can be used at the red point in the circuit to get a third equation, $I_1 = I_2 + I_3$. In summary, this yields the following system of equations
\begin{equation} \begin{cases} 1000 I_2 + 2000 I_2 = 5,\\ 4000 I_3 = 5,\\ I_1 - I_2 - I_3 = 0, \end{cases} \end{equation} (5.6)
where the units have been removed. This can be solved to recover $I_1$, $I_2$, and $I_3$. Solving such systems of equations is the topic of the rest of this chapter.

Example 5.3: Fitting a circle to three points
There are some problems that, although non-linear, can be changed into a system of linear equations after a change of variables. Here we illustrate this with the problem of fitting a circle to three points. A circle is given by its center point $(u,v)$ and its radius $r$. A point $(x_1,y_1)$ lies on this circle if
\begin{equation} (x_1-u)^2 + (y_1-v)^2 = r^2 . \end{equation} (5.7)
Since there are $3$ unknowns ($u$, $v$ and $r$) one might expect that we need $3$ equations. So maybe three points ($(x_1,y_1)$, $(x_2,y_2)$, and $(x_3,y_3)$) are needed to specify a circle. What is the circle that goes through the three points $(-2,3)$, $(-1,1)$, and $(-1,2)$? This gives a system of 3 non-linear equations
\begin{equation} \begin{cases} (-2-u)^2 + (3-v)^2 = r^2 ,\\ (-1-u)^2 + (1-v)^2 = r^2 ,\\ (-1-u)^2 + (2-v)^2 = r^2 .\\ \end{cases} \end{equation} (5.8)
By expanding the squares in the equations we get
\begin{equation} \begin{cases} 4+4u+u^2 + 9-6v+v^2 = r^2 ,\\ 1+2u+u^2 + 1-2v+v^2 = r^2 ,\\ 1+2u+u^2 + 4-4v+v^2 = r^2 .\\ \end{cases} \end{equation} (5.9)
After rearrangement of the terms we have
\begin{equation} \begin{cases} (r^2 - u^2 - v^2) -4u +6v = 13 ,\\ (r^2 - u^2 - v^2) -2u +2v = 2 ,\\ (r^2 - u^2 - v^2) -2u +4v = 5 .\\ \end{cases} \end{equation} (5.10)
By introducing $w = r^2-u^2-v^2$ we obtain a system of linear equations
\begin{equation} \begin{cases} w -4u +6v = 13 ,\\ w -2u +2v = 2 ,\\ w -2u +4v = 5 .\\ \end{cases} \end{equation} (5.11)
Again note that this seemingly non-linear problem could be converted into a system of linear equations. When this has been solved, we obtain values for $w$, $u$, and $v$, which can then be used to find out what the radius is, using $w = r^2-u^2-v^2$, i.e., $r^2 = w+u^2+v^2$.
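As a sketch of how this linearization can be used in practice, the following Python/NumPy snippet (with variable names of our own choosing) builds the linear system in $w$, $u$, and $v$ directly from the three points and then recovers the radius via $r^2 = w + u^2 + v^2$.

```python
import numpy as np

# Three points that the circle should pass through (from the example).
pts = [(-2.0, 3.0), (-1.0, 1.0), (-1.0, 2.0)]

# Each point gives one linear equation  w + 2*x*u + 2*y*v = x^2 + y^2,
# where w = r^2 - u^2 - v^2 is the substituted variable.
A = np.array([[1.0, 2.0 * x, 2.0 * y] for x, y in pts])
b = np.array([x * x + y * y for x, y in pts])

w, u, v = np.linalg.solve(A, b)
r = np.sqrt(w + u * u + v * v)   # recover the radius from w = r^2 - u^2 - v^2
print((u, v), r)                 # center (-2.5, 1.5) and radius sqrt(2.5) for these points
```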

The example above is shown in Interactive Illustration 5.7.
Interactive Illustration 5.7: Given the three points (red, green, blue) denoted by $(x_i,y_i)$, $i\in\{1,2,3\}$, we can calculate the center, $(u,v)$, of the circle and its radius $r$. The reader may move the red, green, and blue points.

Example 5.4: A 4000-year-old Babylonian problem
Linear systems of equations were described on Babylonian clay tablets. One such problem is the following:

"There are two fields whose total area is 1800 square yards. One produces grain at the rate of 2/3 of a bushel per square yard while the other produces grain at the rate of 1/2 a bushel per square yard. If the total yield is 1100 bushels, what is the size of each field?"

Introduce a variable for the size of each field, i.e., $x_1$ square yards for the first field and $x_2$ square yards for the second. Then the problem can be reformulated as
\begin{equation} \begin{cases} \frac{2}{3} x_1 + \frac{1}{2} x_2 = 1100, \\ x_1 + x_2 = 1800. \end{cases} \end{equation} (5.12)
This can be solved to recover $x_1$ and $x_2$. Solving such systems of equations is the topic of the rest of this chapter.
5.3 Gaussian Elimination


There are many possible ways to solve systems of equations, and you have most likely studied some of them before. However, what Gaussian elimination provides is a way that always works. You could think of it as a cookbook recipe that will always give you the same tasty dish if you just follow the steps in the recipe. In our case the end result is not edible, but a system of equations that has a triangular shape, and is hence called a triangular system of equations. This can be seen in Figure 5.8.
$\begin{cases} 6x + y - 4z = 4 \\ 2y + 3z = -5 \\ 11z = -33 \end{cases}$
Interactive Illustration 5.8: Here a system of equations is shown. Press the Forward button to proceed.
Interactive Illustration 5.8: A triangle is used to highlight the triangular structure of the system of equations.
Although the system in Figure 5.8 has three equations and three unknowns, it is very simple to solve. The reason is that the bottom-most equation is so simple that $z$ can be recovered by dividing both sides by $11$, which gives $z=\frac{-33}{11} = -3$. This value of $z$ can then be inserted into the second to last equation, yielding $2y + 3(-3) = -5$, or $2y = 4$. Again, it is simple to solve for $y=2$, and both $y$ and $z$ can now be inserted into the top equation to generate a simple equation for $x$.
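The back substitution just performed is mechanical enough to express as a short program. The following is a minimal Python sketch (our own illustration, not part of the book's material) that solves the triangular system in Figure 5.8 by working from the bottom equation upwards.

```python
def back_substitute(U, b):
    """Solve U x = b where U is upper triangular (given as nested lists)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):              # start with the bottom equation
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / U[i][i]             # requires a non-zero diagonal entry
    return x

# The triangular system of Figure 5.8: 6x + y - 4z = 4, 2y + 3z = -5, 11z = -33.
U = [[6.0, 1.0, -4.0],
     [0.0, 2.0,  3.0],
     [0.0, 0.0, 11.0]]
b = [4.0, -5.0, -33.0]
print(back_substitute(U, b))   # x = -5/3, y = 2, z = -3 (y and z as in the text)
```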

The goal of Gaussian elimination is to start with any linear system of equations, and then transform it into a triangular system. This is done using only three simple rules, as shown in the following theorem.

Theorem 5.1: Gaussian Elimination Rules
The solutions to a linear system of equations do not change if we
  1. swap the order of two rows,
  2. multiply a row with a constant $\neq 0$, or
  3. add a multiple of another row to a row.
At this point, we defer the proof of this theorem until Section 5.7.1, where we also repeat this theorem for the sake of completeness.
As an example, assume we have the following system of equations,
\begin{equation} \begin{cases} -x + y = 8,\\ 2 x + 4 y = -10. \end{cases} \end{equation} (5.13)
We can now use the third rule and add two times the first row to the last. This brings us to the system
\begin{equation} \begin{cases} -x + y = 8,\\ 6 y = 6, \end{cases} \end{equation} (5.14)
which is a triangular system, and those can easily be solved, as we have seen earlier in this chapter. From the last row, we can see that $y=1$, which inserted into the top equation gives us $-x + 1 = 8$, i.e., $x=-7$.

The example above was easy to solve since we could use an integer multiple. In general, this is not always possible. Consider for instance,
\begin{equation} \begin{cases} 2 x + 3 y = -5, \\ 5 x + 2 y = 4. \end{cases} \end{equation} (5.15)
To eliminate the $x$-variable we would need to multiply the top row by $-\frac{5}{2}$ before adding it to the bottom row. Since it is easier to do calculations with integers than with fractions, it is simpler to multiply the top equation by $5$ and the bottom equation by $2$ and then subtract them. This is explained in detail in Interactive Illustration 5.9.
Interactive Illustration 5.9: Typical example of Gaussian elimination. We want to eliminate the first variable, $x$, from the bottom equation of $\Big\{\begin{array}{l} 2 x + 3 y = -5 \\ 5 x + 2 y = 4 \end{array}$. To do so, the top equation is multiplied by $\textcolor{red}{5}$ and the bottom equation by $\textcolor{#00dd00}{2}$, and the results are subtracted: $(\textcolor{red}{5}\cdot 2 - \textcolor{#00dd00}{2}\cdot 5) x + (\textcolor{red}{5} \cdot 3 - \textcolor{#00dd00}{2} \cdot 2) y = \textcolor{red}{5}(-5) - \textcolor{#00dd00}{2}(4)$, i.e., $11 y = -33$. This leaves the triangular system $\Big\{\begin{array}{l} 2 x + 3 y = -5 \\ 11 y = -33 \end{array}$.
Note that the step from the original equation system to the triangular equation system in Interactive Illustration 5.9 is not a direct application of one of the rules. For example, we add a multiple of one row to a multiple of another row, and there is no such rule. However, this can be seen as applying two rules in succession, namely, first multiplying the second row by $-2$ (rule 2), then adding $5$ times the first row to this (rule 3). This means that the operations in Interactive Illustration 5.9 are legal after all.

Sometimes it is not possible to eliminate the first variable in the lower rows, since it is not present in the first row. An example is
\begin{equation} \begin{cases} 3 x_2 - 6 x_3 = -21, \\ 2 x_1 + 4 x_2 - 2 x_3 = 16, \\ -x_1 - 7 x_2 + 2 x_3 = -27. \end{cases} \end{equation} (5.16)
The solution then is simply to reorder the equations according to the first rule. Putting the first equation last, we get
\begin{equation} \begin{cases} 2 x_1 + 4 x_2 - 2 x_3 = 16, \\ -x_1 - 7 x_2 + 2 x_3 = -27, \\ 3 x_2 - 6 x_3 = -21. \end{cases} \end{equation} (5.17)
Eliminating $x_1$ from the second row can be done by multiplying it by 2 and then adding the first row to it, giving
\begin{equation} \begin{cases} 2 x_1 + 4 x_2 - 2 x_3 = 16, \\ -10 x_2 + 2 x_3 = -38, \\ 3 x_2 - 6 x_3 = -21. \end{cases} \end{equation} (5.18)
To eliminate $x_2$, we multiply the second row by 3 and the last row by 10 and add them together. This gives
\begin{equation} \begin{cases} \begin{array}{rrrl} 2 x_1 + 4 x_2 - 2 x_3 & = 16, \\ -10 x_2 + 2 x_3 & = -38, \\ -54 x_3 & = -324.\\ \end{array} \end{cases} \end{equation} (5.19)
The system is now triangular and it is simple to calculate $x_3 =\frac{-324}{-54} = 6$. Inserting this in the second row yields $-10x_2 + 12 = -38$, which is equivalent to $-10 x_2 = -50$, i.e., $x_2 =5$. Finally, $x_2 = 5$ and $x_3 = 6$ are inserted into the top equation, which gives $2 x_1 + 20 - 12 = 16$, which can be simplified to $2 x_1= 8$, or simply $x_1 = 4$. The system is now solved.
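For larger systems it is convenient to let a program perform both the elimination and the back substitution. The sketch below (plain Python, our own illustration and not taken from the book) follows the rules of Theorem 5.1: it swaps rows when a pivot is zero and adds multiples of the pivot row to the rows below, and it assumes the system has exactly one solution. The intermediate rows differ from the integer-multiple steps used above, but the final solution is the same.

```python
def gaussian_elimination(A, b):
    """Solve A x = b for a system with exactly one solution (a sketch, no error handling)."""
    n = len(b)
    A = [row[:] for row in A]   # work on copies so the inputs are left untouched
    b = b[:]
    for k in range(n):
        # Rule 1: if the pivot is zero, swap with a row below that has a non-zero entry.
        if A[k][k] == 0:
            swap = next(i for i in range(k + 1, n) if A[i][k] != 0)
            A[k], A[swap] = A[swap], A[k]
            b[k], b[swap] = b[swap], b[k]
        # Rule 3: add a multiple of row k to each row below it to eliminate column k.
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            A[i] = [aij - m * akj for aij, akj in zip(A[i], A[k])]
            b[i] = b[i] - m * b[k]
    # Back substitution on the now triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

# System (5.16): the first pivot is zero, so the rows are reordered automatically.
A = [[0.0, 3.0, -6.0],
     [2.0, 4.0, -2.0],
     [-1.0, -7.0, 2.0]]
b = [-21.0, 16.0, -27.0]
print(gaussian_elimination(A, b))   # [4.0, 5.0, 6.0]
```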
5.4 Special Cases


So far, we have only encountered systems of equations which have exactly one solution. This is not always the case, as is demonstrated in
\begin{equation} \begin{cases} 5 x + 2 y = 4,\\ 10 x + 4 y = 13. \end{cases} \end{equation} (5.20)
A natural attempt to solve this system is to multiply the top equation by $-2$ and add it to the second equation. This gives
\begin{equation} \begin{cases} 5 x + 2 y = 4,\\ 0 = 5. \end{cases} \end{equation} (5.21)
Zero can never be equal to five, and yet we have followed the Gaussian elimination rules (Theorem 5.1) exactly. This is because this system of equations has no solution. It is easy to see why if we instead multiply the top equation by 2, i.e.,
\begin{equation} \begin{cases} 10 x + 4 y = 8,\\ 10 x + 4 y = 13. \end{cases} \end{equation} (5.22)
These two equations are simply contradicting each other. The left hand sides are exactly the same, and yet the right hand sides are different. Hence this system does not have a solution. Doing Gaussian elimination on such a system will result in contradictions of the type $0=5$, which we encountered above. When this happens, it is safe to say that the system has no solution.
Interactive Illustration 5.10: The lines $5 x + 2 y - 4 = 0$ and $10 x + 4 y - 13 = 0$ are parallel and never intersect.
By interpreting the two equations in System of Equations (5.20) as two lines, it is possible to see what is going on graphically. The top equation, $5x + 2y = 4$, represents the left line in Figure 5.10. The second equation, $10x + 4y = 13$, represents the rightmost line. As can be seen in the figure, these lines are parallel and will thus never cross. Since the crossing represents the solution, two parallel, separated lines in the plane correspond to a system of equations with no solution.
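The contradiction $0 = 5$ can also be reproduced numerically. The few lines below (plain Python, a sketch with our own variable names) store each equation of (5.20) as its coefficients followed by the right-hand side and perform the single elimination step described above.

```python
# Each row is [coefficient of x, coefficient of y, right-hand side].
row1 = [5.0, 2.0, 4.0]     # 5x + 2y = 4
row2 = [10.0, 4.0, 13.0]   # 10x + 4y = 13

m = -row2[0] / row1[0]     # multiply row1 by -2 ...
row2 = [r2 + m * r1 for r1, r2 in zip(row1, row2)]   # ... and add it to row2
print(row2)                # [0.0, 0.0, 5.0], i.e., the contradiction 0 = 5
```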

Having no solution is not the only special case. As an example, assume we have the following system of equations,
\begin{equation} \begin{cases} -x + y = 8,\\ 3 x - 3 y = -24. \end{cases} \end{equation} (5.23)
Performing Gaussian elimination on the first variable can be done by adding three times the first row to the second row. This gives
\begin{equation} \begin{cases} -x + y = 8,\\ 0 = 0. \end{cases} \end{equation} (5.24)
The second row thus collapses to $0=0$, which is always true, and no information about the $y$-variable is obtained. When this happens, there is not only one pair, $(x,y)$, that satisfies the system, but many. To express all such pairs, set $y = t$. This gives
\begin{equation} \begin{cases} -x + t = 8,\\ y = t, \end{cases} \end{equation} (5.25)
and if $x$ is solved for, we get
\begin{equation} \begin{cases} x = -8 + t,\\ y = 0 + t. \end{cases} \end{equation} (5.26)
Interactive Illustration 5.11: The solutions form the line $P(t) = S + t\mathrm{\bf d}$, where $S = \begin{pmatrix} -8\\ 0 \end{pmatrix}$ (green) and $\mathrm{\bf d} = \begin{pmatrix} 1\\ 1 \end{pmatrix}$ (red). The lines $-x + y = 8$ and $3x - 3y = -24$ coincide.
Thus the vector $\begin{pmatrix} x\\ y \end{pmatrix}$ can be written as $\begin{pmatrix} -8\\ 0 \end{pmatrix} + \begin{pmatrix} 1\\ 1 \end{pmatrix}t$. As we have seen in Section 3.6.1, this is a line of the form $P(t) = S+ t\mathrm{\bf d}$, where $S = \begin{pmatrix} -8\\ 0 \end{pmatrix}$ and $\mathrm{\bf d} = \begin{pmatrix} 1\\ 1 \end{pmatrix}$. In other words, there is an entire line of solutions to this system of equations. This is shown in Illustration 5.11, where the green dot equals the point $S$, the red arrow shows the direction vector $\mathrm{\bf d}$, and the resulting line is shown in black.
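A quick numerical check (a NumPy sketch of our own) confirms that every point on this line satisfies both of the original equations, regardless of the value of $t$.

```python
import numpy as np

S = np.array([-8.0, 0.0])   # a particular solution
d = np.array([1.0, 1.0])    # direction of the solution line

for t in (-2.0, 0.0, 3.5):           # a few arbitrary parameter values
    x, y = S + t * d
    print(-x + y, 3 * x - 3 * y)     # always prints 8.0 and -24.0
```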

Another way to look at this is to go back to the original equation,
\begin{equation} \begin{cases} -x + y = 8,\\ 3 x - 3 y = -24. \end{cases} \end{equation} (5.27)
Now, if the bottom equation is multiplied by $-\frac{1}{3}$, we get
\begin{equation} \begin{cases} -x + y = 8,\\ -x + y = 8. \end{cases} \end{equation} (5.28)
In other words, we get the same equation twice. The second equation can be seen as just repeating what we already knew, i.e., no new information is obtained. The two lines $-x + y = 8$ and $3x - 3y =-24$ are lying on top of each other, i.e., they are the same line. This is shown in Interactive Illustration 5.11 if you press the forward button below that illustration. In fact, it is the same line, only that $-x +y = 8$ and $3x-3y=-24$ are written in implicit form, whereas $P(t) = S + t\mathrm{\bf d}$ is written in explicit form.

Now, consider a system with three equations and three unknowns, such as
\begin{equation} \begin{cases} 2 x + 4 y - 2 z = 16, \\ -x - 7 y + 2 z = -27, \\ 3 y - 6 z = -21. \end{cases} \end{equation} (5.29)
Just as in the case with the smaller two-equation systems, these systems can be interpreted geometrically. However, in this case every equation becomes a plane (see Section 3.6.2) in three-dimensional space. The solution to the system of equations is the point where all three planes intersect each other. This is shown in the first step of Interactive Illustration 5.12, where each plane represents one equation, and they all intersect in a single point, which is colored black in the figure. Note that you can change the viewpoint of the interactive illustration by holding the right mouse button down while moving the mouse (ctrl plus mouse button on mac, or pressing down with two fingers on mac laptop). On an iPad, change the camera position by dragging using two fingers.
Interactive Illustration 5.12: One solution example: when all three planes intersect in one point, we have exactly one solution.
Interactive Illustration 5.12: Several solutions example: this is another example where there are infinitely many solutions along a line, because there are two equations representing the same plane (dashed) and the third plane crosses this plane, which results in a line of intersection that represents the solutions.
Sometimes however, the equations represent planes that all include the same line, as shown in the second step in Interactive Illustration 5.12. Then there is not only one solution, but infinitely many solutions to the system of equations. These solutions are equal to the line shown in black.

In other cases, two equations represent the same plane. This may be because they have the same coefficients, or because one equation is equal to the other times a constant. Then there is an entire plane that satisfies these two equations, as illustrated by the dashed plane in the third step in Interactive Illustration 5.12. The solution to the entire system is then the intersection between this (double) plane and the third plane. This amounts to an entire line of solutions, as illustrated by the solid black line in the figure.

The last case when the system has a solution is when all three equations represent the same plane, such as the system
\begin{equation} \begin{cases} x + y + z = 1,\\ 2 x + 2 y + 2 z = 2,\\ 3 x + 3 y + 3 z = 3. \end{cases} \end{equation} (5.30)
If a point $(x,y,z)$ satisfies the first equation, it also satisfies the second and third equations, since they are just the first equation multiplied by $2$ or $3$. Therefore this point is also a solution to the entire system of equations. The solution is therefore visualised as the entire black plane shown in the last step in Figure 5.12.

For a system of two equations and two unknowns not to have a solution, the two lines representing the two equations must be parallel, as was shown in Interactive Illustration 5.10. In three dimensions, however, a system can lack solutions even when no two planes are parallel. Consider, for instance, the following system of equations.
\begin{equation} \begin{cases} 3\sqrt{6}\, x + 3 y + 3 z = \sqrt{6},\\ -3\sqrt{6}\, x + 3 y + 3 z = \sqrt{6},\\ -6 y - 6 z = \sqrt{6}. \end{cases} \end{equation} (5.31)
It is easy to see that none of these planes are parallel, since their normals differ (and cannot be made equal by multiplying with a constant). The three normals are $(3\sqrt{6}, 3, 3)$, $(-3\sqrt{6},3, 3)$, and $(0, -6, -6)$. Yet there exists no solution, as is easily seen when performing Gaussian elimination. Adding the first and second row eliminates $x$ from the second equation, i.e.,
\begin{equation} \begin{cases} 3\sqrt{6}\, x + 3 y + 3 z = \sqrt{6},\\ 6 y + 6 z = 2\sqrt{6},\\ -6 y - 6 z = \sqrt{6}. \end{cases} \end{equation} (5.32)
Now to eliminate $y$, we add the two last equations together, which results in
\begin{equation} \begin{cases} 3\sqrt{6}\, x + 3 y + 3 z = \sqrt{6},\\ 6 y + 6 z = 2\sqrt{6},\\ 0 = 3\sqrt{6}. \end{cases} \end{equation} (5.33)
This also eliminates $z$ and ends in the contradiction $0 = 3\sqrt{6}$, which means that no solution exists. How this looks geometrically is shown in the first step in Figure 5.13. There are places where two of the equations are satisfied, but not the third, and these places are marked by dotted lines. Note that there is no point which lies in all three planes, and thus no solution to the system of equations exists, although the planes are not parallel.
Interactive Illustration 5.13: No solution example: three planes that intersect each other pairwise, but all three never intersect in a single point. The dashed lines are the intersections between two planes, and represent positions where two of the equations are satisfied but not the third.
Interactive Illustration 5.13: No solution example: in this case, two of the planes are identical. The corresponding area is dashed to illustrate that two of the equations are satisfied here. However, since no point satisfies also the third equation, there exists no solution to the system of equations.
If all three planes are parallel and non-coinciding, then there is no solution. This is shown in the second step of Figure 5.13.

A special case of this is when two equations represent the same plane, but the third equation represents a different plane parallel to, but not coinciding with, these. This example is shown in the third step of Figure 5.13, where the dashed plane satisfies two of the equations and the other plane satisfies the third equation. Again, no solution to this system of equations exists.

Finally, the last case when no solution exists is when two of the planes are parallel to each other (and different from each other). The third plane then intersects these two parallel planes along two lines (dashed in step 4 in Interactive Illustration 5.13), but there is not a single point that belongs to all three planes, and hence there exists no solution to the system of equations.

In summary, we can see that a system of equations either has
  1. exactly one solution,
  2. no solution, or
  3. an infinite number of solutions.
5.5 The Homogeneous Case


A special case is when all the equations equal zero, as in the following case.
\begin{equation} \begin{cases} 2 x_1 + 5 x_2 + 3 x_3 = 0, \\ 4 x_1 + 2 x_2 + x_3 = 0, \\ 2 x_1 - 3 x_2 - 2 x_3 = 0. \end{cases} \end{equation} (5.34)
Such a system of equations is called homogeneous, and it always has the solution $x_1 = x_2 = x_3 = 0$. Since this solution is so 'easy' to find, it is often called the 'trivial' solution. Sometimes, however, the system of equations has more solutions than the trivial one. We use Gaussian elimination to see if this is the case. The first step is to eliminate $x_1$ from the two lowermost equations. For the middle equation we take two times the first equation minus the second equation. For the last equation we take the first equation minus the last equation, resulting in
\begin{equation} \begin{cases} 2 x_1 + 5 x_2 + 3 x_3 = 0, \\ 8 x_2 + 5 x_3 = 0, \\ 8 x_2 + 5 x_3 = 0. \end{cases} \end{equation} (5.35)
Eliminating $x_2$ from the last two rows gives
\begin{equation} \begin{cases} 2 x_1 + 5 x_2 + 3 x_3 = 0, \\ 8 x_2 + 5 x_3 = 0, \\ 0 = 0. \end{cases} \end{equation} (5.36)
We can now set $x_3 = t$ which gives
\begin{equation} \begin{cases} 2 x_1 + 5 x_2 + 3 t = 0, \\ 8 x_2 + 5 t = 0, \\ x_3 = t. \end{cases} \end{equation} (5.37)
Solving for $x_2$ gives $x_2 = -\frac{5}{8}t$, and inserting this in the first equation gives $2 x_1 + 5 (-\frac{5}{8})t + 3 t = 0$. This can be simplified to $x_1 = \frac{1}{16}t$. We have
\begin{equation} \begin{cases} x_1 = \frac{1}{16} t, \\ x_2 = -\frac{5}{8} t, \\ x_3 = t, \end{cases} \end{equation} (5.38)
which means that, in addition to the trivial solution $x_1 = x_2 = x_3= 0$, there is an infinite number of solutions to this system of equations.
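As a sketch of a numerical check (using NumPy, with names chosen by us), the parametric solution (5.38) can be substituted back into the homogeneous system (5.34) for a few values of $t$.

```python
import numpy as np

# Coefficient matrix of the homogeneous system (5.34).
A = np.array([[2.0, 5.0, 3.0],
              [4.0, 2.0, 1.0],
              [2.0, -3.0, -2.0]])

for t in (1.0, -7.0, 16.0):
    x = np.array([t / 16.0, -5.0 * t / 8.0, t])   # the solution family (5.38)
    print(A @ x)                                  # always the zero vector
```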
5.6 Implicit and Explicit form


As we have seen in Section 3.6, planes and lines can be written in both implicit and explicit form. Using Gaussian elimination, it is possible to convert between these two types. Recall that the plane $P(t_1, t_2) = S + t_1 \vc{d}_1 + t_2 \vc{d}_2$ is in explicit form, while $\vc{n}\cdot(P-S)=0$ is a plane in implicit form. Assume we have the following plane
\begin{equation} \begin{cases} x = \frac{3}{2} t_1, \\ y = t_1 + t_2, \\ z = 1 - t_1 + 2 t_2, \end{cases} \end{equation} (5.39)
where $S = (0, 0, 1)$, $\vc{d}_1 = (\frac{3}{2}, 1, -1)$, and $\vc{d}_2 = (0, 1, 2)$. This can be transformed to implicit form by using Gaussian elimination to remove the variables $t_1$ and $t_2$. To see how this can be done, the system of equations is first rewritten so that the variables to be eliminated are in the leftmost position, i.e.,
\begin{equation} \begin{cases} \frac{3}{2} t_1 = x,\\ t_1 + t_2 = y,\\ -t_1 + 2 t_2 = z - 1. \end{cases} \end{equation} (5.40)
To make the Gaussian elimination easier, the top row is moved to the bottom, which results in
\begin{equation} \begin{cases} t_1 + t_2 = y,\\ -t_1 + 2 t_2 = z - 1,\\ \frac{3}{2} t_1 = x. \end{cases} \end{equation} (5.41)
To eliminate $t_1$ in the second row, the two top rows are added. To eliminate $t_1$ in the last row, the first row is multiplied by $\frac{3}{2}$ and then the last row is subtracted, resulting in
\begin{equation} \begin{cases} t_1 + t_2 = y,\\ 3 t_2 = y + z - 1,\\ \frac{3}{2} t_2 = \frac{3}{2} y - x. \end{cases} \end{equation} (5.42)
It is now possible to eliminate $t_2$ from the last equation by subtracting two times the last row from the middle row, i.e.,
\begin{equation} \begin{cases} t_1 + t_2 = y,\\ 3 t_2 = y + z - 1,\\ 0 = y + z - 1 - 2(\frac{3}{2} y - x). \end{cases} \end{equation} (5.43)
The last row simplifies to $2x - 2y +z - 1 = 0$, which is the implicit form of the plane equation. To go from implicit to explicit form is easier. Setting $y=t_1$ and $z = t_2$ gives the equation system
\begin{equation} \begin{cases} 2 x - 2 t_1 + t_2 - 1 = 0,\\ y = t_1,\\ z = t_2, \end{cases} \end{equation} (5.44)
which can be rewritten as
\begin{equation} \begin{cases} x = \frac{1}{2} + t_1 - \frac{1}{2} t_2, \\ y = t_1, \\ z = t_2. \end{cases} \end{equation} (5.45)
This is now in explicit form, i.e., $P(t_1, t_2) = \hat{S} + t_1\hat{\vc{d}}_1 + t_2 \hat{\vc{d}}_2$, where $\hat{S}=(\frac{1}{2},0,0)$, $\hat{\vc{d}}_1 = (1, 1, 0)$, and $\hat{\vc{d}}_2 = (-\frac{1}{2}, 0, 1)$. Note that this explicit form does not look like the one (Equation (5.39)) we started out with, i.e., $\hat{S} \neq S$, $\hat{\vc{d}}_1 \neq \vc{d}_1$, and $\hat{\vc{d}}_2 \neq \vc{d}_2$. Yet these two represent the same plane. The explanation is simply that there exist many different parameterizations of a particular plane. This is shown in Figure 5.14, where the point $S$ and the vectors $\vc{d}_1$ and $\vc{d}_2$ are shown in red and represent the plane used in our example. However, one may also use another point $\hat{S}$ and two other vectors $\hat{\vc{d}}_1$, $\hat{\vc{d}}_2$, shown in blue, to represent the same plane.
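Both parameterizations can be checked against the implicit equation $2x - 2y + z - 1 = 0$. The short NumPy sketch below (our own illustration) evaluates the implicit form on points generated by the original and the new explicit forms.

```python
import numpy as np

def implicit(p):
    x, y, z = p
    return 2 * x - 2 * y + z - 1      # zero exactly when p lies on the plane

# Original parameterization (5.39) and the one derived in (5.45).
S,  d1, d2 = np.array([0.0, 0.0, 1.0]), np.array([1.5, 1.0, -1.0]), np.array([0.0, 1.0, 2.0])
S2, e1, e2 = np.array([0.5, 0.0, 0.0]), np.array([1.0, 1.0, 0.0]), np.array([-0.5, 0.0, 1.0])

for t1, t2 in [(0.0, 0.0), (1.0, -2.0), (3.0, 0.5)]:
    print(implicit(S + t1 * d1 + t2 * d2), implicit(S2 + t1 * e1 + t2 * e2))   # both 0.0
```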
5.7 Theoretical Underpinning


The sections above have given examples of Gaussian elimination along with geometrical interpretations where this was possible. In the following sections, we will go through the theory of why Gaussian elimination works.

5.7.1 The Gaussian Elimination Rules



Let us revisit our example System of Equations (5.16), that is,
\begin{equation} \begin{cases} 3 x_2 - 6 x_3 = -21, \\ 2 x_1 + 4 x_2 - 2 x_3 = 16, \\ -x_1 - 7 x_2 + 2 x_3 = -27. \end{cases} \end{equation} (5.46)
Using Gaussian elimination, we arrived at a triangular system in (5.19), which had the solution $\vc{x} =(x_1,x_2,x_3) = (4, 5, 6)$. But how can we be sure that this is a solution also to the original system above? It is of course possible to try the solution. Inserting the solution, $\vc{x}$, in the above system gives
\begin{equation} \begin{cases} 3 \cdot 5 - 6 \cdot 6 = 15 - 36 = -21, \\ 2 \cdot 4 + 4 \cdot 5 - 2 \cdot 6 = 8 + 20 - 12 = 16, \\ -4 - 7 \cdot 5 + 2 \cdot 6 = -4 - 35 + 12 = -27, \end{cases} \end{equation} (5.47)
which matches exactly. However, even though this particular case has been verified, it would be desirable to have a proof that Gaussian elimination never changes the solutions of a system of equations.

Here it is useful to introduce the notion of implication $\Rightarrow$ and equivalence $\Leftrightarrow$. We say that one system of equations implies a second system of equations if every solution to the first system is also a solution to the second system. When this works both ways we say that the two systems of equations are equivalent. Thus we would write
\begin{equation} \begin{cases} 1 x_1 + 2 x_2 = 1, \\ 2 x_1 + 5 x_2 = 7, \end{cases} \Leftrightarrow \begin{cases} 1 x_1 + 2 x_2 = 1, \\ 1 x_2 = 5, \end{cases} \end{equation} (5.48)
to say that the set of solutions to the second system is exactly the same as the set of solutions to the first system. But how can we be sure that the solutions to the systems of equations do not change as we are manipulating them? This is captured by the following theorem, which is a repetition of Theorem 5.1 as a convenience to the reader.

Theorem 5.2: Gaussian Elimination Rules
The solutions to a linear system of equations do not change if we
  1. swap the order of two equations,
  2. multiply an equation with a constant $\neq 0$, or
  3. add a multiple of another equation to an equation.

The first rule of Gaussian elimination in Theorem 5.2 says it is allowed to swap the order of two equations. It should be readily appreciated that simple reordering of the equations does not change the solution of the system of equations, i.e.,
\begin{equation} \begin{cases} a_{11} x + a_{12} y + a_{13} z = b_1, \\ a_{21} x + a_{22} y + a_{23} z = b_2, \end{cases} \Leftrightarrow \begin{cases} a_{21} x + a_{22} y + a_{23} z = b_2, \\ a_{11} x + a_{12} y + a_{13} z = b_1. \end{cases} \end{equation} (5.49)


The next rule allows us to multiply an equation with a constant different from $0$. From algebra, it should be clear that multiplying both sides of an equation by a non-zero constant does not change the solution to the equation, and hence does not affect the solution to the system of equations.
\begin{equation} \begin{cases} a_{11} x + a_{12} y + a_{13} z = b_1, & (i) \\ a_{21} x + a_{22} y + a_{23} z = b_2, & (ii) \end{cases} \Leftrightarrow \begin{cases} k a_{11} x + k a_{12} y + k a_{13} z = k b_1, & (i') = k (i)\\ a_{21} x + a_{22} y + a_{23} z = b_2. & (ii') = (ii) \end{cases} \end{equation} (5.50)
Notice that we can obtain the original system of equations by dividing $(i')$ by $k$. Thus every solution to the equations $(i), (ii)$ is a solution to $(i'), (ii')$ and vice versa.

Multiplying by zero on the other hand can change the solution. As an example, $(x,y) = (2, 3)$ does not satisfy the equation
\begin{equation} x + y = 1, \, \, \, (i) \end{equation} (5.51)
but it does satisfy the equation
\begin{equation} 0\cdot (x + y) = 0 \cdot 1. \, \, \, \, (i') = 0 (i) \end{equation} (5.52)
Here every solution to $(i)$ is a solution to $(i')$, but there might be solutions to $(i')$ that are not solutions to $(i)$. We cannot retrieve the equation $(i)$ from $(i')$.

The third rule allows us to add a multiple of another row to a row. Assume we have a solution $(x, y, z)$ that satisfies the system
\begin{equation} \begin{cases} a_{11} x + a_{12} y + a_{13} z = b_1, & (i)\\ a_{21} x + a_{22} y + a_{23} z = b_2, & (ii)\\ a_{31} x + a_{32} y + a_{33} z = b_3. & (iii) \end{cases} \end{equation} (5.53)
By adding $k$ times the first row to the second, we obtain
\begin{equation} \begin{cases} a_{11} x + a_{12} y + a_{13} z = b_1, & (i') = (i) \\ (a_{21}+k a_{11}) x + (a_{22}+k a_{12}) y + (a_{23}+k a_{13}) z = b_2 + k b_1, & (ii') = (ii) + k(i)\\ a_{31} x + a_{32} y + a_{33} z = b_3. & (iii') = (iii) \end{cases} \end{equation} (5.54)
Since only standard algebra rules have been used here (e.g., adding a constant to both sides of an equation, which does not change the equation itself), a solution $(x,y,z)$ to the original System (5.53) must also be a solution to System (5.54) above. It is important to notice that we can obtain the original System (5.53) from the new System (5.54) by adding $-k$ times the first row $(i')$ to the second row $(ii')$. Thus every solution to System (5.54) must also be a solution to System (5.53) above. Thus we have proved that the two systems are equivalent.
$\square$


5.7.2 The General Case



This chapter has given examples of systems of linear equations that either have no solution, one solution, or an infinite number of solutions. This section will go through the general case and formally prove that these are the only possibilities.

Theorem 5.3: Solutions to a System of Linear Equations
A system of linear equations has either
  1. no solution,
  2. exactly one solution, or
  3. an infinite number of solutions.

A general linear system looks like
\begin{equation} \begin{cases} a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + \ldots + a_{1M} x_M = b_1,\\ a_{21} x_1 + a_{22} x_2 + a_{23} x_3 + \ldots + a_{2M} x_M = b_2,\\ a_{31} x_1 + a_{32} x_2 + a_{33} x_3 + \ldots + a_{3M} x_M = b_3,\\ \vdots\\ a_{N1} x_1 + a_{N2} x_2 + a_{N3} x_3 + \ldots + a_{NM} x_M = b_N, \end{cases} \end{equation} (5.55)
where $a_{kl}$ and $b_k$ are known constants and where $x_k$ are the unknown variables. We will first work with the first variable $x_1$. Two things can happen. Either all of the coefficients $a_{i1}$ are zero. Then there is nothing more we can do with this variable. In Example 5.5 we describe this case in more detail. The more typical case is that at least one of the coefficients $a_{i1}$ is non-zero. In this case it is always possible to reorder the equations, using the first rule of Gaussian elimination, to a system such as the above where $a_{11}\neq 0$. It is then possible to add $-(a_{21}/a_{11})$ times the first row to the second row, and $-(a_{31}/a_{11})$ times the first row to the third row and so forth, and hence eliminate $x_1$ from all equations except the first. Calling the new constants $a'_{kl}$ and $b'_k$ gives
\begin{equation} \begin{cases} a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + \ldots + a_{1M} x_M = b_1,\\ a'_{22} x_2 + a'_{23} x_3 + \ldots + a'_{2M} x_M = b'_2,\\ a'_{32} x_2 + a'_{33} x_3 + \ldots + a'_{3M} x_M = b'_3,\\ \vdots\\ a'_{N2} x_2 + a'_{N3} x_3 + \ldots + a'_{NM} x_M = b'_N. \end{cases} \end{equation} (5.56)
After this step, the coefficient of $x_1$ is non-zero in the first equation only, or all coefficients of $x_1$ are zero.

Now the same procedure can be applied to the smaller system of primed constants, namely, all rows in (5.56) that do not involve $x_1$. Again there are two cases. Either all of these equations have a coefficient of zero for the variable $x_2$ that we are trying to eliminate, or there is at least one equation where there is a non-zero coefficient for $x_2$. After reordering and elimination, either all equations have zero coefficient for the variable $x_2$, or there is exactly one equation where there is a non-zero coefficient for $x_2$.

This process is continued until there are no more equations or no more unknown variables left to eliminate. In the end, there might be equations left of the type $0 = 0$. These can just be removed. There are three possible outcomes.

The first case is if there is at least one equation of type $0 = \hat{b}_{i} \neq 0$, in which case there is no solution to the system. An example of this is shown in System of Equations (5.33). For the continued discussion we assume that this is not the case.

The second case is when the resulting system is triangular, i.e.,
\begin{equation} \begin{cases} \hat{a}_{11} x_1 + \hat{a}_{12} x_2 + \hat{a}_{13} x_3 + \ldots + \hat{a}_{1M} x_M = \hat{b}_1,\\ \hat{a}_{22} x_2 + \hat{a}_{23} x_3 + \ldots + \hat{a}_{2M} x_M = \hat{b}_2,\\ \hat{a}_{33} x_3 + \ldots + \hat{a}_{3M} x_M = \hat{b}_3,\\ \vdots\\ \hat{a}_{MM} x_M = \hat{b}_M, \end{cases} \end{equation} (5.57)
where $\hat{a}_{kl}$ and $\hat{b}_k$ are the constants after a full pass of Gaussian elimination. In this case, the variable $x_M$ can be exactly recovered, and this result can be used to recover $x_{M-1}$ from the second last equation and so forth. Thus the system has one unique solution.

The third possibility is that there are fewer equations than unknowns after elimination, e.g.,
\begin{equation} \begin{cases} \hat{a}_{11} x_1 + \hat{a}_{12} x_2 + \hat{a}_{13} x_3 + \hat{a}_{14} x_4 + \hat{a}_{15} x_5 = \hat{b}_1,\\ \hat{a}_{23} x_3 + \hat{a}_{24} x_4 + \hat{a}_{25} x_5 = \hat{b}_2,\\ \hat{a}_{34} x_4 + \hat{a}_{35} x_5 = \hat{b}_3. \end{cases} \end{equation} (5.58)
As can be seen in this example, there are $3$ equations with non-zero leading coefficients, belonging to the $3$ variables $x_1$, $x_3$, and $x_4$. There are in fact no constraints on the remaining two variables, i.e., $x_2$ and $x_5$. One can then set $x_2 = t_1$, $x_5 = t_2$ and solve for the remaining variables. In this case
\begin{equation} x_4 = \frac{1}{\hat{a}_{34}}(\hat{b}_3 -\hat{a}_{35} t_{2}) . \\ \end{equation} (5.59)
After this, the remaining variables (in this case $x_1$ and $x_3$) can be solved for. Since the solutions depend on a set of variables $\{t_1, t_2\}$, which can take on any values, we have an infinite number of solutions.

Thus, the system has either no solution, one solution, or an infinite number of solutions.
$\square$


Note that for the case of an infinite number of solutions, it is also interesting to understand how many free variables there are in the solution set, or, phrased differently, what the dimension of the solution set is. This is captured by the concept of rank, which will be explained in Chapter 8.

Example 5.5: Vanishing Coefficients
As mentioned in the proof above, sometimes when performing Gaussian elimination, a certain variable may get a zero coefficient in all remaining equations. In this case one may simply need to skip one step, as is illustrated in this example,
\begin{equation} \begin{cases} x_1 + x_2 + 2 x_3 + 3 x_4 = 1, \\ 2 x_1 + 2 x_2 + 4 x_3 + 7 x_4 = 1, \\ 3 x_1 + 3 x_2 + 6 x_3 + 10 x_4 = 2. \end{cases} \end{equation} (5.60)
We eliminate $x_1$ in the second line by adding $-2$ times the first equation to the second equation, and we eliminate $x_1$ in the third line by adding $-3$ times the first equation to the last equation. We then get
\begin{equation} \begin{cases} x_1 + x_2 + 2 x_3 + 3 x_4 = 1, \\ x_4 = -1, \\ x_4 = -1. \end{cases} \end{equation} (5.61)
Normally after having eliminated $x_1$, the next step would be to try to eliminate $x_2$. But in this case the coefficient in front of $x_2$ is zero both for the second and third line. We then simply skip this and go to the next variable $x_3$. The same happens here, so we skip to the last variable, $x_4$. Eliminating that just gives the equation $0=0$ so we can rewrite the system as
\begin{equation} \begin{cases} x_1 + x_2 + 2 x_3 + 3 x_4 = 1, \\ x_4 = -1. \end{cases} \end{equation} (5.62)
From the last equation we can see that $x_4 = -1$. We then set $x_2 = s$ and $x_3 = t$ and can then insert these into the top equation,
\begin{equation} x_1 + s + 2t + 3(-1) = 1, \end{equation} (5.63)
from which we can gather that
\begin{equation} x_1 = -s - 2t + 4. \end{equation} (5.64)
In summary we therefore have
\begin{equation} \begin{cases} x_1 = -s - 2t + 4,\\ x_2 = s,\\ x_3 = t,\\ x_4 = -1. \end{cases} \end{equation} (5.65)
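As a check (a NumPy sketch with names of our own), the parametric solution (5.65) can be substituted back into the original system (5.60) for a few values of $s$ and $t$.

```python
import numpy as np

# Coefficients and right-hand side of system (5.60).
A = np.array([[1.0, 1.0, 2.0, 3.0],
              [2.0, 2.0, 4.0, 7.0],
              [3.0, 3.0, 6.0, 10.0]])
rhs = np.array([1.0, 1.0, 2.0])

for s, t in [(0.0, 0.0), (1.0, -2.0), (5.0, 3.0)]:
    x = np.array([-s - 2 * t + 4, s, t, -1.0])   # the solution family (5.65)
    print(np.allclose(A @ x, rhs))               # True for every choice of s and t
```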
5.8 Linear Dependence and Independence


In this section, we will introduce linear dependence and linear independence, which are two important concepts in linear algebra. However, we start with the linear combination, which is another concept that is also often used. The definition follows below.

Definition 5.1: Linear Combination
The vector $\vc{u}$ is a linear combination of the vectors $\vc{v}_1,\dots,\vc{v}_n$ if $\vc{u}$ can be expressed as
\begin{equation} \vc{u} = k_1\vc{v}_1 + k_2\vc{v}_2 + \dots + k_n \vc{v}_n = \sum_{i=1}^{n} k_i \vc{v}_i, \end{equation} (5.66)
where $k_1, k_2, \dots, k_n$ are scalar values.
For the column vector notation (Definition 2.5), we saw, for example, that $\vc{w}= w_x \vc{e}_1+w_y \vc{e}_2+w_z \vc{e}_3$. That is, the three-dimensional vector, $\vc{w}$, is a linear combination of the basis vectors, $\vc{e}_1$, $\vc{e}_2$, and $\vc{e}_3$, and the scalar values are $w_x$, $w_y$, and $w_z$.

Using the definition of the linear combination, we will now look into sets of vectors that are linearly dependent and linearly independent.

Definition 5.2: Linear Independence and Dependence
The set of vectors, $\vc{v}_1,\dots,\vc{v}_n$, is said to be linearly independent, if the equation
\begin{equation} k_1\vc{v}_1 + k_2 \vc{v}_2 + \dots + k_n \vc{v}_n = \vc{0}, \end{equation} (5.67)
only has a single solution which is
\begin{equation} k_1 = k_2 = \dots = k_n =0. \end{equation} (5.68)
If there is at least one other solution, then the set of vectors is linearly dependent.

Example 5.6: Standard Basis
The standard basis in three dimensions consists of the basis vectors
\begin{align} \vc{e}_x & = (1,0,0), \\ \vc{e}_y & = (0,1,0), \\ \vc{e}_z & = (0,0,1). \end{align} (5.69)
To determine whether they are linearly independent or not, one can set up Equation (5.67) for this situation,
\begin{equation} k_1\vc{e}_x + k_2\vc{e}_y + k_3\vc{e}_z = \vc{0}. \end{equation} (5.70)
Now, if we instead write this in column vector notation (Definition 2.5), we get
\begin{align} \vc{0} &= k_1\vc{e}_x + k_2\vc{e}_y + k_3\vc{e}_z = k_1 \begin{pmatrix}1\\ 0 \\ 0\end{pmatrix} + k_2 \begin{pmatrix}0\\ 1 \\ 0\end{pmatrix} + k_3 \begin{pmatrix}0\\ 0 \\ 1\end{pmatrix} \\ &= \begin{pmatrix}k_1\\ 0 \\ 0\end{pmatrix} + \begin{pmatrix}0\\ k_2 \\ 0\end{pmatrix} + \begin{pmatrix}0\\ 0 \\ k_3\end{pmatrix} = \begin{pmatrix}k_1\\ k_2 \\ k_3\end{pmatrix}. \end{align} (5.71)
So, we arrive at $(k_1, k_2, k_3) = (0,0,0)$, which proves that the standard basis is linearly independent.

Theorem 5.4: Linear combination and linear dependence
One of the vectors $\vc{v}_1, \vc{v}_2, \dots, \vc{v}_n$ can be written as a linear combination of the others if and only if the vectors $\vc{v}_1, \vc{v}_2, \dots, \vc{v}_n$ are linearly dependent.

We know from the definition of linear independence (Definition 5.2) that if the vectors are linearly dependent, then the equation
\begin{equation} k_1\vc{v}_1 + k_2\vc{v}_2 + \dots + k_n \vc{v}_n = \vc{0} \end{equation} (5.72)
has a solution where at least one of $k_1, k_2, \dots, k_n$ is not zero. Assume that the last value $k_n$ is such a non-zero value. (We can always reorder the indices so that this is true.) We can then move the $n$th term to the other side
\begin{equation} k_1\vc{v}_1 + k_2\vc{v}_2 + \dots + k_{n-1} \vc{v}_{n-1} = - k_n \vc{v}_n, \end{equation} (5.73)
and since $k_n$ is not zero we can divide by $-k_n$ and switch sides, which gives
\begin{equation} \vc{v}_n = -\frac{k_1}{k_n}\vc{v}_1 - \dots - \frac{k_{n-1}}{k_n} \vc{v}_{n-1}. \end{equation} (5.74)
As we can see we have now written the vector $\vc{v}_n$ as a linear combination of the other vectors.

The proof in the other direction is simpler. Assume that one of the vectors can be written as a linear combination of the others. After renumbering, assume it is vector $\vc{v}_n$. We then get
\begin{equation} \vc{v}_n = a_1\vc{v}_1 + a_2\vc{v}_2 + \dots + a_{n-1} \vc{v}_{n-1}. \end{equation} (5.75)
By moving the $\vc{v}_n$-term to the other side we get
\begin{equation} 0 = a_1\vc{v}_1 + a_2\vc{v}_2 + \dots + a_{n-1} \vc{v}_{n-1} - 1\vc{v}_n, \end{equation} (5.76)
and by selecting $k_1 = a_1, k_2 = a_2, \dots k_{n-1} = a_{n-1}, k_n = -1$, we have a solution to the equation
\begin{equation} k_1\vc{v}_1 + k_2\vc{v}_2 + \dots + k_{n-1} \vc{v}_{n-1} + k_n\vc{v}_n = \vc{0}, \end{equation} (5.77)
where not all $k_j$ are zero, since $k_n$ equals $-1$. This means that the vectors $\vc{v}_1, \vc{v}_2, \dots \vc{v}_n$ are linearly dependent.
$\square$


Example 5.7:
The two vectors $\vc{u} = (1, 2, 3)$ and $\vc{v} = (2, 4, 6)$ are linearly dependent. This is true since $\vc{v}$ can be written as a linear combination of $\vc{u}$ according to $\vc{v} = 2\vc{u}$.

Example 5.8:
The three vectors $\vc{u} = (1, 2, 3)$, $\vc{v} = (2, 4, 6)$, and $\vc{w} = (1, 0, 7)$ are linearly dependent. This is true since $\vc{v}$ can be written as a linear combination of $\vc{u}$ and $\vc{w}$ according to $\vc{v} = 2\vc{u} + 0\vc{w}$. From this one can draw the conclusion that adding a vector to a set of vectors that are already linearly dependent can never make the new set linearly independent.

Example 5.9:
The single vector $\vc{v} = (0, 0, 0)$ is linearly dependent. This follows from Definition 5.2, since $k_1\vc{v} = \vc{0}$ holds for non-zero values of $k_1$, for instance $k_1 = 5$. Conversely, a set consisting of a single nonzero vector is linearly independent, since $k_1\vc{v} = \vc{0}$ then forces $k_1 = 0$.

Example 5.10:
Examine if the three vectors $\vc{u} = (1, 2, 3)$, $\vc{v} = (2, 3, 4)$, $\vc{w} = (-1, 0, 1)$ are linearly independent.

There is no obvious way to see whether the vectors are linearly dependent. For example, none of the vectors equals $(0, 0, 0)$, and no vector is a simple multiple of another. We go back to the definition, which states that the vectors are linearly independent if and only if
\begin{equation} k_1 \begin{pmatrix}1\\ 2 \\ 3\end{pmatrix} + k_2 \begin{pmatrix}2\\ 3 \\ 4\end{pmatrix} + k_3 \begin{pmatrix}-1\\ 0 \\ 1\end{pmatrix} = \begin{pmatrix}0\\ 0 \\ 0\end{pmatrix} \end{equation} (5.78)
only has the solution $(k_1, k_2, k_3) = (0, 0, 0)$. Each of the three components now gives rise to an equation, resulting in the system
\begin{equation} \begin{cases} \begin{array}{rrrl} k_1 \bsfem & + \ 2 k_2 \bsfem & - \ k_3 \bsfem & = 0,\\ 2 k_1 \bsfem & + \ 3 k_2 \bsfem & & = 0,\\ 3 k_1 \bsfem & + \ 4 k_2 \bsfem & + \ k_3 \bsfem & = 0.\\ \end{array} \end{cases} \end{equation} (5.79)
Eliminating $k_1$ from the second and third equation gives
\begin{equation} \begin{cases} \begin{array}{rrrl} k_1 \bsfem & + \ 2 k_2 \bsfem & - \ \hid{2} k_3 \bsfem & = 0,\\ & k_2 \bsfem & - \ 2 k_3 \bsfem & = 0,\\ & 2 k_2 \bsfem & - \ 4 k_3 \bsfem & = 0.\\ \end{array} \end{cases} \end{equation} (5.80)
Now if we try to eliminate $k_2$ from the third equation we get
\begin{equation} \begin{cases} \begin{array}{rrrl} k_1 \bsfem & + \ 2 k_2 \bsfem & - \ \hid{2} k_3 \bsfem & = 0,\\ & k_2 \bsfem & - \ 2 k_3 \bsfem & = 0,\\ & & 0 \bsfem & = 0.\\ \end{array} \end{cases} \end{equation} (5.81)
The system of equations is underdetermined, so there are infinitely many solutions. Setting $k_3 = t$ and inserting in the second equation gives $k_2 = 2t$, and inserting these two values in the first equation gives
\begin{equation} k_1 + \ 4 t - \ t = 0, \end{equation} (5.82)
which can be simplified to $k_1 = -3t$. Since $t$ can be non-zero, we have that the three vectors are linearly dependent. As an example, using $t = 1$, we get $k_1 = -3$, $k_2 = 2$ and $k_3 = 1$ which can be inserted into Equation (5.78) yielding
\begin{equation} -3 \begin{pmatrix}1\\ 2 \\ 3\end{pmatrix} +2 \begin{pmatrix}2\\ 3 \\ 4\end{pmatrix} + 1 \begin{pmatrix}-1\\ 0 \\ 1\end{pmatrix} = \begin{pmatrix}0\\ 0 \\ 0\end{pmatrix}, \end{equation} (5.83)
and by moving the first two vectors over to the right-hand side we see that $\vc{w} = 3\vc{u} - 2\vc{v}$.

If we instead change the last vector to $\vc{w}' = (-1, 0, 2)$, we get the following system of equations
\begin{equation} \begin{cases} \begin{array}{rrrl} k_1 \bsfem & + \ 2 k_2 \bsfem & - \hid{2}k_3 \bsfem & = 0,\\ 2 k_1 \bsfem & + \ 3 k_2 \bsfem & & = 0,\\ 3 k_1 \bsfem & + \ 4 k_2 \bsfem & + 2k_3 \bsfem & = 0.\\ \end{array} \end{cases} \end{equation} (5.84)
Eliminating $k_1$ gives
\begin{equation} \begin{cases} \begin{array}{rrrl} k_1 \bsfem & + \ 2 k_2 \bsfem & - \ \hid{2} k_3 \bsfem & = 0,\\ & k_2 \bsfem & - \ 2 k_3 \bsfem & = 0,\\ & 2 k_2 \bsfem & - \ 5 k_3 \bsfem & = 0.\\ \end{array} \end{cases} \end{equation} (5.85)
and then eliminating $k_2$ from the third equation gives
\begin{equation} \begin{cases} \begin{array}{rrrl} k_1 \bsfem & + \ 2 k_2 \bsfem & - \ \hid{2} k_3 \bsfem & = 0,\\ & k_2 \bsfem & - \ 2 k_3 \bsfem & = 0,\\ & & k_3 \bsfem & = 0.\\ \end{array} \end{cases} \end{equation} (5.86)
The last equation gives $k_3 = 0$, which inserted into the second equation gives $k_2 = 0$, and inserting these two values into the top equation gives $k_1=0$. We thus have that $(k_1, k_2, k_3) = (0, 0, 0)$ is the only solution and the vectors $\vc{u}$, $\vc{v}$, and $\vc{w}'$ are therefore linearly independent.
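
Both cases of this example can also be checked numerically (an informal sketch, assuming NumPy): the matrix with the vectors as columns has rank 2 in the dependent case and rank 3 in the independent case.

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 3.0, 4.0])
w = np.array([-1.0, 0.0, 1.0])   # dependent case
wp = np.array([-1.0, 0.0, 2.0])  # independent case

A = np.column_stack([u, v, w])
B = np.column_stack([u, v, wp])

print(np.linalg.matrix_rank(A))       # 2, so u, v, w are linearly dependent
print(np.linalg.matrix_rank(B))       # 3, so u, v, w' are linearly independent
print(np.allclose(w, 3 * u - 2 * v))  # True, matching w = 3u - 2v
```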

Theorem 5.5: The Basis Theorem
  1. Two vectors $\vc{u}$ and $\vc{v}$ in the plane constitute a basis if and only if they are linearly independent.
  2. Three vectors $\vc{u}$, $\vc{v}$, and $\vc{w}$ in 3D space constitute a basis if and only if they are linearly independent.
  3. Three or more vectors in the plane are always linearly dependent.
  4. Four or more vectors in 3D space are always linearly dependent.

The first two statements are "if and only if" claims, so we mark the proof in the forward direction with $\Rightarrow$ and in the backward direction with $\Leftarrow$.

1 $\Rightarrow$: Assume $\vc{u}$ and $\vc{v}$ are linearly independent. Theorem 5.4 then gives that $\vc{u}$ cannot be written as a linear combination of $\vc{v}$, i.e., $\vc{u} \neq \lambda \vc{v}$ and likewise $\vc{v} \neq \lambda \vc{u}$. This means that $\vc{u}$ and $\vc{v}$ are not parallel and Theorem 2.4 then gives that they constitute a basis in the plane.

1 $\Leftarrow$: Assume $\vc{u}$ and $\vc{v}$ constitute a basis in the plane. We will now prove that this means that $\vc{u}$ and $\vc{v}$ are linearly independent. Assume they are not linearly independent, but still constitute a basis, i.e., the expression $x_u \vc{u} + x_v \vc{v}$ can reach every point in the plane. Since they are linearly dependent, Theorem 5.4 states that we can write $\vc{u} = \lambda \vc{v}$ or $\vc{v} = \lambda\vc{u}$. Assume further that we can write $\vc{u} = \lambda \vc{v}$ (this can always be arranged by switching the labels of $\vc{u}$ and $\vc{v}$). Assume we want to describe a vector $\vc{w}$ in this basis using $\vc{w} = x_u \vc{u} + x_v \vc{v}$. This equals $\vc{w} = x_u \lambda \vc{v} + x_v \vc{v} = (x_u \lambda + x_v) \vc{v} = \mu \vc{v}$, where $\mu$ is the scalar $x_u \lambda + x_v$. This means that only vectors $\vc{w}$ parallel to $\vc{v}$ can be described using $x_u \vc{u} + x_v \vc{v}$, so $\vc{u}$, $\vc{v}$ cannot be a basis for the whole plane. This contradiction shows that $\vc{u}$ and $\vc{v}$ must be linearly independent.

2 $\Rightarrow$: Theorem 2.5 states that if there is no plane that is parallel to $\vc{u}$, $\vc{v}$, and $\vc{w}$, then the three vectors constitute a basis in 3D space. Assume that there is a plane $\pi$ parallel to all three vectors. Since it is parallel to $\vc{u}$ and $\vc{v}$, it can be written as
\begin{equation} \pi: \vc{p} + s\vc{u} + t\vc{v}, \end{equation} (5.87)
where $\vc{p}$ is a point in the plane and $s$ and $t$ are scalars. If $\pi$ is parallel to $\vc{w}$, it means that if we have a point $\vc{q}$ in the plane, then the point $\vc{q} + \vc{w}$ will also be in the plane. That $\vc{q}$ is in the plane means that there are two scalars $s_1$ and $t_1$ such that
\begin{equation} \vc{q} = \vc{p} + s_1\vc{u} + t_1\vc{v}, \end{equation} (5.88)
and that $\vc{q} + \vc{w}$ is in the plane means that there are two scalars $s_2$ and $t_2$ such that
\begin{equation} \vc{q} + \vc{w} = \vc{p} + s_2\vc{u} + t_2\vc{v}. \end{equation} (5.89)
If we subtract the first equation from the second, we get
\begin{equation} \vc{w} = (s_2-s_1)\vc{u} + (t_2-t_1)\vc{v}. \end{equation} (5.90)
Thus $\vc{w}$ can be written as a linear combination of $\vc{u}$ and $\vc{v}$, and according to Theorem 5.4 this means that the vectors are not linearly independent. Hence there cannot be a plane that is parallel to all three vectors $\vc{u}$, $\vc{v}$, and $\vc{w}$ if they are linearly independent, and thus, by Theorem 2.5, three linearly independent vectors $\vc{u}$, $\vc{v}$, $\vc{w}$ constitute a basis in 3D space.

2 $\Leftarrow$: Assume $\vc{u}$, $\vc{v}$, and $\vc{w}$ constitute a basis in 3D space. We will now prove that this means that $\vc{u}$, $\vc{v}$, and $\vc{w}$ are linearly independent. Assume they are not linearly independent, but still constitute a basis, i.e., the expression $x_u \vc{u} + x_v \vc{v} + x_w \vc{w}$ can reach every point in space. Since they are linearly dependent, Theorem 5.4 says that we can write at least one of the vectors as a linear combination of the other two. Assume that this vector is $\vc{u}$ (otherwise rename the vectors), so that $\vc{u} = \lambda_v \vc{v} + \lambda_w \vc{w}$. Now we want to describe a vector $\vc{q}$ in this basis using $\vc{q} = x_u \vc{u} + x_v \vc{v} + x_w \vc{w}$. Inserting the expression for $\vc{u}$ gives $\vc{q} = x_u (\lambda_v \vc{v} + \lambda_w \vc{w}) + x_v \vc{v} + x_w \vc{w} = (x_u \lambda_v + x_v) \vc{v} + (x_u \lambda_w + x_w) \vc{w} = \mu_v \vc{v} + \mu_w \vc{w}$, where $\mu_v$ and $\mu_w$ are scalars. This means that only vectors $\vc{q}$ parallel to the plane spanned by $\vc{v}$ and $\vc{w}$ can be described using $x_u \vc{u} + x_v \vc{v} + x_w \vc{w}$, so $\vc{u}$, $\vc{v}$, $\vc{w}$ cannot be a basis for all of 3D space. This contradiction shows that they are linearly independent.

3: Assume $\vc{u}$, $\vc{v}$, and $\vc{w}$ are vectors in the plane. First study $\vc{u}$ and $\vc{v}$. If they are linearly dependent, i.e., parallel, then the entire set of vectors $\vc{u}$, $\vc{v}$, $\vc{w}$ is also linearly dependent and we are finished. However, if $\vc{u}$ and $\vc{v}$ are linearly independent, then according to (1) they constitute a basis in the plane. This means that $\vc{w}$ can be written as a linear combination
\begin{equation} \vc{w} = \lambda_1 \vc{u} + \lambda_2 \vc{v} \end{equation} (5.91)
for some $\lambda_1, \lambda_2$. This can be written as
\begin{equation} \lambda_1 \vc{u} + \lambda_2 \vc{v} + (-1)\vc{w} = \vc{0}, \end{equation} (5.92)
which, according to Definition 5.2, means that the three vectors are linearly dependent, since $k_1 = \lambda_1$, $k_2 = \lambda_2$, $k_3 = -1$ satisfy Equation (5.67).

4: The proof of 4 follows the same pattern as that of 3, using (2) instead of (1).
$\square$


5.9 Spanning


In this section, the concept of span will be introduced. It is useful for describing how much of a vector space can be reached by linear combinations of a given set of vectors. We start with the definition.

Definition 5.3: Span
The set of vectors $\{\vc{v}_1,\dots,\vc{v}_q\}$ in $\R^n$ is said to span $\R^n$, if the equation
\begin{equation} k_1\vc{v}_1 + k_2 \vc{v}_2 + \dots + k_q \vc{v}_q = \vc{u}, \end{equation} (5.93)
has at least one solution for every vector $\vc{u}$ in $\R^n$.

Example 5.11:
Examine if the three vectors $\vc{u} = (1, 2, 3)$, $\vc{v} = (2, 3, 4)$, and $\vc{w} = (-1, 0, 1)$ span $\R^3$.

We use the definition and check whether
\begin{equation} k_1 \begin{pmatrix}1\\ 2 \\ 3\end{pmatrix} + k_2 \begin{pmatrix}2\\ 3 \\ 4\end{pmatrix} + k_3 \begin{pmatrix}-1\\ 0 \\ 1\end{pmatrix} = \begin{pmatrix}u_1\\ u_2 \\ u_3\end{pmatrix} \end{equation} (5.94)
has at least one solution for every $\vc{u}= (u_1, u_2, u_3)$. Each of the three components now gives rise to an equation, resulting in the system
\begin{equation} \begin{cases} \begin{array}{rrrl} k_1 \bsfem & + \ 2 k_2 \bsfem & - \ k_3 \bsfem & = u_1,\\ 2 k_1 \bsfem & + \ 3 k_2 \bsfem & & = u_2,\\ 3 k_1 \bsfem & + \ 4 k_2 \bsfem & + \ k_3 \bsfem & = u_3.\\ \end{array} \end{cases} \end{equation} (5.95)
Eliminating $k_1$ from the second and third equation gives
\begin{equation} \begin{cases} \begin{array}{rrrl} k_1 \bsfem & + \ 2 k_2 \bsfem & - \ \hid{2} k_3 \bsfem & = u_1,\\ & k_2 \bsfem & - \ 2 k_3 \bsfem & = 2u_1-u_2,\\ & 2 k_2 \bsfem & - \ 4 k_3 \bsfem & = 3 u_1-u_3.\\ \end{array} \end{cases} \end{equation} (5.96)
Now if we try to eliminate $k_2$ from the third equation we get
\begin{equation} \begin{cases} \begin{array}{rrrl} k_1 \bsfem & + \ 2 k_2 \bsfem & - \ \hid{2} k_3 \bsfem & = u_1,\\ & k_2 \bsfem & - \ 2 k_3 \bsfem & = 2 u_1-u_2,\\ & & 0 \bsfem & = u_1-2 u_2+u_3.\\ \end{array} \end{cases} \end{equation} (5.97)
The system of equations does not have a solution for every $\vc{u} = (u_1, u_2, u_3)$. In fact, there is a solution only if the vector lies in the plane defined by $u_1-2 u_2+u_3 = 0$. So the three vectors do not span $\R^3$.

If we instead change the last vector to $\vc{w}' = (-1, 0, 2)$, we get the following system of equations
\begin{equation} \begin{cases} \begin{array}{rrrl} k_1 \bsfem & + \ 2 k_2 \bsfem & - \hid{2}k_3 \bsfem & = u_1,\\ 2 k_1 \bsfem & + \ 3 k_2 \bsfem & & = u_2,\\ 3 k_1 \bsfem & + \ 4 k_2 \bsfem & + 2k_3 \bsfem & = u_3.\\ \end{array} \end{cases} \end{equation} (5.98)
Eliminating $k_1$ gives
\begin{equation} \begin{cases} \begin{array}{rrrl} k_1 \bsfem & + \ 2 k_2 \bsfem & - \ \hid{2} k_3 \bsfem & = u_1,\\ & k_2 \bsfem & - \ 2 k_3 \bsfem & = 2 u_1-u_2,\\ & 2 k_2 \bsfem & - \ 5 k_3 \bsfem & = 3 u_1-u_3.\\ \end{array} \end{cases} \end{equation} (5.99)
and then eliminating $k_2$ from the third equation gives
\begin{equation} \begin{cases} \begin{array}{rrrl} k_1 \bsfem & + \ 2 k_2 \bsfem & - \ \hid{2} k_3 \bsfem & = u_1,\\ & k_2 \bsfem & - \ 2 k_3 \bsfem & = 2 u_1-u_2,\\ & & k_3 \bsfem & = u_1-2 u_2+u_3.\\ \end{array} \end{cases} \end{equation} (5.100)
This system is solvable for every $\vc{u} = (u_1, u_2, u_3)$, so the three vectors do span $\R^3$. The last equation gives $k_3 = u_1-2 u_2+u_3$, which inserted into the second equation gives $k_2 = 4u_1-5u_2+2u_3$, and inserting these two values into the top equation gives $k_1=-6 u_1 + 8 u_2 - 3 u_3$. We thus have a unique solution
\begin{equation} \begin{cases} \begin{array}{lrrr} k_1 & = -6 u_1 \bsfem & + \ 8 u_2 \bsfem & - \ 3 u_3 & \bsfem\\ k_2 & = 4 u_1 \bsfem & - \ 5 u_2 \bsfem & + \ 2 u_3 & \bsfem\\ k_3 & = \hid{1} u_1 \bsfem & - \ 2 u_2 \bsfem & + \ \hid{1} u_3 & \bsfem\\ \end{array} \end{cases} \end{equation} (5.101)
for every $\vc{u} = (u_1, u_2, u_3)$.
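
Numerically, three vectors spanning $\R^3$ corresponds to the matrix with the vectors as columns being invertible, and the unique solution in Equation (5.101) is exactly what the matrix inverse produces. The following is an informal sketch, assuming NumPy; it is not needed for the argument above.

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 3.0, 4.0])
wp = np.array([-1.0, 0.0, 2.0])

# Columns of B are the spanning candidates; B @ k = u_target has a solution
# for every u_target exactly when B is invertible.
B = np.column_stack([u, v, wp])
B_inv = np.linalg.inv(B)

# The rows of B_inv reproduce the coefficients in Equation (5.101),
# approximately:
# [[-6.  8. -3.]
#  [ 4. -5.  2.]
#  [ 1. -2.  1.]]
print(np.round(B_inv, 6))

# Example: the coefficients k for a specific right-hand side.
u_target = np.array([1.0, 1.0, 1.0])
k = B_inv @ u_target
print(np.allclose(B @ k, u_target))  # True
```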

Theorem 5.6: Requirements of a basis
A set of vectors $\{\vc{v}_1, \vc{v}_2, \ldots, \vc{v}_q\}$ is a basis in $\R^n$ if and only if the vectors are linearly independent and span $\R^n$.

Assume that the vectors $\vc{v}_1, \vc{v}_2, \ldots, \vc{v}_q$ are a basis of $\R^n$. Then by Definition 2.10, there is a solution to $k_1\vc{v}_1 + k_2 \vc{v}_2 + \dots + k_q \vc{v}_q = \vc{u}$ for every $\vc{u}$, so the vectors span $\R^n$ according to Definition 5.3. To see that the vectors are linearly independent, we note that the equation $\sum_{i=1}^q k_i \vc{v}_i = \vc{0}$ has the solution $k_1=0, k_2=0, \ldots, k_q=0$. According to the definition of a basis (Definition 2.10), this is the only solution, which proves that the vectors are linearly independent.

To complete the proof in the other direction, we assume that the vectors $\vc{v}_1, \vc{v}_2, \ldots, \vc{v}_q$ span $\R^n$ and are linearly independent. We need to show that the equation $\sum_{i=1}^q k_i \vc{v}_i = \vc{u}$ has exactly one solution for every $\vc{u}$. Since the vectors span $\R^n$, there is at least one solution for every $\vc{u}$. Assume that there are two solutions, $(k_1, k_2, \ldots, k_q)$ and $(k^\prime_1, k^\prime_2, \ldots, k^\prime_q)$. If we subtract the equation $\sum_{i=1}^q k^\prime_i \vc{v}_i = \vc{u}$ from $\sum_{i=1}^q k_i \vc{v}_i = \vc{u}$, we get $\sum_{i=1}^q (k_i-k^\prime_i) \vc{v}_i = \vc{0}$. Since the vectors are linearly independent (Definition 5.2), we have $k_i-k^\prime_i = 0$ for every $i$, that is, $k_i=k^\prime_i$, so the two solutions are the same.
$\square$


5.10 Change of Basis


Assume we have a basis in the plane, i.e., two basis vectors, $\vc{e}_1$ and $\vc{e}_2$, that are non-parallel. In addition, assume that we have another pair of non-parallel vectors $\hat{\vc{e}}_1, \hat{\vc{e}}_2$ that also constitute a basis. As we have seen above, these two vectors can now be described using the first two vectors as
\begin{gather} \hat{\vc{e}}_1 = x_{11} \vc{e}_1 + x_{21} \vc{e}_2,\\ \hat{\vc{e}}_2 = x_{12} \vc{e}_1 + x_{22} \vc{e}_2. \end{gather} (5.102)
Now take any vector $\vc{u}$, which can be described using the first basis with the coordinates $(u_1, u_2)$
\begin{gather} \vc{u} = u_1 \vc{e}_1 + u_2 \vc{e}_2 .\\ \end{gather} (5.103)
The same vector can also be described with the coordinates $(\hat{u}_1, \hat{u}_2)$ using the second basis according to
\begin{gather} \vc{u} = \hat{u}_1 \hat{\vc{e}}_1 + \hat{u}_2 \hat{\vc{e}}_2.\\ \end{gather} (5.104)
By inserting Equations (5.102) into Equation (5.104), we get
\begin{gather} \vc{u} = \hat{u}_1 (x_{11} \vc{e}_1 + x_{21} \vc{e}_2) + \hat{u}_2 (x_{12} \vc{e}_1 + x_{22} \vc{e}_2), \end{gather} (5.105)
which can be rearranged to
\begin{gather} \vc{u} = (x_{11} \hat{u}_1 + x_{12} \hat{u}_2 ) \vc{e}_1 + (x_{21}\hat{u}_1 + x_{22}\hat{u}_2 ) \vc{e}_2. \end{gather} (5.106)
However, since the coordinates of a vector are uniquely determined for a given basis, the number $(x_{11} \hat{u}_1 + x_{12} \hat{u}_2)$ in front of $\vc{e}_1$ in Equation (5.106) must be the same as the number $u_1$ in front of $\vc{e}_1$ in Equation (5.103), and the same goes for $u_2$. This is summarized as
\begin{gather} u_1 = x_{11} \hat{u}_1 + x_{12} \hat{u}_2, \\ u_2 = x_{21}\hat{u}_1 + x_{22}\hat{u}_2. \end{gather} (5.107)
Note that the equations relating the basis vectors in Equation (5.102) look similar to the ones relating the coordinates above, with two important distinctions: the hats are on the other side, and the rows have become columns (for instance, the row $x_{11}, x_{21}$ in Equation (5.102) appears as a column in Equation (5.107)).
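
This bookkeeping is often easiest to see with the coefficients collected in a matrix whose columns hold the old-basis coordinates of the new basis vectors; Equation (5.107) then says that the old coordinates are obtained by multiplying the new ones by this matrix, and going the other way uses its inverse. The following is an informal sketch, assuming NumPy and a concrete (hypothetical) choice of bases.

```python
import numpy as np

# Old-basis coordinates of the new basis vectors (example values):
# e_hat_1 = 1*e_1 + 1*e_2,  e_hat_2 = -1*e_1 + 2*e_2.
X = np.array([[1.0, -1.0],   # row: x_11, x_12
              [1.0,  2.0]])  # row: x_21, x_22

# Coordinates of some vector in the new (hatted) basis.
u_hat = np.array([3.0, 0.5])

# Equation (5.107): old coordinates from new coordinates.
u = X @ u_hat

# Going back: new coordinates from old ones (X is invertible since the
# new basis vectors are linearly independent).
u_hat_back = np.linalg.solve(X, u)
print(u, np.allclose(u_hat_back, u_hat))  # True
```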

We will return to change of basis in Chapter 6, which covers the topic of matrices.

