From A First Course in Linear Algebra
Version 2.01
© 2004.
Licensed under the GNU Free Documentation License.
http://linear.ups.edu/
We will now be more careful about analyzing the reduced row-echelon form
derived from the augmented matrix of a system of linear equations. In
particular, we will see how to systematically handle the situation when we
have infinitely many solutions to a system, and we will prove that every
system of linear equations has either zero, one or infinitely many solutions.
With these tools, we will be able to solve any system by a well-described
method.
The computer scientist Donald Knuth said, “Science is what we understand well enough to explain to a computer. Art is everything else.” In this section we’ll move solving systems of equations out of the realm of art and into the realm of science. We begin with a definition.
Definition CS
Consistent System
A system of linear equations is consistent if it has at least
one solution. Otherwise, the system is called inconsistent.
△
We will want to first recognize when a system is inconsistent or consistent, and in the case of consistent systems we will be able to further refine the types of solutions possible. We will do this by analyzing the reduced row-echelon form of a matrix, using the value of r, and the sets of column indices, D and F, first defined back in Definition RREF.
Use of the notation for the elements of D and F can be a bit confusing, since we have subscripted variables that are in turn equal to integers used to index the matrix. However, many questions about matrices and systems of equations can be answered once we know r, D and F. The choice of the letters D and F refers to our upcoming definition of dependent and free variables (Definition IDV). An example will help us begin to get comfortable with this aspect of reduced row-echelon form.
Example RREFN
Reduced row-echelon form notation
For the 5 × 9
matrix
in reduced row-echelon form we have
Notice that the sets
have nothing in common and together account for all of the columns of B (we say it is a partition of the set of column indices). ⊠
The number r is the single most important piece of information we can get from the reduced row-echelon form of a matrix. It is defined as the number of nonzero rows, but since each nonzero row has a leading 1, it is also the number of leading 1’s present. For each leading 1, we have a pivot column, so r is also the number of pivot columns. Repeating ourselves, r is the number of nonzero rows, the number of leading 1’s and the number of pivot columns. Across different situations, each of these interpretations of the meaning of r will be useful.
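Reading off r, D and F is mechanical once a matrix is in reduced row-echelon form. The following is a minimal Python sketch of that bookkeeping, assuming the matrix is stored as a list of rows and is already in reduced row-echelon form. The function name `rref_indices` and the small 3 × 5 matrix are hypothetical, chosen only for illustration; column indices are 1-based to match the text.

```python
def rref_indices(B):
    """Return (r, D, F) for a matrix B that is already in reduced row-echelon form.

    D lists the pivot column indices, F lists the remaining column indices,
    and r is the number of nonzero rows. Indices are 1-based, as in the text.
    """
    num_cols = len(B[0])
    D = []
    for row in B:
        # The first nonzero entry of a nonzero row is its leading 1.
        for j, entry in enumerate(row, start=1):
            if entry != 0:
                D.append(j)
                break
    F = [j for j in range(1, num_cols + 1) if j not in D]
    return len(D), D, F


# Hypothetical 3 x 5 matrix in reduced row-echelon form (illustration only).
B_small = [
    [1, 2, 0, 0, 3],
    [0, 0, 1, 0, 4],
    [0, 0, 0, 0, 0],
]
r, D, F = rref_indices(B_small)
print(r, D, F)  # 2 [1, 3] [2, 4, 5]
```

Here D and F have nothing in common and together account for all five columns, and r equals the number of entries of D.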
Before proving some theorems about the possibilities for solution sets to systems of equations, let’s analyze one particular system with an infinite solution set very carefully as an example. We’ll use this technique frequently, and shortly we’ll refine it slightly.
Archetypes I and J are both fairly large for doing computations by hand (though not impossibly large). Their properties are very similar, so we will frequently analyze the situation in Archetype I, and leave you the joy of analyzing Archetype J yourself. So work through Archetype I with the text, by hand and/or with a computer, and then tackle Archetype J yourself (and check your results with those listed). Notice too that the archetypes describing systems of equations each lists the values of r, D and F. Here we go…
Example ISSI
Describing infinite solution sets, Archetype I
Archetype I is the system of m = 4
equations in n = 7
variables.
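This system has a 4 × 8 augmented matrix that is row-equivalent to the following matrix (check this!), which is in reduced row-echelon form (the existence of this matrix is guaranteed by Theorem REMEF and its uniqueness is guaranteed by Theorem RREFU),
\[
\begin{bmatrix}
1 & 4 & 0 & 0 & 2 & 1 & -3 & 4 \\
0 & 0 & 1 & 0 & 1 & -3 & 5 & 2 \\
0 & 0 & 0 & 1 & 2 & -6 & 6 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]
So we find that r = 3 and
\[
D = \{d_1, d_2, d_3\} = \{1, 3, 4\} \qquad F = \{f_1, f_2, f_3, f_4, f_5\} = \{2, 5, 6, 7, 8\}
\]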
Let i denote one of the r = 3 non-zero rows, and then we see that we can solve the corresponding equation represented by this row for the variable $x_{d_i}$ and write it as a linear function of the variables $x_{f_1}, x_{f_2}, x_{f_3}, x_{f_4}$ (notice that $f_5 = 8$ does not reference a variable). We’ll do this now, but you can already see how the subscripts upon subscripts take some getting used to.
\begin{align*}
(i = 1) \quad x_{d_1} = x_1 &= 4 - 4x_2 - 2x_5 - x_6 + 3x_7 \\
(i = 2) \quad x_{d_2} = x_3 &= 2 - x_5 + 3x_6 - 5x_7 \\
(i = 3) \quad x_{d_3} = x_4 &= 1 - 2x_5 + 6x_6 - 6x_7
\end{align*}
Each element of the set $F = \{f_1, f_2, f_3, f_4, f_5\} = \{2, 5, 6, 7, 8\}$ is the index of a variable, except for $f_5 = 8$. We refer to $x_{f_1} = x_2$, $x_{f_2} = x_5$, $x_{f_3} = x_6$ and $x_{f_4} = x_7$ as “free” (or “independent”) variables since they are allowed to assume any possible combination of values that we can imagine and we can continue on to build a solution to the system by solving individual equations for the values of the other (“dependent”) variables.
Each element of the set $D = \{d_1, d_2, d_3\} = \{1, 3, 4\}$ is the index of a variable. We refer to the variables $x_{d_1} = x_1$, $x_{d_2} = x_3$ and $x_{d_3} = x_4$ as “dependent” variables since they depend on the independent variables. More precisely, for each possible choice of values for the independent variables we get exactly one set of values for the dependent variables that combine to form a solution of the system.
To express the solutions as a set, we write
\[
\left\{
\begin{bmatrix}
4 - 4x_2 - 2x_5 - x_6 + 3x_7 \\
x_2 \\
2 - x_5 + 3x_6 - 5x_7 \\
1 - 2x_5 + 6x_6 - 6x_7 \\
x_5 \\
x_6 \\
x_7
\end{bmatrix}
\;\middle|\; x_2, x_5, x_6, x_7 \in \mathbb{C}
\right\}
\]
The condition that $x_2, x_5, x_6, x_7 \in \mathbb{C}$ is how we specify that the variables $x_2, x_5, x_6, x_7$ are “free” to assume any possible values.
This systematic approach to solving a system of equations will allow us to create a precise description of the solution set for any consistent system once we have found the reduced row-echelon form of the augmented matrix. It will work just as well when the set of free variables is empty and we get just a single solution. And we could program a computer to do it! Now have a whack at Archetype J (Exercise TSS.C10), mimicking the discussion in this example. We’ll still be here when you get back. ⊠
Using the reduced row-echelon form of the augmented matrix of a system of equations to determine the nature of the solution set of the system is a key idea. So let’s look at one more example like the last one. But first a definition, and then the example. We mix our metaphors a bit when we call variables free versus dependent. Maybe we should call dependent variables “enslaved”?
Definition IDV
Independent and Dependent Variables
Suppose A
is the augmented matrix of a consistent system of linear equations and
B
is a row-equivalent matrix in reduced row-echelon form. Suppose
j is the index of
a column of B
that contains the leading 1 for some row (i.e. column
j is a pivot column).
Then the variable {x}_{j}
is dependent. A variable that is not dependent is called independent or free.
△
If you studied this definition carefully, you might wonder what to do if the system has n variables and column n + 1 is a pivot column? We will see shortly, by Theorem RCLS, that this never happens for a consistent system.
Example FDV
Free and dependent variables
Consider the system of five equations in five variables,
whose augmented matrix row-reduces to
\[
\begin{bmatrix}
\text{1} & -1 & 0 & 0 & 3 & 6 \\
0 & 0 & \text{1} & 0 & -2 & 1 \\
0 & 0 & 0 & \text{1} & 4 & 9 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]
There are leading 1’s in columns 1, 3 and 4, so $D = \{1, 3, 4\}$. From this we know that the variables $x_1$, $x_3$ and $x_4$ will be dependent variables, and each of the r = 3 nonzero rows of the row-reduced matrix will yield an expression for one of these three variables. The set F is all the remaining column indices, $F = \{2, 5, 6\}$. That 6 ∈ F refers to the column originating from the vector of constants, but the remaining indices in F will correspond to free variables, so $x_2$ and $x_5$ (the remaining variables) are our free variables. The resulting three equations that describe our solution set are then,
\begin{align*}
x_1 &= 6 + x_2 - 3x_5 \\
x_3 &= 1 + 2x_5 \\
x_4 &= 9 - 4x_5
\end{align*}
Make sure you understand where these three equations came from, and notice how the location of the leading 1’s determined the variables on the left-hand side of each equation. We can compactly describe the solution set as,
\[
S = \left\{
\begin{bmatrix}
6 + x_2 - 3x_5 \\
x_2 \\
1 + 2x_5 \\
9 - 4x_5 \\
x_5
\end{bmatrix}
\;\middle|\; x_2, x_5 \in \mathbb{C}
\right\}
\]
Notice how we express the freedom for $x_2$ and $x_5$: $x_2, x_5 \in \mathbb{C}$. ⊠
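The recipe used in Examples ISSI and FDV (choose values for the free variables, then read each dependent variable off its row) can be sketched in Python. This is only an illustration under stated assumptions: the matrix is the reduced row-echelon form from Example FDV, the system is taken to be consistent, and the function name `solution_from_free` is hypothetical.

```python
def solution_from_free(B, free_values):
    """Build one solution of a consistent system from its RREF augmented matrix B.

    free_values maps the 1-based index of each free variable to a chosen value.
    Returns the list [x_1, ..., x_n].
    """
    n = len(B[0]) - 1                 # number of variables
    x = [0] * (n + 1)                 # x[0] is unused so that x[j] holds x_j
    for j, value in free_values.items():
        x[j] = value
    for row in B:
        # Locate the leading 1 among the first n columns; zero rows have none.
        pivot = next((j for j in range(1, n + 1) if row[j - 1] != 0), None)
        if pivot is None:
            continue
        # Solve the row's equation for the dependent variable x_pivot.
        # Other dependent variables have coefficient 0 in this row.
        x[pivot] = row[n] - sum(row[j - 1] * x[j]
                                for j in range(1, n + 1) if j != pivot)
    return x[1:]


# Reduced row-echelon form from Example FDV.
B_fdv = [
    [1, -1, 0, 0,  3, 6],
    [0,  0, 1, 0, -2, 1],
    [0,  0, 0, 1,  4, 9],
    [0,  0, 0, 0,  0, 0],
    [0,  0, 0, 0,  0, 0],
]
print(solution_from_free(B_fdv, {2: 0, 5: 0}))  # [6, 0, 1, 9, 0]
print(solution_from_free(B_fdv, {2: 1, 5: 1}))  # [4, 1, 3, 5, 1]
```

Each different choice of values for the free variables $x_2$ and $x_5$ produces a different element of the set S.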
Sets are an important part of algebra, and we’ve seen a few already. Being comfortable with sets is important for understanding and writing proofs. If you haven’t already, pay a visit now to Section SET.
We can now use the values of m, n, r, and the independent and dependent variables to categorize the solution sets for linear systems through a sequence of theorems. Through the following sequence of proofs, you will want to consult three proof techniques. See Technique E. See Technique N. See Technique CP.
First we have an important theorem that explores the distinction between consistent and inconsistent linear systems.
Theorem RCLS
Recognizing Consistency of a Linear System
Suppose A
is the augmented matrix of a system of linear equations with
n variables.
Suppose also that B
is a row-equivalent matrix in reduced row-echelon form with
r nonzero
rows. Then the system of equations is inconsistent if and only if the leading 1 of row
r is located
in column n + 1
of B.
□
Proof ( ⇐) The first half of the proof begins with the assumption that the leading 1 of row r is located in column n + 1 of B. Then row r of B begins with n consecutive zeros, finishing with the leading 1. This is a representation of the equation 0 = 1, which is false. Since this equation is false for any collection of values we might choose for the variables, there are no solutions for the system of equations, and it is inconsistent.
( ⇒) For the second half of the proof, we wish to show that if we assume the system is inconsistent, then the final leading 1 is located in the last column. But instead of proving this directly, we’ll form the logically equivalent statement that is the contrapositive, and prove that instead (see Technique CP). Turning the implication around, and negating each portion, we arrive at the logically equivalent statement: If the leading 1 of row r is not in column n + 1, then the system of equations is consistent.
If the leading 1 for row r is located somewhere in columns 1 through n, then every preceding row’s leading 1 is also located in columns 1 through n. In other words, since the last leading 1 is not in the last column, no leading 1 for any row is in the last column, due to the echelon layout of the leading 1’s (Definition RREF). We will now construct a solution to the system by setting each dependent variable to the entry of the final column for the row with the corresponding leading 1, and setting each free variable to zero. That sentence is pretty vague, so let’s be more precise. Using our notation for the sets D and F from the reduced row-echelon form (Notation RREFA):
\begin{align*}
x_{d_i} &= [B]_{i,n+1}, & 1 \le i \le r \\
x_{f_i} &= 0, & 1 \le i \le n - r
\end{align*}
These values for the variables make the equations represented by the first r rows of B all true (convince yourself of this). Rows numbered greater than r (if any) are all zero rows, hence represent the equation 0 = 0 and are also all true. We have now identified one solution to the system represented by B, and hence a solution to the system represented by A (Theorem REMES). So we can say the system is consistent (Definition CS). ■
The beauty of this theorem being an equivalence is that we can unequivocally test to see if a system is consistent or inconsistent by looking at just a single entry of the reduced row-echelon form matrix. We could program a computer to do it!
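As a minimal sketch of such a program, the test of Theorem RCLS might be written in Python as follows. It assumes the input is the augmented matrix already in reduced row-echelon form, stored as a list of rows; the function name `is_consistent` and the two small test matrices are hypothetical.

```python
def is_consistent(B):
    """Apply the test of Theorem RCLS to B, an augmented matrix in RREF."""
    n = len(B[0]) - 1                       # number of variables
    nonzero_rows = [row for row in B if any(entry != 0 for entry in row)]
    if not nonzero_rows:
        return True                         # r = 0: every equation reads 0 = 0
    last = nonzero_rows[-1]                 # row r
    # Row r has its leading 1 in column n + 1 exactly when its first n
    # entries are all zero; in that case the row represents 0 = 1.
    return any(entry != 0 for entry in last[:n])


# Hypothetical examples with n = 2 variables.
B_consistent = [[1, 0, 2],
                [0, 1, 3]]
B_inconsistent = [[1, 0, 0],
                  [0, 0, 1]]
print(is_consistent(B_consistent))    # True
print(is_consistent(B_inconsistent))  # False
```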
Notice that for a consistent system the row-reduced augmented matrix has n + 1 ∈ F, so the largest element of F does not refer to a variable. Also, for an inconsistent system, n + 1 ∈ D, and it then does not make much sense to discuss whether or not variables are free or dependent since there is no solution. Take a look back at Definition IDV and see why we did not need to consider the possibility of referencing $x_{n+1}$ as a dependent variable.
With the characterization of Theorem RCLS, we can explore the relationships between r and n in light of the consistency of a system of equations. First, a situation where we can quickly conclude the inconsistency of a system.
Theorem ISRN
Inconsistent Systems, r
and n
Suppose A
is the augmented matrix of a system of linear equations in
n variables.
Suppose also that B
is a row-equivalent matrix in reduced row-echelon form with
r rows that are not
completely zeros. If r = n + 1,
then the system of equations is inconsistent.
□
Proof If r = n + 1, then $D = \{1, 2, 3, \ldots, n, n + 1\}$ and every column of B contains a leading 1 and is a pivot column. In particular, the entry of column n + 1 for row r = n + 1 is a leading 1. Theorem RCLS then says that the system is inconsistent. ■
Do not confuse Theorem ISRN with its converse! Go check out Technique CV right now.
Next, if a system is consistent, we can distinguish between a unique solution and infinitely many solutions, and furthermore, we recognize that these are the only two possibilities.
Theorem CSRN
Consistent Systems, r
and n
Suppose A
is the augmented matrix of a consistent system of linear equations with
n variables.
Suppose also that B
is a row-equivalent matrix in reduced row-echelon form with
r rows that are not
zero rows. Then r ≤ n. If
r = n, then the system has a
unique solution, and if r < n,
then the system has infinitely many solutions.
□
Proof This theorem contains three implications that we must establish. Notice first that B has n + 1 columns, so there can be at most n + 1 pivot columns, i.e. r ≤ n + 1. If r = n + 1, then Theorem ISRN tells us that the system is inconsistent, contrary to our hypothesis. We are left with r ≤ n.
When r = n, we find n − r = 0 free variables (i.e. $F = \{n + 1\}$) and any solution must equal the unique solution given by the first n entries of column n + 1 of B.
When r < n, we have n − r > 0 free variables, corresponding to columns of B without a leading 1, excepting the final column, which also does not contain a leading 1 by Theorem RCLS. By varying the values of the free variables suitably, we can demonstrate infinitely many solutions. ■
The next theorem simply states a conclusion from the final paragraph of the previous proof, allowing us to state explicitly the number of free variables for a consistent system.
Theorem FVCS
Free Variables for Consistent Systems
Suppose A
is the augmented matrix of a consistent system of linear equations with
n variables.
Suppose also that B
is a row-equivalent matrix in reduced row-echelon form with
r rows
that are not completely zeros. Then the solution set can be described with
n − r free
variables. □
Proof See the proof of Theorem CSRN. ■
Example CFV
Counting free variables
For each archetype that is a system of equations, the values of
n and
r are
listed. Many also contain a few sample solutions. We can use this information
profitably, as illustrated by four examples.
We have accomplished a lot so far, but our main goal has been the following theorem, which is now very simple to prove. The proof is so simple that we ought to call it a corollary, but the result is important enough that it deserves to be called a theorem. (See Technique LC.) Notice that this theorem was presaged first by Example TTS and further foreshadowed by other examples.
Theorem PSSLS
Possible Solution Sets for Linear Systems
A system of linear equations has no solutions, a unique solution or infinitely many
solutions. □
Proof By its definition, a system is either inconsistent or consistent (Definition CS). The first case describes systems with no solutions. For consistent systems, we have the remaining two possibilities as guaranteed by, and described in, Theorem CSRN. ■
Here is a diagram that consolidates several of our theorems from this section, and which is of practical use when you analyze systems of equations.
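The same decision process can also be sketched in code. The Python function below is an illustrative assumption rather than part of the text: it takes an augmented matrix already in reduced row-echelon form and applies Theorem RCLS, then Theorem CSRN. The name `classify` and the two small matrices are hypothetical; the third matrix is the one from Example FDV.

```python
def classify(B):
    """Classify the solution set of the system whose RREF augmented matrix is B."""
    n = len(B[0]) - 1                  # number of variables
    pivots = []                        # 1-based pivot column indices (the set D)
    for row in B:
        for j, entry in enumerate(row, start=1):
            if entry != 0:
                pivots.append(j)
                break
    r = len(pivots)
    if r > 0 and pivots[-1] == n + 1:
        return "no solutions"          # Theorem RCLS: leading 1 in column n + 1
    if r == n:
        return "unique solution"       # Theorem CSRN with r = n
    return "infinitely many solutions" # Theorem CSRN with r < n


print(classify([[1, 0, 2], [0, 1, 3]]))   # unique solution
print(classify([[1, 0, 0], [0, 0, 1]]))   # no solutions
print(classify([[1, -1, 0, 0,  3, 6],
                [0,  0, 1, 0, -2, 1],
                [0,  0, 0, 1,  4, 9],
                [0,  0, 0, 0,  0, 0],
                [0,  0, 0, 0,  0, 0]]))   # infinitely many solutions
```

In the last case n − r = 5 − 3 = 2, matching the two free variables found in Example FDV (Theorem FVCS).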
We have one more theorem to round out our set of tools for determining solution sets to systems of linear equations.
Theorem CMVEI
Consistent, More Variables than Equations, Infinite solutions
Suppose a consistent system of linear equations has
m equations in
n variables. If
n > m, then the system has
infinitely many solutions. □
Proof Suppose that the augmented matrix of the system of equations is row-equivalent to B, a matrix in reduced row-echelon form with r nonzero rows. Because B has m rows in total, the number of nonzero rows cannot exceed m. In other words, r ≤ m. Follow this with the hypothesis that n > m and we find that the system has a solution set described by at least one free variable because
n − r ≥ n − m > 0.
A consistent system with free variables will have an infinite number of solutions, as given by Theorem CSRN. ■
Notice that to use this theorem we need only know that the system is consistent, together with the values of m and n. We do not necessarily have to compute a row-equivalent reduced row-echelon form matrix, even though we discussed such a matrix in the proof. This is the substance of the following example.
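To make concrete how little information is needed, here is a small Python sketch that lists which kinds of solution set remain possible when only m, n and perhaps the consistency of the system are known. It encodes only Definition CS, Theorem PSSLS and Theorem CMVEI; the function name `possible_solution_sets` and its string labels are hypothetical.

```python
def possible_solution_sets(m, n, consistent=None):
    """Return the outcomes not ruled out for a system of m equations in n variables.

    consistent is True, False, or None when consistency is unknown.
    """
    if consistent is False:
        return {"none"}
    outcomes = {"none", "unique", "infinite"}   # Theorem PSSLS
    if consistent is True:
        outcomes.discard("none")                # Definition CS
    if n > m:
        # Theorem CMVEI: a consistent system with n > m has infinitely many
        # solutions, so a unique solution is impossible.
        outcomes.discard("unique")
    return outcomes


print(possible_solution_sets(3, 4, consistent=True))  # {'infinite'}
print(sorted(possible_solution_sets(5, 9)))           # ['infinite', 'none']
```

When n ≤ m the theorem is silent, so a consistent system of that shape could still have either a unique solution or infinitely many.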
Example OSGMD
One solution gives many, Archetype D
Archetype D is the system of m = 3
equations in n = 4
variables,
and the solution $x_1 = 0$, $x_2 = 1$, $x_3 = 2$, $x_4 = 1$ can be checked easily by substitution. Having been handed this solution, we know the system is consistent. This, together with n > m, allows us to apply Theorem CMVEI and conclude that the system has infinitely many solutions. ⊠
These theorems give us the procedures and implications that allow us to completely solve any system of linear equations. The main computational tool is using row operations to convert an augmented matrix into reduced row-echelon form. Here’s a broad outline of how we would instruct a computer to solve a system of linear equations.
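In the spirit of such an outline, one possible rendering in Python follows. It is a sketch under stated assumptions, not a definitive implementation: the function names `rref` and `solve` are hypothetical, the sample systems are made up for illustration, and exact rational arithmetic (`fractions.Fraction` from the standard library) is used so that row operations introduce no round-off.

```python
from fractions import Fraction


def rref(A):
    """Return the reduced row-echelon form of A, using exact rational arithmetic."""
    B = [[Fraction(entry) for entry in row] for row in A]
    m, cols = len(B), len(B[0])
    r = 0                                        # pivot rows found so far
    for j in range(cols):
        # Find a row at or below row r with a nonzero entry in column j.
        pivot_row = next((i for i in range(r, m) if B[i][j] != 0), None)
        if pivot_row is None:
            continue
        B[r], B[pivot_row] = B[pivot_row], B[r]  # swap rows
        lead = B[r][j]
        B[r] = [entry / lead for entry in B[r]]  # scale to create a leading 1
        for i in range(m):
            if i != r and B[i][j] != 0:
                factor = B[i][j]                 # clear the rest of column j
                B[i] = [a - factor * b for a, b in zip(B[i], B[r])]
        r += 1
        if r == m:
            break
    return B


def solve(A):
    """Return None if the system is inconsistent, else (solution, F).

    solution sets every free variable to zero; F lists the 1-based indices
    of the free variables, so there are n - r of them (Theorem FVCS).
    """
    B = rref(A)
    n = len(B[0]) - 1
    D = [next(j for j, e in enumerate(row, start=1) if e != 0)
         for row in B if any(row)]
    if D and D[-1] == n + 1:
        return None                              # Theorem RCLS
    F = [j for j in range(1, n + 1) if j not in D]
    solution = [Fraction(0)] * n
    for i, d in enumerate(D):
        solution[d - 1] = B[i][n]                # x_{d_i} = [B]_{i,n+1}
    return solution, F


# Hypothetical systems, each given as an augmented matrix.
unique = solve([[1, 1, 3], [1, -1, 1]])      # x1 + x2 = 3, x1 - x2 = 1
none = solve([[1, 1, 1], [1, 1, 2]])         # x1 + x2 = 1, x1 + x2 = 2
many = solve([[1, 1, 2]])                    # x1 + x2 = 2
```

For these inputs, `unique` holds the solution (2, 1) with no free variables, `none` is `None`, and `many` holds the solution (2, 0) together with the single free variable index 2.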
The above makes it all sound a bit simpler than it really is. In practice, row operations employ division (usually to get a leading entry of a row to convert to a leading 1) and that will introduce round-off errors. Entries that should be zero sometimes end up being very, very small nonzero entries, or small entries lead to overflow errors when used as divisors. A variety of strategies can be employed to minimize these sorts of errors, and this is one of the main topics in the important subject known as numerical linear algebra.
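A tiny Python illustration of the round-off phenomenon, using only the standard library: a quantity that is exactly zero in rational arithmetic comes out as a very small nonzero number in floating point. The variable names are hypothetical, and comparing against a tolerance is shown only as one common remedy, not as the method of any particular library.

```python
from fractions import Fraction

# Exactly zero on paper: 1/10 + 2/10 - 3/10.
residual_float = 0.1 + 0.2 - 0.3
residual_exact = Fraction(1, 10) + Fraction(2, 10) - Fraction(3, 10)

print(residual_float == 0)   # False: a tiny nonzero value survives
print(residual_exact == 0)   # True: rational arithmetic is exact

# One common remedy: treat entries below a small tolerance as zero.
TOLERANCE = 1e-12            # hypothetical threshold, chosen for illustration
print(abs(residual_float) < TOLERANCE)   # True
```

A row-reduction routine working in floating point has to make a decision of this kind every time it asks whether an entry is zero, since that answer determines where the leading 1’s fall.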
Solving a linear system is such a fundamental problem in so many areas of mathematics, and its applications, that any computational device worth using for linear algebra will have a built-in routine to do just that. See: Computation LS.MMA, Computation LS.SAGE. In this section we’ve gained a foolproof procedure for solving any system of linear equations, no matter how many equations or variables. We also have a handful of theorems that allow us to determine partial information about a solution set without actually constructing the whole set itself. Donald Knuth would be proud.
C10 In the spirit of Example ISSI, describe the infinite solution set for
Archetype J.
Contributed by Robert Beezer
M45 Prove that Archetype J has infinitely many solutions without row-reducing
the augmented matrix.
Contributed by Robert Beezer Solution [168]
For Exercises M51–M57 say as much as possible about each system’s
solution set. Be sure to make it clear which theorems you are using to reach your
conclusions.
M51 A consistent system of 8 equations in 6 variables.
Contributed by Robert Beezer Solution [168]
M52 A consistent system of 6 equations in 8 variables.
Contributed by Robert Beezer Solution [168]
M53 A system of 5 equations in 9 variables.
Contributed by Robert Beezer Solution [168]
M54 A system with 12 equations in 35 variables.
Contributed by Robert Beezer Solution [168]
M56 A system with 6 equations in 12 variables.
Contributed by Robert Beezer Solution [168]
M57 A system with 8 equations and 6 variables. The reduced row-echelon form
of the augmented matrix of the system has 7 pivot columns.
Contributed by Robert Beezer Solution [169]
M60 Without doing any computations, and without examining any solutions,
say as much as possible about the form of the solution set for each archetype that
is a system of equations.
Archetype A
Archetype B
Archetype C
Archetype D
Archetype E
Archetype F
Archetype G
Archetype H
Archetype I
Archetype J
Contributed by Robert Beezer
T10 An inconsistent system may have
r > n. If we
try (incorrectly!) to apply Theorem FVCS to such a system, how many free
variables would we discover?
Contributed by Robert Beezer Solution [169]
T40 Suppose that the coefficient matrix of a consistent system of linear
equations has two columns that are identical. Prove that the system has infinitely
many solutions.
Contributed by Robert Beezer Solution [169]
T41 Consider the system of linear equations $\mathcal{LS}(A, b)$,
and suppose that every element of the vector of constants b is a
common multiple of the corresponding element of a certain column of A.
More precisely, there is a complex number α, and a column index j,
such that $[b]_i = \alpha [A]_{ij}$ for all i.
Prove that the system is consistent.
Contributed by Robert Beezer Solution [169]
M45 Contributed by Robert Beezer Statement [166]
Demonstrate that the system is consistent by verifying any
one of the four sample solutions provided. Then because
n = 9 > 6 = m,
Theorem CMVEI gives us the conclusion that the system has infinitely many
solutions.
Notice that we only know the system will have at least 9 − 6 = 3 free variables, but very well could have more. We do not know that r = 6, only that r ≤ 6.
M51 Contributed by Robert Beezer Statement [166]
Consistent means there is at least one solution (Definition CS). It will have either
a unique solution or infinitely many solutions (Theorem PSSLS).
M52 Contributed by Robert Beezer Statement [166]
With 6 rows in the augmented matrix, the row-reduced version will have
r ≤ 6.
Since the system is consistent, apply Theorem CSRN to see that
n − r ≥ 2
implies infinitely many solutions.
M53 Contributed by Robert Beezer Statement [166]
The system could be inconsistent. If it is consistent, then because it has more
variables than equations Theorem CMVEI implies that there would be infinitely
many solutions. So, of all the possibilities in Theorem PSSLS, only the case of a
unique solution can be ruled out.
M54 Contributed by Robert Beezer Statement [166]
The system could be inconsistent. If it is consistent, then Theorem CMVEI tells
us the solution set will be infinite. So we can be certain that there is not a unique
solution.
M56 Contributed by Robert Beezer Statement [166]
The system could be inconsistent. If it is consistent, and since
12 > 6, then
Theorem CMVEI says we will have infinitely many solutions. So there are two
possibilities. Theorem PSSLS allows us to state equivalently that a unique solution
is an impossibility.
M57 Contributed by Robert Beezer Statement [166]
7 pivot columns implies that there are
r = 7
nonzero rows (so row 8 is all zeros in the reduced row-echelon form). Then
n + 1 = 6 + 1 = 7 = r and
Theorem ISRN allows us to conclude that the system is inconsistent.
T10 Contributed by Robert Beezer Statement [167]
Theorem FVCS will indicate a negative number of free variables, but we can say even more.
If r > n, then the only
possibility is that r = n + 1,
and then we compute n − r = n − (n + 1) = −1
free variables.
T40 Contributed by Robert Beezer Statement [167]
Since the system is consistent, we know there is either a unique solution, or
infinitely many solutions (Theorem PSSLS). If we perform row operations
(Definition RO) on the augmented matrix of the system, the two equal columns
of the coefficient matrix will suffer the same fate, and remain equal in the final
reduced row-echelon form. Suppose both of these columns are pivot columns
(Definition RREF). Then there is a single row containing the two leading
1’s of the two pivot columns, a violation of reduced row-echelon form
(Definition RREF). So at least one of these columns is not a pivot column, and
the column index indicates a free variable in the description of the solution set
(Definition IDV). With a free variable, we arrive at an infinite solution set
(Theorem FVCS).
T41 Contributed by Robert Beezer Statement [167]
The condition about the multiple of the column of constants will allow
you to show that the following values form a solution of the system
$\mathcal{LS}(A, b)$,
\[
x_j = \alpha, \qquad x_k = 0 \text{ for all } k \neq j.
\]
With one solution of the system known, we can say the system is consistent (Definition CS).
A more involved proof can be built using Theorem RCLS. Begin by proving that each of the three row operations (Definition RO) will convert the augmented matrix of the system into another matrix where each entry of the last column is α times the entry of the same row in column j. In other words, the “column multiple property” is preserved under row operations. These proofs will get successively more involved as you work through the three operations.
Now construct a proof by contradiction (Technique CD), by supposing that the system is inconsistent. Then the last column of the reduced row-echelon form of the augmented matrix is a pivot column (Theorem RCLS). Then column j must have a zero in the same row as the leading 1 of the final column. But the “column multiple property” says that the entry of the final column in that row, the leading 1, equals α times the entry of column j in that row, which is α · 0 = 0. So 1 = 0, which is false. This contradiction shows that the final column cannot be a pivot column, and therefore the system cannot be inconsistent.