From A First Course in Linear Algebra
Version 2.12
© 2004.
Licensed under the GNU Free Documentation License.
http://linear.ups.edu/
This section is in draft form
Needs examples near beginning
We have seen in Section IS that generalized eigenspaces are invariant subspaces that in every instance have led to a direct sum decomposition of the domain of the associated linear transformation. This allows us to create a block diagonal matrix representation (Example ISMR4, Example ISMR6). We also know from Theorem RGEN that the restriction of a linear transformation to a generalized eigenspace is almost a nilpotent linear transformation. Of course, we understand nilpotent linear transformations very well from Section NLT and we have carefully determined a nice matrix representation for them.
So here is the game plan for the final push. Prove that the domain of a linear transformation always decomposes into a direct sum of generalized eigenspaces. We have unravelled Theorem RGEN at Theorem MRRGE so that we can formulate the matrix representations of the restrictions on the generalized eigenspaces using our storehouse of results about nilpotent linear transformations. Arrive at a matrix representation of any linear transformation that is block diagonal with each block being a Jordan block.
In Theorem UTMR we were able to show that any linear transformation from V to V has an upper triangular matrix representation (Definition UTM). We will now show that we can improve on the basis yielding this representation by massaging the basis so that the matrix representation is also block diagonal. The subspaces associated with each block will be generalized eigenspaces, so the most general result will be a decomposition of the domain of a linear transformation into a direct sum of generalized eigenspaces.
Theorem GESD
Generalized Eigenspace Decomposition
Suppose that T : V \mathrel{↦}V
is a linear transformation with distinct eigenvalues
{λ}_{1},\kern 1.95872pt {λ}_{2},\kern 1.95872pt {λ}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {λ}_{m}.
Then
Proof Suppose that \mathop{ dim}\nolimits \left (V \right ) = n and the n (not necessarily distinct) eigenvalues of T are {ρ}_{1},\kern 1.95872pt {ρ}_{2},\kern 1.95872pt {ρ}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {ρ}_{n}. We begin with a basis of V that yields an upper triangular matrix representation, as guaranteed by Theorem UTMR, B = \left \{{x}_{1},\kern 1.95872pt {x}_{2},\kern 1.95872pt {x}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {x}_{n}\right \}. Since the matrix representation is upper triangular, and the eigenvalues of the linear transformation are the diagonal elements we can choose this basis so that there are then scalars {a}_{ij}, 1 ≤ j ≤ n, 1 ≤ i ≤ j − 1 such that
We now define a new basis for V which is just a slight variation in the basis B. Choose any k and ℓ such that 1 ≤ k < ℓ ≤ n and {ρ}_{k}\mathrel{≠}{ρ}_{ℓ}. Define the scalar α = {a}_{kl}∕\left ({ρ}_{ℓ} − {ρ}_{k}\right ). The new basis is C = \left \{{y}_{1},\kern 1.95872pt {y}_{2},\kern 1.95872pt {y}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {y}_{n}\right \} where
We now compute the values of the linear transformation T with inputs from C, noting carefully the changed scalars in the linear combinations of C describing the outputs. These changes will translate to minor changes in the matrix representation built using the basis C. There are three cases to consider, depending on which column of the matrix representation we are examining. First, assume j < ℓ. Then
That seems a bit pointless. The first ℓ − 1 columns of the matrix representations of T relative to B and C are identical. OK, if that was too easy, here’s the main act. Assume j = ℓ. Then
So how different are the matrix representations relative to B and C in column ℓ? For i > k, the coefficient of {y}_{i} is {a}_{ij}, as in the representation relative to B. It is a different story for i ≤ k, where the coefficients of {y}_{i} may be very different. We are especially interested in the coefficient of {y}_{k}. In fact, this whole first part of this proof is about this particular entry of the matrix representation. The coefficient of {y}_{k} is
If the definition of α was a mystery, then no more. In the matrix representation of T relative to C, the entry in column ℓ, row k is a zero. Nice. The only price we pay is that other entries in column ℓ, specifically rows 1 through k − 1, may also change in a way we can’t control.
One more case to consider. Assume j > ℓ. Then
As before, we ask: how different are the matrix representations relative to B and C in column j? Only {y}_{k} has a coefficient different from the corresponding coefficient when the basis is B. So in the matrix representations, the only entries to change are in row k, for columns ℓ + 1 through n.
What have we accomplished? With a change of basis, we can place a zero in a desired entry (row k, column ℓ) of the matrix representation, leaving most of the entries untouched. The only entries to possibly change are above the new zero entry, or to the right of the new zero entry. Suppose we repeat this procedure, starting by “zeroing out” the entry above the diagonal in the second column and first row. Then we move right to the third column, and zero out the element just above the diagonal in the second row. Next we zero out the element in the third column and first row. Then tackle the fourth column, work upwards from the diagonal, zeroing out elements as we go. Entries above, and to the right will repeatedly change, but newly created zeros will never get wrecked, since they are below, or just to the left of the entry we are working on. Similarly the values on the diagonal do not change either. This entire argument can be retooled in the language of change-of-basis matrices and similarity transformations, and this is the approach taken by Noble in his Applied Linear Algebra. It is interesting to concoct the change-of-basis matrix between the matrices B and C and compute the inverse.
Perhaps you have noticed that we have to be just a bit more careful than the previous paragraph suggests. The definition of α has a denominator that cannot be zero, which restricts our maneuvers to zeroing out entries in row k and column ℓ only when {ρ}_{k}\mathrel{≠}{ρ}_{ℓ}. So we do not necessarily arrive at a diagonal matrix. More carefully we can write
where the {b}_{ij} are our new coefficients after repeated changes, the {y}_{j} are the new basis vectors, and the condition “i : \kern 1.95872pt {ρ}_{i} = {ρ}_{j}” means that we only have terms in the sum involving vectors whose final coefficients are identical diagonal values (the eigenvalues). Now reorder the basis vectors carefully. Group together vectors that have equal diagonal entries in the matrix representation, but within each group preserve the order of the precursor basis. This grouping will create a block diagonal structure for the matrix representation, while otherwise preserving the order of the basis will retain the upper triangular form of the representation. So we can arrive at a basis that yields a matrix representation that is upper triangular and block diagonal, with the diagonal entries of each block all equal to a common eigenvalue of the linear transformation.
More carefully, employing the distinct eigenvalues of T, {λ}_{i}, 1 ≤ i ≤ m, we can assert there is a set of basis vectors for V , {u}_{ij}, 1 ≤ i ≤ m, 1 ≤ j ≤ {α}_{T }\left ({λ}_{i}\right ), such that
So the subspace {U}_{i} = \left \langle \left \{{u}_{ij}\mathrel{∣}1 ≤ j ≤ {α}_{T }\left ({λ}_{i}\right )\right \}\right \rangle , 1 ≤ i ≤ m is an invariant subspace of V relative to T and the restriction T{|}_{{U}_{i}} has an upper triangular matrix representation relative to the basis \left \{{u}_{ij}\mathrel{∣}1 ≤ j ≤ {α}_{T }\left ({λ}_{i}\right )\right \} where the diagonal entries are all equal to {λ}_{i}. Notice too that with this definition,
Whew. This is a good place to take a break, grab a cup of coffee, use the toilet, or go for a short stroll, before we show that {U}_{i} is a subspace of the generalized eigenspace {G}_{T }\left ({λ}_{i}\right ). This will follow if we can prove that each of the basis vectors for {U}_{i} is a generalized eigenvector of T for {λ}_{i} (Definition GEV). We need some power of T − {λ}_{i}{I}_{V } that takes {u}_{ij} to the zero vector. We prove by induction on j (Technique I) the claim that {\left (T − {λ}_{i}{I}_{V }\right )}^{j}\left ({u}_{ ij}\right ) = 0. For j = 1 we have,
For the induction step, assume that if k < j, then {\left (T − {λ}_{i}{I}_{V }\right )}^{k} takes {u}_{ik} to the zero vector. Then
This completes the induction step. Since every vector of the spanning set for {U}_{i} is an element of the subspace {G}_{T }\left ({λ}_{i}\right ), Property AC and Property SC allow us to conclude that {U}_{i} ⊆{G}_{T }\left ({λ}_{i}\right ). Then by Definition S, {U}_{i} is a subspace of {G}_{T }\left ({λ}_{i}\right ). Notice that this inductive proof could be interpreted to say that every element of {U}_{i} is a generalized eigenvector of T for {λ}_{i}, and the algebraic multiplicity of {λ}_{i} is a sufficiently high power to demonstrate this via the definition for each vector.
We are now prepared for our final argument in this long proof. We wish to establish that the dimension of the subspace {G}_{T }\left ({λ}_{i}\right ) is the algebraic multiplicity of {λ}_{i}. This will be enough to show that {U}_{i} and {G}_{T }\left ({λ}_{i}\right ) are equal, and will finally provide the desired direct sum decomposition.
We will prove by induction (Technique I) the following claim. Suppose that T : V \mathrel{↦}V is a linear transformation and B is a basis for V that provides an upper triangular matrix representation of T. The number of times any eigenvalue λ occurs on the diagonal of the representation is greater than or equal to the dimension of the generalized eigenspace {G}_{T }\left (λ\right ).
We will use the symbol m for the dimension of V so as to avoid confusion with our notation for the nullity. So \mathop{ dim}\nolimits V = m and our proof will proceed by induction on m. Use the notation {#}_{T }(λ) to count the number of times λ occurs on the diagonal of a matrix representation of T. We want to show that
For the base case, \mathop{ dim}\nolimits V = 1. Every matrix representation of T is an upper triangular matrix with the lone eigenvalue of T, λ, as the single diagonal entry. So {#}_{T }(λ) = 1. The generalized eigenspace of λ is not trivial (since by Theorem GEK it equals the regular eigenspace), so it cannot be a subspace of dimension zero, and thus \mathop{ dim}\nolimits \left ({G}_{T }\left (λ\right )\right ) = 1.
Now for the induction step, assume the claim is true for any linear transformation defined on a vector space with dimension m − 1 or less. Suppose that B = \left \{{v}_{1},\kern 1.95872pt {v}_{2},\kern 1.95872pt {v}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {v}_{m}\right \} is a basis for V that yields a diagonal matrix representation for T with diagonal entries {λ}_{1},\kern 1.95872pt {λ}_{2},\kern 1.95872pt {λ}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {λ}_{m}. Then U = \left \langle \left \{{v}_{1},\kern 1.95872pt {v}_{2},\kern 1.95872pt {v}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {v}_{m−1}\right \}\right \rangle is a subspace of V that is invariant relative to T. The restriction T{|}_{U}: U\mathrel{↦}U is then a linear transformation defined on U, a vector space of dimension m − 1. A matrix representation of T{|}_{U} relative to the basis C = \left \{{v}_{1},\kern 1.95872pt {v}_{2},\kern 1.95872pt {v}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {v}_{m−1}\right \} will be an upper triangular matrix with diagonal entries {λ}_{1},\kern 1.95872pt {λ}_{2},\kern 1.95872pt {λ}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {λ}_{m−1}. We can therefore apply the induction hypothesis to T{|}_{U} and its representation relative to C.
Suppose that λ is any eigenvalue of T. Then suppose that v ∈K\kern -1.95872pt \left ({\left (T − λ{I}_{V }\right )}^{m}\right ). As an element of V , we can write v as a linear combination of the basis elements of B, or more compactly, there is a vector u ∈ U and a scalar α such that v = u + α{v}_{m}. Then,
The final expression in this string of equalities is an element of U since U is invariant relative to both T and {I}_{V }. The expression at the beginning is a scalar multiple of {v}_{m}, and as such cannot be a nonzero element of U without violating the linear independence of B. So
The vector {v}_{m} is nonzero since B is linearly independent, so Theorem SMEZV tells us that α{\left ({λ}_{m} − λ\right )}^{m} = 0. From the properties of scalar multiplication, we are confronted with two possibilities.
Our first case is that λ\mathrel{≠}{λ}_{m}. Notice then that λ occurs the same number of times along the diagonal in the representations of T{|}_{U} and T. Now α = 0 and v = u + 0{v}_{m} = u. Since v was chosen as an arbitrary element of K\kern -1.95872pt \left ({\left (T − λ{I}_{V }\right )}^{m}\right ), Definition SSET says that K\kern -1.95872pt \left ({\left (T − λ{I}_{V }\right )}^{m}\right ) ⊆ U. It is always the case that K\kern -1.95872pt \left ({\left (T{|}_{U} − λ{I}_{U}\right )}^{m}\right ) ⊆K\kern -1.95872pt \left ({\left (T − λ{I}_{ V }\right )}^{m}\right ). However, we can also see that in this case, the opposite set inclusion is true as well. By Definition SE we have K\kern -1.95872pt \left ({\left (T{|}_{U} − λ{I}_{U}\right )}^{m}\right ) = K\kern -1.95872pt \left ({\left (T − λ{I}_{ V }\right )}^{m}\right ). Then
The second case is that λ = {λ}_{m}. Notice then that λ occurs one more time along the diagonal in the representation of T compared to the representation of T{|}_{U}. Then
So u ∈K\kern -1.95872pt \left (T{|}_{U} − λ{I}_{U}\right ). The vector v is an arbitrary member of K\kern -1.95872pt \left ({\left (T − λ{I}_{V }\right )}^{m}\right ) and is also equal to an element of K\kern -1.95872pt \left (T{|}_{U} − λ{I}_{U}\right ) (u) plus a scalar multiple of the vector {v}_{m}. This observation yields
Now count eigenvalues on the diagonal,
In Theorem UTMR we constructed an upper triangular matrix representation of T where each eigenvalue occurred {α}_{T }\left (λ\right ) times on the diagonal. So
Thus, \mathop{ dim}\nolimits \left ({G}_{T }\left ({λ}_{i}\right )\right ) = {α}_{T }\left ({λ}_{i}\right ) and by Theorem EDYES, {U}_{i} = {G}_{T }\left ({λ}_{i}\right ) and we can write
Besides a nice decomposition into invariant subspaces, this proof has a bonus for us.
Theorem DGES
Dimension of Generalized Eigenspaces
Suppose T : V \mathrel{↦}V is a linear
transformation with eigenvalue λ.
Then the dimension of the generalized eigenspace for
λ is the algebraic
multiplicity of λ,
\mathop{ dim}\nolimits \left ({G}_{T }\left ({λ}_{i}\right )\right ) = {α}_{T }\left ({λ}_{i}\right ).
□
Proof At the very end of the proof of Theorem GESD we obtain the inequalities
which establishes the desired equality. ■
Now we are in a position to define what we (and others) regard as an especially nice matrix representation. The word “canonical” has at its root, the word “canon,” which has various meanings. One is the set of laws established by a church council. Another is a set of writings that are authentic, important or representative. Here we take to to mean the accepted, or best, representative among a variety of choices. Every linear transformation admits a variety of representations, and will declare one as the best. Hopefully you will agree.
Definition JCF
Jordan Canonical Form
A square matrix is in Jordan canonical form if it meets the following
requirements:
Theorem JCFLT
Jordan Canonical Form for a Linear Transformation
Suppose T : V \mathrel{↦}V
is a linear transformation. Then there is a basis
B for
V such that the matrix
representation of T
with the following properties:
Proof This theorem is really just the consequence of applying to T, consecutively Theorem GESD, Theorem MRRGE and Theorem CFNLT.
Theorem GESD gives us a decomposition of V into generalized eigenspaces, one for each distinct eigenvalue. Since these generalized eigenspaces ar invariant relative to T, this provides a block diagonal matrix representation where each block is the matrix representation of the restriction of T to the generalized eigenspace.
Restricting T to a generalized eigenspace results in a “nearly nilpotent” linear transformation, as stated more precisely in Theorem RGEN. We unravel Theorem RGEN in the proof of Theorem MRRGE so that we can apply Theorem CFNLT about representations of nilpotent linear transformations.
We know the dimension of a generalized eigenspace is the algebraic multiplicity of the eigenvalue (Theorem DGES), so the blocks associated with the generalized eigenspaces are square with a size equal to the algebraic multiplicity. In refining the basis for this block, and producing Jordan blocks the results of Theorem CFNLT apply. The total number of blocks will be the nullity of T{|}_{{G}_{T}\left (λ\right )} − λ{I}_{{G}_{T}\left (λ\right )}, which is the geometric multiplicity of λ as an eigenvalue of T (Definition GME). The largest of the Jordan blocks will have size equal to the index of the nilpotent linear transformation T{|}_{{G}_{T}\left (λ\right )} − λ{I}_{{G}_{T}\left (λ\right )}, which is exactly the definition of the index of the eigenvalue λ (Definition IE). ■
Before we do some examples of this result, notice how close Jordan canonical form is to a diagonal matrix. Or, equivalently, notice how close we have come to diagonalizing a matrix (Definition DZM). We have a matrix representation which has diagonal entries that are the eigenvalues of a matrix. Each occurs on the diagonal as many times as the algebraic multiplicity. However, when the geometric multiplicity is strictly less than the algebraic multiplicity, we have some entries in the representation just above the diagonal (the “superdiagonal”). Furthermore, we have some idea how often this happens if we know the geometric multiplicity and the index of the eigenvalue.
We now recognize just how simple a diagonalizable linear transformation really is. For each eigenvalue, the generalized eigenspace is just the regular eigenspace, and it decomposes into a direct sum of one-dimensional subspaces, each spanned by a different eigenvector chosen from a basis of eigenvectors for the eigenspace.
Some authors create matrix representations of nilpotent linear transformations where the Jordan block has the ones just below the diagonal (the “subdiagonal”). No matter, it is really the same, just different. We have also defined Jordan canonical form to place blocks for the larger eigenvalues earlier, and for blocks with the same eigenvalue, we place the bigger ones earlier. This is fairly standard, but there is no reason we couldn’t order the blocks differently. It’d be the same, just different. The reason for choosing some ordering is to be assured that there is just one canonical matrix representation for each linear transformation.
Example JCF10
Jordan canonical form, size 10
Suppose that T : {ℂ}^{10}\mathrel{↦}{ℂ}^{10} is the linear
transformation defined by T\left (x\right ) = Ax
where
We’ll find a basis for {ℂ}^{10} that will yield a matrix representation of T in Jordan canonical form. First we find the eigenvalues, and their multiplicities, with the techniques of Chapter E.
For each eigenvalue, we can compute a generalized eigenspace. By Theorem GESD we know that {ℂ}^{10} will decompose into a direct sum of these eigenspaces, and we can restrict T to each part of this decomposition. At this stage we know that the Jordan canonical form will be block diagonal with blocks of size 2, 3 and 5, since the dimensions of the generalized eigenspaces are equal to the algebraic multiplicities of the eigenvalues (Theorem DGES). The geometric multiplicities tell us how many Jordan blocks occupy each of the three larger blocks, but we will discuss this as we analyze each eigenvalue. We do not yet know the index of each eigenvalue (though we can easily infer it for λ = 2) and even if we did have this information, it only determines the size of the largest Jordan block (per eigenvalue). We will press ahead, considering each eigenvalue one at a time.
The eigenvalue λ = 2 has “full” geometric multiplicity, and is not an impediment to diagonalizing T. We will treat it in full generality anyway. First we compute the generalized eigenspace. Since Theorem GEK says that {G}_{T }\left (2\right ) = K\kern -1.95872pt \left ({\left (T − 2{I}_{{ℂ}^{10}}\right )}^{10}\right ) we can compute this generalized eigenspace as a null space derived from the matrix A,
The restriction of T to {G}_{T }\left (2\right ) relative to the two basis vectors above has a matrix representation that is a 2 × 2 diagonal matrix with the eigenvalue λ = 2 as the diagonal entries. So these two vectors will be the first two vectors in our basis for {ℂ}^{10},
Notice that it was not strictly necessary to compute the 10-th power of A − 2{I}_{10}. With {α}_{T }\left (2\right ) = {γ}_{T }\left (2\right ) the null space of the matrix A − 2{I}_{10} contains all of the generalized eigenvectors of T for the eigenvalue λ = 2. But there was no harm in computing the 10-th power either. This discussion is equivalent to the observation that the linear transformation T{|}_{{G}_{T}\left (2\right )}: {G}_{T }\left (2\right )\mathrel{↦}{G}_{T }\left (2\right ) is nilpotent of index 1. In other words, {ι}_{T }\left (2\right ) = 1.
The eigenvalue λ = 0 will not be quite as simple, since the geometric multiplicity is strictly less than the geometric multiplicity. As before, we first compute the generalized eigenspace. Since Theorem GEK says that {G}_{T }\left (0\right ) = K\kern -1.95872pt \left ({\left (T − 0{I}_{{ℂ}^{10}}\right )}^{10}\right ) we can compute this generalized eigenspace as a null space derived from the matrix A,
So \mathop{ dim}\nolimits \left ({G}_{T }\left (0\right )\right ) = 3 = {α}_{T }\left (0\right ), as expected. We will use these three basis vectors for the generalized eigenspace to construct a matrix representation of T{|}_{{G}_{T}\left (0\right )}, where F is being defined implicitly as the basis of {G}_{T }\left (0\right ). We construct this representation as usual, applying Definition MR,
So we have the matrix representation
By Theorem RGEN we can obtain a nilpotent matrix from this matrix representation by subtracting the eigenvalue from the diagonal elements, and then we can apply Theorem CFNLT to M − (0){I}_{3}. First check that {\left (M − (0){I}_{3}\right )}^{2} = O, so we know that the index of M − (0){I}_{3} as a nilpotent matrix, and that therefore λ = 0 is an eigenvalue of T with index 2, {ι}_{T }\left (0\right ) = 2. To determine a basis of {ℂ}^{3} that converts M − (0){I}_{3} to canonical form, we need the null spaces of the powers of M − (0){I}_{3}. For convenience, set N = M − (0){I}_{3}.
Then we choose a vector from N\kern -1.95872pt \left ({N}^{2}\right ) that is not an element of N\kern -1.95872pt \left ({N}^{1}\right ). Any vector with unequal first two entries will fit the bill, say
where we are employing the notation in Theorem CFNLT. The next step is to multiply this vector by N to get part of the basis for N\kern -1.95872pt \left ({N}^{1}\right ),
We need a vector to pair with {z}_{1,1} that will make a basis for the two-dimensional subspace N\kern -1.95872pt \left ({N}^{1}\right ). Examining the basis for N\kern -1.95872pt \left ({N}^{1}\right ) we see that a vector with its first two entries equal will do the job.
Reordering, we find the basis,
From this basis, we can get a matrix representation of N (when viewed as a linear transformation) relative to the basis C for {ℂ}^{3},
Now we add back the eigenvalue λ = 0 to the representation of N to obtain a representation for M. Of course, with an eigenvalue of zero, the change is not apparent, so we won’t display the same matrix again. This is the second block of the Jordan canonical form for T. However, the three vectors in C will not suffice as basis vectors for the domain of T — they have the wrong size! The vectors in C are vectors in the domain of a linear transformation defined by the matrix M. But M was a matrix representation of T{|}_{{G}_{T}\left (0\right )} − 0{I}_{{G}_{T}\left (0\right )} relative to the basis F for {G}_{T }\left (0\right ). We need to “uncoordinatize” each of the basis vectors in C to produce a linear combination of vectors in F that will be an element of the generalized eigenspace {G}_{T }\left (0\right ). These will be the next three vectors of our final answer, a basis for {ℂ}^{10} that has a pleasing matrix representation.
Five down, five to go. Basis vectors, that is. λ = −1 is the smallest eigenvalue, but it will require the most computation. First we compute the generalized eigenspace. Since Theorem GEK says that {G}_{T }\left (−1\right ) = K\kern -1.95872pt \left ({\left (T − (−1){I}_{{ℂ}^{10}}\right )}^{10}\right ) we can compute this generalized eigenspace as a null space derived from the matrix A,
So \mathop{ dim}\nolimits \left ({G}_{T }\left (−1\right )\right ) = 5 = {α}_{T }\left (−1\right ), as expected. We will use these five basis vectors for the generalized eigenspace to construct a matrix representation of T{|}_{{G}_{T}\left (−1\right )}, where F is being recycled and defined now implicitly as the basis of {G}_{T }\left (−1\right ). We construct this representation as usual, applying Definition MR,
So we have the matrix representation of the restriction of T (again recycling and redefining the matrix M)
By Theorem RGEN we can obtain a nilpotent matrix from this matrix representation by subtracting the eigenvalue from the diagonal elements, and then we can apply Theorem CFNLT to M − (−1){I}_{5}. First check that {\left (M − (−1){I}_{5}\right )}^{3} = O, so we know that the index of M − (−1){I}_{5} as a nilpotent matrix, and that therefore λ = −1 is an eigenvalue of T with index 3, {ι}_{T }\left (−1\right ) = 3. To determine a basis of {ℂ}^{5} that converts M − (−1){I}_{5} to canonical form, we need the null spaces of the powers of M − (−1){I}_{5}. Again, for convenience, set N = M − (−1){I}_{5}.
Then we choose a vector from N\kern -1.95872pt \left ({N}^{3}\right ) that is not an element of N\kern -1.95872pt \left ({N}^{2}\right ). The sum of the four basis vectors for N\kern -1.95872pt \left ({N}^{2}\right ) sum to a vector with all five entries equal to 1. We will mess with the first entry to create a vector not in N\kern -1.95872pt \left ({N}^{2}\right ),
where we are employing the notation in Theorem CFNLT. The next step is to multiply this vector by N to get a portion of the basis for N\kern -1.95872pt \left ({N}^{2}\right ),
We have a basis for the two-dimensional subspace N\kern -1.95872pt \left ({N}^{1}\right ) and we can add to that the vector {z}_{2,1} and we have three of four basis vectors for N\kern -1.95872pt \left ({N}^{2}\right ). These three vectors span the subspace we call {Q}_{2}. We need a fourth vector outside of {Q}_{2} to complete a basis of the four-dimensional subspace N\kern -1.95872pt \left ({N}^{2}\right ). Check that the vector
is an element of N\kern -1.95872pt \left ({N}^{2}\right ) that lies outside of the subspace {Q}_{2}. This vector was constructed by getting a nice basis for {Q}_{2} and forming a linear combination of this basis that specifies three of the five entries of the result. Of the remaining two entries, one was changed to move the vector outside of {Q}_{2} and this was followed by a change to the remaining entry to place the vector into N\kern -1.95872pt \left ({N}^{2}\right ). The vector {z}_{2,2} is the lone basis vector for the subspace we call {R}_{2}.
The remaining two basis vectors are easy to come by. They are the result of applying N to each of the two most recently determined basis vectors,
Now we reorder these basis vectors, to arrive at the basis
A matrix representation of N relative to C is
To obtain a matrix representation of M, we add back in the matrix (−1){I}_{5}, placing the eigenvalue back along the diagonal, and slightly modifying the Jordan blocks,
The basis C yields a pleasant matrix representation for the restriction of the linear transformation T − (−1)I to the generalized eigenspace {G}_{T }\left (−1\right ). However, we must remember that these vectors in {ℂ}^{5} are representations of vectors in {ℂ}^{10} relative to the basis F. Each needs to be “un-coordinatized” before joining our final basis. Here we go,
To summarize, we list the entire basis B = \left \{{v}_{1},\kern 1.95872pt {v}_{2},\kern 1.95872pt {v}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {v}_{10}\right \},
The resulting matrix representation is
If you are not inclined to check all of these computations, here are a few that should convince you of the amazing properties of the basis B. Compute the matrix-vector products A{v}_{i}, 1 ≤ i ≤ 10. In each case the result will be a vector of the form λ{v}_{i} + δ{v}_{i−1}, where λ is one of the eigenvalues (you should be able to predict ahead of time which one) and δ ∈\left \{0, 1\right \}.
Alternatively, if we can write inputs to the linear transformation T as linear combinations of the vectors in B (which we can do uniquely since B is a basis, Theorem VRRB), then the “action” of T is reduced to a matrix-vector product with the exceedingly simple matrix that is the Jordan canonical form. Wow! ⊠
Jordan was a French mathematician who was active in the late 1800’s. Cayley and Hamilton were 19th-century contemporaries of Jordan from Britain. The theorem that bears their names is perhaps one of the most celebrated in basic linear algebra. While our result applies only to vector spaces and linear transformations with scalars from the set of complex numbers, ℂ, the result is equally true if we restrict our scalars to the real numbers, {ℝ}^{}. It says that every matrix satisfies its own characteristic polynomial.
Theorem CHT
Cayley-Hamilton Theorem
Suppose A
is a square matrix with characteristic polynomial
{p}_{A}\left (x\right ). Then
{p}_{A}\left (A\right ) = O.
□
Proof Suppose B and C are similar matrices via the matrix S, B = {S}^{−1}CS, and q(x) is any polynomial. Then q\left (B\right ) is similar to q\left (C\right ) via S, q\left (B\right ) = {S}^{−1}q\left (C\right )S. (See Example HPDM for hints on how to convince yourself of this.)
By Theorem JCFLT and Theorem SCB we know A is similar to a matrix, J, in Jordan canonical form. Suppose {λ}_{1},\kern 1.95872pt {λ}_{2},\kern 1.95872pt {λ}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {λ}_{m} are the distinct eigenvalues of A (and are therefore the eigenvalues and diagonal entries of J). Then by Theorem EMRCP and Definition AME, we can factor the characteristic polynomial as
On substituting the matrix J we have
The matrix J − {λ}_{k}I will be block diagonal, and the block arising from the generalized eigenspace for {λ}_{k} will have zeros along the diagonal. Suitably adjusted for matrices (rather than linear transformations), Theorem RGEN tells us this matrix is nilpotent. Since the size of this nilpotent matrix is equal to the algebraic multiplicity of {λ}_{k}, the power {\left (J − {λ}_{k}I\right )}^{{α}_{A}\left ({λ}_{k}\right )} will be a zero matrix (Theorem KPNLT) in the location of this block.
Repeating this argument for each of the m eigenvalues will place a zero block in some term of the product at every location on the diagonal. The entire product will then be zero blocks on the diagonal, and zero off the diagonal. In other words, it will be the zero matrix. Since A and J are similar, {p}_{A}\left (A\right ) = {p}_{A}\left (J\right ) = O. ■
Definition VR
Matrix representations build on vector representations, so this is the definition
that gets us started. A representation depends on the choice of a single basis for
the vector space. Theorem VRRB is what tells us this idea might be
useful.
Theorem VRILT
As an invertible linear transformation, vector representation allows
us to translate, back and forth, between abstract vector spaces
(V ) and concrete
vector spaces ({ℂ}^{n}).
This is key to all our notions of representations in this chapter.
Theorem CFDVS
Every vector space with finite dimension “looks like” a vector space of column
vectors. Vector representation is the isomorphism that establishes that these
vector spaces are isomorphic.
Definition MR
Building on the definition of a vector representation, we define a representation of
a linear transformation, determined by a choice of two bases, one for the domain
and one for the codomain. Notice that vectors are represented by columnar lists of
scalars, while linear transformations are represented by rectangular tables of
scalars. Building a matrix representation is as important a skill as row-reducing a
matrix.
Theorem FTMR
Definition MR is not really very interesting until we have this theorem. The
second form tells us that we can compute outputs of linear transformations via
matrix multiplication, along with some bookkeeping for vector representations.
Searching forward through the text on “FTMR” is an interesting exercise.
You will find reference to this result buried inside many key proofs at
critical points, and it also appears in numerous examples and solutions to
exercises.
Theorem MRCLT
Turns out that matrix multiplication is really a very natural operation, it is just
the chaining together (composition) of functions (linear transformations).
Beautiful. Even if you don’t try to work the problem, study Solution MR.T80 for
more insight.
Theorem KNSI
Kernels “are” null spaces. For this reason you’ll see these terms used
interchangeably.
Theorem RCSI
Ranges “are” column spaces. For this reason you’ll see these terms used
interchangeably.
Theorem IMR
Invertible linear transformations are represented by invertible (nonsingular)
matrices.
Theorem NME9
The NMEx series has always been important, but we’ve held off saying so until
now. This is the end of the line for this one, so it is a good time to contemplate all
that it means.
Theorem SCB
Diagonalization back in Section SD was really a change of basis to achieve a
diagonal matrix repesentation. Maybe we should be highlighting the more general
Theorem MRCB here, but its overly technical description just isn’t as appealing.
However, it will be important in some of the matrix decompostions in
Chapter MD.
Theorem EER
This theorem, with the companion definition, Definition EELT, tells us that
eigenvalues, and eigenvectors, are fundamentally a characteristic of linear
transformations (not matrices). If you study matrix decompositions in
Chapter MD you will come to appreciate that almost all of a matrix’s secrets can
be unlocked with knowledge of the eigenvalues and eigenvectors.
Theorem OD
Can you imagine anything nicer than an orthonormal diagonalization? A basis
of pairwise orthogonal, unit norm, eigenvectors that provide a diagonal
representation for a matrix? Here we learn just when this can happen —
precisely when a matrix is normal, which is a disarmingly simple property to
define.
Theorem CFNLT
Nilpotent linear transformations are the fundamental obstacle to a matrix (or
linear transformation) being diagonalizable. This specialized representation
theorem is the fundamental expression of just how close we can come
to surmounting the obstacle, i.e. how close we can come to a diagonal
representation.
Theorem DGES
This theorem is a long time in coming, but perhaps it best explains our interest in
generalized eigenspaces. When the dimension of a “regular” eigenspace (the
geometic multiplicity) does not meet the algebraic multiplicity of the
corresponding eigenvalue, then a matrix is not diagonalizable (Theorem DMFE).
However, if we generalize the idea of an eigenspace (Definition GES), then we
arrive at invariant subspaces that together give a complete decomposition of the
domain as a direct sum. And these subspaces have dimensions equal to the
corresponding algebraic multiplicities.
Theorem JCFLT
If you can’t diagonalize, just how close can you come? This is an answer
(there are others, like rational canonical form). “Canonicalism” is in the
eye of the beholder. But this is a good place to conclude our study of a
widely accepted canonical form that is possible for every matrix or linear
transformation.