From A First Course in Linear Algebra
Version 2.01
© 2004.
Licensed under the GNU Free Documentation License.
http://linear.ups.edu/
This section is in draft form
Nearly complete
We have seen that some matrices are diagonalizable and some are not. Some authors refer to a non-diagonalizable matrix as defective, but we will study them carefully anyway. Examples of such matrices include Example EMMS4, Example HMEM5, and Example CEMS6. Each of these matrices has at least one eigenvalue with geometric multiplicity strictly less than its algebraic multiplicity, and therefore Theorem DMFE tells us these matrices are not diagonalizable.
Given a square matrix A, it is likely similar to many, many other matrices. Of all these possibilities, which is the best? “Best” is a subjective term, but we might agree that a diagonal matrix is certainly a very nice choice. Unfortunately, as we have seen, this will not always be possible. What form of a matrix is “next-best”? Our goal, which will take us several sections to reach, is to show that every matrix is similar to a matrix that is “nearly-diagonal” (Section JCF). More precisely, every matrix is similar to a matrix with elements on the diagonal, and zeros and ones on the diagonal just above the main diagonal (the “super diagonal”), with zeros everywhere else. In the language of equivalence relations (see Theorem SER), we are determining a systematic representative for each equivalence class. Such a representative for a set of similar matrices is called a canonical form.
We have just discussed the determination of a canonical form as a question about matrices. However, we know that every square matrix creates a natural linear transformation (Theorem MBLT) and every linear transformation with identical domain and codomain has a square matrix representation for each choice of a basis, with a change of basis creating a similarity transformation (Theorem SCB). So we will state, and prove, theorems using the language of linear transformations on abstract vector spaces, while most of our examples will work with square matrices. You can, and should, mentally translate between the two settings frequently and easily.
We will discover that nilpotent linear transformations are the essential obstacle in a non-diagonalizable linear transformation. So we will study them carefully first, both as an object of inherent mathematical interest, but also as the object at the heart of the argument that leads to a pleasing canonical form for any linear transformation. Once we understand these linear transformations thoroughly, we will be able to easily analyze the structure of any linear transformation.
Definition NLT
Nilpotent Linear Transformation
Suppose that T : V \mathrel{↦}V
is a linear transformation such that there is an integer
p > 0 such that
{T}^{p}\left (v\right ) = 0 for every
v ∈ V . The smallest
p for which this condition
is met is called the index
of T.
△
Of course, the linear transformation T defined by T\left (v\right ) = 0 will qualify as nilpotent of index 1. But are there others?
Example NM64
Nilpotent matrix, size 6, index 4
Recall that our definitions and theorems are being stated for linear
transformations on abstract vector spaces, while our examples will work with
square matrices (and use the same terms interchangeably). In this case, to
demonstrate the existence of nontrivial nilpotent linear transformations, we
desire a matrix such that some power of the matrix is the zero matrix.
Consider
Thus we can say that A is nilpotent of index 4.
Because it will presage some upcoming theorems, we will record some extra information about the eigenvalues and eigenvectors of A here. A has just one eigenvalue, λ = 0, with algebraic multiplicity 6 and geometric multiplicity 2. The eigenspace for this eigenvalue is
If there were degrees of singularity, we might say this matrix was very singular, since zero is an eigenvalue with maximum algebraic multiplicity (Theorem SMZE, Theorem ME). Notice too that A is “far” from being diagonalizable (Theorem DMFE). ⊠
Another example.
Example NM62
Nilpotent matrix, size 6, index 2
Consider the matrix
So B is nilpotent of index 2. Again, the only eigenvalue of B is zero, with algebraic multiplicity 6. The geometric multiplicity of the eigenvalue is 3, as seen in the eigenspace,
Again, Theorem DMFE tells us that B is far from being diagonalizable. ⊠
On a first encounter with the definition of a nilpotent matrix, you might wonder if such a thing was possible at all. That a high power of a nonzero object could be zero is so very different from our experience with scalars that it seems very unnatural. Hopefully the two previous examples were somewhat surprising. But we have seen that matrix algebra does not always behave the way we expect (Example MMNC), and we also now recognize matrix products not just as arithmetic, but as function composition (Theorem MRCLT). We will now turn to some examples of nilpotent matrices which might be more transparent.
Definition JB
Jordan Block
Given the scalar λ ∈ ℂ,
the Jordan block {J}_{n}\left (λ\right )
is the n × n
matrix defined by
(This definition contains Notation JB.) △
Example JB4
Jordan block, size 4
A simple example of a Jordan block,
We will return to general Jordan blocks later, but in this section we are just interested in Jordan blocks where λ = 0. Here’s an example of why we are specializing in these matrices now.
Example NJB5
Nilpotent Jordan block, size 5
Consider
So {J}_{5}\left (0\right ) is nilpotent of index 5. As before, we record some information about the eigenvalues and eigenvectors of this matrix. The only eigenvalue is zero, with algebraic multiplicity 5, the maximum possible (Theorem ME). The geometric multiplicity of this eigenvalue is just 1, the minimum possible (Theorem ME), as seen in the eigenspace,
There should not be any real surprises in this example. We can watch the ones in the powers of {J}_{5}\left (0\right ) slowly march off to the upper-right hand corner of the powers. In some vague way, the eigenvalues and eigenvectors of this matrix are equally extreme. ⊠
We can form combinations of Jordan blocks to build a variety of nilpotent matrices. Simply place Jordan blocks on the diagonal of a matrix with zeros everywhere else, to create a block diagonal matrix.
Example NM83
Nilpotent matrix, size 8, index 3
Consider the matrix
So C is nilpotent of index 3. You should notice how block diagonal matrices behave in products (much like diagonal matrices) and that it was the largest Jordan block that determined the index of this combination. All eight eigenvalues are zero, and each of the three Jordan blocks contributes one eigenvector to a basis for the eigenspace, resulting in zero having a geometric multiplicity of 3. ⊠
It would appear that nilpotent matrices only have zero as an eigenvalue, so the algebraic multiplicity will be the maximum possible. However, by creating block diagonal matrices with Jordan blocks on the diagonal you should be able to attain any desired geometric multiplicity for this lone eigenvalue. Likewise, the size of the largest Jordan block employed will determine the index of the matrix. So nilpotent matrices with various combinations of index and geometric multiplicities are easy to manufacture. The predictable properties of block diagonal matrices in matrix products and eigenvector computations, along with the next theorem, make this possible. You might find Example NJB5 a useful companion to this proof.
Theorem NJB
Nilpotent Jordan Blocks
The Jordan block {J}_{n}\left (0\right ) is
nilpotent of index n.
□
Proof While not phrased as an if-then statement, the statement in the theorem is understood to mean that if we have a specific matrix ({J}_{n}\left (0\right )) then we need to establish it is nilpotent of a specified index. The first column of {J}_{n}\left (0\right ) is the zero vector, and the remaining n − 1 columns are the standard unit vectors {e}_{i}, 1 ≤ i ≤ n − 1 (Definition SUV), which are also the first n − 1 columns of the size n identity matrix {I}_{n}. As shorthand, write J = {J}_{n}\left (0\right ).
We will use the definition of matrix multiplication (Definition MM), together with a proof by induction (Technique I), to study the powers of J. Our claim is that
for 1 ≤ k ≤ n. For the base case, k = 1, and the definition of {J}^{1} = {J}_{ n}\left (0\right ) establishes the claim. For the induction step, first note that J{e}_{1} = 0 and J{e}_{i} = {e}_{i−1} for 2 ≤ i ≤ n. Then, assuming the claim is true for k, we examine the k + 1 case,
This concludes the induction. So {J}^{k} has a nonzero entry (a one) in row n − k and column n, for 1 ≤ k ≤ n − 1, and is therefore a nonzero matrix. However, {J}^{n} = \left [0\left |0\right .\left |\mathop{\mathop{…}}\kern 1.95872pt \right .\left |0\right .\right ] = O. By Definition NLT, J is nilpotent of index n. ■
In this subsection we collect some basic properties of nilpotent linear transformations. After studying the examples in the previous section, some of these will be no surprise.
Theorem ENLT
Eigenvalues of Nilpotent Linear Transformations
Suppose that T : V \mathrel{↦}V is a nilpotent
linear transformation and λ
is an eigenvalue of T.
Then λ = 0.
□
Proof Let x be an eigenvector of T for the eigenvalue λ, and suppose that T is nilpotent with index p. Then
Because x is an eigenvector, it is nonzero, and therefore Theorem SMEZV tells us that {λ}^{p} = 0 and so λ = 0. ■
Paraphrasing, all of the eigenvalues of a nilpotent linear transformation are zero. So in particular, the characteristic polynomial of a nilpotent linear transformation, T, on a vector space of dimension n, is simply {p}_{T }\left (x\right ) = {x}^{n}.
The next theorem is not critical for what follows, but it will explain our interest in nilpotent linear transformations. More specifically, it is the first step in backing up the assertion that nilpotent linear transformations are the essential obstacle in a non-diagonalizable linear transformation. While it is not obvious from the statement of the theorem, it says that a nilpotent linear transformation is not diagonalizable, unless it is trivially so.
Theorem DNLT
Diagonalizable Nilpotent Linear Transformations
Suppose the linear transformation T : V \mathrel{↦}V
is nilpotent. Then T is
diagonalizable if and only T is
the zero linear transformation. □
Proof We start with the easy direction. Let n =\mathop{ dim}\nolimits \left (V \right ).
( ⇐) The linear transformation Z : V \mathrel{↦}V defined by Z\left (v\right ) = 0 for all v ∈ V is nilpotent of index p = 1 and a matrix repesentation relative to any basis of V is the n × n zero matrix, O. Quite obviously, the zero matrix is a diagonal matrix (Definition DIM) and hence Z is diagonalizable (Definition DZM).
( ⇒) Assume now that T is diagonalizable, so {γ}_{T }\left (λ\right ) = {α}_{T }\left (λ\right ) for every eigenvalue λ (Theorem DMFE). By Theorem ENLT, T has only one eigenvalue (zero), which therefore must have algebraic multiplicity n (Theorem NEM). So the geometric multiplicity of zero will be n as well, {γ}_{T }\left (0\right ) = n.
Let B be a basis for the eigenspace {ℰ}_{T }\left (0\right ). Then B is a linearly independent subset of V of size n, and by Theorem G will be a basis for V . For any x ∈ B we have
So T is identically zero on a basis for B, and since the action of a linear transformation on a basis determines all of the values of the linear transformation (Theorem LTDB), it is easy to see that T\left (v\right ) = 0 for every v ∈ V . ■
So, other than one trivial case (the zero matrix), every nilpotent linear transformation is not diagonalizable. It remains to see what is so “essential” about this broad class of non-diagonalizable linear transformations. For this we now turn to a discussion of kernels of powers of nilpotent linear transformations, beginning with a result about general linear transformations that may not necessarily be nilpotent.
Theorem KPLT
Kernels of Powers of Linear Transformations
Suppose T : V \mathrel{↦}V is a linear
transformation, where \mathop{ dim}\nolimits \left (V \right ) = n.
Then there is an integer m,
0 ≤ m ≤ n, such
that
Proof There are several items to verify in the conclusion as stated. First, we show that K\kern -1.95872pt \left ({T}^{k}\right ) ⊆K\kern -1.95872pt \left ({T}^{k+1}\right ) for any k. Choose z ∈K\kern -1.95872pt \left ({T}^{k}\right ). Then
So by Definition KLT, z ∈K\kern -1.95872pt \left ({T}^{k+1}\right ) and by Definition SSET we have K\kern -1.95872pt \left ({T}^{k}\right ) ⊆K\kern -1.95872pt \left ({T}^{k+1}\right ).
Second, we demonstrate the existence of a power m where consecutive powers result in equal kernels. A by-product will be the condition that m can be chosen so that m ≤ n. To the contrary, suppose that
Since K\kern -1.95872pt \left ({T}^{k}\right ) ⊊ K\kern -1.95872pt \left ({T}^{k+1}\right ), Theorem PSSD implies that \mathop{ dim}\nolimits \left (K\kern -1.95872pt \left ({T}^{k+1}\right )\right ) ≥\mathop{ dim}\nolimits \left (K\kern -1.95872pt \left ({T}^{k}\right )\right ) + 1. Repeated application of this observation yields
Thus, K\kern -1.95872pt \left ({T}^{n+1}\right ) has a basis of size at least n + 1, which is a linearly independent set of size greater than n in the vector space V of dimension n. This contradicts Theorem G.
This contradiction yields the existence of an integer k such that K\kern -1.95872pt \left ({T}^{k}\right ) = K\kern -1.95872pt \left ({T}^{k+1}\right ), so we can define m to be smallest such integer with this property. From the argument above about dimensions resulting from a strictly increasing chain of subspaces, it should be clear that m ≤ n.
It remains to show that once two consecutive kernels are equal, then all of the remaining kernels are equal. More formally, if K\kern -1.95872pt \left ({T}^{m}\right ) = K\kern -1.95872pt \left ({T}^{m+1}\right ), then K\kern -1.95872pt \left ({T}^{m}\right ) = K\kern -1.95872pt \left ({T}^{m+j}\right ) for all j ≥ 1. We will give a proof by induction on j (Technique I). The base case (j = 1) is precisely our defining property for m.
In the induction step, we assume that K\kern -1.95872pt \left ({T}^{m}\right ) = K\kern -1.95872pt \left ({T}^{m+j}\right ) and endeavor to show that K\kern -1.95872pt \left ({T}^{m}\right ) = K\kern -1.95872pt \left ({T}^{m+j+1}\right ). At the outset of this proof we established that K\kern -1.95872pt \left ({T}^{m}\right ) ⊆K\kern -1.95872pt \left ({T}^{m+j+1}\right ). So Definition SE requires only that we establish the subset inclusion in the opposite direction. To wit, choose z ∈K\kern -1.95872pt \left ({T}^{m+j+1}\right ). Then
So by Definition KLT, z ∈K\kern -1.95872pt \left ({T}^{m}\right ) as desired. ■
We now specialize Theorem KPLT to the case of nilpotent linear transformations, which buys us just a bit more precision in the conclusion.
Theorem KPNLT
Kernels of Powers of Nilpotent Linear Transformations
Suppose T : V \mathrel{↦}V
is a nilpotent linear transformation with index
p and
\mathop{ dim}\nolimits \left (V \right ) = n.
Then 0 ≤ p ≤ n
and
Proof Since {T}^{p} = 0 it follows that {T}^{p+j} = 0 for all j ≥ 0 and thus K\kern -1.95872pt \left ({T}^{p+j}\right ) = V for j ≥ 0. So the value of m guaranteed by Theorem KPLT is at most p. The only remaining aspect of our conclusion that does not follow from Theorem KPLT is that m = p. To see this we must show that K\kern -1.95872pt \left ({T}^{k}\right ) ⊊ K\kern -1.95872pt \left ({T}^{k+1}\right ) for 0 ≤ k ≤ p − 1. If K\kern -1.95872pt \left ({T}^{k}\right ) = K\kern -1.95872pt \left ({T}^{k+1}\right ) for some k < p, then K\kern -1.95872pt \left ({T}^{k}\right ) = K\kern -1.95872pt \left ({T}^{p}\right ) = V . This implies that {T}^{k} = 0, violating the fact that T has index p. So the smallest value of m is indeed p, and we learn that p < n. ■
The structure of the kernels of powers of nilpotent linear transformations will be crucial to what follows. But immediately we can see a practical benefit. Suppose we are confronted with the question of whether or not an n × n matrix, A, is nilpotent or not. If we don’t quickly find a low power that equals the zero matrix, when do we stop trying higher and higher powers? Theorem KPNLT gives us the answer: if we don’t see a zero matrix by the time we finish computing {A}^{n}, then it is not going to ever happen. We’ll now take a look at one example of Theorem KPNLT in action.
Example KPNLT
Kernels of powers of a nilpotent linear transformation
We will recycle the nilpotent matrix A
of index 4 from Example NM64. We now know that would have only needed to look at the
first 6 powers of A
if the matrix had not been nilpotent. We list bases for the null spaces of the powers
of A.
(Notice how we are using null spaces for matrices interchangeably with kernels of
linear transformations, see Theorem KNSI for justification.)
With the exception of some convenience scaling of the basis vectors in N\kern -1.95872pt \left ({A}^{2}\right ) these are exactly the basis vectors described in Theorem BNS. We can see that the dimension of N\kern -1.95872pt \left (A\right ) equals the geometric multiplicity of the zero eigenvalue. Why is this not an accident? We can see the dimensions of the kernels consistently increasing, and we can see that N\kern -1.95872pt \left ({A}^{4}\right ) = {ℂ}^{6}. But Theorem KPNLT says a little more. Each successive kernel should be a superset of the previous one. We ought to be able to begin with a basis of N\kern -1.95872pt \left (A\right ) and extend it to a basis of N\kern -1.95872pt \left ({A}^{2}\right ). Then we should be able to extend a basis of N\kern -1.95872pt \left ({A}^{2}\right ) into a basis of N\kern -1.95872pt \left ({A}^{3}\right ), all with repeated applications of Theorem ELIS. Verify the following,
Do not be concerned at the moment about how these bases were constructed since we are not describing the applications of Theorem ELIS here. Do verify carefully for each alleged basis that, (1) it is a superset of the basis for the previous kernel, (2) the basis vectors really are members of the kernel of the right power of A, (3) the basis is a linearly independent set, (4) the size of the basis is equal to the size of the basis found previously for each kernel. With these verifications, Theorem G will tell us that we have successfully demonstrated what Theorem KPNLT guarantees. ⊠
Our main purpose in this section is to find a basis so that a nilpotent linear transformation will have a pleasing, nearly-diagonal matrix representation. Of course, we will not have a definition for “pleasing,” nor for “nearly-diagonal.” But the short answer is that our preferred matrix representation will be built up from Jordan blocks, {J}_{n}\left (0\right ). Here’s the theorem. You will find Example CFNLT helpful as you study this proof, since it uses the same notation, and is large enough to (barely) illustrate the full generality of the theorem (see ).
Theorem CFNLT
Canonical Form for Nilpotent Linear Transformations
Suppose that T : V \mathrel{↦}V
is a nilpotent linear transformation of index
p. Then there is a basis
for V so that the matrix
representation, {M}_{B,B}^{T },
is block diagonal with each block being a Jordan block,
{J}_{n}\left (0\right ). The size of the largest
block is the index p,
and the total number of blocks is the nullity of
T,
n\left (T\right ).
□
Proof We will explicitly construct the desired basis, so the proof is constructive (Technique C), and can be used in practice. As we begin, the basis vectors will not be in the proper order, but we will rearrange them at the end of the proof. For convenience, define {n}_{i} = n\left ({T}^{i}\right ), so for example, {n}_{0} = 0, {n}_{1} = n\left (T\right ) and {n}_{p} = n\left ({T}^{p}\right ) =\mathop{ dim}\nolimits \left (V \right ). Define {s}_{i} = {n}_{i} − {n}_{i−1}, for 1 ≤ i ≤ p, so we can think of {s}_{i} as “how much bigger” K\kern -1.95872pt \left ({T}^{i}\right ) is than K\kern -1.95872pt \left ({T}^{i−1}\right ). In particular, Theorem KPNLT implies that {s}_{i} > 0 for 1 ≤ i ≤ p.
We are going to build a set of vectors {z}_{i,j}, 1 ≤ i ≤ p, 1 ≤ j ≤ {s}_{i}. Each {z}_{i,j} will be an element of K\kern -1.95872pt \left ({T}^{i}\right ) and not an element of K\kern -1.95872pt \left ({T}^{i−1}\right ). In total, we will obtain a linearly independent set of {\mathop{ \mathop{∑ }}\nolimits }_{i=1}^{p}{s}_{ i} ={\mathop{ \mathop{∑ }}\nolimits }_{i=1}^{p}{n}_{ i} − {n}_{i−1} = {n}_{p} − {n}_{0} =\mathop{ dim}\nolimits \left (V \right ) vectors that form a basis of V . We construct this set in pieces, starting at the “wrong” end. Our procedure will build a series of subspaces, {Z}_{i}, each lying in between K\kern -1.95872pt \left ({T}^{i−1}\right ) and K\kern -1.95872pt \left ({T}^{i}\right ), having bases {z}_{i,j}, 1 ≤ j ≤ {s}_{i}, and which together equal V as a direct sum. Now would be a good time to review the results on direct sums collected in Subsection PD.DS. OK, here we go.
We build the subspace {Z}_{p} first (this is what we meant by “starting at the wrong end”). K\kern -1.95872pt \left ({T}^{p−1}\right ) is a proper subspace of K\kern -1.95872pt \left ({T}^{p}\right ) = V (Theorem KPNLT). Theorem DSFOS says that there is a subspace of V that will pair with the subspace K\kern -1.95872pt \left ({T}^{p−1}\right ) to form a direct sum of V . Call this subspace {Z}_{p}, and choose vectors {z}_{p,j}, 1 ≤ j ≤ {s}_{p} as a basis of {Z}_{p}, which we will denote as {B}_{p}. Note that we have a fair amount of freedom in how to choose these first basis vectors. Several observations will be useful in the next step. First V = K\kern -1.95872pt \left ({T}^{p−1}\right ) ⊕ {Z}_{ p}. The basis {B}_{p} = \left \{{z}_{p,1},\kern 1.95872pt {z}_{p,2},\kern 1.95872pt {z}_{p,3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {z}_{p,{s}_{p}}\right \} is linearly independent. For 1 ≤ j ≤ {s}_{i}, {z}_{p,j} ∈K\kern -1.95872pt \left ({T}^{p}\right ) = V . Since the two subspaces of a direct sum have no nonzero vectors in common (Theorem DSZI), for 1 ≤ j ≤ {s}_{i}, {z}_{p,j}∉K\kern -1.95872pt \left ({T}^{p−1}\right ). That was comparably easy.
If obtaining {Z}_{p} was easy, getting {Z}_{p−1} will be harder. We will repeat the next step p − 1 times, and so will do it carefully the first time. Eventually, {Z}_{p−1} will have dimension {s}_{p−1}. However, the first {s}_{p} vectors of a basis are straightforward. Define {z}_{p−1,j} = T\left ({z}_{p,j}\right ), 1 ≤ j ≤ {s}_{p}. Notice that we have no choice in creating these vectors, they are a consequence of our choices for {z}_{p,j}. In retrospect (i.e. on a second reading of this proof), you will recognize this as the key step in realizing a matrix representation of a nilpotent linear transformation with Jordan blocks. We need to know that this set of vectors in linearly independent, so start with a relation of linear dependence (Definition RLD), and massage it,
Define x = {a}_{1}{z}_{p,1} + {a}_{2}{z}_{p,2} + {a}_{3}{z}_{p,3} + \mathrel{⋯} + {a}_{{s}_{p}}{z}_{p,{s}_{p}}. The statement just above means that x ∈K\kern -1.95872pt \left (T\right ) ⊆K\kern -1.95872pt \left ({T}^{p−1}\right ) (Definition KLT, Theorem KPNLT). As defined, x is a linear combination of the basis vectors {B}_{p}, and therefore x ∈ {Z}_{p}. Thus x ∈K\kern -1.95872pt \left ({T}^{p−1}\right ) ∩ {Z}_{ p} (Definition SI). Because V = K\kern -1.95872pt \left ({T}^{p−1}\right ) ⊕ {Z}_{ p}, Theorem DSZI tells us that x = 0. Now we recognize the definition of x as a relation of linear dependence on the linearly independent set {B}_{p}, and therefore {a}_{1} = {a}_{2} = \mathrel{⋯} = {a}_{{s}_{p}} = 0 (Definition LI). This establishes the linear independence of {z}_{p−1,j}, 1 ≤ j ≤ {s}_{p} (Definition LI).
We also need to know where the vectors {z}_{p−1,j}, 1 ≤ j ≤ {s}_{p} live. First we demonstrate that they are members of K\kern -1.95872pt \left ({T}^{p−1}\right ).
So {z}_{p−1,j} ∈K\kern -1.95872pt \left ({T}^{p−1}\right ), 1 ≤ j ≤ {s}_{p}. However, we now show that these vectors are not elements of K\kern -1.95872pt \left ({T}^{p−2}\right ). Suppose to the contrary (Technique CD) that {z}_{p−1,j} ∈K\kern -1.95872pt \left ({T}^{p−2}\right ). Then
which contradicts the earlier statement that {z}_{p,j}∉K\kern -1.95872pt \left ({T}^{p−1}\right ). So {z}_{p−1,j}∉K\kern -1.95872pt \left ({T}^{p−2}\right ), 1 ≤ j ≤ {s}_{p}.
Now choose a basis {C}_{p−2} = \left \{{u}_{1},\kern 1.95872pt {u}_{2},\kern 1.95872pt {u}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {u}_{{n}_{p−2}}\right \} for K\kern -1.95872pt \left ({T}^{p−2}\right ). We want to extend this basis by adding in the {z}_{p−1,j} to span a subspace of K\kern -1.95872pt \left ({T}^{p−1}\right ). But first we want to know that this set is linearly independent. Let {a}_{k}, 1 ≤ k ≤ {n}_{p−2} and {b}_{j}, 1 ≤ j ≤ {s}_{p} be the scalars in a relation of linear dependence,
Then,
Define y = {b}_{1}{z}_{p,1} + {b}_{2}{z}_{p,2} + \mathrel{⋯} + {b}_{{s}_{p}}{z}_{p,{s}_{p}}. The statement just above means that y ∈K\kern -1.95872pt \left ({T}^{p−1}\right ) (Definition KLT). As defined, y is a linear combination of the basis vectors {B}_{p}, and therefore y ∈ {Z}_{p}. Thus y ∈K\kern -1.95872pt \left ({T}^{p−1}\right ) ∩ {Z}_{ p}. Because V = K\kern -1.95872pt \left ({T}^{p−1}\right ) ⊕ {Z}_{ p}, Theorem DSZI tells us that y = 0. Now we recognize the definition of y as a relation of linear dependence on the linearly independent set {B}_{p}, and therefore {b}_{1} = {b}_{2} = \mathrel{⋯} = {b}_{{s}_{p}} = 0 (Definition LI). Return to the full relation of linear dependence with both sets of scalars (the {a}_{i} and {b}_{j}). Now that we know that {b}_{j} = 0 for 1 ≤ j ≤ {s}_{p}, this relation of linear dependence simplifies to a relation of linear dependence on just the basis {C}_{p−1}. Therefore, {a}_{i} = 0, 1 ≤ {a}_{i} ≤ {n}_{p−1} and we have the desired linear independence.
Define a new subspace of K\kern -1.95872pt \left ({T}^{p−1}\right ) as
By Theorem DSFOS there exists a subspace of K\kern -1.95872pt \left ({T}^{p−1}\right ) which will pair with {Q}_{p−1} to form a direct sum. Call this subspace {R}_{p−1}, so by definition, K\kern -1.95872pt \left ({T}^{p−1}\right ) = {Q}_{ p−1} ⊕ {R}_{p−1}. We are interested in the dimension of {R}_{p−1}. Note first, that since the spanning set of {Q}_{p−1} is linearly independent, \mathop{ dim}\nolimits \left ({Q}_{p−1}\right ) = {n}_{p−2} + {s}_{p}. Then
Notice that if {s}_{p−1} = {s}_{p}, then {R}_{p−1} is trivial. Now choose a basis of {R}_{p−1}, and denote these {s}_{p−1} − {s}_{p} vectors as {z}_{p−1,{s}_{p}+1}, {z}_{p−1,{s}_{p}+2}, {z}_{p−1,{s}_{p}+3}, …, {z}_{p−1,{s}_{p−1}}. This is another occassion to notice that we have some freedom in this choice.
We now have K\kern -1.95872pt \left ({T}^{p−1}\right ) = {Q}_{ p−1} ⊕ {R}_{p−1}, and we have bases for each of the two subspaces. The union of these two bases will therefore be a linearly independent set in K\kern -1.95872pt \left ({T}^{p−1}\right ) with size
So, by Theorem G, the following set is a basis of K\kern -1.95872pt \left ({T}^{p−1}\right ),
We built up this basis in three parts, we will now split it in half. Define the subspace {Z}_{p−1} by
where we have implicitly denoted the basis as {B}_{p−1}. Then Theorem DSFB allows us to split up the basis for K\kern -1.95872pt \left ({T}^{p−1}\right ) as {C}_{p−1} ∪ {B}_{p−1} and write
Whew! This is a good place to recap what we have achieved. The vectors {z}_{i,j} form bases for the subspaces {Z}_{i} and right now
The key feature of this decomposition of V is that the first {s}_{p} vectors in the basis for {Z}_{p−1} are outputs of the linear transformation T using the basis vectors of {Z}_{p} as inputs.
Now we want to further decompose K\kern -1.95872pt \left ({T}^{p−2}\right ) (into K\kern -1.95872pt \left ({T}^{p−3}\right ) and {Z}_{p−2}). The procedure is the same as above, so we will only sketch the key steps. Checking the details procedes in the same manner as above. Technically, we could have set up the preceding as the induction step in a proof by induction (Technique I), but this probably would make the proof harder to understand.
Hit each element of {B}_{p−1} with T, to create vectors {z}_{p−2,j}, 1 ≤ j ≤ {s}_{p−1}. These vectors form a linearly independent set, and each is an element of K\kern -1.95872pt \left ({T}^{p−2}\right ), but not an element of K\kern -1.95872pt \left ({T}^{p−3}\right ). Grab a basis {C}_{p−3} of K\kern -1.95872pt \left ({T}^{p−3}\right ) and tack on the newly-created vectors {z}_{p−2,j}, 1 ≤ j ≤ {s}_{p−1}. This expanded set is linearly independent, and we can define a subspace {Q}_{p−2} using it as a basis. Theorem DSFOS gives us a subspace {R}_{p−2} such that K\kern -1.95872pt \left ({T}^{p−2}\right ) = {Q}_{ p−2} ⊕ {R}_{p−2}. Vectors {z}_{p−2,j}, {s}_{p−1} + 1 ≤ j ≤ {s}_{p−2} are chosen as a basis for {R}_{p−2} once the relevant dimensions have been verified. The union of {C}_{p−3} and {z}_{p−2,j}, 1 ≤ j ≤ {s}_{p−2} then form a basis of K\kern -1.95872pt \left ({T}^{p−2}\right ), which can be split into two parts to yield the decomposition
Here {Z}_{p−2} is the subspace of K\kern -1.95872pt \left ({T}^{p−2}\right ) with basis {B}_{p−2} = \left \{{z}_{p−2,j}\mathrel{∣}1 ≤ j ≤ {s}_{p−2}\right \}. Finally,
Again, the key feature of this decomposition is that the first vectors in the basis of {Z}_{p−2} are outputs of T using vectors from the basis {Z}_{p−1} as inputs (and in turn, some of these inputs are outputs of T derived from inputs in {Z}_{p}).
Now assume we repeat this procedure until we decompose K\kern -1.95872pt \left ({T}^{2}\right ) into subspaces K\kern -1.95872pt \left (T\right ) and {Z}_{2}. Finally, decompose K\kern -1.95872pt \left (T\right ) into subspaces K\kern -1.95872pt \left ({T}^{0}\right ) = K\kern -1.95872pt \left ({I}_{ n}\right ) = \left \{0\right \} and {Z}_{1}, so that we recognize the vectors {z}_{1,j}, 1 ≤ j ≤ {s}_{1} = {n}_{1} as elements of K\kern -1.95872pt \left (T\right ). The set
is linearly independent by Theorem DSLI and has size
So by Theorem G, B is a basis of V . We desire a matrix representation of T relative to B (Definition MR), but first we will reorder the elements of B. The following display lists the elements of B in the desired order, when read across the rows right-to-left in the usual way. Notice that we arrived at these vectors column-by-column, beginning on the right.
It is difficult to layout this table with the notation we have been using, nor would it be especially useful to invent some notation to overcome the difficulty. (One approach would be to define something like the inverse of the nonincreasing function, i → {s}_{i}.) Do notice that there are {s}_{1} = {n}_{1} rows and d columns. Column i is the basis {B}_{i}. The vectors in the first column are elements of K\kern -1.95872pt \left (T\right ). Each row is the same length, or shorter, than the one above it. If we apply T to any vector in the table, other than those in the first column, the output is the preceding vector in the row.
Now contemplate the matrix representation of T relative to B as we read across the rows of the table above. In the first row, T\left ({z}_{1,1}\right ) = 0, so the first column of the representation is the zero column. Next, T\left ({z}_{2,1}\right ) = {z}_{1,1}, so the second column of the representation is a vector with a single one in the first entry, and zeros elsewhere. Next, T\left ({z}_{3,1}\right ) = {z}_{2,1}, so column 3 of the representation is a zero, then a one, then all zeros. Continuing in this vein, we obtain the first d columns of the representation, which is the Jordan block {J}_{d}\left (0\right ) followed by rows of zeros.
When we apply T to the basis vectors of the second row, what happens? Applying T to the first vector, the result is the zero vector, so the representation gets a zero column. Applying T to the second vector in the row, the output is simply the first vector in that row, making the next column of the representation all zeros plus a lone one, sitting just above the diagonal. Continuing, we create a Jordan block, sitting on the diagonal of the matrix representation. It is not possible in general to state the size of this block, but since the second row is no longer than the first, it cannot have size larger than d.
Since there are as many rows as the dimension of K\kern -1.95872pt \left (T\right ), the representation contains as many Jordan blocks as the nullity of T, n\left (T\right ). Each successive block is smaller than the preceding one, with the first, and largest, having size d. The blocks are Jordan blocks since the basis vectors {z}_{i,j} were often defined as the result of applying T to other elements of the basis already determined, and then we rearranged the basis into an order that placed outputs of T just before their inputs, excepting the start of each row, which was an element of K\kern -1.95872pt \left (T\right ). ■
The proof of Theorem CFNLT is constructive (Technique C), so we can use it to create bases of nilpotent linear transformations with pleasing matrix representations. Recall that Theorem DNLT told us that nilpotent linear transformations are almost never diagonalizable, so this is progress. As we have hinted before, with a nice representation of nilpotent matrices, it will not be difficult to build up representations of other non-diagonalizable matrices. Here is the promised example which illustrates the previous theorem. It is a useful companion to your study of the proof of Theorem CFNLT.
Example CFNLT
Canonical form for a nilpotent linear transformation
The 6 × 6 matrix,
A, of Example NM64 is
nilpotent of index p = 4. If we define
the linear transformation T : {ℂ}^{6}\mathrel{↦}{ℂ}^{6}
by T\left (x\right ) = Ax, then
T is nilpotent of
index 4 and we can
seek a basis of {ℂ}^{6}
that yields a matrix representation with Jordan blocks on the diagonal. The nullity of
T is
2, so
from Theorem CFNLT we can expect the largest Jordan block to be
{J}_{4}\left (0\right ), and
there will be just two blocks. This only leaves enough room for the second block
to have size 2.
We will recycle the bases for the null spaces of the powers of A from Example KPNLT rather than recomputing them here. We will also use the same notation used in the proof of Theorem CFNLT.
To begin, {s}_{4} = {n}_{4} − {n}_{3} = 6 − 5 = 1, so we need one vector of K\kern -1.95872pt \left ({T}^{4}\right ) = {ℂ}^{6}, that is not in K\kern -1.95872pt \left ({T}^{3}\right ), to be a basis for {Z}_{4}. We have a lot of latitude in this choice, and we have not described any sure-fire method for constructing a vector outside of a subspace. Looking at the basis for K\kern -1.95872pt \left ({T}^{3}\right ) we see that if a vector is in this subspace, and has a nonzero value in the first entry, then it must also have a nonzero value in the fourth entry. So the vector
will not be an element of K\kern -1.95872pt \left ({T}^{3}\right ) (notice that many other choices could be made here, so our basis will not be unique). This completes the determination of {Z}_{p} = {Z}_{4}.
Next, {s}_{3} = {n}_{3} − {n}_{2} = 5 − 4 = 1, so we again need just a single basis vector for {Z}_{3}. We start by evaluating T with each basis vector of {Z}_{4},
Since {s}_{3} = {s}_{4}, the subspace {R}_{3} is trivial, and there is nothing left to do, {z}_{3,1} is the lone basis vector of {Z}_{3}.
Now {s}_{2} = {n}_{2} − {n}_{1} = 4 − 2 = 2, so the construction of {Z}_{2} will not be as simple as the construction of {Z}_{3}. We first apply T to the basis vector of {Z}_{2},
The two basis vectors of K\kern -1.95872pt \left ({T}^{1}\right ), together with {z}_{2,1}, form a basis for {Q}_{2}. Because \mathop{ dim}\nolimits \left (K\kern -1.95872pt \left ({T}^{2}\right )\right ) −\mathop{ dim}\nolimits \left ({Q}_{ 2}\right ) = 4 − 3 = 1 we need only find a single basis vector for {R}_{2}. This vector must be an element of K\kern -1.95872pt \left ({T}^{2}\right ), but not an element of {Q}_{2}. Again, there is a variety of vectors that fit this description, and we have no precise algorithm for finding them. Since they are plentiful, they are not too hard to find. We add up the four basis vectors of K\kern -1.95872pt \left ({T}^{2}\right ), ensuring an element of K\kern -1.95872pt \left ({T}^{2}\right ). Then we check to see if the vector is a linear combination of three vectors: the two basis vectors of K\kern -1.95872pt \left ({T}^{1}\right ) and {z}_{2,1}. Having passed the tests, we have chosen
Thus, {Z}_{2} = \left \langle \left \{{z}_{2,1},\kern 1.95872pt {z}_{2,2}\right \}\right \rangle .
Lastly, {s}_{1} = {n}_{1} − {n}_{0} = 2 − 0 = 2. Since {s}_{2} = {s}_{1}, we again have a trivial {R}_{1} and need only complete our basis by evaluating the basis vectors of {Z}_{2} with T,
Now we reorder these vectors as the desired basis,
We now apply Definition MR to build a matrix representation of T relative to B,
Installing these vectors as the columns of the matrix representation we have
which is a block diagonal matrix with Jordan blocks {J}_{4}\left (0\right ) and {J}_{2}\left (0\right ). If we constructed the matrix S having the vectors of B as columns, then Theorem SCB tells us that a similarity transformation with S relates the original matrix repreentation of T with the matrix representation consisting of Jordan blocks., i.e. {S}^{−1}AS = {M}_{ B,B}^{T }. ⊠
Notice that constructing interesting examples of matrix representations requires domains with dimensions bigger than just two or three. Going forward we will see several more big examples.