Section NLT  Nilpotent Linear Transformations

From A First Course in Linear Algebra
Version 2.20
© 2004.
Licensed under the GNU Free Documentation License.
http://linear.ups.edu/

This section is in draft form
Nearly complete

We have seen that some matrices are diagonalizable and some are not. Some authors refer to a non-diagonalizable matrix as defective, but we will study them carefully anyway. Examples of such matrices include Example EMMS4, Example HMEM5, and Example CEMS6. Each of these matrices has at least one eigenvalue with geometric multiplicity strictly less than its algebraic multiplicity, and therefore Theorem DMFE tells us these matrices are not diagonalizable.

Given a square matrix A, it is likely similar to many, many other matrices. Of all these possibilities, which is the best? “Best” is a subjective term, but we might agree that a diagonal matrix is certainly a very nice choice. Unfortunately, as we have seen, this will not always be possible. What form of a matrix is “next-best”? Our goal, which will take us several sections to reach, is to show that every matrix is similar to a matrix that is “nearly-diagonal” (Section JCF). More precisely, every matrix is similar to a matrix with elements on the diagonal, and zeros and ones on the diagonal just above the main diagonal (the “super diagonal”), with zeros everywhere else. In the language of equivalence relations (see Theorem SER), we are determining a systematic representative for each equivalence class. Such a representative for a set of similar matrices is called a canonical form.

We have just discussed the determination of a canonical form as a question about matrices. However, we know that every square matrix creates a natural linear transformation (Theorem MBLT) and every linear transformation with identical domain and codomain has a square matrix representation for each choice of a basis, with a change of basis creating a similarity transformation (Theorem SCB). So we will state, and prove, theorems using the language of linear transformations on abstract vector spaces, while most of our examples will work with square matrices. You can, and should, mentally translate between the two settings frequently and easily.

Subsection NLT: Nilpotent Linear Transformations

We will discover that nilpotent linear transformations are the essential obstacle in a non-diagonalizable linear transformation. So we will study them carefully first, both as an object of inherent mathematical interest, but also as the object at the heart of the argument that leads to a pleasing canonical form for any linear transformation. Once we understand these linear transformations thoroughly, we will be able to easily analyze the structure of any linear transformation.

Definition NLT
Nilpotent Linear Transformation
Suppose that $T\colon V\to V$ is a linear transformation such that there is an integer $p>0$ such that $T^p(\mathbf{v})=\mathbf{0}$ for every $\mathbf{v}\in V$. Then $T$ is nilpotent, and the smallest $p$ for which this condition is met is called the index of $T$.

Of course, the linear transformation $T$ defined by $T(\mathbf{v})=\mathbf{0}$ will qualify as nilpotent of index 1. But are there others?

Example NM64
Nilpotent matrix, size 6, index 4
Recall that our definitions and theorems are being stated for linear transformations on abstract vector spaces, while our examples will work with square matrices (and use the same terms interchangeably). In this case, to demonstrate the existence of nontrivial nilpotent linear transformations, we desire a matrix such that some power of the matrix is the zero matrix. Consider

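\[
A = \begin{bmatrix}
-3 & 3 & -2 & 5 & 0 & -5\\
-3 & 5 & -3 & 4 & 3 & -9\\
-3 & 4 & -2 & 6 & -4 & -3\\
-3 & 3 & -2 & 5 & 0 & -5\\
-3 & 3 & -2 & 4 & 2 & -6\\
-2 & 3 & -2 & 2 & 4 & -7
\end{bmatrix}
\]

and compute powers of $A$,

\[
A^2 = \begin{bmatrix}
1 & -2 & 1 & 0 & -3 & 4\\
0 & -2 & 1 & 1 & -3 & 4\\
3 & 0 & 0 & -3 & 0 & 0\\
1 & -2 & 1 & 0 & -3 & 4\\
0 & -2 & 1 & 1 & -3 & 4\\
-1 & -2 & 1 & 2 & -3 & 4
\end{bmatrix}
\qquad
A^3 = \begin{bmatrix}
1 & 0 & 0 & -1 & 0 & 0\\
1 & 0 & 0 & -1 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & -1 & 0 & 0\\
1 & 0 & 0 & -1 & 0 & 0\\
1 & 0 & 0 & -1 & 0 & 0
\end{bmatrix}
\qquad
A^4 = \mathcal{O}
\]

where $\mathcal{O}$ is the $6\times 6$ zero matrix.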

Thus we can say that A is nilpotent of index 4.
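The powers above can be checked mechanically. The following is a minimal sketch in plain Python, not part of the original text. The entries of `A` are the matrix of this example with signs chosen so that they agree with the displayed powers; treat the signs as an assumption if your copy differs. The helper names `mat_mul` and `is_zero` are ours.

```python
# Matrix A of Example NM64 (signs are an assumption, see the lead-in).
A = [
    [-3, 3, -2, 5, 0, -5],
    [-3, 5, -3, 4, 3, -9],
    [-3, 4, -2, 6, -4, -3],
    [-3, 3, -2, 5, 0, -5],
    [-3, 3, -2, 4, 2, -6],
    [-2, 3, -2, 2, 4, -7],
]

def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_zero(X):
    """True when every entry of X is zero."""
    return all(entry == 0 for row in X for entry in row)

# powers[k] holds A^(k+1), for k = 0, 1, 2, 3.
powers = [A]
for _ in range(3):
    powers.append(mat_mul(powers[-1], A))

# One flag per power: is A^1, A^2, A^3, A^4 the zero matrix?
print([is_zero(P) for P in powers])
```

The first three flags are `False` and the last is `True`, which is exactly the statement that the index is 4.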

Because it will presage some upcoming theorems, we will record some extra information about the eigenvalues and eigenvectors of A here. A has just one eigenvalue, λ = 0, with algebraic multiplicity 6 and geometric multiplicity 2. The eigenspace for this eigenvalue is

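\[
\mathcal{E}_A(0) = \left\langle\left\{
\begin{bmatrix}2\\2\\5\\2\\1\\0\end{bmatrix},\
\begin{bmatrix}-1\\-1\\-5\\-1\\0\\1\end{bmatrix}
\right\}\right\rangle
\]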

If there were degrees of singularity, we might say this matrix was very singular, since zero is an eigenvalue with maximum algebraic multiplicity (Theorem SMZE, Theorem ME). Notice too that A is “far” from being diagonalizable (Theorem DMFE).

Another example.

Example NM62
Nilpotent matrix, size 6, index 2
Consider the matrix

B = 1 1 1 4 3 1 1 1 1 2 3 1 9 105 9 5 15 1 1 1 4 3 1 1 1 0 2 4 2 4 3 1 1 5 5  and compute the second power of B, B2 = 000000 0 0 0 0 0 0 0 00000 0 0 0 0 0 0 0 00000 0 0 0 0 0 0

So B is nilpotent of index 2. Again, the only eigenvalue of B is zero, with algebraic multiplicity 6. The geometric multiplicity of the eigenvalue is 3, as seen in the eigenspace,

B 0 = 1 3 6 1 0 0 , 0 4 7 0 1 0 , 0 2 1 0 0 1

Again, Theorem DMFE tells us that B is far from being diagonalizable.

On a first encounter with the definition of a nilpotent matrix, you might wonder if such a thing was possible at all. That a high power of a nonzero object could be zero is so very different from our experience with scalars that it seems very unnatural. Hopefully the two previous examples were somewhat surprising. But we have seen that matrix algebra does not always behave the way we expect (Example MMNC), and we also now recognize matrix products not just as arithmetic, but as function composition (Theorem MRCLT). We will now turn to some examples of nilpotent matrices which might be more transparent.

Definition JB
Jordan Block
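Given the scalar $\lambda$, the Jordan block $J_n(\lambda)$ is the $n\times n$ matrix defined by

\[
\left[J_n(\lambda)\right]_{ij} =
\begin{cases}
\lambda & i = j\\
1 & j = i+1\\
0 & \text{otherwise}
\end{cases}
\]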

(This definition contains Notation JB.)
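Definition JB translates directly into code. This is a small sketch rather than anything from the original text; the function name `jordan_block` is our own, and indices are 0-based in the code while the definition is 1-based (the conditions $i=j$ and $j=i+1$ are unaffected by the shift).

```python
def jordan_block(n, lam):
    """Return the n x n Jordan block J_n(lam) as a list of rows."""
    return [[lam if i == j else 1 if j == i + 1 else 0 for j in range(n)]
            for i in range(n)]

# Print the size 4 block with lambda = 5, one row per line.
for row in jordan_block(4, 5):
    print(row)
```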

Example JB4
Jordan block, size 4
A simple example of a Jordan block,

\[
J_4(5) = \begin{bmatrix}
5 & 1 & 0 & 0\\
0 & 5 & 1 & 0\\
0 & 0 & 5 & 1\\
0 & 0 & 0 & 5
\end{bmatrix}
\]

We will return to general Jordan blocks later, but in this section we are just interested in Jordan blocks where λ = 0. Here’s an example of why we are specializing in these matrices now.

Example NJB5
Nilpotent Jordan block, size 5
Consider

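\[
J_5(0) = \begin{bmatrix}
0 & 1 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

and compute powers,

\[
\left(J_5(0)\right)^2 = \begin{bmatrix}
0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\qquad
\left(J_5(0)\right)^3 = \begin{bmatrix}
0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

\[
\left(J_5(0)\right)^4 = \begin{bmatrix}
0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\qquad
\left(J_5(0)\right)^5 = \mathcal{O}
\]

So $J_5(0)$ is nilpotent of index 5. As before, we record some information about the eigenvalues and eigenvectors of this matrix. The only eigenvalue is zero, with algebraic multiplicity 5, the maximum possible (Theorem ME). The geometric multiplicity of this eigenvalue is just 1, the minimum possible (Theorem ME), as seen in the eigenspace,

\[
\mathcal{E}_{J_5(0)}(0) = \left\langle\left\{
\begin{bmatrix}1\\0\\0\\0\\0\end{bmatrix}
\right\}\right\rangle
\]

There should not be any real surprises in this example. We can watch the ones in the powers of $J_5(0)$ slowly march off to the upper-right hand corner of the powers. In some vague way, the eigenvalues and eigenvectors of this matrix are equally extreme.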

We can form combinations of Jordan blocks to build a variety of nilpotent matrices. Simply place Jordan blocks on the diagonal of a matrix with zeros everywhere else, to create a block diagonal matrix.

Example NM83
Nilpotent matrix, size 8, index 3
Consider the matrix

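\[
C = \begin{bmatrix}
J_3(0) & \mathcal{O} & \mathcal{O}\\
\mathcal{O} & J_3(0) & \mathcal{O}\\
\mathcal{O} & \mathcal{O} & J_2(0)
\end{bmatrix}
= \begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

and compute powers,

\[
C^2 = \begin{bmatrix}
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\qquad
C^3 = \mathcal{O}
\]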

So C is nilpotent of index 3. You should notice how block diagonal matrices behave in products (much like diagonal matrices) and that it was the largest Jordan block that determined the index of this combination. All eight eigenvalues are zero, and each of the three Jordan blocks contributes one eigenvector to a basis for the eigenspace, resulting in zero having a geometric multiplicity of 3.

It would appear that nilpotent matrices only have zero as an eigenvalue, so the algebraic multiplicity will be the maximum possible. However, by creating block diagonal matrices with Jordan blocks on the diagonal you should be able to attain any desired geometric multiplicity for this lone eigenvalue. Likewise, the size of the largest Jordan block employed will determine the index of the matrix. So nilpotent matrices with various combinations of index and geometric multiplicities are easy to manufacture. The predictable properties of block diagonal matrices in matrix products and eigenvector computations, along with the next theorem, make this possible. You might find Example NJB5 a useful companion to this proof.
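The manufacturing recipe in the previous paragraph can be tried out numerically. The sketch below is ours, not the text's: `nilpotent_block_diagonal`, `rank` and `index_and_nullity` are illustrative helper names, and the rank is computed by ordinary Gaussian elimination over exact fractions. It reports the index and the nullity (the geometric multiplicity of the eigenvalue zero) for a list of block sizes.

```python
from fractions import Fraction

def nilpotent_block_diagonal(sizes):
    """Block diagonal matrix whose diagonal blocks are J_s(0) for s in sizes."""
    n = sum(sizes)
    M = [[0] * n for _ in range(n)]
    start = 0
    for s in sizes:
        for i in range(start, start + s - 1):
            M[i][i + 1] = 1  # superdiagonal ones, never crossing a block boundary
        start += s
    return M

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank(M):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def index_and_nullity(M):
    """Return (index, nullity); index is None if no power up to n vanishes."""
    n = len(M)
    power, index = M, None
    for p in range(1, n + 1):
        if not any(any(row) for row in power):
            index = p
            break
        power = mat_mul(power, M)
    return index, n - rank(M)

print(index_and_nullity(nilpotent_block_diagonal([3, 3, 2])))
```

With block sizes 3, 3, 2 this reproduces the index 3 and geometric multiplicity 3 of Example NM83; changing the list of sizes changes both numbers in the way the paragraph predicts.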

Theorem NJB
Nilpotent Jordan Blocks
The Jordan block $J_n(0)$ is nilpotent of index $n$.

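Proof   While not phrased as an if-then statement, the statement in the theorem is understood to mean that if we have a specific matrix ($J_n(0)$) then we need to establish it is nilpotent of a specified index. The first column of $J_n(0)$ is the zero vector, and the remaining $n-1$ columns are the standard unit vectors $\mathbf{e}_i$, $1\le i\le n-1$ (Definition SUV), which are also the first $n-1$ columns of the size $n$ identity matrix $I_n$. As shorthand, write $J = J_n(0)$.

\[
J = \left[\,\mathbf{0}\mid\mathbf{e}_1\mid\mathbf{e}_2\mid\mathbf{e}_3\mid\dots\mid\mathbf{e}_{n-1}\,\right]
\]

We will use the definition of matrix multiplication (Definition MM), together with a proof by induction (Technique I), to study the powers of $J$. Our claim is that

\[
J^k = \left[\,\mathbf{0}\mid\mathbf{0}\mid\dots\mid\mathbf{0}\mid\mathbf{e}_1\mid\mathbf{e}_2\mid\dots\mid\mathbf{e}_{n-k}\,\right]
\]

for $1\le k\le n$. For the base case, $k=1$, and the definition of $J^1 = J_n(0)$ establishes the claim. For the induction step, first note that $J\mathbf{e}_1 = \mathbf{0}$ and $J\mathbf{e}_i = \mathbf{e}_{i-1}$ for $2\le i\le n$. Then, assuming the claim is true for $k$, we examine the $k+1$ case,

\begin{align*}
J^{k+1} &= JJ^k\\
&= J\left[\,\mathbf{0}\mid\mathbf{0}\mid\dots\mid\mathbf{0}\mid\mathbf{e}_1\mid\mathbf{e}_2\mid\dots\mid\mathbf{e}_{n-k}\,\right] && \text{Induction Hypothesis}\\
&= \left[\,J\mathbf{0}\mid J\mathbf{0}\mid\dots\mid J\mathbf{0}\mid J\mathbf{e}_1\mid J\mathbf{e}_2\mid\dots\mid J\mathbf{e}_{n-k}\,\right] && \text{Definition MM}\\
&= \left[\,\mathbf{0}\mid\mathbf{0}\mid\dots\mid\mathbf{0}\mid\mathbf{0}\mid\mathbf{e}_1\mid\dots\mid\mathbf{e}_{n-k-1}\,\right] && \text{Definition MVP}\\
&= \left[\,\mathbf{0}\mid\mathbf{0}\mid\dots\mid\mathbf{0}\mid\mathbf{e}_1\mid\mathbf{e}_2\mid\dots\mid\mathbf{e}_{n-(k+1)}\,\right]
\end{align*}

This concludes the induction. So $J^k$ has a nonzero entry (a one) in row $n-k$ and column $n$, for $1\le k\le n-1$, and is therefore a nonzero matrix. However, $J^n = \left[\,\mathbf{0}\mid\mathbf{0}\mid\dots\mid\mathbf{0}\,\right] = \mathcal{O}$. By Definition NLT, $J$ is nilpotent of index $n$.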
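To make the induction concrete, here is the case $n=3$ written out in the column notation of the proof. This is only an illustrative instance that we have added; it is not a substitute for the general argument.

```latex
\begin{align*}
J   &= J_3(0) = \left[\,\mathbf{0}\mid\mathbf{e}_1\mid\mathbf{e}_2\,\right]\\
J^2 &= \left[\,J\mathbf{0}\mid J\mathbf{e}_1\mid J\mathbf{e}_2\,\right]
     = \left[\,\mathbf{0}\mid\mathbf{0}\mid\mathbf{e}_1\,\right]\\
J^3 &= J\,J^2 = \left[\,J\mathbf{0}\mid J\mathbf{0}\mid J\mathbf{e}_1\,\right]
     = \left[\,\mathbf{0}\mid\mathbf{0}\mid\mathbf{0}\,\right] = \mathcal{O}
\end{align*}
```

Each multiplication by $J$ pushes the surviving unit vectors one column to the right, so $J^2\neq\mathcal{O}$ while $J^3=\mathcal{O}$, and the index is 3.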

Subsection PNLT: Properties of Nilpotent Linear Transformations

In this subsection we collect some basic properties of nilpotent linear transformations. After studying the examples in the previous section, some of these will be no surprise.

Theorem ENLT
Eigenvalues of Nilpotent Linear Transformations
Suppose that $T\colon V\to V$ is a nilpotent linear transformation and $\lambda$ is an eigenvalue of $T$. Then $\lambda = 0$.

Proof   Let $\mathbf{x}$ be an eigenvector of $T$ for the eigenvalue $\lambda$, and suppose that $T$ is nilpotent with index $p$. Then

\begin{align*}
\mathbf{0} &= T^p(\mathbf{x}) && \text{Definition NLT}\\
&= \lambda^p\mathbf{x} && \text{Theorem EOMP}
\end{align*}

Because $\mathbf{x}$ is an eigenvector, it is nonzero, and therefore Theorem SMEZV tells us that $\lambda^p = 0$ and so $\lambda = 0$.

Paraphrasing, all of the eigenvalues of a nilpotent linear transformation are zero. So in particular, the characteristic polynomial of a nilpotent linear transformation, $T$, on a vector space of dimension $n$, is simply $p_T(x) = x^n$.

The next theorem is not critical for what follows, but it will explain our interest in nilpotent linear transformations. More specifically, it is the first step in backing up the assertion that nilpotent linear transformations are the essential obstacle in a non-diagonalizable linear transformation. While it is not obvious from the statement of the theorem, it says that a nilpotent linear transformation is not diagonalizable, unless it is trivially so.

Theorem DNLT
Diagonalizable Nilpotent Linear Transformations
Suppose the linear transformation $T\colon V\to V$ is nilpotent. Then $T$ is diagonalizable if and only if $T$ is the zero linear transformation.

Proof   We start with the easy direction. Let $n = \dim V$.

($\Leftarrow$) The linear transformation $Z\colon V\to V$ defined by $Z(\mathbf{v}) = \mathbf{0}$ for all $\mathbf{v}\in V$ is nilpotent of index $p = 1$ and a matrix representation relative to any basis of $V$ is the $n\times n$ zero matrix, $\mathcal{O}$. Quite obviously, the zero matrix is a diagonal matrix (Definition DIM) and hence $Z$ is diagonalizable (Definition DZM).

($\Rightarrow$) Assume now that $T$ is diagonalizable, so $\gamma_T(\lambda) = \alpha_T(\lambda)$ for every eigenvalue $\lambda$ (Theorem DMFE). By Theorem ENLT, $T$ has only one eigenvalue (zero), which therefore must have algebraic multiplicity $n$ (Theorem NEM). So the geometric multiplicity of zero will be $n$ as well, $\gamma_T(0) = n$.

Let $B$ be a basis for the eigenspace $\mathcal{E}_T(0)$. Then $B$ is a linearly independent subset of $V$ of size $n$, and by Theorem G will be a basis for $V$. For any $\mathbf{x}\in B$ we have

\begin{align*}
T(\mathbf{x}) &= 0\mathbf{x} && \text{Definition EM}\\
&= \mathbf{0} && \text{Theorem ZSSM}
\end{align*}

So $T$ is identically zero on the basis $B$ of $V$, and since the action of a linear transformation on a basis determines all of the values of the linear transformation (Theorem LTDB), it is easy to see that $T(\mathbf{v}) = \mathbf{0}$ for every $\mathbf{v}\in V$.

So, other than one trivial case (the zero matrix), every nilpotent linear transformation is not diagonalizable. It remains to see what is so “essential” about this broad class of non-diagonalizable linear transformations. For this we now turn to a discussion of kernels of powers of nilpotent linear transformations, beginning with a result about general linear transformations that may not necessarily be nilpotent.

Theorem KPLT
Kernels of Powers of Linear Transformations
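Suppose $T\colon V\to V$ is a linear transformation, where $\dim V = n$. Then there is an integer $m$, $0\le m\le n$, such that

\[
\{\mathbf{0}\} = \mathcal{K}(T^0) \subsetneq \mathcal{K}(T^1) \subsetneq \mathcal{K}(T^2) \subsetneq \dots \subsetneq \mathcal{K}(T^m) = \mathcal{K}(T^{m+1}) = \mathcal{K}(T^{m+2}) = \cdots
\]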

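Proof   There are several items to verify in the conclusion as stated. First, we show that $\mathcal{K}(T^k)\subseteq\mathcal{K}(T^{k+1})$ for any $k$. Choose $\mathbf{z}\in\mathcal{K}(T^k)$. Then

\begin{align*}
T^{k+1}(\mathbf{z}) &= T\left(T^k(\mathbf{z})\right) && \text{Definition LTC}\\
&= T(\mathbf{0}) && \text{Definition KLT}\\
&= \mathbf{0} && \text{Theorem LTTZZ}
\end{align*}

So by Definition KLT, $\mathbf{z}\in\mathcal{K}(T^{k+1})$ and by Definition SSET we have $\mathcal{K}(T^k)\subseteq\mathcal{K}(T^{k+1})$.

Second, we demonstrate the existence of a power $m$ where consecutive powers result in equal kernels. A by-product will be the condition that $m$ can be chosen so that $m\le n$. To the contrary, suppose that

\[
\{\mathbf{0}\} = \mathcal{K}(T^0) \subsetneq \mathcal{K}(T^1) \subsetneq \mathcal{K}(T^2) \subsetneq \dots \subsetneq \mathcal{K}(T^{n-1}) \subsetneq \mathcal{K}(T^n) \subsetneq \mathcal{K}(T^{n+1})
\]

Since $\mathcal{K}(T^k)\subsetneq\mathcal{K}(T^{k+1})$, Theorem PSSD implies that $\dim\mathcal{K}(T^{k+1}) \ge \dim\mathcal{K}(T^k) + 1$. Repeated application of this observation yields

\begin{align*}
\dim\mathcal{K}(T^{n+1}) &\ge \dim\mathcal{K}(T^n) + 1\\
&\ge \dim\mathcal{K}(T^{n-1}) + 2\\
&\;\;\vdots\\
&\ge \dim\mathcal{K}(T^0) + (n+1)\\
&= \dim\{\mathbf{0}\} + n + 1\\
&= n + 1
\end{align*}

Thus, $\mathcal{K}(T^{n+1})$ has a basis of size at least $n+1$, which is a linearly independent set of size greater than $n$ in the vector space $V$ of dimension $n$. This contradicts Theorem G.

This contradiction yields the existence of an integer $k$ such that $\mathcal{K}(T^k) = \mathcal{K}(T^{k+1})$, so we can define $m$ to be the smallest such integer with this property. From the argument above about dimensions resulting from a strictly increasing chain of subspaces, it should be clear that $m\le n$.

It remains to show that once two consecutive kernels are equal, then all of the remaining kernels are equal. More formally, if $\mathcal{K}(T^m) = \mathcal{K}(T^{m+1})$, then $\mathcal{K}(T^m) = \mathcal{K}(T^{m+j})$ for all $j\ge 1$. We will give a proof by induction on $j$ (Technique I). The base case ($j=1$) is precisely our defining property for $m$.

In the induction step, we assume that $\mathcal{K}(T^m) = \mathcal{K}(T^{m+j})$ and endeavor to show that $\mathcal{K}(T^m) = \mathcal{K}(T^{m+j+1})$. At the outset of this proof we established that $\mathcal{K}(T^m)\subseteq\mathcal{K}(T^{m+j+1})$. So Definition SE requires only that we establish the subset inclusion in the opposite direction. To wit, choose $\mathbf{z}\in\mathcal{K}(T^{m+j+1})$. Then

\begin{align*}
\mathbf{0} &= T^{m+j+1}(\mathbf{z}) && \text{Definition KLT}\\
&= T^{m+j}\left(T(\mathbf{z})\right) && \text{Definition LTC}\\
&= T^m\left(T(\mathbf{z})\right) && \text{Induction Hypothesis}\\
&= T^{m+1}(\mathbf{z}) && \text{Definition LTC}\\
&= T^m(\mathbf{z}) && \text{Base Case}
\end{align*}

Here every expression is the zero vector: the induction hypothesis places $T(\mathbf{z})$ in $\mathcal{K}(T^{m+j}) = \mathcal{K}(T^m)$, and the base case places $\mathbf{z}$ in $\mathcal{K}(T^{m+1}) = \mathcal{K}(T^m)$. So by Definition KLT, $\mathbf{z}\in\mathcal{K}(T^m)$ as desired.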
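The chain of kernels in Theorem KPLT is easy to observe numerically by listing the nullities of successive powers. The sketch below is an illustration that we have added. The matrix `M` is a hypothetical example (a copy of $J_2(0)$ alongside two nonzero diagonal entries, so it is not nilpotent), and `rank` is plain Gaussian elimination over exact fractions.

```python
from fractions import Fraction

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank(M):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def kernel_dimensions(M, max_power):
    """Nullities of M^0, M^1, ..., M^max_power."""
    n = len(M)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # M^0 = I
    dims = []
    for _ in range(max_power + 1):
        dims.append(n - rank(power))
        power = mat_mul(power, M)
    return dims

# Hypothetical example: J_2(0) in the upper-left corner, then diagonal entries 1 and 2.
M = [[0, 1, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 2]]

print(kernel_dimensions(M, 5))
```

For this matrix the nullities strictly increase up to the second power and are constant afterwards, so $m=2$, comfortably within the bound $m\le n=4$.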

We now specialize Theorem KPLT to the case of nilpotent linear transformations, which buys us just a bit more precision in the conclusion.

Theorem KPNLT
Kernels of Powers of Nilpotent Linear Transformations
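Suppose $T\colon V\to V$ is a nilpotent linear transformation with index $p$ and $\dim V = n$. Then $0\le p\le n$ and

\[
\{\mathbf{0}\} = \mathcal{K}(T^0) \subsetneq \mathcal{K}(T^1) \subsetneq \mathcal{K}(T^2) \subsetneq \dots \subsetneq \mathcal{K}(T^p) = \mathcal{K}(T^{p+1}) = \dots = V
\]

Proof   Since $T^p = 0$ it follows that $T^{p+j} = 0$ for all $j\ge 0$ and thus $\mathcal{K}(T^{p+j}) = V$ for $j\ge 0$. So the value of $m$ guaranteed by Theorem KPLT is at most $p$. The only remaining aspect of our conclusion that does not follow from Theorem KPLT is that $m = p$. To see this we must show that $\mathcal{K}(T^k)\subsetneq\mathcal{K}(T^{k+1})$ for $0\le k\le p-1$. If $\mathcal{K}(T^k) = \mathcal{K}(T^{k+1})$ for some $k < p$, then $\mathcal{K}(T^k) = \mathcal{K}(T^p) = V$. This implies that $T^k = 0$, violating the fact that $T$ has index $p$. So the smallest value of $m$ is indeed $p$, and since $m\le n$ we learn that $p\le n$.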

The structure of the kernels of powers of nilpotent linear transformations will be crucial to what follows. But immediately we can see a practical benefit. Suppose we are confronted with the question of whether or not an $n\times n$ matrix, $A$, is nilpotent. If we don’t quickly find a low power that equals the zero matrix, when do we stop trying higher and higher powers? Theorem KPNLT gives us the answer: if we don’t see a zero matrix by the time we finish computing $A^n$, then it is not going to ever happen. We’ll now take a look at one example of Theorem KPNLT in action.
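The stopping rule just described gives a finite test for nilpotency. Here is a minimal sketch of it, added for illustration. The name `nilpotency_index` and the two small test matrices are our own choices, not taken from the text.

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def nilpotency_index(M):
    """Index of the n x n matrix M, or None when M is not nilpotent.

    Only the powers M^1, ..., M^n are examined, which suffices by
    Theorem KPNLT.
    """
    n = len(M)
    power = M  # invariant: power == M^p at the top of each iteration
    for p in range(1, n + 1):
        if not any(any(row) for row in power):
            return p
        power = mat_mul(power, M)
    return None

shift = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # the Jordan block J_3(0)
shear = [[1, 1], [0, 1]]                    # hypothetical non-nilpotent matrix

print(nilpotency_index(shift), nilpotency_index(shear))
```

The first matrix is reported with index 3 and the second with `None`, after looking at no more than $n$ powers in either case.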

Example KPNLT
Kernels of powers of a nilpotent linear transformation
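We will recycle the nilpotent matrix $A$ of index 4 from Example NM64. We now know that we would have only needed to look at the first 6 powers of $A$ if the matrix had not been nilpotent. We list bases for the null spaces of the powers of $A$. (Notice how we are using null spaces for matrices interchangeably with kernels of linear transformations, see Theorem KNSI for justification.)

\begin{align*}
\mathcal{N}(A) &= \left\langle\left\{
\begin{bmatrix}2\\2\\5\\2\\1\\0\end{bmatrix},\
\begin{bmatrix}-1\\-1\\-5\\-1\\0\\1\end{bmatrix}
\right\}\right\rangle\\
\mathcal{N}(A^2) &= \left\langle\left\{
\begin{bmatrix}0\\1\\2\\0\\0\\0\end{bmatrix},\
\begin{bmatrix}2\\1\\0\\2\\0\\0\end{bmatrix},\
\begin{bmatrix}0\\-3\\0\\0\\2\\0\end{bmatrix},\
\begin{bmatrix}0\\2\\0\\0\\0\\1\end{bmatrix}
\right\}\right\rangle\\
\mathcal{N}(A^3) &= \left\langle\left\{
\begin{bmatrix}0\\1\\0\\0\\0\\0\end{bmatrix},\
\begin{bmatrix}0\\0\\1\\0\\0\\0\end{bmatrix},\
\begin{bmatrix}1\\0\\0\\1\\0\\0\end{bmatrix},\
\begin{bmatrix}0\\0\\0\\0\\1\\0\end{bmatrix},\
\begin{bmatrix}0\\0\\0\\0\\0\\1\end{bmatrix}
\right\}\right\rangle\\
\mathcal{N}(A^4) &= \left\langle\left\{
\mathbf{e}_1,\ \mathbf{e}_2,\ \mathbf{e}_3,\ \mathbf{e}_4,\ \mathbf{e}_5,\ \mathbf{e}_6
\right\}\right\rangle
\end{align*}

Here $A$, $A^2$, $A^3$ and $A^4$ are the matrices computed in Example NM64, and $\mathbf{e}_i$ are the standard unit vectors of size 6.

With the exception of some convenience scaling of the basis vectors in $\mathcal{N}(A^2)$ these are exactly the basis vectors described in Theorem BNS. We can see that the dimension of $\mathcal{N}(A)$ equals the geometric multiplicity of the zero eigenvalue. Why is this not an accident? We can see the dimensions of the kernels consistently increasing, and we can see that $\mathcal{N}(A^4) = \mathbb{C}^6$. But Theorem KPNLT says a little more. Each successive kernel should be a superset of the previous one. We ought to be able to begin with a basis of $\mathcal{N}(A)$ and extend it to a basis of $\mathcal{N}(A^2)$. Then we should be able to extend a basis of $\mathcal{N}(A^2)$ into a basis of $\mathcal{N}(A^3)$, all with repeated applications of Theorem ELIS. Verify the following,

\begin{align*}
\mathcal{N}(A) &= \left\langle\left\{
\begin{bmatrix}2\\2\\5\\2\\1\\0\end{bmatrix},\
\begin{bmatrix}-1\\-1\\-5\\-1\\0\\1\end{bmatrix}
\right\}\right\rangle\\
\mathcal{N}(A^2) &= \left\langle\left\{
\begin{bmatrix}2\\2\\5\\2\\1\\0\end{bmatrix},\
\begin{bmatrix}-1\\-1\\-5\\-1\\0\\1\end{bmatrix},\
\begin{bmatrix}0\\-3\\0\\0\\2\\0\end{bmatrix},\
\begin{bmatrix}0\\2\\0\\0\\0\\1\end{bmatrix}
\right\}\right\rangle\\
\mathcal{N}(A^3) &= \left\langle\left\{
\begin{bmatrix}2\\2\\5\\2\\1\\0\end{bmatrix},\
\begin{bmatrix}-1\\-1\\-5\\-1\\0\\1\end{bmatrix},\
\begin{bmatrix}0\\-3\\0\\0\\2\\0\end{bmatrix},\
\begin{bmatrix}0\\2\\0\\0\\0\\1\end{bmatrix},\
\begin{bmatrix}0\\0\\0\\0\\0\\1\end{bmatrix}
\right\}\right\rangle\\
\mathcal{N}(A^4) &= \left\langle\left\{
\begin{bmatrix}2\\2\\5\\2\\1\\0\end{bmatrix},\
\begin{bmatrix}-1\\-1\\-5\\-1\\0\\1\end{bmatrix},\
\begin{bmatrix}0\\-3\\0\\0\\2\\0\end{bmatrix},\
\begin{bmatrix}0\\2\\0\\0\\0\\1\end{bmatrix},\
\begin{bmatrix}0\\0\\0\\0\\0\\1\end{bmatrix},\
\begin{bmatrix}0\\0\\0\\1\\0\\0\end{bmatrix}
\right\}\right\rangle
\end{align*}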

Do not be concerned at the moment about how these bases were constructed since we are not describing the applications of Theorem ELIS here. Do verify carefully for each alleged basis that, (1) it is a superset of the basis for the previous kernel, (2) the basis vectors really are members of the kernel of the right power of A, (3) the basis is a linearly independent set, (4) the size of the basis is equal to the size of the basis found previously for each kernel. With these verifications, Theorem G will tell us that we have successfully demonstrated what Theorem KPNLT guarantees.
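The requested verifications can also be delegated to a short program. This is a sketch we have added, under the assumption that the entries and signs of `A` and of the basis vectors below are the ones intended in this example. Items (1) and (4) are built in, because each basis is taken as an initial segment of one list with lengths 2, 4, 5, 6; the code checks items (2) and (3).

```python
from fractions import Fraction

# Matrix A of Example NM64 (signs are an assumption, see the lead-in).
A = [
    [-3, 3, -2, 5, 0, -5],
    [-3, 5, -3, 4, 3, -9],
    [-3, 4, -2, 6, -4, -3],
    [-3, 3, -2, 5, 0, -5],
    [-3, 3, -2, 4, 2, -6],
    [-2, 3, -2, 2, 4, -7],
]

def mat_vec(M, v):
    return [sum(a * x for a, x in zip(row, v)) for row in M]

def in_kernel_of_power(M, v, k):
    """True when M^k v is the zero vector."""
    for _ in range(k):
        v = mat_vec(M, v)
    return not any(v)

def rank(M):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

chain = [
    [2, 2, 5, 2, 1, 0], [-1, -1, -5, -1, 0, 1],  # basis of N(A)
    [0, -3, 0, 0, 2, 0], [0, 2, 0, 0, 0, 1],     # added for N(A^2)
    [0, 0, 0, 0, 0, 1],                          # added for N(A^3)
    [0, 0, 0, 1, 0, 0],                          # added for N(A^4)
]
sizes = [2, 4, 5, 6]  # dimensions of N(A), N(A^2), N(A^3), N(A^4)

def check_chain():
    results = []
    for k, size in enumerate(sizes, start=1):
        basis = chain[:size]
        members = all(in_kernel_of_power(A, v, k) for v in basis)  # item (2)
        independent = rank(basis) == size                          # item (3)
        results.append(members and independent)
    return results

print(check_chain())
```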

Subsection CFNLT: Canonical Form for Nilpotent Linear Transformations

Our main purpose in this section is to find a basis so that a nilpotent linear transformation will have a pleasing, nearly-diagonal matrix representation. Of course, we will not have a definition for “pleasing,” nor for “nearly-diagonal.” But the short answer is that our preferred matrix representation will be built up from Jordan blocks, $J_n(0)$. Here’s the theorem. You will find Example CFNLT helpful as you study this proof, since it uses the same notation, and is large enough to (barely) illustrate the full generality of the theorem.

Theorem CFNLT
Canonical Form for Nilpotent Linear Transformations
Suppose that $T\colon V\to V$ is a nilpotent linear transformation of index $p$. Then there is a basis for $V$ so that the matrix representation, $M^T_{B,B}$, is block diagonal with each block being a Jordan block, $J_n(0)$. The size of the largest block is the index $p$, and the total number of blocks is the nullity of $T$, $n(T)$.

Proof   We will explicitly construct the desired basis, so the proof is constructive (Technique C), and can be used in practice. As we begin, the basis vectors will not be in the proper order, but we will rearrange them at the end of the proof. For convenience, define $n_i = n(T^i)$, so for example, $n_0 = 0$, $n_1 = n(T)$ and $n_p = n(T^p) = \dim V$. Define $s_i = n_i - n_{i-1}$, for $1\le i\le p$, so we can think of $s_i$ as “how much bigger” $\mathcal{K}(T^i)$ is than $\mathcal{K}(T^{i-1})$. In particular, Theorem KPNLT implies that $s_i > 0$ for $1\le i\le p$.

We are going to build a set of vectors $\mathbf{z}_{i,j}$, $1\le i\le p$, $1\le j\le s_i$. Each $\mathbf{z}_{i,j}$ will be an element of $\mathcal{K}(T^i)$ and not an element of $\mathcal{K}(T^{i-1})$. In total, we will obtain a linearly independent set of $\sum_{i=1}^{p}s_i = \sum_{i=1}^{p}(n_i - n_{i-1}) = n_p - n_0 = \dim V$ vectors that form a basis of $V$. We construct this set in pieces, starting at the “wrong” end. Our procedure will build a series of subspaces, $Z_i$, each lying in between $\mathcal{K}(T^{i-1})$ and $\mathcal{K}(T^i)$, having bases $\mathbf{z}_{i,j}$, $1\le j\le s_i$, and which together equal $V$ as a direct sum. Now would be a good time to review the results on direct sums collected in Subsection PD.DS. OK, here we go.

We build the subspace $Z_p$ first (this is what we meant by “starting at the wrong end”). $\mathcal{K}(T^{p-1})$ is a proper subspace of $\mathcal{K}(T^p) = V$ (Theorem KPNLT). Theorem DSFOS says that there is a subspace of $V$ that will pair with the subspace $\mathcal{K}(T^{p-1})$ to form a direct sum of $V$. Call this subspace $Z_p$, and choose vectors $\mathbf{z}_{p,j}$, $1\le j\le s_p$ as a basis of $Z_p$, which we will denote as $B_p$. Note that we have a fair amount of freedom in how to choose these first basis vectors. Several observations will be useful in the next step. First $V = \mathcal{K}(T^{p-1})\oplus Z_p$. The basis $B_p = \{\mathbf{z}_{p,1},\mathbf{z}_{p,2},\mathbf{z}_{p,3},\dots,\mathbf{z}_{p,s_p}\}$ is linearly independent. For $1\le j\le s_p$, $\mathbf{z}_{p,j}\in\mathcal{K}(T^p) = V$. Since the two subspaces of a direct sum have no nonzero vectors in common (Theorem DSZI), for $1\le j\le s_p$, $\mathbf{z}_{p,j}\notin\mathcal{K}(T^{p-1})$. That was comparatively easy.

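If obtaining $Z_p$ was easy, getting $Z_{p-1}$ will be harder. We will repeat the next step $p-1$ times, and so will do it carefully the first time. Eventually, $Z_{p-1}$ will have dimension $s_{p-1}$. However, the first $s_p$ vectors of a basis are straightforward. Define $\mathbf{z}_{p-1,j} = T(\mathbf{z}_{p,j})$, $1\le j\le s_p$. Notice that we have no choice in creating these vectors, they are a consequence of our choices for $\mathbf{z}_{p,j}$. In retrospect (i.e. on a second reading of this proof), you will recognize this as the key step in realizing a matrix representation of a nilpotent linear transformation with Jordan blocks. We need to know that this set of vectors is linearly independent, so start with a relation of linear dependence (Definition RLD), and massage it,

\begin{align*}
\mathbf{0} &= a_1\mathbf{z}_{p-1,1} + a_2\mathbf{z}_{p-1,2} + a_3\mathbf{z}_{p-1,3} + \dots + a_{s_p}\mathbf{z}_{p-1,s_p}\\
&= a_1T(\mathbf{z}_{p,1}) + a_2T(\mathbf{z}_{p,2}) + a_3T(\mathbf{z}_{p,3}) + \dots + a_{s_p}T(\mathbf{z}_{p,s_p})\\
&= T\left(a_1\mathbf{z}_{p,1} + a_2\mathbf{z}_{p,2} + a_3\mathbf{z}_{p,3} + \dots + a_{s_p}\mathbf{z}_{p,s_p}\right) && \text{Theorem LTLC}
\end{align*}

Define $\mathbf{x} = a_1\mathbf{z}_{p,1} + a_2\mathbf{z}_{p,2} + a_3\mathbf{z}_{p,3} + \dots + a_{s_p}\mathbf{z}_{p,s_p}$. The statement just above means that $\mathbf{x}\in\mathcal{K}(T)\subseteq\mathcal{K}(T^{p-1})$ (Definition KLT, Theorem KPNLT). As defined, $\mathbf{x}$ is a linear combination of the basis vectors $B_p$, and therefore $\mathbf{x}\in Z_p$. Thus $\mathbf{x}\in\mathcal{K}(T^{p-1})\cap Z_p$ (Definition SI). Because $V = \mathcal{K}(T^{p-1})\oplus Z_p$, Theorem DSZI tells us that $\mathbf{x} = \mathbf{0}$. Now we recognize the definition of $\mathbf{x}$ as a relation of linear dependence on the linearly independent set $B_p$, and therefore $a_1 = a_2 = \dots = a_{s_p} = 0$ (Definition LI). This establishes the linear independence of $\mathbf{z}_{p-1,j}$, $1\le j\le s_p$ (Definition LI).

We also need to know where the vectors $\mathbf{z}_{p-1,j}$, $1\le j\le s_p$ live. First we demonstrate that they are members of $\mathcal{K}(T^{p-1})$.

\begin{align*}
T^{p-1}(\mathbf{z}_{p-1,j}) &= T^{p-1}\left(T(\mathbf{z}_{p,j})\right)\\
&= T^p(\mathbf{z}_{p,j})\\
&= \mathbf{0}
\end{align*}

So $\mathbf{z}_{p-1,j}\in\mathcal{K}(T^{p-1})$, $1\le j\le s_p$. However, we now show that these vectors are not elements of $\mathcal{K}(T^{p-2})$. Suppose to the contrary (Technique CD) that $\mathbf{z}_{p-1,j}\in\mathcal{K}(T^{p-2})$. Then

\begin{align*}
\mathbf{0} &= T^{p-2}(\mathbf{z}_{p-1,j})\\
&= T^{p-2}\left(T(\mathbf{z}_{p,j})\right)\\
&= T^{p-1}(\mathbf{z}_{p,j})
\end{align*}

which contradicts the earlier statement that $\mathbf{z}_{p,j}\notin\mathcal{K}(T^{p-1})$. So $\mathbf{z}_{p-1,j}\notin\mathcal{K}(T^{p-2})$, $1\le j\le s_p$.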

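Now choose a basis $C_{p-2} = \{\mathbf{u}_1,\mathbf{u}_2,\mathbf{u}_3,\dots,\mathbf{u}_{n_{p-2}}\}$ for $\mathcal{K}(T^{p-2})$. We want to extend this basis by adding in the $\mathbf{z}_{p-1,j}$ to span a subspace of $\mathcal{K}(T^{p-1})$. But first we want to know that this set is linearly independent. Let $a_k$, $1\le k\le n_{p-2}$ and $b_j$, $1\le j\le s_p$ be the scalars in a relation of linear dependence,

\[
\mathbf{0} = a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + \dots + a_{n_{p-2}}\mathbf{u}_{n_{p-2}} + b_1\mathbf{z}_{p-1,1} + b_2\mathbf{z}_{p-1,2} + \dots + b_{s_p}\mathbf{z}_{p-1,s_p}
\]

Then,

\begin{align*}
\mathbf{0} &= T^{p-2}(\mathbf{0})\\
&= T^{p-2}\left(a_1\mathbf{u}_1 + \dots + a_{n_{p-2}}\mathbf{u}_{n_{p-2}} + b_1\mathbf{z}_{p-1,1} + \dots + b_{s_p}\mathbf{z}_{p-1,s_p}\right)\\
&= a_1T^{p-2}(\mathbf{u}_1) + \dots + a_{n_{p-2}}T^{p-2}(\mathbf{u}_{n_{p-2}}) + b_1T^{p-2}(\mathbf{z}_{p-1,1}) + \dots + b_{s_p}T^{p-2}(\mathbf{z}_{p-1,s_p})\\
&= a_1\mathbf{0} + \dots + a_{n_{p-2}}\mathbf{0} + b_1T^{p-2}(\mathbf{z}_{p-1,1}) + \dots + b_{s_p}T^{p-2}(\mathbf{z}_{p-1,s_p})\\
&= b_1T^{p-2}(\mathbf{z}_{p-1,1}) + b_2T^{p-2}(\mathbf{z}_{p-1,2}) + \dots + b_{s_p}T^{p-2}(\mathbf{z}_{p-1,s_p})\\
&= b_1T^{p-2}\left(T(\mathbf{z}_{p,1})\right) + b_2T^{p-2}\left(T(\mathbf{z}_{p,2})\right) + \dots + b_{s_p}T^{p-2}\left(T(\mathbf{z}_{p,s_p})\right)\\
&= b_1T^{p-1}(\mathbf{z}_{p,1}) + b_2T^{p-1}(\mathbf{z}_{p,2}) + \dots + b_{s_p}T^{p-1}(\mathbf{z}_{p,s_p})\\
&= T^{p-1}\left(b_1\mathbf{z}_{p,1} + b_2\mathbf{z}_{p,2} + \dots + b_{s_p}\mathbf{z}_{p,s_p}\right)
\end{align*}

Define $\mathbf{y} = b_1\mathbf{z}_{p,1} + b_2\mathbf{z}_{p,2} + \dots + b_{s_p}\mathbf{z}_{p,s_p}$. The statement just above means that $\mathbf{y}\in\mathcal{K}(T^{p-1})$ (Definition KLT). As defined, $\mathbf{y}$ is a linear combination of the basis vectors $B_p$, and therefore $\mathbf{y}\in Z_p$. Thus $\mathbf{y}\in\mathcal{K}(T^{p-1})\cap Z_p$. Because $V = \mathcal{K}(T^{p-1})\oplus Z_p$, Theorem DSZI tells us that $\mathbf{y} = \mathbf{0}$. Now we recognize the definition of $\mathbf{y}$ as a relation of linear dependence on the linearly independent set $B_p$, and therefore $b_1 = b_2 = \dots = b_{s_p} = 0$ (Definition LI). Return to the full relation of linear dependence with both sets of scalars (the $a_i$ and $b_j$). Now that we know that $b_j = 0$ for $1\le j\le s_p$, this relation of linear dependence simplifies to a relation of linear dependence on just the basis $C_{p-2}$. Therefore, $a_i = 0$, $1\le i\le n_{p-2}$, and we have the desired linear independence.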

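Define a new subspace of $\mathcal{K}(T^{p-1})$ as

\[
Q_{p-1} = \left\langle\left\{\mathbf{u}_1,\mathbf{u}_2,\mathbf{u}_3,\dots,\mathbf{u}_{n_{p-2}},\mathbf{z}_{p-1,1},\mathbf{z}_{p-1,2},\mathbf{z}_{p-1,3},\dots,\mathbf{z}_{p-1,s_p}\right\}\right\rangle
\]

By Theorem DSFOS there exists a subspace of $\mathcal{K}(T^{p-1})$ which will pair with $Q_{p-1}$ to form a direct sum. Call this subspace $R_{p-1}$, so by definition, $\mathcal{K}(T^{p-1}) = Q_{p-1}\oplus R_{p-1}$. We are interested in the dimension of $R_{p-1}$. Note first, that since the spanning set of $Q_{p-1}$ is linearly independent, $\dim Q_{p-1} = n_{p-2} + s_p$. Then

\begin{align*}
\dim R_{p-1} &= \dim\mathcal{K}(T^{p-1}) - \dim Q_{p-1} && \text{Theorem DSD}\\
&= n_{p-1} - (n_{p-2} + s_p)\\
&= (n_{p-1} - n_{p-2}) - s_p\\
&= s_{p-1} - s_p
\end{align*}

Notice that if $s_{p-1} = s_p$, then $R_{p-1}$ is trivial. Now choose a basis of $R_{p-1}$, and denote these $s_{p-1} - s_p$ vectors as $\mathbf{z}_{p-1,s_p+1}$, $\mathbf{z}_{p-1,s_p+2}$, $\mathbf{z}_{p-1,s_p+3}$, …, $\mathbf{z}_{p-1,s_{p-1}}$. This is another occasion to notice that we have some freedom in this choice.

We now have $\mathcal{K}(T^{p-1}) = Q_{p-1}\oplus R_{p-1}$, and we have bases for each of the two subspaces. The union of these two bases will therefore be a linearly independent set in $\mathcal{K}(T^{p-1})$ with size

\begin{align*}
(n_{p-2} + s_p) + (s_{p-1} - s_p) &= n_{p-2} + s_{p-1}\\
&= n_{p-2} + n_{p-1} - n_{p-2}\\
&= n_{p-1} = \dim\mathcal{K}(T^{p-1})
\end{align*}

So, by Theorem G, the following set is a basis of $\mathcal{K}(T^{p-1})$,

\[
\left\{\mathbf{u}_1,\mathbf{u}_2,\mathbf{u}_3,\dots,\mathbf{u}_{n_{p-2}},\mathbf{z}_{p-1,1},\mathbf{z}_{p-1,2},\dots,\mathbf{z}_{p-1,s_p},\mathbf{z}_{p-1,s_p+1},\mathbf{z}_{p-1,s_p+2},\dots,\mathbf{z}_{p-1,s_{p-1}}\right\}
\]

We built up this basis in three parts; we will now split it in two. Define the subspace $Z_{p-1}$ by

\[
Z_{p-1} = \left\langle B_{p-1}\right\rangle = \left\langle\left\{\mathbf{z}_{p-1,1},\mathbf{z}_{p-1,2},\dots,\mathbf{z}_{p-1,s_{p-1}}\right\}\right\rangle
\]

where we have implicitly denoted the basis as $B_{p-1}$. Then Theorem DSFB allows us to split up the basis for $\mathcal{K}(T^{p-1})$ as $C_{p-2}\cup B_{p-1}$ and write

\[
\mathcal{K}(T^{p-1}) = \mathcal{K}(T^{p-2})\oplus Z_{p-1}
\]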

Whew! This is a good place to recap what we have achieved. The vectors $\mathbf{z}_{i,j}$ form bases for the subspaces $Z_i$ and right now

\[
V = \mathcal{K}(T^{p-1})\oplus Z_p = \mathcal{K}(T^{p-2})\oplus Z_{p-1}\oplus Z_p
\]

The key feature of this decomposition of $V$ is that the first $s_p$ vectors in the basis for $Z_{p-1}$ are outputs of the linear transformation $T$ using the basis vectors of $Z_p$ as inputs.

Now we want to further decompose $\mathcal{K}(T^{p-2})$ (into $\mathcal{K}(T^{p-3})$ and $Z_{p-2}$). The procedure is the same as above, so we will only sketch the key steps. Checking the details proceeds in the same manner as above. Technically, we could have set up the preceding as the induction step in a proof by induction (Technique I), but this probably would make the proof harder to understand.

Hit each element of $B_{p-1}$ with $T$, to create vectors $\mathbf{z}_{p-2,j}$, $1\le j\le s_{p-1}$. These vectors form a linearly independent set, and each is an element of $\mathcal{K}(T^{p-2})$, but not an element of $\mathcal{K}(T^{p-3})$. Grab a basis $C_{p-3}$ of $\mathcal{K}(T^{p-3})$ and tack on the newly-created vectors $\mathbf{z}_{p-2,j}$, $1\le j\le s_{p-1}$. This expanded set is linearly independent, and we can define a subspace $Q_{p-2}$ using it as a basis. Theorem DSFOS gives us a subspace $R_{p-2}$ such that $\mathcal{K}(T^{p-2}) = Q_{p-2}\oplus R_{p-2}$. Vectors $\mathbf{z}_{p-2,j}$, $s_{p-1}+1\le j\le s_{p-2}$ are chosen as a basis for $R_{p-2}$ once the relevant dimensions have been verified. The union of $C_{p-3}$ and $\mathbf{z}_{p-2,j}$, $1\le j\le s_{p-2}$ then forms a basis of $\mathcal{K}(T^{p-2})$, which can be split into two parts to yield the decomposition

\[
\mathcal{K}(T^{p-2}) = \mathcal{K}(T^{p-3})\oplus Z_{p-2}
\]

Here $Z_{p-2}$ is the subspace of $\mathcal{K}(T^{p-2})$ with basis $B_{p-2} = \left\{\mathbf{z}_{p-2,j}\mid 1\le j\le s_{p-2}\right\}$. Finally,

\[
V = \mathcal{K}(T^{p-1})\oplus Z_p = \mathcal{K}(T^{p-2})\oplus Z_{p-1}\oplus Z_p = \mathcal{K}(T^{p-3})\oplus Z_{p-2}\oplus Z_{p-1}\oplus Z_p
\]

Again, the key feature of this decomposition is that the first vectors in the basis of $Z_{p-2}$ are outputs of $T$ using vectors from the basis of $Z_{p-1}$ as inputs (and in turn, some of these inputs are outputs of $T$ derived from inputs in $Z_p$).

Now assume we repeat this procedure until we decompose $\mathcal{K}(T^2)$ into subspaces $\mathcal{K}(T)$ and $Z_2$. Finally, decompose $\mathcal{K}(T)$ into subspaces $\mathcal{K}(T^0) = \mathcal{K}(I_V) = \{\mathbf{0}\}$ and $Z_1$, so that we recognize the vectors $\mathbf{z}_{1,j}$, $1\le j\le s_1 = n_1$ as elements of $\mathcal{K}(T)$. The set

\[
B = B_1\cup B_2\cup B_3\cup\dots\cup B_p = \left\{\mathbf{z}_{i,j}\mid 1\le i\le p,\ 1\le j\le s_i\right\}
\]

is linearly independent by Theorem DSLI and has size

\[
\sum_{i=1}^{p}s_i = \sum_{i=1}^{p}(n_i - n_{i-1}) = n_p - n_0 = \dim V
\]

So by Theorem G, $B$ is a basis of $V$. We desire a matrix representation of $T$ relative to $B$ (Definition MR), but first we will reorder the elements of $B$. The following display lists the elements of $B$ in the desired order, when read across the rows left-to-right in the usual way. Notice that we established the existence of these vectors column-by-column, and beginning on the right.

\[
\begin{array}{lllll}
\mathbf{z}_{1,1} & \mathbf{z}_{2,1} & \mathbf{z}_{3,1} & \cdots & \mathbf{z}_{p,1}\\
\mathbf{z}_{1,2} & \mathbf{z}_{2,2} & \mathbf{z}_{3,2} & \cdots & \mathbf{z}_{p,2}\\
\vdots & \vdots & \vdots & & \vdots\\
\mathbf{z}_{1,s_p} & \mathbf{z}_{2,s_p} & \mathbf{z}_{3,s_p} & \cdots & \mathbf{z}_{p,s_p}\\
\mathbf{z}_{1,s_p+1} & \mathbf{z}_{2,s_p+1} & \mathbf{z}_{3,s_p+1} & \cdots & \\
\vdots & \vdots & \vdots & & \\
\mathbf{z}_{1,s_3} & \mathbf{z}_{2,s_3} & \mathbf{z}_{3,s_3} & & \\
\vdots & \vdots & & & \\
\mathbf{z}_{1,s_2} & \mathbf{z}_{2,s_2} & & & \\
\vdots & & & & \\
\mathbf{z}_{1,s_1} & & & &
\end{array}
\]

It is difficult to lay out this table with the notation we have been using, but it would not be especially useful to invent some notation to overcome the difficulty. (One approach would be to define something like the inverse of the nonincreasing function, $i\mapsto s_i$.) Do notice that there are $s_1 = n_1$ rows and $p$ columns. Column $i$ is the basis $B_i$. The vectors in the first column are elements of $\mathcal{K}(T)$. Each row is the same length, or shorter, than the one above it. If we apply $T$ to any vector in the table, other than those in the first column, the output is the preceding vector in the row.

Now contemplate the matrix representation of $T$ relative to $B$ as we read across the rows of the table above. In the first row, $T(\mathbf{z}_{1,1}) = \mathbf{0}$, so the first column of the representation is the zero column. Next, $T(\mathbf{z}_{2,1}) = \mathbf{z}_{1,1}$, so the second column of the representation is a vector with a single one in the first entry, and zeros elsewhere. Next, $T(\mathbf{z}_{3,1}) = \mathbf{z}_{2,1}$, so column 3 of the representation is a zero, then a one, then all zeros. Continuing in this vein, we obtain the first $p$ columns of the representation, which is the Jordan block $J_p(0)$ followed by rows of zeros.

When we apply $T$ to the basis vectors of the second row, what happens? Applying $T$ to the first vector, the result is the zero vector, so the representation gets a zero column. Applying $T$ to the second vector in the row, the output is simply the first vector in that row, making the next column of the representation all zeros plus a lone one, sitting just above the diagonal. Continuing, we create a Jordan block, sitting on the diagonal of the matrix representation. It is not possible in general to state the size of this block, but since the second row is no longer than the first, it cannot have size larger than $p$.

Since there are as many rows as the dimension of $\mathcal{K}(T)$, the representation contains as many Jordan blocks as the nullity of $T$, $n(T)$. Each successive block is no larger than the preceding one, with the first, and largest, having size $p$. The blocks are Jordan blocks since the basis vectors $\mathbf{z}_{i,j}$ were often defined as the result of applying $T$ to other elements of the basis already determined, and then we rearranged the basis into an order that placed outputs of $T$ just before their inputs, excepting the start of each row, which was an element of $\mathcal{K}(T)$.
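One consequence of the table in the proof is worth making explicit: column $i$ holds $s_i$ vectors, so exactly $s_i - s_{i+1}$ rows stop at column $i$ (taking $s_{p+1}=0$), and each such row contributes a block $J_i(0)$. The sketch below, which is our addition and uses the illustrative name `jordan_block_sizes`, turns the list of nullities $n_0, n_1, \dots, n_p$ into the list of block sizes.

```python
def jordan_block_sizes(nullities):
    """Block sizes, largest first, from nullities [n_0, n_1, ..., n_p]."""
    p = len(nullities) - 1
    # s[i - 1] holds s_i = n_i - n_(i-1); the appended 0 plays the role of s_(p+1).
    s = [nullities[i] - nullities[i - 1] for i in range(1, p + 1)]
    s.append(0)
    sizes = []
    for size in range(p, 0, -1):
        count = s[size - 1] - s[size]  # rows of the table with exactly `size` entries
        sizes.extend([size] * count)
    return sizes

# Nullities 0, 2, 4, 5, 6 are those of the powers of A in Example KPNLT.
print(jordan_block_sizes([0, 2, 4, 5, 6]))
# Nullities 0, 3, 6, 8 are those of the powers of C in Example NM83.
print(jordan_block_sizes([0, 3, 6, 8]))
```

The first call predicts blocks of sizes 4 and 2, and the second recovers the sizes 3, 3, 2 used to build the matrix $C$ of Example NM83.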

The proof of Theorem CFNLT is constructive (Technique C), so we can use it to create bases of nilpotent linear transformations with pleasing matrix representations. Recall that Theorem DNLT told us that nilpotent linear transformations are almost never diagonalizable, so this is progress. As we have hinted before, with a nice representation of nilpotent matrices, it will not be difficult to build up representations of other non-diagonalizable matrices. Here is the promised example which illustrates the previous theorem. It is a useful companion to your study of the proof of Theorem CFNLT.

Example CFNLT
Canonical form for a nilpotent linear transformation
The $6\times 6$ matrix, $A$, of Example NM64 is nilpotent of index $p = 4$. If we define the linear transformation $T\colon\mathbb{C}^6\to\mathbb{C}^6$ by $T(\mathbf{x}) = A\mathbf{x}$, then $T$ is nilpotent of index 4 and we can seek a basis of $\mathbb{C}^6$ that yields a matrix representation with Jordan blocks on the diagonal. The nullity of $T$ is 2, so from Theorem CFNLT we can expect the largest Jordan block to be $J_4(0)$, and there will be just two blocks. This only leaves enough room for the second block to have size 2.

We will recycle the bases for the null spaces of the powers of A from Example KPNLT rather than recomputing them here. We will also use the same notation used in the proof of Theorem CFNLT.

To begin, $s_4 = n_4 - n_3 = 6 - 5 = 1$, so we need one vector of $\mathcal{K}(T^4) = \mathbb{C}^6$, that is not in $\mathcal{K}(T^3)$, to be a basis for $Z_4$. We have a lot of latitude in this choice, and we have not described any sure-fire method for constructing a vector outside of a subspace. Looking at the basis for $\mathcal{K}(T^3)$ we see that if a vector is in this subspace, and has a nonzero value in the first entry, then it must also have a nonzero value in the fourth entry. So the vector

\[
\mathbf{z}_{4,1} = \begin{bmatrix}1\\0\\0\\0\\0\\0\end{bmatrix}
\]

will not be an element of $\mathcal{K}(T^3)$ (notice that many other choices could be made here, so our basis will not be unique). This completes the determination of $Z_p = Z_4$.

Next, $s_3 = n_3 - n_2 = 5 - 4 = 1$, so we again need just a single basis vector for $Z_3$. We start by evaluating $T$ with each basis vector of $Z_4$,

\[
\mathbf{z}_{3,1} = T(\mathbf{z}_{4,1}) = A\mathbf{z}_{4,1} = \begin{bmatrix}-3\\-3\\-3\\-3\\-3\\-2\end{bmatrix}
\]

Since $s_3 = s_4$, the subspace $R_3$ is trivial, and there is nothing left to do, $\mathbf{z}_{3,1}$ is the lone basis vector of $Z_3$.

Now $s_2 = n_2 - n_1 = 4 - 2 = 2$, so the construction of $Z_2$ will not be as simple as the construction of $Z_3$. We first apply $T$ to the basis vector of $Z_3$,

\[
\mathbf{z}_{2,1} = T(\mathbf{z}_{3,1}) = A\mathbf{z}_{3,1} = \begin{bmatrix}1\\0\\3\\1\\0\\-1\end{bmatrix}
\]

The two basis vectors of $\mathcal{K}(T^1)$, together with $\mathbf{z}_{2,1}$, form a basis for $Q_2$. Because $\dim\mathcal{K}(T^2) - \dim Q_2 = 4 - 3 = 1$ we need only find a single basis vector for $R_2$. This vector must be an element of $\mathcal{K}(T^2)$, but not an element of $Q_2$. Again, there is a variety of vectors that fit this description, and we have no precise algorithm for finding them. Since they are plentiful, they are not too hard to find. We add up the four basis vectors of $\mathcal{K}(T^2)$, ensuring an element of $\mathcal{K}(T^2)$. Then we check that the vector is not a linear combination of three vectors: the two basis vectors of $\mathcal{K}(T^1)$ and $\mathbf{z}_{2,1}$. Having passed both tests, we have chosen

\[
\mathbf{z}_{2,2} = \begin{bmatrix}2\\1\\2\\2\\2\\1\end{bmatrix}
\]

Thus, $Z_2 = \left\langle\left\{\mathbf{z}_{2,1},\mathbf{z}_{2,2}\right\}\right\rangle$.

Lastly, $s_1 = n_1 - n_0 = 2 - 0 = 2$. Since $s_2 = s_1$, we again have a trivial $R_1$ and need only complete our basis by evaluating the basis vectors of $Z_2$ with $T$,

\begin{align*}
\mathbf{z}_{1,1} &= T(\mathbf{z}_{2,1}) = A\mathbf{z}_{2,1} = \begin{bmatrix}1\\1\\0\\1\\1\\1\end{bmatrix}\\
\mathbf{z}_{1,2} &= T(\mathbf{z}_{2,2}) = A\mathbf{z}_{2,2} = \begin{bmatrix}-2\\-2\\-5\\-2\\-1\\0\end{bmatrix}
\end{align*}

Now we reorder these vectors as the desired basis,

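\[
B = \left\{\mathbf{z}_{1,1},\mathbf{z}_{2,1},\mathbf{z}_{3,1},\mathbf{z}_{4,1},\mathbf{z}_{1,2},\mathbf{z}_{2,2}\right\}
\]

We now apply Definition MR to build a matrix representation of $T$ relative to $B$,

\begin{align*}
\rho_B\left(T(\mathbf{z}_{1,1})\right) &= \rho_B(A\mathbf{z}_{1,1}) = \rho_B(\mathbf{0}) = \begin{bmatrix}0&0&0&0&0&0\end{bmatrix}^t\\
\rho_B\left(T(\mathbf{z}_{2,1})\right) &= \rho_B(A\mathbf{z}_{2,1}) = \rho_B(\mathbf{z}_{1,1}) = \begin{bmatrix}1&0&0&0&0&0\end{bmatrix}^t\\
\rho_B\left(T(\mathbf{z}_{3,1})\right) &= \rho_B(A\mathbf{z}_{3,1}) = \rho_B(\mathbf{z}_{2,1}) = \begin{bmatrix}0&1&0&0&0&0\end{bmatrix}^t\\
\rho_B\left(T(\mathbf{z}_{4,1})\right) &= \rho_B(A\mathbf{z}_{4,1}) = \rho_B(\mathbf{z}_{3,1}) = \begin{bmatrix}0&0&1&0&0&0\end{bmatrix}^t\\
\rho_B\left(T(\mathbf{z}_{1,2})\right) &= \rho_B(A\mathbf{z}_{1,2}) = \rho_B(\mathbf{0}) = \begin{bmatrix}0&0&0&0&0&0\end{bmatrix}^t\\
\rho_B\left(T(\mathbf{z}_{2,2})\right) &= \rho_B(A\mathbf{z}_{2,2}) = \rho_B(\mathbf{z}_{1,2}) = \begin{bmatrix}0&0&0&0&1&0\end{bmatrix}^t
\end{align*}

Installing these vectors as the columns of the matrix representation we have

\[
M^T_{B,B} = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

which is a block diagonal matrix with Jordan blocks $J_4(0)$ and $J_2(0)$. If we constructed the matrix $S$ having the vectors of $B$ as columns, then Theorem SCB tells us that a similarity transformation with $S$ relates the original matrix representation of $T$ with the matrix representation consisting of Jordan blocks, i.e. $S^{-1}AS = M^T_{B,B}$.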
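The similarity relation can be confirmed without computing an inverse, since $S^{-1}AS = M$ is equivalent to $AS = SM$ together with $S$ being nonsingular. The following sketch is our addition. It assumes the entries and signs of `A` and of the six basis vectors are the ones intended in this example, and it uses exact rational arithmetic for the rank.

```python
from fractions import Fraction

# Matrix A of Example NM64 (signs are an assumption, see the lead-in).
A = [
    [-3, 3, -2, 5, 0, -5],
    [-3, 5, -3, 4, 3, -9],
    [-3, 4, -2, 6, -4, -3],
    [-3, 3, -2, 5, 0, -5],
    [-3, 3, -2, 4, 2, -6],
    [-2, 3, -2, 2, 4, -7],
]

# Basis B in the order z11, z21, z31, z41, z12, z22.
basis = [
    [1, 1, 0, 1, 1, 1],
    [1, 0, 3, 1, 0, -1],
    [-3, -3, -3, -3, -3, -2],
    [1, 0, 0, 0, 0, 0],
    [-2, -2, -5, -2, -1, 0],
    [2, 1, 2, 2, 2, 1],
]
S = [[v[i] for v in basis] for i in range(6)]  # basis vectors become columns

# The claimed representation: J_4(0) and J_2(0) on the diagonal.
M = [
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank(mat):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in mat]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# First flag: AS equals SM.  Second flag: S is nonsingular.
print(mat_mul(A, S) == mat_mul(S, M), rank(S) == 6)
```

Both flags come out `True`, so the six vectors do form a basis and the representation relative to it is the displayed block diagonal matrix.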

Notice that constructing interesting examples of matrix representations requires domains with dimensions bigger than just two or three. Going forward we will see several more big examples.