Section T  Trace

From A First Course in Linear Algebra
Version 2.20
© 2004.
Licensed under the GNU Free Documentation License.

This section contributed by Andy Zimmer.

The matrix trace is a function that sends square matrices to scalars. In some ways it is reminiscent of the determinant. And like the determinant, it has many useful and surprising properties.

Definition T
Suppose $A$ is a square matrix of size $n$. Then the trace of $A$, $t\left(A\right)$, is the sum of the diagonal entries of $A$. Symbolically,

\[ t\left(A\right) = \sum_{i=1}^{n}\left[A\right]_{ii} \]

(This definition contains Notation T.)
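As a quick numerical companion to the definition (a sketch, assuming NumPy is available; the matrix below is an arbitrary example), the trace is literally the sum of the diagonal entries:

```python
import numpy as np

# An arbitrary 3 x 3 matrix; its trace is the sum of the diagonal entries.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

# Sum the diagonal entries directly, per Definition T.
trace_by_definition = sum(A[i, i] for i in range(A.shape[0]))

# NumPy's built-in trace computes the same diagonal sum.
trace_builtin = np.trace(A)
```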

The next three proofs make for excellent practice. In some books they would be left as exercises for the reader, as they are all “trivial” in the sense that they rely on nothing but the definition of the matrix trace.

Theorem TL
Trace is Linear
Suppose $A$ and $B$ are square matrices of size $n$. Then $t\left(A+B\right) = t\left(A\right) + t\left(B\right)$. Furthermore, if $\alpha\in\mathbb{C}$, then $t\left(\alpha A\right) = \alpha\,t\left(A\right)$.

Proof   These properties are exactly those required for a linear transformation. To prove these results we just manipulate sums,

\begin{align*}
t\left(A+B\right) &= \sum_{i=1}^{n}\left[A+B\right]_{ii} && \text{Definition T}\\
&= \sum_{i=1}^{n}\left(\left[A\right]_{ii} + \left[B\right]_{ii}\right) && \text{Definition MA}\\
&= \sum_{i=1}^{n}\left[A\right]_{ii} + \sum_{i=1}^{n}\left[B\right]_{ii} && \text{Property CACN}\\
&= t\left(A\right) + t\left(B\right) && \text{Definition T}
\end{align*}

The second part is as straightforward as the first,

\begin{align*}
t\left(\alpha A\right) &= \sum_{i=1}^{n}\left[\alpha A\right]_{ii} && \text{Definition T}\\
&= \sum_{i=1}^{n}\alpha\left[A\right]_{ii} && \text{Definition MSM}\\
&= \alpha\sum_{i=1}^{n}\left[A\right]_{ii} && \text{Property DCN}\\
&= \alpha\, t\left(A\right) && \text{Definition T}
\end{align*}
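Both linearity properties can be sanity-checked numerically. This sketch assumes NumPy; the random matrices and the scalar are arbitrary choices, not part of the original text:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
alpha = 2.5

# Additivity: t(A + B) = t(A) + t(B)
additive_ok = np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))

# Homogeneity: t(alpha A) = alpha t(A)
homogeneous_ok = np.isclose(np.trace(alpha * A), alpha * np.trace(A))
```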

Theorem TSRM
Trace is Symmetric with Respect to Multiplication
Suppose $A$ and $B$ are square matrices of size $n$. Then $t\left(AB\right) = t\left(BA\right)$.


Proof

\begin{align*}
t\left(AB\right) &= \sum_{k=1}^{n}\left[AB\right]_{kk} && \text{Definition T}\\
&= \sum_{k=1}^{n}\sum_{\ell=1}^{n}\left[A\right]_{k\ell}\left[B\right]_{\ell k} && \text{Theorem EMP}\\
&= \sum_{\ell=1}^{n}\sum_{k=1}^{n}\left[A\right]_{k\ell}\left[B\right]_{\ell k} && \text{Property CACN}\\
&= \sum_{\ell=1}^{n}\sum_{k=1}^{n}\left[B\right]_{\ell k}\left[A\right]_{k\ell} && \text{Property CMCN}\\
&= \sum_{\ell=1}^{n}\left[BA\right]_{\ell\ell} && \text{Theorem EMP}\\
&= t\left(BA\right) && \text{Definition T}
\end{align*}
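A numerical check of the theorem (a sketch assuming NumPy; the matrices are arbitrary): $AB$ and $BA$ usually differ as matrices, yet their traces coincide:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))

# The products themselves are generally different matrices...
products_differ = not np.allclose(A @ B, B @ A)

# ...but their traces agree, as Theorem TSRM asserts.
traces_agree = np.isclose(np.trace(A @ B), np.trace(B @ A))
```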

Theorem TIST
Trace is Invariant Under Similarity Transformations
Suppose $A$ and $S$ are square matrices of size $n$ and $S$ is invertible. Then $t\left(S^{-1}AS\right) = t\left(A\right)$.

Proof   Invariant means constant under some operation. In this case the operation is a similarity transformation. A lengthy exercise (but possibly an educational one) would be to prove this result without referencing Theorem TSRM. But here we will use it,

\begin{align*}
t\left(S^{-1}AS\right) &= t\left(\left(S^{-1}A\right)S\right) && \text{Theorem MMA}\\
&= t\left(S\left(S^{-1}A\right)\right) && \text{Theorem TSRM}\\
&= t\left(\left(SS^{-1}\right)A\right) && \text{Theorem MMA}\\
&= t\left(A\right) && \text{Definition MI}
\end{align*}
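This invariance is easy to observe numerically. The sketch below assumes NumPy and relies on the fact that a random matrix is invertible with probability 1:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
S = rng.standard_normal((4, 4))  # invertible with probability 1

# Form the similarity transformation S^{-1} A S.
similar = np.linalg.inv(S) @ A @ S

# The trace is unchanged by the transformation.
invariant = np.isclose(np.trace(similar), np.trace(A))
```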

Now we could define the trace of a linear transformation as the trace of any matrix representation of the transformation. Would this definition be well-defined? That is, will two different representations of the same linear transformation always have the same trace? Why? (Think Theorem SCB.) We will now prove one of the most interesting and surprising results about the trace.

Theorem TSE
Trace is the Sum of the Eigenvalues
Suppose that $A$ is a square matrix of size $n$ with distinct eigenvalues $\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_k$. Then

\[ t\left(A\right) = \sum_{i=1}^{k}\alpha_A\left(\lambda_i\right)\lambda_i \]

Proof   It is amazing that the eigenvalues would have anything to do with the sum of the diagonal entries. Our proof will rely on double counting: we will compute the same quantity in two different ways, thereby proving equality. Our object of interest is the coefficient of $x^{n-1}$ in the characteristic polynomial of $A$ (Definition CP), which will be denoted $\alpha_{n-1}$. From the proof of Theorem NEM we have,

\[ p_A\left(x\right) = (-1)^n (x-\lambda_1)^{\alpha_A\left(\lambda_1\right)} (x-\lambda_2)^{\alpha_A\left(\lambda_2\right)} (x-\lambda_3)^{\alpha_A\left(\lambda_3\right)} \cdots (x-\lambda_k)^{\alpha_A\left(\lambda_k\right)} \]

First we want to prove that $\alpha_{n-1}$ is equal to $(-1)^{n+1}\sum_{i=1}^{k}\alpha_A\left(\lambda_i\right)\lambda_i$, and to do this we will use a straightforward counting argument. Induction can be used here as well (try it), but the intuitive approach is a much stronger technique. Let’s imagine creating each term one by one from the extended product. How do we do this? From each factor $(x-\lambda_i)$ we pick either an $x$ or a $-\lambda_i$. But we are only interested in the terms that result in $x$ to the power $n-1$. Since $\sum_{i=1}^{k}\alpha_A\left(\lambda_i\right) = n$, we have $n$ linear factors of the form $(x-\lambda_i)$. Then to get terms with $x^{n-1}$ we need to pick an $x$ in every factor except one. Since we have $n$ linear factors there are $n$ ways to do this, namely each eigenvalue represented as many times as its algebraic multiplicity. Now we have to take into account the sign of each term. As we pick $n-1$ $x$’s and one $\lambda_i$ (which has a negative sign in the linear factor) we get a factor of $-1$. Then we have to take into account the $(-1)^n$ in front of the characteristic polynomial. Thus $\alpha_{n-1}$ is the sum of these terms,

\[ \alpha_{n-1} = (-1)^{n+1}\sum_{i=1}^{k}\alpha_A\left(\lambda_i\right)\lambda_i \]

Now we will show that $\alpha_{n-1}$ is also equal to $(-1)^{n-1}t\left(A\right)$. For this we will proceed by induction on the size of $A$. If $A$ is a $1\times 1$ square matrix, then $p_A\left(x\right) = \det\left(A - xI_1\right) = \left[A\right]_{11} - x$, and the coefficient of $x^{0}$ is $\left[A\right]_{11} = (-1)^{1-1}t\left(A\right)$. With our base case in hand, let’s assume $A$ is a square matrix of size $n$. By Definition CP,

\[ p_A\left(x\right) = \det\left(A - xI_n\right) = \left[A - xI_n\right]_{11}\det\left(\left(A - xI_n\right)\left(1|1\right)\right) - \left[A - xI_n\right]_{12}\det\left(\left(A - xI_n\right)\left(1|2\right)\right) + \left[A - xI_n\right]_{13}\det\left(\left(A - xI_n\right)\left(1|3\right)\right) - \cdots + (-1)^{n+1}\left[A - xI_n\right]_{1n}\det\left(\left(A - xI_n\right)\left(1|n\right)\right) \]

First let’s consider the maximum degree of $\left[A - xI_n\right]_{1i}\det\left(\left(A - xI_n\right)\left(1|i\right)\right)$ when $i\neq 1$. For polynomials, the degree of $f$, denoted $d(f)$, is the highest power of $x$ in the expression $f(x)$. A well known consequence of this definition is: if $f(x) = g(x)h(x)$ then $d(f) = d(g) + d(h)$ (can you prove this?). Now $\left[A - xI_n\right]_{1i}$ has degree zero when $i\neq 1$. Furthermore $\left(A - xI_n\right)\left(1|i\right)$ has $n-1$ rows, one of which has all of its entries of degree zero, since column $i$ (which held that row’s diagonal entry) is removed. The other $n-2$ rows have one entry of degree one and the remainder of degree zero. Then by Exercise T.T30, the maximum degree of $\left[A - xI_n\right]_{1i}\det\left(\left(A - xI_n\right)\left(1|i\right)\right)$ is $n-2$. So these terms will not affect the coefficient of $x^{n-1}$. Now we are free to focus all of our attention on the term $\left[A - xI_n\right]_{11}\det\left(\left(A - xI_n\right)\left(1|1\right)\right)$. Since $\left(A - xI_n\right)\left(1|1\right) = A\left(1|1\right) - xI_{n-1}$ is an $(n-1)\times(n-1)$ matrix, the induction hypothesis tells us that $\det\left(\left(A - xI_n\right)\left(1|1\right)\right)$ has a coefficient of $(-1)^{n-2}t\left(A\left(1|1\right)\right)$ for $x^{n-2}$. We also note that the proof of Theorem NEM tells us that the leading coefficient of $\det\left(\left(A - xI_n\right)\left(1|1\right)\right)$ is $(-1)^{n-1}$. Then,

\[ \left[A - xI_n\right]_{11}\det\left(\left(A - xI_n\right)\left(1|1\right)\right) = \left(\left[A\right]_{11} - x\right)\left((-1)^{n-1}x^{n-1} + (-1)^{n-2}t\left(A\left(1|1\right)\right)x^{n-2} + \cdots\right) \]

Expanding the product shows $\alpha_{n-1}$ (the coefficient of $x^{n-1}$) to be

\begin{align*}
\alpha_{n-1} &= (-1)^{n-1}\left[A\right]_{11} + (-1)^{n-1}t\left(A\left(1|1\right)\right)\\
&= (-1)^{n-1}\left[A\right]_{11} + (-1)^{n-1}\sum_{k=1}^{n-1}\left[A\left(1|1\right)\right]_{kk} && \text{Definition T}\\
&= (-1)^{n-1}\left(\left[A\right]_{11} + \sum_{k=1}^{n-1}\left[A\left(1|1\right)\right]_{kk}\right) && \text{Property DCN}\\
&= (-1)^{n-1}\left(\left[A\right]_{11} + \sum_{k=2}^{n}\left[A\right]_{kk}\right) && \text{Definition SM}\\
&= (-1)^{n-1}t\left(A\right) && \text{Definition T}
\end{align*}

With two expressions for $\alpha_{n-1}$, we have our result,

\begin{align*}
t\left(A\right) &= (-1)^{n+1}(-1)^{n-1}t\left(A\right)\\
&= (-1)^{n+1}\alpha_{n-1}\\
&= (-1)^{n+1}(-1)^{n+1}\sum_{i=1}^{k}\alpha_A\left(\lambda_i\right)\lambda_i\\
&= \sum_{i=1}^{k}\alpha_A\left(\lambda_i\right)\lambda_i
\end{align*}
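The theorem can be observed numerically as well (a sketch assuming NumPy; the matrix is an arbitrary example). Note that `np.linalg.eigvals` repeats each eigenvalue according to its algebraic multiplicity, which is exactly the weighting in the theorem:

```python
import numpy as np

# An arbitrary real 3 x 3 matrix; its eigenvalues may be complex.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 4.0]])

# Eigenvalues, each listed per its algebraic multiplicity.
eigenvalues = np.linalg.eigvals(A)

# Their sum equals the trace (up to floating-point rounding).
sum_matches = np.isclose(np.sum(eigenvalues), np.trace(A))
```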

Subsection EXC: Exercises

T10 Prove there are no square matrices $A$ and $B$ such that $AB - BA = I_n$.
Contributed by Andy Zimmer

T12 Assume $A$ is a square matrix of size $n$. Prove $t\left(A\right) = t\left(A^t\right)$.
Contributed by Andy Zimmer
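A quick numerical illustration of T12 (not a proof; assumes NumPy): transposition fixes the diagonal, so the trace is unchanged:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# The diagonal of A and of A^t are identical, so the traces agree.
transpose_ok = np.trace(A) == np.trace(A.T)
```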

T20 If $T_n = \left\{\, M \in M_{nn} \mid t\left(M\right) = 0 \,\right\}$ then prove that $T_n$ is a subspace of $M_{nn}$ and determine its dimension.
Contributed by Andy Zimmer

T30 Assume $A$ is an $n\times n$ matrix with polynomial entries. Define $md(A,i)$ to be the maximum degree of the entries in row $i$. Prove that $d(\det A) \le md(A,1) + md(A,2) + \cdots + md(A,n)$. (Hint: If $f(x) = h(x) + g(x)$, then $d(f) \le \max\{d(h), d(g)\}$.)
Contributed by Andy Zimmer Solution

T40 If A is a square matrix, the matrix exponential is defined as

\[ e^A = \sum_{i=0}^{\infty}\frac{A^i}{i!} \]

Prove that $\det\left(e^A\right) = e^{t\left(A\right)}$. (You might want to give some thought to the convergence of the infinite sum as well.)
Contributed by Andy Zimmer
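The identity in T40 can be checked numerically by truncating the series (a sketch, assuming NumPy; `matrix_exp` is a hypothetical helper implementing the truncated sum, not a library routine):

```python
import numpy as np

def matrix_exp(A, terms=30):
    """Truncated matrix exponential: sum of A^i / i! for i = 0 .. terms-1."""
    result = np.zeros_like(A, dtype=float)
    power = np.eye(A.shape[0])  # A^0
    factorial = 1.0             # 0!
    for i in range(terms):
        result += power / factorial
        power = power @ A
        factorial *= (i + 1)
    return result

# A small matrix so the truncated series converges quickly.
A = np.array([[0.5, 0.2],
              [0.1, 0.3]])

lhs = np.linalg.det(matrix_exp(A))  # det(e^A)
rhs = np.exp(np.trace(A))           # e^{t(A)}
```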

Subsection SOL: Solutions

T30 Contributed by Andy Zimmer Statement
We will proceed by induction. If $A$ is a square matrix of size 1, then clearly $d(\det A) \le md(A,1)$. Now assume $A$ is a square matrix of size $n$; then by Theorem DER,

\[ \det A = (-1)^{2}\left[A\right]_{1,1}\det\left(A\left(1|1\right)\right) + (-1)^{3}\left[A\right]_{1,2}\det\left(A\left(1|2\right)\right) + (-1)^{4}\left[A\right]_{1,3}\det\left(A\left(1|3\right)\right) + \cdots + (-1)^{n+1}\left[A\right]_{1,n}\det\left(A\left(1|n\right)\right) \]

Let’s consider the degree of term $j$, $(-1)^{1+j}\left[A\right]_{1,j}\det\left(A\left(1|j\right)\right)$. Since $\left[A\right]_{1,j}$ lies in row 1 of $A$, the definition of the function $md$ gives $d\left(\left[A\right]_{1,j}\right) \le md(A,1)$. We use our induction hypothesis to examine the other part of the product, which tells us that

\[ d\left(\det\left(A\left(1|j\right)\right)\right) \le md\left(A\left(1|j\right),1\right) + md\left(A\left(1|j\right),2\right) + \cdots + md\left(A\left(1|j\right),n-1\right) \]

Furthermore, by the definition of $A\left(1|j\right)$ (Definition SM), row $i$ of $A\left(1|j\right)$ contains only entries drawn from row $i+1$ of $A$ (row 1 is removed, along with one entry of each remaining row). Then,

\begin{align*}
md\left(A\left(1|j\right),1\right) &\le md(A,2)\\
md\left(A\left(1|j\right),2\right) &\le md(A,3)\\
&\ \,\vdots\\
md\left(A\left(1|j\right),n-1\right) &\le md(A,n)
\end{align*}

so that

\[ d\left(\det\left(A\left(1|j\right)\right)\right) \le md\left(A\left(1|j\right),1\right) + \cdots + md\left(A\left(1|j\right),n-1\right) \le md(A,2) + md(A,3) + \cdots + md(A,n) \]

Then using the property that if $f(x) = g(x)h(x)$ then $d(f) = d(g) + d(h)$,

\begin{align*}
d\left((-1)^{1+j}\left[A\right]_{1,j}\det\left(A\left(1|j\right)\right)\right) &= d\left(\left[A\right]_{1,j}\right) + d\left(\det\left(A\left(1|j\right)\right)\right)\\
&\le md(A,1) + md(A,2) + \cdots + md(A,n)
\end{align*}

As $j$ is arbitrary, the degrees of all terms in the determinant are so bounded. Finally, using the fact that if $f(x) = g(x) + h(x)$ then $d(f) \le \max\{d(g), d(h)\}$, we have

\[ d(\det A) \le md(A,1) + md(A,2) + \cdots + md(A,n) \]
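The bound can be spot-checked with a computer algebra system (a sketch, assuming SymPy is available; the matrix and the helper `md` are illustrative, not part of the original text):

```python
import sympy as sp

x = sp.symbols('x')

# An arbitrary 3 x 3 matrix with polynomial entries.
A = sp.Matrix([[x**2 + 1, 3,      x],
               [2*x,      x + 5,  1],
               [7,        x**3,   x**2]])

def md(M, i):
    """Maximum degree among the entries of row i (zero entries count as 0)."""
    return max(sp.degree(M[i, j], x) if M[i, j] != 0 else 0
               for j in range(M.cols))

# Row-wise bound from Exercise T30 versus the actual degree of det(A).
bound = sum(md(A, i) for i in range(A.rows))
actual = sp.degree(sp.expand(A.det()), x)
```

Here the row maxima are 2, 1, and 3, so the bound is 6, while the determinant works out to degree 5 — consistent with the inequality, which need not be tight.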