Section T Trace

The matrix trace is a function that sends square matrices to scalars. In some ways it is reminiscent of the determinant. And like the determinant, it has many useful and surprising properties.

Definition T
Trace
Suppose A is a square matrix of size n. Then the trace of A, t\left (A\right ), is the sum of the diagonal entries of A. Symbolically,

\eqalignno{ t\left (A\right ) & ={ \mathop{∑ }}_{i=1}^{n}{\left [A\right ]}_{ ii} & & }

The next three proofs make for excellent practice. In some books they would be left as exercises for the reader as they are all “trivial” in the sense they do not rely on anything but the definition of the matrix trace.

Theorem TL
Trace is Linear
Suppose A and B are square matrices of size n. Then t\left (A + B\right ) = t\left (A\right ) + t\left (B\right ). Furthermore, if α ∈ ℂ, then t\left (αA\right ) = αt\left (A\right ). □

Proof These properties are exactly those required for a linear transformation. To prove these results we just manipulate sums,

\eqalignno{ t\left (A + B\right ) & ={ \mathop{∑ }}_{k=1}^{n}{\left [A + B\right ]}_{ ii} & &\text{@(a href="#definition.T")Definition T@(/a)} & & & & \cr & ={ \mathop{∑ }}_{i=1}^{n}{\left [A\right ]}_{ ii} +{ \left [B\right ]}_{ii} & &\text{@(a href="fcla-jsmath-2.10li30.html#definition.MA")Definition MA@(/a)} & & & & \cr & ={ \mathop{∑ }}_{i=1}^{n}{\left [A\right ]}_{ ii} +{ \mathop{∑ }}_{i=1}^{n}{\left [B\right ]}_{ ii} & &\text{@(a href="fcla-jsmath-2.10li69.html#property.CACN")Property CACN@(/a)} & & & & \cr & = t\left (A\right ) + t\left (B\right ) & &\text{@(a href="#definition.T")Definition T@(/a)} & & & & }

\eqalignno{ t\left (αA\right ) & ={ \mathop{∑ }}_{i=1}^{n}{\left [αA\right ]}_{ ii} & &\text{@(a href="#definition.T")Definition T@(/a)} & & & & \cr & ={ \mathop{∑ }}_{i=1}^{n}α{\left [A\right ]}_{ ii} & &\text{@(a href="fcla-jsmath-2.10li30.html#definition.MSM")Definition MSM@(/a)} & & & & \cr & = α{\mathop{∑ }}_{i=1}^{n}{\left [A\right ]}_{ ii} & &\text{@(a href="fcla-jsmath-2.10li69.html#property.DCN")Property DCN@(/a)} & & & & \cr & = αt\left (A\right ) & &\text{@(a href="#definition.T")Definition T@(/a)} & & & & }

\eqalignno{ t\left (AB\right ) & ={ \mathop{∑ }}_{k=1}^{n}{\left [AB\right ]}_{ kk} & &\text{@(a href="#definition.T")Definition T@(/a)} & & & & \cr & ={ \mathop{∑ }}_{k=1}^{n}{ \mathop{∑ }}_{ℓ=1}^{n}{\left [A\right ]}_{ kℓ}{\left [B\right ]}_{ℓk} & &\text{@(a href="fcla-jsmath-2.10li31.html#theorem.EMP")Theorem EMP@(/a)} & & & & \cr & ={ \mathop{∑ }}_{ℓ=1}^{n}{ \mathop{∑ }}_{k=1}^{n}{\left [A\right ]}_{ kℓ}{\left [B\right ]}_{ℓk} & &\text{@(a href="fcla-jsmath-2.10li69.html#property.CACN")Property CACN@(/a)} & & & & \cr & ={ \mathop{∑ }}_{ℓ=1}^{n}{ \mathop{∑ }}_{k=1}^{n}{\left [B\right ]}_{ ℓk}{\left [A\right ]}_{kℓ} & &\text{@(a href="fcla-jsmath-2.10li69.html#property.CMCN")Property CMCN@(/a)} & & & & \cr & ={ \mathop{∑ }}_{ℓ=1}^{n}{\left [BA\right ]}_{ ℓℓ} & &\text{@(a href="fcla-jsmath-2.10li31.html#theorem.EMP")Theorem EMP@(/a)} & & & & \cr & = t\left (BA\right ) & &\text{@(a href="#definition.T")Definition T@(/a)} & & & & }

Proof Invariant means constant under some operation. In this case the operation is a similarity transformation. A lengthy exercise (but possibly a educational one) would be to prove this result without referencing Theorem TSRM. But here we will,

\eqalignno{ t\left ({S}^{−1}AS\right ) & = t\left (\left ({S}^{−1}A\right )S\right ) & &\text{@(a href="fcla-jsmath-2.10li31.html#theorem.MMA")Theorem MMA@(/a)} & & & & \cr & = t\left (S\left ({S}^{−1}A\right )\right ) & &\text{@(a href="#theorem.TSRM")Theorem TSRM@(/a)} & & & & \cr & = t\left (\left (S{S}^{−1}\right )A\right ) & &\text{@(a href="fcla-jsmath-2.10li31.html#theorem.MMA")Theorem MMA@(/a)} & & & & \cr & = t\left (A\right ) & &\text{@(a href="fcla-jsmath-2.10li32.html#definition.MI")Definition MI@(/a)} & & & & }

Now we could define the trace of a linear transformation as the trace of any matrix representation of the transformation. Would this definition be well-defined? That is, will two different representations of the same linear transformation always have the same trace? Why? (Think Theorem SCB.) We will now prove one of the most interesting and surprising results about the trace.

Theorem TSE
Trace is the Sum of the Eigenvalues
Suppose that A is a square matrix of size n with distinct eigenvalues {λ}_{1},\kern 1.95872pt {λ}_{2},\kern 1.95872pt {λ}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {λ}_{k}. Then

\eqalignno{ t\left (A\right ) & ={ \mathop{∑ }}_{i=1}^{k}{α}_{ A}\left ({λ}_{i}\right ){λ}_{i} & & }

Proof It is amazing that the eigenvalues would have anything to do with the sum of the diagonal entries. Our proof will rely on double counting. We will demonstrate two different ways of counting the same thing therefore proving equality. Our object of interest is the coefficient of {x}^{n−1} in the characteristic polynomial of A (Definition CP), which will be denoted {α}_{n−1}. From the proof of Theorem NEM we have,

\eqalignno{ {p}_{A}\left (x\right ) & = {(−1)}^{n}{(x − {λ}_{ 1})}^{{α}_{A}\left ({λ}_{1}\right )}{(x − {λ}_{ 2})}^{{α}_{A}\left ({λ}_{2}\right )}{(x − {λ}_{ 3})}^{{α}_{A}\left ({λ}_{3}\right )}\mathrel{⋯}{(x − {λ}_{ k})}^{{α}_{A}\left ({λ}_{k}\right )} & & }

First we want to prove that {α}_{n−1} is equal to {(−1)}^{n+1}{\mathop{ \mathop{∑}}\nolimits }_{ i=1}^{k}{α}_{ A}\left ({λ}_{i}\right ){λ}_{i} and to do this we will use a straight forward counting argument. Induction can be used here as well (try it), but the intuitive approach is a much stronger technique. Let’s imagine creating each term one by one from the extended product. How do we do this? From each (x − {λ}_{i}) we pick either a x or a {λ}_{i}. But we are only interested in the terms that result in x to the power n − 1. As {\mathop{ \mathop{∑ }}\nolimits }_{i=1}^{k}{α}_{ A}\left ({λ}_{i}\right ) = n, we have n factors of the form (x − {λ}_{i}). Then to get terms with {x}^{n−1} we need to pick x’s in every (x − {λ}_{i}), except one. Since we have n linear factors there are n ways to do this, namely each eigenvalue represented as many times as it’s algebraic multiplicity. Now we have to take into account the sign of each term. As we pick n − 1 x’s and one {λ}_{i} (which has a negative sign in the linear factor) we get a factor of − 1. Then we have to take into account the {(−1)}^{n} in the characteristic polynomial. Thus {α}_{n−1} is the sum of these terms,

\eqalignno{ {α}_{n−1} = {(−1)}^{n+1}{ \mathop{∑ }}_{i=1}^{k}{α}_{ A}\left ({λ}_{i}\right ){λ}_{i} & & }

Now we will now show that {α}_{n−1} is also equal to {(−1)}^{n−1}t\left (A\right ). For this we will proceed by induction on the size of A. If A is a 1 × 1 square matrix then {p}_{A}\left (x\right ) =\mathop{ det} \left (A − x{I}_{n}\right ) = \left ({\left [A\right ]}_{11} − x\right ) and {(−1)}^{1−1}t\left (A\right ) ={ \left [A\right ]}_{ 11}. With our base case in hand let’s assume A is a square matrix of size n. By Definition CP

\eqalignno{ {p}_{A}\left (x\right )& =\mathop{ det} \left (A − x{I}_{n}\right ) && \cr & ={ \left [A − x{I}_{n}\right ]}_{11}\mathop{ det} \left ((A − x{I}_{n})\left (1|1\right )\right ) −{\left [A − x{I}_{n}\right ]}_{12}\mathop{ det} \left ((A − x{I}_{n})\left (1|2\right )\right )+ && \cr &\quad \quad {\left [A − x{I}_{n}\right ]}_{13}\mathop{ det} \left ((A − x{I}_{n}n)\left (1|3\right )\right ) −\mathrel{⋯} + {(−1)}^{n+1}{\left [A − x{I}_{ n}\right ]}_{1n}\mathop{ det} \left ((A − x{I}_{n})\left (1|n\right )\right )&& }

First let’s consider the maximum degree of {\left [A − x{I}_{n}\right ]}_{1i}\mathop{ det} \left ((A − x{I}_{n})\left (1|i\right )\right ) when i\mathrel{≠}1. For polynomials, the degree of f, denoted d(f), is the highest power of x in the expression f(x). A well known result of this definition is: if f(x) = g(x)h(x) then d(f) = d(g) + d(h) (can you prove this?). Now {\left [A − x{I}_{n}\right ]}_{1i} has degree zero when i\mathrel{≠}1. Furthermore (A − x{I}_{n})\left (1|i\right ) has n − 1 rows, one of which has all of its entries of degree zero, since column i is removed. The other n − 2 rows have one entry with degree one and the remainder of degree zero. Then by Exercise T.T30, the maximum degree of {\left [A − x{I}_{n}\right ]}_{1i}\mathop{ det} \left ((A − x{I}_{n})\left (1|i\right )\right ) is n − 2. So these terms will not affect the coefficient of {x}^{n−1}. Now we are free to focus all of our attention on the term {\left [A − x{I}_{n}\right ]}_{11}\mathop{ det} \left ((A − x{I}_{n})\left (1|1\right )\right ). As A\left (1|1\right ) is a (n − 1) × (n − 1) matrix the induction hypothesis tells us that \mathop{ det} \left ((A − x{I}_{n})\left (1|1\right )\right ) has a coefficient of {(−1)}^{n−2}t\left (A\left (1|1\right )\right ) for {x}^{n−2}. We also note that the proof of Theorem NEM tells us that the leading coefficient of \mathop{ det} \left ((A − x{I}_{n})\left (1|1\right )\right ) is {(−1)}^{n−1}. Then,

\eqalignno{ {\left [A − x{I}_{n}\right ]}_{11}&\mathop{ det} \left ((A − x{I}_{n})\left (1|1\right )\right )& = \left ({\left [A\right ]}_{11} − x\right )\left ({(−1)}^{n−1}{x}^{n−1} + {(−1)}^{n−2}t\left (A\left (1|1\right )\right ){x}^{n−2} + \mathop{\mathop{…}}\right )&&&& }

\eqalignno{ {α}_{n−1} & = {(−1)}^{n−1}{\left [A\right ]}_{ 11} + {(−1)}^{n−1}t\left (A\left (1|1\right )\right ) & & & & \cr & = {(−1)}^{n−1}{\left [A\right ]}_{ 11} + {(−1)}^{n−1}{ \mathop{∑ }}_{k=1}^{n−1}{\left [A\left (1|1\right )\right ]}_{ kk} & &\text{@(a href="#definition.T")Definition T@(/a)} & & & & \cr & = {(−1)}^{n−1}\Big ({\left [A\right ]}_{ 11} +{ \mathop{∑ }}_{k=1}^{n−1}{\left [A\left (1|1\right )\right ]}_{ kk}\Big ) & &\text{@(a href="fcla-jsmath-2.10li69.html#property.DCN")Property DCN@(/a)} & & & & \cr & = {(−1)}^{n−1}\Big ({\left [A\right ]}_{ 11} +{ \mathop{∑ }}_{k=2}^{n}{\left [A\right ]}_{ kk}\Big ) & &\text{@(a href="fcla-jsmath-2.10li44.html#definition.SM")Definition SM@(/a)} & & & & \cr & = {(−1)}^{n−1}t\left (A\right ) & &\text{@(a href="#definition.T")Definition T@(/a)} & & & & }

\eqalignno{ t\left (A\right ) & = {(−1)}^{n+1}{(−1)}^{n−1}t\left (A\right ) & & \cr & = {(−1)}^{n+1}{α}_{ n−1} & & \cr & = {(−1)}^{n+1}{(−1)}^{n+1}{ \mathop{∑ }}_{i=1}^{k}{α}_{ A}\left ({λ}_{i}\right ){λ}_{i} & & \cr & ={ \mathop{∑ }}_{i=1}^{k}{α}_{ A}\left ({λ}_{i}\right ){λ}_{i} & & }

Subsection EXC: Exercises

T10 Prove there are no square matrices A and B such that AB − BA = {I}_{n}.
Contributed by Andy Zimmer

T12 Assume A is a square matrix of size n matrix. Prove t\left (A\right ) = t\left ({A}^{t}\right ).
Contributed by Andy Zimmer

T20 If {T}_{n} = \left \{M ∈ {M}_{nn}\mathrel{∣}t\left (M\right ) = 0\right \} then prove {T}_{n} is a subspace of {M}_{nn} and determine it’s dimension.
Contributed by Andy Zimmer

T30 Assume A is a n × n matrix with polynomial entries. Define md(A,i) to be the maximum degree of the entries in row i. Then d(\mathop{det} \left (A\right )) ≤ md(A, 1) + md(A, 2) + \mathop{\mathop{…}} + md(A,n). (Hint: If f(x) = h(x) + g(x), then d(f) ≤\text{max}\{d(h),d(g)\}.)
Contributed by Andy Zimmer Solution [2477]

\eqalignno{ {e}^{A} ={ \mathop{∑ }}_{i=0}^{∞}{{A}^{i}\over i!} & & }

Prove that \mathop{ det} \left ({e}^{A}\right ) = {e}^{t\left (A\right )}. (You might want to give some thought to the convergence of the infinite sum as well.)
Contributed by Andy Zimmer

Subsection SOL: Solutions

T30 Contributed by Andy Zimmer Statement [2475]
We will proceed by induction. If A is a square matrix of size 1, then clearly d(\mathop{det} \left (A\right )) ≤ md(A, 1). Now assume A is a square matrix of size n then by Theorem DER,

\eqalignno{ \mathop{ det} \left (A\right ) & = {(−1)}^{2}{\left [A\right ]}_{ 1,1}\mathop{ det} \left (A\left (1|1\right )\right ) + {(−1)}^{3}{\left [A\right ]}_{ 1,2}\mathop{ det} \left (A\left (1|2\right )\right ) & & \cr &\quad + {(−1)}^{4}{\left [A\right ]}_{ 1,3}\mathop{ det} \left (A\left (1|3\right )\right ) + \mathrel{⋯} + {(−1)}^{n+1}{\left [A\right ]}_{ 1,n}\mathop{ det} \left (A\left (1|n\right )\right ) & & }

Let’s consider the degree of term j, {(−1)}^{1+j}{\left [A\right ]}_{ 1,j}\mathop{ det} \left (A\left (1|j\right )\right ). By definition of the function md, d({\left [A\right ]}_{1,j}) ≤ md(A,j). We use our induction hypothesis to examine the other part of the product which tells us that

Furthermore by definition of A\left (1|j\right ) (Definition SM) row i of matrix A contains all the entries of the corresponding row in A\left (1|j\right ) then,

\eqalignno{ md(A\left (1|j\right ), 1) & ≤ md(A, 1) & & \cr md(A\left (1|j\right ), 2) & ≤ md(A, 2) & & \cr &\mathop{\mathop{⋮}} & & \cr md(A\left (1|j\right ),j − 1) & ≤ md(A,j − 1) & & \cr md(A\left (1|j\right ),j) & ≤ md(A,j + 1) & & \cr &\mathop{\mathop{⋮}} & & \cr md(A\left (1|j\right ),n − 1) & ≤ md(A,n) & & }

\eqalignno{ d\left (\mathop{det} \left (A\left (1|j\right )\right )\right )& ≤ md(A\left (1|j\right ), 1) + md(A\left (1|j\right ), 2) + \mathrel{⋯} + md(A\left (1|j\right ),n − 1) && \cr & ≤ md(A, 1) + md(A, 2) + \mathrel{⋯} + md(A,j − 1) + md(A,j + 1) + \mathrel{⋯} + md(A,n − 1)&& }

\eqalignno{ d\left ({(−1)}^{1+j}{\left [A\right ]}_{ 1,j}\mathop{ det} \left (A\left (1|j\right )\right )\right )& = d\left ({\left [A\right ]}_{1,j}\right ) + d\left (\mathop{det} \left (A\left (1|j\right )\right )\right ) && \cr & ≤ md(A,j) + md(A, 1) + md(A, 2) + \mathrel{⋯}+ && \cr &\quad \quad md(A,j − 1) + md(A,j + 1) + \mathrel{⋯} + md(A,n)&& \cr & = md(A, 1) + md(A, 2) + \mathrel{⋯} + md(A,n) && }

As j is arbitrary the degree of all terms in the determinant are so bounded. Finally using the fact that if f(x) = g(x) + h(x) then d(f) ≤\text{max}\{d(h),d(g)\} we have