Section SD Similarity and Diagonalization

The notion of matrices being “similar” is a lot like saying two matrices are row-equivalent. Two similar matrices are not equal, but they share many important properties. This section, and later sections in Chapter R will be devoted in part to discovering just what these common properties are.

Definition SIM
Similar Matrices
Suppose A and B are two square matrices of size n. Then A and B are similar if there exists a nonsingular matrix of size n, S, such that A = {S}^{−1}BS. △

We will say “A is similar to B via S” when we want to emphasize the role of S in the relationship between A and B. Also, it doesn’t matter if we say A is similar to B, or B is similar to A. If one statement is true then so is the other, as can be seen by using {S}^{−1} in place of S (see Theorem SER for the careful proof). Finally, we will refer to {S}^{−1}BS as a similarity transformation when we want to emphasize the way S changes B. OK, enough about language, let’s build a few examples.

Example SMS5
Similar matrices of size 5
If you wondered if there are examples of similar matrices, then it won’t be hard to convince you they exist. Define

\eqalignno{ B = \left [\array{ −4&1&−3&−2& 2 \cr 1 &2&−1& 3 &−2 \cr −4&1& 3 & 2 & 2 \cr −3&4&−2&−1&−3 \cr 3 &1&−1& 1 &−4 } \right ] & &S = \left [\array{ 1 & 2 &−1& 1 & 1 \cr 0 & 1 &−1&−2&−1 \cr 1 & 3 &−1& 1 & 1 \cr −2&−3& 3 & 1 &−2 \cr 1 & 3 &−1& 2 & 1 \cr } \right ] & & & & }

\eqalignno{ A & = {S}^{−1}BS & & \cr & = \left [\array{ 10& 1 & 0 & 2 &−5 \cr −1& 0 & 1 & 0 & 0 \cr 3 & 0 & 2 & 1 &−3 \cr 0 & 0 &−1& 0 & 1 \cr −4&−1& 1 &−1& 1 } \right ]\left [\array{ −4&1&−3&−2& 2 \cr 1 &2&−1& 3 &−2 \cr −4&1& 3 & 2 & 2 \cr −3&4&−2&−1&−3 \cr 3 &1&−1& 1 &−4 } \right ]\left [\array{ 1 & 2 &−1& 1 & 1 \cr 0 & 1 &−1&−2&−1 \cr 1 & 3 &−1& 1 & 1 \cr −2&−3& 3 & 1 &−2 \cr 1 & 3 &−1& 2 & 1 } \right ] & & \cr & = \left [\array{ −10&−27&−29&−80&−25 \cr −2 & 6 & 6 & 10 & −2 \cr −3 & 11 & −9 &−14& −9 \cr −1 &−13& 0 &−10& −1 \cr 11 & 35 & 6 & 49 & 19 } \right ] & & }

\eqalignno{ B = \left [\array{ −13&−8&−4 \cr 12 & 7 & 4 \cr 24 &16& 7 } \right ] & &S = \left [\array{ 1 & 1 & 2 \cr −2&−1&−3 \cr 1 &−2& 0 } \right ] & & & & }

\eqalignno{ A & = {S}^{−1}BS & & \cr & = \left [\array{ −6&−4&−1 \cr −3&−2&−1 \cr 5 & 3 & 1 } \right ]\left [\array{ −13&−8&−4 \cr 12 & 7 & 4 \cr 24 &16& 7 } \right ]\left [\array{ 1 & 1 & 2 \cr −2&−1&−3 \cr 1 &−2& 0 } \right ] & & \cr & = \left [\array{ −1&0& 0 \cr 0 &3& 0 \cr 0 &0&−1 } \right ] & & }

So by this construction, we know that A and B are similar. But before we move on, look at how pleasing the form of A is. Not convinced? Then consider that several computations related to A are especially easy. For example, in the spirit of Example DUTM, \mathop{ det} \left (A\right ) = (−1)(3)(−1) = 3. Similarly, the characteristic polynomial is straightforward to compute by hand, {p}_{A}\left (x\right ) = (−1 − x)(3 − x)(−1 − x) = −(x − 3){(x + 1)}^{2} and since the result is already factored, the eigenvalues are transparently λ = 3,\kern 1.95872pt − 1. Finally, the eigenvectors of A are just the standard unit vectors (Definition SUV). ⊠

Subsection PSM: Properties of Similar Matrices

Similar matrices share many properties and it is these theorems that justify the choice of the word “similar.” First we will show that similarity is an equivalence relation. Equivalence relations are important in the study of various algebras and can always be regarded as a kind of weak version of equality. Sort of alike, but not quite equal. The notion of two matrices being row-equivalent is an example of an equivalence relation we have been working with since the beginning of the course (see Exercise RREF.T11). Row-equivalent matrices are not equal, but they are a lot alike. For example, row-equivalent matrices have the same rank. Formally, an equivalence relation requires three conditions hold: reflexive, symmetric and transitive. We will illustrate these as we prove that similarity is an equivalence relation.

Proof To see that A is similar to A, we need only demonstrate a nonsingular matrix that effects a similarity transformation of A to A. {I}_{n} is nonsingular (since it row-reduces to the identity matrix, Theorem NMRRI), and

If we assume that A is similar to B, then we know there is a nonsingular matrix S so that A = {S}^{−1}BS by Definition SIM. By Theorem MIMI, {S}^{−1} is invertible, and by Theorem NI is therefore nonsingular. So

\eqalignno{ {({S}^{−1})}^{−1}A({S}^{−1}) & = SA{S}^{−1} & &\text{@(a href="fcla-jsmath-2.00li32.html#theorem.MIMI")Theorem MIMI@(/a)} & & & & \cr & = S{S}^{−1}BS{S}^{−1} & &\text{@(a href="#definition.SIM")Definition SIM@(/a)} & & & & \cr & = \left (S{S}^{−1}\right )B\left (S{S}^{−1}\right ) & &\text{@(a href="fcla-jsmath-2.00li31.html#theorem.MMA")Theorem MMA@(/a)} & & & & \cr & = {I}_{n}B{I}_{n} & &\text{@(a href="fcla-jsmath-2.00li32.html#definition.MI")Definition MI@(/a)} & & & & \cr & = B & &\text{@(a href="fcla-jsmath-2.00li31.html#theorem.MMIM")Theorem MMIM@(/a)} & & & & }

Assume that A is similar to B, and B is similar to C. This gives us the existence of two nonsingular matrices, S and R, such that A = {S}^{−1}BS and B = {R}^{−1}CR, by Definition SIM. (Notice how we have to assume S\mathrel{≠}R, as will usually be the case.) Since S and R are invertible, so too RS is invertible by Theorem SS and then nonsingular by Theorem NI. Now

\eqalignno{ {(RS)}^{−1}C(RS) & = {S}^{−1}{R}^{−1}CRS & &\text{@(a href="fcla-jsmath-2.00li32.html#theorem.SS")Theorem SS@(/a)} & & & & \cr & = {S}^{−1}\left ({R}^{−1}CR\right )S & &\text{@(a href="fcla-jsmath-2.00li31.html#theorem.MMA")Theorem MMA@(/a)} & & & & \cr & = {S}^{−1}BS & &\text{@(a href="#definition.SIM")Definition SIM@(/a)} & & & & \cr & = A & & & & }

Here’s another theorem that tells us exactly what sorts of properties similar matrices share.

Theorem SMEE
Similar Matrices have Equal Eigenvalues
Suppose A and B are similar matrices. Then the characteristic polynomials of A and B are equal, that is, {p}_{A}\left (x\right ) = {p}_{B}\left (x\right ). □

Proof Let n denote the size of A and B. Since A and B are similar, there exists a nonsingular matrix S, such that A = {S}^{−1}BS (Definition SIM). Then

\eqalignno{ {p}_{A}\left (x\right ) & =\mathop{ det} \left (A − x{I}_{n}\right ) & &\text{@(a href="fcla-jsmath-2.00li47.html#definition.CP")Definition CP@(/a)} & & & & \cr & =\mathop{ det} \left ({S}^{−1}BS − x{I}_{ n}\right ) & &\text{@(a href="#definition.SIM")Definition SIM@(/a)} & & & & \cr & =\mathop{ det} \left ({S}^{−1}BS − x{S}^{−1}{I}_{ n}S\right ) & &\text{@(a href="fcla-jsmath-2.00li31.html#theorem.MMIM")Theorem MMIM@(/a)} & & & & \cr & =\mathop{ det} \left ({S}^{−1}BS − {S}^{−1}x{I}_{ n}S\right ) & &\text{@(a href="fcla-jsmath-2.00li31.html#theorem.MMSMM")Theorem MMSMM@(/a)} & & & & \cr & =\mathop{ det} \left ({S}^{−1}\left (B − x{I}_{ n}\right )S\right ) & &\text{@(a href="fcla-jsmath-2.00li31.html#theorem.MMDAA")Theorem MMDAA@(/a)} & & & & \cr & =\mathop{ det} \left ({S}^{−1}\right )\mathop{ det} \left (B − x{I}_{ n}\right )\mathop{ det} \left (S\right ) & &\text{@(a href="fcla-jsmath-2.00li45.html#theorem.DRMM")Theorem DRMM@(/a)} & & & & \cr & =\mathop{ det} \left ({S}^{−1}\right )\mathop{ det} \left (S\right )\mathop{ det} \left (B − x{I}_{ n}\right ) & &\text{@(a href="fcla-jsmath-2.00li69.html#property.CMCN")Property CMCN@(/a)} & & & & \cr & =\mathop{ det} \left ({S}^{−1}S\right )\mathop{ det} \left (B − x{I}_{ n}\right ) & &\text{@(a href="fcla-jsmath-2.00li45.html#theorem.DRMM")Theorem DRMM@(/a)} & & & & \cr & =\mathop{ det} \left ({I}_{n}\right )\mathop{ det} \left (B − x{I}_{n}\right ) & &\text{@(a href="fcla-jsmath-2.00li32.html#definition.MI")Definition MI@(/a)} & & & & \cr & = 1\mathop{ det} \left (B − x{I}_{n}\right ) & &\text{@(a href="fcla-jsmath-2.00li44.html#definition.DM")Definition DM@(/a)} & & & & \cr & = {p}_{B}\left (x\right ) & &\text{@(a href="fcla-jsmath-2.00li47.html#definition.CP")Definition CP@(/a)} & & & & }

So similar matrices not only have the same set of eigenvalues, the algebraic multiplicities of these eigenvalues will also be the same. However, be careful with this theorem. It is tempting to think the converse is true, and argue that if two matrices have the same eigenvalues, then they are similar. Not so, as the following example illustrates.

\eqalignno{ A & = \left [\array{ 1&1 \cr 0&1 } \right ] &B & = \left [\array{ 1&0 \cr 0&1 } \right ] & & & & }

and so A and B have equal characteristic polynomials. If the converse of Theorem SMEE were true, then A and B would be similar. Suppose this is the case. More precisely, suppose there is a nonsingular matrix S so that A = {S}^{−1}BS. Then

Clearly A\mathrel{≠}{I}_{2} and this contradiction tells us that the converse of Theorem SMEE is false. ⊠

Subsection D: Diagonalization

Good things happen when a matrix is similar to a diagonal matrix. For example, the eigenvalues of the matrix are the entries on the diagonal of the diagonal matrix. And it can be a much simpler matter to compute high powers of the matrix. Diagonalizable matrices are also of interest in more abstract settings. Here are the relevant definitions, then our main theorem for this section.

Definition DIM
Diagonal Matrix
Suppose that A is a square matrix. Then A is a diagonal matrix if {\left [A\right ]}_{ij} = 0 whenever i\mathrel{≠}j. △

Definition DZM
Diagonalizable Matrix
Suppose A is a square matrix. Then A is diagonalizable if A is similar to a diagonal matrix. △

and is similar to a diagonal matrix, as can be seen by the following computation with the nonsingular matrix S,

\eqalignno{ {S}^{−1}BS & ={ \left [\array{ −5&−3&−2 \cr 3 & 2 & 1 \cr 1 & 1 & 1 } \right ]}^{−1}\left [\array{ −7&−6&−12 \cr 5 & 5 & 7 \cr 1 & 0 & 4 } \right ]\left [\array{ −5&−3&−2 \cr 3 & 2 & 1 \cr 1 & 1 & 1 } \right ] & & \cr & = \left [\array{ −1&−1&−1 \cr 2 & 3 & 1 \cr −1&−2& 1 } \right ]\left [\array{ −7&−6&−12 \cr 5 & 5 & 7 \cr 1 & 0 & 4 } \right ]\left [\array{ −5&−3&−2 \cr 3 & 2 & 1 \cr 1 & 1 & 1 } \right ] & & \cr & = \left [\array{ −1&0&0 \cr 0 &1&0 \cr 0 &0&2 } \right ] & & }

Example SMS3 provides yet another example of a matrix that is subjected to a similarity transformation and the result is a diagonal matrix. Alright, just how would we find the magic matrix S that can be used in a similarity transformation to produce a diagonal matrix? Before you read the statement of the next theorem, you might study the eigenvalues and eigenvectors of Archetype B and compute the eigenvalues and eigenvectors of the matrix in Example SMS3.

Theorem DC
Diagonalization Characterization
Suppose A is a square matrix of size n. Then A is diagonalizable if and only if there exists a linearly independent set S that contains n eigenvectors of A. □

Proof ( ⇐) Let S = \left \{{x}_{1},\kern 1.95872pt {x}_{2},\kern 1.95872pt {x}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {x}_{n}\right \} be a linearly independent set of eigenvectors of A for the eigenvalues {λ}_{1},\kern 1.95872pt {λ}_{2},\kern 1.95872pt {λ}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {λ}_{n}. Recall Definition SUV and define

\eqalignno{ R & = \left [{x}_{1}|{x}_{2}|{x}_{3}|\mathop{\mathop{…}}|{x}_{n}\right ] & & \cr D & = \left [\array{ {λ}_{1}& 0 & 0 &\mathrel{⋯}\kern 1.95872pt & 0 \cr 0 &{λ}_{2}& 0 &\mathrel{⋯}\kern 1.95872pt & 0 \cr 0 & 0 &{λ}_{3}&\mathrel{⋯}\kern 1.95872pt & 0 \cr \mathop{\mathop{⋮}} & \mathop{\mathop{⋮}} & \mathop{\mathop{⋮}} & & \mathop{\mathop{⋮}} \cr 0 & 0 & 0 &\mathrel{⋯}\kern 1.95872pt &{λ}_{n} } \right ] = [{λ}_{1}{e}_{1}|{λ}_{2}{e}_{2}|{λ}_{3}{e}_{3}|\mathop{\mathop{…}}|{λ}_{n}{e}_{n}] & & }

The columns of R are the vectors of the linearly independent set S and so by Theorem NMLIC the matrix R is nonsingular. By Theorem NI we know {R}^{−1} exists.

\eqalignno{ {R}^{−1}AR & = {R}^{−1}A\left [{x}_{ 1}|{x}_{2}|{x}_{3}|\mathop{\mathop{…}}|{x}_{n}\right ] & & & & \cr & = {R}^{−1}[A{x}_{ 1}|A{x}_{2}|A{x}_{3}|\mathop{\mathop{…}}|A{x}_{n}] & &\text{@(a href="fcla-jsmath-2.00li31.html#definition.MM")Definition MM@(/a)} & & & & \cr & = {R}^{−1}[{λ}_{ 1}{x}_{1}|{λ}_{2}{x}_{2}|{λ}_{3}{x}_{3}|\mathop{\mathop{…}}|{λ}_{n}{x}_{n}] & &\text{@(a href="fcla-jsmath-2.00li47.html#definition.EEM")Definition EEM@(/a)} & & & & \cr & = {R}^{−1}[{λ}_{ 1}R{e}_{1}|{λ}_{2}R{e}_{2}|{λ}_{3}R{e}_{3}|\mathop{\mathop{…}}|{λ}_{n}R{e}_{n}] & &\text{@(a href="fcla-jsmath-2.00li31.html#definition.MVP")Definition MVP@(/a)} & & & & \cr & = {R}^{−1}[R({λ}_{ 1}{e}_{1})|R({λ}_{2}{e}_{2})|R({λ}_{3}{e}_{3})|\mathop{\mathop{…}}|R({λ}_{n}{e}_{n})] & &\text{@(a href="fcla-jsmath-2.00li31.html#theorem.MMSMM")Theorem MMSMM@(/a)} & & & & \cr & = {R}^{−1}R[{λ}_{ 1}{e}_{1}|{λ}_{2}{e}_{2}|{λ}_{3}{e}_{3}|\mathop{\mathop{…}}|{λ}_{n}{e}_{n}] & &\text{@(a href="fcla-jsmath-2.00li31.html#definition.MM")Definition MM@(/a)} & & & & \cr & = {I}_{n}D & &\text{@(a href="fcla-jsmath-2.00li32.html#definition.MI")Definition MI@(/a)} & & & & \cr & = D & &\text{@(a href="fcla-jsmath-2.00li31.html#theorem.MMIM")Theorem MMIM@(/a)} & & & & }

This says that A is similar to the diagonal matrix D via the nonsingular matrix R. Thus A is diagonalizable (Definition DZM).

( ⇒) Suppose that A is diagonalizable, so there is a nonsingular matrix of size n

\eqalignno{ T & = \left [{y}_{1}|{y}_{2}|{y}_{3}|\mathop{\mathop{…}}|{y}_{n}\right ] & & & & \text{and a diagonal matrix (recall @(a href="fcla-jsmath-2.00li28.html#definition.SUV")Definition SUV@(/a))} \cr E & = \left [\array{ {d}_{1}& 0 & 0 &\mathrel{⋯}\kern 1.95872pt & 0 \cr 0 &{d}_{2}& 0 &\mathrel{⋯}\kern 1.95872pt & 0 \cr 0 & 0 &{d}_{3}&\mathrel{⋯}\kern 1.95872pt & 0 \cr \mathop{\mathop{⋮}}&\mathop{\mathop{⋮}}&\mathop{\mathop{⋮}}& & \mathop{\mathop{⋮}} \cr 0 & 0 & 0 &\mathrel{⋯}\kern 1.95872pt &{d}_{n} } \right ] = [{d}_{1}{e}_{1}|{d}_{2}{e}_{2}|{d}_{3}{e}_{3}|\mathop{\mathop{…}}|{d}_{n}{e}_{n}] & &\text{} & & & & }

\eqalignno{ [A{y}_{1}|A{y}_{2}|A{y}_{3}|\mathop{\mathop{…}}|A{y}_{n}]& = A\left [{y}_{1}|{y}_{2}|{y}_{3}|\mathop{\mathop{…}}|{y}_{n}\right ] &&\text{@(a href="fcla-jsmath-2.00li31.html#definition.MM")Definition MM@(/a)} &&&& \cr & = AT && && \cr & = {I}_{n}AT &&\text{@(a href="fcla-jsmath-2.00li31.html#theorem.MMIM")Theorem MMIM@(/a)}&&&& \cr & = T{T}^{−1}AT &&\text{@(a href="fcla-jsmath-2.00li32.html#definition.MI")Definition MI@(/a)} &&&& \cr & = TE && && \cr & = T[{d}_{1}{e}_{1}|{d}_{2}{e}_{2}|{d}_{3}{e}_{3}|\mathop{\mathop{…}}|{d}_{n}{e}_{n}] && && \cr & = [T({d}_{1}{e}_{1})|T({d}_{2}{e}_{2})|T({d}_{3}{e}_{3})|\mathop{\mathop{…}}|T({d}_{n}{e}_{n})]&&\text{@(a href="fcla-jsmath-2.00li31.html#definition.MM")Definition MM@(/a)} &&&& \cr & = [{d}_{1}T{e}_{1}|{d}_{2}T{e}_{2}|{d}_{3}T{e}_{3}|\mathop{\mathop{…}}|{d}_{n}T{e}_{n}] &&\text{@(a href="fcla-jsmath-2.00li31.html#definition.MM")Definition MM@(/a)} &&&& \cr & = [{d}_{1}{y}_{1}|{d}_{2}{y}_{2}|{d}_{3}{y}_{3}|\mathop{\mathop{…}}|{d}_{n}{y}_{n}] &&\text{@(a href="fcla-jsmath-2.00li31.html#definition.MVP")Definition MVP@(/a)} &&&& }

Notice that the proof of Theorem DC is constructive. To diagonalize a matrix, we need only locate n linearly independent eigenvectors. Then we can construct a nonsingular matrix using the eigenvectors as columns (R) so that {R}^{−1}AR is a diagonal matrix (D). The entries on the diagonal of D will be the eigenvalues of the eigenvectors used to create R, in the same order as the eigenvectors appear in R. We illustrate this by diagonalizing some matrices.

\eqalignno{ λ & = 3 &{ℰ}_{F }\left (3\right ) & = \left \langle \left \{\left [\array{ −{1\over 2} \cr {1\over 2} \cr 1 } \right ]\right \}\right \rangle & & & & \cr λ & = −1 &{ℰ}_{F }\left (−1\right ) & = \left \langle \left \{\left [\array{ −{2\over 3} \cr 1 \cr 0 } \right ],\kern 1.95872pt \left [\array{ −{1\over 3} \cr 0 \cr 1 } \right ]\right \}\right \rangle & & & & }

Define the matrix S to be the 3 × 3 matrix whose columns are the three basis vectors in the eigenspaces for F,

Check that S is nonsingular (row-reduces to the identity matrix, Theorem NMRRI or has a nonzero determinant, Theorem SMZD). Then the three columns of S are a linearly independent set (Theorem NMLIC). By Theorem DC we now know that F is diagonalizable. Furthermore, the construction in the proof of Theorem DC tells us that if we apply the matrix S to F in a similarity transformation, the result will be a diagonal matrix with the eigenvalues of F on the diagonal. The eigenvalues appear on the diagonal of the matrix in the same order as the eigenvectors appear in S. So,

\eqalignno{ {S}^{−1}FS & ={ \left [\array{ −{1\over 2}&−{2\over 3}&−{1\over 3} \cr {1\over 2} & 1 & 0 \cr 1 & 0 & 1 } \right ]}^{−1}\left [\array{ −13&−8&−4 \cr 12 & 7 & 4 \cr 24 &16& 7 } \right ]\left [\array{ −{1\over 2}&−{2\over 3}&−{1\over 3} \cr {1\over 2} & 1 & 0 \cr 1 & 0 & 1 } \right ] & & \cr & = \left [\array{ 6 & 4 & 2 \cr −3&−1&−1 \cr −6&−4&−1 } \right ]\left [\array{ −13&−8&−4 \cr 12 & 7 & 4 \cr 24 &16& 7 } \right ]\left [\array{ −{1\over 2}&−{2\over 3}&−{1\over 3} \cr {1\over 2} & 1 & 0 \cr 1 & 0 & 1 } \right ] & & \cr & = \left [\array{ 3& 0 & 0 \cr 0&−1& 0 \cr 0& 0 &−1 } \right ] & & }

Note that the above computations can be viewed two ways. The proof of Theorem DC tells us that the four matrices (F, S, {F}^{−1} and the diagonal matrix) will interact the way we have written the equation. Or as an example, we can actually perform the computations to verify what the theorem predicts. ⊠

The dimension of an eigenspace can be no larger than the algebraic multiplicity of the eigenvalue by Theorem ME. When every eigenvalue’s eigenspace is this large, then we can diagonalize the matrix, and only then. Three examples we have seen so far in this section, Example SMS5, Example DAB and Example DMS3, illustrate the diagonalization of a matrix, with varying degrees of detail about just how the diagonalization is achieved. However, in each case, you can verify that the geometric and algebraic multiplicities are equal for every eigenvalue. This is the substance of the next theorem.

Theorem DMFE
Diagonalizable Matrices have Full Eigenspaces
Suppose A is a square matrix. Then A is diagonalizable if and only if {γ}_{A}\left (λ\right ) = {α}_{A}\left (λ\right ) for every eigenvalue λ of A. □

Proof Suppose A has size n and k distinct eigenvalues, {λ}_{1},\kern 1.95872pt {λ}_{2},\kern 1.95872pt {λ}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {λ}_{k}. Let {S}_{i} = \left \{{x}_{i1},\kern 1.95872pt {x}_{i2},\kern 1.95872pt {x}_{i3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {x}_{i{γ}_{A}\left ({λ}_{i}\right )}\right \}, denote a basis for the eigenspace of {λ}_{i}, {ℰ}_{A}\left ({λ}_{i}\right ), for 1 ≤ i ≤ k. Then

is a set of eigenvectors for A. A vector cannot be an eigenvector for two different eigenvalues (see Exercise EE.T20) so {S}_{i} ∩ {S}_{j} = ∅ whenever i\mathrel{≠}j. In other words, S is a disjoint union of {S}_{i}, 1 ≤ i ≤ k.

\eqalignno{ \left \vert S\right \vert & ={ \mathop{∑ }}_{i=1}^{k}{γ}_{ A}\left ({λ}_{i}\right ) & &\text{$S$ disjoint union of ${S}_{i}$} & & & & \cr & ={ \mathop{∑ }}_{i=1}^{k}{α}_{ A}\left ({λ}_{i}\right ) & &\text{Hypothesis} & & & & \cr & = n & &\text{@(a href="fcla-jsmath-2.00li48.html#theorem.NEM")Theorem NEM@(/a)} & & & & }

We next show that S is a linearly independent set. So we will begin with a relation of linear dependence on S, using doubly-subscripted scalars and eigenvectors,

\eqalignno{ 0& = \left ({a}_{11}{x}_{11} + {a}_{12}{x}_{12} + \mathrel{⋯} + {a}_{1{γ}_{A}\left ({λ}_{1}\right )}{x}_{1{γ}_{A}\left ({λ}_{1}\right )}\right ) + \left ({a}_{21}{x}_{21} + {a}_{22}{x}_{22} + \mathrel{⋯} + {a}_{2{γ}_{A}\left ({λ}_{2}\right )}{x}_{2{γ}_{A}\left ({λ}_{2}\right )}\right )&& \cr &\quad \quad + \mathrel{⋯} + \left ({a}_{k1}{x}_{k1} + {a}_{k2}{x}_{k2} + \mathrel{⋯} + {a}_{k{γ}_{A}\left ({λ}_{k}\right )}{x}_{k{γ}_{A}\left ({λ}_{k}\right )}\right ) && }

\eqalignno{ {y}_{1} & = \left ({a}_{11}{x}_{11} + {a}_{12}{x}_{12} + {a}_{13}{x}_{13} + \mathrel{⋯} + {a}_{{γ}_{A}\left (1{λ}_{1}\right )}{x}_{1{γ}_{A}\left ({λ}_{1}\right )}\right ) & & \cr {y}_{2} & = \left ({a}_{21}{x}_{21} + {a}_{22}{x}_{22} + {a}_{23}{x}_{23} + \mathrel{⋯} + {a}_{{γ}_{A}\left (2{λ}_{2}\right )}{x}_{2{γ}_{A}\left ({λ}_{2}\right )}\right ) & & \cr {y}_{3} & = \left ({a}_{31}{x}_{31} + {a}_{32}{x}_{32} + {a}_{33}{x}_{33} + \mathrel{⋯} + {a}_{{γ}_{A}\left (3{λ}_{3}\right )}{x}_{3{γ}_{A}\left ({λ}_{3}\right )}\right ) & & \cr &\quad \quad \mathop{\mathop{⋮}} & & \cr {y}_{k} & = \left ({a}_{k1}{x}_{k1} + {a}_{k2}{x}_{k2} + {a}_{k3}{x}_{k3} + \mathrel{⋯} + {a}_{{γ}_{A}\left (k{λ}_{k}\right )}{x}_{k{γ}_{A}\left ({λ}_{k}\right )}\right ) & & }

\eqalignno{ 0 & = {y}_{1} + {y}_{2} + {y}_{3} + \mathrel{⋯} + {y}_{k} & & }

Since the eigenspace {ℰ}_{A}\left ({λ}_{i}\right ) is closed under vector addition and scalar multiplication, {y}_{i} ∈{ℰ}_{A}\left ({λ}_{i}\right ), 1 ≤ i ≤ k. Thus, for each i, the vector {y}_{i} is an eigenvector of A for {λ}_{i}, or is the zero vector. Recall that sets of eigenvectors whose eigenvalues are distinct form a linearly independent set by Theorem EDELI. Should any (or some) {y}_{i} be nonzero, the previous equation would provide a nontrivial relation of linear dependence on a set of eigenvectors with distinct eigenvalues, contradicting Theorem EDELI. Thus {y}_{i} = 0, 1 ≤ i ≤ k.

Each of the k equations, {y}_{i} = 0 is a relation of linear dependence on the corresponding set {S}_{i}, a set of basis vectors for the eigenspace {ℰ}_{A}\left ({λ}_{i}\right ), which is therefore linearly independent. From these relations of linear dependence on linearly independent sets we conclude that the scalars are all zero, more precisely, {a}_{ij} = 0, 1 ≤ j ≤ {γ}_{A}\left ({λ}_{i}\right ) for 1 ≤ i ≤ k. This establishes that our original relation of linear dependence on S has only the trivial relation of linear dependence, and hence S is a linearly independent set.

We have determined that S is a set of n linearly independent eigenvectors for A, and so by Theorem DC is diagonalizable.

( ⇒) Now we assume that A is diagonalizable. Aiming for a contradiction (Technique CD), suppose that there is at least one eigenvalue, say {λ}_{t}, such that {γ}_{A}\left ({λ}_{t}\right )\mathrel{≠}{α}_{A}\left ({λ}_{t}\right ). By Theorem ME we must have {γ}_{A}\left ({λ}_{t}\right ) < {α}_{A}\left ({λ}_{t}\right ), and {γ}_{A}\left ({λ}_{i}\right ) ≤ {α}_{A}\left ({λ}_{i}\right ) for 1 ≤ i ≤ k, i\mathrel{≠}t.

Since A is diagonalizable, Theorem DC guarantees a set of n linearly independent vectors, all of which are eigenvectors of A. Let {n}_{i} denote the number of eigenvectors in S that are eigenvectors for {λ}_{i}, and recall that a vector cannot be an eigenvector for two different eigenvalues (Exercise EE.T20). S is a linearly independent set, so the the subset {S}_{i} containing the {n}_{i} eigenvectors for {λ}_{i} must also be linearly independent. Because the eigenspace {ℰ}_{A}\left ({λ}_{i}\right ) has dimension {γ}_{A}\left ({λ}_{i}\right ) and {S}_{i} is a linearly independent subset in {ℰ}_{A}\left ({λ}_{i}\right ), Theorem G tells us that {n}_{i} ≤ {γ}_{A}\left ({λ}_{i}\right ), for 1 ≤ i ≤ k. Putting all these facts together gives,

\eqalignno{ n & = {n}_{1} + {n}_{2} + {n}_{3} + \mathrel{⋯} + {n}_{t} + \mathrel{⋯} + {n}_{k} & &\text{@(a href="fcla-jsmath-2.00li70.html#definition.SU")Definition SU@(/a)} & & & & \cr & ≤ {γ}_{A}\left ({λ}_{1}\right ) + {γ}_{A}\left ({λ}_{2}\right ) + {γ}_{A}\left ({λ}_{3}\right ) + \mathrel{⋯} + {γ}_{A}\left ({λ}_{t}\right ) + \mathrel{⋯} + {γ}_{A}\left ({λ}_{k}\right ) & &\text{@(a href="fcla-jsmath-2.00li42.html#theorem.G")Theorem G@(/a)} & & & & \cr & < {α}_{A}\left ({λ}_{1}\right ) + {α}_{A}\left ({λ}_{2}\right ) + {α}_{A}\left ({λ}_{3}\right ) + \mathrel{⋯} + {α}_{A}\left ({λ}_{t}\right ) + \mathrel{⋯} + {α}_{A}\left ({λ}_{k}\right ) & &\text{@(a href="fcla-jsmath-2.00li48.html#theorem.ME")Theorem ME@(/a)} & & & & \cr & = n & &\text{@(a href="fcla-jsmath-2.00li48.html#theorem.NEM")Theorem NEM@(/a)} & & & & }

This is a contradiction (we can’t have n < n!) and so our assumption that some eigenspace had less than full dimension was false. ■

So the geometric multiplicity of λ = 2 is {γ}_{B}\left (2\right ) = 1, while the algebraic multiplicity is {α}_{B}\left (2\right ) = 3. By Theorem DMFE, the matrix B is not diagonalizable. ⊠

Archetype A is the lone archetype with a square matrix that is not diagonalizable, as the algebraic and geometric multiplicities of the eigenvalue λ = 0 differ. Example HMEM5 is another example of a matrix that cannot be diagonalized due to the difference between the geometric and algebraic multiplicities of λ = 2, as is Example CEMS6 which has two complex eigenvalues, each with differing multiplicities. Likewise, Example EMMS4 has an eigenvalue with different algebraic and geometric multiplicities and so cannot be diagonalized.

Proof Let {λ}_{1},\kern 1.95872pt {λ}_{2},\kern 1.95872pt {λ}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {λ}_{n} denote the n distinct eigenvalues of A. Then by Theorem NEM we have n ={\mathop{ \mathop{∑ }}\nolimits }_{i=1}^{n}{α}_{ A}\left ({λ}_{i}\right ), which implies that {α}_{A}\left ({λ}_{i}\right ) = 1, 1 ≤ i ≤ n. From Theorem ME it follows that {γ}_{A}\left ({λ}_{i}\right ) = 1, 1 ≤ i ≤ n. So {γ}_{A}\left ({λ}_{i}\right ) = {α}_{A}\left ({λ}_{i}\right ), 1 ≤ i ≤ n and Theorem DMFE says A is diagonalizable. ■

and so is a 5 × 5 matrix with 5 distinct eigenvalues. By Theorem DED we know H must be diagonalizable. But just for practice, we exhibit the diagonalization itself. The matrix S contains eigenvectors of H as columns, one from each eigenspace, guaranteeing linear independent columns and thus the nonsingularity of S. The diagonal matrix has the eigenvalues of H in the same order that their respective eigenvectors appear as the columns of S. Notice that we are using the versions of the eigenvectors from Example DEMS5 that have integer entries.

\eqalignno{ &{S}^{−1}HS && \cr & ={ \left [\array{ 2 & 1 &−1& 1 & 1 \cr −1& 0 & 2 & 0 &−1 \cr −2& 0 & 2 &−1&−2 \cr −4&−1& 0 &−2&−1 \cr 2 & 2 & 1 & 2 & 1 } \right ]}^{−1}\left [\array{ 15 & 18 & −8 & 6 & −5 \cr 5 & 3 & 1 & −1 & −3 \cr 0 & −4 & 5 & −4 & −2 \cr −43&−46& 17 &−14& 15 \cr 26 & 30 &−12& 8 &−10 } \right ]\left [\array{ 2 & 1 &−1& 1 & 1 \cr −1& 0 & 2 & 0 &−1 \cr −2& 0 & 2 &−1&−2 \cr −4&−1& 0 &−2&−1 \cr 2 & 2 & 1 & 2 & 1 } \right ]&& \cr & = \left [\array{ −3&−3& 1 &−1& 1 \cr −1&−2& 1 & 0 & 1 \cr −5&−4& 1 &−1& 2 \cr 10&10&−3& 2 &−4 \cr −7&−6& 1 &−1& 3 } \right ]\left [\array{ 15 & 18 & −8 & 6 & −5 \cr 5 & 3 & 1 & −1 & −3 \cr 0 & −4 & 5 & −4 & −2 \cr −43&−46& 17 &−14& 15 \cr 26 & 30 &−12& 8 &−10 } \right ]\left [\array{ 2 & 1 &−1& 1 & 1 \cr −1& 0 & 2 & 0 &−1 \cr −2& 0 & 2 &−1&−2 \cr −4&−1& 0 &−2&−1 \cr 2 & 2 & 1 & 2 & 1 } \right ] && \cr & = \left [\array{ −3& 0 &0&0&0 \cr 0 &−1&0&0&0 \cr 0 & 0 &0&0&0 \cr 0 & 0 &0&1&0 \cr 0 & 0 &0&0&2 } \right ] && }

Archetype B is another example of a matrix that has as many distinct eigenvalues as its size, and is hence diagonalizable by Theorem DED.

Powers of a diagonal matrix are easy to compute, and when a matrix is diagonalizable, it is almost as easy. We could state a theorem here perhaps, but we will settle instead for an example that makes the point just as well.

and we wish to compute {A}^{20}. Normally this would require 19 matrix multiplications, but since A is diagonalizable, we can simplify the computations substantially. First, we diagonalize A. With

\eqalignno{ D = {S}^{−1}AS& = \left [\array{ −6& 1 &−3&−6 \cr 0 & 2 &−2&−3 \cr 3 & 0 & 1 & 2 \cr −1&−1& 1 & 1 } \right ]\left [\array{ 19 & 0 & 6 & 13 \cr −33&−1& −9 &−21 \cr 21 &−4& 12 & 21 \cr −36& 2 &−14&−28 } \right ]\left [\array{ 1 &−1& 2 &−1 \cr −2& 3 &−3& 3 \cr 1 & 1 & 3 & 3 \cr −2& 1 &−4& 0 } \right ]&& \cr & = \left [\array{ −1&0&0&0 \cr 0 &0&0&0 \cr 0 &0&2&0 \cr 0 &0&0&1 } \right ] && }

\eqalignno{ {A}^{20} & = AAA\mathop{\mathop{…}}A & & \cr & = {I}_{n}A{I}_{n}A{I}_{n}A{I}_{n}\mathop{\mathop{…}}{I}_{n}A{I}_{n} & & \cr & = \left (S{S}^{−1}\right )A\left (S{S}^{−1}\right )A\left (S{S}^{−1}\right )A\left (S{S}^{−1}\right )\mathop{\mathop{…}}\left (S{S}^{−1}\right )A\left (S{S}^{−1}\right ) & & \cr & = S\left ({S}^{−1}AS\right )\left ({S}^{−1}AS\right )\left ({S}^{−1}AS\right )\mathop{\mathop{…}}\left ({S}^{−1}AS\right ){S}^{−1} & & \cr & = SDDD\mathop{\mathop{…}}D{S}^{−1} & & \cr & = S{D}^{20}{S}^{−1} & & \text{and since $D$ is a diagonal matrix, powers are much easier to compute,} \cr & = S{\left [\array{ −1&0&0&0 \cr 0 &0&0&0 \cr 0 &0&2&0 \cr 0 &0&0&1 } \right ]}^{20}{S}^{−1} & & \cr & = S\left [\array{ {(−1)}^{20}& 0 & 0 & 0 \cr 0 &{(0)}^{20}& 0 & 0 \cr 0 & 0 &{(2)}^{20}& 0 \cr 0 & 0 & 0 &{(1)}^{20} } \right ]{S}^{−1} & & \cr & = \left [\array{ 1 &−1& 2 &−1 \cr −2& 3 &−3& 3 \cr 1 & 1 & 3 & 3 \cr −2& 1 &−4& 0 } \right ]\left [\array{ 1&0& 0 &0 \cr 0&0& 0 &0 \cr 0&0&1048576&0 \cr 0&0& 0 &1 } \right ]\left [\array{ −6& 1 &−3&−6 \cr 0 & 2 &−2&−3 \cr 3 & 0 & 1 & 2 \cr −1&−1& 1 & 1 } \right ] & & \cr & = \left [\array{ 6291451 & 2 & 2097148 & 4194297 \cr −9437175 &−5&−3145719&−6291441 \cr 9437175 &−2& 3145728 & 6291453 \cr −12582900&−2&−4194298&−8388596 } \right ] & & }

Notice how we effectively replaced the twentieth power of A by the twentieth power of D, and how a high power of a diagonal matrix is just a collection of powers of scalars on the diagonal. The price we pay for this simplification is the need to diagonalize the matrix (by computing eigenvalues and eigenvectors) and finding the inverse of the matrix of eigenvectors. And we still need to do two matrix products. But the higher the power, the greater the savings. ⊠

Subsection FS: Fibonacci Sequences

\eqalignno{ {a}_{0} & = 0 &{a}_{1} & = 1 &{a}_{n+1} & = {a}_{n} + {a}_{n−1},\quad n ≥ 1 & & & & & & }

So the initial portion of the sequence is 0,\kern 1.95872pt 1,\kern 1.95872pt 1,\kern 1.95872pt 2,\kern 1.95872pt 3,\kern 1.95872pt 5,\kern 1.95872pt 8,\kern 1.95872pt 13,\kern 1.95872pt 21,\kern 1.95872pt \mathop{\mathop{…}}. In this subsection we will illustrate an application of eigenvalues and diagonalization through the determination of a closed-form expression for an arbitrary term of this sequence.

To begin, verify that for any n ≥ 1 the recursive statement above establishes the truth of the statement

\eqalignno{ \left [\array{ {a}_{n} \cr {a}_{n+1} } \right ] & = \left [\array{ 0&1 \cr 1&1 } \right ]\left [\array{ {a}_{n−1} \cr {a}_{n}} \right ] & & }

Let A denote this 2 × 2 matrix. Through repeated applications of the statement above we have

\eqalignno{ \left [\array{ {a}_{n} \cr {a}_{n+1} } \right ] & = A\left [\array{ {a}_{n−1} \cr {a}_{n}} \right ] = {A}^{2}\left [\array{ {a}_{n−2} \cr {a}_{n−1} } \right ] = {A}^{3}\left [\array{ {a}_{n−3} \cr {a}_{n−2} } \right ] = \mathrel{⋯} = {A}^{n}\left [\array{ {a}_{0} \cr {a}_{1} } \right ] & & }

In preparation for working with this high power of A, not unlike in Example HPDM, we will diagonalize A. The characteristic polynomial of A is {p}_{A}\left (x\right ) = {x}^{2} − x − 1, with roots (the eigenvalues of A by Theorem EMRCP)

\eqalignno{ ρ & = {1 + \sqrt{5}\over 2} &δ & = {1 −\sqrt{5}\over 2} & & & & }

With two distinct eigenvalues, Theorem DED implies that A is diagonalizable. It will be easier to compute with these eigenvalues once you confirm the following properties (all but the last can be derived from the fact that ρ and δ are roots of the characteristic polynomial, in a factored or unfactored form)

\eqalignno{ ρ + δ & = 1 &ρδ & = −1 &1 + ρ & = {ρ}^{2} &1 + δ & = {δ}^{2} &ρ − δ & = \sqrt{5} & & & & & & & & & & }

\eqalignno{ &\left [\array{ 1 \cr ρ } \right ] & &\left [\array{ 1 \cr δ } \right ] & & & & }

\eqalignno{ \left [\array{ 0&1 \cr 1&1 } \right ]\left [\array{ 1 \cr ρ } \right ] & = \left [\array{ ρ \cr 1 + ρ } \right ] = \left [\array{ ρ \cr {ρ}^{2} } \right ] = ρ\left [\array{ 1 \cr ρ } \right ] & & }

From the proof of Theorem DC we know A can be diagonalized by a matrix S with these eigenvectors as columns, giving D = {S}^{−1}AS. We list S, {S}^{−1} and the diagonal matrix D,

\eqalignno{ S & = \left [\array{ 1&1 \cr ρ&δ } \right ] &{S}^{−1} & = {1\over ρ − δ}\left [\array{ −δ& 1 \cr ρ &−1 } \right ] &D & = \left [\array{ ρ&0 \cr 0&δ } \right ] & & & & & & }

OK, we have everything in place now. The main step in the following is to replace A by SD{S}^{−1}. Here we go,

\eqalignno{ \left [\array{ {a}_{n} \cr {a}_{n+1} } \right ] & = {A}^{n}\left [\array{ {a}_{0} \cr {a}_{1} } \right ] & & \cr & ={ \left (SD{S}^{−1}\right )}^{n}\left [\array{ {a}_{0} \cr {a}_{1} } \right ] & & \cr & = SD{S}^{−1}SD{S}^{−1}SD{S}^{−1}\mathrel{⋯}SD{S}^{−1}\left [\array{ {a}_{0} \cr {a}_{1} } \right ] & & \cr & = SDDD\mathrel{⋯}D{S}^{−1}\left [\array{ {a}_{0} \cr {a}_{1} } \right ] & & \cr & = S{D}^{n}{S}^{−1}\left [\array{ {a}_{0} \cr {a}_{1} } \right ] & & \cr & = \left [\array{ 1&1 \cr ρ&δ } \right ]{\left [\array{ ρ&0 \cr 0&δ } \right ]}^{n} {1\over ρ − δ}\left [\array{ −δ& 1 \cr ρ &−1 } \right ]\left [\array{ {a}_{0} \cr {a}_{1} } \right ] & & \cr & = {1\over ρ − δ}\left [\array{ 1&1 \cr ρ&δ } \right ]\left [\array{ {ρ}^{n}& 0 \cr 0 &{δ}^{n} } \right ]\left [\array{ −δ& 1 \cr ρ &−1 } \right ]\left [\array{ 0 \cr 1 } \right ] & & \cr & = {1\over ρ − δ}\left [\array{ 1&1 \cr ρ&δ } \right ]\left [\array{ {ρ}^{n}& 0 \cr 0 &{δ}^{n} } \right ]\left [\array{ 1 \cr −1 } \right ] & & \cr & = {1\over ρ − δ}\left [\array{ 1&1 \cr ρ&δ } \right ]\left [\array{ {ρ}^{n} \cr −{δ}^{n} } \right ] & & \cr & = {1\over ρ − δ}\left [\array{ {ρ}^{n} − {δ}^{n} \cr {ρ}^{n+1} − {δ}^{n+1} } \right ] & & }

Performing the scalar multiplication and equating the first entries of the two vectors, we arrive at the closed form expression

\eqalignno{ {a}_{n} & = {1\over ρ − δ}\left ({ρ}^{n} − {δ}^{n}\right ) & & \cr & = {1\over \sqrt{5}}\left ({\left ({1 + \sqrt{5}\over 2} \right )}^{n} −{\left ({1 −\sqrt{5}\over 2} \right )}^{n}\right ) & & \cr & = {1\over { 2}^{n}\sqrt{5}}\left ({\left (1 + \sqrt{5}\right )}^{n} −{\left (1 −\sqrt{5}\right )}^{n}\right ) & & }

Notice that it does not matter whether we use the equality of the first or second entries of the vectors, we will arrive at the same formula, once in terms of n and again in terms of n + 1. Also, our definition clearly describes a sequence that will only contain integers, yet the presence of the irrational number \sqrt{ 5} might make us suspicious. But no, our expression for {a}^{n} will always yield an integer!

The Fibonacci sequence, and generalizations of it, have been extensively studied (Fibonacci lived in the 12th and 13th centuries). There are many ways to derive the closed-form expression we just found, and our approach may not be the most efficient route. But it is a nice demonstration of how diagonalization can be used to solve a problem outside the field of linear algebra. ⊠

We close this section with a comment about an important upcoming theorem that we prove in Chapter R. A consequence of Theorem OD is that every Hermitian matrix (Definition HM) is diagonalizable (Definition DZM), and the similarity transformation that accomplishes the diagonalization uses a unitary matrix (Definition UM). This means that for every Hermitian matrix of size n there is a basis of {ℂ}^{n} that is composed entirely of eigenvectors for the matrix and also forms an orthonormal set (Definition ONS). Notice that for matrices with only real entries, we only need the hypothesis that the matrix is symmetric (Definition SYM) to reach this conclusion (Example ESMS4). Can you imagine a prettier basis for use with a matrix? I can’t.

These results in Section OD explain much of our recurring interest in orthogonality, and make the section a high point in your study of linear algebra. A precise statement of this diagonalization result applies to a slightly broader class of matrices, known as “normal” matrices (Definition NRML), which are matrices that commute with their adjoints. With this expanded category of matrices, the result becomes an equivalence (Technique E). See Theorem OD and Theorem OBNM in Section OD for all the details.

Subsection READ: Reading Questions

Subsection EXC: Exercises

C20 Consider the matrix A below. First, show that A is diagonalizable by computing the geometric multiplicities of the eigenvalues and quoting the relevant theorem. Second, find a diagonal matrix D and a nonsingular matrix S so that {S}^{−1}AS = D. (See Exercise EE.C20 for some of the necessary computations.)

C21 Determine if the matrix A below is diagonalizable. If the matrix is diagonalizable, then find a diagonal matrix D that is similar to A, and provide the invertible matrix S that perfoms the similarity transformation. You should use your calculator to find the eigenvalues of the matrix, but try only using the row-reducing function of your calculator to assist with finding eigenvectors.

C22 Consider the matrix A below. Find the eigenvalues of A using a calculator and use these to construct the characteristic polynomial of A, {p}_{A}\left (x\right ). State the algebraic multiplicity of each eigenvalue. Find all of the eigenspaces for A by computing expressions for null spaces, only using your calculator to row-reduce matrices. State the geometric multiplicity of each eigenvalue. Is A diagonalizable? If not, explain why. If so, find a diagonal matrix D that is similar to A.

T15 Suppose that A and B are similar matrices. Prove that {A}^{3} and {B}^{3} are similar matrices. Generalize.
Contributed by Robert Beezer Solution [1307]

T16 Suppose that A and B are similar matrices, with A nonsingular. Prove that B is nonsingular, and that {A}^{−1} is similar to {B}^{−1}.
Contributed by Robert Beezer Solution [1308]

T17 Suppose that B is a nonsingular matrix. Prove that AB is similar to BA.
Contributed by Robert Beezer Solution [1309]

Subsection SOL: Solutions

C20 Contributed by Robert Beezer Statement [1297]
Using a calculator, we find that A has three distinct eigenvalues, λ = 3,\kern 1.95872pt 2,\kern 1.95872pt − 1, with λ = 2 having algebraic multiplicity two, {α}_{A}\left (2\right ) = 2. The eigenvalues λ = 3,\kern 1.95872pt − 1 have algebraic multiplicity one, and so by Theorem ME we can conclude that their geometric multiplicities are one as well. Together with the computation of the geometric multiplicity of λ = 2 from Exercise EE.C20, we know

\eqalignno{ {γ}_{A}\left (3\right ) & = {α}_{A}\left (3\right ) = 1 &{γ}_{A}\left (2\right ) & = {α}_{A}\left (2\right ) = 2 &{γ}_{A}\left (−1\right ) & = {α}_{A}\left (−1\right ) = 1 & & & & & & }

This satisfies the hypotheses of Theorem DMFE, and so we can conclude that A is diagonalizable.

A calculator will give us four eigenvectors of A, the two for λ = 2 being linearly independent presumably. Or, by hand, we could find basis vectors for the three eigenspaces. For λ = 3,\kern 1.95872pt − 1 the eigenspaces have dimension one, and so any eigenvector for these eigenvalues will be multiples of the ones we use below. For λ = 2 there are many different bases for the eigenspace, so your answer could vary. Our eigenvectors are the basis vectors we would have obtained if we had actually constructed a basis in Exercise EE.C20 rather than just computing the dimension.

By the construction in the proof of Theorem DC, the required matrix S has columns that are four linearly independent eigenvectors of A and the diagonal matrix has the eigenvalues on the diagonal (in the same order as the eigenvectors in S). Here are the pieces, “doing” the diagonalization,

C21 Contributed by Robert Beezer Statement [1297]
A calculator will provide the eigenvalues λ = 2,\kern 1.95872pt 2,\kern 1.95872pt 1,\kern 1.95872pt 0, so we can reconstruct the characteristic polynomial as

\eqalignno{ {α}_{A}\left (2\right ) & = 2 &{α}_{A}\left (1\right ) & = 1 &{α}_{A}\left (0\right ) & = 1 & & & & & & }

Now compute eigenspaces by hand, obtaining null spaces for each of the three eigenvalues by constructing the correct singular matrix (Theorem EMNS),

\eqalignno{ A − 2{I}_{4} & = \left [\array{ −1& 9 & 9 & 24 \cr −3&−29&−29&−68 \cr 1 & 11 & 11 & 26 \cr 1 & 7 & 7 & 16 } \right ]\mathop{\longrightarrow}\limits_{}^{\text{RREF}}\left [\array{ 1&0&0&−{3\over 2} \cr 0&1&1& {5\over 2} \cr 0&0&0& 0 \cr 0&0&0& 0 } \right ] & & \cr {ℰ}_{A}\left (2\right ) & = N\kern -1.95872pt \left (A − 2{I}_{4}\right ) = \left \langle \left \{\left [\array{ {3\over 2} \cr −{5\over 2} \cr 0 \cr 1 } \right ],\kern 1.95872pt \left [\array{ 0 \cr −1 \cr 1 \cr 0 } \right ]\right \}\right \rangle = \left \langle \left \{\left [\array{ 3 \cr −5 \cr 0 \cr 2 } \right ],\kern 1.95872pt \left [\array{ 0 \cr −1 \cr 1 \cr 0 } \right ]\right \}\right \rangle & & \cr A − 1{I}_{4} & = \left [\array{ 0 & 9 & 9 & 24 \cr −3&−28&−29&−68 \cr 1 & 11 & 12 & 26 \cr 1 & 7 & 7 & 17 } \right ]\mathop{\longrightarrow}\limits_{}^{\text{RREF}}\left [\array{ 1&0&0&−{5\over 3} \cr 0&1&0& {13\over 3} \cr 0&0&1&−{5\over 3} \cr 0&0&0& 0 } \right ] & & \cr {ℰ}_{A}\left (1\right ) & = N\kern -1.95872pt \left (A − {I}_{4}\right ) = \left \langle \left \{\left [\array{ {5\over 3} \cr −{13\over 3} \cr {5\over 3} \cr 1 } \right ]\right \}\right \rangle = \left \langle \left \{\left [\array{ 5 \cr −13 \cr 5 \cr 3 } \right ]\right \}\right \rangle & & \cr A − 0{I}_{4} & = \left [\array{ 1 & 9 & 9 & 24 \cr −3&−27&−29&−68 \cr 1 & 11 & 13 & 26 \cr 1 & 7 & 7 & 18 } \right ]\mathop{\longrightarrow}\limits_{}^{\text{RREF}}\left [\array{ 1&0&0&−3 \cr 0&1&0& 5 \cr 0&0&1&−2 \cr 0&0&0& 0 } \right ] & & \cr {ℰ}_{A}\left (0\right ) & = N\kern -1.95872pt \left (A − {I}_{4}\right ) = \left \langle \left \{\left [\array{ 3 \cr −5 \cr 2 \cr 1 } \right ]\right \}\right \rangle & & }

From this we can compute the dimensions of the eigenspaces to obtain the geometric multiplicities,

\eqalignno{ {γ}_{A}\left (2\right ) & = 2 &{γ}_{A}\left (1\right ) & = 1 &{γ}_{A}\left (0\right ) & = 1 & & & & & & }

For each eigenvalue, the algebraic and geometric multiplicities are equal and so by Theorem DMFE we now know that A is diagonalizable. The construction in Theorem DC suggests we form a matrix whose columns are eigenvectors of A

Since \mathop{ det} \left (S\right ) = −1\mathrel{≠}0, we know that S is nonsingular (Theorem SMZD), so the columns of S are a set of 4 linearly independent eigenvectors of A. By the proof of Theorem SMZD we know

a diagonal matrix with the eigenvalues of A along the diagonal, in the same order as the associated eigenvectors appear as columns of S.

C22 Contributed by Robert Beezer Statement [1298]
A calculator will report λ = 0 as an eigenvalue of algebraic multiplicity of 2, and λ = −1 as an eigenvalue of algebraic multiplicity 2 as well. Since eigenvalues are roots of the characteristic polynomial (Theorem EMRCP) we have the factored version

\eqalignno{ λ & = 0 & & \cr A − (0){I}_{4} & = \left [\array{ 19 & 25 & 30 & 5 \cr −23&−30&−35&−5 \cr 7 & 9 & 10 & 1 \cr −3 & −4 & −5 &−1 } \right ]\mathop{\longrightarrow}\limits_{}^{\text{RREF}}\left [\array{ \text{1}&0&−5&−5 \cr 0&\text{1}& 5 & 4 \cr 0&0& 0 & 0 \cr 0&0& 0 & 0 } \right ] & & \cr {ℰ}_{A}\left (0\right ) & = N\kern -1.95872pt \left (C − (0){I}_{4}\right ) = \left \langle \left \{\left [\array{ 5 \cr −5 \cr 1 \cr 0 } \right ],\kern 1.95872pt \left [\array{ 5 \cr −4 \cr 0 \cr 1 } \right ]\right \}\right \rangle & & }

\eqalignno{ λ & = −1 & & \cr A − (−1){I}_{4} & = \left [\array{ 20 & 25 & 30 & 5 \cr −23&−29&−35&−5 \cr 7 & 9 & 11 & 1 \cr −3 & −4 & −5 & 0 } \right ]\mathop{\longrightarrow}\limits_{}^{\text{RREF}}\left [\array{ \text{1}&0&−1& 4 \cr 0&\text{1}& 2 &−3 \cr 0&0& 0 & 0 \cr 0&0& 0 & 0 } \right ] & & \cr {ℰ}_{A}\left (−1\right ) & = N\kern -1.95872pt \left (C − (−1){I}_{4}\right ) = \left \langle \left \{\left [\array{ 1 \cr −2 \cr 1 \cr 0 } \right ],\kern 1.95872pt \left [\array{ −4 \cr 3 \cr 0 \cr 1 } \right ]\right \}\right \rangle & & }

For each of the two eigenvalues, the algebraic and geometric multiplicities are equal. Theorem DMFE says that in this situation the matrix is diagonalizable. We know from Theorem DC that when we diagonalize A the diagonal matrix will have the eigenvalues of A on the diagonal (in some order). So we can claim that

\eqalignno{ {A}^{3} & = {({S}^{−1}BS)}^{3} & & & & \cr & = ({S}^{−1}BS)({S}^{−1}BS)({S}^{−1}BS) & & & & \cr & = {S}^{−1}B(S{S}^{−1})B(S{S}^{−1})BS & &\text{@(a href="fcla-jsmath-2.00li31.html#theorem.MMA")Theorem MMA@(/a)} & & & & \cr & = {S}^{−1}B({I}_{ 3})B({I}_{3})BS & &\text{@(a href="fcla-jsmath-2.00li32.html#definition.MI")Definition MI@(/a)} & & & & \cr & = {S}^{−1}BBBS & &\text{@(a href="fcla-jsmath-2.00li31.html#theorem.MMIM")Theorem MMIM@(/a)} & & & & \cr & = {S}^{−1}{B}^{3}S & & & & }

More generally, if A is similar to B, and m is a non-negative integer, then {A}^{m} is similar to {B}^{m}. This can be proved using induction (Technique I).

T16 Contributed by Steve Canfield Statement [1299]
A being similar to B means that there exists an S such that A = {S}^{−1}BS. So, B = SA{S}^{−1} and because S, A, and {S}^{−1} are nonsingular, by Theorem NPNT, B is nonsingular.

\eqalignno{ {A}^{−1} & ={ \left ({S}^{−1}BS\right )}^{−1} & &\text{@(a href="#definition.SIM")Definition SIM@(/a)} & & & & & & & & \cr & = {S}^{−1}{B}^{−1}{\left ({S}^{−1}\right )}^{−1} & &\text{@(a href="fcla-jsmath-2.00li32.html#theorem.SS")Theorem SS@(/a)} & = {S}^{−1}{B}^{−1}S & &\text{@(a href="fcla-jsmath-2.00li32.html#theorem.MIMI")Theorem MIMI@(/a)} & & & & & & & & }

T17 Contributed by Robert Beezer Statement [1299]
The nonsingular (invertible) matrix B will provide the desired similarity transformation,

\eqalignno{ {B}^{−1}\left (BA\right )B & = \left ({B}^{−1}B\right )\left (AB\right ) & &\text{@(a href="fcla-jsmath-2.00li31.html#theorem.MMA")Theorem MMA@(/a)} & & & & \cr & = {I}_{n}AB & &\text{@(a href="fcla-jsmath-2.00li32.html#definition.MI")Definition MI@(/a)} & & & & \cr & = AB & &\text{@(a href="fcla-jsmath-2.00li31.html#theorem.MMIM")Theorem MMIM@(/a)} & & & & \cr & & & & }

Annotated Acronyms E: Eigenvalues

Theorem EMRCP
Much of what we know about eigenvalues can be traced to analysis of the characteristic polynomial. When we first defined eigenvalues, you might have wondered if they were scarce, or abundant. The characteristic polynomial allows us to answer a question like this with a result like Theorem NEM which tells us there are always a few eigenvalues, but never too many.

Theorem EMNS
If Theorem EMRCP allows us to learn about eigenvalues through what we know about roots of polynomials, then Theorem EMNS allows us to learn about eigenvectors, and eigenspaces, from what we already know about null spaces. These two theorems, along with Definition EEM, provide the starting points for discerning the properties of eigenvalues and eigenvectors (to say nothing of actually computing them).

Theorem HMRE
As we have remarked before, we choose to include all of the complex numbers in our set of allowed scalars, whereas many introductory texts restrict their attention to just the real numbers. Here is one of the payoffs to this approach. Begin with a matrix, possibly containing complex entries, and require the matrix to be Hermitian (Definition HM). In the case of only real entries, this boils down to just requiring the matrix to be symmetric (Definition SYM). Generally, the roots of a characteristic polynomial, even with all real coefficients, can have complex numbers as roots. But for a Hermitian matrix, all of the eigenvalues are real numbers! When somebody tells you mathematics can be beautiful, this is an example of what they are talking about.

Theorem DC
Diagonalizing a matrix, or the question of if a matrix is diagonalizable, could be viewed as one of a handful of central questions in linear algebra. Here we have an unequivocal answer to the question of “if,” along with a proof containing a construction for the diagonalization. So this theorem is of theoretical and computational interest. This topic will be important again in Chapter R.

Theorem DMFE
Another unequivocal answer to the question of if a matrix is diagonalizable, with perhaps a simpler condition to test. The proof also tells us how to construct the necessary set of n linearly independent eigenvectors — just round up bases for each eigenspace and join them together. No need to test the linear independence of the combined set.