Section OD  Orthonormal Diagonalization

From A First Course in Linear Algebra
Version 2.23
© 2004.
Licensed under the GNU Free Documentation License.
http://linear.ups.edu/

This section is in draft form
Theorems & definitions are complete, needs examples

We have seen in Section SD that under the right conditions a square matrix is similar to a diagonal matrix. We recognize now, via Theorem SCB, that a similarity transformation is a change of basis on a matrix representation. So we can now discuss the choice of a basis used to build a matrix representation, and decide if some bases are better than others for this purpose. This will be the tone of this section. We will also see that every matrix has a reasonably useful matrix representation, and we will discover a new class of diagonalizable linear transformations. First we need some basic facts about triangular matrices.

Subsection TM: Triangular Matrices

An upper, or lower, triangular matrix is exactly what it sounds like it should be, but here are the two relevant definitions.

Definition UTM
Upper Triangular Matrix
The n × n square matrix A is upper triangular if {\left [A\right ]}_{ij} = 0 whenever i > j.

Definition LTM
Lower Triangular Matrix
The n × n square matrix A is lower triangular if {\left [A\right ]}_{ij} = 0 whenever i < j.

Obviously, properties of a lower triangular matrices will have analogues for upper triangular matrices. Rather than stating two very similar theorems, we will say that matrices are “triangular of the same type” as a convenient shorthand to cover both possibilities and then give a proof for just one type.

Theorem PTMT
Product of Triangular Matrices is Triangular
Suppose that A and B are square matrices of size n that are triangular of the same type. Then AB is also triangular of that type.

Proof   We prove this for lower triangular matrices and leave the proof for upper triangular matrices to you. Suppose that A and B are both lower triangular. We need only establish that certain entries of the product AB are zero. Suppose that i < j, then

\eqalignno{ {\left [AB\right ]}_{ij} & ={ \mathop{∑ }}_{k=1}^{n}{\left [A\right ]}_{ ik}{\left [B\right ]}_{kj} & &\text{@(a href="fcla-jsmath-2.23li31.html#theorem.EMP")Theorem EMP@(/a)} & & & & \cr & ={ \mathop{∑ }}_{k=1}^{j−1}{\left [A\right ]}_{ ik}{\left [B\right ]}_{kj} +{ \mathop{∑ }}_{k=j}^{n}{\left [A\right ]}_{ ik}{\left [B\right ]}_{kj} & &\text{@(a href="fcla-jsmath-2.23li69.html#property.AACN")Property AACN@(/a)} & & & & \cr & ={ \mathop{∑ }}_{k=1}^{j−1}{\left [A\right ]}_{ ik}0 +{ \mathop{∑ }}_{k=j}^{n}{\left [A\right ]}_{ ik}{\left [B\right ]}_{kj} & &\text{$k < j$, @(a href="#definition.LTM")Definition LTM@(/a)} & & & & \cr & ={ \mathop{∑ }}_{k=1}^{j−1}{\left [A\right ]}_{ ik}0 +{ \mathop{∑ }}_{k=j}^{n}0{\left [B\right ]}_{ kj} & &\text{$i < j ≤ k$, @(a href="#definition.LTM")Definition LTM@(/a)} & & & & \cr & ={ \mathop{∑ }}_{k=1}^{j−1}0 +{ \mathop{∑ }}_{k=j}^{n}0 & & & & \cr & = 0 & & & & }

Since {\left [AB\right ]}_{ij} = 0 whenever i < j, by Definition LTM, AB is lower triangular.

The inverse of a triangular matrix is triangular, of the same type.

Theorem ITMT
Inverse of a Triangular Matrix is Triangular
Suppose that A is a nonsingular matrix of size n that is triangular. Then the inverse of A, {A}^{−1}, is triangular of the same type. Furthermore, the diagonal entries of {A}^{−1} are the reciprocals of the corresponding diagonal entries of A. More precisely, {\left [{A}^{−1}\right ]}_{ ii} ={ \left [A\right ]}_{ii}^{−1}.

Proof   We give the proof for the case when A is lower triangular, and leave the case when A is upper triangular for you. Consider the process for computing the inverse of a matrix that is outlined in the proof of Theorem CINM. We augment A with the size n identity matrix, {I}_{n}, and row-reduce the n × 2n matrix to reduced row-echelon form via the algorithm in Theorem REMEF. The proof involves tracking the peculiarities of this process in the case of a lower triangular matrix. Let M = \left [\left .A\kern 1.95872pt \right \vert \kern 1.95872pt {I}_{n}\right ].

First, none of the diagonal elements of A are zero. By repeated expansion about the first row, the determinant of a lower triangular matrix can be seen to be the product of the diagonal entries (Theorem DER). If just one of these diagonal elements was zero, then the determinant of A is zero and A is singular by Theorem SMZD. Slightly violating the exact algorithm for row reduction we can form a matrix, {M}^{′}, that is row-equivalent to M, by multiplying row i by the nonzero scalar {\left [A\right ]}_{ii}^{−1}, for 1 ≤ i ≤ n. This sets {\left [{M}^{′}\right ]}_{ ii} = 1 and {\left [{M}^{′}\right ]}_{ i,n+1} ={ \left [A\right ]}_{ii}^{−1}, and leaves every zero entry of M unchanged.

Let {M}_{j} denote the matrix obtained form {M}^{′} after converting column j to a pivot column. We can convert column j of {M}_{j−1} into a pivot column with a set of n − j − 1 row operations of the form α{R}_{j} + {R}_{k} with j + 1 ≤ k ≤ n. The key observation here is that we add multiples of row j only to higher-numbered rows. This means that none of the entries in rows 1 through j − 1 is changed, and since row j has zeros in columns j + 1 through n, none of the entries in rows j + 1 through n is changed in columns j + 1 through n. The first n columns of {M}^{′} form a lower triangular matrix with 1’s on the diagonal. In its conversion to the identity matrix through this sequence of row operations, it remains lower triangular with 1’s on the diagonal.

What happens in columns n + 1 through 2n of {M}^{′}? These columns began in M as the identity matrix, and in {M}^{′} each diagonal entry was scaled to a reciprocal of the corresponding diagonal entry of A. Notice that trivially, these final n columns of {M}^{′} form a lower triangular matrix. Just as we argued for the first n columns, the row operations that convert {M}_{j−1} into {M}_{j} will preserve the lower triangular form in the final n columns and preserve the exact values of the diagonal entries. By Theorem CINM, the final n columns of {M}_{n} is the inverse of A, and this matrix has the necessary properties advertised in the conclusion of this theorem.

Subsection UTMR: Upper Triangular Matrix Representation

Not every matrix is diagonalizable, but every linear transformation has a matrix representation that is an upper triangular matrix, and the basis that achieves this representation is especially pleasing. Here’s the theorem.

Theorem UTMR
Upper Triangular Matrix Representation
Suppose that T : V → V is a linear transformation. Then there is a basis B for V such that the matrix representation of T relative to B, {M}_{B,B}^{T }, is an upper triangular matrix. Each diagonal entry is an eigenvalue of T, and if λ is an eigenvalue of T, then λ occurs {α}_{T }\left (λ\right ) times on the diagonal.

Proof   We begin with a proof by induction (Technique I) of the first statement in the conclusion of the theorem. We use induction on the dimension of V to show that if T : V → V is a linear transformation, then there is a basis B for V such that the matrix representation of T relative to B, {M}_{B,B}^{T }, is an upper triangular matrix.

To start suppose that \mathop{ dim}\nolimits \left (V \right ) = 1. Choose any nonzero vector v ∈ V and realize that V = \left \langle \left \{v\right \}\right \rangle . Then we can determine T uniquely by T\left (v\right ) = βv for some β ∈ ℂ (Theorem LTDB). This description of T also gives us a matrix representation relative to the basis B = \left \{v\right \} as the 1 × 1 matrix with lone entry equal to β. And this matrix representation is upper triangular (Definition UTM).

For the induction step let \mathop{ dim}\nolimits \left (V \right ) = m, and assume the theorem is true for every linear transformation defined on a vector space of dimension less than m. By Theorem EMHE (suitably converted to the setting of a linear transformation), T has at least one eigenvalue, and we denote this eigenvalue as λ. (We will remark later about how critical this step is.) We now consider properties of the linear transformation T − λ{I}_{V }: V → V .

Let x be an eigenvector of T for λ. By definition x\mathrel{≠}0. Then

\eqalignno{ \left (T − λ{I}_{V }\right )\left (x\right ) & = T\left (x\right ) − λ{I}_{V }\left (x\right ) & &\text{@(a href="fcla-jsmath-2.23li51.html#theorem.VSLT")Theorem VSLT@(/a)} & & & & \cr & = T\left (x\right ) − λx & &\text{@(a href="fcla-jsmath-2.23li54.html#definition.IDLT")Definition IDLT@(/a)} & & & & \cr & = λx − λx & &\text{@(a href="fcla-jsmath-2.23li58.html#definition.EELT")Definition EELT@(/a)} & & & & \cr & = 0 & &\text{@(a href="fcla-jsmath-2.23li37.html#property.AI")Property AI@(/a)} & & & & }

So T − λ{I}_{V } is not injective, as it has a nontrivial kernel (Theorem KILT). With an application of Theorem RPNDD we bound the rank of T − λ{I}_{V },

\eqalignno{ r\left (T − λ{I}_{V }\right ) & =\mathop{ dim}\nolimits \left (V \right ) − n\left (T − λ{I}_{V }\right ) ≤ m − 1 & & }

Define W to be the subspace of V that is the range of T − λ{I}_{V }, W = ℛ\kern -1.95872pt \left (T − λ{I}_{V }\right ). We define a new linear transformation S, on W,

\eqalignno{ &S : W → W &S\left (w\right ) & = T\left (w\right ) & & & & }

This does not look we have accomplished much, since the action of S is identical to the action of T. For our purposes this will be a good thing. What is different is the domain and codomain. S is defined on W, a vector space with dimension less than m, and so is susceptible to our induction hypothesis. Verifying that S is really a linear transformation is almost entirely routine, with one exception. Employing T in our definition of S raises the possibility that the outputs of S will not be contained within W (but instead will lie inside V , but outside W). To examine this possibility, suppose that w ∈ W.

\eqalignno{ S\left (w\right ) & = T\left (w\right ) & & & & \cr & = T\left (w\right ) + 0 & &\text{@(a href="fcla-jsmath-2.23li37.html#property.Z")Property Z@(/a)} & & & & \cr & = T\left (w\right ) + \left (λ{I}_{V }\left (w\right ) − λ{I}_{V }\left (w\right )\right ) & &\text{@(a href="fcla-jsmath-2.23li37.html#property.AI")Property AI@(/a)} & & & & \cr & = \left (T\left (w\right ) − λ{I}_{V }\left (w\right )\right ) + λ{I}_{V }\left (w\right ) & &\text{@(a href="fcla-jsmath-2.23li37.html#property.AA")Property AA@(/a)} & & & & \cr & = \left (T\left (w\right ) − λ{I}_{V }\left (w\right )\right ) + λw & &\text{@(a href="fcla-jsmath-2.23li54.html#definition.IDLT")Definition IDLT@(/a)} & & & & \cr & = \left (T − λ{I}_{V }\right )\left (w\right ) + λw & &\text{@(a href="fcla-jsmath-2.23li51.html#theorem.VSLT")Theorem VSLT@(/a)} & & & & }

Since W is the range of T − λ{I}_{V }, \left (T − λ{I}_{V }\right )\left (w\right ) ∈ W. And by Property SC, λw ∈ W. Finally, applying Property AC we see by closure that the sum is in W and so we conclude that S\left (w\right ) ∈ W. This argument convinces us that it is legitimate to define S as we did with W as the codomain.

S is a linear transformation defined on a vector space with dimension less than m, so we can apply the induction hypothesis and conclude that W has a basis, C = \left \{{w}_{1},\kern 1.95872pt {w}_{2},\kern 1.95872pt {w}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {w}_{k}\right \}, such that the matrix representation of S relative to C is an upper triangular matrix.

By Theorem DSFOS there exists a second subspace of V , which we will call U, so that V is a direct sum of W and U, V = W ⊕ U. Choose a basis D = \left \{{u}_{1},\kern 1.95872pt {u}_{2},\kern 1.95872pt {u}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {u}_{ℓ}\right \} for U. So m = k + ℓ by Theorem DSD, and B = C ∪ D is basis for V by Theorem DSLI and Theorem G. B is the basis we desire. What does a matrix representation of T look like, relative to B?

Since the definition of T and S agree on W, the first k columns of {M}_{B,B}^{T } will have the upper triangular matrix representation of S in the first k rows. The remaining ℓ = m − k rows of these first k columns will be all zeros since the outputs of T on C are all contained in W. The situation for T on D is not quite as pretty, but it is close.

For 1 ≤ i ≤ ℓ, consider

\eqalignno{ {ρ}_{B}\left (T\left ({u}_{i}\right )\right ) & = {ρ}_{B}\left (T\left ({u}_{i}\right ) + 0\right ) & &\text{@(a href="fcla-jsmath-2.23li37.html#property.Z")Property Z@(/a)} & & & & \cr & = {ρ}_{B}\left (T\left ({u}_{i}\right ) + \left (λ{I}_{V }\left ({u}_{i}\right ) − λ{I}_{V }\left ({u}_{i}\right )\right )\right ) & &\text{@(a href="fcla-jsmath-2.23li37.html#property.AI")Property AI@(/a)} & & & & \cr & = {ρ}_{B}\left (\left (T\left ({u}_{i}\right ) − λ{I}_{V }\left ({u}_{i}\right )\right ) + λ{I}_{V }\left ({u}_{i}\right )\right ) & &\text{@(a href="fcla-jsmath-2.23li37.html#property.AA")Property AA@(/a)} & & & & \cr & = {ρ}_{B}\left (\left (T\left ({u}_{i}\right ) − λ{I}_{V }\left ({u}_{i}\right )\right ) + λ{u}_{i}\right ) & &\text{@(a href="fcla-jsmath-2.23li54.html#definition.IDLT")Definition IDLT@(/a)} & & & & \cr & = {ρ}_{B}\left (\left (T − λ{I}_{V }\right )\left ({u}_{i}\right ) + λ{u}_{i}\right ) & &\text{@(a href="fcla-jsmath-2.23li51.html#theorem.VSLT")Theorem VSLT@(/a)} & & & & \cr & = {ρ}_{B}\left ({a}_{1}{w}_{1} + {a}_{2}{w}_{2} + {a}_{3}{w}_{3} + \mathrel{⋯} + {a}_{k}{w}_{k} + λ{u}_{i}\right ) & &\text{@(a href="fcla-jsmath-2.23li53.html#definition.RLT")Definition RLT@(/a)} & & & & \cr & = \left [\array{ {a}_{1} \cr {a}_{2} \cr \mathop{\mathop{⋮}}\cr {a}_{ k} \cr 0\cr \mathop{\mathop{⋮}} \cr 0\cr λ \cr 0\cr \mathop{\mathop{⋮}} \cr 0 } \right ] & &\text{@(a href="fcla-jsmath-2.23li56.html#definition.VR")Definition VR@(/a)} & & & & }

In the penultimate step of this proof, we have rewritten an element of the range of T − λ{I}_{V } as a linear combination of the basis vectors, C, for the range of T − λ{I}_{V }, W, using the scalars {a}_{1},\kern 1.95872pt {a}_{2},\kern 1.95872pt {a}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {a}_{k}. If we incorporate these column vectors into the matrix representation {M}_{B,B}^{T } we find occurrences of λ on the diagonal, and any nonzero entries lying only in the first k rows. Together with the k × k upper triangular representation in the upper left-hand corner, the entire matrix representation is now clearly upper triangular. This completes the induction step, so for any linear transformation there is a basis that creates an upper triangular matrix representation.

We have one more statement in the conclusion of the theorem to verify. The eigenvalues of T, and their multiplicities, can be computed with the techniques of Chapter E relative to any matrix representation (Theorem EER). We take this approach with our upper triangular matrix representation {M}_{B,B}^{T }. Let {d}_{i} be the diagonal entry of {M}_{B,B}^{T } in row i and column i. Then the characteristic polynomial, computed as a determinant (Definition CP) with repeated expansions about the first column, is

\eqalignno{ {p}_{{M}_{B,B}^{T}}\left (x\right ) & = \left ({d}_{1} − x\right )\left ({d}_{2} − x\right )\left ({d}_{3} − x\right )\mathrel{⋯}\left ({d}_{m} − x\right ) & & }

The roots of the polynomial equation {p}_{{M}_{B,B}^{T}}\left (x\right ) = 0 are the eigenvalues of the linear transformation (Theorem EMRCP). So each diagonal entry is an eigenvalue, and is repeated on the diagonal exactly {α}_{T }\left (λ\right ) times (Definition AME).

A key step in this proof was the construction of the subspace W with dimension strictly less than that of V . This required an eigenvalue/eigenvector pair, which was guaranteed to us by Theorem EMHE. Digging deeper, the proof of Theorem EMHE requires that we can factor polynomials completely, into linear factors. This will not always happen if our set of scalars is the reals, {ℝ}^{}. So this is our final explanation of our choice of the complex numbers, , as our set of scalars. In polynomials factor completely, so every matrix has at least one eigenvalue, and an inductive argument will get us to upper triangular matrix representations.

In the case of linear transformations defined on {ℂ}^{m}, we can use the inner product (Definition IP) profitably to fine-tune the basis that yields an upper triangular matrix representation. Recall that the adjoint of matrix A (Definition A) is written as {A}^{∗}.

Theorem OBUTR
Orthonormal Basis for Upper Triangular Representation
Suppose that A is a square matrix. Then there is a unitary matrix U, and an upper triangular matrix T, such that

\eqalignno{ {U}^{∗}AU = T & & }

and T has the eigenvalues of A as the entries of the diagonal.

Proof   This theorem is a statement about matrices and similarity. We can convert it to a statement about linear transformations, matrix representations and bases (Theorem SCB). Suppose that A is an n × n matrix, and define the linear transformation S : {ℂ}^{n} → {ℂ}^{n} by S\left (x\right ) = Ax. Then Theorem UTMR gives us a basis B = \left \{{v}_{1},\kern 1.95872pt {v}_{2},\kern 1.95872pt {v}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {v}_{n}\right \} for {ℂ}^{n} such that a matrix representation of S relative to B, {M}_{B,B}^{S}, is upper triangular.

Now convert the basis B into an orthogonal basis, C, by an application of the Gram-Schmidt procedure (Theorem GSP). This is a messy business computationally, but here we have an excellent illustration of the power of the Gram-Schmidt procedure. We need only be sure that B is linearly independent and spans {ℂ}^{n}, and then we know that C is linearly independent, spans {ℂ}^{n} and is also an orthogonal set. We will now consider the matrix representation of S relative to C (rather than B). Write the new basis as C = \left \{{y}_{1},\kern 1.95872pt {y}_{2},\kern 1.95872pt {y}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {y}_{n}\right \}. The application of the Gram-Schmidt procedure creates each vector of C, say {y}_{j}, as the difference of {v}_{j} and a linear combination of {y}_{1},\kern 1.95872pt {y}_{2},\kern 1.95872pt {y}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {y}_{j−1}. We are not concerned here with the actual values of the scalars in this linear combination, so we will write

\eqalignno{ {y}_{j} & = {v}_{j} −{\mathop{∑ }}_{k=1}^{j−1}{b}_{ jk}{y}_{k} & & }

where the {b}_{jk} are shorthand for the scalars. The equation above is in a form useful for creating the basis C from B. To better understand the relationship between B and C convert it to read

\eqalignno{ {v}_{j} & = {y}_{j} +{ \mathop{∑ }}_{k=1}^{j−1}{b}_{ jk}{y}_{k} & & }

In this form, we recognize that the change-of-basis matrix {C}_{B,C} = {M}_{B,C}^{{I}_{{ℂ}^{n}}} (Definition CBM) is an upper triangular matrix. By Theorem SCB we have

\eqalignno{ {M}_{C,C}^{S} & = {C}_{ B,C}{M}_{B,B}^{S}{C}_{ B,C}^{−1} & & }

The inverse of an upper triangular matrix is upper triangular (Theorem ITMT), and the product of two upper triangular matrices is again upper triangular (Theorem PTMT). So {M}_{C,C}^{S} is an upper triangular matrix.

Now, multiply each vector of C by a nonzero scalar, so that the result has norm 1. In this way we create a new basis D which is an orthonormal set (Definition ONS). Note that the change-of-basis matrix {C}_{C,D} is a diagonal matrix with nonzero entries equal to the norms of the vectors in C.

Now we can convert our results into the language of matrices. Let E be the basis of {ℂ}^{n} formed with the standard unit vectors (Definition SUV). Then the matrix representation of S relative to E is simply A, A = {M}_{E,E}^{S}. The change-of-basis matrix {C}_{D,E} has columns that are simply the vectors in D, the orthonormal basis. As such, Theorem CUMOS tells us that {C}_{D,E} is a unitary matrix, and by Definition UM has an inverse equal to its adjoint. Write U = {C}_{D,E}. We have

\eqalignno{ {U}^{∗}AU & = {U}^{−1}AU & &\text{@(a href="fcla-jsmath-2.23li33.html#theorem.UMI")Theorem UMI@(/a)} & & & & \cr & = {C}_{D,E}^{−1}{M}_{ E,E}^{S}{C}_{ D,E} & & & & \cr & = {M}_{D,D}^{S} & &\text{@(a href="fcla-jsmath-2.23li58.html#theorem.SCB")Theorem SCB@(/a)} & & & & \cr & = {C}_{C,D}{M}_{C,C}^{S}{C}_{ C,D}^{−1} & &\text{@(a href="fcla-jsmath-2.23li58.html#theorem.SCB")Theorem SCB@(/a)} & & & & }

The inverse of a diagonal matrix is also a diagonal matrix, and so this final expression is the product of three upper triangular matrices, and so is again upper triangular (Theorem PTMT). Thus the desired upper triangular matrix, T, is the matrix representation of S relative to the orthonormal basis D, {M}_{D,D}^{S}.

Subsection NM: Normal Matrices

Normal matrices comprise a broad class of interesting matrices, many of which we have met already. But they are most interesting since they define exactly which matrices we can diagonalize via a unitary matrix. This is the upcoming Theorem OD. Here’s the definition.

Definition NRML
Normal Matrix
The square matrix A is normal if {A}^{∗}A = A{A}^{∗}.

So a normal matrix commutes with its adjoint. Part of the beauty of this definition is that it includes many other types of matrices. A diagonal matrix will commute with its adjoint, since the adjoint is again diagonal and the entries are just conjugates of the entries of the original diagonal matrix. A Hermitian (self-adjoint) matrix (Definition HM) will trivially commute with its adjoint, since the two matrices are the same. A real, symmetric matrix is Hermitian, so these matrices are also normal. A unitary matrix (Definition UM) has its adjoint as its inverse, and inverses commute (Theorem OSIS), so unitary matrices are normal. Another class of normal matrices is the skew-symmetric matrices. However, these broad descriptions still do not capture all of the normal matrices, as the next example shows.

Example ANM
A normal matrix
Let

\eqalignno{ A & = \left [\array{ 1&−1\cr 1& 1 } \right ] & & }

Then

\eqalignno{ \left [\array{ 1&−1\cr 1& 1 } \right ]\left [\array{ 1 &1\cr −1 &1 } \right ] & = \left [\array{ 2&0\cr 0&2 } \right ] = \left [\array{ 1 &1\cr −1 &1 } \right ]\left [\array{ 1&−1\cr 1& 1 } \right ] & & }

so we see by Definition NRML that A is normal. However, A is not symmetric (hence, as a real matrix, not Hermitian), not unitary, and not skew-symmetric.

Subsection OD: Orthonormal Diagonalization

A diagonal matrix is very easy to work with in matrix multiplication (Example HPDM) and an orthonormal basis also has many advantages (Theorem COB). How about converting a matrix to a diagonal matrix through a similarity transformation using a unitary matrix (i.e. build a diagonal matrix representation with an orthonormal matrix)? That’d be fantastic! When can we do this? We can always accomplish this feat when the matrix is normal, and normal matrices are the only ones that behave this way. Here’s the theorem.

Theorem OD
Orthonormal Diagonalization
Suppose that A is a square matrix. Then there is a unitary matrix U and a diagonal matrix D, with diagonal entries equal to the eigenvalues of A, such that {U}^{∗}AU = D if and only if A is a normal matrix.

Proof   () Suppose there is a unitary matrix U that diagonalizes A, resulting in D, i.e. {U}^{∗}AU = D. We check the normality of A,

\eqalignno{ {A}^{∗}A & = {I}_{ n}{A}^{∗}{I}_{ n}A{I}_{n} & &\text{@(a href="fcla-jsmath-2.23li31.html#theorem.MMIM")Theorem MMIM@(/a)} & & & & \cr & = U{U}^{∗}{A}^{∗}U{U}^{∗}AU{U}^{∗} & &\text{@(a href="fcla-jsmath-2.23li33.html#definition.UM")Definition UM@(/a)} & & & & \cr & = U{U}^{∗}{A}^{∗}UD{U}^{∗} & & & & \cr & = U{U}^{∗}{A}^{∗}{\left ({U}^{∗}\right )}^{∗}D{U}^{∗} & &\text{@(a href="fcla-jsmath-2.23li30.html#theorem.AA")Theorem AA@(/a)} & & & & \cr & = U{\left ({U}^{∗}AU\right )}^{∗}D{U}^{∗} & &\text{Adjoint of a product} & & & & \cr & = U{D}^{∗}D{U}^{∗} & & & & \cr & = U{\left (\overline{D}\right )}^{t}D{U}^{∗} & &\text{@(a href="fcla-jsmath-2.23li30.html#definition.A")Definition A@(/a)} & & & & \cr & = U\overline{D}D{U}^{∗} & &\text{Diagonal matrix} & & & & \cr & = UD\overline{D}{U}^{∗} & &\text{@(a href="fcla-jsmath-2.23li69.html#property.CMCN")Property CMCN@(/a)} & & & & \cr & = UD{\left (\overline{D}\right )}^{t}{U}^{∗} & &\text{Diagonal matrix} & & & & \cr & = UD{D}^{∗}{U}^{∗} & &\text{@(a href="fcla-jsmath-2.23li30.html#definition.A")Definition A@(/a)} & & & & \cr & = UD{\left ({U}^{∗}AU\right )}^{∗}{U}^{∗} & & & & \cr & = UD{U}^{∗}{A}^{∗}{\left ({U}^{∗}\right )}^{∗}{U}^{∗} & &\text{Adjoint of a product} & & & & \cr & = UD{U}^{∗}{A}^{∗}U{U}^{∗} & &\text{@(a href="fcla-jsmath-2.23li30.html#theorem.AA")Theorem AA@(/a)} & & & & \cr & = U{U}^{∗}AU{U}^{∗}{A}^{∗}U{U}^{∗} & & & & \cr & = {I}_{n}A{I}_{n}{A}^{∗}{I}_{ n} & &\text{@(a href="fcla-jsmath-2.23li33.html#definition.UM")Definition UM@(/a)} & & & & \cr & = A{A}^{∗} & &\text{@(a href="fcla-jsmath-2.23li31.html#theorem.MMIM")Theorem MMIM@(/a)} & & & & }

So by Definition NRML, A is a normal matrix.

() For the converse, suppose that A is a normal matrix. Whether or not A is normal, Theorem OBUTR provides a unitary matrix U and an upper triangular matrix T, whose diagonal entries are the eigenvalues of A, and such that {U}^{∗}AU = T. With the added condition that A is normal, we will determine that the entries of T above the diagonal must be all zero. Here we go. First we show that T is normal.

\eqalignno{ {T}^{∗}T & ={ \left ({U}^{∗}AU\right )}^{∗}{U}^{∗}AU & & & & \cr & = {U}^{∗}{A}^{∗}{\left ({U}^{∗}\right )}^{∗}{U}^{∗}AU & &\text{Adjoint of a product} & & & & \cr & = {U}^{∗}{A}^{∗}U{U}^{∗}AU & &\text{@(a href="fcla-jsmath-2.23li30.html#theorem.AA")Theorem AA@(/a)} & & & & \cr & = {U}^{∗}{A}^{∗}{I}_{ n}AU & &\text{@(a href="fcla-jsmath-2.23li33.html#definition.UM")Definition UM@(/a)} & & & & \cr & = {U}^{∗}{A}^{∗}AU & &\text{@(a href="fcla-jsmath-2.23li31.html#theorem.MMIM")Theorem MMIM@(/a)} & & & & \cr & = {U}^{∗}A{A}^{∗}U & &\text{@(a href="#definition.NRML")Definition NRML@(/a)} & & & & \cr & = {U}^{∗}A{I}_{ n}{A}^{∗}U & &\text{@(a href="fcla-jsmath-2.23li31.html#theorem.MMIM")Theorem MMIM@(/a)} & & & & \cr & = {U}^{∗}AU{U}^{∗}{A}^{∗}U & &\text{@(a href="fcla-jsmath-2.23li33.html#definition.UM")Definition UM@(/a)} & & & & \cr & = {U}^{∗}AU{U}^{∗}{A}^{∗}{\left ({U}^{∗}\right )}^{∗} & &\text{@(a href="fcla-jsmath-2.23li30.html#theorem.AA")Theorem AA@(/a)} & & & & \cr & = {U}^{∗}AU{\left ({U}^{∗}AU\right )}^{∗} & &\text{Adjoint of a product} & & & & \cr & = T{T}^{∗} & & & & }

So by Definition NRML, T is a normal matrix.

We can translate the normality of T into the statement T{T}^{∗}− {T}^{∗}T = O. We now establish an equality we will use repeatedly. For 1 ≤ i ≤ n,

\eqalignno{ 0 & ={ \left [O\right ]}_{ii} & &\text{@(a href="fcla-jsmath-2.23li30.html#definition.ZM")Definition ZM@(/a)} & & & & \cr & ={ \left [T{T}^{∗}− {T}^{∗}T\right ]}_{ ii} & &\text{@(a href="#definition.NRML")Definition NRML@(/a)} & & & & \cr & ={ \left [T{T}^{∗}\right ]}_{ ii} −{\left [{T}^{∗}T\right ]}_{ ii} & &\text{@(a href="fcla-jsmath-2.23li30.html#definition.MA")Definition MA@(/a)} & & & & \cr & ={ \mathop{∑ }}_{k=1}^{n}{\left [T\right ]}_{ ik}{\left [{T}^{∗}\right ]}_{ ki} −{\mathop{∑ }}_{k=1}^{n}{\left [{T}^{∗}\right ]}_{ ik}{\left [T\right ]}_{ki} & &\text{@(a href="fcla-jsmath-2.23li31.html#theorem.EMP")Theorem EMP@(/a)} & & & & \cr & ={ \mathop{∑ }}_{k=1}^{n}{\left [T\right ]}_{ ik}\overline{{\left [T\right ]}_{ik}} −{\mathop{∑ }}_{k=1}^{n}\overline{{\left [T\right ]}_{ ki}}{\left [T\right ]}_{ki} & &\text{@(a href="fcla-jsmath-2.23li30.html#definition.A")Definition A@(/a)} & & & & \cr & ={ \mathop{∑ }}_{k=i}^{n}{\left [T\right ]}_{ ik}\overline{{\left [T\right ]}_{ik}} −{\mathop{∑ }}_{k=1}^{i}\overline{{\left [T\right ]}_{ ki}}{\left [T\right ]}_{ki} & &\text{@(a href="#definition.UTM")Definition UTM@(/a)} & & & & \cr & ={ \mathop{∑ }}_{k=i}^{n}{\left \vert {\left [T\right ]}_{ ik}\right \vert }^{2} −{\mathop{∑ }}_{k=1}^{i}{\left \vert {\left [T\right ]}_{ ki}\right \vert }^{2} & &\text{@(a href="fcla-jsmath-2.23li69.html#definition.MCN")Definition MCN@(/a)} & & & & }

To conclude, we use the above equality repeatedly, beginning with i = 1, and discover, row by row, that the entries above the diagonal of T are all zero. The key observation is that a sum of squares can only equal zero when each term of the sum is zero. For i = 1 we have

\eqalignno{ 0 & ={ \mathop{∑ }}_{k=1}^{n}{\left \vert {\left [T\right ]}_{ 1k}\right \vert }^{2} −{\mathop{∑ }}_{k=1}^{1}{\left \vert {\left [T\right ]}_{ k1}\right \vert }^{2} ={ \mathop{∑ }}_{k=2}^{n}{\left \vert {\left [T\right ]}_{ 1k}\right \vert }^{2} & & }

which forces the conclusions

\eqalignno{ {\left [T\right ]}_{12} & = 0 &{\left [T\right ]}_{13} & = 0 &{\left [T\right ]}_{14} & = 0 &\mathrel{⋯} & &{\left [T\right ]}_{1n} & = 0 & & & & & & & & & & & & & }

For i = 2 we use the same equality, but also incorporate the portion of the above conclusions that says {\left [T\right ]}_{12} = 0,

\eqalignno{ 0 & ={ \mathop{∑ }}_{k=2}^{n}{\left \vert {\left [T\right ]}_{ 2k}\right \vert }^{2} −{\mathop{∑ }}_{k=1}^{2}{\left \vert {\left [T\right ]}_{ k2}\right \vert }^{2} ={ \mathop{∑ }}_{k=2}^{n}{\left \vert {\left [T\right ]}_{ 2k}\right \vert }^{2} −{\mathop{∑ }}_{k=2}^{2}{\left \vert {\left [T\right ]}_{ k2}\right \vert }^{2} ={ \mathop{∑ }}_{k=3}^{n}{\left \vert {\left [T\right ]}_{ 2k}\right \vert }^{2} & & }

which forces the conclusions

\eqalignno{ {\left [T\right ]}_{23} & = 0 &{\left [T\right ]}_{24} & = 0 &{\left [T\right ]}_{25} & = 0 &\mathrel{⋯} & &{\left [T\right ]}_{2n} & = 0 & & & & & & & & & & & & & }

We can repeat this process for the subsequent values of i = 3,\kern 1.95872pt 4,\kern 1.95872pt 5\mathop{\mathop{…}},\kern 1.95872pt n − 1. Notice that it is critical we do this in order, since we need to employ portions of each of the previous conclusions about rows having zero entries in order to successfully get the same conclusion for later rows. Eventually, we conclude that all of the nondiagonal entries of T are zero, so the extra assumption of normality forces T to be diagonal.

We can rearrange the conclusion of this theorem to read A = UD{U}^{∗}. Recall that a unitary matrix can be viewed as a geometry-preserving transformation (isometry), or more loosely as a rotation of sorts. Then a matrix-vector product, Ax, can be viewed instead as a sequence of three transformations. {U}^{∗} is unitary, so is a rotation. Since D is diagonal, it just multiplies each entry of a vector by a scalar. Diagonal entries that are positive or negative, with absolute values bigger or smaller than 1 evoke descriptions like reflection, expansion and contraction. Generally we can say that D “stretches” a vector in each component. Final multiplication by U undoes (inverts) the rotation performed by {U}^{∗}. So a normal matrix is a rotation-stretch-rotation transformation.

The orthonormal basis formed from the columns of U can be viewed as a system of mutually perpendicular axes. The rotation by {U}^{∗} allows the transformation by A to be replaced by the simple transformation D along these axes, and then D brings the result back to the original coordinate system. For this reason Theorem OD is known as the Principal Axis Theorem.

The columns of the unitary matrix in Theorem OD create an especially nice basis for use with the normal matrix. We record this observation as a theorem.

Theorem OBNM
Orthonormal Bases and Normal Matrices
Suppose that A is a normal matrix of size n. Then there is an orthonormal basis of {ℂ}^{n} composed of eigenvectors of A.

Proof   Let U be the unitary matrix promised by Theorem OD and let D be the resulting diagonal matrix. The desired set of vectors is formed by collecting the columns of U into a set. Theorem CUMOS says this set of columns is orthonormal. Since U is nonsingular (Theorem UMI), Theorem CNMB says the set is a basis.

Since A is diagonalized by U, the diagonal entries of the matrix D are the eigenvalues of A. An argument exactly like the second half of the proof of Theorem DC shows that each vector of the basis is an eigenvector of A.

In a vague way Theorem OBNM is an improvement on Theorem HMOE which said that eigenvectors of a Hermitian matrix for different eigenvalues are always orthogonal. Hermitian matrices are normal and we see that we can find at least one basis where every pair of eigenvectors is orthogonal. Notice that this is not a generalization, since Theorem HMOE states a weak result which applies to many (but not all) pairs of eigenvectors, while Theorem OBNM is a seemingly stronger result, but only asserts that there is one collection of eigenvectors with the stronger property.