##### DefinitionUTMUpper Triangular Matrix

The \(n\times n\) square matrix \(A\) is *upper triangular* if \(\matrixentry{A}{ij}=0\) whenever \(i\gt j\text{.}\)

We have seen in Section SD that under the right conditions a square matrix is similar to a diagonal matrix. We recognize now, via Theorem SCB, that a similarity transformation is a change of basis on a matrix representation. So we can now discuss the choice of a basis used to build a matrix representation, and decide if some bases are better than others for this purpose. This will be the tone of this section. We will also see that every matrix has a reasonably useful matrix representation, and we will discover a new class of diagonalizable linear transformations. First we need some basic facts about triangular matrices.

An upper, or lower, triangular matrix is exactly what it sounds like it should be, but here are the two relevant definitions.

The \(n\times n\) square matrix \(A\) is *upper triangular* if \(\matrixentry{A}{ij}=0\) whenever \(i\gt j\text{.}\)

The \(n\times n\) square matrix \(A\) is *lower triangular* if \(\matrixentry{A}{ij}=0\) whenever \(i\lt j\text{.}\)

Obviously, properties of a lower triangular matrices will have analogues for upper triangular matrices. Rather than stating two very similar theorems, we will say that matrices are “triangular of the same type” as a convenient shorthand to cover both possibilities and then give a proof for just one type.

Suppose that \(A\) and \(B\) are square matrices of size \(n\) that are triangular of the same type. Then \(AB\) is also triangular of that type.

The inverse of a triangular matrix is triangular, of the same type.

Suppose that \(A\) is a nonsingular matrix of size \(n\) that is triangular. Then the inverse of \(A\text{,}\) \(\inverse{A}\text{,}\) is triangular of the same type. Furthermore, the diagonal entries of \(\inverse{A}\) are the reciprocals of the corresponding diagonal entries of \(A\text{.}\) More precisely, \(\matrixentry{\inverse{A}}{ii}=\matrixentry{A}{ii}^{-1}\text{.}\)

Not every matrix is diagonalizable, but every linear transformation has a matrix representation that is an upper triangular matrix, and the basis that achieves this representation is especially pleasing. Here is the theorem.

Suppose that \(\ltdefn{T}{V}{V}\) is a linear transformation. Then there is a basis \(B\) for \(V\) such that the matrix representation of \(T\) relative to \(B\text{,}\) \(\matrixrep{T}{B}{B}\text{,}\) is an upper triangular matrix. Each diagonal entry is an eigenvalue of \(T\text{,}\) and if \(\lambda\) is an eigenvalue of \(T\text{,}\) then \(\lambda\) occurs \(\algmult{T}{\lambda}\) times on the diagonal.

A key step in this proof was the construction of the subspace \(W\) with dimension strictly less than that of \(V\text{.}\) This required an eigenvalue/eigenvector pair, which was guaranteed to us by Theorem EMHE. Digging deeper, the proof of Theorem EMHE requires that we can factor polynomials completely, into linear factors. This will not always happen if our set of scalars is the reals, \(\reals\text{.}\) So this is our final explanation of our choice of the complex numbers, \(\complexes\text{,}\) as our set of scalars. In \(\complexes\) polynomials factor completely, so every matrix has at least one eigenvalue, and an inductive argument will get us to upper triangular matrix representations.

In the case of linear transformations defined on \(\complex{m}\text{,}\) we can use the inner product (Definition IP) profitably to fine-tune the basis that yields an upper triangular matrix representation. Recall that the adjoint of matrix \(A\) (Definition A) is written as \(\adjoint{A}\text{.}\)

Suppose that \(A\) is a square matrix. Then there is a unitary matrix \(U\text{,}\) and an upper triangular matrix \(T\text{,}\) such that \begin{gather*} \adjoint{U}AU=T \end{gather*} and \(T\) has the eigenvalues of \(A\) as the entries of the diagonal.

Normal matrices comprise a broad class of interesting matrices, many of which we have met already. But they are most interesting since they define exactly which matrices we can diagonalize via a unitary matrix. This is the upcoming Theorem OD. Here is the definition.

The square matrix \(A\) is normal if \(\adjoint{A}A=A\adjoint{A}\text{.}\)

So a normal matrix commutes with its adjoint. Part of the beauty of this definition is that it includes many other types of matrices. A diagonal matrix will commute with its adjoint, since the adjoint is again diagonal and the entries are just conjugates of the entries of the original diagonal matrix. A Hermitian (self-adjoint) matrix (Definition HM) will trivially commute with its adjoint, since the two matrices are the same. A real, symmetric matrix is Hermitian, so these matrices are also normal. A unitary matrix (Definition UM) has its adjoint as its inverse, and inverses commute (Theorem OSIS), so unitary matrices are normal. Another class of normal matrices is the skew-symmetric matrices. However, these broad descriptions still do not capture all of the normal matrices, as the next example shows.

A diagonal matrix is very easy to work with in matrix multiplication (Example HPDM) and an orthonormal basis also has many advantages (Theorem COB). How about converting a matrix to a diagonal matrix through a similarity transformation using a unitary matrix (i.e. build a diagonal matrix representation with an orthonormal matrix)? That'd be fantastic! When can we do this? We can always accomplish this feat when the matrix is normal, and normal matrices are the only ones that behave this way. Here is the theorem.

Suppose that \(A\) is a square matrix. Then there is a unitary matrix \(U\) and a diagonal matrix \(D\text{,}\) with diagonal entries equal to the eigenvalues of \(A\text{,}\) such that \(\adjoint{U}AU=D\) if and only if \(A\) is a normal matrix.

We can rearrange the conclusion of this theorem to read \(A=UD\adjoint{U}\text{.}\) Recall that a unitary matrix can be viewed as a geometry-preserving transformation (isometry), or more loosely as a rotation of sorts. Then a matrix-vector product, \(A\vect{x}\text{,}\) can be viewed instead as a sequence of three transformations. \(\adjoint{U}\) is unitary, and so is a rotation. Since \(D\) is diagonal, it just multiplies each entry of a vector by a scalar. Diagonal entries that are positive or negative, with absolute values bigger or smaller than 1 evoke descriptions like reflection, expansion and contraction. Generally we can say that \(D\) “stretches” a vector in each component. Final multiplication by \(U\) undoes (inverts) the rotation performed by \(\adjoint{U}\text{.}\) So a normal matrix is a rotation-stretch-rotation transformation.

The orthonormal basis formed from the columns of \(U\) can be viewed as a system of mutually perpendicular axes. The rotation by \(\adjoint{U}\) allows the transformation by \(A\) to be replaced by the simple transformation \(D\) along these axes, and then \(D\) brings the result back to the original coordinate system. For this reason Theorem OD is known as the *Principal Axis Theorem*.

The columns of the unitary matrix in Theorem OD create an especially nice basis for use with the normal matrix. We record this observation as a theorem.

Suppose that \(A\) is a normal matrix of size \(n\text{.}\) Then there is an orthonormal basis of \(\complex{n}\) composed of eigenvectors of \(A\text{.}\)

In a vague way Theorem OBNM is an improvement on Theorem HMOE which said that eigenvectors of a Hermitian matrix for different eigenvalues are always orthogonal. Hermitian matrices are normal and we see that we can find at least one basis where *every* pair of eigenvectors is orthogonal. Notice that this is not a generalization, since Theorem HMOE states a weak result which applies to many (but not all) pairs of eigenvectors, while Theorem OBNM is a seemingly stronger result, but only asserts that there is one collection of eigenvectors with the stronger property.

Given an \(n\times n\) matrix \(A\text{,}\) an orthonormal basis for \(\complex{n}\text{,}\) comprised of eigenvectors of \(A\) is an extremely useful basis to have at the service of the matrix \(A\text{.}\) Why do we say this? We can consider the vectors of a basis as a preferred set of directions, known as “axes,” which taken together might also be called a “coordinate system.” The standard basis of Definition SUV could be considered the default, or prototype, coordinate system. When a basis is orthornormal, we can consider the directions to be standardized to have unit length, and we can consider the axes as being mutually perpendicular. But there is more — let us be a bit more formal.

Suppose \(U\) is a matrix whose columns are an orthonormal basis of eigenvectors of the \(n\times n\) matrix \(A\text{.}\) So, in particular \(U\) is a unitary matrix (Theorem CUMOS). For a vector \(\vect{x}\in\complex{n}\text{,}\) use the notation \(\hat{\vect{x}}\) for the vector representation of \(\vect{x}\) relative to the orthonormal basis. So the entries of \(\hat{\vect{x}}\text{,}\) used in a linear combination of the columns of \(U\) will create \(\vect{x}\text{.}\) With Definition MVP, we can write this relationship as \begin{gather*} U\hat{\vect{x}} = \vect{x}\text{.} \end{gather*} Since \(\adjoint{U}\) is the inverse of \(U\) (Definition UM), we can rearrange this equation as \begin{gather*} \hat{\vect{x}} = \adjoint{U}\vect{x}\text{.} \end{gather*} This says we can easily create the vector representation relative to the orthonormal basis with a matrix-vector product of the adjoint of \(U\text{.}\) Note that the adjoint is much easier to compute than a matrix inverse, which would be one general way to obtain a vector representation. This is our first observation about coordinatization relative to orthonormal basis. However, we already knew this, as we just have Theorem COB in disguise (see Exercise OD.T20).

We also know that orthonormal bases play nicely with inner products. Theorem UMPIP says unitary matrices preserve inner products (and hence norms). More geometrically, lengths and angles are preserved by multiplication by a unitary matrix. Using our notation, this becomes \begin{gather*} \innerproduct{\vect{x}}{\vect{y}} =\innerproduct{U\hat{\vect{x}}}{U\hat{\vect{y}}} =\innerproduct{\hat{\vect{x}}}{\hat{\vect{y}}}\text{.} \end{gather*} So we can compute inner products with the original vectors, or with their representations, and obtain the same result. It follows that norms, lengths, and angles can all be computed with the original vectors or with the representations in the new coordinate system based on an orthonormal basis.

So far we have not really said anything new, nor has the matrix \(A\text{,}\) or its eigenvectors, come into play. We know that a matrix is really a linear transformation, so we express this view of a matrix as a function by writing generically that \(A\vect{x}=\vect{y}\text{.}\) The matrix \(U\) will diagonalize \(A\text{,}\) creating the diagonal matrix \(D\) with diagonal entries equal to the eigenvalues of \(A\text{.}\) We can write this as \(\adjoint{U}AU = D\) and convert to \(\adjoint{U}A=D\adjoint{U}\text{.}\) Then we have \begin{gather*} \hat{\vect{y}} =\adjoint{U}\vect{y} =\adjoint{U}A\vect{x} =D\adjoint{U}\vect{x} =D\hat{\vect{x}}\text{.} \end{gather*} So with the coordinatized vectors, the transformation by the matrix \(A\) can be accomplished with multiplication by a diagonal matrix \(D\text{.}\) A moment's thought should convince you that a matrix-vector product with a diagonal matrix is exeedingly simple computationally. Geometrically, this is simply stretching, contracting and/or reflecting in the direction of each basis vector (“axis”). And the multiples used for these changes are the diagonal entries of the diagonal matrix, the eigenvalues of \(A\text{.}\)

So the new coordinate system (provided by the orthonormal basis of eigenvectors) is a collection of mutually perpendicular unit vectors where inner products are preserved, and the action of the matrix \(A\) is described by multiples (eigenvalues) of the entries of the coordinatized versions of the vectors. Nice.

Name three broad classes of normal matrices that we have studied previously. No set that you give should be a subset of another on your list.

Compare and contrast Theorem UTMR with Theorem OD.

Given an \(n\times n\) matrix \(A\text{,}\) why would you desire an orthonormal basis of \(\complex{n}\) composed entirely of eigenvectors of \(A\text{?}\)

Exercise MM.T35 asked you to show that \(A\adjoint{A}\) is Hermitian. Prove directly that \(A\adjoint{A}\) is a normal matrix.

In the discussion following Theorem OBNM we comment that the equation \(\hat{\vect{x}} = \adjoint{U}\vect{x}\) is just Theorem COB in disguise. Formulate this observation more formally and prove the equivalence.

For the proof of Theorem PTMT we only show that the product of two lower triangular matrices is again lower triangular. Provide a proof that the product of two upper triangular matrices is again upper triangular. Look to the proof of Theorem PTMT for guidance if you need a hint.