Section 2.3 Singular Value Decomposition
¶Subsection 2.3.1 Matrix-Adjoint Products
Our definitions and theorems in this section rely heavily on the properties of the matrix-adjoint products (A∗A and AA∗). We start by examining some of the basic properties of these two positive semi-definite matrices. Now would be a good time to review the basic facts about positive semi-definite matrices in Section Section 1.8.Theorem 2.3.1. Eigenvalues and Eigenvectors of Matrix-Adjoint Product.
Suppose that A is an m×n matrix and A∗A has rank r. Let λ1,λ2,λ3,…,λp be the nonzero distinct eigenvalues of A∗A and let ρ1,ρ2,ρ3,…,ρq be the nonzero distinct eigenvalues of AA∗. Then,
- p=q.
- The distinct nonzero eigenvalues can be ordered such that λi=ρi, 1≤i≤p.
- Properly ordered, the algebraic multiplicities of the nonzero eigenvalues are identical, αA∗A(λi)=αAA∗(ρi), 1≤i≤p.
- The rank of A∗A is equal to the rank of AA∗.
- There is an orthonormal basis, {x1,x2,x3,…,xn} of Cn composed of eigenvectors of A∗A and an orthonormal basis, {y1,y2,y3,…,ym} of Cm composed of eigenvectors of AA∗ with the following properties. Order the eigenvectors so that xi, r+1≤i≤n are the eigenvectors of A∗A for the zero eigenvalue. Let δi, 1≤i≤r denote the nonzero eigenvalues of A∗A. Then Axi=√δiyi, 1≤i≤r and Axi=0, r+1≤i≤n. Finally, yi, r+1≤i≤m, are eigenvectors of AA∗ for the zero eigenvalue.
Proof.
Suppose that \(\vect{x}\in\complex{n}\) is any eigenvector of \(\adjoint{A}A\) for a nonzero eigenvalue \(\lambda\text{.}\) We will show that \(A\vect{x}\) is an eigenvector of \(A\adjoint{A}\) for the same eigenvalue, \(\lambda\text{.}\) First, we ascertain that \(A\vect{x}\) is not the zero vector.
Since \(\vect{x}\) is an eigenvector, \(\vect{x}\neq\zerovector\text{,}\) and by Theorem PIP, \(\innerproduct{\vect{x}}{\vect{x}}\neq 0\text{.}\) As \(\lambda\) was assumed to be nonzero, we see that \(\innerproduct{A\vect{x}}{A\vect{x}}\neq 0\text{.}\) Again, Theorem PIP tells us that \(A\vect{x}\neq\zerovector\text{.}\)
Much of the sequel turns on the following simple computation. If you ever wonder what all the fuss is about adjoints, Hermitian matrices, square roots, and singular values, return to this brief computation, as it holds the key. There is much more to do in this proof, but after this it is mostly bookkeeping. Here we go. We check that \(A\vect{x}\) functions as an eigenvector of \(A\adjoint{A}\) for the eigenvalue \(\lambda\text{,}\)
That's it. If \(\vect{x}\) is an eigenvector of \(\adjoint{A}A\text{,}\) for a nonzero eigenvalue, then \(A\vect{x}\) is an eigenvector for \(A\adjoint{A}\) for the same nonzero eigenvalue. Let's see what this buys us.
\(\adjoint{A}A\) and \(A\adjoint{A}\) are Hermitian matrices (Definition HM), and hence are normal (Definition Definition 1.7.1). This provides the existence of orthonormal bases of eigenvectors for each matrix by Theorem OBNM. Also, since each matrix is diagonalizable (Definition DZM) by Theorem OD the algebraic and geometric multiplicities of each eigenvalue (zero or not) are equal by Theorem DMFE.
Our first step is to establish that a nonzero eigenvalue \(\lambda\) has the same geometric multiplicity for both \(\adjoint{A}A\) and \(A\adjoint{A}\text{.}\) Suppose \(\set{\vectorlist{x}{s}}\) is an orthonormal basis of eigenvectors of \(\adjoint{A}A\) for the eigenspace \(\eigenspace{\adjoint{A}A}{\lambda}\text{.}\) Then for \(1\leq i\lt j\leq s\text{,}\)
So the set \(E=\set{A\vect{x}_1,\,A\vect{x}_2,\,A\vect{x}_3,\,\dots,\,A\vect{x}_s}\) is an orthogonal set of nonzero eigenvectors of \(A\adjoint{A}\) for the eigenvalue \(\lambda\text{.}\) By Theorem OSLI, the set \(E\) is linearly independent and so the geometric multiplicity of \(\lambda\) as an eigenvalue of \(A\adjoint{A}\) is \(s\) or greater. We have
This inequality applies to any matrix for any of its nonzero eigenvalues. We now apply it to the matrix \(\adjoint{A}\text{,}\)
With the twin inequalities, we see that the four multiplicities of a common nonzero eigenvalue of \(\adjoint{A}A\) and \(A\adjoint{A}\) are all equal. This is enough to establish that \(p=q\text{,}\) since we cannot have an eigenvalue of one of the matrix-adjoint products along with a zero algebraic multiplicity for the other matrix-adjoint product. Then the eigenvalues can be ordered such that \(\lambda_i=\rho_i\) for \(1\leq i\leq p\text{.}\)
For any matrix, the null space is identical to the eigenspace of the zero eigenvalue, and thus the nullity of the matrix is equal to the geometric multiplicity of the zero eigenvalue. As our matrix-adjoint products are diagonalizable, the nullity is equal to the algebraic multiplicity of the zero eigenvalue. The algebraic multiplicities of all the eigenvalues sum to the size of the matrix (Theorem NEM), as does the rank and nullity (Theorem RPNC). So the rank of our matrix-adjoint products is equal to the sum of the algebraic multiplicities of the nonzero eigenvalues. So the ranks of \(\adjoint{A}A\) and \(A\adjoint{A}\) are equal,
When \(A\) is rectangular, the square matrices \(\adjoint{A}A\) and \(A\adjoint{A}\) have different sizes. With equal algebraic and geometric multiplicities for their common nonzero eigenvalues, the difference in their sizes is manifest in different algebraic multiplicities for the zero eigenvalue and different nullities. Specifically,
Suppose that \(\vectorlist{x}{n}\) is an orthonormal basis of \(\complex{n}\) composed of eigenvectors of \(\adjoint{A}A\) and ordered so that \(\vect{x}_i\text{,}\) \(r+1\leq i\leq n\) are eigenvectors of \(\adjoint{A}A\) for the zero eigenvalue. Denote the associated nonzero eigenvalues of \(\adjoint{A}A\) for these eigenvectors by \(\vect{\delta}_i\text{,}\) \(1\leq i\leq r\text{.}\) Then define
Let \(\vect{y}_{r+1},\,\vect{y}_{r+2},\,\vect{y}_{r+2},\,\dots,\,\vect{y}_{m}\) be an orthonormal basis for the eigenspace \(\eigenspace{A\adjoint{A}}{0}\text{,}\) whose existence is guaranteed by the Gram-Schmidt process (Theorem GSP). As scalar multiples of demonstrated eigenvectors of \(A\adjoint{A}\text{,}\) \(\vect{y}_i\text{,}\) \(1\leq i\leq r\) are also eigenvectors of \(A\adjoint{A}\text{,}\) and \(\vect{y}_i\text{,}\) \(r+1\leq i\leq n\) have been chosen as eigenvectors of \(A\adjoint{A}\text{.}\) These eigenvectors also have norm 1, as we now show.
For \(1\leq i\leq r\text{,}\)
For \(r+1\leq i\leq n\text{,}\) the \(\vect{y}_i\) have been chosen to have norm 1.
Finally we check orthogonality. Consider two eigenvectors \(\vect{y}_i\) and \(\vect{y}_j\) with \(1\leq i\lt j\leq m\text{.}\) If these two vectors have different eigenvalues, then Theorem HMOE establishes that the two eigenvectors are orthogonal. If the two eigenvectors have a zero eigenvalue, then they are orthogonal by the choice of the orthonormal basis of \(\eigenspace{A\adjoint{A}}{0}\text{.}\) If the two eigenvectors have identical eigenvalues, which are nonzero, then
So \(\set{\vectorlist{y}{m}}\) is an orthonormal set of eigenvectors for \(A\adjoint{A}\text{.}\) The critical relationship between these two orthonormal bases is present by design. For \(1\leq i\leq r\text{,}\)
For \(r+1\leq i\leq n\) we have
So by Theorem PIP, \(A\vect{x}_i=\zerovector\text{.}\)
Subsection 2.3.2 Singular Value Decomposition
The square roots of the eigenvalues of A∗A (or almost equivalently, AA∗!) are known as the singular values of A. Here is the definition.Definition 2.3.2. Singular Values.
Suppose A is an m×n matrix. If the eigenvalues of A∗A are δ1,δ2,δ3,…,δn, then the singular values of A are
Theorem 2.3.3. Singular Value Decomposition.
Suppose A is an m×n matrix of rank r with nonzero singular values s1,s2,s3,…,sr. Then A=USV∗ where U is a unitary matrix of size m, V is a unitary matrix of size n and S is an m×n matrix given by
Proof.
Let \(\vectorlist{x}{n}\) and \(\vectorlist{y}{m}\) be the orthonormal bases described by the conclusion of Theorem Theorem 2.3.1. Define \(U\) to be the \(m\times m\) matrix whose columns are \(\vect{y}_i\text{,}\) \(1\leq i\leq m\text{,}\) and define \(V\) to be the \(n\times n\) matrix whose columns are \(\vect{x}_i\text{,}\) \(1\leq i\leq n\text{.}\) With orthonormal sets of columns, both \(U\) and \(V\) are unitary matrices by Theorem CUMOS.
Then for \(1\leq i\leq m\text{,}\) \(1\leq j\leq n\text{,}\)
So by Theorem ME, \(AV\) and \(US\) are equal matrices, \(AV=US\text{.}\) \(V\) is unitary, so applying its inverse yields the decomposition
Subsection 2.3.3 Visualizing the SVD
¶It is helpful to think of the orthonormal bases that are the columns of U and V as “coordinate systems,” since they are pairwise “perpendicular” unit vectors, like the →i,→j,→k often used in describing the geometry of space. We would then call each basis vector an “axis”. Now think of an m×n matrix as a function from Cn to Cm. For an input vector w∈Cn, we have the output vector y=Aw∈Cm. If we write the output vector using the SVD decomposition of A as Aw=USV∗w we can consider the output as a three-step process that is more formally a composition of three linear transformations. Recall that unitary matrices preserve inner products, and thus preserve norms (length) and relative positions of two vectors (the “angle” between vectors), which is why unitary matrices are sometimes called “isometries” (Theorem UMPIP).- V∗ is the inverse of V so it will take any basis vector xi to a column of the identity matrix ei, an element of the standard basis of Cn, or and axis of the “usual” coordinate system. Any other vector, such as our w, will have its length preserved, but changing its position, though its position relative to the axes is unchanged.
- S converts the “rotated” w from Cn to Cm. But it does so in a very simple way. It simply scales each entry of the vector by a positive amount using a different singular value for each entry. If m>n, then the extra entries are simply new zeros. If m<n, then some entries get discarded.
- U will convert the standard basis vectors (the usual axes) to the new orthonormal basis given by the yi. The twice-transformed version of w will have its length preserved, but change position, though its position relative to the axes is unchanged.
Subsection 2.3.4 Properties of the SVD
¶The appeal of the singular value decomposition is two-fold. First it is applicable to any matrix. That should be obvious enough. Second, components of the SVD provide a wealth of information about a matrix, and in the case of numerical matrices, they are well-behaved. In this subsection we collect various theorems about the SVD, and explore the consequences in a section about applications, Section (((section-applications-of-SVD))). The SVD gives a decomposition of a matrix of a sum of rank one matrices, and the magnitude of the singular values tells us which of these rank one matrices is the most “important”.Theorem 2.3.4. SVD Rank One Decomposition.
Suppose the singular value decomposition of an m×n matrix A is given by A=USV∗, where the nonzero singular values in the first r entries of the diagonal of S are s1,s2,s3,…,sr, and the columns of U and V are, respectively, y1,y2,y3,…,ym and x1,x2,x3,…,xn. Then
Proof.
As usual, let \(\vect{e}_i\) be column \(i\) of the identity matrix \(I_m\text{.}\) Define \(S_i\text{,}\) for \(1\leq i\leq r\) to be the \(m\times n\) matrix where every entry is zero, except column \(i\) is \(s_i\vect{e}_i\text{.}\) Then, by design, \(S=\sum_{i=1}^r\,S_i\text{.}\) We have,