Section SVD: Singular Value Decomposition

From A First Course in Linear Algebra
Version 1.08
© 2004.
Licensed under the GNU Free Documentation License.

This Section is a Draft, Subject to Changes
Needs Numerical Examples

The singular value decomposition is one of the more useful ways to represent any matrix, even rectangular ones. We can also view the singular values of a (rectangular) matrix as analogues of the eigenvalues of a square matrix. Our definitions and theorems in this section rely heavily on the properties of the matrix-adjoint products (A^*A and AA^*), which we first met in Theorem CPSM. We start by examining some of the basic properties of these two matrices. Now would be a good time to review the basic facts about positive semi-definite matrices in Section PSM.

Subsection MAP: Matrix-Adjoint Product

Theorem EEMAP
Eigenvalues and Eigenvectors of Matrix-Adjoint Product
Suppose that A is an m × n matrix and A^*A has rank r. Let λ_1, λ_2, λ_3, …, λ_p be the nonzero distinct eigenvalues of A^*A and let ρ_1, ρ_2, ρ_3, …, ρ_q be the nonzero distinct eigenvalues of AA^*. Then,

  1. p = q.
  2. The distinct nonzero eigenvalues can be ordered such that λ_i = ρ_i, 1 ≤ i ≤ p.
  3. Properly ordered, α_{A^*A}(λ_i) = α_{AA^*}(ρ_i), 1 ≤ i ≤ p.
  4. The rank of A^*A is equal to the rank of AA^*.
  5. There is an orthonormal basis, {x_1, x_2, x_3, …, x_n}, of ℂ^n composed of eigenvectors of A^*A and an orthonormal basis, {y_1, y_2, y_3, …, y_m}, of ℂ^m composed of eigenvectors of AA^*, with the following properties. Order the eigenvectors so that x_i, r + 1 ≤ i ≤ n, are the eigenvectors of A^*A for the zero eigenvalue. Let δ_i, 1 ≤ i ≤ r, denote the nonzero eigenvalues of A^*A. Then Ax_i = √δ_i y_i, 1 ≤ i ≤ r, and Ax_i = 0, r + 1 ≤ i ≤ n. Finally, y_i, r + 1 ≤ i ≤ m, are eigenvectors of AA^* for the zero eigenvalue.
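
The draft note above asks for numerical examples, so here is a minimal sketch of items 1, 2 and 4, assuming numpy is available; the 2 × 3 matrix A below is an arbitrary illustrative choice, not from the text.

```python
import numpy as np

# Illustrative check of Theorem EEMAP, items 1, 2, 4: the nonzero
# eigenvalues of A^*A and AA^* agree, with the same multiplicities,
# and the two products have equal rank.  The matrix A is arbitrary.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])        # 2 x 3, so A^*A is 3 x 3 and AA^* is 2 x 2

AstarA = A.conj().T @ A
AAstar = A @ A.conj().T

eig_n = np.linalg.eigvalsh(AstarA)     # ascending eigenvalues of the Hermitian A^*A
eig_m = np.linalg.eigvalsh(AAstar)

nonzero_n = eig_n[eig_n > 1e-10]       # drop the zero eigenvalue(s)
nonzero_m = eig_m[eig_m > 1e-10]

print(nonzero_n)                       # approximately [1. 6.]
print(nonzero_m)                       # approximately [1. 6.]
print(np.linalg.matrix_rank(AstarA) == np.linalg.matrix_rank(AAstar))   # True
```

For this A, the nonzero eigenvalues work out to 1 and 6 in both cases; A^*A carries one extra zero eigenvalue because it is the larger matrix.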

Proof   Suppose that x ∈ ℂ^n is any eigenvector of A^*A for a nonzero eigenvalue λ. We will show that Ax is an eigenvector of AA^* for the same eigenvalue, λ. First, we ascertain that Ax is not the zero vector.

⟨Ax, Ax⟩ = ⟨Ax, (A^*)^* x⟩      Theorem AA
         = ⟨A^*Ax, x⟩           Theorem AIP
         = ⟨λx, x⟩              Definition EEM
         = λ⟨x, x⟩              Theorem IPSM

Since x is an eigenvector, x ≠ 0, and by Theorem PIP, ⟨x, x⟩ ≠ 0. As λ was assumed to be nonzero, we see that ⟨Ax, Ax⟩ ≠ 0. Again, Theorem PIP tells us that Ax ≠ 0.

Much of the sequel turns on the following simple computation. If you ever wonder what all the fuss is about adjoints, Hermitian matrices, square roots, and singular values, return to this brief computation, as it holds the key. There is much more to do in this proof, but after this it is mostly bookkeeping. Here we go. We check that Ax functions as an eigenvector of AA^* for the eigenvalue λ,

AA^*(Ax)?  Wait, rather:
AA^*Ax = A(A^*Ax)     Theorem MMA
       = A(λx)        Definition EEM
       = λ(Ax)        Theorem MMSMM

That’s it. If x is an eigenvector of A^*A (for a nonzero eigenvalue), then Ax is an eigenvector of AA^* for the same eigenvalue. Let’s see what this buys us.
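
This transfer of eigenvectors can be replayed numerically; a minimal numpy sketch, where the matrix A is again an arbitrary illustrative choice:

```python
import numpy as np

# If x is an eigenvector of A^*A for a nonzero eigenvalue lam, then
# Ax is a (nonzero) eigenvector of AA^* for the same lam.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
AstarA = A.conj().T @ A

lams, X = np.linalg.eigh(AstarA)   # ascending eigenvalues, orthonormal eigenvector columns
lam, x = lams[-1], X[:, -1]        # the largest eigenvalue is certainly nonzero here

Ax = A @ x
print(np.linalg.norm(Ax) > 1e-10)                      # True: Ax is not the zero vector
print(np.allclose(A @ A.conj().T @ Ax, lam * Ax))      # True: AA^*(Ax) = lam (Ax)
```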

A^*A and AA^* are Hermitian matrices (Definition HM), and hence are normal (Definition NRML). This provides the existence of orthonormal bases of eigenvectors for each matrix by Theorem OBNM. Also, since each matrix is diagonalizable (Definition DZM) by Theorem OD, we can interchange algebraic and geometric multiplicities by Theorem DMFE.

Our first step is to establish that a nonzero eigenvalue λ has the same geometric multiplicity for both A^*A and AA^*. Suppose {x_1, x_2, x_3, …, x_s} is an orthonormal basis of eigenvectors of A^*A for the eigenspace E_{A^*A}(λ). Then for 1 ≤ i < j ≤ s, note

⟨Ax_i, Ax_j⟩ = ⟨Ax_i, (A^*)^* x_j⟩     Theorem AA
            = ⟨A^*Ax_i, x_j⟩          Theorem AIP
            = ⟨λx_i, x_j⟩             Definition EEM
            = λ⟨x_i, x_j⟩             Theorem IPSM
            = λ(0)                    Definition ONS
            = 0                       Property ZCN

Then the set E = {Ax_1, Ax_2, Ax_3, …, Ax_s} is an orthogonal set of nonzero eigenvectors of AA^* for the eigenvalue λ. By Theorem OSLI, the set E is linearly independent and so the geometric multiplicity of λ as an eigenvalue of AA^* is s or greater. We have

α_{A^*A}(λ) = γ_{A^*A}(λ) ≤ γ_{AA^*}(λ) = α_{AA^*}(λ)

This inequality applies to any matrix, so long as the eigenvalue is nonzero. We now apply it to the matrix A^*,

α_{AA^*}(λ) = α_{(A^*)^*(A^*)}(λ) ≤ α_{(A^*)(A^*)^*}(λ) = α_{A^*A}(λ)

So for a nonzero eigenvalue, its algebraic multiplicities as an eigenvalue of A^*A and AA^* are equal. This is enough to establish that p = q and that the eigenvalues can be ordered such that λ_i = ρ_i for 1 ≤ i ≤ p.

For any matrix B, the null space is identical to the eigenspace of the zero eigenvalue, N(B) = E_B(0), and thus the nullity of the matrix is equal to the geometric multiplicity of the zero eigenvalue. With this, we can examine the ranks of A^*A and AA^*.

r(A^*A) = n − n(A^*A)                                               Theorem RPNC
        = α_{A^*A}(0) + Σ_{i=1}^{p} α_{A^*A}(λ_i) − n(A^*A)         Theorem NEM
        = α_{A^*A}(0) + Σ_{i=1}^{p} α_{A^*A}(λ_i) − γ_{A^*A}(0)     Definition GME
        = α_{A^*A}(0) + Σ_{i=1}^{p} α_{A^*A}(λ_i) − α_{A^*A}(0)     Theorem DMFE
        = Σ_{i=1}^{p} α_{A^*A}(λ_i)
        = Σ_{i=1}^{p} α_{AA^*}(λ_i)
        = α_{AA^*}(0) + Σ_{i=1}^{p} α_{AA^*}(λ_i) − α_{AA^*}(0)
        = α_{AA^*}(0) + Σ_{i=1}^{p} α_{AA^*}(λ_i) − γ_{AA^*}(0)     Theorem DMFE
        = α_{AA^*}(0) + Σ_{i=1}^{p} α_{AA^*}(λ_i) − n(AA^*)         Definition GME
        = m − n(AA^*)                                               Theorem NEM
        = r(AA^*)                                                   Theorem RPNC

When A is rectangular, the square matrices A^*A and AA^* have different sizes. With equal algebraic and geometric multiplicities for their common nonzero eigenvalues, the difference in their sizes is manifest in different algebraic multiplicities for the zero eigenvalue and different nullities. Specifically,

n(A^*A) = n − r        n(AA^*) = m − r
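
A quick numerical sanity check of these two nullities, sketched with numpy on an arbitrarily chosen rank-1 matrix (so that n − r and m − r visibly differ):

```python
import numpy as np

# n(A^*A) = n - r and n(AA^*) = m - r, checked on a 2 x 3 matrix of rank 1.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])        # second row is twice the first, so r = 1
m, n = A.shape
r = np.linalg.matrix_rank(A)

null_AstarA = n - np.linalg.matrix_rank(A.conj().T @ A)   # nullity = size - rank
null_AAstar = m - np.linalg.matrix_rank(A @ A.conj().T)

print(r, null_AstarA, null_AAstar)     # 1 2 1
```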

Suppose that {x_1, x_2, x_3, …, x_n} is an orthonormal basis of ℂ^n composed of eigenvectors of A^*A and ordered so that x_i, r + 1 ≤ i ≤ n, are eigenvectors of A^*A for the zero eigenvalue. Denote the associated nonzero eigenvalues of A^*A for the eigenvectors x_i, 1 ≤ i ≤ r, by δ_i. Then define

y_i = (1/√δ_i) A x_i        1 ≤ i ≤ r

Let {y_{r+1}, y_{r+2}, y_{r+3}, …, y_m} be an orthonormal basis for the eigenspace E_{AA^*}(0), whose existence is guaranteed by Theorem GSP. As scalar multiples of demonstrated eigenvectors of AA^*, y_i, 1 ≤ i ≤ r, are also eigenvectors of AA^*, and y_i, r + 1 ≤ i ≤ m, have been chosen as eigenvectors of AA^*. These eigenvectors also have norm 1, as we now show. For 1 ≤ i ≤ r,

‖y_i‖² = ⟨y_i, y_i⟩                                   Theorem IPN
       = ⟨(1/√δ_i) Ax_i, (1/√δ_i) Ax_i⟩
       = (1/√δ_i) conj(1/√δ_i) ⟨Ax_i, Ax_i⟩           Theorem IPSM
       = (1/√δ_i) (1/√δ_i) ⟨Ax_i, Ax_i⟩               Theorem HMRE
       = (1/δ_i) ⟨Ax_i, (A^*)^* x_i⟩                  Theorem AA
       = (1/δ_i) ⟨A^*Ax_i, x_i⟩                       Theorem AIP
       = (1/δ_i) ⟨δ_i x_i, x_i⟩                       Definition EEM
       = (δ_i/δ_i) ⟨x_i, x_i⟩                         Theorem IPSM
       = (1)(1)                                       Definition ONS
       = 1

For r + 1 ≤ i ≤ m, the y_i have been chosen to have norm 1.

Finally, we check orthogonality. Consider two eigenvectors y_i and y_j with 1 ≤ i < j ≤ m. If these two vectors have different eigenvalues, then Theorem HMOE establishes that the two eigenvectors are orthogonal. If the two eigenvectors have a zero eigenvalue, then they are orthogonal by the choice of the orthonormal basis of E_{AA^*}(0). If the two eigenvectors have identical, nonzero, eigenvalues, then

⟨y_i, y_j⟩ = ⟨(1/√δ_i) Ax_i, (1/√δ_j) Ax_j⟩
           = (1/√δ_i) conj(1/√δ_j) ⟨Ax_i, Ax_j⟩        Theorem IPSM
           = (1/(√δ_i √δ_j)) ⟨Ax_i, Ax_j⟩              Theorem HMRE
           = (1/(√δ_i √δ_j)) ⟨Ax_i, (A^*)^* x_j⟩       Theorem AA
           = (1/(√δ_i √δ_j)) ⟨A^*Ax_i, x_j⟩            Theorem AIP
           = (1/(√δ_i √δ_j)) ⟨δ_i x_i, x_j⟩            Definition EEM
           = (δ_i/(√δ_i √δ_j)) ⟨x_i, x_j⟩              Theorem IPSM
           = (δ_i/(√δ_i √δ_j)) (0)                     Definition ONS
           = 0

So {y_1, y_2, y_3, …, y_m} is an orthonormal set of eigenvectors for AA^*. The critical relationship between these two orthonormal bases is present by design. For 1 ≤ i ≤ r,

A x_i = √δ_i ((1/√δ_i) A x_i) = √δ_i y_i

For r + 1 ≤ i ≤ n we have

⟨Ax_i, Ax_i⟩ = ⟨Ax_i, (A^*)^* x_i⟩     Theorem AA
            = ⟨A^*Ax_i, x_i⟩          Theorem AIP
            = ⟨0, x_i⟩                Definition EEM
            = 0                       Definition IP

So by Theorem PIP, Ax_i = 0.
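
The construction in this proof can be traced numerically. Here is a numpy sketch, on an arbitrary illustrative matrix, that builds the y_i from the x_i and checks the claimed properties:

```python
import numpy as np

# Build y_i = (1/sqrt(delta_i)) A x_i from orthonormal eigenvectors x_i of
# A^*A, then verify the y_i are unit eigenvectors of AA^* (the heart of
# Theorem EEMAP's conclusion).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
AstarA = A.conj().T @ A
AAstar = A @ A.conj().T

deltas, X = np.linalg.eigh(AstarA)
order = np.argsort(-deltas)            # reorder so nonzero eigenvalues come first
deltas, X = deltas[order], X[:, order]
r = int(np.sum(deltas > 1e-10))

for i in range(r):
    yi = A @ X[:, i] / np.sqrt(deltas[i])
    print(np.isclose(np.linalg.norm(yi), 1.0),           # unit norm, as proved above
          np.allclose(AAstar @ yi, deltas[i] * yi))      # eigenvector of AA^*
```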

Subsection SVD: Singular Value Decomposition

The square roots of the eigenvalues of A^*A (or almost equivalently, AA^*!) are known as the singular values of A. Here is the definition.

Definition SV
Singular Values
Suppose A is an m × n matrix. If the eigenvalues of A^*A are δ_1, δ_2, δ_3, …, δ_n, then the singular values of A are √δ_1, √δ_2, √δ_3, …, √δ_n.
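
In numbers (a sketch assuming numpy; the matrix is an arbitrary illustration): the square roots of the eigenvalues of A^*A match the singular values a library SVD routine reports.

```python
import numpy as np

# Definition SV numerically: singular values of A are the square roots of
# the eigenvalues of A^*A.  numpy's svd returns min(m, n) of them, in
# descending order, so we compare that many.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
deltas = np.linalg.eigvalsh(A.conj().T @ A)               # ascending eigenvalues of A^*A
sv_from_def = np.sqrt(np.clip(deltas, 0.0, None))[::-1]   # descending; clip tiny negatives
sv_from_lib = np.linalg.svd(A, compute_uv=False)

k = min(A.shape)
print(np.allclose(sv_from_def[:k], sv_from_lib))          # True
```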

Theorem EEMAP is a total setup for the singular value decomposition. This remarkable theorem says that any matrix can be broken into a product of three matrices. Two are square, and unitary. In light of Theorem UMPIP, we can view these matrices as transforming vectors or coordinates in a rotational fashion. The middle matrix of this decomposition is rectangular, but is as close to being diagonal as a rectangular matrix can be. Viewed as a transformation, this matrix effects reflections, contractions, or expansions along axes; it stretches vectors. So any matrix, viewed as a transformation, is the product of a rotation, a stretch and a rotation.

The singular value theorem can also be viewed as an application of our most general statement about matrix representations of linear transformations relative to different bases. Theorem MRCB concerns linear transformations T : U → V where U and V are possibly different vector spaces. When U and V have different dimensions, the resulting matrix representation will be rectangular. In Section CB we quickly specialized to the case where U = V and the matrix representations are square, with one of our most central results, Theorem SCB. Theorem SVD is an application of the full generality of Theorem MRCB where the relevant bases are now orthonormal sets.

Theorem SVD
Singular Value Decomposition
Suppose A is an m × n matrix of rank r with nonzero singular values s_1, s_2, s_3, …, s_r. Then A = UDV^* where U is a unitary matrix of size m, V is a unitary matrix of size n, and D is an m × n matrix given by

D_{ij} = s_i  if 1 ≤ i = j ≤ r,    D_{ij} = 0  otherwise

Proof   Let {x_1, x_2, x_3, …, x_n} and {y_1, y_2, y_3, …, y_m} be the orthonormal bases described by the conclusion of Theorem EEMAP. Define U to be the m × m matrix whose columns are y_i, 1 ≤ i ≤ m, and define V to be the n × n matrix whose columns are x_i, 1 ≤ i ≤ n. With orthonormal sets of columns, by Theorem CUMOS both U and V are unitary matrices.

Then for 1 ≤ i ≤ m, 1 ≤ j ≤ n,

[AV]_{ij} = [Ax_j]_i          Definition MM
          = [√δ_j y_j]_i      Theorem EEMAP
          = [s_j y_j]_i       Definition SV
          = [UD]_{ij}         Definition MM

So by Theorem ME, AV = UD and thus

A = A I_n = A V V^* = U D V^*
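
To close the loop, the decomposition can be assembled by hand along the lines of this proof. A numpy sketch on an arbitrary illustrative matrix (chosen so that r = m, so no extra zero-eigenvalue columns of U are needed):

```python
import numpy as np

# Assemble A = U D V^* as in the proof: V from eigenvectors of A^*A,
# the first r columns of U from y_j = (1/s_j) A x_j, D rectangular diagonal.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
m, n = A.shape

deltas, V = np.linalg.eigh(A.conj().T @ A)
order = np.argsort(-deltas)                # nonzero eigenvalues first
deltas, V = deltas[order], V[:, order]
r = int(np.sum(deltas > 1e-10))
s = np.sqrt(deltas[:r])                    # the nonzero singular values

U = np.zeros((m, m))
U[:, :r] = A @ V[:, :r] / s                # y_j = (1/s_j) A x_j; here r = m fills U

D = np.zeros((m, n))
D[:r, :r] = np.diag(s)

print(np.allclose(A, U @ D @ V.conj().T))  # True
```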