Section SVD Singular Value Decomposition

The singular value decomposition is one of the more useful ways to represent any matrix, even rectangular ones. We can also view the singular values of a (rectangular) matrix as analogues of the eigenvalues of a square matrix. Our definitions and theorems in this section rely heavily on the properties of the matrix-adjoint products (A∗A and AA∗), which we first met in Theorem CPSM. We start by examining some of the basic properties of these two matrices. Now would be a good time to review the basic facts about positive semi-definite matrices in Section PSM.

Subsection MAP: Matrix-Adjoint Product

Theorem EEMAP
Eigenvalues and Eigenvectors of Matrix-Adjoint Product
Suppose that A is an m×n matrix and A∗A has rank r. Let λ1

λ2

λ3

…

λp be the nonzero distinct eigenvalues of A∗A and let ρ1

ρ2

ρ3

…

ρq be the nonzero distinct eigenvalues of AA∗. Then,

Proof Suppose that x ∈ {ℂ}^{n} is any eigenvector of {A}^{∗}A for a nonzero eigenvalue λ. We will show that Ax is an eigenvector of A{A}^{∗} for the same eigenvalue, λ. First, we ascertain that Ax is not the zero vector.

\eqalignno{ \left \langle Ax,\kern 1.95872pt Ax\right \rangle & = \left \langle Ax,\kern 1.95872pt {\left ({A}^{∗}\right )}^{∗}x\right \rangle & &\text{@(a href="fcla-jsmath-2.01li30.html#theorem.AA")Theorem AA@(/a)} & & & & \cr & = \left \langle {A}^{∗}Ax,\kern 1.95872pt x\right \rangle & &\text{@(a href="fcla-jsmath-2.01li31.html#theorem.AIP")Theorem AIP@(/a)} & & & & \cr & = \left \langle λx,\kern 1.95872pt x\right \rangle & &\text{@(a href="fcla-jsmath-2.01li47.html#definition.EEM")Definition EEM@(/a)} & & & & \cr & = λ\left \langle x,\kern 1.95872pt x\right \rangle & &\text{@(a href="fcla-jsmath-2.01li28.html#theorem.IPSM")Theorem IPSM@(/a)} & & & & }

Since x is an eigenvector, x\mathrel{≠}0, and by Theorem PIP, \left \langle x,\kern 1.95872pt x\right \rangle \mathrel{≠}0. As λ was assumed to be nonzero, we see that \left \langle Ax,\kern 1.95872pt Ax\right \rangle \mathrel{≠}0. Again, Theorem PIP tells us that Ax\mathrel{≠}0.

Much of the sequel turns on the following simple computation. If you ever wonder what all the fuss is about adjoints, Hermitian matrices, square roots, and singular values, return to this brief computation, as it holds the key. There is much more to do in this proof, but after this it is mostly bookkeeping. Here we go. We check that Ax functions as an eigenvector of A{A}^{∗} for the eigenvalue λ,

\eqalignno{ \left (A{A}^{∗}\right )Ax & = A\left ({A}^{∗}A\right )x & &\text{@(a href="fcla-jsmath-2.01li31.html#theorem.MMA")Theorem MMA@(/a)} & & & & \cr & = Aλx & &\text{@(a href="fcla-jsmath-2.01li47.html#definition.EEM")Definition EEM@(/a)} & & & & \cr & = λ\left (Ax\right ) & &\text{@(a href="fcla-jsmath-2.01li31.html#theorem.MMSMM")Theorem MMSMM@(/a)} & & & & }

That’s it. If x is an eigenvector of {A}^{∗}A (for a nonzero eigenvalue), then Ax is an eigenvector for A{A}^{∗} for the same eigenvalue. Let’s see what this buys us.

{A}^{∗}A and A{A}^{∗} are Hermitian matrices (Definition HM), and hence are normal (Definition NRML). This provides the existence of orthonormal bases of eigenvectors for each matrix by Theorem OBNM. Also, since each matrix is diagonalizable (Definition DZM) by Theorem OD we can interchange algebraic and geometric multiplicities by Theorem DMFE.

Our first step is to establish that an eigenvalue λ has the same geometric multiplicity for both {A}^{∗}A and A{A}^{∗}. Suppose \left \{{x}_{1},\kern 1.95872pt {x}_{2},\kern 1.95872pt {x}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {x}_{s}\right \} is an orthonormal basis of eigenvectors of {A}^{∗}A for the eigenspace {ℰ}_{{A}^{∗}A}\left (λ\right ). Then for 1 ≤ i < j ≤ s, note

\eqalignno{ \left \langle A{x}_{i},\kern 1.95872pt A{x}_{j}\right \rangle & = \left \langle A{x}_{i},\kern 1.95872pt {\left ({A}^{∗}\right )}^{∗}{x}_{ j}\right \rangle & &\text{@(a href="fcla-jsmath-2.01li30.html#theorem.AA")Theorem AA@(/a)} & & & & \cr & = \left \langle {A}^{∗}A{x}_{ i},\kern 1.95872pt {x}_{j}\right \rangle & &\text{@(a href="fcla-jsmath-2.01li31.html#theorem.AIP")Theorem AIP@(/a)} & & & & \cr & = \left \langle λ{x}_{i},\kern 1.95872pt {x}_{j}\right \rangle & &\text{@(a href="fcla-jsmath-2.01li47.html#definition.EEM")Definition EEM@(/a)} & & & & \cr & = λ\left \langle {x}_{i},\kern 1.95872pt {x}_{j}\right \rangle & &\text{@(a href="fcla-jsmath-2.01li28.html#theorem.IPSM")Theorem IPSM@(/a)} & & & & \cr & = λ(0) & &\text{@(a href="fcla-jsmath-2.01li28.html#definition.ONS")Definition ONS@(/a)} & & & & \cr & = 0 & &\text{@(a href="fcla-jsmath-2.01li69.html#property.ZCN")Property ZCN@(/a)} & & & & }

Then the set E = \left \{A{x}_{1},\kern 1.95872pt A{x}_{2},\kern 1.95872pt A{x}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt A{x}_{s}\right \} is an orthogonal set of nonzero eigenvectors of A{A}^{∗} for the eigenvalue λ. By Theorem OSLI, the set E is linearly independent and so the geometric multiplicity of λ as an eigenvalue of A{A}^{∗} is s or greater. We have

\eqalignno{ {α}_{{A}^{∗}A}\left (λ\right ) & = {γ}_{{A}^{∗}A}\left (λ\right ) ≤ {γ}_{A{A}^{∗}}\left (λ\right ) = {α}_{A{A}^{∗}}\left (λ\right ) & & }

This inequality applies to any matrix, so long as the eigenvalue is nonzero. We now apply it to the matrix {A}^{∗},

\eqalignno{ {α}_{A{A}^{∗}}\left (λ\right ) & = {α}_{{\left ({A}^{∗}\right )}^{∗}{A}^{∗}}\left (λ\right ) ≤ {α}_{{A}^{∗}{\left ({A}^{∗}\right )}^{∗}}\left (λ\right ) = {α}_{{A}^{∗}A}\left (λ\right ) & & }

So for a nonzero eigenvalue, its algebraic multiplicities as an eigenvalue of {A}^{∗}A and A{A}^{∗} are equal. This is enough to establish that p = q and the eigenvalues can be ordered such that {λ}_{i} = {ρ}_{i} for 1 ≤ i ≤ p.

For any matrix B, the null space is identical to the eigenspace of the zero eigenvalue, N\kern -1.95872pt \left (B\right ) = {ℰ}_{B}\left (0\right ), and thus the nullity of the matrix is equal to the geometric multiplicity of the zero eigenvalue. With this, we can examine the ranks of {A}^{∗}A and A{A}^{∗}.

\eqalignno{ r\left ({A}^{∗}A\right ) & = n − n\left ({A}^{∗}A\right ) & &\text{@(a href="fcla-jsmath-2.01li41.html#theorem.RPNC")Theorem RPNC@(/a)} & & & & \cr & = \left ({α}_{{A}^{∗}A}\left (0\right ) +{ \mathop{∑ }}_{i=1}^{p}{α}_{{ A}^{∗}A}\left ({λ}_{i}\right )\right ) − n\left ({A}^{∗}A\right ) & &\text{@(a href="fcla-jsmath-2.01li48.html#theorem.NEM")Theorem NEM@(/a)} & & & & \cr & = \left ({α}_{{A}^{∗}A}\left (0\right ) +{ \mathop{∑ }}_{i=1}^{p}{α}_{{ A}^{∗}A}\left ({λ}_{i}\right )\right ) − {γ}_{{A}^{∗}A}\left (0\right ) & &\text{@(a href="fcla-jsmath-2.01li47.html#definition.GME")Definition GME@(/a)} & & & & \cr & = \left ({α}_{{A}^{∗}A}\left (0\right ) +{ \mathop{∑ }}_{i=1}^{p}{α}_{{ A}^{∗}A}\left ({λ}_{i}\right )\right ) − {α}_{{A}^{∗}A}\left (0\right ) & &\text{@(a href="fcla-jsmath-2.01li49.html#theorem.DMFE")Theorem DMFE@(/a)} & & & & \cr & ={ \mathop{∑ }}_{i=1}^{p}{α}_{{ A}^{∗}A}\left ({λ}_{i}\right ) & & & & \cr & ={ \mathop{∑ }}_{i=1}^{p}{α}_{ A{A}^{∗}}\left ({λ}_{i}\right ) & & & & \cr & = \left ({α}_{A{A}^{∗}}\left (0\right ) +{ \mathop{∑ }}_{i=1}^{p}{α}_{ A{A}^{∗}}\left ({λ}_{i}\right )\right ) − {α}_{A{A}^{∗}}\left (0\right ) & & & & \cr & = \left ({α}_{A{A}^{∗}}\left (0\right ) +{ \mathop{∑ }}_{i=1}^{p}{α}_{ A{A}^{∗}}\left ({λ}_{i}\right )\right ) − {γ}_{A{A}^{∗}}\left (0\right ) & &\text{@(a href="fcla-jsmath-2.01li49.html#theorem.DMFE")Theorem DMFE@(/a)} & & & & \cr & = \left ({α}_{A{A}^{∗}}\left (0\right ) +{ \mathop{∑ }}_{i=1}^{p}{α}_{ A{A}^{∗}}\left ({λ}_{i}\right )\right ) − n\left (A{A}^{∗}\right ) & &\text{@(a href="fcla-jsmath-2.01li47.html#definition.GME")Definition GME@(/a)} & & & & \cr & = m − n\left (A{A}^{∗}\right ) & &\text{@(a href="fcla-jsmath-2.01li48.html#theorem.NEM")Theorem NEM@(/a)} & & & & \cr & = r\left (A{A}^{∗}\right ) & &\text{@(a href="fcla-jsmath-2.01li41.html#theorem.RPNC")Theorem RPNC@(/a)} & & & & }

When A is rectangular, the square matrices {A}^{∗}A and A{A}^{∗} have different sizes. With equal algebraic and geometric multiplicities for their common nonzero eigenvalues, the difference in their sizes is manifest in different algebraic multiplicities for the zero eigenvalue and different nullities. Specifically,

\eqalignno{ n\left ({A}^{∗}A\right ) & = n − r &n\left (A{A}^{∗}\right ) & = m − r & & & & }

Suppose that {x}_{1},\kern 1.95872pt {x}_{2},\kern 1.95872pt {x}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {x}_{n} is an orthonormal basis of {ℂ}^{n} composed of eigenvectors of {A}^{∗}A and ordered so that {x}_{i}, r + 1 ≤ i ≤ n are eigenvectors of A{A}^{∗} for the zero eigenvalue. Denote the associated nonzero eigenvalues of {A}^{∗}A for these eigenvectors by {δ}_{i}, 1 ≤ i ≤ r. Then define

\eqalignno{ {y}_{i} & = {1\over \sqrt{{δ}_{i}}}A{x}_{i} &1 & ≤ i ≤ r & & & & }

Let {y}_{r+1},\kern 1.95872pt {y}_{r+2},\kern 1.95872pt {y}_{r+2},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {y}_{m} be an orthonormal basis for the eigenspace {ℰ}_{A{A}^{∗}}\left (0\right ), whose existence is guaranteed by Theorem GSP. As scalar multiples of demonstrated eigenvectors of A{A}^{∗}, {y}_{i}, 1 ≤ i ≤ r are also eigenvectors of A{A}^{∗}, and {y}_{i}, r + 1 ≤ i ≤ n have been chosen as eigenvectors of A{A}^{∗}. These eigenvectors also have norm 1, as we now show. For 1 ≤ i ≤ r,

\eqalignno{ \left \Vert {y}_{i}\right \Vert & = \left \Vert {1\over \sqrt{{δ}_{i}}}A{x}_{i}\right \Vert & & & & \cr & = \sqrt{\left \langle {1\over \sqrt{{δ}_{i}}}A{x}_{i},\kern 1.95872pt {1\over \sqrt{{δ}_{i}}}A{x}_{i}\right \rangle } & &\text{@(a href="fcla-jsmath-2.01li28.html#theorem.IPN")Theorem IPN@(/a)} & & & & \cr & = \sqrt{ {1\over \sqrt{{δ}_{i}}}\overline{ {1\over \sqrt{{δ}_{i}}}}\left \langle A{x}_{i},\kern 1.95872pt A{x}_{i}\right \rangle } & &\text{@(a href="fcla-jsmath-2.01li28.html#theorem.IPSM")Theorem IPSM@(/a)} & & & & \cr & = \sqrt{ {1\over \sqrt{{δ}_{i}}} {1\over \sqrt{{δ}_{i}}}\left \langle A{x}_{i},\kern 1.95872pt A{x}_{i}\right \rangle } & &\text{@(a href="fcla-jsmath-2.01li48.html#theorem.HMRE")Theorem HMRE@(/a)} & & & & \cr & = {1\over \sqrt{{δ}_{i}}}\sqrt{\left \langle A{x}_{i } , \kern 1.95872pt A{x}_{i}\right \rangle } & & & & \cr & = {1\over \sqrt{{δ}_{i}}}\sqrt{\left \langle A{x}_{i } , \kern 1.95872pt { \left ({A}^{∗ } \right ) }^{∗ } {x}_{i}\right \rangle } & &\text{@(a href="fcla-jsmath-2.01li30.html#theorem.AA")Theorem AA@(/a)} & & & & \cr & = {1\over \sqrt{{δ}_{i}}}\sqrt{\left \langle {A}^{∗ } A{x}_{i } , \kern 1.95872pt {x}_{i}\right \rangle } & &\text{@(a href="fcla-jsmath-2.01li31.html#theorem.AIP")Theorem AIP@(/a)} & & & & \cr & = {1\over \sqrt{{δ}_{i}}}\sqrt{\left \langle {δ}_{i } {x}_{i } , \kern 1.95872pt {x}_{i}\right \rangle } & &\text{@(a href="fcla-jsmath-2.01li47.html#definition.EEM")Definition EEM@(/a)} & & & & \cr & = {1\over \sqrt{{δ}_{i}}}\sqrt{{δ}_{i } \left \langle {x}_{i } , \kern 1.95872pt {x}_{i}\right \rangle } & &\text{@(a href="fcla-jsmath-2.01li28.html#theorem.IPSM")Theorem IPSM@(/a)} & & & & \cr & = {1\over \sqrt{{δ}_{i}}}\sqrt{{δ}_{i } (1)} & &\text{@(a href="fcla-jsmath-2.01li28.html#definition.ONS")Definition ONS@(/a)} & & & & \cr & = 1 & & & & }

Finally we check orthogonality. Consider two eigenvectors {y}_{i} and {y}_{j} with 1 ≤ i < j ≤ m. If these two vectors have different eigenvalues, then Theorem HMOE establishes that the two eigenvectors are orthogonal. If the two eigenvectors have a zero eigenvalue, then they are orthogonal by the choice of the orthonormal basis of {ℰ}_{A{A}^{∗}}\left (0\right ). If the two eigenvectors have identical, nonzero, eigenvalues, then

\eqalignno{ \left \langle {y}_{i},\kern 1.95872pt {y}_{j}\right \rangle & = \left \langle {1\over \sqrt{{δ}_{i}}}A{x}_{i},\kern 1.95872pt {1\over \sqrt{{δ}_{j}}}A{x}_{j}\right \rangle & & & & \cr & = {1\over \sqrt{{δ}_{i}}}\overline{ {1\over \sqrt{{δ}_{j}}}}\left \langle A{x}_{i},\kern 1.95872pt A{x}_{j}\right \rangle & &\text{@(a href="fcla-jsmath-2.01li28.html#theorem.IPSM")Theorem IPSM@(/a)} & & & & \cr & = {1\over \sqrt{{δ}_{i } {δ}_{j}}}\left \langle A{x}_{i},\kern 1.95872pt A{x}_{j}\right \rangle & &\text{@(a href="fcla-jsmath-2.01li48.html#theorem.HMRE")Theorem HMRE@(/a)} & & & & \cr & = {1\over \sqrt{{δ}_{i } {δ}_{j}}}\left \langle A{x}_{i},\kern 1.95872pt {\left ({A}^{∗}\right )}^{∗}{x}_{ j}\right \rangle & &\text{@(a href="fcla-jsmath-2.01li30.html#theorem.AA")Theorem AA@(/a)} & & & & \cr & = {1\over \sqrt{{δ}_{i } {δ}_{j}}}\left \langle {A}^{∗}A{x}_{ i},\kern 1.95872pt {x}_{j}\right \rangle & &\text{@(a href="fcla-jsmath-2.01li31.html#theorem.AIP")Theorem AIP@(/a)} & & & & \cr & = {1\over \sqrt{{δ}_{i } {δ}_{j}}}\left \langle {δ}_{i}{x}_{i},\kern 1.95872pt {x}_{j}\right \rangle & &\text{@(a href="fcla-jsmath-2.01li47.html#definition.EEM")Definition EEM@(/a)} & & & & \cr & = {{δ}_{i}\over \sqrt{{δ}_{i } {δ}_{j}}}\left \langle {x}_{i},\kern 1.95872pt {x}_{j}\right \rangle & &\text{@(a href="fcla-jsmath-2.01li28.html#theorem.IPSM")Theorem IPSM@(/a)} & & & & \cr & = {{δ}_{i}\over \sqrt{{δ}_{i } {δ}_{j}}}(0) & &\text{@(a href="fcla-jsmath-2.01li28.html#definition.ONS")Definition ONS@(/a)} & & & & \cr & = 0 & & & & }

So \left \{{y}_{1},\kern 1.95872pt {y}_{2},\kern 1.95872pt {y}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {y}_{m}\right \} is an orthonormal set of eigenvectors for A{A}^{∗}. The critical relationship between these two orthonormal bases is present by design. For 1 ≤ i ≤ r,

\eqalignno{ A{x}_{i} & = \sqrt{{δ}_{i}} {1\over \sqrt{{δ}_{i}}}A{x}_{i} = \sqrt{{δ}_{i}}{y}_{i} & & }

\eqalignno{ \left \langle A{x}_{i},\kern 1.95872pt A{x}_{i}\right \rangle & = \left \langle A{x}_{i},\kern 1.95872pt {\left ({A}^{∗}\right )}^{∗}{x}_{ i}\right \rangle & &\text{@(a href="fcla-jsmath-2.01li30.html#theorem.AA")Theorem AA@(/a)} & & & & \cr & = \left \langle {A}^{∗}A{x}_{ i},\kern 1.95872pt {x}_{i}\right \rangle & &\text{@(a href="fcla-jsmath-2.01li31.html#theorem.AIP")Theorem AIP@(/a)} & & & & \cr & = \left \langle 0,\kern 1.95872pt {x}_{i}\right \rangle & &\text{@(a href="fcla-jsmath-2.01li47.html#definition.EEM")Definition EEM@(/a)} & & & & \cr & = 0 & &\text{@(a href="fcla-jsmath-2.01li28.html#definition.IP")Definition IP@(/a)} & & & & }

Subsection SVD: Singular Value Decomposition

The square roots of the eigenvalues of {A}^{∗}A (or almost equivalently, A{A}^{∗}!) are known as the singular values of A. Here is the definition.

Definition SV
Singular Values
Suppose A is an m × n matrix. If the eigenvalues of {A}^{∗}A are {δ}_{1},\kern 1.95872pt {δ}_{2},\kern 1.95872pt {δ}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {δ}_{n}, then the singular values of A are \sqrt{{δ}_{1}},\kern 1.95872pt \sqrt{{δ}_{2}},\kern 1.95872pt \sqrt{{δ}_{3}},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt \sqrt{{δ}_{n}}. △

Theorem EEMAP is a total setup for the singular value decomposition. This remarkable theorem says that any matrix can be broken into a product of three matrices. Two are square, and unitary. In light of Theorem UMPIP, we can view these matrices as transforming vectors or coordinates in a rotational fashion. The middle matrix of this decomposition is rectangular, but is as close to being diagonal as a rectangular matrix can be. Viewed as a transformation, this matrix effects, relections, contractions or expansions along axes — it stretches vectors. So any matrix, viewed as a transformation is the product of a rotation, a stretch and a rotation.

The singular value theorem can also be viewed as an application of our most general statement about matrix representations of linear transformations relative to different bases. Theorem MRCB concerns linear transformations T : U\mathrel{↦}V where U and V are possibly different vector spaces. When U and V have different dimensions, the resulting matrix representation will be rectangular. In Section CB we quickly specialized to the case where U = V and the matrix representations are square with one of our most central results, Theorem SCB. Theorem SVD is an application of the full generality of Theorem MRCB where the relevant bases are now orthonormal sets.

Theorem SVD
Singular Value Decomposition
Suppose A is an m × n matrix of rank r with nonzero singular values {s}_{1},\kern 1.95872pt {s}_{2},\kern 1.95872pt {s}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {s}_{r}. Then A = UD{V }^{∗} where U is a unitary matrix of size m, V is a unitary matrix of size n and D is an m × n matrix given by

\eqalignno{ {\left [D\right ]}_{ij} & = \left \{\array{ {s}_{i}\quad &\text{if }1 ≤ i = j ≤ r \cr 0 \quad &\text{otherwise} } \right . & & }

Proof Let {x}_{1},\kern 1.95872pt {x}_{2},\kern 1.95872pt {x}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {x}_{n} and {y}_{1},\kern 1.95872pt {y}_{2},\kern 1.95872pt {y}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {y}_{m} be the orthonormal bases described by the conclusion of Theorem EEMAP. Define U to be the m × m matrix whose columns are {y}_{i}, 1 ≤ i ≤ m, and define V to be the n × n matrix whose columns are {x}_{i}, 1 ≤ i ≤ n. With orthonormal sets of columns, by Theorem CUMOS both U and V are unitary matrices.

\eqalignno{ {\left [AV \right ]}_{ij} & ={ \left [A{x}_{j}\right ]}_{i} & &\text{@(a href="fcla-jsmath-2.01li31.html#definition.MM")Definition MM@(/a)} & & & & \cr & ={ \left [\sqrt{{δ}_{j}}{y}_{j}\right ]}_{i} & &\text{@(a href="#theorem.EEMAP")Theorem EEMAP@(/a)} & & & & \cr & ={ \left [{s}_{j}{y}_{j}\right ]}_{i} & &\text{@(a href="#definition.SV")Definition SV@(/a)} & & & & \cr & ={ \left [{y}_{j}\right ]}_{i}{s}_{j} & &\text{@(a href="fcla-jsmath-2.01li23.html#definition.CVSM")Definition CVSM@(/a)} & & & & \cr & ={ \left [U\right ]}_{ij}{\left [D\right ]}_{jj} & & & & \cr & ={ \mathop{∑ }}_{k=1}^{m}{\left [U\right ]}_{ ik}{\left [D\right ]}_{kj} & & & & \cr & ={ \left [UD\right ]}_{ij} & &\text{@(a href="fcla-jsmath-2.01li31.html#theorem.EMP")Theorem EMP@(/a)} & & & & }