This Section is a Draft, Subject to Changes. Needs Numerical Examples.
The singular value decomposition is one of the more useful ways to represent any
matrix, even rectangular ones. We can also view the singular values of a (rectangular)
matrix as analogues of the eigenvalues of a square matrix. Our definitions and
theorems in this section rely heavily on the properties of the matrix-adjoint products
($A^*A$ and $AA^*$),
which we first met in Theorem CPSM. We start by examining some of the basic
properties of these two matrices. Now would be a good time to review the basic
facts about positive semi-definite matrices in Section PSM.
Subsection MAP: Matrix-Adjoint Product
Theorem EEMAP  Eigenvalues and Eigenvectors of Matrix-Adjoint Product
Suppose that $A$ is an $m \times n$ matrix and $A^*A$ has rank $r$. Let $\lambda_1, \lambda_2, \lambda_3, \dots, \lambda_p$ be the nonzero distinct eigenvalues of $A^*A$ and let $\rho_1, \rho_2, \rho_3, \dots, \rho_q$ be the nonzero distinct eigenvalues of $AA^*$. Then,

1. $p = q$.

2. The distinct nonzero eigenvalues can be ordered such that $\lambda_i = \rho_i$, $1 \le i \le p$.

3. Properly ordered, $\alpha_{A^*A}\left(\lambda_i\right) = \alpha_{AA^*}\left(\rho_i\right)$, $1 \le i \le p$.

4. The rank of $A^*A$ is equal to the rank of $AA^*$.

5. There is an orthonormal basis, $\left\{x_1, x_2, x_3, \dots, x_n\right\}$ of $\mathbb{C}^n$ composed of eigenvectors of $A^*A$ and an orthonormal basis, $\left\{y_1, y_2, y_3, \dots, y_m\right\}$ of $\mathbb{C}^m$ composed of eigenvectors of $AA^*$ with the following properties. Order the eigenvectors so that $x_i$, $r+1 \le i \le n$ are the eigenvectors of $A^*A$ for the zero eigenvalue. Let $\delta_i$, $1 \le i \le r$ denote the nonzero eigenvalues of $A^*A$. Then $Ax_i = \sqrt{\delta_i}\,y_i$, $1 \le i \le r$ and $Ax_i = 0$, $r+1 \le i \le n$. Finally, $y_i$, $r+1 \le i \le m$, are eigenvectors of $AA^*$ for the zero eigenvalue.
□
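Before the proof, a quick numerical check can make these conclusions concrete. The sketch below uses Python with NumPy and an arbitrarily chosen $3 \times 4$ matrix of rank 2 (the matrix and tolerance are illustrative assumptions, not part of the theorem) to confirm that $A^*A$ and $AA^*$ share their nonzero eigenvalues and have equal rank.

\begin{verbatim}
import numpy as np

# An arbitrary 3 x 4 matrix of rank 2, chosen only for illustration.
A = np.array([[1.0, 2.0,  0.0, 1.0],
              [0.0, 1.0,  1.0, 0.0],
              [1.0, 0.0, -2.0, 1.0]])
m, n = A.shape

AsA = A.conj().T @ A    # the n x n product A^*A
AAs = A @ A.conj().T    # the m x m product AA^*

# Both products are Hermitian, so eigvalsh applies and the eigenvalues are real.
ev_AsA = np.linalg.eigvalsh(AsA)
ev_AAs = np.linalg.eigvalsh(AAs)

# The nonzero eigenvalues agree, as the theorem asserts (p = q, lambda_i = rho_i).
tol = 1e-10
nz_AsA = np.sort(ev_AsA[ev_AsA > tol])
nz_AAs = np.sort(ev_AAs[ev_AAs > tol])
print(nz_AsA, nz_AAs)
print(np.allclose(nz_AsA, nz_AAs))   # True

# The ranks of A^*A and AA^* are equal (and equal the rank of A).
print(np.linalg.matrix_rank(AsA), np.linalg.matrix_rank(AAs), np.linalg.matrix_rank(A))
\end{verbatim}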
Proof   Suppose that $x \in \mathbb{C}^n$ is any eigenvector of $A^*A$ for a nonzero eigenvalue $\lambda$. We will show that $Ax$ is an eigenvector of $AA^*$ for the same eigenvalue, $\lambda$. First, we ascertain that $Ax$ is not the zero vector. Since $x$ is an eigenvector, $x \neq 0$, and by Theorem PIP, $\left\langle x,\,x\right\rangle \neq 0$. As $\lambda$ was assumed to be nonzero, we see that $\left\langle Ax,\,Ax\right\rangle = \left\langle x,\,A^*Ax\right\rangle = \lambda\left\langle x,\,x\right\rangle \neq 0$. Again, Theorem PIP tells us that $Ax \neq 0$.
Much of the sequel turns on the following simple computation. If you ever wonder what all the fuss is about adjoints, Hermitian matrices, square roots, and singular values, return to this brief computation, as it holds the key. There is much more to do in this proof, but after this it is mostly bookkeeping. Here we go. We check that $Ax$ functions as an eigenvector of $AA^*$ for the eigenvalue $\lambda$,

\[AA^*\left(Ax\right) = A\left(A^*Ax\right) = A\left(\lambda x\right) = \lambda\left(Ax\right)\]

That's it. If $x$ is an eigenvector of $A^*A$ (for a nonzero eigenvalue), then $Ax$ is an eigenvector for $AA^*$ for the same eigenvalue. Let's see what this buys us.
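To see the computation in action, here is a short NumPy sketch (using the same arbitrary $3 \times 4$ example matrix as before) that picks an eigenvector $x$ of $A^*A$ for its largest, nonzero eigenvalue and checks that $Ax$ is a nonzero eigenvector of $AA^*$ for that same eigenvalue.

\begin{verbatim}
import numpy as np

# The same arbitrary 3 x 4 example matrix of rank 2.
A = np.array([[1.0, 2.0,  0.0, 1.0],
              [0.0, 1.0,  1.0, 0.0],
              [1.0, 0.0, -2.0, 1.0]])
AsA = A.conj().T @ A
AAs = A @ A.conj().T

# eigh returns eigenvalues in ascending order, eigenvectors as columns.
evals, evecs = np.linalg.eigh(AsA)
lam = evals[-1]       # the largest eigenvalue of A^*A, certainly nonzero here
x = evecs[:, -1]      # a corresponding (unit) eigenvector

# The key computation: AA^*(Ax) = A(A^*A x) = A(lam x) = lam (Ax).
Ax = A @ x
print(np.linalg.norm(Ax) > 1e-10)         # True: Ax is not the zero vector
print(np.allclose(AAs @ Ax, lam * Ax))    # True: Ax is an eigenvector of AA^* for lam
\end{verbatim}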
$A^*A$ and $AA^*$ are Hermitian matrices (Definition HM), and hence are normal (Definition NRML). This provides the existence of orthonormal bases of eigenvectors for each matrix by Theorem OBNM. Also, since each matrix is diagonalizable (Definition DZM) by Theorem OD, we can interchange algebraic and geometric multiplicities by Theorem DMFE.
Our first step is to establish that a nonzero eigenvalue $\lambda$ has the same geometric multiplicity for both $A^*A$ and $AA^*$. Suppose $\left\{x_1, x_2, x_3, \dots, x_s\right\}$ is an orthonormal basis of eigenvectors of $A^*A$ for the eigenspace $\mathcal{E}_{A^*A}\left(\lambda\right)$. Then for $1 \le i < j \le s$, note

\[\left\langle Ax_i,\,Ax_j\right\rangle = \left\langle x_i,\,A^*Ax_j\right\rangle = \left\langle x_i,\,\lambda x_j\right\rangle = \lambda\left\langle x_i,\,x_j\right\rangle = \lambda(0) = 0\]

Then the set $E = \left\{Ax_1, Ax_2, Ax_3, \dots, Ax_s\right\}$ is an orthogonal set of nonzero eigenvectors of $AA^*$ for the eigenvalue $\lambda$. By Theorem OSLI, the set $E$ is linearly independent and so the geometric multiplicity of $\lambda$ as an eigenvalue of $AA^*$ is $s$ or greater. We have

\[\alpha_{A^*A}\left(\lambda\right) = \gamma_{A^*A}\left(\lambda\right) \le \gamma_{AA^*}\left(\lambda\right) = \alpha_{AA^*}\left(\lambda\right)\]

Applying the same argument with the roles of $A^*A$ and $AA^*$ interchanged (that is, replacing $A$ by $A^*$ throughout) yields the reverse inequality.
So for a nonzero eigenvalue, its algebraic multiplicities as an eigenvalue of $A^*A$ and $AA^*$ are equal. This is enough to establish that $p = q$ and that the eigenvalues can be ordered such that $\lambda_i = \rho_i$ for $1 \le i \le p$.

For any matrix $B$, the null space is identical to the eigenspace of the zero eigenvalue, $\mathcal{N}\left(B\right) = \mathcal{E}_B\left(0\right)$, and thus the nullity of the matrix is equal to the geometric multiplicity of the zero eigenvalue. With this, we can examine the ranks of $A^*A$ and $AA^*$. The rank of each is its size minus the algebraic multiplicity of the zero eigenvalue, which is the sum of the algebraic multiplicities of its nonzero eigenvalues; since these multiplicities agree, $A^*A$ and $AA^*$ have the same rank, namely $r$.

When $A$ is rectangular, the square matrices $A^*A$ and $AA^*$ have different sizes. With equal algebraic and geometric multiplicities for their common nonzero eigenvalues, the difference in their sizes is manifest in different algebraic multiplicities for the zero eigenvalue and different nullities. Specifically,

\begin{align*}
n\left(A^*A\right) &= n - r  &  n\left(AA^*\right) &= m - r
\end{align*}
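A short numerical check of these nullities, again with the arbitrary example matrix (here $m = 3$, $n = 4$, $r = 2$; all of these values are illustrative choices):

\begin{verbatim}
import numpy as np

# The same arbitrary 3 x 4 example matrix of rank 2.
A = np.array([[1.0, 2.0,  0.0, 1.0],
              [0.0, 1.0,  1.0, 0.0],
              [1.0, 0.0, -2.0, 1.0]])
m, n = A.shape
r = np.linalg.matrix_rank(A)

# Nullity is size minus rank; compare with n - r and m - r.
print(n - np.linalg.matrix_rank(A.conj().T @ A) == n - r)   # True
print(m - np.linalg.matrix_rank(A @ A.conj().T) == m - r)   # True
\end{verbatim}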
Suppose that $x_1, x_2, x_3, \dots, x_n$ is an orthonormal basis of $\mathbb{C}^n$ composed of eigenvectors of $A^*A$ and ordered so that $x_i$, $r+1 \le i \le n$ are eigenvectors of $A^*A$ for the zero eigenvalue. Denote the nonzero eigenvalues of $A^*A$ associated with the eigenvectors $x_i$, $1 \le i \le r$, by $\delta_i$. Then define

\begin{align*}
y_i &= \frac{1}{\sqrt{\delta_i}}Ax_i  &  1 &\le i \le r
\end{align*}
Let $y_{r+1}, y_{r+2}, y_{r+3}, \dots, y_m$ be an orthonormal basis for the eigenspace $\mathcal{E}_{AA^*}\left(0\right)$, whose existence is guaranteed by Theorem GSP. As scalar multiples of demonstrated eigenvectors of $AA^*$, $y_i$, $1 \le i \le r$ are also eigenvectors of $AA^*$, and $y_i$, $r+1 \le i \le m$ have been chosen as eigenvectors of $AA^*$. These eigenvectors also have norm 1, as we now show. For $1 \le i \le r$,

\[\left\lVert y_i\right\rVert = \left\lVert \frac{1}{\sqrt{\delta_i}}Ax_i\right\rVert = \frac{1}{\sqrt{\delta_i}}\sqrt{\left\langle Ax_i,\,Ax_i\right\rangle} = \frac{1}{\sqrt{\delta_i}}\sqrt{\left\langle x_i,\,A^*Ax_i\right\rangle} = \frac{1}{\sqrt{\delta_i}}\sqrt{\delta_i\left\langle x_i,\,x_i\right\rangle} = 1\]

For $r+1 \le i \le m$, the $y_i$ have been chosen to have norm 1.
Finally we check orthogonality. Consider two eigenvectors $y_i$ and $y_j$ with $1 \le i < j \le m$. If these two vectors have different eigenvalues, then Theorem HMOE establishes that the two eigenvectors are orthogonal. If the two eigenvectors have a zero eigenvalue, then they are orthogonal by the choice of the orthonormal basis of $\mathcal{E}_{AA^*}\left(0\right)$. If the two eigenvectors have identical, nonzero, eigenvalues, then

\[\left\langle y_i,\,y_j\right\rangle = \left\langle \frac{1}{\sqrt{\delta_i}}Ax_i,\,\frac{1}{\sqrt{\delta_j}}Ax_j\right\rangle = \frac{1}{\sqrt{\delta_i\delta_j}}\left\langle Ax_i,\,Ax_j\right\rangle = \frac{1}{\sqrt{\delta_i\delta_j}}\left\langle x_i,\,A^*Ax_j\right\rangle = \frac{\delta_j}{\sqrt{\delta_i\delta_j}}\left\langle x_i,\,x_j\right\rangle = 0\]

So $\left\{y_1, y_2, y_3, \dots, y_m\right\}$ is an orthonormal set of eigenvectors for $AA^*$. The critical relationship between these two orthonormal bases is present by design. For $1 \le i \le r$,

\[Ax_i = \sqrt{\delta_i}\left(\frac{1}{\sqrt{\delta_i}}Ax_i\right) = \sqrt{\delta_i}\,y_i\]

and for $r+1 \le i \le n$, we have $\left\langle Ax_i,\,Ax_i\right\rangle = \left\langle x_i,\,A^*Ax_i\right\rangle = \left\langle x_i,\,0\right\rangle = 0$, so Theorem PIP gives $Ax_i = 0$, as the final conclusion requires. ■
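This construction of the $y_i$, and the relationship $Ax_i = \sqrt{\delta_i}\,y_i$, can also be checked numerically. The sketch below, once more with the arbitrary example matrix, builds $y_i = \frac{1}{\sqrt{\delta_i}}Ax_i$ from orthonormal eigenvectors of $A^*A$ and confirms that the $y_i$ are orthonormal eigenvectors of $AA^*$ satisfying the stated relationship.

\begin{verbatim}
import numpy as np

# The same arbitrary 3 x 4 example matrix of rank 2.
A = np.array([[1.0, 2.0,  0.0, 1.0],
              [0.0, 1.0,  1.0, 0.0],
              [1.0, 0.0, -2.0, 1.0]])
AsA = A.conj().T @ A
AAs = A @ A.conj().T

# Orthonormal eigenvectors of A^*A, reordered so nonzero eigenvalues come first.
evals, X = np.linalg.eigh(AsA)
order = np.argsort(evals)[::-1]
evals, X = evals[order], X[:, order]
r = int(np.sum(evals > 1e-10))

# y_i = (1 / sqrt(delta_i)) A x_i for the nonzero eigenvalues delta_i.
Y = np.column_stack([A @ X[:, i] / np.sqrt(evals[i]) for i in range(r)])

# The y_i are orthonormal, are eigenvectors of AA^* for the same eigenvalues,
# and satisfy A x_i = sqrt(delta_i) y_i.
print(np.allclose(Y.conj().T @ Y, np.eye(r)))
print(all(np.allclose(AAs @ Y[:, i], evals[i] * Y[:, i]) for i in range(r)))
print(all(np.allclose(A @ X[:, i], np.sqrt(evals[i]) * Y[:, i]) for i in range(r)))
\end{verbatim}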
The square roots of the eigenvalues of $A^*A$ (or almost equivalently, $AA^*$!) are known as the singular values of $A$. Here is the definition.

Definition SV  Singular Values
Suppose $A$ is an $m \times n$ matrix. If the eigenvalues of $A^*A$ are $\delta_1, \delta_2, \delta_3, \dots, \delta_n$, then the singular values of $A$ are $\sqrt{\delta_1}, \sqrt{\delta_2}, \sqrt{\delta_3}, \dots, \sqrt{\delta_n}$.
△
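As a numerical sketch of this definition, the snippet below computes the singular values of the running example matrix as square roots of the eigenvalues of $A^*A$ and compares them with the singular values reported by NumPy's SVD routine, which returns only $\min(m,n)$ of them. The matrix and tolerance are, as before, arbitrary choices.

\begin{verbatim}
import numpy as np

# The same arbitrary 3 x 4 example matrix.
A = np.array([[1.0, 2.0,  0.0, 1.0],
              [0.0, 1.0,  1.0, 0.0],
              [1.0, 0.0, -2.0, 1.0]])

# Singular values as square roots of the (real, nonnegative) eigenvalues of A^*A.
delta = np.linalg.eigvalsh(A.conj().T @ A)
sv_from_eigs = np.sqrt(np.clip(delta, 0.0, None))[::-1]   # descending order

# numpy's svd returns min(m, n) singular values, also in descending order.
sv_numpy = np.linalg.svd(A, compute_uv=False)

print(sv_from_eigs)
print(sv_numpy)
# Use a loose tolerance: square roots of roundoff-sized eigenvalues are only
# accurate to roughly the square root of machine precision.
print(np.allclose(sv_from_eigs[:sv_numpy.size], sv_numpy, atol=1e-6))   # True
\end{verbatim}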
Theorem EEMAP is a total setup for the singular value decomposition. This
remarkable theorem says that any matrix can be broken into a product of three
matrices. Two are square, and unitary. In light of Theorem UMPIP, we can view
these matrices as transforming vectors or coordinates in a rotational fashion. The
middle matrix of this decomposition is rectangular, but is as close to being
diagonal as a rectangular matrix can be. Viewed as a transformation, this matrix effects reflections, contractions, or expansions along axes — it stretches vectors. So any matrix, viewed as a transformation, is the product of a rotation, a stretch, and a rotation.
The singular value theorem can also be viewed as an application of our most
general statement about matrix representations of linear transformations
relative to different bases. Theorem MRCB concerns linear transformations
$T : U \mapsto V$ where
U and
V are possibly different
vector spaces. When U
and V
have different dimensions, the resulting matrix representation will be
rectangular. In Section CB we quickly specialized to the case where
U = V and
the matrix representations are square with one of our most central results,
Theorem SCB. Theorem SVD is an application of the full generality of
Theorem MRCB where the relevant bases are now orthonormal sets.
Theorem SVD  Singular Value Decomposition
Suppose $A$ is an $m \times n$ matrix of rank $r$ with nonzero singular values $s_1, s_2, s_3, \dots, s_r$. Then $A = UDV^*$ where $U$ is a unitary matrix of size $m$, $V$ is a unitary matrix of size $n$ and $D$ is an $m \times n$ matrix given by

\[\left[D\right]_{ij} = \begin{cases} s_i & \text{if } 1 \le i = j \le r\\ 0 & \text{otherwise} \end{cases}\]

□
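Before the proof, here is a numerical sketch of the statement: NumPy's SVD routine produces the three factors for the running example matrix, and we can confirm that $U$ and $V$ are unitary and that $A = UDV^*$.

\begin{verbatim}
import numpy as np

# The same arbitrary 3 x 4 example matrix.
A = np.array([[1.0, 2.0,  0.0, 1.0],
              [0.0, 1.0,  1.0, 0.0],
              [1.0, 0.0, -2.0, 1.0]])
m, n = A.shape

# With full_matrices=True (the default), U is m x m and Vh is V^*, of size n x n.
U, s, Vh = np.linalg.svd(A)

# Assemble the m x n matrix D with the singular values on its diagonal.
D = np.zeros((m, n))
D[:s.size, :s.size] = np.diag(s)

print(np.allclose(U.conj().T @ U, np.eye(m)))    # U is unitary
print(np.allclose(Vh.conj().T @ Vh, np.eye(n)))  # V is unitary
print(np.allclose(U @ D @ Vh, A))                # A = U D V^*
\end{verbatim}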
Proof   Let $x_1, x_2, x_3, \dots, x_n$ and $y_1, y_2, y_3, \dots, y_m$ be the orthonormal bases described by the conclusion of Theorem EEMAP. Define $U$ to be the $m \times m$ matrix whose columns are $y_i$, $1 \le i \le m$, and define $V$ to be the $n \times n$ matrix whose columns are $x_i$, $1 \le i \le n$. With orthonormal sets of columns, by Theorem CUMOS both $U$ and $V$ are unitary matrices.
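To tie this construction back to computation, the sketch below mimics the proof: it builds $V$ from orthonormal eigenvectors of $A^*A$, takes the first $r$ columns of $U$ to be $y_i = \frac{1}{\sqrt{\delta_i}}Ax_i$ and the remaining columns from the zero eigenspace of $AA^*$, and then checks that $U$ and $V$ are unitary and that $A = UDV^*$. The example matrix is, as before, an arbitrary illustrative choice.

\begin{verbatim}
import numpy as np

# The same arbitrary 3 x 4 example matrix of rank 2.
A = np.array([[1.0, 2.0,  0.0, 1.0],
              [0.0, 1.0,  1.0, 0.0],
              [1.0, 0.0, -2.0, 1.0]])
m, n = A.shape
tol = 1e-10

# Columns of V: orthonormal eigenvectors x_i of A^*A, nonzero eigenvalues first.
evals, V = np.linalg.eigh(A.conj().T @ A)
order = np.argsort(evals)[::-1]
evals, V = evals[order], V[:, order]
r = int(np.sum(evals > tol))

# First r columns of U: y_i = (1/sqrt(delta_i)) A x_i.  Remaining columns: an
# orthonormal basis of the eigenspace of AA^* for the zero eigenvalue.
Y = np.column_stack([A @ V[:, i] / np.sqrt(evals[i]) for i in range(r)])
evals_AAs, W = np.linalg.eigh(A @ A.conj().T)
U = np.column_stack([Y, W[:, evals_AAs <= tol]])

# D carries the singular values sqrt(delta_i) in its first r diagonal entries.
D = np.zeros((m, n))
D[:r, :r] = np.diag(np.sqrt(evals[:r]))

print(np.allclose(U.conj().T @ U, np.eye(m)))    # U is unitary
print(np.allclose(V.conj().T @ V, np.eye(n)))    # V is unitary
print(np.allclose(U @ D @ V.conj().T, A))        # A = U D V^*
\end{verbatim}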