Section O  Orthogonality

From A First Course in Linear Algebra
Version 1.08
© 2004.
Licensed under the GNU Free Documentation License.
http://linear.ups.edu/

In this section we define a couple more operations with vectors, and prove a few theorems. These definitions and results are not central to what follows, but we will make use of them on various occasions throughout the remainder of the course. Because we have chosen to use $\mathbb{C}$ as our set of scalars, this subsection is a bit more, uh, … complex than it would be for the real numbers. We’ll explain as we go along how things get easier for the real numbers $\mathbb{R}$. If you haven’t already, now would be a good time to review some of the basic properties of arithmetic with complex numbers described in Section CNO. With that done, we can extend the basics of complex number arithmetic to our study of vectors in $\mathbb{C}^m$.

Subsection CAV: Complex Arithmetic and Vectors

We know how the addition and multiplication of complex numbers are employed in defining the operations for vectors in $\mathbb{C}^m$ (Definition CVA and Definition CVSM). We can also extend the idea of the conjugate to vectors.

Definition CCCV
Complex Conjugate of a Column Vector
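Suppose that $\mathbf{u}$ is a vector from $\mathbb{C}^m$. Then the conjugate of the vector, $\overline{\mathbf{u}}$, is defined by

$$[\overline{\mathbf{u}}]_i = \overline{[\mathbf{u}]_i} \qquad 1\le i\le m$$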

(This definition contains Notation CCCV.)

With this definition we can show that the conjugate of a column vector behaves as we would expect with regard to vector addition and scalar multiplication.

Theorem CRVA
Conjugation Respects Vector Addition
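Suppose $\mathbf{x}$ and $\mathbf{y}$ are two vectors from $\mathbb{C}^m$. Then

$$\overline{\mathbf{x}+\mathbf{y}} = \overline{\mathbf{x}} + \overline{\mathbf{y}}$$

Proof   For each $1\le i\le m$,

$$\begin{aligned}
[\overline{\mathbf{x}+\mathbf{y}}]_i &= \overline{[\mathbf{x}+\mathbf{y}]_i} && \text{Definition CCCV}\\
&= \overline{[\mathbf{x}]_i + [\mathbf{y}]_i} && \text{Definition CVA}\\
&= \overline{[\mathbf{x}]_i} + \overline{[\mathbf{y}]_i} && \text{Theorem CCRA}\\
&= [\overline{\mathbf{x}}]_i + [\overline{\mathbf{y}}]_i && \text{Definition CCCV}\\
&= [\overline{\mathbf{x}} + \overline{\mathbf{y}}]_i && \text{Definition CVA}
\end{aligned}$$

Then by Definition CVE we have $\overline{\mathbf{x}+\mathbf{y}} = \overline{\mathbf{x}} + \overline{\mathbf{y}}$.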

Theorem CRSM
Conjugation Respects Vector Scalar Multiplication
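Suppose $\mathbf{x}$ is a vector from $\mathbb{C}^m$, and $\alpha\in\mathbb{C}$ is a scalar. Then

$$\overline{\alpha\mathbf{x}} = \overline{\alpha}\,\overline{\mathbf{x}}$$

Proof   For $1\le i\le m$,

$$\begin{aligned}
[\overline{\alpha\mathbf{x}}]_i &= \overline{[\alpha\mathbf{x}]_i} && \text{Definition CCCV}\\
&= \overline{\alpha[\mathbf{x}]_i} && \text{Definition CVSM}\\
&= \overline{\alpha}\,\overline{[\mathbf{x}]_i} && \text{Theorem CCRM}\\
&= \overline{\alpha}\,[\overline{\mathbf{x}}]_i && \text{Definition CCCV}\\
&= [\overline{\alpha}\,\overline{\mathbf{x}}]_i && \text{Definition CVSM}
\end{aligned}$$

Then by Definition CVE we have $\overline{\alpha\mathbf{x}} = \overline{\alpha}\,\overline{\mathbf{x}}$.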

These two theorems together tell us how we can “push” complex conjugation through linear combinations.
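As a quick numerical illustration of what "pushing" conjugation through a linear combination means, the sketch below checks that $\overline{\alpha\mathbf{x}+\beta\mathbf{y}} = \overline{\alpha}\,\overline{\mathbf{x}}+\overline{\beta}\,\overline{\mathbf{y}}$ for one choice of data. The helper names `conj_vec` and `lin_comb`, and the sample vectors and scalars, are our own assumptions for illustration and do not come from the text.

```python
# Vectors in C^m are modeled as plain Python lists of complex numbers.

def conj_vec(u):
    """Entrywise conjugate of a column vector (Definition CCCV)."""
    return [z.conjugate() for z in u]

def lin_comb(a, x, b, y):
    """The linear combination a*x + b*y, computed entry by entry."""
    return [a * xi + b * yi for xi, yi in zip(x, y)]

# Sample data (hypothetical values, chosen only for illustration).
x = [2 + 3j, 5 - 1j, 4j]
y = [1 - 1j, -2 + 0j, 3 + 2j]
alpha, beta = 2 - 1j, 1 + 4j

# Conjugate the whole combination ...
lhs = conj_vec(lin_comb(alpha, x, beta, y))
# ... or conjugate every scalar and every vector first (Theorems CRVA, CRSM).
rhs = lin_comb(alpha.conjugate(), conj_vec(x), beta.conjugate(), conj_vec(y))

print(lhs == rhs)
```

A single numerical check is of course no substitute for the two proofs above; it only shows the identity in action.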

Subsection IP: Inner Products

Definition IP
Inner Product
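Given the vectors $\mathbf{u},\mathbf{v}\in\mathbb{C}^m$ the inner product of $\mathbf{u}$ and $\mathbf{v}$ is the scalar quantity in $\mathbb{C}$,

$$\langle\mathbf{u},\mathbf{v}\rangle = [\mathbf{u}]_1\overline{[\mathbf{v}]_1} + [\mathbf{u}]_2\overline{[\mathbf{v}]_2} + [\mathbf{u}]_3\overline{[\mathbf{v}]_3} + \cdots + [\mathbf{u}]_m\overline{[\mathbf{v}]_m} = \sum_{i=1}^{m}[\mathbf{u}]_i\overline{[\mathbf{v}]_i}$$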

(This definition contains Notation IP.)

This operation is a bit different in that we begin with two vectors but produce a scalar. Computing one is straightforward.

Example CSIP
Computing some inner products
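The inner product of

$$\mathbf{u} = \begin{bmatrix} 2+3i \\ 5+2i \\ -3+i \end{bmatrix} \quad\text{and}\quad \mathbf{v} = \begin{bmatrix} 1+2i \\ -4+5i \\ 0+5i \end{bmatrix}$$

is

$$\begin{aligned}
\langle\mathbf{u},\mathbf{v}\rangle &= (2+3i)(\overline{1+2i}) + (5+2i)(\overline{-4+5i}) + (-3+i)(\overline{0+5i})\\
&= (2+3i)(1-2i) + (5+2i)(-4-5i) + (-3+i)(0-5i)\\
&= (8-i) + (-10-33i) + (5+15i)\\
&= 3-19i
\end{aligned}$$

The inner product of

$$\mathbf{w} = \begin{bmatrix} 2 \\ 4 \\ -3 \\ 2 \\ 8 \end{bmatrix} \quad\text{and}\quad \mathbf{x} = \begin{bmatrix} 3 \\ 1 \\ 0 \\ -1 \\ -2 \end{bmatrix}$$

is

$$\langle\mathbf{w},\mathbf{x}\rangle = 2(\overline{3}) + 4(\overline{1}) + (-3)(\overline{0}) + 2(\overline{-1}) + 8(\overline{-2}) = 2(3) + 4(1) + (-3)0 + 2(-1) + 8(-2) = -8.$$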
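The two computations of Example CSIP can be reproduced with a few lines of Python, whose built-in `complex` type writes $i$ as `j`. This is a minimal sketch under our own assumptions: the function name `inner_product` is ours, and vectors are modeled as plain lists.

```python
def inner_product(u, v):
    """Inner product that conjugates the entries of the SECOND vector,
    following Definition IP."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same size")
    # complex(...) lets the function also accept real (int or float) entries.
    return sum(ui * complex(vi).conjugate() for ui, vi in zip(u, v))

# First pair of vectors from Example CSIP.
u = [2 + 3j, 5 + 2j, -3 + 1j]
v = [1 + 2j, -4 + 5j, 0 + 5j]
print(inner_product(u, v))

# Second pair: real entries, so this is the familiar dot product.
w = [2, 4, -3, 2, 8]
x = [3, 1, 0, -1, -2]
print(inner_product(w, x))
```

The two printed values should agree with the hand computations, $3-19i$ and $-8$ (the second is reported as a complex number with zero imaginary part).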

In the case where the entries of our vectors are all real numbers (as in the second part of Example CSIP), the computation of the inner product may look familiar and be known to you as a dot product or scalar product. So you can view the inner product as a generalization of the scalar product to vectors from $\mathbb{C}^m$ (rather than $\mathbb{R}^m$).

Also, note that we have chosen to conjugate the entries of the second vector listed in the inner product, while many authors choose to conjugate the entries of the first vector. It really makes no difference which choice is made; it just requires that subsequent definitions and theorems are consistent with the choice. You can study the conclusion of Theorem IPAC as an explanation of the magnitude of the difference that results from this choice. But be careful as you read other treatments of the inner product or its use in applications, and be sure you know ahead of time which choice has been made.

There are several quick theorems we can now prove, and they will each be useful later.

Theorem IPVA
Inner Product and Vector Addition
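Suppose $\mathbf{u},\mathbf{v},\mathbf{w}\in\mathbb{C}^m$. Then

 1. $\langle\mathbf{u}+\mathbf{v},\mathbf{w}\rangle = \langle\mathbf{u},\mathbf{w}\rangle + \langle\mathbf{v},\mathbf{w}\rangle$
 2. $\langle\mathbf{u},\mathbf{v}+\mathbf{w}\rangle = \langle\mathbf{u},\mathbf{v}\rangle + \langle\mathbf{u},\mathbf{w}\rangle$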

Proof   The proofs of the two parts are very similar, with the second one requiring just a bit more effort due to the conjugation that occurs. We will prove part 2 and you can prove part 1 (Exercise O.T10).

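$$\begin{aligned}
\langle\mathbf{u},\mathbf{v}+\mathbf{w}\rangle &= \sum_{i=1}^{m}[\mathbf{u}]_i\overline{[\mathbf{v}+\mathbf{w}]_i} && \text{Definition IP}\\
&= \sum_{i=1}^{m}[\mathbf{u}]_i\left(\overline{[\mathbf{v}]_i+[\mathbf{w}]_i}\right) && \text{Definition CVA}\\
&= \sum_{i=1}^{m}[\mathbf{u}]_i\left(\overline{[\mathbf{v}]_i}+\overline{[\mathbf{w}]_i}\right) && \text{Theorem CCRA}\\
&= \sum_{i=1}^{m}\left([\mathbf{u}]_i\overline{[\mathbf{v}]_i}+[\mathbf{u}]_i\overline{[\mathbf{w}]_i}\right) && \text{Property DCN}\\
&= \sum_{i=1}^{m}[\mathbf{u}]_i\overline{[\mathbf{v}]_i}+\sum_{i=1}^{m}[\mathbf{u}]_i\overline{[\mathbf{w}]_i} && \text{Property ACCN}\\
&= \langle\mathbf{u},\mathbf{v}\rangle + \langle\mathbf{u},\mathbf{w}\rangle && \text{Definition IP}
\end{aligned}$$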

Theorem IPSM
Inner Product and Scalar Multiplication
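Suppose $\mathbf{u},\mathbf{v}\in\mathbb{C}^m$ and $\alpha\in\mathbb{C}$. Then

 1. $\langle\alpha\mathbf{u},\mathbf{v}\rangle = \alpha\langle\mathbf{u},\mathbf{v}\rangle$
 2. $\langle\mathbf{u},\alpha\mathbf{v}\rangle = \overline{\alpha}\langle\mathbf{u},\mathbf{v}\rangle$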

Proof   The proofs of the two parts are very similar, with the second one requiring just a bit more effort due to the conjugation that occurs. We will prove part 2 and you can prove part 1 (Exercise O.T11).

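$$\begin{aligned}
\langle\mathbf{u},\alpha\mathbf{v}\rangle &= \sum_{i=1}^{m}[\mathbf{u}]_i\overline{[\alpha\mathbf{v}]_i} && \text{Definition IP}\\
&= \sum_{i=1}^{m}[\mathbf{u}]_i\overline{\alpha[\mathbf{v}]_i} && \text{Definition CVSM}\\
&= \sum_{i=1}^{m}[\mathbf{u}]_i\,\overline{\alpha}\,\overline{[\mathbf{v}]_i} && \text{Theorem CCRM}\\
&= \sum_{i=1}^{m}\overline{\alpha}\,[\mathbf{u}]_i\overline{[\mathbf{v}]_i} && \text{Property MCCN}\\
&= \overline{\alpha}\sum_{i=1}^{m}[\mathbf{u}]_i\overline{[\mathbf{v}]_i} && \text{Property DCN}\\
&= \overline{\alpha}\langle\mathbf{u},\mathbf{v}\rangle && \text{Definition IP}
\end{aligned}$$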

Theorem IPAC
Inner Product is Anti-Commutative
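Suppose that $\mathbf{u}$ and $\mathbf{v}$ are vectors in $\mathbb{C}^m$. Then $\langle\mathbf{u},\mathbf{v}\rangle = \overline{\langle\mathbf{v},\mathbf{u}\rangle}$.

Proof  

$$\begin{aligned}
\langle\mathbf{u},\mathbf{v}\rangle &= \sum_{i=1}^{m}[\mathbf{u}]_i\overline{[\mathbf{v}]_i} && \text{Definition IP}\\
&= \sum_{i=1}^{m}\overline{\overline{[\mathbf{u}]_i}}\;\overline{[\mathbf{v}]_i} && \text{Theorem CCT}\\
&= \sum_{i=1}^{m}\overline{\overline{[\mathbf{u}]_i}\,[\mathbf{v}]_i} && \text{Theorem CCRM}\\
&= \overline{\sum_{i=1}^{m}\overline{[\mathbf{u}]_i}\,[\mathbf{v}]_i} && \text{Theorem CCRA}\\
&= \overline{\sum_{i=1}^{m}[\mathbf{v}]_i\overline{[\mathbf{u}]_i}} && \text{Property MCCN}\\
&= \overline{\langle\mathbf{v},\mathbf{u}\rangle} && \text{Definition IP}
\end{aligned}$$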
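Theorem IPAC also quantifies the remark about conventions made earlier: conjugating the first vector instead of the second produces the conjugate of our inner product. The sketch below checks both statements on the first pair of vectors from Example CSIP. The function names `inner_product` and `inner_product_first` are our own, introduced only for this illustration.

```python
def inner_product(u, v):
    """Conjugate the entries of the SECOND vector, as in Definition IP."""
    return sum(ui * vi.conjugate() for ui, vi in zip(u, v))

def inner_product_first(u, v):
    """The alternative convention: conjugate the entries of the FIRST vector."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

u = [2 + 3j, 5 + 2j, -3 + 1j]
v = [1 + 2j, -4 + 5j, 0 + 5j]

uv = inner_product(u, v)
vu = inner_product(v, u)

# Theorem IPAC: swapping the two arguments conjugates the result.
print(uv == vu.conjugate())

# The other convention yields the conjugate of our value.
print(inner_product_first(u, v) == uv.conjugate())
```

So the two conventions differ only by a conjugation, which is why consistency matters more than the choice itself.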

Subsection N: Norm

When linear algebra is treated in a more geometric fashion, the length of a vector occurs naturally, and is what you would expect from its name. With complex numbers, we will define a similar function. Recall that if $c$ is a complex number, then $|c|$ denotes its modulus (Definition MCN).

Definition NV
Norm of a Vector
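The norm of the vector $\mathbf{u}$ is the scalar quantity in $\mathbb{C}$

$$\|\mathbf{u}\| = \sqrt{|[\mathbf{u}]_1|^2 + |[\mathbf{u}]_2|^2 + |[\mathbf{u}]_3|^2 + \cdots + |[\mathbf{u}]_m|^2} = \sqrt{\sum_{i=1}^{m}|[\mathbf{u}]_i|^2}$$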

(This definition contains Notation NV.)

Computing a norm is also easy to do.

Example CNSV
Computing the norm of some vectors
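The norm of

$$\mathbf{u} = \begin{bmatrix} 3+2i \\ 1-6i \\ 2+4i \\ 2+i \end{bmatrix}$$

is

$$\|\mathbf{u}\| = \sqrt{|3+2i|^2 + |1-6i|^2 + |2+4i|^2 + |2+i|^2} = \sqrt{13+37+20+5} = \sqrt{75} = 5\sqrt{3}.$$

The norm of

$$\mathbf{v} = \begin{bmatrix} 3 \\ -1 \\ 2 \\ 4 \\ -3 \end{bmatrix}$$

is

$$\|\mathbf{v}\| = \sqrt{|3|^2 + |-1|^2 + |2|^2 + |4|^2 + |-3|^2} = \sqrt{3^2+1^2+2^2+4^2+3^2} = \sqrt{39}.$$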
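The norms in Example CNSV can be checked numerically. The sketch below is one possible way to do it, assuming vectors are stored as Python lists; the function name `norm` is our own. Python's built-in `abs` returns the modulus of a complex number.

```python
import math

def norm(u):
    """Norm of a vector: the square root of the sum of the squared
    moduli of its entries (Definition NV)."""
    return math.sqrt(sum(abs(z) ** 2 for z in u))

u = [3 + 2j, 1 - 6j, 2 + 4j, 2 + 1j]
v = [3, -1, 2, 4, -3]

# Compare against the closed forms found by hand, up to rounding error.
print(math.isclose(norm(u), 5 * math.sqrt(3)))
print(math.isclose(norm(v), math.sqrt(39)))
```

Because floating-point square roots are inexact, the comparison uses `math.isclose` rather than `==`.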

Notice how the norm of a vector with real number entries is just the length of the vector. Inner products and norms are related by the following theorem.

Theorem IPN
Inner Products and Norms
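Suppose that $\mathbf{u}$ is a vector in $\mathbb{C}^m$. Then $\|\mathbf{u}\|^2 = \langle\mathbf{u},\mathbf{u}\rangle$.

Proof  

$$\begin{aligned}
\|\mathbf{u}\|^2 &= \left(\sqrt{\sum_{i=1}^{m}|[\mathbf{u}]_i|^2}\right)^{2} && \text{Definition NV}\\
&= \sum_{i=1}^{m}|[\mathbf{u}]_i|^2\\
&= \sum_{i=1}^{m}[\mathbf{u}]_i\overline{[\mathbf{u}]_i} && \text{Definition MCN}\\
&= \langle\mathbf{u},\mathbf{u}\rangle && \text{Definition IP}
\end{aligned}$$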

When our vectors have entries only from the real numbers, Theorem IPN says that the dot product of a vector with itself is equal to the length of the vector squared.

Theorem PIP
Positive Inner Products
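Suppose that $\mathbf{u}$ is a vector in $\mathbb{C}^m$. Then $\langle\mathbf{u},\mathbf{u}\rangle \ge 0$ with equality if and only if $\mathbf{u}=\mathbf{0}$.

Proof   From the proof of Theorem IPN we see that

$$\langle\mathbf{u},\mathbf{u}\rangle = |[\mathbf{u}]_1|^2 + |[\mathbf{u}]_2|^2 + |[\mathbf{u}]_3|^2 + \cdots + |[\mathbf{u}]_m|^2$$

Since each modulus is squared, every term is nonnegative, and the sum must also be nonnegative. (Notice that in general the inner product is a complex number and cannot be compared with zero, but in the special case of $\langle\mathbf{u},\mathbf{u}\rangle$ the result is a real number.) The phrase, “with equality if and only if” means that we want to show that the statement $\langle\mathbf{u},\mathbf{u}\rangle = 0$ (i.e. with equality) is equivalent (“if and only if”) to the statement $\mathbf{u}=\mathbf{0}$.

If $\mathbf{u}=\mathbf{0}$, then it is a straightforward computation to see that $\langle\mathbf{u},\mathbf{u}\rangle = 0$. In the other direction, assume that $\langle\mathbf{u},\mathbf{u}\rangle = 0$. As before, $\langle\mathbf{u},\mathbf{u}\rangle$ is a sum of squared moduli. So we have

$$0 = \langle\mathbf{u},\mathbf{u}\rangle = |[\mathbf{u}]_1|^2 + |[\mathbf{u}]_2|^2 + |[\mathbf{u}]_3|^2 + \cdots + |[\mathbf{u}]_m|^2$$

Now we have a sum of nonnegative squares equaling zero, so each term must be zero. Then by similar logic, $|[\mathbf{u}]_i| = 0$ will imply that $[\mathbf{u}]_i = 0$, since $0+0i$ is the only complex number with zero modulus. Thus every entry of $\mathbf{u}$ is zero and so $\mathbf{u}=\mathbf{0}$, as desired.

Notice that Theorem PIP contains three implications: $\mathbf{u}$ is any vector $\Rightarrow \langle\mathbf{u},\mathbf{u}\rangle \ge 0$, $\mathbf{u}=\mathbf{0} \Rightarrow \langle\mathbf{u},\mathbf{u}\rangle = 0$, and $\langle\mathbf{u},\mathbf{u}\rangle = 0 \Rightarrow \mathbf{u}=\mathbf{0}$. The results contained in Theorem PIP are summarized by saying “the inner product is positive definite.”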

Subsection OV: Orthogonal Vectors

“Orthogonal” is a generalization of “perpendicular.” You may have used mutually perpendicular vectors in a physics class, or you may recall from a calculus class that perpendicular vectors have a zero dot product. We will now extend these ideas into the realm of higher dimensions and complex scalars.

Definition OV
Orthogonal Vectors
A pair of vectors, $\mathbf{u}$ and $\mathbf{v}$, from $\mathbb{C}^m$ are orthogonal if their inner product is zero, that is, $\langle\mathbf{u},\mathbf{v}\rangle = 0$.

Example TOV
Two orthogonal vectors
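The vectors

$$\mathbf{u} = \begin{bmatrix} 2+3i \\ 4-2i \\ 1+i \\ 1+i \end{bmatrix} \qquad \mathbf{v} = \begin{bmatrix} 1-i \\ 2+3i \\ 4-6i \\ 1 \end{bmatrix}$$

are orthogonal since

$$\begin{aligned}
\langle\mathbf{u},\mathbf{v}\rangle &= (2+3i)(1+i) + (4-2i)(2-3i) + (1+i)(4+6i) + (1+i)(1)\\
&= (-1+5i) + (2-16i) + (-2+10i) + (1+i)\\
&= 0+0i.
\end{aligned}$$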

We extend this definition to whole sets by requiring vectors to be pairwise orthogonal. Although we use the same word for pairs of vectors and for sets of vectors, careful thought about which objects are involved will eliminate any source of confusion.

Definition OSV
Orthogonal Set of Vectors
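Suppose that $S = \{\mathbf{u}_1,\mathbf{u}_2,\mathbf{u}_3,\ldots,\mathbf{u}_n\}$ is a set of vectors from $\mathbb{C}^m$. Then the set $S$ is orthogonal if every pair of different vectors from $S$ is orthogonal, that is $\langle\mathbf{u}_i,\mathbf{u}_j\rangle = 0$ whenever $i\ne j$.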

The next example is trivial in some respects, but is still worthy of discussion since it is the prototypical orthogonal set.

Example SUVOS
Standard Unit Vectors are an Orthogonal Set
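The standard unit vectors are the columns of the identity matrix (Definition SUV). Computing the inner product of two distinct vectors, $\mathbf{e}_i$, $\mathbf{e}_j$, $i\ne j$, gives,

$$\begin{aligned}
\langle\mathbf{e}_i,\mathbf{e}_j\rangle &= 0\overline{0} + 0\overline{0} + \cdots + 1\overline{0} + \cdots + 0\overline{1} + \cdots + 0\overline{0} + 0\overline{0}\\
&= 0(0) + 0(0) + \cdots + 1(0) + \cdots + 0(1) + \cdots + 0(0) + 0(0)\\
&= 0
\end{aligned}$$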

Example AOS
An orthogonal set
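The set

$$\{\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3,\mathbf{x}_4\} = \left\{ \begin{bmatrix} 1+i \\ 1 \\ 1-i \\ i \end{bmatrix},\ \begin{bmatrix} 1+5i \\ 6+5i \\ -7-i \\ 1-6i \end{bmatrix},\ \begin{bmatrix} -7+34i \\ -8-23i \\ -10+22i \\ 30+13i \end{bmatrix},\ \begin{bmatrix} -2-4i \\ 6+i \\ 4+3i \\ 6-i \end{bmatrix} \right\}$$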

is an orthogonal set. Since the inner product is anti-commutative (Theorem IPAC) we can test pairs of different vectors in any order. If the result is zero, then it will also be zero if the inner product is computed in the opposite order. This means there are six pairs of different vectors to use in an inner product computation. We’ll do two and you can practice your inner products on the other four.

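$$\begin{aligned}
\langle\mathbf{x}_1,\mathbf{x}_3\rangle &= (1+i)(-7-34i) + (1)(-8+23i) + (1-i)(-10-22i) + (i)(30-13i)\\
&= (27-41i) + (-8+23i) + (-32-12i) + (13+30i)\\
&= 0+0i
\end{aligned}$$

and

$$\begin{aligned}
\langle\mathbf{x}_2,\mathbf{x}_4\rangle &= (1+5i)(-2+4i) + (6+5i)(6-i) + (-7-i)(4-3i) + (1-6i)(6+i)\\
&= (-22-6i) + (41+24i) + (-31+17i) + (12-35i)\\
&= 0+0i
\end{aligned}$$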
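Definition OSV translates directly into a pairwise test. The following sketch applies it to the four vectors of Example AOS; the names `inner_product` and `is_orthogonal_set`, and the tolerance parameter `tol`, are our own assumptions rather than anything defined in the text.

```python
from itertools import combinations

def inner_product(u, v):
    """Conjugate the entries of the second vector, as in Definition IP."""
    return sum(ui * vi.conjugate() for ui, vi in zip(u, v))

def is_orthogonal_set(vectors, tol=1e-12):
    """True when every pair of different vectors has an inner product
    that is zero up to the tolerance tol (Definition OSV)."""
    return all(abs(inner_product(a, b)) <= tol
               for a, b in combinations(vectors, 2))

x1 = [1 + 1j, 1 + 0j, 1 - 1j, 1j]
x2 = [1 + 5j, 6 + 5j, -7 - 1j, 1 - 6j]
x3 = [-7 + 34j, -8 - 23j, -10 + 22j, 30 + 13j]
x4 = [-2 - 4j, 6 + 1j, 4 + 3j, 6 - 1j]
S = [x1, x2, x3, x4]

# Each unordered pair is tested once; Theorem IPAC makes the order irrelevant.
print(len(list(combinations(S, 2))))
print(is_orthogonal_set(S))
```

`combinations(S, 2)` enumerates the six unordered pairs mentioned above, so nothing is tested twice.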

So far, this section has seen lots of definitions, and lots of theorems establishing un-surprising consequences of those definitions. But here is our first theorem that suggests that inner products and orthogonal vectors have some utility. It is also one of our first illustrations of how to arrive at linear independence as the conclusion of a theorem.

Theorem OSLI
Orthogonal Sets are Linearly Independent
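Suppose that $S = \{\mathbf{u}_1,\mathbf{u}_2,\mathbf{u}_3,\ldots,\mathbf{u}_n\}$ is an orthogonal set of nonzero vectors. Then $S$ is linearly independent.

Proof   To prove linear independence of a set of vectors, we can appeal to the definition (Definition LICV) and begin with a relation of linear dependence (Definition RLDCV),

$$\alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \cdots + \alpha_n\mathbf{u}_n = \mathbf{0}.$$

Then, for every $1\le i\le n$, we have

$$\begin{aligned}
0 &= 0\langle\mathbf{u}_i,\mathbf{u}_i\rangle\\
&= \langle 0\mathbf{u}_i,\mathbf{u}_i\rangle && \text{Theorem IPSM}\\
&= \langle\mathbf{0},\mathbf{u}_i\rangle && \text{Definition CVSM}\\
&= \langle\alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \cdots + \alpha_n\mathbf{u}_n,\mathbf{u}_i\rangle && \text{Relation of linear dependence}\\
&= \langle\alpha_1\mathbf{u}_1,\mathbf{u}_i\rangle + \langle\alpha_2\mathbf{u}_2,\mathbf{u}_i\rangle + \langle\alpha_3\mathbf{u}_3,\mathbf{u}_i\rangle + \cdots + \langle\alpha_n\mathbf{u}_n,\mathbf{u}_i\rangle && \text{Theorem IPVA}\\
&= \alpha_1\langle\mathbf{u}_1,\mathbf{u}_i\rangle + \alpha_2\langle\mathbf{u}_2,\mathbf{u}_i\rangle + \alpha_3\langle\mathbf{u}_3,\mathbf{u}_i\rangle + \cdots + \alpha_i\langle\mathbf{u}_i,\mathbf{u}_i\rangle + \cdots + \alpha_n\langle\mathbf{u}_n,\mathbf{u}_i\rangle && \text{Theorem IPSM}\\
&= \alpha_1(0) + \alpha_2(0) + \alpha_3(0) + \cdots + \alpha_i\langle\mathbf{u}_i,\mathbf{u}_i\rangle + \cdots + \alpha_n(0) && \text{Orthogonal set}\\
&= \alpha_i\langle\mathbf{u}_i,\mathbf{u}_i\rangle
\end{aligned}$$

So we have $0 = \alpha_i\langle\mathbf{u}_i,\mathbf{u}_i\rangle$. However, since $\mathbf{u}_i\ne\mathbf{0}$ (the hypothesis said our vectors were nonzero), Theorem PIP says that $\langle\mathbf{u}_i,\mathbf{u}_i\rangle > 0$. So we must conclude that $\alpha_i = 0$ for all $1\le i\le n$. But this says that $S$ is a linearly independent set since the only way to form a relation of linear dependence is the trivial way, with all the scalars zero. Boom!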
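The key step of this proof is that taking the inner product of a linear combination with $\mathbf{u}_i$ kills every term except $\alpha_i\langle\mathbf{u}_i,\mathbf{u}_i\rangle$. The sketch below illustrates that step numerically with the orthogonal set of Example AOS: it builds a linear combination from scalars of our own choosing (hypothetical values) and then recovers each scalar as $\langle\mathbf{x},\mathbf{u}_i\rangle/\langle\mathbf{u}_i,\mathbf{u}_i\rangle$. The names `inner_product`, `alphas`, and `recovered` are ours.

```python
def inner_product(u, v):
    """Conjugate the entries of the second vector, as in Definition IP."""
    return sum(ui * vi.conjugate() for ui, vi in zip(u, v))

# The orthogonal set of nonzero vectors from Example AOS.
S = [
    [1 + 1j, 1 + 0j, 1 - 1j, 1j],
    [1 + 5j, 6 + 5j, -7 - 1j, 1 - 6j],
    [-7 + 34j, -8 - 23j, -10 + 22j, 30 + 13j],
    [-2 - 4j, 6 + 1j, 4 + 3j, 6 - 1j],
]

# Hypothetical scalars, chosen only for this illustration.
alphas = [2 - 1j, 3j, -1 + 0j, 4 + 2j]

# x = alpha_1 u_1 + alpha_2 u_2 + alpha_3 u_3 + alpha_4 u_4, entry by entry.
x = [sum(a * u[k] for a, u in zip(alphas, S)) for k in range(4)]

# <x, u_i> = alpha_i <u_i, u_i>, so dividing isolates alpha_i.
recovered = [inner_product(x, u) / inner_product(u, u) for u in S]

print(all(abs(r - a) < 1e-9 for r, a in zip(recovered, alphas)))
```

In particular, if the combination `x` were the zero vector, every recovered scalar would be zero, which is exactly the conclusion of the proof.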

Subsection GSP: Gram-Schmidt Procedure

The Gram-Schmidt Procedure is really a theorem. It says that if we begin with a linearly independent set of $p$ vectors, $S$, then we can do a number of calculations with these vectors and produce an orthogonal set of $p$ vectors, $T$, so that $\langle S\rangle = \langle T\rangle$, that is, the two sets have the same span. Given the large number of computations involved, it is indeed a procedure, and it is best carried out on a computer. However, it also has value in proofs where we may on occasion wish to replace a linearly independent set by an orthogonal set.

This is our first occasion to use the technique of “mathematical induction” for a proof, a technique we will see again several times, especially in Chapter D. So study the simple example described in Technique I first.

Theorem GSP
Gram-Schmidt Procedure
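Suppose that $S = \{\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3,\ldots,\mathbf{v}_p\}$ is a linearly independent set of vectors in $\mathbb{C}^m$. Define the vectors $\mathbf{u}_i$, $1\le i\le p$ by

$$\mathbf{u}_i = \mathbf{v}_i - \frac{\langle\mathbf{v}_i,\mathbf{u}_1\rangle}{\langle\mathbf{u}_1,\mathbf{u}_1\rangle}\mathbf{u}_1 - \frac{\langle\mathbf{v}_i,\mathbf{u}_2\rangle}{\langle\mathbf{u}_2,\mathbf{u}_2\rangle}\mathbf{u}_2 - \frac{\langle\mathbf{v}_i,\mathbf{u}_3\rangle}{\langle\mathbf{u}_3,\mathbf{u}_3\rangle}\mathbf{u}_3 - \cdots - \frac{\langle\mathbf{v}_i,\mathbf{u}_{i-1}\rangle}{\langle\mathbf{u}_{i-1},\mathbf{u}_{i-1}\rangle}\mathbf{u}_{i-1}$$

If $T = \{\mathbf{u}_1,\mathbf{u}_2,\mathbf{u}_3,\ldots,\mathbf{u}_p\}$, then $T$ is an orthogonal set of nonzero vectors, and $\langle T\rangle = \langle S\rangle$.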

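Proof   We will prove the result by using induction on $p$ (Technique I). To begin, we prove that $T$ has the desired properties when $p=1$. In this case $\mathbf{u}_1 = \mathbf{v}_1$ and $T = \{\mathbf{u}_1\} = \{\mathbf{v}_1\} = S$. Because $S$ and $T$ are equal, $\langle S\rangle = \langle T\rangle$. Equally trivial, $T$ is an orthogonal set. If $\mathbf{u}_1 = \mathbf{0}$, then $S$ would be a linearly dependent set, a contradiction.

Now suppose that the theorem is true for any set of $p-1$ linearly independent vectors. Let $S = \{\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3,\ldots,\mathbf{v}_p\}$ be a linearly independent set of $p$ vectors. Then $S' = \{\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3,\ldots,\mathbf{v}_{p-1}\}$ is also linearly independent. So we can apply the theorem to $S'$ and construct the vectors $T' = \{\mathbf{u}_1,\mathbf{u}_2,\mathbf{u}_3,\ldots,\mathbf{u}_{p-1}\}$. $T'$ is therefore an orthogonal set of nonzero vectors and $\langle S'\rangle = \langle T'\rangle$. Define

$$\mathbf{u}_p = \mathbf{v}_p - \frac{\langle\mathbf{v}_p,\mathbf{u}_1\rangle}{\langle\mathbf{u}_1,\mathbf{u}_1\rangle}\mathbf{u}_1 - \frac{\langle\mathbf{v}_p,\mathbf{u}_2\rangle}{\langle\mathbf{u}_2,\mathbf{u}_2\rangle}\mathbf{u}_2 - \frac{\langle\mathbf{v}_p,\mathbf{u}_3\rangle}{\langle\mathbf{u}_3,\mathbf{u}_3\rangle}\mathbf{u}_3 - \cdots - \frac{\langle\mathbf{v}_p,\mathbf{u}_{p-1}\rangle}{\langle\mathbf{u}_{p-1},\mathbf{u}_{p-1}\rangle}\mathbf{u}_{p-1}$$

and let $T = T'\cup\{\mathbf{u}_p\}$. We need to now show that $T$ has several properties by building on what we know about $T'$. But first notice that the above equation has no problems with the denominators ($\langle\mathbf{u}_i,\mathbf{u}_i\rangle$) being zero, since the $\mathbf{u}_i$ are from $T'$, which is composed of nonzero vectors.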

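We show that $\langle T\rangle = \langle S\rangle$, by first establishing that $\langle T\rangle\subseteq\langle S\rangle$. Suppose $\mathbf{x}\in\langle T\rangle$, so

$$\mathbf{x} = a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + a_3\mathbf{u}_3 + \cdots + a_p\mathbf{u}_p$$

The term $a_p\mathbf{u}_p$ is a linear combination of vectors from $T'$ and the vector $\mathbf{v}_p$, while the remaining terms are a linear combination of vectors from $T'$. Since $\langle T'\rangle = \langle S'\rangle$, any term that is a multiple of a vector from $T'$ can be rewritten as a linear combination of vectors from $S'$. The remaining term $a_p\mathbf{v}_p$ is a multiple of a vector in $S$. So we see that $\mathbf{x}$ can be rewritten as a linear combination of vectors from $S$, i.e. $\mathbf{x}\in\langle S\rangle$.

To show that $\langle S\rangle\subseteq\langle T\rangle$, begin with $\mathbf{y}\in\langle S\rangle$, so

$$\mathbf{y} = a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + a_3\mathbf{v}_3 + \cdots + a_p\mathbf{v}_p$$

Rearrange our defining equation for $\mathbf{u}_p$ by solving for $\mathbf{v}_p$. Then the term $a_p\mathbf{v}_p$ is a multiple of a linear combination of elements of $T$. The remaining terms are a linear combination of $\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3,\ldots,\mathbf{v}_{p-1}$, hence an element of $\langle S'\rangle = \langle T'\rangle$. Thus these remaining terms can be written as a linear combination of the vectors in $T'$. So $\mathbf{y}$ is a linear combination of vectors from $T$, i.e. $\mathbf{y}\in\langle T\rangle$.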

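The elements of $T'$ are nonzero, but what about $\mathbf{u}_p$? Suppose to the contrary that $\mathbf{u}_p = \mathbf{0}$,

$$\begin{aligned}
\mathbf{0} = \mathbf{u}_p &= \mathbf{v}_p - \frac{\langle\mathbf{v}_p,\mathbf{u}_1\rangle}{\langle\mathbf{u}_1,\mathbf{u}_1\rangle}\mathbf{u}_1 - \frac{\langle\mathbf{v}_p,\mathbf{u}_2\rangle}{\langle\mathbf{u}_2,\mathbf{u}_2\rangle}\mathbf{u}_2 - \frac{\langle\mathbf{v}_p,\mathbf{u}_3\rangle}{\langle\mathbf{u}_3,\mathbf{u}_3\rangle}\mathbf{u}_3 - \cdots - \frac{\langle\mathbf{v}_p,\mathbf{u}_{p-1}\rangle}{\langle\mathbf{u}_{p-1},\mathbf{u}_{p-1}\rangle}\mathbf{u}_{p-1}\\
\mathbf{v}_p &= \frac{\langle\mathbf{v}_p,\mathbf{u}_1\rangle}{\langle\mathbf{u}_1,\mathbf{u}_1\rangle}\mathbf{u}_1 + \frac{\langle\mathbf{v}_p,\mathbf{u}_2\rangle}{\langle\mathbf{u}_2,\mathbf{u}_2\rangle}\mathbf{u}_2 + \frac{\langle\mathbf{v}_p,\mathbf{u}_3\rangle}{\langle\mathbf{u}_3,\mathbf{u}_3\rangle}\mathbf{u}_3 + \cdots + \frac{\langle\mathbf{v}_p,\mathbf{u}_{p-1}\rangle}{\langle\mathbf{u}_{p-1},\mathbf{u}_{p-1}\rangle}\mathbf{u}_{p-1}
\end{aligned}$$

Since $\langle S'\rangle = \langle T'\rangle$ we can write the vectors $\mathbf{u}_1,\mathbf{u}_2,\mathbf{u}_3,\ldots,\mathbf{u}_{p-1}$ on the right side of this equation in terms of the vectors $\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3,\ldots,\mathbf{v}_{p-1}$ and we then have the vector $\mathbf{v}_p$ expressed as a linear combination of the other $p-1$ vectors in $S$, implying that $S$ is a linearly dependent set (Theorem DLDS), contrary to our lone hypothesis about $S$.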

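Finally, it is a simple matter to establish that $T$ is an orthogonal set, though it will not appear so simple looking. Think about your objects as you work through the following: what is a vector and what is a scalar. Since $T'$ is an orthogonal set by induction, most pairs of elements in $T$ are orthogonal. We just need to test inner products between $\mathbf{u}_p$ and $\mathbf{u}_i$, for $1\le i\le p-1$. Here we go, using summation notation,

$$\begin{aligned}
\langle\mathbf{u}_p,\mathbf{u}_i\rangle &= \left\langle \mathbf{v}_p - \sum_{k=1}^{p-1}\frac{\langle\mathbf{v}_p,\mathbf{u}_k\rangle}{\langle\mathbf{u}_k,\mathbf{u}_k\rangle}\mathbf{u}_k,\ \mathbf{u}_i\right\rangle\\
&= \langle\mathbf{v}_p,\mathbf{u}_i\rangle - \left\langle \sum_{k=1}^{p-1}\frac{\langle\mathbf{v}_p,\mathbf{u}_k\rangle}{\langle\mathbf{u}_k,\mathbf{u}_k\rangle}\mathbf{u}_k,\ \mathbf{u}_i\right\rangle && \text{Theorem IPVA}\\
&= \langle\mathbf{v}_p,\mathbf{u}_i\rangle - \sum_{k=1}^{p-1}\left\langle \frac{\langle\mathbf{v}_p,\mathbf{u}_k\rangle}{\langle\mathbf{u}_k,\mathbf{u}_k\rangle}\mathbf{u}_k,\ \mathbf{u}_i\right\rangle && \text{Theorem IPVA}\\
&= \langle\mathbf{v}_p,\mathbf{u}_i\rangle - \sum_{k=1}^{p-1}\frac{\langle\mathbf{v}_p,\mathbf{u}_k\rangle}{\langle\mathbf{u}_k,\mathbf{u}_k\rangle}\langle\mathbf{u}_k,\mathbf{u}_i\rangle && \text{Theorem IPSM}\\
&= \langle\mathbf{v}_p,\mathbf{u}_i\rangle - \frac{\langle\mathbf{v}_p,\mathbf{u}_i\rangle}{\langle\mathbf{u}_i,\mathbf{u}_i\rangle}\langle\mathbf{u}_i,\mathbf{u}_i\rangle - \sum_{k\ne i}\frac{\langle\mathbf{v}_p,\mathbf{u}_k\rangle}{\langle\mathbf{u}_k,\mathbf{u}_k\rangle}(0) && \text{Induction Hypothesis}\\
&= \langle\mathbf{v}_p,\mathbf{u}_i\rangle - \langle\mathbf{v}_p,\mathbf{u}_i\rangle - \sum_{k\ne i}0\\
&= 0
\end{aligned}$$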

Example GSTV
Gram-Schmidt of three vectors
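We will illustrate the Gram-Schmidt process with three vectors. Begin with the linearly independent (check this!) set

$$S = \{\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3\} = \left\{ \begin{bmatrix} 1 \\ 1+i \\ 1 \end{bmatrix},\ \begin{bmatrix} -i \\ 1 \\ 1+i \end{bmatrix},\ \begin{bmatrix} 0 \\ i \\ i \end{bmatrix} \right\}$$

Then

$$\begin{aligned}
\mathbf{u}_1 &= \mathbf{v}_1 = \begin{bmatrix} 1 \\ 1+i \\ 1 \end{bmatrix}\\
\mathbf{u}_2 &= \mathbf{v}_2 - \frac{\langle\mathbf{v}_2,\mathbf{u}_1\rangle}{\langle\mathbf{u}_1,\mathbf{u}_1\rangle}\mathbf{u}_1 = \frac{1}{4}\begin{bmatrix} -2-3i \\ 1-i \\ 2+5i \end{bmatrix}\\
\mathbf{u}_3 &= \mathbf{v}_3 - \frac{\langle\mathbf{v}_3,\mathbf{u}_1\rangle}{\langle\mathbf{u}_1,\mathbf{u}_1\rangle}\mathbf{u}_1 - \frac{\langle\mathbf{v}_3,\mathbf{u}_2\rangle}{\langle\mathbf{u}_2,\mathbf{u}_2\rangle}\mathbf{u}_2 = \frac{1}{11}\begin{bmatrix} -3-i \\ 1+3i \\ -1-i \end{bmatrix}
\end{aligned}$$

and

$$T = \{\mathbf{u}_1,\mathbf{u}_2,\mathbf{u}_3\} = \left\{ \begin{bmatrix} 1 \\ 1+i \\ 1 \end{bmatrix},\ \frac{1}{4}\begin{bmatrix} -2-3i \\ 1-i \\ 2+5i \end{bmatrix},\ \frac{1}{11}\begin{bmatrix} -3-i \\ 1+3i \\ -1-i \end{bmatrix} \right\}$$

is an orthogonal set (which you can check) of nonzero vectors and $\langle T\rangle = \langle S\rangle$ (all by Theorem GSP). Of course, as a by-product of orthogonality, the set $T$ is also linearly independent (Theorem OSLI).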
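The formula of Theorem GSP can be turned into a short routine and used to check Example GSTV. What follows is a minimal sketch, not a robust numerical implementation: the names `inner_product`, `gram_schmidt`, and `close` are ours, the input is assumed to be linearly independent (so no division by zero occurs), and the arithmetic is ordinary floating point, so results are compared up to a small tolerance.

```python
def inner_product(u, v):
    """Conjugate the entries of the second vector, as in Definition IP."""
    return sum(ui * vi.conjugate() for ui, vi in zip(u, v))

def gram_schmidt(vectors):
    """Return u_1, ..., u_p built by the formula of Theorem GSP.

    Assumes the input list is linearly independent, so every
    denominator <u_k, u_k> is nonzero.
    """
    us = []
    for v in vectors:
        u = list(v)
        for uk in us:
            # Coefficient <v_i, u_k> / <u_k, u_k> from the theorem.
            coeff = inner_product(v, uk) / inner_product(uk, uk)
            u = [ui - coeff * uki for ui, uki in zip(u, uk)]
        us.append(u)
    return us

def close(a, b, tol=1e-12):
    """Entrywise comparison of two vectors up to rounding error."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))

v1 = [1 + 0j, 1 + 1j, 1 + 0j]
v2 = [-1j, 1 + 0j, 1 + 1j]
v3 = [0j, 1j, 1j]

u1, u2, u3 = gram_schmidt([v1, v2, v3])

# The vectors computed by hand in Example GSTV.
expected_u2 = [z / 4 for z in (-2 - 3j, 1 - 1j, 2 + 5j)]
expected_u3 = [z / 11 for z in (-3 - 1j, 1 + 3j, -1 - 1j)]

print(close(u2, expected_u2), close(u3, expected_u3))
```

The inner loop subtracts one term of the defining equation at a time, always using the original $\mathbf{v}_i$ in the numerator, exactly as the theorem is stated.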

One final definition related to orthogonal vectors.

Definition ONS
OrthoNormal Set
Suppose $S = \{\mathbf{u}_1,\mathbf{u}_2,\mathbf{u}_3,\ldots,\mathbf{u}_n\}$ is an orthogonal set of vectors such that $\|\mathbf{u}_i\| = 1$ for all $1\le i\le n$. Then $S$ is an orthonormal set of vectors.

Once you have an orthogonal set, it is easy to convert it to an orthonormal set — multiply each vector by the reciprocal of its norm, and the resulting vector will have norm 1. This scaling of each vector will not affect the orthogonality properties (apply Theorem IPSM).
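The appeal to Theorem IPSM can be spelled out in two lines. This is our own filling-in of the hint; the scalars $\alpha$ and $\beta$ are symbols introduced here for illustration. If $\langle\mathbf{u},\mathbf{v}\rangle = 0$, then applying both parts of Theorem IPSM gives

$$\langle\alpha\mathbf{u},\beta\mathbf{v}\rangle = \alpha\langle\mathbf{u},\beta\mathbf{v}\rangle = \alpha\overline{\beta}\langle\mathbf{u},\mathbf{v}\rangle = \alpha\overline{\beta}(0) = 0,$$

so scaled vectors remain orthogonal. For a nonzero vector $\mathbf{u}$, the choice $\alpha = 1/\|\mathbf{u}\|$ is a positive real number, hence equal to its own conjugate, and Theorem IPN together with Theorem IPSM gives

$$\|\alpha\mathbf{u}\|^2 = \langle\alpha\mathbf{u},\alpha\mathbf{u}\rangle = \alpha\overline{\alpha}\langle\mathbf{u},\mathbf{u}\rangle = \alpha^2\|\mathbf{u}\|^2 = 1.$$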

Example ONTV
Orthonormal set, three vectors
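The set

$$T = \{\mathbf{u}_1,\mathbf{u}_2,\mathbf{u}_3\} = \left\{ \begin{bmatrix} 1 \\ 1+i \\ 1 \end{bmatrix},\ \frac{1}{4}\begin{bmatrix} -2-3i \\ 1-i \\ 2+5i \end{bmatrix},\ \frac{1}{11}\begin{bmatrix} -3-i \\ 1+3i \\ -1-i \end{bmatrix} \right\}$$

from Example GSTV is an orthogonal set. We compute the norm of each vector,

$$\|\mathbf{u}_1\| = 2 \qquad \|\mathbf{u}_2\| = \frac{1}{2}\sqrt{11} \qquad \|\mathbf{u}_3\| = \frac{\sqrt{2}}{\sqrt{11}}$$

Converting each vector to a norm of 1, yields an orthonormal set,

$$\begin{aligned}
\mathbf{w}_1 &= \frac{1}{2}\begin{bmatrix} 1 \\ 1+i \\ 1 \end{bmatrix}\\
\mathbf{w}_2 &= \frac{1}{\frac{1}{2}\sqrt{11}}\,\frac{1}{4}\begin{bmatrix} -2-3i \\ 1-i \\ 2+5i \end{bmatrix} = \frac{1}{2\sqrt{11}}\begin{bmatrix} -2-3i \\ 1-i \\ 2+5i \end{bmatrix}\\
\mathbf{w}_3 &= \frac{1}{\frac{\sqrt{2}}{\sqrt{11}}}\,\frac{1}{11}\begin{bmatrix} -3-i \\ 1+3i \\ -1-i \end{bmatrix} = \frac{1}{\sqrt{22}}\begin{bmatrix} -3-i \\ 1+3i \\ -1-i \end{bmatrix}
\end{aligned}$$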
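The scaling step of Example ONTV is easy to reproduce. The sketch below, using our own helper names `norm` and `normalize` and floating-point arithmetic, computes the three norms and confirms that each scaled vector has norm 1 up to rounding error.

```python
import math

def norm(u):
    """Square root of the sum of squared moduli (Definition NV)."""
    return math.sqrt(sum(abs(z) ** 2 for z in u))

def normalize(u):
    """Multiply u by the reciprocal of its norm; assumes u is nonzero."""
    n = norm(u)
    return [z / n for z in u]

# The orthogonal set T from Example GSTV.
u1 = [1 + 0j, 1 + 1j, 1 + 0j]
u2 = [z / 4 for z in (-2 - 3j, 1 - 1j, 2 + 5j)]
u3 = [z / 11 for z in (-3 - 1j, 1 + 3j, -1 - 1j)]

orthonormal = [normalize(u) for u in (u1, u2, u3)]

for u, w in zip((u1, u2, u3), orthonormal):
    # Original norm, then the norm after scaling.
    print(norm(u), norm(w))
```

The first column of output should match $2$, $\tfrac{1}{2}\sqrt{11}$ and $\sqrt{2}/\sqrt{11}$ to within rounding, and the second column should be 1 to within rounding.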

Example ONFV
Orthonormal set, four vectors
As an exercise convert the linearly independent set

$$S = \left\{ \begin{bmatrix} 1+i \\ 1 \\ 1-i \\ i \end{bmatrix},\ \begin{bmatrix} i \\ 1+i \\ -1 \\ -i \end{bmatrix},\ \begin{bmatrix} i \\ -i \\ -1+i \\ 1 \end{bmatrix},\ \begin{bmatrix} -1-i \\ i \\ 1 \\ -1 \end{bmatrix} \right\}$$

to an orthogonal set via the Gram-Schmidt Process (Theorem GSP) and then scale the vectors to norm 1 to create an orthonormal set. You should get the same set you would if you scaled the orthogonal set of Example AOS to become an orthonormal set.

It is crazy to do all but the simplest and smallest instances of the Gram-Schmidt procedure by hand. Well, OK, maybe just once or twice to get a good understanding of Theorem GSP. After that, let a machine do the work for you. That’s what they are for. See: Computation GSP.MMA.

We will see orthonormal sets again in Subsection MINM.UM. They are intimately related to unitary matrices (Definition UM) through Theorem CUMOS. Some of the utility of orthonormal sets is captured by Theorem COB in Subsection B.OBC. Orthonormal sets appear once again in Section OD where they are key in orthonormal diagonalization.

Subsection READ: Reading Questions

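  1. Is the set
    $$\left\{ \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix},\ \begin{bmatrix} 5 \\ 3 \\ -1 \end{bmatrix},\ \begin{bmatrix} 8 \\ 4 \\ -2 \end{bmatrix} \right\}$$

    an orthogonal set? Why?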

  2. What is the distinction between an orthogonal set and an orthonormal set?
  3. What is nice about the output of the Gram-Schmidt process?

Subsection EXC: Exercises

C20 Complete Example AOS by verifying that the four remaining inner products are zero.

 
Contributed by Robert Beezer

C21 Verify that the set T created in Example GSTV by the Gram-Schmidt Procedure is an orthogonal set.  
Contributed by Robert Beezer

T10 Prove part 1 of the conclusion of Theorem IPVA.  
Contributed by Robert Beezer

T11 Prove part 1 of the conclusion of Theorem IPSM.  
Contributed by Robert Beezer