Section O  Orthogonality

From A First Course in Linear Algebra
Version 2.00
© 2004.
Licensed under the GNU Free Documentation License.

In this section we define a couple more operations with vectors, and prove a few theorems. At first blush these definitions and results will not appear central to what follows, but we will make use of them at key points in the remainder of the course (such as Section MINM, Section OD). Because we have chosen to use $\mathbb{C}$ as our set of scalars, this subsection is a bit more, uh, … complex than it would be for the real numbers. We’ll explain as we go along how things get easier for the real numbers $\mathbb{R}$. If you haven’t already, now would be a good time to review some of the basic properties of arithmetic with complex numbers described in Section CNO. With that done, we can extend the basics of complex number arithmetic to our study of vectors in $\mathbb{C}^m$.

Subsection CAV: Complex Arithmetic and Vectors

We know how the addition and multiplication of complex numbers is employed in defining the operations for vectors in $\mathbb{C}^m$ (Definition CVA and Definition CVSM). We can also extend the idea of the conjugate to vectors.

Definition CCCV
Complex Conjugate of a Column Vector
Suppose that $u$ is a vector from $\mathbb{C}^m$. Then the conjugate of the vector, $\overline{u}$, is defined by

$$[\overline{u}]_i = \overline{[u]_i} \qquad 1 \le i \le m$$

(This definition contains Notation CCCV.)

With this definition we can show that the conjugate of a column vector behaves as we would expect with regard to vector addition and scalar multiplication.

Theorem CRVA
Conjugation Respects Vector Addition
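Suppose $x$ and $y$ are two vectors from $\mathbb{C}^m$. Then

$$\overline{x + y} = \overline{x} + \overline{y}$$

Proof   For each $1 \le i \le m$,

$$
\begin{aligned}
[\overline{x + y}]_i &= \overline{[x + y]_i} && \text{Definition CCCV}\\
&= \overline{[x]_i + [y]_i} && \text{Definition CVA}\\
&= \overline{[x]_i} + \overline{[y]_i} && \text{Theorem CCRA}\\
&= [\overline{x}]_i + [\overline{y}]_i && \text{Definition CCCV}\\
&= [\overline{x} + \overline{y}]_i && \text{Definition CVA}
\end{aligned}
$$

Then by Definition CVE we have $\overline{x + y} = \overline{x} + \overline{y}$.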

Theorem CRSM
Conjugation Respects Vector Scalar Multiplication
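Suppose $x$ is a vector from $\mathbb{C}^m$, and $\alpha \in \mathbb{C}$ is a scalar. Then

$$\overline{\alpha x} = \overline{\alpha}\,\overline{x}$$

Proof   For $1 \le i \le m$,

$$
\begin{aligned}
[\overline{\alpha x}]_i &= \overline{[\alpha x]_i} && \text{Definition CCCV}\\
&= \overline{\alpha [x]_i} && \text{Definition CVSM}\\
&= \overline{\alpha}\,\overline{[x]_i} && \text{Theorem CCRM}\\
&= \overline{\alpha}\,[\overline{x}]_i && \text{Definition CCCV}\\
&= [\overline{\alpha}\,\overline{x}]_i && \text{Definition CVSM}
\end{aligned}
$$

Then by Definition CVE we have $\overline{\alpha x} = \overline{\alpha}\,\overline{x}$.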

These two theorems together tell us how we can “push” complex conjugation through linear combinations.
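As an illustration only, Theorems CRVA and CRSM can be spot-checked numerically. The sketch below assumes that vectors of $\mathbb{C}^m$ are modeled as Python lists of built-in `complex` values (Python writes $i$ as `j`). The helper names `conj`, `add` and `smul` are hypothetical and were chosen for this example. Every entry has small integer real and imaginary parts, so floating-point arithmetic is exact here and `==` is a fair comparison.

```python
# Vectors in C^m are modeled as plain lists of Python complex numbers
# (an illustrative choice; Python writes the imaginary unit i as j).

def conj(u):
    """Definition CCCV: conjugate each entry of the vector u."""
    return [z.conjugate() for z in u]

def add(x, y):
    """Definition CVA: entry-by-entry vector addition."""
    return [a + b for a, b in zip(x, y)]

def smul(alpha, x):
    """Definition CVSM: multiply each entry of x by the scalar alpha."""
    return [alpha * a for a in x]

x = [2 + 3j, 5 + 2j, -3 + 1j]
y = [1 + 2j, -4 + 5j, 5j]
alpha = 2 - 1j

# Theorem CRVA: the conjugate of a sum is the sum of the conjugates.
print(conj(add(x, y)) == add(conj(x), conj(y)))                  # True
# Theorem CRSM: the conjugate of alpha*x is conj(alpha) * conj(x).
print(conj(smul(alpha, x)) == smul(alpha.conjugate(), conj(x)))  # True
```

A check like this is evidence on one example, not a substitute for the proofs above.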

Subsection IP: Inner products

Definition IP
Inner Product
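Given the vectors $u, v \in \mathbb{C}^m$, the inner product of $u$ and $v$ is the scalar quantity in $\mathbb{C}$,

$$\langle u, v\rangle = [u]_1\overline{[v]_1} + [u]_2\overline{[v]_2} + [u]_3\overline{[v]_3} + \cdots + [u]_m\overline{[v]_m} = \sum_{i=1}^{m} [u]_i\,\overline{[v]_i}$$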

(This definition contains Notation IP.)

This operation is a bit different in that we begin with two vectors but produce a scalar. Computing one is straightforward.

Example CSIP
Computing some inner products
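The inner product of

$$u = \begin{bmatrix} 2+3i \\ 5+2i \\ -3+i \end{bmatrix} \quad\text{and}\quad v = \begin{bmatrix} 1+2i \\ -4+5i \\ 0+5i \end{bmatrix}$$

is

$$
\begin{aligned}
\langle u, v\rangle &= (2+3i)(\overline{1+2i}) + (5+2i)(\overline{-4+5i}) + (-3+i)(\overline{0+5i})\\
&= (2+3i)(1-2i) + (5+2i)(-4-5i) + (-3+i)(0-5i)\\
&= (8-i) + (-10-33i) + (5+15i)\\
&= 3-19i
\end{aligned}
$$

The inner product of

$$w = \begin{bmatrix} 2 \\ 4 \\ -3 \\ 2 \\ 8 \end{bmatrix} \quad\text{and}\quad x = \begin{bmatrix} 3 \\ 1 \\ 0 \\ -1 \\ -2 \end{bmatrix}$$

is

$$\langle w, x\rangle = 2(\overline{3}) + 4(\overline{1}) + (-3)(\overline{0}) + 2(\overline{-1}) + 8(\overline{-2}) = 2(3) + 4(1) + (-3)(0) + 2(-1) + 8(-2) = -8.$$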
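Here is a minimal sketch of Definition IP that reproduces both computations of Example CSIP. It assumes vectors are Python lists of numbers, and the function name `inner` is hypothetical. Following this text's convention, the conjugate is applied to the entries of the second argument.

```python
def inner(u, v):
    """Definition IP: the sum of [u]_i times the conjugate of [v]_i.

    The conjugate is applied to the SECOND vector, as in this text.
    Works for lists of int, float or complex entries.
    """
    return sum(a * b.conjugate() for a, b in zip(u, v))

# First part of Example CSIP (complex entries; Python writes i as j).
u = [2 + 3j, 5 + 2j, -3 + 1j]
v = [1 + 2j, -4 + 5j, 0 + 5j]
print(inner(u, v))  # (3-19j)

# Second part of Example CSIP (real entries: an ordinary dot product).
w = [2, 4, -3, 2, 8]
x = [3, 1, 0, -1, -2]
print(inner(w, x))  # -8
```

For real entries `b.conjugate()` returns `b` unchanged, so the same function also computes the familiar dot product.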

In the case where the entries of our vectors are all real numbers (as in the second part of Example CSIP), the computation of the inner product may look familiar and be known to you as a dot product or scalar product. So you can view the inner product as a generalization of the scalar product to vectors from $\mathbb{C}^m$ (rather than $\mathbb{R}^m$).

Also, note that we have chosen to conjugate the entries of the second vector listed in the inner product, while many authors choose to conjugate entries from the first component. It really makes no difference which choice is made; it just requires that subsequent definitions and theorems are consistent with the choice. You can study the conclusion of Theorem IPAC as an explanation of the magnitude of the difference that results from this choice. But be careful as you read other treatments of the inner product or its use in applications, and be sure you know ahead of time which choice has been made.
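To see concretely what the choice of convention changes, the sketch below computes the inner product of Example CSIP both ways. The function names `inner_second` and `inner_first` are hypothetical. Given the same pair of vectors, the two conventions return complex conjugates of each other. Swapping the arguments turns one convention into the other, as Theorem IPAC predicts.

```python
def inner_second(u, v):
    """This text's convention: conjugate the entries of the second vector."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def inner_first(u, v):
    """Convention of many other authors: conjugate the first vector."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

u = [2 + 3j, 5 + 2j, -3 + 1j]
v = [1 + 2j, -4 + 5j, 0 + 5j]

print(inner_second(u, v))  # (3-19j)
print(inner_first(u, v))   # (3+19j), the complex conjugate
# Swapping the arguments converts one convention into the other.
print(inner_first(u, v) == inner_second(v, u))  # True
```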

There are several quick theorems we can now prove, and they will each be useful later.

Theorem IPVA
Inner Product and Vector Addition
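Suppose $u, v, w \in \mathbb{C}^m$. Then

  1. $\langle u + v, w\rangle = \langle u, w\rangle + \langle v, w\rangle$
  2. $\langle u, v + w\rangle = \langle u, v\rangle + \langle u, w\rangle$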

Proof   The proofs of the two parts are very similar, with the second one requiring just a bit more effort due to the conjugation that occurs. We will prove part 2 and you can prove part 1 (Exercise O.T10).

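$$
\begin{aligned}
\langle u, v + w\rangle &= \sum_{i=1}^{m} [u]_i\,\overline{[v + w]_i} && \text{Definition IP}\\
&= \sum_{i=1}^{m} [u]_i\left(\overline{[v]_i + [w]_i}\right) && \text{Definition CVA}\\
&= \sum_{i=1}^{m} [u]_i\left(\overline{[v]_i} + \overline{[w]_i}\right) && \text{Theorem CCRA}\\
&= \sum_{i=1}^{m} \left([u]_i\,\overline{[v]_i} + [u]_i\,\overline{[w]_i}\right) && \text{Property DCN}\\
&= \sum_{i=1}^{m} [u]_i\,\overline{[v]_i} + \sum_{i=1}^{m} [u]_i\,\overline{[w]_i} && \text{Property CACN}\\
&= \langle u, v\rangle + \langle u, w\rangle && \text{Definition IP}
\end{aligned}
$$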

Theorem IPSM
Inner Product and Scalar Multiplication
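Suppose $u, v \in \mathbb{C}^m$ and $\alpha \in \mathbb{C}$. Then

  1. $\langle \alpha u, v\rangle = \alpha\langle u, v\rangle$
  2. $\langle u, \alpha v\rangle = \overline{\alpha}\langle u, v\rangle$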

Proof   The proofs of the two parts are very similar, with the second one requiring just a bit more effort due to the conjugation that occurs. We will prove part 2 and you can prove part 1 (Exercise O.T11).

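$$
\begin{aligned}
\langle u, \alpha v\rangle &= \sum_{i=1}^{m} [u]_i\,\overline{[\alpha v]_i} && \text{Definition IP}\\
&= \sum_{i=1}^{m} [u]_i\,\overline{\alpha [v]_i} && \text{Definition CVSM}\\
&= \sum_{i=1}^{m} [u]_i\,\overline{\alpha}\,\overline{[v]_i} && \text{Theorem CCRM}\\
&= \sum_{i=1}^{m} \overline{\alpha}\,[u]_i\,\overline{[v]_i} && \text{Property CMCN}\\
&= \overline{\alpha}\sum_{i=1}^{m} [u]_i\,\overline{[v]_i} && \text{Property DCN}\\
&= \overline{\alpha}\,\langle u, v\rangle && \text{Definition IP}
\end{aligned}
$$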
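Theorems IPVA and IPSM can be spot-checked numerically. This sketch uses a hypothetical `inner` helper (Definition IP, conjugating the second argument) together with hypothetical `add` and `smul` helpers. The vectors and the scalar are arbitrary illustrative choices with small integer parts, so the floating-point comparisons are exact. A check like this is evidence, not a proof.

```python
def inner(u, v):
    """Definition IP, conjugating the entries of the second vector."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def add(x, y):
    """Entry-by-entry vector addition."""
    return [a + b for a, b in zip(x, y)]

def smul(alpha, x):
    """Scalar multiplication of a vector."""
    return [alpha * a for a in x]

u = [2 + 3j, 5 + 2j, -3 + 1j]
v = [1 + 2j, -4 + 5j, 5j]
w = [1 - 1j, 2j, 4 + 0j]
alpha = 3 - 2j

# Theorem IPVA, parts 1 and 2.
print(inner(add(u, v), w) == inner(u, w) + inner(v, w))   # True
print(inner(u, add(v, w)) == inner(u, v) + inner(u, w))   # True
# Theorem IPSM: alpha comes out of the first slot unchanged ...
print(inner(smul(alpha, u), v) == alpha * inner(u, v))    # True
# ... but comes out of the second slot conjugated.
print(inner(u, smul(alpha, v)) == alpha.conjugate() * inner(u, v))  # True
print(inner(u, smul(alpha, v)) == alpha * inner(u, v))    # False
```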

Theorem IPAC
Inner Product is Anti-Commutative
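Suppose that $u$ and $v$ are vectors in $\mathbb{C}^m$. Then $\langle u, v\rangle = \overline{\langle v, u\rangle}$.

Proof

$$
\begin{aligned}
\langle u, v\rangle &= \sum_{i=1}^{m} [u]_i\,\overline{[v]_i} && \text{Definition IP}\\
&= \sum_{i=1}^{m} \overline{\overline{[u]_i}}\,\overline{[v]_i} && \text{Theorem CCT}\\
&= \sum_{i=1}^{m} \overline{\overline{[u]_i}\,[v]_i} && \text{Theorem CCRM}\\
&= \overline{\sum_{i=1}^{m} \overline{[u]_i}\,[v]_i} && \text{Theorem CCRA}\\
&= \overline{\sum_{i=1}^{m} [v]_i\,\overline{[u]_i}} && \text{Property CMCN}\\
&= \overline{\langle v, u\rangle} && \text{Definition IP}
\end{aligned}
$$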

Subsection N: Norm

If treating linear algebra in a more geometric fashion, the length of a vector occurs naturally, and is what you would expect from its name. With complex numbers, we will define a similar function. Recall that if $c$ is a complex number, then $|c|$ denotes its modulus (Definition MCN).

Definition NV
Norm of a Vector
The norm of the vector $u$ is the scalar quantity in $\mathbb{C}$

$$\|u\| = \sqrt{|[u]_1|^2 + |[u]_2|^2 + |[u]_3|^2 + \cdots + |[u]_m|^2} = \sqrt{\sum_{i=1}^{m} |[u]_i|^2}$$

(This definition contains Notation NV.)

Computing a norm is also easy to do.

Example CNSV
Computing the norm of some vectors
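The norm of

$$u = \begin{bmatrix} 3+2i \\ 1-6i \\ 2+4i \\ 2+i \end{bmatrix}$$

is

$$\|u\| = \sqrt{|3+2i|^2 + |1-6i|^2 + |2+4i|^2 + |2+i|^2} = \sqrt{13 + 37 + 20 + 5} = \sqrt{75} = 5\sqrt{3}.$$

The norm of

$$v = \begin{bmatrix} 3 \\ -1 \\ 2 \\ 4 \\ -3 \end{bmatrix}$$

is

$$\|v\| = \sqrt{|3|^2 + |-1|^2 + |2|^2 + |4|^2 + |-3|^2} = \sqrt{3^2 + 1^2 + 2^2 + 4^2 + 3^2} = \sqrt{39}.$$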
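Here is a small sketch of Definition NV that reproduces Example CNSV. It assumes vectors are Python lists, and `norm` is a hypothetical helper name. The squared modulus of each entry is computed as `z.real**2 + z.imag**2`, which stays exact for these integer entries. Only the final square root introduces rounding, so the results are compared approximately.

```python
import math

def norm(u):
    """Definition NV: square root of the sum of the squared moduli."""
    # |z|^2 = (real part)^2 + (imaginary part)^2 for each entry z
    return math.sqrt(sum(z.real ** 2 + z.imag ** 2 for z in u))

u = [3 + 2j, 1 - 6j, 2 + 4j, 2 + 1j]
v = [3, -1, 2, 4, -3]

print(norm(u), 5 * math.sqrt(3))  # both approximately 8.66025
print(norm(v), math.sqrt(39))     # both approximately 6.24500
```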

Notice how the norm of a vector with real number entries is just the length of the vector. Inner products and norms are related by the following theorem.

Theorem IPN
Inner Products and Norms
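Suppose that $u$ is a vector in $\mathbb{C}^m$. Then $\|u\|^2 = \langle u, u\rangle$.

Proof

$$
\begin{aligned}
\|u\|^2 &= \left(\sqrt{\sum_{i=1}^{m} |[u]_i|^2}\right)^2 && \text{Definition NV}\\
&= \sum_{i=1}^{m} |[u]_i|^2\\
&= \sum_{i=1}^{m} [u]_i\,\overline{[u]_i} && \text{Definition MCN}\\
&= \langle u, u\rangle && \text{Definition IP}
\end{aligned}
$$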

When our vectors have entries only from the real numbers, Theorem IPN says that the dot product of a vector with itself is equal to the length of the vector squared.
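Theorem IPN can also be checked on the vector $u$ of Example CNSV. This sketch uses hypothetical helper names `inner` and `norm`. Python returns $\langle u, u\rangle$ as a complex number whose imaginary part is zero, and it agrees with $\|u\|^2$ up to floating-point rounding.

```python
import math

def inner(u, v):
    """Definition IP, conjugating the entries of the second vector."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def norm(u):
    """Definition NV."""
    return math.sqrt(sum(z.real ** 2 + z.imag ** 2 for z in u))

u = [3 + 2j, 1 - 6j, 2 + 4j, 2 + 1j]
uu = inner(u, u)
print(uu)                                   # (75+0j)
print(math.isclose(norm(u) ** 2, uu.real))  # True
```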

Theorem PIP
Positive Inner Products
Suppose that $u$ is a vector in $\mathbb{C}^m$. Then $\langle u, u\rangle \ge 0$ with equality if and only if $u = 0$.

Proof   From the proof of Theorem IPN we see that

$$\langle u, u\rangle = |[u]_1|^2 + |[u]_2|^2 + |[u]_3|^2 + \cdots + |[u]_m|^2$$

Each modulus is a real number, and it is squared, so every term is nonnegative and the sum must also be nonnegative. (Notice that in general the inner product is a complex number and cannot be compared with zero, but in the special case of $\langle u, u\rangle$ the result is a real number.) The phrase “with equality if and only if” means that we want to show that the statement $\langle u, u\rangle = 0$ (i.e. with equality) is equivalent (“if and only if”) to the statement $u = 0$.

If $u = 0$, then it is a straightforward computation to see that $\langle u, u\rangle = 0$. In the other direction, assume that $\langle u, u\rangle = 0$. As before, $\langle u, u\rangle$ is a sum of squared moduli. So we have

$$0 = \langle u, u\rangle = |[u]_1|^2 + |[u]_2|^2 + |[u]_3|^2 + \cdots + |[u]_m|^2$$

Now we have a sum of squares of real numbers equaling zero, so each term must be zero. Then $|[u]_i| = 0$ implies that $[u]_i = 0$, since $0 + 0i$ is the only complex number with zero modulus. Thus every entry of $u$ is zero and so $u = 0$, as desired.

Notice that Theorem PIP contains three implications:

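$$
\begin{aligned}
&u \in \mathbb{C}^m \;\Rightarrow\; \langle u, u\rangle \ge 0\\
&u = 0 \;\Rightarrow\; \langle u, u\rangle = 0\\
&\langle u, u\rangle = 0 \;\Rightarrow\; u = 0
\end{aligned}
$$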

The results contained in Theorem PIP are summarized by saying “the inner product is positive definite.”

Subsection OV: Orthogonal Vectors

“Orthogonal” is a generalization of “perpendicular.” You may have used mutually perpendicular vectors in a physics class, or you may recall from a calculus class that perpendicular vectors have a zero dot product. We will now extend these ideas into the realm of higher dimensions and complex scalars.

Definition OV
Orthogonal Vectors
A pair of vectors, $u$ and $v$, from $\mathbb{C}^m$ are orthogonal if their inner product is zero, that is, $\langle u, v\rangle = 0$.

Example TOV
Two orthogonal vectors
The vectors

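$$u = \begin{bmatrix} 2+3i \\ 4-2i \\ 1+i \\ 1+i \end{bmatrix} \qquad v = \begin{bmatrix} 1-i \\ 2+3i \\ 4-6i \\ 1 \end{bmatrix}$$

are orthogonal since

$$
\begin{aligned}
\langle u, v\rangle &= (2+3i)(1+i) + (4-2i)(2-3i) + (1+i)(4+6i) + (1+i)(1)\\
&= (-1+5i) + (2-16i) + (-2+10i) + (1+i)\\
&= 0+0i.
\end{aligned}
$$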
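Example TOV can be confirmed with the same kind of sketch. Here `is_orthogonal` is a hypothetical helper implementing Definition OV. Comparing exactly with zero is reasonable only because all entries have small integer parts. With general floating-point data, one would instead test whether the inner product lies within a small tolerance of zero.

```python
def inner(u, v):
    """Definition IP, conjugating the entries of the second vector."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def is_orthogonal(u, v):
    """Definition OV: u and v are orthogonal when <u, v> = 0."""
    return inner(u, v) == 0

u = [2 + 3j, 4 - 2j, 1 + 1j, 1 + 1j]
v = [1 - 1j, 2 + 3j, 4 - 6j, 1]
print(is_orthogonal(u, v))  # True
```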

We extend this definition to whole sets by requiring vectors to be pairwise orthogonal. Despite using the same word, careful thought about what objects you are using will eliminate any source of confusion.

Definition OSV
Orthogonal Set of Vectors
Suppose that $S = \{u_1, u_2, u_3, \ldots, u_n\}$ is a set of vectors from $\mathbb{C}^m$. Then $S$ is an orthogonal set if every pair of different vectors from $S$ is orthogonal, that is, $\langle u_i, u_j\rangle = 0$ whenever $i \ne j$.

We now define the prototypical orthogonal set, which we will reference repeatedly.

Definition SUV
Standard Unit Vectors
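Let $e_j \in \mathbb{C}^m$, $1 \le j \le m$ denote the column vectors defined by

$$[e_j]_i = \begin{cases} 0 & \text{if } i \ne j\\ 1 & \text{if } i = j \end{cases}$$

Then the set

$$\{e_1, e_2, e_3, \ldots, e_m\} = \{\, e_j \mid 1 \le j \le m \,\}$$

is the set of standard unit vectors in $\mathbb{C}^m$.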

(This definition contains Notation SUV.)

Notice that $e_j$ is identical to column $j$ of the $m \times m$ identity matrix $I_m$ (Definition IM). This observation will often be useful. It is not hard to see that the set of standard unit vectors is an orthogonal set. We will reserve the notation $e_i$ for these vectors.

Example SUVOS
Standard Unit Vectors are an Orthogonal Set
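Compute the inner product of two distinct vectors from the set of standard unit vectors (Definition SUV), say $e_i$, $e_j$, where $i \ne j$,

$$
\begin{aligned}
\langle e_i, e_j\rangle &= 0\overline{0} + 0\overline{0} + \cdots + 1\overline{0} + \cdots + 0\overline{0} + \cdots + 0\overline{1} + \cdots + 0\overline{0} + 0\overline{0}\\
&= 0(0) + 0(0) + \cdots + 1(0) + \cdots + 0(0) + \cdots + 0(1) + \cdots + 0(0) + 0(0)\\
&= 0
\end{aligned}
$$

So the set $\{e_1, e_2, e_3, \ldots, e_m\}$ is an orthogonal set.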
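The sketch below builds the standard unit vectors for one illustrative size, $m = 5$ (an arbitrary choice). It checks both observations above: distinct pairs have inner product zero, and $e_j$ matches column $j$ of the identity matrix. The helper `e` is a hypothetical name. Python's list positions start at 0, so they are shifted to let the subscripts run from 1 as in Definition SUV.

```python
def inner(u, v):
    """Definition IP, conjugating the entries of the second vector."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def e(j, m):
    """Definition SUV: [e_j]_i = 1 if i == j, else 0 (indices start at 1)."""
    return [1 if i == j else 0 for i in range(1, m + 1)]

m = 5
units = [e(j, m) for j in range(1, m + 1)]

# Example SUVOS: every pair of different standard unit vectors is orthogonal.
print(all(inner(units[a], units[b]) == 0
          for a in range(m) for b in range(m) if a != b))  # True

# e_j is column j of the identity matrix I_m.
identity = [[1 if r == c else 0 for c in range(m)] for r in range(m)]
print(all(units[j] == [identity[r][j] for r in range(m)]
          for j in range(m)))  # True
```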

Example AOS
An orthogonal set
The set

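$$\{x_1, x_2, x_3, x_4\} = \left\{ \begin{bmatrix} 1+i \\ 1 \\ 1-i \\ i \end{bmatrix}, \begin{bmatrix} 1+5i \\ 6+5i \\ -7-i \\ 1-6i \end{bmatrix}, \begin{bmatrix} -7+34i \\ -8-23i \\ -10+22i \\ 30+13i \end{bmatrix}, \begin{bmatrix} -2-4i \\ 6+i \\ 4+3i \\ 6-i \end{bmatrix} \right\}$$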

is an orthogonal set. Since the inner product is anti-commutative (Theorem IPAC) we can test pairs of different vectors in any order. If the result is zero, then it will also be zero if the inner product is computed in the opposite order. This means there are six pairs of different vectors to use in an inner product computation. We’ll do two and you can practice your inner products on the other four.

$$
\begin{aligned}
\langle x_1, x_3\rangle &= (1+i)(-7-34i) + (1)(-8+23i) + (1-i)(-10-22i) + (i)(30-13i)\\
&= (27-41i) + (-8+23i) + (-32-12i) + (13+30i)\\
&= 0+0i
\end{aligned}
$$

and

$$
\begin{aligned}
\langle x_2, x_4\rangle &= (1+5i)(-2+4i) + (6+5i)(6-i) + (-7-i)(4-3i) + (1-6i)(6+i)\\
&= (-22-6i) + (41+24i) + (-31+17i) + (12-35i)\\
&= 0+0i
\end{aligned}
$$
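Exercise O.C20 asks for the four remaining inner products by hand. As a machine cross-check, this sketch (with a hypothetical `inner` helper) evaluates all six pairs of different vectors from Example AOS. One order per pair is enough: by Theorem IPAC the reversed order gives the conjugate, which is zero exactly when the original is.

```python
from itertools import combinations

def inner(u, v):
    """Definition IP, conjugating the entries of the second vector."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

vectors = {
    "x1": [1 + 1j, 1, 1 - 1j, 1j],
    "x2": [1 + 5j, 6 + 5j, -7 - 1j, 1 - 6j],
    "x3": [-7 + 34j, -8 - 23j, -10 + 22j, 30 + 13j],
    "x4": [-2 - 4j, 6 + 1j, 4 + 3j, 6 - 1j],
}

# combinations() yields each unordered pair of different vectors once.
for (name_a, a), (name_b, b) in combinations(vectors.items(), 2):
    print(name_a, name_b, inner(a, b) == 0)  # True for all six pairs
```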

So far, this section has seen lots of definitions, and lots of theorems establishing unsurprising consequences of those definitions. But here is our first theorem that suggests that inner products and orthogonal vectors have some utility. It is also one of our first illustrations of how to arrive at linear independence as the conclusion of a theorem.

Theorem OSLI
Orthogonal Sets are Linearly Independent
Suppose that S is an orthogonal set of nonzero vectors. Then S is linearly independent.

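Proof   Let $S = \{u_1, u_2, u_3, \ldots, u_n\}$ be an orthogonal set of nonzero vectors. To prove the linear independence of $S$, we can appeal to the definition (Definition LICV) and begin with an arbitrary relation of linear dependence (Definition RLDCV),

$$\alpha_1 u_1 + \alpha_2 u_2 + \alpha_3 u_3 + \cdots + \alpha_n u_n = 0.$$

Then, for every $1 \le i \le n$, we have

$$
\begin{aligned}
\alpha_i &= \frac{1}{\langle u_i, u_i\rangle}\,\alpha_i\langle u_i, u_i\rangle && \text{Theorem PIP}\\
&= \frac{1}{\langle u_i, u_i\rangle}\left(\alpha_1(0) + \alpha_2(0) + \cdots + \alpha_i\langle u_i, u_i\rangle + \cdots + \alpha_n(0)\right) && \text{Property ZCN}\\
&= \frac{1}{\langle u_i, u_i\rangle}\left(\alpha_1\langle u_1, u_i\rangle + \cdots + \alpha_i\langle u_i, u_i\rangle + \cdots + \alpha_n\langle u_n, u_i\rangle\right) && \text{Definition OSV}\\
&= \frac{1}{\langle u_i, u_i\rangle}\left(\langle \alpha_1 u_1, u_i\rangle + \langle \alpha_2 u_2, u_i\rangle + \cdots + \langle \alpha_n u_n, u_i\rangle\right) && \text{Theorem IPSM}\\
&= \frac{1}{\langle u_i, u_i\rangle}\left\langle \alpha_1 u_1 + \alpha_2 u_2 + \alpha_3 u_3 + \cdots + \alpha_n u_n,\; u_i\right\rangle && \text{Theorem IPVA}\\
&= \frac{1}{\langle u_i, u_i\rangle}\langle 0, u_i\rangle && \text{Definition RLDCV}\\
&= \frac{1}{\langle u_i, u_i\rangle}\,0 && \text{Definition IP}\\
&= 0 && \text{Property ZCN}
\end{aligned}
$$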

So we conclude that αi = 0 for all 1 i n in any relation of linear dependence on S. But this says that S is a linearly independent set since the only way to form a relation of linear dependence is the trivial way (Definition LICV). Boom!
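The chain of equalities in this proof also suggests a practical recipe. For an orthogonal set of nonzero vectors, the scalar on $u_i$ in a linear combination $x$ is $\langle x, u_i\rangle / \langle u_i, u_i\rangle$. The sketch below applies that recipe to the orthogonal set of Example AOS, using hypothetical helpers `inner` and `combo` and arbitrarily chosen scalars. Taking $x = 0$ would force every recovered scalar to be zero, which is exactly the argument above.

```python
def inner(u, v):
    """Definition IP, conjugating the entries of the second vector."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def combo(scalars, vectors):
    """Linear combination scalars[0]*vectors[0] + scalars[1]*vectors[1] + ..."""
    return [sum(s * vec[i] for s, vec in zip(scalars, vectors))
            for i in range(len(vectors[0]))]

# The orthogonal set of nonzero vectors from Example AOS.
S = [[1 + 1j, 1, 1 - 1j, 1j],
     [1 + 5j, 6 + 5j, -7 - 1j, 1 - 6j],
     [-7 + 34j, -8 - 23j, -10 + 22j, 30 + 13j],
     [-2 - 4j, 6 + 1j, 4 + 3j, 6 - 1j]]

alphas = [2, -1j, 3 + 1j, 0]   # arbitrary scalars for illustration
x = combo(alphas, S)

# As in the proof of Theorem OSLI, <x, u_i> / <u_i, u_i> isolates alpha_i.
recovered = [inner(x, u_i) / inner(u_i, u_i) for u_i in S]
print(recovered)  # matches alphas, up to floating-point representation
```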

Subsection GSP: Gram-Schmidt Procedure

The Gram-Schmidt Procedure is really a theorem. It says that if we begin with a linearly independent set of $p$ vectors, $S$, then we can do a number of calculations with these vectors and produce an orthogonal set of $p$ vectors, $T$, so that $\langle S\rangle = \langle T\rangle$, that is, $S$ and $T$ have the same span. Given the large number of computations involved, it is indeed a procedure to do all the necessary computations, and it is best employed on a computer. However, it also has value in proofs where we may on occasion wish to replace a linearly independent set by an orthogonal set.

This is our first occasion to use the technique of “mathematical induction” for a proof, a technique we will see again several times, especially in Chapter D. So study the simple example described in Technique I first. 

Theorem GSP
Gram-Schmidt Procedure
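Suppose that $S = \{v_1, v_2, v_3, \ldots, v_p\}$ is a linearly independent set of vectors in $\mathbb{C}^m$. Define the vectors $u_i$, $1 \le i \le p$ by

$$u_i = v_i - \frac{\langle v_i, u_1\rangle}{\langle u_1, u_1\rangle}u_1 - \frac{\langle v_i, u_2\rangle}{\langle u_2, u_2\rangle}u_2 - \frac{\langle v_i, u_3\rangle}{\langle u_3, u_3\rangle}u_3 - \cdots - \frac{\langle v_i, u_{i-1}\rangle}{\langle u_{i-1}, u_{i-1}\rangle}u_{i-1}$$

Then if $T = \{u_1, u_2, u_3, \ldots, u_p\}$, $T$ is an orthogonal set of nonzero vectors, and $\langle T\rangle = \langle S\rangle$.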

Proof   We will prove the result by using induction on $p$ (Technique I). To begin, we prove that $T$ has the desired properties when $p = 1$. In this case $u_1 = v_1$ and $T = \{u_1\} = \{v_1\} = S$. Because $S$ and $T$ are equal, $\langle S\rangle = \langle T\rangle$. Equally trivial, $T$ is an orthogonal set. If $u_1 = 0$, then $S$ would be a linearly dependent set, a contradiction.

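Suppose that the theorem is true for any set of $p - 1$ linearly independent vectors. Let $S = \{v_1, v_2, v_3, \ldots, v_p\}$ be a linearly independent set of $p$ vectors. Then $S' = \{v_1, v_2, v_3, \ldots, v_{p-1}\}$ is also linearly independent. So we can apply the theorem to $S'$ and construct the vectors $T' = \{u_1, u_2, u_3, \ldots, u_{p-1}\}$. $T'$ is therefore an orthogonal set of nonzero vectors and $\langle S'\rangle = \langle T'\rangle$. Define

$$u_p = v_p - \frac{\langle v_p, u_1\rangle}{\langle u_1, u_1\rangle}u_1 - \frac{\langle v_p, u_2\rangle}{\langle u_2, u_2\rangle}u_2 - \frac{\langle v_p, u_3\rangle}{\langle u_3, u_3\rangle}u_3 - \cdots - \frac{\langle v_p, u_{p-1}\rangle}{\langle u_{p-1}, u_{p-1}\rangle}u_{p-1}$$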

and let $T = T' \cup \{u_p\}$. We now need to show that $T$ has several properties by building on what we know about $T'$. But first notice that the above equation has no problems with the denominators ($\langle u_i, u_i\rangle$) being zero, since the $u_i$ are from $T'$, which is composed of nonzero vectors (Theorem PIP).

We show that $\langle T\rangle = \langle S\rangle$ by first establishing that $\langle T\rangle \subseteq \langle S\rangle$. Suppose $x \in \langle T\rangle$, so

$$x = a_1 u_1 + a_2 u_2 + a_3 u_3 + \cdots + a_p u_p$$

The term $a_p u_p$ is a linear combination of vectors from $T'$ and the vector $v_p$, while the remaining terms are a linear combination of vectors from $T'$. Since $\langle T'\rangle = \langle S'\rangle$, any term that is a multiple of a vector from $T'$ can be rewritten as a linear combination of vectors from $S'$. The remaining term $a_p v_p$ is a multiple of a vector in $S$. So we see that $x$ can be rewritten as a linear combination of vectors from $S$, i.e. $x \in \langle S\rangle$.

To show that $\langle S\rangle \subseteq \langle T\rangle$, begin with $y \in \langle S\rangle$, so

$$y = a_1 v_1 + a_2 v_2 + a_3 v_3 + \cdots + a_p v_p$$

Rearrange our defining equation for $u_p$ by solving for $v_p$. Then the term $a_p v_p$ is a multiple of a linear combination of elements of $T$. The remaining terms are a linear combination of $v_1, v_2, v_3, \ldots, v_{p-1}$, hence an element of $\langle S'\rangle = \langle T'\rangle$. Thus these remaining terms can be written as a linear combination of the vectors in $T'$. So $y$ is a linear combination of vectors from $T$, i.e. $y \in \langle T\rangle$.

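The elements of $T'$ are nonzero, but what about $u_p$? Suppose to the contrary that $u_p = 0$,

$$0 = u_p = v_p - \frac{\langle v_p, u_1\rangle}{\langle u_1, u_1\rangle}u_1 - \frac{\langle v_p, u_2\rangle}{\langle u_2, u_2\rangle}u_2 - \frac{\langle v_p, u_3\rangle}{\langle u_3, u_3\rangle}u_3 - \cdots - \frac{\langle v_p, u_{p-1}\rangle}{\langle u_{p-1}, u_{p-1}\rangle}u_{p-1}$$

so

$$v_p = \frac{\langle v_p, u_1\rangle}{\langle u_1, u_1\rangle}u_1 + \frac{\langle v_p, u_2\rangle}{\langle u_2, u_2\rangle}u_2 + \frac{\langle v_p, u_3\rangle}{\langle u_3, u_3\rangle}u_3 + \cdots + \frac{\langle v_p, u_{p-1}\rangle}{\langle u_{p-1}, u_{p-1}\rangle}u_{p-1}$$

Since $\langle S'\rangle = \langle T'\rangle$ we can write the vectors $u_1, u_2, u_3, \ldots, u_{p-1}$ on the right side of this equation in terms of the vectors $v_1, v_2, v_3, \ldots, v_{p-1}$, and we then have the vector $v_p$ expressed as a linear combination of the other $p - 1$ vectors in $S$. This implies that $S$ is a linearly dependent set (Theorem DLDS), contrary to our lone hypothesis about $S$.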

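Finally, it is a simple matter to establish that $T$ is an orthogonal set, though it will not appear so simple looking. Think about your objects as you work through the following: what is a vector and what is a scalar. Since $T'$ is an orthogonal set by induction, most pairs of elements in $T$ are already known to be orthogonal. We just need to test “new” inner products, between $u_p$ and $u_i$, for $1 \le i \le p - 1$. Here we go, using summation notation,

$$
\begin{aligned}
\langle u_p, u_i\rangle &= \left\langle v_p - \sum_{k=1}^{p-1}\frac{\langle v_p, u_k\rangle}{\langle u_k, u_k\rangle}u_k,\; u_i\right\rangle\\
&= \langle v_p, u_i\rangle - \left\langle \sum_{k=1}^{p-1}\frac{\langle v_p, u_k\rangle}{\langle u_k, u_k\rangle}u_k,\; u_i\right\rangle && \text{Theorem IPVA}\\
&= \langle v_p, u_i\rangle - \sum_{k=1}^{p-1}\left\langle \frac{\langle v_p, u_k\rangle}{\langle u_k, u_k\rangle}u_k,\; u_i\right\rangle && \text{Theorem IPVA}\\
&= \langle v_p, u_i\rangle - \sum_{k=1}^{p-1}\frac{\langle v_p, u_k\rangle}{\langle u_k, u_k\rangle}\langle u_k, u_i\rangle && \text{Theorem IPSM}\\
&= \langle v_p, u_i\rangle - \frac{\langle v_p, u_i\rangle}{\langle u_i, u_i\rangle}\langle u_i, u_i\rangle - \sum_{k \ne i}\frac{\langle v_p, u_k\rangle}{\langle u_k, u_k\rangle}(0) && \text{Induction Hypothesis}\\
&= \langle v_p, u_i\rangle - \langle v_p, u_i\rangle - \sum_{k \ne i} 0\\
&= 0
\end{aligned}
$$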

Example GSTV
Gram-Schmidt of three vectors
We will illustrate the Gram-Schmidt process with three vectors. Begin with the linearly independent (check this!) set

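$$S = \{v_1, v_2, v_3\} = \left\{ \begin{bmatrix} 1 \\ 1+i \\ 1 \end{bmatrix}, \begin{bmatrix} -i \\ 1 \\ 1+i \end{bmatrix}, \begin{bmatrix} 0 \\ i \\ i \end{bmatrix} \right\}$$

Then

$$
\begin{aligned}
u_1 &= v_1 = \begin{bmatrix} 1 \\ 1+i \\ 1 \end{bmatrix}\\
u_2 &= v_2 - \frac{\langle v_2, u_1\rangle}{\langle u_1, u_1\rangle}u_1 = \frac{1}{4}\begin{bmatrix} -2-3i \\ 1-i \\ 2+5i \end{bmatrix}\\
u_3 &= v_3 - \frac{\langle v_3, u_1\rangle}{\langle u_1, u_1\rangle}u_1 - \frac{\langle v_3, u_2\rangle}{\langle u_2, u_2\rangle}u_2 = \frac{1}{11}\begin{bmatrix} -3-i \\ 1+3i \\ -1-i \end{bmatrix}
\end{aligned}
$$

and

$$T = \{u_1, u_2, u_3\} = \left\{ \begin{bmatrix} 1 \\ 1+i \\ 1 \end{bmatrix}, \frac{1}{4}\begin{bmatrix} -2-3i \\ 1-i \\ 2+5i \end{bmatrix}, \frac{1}{11}\begin{bmatrix} -3-i \\ 1+3i \\ -1-i \end{bmatrix} \right\}$$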

is an orthogonal set (which you can check) of nonzero vectors and $\langle T\rangle = \langle S\rangle$ (all by Theorem GSP). Of course, as a by-product of orthogonality, the set $T$ is also linearly independent (Theorem OSLI).
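The following is a direct transcription of the formula in Theorem GSP. It is offered as a sketch under the assumption that the input list is linearly independent, so that no denominator $\langle u_k, u_k\rangle$ is zero. Because it uses floating-point complex arithmetic, its output matches Example GSTV only approximately. This is the "classical" form, which projects the original $v_i$ exactly as the theorem is written. Numerical software often prefers a reordered "modified" variant for better floating-point stability, and this sketch does not attempt that.

```python
def inner(u, v):
    """Definition IP, conjugating the entries of the second vector."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Theorem GSP: u_i = v_i - sum over k < i of (<v_i,u_k>/<u_k,u_k>) u_k.

    Assumes `vectors` is linearly independent. Floating-point, so approximate.
    """
    us = []
    for v in vectors:
        u = list(v)
        for w in us:
            coeff = inner(v, w) / inner(w, w)  # uses the original v_i
            u = [a - coeff * b for a, b in zip(u, w)]
        us.append(u)
    return us

# The linearly independent set S of Example GSTV.
S = [[1, 1 + 1j, 1], [-1j, 1, 1 + 1j], [0, 1j, 1j]]
u1, u2, u3 = gram_schmidt(S)
print(u2)  # approximately (1/4)[-2-3i, 1-i, 2+5i]
print(u3)  # approximately (1/11)[-3-i, 1+3i, -1-i]
print(abs(inner(u1, u2)), abs(inner(u1, u3)), abs(inner(u2, u3)))  # all near 0
```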

One final definition related to orthogonal vectors.

Definition ONS
OrthoNormal Set
Suppose $S = \{u_1, u_2, u_3, \ldots, u_n\}$ is an orthogonal set of vectors such that $\|u_i\| = 1$ for all $1 \le i \le n$. Then $S$ is an orthonormal set of vectors.

Once you have an orthogonal set of nonzero vectors, it is easy to convert it to an orthonormal set: multiply each vector by the reciprocal of its norm, and the resulting vector will have norm 1. This scaling of each vector will not affect the orthogonality properties (apply Theorem IPSM).

Example ONTV
Orthonormal set, three vectors
The set

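$$T = \{u_1, u_2, u_3\} = \left\{ \begin{bmatrix} 1 \\ 1+i \\ 1 \end{bmatrix}, \frac{1}{4}\begin{bmatrix} -2-3i \\ 1-i \\ 2+5i \end{bmatrix}, \frac{1}{11}\begin{bmatrix} -3-i \\ 1+3i \\ -1-i \end{bmatrix} \right\}$$

from Example GSTV is an orthogonal set. We compute the norm of each vector,

$$\|u_1\| = 2 \qquad \|u_2\| = \frac{1}{2}\sqrt{11} \qquad \|u_3\| = \frac{\sqrt{2}}{\sqrt{11}}$$

Converting each vector to a norm of 1 yields an orthonormal set,

$$
\begin{aligned}
w_1 &= \frac{1}{2}\begin{bmatrix} 1 \\ 1+i \\ 1 \end{bmatrix}\\
w_2 &= \frac{1}{\frac{1}{2}\sqrt{11}}\cdot\frac{1}{4}\begin{bmatrix} -2-3i \\ 1-i \\ 2+5i \end{bmatrix} = \frac{1}{2\sqrt{11}}\begin{bmatrix} -2-3i \\ 1-i \\ 2+5i \end{bmatrix}\\
w_3 &= \frac{1}{\frac{\sqrt{2}}{\sqrt{11}}}\cdot\frac{1}{11}\begin{bmatrix} -3-i \\ 1+3i \\ -1-i \end{bmatrix} = \frac{1}{\sqrt{22}}\begin{bmatrix} -3-i \\ 1+3i \\ -1-i \end{bmatrix}
\end{aligned}
$$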
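The rescaling described before this example takes one line of code per vector. The sketch below uses hypothetical helper names `norm` and `normalize`. It starts from the vectors of $T$ listed above, divides each by its norm, and checks that the results have norm 1 and stay orthogonal, as Theorem IPSM guarantees. It assumes every input vector is nonzero.

```python
import math

def inner(u, v):
    """Definition IP, conjugating the entries of the second vector."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def norm(u):
    """Theorem IPN: the norm is the square root of <u, u> (a real number)."""
    return math.sqrt(inner(u, u).real)

def normalize(u):
    """Multiply u by the reciprocal of its norm (assumes u is nonzero)."""
    n = norm(u)
    return [z / n for z in u]

T = [[1, 1 + 1j, 1],
     [(-2 - 3j) / 4, (1 - 1j) / 4, (2 + 5j) / 4],
     [(-3 - 1j) / 11, (1 + 3j) / 11, (-1 - 1j) / 11]]

print([norm(u) for u in T])   # approximately [2, sqrt(11)/2, sqrt(2)/sqrt(11)]
W = [normalize(u) for u in T]
print([norm(w) for w in W])   # each approximately 1.0
```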

Example ONFV
Orthonormal set, four vectors
As an exercise convert the linearly independent set

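$$S = \left\{ \begin{bmatrix} 1+i \\ 1 \\ 1-i \\ i \end{bmatrix}, \begin{bmatrix} i \\ 1+i \\ -1 \\ -i \end{bmatrix}, \begin{bmatrix} i \\ -i \\ -1+i \\ 1 \end{bmatrix}, \begin{bmatrix} -1-i \\ i \\ 1 \\ -1 \end{bmatrix} \right\}$$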

to an orthogonal set via the Gram-Schmidt Process (Theorem GSP) and then scale the vectors to norm 1 to create an orthonormal set. You should get the same set you would if you scaled the orthogonal set of Example AOS to become an orthonormal set.

It is crazy to do all but the simplest and smallest instances of the Gram-Schmidt procedure by hand. Well, OK, maybe just once or twice to get a good understanding of Theorem GSP. After that, let a machine do the work for you. That’s what they are for. See: Computation GSP.MMA.
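In that spirit, here is a sketch that lets Python do Example ONFV. It repeats three hypothetical helpers: `inner`, `gram_schmidt` (a transcription of Theorem GSP that assumes linearly independent input) and `normalize`. It runs them on the set $S$ of Example ONFV and compares the result with the normalized set of Example AOS. The comparisons use a tolerance because of floating-point rounding. This is an independent illustration, not the Computation GSP.MMA referenced above.

```python
import math

def inner(u, v):
    """Definition IP, conjugating the entries of the second vector."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Theorem GSP (classical form); assumes linearly independent input."""
    us = []
    for v in vectors:
        u = list(v)
        for w in us:
            coeff = inner(v, w) / inner(w, w)
            u = [a - coeff * b for a, b in zip(u, w)]
        us.append(u)
    return us

def normalize(u):
    """Scale a nonzero vector to norm 1 (Theorem IPN gives the norm)."""
    n = math.sqrt(inner(u, u).real)
    return [z / n for z in u]

# The linearly independent set S of Example ONFV.
S = [[1 + 1j, 1, 1 - 1j, 1j],
     [1j, 1 + 1j, -1, -1j],
     [1j, -1j, -1 + 1j, 1],
     [-1 - 1j, 1j, 1, -1]]
# The orthogonal set of Example AOS.
AOS = [[1 + 1j, 1, 1 - 1j, 1j],
       [1 + 5j, 6 + 5j, -7 - 1j, 1 - 6j],
       [-7 + 34j, -8 - 23j, -10 + 22j, 30 + 13j],
       [-2 - 4j, 6 + 1j, 4 + 3j, 6 - 1j]]

W = [normalize(u) for u in gram_schmidt(S)]
expected = [normalize(x) for x in AOS]
print(all(abs(a - b) < 1e-9
          for w, x in zip(W, expected) for a, b in zip(w, x)))  # True
```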

We will see orthonormal sets again in Subsection MINM.UM. They are intimately related to unitary matrices (Definition UM) through Theorem CUMOS. Some of the utility of orthonormal sets is captured by Theorem COB in Subsection B.OBC. Orthonormal sets appear once again in Section OD where they are key in orthonormal diagonalization.

Subsection READ: Reading Questions

  1. Is the set
    1 1 2 , 5 3 1 , 8 4 2

    an orthogonal set? Why?

  2. What is the distinction between an orthogonal set and an orthonormal set?
  3. What is nice about the output of the Gram-Schmidt process?

Subsection EXC: Exercises

C20 Complete Example AOS by verifying that the four remaining inner products are zero.

Contributed by Robert Beezer

C21 Verify that the set T created in Example GSTV by the Gram-Schmidt Procedure is an orthogonal set.  
Contributed by Robert Beezer

T10 Prove part 1 of the conclusion of Theorem IPVA.  
Contributed by Robert Beezer

T11 Prove part 1 of the conclusion of Theorem IPSM.  
Contributed by Robert Beezer

T20 Suppose that $u, v, w \in \mathbb{C}^n$, $\alpha, \beta \in \mathbb{C}$ and $u$ is orthogonal to both $v$ and $w$. Prove that $u$ is orthogonal to $\alpha v + \beta w$.  
Contributed by Robert Beezer Solution

T30 Suppose that the set $S$ in the hypothesis of Theorem GSP is not just linearly independent, but is also orthogonal. Prove that the set $T$ created by the Gram-Schmidt procedure is equal to $S$. (Note that we are getting a stronger conclusion than $\langle T\rangle = \langle S\rangle$: the conclusion is that $T = S$.) In other words, it is pointless to apply the Gram-Schmidt procedure to a set that is already orthogonal.  
Contributed by Steve Canfield

Subsection SOL: Solutions

T20 Contributed by Robert Beezer Statement
Vectors are orthogonal if their inner product is zero (Definition OV), so we compute,

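$$
\begin{aligned}
\langle \alpha v + \beta w, u\rangle &= \langle \alpha v, u\rangle + \langle \beta w, u\rangle && \text{Theorem IPVA}\\
&= \alpha\langle v, u\rangle + \beta\langle w, u\rangle && \text{Theorem IPSM}\\
&= \alpha(0) + \beta(0) && \text{Definition OV}\\
&= 0
\end{aligned}
$$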

So by Definition OV, u and αv + βw are an orthogonal pair of vectors.

Annotated Acronyms V: Vectors

Theorem VSPCV
These are the fundamental rules for working with the addition, and scalar multiplication, of column vectors. We will see something very similar in the next chapter (Theorem VSPM) and then this will be generalized into what is arguably our most important definition, Definition VS.

Theorem SLSLC
Vector addition and scalar multiplication are the two fundamental operations on vectors, and linear combinations roll them both into one. Theorem SLSLC connects linear combinations with systems of equations. This one we will see often enough that it is worth memorizing.

Theorem PSPHS
This theorem is interesting in its own right, and sometimes the vagueness surrounding the choice of z can seem mysterious. But we list it here because we will see an important theorem in Section ILT which will generalize this result (Theorem KPI).

Theorem LIVRN
If you have a set of column vectors, this is the fastest computational approach to determine if the set is linearly independent. Make the vectors the columns of a matrix, row-reduce, compare $r$ and $n$. That’s it, and you always get an answer. Put this one in your toolkit.
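The recipe of Theorem LIVRN can be sketched as code. The version below is a hypothetical illustration restricted to vectors with rational entries: it uses Python's `fractions.Fraction` for exact row-reduction, so it does not handle complex entries. The names `rank_of_columns` and `is_linearly_independent` were invented for this example. The code row-reduces the matrix whose columns are the given vectors, counts the pivot columns as $r$, and compares $r$ with the number of vectors $n$.

```python
from fractions import Fraction

def rank_of_columns(vectors):
    """Row-reduce the matrix whose columns are `vectors`; return the
    number of pivot columns r. Exact arithmetic with Fraction."""
    m, n = len(vectors[0]), len(vectors)
    # A[i][j] is entry i of vector j, so the vectors become columns.
    A = [[Fraction(vectors[j][i]) for j in range(n)] for i in range(m)]
    r = 0
    for c in range(n):
        # Find a row at or below row r with a nonzero entry in column c.
        pivot = next((i for i in range(r, m) if A[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        lead = A[r][c]
        A[r] = [entry / lead for entry in A[r]]  # scale the pivot to 1
        for i in range(m):
            if i != r and A[i][c] != 0:
                factor = A[i][c]
                A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == m:
            break  # every row already has a pivot
    return r

def is_linearly_independent(vectors):
    """Theorem LIVRN: independent exactly when r equals n."""
    return rank_of_columns(vectors) == len(vectors)

print(is_linearly_independent([[1, 2, 3], [0, 1, 4], [5, 6, 0]]))  # True
print(is_linearly_independent([[1, 2, 3], [2, 4, 6], [0, 1, 1]]))  # False
```

The second set is dependent because its second vector is twice the first, so only two pivot columns appear.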

Theorem BNS
We will have several theorems (all listed in these “Annotated Acronyms” sections) whose conclusions will provide a linearly independent set of vectors whose span equals some set of interest (the null space here). While the notation in this theorem might appear gruesome, in practice it can become very routine to apply. So practice this one; we’ll be using it all through the book.

Theorem BS
As promised, another theorem that provides a linearly independent set of vectors whose span equals some set of interest (a span now). You can use this one to clean up any span.