From A First Course in Linear Algebra

Version 2.00

© 2004.

Licensed under the GNU Free Documentation License.

http://linear.ups.edu/

This Section is Incomplete

Given two points in the plane, there is a unique line through them. Given three points in the plane, and not in a line, there is a unique parabola through them. Given four points in the plane, there is a unique polynomial, of degree 3 or less, passing through them. And so on. We can prove this result, and give a procedure for finding the polynomial with the help of Vandermonde matrices (Section VM).

Theorem IP

Interpolating Polynomial

Suppose \left \{\left .({x}_{i},\kern 1.95872pt {y}_{i})\kern 1.95872pt \right \vert \kern 1.95872pt 1 ≤ i ≤ n + 1\right \} is a set of
n + 1 points in the plane
where the x-coordinates
are all different. Then there is a unique polynomial of degree
n or less,
p(x), such
that p({x}_{i}) = {y}_{i},
1 ≤ i ≤ n + 1.
□

Proof Write p(x) = {a}_{0} + {a}_{1}x + {a}_{2}{x}^{2} + \mathrel{⋯} + {a}_{ n}{x}^{n}. To meet the conclusion of the theorem, we desire,

\eqalignno{
{y}_{i} & = p({x}_{i}) = {a}_{0} + {a}_{1}{x}_{i} + {a}_{2}{x}_{i}^{2} + \mathrel{⋯} + {a}_{
n}{x}_{i}^{n} &1 ≤ i ≤ n + 1 & & & & &
}

This is a system of n + 1 linear equations in the n + 1 variables {a}_{0},\kern 1.95872pt {a}_{1},\kern 1.95872pt {a}_{2},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {a}_{n}. The vector of conatants in this system is the vector containing the y-coordinates of the points. More importantly, the coefficient matrix is a Vandermonde matrix (Definition VM) built from the x-coordinates {x}_{1},\kern 1.95872pt {x}_{2},\kern 1.95872pt {x}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {x}_{n+1}. Since we have required that these scalars all be different, Theorem NVM tells us that the coefficient matrix is nonsingular and Theorem NMUS says the solution for the coefficients of the polynomial exists, and is unique. As a practical matter, Theorem SNCM provides an expression for the solution. ■

Example PTFP

Polynomial through five points

Suppose we have the following 5 points in the plane and we wish to pass a degree
4 polynomial through them.

i | 1 | 2 | 3 | 4 | 5 | |

{x}_{i} | -3 | -1 | 2 | 3 | 6 | |

{y}_{i} | 276 | 16 | 31 | 144 2319 |

The required system of equations has a coefficient matrix that is the Vandermonde matrix where row i is successive powers of {x}_{i}

\eqalignno{
A & = \left [\array{
1&−3& 9 &−27& 81
\cr
1&−1& 1 & −1 & 1
\cr
1& 2 & 4 & 8 & 16
\cr
1& 3 & 9 & 27 & 81
\cr
1& 6 &36&216&1296 } \right ] & &
}

Theorem NMUS provides a solution as

\eqalignno{
\left [\array{
{a}_{0}
\cr
{a}_{1}
\cr
{a}_{2}
\cr
{a}_{3}
\cr
{a}_{4} } \right ] & = {A}^{−1}\left [\array{
276
\cr
16
\cr
31
\cr
144
\cr
2319 } \right ] = \left [\array{
−{1\over
15}& {9\over
14} & {9\over
10} & −{1\over
2} & {1\over
42}
\cr
0 & −{3\over
7} & {3\over
4} & −{1\over
3} & {1\over
84}
\cr
{5\over
108} & −{1\over
56} & −{1\over
4} & {17\over
72} &−{11\over
756}
\cr
−{1\over
54}& {1\over
21} &−{1\over
12}& {1\over
18} &− {1\over
756}
\cr
{1\over
540} &− {1\over
168}& {1\over
60} &−{1\over
72}& {1\over
756} } \right ]\left [\array{
276
\cr
16
\cr
31
\cr
144
\cr
2319 } \right ] = \left [\array{
3
\cr
−4
\cr
5
\cr
−2
\cr
2 } \right ] & &
}

So the polynomial is p(x) = 3 − 4x + 5{x}^{2} − 2{x}^{3} + 2{x}^{4}. ⊠

The unique polynomial passing through a set of points is known as the interpolating polynomial and it has many uses. Unfortunately, when confronted with data from an experiment the situation may not be so simple or clear cut. Read on.

Suppose that we have n real variables, {x}_{1},\kern 1.95872pt {x}_{2},\kern 1.95872pt {x}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {x}_{n}, that we can measure in an experiment. We believe that these variables combine, in a linear fashion, to equal another real variable, y. In other words, we have reason to believe from our understanding of the experiment, that

\eqalignno{
y & = {a}_{1}{x}_{1} + {a}_{2}{x}_{2} + {a}_{3}{x}_{3} + \mathrel{⋯} + {a}_{n}{x}_{n} & &
}

where the scalars {a}_{1},\kern 1.95872pt {a}_{2},\kern 1.95872pt {a}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {a}_{n} are not known to us, but are instead desirable. We would call this our model of the situation. Then we run the experiment m times, collecting sets of values for the variables of the experiment. For run number k we might denote these values as {y}_{k}, {x}_{k1}, {x}_{k2}, {x}_{k3}, …, {x}_{kn}. If we substitute these values into the model equation, we get m linear equations in the unknown coefficients {a}_{1},\kern 1.95872pt {a}_{2},\kern 1.95872pt {a}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {a}_{n}. If m = n, then we have a square coefficient matrix of the system which might happen to be nonsingular and there would be a unique solution.

However, more likely m > n (the more data we collect, the greater our confidence in the results) and the resulting system is inconsistent. It may be that our model is only an approximate understanding of the relationship between the {x}_{i} and y, or our measurements are not completely accurate. Still we would like to understand the situation we are studying, and would like some best answer for {a}_{1},\kern 1.95872pt {a}_{2},\kern 1.95872pt {a}_{3},\kern 1.95872pt \mathop{\mathop{…}},\kern 1.95872pt {a}_{n}.

Let y denote the vector with {\left [y\right ]}_{i} = {y}_{i}, 1 ≤ i ≤ m, let a denote the vector with {\left [a\right ]}_{j} = {a}_{j}, 1 ≤ j ≤ n, and let X denote the m × n matrix with {\left [X\right ]}_{ij} = {x}_{ij}, 1 ≤ i ≤ m, 1 ≤ j ≤ n. Then the model equation, evaluated with each run of the experiment, translates to Xa = y. With the presumption that this system has no solution, we can try to minimize the differecne between the two side of the equation y − Xa. As a vector, it is hard to imagine what the minimum might be, so we instead minimize the square of its norm

\eqalignno{
S ={ \left (y − Xa\right )}^{t}\left (y − Xa\right ) & &
}

To keep the logical flow accurate, we will define the minimizing value and then give the proof that it behaves as desired.

Definition LSS

Least Squares Solution

Given the equation Xa = y,
where X is an
m × n matrix of rank
n, the least squares
solution for a
is {\left ({X}^{t}X\right )}^{−1}{X}^{t}y.
△

Theorem LSMR

Least Squares Minimizes Residuals

Suppose that X is
an m × n matrix of rank
n. The least squares
solution of Xa = y,
{a}^{′} ={ \left ({X}^{t}X\right )}^{−1}{X}^{t}y,
minimizes the expression

\eqalignno{
S ={ \left (y − Xa\right )}^{t}\left (y − Xa\right ) & &
}

□
Proof We begin by finding the critical points of S. In preparation, let {X}_{j} denote column j of X, for 1 ≤ j ≤ n and compute partial derivatives with respect to {a}_{j}, 1 ≤ j ≤ n. A matrix product of the form {x}^{t}y is a sum of products, so a derivative is a sum of applications of the product rule,

\eqalignno{
{∂\over
∂{a}_{j}}S & = {∂\over
∂{a}_{j}}\left ({\left (y − Xa\right )}^{t}\left (y − Xa\right )\right ) & &
\cr
& ={ \mathop{∑
}}_{i=1}^{m} {∂\over
∂{a}_{j}}\left ({\left [y − Xa\right ]}_{i}\right ){\left [y − Xa\right ]}_{i} +{ \left [y − Xa\right ]}_{i} {∂\over
∂{a}_{j}}\left ({\left [y − Xa\right ]}_{i}\right ) & &
\cr
& = 2{ \mathop{∑
}}_{i=1}^{m} {∂\over
∂{a}_{j}}\left ({\left [y − Xa\right ]}_{i}\right ){\left [y − Xa\right ]}_{i} & &
\cr
& = 2{ \mathop{∑
}}_{i=1}^{m} {∂\over
∂{a}_{j}}\left ({\left [y\right ]}_{i} −{\mathop{∑
}}_{k=1}^{n}{\left [X\right ]}_{
ik}{\left [a\right ]}_{k}\right ){\left [y − Xa\right ]}_{i} & &
\cr
& = 2{ \mathop{∑
}}_{i=1}^{m} −{\left [X\right ]}_{
ij}{\left [y − Xa\right ]}_{i} & &
\cr
& = −2{\left ({X}_{j}\right )}^{t}\left (y − Xa\right ) & &
\cr
& &
}

The first partial derivatives will allow us to find critical points, while second partial derivatives will be needed to confirm that a critical point will yield a minimum. Return to the next-to-last expression for the first partial derivative of S,

\eqalignno{
{∂\over
∂{a}_{ℓ}{a}_{j}}S & = {∂\over
∂{a}_{ℓ}}2{ \mathop{∑
}}_{i=1}^{m} −{\left [X\right ]}_{
ij}{\left [y − Xa\right ]}_{i} & &
\cr
& = −2{ \mathop{∑
}}_{i=1}^{m} {∂\over
∂{a}_{ℓ}}{\left [X\right ]}_{ij}{\left [y − Xa\right ]}_{i} & &
\cr
& = −2{ \mathop{∑
}}_{i=1}^{m}{\left [X\right ]}_{
ij} {∂\over
∂{a}_{ℓ}}\left ({\left [y\right ]}_{i} −{\mathop{∑
}}_{k=1}^{n}{\left [X\right ]}_{
ik}{\left [a\right ]}_{k}\right ) & &
\cr
& = −2{ \mathop{∑
}}_{i=1}^{m}{\left [X\right ]}_{
ij}\left (−{\left [X\right ]}_{iℓ}\right ) & &
\cr
& = 2{ \mathop{∑
}}_{i=1}^{m}{\left [X\right ]}_{
ij}{\left [X\right ]}_{iℓ} & &
\cr
& = 2{ \mathop{∑
}}_{i=1}^{m}{\left [{X}^{t}\right ]}_{
ji}{\left [X\right ]}_{iℓ} & &
\cr
& = 2{\left [{X}^{t}X\right ]}_{
jℓ} & &
}

For 1 ≤ j ≤ n, set {∂\over ∂{a}_{j}}S = 0. This results in the n scalar equations

\eqalignno{
{\left ({X}_{j}\right )}^{t}Xa & ={ \left ({X}_{
j}\right )}^{t}y &1 ≤ j ≤ n & & & & &
}

These n vector equations can be summarized in the single vector equation,

\eqalignno{
{X}^{t}Xa = {X}^{t}y & &
}

{X}^{t}X is an n × n matrix and since we have assumed that X has rank n, {X}^{t}X will also have rank n. Since {X}^{t}X is invertible, we have a critical point at

\eqalignno{
{a}^{′} ={ \left ({X}^{t}X\right )}^{−1}{X}^{t}y & &
}

Is this lone critical point really a minimum? The matrix of second partial derivatives is constant, and a positive multiple of {X}^{t}X. Theorem CPSM tells us that this matrix is positive semi-definite. In an advanced course on multivariable calculus, it is shown that a minimum occurs exactly where the matrix of second partial derivatives is positive semi-definite. You may have seen this in the two-variable case, where a check on the positive semi-definiteness is disguised with a determinant of the 2 × 2 matrix of second partial derivatives. ■

T20 Theorem IP constructs a unique polynomial through a set of n + 1 points in the plane, \left \{\left .({x}_{i},\kern 1.95872pt {y}_{i})\kern 1.95872pt \right \vert \kern 1.95872pt 1 ≤ i ≤ n + 1\right \}, where the x-coordinates are all different. Prove that the expression below is the same polynomial and include an explanation of the necessity of the hypothesis that the x-coordinates are all different.

\eqalignno{
p(x) & ={ \mathop{∑
}}_{i=1}^{n+1}{y}_{
i}{ \mathop{∏
}}_{\begin{array}{c}j=1
\\
j\mathrel{≠}i \end{array}}^{n+1} {x − {x}_{j}\over {
x}_{i} − {x}_{j}} & &
}

This is known as the Lagrange form of the interpolating polynomial.

Contributed by Robert Beezer