Part IB — Numerical Analysis

Based on lectures by G. Moore

Notes taken by Dexter Chua

Lent 2016

These notes are not endorsed by the lecturers, and I have modified them (often

significantly) after lectures. They are nowhere near accurate representations of what

was actually lectured, and in particular, all errors are almost surely mine.

Polynomial approximation

Interpolation by polynomials. Divided differences of functions and relations to deriva-

tives. Orthogonal polynomials and their recurrence relations. Least squares approx-

imation by polynomials. Gaussian quadrature formulae. Peano kernel theorem and

applications. [6]

Computation of ordinary differential equations

Euler’s method and proof of convergence. Multistep methods, including order, the root

condition and the concept of convergence. Runge-Kutta schemes. Stiff equations and

A-stability. [5]

Systems of equations and least squares calculations

LU triangular factorization of matrices. Relation to Gaussian elimination. Column

pivoting. Factorizations of symmetric and band matrices. The Newton-Raphson

method for systems of non-linear algebraic equations. QR factorization of rectangular

matrices by Gram-Schmidt, Givens and Householder techniques. Application to linear

least squares calculations. [5]

Contents

0 Introduction

1 Polynomial interpolation

1.1 The interpolation problem

1.2 The Lagrange formula

1.3 The Newton formula

1.4 A useful property of divided differences

1.5 Error bounds for polynomial interpolation

2 Orthogonal polynomials

2.1 Scalar product

2.2 Orthogonal polynomials

2.3 Three-term recurrence relation

2.4 Examples

2.5 Least-squares polynomial approximation

3 Approximation of linear functionals

3.1 Linear functionals

3.2 Gaussian quadrature

4 Expressing errors in terms of derivatives

5 Ordinary differential equations

5.1 Introduction

5.2 One-step methods

5.3 Multi-step methods

5.4 Runge-Kutta methods

6 Stiff equations

6.1 Introduction

6.2 Linear stability

7 Implementation of ODE methods

7.1 Local error estimation

7.2 Solving for implicit methods

8 Numerical linear algebra

8.1 Triangular matrices

8.2 LU factorization

8.3 A = LU for special A

9 Linear least squares

0 Introduction

Numerical analysis is the study of algorithms. There are many problems we

would like algorithms to solve. In this course, we will tackle the problems of

polynomial approximation, solving ODEs and solving linear equations. These

are all problems that frequently arise when we do (applied) maths.

In general, there are two things we are concerned with — accuracy and speed.

Accuracy is particularly important in the cases of polynomial approximation and

solving ODEs, since we are trying approximate things. We would like to make

good approximations of the solution with relatively little work. On the other

hand, in the case of solving linear equations, we are more concerned with speed

— our solutions will be exact (up to numerical errors due to finite precision of

calculations), but we would like to solve it quickly. We might have to deal with

huge systems, and we don’t want the computation time to grow too quickly.

In the past, this was an important subject, since they had no computers.

The algorithms had to be implemented by hand. It was thus very important to

find some practical and efficient methods of computing things, or else it would

take them forever to calculate what they wanted. So we wanted quick algorithms

that give reasonably accurate results.

Nowadays, this is still an important subject. While we have computers that

are much faster at computation, we still want our programs to be fast. We would

also want to get really accurate results, since we might be using them to, say,

send our rocket to the Moon. Moreover, with more computational power, we

might sacrifice efficiency for some other desirable properties. For example, if we

are solving for the trajectory of a particle, we might want the solution to satisfy

the conservation of energy. This would require some much more complicated and

slower algorithms that no one would have considered in the past. Nowadays, with

computers, these algorithms become more feasible, and are becoming increasingly

more popular.

1 Polynomial interpolation

Polynomials are nice. Writing down a polynomial of degree

n

involves only

n

+ 1

numbers. They are easy to evaluate, integrate and differentiate. So it would be

nice if we can approximate things with polynomials. For simplicity, we will only

deal with real polynomials.

Notation.

We write

P

n

[

x

] for the real linear vector space of polynomials (with

real coefficients) having degree n or less.

It is easy to show that dim(P

n

[x]) = n + 1.

1.1 The interpolation problem

The idea of polynomial interpolation is that we are given

n

+ 1 distinct interpo-

lation points

{x

i

}

n

i=0

⊆ R

, and

n

+ 1 data values

{f

i

}

n

i=0

⊆ R

. The objective is

to find a p ∈ P

n

[x] such that

p(x

i

) = f

i

for i = 0, ··· , n.

In other words, we want to fit a polynomial through the points (x

i

, f

i

).

There are many situations where this may come up. For example, we may

be given

n

+ 1 actual data points, and we want to fit a polynomial through

the points. Alternatively, we might have a complicated function

f

, and want

to approximate it with a polynomial

p

such that

p

and

f

agree on at least that

n + 1 points.

The naive way of looking at this is that we try a polynomial

p(x) = a

n

x

n

+ a

n−1

x

n−1

+ ··· + a

0

,

and then solve the system of equations

f

i

= p(x

i

) = a

n

x

n

i

+ a

n−1

x

n−1

i

+ ··· + a

0

.

This is a perfectly respectable system of

n

+ 1 equations in

n

+ 1 unknowns.

From linear algebra, we know that in general, such a system is not guaranteed

to have a solution, and if the solution exists, it is not guaranteed to be unique.

That was not helpful. So our first goal is to show that in the case of polynomial

interpolation, the solution exists and is unique.

1.2 The Lagrange formula

It turns out the problem is not too hard. You can probably figure it out yourself

if you lock yourself in a room for a few days (or hours). In fact, we will come up

with two solutions to the problem.

The first is via the Lagrange cardinal polynomials. These are simple to

state, and it is obvious that these work very well. However, practically, this is

not the best way to solve the problem, and we will not talk about them much.

Instead, we use this solution as a proof of existence of polynomial interpolations.

We will then develop our second method on the assumption that polynomial

interpolations exist, and find a better way of computing them.

Definition

(Lagrange cardinal polynomials)

.

The Lagrange cardinal polynomials

with respect to the interpolation points {x

i

}

n

i=0

are, for k = 0, ··· , n,

`

k

(x) =

n

Y

i=0,i6=k

x − x

i

x

k

− x

i

.

Note that these polynomials have degree exactly

n

. The significance of these

polynomials is we have

`

k

(

x

i

) = 0 for

i 6

=

k

, and

`

k

(

x

k

) = 1. In other words, we

have

`

k

(x

j

) = δ

jk

.

This is obvious from definition.

With these cardinal polynomials, we can immediately write down a solution

to the interpolation problem.

Theorem. The interpolation problem has exactly one solution.

Proof. We define p ∈ P

n

[x] by

p(x) =

n

X

k=0

f

k

`

k

(x).

Evaluating at x

i

gives

p(x

j

) =

n

X

k=0

f

k

`

k

(x

j

) =

n

X

k=0

f

k

δ

jk

= f

j

.

So we get existence.

For uniqueness, suppose

p, q ∈ P

n

[

x

] are solutions. Then the difference

r

=

p − q ∈ P

n

[

x

] satisfies

r

(

x

j

) = 0 for all

j

, i.e. it has

n

+ 1 roots. However, a

non-zero polynomial of degree

n

can have at most

n

roots. So in fact

p − q

is

zero, i.e. p = q.

While this works, it is not ideal. If we one day decide we should add one more

interpolation point, we would have to recompute all the cardinal polynomials,

and that is not fun. Ideally, we would like some way to reuse our previous

computations when we have new interpolation points.

1.3 The Newton formula

The idea of Newton’s formula is as follows — for

k

= 0

, ··· , n

, we write

p

k

∈ P

k

[

x

]

for the polynomial that satisfies

p

k

(x

i

) = f

i

for i = 0, ··· , k.

This is the unique degree-

k

polynomial that satisfies the first

k

+ 1 conditions,

whose existence (and uniqueness) is guaranteed by the previous section. Then

we can write

p(x) = p

n

(x) = p

0

(x) + (p

1

(x) − p

0

(x)) + ··· + (p

n

(x) − p

n−1

(x)).

Hence we are done if we have an efficient way of finding the differences

p

k

−p

k−1

.

We know that

p

k

and

p

k−1

agree on

x

0

, ··· , x

k−1

. So

p

k

−p

k−1

evaluates to

0 at those points, and we must have

p

k

(x) − p

k−1

(x) = A

k

k−1

Y

i=0

(x − x

i

),

for some A

k

yet to be found out. Then we can write

p(x) = p

n

(x) = A

0

+

n

X

k=1

A

k

k−1

Y

i=0

(x − x

i

).

This formula has the advantage that it is built up gradually from the interpo-

lation points one-by-one. If we stop the sum at any point, we have obtained

the polynomial that interpolates the data for the first

k

points (for some

k

).

Conversely, if we have a new data point, we just need to add a new term, instead

of re-computing everything.

All that remains is to find the coefficients

A

k

. For

k

= 0, we know

A

0

is the

unique constant polynomial that interpolates the point at x

0

, i.e. A

0

= f

0

.

For the others, we note that in the formula for

p

k

− p

k−1

, we find that

A

k

is

the leading coefficient of

x

k

. But

p

k−1

(

x

) has no degree

k

term. So

A

k

must

be the leading coefficient of

p

k

. So we have reduced our problem to finding the

leading coefficients of p

k

.

The solution to this is known as the Newton divided differences. We first

invent a new notation:

A

k

= f[x

0

, ··· , x

k

].

Note that these coefficients depend only on the first

k

interpolation points.

Moreover, since the labelling of the points

x

0

, ··· , x

k

is arbitrary, we don’t have

to start with x

0

. In general, the coefficient

f[x

j

, ··· , x

k

]

is the leading coefficient of the unique

q ∈ P

k−j

[

x

] such that

q

(

x

i

) =

f

i

for

i = j, ··· , k.

While we do not have an explicit formula for what these coefficients are, we

can come up with a recurrence relation for these coefficients.

Theorem

(Recurrence relation for Newton divided differences)

.

For 0

≤ j <

k ≤ n, we have

f[x

j

, ··· , x

k

] =

f[x

j+1

, ··· , x

k

] − f[x

j

, ··· , x

k−1

]

x

k

− x

j

.

Proof.

The key to proving this is to relate the interpolating polynomials. Let

q

0

, q

1

∈ P

k−j−1

[x] and q

2

∈ P

k−j

satisfy

q

0

(x

i

) = f

i

i = j, ··· , k − 1

q

1

(x

i

) = f

i

i = j + 1, ··· , k

q

2

(x

i

) = f

i

i = j, ··· , k

We now claim that

q

2

(x) =

x − x

j

x

k

− x

j

q

1

(x) +

x

k

− x

x

k

− x

j

q

0

(x).

We can check directly that the expression on the right correctly interpolates

the points

x

i

for

i

=

j, ··· , k

. By uniqueness, the two expressions agree. Since

f

[

x

j

, ··· , x

k

],

f

[

x

j+1

, ··· , x

k

] and

f

[

x

j

, ··· , x

k−1

] are the leading coefficients of

q

2

, q

1

, q

0

respectively, the result follows.

Thus the famous Newton divided difference table can be constructed

x

i

f

i

f[∗, ∗] f[∗, ∗, ∗] ··· f[∗, ··· , ∗]

x

0

f[x

0

]

f[x

0

, x

1

]

x

1

f[x

1

] f[x

0

, x

1

, x

2

]

f[x

1

, x

2

]

.

.

.

x

2

f[x

2

] f[x

2

, x

3

, x

4

] ··· f[x

0

, x

1

, ··· , x

n

]

f[x

2

, x

3

]

.

.

.

x

3

f[x

3

]

.

.

.

.

.

.

.

.

.

.

.

.

x

n

f[x

n

]

From the first

n

columns, we can find the

n

+ 1th column using the recurrence

relation above. The values of

A

k

can then be found at the top diagonal, and

this is all we really need. However, to compute this diagonal, we will need to

compute everything in the table.

In practice, we often need not find the actual interpolating polynomial. If

we just want to evaluate

p

(

ˆx

) at some new point

ˆx

using the divided table, we

can use Horner’s scheme, given by

S <- f[x

0

,..., x

n

]

for k = n - 1,..., 0

S <- (^x - x

k

)S + f[x

0

,..., x

k

]

end

This only takes O(n) operations.

If an extra data point

{x

n+1

, f

n+1

}

is added, then we only have to compute

an extra diagonal

f

[

x

k

, ··· , x

n+1

] for

k

=

n, ··· ,

0 in the divided difference table

to obtain the new coefficient, and the old results can be reused. This requires

O(n) operations. This is less straightforward for Lagrange’s method.

1.4 A useful property of divided differences

In the next couple of sections, we are interested in the error of polynomial inter-

polation. Suppose the data points come from

f

i

=

f

(

x

i

) for some complicated

f we want to approximate, and we interpolate f at n data points {x

i

}

n

i=1

with

p

n

. How does the error

e

n

(

x

) =

f

(

x

)

− p

n

(

x

) depend on

n

and the choice of

interpolation points?

At the interpolation point, the error is necessarily 0, by definition of interpo-

lation. However, this does not tell us anything about the error elsewhere.

Our ultimate objective is to bound the error

e

n

by suitable multiples of the

derivatives of

f

. We will do this in two steps. We first relate the derivatives to

the divided differences, and then relate the error to the divided differences.

The first part is an easy result based on the following purely calculus lemma.

Lemma.

Let

g ∈ C

m

[

a, b

] have a continuous

m

th derivative. Suppose

g

is zero

at m + ` distinct points. Then g

(m)

has at least ` distinct zeros in [a, b].

Proof.

This is a repeated application of Rolle’s theorem. We know that between

every two zeros of

g

, there is at least one zero of

g

0

∈ C

m−1

[

a, b

]. So by

differentiating once, we have lost at most 1 zeros. So after differentiating

m

times, g

(m)

has lost at most m zeros. So it still has at least ` zeros.

Theorem.

Let

{x

i

}

n

i=0

∈

[

a, b

] and

f ∈ C

n

[

a, b

]. Then there exists some

ξ ∈ (a, b) such that

f[x

0

, ··· , x

n

] =

1

n!

f

(n)

(ξ).

Proof.

Consider

e

=

f − p

n

∈ C

n

[

a, b

]. This has at least

n

+ 1 distinct zeros in

[

a, b

]. So by the lemma,

e

(n)

=

f

(n)

− p

(n)

n

must vanish at some

ξ ∈

(

a, b

). But

then p

(n)

n

= n!f[x

0

, ··· , x

n

] constantly. So the result follows.

1.5 Error bounds for polynomial interpolation

The actual error bound is not too difficult as well. It turns out the error

e

=

f − p

n

is “like the next term in the Newton’s formula”. This vague sentence

is made precise in the following theorem:

Theorem.

Assume

{x

i

}

n

i=0

⊆

[

a, b

] and

f ∈ C

[

a, b

]. Let

¯x ∈

[

a, b

] be a non-

interpolation point. Then

e

n

(¯x) = f[x

0

, x

1

, ··· , x

n

, ¯x]ω(¯x),

where

ω(x) =

n

Y

i=0

(x − x

i

).

Note that we forbid the case where

¯x

is an interpolation point, since it is

not clear what the expression

f

[

x

0

, x

1

, ··· , x

n

, ¯x

] means. However, if

¯x

is an

interpolation point, then both

e

n

(

¯x

) and

ω

(

¯x

) are zero, so there isn’t much to

say.

Proof. We think of ¯x = x

n+1

as a new interpolation point so that

p

n+1

(x) − p

n

(x) = f[x

0

, ··· , x

n

, ¯x]ω(x)

for all

x ∈ R

. In particular, putting

x

=

¯x

, we have

p

n+1

(

¯x

) =

f

(

¯x

), and we get

the result.

Combining the two results, we find

Theorem.

If in addition

f ∈ C

n+1

[

a, b

], then for each

x ∈

[

a, b

], we can find

ξ

x

∈ (a, b) such that

e

n

(x) =

1

(n + 1)!

f

(n+1)

(ξ

x

)ω(x)

Proof.

The statement is trivial if

x

is an interpolation point — pick arbitrary

ξ

x

, and both sides are zero. Otherwise, this follows directly from the last two

theorems.

This is an exact result, which is not too useful, since there is no easy

constructive way of finding what

ξ

x

should be. Instead, we usually go for a

bound. We introduce the max norm

kgk

∞

= max

t∈[a,b]

|g(t)|.

This gives the more useful bound

Corollary. For all x ∈ [a, b], we have

|f(x) − p

n

(x)| ≤

1

(n + 1)!

kf

(n+1)

k

∞

|ω(x)|

Assuming our function

f

is fixed, this error bound depends only on

ω

(

x

),

which depends on our choice of interpolation points. So can we minimize

ω

(

x

)

in some sense by picking some clever interpolation points ∆ =

{x

i

}

n

i=0

? Here

we will have

n

fixed. So instead, we put ∆ as the subscript. We can write our

bound as

kf − p

∆

k

∞

≤

1

(n + 1)!

kf

(n+1)

k

∞

kω

∆

k

∞

.

So the objective is to find a ∆ that minimizes kω

∆

k

∞

.

For the moment, we focus on the special case where the interval is [

−

1

,

1].

The general solution can be obtained by an easy change of variable.

For some magical reasons that hopefully will become clear soon, the optimal

choice of ∆ comes from the Chebyshev polynomials.

Definition

(Chebyshev polynomial)

.

The Chebyshev polynomial of degree

n

on

[−1, 1] is defined by

T

n

(x) = cos(nθ),

where x = cos θ with θ ∈ [0, π].

So given an

x

, we find the unique

θ

that satisfies

x

=

cos θ

, and then find

cos

(

nθ

). This is in fact a polynomial in disguise, since from trigonometric

identities, we know

cos

(

nθ

) can be expanded as a polynomial in

cos θ

up to

degree n.

Two key properties of T

n

on [−1, 1] are

(i) The maximum absolute value is obtained at

X

k

= cos

πk

n

for k = 0, ··· , n with

T

n

(X

k

) = (−1)

k

.

(ii) This has n distinct zeros at

x

k

= cos

2k − 1

2n

π

.

for k = 1, ··· , n.

When plotted out, the polynomials look like this:

x

T

4

(x)

−1 1

1

−1

All that really matters about the Chebyshev polynomials is that the maximum

is obtained at

n

+ 1 distinct points with alternating sign. The exact form of the

polynomial is not really important.

Notice there is an intentional clash between the use of

x

k

as the zeros and

x

k

as the interpolation points — we will show these are indeed the optimal

interpolation points.

We first prove a convenient recurrence relation for the Chebyshev polynomials:

Lemma

(3-term recurrence relation)

.

The Chebyshev polynomials satisfy the

recurrence relations

T

n+1

(x) = 2xT

n

(x) − T

n−1

(x)

with initial conditions

T

0

(x) = 1, T

1

(x) = x.

Proof.

cos((n + 1)θ) + cos((n − 1)θ) = 2 cos θ cos(nθ).

This recurrence relation can be useful for many things, but for our purposes,

we only use it to show that the leading coefficient of T

n

is 2

n−1

(for n ≥ 1).

Theorem

(Minimal property for

n ≥

1)

.

On [

−

1

,

1], among all polynomials

p ∈ P

n

[

x

] with leading coefficient 1,

1

2

n−1

kT

n

k

minimizes

kpk

∞

. Thus, the

minimum value is

1

2

n−1

.

Proof.

We proceed by contradiction. Suppose there is a polynomial

q

n

∈ P

n

with leading coefficient 1 such that kq

n

k

∞

<

1

2

n−1

. Define a new polynomial

r =

1

2

n−1

T

n

− q

n

.

This is, by assumption, non-zero.

Since both the polynomials have leading coefficient 1, the difference must

have degree at most

n −

1, i.e.

r ∈ P

n−1

[

x

]. Since

1

2

n−1

T

n

(

X

k

) =

±

1

2

n−1

, and

|q

n

(

X

n

)

| <

1

2

n−1

by assumption,

r

alternates in sign between these

n

+ 1 points.

But then by the intermediate value theorem,

r

has to have at least

n

zeros. This

is a contradiction, since r has degree n − 1, and cannot be zero.

Corollary. Consider

w

∆

=

n

Y

i=0

(x − x

i

) ∈ P

n+1

[x]

for any distinct points ∆ = {x

i

}

n

i=0

⊆ [−1, 1]. Then

min

∆

kω

∆

k

∞

=

1

2

n

.

This minimum is achieved by picking the interpolation points to be the zeros of

T

n+1

, namely

x

k

= cos

2k + 1

2n + 2

π

, k = 0, ··· , n.

Theorem.

For

f ∈ C

n+1

[

−

1

,

1], the Chebyshev choice of interpolation points

gives

kf − p

n

k

∞

≤

1

2

n

1

(n + 1)!

kf

(n+1)

k

∞

.

Suppose

f

has as many continuous derivatives as we want. Then as we

increase

n

, what happens to the error bounds? The coefficients involve dividing

by an exponential and a factorial. Hence as long as the higher derivatives of

f

don’t blow up too badly, in general, the error will tend to zero as

n → ∞

, which

makes sense.

The last two results can be easily generalized to arbitrary intervals [

a, b

], and

this is left as an exercise for the reader.

2 Orthogonal polynomials

It turns out the Chebyshev polynomials is just an example of a more general

class of polynomials, known as orthogonal polynomials. As in linear algebra, we

can define a scalar product on the space of polynomials, and then find a basis

of orthogonal polynomials of the vector space under this scalar product. We

shall show that each set of orthogonal polynomials has a three-term recurrence

relation, just like the Chebyshev polynomials.

2.1 Scalar product

The scalar products we are interested in would be generalization of the usual

scalar product on Euclidean space,

hx, yi =

n

X

i=1

x

i

y

i

.

We want to generalize this to vector spaces of functions and polynomials. We

will not provide a formal definition of vector spaces and scalar products on an

abstract vector space. Instead, we will just provide some examples of commonly

used ones.

Example.

(i)

Let

V

=

C

s

[

a, b

], where [

a, b

] is a finite interval and

s ≥

0. Pick a weight

function

w

(

x

)

∈ C

(

a, b

) such that

w

(

x

)

>

0 for all

x ∈

(

a, b

), and

w

is

integrable over [

a, b

]. In particular, we allow

w

to vanish at the end points,

or blow up mildly such that it is still integrable.

We then define the inner product to be

hf, gi =

Z

b

a

w(x)f(x)d(x) dx.

(ii)

We can allow [

a, b

] to be infinite, e.g. [0

, ∞

) or even (

−∞, ∞

), but we have

to be more careful. We first define

hf, gi =

Z

b

a

w(x)f(x)g(x) dx

as before, but we now need more conditions. We require that

R

b

a

w

(

x

)

x

n

d

x

to exist for all

n ≥

0, since we want to allow polynomials in our vector

space. For example,

w

(

x

) =

e

−x

on [0

, ∞

), works, or

w

(

x

) =

e

−x

2

on

(

−∞, ∞

). These are scalar products for

P

n

[

x

] for

n ≥

0, but we cannot

extend this definition to all smooth functions since they might blow up too

fast at infinity. We will not go into the technical details, since we are only

interested in polynomials, and knowing it works for polynomials suffices.

(iii) We can also have a discrete inner product, defined by

hf, gi =

m

X

j=1

w

j

f(ξ

j

)g(ξ

j

)

with

{ξ

j

}

m

j=1

distinct points and

{w

j

}

m

j=1

>

0. Now we have to restrict

ourselves a lot. This is a scalar product for

V

=

P

m−1

[

x

], but not for

higher degrees, since a scalar product should satisfy

hf, f i >

0 for

f 6

= 0.

In particular, we cannot extend this to all smooth functions.

With an inner product, we can define orthogonality.

Definition

(Orthogonalilty)

.

Given a vector space

V

and an inner product

h·, ·i, two vectors f, g ∈ V are orthogonal if hf, gi = 0.

2.2 Orthogonal polynomials

Definition

(Orthogonal polynomial)

.

Given a vector space

V

of polynomials

and inner product

h·, ·i

, we say

p

n

∈ P

n

[

x

] is the

n

th orthogonal polynomial if

hp

n

, qi = 0 for all q ∈ P

n−1

[x].

In particular, hp

n

, p

m

i = 0 for n 6= m.

We said the orthogonal polynomial, but we need to make sure such a polyno-

mial has to be unique. It is clearly not unique, since if

p

n

satisfies these relations,

then so does

λp

n

for all

λ 6

= 0. For uniqueness, we need to impose some scaling.

We usually do so by requiring the leading polynomial to be 1, i.e. it is monic.

Definition

(Monic polynomial)

.

A polynomial

p ∈ P

n

[

x

] is monic if the coeffi-

cient of x

n

is 1.

In practice, most famous traditional polynomials are not monic. They have

a different scaling imposed. Still, as long as we have some scaling, we will have

uniqueness.

We will not mess with other scalings, and stick to requiring them to be monic

since this is useful for proving things.

Theorem.

Given a vector space

V

of functions and an inner product

h·, ·i

,

there exists a unique monic orthogonal polynomial for each degree

n ≥

0. In

addition, {p

k

}

n

k=0

form a basis for P

n

[x].

Proof.

This is a big induction proof over both parts of the theorem. We induct

over

n

. For the base case, we pick

p

0

(

x

) = 1, which is the only degree-zero monic

polynomial.

Now suppose we already have {p

n

}

n

k=0

satisfying the induction hypothesis.

Now pick any monic

q

n+1

∈ P

n+1

[

x

], e.g.

x

n+1

. We now construct

p

n+1

from

q

n+1

by the Gram-Schmidt process. We define

p

n+1

= q

n+1

−

n

X

k=0

hq

n+1

, p

k

i

hp

k

, p

k

i

p

k

.

This is again monic since q

n+1

is, and we have

hp

n+1

, p

m

i = 0

for all m ≤ n, and hence hp

n+1

, pi = 0 for all p ∈ P

n

[x] = hp

0

, ··· , p

n

i.

To obtain uniqueness, assume both

p

n+1

, ˆp

n+1

∈ P

n+1

[

x

] are both monic

orthogonal polynomials. Then r = p

n+1

− ˆp

n+1

∈ P

n

[x]. So

hr, ri = hr, p

n+1

− ˆp

n+1

i = hr, p

n+1

i − hr, ˆp

n+1

i = 0 − 0 = 0.

So r = 0. So p

n+1

= ˆp

n−1

.

Finally, we have to show that

p

0

, ··· , p

n+1

form a basis for

P

n+1

[

x

]. Now

note that every p ∈ P

n+1

[x] can be written uniquely as

p = cp

n+1

+ q,

where

q ∈ P

n

[

x

]. But

{p

k

}

n

k=0

is a basis for

P

n

[

x

]. So

q

can be uniquely

decomposed as a linear combination of p

0

, ··· , p

n

.

Alternatively, this follows from the fact that any set of orthogonal vectors

must be linearly independent, and since there are

n

+ 2 of these vectors and

P

n+1

[x] has dimension n + 2, they must be a basis.

In practice, following the proof naively is not the best way of producing the

new

p

n+1

. Instead, we can reduce a lot of our work by making a clever choice of

q

n+1

.

2.3 Three-term recurrence relation

Recall that for the Chebyshev polynomials, we obtained a three-term recurrence

relation for them using special properties of the cosine. It turns out these

recurrence relations exist in general for orthogonal polynomials.

We start by picking

q

n+1

=

xp

n

in the previous proof. We now use the fact

that

hxf, gi = hf, xgi.

This is not necessarily true for arbitrary inner products, but for most sensible

inner products we will meet in this course, this is true. In particular, it is clearly

true for inner products of the form

hf, gi =

Z

w(x)f(x)g(x) dx.

Assuming this, we obtain the following theorem.

Theorem. Monic orthogonal polynomials are generated by

p

k+1

(x) = (x − α

k

)p

k

(x) − β

k

p

k−1

(x)

with initial conditions

p

0

= 1, p

1

(x) = (x − α

0

)p

0

,

where

α

k

=

hxp

k

, p

k

i

hp

k

, p

k

i

, β

k

=

hp

k

, p

k

i

hp

k−1

, p

k−1

i

.

Proof. By inspection, the p

1

given is monic and satisfies

hp

1

, p

0

i = 0.

Using q

n+1

= xp

n

in the Gram-Schmidt process gives

p

n+1

= xp

n

−

n

X

k=0

hxp

n

, p

k

i

hp

k

, p

k

i

p

k

p

n+1

= xp

n

−

n

X

k=0

hp

n

, xp

k

i

hp

k

, p

k

i

p

k

We notice that

hp

n

, xp

k

i

and vanishes whenever

xp

k

has degree less than

n

. So

we are left with

= xp

n

−

hxp

n

, p

n

i

hp

n

, p

n

i

p

n

−

hp

n

, xp

n−1

i

hp

n−1

, p

n−1

i

p

n−1

= (x − α

n

)p

n

−

hp

n

, xp

n−1

i

hp

n−1

, p

n−1

i

p

n−1

.

Now we notice that

xp

n−1

is a monic polynomial of degree

n

so we can write

this as xp

n−1

= p

n

+ q. Thus

hp

n

, xp

n−1

i = hp

n

, p

n

+ qi = hp

n

, p

n

i.

Hence the coefficient of p

n−1

is indeed the β we defined.

2.4 Examples

The four famous examples are the Legendre polynomials, Chebyshev polynomials,

Laguerre polynomials and Hermite polynomials. We first look at how the

Chebyshev polynomials fit into this framework.

Chebyshev is based on the scalar product defined by

hf, gi =

Z

1

−1

1

√

1 − x

2

f(x)g(x) dx.

Note that the weight function blows up mildly at the end, but this is fine since

it is still integrable.

This links up with

T

n

(x) = cos(nθ)

for x = cos θ via the usual trigonometric substitution. We have

hT

n

, T

m

i =

Z

π

0

1

√

1 − cos

2

θ

cos(nθ) cos(mθ) sin θ dθ

=

Z

π

0

cos(nθ) cos(mθ) dθ

= 0 if m 6= n.

The other orthogonal polynomials come from scalar products of the form

hf, gi =

Z

b

a

w(x)f(x)g(x) dx,

as described in the table below:

Type Range Weight

Legendre [−1, 1] 1

Chebyshev [−1, 1]

1

√

1−x

2

Laguerre [0, ∞) e

−x

Hermite (−∞, ∞) e

−x

2

2.5 Least-squares polynomial approximation

If we want to approximate a function with a polynomial, polynomial interpolation

might not be the best idea, since all we do is make sure the polynomial agrees

with

f

at certain points, and then hope it is a good approximation elsewhere.

Instead, the idea is to choose a polynomial

p

in

P

n

[

x

] that “minimizes the error”.

What exactly do we mean by minimizing the error? The error is defined as

the function

f −p

. So given an appropriate inner product on the vector space of

continuous functions, we want to minimize

kf − pk

2

= hf − p, f − pi.

This is usually of the form

hf − p, f − pi =

Z

b

a

w(x)[f(x) − p(x)]

2

dx,

but we can also use alternative inner products such as

hf − p, f − pi =

m

X

j=1

w

j

[f(ξ

i

) − p(ξ

i

)]

2

.

Unlike polynomial interpolation, there is no guarantee that the approximation

agrees with the function anywhere. Unlike polynomial interpolation, there is

some guarantee that the total error is small (or as small as we can get, by

definition). In particular, if

f

is continuous, then the Weierstrass approximation

theorem tells us the total error must eventually vanish.

Unsurprisingly, the solution involves the use of the orthogonal polynomials

with respect to the corresponding inner products.

Theorem.

If

{p

n

}

n

k=0

are orthogonal polynomials with respect to

h·, ·i

, then

the choice of c

k

such that

p =

n

X

k=0

c

k

p

k

minimizes kf − pk

2

is given by

c

k

=

hf, p

k

i

kp

k

k

2

,

and the formula for the error is

kf − pk

2

= kfk

2

−

n

X

k=0

hf, p

k

i

2

kp

k

k

2

.

Note that the solution decouples, in the sense that

c

k

depends only on

f

and

p

k

. If we want to take one more term, we just need to compute an extra term,

and not redo our previous work.

Also, we notice that the formula for the error is a positive term

kfk

2

sub-

tracting a lot of squares. As we increase

n

, we subtract more squares, and the

error decreases. If we are lucky, the error tends to 0 as we take

n → ∞

. Even

though we might not know how many terms we need in order to get the error to

be sufficiently small, we can just keep adding terms until the computed error

small enough (which is something we have to do anyway even if we knew what

n to take).

Proof. We consider a general polynomial

p =

n

X

k=0

c

k

p

k

.

We substitute this in to obtain

hf − p, f − pi = hf, fi − 2

n

X

k=0

c

k

hf, p

k

i +

n

X

k=0

c

2

k

kp

k

k

2

.

Note that there are no cross terms between the different coefficients. We minimize

this quadratic by setting the partial derivatives to zero:

0 =

∂

∂c

k

hf − p, f − pi = −2hf, p

k

i + 2c

k

kp

k

k

2

.

To check this is indeed a minimum, note that the Hessian matrix is simply 2

I

,

which is positive definite. So this is really a minimum. So we get the formula for

the

c

k

’s as claimed, and putting the formula for

c

k

gives the error formula.

Note that our constructed

p ∈ P

n

[

x

] has a nice property: for

k ≤ n

, we have

hf − p, p

k

i = hf, p

k

i − hp, p

k

i = hf, p

k

i −

hf, p

k

i

kp

k

k

2

hp

k

, p

k

i = 0.

Thus for all q ∈ P

n

[x], we have

hf − p, qi = 0.

In particular, this is true when

q

=

p

, and tells us

hf, pi

=

hp, pi

. Using this to

expand hf − p, f − pi gives

kf − pk

2

+ kpk

2

= kfk

2

,

which is just a glorified Pythagoras theorem.

3 Approximation of linear functionals

3.1 Linear functionals

In this chapter, we are going to study approximations of linear functions. Before

we start, it is helpful to define what a linear functional is, and look at certain

examples of these.

Definition

(Linear functional)

.

A linear functional is a linear mapping

L

:

V →

R, where V is a real vector space of functions.

In generally, a linear functional is a linear mapping from a vector space to

its underlying field of scalars, but for the purposes of this course, we will restrict

to this special case.

We usually don’t put so much emphasis on the actual vector space

V

. Instead,

we provide a formula for

L

, and take

V

to be the vector space of functions for

which the formula makes sense.

Example.

(i) We can choose some fixed ξ ∈ R, and define a linear functional by

L(f) = f(ξ).

(ii) Alternatively, for fixed η ∈ R we can define our functional by

L(f) = f

0

(η).

In this case, we need to pick a vector space in which this makes sense, e.g.

the space of continuously differentiable functions.

(iii) We can define

L(f) =

Z

b

a

f(x) dx.

The set of continuous (or even just integrable) functions defined on [

a, b

]

will be a sensible domain for this linear functional.

(iv)

Any linear combination of these linear functions are also linear functionals.

For example, we can pick some fixed α, β ∈ R, and define

L(f) = f(β) − f(α) −

β − α

2

[f

0

(β) + f

0

(α)].

The objective of this chapter is to construct approximations to more compli-

cated linear functionals (usually integrals, possibly derivatives point values) in

terms of simpler linear functionals (usually point values of f itself).

For example, we might produce an approximation of the form

L(f) ≈

N

X

i=0

a

i

f(x

i

),

where V = C

p

[a, b], p ≥ 0, and {x

i

}

N

i=0

⊆ [a, b] are distinct points.

How can we choose the coefficients

a

i

and the points

x

i

so that our approxi-

mation is “good”?

We notice that most of our functionals can be easily evaluated exactly when

f

is a polynomial. So we might approximate our function

f

by a polynomial,

and then do it exactly for polynomials.

More precisely, we let

{x

i

}

N

i=0

⊆

[

a, b

] be arbitrary points. Then using the

Lagrange cardinal polynomials `

i

, we have

f(x) ≈

N

X

i=0

f(x

i

)`

i

(x).

Then using linearity, we can approximate

L(f) ≈ L

N

X

i=0

f(x

i

)`

i

(x)

!

=

N

X

i=0

L(`

i

)f(x

i

).

So we can pick

a

i

= L(`

i

).

Similar to polynomial interpolation, this formula is exact for

f ∈ P

N

[

x

]. But we

could do better. If we can freely choose

{a

i

}

N

i=0

and

{x

i

}

N

i=0

, then since we now

have 2

n

+ 2 free parameters, we might expect to find an approximation that is

exact for

f ∈ P

2N+1

[

x

]. This is not always possible, but there are cases when

we can. The most famous example is Gaussian quadrature.

3.2 Gaussian quadrature

The objective of Gaussian quadrature is to approximate integrals of the form

L(f) =

Z

b

a

w(x)f(x) dx,

where w(x) is a weight function that determines a scalar product.

Traditionally, we have a different set of notations used for Gaussian quadra-

ture. So just in this section, we will use some funny notation that is inconsistent

with the rest of the course.

We let

hf, gi =

Z

b

a

w(x)f(x)g(x) dx

be a scalar product for

P

ν

[

x

]. We will show that we can find weights, written

{b

n

}

ν

k=1

, and nodes, written {c

k

}

ν

k=1

⊆ [a, b], such that the approximation

Z

b

a

w(x)f(x) dx ≈

ν

X

k=1

b

k

f(c

k

)

is exact for

f ∈ P

2ν−1

[

x

]. The nodes

{c

k

}

ν

k=1

will turn out to be the zeros of

the orthogonal polynomial

p

ν

with respect to the scalar product. The aim of

this section is to work this thing out.

We start by showing that this is the best we can achieve.

Proposition.

There is no choice of

ν

weights and nodes such that the approxi-

mation of

R

b

a

w(x)f(x) dx is exact for all f ∈ P

2ν

[x].

Proof. Define

q(x) =

ν

Y

k=1

(x − c

k

) ∈ P

ν

[x].

Then we know

Z

b

a

w(x)q

2

(x) dx > 0,

since q

2

is always non-negative and has finitely many zeros. However,

ν

X

k=1

b

k

q

2

(c

n

) = 0.

So this cannot be exact for q

2

.

Recall that we initially had the idea of doing the approximation by interpo-

lating f at some arbitrary points in [a, b]. We call this ordinary quadrature.

Theorem

(Ordinary quadrature)

.

For any distinct

{c

k

}

ν

k=1

⊆

[

a, b

], let

{`

k

}

ν

k=1

be the Lagrange cardinal polynomials with respect to

{c

k

}

ν

k=1

. Then by choosing

b

k

=

Z

b

a

w(x)`

k

(x) dx,

the approximation

L(f) =

Z

b

a

w(x)f(x) dx ≈

ν

X

k=1

b

k

f(c

k

)

is exact for f ∈ P

ν−1

[x].

We call this method ordinary quadrature.

This simple idea is how we generate many classical numerical integration

techniques, such as the trapezoidal rule. But those are quite inaccurate. It turns

out a clever choice of

{c

k

}

does much better — take them to be the zeros of

the orthogonal polynomials. However, to do this, we must make sure the roots

indeed lie in [

a, b

]. This is what we will prove now — given any inner product,

the roots of the orthogonal polynomials must lie in [a, b].

Theorem.

For

ν ≥

1, the zeros of the orthogonal polynomial

p

ν

are real, distinct

and lie in (a, b).

We have in fact proved this for a particular case in IB Methods, and the

same argument applies.

Proof.

First we show there is at least one root. Notice that

p

0

= 1. Thus for

ν ≥ 1, by orthogonality, we know

Z

b

a

w(x)p

ν

(x)p

1

(x) dx =

Z

b

a

w(x)p

ν

(x) dx = 0.

So there is at least one sign change in (

a, b

). We have already got the result we

need for ν = 1, since we only need one zero in (a, b).

Now for

ν >

1, suppose

{ξ

j

}

m

j=1

are the places where the sign of

p

ν

changes

in (a, b) (which is a subset of the roots of p

ν

). We define

q(x) =

m

Y

j=1

(x − ξ

j

) ∈ P

m

[x].

Since this changes sign at the same place as

p

ν

, we know

qp

ν

maintains the same

sign in (a, b). Now if we had m < ν, then orthogonality gives

hq, p

ν

i =

Z

b

a

w(x)q(x)p

ν

(x) dx = 0,

which is impossible, since

qp

ν

does not change sign. Hence we must have

m = ν.

Theorem.

In the ordinary quadrature, if we pick

{c

k

}

ν

k=1

to be the roots of

p

ν

(

x

), then get we exactness for

f ∈ P

2ν−1

[

x

]. In addition,

{b

n

}

ν

k=1

are all

positive.

Proof. Let f ∈ P

2ν−1

[x]. Then by polynomial division, we get

f = qp

ν

+ r,

where

q, r

are polynomials of degree at most

ν −

1. We apply orthogonality to

get

Z

b

a

w(x)f(x) dx =

Z

b

a

w(x)(q(x)p

ν

(x) + r(x)) dx =

Z

b

a

w(x)r(x) dx.

Also, since each c

k

is a root of p

ν

, we get

ν

X

k=1

b

k

f(c

k

) =

ν

X

k=1

b

k

(q(c

k

)p

ν

(c

k

) + r(c

k

)) =

ν

X

k=1

b

k

r(c

k

).

But

r

has degree at most

ν −

1, and this formula is exact for polynomials in

P

ν−1

[x]. Hence we know

Z

b

a

w(x)f(x) dx =

Z

b

a

w(x)r(x) dx =

ν

X

k=1

b

k

r(c

k

) =

ν

X

k=1

b

k

f(c

k

).

To show the weights are positive, we simply pick as special

f

. Consider

f ∈

{`

2

k

}

ν

k=1

⊆ P

2ν−2

[

x

], for

`

k

the Lagrange cardinal polynomials for

{c

k

}

ν

k=1

. Since

the quadrature is exact for these, we get

0 <

Z

b

a

w(x)`

2

k

(x) dx =

ν

X

j=1

b

j

`

2

k

(c

j

) =

ν

X

j=1

b

j

δ

jk

= b

k

.

Since this is true for all k = 1, ··· , ν, we get the desired result.

4 Expressing errors in terms of derivatives

As before, we approximate a linear functional L by

L(f) ≈

n

X

i=0

a

i

L

i

(f),

where

L

i

are some simpler linear functionals, and suppose this is exact for

f ∈ P

k

[x] for some k ≥ 0.

Hence we know the error

e

L

(f) = L(f) −

n

X

i=0

a

i

L

i

(f) = 0

whenever

f ∈ P

k

[

x

]. We say the error annihilates for polynomials of degree less

than k.

How can we use this property to generate formulae for the error and error

bounds? We first start with a rather simple example.

Example. Let L(f) = f(β). We decide to be silly and approximate L(f) by

L(f) ≈ f(α) +

β − α

2

(f

0

(β) + f

0

(α)),

where α 6= β. This is clearly much easier to evaluate. The error is given by

e

L

(f) = f(β) − f(α) −

β − α

2

(f

0

(β) + f

0

(α)),

and this vanishes for f ∈ P

2

[x].

How can we get a more useful error formula? We can’t just use the fact that

it annihilates polynomials of degree

k

. We need to introduce something beyond

this — the k + 1th derivative. We now assume f ∈ C

k+1

[a, b].

Note that so far, everything we’ve done works if the interval is infinite, as long

as the weight function vanishes sufficiently quickly as we go far away. However,

for this little bit, we will need to require [

a, b

] to be finite, since we want to make

sure we can take the supremum of our functions.

We now seek an exact error formula in terms of

f

(k+1)

, and bounds of the

form

|e

L

(f)| ≤ c

L

kf

(k+1)

k

∞

for some constant

c

L

. Moreover, we want to make

c

L

as small as possible. We

don’t want to give a constant of 10 million, while in reality we can just use 2.

Definition

(Sharp error bound)

.

The constant

c

L

is said to be sharp if for any

ε > 0, there is some f

ε

∈ C

k+1

[a, b] such that

|e

L

(f)| ≥ (c

L

− ε)kf

(k+1)

ε

k

∞

.

This makes it precise what we mean by

c

L

“cannot be better”. This doesn’t

say anything about whether

c

L

can actually be achieved. This depends on the

particular form of the question.

To proceed, we need Taylor’s theorem with the integral remainder, i.e.

f(x) = f(a) + (x −a)f

0

(a) + ···+

(x − a)

k

k!

f

(k)

(a) +

1

k!

Z

x

a

(x −θ)

k

f

(k+1)

(θ) dθ.

This is not really good, since there is an

x

in the upper limit of the integral.

Instead, we write the integral as

Z

b

a

(x − θ)

k

+

f

(k+1)

(θ) dθ,

where we define (x − θ)

k

+

is a new function defined by

(x − θ)

k

+

=

(

(x − θ)

k

x ≥ θ

0 x < θ.

Then if λ is a linear functional that annihilates P

k

[x], then we have

λ(f) = λ

1

k!

Z

b

a

(x − θ)

k

+

f

(k+1)

(θ) dθ

!

for all f ∈ C

k+1

[a, b].

For our linear functionals, we can simplify by taking the

λ

inside the integral

sign and obtain

λ(f) =

1

k!

Z

b

a

λ((x − θ)

k

+

)f

(k+1)

(θ) dθ,

noting that

λ

acts on (

x − θ

)

k

+

∈ C

k−1

[

a, b

] as a function of

x

, and think of

θ

as

being held constant.

Of course, pure mathematicians will come up with linear functionals for

which we cannot move the

λ

inside, but for our linear functionals (point values,

derivative point values, integrals etc.), this is valid, as we can verify directly.

Hence we arrive at

Theorem

(Peano kernel theorem)

.

If

λ

annihilates polynomials of degree

k

or

less, then

λ(f) =

1

k!

Z

b

a

K(θ)f

(k+1)

(θ) dθ

for all f ∈ C

k+1

[a, b], where

Definition (Peano kernel). The Peano kernel is

K(θ) = λ((x − θ)

k

+

).

The important thing is that the kernel

K

is independent of

f

. Taking suprema

in different ways, we obtain different forms of bounds:

|λ(f)| ≤

1

k!

Z

b

a

|K(θ)| dθkf

(k+1)

k

∞

Z

b

a

|K(θ)|

2

dθ

!

1

2

kf

(k+1)

k

2

kK(θ)k

∞

kf

(k+1)

k

1

.

Hence we can find the constant

c

L

for different choices of the norm. When

computing c

L

, don’t forget the factor of

1

k!

!

By fiddling with functions a bit, we can show these bounds are indeed sharp.

Example. Consider our previous example where

e

L

(f) = f(β) − f(α) −

β − α

2

(f

0

(β) + f

0

(α)),

with exactness up to polynomials of degree 2. We wlog assume α < β. Then

K(θ) = e

L

((x − θ)

2

+

) = (β − θ)

2

+

− (α − θ)

2

+

− (β − α)((β − θ)

+

+ (α − θ)

+

).

Hence we get

K(θ) =

0 a ≤ θ ≤ α

(α − θ)(β − θ) α ≤ θ ≤ β

0 β ≤ θ ≤ b.

Hence we know

e

L

(f) =

1

2

Z

β

α

(α − θ)(β − θ)f

000

(θ) dθ

for all f ∈ C

3

[a, b].

Note that in this particular case, our function

K

(

θ

) does not change sign on

[a, b]. Under this extra assumption, we can say a bit more.

First, we note that the bound

|λ(f)| ≤

1

k!

Z

b

a

K(θ) dθ

kf

(k+1)

k

∞

can be achieved by

x

k+1

, since this has constant

k

+ 1th derivative. Also, we

can use the integral mean value theorem to get the bound

λ(f) =

1

k!

Z

b

a

K(θ) dθ

!

f

(k+1)

(ξ),

where ξ ∈ (a, b) depends on f. These are occasionally useful.

Example.

Continuing our previous example, we see that

K

(

θ

)

≤

0 on [

a, b

],

and

Z

b

a

K(θ) dθ = −

1

6

(β − α)

3

.

Hence we have the bound

|e

L

(f)| ≤

1

12

(β − α)

3

kf

000

k

∞

,

and this bound is achieved for x

3

. We also have

e

L

(f) = −

1

12

(β − α)

3

f

000

(ξ)

for some f-dependent value of some ξ ∈ (a, b).

Finally, note that Peano’s kernel theorem says if

e

L

(

f

) = 0 for all

f ∈ P

k

[

x

],

then we have

e

L

(f) =

1

k!

Z

b

a

K(θ)f

(k+1)

(θ) dθ

for all f ∈ C

k+1

[a, b].

But for any other fixed

j

= 0

, ··· , k −

1, we also have

e

L

(

f

) = 0 for all

f ∈ P

j

[x]. So we also know

e

L

(f) =

1

j!

Z

b

a

K

j

(θ)f

(j+1)

(θ) dθ

for all f ∈ C

j+1

[a, b]. Note that we have a different kernel.

In general, this might not be a good idea, since we are throwing information

away. Yet, this can be helpful if we get some less smooth functions that don’t

have k + 1 derivatives.

5 Ordinary differential equations

5.1 Introduction

Our next big goal is to solve ordinary differential equations numerically. We will

focus on differential equations of the form

y

0

(t) = f(t, y(t))

for 0 ≤ t ≤ T , with initial conditions

y(0) = y

0

.

The data we are provided is the function

f

:

R×R

N

→ R

N

, the ending time

T >

0,

and the initial condition

y

0

∈ R

n

. What we seek is the function

y

: [0

, T

]

→ R

N

.

When solving the differential equation numerically, our goal would be to

make our numerical solution as close to the true solution as possible. This makes

sense only if a “true” solution actually exists, and is unique. From IB Analysis

II, we know a unique solution to the ODE exists if f is Lipschitz.

Definition

(Lipschitz function)

.

A function

f

:

R × R

N

→ R

N

is Lipschitz with

Lipschitz constant λ ≥ 0 if

kf(t, x) − f (t,

ˆ

x)k ≤ λkx −

ˆ

xk

for all t ∈ [0, T ] and x,

ˆ

x ∈ R

N

.

A function is Lipschitz if it is Lipschitz with Lipschitz constant

λ

for some

λ

.

It doesn’t really matter what norm we pick. It will just change the

λ

. The

importance is the existence of a λ.

A special case is when

λ

= 0, i.e.

f

does not depend on

x

. In this case, this

is just an integration problem, and is usually easy. This is a convenient test case

— if our numerical approximation does not even work for these easy problems,

then it’s pretty useless.

Being Lipschitz is sufficient for existence and uniqueness of a solution to

the differential equation, and hence we can ask if our solution converges to

this unique solution. An extra assumption we will often make is that

f

can

be expanded in a Taylor series to as many degrees as we want, since this is

convenient for our analysis.

What exactly does a numerical solution to the ODE consist of? We first

choose a small time step h > 0, and then construct approximations

y

n

≈ y(t

n

), n = 1, 2, ··· ,

with

t

n

=

nh

. In particular,

t

n

− t

n−1

=

h

and is always constant. In practice,

we don’t fix the step size

t

n

− t

n−1

, and allow it to vary in each step. However,

this makes the analysis much more complicated, and we will not consider varying

time steps in this course.

If we make

h

smaller, then we will (probably) make better approximations.

However, this is more computationally demanding. So we want to study the

behaviour of numerical methods in order to figure out what h we should pick.

5.2 One-step methods

There are many ways we can classify numerical methods. One important classifi-

cation is one-step versus multi-step methods. In one-step methods, the value of

y

n+1

depends only on the previous iteration

t

n

and

y

n

. In multi-step methods,

we are allowed to look back further in time and use further results.

Definition

((Explicit) one-step method)

.

A numerical method is (explicit)

one-step if y

n+1

depends only on t

n

and y

n

, i.e.

y

n+1

= φ

h

(t

n

, y

n

)

for some function φ

h

: R × R

N

→ R

N

.

We will later see what “explicit” means.

The simplest one-step method one can imagine is Euler’s method.

Definition (Euler’s method). Euler’s method uses the formula

y

n+1

= y

n

+ hf(t

n

, y

n

).

We want to show that this method “converges”. First of all, we need to make

precise the notion of “convergence”. The Lipschitz condition means there is a

unique solution to the differential equation. So we would want the numerical

solution to be able to approximate the actual solution to arbitrary accuracy as

long as we take a small enough h.

Definition

(Convergence of numerical method)

.

For each

h >

0, we can produce

a sequence of discrete values

y

n

for

n

= 0

, ··· ,

[

T/h

], where [

T/h

] is the integer

part of

T/h

. A method converges if, as

h →

0 and

nh → t

(hence

n → ∞

), we

get

y

n

→ y(t),

where

y

is the true solution to the differential equation. Moreover, we require

the convergence to be uniform in t.

We now prove that Euler’s method converges. We will only do this properly for

Euler’s method, since the algebra quickly becomes tedious and incomprehensible.

However, the proof strategy is sufficiently general that it can be adapted to most

other methods.

Theorem (Convergence of Euler’s method).

(i) For all t ∈ [0, T ], we have

lim

h→0

nh→t

y

n

− y(t) = 0.

(ii) Let λ be the Lipschitz constant of f. Then there exists a c ≥ 0 such that

ke

n

k ≤ ch

e

λT

− 1

λ

for all 0 ≤ n ≤ [T/h], where e

n

= y

n

− y(t

n

).

Note that the bound in the second part is uniform. So this immediately gives

the first part of the theorem.

Proof.

There are two parts to proving this. We first look at the local truncation

error. This is the error we would get at each step assuming we got the previous

steps right. More precisely, we write

y(t

n+1

) = y(t

n

) + h(f, t

n

, y(t

n

)) + R

n

,

and

R

n

is the local truncation error. For the Euler’s method, it is easy to get

R

n

, since

f

(

t

n

, y

(

t

n

)) =

y

0

(

t

n

), by definition. So this is just the Taylor series

expansion of

y

. We can write

R

n

as the integral remainder of the Taylor series,

R

n

=

Z

t

n+1

t

n

(t

n+1

− θ)y

00

(θ) dθ.

By some careful analysis, we get

kR

n

k

∞

≤ ch

2

,

where

c =

1

2

ky

00

k

∞

.

This is the easy part, and tends to go rather smoothly even for more complicated

methods.

Once we have bounded the local truncation error, we patch them together to

get the actual error. We can write

e

n+1

= y

n+1

− y(t

n+1

)

= y

n

+ hf(t

n

, y

n

) −

y(t

n

) + hf(t

n

, y(t

n

)) + R

n

= (y

n

− y(t

n

)) + h

f(t

n

, y

n

) − f(t

n

, y(t

n

))

− R

n

Taking the infinity norm, we get

ke

n+1

k

∞

≤ ky

n

− y(t

n

)k

∞

+ hkf(t

n

, y

n

) − f(t

n

, y(t

n

))k

∞

+ kR

n

k

∞

≤ ke

n

k

∞

+ hλke

n

k

∞

+ ch

2

= (1 + λh)ke

n

k

∞

+ ch

2

.

This is valid for all

n ≥

0. We also know

ke

0

k

= 0. Doing some algebra, we get

ke

n

k

∞

≤ ch

2

n−1

X

j=0

(1 + hλ)

j

≤

ch

λ

((1 + hλ)

n

− 1) .

Finally, we have

(1 + hλ) ≤ e

λh

,

since 1 +

λh

is the first two terms of the Taylor series, and the other terms are

positive. So

(1 + hλ)

n

≤ e

λhn

≤ e

λT

.

So we obtain the bound

ke

n

k

∞

≤ ch

e

λT

− 1

λ

.

Then this tends to 0 as we take h → 0. So the method converges.

This works as long as

λ 6

= 0. However,

λ

= 0 is the easy case, since it is

just integration. We can either check this case directly, or use the fact that

e

λT

−1

λ

→ T as λ → 0.

The same proof strategy works for most numerical methods, but the algebra

will be much messier. We will not do those in full. We, however, take note of

some useful terminology:

Definition

(Local truncation error)

.

For a general (multi-step) numerical

method

y

n+1

= φ(t

n

, y

0

, y

1

, ··· , y

n

),

the local truncation error is

η

n+1

= y(t

n+1

) − φ

n

(t

n

, y(t

0

), y(t

1

), ··· , y(t

n

)).

This is the error we will make at the (

n

+ 1)th step if we had accurate values

for the first n steps.

For Euler’s method, the local truncation error is just the Taylor series

remainder term.

Definition

(Order)

.

The order of a numerical method is the largest

p ≥

1 such

that η

n+1

= O(h

p+1

).

The Euler method has order 1. Notice that this is one less than the power

of the local truncation error, since when we look at the global error, we drop a

power, and only have e

n

∼ h.

Let’s try to get a little bit beyond Euler’s method.

Definition (θ-method). For θ ∈ [0, 1], the θ-method is

y

n+1

= y

n

+ h

θf(t

n

, y

n

) + (1 − θ)f(t

n+1

, y

n+1

)

.

If we put

θ

= 1, then we just get Euler’s method. The other two most

common choices of θ are θ = 0 (backward Euler) and θ =

1

2

(trapezoidal rule).

Note that for

θ 6

= 1, we get an implicit method. This is since

y

n+1

doesn’t

just appear simply on the left hand side of the equality. Our formula for

y

n+1

involves

y

n+1

itself! This means, in general, unlike the Euler method, we can’t

just write down the value of

y

n+1

given the value of

y

n

. Instead, we have to

treat the formula as

N

(in general) non-linear equations, and solve them to find

y

n+1

!

In the past, people did not like to use this, because they didn’t have computers,

or computers were too slow. It is tedious to have to solve these equations in

every step of the method. Nowadays, these are becoming more and more popular

because it is getting easier to solve equations, and

θ

-methods have some huge

theoretical advantages (which we do not have time to get into).

We now look at the error of the θ-method. We have

η = y(t

n+1

) − y(t

n

) − h

θy

0

(t

n

) + (1 − θ)y

0

(t

n+1

)

We expand all terms about t

n

with Taylor’s series to obtain

=

θ −

1

2

h

2

y

00

(t

n

) +

1

2

θ −

1

3

h

3

y

000

(t

n

) + O(h

4

).

We see that

θ

=

1

2

gives us an order 2 method. Otherwise, we get an order 1

method.

5.3 Multi-step methods

We can try to make our methods more efficient by making use of previous values

of

y

n

instead of the most recent one. One common method is the AB2 method:

Definition

(2-step Adams-Bashforth method)

.

The 2-step Adams-Bashforth

(AB2) method has

y

n+2

= y

n+1

+

1

2

h (3f (t

n+1

, y

n+1

) − f(t

n

, y

n

)) .

This is a particular example of the general class of Adams-Bashforth methods.

In general, a multi-step method is defined as follows:

Definition (Multi-step method). A s-step numerical method is given by

s

X

`=0

ρ

`

y

n+`

= h

s

X

`=0

σ

`

f(t

n+`

, y

n+`

).

This formula is used to find the value of y

n+s

given the others.

One point to note is that we get the same method if we multiply all the

constants

ρ

`

, σ

`

by a non-zero constant. By convention, we normalize this by

setting ρ

s

= 1. Then we can alternatively write this as

y

n+s

= h

s

X

`=0

σ

`

f(t

n+`

, y

n+`

) −

s−1

X

`=0

ρ

`

y

n+`

.

This method is an implicit method if σ

s

6= 0. Otherwise, it is explicit.

Note that this method is linear in the sense that the coefficients

ρ

`

and

σ

`

appear linearly, outside the

f

s. Later we will </