Part IB — Methods
Based on lectures by D. B. Skinner
Notes taken by Dexter Chua
Michaelmas 2015
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
Self-adjoint ODEs
Periodic functions. Fourier series: definition and simple properties; Parseval's theorem.
Equations of second order. Self-adjoint differential operators. The Sturm–Liouville
equation; eigenfunctions and eigenvalues; reality of eigenvalues and orthogonality of
eigenfunctions; eigenfunction expansions (Fourier series as prototype), approximation
in mean square, statement of completeness. [5]
PDEs on bounded domains: separation of variables
Physical basis of Laplace’s equation, the wave equation and the diffusion equation.
General method of separation of variables in Cartesian, cylindrical and spherical
coordinates. Legendre's equation: derivation, solutions including explicit forms of
$P_0$, $P_1$ and $P_2$, orthogonality. Bessel's equation of integer order as an example of a
self-adjoint eigenvalue problem with non-trivial weight.
Examples including potentials on rectangular and circular domains and on a spherical
domain (axisymmetric case only), waves on a finite string and heat flow down a
semi-infinite rod. [5]
Inhomogeneous ODEs: Green’s functions
Properties of the Dirac delta function. Initial value problems and forced problems with
two fixed end points; solution using Green’s functions. Eigenfunction expansions of the
delta function and Green's functions. [4]
Fourier transforms
Fourier transforms: definition and simple properties; inversion and convolution theorems.
The discrete Fourier transform. Examples of application to linear systems. Relationship
of transfer function to Green’s function for initial value problems. [4]
PDEs on unbounded domains
Classification of PDEs in two independent variables. Well-posedness. Solution by
the method of characteristics. Green’s functions for PDEs in 1, 2 and 3 independent
variables; fundamental solutions of the wave equation, Laplace’s equation and the
diffusion equation. The method of images. Application to the forced wave equation,
Poisson’s equation and forced diffusion equation. Transient solutions of diffusion
problems: the error function. [6]
Contents
0 Introduction
1 Vector spaces
2 Fourier series
2.1 Fourier series
2.2 Convergence of Fourier series
2.3 Differentiability and Fourier series
3 Sturm–Liouville Theory
3.1 Sturm–Liouville operators
3.2 Least squares approximation
4 Partial differential equations
4.1 Laplace’s equation
4.2 Laplace's equation in the unit disk in $\mathbb{R}^2$
4.3 Separation of variables
4.4 Laplace’s equation in spherical polar coordinates
4.4.1 Laplace’s equation in spherical polar coordinates
4.4.2 Legendre Polynomials
4.4.3 Solution to radial part
4.5 Multipole expansions for Laplace’s equation
4.6 Laplace’s equation in cylindrical coordinates
4.7 The heat equation
4.8 The wave equation
5 Distributions
5.1 Distributions
5.2 Green’s functions
5.3 Green’s functions for initial value problems
6 Fourier transforms
6.1 The Fourier transform
6.2 The Fourier inversion theorem
6.3 Parseval’s theorem for Fourier transforms
6.4 A word of warning
6.5 Fourier transformation of distributions
6.6 Linear systems and response functions
6.7 General form of transfer function
6.8 The discrete Fourier transform
6.9 The fast Fourier transform*
7 More partial differential equations
7.1 Well-posedness
7.2 Method of characteristics
7.3 Characteristics for 2nd order partial differential equations
7.4 Green's functions for PDEs on $\mathbb{R}^n$
7.5 Poisson’s equation
0 Introduction
In the previous courses, the (partial) differential equations we have seen are
mostly linear. For example, we have Laplace's equation:
$$\frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} = 0,$$
and the heat equation:
$$\frac{\partial \phi}{\partial t} = \kappa \left( \frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} \right).$$
The Schrödinger equation in quantum mechanics is also linear:
$$i\hbar \frac{\partial \Phi}{\partial t} = -\frac{\hbar^2}{2m} \frac{\partial^2 \Phi}{\partial x^2} + V(x)\Phi(x).$$
By being linear, these equations have the property that if $\phi_1, \phi_2$ are solutions, then so is $\lambda_1 \phi_1 + \lambda_2 \phi_2$ (for any constants $\lambda_i$).
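This superposition property is easy to check numerically. Below is a small sketch (my own illustration, not from the lectures): the five-point finite-difference Laplacian is a linear operator, so applying it to two harmonic functions and to a linear combination of them gives (approximately) zero in all three cases.

```python
import math

# Numerical check of superposition for Laplace's equation (my own
# illustration, not part of the notes).
def laplacian(phi, x, y, h=1e-3):
    """Five-point finite-difference approximation to phi_xx + phi_yy."""
    return (phi(x + h, y) + phi(x - h, y) + phi(x, y + h) + phi(x, y - h)
            - 4 * phi(x, y)) / h**2

phi1 = lambda x, y: x**2 - y**2   # harmonic
phi2 = lambda x, y: x * y         # harmonic
combo = lambda x, y: 3 * phi1(x, y) - 2 * phi2(x, y)

# The Laplacian is linear, so the combination is again a solution.
for f in (phi1, phi2, combo):
    assert abs(laplacian(f, 0.7, 0.3)) < 1e-6
```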
Why are all these linear? In general, if we just randomly write down a
differential equation, most likely it is not going to be linear. So where did the
linearity of equations of physics come from?
The answer is that the real world is not linear in general. However, often we are not looking for a completely accurate and precise description of the universe. When we have low energy/speed/whatever, we can often quite accurately approximate reality by a linear equation. For example, the equations of general relativity are very complicated and nowhere near linear, but for small masses and velocities, they reduce to Newton's law of gravitation, which is linear.
The only exception to this seems to be Schrödinger's equation. While there are many theories and equations that superseded the Schrödinger equation, these are all still linear in nature. It seems that linearity is the thing that underpins quantum mechanics.
Due to the prevalence of linear equations, it is rather important that we understand these equations well, and this is the primary objective of the course.
1 Vector spaces
When dealing with functions and differential equations, we will often think of
the space of functions as a vector space. In many cases, we will try to find a
“basis” for our space of functions, and expand our functions in terms of the basis.
Under different situations, we would want to use a different basis for our space.
For example, when dealing with periodic functions, we will want to pick basis
elements that are themselves periodic. In other situations, these basis elements
would not be that helpful.
A familiar example would be the Taylor series, where we try to approximate a function $f$ by
$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n.$$
Here we are thinking of $\{x^n : n \in \mathbb{N}\}$ as the basis of our space, and trying to approximate an arbitrary function as a sum of the basis elements. When writing the function $f$ as a sum like this, it is of course important to consider whether the sum converges, and when it does, whether it actually converges back to $f$.
Another issue of concern is if we have a general set of basis functions $\{y_n\}$, how can we find the coefficients $c_n$ such that $f(x) = \sum c_n y_n(x)$? This is the bit where linear algebra comes in. Finding these coefficients is something we understand well in linear algebra, and we will attempt to borrow the results and apply them to our space of functions.
Another concept we would want to borrow is eigenvalues and eigenfunctions, as well as self-adjoint ("Hermitian") operators. As we go along the course, we will see some close connections between functions and vector spaces, and we can often get inspiration from linear algebra.
Of course, there is no guarantee that the results from linear algebra will apply directly, since most of our linear algebra results were about finite bases and finite linear sums. However, it is often a good starting point, and usually works when dealing with sufficiently nice functions.
We start with some preliminary definitions, which should be familiar from
IA Vectors and Matrices and/or IB Linear Algebra.
Definition (Vector space). A vector space over $\mathbb{C}$ (or $\mathbb{R}$) is a set $V$ with an operation $+$ which obeys
(i) u + v = v + u (commutativity)
(ii) (u + v) + w = u + (v + w) (associativity)
(iii) There is some 0 ∈ V such that 0 + u = u for all u (identity)
We can also multiply vectors by scalars $\lambda \in \mathbb{C}$, which satisfies
(i) λ(µv) = (λµ)v (associativity)
(ii) λ(u + v) = λu + λv (distributivity in V)
(iii) (λ + µ)u = λu + µu (distributivity in C)
(iv) 1v = v (identity)
Often, we wouldn’t have just a vector space. We usually give them some
additional structure, such as an inner product.
Definition (Inner product). An inner product on $V$ is a map $(\cdot, \cdot) : V \times V \to \mathbb{C}$ that satisfies
(i) (u, λv) = λ(u, v) (linearity in second argument)
(ii) (u, v + w) = (u, v) + (u, w) (additivity)
(iii) (u, v) = (v, u)* (conjugate symmetry)
(iv) (u, u) ≥ 0, with equality iff u = 0 (positivity)
Note that the positivity condition makes sense since conjugate symmetry entails
that (u, u) ∈ R.
The inner product in turn defines a norm $\|u\| = \sqrt{(u, u)}$ that provides the notion of length and distance.
It is important to note that we only have linearity in the second argument. For the first argument, we have $(\lambda u, v) = (v, \lambda u)^* = \lambda^* (v, u)^* = \lambda^* (u, v)$.
Definition (Basis). A set of vectors $\{v_1, v_2, \cdots, v_n\}$ form a basis of $V$ iff any $u \in V$ can be uniquely written as a linear combination
$$u = \sum_{i=1}^{n} \lambda_i v_i$$
for some scalars $\lambda_i$. The dimension of a vector space is the number of basis vectors in its basis.
A basis is orthogonal (with respect to the inner product) if $(v_i, v_j) = 0$ whenever $i \neq j$.
A basis is orthonormal (with respect to the inner product) if it is orthogonal and $(v_i, v_i) = 1$ for all $i$.
Orthonormal bases are the nice bases, and these are what we want to work
with.
Given an orthonormal basis, we can use the inner product to find the expansion of any $u \in V$ in terms of the basis, for if
$$u = \sum_i \lambda_i v_i,$$
taking the inner product with $v_j$ gives
$$(v_j, u) = \left( v_j, \sum_i \lambda_i v_i \right) = \sum_i \lambda_i (v_j, v_i) = \lambda_j,$$
using additivity and linearity. Hence we get the general formula
$$\lambda_i = (v_i, u).$$
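As a toy illustration (my own, with made-up vectors), we can recover the coefficients $\lambda_i = (v_i, u)$ for a small orthonormal basis of $\mathbb{C}^2$, using the convention that the inner product is linear in the second argument:

```python
# Recovering expansion coefficients in an orthonormal basis of C^2
# (my own illustration, not part of the notes).
def inner(u, v):
    # (u, v) = sum u_i* v_i, linear in the *second* argument.
    return sum(a.conjugate() * b for a, b in zip(u, v))

# An orthonormal basis of C^2.
v1 = (1 / 2**0.5, 1 / 2**0.5)
v2 = (1 / 2**0.5, -1 / 2**0.5)

u = (3 + 1j, 1 - 2j)
lam = [inner(v, u) for v in (v1, v2)]   # lambda_i = (v_i, u)

# Reconstruct u from the coefficients: u = sum lambda_i v_i.
recon = tuple(lam[0] * a + lam[1] * b for a, b in zip(v1, v2))
assert all(abs(r - x) < 1e-12 for r, x in zip(recon, u))
```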
We have seen all these so far in IA Vectors and Matrices, where a vector is a list
of finitely many numbers. However, functions can also be thought of as elements
of an (infinite dimensional) vector space.
Suppose we have $f, g : \Omega \to \mathbb{C}$. Then we can define the sum $f + g$ by $(f + g)(x) = f(x) + g(x)$. Given a scalar $\lambda$, we can also define $(\lambda f)(x) = \lambda f(x)$.
This also makes intuitive sense. We can simply view a function as a list of numbers, where we list out the values of $f$ at each point. The list could be infinite, but a list nonetheless.
Most of the time, we don’t want to look at the set of all functions. That
would be too huge and uninteresting. A natural class of functions to consider
would be the set of solutions to some particular differential equation. However,
this doesn’t always work. For this class to actually be a vector space, the sum
of two solutions and the scalar multiple of a solution must also be a solution.
This is exactly the requirement that the differential equation is linear. Hence,
the set of solutions to a linear differential equation would form a vector space.
Linearity pops up again.
Now what about the inner product? A natural definition is
$$(f, g) = \int_\Sigma f(x)^* g(x)\, \mathrm{d}\mu,$$
where $\mu$ is some measure. For example, we could integrate $\mathrm{d}x$, or $\mathrm{d}x^2$. This measure specifies how much weighting we give to each point $x$.
Why does this definition make sense? Recall that the usual inner product on finite-dimensional vector spaces is $\sum v_i^* w_i$. Here we are just summing the different components of $v$ and $w$. We just said we can think of the function $f$ as a list of all its values, and this integral is just the sum of all components of $f$ and $g$.
Example. Let $\Sigma = [a, b]$. Then we could take
$$(f, g) = \int_a^b f(x)^* g(x)\, \mathrm{d}x.$$
Alternatively, let $\Sigma = D^2 \subseteq \mathbb{R}^2$ be the unit disk. Then we could have
$$(f, g) = \int_0^1 \int_0^{2\pi} f(r, \theta)^* g(r, \theta)\, \mathrm{d}\theta\, r\, \mathrm{d}r.$$
Note that we were careful and said that $\mathrm{d}\mu$ is "some measure". Here we are integrating against $\mathrm{d}\theta\, r\, \mathrm{d}r$. We will later see cases where this can be even more complicated.
If Σ has a boundary, we will often want to restrict our functions to take
particular values on the boundary, known as boundary conditions. Often, we
want the boundary conditions to preserve linearity. We call these nice boundary
conditions homogeneous conditions.
Definition (Homogeneous boundary conditions). A boundary condition is homogeneous if whenever $f$ and $g$ satisfy the boundary conditions, then so does $\lambda f + \mu g$ for any $\lambda, \mu \in \mathbb{C}$ (or $\mathbb{R}$).
Example. Let $\Sigma = [a, b]$. We could require that $f(a) + 7f'(b) = 0$, or maybe $f(a) + 3f''(a) = 0$. These are examples of homogeneous boundary conditions. On the other hand, the requirement $f(a) = 1$ is not homogeneous.
2 Fourier series
The first type of functions we will consider is periodic functions.
Definition (Periodic function). A function $f$ is periodic if there is some fixed $R$ such that $f(x + R) = f(x)$ for all $x$.
However, it is often much more convenient to think of this as a function $f : S^1 \to \mathbb{C}$ from the unit circle to $\mathbb{C}$, and parametrize our function by an angle $\theta$ ranging from $0$ to $2\pi$.
Now why do we care about periodic functions? Apart from the fact that
genuine periodic functions exist, we can also use them to model functions on a
compact domain. For example, if we have a function defined on [0
,
1], we can
pretend it is a periodic function on
R
by making infinitely many copies of the
function to the intervals [1, 2], [2, 3] etc.
2.1 Fourier series
So. As mentioned in the previous chapter, we want to find a set of "basis functions" for periodic functions. We could go with the simplest periodic functions we know of: the exponentials $e^{in\theta}$. These functions have a period of $2\pi$, and are rather easy to work with. We all know how to integrate and differentiate the exponential function.
More importantly, this set of basis functions is orthogonal. We have
$$(e^{im\theta}, e^{in\theta}) = \int_{-\pi}^{\pi} e^{-im\theta} e^{in\theta}\, \mathrm{d}\theta = \int_{-\pi}^{\pi} e^{i(n - m)\theta}\, \mathrm{d}\theta = \begin{cases} 2\pi & n = m \\ 0 & n \neq m \end{cases} = 2\pi \delta_{nm}.$$
We can normalize these to get a set of orthonormal functions
$$\left\{ \frac{1}{\sqrt{2\pi}} e^{in\theta} : n \in \mathbb{Z} \right\}.$$
Fourier's idea was to use this as a basis for any periodic function. Fourier claimed that any $f : S^1 \to \mathbb{C}$ can be expanded in this basis:
$$f(\theta) = \sum_{n \in \mathbb{Z}} \hat{f}_n e^{in\theta},$$
where
$$\hat{f}_n = \frac{1}{2\pi} (e^{in\theta}, f) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-in\theta} f(\theta)\, \mathrm{d}\theta.$$
These really should be defined as $f(\theta) = \sum \hat{f}_n \frac{e^{in\theta}}{\sqrt{2\pi}}$ with $\hat{f}_n = \left( \frac{e^{in\theta}}{\sqrt{2\pi}}, f \right)$, but for convenience reasons, we move all the constant factors to the $\hat{f}_n$ coefficients.
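As a quick numerical sketch (my own, not from the notes), we can approximate the coefficients $\hat{f}_n$ by a Riemann sum on a uniform grid; for smooth periodic integrands this is extremely accurate:

```python
import cmath
import math

# Approximate f_hat(n) = (1/2*pi) * integral of e^{-in*theta} f(theta)
# (my own numerical illustration, not part of the notes).
def fourier_coeff(f, n, N=1024):
    h = 2 * math.pi / N
    return sum(cmath.exp(-1j * n * (-math.pi + k * h)) * f(-math.pi + k * h)
               for k in range(N)) * h / (2 * math.pi)

# Test function with known coefficients: f_hat(2) = 1, f_hat(-1) = 0.5.
f = lambda t: cmath.exp(2j * t) + 0.5 * cmath.exp(-1j * t)
assert abs(fourier_coeff(f, 2) - 1) < 1e-12
assert abs(fourier_coeff(f, -1) - 0.5) < 1e-12
assert abs(fourier_coeff(f, 0)) < 1e-12
```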
We can consider the special case where $f : S^1 \to \mathbb{R}$ is a real function. We might want to make our expansion look a bit more "real". We get
$$(\hat{f}_n)^* = \left( \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-in\theta} f(\theta)\, \mathrm{d}\theta \right)^* = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{in\theta} f(\theta)\, \mathrm{d}\theta = \hat{f}_{-n}.$$
So we can replace our Fourier series by
$$f(\theta) = \hat{f}_0 + \sum_{n=1}^{\infty} \left( \hat{f}_n e^{in\theta} + \hat{f}_{-n} e^{-in\theta} \right) = \hat{f}_0 + \sum_{n=1}^{\infty} \left( \hat{f}_n e^{in\theta} + \hat{f}_n^* e^{-in\theta} \right).$$
Setting $\hat{f}_n = \frac{a_n - ib_n}{2}$, we can write this as
$$f(\theta) = \hat{f}_0 + \sum_{n=1}^{\infty} (a_n \cos n\theta + b_n \sin n\theta) = \frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos n\theta + b_n \sin n\theta).$$
Here the coefficients are
$$a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} \cos n\theta\, f(\theta)\, \mathrm{d}\theta, \qquad b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} \sin n\theta\, f(\theta)\, \mathrm{d}\theta.$$
This is an alternative formulation of the Fourier series in terms of sin and cos.
So when given a real function, which expansion should we use? It depends. If
our function is odd (or even), it would be useful to pick the sine/cosine expansion,
since the cosine (or sine) terms will simply disappear. On the other hand, if we
want to stick our function into a differential equation, exponential functions are
usually more helpful.
2.2 Convergence of Fourier series
When Fourier first proposed the idea of a Fourier series, people didn't really believe him. How can we be sure that the infinite series actually converges?
It turns out that in many cases, they don’t.
To investigate the convergence of the series, we define the partial Fourier sum as
$$S_n f = \sum_{m=-n}^{n} \hat{f}_m e^{im\theta}.$$
The question we want to answer is whether $S_n f$ "converges" to $f$. Here we have to be careful with what we mean by convergence. As we (might) have seen in Analysis, there are many ways of defining convergence of functions. If we have a "norm" on the space of functions, we can define convergence to mean $\lim_{n \to \infty} \|S_n f - f\| = 0$. Our "norm" can be defined by
$$\|S_n f - f\|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |S_n f(\theta) - f(\theta)|^2\, \mathrm{d}\theta.$$
However, this doesn't really work if $S_n f$ and $f$ can be arbitrary functions, as this does not necessarily have to be a norm. Indeed, if $\lim_{n \to \infty} S_n f$ differs from $f$ on finitely or countably many points, the integral will still be zero. In particular, $\lim S_n f$ and $f$ can differ at all rational points, but this definition will say that $S_n f$ converges to $f$.
Hence another possible definition of convergence is to require
$$\lim_{n \to \infty} |S_n f(\theta) - f(\theta)| = 0$$
for all $\theta$. This is known as pointwise convergence. However, this is often too weak a notion. We can ask for more, and require that the rate of convergence is independent of $\theta$. This is known as uniform convergence, and is defined by
$$\lim_{n \to \infty} \sup_\theta |S_n f(\theta) - f(\theta)| = 0.$$
Of course, with different definitions of convergence, we can get different answers
to whether it converges. Unfortunately, even if we manage to get our definition
of convergence right, it is still difficult to come up with a criterion to decide if a
function has a convergent Fourier Series.
It is not difficult to prove a general criterion for convergence in norm, but
we will not do that, since that is analysis. If interested, one should take the IID
Analysis of Functions course. In this course, instead of trying to come up with
something general, let’s look at an example instead.
Example. Consider the sawtooth function f(θ) = θ.
Note that this function is discontinuous at odd multiples of π.
The Fourier coefficients (for $n \neq 0$) are
$$\hat{f}_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-in\theta} \theta\, \mathrm{d}\theta = \left[ -\frac{1}{2\pi in} e^{-in\theta} \theta \right]_{-\pi}^{\pi} + \frac{1}{2\pi in} \int_{-\pi}^{\pi} e^{-in\theta}\, \mathrm{d}\theta = \frac{(-1)^{n+1}}{in}.$$
We also have $\hat{f}_0 = 0$. Hence we have
$$\theta = \sum_{n \neq 0} \frac{(-1)^{n+1}}{in} e^{in\theta}.$$
It turns out that this series converges to the sawtooth for all $\theta \neq (2m + 1)\pi$, i.e. everywhere that the sawtooth is continuous.
Let's look explicitly at the case where $\theta = \pi$. The $n$ and $-n$ terms of the partial Fourier sum cancel in pairs, so every partial sum is zero, and we can say that the Fourier series converges to $0$. This is the average value of $\lim_{\varepsilon \to 0} f(\pi \pm \varepsilon)$.
This is typical. At an isolated discontinuity, the Fourier series converges to the average of the limiting values of the original function as we approach from either side.
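We can watch this happen numerically. The following sketch (my own illustration, not from the lectures) sums the sawtooth series: at the jump $\theta = \pi$ the symmetric partial sums are exactly zero, while at a point of continuity they slowly approach $f(\theta) = \theta$.

```python
import cmath
import math

# Symmetric partial sums of the sawtooth series
# S_N(theta) = sum over 0 < |n| <= N of ((-1)^{n+1}/(i n)) e^{i n theta}
# (my own numerical illustration).
def S(N, theta):
    total = 0
    for n in range(1, N + 1):
        c = (-1) ** (n + 1) / (1j * n)
        # f real, so the -n coefficient is the conjugate of the +n one.
        total += (c * cmath.exp(1j * n * theta)
                  + c.conjugate() * cmath.exp(-1j * n * theta))
    return total.real

# At the jump theta = pi the partial sums are exactly zero: the average
# of the one-sided limits +pi and -pi.
assert abs(S(1000, math.pi)) < 1e-9

# At a point of continuity the series converges (slowly) to theta.
assert abs(S(5000, 1.0) - 1.0) < 5e-3
```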
2.3 Differentiability and Fourier series
Integration is a smoothing operator. If we take, say, the step function
$$\Theta(x) = \begin{cases} 1 & x > 0 \\ 0 & x < 0 \end{cases},$$
then the integral is given by
$$\int \Theta(x)\, \mathrm{d}x = \begin{cases} x & x > 0 \\ 0 & x < 0 \end{cases}.$$
This is now a continuous function. If we integrate it again, we make the positive
side look like a quadratic curve, and it is now differentiable.
On the other hand, when we differentiate a function, we generally make it worse. For example, when we differentiate the continuous function $\int \Theta(x)\, \mathrm{d}x$, we obtain the discontinuous function $\Theta(x)$. If we attempt to differentiate this again, we get the $\delta$ (non-)function.
Hence, it is often helpful to characterize the “smoothness” of a function by
how many times we can differentiate it. It turns out this is rather relevant to
the behaviour of the Fourier series.
Suppose that we have a function that is itself continuous and whose first $m - 1$ derivatives are also continuous, but $f^{(m)}$ has isolated discontinuities at $\{\theta_1, \theta_2, \theta_3, \cdots, \theta_r\}$.
We can look at the $k$th Fourier coefficient:
$$\hat{f}_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-ik\theta} f(\theta)\, \mathrm{d}\theta = \left[ -\frac{1}{2\pi ik} e^{-ik\theta} f(\theta) \right]_{-\pi}^{\pi} + \frac{1}{2\pi ik} \int_{-\pi}^{\pi} e^{-ik\theta} f'(\theta)\, \mathrm{d}\theta.$$
The first term vanishes since $f(\theta)$ is continuous and takes the same value at $\pi$ and $-\pi$. So we are left with
$$\hat{f}_k = \frac{1}{2\pi ik} \int_{-\pi}^{\pi} e^{-ik\theta} f'(\theta)\, \mathrm{d}\theta = \cdots = \frac{1}{(ik)^m} \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-ik\theta} f^{(m)}(\theta)\, \mathrm{d}\theta.$$
Now we have to be careful, since $f^{(m)}$ is no longer continuous. However, provided that $f^{(m)}$ is everywhere finite, we can approximate this by removing small strips $(\theta_i - \varepsilon, \theta_i + \varepsilon)$ from our domain of integration, and take the limit $\varepsilon \to 0$. We can write this as
$$\hat{f}_k = \lim_{\varepsilon \to 0} \frac{1}{(ik)^m} \frac{1}{2\pi} \left( \int_{-\pi}^{\theta_1 - \varepsilon} + \int_{\theta_1 + \varepsilon}^{\theta_2 - \varepsilon} + \cdots + \int_{\theta_r + \varepsilon}^{\pi} \right) e^{-ik\theta} f^{(m)}(\theta)\, \mathrm{d}\theta$$
$$= \frac{1}{(ik)^{m+1}} \frac{1}{2\pi} \left[ \sum_{s=1}^{r} \left( f^{(m)}(\theta_s^+) - f^{(m)}(\theta_s^-) \right) e^{-ik\theta_s} + \int_{(-\pi, \pi) \setminus \{\theta_i\}} e^{-ik\theta} f^{(m+1)}(\theta)\, \mathrm{d}\theta \right].$$
We now have to stop. So $\hat{f}_k$ decays like $\frac{1}{k^{m+1}}$ if our function and its first $m - 1$ derivatives are continuous. This means that if a function is more differentiable, then the coefficients decay more quickly.
This makes sense. If we have a rather smooth function, then we would expect
the first few Fourier terms (with low frequency) to account for most of the
variation of f. Hence the coefficients decay really quickly.
However, if the function is jiggly and bumps around all the time, we would
expect to need some higher frequency terms to account for the minute variation.
Hence the terms would not decay away that quickly. So in general, if we can
differentiate it more times, then the terms should decay quicker.
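This decay rate is easy to observe numerically. The sketch below (my own illustration, not from the lectures) compares the sawtooth $f(\theta) = \theta$, which has a jump and coefficients of size $1/k$, with the continuous function $|\theta|$, whose coefficients decay like $1/k^2$ (indeed $|\hat{f}_k| = 2/(\pi k^2)$ for odd $k$, by a direct computation).

```python
import cmath
import math

# Coefficient decay vs smoothness (my own numerical illustration).
def fhat(f, k, N=8192):
    h = 2 * math.pi / N
    # Midpoint rule, so no sample lands exactly on the jump at +/- pi.
    return sum(cmath.exp(-1j * k * (-math.pi + (j + 0.5) * h))
               * f(-math.pi + (j + 0.5) * h) for j in range(N)) * h / (2 * math.pi)

saw = lambda t: t        # discontinuous: |f_hat(k)| = 1/k
tri = lambda t: abs(t)   # continuous with a kink: |f_hat(k)| = 2/(pi k^2), k odd

for k in (3, 5, 9):
    assert abs(k * abs(fhat(saw, k)) - 1) < 1e-2
    assert abs(k**2 * abs(fhat(tri, k)) - 2 / math.pi) < 1e-2
```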
An important result, which is in some sense a Linear Algebra result, is
Parseval’s theorem.
Theorem (Parseval's theorem).
$$(f, f) = \int_{-\pi}^{\pi} |f(\theta)|^2\, \mathrm{d}\theta = 2\pi \sum_{n \in \mathbb{Z}} |\hat{f}_n|^2.$$
Proof.
$$(f, f) = \int_{-\pi}^{\pi} |f(\theta)|^2\, \mathrm{d}\theta = \int_{-\pi}^{\pi} \left( \sum_{m \in \mathbb{Z}} \hat{f}_m^* e^{-im\theta} \right) \left( \sum_{n \in \mathbb{Z}} \hat{f}_n e^{in\theta} \right) \mathrm{d}\theta$$
$$= \sum_{m, n \in \mathbb{Z}} \hat{f}_m^* \hat{f}_n \int_{-\pi}^{\pi} e^{i(n - m)\theta}\, \mathrm{d}\theta = 2\pi \sum_{m, n \in \mathbb{Z}} \hat{f}_m^* \hat{f}_n \delta_{mn} = 2\pi \sum_{n \in \mathbb{Z}} |\hat{f}_n|^2.$$
Note that this is not a fully rigorous proof, since we assumed not only that
the Fourier series converges to the function, but also that we could commute the
infinite sums. However, for the purposes of an applied course, this is sufficient.
Last time, we computed that the sawtooth function $f(\theta) = \theta$ has Fourier coefficients
$$\hat{f}_0 = 0, \qquad \hat{f}_n = \frac{i(-1)^n}{n} \text{ for } n \neq 0.$$
But why do we care? It turns out this has some applications in number theory.
You might have heard of the Riemann $\zeta$-function, defined by
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}.$$
We will show that this obeys the property that for any $m$, $\zeta(2m) = \pi^{2m} q$ for some $q \in \mathbb{Q}$. This may not be obvious at first sight. So let's apply Parseval's theorem to the sawtooth $f(\theta) = \theta$. By direct computation, we know that
$$(f, f) = \int_{-\pi}^{\pi} \theta^2\, \mathrm{d}\theta = \frac{2\pi^3}{3}.$$
However, by Parseval's theorem, we know that
$$(f, f) = 2\pi \sum_{n \in \mathbb{Z}} |\hat{f}_n|^2 = 4\pi \sum_{n=1}^{\infty} \frac{1}{n^2}.$$
Putting these together, we learn that
$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \zeta(2) = \frac{\pi^2}{6}.$$
We have just done it for the case where
m
= 1. But if we integrate the sawtooth
function repeatedly, then we can get the general result for all m.
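A quick numerical sanity check (my own, not from the lectures) of the $m = 1$ case:

```python
import math

# Parseval for the sawtooth gives sum 1/n^2 = pi^2/6
# (my own numerical sanity check).
zeta2 = sum(1 / n**2 for n in range(1, 200001))
assert abs(zeta2 - math.pi**2 / 6) < 1e-5

# Equivalently, (f, f) = 2*pi^3/3 matches 4*pi * sum 1/n^2.
assert abs(4 * math.pi * zeta2 - 2 * math.pi**3 / 3) < 1e-3
```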
At this point, we might ask, why are we choosing these $e^{im\theta}$ as our basis? Surely there are a lot of different bases we could use. For example, in finite dimensions, we can just arbitrarily choose random vectors (that are not linearly dependent) to get a basis. However, in practice, we don't pick them randomly. We often choose a basis that reveals the symmetry of a system. For example, if we have a rotationally symmetric system, we would like to use polar coordinates. Similarly, if we have periodic functions, then $e^{im\theta}$ is often a good choice of basis.
3 Sturm–Liouville Theory
3.1 Sturm–Liouville operators
In finite dimensions, we often consider linear maps $M : V \to W$. If $\{v_i\}$ is a basis for $V$ and $\{w_a\}$ is a basis for $W$, then we can represent the map by a matrix with entries
$$M_{ai} = (w_a, M v_i).$$
A map $M : V \to V$ is called self-adjoint if $M^\dagger = M$ as matrices. However, it is not obvious how we can extend this notion to arbitrary maps between arbitrary vector spaces (with an inner product) when they cannot be represented by a matrix.
Instead, we make the following definitions:
Definition (Adjoint and self-adjoint). The adjoint $B$ of a map $A : V \to V$ is a map such that
$$(Bu, v) = (u, Av)$$
for all vectors $u, v \in V$. A map is then self-adjoint if
$$(Mu, v) = (u, Mv).$$
Self-adjoint matrices come with a natural basis. Recall that the eigenvalues of a matrix are the roots of $\det(M - \lambda I) = 0$. The eigenvector corresponding to an eigenvalue $\lambda_i$ is defined by $M v_i = \lambda_i v_i$.
In general, eigenvalues can be any complex number. However, self-adjoint maps have real eigenvalues. Suppose
$$M v_i = \lambda_i v_i.$$
Then we have
$$\lambda_i (v_i, v_i) = (v_i, M v_i) = (M v_i, v_i) = \lambda_i^* (v_i, v_i).$$
So $\lambda_i = \lambda_i^*$.
Furthermore, eigenvectors with distinct eigenvalues are orthogonal with respect to the inner product. Suppose that
$$M v_i = \lambda_i v_i, \qquad M v_j = \lambda_j v_j.$$
Then
$$\lambda_i (v_j, v_i) = (v_j, M v_i) = (M v_j, v_i) = \lambda_j (v_j, v_i).$$
Since $\lambda_i \neq \lambda_j$, we must have $(v_j, v_i) = 0$.
Knowing the eigenvalues and eigenvectors gives a neat way to solve linear equations of the form
$$Mu = f.$$
Here we are given $M$ and $f$, and want to find $u$. Of course, the answer is $u = M^{-1} f$. However, if we expand in terms of eigenvectors, we obtain
$$Mu = M \sum u_i v_i = \sum u_i \lambda_i v_i.$$
Hence we have
$$\sum u_i \lambda_i v_i = \sum f_i v_i.$$
Taking the inner product with $v_j$, we know that
$$u_j = \frac{f_j}{\lambda_j}.$$
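Here is a small finite-dimensional illustration (my own, with a made-up $2 \times 2$ symmetric matrix) of solving $Mu = f$ via the eigenvector expansion $u_j = f_j / \lambda_j$:

```python
# Solving Mu = f by expanding in the orthonormal eigenvectors of a
# self-adjoint (real symmetric) M (my own illustration).
M = [[2.0, 1.0],
     [1.0, 2.0]]
# Eigenpairs of this symmetric matrix, known in closed form:
lam = [3.0, 1.0]
vs = [(1 / 2**0.5, 1 / 2**0.5), (1 / 2**0.5, -1 / 2**0.5)]

f = (1.0, 5.0)
f_coeffs = [v[0] * f[0] + v[1] * f[1] for v in vs]   # f_j = (v_j, f)

# u = sum over j of (f_j / lambda_j) v_j
u = [sum(f_coeffs[j] / lam[j] * vs[j][i] for j in range(2)) for i in range(2)]

# Check Mu = f directly.
Mu = [M[i][0] * u[0] + M[i][1] * u[1] for i in range(2)]
assert all(abs(a - b) < 1e-12 for a, b in zip(Mu, f))
```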
So far, these are all things from IA Vectors and Matrices. Sturm–Liouville theory is the infinite-dimensional analogue.
In our vector space of differentiable functions, our "matrices" would be linear differential operators $L$. For example, we could have
$$L = A_p(x) \frac{\mathrm{d}^p}{\mathrm{d}x^p} + A_{p-1}(x) \frac{\mathrm{d}^{p-1}}{\mathrm{d}x^{p-1}} + \cdots + A_1(x) \frac{\mathrm{d}}{\mathrm{d}x} + A_0(x).$$
It is an easy check that this is in fact linear.
We say $L$ has order $p$ if the highest derivative that appears is $\frac{\mathrm{d}^p}{\mathrm{d}x^p}$.
In most applications, we will be interested in the case $p = 2$. When will our $L$ be self-adjoint?
In the $p = 2$ case, we have
$$Ly = P \frac{\mathrm{d}^2 y}{\mathrm{d}x^2} + R \frac{\mathrm{d}y}{\mathrm{d}x} - Qy = P \left( \frac{\mathrm{d}^2 y}{\mathrm{d}x^2} + \frac{R}{P} \frac{\mathrm{d}y}{\mathrm{d}x} - \frac{Q}{P} y \right) = P \left( e^{-\int \frac{R}{P} \mathrm{d}x} \frac{\mathrm{d}}{\mathrm{d}x} \left( e^{\int \frac{R}{P} \mathrm{d}x} \frac{\mathrm{d}y}{\mathrm{d}x} \right) - \frac{Q}{P} y \right).$$
Let $p = \exp\left( \int \frac{R}{P}\, \mathrm{d}x \right)$. Then we can write this as
$$Ly = P p^{-1} \left( \frac{\mathrm{d}}{\mathrm{d}x} \left( p \frac{\mathrm{d}y}{\mathrm{d}x} \right) - \frac{Q}{P} p y \right).$$
We further define $q = \frac{Q}{P} p$. We also drop the factor of $P p^{-1}$. Then we are left with
$$L = \frac{\mathrm{d}}{\mathrm{d}x} \left( p(x) \frac{\mathrm{d}}{\mathrm{d}x} \right) - q(x).$$
This is the Sturm–Liouville form of the operator. Now let's compute $(f, Lg)$. We integrate by parts numerous times to obtain
$$(f, Lg) = \int_a^b f^* \left( \frac{\mathrm{d}}{\mathrm{d}x} \left( p \frac{\mathrm{d}g}{\mathrm{d}x} \right) - qg \right) \mathrm{d}x$$
$$= [f^* p g']_a^b - \int_a^b \left( \frac{\mathrm{d}f^*}{\mathrm{d}x} p \frac{\mathrm{d}g}{\mathrm{d}x} + f^* q g \right) \mathrm{d}x$$
$$= [f^* p g' - f'^* p g]_a^b + \int_a^b \left( \frac{\mathrm{d}}{\mathrm{d}x} \left( p \frac{\mathrm{d}f^*}{\mathrm{d}x} \right) - q f^* \right) g\, \mathrm{d}x$$
$$= [(f^* g' - f'^* g) p]_a^b + (Lf, g),$$
assuming that $p, q$ are real.
So 2nd order linear differential operators are self-adjoint with respect to this inner product if $p, q$ are real and the boundary terms vanish. When do the boundary terms vanish? One possibility is when $p$ is periodic (with the right period), or if we constrain $f$ and $g$ to be periodic.
Example. We can consider a simple case, where
$$L = \frac{\mathrm{d}^2}{\mathrm{d}x^2}.$$
Here we have $p = 1$, $q = 0$. If we ask for functions to be periodic on $[a, b]$, then
$$\int_a^b f^* \frac{\mathrm{d}^2 g}{\mathrm{d}x^2}\, \mathrm{d}x = \int_a^b \frac{\mathrm{d}^2 f^*}{\mathrm{d}x^2} g\, \mathrm{d}x.$$
Note that it is important that we have a second-order differential operator. If it is first-order, then we would have a negative sign, since we integrated by parts once.
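We can check this self-adjointness numerically. The sketch below (my own illustration, not from the lectures) discretizes $L = \frac{\mathrm{d}^2}{\mathrm{d}x^2}$ with a periodic central difference, which is exactly symmetric with respect to the discrete inner product, mirroring $(f, Lg) = (Lf, g)$:

```python
import math

# (f, Lg) = (Lf, g) for L = d^2/dx^2 on periodic functions
# (my own numerical illustration).
N = 2048
h = 2 * math.pi / N
xs = [j * h for j in range(N)]
f = [math.sin(x) + 0.3 * math.cos(2 * x) for x in xs]
g = [math.cos(3 * x) for x in xs]

def inner_disc(u, v):
    # Discrete version of the integral inner product (real functions here).
    return sum(a * b for a, b in zip(u, v)) * h

def second_deriv(u):
    # Central difference with periodic (wrap-around) boundary conditions.
    return [(u[(j + 1) % N] - 2 * u[j] + u[(j - 1) % N]) / h**2 for j in range(N)]

lhs = inner_disc(f, second_deriv(g))
rhs = inner_disc(second_deriv(f), g)
assert abs(lhs - rhs) < 1e-8
```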
Just as in finite dimensions, self-adjoint operators have eigenfunctions and eigenvalues with special properties. First, we define a more sophisticated inner product.
Definition (Inner product with weight). An inner product with weight $w$, written $(\cdot, \cdot)_w$, is defined by
$$(f, g)_w = \int_a^b f^*(x) g(x) w(x)\, \mathrm{d}x,$$
where $w$ is real, non-negative, and has only finitely many zeroes.
Why do we want a weight $w(x)$? In the future, we might want to work with the unit disk, instead of a square in $\mathbb{R}^2$. When we want to use polar coordinates, we will have to integrate with $r\, \mathrm{d}r\, \mathrm{d}\theta$, instead of just $\mathrm{d}r\, \mathrm{d}\theta$. Hence we need the weight of $r$. Also, we allow it to have finitely many zeroes, so that the radius can be $0$ at the origin.
Why can't we have more zeroes? We want the inner product to keep the property that $(f, f)_w = 0$ iff $f = 0$ (for continuous $f$). If $w$ is zero at too many places, then the inner product could be zero without $f$ being zero.
We now define what it means to be an eigenfunction.
Definition (Eigenfunction with weight). An eigenfunction with weight $w$ of $L$ is a function $y : [a, b] \to \mathbb{C}$ obeying the differential equation
$$Ly = \lambda w y,$$
where $\lambda \in \mathbb{C}$ is the eigenvalue.
This might be strange at first sight. It seems like we can take any nonsense $y$, apply $L$, to get some nonsense $Ly$. But then it is fine, since we can write it as some nonsense $w$ times our original $y$. So any function is an eigenfunction? No! There are many constraints $w$ has to satisfy, like being positive, real and having finitely many zeroes. It turns out this severely restricts what values $y$ can take, so not everything will be an eigenfunction. In fact, we can develop this theory without the weight function $w$. However, weight functions are much more convenient when, say, dealing with the unit disk.
Proposition. The eigenvalues of a Sturm–Liouville operator are real.
Proof. Suppose $L y_i = \lambda_i w y_i$. Then
$$\lambda_i (y_i, y_i)_w = \lambda_i (y_i, w y_i) = (y_i, L y_i) = (L y_i, y_i) = (\lambda_i w y_i, y_i) = \lambda_i^* (y_i, y_i)_w.$$
Since $(y_i, y_i)_w \neq 0$, we have $\lambda_i = \lambda_i^*$.
Note that the first and last terms use the weighted inner product, but the
middle terms use the unweighted inner product.
Proposition.
Eigenfunctions with different eigenvalues (but same weight) are
orthogonal.
Proof. Let $L y_i = \lambda_i w y_i$ and $L y_j = \lambda_j w y_j$. Then
$$\lambda_i (y_j, y_i)_w = (y_j, L y_i) = (L y_j, y_i) = \lambda_j (y_j, y_i)_w.$$
Since $\lambda_i \neq \lambda_j$, we must have $(y_j, y_i)_w = 0$.
Those were pretty straightforward manipulations. However, the main results
of Sturm–Liouville theory are significantly harder, and we will not prove them.
We shall just state them and explore some examples.
Theorem. On a compact domain, the eigenvalues $\lambda_1, \lambda_2, \cdots$ form a countably infinite sequence and are discrete.
This will be a rather helpful result in quantum mechanics, since in quantum
mechanics, the possible values of, say, the energy are the eigenvalues of the
Hamiltonian operator. Then this result says that the possible values of the
energy are discrete and form an infinite sequence.
Note here the word compact. In quantum mechanics, if we restrict a particle to a well $[0, 1]$, then it will have quantized energy levels, since the domain is compact. However, if the particle is free, then it can have any energy at all, since we no longer have a compact domain. Similarly, angular momentum is quantized, since it describes rotations, which take values in $S^1$, which is compact.
Theorem. The eigenfunctions are complete: any function $f : [a, b] \to \mathbb{C}$ (obeying appropriate boundary conditions) can be expanded as
$$f(x) = \sum_n \hat{f}_n y_n(x),$$
where
$$\hat{f}_n = \int y_n^*(x) f(x) w(x)\, \mathrm{d}x.$$
Example. Let $[a, b] = [-L, L]$, $L = \frac{\mathrm{d}^2}{\mathrm{d}x^2}$, $w = 1$, restricting to periodic functions. Then our eigenfunctions obey
$$\frac{\mathrm{d}^2 y_n}{\mathrm{d}x^2} = \lambda_n y_n(x),$$
so our eigenfunctions are
$$y_n(x) = \exp\left( \frac{in\pi x}{L} \right)$$
with eigenvalues
$$\lambda_n = -\frac{n^2 \pi^2}{L^2}$$
for $n \in \mathbb{Z}$. This is just the Fourier series!
Example (Hermite polynomials). We are going to cheat a little bit and pick our domain to be $\mathbb{R}$. We want to study the differential equation
$$\frac{1}{2} H'' - x H' = \lambda H,$$
with $H : \mathbb{R} \to \mathbb{C}$. We want to put this in Sturm–Liouville form. We have
$$p(x) = \exp\left( -\int_0^x 2t\, \mathrm{d}t \right) = e^{-x^2},$$
ignoring constant factors. Then $q(x) = 0$. We can rewrite the equation as
$$\frac{\mathrm{d}}{\mathrm{d}x} \left( e^{-x^2} \frac{\mathrm{d}H}{\mathrm{d}x} \right) = 2\lambda e^{-x^2} H(x).$$
So we take our weight function to be $w(x) = e^{-x^2}$.
We now ask that $H(x)$ grows at most polynomially as $x \to \infty$. In particular, we want $e^{-x^2} H(x)^2 \to 0$. This ensures that the boundary terms from integration by parts vanish at the infinite boundary, so that our Sturm–Liouville operator is self-adjoint.
The eigenfunctions turn out to be
$$H_n(x) = (-1)^n e^{x^2} \frac{\mathrm{d}^n}{\mathrm{d}x^n} e^{-x^2}.$$
These are known as the Hermite polynomials. Note that these are indeed polynomials. When we differentiate the $e^{-x^2}$ term many times, we get a lot of things from the product rule, but they will always keep an $e^{-x^2}$, which will ultimately cancel with $e^{x^2}$.
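As a quick check (my own illustration, not from the lectures), we can build the Hermite polynomials with the standard three-term recurrence $H_{n+1} = 2xH_n - 2nH_{n-1}$ (equivalent to the formula above) and verify orthogonality in the inner product with weight $w(x) = e^{-x^2}$, for which the known normalization is $(H_n, H_n)_w = 2^n n! \sqrt{\pi}$:

```python
import math

# Hermite polynomials via the recurrence H_{n+1} = 2x H_n - 2n H_{n-1}
# (my own illustration; H_0 = 1, H_1 = 2x).
def hermite(n, x):
    h0, h1 = 1.0, 2.0 * x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, 2 * x * h1 - 2 * k * h0
    return h1

def weighted_inner(m, n, a=-10.0, b=10.0, N=20000):
    # (H_m, H_n)_w with w(x) = e^{-x^2}, midpoint rule; the tail beyond
    # |x| = 10 is negligible.
    h = (b - a) / N
    return sum(hermite(m, a + (j + 0.5) * h) * hermite(n, a + (j + 0.5) * h)
               * math.exp(-(a + (j + 0.5) * h) ** 2) for j in range(N)) * h

assert abs(weighted_inner(2, 3)) < 1e-6                                  # orthogonal
assert abs(weighted_inner(3, 3) - 2**3 * math.factorial(3) * math.pi**0.5) < 1e-2
```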
Just as for matrices, we can use the eigenfunction expansion to solve forced differential equations. For example, we might want to solve
$$Lg = f(x),$$
where $f(x)$ is a forcing term. We can write this as
$$Lg = w(x) F(x).$$
We expand our $g$ as
$$g(x) = \sum_{n \in \mathbb{Z}} \hat{g}_n y_n(x).$$
Then by linearity,
$$Lg = \sum_{n \in \mathbb{Z}} \hat{g}_n L y_n(x) = \sum_{n \in \mathbb{Z}} \hat{g}_n \lambda_n w(x) y_n(x).$$
We can also expand our forcing function as
$$w(x) F(x) = w(x) \left( \sum_{n \in \mathbb{Z}} \hat{F}_n y_n(x) \right).$$
Taking the (regular) inner product with $y_m(x)$ (and noting orthogonality of eigenfunctions), we obtain
$$w(x) \hat{g}_m \lambda_m = w(x) \hat{F}_m.$$
This tells us that
$$\hat{g}_m = \frac{\hat{F}_m}{\lambda_m}.$$
So we have
$$g(x) = \sum_{n \in \mathbb{Z}} \frac{\hat{F}_n}{\lambda_n} y_n(x),$$
provided all $\lambda_n$ are nonzero.
This is a systematic way of solving forced differential equations. We used to
solve these by “being smart”. We just looked at the forcing term and tried to
guess what would work. Unsurprisingly, this approach does not succeed all the
time. Thus it is helpful to have a systematic way of solving the equations.
It is often helpful to rewrite this into another form, using the fact that $\hat{F}_n = (y_n, F)_w$. So we have
$$g(x) = \sum_{n \in \mathbb{Z}} \frac{1}{\lambda_n} (y_n, F)_w\, y_n(x) = \int_a^b \sum_{n \in \mathbb{Z}} \frac{1}{\lambda_n} y_n^*(t) y_n(x) w(t) F(t)\, \mathrm{d}t.$$
Note that we swapped the sum and the integral, which is in general a dangerous
thing to do, but we don’t really care because this is an applied course. We can
further write the above as
$$g(x) = \int_a^b G(x, t) F(t) w(t)\, dt,$$
where $G(x, t)$ is the infinite sum
$$G(x, t) = \sum_{n \in \mathbb{Z}} \frac{1}{\lambda_n} y_n^*(t) y_n(x).$$
We call this the Green's function. Note that this depends on the $\lambda_n$ and $y_n$ only. It depends on the differential operator $\mathcal{L}$, but not the forcing term $F$. We can think of this as something like the "inverse matrix", which we can use to solve the forced differential equation for any forcing term.
Recall that for a matrix, the inverse exists if the determinant is nonzero,
which is true if the eigenvalues are all nonzero. Similarly, here a necessary
condition for the Green’s function to exist is that all the eigenvalues are nonzero.
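To see the Green's function machinery work numerically, here is a small sketch (our own toy example, not from the lectures): take $\mathcal{L} = -d^2/dx^2$ on $[0, \pi]$ with Dirichlet conditions, so $y_n(x) = \sqrt{2/\pi}\sin nx$, $\lambda_n = n^2$ and $w = 1$, and solve the forced problem by integrating the truncated sum for $G$ against the forcing term:

```python
# Build G(x,t) = sum_n y_n(t) y_n(x) / lambda_n for L = -d^2/dx^2 on [0, pi]
# with y(0) = y(pi) = 0, then solve L g = F via g(x) = ∫ G(x,t) F(t) w(t) dt.
import numpy as np

N_MODES, N_GRID = 50, 2001
t = np.linspace(0.0, np.pi, N_GRID)
dt = t[1] - t[0]

def G(x, t):
    """Truncated Green's function sum."""
    total = np.zeros_like(t)
    for n in range(1, N_MODES + 1):
        # y_n(t) y_n(x) / lambda_n with y_n = sqrt(2/pi) sin(nx), lambda_n = n^2
        total += (2 / np.pi) * np.sin(n * t) * np.sin(n * x) / n**2
    return total

F = np.sin(2 * t)                    # forcing term F(t)
x0 = 1.0
g_x0 = np.sum(G(x0, t) * F) * dt     # g(x0) = ∫ G(x0, t) F(t) dt

# The exact solution of -g'' = sin(2x) with g(0) = g(pi) = 0 is sin(2x)/4.
print(g_x0, np.sin(2 * x0) / 4)
```

Only the $n = 2$ term of the sum survives the integral (by orthogonality), which is exactly the statement $\hat{g}_n = \hat{F}_n / \lambda_n$.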
We now have a second version of Parseval's theorem.

Theorem (Parseval's theorem II).
$$(f, f)_w = \sum_{n \in \mathbb{Z}} |\hat{f}_n|^2.$$
Proof. We have
$$\begin{aligned}
(f, f)_w &= \int_\Omega f^*(x) f(x) w(x)\, dx \\
&= \sum_{n,m \in \mathbb{Z}} \int_\Omega \hat{f}_n^* y_n^*(x) \hat{f}_m y_m(x) w(x)\, dx \\
&= \sum_{n,m \in \mathbb{Z}} \hat{f}_n^* \hat{f}_m (y_n, y_m)_w \\
&= \sum_{n \in \mathbb{Z}} |\hat{f}_n|^2.
\end{aligned}$$
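As a quick numerical illustration (our own, not from the notes), we can compare the two sides of Parseval for $f(x) = x^2$ on $[-\pi, \pi]$, using the orthonormal Fourier basis $y_n(x) = e^{inx}/\sqrt{2\pi}$ with $w = 1$:

```python
# Sanity check of Parseval: (f, f)_w should equal sum_n |f_n|^2.
import numpy as np

x = np.linspace(-np.pi, np.pi, 4001)
dx = x[1] - x[0]
f = x**2

norm_sq = np.sum(np.abs(f)**2) * dx          # (f, f)_w with w = 1

coeff_sq = 0.0
for n in range(-30, 31):
    yn = np.exp(1j * n * x) / np.sqrt(2 * np.pi)   # orthonormal basis
    fn = np.sum(np.conj(yn) * f) * dx              # f_n = (y_n, f)_w
    coeff_sq += abs(fn)**2

print(norm_sq, coeff_sq)   # both close to 2 pi^5 / 5
```

The coefficients of $x^2$ decay like $1/n^2$, so truncating at $|n| = 30$ already captures the norm to well within quadrature error.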
3.2 Least squares approximation
So far, we have expanded functions in terms of infinite series. However, in real
life, when we ask a computer to do this for us, it is incapable of storing and
calculating an infinite series. So we need to truncate it at some point.
Suppose we approximate some function $f : \Omega \to \mathbb{C}$ by a finite set of eigenfunctions $y_n(x)$. Suppose we write the approximation $g$ as
$$g(x) = \sum_{k=1}^n c_k y_k(x).$$
The objective here is to figure out what values of the coefficients $c_k$ are "the best", i.e. make $g$ represent $f$ as closely as possible.

Firstly, we have to make explicit what we mean by "as closely as possible". Here we take this to mean that we want to minimize the $w$-norm $(f - g, f - g)_w$.
By (sesqui)linearity of the inner product, we know that
$$(f - g, f - g)_w = (f, f)_w + (g, g)_w - (f, g)_w - (g, f)_w.$$
To minimize this norm, we want
$$0 = \frac{\partial}{\partial c_j}(f - g, f - g)_w = \frac{\partial}{\partial c_j}\left[(f, f)_w + (g, g)_w - (f, g)_w - (g, f)_w\right].$$
We know that the $(f, f)_w$ term vanishes since it does not depend on $c_k$. Expanding our definition of $g$, we get
$$0 = \frac{\partial}{\partial c_j}\left(\sum_{k=1}^n |c_k|^2 - \sum_{k=1}^n \hat{f}_k^* c_k - \sum_{k=1}^n c_k^* \hat{f}_k\right) = c_j^* - \hat{f}_j^*.$$
Note that here we are treating $c_j^*$ and $c_j$ as distinct quantities, so when we vary $c_j$, $c_j^*$ is unchanged. To formally justify this treatment, we can vary the real and imaginary parts separately.
Hence, the extremum is achieved at $c_j^* = \hat{f}_j^*$. Similarly, varying with respect to $c_j^*$, we get that $c_j = \hat{f}_j$.
To check that this is indeed a minimum, we can look at the second derivatives
$$\frac{\partial^2}{\partial c_i \partial c_j}(f - g, f - g)_w = \frac{\partial^2}{\partial c_i^* \partial c_j^*}(f - g, f - g)_w = 0,$$
while
$$\frac{\partial^2}{\partial c_i^* \partial c_j}(f - g, f - g)_w = \delta_{ij} \geq 0.$$
Hence this is indeed a minimum.

Thus we know that $(f - g, f - g)_w$ is minimized over all $g(x)$ when
$$c_k = \hat{f}_k = (y_k, f)_w.$$
These are exactly the coefficients in our infinite expansion. Hence if we truncate our series at an arbitrary point, it is still the best approximation we can get.
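A quick numerical sanity check of this (our own example): for $f(x) = |x|$ on $[-\pi, \pi]$ with the orthonormal basis $y_k = e^{ikx}/\sqrt{2\pi}$ and $w = 1$, perturbing any truncated-series coefficient away from $\hat{f}_k$ strictly increases the mean-square error:

```python
# The truncated Fourier coefficients c_k = (y_k, f)_w minimise the
# mean-square error; perturbing one of them makes the error larger.
import numpy as np

x = np.linspace(-np.pi, np.pi, 4001)
dx = x[1] - x[0]
f = np.abs(x)                      # the function to approximate
n_max = 5

def err(coeffs):
    """Mean-square error (f - g, f - g)_w of g = sum_k c_k e^{ikx}/sqrt(2 pi)."""
    g = sum(c * np.exp(1j * k * x) / np.sqrt(2 * np.pi)
            for k, c in coeffs.items())
    return np.sum(np.abs(f - g)**2) * dx

# Best coefficients: c_k = (y_k, f)_w, computed by quadrature.
best = {k: np.sum(np.conj(np.exp(1j * k * x) / np.sqrt(2 * np.pi)) * f) * dx
        for k in range(-n_max, n_max + 1)}

worse = dict(best)
worse[1] = best[1] + 0.1           # perturb one coefficient
print(err(best), err(worse))       # the perturbed error is strictly larger
```

Because the basis is orthonormal, perturbing $c_1$ by $\varepsilon$ increases the error by exactly $|\varepsilon|^2$, independently of the other coefficients.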
4 Partial differential equations
So far, we have come up with a lot of general theory about functions and stuff.
We are going to look into some applications of these. In this section, we will
develop the general technique of separation of variables, and apply this to many
many problems.
4.1 Laplace’s equation
Definition (Laplace's equation). Laplace's equation on $\mathbb{R}^n$ says that a (twice-differentiable) function $\phi : \mathbb{R}^n \to \mathbb{C}$ obeys
$$\nabla^2 \phi = 0,$$
where
$$\nabla^2 = \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2}.$$
This equation arises in many situations. For example, if we have a conservative force $\mathbf{F} = -\nabla\phi$, and we are in a region where the force obeys the source-free Gauss' law $\nabla \cdot \mathbf{F} = 0$ (e.g. when there is no electric charge in an electromagnetic field), then we get Laplace's equation.
It also arises as a special case of the heat equation $\frac{\partial \phi}{\partial t} = \kappa \nabla^2 \phi$, and the wave equation $\frac{\partial^2 \phi}{\partial t^2} = c^2 \nabla^2 \phi$, when there is no time dependence.
Definition (Harmonic functions). Functions that obey Laplace's equation are often called harmonic functions.
Proposition. Let $\Omega$ be a compact domain, and $\partial\Omega$ be its boundary. Let $f : \partial\Omega \to \mathbb{R}$. Then there is a unique solution to $\nabla^2 \phi = 0$ on $\Omega$ with $\phi|_{\partial\Omega} = f$.
Proof. Suppose $\phi_1$ and $\phi_2$ are both solutions such that $\phi_1|_{\partial\Omega} = \phi_2|_{\partial\Omega} = f$. Then let $\Phi = \phi_1 - \phi_2$. So $\Phi = 0$ on the boundary. So we have
$$0 = \int_\Omega \Phi \nabla^2 \Phi\, d^n x = -\int_\Omega (\nabla\Phi) \cdot (\nabla\Phi)\, d^n x + \int_{\partial\Omega} \Phi \nabla\Phi \cdot \mathbf{n}\, d^{n-1} x.$$
We note that the second term vanishes since $\Phi = 0$ on the boundary. So we have
$$0 = -\int_\Omega (\nabla\Phi) \cdot (\nabla\Phi)\, d^n x.$$
However, we know that $(\nabla\Phi) \cdot (\nabla\Phi) \geq 0$, with equality iff $\nabla\Phi = 0$. Hence $\Phi$ is constant throughout $\Omega$. Since $\Phi = 0$ at the boundary, we have $\Phi = 0$ throughout, i.e. $\phi_1 = \phi_2$.
Asking that $\phi|_{\partial\Omega} = f(x)$ is a Dirichlet boundary condition. We can instead ask $\mathbf{n} \cdot \nabla\phi|_{\partial\Omega} = g$, a Neumann condition. In this case, we have a unique solution up to an additive constant.
These we’ve all seen in IA Vector Calculus, but this time in arbitrary dimen
sions.
4.2 Laplace’s equation in the unit disk in R
2
Let $\Omega = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 \leq 1\}$. Then Laplace's equation becomes
$$0 = \nabla^2 \phi = \frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} = 4\frac{\partial^2 \phi}{\partial z \partial \bar{z}},$$
where we define $z = x + iy$, $\bar{z} = x - iy$. Then the general solution is
$$\phi(z, \bar{z}) = \psi(z) + \chi(\bar{z})$$
for some $\psi$, $\chi$.
Suppose we want a solution obeying the Dirichlet boundary condition $\phi(z, \bar{z})|_{\partial\Omega} = f(\theta)$, where the boundary $\partial\Omega$ is the unit circle. Since the domain of $f$ is the unit circle $S^1$, we can Fourier-expand it! We can write
$$f(\theta) = \sum_{n \in \mathbb{Z}} \hat{f}_n e^{in\theta} = \hat{f}_0 + \sum_{n=1}^\infty \hat{f}_n e^{in\theta} + \sum_{n=1}^\infty \hat{f}_{-n} e^{-in\theta}.$$
However, we know that $z = re^{i\theta}$ and $\bar{z} = re^{-i\theta}$. So on the boundary, we know that $z = e^{i\theta}$ and $\bar{z} = e^{-i\theta}$. So we can write
$$f(\theta) = \hat{f}_0 + \sum_{n=1}^\infty \left(\hat{f}_n z^n + \hat{f}_{-n} \bar{z}^n\right).$$
This is defined for $|z| = 1$. Now we can define $\phi$ on the unit disk by
$$\phi(z, \bar{z}) = \hat{f}_0 + \sum_{n=1}^\infty \hat{f}_n z^n + \sum_{n=1}^\infty \hat{f}_{-n} \bar{z}^n.$$
It is clear that this obeys the boundary conditions by construction. Also, $\phi(z, \bar{z})$ certainly converges when $|z| < 1$ if the series for $f(\theta)$ converged on $\partial\Omega$. Also, $\phi(z, \bar{z})$ is of the form $\psi(z) + \chi(\bar{z})$. So it is a solution to Laplace's equation on the unit disk. Since the solution is unique, we're done!
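To see this recipe in action, here is a small numerical sketch (our own check, not from the lectures): take boundary data $f(\theta) = \cos 2\theta$, so $\hat{f}_{\pm 2} = \frac{1}{2}$ and all other coefficients vanish, giving $\phi = \frac{1}{2}(z^2 + \bar{z}^2) = r^2 \cos 2\theta$:

```python
# Solve Laplace's equation on the unit disk by Fourier-expanding the
# boundary data and extending term by term in z and zbar.
import numpy as np

theta = np.linspace(0, 2 * np.pi, 4000, endpoint=False)
f = np.cos(2 * theta)                   # boundary data f(theta)

def fhat(n):
    """Fourier coefficient f_n = (1/2 pi) ∫ f(theta) e^{-in theta} d theta."""
    return np.mean(f * np.exp(-1j * n * theta))

def phi(z):
    """phi(z, zbar) = f_0 + sum_{n>=1} (f_n z^n + f_{-n} zbar^n), truncated."""
    zbar = np.conj(z)
    total = fhat(0)
    for n in range(1, 20):
        total += fhat(n) * z**n + fhat(-n) * zbar**n
    return total

z0 = 0.5 * np.exp(1j * 0.7)             # an interior point: r = 0.5, theta = 0.7
exact = 0.5**2 * np.cos(2 * 0.7)        # the harmonic extension r^2 cos(2 theta)
print(phi(z0).real, exact)
```

The series value at the interior point agrees with $r^2 \cos 2\theta$ to machine precision, since only the $n = \pm 2$ terms survive.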