3 Sturm-Liouville Theory
3.1 Sturm-Liouville operators
In finite dimensions, we often consider linear maps $M: V \to W$. If $\{v_i\}$ is a basis for $V$ and $\{w_i\}$ is a basis for $W$, then we can represent the map by a matrix with entries
\[
  M_{ai} = (w_a, Mv_i).
\]
A map $M: V \to V$ is called self-adjoint if $M^\dagger = M$ as matrices. However, it is not obvious how we can extend this notion to arbitrary maps between arbitrary vector spaces (with an inner product) when they cannot be represented by a matrix. Instead, we make the following definitions:
Definition (Adjoint and self-adjoint). The adjoint $B$ of a map $A: V \to V$ is a map such that
\[
  (Bu, v) = (u, Av)
\]
for all vectors $u, v \in V$. A map $M$ is then self-adjoint if
\[
  (Mu, v) = (u, Mv).
\]
Self-adjoint matrices come with a natural basis. Recall that the eigenvalues of a matrix are the roots of $\det(M - \lambda I) = 0$. The eigenvector $v_i$ corresponding to an eigenvalue $\lambda_i$ is defined by $Mv_i = \lambda_i v_i$.
In general, eigenvalues can be any complex number. However, self-adjoint maps have real eigenvalues. Suppose
\[
  Mv_i = \lambda_i v_i.
\]
Then we have
\[
  \lambda_i (v_i, v_i) = (v_i, Mv_i) = (Mv_i, v_i) = \lambda_i^* (v_i, v_i).
\]
So $\lambda_i = \lambda_i^*$.
Furthermore, eigenvectors with distinct eigenvalues are orthogonal with respect to the inner product. Suppose that
\[
  Mv_i = \lambda_i v_i, \quad Mv_j = \lambda_j v_j.
\]
Then
\[
  \lambda_i (v_j, v_i) = (v_j, Mv_i) = (Mv_j, v_i) = \lambda_j (v_j, v_i),
\]
using that $\lambda_j$ is real. Since $\lambda_i \neq \lambda_j$, we must have $(v_j, v_i) = 0$.
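As a quick numerical sanity check (not part of the notes; numpy is assumed to be available), we can verify both facts for a random self-adjoint matrix:

```python
import numpy as np

# Build a random self-adjoint (Hermitian) matrix M = A + A^dagger.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
M = A + A.conj().T

lam, V = np.linalg.eig(M)   # generic eigensolver, no Hermitian assumption

# Self-adjointness forces the eigenvalues to be real (up to rounding)...
print(np.max(np.abs(lam.imag)))                # ~1e-15

# ...and eigenvectors with distinct eigenvalues to be orthogonal.
print(np.allclose(V.conj().T @ V, np.eye(4)))  # True
```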
Knowing the eigenvalues and eigenvectors gives a neat way to solve linear equations of the form
\[
  Mu = f.
\]
Here we are given $M$ and $f$, and want to find $u$. Of course, the answer is $u = M^{-1}f$. However, if we expand in terms of eigenvectors, we obtain
\[
  Mu = M\sum u_i v_i = \sum u_i \lambda_i v_i.
\]
Hence we have
\[
  \sum u_i \lambda_i v_i = \sum f_i v_i.
\]
Taking the inner product with $v_j$, we know that
\[
  u_j = \frac{f_j}{\lambda_j}.
\]
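The same recipe is easy to run numerically. Here is a minimal sketch (again assuming numpy) that solves $Mu = f$ by expanding in the eigenvector basis and dividing each coefficient by its eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
M = B + B.T                    # real symmetric, hence self-adjoint
f = rng.standard_normal(5)

lam, V = np.linalg.eigh(M)     # columns of V: orthonormal eigenvectors v_i

f_coeffs = V.T @ f             # f_j = (v_j, f)
u_coeffs = f_coeffs / lam      # u_j = f_j / lambda_j (no eigenvalue vanishes here)
u = V @ u_coeffs               # reassemble u = sum_j u_j v_j

print(np.allclose(M @ u, f))   # True
```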
So far, these are all things from IA Vectors and Matrices. Sturm-Liouville theory
is the infinite-dimensional analogue.
In our vector space of differentiable functions, our “matrices” would be linear differential operators $\mathcal{L}$. For example, we could have
\[
  \mathcal{L} = A_p(x)\frac{d^p}{dx^p} + A_{p-1}(x)\frac{d^{p-1}}{dx^{p-1}} + \cdots + A_1(x)\frac{d}{dx} + A_0(x).
\]
It is an easy check that this is in fact linear.

We say $\mathcal{L}$ has order $p$ if the highest derivative that appears is $\frac{d^p}{dx^p}$.
In most applications, we will be interested in the case $p = 2$. When will our $\mathcal{L}$ be self-adjoint?
In the $p = 2$ case, we have
\begin{align*}
  \mathcal{L}y &= P\frac{d^2 y}{dx^2} + R\frac{dy}{dx} - Qy\\
  &= P\left(\frac{d^2 y}{dx^2} + \frac{R}{P}\frac{dy}{dx} - \frac{Q}{P}y\right)\\
  &= P\left(e^{-\int\frac{R}{P}\,dx}\frac{d}{dx}\left(e^{\int\frac{R}{P}\,dx}\frac{dy}{dx}\right) - \frac{Q}{P}y\right).
\end{align*}
Let $p = \exp\left(\int\frac{R}{P}\,dx\right)$. Then we can write this as
\[
  \mathcal{L}y = Pp^{-1}\left(\frac{d}{dx}\left(p\frac{dy}{dx}\right) - \frac{Q}{P}py\right).
\]
We further define $q = \frac{Q}{P}p$. We also drop the factor of $Pp^{-1}$. Then we are left with
\[
  \mathcal{L} = \frac{d}{dx}\left(p(x)\frac{d}{dx}\right) - q(x).
\]
This is the Sturm-Liouville form of the operator.
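As a sketch of this reduction in practice (sympy assumed; the choice of operator is ours, for illustration), take Bessel's operator $\mathcal{L}y = x^2 y'' + xy' - \nu^2 y$, so $P = x^2$, $R = x$, $Q = \nu^2$:

```python
import sympy as sp

x = sp.symbols('x', positive=True)
nu = sp.symbols('nu', positive=True)

# Ly = P y'' + R y' - Q y with P = x^2, R = x, Q = nu^2 (Bessel's operator)
P, R, Q = x**2, x, nu**2

p = sp.simplify(sp.exp(sp.integrate(R / P, x)))  # p = exp(int R/P dx)
q = sp.simplify(Q * p / P)                       # q = (Q/P) p

# Prints: x  nu**2/x, i.e. L = d/dx(x d/dx) - nu^2/x after dropping P p^(-1)
print(p, q)
```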
Now let's compute $(f, \mathcal{L}g)$. We integrate by parts numerous times to obtain
\begin{align*}
  (f, \mathcal{L}g) &= \int_a^b f^*\left(\frac{d}{dx}\left(p\frac{dg}{dx}\right) - qg\right) dx\\
  &= [f^* p g']_a^b - \int_a^b \left(\frac{df^*}{dx} p \frac{dg}{dx} + f^* q g\right) dx\\
  &= [f^* p g' - f'^* p g]_a^b + \int_a^b \left(\frac{d}{dx}\left(p\frac{df^*}{dx}\right) - qf^*\right) g\, dx\\
  &= [(f^* g' - f'^* g)p]_a^b + (\mathcal{L}f, g),
\end{align*}
assuming that $p, q$ are real.
So second order linear differential operators are self-adjoint with respect to this inner product if $p, q$ are real and the boundary terms vanish. When do the boundary terms vanish? One possibility is when $p$ is periodic (with the right period), or if we constrain $f$ and $g$ to be periodic.
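We can also see this self-adjointness concretely by discretizing. The sketch below (numpy assumed; the particular $p$, $q$ are arbitrary periodic choices) builds $\frac{d}{dx}\left(p\frac{d}{dx}\right) - q$ on a periodic grid in flux form and checks that the resulting matrix is symmetric:

```python
import numpy as np

N = 64
xs = np.linspace(0, 2 * np.pi, N, endpoint=False)
h = xs[1] - xs[0]

q = np.sin(xs) ** 2               # an arbitrary periodic q
p_half = 2 + np.cos(xs + h / 2)   # p = 2 + cos(x) > 0 at half-grid points x_{i+1/2}

# Flux form: (Lu)_i = [p_{i+1/2}(u_{i+1}-u_i) - p_{i-1/2}(u_i-u_{i-1})]/h^2 - q_i u_i,
# with periodic wrap-around of the indices.
Lmat = np.zeros((N, N))
for i in range(N):
    Lmat[i, (i + 1) % N] += p_half[i] / h**2
    Lmat[i, (i - 1) % N] += p_half[i - 1] / h**2
    Lmat[i, i] -= (p_half[i] + p_half[i - 1]) / h**2 + q[i]

print(np.allclose(Lmat, Lmat.T))  # True: the discrete operator is symmetric
```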
Example. We can consider a simple case, where
\[
  \mathcal{L} = \frac{d^2}{dx^2}.
\]
Here we have $p = 1, q = 0$. If we ask for functions to be periodic on $[a, b]$, then
\[
  \int_a^b f^* \frac{d^2 g}{dx^2}\, dx = \int_a^b \frac{d^2 f^*}{dx^2} g\, dx.
\]
Note that it is important that we have a second-order differential operator. If it is first-order, then we would have a negative sign, since we integrate by parts once.
Just as in finite dimensions, self-adjoint operators have eigenfunctions and
eigenvalues with special properties. First, we define a more sophisticated inner
product.
Definition (Inner product with weight). An inner product with weight $w$, written $(\,\cdot\,, \,\cdot\,)_w$, is defined by
\[
  (f, g)_w = \int_a^b f^*(x) g(x) w(x)\, dx,
\]
where $w$ is real, non-negative, and has only finitely many zeroes.
Why do we want a weight $w(x)$? In the future, we might want to work with the unit disk, instead of a square in $\mathbb{R}^2$. When we want to use polar coordinates, we will have to integrate with $r\, dr\, d\theta$, instead of just $dr\, d\theta$. Hence we need the weight $w = r$. Also, we allow it to have finitely many zeroes, so that the weight can vanish at the origin, where $r = 0$.
Why can't we have more zeroes? We want the inner product to keep the property that $(f, f)_w = 0$ iff $f = 0$ (for continuous $f$). If $w$ is zero at too many places, then the inner product could be zero without $f$ being zero.
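Numerically, a weighted inner product is just an ordinary quadrature with an extra factor of $w$. A minimal sketch (numpy assumed):

```python
import numpy as np

def inner_w(f, g, w, a, b, n=100_000):
    """(f, g)_w = int_a^b f*(x) g(x) w(x) dx, approximated by the midpoint rule."""
    xs = np.linspace(a, b, n, endpoint=False) + (b - a) / (2 * n)
    return np.sum(np.conj(f(xs)) * g(xs) * w(xs)) * (b - a) / n

# With the polar-coordinate weight w(x) = x on [0, 1]:
print(inner_w(lambda x: x, lambda x: x, lambda x: x, 0, 1))  # ~0.25 = int_0^1 x^3 dx
```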
We now define what it means to be an eigenfunction.

Definition (Eigenfunction with weight). An eigenfunction with weight $w$ of $\mathcal{L}$ is a function $y: [a, b] \to \mathbb{C}$ obeying the differential equation
\[
  \mathcal{L}y = \lambda w y,
\]
where $\lambda \in \mathbb{C}$ is the eigenvalue.
This might be strange at first sight. It seems like we can take any nonsense $y$, apply $\mathcal{L}$ to get some nonsense $\mathcal{L}y$, and then write it as some nonsense $w$ times our original $y$. So is any function an eigenfunction? No! There are many constraints $w$ has to satisfy, like being real, non-negative and having only finitely many zeroes. It turns out this severely restricts what $y$ can be, so not everything will be an eigenfunction. In fact, we can develop this theory without the weight function $w$. However, weight functions are much more convenient when, say, dealing with the unit disk.
Proposition. The eigenvalues of a Sturm-Liouville operator are real.

Proof. Suppose $\mathcal{L}y_i = \lambda_i w y_i$. Then
\[
  \lambda_i (y_i, y_i)_w = \lambda_i (y_i, w y_i) = (y_i, \mathcal{L}y_i) = (\mathcal{L}y_i, y_i) = (\lambda_i w y_i, y_i) = \lambda_i^* (y_i, y_i)_w.
\]
Since $(y_i, y_i)_w \neq 0$, we have $\lambda_i = \lambda_i^*$.
Note that the first and last terms use the weighted inner product, but the
middle terms use the unweighted inner product.
Proposition. Eigenfunctions with different eigenvalues (but same weight) are orthogonal.

Proof. Let $\mathcal{L}y_i = \lambda_i w y_i$ and $\mathcal{L}y_j = \lambda_j w y_j$. Then
\[
  \lambda_i (y_j, y_i)_w = (y_j, \mathcal{L}y_i) = (\mathcal{L}y_j, y_i) = \lambda_j (y_j, y_i)_w,
\]
using that $\lambda_j$ is real by the previous proposition. Since $\lambda_i \neq \lambda_j$, we must have $(y_j, y_i)_w = 0$.
Those were pretty straightforward manipulations. However, the main results
of Sturm–Liouville theory are significantly harder, and we will not prove them.
We shall just state them and explore some examples.
Theorem. On a compact domain, the eigenvalues $\lambda_1, \lambda_2, \cdots$ form a countably infinite sequence and are discrete.
This will be a rather helpful result in quantum mechanics, since there the possible values of, say, the energy are the eigenvalues of the Hamiltonian operator. This result then says that the possible values of the energy are discrete and form an infinite sequence.

Note here the word compact. In quantum mechanics, if we restrict a particle to a well $[0, 1]$, then it will have quantized energy levels, since the domain is compact. However, if the particle is free, then it can have any energy at all, since we no longer have a compact domain. Similarly, angular momentum is quantized, since it describes rotations, which take values in $S^1$, which is compact.
Theorem. The eigenfunctions are complete: any function $f: [a, b] \to \mathbb{C}$ (obeying appropriate boundary conditions) can be expanded as
\[
  f(x) = \sum_n \hat{f}_n y_n(x),
\]
where
\[
  \hat{f}_n = \int y_n^*(x) f(x) w(x)\, dx.
\]
Example. Let $[a, b] = [-L, L]$, $\mathcal{L} = \frac{d^2}{dx^2}$ and $w = 1$, restricting to periodic functions. Then our eigenfunctions obey
\[
  \frac{d^2 y_n}{dx^2} = \lambda_n y_n(x),
\]
so they are
\[
  y_n(x) = \exp\left(\frac{in\pi x}{L}\right)
\]
with eigenvalues
\[
  \lambda_n = -\frac{n^2\pi^2}{L^2}
\]
for $n \in \mathbb{Z}$. This is just the Fourier series!
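We can check this numerically (numpy assumed). One small subtlety: the theorem's formula for $\hat{f}_n$ presumes orthonormal eigenfunctions, so below we normalize $y_n$ by $1/\sqrt{2L}$:

```python
import numpy as np

Lhalf = 1.0  # the half-length L of the interval [-L, L]
xs = np.linspace(-Lhalf, Lhalf, 4096, endpoint=False)
dx = xs[1] - xs[0]

def y(n, x):
    # normalized eigenfunction exp(i n pi x / L) / sqrt(2L), so (y_n, y_m) = delta_nm
    return np.exp(1j * n * np.pi * x / Lhalf) / np.sqrt(2 * Lhalf)

f = np.exp(np.cos(np.pi * xs))   # an arbitrary smooth periodic test function

ns = np.arange(-40, 41)
f_hats = np.array([np.sum(np.conj(y(n, xs)) * f) * dx for n in ns])  # (y_n, f)
f_rec = sum(fh * y(n, xs) for n, fh in zip(ns, f_hats))

print(np.max(np.abs(f_rec - f)))  # tiny: the truncated expansion converges to f
```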
Example (Hermite polynomials). We are going to cheat a little bit and pick our domain to be $\mathbb{R}$. We want to study the differential equation
\[
  \frac{1}{2}H'' - xH' = \lambda H,
\]
with $H: \mathbb{R} \to \mathbb{C}$. We want to put this in Sturm-Liouville form. We have
\[
  p(x) = \exp\left(-\int_0^x 2t\, dt\right) = e^{-x^2},
\]
ignoring constant factors. Then $q(x) = 0$. We can rewrite the equation as
\[
  \frac{d}{dx}\left(e^{-x^2}\frac{dH}{dx}\right) = 2\lambda e^{-x^2} H(x).
\]
So we take our weight function to be $w(x) = e^{-x^2}$.
We now ask that $H(x)$ grows at most polynomially as $|x| \to \infty$. In particular, we want $e^{-x^2}H(x)^2 \to 0$. This ensures that the boundary terms from integration by parts vanish at the infinite boundary, so that our Sturm-Liouville operator is self-adjoint.
The eigenfunctions turn out to be
\[
  H_n(x) = (-1)^n e^{x^2} \frac{d^n}{dx^n} e^{-x^2}.
\]
These are known as the Hermite polynomials. Note that these are indeed polynomials. When we differentiate the $e^{-x^2}$ term many times, we get a lot of things from the product rule, but they will always keep an $e^{-x^2}$, which will ultimately cancel with the $e^{x^2}$.
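We can generate these directly from the Rodrigues formula and check the orthogonality proposition with the weight $w = e^{-x^2}$ (a small sketch, sympy assumed):

```python
import sympy as sp

x = sp.symbols('x')

def H(n):
    """Rodrigues formula: H_n(x) = (-1)^n e^{x^2} (d/dx)^n e^{-x^2}."""
    return sp.expand(sp.simplify(
        (-1)**n * sp.exp(x**2) * sp.diff(sp.exp(-x**2), x, n)))

print([H(n) for n in range(4)])
# [1, 2*x, 4*x**2 - 2, 8*x**3 - 12*x] -- genuinely polynomials

# Orthogonality with respect to the weight w(x) = e^{-x^2} on the real line:
w = sp.exp(-x**2)
print(sp.integrate(H(2) * H(3) * w, (x, -sp.oo, sp.oo)))  # 0
```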
Just as for matrices, we can use the eigenfunction expansion to solve forced differential equations. For example, we might want to solve
\[
  \mathcal{L}g = f(x),
\]
where $f(x)$ is a forcing term. We can write this as
\[
  \mathcal{L}g = w(x)F(x).
\]
We expand our $g$ as
\[
  g(x) = \sum_{n \in \mathbb{Z}} \hat{g}_n y_n(x).
\]
Then by linearity,
\[
  \mathcal{L}g = \sum_{n \in \mathbb{Z}} \hat{g}_n \mathcal{L}y_n(x) = \sum_{n \in \mathbb{Z}} \hat{g}_n \lambda_n w(x) y_n(x).
\]
We can also expand our forcing function as
\[
  w(x)F(x) = w(x)\left(\sum_{n \in \mathbb{Z}} \hat{F}_n y_n(x)\right).
\]
Taking the (regular) inner product with $y_m(x)$ (and noting orthogonality of eigenfunctions), we obtain
\[
  w(x)\hat{g}_m \lambda_m = w(x)\hat{F}_m.
\]
This tells us that
\[
  \hat{g}_m = \frac{\hat{F}_m}{\lambda_m}.
\]
So we have
\[
  g(x) = \sum_{n \in \mathbb{Z}} \frac{\hat{F}_n}{\lambda_n} y_n(x),
\]
provided all $\lambda_n$ are non-zero.
This is a systematic way of solving forced differential equations. We used to
solve these by “being smart”. We just looked at the forcing term and tried to
guess what would work. Unsurprisingly, this approach does not succeed all the
time. Thus it is helpful to have a systematic way of solving the equations.
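For the periodic Fourier example above (here on $[-\pi, \pi]$) we can carry this out with the FFT; this is a sketch, numpy assumed. Note how the caveat about non-zero eigenvalues bites: $\lambda_0 = 0$, so we must restrict to zero-mean forcing and fix $\hat{g}_0 = 0$:

```python
import numpy as np

# Solve g'' = f on periodic [-pi, pi] via eigenfunctions y_n = e^{inx}, lambda_n = -n^2.
N = 256
xs = np.linspace(-np.pi, np.pi, N, endpoint=False)
f = np.cos(3 * xs) - 2 * np.sin(xs)   # a zero-mean forcing term

f_hat = np.fft.fft(f)                 # coefficients in the e^{inx} basis
n = np.fft.fftfreq(N, d=1.0 / N)      # integer mode numbers
lam = -n**2                           # eigenvalues of d^2/dx^2
g_hat = np.zeros_like(f_hat)
g_hat[n != 0] = f_hat[n != 0] / lam[n != 0]   # g_hat_n = f_hat_n / lambda_n
g = np.fft.ifft(g_hat).real

exact = -np.cos(3 * xs) / 9 + 2 * np.sin(xs)
print(np.max(np.abs(g - exact)))      # ~1e-15
```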
It is often helpful to rewrite this into another form, using the fact that $\hat{F}_n = (y_n, F)_w$. So we have
\[
  g(x) = \sum_{n \in \mathbb{Z}} \frac{1}{\lambda_n}(y_n, F)_w\, y_n(x) = \int_a^b \sum_{n \in \mathbb{Z}} \frac{1}{\lambda_n} y_n^*(t) y_n(x) w(t) F(t)\, dt.
\]
Note that we swapped the sum and the integral, which is in general a dangerous thing to do, but we don't really care because this is an applied course. We can further write the above as
\[
  g(x) = \int_a^b G(x, t) F(t) w(t)\, dt,
\]
where $G(x, t)$ is the infinite sum
\[
  G(x, t) = \sum_{n \in \mathbb{Z}} \frac{1}{\lambda_n} y_n^*(t) y_n(x).
\]
We call this the Green's function. Note that it depends only on $\lambda_n$ and $y_n$: it depends on the differential operator $\mathcal{L}$, but not on the forcing term $F$. We can think of it as something like the “inverse matrix”, which we can use to solve the forced differential equation for any forcing term.
Recall that for a matrix, the inverse exists if the determinant is non-zero,
which is true if the eigenvalues are all non-zero. Similarly, here a necessary
condition for the Green’s function to exist is that all the eigenvalues are non-zero.
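Here is a sketch (numpy assumed) of the truncated Green's function for the periodic example $\mathcal{L} = \frac{d^2}{dx^2}$ on $[-\pi, \pi]$ with $w = 1$, using normalized $y_n(x) = e^{inx}/\sqrt{2\pi}$ and omitting the $\lambda_0 = 0$ mode:

```python
import numpy as np

xs = np.linspace(-np.pi, np.pi, 256, endpoint=False)
dx = xs[1] - xs[0]

# G(x, t) = sum_n (1/lambda_n) y_n*(t) y_n(x), truncated to 1 <= |n| <= 40
G = np.zeros((xs.size, xs.size), dtype=complex)
for n in range(1, 41):
    for m in (n, -n):
        yn = np.exp(1j * m * xs) / np.sqrt(2 * np.pi)
        G += np.outer(yn, yn.conj()) / (-m**2)

f = np.cos(3 * xs)                                # zero-mean forcing
g = (G @ f).real * dx                             # g(x) = int G(x,t) f(t) w(t) dt
print(np.max(np.abs(g - (-np.cos(3 * xs) / 9))))  # tiny: modes n = +-3 are captured
```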
We now have a second version of Parseval's theorem.

Theorem (Parseval's theorem II).
\[
  (f, f)_w = \sum_{n \in \mathbb{Z}} |\hat{f}_n|^2.
\]
Proof. We have
\begin{align*}
  (f, f)_w &= \int_\Omega f^*(x) f(x) w(x)\, dx\\
  &= \sum_{n,m \in \mathbb{Z}} \int_\Omega \hat{f}_n^* y_n^*(x) \hat{f}_m y_m(x) w(x)\, dx\\
  &= \sum_{n,m \in \mathbb{Z}} \hat{f}_n^* \hat{f}_m (y_n, y_m)_w\\
  &= \sum_{n \in \mathbb{Z}} |\hat{f}_n|^2.
\end{align*}