Part II Integrable Systems
Based on lectures by A. Ashton
Notes taken by Dexter Chua
Michaelmas 2016
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
Part IB Methods, and Complex Methods or Complex Analysis are essential; Part II
Classical Dynamics is desirable.
Integrability of ordinary differential equations: Hamiltonian systems and the Arnol’d–
Liouville Theorem (sketch of proof). Examples. [3]
Integrability of partial differential equations: The rich mathematical structure and
the universality of the integrable nonlinear partial differential equations (Korteweg–de
Vries, sine–Gordon). Bäcklund transformations and soliton solutions. [2]
The inverse scattering method: Lax pairs. The inverse scattering method for the
KdV equation, and other integrable PDEs. Multi-soliton solutions. Zero curvature
representation. [6]
Hamiltonian formulation of soliton equations. [2]
Painlevé equations and Lie symmetries: Symmetries of differential equations, the ODE
reductions of certain integrable nonlinear PDEs, Painlevé equations. [3]
Contents
0 Introduction
1 Integrability of ODE’s
1.1 Vector fields and flow maps
1.2 Hamiltonian dynamics
1.3 Canonical transformations
1.4 The Arnold-Liouville theorem
2 Partial Differential Equations
2.1 KdV equation
2.2 Sine–Gordon equation
2.3 Bäcklund transformations
3 Inverse scattering transform
3.1 Forward scattering problem
3.1.1 Continuous spectrum
3.1.2 Discrete spectrum and bound states
3.1.3 Summary of forward scattering problem
3.2 Inverse scattering problem
3.3 Lax pairs
3.4 Evolution of scattering data
3.4.1 Continuous spectrum (λ = k² > 0)
3.4.2 Discrete spectrum (λ = −κ² < 0)
3.4.3 Summary of inverse scattering transform
3.5 Reflectionless potentials
3.6 Infinitely many first integrals
4 Structure of integrable PDEs
4.1 Infinite dimensional Hamiltonian system
4.2 Bihamiltonian systems
4.3 Zero curvature representation
4.4 From Lax pairs to zero curvature
5 Symmetry methods in PDEs
5.1 Lie groups and Lie algebras
5.2 Vector fields and one-parameter groups of transformations
5.3 Symmetries of differential equations
5.4 Jets and prolongations
5.5 Painlevé test and integrability
0 Introduction
What is an integrable system? Unfortunately, an integrable system is something mathematicians have not yet managed to define properly. Intuitively, an integrable system is a differential equation we can "integrate up" directly. While in theory integrable systems should be very rare, it turns out that many systems arising in nature are integrable. By exploiting the fact that they are integrable, we can solve them much more easily.
1 Integrability of ODE’s
1.1 Vector fields and flow maps
In the first section, we are going to look at the integrability of ODE's. Here we are going to consider a general m-dimensional first-order non-linear ODE. As always, restricting to only first-order ODE's is not an actual restriction, since any higher-order ODE can be written as a system of first-order ODE's. At the end, we will be concerned with a special kind of ODE given by a Hamiltonian system. However, in this section, we first give a quick overview of the general theory of ODE's.
An m-dimensional ODE is specified by a vector field V : R^m → R^m and an initial condition x_0 ∈ R^m. The objective is to find some x(t) ∈ R^m, which is a function of t ∈ (a, b) for some interval (a, b) containing 0, satisfying
  ẋ = V(x),   x(0) = x_0.
In this course, we will assume the vector field V is sufficiently "nice", so that the following result holds:
Fact. For a "nice" vector field V and any initial condition x_0, there is always a unique solution to ẋ = V(x), x(0) = x_0. Moreover, this solution depends smoothly (i.e. infinitely differentiably) on t and x_0.
It is convenient to write the solution as
  x(t) = g^t x_0,
where g^t : R^m → R^m is called the flow map. Since V is nice, we know this is a smooth map. This flow map has some nice properties:
Proposition.
(i) g^0 = id
(ii) g^{t+s} = g^t ∘ g^s
(iii) (g^t)^{−1} = g^{−t}
If one knows group theory, then this says that g is a group homomorphism from R to the group of diffeomorphisms of R^m, i.e. the group of smooth invertible maps R^m → R^m.
Proof. The equality g^0 = id is by definition of g, and the last equality follows from the first two since t + (−t) = 0. To see the second, we need to show that
  g^{t+s} x_0 = g^t (g^s x_0)
for any x_0. To do so, we see that both of them, as a function of t, are solutions to
  ẋ = V(x),   x(0) = g^s x_0.
So the result follows since solutions are unique.
We say that V is the infinitesimal generator of the flow g^t. This is because we can Taylor expand
  x(ε) = g^ε x_0 = x(0) + ε ẋ(0) + o(ε) = x_0 + εV(x_0) + o(ε).
Given vector fields V_1, V_2, one natural question to ask is whether their flows commute, i.e. if they generate g_1^t and g_2^s, then must we have
  g_1^t g_2^s x_0 = g_2^s g_1^t x_0
for all x_0? In general, this need not be true, so we might be interested to find out if this happens to be true for particular V_1, V_2. However, often, it is difficult to check this directly, because differential equations are generally hard to solve, and we will probably have a huge trouble trying to find explicit expressions for g_1 and g_2.
Thus, we would want to be able to consider this problem at an infinitesimal level, i.e. just by looking at V_1, V_2 themselves. It turns out the answer is given by the commutator:
Definition (Commutator). For two vector fields V_1, V_2 : R^m → R^m, we define a third vector field called the commutator by
  [V_1, V_2] = (V_1 · ∂/∂x) V_2 − (V_2 · ∂/∂x) V_1,
where we write
  ∂/∂x = (∂/∂x_1, ···, ∂/∂x_m)^T.
More explicitly, the ith component is given by
  [V_1, V_2]_i = Σ_{j=1}^m ( (V_1)_j ∂(V_2)_i/∂x_j − (V_2)_j ∂(V_1)_i/∂x_j ).
The result we have is
Proposition. Let V_1, V_2 be vector fields with flows g_1^t and g_2^s. Then we have
  [V_1, V_2] = 0  ⟺  g_1^t g_2^s = g_2^s g_1^t.
Proof. See example sheet 1.
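To make the criterion concrete, here is a small sketch (in Python with sympy, which is not part of the notes) that computes the commutator of two vector fields symbolically from the component formula above; the particular fields V_1, V_2 are made-up examples.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
X = sp.Matrix([x1, x2])

def commutator(V1, V2, X):
    """[V1, V2]_i = sum_j (V1)_j d(V2)_i/dx_j - (V2)_j d(V1)_i/dx_j."""
    J1 = V1.jacobian(X)   # J1[i, j] = d(V1)_i / dx_j
    J2 = V2.jacobian(X)
    return sp.simplify(J2 * V1 - J1 * V2)

# Two example (made-up) vector fields on R^2.
V1 = sp.Matrix([x1, x2])      # generates the scaling flow g_1^t x = e^t x
V2 = sp.Matrix([-x2, x1])     # generates the rotation flow g_2^s = rotation by angle s

print(commutator(V1, V2, X))  # Matrix([[0], [0]]): scalings and rotations commute
```

Since the commutator vanishes, the proposition predicts that the scaling and rotation flows commute, which is easy to believe geometrically.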
1.2 Hamiltonian dynamics
From now on, we are going to restrict to a very special kind of ODE, known as a Hamiltonian system. To write down a general ODE, the background setting is just the space R^n. We then pick a vector field, and then we get an ODE. To write down a Hamiltonian system, we need more things in the background, but conversely we need to supply less information to get the system. These Hamiltonian systems are very useful in classical dynamics, and our results here have applications in classical dynamics, but we will not go into the physical applications here.
The background setting of a Hamiltonian system is a phase space M = R^{2n}. Points on M are described by coordinates
  (q, p) = (q_1, ···, q_n, p_1, ···, p_n).
We tend to think of the q_i as "generalized position" coordinates of particles, and the p_i as the "generalized momentum" coordinates. We will often write
  x = (q, p)^T.
It is very important to note that here we have "paired up" each q_i with the corresponding p_i. In normal R^n, all the coordinates are equal, but this is no longer the case here. To encode this information, we define the 2n × 2n anti-symmetric matrix
  J = \begin{pmatrix} 0 & I_n \\ -I_n & 0 \end{pmatrix}.
We call this the symplectic form, and this is the extra structure we have for a phase space. We will later see that all the things we care about can be written in terms of J, but for practical purposes, we will often express them in terms of p and q instead.
The first example is the Poisson bracket:
Definition (Poisson bracket). For any two functions f, g : M → R, we define the Poisson bracket by
  {f, g} = (∂f/∂x) · J (∂g/∂x) = ∂f/∂q · ∂g/∂p − ∂f/∂p · ∂g/∂q.
This has some obvious and not-so-obvious properties:
Proposition.
(i) This is linear in each argument.
(ii) This is antisymmetric, i.e. {f, g} = −{g, f}.
(iii) This satisfies the Leibniz property:
  {f, gh} = {f, g}h + {f, h}g.
(iv) This satisfies the Jacobi identity:
  {f, {g, h}} + {g, {h, f}} + {h, {f, g}} = 0.
(v) We have
  {q_i, q_j} = {p_i, p_j} = 0,   {q_i, p_j} = δ_{ij}.
Proof.
Just write out the definitions. In particular, you will be made to write
out the 24 terms of the Jacobi identity in the first example sheet.
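As a quick sanity check on these properties, here is a small sketch (Python with sympy, not part of the notes) that computes {f, g} directly from the definition in the 1-degree-of-freedom case; the test functions are arbitrary made-up examples.

```python
import sympy as sp

q, p = sp.symbols('q p')

def pb(f, g):
    """Poisson bracket {f, g} = df/dq dg/dp - df/dp dg/dq (n = 1 case)."""
    return sp.simplify(sp.diff(f, q) * sp.diff(g, p) - sp.diff(f, p) * sp.diff(g, q))

print(pb(q, p))             # 1, i.e. {q, p} = 1
print(pb(q, q), pb(p, p))   # 0 0

# Antisymmetry and the Leibniz property on arbitrary test functions.
f, g, h = q**2 * p, sp.sin(q) + p, q * p**3
assert sp.simplify(pb(f, g) + pb(g, f)) == 0
assert sp.simplify(pb(f, g * h) - (pb(f, g) * h + pb(f, h) * g)) == 0
```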
We will be interested in problems on M of the following form:
Definition (Hamilton's equation). Hamilton's equation is an equation of the form
  q̇ = ∂H/∂p,   ṗ = −∂H/∂q   (∗)
for some function H : M → R called the Hamiltonian.
Just as we think of q and p as generalized position and momentum, we tend to think of H as generalized energy.
Note that given the phase space M, all we need to specify a Hamiltonian system is just a Hamiltonian function H : M → R, which is much less information than that needed to specify a vector field.
In terms of J, we can write Hamilton's equation as
  ẋ = J ∂H/∂x.
We can imagine Hamilton's equation as specifying the trajectory of a particle. In this case, we might want to ask how, say, the speed of the particle changes as it evolves. In general, suppose we have a smooth function f : M → R. We want to find the value of df/dt. We simply have to apply the chain rule to obtain
  df/dt = d/dt f(x(t)) = ∂f/∂x · ẋ = ∂f/∂x · J ∂H/∂x = {f, H}.
We record this result:
Proposition. Let f : M → R be a smooth function. If x(t) evolves according to Hamilton's equation, then
  df/dt = {f, H}.
In particular, a function f is constant if and only if {f, H} = 0. This is very convenient. Without a result like this, if we want to see if f is a conserved quantity of the particle (i.e. df/dt = 0), we might have to integrate the equations of motion, and then try to find explicitly what is conserved, or perhaps mess around with the equations of motion to somehow find that df/dt vanishes. However, we now have a very systematic way of figuring out if f is a conserved quantity: we just compute {f, H}.
In particular, we automatically find that the Hamiltonian is conserved:
  dH/dt = {H, H} = 0.
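As an illustration of this conservation law, here is a small numerical sketch (Python with numpy/scipy, not part of the notes): we integrate ẋ = J ∂H/∂x for a made-up Hamiltonian and watch H stay constant along the trajectory, up to integration error.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Phase space R^2 with coordinates x = (q, p); a made-up example Hamiltonian.
def H(q, p):
    return 0.5 * p**2 + 0.5 * q**2 + 0.25 * q**4

def hamilton_rhs(t, x):
    q, p = x
    dHdq = q + q**3
    dHdp = p
    return [dHdp, -dHdq]        # qdot = dH/dp, pdot = -dH/dq

sol = solve_ivp(hamilton_rhs, (0.0, 50.0), [1.0, 0.0], rtol=1e-10, atol=1e-12)
energies = H(sol.y[0], sol.y[1])
print(energies.max() - energies.min())   # tiny: H is conserved (up to integration error)
```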
Example. Consider a particle (of unit mass) with position q = (q_1, q_2, q_3) (in Cartesian coordinates) moving under the influence of a potential U(q). By Newton's second law, we have
  q̈ = −∂U/∂q.
This is actually a Hamiltonian system. We define the momentum variables by
  p_i = q̇_i,
then we have
  ẋ = (q̇, ṗ)^T = (p, −∂U/∂q)^T = J ∂H/∂x,
with
  H = ½|p|² + U(q).
This is just the usual energy! Indeed, we can compute
  ∂H/∂p = p,   ∂H/∂q = ∂U/∂q.
Definition (Hamiltonian vector field). Given a Hamiltonian function H, the Hamiltonian vector field is given by
  V_H = J ∂H/∂x.
We then see that by definition, the Hamiltonian vector field generates the Hamiltonian flow. More generally, for any f : M → R, we call
  V_f = J ∂f/∂x
the Hamiltonian vector field with respect to f.
We now have two bracket-like things we can form. Given two functions f, g, we can take the Poisson bracket to get {f, g}, and consider its Hamiltonian vector field V_{{f,g}}. On the other hand, we can first form V_f and V_g, and then take the commutator of the vector fields. It turns out these are not equal, but differ by a sign.
Proposition. We have
  [V_f, V_g] = −V_{{f,g}}.
Proof. See first example sheet.
Definition (First integral). Given a phase space M with a Hamiltonian H, we call f : M → R a first integral of the Hamiltonian system if
  {f, H} = 0.
The reason for the term "first integral" is historical: when we solve a differential equation, we integrate the equation. Every time we integrate it, we obtain a new constant. And the first constant we obtain when we integrate is known as the first integral. However, for our purposes, we can just as well think of it as a constant of motion.
Example. Consider the two-body problem: the Sun is fixed at the origin, and a planet has Cartesian coordinates q = (q_1, q_2, q_3). The equation of motion will be
  q̈ = −q/|q|³.
This is equivalent to the Hamiltonian system p = q̇, with
  H = ½|p|² − 1/|q|.
We have an angular momentum given by
  L = q × p.
Working with coordinates, we have
  L_i = ε_{ijk} q_j p_k.
We then have (with implicit summation)
  {L_i, H} = ∂L_i/∂q_ℓ ∂H/∂p_ℓ − ∂L_i/∂p_ℓ ∂H/∂q_ℓ
           = ε_{ijk} ( δ_{ℓj} p_k p_ℓ − δ_{kℓ} q_j q_ℓ/|q|³ )
           = ε_{ijk} ( p_j p_k − q_j q_k/|q|³ )
           = 0,
where we know the thing vanishes because we contracted a symmetric tensor with an antisymmetric one. So this is a first integral.
Less interestingly, we know H is also a first integral. In general, some Hamiltonians have many many first integrals.
Our objective for the remainder of the chapter is to show that if our Hamiltonian system has enough first integrals, then we can find a change of coordinates so that the equations of motion are "trivial". However, we need to impose some constraints on the integrals for this to be true. We will need to know about the following words:
Definition (Involution). We say that two first integrals F, G are in involution if {F, G} = 0 (so F and G "Poisson commute").
Definition (Independent first integrals). A collection of functions f_i : M → R are independent if at each x ∈ M, the vectors ∂f_i/∂x for i = 1, ···, n are independent.
In general we will say a system is "integrable" if we can find a change of coordinates so that the equations of motion become "trivial" and we can just integrate it up. This is a bit vague, so we will define integrability in terms of the existence of first integrals, and then we will later see that if these conditions are satisfied, then we can indeed integrate it up:
Definition (Integrable system). A 2n-dimensional Hamiltonian system (M, H) is integrable if there exist n first integrals {f_i}_{i=1}^n that are independent and in involution (i.e. {f_i, f_j} = 0 for all i, j).
The word independent is very important, or else people will cheat, e.g. take H, 2H, e^H, H², ···.
Example. Two-dimensional Hamiltonian systems are always integrable.
1.3 Canonical transformations
We now come to the main result of the chapter. We will show that we can indeed integrate up integrable systems. We are going to show that there is a clever choice of coordinates such that Hamilton's equations become "trivial". However, recall that the coordinates in a Hamiltonian system are not arbitrary. We have somehow "paired up" q_i and p_i. So we want to only consider coordinate changes that somehow respect this pairing.
There are many ways we can define what it means to "respect" the pairing. We will pick a simple definition: we require that it preserves the form of Hamilton's equation.
Suppose we had a general coordinate change (q, p) ↦ (Q(q, p), P(q, p)).
Definition (Canonical transformation). A coordinate change (q, p) ↦ (Q, P) is called canonical if it leaves Hamilton's equations invariant, i.e. the equations in the original coordinates
  q̇ = ∂H/∂p,   ṗ = −∂H/∂q
are equivalent to
  Q̇ = ∂H̃/∂P,   Ṗ = −∂H̃/∂Q,
where H̃(Q, P) = H(q, p).
If we write x = (q, p) and y = (Q, P), then this is equivalent to asking for
  ẋ = J ∂H/∂x  ⟺  ẏ = J ∂H̃/∂y.
Example. If we just swap the q and p around, then the equations change by a sign. So this is not a canonical transformation.
Example. The simplest possible case of a canonical transformation is a linear transformation. Consider a linear change of coordinates given by
  x ↦ y(x) = Ax.
We claim that this is canonical iff AJA^T = J, i.e. that A is symplectic.
Indeed, by linearity, we have
  ẏ = Aẋ = AJ ∂H/∂x.
Setting H̃(y) = H(x), we have
  ∂H/∂x_i = (∂y_j/∂x_i) ∂H̃(y)/∂y_j = A_{ji} ∂H̃(y)/∂y_j = [ A^T ∂H̃/∂y ]_i.
Putting this back in, we have
  ẏ = AJA^T ∂H̃/∂y.
So y ↦ y(x) is canonical iff J = AJA^T.
What about more general cases? Recall from IB Analysis II that a differentiable map is "locally linear". Now Hamilton's equations are purely local equations, so we might expect the following:
Proposition. A map x ↦ y(x) is canonical iff Dy is symplectic, i.e.
  Dy J (Dy)^T = J.
Indeed, this follows from a simple application of the chain rule.
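As a quick numerical sketch of this criterion (Python/numpy, not part of the notes), we can evaluate Dy J (Dy)^T at a point for a candidate map; the quadratic map below is a made-up example that happens to be canonical, while swapping q and p (the first example above) fails the test.

```python
import numpy as np

J = np.array([[0.0, 1.0], [-1.0, 0.0]])   # symplectic form for n = 1

def is_canonical(jacobian, tol=1e-12):
    """Check the symplectic condition  Dy J Dy^T = J."""
    return np.allclose(jacobian @ J @ jacobian.T, J, atol=tol)

q, p = 0.7, -1.3                           # test the condition at an arbitrary point

# Made-up nonlinear map (Q, P) = (q, p - 3q^2); its Jacobian at (q, p) is
Dy_good = np.array([[1.0, 0.0], [-6.0 * q, 1.0]])
print(is_canonical(Dy_good))               # True: this map is canonical

# Swapping q and p is not canonical:
Dy_swap = np.array([[0.0, 1.0], [1.0, 0.0]])
print(is_canonical(Dy_swap))               # False: here Dy J Dy^T = -J
```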
Generating functions
We now discuss a useful way of producing canonical transformations, known as generating functions. In general, we can do generating functions in four different ways, but they are all very similar, so we will just do one that will be useful later on.
Suppose we have a function S : R^{2n} → R. We suggestively write its arguments as S(q, P). We now set
  p = ∂S/∂q,   Q = ∂S/∂P.
By this equation, we mean we write down the first equation, which allows us to solve for P in terms of q, p. Then the second equation tells us the value of Q in terms of q, P, hence in terms of p, q.
Usually, the way we use this is that we already have a candidate for what P should be. We then try to find a function S(q, P) such that the first equation holds. Then the second equation will tell us what the right choice of Q is.
Checking that this indeed gives rise to a canonical transformation is just a very careful application of the chain rule, which we shall not go into. Instead, we look at a few examples to see it in action.
Example. Consider the generating function
  S(q, P) = q · P.
Then we have
  p = ∂S/∂q = P,   Q = ∂S/∂P = q.
So this generates the identity transformation (Q, P) = (q, p).
Example. In a 2-dimensional phase space, we consider the generating function
  S(q, P) = qP + q².
Then we have
  p = ∂S/∂q = P + 2q,   Q = ∂S/∂P = q.
So we have the transformation
  (Q, P) = (q, p − 2q).
In matrix form, this is
  \begin{pmatrix} Q \\ P \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} \begin{pmatrix} q \\ p \end{pmatrix}.
To see that this is canonical, we compute
  \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} J \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix}^T = \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = J.
So this is indeed a canonical transformation.
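The two-step recipe above (solve p = ∂S/∂q for P, then read off Q = ∂S/∂P) can be automated symbolically. Here is a small sketch (Python/sympy, not from the notes) applied to this second example; the symplectic check at the end repeats the matrix computation.

```python
import sympy as sp

q, p, P = sp.symbols('q p P')
S = q * P + q**2                       # generating function S(q, P) from the example

# Step 1: p = dS/dq defines P implicitly; solve for P in terms of (q, p).
P_of_qp = sp.solve(sp.Eq(p, sp.diff(S, q)), P)[0]      # P = p - 2q

# Step 2: Q = dS/dP, then substitute P = P(q, p).
Q_of_qp = sp.diff(S, P).subs(P, P_of_qp)               # Q = q

# Symplectic check on the Jacobian of (q, p) -> (Q, P).
y = sp.Matrix([Q_of_qp, P_of_qp])
Dy = y.jacobian(sp.Matrix([q, p]))
J = sp.Matrix([[0, 1], [-1, 0]])
print(Q_of_qp, P_of_qp)                # q, p - 2*q
print(sp.simplify(Dy * J * Dy.T - J))  # zero matrix: the transformation is canonical
```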
1.4 The Arnold-Liouville theorem
We now get to the Arnold-Liouville theorem. This theorem says that if a Hamiltonian system is integrable, then we can find a canonical transformation (q, p) ↦ (Q, P) such that H̃ depends only on P. If this happened, then Hamilton's equations reduce to
  Q̇ = ∂H̃/∂P,   Ṗ = −∂H̃/∂Q = 0,
which is pretty easy to solve. We find that P(t) = P_0 is a constant, and since the right hand side of the first equation depends only on P, we find that Q̇ is also constant! So Q = Q_0 + Ωt, where
  Ω = ∂H̃/∂P (P_0).
So the solution just falls out very easily.
Before we prove the Arnold-Liouville theorem in full generality, we first see what the canonical transformation looks like in a very particular case. Here we will just have to write down the canonical transformation and see that it works, but we will later find that the Arnold-Liouville theorem gives us a general method to find the transformation.
Example. Consider the harmonic oscillator with Hamiltonian
  H(q, p) = ½p² + ½ω²q².
Since this is a 2-dimensional system, we only need a single first integral. Since H is a first integral for trivial reasons, this is an integrable Hamiltonian system.
We can actually draw the lines on which H is constant: they are just ellipses in the (q, p) plane.
We note that the ellipses are each homeomorphic to S¹. Now we introduce the coordinate transformation (q, p) ↦ (φ, I), defined by
  q = √(2I/ω) sin φ,   p = √(2Iω) cos φ.
For the purpose of this example, we can suppose we obtained this formula through divine inspiration. However, in the Arnold-Liouville theorem, we will provide a general way of coming up with these formulas.
We can manually show that this transformation is canonical, but it is merely a computation and we will not waste time doing that. In these new coordinates, the Hamiltonian looks like
  H̃(φ, I) = H(q(φ, I), p(φ, I)) = ωI.
This is really nice. There is no φ! Now Hamilton's equations become
  φ̇ = ∂H̃/∂I = ω,   İ = −∂H̃/∂φ = 0.
We can integrate up to obtain
  φ(t) = φ_0 + ωt,   I(t) = I_0.
For some unexplainable reason, we decide it is fun to consider the integral along paths of constant H:
  (1/2π) ∮ p dq = (1/2π) ∫_0^{2π} p(φ, I) ( ∂q/∂φ dφ + ∂q/∂I dI )
               = (1/2π) ∫_0^{2π} p(φ, I) ∂q/∂φ dφ
               = (1/2π) ∫_0^{2π} √(2I/ω) √(2Iω) cos²φ dφ
               = I.
This is interesting. We could always have performed the integral (1/2π) ∮ p dq along paths of constant H without knowing anything about I and φ, and this would have magically given us the new coordinate I.
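We can confirm this numerically. The sketch below (Python/numpy, not from the notes) parametrizes a level set H = c of the harmonic oscillator and evaluates (1/2π)∮ p dq by quadrature; it returns c/ω, which is exactly the action I found in the worked example later in this chapter. The values of ω and c are made up.

```python
import numpy as np

omega, c = 2.0, 3.0                    # made-up frequency and energy level H = c

# Parametrize the ellipse p^2/2 + omega^2 q^2 / 2 = c by an angle theta.
theta = np.linspace(0.0, 2.0 * np.pi, 20001)
q = np.sqrt(2.0 * c) / omega * np.sin(theta)
p = np.sqrt(2.0 * c) * np.cos(theta)

# Action I = (1/2 pi) \oint p dq, evaluated with the trapezoidal rule.
I = np.trapz(p * np.gradient(q, theta), theta) / (2.0 * np.pi)
print(I, c / omega)                    # both ~1.5: the loop integral recovers I = c/omega
```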
There are two things to take away from this.
(i) The motion takes place in S¹.
(ii) We got I by performing (1/2π) ∮ p dq.
These two ideas are essentially what we are going to prove for a general Hamiltonian system.
Theorem (Arnold-Liouville theorem). We let (M, H) be an integrable 2n-dimensional Hamiltonian system with independent, involutive first integrals f_1, ···, f_n, where f_1 = H. For any fixed c ∈ R^n, we set
  M_c = {(q, p) ∈ M : f_i(q, p) = c_i, i = 1, ···, n}.
Then
(i) M_c is a smooth n-dimensional surface in M. If M_c is compact and connected, then it is diffeomorphic to the torus
  T^n = S¹ × ··· × S¹.
(ii) If M_c is compact and connected, then locally, there exist canonical coordinate transformations (q, p) ↦ (φ, I) called the action-angle coordinates such that the angles {φ_k}_{k=1}^n are coordinates on M_c; the actions {I_k}_{k=1}^n are first integrals, and H(q, p) does not depend on φ. In particular, Hamilton's equations become
  İ = 0,   φ̇ = ∂H̃/∂I = constant.
Some parts of the proof will refer to certain results from rather pure courses,
which the applied people may be willing to just take on faith.
Proof sketch. The first part is pure differential geometry. To show that M_c is smooth and n-dimensional, we apply the preimage theorem you may or may not have learnt from IID Differential Geometry (which is in turn an easy consequence of the inverse function theorem from IB Analysis II). The key that makes this work is that the constraints are independent, which is the condition that allows the preimage theorem to apply.
We next show that M_c is diffeomorphic to the torus if it is compact and connected. Consider the Hamiltonian vector fields defined by
  V_{f_i} = J ∂f_i/∂x.
We claim that these are tangent to the surface M_c. By differential geometry, it suffices to show that the derivative of the {f_j} in the direction of V_{f_i} vanishes. We can compute
  V_{f_i} · ∂f_j/∂x = (∂f_j/∂x) · J ∂f_i/∂x = {f_j, f_i} = 0.
Since this vanishes, we know that V_{f_i} is tangent to the surface. Again by differential geometry, the flow maps {g_i} must map M_c to itself. Also, we know that the flow maps commute. Indeed, this follows from the fact that
  [V_{f_i}, V_{f_j}] = −V_{{f_i, f_j}} = −V_0 = 0.
So we have a whole bunch of commuting flow maps from M_c to itself. We set
  g^t = g_1^{t_1} g_2^{t_2} ··· g_n^{t_n},
where t ∈ R^n. Then because of commutativity, we have
  g^{t_1 + t_2} = g^{t_1} g^{t_2}.
So this gives a group action of R^n on the surface M_c. We fix x ∈ M_c. We define
  stab(x) = {t ∈ R^n : g^t x = x}.
We introduce the map
  φ : R^n / stab(x) → M_c
given by φ(t) = g^t x. By the orbit-stabilizer theorem, this gives a bijection between R^n / stab(x) and the orbit of x. It can be shown that the orbit of x is exactly the connected component of x. Now if M_c is connected, then this must be the whole of M_c! By general differential geometry theory, we get that this map is indeed a diffeomorphism.
We know that stab(x) is a subgroup of R^n, and if the g_i are non-trivial, it can be seen (at least intuitively) that this is discrete. Thus, it must be isomorphic to something of the form Z^k with 1 ≤ k ≤ n.
So we have
  M_c ≅ R^n / stab(x) ≅ R^n / Z^k ≅ (R^k/Z^k) × R^{n−k} ≅ T^k × R^{n−k}.
Now if M_c is compact, we must have n − k = 0, i.e. n = k, so that we have no factors of R. So M_c ≅ T^n.
With all the differential geometry out of the way, we can now construct the action-angle coordinates.
For simplicity of presentation, we only do it in the case when n = 2. The proof for higher dimensions is entirely analogous, except that we need to use a higher-dimensional analogue of Green's theorem, which we do not currently have.
We note that it is currently trivial to re-parameterize the phase space with coordinates (Q, P) such that P is constant within the Hamiltonian flow, and each coordinate of Q takes values in S¹. Indeed, we just put P = c and use the diffeomorphism T^n ≅ M_c to parameterize each M_c as a product of n copies of S¹. However, this is not good enough, because such an arbitrary transformation will almost certainly not be canonical. So we shall try to find a more natural and in fact canonical way of parametrizing our phase space.
We first work on the generalized momentum part. We want to replace c with something nicer. We will do something analogous to the simple harmonic oscillator we've got.
So we fix a c, and try to come up with some numbers I that label this M_c. Recall that our surface M_c looks like a torus. Up to continuous deformation of loops, there are two non-trivial "single" loops on the torus, one around each cycle. More generally, for an n-torus, we have n such distinct loops Γ_1, ···, Γ_n. More concretely, after identifying M_c with the product of n copies of S¹, these are the loops given by
  {0} × ··· × {0} × S¹ × {0} × ··· × {0} ⊆ S¹ × ··· × S¹.
We now attempt to define:
  I_j = (1/2π) ∮_{Γ_j} p · dq.
This is just like the formula we had for the simple harmonic oscillator.
We want to make sure this is well-defined: recall that Γ_j actually represents a class of loops identified under continuous deformation. What if we picked a different loop, say Γ'_2 instead of Γ_2?
On M_c, we have the equation
  f_i(q, p) = c_i.
We will have to assume that we can invert this equation for p locally, i.e. we can write
  p = p(q, c).
The condition for being able to do so is just
  det(∂f_i/∂p_j) ≠ 0,
which is not hard.
Then by definition, the following holds identically:
  f_i(q, p(q, c)) = c_i.
We can then differentiate this with respect to q_k to obtain
  ∂f_i/∂q_k + (∂f_i/∂p_ℓ)(∂p_ℓ/∂q_k) = 0
on M_c. Now recall that the {f_i} are in involution. So on M_c, we have
  0 = {f_i, f_j}
    = (∂f_i/∂q_k)(∂f_j/∂p_k) − (∂f_i/∂p_k)(∂f_j/∂q_k)
    = −(∂f_i/∂p_ℓ)(∂p_ℓ/∂q_k)(∂f_j/∂p_k) + (∂f_i/∂p_k)(∂f_j/∂p_ℓ)(∂p_ℓ/∂q_k)
    = −(∂f_i/∂p_k)(∂p_k/∂q_ℓ)(∂f_j/∂p_ℓ) + (∂f_i/∂p_k)(∂f_j/∂p_ℓ)(∂p_ℓ/∂q_k)
    = (∂f_i/∂p_k)( ∂p_ℓ/∂q_k − ∂p_k/∂q_ℓ )(∂f_j/∂p_ℓ).
Recall that the determinants of the matrices (∂f_i/∂p_k) and (∂f_j/∂p_ℓ) are non-zero, i.e. the matrices are invertible. So for this to hold, the middle matrix must vanish! So we have
  ∂p_ℓ/∂q_k − ∂p_k/∂q_ℓ = 0.
In our particular case of n = 2, since ℓ, k can only be 1, 2, the only non-trivial thing this says is
  ∂p_1/∂q_2 − ∂p_2/∂q_1 = 0.
Now suppose we have two "simple" loops Γ_2 and Γ'_2 in the same class. Then they bound an area A on the torus, and we have
  ( ∮_{Γ_2} − ∮_{Γ'_2} ) p · dq = ∮_{∂A} p · dq = ∬_A ( ∂p_2/∂q_1 − ∂p_1/∂q_2 ) dq_1 dq_2 = 0
by Green's theorem.
So I_j is well-defined, and
  I = I(c)
is just a function of c. This will be our new "momentum" coordinates. To figure out what the angles φ should be, we use generating functions. For now, we assume that we can invert I(c), so that we can write
  c = c(I).
We arbitrarily pick a point x_0, and define the generating function
  S(q, I) = ∫_{x_0}^x p(q', c(I)) · dq',
where x = (q, p) = (q, p(q, c(I))). However, this is not a priori well-defined, because we haven't said how we are going to integrate from x_0 to x. We are going to pick paths arbitrarily, but we want to make sure it is well-defined. Suppose we change from a path γ_1 to γ_2 by a little bit, and they enclose a surface B.
Then we have
  S(q, I) ↦ S(q, I) + ∮_{∂B} p · dq.
Again, we are integrating p · dq around a boundary, so there is no change.
However, we don't live in flat space. We live in a torus, and we can have a crazy loop that wraps around one of the cycles of the torus.
Then what we have effectively got is that we added a loop (say) Γ_2 to our path, and this contributes a factor of 2πI_2. In general, these transformations give changes of the form
  S(q, I) ↦ S(q, I) + 2πI_j.
This is the only thing that can happen. So differentiating with respect to I, we know that
  φ = ∂S/∂I
is well-defined modulo 2π. These are the angle coordinates. Note that just like angles, we can pick φ consistently locally without this ambiguity, as long as we stay near some fixed point, but when we want to talk about the whole surface, this ambiguity necessarily arises. Now also note that
  ∂S/∂q = p.
Indeed, we can write
  S = ∫_{x_0}^x F · dx',
where
  F = (p, 0).
So by the fundamental theorem of calculus, we have
  ∂S/∂x = F.
So we get that ∂S/∂q = p.
In summary, we have constructed on M_c the following: I = I(c), S(q, I), and
  φ = ∂S/∂I,   p = ∂S/∂q.
So S is a generator for the canonical transformation, and (q, p) ↦ (φ, I) is a canonical transformation.
Note that at any point x, we know c = f(x). So I(c) = I(f) depends on the first integrals only. So we have İ = 0.
So Hamilton's equations become
  φ̇ = ∂H̃/∂I,   İ = 0 = −∂H̃/∂φ.
So the new Hamiltonian depends only on I. So we can integrate up and get
  φ(t) = φ_0 + Ωt,   I(t) = I_0,
where
  Ω = ∂H̃/∂I (I_0).
To summarize, to integrate up an integrable Hamiltonian system, we identify the different cycles Γ_1, ···, Γ_n on M_c. We then construct
  I_j = (1/2π) ∮_{Γ_j} p · dq,
where p = p(q, c). We then invert this to say
  c = c(I).
We then compute
  φ = ∂S/∂I,
where
  S = ∫_{x_0}^x p(q', c(I)) · dq'.
Now we do this again with the harmonic oscillator.
Example. In the harmonic oscillator, we have
  H(q, p) = ½p² + ½ω²q².
We then have
  M_c = { (q, p) : ½p² + ½ω²q² = c }.
The first part of the Arnold-Liouville theorem says this is diffeomorphic to T¹ = S¹, which it is! The next step is to pick a loop, and there is an obvious one: the circle itself. We write
  p = p(q, c) = ±√(2c − ω²q²)
on M_c. Then we have
  I = (1/2π) ∮ p · dq = c/ω.
We can then write c as a function of I by
  c = c(I) = ωI.
Now construct
  S(q, I) = ∫_{x_0}^x p(q', c(I)) dq'.
We can pick x_0 to be the point with q = 0. Then this is equal to
  ∫_0^q √(2ωI − ω²q'²) dq'.
To find φ, we need to differentiate this thing to get
  φ = ∂S/∂I = ω ∫_0^q dq'/√(2ωI − ω²q'²) = sin⁻¹( √(ω/2I) q ).
As expected, this is only well-defined up to multiples of 2π! Using the fact that c = H, we have
  q = √(2I/ω) sin φ,   p = √(2Iω) cos φ.
These are exactly the coordinates we obtained through divine inspiration last time.
2 Partial Differential Equations
For the remainder of the course, we are going to look at PDE's. We can view these as infinite-dimensional analogues of ODE's. So what do we expect for integrable PDE's? Recall that if a 2n-dimensional ODE is integrable, then it has n first integrals. Since PDE's are infinite-dimensional, and half of infinity is still infinity, we would expect to have infinitely many first integrals. Similar to the case of integrable ODE's, we would also expect that there will be some magic transformation that allows us to write down the solution with ease, even if the initial problem looks very complicated.
These are all true, but our journey will be less straightforward. To begin with, we will not define what integrability means, because it is a rather complicated issue. We will go through one method of "integrating up" a PDE in detail, known as the inverse scattering transform, and we will apply it to a particular equation. Unfortunately, the way we apply the inverse scattering transform to a PDE is not obvious, and here we will have to do it through "divine inspiration".
Before we get to the inverse scattering transform, we first look at a few examples of PDEs.
2.1 KdV equation
The KdV equation is given by
  u_t + u_xxx − 6uu_x = 0.
Before we study the KdV equation, we will look at some variations of this where
we drop some terms, and then see how they compare.
Example. Consider the linear PDE
  u_t + u_xxx = 0,
where u = u(x, t) is a function of two variables. This admits solutions of the form
  e^{ikx − iωt},
known as plane wave modes. For this to be a solution, ω must obey the dispersion relation
  ω = ω(k) = −k³.
For any k, as long as we pick ω this way, we obtain a solution. By writing the solution as
  u(x, t) = exp( ik( x − (ω(k)/k) t ) ),
we see that plane wave modes travel at speed
  ω/k = −k².
It is very important that the speed depends on k. Different plane wave modes travel at different speeds. This is going to give rise to what we call dispersion.
A general solution is a superposition of plane wave modes
  Σ_k a(k) e^{ikx − iω(k)t},
or even an uncountable superposition
  ∫ A(k) e^{ikx − iω(k)t} dk.
It is a theorem that for linear PDE's on convex domains, all solutions are indeed superpositions of plane wave modes. So this is indeed completely general.
So suppose we have an initial solution that is a nice localized bump. We write this as a superposition of plane wave modes. As we let time pass, different plane wave modes travel at different speeds, so this becomes a huge mess! After some time the bump will have spread out into a dispersed wave train.
Intuitively, what gives us the dispersion is the third-order derivative ∂_x³. If we had ∂_x instead, then there would be no dispersion.
Example. Consider the non-linear PDE
  u_t − 6uu_x = 0.
This looks almost intractable, as non-linear PDE's are scary, and we don't know what to do. However, it turns out that we can solve this for any initial data u(x, 0) = f(x) via the method of characteristics. Details are left on the second example sheet, but the solution we get is
  u(x, t) = f(ξ),
where ξ is given implicitly by
  ξ = x + 6tf(ξ).
We can show that u_x becomes, in general, infinite in finite time. Indeed, we have
  u_x = f'(ξ) ∂ξ/∂x.
We differentiate the formula for ξ to obtain
  ∂ξ/∂x = 1 + 6tf'(ξ) ∂ξ/∂x.
So we know ∂ξ/∂x becomes infinite when 1 − 6tf'(ξ) = 0. In general, this happens in finite time, and at that time the solution develops a vertical slope. After that, it becomes a multi-valued function! So the solution steepens until the profile becomes vertical and then overturns. This is known as wave-breaking.
We can imagine that the −6uu_x term gives us wave breaking.
What happens if we combine both of these effects?
Definition (KdV equation). The KdV equation is given by
  u_t + u_xxx − 6uu_x = 0.
It turns out that this has a perfect balance between dispersion and non-linearity. This admits very special solutions known as solitons. For example, a 1-soliton solution is
  u(x, t) = −2χ_1² sech²( χ_1(x − 4χ_1²t) ).
The solution tries to both topple over and disperse, and it turns out the two effects balance so that it moves like a normal wave at a constant speed. If we look at the solution, then we see that this has a peculiar property that the speed of the wave depends on the amplitude: the taller you are, the faster you move.
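As a sanity check on this formula, here is a small numerical sketch (Python/numpy, not from the notes) that evaluates the residual u_t + u_xxx − 6uu_x of the 1-soliton on a grid using finite differences; the residual is small compared with the size of the individual terms. The value of χ_1 is made up.

```python
import numpy as np

chi = 1.2                                       # made-up soliton parameter chi_1

def u(x, t):
    """1-soliton of KdV: u = -2 chi^2 sech^2(chi (x - 4 chi^2 t))."""
    return -2.0 * chi**2 / np.cosh(chi * (x - 4.0 * chi**2 * t))**2

x = np.linspace(-15.0, 15.0, 4001)
dx = x[1] - x[0]
dt = 1e-5
t = 0.3

u0 = u(x, t)
u_t = (u(x, t + dt) - u(x, t - dt)) / (2.0 * dt)   # centred time derivative
u_x = np.gradient(u0, dx)
u_xxx = np.gradient(np.gradient(u_x, dx), dx)

residual = u_t + u_xxx - 6.0 * u0 * u_x
print(np.max(np.abs(residual[5:-5])))           # small (finite-difference error only)
```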
Now what if we started with two of these solitons? If we placed them far apart, then they should not interact, and they would just individually move to the right. But note that the speed depends on the amplitude. So if we put a taller one behind a shorter one, it might catch up and then collide! Indeed, suppose they started off with the tall soliton to the left of the short one. After a while, the tall one starts to catch up. Note that both of the humps are moving to the right; it's just that we have to move the frame so that everything stays on the page. Soon, they collide into each other, and then they start to merge.
What do we expect to happen? The KdV equation is a very complicated non-linear equation, so we might expect a lot of interactions, and the result to be a huge mess. But no: they pass through each other as if nothing has happened, and then they just walk away and depart from each other.
This is like magic! If we just looked at the equation, there is no way we could have guessed that these two solitons would interact in such an uneventful manner. Non-linear PDEs in general are messy. But these are very stable structures in the system, and they behave more like particles than waves.
At first, this phenomenon was discovered through numerical simulation. However, later we will see that the KdV equation is integrable, and we can in fact find explicit expressions for a general N-soliton solution.
2.2 Sine–Gordon equation
We next look at another equation that again has soliton solutions, known as the sine–Gordon equation.
Definition (Sine–Gordon equation). The sine–Gordon equation is given by
  u_tt − u_xx + sin u = 0.
This is known as the sine–Gordon equation, because there is a famous equation in physics known as the Klein–Gordon equation, given by
  u_tt − u_xx + u = 0.
Since we have a sine instead of a u, we call it the sine–Gordon equation!
There are a few ways we can motivate the sine–Gordon equation. We will use one from physics. Suppose we have a chain of pendulums of length ℓ with masses m, attached at equal spacings ∆x along a horizontal line.
Each pendulum is allowed to rotate in the vertical plane, i.e. the plane with normal along the horizontal line, and we specify the angle of the ith pendulum by θ_i(t). Since we want to eventually take the limit as ∆x → 0, we imagine θ is a function of both space and time, and write this as θ_i(t) = θ(i∆x, t).
Since gravity exists, each pendulum experiences a torque
  −mℓg sin θ_i.
We now introduce an interaction between the different pendulums. We imagine the masses are connected by some springs, so that the ith pendulum gets a torque of
  K(θ_{i+1} − θ_i)/∆x,   K(θ_{i−1} − θ_i)/∆x.
By Newton's laws, the equation of motion is
  mℓ² d²θ_i/dt² = −mgℓ sin θ_i + K(θ_{i+1} − 2θ_i + θ_{i−1})/∆x.
We divide everything by ∆x, and take the limit as ∆x → 0, with M = m/∆x held constant. We then end up with
  Mℓ² ∂²θ/∂t² = −Mgℓ sin θ + K ∂²θ/∂x².
Making some simple coordinate scalings, this becomes
  u_tt − u_xx + sin u = 0.
There is also another motivation for this from differential geometry. It turns out solutions to the sine–Gordon equation correspond to pseudospherical surfaces in R³, namely the surfaces that have constant negative curvature.
If we pick so-called light cone coordinates ξ = ½(x − t) and τ = ½(x + t), then the sine–Gordon equation becomes
  ∂²u/∂ξ∂τ = sin u,
and often this is the form of the sine–Gordon equation we will encounter.
This also admits soliton solutions
  u(x, t) = 4 tan⁻¹( exp( (x − vt)/√(1 − v²) ) ).
We can check that this is indeed a solution for this non-linear PDE.
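Here is a small sketch (Python/numpy, not from the notes) that checks the kink formula by evaluating the residual u_tt − u_xx + sin u with finite differences; the velocity v is a made-up value with |v| < 1.

```python
import numpy as np

v = 0.4                                          # made-up kink velocity, |v| < 1

def u(x, t):
    """Sine-Gordon kink: u = 4 arctan(exp((x - v t)/sqrt(1 - v^2)))."""
    return 4.0 * np.arctan(np.exp((x - v * t) / np.sqrt(1.0 - v**2)))

x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]
dt = 1e-4
t = 0.7

u_tt = (u(x, t + dt) - 2.0 * u(x, t) + u(x, t - dt)) / dt**2
u_xx = (np.roll(u(x, t), -1) - 2.0 * u(x, t) + np.roll(u(x, t), 1)) / dx**2

residual = u_tt - u_xx + np.sin(u(x, t))
print(np.max(np.abs(residual[1:-1])))            # small: the kink solves sine-Gordon
```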
This solution is a "kink": it increases monotonically from u = 0 as x → −∞ to u = 2π as x → +∞.
Now remember that θ was an angle. So 2π is just the same as 0! If we think of the value of u as living in the circle S¹, then this satisfies the boundary condition u → 0 as x → ±∞.
If we view it this way, it is absolutely obvious that no matter how this solution evolves in time, it will never become, or even approach, the "trivial" solution u = 0, even though both satisfy the boundary condition u → 0 as x → ±∞.
2.3 Bäcklund transformations
For a linear partial differential equation, we have the principle of superposition: if we have two solutions, then we can add them to get a third solution. This is no longer true in non-linear PDE's.
One way we can find ourselves a new solution is through a Bäcklund transformation. This originally came from geometry, where we wanted to transform a surface to another, but we will only consider the applications to PDE's.
The actual definition of the Bäcklund transformation is complicated. So we start with an example.
Example. Consider the Cauchy–Riemann equations
  u_x = v_y,   u_y = −v_x.
We know that the pair u, v satisfies the Cauchy–Riemann equations if and only if both u, v are harmonic, i.e. u_xx + u_yy = 0 etc.
Now suppose we have managed to find a harmonic function v = v(x, y). Then we can try to solve the Cauchy–Riemann equations, and we would get another harmonic function u = u(x, y).
For example, if v = 2xy, then we get the partial differential equations
  u_x = 2x,   u_y = −2y.
So we obtain
  u(x, y) = x² − y² + C
for some constant C, and this function u is guaranteed to be a solution to Laplace's equation.
So the Cauchy–Riemann equations generate new solutions to Laplace's equation from old ones. This is an example of an (auto-)Bäcklund transformation for Laplace's equation.
In general, we have the following definition:
Definition (Bäcklund transformation). A Bäcklund transformation is a system of equations that relate the solutions of some PDE to
(i) a solution to some other PDE; or
(ii) another solution to the same PDE.
In the second case, we call it an auto-Bäcklund transformation.
Example. The equation u_xt = e^u is related to the equation v_xt = 0 via the Bäcklund transformation
  u_x + v_x = √2 exp( (u − v)/2 ),   u_t − v_t = √2 exp( (u + v)/2 ).
The verification is left as an exercise on the first example sheet. Since v_xt = 0 is an easier equation to solve, this gives us a method to solve u_xt = e^u.
We also have examples of auto-Bäcklund transformations:
Example. For any non-zero constant ε, consider the system
  ∂/∂ξ (ϕ_1 − ϕ_2) = 2ε sin( (ϕ_1 + ϕ_2)/2 ),
  ∂/∂τ (ϕ_1 + ϕ_2) = (2/ε) sin( (ϕ_1 − ϕ_2)/2 ).
These equations come from geometry, and we will not go into details motivating these. We can compute
  ∂²/∂ξ∂τ (ϕ_1 − ϕ_2) = ∂/∂τ ( 2ε sin( (ϕ_1 + ϕ_2)/2 ) )
                      = 2ε cos( (ϕ_1 + ϕ_2)/2 ) ∂/∂τ ( (ϕ_1 + ϕ_2)/2 )
                      = 2ε cos( (ϕ_1 + ϕ_2)/2 ) · (1/2) · (2/ε) sin( (ϕ_1 − ϕ_2)/2 )
                      = 2 cos( (ϕ_1 + ϕ_2)/2 ) sin( (ϕ_1 − ϕ_2)/2 )
                      = sin ϕ_1 − sin ϕ_2.
It then follows that
  ∂²ϕ_2/∂ξ∂τ = sin ϕ_2  ⟺  ∂²ϕ_1/∂ξ∂τ = sin ϕ_1.
In other words, ϕ_1 solves the sine–Gordon equation in light cone coordinates if and only if ϕ_2 does. So this gives an auto-Bäcklund transformation for the sine–Gordon equation. Moreover, since we had a free parameter ε, we actually have a family of auto-Bäcklund transformations.
For example, we already know a solution to the sine–Gordon equation, namely ϕ_1 = 0. Using this (and relabelling ε ↦ −ε, which is allowed since ε was an arbitrary non-zero constant), the equations say we need to solve
  ∂ϕ/∂ξ = 2ε sin(ϕ/2),   ∂ϕ/∂τ = (2/ε) sin(ϕ/2)
for ϕ = ϕ_2. We see this system has some sort of symmetry between ξ and τ. So we use an ansatz
  ϕ(ξ, τ) = 2χ(εξ + ε⁻¹τ).
Then both equations tell us
  dχ/dx = sin χ.
We can separate this into
  csc χ dχ = dx.
Integrating this gives us
  log tan(χ/2) = x + C.
So we find
  χ(x) = 2 tan⁻¹(Ae^x).
So it follows that
  ϕ(ξ, τ) = 4 tan⁻¹( A exp(εξ + ε⁻¹τ) ),
where A and ε are free parameters. After a bit more work, this recovers the 1-soliton solution we previously found.
Applying the Bäcklund transform again to this new solution produces multi-soliton solutions.
3 Inverse scattering transform
Recall that in IB Methods, we decided we can use Fourier transforms to solve PDE's. For example, if we wanted to solve the Klein–Gordon equation
  u_tt − u_xx = u,
then we simply had to take the Fourier transform with respect to x to get
  û_tt + k²û = û.
This then becomes a very easy ODE in t:
  û_tt = (1 − k²)û,
which we can solve. After solving for this, we can take the inverse Fourier transform to get u.
The inverse scattering transform will follow a similar procedure, except it is much more involved and magical. Again, given a differential equation in u(x, t), for each fixed time t, we can transform the solution u(x, t) to something known as the scattering data of u. Then the differential equation will tell us how the scattering data should evolve. After we have solved for the scattering data at all times, we invert the transformation and recover the solution u.
We will find that each step of that process will be linear, i.e. easy, and this will magically allow us to solve non-linear equations.
3.1 Forward scattering problem
Before we talk about the inverse scattering transform, it is helpful to know what the forward problem is. This is, as you would have obviously guessed, related to the Schrödinger operator we know and love from quantum mechanics. Throughout this section, L will be the Schrödinger operator
  L = −∂²/∂x² + u(x),
where the "potential" u has compact support, i.e. u = 0 for |x| sufficiently large. What we actually need is just that u decays quickly enough as |x| → ∞, but to make our life easy, we do not figure out the precise conditions to make things work, and just assume that u actually vanishes for large |x|. For a fixed u, we are interested in an eigenvalue (or "spectral") problem, i.e. we want to find solutions to
  Lψ = λψ.
This is the "forward" problem, i.e. given a u, we want to find the eigenvalues and eigenfunctions. The inverse problem is: given the collection of all such eigenvalues and eigenfunctions, or rather some data describing them, we want to find out what u is.
We will divide this into the continuous and discrete cases.
3.1.1 Continuous spectrum
Here we consider solutions to Lψ = k²ψ for real k. Since u = 0 for |x| large, we must have
  ψ_xx + k²ψ = 0
for large |x|. So solutions as |x| → ∞ are linear combinations of e^{±ikx}. We look for a specific solution ψ = ϕ(x, k) defined by the condition
  ϕ = e^{−ikx}  as x → −∞.
Then there must be coefficients a = a(k) and b = b(k) such that
  ϕ(x, k) = a(k)e^{−ikx} + b(k)e^{ikx}  as x → +∞.
We define the quantities
  Φ(x, k) = ϕ(x, k)/a(k),   R(k) = b(k)/a(k),   T(k) = 1/a(k).
Here R(k) is called the reflection coefficient, and T(k) is the transmission coefficient. You may have seen these terms from IB Quantum Mechanics. Then we can write
  Φ(x, k) = { T(k)e^{−ikx}   as x → −∞
            { e^{−ikx} + R(k)e^{ikx}   as x → +∞.
We can view the e^{−ikx} term as waves travelling to the left, and e^{ikx} as waves travelling to the right. Thus in this scenario, we have an incident e^{−ikx} wave coming from the right, the potential reflects some portion of the wave, namely R(k)e^{ikx}, and transmits the remaining T(k)e^{−ikx}. It will be shown on the first example sheet that in fact |T(k)|² + |R(k)|² = 1.
What would happen when we change k? Since k is the "frequency" of the wave, which is proportional to the energy, we would expect that the larger k is, the more of the wave is transmitted. Thus we might expect that T(k) → 1 and R(k) → 0 as k → ∞. This is indeed true, but we will not prove it. We can think of these as "boundary conditions" for T and R.
So far, we've only been arguing hypothetically about what the solution has to look like if it existed. However, we do not know if there is a solution at all!
In general, differential equations are bad. They are hard to talk about, because if we differentiate a function, it generally gets worse. It might cease to be differentiable, or even continuous. This means differential operators could take our function out of the relevant function space we are talking about. On the other hand, integration makes functions look better. The more times we integrate, the smoother it becomes. So if we want to talk about the existence of solutions, it is wise to rewrite the differential equation as an integral equation instead.
We consider the integral equation for f = f(x, k) given by
  f(x, k) = f_0(x, k) + ∫_{−∞}^∞ G(x − y, k) u(y) f(y, k) dy,
where f_0 is any solution to (∂_x² + k²)f_0 = 0, and G is the Green's function for the differential operator ∂_x² + k², i.e. we have
  (∂_x² + k²)G = δ(x).
What we want to show is that if we can find an f that satisfies this integral equation, then it also satisfies the eigenvalue equation. We simply compute
  (∂_x² + k²)f = (∂_x² + k²)f_0 + ∫_{−∞}^∞ (∂_x² + k²)G(x − y, k) u(y) f(y, k) dy
               = 0 + ∫_{−∞}^∞ δ(x − y) u(y) f(y, k) dy
               = u(x) f(x, k).
In other words, we have
  Lf = k²f.
So it remains to prove that solutions to the integral equation exist.
We pick f_0 = e^{−ikx} and
  G(x, k) = { 0   x < 0
            { (1/k) sin(kx)   x ≥ 0.
Then our integral equation automatically implies
  f(x, k) = e^{−ikx}
as x → −∞, because for x very negative, either x − y < 0 or y is very negative, so the integrand always vanishes as u has compact support.
To solve the integral equation, we write this in abstract form
  (I − K)f = f_0,
where I is the identity, and
  (Kf)(x) = ∫_{−∞}^∞ G(x − y, k) u(y) f(y, k) dy.
So we can "invert"
  f = (I − K)^{−1} f_0.
We can "guess" a solution to the inverse. If we don't care about rigour and just expand this, we get
  f = (I + K + K² + ···)f_0.
It doesn't matter how unrigorous our derivation was. To see it is a valid solution, we just have to check that it works! The first question to ask is if this expression converges. On the second example sheet, we will show that this thing actually converges. If this holds, then we have
  (I − K)f = (If_0 + Kf_0 + K²f_0 + ···) − (Kf_0 + K²f_0 + K³f_0 + ···) = f_0.
So this is a solution!
Of course, this result is purely formal. Usually, there are better ad hoc ways to solve the equation, as we know from IB Quantum Mechanics.
3.1.2 Discrete spectrum and bound states
We now consider the case λ = −κ² < 0, where we wlog take κ > 0. We are going to seek solutions to
  Lψ_κ = −κ²ψ_κ.
This time, we are going to ask that
  ‖ψ_κ‖² = ∫_{−∞}^∞ ψ_κ(x)² dx = 1.
We will wlog take ψ_κ ∈ R. We will call these things bound states.
Since u has compact support, any solution to Lϕ = −κ²ϕ must obey
  ϕ_xx − κ²ϕ = 0
for |x| → ∞. Then the solutions are linear combinations of e^{±κx} as |x| → ∞.
We now fix ϕ_κ by the boundary condition
  ϕ_κ(x) = e^{−κx}  as x → +∞.
Then as x → −∞, there must exist coefficients α = α(κ), β = β(κ) such that
  ϕ_κ(x) = α(κ)e^{κx} + β(κ)e^{−κx}  as x → −∞.
Note that for any κ, we can solve the equation Lϕ = −κ²ϕ and find a solution of this form. However, we have the additional condition that ‖ψ_κ‖² = 1, and in particular the norm is finite. Since e^{−κx} blows up as x → −∞, we must have β(κ) = 0. It can be shown that the function β = β(κ) has only finitely many zeroes
  χ_1 > χ_2 > ··· > χ_N > 0.
So we have a finite list of bound states {ψ_n}_{n=1}^N, written
  ψ_n(x) = c_n ϕ_{χ_n}(x),
where the c_n are normalization constants chosen so that ‖ψ_n‖ = 1.
3.1.3 Summary of forward scattering problem
In summary, we had a spectral problem
  Lψ = λψ,
where
  L = −∂²/∂x² + u,
and u has compact support. The goal is to find ψ and λ.
In the continuous spectrum, we have λ = k² > 0. Then we can find some T(k) and R(k) such that
  Φ(x, k) = { T(k)e^{−ikx}   as x → −∞
            { e^{−ikx} + R(k)e^{ikx}   as x → +∞,
and solutions exist for all k.
In the discrete spectrum, we have λ = −κ² < 0. We can construct bound states {ψ_n}_{n=1}^N such that
  Lψ_n = −χ_n²ψ_n
with
  χ_1 > χ_2 > ··· > χ_N > 0,
and ‖ψ_n‖ = 1.
Bound states are characterized by their large, positive x behaviour
  ψ_n(x) = c_n e^{−χ_n x}  as x → +∞,
where {c_n}_{n=1}^N are normalization constants.
Putting all these together, the scattering data for L is
  S = ( {χ_n, c_n}_{n=1}^N, R(k), T(k) ).
Example. Consider the Dirac potential u(x) = −2αδ(x), where α > 0. Let's try to compute the scattering data.
We do the continuous spectrum first. Since u(x) = 0 for x ≠ 0, we must have
  Φ(x, k) = { T(k)e^{−ikx}   x < 0
            { e^{−ikx} + R(k)e^{ikx}   x > 0.
Also, we want Φ(x, k) to be continuous at x = 0. So we must have
  T(k) = 1 + R(k).
By integrating LΦ = k²Φ over (−ε, ε) and taking ε → 0, we find that Φ_x has a jump discontinuity at x = 0 given by
  ik(R − 1) + ikT = −2αT.
We now have two equations and two unknowns, and we can solve to obtain
  R(k) = iα/(k − iα),   T(k) = k/(k − iα).
We can see that we indeed have |R|² + |T|² = 1.
Note that as k increases, we find that R(k) → 0 and T(k) → 1. This makes sense, since we can think of k as the energy of the wave, and the larger the energy, the more likely we are to pass through.
Now let's do the discrete part of the spectrum, and we jump through the same hoops. Since δ(x) = 0 for x ≠ 0, we must have
  −∂²ψ_n/∂x² + χ_n²ψ_n = 0
for x ≠ 0. So we have
  ψ_n(x) = c_n e^{−χ_n|x|}.
Integrating Lψ_n = −χ_n²ψ_n over (−ε, ε), we similarly find that
  c_n χ_n = c_n α.
So there is just one bound state, with χ_1 = α. We finally find c_1 by requiring ‖ψ_1‖ = 1. We have
  1 = ∫_{−∞}^∞ ψ_1(x)² dx = c_1² ∫_{−∞}^∞ e^{−2χ_1|x|} dx = c_1²/α.
So we have
  c_1 = √α.
In total, we have the following scattering data:
  S = ( {(χ_1, c_1)} = {(α, √α)},  R(k) = iα/(k − iα),  T(k) = k/(k − iα) ).
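A quick check of these formulas (Python/numpy, not part of the notes): |R(k)|² + |T(k)|² should equal 1 for every real k, and R → 0, T → 1 as k grows. The value of α is made up.

```python
import numpy as np

alpha = 1.5                                  # made-up strength of the delta potential
k = np.linspace(0.1, 50.0, 500)

R = 1j * alpha / (k - 1j * alpha)            # reflection coefficient from the example
T = k / (k - 1j * alpha)                     # transmission coefficient

print(np.max(np.abs(np.abs(R)**2 + np.abs(T)**2 - 1.0)))   # ~1e-16: |R|^2 + |T|^2 = 1
print(abs(R[-1]), abs(T[-1]))                # R -> 0 and T -> 1 for large k
```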
3.2 Inverse scattering problem
We might be interested in the inverse problem. Given scattering data
  S = ( {χ_n, c_n}_{n=1}^N, R(k), T(k) ),
can we reconstruct the potential u = u(x) such that
  L = −∂²/∂x² + u(x)
has scattering data S? The answer is yes! Moreover, it turns out that T(k) is not needed.
We shall write down a rather explicit formula for the inverse scattering problem, but we will not justify it.
Theorem (GLM inverse scattering theorem). A potential u = u(x) that decays rapidly to 0 as |x| → ∞ is completely determined by its scattering data
  S = ( {χ_n, c_n}_{n=1}^N, R(k) ).
Given such scattering data, if we set
  F(x) = Σ_{n=1}^N c_n² e^{−χ_n x} + (1/2π) ∫_{−∞}^∞ e^{ikx} R(k) dk,
and define K(x, y) to be the unique solution to
  K(x, y) + F(x + y) + ∫_x^∞ K(x, z) F(z + y) dz = 0,
then
  u(x) = −2 d/dx K(x, x).
Proof. Too hard.
Note that this equation
  K(x, y) + F(x + y) + ∫_x^∞ K(x, z) F(z + y) dz = 0
is not too hard to solve. We can view it as a linear equation of the form
  x + b + Ax = 0
for some linear operator A, then use our familiar linear algebra techniques to guess a solution. Afterwards, we can then verify that it works. We will see an explicit example later on when we actually use this to solve problems.
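To see that the GLM equation really is "just linear algebra", here is a small sketch (Python/numpy, not from the notes) that discretizes it for the reflectionless one-bound-state data F(x) = c²e^{−χx} and recovers K(x, x); the resulting u = −2 dK(x,x)/dx is compared against the 1-soliton profile −2χ² sech²(χ(x − x₀)), where the centre x₀ = log(c²/2χ)/(2χ) is my own worked-out value for this data, and the grid cutoff and spacings are ad hoc.

```python
import numpy as np

chi, c2 = 1.0, 2.0                   # made-up bound state data: chi_1 and c_1^2
F = lambda s: c2 * np.exp(-chi * s)  # reflectionless: F(x) = c^2 exp(-chi x)

def K_diag(x, zmax=20.0, n=800):
    """Solve K(x,y) + F(x+y) + int_x^zmax K(x,z) F(z+y) dz = 0; return K(x, x)."""
    z = np.linspace(x, zmax, n)
    h = z[1] - z[0]
    w = np.full(n, h); w[0] = w[-1] = h / 2.0            # trapezoidal weights
    A = np.eye(n) + F(z[:, None] + z[None, :]) * w        # rows: y grid, cols: z grid
    K = np.linalg.solve(A, -F(x + z))
    return K[0]                                           # value at y = x

xs = np.linspace(-3.0, 3.0, 61)
dx = xs[1] - xs[0]
u_glm = -2.0 * np.gradient(np.array([K_diag(x) for x in xs]), dx)

x0 = np.log(c2 / (2.0 * chi)) / (2.0 * chi)               # soliton centre for this data
u_exact = -2.0 * chi**2 / np.cosh(chi * (xs - x0))**2
print(np.max(np.abs(u_glm - u_exact)))   # small (limited by the finite differences)
```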
Now that we've got this result, we understand how scattering problems work. We know how to go forwards and backwards.
This is all old theory, and not too exciting. The real exciting thing is how we are going to use this to solve PDE's. Given the KdV equation
  u_t + u_xxx − 6uu_x = 0,
we can think of this as a potential evolving over time, with a starting potential u(x, 0) = u_0(x). We then compute the initial scattering data T, R, χ and c. Afterwards, we obtain the corresponding equations of evolution of the scattering data from the KdV equation. It turns out this is really simple: the χ_n are always fixed, and the others evolve as
  R(k, t) = e^{8ik³t} R(k, 0),
  T(k, t) = T(k, 0),
  c_n(t) = e^{4χ_n³t} c_n(0).
Then we use this GLM formula to reconstruct the potential u at all times!
3.3 Lax pairs
The final ingredient to using the inverse scattering transform is how to relate the evolution of the potential to the evolution of the scattering data. This is given by a Lax pair.
Recall that when we studied Hamiltonian systems at the beginning of the course, under a Hamiltonian flow, functions evolve by
  df/dt = {f, H}.
In quantum mechanics, when we "quantize" this, in the Heisenberg picture, the operators evolve by
  iℏ dL/dt = [L, H].
In some sense, these equations tell us H "generates" time evolution. What we need here is something similar: an operator that generates the time evolution of our operator.
Definition (Lax pair). Consider a time-dependent self-adjoint linear operator
  L = a_m(x, t) ∂^m/∂x^m + ··· + a_1(x, t) ∂/∂x + a_0(x, t),
where the {a_i} are (possibly matrix-valued) functions of (x, t). If there is a second operator A such that
  L_t = LA − AL = [L, A],
where
  L_t = ȧ_m ∂^m/∂x^m + ··· + ȧ_0
denotes the derivative of L with respect to t, then we call (L, A) a Lax pair.
The main theorem about Lax pairs is the following isospectral flow theorem:
Theorem (Isospectral flow theorem). Let (L, A) be a Lax pair. Then the discrete eigenvalues of L are time-independent. Also, if Lψ = λψ, where λ is a discrete eigenvalue, then
  Lψ̃ = λψ̃,
where
  ψ̃ = ψ_t + Aψ.
The word "isospectral" means that we have an evolving system, but the eigenvalues are time-independent.
Proof. We will assume that the eigenvalues at least vary smoothly with t, so that for each eigenvalue λ_0 at t = 0 with eigenfunction ψ_0(x), we can find some λ(t) and ψ(x, t) with λ(0) = λ_0, ψ(x, 0) = ψ_0(x) such that
  L(t)ψ(x, t) = λ(t)ψ(x, t).
We will show that in fact λ(t) is constant in time. Differentiating with respect to t and rearranging, we get
  λ_t ψ = L_t ψ + Lψ_t − λψ_t
        = LAψ − ALψ + Lψ_t − λψ_t
        = LAψ − λAψ + Lψ_t − λψ_t
        = (L − λ)(ψ_t + Aψ).
We now take the inner product with ψ, and use that ‖ψ‖ = 1. We then have
  λ_t = ⟨ψ, λ_t ψ⟩
      = ⟨ψ, (L − λ)(ψ_t + Aψ)⟩
      = ⟨(L − λ)ψ, ψ_t + Aψ⟩
      = 0,
using the fact that L, hence L − λ, is self-adjoint.
So we know that λ_t = 0, i.e. that λ is time-independent. Then our above equation gives
  Lψ̃ = λψ̃,
where
  ψ̃ = ψ_t + Aψ.
In the case where L is the Schrödinger operator, the isospectral theorem tells us how we can relate the evolution of some of the scattering data (namely the χ_n) to some differential equation in L (namely the Laxness of L). For a cleverly chosen A, we will be able to relate the Laxness of L to some differential equation in u, and this establishes our first correspondence between the evolution of u and the evolution of the scattering data.
Example. Consider
  L = −∂_x² + u(x, t),
  A = 4∂_x³ − 3(u∂_x + ∂_x u).
Then (L, A) is a Lax pair iff u = u(x, t) satisfies KdV. In other words, we have
  L_t − [L, A] = 0  ⟺  u_t + u_xxx − 6uu_x = 0.
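The claim can be verified by brute force: apply L_t − [L, A] to an arbitrary smooth test function ψ and check that what is left over is (u_t + u_xxx − 6uu_x)ψ. Here is a sketch of that computation in Python/sympy (not part of the notes).

```python
import sympy as sp

x, t = sp.symbols('x t')
u = sp.Function('u')(x, t)
psi = sp.Function('psi')(x, t)

def L(f):                       # Schrodinger operator L = -d^2/dx^2 + u
    return -sp.diff(f, x, 2) + u * f

def A(f):                       # A = 4 d^3/dx^3 - 3(u d/dx + d/dx u)
    return 4 * sp.diff(f, x, 3) - 3 * (u * sp.diff(f, x) + sp.diff(u * f, x))

# L_t acts on psi as u_t * psi (only the potential depends on t).
lhs = sp.diff(u, t) * psi - (L(A(psi)) - A(L(psi)))
rhs = (sp.diff(u, t) + sp.diff(u, x, 3) - 6 * u * sp.diff(u, x)) * psi
print(sp.simplify(sp.expand(lhs - rhs)))   # 0: L_t - [L, A] = (u_t + u_xxx - 6 u u_x)
```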
3.4 Evolution of scattering data
Now we do the clever bit: we allow the potential u = u(x, t) to evolve via KdV
  u_t + u_xxx − 6uu_x = 0,
and see how the scattering data for L = −∂_x² + u(x, t) evolves. Again, we will assume that u has compact support. Note that this implies that we have
  A = 4∂_x³  as |x| → ∞.
3.4.1 Continuous spectrum (λ = k² > 0)
As in Section 3.1.1, for each t, we can construct a solution ϕ to Lϕ = k²ϕ such that
  ϕ(x, t) = { e^{−ikx}   as x → −∞
            { a(k, t)e^{−ikx} + b(k, t)e^{ikx}   as x → ∞.
This time, we know that for any u, we can find a solution for any k. So we can assume that k is fixed in the equation
  Lϕ = k²ϕ.
We assume that u is a solution to the KdV equation, so that (L, A) is a Lax pair. As in the proof of the isospectral flow theorem, we differentiate this to get
  0 = (L − k²)(ϕ_t + Aϕ).
This tells us that ϕ̃ = ϕ_t + Aϕ solves
  Lϕ̃ = k²ϕ̃.
We can try to figure out what ϕ̃ is for large |x|. We recall that for large |x|, we simply have A = 4∂_x³. Then we can write
  ϕ̃(x, t) = { 4ik³ e^{−ikx}   as x → −∞
            { (a_t + 4ik³a)e^{−ikx} + (b_t − 4ik³b)e^{ikx}   as x → ∞.
We now consider the function
  θ = 4ik³ϕ − ϕ̃.
By linearity of L, we have
  Lθ = k²θ.
Note that by construction, we have θ(x, t) → 0 as x → −∞. We recall that the solution to Lf = k²f with f → f_0 as x → −∞ is just
  f = (I − K)^{−1}f_0 = (I + K + K² + ···)f_0.
So we obtain
  θ = (I + K + K² + ···)0 = 0.
So we must have
  ϕ̃ = 4ik³ϕ.
Looking at the x → +∞ behaviour, we figure out that
  a_t + 4ik³a = 4ik³a,
  b_t − 4ik³b = 4ik³b.
Of course, these are equations we can solve. We have
  a(k, t) = a(k, 0),
  b(k, t) = b(k, 0)e^{8ik³t}.
In terms of the reflection and transmission coefficients, we have
  R(k, t) = R(k, 0)e^{8ik³t},   T(k, t) = T(k, 0).
Thus, we have shown that if we assume
u
evolves according to the really compli-
cated KdV equation, then the scattering data must evolve in this simple way!
This is AMAZING.
3.4.2 Discrete spectrum ($\lambda = -\kappa^2 < 0$)

The discrete part is similar. By the isospectral flow theorem, we know the $\chi_n$ are constant in time. For each $t$, we can construct bound states $\{\psi_n(x, t)\}_{n=1}^N$ such that
$$L\psi_n = -\chi_n^2\psi_n,\quad \|\psi_n\| = 1.$$
Moreover, we have
$$\psi_n(x, t) = c_n(t)e^{-\chi_n x}\quad\text{as } x \to +\infty.$$
From the isospectral theorem, we know the function $\tilde\psi_n = \partial_t\psi_n + A\psi_n$ also satisfies
$$L\tilde\psi_n = -\chi_n^2\tilde\psi_n.$$
It is an exercise to show that these solutions must actually be proportional to one another. Looking at Wronskians, we can show that $\tilde\psi_n \propto \psi_n$. Also, we have
$$\begin{aligned}
\langle\psi_n, \tilde\psi_n\rangle &= \langle\psi_n, \partial_t\psi_n\rangle + \langle\psi_n, A\psi_n\rangle\\
&= \tfrac{1}{2}\partial_t\langle\psi_n, \psi_n\rangle + \langle\psi_n, A\psi_n\rangle\\
&= 0,
\end{aligned}$$
using the fact that $A$ is antisymmetric and $\|\psi_n\|$ is constant. We thus deduce that $\tilde\psi_n = 0$.

Looking at the large-$x$ behaviour, we have
$$\tilde\psi_n(x, t) = (\dot{c}_n - 4\chi_n^3 c_n)e^{-\chi_n x}$$
as $x \to +\infty$. Since $\tilde\psi_n = 0$, we must have
$$\dot{c}_n - 4\chi_n^3 c_n = 0.$$
So we have
$$c_n(t) = c_n(0)e^{4\chi_n^3 t}.$$
This is again AMAZING.
3.4.3 Summary of inverse scattering transform

So in summary, suppose we are given that $u = u(x, t)$ evolves according to KdV, namely
$$u_t + u_{xxx} - 6uu_x = 0.$$
If we have an initial condition $u_0(x) = u(x, 0)$, then we can compute its scattering data
$$S(0) = \left(\{\chi_n, c_n(0)\}_{n=1}^N,\ R(k, 0)\right).$$
Then for arbitrary time, the scattering data for $L = -\partial_x^2 + u$ is
$$S(t) = \left(\{\chi_n, c_n(0)e^{4\chi_n^3 t}\}_{n=1}^N,\ R(k, 0)e^{8ik^3 t}\right).$$
We then apply GLM to obtain $u(x, t)$ for all time $t$. Schematically, the procedure is
$$\begin{array}{ccc}
u_0(x) & \xrightarrow{\ \text{construct scattering data},\ L = -\partial_x^2 + u_0(x)\ } & S(0)\\[4pt]
\Big\downarrow\ {\scriptstyle\text{KdV equation}} & & \Big\downarrow\ {\scriptstyle\text{evolve scattering data},\ L_t = [L, A]}\\[4pt]
u(x, t) & \xleftarrow{\ \text{solve GLM equation}\ } & S(t)
\end{array}$$
The key thing that makes this work is that $u_t + u_{xxx} - 6uu_x = 0$ holds if and only if $L_t = [L, A]$.
For comparison, this is what we would do if we had to solve
$$u_t + u_{xxx} = 0$$
by a Fourier transform:
$$\begin{array}{ccc}
u_0(x) & \xrightarrow{\ \text{Fourier transform}\ } & \hat{u}_0(k)\\[4pt]
\Big\downarrow\ {\scriptstyle u_t + u_{xxx} = 0} & & \Big\downarrow\ {\scriptstyle \hat{u}_t - ik^3\hat{u} = 0}\\[4pt]
u(x, t) & \xleftarrow{\ \text{inverse Fourier transform}\ } & \hat{u}(k, t) = \hat{u}_0(k)e^{ik^3 t}
\end{array}$$
It is just the same steps, but with a simpler transform!
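For illustration, here is a minimal numerical sketch (my own, with an arbitrary Gaussian initial condition) of the right-hand column of the diagram: solving $u_t + u_{xxx} = 0$ on a periodic domain by multiplying each Fourier mode by $e^{ik^3 t}$.

```python
import numpy as np

N, L = 256, 50.0
x = np.linspace(0, L, N, endpoint=False)
k = 2*np.pi*np.fft.fftfreq(N, d=L/N)      # angular wavenumbers

u0 = np.exp(-(x - L/2)**2)                # some initial condition
u0_hat = np.fft.fft(u0)

t = 1.0
# u_t + u_xxx = 0  =>  u_hat_t = i k^3 u_hat  =>  u_hat(t) = u_hat(0) e^{i k^3 t}
u_hat = u0_hat * np.exp(1j * k**3 * t)
u = np.real(np.fft.ifft(u_hat))           # the solution at time t
```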
3.5 Reflectionless potentials

We are now going to actually solve the KdV equation for a special kind of potential: reflectionless potentials.

Definition (Reflectionless potential). A reflectionless potential is a potential $u(x, 0)$ satisfying $R(k, 0) = 0$.

Now if $u$ evolves according to the KdV equation, then
$$R(k, t) = R(k, 0)e^{8ik^3 t} = 0.$$
So if a potential starts off reflectionless, then it remains reflectionless.

We now want to solve the GLM equation in this case. Using the notation from when we wrote down the GLM equation, we simply have
$$F(x) = \sum_{n=1}^N c_n^2 e^{-\chi_n x}.$$
We will mostly not write out the $t$ when we do this, and only put it back in at the very end. We now guess that the GLM equation has a solution of the form
$$K(x, y) = \sum_{m=1}^N K_m(x)e^{-\chi_m y}$$
for some unknown functions $\{K_m\}$ (in the second example sheet, we show that it must have this form). We substitute this into the GLM equation and find that
$$\sum_{n=1}^N\left[c_n^2 e^{-\chi_n x} + K_n(x) + \sum_{m=1}^N c_n^2 K_m(x)\int_x^\infty e^{-(\chi_n + \chi_m)z}\,\mathrm{d}z\right]e^{-\chi_n y} = 0.$$
Now notice that the $e^{-\chi_n y}$ for $n = 1, \cdots, N$ are linearly independent. So we actually have $N$ equations, one for each $n$. So we know that
$$c_n^2 e^{-\chi_n x} + K_n(x) + \sum_{m=1}^N\frac{c_n^2 K_m(x)}{\chi_n + \chi_m}e^{-(\chi_n + \chi_m)x} = 0\tag{$*$}$$
for all $n = 1, \cdots, N$. Now if our goal is to solve for the $K_n(x)$, then this is just a linear equation for each $x$! We set
$$\mathbf{c} = (c_1^2 e^{-\chi_1 x}, \cdots, c_N^2 e^{-\chi_N x})^T,\quad \mathbf{K} = (K_1(x), \cdots, K_N(x))^T,\quad A_{nm} = \delta_{nm} + \frac{c_n^2 e^{-(\chi_n + \chi_m)x}}{\chi_n + \chi_m}.$$
Then $(*)$ becomes
$$A\mathbf{K} = -\mathbf{c}.$$
This really is a linear algebra problem. But we don't really have to solve this. The thing we really want to know is
$$K(x, x) = \sum_{m=1}^N K_m(x)e^{-\chi_m x} = \sum_{m=1}^N\sum_{n=1}^N (A^{-1})_{mn}(-\mathbf{c})_n e^{-\chi_m x}.$$
Now note that
$$\frac{\mathrm{d}}{\mathrm{d}x}A_{nm}(x) = A'_{nm}(x) = -c_n^2 e^{-\chi_n x}e^{-\chi_m x} = (-\mathbf{c})_n e^{-\chi_m x}.$$
So we can replace the above expression by
$$K(x, x) = \sum_{m=1}^N\sum_{n=1}^N (A^{-1})_{mn}A'_{nm} = \operatorname{tr}(A^{-1}A').$$
It is an exercise on the second example sheet to show that this is equal to
$$K(x, x) = \frac{1}{\det A}\frac{\mathrm{d}}{\mathrm{d}x}(\det A) = \frac{\mathrm{d}}{\mathrm{d}x}\log(\det A).$$
So we have
$$u(x) = -2\frac{\mathrm{d}^2}{\mathrm{d}x^2}\log(\det A).$$
We now put back the $t$-dependence we didn't bother to write all along. Then we have
$$u(x, t) = -2\frac{\partial^2}{\partial x^2}\log(\det A(x, t)),$$
where
$$A_{nm}(x, t) = \delta_{nm} + \frac{c_n(0)^2 e^{8\chi_n^3 t}e^{-(\chi_n + \chi_m)x}}{\chi_n + \chi_m}.$$
It turns out these are soliton solutions, and the number of discrete eigenstates $N$ is just the number of solitons!
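As a sanity check, the formula above is easy to evaluate numerically. The following sketch (my own; the values of $\chi_n$ and $c_n(0)$ are arbitrary choices) computes a 2-soliton profile by taking $-2\,\partial_x^2\log\det A$ with finite differences.

```python
import numpy as np

chi = np.array([1.0, 2.0])          # discrete eigenvalues: lambda_n = -chi_n^2
c0  = np.array([1.0, 1.0])          # normalisation constants c_n(0)

def logdetA(x, t):
    c2 = c0**2 * np.exp(8 * chi**3 * t)                        # c_n(t)^2
    A = np.eye(len(chi)) + (c2[:, None] *
        np.exp(-(chi[:, None] + chi[None, :]) * x)) / (chi[:, None] + chi[None, :])
    return np.linalg.slogdet(A)[1]                              # log det A(x, t)

def u(x, t, h=1e-4):
    # u = -2 * d^2/dx^2 log det A, second derivative by central differences
    return -2 * (logdetA(x + h, t) - 2 * logdetA(x, t) + logdetA(x - h, t)) / h**2

xs = np.linspace(-10, 10, 401)
profile = [u(x, t=0.0) for x in xs]   # the 2-soliton profile at t = 0
```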
3.6 Infinitely many first integrals

As we've previously mentioned, we are expecting our integrable PDE's to have infinitely many first integrals. Recall we can construct $\varphi = \varphi(x, k, t)$ such that
$$L\varphi = k^2\varphi,$$
and we had
$$\varphi(x, k, t) = \begin{cases} e^{-ikx} & x \to -\infty\\ a(k, t)e^{-ikx} + b(k, t)e^{ikx} & x \to \infty.\end{cases}$$
But when we looked at the evolution of the scattering data, we could actually write down what $a$ and $b$ are. In particular, $a(k, t) = a(k)$ is independent of $t$. So we might be able to extract some first integrals from it. We have
$$e^{ikx}\varphi(x, k, t) = a(k) + b(k, t)e^{2ikx}\quad\text{as } x \to \infty.$$
We now take the average over $[R, 2R]$ for $R \to \infty$. We do the terms one by one. We have the boring integral
$$\frac{1}{R}\int_R^{2R}a(k)\,\mathrm{d}x = a(k).$$
For the $b(k, t)$ term, we have
$$\frac{1}{R}\int_R^{2R}b(k, t)e^{2ikx}\,\mathrm{d}x = O\!\left(\frac{1}{R}\right).$$
So we have
$$a(k) = \lim_{R\to\infty}\frac{1}{R}\int_R^{2R}e^{ikx}\varphi(x, k, t)\,\mathrm{d}x = \lim_{R\to\infty}\int_1^2 e^{ikRx}\varphi(Rx, k, t)\,\mathrm{d}x.$$
So can we figure out what this thing is? Since $\varphi = e^{-ikx}$ as $x \to -\infty$, it is "reasonable" to write
$$\varphi(x, k, t) = \exp\left(-ikx + \int_{-\infty}^x S(y, k, t)\,\mathrm{d}y\right)$$
for some function $S$. Then after some dubious manipulations, we would get
$$a(k) = \lim_{R\to\infty}\int_1^2\exp\left(\int_{-\infty}^{Rx}S(y, k, t)\,\mathrm{d}y\right)\mathrm{d}x = \exp\left(\int_{-\infty}^\infty S(y, k, t)\,\mathrm{d}y\right).\tag{$\dagger$}$$
Now this is interesting, since the left hand side $a(k)$ has no $t$-dependence, but the right-hand formula does. So this is where we get our first integrals from.

Now we need to figure out what $S$ is. To find $S$, recall that $\varphi$ satisfies
$$L\varphi = k^2\varphi.$$
So we just try to shove our formula for $\varphi$ into this equation. Notice that
$$\varphi_x = (S - ik)\varphi,\quad \varphi_{xx} = S_x\varphi + (S - ik)^2\varphi.$$
We then put these into the Schrödinger equation to find
$$S_x - (2ik)S + S^2 = u.$$
We have got no $\varphi$'s left. This is a famous type of equation: a Riccati-type equation. We can make a guess
$$S(x, k, t) = \sum_{n=1}^\infty\frac{S_n(x, t)}{(2ik)^n}.$$
This seems like a strange thing to guess, but there are indeed some good reasons for this we will not get into. Putting this into the equation and comparing coefficients of $(2ik)^{-n}$, we obtain a recurrence relation
$$S_1 = -u,\quad S_{n+1} = \frac{\mathrm{d}S_n}{\mathrm{d}x} + \sum_{m=1}^{n-1}S_m S_{n-m}.$$
This is a straightforward recurrence relation to compute. We can make a computer do this, and get
$$S_2 = -u_x,\quad S_3 = -u_{xx} + u^2,\quad S_4 = \cdots.$$
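Indeed, the recurrence is easy to automate. Here is a short sympy sketch (my own, not from the notes) that generates the first few $S_n$:

```python
import sympy as sp

x = sp.symbols('x')
u = sp.Function('u')(x)

# S_1 = -u, S_{n+1} = S_n' + sum_{m=1}^{n-1} S_m S_{n-m}
S = {1: -u}
for n in range(1, 5):
    S[n + 1] = sp.expand(sp.diff(S[n], x) +
                         sum(S[m] * S[n - m] for m in range(1, n)))

print(S[2])   # -u_x
print(S[3])   # u^2 - u_xx
print(S[4])   # 4 u u_x - u_xxx, a total derivative (as claimed below for even n)
```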
Using the expression for $S$ in $(\dagger)$, we find that
$$\log a(k) = \int_{-\infty}^\infty S(x, k, t)\,\mathrm{d}x = \sum_{n=1}^\infty\frac{1}{(2ik)^n}\int_{-\infty}^\infty S_n(x, t)\,\mathrm{d}x.$$
Since the LHS is time-independent, so is the RHS. Moreover, this is true for all $k$. So we know that
$$\int_{-\infty}^\infty S_n(x, t)\,\mathrm{d}x$$
must be constant in time!

We can explicitly compute the first few terms:
(i) For $n = 1$, we find a first integral
$$\int_{-\infty}^\infty u(x, t)\,\mathrm{d}x.$$
We can view this as a conservation of mass.
(ii) For $n = 2$, we obtain a first integral
$$\int_{-\infty}^\infty u_x(x, t)\,\mathrm{d}x.$$
This is actually boring, since we assumed that $u$ vanishes at infinity. So we knew this is always zero anyway.
(iii) For $n = 3$, we have
$$\int_{-\infty}^\infty(-u_{xx}(x, t) + u(x, t)^2)\,\mathrm{d}x = \int_{-\infty}^\infty u(x, t)^2\,\mathrm{d}x.$$
This is in some sense a conservation of momentum.

It is an exercise to show that $S_n$ is a total derivative for all even $n$, so those do not give any interesting conserved quantities. But still, half of infinity is infinity, and we do get infinitely many first integrals!
4 Structure of integrable PDEs
4.1 Infinite dimensional Hamiltonian system
When we did ODEs, our integrable ODEs were not just random ODEs. They came from some (finite-dimensional) Hamiltonian systems. If we view PDEs as infinite-dimensional ODEs, then it is natural to ask if we can generalize the notion of a Hamiltonian system to infinite-dimensional ones, and then see if we can put our integrable systems in the form of a Hamiltonian system. It turns out we can, and nice properties of the PDE fall out of this formalism.

We recall that a (finite-dimensional) phase space is given by $M = \mathbb{R}^{2n}$ and a non-degenerate anti-symmetric matrix $J$. Given a Hamiltonian function $H: M \to \mathbb{R}$, the equation of motion for $x(t) \in M$ becomes
$$\frac{\mathrm{d}x}{\mathrm{d}t} = J\frac{\partial H}{\partial x},$$
where $x(t)$ is a vector of length $2n$, $J$ is a non-degenerate anti-symmetric matrix, and $H = H(x)$ is the Hamiltonian.
In the infinite-dimensional case, instead of having $2n$ coordinates $x_i(t)$, we have a function $u(x, t)$ that depends continuously on the parameter $x$. When promoting finite-dimensional things to infinite-dimensional versions, we think of $x$ as a continuous version of the index $i$. We now proceed to generalize the notions we used in the finite-dimensional case to infinite-dimensional ones.

The first is the inner product. In the finite-dimensional case, we could take the inner product of two vectors by
$$x \cdot y = \sum x_i y_i.$$
Here we have an analogous inner product, but we replace the sum with an integral.

Notation. For functions $u(x)$ and $v(x)$, we write
$$\langle u, v\rangle = \int_{\mathbb{R}}u(x)v(x)\,\mathrm{d}x.$$
If $u, v$ are functions of time as well, then so is the inner product.
For finite-dimensional phase spaces, we talked about functions of $x$. In particular, we had the Hamiltonian $H(x)$. In the case of infinite-dimensional phase spaces, we will not consider arbitrary functions of $u$, but only functionals:

Definition (Functional). A functional $F$ is a real-valued function (on some function space) of the form
$$F[u] = \int_{\mathbb{R}}f(x, u, u_x, u_{xx}, \cdots)\,\mathrm{d}x.$$
Again, if $u$ is a function of time as well, then $F[u]$ is a function of time.

We used to be able to talk about the derivatives of functions. Time derivatives of $F$ would work just as well, but differentiating with respect to $u$ will involve the functional derivative, which you may have met in IB Variational Principles.

Definition (Functional derivative/Euler–Lagrange derivative). The functional derivative of $F = F[u]$ at $u$ is the unique function $\delta F$ satisfying
$$\langle\delta F, \eta\rangle = \lim_{\varepsilon \to 0}\frac{F[u + \varepsilon\eta] - F[u]}{\varepsilon}$$
for all smooth $\eta$ with compact support.

Alternatively, we have
$$F[u + \varepsilon\eta] = F[u] + \varepsilon\langle\delta F, \eta\rangle + o(\varepsilon).$$
Note that $\delta F$ is another function, depending on $u$.
Example. Set
$$F[u] = \frac{1}{2}\int u_x^2\,\mathrm{d}x.$$
We then have
$$\begin{aligned}
F[u + \varepsilon\eta] &= \frac{1}{2}\int(u_x + \varepsilon\eta_x)^2\,\mathrm{d}x\\
&= \frac{1}{2}\int u_x^2\,\mathrm{d}x + \varepsilon\int u_x\eta_x\,\mathrm{d}x + o(\varepsilon)\\
&= F[u] + \varepsilon\langle u_x, \eta_x\rangle + o(\varepsilon).
\end{aligned}$$
This is no good, because we want something of the form $\langle\delta F, \eta\rangle$, not an inner product with $\eta_x$. When in doubt, integrate by parts! This is just equal to
$$= F[u] + \varepsilon\langle -u_{xx}, \eta\rangle + o(\varepsilon).$$
Note that when integrating by parts, we don't have to mess with the boundary terms, because $\eta$ is assumed to have compact support. So we have
$$\delta F = -u_{xx}.$$
In general, from IB Variational Principles, we know that if
$$F[u] = \int f(x, u, u_x, u_{xx}, \cdots)\,\mathrm{d}x,$$
then we have
$$\delta F = \frac{\partial f}{\partial u} - D_x\frac{\partial f}{\partial u_x} + D_x^2\frac{\partial f}{\partial u_{xx}} - \cdots.$$
Here $D_x$ is the total derivative, which is different from the partial derivative.

Definition (Total derivative). Consider a function $f(x, u, u_x, \cdots)$. For any given function $u(x)$, the total derivative with respect to $x$ is
$$\frac{\mathrm{d}}{\mathrm{d}x}f(x, u(x), u_x(x), \cdots) = \frac{\partial f}{\partial x} + u_x\frac{\partial f}{\partial u} + u_{xx}\frac{\partial f}{\partial u_x} + \cdots.$$

Example.
$$\frac{\partial}{\partial x}(xu) = u,\quad D_x(xu) = u + xu_x.$$
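The general formula is easy to put on a computer. Below is a small sympy sketch (my own illustration, not from the notes) that implements $\delta F = \partial f/\partial u - D_x\,\partial f/\partial u_x + D_x^2\,\partial f/\partial u_{xx}$ by treating $u, u_x, u_{xx}$ as independent symbols and substituting back before taking total derivatives; it reproduces $\delta F = -u_{xx}$ for the example above.

```python
import sympy as sp

x, U, Ux, Uxx = sp.symbols('x U Ux Uxx')
u = sp.Function('u')(x)

def delta_F(f):
    """delta F = df/dU - D_x(df/dUx) + D_x^2(df/dUxx), where U, Ux, Uxx
    stand for u, u_x, u_xx; substitute back before total differentiation."""
    subs_back = {U: u, Ux: sp.diff(u, x), Uxx: sp.diff(u, x, 2)}
    return (sp.diff(f, U).subs(subs_back)
            - sp.diff(sp.diff(f, Ux).subs(subs_back), x)
            + sp.diff(sp.diff(f, Uxx).subs(subs_back), x, 2))

# f = 1/2 u_x^2  gives  delta F = -u_xx, matching the worked example.
print(sp.simplify(delta_F(sp.Rational(1, 2) * Ux**2)))

# f = 1/2 u_x^2 + u^3 (the KdV Hamiltonian density used later) gives 3u^2 - u_xx.
print(sp.simplify(delta_F(sp.Rational(1, 2) * Ux**2 + U**3)))
```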
Finally, we need to figure out an alternative for $J$. In the case of a finite-dimensional Hamiltonian system, it is healthy to think of it as an anti-symmetric bilinear form, so that $vJw$ is $J$ applied to $v$ and $w$. However, since we also have an inner product given by the dot product, we can alternatively think of $J$ as a linear map $\mathbb{R}^{2n} \to \mathbb{R}^{2n}$, so that we apply it as
$$v \cdot Jw = v^T Jw.$$
Using this $J$, we can define the Poisson bracket of $f = f(x)$, $g = g(x)$ by
$$\{f, g\} = \frac{\partial f}{\partial x}\cdot J\frac{\partial g}{\partial x}.$$
We know this is bilinear, antisymmetric and satisfies the Jacobi identity.
How do we promote this to infinite-dimensional Hamiltonian systems? We can just replace $\frac{\partial f}{\partial x}$ with the functional derivative and the dot product with the inner product. What we need is a replacement for $J$, which we will write as $\mathcal{J}$. There is no obvious candidate for $\mathcal{J}$, but assuming we have found a reasonable linear and antisymmetric candidate, we can make the following definition:

Definition (Poisson bracket for infinite-dimensional Hamiltonian systems). We define the Poisson bracket of two functionals to be
$$\{F, G\} = \langle\delta F, \mathcal{J}\delta G\rangle = \int\delta F(x)\,\mathcal{J}\delta G(x)\,\mathrm{d}x.$$
Since $\mathcal{J}$ is linear and antisymmetric, we know that this Poisson bracket is bilinear and antisymmetric. The annoying part is the Jacobi identity
$$\{F, \{G, H\}\} + \{G, \{H, F\}\} + \{H, \{F, G\}\} = 0.$$
This is not automatically satisfied. We need conditions on $\mathcal{J}$. The simplest antisymmetric linear map we can think of would be $\mathcal{J} = \partial_x$, and this works, i.e. the Jacobi identity is satisfied. Proving that is easy, but painful.
Finally, we get to the equations of motion. Recall that for finite-dimensional systems, our equation of evolution is given by
$$\frac{\mathrm{d}x}{\mathrm{d}t} = J\frac{\partial H}{\partial x}.$$
We make the obvious analogue here:

Definition (Hamiltonian form). An evolution equation for $u = u(x, t)$ is in Hamiltonian form if it can be written as
$$u_t = \mathcal{J}\frac{\delta H}{\delta u}$$
for some functional $H = H[u]$ and some linear, antisymmetric $\mathcal{J}$ such that the Poisson bracket
$$\{F, G\} = \langle\delta F, \mathcal{J}\delta G\rangle$$
obeys the Jacobi identity. Such a $\mathcal{J}$ is known as a Hamiltonian operator.

Definition (Hamiltonian operator). A Hamiltonian operator is a linear, antisymmetric operator $\mathcal{J}$ on the space of functions such that the induced Poisson bracket obeys the Jacobi identity.
Recall that for a finite-dimensional Hamiltonian system, if $f = f(x)$ is any function, then we had
$$\frac{\mathrm{d}f}{\mathrm{d}t} = \{f, H\}.$$
This generalizes to the infinite-dimensional case.

Proposition. If $u_t = \mathcal{J}\delta H$ and $I = I[u]$, then
$$\frac{\mathrm{d}I}{\mathrm{d}t} = \{I, H\}.$$
In particular, $I[u]$ is a first integral of $u_t = \mathcal{J}\delta H$ iff $\{I, H\} = 0$.

The proof is the same as in the finite-dimensional case.

Proof.
$$\frac{\mathrm{d}I}{\mathrm{d}t} = \lim_{\varepsilon\to 0}\frac{I[u + \varepsilon u_t] - I[u]}{\varepsilon} = \langle\delta I, u_t\rangle = \langle\delta I, \mathcal{J}\delta H\rangle = \{I, H\}.$$
In summary, we have the following correspondence between the $2n$-dimensional phase space and the infinite-dimensional phase space:
$$\begin{array}{rcl}
x_i(t),\ i = 1, \cdots, 2n & \longleftrightarrow & u(x, t),\ x \in \mathbb{R}\\
x \cdot y = \sum_i x_i y_i & \longleftrightarrow & \langle u, v\rangle = \int u(x, t)v(x, t)\,\mathrm{d}x\\
\dfrac{\mathrm{d}}{\mathrm{d}t} & \longleftrightarrow & \partial_t\\
\dfrac{\partial}{\partial x} & \longleftrightarrow & \dfrac{\delta}{\delta u}\\
\text{anti-symmetric matrix } J & \longleftrightarrow & \text{anti-symmetric linear operator } \mathcal{J}\\
\text{functions } f = f(x) & \longleftrightarrow & \text{functionals } F = F[u]
\end{array}$$
4.2 Bihamiltonian systems

So far, this is not too interesting, as we just generalized the finite-dimensional case in sort-of the obvious way. However, it is possible that the same PDE can be put into Hamiltonian form for different $\mathcal{J}$'s. These are known as bihamiltonian systems.

Definition (Bihamiltonian system). A PDE is bihamiltonian if it can be written in Hamiltonian form for two different $\mathcal{J}$'s.

It turns out that when this happens, the system has infinitely many first integrals in involution! We will prove this later on. This is rather miraculous!
Example. We can write the KdV equation in Hamiltonian form by
$$u_t = \mathcal{J}_1\delta H_1,\quad \mathcal{J}_1 = \partial_x,\quad H_1[u] = \int\left(\frac{1}{2}u_x^2 + u^3\right)\mathrm{d}x.$$
We can check that this says
$$u_t = \partial_x\left(\frac{\partial}{\partial u} - D_x\frac{\partial}{\partial u_x}\right)\left(\frac{1}{2}u_x^2 + u^3\right) = \partial_x(3u^2 - u_{xx}) = 6uu_x - u_{xxx},$$
and this is the KdV equation.
We can also write it as
$$u_t = \mathcal{J}_0\delta H_0,\quad \mathcal{J}_0 = -\partial_x^3 + 4u\partial_x + 2u_x,\quad H_0[u] = \int\frac{1}{2}u^2\,\mathrm{d}x.$$
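As a quick check (my own verification sketch, not part of the notes), one can confirm symbolically that both Hamiltonian forms produce the same right-hand side $6uu_x - u_{xxx}$:

```python
import sympy as sp

x = sp.symbols('x')
u = sp.Function('u')(x)
ux, uxx, uxxx = (sp.diff(u, x, n) for n in (1, 2, 3))

dH1 = 3*u**2 - uxx        # delta H_1 for H_1 = int (1/2 u_x^2 + u^3) dx
dH0 = u                   # delta H_0 for H_0 = int (1/2 u^2) dx

rhs1 = sp.diff(dH1, x)                                         # J_1 dH1, J_1 = d/dx
rhs0 = -sp.diff(dH0, x, 3) + 4*u*sp.diff(dH0, x) + 2*ux*dH0    # J_0 dH0
print(sp.simplify(rhs1 - rhs0))                 # 0
print(sp.simplify(rhs1 - (6*u*ux - uxxx)))      # 0: both equal the KdV right-hand side
```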
So KdV is bi-Hamiltonian. We then know that
$$\mathcal{J}_1\delta H_1 = \mathcal{J}_0\delta H_0.$$
We define a sequence of Hamiltonians $\{H_n\}_{n\geq 0}$ via
$$\mathcal{J}_1\delta H_{n+1} = \mathcal{J}_0\delta H_n.$$
We will assume that we can always solve for $H_{n+1}$ given $H_n$. This can be proven, but we shall not do so. We then have the following miraculous result.
Theorem. Suppose a system is bi-Hamiltonian via $(\mathcal{J}_0, H_0)$ and $(\mathcal{J}_1, H_1)$. It is a fact that we can find a sequence $\{H_n\}_{n\geq 0}$ such that
$$\mathcal{J}_1\delta H_{n+1} = \mathcal{J}_0\delta H_n.$$
Under these definitions, the $\{H_n\}$ are all first integrals of the system and are in involution, i.e.
$$\{H_n, H_m\} = 0$$
for all $n, m \geq 0$, where the Poisson bracket is taken with respect to $\mathcal{J}_1$.
Proof. We notice the following interesting fact: for $m \geq 1$, we have
$$\begin{aligned}
\{H_n, H_m\} &= \langle\delta H_n, \mathcal{J}_1\delta H_m\rangle\\
&= \langle\delta H_n, \mathcal{J}_0\delta H_{m-1}\rangle\\
&= -\langle\mathcal{J}_0\delta H_n, \delta H_{m-1}\rangle\\
&= -\langle\mathcal{J}_1\delta H_{n+1}, \delta H_{m-1}\rangle\\
&= \langle\delta H_{n+1}, \mathcal{J}_1\delta H_{m-1}\rangle\\
&= \{H_{n+1}, H_{m-1}\}.
\end{aligned}$$
Iterating this many times, we find that for any $n, m$, we have
$$\{H_n, H_m\} = \{H_m, H_n\}.$$
Then by antisymmetry, they must both vanish. So done.
4.3 Zero curvature representation

There is a more geometric way to talk about integrable systems, which is via zero-curvature representations.

Suppose we have a function $u(x, t)$, which we currently think of as being fixed. From this, we construct $N \times N$ matrices $U = U(\lambda)$ and $V = V(\lambda)$ that depend on $\lambda$, $u$ and its derivatives. The $\lambda$ will be thought of as a "spectral parameter", like the $\lambda$ in the eigenvalue problem $L\varphi = \lambda\varphi$.

Now consider the system of PDE's
$$\partial_x v = U(\lambda)v,\quad \partial_t v = V(\lambda)v,\tag{$\ddagger$}$$
where $v = v(x, t; \lambda)$ is an $N$-dimensional vector.
Now notice that here we have twice as many equations as there are unknowns. So we need some compatibility conditions. We use the fact that $v_{xt} = v_{tx}$. So we need
$$\begin{aligned}
0 &= \partial_t(U(\lambda)v) - \partial_x(V(\lambda)v)\\
&= U_t v + Uv_t - V_x v - Vv_x\\
&= U_t v + UVv - V_x v - VUv\\
&= (U_t - V_x + [U, V])v.
\end{aligned}$$
So we know that if a (non-trivial) solution to the PDE's exists for any initial $v_0$, then we must have
$$U_t - V_x + [U, V] = 0.$$
These are known as the zero curvature equations.

There is a beautiful theorem by Frobenius that if this equation holds, then solutions always exist. So we have found a correspondence between the existence of solutions to the PDE, and some equation in $U$ and $V$.
Why are these called the zero curvature equations? In differential geometry, a connection $A$ on a tangent bundle has a curvature given by the Riemann curvature tensor
$$R = \partial\Gamma - \partial\Gamma + \Gamma\Gamma - \Gamma\Gamma,$$
where $\Gamma$ is the Christoffel symbol associated to the connection. This equation is less silly than it seems, because each of the objects there has a bunch of indices, and the indices on consecutive terms are not equal. So they do not just outright cancel. In terms of the connection $A$, the curvature vanishes iff
$$\frac{\partial A_j}{\partial x^i} - \frac{\partial A_i}{\partial x^j} + [A_i, A_j] = 0,$$
which has the same form as the zero-curvature equation.
Example. Consider
$$U(\lambda) = \frac{i}{2}\begin{pmatrix}2\lambda & u_x\\ u_x & -2\lambda\end{pmatrix},\quad V(\lambda) = \frac{1}{4i\lambda}\begin{pmatrix}\cos u & -i\sin u\\ i\sin u & -\cos u\end{pmatrix}.$$
Then the zero curvature equation is equivalent to the sine–Gordon equation
$$u_{xt} = \sin u.$$
In other words, the sine–Gordon equation holds iff the PDEs $(\ddagger)$ have a solution.
In geometry, curvature is an intrinsic property of our geometric object, say a surface. If we want to compute the curvature, we usually pick some coordinate system, take the above expression, interpret it in that coordinate system, and evaluate it. However, we could pick a different coordinate system, and we get different expressions for each of, say, $\frac{\partial A_j}{\partial x^i}$. However, if the curvature vanishes in one coordinate system, then it should also vanish in any coordinate system. So by picking a new coordinate system, we have found new things that satisfy the zero curvature equation.

Back in the real world, in general, we can give a gauge transformation that takes some solution $(U, V)$ to a new $(\tilde{U}, \tilde{V})$ that preserves the zero curvature equation. So we can use gauge transformations to obtain a lot of new solutions! This will be explored in the last example sheet.

What are these zero-curvature representations good for? We don't have time to go deep into the matter, but they can be used to do some inverse-scattering type things. In the above formulation of the sine–Gordon equation, if $u_x \to 0$ as $|x| \to \infty$, we write
$$v = \begin{pmatrix}\psi_1\\ \psi_2\end{pmatrix}.$$
Then, as $|x| \to \infty$, we have
$$\partial_x\begin{pmatrix}\psi_1\\ \psi_2\end{pmatrix} = \frac{i}{2}\begin{pmatrix}2\lambda & u_x\\ u_x & -2\lambda\end{pmatrix}\begin{pmatrix}\psi_1\\ \psi_2\end{pmatrix} \to \begin{pmatrix}i\lambda\psi_1\\ -i\lambda\psi_2\end{pmatrix}.$$
So we know
$$\begin{pmatrix}\psi_1\\ \psi_2\end{pmatrix} \to A\begin{pmatrix}1\\ 0\end{pmatrix}e^{i\lambda x} + B\begin{pmatrix}0\\ 1\end{pmatrix}e^{-i\lambda x}$$
as $|x| \to \infty$. So with any $v$ satisfying the first equation in $(\ddagger)$, we can associate to it some "scattering data" $A, B$. Then the second equation in $(\ddagger)$ tells us how $v$, and thus $A, B$, evolves in time, and using this we can develop some inverse scattering-type way of solving the equation.
4.4 From Lax pairs to zero curvature

Lax pairs are very closely related to the zero curvature representation. Recall that we had this isospectral flow theorem: if Lax's equation
$$L_t = [L, A]$$
is satisfied, then the eigenvalues of $L$ are time-independent. Also, we found that our eigensolutions satisfied
$$\tilde\psi = \psi_t + A\psi = 0.$$
So we have two equations:
$$L\psi = \lambda\psi,\quad \psi_t + A\psi = 0.$$
Now suppose we reverse this: we enforce that $\lambda_t = 0$. Then differentiating the first equation and substituting in the second gives
$$L_t = [L, A].$$
So we can see Lax's equation as a compatibility condition for the two equations above. We will see that, given any equations of this form, we can transform them into a zero curvature form.
Note that if we have
$$L = \partial_x^n + \sum_{j=0}^{n-1}u_j(x, t)\partial_x^j,\quad A = \partial_x^m + \sum_{j=0}^{m-1}v_j(x, t)\partial_x^j,$$
then $L\psi = \lambda\psi$ means that derivatives of $\psi$ of order $\geq n$ can be expressed as linear combinations of derivatives of order $< n$. Indeed, we just have
$$\partial_x^n\psi = \lambda\psi - \sum_{j=0}^{n-1}u_j(x, t)\partial_x^j\psi.$$
Then differentiating this equation will give us an expression for the higher derivatives in terms of the lower ones.

Now by introducing the vector
$$\Psi = (\psi, \partial_x\psi, \cdots, \partial_x^{n-1}\psi)^T,$$
the equation $L\psi = \lambda\psi$ can be written as
$$\partial_x\Psi = U(\lambda)\Psi,$$
where
$$U(\lambda) = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0\\
0 & 0 & 1 & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & \cdots & 1\\
\lambda - u_0 & -u_1 & -u_2 & \cdots & -u_{n-1}
\end{pmatrix}.$$
Now differentiate "$\psi_t + A\psi = 0$" $i - 1$ times with respect to $x$ to obtain
$$(\partial_x^{i-1}\psi)_t + \partial_x^{i-1}(A\psi) = 0.$$
Using $L\psi = \lambda\psi$ to rewrite every derivative of order $\geq n$ in terms of lower-order ones, we can write
$$\partial_x^{i-1}(A\psi) = -\sum_{j=1}^n V_{ij}(x, t)\,\partial_x^{j-1}\psi$$
for some $V_{ij}(x, t)$ depending on the $v_j$, $u_i$ and their derivatives. We see that this equation then just says
$$\partial_t\Psi = V\Psi.$$
So we have shown that
$$L_t = [L, A] \iff \begin{cases}L\psi = \lambda\psi\\ \psi_t + A\psi = 0\end{cases} \iff \begin{cases}\Psi_x = U(\lambda)\Psi\\ \Psi_t = V(\lambda)\Psi\end{cases} \implies U_t - V_x + [U, V] = 0.$$
So we know that if something can be written in the form of Lax's equation, then we can come up with an equivalent equation in zero curvature form.
5 Symmetry methods in PDEs
Finally, we are now going to learn how we can exploit symmetries to solve
differential equations. A lot of the things we do will be done for ordinary
differential equations, but they all work equally well for partial differential
equations.
To talk about symmetries, we will have to use the language of groups. But
this time, since differential equations are continuous objects, we will not be
content with just groups. We will talk about smooth groups, or Lie groups. With
Lie groups, we can talk about continuous families of symmetries, as opposed to
the more “discrete” symmetries like the symmetries of a triangle.
At this point, the more applied students might be scared and want to run
away from the word “group”. However, understanding “pure” mathematics is
often very useful when doing applied things, as a lot of the structures we see in
the physical world can be explained by concepts coming from pure mathematics.
To demonstrate this, we offer the following cautionary tale, which may or may
not be entirely made up.
Back in the 60's, Gell-Mann was trying to understand the many different seemingly-fundamental particles occurring in nature. He decided one day that he should plot out the particles according to certain quantum numbers known as isospin and hypercharge. The resulting diagram looked like this:
So this is a nice picture, as it obviously formed some sort of lattice. However, it
is not clear how one can generalize this for more particles, or where this pattern
came from.
Now a pure mathematician happened to get lost, somehow wandered into the physics department, and saw that picture. He asked "so you are also interested in the eight-dimensional adjoint representation of su(3)?", and the physicist was like, "no. . . ?".
It turns out the weight diagram (whatever that might be) of the eight-
dimensional adjoint representation of
su
(3) (whatever that might be), looked
exactly like that. Indeed, it turns out there is a good correspondence between
representations of
su
(3) and quantum numbers of particles, and then the way to
understand and generalize this phenomenon became obvious.
5.1 Lie groups and Lie algebras
So to begin with, we remind ourselves of what a group is!

Definition (Group). A group is a set $G$ with a binary operation
$$(g_1, g_2) \mapsto g_1g_2$$
called "group multiplication", satisfying the axioms
(i) Associativity: $(g_1g_2)g_3 = g_1(g_2g_3)$ for all $g_1, g_2, g_3 \in G$;
(ii) Existence of identity: there is a (unique) identity element $e \in G$ such that $ge = eg = g$ for all $g \in G$;
(iii) Existence of inverses: for each $g \in G$, there is $g^{-1} \in G$ such that $gg^{-1} = g^{-1}g = e$.
Example. (Z, +) is a group.
What we are really interested in is how groups act on certain sets.
Definition (Group action). A group $G$ acts on a set $X$ if there is a map $G \times X \to X$ sending $(g, x) \mapsto g(x)$ such that
$$g(h(x)) = (gh)(x),\quad e(x) = x$$
for all $g, h \in G$ and $x \in X$.

Example. The rotation matrices $SO(2)$ act on $\mathbb{R}^2$ via matrix multiplication.
We are not going to consider groups in general, but we will only talk about
Lie groups, and coordinate changes born of them. For the sake of simplicity, we
are not going to use the “real” definition of Lie group, but use an easier version
that really looks more like the definition of a local Lie group than a Lie group.
The definition will probably be slightly confusing, but it will become clearer
with examples.
Definition (Lie group). An $m$-dimensional Lie group is a group such that all the elements depend continuously on $m$ parameters, in such a way that the maps $(g_1, g_2) \mapsto g_1g_2$ and $g \mapsto g^{-1}$ correspond to smooth functions of those parameters.

In practice, it suffices to check that the map $(g_1, g_2) \mapsto g_1g_2^{-1}$ is smooth.
So elements of an ($m$-dimensional) Lie group can be written as $g(\mathbf{t})$, where $\mathbf{t} \in \mathbb{R}^m$. We make the convention that $g(\mathbf{0}) = e$. For those who are doing differential geometry, this is a manifold with a group structure such that the group operations are smooth maps. For those who are doing category theory, this is a group object in the category of smooth manifolds.
Example. Any element of $G = SO(2)$ can be written as
$$g(t) = \begin{pmatrix}\cos t & -\sin t\\ \sin t & \cos t\end{pmatrix}$$
for $t \in \mathbb{R}$. So this is a candidate for a 1-dimensional Lie group that depends on a single parameter $t$. We now have to check that the map $(g_1, g_2) \mapsto g_1g_2^{-1}$ is smooth. We note that
$$g(t_1)^{-1} = g(-t_1).$$
So we have
$$g(t_1)g(t_2)^{-1} = g(t_1)g(-t_2) = g(t_1 - t_2).$$
So the map
$$(g_1, g_2) \mapsto g_1g_2^{-1}$$
corresponds to
$$(t_1, t_2) \mapsto t_1 - t_2.$$
Since this map is smooth, we conclude that $SO(2)$ is a 1-dimensional Lie group.
Example. Consider matrices of the form
$$g(\mathbf{t}) = \begin{pmatrix}1 & t_1 & t_3\\ 0 & 1 & t_2\\ 0 & 0 & 1\end{pmatrix},\quad \mathbf{t} \in \mathbb{R}^3.$$
It is easy to see that this is a group under matrix multiplication. This is known as the Heisenberg group. We now check that it is in fact a Lie group. It has three obvious parameters $t_1, t_2, t_3$, and we have to check the smoothness criterion. We have
$$g(\mathbf{a})g(\mathbf{b}) = \begin{pmatrix}1 & a_1 & a_3\\ 0 & 1 & a_2\\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix}1 & b_1 & b_3\\ 0 & 1 & b_2\\ 0 & 0 & 1\end{pmatrix} = \begin{pmatrix}1 & a_1 + b_1 & a_3 + b_3 + a_1b_2\\ 0 & 1 & a_2 + b_2\\ 0 & 0 & 1\end{pmatrix}.$$
We can then write down the inverse
$$g(\mathbf{b})^{-1} = \begin{pmatrix}1 & -b_1 & b_1b_2 - b_3\\ 0 & 1 & -b_2\\ 0 & 0 & 1\end{pmatrix}.$$
So we have
$$g(\mathbf{a})g(\mathbf{b})^{-1} = \begin{pmatrix}1 & a_1 & a_3\\ 0 & 1 & a_2\\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix}1 & -b_1 & b_1b_2 - b_3\\ 0 & 1 & -b_2\\ 0 & 0 & 1\end{pmatrix} = \begin{pmatrix}1 & a_1 - b_1 & b_1b_2 - b_3 - a_1b_2 + a_3\\ 0 & 1 & a_2 - b_2\\ 0 & 0 & 1\end{pmatrix}.$$
This then corresponds to
$$(\mathbf{a}, \mathbf{b}) \mapsto (a_1 - b_1,\ a_2 - b_2,\ b_1b_2 - b_3 - a_1b_2 + a_3),$$
which is a smooth map! So we conclude that the Heisenberg group is a three-dimensional Lie group.
Recall that at the beginning of the course, we had vector fields and flow
maps. Flow maps are hard and complicated, while vector fields are nice and easy.
Thus, we often want to reduce the study of flow maps to the study of vector
fields, which can be thought of as the “infinitesimal flow”. For example, checking
that two flows commute is very hard, but checking that the commutator of two
vector fields vanishes is easy.
Here we are going to do the same. Lie groups are hard. To make life easier,
we look at “infinitesimal” elements of Lie groups, and this is known as the Lie
algebra.
We will only study Lie algebras informally, and we’ll consider only the case
of matrix Lie groups, so that it makes sense to add, subtract, differentiate the
elements of the Lie group (in addition to the group multiplication), and the
presentation becomes much easier.
Suppose we have a curve $x_1(\varepsilon)$ in our parameter space passing through $0$ at $\varepsilon = 0$. Then we can obtain a curve
$$A(\varepsilon) = g(x_1(\varepsilon))$$
in our Lie group $G$. We set $a = A'(0)$, so that
$$A(\varepsilon) = I + \varepsilon a + o(\varepsilon).$$
We now define the Lie algebra $\mathfrak{g}$ to be the set of all "leading order terms" $a$ arising from such curves. We now proceed to show that $\mathfrak{g}$ is in fact a vector space.

Suppose we have a second curve $B(\varepsilon)$, which we expand similarly as
$$B(\varepsilon) = I + \varepsilon b + o(\varepsilon).$$
We will show that $a + b \in \mathfrak{g}$. Consider the curve
$$t \mapsto A(t)B(t),$$
using the multiplication in the Lie group. Then we have
$$A(\varepsilon)B(\varepsilon) = (I + \varepsilon a + o(\varepsilon))(I + \varepsilon b + o(\varepsilon)) = I + \varepsilon(a + b) + o(\varepsilon).$$
So we know $a, b \in \mathfrak{g}$ implies $a + b \in \mathfrak{g}$.

For scalar multiplication, given $\lambda \in \mathbb{R}$, we can construct a new curve
$$t \mapsto A(\lambda t).$$
Then we have
$$A(\lambda\varepsilon) = I + \varepsilon(\lambda a) + o(\varepsilon).$$
So if $a \in \mathfrak{g}$, then so is $\lambda a \in \mathfrak{g}$ for any $\lambda \in \mathbb{R}$.
So we get that $\mathfrak{g}$ has the structure of a vector space! This is already a little interesting. Groups are complicated. They have this weird structure and they are not necessarily commutative. However, we get a nice, easy vector space structure from the group structure.

It turns out we can do something more fun. The commutator of any two elements of $\mathfrak{g}$ is also in $\mathfrak{g}$. To see this, we define a curve $C(t)$ for $t > 0$ by
$$t \mapsto A(\sqrt{t})B(\sqrt{t})A(\sqrt{t})^{-1}B(\sqrt{t})^{-1}.$$
We now notice that $A(\varepsilon)^{-1} = I - \varepsilon a + o(\varepsilon)$, since if $A(\varepsilon)^{-1} = I + \varepsilon\tilde{a} + o(\varepsilon)$, then
$$I = A(\varepsilon)A(\varepsilon)^{-1} = (I + \varepsilon a + o(\varepsilon))(I + \varepsilon\tilde{a} + o(\varepsilon)) = I + \varepsilon(a + \tilde{a}) + o(\varepsilon).$$
So we must have $\tilde{a} = -a$. Then we have
$$C(\varepsilon) = (I + \sqrt{\varepsilon}\,a + \cdots)(I + \sqrt{\varepsilon}\,b + \cdots)(I - \sqrt{\varepsilon}\,a + \cdots)(I - \sqrt{\varepsilon}\,b + \cdots) = I + \varepsilon(ab - ba) + o(\varepsilon).$$
It is an exercise to show that this is actually true, because we have to keep track of the second order terms we didn't write out to make sure they cancel properly. So if $a, b \in \mathfrak{g}$, then
$$[a, b]_L = ab - ba \in \mathfrak{g}.$$
Vector spaces with this extra structure are called Lie algebras. The idea is that the Lie algebra consists of elements of the group infinitesimally close to the identity. While the product of two elements $a, b$ infinitesimally close to the identity need not remain infinitesimally close to the identity, the commutator $ab - ba$ does.
Definition (Lie algebra). A Lie algebra is a vector space $\mathfrak{g}$ equipped with a bilinear, anti-symmetric map $[\cdot, \cdot]_L: \mathfrak{g} \times \mathfrak{g} \to \mathfrak{g}$ that satisfies the Jacobi identity
$$[a, [b, c]_L]_L + [b, [c, a]_L]_L + [c, [a, b]_L]_L = 0.$$
This antisymmetric map is called the Lie bracket.

If $\dim\mathfrak{g} = m$, we say the Lie algebra has dimension $m$.
The main source of Lie algebras will come from Lie groups, but there are
many other examples.
Example. We can set $\mathfrak{g} = \mathbb{R}^3$ and
$$[\mathbf{a}, \mathbf{b}]_L = \mathbf{a}\times\mathbf{b}.$$
It is a straightforward (and messy) check to see that this is a Lie algebra.

Example. Let $M$ be our phase space, and let
$$\mathfrak{g} = \{f: M \to \mathbb{R}\text{ smooth}\}.$$
Then
$$[f, g]_L = \{f, g\}$$
makes $\mathfrak{g}$ a Lie algebra.
Example. We now find the Lie algebra of the matrix group $SO(n)$. We let
$$G = SO(n) = \{A \in \mathrm{Mat}_n(\mathbb{R}) : AA^T = I,\ \det A = 1\}.$$
We let $A(\varepsilon)$ be a curve in $G$ with $A(0) = I$. Then we have
$$I = A(\varepsilon)A(\varepsilon)^T = (I + \varepsilon a + o(\varepsilon))(I + \varepsilon a^T + o(\varepsilon)) = I + \varepsilon(a + a^T) + o(\varepsilon).$$
So we must have $a + a^T = 0$, i.e. $a$ is anti-symmetric. The other condition says
$$1 = \det A(\varepsilon) = \det(I + \varepsilon a + o(\varepsilon)) = 1 + \varepsilon\operatorname{tr}(a) + o(\varepsilon).$$
So we need $\operatorname{tr}(a) = 0$, but this is already satisfied since $a$ is antisymmetric.

So it looks like the Lie algebra $\mathfrak{g} = \mathfrak{so}(n)$ corresponding to $SO(n)$ is the vector space of anti-symmetric matrices:
$$\mathfrak{so}(n) = \{a \in \mathrm{Mat}_n(\mathbb{R}) : a + a^T = 0\}.$$
To see this really is the answer, we have to check that every antisymmetric matrix comes from some curve. It is an exercise to check that the curve
$$A(t) = \exp(at)$$
works.
We can manually check that $\mathfrak{g}$ is closed under the commutator
$$[a, b]_L = ab - ba.$$
Indeed, we have
$$[a, b]_L^T = (ab - ba)^T = b^Ta^T - a^Tb^T = ba - ab = -[a, b]_L.$$
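These claims are easy to check numerically. Here is a small sketch (my own, using random matrices) confirming that $\exp(at)$ lands in $SO(n)$ for antisymmetric $a$, and that the commutator of two antisymmetric matrices is again antisymmetric:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
m1 = rng.standard_normal((4, 4))
a = m1 - m1.T                      # an antisymmetric matrix
m2 = rng.standard_normal((4, 4))
b = m2 - m2.T                      # another antisymmetric matrix

A = expm(0.7 * a)                  # a point on the curve A(t) = exp(a t)
print(np.allclose(A @ A.T, np.eye(4)), np.isclose(np.linalg.det(A), 1.0))  # True True

comm = a @ b - b @ a               # [a, b]_L
print(np.allclose(comm, -comm.T))  # True: so(n) is closed under the bracket
```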
Note that it is standard that if we have a group whose name is in capital letters (e.g. $SO(n)$), then the corresponding Lie algebra is the same thing in lower case, fraktur letters (e.g. $\mathfrak{so}(n)$).

Note that above all else, $\mathfrak{g}$ is a vector space. So (at least if $\mathfrak{g}$ is finite-dimensional) we can give $\mathfrak{g}$ a basis $\{a_i\}_{i=1}^m$. Since the Lie bracket maps $\mathfrak{g}\times\mathfrak{g} \to \mathfrak{g}$, it must be the case that
$$[a_i, a_j]_L = \sum_{k=1}^m c^k_{ij}a_k$$
for some constants $c^k_{ij}$. These are known as the structure constants.
5.2 Vector fields and one-parameter groups of transformations

Ultimately, we will be interested in coordinate transformations born of the action of some Lie group. In other words, we let the Lie group act on our coordinate space (smoothly), and then use new coordinates
$$\tilde{x} = g(x),$$
where $g \in G$ for some Lie group $G$. For example, if $G$ is the group of rotations, then this gives new coordinates by rotating.

Recall that a vector field $\mathbf{V}: \mathbb{R}^n \to \mathbb{R}^n$ defines an integral curve through the point $x$ via the solution of the differential equations
$$\frac{\mathrm{d}\tilde{x}}{\mathrm{d}\varepsilon} = \mathbf{V}(\tilde{x}),\quad \tilde{x}(0) = x.$$
To represent solutions to this problem, we use the flow map $g^\varepsilon$ defined by
$$\tilde{x}(\varepsilon) = g^\varepsilon x = x + \varepsilon\mathbf{V}(x) + o(\varepsilon).$$
We call $\mathbf{V}$ the generator of the flow. This flow map is an example of a one-parameter group of transformations.

Definition (One-parameter group of transformations). A smooth map $g^\varepsilon: \mathbb{R}^n \to \mathbb{R}^n$ is called a one-parameter group of transformations (1.p.g.t.) if
$$g^0 = \mathrm{id},\quad g^{\varepsilon_1}g^{\varepsilon_2} = g^{\varepsilon_1+\varepsilon_2}.$$
We say such a one-parameter group of transformations is generated by the vector field
$$\mathbf{V}(x) = \left.\frac{\mathrm{d}}{\mathrm{d}\varepsilon}(g^\varepsilon x)\right|_{\varepsilon=0}.$$
Conversely, every vector field $\mathbf{V}: \mathbb{R}^n \to \mathbb{R}^n$ generates a one-parameter group of transformations via solutions of
$$\frac{\mathrm{d}\tilde{x}}{\mathrm{d}\varepsilon} = \mathbf{V}(\tilde{x}),\quad \tilde{x}(0) = x.$$
For some absurd reason, differential geometers decided that we should represent vector fields in a different way. This notation is standard but odd-looking, and is in many settings more convenient.

Notation. Consider a vector field $\mathbf{V} = (V_1, \cdots, V_n)^T: \mathbb{R}^n \to \mathbb{R}^n$. This vector field uniquely defines a differential operator
$$V = V_1\frac{\partial}{\partial x_1} + V_2\frac{\partial}{\partial x_2} + \cdots + V_n\frac{\partial}{\partial x_n}.$$
Conversely, any such linear differential operator gives us a vector field. We will identify a vector field with the associated differential operator, and we think of the $\frac{\partial}{\partial x_i}$ as a basis for our vector fields.

Example. We will write the vector field $\mathbf{V} = (x^2 + y, yx)$ as
$$V = (x^2 + y)\frac{\partial}{\partial x} + yx\frac{\partial}{\partial y}.$$
One good reason for using this definition is that we have a simple description of the commutator of two vector fields. Recall that the commutator of two vector fields $\mathbf{V}, \mathbf{W}$ was previously defined by
$$[\mathbf{V}, \mathbf{W}]_i = \left(\mathbf{V}\cdot\frac{\partial}{\partial\mathbf{x}}\mathbf{W} - \mathbf{W}\cdot\frac{\partial}{\partial\mathbf{x}}\mathbf{V}\right)_i = V_j\frac{\partial W_i}{\partial x_j} - W_j\frac{\partial V_i}{\partial x_j}.$$
Now if we think of the vector fields as differential operators, then we have
$$V = \mathbf{V}\cdot\frac{\partial}{\partial\mathbf{x}},\quad W = \mathbf{W}\cdot\frac{\partial}{\partial\mathbf{x}}.$$
The usual definition of commutator would then be
$$\begin{aligned}
(VW - WV)(f) &= V_j\frac{\partial}{\partial x_j}\left(W_i\frac{\partial f}{\partial x_i}\right) - W_j\frac{\partial}{\partial x_j}\left(V_i\frac{\partial f}{\partial x_i}\right)\\
&= \left(V_j\frac{\partial W_i}{\partial x_j} - W_j\frac{\partial V_i}{\partial x_j}\right)\frac{\partial f}{\partial x_i} + V_jW_i\frac{\partial^2 f}{\partial x_i\partial x_j} - W_jV_i\frac{\partial^2 f}{\partial x_i\partial x_j}\\
&= \left(V_j\frac{\partial W_i}{\partial x_j} - W_j\frac{\partial V_i}{\partial x_j}\right)\frac{\partial f}{\partial x_i}\\
&= [\mathbf{V}, \mathbf{W}]\cdot\frac{\partial}{\partial\mathbf{x}}f.
\end{aligned}$$
So with the new notation, we literally have
$$[V, W] = VW - WV.$$
We shall now look at some examples of vector fields and the one-parameter groups of transformations they generate. In simple cases, it is not hard to find the correspondence.

Example. Consider the vector field
$$V = x\frac{\partial}{\partial x} + \frac{\partial}{\partial y}.$$
This generates a one-parameter group of transformations via solutions to
$$\frac{\mathrm{d}\tilde{x}}{\mathrm{d}\varepsilon} = \tilde{x},\quad \frac{\mathrm{d}\tilde{y}}{\mathrm{d}\varepsilon} = 1,$$
where
$$(\tilde{x}(0), \tilde{y}(0)) = (x, y).$$
As we are well-trained with differential equations, we can just write down the solution
$$(\tilde{x}(\varepsilon), \tilde{y}(\varepsilon)) = g^\varepsilon(x, y) = (xe^\varepsilon, y + \varepsilon).$$
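As a tiny check (my own illustration), one can verify symbolically that this map satisfies both the defining ODE and the group law $g^{\varepsilon_1}g^{\varepsilon_2} = g^{\varepsilon_1+\varepsilon_2}$:

```python
import sympy as sp

x, y, e1, e2 = sp.symbols('x y epsilon_1 epsilon_2')

def g(e, x, y):
    return (x*sp.exp(e), y + e)

xt, yt = g(e1, x, y)
# d/d(epsilon) of the flow equals the vector field evaluated along the flow:
print(sp.simplify(sp.diff(xt, e1) - xt), sp.simplify(sp.diff(yt, e1) - 1))   # 0 0
# group law: g^{e1} g^{e2} = g^{e1 + e2}
print(sp.simplify(sp.Matrix(g(e1, *g(e2, x, y))) - sp.Matrix(g(e1 + e2, x, y))))
```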
Example. Consider the natural action of $SO(2) \cong S^1$ on $\mathbb{R}^2$ via
$$g^\varepsilon(x, y) = (x\cos\varepsilon - y\sin\varepsilon,\ y\cos\varepsilon + x\sin\varepsilon).$$
We can show that $g^0 = \mathrm{id}$ and $g^{\varepsilon_1}g^{\varepsilon_2} = g^{\varepsilon_1+\varepsilon_2}$. The generator of this flow is
$$V = \left.\frac{\mathrm{d}\tilde{x}}{\mathrm{d}\varepsilon}\right|_{\varepsilon=0}\frac{\partial}{\partial x} + \left.\frac{\mathrm{d}\tilde{y}}{\mathrm{d}\varepsilon}\right|_{\varepsilon=0}\frac{\partial}{\partial y} = -y\frac{\partial}{\partial x} + x\frac{\partial}{\partial y}.$$
Plotted, this is the familiar vector field of anticlockwise rotation about the origin.
Example. If
$$V = \alpha\frac{\partial}{\partial x},$$
then we have
$$g^\varepsilon x = x + \alpha\varepsilon.$$
This is a translation with constant speed.

If we instead have
$$V = \beta x\frac{\partial}{\partial x},$$
then we have
$$g^\varepsilon x = e^{\beta\varepsilon}x,$$
which is scaling $x$ up at an exponentially growing rate.
How does this study of one-parameter groups of transformations relate to our study of Lie groups? It turns out the action of a Lie group on $\mathbb{R}^n$ can be reduced to the study of one-parameter groups of transformations. If a Lie group $G$ acts on $\mathbb{R}^n$, then it might contain many one-parameter groups of transformations. More precisely, we could find some elements $g^\varepsilon \in G$ depending smoothly on $\varepsilon$ such that the action of $g^\varepsilon$ on $\mathbb{R}^n$ is a one-parameter group of transformations.

It turns out that Lie groups contain a lot of one-parameter groups of transformations. In general, given any $g(\mathbf{t}) \in G$ (in a neighbourhood of $e \in G$), we can reach it via a sequence of one-parameter groups of transformations:
$$g(\mathbf{t}) = g^{\varepsilon_1}_{i_1}g^{\varepsilon_2}_{i_2}\cdots g^{\varepsilon_N}_{i_N}.$$
So to understand a Lie group, we just have to understand the one-parameter groups of transformations. And to understand these one-parameter groups, we just have to understand the vector fields that generate them, i.e. the Lie algebra, and this is much easier to deal with than the group itself!
5.3 Symmetries of differential equations

So far we've just been talking about Lie groups in general. We now try to apply this to differential equations. We will want to know when a one-parameter group of transformations is a symmetry of a differential equation.

We denote a general (ordinary) differential equation by
$$\Delta[x, u, u_x, u_{xx}, \cdots] = 0.$$
Note that in general, $\Delta$ can be a vector, so that we can have a system of equations. We say $u = u(x)$ is a solution to the differential equation if it satisfies the above equation.

Suppose $g^\varepsilon$ is a one-parameter group of transformations generated by a vector field $V$, and consider the new coordinates
$$(\tilde{x}, \tilde{u}) = g^\varepsilon(x, u).$$
Note that we transform both the domain $x$ and the codomain $u$ of the function $u(x)$, and we are allowed to mix them together.

We call $g^\varepsilon$ a Lie point symmetry of $\Delta$ if
$$\Delta[x, u, u_x, \cdots] = 0 \implies \Delta[\tilde{x}, \tilde{u}, \tilde{u}_{\tilde{x}}, \cdots] = 0.$$
In other words, it takes solutions to solutions. We say this Lie point symmetry is generated by $V$.
Example. Consider the KdV equation
$$\Delta = u_t + u_{xxx} - 6uu_x = 0.$$
Then translation in the $t$ direction, given by
$$g^\varepsilon(x, t, u) = (x, t + \varepsilon, u),$$
is a Lie point symmetry. This is generated by
$$V = \frac{\partial}{\partial t}.$$
Indeed, by the chain rule, we have
$$\tilde{u}_{\tilde{t}} = \frac{\partial u}{\partial\tilde{t}} = \frac{\partial t}{\partial\tilde{t}}\frac{\partial u}{\partial t} + \frac{\partial x}{\partial\tilde{t}}\frac{\partial u}{\partial x} = u_t.$$
Similarly, we have
$$\tilde{u}_{\tilde{x}} = u_x,\quad \tilde{u}_{\tilde{x}\tilde{x}\tilde{x}} = u_{xxx}.$$
So if
$$\Delta[x, t, u] = 0,$$
then we also have
$$\Delta[\tilde{x}, \tilde{t}, \tilde{u}] = \Delta[x, t, u] = 0.$$
In other words, the vector field $V = \partial_t$ generates a Lie point symmetry of the KdV equation.
Obviously Lie point symmetries give us new solutions from old ones. More importantly, we can use them to solve equations!

Example. Consider the ODE
$$\frac{\mathrm{d}u}{\mathrm{d}x} = F\!\left(\frac{u}{x}\right).$$
We see that there are things that look like $u/x$ on both sides. So it is not too hard to see that this admits the Lie point symmetry
$$g^\varepsilon(x, u) = (e^\varepsilon x, e^\varepsilon u).$$
This Lie point symmetry is generated by
$$V = x\frac{\partial}{\partial x} + u\frac{\partial}{\partial u}.$$
The trick is to find coordinates $(s, t)$ such that $V(s) = 0$ and $V(t) = 1$. We call these "invariant coordinates". Then since $V$ is still a symmetry of the equation, this suggests that $t$ should not appear explicitly in the differential equation, and this will in general make our lives easier. Of course, terms like $t_s$ can still appear, because translating $t$ by a constant does not change $t_s$.

We pick
$$s = \frac{u}{x},\quad t = \log|x|,$$
which do indeed satisfy $V(s) = 0$, $V(t) = 1$. We can invert these to get
$$x = e^t,\quad u = se^t.$$
With respect to the $(s, t)$ coordinates, the ODE becomes
$$\frac{\mathrm{d}t}{\mathrm{d}s} = \frac{1}{F(s) - s},$$
at least for $F(s) \neq s$. As promised, this has no explicit $t$-dependence. So we can actually integrate this thing up. We can write the solution as
$$t = C + \int^s\frac{\mathrm{d}s'}{F(s') - s'}.$$
Going back to the original coordinates, we know
$$\log|x| = C + \int^{u/x}\frac{\mathrm{d}s}{F(s) - s}.$$
If we actually had an expression for $F$ and did the integral, we could potentially rearrange this to get an expression for $u$ in terms of $x$. So the knowledge of the Lie point symmetry allowed us to integrate up our ODE.
In general, for an $n$th order ODE
$$\Delta[x, u, u', \cdots, u^{(n)}] = 0$$
admitting a Lie point symmetry generated by
$$V = \xi(x, u)\frac{\partial}{\partial x} + \eta(x, u)\frac{\partial}{\partial u},$$
we introduce coordinates
$$s = s(u, x),\quad t = t(u, x)$$
such that in the new coordinates, we have
$$V = \frac{\partial}{\partial t}.$$
This means that in the new coordinates, the ODE has the form
$$\Delta[s, t', \cdots, t^{(n)}] = 0.$$
Note that there is no explicit $t$! We can now set $r = t'$, so we get an $(n-1)$th order ODE
$$\Delta[s, r, r', \cdots, r^{(n-1)}] = 0,$$
i.e. we have reduced the order of the ODE by 1. Now rinse and repeat.
5.4 Jets and prolongations

This is all nice, but we still need a way to find Lie point symmetries. So far, we have just found them by divine inspiration, which is not particularly helpful. Is there a more systematic way of finding Lie symmetries?

We can start by looking at the trivial case, a 0th order "ODE"
$$\Delta[x, u] = 0.$$
Then we know $g^\varepsilon: (x, u) \mapsto (\tilde{x}, \tilde{u})$ is a Lie point symmetry if
$$\Delta[x, u] = 0 \implies \Delta[\tilde{x}, \tilde{u}] = \Delta[g^\varepsilon(x, u)] = 0.$$
Can we reduce this to a statement about the generator of $g^\varepsilon$? Here we need to assume that $\Delta$ is of maximal rank, i.e. the matrix of derivatives
$$\frac{\partial\Delta_j}{\partial y_i}$$
is of maximal rank, where the $y_i$ run over $x, u$, and in general all coordinates. So for example, the following theory will not work if, say, $\Delta[x, u] = x^2$. Assuming $\Delta$ is indeed of maximal rank, it is an exercise on the example sheet to see that if $V$ is the generator of $g^\varepsilon$, then $g^\varepsilon$ is a Lie point symmetry iff
$$\Delta = 0 \implies V(\Delta) = 0.$$
This essentially says that the flow preserves $\Delta = 0$ iff the derivative of $\Delta$ along $V$ vanishes there, which makes sense. Here we are thinking of $V$ as a differential operator. We call this constraint an on-shell condition, because we only impose it whenever $\Delta = 0$ is satisfied, instead of at all points.

This equivalent statement is very easy! This is just an algebraic equation for the coefficients of $V$, and it is in general very easy to solve!
However, as you may have noticed, these aren't really ODE's. They are just algebraic equations. So how do we generalize this to ODE's of order $N \geq 1$? Consider a general vector field
$$V(x, u) = \xi(x, u)\frac{\partial}{\partial x} + \eta(x, u)\frac{\partial}{\partial u}.$$
This only knows what to do to $x$ and $u$. But if we know how $x$ and $u$ change, we should also know how $u_x$, $u_{xx}$ etc. change. Indeed this is true, and extending the action of $V$ to the derivatives is known as the prolongation of the vector field. We start with a concrete example.
Example. Consider the one-parameter group of transformations
$$g^\varepsilon: (x, u) \mapsto (e^\varepsilon x, e^{-\varepsilon}u) = (\tilde{x}, \tilde{u})$$
with generator
$$V = x\frac{\partial}{\partial x} - u\frac{\partial}{\partial u}.$$
This induces a transformation
$$(x, u, u_x) \mapsto (\tilde{x}, \tilde{u}, \tilde{u}_{\tilde{x}}).$$
By the chain rule, we know
$$\frac{\mathrm{d}\tilde{u}}{\mathrm{d}\tilde{x}} = \frac{\mathrm{d}\tilde{u}/\mathrm{d}x}{\mathrm{d}\tilde{x}/\mathrm{d}x} = e^{-2\varepsilon}u_x.$$
So in fact
$$(\tilde{x}, \tilde{u}, \tilde{u}_{\tilde{x}}) = (e^\varepsilon x, e^{-\varepsilon}u, e^{-2\varepsilon}u_x).$$
If we call $(x, u)$ coordinates for the base space, then we call the extended system $(x, u, u_x)$ coordinates for the first jet space. Given any function $u = u(x)$, we will get a point $(x, u, u_x)$ in the jet space for each $x$.

What we've just seen is that a one-parameter group of transformations of the base space induces a one-parameter group of transformations of the first jet space. This is known as the prolongation, written
$$\mathrm{pr}^{(1)}g^\varepsilon: (x, u, u_x) \mapsto (\tilde{x}, \tilde{u}, \tilde{u}_{\tilde{x}}) = (e^\varepsilon x, e^{-\varepsilon}u, e^{-2\varepsilon}u_x).$$
One might find it a bit strange to call $u_x$ a coordinate. If we don't like doing that, we can just replace $u_x$ with a different symbol $p_1$. If we have the $n$th derivative, we replace the $n$th derivative with $p_n$.

Since we have a one-parameter group of transformations, we can write down the generator. We see that $\mathrm{pr}^{(1)}g^\varepsilon$ is generated by
$$\mathrm{pr}^{(1)}V = x\frac{\partial}{\partial x} - u\frac{\partial}{\partial u} - 2u_x\frac{\partial}{\partial u_x}.$$
This is called the first prolongation of $V$.
Of course, we can keep on going. Similarly, $\mathrm{pr}^{(2)}g^\varepsilon$ acts on the second jet space, which has coordinates $(x, u, u_x, u_{xx})$. In this case, we have
$$\mathrm{pr}^{(2)}g^\varepsilon: (x, u, u_x, u_{xx}) \mapsto (\tilde{x}, \tilde{u}, \tilde{u}_{\tilde{x}}, \tilde{u}_{\tilde{x}\tilde{x}}) = (e^\varepsilon x, e^{-\varepsilon}u, e^{-2\varepsilon}u_x, e^{-3\varepsilon}u_{xx}).$$
This is then generated by
$$\mathrm{pr}^{(2)}V = x\frac{\partial}{\partial x} - u\frac{\partial}{\partial u} - 2u_x\frac{\partial}{\partial u_x} - 3u_{xx}\frac{\partial}{\partial u_{xx}}.$$
Note that we don't have to recompute all terms. The $x, u, u_x$ terms did not change, so we only need to check what happens to $\tilde{u}_{\tilde{x}\tilde{x}}$.
We can now think of an $n$th order ODE
$$\Delta[x, u, u_x, \cdots, u^{(n)}] = 0$$
as an algebraic equation on the $n$th jet space. Of course, this is not just an arbitrary algebraic equation. We will only consider solutions in the $n$th jet space that come from some function $u = u(x)$. Similarly, we only consider symmetries of the $n$th jet space that come from the prolongation of some transformation on the base space.

With that restriction in mind, we have effectively dressed up our problem into an algebraic problem, just like the case of $\Delta[x, u] = 0$ we discussed at the beginning. Then $g^\varepsilon: (x, u) \mapsto (\tilde{x}, \tilde{u})$ is a Lie point symmetry if
$$\Delta[\tilde{x}, \tilde{u}, \tilde{u}_{\tilde{x}}, \ldots, \tilde{u}^{(n)}] = 0$$
when $\Delta = 0$. Or equivalently, we need
$$\Delta[\mathrm{pr}^{(n)}g^\varepsilon(x, u, \ldots, u^{(n)})] = 0$$
when $\Delta = 0$. This is just a one-parameter group of transformations on a huge coordinate system, the jet space. Thinking of all of $x, u, \cdots, u^{(n)}$ as just independent coordinates, we can rewrite it in terms of vector fields. (Assuming maximal rank) this is equivalent to asking for
$$\mathrm{pr}^{(n)}V(\Delta) = 0\quad\text{whenever }\Delta = 0.$$
This results in an overdetermined system of differential equations for $(\xi, \eta)$, where
$$V(x, u) = \xi(x, u)\frac{\partial}{\partial x} + \eta(x, u)\frac{\partial}{\partial u}.$$
Now in order to actually use this, we need to be able to compute the $n$th prolongation of an arbitrary vector field. This is what we are going to do next.

Note that if we tried to compute the prolongation of the action of the Lie group, then it would be horrendous. However, what we actually need to compute is the prolongation of the vector field, which is about the Lie algebra. This makes it much nicer.
We can write
$$g^\varepsilon(x, u) = (\tilde{x}, \tilde{u}) = (x + \varepsilon\xi(x, u),\ u + \varepsilon\eta(x, u)) + o(\varepsilon).$$
We know the $n$th prolongation of $V$ must be of the form
$$\mathrm{pr}^{(n)}V = V + \sum_{k=1}^n\eta_k\frac{\partial}{\partial u^{(k)}},$$
where we have to find out what the $\eta_k$ are. Then we know the $\eta_k$ will satisfy
$$\mathrm{pr}^{(n)}g^\varepsilon(x, u, \cdots, u^{(n)}) = (\tilde{x}, \tilde{u}, \cdots, \tilde{u}^{(n)}) = (x + \varepsilon\xi,\ u + \varepsilon\eta,\ u_x + \varepsilon\eta_1,\ \cdots,\ u^{(n)} + \varepsilon\eta_n) + o(\varepsilon).$$
To find $\eta_1$, we use the contact condition
$$\mathrm{d}\tilde{u} = \frac{\mathrm{d}\tilde{u}}{\mathrm{d}\tilde{x}}\,\mathrm{d}\tilde{x} = \tilde{u}_{\tilde{x}}\,\mathrm{d}\tilde{x}.$$
We now use the fact that
$$\tilde{x} = x + \varepsilon\xi(x, u) + o(\varepsilon),\quad \tilde{u} = u + \varepsilon\eta(x, u) + o(\varepsilon).$$
Substituting in, we have
$$\mathrm{d}u + \varepsilon\,\mathrm{d}\eta = \tilde{u}_{\tilde{x}}(\mathrm{d}x + \varepsilon\,\mathrm{d}\xi) + o(\varepsilon).$$
We want to write everything in terms of $\mathrm{d}x$. We have
$$\mathrm{d}u = u_x\,\mathrm{d}x$$
and
$$\mathrm{d}\eta = \frac{\partial\eta}{\partial x}\mathrm{d}x + \frac{\partial\eta}{\partial u}\mathrm{d}u = \left(\frac{\partial\eta}{\partial x} + u_x\frac{\partial\eta}{\partial u}\right)\mathrm{d}x = D_x\eta\,\mathrm{d}x,$$
where $D_x$ is the total derivative
$$D_x = \frac{\partial}{\partial x} + u_x\frac{\partial}{\partial u} + u_{xx}\frac{\partial}{\partial u_x} + \cdots.$$
We similarly have
$$\mathrm{d}\xi = D_x\xi\,\mathrm{d}x.$$
So substituting in, we have
$$(u_x + \varepsilon D_x\eta)\,\mathrm{d}x = \tilde{u}_{\tilde{x}}(1 + \varepsilon D_x\xi)\,\mathrm{d}x + o(\varepsilon).$$
This implies that
$$\tilde{u}_{\tilde{x}} = \frac{u_x + \varepsilon D_x\eta}{1 + \varepsilon D_x\xi} + o(\varepsilon) = (u_x + \varepsilon D_x\eta)(1 - \varepsilon D_x\xi) + o(\varepsilon) = u_x + \varepsilon(D_x\eta - u_xD_x\xi) + o(\varepsilon).$$
So we have
$$\eta_1 = D_x\eta - u_xD_x\xi.$$
Now building up the $\eta_k$ recursively, we use the contact condition
$$\mathrm{d}\tilde{u}^{(k)} = \frac{\mathrm{d}\tilde{u}^{(k)}}{\mathrm{d}\tilde{x}}\,\mathrm{d}\tilde{x} = \tilde{u}^{(k+1)}\,\mathrm{d}\tilde{x}.$$
We use
$$\tilde{u}^{(k)} = u^{(k)} + \varepsilon\eta_k + o(\varepsilon),\quad \tilde{x} = x + \varepsilon\xi + o(\varepsilon).$$
Substituting that back in, we get
$$(u^{(k+1)} + \varepsilon D_x\eta_k)\,\mathrm{d}x = \tilde{u}^{(k+1)}(1 + \varepsilon D_x\xi)\,\mathrm{d}x + o(\varepsilon).$$
So we get
$$\tilde{u}^{(k+1)} = (u^{(k+1)} + \varepsilon D_x\eta_k)(1 - \varepsilon D_x\xi) + o(\varepsilon) = u^{(k+1)} + \varepsilon(D_x\eta_k - u^{(k+1)}D_x\xi) + o(\varepsilon).$$
So we know
$$\eta_{k+1} = D_x\eta_k - u^{(k+1)}D_x\xi.$$
In other words, we have proved:

Proposition (Prolongation formula). Let
$$V(x, u) = \xi(x, u)\frac{\partial}{\partial x} + \eta(x, u)\frac{\partial}{\partial u}.$$
Then we have
$$\mathrm{pr}^{(n)}V = V + \sum_{k=1}^n\eta_k\frac{\partial}{\partial u^{(k)}},$$
where
$$\eta_0 = \eta(x, u),\quad \eta_{k+1} = D_x\eta_k - u^{(k+1)}D_x\xi.$$
Example. For
$$g^\varepsilon: (x, u) \mapsto (e^\varepsilon x, e^{-\varepsilon}u),$$
we have
$$V = x\frac{\partial}{\partial x} + (-u)\frac{\partial}{\partial u}.$$
So we have
$$\xi(x, u) = x,\quad \eta(x, u) = -u.$$
So by the prolongation formula, we have
$$\mathrm{pr}^{(1)}V = V + \eta_1\frac{\partial}{\partial u_x},$$
where
$$\eta_1 = D_x(-u) - u_xD_x(x) = -2u_x,$$
in agreement with what we had earlier!
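The recurrence is easily automated. The following sympy sketch (my own implementation of the formula above, with $u, u_x, u_{xx}, \dots$ treated as independent jet coordinates) recovers the prolongation coefficients for this scaling symmetry:

```python
import sympy as sp

x = sp.symbols('x')
U = sp.symbols('u u1 u2 u3 u4')       # u, u_x, u_xx, ... as jet coordinates

def D_x(f):
    # total derivative: d/dx + u_x d/du + u_xx d/du_x + ...
    return sp.diff(f, x) + sum(U[k+1]*sp.diff(f, U[k]) for k in range(len(U)-1))

def prolongation_coeffs(xi, eta, n):
    etas = [eta]
    for k in range(n):
        etas.append(sp.expand(D_x(etas[-1]) - U[k+1]*D_x(xi)))
    return etas[1:]                   # [eta_1, ..., eta_n]

# The scaling symmetry (x, u) -> (e^eps x, e^{-eps} u): xi = x, eta = -u.
print(prolongation_coeffs(x, -U[0], 3))
# [-2*u1, -3*u2, -4*u3], matching pr^(2)V = ... - 2 u_x d/du_x - 3 u_xx d/du_xx.
```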
In the last example sheet, we will derive an analogous prolongation formula
for PDEs.
5.5 Painlevé test and integrability

We end with a section on the Painlevé test. If someone just gave us a PDE, how can we figure out if it is integrable? It turns out there are some necessary conditions for integrability we can check.

Recall the following definition.

Definition (Singularity). A singularity of a complex-valued function $w = w(z)$ is a place at which it loses analyticity.

These can be poles, branch points, essential singularities etc.

Suppose we had an ODE of the form
$$\frac{\mathrm{d}^2w}{\mathrm{d}z^2} + p(z)\frac{\mathrm{d}w}{\mathrm{d}z} + q(z)w = 0,$$
and we want to know if the solutions have singularities. It turns out that any singularity of a solution $w = w(z)$ must be inherited from the functions $p(z), q(z)$. In particular, the locations of the singularities will not depend on the initial conditions $w(z_0), w'(z_0)$.

This is not the case for non-linear ODE's. For example, the equation
$$\frac{\mathrm{d}w}{\mathrm{d}z} + w^2 = 0$$
gives us
$$w(z) = \frac{1}{z - z_0}.$$
The location of this singularity changes, and it depends on the initial condition. We say it is movable.
This leads to the following definition:

Definition (Painlevé property). We will say that an ODE of the form
$$\frac{\mathrm{d}^nw}{\mathrm{d}z^n} = F\!\left(\frac{\mathrm{d}^{n-1}w}{\mathrm{d}z^{n-1}}, \cdots, w, z\right)$$
has the Painlevé property if the movable singularities of its solutions are at worst poles.

Example. The equation
$$\frac{\mathrm{d}w}{\mathrm{d}z} + w^2 = 0$$
has a solution
$$w(z) = \frac{1}{z - z_0}.$$
Since this movable singularity is a pole, this has the Painlevé property.
Example. Consider the equation
$$\frac{\mathrm{d}w}{\mathrm{d}z} + w^3 = 0.$$
Then the solution is
$$w(z) = \frac{1}{\sqrt{2(z - z_0)}},$$
whose movable singularity is not a pole, so this does not have the Painlevé property.
In the olden days, Painlevé wanted to classify all ODE's of the form
$$\frac{\mathrm{d}^2w}{\mathrm{d}z^2} = F\!\left(\frac{\mathrm{d}w}{\mathrm{d}z}, w, z\right),$$
where $F$ is a rational function, that have the Painlevé property.

He managed to show that there are fifty such equations (up to simple coordinate transformations). The interesting thing is that 44 of these can be solved in terms of well-known functions, e.g. Jacobi elliptic functions, Weierstrass functions, Bessel functions etc.

The remaining six gave way to solutions that were genuinely new functions, called the six Painlevé transcendents. The six differential equations are
$$\text{(PI)}\quad \frac{\mathrm{d}^2w}{\mathrm{d}z^2} = 6w^2 + z$$
$$\text{(PII)}\quad \frac{\mathrm{d}^2w}{\mathrm{d}z^2} = 2w^3 + zw + \alpha$$
$$\text{(PIII)}\quad \frac{\mathrm{d}^2w}{\mathrm{d}z^2} = \frac{1}{w}\left(\frac{\mathrm{d}w}{\mathrm{d}z}\right)^2 - \frac{1}{z}\frac{\mathrm{d}w}{\mathrm{d}z} + \frac{\alpha w^2 + \beta}{z} + \gamma w^3 + \frac{\delta}{w}$$
$$\text{(PIV)}\quad \frac{\mathrm{d}^2w}{\mathrm{d}z^2} = \frac{1}{2w}\left(\frac{\mathrm{d}w}{\mathrm{d}z}\right)^2 + \frac{3w^3}{2} + 4zw^2 + 2(z^2 - \alpha)w + \frac{\beta}{w}$$
$$\text{(PV)}\quad \frac{\mathrm{d}^2w}{\mathrm{d}z^2} = \left(\frac{1}{2w} + \frac{1}{w - 1}\right)\left(\frac{\mathrm{d}w}{\mathrm{d}z}\right)^2 - \frac{1}{z}\frac{\mathrm{d}w}{\mathrm{d}z} + \frac{(w - 1)^2}{z^2}\left(\alpha w + \frac{\beta}{w}\right) + \frac{\gamma w}{z} + \frac{\delta w(w + 1)}{w - 1}$$
$$\text{(PVI)}\quad \frac{\mathrm{d}^2w}{\mathrm{d}z^2} = \frac{1}{2}\left(\frac{1}{w} + \frac{1}{w - 1} + \frac{1}{w - z}\right)\left(\frac{\mathrm{d}w}{\mathrm{d}z}\right)^2 - \left(\frac{1}{z} + \frac{1}{z - 1} + \frac{1}{w - z}\right)\frac{\mathrm{d}w}{\mathrm{d}z} + \frac{w(w - 1)(w - z)}{z^2(z - 1)^2}\left(\alpha + \frac{\beta z}{w^2} + \frac{\gamma(z - 1)}{(w - 1)^2} + \frac{\delta z(z - 1)}{(w - z)^2}\right)$$
Fun fact: Painlev´e served as the prime minister of France twice, for 9 weeks and
7 months respectively.
This is all good, but what has this got to do with integrability of PDE's?

Conjecture (Ablowitz–Ramani–Segur conjecture (1980)). Every ODE reduction (explained below) of an integrable PDE has the Painlevé property.

This is still a conjecture since, as we've previously mentioned, we don't really have a definition of integrability. However, the conjecture has been proved in certain special cases, where we have managed to pin down some specific definitions.

What do we mean by an ODE reduction? Vaguely speaking, if we have a Lie point symmetry of a PDE, then we can use it to introduce coordinates that are invariant, and then form ODE's in these coordinates. We can look at some concrete examples:
Example. In the wave equation, we can try a solution of the form $u(x, t) = f(x - ct)$, and then the wave equation gives us an ODE (or lack thereof) in terms of $f$.
Example. Consider the sine–Gordon equation in light cone coordinates
$$u_{xt} = \sin u.$$
This equation admits a Lie point symmetry
$$g^\varepsilon: (x, t, u) \mapsto (e^\varepsilon x, e^{-\varepsilon}t, u),$$
which is generated by
$$V = x\frac{\partial}{\partial x} - t\frac{\partial}{\partial t}.$$
We should now introduce a variable invariant under this Lie point symmetry. Clearly $z = xt$ is invariant, since
$$V(z) = xt - tx = 0.$$
What we should do, then, is to look for a solution that depends on $z$, say
$$u(x, t) = F(z).$$
Setting
$$w = e^{iF},$$
the sine–Gordon equation becomes
$$\frac{\mathrm{d}^2w}{\mathrm{d}z^2} = \frac{1}{w}\left(\frac{\mathrm{d}w}{\mathrm{d}z}\right)^2 - \frac{1}{z}\frac{\mathrm{d}w}{\mathrm{d}z} + \frac{w^2 - 1}{2z}.$$
This is PIII (with a particular choice of the constants), i.e. this ODE reduction has the Painlevé property.
Example. Consider the KdV equation
$$u_t + u_{xxx} - 6uu_x = 0.$$
This admits a not-so-obvious Lie point symmetry
$$g^\varepsilon(x, t, u) = \left(x + \varepsilon t + \frac{1}{2}\varepsilon^2,\ t + \varepsilon,\ u - \frac{1}{6}\varepsilon\right).$$
This is generated by
$$V = t\frac{\partial}{\partial x} + \frac{\partial}{\partial t} - \frac{1}{6}\frac{\partial}{\partial u}.$$
We then have invariant coordinates
$$z = x - \frac{1}{2}t^2,\quad w = \frac{1}{6}t + u.$$
To get an ODE for $w$, we write the second equation as
$$u(x, t) = -\frac{1}{6}t + w(z).$$
Then we have
$$u_t = -\frac{1}{6} - tw'(z),\quad u_x = w'(z),\quad u_{xx} = w''(z),\quad u_{xxx} = w'''(z).$$
So KdV becomes
$$0 = u_t + u_{xxx} - 6uu_x = -\frac{1}{6} + w'''(z) - 6ww'(z).$$
We would have had some problems if the $t$'s didn't go away, because we wouldn't have an ODE in $w$. But since we constructed these coordinates such that $w$ and $z$ are invariant under the Lie point symmetry but $t$ is not, we are guaranteed that there will be no $t$ left in the equation.

Integrating this equation once, we get an equation
$$w''(z) - 3w(z)^2 - \frac{1}{6}z + z_0 = 0,$$
which (up to a simple rescaling of $w$ and $z$) is PI. So this ODE reduction of KdV has the Painlevé property.
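This reduction can be checked symbolically. The following sympy sketch (my own verification) substitutes $u = -t/6 + w(x - t^2/2)$ into KdV and confirms that what remains is exactly $w''' - 6ww' - \frac{1}{6}$, with no leftover $t$:

```python
import sympy as sp

x, t, s = sp.symbols('x t s')
w = sp.Function('w')
z = x - t**2/2
u = -t/sp.Integer(6) + w(z)

kdv = sp.diff(u, t) + sp.diff(u, x, 3) - 6*u*sp.diff(u, x)

# Evaluate on the slice x = s + t^2/2 (so that z = s) and compare with the
# claimed reduced ODE  w''' - 6 w w' - 1/6 = 0.
reduced = kdv.subs(x, s + t**2/2).doit()
expected = sp.diff(w(s), s, 3) - 6*w(s)*sp.diff(w(s), s) - sp.Rational(1, 6)
print(sp.simplify(reduced - expected))   # 0, and in particular no t remains
```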
In summary, the Painlev´e test of integrability is as follows:
(i) Find all Lie point symmetries of the PDE.
(ii) Find all corresponding ODE reductions.
(iii) Test each ODE for Painlev´e property.
We can then see if our PDE is not integrable. Unfortunately, there is no real
test for the converse.