Part II Integrable Systems
Based on lectures by A. Ashton
Notes taken by Dexter Chua
Michaelmas 2016
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
Part IB Methods, and Complex Methods or Complex Analysis are essential; Part II
Classical Dynamics is desirable.
Integrability of ordinary differential equations: Hamiltonian systems and the Arnol’d–
Liouville Theorem (sketch of proof). Examples. [3]
Integrability of partial differential equations: The rich mathematical structure and
the universality of the integrable nonlinear partial differential equations (Korteweg–de
Vries, sine–Gordon). Bäcklund transformations and soliton solutions. [2]
The inverse scattering method: Lax pairs. The inverse scattering method for the
KdV equation, and other integrable PDEs. Multi-soliton solutions. Zero curvature
representation. [6]
Hamiltonian formulation of soliton equations. [2]
Painlevé equations and Lie symmetries: Symmetries of differential equations, the ODE
reductions of certain integrable nonlinear PDEs, Painlevé equations. [3]
Contents
0 Introduction
1 Integrability of ODE’s
1.1 Vector fields and flow maps
1.2 Hamiltonian dynamics
1.3 Canonical transformations
1.4 The Arnold-Liouville theorem
2 Partial Differential Equations
2.1 KdV equation
2.2 Sine–Gordon equation
2.3 Bäcklund transformations
3 Inverse scattering transform
3.1 Forward scattering problem
3.1.1 Continuous spectrum
3.1.2 Discrete spectrum and bound states
3.1.3 Summary of forward scattering problem
3.2 Inverse scattering problem
3.3 Lax pairs
3.4 Evolution of scattering data
3.4.1 Continuous spectrum (λ = k² > 0)
3.4.2 Discrete spectrum (λ = −κ² < 0)
3.4.3 Summary of inverse scattering transform
3.5 Reflectionless potentials
3.6 Infinitely many first integrals
4 Structure of integrable PDEs
4.1 Infinite dimensional Hamiltonian system
4.2 Bihamiltonian systems
4.3 Zero curvature representation
4.4 From Lax pairs to zero curvature
5 Symmetry methods in PDEs
5.1 Lie groups and Lie algebras
5.2 Vector fields and one-parameter groups of transformations
5.3 Symmetries of differential equations
5.4 Jets and prolongations
5.5 Painlevé test and integrability
0 Introduction
What is an integrable system? Unfortunately, an integrable system is something mathematicians have not yet managed to define properly. Intuitively, an integrable system is a differential equation we can "integrate up" directly. While in theory integrable systems should be very rare, it turns out that many systems arising in nature are integrable. By exploiting the fact that they are integrable, we can solve them much more easily.
1 Integrability of ODE’s
1.1 Vector fields and flow maps
In the first section, we are going to look at the integrability of ODE's. Here we are going to consider a general m-dimensional first-order non-linear ODE. As always, restricting to only first-order ODE's is not an actual restriction, since any higher-order ODE can be written as a system of first-order ODE's. At the end, we will be concerned with a special kind of ODE given by a Hamiltonian system. However, in this section, we first give a quick overview of the general theory of ODE's.
An m-dimensional ODE is specified by a vector field V : R^m → R^m and an initial condition x_0 ∈ R^m. The objective is to find some x(t) ∈ R^m, which is a function of t ∈ (a, b) for some interval (a, b) containing 0, satisfying
  ẋ = V(x),   x(0) = x_0.
In this course, we will assume the vector field V is sufficiently "nice", so that the following result holds:
Fact. For a "nice" vector field V and any initial condition x_0, there is always a unique solution to ẋ = V(x), x(0) = x_0. Moreover, this solution depends smoothly (i.e. infinitely differentiably) on t and x_0.
It is convenient to write the solution as
  x(t) = g^t x_0,
where g^t : R^m → R^m is called the flow map. Since V is nice, we know this is a smooth map. This flow map has some nice properties:
Proposition.
(i) g^0 = id
(ii) g^{t+s} = g^t ∘ g^s
(iii) (g^t)^{−1} = g^{−t}
If one knows group theory, then this says that g is a group homomorphism from R to the group of diffeomorphisms of R^m, i.e. the group of smooth invertible maps R^m → R^m.
Proof. The equality g^0 = id is by definition of g, and the last equality follows from the first two since t + (−t) = 0. To see the second, we need to show that
  g^{t+s} x_0 = g^t (g^s x_0)
for any x_0. To do so, we see that both of them, as a function of t, are solutions to
  ẋ = V(x),   x(0) = g^s x_0.
So the result follows since solutions are unique.
We say that V is the infinitesimal generator of the flow g^t. This is because we can Taylor expand
  x(ε) = g^ε x_0 = x(0) + ε ẋ(0) + o(ε) = x_0 + εV(x_0) + o(ε).
Given vector fields V_1, V_2, one natural question to ask is whether their flows commute, i.e. if they generate g_1^t and g_2^s, then must we have
  g_1^t g_2^s x_0 = g_2^s g_1^t x_0
for all x_0? In general, this need not be true, so we might be interested to find out if this happens to be true for particular V_1, V_2. However, often, it is difficult to check this directly, because differential equations are generally hard to solve, and we will probably have a huge trouble trying to find explicit expressions for g_1 and g_2.
Thus, we would want to be able to consider this problem at an infinitesimal level, i.e. just by looking at V_1, V_2 themselves. It turns out the answer is given by the commutator:
Definition (Commutator). For two vector fields V_1, V_2 : R^m → R^m, we define a third vector field called the commutator by
  [V_1, V_2] = (V_1 · ∂/∂x) V_2 − (V_2 · ∂/∂x) V_1,
where we write
  ∂/∂x = (∂/∂x_1, ···, ∂/∂x_m)^T.
More explicitly, the ith component is given by
  [V_1, V_2]_i = Σ_{j=1}^m ( (V_1)_j ∂(V_2)_i/∂x_j − (V_2)_j ∂(V_1)_i/∂x_j ).
The result we have is
Proposition. Let V_1, V_2 be vector fields with flows g_1^t and g_2^s. Then we have
  [V_1, V_2] = 0  ⟺  g_1^t g_2^s = g_2^s g_1^t.
Proof. See example sheet 1.
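To make the criterion concrete, here is a small sketch (in Python with sympy, which is not part of the notes) that computes the commutator of two vector fields symbolically from the component formula above; the particular fields V_1, V_2 are made-up examples.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
X = sp.Matrix([x1, x2])

def commutator(V1, V2, X):
    """[V1, V2]_i = sum_j (V1)_j d(V2)_i/dx_j - (V2)_j d(V1)_i/dx_j."""
    J1 = V1.jacobian(X)   # J1[i, j] = d(V1)_i / dx_j
    J2 = V2.jacobian(X)
    return sp.simplify(J2 * V1 - J1 * V2)

# Two example (made-up) vector fields on R^2.
V1 = sp.Matrix([x1, x2])      # generates the scaling flow g_1^t x = e^t x
V2 = sp.Matrix([-x2, x1])     # generates the rotation flow g_2^s = rotation by angle s

print(commutator(V1, V2, X))  # Matrix([[0], [0]]): scalings and rotations commute
```

Since the commutator vanishes, the proposition predicts that the scaling and rotation flows commute, which is easy to believe geometrically.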
1.2 Hamiltonian dynamics
From now on, we are going to restrict to a very special kind of ODE, known as a Hamiltonian system. To write down a general ODE, the background setting is just the space R^n. We then pick a vector field, and then we get an ODE. To write down a Hamiltonian system, we need more things in the background, but conversely we need to supply less information to get the system. These Hamiltonian systems are very useful in classical dynamics, and our results here have applications in classical dynamics, but we will not go into the physical applications here.
The background setting of a Hamiltonian system is a phase space M = R^{2n}. Points on M are described by coordinates
  (q, p) = (q_1, ···, q_n, p_1, ···, p_n).
We tend to think of the q_i as "generalized position" coordinates of particles, and the p_i as the "generalized momentum" coordinates. We will often write
  x = (q, p)^T.
It is very important to note that here we have "paired up" each q_i with the corresponding p_i. In normal R^n, all the coordinates are equal, but this is no longer the case here. To encode this information, we define the 2n × 2n anti-symmetric matrix
  J = \begin{pmatrix} 0 & I_n \\ -I_n & 0 \end{pmatrix}.
We call this the symplectic form, and this is the extra structure we have for a phase space. We will later see that all the things we care about can be written in terms of J, but for practical purposes, we will often express them in terms of p and q instead.
The first example is the Poisson bracket:
Definition (Poisson bracket). For any two functions f, g : M → R, we define the Poisson bracket by
  {f, g} = (∂f/∂x) · J (∂g/∂x) = ∂f/∂q · ∂g/∂p − ∂f/∂p · ∂g/∂q.
This has some obvious and not-so-obvious properties:
Proposition.
(i) This is linear in each argument.
(ii) This is antisymmetric, i.e. {f, g} = −{g, f}.
(iii) This satisfies the Leibniz property:
  {f, gh} = {f, g}h + {f, h}g.
(iv) This satisfies the Jacobi identity:
  {f, {g, h}} + {g, {h, f}} + {h, {f, g}} = 0.
(v) We have
  {q_i, q_j} = {p_i, p_j} = 0,   {q_i, p_j} = δ_{ij}.
Proof.
Just write out the definitions. In particular, you will be made to write
out the 24 terms of the Jacobi identity in the first example sheet.
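As a quick sanity check on these properties, here is a small sketch (Python with sympy, not part of the notes) that computes {f, g} directly from the definition in the 1-degree-of-freedom case; the test functions are arbitrary made-up examples.

```python
import sympy as sp

q, p = sp.symbols('q p')

def pb(f, g):
    """Poisson bracket {f, g} = df/dq dg/dp - df/dp dg/dq (n = 1 case)."""
    return sp.simplify(sp.diff(f, q) * sp.diff(g, p) - sp.diff(f, p) * sp.diff(g, q))

print(pb(q, p))             # 1, i.e. {q, p} = 1
print(pb(q, q), pb(p, p))   # 0 0

# Antisymmetry and the Leibniz property on arbitrary test functions.
f, g, h = q**2 * p, sp.sin(q) + p, q * p**3
assert sp.simplify(pb(f, g) + pb(g, f)) == 0
assert sp.simplify(pb(f, g * h) - (pb(f, g) * h + pb(f, h) * g)) == 0
```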
We will be interested in problems on M of the following form:
Definition (Hamilton's equation). Hamilton's equation is an equation of the form
  q̇ = ∂H/∂p,   ṗ = −∂H/∂q   (∗)
for some function H : M → R called the Hamiltonian.
Just as we think of q and p as generalized position and momentum, we tend to think of H as generalized energy.
Note that given the phase space M, all we need to specify a Hamiltonian system is just a Hamiltonian function H : M → R, which is much less information than that needed to specify a vector field.
In terms of J, we can write Hamilton's equation as
  ẋ = J ∂H/∂x.
We can imagine Hamilton's equation as specifying the trajectory of a particle. In this case, we might want to ask how, say, the speed of the particle changes as it evolves. In general, suppose we have a smooth function f : M → R. We want to find the value of df/dt. We simply have to apply the chain rule to obtain
  df/dt = d/dt f(x(t)) = ∂f/∂x · ẋ = ∂f/∂x · J ∂H/∂x = {f, H}.
We record this result:
Proposition. Let f : M → R be a smooth function. If x(t) evolves according to Hamilton's equation, then
  df/dt = {f, H}.
In particular, a function f is constant if and only if {f, H} = 0. This is very convenient. Without a result like this, if we want to see if f is a conserved quantity of the particle (i.e. df/dt = 0), we might have to integrate the equations of motion, and then try to find explicitly what is conserved, or perhaps mess around with the equations of motion to somehow find that df/dt vanishes. However, we now have a very systematic way of figuring out if f is a conserved quantity: we just compute {f, H}.
In particular, we automatically find that the Hamiltonian is conserved:
  dH/dt = {H, H} = 0.
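As an illustration of this conservation law, here is a small numerical sketch (Python with numpy/scipy, not part of the notes): we integrate ẋ = J ∂H/∂x for a made-up Hamiltonian and watch H stay constant along the trajectory, up to integration error.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Phase space R^2 with coordinates x = (q, p); a made-up example Hamiltonian.
def H(q, p):
    return 0.5 * p**2 + 0.5 * q**2 + 0.25 * q**4

def hamilton_rhs(t, x):
    q, p = x
    dHdq = q + q**3
    dHdp = p
    return [dHdp, -dHdq]        # qdot = dH/dp, pdot = -dH/dq

sol = solve_ivp(hamilton_rhs, (0.0, 50.0), [1.0, 0.0], rtol=1e-10, atol=1e-12)
energies = H(sol.y[0], sol.y[1])
print(energies.max() - energies.min())   # tiny: H is conserved (up to integration error)
```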
Example. Consider a particle (of unit mass) with position q = (q_1, q_2, q_3) (in Cartesian coordinates) moving under the influence of a potential U(q). By Newton's second law, we have
  q̈ = −∂U/∂q.
This is actually a Hamiltonian system. We define the momentum variables by
  p_i = q̇_i,
then we have
  ẋ = (q̇, ṗ)^T = (p, −∂U/∂q)^T = J ∂H/∂x,
with
  H = ½|p|² + U(q).
This is just the usual energy! Indeed, we can compute
  ∂H/∂p = p,   ∂H/∂q = ∂U/∂q.
Definition (Hamiltonian vector field). Given a Hamiltonian function H, the Hamiltonian vector field is given by
  V_H = J ∂H/∂x.
We then see that by definition, the Hamiltonian vector field generates the Hamiltonian flow. More generally, for any f : M → R, we call
  V_f = J ∂f/∂x
the Hamiltonian vector field with respect to f.
We now have two bracket-like things we can form. Given two functions f, g, we can take the Poisson bracket to get {f, g}, and consider its Hamiltonian vector field V_{{f,g}}. On the other hand, we can first form V_f and V_g, and then take the commutator of the vector fields. It turns out these are not equal, but differ by a sign.
Proposition. We have
  [V_f, V_g] = −V_{{f,g}}.
Proof. See first example sheet.
Definition (First integral). Given a phase space M with a Hamiltonian H, we call f : M → R a first integral of the Hamiltonian system if
  {f, H} = 0.
The reason for the term "first integral" is historical: when we solve a differential equation, we integrate the equation. Every time we integrate it, we obtain a new constant. And the first constant we obtain when we integrate is known as the first integral. However, for our purposes, we can just as well think of it as a constant of motion.
Example. Consider the two-body problem: the Sun is fixed at the origin, and a planet has Cartesian coordinates q = (q_1, q_2, q_3). The equation of motion will be
  q̈ = −q/|q|³.
This is equivalent to the Hamiltonian system p = q̇, with
  H = ½|p|² − 1/|q|.
We have an angular momentum given by
  L = q × p.
Working with coordinates, we have
  L_i = ε_{ijk} q_j p_k.
We then have (with implicit summation)
  {L_i, H} = ∂L_i/∂q_ℓ ∂H/∂p_ℓ − ∂L_i/∂p_ℓ ∂H/∂q_ℓ
           = ε_{ijk} ( δ_{ℓj} p_k p_ℓ − δ_{kℓ} q_j q_ℓ/|q|³ )
           = ε_{ijk} ( p_j p_k − q_j q_k/|q|³ )
           = 0,
where we know the thing vanishes because we contracted a symmetric tensor with an antisymmetric one. So this is a first integral.
Less interestingly, we know H is also a first integral. In general, some Hamiltonians have many many first integrals.
Our objective for the remainder of the chapter is to show that if our Hamiltonian system has enough first integrals, then we can find a change of coordinates so that the equations of motion are "trivial". However, we need to impose some constraints on the integrals for this to be true. We will need to know about the following words:
Definition (Involution). We say that two first integrals F, G are in involution if {F, G} = 0 (so F and G "Poisson commute").
Definition (Independent first integrals). A collection of functions f_i : M → R are independent if at each x ∈ M, the vectors ∂f_i/∂x for i = 1, ···, n are independent.
In general we will say a system is "integrable" if we can find a change of coordinates so that the equations of motion become "trivial" and we can just integrate it up. This is a bit vague, so we will define integrability in terms of the existence of first integrals, and then we will later see that if these conditions are satisfied, then we can indeed integrate it up:
Definition (Integrable system). A 2n-dimensional Hamiltonian system (M, H) is integrable if there exist n first integrals {f_i}_{i=1}^n that are independent and in involution (i.e. {f_i, f_j} = 0 for all i, j).
The word independent is very important, or else people will cheat, e.g. take H, 2H, e^H, H², ···.
Example. Two-dimensional Hamiltonian systems are always integrable.
1.3 Canonical transformations
We now come to the main result of the chapter. We will show that we can indeed integrate up integrable systems. We are going to show that there is a clever choice of coordinates such that Hamilton's equations become "trivial". However, recall that the coordinates in a Hamiltonian system are not arbitrary. We have somehow "paired up" q_i and p_i. So we want to only consider coordinate changes that somehow respect this pairing.
There are many ways we can define what it means to "respect" the pairing. We will pick a simple definition: we require that it preserves the form of Hamilton's equation.
Suppose we had a general coordinate change (q, p) ↦ (Q(q, p), P(q, p)).
Definition (Canonical transformation). A coordinate change (q, p) ↦ (Q, P) is called canonical if it leaves Hamilton's equations invariant, i.e. the equations in the original coordinates
  q̇ = ∂H/∂p,   ṗ = −∂H/∂q
are equivalent to
  Q̇ = ∂H̃/∂P,   Ṗ = −∂H̃/∂Q,
where H̃(Q, P) = H(q, p).
If we write x = (q, p) and y = (Q, P), then this is equivalent to asking for
  ẋ = J ∂H/∂x  ⟺  ẏ = J ∂H̃/∂y.
Example. If we just swap the q and p around, then the equations change by a sign. So this is not a canonical transformation.
Example. The simplest possible case of a canonical transformation is a linear transformation. Consider a linear change of coordinates given by
  x ↦ y(x) = Ax.
We claim that this is canonical iff AJA^T = J, i.e. that A is symplectic.
Indeed, by linearity, we have
  ẏ = Aẋ = AJ ∂H/∂x.
Setting H̃(y) = H(x), we have
  ∂H/∂x_i = (∂y_j/∂x_i) ∂H̃(y)/∂y_j = A_{ji} ∂H̃(y)/∂y_j = [ A^T ∂H̃/∂y ]_i.
Putting this back in, we have
  ẏ = AJA^T ∂H̃/∂y.
So y ↦ y(x) is canonical iff J = AJA^T.
What about more general cases? Recall from IB Analysis II that a differentiable map is "locally linear". Now Hamilton's equations are purely local equations, so we might expect the following:
Proposition. A map x ↦ y(x) is canonical iff Dy is symplectic, i.e.
  Dy J (Dy)^T = J.
Indeed, this follows from a simple application of the chain rule.
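As a quick numerical sketch of this criterion (Python/numpy, not part of the notes), we can evaluate Dy J (Dy)^T at a point for a candidate map; the quadratic map below is a made-up example that happens to be canonical, while swapping q and p (the first example above) fails the test.

```python
import numpy as np

J = np.array([[0.0, 1.0], [-1.0, 0.0]])   # symplectic form for n = 1

def is_canonical(jacobian, tol=1e-12):
    """Check the symplectic condition  Dy J Dy^T = J."""
    return np.allclose(jacobian @ J @ jacobian.T, J, atol=tol)

q, p = 0.7, -1.3                           # test the condition at an arbitrary point

# Made-up nonlinear map (Q, P) = (q, p - 3q^2); its Jacobian at (q, p) is
Dy_good = np.array([[1.0, 0.0], [-6.0 * q, 1.0]])
print(is_canonical(Dy_good))               # True: this map is canonical

# Swapping q and p is not canonical:
Dy_swap = np.array([[0.0, 1.0], [1.0, 0.0]])
print(is_canonical(Dy_swap))               # False: here Dy J Dy^T = -J
```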
Generating functions
We now discuss a useful way of producing canonical transformations, known as generating functions. In general, we can do generating functions in four different ways, but they are all very similar, so we will just do one that will be useful later on.
Suppose we have a function S : R^{2n} → R. We suggestively write its arguments as S(q, P). We now set
  p = ∂S/∂q,   Q = ∂S/∂P.
By this equation, we mean we write down the first equation, which allows us to solve for P in terms of q, p. Then the second equation tells us the value of Q in terms of q, P, hence in terms of p, q.
Usually, the way we use this is that we already have a candidate for what P should be. We then try to find a function S(q, P) such that the first equation holds. Then the second equation will tell us what the right choice of Q is.
Checking that this indeed gives rise to a canonical transformation is just a very careful application of the chain rule, which we shall not go into. Instead, we look at a few examples to see it in action.
Example. Consider the generating function
  S(q, P) = q · P.
Then we have
  p = ∂S/∂q = P,   Q = ∂S/∂P = q.
So this generates the identity transformation (Q, P) = (q, p).
Example. In a 2-dimensional phase space, we consider the generating function
  S(q, P) = qP + q².
Then we have
  p = ∂S/∂q = P + 2q,   Q = ∂S/∂P = q.
So we have the transformation
  (Q, P) = (q, p − 2q).
In matrix form, this is
  \begin{pmatrix} Q \\ P \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} \begin{pmatrix} q \\ p \end{pmatrix}.
To see that this is canonical, we compute
  \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} J \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix}^T = \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = J.
So this is indeed a canonical transformation.
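The two-step recipe above (solve p = ∂S/∂q for P, then read off Q = ∂S/∂P) can be automated symbolically. Here is a small sketch (Python/sympy, not from the notes) applied to this second example; the symplectic check at the end repeats the matrix computation.

```python
import sympy as sp

q, p, P = sp.symbols('q p P')
S = q * P + q**2                       # generating function S(q, P) from the example

# Step 1: p = dS/dq defines P implicitly; solve for P in terms of (q, p).
P_of_qp = sp.solve(sp.Eq(p, sp.diff(S, q)), P)[0]      # P = p - 2q

# Step 2: Q = dS/dP, then substitute P = P(q, p).
Q_of_qp = sp.diff(S, P).subs(P, P_of_qp)               # Q = q

# Symplectic check on the Jacobian of (q, p) -> (Q, P).
y = sp.Matrix([Q_of_qp, P_of_qp])
Dy = y.jacobian(sp.Matrix([q, p]))
J = sp.Matrix([[0, 1], [-1, 0]])
print(Q_of_qp, P_of_qp)                # q, p - 2*q
print(sp.simplify(Dy * J * Dy.T - J))  # zero matrix: the transformation is canonical
```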
1.4 The Arnold-Liouville theorem
We now get to the Arnold-Liouville theorem. This theorem says that if a Hamiltonian system is integrable, then we can find a canonical transformation (q, p) ↦ (Q, P) such that H̃ depends only on P. If this happened, then Hamilton's equations reduce to
  Q̇ = ∂H̃/∂P,   Ṗ = −∂H̃/∂Q = 0,
which is pretty easy to solve. We find that P(t) = P_0 is a constant, and since the right hand side of the first equation depends only on P, we find that Q̇ is also constant! So Q = Q_0 + Ωt, where
  Ω = ∂H̃/∂P (P_0).
So the solution just falls out very easily.
Before we prove the Arnold-Liouville theorem in full generality, we first see what the canonical transformation looks like in a very particular case. Here we will just have to write down the canonical transformation and see that it works, but we will later find that the Arnold-Liouville theorem gives us a general method to find the transformation.
Example. Consider the harmonic oscillator with Hamiltonian
  H(q, p) = ½p² + ½ω²q².
Since this is a 2-dimensional system, we only need a single first integral. Since H is a first integral for trivial reasons, this is an integrable Hamiltonian system.
We can actually draw the lines on which H is constant: they are just ellipses in the (q, p) plane.
We note that the ellipses are each homeomorphic to S¹. Now we introduce the coordinate transformation (q, p) ↦ (φ, I), defined by
  q = √(2I/ω) sin φ,   p = √(2Iω) cos φ.
For the purpose of this example, we can suppose we obtained this formula through divine inspiration. However, in the Arnold-Liouville theorem, we will provide a general way of coming up with these formulas.
We can manually show that this transformation is canonical, but it is merely a computation and we will not waste time doing that. In these new coordinates, the Hamiltonian looks like
  H̃(φ, I) = H(q(φ, I), p(φ, I)) = ωI.
This is really nice. There is no φ! Now Hamilton's equations become
  φ̇ = ∂H̃/∂I = ω,   İ = −∂H̃/∂φ = 0.
We can integrate up to obtain
  φ(t) = φ_0 + ωt,   I(t) = I_0.
For some unexplainable reason, we decide it is fun to consider the integral along paths of constant H:
  (1/2π) ∮ p dq = (1/2π) ∫_0^{2π} p(φ, I) ( ∂q/∂φ dφ + ∂q/∂I dI )
               = (1/2π) ∫_0^{2π} p(φ, I) ∂q/∂φ dφ
               = (1/2π) ∫_0^{2π} √(2I/ω) √(2Iω) cos²φ dφ
               = I.
This is interesting. We could always have performed the integral (1/2π) ∮ p dq along paths of constant H without knowing anything about I and φ, and this would have magically given us the new coordinate I.
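We can confirm this numerically. The sketch below (Python/numpy, not from the notes) parametrizes a level set H = c of the harmonic oscillator and evaluates (1/2π)∮ p dq by quadrature; it returns c/ω, which is exactly the action I found in the worked example later in this chapter. The values of ω and c are made up.

```python
import numpy as np

omega, c = 2.0, 3.0                    # made-up frequency and energy level H = c

# Parametrize the ellipse p^2/2 + omega^2 q^2 / 2 = c by an angle theta.
theta = np.linspace(0.0, 2.0 * np.pi, 20001)
q = np.sqrt(2.0 * c) / omega * np.sin(theta)
p = np.sqrt(2.0 * c) * np.cos(theta)

# Action I = (1/2 pi) \oint p dq, evaluated with the trapezoidal rule.
I = np.trapz(p * np.gradient(q, theta), theta) / (2.0 * np.pi)
print(I, c / omega)                    # both ~1.5: the loop integral recovers I = c/omega
```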
There are two things to take away from this.
(i) The motion takes place in S¹.
(ii) We got I by performing (1/2π) ∮ p dq.
These two ideas are essentially what we are going to prove for a general Hamiltonian system.
Theorem (Arnold-Liouville theorem). We let (M, H) be an integrable 2n-dimensional Hamiltonian system with independent, involutive first integrals f_1, ···, f_n, where f_1 = H. For any fixed c ∈ R^n, we set
  M_c = {(q, p) ∈ M : f_i(q, p) = c_i, i = 1, ···, n}.
Then
(i) M_c is a smooth n-dimensional surface in M. If M_c is compact and connected, then it is diffeomorphic to the torus
  T^n = S¹ × ··· × S¹.
(ii) If M_c is compact and connected, then locally, there exist canonical coordinate transformations (q, p) ↦ (φ, I) called the action-angle coordinates such that the angles {φ_k}_{k=1}^n are coordinates on M_c; the actions {I_k}_{k=1}^n are first integrals, and H(q, p) does not depend on φ. In particular, Hamilton's equations become
  İ = 0,   φ̇ = ∂H̃/∂I = constant.
Some parts of the proof will refer to certain results from rather pure courses,
which the applied people may be willing to just take on faith.
Proof sketch. The first part is pure differential geometry. To show that M_c is smooth and n-dimensional, we apply the preimage theorem you may or may not have learnt from IID Differential Geometry (which is in turn an easy consequence of the inverse function theorem from IB Analysis II). The key that makes this work is that the constraints are independent, which is the condition that allows the preimage theorem to apply.
We next show that M_c is diffeomorphic to the torus if it is compact and connected. Consider the Hamiltonian vector fields defined by
  V_{f_i} = J ∂f_i/∂x.
We claim that these are tangent to the surface M_c. By differential geometry, it suffices to show that the derivative of the {f_j} in the direction of V_{f_i} vanishes. We can compute
  V_{f_i} · ∂f_j/∂x = (∂f_j/∂x) · J ∂f_i/∂x = {f_j, f_i} = 0.
Since this vanishes, we know that V_{f_i} is tangent to the surface. Again by differential geometry, the flow maps {g_i} must map M_c to itself. Also, we know that the flow maps commute. Indeed, this follows from the fact that
  [V_{f_i}, V_{f_j}] = −V_{{f_i, f_j}} = −V_0 = 0.
So we have a whole bunch of commuting flow maps from M_c to itself. We set
  g^t = g_1^{t_1} g_2^{t_2} ··· g_n^{t_n},
where t ∈ R^n. Then because of commutativity, we have
  g^{t_1 + t_2} = g^{t_1} g^{t_2}.
So this gives a group action of R^n on the surface M_c. We fix x ∈ M_c. We define
  stab(x) = {t ∈ R^n : g^t x = x}.
We introduce the map
  φ : R^n / stab(x) → M_c
given by φ(t) = g^t x. By the orbit-stabilizer theorem, this gives a bijection between R^n / stab(x) and the orbit of x. It can be shown that the orbit of x is exactly the connected component of x. Now if M_c is connected, then this must be the whole of M_c! By general differential geometry theory, we get that this map is indeed a diffeomorphism.
We know that stab(x) is a subgroup of R^n, and if the g_i are non-trivial, it can be seen (at least intuitively) that this is discrete. Thus, it must be isomorphic to something of the form Z^k with 1 ≤ k ≤ n.
So we have
  M_c ≅ R^n / stab(x) ≅ R^n / Z^k ≅ (R^k/Z^k) × R^{n−k} ≅ T^k × R^{n−k}.
Now if M_c is compact, we must have n − k = 0, i.e. n = k, so that we have no factors of R. So M_c ≅ T^n.
With all the differential geometry out of the way, we can now construct the action-angle coordinates.
For simplicity of presentation, we only do it in the case when n = 2. The proof for higher dimensions is entirely analogous, except that we need to use a higher-dimensional analogue of Green's theorem, which we do not currently have.
We note that it is currently trivial to re-parameterize the phase space with coordinates (Q, P) such that P is constant within the Hamiltonian flow, and each coordinate of Q takes values in S¹. Indeed, we just put P = c and use the diffeomorphism T^n ≅ M_c to parameterize each M_c as a product of n copies of S¹. However, this is not good enough, because such an arbitrary transformation will almost certainly not be canonical. So we shall try to find a more natural and in fact canonical way of parametrizing our phase space.
We first work on the generalized momentum part. We want to replace c with something nicer. We will do something analogous to the simple harmonic oscillator we've got.
So we fix a c, and try to come up with some numbers I that label this M_c. Recall that our surface M_c looks like a torus. Up to continuous deformation of loops, there are two non-trivial "single" loops on the torus, one around each cycle. More generally, for an n-torus, we have n such distinct loops Γ_1, ···, Γ_n. More concretely, after identifying M_c with the product of n copies of S¹, these are the loops given by
  {0} × ··· × {0} × S¹ × {0} × ··· × {0} ⊆ S¹ × ··· × S¹.
We now attempt to define:
  I_j = (1/2π) ∮_{Γ_j} p · dq.
This is just like the formula we had for the simple harmonic oscillator.
We want to make sure this is well-defined: recall that Γ_j actually represents a class of loops identified under continuous deformation. What if we picked a different loop, say Γ'_2 instead of Γ_2?
On M_c, we have the equation
  f_i(q, p) = c_i.
We will have to assume that we can invert this equation for p locally, i.e. we can write
  p = p(q, c).
The condition for being able to do so is just
  det(∂f_i/∂p_j) ≠ 0,
which is not hard.
Then by definition, the following holds identically:
  f_i(q, p(q, c)) = c_i.
We can then differentiate this with respect to q_k to obtain
  ∂f_i/∂q_k + (∂f_i/∂p_ℓ)(∂p_ℓ/∂q_k) = 0
on M_c. Now recall that the {f_i} are in involution. So on M_c, we have
  0 = {f_i, f_j}
    = (∂f_i/∂q_k)(∂f_j/∂p_k) − (∂f_i/∂p_k)(∂f_j/∂q_k)
    = −(∂f_i/∂p_ℓ)(∂p_ℓ/∂q_k)(∂f_j/∂p_k) + (∂f_i/∂p_k)(∂f_j/∂p_ℓ)(∂p_ℓ/∂q_k)
    = −(∂f_i/∂p_k)(∂p_k/∂q_ℓ)(∂f_j/∂p_ℓ) + (∂f_i/∂p_k)(∂f_j/∂p_ℓ)(∂p_ℓ/∂q_k)
    = (∂f_i/∂p_k)( ∂p_ℓ/∂q_k − ∂p_k/∂q_ℓ )(∂f_j/∂p_ℓ).
Recall that the determinants of the matrices (∂f_i/∂p_k) and (∂f_j/∂p_ℓ) are non-zero, i.e. the matrices are invertible. So for this to hold, the middle matrix must vanish! So we have
  ∂p_ℓ/∂q_k − ∂p_k/∂q_ℓ = 0.
In our particular case of n = 2, since ℓ, k can only be 1, 2, the only non-trivial thing this says is
  ∂p_1/∂q_2 − ∂p_2/∂q_1 = 0.
Now suppose we have two "simple" loops Γ_2 and Γ'_2 in the same class. Then they bound an area A on the torus, and we have
  ( ∮_{Γ_2} − ∮_{Γ'_2} ) p · dq = ∮_{∂A} p · dq = ∬_A ( ∂p_2/∂q_1 − ∂p_1/∂q_2 ) dq_1 dq_2 = 0
by Green's theorem.
So I_j is well-defined, and
  I = I(c)
is just a function of c. This will be our new "momentum" coordinates. To figure out what the angles φ should be, we use generating functions. For now, we assume that we can invert I(c), so that we can write
  c = c(I).
We arbitrarily pick a point x_0, and define the generating function
  S(q, I) = ∫_{x_0}^x p(q', c(I)) · dq',
where x = (q, p) = (q, p(q, c(I))). However, this is not a priori well-defined, because we haven't said how we are going to integrate from x_0 to x. We are going to pick paths arbitrarily, but we want to make sure it is well-defined. Suppose we change from a path γ_1 to γ_2 by a little bit, and they enclose a surface B.
Then we have
  S(q, I) ↦ S(q, I) + ∮_{∂B} p · dq.
Again, we are integrating p · dq around a boundary, so there is no change.
However, we don't live in flat space. We live in a torus, and we can have a crazy loop that wraps around one of the cycles of the torus.
Then what we have effectively got is that we added a loop (say) Γ_2 to our path, and this contributes a factor of 2πI_2. In general, these transformations give changes of the form
  S(q, I) ↦ S(q, I) + 2πI_j.
This is the only thing that can happen. So differentiating with respect to I, we know that
  φ = ∂S/∂I
is well-defined modulo 2π. These are the angle coordinates. Note that just like angles, we can pick φ consistently locally without this ambiguity, as long as we stay near some fixed point, but when we want to talk about the whole surface, this ambiguity necessarily arises. Now also note that
  ∂S/∂q = p.
Indeed, we can write
  S = ∫_{x_0}^x F · dx',
where
  F = (p, 0).
So by the fundamental theorem of calculus, we have
  ∂S/∂x = F.
So we get that ∂S/∂q = p.
In summary, we have constructed on M_c the following: I = I(c), S(q, I), and
  φ = ∂S/∂I,   p = ∂S/∂q.
So S is a generator for the canonical transformation, and (q, p) ↦ (φ, I) is a canonical transformation.
Note that at any point x, we know c = f(x). So I(c) = I(f) depends on the first integrals only. So we have İ = 0.
So Hamilton's equations become
  φ̇ = ∂H̃/∂I,   İ = 0 = −∂H̃/∂φ.
So the new Hamiltonian depends only on I. So we can integrate up and get
  φ(t) = φ_0 + Ωt,   I(t) = I_0,
where
  Ω = ∂H̃/∂I (I_0).
To summarize, to integrate up an integrable Hamiltonian system, we identify the different cycles Γ_1, ···, Γ_n on M_c. We then construct
  I_j = (1/2π) ∮_{Γ_j} p · dq,
where p = p(q, c). We then invert this to say
  c = c(I).
We then compute
  φ = ∂S/∂I,
where
  S = ∫_{x_0}^x p(q', c(I)) · dq'.
Now we do this again with the harmonic oscillator.
Example. In the harmonic oscillator, we have
  H(q, p) = ½p² + ½ω²q².
We then have
  M_c = { (q, p) : ½p² + ½ω²q² = c }.
The first part of the Arnold-Liouville theorem says this is diffeomorphic to T¹ = S¹, which it is! The next step is to pick a loop, and there is an obvious one: the circle itself. We write
  p = p(q, c) = ±√(2c − ω²q²)
on M_c. Then we have
  I = (1/2π) ∮ p · dq = c/ω.
We can then write c as a function of I by
  c = c(I) = ωI.
Now construct
  S(q, I) = ∫_{x_0}^x p(q', c(I)) dq'.
We can pick x_0 to be the point with q = 0. Then this is equal to
  ∫_0^q √(2ωI − ω²q'²) dq'.
To find φ, we need to differentiate this thing to get
  φ = ∂S/∂I = ω ∫_0^q dq'/√(2ωI − ω²q'²) = sin⁻¹( √(ω/2I) q ).
As expected, this is only well-defined up to multiples of 2π! Using the fact that c = H, we have
  q = √(2I/ω) sin φ,   p = √(2Iω) cos φ.
These are exactly the coordinates we obtained through divine inspiration last time.
2 Partial Differential Equations
For the remainder of the course, we are going to look at PDE's. We can view these as infinite-dimensional analogues of ODE's. So what do we expect for integrable PDE's? Recall that if a 2n-dimensional ODE is integrable, then it has n first integrals. Since PDE's are infinite-dimensional, and half of infinity is still infinity, we would expect to have infinitely many first integrals. Similar to the case of integrable ODE's, we would also expect that there will be some magic transformation that allows us to write down the solution with ease, even if the initial problem looks very complicated.
These are all true, but our journey will be less straightforward. To begin with, we will not define what integrability means, because it is a rather complicated issue. We will go through one method of "integrating up" a PDE in detail, known as the inverse scattering transform, and we will apply it to a particular equation. Unfortunately, the way we apply the inverse scattering transform to a PDE is not obvious, and here we will have to do it through "divine inspiration".
Before we get to the inverse scattering transform, we first look at a few examples of PDEs.
2.1 KdV equation
The KdV equation is given by
  u_t + u_xxx − 6uu_x = 0.
Before we study the KdV equation, we will look at some variations of this where
we drop some terms, and then see how they compare.
Example. Consider the linear PDE
  u_t + u_xxx = 0,
where u = u(x, t) is a function of two variables. This admits solutions of the form
  e^{ikx − iωt},
known as plane wave modes. For this to be a solution, ω must obey the dispersion relation
  ω = ω(k) = −k³.
For any k, as long as we pick ω this way, we obtain a solution. By writing the solution as
  u(x, t) = exp( ik( x − (ω(k)/k) t ) ),
we see that plane wave modes travel at speed
  ω/k = −k².
It is very important that the speed depends on k. Different plane wave modes travel at different speeds. This is going to give rise to what we call dispersion.
A general solution is a superposition of plane wave modes
  Σ_k a(k) e^{ikx − iω(k)t},
or even an uncountable superposition
  ∫ A(k) e^{ikx − iω(k)t} dk.
It is a theorem that for linear PDE's on convex domains, all solutions are indeed superpositions of plane wave modes. So this is indeed completely general.
So suppose we have an initial solution that is a nice localized bump. We write this as a superposition of plane wave modes. As we let time pass, different plane wave modes travel at different speeds, so this becomes a huge mess! After some time the bump will have spread out into a dispersed wave train.
Intuitively, what gives us the dispersion is the third-order derivative ∂_x³. If we had ∂_x instead, then there would be no dispersion.
Example. Consider the non-linear PDE
  u_t − 6uu_x = 0.
This looks almost intractable, as non-linear PDE's are scary, and we don't know what to do. However, it turns out that we can solve this for any initial data u(x, 0) = f(x) via the method of characteristics. Details are left on the second example sheet, but the solution we get is
  u(x, t) = f(ξ),
where ξ is given implicitly by
  ξ = x + 6tf(ξ).
We can show that u_x becomes, in general, infinite in finite time. Indeed, we have
  u_x = f'(ξ) ∂ξ/∂x.
We differentiate the formula for ξ to obtain
  ∂ξ/∂x = 1 + 6tf'(ξ) ∂ξ/∂x.
So we know ∂ξ/∂x becomes infinite when 1 − 6tf'(ξ) = 0. In general, this happens in finite time, and at that time the solution develops a vertical slope. After that, it becomes a multi-valued function! So the solution steepens until the profile becomes vertical and then overturns. This is known as wave-breaking.
We can imagine that the −6uu_x term gives us wave breaking.
What happens if we combine both of these effects?
Definition (KdV equation). The KdV equation is given by
  u_t + u_xxx − 6uu_x = 0.
It turns out that this has a perfect balance between dispersion and non-linearity. This admits very special solutions known as solitons. For example, a 1-soliton solution is
  u(x, t) = −2χ_1² sech²( χ_1(x − 4χ_1²t) ).
The solution tries to both topple over and disperse, and it turns out the two effects balance so that it moves like a normal wave at a constant speed. If we look at the solution, then we see that this has a peculiar property that the speed of the wave depends on the amplitude: the taller you are, the faster you move.
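As a sanity check on this formula, here is a small numerical sketch (Python/numpy, not from the notes) that evaluates the residual u_t + u_xxx − 6uu_x of the 1-soliton on a grid using finite differences; the residual is small compared with the size of the individual terms. The value of χ_1 is made up.

```python
import numpy as np

chi = 1.2                                       # made-up soliton parameter chi_1

def u(x, t):
    """1-soliton of KdV: u = -2 chi^2 sech^2(chi (x - 4 chi^2 t))."""
    return -2.0 * chi**2 / np.cosh(chi * (x - 4.0 * chi**2 * t))**2

x = np.linspace(-15.0, 15.0, 4001)
dx = x[1] - x[0]
dt = 1e-5
t = 0.3

u0 = u(x, t)
u_t = (u(x, t + dt) - u(x, t - dt)) / (2.0 * dt)   # centred time derivative
u_x = np.gradient(u0, dx)
u_xxx = np.gradient(np.gradient(u_x, dx), dx)

residual = u_t + u_xxx - 6.0 * u0 * u_x
print(np.max(np.abs(residual[5:-5])))           # small (finite-difference error only)
```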
Now what if we started with two of these solitons? If we placed them far apart, then they should not interact, and they would just individually move to the right. But note that the speed depends on the amplitude. So if we put a taller one behind a shorter one, it might catch up and then collide! Indeed, suppose they started off with the tall soliton to the left of the short one. After a while, the tall one starts to catch up. Note that both of the humps are moving to the right; it's just that we have to move the frame so that everything stays on the page. Soon, they collide into each other, and then they start to merge.
What do we expect to happen? The KdV equation is a very complicated non-linear equation, so we might expect a lot of interactions, and the result to be a huge mess. But no: they pass through each other as if nothing has happened, and then they just walk away and depart from each other.
This is like magic! If we just looked at the equation, there is no way we could have guessed that these two solitons would interact in such an uneventful manner. Non-linear PDEs in general are messy. But these are very stable structures in the system, and they behave more like particles than waves.
At first, this phenomenon was discovered through numerical simulation. However, later we will see that the KdV equation is integrable, and we can in fact find explicit expressions for a general N-soliton solution.
2.2 Sine–Gordon equation
We next look at another equation that again has soliton solutions, known as the sine–Gordon equation.
Definition (Sine–Gordon equation). The sine–Gordon equation is given by
  u_tt − u_xx + sin u = 0.
This is known as the sine–Gordon equation, because there is a famous equation in physics known as the Klein–Gordon equation, given by
  u_tt − u_xx + u = 0.
Since we have a sine instead of a u, we call it the sine–Gordon equation!
There are a few ways we can motivate the sine–Gordon equation. We will use one from physics. Suppose we have a chain of pendulums of length ℓ with masses m, attached at equal spacings ∆x along a horizontal line.
Each pendulum is allowed to rotate in the vertical plane, i.e. the plane with normal along the horizontal line, and we specify the angle of the ith pendulum by θ_i(t). Since we want to eventually take the limit as ∆x → 0, we imagine θ is a function of both space and time, and write this as θ_i(t) = θ(i∆x, t).
Since gravity exists, each pendulum experiences a torque
  −mℓg sin θ_i.
We now introduce an interaction between the different pendulums. We imagine the masses are connected by some springs, so that the ith pendulum gets a torque of
  K(θ_{i+1} − θ_i)/∆x,   K(θ_{i−1} − θ_i)/∆x.
By Newton's laws, the equation of motion is
  mℓ² d²θ_i/dt² = −mgℓ sin θ_i + K(θ_{i+1} − 2θ_i + θ_{i−1})/∆x.
We divide everything by ∆x, and take the limit as ∆x → 0, with M = m/∆x held constant. We then end up with
  Mℓ² ∂²θ/∂t² = −Mgℓ sin θ + K ∂²θ/∂x².
Making some simple coordinate scalings, this becomes
  u_tt − u_xx + sin u = 0.
There is also another motivation for this from differential geometry. It turns out solutions to the sine–Gordon equation correspond to pseudospherical surfaces in R³, namely the surfaces that have constant negative curvature.
If we pick so-called light cone coordinates ξ = ½(x − t) and τ = ½(x + t), then the sine–Gordon equation becomes
  ∂²u/∂ξ∂τ = sin u,
and often this is the form of the sine–Gordon equation we will encounter.
This also admits soliton solutions
  u(x, t) = 4 tan⁻¹( exp( (x − vt)/√(1 − v²) ) ).
We can check that this is indeed a solution for this non-linear PDE.
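Here is a small sketch (Python/numpy, not from the notes) that checks the kink formula by evaluating the residual u_tt − u_xx + sin u with finite differences; the velocity v is a made-up value with |v| < 1.

```python
import numpy as np

v = 0.4                                          # made-up kink velocity, |v| < 1

def u(x, t):
    """Sine-Gordon kink: u = 4 arctan(exp((x - v t)/sqrt(1 - v^2)))."""
    return 4.0 * np.arctan(np.exp((x - v * t) / np.sqrt(1.0 - v**2)))

x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]
dt = 1e-4
t = 0.7

u_tt = (u(x, t + dt) - 2.0 * u(x, t) + u(x, t - dt)) / dt**2
u_xx = (np.roll(u(x, t), -1) - 2.0 * u(x, t) + np.roll(u(x, t), 1)) / dx**2

residual = u_tt - u_xx + np.sin(u(x, t))
print(np.max(np.abs(residual[1:-1])))            # small: the kink solves sine-Gordon
```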
This solution is a "kink": it increases monotonically from u = 0 as x → −∞ to u = 2π as x → +∞.
Now remember that θ was an angle. So 2π is just the same as 0! If we think of the value of u as living in the circle S¹, then this satisfies the boundary condition u → 0 as x → ±∞.
If we view it this way, it is absolutely obvious that no matter how this solution evolves in time, it will never become, or even approach, the "trivial" solution u = 0, even though both satisfy the boundary condition u → 0 as x → ±∞.
2.3 Bäcklund transformations
For a linear partial differential equation, we have the principle of superposition: if we have two solutions, then we can add them to get a third solution. This is no longer true in non-linear PDE's.
One way we can find ourselves a new solution is through a Bäcklund transformation. This originally came from geometry, where we wanted to transform a surface to another, but we will only consider the applications to PDE's.
The actual definition of the Bäcklund transformation is complicated. So we start with an example.
Example. Consider the Cauchy–Riemann equations
  u_x = v_y,   u_y = −v_x.
We know that the pair u, v satisfies the Cauchy–Riemann equations if and only if both u, v are harmonic, i.e. u_xx + u_yy = 0 etc.
Now suppose we have managed to find a harmonic function v = v(x, y). Then we can try to solve the Cauchy–Riemann equations, and we would get another harmonic function u = u(x, y).
For example, if v = 2xy, then we get the partial differential equations
  u_x = 2x,   u_y = −2y.
So we obtain
  u(x, y) = x² − y² + C
for some constant C, and this function u is guaranteed to be a solution to Laplace's equation.
So the Cauchy–Riemann equations generate new solutions to Laplace's equation from old ones. This is an example of an (auto-)Bäcklund transformation for Laplace's equation.
In general, we have the following definition:
Definition (Bäcklund transformation). A Bäcklund transformation is a system of equations that relate the solutions of some PDE to
(i) a solution to some other PDE; or
(ii) another solution to the same PDE.
In the second case, we call it an auto-Bäcklund transformation.
Example. The equation u_xt = e^u is related to the equation v_xt = 0 via the Bäcklund transformation
  u_x + v_x = √2 exp( (u − v)/2 ),   u_t − v_t = √2 exp( (u + v)/2 ).
The verification is left as an exercise on the first example sheet. Since v_xt = 0 is an easier equation to solve, this gives us a method to solve u_xt = e^u.
We also have examples of auto-Bäcklund transformations:
Example. For any non-zero constant ε, consider the system
  ∂/∂ξ (ϕ_1 − ϕ_2) = 2ε sin( (ϕ_1 + ϕ_2)/2 ),
  ∂/∂τ (ϕ_1 + ϕ_2) = (2/ε) sin( (ϕ_1 − ϕ_2)/2 ).
These equations come from geometry, and we will not go into details motivating these. We can compute
  ∂²/∂ξ∂τ (ϕ_1 − ϕ_2) = ∂/∂τ ( 2ε sin( (ϕ_1 + ϕ_2)/2 ) )
                      = 2ε cos( (ϕ_1 + ϕ_2)/2 ) ∂/∂τ ( (ϕ_1 + ϕ_2)/2 )
                      = 2ε cos( (ϕ_1 + ϕ_2)/2 ) · (1/2) · (2/ε) sin( (ϕ_1 − ϕ_2)/2 )
                      = 2 cos( (ϕ_1 + ϕ_2)/2 ) sin( (ϕ_1 − ϕ_2)/2 )
                      = sin ϕ_1 − sin ϕ_2.
It then follows that
  ∂²ϕ_2/∂ξ∂τ = sin ϕ_2  ⟺  ∂²ϕ_1/∂ξ∂τ = sin ϕ_1.
In other words, ϕ_1 solves the sine–Gordon equation in light cone coordinates if and only if ϕ_2 does. So this gives an auto-Bäcklund transformation for the sine–Gordon equation. Moreover, since we had a free parameter ε, we actually have a family of auto-Bäcklund transformations.
For example, we already know a solution to the sine–Gordon equation, namely ϕ_1 = 0. Using this (and relabelling ε ↦ −ε, which is allowed since ε was an arbitrary non-zero constant), the equations say we need to solve
  ∂ϕ/∂ξ = 2ε sin(ϕ/2),   ∂ϕ/∂τ = (2/ε) sin(ϕ/2)
for ϕ = ϕ_2. We see this system has some sort of symmetry between ξ and τ. So we use an ansatz
  ϕ(ξ, τ) = 2χ(εξ + ε⁻¹τ).
Then both equations tell us
  dχ/dx = sin χ.
We can separate this into
  csc χ dχ = dx.
Integrating this gives us
  log tan(χ/2) = x + C.
So we find
  χ(x) = 2 tan⁻¹(Ae^x).
So it follows that
  ϕ(ξ, τ) = 4 tan⁻¹( A exp(εξ + ε⁻¹τ) ),
where A and ε are free parameters. After a bit more work, this recovers the 1-soliton solution we previously found.
Applying the Bäcklund transform again to this new solution produces multi-soliton solutions.
3 Inverse scattering transform
Recall that in IB Methods, we decided we can use Fourier transforms to solve PDE's. For example, if we wanted to solve the Klein–Gordon equation
  u_tt − u_xx = u,
then we simply had to take the Fourier transform with respect to x to get
  û_tt + k²û = û.
This then becomes a very easy ODE in t:
  û_tt = (1 − k²)û,
which we can solve. After solving for this, we can take the inverse Fourier transform to get u.
The inverse scattering transform will follow a similar procedure, except it is much more involved and magical. Again, given a differential equation in u(x, t), for each fixed time t, we can transform the solution u(x, t) to something known as the scattering data of u. Then the differential equation will tell us how the scattering data should evolve. After we have solved for the scattering data at all times, we invert the transformation and recover the solution u.
We will find that each step of that process will be linear, i.e. easy, and this will magically allow us to solve non-linear equations.
3.1 Forward scattering problem
Before we talk about the inverse scattering transform, it is helpful to know what the forward problem is. This is, as you would have obviously guessed, related to the Schrödinger operator we know and love from quantum mechanics. Throughout this section, L will be the Schrödinger operator
  L = −∂²/∂x² + u(x),
where the "potential" u has compact support, i.e. u = 0 for |x| sufficiently large. What we actually need is just that u decays quickly enough as |x| → ∞, but to make our life easy, we do not figure out the precise conditions to make things work, and just assume that u actually vanishes for large |x|. For a fixed u, we are interested in an eigenvalue (or "spectral") problem, i.e. we want to find solutions to
  Lψ = λψ.
This is the "forward" problem, i.e. given a u, we want to find the eigenvalues and eigenfunctions. The inverse problem is: given the collection of all such eigenvalues and eigenfunctions, or rather some data describing them, we want to find out what u is.
We will divide this into the continuous and discrete cases.
3.1.1 Continuous spectrum
Here we consider solutions to Lψ = k²ψ for real k. Since u = 0 for |x| large, we must have
  ψ_xx + k²ψ = 0
for large |x|. So solutions as |x| → ∞ are linear combinations of e^{±ikx}. We look for a specific solution ψ = ϕ(x, k) defined by the condition
  ϕ = e^{−ikx}  as x → −∞.
Then there must be coefficients a = a(k) and b = b(k) such that
  ϕ(x, k) = a(k)e^{−ikx} + b(k)e^{ikx}  as x → +∞.
We define the quantities
  Φ(x, k) = ϕ(x, k)/a(k),   R(k) = b(k)/a(k),   T(k) = 1/a(k).
Here R(k) is called the reflection coefficient, and T(k) is the transmission coefficient. You may have seen these terms from IB Quantum Mechanics. Then we can write
  Φ(x, k) = { T(k)e^{−ikx}   as x → −∞
            { e^{−ikx} + R(k)e^{ikx}   as x → +∞.
We can view the e^{−ikx} term as waves travelling to the left, and e^{ikx} as waves travelling to the right. Thus in this scenario, we have an incident e^{−ikx} wave coming from the right, the potential reflects some portion of the wave, namely R(k)e^{ikx}, and transmits the remaining T(k)e^{−ikx}. It will be shown on the first example sheet that in fact |T(k)|² + |R(k)|² = 1.
What would happen when we change k? Since k is the "frequency" of the wave, which is proportional to the energy, we would expect that the larger k is, the more of the wave is transmitted. Thus we might expect that T(k) → 1 and R(k) → 0 as k → ∞. This is indeed true, but we will not prove it. We can think of these as "boundary conditions" for T and R.
So far, we've only been arguing hypothetically about what the solution has to look like if it existed. However, we do not know if there is a solution at all!
In general, differential equations are bad. They are hard to talk about, because if we differentiate a function, it generally gets worse. It might cease to be differentiable, or even continuous. This means differential operators could take our function out of the relevant function space we are talking about. On the other hand, integration makes functions look better. The more times we integrate, the smoother it becomes. So if we want to talk about the existence of solutions, it is wise to rewrite the differential equation as an integral equation instead.
We consider the integral equation for f = f(x, k) given by
  f(x, k) = f_0(x, k) + ∫_{−∞}^∞ G(x − y, k) u(y) f(y, k) dy,
where f_0 is any solution to (∂_x² + k²)f_0 = 0, and G is the Green's function for the differential operator ∂_x² + k², i.e. we have
  (∂_x² + k²)G = δ(x).
What we want to show is that if we can find an f that satisfies this integral equation, then it also satisfies the eigenvalue equation. We simply compute
  (∂_x² + k²)f = (∂_x² + k²)f_0 + ∫_{−∞}^∞ (∂_x² + k²)G(x − y, k) u(y) f(y, k) dy
               = 0 + ∫_{−∞}^∞ δ(x − y) u(y) f(y, k) dy
               = u(x) f(x, k).
In other words, we have
  Lf = k²f.
So it remains to prove that solutions to the integral equation exist.
We pick f_0 = e^{−ikx} and
  G(x, k) = { 0   x < 0
            { (1/k) sin(kx)   x ≥ 0.
Then our integral equation automatically implies
  f(x, k) = e^{−ikx}
as x → −∞, because for x very negative, either x − y < 0 or y is very negative, so the integrand always vanishes as u has compact support.
To solve the integral equation, we write this in abstract form
  (I − K)f = f_0,
where I is the identity, and
  (Kf)(x) = ∫_{−∞}^∞ G(x − y, k) u(y) f(y, k) dy.
So we can "invert"
  f = (I − K)^{−1} f_0.
We can "guess" a solution to the inverse. If we don't care about rigour and just expand this, we get
  f = (I + K + K² + ···)f_0.
It doesn't matter how unrigorous our derivation was. To see it is a valid solution, we just have to check that it works! The first question to ask is if this expression converges. On the second example sheet, we will show that this thing actually converges. If this holds, then we have
  (I − K)f = (If_0 + Kf_0 + K²f_0 + ···) − (Kf_0 + K²f_0 + K³f_0 + ···) = f_0.
So this is a solution!
Of course, this result is purely formal. Usually, there are better ad hoc ways to solve the equation, as we know from IB Quantum Mechanics.
3.1.2 Discrete spectrum and bound states
We now consider the case λ = −κ² < 0, where we wlog take κ > 0. We are going to seek solutions to
  Lψ_κ = −κ²ψ_κ.
This time, we are going to ask that
  ‖ψ_κ‖² = ∫_{−∞}^∞ ψ_κ(x)² dx = 1.
We will wlog take ψ_κ ∈ R. We will call these things bound states.
Since u has compact support, any solution to Lϕ = −κ²ϕ must obey
  ϕ_xx − κ²ϕ = 0
for |x| → ∞. Then the solutions are linear combinations of e^{±κx} as |x| → ∞.
We now fix ϕ_κ by the boundary condition
  ϕ_κ(x) = e^{−κx}  as x → +∞.
Then as x → −∞, there must exist coefficients α = α(κ), β = β(κ) such that
  ϕ_κ(x) = α(κ)e^{κx} + β(κ)e^{−κx}  as x → −∞.
Note that for any κ, we can solve the equation Lϕ = −κ²ϕ and find a solution of this form. However, we have the additional condition that ‖ψ_κ‖² = 1, and in particular the norm is finite. Since e^{−κx} blows up as x → −∞, we must have β(κ) = 0. It can be shown that the function β = β(κ) has only finitely many zeroes
  χ_1 > χ_2 > ··· > χ_N > 0.
So we have a finite list of bound states {ψ_n}_{n=1}^N, written
  ψ_n(x) = c_n ϕ_{χ_n}(x),
where the c_n are normalization constants chosen so that ‖ψ_n‖ = 1.
3.1.3 Summary of forward scattering problem
In summary, we had a spectral problem
  Lψ = λψ,
where
  L = −∂²/∂x² + u,
and u has compact support. The goal is to find ψ and λ.
In the continuous spectrum, we have λ = k² > 0. Then we can find some T(k) and R(k) such that
  Φ(x, k) = { T(k)e^{−ikx}   as x → −∞
            { e^{−ikx} + R(k)e^{ikx}   as x → +∞,
and solutions exist for all k.
In the discrete spectrum, we have λ = −κ² < 0. We can construct bound states {ψ_n}_{n=1}^N such that
  Lψ_n = −χ_n²ψ_n
with
  χ_1 > χ_2 > ··· > χ_N > 0,
and ‖ψ_n‖ = 1.
Bound states are characterized by their large, positive x behaviour
  ψ_n(x) = c_n e^{−χ_n x}  as x → +∞,
where {c_n}_{n=1}^N are normalization constants.
Putting all these together, the scattering data for L is
  S = ( {χ_n, c_n}_{n=1}^N, R(k), T(k) ).
Example. Consider the Dirac potential u(x) = −2αδ(x), where α > 0. Let's try to compute the scattering data.
We do the continuous spectrum first. Since u(x) = 0 for x ≠ 0, we must have
  Φ(x, k) = { T(k)e^{−ikx}   x < 0
            { e^{−ikx} + R(k)e^{ikx}   x > 0.
Also, we want Φ(x, k) to be continuous at x = 0. So we must have
  T(k) = 1 + R(k).
By integrating LΦ = k²Φ over (−ε, ε) and taking ε → 0, we find that Φ_x has a jump discontinuity at x = 0 given by
  ik(R − 1) + ikT = −2αT.
We now have two equations and two unknowns, and we can solve to obtain
  R(k) = iα/(k − iα),   T(k) = k/(k − iα).
We can see that we indeed have |R|² + |T|² = 1.
Note that as k increases, we find that R(k) → 0 and T(k) → 1. This makes sense, since we can think of k as the energy of the wave, and the larger the energy, the more likely we are to pass through.
Now let's do the discrete part of the spectrum, and we jump through the same hoops. Since δ(x) = 0 for x ≠ 0, we must have
  −∂²ψ_n/∂x² + χ_n²ψ_n = 0
for x ≠ 0. So we have
  ψ_n(x) = c_n e^{−χ_n|x|}.
Integrating Lψ_n = −χ_n²ψ_n over (−ε, ε), we similarly find that
  c_n χ_n = c_n α.
So there is just one bound state, with χ_1 = α. We finally find c_1 by requiring ‖ψ_1‖ = 1. We have
  1 = ∫_{−∞}^∞ ψ_1(x)² dx = c_1² ∫_{−∞}^∞ e^{−2χ_1|x|} dx = c_1²/α.
So we have
  c_1 = √α.
In total, we have the following scattering data:
  S = ( {(χ_1, c_1)} = {(α, √α)},  R(k) = iα/(k − iα),  T(k) = k/(k − iα) ).
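A quick check of these formulas (Python/numpy, not part of the notes): |R(k)|² + |T(k)|² should equal 1 for every real k, and R → 0, T → 1 as k grows. The value of α is made up.

```python
import numpy as np

alpha = 1.5                                  # made-up strength of the delta potential
k = np.linspace(0.1, 50.0, 500)

R = 1j * alpha / (k - 1j * alpha)            # reflection coefficient from the example
T = k / (k - 1j * alpha)                     # transmission coefficient

print(np.max(np.abs(np.abs(R)**2 + np.abs(T)**2 - 1.0)))   # ~1e-16: |R|^2 + |T|^2 = 1
print(abs(R[-1]), abs(T[-1]))                # R -> 0 and T -> 1 for large k
```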
3.2 Inverse scattering problem
We might be interested in the inverse problem. Given scattering data
  S = ( {χ_n, c_n}_{n=1}^N, R(k), T(k) ),
can we reconstruct the potential u = u(x) such that
  L = −∂²/∂x² + u(x)
has scattering data S? The answer is yes! Moreover, it turns out that T(k) is not needed.
We shall write down a rather explicit formula for the inverse scattering problem, but we will not justify it.
Theorem (GLM inverse scattering theorem). A potential u = u(x) that decays rapidly to 0 as |x| → ∞ is completely determined by its scattering data
  S = ( {χ_n, c_n}_{n=1}^N, R(k) ).
Given such scattering data, if we set
  F(x) = Σ_{n=1}^N c_n² e^{−χ_n x} + (1/2π) ∫_{−∞}^∞ e^{ikx} R(k) dk,
and define K(x, y) to be the unique solution to
  K(x, y) + F(x + y) + ∫_x^∞ K(x, z) F(z + y) dz = 0,
then
  u(x) = −2 d/dx K(x, x).
Proof. Too hard.
Note that this equation
  K(x, y) + F(x + y) + ∫_x^∞ K(x, z) F(z + y) dz = 0
is not too hard to solve. We can view it as a linear equation of the form
  x + b + Ax = 0
for some linear operator A, then use our familiar linear algebra techniques to guess a solution. Afterwards, we can then verify that it works. We will see an explicit example later on when we actually use this to solve problems.
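To see that the GLM equation really is "just linear algebra", here is a small sketch (Python/numpy, not from the notes) that discretizes it for the reflectionless one-bound-state data F(x) = c²e^{−χx} and recovers K(x, x); the resulting u = −2 dK(x,x)/dx is compared against the 1-soliton profile −2χ² sech²(χ(x − x₀)), where the centre x₀ = log(c²/2χ)/(2χ) is my own worked-out value for this data, and the grid cutoff and spacings are ad hoc.

```python
import numpy as np

chi, c2 = 1.0, 2.0                   # made-up bound state data: chi_1 and c_1^2
F = lambda s: c2 * np.exp(-chi * s)  # reflectionless: F(x) = c^2 exp(-chi x)

def K_diag(x, zmax=20.0, n=800):
    """Solve K(x,y) + F(x+y) + int_x^zmax K(x,z) F(z+y) dz = 0; return K(x, x)."""
    z = np.linspace(x, zmax, n)
    h = z[1] - z[0]
    w = np.full(n, h); w[0] = w[-1] = h / 2.0            # trapezoidal weights
    A = np.eye(n) + F(z[:, None] + z[None, :]) * w        # rows: y grid, cols: z grid
    K = np.linalg.solve(A, -F(x + z))
    return K[0]                                           # value at y = x

xs = np.linspace(-3.0, 3.0, 61)
dx = xs[1] - xs[0]
u_glm = -2.0 * np.gradient(np.array([K_diag(x) for x in xs]), dx)

x0 = np.log(c2 / (2.0 * chi)) / (2.0 * chi)               # soliton centre for this data
u_exact = -2.0 * chi**2 / np.cosh(chi * (xs - x0))**2
print(np.max(np.abs(u_glm - u_exact)))   # small (limited by the finite differences)
```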
Now that we've got this result, we understand how scattering problems work. We know how to go forwards and backwards.
This is all old theory, and not too exciting. The real exciting thing is how we are going to use this to solve PDE's. Given the KdV equation
  u_t + u_xxx − 6uu_x = 0,
we can think of this as a potential evolving over time, with a starting potential u(x, 0) = u_0(x). We then compute the initial scattering data T, R, χ and c. Afterwards, we obtain the corresponding equations of evolution of the scattering data from the KdV equation. It turns out this is really simple: the χ_n are always fixed, and the others evolve as
  R(k, t) = e^{8ik³t} R(k, 0),
  T(k, t) = T(k, 0),
  c_n(t) = e^{4χ_n³t} c_n(0).
Then we use this GLM formula to reconstruct the potential u at all times!
3.3 Lax pairs
The final ingredient to using the inverse scattering transform is how to relate the evolution of the potential to the evolution of the scattering data. This is given by a Lax pair.
Recall that when we studied Hamiltonian systems at the beginning of the course, under a Hamiltonian flow, functions evolve by
  df/dt = {f, H}.
In quantum mechanics, when we "quantize" this, in the Heisenberg picture, the operators evolve by
  iℏ dL/dt = [L, H].
In some sense, these equations tell us H "generates" time evolution. What we need here is something similar: an operator that generates the time evolution of our operator.
Definition (Lax pair). Consider a time-dependent self-adjoint linear operator
  L = a_m(x, t) ∂^m/∂x^m + ··· + a_1(x, t) ∂/∂x + a_0(x, t),
where the {a_i} are (possibly matrix-valued) functions of (x, t). If there is a second operator A such that
  L_t = LA − AL = [L, A],
where
  L_t = ȧ_m ∂^m/∂x^m + ··· + ȧ_0
denotes the derivative of L with respect to t, then we call (L, A) a Lax pair.
The main theorem about Lax pairs is the following isospectral flow theorem:
Theorem (Isospectral flow theorem). Let (L, A) be a Lax pair. Then the discrete eigenvalues of L are time-independent. Also, if Lψ = λψ, where λ is a discrete eigenvalue, then
  Lψ̃ = λψ̃,
where
  ψ̃ = ψ_t + Aψ.
The word "isospectral" means that we have an evolving system, but the eigenvalues are time-independent.
Proof. We will assume that the eigenvalues at least vary smoothly with t, so that for each eigenvalue λ_0 at t = 0 with eigenfunction ψ_0(x), we can find some λ(t) and ψ(x, t) with λ(0) = λ_0, ψ(x, 0) = ψ_0(x) such that
  L(t)ψ(x, t) = λ(t)ψ(x, t).
We will show that in fact λ(t) is constant in time. Differentiating with respect to t and rearranging, we get
  λ_t ψ = L_t ψ + Lψ_t − λψ_t
        = LAψ − ALψ + Lψ_t − λψ_t
        = LAψ − λAψ + Lψ_t − λψ_t
        = (L − λ)(ψ_t + Aψ).
We now take the inner product with ψ, and use that ‖ψ‖ = 1. We then have
  λ_t = ⟨ψ, λ_t ψ⟩
      = ⟨ψ, (L − λ)(ψ_t + Aψ)⟩
      = ⟨(L − λ)ψ, ψ_t + Aψ⟩
      = 0,
using the fact that L, hence L − λ, is self-adjoint.
So we know that λ_t = 0, i.e. that λ is time-independent. Then our above equation gives
  Lψ̃ = λψ̃,
where
  ψ̃ = ψ_t + Aψ.
In the case where L is the Schrödinger operator, the isospectral theorem tells us how we can relate the evolution of some of the scattering data (namely the χ_n) to some differential equation in L (namely the Laxness of L). For a cleverly chosen A, we will be able to relate the Laxness of L to some differential equation in u, and this establishes our first correspondence between the evolution of u and the evolution of the scattering data.
Example. Consider
  L = −∂_x² + u(x, t),
  A = 4∂_x³ − 3(u∂_x + ∂_x u).
Then (L, A) is a Lax pair iff u = u(x, t) satisfies KdV. In other words, we have
  L_t − [L, A] = 0  ⟺  u_t + u_xxx − 6uu_x = 0.
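The claim can be verified by brute force: apply L_t − [L, A] to an arbitrary smooth test function ψ and check that what is left over is (u_t + u_xxx − 6uu_x)ψ. Here is a sketch of that computation in Python/sympy (not part of the notes).

```python
import sympy as sp

x, t = sp.symbols('x t')
u = sp.Function('u')(x, t)
psi = sp.Function('psi')(x, t)

def L(f):                       # Schrodinger operator L = -d^2/dx^2 + u
    return -sp.diff(f, x, 2) + u * f

def A(f):                       # A = 4 d^3/dx^3 - 3(u d/dx + d/dx u)
    return 4 * sp.diff(f, x, 3) - 3 * (u * sp.diff(f, x) + sp.diff(u * f, x))

# L_t acts on psi as u_t * psi (only the potential depends on t).
lhs = sp.diff(u, t) * psi - (L(A(psi)) - A(L(psi)))
rhs = (sp.diff(u, t) + sp.diff(u, x, 3) - 6 * u * sp.diff(u, x)) * psi
print(sp.simplify(sp.expand(lhs - rhs)))   # 0: L_t - [L, A] = (u_t + u_xxx - 6 u u_x)
```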
3.4 Evolution of scattering data
Now we do the clever bit: we allow the potential u = u(x, t) to evolve via KdV
  u_t + u_xxx − 6uu_x = 0,
and see how the scattering data for L = −∂_x² + u(x, t) evolves. Again, we will assume that u has compact support. Note that this implies that we have
  A = 4∂_x³  as |x| → ∞.
3.4.1 Continuous spectrum (λ = k² > 0)
As in Section 3.1.1, for each t, we can construct a solution ϕ to Lϕ = k²ϕ such that
  ϕ(x, t) = { e^{−ikx}   as x → −∞
            { a(k, t)e^{−ikx} + b(k, t)e^{ikx}   as x → ∞.
This time, we know that for any u, we can find a solution for any k. So we can assume that k is fixed in the equation
  Lϕ = k²ϕ.
We assume that u is a solution to the KdV equation, so that (L, A) is a Lax pair. As in the proof of the isospectral flow theorem, we differentiate this to get
  0 = (L − k²)(ϕ_t + Aϕ).
This tells us that ϕ̃ = ϕ_t + Aϕ solves
  Lϕ̃ = k²ϕ̃.
We can try to figure out what ϕ̃ is for large |x|. We recall that for large |x|, we simply have A = 4∂_x³. Then we can write
  ϕ̃(x, t) = { 4ik³ e^{−ikx}   as x → −∞
            { (a_t + 4ik³a)e^{−ikx} + (b_t − 4ik³b)e^{ikx}   as x → ∞.
We now consider the function
  θ = 4ik³ϕ − ϕ̃.
By linearity of L, we have
  Lθ = k²θ.
Note that by construction, we have θ(x, t) → 0 as x → −∞. We recall that the solution to Lf = k²f with f → f_0 as x → −∞ is just
  f = (I − K)^{−1}f_0 = (I + K + K² + ···)f_0.
So we obtain
  θ = (I + K + K² + ···)0 = 0.
So we must have
  ϕ̃ = 4ik³ϕ.
Looking at the x → +∞ behaviour, we figure out that
  a_t + 4ik³a = 4ik³a,
  b_t − 4ik³b = 4ik³b.
Of course, these are equations we can solve. We have
  a(k, t) = a(k, 0),
  b(k, t) = b(k, 0)e^{8ik³t}.
In terms of the reflection and transmission coefficients, we have
  R(k, t) = R(k, 0)e^{8ik³t},   T(k, t) = T(k, 0).
Thus, we have shown that if we assume
u
evolves according to the really compli-
cated KdV equation, then the scattering data must evolve in this simple way!
This is AMAZING.
3.4.2 Discrete spectrum ($\lambda = -\kappa^2 < 0$)

The discrete part is similar. By the isospectral flow theorem, we know the $\chi_n$ are constant in time. For each $t$, we can construct bound states $\{\psi_n(x, t)\}_{n=1}^N$ such that
$$L\psi_n = -\chi_n^2\psi_n,\quad \|\psi_n\| = 1.$$
Moreover, we have
$$\psi_n(x, t) = c_n(t)e^{-\chi_n x}\quad\text{as } x \to +\infty.$$
From the isospectral theorem, we know the function $\tilde\psi_n = \partial_t\psi_n + A\psi_n$ also satisfies
$$L\tilde\psi_n = -\chi_n^2\tilde\psi_n.$$
It is an exercise to show that these solutions must actually be proportional to one another. Looking at Wronskians, we can show that $\tilde\psi_n \propto \psi_n$. Also, we have
$$\begin{aligned}
\langle\psi_n, \tilde\psi_n\rangle &= \langle\psi_n, \partial_t\psi_n\rangle + \langle\psi_n, A\psi_n\rangle\\
&= \tfrac{1}{2}\partial_t\langle\psi_n, \psi_n\rangle + \langle\psi_n, A\psi_n\rangle\\
&= 0,
\end{aligned}$$
using the fact that $A$ is antisymmetric and $\|\psi_n\|$ is constant. We thus deduce that $\tilde\psi_n = 0$.

Looking at the large-$x$ behaviour, we have
$$\tilde\psi_n(x, t) = (\dot{c}_n - 4\chi_n^3 c_n)e^{-\chi_n x}$$
as $x \to +\infty$. Since $\tilde\psi_n = 0$, we must have
$$\dot{c}_n - 4\chi_n^3 c_n = 0.$$
So we have
$$c_n(t) = c_n(0)e^{4\chi_n^3 t}.$$
This is again AMAZING.
3.4.3 Summary of inverse scattering transform

So in summary, suppose we are given that $u = u(x, t)$ evolves according to KdV, namely
$$u_t + u_{xxx} - 6uu_x = 0.$$
If we have an initial condition $u_0(x) = u(x, 0)$, then we can compute its scattering data
$$S(0) = \left(\{\chi_n, c_n(0)\}_{n=1}^N,\ R(k, 0)\right).$$
Then for arbitrary time, the scattering data for $L = -\partial_x^2 + u$ is
$$S(t) = \left(\{\chi_n, c_n(0)e^{4\chi_n^3 t}\}_{n=1}^N,\ R(k, 0)e^{8ik^3 t}\right).$$
We then apply GLM to obtain $u(x, t)$ for all time $t$. Schematically, the procedure is
$$\begin{array}{ccc}
u_0(x) & \xrightarrow{\ \text{construct scattering data},\ L = -\partial_x^2 + u_0(x)\ } & S(0)\\[4pt]
\Big\downarrow\ {\scriptstyle\text{KdV equation}} & & \Big\downarrow\ {\scriptstyle\text{evolve scattering data},\ L_t = [L, A]}\\[4pt]
u(x, t) & \xleftarrow{\ \text{solve GLM equation}\ } & S(t)
\end{array}$$
The key thing that makes this work is that $u_t + u_{xxx} - 6uu_x = 0$ holds if and only if $L_t = [L, A]$.
For comparison, this is what we would do if we had to solve
$$u_t + u_{xxx} = 0$$
by a Fourier transform:
$$\begin{array}{ccc}
u_0(x) & \xrightarrow{\ \text{Fourier transform}\ } & \hat{u}_0(k)\\[4pt]
\Big\downarrow\ {\scriptstyle u_t + u_{xxx} = 0} & & \Big\downarrow\ {\scriptstyle \hat{u}_t - ik^3\hat{u} = 0}\\[4pt]
u(x, t) & \xleftarrow{\ \text{inverse Fourier transform}\ } & \hat{u}(k, t) = \hat{u}_0(k)e^{ik^3 t}
\end{array}$$
It is just the same steps, but with a simpler transform!
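For illustration, here is a minimal numerical sketch (my own, with an arbitrary Gaussian initial condition) of the right-hand column of the diagram: solving $u_t + u_{xxx} = 0$ on a periodic domain by multiplying each Fourier mode by $e^{ik^3 t}$.

```python
import numpy as np

N, L = 256, 50.0
x = np.linspace(0, L, N, endpoint=False)
k = 2*np.pi*np.fft.fftfreq(N, d=L/N)      # angular wavenumbers

u0 = np.exp(-(x - L/2)**2)                # some initial condition
u0_hat = np.fft.fft(u0)

t = 1.0
# u_t + u_xxx = 0  =>  u_hat_t = i k^3 u_hat  =>  u_hat(t) = u_hat(0) e^{i k^3 t}
u_hat = u0_hat * np.exp(1j * k**3 * t)
u = np.real(np.fft.ifft(u_hat))           # the solution at time t
```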
3.5 Reflectionless potentials

We are now going to actually solve the KdV equation for a special kind of potential: reflectionless potentials.

Definition (Reflectionless potential). A reflectionless potential is a potential $u(x, 0)$ satisfying $R(k, 0) = 0$.

Now if $u$ evolves according to the KdV equation, then
$$R(k, t) = R(k, 0)e^{8ik^3 t} = 0.$$
So if a potential starts off reflectionless, then it remains reflectionless.

We now want to solve the GLM equation in this case. Using the notation from when we wrote down the GLM equation, we simply have
$$F(x) = \sum_{n=1}^N c_n^2 e^{-\chi_n x}.$$
We will mostly not write out the $t$ when we do this, and only put it back in at the very end. We now guess that the GLM equation has a solution of the form
$$K(x, y) = \sum_{m=1}^N K_m(x)e^{-\chi_m y}$$
for some unknown functions $\{K_m\}$ (in the second example sheet, we show that it must have this form). We substitute this into the GLM equation and find that
$$\sum_{n=1}^N\left[c_n^2 e^{-\chi_n x} + K_n(x) + \sum_{m=1}^N c_n^2 K_m(x)\int_x^\infty e^{-(\chi_n + \chi_m)z}\,\mathrm{d}z\right]e^{-\chi_n y} = 0.$$
Now notice that the $e^{-\chi_n y}$ for $n = 1, \cdots, N$ are linearly independent. So we actually have $N$ equations, one for each $n$. So we know that
$$c_n^2 e^{-\chi_n x} + K_n(x) + \sum_{m=1}^N\frac{c_n^2 K_m(x)}{\chi_n + \chi_m}e^{-(\chi_n + \chi_m)x} = 0\tag{$*$}$$
for all $n = 1, \cdots, N$. Now if our goal is to solve for the $K_n(x)$, then this is just a linear equation for each $x$! We set
$$\mathbf{c} = (c_1^2 e^{-\chi_1 x}, \cdots, c_N^2 e^{-\chi_N x})^T,\quad \mathbf{K} = (K_1(x), \cdots, K_N(x))^T,\quad A_{nm} = \delta_{nm} + \frac{c_n^2 e^{-(\chi_n + \chi_m)x}}{\chi_n + \chi_m}.$$
Then $(*)$ becomes
$$A\mathbf{K} = -\mathbf{c}.$$
This really is a linear algebra problem. But we don't really have to solve this. The thing we really want to know is
$$K(x, x) = \sum_{m=1}^N K_m(x)e^{-\chi_m x} = \sum_{m=1}^N\sum_{n=1}^N (A^{-1})_{mn}(-\mathbf{c})_n e^{-\chi_m x}.$$
Now note that
$$\frac{\mathrm{d}}{\mathrm{d}x}A_{nm}(x) = A'_{nm}(x) = -c_n^2 e^{-\chi_n x}e^{-\chi_m x} = (-\mathbf{c})_n e^{-\chi_m x}.$$
So we can replace the above expression by
$$K(x, x) = \sum_{m=1}^N\sum_{n=1}^N (A^{-1})_{mn}A'_{nm} = \operatorname{tr}(A^{-1}A').$$
It is an exercise on the second example sheet to show that this is equal to
$$K(x, x) = \frac{1}{\det A}\frac{\mathrm{d}}{\mathrm{d}x}(\det A) = \frac{\mathrm{d}}{\mathrm{d}x}\log(\det A).$$
So we have
$$u(x) = -2\frac{\mathrm{d}^2}{\mathrm{d}x^2}\log(\det A).$$
We now put back the $t$-dependence we didn't bother to write all along. Then we have
$$u(x, t) = -2\frac{\partial^2}{\partial x^2}\log(\det A(x, t)),$$
where
$$A_{nm}(x, t) = \delta_{nm} + \frac{c_n(0)^2 e^{8\chi_n^3 t}e^{-(\chi_n + \chi_m)x}}{\chi_n + \chi_m}.$$
It turns out these are soliton solutions, and the number of discrete eigenstates $N$ is just the number of solitons!
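As a sanity check, the formula above is easy to evaluate numerically. The following sketch (my own; the values of $\chi_n$ and $c_n(0)$ are arbitrary choices) computes a 2-soliton profile by taking $-2\,\partial_x^2\log\det A$ with finite differences.

```python
import numpy as np

chi = np.array([1.0, 2.0])          # discrete eigenvalues: lambda_n = -chi_n^2
c0  = np.array([1.0, 1.0])          # normalisation constants c_n(0)

def logdetA(x, t):
    c2 = c0**2 * np.exp(8 * chi**3 * t)                        # c_n(t)^2
    A = np.eye(len(chi)) + (c2[:, None] *
        np.exp(-(chi[:, None] + chi[None, :]) * x)) / (chi[:, None] + chi[None, :])
    return np.linalg.slogdet(A)[1]                              # log det A(x, t)

def u(x, t, h=1e-4):
    # u = -2 * d^2/dx^2 log det A, second derivative by central differences
    return -2 * (logdetA(x + h, t) - 2 * logdetA(x, t) + logdetA(x - h, t)) / h**2

xs = np.linspace(-10, 10, 401)
profile = [u(x, t=0.0) for x in xs]   # the 2-soliton profile at t = 0
```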
3.6 Infinitely many first integrals

As we've previously mentioned, we are expecting our integrable PDE's to have infinitely many first integrals. Recall we can construct $\varphi = \varphi(x, k, t)$ such that
$$L\varphi = k^2\varphi,$$
and we had
$$\varphi(x, k, t) = \begin{cases} e^{-ikx} & x \to -\infty\\ a(k, t)e^{-ikx} + b(k, t)e^{ikx} & x \to \infty.\end{cases}$$
But when we looked at the evolution of the scattering data, we could actually write down what $a$ and $b$ are. In particular, $a(k, t) = a(k)$ is independent of $t$. So we might be able to extract some first integrals from it. We have
$$e^{ikx}\varphi(x, k, t) = a(k) + b(k, t)e^{2ikx}\quad\text{as } x \to \infty.$$
We now take the average over $[R, 2R]$ for $R \to \infty$. We do the terms one by one. We have the boring integral
$$\frac{1}{R}\int_R^{2R}a(k)\,\mathrm{d}x = a(k).$$
For the $b(k, t)$ term, we have
$$\frac{1}{R}\int_R^{2R}b(k, t)e^{2ikx}\,\mathrm{d}x = O\!\left(\frac{1}{R}\right).$$
So we have
$$a(k) = \lim_{R\to\infty}\frac{1}{R}\int_R^{2R}e^{ikx}\varphi(x, k, t)\,\mathrm{d}x = \lim_{R\to\infty}\int_1^2 e^{ikRx}\varphi(Rx, k, t)\,\mathrm{d}x.$$
So can we figure out what this thing is? Since $\varphi = e^{-ikx}$ as $x \to -\infty$, it is "reasonable" to write
$$\varphi(x, k, t) = \exp\left(-ikx + \int_{-\infty}^x S(y, k, t)\,\mathrm{d}y\right)$$
for some function $S$. Then after some dubious manipulations, we would get
$$a(k) = \lim_{R\to\infty}\int_1^2\exp\left(\int_{-\infty}^{Rx}S(y, k, t)\,\mathrm{d}y\right)\mathrm{d}x = \exp\left(\int_{-\infty}^\infty S(y, k, t)\,\mathrm{d}y\right).\tag{$\dagger$}$$
Now this is interesting, since the left hand side $a(k)$ has no $t$-dependence, but the right-hand formula does. So this is where we get our first integrals from.

Now we need to figure out what $S$ is. To find $S$, recall that $\varphi$ satisfies
$$L\varphi = k^2\varphi.$$
So we just try to shove our formula for $\varphi$ into this equation. Notice that
$$\varphi_x = (S - ik)\varphi,\quad \varphi_{xx} = S_x\varphi + (S - ik)^2\varphi.$$
We then put these into the Schrödinger equation to find
$$S_x - (2ik)S + S^2 = u.$$
We have got no $\varphi$'s left. This is a famous type of equation: a Riccati-type equation. We can make a guess
$$S(x, k, t) = \sum_{n=1}^\infty\frac{S_n(x, t)}{(2ik)^n}.$$
This seems like a strange thing to guess, but there are indeed some good reasons for this we will not get into. Putting this into the equation and comparing coefficients of $(2ik)^{-n}$, we obtain a recurrence relation
$$S_1 = -u,\quad S_{n+1} = \frac{\mathrm{d}S_n}{\mathrm{d}x} + \sum_{m=1}^{n-1}S_m S_{n-m}.$$
This is a straightforward recurrence relation to compute. We can make a computer do this, and get
$$S_2 = -u_x,\quad S_3 = -u_{xx} + u^2,\quad S_4 = \cdots.$$
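Indeed, the recurrence is easy to automate. Here is a short sympy sketch (my own, not from the notes) that generates the first few $S_n$:

```python
import sympy as sp

x = sp.symbols('x')
u = sp.Function('u')(x)

# S_1 = -u, S_{n+1} = S_n' + sum_{m=1}^{n-1} S_m S_{n-m}
S = {1: -u}
for n in range(1, 5):
    S[n + 1] = sp.expand(sp.diff(S[n], x) +
                         sum(S[m] * S[n - m] for m in range(1, n)))

print(S[2])   # -u_x
print(S[3])   # u^2 - u_xx
print(S[4])   # 4 u u_x - u_xxx, a total derivative (as claimed below for even n)
```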
Using the expression for $S$ in $(\dagger)$, we find that
$$\log a(k) = \int_{-\infty}^\infty S(x, k, t)\,\mathrm{d}x = \sum_{n=1}^\infty\frac{1}{(2ik)^n}\int_{-\infty}^\infty S_n(x, t)\,\mathrm{d}x.$$
Since the LHS is time-independent, so is the RHS. Moreover, this is true for all $k$. So we know that
$$\int_{-\infty}^\infty S_n(x, t)\,\mathrm{d}x$$
must be constant in time!

We can explicitly compute the first few terms:
(i) For $n = 1$, we find a first integral
$$\int_{-\infty}^\infty u(x, t)\,\mathrm{d}x.$$
We can view this as a conservation of mass.
(ii) For $n = 2$, we obtain a first integral
$$\int_{-\infty}^\infty u_x(x, t)\,\mathrm{d}x.$$
This is actually boring, since we assumed that $u$ vanishes at infinity. So we knew this is always zero anyway.
(iii) For $n = 3$, we have
$$\int_{-\infty}^\infty(-u_{xx}(x, t) + u(x, t)^2)\,\mathrm{d}x = \int_{-\infty}^\infty u(x, t)^2\,\mathrm{d}x.$$
This is in some sense a conservation of momentum.

It is an exercise to show that $S_n$ is a total derivative for all even $n$, so those do not give any interesting conserved quantities. But still, half of infinity is infinity, and we do get infinitely many first integrals!
4 Structure of integrable PDEs
4.1 Infinite dimensional Hamiltonian system
When we did ODEs, our integrable ODEs were not just random ODEs. They came from some (finite-dimensional) Hamiltonian systems. If we view PDEs as infinite-dimensional ODEs, then it is natural to ask if we can generalize the notion of a Hamiltonian system to infinite-dimensional ones, and then see if we can put our integrable systems in the form of a Hamiltonian system. It turns out we can, and nice properties of the PDE fall out of this formalism.

We recall that a (finite-dimensional) phase space is given by $M = \mathbb{R}^{2n}$ and a non-degenerate anti-symmetric matrix $J$. Given a Hamiltonian function $H: M \to \mathbb{R}$, the equation of motion for $x(t) \in M$ becomes
$$\frac{\mathrm{d}x}{\mathrm{d}t} = J\frac{\partial H}{\partial x},$$
where $x(t)$ is a vector of length $2n$, $J$ is a non-degenerate anti-symmetric matrix, and $H = H(x)$ is the Hamiltonian.
In the infinite-dimensional case, instead of having $2n$ coordinates $x_i(t)$, we have a function $u(x, t)$ that depends continuously on the parameter $x$. When promoting finite-dimensional things to infinite-dimensional versions, we think of $x$ as a continuous version of the index $i$. We now proceed to generalize the notions we used in the finite-dimensional case to infinite-dimensional ones.

The first is the inner product. In the finite-dimensional case, we could take the inner product of two vectors by
$$x \cdot y = \sum x_i y_i.$$
Here we have an analogous inner product, but we replace the sum with an integral.

Notation. For functions $u(x)$ and $v(x)$, we write
$$\langle u, v\rangle = \int_{\mathbb{R}}u(x)v(x)\,\mathrm{d}x.$$
If $u, v$ are functions of time as well, then so is the inner product.
For finite-dimensional phase spaces, we talked about functions of $x$. In particular, we had the Hamiltonian $H(x)$. In the case of infinite-dimensional phase spaces, we will not consider arbitrary functions of $u$, but only functionals:

Definition (Functional). A functional $F$ is a real-valued function (on some function space) of the form
$$F[u] = \int_{\mathbb{R}}f(x, u, u_x, u_{xx}, \cdots)\,\mathrm{d}x.$$
Again, if $u$ is a function of time as well, then $F[u]$ is a function of time.

We used to be able to talk about the derivatives of functions. Time derivatives of $F$ would work just as well, but differentiating with respect to $u$ will involve the functional derivative, which you may have met in IB Variational Principles.

Definition (Functional derivative/Euler–Lagrange derivative). The functional derivative of $F = F[u]$ at $u$ is the unique function $\delta F$ satisfying
$$\langle\delta F, \eta\rangle = \lim_{\varepsilon \to 0}\frac{F[u + \varepsilon\eta] - F[u]}{\varepsilon}$$
for all smooth $\eta$ with compact support.

Alternatively, we have
$$F[u + \varepsilon\eta] = F[u] + \varepsilon\langle\delta F, \eta\rangle + o(\varepsilon).$$
Note that $\delta F$ is another function, depending on $u$.
Example. Set
$$F[u] = \frac{1}{2}\int u_x^2\,\mathrm{d}x.$$
We then have
$$\begin{aligned}
F[u + \varepsilon\eta] &= \frac{1}{2}\int(u_x + \varepsilon\eta_x)^2\,\mathrm{d}x\\
&= \frac{1}{2}\int u_x^2\,\mathrm{d}x + \varepsilon\int u_x\eta_x\,\mathrm{d}x + o(\varepsilon)\\
&= F[u] + \varepsilon\langle u_x, \eta_x\rangle + o(\varepsilon).
\end{aligned}$$
This is no good, because we want something of the form $\langle\delta F, \eta\rangle$, not an inner product with $\eta_x$. When in doubt, integrate by parts! This is just equal to
$$= F[u] + \varepsilon\langle -u_{xx}, \eta\rangle + o(\varepsilon).$$
Note that when integrating by parts, we don't have to mess with the boundary terms, because $\eta$ is assumed to have compact support. So we have
$$\delta F = -u_{xx}.$$
In general, from IB Variational Principles, we know that if
$$F[u] = \int f(x, u, u_x, u_{xx}, \cdots)\,\mathrm{d}x,$$
then we have
$$\delta F = \frac{\partial f}{\partial u} - D_x\frac{\partial f}{\partial u_x} + D_x^2\frac{\partial f}{\partial u_{xx}} - \cdots.$$
Here $D_x$ is the total derivative, which is different from the partial derivative.

Definition (Total derivative). Consider a function $f(x, u, u_x, \cdots)$. For any given function $u(x)$, the total derivative with respect to $x$ is
$$\frac{\mathrm{d}}{\mathrm{d}x}f(x, u(x), u_x(x), \cdots) = \frac{\partial f}{\partial x} + u_x\frac{\partial f}{\partial u} + u_{xx}\frac{\partial f}{\partial u_x} + \cdots.$$

Example.
$$\frac{\partial}{\partial x}(xu) = u,\quad D_x(xu) = u + xu_x.$$
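The general formula is easy to put on a computer. Below is a small sympy sketch (my own illustration, not from the notes) that implements $\delta F = \partial f/\partial u - D_x\,\partial f/\partial u_x + D_x^2\,\partial f/\partial u_{xx}$ by treating $u, u_x, u_{xx}$ as independent symbols and substituting back before taking total derivatives; it reproduces $\delta F = -u_{xx}$ for the example above.

```python
import sympy as sp

x, U, Ux, Uxx = sp.symbols('x U Ux Uxx')
u = sp.Function('u')(x)

def delta_F(f):
    """delta F = df/dU - D_x(df/dUx) + D_x^2(df/dUxx), where U, Ux, Uxx
    stand for u, u_x, u_xx; substitute back before total differentiation."""
    subs_back = {U: u, Ux: sp.diff(u, x), Uxx: sp.diff(u, x, 2)}
    return (sp.diff(f, U).subs(subs_back)
            - sp.diff(sp.diff(f, Ux).subs(subs_back), x)
            + sp.diff(sp.diff(f, Uxx).subs(subs_back), x, 2))

# f = 1/2 u_x^2  gives  delta F = -u_xx, matching the worked example.
print(sp.simplify(delta_F(sp.Rational(1, 2) * Ux**2)))

# f = 1/2 u_x^2 + u^3 (the KdV Hamiltonian density used later) gives 3u^2 - u_xx.
print(sp.simplify(delta_F(sp.Rational(1, 2) * Ux**2 + U**3)))
```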
Finally, we need to figure out an alternative for $J$. In the case of a finite-dimensional Hamiltonian system, it is healthy to think of it as an anti-symmetric bilinear form, so that $vJw$ is $J$ applied to $v$ and $w$. However, since we also have an inner product given by the dot product, we can alternatively think of $J$ as a linear map $\mathbb{R}^{2n} \to \mathbb{R}^{2n}$, so that we apply it as
$$v \cdot Jw = v^T Jw.$$
Using this $J$, we can define the Poisson bracket of $f = f(x)$, $g = g(x)$ by
$$\{f, g\} = \frac{\partial f}{\partial x}\cdot J\frac{\partial g}{\partial x}.$$
We know this is bilinear, antisymmetric and satisfies the Jacobi identity.
How do we promote this to infinite-dimensional Hamiltonian systems? We can just replace $\frac{\partial f}{\partial x}$ with the functional derivative and the dot product with the inner product. What we need is a replacement for $J$, which we will write as $\mathcal{J}$. There is no obvious candidate for $\mathcal{J}$, but assuming we have found a reasonable linear and antisymmetric candidate, we can make the following definition:

Definition (Poisson bracket for infinite-dimensional Hamiltonian systems). We define the Poisson bracket of two functionals to be
$$\{F, G\} = \langle\delta F, \mathcal{J}\delta G\rangle = \int\delta F(x)\,\mathcal{J}\delta G(x)\,\mathrm{d}x.$$
Since $\mathcal{J}$ is linear and antisymmetric, we know that this Poisson bracket is bilinear and antisymmetric. The annoying part is the Jacobi identity
$$\{F, \{G, H\}\} + \{G, \{H, F\}\} + \{H, \{F, G\}\} = 0.$$
This is not automatically satisfied. We need conditions on $\mathcal{J}$. The simplest antisymmetric linear map we can think of would be $\mathcal{J} = \partial_x$, and this works, i.e. the Jacobi identity is satisfied. Proving that is easy, but painful.
Finally, we get to the equations of motion. Recall that for finite-dimensional systems, our equation of evolution is given by
$$\frac{\mathrm{d}x}{\mathrm{d}t} = J\frac{\partial H}{\partial x}.$$
We make the obvious analogue here:

Definition (Hamiltonian form). An evolution equation for $u = u(x, t)$ is in Hamiltonian form if it can be written as
$$u_t = \mathcal{J}\frac{\delta H}{\delta u}$$
for some functional $H = H[u]$ and some linear, antisymmetric $\mathcal{J}$ such that the Poisson bracket
$$\{F, G\} = \langle\delta F, \mathcal{J}\delta G\rangle$$
obeys the Jacobi identity. Such a $\mathcal{J}$ is known as a Hamiltonian operator.

Definition (Hamiltonian operator). A Hamiltonian operator is a linear, antisymmetric operator $\mathcal{J}$ on the space of functions such that the induced Poisson bracket obeys the Jacobi identity.
Recall that for a finite-dimensional Hamiltonian system, if $f = f(x)$ is any function, then we had
$$\frac{\mathrm{d}f}{\mathrm{d}t} = \{f, H\}.$$
This generalizes to the infinite-dimensional case.

Proposition. If $u_t = \mathcal{J}\delta H$ and $I = I[u]$, then
$$\frac{\mathrm{d}I}{\mathrm{d}t} = \{I, H\}.$$
In particular, $I[u]$ is a first integral of $u_t = \mathcal{J}\delta H$ iff $\{I, H\} = 0$.

The proof is the same as in the finite-dimensional case.

Proof.
$$\frac{\mathrm{d}I}{\mathrm{d}t} = \lim_{\varepsilon\to 0}\frac{I[u + \varepsilon u_t] - I[u]}{\varepsilon} = \langle\delta I, u_t\rangle = \langle\delta I, \mathcal{J}\delta H\rangle = \{I, H\}.$$
In summary, we have the following correspondence between the $2n$-dimensional phase space and the infinite-dimensional phase space:
$$\begin{array}{rcl}
x_i(t),\ i = 1, \cdots, 2n & \longleftrightarrow & u(x, t),\ x \in \mathbb{R}\\
x \cdot y = \sum_i x_i y_i & \longleftrightarrow & \langle u, v\rangle = \int u(x, t)v(x, t)\,\mathrm{d}x\\
\dfrac{\mathrm{d}}{\mathrm{d}t} & \longleftrightarrow & \partial_t\\
\dfrac{\partial}{\partial x} & \longleftrightarrow & \dfrac{\delta}{\delta u}\\
\text{anti-symmetric matrix } J & \longleftrightarrow & \text{anti-symmetric linear operator } \mathcal{J}\\
\text{functions } f = f(x) & \longleftrightarrow & \text{functionals } F = F[u]
\end{array}$$
4.2 Bihamiltonian systems

So far, this is not too interesting, as we just generalized the finite-dimensional case in sort-of the obvious way. However, it is possible that the same PDE can be put into Hamiltonian form for different $\mathcal{J}$'s. These are known as bihamiltonian systems.

Definition (Bihamiltonian system). A PDE is bihamiltonian if it can be written in Hamiltonian form for two different $\mathcal{J}$'s.

It turns out that when this happens, the system has infinitely many first integrals in involution! We will prove this later on. This is rather miraculous!
Example. We can write the KdV equation in Hamiltonian form by
$$u_t = \mathcal{J}_1\delta H_1,\quad \mathcal{J}_1 = \partial_x,\quad H_1[u] = \int\left(\frac{1}{2}u_x^2 + u^3\right)\mathrm{d}x.$$
We can check that this says
$$u_t = \partial_x\left(\frac{\partial}{\partial u} - D_x\frac{\partial}{\partial u_x}\right)\left(\frac{1}{2}u_x^2 + u^3\right) = \partial_x(3u^2 - u_{xx}) = 6uu_x - u_{xxx},$$
and this is the KdV equation.
We can also write it as
$$u_t = \mathcal{J}_0\delta H_0,\quad \mathcal{J}_0 = -\partial_x^3 + 4u\partial_x + 2u_x,\quad H_0[u] = \int\frac{1}{2}u^2\,\mathrm{d}x.$$
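As a quick check (my own verification sketch, not part of the notes), one can confirm symbolically that both Hamiltonian forms produce the same right-hand side $6uu_x - u_{xxx}$:

```python
import sympy as sp

x = sp.symbols('x')
u = sp.Function('u')(x)
ux, uxx, uxxx = (sp.diff(u, x, n) for n in (1, 2, 3))

dH1 = 3*u**2 - uxx        # delta H_1 for H_1 = int (1/2 u_x^2 + u^3) dx
dH0 = u                   # delta H_0 for H_0 = int (1/2 u^2) dx

rhs1 = sp.diff(dH1, x)                                         # J_1 dH1, J_1 = d/dx
rhs0 = -sp.diff(dH0, x, 3) + 4*u*sp.diff(dH0, x) + 2*ux*dH0    # J_0 dH0
print(sp.simplify(rhs1 - rhs0))                 # 0
print(sp.simplify(rhs1 - (6*u*ux - uxxx)))      # 0: both equal the KdV right-hand side
```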
So KdV is bi-Hamiltonian. We then know that
$$\mathcal{J}_1\delta H_1 = \mathcal{J}_0\delta H_0.$$
We define a sequence of Hamiltonians $\{H_n\}_{n\geq 0}$ via
$$\mathcal{J}_1\delta H_{n+1} = \mathcal{J}_0\delta H_n.$$
We will assume that we can always solve for $H_{n+1}$ given $H_n$. This can be proven, but we shall not do so. We then have the following miraculous result.
Theorem. Suppose a system is bi-Hamiltonian via $(\mathcal{J}_0, H_0)$ and $(\mathcal{J}_1, H_1)$. It is a fact that we can find a sequence $\{H_n\}_{n\geq 0}$ such that
$$\mathcal{J}_1\delta H_{n+1} = \mathcal{J}_0\delta H_n.$$
Under these definitions, the $\{H_n\}$ are all first integrals of the system and are in involution, i.e.
$$\{H_n, H_m\} = 0$$
for all $n, m \geq 0$, where the Poisson bracket is taken with respect to $\mathcal{J}_1$.
Proof. We notice the following interesting fact: for $m \geq 1$, we have
$$\begin{aligned}
\{H_n, H_m\} &= \langle\delta H_n, \mathcal{J}_1\delta H_m\rangle\\
&= \langle\delta H_n, \mathcal{J}_0\delta H_{m-1}\rangle\\
&= -\langle\mathcal{J}_0\delta H_n, \delta H_{m-1}\rangle\\
&= -\langle\mathcal{J}_1\delta H_{n+1}, \delta H_{m-1}\rangle\\
&= \langle\delta H_{n+1}, \mathcal{J}_1\delta H_{m-1}\rangle\\
&= \{H_{n+1}, H_{m-1}\}.
\end{aligned}$$
Iterating this many times, we find that for any $n, m$, we have
$$\{H_n, H_m\} = \{H_m, H_n\}.$$
Then by antisymmetry, they must both vanish. So done.
4.3 Zero curvature representation

There is a more geometric way to talk about integrable systems, which is via zero-curvature representations.

Suppose we have a function $u(x, t)$, which we currently think of as being fixed. From this, we construct $N \times N$ matrices $U = U(\lambda)$ and $V = V(\lambda)$ that depend on $\lambda$, $u$ and its derivatives. The $\lambda$ will be thought of as a "spectral parameter", like the $\lambda$ in the eigenvalue problem $L\varphi = \lambda\varphi$.

Now consider the system of PDE's
$$\partial_x v = U(\lambda)v,\quad \partial_t v = V(\lambda)v,\tag{$\ddagger$}$$
where $v = v(x, t; \lambda)$ is an $N$-dimensional vector.
Now notice that here we have twice as many equations as there are unknowns. So we need some compatibility conditions. We use the fact that $v_{xt} = v_{tx}$. So we need
$$\begin{aligned}
0 &= \partial_t(U(\lambda)v) - \partial_x(V(\lambda)v)\\
&= U_t v + Uv_t - V_x v - Vv_x\\
&= U_t v + UVv - V_x v - VUv\\
&= (U_t - V_x + [U, V])v.
\end{aligned}$$
So we know that if a (non-trivial) solution to the PDE's exists for any initial $v_0$, then we must have
$$U_t - V_x + [U, V] = 0.$$
These are known as the zero curvature equations.

There is a beautiful theorem by Frobenius that if this equation holds, then solutions always exist. So we have found a correspondence between the existence of solutions to the PDE, and some equation in $U$ and $V$.
Why are these called the zero curvature equations? In differential geometry, a connection $A$ on a tangent bundle has a curvature given by the Riemann curvature tensor
$$R = \partial\Gamma - \partial\Gamma + \Gamma\Gamma - \Gamma\Gamma,$$
where $\Gamma$ is the Christoffel symbol associated to the connection. This equation is less silly than it seems, because each of the objects there has a bunch of indices, and the indices on consecutive terms are not equal. So they do not just outright cancel. In terms of the connection $A$, the curvature vanishes iff
$$\frac{\partial A_j}{\partial x^i} - \frac{\partial A_i}{\partial x^j} + [A_i, A_j] = 0,$$
which has the same form as the zero-curvature equation.
Example. Consider
$$U(\lambda) = \frac{i}{2}\begin{pmatrix}2\lambda & u_x\\ u_x & -2\lambda\end{pmatrix},\quad V(\lambda) = \frac{1}{4i\lambda}\begin{pmatrix}\cos u & -i\sin u\\ i\sin u & -\cos u\end{pmatrix}.$$
Then the zero curvature equation is equivalent to the sine–Gordon equation
$$u_{xt} = \sin u.$$
In other words, the sine–Gordon equation holds iff the PDEs $(\ddagger)$ have a solution.
In geometry, curvature is an intrinsic property of our geometric object, say a surface. If we want to compute the curvature, we usually pick some coordinate system, take the above expression, interpret it in that coordinate system, and evaluate it. However, we could pick a different coordinate system, and we get different expressions for each of, say, $\frac{\partial A_j}{\partial x^i}$. However, if the curvature vanishes in one coordinate system, then it should also vanish in any coordinate system. So by picking a new coordinate system, we have found new things that satisfy the zero curvature equation.

Back in the real world, in general, we can give a gauge transformation that takes some solution $(U, V)$ to a new $(\tilde{U}, \tilde{V})$ that preserves the zero curvature equation. So we can use gauge transformations to obtain a lot of new solutions! This will be explored in the last example sheet.

What are these zero-curvature representations good for? We don't have time to go deep into the matter, but they can be used to do some inverse-scattering type things. In the above formulation of the sine–Gordon equation, if $u_x \to 0$ as $|x| \to \infty$, we write
$$v = \begin{pmatrix}\psi_1\\ \psi_2\end{pmatrix}.$$
Then, as $|x| \to \infty$, we have
$$\partial_x\begin{pmatrix}\psi_1\\ \psi_2\end{pmatrix} = \frac{i}{2}\begin{pmatrix}2\lambda & u_x\\ u_x & -2\lambda\end{pmatrix}\begin{pmatrix}\psi_1\\ \psi_2\end{pmatrix} \to \begin{pmatrix}i\lambda\psi_1\\ -i\lambda\psi_2\end{pmatrix}.$$
So we know
$$\begin{pmatrix}\psi_1\\ \psi_2\end{pmatrix} \to A\begin{pmatrix}1\\ 0\end{pmatrix}e^{i\lambda x} + B\begin{pmatrix}0\\ 1\end{pmatrix}e^{-i\lambda x}$$
as $|x| \to \infty$. So with any $v$ satisfying the first equation in $(\ddagger)$, we can associate to it some "scattering data" $A, B$. Then the second equation in $(\ddagger)$ tells us how $v$, and thus $A, B$, evolves in time, and using this we can develop some inverse scattering-type way of solving the equation.
4.4 From Lax pairs to zero curvature

Lax pairs are very closely related to the zero curvature representation. Recall that we had this isospectral flow theorem: if Lax's equation
$$L_t = [L, A]$$
is satisfied, then the eigenvalues of $L$ are time-independent. Also, we found that our eigensolutions satisfied
$$\tilde\psi = \psi_t + A\psi = 0.$$
So we have two equations:
$$L\psi = \lambda\psi,\quad \psi_t + A\psi = 0.$$
Now suppose we reverse this: we enforce that $\lambda_t = 0$. Then differentiating the first equation and substituting in the second gives
$$L_t = [L, A].$$
So we can see Lax's equation as a compatibility condition for the two equations above. We will see that, given any equations of this form, we can transform them into a zero curvature form.
Note that if we have
$$L = \partial_x^n + \sum_{j=0}^{n-1}u_j(x, t)\partial_x^j,\quad A = \partial_x^m + \sum_{j=0}^{m-1}v_j(x, t)\partial_x^j,$$
then $L\psi = \lambda\psi$ means that derivatives of $\psi$ of order $\geq n$ can be expressed as linear combinations of derivatives of order $< n$. Indeed, we just have
$$\partial_x^n\psi = \lambda\psi - \sum_{j=0}^{n-1}u_j(x, t)\partial_x^j\psi.$$
Then differentiating this equation will give us an expression for the higher derivatives in terms of the lower ones.

Now by introducing the vector
$$\Psi = (\psi, \partial_x\psi, \cdots, \partial_x^{n-1}\psi)^T,$$
the equation $L\psi = \lambda\psi$ can be written as
$$\partial_x\Psi = U(\lambda)\Psi,$$
where
$$U(\lambda) = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0\\
0 & 0 & 1 & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & \cdots & 1\\
\lambda - u_0 & -u_1 & -u_2 & \cdots & -u_{n-1}
\end{pmatrix}.$$
Now differentiate "$\psi_t + A\psi = 0$" $i - 1$ times with respect to $x$ to obtain
$$(\partial_x^{i-1}\psi)_t + \partial_x^{i-1}(A\psi) = 0.$$
Using $L\psi = \lambda\psi$ to rewrite every derivative of order $\geq n$ in terms of lower-order ones, we can write
$$\partial_x^{i-1}(A\psi) = -\sum_{j=1}^n V_{ij}(x, t)\,\partial_x^{j-1}\psi$$
for some $V_{ij}(x, t)$ depending on the $v_j$, $u_i$ and their derivatives. We see that this equation then just says
$$\partial_t\Psi = V\Psi.$$
So we have shown that
$$L_t = [L, A] \iff \begin{cases}L\psi = \lambda\psi\\ \psi_t + A\psi = 0\end{cases} \iff \begin{cases}\Psi_x = U(\lambda)\Psi\\ \Psi_t = V(\lambda)\Psi\end{cases} \implies U_t - V_x + [U, V] = 0.$$
So we know that if something can be written in the form of Lax's equation, then we can come up with an equivalent equation in zero curvature form.
5 Symmetry methods in PDEs
Finally, we are now going to learn how we can exploit symmetries to solve
differential equations. A lot of the things we do will be done for ordinary
differential equations, but they all work equally well for partial differential
equations.
To talk about symmetries, we will have to use the language of groups. But
this time, since differential equations are continuous objects, we will not be
content with just groups. We will talk about smooth groups, or Lie groups. With
Lie groups, we can talk about continuous families of symmetries, as opposed to
the more “discrete” symmetries like the symmetries of a triangle.
At this point, the more applied students might be scared and want to run
away from the word “group”. However, understanding “pure” mathematics is
often very useful when doing applied things, as a lot of the structures we see in
the physical world can be explained by concepts coming from pure mathematics.
To demonstrate this, we offer the following cautionary tale, which may or may
not be entirely made up.
Back in the 60's, Gell-Mann was trying to understand the many different seemingly-fundamental particles occurring in nature. He decided one day that he should plot out the particles according to certain quantum numbers known as isospin and hypercharge. The resulting diagram looked like this:
So this is a nice picture, as it obviously formed some sort of lattice. However, it
is not clear how one can generalize this for more particles, or where this pattern
came from.
Now a pure mathematician happened to get lost, somehow wandered into the physics department, and saw that picture. He asked "so you are also interested in the eight-dimensional adjoint representation of su(3)?", and the physicist was like, "no. . . ?".
It turns out the weight diagram (whatever that might be) of the eight-
dimensional adjoint representation of
su
(3) (whatever that might be), looked
exactly like that. Indeed, it turns out there is a good correspondence between
representations of
su
(3) and quantum numbers of particles, and then the way to
understand and generalize this phenomenon became obvious.
5.1 Lie groups and Lie algebras
So to begin with, we remind ourselves of what a group is!

Definition (Group). A group is a set $G$ with a binary operation
$$(g_1, g_2) \mapsto g_1g_2$$
called "group multiplication", satisfying the axioms
(i) Associativity: $(g_1g_2)g_3 = g_1(g_2g_3)$ for all $g_1, g_2, g_3 \in G$;
(ii) Existence of identity: there is a (unique) identity element $e \in G$ such that $ge = eg = g$ for all $g \in G$;
(iii) Existence of inverses: for each $g \in G$, there is $g^{-1} \in G$ such that $gg^{-1} = g^{-1}g = e$.
Example. (Z, +) is a group.
What we are really interested in is how groups act on certain sets.
Definition (Group action). A group $G$ acts on a set $X$ if there is a map $G \times X \to X$ sending $(g, x) \mapsto g(x)$ such that
$$g(h(x)) = (gh)(x),\quad e(x) = x$$
for all $g, h \in G$ and $x \in X$.

Example. The rotation matrices $SO(2)$ act on $\mathbb{R}^2$ via matrix multiplication.
We are not going to consider groups in general, but we will only talk about
Lie groups, and coordinate changes born of them. For the sake of simplicity, we
are not going to use the “real” definition of Lie group, but use an easier version
that really looks more like the definition of a local Lie group than a Lie group.
The definition will probably be slightly confusing, but it will become clearer
with examples.
Definition (Lie group). An $m$-dimensional Lie group is a group such that all the elements depend continuously on $m$ parameters, in such a way that the maps $(g_1, g_2) \mapsto g_1g_2$ and $g \mapsto g^{-1}$ correspond to smooth functions of those parameters.

In practice, it suffices to check that the map $(g_1, g_2) \mapsto g_1g_2^{-1}$ is smooth.
So elements of an ($m$-dimensional) Lie group can be written as $g(\mathbf{t})$, where $\mathbf{t} \in \mathbb{R}^m$. We make the convention that $g(\mathbf{0}) = e$. For those who are doing differential geometry, this is a manifold with a group structure such that the group operations are smooth maps. For those who are doing category theory, this is a group object in the category of smooth manifolds.
Example. Any element of $G = SO(2)$ can be written as
$$g(t) = \begin{pmatrix}\cos t & -\sin t\\ \sin t & \cos t\end{pmatrix}$$
for $t \in \mathbb{R}$. So this is a candidate for a 1-dimensional Lie group that depends on a single parameter $t$. We now have to check that the map $(g_1, g_2) \mapsto g_1g_2^{-1}$ is smooth. We note that
$$g(t_1)^{-1} = g(-t_1).$$
So we have
$$g(t_1)g(t_2)^{-1} = g(t_1)g(-t_2) = g(t_1 - t_2).$$
So the map
$$(g_1, g_2) \mapsto g_1g_2^{-1}$$
corresponds to
$$(t_1, t_2) \mapsto t_1 - t_2.$$
Since this map is smooth, we conclude that $SO(2)$ is a 1-dimensional Lie group.
Example. Consider matrices of the form
$$g(\mathbf{t}) = \begin{pmatrix}1 & t_1 & t_3\\ 0 & 1 & t_2\\ 0 & 0 & 1\end{pmatrix},\quad \mathbf{t} \in \mathbb{R}^3.$$
It is easy to see that this is a group under matrix multiplication. This is known as the Heisenberg group. We now check that it is in fact a Lie group. It has three obvious parameters $t_1, t_2, t_3$, and we have to check the smoothness criterion. We have
$$g(\mathbf{a})g(\mathbf{b}) = \begin{pmatrix}1 & a_1 & a_3\\ 0 & 1 & a_2\\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix}1 & b_1 & b_3\\ 0 & 1 & b_2\\ 0 & 0 & 1\end{pmatrix} = \begin{pmatrix}1 & a_1 + b_1 & a_3 + b_3 + a_1b_2\\ 0 & 1 & a_2 + b_2\\ 0 & 0 & 1\end{pmatrix}.$$
We can then write down the inverse
$$g(\mathbf{b})^{-1} = \begin{pmatrix}1 & -b_1 & b_1b_2 - b_3\\ 0 & 1 & -b_2\\ 0 & 0 & 1\end{pmatrix}.$$
So we have
$$g(\mathbf{a})g(\mathbf{b})^{-1} = \begin{pmatrix}1 & a_1 & a_3\\ 0 & 1 & a_2\\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix}1 & -b_1 & b_1b_2 - b_3\\ 0 & 1 & -b_2\\ 0 & 0 & 1\end{pmatrix} = \begin{pmatrix}1 & a_1 - b_1 & b_1b_2 - b_3 - a_1b_2 + a_3\\ 0 & 1 & a_2 - b_2\\ 0 & 0 & 1\end{pmatrix}.$$
This then corresponds to
$$(\mathbf{a}, \mathbf{b}) \mapsto (a_1 - b_1,\ a_2 - b_2,\ b_1b_2 - b_3 - a_1b_2 + a_3),$$
which is a smooth map! So we conclude that the Heisenberg group is a three-dimensional Lie group.
Recall that at the beginning of the course, we had vector fields and flow
maps. Flow maps are hard and complicated, while vector fields are nice and easy.
Thus, we often want to reduce the study of flow maps to the study of vector
fields, which can be thought of as the “infinitesimal flow”. For example, checking
that two flows commute is very hard, but checking that the commutator of two
vector fields vanishes is easy.
Here we are going to do the same. Lie groups are hard. To make life easier,
we look at “infinitesimal” elements of Lie groups, and this is known as the Lie
algebra.
We will only study Lie algebras informally, and we’ll consider only the case
of matrix Lie groups, so that it makes sense to add, subtract, differentiate the
elements of the Lie group (in addition to the group multiplication), and the
presentation becomes much easier.
Suppose we have a curve $x_1(\varepsilon)$ in our parameter space passing through $0$ at $\varepsilon = 0$. Then we can obtain a curve
$$A(\varepsilon) = g(x_1(\varepsilon))$$
in our Lie group $G$. We set $a = A'(0)$, so that
$$A(\varepsilon) = I + \varepsilon a + o(\varepsilon).$$
We now define the Lie algebra $\mathfrak{g}$ to be the set of all "leading order terms" $a$ arising from such curves. We now proceed to show that $\mathfrak{g}$ is in fact a vector space.

Suppose we have a second curve $B(\varepsilon)$, which we expand similarly as
$$B(\varepsilon) = I + \varepsilon b + o(\varepsilon).$$
We will show that $a + b \in \mathfrak{g}$. Consider the curve
$$t \mapsto A(t)B(t),$$
using the multiplication in the Lie group. Then we have
$$A(\varepsilon)B(\varepsilon) = (I + \varepsilon a + o(\varepsilon))(I + \varepsilon b + o(\varepsilon)) = I + \varepsilon(a + b) + o(\varepsilon).$$
So we know $a, b \in \mathfrak{g}$ implies $a + b \in \mathfrak{g}$.

For scalar multiplication, given $\lambda \in \mathbb{R}$, we can construct a new curve
$$t \mapsto A(\lambda t).$$
Then we have
$$A(\lambda\varepsilon) = I + \varepsilon(\lambda a) + o(\varepsilon).$$
So if $a \in \mathfrak{g}$, then so is $\lambda a \in \mathfrak{g}$ for any $\lambda \in \mathbb{R}$.
So we get that $\mathfrak{g}$ has the structure of a vector space! This is already a little interesting. Groups are complicated. They have this weird structure and they are not necessarily commutative. However, we get a nice, easy vector space structure from the group structure.

It turns out we can do something more fun. The commutator of any two elements of $\mathfrak{g}$ is also in $\mathfrak{g}$. To see this, we define a curve $C(t)$ for $t > 0$ by
$$t \mapsto A(\sqrt{t})B(\sqrt{t})A(\sqrt{t})^{-1}B(\sqrt{t})^{-1}.$$
We now notice that $A(\varepsilon)^{-1} = I - \varepsilon a + o(\varepsilon)$, since if $A(\varepsilon)^{-1} = I + \varepsilon\tilde{a} + o(\varepsilon)$, then
$$I = A(\varepsilon)A(\varepsilon)^{-1} = (I + \varepsilon a + o(\varepsilon))(I + \varepsilon\tilde{a} + o(\varepsilon)) = I + \varepsilon(a + \tilde{a}) + o(\varepsilon).$$
So we must have $\tilde{a} = -a$. Then we have
$$C(\varepsilon) = (I + \sqrt{\varepsilon}\,a + \cdots)(I + \sqrt{\varepsilon}\,b + \cdots)(I - \sqrt{\varepsilon}\,a + \cdots)(I - \sqrt{\varepsilon}\,b + \cdots) = I + \varepsilon(ab - ba) + o(\varepsilon).$$
It is an exercise to show that this is actually true, because we have to keep track of the second order terms we didn't write out to make sure they cancel properly. So if $a, b \in \mathfrak{g}$, then
$$[a, b]_L = ab - ba \in \mathfrak{g}.$$
Vector spaces with this extra structure are called Lie algebras. The idea is that the Lie algebra consists of elements of the group infinitesimally close to the identity. While the product of two elements $a, b$ infinitesimally close to the identity need not remain infinitesimally close to the identity, the commutator $ab - ba$ does.
Definition (Lie algebra). A Lie algebra is a vector space $\mathfrak{g}$ equipped with a bilinear, anti-symmetric map $[\cdot, \cdot]_L: \mathfrak{g} \times \mathfrak{g} \to \mathfrak{g}$ that satisfies the Jacobi identity
$$[a, [b, c]_L]_L + [b, [c, a]_L]_L + [c, [a, b]_L]_L = 0.$$
This antisymmetric map is called the Lie bracket.

If $\dim\mathfrak{g} = m$, we say the Lie algebra has dimension $m$.
The main source of Lie algebras will come from Lie groups, but there are
many other examples.
Example. We can set $\mathfrak{g} = \mathbb{R}^3$ and
$$[\mathbf{a}, \mathbf{b}]_L = \mathbf{a}\times\mathbf{b}.$$
It is a straightforward (and messy) check to see that this is a Lie algebra.

Example. Let $M$ be our phase space, and let
$$\mathfrak{g} = \{f: M \to \mathbb{R}\text{ smooth}\}.$$
Then
$$[f, g]_L = \{f, g\}$$
makes $\mathfrak{g}$ a Lie algebra.
Example. We now find the Lie algebra of the matrix group $SO(n)$. We let
$$G = SO(n) = \{A \in \mathrm{Mat}_n(\mathbb{R}) : AA^T = I,\ \det A = 1\}.$$
We let $A(\varepsilon)$ be a curve in $G$ with $A(0) = I$. Then we have
$$I = A(\varepsilon)A(\varepsilon)^T = (I + \varepsilon a + o(\varepsilon))(I + \varepsilon a^T + o(\varepsilon)) = I + \varepsilon(a + a^T) + o(\varepsilon).$$
So we must have $a + a^T = 0$, i.e. $a$ is anti-symmetric. The other condition says
$$1 = \det A(\varepsilon) = \det(I + \varepsilon a + o(\varepsilon)) = 1 + \varepsilon\operatorname{tr}(a) + o(\varepsilon).$$
So we need $\operatorname{tr}(a) = 0$, but this is already satisfied since $a$ is antisymmetric.

So it looks like the Lie algebra $\mathfrak{g} = \mathfrak{so}(n)$ corresponding to $SO(n)$ is the vector space of anti-symmetric matrices:
$$\mathfrak{so}(n) = \{a \in \mathrm{Mat}_n(\mathbb{R}) : a + a^T = 0\}.$$
To see this really is the answer, we have to check that every antisymmetric matrix comes from some curve. It is an exercise to check that the curve
$$A(t) = \exp(at)$$
works.
We can manually check that $\mathfrak{g}$ is closed under the commutator
$$[a, b]_L = ab - ba.$$
Indeed, we have
$$[a, b]_L^T = (ab - ba)^T = b^Ta^T - a^Tb^T = ba - ab = -[a, b]_L.$$
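These claims are easy to check numerically. Here is a small sketch (my own, using random matrices) confirming that $\exp(at)$ lands in $SO(n)$ for antisymmetric $a$, and that the commutator of two antisymmetric matrices is again antisymmetric:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
m1 = rng.standard_normal((4, 4))
a = m1 - m1.T                      # an antisymmetric matrix
m2 = rng.standard_normal((4, 4))
b = m2 - m2.T                      # another antisymmetric matrix

A = expm(0.7 * a)                  # a point on the curve A(t) = exp(a t)
print(np.allclose(A @ A.T, np.eye(4)), np.isclose(np.linalg.det(A), 1.0))  # True True

comm = a @ b - b @ a               # [a, b]_L
print(np.allclose(comm, -comm.T))  # True: so(n) is closed under the bracket
```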
Note that it is standard that if we have a group whose name is in capital letters (e.g. $SO(n)$), then the corresponding Lie algebra is the same thing in lower case, fraktur letters (e.g. $\mathfrak{so}(n)$).

Note that above all else, $\mathfrak{g}$ is a vector space. So (at least if $\mathfrak{g}$ is finite-dimensional) we can give $\mathfrak{g}$ a basis $\{a_i\}_{i=1}^m$. Since the Lie bracket maps $\mathfrak{g}\times\mathfrak{g} \to \mathfrak{g}$, it must be the case that
$$[a_i, a_j]_L = \sum_{k=1}^m c^k_{ij}a_k$$
for some constants $c^k_{ij}$. These are known as the structure constants.
5.2 Vector fields and one-parameter groups of transformations

Ultimately, we will be interested in coordinate transformations born of the action of some Lie group. In other words, we let the Lie group act on our coordinate space (smoothly), and then use new coordinates
$$\tilde{x} = g(x),$$
where $g \in G$ for some Lie group $G$. For example, if $G$ is the group of rotations, then this gives new coordinates by rotating.

Recall that a vector field $\mathbf{V}: \mathbb{R}^n \to \mathbb{R}^n$ defines an integral curve through the point $x$ via the solution of the differential equations
$$\frac{\mathrm{d}\tilde{x}}{\mathrm{d}\varepsilon} = \mathbf{V}(\tilde{x}),\quad \tilde{x}(0) = x.$$
To represent solutions to this problem, we use the flow map $g^\varepsilon$ defined by
$$\tilde{x}(\varepsilon) = g^\varepsilon x = x + \varepsilon\mathbf{V}(x) + o(\varepsilon).$$
We call $\mathbf{V}$ the generator of the flow. This flow map is an example of a one-parameter group of transformations.

Definition (One-parameter group of transformations). A smooth map $g^\varepsilon: \mathbb{R}^n \to \mathbb{R}^n$ is called a one-parameter group of transformations (1.p.g.t.) if
$$g^0 = \mathrm{id},\quad g^{\varepsilon_1}g^{\varepsilon_2} = g^{\varepsilon_1+\varepsilon_2}.$$
We say such a one-parameter group of transformations is generated by the vector field
$$\mathbf{V}(x) = \left.\frac{\mathrm{d}}{\mathrm{d}\varepsilon}(g^\varepsilon x)\right|_{\varepsilon=0}.$$
Conversely, every vector field $\mathbf{V}: \mathbb{R}^n \to \mathbb{R}^n$ generates a one-parameter group of transformations via solutions of
$$\frac{\mathrm{d}\tilde{x}}{\mathrm{d}\varepsilon} = \mathbf{V}(\tilde{x}),\quad \tilde{x}(0) = x.$$
For some absurd reason, differential geometers decided that we should represent vector fields in a different way. This notation is standard but odd-looking, and is in many settings more convenient.

Notation. Consider a vector field $\mathbf{V} = (V_1, \cdots, V_n)^T: \mathbb{R}^n \to \mathbb{R}^n$. This vector field uniquely defines a differential operator
$$V = V_1\frac{\partial}{\partial x_1} + V_2\frac{\partial}{\partial x_2} + \cdots + V_n\frac{\partial}{\partial x_n}.$$
Conversely, any such linear differential operator gives us a vector field. We will identify a vector field with the associated differential operator, and we think of the $\frac{\partial}{\partial x_i}$ as a basis for our vector fields.

Example. We will write the vector field $\mathbf{V} = (x^2 + y, yx)$ as
$$V = (x^2 + y)\frac{\partial}{\partial x} + yx\frac{\partial}{\partial y}.$$
One good reason for using this definition is that we have a simple description of the commutator of two vector fields. Recall that the commutator of two vector fields $\mathbf{V}, \mathbf{W}$ was previously defined by
$$[\mathbf{V}, \mathbf{W}]_i = \left(\mathbf{V}\cdot\frac{\partial}{\partial\mathbf{x}}\mathbf{W} - \mathbf{W}\cdot\frac{\partial}{\partial\mathbf{x}}\mathbf{V}\right)_i = V_j\frac{\partial W_i}{\partial x_j} - W_j\frac{\partial V_i}{\partial x_j}.$$
Now if we think of the vector fields as differential operators, then we have
$$V = \mathbf{V}\cdot\frac{\partial}{\partial\mathbf{x}},\quad W = \mathbf{W}\cdot\frac{\partial}{\partial\mathbf{x}}.$$
The usual definition of commutator would then be
$$\begin{aligned}
(VW - WV)(f) &= V_j\frac{\partial}{\partial x_j}\left(W_i\frac{\partial f}{\partial x_i}\right) - W_j\frac{\partial}{\partial x_j}\left(V_i\frac{\partial f}{\partial x_i}\right)\\
&= \left(V_j\frac{\partial W_i}{\partial x_j} - W_j\frac{\partial V_i}{\partial x_j}\right)\frac{\partial f}{\partial x_i} + V_jW_i\frac{\partial^2 f}{\partial x_i\partial x_j} - W_jV_i\frac{\partial^2 f}{\partial x_i\partial x_j}\\
&= \left(V_j\frac{\partial W_i}{\partial x_j} - W_j\frac{\partial V_i}{\partial x_j}\right)\frac{\partial f}{\partial x_i}\\
&= [\mathbf{V}, \mathbf{W}]\cdot\frac{\partial}{\partial\mathbf{x}}f.
\end{aligned}$$
So with the new notation, we literally have
$$[V, W] = VW - WV.$$
We shall now look at some examples of vector fields and the one-parameter groups of transformations they generate. In simple cases, it is not hard to find the correspondence.

Example. Consider the vector field
$$V = x\frac{\partial}{\partial x} + \frac{\partial}{\partial y}.$$
This generates a one-parameter group of transformations via solutions to
$$\frac{\mathrm{d}\tilde{x}}{\mathrm{d}\varepsilon} = \tilde{x},\quad \frac{\mathrm{d}\tilde{y}}{\mathrm{d}\varepsilon} = 1,$$
where
$$(\tilde{x}(0), \tilde{y}(0)) = (x, y).$$
As we are well-trained with differential equations, we can just write down the solution
$$(\tilde{x}(\varepsilon), \tilde{y}(\varepsilon)) = g^\varepsilon(x, y) = (xe^\varepsilon, y + \varepsilon).$$
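As a tiny check (my own illustration), one can verify symbolically that this map satisfies both the defining ODE and the group law $g^{\varepsilon_1}g^{\varepsilon_2} = g^{\varepsilon_1+\varepsilon_2}$:

```python
import sympy as sp

x, y, e1, e2 = sp.symbols('x y epsilon_1 epsilon_2')

def g(e, x, y):
    return (x*sp.exp(e), y + e)

xt, yt = g(e1, x, y)
# d/d(epsilon) of the flow equals the vector field evaluated along the flow:
print(sp.simplify(sp.diff(xt, e1) - xt), sp.simplify(sp.diff(yt, e1) - 1))   # 0 0
# group law: g^{e1} g^{e2} = g^{e1 + e2}
print(sp.simplify(sp.Matrix(g(e1, *g(e2, x, y))) - sp.Matrix(g(e1 + e2, x, y))))
```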
Example. Consider the natural action of $SO(2) \cong S^1$ on $\mathbb{R}^2$ via
$$g^\varepsilon(x, y) = (x\cos\varepsilon - y\sin\varepsilon,\ y\cos\varepsilon + x\sin\varepsilon).$$
We can show that $g^0 = \mathrm{id}$ and $g^{\varepsilon_1}g^{\varepsilon_2} = g^{\varepsilon_1+\varepsilon_2}$. The generator of this flow is
$$V = \left.\frac{\mathrm{d}\tilde{x}}{\mathrm{d}\varepsilon}\right|_{\varepsilon=0}\frac{\partial}{\partial x} + \left.\frac{\mathrm{d}\tilde{y}}{\mathrm{d}\varepsilon}\right|_{\varepsilon=0}\frac{\partial}{\partial y} = -y\frac{\partial}{\partial x} + x\frac{\partial}{\partial y}.$$
Plotted, this is the familiar vector field of anticlockwise rotation about the origin.
Example. If
$$V = \alpha\frac{\partial}{\partial x},$$
then we have
$$g^\varepsilon x = x + \alpha\varepsilon.$$
This is a translation with constant speed.

If we instead have
$$V = \beta x\frac{\partial}{\partial x},$$
then we have
$$g^\varepsilon x = e^{\beta\varepsilon}x,$$
which is scaling $x$ up at an exponentially growing rate.
How does this study of one-parameter groups of transformations relate to our study of Lie groups? It turns out the action of a Lie group on $\mathbb{R}^n$ can be reduced to the study of one-parameter groups of transformations. If a Lie group $G$ acts on $\mathbb{R}^n$, then it might contain many one-parameter groups of transformations. More precisely, we could find some elements $g^\varepsilon \in G$ depending smoothly on $\varepsilon$ such that the action of $g^\varepsilon$ on $\mathbb{R}^n$ is a one-parameter group of transformations.

It turns out that Lie groups contain a lot of one-parameter groups of transformations. In general, given any $g(\mathbf{t}) \in G$ (in a neighbourhood of $e \in G$), we can reach it via a sequence of one-parameter groups of transformations:
$$g(\mathbf{t}) = g^{\varepsilon_1}_{i_1}g^{\varepsilon_2}_{i_2}\cdots g^{\varepsilon_N}_{i_N}.$$
So to understand a Lie group, we just have to understand the one-parameter groups of transformations. And to understand these one-parameter groups, we just have to understand the vector fields that generate them, i.e. the Lie algebra, and this is much easier to deal with than the group itself!
5.3 Symmetries of differential equations

So far we've just been talking about Lie groups in general. We now try to apply this to differential equations. We will want to know when a one-parameter group of transformations is a symmetry of a differential equation.

We denote a general (ordinary) differential equation by
$$\Delta[x, u, u_x, u_{xx}, \cdots] = 0.$$
Note that in general, $\Delta$ can be a vector, so that we can have a system of equations. We say $u = u(x)$ is a solution to the differential equation if it satisfies the above equation.

Suppose $g^\varepsilon$ is a one-parameter group of transformations generated by a vector field $V$, and consider the new coordinates
$$(\tilde{x}, \tilde{u}) = g^\varepsilon(x, u).$$
Note that we transform both the domain $x$ and the codomain $u$ of the function $u(x)$, and we are allowed to mix them together.

We call $g^\varepsilon$ a Lie point symmetry of $\Delta$ if
$$\Delta[x, u, u_x, \cdots] = 0 \implies \Delta[\tilde{x}, \tilde{u}, \tilde{u}_{\tilde{x}}, \cdots] = 0.$$
In other words, it takes solutions to solutions. We say this Lie point symmetry is generated by $V$.
Example. Consider the KdV equation
$$\Delta = u_t + u_{xxx} - 6uu_x = 0.$$
Then translation in the $t$ direction, given by
$$g^\varepsilon(x, t, u) = (x, t + \varepsilon, u),$$
is a Lie point symmetry. This is generated by
$$V = \frac{\partial}{\partial t}.$$
Indeed, by the chain rule, we have
$$\tilde{u}_{\tilde{t}} = \frac{\partial u}{\partial\tilde{t}} = \frac{\partial t}{\partial\tilde{t}}\frac{\partial u}{\partial t} + \frac{\partial x}{\partial\tilde{t}}\frac{\partial u}{\partial x} = u_t.$$
Similarly, we have
$$\tilde{u}_{\tilde{x}} = u_x,\quad \tilde{u}_{\tilde{x}\tilde{x}\tilde{x}} = u_{xxx}.$$
So if
$$\Delta[x, t, u] = 0,$$
then we also have
$$\Delta[\tilde{x}, \tilde{t}, \tilde{u}] = \Delta[x, t, u] = 0.$$
In other words, the vector field $V = \partial_t$ generates a Lie point symmetry of the KdV equation.
Obviously Lie point symmetries give us new solutions from old ones. More importantly, we can use them to solve equations!

Example. Consider the ODE
$$\frac{\mathrm{d}u}{\mathrm{d}x} = F\!\left(\frac{u}{x}\right).$$
We see that there are things that look like $u/x$ on both sides. So it is not too hard to see that this admits the Lie point symmetry
$$g^\varepsilon(x, u) = (e^\varepsilon x, e^\varepsilon u).$$
This Lie point symmetry is generated by
$$V = x\frac{\partial}{\partial x} + u\frac{\partial}{\partial u}.$$
The trick is to find coordinates $(s, t)$ such that $V(s) = 0$ and $V(t) = 1$. We call these "invariant coordinates". Then since $V$ is still a symmetry of the equation, this suggests that $t$ should not appear explicitly in the differential equation, and this will in general make our lives easier. Of course, terms like $t_s$ can still appear, because translating $t$ by a constant does not change $t_s$.

We pick
$$s = \frac{u}{x},\quad t = \log|x|,$$
which do indeed satisfy $V(s) = 0$, $V(t) = 1$. We can invert these to get
$$x = e^t,\quad u = se^t.$$
With respect to the $(s, t)$ coordinates, the ODE becomes
$$\frac{\mathrm{d}t}{\mathrm{d}s} = \frac{1}{F(s) - s},$$
at least for $F(s) \neq s$. As promised, this has no explicit $t$-dependence. So we can actually integrate this thing up. We can write the solution as
$$t = C + \int^s\frac{\mathrm{d}s'}{F(s') - s'}.$$
Going back to the original coordinates, we know
$$\log|x| = C + \int^{u/x}\frac{\mathrm{d}s}{F(s) - s}.$$
If we actually had an expression for $F$ and did the integral, we could potentially rearrange this to get an expression for $u$ in terms of $x$. So the knowledge of the Lie point symmetry allowed us to integrate up our ODE.
In general, for an $n$th order ODE
$$\Delta[x, u, u', \cdots, u^{(n)}] = 0$$
admitting a Lie point symmetry generated by
$$V = \xi(x, u)\frac{\partial}{\partial x} + \eta(x, u)\frac{\partial}{\partial u},$$
we introduce coordinates
$$s = s(u, x),\quad t = t(u, x)$$
such that in the new coordinates, we have
$$V = \frac{\partial}{\partial t}.$$
This means that in the new coordinates, the ODE has the form
$$\Delta[s, t', \cdots, t^{(n)}] = 0.$$
Note that there is no explicit $t$! We can now set $r = t'$, so we get an $(n-1)$th order ODE
$$\Delta[s, r, r', \cdots, r^{(n-1)}] = 0,$$
i.e. we have reduced the order of the ODE by 1. Now rinse and repeat.
5.4 Jets and prolongations

This is all nice, but we still need a way to find Lie point symmetries. So far, we have just found them by divine inspiration, which is not particularly helpful. Is there a more systematic way of finding Lie symmetries?

We can start by looking at the trivial case, a 0th order "ODE"
$$\Delta[x, u] = 0.$$
Then we know $g^\varepsilon: (x, u) \mapsto (\tilde{x}, \tilde{u})$ is a Lie point symmetry if
$$\Delta[x, u] = 0 \implies \Delta[\tilde{x}, \tilde{u}] = \Delta[g^\varepsilon(x, u)] = 0.$$
Can we reduce this to a statement about the generator of $g^\varepsilon$? Here we need to assume that $\Delta$ is of maximal rank, i.e. the matrix of derivatives
$$\frac{\partial\Delta_j}{\partial y_i}$$
is of maximal rank, where the $y_i$ run over $x, u$, and in general all coordinates. So for example, the following theory will not work if, say, $\Delta[x, u] = x^2$. Assuming $\Delta$ is indeed of maximal rank, it is an exercise on the example sheet to see that if $V$ is the generator of $g^\varepsilon$, then $g^\varepsilon$ is a Lie point symmetry iff
$$\Delta = 0 \implies V(\Delta) = 0.$$
This essentially says that the flow preserves $\Delta = 0$ iff the derivative of $\Delta$ along $V$ vanishes there, which makes sense. Here we are thinking of $V$ as a differential operator. We call this constraint an on-shell condition, because we only impose it whenever $\Delta = 0$ is satisfied, instead of at all points.

This equivalent statement is very easy! This is just an algebraic equation for the coefficients of $V$, and it is in general very easy to solve!
However, as you may have noticed, these aren't really ODE's. They are just algebraic equations. So how do we generalize this to ODE's of order $N \geq 1$? Consider a general vector field
$$V(x, u) = \xi(x, u)\frac{\partial}{\partial x} + \eta(x, u)\frac{\partial}{\partial u}.$$
This only knows what to do to $x$ and $u$. But if we know how $x$ and $u$ change, we should also know how $u_x$, $u_{xx}$ etc. change. Indeed this is true, and extending the action of $V$ to the derivatives is known as the prolongation of the vector field. We start with a concrete example.
Example. Consider the one-parameter group of transformations
$$g^\varepsilon: (x, u) \mapsto (e^\varepsilon x, e^{-\varepsilon}u) = (\tilde{x}, \tilde{u})$$
with generator
$$V = x\frac{\partial}{\partial x} - u\frac{\partial}{\partial u}.$$
This induces a transformation
$$(x, u, u_x) \mapsto (\tilde{x}, \tilde{u}, \tilde{u}_{\tilde{x}}).$$
By the chain rule, we know
$$\frac{\mathrm{d}\tilde{u}}{\mathrm{d}\tilde{x}} = \frac{\mathrm{d}\tilde{u}/\mathrm{d}x}{\mathrm{d}\tilde{x}/\mathrm{d}x} = e^{-2\varepsilon}u_x.$$
So in fact
$$(\tilde{x}, \tilde{u}, \tilde{u}_{\tilde{x}}) = (e^\varepsilon x, e^{-\varepsilon}u, e^{-2\varepsilon}u_x).$$
If we call $(x, u)$ coordinates for the base space, then we call the extended system $(x, u, u_x)$ coordinates for the first jet space. Given any function $u = u(x)$, we will get a point $(x, u, u_x)$ in the jet space for each $x$.

What we've just seen is that a one-parameter group of transformations of the base space induces a one-parameter group of transformations of the first jet space. This is known as the prolongation, written
$$\mathrm{pr}^{(1)}g^\varepsilon: (x, u, u_x) \mapsto (\tilde{x}, \tilde{u}, \tilde{u}_{\tilde{x}}) = (e^\varepsilon x, e^{-\varepsilon}u, e^{-2\varepsilon}u_x).$$
One might find it a bit strange to call $u_x$ a coordinate. If we don't like doing that, we can just replace $u_x$ with a different symbol $p_1$. If we have the $n$th derivative, we replace the $n$th derivative with $p_n$.

Since we have a one-parameter group of transformations, we can write down the generator. We see that $\mathrm{pr}^{(1)}g^\varepsilon$ is generated by
$$\mathrm{pr}^{(1)}V = x\frac{\partial}{\partial x} - u\frac{\partial}{\partial u} - 2u_x\frac{\partial}{\partial u_x}.$$
This is called the first prolongation of $V$.
Of course, we can keep on going. Similarly, $\mathrm{pr}^{(2)}g^\varepsilon$ acts on the second jet space, which has coordinates $(x, u, u_x, u_{xx})$. In this case, we have
$$\mathrm{pr}^{(2)}g^\varepsilon: (x, u, u_x, u_{xx}) \mapsto (\tilde{x}, \tilde{u}, \tilde{u}_{\tilde{x}}, \tilde{u}_{\tilde{x}\tilde{x}}) = (e^\varepsilon x, e^{-\varepsilon}u, e^{-2\varepsilon}u_x, e^{-3\varepsilon}u_{xx}).$$
This is then generated by
$$\mathrm{pr}^{(2)}V = x\frac{\partial}{\partial x} - u\frac{\partial}{\partial u} - 2u_x\frac{\partial}{\partial u_x} - 3u_{xx}\frac{\partial}{\partial u_{xx}}.$$
Note that we don't have to recompute all terms. The $x, u, u_x$ terms did not change, so we only need to check what happens to $\tilde{u}_{\tilde{x}\tilde{x}}$.
We can now think of an $n$th order ODE
$$\Delta[x, u, u_x, \cdots, u^{(n)}] = 0$$
as an algebraic equation on the $n$th jet space. Of course, this is not just an arbitrary algebraic equation. We will only consider solutions in the $n$th jet space that come from some function $u = u(x)$. Similarly, we only consider symmetries of the $n$th jet space that come from the prolongation of some transformation on the base space.

With that restriction in mind, we have effectively dressed up our problem into an algebraic problem, just like the case of $\Delta[x, u] = 0$ we discussed at the beginning. Then $g^\varepsilon: (x, u) \mapsto (\tilde{x}, \tilde{u})$ is a Lie point symmetry if
$$\Delta[\tilde{x}, \tilde{u}, \tilde{u}_{\tilde{x}}, \ldots, \tilde{u}^{(n)}] = 0$$
when $\Delta = 0$. Or equivalently, we need
$$\Delta[\mathrm{pr}^{(n)}g^\varepsilon(x, u, \ldots, u^{(n)})] = 0$$
when $\Delta = 0$. This is just a one-parameter group of transformations on a huge coordinate system, the jet space. Thinking of all of $x, u, \cdots, u^{(n)}$ as just independent coordinates, we can rewrite it in terms of vector fields. (Assuming maximal rank) this is equivalent to asking for
$$\mathrm{pr}^{(n)}V(\Delta) = 0\quad\text{whenever }\Delta = 0.$$
This results in an overdetermined system of differential equations for $(\xi, \eta)$, where
$$V(x, u) = \xi(x, u)\frac{\partial}{\partial x} + \eta(x, u)\frac{\partial}{\partial u}.$$
Now in order to actually use this, we need to be able to compute the $n$th prolongation of an arbitrary vector field. This is what we are going to do next.

Note that if we tried to compute the prolongation of the action of the Lie group, then it would be horrendous. However, what we actually need to compute is the prolongation of the vector field, which is about the Lie algebra. This makes it much nicer.
We can write
$$g^\varepsilon(x, u) = (\tilde{x}, \tilde{u}) = (x + \varepsilon\xi(x, u),\ u + \varepsilon\eta(x, u)) + o(\varepsilon).$$
We know the $n$th prolongation of $V$ must be of the form
$$\mathrm{pr}^{(n)}V = V + \sum_{k=1}^n\eta_k\frac{\partial}{\partial u^{(k)}},$$
where we have to find out what the $\eta_k$ are. Then we know the $\eta_k$ will satisfy
$$\mathrm{pr}^{(n)}g^\varepsilon(x, u, \cdots, u^{(n)}) = (\tilde{x}, \tilde{u}, \cdots, \tilde{u}^{(n)}) = (x + \varepsilon\xi,\ u + \varepsilon\eta,\ u_x + \varepsilon\eta_1,\ \cdots,\ u^{(n)} + \varepsilon\eta_n) + o(\varepsilon).$$
To find $\eta_1$, we use the contact condition
$$\mathrm{d}\tilde{u} = \frac{\mathrm{d}\tilde{u}}{\mathrm{d}\tilde{x}}\,\mathrm{d}\tilde{x} = \tilde{u}_{\tilde{x}}\,\mathrm{d}\tilde{x}.$$
We now use the fact that
$$\tilde{x} = x + \varepsilon\xi(x, u) + o(\varepsilon),\quad \tilde{u} = u + \varepsilon\eta(x, u) + o(\varepsilon).$$
Substituting in, we have
$$\mathrm{d}u + \varepsilon\,\mathrm{d}\eta = \tilde{u}_{\tilde{x}}(\mathrm{d}x + \varepsilon\,\mathrm{d}\xi) + o(\varepsilon).$$
We want to write everything in terms of $\mathrm{d}x$. We have
$$\mathrm{d}u = u_x\,\mathrm{d}x$$
and
$$\mathrm{d}\eta = \frac{\partial\eta}{\partial x}\mathrm{d}x + \frac{\partial\eta}{\partial u}\mathrm{d}u = \left(\frac{\partial\eta}{\partial x} + u_x\frac{\partial\eta}{\partial u}\right)\mathrm{d}x = D_x\eta\,\mathrm{d}x,$$
where $D_x$ is the total derivative
$$D_x = \frac{\partial}{\partial x} + u_x\frac{\partial}{\partial u} + u_{xx}\frac{\partial}{\partial u_x} + \cdots.$$
We similarly have
$$\mathrm{d}\xi = D_x\xi\,\mathrm{d}x.$$
So substituting in, we have
$$(u_x + \varepsilon D_x\eta)\,\mathrm{d}x = \tilde{u}_{\tilde{x}}(1 + \varepsilon D_x\xi)\,\mathrm{d}x + o(\varepsilon).$$
This implies that
$$\tilde{u}_{\tilde{x}} = \frac{u_x + \varepsilon D_x\eta}{1 + \varepsilon D_x\xi} + o(\varepsilon) = (u_x + \varepsilon D_x\eta)(1 - \varepsilon D_x\xi) + o(\varepsilon) = u_x + \varepsilon(D_x\eta - u_xD_x\xi) + o(\varepsilon).$$
So we have
$$\eta_1 = D_x\eta - u_xD_x\xi.$$
Now building up the $\eta_k$ recursively, we use the contact condition
$$\mathrm{d}\tilde{u}^{(k)} = \frac{\mathrm{d}\tilde{u}^{(k)}}{\mathrm{d}\tilde{x}}\,\mathrm{d}\tilde{x} = \tilde{u}^{(k+1)}\,\mathrm{d}\tilde{x}.$$
We use
$$\tilde{u}^{(k)} = u^{(k)} + \varepsilon\eta_k + o(\varepsilon),\quad \tilde{x} = x + \varepsilon\xi + o(\varepsilon).$$
Substituting that back in, we get
$$(u^{(k+1)} + \varepsilon D_x\eta_k)\,\mathrm{d}x = \tilde{u}^{(k+1)}(1 + \varepsilon D_x\xi)\,\mathrm{d}x + o(\varepsilon).$$
So we get
$$\tilde{u}^{(k+1)} = (u^{(k+1)} + \varepsilon D_x\eta_k)(1 - \varepsilon D_x\xi) + o(\varepsilon) = u^{(k+1)} + \varepsilon(D_x\eta_k - u^{(k+1)}D_x\xi) + o(\varepsilon).$$
So we know
$$\eta_{k+1} = D_x\eta_k - u^{(k+1)}D_x\xi.$$
In other words, we have proved:

Proposition (Prolongation formula). Let
$$V(x, u) = \xi(x, u)\frac{\partial}{\partial x} + \eta(x, u)\frac{\partial}{\partial u}.$$
Then we have
$$\mathrm{pr}^{(n)}V = V + \sum_{k=1}^n\eta_k\frac{\partial}{\partial u^{(k)}},$$
where
$$\eta_0 = \eta(x, u),\quad \eta_{k+1} = D_x\eta_k - u^{(k+1)}D_x\xi.$$
Example. For
$$g^\varepsilon: (x, u) \mapsto (e^\varepsilon x, e^{-\varepsilon}u),$$
we have
$$V = x\frac{\partial}{\partial x} + (-u)\frac{\partial}{\partial u}.$$
So we have
$$\xi(x, u) = x,\quad \eta(x, u) = -u.$$
So by the prolongation formula, we have
$$\mathrm{pr}^{(1)}V = V + \eta_1\frac{\partial}{\partial u_x},$$
where
$$\eta_1 = D_x(-u) - u_xD_x(x) = -2u_x,$$
in agreement with what we had earlier!
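The recurrence is easily automated. The following sympy sketch (my own implementation of the formula above, with $u, u_x, u_{xx}, \dots$ treated as independent jet coordinates) recovers the prolongation coefficients for this scaling symmetry:

```python
import sympy as sp

x = sp.symbols('x')
U = sp.symbols('u u1 u2 u3 u4')       # u, u_x, u_xx, ... as jet coordinates

def D_x(f):
    # total derivative: d/dx + u_x d/du + u_xx d/du_x + ...
    return sp.diff(f, x) + sum(U[k+1]*sp.diff(f, U[k]) for k in range(len(U)-1))

def prolongation_coeffs(xi, eta, n):
    etas = [eta]
    for k in range(n):
        etas.append(sp.expand(D_x(etas[-1]) - U[k+1]*D_x(xi)))
    return etas[1:]                   # [eta_1, ..., eta_n]

# The scaling symmetry (x, u) -> (e^eps x, e^{-eps} u): xi = x, eta = -u.
print(prolongation_coeffs(x, -U[0], 3))
# [-2*u1, -3*u2, -4*u3], matching pr^(2)V = ... - 2 u_x d/du_x - 3 u_xx d/du_xx.
```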
In the last example sheet, we will derive an analogous prolongation formula
for PDEs.
5.5 Painlevé test and integrability

We end with a section on the Painlevé test. If someone just gave us a PDE, how can we figure out if it is integrable? It turns out there are some necessary conditions for integrability we can check.

Recall the following definition.

Definition (Singularity). A singularity of a complex-valued function $w = w(z)$ is a place at which it loses analyticity.

These can be poles, branch points, essential singularities etc.

Suppose we had an ODE of the form
$$\frac{\mathrm{d}^2w}{\mathrm{d}z^2} + p(z)\frac{\mathrm{d}w}{\mathrm{d}z} + q(z)w = 0,$$
and we want to know if the solutions have singularities. It turns out that any singularity of a solution $w = w(z)$ must be inherited from the functions $p(z), q(z)$. In particular, the locations of the singularities will not depend on the initial conditions $w(z_0), w'(z_0)$.

This is not the case for non-linear ODE's. For example, the equation
$$\frac{\mathrm{d}w}{\mathrm{d}z} + w^2 = 0$$
gives us
$$w(z) = \frac{1}{z - z_0}.$$
The location of this singularity changes, and it depends on the initial condition. We say it is movable.
This leads to the following definition:

Definition (Painlevé property). We will say that an ODE of the form
$$\frac{\mathrm{d}^nw}{\mathrm{d}z^n} = F\!\left(\frac{\mathrm{d}^{n-1}w}{\mathrm{d}z^{n-1}}, \cdots, w, z\right)$$
has the Painlevé property if the movable singularities of its solutions are at worst poles.

Example. The equation
$$\frac{\mathrm{d}w}{\mathrm{d}z} + w^2 = 0$$
has a solution
$$w(z) = \frac{1}{z - z_0}.$$
Since this movable singularity is a pole, this has the Painlevé property.
Example. Consider the equation
$$\frac{\mathrm{d}w}{\mathrm{d}z} + w^3 = 0.$$
Then the solution is
$$w(z) = \frac{1}{\sqrt{2(z - z_0)}},$$
whose movable singularity is not a pole, so this does not have the Painlevé property.
In the olden days, Painlevé wanted to classify all ODE's of the form
$$\frac{\mathrm{d}^2w}{\mathrm{d}z^2} = F\!\left(\frac{\mathrm{d}w}{\mathrm{d}z}, w, z\right),$$
where $F$ is a rational function, that have the Painlevé property.

He managed to show that there are fifty such equations (up to simple coordinate transformations). The interesting thing is that 44 of these can be solved in terms of well-known functions, e.g. Jacobi elliptic functions, Weierstrass functions, Bessel functions etc.

The remaining six gave way to solutions that were genuinely new functions, called the six Painlevé transcendents. The six differential equations are
$$\text{(PI)}\quad \frac{\mathrm{d}^2w}{\mathrm{d}z^2} = 6w^2 + z$$
$$\text{(PII)}\quad \frac{\mathrm{d}^2w}{\mathrm{d}z^2} = 2w^3 + zw + \alpha$$
$$\text{(PIII)}\quad \frac{\mathrm{d}^2w}{\mathrm{d}z^2} = \frac{1}{w}\left(\frac{\mathrm{d}w}{\mathrm{d}z}\right)^2 - \frac{1}{z}\frac{\mathrm{d}w}{\mathrm{d}z} + \frac{\alpha w^2 + \beta}{z} + \gamma w^3 + \frac{\delta}{w}$$
$$\text{(PIV)}\quad \frac{\mathrm{d}^2w}{\mathrm{d}z^2} = \frac{1}{2w}\left(\frac{\mathrm{d}w}{\mathrm{d}z}\right)^2 + \frac{3w^3}{2} + 4zw^2 + 2(z^2 - \alpha)w + \frac{\beta}{w}$$
$$\text{(PV)}\quad \frac{\mathrm{d}^2w}{\mathrm{d}z^2} = \left(\frac{1}{2w} + \frac{1}{w - 1}\right)\left(\frac{\mathrm{d}w}{\mathrm{d}z}\right)^2 - \frac{1}{z}\frac{\mathrm{d}w}{\mathrm{d}z} + \frac{(w - 1)^2}{z^2}\left(\alpha w + \frac{\beta}{w}\right) + \frac{\gamma w}{z} + \frac{\delta w(w + 1)}{w - 1}$$
$$\text{(PVI)}\quad \frac{\mathrm{d}^2w}{\mathrm{d}z^2} = \frac{1}{2}\left(\frac{1}{w} + \frac{1}{w - 1} + \frac{1}{w - z}\right)\left(\frac{\mathrm{d}w}{\mathrm{d}z}\right)^2 - \left(\frac{1}{z} + \frac{1}{z - 1} + \frac{1}{w - z}\right)\frac{\mathrm{d}w}{\mathrm{d}z} + \frac{w(w - 1)(w - z)}{z^2(z - 1)^2}\left(\alpha + \frac{\beta z}{w^2} + \frac{\gamma(z - 1)}{(w - 1)^2} + \frac{\delta z(z - 1)}{(w - z)^2}\right)$$
Fun fact: Painlev´e served as the prime minister of France twice, for 9 weeks and
7 months respectively.
This is all good, but what has this got to do with integrability of PDE's?

Conjecture (Ablowitz–Ramani–Segur conjecture (1980)). Every ODE reduction (explained below) of an integrable PDE has the Painlevé property.

This is still a conjecture since, as we've previously mentioned, we don't really have a definition of integrability. However, the conjecture has been proved in certain special cases, where we have managed to pin down some specific definitions.

What do we mean by an ODE reduction? Vaguely speaking, if we have a Lie point symmetry of a PDE, then we can use it to introduce coordinates that are invariant, and then form ODE's in these coordinates. We can look at some concrete examples:
Example. In the wave equation, we can try a solution of the form $u(x, t) = f(x - ct)$, and then the wave equation gives us an ODE (or lack thereof) in terms of $f$.
Example. Consider the sine–Gordon equation in light cone coordinates
$$u_{xt} = \sin u.$$
This equation admits a Lie point symmetry
$$g^\varepsilon: (x, t, u) \mapsto (e^\varepsilon x, e^{-\varepsilon}t, u),$$
which is generated by
$$V = x\frac{\partial}{\partial x} - t\frac{\partial}{\partial t}.$$
We should now introduce a variable invariant under this Lie point symmetry. Clearly $z = xt$ is invariant, since
$$V(z) = xt - tx = 0.$$
What we should do, then, is to look for a solution that depends on $z$, say
$$u(x, t) = F(z).$$
Setting
$$w = e^{iF},$$
the sine–Gordon equation becomes
$$\frac{\mathrm{d}^2w}{\mathrm{d}z^2} = \frac{1}{w}\left(\frac{\mathrm{d}w}{\mathrm{d}z}\right)^2 - \frac{1}{z}\frac{\mathrm{d}w}{\mathrm{d}z} + \frac{w^2 - 1}{2z}.$$
This is PIII (with a particular choice of the constants), i.e. this ODE reduction has the Painlevé property.
Example. Consider the KdV equation
$$u_t + u_{xxx} - 6uu_x = 0.$$
This admits a not-so-obvious Lie point symmetry
$$g^\varepsilon(x, t, u) = \left(x + \varepsilon t + \frac{1}{2}\varepsilon^2,\ t + \varepsilon,\ u - \frac{1}{6}\varepsilon\right).$$
This is generated by
$$V = t\frac{\partial}{\partial x} + \frac{\partial}{\partial t} - \frac{1}{6}\frac{\partial}{\partial u}.$$
We then have invariant coordinates
$$z = x - \frac{1}{2}t^2,\quad w = \frac{1}{6}t + u.$$
To get an ODE for $w$, we write the second equation as
$$u(x, t) = -\frac{1}{6}t + w(z).$$
Then we have
$$u_t = -\frac{1}{6} - tw'(z),\quad u_x = w'(z),\quad u_{xx} = w''(z),\quad u_{xxx} = w'''(z).$$
So KdV becomes
$$0 = u_t + u_{xxx} - 6uu_x = -\frac{1}{6} + w'''(z) - 6ww'(z).$$
We would have had some problems if the $t$'s didn't go away, because we wouldn't have an ODE in $w$. But since we constructed these coordinates such that $w$ and $z$ are invariant under the Lie point symmetry but $t$ is not, we are guaranteed that there will be no $t$ left in the equation.

Integrating this equation once, we get an equation
$$w''(z) - 3w(z)^2 - \frac{1}{6}z + z_0 = 0,$$
which (up to a simple rescaling of $w$ and $z$) is PI. So this ODE reduction of KdV has the Painlevé property.
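This reduction can be checked symbolically. The following sympy sketch (my own verification) substitutes $u = -t/6 + w(x - t^2/2)$ into KdV and confirms that what remains is exactly $w''' - 6ww' - \frac{1}{6}$, with no leftover $t$:

```python
import sympy as sp

x, t, s = sp.symbols('x t s')
w = sp.Function('w')
z = x - t**2/2
u = -t/sp.Integer(6) + w(z)

kdv = sp.diff(u, t) + sp.diff(u, x, 3) - 6*u*sp.diff(u, x)

# Evaluate on the slice x = s + t^2/2 (so that z = s) and compare with the
# claimed reduced ODE  w''' - 6 w w' - 1/6 = 0.
reduced = kdv.subs(x, s + t**2/2).doit()
expected = sp.diff(w(s), s, 3) - 6*w(s)*sp.diff(w(s), s) - sp.Rational(1, 6)
print(sp.simplify(reduced - expected))   # 0, and in particular no t remains
```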
In summary, the Painlev´e test of integrability is as follows:
(i) Find all Lie point symmetries of the PDE.
(ii) Find all corresponding ODE reductions.
(iii) Test each ODE for Painlev´e property.
We can then see if our PDE is not integrable. Unfortunately, there is no real
test for the converse.