Part III Stochastic Calculus and Applications
Based on lectures by R. Bauerschmidt
Notes taken by Dexter Chua
Lent 2018
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
Brownian motion. Existence and sample path properties.
Stochastic calculus for continuous processes. Martingales, local martingales, semi-martingales, quadratic variation and cross-variation, Itô's isometry, definition of the stochastic integral, Kunita–Watanabe theorem, and Itô's formula.
Applications to Brownian motion and martingales. Lévy characterization of Brownian motion, Dubins–Schwartz theorem, martingale representation, Girsanov theorem, conformal invariance of planar Brownian motion, and Dirichlet problems.
Stochastic differential equations. Strong and weak solutions, notions of existence and uniqueness, Yamada–Watanabe theorem, strong Markov property, and relation to second order partial differential equations.
Pre-requisites
Knowledge of measure theoretic probability as taught in Part III Advanced Probability
will be assumed, in particular familiarity with discrete-time martingales and Brownian
motion.
Contents
0 Introduction
1 The Lebesgue–Stieltjes integral
2 Semi-martingales
2.1 Finite variation processes
2.2 Local martingale
2.3 Square integrable martingales
2.4 Quadratic variation
2.5 Covariation
2.6 Semi-martingale
3 The stochastic integral
3.1 Simple processes
3.2 Itô isometry
3.3 Extension to local martingales
3.4 Extension to semi-martingales
3.5 Itô formula
3.6 The Lévy characterization
3.7 Girsanov's theorem
4 Stochastic differential equations
4.1 Existence and uniqueness of solutions
4.2 Examples of stochastic differential equations
4.3 Representations of solutions to PDEs
0 Introduction
Ordinary differential equations are central in analysis. The simplest class of equations tends to look like
$$\dot{x}(t) = F(x(t)).$$
Stochastic differential equations are differential equations where we make the function $F$ "random". There are many ways of doing so, and the simplest way is to write it as
$$\dot{x}(t) = F(x(t)) + \eta(t),$$
where $\eta$ is a random function. For example, when modeling noisy physical systems, our physical bodies will be subject to random noise. What should we expect the function $\eta$ to be like? We might expect that for $|t - s| \gg 0$, the variables $\eta(t)$ and $\eta(s)$ are "essentially" independent. If we are interested in physical systems, then this is a rather reasonable assumption, since random noise is random!
In practice, we work with the idealization, where we claim that $\eta(t)$ and $\eta(s)$ are independent for $t \neq s$. Such an $\eta$ exists, and is known as white noise. However, it is not a function, but just a Schwartz distribution.
To understand the simplest case, we set $F = 0$. We then have the equation
$$\dot{x} = \eta.$$
We can write this in integral form as
$$x(t) = x(0) + \int_0^t \eta(s)\, \mathrm{d}s.$$
To make sense of this integral, the function $\eta$ should at least be a signed measure. Unfortunately, white noise isn't. This is bad news.
We ignore this issue for a little bit, and proceed as if it made sense. If the equation held, then for any $0 = t_0 < t_1 < \cdots$, the increments
$$x(t_i) - x(t_{i-1}) = \int_{t_{i-1}}^{t_i} \eta(s)\, \mathrm{d}s$$
should be independent, and moreover their variance should scale linearly with $|t_i - t_{i-1}|$. So maybe this $x$ should be a Brownian motion!
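To see where this linear variance scaling comes from, here is a heuristic computation (my addition, not from the lectures) using the informal white noise covariance $\mathbb{E}[\eta(s)\eta(u)] = \delta(s - u)$:
$$\mathbb{E}\big[(x(t_i) - x(t_{i-1}))^2\big] = \int_{t_{i-1}}^{t_i}\!\int_{t_{i-1}}^{t_i} \mathbb{E}[\eta(s)\eta(u)]\, \mathrm{d}s\, \mathrm{d}u = \int_{t_{i-1}}^{t_i}\!\int_{t_{i-1}}^{t_i} \delta(s - u)\, \mathrm{d}s\, \mathrm{d}u = t_i - t_{i-1}.$$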
Formalizing these ideas will take up a large portion of the course, and the work isn't always pleasant. Then why should we be interested in this continuous problem, as opposed to what we obtain when we discretize time? It turns out in some sense the continuous problem is easier. When we learn measure theory, there is a lot of work put into constructing the Lebesgue measure, as opposed to the sum, which we can just define. However, what we end up with is much easier to use: it's easier to integrate $\int \frac{1}{x^3}\,\mathrm{d}x$ than to sum $\sum_{n=1}^\infty \frac{1}{n^3}$. Similarly, once we have set up the machinery of stochastic calculus, we have a powerful tool to do explicit computations, which is usually harder in the discrete world.
Another reason to study stochastic calculus is that a lot of continuous time processes can be described as solutions to stochastic differential equations. Compare this with the fact that functions such as trigonometric and Bessel functions are described as solutions to ordinary differential equations!
There are two ways to approach stochastic calculus, namely via the Itô integral and the Stratonovich integral. We will mostly focus on the Itô integral, which is more useful for our purposes. In particular, the Itô integral tends to give us martingales, which is useful.
To give a flavour of the construction of the Itô integral, we consider a simpler scenario of the Wiener integral.
Definition (Gaussian space). Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. Then a subspace $S \subseteq L^2(\Omega, \mathcal{F}, \mathbb{P})$ is called a Gaussian space if it is a closed linear subspace and every $X \in S$ is a centered Gaussian random variable.
An important construction is
Proposition. Let $H$ be any separable Hilbert space. Then there is a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with a Gaussian subspace $S \subseteq L^2(\Omega, \mathcal{F}, \mathbb{P})$ and an isometry $I : H \to S$. In other words, for any $f \in H$, there is a corresponding random variable $I(f) \sim N(0, (f, f)_H)$. Moreover, $I(\alpha f + \beta g) = \alpha I(f) + \beta I(g)$ and
$$(f, g)_H = \mathbb{E}[I(f) I(g)].$$
Proof. By separability, we can pick a Hilbert space basis $(e_i)_{i=1}^\infty$ of $H$. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be any probability space that carries an infinite independent sequence of standard Gaussian random variables $X_i \sim N(0, 1)$. Then send $e_i$ to $X_i$, extend by linearity and continuity, and take $S$ to be the image.
In particular, we can take $H = L^2(\mathbb{R}_+)$.
Definition (Gaussian white noise). A Gaussian white noise on $\mathbb{R}_+$ is an isometry $WN$ from $L^2(\mathbb{R}_+)$ into some Gaussian space. For $A \subseteq \mathbb{R}_+$, we write $WN(A) = WN(\mathbf{1}_A)$.
Proposition.
(i) For $A \subseteq \mathbb{R}_+$ with $|A| < \infty$, $WN(A) \sim N(0, |A|)$.
(ii) For disjoint $A, B \subseteq \mathbb{R}_+$, the variables $WN(A)$ and $WN(B)$ are independent.
(iii) If $A = \bigcup_{i=1}^\infty A_i$ for disjoint sets $A_i \subseteq \mathbb{R}_+$, with $|A| < \infty$ and $|A_i| < \infty$, then
$$WN(A) = \sum_{i=1}^\infty WN(A_i) \text{ in } L^2 \text{ and a.s.}$$
Proof. Only the last point requires proof. Observe that the partial sum
$$M_n = \sum_{i=1}^n WN(A_i)$$
is a martingale, and is bounded in $L^2$ as well, since
$$\mathbb{E} M_n^2 = \sum_{i=1}^n \mathbb{E}\, WN(A_i)^2 = \sum_{i=1}^n |A_i| \leq |A|.$$
So we are done by the martingale convergence theorem. The limit is indeed $WN(A)$ because $\mathbf{1}_A = \sum_{i=1}^\infty \mathbf{1}_{A_i}$.
The point of the proposition is that $WN$ really looks like a random measure on $\mathbb{R}_+$, except it is not. We only have convergence almost surely above, which means we have convergence on a set of measure 1. However, the set depends on which $A$ and $A_i$ we pick. For things to actually work out well, we must have a fixed set of measure 1 for which convergence holds for all $A$ and $A_i$.
But perhaps we can ignore this problem, and try to proceed. We define
$$B_t = WN([0, t])$$
for $t \geq 0$.
Exercise. This $B_t$ is a standard Brownian motion, except for the continuity requirement. In other words, for any $t_1, t_2, \ldots, t_n$, the vector $(B_{t_i})_{i=1}^n$ is jointly Gaussian with
$$\mathbb{E}[B_s B_t] = s \wedge t \text{ for } s, t \geq 0.$$
Moreover, $B_0 = 0$ a.s. and $B_t - B_s$ is independent of $\sigma(B_r : r \leq s)$. Moreover, $B_t - B_s \sim N(0, t - s)$ for $t \geq s$.
In fact, by picking a good basis of $L^2(\mathbb{R}_+)$, we can make $B_t$ continuous.
We can now try to define some stochastic integral. If $f \in L^2(\mathbb{R}_+)$ is a step function,
$$f = \sum_{i=1}^n f_i \mathbf{1}_{[s_i, t_i]}$$
with $s_i < t_i$, then
$$WN(f) = \sum_{i=1}^n f_i (B_{t_i} - B_{s_i}).$$
This motivates the notation
$$WN(f) = \int f(s)\, \mathrm{d}B_s.$$
However, extending this to a function that is not a step function would be problematic.
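As a small illustrative sketch (my addition, not from the lectures), the Wiener integral of a step function can be simulated by summing scaled Brownian increments; the step endpoints and values below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
dt, T = 1e-4, 1.0
t = np.arange(0, T + dt, dt)
# Brownian path: cumulative sum of N(0, dt) increments
B = np.concatenate([[0.0], np.cumsum(rng.normal(0, np.sqrt(dt), len(t) - 1))])

def wiener_integral(f_vals, s_pts, t_pts):
    """WN(f) = sum_i f_i (B_{t_i} - B_{s_i}) for a step function f."""
    idx = lambda u: int(round(u / dt))
    return sum(f * (B[idx(ti)] - B[idx(si)])
               for f, si, ti in zip(f_vals, s_pts, t_pts))

# f = 2 on [0, 0.3] and -1 on [0.5, 0.9]  (hypothetical example)
I = wiener_integral([2.0, -1.0], [0.0, 0.5], [0.3, 0.9])
# The isometry predicts Var(WN(f)) = ∫ f² = 4·0.3 + 1·0.4 = 1.6
print(I)
```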
1 The Lebesgue–Stieltjes integral
In calculus, we are able to perform integrals more exciting than simply $\int_0^1 h(x)\, \mathrm{d}x$. In particular, if $h, a : [0, 1] \to \mathbb{R}$ are $C^1$ functions, we can perform integrals of the form
$$\int_0^1 h(x)\, \mathrm{d}a(x).$$
For them, it is easy to make sense of what this means: it's simply
$$\int_0^1 h(x)\, \mathrm{d}a = \int_0^1 h(x)\, a'(x)\, \mathrm{d}x.$$
In our world, we wouldn't expect our functions to be differentiable, so this is not a great definition. One reasonable strategy to make sense of this is to come up with a measure that should equal "$\mathrm{d}a$".
An immediate difficulty we encounter is that $a'(x)$ need not be positive all the time. So for example, $\int_0^1 1\, \mathrm{d}a$ could be a negative number, which one wouldn't expect for a usual measure! Thus, we are naturally led to think about signed measures.
From now on, we always use the Borel $\sigma$-algebra on $[0, T]$ unless otherwise specified.
Definition (Signed measure). A signed measure on $[0, T]$ is a difference $\mu = \mu^+ - \mu^-$ of two positive measures on $[0, T]$ of disjoint support. The decomposition $\mu = \mu^+ - \mu^-$ is called the Hahn decomposition.
In general, given two measures $\mu_1$ and $\mu_2$ with not necessarily disjoint supports, we may still want to talk about $\mu_1 - \mu_2$.
Theorem. For any two finite measures $\mu_1, \mu_2$, there is a signed measure $\mu$ with
$$\mu(A) = \mu_1(A) - \mu_2(A).$$
If $\mu_1$ and $\mu_2$ are given by densities $f_1, f_2$, then we can simply decompose $\mu$ as $(f_1 - f_2)^+\, \mathrm{d}t - (f_1 - f_2)^-\, \mathrm{d}t$, where $+$ and $-$ denote the positive and negative parts respectively. In general, they need not be given by densities with respect to $\mathrm{d}x$, but they are always given by densities with respect to some other measure.
Proof. Let $\nu = \mu_1 + \mu_2$. By Radon–Nikodym, there are positive functions $f_1, f_2$ such that $\mu_i(\mathrm{d}t) = f_i(t)\, \nu(\mathrm{d}t)$. Then
$$(\mu_1 - \mu_2)(\mathrm{d}t) = (f_1 - f_2)^+(t) \cdot \nu(\mathrm{d}t) - (f_1 - f_2)^-(t) \cdot \nu(\mathrm{d}t).$$
Definition (Total variation). The total variation of a signed measure $\mu = \mu^+ - \mu^-$ is $|\mu| = \mu^+ + \mu^-$.
We now want to figure out how we can go from a function to a signed measure. Let's think about how one would attempt to define $\int_0^1 h(x)\, \mathrm{d}a$ as a Riemann sum. A natural option would be to write something like
$$\int_0^t h(s)\, \mathrm{d}a(s) = \lim_{m \to \infty} \sum_{i=1}^{n_m} h(t_{i-1}^{(m)}) \big(a(t_i^{(m)}) - a(t_{i-1}^{(m)})\big)$$
for any sequence of subdivisions $0 = t_0^{(m)} < \cdots < t_{n_m}^{(m)} = t$ of $[0, t]$ with $\max_i |t_i^{(m)} - t_{i-1}^{(m)}| \to 0$.
In particular, since we want the integral of $h = 1$ to be well-behaved, the sum $\sum_i \big(a(t_i^{(m)}) - a(t_{i-1}^{(m)})\big)$ must be well-behaved. This leads to the notion of
Definition (Total variation). The total variation of a function $a : [0, T] \to \mathbb{R}$ is
$$V_a(t) = |a(0)| + \sup \left\{ \sum_{i=1}^n |a(t_i) - a(t_{i-1})| : 0 = t_0 < t_1 < \cdots < t_n = t \right\}.$$
We say $a$ has bounded variation if $V_a(T) < \infty$. In this case, we write $a \in BV$.
We include the $|a(0)|$ term because we want to pretend $a$ is defined on all of $\mathbb{R}$ with $a(t) = 0$ for $t < 0$.
We also define
Definition (Càdlàg). A function $a : [0, T] \to \mathbb{R}$ is càdlàg if it is right-continuous and has left limits.
The following theorem is then clear:
Theorem. There is a bijection
$$\{\text{signed measures on } [0, T]\} \longleftrightarrow \{\text{càdlàg functions of bounded variation } a : [0, T] \to \mathbb{R}\}$$
that sends a signed measure $\mu$ to $a(t) = \mu([0, t])$. To construct the inverse, given $a$, we define
$$a^\pm = \tfrac{1}{2}(V_a \pm a).$$
Then $a^\pm$ are both positive and increasing, and $a = a^+ - a^-$. We can then define $\mu^\pm$ by
$$\mu^\pm([0, t]) = a^\pm(t) - a^\pm(0), \quad \mu = \mu^+ - \mu^-.$$
Moreover, $V_a(t) = |\mu|([0, t])$.
Example. Let $a : [0, 1] \to \mathbb{R}$ be given by
$$a(t) = \begin{cases} 1 & t < \frac{1}{2} \\ 0 & t \geq \frac{1}{2} \end{cases}.$$
This is càdlàg, and its total variation is $V_a(1) = 2$. The associated signed measure is
$$\mu = \delta_0 - \delta_{1/2},$$
and the total variation measure is
$$|\mu| = \delta_0 + \delta_{1/2}.$$
We are now ready to define the Lebesgue–Stieltjes integral.
Definition (Lebesgue–Stieltjes integral). Let $a : [0, T] \to \mathbb{R}$ be càdlàg of bounded variation and let $\mu$ be the associated signed measure. Then for $h \in L^1([0, T], |\mu|)$, the Lebesgue–Stieltjes integral is defined by
$$\int_s^t h(r)\, \mathrm{d}a(r) = \int_{(s, t]} h(r)\, \mu(\mathrm{d}r),$$
where $0 \leq s \leq t \leq T$, and
$$\int_s^t h(r)\, |\mathrm{d}a(r)| = \int_{(s, t]} h(r)\, |\mu|(\mathrm{d}r).$$
We also write
$$h \cdot a(t) = \int_0^t h(r)\, \mathrm{d}a(r).$$
To let $T = \infty$, we need the following notation:
Definition (Finite variation). A càdlàg function $a : [0, \infty) \to \mathbb{R}$ is of finite variation if $a|_{[0, T]} \in BV[0, T]$ for all $T > 0$.
Fact. Let $a : [0, T] \to \mathbb{R}$ be càdlàg and BV, and $h \in L^1([0, T], |\mathrm{d}a|)$. Then
$$\left| \int_0^T h(s)\, \mathrm{d}a(s) \right| \leq \int_0^T |h(s)|\, |\mathrm{d}a(s)|,$$
and the function $h \cdot a : [0, T] \to \mathbb{R}$ is càdlàg and BV with associated signed measure $h(s)\, \mathrm{d}a(s)$. Moreover, $|h(s)\, \mathrm{d}a(s)| = |h(s)|\, |\mathrm{d}a(s)|$.
We can, unsurprisingly, characterize the Lebesgue–Stieltjes integral by a Riemann sum:
Proposition. Let $a$ be càdlàg and BV on $[0, t]$, and $h$ bounded and left-continuous. Then
$$\int_0^t h(s)\, \mathrm{d}a(s) = \lim_{m \to \infty} \sum_{i=1}^{n_m} h(t_{i-1}^{(m)}) \big(a(t_i^{(m)}) - a(t_{i-1}^{(m)})\big),$$
$$\int_0^t h(s)\, |\mathrm{d}a(s)| = \lim_{m \to \infty} \sum_{i=1}^{n_m} h(t_{i-1}^{(m)}) \big|a(t_i^{(m)}) - a(t_{i-1}^{(m)})\big|$$
for any sequence of subdivisions $0 = t_0^{(m)} < \cdots < t_{n_m}^{(m)} = t$ of $[0, t]$ with $\max_i |t_i^{(m)} - t_{i-1}^{(m)}| \to 0$.
Proof. We approximate $h$ by $h_m$ defined by
$$h_m(0) = 0, \quad h_m(s) = h(t_{i-1}^{(m)}) \text{ for } s \in (t_{i-1}^{(m)}, t_i^{(m)}].$$
Then by left continuity, we have
$$h(s) = \lim_{m \to \infty} h_m(s),$$
and moreover
$$\lim_{m \to \infty} \sum_{i=1}^{n_m} h(t_{i-1}^{(m)}) \big(a(t_i^{(m)}) - a(t_{i-1}^{(m)})\big) = \lim_{m \to \infty} \int_{(0, t]} h_m(s)\, \mu(\mathrm{d}s) = \int_{(0, t]} h(s)\, \mu(\mathrm{d}s)$$
by the dominated convergence theorem. The statement about $|\mathrm{d}a(s)|$ is left as an exercise.
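As an illustrative sketch (my addition, not part of the notes), the Riemann-sum characterization can be checked numerically on the earlier example $a = \mathbf{1}_{[0, 1/2)}$, where $\int_0^1 h\, \mathrm{d}a = \int_{(0,1]} h\, \mathrm{d}\mu = -h(1/2)$ since $\mu = \delta_0 - \delta_{1/2}$ and the atom at $0$ is excluded:

```python
import numpy as np

# a = 1 on [0, 1/2), 0 on [1/2, 1]; associated measure mu = delta_0 - delta_{1/2}.
a = lambda t: 1.0 if t < 0.5 else 0.0
h = lambda t: np.sin(t) + 2.0   # any bounded left-continuous integrand

def riemann_stieltjes(h, a, t, n):
    """Left-endpoint Riemann sum over the subdivision {i*t/n}."""
    pts = np.linspace(0, t, n + 1)
    return sum(h(pts[i - 1]) * (a(pts[i]) - a(pts[i - 1]))
               for i in range(1, n + 1))

for n in [10, 100, 1000]:
    print(n, riemann_stieltjes(h, a, 1.0, n))
# The sums converge to -h(1/2): only the jump of a at 1/2 contributes,
# weighted by the left endpoint value of h, as left-continuity dictates.
```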
2 Semi-martingales
The title of the chapter is "semi-martingales", but we will not even meet the definition of a semi-martingale till the end of the chapter. The reason is that a semi-martingale is essentially defined to be the sum of a (local) martingale and a finite variation process, and understanding semi-martingales mostly involves understanding the two parts separately. Thus, for most of the chapter, we will be studying local martingales (finite variation processes are rather more boring), and at the end we will put them together to say a word or two about semi-martingales.
From now on, $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \geq 0}, \mathbb{P})$ will be a filtered probability space. Recall the following definition:
Definition (Càdlàg adapted process). A càdlàg adapted process is a map $X : \Omega \times [0, \infty) \to \mathbb{R}$ such that
(i) $X$ is càdlàg, i.e. $X(\omega, \cdot) : [0, \infty) \to \mathbb{R}$ is càdlàg for all $\omega \in \Omega$;
(ii) $X$ is adapted, i.e. $X_t = X(\cdot, t)$ is $\mathcal{F}_t$-measurable for every $t \geq 0$.
Notation. We will write $X \in \mathcal{G}$ to denote that a random variable $X$ is measurable with respect to a $\sigma$-algebra $\mathcal{G}$.
2.1 Finite variation processes
The definition of a finite variation function extends immediately to a finite variation process.
Definition (Finite variation process). A finite variation process is a càdlàg adapted process $A$ such that $A(\omega, \cdot) : [0, \infty) \to \mathbb{R}$ has finite variation for all $\omega \in \Omega$. The total variation process $V$ of a finite variation process $A$ is
$$V_t = \int_0^t |\mathrm{d}A_s|.$$
Proposition. The total variation process $V$ of a finite variation process $A$ is also càdlàg, finite variation and adapted, and it is also increasing.
Proof. We only have to check that it is adapted. But that follows directly from our previous expression of the integral as the limit of a sum. Indeed, let $0 = t_0^{(m)} < t_1^{(m)} < \cdots < t_{n_m}^{(m)} = t$ be a (nested) sequence of subdivisions of $[0, t]$ with $\max_i |t_i^{(m)} - t_{i-1}^{(m)}| \to 0$. We have seen
$$V_t = \lim_{m \to \infty} \sum_{i=1}^{n_m} |A_{t_i^{(m)}} - A_{t_{i-1}^{(m)}}| + |A(0)| \in \mathcal{F}_t.$$
Definition ($(H \cdot A)_t$). Let $A$ be a finite variation process and $H$ a process such that for all $\omega$ and $t \geq 0$,
$$\int_0^t |H_s(\omega)|\, |\mathrm{d}A_s(\omega)| < \infty.$$
Then define a process $((H \cdot A)_t)_{t \geq 0}$ by
$$(H \cdot A)_t = \int_0^t H_s\, \mathrm{d}A_s.$$
For the process $H \cdot A$ to be adapted, we need a condition.
Definition (Previsible process). A process $H : \Omega \times [0, \infty) \to \mathbb{R}$ is previsible if it is measurable with respect to the previsible $\sigma$-algebra $\mathcal{P}$ generated by the sets $E \times (s, t]$, where $E \in \mathcal{F}_s$ and $s < t$. We call the generating set $\Pi$.
Very roughly, the idea is that a previsible event is one where whenever it happens, you know it a finite (though possibly arbitrarily small) amount of time beforehand.
Definition (Simple process). A process $H : \Omega \times [0, \infty) \to \mathbb{R}$ is simple, written $H \in \mathcal{E}$, if
$$H(\omega, t) = \sum_{i=1}^n H_{i-1}(\omega) \mathbf{1}_{(t_{i-1}, t_i]}(t)$$
for random variables $H_{i-1} \in \mathcal{F}_{t_{i-1}}$ and $0 = t_0 < \cdots < t_n$.
Fact. Simple processes and their limits are previsible.
Fact. Let $X$ be a càdlàg adapted process. Then $H_t = X_{t-}$ defines a left-continuous process and is previsible.
In particular, continuous processes are previsible.
Proof. Since $X$ is càdlàg adapted, it is clear that $H$ is left-continuous and adapted. Since $H$ is left-continuous, it is approximated by simple processes. Indeed, let
$$H^n_t = \sum_{i=1}^{2^n} H_{(i-1)2^{-n}} \mathbf{1}_{((i-1)2^{-n},\, i2^{-n}]}(t) \in \mathcal{E}.$$
Then $H^n_t \to H_t$ for all $t$ by left continuity, and previsibility follows.
Exercise. Let $H$ be previsible. Then
$$H_t \in \mathcal{F}_{t-} = \sigma(\mathcal{F}_s : s < t).$$
Example. Brownian motion is previsible (since it is continuous).
Example. A Poisson process $(N_t)$ is not previsible, since $N_t \not\in \mathcal{F}_{t-}$.
Proposition. Let $A$ be a finite variation process, and $H$ previsible such that
$$\int_0^t |H(\omega, s)|\, |\mathrm{d}A(\omega, s)| < \infty \text{ for all } (\omega, t) \in \Omega \times [0, \infty).$$
Then $H \cdot A$ is a finite variation process.
Proof. The finite variation and càdlàg parts follow directly from the deterministic versions. We only have to check that $H \cdot A$ is adapted, i.e. $(H \cdot A)(\cdot, t) \in \mathcal{F}_t$ for all $t \geq 0$.
First, $H \cdot A$ is adapted if $H(\omega, s) = \mathbf{1}_{(u, v]}(s) \mathbf{1}_E(\omega)$ for some $u < v$ and $E \in \mathcal{F}_u$, since
$$(H \cdot A)(\omega, t) = \mathbf{1}_E(\omega) \big(A(\omega, t \wedge v) - A(\omega, t \wedge u)\big) \in \mathcal{F}_t.$$
Thus, $H \cdot A$ is adapted for $H = \mathbf{1}_F$ when $F \in \Pi$. Clearly, $\Pi$ is a $\pi$-system, i.e. it is closed under intersections and non-empty, and by definition it generates the previsible $\sigma$-algebra $\mathcal{P}$. So to extend the adaptedness of $H \cdot A$ to all previsible $H$, we use the monotone class theorem.
We let
$$\mathcal{V} = \{H : \Omega \times [0, \infty) \to \mathbb{R} : H \cdot A \text{ is adapted}\}.$$
Then
(i) $1 \in \mathcal{V}$;
(ii) $\mathbf{1}_F \in \mathcal{V}$ for all $F \in \Pi$;
(iii) $\mathcal{V}$ is closed under monotone limits.
So $\mathcal{V}$ contains all bounded $\mathcal{P}$-measurable functions.
So the conclusion is that if $A$ is a finite variation process, then as long as reasonable finiteness conditions are satisfied, we can integrate functions against $\mathrm{d}A$. Moreover, this integral was easy to define, and it obeys all expected properties such as dominated convergence, since ultimately, it is just an integral in the usual measure-theoretic sense. This crucially depends on the fact that $A$ is a finite variation process.
However, in our motivating example, we wanted to take $A$ to be Brownian motion, which is not of finite variation. The work we will do in this chapter and the next is to come up with a stochastic integral where we let $A$ be a martingale instead. The heuristic idea is that while martingales can vary wildly, the martingale property implies there will be some large cancellation between the up and down movements, which leads to the possibility of a well-defined stochastic integral.
2.2 Local martingale
From now on, we assume that $(\Omega, \mathcal{F}, (\mathcal{F}_t)_t, \mathbb{P})$ satisfies the usual conditions, namely that
(i) $\mathcal{F}_0$ contains all $\mathbb{P}$-null sets;
(ii) $(\mathcal{F}_t)_t$ is right-continuous, i.e. $\mathcal{F}_t = \mathcal{F}_{t+} = \bigcap_{s > t} \mathcal{F}_s$ for all $t \geq 0$.
We recall some of the properties of continuous martingales.
Theorem (Optional stopping theorem). Let $X$ be a càdlàg adapted integrable process. Then the following are equivalent:
(i) $X$ is a martingale, i.e. $X_t \in L^1$ for every $t$, and
$$\mathbb{E}(X_t \mid \mathcal{F}_s) = X_s \text{ for all } t > s.$$
(ii) The stopped process $X^T = (X^T_t) = (X_{T \wedge t})$ is a martingale for all stopping times $T$.
(iii) For all stopping times $T, S$ with $T$ bounded, $X_T \in L^1$ and $\mathbb{E}(X_T \mid \mathcal{F}_S) = X_{T \wedge S}$ almost surely.
(iv) For all bounded stopping times $T$, $X_T \in L^1$ and $\mathbb{E}(X_T) = \mathbb{E}(X_0)$.
For $X$ uniformly integrable, (iii) and (iv) hold for all stopping times.
In practice, most of our results will be first proven for bounded martingales, or perhaps square integrable ones. The point is that the square-integrable martingales form a Hilbert space, and Hilbert space techniques can help us say something useful about these martingales. To get something about a general martingale $M$, we can apply a cutoff $T_n = \inf\{t > 0 : |M_t| \geq n\}$, and then $M^{T_n}$ will be a martingale for all $n$. We can then take the limit $n \to \infty$ to recover something about the martingale itself.
But if we are doing this, we might as well weaken the martingale condition a bit: we only need the $M^{T_n}$ to be martingales. Of course, we aren't doing this just for fun. In general, martingales will not always be closed under the operations we are interested in, but local (or maybe semi-) martingales will be.
In general, we define
Definition (Local martingale). A càdlàg adapted process $X$ is a local martingale if there exists a sequence of stopping times $T_n$ such that $T_n \to \infty$ almost surely, and $X^{T_n}$ is a martingale for every $n$. We say the sequence $(T_n)$ reduces $X$.
Example.
(i) Every martingale is a local martingale, since by the optional stopping theorem, we can take $T_n = n$.
(ii) Let $(B_t)$ be a standard 3-dimensional Brownian motion on $\mathbb{R}^3$. Then
$$(X_t)_{t \geq 1} = \left( \frac{1}{|B_t|} \right)_{t \geq 1}$$
is a local martingale but not a martingale.
To see this, first note that
$$\sup_{t \geq 1} \mathbb{E} X_t^2 < \infty, \quad \mathbb{E} X_t \to 0.$$
Since $\mathbb{E} X_t \to 0$ and $X_t \geq 0$, we know $X$ cannot be a martingale. However, we can check that it is a local martingale. Recall that for any $f \in C_b^2$,
$$M^f_t = f(B_t) - f(B_1) - \frac{1}{2} \int_1^t \Delta f(B_s)\, \mathrm{d}s$$
is a martingale. Moreover, $\Delta \frac{1}{|x|} = 0$ for all $x \neq 0$. Thus, if $\frac{1}{|x|}$ didn't have a singularity at $0$, this would have told us $X_t$ is a martingale. Thus, we are safe if we try to bound $|B_s|$ away from zero.
Let
$$T_n = \inf\left\{ t \geq 1 : |B_t| < \frac{1}{n} \right\},$$
and pick $f_n \in C_b^2$ such that $f_n(x) = \frac{1}{|x|}$ for $|x| \geq \frac{1}{n}$. Then
$$X^{T_n}_t - X^{T_n}_1 = M^{f_n}_{t \wedge T_n}.$$
So $X^{T_n}$ is a martingale.
It remains to show that $T_n \to \infty$, and this follows from the fact that $\mathbb{E} X_t \to 0$.
Proposition. Let $X$ be a local martingale and $X_t \geq 0$ for all $t$. Then $X$ is a supermartingale.
Proof. Let $(T_n)$ be a reducing sequence for $X$. Then by conditional Fatou,
$$\mathbb{E}(X_t \mid \mathcal{F}_s) = \mathbb{E}\left( \liminf_{n \to \infty} X_{t \wedge T_n} \,\Big|\, \mathcal{F}_s \right) \leq \liminf_{n \to \infty} \mathbb{E}(X_{t \wedge T_n} \mid \mathcal{F}_s) = \liminf_{n \to \infty} X_{s \wedge T_n} = X_s.$$
Recall the following result from Advanced Probability:
Proposition. Let $X \in L^1(\Omega, \mathcal{F}, \mathbb{P})$. Then the set
$$\chi = \{\mathbb{E}(X \mid \mathcal{G}) : \mathcal{G} \subseteq \mathcal{F} \text{ a sub-}\sigma\text{-algebra}\}$$
is uniformly integrable, i.e.
$$\sup_{Y \in \chi} \mathbb{E}\big(|Y| \mathbf{1}_{|Y| > \lambda}\big) \to 0 \text{ as } \lambda \to \infty.$$
Theorem
(Vitali theorem)
. X
n
X
in
L
1
iff (
X
n
) is uniformly integrable and
X
n
X in probability.
With these, we can state the following characterization of martingales in terms of local martingales:
Proposition. The following are equivalent:
(i) $X$ is a martingale.
(ii) $X$ is a local martingale, and for all $t \geq 0$, the set
$$\chi_t = \{X_T : T \text{ is a stopping time with } T \leq t\}$$
is uniformly integrable.
Proof.
(i) $\Rightarrow$ (ii): Let $X$ be a martingale. Then by the optional stopping theorem, $X_T = \mathbb{E}(X_t \mid \mathcal{F}_T)$ for any bounded stopping time $T \leq t$. So $\chi_t$ is uniformly integrable.
(ii) $\Rightarrow$ (i): Let $X$ be a local martingale with reducing sequence $(T_n)$, and assume that the sets $\chi_t$ are uniformly integrable for all $t \geq 0$. By the optional stopping theorem, it suffices to show that $\mathbb{E}(X_T) = \mathbb{E}(X_0)$ for any bounded stopping time $T$.
So let $T$ be a bounded stopping time, say $T \leq t$. Then
$$\mathbb{E}(X_0) = \mathbb{E}(X^{T_n}_0) = \mathbb{E}(X^{T_n}_T) = \mathbb{E}(X_{T \wedge T_n})$$
for all $n$. Now $T \wedge T_n$ is a stopping time $\leq t$, so $\{X_{T \wedge T_n}\}$ is uniformly integrable by assumption. Moreover, $T_n \wedge T \to T$ almost surely as $n \to \infty$, hence $X_{T \wedge T_n} \to X_T$ in probability. Hence by Vitali, this converges in $L^1$. So
$$\mathbb{E}(X_T) = \mathbb{E}(X_0).$$
Corollary. If $X$ is a local martingale and $Z \in L^1$ is such that $|X_t| \leq Z$ for all $t$, then $X$ is a martingale. In particular, every bounded local martingale is a martingale.
The definition of a local martingale does not give us control over what the reducing sequence $(T_n)$ is. In particular, it is not necessarily true that $X^{T_n}$ will be bounded, which is a helpful property to have. Fortunately, we have the following proposition:
Proposition. Let $X$ be a continuous local martingale with $X_0 = 0$. Define
$$S_n = \inf\{t \geq 0 : |X_t| = n\}.$$
Then $S_n$ is a stopping time, $S_n \to \infty$, and $X^{S_n}$ is a bounded martingale. In particular, $(S_n)$ reduces $X$.
Proof. It is clear that $S_n$ is a stopping time, since (if it is not clear)
$$\{S_n \leq t\} = \bigcap_{k \in \mathbb{N}} \left\{ \sup_{s \leq t} |X_s| > n - \frac{1}{k} \right\} = \bigcap_{k \in \mathbb{N}} \bigcup_{s < t,\, s \in \mathbb{Q}} \left\{ |X_s| > n - \frac{1}{k} \right\} \in \mathcal{F}_t.$$
It is also clear that $S_n \to \infty$, since
$$\sup_{s \leq t} |X_s| \geq n \Leftrightarrow S_n \leq t,$$
and by continuity and compactness, $\sup_{s \leq t} |X_s|$ is finite for every $(\omega, t)$.
Finally, we show that $X^{S_n}$ is a martingale. By the optional stopping theorem, $X^{T_m \wedge S_n}$ is a martingale for a reducing sequence $(T_m)$, so $X^{S_n}$ is a local martingale. But it is also bounded by $n$. So it is a martingale.
An important and useful theorem is the following:
Theorem. Let $X$ be a continuous local martingale with $X_0 = 0$. If $X$ is also a finite variation process, then $X_t = 0$ for all $t$.
This would rule out interpreting $\int H_s\, \mathrm{d}X_s$ as a Lebesgue–Stieltjes integral for $X$ a non-zero continuous local martingale. In particular, we cannot take $X$ to be Brownian motion. Instead, we have to develop a new theory of integration for continuous local martingales, namely the Itô integral.
On the other hand, this theorem is very useful. We will later want to define the stochastic integral with respect to the sum of a continuous local martingale and a finite variation process, which is the appropriate generality for our theorems to make good sense. This theorem tells us there is a unique way to decompose a process as a sum of a finite variation process and a continuous local martingale (if it can be done). So we can simply define this stochastic integral by using the Lebesgue–Stieltjes integral on the finite variation part and the Itô integral on the continuous local martingale part.
Proof. Let $X$ be a finite-variation continuous local martingale with $X_0 = 0$. Since $X$ is finite variation, we can define the total variation process $(V_t)$ corresponding to $X$, and let
$$S_n = \inf\{t \geq 0 : V_t \geq n\} = \inf\left\{ t \geq 0 : \int_0^t |\mathrm{d}X_s| \geq n \right\}.$$
Then $S_n$ is a stopping time, and $S_n \to \infty$ since $X$ is assumed to be finite variation. Moreover, by optional stopping, $X^{S_n}$ is a local martingale, and is also bounded, since
$$|X^{S_n}_t| \leq \int_0^{t \wedge S_n} |\mathrm{d}X_s| \leq n.$$
So $X^{S_n}$ is in fact a martingale.
We claim its $L^2$-norm vanishes. Let $0 = t_0 < t_1 < \cdots < t_k = t$ be a subdivision of $[0, t]$. Using the fact that $X^{S_n}$ is a martingale and has orthogonal increments, we can write
$$\mathbb{E}\big((X^{S_n}_t)^2\big) = \sum_{i=1}^k \mathbb{E}\big((X^{S_n}_{t_i} - X^{S_n}_{t_{i-1}})^2\big).$$
Observe that $X^{S_n}$ is finite variation, but the right-hand side is summing the square of the variation, which ought to vanish when we take the limit $\max_i |t_i - t_{i-1}| \to 0$. Indeed, we can compute
$$\begin{aligned}
\mathbb{E}\big((X^{S_n}_t)^2\big) &= \sum_{i=1}^k \mathbb{E}\big((X^{S_n}_{t_i} - X^{S_n}_{t_{i-1}})^2\big) \\
&\leq \mathbb{E}\left( \max_{1 \leq i \leq k} |X^{S_n}_{t_i} - X^{S_n}_{t_{i-1}}| \sum_{i=1}^k |X^{S_n}_{t_i} - X^{S_n}_{t_{i-1}}| \right) \\
&\leq \mathbb{E}\left( \max_{1 \leq i \leq k} |X^{S_n}_{t_i} - X^{S_n}_{t_{i-1}}| \cdot V_{t \wedge S_n} \right) \leq \mathbb{E}\left( \max_{1 \leq i \leq k} |X^{S_n}_{t_i} - X^{S_n}_{t_{i-1}}| \cdot n \right).
\end{aligned}$$
Of course, the first factor is also bounded by the total variation. Moreover, we can make further subdivisions so that the mesh size tends to zero, and then the first factor vanishes in the limit by continuity. So by dominated convergence, we must have $\mathbb{E}((X^{S_n}_t)^2) = 0$. So $X^{S_n}_t = 0$ almost surely for all $n$. So $X_t = 0$ for all $t$ almost surely.
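As an illustrative numerical sketch (my addition, not from the lectures), one can watch the dyadic approximations of total variation blow up for a simulated Brownian path, which is exactly why a Lebesgue–Stieltjes theory cannot apply to it:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2**20                       # fine grid on [0, 1]
dB = rng.normal(0, np.sqrt(1 / N), N)
B = np.concatenate([[0.0], np.cumsum(dB)])

for n in [4, 8, 12, 16]:        # dyadic subdivisions with mesh 2^-n
    step = N >> n               # fine-grid points per dyadic interval
    incs = B[::step][1:] - B[::step][:-1]
    print(n, np.abs(incs).sum())
# The printed sums grow like 2^{n/2}: Brownian motion has infinite
# total variation, so it is not a finite variation process.
```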
2.3 Square integrable martingales
As previously discussed, we will want to use Hilbert space machinery to construct the Itô integral. The rough idea is to define the Itô integral with respect to a fixed martingale on simple processes via a (finite) Riemann sum, and then by calculating appropriate bounds on how this affects the norm, we can extend this to all processes by continuity, and this requires our space to be Hilbert. The interesting spaces are defined as follows:
Definition ($\mathcal{M}^2$). Let
$$\mathcal{M}^2 = \left\{ X : \Omega \times [0, \infty) \to \mathbb{R} : X \text{ is a càdlàg martingale with } \sup_{t \geq 0} \mathbb{E}(X_t^2) < \infty \right\},$$
$$\mathcal{M}^2_c = \left\{ X \in \mathcal{M}^2 : X(\omega, \cdot) \text{ is continuous for every } \omega \right\}.$$
We define an inner product on $\mathcal{M}^2$ by
$$(X, Y)_{\mathcal{M}^2} = \mathbb{E}(X_\infty Y_\infty),$$
which in particular induces a norm
$$\|X\|_{\mathcal{M}^2} = \big(\mathbb{E}(X_\infty^2)\big)^{1/2}.$$
We will prove this is indeed an inner product soon. Here recall that for $X \in \mathcal{M}^2$, the martingale convergence theorem implies $X_t \to X_\infty$ almost surely and in $L^2$.
Our goal will be to prove that these spaces are indeed Hilbert spaces. First observe that if $X \in \mathcal{M}^2$, then $(X_t^2)_{t \geq 0}$ is a submartingale by Jensen, so $t \mapsto \mathbb{E} X_t^2$ is increasing, and
$$\mathbb{E} X_\infty^2 = \sup_{t \geq 0} \mathbb{E} X_t^2.$$
All the magic that lets us prove they are Hilbert spaces is Doob's inequality.
Theorem (Doob's inequality). Let $X \in \mathcal{M}^2$. Then
$$\mathbb{E}\left( \sup_{t \geq 0} X_t^2 \right) \leq 4\, \mathbb{E}(X_\infty^2).$$
So once we control the limit $X_\infty$, we control the whole path. This is why the definition of the norm makes sense, and in particular we know $\|X\|_{\mathcal{M}^2} = 0$ implies that $X = 0$.
Theorem. $\mathcal{M}^2$ is a Hilbert space and $\mathcal{M}^2_c$ is a closed subspace.
Proof. We need to check that $\mathcal{M}^2$ is complete. Thus let $(X^n) \subseteq \mathcal{M}^2$ be a Cauchy sequence, i.e.
$$\mathbb{E}\big((X^n_\infty - X^m_\infty)^2\big) \to 0 \text{ as } n, m \to \infty.$$
By passing to a subsequence, we may assume that
$$\mathbb{E}\big((X^n_\infty - X^{n-1}_\infty)^2\big) \leq 2^{-n}.$$
First note that
$$\begin{aligned}
\mathbb{E}\left( \sum_n \sup_{t \geq 0} |X^n_t - X^{n-1}_t| \right)
&\leq \sum_n \mathbb{E}\left( \sup_{t \geq 0} |X^n_t - X^{n-1}_t|^2 \right)^{1/2} && \text{(CS)} \\
&\leq \sum_n 2\, \mathbb{E}\big( |X^n_\infty - X^{n-1}_\infty|^2 \big)^{1/2} && \text{(Doob's)} \\
&\leq 2 \sum_n 2^{-n/2} < \infty.
\end{aligned}$$
So
$$\sum_{n=1}^\infty \sup_{t \geq 0} |X^n_t - X^{n-1}_t| < \infty \text{ a.s.} \tag{$*$}$$
So on this event, $(X^n)$ is a Cauchy sequence in the space $(D[0, \infty), \|\cdot\|_\infty)$ of càdlàg functions. So there is some $X(\omega, \cdot) \in D[0, \infty)$ such that
$$\|X^n(\omega, \cdot) - X(\omega, \cdot)\|_\infty \to 0 \text{ for almost all } \omega,$$
and we set $X = 0$ outside the almost sure event $(*)$. We now claim that
$$\mathbb{E}\left( \sup_{t \geq 0} |X^n_t - X_t|^2 \right) \to 0 \text{ as } n \to \infty.$$
We can just compute
$$\begin{aligned}
\mathbb{E}\left( \sup_t |X^n_t - X_t|^2 \right) &= \mathbb{E}\left( \lim_{m \to \infty} \sup_t |X^n_t - X^m_t|^2 \right) \\
&\leq \liminf_{m \to \infty} \mathbb{E}\left( \sup_t |X^n_t - X^m_t|^2 \right) && \text{(Fatou)} \\
&\leq \liminf_{m \to \infty} 4\, \mathbb{E}(X^n_\infty - X^m_\infty)^2, && \text{(Doob's)}
\end{aligned}$$
and this goes to 0 in the limit $n \to \infty$ as well.
We finally have to check that $X$ is indeed a martingale. We use the triangle inequality to write
$$\begin{aligned}
\|\mathbb{E}(X_t \mid \mathcal{F}_s) - X_s\|_{L^2}
&\leq \|\mathbb{E}(X_t - X^n_t \mid \mathcal{F}_s)\|_{L^2} + \|X^n_s - X_s\|_{L^2} \\
&\leq \mathbb{E}\big(\mathbb{E}((X_t - X^n_t)^2 \mid \mathcal{F}_s)\big)^{1/2} + \|X^n_s - X_s\|_{L^2} \\
&= \|X_t - X^n_t\|_{L^2} + \|X^n_s - X_s\|_{L^2} \\
&\leq 2\, \mathbb{E}\left( \sup_t |X_t - X^n_t|^2 \right)^{1/2} \to 0
\end{aligned}$$
as $n \to \infty$. But the left-hand side does not depend on $n$. So it must vanish. So $X \in \mathcal{M}^2$.
We could have done exactly the same with continuous martingales, so the second part follows.
2.4 Quadratic variation
Physicists are used to dropping all terms above first order. It turns out that Brownian motion, and continuous local martingales in general, oscillate so wildly that second order terms become important. We first make the following definition:
Definition (Uniformly on compact sets in probability). For a sequence of processes $(X^n)$ and a process $X$, we say that $X^n \to X$ u.c.p. iff
$$\mathbb{P}\left( \sup_{s \in [0, t]} |X^n_s - X_s| > \varepsilon \right) \to 0 \text{ as } n \to \infty \text{ for all } t > 0,\, \varepsilon > 0.$$
Theorem. Let $M$ be a continuous local martingale with $M_0 = 0$. Then there exists a unique (up to indistinguishability) continuous adapted increasing process $(\langle M \rangle_t)_{t \geq 0}$ such that $\langle M \rangle_0 = 0$ and $M_t^2 - \langle M \rangle_t$ is a continuous local martingale. Moreover,
$$\langle M \rangle_t = \lim_{n \to \infty} \langle M \rangle^{(n)}_t, \quad \langle M \rangle^{(n)}_t = \sum_{i=1}^{\lceil 2^n t \rceil} (M_{i2^{-n} \wedge t} - M_{(i-1)2^{-n} \wedge t})^2,$$
where the limit is u.c.p.
Definition (Quadratic variation). $\langle M \rangle$ is called the quadratic variation of $M$.
It is probably more useful to understand $\langle M \rangle_t$ in terms of the explicit formula, and the fact that $M_t^2 - \langle M \rangle_t$ is a continuous local martingale is a convenient property.
Example. Let $B$ be a standard Brownian motion. Then $B_t^2 - t$ is a martingale. Thus, $\langle B \rangle_t = t$.
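Here is a small simulation sketch (my addition, not part of the notes) of the dyadic approximations $\langle B \rangle^{(n)}_T$ for a Brownian path; they stabilize near $T$, in contrast with the diverging total variation sums seen earlier:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 2**20, 1.0
B = np.concatenate([[0.0],
                    np.cumsum(rng.normal(0, np.sqrt(T / N), N))])

def qv(n):
    """<B>^(n)_T: sum of squared increments over the dyadic grid 2^-n."""
    step = N >> n
    incs = B[::step][1:] - B[::step][:-1]
    return (incs**2).sum()

for n in [4, 8, 12, 16]:
    print(n, qv(n))   # approaches T = 1, illustrating <B>_t = t
```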
The proof is long and mechanical, but not hard. All the magic happened when we used the magical Doob's inequality to show that $\mathcal{M}^2_c$ and $\mathcal{M}^2$ are Hilbert spaces.
Proof. To show uniqueness, we use that finite variation and local martingale are incompatible. Suppose $(A_t)$ and $(\tilde{A}_t)$ obey the conditions for $\langle M \rangle$. Then $A_t - \tilde{A}_t = (M_t^2 - \tilde{A}_t) - (M_t^2 - A_t)$ is a continuous adapted local martingale starting at $0$. Moreover, both $A_t$ and $\tilde{A}_t$ are increasing, hence have finite variation. So $A - \tilde{A} = 0$ almost surely.
To show existence, we need to show that the limit exists and has the right property. We do this in steps.
Claim. The result holds if M is in fact bounded.
Suppose $|M(\omega, t)| \leq C$ for all $(\omega, t)$. Then $M \in \mathcal{M}^2_c$. Fix $T > 0$ deterministic. Let
$$X^n_t = \sum_{i=1}^{\lceil 2^n T \rceil} M_{(i-1)2^{-n}} (M_{i2^{-n} \wedge t} - M_{(i-1)2^{-n} \wedge t}).$$
This is defined so that
$$\langle M \rangle^{(n)}_{k2^{-n}} = M^2_{k2^{-n}} - 2 X^n_{k2^{-n}}.$$
This reduces the study of $\langle M \rangle^{(n)}$ to that of $X^n_{k2^{-n}}$.
We check that $(X^n_t)$ is a Cauchy sequence in $\mathcal{M}^2_c$. The fact that it is a martingale is an immediate computation. To show it is Cauchy, for $n \geq m$, we calculate
$$X^n_T - X^m_T = \sum_{i=1}^{\lceil 2^n T \rceil} (M_{(i-1)2^{-n}} - M_{\lfloor (i-1)2^{m-n} \rfloor 2^{-m}})(M_{i2^{-n}} - M_{(i-1)2^{-n}}).$$
We now take the expectation of the square to get
$$\begin{aligned}
\mathbb{E}(X^n_T - X^m_T)^2 &= \mathbb{E}\left( \sum_{i=1}^{\lceil 2^n T \rceil} (M_{(i-1)2^{-n}} - M_{\lfloor (i-1)2^{m-n} \rfloor 2^{-m}})^2 (M_{i2^{-n}} - M_{(i-1)2^{-n}})^2 \right) \\
&\leq \mathbb{E}\left( \sup_{|s - t| \leq 2^{-m}} |M_t - M_s|^2 \sum_{i=1}^{\lceil 2^n T \rceil} (M_{i2^{-n}} - M_{(i-1)2^{-n}})^2 \right) \\
&= \mathbb{E}\left( \sup_{|s - t| \leq 2^{-m}} |M_t - M_s|^2 \cdot \langle M \rangle^{(n)}_T \right) \\
&\leq \mathbb{E}\left( \sup_{|s - t| \leq 2^{-m}} |M_t - M_s|^4 \right)^{1/2} \mathbb{E}\left( (\langle M \rangle^{(n)}_T)^2 \right)^{1/2}. && \text{(Cauchy–Schwarz)}
\end{aligned}$$
We shall show that the second factor is bounded, while the first factor tends to zero as $m \to \infty$. These are both not surprising: the first term vanishing in the limit corresponds to $M$ being continuous, and the second term is bounded since $M$ itself is bounded.
To show that the first term tends to zero, we note that we have
$$|M_t - M_s|^4 \leq 16 C^4,$$
and moreover
$$\sup_{|s - t| \leq 2^{-m}} |M_t - M_s| \to 0 \text{ as } m \to \infty$$
by uniform continuity. So we are done by the dominated convergence theorem.
To show the second term is bounded, we compute (writing $N = \lceil 2^n T \rceil$)
$$\begin{aligned}
\mathbb{E}\left( (\langle M \rangle^{(n)}_T)^2 \right) &= \mathbb{E}\left( \left( \sum_{i=1}^N (M_{i2^{-n}} - M_{(i-1)2^{-n}})^2 \right)^2 \right) \\
&= \sum_{i=1}^N \mathbb{E}\left( (M_{i2^{-n}} - M_{(i-1)2^{-n}})^4 \right) \\
&\quad + 2 \sum_{i=1}^N \mathbb{E}\left( (M_{i2^{-n}} - M_{(i-1)2^{-n}})^2 \sum_{k=i+1}^N (M_{k2^{-n}} - M_{(k-1)2^{-n}})^2 \right).
\end{aligned}$$
We use the martingale property and orthogonal increments to rearrange the off-diagonal term as
$$\mathbb{E}\left( (M_{i2^{-n}} - M_{(i-1)2^{-n}})^2 (M_{N2^{-n}} - M_{i2^{-n}})^2 \right).$$
Taking some sups, we get
$$\mathbb{E}\left( (\langle M \rangle^{(n)}_T)^2 \right) \leq 12 C^2\, \mathbb{E}\left( \sum_{i=1}^N (M_{i2^{-n}} - M_{(i-1)2^{-n}})^2 \right) = 12 C^2\, \mathbb{E}\left( (M_{N2^{-n}} - M_0)^2 \right) \leq 12 C^2 \cdot 4 C^2.$$
So done.
So we now have $X^n \to X$ in $\mathcal{M}^2_c$ for some $X \in \mathcal{M}^2_c$. In particular, we have
$$\left\| \sup_t |X^n_t - X_t| \right\|_{L^2} \to 0.$$
So we know that
$$\sup_t |X^n_t - X_t| \to 0$$
almost surely along a subsequence $\Lambda$.
Let $N$ be the event on which this convergence fails. We define
$$A^{(T)}_t = \begin{cases} M_t^2 - 2 X_t & \omega \in \Omega \setminus N \\ 0 & \omega \in N \end{cases}.$$
Then $A^{(T)}$ is continuous, adapted since $M$ and $X$ are, and $(M^2_{t \wedge T} - A^{(T)}_{t \wedge T})_t$ is a martingale since $X$ is. Finally, $A^{(T)}$ is increasing since $M_t^2 - 2X^n_t$ is increasing on $2^{-n}\mathbb{Z} \cap [0, T]$ and the limit is uniform. So this $A^{(T)}$ basically satisfies all the properties we want $\langle M \rangle_t$ to satisfy, except we have the cutoff at time $T$.
We next observe that for any $T \geq 1$, $A^{(T)}_{t \wedge T} = A^{(T+1)}_{t \wedge T}$ for all $t$ almost surely. This essentially follows from the same uniqueness argument as we had at the beginning of the proof. Thus, there is a process $(\langle M \rangle_t)_{t \geq 0}$ such that
$$\langle M \rangle_t = A^{(T)}_t$$
for all $t \in [0, T]$ and $T \in \mathbb{N}$, almost surely. Then this is the desired process. So we have constructed $\langle M \rangle$ in the case where $M$ is bounded.
Claim. $\langle M \rangle^{(n)} \to \langle M \rangle$ u.c.p.
Recall that
$$\langle M \rangle^{(n)}_t = M^2_{2^{-n} \lfloor 2^n t \rfloor} - 2 X^n_{2^{-n} \lfloor 2^n t \rfloor}.$$
We also know that
$$\sup_{t \leq T} |X^n_t - X_t| \to 0$$
in $L^2$, hence also in probability. So we have
$$|\langle M \rangle_t - \langle M \rangle^{(n)}_t| \leq \sup_{t \leq T} |M^2_{2^{-n} \lfloor 2^n t \rfloor} - M^2_t| + 2\sup_{t \leq T} |X^n_{2^{-n} \lfloor 2^n t \rfloor} - X_{2^{-n} \lfloor 2^n t \rfloor}| + 2\sup_{t \leq T} |X_{2^{-n} \lfloor 2^n t \rfloor} - X_t|.$$
The first and last terms $\to 0$ in probability since $M$ and $X$ are uniformly continuous on $[0, T]$. The second term converges to zero by our previous assertion. So we are done.
Claim. The theorem holds for $M$ any continuous local martingale.
We let $T_n = \inf\{t \geq 0 : |M_t| \geq n\}$. Then $(T_n)$ reduces $M$ and $M^{T_n}$ is a bounded martingale. So in particular $M^{T_n}$ is a bounded continuous martingale. We set
$$A^n = \langle M^{T_n} \rangle.$$
Then $(A^n_t)$ and $(A^{n+1}_{t \wedge T_n})$ are indistinguishable for $t < T_n$ by the uniqueness argument. Thus there is a process $\langle M \rangle$ such that $\langle M \rangle_{t \wedge T_n}$ and $A^n_t$ are indistinguishable for all $n$. Clearly, $\langle M \rangle$ is increasing since the $A^n$ are, and $M^2_{t \wedge T_n} - \langle M \rangle_{t \wedge T_n}$ is a martingale for every $n$, so $M^2_t - \langle M \rangle_t$ is a continuous local martingale.
Claim. $\langle M \rangle^{(n)} \to \langle M \rangle$ u.c.p.
We have seen
$$\langle M^{T_k} \rangle^{(n)} \to \langle M^{T_k} \rangle \text{ u.c.p.}$$
for every $k$. So
$$\mathbb{P}\left( \sup_{t \leq T} |\langle M \rangle^{(n)}_t - \langle M \rangle_t| > \varepsilon \right) \leq \mathbb{P}(T_k < T) + \mathbb{P}\left( \sup_{t \leq T} |\langle M^{T_k} \rangle^{(n)}_t - \langle M^{T_k} \rangle_t| > \varepsilon \right).$$
So we can first pick $k$ large enough such that the first term is small, then pick $n$ large enough so that the second is small.
There are a few easy consequences of this theorem.
Fact. Let $M$ be a continuous local martingale, and let $T$ be a stopping time. Then almost surely for all $t \geq 0$,
$$\langle M^T \rangle_t = \langle M \rangle_{t \wedge T}.$$
Proof. Since $M_t^2 - \langle M \rangle_t$ is a continuous local martingale, so is $M^2_{t \wedge T} - \langle M \rangle_{t \wedge T} = (M^T)^2_t - \langle M \rangle_{t \wedge T}$. So we are done by uniqueness.
Fact. Let $M$ be a continuous local martingale with $M_0 = 0$. Then $M = 0$ iff $\langle M \rangle = 0$.
Proof. If $M = 0$, then $\langle M \rangle = 0$. Conversely, if $\langle M \rangle = 0$, then $M^2$ is a continuous local martingale and positive, hence a supermartingale. Thus $\mathbb{E} M_t^2 \leq \mathbb{E} M_0^2 = 0$.
Proposition. Let $M \in \mathcal{M}^2_c$. Then $M^2 - \langle M \rangle$ is a uniformly integrable martingale, and
$$\|M - M_0\|_{\mathcal{M}^2} = \big(\mathbb{E}\langle M \rangle_\infty\big)^{1/2}.$$
Proof. We will show that $\langle M \rangle_\infty \in L^1$. This then implies
$$|M_t^2 - \langle M \rangle_t| \leq \sup_{t \geq 0} M_t^2 + \langle M \rangle_\infty.$$
Then the right-hand side is in $L^1$. Since $M^2 - \langle M \rangle$ is a local martingale, this implies that it is in fact a uniformly integrable martingale.
To show $\langle M \rangle_\infty \in L^1$, we let
$$S_n = \inf\{t \geq 0 : \langle M \rangle_t \geq n\}.$$
Then $S_n \to \infty$, $S_n$ is a stopping time and moreover $\langle M \rangle_{t \wedge S_n} \leq n$. So we have
$$|M^2_{t \wedge S_n} - \langle M \rangle_{t \wedge S_n}| \leq n + \sup_{t \geq 0} M_t^2,$$
and the second term is in $L^1$. So $M^2_{t \wedge S_n} - \langle M \rangle_{t \wedge S_n}$ is a true martingale. So
$$\mathbb{E} M^2_{t \wedge S_n} - \mathbb{E} M_0^2 = \mathbb{E} \langle M \rangle_{t \wedge S_n}.$$
Taking the limit $t \to \infty$, we know $\mathbb{E} M^2_{t \wedge S_n} \to \mathbb{E} M^2_{S_n}$ by dominated convergence. Since $\langle M \rangle_{t \wedge S_n}$ is increasing, we also have $\mathbb{E} \langle M \rangle_{t \wedge S_n} \to \mathbb{E} \langle M \rangle_{S_n}$ by monotone convergence. We can take $n \to \infty$, and by the same justification, we have
$$\mathbb{E} \langle M \rangle_\infty = \mathbb{E} M_\infty^2 - \mathbb{E} M_0^2 = \mathbb{E}(M_\infty - M_0)^2 < \infty.$$
2.5 Covariation
We know $\mathcal{M}^2_c$ not only has a norm, but also an inner product. This can also be reflected in the bracket by the polarization identity, and it is natural to define
Definition (Covariation). Let $M, N$ be two continuous local martingales. Define the covariation (or simply the bracket) between $M$ and $N$ to be the process
$$\langle M, N \rangle_t = \frac{1}{4}\big( \langle M + N \rangle_t - \langle M - N \rangle_t \big).$$
Then if in fact $M, N \in \mathcal{M}^2_c$, then putting $t = \infty$ gives the inner product.
Proposition.
(i) $\langle M, N \rangle$ is the unique (up to indistinguishability) finite variation process such that $M_t N_t - \langle M, N \rangle_t$ is a continuous local martingale.
(ii) The mapping $(M, N) \mapsto \langle M, N \rangle$ is bilinear and symmetric.
(iii) $$\langle M, N \rangle_t = \lim_{n \to \infty} \langle M, N \rangle^{(n)}_t \text{ u.c.p.}, \quad \langle M, N \rangle^{(n)}_t = \sum_{i=1}^{\lceil 2^n t \rceil} (M_{i2^{-n}} - M_{(i-1)2^{-n}})(N_{i2^{-n}} - N_{(i-1)2^{-n}}).$$
(iv) For every stopping time $T$,
$$\langle M^T, N^T \rangle_t = \langle M^T, N \rangle_t = \langle M, N \rangle_{t \wedge T}.$$
(v) If $M, N \in \mathcal{M}^2_c$, then $M_t N_t - \langle M, N \rangle_t$ is a uniformly integrable martingale, and
$$(M - M_0, N - N_0)_{\mathcal{M}^2} = \mathbb{E} \langle M, N \rangle_\infty.$$
Example. Let $B, B'$ be two independent Brownian motions (with respect to the same filtration). Then $\langle B, B' \rangle = 0$.
Proof. Assume $B_0 = B'_0 = 0$. Then $X^\pm = \frac{1}{\sqrt{2}}(B \pm B')$ are Brownian motions, and so $\langle X^\pm \rangle_t = t$. So their difference vanishes.
An important result about the covariation is the following Cauchy–Schwarz-like inequality:
Proposition (Kunita–Watanabe). Let $M, N$ be continuous local martingales and let $H, K$ be two (previsible) processes. Then almost surely
$$\int_0^\infty |H_s| |K_s|\, |\mathrm{d}\langle M, N \rangle_s| \leq \left( \int_0^\infty H_s^2\, \mathrm{d}\langle M \rangle_s \right)^{1/2} \left( \int_0^\infty K_s^2\, \mathrm{d}\langle N \rangle_s \right)^{1/2}.$$
In fact, this is Cauchy–Schwarz. All we have to do is to take approximations and take limits and make sure everything works out well.
Proof. For convenience, we write
$$\langle M, N \rangle_s^t = \langle M, N \rangle_t - \langle M, N \rangle_s.$$
Claim. For all $0 \leq s \leq t$, we have
$$|\langle M, N \rangle_s^t| \leq \sqrt{\langle M, M \rangle_s^t} \sqrt{\langle N, N \rangle_s^t}.$$
By continuity, we can assume that $s, t$ are dyadic rationals. Then
$$\begin{aligned}
|\langle M, N \rangle_s^t| &= \lim_{n \to \infty} \left| \sum_{i = 2^n s + 1}^{2^n t} (M_{i2^{-n}} - M_{(i-1)2^{-n}})(N_{i2^{-n}} - N_{(i-1)2^{-n}}) \right| \\
&\leq \lim_{n \to \infty} \left( \sum_{i = 2^n s + 1}^{2^n t} (M_{i2^{-n}} - M_{(i-1)2^{-n}})^2 \right)^{1/2} \left( \sum_{i = 2^n s + 1}^{2^n t} (N_{i2^{-n}} - N_{(i-1)2^{-n}})^2 \right)^{1/2} && \text{(Cauchy–Schwarz)} \\
&= \big( \langle M, M \rangle_s^t \big)^{1/2} \big( \langle N, N \rangle_s^t \big)^{1/2},
\end{aligned}$$
where all equalities are u.c.p.
Claim. For all $0 \leq s < t$, we have
$$\int_s^t |\mathrm{d}\langle M, N \rangle_u| \leq \sqrt{\langle M, M \rangle_s^t} \sqrt{\langle N, N \rangle_s^t}.$$
Indeed, for any subdivision $s = t_0 < t_1 < \cdots < t_n = t$, we have, by the previous claim and Cauchy–Schwarz,
$$\sum_{i=1}^n |\langle M, N \rangle_{t_{i-1}}^{t_i}| \leq \sum_{i=1}^n \sqrt{\langle M, M \rangle_{t_{i-1}}^{t_i}} \sqrt{\langle N, N \rangle_{t_{i-1}}^{t_i}} \leq \left( \sum_{i=1}^n \langle M, M \rangle_{t_{i-1}}^{t_i} \right)^{1/2} \left( \sum_{i=1}^n \langle N, N \rangle_{t_{i-1}}^{t_i} \right)^{1/2}.$$
Taking the supremum over all subdivisions, the claim follows.
Claim. For all bounded Borel sets $B \subseteq [0, \infty)$, we have
$$\int_B |\mathrm{d}\langle M, N \rangle_u| \leq \sqrt{\int_B \mathrm{d}\langle M \rangle_u} \sqrt{\int_B \mathrm{d}\langle N \rangle_u}.$$
We already know this is true if $B$ is an interval. If $B$ is a finite union of intervals, then we apply Cauchy–Schwarz. By a monotone class argument, we can extend to all Borel sets.
Claim. The theorem holds for
$$H = \sum_{\ell=1}^n h_\ell \mathbf{1}_{B_\ell}, \quad K = \sum_{\ell=1}^n k_\ell \mathbf{1}_{B_\ell}$$
for $B_\ell \subseteq [0, \infty)$ bounded Borel sets with disjoint support.
We have
$$\begin{aligned}
\int |H_s K_s|\, |\mathrm{d}\langle M, N \rangle_s| &\leq \sum_{\ell=1}^n |h_\ell k_\ell| \int_{B_\ell} |\mathrm{d}\langle M, N \rangle_s| \\
&\leq \sum_{\ell=1}^n |h_\ell k_\ell| \left( \int_{B_\ell} \mathrm{d}\langle M \rangle_s \right)^{1/2} \left( \int_{B_\ell} \mathrm{d}\langle N \rangle_s \right)^{1/2} \\
&\leq \left( \sum_{\ell=1}^n h_\ell^2 \int_{B_\ell} \mathrm{d}\langle M \rangle_s \right)^{1/2} \left( \sum_{\ell=1}^n k_\ell^2 \int_{B_\ell} \mathrm{d}\langle N \rangle_s \right)^{1/2}.
\end{aligned}$$
To finish the proof, approximate general $H$ and $K$ by step functions and take the limit.
2.6 Semi-martingale
Definition (Semi-martingale). A (continuous) adapted process $X$ is a (continuous) semi-martingale if
$$X = X_0 + M + A,$$
where $X_0 \in \mathcal{F}_0$, $M$ is a continuous local martingale with $M_0 = 0$, and $A$ is a continuous finite variation process with $A_0 = 0$.
This decomposition is unique up to indistinguishability.
Definition (Quadratic variation). Let $X = X_0 + M + A$ and $X' = X'_0 + M' + A'$ be (continuous) semi-martingales. Set
$$\langle X \rangle = \langle M \rangle, \quad \langle X, X' \rangle = \langle M, M' \rangle.$$
This definition makes sense, because continuous finite variation processes do not have quadratic variation.
Exercise. We have
$$\langle X, Y \rangle^{(n)}_t = \sum_{i=1}^{\lceil 2^n t \rceil} (X_{i2^{-n}} - X_{(i-1)2^{-n}})(Y_{i2^{-n}} - Y_{(i-1)2^{-n}}) \to \langle X, Y \rangle \text{ u.c.p.}$$
3 The stochastic integral
3.1 Simple processes
We now have all the background required to define the stochastic integral, and we can start constructing it. As in the case of the Lebesgue integral, we first define it for simple processes, and then extend to general processes by taking a limit. Recall that we have
Definition (Simple process). The space of simple processes $\mathcal{E}$ consists of functions $H : \Omega \times [0, \infty) \to \mathbb{R}$ that can be written as
$$H_t(\omega) = \sum_{i=1}^n H_{i-1}(\omega) \mathbf{1}_{(t_{i-1}, t_i]}(t)$$
for some $0 \leq t_0 \leq t_1 \leq \cdots \leq t_n$ and bounded random variables $H_i \in \mathcal{F}_{t_i}$.
Definition ($H \cdot M$). For $M \in \mathcal{M}^2$ and $H \in \mathcal{E}$, we set
$$\int_0^t H\, \mathrm{d}M = (H \cdot M)_t = \sum_{i=1}^n H_{i-1} (M_{t_i \wedge t} - M_{t_{i-1} \wedge t}).$$
If $M$ were of finite variation, then this is the same as what we have previously seen.
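A minimal computational sketch of $(H \cdot M)_t$ for a simple process (my illustration, not from the notes), taking $M$ to be a simulated Brownian path; the grid, the times, and the choice $H_{i-1} = M_{t_{i-1}}$ are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
dt, T = 1e-4, 1.0
grid = np.arange(0, T + dt, dt)
M = np.concatenate([[0.0], np.cumsum(rng.normal(0, np.sqrt(dt), len(grid) - 1))])

def simple_integral(t_pts, H_vals, t):
    """(H·M)_t = sum_i H_{i-1} (M_{t_i ∧ t} - M_{t_{i-1} ∧ t})."""
    at = lambda u: M[int(round(min(u, t) / dt))]
    return sum(h * (at(ti) - at(tim1))
               for h, tim1, ti in zip(H_vals, t_pts[:-1], t_pts[1:]))

t_pts = [0.0, 0.25, 0.5, 0.75, 1.0]
H_vals = [M[int(round(s / dt))] for s in t_pts[:-1]]   # H_{i-1} = M_{t_{i-1}}
print(simple_integral(t_pts, H_vals, 1.0))
# Previsibility is visible here: H is frozen at the left endpoint
# of each interval (t_{i-1}, t_i].
```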
Recall that for the Lebesgue integral, extending this definition to general functions required results like monotone convergence. Here we need some similar results that put bounds on how large the integral can be. In fact, we get something better than a bound.
Proposition. If $M \in \mathcal{M}^2_c$ and $H \in \mathcal{E}$, then $H \cdot M \in \mathcal{M}^2_c$ and
$$\|H \cdot M\|^2_{\mathcal{M}^2} = \mathbb{E}\left( \int_0^\infty H_s^2\, \mathrm{d}\langle M \rangle_s \right). \tag{$*$}$$
Proof. We first show that $H \cdot M \in \mathcal{M}^2_c$. By linearity, we only have to check it for
$$X^i_t = H_{i-1}(M_{t_i \wedge t} - M_{t_{i-1} \wedge t}).$$
We have to check that $\mathbb{E}(X^i_t - X^i_s \mid \mathcal{F}_s) = 0$ for all $t > s$, and the only non-trivial case is when $t > t_{i-1}$. For $s \geq t_{i-1}$, since $H_{i-1} \in \mathcal{F}_s$,
$$\mathbb{E}(X^i_t - X^i_s \mid \mathcal{F}_s) = H_{i-1}\, \mathbb{E}(M_{t_i \wedge t} - M_{t_i \wedge s} \mid \mathcal{F}_s) = 0,$$
and the case $s < t_{i-1}$ then follows by the tower property.
We also check that
$$\|X^i\|_{\mathcal{M}^2} \leq 2 \|H\|_\infty \|M\|_{\mathcal{M}^2}.$$
So it is bounded. So $H \cdot M \in \mathcal{M}^2_c$.
To prove $(*)$, we note that the $X^i$ are orthogonal and that
$$\langle X^i \rangle_t = H^2_{i-1} (\langle M \rangle_{t_i \wedge t} - \langle M \rangle_{t_{i-1} \wedge t}).$$
So we have
$$\langle H \cdot M, H \cdot M \rangle = \sum_i \langle X^i, X^i \rangle = \sum_i H^2_{i-1} (\langle M \rangle_{t_i \wedge t} - \langle M \rangle_{t_{i-1} \wedge t}) = \int_0^t H_s^2\, \mathrm{d}\langle M \rangle_s.$$
In particular,
$$\|H \cdot M\|^2_{\mathcal{M}^2} = \mathbb{E} \langle H \cdot M \rangle_\infty = \mathbb{E}\left( \int_0^\infty H_s^2\, \mathrm{d}\langle M \rangle_s \right).$$
Proposition. Let $M \in \mathcal{M}^2_c$ and $H \in \mathcal{E}$. Then
$$\langle H \cdot M, N \rangle = H \cdot \langle M, N \rangle$$
for all $N \in \mathcal{M}^2$.
In other words, the stochastic integral commutes with the bracket.
Proof. Write $H \cdot M = \sum X^i = \sum H_{i-1}(M_{t_i \wedge t} - M_{t_{i-1} \wedge t})$ as before. Then
$$\langle X^i, N \rangle_t = H_{i-1} \langle M^{t_i} - M^{t_{i-1}}, N \rangle_t = H_{i-1}(\langle M, N \rangle_{t_i \wedge t} - \langle M, N \rangle_{t_{i-1} \wedge t}).$$
3.2 Itô isometry
We now try to extend the above definition to something more general than simple processes.
Definition ($L^2(M)$). Let $M \in \mathcal{M}^2_c$. Define $L^2(M)$ to be the space of (equivalence classes of) previsible $H : \Omega \times [0, \infty) \to \mathbb{R}$ such that
$$\|H\|_{L^2(M)} = \|H\|_M = \mathbb{E}\left( \int_0^\infty H_s^2\, \mathrm{d}\langle M \rangle_s \right)^{1/2} < \infty.$$
For $H, K \in L^2(M)$, we set
$$(H, K)_{L^2(M)} = \mathbb{E}\left( \int_0^\infty H_s K_s\, \mathrm{d}\langle M \rangle_s \right).$$
In fact, $L^2(M)$ is equal to $L^2(\Omega \times [0, \infty), \mathcal{P}, \mathrm{d}\mathbb{P}\, \mathrm{d}\langle M \rangle)$, where $\mathcal{P}$ is the previsible $\sigma$-algebra, and in particular is a Hilbert space.
Proposition. Let $M \in \mathcal{M}^2_c$. Then $\mathcal{E}$ is dense in $L^2(M)$.
Proof. Since $L^2(M)$ is a Hilbert space, it suffices to show that if $(K, H) = 0$ for all $H \in \mathcal{E}$, then $K = 0$.
So assume that $(K, H) = 0$ for all $H \in \mathcal{E}$ and set
$$X_t = \int_0^t K_s\, \mathrm{d}\langle M \rangle_s.$$
Then $X$ is a well-defined finite variation process, and $X_t \in L^1$ for all $t$. It suffices to show that $X_t = 0$ for all $t$, and we shall show that $X_t$ is a continuous martingale.
Let $0 \leq s < t$ and $F \in \mathcal{F}_s$ bounded. We let $H = F \mathbf{1}_{(s, t]} \in \mathcal{E}$. By assumption, we know
$$0 = (K, H) = \mathbb{E}\left( F \int_s^t K_u\, \mathrm{d}\langle M \rangle_u \right) = \mathbb{E}(F(X_t - X_s)).$$
Since this holds for all bounded $\mathcal{F}_s$-measurable $F$, we have shown that
$$\mathbb{E}(X_t \mid \mathcal{F}_s) = X_s.$$
So $X$ is a (continuous) martingale of finite variation starting at $0$, hence $X = 0$, and we are done.
Theorem. Let $M \in \mathcal{M}^2_c$. Then
(i) The map $H \in \mathcal{E} \mapsto H \cdot M \in \mathcal{M}^2_c$ extends uniquely to an isometry $L^2(M) \to \mathcal{M}^2_c$, called the Itô isometry.
(ii) For $H \in L^2(M)$, $H \cdot M$ is the unique martingale in $\mathcal{M}^2_c$ such that
$$\langle H \cdot M, N \rangle = H \cdot \langle M, N \rangle$$
for all $N \in \mathcal{M}^2_c$, where the integral on the LHS is the stochastic integral (as above) and the RHS is the finite variation integral.
(iii) If $T$ is a stopping time, then $(\mathbf{1}_{[0, T]} H) \cdot M = (H \cdot M)^T = H \cdot M^T$.
Definition (Stochastic integral). $H \cdot M$ is the stochastic integral of $H$ with respect to $M$ and we also write
$$(H \cdot M)_t = \int_0^t H_s\, \mathrm{d}M_s.$$
It is important that the integral of a martingale is still a martingale. After proving Itô's formula, we will use this fact to show that a lot of things are in fact martingales in a rather systematic manner. For example, it will be rather effortless to show that $B_t^2 - t$ is a martingale when $B_t$ is a standard Brownian motion.
Proof.
(i) We have already shown that this map is an isometry when restricted to $\mathcal{E}$. So extend by completeness of $\mathcal{M}^2_c$ and denseness of $\mathcal{E}$.
(ii) Again the equation to show is known for simple $H$, and we want to show it is preserved under taking limits. Suppose $H^n \to H$ in $L^2(M)$ with $H^n \in \mathcal{E}$. Then $H^n \cdot M \to H \cdot M$ in $\mathcal{M}^2_c$. We want to show that
$$\langle H \cdot M, N \rangle_\infty = \lim_{n \to \infty} \langle H^n \cdot M, N \rangle_\infty \text{ in } L^1,$$
$$H \cdot \langle M, N \rangle_\infty = \lim_{n \to \infty} H^n \cdot \langle M, N \rangle_\infty \text{ in } L^1$$
for all $N \in \mathcal{M}^2_c$.
To show the first holds, we use the Kunita–Watanabe inequality to get
$$\mathbb{E} |\langle H \cdot M - H^n \cdot M, N \rangle_\infty| \leq \mathbb{E}\big( \langle H \cdot M - H^n \cdot M \rangle_\infty \big)^{1/2} \big( \mathbb{E} \langle N \rangle_\infty \big)^{1/2},$$
and the first factor is $\|H \cdot M - H^n \cdot M\|_{\mathcal{M}^2} \to 0$, while the second is finite since $N \in \mathcal{M}^2_c$. The second follows from
$$\mathbb{E} |((H - H^n) \cdot \langle M, N \rangle)_\infty| \leq \|H - H^n\|_{L^2(M)} \|N\|_{\mathcal{M}^2} \to 0.$$
So we know that $\langle H \cdot M, N \rangle_\infty = (H \cdot \langle M, N \rangle)_\infty$. We can then replace $N$ by the stopped process $N^t$ to get $\langle H \cdot M, N \rangle_t = (H \cdot \langle M, N \rangle)_t$.
To see uniqueness, suppose $X \in \mathcal{M}^2_c$ is another such martingale. Then we have $\langle X - H \cdot M, N \rangle = 0$ for all $N$. Take $N = X - H \cdot M$, and then we are done.
(iii) For $N \in \mathcal{M}^2_c$, we have
$$\langle (H \cdot M)^T, N \rangle_t = \langle H \cdot M, N \rangle_{t \wedge T} = H \cdot \langle M, N \rangle_{t \wedge T} = (H \mathbf{1}_{[0, T]} \cdot \langle M, N \rangle)_t$$
for every $N$. So we have shown that
$$(H \cdot M)^T = (\mathbf{1}_{[0, T]} H) \cdot M$$
by (ii). To prove the second equality, we have
$$\langle H \cdot M^T, N \rangle_t = H \cdot \langle M^T, N \rangle_t = H \cdot \langle M, N \rangle_{t \wedge T} = (H \mathbf{1}_{[0, T]} \cdot \langle M, N \rangle)_t.$$
Note that (ii) can be written as
$$\left\langle \int_0^{(\cdot)} H_s\, \mathrm{d}M_s, N \right\rangle_t = \int_0^t H_s\, \mathrm{d}\langle M, N \rangle_s.$$
Corollary.
$$\langle H \cdot M, K \cdot N \rangle = H \cdot (K \cdot \langle M, N \rangle) = (HK) \cdot \langle M, N \rangle.$$
In other words,
$$\left\langle \int_0^{(\cdot)} H_s\, \mathrm{d}M_s, \int_0^{(\cdot)} K_s\, \mathrm{d}N_s \right\rangle_t = \int_0^t H_s K_s\, \mathrm{d}\langle M, N \rangle_s.$$
Corollary. Since $H \cdot M$ and $(H \cdot M)(K \cdot N) - \langle H \cdot M, K \cdot N \rangle$ are martingales starting at $0$, we have
$$\mathbb{E}\left( \int_0^t H_s\, \mathrm{d}M_s \right) = 0,$$
$$\mathbb{E}\left( \int_0^t H_s\, \mathrm{d}M_s \int_0^t K_s\, \mathrm{d}N_s \right) = \mathbb{E}\left( \int_0^t H_s K_s\, \mathrm{d}\langle M, N \rangle_s \right).$$
Corollary. Let $H \in L^2(M)$. Then $HK \in L^2(M)$ iff $K \in L^2(H \cdot M)$, in which case
$$(KH) \cdot M = K \cdot (H \cdot M).$$
Proof. We have
$$\mathbb{E}\left( \int_0^\infty K_s^2 H_s^2\, \mathrm{d}\langle M \rangle_s \right) = \mathbb{E}\left( \int_0^\infty K_s^2\, \mathrm{d}\langle H \cdot M \rangle_s \right),$$
so $\|K\|_{L^2(H \cdot M)} = \|HK\|_{L^2(M)}$. For $N \in \mathcal{M}^2_c$, we have
$$\langle (KH) \cdot M, N \rangle_t = (KH \cdot \langle M, N \rangle)_t = (K \cdot (H \cdot \langle M, N \rangle))_t = (K \cdot \langle H \cdot M, N \rangle)_t.$$
3.3 Extension to local martingales
We have now defined the stochastic integral for continuous martingales. We next go through some formalities to extend this to local martingales, and ultimately to semi-martingales. We are not doing this just for fun. Rather, when we later prove results like Itô's formula, even when we put in continuous (local) martingales, we usually end up with some semi-martingales. So it is useful to be able to deal with semi-martingales in general.
Definition ($L^2_{\mathrm{loc}}(M)$). Let $L^2_{\mathrm{loc}}(M)$ be the space of previsible $H$ such that
$$\int_0^t H_s^2\, \mathrm{d}\langle M \rangle_s < \infty \text{ a.s.}$$
for all finite $t > 0$.
Theorem. Let $M$ be a continuous local martingale.
(i) For every $H \in L^2_{\mathrm{loc}}(M)$, there is a unique continuous local martingale $H \cdot M$ with $(H \cdot M)_0 = 0$ and
$$\langle H \cdot M, N \rangle = H \cdot \langle M, N \rangle$$
for all continuous local martingales $N$.
(ii) If $T$ is a stopping time, then
$$(\mathbf{1}_{[0, T]} H) \cdot M = (H \cdot M)^T = H \cdot M^T.$$
(iii) If $H \in L^2_{\mathrm{loc}}(M)$ and $K$ is previsible, then $K \in L^2_{\mathrm{loc}}(H \cdot M)$ iff $HK \in L^2_{\mathrm{loc}}(M)$, and then
$$K \cdot (H \cdot M) = (KH) \cdot M.$$
(iv) Finally, if $M \in \mathcal{M}^2_c$ and $H \in L^2(M)$, then the definition is the same as the previous one.
Proof. Assume $M_0 = 0$, and that $\int_0^t H_s^2\, \mathrm{d}\langle M \rangle_s < \infty$ for all $\omega$ (by setting $H = 0$ when this fails). Set
$$S_n = \inf\left\{ t \geq 0 : \int_0^t (1 + H_s^2)\, \mathrm{d}\langle M \rangle_s \geq n \right\}.$$
These $S_n$ are stopping times that tend to infinity. Then
$$\langle M^{S_n}, M^{S_n} \rangle_t = \langle M, M \rangle_{t \wedge S_n} \leq n.$$
So $M^{S_n} \in \mathcal{M}^2_c$. Also,
$$\int_0^\infty H_s^2\, \mathrm{d}\langle M^{S_n} \rangle_s = \int_0^{S_n} H_s^2\, \mathrm{d}\langle M \rangle_s \leq n.$$
So $H \in L^2(M^{S_n})$, and we have already defined what $H \cdot M^{S_n}$ is. Now notice that
$$H \cdot M^{S_n} = (H \cdot M^{S_m})^{S_n} \text{ for } m \geq n.$$
So it makes sense to define
$$H \cdot M = \lim_{n \to \infty} H \cdot M^{S_n}.$$
This is the unique process such that $(H \cdot M)^{S_n} = H \cdot M^{S_n}$. We see that $H \cdot M$ is a continuous adapted local martingale with reducing sequence $(S_n)$.
Claim. $\langle H \cdot M, N \rangle = H \cdot \langle M, N \rangle$.
Indeed, assume that $N_0 = 0$. Set $S'_n = \inf\{t \geq 0 : |N_t| \geq n\}$ and $T_n = S_n \wedge S'_n$. Observe that $N^{S'_n} \in \mathcal{M}^2_c$. Then
$$\langle H \cdot M, N \rangle^{T_n} = \langle H \cdot M^{S_n}, N^{S'_n} \rangle^{T_n} = \big(H \cdot \langle M^{S_n}, N^{S'_n} \rangle\big)^{T_n} = \big(H \cdot \langle M, N \rangle\big)^{T_n}.$$
Taking the limit $n \to \infty$ gives the desired result.
The proofs of the other claims are the same as before, since they only use the characterizing property $\langle H \cdot M, N \rangle = H \cdot \langle M, N \rangle$.
3.4 Extension to semi-martingales
Definition (Locally bounded previsible process). A previsible process $H$ is locally bounded if for all $t \geq 0$, we have
$$\sup_{s \leq t} |H_s| < \infty \text{ a.s.}$$
Fact.
(i) Any adapted continuous process is locally bounded.
(ii) If $H$ is locally bounded and $A$ is a finite variation process, then for all $t \geq 0$, we have
$$\int_0^t |H_s|\, |\mathrm{d}A_s| < \infty \text{ a.s.}$$
Now if $X = X_0 + M + A$ is a semi-martingale, where $X_0 \in \mathcal{F}_0$, $M$ is a continuous local martingale and $A$ is a finite variation process, we want to define $\int H_s\, \mathrm{d}X_s$. We already know what it means to define integration with respect to $\mathrm{d}M_s$ and $\mathrm{d}A_s$, using the Itô integral and the finite variation integral respectively, and $X_0$ doesn't change, so we can ignore it.
Definition (Stochastic integral). Let $X = X_0 + M + A$ be a continuous semi-martingale, and $H$ a locally bounded previsible process. Then the stochastic integral $H \cdot X$ is the continuous semi-martingale defined by
$$H \cdot X = H \cdot M + H \cdot A,$$
and we write
$$(H \cdot X)_t = \int_0^t H_s\, \mathrm{d}X_s.$$
Proposition.
(i) $(H, X) \mapsto H \cdot X$ is bilinear.
(ii) $H \cdot (K \cdot X) = (HK) \cdot X$ if $H$ and $K$ are locally bounded.
(iii) $(H \cdot X)^T = H \mathbf{1}_{[0, T]} \cdot X = H \cdot X^T$ for every stopping time $T$.
(iv) If $X$ is a continuous local martingale (resp. a finite variation process), then so is $H \cdot X$.
(v) If $H = \sum_{i=1}^n H_{i-1} \mathbf{1}_{(t_{i-1}, t_i]}$ and $H_{i-1} \in \mathcal{F}_{t_{i-1}}$ (not necessarily bounded), then
$$(H \cdot X)_t = \sum_{i=1}^n H_{i-1} (X_{t_i \wedge t} - X_{t_{i-1} \wedge t}).$$
Proof. (i) to (iv) follow from analogous properties for $H \cdot M$ and $H \cdot A$. The last part is also true by definition if the $H_i$ are uniformly bounded. If $H_i$ is not bounded, then the finite variation part is still fine, since for each fixed $\omega \in \Omega$, $H_i(\omega)$ is a fixed number. For the martingale part, set
$$T_n = \inf\{t \geq 0 : |H_t| \geq n\}.$$
Then $T_n$ are stopping times, $T_n \to \infty$, and $H \mathbf{1}_{[0, T_n]} \in \mathcal{E}$. Thus
$$(H \cdot M)_{t \wedge T_n} = \sum_{i=1}^n H_{i-1} (X_{t_i \wedge t \wedge T_n} - X_{t_{i-1} \wedge t \wedge T_n}).$$
Then take the limit $n \to \infty$.
Before we get to Itô's formula, we need a few more useful properties:
Proposition (Stochastic dominated convergence theorem). Let $X$ be a continuous semi-martingale. Let $H, H^n$ be previsible and locally bounded, and let $K$ be previsible and non-negative. Let $t > 0$. Suppose
(i) $H^n_s \to H_s$ as $n \to \infty$ for all $s \in [0, t]$;
(ii) $|H^n_s| \leq K_s$ for all $s \in [0, t]$ and $n \in \mathbb{N}$;
(iii) $\int_0^t K_s^2\, \mathrm{d}\langle M \rangle_s < \infty$ and $\int_0^t K_s\, |\mathrm{d}A_s| < \infty$ (note that both conditions are okay if $K$ is locally bounded).
Then
$$\int_0^t H^n_s\, \mathrm{d}X_s \to \int_0^t H_s\, \mathrm{d}X_s \text{ in probability.}$$
Proof. For the finite variation part, the convergence follows from the usual dominated convergence theorem. For the martingale part, we set
$$T_m = \inf\left\{ t \geq 0 : \int_0^t K_s^2\, \mathrm{d}\langle M \rangle_s \geq m \right\}.$$
So we have
$$\mathbb{E}\left( \left( \int_0^{T_m \wedge t} H^n_s\, \mathrm{d}M_s - \int_0^{T_m \wedge t} H_s\, \mathrm{d}M_s \right)^2 \right) = \mathbb{E}\left( \int_0^{T_m \wedge t} (H^n_s - H_s)^2\, \mathrm{d}\langle M \rangle_s \right) \to 0,$$
using the usual dominated convergence theorem, since $\int_0^{T_m \wedge t} K_s^2\, \mathrm{d}\langle M \rangle_s \leq m$. Since $T_m \wedge t = t$ eventually as $m \to \infty$ almost surely, hence in probability, we are done.
Proposition. Let $X$ be a continuous semi-martingale, and let $H$ be an adapted bounded left-continuous process. Then for every sequence of subdivisions $0 = t_0^{(m)} < t_1^{(m)} < \cdots < t_{n_m}^{(m)} = t$ of $[0, t]$ with $\max_i |t_i^{(m)} - t_{i-1}^{(m)}| \to 0$, we have
$$\int_0^t H_s\, \mathrm{d}X_s = \lim_{m \to \infty} \sum_{i=1}^{n_m} H_{t_{i-1}^{(m)}} (X_{t_i^{(m)}} - X_{t_{i-1}^{(m)}})$$
in probability.
Proof. We have already proved this for the Lebesgue–Stieltjes integral, and all we used was dominated convergence. So the same proof works using the stochastic dominated convergence theorem.
3.5 Itô formula
We now prove the equivalent of the integration by parts and the chain rule, i.e. Itô's formula. Compared to the world of usual integrals, the difference is that the quadratic variation, i.e. "second order terms", will crop up quite a lot, since they are no longer negligible.
Theorem (Integration by parts). Let $X, Y$ be continuous semi-martingales. Then almost surely,
$$X_t Y_t - X_0 Y_0 = \int_0^t X_s\, \mathrm{d}Y_s + \int_0^t Y_s\, \mathrm{d}X_s + \langle X, Y \rangle_t.$$
The last term is called the Itô correction.
Note that if $X, Y$ are martingales, then the first two terms on the right are martingales, but the last is not. So we are forced to think about semi-martingales.
Observe that in the case of finite variation integrals, we don't have the correction.
Proof. We have
$$X_t Y_t - X_s Y_s = X_s(Y_t - Y_s) + (X_t - X_s) Y_s + (X_t - X_s)(Y_t - Y_s).$$
When doing usual calculus, we can drop the last term, because it is second order. However, the quadratic variation of martingales is in general non-zero, and so we must keep track of this. We have
$$\begin{aligned}
X_{k2^{-n}} Y_{k2^{-n}} - X_0 Y_0 &= \sum_{i=1}^k (X_{i2^{-n}} Y_{i2^{-n}} - X_{(i-1)2^{-n}} Y_{(i-1)2^{-n}}) \\
&= \sum_{i=1}^k \Big( X_{(i-1)2^{-n}} (Y_{i2^{-n}} - Y_{(i-1)2^{-n}}) + Y_{(i-1)2^{-n}} (X_{i2^{-n}} - X_{(i-1)2^{-n}}) \\
&\qquad\quad + (X_{i2^{-n}} - X_{(i-1)2^{-n}})(Y_{i2^{-n}} - Y_{(i-1)2^{-n}}) \Big).
\end{aligned}$$
Taking the limit $n \to \infty$ with $k2^{-n}$ fixed, we see that the formula holds for $t$ a dyadic rational. Then by continuity, it holds for all $t$.
The really useful formula is the following:
Theorem (Itô's formula). Let $X^1, \ldots, X^p$ be continuous semi-martingales, and let $f : \mathbb{R}^p \to \mathbb{R}$ be $C^2$. Then, writing $X = (X^1, \ldots, X^p)$, we have, almost surely,
$$f(X_t) = f(X_0) + \sum_{i=1}^p \int_0^t \frac{\partial f}{\partial x_i}(X_s)\, \mathrm{d}X^i_s + \frac{1}{2} \sum_{i,j=1}^p \int_0^t \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s)\, \mathrm{d}\langle X^i, X^j \rangle_s.$$
In particular, $f(X)$ is a semi-martingale.
The proof is long but not hard. We first do it for polynomials by explicit computation, and then use Weierstrass approximation to extend it to more general functions.
Proof.
Claim. Itô's formula holds when $f$ is a polynomial.
It clearly does when $f$ is a constant! We then proceed by induction. Suppose Itô's formula holds for some $f$. Then we apply integration by parts to
$$g(x) = x_k f(x),$$
where $x_k$ denotes the $k$th component of $x$. Then we have
$$g(X_t) = g(X_0) + \int_0^t X^k_s\, \mathrm{d}f(X_s) + \int_0^t f(X_s)\, \mathrm{d}X^k_s + \langle X^k, f(X) \rangle_t.$$
We now apply Itô's formula for $f$ to write
$$\int_0^t X^k_s\, \mathrm{d}f(X_s) = \sum_{i=1}^p \int_0^t X^k_s \frac{\partial f}{\partial x_i}(X_s)\, \mathrm{d}X^i_s + \frac{1}{2} \sum_{i,j=1}^p \int_0^t X^k_s \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s)\, \mathrm{d}\langle X^i, X^j \rangle_s.$$
We also have
$$\langle X^k, f(X) \rangle_t = \sum_{i=1}^p \int_0^t \frac{\partial f}{\partial x_i}(X_s)\, \mathrm{d}\langle X^k, X^i \rangle_s.$$
So we have
$$g(X_t) = g(X_0) + \sum_{i=1}^p \int_0^t \frac{\partial g}{\partial x_i}(X_s)\, \mathrm{d}X^i_s + \frac{1}{2} \sum_{i,j=1}^p \int_0^t \frac{\partial^2 g}{\partial x_i \partial x_j}(X_s)\, \mathrm{d}\langle X^i, X^j \rangle_s.$$
So by induction, Itô's formula holds for all polynomials.
Claim. Itô's formula holds for all $f \in C^2$ if $|X_t(\omega)| \leq n$ and $\int_0^t |\mathrm{d}A_s| \leq n$ for all $(t, \omega)$.
By the Weierstrass approximation theorem, there are polynomials $p_k$ such that
$$\sup_{|x| \leq n} \left( |f(x) - p_k(x)| + \max_i \left| \frac{\partial f}{\partial x_i} - \frac{\partial p_k}{\partial x_i} \right| + \max_{i,j} \left| \frac{\partial^2 f}{\partial x_i \partial x_j} - \frac{\partial^2 p_k}{\partial x_i \partial x_j} \right| \right) \leq \frac{1}{k}.$$
By taking limits, in probability, we have
$$f(X_t) - f(X_0) = \lim_{k \to \infty} \big(p_k(X_t) - p_k(X_0)\big),$$
$$\int_0^t \frac{\partial f}{\partial x_i}(X_s)\, \mathrm{d}X^i_s = \lim_{k \to \infty} \int_0^t \frac{\partial p_k}{\partial x_i}(X_s)\, \mathrm{d}X^i_s$$
by the stochastic dominated convergence theorem, and by the regular dominated convergence, we have
$$\int_0^t \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s)\, \mathrm{d}\langle X^i, X^j \rangle_s = \lim_{k \to \infty} \int_0^t \frac{\partial^2 p_k}{\partial x_i \partial x_j}(X_s)\, \mathrm{d}\langle X^i, X^j \rangle_s.$$
Claim. Itô's formula holds for all $X$.
Let
$$T_n = \inf\left\{ t \geq 0 : |X_t| \geq n \text{ or } \int_0^t |\mathrm{d}A_s| \geq n \right\}.$$
Then by the previous claim, we have
$$\begin{aligned}
f(X_{T_n \wedge t}) &= f(X_0) + \sum_{i=1}^p \int_0^t \frac{\partial f}{\partial x_i}(X^{T_n}_s)\, \mathrm{d}(X^i)^{T_n}_s + \frac{1}{2} \sum_{i,j} \int_0^t \frac{\partial^2 f}{\partial x_i \partial x_j}(X^{T_n}_s)\, \mathrm{d}\langle (X^i)^{T_n}, (X^j)^{T_n} \rangle_s \\
&= f(X_0) + \sum_{i=1}^p \int_0^{t \wedge T_n} \frac{\partial f}{\partial x_i}(X_s)\, \mathrm{d}X^i_s + \frac{1}{2} \sum_{i,j} \int_0^{t \wedge T_n} \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s)\, \mathrm{d}\langle X^i, X^j \rangle_s.
\end{aligned}$$
Then take $T_n \to \infty$.
Example. Let $B$ be a standard Brownian motion with $B_0 = 0$, and $f(x) = x^2$. Then
$$B_t^2 = 2 \int_0^t B_s\, \mathrm{d}B_s + t.$$
In other words,
$$B_t^2 - t = 2 \int_0^t B_s\, \mathrm{d}B_s.$$
In particular, this is a continuous local martingale.
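To make the correction term concrete, here is a small numerical sketch (my illustration, not from the notes) comparing $B_t^2$ with the left-endpoint Riemann sums for $2\int_0^t B_s\, \mathrm{d}B_s$; the gap converges to $t$, not to $0$:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 10**6, 1.0
dB = rng.normal(0, np.sqrt(T / N), N)
B = np.concatenate([[0.0], np.cumsum(dB)])

ito_sum = 2 * np.sum(B[:-1] * dB)   # left endpoints: the Itô convention
print(B[-1]**2, ito_sum, B[-1]**2 - ito_sum)
# The last number is close to T = 1, matching B_t^2 = 2 ∫ B dB + t.
# Indeed 2 Σ B_i ΔB_i = B_T^2 - Σ (ΔB_i)^2, and the squared-increment
# sum is exactly the quadratic variation approximation <B>^(n)_T ≈ T.
```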
Example. Let $B = (B^1, \ldots, B^d)$ be a $d$-dimensional Brownian motion. Then we apply Itô's formula to the semi-martingale $X = (t, B^1, \ldots, B^d)$. Then we find that
$$f(t, B_t) - f(0, B_0) - \int_0^t \left( \frac{\partial}{\partial s} + \frac{1}{2} \Delta \right) f(s, B_s)\, \mathrm{d}s = \sum_{i=1}^d \int_0^t \frac{\partial}{\partial x_i} f(s, B_s)\, \mathrm{d}B^i_s$$
is a continuous local martingale.
There are some syntactic tricks that make stochastic integrals easier to
manipulate, namely by working in differential form. We can state Itˆo’s formula
in differential form
df(X
t
) =
p
X
i=1
f
x
i
dX
i
+
1
2
p
X
i,j=1
2
f
x
i
x
j
dhX
i
, X
j
i,
which we can think of as the chain rule. For example, in the case case of Brownian
motion, we have
df(B
t
) = f
0
(B
t
) dB
t
+
1
2
f
00
(B
t
) dt.
Formally, one expands
f
using that that “(d
t
)
2
= 0” but “(d
B
)
2
= d
t
”. The
following formal rules hold:
Z
t
Z
0
=
Z
t
0
H
s
dX
s
dZ
t
= H
t
dX
t
Z
t
= hX, Y i
t
=
Z
t
0
dhX, Y i
t
dZ
t
= dX
t
dY
t
.
Then we have rules such as
$$H_t(K_t\, dX_t) = (H_t K_t)\, dX_t$$
$$H_t(dX_t\, dY_t) = (H_t\, dX_t)\, dY_t$$
$$d(X_t Y_t) = X_t\, dY_t + Y_t\, dX_t + dX_t\, dY_t$$
$$df(X_t) = f'(X_t)\, dX_t + \frac{1}{2} f''(X_t)\, dX_t\, dX_t.$$
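One can see the multiplication table at work on a discretized path: summing products of increments over a fine grid of $[0, T]$, only the "$(dB)^2 = dt$" column survives in the limit. A small sketch (the grid size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 1.0, 1_000_000
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), n)

print(np.sum(dB * dB))   # "(dB)^2 = dt": sums to approximately T
print(np.sum(dB * dt))   # "dB dt = 0":   sums to approximately 0
print(n * dt * dt)       # "(dt)^2 = 0":  sums to T*dt, vanishing as n grows
```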
3.6 The Lévy characterization

A major application of the stochastic integral is the following convenient characterization of Brownian motion:
Theorem (Lévy's characterization of Brownian motion). Let $(X^1, \ldots, X^d)$ be continuous local martingales. Suppose that $X_0 = 0$ and that $\langle X^i, X^j\rangle_t = \delta_{ij} t$ for all $i, j = 1, \ldots, d$ and $t \ge 0$. Then $(X^1, \ldots, X^d)$ is a standard $d$-dimensional Brownian motion.
This might seem like a rather artificial condition, but it turns out to be quite useful in practice (though less so in this course). The point is that we know $\langle H \cdot M\rangle_t = (H^2 \cdot \langle M\rangle)_t$, and in particular if we are integrating things with respect to Brownian motions of some sort, we know $\langle B\rangle_t = t$, and so we are left with some explicit, familiar integral to do.
Proof. Let $0 \le s < t$. It suffices to check that $X_t - X_s$ is independent of $\mathcal{F}_s$ and $X_t - X_s \sim N(0, (t-s)I)$.
Claim. $E(e^{i\theta\cdot(X_t - X_s)} \mid \mathcal{F}_s) = e^{-\frac{1}{2}|\theta|^2(t-s)}$ for all $\theta \in \mathbb{R}^d$ and $s < t$.
This is sufficient, since the right-hand side is independent of $\mathcal{F}_s$, hence so is the left-hand side, and the Fourier transform characterizes the distribution.

To check this, for $\theta \in \mathbb{R}^d$, we define
$$Y_t = \theta \cdot X_t = \sum_{i=1}^d \theta_i X^i_t.$$
Then $Y$ is a continuous local martingale, and we have
$$\langle Y\rangle_t = \langle Y, Y\rangle_t = \sum_{j,k=1}^d \theta_j \theta_k \langle X^j, X^k\rangle_t = |\theta|^2 t$$
by assumption. Let
$$Z_t = e^{iY_t + \frac{1}{2}\langle Y\rangle_t} = e^{i\theta\cdot X_t + \frac{1}{2}|\theta|^2 t}.$$
By Itô's formula, with $X = iY + \frac{1}{2}\langle Y\rangle$ and $f(x) = e^x$, we get
$$dZ_t = Z_t\left(i\, dY_t + \frac{1}{2}\, d\langle Y\rangle_t\right) - \frac{1}{2} Z_t\, d\langle Y\rangle_t = i Z_t\, dY_t,$$
using that $\langle X\rangle = \langle iY\rangle = -\langle Y\rangle$.
So this implies $Z$ is a continuous local martingale. Moreover, since $Z$ is bounded on bounded intervals of $t$, we know $Z$ is in fact a martingale, and $Z_0 = 1$. Then by definition of a martingale, we have
$$E(Z_t \mid \mathcal{F}_s) = Z_s,$$
and unwrapping the definition of $Z_t$ shows that the result follows.
In general, the quadratic variation of a process doesn’t have to be linear in
t
.
It turns out if the quadratic variation increases to infinity, then the martingale
is still a Brownian motion up to reparametrization.
Theorem (Dubins–Schwarz). Let $M$ be a continuous local martingale with $M_0 = 0$ and $\langle M\rangle_\infty = \infty$. Let
$$T_s = \inf\{t \ge 0 : \langle M\rangle_t > s\},$$
the right-continuous inverse of $\langle M\rangle_t$. Let $B_s = M_{T_s}$ and $\mathcal{G}_s = \mathcal{F}_{T_s}$. Then $T_s$ is an $(\mathcal{F}_t)$-stopping time, $\langle M\rangle_{T_s} = s$ for all $s \ge 0$, $B$ is a $(\mathcal{G}_s)$-Brownian motion, and $M_t = B_{\langle M\rangle_t}$.
Proof. Since $\langle M\rangle$ is continuous and adapted, and $\langle M\rangle_\infty = \infty$, we know $T_s$ is a stopping time and $T_s < \infty$ for all $s \ge 0$.

Claim. $(\mathcal{G}_s)$ is a filtration obeying the usual conditions, and $\mathcal{G}_\infty = \mathcal{F}_\infty$.
Indeed, if $A \in \mathcal{G}_s$ and $s < t$, then
$$A \cap \{T_t \le u\} = A \cap \{T_s \le u\} \cap \{T_t \le u\} \in \mathcal{F}_u,$$
using that $A \cap \{T_s \le u\} \in \mathcal{F}_u$ since $A \in \mathcal{G}_s$. Then right-continuity follows from that of $(\mathcal{F}_t)$ and the right-continuity of $s \mapsto T_s$.
Claim. $B$ is adapted to $(\mathcal{G}_s)$.

In general, if $X$ is càdlàg and $T$ is a stopping time, then $X_T \mathbf{1}_{\{T < \infty\}} \in \mathcal{F}_T$. Apply this with $X = M$, $T = T_s$ and $\mathcal{F}_T = \mathcal{G}_s$. Thus $B_s \in \mathcal{G}_s$.
Claim. $B$ is continuous.

Here there is actually something to verify, because $s \mapsto T_s$ is only right-continuous, not necessarily continuous. Thus, we only know $B_s$ is right-continuous, and we have to check that it is left-continuous.
Now $B$ is left-continuous at $s$ iff $B_{s-} = B_s$, iff $M_{T_{s-}} = M_{T_s}$. Now we have
$$T_{s-} = \inf\{t \ge 0 : \langle M\rangle_t \ge s\}.$$
If $T_{s-} = T_s$, then there is nothing to show. Thus, we may assume $T_s > T_{s-}$. Then we have $\langle M\rangle_{T_s} = \langle M\rangle_{T_{s-}}$. Since $\langle M\rangle_t$ is increasing, it means $\langle M\rangle$ is constant on $[T_{s-}, T_s]$. We will later prove that

Lemma. $M$ is constant on $[a, b]$ iff $\langle M\rangle$ is constant on $[a, b]$.

So we know that if $T_s > T_{s-}$, then $M_{T_{s-}} = M_{T_s}$. So $B$ is left-continuous.
We then have to show that B is a martingale.
Claim. $(M^2 - \langle M\rangle)^{T_s}$ is a uniformly integrable martingale.

To see this, observe that $\langle M^{T_s}\rangle_\infty = \langle M\rangle_{T_s} = s$, and so $M^{T_s}$ is bounded in $L^2$. So $(M^2 - \langle M\rangle)^{T_s}$ is a uniformly integrable martingale.
We now apply the optional stopping theorem, which tells us, for $r \le s$,
$$E(B_s \mid \mathcal{G}_r) = E(M_{T_s} \mid \mathcal{F}_{T_r}) = M_{T_r} = B_r.$$
So $B$ is a martingale. Moreover,
$$E(B_s^2 - s \mid \mathcal{G}_r) = E((M^2 - \langle M\rangle)_{T_s} \mid \mathcal{F}_{T_r}) = M_{T_r}^2 - \langle M\rangle_{T_r} = B_r^2 - r.$$
So $B_t^2 - t$ is a martingale, so by the characterizing property of the quadratic variation, $\langle B\rangle_t = t$. So by Lévy's criterion, this is a Brownian motion in one dimension.
The theorem is only true for martingales in one dimension. In higher dimensions, this need not be true, because the time changes needed for the different components may not agree. However, in the example sheet, we see that the holomorphic image of a Brownian motion is still a Brownian motion up to a time change.
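A quick numerical illustration of the time change: take $M_t = \int_0^t \sigma_s\, dB_s$ for a deterministic $\sigma$, so $\langle M\rangle_t = \int_0^t \sigma_s^2\, ds$, and sample $M$ at the inverse of its quadratic variation. The increments of the reparametrized process should look like those of a standard Brownian motion. A sketch (the choice of $\sigma$ is an arbitrary one for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 1.0, 200_000
dt = T / n
t = np.linspace(0.0, T, n + 1)
sigma = 1.0 + 2.0 * np.sin(2 * np.pi * t[:-1])**2       # deterministic integrand

dB = rng.normal(0.0, np.sqrt(dt), n)
M = np.concatenate([[0.0], np.cumsum(sigma * dB)])      # M_t = int_0^t sigma_s dB_s
qv = np.concatenate([[0.0], np.cumsum(sigma**2 * dt)])  # <M>_t

s_grid = np.linspace(0.0, qv[-1], 1001)
idx = np.minimum(np.searchsorted(qv, s_grid), n)        # T_s: first t with <M>_t >= s
W = M[idx]                                              # W_s = M_{T_s}

ds = s_grid[1] - s_grid[0]
print(np.var(np.diff(W)) / ds)    # approximately 1: increments have variance ds
```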
Lemma. $M$ is constant on $[a, b]$ iff $\langle M\rangle$ is constant on $[a, b]$.
Proof. It is clear that if $M$ is constant, then so is $\langle M\rangle$. To prove the converse, by continuity, it suffices to prove that for any fixed $a < b$,
$$\{M_t = M_a \text{ for all } t \in [a, b]\} \supseteq \{\langle M\rangle_b = \langle M\rangle_a\} \quad\text{almost surely}.$$
We set $N_t = M_t - M_{t \wedge a}$. Then $\langle N\rangle_t = \langle M\rangle_t - \langle M\rangle_{t \wedge a}$. Define
$$T_\varepsilon = \inf\{t \ge 0 : \langle N\rangle_t \ge \varepsilon\}.$$
Then since $N^2 - \langle N\rangle$ is a local martingale, we know that
$$E(N_{t \wedge T_\varepsilon}^2) = E(\langle N\rangle_{t \wedge T_\varepsilon}) \le \varepsilon.$$
Now observe that on the event $\{\langle M\rangle_b = \langle M\rangle_a\}$, we have $\langle N\rangle_b = 0$, and hence $T_\varepsilon \ge b$. So for $t \in [a, b]$, we have
$$E(\mathbf{1}_{\{\langle M\rangle_b = \langle M\rangle_a\}} N_t^2) = E(\mathbf{1}_{\{\langle M\rangle_b = \langle M\rangle_a\}} N_{t \wedge T_\varepsilon}^2) \le E(\langle N\rangle_{t \wedge T_\varepsilon}) \le \varepsilon.$$
Since $\varepsilon > 0$ was arbitrary, $N_t = 0$ almost surely on this event, which is what we wanted.
3.7 Girsanov’s theorem
Girsanov’s theorem tells us what happens to our (semi)-martingales when we
change the measure of our space. We first look at a simple example when we
perform a shift.
Example. Let $X \sim N(0, C)$ be an $n$-dimensional centered Gaussian with positive definite covariance $C = (C_{ij})_{i,j=1}^n$. Put $M = C^{-1}$. Then for any function $f$, we have
$$E f(X) = \left(\det \frac{M}{2\pi}\right)^{1/2} \int_{\mathbb{R}^n} f(x)\, e^{-\frac{1}{2}(x, Mx)}\, dx.$$
Now fix an $a \in \mathbb{R}^n$. The distribution of $X + a$ then satisfies
$$E f(X + a) = \left(\det \frac{M}{2\pi}\right)^{1/2} \int_{\mathbb{R}^n} f(x)\, e^{-\frac{1}{2}(x - a,\, M(x - a))}\, dx = E[Z f(X)],$$
where
$$Z = Z(X) = e^{-\frac{1}{2}(a, Ma) + (X, Ma)}.$$
Thus, if $\mathbb{P}$ denotes the distribution of $X$, then the measure $\mathbb{Q}$ with
$$\frac{d\mathbb{Q}}{d\mathbb{P}} = Z$$
is that of an $N(a, C)$ vector.
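This change of measure is easy to test by Monte Carlo: weighting samples of $X \sim N(0, C)$ by $Z$ reproduces expectations of the shifted vector $X + a$. A sketch (the covariance, shift and test function are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
C = np.array([[2.0, 0.5], [0.5, 1.0]])           # positive definite covariance
M = np.linalg.inv(C)
a = np.array([1.0, -0.5])                        # the shift

X = rng.multivariate_normal(np.zeros(2), C, 200_000)
f = lambda x: np.cos(x[:, 0]) + x[:, 1]**2       # any test function

Z = np.exp(-0.5 * a @ M @ a + X @ (M @ a))       # dQ/dP evaluated at the samples
print(np.mean(Z * f(X)))                         # E[Z f(X)]
print(np.mean(f(X + a)))                         # E[f(X + a)]: should agree
```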
Example. We can extend the above example to Brownian motion. Let $B$ be a Brownian motion with $B_0 = 0$, and $h\colon [0, \infty) \to \mathbb{R}$ a deterministic function. We then want to understand the distribution of $(B_t + h_t)$.
Fix a finite sequence of times $0 = t_0 < t_1 < \cdots < t_n$. Then we know that $(B_{t_i})_{i=1}^n$ is a centered Gaussian random variable. Thus, if $f(B) = f(B_{t_1}, \ldots, B_{t_n})$ is a function, then
$$E(f(B)) = c \cdot \int_{\mathbb{R}^n} f(x) \exp\left(-\frac{1}{2}\sum_{i=1}^n \frac{(x_i - x_{i-1})^2}{t_i - t_{i-1}}\right) dx_1 \cdots dx_n.$$
Thus, after a shift, we get
$$E(f(B + h)) = E(Z f(B)),$$
$$Z = \exp\left(-\frac{1}{2}\sum_{i=1}^n \frac{(h_{t_i} - h_{t_{i-1}})^2}{t_i - t_{i-1}} + \sum_{i=1}^n \frac{(h_{t_i} - h_{t_{i-1}})(B_{t_i} - B_{t_{i-1}})}{t_i - t_{i-1}}\right).$$
In general, we are interested in what happens when we change the measure by an exponential:

Definition (Stochastic exponential). Let $M$ be a continuous local martingale. Then the stochastic exponential (or Doléans–Dade exponential) of $M$ is
$$\mathcal{E}(M)_t = e^{M_t - \frac{1}{2}\langle M\rangle_t}.$$
The point of introducing that quadratic variation term is:

Proposition. Let $M$ be a continuous local martingale with $M_0 = 0$. Then $\mathcal{E}(M) = Z$ satisfies
$$dZ_t = Z_t\, dM_t,$$
i.e.
$$Z_t = 1 + \int_0^t Z_s\, dM_s.$$
In particular, $\mathcal{E}(M)$ is a continuous local martingale. Moreover, if $\langle M\rangle_\infty$ is uniformly bounded, then $\mathcal{E}(M)$ is a uniformly integrable martingale.
There is a more general condition for the final property, namely Novikov’s
condition, but we will not go into that.
Proof. By Itô's formula with $X = M - \frac{1}{2}\langle M\rangle$, we have
$$dZ_t = Z_t\left(dM_t - \frac{1}{2}\, d\langle M\rangle_t\right) + \frac{1}{2} Z_t\, d\langle M\rangle_t = Z_t\, dM_t.$$
Since $M$ is a continuous local martingale, so is $\int Z_s\, dM_s$. So $Z$ is a continuous local martingale.
Now suppose $\langle M\rangle_\infty \le b < \infty$. Then
$$P\left(\sup_{t \ge 0} M_t \ge a\right) = P\left(\sup_{t \ge 0} M_t \ge a,\ \langle M\rangle_\infty \le b\right) \le e^{-a^2/2b},$$
where the final inequality is an exercise on the third example sheet, and is true for general continuous local martingales. So we get
$$E\left[\exp\left(\sup_t M_t\right)\right] = \int_0^\infty P(\exp(\sup M_t) \ge \lambda)\, d\lambda = \int_0^\infty P(\sup M_t \ge \log\lambda)\, d\lambda \le 1 + \int_1^\infty e^{-(\log\lambda)^2/2b}\, d\lambda < \infty.$$
Since $\langle M\rangle \ge 0$, we know that
$$\sup_{t \ge 0} \mathcal{E}(M)_t \le \exp\left(\sup_t M_t\right),$$
so $\mathcal{E}(M)$ is a uniformly integrable martingale.
Theorem (Girsanov's theorem). Let $M$ be a continuous local martingale with $M_0 = 0$. Suppose that $\mathcal{E}(M)$ is a uniformly integrable martingale. Define a new probability measure
$$\frac{d\mathbb{Q}}{d\mathbb{P}} = \mathcal{E}(M)_\infty.$$
Let $X$ be a continuous local martingale with respect to $\mathbb{P}$. Then $X - \langle X, M\rangle$ is a continuous local martingale with respect to $\mathbb{Q}$.
Proof. Define the stopping time
$$T_n = \inf\{t \ge 0 : |X_t - \langle X, M\rangle_t| \ge n\};$$
then $\mathbb{P}(T_n \to \infty) = 1$ by continuity. Since $\mathbb{Q}$ is absolutely continuous with respect to $\mathbb{P}$, we know that $\mathbb{Q}(T_n \to \infty) = 1$. Thus it suffices to show that $X^{T_n} - \langle X^{T_n}, M\rangle$ is a continuous martingale for any $n$. Let
$$Y = X^{T_n} - \langle X^{T_n}, M\rangle, \quad Z = \mathcal{E}(M).$$
Claim. $ZY$ is a continuous local martingale with respect to $\mathbb{P}$.

We use the product rule to compute
$$d(ZY)_t = Y_t\, dZ_t + Z_t\, dY_t + d\langle Y, Z\rangle_t$$
$$= Y_t Z_t\, dM_t + Z_t\left(dX^{T_n}_t - d\langle X^{T_n}, M\rangle_t\right) + Z_t\, d\langle M, X^{T_n}\rangle_t$$
$$= Y_t Z_t\, dM_t + Z_t\, dX^{T_n}_t.$$
So we see that $ZY$ is a stochastic integral with respect to a continuous local martingale. Thus $ZY$ is a continuous local martingale.
Claim. $ZY$ is uniformly integrable.

Since $Z$ is a uniformly integrable martingale, $\{Z_T : T \text{ is a stopping time}\}$ is uniformly integrable. Since $Y$ is bounded, $\{Z_T Y_T : T \text{ is a stopping time}\}$ is also uniformly integrable. So $ZY$ is a true martingale (with respect to $\mathbb{P}$).
Claim. $Y$ is a martingale with respect to $\mathbb{Q}$.

We have
$$E_\mathbb{Q}(Y_t - Y_s \mid \mathcal{F}_s) = Z_s^{-1}\, E_\mathbb{P}(Z_\infty Y_t - Z_\infty Y_s \mid \mathcal{F}_s) = Z_s^{-1}\, E_\mathbb{P}(Z_t Y_t - Z_s Y_s \mid \mathcal{F}_s) = 0.$$
Note that the quadratic variation does not change, since
$$\langle X - \langle X, M\rangle\rangle_t = \langle X\rangle_t = \lim_{n\to\infty} \sum_{i=1}^{\lfloor 2^n t\rfloor} \left(X_{i2^{-n}} - X_{(i-1)2^{-n}}\right)^2 \quad\text{a.s.}$$
along a subsequence.
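The simplest instance of the theorem: take $M = \theta B$ for a constant $\theta$, so $\mathcal{E}(M)_t = \exp(\theta B_t - \frac{1}{2}\theta^2 t)$, and Girsanov says $B_t - \theta t$ is a $\mathbb{Q}$-Brownian motion, i.e. under $\mathbb{Q}$ the process $B$ has drift $\theta$. A Monte Carlo sketch of the reweighting (the parameters and test function are arbitrary; here we restrict to a finite horizon $[0, T]$, where $\mathcal{E}(\theta B)$ is a true martingale, rather than verifying uniform integrability on $[0, \infty)$):

```python
import numpy as np

rng = np.random.default_rng(4)
T, theta = 1.0, 0.7
B_T = rng.normal(0.0, np.sqrt(T), 500_000)        # B_T under P

Z = np.exp(theta * B_T - 0.5 * theta**2 * T)      # E(theta B)_T = dQ/dP on F_T
g = lambda x: np.maximum(x, 0.0)                  # any test function

print(np.mean(Z * g(B_T)))                        # E_Q[g(B_T)] via reweighting
print(np.mean(g(B_T + theta * T)))                # direct: under Q, B_T ~ N(theta T, T)
```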
4 Stochastic differential equations
4.1 Existence and uniqueness of solutions
After all this work, we can return to the problem we described in the introduction. We wanted to make sense of equations of the form
$$\dot{x}(t) = F(x(t)) + \eta(t),$$
where $\eta(t)$ is Gaussian white noise. We can now interpret this equation as saying
$$dX_t = F(X_t)\, dt + dB_t,$$
or equivalently, in integral form,
$$X_t - X_0 = \int_0^t F(X_s)\, ds + B_t.$$
In general, we can make the following definition:

Definition (Stochastic differential equation). Let $d, m \in \mathbb{N}$, and let $b\colon \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}^d$, $\sigma\colon \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}^{d\times m}$ be locally bounded (and measurable). A solution to the stochastic differential equation $E(\sigma, b)$ given by
$$dX_t = b(t, X_t)\, dt + \sigma(t, X_t)\, dB_t$$
consists of

(i) a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$ obeying the usual conditions;

(ii) an $m$-dimensional Brownian motion $B$ with $B_0 = 0$; and

(iii) an $(\mathcal{F}_t)$-adapted continuous process $X$ with values in $\mathbb{R}^d$ such that
$$X_t = X_0 + \int_0^t \sigma(s, X_s)\, dB_s + \int_0^t b(s, X_s)\, ds.$$
If $X_0 = x \in \mathbb{R}^d$, then we say $X$ is a (weak) solution to $E_x(\sigma, b)$. It is a strong solution if it is adapted with respect to the canonical filtration of $B$.
Our goal is to prove existence and uniqueness of solutions to a general class of SDEs. We already know what it means for solutions to be unique, and in general there can be multiple notions of uniqueness:

Definition (Uniqueness of solutions). For the stochastic differential equation $E(\sigma, b)$, we say there is

– uniqueness in law if for every $x \in \mathbb{R}^d$, all solutions to $E_x(\sigma, b)$ have the same distribution;

– pathwise uniqueness if when $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$ and $B$ are fixed, any two solutions $X, X'$ with $X_0 = X'_0$ are indistinguishable.
These two notions are not equivalent, as the following example shows:

Example (Tanaka). Consider the stochastic differential equation
$$dX_t = \operatorname{sgn}(X_t)\, dB_t, \quad X_0 = x,$$
where
$$\operatorname{sgn}(x) = \begin{cases} +1 & x > 0\\ -1 & x \le 0.\end{cases}$$
This has a weak solution which is unique in law, but pathwise uniqueness fails.

To see the existence of solutions, let $X$ be a one-dimensional Brownian motion with $X_0 = x$, and set
$$B_t = \int_0^t \operatorname{sgn}(X_s)\, dX_s,$$
which is well-defined because $\operatorname{sgn}$ is left-continuous (hence Borel) and bounded, so $(\operatorname{sgn}(X_s))_s$ is a bounded previsible process. Then we have
$$x + \int_0^t \operatorname{sgn}(X_s)\, dB_s = x + \int_0^t \operatorname{sgn}(X_s)^2\, dX_s = x + X_t - X_0 = X_t.$$
So it remains to show that $B$ is a Brownian motion. We already know that $B$ is a continuous local martingale, so by Lévy's characterization, it suffices to show its quadratic variation is $t$. We simply compute
$$\langle B, B\rangle_t = \int_0^t \operatorname{sgn}(X_s)^2\, d\langle X\rangle_s = t.$$
So there is weak existence. The same argument shows that any solution is a Brownian motion, so we have uniqueness in law.
Finally, observe that if $x = 0$ and $X$ is a solution, then $-X$ is also a solution with the same Brownian motion. Indeed,
$$-X_t = -\int_0^t \operatorname{sgn}(X_s)\, dB_s = \int_0^t \operatorname{sgn}(-X_s)\, dB_s + 2\int_0^t \mathbf{1}_{X_s = 0}\, dB_s,$$
where the second term vanishes, since it is a continuous local martingale with quadratic variation $\int_0^t \mathbf{1}_{X_s = 0}\, ds = 0$. So pathwise uniqueness does not hold.
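This failure is visible in simulation: build $B$ from a discretized path of $X$ as above; then both $X$ and $-X$ reproduce themselves as stochastic integrals against the same $B$, up to a single misweighted increment at the one grid point where $X$ sits exactly at $0$ (the discrete shadow of the $2\int \mathbf{1}_{X_s=0}\, dB_s$ term). A sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
T, n = 1.0, 100_000
dt = T / n
sgn = lambda x: np.where(x > 0, 1.0, -1.0)        # sgn(0) = -1, as above

dX = rng.normal(0.0, np.sqrt(dt), n)
X = np.concatenate([[0.0], np.cumsum(dX)])        # X: Brownian motion, X_0 = 0
dB = sgn(X[:-1]) * dX                             # increments of B = int sgn(X) dX

err_plus = np.max(np.abs(np.cumsum(sgn(X[:-1]) * dB) - X[1:]))
err_minus = np.max(np.abs(np.cumsum(sgn(-X[:-1]) * dB) + X[1:]))
print(err_plus)    # 0: X solves dX = sgn(X) dB exactly on the grid
print(err_minus)   # O(sqrt(dt)): only the t = 0 step, where X_s = 0, differs
```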
In the other direction, however, it turns out pathwise uniqueness implies
uniqueness in law.
Theorem (Yamada–Watanabe). Assume weak existence and pathwise uniqueness hold. Then

(i) uniqueness in law holds;

(ii) for every $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$ and $B$ and any $x \in \mathbb{R}^d$, there is a unique strong solution to $E_x(\sigma, b)$.
We will not prove this, since we will not actually need it.
The key, important theorem we are now heading for is the existence and
uniqueness of solutions to SDEs, assuming reasonable conditions. As in the case
of ODEs, we need the following Lipschitz conditions:
Definition (Lipschitz coefficients). The coefficients $b\colon \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}^d$, $\sigma\colon \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}^{d\times m}$ are Lipschitz in $x$ if there exists a constant $K > 0$ such that for all $t \ge 0$ and $x, y \in \mathbb{R}^d$, we have
$$|b(t, x) - b(t, y)| \le K|x - y|, \quad |\sigma(t, x) - \sigma(t, y)| \le K|x - y|.$$
Theorem. Assume $b, \sigma$ are Lipschitz in $x$. Then there is pathwise uniqueness for $E(\sigma, b)$, and for every $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$ satisfying the usual conditions and every $(\mathcal{F}_t)$-Brownian motion $B$, for every $x \in \mathbb{R}^d$, there exists a unique strong solution to $E_x(\sigma, b)$.
Proof. To simplify notation, we assume $m = d = 1$.

We first prove pathwise uniqueness. Suppose $X, X'$ are two solutions with $X_0 = X'_0$. We will show that $E[(X_t - X'_t)^2] = 0$. We will actually put some bounds to control our variables. Define the stopping time
$$S = \inf\{t \ge 0 : |X_t| \ge n \text{ or } |X'_t| \ge n\}.$$
By continuity, $S \to \infty$ as $n \to \infty$. We also fix a deterministic time $T > 0$. Then whenever $t \in [0, T]$, we can bound, using the identity $(a + b)^2 \le 2a^2 + 2b^2$,
$$E\bigl((X_{t\wedge S} - X'_{t\wedge S})^2\bigr) \le 2\,E\left(\left(\int_0^{t\wedge S} (\sigma(s, X_s) - \sigma(s, X'_s))\, dB_s\right)^2\right) + 2\,E\left(\left(\int_0^{t\wedge S} (b(s, X_s) - b(s, X'_s))\, ds\right)^2\right).$$
We can apply the Lipschitz bound to the second term immediately, while we can simplify the first term using (a corollary of) the Itô isometry:
$$E\left(\left(\int_0^{t\wedge S} (\sigma(s, X_s) - \sigma(s, X'_s))\, dB_s\right)^2\right) = E\left(\int_0^{t\wedge S} (\sigma(s, X_s) - \sigma(s, X'_s))^2\, ds\right).$$
So using the Lipschitz bound, we have
$$E\bigl((X_{t\wedge S} - X'_{t\wedge S})^2\bigr) \le 2K^2(1 + T)\,E\left(\int_0^{t\wedge S} |X_s - X'_s|^2\, ds\right) \le 2K^2(1 + T)\int_0^t E\bigl(|X_{s\wedge S} - X'_{s\wedge S}|^2\bigr)\, ds.$$
We now use Grönwall's lemma:

Lemma. Let $h(t)$ be a bounded, non-negative function such that
$$h(t) \le a + c\int_0^t h(s)\, ds$$
for some constants $a, c \ge 0$. Then $h(t) \le a e^{ct}$.

Applying this with $a = 0$ to
$$h(t) = E\bigl((X_{t\wedge S} - X'_{t\wedge S})^2\bigr),$$
we deduce that $h \equiv 0$. So we know that
$$E(|X_{t\wedge S} - X'_{t\wedge S}|^2) = 0$$
for every $t \in [0, T]$. Taking $n \to \infty$ and $T \to \infty$ gives pathwise uniqueness.
We next prove existence of solutions. We fix $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$ and $B$, and define
$$F(X)_t = X_0 + \int_0^t \sigma(s, X_s)\, dB_s + \int_0^t b(s, X_s)\, ds.$$
Then $X$ is a solution to $E_x(\sigma, b)$ iff $F(X) = X$ and $X_0 = x$. To find a fixed point, we use Picard iteration. We fix $T > 0$, and define the $T$-norm of a continuous adapted process $X$ as
$$\|X\|_T = E\left(\sup_{t \le T} |X_t|^2\right)^{1/2}.$$
In particular, if $X$ is a martingale, then this is the same as the norm on the space of $L^2$-bounded martingales by Doob's inequality. Then
$$\mathcal{B} = \{X\colon \Omega \times [0, T] \to \mathbb{R} : \|X\|_T < \infty\}$$
is a Banach space.
Claim. $\|F(0)\|_T < \infty$, and
$$\|F(X) - F(Y)\|_T^2 \le (2T + 8)K^2 \int_0^T \|X - Y\|_t^2\, dt.$$
We first see how this claim implies the theorem. First observe that the claim implies $F$ indeed maps $\mathcal{B}$ into itself. We can then define a sequence of processes $X^i_t$ by
$$X^0_t = x, \quad X^{i+1} = F(X^i).$$
Then, writing $C_T = (2T + 8)K^2$, we have
$$\|X^{i+1} - X^i\|_T^2 \le C_T \int_0^T \|X^i - X^{i-1}\|_t^2\, dt \le \cdots \le \|X^1 - X^0\|_T^2\, \frac{C_T^i T^i}{i!}.$$
So we find that
$$\sum_{i=1}^\infty \|X^i - X^{i-1}\|_T^2 < \infty$$
for all $T$. So $X^i$ converges to some $X$ almost surely and uniformly on $[0, T]$, and $F(X) = X$. We then take $T \to \infty$ and we are done.
To prove the claim, we write
$$\|F(0)\|_T \le |X_0| + \left\|\int_0^t b(s, 0)\, ds\right\|_T + \left\|\int_0^t \sigma(s, 0)\, dB_s\right\|_T.$$
The first two terms are constant, and we can bound the last by Doob’s inequality
and the Itˆo isometry:
Z
t
0
σ(s, 0) dB
s
T
2E
Z
T
0
σ(s, 0) dB
s
2
= 2
Z
T
0
σ(s, 0)
2
ds.
To prove the second part, we use
$$\|F(X) - F(Y)\|_T^2 \le 2\,E\left(\sup_{t \le T}\left|\int_0^t (b(s, X_s) - b(s, Y_s))\, ds\right|^2\right) + 2\,E\left(\sup_{t \le T}\left|\int_0^t (\sigma(s, X_s) - \sigma(s, Y_s))\, dB_s\right|^2\right).$$
We can bound the first term with Cauchy–Schwarz by
$$T\,E\left(\int_0^T |b(s, X_s) - b(s, Y_s)|^2\, ds\right) \le TK^2 \int_0^T \|X - Y\|_t^2\, dt,$$
and the second term with Doob's inequality by
$$4\,E\left(\int_0^T |\sigma(s, X_s) - \sigma(s, Y_s)|^2\, ds\right) \le 4K^2 \int_0^T \|X - Y\|_t^2\, dt.$$
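While Picard iteration is the proof device, the standard numerical counterpart is the Euler–Maruyama scheme, which freezes the coefficients on each small time interval; under the Lipschitz hypotheses above (plus mild growth conditions), it is known to converge to the unique strong solution. A generic sketch, not from the course:

```python
import numpy as np

def euler_maruyama(b, sigma, x0, T, n, rng):
    """Approximate dX = b(t, X) dt + sigma(t, X) dB on [0, T] with n steps."""
    dt = T / n
    X = np.empty(n + 1)
    X[0] = x0
    for i in range(n):
        t = i * dt
        dB = rng.normal(0.0, np.sqrt(dt))
        X[i + 1] = X[i] + b(t, X[i]) * dt + sigma(t, X[i]) * dB
    return X

# Example with Lipschitz coefficients: b(t, x) = -x/2, sigma(t, x) = 1
rng = np.random.default_rng(6)
path = euler_maruyama(lambda t, x: -0.5 * x, lambda t, x: 1.0, 1.0, 10.0, 10_000, rng)
print(path[-1])
```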
4.2 Examples of stochastic differential equations
Example (The Ornstein–Uhlenbeck process). Let $\lambda > 0$. Then the Ornstein–Uhlenbeck process is the solution to
$$dX_t = -\lambda X_t\, dt + dB_t.$$
The solution exists by the previous theorem, but we can also explicitly find one. By Itô's formula applied to $e^{\lambda t} X_t$, we get
$$d(e^{\lambda t} X_t) = e^{\lambda t}\, dX_t + \lambda e^{\lambda t} X_t\, dt = e^{\lambda t}\, dB_t.$$
So we find that
$$X_t = e^{-\lambda t} X_0 + \int_0^t e^{-\lambda(t - s)}\, dB_s.$$
Observe that the integrand is deterministic, so we can in fact interpret this as a Wiener integral.
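Because the integrand is deterministic, $X_{t+\delta}$ given $X_t$ is Gaussian with mean $e^{-\lambda\delta}X_t$ and variance $\int_0^\delta e^{-2\lambda u}\, du = (1 - e^{-2\lambda\delta})/(2\lambda)$, so the process can be sampled exactly on any grid, with no discretization error. A sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
lam, x0, T, n = 2.0, 1.0, 10.0, 1_000
dt = T / n

decay = np.exp(-lam * dt)
sd = np.sqrt((1 - np.exp(-2 * lam * dt)) / (2 * lam))  # exact conditional std dev

X = np.empty(n + 1)
X[0] = x0
for i in range(n):
    X[i + 1] = decay * X[i] + sd * rng.normal()        # exact Gaussian transition

print(X[-1])   # for large T, approximately a sample from N(0, 1/(2 lam))
```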
Fact. If $X_0 = x \in \mathbb{R}$ is fixed, then $(X_t)$ is a Gaussian process, i.e. $(X_{t_i})_{i=1}^n$ is jointly Gaussian for all $t_1 < \cdots < t_n$. Any Gaussian process is determined by the mean and covariance, and in this case, we have
$$E X_t = e^{-\lambda t} x, \quad \operatorname{cov}(X_t, X_s) = \frac{1}{2\lambda}\left(e^{-\lambda|t - s|} - e^{-\lambda(t + s)}\right).$$
Proof. We only have to compute the covariance. By the Itô isometry, we have
$$E\bigl((X_t - EX_t)(X_s - EX_s)\bigr) = E\left(\int_0^t e^{-\lambda(t - u)}\, dB_u \int_0^s e^{-\lambda(s - u)}\, dB_u\right) = e^{-\lambda(t + s)} \int_0^{t\wedge s} e^{2\lambda u}\, du.$$
In particular,
$$X_t \sim N\left(e^{-\lambda t} x,\ \frac{1 - e^{-2\lambda t}}{2\lambda}\right) \to N\left(0, \frac{1}{2\lambda}\right).$$
Fact. If $X_0 \sim N(0, \frac{1}{2\lambda})$, then $(X_t)$ is a centered Gaussian process with stationary covariance, i.e. the covariance depends only on time differences:
$$\operatorname{cov}(X_t, X_s) = \frac{1}{2\lambda} e^{-\lambda|t - s|}.$$
The difference is that when $X_0$ is deterministic, the mean $EX_t$ cancels the initial $e^{-\lambda t}X_0$ term, while when $X_0$ is random, it does not.

This is a very nice example where we can explicitly understand the long-time behaviour of the SDE. In general, this is non-trivial.
Dyson Brownian motion

Let $H_N$ be the inner product space of real symmetric $N \times N$ matrices, with inner product $N\operatorname{Tr}(HK)$ for $H, K \in H_N$. Let $H^1, \ldots, H^{\dim(H_N)}$ be an orthonormal basis for $H_N$.
Definition (Gaussian orthogonal ensemble). The Gaussian Orthogonal Ensemble $\mathrm{GOE}_N$ is the standard Gaussian measure on $H_N$, i.e. $H \sim \mathrm{GOE}_N$ if
$$H = \sum_{i=1}^{\dim H_N} H^i X_i,$$
where the $X_i$ are iid standard normals.
We now replace each $X_i$ by an Ornstein–Uhlenbeck process with $\lambda = \frac{1}{2}$. Then $\mathrm{GOE}_N$ is invariant under the process.
Theorem. The eigenvalues $\lambda_1(t) \le \cdots \le \lambda_N(t)$ satisfy
$$d\lambda^i_t = \left(-\frac{\lambda^i_t}{2} + \frac{1}{N}\sum_{j\neq i} \frac{1}{\lambda^i_t - \lambda^j_t}\right) dt + \sqrt{\frac{2}{N\beta}}\, dB^i_t.$$
Here $\beta = 1$, but if we replace symmetric matrices by Hermitian ones, we get $\beta = 2$; if we replace symmetric matrices by symplectic ones, we get $\beta = 4$.
This follows from Itô's formula and formulas for derivatives of eigenvalues.
Example (Geometric Brownian motion). Fix $\sigma > 0$ and $r \in \mathbb{R}$. Then geometric Brownian motion is given by
$$dX_t = \sigma X_t\, dB_t + r X_t\, dt.$$
We apply Itô's formula to $\log X_t$ to find that
$$X_t = X_0 \exp\left(\sigma B_t + \left(r - \frac{\sigma^2}{2}\right)t\right).$$
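A quick check of this closed form: drive an Euler–Maruyama discretization of the SDE and the exact formula with the same Brownian increments, and compare. A sketch (the parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
T, n, sigma, r, x0 = 1.0, 100_000, 0.3, 0.05, 1.0
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), n)
B = np.concatenate([[0.0], np.cumsum(dB)])

X = np.empty(n + 1)                     # Euler-Maruyama for dX = sigma X dB + r X dt
X[0] = x0
for i in range(n):
    X[i + 1] = X[i] * (1.0 + sigma * dB[i] + r * dt)

t = np.linspace(0.0, T, n + 1)
exact = x0 * np.exp(sigma * B + (r - 0.5 * sigma**2) * t)
print(np.max(np.abs(X - exact)))        # small discretization error
```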
Example (Bessel process). Let $B = (B^1, \ldots, B^d)$ be a $d$-dimensional Brownian motion. Then
$$X_t = |B_t|$$
satisfies the stochastic differential equation
$$dX_t = \frac{d - 1}{2X_t}\, dt + dB_t$$
if $t < \inf\{t \ge 0 : X_t = 0\}$.
4.3 Representations of solutions to PDEs

Recall that in Advanced Probability, we learnt that we can represent the solution to Laplace's equation via Brownian motion: if $D$ is a suitably nice domain and $g\colon \partial D \to \mathbb{R}$ is a function, then the solution to Laplace's equation on $D$ with boundary conditions $g$ is given by
$$u(x) = E_x[g(B_T)],$$
where $T$ is the first hitting time of the boundary $\partial D$.
A similar statement we can make is that if we want to solve the heat equation
$$\frac{\partial u}{\partial t} = \nabla^2 u$$
with initial conditions $u(x, 0) = u_0(x)$, then we can write the solution as
$$u(x, t) = E_x[u_0(B_{2t})] = E[u_0(x + \sqrt{2}\, B_t)],$$
where in the last expression the Brownian motion is started at $0$.
This is just a fancy way to say that the Green’s function for the heat equation is
a Gaussian, but is a good way to think about it nevertheless.
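For instance, with $u_0(x) = e^{-x^2}$ the Gaussian convolution can be done in closed form, so the representation is easy to test by Monte Carlo. A one-dimensional sketch (the initial condition and evaluation point are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(9)
u0 = lambda x: np.exp(-x**2)            # initial condition
x, t = 0.5, 0.2

B_t = rng.normal(0.0, np.sqrt(t), 1_000_000)
mc = np.mean(u0(x + np.sqrt(2.0) * B_t))             # E[u_0(x + sqrt(2) B_t)]

# closed form for u_t = u_xx with this u_0: a convolution of Gaussians
exact = np.exp(-x**2 / (1 + 4 * t)) / np.sqrt(1 + 4 * t)
print(mc, exact)
```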
In general, we would like to associate PDEs to certain stochastic processes. Recall that a stochastic differential equation is generally of the form
$$dX_t = b(X_t)\, dt + \sigma(X_t)\, dB_t$$
for some $b\colon \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma\colon \mathbb{R}^d \to \mathbb{R}^{d\times m}$ which are measurable and locally bounded. Here we assume these functions do not have time dependence. We can then associate to this a differential operator $L$ defined by
$$L = \frac{1}{2}\sum_{i,j} a_{ij}\,\partial_i\partial_j + \sum_i b_i\,\partial_i,$$
where $a = \sigma\sigma^T$.
Example. If $b = 0$ and $\sigma = \sqrt{2}\, I$, then $L = \Delta$ is the standard Laplacian.
The basic computation is the following result, which is a standard application of the Itô formula:

Proposition. Let $x \in \mathbb{R}^d$, and $X$ a solution to $E_x(\sigma, b)$. Then for every $f\colon \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}$ that is $C^1$ in $\mathbb{R}_+$ and $C^2$ in $\mathbb{R}^d$, the process
$$M^f_t = f(t, X_t) - f(0, X_0) - \int_0^t \left(\frac{\partial}{\partial s} + L\right) f(s, X_s)\, ds$$
is a continuous local martingale.
We first apply this to the Dirichlet–Poisson problem, which is essentially to solve $Lu = -f$. To be precise, let $U \subseteq \mathbb{R}^d$ be non-empty, bounded and open; $f \in C_b(U)$ and $g \in C_b(\partial U)$. We then want to find a $u \in C^2(\bar{U}) = C^2(U) \cap C(\bar{U})$ such that
$$Lu(x) = -f(x) \quad\text{for } x \in U,$$
$$u(x) = g(x) \quad\text{for } x \in \partial U.$$
If $f = 0$, this is called the Dirichlet problem; if $g = 0$, this is called the Poisson problem.
We will have to impose the following technical condition on $a$:

Definition (Uniformly elliptic). We say $a\colon \bar{U} \to \mathbb{R}^{d\times d}$ is uniformly elliptic if there is a constant $c > 0$ such that for all $\xi \in \mathbb{R}^d$ and $x \in \bar{U}$, we have
$$\xi^T a(x)\xi \ge c|\xi|^2.$$
If $a$ is symmetric (which it is in our case), this is the same as asking for the smallest eigenvalue of $a$ to be bounded away from $0$.
It would be very nice if we could write down a solution to the Dirichlet–Poisson problem using a solution to $E_x(\sigma, b)$, and then simply check that it works. We can indeed do that, but it takes a bit more time than we have. Instead, we shall prove a slightly weaker result: if we happen to have a solution, it must be given by our formula involving the SDE. So we first note the following theorem without proof:
Theorem. Assume $U$ has a smooth boundary (or satisfies the exterior cone condition), $a, b$ are Hölder continuous and $a$ is uniformly elliptic. Then for every Hölder continuous $f\colon \bar{U} \to \mathbb{R}$ and any continuous $g\colon \partial U \to \mathbb{R}$, the Dirichlet–Poisson problem has a solution.
The main theorem is the following:

Theorem. Let $\sigma$ and $b$ be bounded measurable and $\sigma\sigma^T$ uniformly elliptic, and $U \subseteq \mathbb{R}^d$ as above. Let $u$ be a solution to the Dirichlet–Poisson problem and $X$ a solution to $E_x(\sigma, b)$ for some $x \in U$. Define the stopping time
$$T_U = \inf\{t \ge 0 : X_t \notin U\}.$$
Then $E T_U < \infty$ and
$$u(x) = E_x\left(g(X_{T_U}) + \int_0^{T_U} f(X_s)\, ds\right).$$
In particular, the solution to the PDE is unique.
Proof. Our previous proposition applies to functions defined on all of $\mathbb{R}^d$, while $u$ is just defined on $U$. So we set
$$U_n = \left\{x \in U : \operatorname{dist}(x, \partial U) > \frac{1}{n}\right\}, \quad T_n = \inf\{t \ge 0 : X_t \notin U_n\},$$
and pick $u_n \in C_b^2(\mathbb{R}^d)$ such that $u|_{U_n} = u_n|_{U_n}$. Recalling our previous notation, let
$$M^n_t = (M^{u_n})_{T_n \wedge t} = u_n(X_{t\wedge T_n}) - u_n(X_0) - \int_0^{t\wedge T_n} L u_n(X_s)\, ds.$$
Then this is a continuous local martingale by the proposition, and it is bounded, hence a true martingale. Thus for $x \in U$ and $n$ large enough, the martingale property implies
$$u(x) = u_n(x) = E\left(u(X_{t\wedge T_n}) - \int_0^{t\wedge T_n} Lu(X_s)\, ds\right) = E\left(u(X_{t\wedge T_n}) + \int_0^{t\wedge T_n} f(X_s)\, ds\right).$$
We would be done if we can take $n \to \infty$. To do so, we first show that $E[T_U] < \infty$. Note that this does not depend on $f$ and $g$. So we can take $f = 1$ and $g = 0$, and let $v$ be a solution. Then we have
$$E(t \wedge T_n) = -E\left(\int_0^{t\wedge T_n} Lv(X_s)\, ds\right) = v(x) - E(v(X_{t\wedge T_n})).$$
Since $v$ is bounded, by dominated/monotone convergence, we can take the limit to get $E(T_U) < \infty$.
Thus, we know that $t \wedge T_n \to T_U$ as $t \to \infty$ and $n \to \infty$. Since
$$E\left(\int_0^{T_U} |f(X_s)|\, ds\right) \le \|f\|_\infty\, E[T_U] < \infty,$$
the dominated convergence theorem tells us
$$E\left(\int_0^{t\wedge T_n} f(X_s)\, ds\right) \to E\left(\int_0^{T_U} f(X_s)\, ds\right).$$
Since $u$ is continuous on $\bar{U}$, we also have
$$E(u(X_{t\wedge T_n})) \to E(u(X_{T_U})) = E(g(X_{T_U})).$$
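This formula is directly usable as a Monte Carlo method: run the diffusion until it exits $U$ and average. For the Dirichlet problem ($f = 0$) with $L = \frac{1}{2}\Delta$ (i.e. $\sigma = I$, $b = 0$) and harmonic boundary data on the unit disc, the answer is the harmonic extension itself, so we can check the estimator. A crude sketch (the discrete time step causes a small overshoot bias at the boundary, which we ignore):

```python
import numpy as np

rng = np.random.default_rng(10)
g = lambda p: p[:, 0]**2 - p[:, 1]**2      # harmonic boundary data, so u = g on U

def dirichlet_mc(x, n_paths=20_000, dt=1e-3):
    # Planar Brownian motions from x, run until exit from the unit disc
    p = np.tile(x, (n_paths, 1)).astype(float)
    alive = np.ones(n_paths, dtype=bool)
    while alive.any():
        p[alive] += rng.normal(0.0, np.sqrt(dt), (alive.sum(), 2))
        alive &= np.sum(p**2, axis=1) < 1.0
    return np.mean(g(p))

x = np.array([0.3, 0.4])
print(dirichlet_mc(x), x[0]**2 - x[1]**2)  # Monte Carlo vs exact harmonic extension
```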
We can use SDE’s to solve the Cauchy problem for parabolic equations as
well, just like the heat equation. The problem is as follows: for
f C
2
b
(
R
d
), we
want to find u : R
+
× R
d
R that is C
1
in R
+
and C
2
in R
d
such that
u
t
= Lu on R
+
× R
d
u(0, ·) = f on R
d
Again we will need the following theorem:

Theorem. For every $f \in C_b^2(\mathbb{R}^d)$, there exists a solution to the Cauchy problem.
Theorem. Let $u$ be a solution to the Cauchy problem. Let $X$ be a solution to $E_x(\sigma, b)$ for $x \in \mathbb{R}^d$ and $0 \le s \le t$. Then
$$E_x(f(X_t) \mid \mathcal{F}_s) = u(t - s, X_s).$$
In particular,
$$u(t, x) = E_x(f(X_t)).$$
In particular, this implies $X_t$ is a continuous Markov process.
Proof. The martingale of the proposition involves $\frac{\partial}{\partial t} + L$, but the heat equation has $\frac{\partial}{\partial t} - L$. So we set $g(s, x) = u(t - s, x)$. Then
$$\left(\frac{\partial}{\partial s} + L\right) g(s, x) = -\frac{\partial u}{\partial t}(t - s, x) + Lu(t - s, x) = 0.$$
So $g(s, X_s) - g(0, X_0)$ is a martingale (boundedness is an exercise), and hence
$$u(t - s, X_s) = g(s, X_s) = E(g(t, X_t) \mid \mathcal{F}_s) = E(u(0, X_t) \mid \mathcal{F}_s) = E(f(X_t) \mid \mathcal{F}_s).$$
This generalizes to the Feynman–Kac formula.

Theorem (Feynman–Kac formula). Let $f \in C_b^2(\mathbb{R}^d)$ and $V \in C_b(\mathbb{R}^d)$, and suppose that $u\colon \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}$ satisfies
$$\frac{\partial u}{\partial t} = Lu + Vu \quad\text{on } \mathbb{R}_+ \times \mathbb{R}^d,$$
$$u(0, \cdot) = f \quad\text{on } \mathbb{R}^d,$$
where $Vu = V(x)u(x)$ is given by multiplication. Then for all $t > 0$, $x \in \mathbb{R}^d$, and $X$ a solution to $E_x(\sigma, b)$, we have
$$u(t, x) = E_x\left[f(X_t)\exp\left(\int_0^t V(X_s)\, ds\right)\right].$$
If $L$ is the Laplacian, then this is the Schrödinger equation, which is why Feynman was thinking about this.
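With constant $V$ the exponential weight factors out, which gives an easy consistency check of the formula against the plain Cauchy problem. A sketch with $L = \frac{1}{2}\frac{d^2}{dx^2}$ (i.e. $\sigma = 1$, $b = 0$), a constant potential, and a Gaussian initial condition (all arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(11)
f = lambda x: np.exp(-x**2)
c = 0.3                                   # constant potential V(x) = c
x, t, n_paths = 0.2, 0.5, 200_000

# X solves dX = dB, so L = (1/2) d^2/dx^2 and X_t ~ N(x, t)
X_t = x + rng.normal(0.0, np.sqrt(t), n_paths)
weight = np.exp(c * t)                    # exp(int_0^t V(X_s) ds), exact since V constant
mc = weight * np.mean(f(X_t))

# closed form: e^{ct} times the heat semigroup (variance t) applied to f
exact = np.exp(c * t) * np.exp(-x**2 / (1 + 2 * t)) / np.sqrt(1 + 2 * t)
print(mc, exact)
```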