Part III Advanced Probability
Based on lectures by M. Lis
Notes taken by Dexter Chua
Michaelmas 2017
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
The aim of the course is to introduce students to advanced topics in modern probability
theory. The emphasis is on tools required in the rigorous analysis of stochastic
processes, such as Brownian motion, and in applications where probability theory plays
an important role.
Review of measure and integration:
sigma-algebras, measures and filtrations;
integrals and expectation; convergence theorems; product measures, independence and
Fubini’s theorem.
Conditional expectation:
Discrete case, Gaussian case, conditional density functions;
existence and uniqueness; basic properties.
Martingales:
Martingales and submartingales in discrete time; optional stopping;
Doob’s inequalities, upcrossings, martingale convergence theorems; applications of
martingale techniques.
Stochastic processes in continuous time:
Kolmogorov’s criterion, regularization
of paths; martingales in continuous time.
Weak convergence:
Definitions and characterizations; convergence in distribution,
tightness, Prokhorov’s theorem; characteristic functions, Lévy’s continuity theorem.
Sums of independent random variables:
Strong laws of large numbers; central
limit theorem; Cramér’s theory of large deviations.
Brownian motion:
Wiener’s existence theorem, scaling and symmetry properties;
martingales associated with Brownian motion, the strong Markov property, hitting
times; properties of sample paths, recurrence and transience; Brownian motion and the
Dirichlet problem; Donsker’s invariance principle.
Poisson random measures: Construction and properties; integrals.
Lévy processes: Lévy–Khinchin theorem.
Pre-requisites
A basic familiarity with measure theory and the measure-theoretic formulation of
probability theory is very helpful. These foundational topics will be reviewed at the
beginning of the course, but students unfamiliar with them are expected to consult the
literature (for instance, Williams’ book) to strengthen their understanding.
Contents
0 Introduction
1 Some measure theory
1.1 Review of measure theory
1.2 Conditional expectation
2 Martingales in discrete time
2.1 Filtrations and martingales
2.2 Stopping time and optional stopping
2.3 Martingale convergence theorems
2.4 Applications of martingales
3 Continuous time stochastic processes
4 Weak convergence of measures
5 Brownian motion
5.1 Basic properties of Brownian motion
5.2 Harmonic functions and Brownian motion
5.3 Transience and recurrence
5.4 Donsker’s invariance principle
6 Large deviations
0 Introduction
In some other places in the world, this course might be known as “Stochastic
Processes”. In addition to doing probability, a new component studied in the
course is time. We are going to study how things change over time.
In the first half of the course, we will focus on discrete time. A familiar
example is the simple random walk: we start at a point on a grid, and at
each time step, we jump to a neighbouring grid point randomly. This gives a
sequence of random variables indexed by discrete time steps, which are related to
each other in interesting ways. In particular, we will consider martingales, which
enjoy some really nice convergence and “stopping” properties.
In the second half of the course, we will look at continuous time. There
is a fundamental difference between the two, in that there is a nice topology
on the interval. This allows us to say things like we want our trajectories to
be continuous. On the other hand, this can cause some headaches because $\mathbb{R}$
is uncountable. We will spend a lot of time thinking about Brownian motion,
whose discovery is often attributed to Robert Brown. We can think of this as the
limit as we take finer and finer steps in a random walk. It turns out this has a
very rich structure, and will tell us something about Laplace’s equation as well.
Apart from stochastic processes themselves, there are two main objects that
appear in this course. The first is the conditional expectation. Recall that if we
have a random variable $X$, we can obtain a number $\mathbb{E}[X]$, the expectation of
$X$. We can think of this as integrating out all the randomness of the system,
and just remembering the average. Conditional expectation will be some subtle
modification of this construction, where we don’t actually get a number, but
another random variable. The idea behind this is that we want to integrate out
some of the randomness in our random variable, but keep the rest.
Another main object is stopping time. For example, if we have a production
line that produces a random number of outputs at each point, then we can ask how
much time it takes to produce a fixed number of goods. This is a nice random
time, which we call a stopping time. The niceness follows from the fact that
when the time comes, we know it. An example that is not nice is, for example,
the last day it rains in Cambridge in a particular month, since on that last day,
we don’t necessarily know that it is in fact the last day.
At the end of the course, we will say a little bit about large deviations.
1 Some measure theory
1.1 Review of measure theory
To make the course as self-contained as possible, we shall begin with some review
of measure theory. On the other hand, if one doesn’t already know measure
theory, they are recommended to learn measure theory properly before
starting this course.
Definition ($\sigma$-algebra). Let $E$ be a set. A subset $\mathcal{E}$ of the power set $\mathcal{P}(E)$ is
called a $\sigma$-algebra (or $\sigma$-field) if
(i) $\emptyset \in \mathcal{E}$;
(ii) if $A \in \mathcal{E}$, then $A^C = E \setminus A \in \mathcal{E}$;
(iii) if $A_1, A_2, \ldots \in \mathcal{E}$, then $\bigcup_{n=1}^\infty A_n \in \mathcal{E}$.
Definition (Measurable space). A measurable space is a set with a σ-algebra.
Definition (Borel $\sigma$-algebra). Let $E$ be a topological space with topology $\mathcal{T}$.
Then the Borel $\sigma$-algebra $\mathcal{B}(E)$ on $E$ is the $\sigma$-algebra generated by $\mathcal{T}$, i.e. the
smallest $\sigma$-algebra containing $\mathcal{T}$.
We are often going to look at $\mathcal{B}(\mathbb{R})$, and we will just write $\mathcal{B}$ for it.
Definition (Measure). A function $\mu : \mathcal{E} \to [0, \infty]$ is a measure if
(i) $\mu(\emptyset) = 0$;
(ii) if $A_1, A_2, \ldots \in \mathcal{E}$ are disjoint, then
$$\mu\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty \mu(A_i).$$
Definition (Measure space). A measure space is a measurable space with a
measure.
Definition (Measurable function). Let $(E_1, \mathcal{E}_1)$ and $(E_2, \mathcal{E}_2)$ be measurable
spaces. Then $f : E_1 \to E_2$ is said to be measurable if $A \in \mathcal{E}_2$ implies
$f^{-1}(A) \in \mathcal{E}_1$.
This is similar to the definition of a continuous function.
Notation. For $(E, \mathcal{E})$ a measurable space, we write $m\mathcal{E}$ for the set of measurable
functions $E \to \mathbb{R}$.
We write $m\mathcal{E}^+$ for the non-negative measurable functions, which are allowed to
take the value $\infty$.
Note that we do not allow taking the values $\pm\infty$ in the first case.
Theorem. Let $(E, \mathcal{E}, \mu)$ be a measure space. Then there exists a unique function
$\tilde\mu : m\mathcal{E}^+ \to [0, \infty]$ satisfying
(i) $\tilde\mu(1_A) = \mu(A)$, where $1_A$ is the indicator function of $A$;
(ii) linearity: $\tilde\mu(\alpha f + \beta g) = \alpha\tilde\mu(f) + \beta\tilde\mu(g)$ if $\alpha, \beta \in \mathbb{R}_{\geq 0}$ and $f, g \in m\mathcal{E}^+$;
(iii) monotone convergence: if $f_1, f_2, \ldots \in m\mathcal{E}^+$ are such that $f_n \nearrow f \in m\mathcal{E}^+$
pointwise a.e. as $n \to \infty$, then
$$\lim_{n\to\infty} \tilde\mu(f_n) = \tilde\mu(f).$$
We call $\tilde\mu$ the integral with respect to $\mu$, and we will write it as $\mu$ from now on.
Definition (Simple function). A function $f$ is simple if there exist $\alpha_n \in \mathbb{R}_{\geq 0}$
and $A_n \in \mathcal{E}$ for $1 \leq n \leq k$ such that
$$f = \sum_{n=1}^k \alpha_n 1_{A_n}.$$
From the first two properties of the integral, we see that
$$\mu(f) = \sum_{n=1}^k \alpha_n \mu(A_n).$$
One convenient observation is that a function is simple iff it takes on only finitely
many values. We then see that if $f \in m\mathcal{E}^+$, then
$$f_n = 2^{-n}\lfloor 2^n f \rfloor \wedge n$$
is a sequence of simple functions approximating $f$ from below. Thus, given
monotone convergence, this shows that
$$\mu(f) = \lim_{n\to\infty} \mu(f_n),$$
and this proves the uniqueness part of the theorem.
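The dyadic approximation $f_n = 2^{-n}\lfloor 2^n f\rfloor \wedge n$ is easy to check numerically. Here is a minimal Python sketch (the choice of $f$ and of the test points is illustrative, not from the notes):

```python
import math

def dyadic_approx(f, n):
    """The simple function f_n = (2^{-n} * floor(2^n f)) ∧ n, evaluated pointwise."""
    return lambda x: min(math.floor((2 ** n) * f(x)) / (2 ** n), n)

f = lambda x: x * x  # an arbitrary non-negative function as a stand-in
for x in (0.3, 1.7, 25.0):
    vals = [dyadic_approx(f, n)(x) for n in range(1, 20)]
    # the approximations increase towards f(x) (capped at n)
    assert all(a <= b for a, b in zip(vals, vals[1:]))
    assert abs(vals[-1] - min(f(x), 19)) < 2 ** -18
```

Each $f_n$ takes finitely many values (multiples of $2^{-n}$ up to $n$), so it is simple, and the monotonicity checked above is exactly what monotone convergence needs.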
Recall that
Definition (Almost everywhere). We say $f = g$ almost everywhere if
$$\mu(\{x \in E : f(x) \neq g(x)\}) = 0.$$
We say $f$ is a version of $g$.
Example. Let $f_n = 1_{[n, n+1]}$. Then $\mu(f_n) = 1$ for all $n$, but also $f_n \to 0$ pointwise and
$\mu(0) = 0$. So the “monotone” part of monotone convergence is important.
So if the sequence is not monotone, then the measure does not preserve limits,
but it turns out we still have an inequality.
Lemma (Fatou’s lemma). Let $f_n \in m\mathcal{E}^+$. Then
$$\mu\left(\liminf_n f_n\right) \leq \liminf_n \mu(f_n).$$
Proof. Apply monotone convergence to the sequence $\inf_{m \geq n} f_m$.
Of course, it would be useful to extend integration to functions that are not
necessarily positive.
Definition (Integrable function). We say a function $f \in m\mathcal{E}$ is integrable if
$\mu(|f|) < \infty$. We write $L^1(E)$ (or just $L^1$) for the space of integrable functions.
We extend $\mu$ to $L^1$ by
$$\mu(f) = \mu(f^+) - \mu(f^-),$$
where $f^\pm = (\pm f) \vee 0$.
If we want to be explicit about the measure and the $\sigma$-algebra, we can write
$L^1(E, \mathcal{E}, \mu)$.
Theorem (Dominated convergence theorem). If $f_n \in m\mathcal{E}$ and $f_n \to f$ a.e., and
there exists $g \in L^1$ such that $|f_n| \leq g$ a.e., then
$$\mu(f) = \lim_{n\to\infty} \mu(f_n).$$
Proof. Apply Fatou’s lemma to $g - f_n$ and $g + f_n$.
Definition (Product $\sigma$-algebra). Let $(E_1, \mathcal{E}_1)$ and $(E_2, \mathcal{E}_2)$ be measurable spaces.
Then the product $\sigma$-algebra $\mathcal{E}_1 \otimes \mathcal{E}_2$ is the smallest $\sigma$-algebra on $E_1 \times E_2$ containing
all sets of the form $A_1 \times A_2$, where $A_i \in \mathcal{E}_i$.
Theorem. If $(E_1, \mathcal{E}_1, \mu_1)$ and $(E_2, \mathcal{E}_2, \mu_2)$ are $\sigma$-finite measure spaces, then there
exists a unique measure $\mu$ on $\mathcal{E}_1 \otimes \mathcal{E}_2$ satisfying
$$\mu(A_1 \times A_2) = \mu_1(A_1)\mu_2(A_2)$$
for all $A_i \in \mathcal{E}_i$.
This is called the product measure.
Theorem (Fubini’s/Tonelli’s theorem). If $f = f(x_1, x_2) \in m\mathcal{E}^+$ with $\mathcal{E} = \mathcal{E}_1 \otimes \mathcal{E}_2$, then the functions
$$x_1 \mapsto \int f(x_1, x_2)\,\mathrm{d}\mu_2(x_2) \in m\mathcal{E}_1^+,$$
$$x_2 \mapsto \int f(x_1, x_2)\,\mathrm{d}\mu_1(x_1) \in m\mathcal{E}_2^+,$$
and
$$\int_E f\,\mathrm{d}\mu = \int_{E_1}\left(\int_{E_2} f(x_1, x_2)\,\mathrm{d}\mu_2(x_2)\right)\mathrm{d}\mu_1(x_1) = \int_{E_2}\left(\int_{E_1} f(x_1, x_2)\,\mathrm{d}\mu_1(x_1)\right)\mathrm{d}\mu_2(x_2).$$
1.2 Conditional expectation
In this course, conditional expectation is going to play an important role, and
it is worth spending some time developing the theory. We are going to focus
on probability theory, which, mathematically, just means we assume $\mu(E) = 1$.
Practically, it is common to change notation to $E = \Omega$, $\mathcal{E} = \mathcal{F}$, $\mu = \mathbb{P}$ and
$\int \mathrm{d}\mu = \mathbb{E}$. Measurable functions will be written as $X, Y, Z$, and will be called
random variables. Elements in $\mathcal{F}$ will be called events. An element $\omega \in \Omega$ will
be called a realization.
There are many ways we can think about conditional expectations. The first
one is how most of us first encountered conditional probability.
Suppose $B \in \mathcal{F}$, with $\mathbb{P}(B) > 0$. Then the conditional probability of the
event $A$ given $B$ is
$$\mathbb{P}(A \mid B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)}.$$
This should be interpreted as the probability that $A$ happened, given that $B$
happened. Since we assume $B$ happened, we ought to restrict to the subset of the
probability space where $B$ in fact happened. To make this a probability space,
we scale the probability measure by $\mathbb{P}(B)$. Then given any event $A$, we take the
probability of $A \cap B$ under this probability measure, which is the formula given.
More generally, if $X$ is a random variable, the conditional expectation of $X$
given $B$ is just the expectation under this new probability measure,
$$\mathbb{E}[X \mid B] = \frac{\mathbb{E}[X 1_B]}{\mathbb{P}(B)}.$$
We probably already know this from high school, and we are probably not quite
excited by this. One natural generalization would be to allow $B$ to vary.
Let $G_1, G_2, \ldots \in \mathcal{F}$ be disjoint events such that $\bigcup_n G_n = \Omega$. Let
$$\mathcal{G} = \sigma(G_1, G_2, \ldots) = \left\{\bigcup_{n \in I} G_n : I \subseteq \mathbb{N}\right\}.$$
Let $X \in L^1$. We then define
$$Y = \sum_{n=1}^\infty \mathbb{E}(X \mid G_n) 1_{G_n}.$$
Let’s think about what this is saying. Suppose a random outcome $\omega$ happens.
To compute $Y$, we figure out which of the $G_n$ our $\omega$ belongs to. Let’s say
$\omega \in G_k$. Then $Y$ returns the expected value of $X$ given that we live in $G_k$. In
this process, we have forgotten the exact value of $\omega$. All that matters is which
$G_n$ the outcome belongs to. We can “visually” think of the $G_n$ as cutting up
the sample space into compartments. We then average out $X$ in each of these
compartments to obtain $Y$. This is what we are going to call the conditional
expectation of $X$ given $\mathcal{G}$, written $\mathbb{E}(X \mid \mathcal{G})$.
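The compartment picture can be illustrated with a small Monte Carlo sketch in Python. Here the sample space, the partition of $[0,1)$ into four intervals, and the choice $X(\omega) = \omega^2$ are all illustrative assumptions, not from the notes:

```python
import random

random.seed(0)

# Sample space: ω uniform on [0,1); partition G_k = [k/4, (k+1)/4).
# X(ω) = ω², and Y = E(X | G) averages X over the compartment containing ω.
samples = [random.random() for _ in range(200_000)]

def compartment(w):
    return min(int(4 * w), 3)

totals, counts = [0.0] * 4, [0] * 4
for w in samples:
    k = compartment(w)
    totals[k] += w * w
    counts[k] += 1
cond_exp = [totals[k] / counts[k] for k in range(4)]  # estimates of E(X | G_k)

# Exact value on G_k: 4 ∫_{k/4}^{(k+1)/4} u² du
exact = [4 * (((k + 1) / 4) ** 3 - (k / 4) ** 3) / 3 for k in range(4)]
for est, ex in zip(cond_exp, exact):
    assert abs(est - ex) < 0.01
```

Note that averaging the four compartment values recovers $\mathbb{E}X = 1/3$, a special case of the tower property below.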
Ultimately, the characterizing property of $Y$ is the following lemma:
Lemma. The conditional expectation $Y = \mathbb{E}(X \mid \mathcal{G})$ satisfies the following
properties:
(i) $Y$ is $\mathcal{G}$-measurable;
(ii) we have $Y \in L^1$, and $\mathbb{E} Y 1_A = \mathbb{E} X 1_A$ for all $A \in \mathcal{G}$.
Proof. It is clear that $Y$ is $\mathcal{G}$-measurable. To show it is $L^1$, we compute
$$\mathbb{E}[|Y|] = \mathbb{E}\left|\sum_{n=1}^\infty \mathbb{E}(X \mid G_n) 1_{G_n}\right| \leq \mathbb{E}\sum_{n=1}^\infty \mathbb{E}(|X| \mid G_n) 1_{G_n} = \sum_n \mathbb{E}\left(\mathbb{E}(|X| \mid G_n) 1_{G_n}\right) = \sum_n \mathbb{E}|X| 1_{G_n} = \mathbb{E}\sum_n |X| 1_{G_n} = \mathbb{E}|X| < \infty,$$
where we used monotone convergence twice to swap the expectation and the
sum.
The final part is also clear, since we can explicitly enumerate the elements in
$\mathcal{G}$ and see that they all satisfy the last property.
It turns out that for any $\sigma$-subalgebra $\mathcal{G} \subseteq \mathcal{F}$, we can construct the conditional
expectation $\mathbb{E}(X \mid \mathcal{G})$, which is uniquely characterized by the above two properties.
Theorem (Existence and uniqueness of conditional expectation). Let $X \in L^1$,
and $\mathcal{G} \subseteq \mathcal{F}$. Then there exists a random variable $Y$ such that
(i) $Y$ is $\mathcal{G}$-measurable;
(ii) $Y \in L^1$, and $\mathbb{E} X 1_A = \mathbb{E} Y 1_A$ for all $A \in \mathcal{G}$.
Moreover, if $Y'$ is another random variable satisfying these conditions, then
$Y' = Y$ almost surely.
We call $Y$ a (version of) the conditional expectation given $\mathcal{G}$.
We will write the conditional expectation as $\mathbb{E}(X \mid \mathcal{G})$, and if $X = 1_A$, we will
write $\mathbb{P}(A \mid \mathcal{G}) = \mathbb{E}(1_A \mid \mathcal{G})$.
Recall also that if $Z$ is a random variable, then $\sigma(Z) = \{Z^{-1}(B) : B \in \mathcal{B}\}$.
In this case, we will write $\mathbb{E}(X \mid Z) = \mathbb{E}(X \mid \sigma(Z))$.
By, say, bounded convergence, it follows from the second condition that
$$\mathbb{E} XZ = \mathbb{E} YZ$$
for all bounded $\mathcal{G}$-measurable functions $Z$.
Proof. We first consider the case where $X \in L^2(\Omega, \mathcal{F}, \mu)$. Then we know from
functional analysis that for any $\mathcal{G} \subseteq \mathcal{F}$, the space $L^2(\mathcal{G})$ is a Hilbert space with
inner product
$$\langle X, Y \rangle = \mu(XY).$$
In particular, $L^2(\mathcal{G})$ is a closed subspace of $L^2(\mathcal{F})$. We can then define $Y$
to be the orthogonal projection of $X$ onto $L^2(\mathcal{G})$. It is immediate that $Y$ is
$\mathcal{G}$-measurable. For the second part, we use that $X - Y$ is orthogonal to $L^2(\mathcal{G})$,
since that’s what orthogonal projection is supposed to be. So $\mathbb{E}(X - Y)Z = 0$
for all $Z \in L^2(\mathcal{G})$. In particular, since the measure space is finite, the indicator
function of any measurable subset is $L^2$. So we are done.
We next focus on the case where $X \in m\mathcal{E}^+$. We define
$$X_n = X \wedge n.$$
We want to use monotone convergence to obtain our result. To do so, we need
the following result:
Claim. If $(X, Y)$ and $(X', Y')$ satisfy the conditions of the theorem, and
$X' \leq X$ a.s., then $Y' \leq Y$ a.s.
Proof. Define the event $A = \{Y' \geq Y\} \in \mathcal{G}$. Consider the random variable
$Z = (Y' - Y) 1_A$. Then $Z \geq 0$. We then have
$$\mathbb{E} Y' 1_A = \mathbb{E} X' 1_A \leq \mathbb{E} X 1_A = \mathbb{E} Y 1_A.$$
So it follows that we also have $\mathbb{E}(Y' - Y)1_A \leq 0$. So in fact $\mathbb{E} Z = 0$. So
$Y' \leq Y$ a.s.
We can now define $Y_n = \mathbb{E}(X_n \mid \mathcal{G})$, picking them so that $\{Y_n\}$ is increasing.
We then take $Y_\infty = \lim Y_n$. Then $Y_\infty$ is certainly $\mathcal{G}$-measurable, and by monotone
convergence, if $A \in \mathcal{G}$, then
$$\mathbb{E} X 1_A = \lim \mathbb{E} X_n 1_A = \lim \mathbb{E} Y_n 1_A = \mathbb{E} Y_\infty 1_A.$$
Now if $\mathbb{E} X < \infty$, then $\mathbb{E} Y_\infty = \mathbb{E} X < \infty$. So we know $Y_\infty$ is finite a.s., and we
can define $Y = Y_\infty 1_{Y_\infty < \infty}$.
Finally, we work with arbitrary $X \in L^1$. We can write $X = X^+ - X^-$, and
then define $Y^\pm = \mathbb{E}(X^\pm \mid \mathcal{G})$, and take $Y = Y^+ - Y^-$.
Uniqueness is then clear.
Lemma. If $Y$ is $\sigma(Z)$-measurable, then there exists $h : \mathbb{R} \to \mathbb{R}$ Borel-measurable
such that $Y = h(Z)$. In particular,
$$\mathbb{E}(X \mid Z) = h(Z) \text{ a.s.}$$
for some $h : \mathbb{R} \to \mathbb{R}$.
We can then define $\mathbb{E}(X \mid Z = z) = h(z)$. The point of doing this is that we
want to allow for the case where in fact we have $\mathbb{P}(Z = z) = 0$, in which case
our original definition does not make sense.
Exercise. Consider $X \in L^1$, and $Z : \Omega \to \mathbb{N}$ discrete. Compute $\mathbb{E}(X \mid Z)$ and
compare our different definitions of conditional expectation.
Example. Let $(U, V) \in \mathbb{R}^2$ have density $f_{U,V}(u, v)$, so that for any $B_1, B_2 \in \mathcal{B}$,
we have
$$\mathbb{P}(U \in B_1, V \in B_2) = \int_{B_1} \int_{B_2} f_{U,V}(u, v)\,\mathrm{d}u\,\mathrm{d}v.$$
We want to compute $\mathbb{E}(h(V) \mid U)$, where $h : \mathbb{R} \to \mathbb{R}$ is Borel measurable. We
can define
$$f_U(u) = \int_{\mathbb{R}} f_{U,V}(u, v)\,\mathrm{d}v,$$
and we define the conditional density of $V$ given $U$ by
$$f_{V|U}(v \mid u) = \frac{f_{U,V}(u, v)}{f_U(u)}.$$
We define
$$g(u) = \int h(v) f_{V|U}(v \mid u)\,\mathrm{d}v.$$
We claim that $\mathbb{E}(h(V) \mid U)$ is just $g(U)$.
To check this, we show that it satisfies the two desired conditions. It is clear
that it is $\sigma(U)$-measurable. To check the second condition, fix an $A \in \sigma(U)$.
Then $A = \{(u, v) : u \in B\}$ for some $B$. Then
$$\mathbb{E}(h(V) 1_A) = \iint h(v) 1_{u \in B}\, f_{U,V}(u, v)\,\mathrm{d}u\,\mathrm{d}v = \iint h(v) 1_{u \in B}\, f_{V|U}(v \mid u) f_U(u)\,\mathrm{d}u\,\mathrm{d}v = \int g(u) 1_{u \in B}\, f_U(u)\,\mathrm{d}u = \mathbb{E}(g(U) 1_A),$$
as desired.
The point of this example is that to compute conditional expectations, we
use our intuition to guess what the conditional expectation should be, and then
check that it satisfies the two uniquely characterizing properties.
Example. Suppose $(X, W)$ are Gaussian. Then for all linear functions $\varphi : \mathbb{R}^2 \to \mathbb{R}$, the quantity $\varphi(X, W)$ is Gaussian.
One nice property of Gaussians is that lack of correlation implies independence.
We want to compute $\mathbb{E}(X \mid W)$. Note that if $Y$ is such that $\mathbb{E} X = \mathbb{E} Y$,
$X - Y$ is independent of $W$, and $Y$ is $\sigma(W)$-measurable, then $Y = \mathbb{E}(X \mid W)$, since
$\mathbb{E}(X - Y)1_A = 0$ for all $A \in \sigma(W)$.
The guess is that we want $Y$ to be a Gaussian variable. We put $Y = aW + b$.
Then $\mathbb{E} X = \mathbb{E} Y$ implies we must have
$$a\mathbb{E} W + b = \mathbb{E} X. \quad (*)$$
The independence part requires $\mathrm{cov}(X - Y, W) = 0$. Since covariance is linear,
we have
$$0 = \mathrm{cov}(X - Y, W) = \mathrm{cov}(X, W) - \mathrm{cov}(aW + b, W) = \mathrm{cov}(X, W) - a\,\mathrm{cov}(W, W).$$
Recalling that $\mathrm{cov}(W, W) = \mathrm{var}(W)$, we need
$$a = \frac{\mathrm{cov}(X, W)}{\mathrm{var}(W)}.$$
This then allows us to use $(*)$ to compute $b$ as well. This is how we compute the
conditional expectation of Gaussians.
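This recipe can be sanity-checked by simulation. A Python sketch, assuming an illustrative jointly Gaussian pair with $X = 2W + \text{noise}$ (so the true slope is $a = 2$ and intercept $b = 0$; none of these numbers come from the notes):

```python
import random
import statistics

random.seed(1)
N = 100_000

# Jointly Gaussian pair: W ~ N(0,1), X = 2W + independent N(0,1) noise,
# so cov(X, W) = 2 and var(W) = 1.
W = [random.gauss(0, 1) for _ in range(N)]
X = [2 * w + random.gauss(0, 1) for w in W]

mW, mX = statistics.fmean(W), statistics.fmean(X)
cov_XW = statistics.fmean((x - mX) * (w - mW) for x, w in zip(X, W))
var_W = statistics.fmean((w - mW) ** 2 for w in W)

a = cov_XW / var_W  # slope forced by cov(X - Y, W) = 0
b = mX - a * mW     # intercept forced by EY = EX
assert abs(a - 2) < 0.05
assert abs(b - 0) < 0.05
```

The two asserted equations are exactly $(*)$ and the covariance condition; the residual $X - (aW + b)$ is then (empirically) uncorrelated with $W$.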
We note some immediate properties of conditional expectation. As usual,
all (in)equality and convergence statements are to be taken with the quantifier
“almost surely”.
Proposition.
(i) $\mathbb{E}(X \mid \mathcal{G}) = X$ iff $X$ is $\mathcal{G}$-measurable.
(ii) $\mathbb{E}(\mathbb{E}(X \mid \mathcal{G})) = \mathbb{E} X$.
(iii) If $X \geq 0$ a.s., then $\mathbb{E}(X \mid \mathcal{G}) \geq 0$.
(iv) If $X$ and $\mathcal{G}$ are independent, then $\mathbb{E}(X \mid \mathcal{G}) = \mathbb{E}[X]$.
(v) If $\alpha, \beta \in \mathbb{R}$ and $X_1, X_2 \in L^1$, then
$$\mathbb{E}(\alpha X_1 + \beta X_2 \mid \mathcal{G}) = \alpha\mathbb{E}(X_1 \mid \mathcal{G}) + \beta\mathbb{E}(X_2 \mid \mathcal{G}).$$
(vi) Suppose $X_n \nearrow X$. Then
$$\mathbb{E}(X_n \mid \mathcal{G}) \nearrow \mathbb{E}(X \mid \mathcal{G}).$$
(vii) Fatou’s lemma: if $X_n$ are non-negative measurable, then
$$\mathbb{E}\left(\liminf_{n\to\infty} X_n \,\Big|\, \mathcal{G}\right) \leq \liminf_{n\to\infty} \mathbb{E}(X_n \mid \mathcal{G}).$$
(viii) Dominated convergence theorem: if $X_n \to X$ and there is $Y \in L^1$ such that
$Y \geq |X_n|$ for all $n$, then
$$\mathbb{E}(X_n \mid \mathcal{G}) \to \mathbb{E}(X \mid \mathcal{G}).$$
(ix) Jensen’s inequality: if $c : \mathbb{R} \to \mathbb{R}$ is convex, then
$$\mathbb{E}(c(X) \mid \mathcal{G}) \geq c(\mathbb{E}(X \mid \mathcal{G})).$$
(x) Tower property: if $\mathcal{H} \subseteq \mathcal{G}$, then
$$\mathbb{E}(\mathbb{E}(X \mid \mathcal{G}) \mid \mathcal{H}) = \mathbb{E}(X \mid \mathcal{H}).$$
(xi) For $p \geq 1$,
$$\|\mathbb{E}(X \mid \mathcal{G})\|_p \leq \|X\|_p.$$
(xii) If $Z$ is bounded and $\mathcal{G}$-measurable, then
$$\mathbb{E}(ZX \mid \mathcal{G}) = Z\mathbb{E}(X \mid \mathcal{G}).$$
(xiii) Let $X \in L^1$ and $\mathcal{G}, \mathcal{H} \subseteq \mathcal{F}$. Assume that $\sigma(X, \mathcal{G})$ is independent of $\mathcal{H}$.
Then
$$\mathbb{E}(X \mid \mathcal{G}) = \mathbb{E}(X \mid \sigma(\mathcal{G}, \mathcal{H})).$$
Proof.
(i) Clear.
(ii) Take $A = \Omega$.
(iii) Shown in the proof of the existence theorem.
(iv) Clear by the property of the expected value of independent variables.
(v) Clear, since the RHS satisfies the unique characterizing property of the
LHS.
(vi) Clear from construction.
(vii) Same as the unconditional proof, using the previous property.
(viii) Same as the unconditional proof, using the previous property.
(ix) Same as the unconditional proof.
(x) The LHS satisfies the characterizing property of the RHS.
(xi) Using the convexity of $|x|^p$, Jensen’s inequality tells us
$$\|\mathbb{E}(X \mid \mathcal{G})\|_p^p = \mathbb{E}|\mathbb{E}(X \mid \mathcal{G})|^p \leq \mathbb{E}(\mathbb{E}(|X|^p \mid \mathcal{G})) = \mathbb{E}|X|^p = \|X\|_p^p.$$
(xii) Suppose first $Z = 1_B$ with $B \in \mathcal{G}$, and let $A \in \mathcal{G}$. Then
$$\mathbb{E}(Z\mathbb{E}(X \mid \mathcal{G}) 1_A) = \mathbb{E}(\mathbb{E}(X \mid \mathcal{G}) \cdot 1_{A \cap B}) = \mathbb{E}(X 1_{A \cap B}) = \mathbb{E}(ZX 1_A).$$
So the result holds for indicators. Linearity then implies the result for $Z$
simple; then apply our favourite convergence theorems.
(xiii) Take $B \in \mathcal{H}$ and $A \in \mathcal{G}$. Then
$$\mathbb{E}(\mathbb{E}(X \mid \sigma(\mathcal{G}, \mathcal{H})) \cdot 1_{A \cap B}) = \mathbb{E}(X \cdot 1_{A \cap B}) = \mathbb{E}(X 1_A)\mathbb{P}(B) = \mathbb{E}(\mathbb{E}(X \mid \mathcal{G}) 1_A)\mathbb{P}(B) = \mathbb{E}(\mathbb{E}(X \mid \mathcal{G}) 1_{A \cap B}).$$
If instead of $A \cap B$ we had any $\sigma(\mathcal{G}, \mathcal{H})$-measurable set, then we would
be done. But we are fine, since the set of subsets of the form $A \cap B$ with
$A \in \mathcal{G}$, $B \in \mathcal{H}$ is a generating $\pi$-system for $\sigma(\mathcal{G}, \mathcal{H})$.
We shall end with the following key lemma. We will later use it to show that
many of our martingales are uniformly integrable.
Lemma. If $X \in L^1$, then the family of random variables $Y_{\mathcal{G}} = \mathbb{E}(X \mid \mathcal{G})$ for all
$\mathcal{G} \subseteq \mathcal{F}$ is uniformly integrable.
In other words, for all $\varepsilon > 0$, there exists $\lambda > 0$ such that
$$\mathbb{E}(|Y_{\mathcal{G}}| 1_{|Y_{\mathcal{G}}| \geq \lambda}) < \varepsilon$$
for all $\mathcal{G}$.
Proof. Fix $\varepsilon > 0$. Then there exists $\delta > 0$ such that $\mathbb{E}|X| 1_A < \varepsilon$ for any $A$ with
$\mathbb{P}(A) < \delta$.
Take $Y = \mathbb{E}(X \mid \mathcal{G})$. Then by Jensen, we know
$$|Y| \leq \mathbb{E}(|X| \mid \mathcal{G}).$$
In particular, we have $\mathbb{E}|Y| \leq \mathbb{E}|X|$. By Markov’s inequality, we have
$$\mathbb{P}(|Y| \geq \lambda) \leq \frac{\mathbb{E}|Y|}{\lambda} \leq \frac{\mathbb{E}|X|}{\lambda}.$$
So take $\lambda$ such that $\frac{\mathbb{E}|X|}{\lambda} < \delta$. Then we have
$$\mathbb{E}(|Y| 1_{|Y| \geq \lambda}) \leq \mathbb{E}(\mathbb{E}(|X| \mid \mathcal{G}) 1_{|Y| \geq \lambda}) = \mathbb{E}(|X| 1_{|Y| \geq \lambda}) < \varepsilon,$$
using that $1_{|Y| \geq \lambda}$ is a $\mathcal{G}$-measurable function.
2 Martingales in discrete time
2.1 Filtrations and martingales
We would like to model some random variable that “evolves with time”. For
example, in a simple random walk, $X_n$ could be the position we are at at time $n$.
To do so, we would like to have some $\sigma$-algebras $\mathcal{F}_n$ that tell us the “information
we have at time $n$”. This structure is known as a filtration.
Definition (Filtration). A filtration is a sequence of $\sigma$-algebras $(\mathcal{F}_n)_{n \geq 0}$ such
that $\mathcal{F}_n \subseteq \mathcal{F}_{n+1} \subseteq \mathcal{F}$ for all $n$. We define $\mathcal{F}_\infty = \sigma(\mathcal{F}_0, \mathcal{F}_1, \ldots) \subseteq \mathcal{F}$.
We will from now on assume $(\Omega, \mathcal{F}, \mathbb{P})$ is equipped with a filtration $(\mathcal{F}_n)_{n \geq 0}$.
Definition (Stochastic process in discrete time). A stochastic process (in discrete
time) is a sequence of random variables $(X_n)_{n \geq 0}$.
This is a very general definition, and in most cases, we would want $X_n$ to
interact nicely with our filtration.
Definition (Natural filtration). The natural filtration of $(X_n)_{n \geq 0}$ is given by
$$\mathcal{F}_n^X = \sigma(X_1, \ldots, X_n).$$
Definition (Adapted process). We say that $(X_n)_{n \geq 0}$ is adapted (to $(\mathcal{F}_n)_{n \geq 0}$)
if $X_n$ is $\mathcal{F}_n$-measurable for all $n \geq 0$. Equivalently, if $\mathcal{F}_n^X \subseteq \mathcal{F}_n$.
Definition (Integrable process). A process $(X_n)_{n \geq 0}$ is integrable if $X_n \in L^1$ for
all $n \geq 0$.
We can now write down the definition of a martingale.
Definition (Martingale). An integrable adapted process $(X_n)_{n \geq 0}$ is a martingale
if for all $n \geq m$, we have
$$\mathbb{E}(X_n \mid \mathcal{F}_m) = X_m.$$
We say it is a super-martingale if
$$\mathbb{E}(X_n \mid \mathcal{F}_m) \leq X_m,$$
and a sub-martingale if
$$\mathbb{E}(X_n \mid \mathcal{F}_m) \geq X_m.$$
Note that it is enough to take $m = n - 1$ for all $n \geq 1$, using the tower
property.
The idea of a martingale is that we cannot predict whether $X_n$ will go up
or go down in the future even if we have all the information up to the present.
For example, if $X_n$ denotes the wealth of a gambler in a gambling game, then in
some sense $(X_n)_{n \geq 0}$ being a martingale means the game is “fair” (in the sense
of a fair dice).
Note that $(X_n)_{n \geq 0}$ is a super-martingale iff $(-X_n)_{n \geq 0}$ is a sub-martingale,
and if $(X_n)_{n \geq 0}$ is a martingale, then it is both a super-martingale and a sub-martingale. Often, what these extra notions buy us is that we can formulate
our results for super-martingales (or sub-martingales), and then by applying the
result to both $(X_n)_{n \geq 0}$ and $(-X_n)_{n \geq 0}$, we obtain the desired, stronger result
for martingales.
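The standard example of a martingale is the simple symmetric random walk, and its fairness is easy to check numerically. A Python sketch (the walk length and sample size are arbitrary choices):

```python
import random

random.seed(2)

def walk(n):
    """Simple symmetric random walk: X_0 = 0, steps ±1."""
    x, path = 0, [0]
    for _ in range(n):
        x += random.choice((-1, 1))
        path.append(x)
    return path

paths = [walk(50) for _ in range(100_000)]
# Martingale property at the level of expectations: EX_n = EX_0 = 0 for all n
for n in (10, 25, 50):
    mean_n = sum(p[n] for p in paths) / len(paths)
    assert abs(mean_n) < 0.1
```

Of course $\mathbb{E}X_n = \mathbb{E}X_0$ is weaker than the full conditional statement, but it is the consequence the optional stopping theorem below will generalize.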
2.2 Stopping time and optional stopping
The optional stopping theorem says the definition of a martingale in fact implies
an a priori much stronger property. To formulate the optional stopping theorem,
we need the notion of a stopping time.
Definition (Stopping time). A stopping time is a random variable $T : \Omega \to \mathbb{N}_{\geq 0} \cup \{\infty\}$ such that
$$\{T \leq n\} \in \mathcal{F}_n$$
for all $n \geq 0$.
This means that at time $n$, if we want to know whether $T$ has occurred, we can
determine it using the information we have at time $n$.
Note that $T$ is a stopping time iff $\{T = n\} \in \mathcal{F}_n$ for all $n$, since if $T$ is a
stopping time, then
$$\{T = n\} = \{T \leq n\} \setminus \{T \leq n - 1\},$$
and $\{T \leq n - 1\} \in \mathcal{F}_{n-1} \subseteq \mathcal{F}_n$. Conversely,
$$\{T \leq n\} = \bigcup_{k=0}^n \{T = k\} \in \mathcal{F}_n.$$
This will not be true in the continuous case.
Example. If $B \in \mathcal{B}(\mathbb{R})$, then we can define
$$T = \inf\{n : X_n \in B\}.$$
Then this is a stopping time.
On the other hand,
$$T = \sup\{n : X_n \in B\}$$
is not a stopping time (in general).
Given a stopping time, we can make the following definition:
Definition ($X_T$). For a stopping time $T$, we define the random variable $X_T$ by
$$X_T(\omega) = X_{T(\omega)}(\omega)$$
on $\{T < \infty\}$, and $0$ otherwise.
Later, for suitable martingales, we will see that the limit $X_\infty = \lim_{n\to\infty} X_n$
makes sense. In that case, we define $X_T(\omega)$ to be $X_\infty(\omega)$ if $T = \infty$.
Similarly, we can define
Definition (Stopped process). The stopped process is defined by
$$(X_n^T)_{n \geq 0} = (X_{T(\omega) \wedge n}(\omega))_{n \geq 0}.$$
This says we stop evolving the random variable $X$ once $T$ has occurred.
We would like to say that “$X_T$ is $\mathcal{F}_T$-measurable”, i.e. to compute $X_T$, we
only need to know the information up to time $T$. After some thought, we see
that the following is the correct definition of $\mathcal{F}_T$:
Definition ($\mathcal{F}_T$). For a stopping time $T$, define
$$\mathcal{F}_T = \{A \in \mathcal{F}_\infty : A \cap \{T \leq n\} \in \mathcal{F}_n \text{ for all } n\}.$$
This is easily seen to be a $\sigma$-algebra.
Example. If $T = n$ is constant, then $\mathcal{F}_T = \mathcal{F}_n$.
There are some fairly immediate properties of these objects, whose proof is
left as an exercise for the reader:
Proposition.
(i) If $T, S, (T_n)_{n \geq 0}$ are all stopping times, then
$$T \vee S,\quad T \wedge S,\quad \sup_n T_n,\quad \inf_n T_n,\quad \limsup_n T_n,\quad \liminf_n T_n$$
are all stopping times.
(ii) $\mathcal{F}_T$ is a $\sigma$-algebra.
(iii) If $S \leq T$, then $\mathcal{F}_S \subseteq \mathcal{F}_T$.
(iv) $X_T 1_{T < \infty}$ is $\mathcal{F}_T$-measurable.
(v) If $(X_n)$ is an adapted process, then so is $(X_n^T)_{n \geq 0}$ for any stopping time $T$.
(vi) If $(X_n)$ is an integrable process, then so is $(X_n^T)_{n \geq 0}$ for any stopping time
$T$.
We now come to the fundamental property of martingales.
Theorem (Optional stopping theorem). Let $(X_n)_{n \geq 0}$ be a super-martingale
and $S \leq T$ bounded stopping times. Then
$$\mathbb{E} X_T \leq \mathbb{E} X_S.$$
Proof. Follows from the next theorem.
What does this theorem mean? If $X$ is a martingale, then it is both a
super-martingale and a sub-martingale. So we can apply this to both $X$ and
$-X$, and so we have
$$\mathbb{E}(X_T) = \mathbb{E}(X_S).$$
In particular, since $0$ is a stopping time, we see that
$$\mathbb{E} X_T = \mathbb{E} X_0$$
for any bounded stopping time $T$.
Recall that martingales are supposed to model fair games. If we again think
of $X_n$ as the wealth at time $n$, and $T$ as the time we stop gambling, then this
says no matter how we choose $T$, as long as it is bounded, the expected wealth
at the end is the same as what we started with.
Example. Take $X$ to be a martingale with $\mathbb{E} X_0 = 0$ that hits $1$ almost surely
(for example, the simple random walk started at $0$), and consider the stopping time
$$T = \inf\{n : X_n = 1\}.$$
Then clearly we have $\mathbb{E} X_T = 1 \neq 0$. So this tells us
$T$ is not a bounded stopping time!
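The contrast between bounded and unbounded stopping times is easy to see numerically. A Python sketch for the simple random walk, using the truncated (hence bounded) stopping time $T \wedge N$; the parameters are arbitrary choices:

```python
import random

random.seed(3)

def stopped_value(bound):
    """Run a simple random walk from 0; return X_{T ∧ bound},
    where T = inf{n : X_n = 1}."""
    x = 0
    for _ in range(bound):
        if x == 1:
            return x
        x += random.choice((-1, 1))
    return x

# Bounded stopping time T ∧ N: optional stopping gives E X_{T∧N} = E X_0 = 0,
# even though the unbounded T would give E X_T = 1.
est = sum(stopped_value(100) for _ in range(50_000)) / 50_000
assert abs(est) < 0.2
```

The truncation is what keeps the expectation pinned at $0$: the rare paths that have not yet hit $1$ by time $N$ carry large negative values that exactly balance the many paths stopped at $1$.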
Theorem. The following are equivalent:
(i) $(X_n)_{n \geq 0}$ is a super-martingale.
(ii) For any bounded stopping time $T$ and any stopping time $S$,
$$\mathbb{E}(X_T \mid \mathcal{F}_S) \leq X_{S \wedge T}.$$
(iii) $(X_n^T)_{n \geq 0}$ is a super-martingale for any stopping time $T$.
(iv) For bounded stopping times $S, T$ such that $S \leq T$, we have
$$\mathbb{E} X_T \leq \mathbb{E} X_S.$$
In particular, (iv) implies (i).
Proof.
(ii) ⇒ (iii): Consider $(X_n^{T'})_{n \geq 0}$ for a stopping time $T'$. To check if this is a
super-martingale, we need to prove that whenever $m \leq n$,
$$\mathbb{E}(X_{n \wedge T'} \mid \mathcal{F}_m) \leq X_{m \wedge T'}.$$
But this follows from (ii) above by taking $S = m$ and $T = T' \wedge n$.
(ii) ⇒ (iv): Clear by the tower law.
(iii) ⇒ (i): Take $T = \infty$.
(i) ⇒ (ii): Assume $T \leq n$. Then
$$X_T = X_{S \wedge T} + \sum_{S \leq k < T} (X_{k+1} - X_k) = X_{S \wedge T} + \sum_{k=0}^n (X_{k+1} - X_k) 1_{S \leq k < T}. \quad (*)$$
Now note that $\{S \leq k < T\} = \{S \leq k\} \cap \{T \leq k\}^c \in \mathcal{F}_k$. Let $A \in \mathcal{F}_S$.
Then $A \cap \{S \leq k\} \in \mathcal{F}_k$ by definition of $\mathcal{F}_S$. So $A \cap \{S \leq k < T\} \in \mathcal{F}_k$.
Multiply $(*)$ by $1_A$ and take expectations. Then we have
$$\mathbb{E}(X_T 1_A) = \mathbb{E}(X_{S \wedge T} 1_A) + \sum_{k=0}^n \mathbb{E}(X_{k+1} - X_k) 1_{A \cap \{S \leq k < T\}}.$$
But for all $k$, we know
$$\mathbb{E}(X_{k+1} - X_k) 1_{A \cap \{S \leq k < T\}} \leq 0,$$
since $X$ is a super-martingale. So it follows that for all $A \in \mathcal{F}_S$, we have
$$\mathbb{E}(X_T \cdot 1_A) \leq \mathbb{E}(X_{S \wedge T} 1_A).$$
But since $X_{S \wedge T}$ is $\mathcal{F}_{S \wedge T}$-measurable, it is in particular $\mathcal{F}_S$-measurable. So
it follows that for all $A \in \mathcal{F}_S$, we have
$$\mathbb{E}(\mathbb{E}(X_T \mid \mathcal{F}_S) 1_A) \leq \mathbb{E}(X_{S \wedge T} 1_A).$$
So the result follows.
(iv) ⇒ (i): Fix $m \leq n$ and $A \in \mathcal{F}_m$. Take
$$T = m 1_A + n 1_{A^C}.$$
One then manually checks that this is a stopping time. Now note that
$$X_T = X_m 1_A + X_n 1_{A^C}.$$
Applying (iv) with the pair of bounded stopping times $T \leq n$, we have
$$0 \leq \mathbb{E}(X_T) - \mathbb{E}(X_n) = \mathbb{E}(X_m 1_A) + \mathbb{E}(X_n 1_{A^C}) - \mathbb{E}(X_n) = \mathbb{E}(X_m 1_A) - \mathbb{E}(X_n 1_A).$$
So $\mathbb{E}(X_n 1_A) \leq \mathbb{E}(X_m 1_A)$ for all $A \in \mathcal{F}_m$, and the same argument as before gives
$\mathbb{E}(X_n \mid \mathcal{F}_m) \leq X_m$.
2.3 Martingale convergence theorems
One particularly nice property of martingales is that they have nice conver-
gence properties. We shall begin by proving a pointwise version of martingale
convergence.
Theorem
(Almost sure martingale convergence theorem)
.
Suppose (
X
n
)
n0
is a super-martingale that is bounded in
L
1
, i.e.
sup
n
E|X
n
| <
. Then there
exists an F
-measurable X
L
1
such that
X
n
X
a.s. as n .
To begin, we need a convenient characterization of when a sequence converges.
Definition (Upcrossing). Let $(x_n)$ be a sequence and $(a, b)$ an interval. An
upcrossing of $(a, b)$ by $(x_n)$ is a sequence $j, j+1, \ldots, k$ such that $x_j \leq a$ and
$x_k \geq b$. We define
$$U_n[a, b, (x_n)] = \text{number of disjoint upcrossings contained in } \{1, \ldots, n\},$$
$$U[a, b, (x_n)] = \lim_{n\to\infty} U_n[a, b, (x_n)].$$
We can then make the following elementary observation:
Lemma. Let $(x_n)_{n \geq 0}$ be a sequence of numbers. Then $x_n$ converges in $\mathbb{R}$ if and
only if
(i) $\liminf |x_n| < \infty$;
(ii) for all $a, b \in \mathbb{Q}$ with $a < b$, we have $U[a, b, (x_n)] < \infty$.
For our martingales, since they are bounded in $L^1$, Fatou’s lemma tells us
$$\mathbb{E} \liminf |X_n| \leq \liminf \mathbb{E}|X_n| < \infty.$$
So $\liminf |X_n| < \infty$ almost surely. Thus, it remains to show
that for any fixed $a < b \in \mathbb{Q}$, we have $\mathbb{P}(U[a, b, (X_n)] = \infty) = 0$. This is a
consequence of Doob’s upcrossing lemma.
Lemma (Doob’s upcrossing lemma). If $(X_n)$ is a super-martingale, then
$$(b - a)\mathbb{E}(U_n[a, b, (X_n)]) \leq \mathbb{E}(X_n - a)^-.$$
Proof. We define stopping times $S_k, T_k$ as follows:
$$T_0 = 0,$$
$$S_{k+1} = \inf\{n : X_n \leq a,\; n \geq T_k\},$$
$$T_{k+1} = \inf\{n : X_n \geq b,\; n \geq S_{k+1}\}.$$
Given an $n$, we want to count the number of upcrossings before $n$. Writing $U_n$
for $U_n[a, b, (X_n)]$, there are two cases to distinguish: either the last upcrossing
is complete by time $n$, or at time $n$ we are in the middle of an upcrossing, i.e.
$S_{U_n+1} \leq n < T_{U_n+1}$.
Now consider the sum
$$\sum_{k=1}^n \left(X_{T_k \wedge n} - X_{S_k \wedge n}\right).$$
In the first case, this is equal to
$$\sum_{k=1}^{U_n} \left(X_{T_k} - X_{S_k}\right) + \sum_{k=U_n+1}^n (X_n - X_n) \geq (b - a)U_n.$$
In the second case, it is equal to
$$\sum_{k=1}^{U_n} \left(X_{T_k} - X_{S_k}\right) + (X_n - X_{S_{U_n+1}}) + \sum_{k=U_n+2}^n (X_n - X_n) \geq (b - a)U_n + (X_n - X_{S_{U_n+1}}).$$
Thus, in general, we have
$$\sum_{k=1}^n \left(X_{T_k \wedge n} - X_{S_k \wedge n}\right) \geq (b - a)U_n + (X_n - X_{S_{U_n+1} \wedge n}).$$
By definition, $S_k \wedge n \leq T_k \wedge n$ are bounded stopping times. So the expectation of
each summand on the LHS is non-positive by optional stopping, and thus
$$0 \geq (b - a)\mathbb{E} U_n + \mathbb{E}(X_n - X_{S_{U_n+1} \wedge n}).$$
Then observe that
$$X_n - X_{S_{U_n+1} \wedge n} \geq -(X_n - a)^-,$$
since either $S_{U_n+1} > n$, in which case the left-hand side vanishes, or $S_{U_n+1} \leq n$,
in which case $X_{S_{U_n+1}} \leq a$.
The almost-sure martingale convergence theorem is very nice, but often it is
not good enough. For example, we might want convergence in $L^p$ instead. The
following example shows this isn’t always possible:
Example. Suppose $(\rho_n)_{n \geq 0}$ is a sequence of iid random variables with
$$\mathbb{P}(\rho_n = 0) = \frac{1}{2} = \mathbb{P}(\rho_n = 2).$$
Let
$$X_n = \prod_{k=0}^n \rho_k.$$
Then this is a martingale, and $\mathbb{E} X_n = 1$. On the other hand, $X_n \to 0$ almost
surely. So $\|X_n - X_\infty\|_1$ does not converge to $0$.
For $p > 1$, if we want convergence in $L^p$, it is not surprising that we at least
need the sequence to be bounded in $L^p$. We will see that this is in fact sufficient.
For $p = 1$, however, we need a bit more than being bounded in $L^1$. We will need
uniform integrability.
To prove this, we need to establish some inequalities.
Lemma (Maximal inequality). Let $(X_n)$ be a sub-martingale that is non-negative, or a martingale. Define
$$X_n^* = \sup_{k \leq n} |X_k|, \quad X^* = \lim_{n\to\infty} X_n^*.$$
If $\lambda \geq 0$, then
$$\lambda\mathbb{P}(X_n^* \geq \lambda) \leq \mathbb{E}[|X_n| 1_{X_n^* \geq \lambda}].$$
In particular, we have
$$\lambda\mathbb{P}(X_n^* \geq \lambda) \leq \mathbb{E}[|X_n|].$$
Markov’s inequality says almost the same thing, but has $\mathbb{E}[|X_n^*|]$ instead of
$\mathbb{E}[|X_n|]$. So this is a stronger inequality.
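The maximal inequality can be checked numerically for the simple random walk, a martingale (so $|X_n|$ is a non-negative sub-martingale); the parameters here are arbitrary choices:

```python
import random

random.seed(5)

def running_max_abs(n):
    """Return (X_n*, |X_n|) for a simple symmetric random walk of length n."""
    x, m = 0, 0
    for _ in range(n):
        x += random.choice((-1, 1))
        m = max(m, abs(x))
    return m, abs(x)

lam, n, N = 5, 50, 100_000
samples = [running_max_abs(n) for _ in range(N)]
lhs = lam * sum(m >= lam for m, _ in samples) / N    # λ P(X_n* ≥ λ)
rhs = sum(ax for m, ax in samples if m >= lam) / N   # E |X_n| 1_{X_n* ≥ λ}
assert lhs <= rhs + 0.05  # small Monte Carlo slack
```

Note that the right-hand side only charges the event $\{X_n^* \geq \lambda\}$, which is what makes the lemma sharper than Markov applied to $X_n^*$.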
Proof. If $X_n$ is a martingale, then $|X_n|$ is a sub-martingale. So it suffices to
consider the case of a non-negative sub-martingale. We define the stopping time
$$T = \inf\{n : X_n \geq \lambda\}.$$
By optional stopping,
$$\mathbb{E} X_n \geq \mathbb{E} X_{T \wedge n} = \mathbb{E} X_T 1_{T \leq n} + \mathbb{E} X_n 1_{T > n} \geq \lambda\mathbb{P}(T \leq n) + \mathbb{E} X_n 1_{T > n} = \lambda\mathbb{P}(X_n^* \geq \lambda) + \mathbb{E} X_n 1_{T > n}.$$
Subtracting $\mathbb{E} X_n 1_{T > n}$ from both sides gives the result.
Lemma (Doob’s $L^p$ inequality). For $p > 1$, we have
$$\|X_n^*\|_p \leq \frac{p}{p-1}\|X_n\|_p$$
for all $n$.
Proof. Let $k > 0$, and consider
$$\|X_n^* \wedge k\|_p^p = \mathbb{E}|X_n^* \wedge k|^p.$$
We use the fact that
$$x^p = \int_0^x p s^{p-1}\,\mathrm{d}s.$$
So we have
$$\|X_n^* \wedge k\|_p^p = \mathbb{E}|X_n^* \wedge k|^p = \mathbb{E}\int_0^{X_n^* \wedge k} p x^{p-1}\,\mathrm{d}x = \mathbb{E}\int_0^k p x^{p-1} 1_{X_n^* \geq x}\,\mathrm{d}x$$
$$= \int_0^k p x^{p-1}\,\mathbb{P}(X_n^* \geq x)\,\mathrm{d}x \quad \text{(Fubini)}$$
$$\leq \int_0^k p x^{p-2}\,\mathbb{E} X_n 1_{X_n^* \geq x}\,\mathrm{d}x \quad \text{(maximal inequality)}$$
$$= \mathbb{E}\, X_n \int_0^k p x^{p-2} 1_{X_n^* \geq x}\,\mathrm{d}x \quad \text{(Fubini)}$$
$$= \frac{p}{p-1}\,\mathbb{E}\, X_n (X_n^* \wedge k)^{p-1}$$
$$\leq \frac{p}{p-1}\|X_n\|_p \left(\mathbb{E}(X_n^* \wedge k)^p\right)^{\frac{p-1}{p}} \quad \text{(Hölder)}$$
$$= \frac{p}{p-1}\|X_n\|_p\, \|X_n^* \wedge k\|_p^{p-1}.$$
Now divide by $\|X_n^* \wedge k\|_p^{p-1}$ and take the limit $k \to \infty$.
Theorem ($L^p$ martingale convergence theorem). Let $(X_n)_{n \geq 0}$ be a martingale,
and $p > 1$. Then the following are equivalent:
(i) $(X_n)_{n \geq 0}$ is bounded in $L^p$, i.e. $\sup_n \mathbb{E}|X_n|^p < \infty$.
(ii) $(X_n)_{n \geq 0}$ converges as $n \to \infty$ to a random variable $X_\infty \in L^p$ almost surely
and in $L^p$.
(iii) There exists a random variable $Z \in L^p$ such that
$$X_n = \mathbb{E}(Z \mid \mathcal{F}_n).$$
Moreover, in (iii), we always have $X_\infty = \mathbb{E}(Z \mid \mathcal{F}_\infty)$.
This gives a bijection between martingales bounded in $L^p$ and $L^p(\mathcal{F}_\infty)$,
sending $(X_n)_{n \geq 0} \mapsto X_\infty$.
Proof.
(i) $\Rightarrow$ (ii): If $(X_n)_{n \geq 0}$ is bounded in $L^p$, then it is bounded in $L^1$. So by the martingale convergence theorem, we know $(X_n)_{n \geq 0}$ converges almost surely to $X_\infty$. By Fatou's lemma, we have $X_\infty \in L^p$.

Now by monotone convergence, we have
$$\|X^*\|_p = \lim_n \|X_n^*\|_p \leq \frac{p}{p-1}\sup_n \|X_n\|_p < \infty.$$
By the triangle inequality, we have
$$|X_n - X_\infty| \leq 2X^* \quad \text{a.s.}$$
So by dominated convergence, we know that $X_n \to X_\infty$ in $L^p$.

(ii) $\Rightarrow$ (iii): Take $Z = X_\infty$. We want to prove that
$$X_m = \mathbb{E}(X_\infty \mid \mathcal{F}_m).$$
To do so, we show that $\|X_m - \mathbb{E}(X_\infty \mid \mathcal{F}_m)\|_p = 0$. For $n \geq m$, we know this is equal to
$$\|\mathbb{E}(X_n \mid \mathcal{F}_m) - \mathbb{E}(X_\infty \mid \mathcal{F}_m)\|_p = \|\mathbb{E}(X_n - X_\infty \mid \mathcal{F}_m)\|_p \leq \|X_n - X_\infty\|_p \to 0$$
as $n \to \infty$, where the last step uses Jensen's inequality. But it is also a constant. So we are done.

(iii) $\Rightarrow$ (i): Since conditional expectation decreases $L^p$ norms, we already know that $(X_n)_{n \geq 0}$ is $L^p$-bounded.

To show the "moreover" part, note that $\bigcup_{n \geq 0} \mathcal{F}_n$ is a $\pi$-system that generates $\mathcal{F}_\infty$. So it is enough to prove that
$$\mathbb{E}X_\infty\mathbf{1}_A = \mathbb{E}(\mathbb{E}(Z \mid \mathcal{F}_\infty)\mathbf{1}_A).$$
But if $A \in \mathcal{F}_N$, then
$$\mathbb{E}X_\infty\mathbf{1}_A = \lim_{n\to\infty}\mathbb{E}X_n\mathbf{1}_A = \lim_{n\to\infty}\mathbb{E}(\mathbb{E}(Z \mid \mathcal{F}_n)\mathbf{1}_A) = \lim_{n\to\infty}\mathbb{E}(\mathbb{E}(Z \mid \mathcal{F}_\infty)\mathbf{1}_A),$$
where the last step relies on the fact that $\mathbf{1}_A$ is $\mathcal{F}_n$-measurable.

We finally finish off the $p = 1$ case with the additional uniform integrability condition.
Theorem (Convergence in $L^1$). Let $(X_n)_{n \geq 0}$ be a martingale. Then the following are equivalent:
(i) $(X_n)_{n \geq 0}$ is uniformly integrable.
(ii) $(X_n)_{n \geq 0}$ converges almost surely and in $L^1$.
(iii) There exists $Z \in L^1$ such that $X_n = \mathbb{E}(Z \mid \mathcal{F}_n)$ almost surely.
Moreover, $X_\infty = \mathbb{E}(Z \mid \mathcal{F}_\infty)$.

The proof is very similar to the $L^p$ case.
Proof.
(i) $\Rightarrow$ (ii): Let $(X_n)_{n \geq 0}$ be uniformly integrable. Then $(X_n)_{n \geq 0}$ is bounded in $L^1$. So $(X_n)_{n \geq 0}$ converges to $X_\infty$ almost surely. Then by measure theory, uniform integrability implies that in fact $X_n \to X_\infty$ in $L^1$.
(ii) $\Rightarrow$ (iii): Same as the $L^p$ case.
(iii) $\Rightarrow$ (i): For any $Z \in L^1$, the collection $\mathbb{E}(Z \mid \mathcal{G})$ ranging over all $\sigma$-subalgebras $\mathcal{G}$ is uniformly integrable.

Thus, there is a bijection between uniformly integrable martingales and $L^1(\mathcal{F}_\infty)$.
We now revisit optional stopping for uniformly integrable martingales. Recall
that in the statement of optional stopping, we needed our stopping times to be
bounded. It turns out if we require our martingales to be uniformly integrable,
then we can drop this requirement.
Theorem. If $(X_n)_{n \geq 0}$ is a uniformly integrable martingale, and $S, T$ are arbitrary stopping times, then $\mathbb{E}(X_T \mid \mathcal{F}_S) = X_{S \wedge T}$. In particular, $\mathbb{E}X_T = \mathbb{E}X_0$.

Note that we are now allowing arbitrary stopping times, so $T$ may be infinite with non-zero probability. Hence we define
$$X_T = \sum_{n=0}^\infty X_n\mathbf{1}_{T = n} + X_\infty\mathbf{1}_{T = \infty}.$$
Proof. By optional stopping, for every $n$, we know that
$$\mathbb{E}(X_{T \wedge n} \mid \mathcal{F}_S) = X_{S \wedge T \wedge n}.$$
We want to be able to take the limit as $n \to \infty$. To do so, we need to show that things are uniformly integrable. First, we apply optional stopping to write $X_{T \wedge n}$ as
$$X_{T \wedge n} = \mathbb{E}(X_n \mid \mathcal{F}_{T \wedge n}) = \mathbb{E}(\mathbb{E}(X_\infty \mid \mathcal{F}_n) \mid \mathcal{F}_{T \wedge n}) = \mathbb{E}(X_\infty \mid \mathcal{F}_{T \wedge n}).$$
So we know $(X_{T \wedge n})_{n \geq 0}$ is uniformly integrable, and hence $X_{T \wedge n} \to X_T$ almost surely and in $L^1$.
To understand $\mathbb{E}(X_{T \wedge n} \mid \mathcal{F}_S)$, we note that
$$\|\mathbb{E}(X_{T \wedge n} - X_T \mid \mathcal{F}_S)\|_1 \leq \|X_{T \wedge n} - X_T\|_1 \to 0 \quad \text{as } n \to \infty.$$
So it follows that $\mathbb{E}(X_{T \wedge n} \mid \mathcal{F}_S) \to \mathbb{E}(X_T \mid \mathcal{F}_S)$ as $n \to \infty$.
2.4 Applications of martingales
Having developed the theory, let us move on to some applications. Before we do
that, we need the notion of a backwards martingale.
Definition (Backwards filtration). A backwards filtration on a measurable space $(E, \mathcal{E})$ is a sequence of $\sigma$-algebras $\hat{\mathcal{F}}_n \subseteq \mathcal{E}$ such that $\hat{\mathcal{F}}_{n+1} \subseteq \hat{\mathcal{F}}_n$. We define
$$\hat{\mathcal{F}}_\infty = \bigcap_{n \geq 0}\hat{\mathcal{F}}_n.$$

Theorem. Let $Y \in L^1$, and let $\hat{\mathcal{F}}_n$ be a backwards filtration. Then
$$\mathbb{E}(Y \mid \hat{\mathcal{F}}_n) \to \mathbb{E}(Y \mid \hat{\mathcal{F}}_\infty)$$
almost surely and in $L^1$.

A process of this form is known as a backwards martingale.
Proof. We first show that $\mathbb{E}(Y \mid \hat{\mathcal{F}}_n)$ converges. We then show that what it converges to is indeed $\mathbb{E}(Y \mid \hat{\mathcal{F}}_\infty)$.
We write
$$X_n = \mathbb{E}(Y \mid \hat{\mathcal{F}}_n).$$
Observe that for all $n \geq 0$, the process $(X_{n-k})_{0 \leq k \leq n}$ is a martingale by the tower property, and so is $(-X_{n-k})_{0 \leq k \leq n}$. Now notice that for all $a < b$, the number of upcrossings of $[a, b]$ by $(X_k)_{0 \leq k \leq n}$ is equal to the number of upcrossings of $[-b, -a]$ by $(-X_{n-k})_{0 \leq k \leq n}$.
Using the same arguments as for martingales, we conclude that $X_n \to X_\infty$ almost surely and in $L^1$ for some $X_\infty$.

To see that $X_\infty = \mathbb{E}(Y \mid \hat{\mathcal{F}}_\infty)$, we notice that $X_\infty$ is $\hat{\mathcal{F}}_\infty$-measurable. So it is enough to prove that
$$\mathbb{E}X_\infty\mathbf{1}_A = \mathbb{E}(\mathbb{E}(Y \mid \hat{\mathcal{F}}_\infty)\mathbf{1}_A)$$
for all $A \in \hat{\mathcal{F}}_\infty$. Indeed, we have
$$\begin{aligned}
\mathbb{E}X_\infty\mathbf{1}_A &= \lim_{n\to\infty}\mathbb{E}X_n\mathbf{1}_A\\
&= \lim_{n\to\infty}\mathbb{E}(\mathbb{E}(Y \mid \hat{\mathcal{F}}_n)\mathbf{1}_A)\\
&= \lim_{n\to\infty}\mathbb{E}(Y\mathbf{1}_A)\\
&= \mathbb{E}(Y\mathbf{1}_A)\\
&= \mathbb{E}(\mathbb{E}(Y \mid \hat{\mathcal{F}}_\infty)\mathbf{1}_A),
\end{aligned}$$
where we use that $\mathbf{1}_A$ is measurable with respect to both $\hat{\mathcal{F}}_n$ and $\hat{\mathcal{F}}_\infty$.
Theorem (Kolmogorov 0-1 law). Let $(X_n)_{n \geq 0}$ be independent random variables, and let
$$\hat{\mathcal{F}}_n = \sigma(X_{n+1}, X_{n+2}, \ldots).$$
Then the tail $\sigma$-algebra $\hat{\mathcal{F}}_\infty$ is trivial, i.e. $\mathbb{P}(A) \in \{0, 1\}$ for all $A \in \hat{\mathcal{F}}_\infty$.

Proof. Let $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$. Then $\mathcal{F}_n$ and $\hat{\mathcal{F}}_n$ are independent. Then for all $A \in \hat{\mathcal{F}}_\infty$, we have
$$\mathbb{E}(\mathbf{1}_A \mid \mathcal{F}_n) = \mathbb{P}(A).$$
But the LHS is a martingale. So it converges almost surely and in $L^1$ to $\mathbb{E}(\mathbf{1}_A \mid \mathcal{F}_\infty)$. But $\mathbf{1}_A$ is $\mathcal{F}_\infty$-measurable, since $\hat{\mathcal{F}}_\infty \subseteq \mathcal{F}_\infty$. So this is just $\mathbf{1}_A$. So $\mathbf{1}_A = \mathbb{P}(A)$ almost surely, and we are done.
Theorem (Strong law of large numbers). Let $(X_n)_{n \geq 1}$ be iid random variables in $L^1$, with $\mathbb{E}X_1 = \mu$. Define
$$S_n = \sum_{i=1}^n X_i.$$
Then
$$\frac{S_n}{n} \to \mu \text{ as } n \to \infty$$
almost surely and in $L^1$.
Proof. We have
$$S_n = \mathbb{E}(S_n \mid S_n) = \sum_{i=1}^n \mathbb{E}(X_i \mid S_n) = n\mathbb{E}(X_1 \mid S_n).$$
So the problem is equivalent to showing that $\mathbb{E}(X_1 \mid S_n) \to \mu$ as $n \to \infty$. This seems like something we can tackle with our existing technology, except that the $S_n$ do not form a filtration.
Thus, define a backwards filtration
$$\hat{\mathcal{F}}_n = \sigma(S_n, S_{n+1}, S_{n+2}, \ldots) = \sigma(S_n, X_{n+1}, X_{n+2}, \ldots) = \sigma(S_n, \tau_n),$$
where $\tau_n = \sigma(X_{n+1}, X_{n+2}, \ldots)$. We now use a property of conditional expectation that we've not used so far: adding independent information to a conditional expectation doesn't change the result. Since $\tau_n$ is independent of $\sigma(X_1, S_n)$, we know
$$\frac{S_n}{n} = \mathbb{E}(X_1 \mid S_n) = \mathbb{E}(X_1 \mid \hat{\mathcal{F}}_n).$$
Thus, by backwards martingale convergence, we know
$$\frac{S_n}{n} \to \mathbb{E}(X_1 \mid \hat{\mathcal{F}}_\infty).$$
But by the Kolmogorov 0-1 law, we know $\hat{\mathcal{F}}_\infty$ is trivial. So $\mathbb{E}(X_1 \mid \hat{\mathcal{F}}_\infty)$ is almost surely constant, and the constant has to be $\mathbb{E}(\mathbb{E}(X_1 \mid \hat{\mathcal{F}}_\infty)) = \mathbb{E}(X_1) = \mu$.
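The strong law is easy to watch numerically (an illustrative sketch with a fixed seed, not part of the notes):

```python
import random

def running_average(n, seed=0):
    """Sample n iid Uniform(0, 1) variables and return S_n / n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += rng.random()
    return total / n

# S_n / n should approach the mean mu = 1/2 as n grows.
avg = running_average(100_000)
```

With $n = 10^5$ uniform samples, the empirical average sits within a fraction of a percent of $\mu = \tfrac{1}{2}$.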
Recall that if $(E, \mathcal{E}, \mu)$ is a measure space and $f \in m\mathcal{E}^+$, then
$$\nu(A) = \mu(f\mathbf{1}_A)$$
is a measure on $\mathcal{E}$. We say $f$ is a density of $\nu$ with respect to $\mu$.
We can ask an "inverse" question: given two different measures on $\mathcal{E}$, when is it the case that one is given by a density with respect to the other?
A first observation is that if $\nu(A) = \mu(f\mathbf{1}_A)$, then whenever $\mu(A) = 0$, we must have $\nu(A) = 0$. However, this is not sufficient. For example, let $\mu$ be the counting measure on $\mathbb{R}$, and $\nu$ the Lebesgue measure. Then our condition is satisfied. However, if $\nu$ were given by a density $f$ with respect to $\mu$, we must have
$$0 = \nu(\{x\}) = \mu(f\mathbf{1}_{\{x\}}) = f(x).$$
So $f \equiv 0$, but taking $f \equiv 0$ clearly doesn't give the Lebesgue measure.
The problem with this is that $\mu$ is not a $\sigma$-finite measure.
Theorem (Radon–Nikodym). Let $(\Omega, \mathcal{F})$ be a measurable space, and $\mathbb{Q}$ and $\mathbb{P}$ two probability measures on $(\Omega, \mathcal{F})$. Then the following are equivalent:
(i) $\mathbb{Q}$ is absolutely continuous with respect to $\mathbb{P}$, i.e. for any $A \in \mathcal{F}$, if $\mathbb{P}(A) = 0$, then $\mathbb{Q}(A) = 0$.
(ii) For any $\varepsilon > 0$, there exists $\delta > 0$ such that for all $A \in \mathcal{F}$, if $\mathbb{P}(A) \leq \delta$, then $\mathbb{Q}(A) \leq \varepsilon$.
(iii) There exists a random variable $X \geq 0$ such that
$$\mathbb{Q}(A) = \mathbb{E}_{\mathbb{P}}(X\mathbf{1}_A).$$
In this case, $X$ is called the Radon–Nikodym derivative of $\mathbb{Q}$ with respect to $\mathbb{P}$, and we write $X = \frac{\mathrm{d}\mathbb{Q}}{\mathrm{d}\mathbb{P}}$.

Note that this theorem works for all finite measures by scaling, and thus for $\sigma$-finite measures by partitioning into sets of finite measure.
Proof. We shall only treat the case where $\mathcal{F}$ is countably generated, i.e. $\mathcal{F} = \sigma(F_1, F_2, \ldots)$ for some sets $F_i$. For example, the Borel $\sigma$-algebra of any second-countable topological space is countably generated.
(iii) $\Rightarrow$ (i): Clear.
(ii) $\Rightarrow$ (iii): Define the filtration
$$\mathcal{F}_n = \sigma(F_1, F_2, \ldots, F_n).$$
Since $\mathcal{F}_n$ is finite, we can write it as
$$\mathcal{F}_n = \sigma(A_{n,1}, \ldots, A_{n,m_n}),$$
where each $A_{n,i}$ is an atom, i.e. if $B \subsetneq A_{n,i}$ and $B \in \mathcal{F}_n$, then $B = \emptyset$. We define
$$X_n = \sum_{i=1}^{m_n}\frac{\mathbb{Q}(A_{n,i})}{\mathbb{P}(A_{n,i})}\mathbf{1}_{A_{n,i}},$$
where we skip over the terms where $\mathbb{P}(A_{n,i}) = 0$. Note that this is exactly designed so that for any $A \in \mathcal{F}_n$, we have
$$\mathbb{E}_{\mathbb{P}}(X_n\mathbf{1}_A) = \mathbb{E}_{\mathbb{P}}\Bigg(\sum_{A_{n,i} \subseteq A}\frac{\mathbb{Q}(A_{n,i})}{\mathbb{P}(A_{n,i})}\mathbf{1}_{A_{n,i}}\Bigg) = \mathbb{Q}(A).$$
Thus, if $A \in \mathcal{F}_n \subseteq \mathcal{F}_{n+1}$, we have
$$\mathbb{E}X_{n+1}\mathbf{1}_A = \mathbb{Q}(A) = \mathbb{E}X_n\mathbf{1}_A.$$
So we know that
$$\mathbb{E}(X_{n+1} \mid \mathcal{F}_n) = X_n.$$
It is also immediate that $(X_n)_{n \geq 0}$ is adapted. So it is a martingale.
We next show that $(X_n)_{n \geq 0}$ is uniformly integrable. By Markov's inequality, we have
$$\mathbb{P}(X_n \geq \lambda) \leq \frac{\mathbb{E}X_n}{\lambda} = \frac{1}{\lambda} \leq \delta$$
for $\lambda$ large enough. Then
$$\mathbb{E}(X_n\mathbf{1}_{X_n \geq \lambda}) = \mathbb{Q}(X_n \geq \lambda) \leq \varepsilon.$$
So we have shown uniform integrability, and so we know $X_n \to X$ almost surely and in $L^1$ for some $X$. Then for all $A \in \bigcup_{n \geq 0}\mathcal{F}_n$, we have
$$\mathbb{Q}(A) = \lim_{n\to\infty}\mathbb{E}X_n\mathbf{1}_A = \mathbb{E}X\mathbf{1}_A.$$
So $\mathbb{Q}(\,\cdot\,)$ and $\mathbb{E}X\mathbf{1}_{(\cdot)}$ agree on $\bigcup_{n \geq 0}\mathcal{F}_n$, which is a generating $\pi$-system for $\mathcal{F}$, so they must be the same.
(i) $\Rightarrow$ (ii): Suppose not. Then there exists some $\varepsilon > 0$ and some $A_1, A_2, \ldots \in \mathcal{F}$ such that
$$\mathbb{Q}(A_n) \geq \varepsilon, \quad \mathbb{P}(A_n) \leq \frac{1}{2^n}.$$
Since $\sum_n \mathbb{P}(A_n)$ is finite, by Borel–Cantelli, we know
$$\mathbb{P}(\limsup A_n) = 0.$$
On the other hand, by, say, dominated convergence, we have
$$\mathbb{Q}(\limsup A_n) = \mathbb{Q}\left(\bigcap_{n=1}^\infty\bigcup_{m=n}^\infty A_m\right) = \lim_{k\to\infty}\mathbb{Q}\left(\bigcap_{n=1}^k\bigcup_{m=n}^\infty A_m\right) \geq \lim_{k\to\infty}\mathbb{Q}\left(\bigcup_{m=k}^\infty A_m\right) \geq \varepsilon.$$
This is a contradiction.
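The martingale in the proof can be mirrored on a finite space, where $X_n$ is literally the vector of ratios $\mathbb{Q}(A)/\mathbb{P}(A)$ over the atoms of $\mathcal{F}_n$. A small sketch (not part of the notes; the particular partitions are chosen purely for illustration):

```python
def rn_on_partition(P, Q, partition):
    """Given probability vectors P, Q on a finite set and a partition of
    the indices into atoms, return the function X that is constant
    Q(A)/P(A) on each atom A (skipping atoms with P(A) = 0)."""
    X = [0.0] * len(P)
    for atom in partition:
        pA = sum(P[i] for i in atom)
        qA = sum(Q[i] for i in atom)
        if pA > 0:
            for i in atom:
                X[i] = qA / pA
    return X

# Omega = {0, 1, 2, 3}, P uniform, Q non-uniform.
P = [0.25, 0.25, 0.25, 0.25]
Q = [0.1, 0.2, 0.3, 0.4]
coarse = [[0, 1], [2, 3]]        # atoms of a coarse sigma-algebra F_1
fine = [[0], [1], [2], [3]]      # atoms of F_2 (full information)
X1 = rn_on_partition(P, Q, coarse)
X2 = rn_on_partition(P, Q, fine)
# By design E_P[X_n 1_A] = Q(A) for A in F_n, and with full information
# X2 recovers the pointwise density dQ/dP = Q_i / P_i.
```

As in the proof, refining the partition refines the approximation, and the fully refined $X_2$ is the exact Radon–Nikodym derivative.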
Finally, we end the part on discrete time processes by relating what we have
done to Markov chains.
Let's first recall what Markov chains are. Let $E$ be a countable space, and $\mu$ a measure on $E$. We write $\mu_x = \mu(\{x\})$, and then $\mu(f) = \mu \cdot f$.
Definition (Transition matrix). A transition matrix is a matrix $P = (p_{xy})_{x,y \in E}$ such that each $p_x = (p_{xy})_{y \in E}$ is a probability measure on $E$.

Definition (Markov chain). An adapted process $(X_n)$ is called a Markov chain if for any $n$ and any $A \in \mathcal{F}_n$ with $A \subseteq \{X_n = x\}$ and $\mathbb{P}(A) > 0$, we have
$$\mathbb{P}(X_{n+1} = y \mid A) = p_{xy}.$$

Definition (Harmonic function). A function $f : E \to \mathbb{R}$ is harmonic if $Pf = f$. In other words, for any $x$, we have
$$\sum_y p_{xy}f(y) = f(x).$$

We then observe that

Proposition. If $f$ is harmonic and bounded, and $(X_n)_{n \geq 0}$ is Markov, then $(f(X_n))_{n \geq 0}$ is a martingale.
Example. Let $(X_n)_{n \geq 0}$ be iid $\mathbb{Z}$-valued random variables in $L^1$ with $\mathbb{E}[X_i] = 0$. Then
$$S_n = X_0 + \cdots + X_n$$
is a martingale and a Markov chain.
However, if $Z$ is a $\mathbb{Z}$-valued random variable, consider the process $(ZS_n)_{n \geq 0}$ with the filtration $\mathcal{G}_n = \sigma(\mathcal{F}_n, Z)$. Then this is a martingale but not a Markov chain.
3 Continuous time stochastic processes
In the remainder of the course, we shall study continuous time processes. When
doing so, we have to be rather careful, since our processes are indexed by
an uncountable set, when measure theory tends to only like countable things.
Ultimately, we would like to study Brownian motion, but we first develop some
general theory of continuous time processes.
Definition (Continuous time stochastic process). A continuous time stochastic process is a family of random variables $(X_t)_{t \geq 0}$ (or $(X_t)_{t \in [a,b]}$).

In the discrete case, if $T$ is a random variable taking values in $\{0, 1, 2, \ldots\}$, then it makes sense to look at the new random variable $X_T$, since this is just
$$X_T = \sum_{n=0}^\infty X_n\mathbf{1}_{T = n}.$$
This is obviously measurable, since it is a limit of measurable functions.
However, this is not necessarily the case if we have continuous time, unless we assume some regularity conditions on our process. In some sense, we want $X_t$ to depend "continuously" or at least "measurably" on $t$.
To make sense of $X_T$, it would be enough to require that the map
$$\varphi : (\omega, t) \mapsto X_t(\omega)$$
is measurable when we put the product $\sigma$-algebra on the domain. In this case, $X_T(\omega) = \varphi(\omega, T(\omega))$ is measurable. In this formulation, we see why we didn't have this problem with discrete time: the $\sigma$-algebra on $\mathbb{N}$ is just $\mathcal{P}(\mathbb{N})$, and so all sets are measurable. This is not true for $\mathcal{B}([0, \infty))$.
However, being able to talk about $X_T$ is not the only thing we want. Often, the following definitions are useful:

Definition (Cadlag function). We say a function $x : [0, \infty) \to \mathbb{R}$ is cadlag if for all $t$,
$$\lim_{s \to t^+} x_s = x_t, \quad \lim_{s \to t^-} x_s \text{ exists}.$$

The name cadlag (or càdlàg) comes from the French term continue à droite, limite à gauche, meaning "right-continuous with left limits".

Definition (Continuous/Cadlag stochastic process). We say a stochastic process is continuous (resp. cadlag) if for any $\omega \in \Omega$, the map $t \mapsto X_t(\omega)$ is continuous (resp. cadlag).

Notation. We write $C([0, \infty), \mathbb{R})$ for the space of all continuous functions $[0, \infty) \to \mathbb{R}$, and $D([0, \infty), \mathbb{R})$ the space of all cadlag functions.
We endow these spaces with the $\sigma$-algebra generated by the coordinate functions
$$(x_t)_{t \geq 0} \mapsto x_s.$$
Then a continuous (or cadlag) process is a random variable taking values in $C([0, \infty), \mathbb{R})$ (or $D([0, \infty), \mathbb{R})$).
Definition (Finite-dimensional distribution). A finite-dimensional distribution of $(X_t)_{t \geq 0}$ is a measure on $\mathbb{R}^n$ of the form
$$\mu_{t_1,\ldots,t_n}(A) = \mathbb{P}((X_{t_1}, \ldots, X_{t_n}) \in A)$$
for all $A \in \mathcal{B}(\mathbb{R}^n)$, for some $0 \leq t_1 < t_2 < \cdots < t_n$.

The important observation is that if we know all finite-dimensional distributions, then we know the law of $X$, since the cylinder sets form a $\pi$-system generating the $\sigma$-algebra.
If we know, a priori, that $(X_t)_{t \geq 0}$ is a continuous process, then for any dense set $I \subseteq [0, \infty)$, knowing $(X_t)_{t \geq 0}$ is the same as knowing $(X_t)_{t \in I}$. Conversely, if we are given some random variables $(X_t)_{t \in I}$, can we extend this to a continuous process $(X_t)_{t \geq 0}$? The answer is, of course, "not always", but it turns out we can if we assume some Hölder conditions.
Theorem (Kolmogorov's criterion). Let $(\rho_t)_{t \in I}$ be random variables, where $I \subseteq [0, 1]$ is dense. Assume that for some $p > 1$ and $\beta > \frac{1}{p}$, we have
$$\|\rho_t - \rho_s\|_p \leq C|t - s|^\beta \quad \text{for all } t, s \in I. \tag{$*$}$$
Then there exists a continuous process $(X_t)_{t \in [0,1]}$ such that for all $t \in I$,
$$X_t = \rho_t \text{ almost surely},$$
and moreover for any $\alpha \in [0, \beta - \frac{1}{p})$, there exists a random variable $K_\alpha \in L^p$ such that
$$|X_s - X_t| \leq K_\alpha|s - t|^\alpha \quad \text{for all } s, t \in [0, 1].$$

Before we begin, we make the following definition:

Definition (Dyadic numbers). We define
$$D_n = \left\{s \in [0, 1] : s = \frac{k}{2^n}\text{ for some }k \in \mathbb{Z}\right\}, \quad D = \bigcup_{n \geq 0}D_n.$$

Observe that $D \subseteq [0, 1]$ is a dense subset. Topologically, this is just like any other dense subset. However, it is convenient to use $D$ instead of an arbitrary subset when writing down formulas.
Proof. First note that we may assume $D \subseteq I$. Indeed, for $t \in D$, we can define $\rho_t$ by taking the limit of $\rho_s$ in $L^p$, since $L^p$ is complete. The equation $(*)$ is preserved by limits, so we may work on $I \cup D$ instead.
By assumption, $(\rho_t)_{t \in I}$ is Hölder in $L^p$. We claim that it is almost surely pointwise Hölder.

Claim. There exists a random variable $K_\alpha \in L^p$ such that
$$|\rho_s - \rho_t| \leq K_\alpha|s - t|^\alpha \quad \text{for all } s, t \in D.$$
Moreover, $K_\alpha$ is increasing in $\alpha$.

Given the claim, we can simply set
$$X_t(\omega) = \begin{cases}\lim_{q \to t,\, q \in D}\rho_q(\omega) & K_\alpha < \infty\text{ for all }\alpha \in [0, \beta - \frac{1}{p})\\ 0 & \text{otherwise}\end{cases}.$$
Then this is a continuous process, and satisfies the desired properties.
To construct such a $K_\alpha$, observe that given any $s, t \in D$, we can pick $m \geq 0$ such that
$$2^{-(m+1)} < t - s \leq 2^{-m}.$$
Then we can pick $u = \frac{k}{2^{m+1}}$ such that $s < u < t$. Thus, we have
$$u - s < 2^{-m}, \quad t - u < 2^{-m}.$$
Therefore, by binary expansion, we can write
$$u - s = \sum_{i \geq m+1}\frac{x_i}{2^i}, \quad t - u = \sum_{i \geq m+1}\frac{y_i}{2^i}$$
for some $x_i, y_i \in \{0, 1\}$. Thus, writing
$$K_n = \sup_{t \in D_n}|\rho_{t + 2^{-n}} - \rho_t|,$$
we can bound
$$|\rho_s - \rho_t| \leq 2\sum_{n=m+1}^\infty K_n,$$
and thus
$$\frac{|\rho_s - \rho_t|}{|s - t|^\alpha} \leq 2\sum_{n=m+1}^\infty 2^{(m+1)\alpha}K_n \leq 2\sum_{n=m+1}^\infty 2^{(n+1)\alpha}K_n.$$
Thus, we can define
$$K_\alpha = 2\sum_{n \geq 0}2^{(n+1)\alpha}K_n.$$
We only have to check that this is in $L^p$, and this is not hard. We first get
$$\mathbb{E}K_n^p \leq \sum_{t \in D_n}\mathbb{E}|\rho_{t+2^{-n}} - \rho_t|^p \leq 2^n \cdot C^p 2^{-n\beta p} = C^p 2^{n(1 - \beta p)},$$
so that $\|K_n\|_p \leq C 2^{n(\frac{1}{p} - \beta)}$. Then we have
$$\|K_\alpha\|_p \leq 2\sum_{n \geq 0}2^{(n+1)\alpha}\|K_n\|_p \leq 2^{1+\alpha}C\sum_{n \geq 0}2^{n(\alpha + \frac{1}{p} - \beta)} < \infty,$$
since $\alpha + \frac{1}{p} - \beta < 0$.
We will later use this to construct Brownian motion. For now, we shall
develop what we know about discrete time processes for continuous time ones.
Fortunately, a lot of the proofs are either the same as the discrete time ones, or
can be reduced to the discrete time version. So not much work has to be done!
Definition (Continuous time filtration). A continuous-time filtration is a family of $\sigma$-algebras $(\mathcal{F}_t)_{t \geq 0}$ such that $\mathcal{F}_s \subseteq \mathcal{F}_t \subseteq \mathcal{F}$ if $s \leq t$. Define $\mathcal{F}_\infty = \sigma(\mathcal{F}_t : t \geq 0)$.

Definition (Stopping time). A random variable $T : \Omega \to [0, \infty]$ is a stopping time if $\{T \leq t\} \in \mathcal{F}_t$ for all $t \geq 0$.
Proposition. Let $(X_t)_{t \geq 0}$ be a cadlag adapted process and $S, T$ stopping times. Then
(i) $S \wedge T$ is a stopping time.
(ii) If $S \leq T$, then $\mathcal{F}_S \subseteq \mathcal{F}_T$.
(iii) $X_T\mathbf{1}_{T < \infty}$ is $\mathcal{F}_T$-measurable.
(iv) $(X_t^T)_{t \geq 0} = (X_{T \wedge t})_{t \geq 0}$ is adapted.

We only prove (iii). The first two are the same as the discrete case, and the proof of (iv) is similar to that of (iii).
To prove this, we need a quick lemma, whose proof is a simple exercise.

Lemma. A random variable $Z$ is $\mathcal{F}_T$-measurable iff $Z\mathbf{1}_{\{T \leq t\}}$ is $\mathcal{F}_t$-measurable for all $t \geq 0$.
Proof of (iii) of proposition. We need to prove that $X_T\mathbf{1}_{\{T \leq t\}}$ is $\mathcal{F}_t$-measurable for all $t \geq 0$.
We write
$$X_T\mathbf{1}_{T \leq t} = X_T\mathbf{1}_{T < t} + X_t\mathbf{1}_{T = t}.$$
We know the second term is measurable. So it suffices to show that $X_T\mathbf{1}_{T < t}$ is $\mathcal{F}_t$-measurable.
Define $T_n = 2^{-n}\lceil 2^n T\rceil$. This is a stopping time, since we always have $T_n \geq T$.
Since $(X_t)_{t \geq 0}$ is cadlag, we know
$$X_T\mathbf{1}_{T < t} = \lim_{n\to\infty}X_{T_n \wedge t}\mathbf{1}_{T < t}.$$
Now $T_n \wedge t$ can take only countably (and in fact only finitely) many values, so we can write
$$X_{T_n \wedge t} = \sum_{q \in D_n,\, q < t}X_q\mathbf{1}_{T_n = q} + X_t\mathbf{1}_{T < t \leq T_n},$$
and this is $\mathcal{F}_t$-measurable. So we are done.
In the continuous case, stopping times are a bit more subtle. A natural source of stopping times is given by hitting times.

Definition (Hitting time). Let $A \in \mathcal{B}(\mathbb{R})$. Then the hitting time of $A$ is
$$T_A = \inf\{t \geq 0 : X_t \in A\}.$$

This is not always a stopping time. For example, consider the process $X_t$ such that with probability $\frac{1}{2}$, it is given by $X_t = t$, and with probability $\frac{1}{2}$, it is given by
$$X_t = \begin{cases}t & t \leq 1\\ 2 - t & t > 1\end{cases}.$$
Take $A = (1, \infty)$. Then $T_A = 1$ in the first case, and $T_A = \infty$ in the second case. But $\{T_A \leq 1\} \notin \mathcal{F}_1$, as at time $1$, we don't know if we are going up or down.
The problem is that $A$ is not closed.
Proposition. Let $A \subseteq \mathbb{R}$ be a closed set and $(X_t)_{t \geq 0}$ be continuous. Then $T_A$ is a stopping time.

Proof. Observe that $d(X_q, A)$ is a continuous function in $q$. So we have
$$\{T_A \leq t\} = \left\{\inf_{q \in \mathbb{Q},\, q < t}d(X_q, A) = 0\right\}.$$
Motivated by our previous non-example of a hitting time, we define

Definition (Right-continuous filtration). Given a continuous-time filtration $(\mathcal{F}_t)_{t \geq 0}$, we define
$$\mathcal{F}_t^+ = \bigcap_{s > t}\mathcal{F}_s \supseteq \mathcal{F}_t.$$
We say $(\mathcal{F}_t)_{t \geq 0}$ is right continuous if $\mathcal{F}_t = \mathcal{F}_t^+$.

Often, we want to modify our events by things of measure zero. While this doesn't really affect anything, it could potentially get us out of $\mathcal{F}_t$. It does no harm to enlarge all $\mathcal{F}_t$ to include events of measure zero.

Definition (Usual conditions). Let $\mathcal{N} = \{A \in \mathcal{F}_\infty : \mathbb{P}(A) \in \{0, 1\}\}$. We say that $(\mathcal{F}_t)_{t \geq 0}$ satisfies the usual conditions if it is right continuous and $\mathcal{N} \subseteq \mathcal{F}_0$.
Proposition. Let $(X_t)_{t \geq 0}$ be an adapted process (to $(\mathcal{F}_t)_{t \geq 0}$) that is cadlag, and let $A$ be an open set. Then $T_A$ is a stopping time with respect to $\mathcal{F}_t^+$.

Proof. Since $(X_t)_{t \geq 0}$ is cadlag and $A$ is open, we have
$$\{T_A < t\} = \bigcup_{q < t,\, q \in \mathbb{Q}}\{X_q \in A\} \in \mathcal{F}_t.$$
Then
$$\{T_A \leq t\} = \bigcap_{n \geq 1}\left\{T_A < t + \frac{1}{n}\right\} \in \mathcal{F}_t^+.$$
Definition (Continuous time martingale). An adapted process $(X_t)_{t \geq 0}$ is called a martingale iff
$$\mathbb{E}(X_t \mid \mathcal{F}_s) = X_s$$
for all $t \geq s$, and similarly for super-martingales and sub-martingales.

Note that if $t_1 \leq t_2 \leq \cdots$, then
$$\tilde{X}_n = X_{t_n}$$
is a discrete time martingale. Similarly, if $t_1 \geq t_2 \geq \cdots$, then
$$\hat{X}_n = X_{t_n}$$
defines a discrete time backwards martingale. Using this observation, we can now prove what we already know in the discrete case.
Theorem (Optional stopping theorem). Let $(X_t)_{t \geq 0}$ be an adapted cadlag process in $L^1$. Then the following are equivalent:
(i) For any bounded stopping time $T$ and any stopping time $S$, we have $X_T \in L^1$ and
$$\mathbb{E}(X_T \mid \mathcal{F}_S) = X_{T \wedge S}.$$
(ii) For any stopping time $T$, $(X_t^T)_{t \geq 0} = (X_{T \wedge t})_{t \geq 0}$ is a martingale.
(iii) For any bounded stopping time $T$, $X_T \in L^1$ and $\mathbb{E}X_T = \mathbb{E}X_0$.
Proof. We show that (i) $\Rightarrow$ (ii), and the rest follows from the discrete case similarly.
Since $T$ is bounded, assume $T \leq t$, and we may wlog assume $t \in \mathbb{N}$. Let
$$T_n = 2^{-n}\lceil 2^n T\rceil, \quad S_n = 2^{-n}\lceil 2^n S\rceil.$$
We have $T_n \searrow T$ as $n \to \infty$, and so $X_{T_n} \to X_T$ as $n \to \infty$.
Since $T_n \leq t + 1$, by restricting our sequence to $D_n$, discrete time optional stopping implies
$$\mathbb{E}(X_{t+1} \mid \mathcal{F}_{T_n}) = X_{T_n}.$$
In particular, $(X_{T_n})$ is uniformly integrable. So it converges in $L^1$. This implies $X_T \in L^1$.
To show that $\mathbb{E}(X_T \mid \mathcal{F}_S) = X_{T \wedge S}$, we need to show that for any $A \in \mathcal{F}_S$, we have
$$\mathbb{E}X_T\mathbf{1}_A = \mathbb{E}X_{S \wedge T}\mathbf{1}_A.$$
Since $\mathcal{F}_S \subseteq \mathcal{F}_{S_n}$, we already know that
$$\mathbb{E}X_{T_n}\mathbf{1}_A = \mathbb{E}X_{S_n \wedge T_n}\mathbf{1}_A$$
by discrete time optional stopping, since $\mathbb{E}(X_{T_n} \mid \mathcal{F}_{S_n}) = X_{T_n \wedge S_n}$. So taking the limit $n \to \infty$ gives the desired result.
Theorem. Let $(X_t)_{t \geq 0}$ be a cadlag super-martingale bounded in $L^1$. Then it converges almost surely as $t \to \infty$ to a random variable $X_\infty \in L^1$.

Proof. Define $U_s[a, b, (x_t)_{t \geq 0}]$ to be the number of upcrossings of $[a, b]$ by $(x_t)_{t \geq 0}$ up to time $s$, and
$$U[a, b, (x_t)_{t \geq 0}] = \lim_{s\to\infty}U_s[a, b, (x_t)_{t \geq 0}].$$
Then for all $s \geq 0$, we have
$$U_s[a, b, (x_t)_{t \geq 0}] = \lim_{n\to\infty}U_s[a, b, (x_t)_{t \in D_n}].$$
By monotone convergence and Doob's upcrossing lemma, we have
$$\mathbb{E}U_s[a, b, (X_t)_{t \geq 0}] = \lim_{n\to\infty}\mathbb{E}U_s[a, b, (X_t)_{t \in D_n}] \leq \frac{\mathbb{E}(X_s - a)^-}{b - a} \leq \frac{\mathbb{E}|X_s| + a}{b - a}.$$
We are then done by taking the supremum over $s$, and finishing the argument as in the discrete case.
This shows we have pointwise convergence in $\mathbb{R} \cup \{\pm\infty\}$, and by Fatou's lemma, we know that
$$\mathbb{E}|X_\infty| = \mathbb{E}\liminf_{t_n\to\infty}|X_{t_n}| \leq \liminf_{t_n\to\infty}\mathbb{E}|X_{t_n}| < \infty.$$
So $X_\infty$ is finite almost surely.
We shall now state without proof some results we already know for the
discrete case. The proofs are straightforward generalizations of the discrete
version.
Lemma (Maximal inequality). Let $(X_t)_{t \geq 0}$ be a cadlag martingale or a non-negative sub-martingale. Then for all $t \geq 0$, $\lambda \geq 0$, we have
$$\lambda\mathbb{P}(X_t^* \geq \lambda) \leq \mathbb{E}|X_t|.$$

Lemma (Doob's $L^p$ inequality). Let $(X_t)_{t \geq 0}$ be as above. Then
$$\|X_t^*\|_p \leq \frac{p}{p-1}\|X_t\|_p.$$
Definition (Version). We say a process $(Y_t)_{t \geq 0}$ is a version of $(X_t)_{t \geq 0}$ if for all $t$, $\mathbb{P}(Y_t = X_t) = 1$.

Note that this is not the same as saying $\mathbb{P}(\forall t : Y_t = X_t) = 1$.

Example. Take $X_t \equiv 0$ for all $t$, and let $U$ be a uniform random variable on $[0, 1]$. Define
$$Y_t = \begin{cases}1 & t = U\\ 0 & \text{otherwise}\end{cases}.$$
Then for all $t$, we have $X_t = Y_t$ almost surely. So $(Y_t)$ is a version of $(X_t)$. However, $X_t$ is continuous but $Y_t$ is not.
Theorem (Regularization of martingales). Let $(X_t)_{t \geq 0}$ be a martingale with respect to $(\mathcal{F}_t)$, and suppose $\mathcal{F}_t$ satisfies the usual conditions. Then there exists a version $(\tilde{X}_t)$ of $(X_t)$ which is cadlag.

Proof. For all $M > 0$, define
$$\Omega_0^M = \left\{\sup_{q \in D \cap [0,M]}|X_q| < \infty\right\} \cap \bigcap_{a < b \in \mathbb{Q}}\left\{U_M[a, b, (X_t)_{t \in D \cap [0,M]}] < \infty\right\}.$$
Then we see that $\mathbb{P}(\Omega_0^M) = 1$ by Doob's upcrossing lemma. Now define
$$\tilde{X}_t = \lim_{s \to t,\, s > t,\, s \in D}X_s\,\mathbf{1}_{\Omega_0^t}.$$
Then this is $\mathcal{F}_t$-measurable because $\mathcal{F}_t$ satisfies the usual conditions.
Take a sequence $t_n \searrow t$. Then $(X_{t_n})$ is a backwards martingale. So it converges almost surely and in $L^1$ to $\tilde{X}_t$. But we can write
$$X_t = \mathbb{E}(X_{t_n} \mid \mathcal{F}_t).$$
Since $X_{t_n} \to \tilde{X}_t$ in $L^1$, and $\tilde{X}_t$ is $\mathcal{F}_t$-measurable, we know $X_t = \tilde{X}_t$ almost surely.
The fact that it is cadlag is an exercise.
Theorem ($L^p$ convergence of martingales). Let $(X_t)_{t \geq 0}$ be a cadlag martingale and $p > 1$. Then the following are equivalent:
(i) $(X_t)_{t \geq 0}$ is bounded in $L^p$.
(ii) $(X_t)_{t \geq 0}$ converges almost surely and in $L^p$.
(iii) There exists $Z \in L^p$ such that $X_t = \mathbb{E}(Z \mid \mathcal{F}_t)$ almost surely.
Theorem ($L^1$ convergence of martingales). Let $(X_t)_{t \geq 0}$ be a cadlag martingale. Then the following are equivalent:
(i) $(X_t)_{t \geq 0}$ is uniformly integrable.
(ii) $(X_t)_{t \geq 0}$ converges almost surely and in $L^1$ to $X_\infty$.
(iii) There exists $Z \in L^1$ such that $\mathbb{E}(Z \mid \mathcal{F}_t) = X_t$ almost surely.

Theorem (Optional stopping theorem). Let $(X_t)_{t \geq 0}$ be a uniformly integrable martingale, and let $S, T$ be any stopping times. Then
$$\mathbb{E}(X_T \mid \mathcal{F}_S) = X_{S \wedge T}.$$
4 Weak convergence of measures
Often, we may want to consider random variables defined on different spaces.
Since we cannot directly compare them, a sensible approach would be to use
them to push our measure forward to R, and compare them on R.
Definition (Law). Let $X$ be a random variable on $(\Omega, \mathcal{F}, \mathbb{P})$. The law of $X$ is the probability measure $\mu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ defined by
$$\mu(A) = \mathbb{P}(X^{-1}(A)).$$
Example. For $x \in \mathbb{R}$, we have the Dirac $\delta$ measure
$$\delta_x(A) = \mathbf{1}_{\{x \in A\}}.$$
This is the law of a random variable that constantly takes the value $x$.

Now if we have a sequence $x_n \to x$, then we would like to say $\delta_{x_n} \to \delta_x$. In what sense is this true? Suppose $f$ is continuous. Then
$$\int f\,\mathrm{d}\delta_{x_n} = f(x_n) \to f(x) = \int f\,\mathrm{d}\delta_x.$$
So we do have some sort of convergence if we pair it with a continuous function.
Definition (Weak convergence). Let $(\mu_n)_{n \geq 0}$, $\mu$ be probability measures on a metric space $(M, d)$ with the Borel $\sigma$-algebra. We say that $\mu_n \Rightarrow \mu$, or $\mu_n$ converges weakly to $\mu$, if
$$\mu_n(f) \to \mu(f)$$
for all $f$ bounded and continuous.
If $(X_n)_{n \geq 0}$ are random variables, then we say $(X_n)$ converges in distribution if $\mu_{X_n}$ converges weakly.
Note that in general, weak convergence does not say anything about how
measures of subsets behave.
Example. If $x_n \to x$, then $\delta_{x_n} \to \delta_x$ weakly. However, if $x_n \neq x$ for all $n$, then
$$\delta_{x_n}(\{x\}) = 0 \text{ but } \delta_x(\{x\}) = 1. \text{ So } \delta_{x_n}(\{x\}) \not\to \delta_x(\{x\}).$$
Example. Pick $M = [0, 1]$. Let
$$\mu_n = \frac{1}{n}\sum_{k=1}^n\delta_{k/n}.$$
Then
$$\mu_n(f) = \frac{1}{n}\sum_{k=1}^n f\left(\frac{k}{n}\right).$$
So $\mu_n$ converges weakly to the Lebesgue measure on $[0, 1]$.
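For instance (a quick numeric sketch, not from the notes): with $f(x) = x^2$, $\mu_n(f)$ is a Riemann sum converging to $\int_0^1 x^2\,\mathrm{d}x = \frac{1}{3}$.

```python
def mu_n(f, n):
    """Integrate f against mu_n = (1/n) * (sum of Dirac masses at k/n),
    i.e. the Riemann sum (1/n) * sum_{k=1}^n f(k/n)."""
    return sum(f(k / n) for k in range(1, n + 1)) / n

# The values approach 1/3, the Lebesgue integral of x^2 on [0, 1].
vals = [mu_n(lambda x: x * x, n) for n in (10, 100, 10_000)]
```

The error here is $O(1/n)$, exactly the Riemann-sum rate for a smooth integrand.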
Proposition. Let $(\mu_n)_{n \geq 0}$ be as above. Then the following are equivalent:
(i) $(\mu_n)_{n \geq 0}$ converges weakly to $\mu$.
(ii) For all open $G$, we have
$$\liminf_{n\to\infty}\mu_n(G) \geq \mu(G).$$
(iii) For all closed $A$, we have
$$\limsup_{n\to\infty}\mu_n(A) \leq \mu(A).$$
(iv) For all $A$ such that $\mu(\partial A) = 0$, we have
$$\lim_{n\to\infty}\mu_n(A) = \mu(A).$$
(v) (when $M = \mathbb{R}$) $F_{\mu_n}(x) \to F_\mu(x)$ for all $x$ at which $F_\mu$ is continuous, where $F_\mu$ is the distribution function of $\mu$, defined by $F_\mu(x) = \mu((-\infty, x])$.
Proof.
(i) $\Rightarrow$ (ii): The idea is to approximate the open set $A$ by continuous functions. We know $A^c$ is closed. So we can define
$$f_N(x) = 1 \wedge (N \cdot \operatorname{dist}(x, A^c)).$$
This has the property that for all $N > 0$, we have
$$f_N \leq \mathbf{1}_A,$$
and moreover $f_N \nearrow \mathbf{1}_A$ as $N \to \infty$. Now by definition of weak convergence,
$$\liminf_{n\to\infty}\mu_n(A) \geq \liminf_{n\to\infty}\mu_n(f_N) = \mu(f_N) \to \mu(A) \text{ as } N \to \infty.$$
(ii) $\Leftrightarrow$ (iii): Take complements.
(iii) and (ii) $\Rightarrow$ (iv): Take $A$ such that $\mu(\partial A) = 0$. Then
$$\mu(A) = \mu(\mathring{A}) = \mu(\bar{A}).$$
So we know that
$$\liminf_{n\to\infty}\mu_n(A) \geq \liminf_{n\to\infty}\mu_n(\mathring{A}) \geq \mu(\mathring{A}) = \mu(A).$$
Similarly, we find that
$$\mu(A) \geq \limsup_{n\to\infty}\mu_n(A).$$
So we are done.
(iv) $\Rightarrow$ (i): We may wlog assume $f \geq 0$, since we can add a constant to $f$. We have
$$\mu(f) = \int_M f(x)\,\mathrm{d}\mu(x) = \int_M\int_0^\infty\mathbf{1}_{f(x) \geq t}\,\mathrm{d}t\,\mathrm{d}\mu(x) = \int_0^\infty\mu(\{f \geq t\})\,\mathrm{d}t.$$
Since $f$ is continuous, $\partial\{f \geq t\} \subseteq \{f = t\}$. Now there can be only countably many $t$'s such that $\mu(\{f = t\}) > 0$. So replacing $\mu$ by $\lim_{n\to\infty}\mu_n$ only changes the integrand at countably many places, hence doesn't affect the integral. So we conclude using the bounded convergence theorem.
(iv) $\Rightarrow$ (v): Assume $t$ is a continuity point of $F_\mu$. Then we have
$$\mu(\partial(-\infty, t]) = \mu(\{t\}) = F_\mu(t) - F_\mu(t^-) = 0.$$
So $\mu_n((-\infty, t]) \to \mu((-\infty, t])$, and we are done.
(v) $\Rightarrow$ (ii): If $A = (a, b)$, then
$$\mu_n(A) \geq F_{\mu_n}(b') - F_{\mu_n}(a')$$
for any $a \leq a' \leq b' \leq b$ with $a', b'$ continuity points of $F_\mu$. So we know that
$$\liminf_{n\to\infty}\mu_n(A) \geq F_\mu(b') - F_\mu(a') = \mu((a', b']).$$
By taking the supremum over all such $a', b'$, we find that
$$\liminf_{n\to\infty}\mu_n(A) \geq \mu(A).$$
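The restriction to continuity points in (v) is genuinely needed. A small sketch (not from the notes): $\delta_{1/n} \Rightarrow \delta_0$, yet the distribution functions disagree at the jump.

```python
def F_delta(a, x):
    """Distribution function of the Dirac measure at a: F(x) = 1_{x >= a}."""
    return 1.0 if x >= a else 0.0

# delta_{1/n} converges weakly to delta_0, and F_n(x) -> F(x) at every
# x != 0.  At the discontinuity x = 0, convergence fails:
F_at_zero = [F_delta(1 / n, 0.0) for n in (1, 10, 100)]   # stays 0.0
F_limit_at_zero = F_delta(0.0, 0.0)                        # but F(0) = 1.0
# At a continuity point such as x = 0.5, the values do converge to 1:
F_at_half = [F_delta(1 / n, 0.5) for n in (1, 10, 100)]
```

So $F_{\mu_n}(0) = 0$ for every $n$ while $F_\mu(0) = 1$, which is exactly why (v) excludes discontinuity points of $F_\mu$.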
Definition (Tight probability measures). A sequence of probability measures $(\mu_n)_{n \geq 0}$ on a metric space $(M, d)$ is tight if for all $\varepsilon > 0$, there exists a compact $K \subseteq M$ such that
$$\sup_n\mu_n(M \setminus K) \leq \varepsilon.$$

Note that this is always satisfied for compact metric spaces.
Theorem (Prokhorov's theorem). If $(\mu_n)_{n \geq 0}$ is a sequence of tight probability measures, then there is a subsequence $(\mu_{n_k})_{k \geq 0}$ and a measure $\mu$ such that $\mu_{n_k} \Rightarrow \mu$.

To see how this can fail without the tightness assumption, suppose we define measures $\mu_n$ on $\mathbb{R}$ by
$$\mu_n(A) = \tilde{\mu}(A \cap [n, n+1]),$$
where $\tilde{\mu}$ is the Lebesgue measure. Then for any bounded set $S$, we have $\lim_{n\to\infty}\mu_n(S) = 0$. Thus, if the weak limit existed, it must be everywhere zero, but this does not give a probability measure.

We shall prove this only in the case $M = \mathbb{R}$. It is not difficult to construct a candidate for what the weak limit should be. Simply use Bolzano–Weierstrass to pick a subsequence of the measures such that the distribution functions converge on the rationals. Then the limit would essentially be what we want. We then apply tightness to show that this is a genuine distribution.
Proof. Take $\mathbb{Q} \subseteq \mathbb{R}$, which is dense and countable. Let $x_1, x_2, \ldots$ be an enumeration of $\mathbb{Q}$. Define $F_n = F_{\mu_n}$. By Bolzano–Weierstrass and a diagonal argument, we can find a subsequence $(F_{n_k})$ such that
$$F_{n_k}(x_i) \to y_i \equiv F(x_i)$$
as $k \to \infty$, for each fixed $x_i$.
Since $F$ is non-decreasing on $\mathbb{Q}$, it has left and right limits everywhere. We extend $F$ to $\mathbb{R}$ by taking right limits. This implies $F$ is cadlag.
Take $x$ a continuity point of $F$. Then for each $\varepsilon > 0$, there exist rationals $s < x < t$ such that
$$|F(s) - F(t)| < \frac{\varepsilon}{2}.$$
Take $k$ large enough such that $|F_{n_k}(s) - F(s)| < \frac{\varepsilon}{4}$, and the same for $t$. Then by monotonicity of $F$ and $F_{n_k}$, we have
$$|F_{n_k}(x) - F(x)| \leq |F(s) - F(t)| + |F_{n_k}(s) - F(s)| + |F_{n_k}(t) - F(t)| \leq \varepsilon.$$
It remains to show that $F(x) \to 1$ as $x \to \infty$ and $F(x) \to 0$ as $x \to -\infty$. By tightness, for all $\varepsilon > 0$, there exists $N > 0$ such that
$$\mu_n((-\infty, -N]) \leq \varepsilon, \quad \mu_n((N, \infty)) \leq \varepsilon$$
for all $n$. This then implies what we want.
We shall end the chapter with an alternative characterization of weak con-
vergence, using characteristic functions.
Definition (Characteristic function). Let $X$ be a random variable taking values in $\mathbb{R}^d$. The characteristic function of $X$ is the function $\mathbb{R}^d \to \mathbb{C}$ defined by
$$\varphi_X(t) = \mathbb{E}e^{i\langle t, X\rangle} = \int_{\mathbb{R}^d}e^{i\langle t, x\rangle}\,\mathrm{d}\mu_X(x).$$

Note that $\varphi_X$ is continuous by bounded convergence, and $\varphi_X(0) = 1$.

Proposition. If $\varphi_X = \varphi_Y$, then $\mu_X = \mu_Y$.
Theorem (Lévy's convergence theorem). Let $(X_n)_{n \geq 0}$, $X$ be random variables taking values in $\mathbb{R}^d$. Then the following are equivalent:
(i) $\mu_{X_n} \Rightarrow \mu_X$ as $n \to \infty$.
(ii) $\varphi_{X_n} \to \varphi_X$ pointwise.
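As an illustration (a sketch, not from the notes): for iid $\pm 1$ coin flips, $\varphi_{S_n/\sqrt{n}}(t) = \cos(t/\sqrt{n})^n \to e^{-t^2/2}$, the characteristic function of $N(0, 1)$, so Lévy's theorem yields the central limit theorem for this sequence. The convergence is a deterministic calculus fact we can check directly:

```python
import math

def phi_normalized_sum(t, n):
    """Characteristic function of S_n / sqrt(n), where S_n is a sum of n
    iid +-1 coin flips: E exp(itX) = cos(t) for one flip, and scaling
    plus independence give cos(t / sqrt(n)) ** n."""
    return math.cos(t / math.sqrt(n)) ** n

def phi_std_normal(t):
    """Characteristic function of N(0, 1)."""
    return math.exp(-t * t / 2)

# Pointwise convergence, as in Levy's convergence theorem:
gap = max(abs(phi_normalized_sum(t, 10 ** 6) - phi_std_normal(t))
          for t in (0.0, 0.5, 1.0, 2.0))
```

Here the characteristic functions are real because the coin flip is symmetric, which keeps the computation in plain floats.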
We will in fact prove a stronger theorem.
Theorem (Lévy). Let $(X_n)_{n \geq 0}$ be as above, and let $\varphi_{X_n}(t) \to \psi(t)$ for all $t$. Suppose $\psi$ is continuous at $0$ and $\psi(0) = 1$. Then there exists a random variable $X$ such that $\varphi_X = \psi$ and $\mu_{X_n} \Rightarrow \mu_X$ as $n \to \infty$.
We will only prove the case $d = 1$. We first need the following lemma:

Lemma. Let $X$ be a real random variable. Then for all $\lambda > 0$,
$$\mu_X(|x| \geq \lambda) \leq C\lambda\int_0^{1/\lambda}(1 - \operatorname{Re}\varphi_X(t))\,\mathrm{d}t,$$
where $C = (1 - \sin 1)^{-1}$.

Proof. For $M \geq 1$, we have
$$\int_0^M(1 - \cos t)\,\mathrm{d}t = M - \sin M \geq M(1 - \sin 1).$$
By setting $M = \frac{|X|}{\lambda}$, we have
$$\mathbf{1}_{|X| \geq \lambda} \leq \frac{C\lambda}{|X|}\int_0^{|X|/\lambda}(1 - \cos t)\,\mathrm{d}t.$$
By a change of variables $t \mapsto Xt$, we have
$$\mathbf{1}_{|X| \geq \lambda} \leq C\lambda\int_0^{1/\lambda}(1 - \cos Xt)\,\mathrm{d}t.$$
Apply $\mu_X$, and use the fact that $\operatorname{Re}\varphi_X(t) = \mathbb{E}\cos(Xt)$.
We can now prove Lévy's theorem.

Proof of theorem. It is clear that weak convergence implies convergence in characteristic functions.
Now observe that $\mu_n \Rightarrow \mu$ iff from every subsequence $(n_k)_{k \geq 0}$, we can choose a further subsequence $(n_{k_\ell})$ such that $\mu_{n_{k_\ell}} \Rightarrow \mu$ as $\ell \to \infty$. Indeed, $\Rightarrow$ is clear, and conversely, suppose $\mu_n \not\Rightarrow \mu$ but the subsequence property holds. Then we can choose a bounded and continuous function $f$ such that
$$\mu_n(f) \not\to \mu(f).$$
Then there is some $\varepsilon > 0$ and a subsequence $(n_k)_{k \geq 0}$ such that $|\mu_{n_k}(f) - \mu(f)| > \varepsilon$ for all $k$. Then no further subsequence of this can converge, a contradiction.
Thus, to show $\Leftarrow$, we need to prove the existence of subsequential limits (uniqueness follows from convergence of characteristic functions). By Prokhorov's theorem, it is enough to prove tightness of the whole sequence.
By the mean value theorem and continuity of $\psi$ at $0$, we can choose $\lambda$ so large that
$$C\lambda\int_0^{1/\lambda}(1 - \operatorname{Re}\psi(t))\,\mathrm{d}t < \frac{\varepsilon}{2}.$$
By bounded convergence, we then have
$$C\lambda\int_0^{1/\lambda}(1 - \operatorname{Re}\varphi_{X_n}(t))\,\mathrm{d}t \leq \varepsilon$$
for all large $n$. Thus, by our previous lemma, we know $(\mu_{X_n})_{n \geq 0}$ is tight. So we are done.
5 Brownian motion
Finally, we can begin studying Brownian motion. Brownian motion was first
observed by the botanist Robert Brown in 1827, when he looked at the random
movement of pollen grains in water. In 1905, Albert Einstein provided the first
mathematical description of this behaviour. In 1923, Norbert Wiener provided
the first rigorous construction of Brownian motion.
5.1 Basic properties of Brownian motion
Definition (Brownian motion). A continuous process $(B_t)_{t \geq 0}$ taking values in $\mathbb{R}^d$ is called a Brownian motion in $\mathbb{R}^d$ started at $x \in \mathbb{R}^d$ if
(i) $B_0 = x$ almost surely.
(ii) For all $s < t$, the increment $B_t - B_s \sim N(0, (t - s)I)$.
(iii) Increments are independent. More precisely, for all $t_1 < t_2 < \cdots < t_k$, the random variables
$$B_{t_1},\ B_{t_2} - B_{t_1},\ \ldots,\ B_{t_k} - B_{t_{k-1}}$$
are independent.
If $B_0 = 0$, then we call it a standard Brownian motion.
We always assume our Brownian motion is standard.
Theorem (Wiener's theorem). There exists a Brownian motion on some probability space.

Proof. We first prove existence on [0, 1] and in d = 1. We wish to apply Kolmogorov's criterion.

Recall that D_n = {k2^{−n} : 0 ≤ k ≤ 2^n} are the dyadic numbers, and D = ∪_n D_n. Let (Z_d)_{d∈D} be iid N(0, 1) random variables on some probability space. We will define a process on D_n inductively on n with the required properties. We wlog assume x = 0.

In step 0, we put
  B_0 = 0, B_1 = Z_1.
Assume that we have already constructed (B_d)_{d∈D_{n−1}} satisfying the properties. Take d ∈ D_n \ D_{n−1}, and set
  d_± = d ± 2^{−n}.
These are the two consecutive numbers in D_{n−1} such that d_− < d < d_+. Define
  B_d = (B_{d_+} + B_{d_−})/2 + Z_d/2^{(n+1)/2}.
The condition (i) is trivially satisfied. We now have to check the other two conditions. Consider
  B_{d_+} − B_d = (B_{d_+} − B_{d_−})/2 − Z_d/2^{(n+1)/2} = N − N′,
  B_d − B_{d_−} = (B_{d_+} − B_{d_−})/2 + Z_d/2^{(n+1)/2} = N + N′,
where N = (B_{d_+} − B_{d_−})/2 and N′ = Z_d/2^{(n+1)/2}. Notice that N and N′ are normal with variance
  var(N′) = var(N) = 2^{−(n+1)}.
In particular, we have
  cov(N − N′, N + N′) = var(N) − var(N′) = 0.
So B_{d_+} − B_d and B_d − B_{d_−} are independent.
Now note that the vector of increments of (B_d)_{d∈D_n} between consecutive numbers in D_n is Gaussian, since after dotting with any vector, we obtain a linear combination of independent Gaussians. Thus, to prove independence, it suffices to prove that the pairwise correlations vanish.

We already proved this for the increments between B_{d_−}, B_d and B_{d_+}, and this is the only tricky case, since they both involve the same Z_d. The other cases are straightforward, and are left as an exercise for the reader.
Inductively, we can construct (B_d)_{d∈D} satisfying (i), (ii) and (iii). Note that for all s, t ∈ D, we have
  E|B_t − B_s|^p = |t − s|^{p/2} E|N|^p
for N ∼ N(0, 1). Since E|N|^p < ∞ for all p, by Kolmogorov's criterion, we can extend (B_d)_{d∈D} to (B_t)_{t∈[0,1]}. In fact, this extension is α-Hölder continuous for all α < 1/2.

Since this is a continuous process and satisfies the desired properties on a dense set, it remains to show that the properties are preserved by taking continuous limits.
Take 0 ≤ t_1 < t_2 < ··· < t_m ≤ 1, and 0 ≤ t_1^n < t_2^n < ··· < t_m^n ≤ 1 such that t_i^n ∈ D_n and t_i^n → t_i as n → ∞, for each i = 1, . . . , m.
We now apply Lévy's convergence theorem. Recall that if X is a random variable in R^d and X ∼ N(0, Σ), then
  ϕ_X(u) = exp(−½ u^T Σ u).
Since (B_t)_{t∈[0,1]} is continuous, we have
  ϕ_{(B_{t_2^n} − B_{t_1^n}, . . . , B_{t_m^n} − B_{t_{m−1}^n})}(u) = exp(−½ u^T Σ u) = exp(−½ Σ_{i=1}^{m−1} (t_{i+1}^n − t_i^n) u_i^2).
We know this converges, as n → ∞, to exp(−½ Σ_{i=1}^{m−1} (t_{i+1} − t_i) u_i^2).

By Lévy's convergence theorem, the law of (B_{t_2} − B_{t_1}, B_{t_3} − B_{t_2}, . . . , B_{t_m} − B_{t_{m−1}}) is Gaussian with the right covariance. This implies that (ii) and (iii) hold on [0, 1].
To extend the time to [0, ∞), we take independent Brownian motions (B_t^i)_{t∈[0,1], i∈N} and define
  B_t = Σ_{i=0}^{⌊t⌋−1} B_1^i + B_{t−⌊t⌋}^{⌊t⌋}.
To extend to R^d, take d independent one-dimensional Brownian motions as the coordinates.
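The inductive midpoint construction in the proof is easy to turn into a simulation. The following is a minimal sketch (the function name and the sanity check at the end are mine, not from the notes):

```python
import math
import random

def brownian_dyadic(levels, rng):
    """Levy-style midpoint construction of Brownian motion on [0, 1],
    following the inductive scheme in the proof: fix B_0 = 0, B_1 = Z_1,
    then refine each dyadic level using fresh independent Gaussians."""
    B = {0.0: 0.0, 1.0: rng.gauss(0, 1)}
    for n in range(1, levels + 1):
        step = 2.0 ** (-n)
        for k in range(1, 2 ** n, 2):          # new points of D_n \ D_{n-1}
            d = k * step
            # B_d = (B_{d+} + B_{d-}) / 2 + Z_d / 2^{(n+1)/2}
            B[d] = (B[d - step] + B[d + step]) / 2 \
                   + rng.gauss(0, 1) / 2 ** ((n + 1) / 2)
    grid = sorted(B)
    return [B[t] for t in grid]

rng = random.Random(0)
levels = 10
path = brownian_dyadic(levels, rng)

# Sanity check: increments over the mesh-2^-levels grid should be
# iid N(0, 2^-levels); compare the empirical variance.
incs = [b - a for a, b in zip(path, path[1:])]
var = sum(x * x for x in incs) / len(incs)
print(len(path), var)
```

Refining one more level only touches the new dyadic points, which mirrors why the construction is consistent across levels.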
Lemma. Brownian motion is a Gaussian process, i.e. for any 0 ≤ t_1 < t_2 < ··· < t_m, the vector (B_{t_1}, B_{t_2}, . . . , B_{t_m}) is Gaussian, with covariance
  cov(B_{t_1}, B_{t_2}) = t_1 ∧ t_2.
Proof. We know (B_{t_1}, B_{t_2} − B_{t_1}, . . . , B_{t_m} − B_{t_{m−1}}) is Gaussian. The vector (B_{t_1}, . . . , B_{t_m}) is its image under a linear isomorphism, so it is Gaussian. To compute the covariance, for s ≤ t, we have
  cov(B_s, B_t) = E B_s B_t = E B_s(B_t − B_s) + E B_s^2 = s.
Proposition (Invariance properties). Let (B_t)_{t≥0} be a standard Brownian motion in R^d.
(i) If U is an orthogonal matrix, then (UB_t)_{t≥0} is a standard Brownian motion.
(ii) Brownian scaling: if a > 0, then (a^{−1/2} B_{at})_{t≥0} is a standard Brownian motion. This is known as a random fractal property.
(iii) (Simple) Markov property: for all s ≥ 0, the process (B_{t+s} − B_s)_{t≥0} is a standard Brownian motion, independent of F_s^B.
(iv) Time inversion: define a process
  X_t = 0 for t = 0, X_t = tB_{1/t} for t > 0.
Then (X_t)_{t≥0} is a standard Brownian motion.
Proof. Only (iv) requires proof. It is enough to prove that X_t is continuous and has the right finite-dimensional distributions. We have
  (X_{t_1}, . . . , X_{t_m}) = (t_1 B_{1/t_1}, . . . , t_m B_{1/t_m}).
The right-hand side is the image of (B_{1/t_1}, . . . , B_{1/t_m}) under a linear isomorphism, so it is Gaussian. If s ≤ t, then the covariance is
  cov(sB_{1/s}, tB_{1/t}) = st cov(B_{1/s}, B_{1/t}) = st(1/s ∧ 1/t) = st · (1/t) = s = s ∧ t.
Continuity is obvious for t > 0. To prove continuity at 0, note that we already know (X_q)_{q>0, q∈Q} has the same law (as a process) as Brownian motion. By continuity of X_t for positive t, we have
  P(lim_{q∈Q^+, q→0} X_q = 0) = P(lim_{q∈Q^+, q→0} B_q = 0) = 1
by continuity of B.
Using the natural filtration, we have

Theorem. For all s ≥ 0, the process (B_{t+s} − B_s)_{t≥0} is independent of F_s^+.

Proof. Take a sequence s_n ↘ s such that s_n > s for all n. By continuity,
  B_{t+s} − B_s = lim_{n→∞} (B_{t+s_n} − B_{s_n})
almost surely. Now each B_{t+s_n} − B_{s_n} is independent of F_s^+, and hence so is the limit.
Theorem (Blumenthal's 0-1 law). The σ-algebra F_0^+ is trivial, i.e. if A ∈ F_0^+, then P(A) ∈ {0, 1}.

Proof. Apply our previous theorem with s = 0. Take A ∈ F_0^+. Then A ∈ σ(B_s : s ≥ 0), which is independent of F_0^+. So A is independent of itself, hence P(A) = P(A)^2.
Proposition.
(i) If d = 1, then
  1 = P(inf{t ≥ 0 : B_t > 0} = 0) = P(inf{t ≥ 0 : B_t < 0} = 0) = P(inf{t > 0 : B_t = 0} = 0).
(ii) For any d ≥ 1, we have
  lim_{t→∞} B_t/t = 0
almost surely.
(iii) If d = 1 and we define
  S_t = sup_{0≤s≤t} B_s, I_t = inf_{0≤s≤t} B_s,
then S_∞ = ∞ and I_∞ = −∞ almost surely.
(iv) If A ⊆ R^d is open, then the cone of A is C_A = {tx : x ∈ A, t > 0}, and
  inf{t ≥ 0 : B_t ∈ C_A} = 0 almost surely.
Thus, Brownian motion is pretty chaotic.
Proof.
(i) It suffices to prove the first equality. Note that the event {inf{t ≥ 0 : B_t > 0} = 0} lies in F_0^+, hence is trivial. Moreover, for any finite t, the probability that B_t > 0 is ½. Take a sequence t_n → 0 and apply Fatou to conclude that the probability is positive, hence equal to 1.
(ii) Follows from time inversion, since tB_{1/t} is a Brownian motion, and is in particular continuous at 0.
(iii) By scale invariance, S_∞ and a^{−1/2} S_∞ have the same law for all a > 0, so S_∞ ∈ {0, ∞} almost surely; by (i), S_∞ > 0 almost surely.
(iv) Same argument as (i).
Theorem (Strong Markov property). Let (B_t)_{t≥0} be a standard Brownian motion in R^d, and let T be an almost-surely finite stopping time with respect to (F_t^+)_{t≥0}. Then
  B̃_t = B_{T+t} − B_T
defines a standard Brownian motion with respect to (F_{T+t}^+)_{t≥0} that is independent of F_T^+.
Proof. Let T_n = 2^{−n} ⌈2^n T⌉. We first prove the statement for T_n. We let
  B_t^{(k)} = B_{t+k/2^n} − B_{k/2^n}.
This is a standard Brownian motion independent of F_{k/2^n}^+ by the simple Markov property. Let
  B_t^* = B_{t+T_n} − B_{T_n}.
Let A be the σ-algebra on C = C([0, ∞), R^d), and take A ∈ A. Let E ∈ F_{T_n}^+. The claim that B^* is a standard Brownian motion independent of E can be concisely captured in the equality
  P({B^* ∈ A} ∩ E) = P({B ∈ A})P(E). (∗)
Taking E = Ω tells us B^* and B have the same law, and then taking general E tells us B^* is independent of F_{T_n}^+.
It is a straightforward computation to prove (∗). Indeed, we have
  P({B^* ∈ A} ∩ E) = Σ_{k=0}^∞ P({B^{(k)} ∈ A} ∩ E ∩ {T_n = k/2^n}).
Since E ∈ F_{T_n}^+, we know E ∩ {T_n = k/2^n} ∈ F_{k/2^n}^+. So by the simple Markov property, this is equal to
  Σ_{k=0}^∞ P({B^{(k)} ∈ A}) P(E ∩ {T_n = k/2^n}).
But each B^{(k)} is a standard Brownian motion. So this is equal to
  Σ_{k=0}^∞ P({B ∈ A}) P(E ∩ {T_n = k/2^n}) = P({B ∈ A})P(E).
So we are done.
Now as n → ∞, the increments of B^* converge almost surely to the increments of B̃, since B is continuous and T_n ↘ T almost surely. But the B^* all have the same distribution, and almost sure convergence implies convergence in distribution. So B̃ is a standard Brownian motion. Being independent of F_T^+ is clear, since F_T^+ ⊆ F_{T_n}^+.
We know that we can reset our process any time we like, and we also know
that we have a bunch of invariance properties. We can combine these to prove
some nice results.
Theorem (Reflection principle). Let (B_t)_{t≥0} and T be as above. Then the reflected process (B̃_t)_{t≥0} defined by
  B̃_t = B_t 1_{t<T} + (2B_T − B_t) 1_{t≥T}
is a standard Brownian motion.
Of course, the fact that we are reflecting is not important. We can apply any
operation that preserves the law. This theorem is “obvious”, but we can be a
bit more careful in writing down a proof.
Proof. By the strong Markov property, we know
  B_t^T = B_{T+t} − B_T and −B_t^T
are standard Brownian motions independent of F_T^+. This implies that the pairs of random variables
  P^1 = ((B_t)_{0≤t≤T}, (B_t^T)_{t≥0}), P^2 = ((B_t)_{0≤t≤T}, (−B_t^T)_{t≥0})
taking values in C × C have the same law on C × C with the product σ-algebra.

Define the concatenation map ψ_T(X, Y) : C × C → C by
  ψ_T(X, Y)(t) = X_t 1_{t<T} + (X_T + Y_{t−T}) 1_{t≥T}.
Assuming Y_0 = 0, the resulting process is continuous.
Notice that ψ_T is a measurable map, which we can prove by approximating T by discrete stopping times. We then conclude that ψ_T(P^1) has the same law as ψ_T(P^2), i.e. B and B̃ have the same law.
Corollary. Let (B_t)_{t≥0} be a standard Brownian motion in d = 1. Let b > 0 and a ≤ b. Let
  S_t = sup_{0≤s≤t} B_s.
Then
  P(S_t ≥ b, B_t ≤ a) = P(B_t ≥ 2b − a).

Proof. Consider the stopping time T given by the first hitting time of b. Since S_∞ = ∞, we know T is finite almost surely. Let (B̃_t)_{t≥0} be the reflected process. Then
  {S_t ≥ b, B_t ≤ a} = {B̃_t ≥ 2b − a}.
Corollary. The law of S_t is equal to the law of |B_t|.

Proof. Apply the previous corollary with b = a to get
  P(S_t ≥ a) = P(S_t ≥ a, B_t < a) + P(S_t ≥ a, B_t ≥ a)
  = P(B_t > a) + P(B_t ≥ a)
  = P(B_t ≤ −a) + P(B_t ≥ a)
  = P(|B_t| ≥ a).
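The identity S_t ∼ |B_t| is easy to probe by Monte Carlo on discretized paths. This is only an illustrative sketch (all parameters are mine), and the grid maximum slightly undershoots the true supremum, so the two frequencies only match approximately:

```python
import math
import random

rng = random.Random(42)

def sup_and_endpoint(n_steps=500, t=1.0):
    """One discretized Brownian path on [0, t]: return (max over the
    grid, B_t). The discrete max slightly undershoots the true S_t."""
    dt = t / n_steps
    sd = math.sqrt(dt)
    b, s = 0.0, 0.0
    for _ in range(n_steps):
        b += rng.gauss(0, sd)
        if b > s:
            s = b
    return s, b

trials = 5000
count_S = count_B = 0
for _ in range(trials):
    s, b = sup_and_endpoint()
    count_S += s >= 1.0
    count_B += abs(b) >= 1.0

p_S = count_S / trials   # estimates P(S_1 >= 1)
p_B = count_B / trials   # estimates P(|B_1| >= 1), about 0.317
print(p_S, p_B)
```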
Proposition. Let d = 1 and (B_t)_{t≥0} be a standard Brownian motion. Then the following processes are (F_t^+)_{t≥0}-martingales:
(i) (B_t)_{t≥0};
(ii) (B_t^2 − t)_{t≥0};
(iii) (exp(uB_t − u^2 t/2))_{t≥0} for u ∈ R.

Proof.
(i) Using the fact that B_t − B_s is independent of F_s^+, we know
  E(B_t − B_s | F_s^+) = E(B_t − B_s) = 0.
(ii) We have
  E(B_t^2 − t | F_s^+) = E((B_t − B_s)^2 | F_s^+) − E(B_s^2 | F_s^+) + 2E(B_t B_s | F_s^+) − t.
We know B_t − B_s is independent of F_s^+, so the first term is equal to var(B_t − B_s) = t − s, and we can simplify to get
  = (t − s) − B_s^2 + 2B_s^2 − t = B_s^2 − s.
(iii) Similar.
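A quick Monte Carlo sanity check of (iii), purely as an illustration: since B_t ∼ N(0, t), the quantity E exp(uB_t − u²t/2) should equal 1 for any fixed u and t (the martingale starts at 1). The particular u, t and sample size are my choices:

```python
import math
import random

rng = random.Random(5)

# Check E exp(u B_t - u^2 t / 2) = 1 by sampling B_t ~ N(0, t) directly.
u, t, trials = 0.7, 2.0, 200000
sd = math.sqrt(t)
m = sum(math.exp(u * rng.gauss(0, sd) - u * u * t / 2)
        for _ in range(trials)) / trials
print(m)
```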
5.2 Harmonic functions and Brownian motion
Recall that a Markov chain plus a harmonic function gave us a martingale. We
shall derive similar results here.
Definition (Domain). A domain is an open connected set D ⊆ R^d.
Definition (Harmonic function). A function u : D → R is called harmonic if
  ∆u = Σ_{i=1}^d ∂^2 u/∂x_i^2 = 0.
There is also an alternative characterization of harmonic functions that
involves integrals instead of derivatives.
Lemma. Let u : D → R be measurable and locally bounded. Then the following are equivalent:
(i) u is twice continuously differentiable and ∆u = 0;
(ii) for any x ∈ D and r > 0 such that B̄(x, r) ⊆ D, we have
  u(x) = (1/Vol(B(x, r))) ∫_{B(x,r)} u(y) dy;
(iii) for any x ∈ D and r > 0 such that B̄(x, r) ⊆ D, we have
  u(x) = (1/Area(∂B(x, r))) ∫_{∂B(x,r)} u(y) dy.
The latter two properties are known as the mean value property.
Proof. IA Vector Calculus.
Theorem. Let (B_t)_{t≥0} be a standard Brownian motion in R^d, and let u : R^d → R be harmonic such that
  E|u(x + B_t)| < ∞
for any x ∈ R^d and t ≥ 0. Then the process (u(B_t))_{t≥0} is a martingale with respect to (F_t^+)_{t≥0}.
To prove this, we need to prove a side lemma:
Lemma. Let X and Y be random variables in R^d, with X being G-measurable and Y independent of G. If f : R^d × R^d → R is such that f(X, Y) is integrable, then
  E(f(X, Y) | G) = E f(z, Y)|_{z=X}.

Proof. Use Fubini and the fact that µ_{(X,Y)} = µ_X ⊗ µ_Y.
Observe that if µ is a probability measure on R^d such that the density of µ with respect to the Lebesgue measure depends only on |x|, then if u is harmonic, the mean value property implies
  u(x) = ∫_{R^d} u(x + y) dµ(y).
Proof of theorem. Let t ≥ s. Then
  E(u(B_t) | F_s^+) = E(u(B_s + (B_t − B_s)) | F_s^+)
  = E(u(z + B_t − B_s))|_{z=B_s}
  = u(z)|_{z=B_s}
  = u(B_s),
using the lemma and the observation above, since the density of B_t − B_s depends only on |x|.
In fact, the following more general result is true:
Theorem. Let f : R^d → R be twice continuously differentiable with bounded derivatives. Then the process (X_t)_{t≥0} defined by
  X_t = f(B_t) − ½ ∫_0^t ∆f(B_s) ds
is a martingale with respect to (F_t^+)_{t≥0}.
We shall not prove this, but we can justify it as follows: suppose we have a sequence of independent random variables {X_1, X_2, . . .} with P(X_i = ±1) = ½. Let S_n = X_1 + ··· + X_n. Then
  E(f(S_{n+1}) | S_1, . . . , S_n) − f(S_n) = ½(f(S_n − 1) + f(S_n + 1) − 2f(S_n)) = ½ ∆̃f(S_n),
and we see that ∆̃f is the discretized second derivative. So
  f(S_n) − ½ Σ_{i=0}^{n−1} ∆̃f(S_i)
is a martingale.
Now the mean value property of a harmonic function u says that if we draw a sphere B centered at x, then u(x) is the average value of u on B. More generally, if we have a surface S containing x, is it true that u(x) is the average value of u on S in some sense?
Remarkably, the answer is yes, and the precise result is given by Brownian motion. Let (X_t)_{t≥0} be a Brownian motion started at x, and let T be the first hitting time of S. Then, under certain technical conditions, we have
  u(x) = E_x u(X_T).
In fact, given some function ϕ defined on the boundary ∂D of a domain D, we can set
  u(x) = E_x ϕ(X_T),
and this gives us the (unique) solution to Laplace's equation with the boundary condition given by ϕ.
It is in fact not hard to show that the resulting u is harmonic in D, since it is almost immediate by construction that u(x) is the average of u on a small sphere around x. The hard part is to show that u is in fact continuous at the boundary, so that it is a genuine solution to the boundary value problem. This is where the technical condition comes in.
First, we quickly establish that solutions to Laplace’s equation are unique.
Proposition (Maximum principle). Let u : D̄ → R be continuous, and harmonic in D. Then
(i) if u attains its maximum inside D, then u is constant;
(ii) if D is bounded, then the maximum of u in D̄ is attained at ∂D.
Thus, harmonic functions do not have interior maxima unless they are constant.
Proof. Follows from the mean value property of harmonic functions.
Corollary. If u and u′ solve ∆u = ∆u′ = 0 on a bounded domain D, and u and u′ agree on ∂D, then u = u′.

Proof. u − u′ is also harmonic, and so attains its maximum at the boundary, where it is 0. Similarly, the minimum is attained at the boundary.
The technical condition we impose on D is the following:
Definition (Poincaré cone condition). We say a domain D satisfies the Poincaré cone condition if for any x ∈ ∂D, there is an open cone C based at x such that
  C ∩ D ∩ B(x, δ) = ∅
for some δ > 0.
Example. If D = R^2 \ ({0} × R_{≥0}), then D does not satisfy the Poincaré cone condition at 0: every open cone based at 0 meets D arbitrarily close to 0.
And the technical lemma is as follows:
Lemma. Let C be an open cone in R^d based at 0. Then there exists 0 ≤ a < 1 such that if |x| ≤ 2^{−k}, then
  P_x(T_{∂B(0,1)} < T_C) ≤ a^k.
Proof. Pick
  a = sup_{|x|≤1/2} P_x(T_{∂B(0,1)} < T_C) < 1.
We then apply the strong Markov property, and the fact that Brownian motion is scale invariant. We reason as follows: if we start with |x| ≤ 2^{−k}, we may or may not hit ∂B(0, 2^{−k+1}) before hitting C. If we don't, then we are happy. If we do, then we have reached ∂B(0, 2^{−k+1}); by scaling, this happens before hitting the cone with probability at most a. Now that we are at ∂B(0, 2^{−k+1}), the probability of hitting ∂B(0, 2^{−k+2}) before hitting the cone is at most a again, and so on. By induction, the probability of hitting ∂B(0, 1) before hitting the cone is at most a^k.
The ultimate theorem is then
Theorem. Let D be a bounded domain satisfying the Poincaré cone condition, and let ϕ : ∂D → R be continuous. Let
  T_{∂D} = inf{t ≥ 0 : B_t ∈ ∂D}.
This is an almost-surely finite stopping time. Then the function u : D̄ → R defined by
  u(x) = E_x(ϕ(B_{T_{∂D}})),
where E_x is the expectation if we start at x, is the unique continuous function such that u(x) = ϕ(x) for x ∈ ∂D, and ∆u = 0 for x ∈ D.
Proof. Let τ = T_{∂B(x,δ)} for δ small. Then by the strong Markov property and the tower law, we have
  u(x) = E_x(u(B_τ)),
and B_τ is uniformly distributed over ∂B(x, δ). So we know u is harmonic in the interior of D, and in particular is continuous in the interior. It is also clear that u|_{∂D} = ϕ. So it remains to show that u is continuous up to D̄.

So let x ∈ ∂D. Since ϕ is continuous, for every ε > 0, there is δ > 0 such that if y ∈ ∂D and |y − x| < δ, then |ϕ(y) − ϕ(x)| ≤ ε.
Take z ∈ D̄ such that |z − x| ≤ δ/2. Suppose we start our Brownian motion at z. If we hit the boundary before we leave the ball B(x, δ), then we are in good shape. If not, then we are sad. But if the second case has small probability, then since ϕ is bounded, we can still be fine.

Pick a cone C as in the definition of the Poincaré cone condition, based at x, and assume we picked δ small enough that C ∩ B(x, δ) ∩ D = ∅. Then we have
  |u(z) − ϕ(x)| = |E_z(ϕ(B_{T_{∂D}})) − ϕ(x)|
  ≤ E_z|ϕ(B_{T_{∂D}}) − ϕ(x)|
  ≤ ε P_z(T_{∂D} < T_{∂B(x,δ)}) + 2‖ϕ‖_∞ P_z(T_{∂B(x,δ)} < T_{∂D})
  ≤ ε + 2‖ϕ‖_∞ P_z(T_{∂B(x,δ)} ≤ T_C),
and we know the second term → 0 as z → x.
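The representation u(x) = E_x(ϕ(B_{T_∂D})) also suggests a simple numerical method, usually called walk-on-spheres: since the exit point of Brownian motion from a ball centred at the current position is uniform on the boundary sphere, we can jump ball-to-ball until we are within ε of ∂D. Here is a sketch for the unit disc, with boundary data whose harmonic extension is known exactly (the example and all parameters are my choices, not from the notes):

```python
import math
import random

rng = random.Random(1)

def phi(x, y):
    # Boundary data on the unit circle; its harmonic extension to the
    # disc happens to be x^2 - y^2, so we can check the answer exactly.
    return x * x - y * y

def walk_on_spheres(x, y, eps=1e-3):
    """One sample of B_{T_dD} for BM started at (x, y) in the unit disc:
    repeatedly jump to a uniform point on the largest circle around the
    current point contained in the disc, stopping near the boundary."""
    while True:
        r = 1.0 - math.hypot(x, y)          # distance to the boundary
        if r < eps:
            n = math.hypot(x, y)            # project to nearest boundary pt
            return x / n, y / n
        theta = rng.uniform(0, 2 * math.pi)
        x += r * math.cos(theta)
        y += r * math.sin(theta)

trials = 20000
u = sum(phi(*walk_on_spheres(0.3, 0.2)) for _ in range(trials)) / trials
print(u)  # harmonic extension gives 0.3^2 - 0.2^2 = 0.05
```

Each jump uses exactly the fact proved above: u is the average of its boundary values on any sphere inside the domain.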
5.3 Transience and recurrence
Theorem. Let (B_t)_{t≥0} be a Brownian motion in R^d.

If d = 1, then (B_t)_{t≥0} is point recurrent, i.e. for each x, z ∈ R, the set {t ≥ 0 : B_t = z} is unbounded P_x-almost surely.

If d = 2, then (B_t)_{t≥0} is neighbourhood recurrent, i.e. for each x ∈ R^2 and U ⊆ R^2 open, the set {t ≥ 0 : B_t ∈ U} is unbounded P_x-almost surely. However, the process does not visit points, i.e. for all x, z ∈ R^2, we have
  P_x(B_t = z for some t > 0) = 0.

If d ≥ 3, then (B_t)_{t≥0} is transient, i.e. |B_t| → ∞ as t → ∞, P_x-almost surely.
Proof. The case d = 1 is trivial, since inf_{t≥0} B_t = −∞ and sup_{t≥0} B_t = ∞ almost surely, and (B_t)_{t≥0} is continuous.
For d = 2, by translation it is enough to show that for every x and every ε > 0, P_x(S_ε < ∞) = 1, where S_λ = inf{t ≥ 0 : |B_t| = λ}. Let 0 < ε < R < ∞ and ϕ ∈ C_b^2(R^2) be such that ϕ(x) = log |x| for ε ≤ |x| ≤ R. It is an easy exercise to check that this is harmonic inside the annulus. By the theorem we didn't prove, we know
  M_t = ϕ(B_t) − ½ ∫_0^t ∆ϕ(B_s) ds
is a martingale. If ε ≤ |x| ≤ R, then H = S_ε ∧ S_R is P_x-almost surely finite, and (M_{t∧H})_{t≥0} is a bounded martingale. By optional stopping, we have
  E_x(log |B_H|) = log |x|.
But the LHS is
  log ε · P_x(S_ε < S_R) + log R · P_x(S_R < S_ε).
So we find that
  P_x(S_ε < S_R) = (log R − log |x|)/(log R − log ε). (†)
Note that if we let R → ∞, then S_R → ∞ almost surely. Using (†), this implies P_x(S_ε < ∞) = 1, and this does not depend on x. So we are done.

To prove that (B_t)_{t≥0} does not visit points, let ε → 0 in (†) and then R → ∞, for x ≠ z = 0.
It is enough to consider the case d = 3. As before, let ϕ ∈ C_b^2(R^3) be such that
  ϕ(x) = 1/|x| for ε ≤ |x| ≤ R.
Then ∆ϕ(x) = 0 for ε ≤ |x| ≤ R. As before, we get
  P_x(S_ε < S_R) = (|x|^{−1} − R^{−1})/(ε^{−1} − R^{−1}).
As R → ∞, we have
  P_x(S_ε < ∞) = ε/|x|.
Now let
  A_n = {|B_t| ≥ n for all t ≥ S_{n^3}}.
Then
  P_0(A_n^c) = n/n^3 = 1/n^2.
So by Borel–Cantelli, only finitely many of the A_n^c occur almost surely. So infinitely many of the A_n hold, and this guarantees |B_t| → ∞.
5.4 Donsker’s invariance principle
To end our discussion on Brownian motion, we provide an alternative construction
of Brownian motion, given by Donsker’s invariance principle. Suppose we run any
simple random walk on
Z
d
. We can think of this as a very coarse approximation
of a Brownian motion. As we zoom out, the step sizes in the simple random
walk look smaller and smaller, and if we zoom out sufficiently much, then we
might expect that the result looks like Brownian motion, and indeed it converges
to Brownian motion in the limit.
Theorem (Donsker's invariance principle). Let (X_n)_{n≥1} be iid random variables with mean 0 and variance 1, and set S_n = X_1 + ··· + X_n. Define the linear interpolation
  S_t = (1 − {t}) S_{⌊t⌋} + {t} S_{⌊t⌋+1},
where {t} = t − ⌊t⌋, and define
  (S_t^{[N]})_{t∈[0,1]} = (N^{−1/2} S_{tN})_{t∈[0,1]}.
Then (S_t^{[N]})_{t∈[0,1]} converges in distribution to the law of standard Brownian motion on [0, 1], as N → ∞.
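Donsker's rescaling is easy to simulate. The following sketch (parameters are my own choices) checks that S_1^{[N]} = N^{−1/2} S_N looks like B_1 ∼ N(0, 1) for ±1 steps, which indeed have mean 0 and variance 1:

```python
import math
import random

rng = random.Random(7)

def rescaled_walk(N):
    """One sample of S^[N]_1 = N^{-1/2} S_N for a +-1 random walk."""
    return sum(rng.choice((-1, 1)) for _ in range(N)) / math.sqrt(N)

samples = [rescaled_walk(400) for _ in range(5000)]
mean = sum(samples) / len(samples)
var = sum(x * x for x in samples) / len(samples)
# The distribution function should be close to that of N(0, 1)
frac_below_1 = sum(x <= 1.0 for x in samples) / len(samples)
print(mean, var, frac_below_1)
```

For the full statement one needs closeness of whole paths, not just of the endpoint, which is exactly what the Skorokhod embedding argument below provides.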
The reader might wonder why we didn’t construct our Brownian motion
this way instead of using Wiener’s theorem. The answer is that our proof of
Donsker’s invariance principle relies on the existence of a Brownian motion! The
relevance is the following theorem:
Theorem (Skorokhod embedding theorem). Let µ be a probability measure on R with mean 0 and variance σ^2. Then there exists a probability space (Ω, F, P) with a filtration (F_t)_{t≥0} on which there is a standard Brownian motion (B_t)_{t≥0} and a sequence of stopping times (T_n)_{n≥0} such that, setting S_n = B_{T_n},
(i) (T_n)_{n≥0} is a random walk with steps of mean σ^2;
(ii) (S_n)_{n≥0} is a random walk with step distribution µ.

So in some sense, Brownian motion contains all random walks with finite variance.
The only stopping times we know about are the hitting times of some value. However, if we take T_n to be the hitting time of some fixed value, then B_{T_n} would be a pretty poor attempt at constructing a random walk. Thus, we may try to come up with the following strategy: construct a probability space with a Brownian motion (B_t)_{t≥0}, and an independent iid sequence (X_n)_{n∈N} of random variables with distribution µ. We then take T_n to be the first hitting time of X_1 + ··· + X_n. Then setting S_n = B_{T_n}, property (ii) is satisfied by definition. However, (i) will not be satisfied in general. In fact, for any y ≠ 0, the expected first hitting time of y is infinite! The problem is that if, say, y > 0, and we accidentally strayed off to the negative side, then it could take a long time to return.

The solution is to "split" µ into two parts, and construct a pair of random variables (X, Y) ∈ [0, ∞)^2, such that if T is the first hitting time of {−X, Y}, then B_T has law µ.
Since we are interested in the stopping times T_{−x} and T_y, the following computation will come in handy:

Lemma. Let x, y > 0. Then
  P_0(T_{−x} < T_y) = y/(x + y), E_0(T_{−x} ∧ T_y) = xy.

Proof sketch. Use optional stopping with (B_t)_{t≥0} and (B_t^2 − t)_{t≥0}.
Proof of Skorokhod embedding theorem. Define Borel measures µ_± on [0, ∞) by
  µ_±(A) = µ(±A).
Note that these are not probability measures, but we can define a probability measure ν on [0, ∞)^2 by
  dν(x, y) = C(x + y) dµ_−(x) dµ_+(y)
for some normalizing constant C (this is possible since µ is integrable). This (x + y) is the same (x + y) appearing in the denominator of P_0(T_{−x} < T_y) = y/(x + y). Then we claim that any (X, Y) with this distribution will do the job.
We first figure out the value of C. Note that since µ has mean 0, we have
  ∫_0^∞ x dµ_−(x) = ∫_0^∞ y dµ_+(y).
Thus, we have
  1 = ∫ C(x + y) dµ_−(x) dµ_+(y)
  = C (∫ x dµ_−(x) ∫ dµ_+(y) + ∫ y dµ_+(y) ∫ dµ_−(x))
  = C ∫ x dµ_−(x) (∫ dµ_+(y) + ∫ dµ_−(x))
  = C ∫ x dµ_−(x) = C ∫ y dµ_+(y).
We now set up our notation. Take a probability space (Ω, F, P) with a standard Brownian motion (B_t)_{t≥0} and a sequence ((X_n, Y_n))_{n≥1} iid with distribution ν and independent of (B_t)_{t≥0}. Define
  F_0 = σ((X_n, Y_n) : n = 1, 2, . . .), F_t = σ(F_0, F_t^B).
Define a sequence of stopping times by
  T_0 = 0, T_{n+1} = inf{t ≥ T_n : B_t − B_{T_n} ∈ {−X_{n+1}, Y_{n+1}}}.
By the strong Markov property, it suffices to prove that things work in the case n = 1. So for convenience, let T = T_1, X = X_1, Y = Y_1.

To simplify notation, let τ : C([0, ∞), R) × [0, ∞)^2 → [0, ∞] be given by
  τ(ω, x, y) = inf{t ≥ 0 : ω(t) ∈ {−x, y}}.
Then we have
  T = τ((B_t)_{t≥0}, X, Y).
To check that this works, i.e. that (ii) holds: if A ⊆ (0, ∞), then
  P(B_T ∈ A) = ∫_{[0,∞)^2} ∫_{C([0,∞),R)} 1_{ω(τ(ω,x,y)) ∈ A} dµ_B(ω) dν(x, y).
Using the first part of the previous computation, this is given by
  ∫_{[0,∞)^2} (x/(x + y)) 1_{y∈A} C(x + y) dµ_−(x) dµ_+(y) = µ_+(A),
since C ∫ x dµ_−(x) = 1. We can prove a similar result for A ⊆ (−∞, 0). So B_T has the right law.
To see that T is also well-behaved, we compute
  ET = ∫_{[0,∞)^2} ∫_{C([0,∞),R)} τ(ω, x, y) dµ_B(ω) dν(x, y)
  = ∫_{[0,∞)^2} xy dν(x, y)
  = C ∫_{[0,∞)^2} (x^2 y + x y^2) dµ_−(x) dµ_+(y)
  = ∫_{[0,∞)} x^2 dµ_−(x) + ∫_{[0,∞)} y^2 dµ_+(y)
  = σ^2.
The idea of the proof of Donsker’s invariance principle is that in the limit of
large
N
, the
T
n
are roughly regularly spaced, by the law of large numbers, so
this allows us to reverse the above and use the random walk to approximate the
Brownian motion.
Proof of Donsker's invariance principle. Let (B_t)_{t≥0} be a standard Brownian motion. Then by Brownian scaling,
  (B_t^{(N)})_{t≥0} = (N^{1/2} B_{t/N})_{t≥0}
is a standard Brownian motion.
For every N > 0, we let (T_n^{(N)})_{n≥0} be a sequence of stopping times as in the embedding theorem for B^{(N)}. We then set
  S_n^{(N)} = B^{(N)}_{T_n^{(N)}}.
For t not an integer, define S_t^{(N)} by linear interpolation. Observe that
  ((T_n^{(N)})_{n≥0}, (S_t^{(N)})_{t≥0}) ∼ ((T_n^{(1)})_{n≥0}, (S_t^{(1)})_{t≥0})
in distribution.
We define
  S̃_t^{(N)} = N^{−1/2} S_{tN}^{(N)}, T̃_n^{(N)} = T_n^{(N)}/N.
Note that if t = n/N, then
  S̃_{n/N}^{(N)} = N^{−1/2} S_n^{(N)} = N^{−1/2} B^{(N)}_{T_n^{(N)}} = B_{T_n^{(N)}/N} = B_{T̃_n^{(N)}}. (∗)
Note that (S̃_t^{(N)})_{t≥0} ∼ (S_t^{[N]})_{t≥0}. We will prove that we have convergence in probability, i.e. for any δ > 0,
  P(sup_{0≤t≤1} |S̃_t^{(N)} − B_t| > δ) = P(‖S̃^{(N)} − B‖_∞ > δ) → 0 as N → ∞.
We already know that S̃ and B agree at some times, but the time on S̃ is fixed while that on B is random. So what we want to apply is the law of large numbers. By the strong law of large numbers,
  (1/n)|T_n^{(1)} − n| → 0 as n → ∞.
This implies that
  sup_{1≤n≤N} (1/N)|T_n^{(1)} − n| → 0 as N → ∞.
Since (T_n^{(1)})_{n≥0} ∼ (T_n^{(N)})_{n≥0}, it follows that for any δ > 0,
  P(sup_{1≤n≤N} |T_n^{(N)}/N − n/N| ≥ δ) → 0 as N → ∞.
Using (∗) and continuity, for any t ∈ [n/N, (n+1)/N], there exists u ∈ [T̃_n^{(N)}, T̃_{n+1}^{(N)}] such that S̃_t^{(N)} = B_u. Note that if the times are approximated well up to δ, then |t − u| ≤ δ + 1/N. Hence we have
  {‖S̃^{(N)} − B‖_∞ > ε} ⊆ {|T̃_n^{(N)} − n/N| > δ for some n ≤ N} ∪ {|B_t − B_u| > ε for some t ∈ [0, 1], |t − u| < δ + 1/N}.
The first probability → 0 as N → ∞. For the second, we observe that (B_t)_{t∈[0,1]} has uniformly continuous paths, so for ε > 0, we can find δ > 0 such that the second probability is less than ε whenever N > 1/δ (exercise!).

So S̃^{(N)} → B uniformly in probability, hence converges uniformly in distribution.
6 Large deviations
So far, we have been interested in the average or “typical” behaviour of our
processes. Now, we are interested in “extreme cases”, i.e. events with small
probability. In general, our objective is to show that these probabilities tend to
zero very quickly.
Let (X_n)_{n≥1} be a sequence of iid integrable random variables in R with mean EX_1 = x̄ and finite variance σ^2. We let
  S_n = X_1 + ··· + X_n.
By the central limit theorem, we have
  P(S_n ≥ nx̄ + √n σ a) → P(Z ≥ a) as n → ∞,
where Z ∼ N(0, 1). This implies
  P(S_n ≥ an) → 0
for any a > x̄. The question is then: how fast does this go to zero?
There is a very useful lemma in the theory of sequences that tells us this vanishes exponentially quickly with n. Note that
  P(S_{m+n} ≥ a(m + n)) ≥ P(S_m ≥ am)P(S_n ≥ an).
So the sequence P(S_n ≥ an) is super-multiplicative. Thus, the sequence
  b_n = −log P(S_n ≥ an)
is sub-additive.
Lemma (Fekete). If (b_n) is a non-negative sub-additive sequence, then
  lim_{n→∞} b_n/n
exists.
This tells us P(S_n ≥ an) decays at a well-defined exponential rate. Can we do better than that, and identify the rate exactly?
For λ ≥ 0, consider the moment generating function
  M(λ) = E e^{λX_1}.
We set ψ(λ) = log M(λ), and the Legendre transform of ψ is
  ψ*(a) = sup_{λ≥0}(λa − ψ(λ)).
Note that these quantities may be infinite.

Theorem (Cramér's theorem). For a > x̄, we have
  lim_{n→∞} (1/n) log P(S_n ≥ an) = −ψ*(a).

Note that we always have
  ψ*(a) = sup_{λ≥0}(λa − ψ(λ)) ≥ 0 · a − ψ(0) = 0.
Proof. We first prove an upper bound. For any λ ≥ 0, Markov tells us
  P(S_n ≥ an) = P(e^{λS_n} ≥ e^{λan}) ≤ e^{−λan} E e^{λS_n} = e^{−λan} Π_{i=1}^n E e^{λX_i} = e^{−λan} M(λ)^n = e^{−n(λa − ψ(λ))}.
Since λ was arbitrary, we can pick λ to maximize λa − ψ(λ), and so by definition of ψ*(a), we have P(S_n ≥ an) ≤ e^{−nψ*(a)}. So it follows that
  lim sup_{n→∞} (1/n) log P(S_n ≥ an) ≤ −ψ*(a).
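The upper bound just proved is exactly the Chernoff bound, and for ±1 coin flips everything is computable: ψ(λ) = log cosh λ, and the exact binomial tail can be compared with e^{−nψ*(a)}. The following sketch (the crude grid search for ψ* is my own device) checks that the exact exponent sits above ψ*(a) and approaches it for large n:

```python
import math

def psi(lam):
    # psi(lam) = log E e^{lam X} = log cosh(lam) for P(X = +-1) = 1/2
    return math.log(math.cosh(lam))

def psi_star(a, lam_max=10.0, steps=100000):
    # Legendre transform sup_{lam >= 0} (lam*a - psi(lam)) by grid search
    return max(lam * a - psi(lam)
               for lam in (lam_max * i / steps for i in range(steps + 1)))

def log_tail(n, a):
    """Exact (1/n) log P(S_n >= a n) for S_n a sum of n fair +-1 steps:
    S_n = 2*Binom(n, 1/2) - n, so S_n >= a n iff heads >= n(1+a)/2."""
    k_min = math.ceil(n * (1 + a) / 2)
    p = sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2 ** n
    return math.log(p) / n

a = 0.5
rate = psi_star(a)          # about 0.1308 for a = 0.5
exact = -log_tail(2000, a)  # must be >= rate, and close to it
print(rate, exact)
```

The Chernoff bound guarantees exact ≥ rate for every n; Cramér's theorem says the gap closes as n → ∞.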
The lower bound is a bit more involved. One checks that by translating the X_i by a, we may assume a = 0, and in particular, x̄ < 0. So we want to prove that
  lim inf_{n→∞} (1/n) log P(S_n ≥ 0) ≥ inf_{λ≥0} ψ(λ).
We consider cases:
If P(X_1 ≤ 0) = 1, then
  P(S_n ≥ 0) = P(X_i = 0 for i = 1, . . . , n) = P(X_1 = 0)^n.
So in fact
  lim inf_{n→∞} (1/n) log P(S_n ≥ 0) = log P(X_1 = 0).
But by monotone convergence, we have
  P(X_1 = 0) = lim_{λ→∞} E e^{λX_1}.
So we are done.
Consider the case P(X_1 > 0) > 0, but P(X_1 ∈ [−K, K]) = 1 for some K. The idea is to modify X_1 so that it has mean 0. For µ = µ_{X_1}, we define a new distribution by the density
  dµ_θ/dµ(x) = e^{θx}/M(θ).
We define
  g(θ) = ∫ x dµ_θ(x).
We claim that g is continuous for θ ≥ 0. Indeed, by definition,
  g(θ) = (∫ xe^{θx} dµ(x))/(∫ e^{θx} dµ(x)),
and both the numerator and denominator are continuous in θ by dominated convergence.

Now observe that g(0) = x̄ < 0, and
  lim sup_{θ→∞} g(θ) > 0.
So by the intermediate value theorem, we can find some θ_0 such that g(θ_0) = 0.
Define µ_n^{θ_0} to be the law of the sum of n iid random variables with law µ_{θ_0}. We have
  P(S_n ≥ 0) ≥ P(S_n ∈ [0, εn]) ≥ E(e^{θ_0(S_n − εn)} 1_{S_n ∈ [0,εn]}),
using the fact that on the event S_n ∈ [0, εn], we have e^{θ_0(S_n − εn)} ≤ 1. Changing to the tilted measure, we have
  P(S_n ≥ 0) ≥ M(θ_0)^n e^{−θ_0 εn} µ_n^{θ_0}({S_n ∈ [0, εn]}).
By the central limit theorem, for each fixed ε, we know
  µ_n^{θ_0}({S_n ∈ [0, εn]}) → ½ as n → ∞.
So we can write
  lim inf_{n→∞} (1/n) log P(S_n ≥ 0) ≥ ψ(θ_0) − θ_0 ε.
Then take the limit ε → 0 to conclude the result.
Then take the limit ε 0 to conclude the result.
Finally, we drop the boundedness assumption, and only assume P(X_1 > 0) > 0. We define ν to be the law of X_1 conditioned on the event {|X_1| ≤ K}. Let ν_n be the law of the sum of n iid random variables with law ν. Define
  ψ_K(λ) = log ∫_{−K}^K e^{λx} dµ(x),
  ψ_ν(λ) = log ∫_{−∞}^∞ e^{λx} dν(x) = ψ_K(λ) − log µ({|X_1| ≤ K}).
Note that for K large enough, ∫ x dν(x) < 0. So we can use the previous case. By definition of ν, we have
  µ_n([0, ∞)) ≥ ν_n([0, ∞)) µ(|X_1| ≤ K)^n.
So we have
  lim inf_{n→∞} (1/n) log µ_n([0, ∞)) ≥ log µ(|X_1| ≤ K) + lim inf_{n→∞} (1/n) log ν_n([0, ∞))
  ≥ log µ(|X_1| ≤ K) + inf_λ ψ_ν(λ)
  = inf_λ ψ_K(λ).
Since ψ_K increases as K increases to infinity, inf_λ ψ_K(λ) increases to some limit J, and we have
  lim inf_{n→∞} (1/n) log µ_n([0, ∞)) ≥ J. (‡)
Since the ψ_K(λ) are continuous, the sets {λ : ψ_K(λ) ≤ J} are non-empty, compact and nested in K. By Cantor's intersection theorem, we can find
  λ_0 ∈ ∩_K {λ : ψ_K(λ) ≤ J}.
So the RHS of (‡) satisfies
  J ≥ sup_K ψ_K(λ_0) = ψ(λ_0) ≥ inf_λ ψ(λ).