2 Martingales in discrete time

III Advanced Probability



2.4 Applications of martingales
Having developed the theory, let us move on to some applications. Before we do
that, we need the notion of a backwards martingale.
Definition (Backwards filtration). A backwards filtration on a measurable space $(E, \mathcal{E})$ is a sequence of $\sigma$-algebras $\hat{\mathcal{F}}_n \subseteq \mathcal{E}$ such that $\hat{\mathcal{F}}_{n+1} \subseteq \hat{\mathcal{F}}_n$ for all $n$. We define
\[ \hat{\mathcal{F}}_\infty = \bigcap_{n \geq 0} \hat{\mathcal{F}}_n. \]
Theorem. Let $Y \in L^1$, and let $(\hat{\mathcal{F}}_n)_{n \geq 0}$ be a backwards filtration. Then
\[ \mathbb{E}(Y \mid \hat{\mathcal{F}}_n) \to \mathbb{E}(Y \mid \hat{\mathcal{F}}_\infty) \]
almost surely and in $L^1$.

A process of this form is known as a backwards martingale.
Proof. We first show that $\mathbb{E}(Y \mid \hat{\mathcal{F}}_n)$ converges. We then show that what it converges to is indeed $\mathbb{E}(Y \mid \hat{\mathcal{F}}_\infty)$.

We write $X_n = \mathbb{E}(Y \mid \hat{\mathcal{F}}_n)$. Observe that for all $n \geq 0$, the process $(X_{n-k})_{0 \leq k \leq n}$ is a martingale by the tower property, and so is $(-X_{n-k})_{0 \leq k \leq n}$. Now notice that for all $a < b$, the number of upcrossings of $[a, b]$ by $(X_k)_{0 \leq k \leq n}$ is equal to the number of upcrossings of $[-b, -a]$ by $(-X_{n-k})_{0 \leq k \leq n}$.

Using the same arguments as for martingales, we conclude that $X_n \to X$ almost surely and in $L^1$ for some $X$.
To see that $X = \mathbb{E}(Y \mid \hat{\mathcal{F}}_\infty)$, we notice that $X$ is $\hat{\mathcal{F}}_\infty$-measurable. So it is enough to prove that
\[ \mathbb{E}(X \mathbf{1}_A) = \mathbb{E}(\mathbb{E}(Y \mid \hat{\mathcal{F}}_\infty) \mathbf{1}_A) \]
for all $A \in \hat{\mathcal{F}}_\infty$. Indeed, we have
\begin{align*}
\mathbb{E}(X \mathbf{1}_A) &= \lim_{n \to \infty} \mathbb{E}(X_n \mathbf{1}_A) \\
&= \lim_{n \to \infty} \mathbb{E}(\mathbb{E}(Y \mid \hat{\mathcal{F}}_n) \mathbf{1}_A) \\
&= \lim_{n \to \infty} \mathbb{E}(Y \mathbf{1}_A) \\
&= \mathbb{E}(Y \mathbf{1}_A) \\
&= \mathbb{E}(\mathbb{E}(Y \mid \hat{\mathcal{F}}_\infty) \mathbf{1}_A),
\end{align*}
where the second and last steps use $A \in \hat{\mathcal{F}}_\infty \subseteq \hat{\mathcal{F}}_n$.
Theorem (Kolmogorov 0-1 law). Let $(X_n)_{n \geq 0}$ be independent random variables, and let
\[ \hat{\mathcal{F}}_n = \sigma(X_{n+1}, X_{n+2}, \ldots). \]
Then the tail $\sigma$-algebra $\hat{\mathcal{F}}_\infty$ is trivial, i.e. $\mathbb{P}(A) \in \{0, 1\}$ for all $A \in \hat{\mathcal{F}}_\infty$.
Proof. Let $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$. Then $\mathcal{F}_n$ and $\hat{\mathcal{F}}_n$ are independent. So for all $A \in \hat{\mathcal{F}}_\infty$, since $A \in \hat{\mathcal{F}}_n$ for every $n$, we have
\[ \mathbb{E}(\mathbf{1}_A \mid \mathcal{F}_n) = \mathbb{P}(A). \]
But the left-hand side is a martingale. So it converges almost surely and in $L^1$ to $\mathbb{E}(\mathbf{1}_A \mid \mathcal{F}_\infty)$. But $\mathbf{1}_A$ is $\mathcal{F}_\infty$-measurable, since $\hat{\mathcal{F}}_\infty \subseteq \mathcal{F}_\infty$. So this is just $\mathbf{1}_A$. So $\mathbf{1}_A = \mathbb{P}(A)$ almost surely, and we are done.
Theorem (Strong law of large numbers). Let $(X_n)_{n \geq 1}$ be iid random variables in $L^1$, with $\mathbb{E} X_1 = \mu$. Define
\[ S_n = \sum_{i=1}^n X_i. \]
Then
\[ \frac{S_n}{n} \to \mu \quad \text{as } n \to \infty \]
almost surely and in $L^1$.
Proof. By symmetry of the iid $X_i$, we have
\[ S_n = \mathbb{E}(S_n \mid S_n) = \sum_{i=1}^n \mathbb{E}(X_i \mid S_n) = n \, \mathbb{E}(X_1 \mid S_n). \]
So the problem is equivalent to showing that $\mathbb{E}(X_1 \mid S_n) \to \mu$ as $n \to \infty$. This seems like something we can tackle with our existing technology, except that the $S_n$ do not form a filtration.
Thus, define a backwards filtration
\[ \hat{\mathcal{F}}_n = \sigma(S_n, S_{n+1}, S_{n+2}, \ldots) = \sigma(S_n, X_{n+1}, X_{n+2}, \ldots) = \sigma(S_n, \tau_n), \]
where $\tau_n = \sigma(X_{n+1}, X_{n+2}, \ldots)$. We now use a property of conditional expectation that we haven't used so far: adding independent information to a conditioning $\sigma$-algebra doesn't change the result. Since $\tau_n$ is independent of $\sigma(X_1, S_n)$, we know
\[ \frac{S_n}{n} = \mathbb{E}(X_1 \mid S_n) = \mathbb{E}(X_1 \mid \hat{\mathcal{F}}_n). \]
Thus, by backwards martingale convergence, we know
\[ \frac{S_n}{n} \to \mathbb{E}(X_1 \mid \hat{\mathcal{F}}_\infty) \]
almost surely and in $L^1$. But by the Kolmogorov 0-1 law, we know $\hat{\mathcal{F}}_\infty$ is trivial. So $\mathbb{E}(X_1 \mid \hat{\mathcal{F}}_\infty)$ is almost surely constant, and the constant has to be
\[ \mathbb{E}(\mathbb{E}(X_1 \mid \hat{\mathcal{F}}_\infty)) = \mathbb{E}(X_1) = \mu. \]
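As a sanity check, the convergence $S_n/n \to \mu$ is easy to watch numerically. The following sketch (the setup and all names are my own, not from the notes) simulates iid $\mathrm{Exp}(1)$ variables, so $\mu = 1$:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Simulate S_n / n for iid Exponential(1) samples, so mu = E[X_1] = 1.
n = 200_000
s = 0.0
for _ in range(n):
    s += random.expovariate(1.0)  # one sample X_i

running_mean = s / n  # S_n / n, which the SLLN says tends to mu = 1
print(running_mean)
```

With $n = 200{,}000$ samples the empirical mean lands within a few standard errors ($\approx 1/\sqrt{n}$) of $\mu = 1$.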
Recall that if $(E, \mathcal{E}, \mu)$ is a measure space and $f \in m\mathcal{E}^+$, then
\[ \nu(A) = \mu(f \mathbf{1}_A) \]
is a measure on $(E, \mathcal{E})$. We say $f$ is a density of $\nu$ with respect to $\mu$.
We can ask an "inverse" question: given two different measures on $(E, \mathcal{E})$, when is it the case that one is given by a density with respect to the other?
A first observation is that if $\nu(A) = \mu(f \mathbf{1}_A)$, then whenever $\mu(A) = 0$, we must have $\nu(A) = 0$. However, this condition is not sufficient. For example, let $\mu$ be the counting measure on $\mathbb{R}$, and $\nu$ the Lebesgue measure. Then our condition is satisfied. However, if $\nu$ were given by a density $f$ with respect to $\mu$, we would need
\[ 0 = \nu(\{x\}) = \mu(f \mathbf{1}_{\{x\}}) = f(x) \]
for every $x$. So $f \equiv 0$, but taking $f \equiv 0$ clearly doesn't give the Lebesgue measure.

The problem here is that $\mu$ is not a $\sigma$-finite measure.
Theorem (Radon–Nikodym). Let $(\Omega, \mathcal{F})$ be a measurable space, and let $\mathbb{Q}$ and $\mathbb{P}$ be two probability measures on $(\Omega, \mathcal{F})$. Then the following are equivalent:

(i) $\mathbb{Q}$ is absolutely continuous with respect to $\mathbb{P}$, i.e. for any $A \in \mathcal{F}$, if $\mathbb{P}(A) = 0$, then $\mathbb{Q}(A) = 0$.

(ii) For any $\varepsilon > 0$, there exists $\delta > 0$ such that for all $A \in \mathcal{F}$, if $\mathbb{P}(A) \leq \delta$, then $\mathbb{Q}(A) \leq \varepsilon$.

(iii) There exists a random variable $X \geq 0$ such that
\[ \mathbb{Q}(A) = \mathbb{E}_{\mathbb{P}}(X \mathbf{1}_A) \]
for all $A \in \mathcal{F}$.

In this case, $X$ is called the Radon–Nikodym derivative of $\mathbb{Q}$ with respect to $\mathbb{P}$, and we write $X = \frac{\mathrm{d}\mathbb{Q}}{\mathrm{d}\mathbb{P}}$.

Note that this theorem extends to all finite measures by scaling, and thus to $\sigma$-finite measures by partitioning the space into sets of finite measure.
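On a finite probability space the theorem is elementary: if $\mathbb{P}$ charges every point, then $X(\omega) = \mathbb{Q}(\{\omega\})/\mathbb{P}(\{\omega\})$ works. The following sketch (the measures are made-up illustrative numbers) verifies $\mathbb{Q}(A) = \mathbb{E}_{\mathbb{P}}(X \mathbf{1}_A)$ over every event:

```python
from itertools import chain, combinations

# Finite sample space with two probability measures; P > 0 everywhere,
# so Q is automatically absolutely continuous with respect to P.
omega = [0, 1, 2]
P = {0: 0.2, 1: 0.3, 2: 0.5}
Q = {0: 0.1, 1: 0.6, 2: 0.3}

# Pointwise Radon-Nikodym derivative X = dQ/dP.
X = {w: Q[w] / P[w] for w in omega}

# Check Q(A) = E_P(X 1_A) for every event A (all subsets of omega).
events = chain.from_iterable(combinations(omega, r) for r in range(4))
ok = all(
    abs(sum(Q[w] for w in A) - sum(X[w] * P[w] for w in A)) < 1e-12
    for A in events
)
print(ok)
```

The check holds by construction: on each singleton, $X(\omega)\mathbb{P}(\{\omega\}) = \mathbb{Q}(\{\omega\})$, and both sides are additive over disjoint unions.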
Proof. We shall only treat the case where $\mathcal{F}$ is countably generated, i.e. $\mathcal{F} = \sigma(F_1, F_2, \ldots)$ for some sets $F_i$. For example, the Borel $\sigma$-algebra of any second-countable topological space is countably generated.

(iii) $\Rightarrow$ (i): Clear.

(ii) $\Rightarrow$ (iii): Define the filtration
\[ \mathcal{F}_n = \sigma(F_1, F_2, \ldots, F_n). \]
Since $\mathcal{F}_n$ is finite, we can write it as
\[ \mathcal{F}_n = \sigma(A_{n,1}, \ldots, A_{n,m_n}), \]
where each $A_{n,i}$ is an atom, i.e. if $B \subsetneq A_{n,i}$ and $B \in \mathcal{F}_n$, then $B = \emptyset$. We define
\[ X_n = \sum_{i=1}^{m_n} \frac{\mathbb{Q}(A_{n,i})}{\mathbb{P}(A_{n,i})} \mathbf{1}_{A_{n,i}}, \]
where we skip over the terms with $\mathbb{P}(A_{n,i}) = 0$. Note that this is exactly designed so that for any $A \in \mathcal{F}_n$, which is a disjoint union of atoms, we have
\[ \mathbb{E}_{\mathbb{P}}(X_n \mathbf{1}_A) = \mathbb{E}_{\mathbb{P}}\Bigg( \sum_{i :\, A_{n,i} \subseteq A} \frac{\mathbb{Q}(A_{n,i})}{\mathbb{P}(A_{n,i})} \mathbf{1}_{A_{n,i}} \Bigg) = \mathbb{Q}(A). \]
Thus, if $A \in \mathcal{F}_n \subseteq \mathcal{F}_{n+1}$, we have
\[ \mathbb{E}(X_{n+1} \mathbf{1}_A) = \mathbb{Q}(A) = \mathbb{E}(X_n \mathbf{1}_A). \]
So we know that
\[ \mathbb{E}(X_{n+1} \mid \mathcal{F}_n) = X_n. \]
It is also immediate that $(X_n)_{n \geq 0}$ is adapted. So it is a martingale.
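The construction of $X_n$ from the atoms can be played out concretely. In this sketch (a toy example with my own illustrative numbers), $\Omega = \{0, 1, 2, 3\}$, the atoms of $\mathcal{F}_1$ are refined into those of $\mathcal{F}_2$, and we check both $\mathbb{E}_{\mathbb{P}}(X_n \mathbf{1}_A) = \mathbb{Q}(A)$ on atoms $A \in \mathcal{F}_1$ and the martingale identity:

```python
# Omega = {0,1,2,3}; P and Q given pointwise, P > 0 everywhere.
P = [0.1, 0.2, 0.3, 0.4]
Q = [0.25, 0.25, 0.25, 0.25]
partition1 = [[0, 1], [2, 3]]      # atoms of F_1
partition2 = [[0], [1], [2], [3]]  # atoms of F_2, refining F_1

def density(partition):
    """X_n(w) = Q(atom containing w) / P(atom containing w)."""
    X = [0.0] * 4
    for atom in partition:
        p = sum(P[w] for w in atom)
        q = sum(Q[w] for w in atom)
        for w in atom:
            X[w] = q / p  # the proof skips atoms with p == 0; here p > 0
    return X

X1, X2 = density(partition1), density(partition2)

for A in partition1:
    # E_P(X_1 1_A) = Q(A) for each atom A of F_1.
    assert abs(sum(X1[w] * P[w] for w in A) - sum(Q[w] for w in A)) < 1e-12
    # Martingale property: E_P(X_2 1_A) = E_P(X_1 1_A) for A in F_1,
    # which is exactly E_P(X_2 | F_1) = X_1.
    assert abs(sum(X2[w] * P[w] for w in A) - sum(X1[w] * P[w] for w in A)) < 1e-12
print("checks passed")
```

On the atom $\{0,1\}$, for instance, $X_1 = 0.5/0.3$, and averaging $X_2$ against $\mathbb{P}$ over that atom gives back the same $\mathbb{Q}$-mass $0.5$.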
We next show that $(X_n)_{n \geq 0}$ is uniformly integrable. By Markov's inequality, we have
\[ \mathbb{P}(X_n \geq \lambda) \leq \frac{\mathbb{E} X_n}{\lambda} = \frac{1}{\lambda} \leq \delta \]
for $\lambda$ large enough, since $\mathbb{E} X_n = \mathbb{Q}(\Omega) = 1$. Then, as $\{X_n \geq \lambda\} \in \mathcal{F}_n$,
\[ \mathbb{E}(X_n \mathbf{1}_{X_n \geq \lambda}) = \mathbb{Q}(X_n \geq \lambda) \leq \varepsilon. \]
So we have shown uniform integrability, and hence $X_n \to X$ almost surely and in $L^1$ for some $X$. Then for all $A \in \bigcup_{n \geq 0} \mathcal{F}_n$, we have
\[ \mathbb{Q}(A) = \lim_{n \to \infty} \mathbb{E}(X_n \mathbf{1}_A) = \mathbb{E}(X \mathbf{1}_A). \]
So $\mathbb{Q}(\,\cdot\,)$ and $\mathbb{E}(X \mathbf{1}_{(\,\cdot\,)})$ agree on $\bigcup_{n \geq 0} \mathcal{F}_n$, which is a generating $\pi$-system for $\mathcal{F}$, so they must be the same.
(i) $\Rightarrow$ (ii): Suppose not. Then there exist some $\varepsilon > 0$ and some $A_1, A_2, \ldots \in \mathcal{F}$ such that
\[ \mathbb{Q}(A_n) \geq \varepsilon, \quad \mathbb{P}(A_n) \leq \frac{1}{2^n}. \]
Since $\sum_n \mathbb{P}(A_n)$ is finite, by Borel–Cantelli, we know
\[ \mathbb{P}\left( \limsup A_n \right) = 0. \]
On the other hand, by, say, dominated convergence, we have
\[ \mathbb{Q}\left( \limsup A_n \right) = \mathbb{Q}\left( \bigcap_{n=1}^\infty \bigcup_{m=n}^\infty A_m \right) = \lim_{k \to \infty} \mathbb{Q}\left( \bigcap_{n=1}^k \bigcup_{m=n}^\infty A_m \right) = \lim_{k \to \infty} \mathbb{Q}\left( \bigcup_{m=k}^\infty A_m \right) \geq \varepsilon, \]
since $\mathbb{Q}\big( \bigcup_{m=k}^\infty A_m \big) \geq \mathbb{Q}(A_k) \geq \varepsilon$ for every $k$. This contradicts (i), as $\mathbb{P}(\limsup A_n) = 0$.
Finally, we end the part on discrete time processes by relating what we have done to Markov chains.

Let's first recall what Markov chains are. Let $E$ be a countable space, and $\mu$ a measure on $E$. We write $\mu_x = \mu(\{x\})$, and then $\mu(f) = \sum_{x \in E} \mu_x f(x)$, i.e. $\mu(f) = \mu \cdot f$ if we view $\mu$ as a row vector.
Definition (Transition matrix). A transition matrix is a matrix $P = (p_{xy})_{x, y \in E}$ such that each row $p_x = (p_{xy})_{y \in E}$ is a probability measure on $E$.
Definition (Markov chain). An adapted process $(X_n)$ is called a Markov chain if for any $n$ and any $A \in \mathcal{F}_n$ of positive probability with $A \subseteq \{X_n = x\}$, we have
\[ \mathbb{P}(X_{n+1} = y \mid A) = p_{xy}. \]
Definition (Harmonic function). A function $f : E \to \mathbb{R}$ is harmonic if $P f = f$. In other words, for any $x \in E$, we have
\[ \sum_{y \in E} p_{xy} f(y) = f(x). \]
We then observe the following:

Proposition. If $f$ is harmonic and bounded, and $(X_n)_{n \geq 0}$ is a Markov chain, then $(f(X_n))_{n \geq 0}$ is a martingale.
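For a concrete instance (my own toy example, not from the notes): for simple symmetric random walk on $\{0, \ldots, N\}$ absorbed at the endpoints, $f(x) = x$ is harmonic, and it is bounded since the state space is finite, so $(f(X_n))$ is a martingale. The sketch below checks $Pf = f$ directly from the transition matrix:

```python
N = 5  # state space {0, ..., N}

# Transition matrix of simple symmetric random walk on {0, ..., N},
# absorbed at the endpoints 0 and N.
Pmat = [[0.0] * (N + 1) for _ in range(N + 1)]
Pmat[0][0] = Pmat[N][N] = 1.0
for x in range(1, N):
    Pmat[x][x - 1] = Pmat[x][x + 1] = 0.5

f = list(range(N + 1))  # f(x) = x

# Harmonicity: (Pf)(x) = sum_y p_{xy} f(y) should equal f(x) for every x.
Pf = [sum(Pmat[x][y] * f[y] for y in range(N + 1)) for x in range(N + 1)]
assert Pf == f
print("f is harmonic")
```

At the absorbing states $Pf(x) = f(x)$ trivially, and in the interior $\tfrac{1}{2}(x-1) + \tfrac{1}{2}(x+1) = x$; by optional stopping, this martingale is what gives the usual gambler's ruin probabilities.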
Example. Let $(X_n)_{n \geq 0}$ be iid $\mathbb{Z}$-valued random variables in $L^1$ with $\mathbb{E}[X_i] = 0$. Then
\[ S_n = X_0 + \cdots + X_n \]
is both a martingale and a Markov chain.
However, if $Z$ is a $\mathbb{Z}$-valued random variable (say bounded and independent of the $X_n$, so that integrability and the martingale property are preserved under the extra conditioning), consider the process $(Z S_n)_{n \geq 0}$ with the filtration $\tilde{\mathcal{F}}_n = \sigma(\mathcal{F}_n, Z)$. Then this is a martingale but not, in general, a Markov chain.