2 Martingales in discrete time

III Advanced Probability



2.4 Applications of martingales
Having developed the theory, let us move on to some applications. Before we do
that, we need the notion of a backwards martingale.
Definition (Backwards filtration). A backwards filtration on a measurable space $(E, \mathcal{E})$ is a sequence of $\sigma$-algebras $\hat{\mathcal{F}}_n \subseteq \mathcal{E}$ such that $\hat{\mathcal{F}}_{n+1} \subseteq \hat{\mathcal{F}}_n$ for all $n$. We define
\[ \hat{\mathcal{F}}_\infty = \bigcap_{n \geq 0} \hat{\mathcal{F}}_n. \]
Theorem. Let $Y \in L^1$, and let $(\hat{\mathcal{F}}_n)_{n \geq 0}$ be a backwards filtration. Then
\[ \mathbb{E}(Y \mid \hat{\mathcal{F}}_n) \to \mathbb{E}(Y \mid \hat{\mathcal{F}}_\infty) \]
almost surely and in $L^1$.

A process of this form is known as a backwards martingale.
Proof. We first show that $\mathbb{E}(Y \mid \hat{\mathcal{F}}_n)$ converges. We then show that what it converges to is indeed $\mathbb{E}(Y \mid \hat{\mathcal{F}}_\infty)$.

We write $X_n = \mathbb{E}(Y \mid \hat{\mathcal{F}}_n)$. Observe that for all $n \geq 0$, the process $(X_{n-k})_{0 \leq k \leq n}$ is a martingale by the tower property, and so is $(-X_{n-k})_{0 \leq k \leq n}$. Now notice that for all $a < b$, the number of upcrossings of $[a, b]$ by $(X_k)_{0 \leq k \leq n}$ is equal to the number of upcrossings of $[-b, -a]$ by $(-X_{n-k})_{0 \leq k \leq n}$.

Using the same arguments as for martingales, we conclude that $X_n \to X$ almost surely and in $L^1$ for some $X$.
To see that $X = \mathbb{E}(Y \mid \hat{\mathcal{F}}_\infty)$, we notice that $X$ is $\hat{\mathcal{F}}_\infty$-measurable. So it is enough to prove that
\[ \mathbb{E}(X \mathbf{1}_A) = \mathbb{E}(\mathbb{E}(Y \mid \hat{\mathcal{F}}_\infty) \mathbf{1}_A) \]
for all $A \in \hat{\mathcal{F}}_\infty$. Indeed, we have
\begin{align*}
\mathbb{E}(X \mathbf{1}_A) &= \lim_{n \to \infty} \mathbb{E}(X_n \mathbf{1}_A) \\
&= \lim_{n \to \infty} \mathbb{E}(\mathbb{E}(Y \mid \hat{\mathcal{F}}_n) \mathbf{1}_A) \\
&= \lim_{n \to \infty} \mathbb{E}(Y \mathbf{1}_A) \\
&= \mathbb{E}(Y \mathbf{1}_A) \\
&= \mathbb{E}(\mathbb{E}(Y \mid \hat{\mathcal{F}}_\infty) \mathbf{1}_A),
\end{align*}
where the second and last steps use $A \in \hat{\mathcal{F}}_\infty \subseteq \hat{\mathcal{F}}_n$.
Theorem (Kolmogorov 0-1 law). Let $(X_n)_{n \geq 0}$ be independent random variables, and let
\[ \hat{\mathcal{F}}_n = \sigma(X_{n+1}, X_{n+2}, \ldots). \]
Then the tail $\sigma$-algebra $\hat{\mathcal{F}}_\infty$ is trivial, i.e. $\mathbb{P}(A) \in \{0, 1\}$ for all $A \in \hat{\mathcal{F}}_\infty$.
Proof. Let $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$. Then $\mathcal{F}_n$ and $\hat{\mathcal{F}}_n$ are independent. So for all $A \in \hat{\mathcal{F}}_\infty$, since $A \in \hat{\mathcal{F}}_n$ for every $n$, we have
\[ \mathbb{E}(\mathbf{1}_A \mid \mathcal{F}_n) = \mathbb{P}(A). \]
But the left-hand side is a martingale. So it converges almost surely and in $L^1$ to $\mathbb{E}(\mathbf{1}_A \mid \mathcal{F}_\infty)$. But $\mathbf{1}_A$ is $\mathcal{F}_\infty$-measurable, since $\hat{\mathcal{F}}_\infty \subseteq \mathcal{F}_\infty$. So this is just $\mathbf{1}_A$. So $\mathbf{1}_A = \mathbb{P}(A)$ almost surely, and we are done.
Theorem (Strong law of large numbers). Let $(X_n)_{n \geq 1}$ be iid random variables in $L^1$, with $\mathbb{E} X_1 = \mu$. Define
\[ S_n = \sum_{i=1}^n X_i. \]
Then
\[ \frac{S_n}{n} \to \mu \quad \text{as } n \to \infty \]
almost surely and in $L^1$.
Proof. By symmetry of the iid $X_i$, we have
\[ S_n = \mathbb{E}(S_n \mid S_n) = \sum_{i=1}^n \mathbb{E}(X_i \mid S_n) = n \, \mathbb{E}(X_1 \mid S_n). \]
So the problem is equivalent to showing that $\mathbb{E}(X_1 \mid S_n) \to \mu$ as $n \to \infty$. This seems like something we can tackle with our existing technology, except that the $S_n$ do not form a filtration.
Thus, define a backwards filtration
\[ \hat{\mathcal{F}}_n = \sigma(S_n, S_{n+1}, S_{n+2}, \ldots) = \sigma(S_n, X_{n+1}, X_{n+2}, \ldots) = \sigma(S_n, \tau_n), \]
where $\tau_n = \sigma(X_{n+1}, X_{n+2}, \ldots)$. We now use a property of conditional expectation that we haven't used so far: adding independent information to a conditioning $\sigma$-algebra doesn't change the result. Since $\tau_n$ is independent of $\sigma(X_1, S_n)$, we know
\[ \frac{S_n}{n} = \mathbb{E}(X_1 \mid S_n) = \mathbb{E}(X_1 \mid \hat{\mathcal{F}}_n). \]
Thus, by backwards martingale convergence, we know
\[ \frac{S_n}{n} \to \mathbb{E}(X_1 \mid \hat{\mathcal{F}}_\infty) \]
almost surely and in $L^1$. But by the Kolmogorov 0-1 law, we know $\hat{\mathcal{F}}_\infty$ is trivial. So $\mathbb{E}(X_1 \mid \hat{\mathcal{F}}_\infty)$ is almost surely constant, and the constant has to be
\[ \mathbb{E}(\mathbb{E}(X_1 \mid \hat{\mathcal{F}}_\infty)) = \mathbb{E}(X_1) = \mu. \]
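As a sanity check, the convergence $S_n/n \to \mu$ is easy to watch numerically. The following sketch (the setup and all names are my own, not from the notes) simulates iid $\mathrm{Exp}(1)$ variables, so $\mu = 1$:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Simulate S_n / n for iid Exponential(1) samples, so mu = E[X_1] = 1.
n = 200_000
s = 0.0
for _ in range(n):
    s += random.expovariate(1.0)  # one sample X_i

running_mean = s / n  # S_n / n, which the SLLN says tends to mu = 1
print(running_mean)
```

With $n = 200{,}000$ samples the empirical mean lands within a few standard errors ($\approx 1/\sqrt{n}$) of $\mu = 1$.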
Recall that if $(E, \mathcal{E}, \mu)$ is a measure space and $f \in m\mathcal{E}^+$, then
\[ \nu(A) = \mu(f \mathbf{1}_A) \]
is a measure on $(E, \mathcal{E})$. We say $f$ is a density of $\nu$ with respect to $\mu$.
We can ask an "inverse" question: given two different measures on $(E, \mathcal{E})$, when is it the case that one is given by a density with respect to the other?
A first observation is that if $\nu(A) = \mu(f \mathbf{1}_A)$, then whenever $\mu(A) = 0$, we must have $\nu(A) = 0$. However, this condition is not sufficient. For example, let $\mu$ be the counting measure on $\mathbb{R}$, and $\nu$ the Lebesgue measure. Then our condition is satisfied. However, if $\nu$ were given by a density $f$ with respect to $\mu$, we would need
\[ 0 = \nu(\{x\}) = \mu(f \mathbf{1}_{\{x\}}) = f(x) \]
for every $x$. So $f \equiv 0$, but taking $f \equiv 0$ clearly doesn't give the Lebesgue measure.

The problem here is that $\mu$ is not a $\sigma$-finite measure.
Theorem (Radon–Nikodym). Let $(\Omega, \mathcal{F})$ be a measurable space, and let $\mathbb{Q}$ and $\mathbb{P}$ be two probability measures on $(\Omega, \mathcal{F})$. Then the following are equivalent:

(i) $\mathbb{Q}$ is absolutely continuous with respect to $\mathbb{P}$, i.e. for any $A \in \mathcal{F}$, if $\mathbb{P}(A) = 0$, then $\mathbb{Q}(A) = 0$.

(ii) For any $\varepsilon > 0$, there exists $\delta > 0$ such that for all $A \in \mathcal{F}$, if $\mathbb{P}(A) \leq \delta$, then $\mathbb{Q}(A) \leq \varepsilon$.

(iii) There exists a random variable $X \geq 0$ such that
\[ \mathbb{Q}(A) = \mathbb{E}_{\mathbb{P}}(X \mathbf{1}_A) \]
for all $A \in \mathcal{F}$.

In this case, $X$ is called the Radon–Nikodym derivative of $\mathbb{Q}$ with respect to $\mathbb{P}$, and we write $X = \frac{\mathrm{d}\mathbb{Q}}{\mathrm{d}\mathbb{P}}$.

Note that this theorem extends to all finite measures by scaling, and thus to $\sigma$-finite measures by partitioning the space into sets of finite measure.
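On a finite probability space the theorem is elementary: if $\mathbb{P}$ charges every point, then $X(\omega) = \mathbb{Q}(\{\omega\})/\mathbb{P}(\{\omega\})$ works. The following sketch (the measures are made-up illustrative numbers) verifies $\mathbb{Q}(A) = \mathbb{E}_{\mathbb{P}}(X \mathbf{1}_A)$ over every event:

```python
from itertools import chain, combinations

# Finite sample space with two probability measures; P > 0 everywhere,
# so Q is automatically absolutely continuous with respect to P.
omega = [0, 1, 2]
P = {0: 0.2, 1: 0.3, 2: 0.5}
Q = {0: 0.1, 1: 0.6, 2: 0.3}

# Pointwise Radon-Nikodym derivative X = dQ/dP.
X = {w: Q[w] / P[w] for w in omega}

# Check Q(A) = E_P(X 1_A) for every event A (all subsets of omega).
events = chain.from_iterable(combinations(omega, r) for r in range(4))
ok = all(
    abs(sum(Q[w] for w in A) - sum(X[w] * P[w] for w in A)) < 1e-12
    for A in events
)
print(ok)
```

The check holds by construction: on each singleton, $X(\omega)\mathbb{P}(\{\omega\}) = \mathbb{Q}(\{\omega\})$, and both sides are additive over disjoint unions.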
Proof. We shall only treat the case where $\mathcal{F}$ is countably generated, i.e. $\mathcal{F} = \sigma(F_1, F_2, \ldots)$ for some sets $F_i$. For example, the Borel $\sigma$-algebra of any second-countable topological space is countably generated.

(iii) $\Rightarrow$ (i): Clear.

(ii) $\Rightarrow$ (iii): Define the filtration
\[ \mathcal{F}_n = \sigma(F_1, F_2, \ldots, F_n). \]
Since $\mathcal{F}_n$ is finite, we can write it as
\[ \mathcal{F}_n = \sigma(A_{n,1}, \ldots, A_{n,m_n}), \]
where each $A_{n,i}$ is an atom, i.e. if $B \subsetneq A_{n,i}$ and $B \in \mathcal{F}_n$, then $B = \emptyset$. We define
\[ X_n = \sum_{i=1}^{m_n} \frac{\mathbb{Q}(A_{n,i})}{\mathbb{P}(A_{n,i})} \mathbf{1}_{A_{n,i}}, \]
where we skip over the terms with $\mathbb{P}(A_{n,i}) = 0$. Note that this is exactly designed so that for any $A \in \mathcal{F}_n$, which is a disjoint union of atoms, we have
\[ \mathbb{E}_{\mathbb{P}}(X_n \mathbf{1}_A) = \mathbb{E}_{\mathbb{P}}\Bigg( \sum_{i :\, A_{n,i} \subseteq A} \frac{\mathbb{Q}(A_{n,i})}{\mathbb{P}(A_{n,i})} \mathbf{1}_{A_{n,i}} \Bigg) = \mathbb{Q}(A). \]
Thus, if $A \in \mathcal{F}_n \subseteq \mathcal{F}_{n+1}$, we have
\[ \mathbb{E}(X_{n+1} \mathbf{1}_A) = \mathbb{Q}(A) = \mathbb{E}(X_n \mathbf{1}_A). \]
So we know that
\[ \mathbb{E}(X_{n+1} \mid \mathcal{F}_n) = X_n. \]
It is also immediate that $(X_n)_{n \geq 0}$ is adapted. So it is a martingale.
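The construction of $X_n$ from the atoms can be played out concretely. In this sketch (a toy example with my own illustrative numbers), $\Omega = \{0, 1, 2, 3\}$, the atoms of $\mathcal{F}_1$ are refined into those of $\mathcal{F}_2$, and we check both $\mathbb{E}_{\mathbb{P}}(X_n \mathbf{1}_A) = \mathbb{Q}(A)$ on atoms $A \in \mathcal{F}_1$ and the martingale identity:

```python
# Omega = {0,1,2,3}; P and Q given pointwise, P > 0 everywhere.
P = [0.1, 0.2, 0.3, 0.4]
Q = [0.25, 0.25, 0.25, 0.25]
partition1 = [[0, 1], [2, 3]]      # atoms of F_1
partition2 = [[0], [1], [2], [3]]  # atoms of F_2, refining F_1

def density(partition):
    """X_n(w) = Q(atom containing w) / P(atom containing w)."""
    X = [0.0] * 4
    for atom in partition:
        p = sum(P[w] for w in atom)
        q = sum(Q[w] for w in atom)
        for w in atom:
            X[w] = q / p  # the proof skips atoms with p == 0; here p > 0
    return X

X1, X2 = density(partition1), density(partition2)

for A in partition1:
    # E_P(X_1 1_A) = Q(A) for each atom A of F_1.
    assert abs(sum(X1[w] * P[w] for w in A) - sum(Q[w] for w in A)) < 1e-12
    # Martingale property: E_P(X_2 1_A) = E_P(X_1 1_A) for A in F_1,
    # which is exactly E_P(X_2 | F_1) = X_1.
    assert abs(sum(X2[w] * P[w] for w in A) - sum(X1[w] * P[w] for w in A)) < 1e-12
print("checks passed")
```

On the atom $\{0,1\}$, for instance, $X_1 = 0.5/0.3$, and averaging $X_2$ against $\mathbb{P}$ over that atom gives back the same $\mathbb{Q}$-mass $0.5$.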
We next show that $(X_n)_{n \geq 0}$ is uniformly integrable. By Markov's inequality, we have
\[ \mathbb{P}(X_n \geq \lambda) \leq \frac{\mathbb{E} X_n}{\lambda} = \frac{1}{\lambda} \leq \delta \]
for $\lambda$ large enough, since $\mathbb{E} X_n = \mathbb{Q}(\Omega) = 1$. Then, as $\{X_n \geq \lambda\} \in \mathcal{F}_n$,
\[ \mathbb{E}(X_n \mathbf{1}_{X_n \geq \lambda}) = \mathbb{Q}(X_n \geq \lambda) \leq \varepsilon. \]
So we have shown uniform integrability, and hence $X_n \to X$ almost surely and in $L^1$ for some $X$. Then for all $A \in \bigcup_{n \geq 0} \mathcal{F}_n$, we have
\[ \mathbb{Q}(A) = \lim_{n \to \infty} \mathbb{E}(X_n \mathbf{1}_A) = \mathbb{E}(X \mathbf{1}_A). \]
So $\mathbb{Q}(\,\cdot\,)$ and $\mathbb{E}(X \mathbf{1}_{(\,\cdot\,)})$ agree on $\bigcup_{n \geq 0} \mathcal{F}_n$, which is a generating $\pi$-system for $\mathcal{F}$, so they must be the same.
(i) $\Rightarrow$ (ii): Suppose not. Then there exist some $\varepsilon > 0$ and some $A_1, A_2, \ldots \in \mathcal{F}$ such that
\[ \mathbb{Q}(A_n) \geq \varepsilon, \quad \mathbb{P}(A_n) \leq \frac{1}{2^n}. \]
Since $\sum_n \mathbb{P}(A_n)$ is finite, by Borel–Cantelli, we know
\[ \mathbb{P}\left( \limsup A_n \right) = 0. \]
On the other hand, by, say, dominated convergence, we have
\[ \mathbb{Q}\left( \limsup A_n \right) = \mathbb{Q}\left( \bigcap_{n=1}^\infty \bigcup_{m=n}^\infty A_m \right) = \lim_{k \to \infty} \mathbb{Q}\left( \bigcap_{n=1}^k \bigcup_{m=n}^\infty A_m \right) = \lim_{k \to \infty} \mathbb{Q}\left( \bigcup_{m=k}^\infty A_m \right) \geq \varepsilon, \]
since $\mathbb{Q}\big( \bigcup_{m=k}^\infty A_m \big) \geq \mathbb{Q}(A_k) \geq \varepsilon$ for every $k$. This contradicts (i), as $\mathbb{P}(\limsup A_n) = 0$.
Finally, we end the part on discrete time processes by relating what we have done to Markov chains.

Let's first recall what Markov chains are. Let $E$ be a countable space, and $\mu$ a measure on $E$. We write $\mu_x = \mu(\{x\})$, and then $\mu(f) = \sum_{x \in E} \mu_x f(x)$, i.e. $\mu(f) = \mu \cdot f$ if we view $\mu$ as a row vector.
Definition (Transition matrix). A transition matrix is a matrix $P = (p_{xy})_{x, y \in E}$ such that each row $p_x = (p_{xy})_{y \in E}$ is a probability measure on $E$.
Definition (Markov chain). An adapted process $(X_n)$ is called a Markov chain if for any $n$ and any $A \in \mathcal{F}_n$ of positive probability with $A \subseteq \{X_n = x\}$, we have
\[ \mathbb{P}(X_{n+1} = y \mid A) = p_{xy}. \]
Definition (Harmonic function). A function $f : E \to \mathbb{R}$ is harmonic if $P f = f$. In other words, for any $x \in E$, we have
\[ \sum_{y \in E} p_{xy} f(y) = f(x). \]
We then observe the following:

Proposition. If $f$ is harmonic and bounded, and $(X_n)_{n \geq 0}$ is a Markov chain, then $(f(X_n))_{n \geq 0}$ is a martingale.
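For a concrete instance (my own toy example, not from the notes): for simple symmetric random walk on $\{0, \ldots, N\}$ absorbed at the endpoints, $f(x) = x$ is harmonic, and it is bounded since the state space is finite, so $(f(X_n))$ is a martingale. The sketch below checks $Pf = f$ directly from the transition matrix:

```python
N = 5  # state space {0, ..., N}

# Transition matrix of simple symmetric random walk on {0, ..., N},
# absorbed at the endpoints 0 and N.
Pmat = [[0.0] * (N + 1) for _ in range(N + 1)]
Pmat[0][0] = Pmat[N][N] = 1.0
for x in range(1, N):
    Pmat[x][x - 1] = Pmat[x][x + 1] = 0.5

f = list(range(N + 1))  # f(x) = x

# Harmonicity: (Pf)(x) = sum_y p_{xy} f(y) should equal f(x) for every x.
Pf = [sum(Pmat[x][y] * f[y] for y in range(N + 1)) for x in range(N + 1)]
assert Pf == f
print("f is harmonic")
```

At the absorbing states $Pf(x) = f(x)$ trivially, and in the interior $\tfrac{1}{2}(x-1) + \tfrac{1}{2}(x+1) = x$; by optional stopping, this martingale is what gives the usual gambler's ruin probabilities.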
Example. Let $(X_n)_{n \geq 0}$ be iid $\mathbb{Z}$-valued random variables in $L^1$ with $\mathbb{E}[X_i] = 0$. Then
\[ S_n = X_0 + \cdots + X_n \]
is both a martingale and a Markov chain.
However, if $Z$ is a $\mathbb{Z}$-valued random variable (say bounded and independent of the $X_n$, so that integrability and the martingale property are preserved under the extra conditioning), consider the process $(Z S_n)_{n \geq 0}$ with the filtration $\tilde{\mathcal{F}}_n = \sigma(\mathcal{F}_n, Z)$. Then this is a martingale but not, in general, a Markov chain.