6 Large deviations
So far, we have been interested in the average or “typical” behaviour of our
processes. Now, we are interested in “extreme cases”, i.e. events with small
probability. In general, our objective is to show that these probabilities tend to
zero very quickly.
Let $(X_n)_{n \geq 0}$ be a sequence of iid integrable random variables in $\mathbb{R}$ with mean value $\mathbb{E} X_1 = \bar{x}$ and finite variance $\sigma^2$. We let
\[
  S_n = X_1 + \cdots + X_n.
\]
By the central limit theorem, we have
\[
  P(S_n \geq n\bar{x} + \sqrt{n}\,\sigma a) \to P(Z \geq a) \quad \text{as } n \to \infty,
\]
where $Z \sim N(0, 1)$. This implies
\[
  P(S_n \geq an) \to 0
\]
for any $a > \bar{x}$. The question is then: how fast does this go to zero?
There is a very useful lemma in the theory of sequences that tells us this probability vanishes exponentially quickly with $n$. Note that
\[
  P(S_{m+n} \geq a(m + n)) \geq P(S_m \geq am)\, P(S_n \geq an),
\]
since the event on the left contains the intersection of the independent events $\{S_m \geq am\}$ and $\{S_{m+n} - S_m \geq an\}$. So the sequence $P(S_n \geq an)$ is super-multiplicative. Thus, the sequence
\[
  b_n = -\log P(S_n \geq an)
\]
is sub-additive.
Lemma (Fekete). If $(b_n)$ is a non-negative sub-additive sequence, then
\[
  \lim_{n \to \infty} \frac{b_n}{n}
\]
exists.
This implies the rate of decrease is exponential. Can we do better than that, and pin down the rate of decrease exactly?
For $\lambda \geq 0$, consider the moment generating function
\[
  M(\lambda) = \mathbb{E} e^{\lambda X_1}.
\]
We set $\psi(\lambda) = \log M(\lambda)$, and the Legendre transform of $\psi$ is
\[
  \psi^*(a) = \sup_{\lambda \geq 0} (\lambda a - \psi(\lambda)).
\]
Note that these things may be infinite.
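As a concrete illustration (a standard computation, with the Gaussian case chosen here purely as an example; it is not used in what follows): if $X_1 \sim N(\bar{x}, \sigma^2)$, then
\[
  \psi(\lambda) = \lambda \bar{x} + \tfrac{1}{2} \lambda^2 \sigma^2,
\]
and for $a > \bar{x}$ the supremum of $\lambda a - \psi(\lambda)$ is attained at $\lambda = (a - \bar{x})/\sigma^2 \geq 0$, giving
\[
  \psi^*(a) = \frac{(a - \bar{x})^2}{2\sigma^2}.
\]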
Theorem (Cramér's theorem). For $a > \bar{x}$, we have
\[
  \lim_{n \to \infty} \frac{1}{n} \log P(S_n \geq an) = -\psi^*(a).
\]
Note that we always have
\[
  \psi^*(a) = \sup_{\lambda \geq 0} (\lambda a - \psi(\lambda)) \geq 0 \cdot a - \psi(0) = 0.
\]
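Continuing the Gaussian illustration from above (again only an example), the theorem then says that for $a > \bar{x}$,
\[
  P(S_n \geq an) = e^{-n \left( \frac{(a - \bar{x})^2}{2\sigma^2} + o(1) \right)},
\]
so the large deviation probability decays exponentially, with an explicit rate.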
Proof. We first prove an upper bound. For any $\lambda \geq 0$, Markov tells us
\[
  P(S_n \geq an) = P(e^{\lambda S_n} \geq e^{\lambda an}) \leq e^{-\lambda an}\, \mathbb{E} e^{\lambda S_n} = e^{-\lambda an} \prod_{i=1}^{n} \mathbb{E} e^{\lambda X_i} = e^{-\lambda an} M(\lambda)^n = e^{-n(\lambda a - \psi(\lambda))}.
\]
Since $\lambda$ was arbitrary, we can pick $\lambda$ to maximize $\lambda a - \psi(\lambda)$, and so by definition of $\psi^*(a)$, we have $P(S_n \geq an) \leq e^{-n \psi^*(a)}$. So it follows that
\[
  \limsup_{n \to \infty} \frac{1}{n} \log P(S_n \geq an) \leq -\psi^*(a).
\]
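To see the optimization over $\lambda$ in action, here is a second standard example (not part of these notes): let $X_i = \pm 1$ with probability $\frac{1}{2}$ each, so $\bar{x} = 0$, and take $a \in (0, 1)$. Then $\psi(\lambda) = \log \cosh \lambda$, the maximizer of $\lambda a - \psi(\lambda)$ solves $a = \tanh \lambda$, i.e. $\lambda = \operatorname{artanh} a$, and a short computation gives
\[
  \psi^*(a) = \frac{1+a}{2} \log(1+a) + \frac{1-a}{2} \log(1-a),
\]
so $P(S_n \geq an) \leq e^{-n \psi^*(a)}$ with this rate function.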
The lower bound is a bit more involved. One checks that by translating each $X_i$ by $a$ (replacing $X_i$ with $X_i - a$), we may assume $a = 0$, and in particular, $\bar{x} < 0$.
So we want to prove that
\[
  \liminf_{n \to \infty} \frac{1}{n} \log P(S_n \geq 0) \geq \inf_{\lambda \geq 0} \psi(\lambda).
\]
We consider cases:
If $P(X_1 \leq 0) = 1$, then
\[
  P(S_n \geq 0) = P(X_i = 0 \text{ for } i = 1, \ldots, n) = P(X_1 = 0)^n.
\]
So in fact
\[
  \liminf_{n \to \infty} \frac{1}{n} \log P(S_n \geq 0) = \log P(X_1 = 0).
\]
But by monotone convergence, we have
\[
  P(X_1 = 0) = \lim_{\lambda \to \infty} \mathbb{E} e^{\lambda X_1}.
\]
So we are done.
Consider the case $P(X_1 > 0) > 0$, but $P(X_1 \in [-K, K]) = 1$ for some $K$.
The idea is to modify $X_1$ so that it has mean $0$. For $\mu = \mu_{X_1}$, we define a new distribution by the density
\[
  \frac{d\mu_\theta}{d\mu}(x) = \frac{e^{\theta x}}{M(\theta)}.
\]
We define
\[
  g(\theta) = \int x \, d\mu_\theta(x).
\]
We claim that $g$ is continuous for $\theta \geq 0$. Indeed, by definition,
\[
  g(\theta) = \frac{\int x e^{\theta x} \, d\mu(x)}{\int e^{\theta x} \, d\mu(x)},
\]
and both the numerator and denominator are continuous in $\theta$ by dominated convergence.
Now observe that $g(0) = \bar{x} < 0$, and, since $P(X_1 > 0) > 0$, the tilted law puts ever more weight on positive values as $\theta$ grows, so
\[
  \limsup_{\theta \to \infty} g(\theta) > 0.
\]
So by the intermediate value theorem, we can find some $\theta_0$ such that $g(\theta_0) = 0$.
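To get a feel for the tilting, here is an illustration only (the Gaussian is unbounded, so strictly it belongs to the final case below): if $\mu = N(\bar{x}, \sigma^2)$, then completing the square shows
\[
  \mu_\theta = N(\bar{x} + \theta \sigma^2, \sigma^2),
\]
so $g(\theta) = \bar{x} + \theta \sigma^2$, and $g(\theta_0) = 0$ exactly at $\theta_0 = -\bar{x}/\sigma^2 > 0$ (recall $\bar{x} < 0$ here).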
Define $\mu^{\theta_0}_n$ to be the law of the sum of $n$ iid random variables with law $\mu_{\theta_0}$.
We have
\[
  P(S_n \geq 0) \geq P(S_n \in [0, \varepsilon n]) \geq \mathbb{E}\left[ e^{\theta_0 (S_n - \varepsilon n)} \mathbf{1}_{S_n \in [0, \varepsilon n]} \right],
\]
using the fact that on the event $S_n \in [0, \varepsilon n]$, we have $e^{\theta_0 (S_n - \varepsilon n)} \leq 1$. So we have
\[
  P(S_n \geq 0) \geq M(\theta_0)^n e^{-\theta_0 \varepsilon n}\, \mu^{\theta_0}_n(\{S_n \in [0, \varepsilon n]\}).
\]
By the central limit theorem, for each fixed $\varepsilon$, we know
\[
  \mu^{\theta_0}_n(\{S_n \in [0, \varepsilon n]\}) \to \frac{1}{2} \quad \text{as } n \to \infty,
\]
since under $\mu_{\theta_0}$ the summands have mean $0$.
So we can write
\[
  \liminf_{n \to \infty} \frac{1}{n} \log P(S_n \geq 0) \geq \psi(\theta_0) - \theta_0 \varepsilon.
\]
Then take the limit $\varepsilon \to 0$ to conclude the result.
Finally, we drop the boundedness assumption, and only assume $P(X_1 > 0) > 0$. We define $\nu$ to be the law of $X_1$ conditioned on the event $\{|X_1| \leq K\}$. Let $\nu_n$ be the law of the sum of $n$ iid random variables with law $\nu$. Define
\begin{align*}
  \psi_K(\lambda) &= \log \int_{-K}^{K} e^{\lambda x} \, d\mu(x), \\
  \psi_\nu(\lambda) &= \log \int_{-\infty}^{\infty} e^{\lambda x} \, d\nu(x) = \psi_K(\lambda) - \log \mu(\{|X_1| \leq K\}).
\end{align*}
Note that for $K$ large enough, $\int x \, d\nu(x) < 0$, since this integral converges to $\bar{x} < 0$ as $K \to \infty$. So we can use the previous case. By definition of $\nu$, we have
\[
  \mu_n([0, \infty)) \geq \nu_n([0, \infty))\, \mu(|X_1| \leq K)^n.
\]
So we have
\begin{align*}
  \liminf_{n \to \infty} \frac{1}{n} \log \mu_n([0, \infty))
    &\geq \log \mu(|X_1| \leq K) + \liminf_{n \to \infty} \frac{1}{n} \log \nu_n([0, \infty)) \\
    &\geq \log \mu(|X_1| \leq K) + \inf_\lambda \psi_\nu(\lambda) \\
    &= \inf_\lambda \psi_K(\lambda) \\
    &= \psi_K(\lambda_K)
\end{align*}
for some $\lambda_K$ attaining the infimum.
Since $\psi_K$ increases as $K$ increases to infinity, these bounds increase to some limit $J$, and we have
\[
  \liminf_{n \to \infty} \frac{1}{n} \log \mu_n([0, \infty)) \geq J. \tag{$*$}
\]
Since the $\psi_K(\lambda)$ are continuous, the sets $\{\lambda : \psi_K(\lambda) \leq J\}$ are non-empty, compact and nested in $K$. By Cantor's theorem, we can find
\[
  \lambda_0 \in \bigcap_K \{\lambda : \psi_K(\lambda) \leq J\}.
\]
So the RHS of $(*)$ satisfies
\[
  J \geq \sup_K \psi_K(\lambda_0) = \psi(\lambda_0) \geq \inf_\lambda \psi(\lambda),
\]
and combining this with $(*)$ gives the desired lower bound.
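As one last worked instance (a standard exercise, not from the notes): if $X_1$ is exponential with rate $1$, so $\bar{x} = 1$, then for $\lambda \in [0, 1)$ we have $M(\lambda) = 1/(1 - \lambda)$, hence $\psi(\lambda) = -\log(1 - \lambda)$, and for $a > 1$ the optimizer is $\lambda = 1 - 1/a$, giving
\[
  \psi^*(a) = a - 1 - \log a,
\]
which Cramér's theorem identifies as the exponential decay rate of $P(S_n \geq an)$.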