6 Large deviations
So far, we have been interested in the average or “typical” behaviour of our
processes. Now, we are interested in “extreme cases”, i.e. events with small
probability. In general, our objective is to show that these probabilities tend to
zero very quickly.
Let $(X_n)_{n \geq 0}$ be a sequence of iid integrable random variables in $\mathbb{R}$ with mean value $\mathbb{E} X_1 = \bar{x}$ and finite variance $\sigma^2$. We let
\[
  S_n = X_1 + \cdots + X_n.
\]
By the central limit theorem, we have
\[
  P(S_n \geq n\bar{x} + \sqrt{n}\,\sigma a) \to P(Z \geq a) \quad \text{as } n \to \infty,
\]
where $Z \sim N(0, 1)$. This implies
\[
  P(S_n \geq an) \to 0
\]
for any $a > \bar{x}$. The question is then: how fast does this go to zero?
There is a very useful lemma in the theory of sequences that tells us this probability vanishes exponentially quickly with $n$. Note that
\[
  P(S_{m+n} \geq a(m + n)) \geq P(S_m \geq am)\, P(S_n \geq an),
\]
since the event on the left contains the intersection of the independent events $\{S_m \geq am\}$ and $\{S_{m+n} - S_m \geq an\}$. So the sequence $P(S_n \geq an)$ is super-multiplicative. Thus, the sequence
\[
  b_n = -\log P(S_n \geq an)
\]
is sub-additive.
Lemma (Fekete). If $(b_n)$ is a non-negative sub-additive sequence, then
\[
  \lim_{n \to \infty} \frac{b_n}{n}
\]
exists.
This implies the rate of decrease is exponential. Can we do better than that, and pin down the rate of decrease exactly?
For $\lambda \geq 0$, consider the moment generating function
\[
  M(\lambda) = \mathbb{E} e^{\lambda X_1}.
\]
We set $\psi(\lambda) = \log M(\lambda)$, and the Legendre transform of $\psi$ is
\[
  \psi^*(a) = \sup_{\lambda \geq 0} (\lambda a - \psi(\lambda)).
\]
Note that these things may be infinite.
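As a concrete illustration (a standard computation, with the Gaussian case chosen here purely as an example; it is not used in what follows): if $X_1 \sim N(\bar{x}, \sigma^2)$, then
\[
  \psi(\lambda) = \lambda \bar{x} + \tfrac{1}{2} \lambda^2 \sigma^2,
\]
and for $a > \bar{x}$ the supremum of $\lambda a - \psi(\lambda)$ is attained at $\lambda = (a - \bar{x})/\sigma^2 \geq 0$, giving
\[
  \psi^*(a) = \frac{(a - \bar{x})^2}{2\sigma^2}.
\]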
Theorem (Cramér's theorem). For $a > \bar{x}$, we have
\[
  \lim_{n \to \infty} \frac{1}{n} \log P(S_n \geq an) = -\psi^*(a).
\]
Note that we always have
\[
  \psi^*(a) = \sup_{\lambda \geq 0} (\lambda a - \psi(\lambda)) \geq 0 \cdot a - \psi(0) = 0.
\]
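Continuing the Gaussian illustration from above (again only an example), the theorem then says that for $a > \bar{x}$,
\[
  P(S_n \geq an) = e^{-n \left( \frac{(a - \bar{x})^2}{2\sigma^2} + o(1) \right)},
\]
so the large deviation probability decays exponentially, with an explicit rate.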
Proof. We first prove an upper bound. For any $\lambda \geq 0$, Markov tells us
\[
  P(S_n \geq an) = P(e^{\lambda S_n} \geq e^{\lambda an}) \leq e^{-\lambda an}\, \mathbb{E} e^{\lambda S_n} = e^{-\lambda an} \prod_{i=1}^{n} \mathbb{E} e^{\lambda X_i} = e^{-\lambda an} M(\lambda)^n = e^{-n(\lambda a - \psi(\lambda))}.
\]
Since $\lambda$ was arbitrary, we can pick $\lambda$ to maximize $\lambda a - \psi(\lambda)$, and so by definition of $\psi^*(a)$, we have $P(S_n \geq an) \leq e^{-n \psi^*(a)}$. So it follows that
\[
  \limsup_{n \to \infty} \frac{1}{n} \log P(S_n \geq an) \leq -\psi^*(a).
\]
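To see the optimization over $\lambda$ in action, here is a second standard example (not part of these notes): let $X_i = \pm 1$ with probability $\frac{1}{2}$ each, so $\bar{x} = 0$, and take $a \in (0, 1)$. Then $\psi(\lambda) = \log \cosh \lambda$, the maximizer of $\lambda a - \psi(\lambda)$ solves $a = \tanh \lambda$, i.e. $\lambda = \operatorname{artanh} a$, and a short computation gives
\[
  \psi^*(a) = \frac{1+a}{2} \log(1+a) + \frac{1-a}{2} \log(1-a),
\]
so $P(S_n \geq an) \leq e^{-n \psi^*(a)}$ with this rate function.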
The lower bound is a bit more involved. One checks that by translating each $X_i$ by $a$ (replacing $X_i$ with $X_i - a$), we may assume $a = 0$, and in particular, $\bar{x} < 0$.
So we want to prove that
\[
  \liminf_{n \to \infty} \frac{1}{n} \log P(S_n \geq 0) \geq \inf_{\lambda \geq 0} \psi(\lambda).
\]
We consider cases:
If $P(X_1 \leq 0) = 1$, then
\[
  P(S_n \geq 0) = P(X_i = 0 \text{ for } i = 1, \ldots, n) = P(X_1 = 0)^n.
\]
So in fact
\[
  \liminf_{n \to \infty} \frac{1}{n} \log P(S_n \geq 0) = \log P(X_1 = 0).
\]
But by monotone convergence, we have
\[
  P(X_1 = 0) = \lim_{\lambda \to \infty} \mathbb{E} e^{\lambda X_1}.
\]
So we are done.
Consider the case $P(X_1 > 0) > 0$, but $P(X_1 \in [-K, K]) = 1$ for some $K$.
The idea is to modify $X_1$ so that it has mean $0$. For $\mu = \mu_{X_1}$, we define a new distribution by the density
\[
  \frac{d\mu_\theta}{d\mu}(x) = \frac{e^{\theta x}}{M(\theta)}.
\]
We define
\[
  g(\theta) = \int x \, d\mu_\theta(x).
\]
We claim that $g$ is continuous for $\theta \geq 0$. Indeed, by definition,
\[
  g(\theta) = \frac{\int x e^{\theta x} \, d\mu(x)}{\int e^{\theta x} \, d\mu(x)},
\]
and both the numerator and denominator are continuous in $\theta$ by dominated convergence.
Now observe that $g(0) = \bar{x} < 0$, and, since $P(X_1 > 0) > 0$, the tilted law puts ever more weight on positive values as $\theta$ grows, so
\[
  \limsup_{\theta \to \infty} g(\theta) > 0.
\]
So by the intermediate value theorem, we can find some $\theta_0$ such that $g(\theta_0) = 0$.
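To get a feel for the tilting, here is an illustration only (the Gaussian is unbounded, so strictly it belongs to the final case below): if $\mu = N(\bar{x}, \sigma^2)$, then completing the square shows
\[
  \mu_\theta = N(\bar{x} + \theta \sigma^2, \sigma^2),
\]
so $g(\theta) = \bar{x} + \theta \sigma^2$, and $g(\theta_0) = 0$ exactly at $\theta_0 = -\bar{x}/\sigma^2 > 0$ (recall $\bar{x} < 0$ here).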
Define $\mu^{\theta_0}_n$ to be the law of the sum of $n$ iid random variables with law $\mu_{\theta_0}$.
We have
\[
  P(S_n \geq 0) \geq P(S_n \in [0, \varepsilon n]) \geq \mathbb{E}\left[ e^{\theta_0 (S_n - \varepsilon n)} \mathbf{1}_{S_n \in [0, \varepsilon n]} \right],
\]
using the fact that on the event $S_n \in [0, \varepsilon n]$, we have $e^{\theta_0 (S_n - \varepsilon n)} \leq 1$. So we have
\[
  P(S_n \geq 0) \geq M(\theta_0)^n e^{-\theta_0 \varepsilon n}\, \mu^{\theta_0}_n(\{S_n \in [0, \varepsilon n]\}).
\]
By the central limit theorem, for each fixed $\varepsilon$, we know
\[
  \mu^{\theta_0}_n(\{S_n \in [0, \varepsilon n]\}) \to \frac{1}{2} \quad \text{as } n \to \infty,
\]
since under $\mu_{\theta_0}$ the summands have mean $0$.
So we can write
\[
  \liminf_{n \to \infty} \frac{1}{n} \log P(S_n \geq 0) \geq \psi(\theta_0) - \theta_0 \varepsilon.
\]
Then take the limit $\varepsilon \to 0$ to conclude the result.
Finally, we drop the boundedness assumption, and only assume $P(X_1 > 0) > 0$. We define $\nu$ to be the law of $X_1$ conditioned on the event $\{|X_1| \leq K\}$. Let $\nu_n$ be the law of the sum of $n$ iid random variables with law $\nu$. Define
\begin{align*}
  \psi_K(\lambda) &= \log \int_{-K}^{K} e^{\lambda x} \, d\mu(x), \\
  \psi_\nu(\lambda) &= \log \int_{-\infty}^{\infty} e^{\lambda x} \, d\nu(x) = \psi_K(\lambda) - \log \mu(\{|X_1| \leq K\}).
\end{align*}
Note that for $K$ large enough, $\int x \, d\nu(x) < 0$, since this integral converges to $\bar{x} < 0$ as $K \to \infty$. So we can use the previous case. By definition of $\nu$, we have
\[
  \mu_n([0, \infty)) \geq \nu_n([0, \infty))\, \mu(|X_1| \leq K)^n.
\]
So we have
\begin{align*}
  \liminf_{n \to \infty} \frac{1}{n} \log \mu_n([0, \infty))
    &\geq \log \mu(|X_1| \leq K) + \liminf_{n \to \infty} \frac{1}{n} \log \nu_n([0, \infty)) \\
    &\geq \log \mu(|X_1| \leq K) + \inf_\lambda \psi_\nu(\lambda) \\
    &= \inf_\lambda \psi_K(\lambda) \\
    &= \psi_K(\lambda_K)
\end{align*}
for some $\lambda_K$ attaining the infimum.
Since $\psi_K$ increases as $K$ increases to infinity, these bounds increase to some limit $J$, and we have
\[
  \liminf_{n \to \infty} \frac{1}{n} \log \mu_n([0, \infty)) \geq J. \tag{$*$}
\]
Since the $\psi_K(\lambda)$ are continuous, the sets $\{\lambda : \psi_K(\lambda) \leq J\}$ are non-empty, compact and nested in $K$. By Cantor's theorem, we can find
\[
  \lambda_0 \in \bigcap_K \{\lambda : \psi_K(\lambda) \leq J\}.
\]
So the RHS of $(*)$ satisfies
\[
  J \geq \sup_K \psi_K(\lambda_0) = \psi(\lambda_0) \geq \inf_\lambda \psi(\lambda),
\]
and combining this with $(*)$ gives the desired lower bound.
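As one last worked instance (a standard exercise, not from the notes): if $X_1$ is exponential with rate $1$, so $\bar{x} = 1$, then for $\lambda \in [0, 1)$ we have $M(\lambda) = 1/(1 - \lambda)$, hence $\psi(\lambda) = -\log(1 - \lambda)$, and for $a > 1$ the optimizer is $\lambda = 1 - 1/a$, giving
\[
  \psi^*(a) = a - 1 - \log a,
\]
which Cramér's theorem identifies as the exponential decay rate of $P(S_n \geq an)$.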