III Advanced Probability

4 Weak convergence of measures
Often, we may want to consider random variables defined on different spaces.
Since we cannot directly compare them, a sensible approach would be to use
them to push our measure forward to R, and compare them on R.
Definition (Law). Let $X$ be a random variable on $(\Omega, \mathcal{F}, \mathbb{P})$. The law of $X$ is the probability measure $\mu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ defined by
\[
  \mu(A) = \mathbb{P}(X^{-1}(A)).
\]
Example. For $x \in \mathbb{R}$, we have the Dirac $\delta$ measure
\[
  \delta_x(A) = \mathbf{1}_{\{x \in A\}}.
\]
This is the law of a random variable that constantly takes the value $x$.
Now if we have a sequence $x_n \to x$, then we would like to say $\delta_{x_n} \to \delta_x$. In what sense is this true? Suppose $f$ is continuous. Then
\[
  \int f \, \mathrm{d}\delta_{x_n} = f(x_n) \to f(x) = \int f \, \mathrm{d}\delta_x.
\]
So we do have some sort of convergence if we pair it with a continuous function.
Definition (Weak convergence). Let $(\mu_n)_{n \geq 0}$, $\mu$ be probability measures on a metric space $(M, d)$ with the Borel measure. We say that $\mu_n \Rightarrow \mu$, or $\mu_n$ converges weakly to $\mu$, if
\[
  \mu_n(f) \to \mu(f)
\]
for all bounded continuous $f$.
If $(X_n)_{n \geq 0}$ are random variables, then we say $(X_n)$ converges in distribution if $\mu_{X_n}$ converges weakly.
Note that in general, weak convergence does not say anything about how
measures of subsets behave.
Example. If $x_n \to x$, then $\delta_{x_n} \Rightarrow \delta_x$ weakly. However, if $x_n \neq x$ for all $n$, then $\delta_{x_n}(\{x\}) = 0$ but $\delta_x(\{x\}) = 1$. So
\[
  \delta_{x_n}(\{x\}) \not\to \delta_x(\{x\}).
\]
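A quick numerical sketch of both halves of this example (the choices $f = \cos$ and $x_n = 1/n$ are illustrative, not from the text):

```python
import math

# Pair delta_{x_n} with a bounded continuous f: integration against a
# Dirac measure is just evaluation, so delta_{x_n}(f) = f(x_n).
f = math.cos
x = 0.0
x_seq = [1.0 / n for n in (1, 10, 100, 1000)]

for x_n in x_seq:
    print(x_n, f(x_n))        # values approach f(0) = 1.0

# But point masses do not converge setwise: delta_{x_n}({0}) = 0 always,
# while delta_0({0}) = 1.
print([1.0 if x_n == x else 0.0 for x_n in x_seq])  # [0.0, 0.0, 0.0, 0.0]
```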
Example. Take $M = [0, 1]$. Let
\[
  \mu_n = \frac{1}{n} \sum_{k=1}^n \delta_{k/n}.
\]
Then
\[
  \mu_n(f) = \frac{1}{n} \sum_{k=1}^n f\left(\frac{k}{n}\right).
\]
These are Riemann sums, so $\mu_n$ converges weakly to the Lebesgue measure.
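This convergence can be checked numerically; a minimal sketch (the test function $f(x) = x^2$, with $\int_0^1 x^2 \, \mathrm{d}x = 1/3$, is an arbitrary choice):

```python
def mu_n(f, n):
    """Integrate f against mu_n = (1/n) * sum of Dirac masses at k/n."""
    return sum(f(k / n) for k in range(1, n + 1)) / n

f = lambda x: x * x        # bounded continuous on [0, 1]
lebesgue = 1 / 3           # integral of x^2 over [0, 1] w.r.t. Lebesgue measure

for n in [10, 100, 1000]:
    print(n, mu_n(f, n))   # Riemann sums approaching 1/3
```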
Proposition. Let $(\mu_n)_{n \geq 0}$, $\mu$ be as above. Then the following are equivalent:

(i) $(\mu_n)_{n \geq 0}$ converges weakly to $\mu$.

(ii) For all open $G$, we have
\[
  \liminf_{n \to \infty} \mu_n(G) \geq \mu(G).
\]

(iii) For all closed $A$, we have
\[
  \limsup_{n \to \infty} \mu_n(A) \leq \mu(A).
\]

(iv) For all $A$ such that $\mu(\partial A) = 0$, we have
\[
  \lim_{n \to \infty} \mu_n(A) = \mu(A).
\]

(v) (when $M = \mathbb{R}$) $F_{\mu_n}(x) \to F_\mu(x)$ for all $x$ at which $F_\mu$ is continuous, where $F_\mu$ is the distribution function of $\mu$, defined by $F_\mu(x) = \mu((-\infty, x])$.
Proof.

(i) $\Rightarrow$ (ii): The idea is to approximate the open set by continuous functions. Let $A$ be open, so $A^c$ is closed, and we can define
\[
  f_N(x) = 1 \wedge (N \cdot \operatorname{dist}(x, A^c)).
\]
This has the property that for all $N > 0$, we have $f_N \leq \mathbf{1}_A$, and moreover $f_N \nearrow \mathbf{1}_A$ as $N \to \infty$. Now by definition of weak convergence,
\[
  \liminf_{n \to \infty} \mu_n(A) \geq \liminf_{n \to \infty} \mu_n(f_N) = \mu(f_N) \to \mu(A) \text{ as } N \to \infty.
\]
(ii) $\Leftrightarrow$ (iii): Take complements.
(ii) and (iii) $\Rightarrow$ (iv): Take $A$ such that $\mu(\partial A) = 0$. Then
\[
  \mu(A) = \mu(\mathring{A}) = \mu(\bar{A}).
\]
So we know that
\[
  \liminf_{n \to \infty} \mu_n(A) \geq \liminf_{n \to \infty} \mu_n(\mathring{A}) \geq \mu(\mathring{A}) = \mu(A).
\]
Similarly, applying (iii) to the closed set $\bar{A}$, we find that
\[
  \mu(A) \geq \limsup_{n \to \infty} \mu_n(A).
\]
So we are done.
(iv) $\Rightarrow$ (i): Let $f$ be bounded and continuous; by adding a constant, we may assume $f \geq 0$. We have
\[
  \mu(f) = \int_M f(x) \, \mathrm{d}\mu(x) = \int_M \int_0^\infty \mathbf{1}_{f(x) \geq t} \, \mathrm{d}t \, \mathrm{d}\mu(x) = \int_0^\infty \mu(\{f \geq t\}) \, \mathrm{d}t.
\]
Since $f$ is continuous, $\partial\{f \geq t\} \subseteq \{f = t\}$. Now there can only be countably many $t$'s such that $\mu(\{f = t\}) > 0$. So replacing $\mu(\{f \geq t\})$ by $\lim_{n \to \infty} \mu_n(\{f \geq t\})$ only changes the integrand at countably many places, hence doesn't affect the integral. So we conclude using the bounded convergence theorem.
(iv) $\Rightarrow$ (v): Assume $t$ is a continuity point of $F_\mu$. Then we have
\[
  \mu(\partial(-\infty, t]) = \mu(\{t\}) = F_\mu(t) - F_\mu(t^-) = 0.
\]
So $\mu_n((-\infty, t]) \to \mu((-\infty, t])$, and we are done.
(v) $\Rightarrow$ (ii): If $A = (a, b)$, then
\[
  \mu_n(A) \geq F_{\mu_n}(b') - F_{\mu_n}(a')
\]
for any $a \leq a' \leq b' < b$ with $a', b'$ continuity points of $F_\mu$. So we know that
\[
  \liminf_{n \to \infty} \mu_n(A) \geq F_\mu(b') - F_\mu(a') = \mu((a', b']).
\]
By taking the supremum over all such $a', b'$, we find that
\[
  \liminf_{n \to \infty} \mu_n(A) \geq \mu(A).
\]
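A small sanity check that the inequalities in (ii) and (iii) can be strict, using $\mu_n = \delta_{1/n} \Rightarrow \delta_0$ (the sets $G = (0, 1)$ and $A = \{0\}$ are illustrative choices, not from the text):

```python
# mu_n = Dirac mass at 1/n; the weak limit is mu = Dirac mass at 0.
def mu_n(indicator, n):
    """Measure of a Borel set (given by its indicator) under delta_{1/n}."""
    return 1.0 if indicator(1.0 / n) else 0.0

def mu(indicator):
    """Measure of the set under the weak limit delta_0."""
    return 1.0 if indicator(0.0) else 0.0

G = lambda x: 0 < x < 1   # open set: liminf mu_n(G) = 1 > mu(G) = 0
A = lambda x: x == 0      # closed set: limsup mu_n(A) = 0 < mu(A) = 1

print([mu_n(G, n) for n in (2, 10, 100)], mu(G))   # [1.0, 1.0, 1.0] 0.0
print([mu_n(A, n) for n in (2, 10, 100)], mu(A))   # [0.0, 0.0, 0.0] 1.0
```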
Definition (Tight probability measures). A sequence of probability measures $(\mu_n)_{n \geq 0}$ on a metric space $(M, d)$ is tight if for all $\varepsilon > 0$, there exists a compact $K \subseteq M$ such that
\[
  \sup_n \mu_n(M \setminus K) \leq \varepsilon.
\]
Note that this is always satisfied for compact metric spaces.
Theorem (Prokhorov's theorem). If $(\mu_n)_{n \geq 0}$ is a sequence of tight probability measures, then there is a subsequence $(\mu_{n_k})_{k \geq 0}$ and a measure $\mu$ such that $\mu_{n_k} \Rightarrow \mu$.
To see how this can fail without the tightness assumption, suppose we define measures $\mu_n$ on $\mathbb{R}$ by
\[
  \mu_n(A) = \tilde{\mu}(A \cap [n, n+1]),
\]
where $\tilde{\mu}$ is the Lebesgue measure. Then for any bounded set $S$, we have $\lim_{n \to \infty} \mu_n(S) = 0$. Thus, if the weak limit existed, it would have to be everywhere zero, but this does not give a probability measure.
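A sketch of how the mass escapes to infinity in this example (the bounded set $S = [-10, 10]$ and the specific $n$ values are arbitrary choices):

```python
def mu_n(a, b, n):
    """mu_n([a, b]): Lebesgue measure of [a, b] intersected with [n, n+1]."""
    lo, hi = max(a, n), min(b, n + 1)
    return max(0.0, hi - lo)

# Each mu_n is a probability measure, but any bounded set eventually
# misses the interval [n, n+1] entirely.
for n in [5, 9, 10, 50]:
    print(n, mu_n(-10.0, 10.0, n))   # 1.0, 1.0, 0.0, 0.0
```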
We shall prove this only in the case
M
=
R
. It is not difficult to construct a
candidate of what the weak limit should be. Simply use Bolzano–Weierstrass to
pick a subsequence of the measures such that the distribution functions converge
on the rationals. Then the limit would essentially be what we want. We then
apply tightness to show that this is a genuine distribution.
Proof. Take $\mathbb{Q} \subseteq \mathbb{R}$, which is dense and countable. Let $x_1, x_2, \ldots$ be an enumeration of $\mathbb{Q}$. Define $F_n = F_{\mu_n}$. By Bolzano–Weierstrass, and some fiddling around with sequences, we can find a subsequence $(F_{n_k})$ such that
\[
  F_{n_k}(x_i) \to y_i =: F(x_i)
\]
as $k \to \infty$, for each fixed $x_i$.
Since $F$ is non-decreasing on $\mathbb{Q}$, it has left and right limits everywhere. We extend $F$ to $\mathbb{R}$ by taking right limits. This implies $F$ is càdlàg.
Take $x$ a continuity point of $F$. Then for each $\varepsilon > 0$, there exist rationals $s < x < t$ such that
\[
  |F(s) - F(t)| < \frac{\varepsilon}{2}.
\]
Take $k$ large enough such that $|F_{n_k}(s) - F(s)| < \frac{\varepsilon}{4}$, and the same for $t$. Then by monotonicity of $F$ and $F_{n_k}$, we have
\[
  |F_{n_k}(x) - F(x)| \leq |F(s) - F(t)| + |F_{n_k}(s) - F(s)| + |F_{n_k}(t) - F(t)| \leq \varepsilon.
\]
It remains to show that $F(x) \to 1$ as $x \to \infty$ and $F(x) \to 0$ as $x \to -\infty$. By tightness, for all $\varepsilon > 0$, there exists $N > 0$ such that for all $n$,
\[
  \mu_n((-\infty, -N]) \leq \varepsilon, \quad \mu_n((N, \infty)) \leq \varepsilon.
\]
This then implies what we want.
We shall end the chapter with an alternative characterization of weak con-
vergence, using characteristic functions.
Definition (Characteristic function). Let $X$ be a random variable taking values in $\mathbb{R}^d$. The characteristic function of $X$ is the function $\mathbb{R}^d \to \mathbb{C}$ defined by
\[
  \varphi_X(t) = \mathbb{E} e^{i \langle t, X \rangle} = \int_{\mathbb{R}^d} e^{i \langle t, x \rangle} \, \mathrm{d}\mu_X(x).
\]
Note that $\varphi_X$ is continuous by bounded convergence, and $\varphi_X(0) = 1$.
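As a sketch, one can estimate $\varphi_X$ by Monte Carlo and compare with a known case: for a standard normal, $\varphi_X(t) = e^{-t^2/2}$ (a standard fact used here purely as a reference value; the sample size and grid of $t$'s are arbitrary choices):

```python
import cmath, math, random

random.seed(0)

def phi_empirical(xs, t):
    """Monte Carlo estimate of E[exp(i t X)] from samples xs."""
    return sum(cmath.exp(1j * t * x) for x in xs) / len(xs)

# Standard normal samples; the exact characteristic function is exp(-t^2/2).
xs = [random.gauss(0.0, 1.0) for _ in range(50_000)]
for t in [0.0, 0.5, 1.0, 2.0]:
    print(t, phi_empirical(xs, t), math.exp(-t * t / 2))
```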
Proposition. If $\varphi_X = \varphi_Y$, then $\mu_X = \mu_Y$.
Theorem (Lévy's convergence theorem). Let $(X_n)_{n \geq 0}$, $X$ be random variables taking values in $\mathbb{R}^d$. Then the following are equivalent:

(i) $\mu_{X_n} \Rightarrow \mu_X$ as $n \to \infty$.

(ii) $\varphi_{X_n} \to \varphi_X$ pointwise.
We will in fact prove a stronger theorem.
Theorem (Lévy). Let $(X_n)_{n \geq 0}$ be as above, and let $\varphi_{X_n}(t) \to \psi(t)$ for all $t$. Suppose $\psi$ is continuous at $0$ and $\psi(0) = 1$. Then there exists a random variable $X$ such that $\varphi_X = \psi$ and $\mu_{X_n} \Rightarrow \mu_X$ as $n \to \infty$.
We will only prove the case d = 1. We first need the following lemma:
Lemma. Let $X$ be a real random variable. Then for all $\lambda > 0$,
\[
  \mu_X(|x| \geq \lambda) \leq C \lambda \int_0^{1/\lambda} (1 - \operatorname{Re} \varphi_X(t)) \, \mathrm{d}t,
\]
where $C = (1 - \sin 1)^{-1}$.
Proof. For $M \geq 1$, we have
\[
  \int_0^M (1 - \cos t) \, \mathrm{d}t = M - \sin M \geq M(1 - \sin 1).
\]
By setting $M = \frac{|X|}{\lambda}$, we have
\[
  \mathbf{1}_{|X| \geq \lambda} \leq \frac{C \lambda}{|X|} \int_0^{|X|/\lambda} (1 - \cos t) \, \mathrm{d}t.
\]
By a change of variables $t \mapsto Xt$, we have
\[
  \mathbf{1}_{|X| \geq \lambda} \leq C \lambda \int_0^{1/\lambda} (1 - \cos Xt) \, \mathrm{d}t.
\]
Integrate against $\mu_X$ (using Fubini), and use the fact that $\operatorname{Re} \varphi_X(t) = \mathbb{E} \cos(Xt)$.
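One can sanity-check the lemma numerically for a standard normal, where both sides are computable: $\operatorname{Re} \varphi_X(t) = e^{-t^2/2}$ and $\mathbb{P}(|X| \geq \lambda) = \operatorname{erfc}(\lambda/\sqrt{2})$ (standard facts; the integration grid and the $\lambda$ values are arbitrary choices):

```python
import math

C = 1.0 / (1.0 - math.sin(1.0))   # the constant from the lemma

def rhs(lam, steps=10_000):
    """C * lam * integral_0^{1/lam} (1 - Re phi(t)) dt for phi(t) = exp(-t^2/2)."""
    h = (1.0 / lam) / steps
    # midpoint rule for the integral over [0, 1/lam]
    total = sum(1.0 - math.exp(-((k + 0.5) * h) ** 2 / 2) for k in range(steps)) * h
    return C * lam * total

def lhs(lam):
    """P(|X| >= lam) for X standard normal."""
    return math.erfc(lam / math.sqrt(2.0))

for lam in [0.5, 1.0, 2.0, 4.0]:
    print(lam, lhs(lam), rhs(lam))   # the bound lhs <= rhs holds in each case
```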
We can now prove Lévy's theorem.
Proof of theorem. It is clear that weak convergence implies convergence of characteristic functions, since $x \mapsto e^{i \langle t, x \rangle}$ is bounded and continuous.

Now observe that $\mu_n \Rightarrow \mu$ iff from every subsequence $(n_k)_{k \geq 0}$, we can choose a further subsequence $(n_{k_\ell})$ such that $\mu_{n_{k_\ell}} \Rightarrow \mu$ as $\ell \to \infty$. Indeed, the forward direction is clear, and suppose $\mu_n \not\Rightarrow \mu$ but the subsequence property holds. Then we can choose a bounded and continuous function $f$ such that
\[
  \mu_n(f) \not\to \mu(f).
\]
Then there is some $\varepsilon > 0$ and a subsequence $(n_k)_{k \geq 0}$ such that $|\mu_{n_k}(f) - \mu(f)| > \varepsilon$ for all $k$. Then no further subsequence of this can converge to $\mu$, a contradiction.
Thus, to show the converse, we only need to prove the existence of subsequential weak limits (uniqueness of the limit follows from the convergence of characteristic functions and the previous proposition). By Prokhorov's theorem, it is enough to prove tightness of the whole sequence.
Fix $\varepsilon > 0$. Since $\psi$ is continuous at $0$ with $\psi(0) = 1$, the average of $1 - \operatorname{Re} \psi$ over $[0, 1/\lambda]$ tends to $0$ as $\lambda \to \infty$, so we can choose $\lambda$ so large that
\[
  C \lambda \int_0^{1/\lambda} (1 - \operatorname{Re} \psi(t)) \, \mathrm{d}t < \frac{\varepsilon}{2}.
\]
By bounded convergence, since $\varphi_{X_n} \to \psi$ pointwise, we have
\[
  C \lambda \int_0^{1/\lambda} (1 - \operatorname{Re} \varphi_{X_n}(t)) \, \mathrm{d}t \leq \varepsilon
\]
for all sufficiently large $n$; the finitely many remaining $\mu_{X_n}$ are individually tight, so after enlarging $\lambda$ if necessary, the bound holds for all $n$. Thus, by our previous lemma, we know $(\mu_{X_n})_{n \geq 0}$ is tight. So we are done.
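As a closing sketch, this is the engine behind the standard proof of the central limit theorem: the characteristic functions of normalized sums converge pointwise to $e^{-t^2/2}$, which is continuous at $0$ with value $1$, so Lévy's theorem gives convergence in distribution to a standard normal. A small Monte Carlo illustration (uniform summands, the sample sizes, and the $t$ grid are all arbitrary choices):

```python
import cmath, math, random

random.seed(1)

def phi_empirical(samples, t):
    """Monte Carlo estimate of E[exp(i t X)] from samples."""
    return sum(cmath.exp(1j * t * s) for s in samples) / len(samples)

def normalized_sum(n):
    """(S_n - E S_n) / sd(S_n) for S_n a sum of n Uniform(0, 1) variables."""
    s = sum(random.random() for _ in range(n))
    return (s - n / 2) / math.sqrt(n / 12)

n = 30
samples = [normalized_sum(n) for _ in range(20_000)]
for t in [0.5, 1.0, 2.0]:
    print(t, phi_empirical(samples, t), math.exp(-t * t / 2))  # close for each t
```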