Part II — Probability and Measure
Based on lectures by J. Miller
Notes taken by Dexter Chua
Michaelmas 2016
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
Analysis II is essential
Measure spaces,
σ
-algebras,
π
-systems and uniqueness of extension, statement *and
proof* of Carath´eodory’s extension theorem. Construction of Lebesgue measure on
R
.
The Borel
σ
-algebra of
R
. Existence of non-measurable subsets of
R
. Lebesgue-Stieltjes
measures and probability distribution functions. Independence of events, independence
of σ-algebras. The Borel–Cantelli lemmas. Kolmogorov’s zero-one law. [6]
Measurable functions, random variables, independence of random variables. Construc-
tion of the integral, expectation. Convergence in measure and convergence almost
everywhere. Fatou’s lemma, monotone and dominated convergence, differentiation
under the integral sign. Discussion of product measure and statement of Fubini’s
theorem. [6]
Chebyshev’s inequality, tail estimates. Jensen’s inequality. Completeness of
L
p
for
1 ≤ p ≤ ∞. The H¨older and Minkowski inequalities, uniform integrability. [4]
L
2
as a Hilbert space. Orthogonal projection, relation with elementary conditional
probability. Variance and covariance. Gaussian random variables, the multivariate
normal distribution. [2]
The strong law of large numbers, proof for independent random variables with bounded
fourth moments. Measure preserving transformations, Bernoulli shifts. Statements
*and proofs* of maximal ergodic theorem and Birkhoff’s almost everywhere ergodic
theorem, proof of the strong law. [4]
The Fourier transform of a finite measure, characteristic functions, uniqueness and
inversion. Weak convergence, statement of L´evy’s convergence theorem for characteristic
functions. The central limit theorem. [2]
Contents
0 Introduction
1 Measures
1.1 Measures
1.2 Probability measures
2 Measurable functions and random variables
2.1 Measurable functions
2.2 Constructing new measures
2.3 Random variables
2.4 Convergence of measurable functions
2.5 Tail events
3 Integration
3.1 Definition and basic properties
3.2 Integrals and limits
3.3 New measures from old
3.4 Integration and differentiation
3.5 Product measures and Fubini’s theorem
4 Inequalities and L
p
spaces
4.1 Four inequalities
4.2 L
p
spaces
4.3 Orthogonal projection in L
2
4.4 Convergence in L
1
(P) and uniform integrability
5 Fourier transform
5.1 The Fourier transform
5.2 Convolutions
5.3 Fourier inversion formula
5.4 Fourier transform in L
2
5.5 Properties of characteristic functions
5.6 Gaussian random variables
6 Ergodic theory
6.1 Ergodic theorems
7 Big theorems
7.1 The strong law of large numbers
7.2 Central limit theorem
0 Introduction
In measure theory, the main idea is that we want to assign “sizes” to different
sets. For example, we might think [0
,
2]
⊆ R
has size 2, while perhaps
Q ⊆ R
has
size 0. This is known as a measure. One of the main applications of a measure
is that we can use it to come up with a new definition of an integral. The idea
is very simple, but it is going to be very powerful mathematically.
Recall that if
f
: [0
,
1]
→ R
is continuous, then the Riemann integral of
f
is
defined as follows:
(i) Take a partition 0 = t
0
< t
1
< ··· < t
n
= 1 of [0, 1].
(ii) Consider the Riemann sum
n
X
j=1
f(t
j
)(t
j
− t
j−1
)
(iii) The Riemann integral is
Z
f = Limit of Riemann sums as the mesh size of the partition → 0.
x
y
0
t
1
t
2
t
3
···
t
k
t
k+1
···
···
1
The idea of measure theory is to use a different approximation scheme. Instead
of partitioning the domain, we partition the range of the function. We fix some
numbers r
0
< r
1
< r
2
< ··· < r
n
.
We then approximate the integral of f by
n
X
j=1
r
j
· (“size of f
−1
([r
j−1
, r
j
])”).
We then define the integral as the limit of approximations of this type as the
mesh size of the partition → 0.
x
y
We can make an analogy with bankers — If a Riemann banker is given a stack
of money, they would just add the values of the money in order. A measure-
theoretic banker will sort the bank notes according to the type, and then find
the total value by multiplying the number of each type by the value, and adding
up.
Why would we want to do so? It turns out this leads to a much more
general theory of integration on much more general spaces. Instead of integrating
functions [
a, b
]
→ R
only, we can replace the domain with any measure space.
Even in the context of
R
, this theory of integration is much much more powerful
than the Riemann sum, and can integrate a much wider class of functions. While
you probably don’t care about those pathological functions anyway, being able
to integrate more things means that we can state more general theorems about
integration without having to put in funny conditions.
That was all about measures. What about probability? It turns out the
concepts we develop for measures correspond exactly to many familiar notions
from probability if we restrict it to the particular case where the total measure
of the space is 1. Thus, when studying measure theory, we are also secretly
studying probability!
1 Measures
In the course, we will write
f
n
% f
for “
f
n
converges to
f
monotonically
increasingly”, and
f
n
& f
similarly. Unless otherwise specified, convergence is
taken to be pointwise.
1.1 Measures
The starting point of all these is to come up with a function that determines
the “size” of a given set, known as a measure. It turns out we cannot sensibly
define a size for all subsets of [0
,
1]. Thus, we need to restrict our attention to a
collection of “nice” subsets. Specifying which subsets are “nice” would involve
specifying a σ-algebra.
This section is mostly technical.
Definition
(
σ
-algebra)
.
Let
E
be a set. A
σ
-algebra
E
on
E
is a collection of
subsets of E such that
(i) ∅ ∈ E.
(ii) A ∈ E implies that A
C
= X \ A ∈ E.
(iii) For any sequence (A
n
) in E, we have that
[
n
A
n
∈ E.
The pair (E, E) is called a measurable space.
Note that the axioms imply that
σ
-algebras are also closed under countable
intersections, as we have A ∩ B = (A
C
∪ B
C
)
C
.
Definition
(Measure)
.
A measure on a measurable space (
E, E
) is a function
µ : E → [0, ∞] such that
(i) µ(∅) = 0
(ii) Countable additivity: For any disjoint sequence (A
n
) in E, then
µ
[
n
A
n
!
=
∞
X
n=1
µ(A
n
).
Example.
Let
E
be any countable set, and
E
=
P
(
E
) be the set of all subsets
of
E
. A mass function is any function
m
:
E →
[0
, ∞
]. We can then define a
measure by setting
µ(A) =
X
x∈A
m(x).
In particular, if we put
m
(
x
) = 1 for all
x ∈ E
, then we obtain the counting
measure.
Countable spaces are nice, because we can always take
E
=
P
(
E
), and the
measure can be defined on all possible subsets. However, for “bigger” spaces, we
have to be more careful. The set of all subsets is often “too large”. We will see
a concrete and also important example of this later.
In general,
σ
-algebras are often described on large spaces in terms of a smaller
set, known as the generating sets.
Definition
(Generator of
σ
-algebra)
.
Let
E
be a set, and that
A ⊆ P
(
E
) be a
collection of subsets of E. We define
σ(A) = {A ⊆ E : A ∈ E for all σ-algebras E that contain A}.
In other words
σ
(
A
) is the smallest sigma algebra that contains
A
. This is
known as the sigma algebra generated by A.
Example.
Take
E
=
Z
, and
A
=
{{x}
:
x ∈ Z}
. Then
σ
(
A
) is just
P
(
E
), since
every subset of E can be written as a countable union of singletons.
Example.
Take
E
=
Z
, and let
A
=
{{x, x
+ 1
, x
+ 2
, x
+ 3
, ···}
:
x ∈ E}
. Then
again σ(E) is the set of all subsets of E.
The following is the most important σ-algebra in the course:
Definition
(Borel
σ
-algebra)
.
Let
E
=
R
, and
A
=
{U ⊆ R
:
U is open}
. Then
σ(A) is known as the Borel σ-algebra, which is not the set of all subsets of R.
We can equivalently define this by
˜
A
=
{
(
a, b
) :
a < b, a, b ∈ Q}
. Then
σ
(
˜
A
)
is also the Borel σ-algebra.
Often, we would like to prove results that allow us to deduce properties
about the
σ
-algebra just by checking it on a generating set. However, usually,
we cannot just check it on an arbitrary generating set. Instead, the generating
set has to satisfy some nice closure properties. We are now going to introduce a
bunch of many different definitions that you need not aim to remember (except
when exams are near).
Definition
(
π
-system)
.
Let
A
be a collection of subsets of
E
. Then
A
is called
a π-system if
(i) ∅ ∈ A
(ii) If A, B ∈ A, then A ∩ B ∈ A.
Definition
(d-system)
.
Let
A
be a collection of subsets of
E
. Then
A
is called
a d-system if
(i) E ∈ A
(ii) If A, B ∈ A and A ⊆ B, then B \ A ∈ A
(iii) For all increasing sequences (A
n
) in A, we have that
S
n
A
n
∈ A.
The point of d-systems and
π
-systems is that they separate the axioms of a
σ-algebra into two parts. More precisely, we have
Proposition.
A collection
A
is a
σ
-algebra if and only if it is both a
π
-system
and a d-system.
This follows rather straightforwardly from the definitions.
The following definitions are also useful:
Definition
(Ring)
.
A collection of subsets
A
is a ring on
E
if
∅ ∈ A
and for all
A, B ∈ A, we have B \ A ∈ A and A ∪ B ∈ A.
Definition
(Algebra)
.
A collection of subsets
A
is an algebra on
E
if
∅ ∈ A
,
and for all A, B ∈ A, we have A
C
∈ A and A ∪ B ∈ A.
So an algebra is like a
σ
-algebra, but it is just closed under finite unions only,
rather than countable unions.
While the names
π
-system and
d
-system are rather arbitrary, we can make
some sense of the names “ring” and “algebra”. Indeed, a ring forms a ring
(without unity) in the algebraic sense with symmetric difference as “addition”
and intersection as “multiplication”. Then the empty set acts as the additive
identity, and
E
, if present, acts as the multiplicative identity. Similarly, an
algebra is a boolean subalgebra under the boolean algebra P (E).
A very important lemma about these things is Dynkin’s lemma:
Lemma
(Dynkin’s
π
-system lemma)
.
Let
A
be a
π
-system. Then any d-system
which contains A contains σ(A).
This will be very useful in the future. If we want to show that all elements of
σ
(
A
) satisfy a particular property for some generating
π
-system
A
, we just have
to show that the elements of
A
satisfy that property, and that the collection of
things that satisfy the property form a d-system.
While this use case might seem rather contrived, it is surprisingly common
when we have to prove things.
Proof.
Let
D
be the intersection of all d-systems containing
A
, i.e. the smallest
d-system containing
A
. We show that
D
contains
σ
(
A
). To do so, we will show
that D is a π-system, hence a σ-algebra.
There are two steps to the proof, both of which are straightforward verifica-
tions:
(i) We first show that if B ∈ D and A ∈ A, then B ∩ A ∈ D.
(ii) We then show that if A, B ∈ D, then A ∩ B ∈ D.
Then the result immediately follows from the second part.
We let
D
0
= {B ∈ D : B ∩ A ∈ D for all A ∈ A}.
We note that
D
0
⊇ A
because
A
is a
π
-system, and is hence closed under
intersections. We check that
D
0
is a d-system. It is clear that
E ∈ D
0
. If we
have B
1
, B
2
∈ D
0
, where B
1
⊆ B
2
, then for any A ∈ A, we have
(B
2
\ B
1
) ∩ A = (B
2
∩ A) \ (B
1
∩ A).
By definition of
D
0
, we know
B
2
∩ A
and
B
1
∩ A
are elements of
D
. Since
D
is
a d-system, we know this intersection is in D. So B
2
\ B
1
∈ D
0
.
Finally, suppose that (
B
n
) is an increasing sequence in
D
0
, with
B
=
S
B
n
.
Then for every A ∈ A, we have that
[
B
n
∩ A =
[
(B
n
∩ A) = B ∩ A ∈ D.
Therefore B ∈ D
0
.
Therefore
D
0
is a d-system contained in
D
, which also contains
A
. By our
choice of D, we know D
0
= D.
We now let
D
00
= {B ∈ D : B ∩ A ∈ D for all A ∈ D}.
Since
D
0
=
D
, we again have
A ⊆ D
00
, and the same argument as above implies
that
D
00
is a d-system which is between
A
and
D
. But the only way that can
happen is if D
00
= D, and this implies that D is a π-system.
After defining all sorts of things that are “weaker versions” of
σ
-algebras, we
now defined a bunch of measure-like objects that satisfy fewer properties. Again,
no one really remembers these definitions:
Definition
(Set function)
.
Let
A
be a collection of subsets of
E
with
∅ ∈ A
. A
set function function µ : A → [0, ∞] such that µ(∅) = 0.
Definition
(Increasing set function)
.
A set function is increasing if it has the
property that for all A, B ∈ A with A ⊆ B, we have µ(A) ≤ µ(B).
Definition
(Additive set function)
.
A set function is additive if whenever
A, B ∈ A and A ∪ B ∈ A, A ∩ B = ∅, then µ(A ∪ B) = µ(A) + µ(B).
Definition
(Countably additive set function)
.
A set function is countably addi-
tive if whenever A
n
is a sequence of disjoint sets in A with ∪A
n
∈ A, then
µ
[
n
A
n
!
=
X
n
µ(A
n
).
Under these definitions, a measure is just a countable additive set function
defined on a σ-algebra.
Definition
(Countably subadditive set function)
.
A set function is countably
subadditive if whenever (A
n
) is a sequence of sets in A with
S
n
A
n
∈ A, then
µ
[
n
A
n
!
≤
X
n
µ(A
n
).
The big theorem that allows us to construct measures is the Caratheodory
extension theorem. In particular, this will help us construct the Lebesgue measure
on R.
Theorem
(Caratheodory extension theorem)
.
Let
A
be a ring on
E
, and
µ
a countably additive set function on
A
. Then
µ
extends to a measure on the
σ-algebra generated by A.
Proof.
(non-examinable) We start by defining what we want our measure to be.
For B ⊆ E, we set
µ
∗
(B) = inf
(
X
n
µ(A
n
) : (A
n
) ∈ A and B ⊆
[
A
n
)
.
If it happens that there is no such sequence, we set this to be
∞
. This measure is
known as the outer measure. It is clear that
µ
∗
(
φ
) = 0, and that
µ
∗
is increasing.
We say a set A ⊆ E is µ
∗
-measurable if
µ
∗
(B) = µ
∗
(B ∩ A) + µ
∗
(B ∩ A
C
)
for all B ⊆ E. We let
M = {µ
∗
-measurable sets}.
We will show the following:
(i) M is a σ-algebra containing A.
(ii) µ
∗
is a measure on M with µ
∗
|
A
= µ.
Note that it is not true in general that
M
=
σ
(
A
). However, we will always
have M ⊇ σ(A).
We are going to break this up into five nice bite-size chunks.
Claim. µ
∗
is countably subadditive.
Suppose
B ⊆
S
n
B
n
. We need to show that
µ
∗
(
B
)
≤
P
n
µ
∗
(
B
n
). We can
wlog assume that
µ
∗
(
B
n
) is finite for all
n
, or else the inequality is trivial. Let
ε >
0. Then by definition of the outer measure, for each
n
, we can find a
sequence (B
n,m
)
∞
m=1
in A with the property that
B
n
⊆
[
m
B
n,m
and
µ
∗
(B
n
) +
ε
2
n
≥
X
m
µ(B
n,m
).
Then we have
B ⊆
[
n
B
n
⊆
[
n,m
B
n,m
.
Thus, by definition, we have
µ
∗
(B) ≤
X
n,m
µ
∗
(B
n,m
) ≤
X
n
µ
∗
(B
n
) +
ε
2
n
= ε +
X
n
µ
∗
(B
n
).
Since ε was arbitrary, we are done.
Claim. µ
∗
agrees with µ on A.
In the first example sheet, we will show that if
A
is a ring and
µ
is a countably
additive set function on
µ
, then
µ
is in fact countably subadditive and increasing.
Assuming this, suppose that
A,
(
A
n
) are in
A
and
A ⊆
S
n
A
n
. Then by
subadditivity, we have
µ(A) ≤
X
n
µ(A ∩ A
n
) ≤
X
n
µ(A
n
),
using that
µ
is countably subadditivity and increasing. Note that we have to do
this in two steps, rather than just applying countable subadditivity, since we did
not assume that
S
n
A
n
∈ A. Taking the infimum over all sequences, we have
µ(A) ≤ µ
∗
(A).
Also, we see by definition that
µ
(
A
)
≥ µ
∗
(
A
), since
A
covers
A
. So we get that
µ(A) = µ
∗
(A) for all A ∈ A.
Claim. M contains A.
Suppose that A ∈ A and B ⊆ E. We need to show that
µ
∗
(B) = µ
∗
(B ∩ A) + µ
∗
(B ∩ A
C
).
Since
µ
∗
is countably subadditive, we immediately have
µ
∗
(
B
)
≤ µ
∗
(
B ∩ A
) +
µ
∗
(
B ∩ A
C
). For the other inequality, we first observe that it is trivial if
µ
∗
(
B
)
is infinite. If it is finite, then by definition, given
ε >
0, we can find some (
B
n
)
in A such that B ⊆
S
n
B
n
and
µ
∗
(B) + ε ≥
X
n
µ(B
n
).
Then we have
B ∩ A ⊆
[
n
(B
n
∩ A)
B ∩ A
C
⊆
[
n
(B
n
∩ A
C
)
We notice that B
n
∩ A
C
= B
n
\ A ∈ A. Thus, by definition of µ
∗
, we have
µ
∗
(B ∩ A) + µ
∗
(B ∩ A
c
) ≤
X
n
µ(B
n
∩ A) +
X
n
µ(B
n
∩ A
C
)
=
X
n
(µ(B
n
∩ A) + µ(B
n
∩ A
C
))
=
X
n
µ(B
n
)
≤ µ
∗
(B
n
) + ε.
Since ε was arbitrary, the result follows.
Claim. We show that M is an algebra.
We first show that E ∈ M. This is true since we obviously have
µ
∗
(B) = µ
∗
(B ∩ E) + µ
∗
(B ∩ E
C
)
for all B ⊆ E.
Next, note that if A ∈ M, then by definition we have, for all B,
µ
∗
(B) = µ
∗
(B ∩ A) + µ
∗
(B ∩ A
C
).
Now note that this definition is symmetric in
A
and
A
C
. So we also have
A
C
∈ M .
Finally, we have to show that
M
is closed under intersection (which is
equivalent to being closed under union when we have complements). Suppose
A
1
, A
2
∈ M and B ⊆ E. Then we have
µ
∗
(B) = µ
∗
(B ∩ A
1
) + µ
∗
(B ∩ A
C
1
)
= µ
∗
(B ∩ A
1
∩ A
2
) + µ
∗
(B ∩ A
1
∩ A
C
2
) + µ
∗
(B ∩ A
C
1
)
= µ
∗
(B ∩ (A
1
∩ A
2
)) + µ
∗
(B ∩ (A
1
∩ A
2
)
C
∩ A
1
)
+ µ
∗
(B ∩ (A
1
∩ A
2
)
C
∩ A
C
1
)
= µ
∗
(B ∩ (A
1
∩ A
2
)) + µ
∗
(B ∩ (A
1
∩ A
2
)
C
).
So we have A
1
∩ A
2
∈ M. So M is an algebra.
Claim. M is a σ-algebra, and µ
∗
is a measure on M.
To show that
M
is a
σ
-algebra, we need to show that it is closed under
countable unions. We let (
A
n
) be a disjoint collection of sets in
M
, then we
want to show that A =
S
n
A
n
∈ M and µ
∗
(A) =
P
n
µ
∗
(A
n
).
Suppose that B ⊆ E. Then we have
µ
∗
(B) = µ
∗
(B ∩ A
1
) + µ
∗
(B ∩ A
C
1
)
Using the fact that A
2
∈ M and A
1
∩ A
2
= ∅, we have
= µ
∗
(B ∩ A
1
) + µ
∗
(B ∩ A
2
) + µ
∗
(B ∩ A
C
1
∩ A
C
2
)
= ···
=
n
X
i=1
µ
∗
(B ∩ A
i
) + µ
∗
(B ∩ A
C
1
∩ ··· ∩ A
C
n
)
≥
n
X
i=1
µ
∗
(B ∩ A
i
) + µ
∗
(B ∩ A
C
).
Taking the limit as n → ∞, we have
µ
∗
(B) ≥
∞
X
i=1
µ
∗
(B ∩ A
i
) + µ
∗
(B ∩ A
C
).
By the countable-subadditivity of µ
∗
, we have
µ
∗
(B ∩ A) ≤
∞
X
i=1
µ
∗
(B ∩ A
i
).
Thus we obtain
µ
∗
(B) ≥ µ
∗
(B ∩ A) + µ
∗
(B ∩ A
C
).
By countable subadditivity, we also have inequality in the other direction. So
equality holds. So A ∈ M. So M is a σ-algebra.
To see that µ
∗
is a measure on M, note that the above implies that
µ
∗
(B) =
∞
X
i=1
(B ∩ A
i
) + µ
∗
(B ∩ A
C
).
Taking B = A, this gives
µ
∗
(A) =
∞
X
i=1
(A ∩ A
i
) + µ
∗
(A ∩ A
C
) =
∞
X
i=1
µ
∗
(A
i
).
Note that when
A
itself is actually a
σ
-algebra, the outer measure can be
simply written as
µ
∗
(B) = inf{µ(A) : A ∈ A, B ⊆ A}.
Caratheodory gives us the existence of some measure extending the set function
on
A
. Could there be many? In general, there could. However, in the special
case where the measure is finite, we do get uniqueness.
Theorem.
Suppose that
µ
1
, µ
2
are measures on (
E, E
) with
µ
1
(
E
) =
µ
2
(
E
)
<
∞
. If
A
is a
π
-system with
σ
(
A
) =
E
, and
µ
1
agrees with
µ
2
on
A
, then
µ
1
=
µ
2
.
Proof. Let
D = {A ∈ E : µ
1
(A) = µ
2
(A)}
We know that
D ⊇ A
. By Dynkin’s lemma, it suffices to show that
D
is a
d-system. The things to check are:
(i) E ∈ D — this follows by assumption.
(ii) If A, B ∈ D with A ⊆ B, then B \ A ∈ D. Indeed, we have the equations
µ
1
(B) = µ
1
(A) + µ
1
(B \ A) < ∞
µ
2
(B) = µ
2
(A) + µ
2
(B \ A) < ∞.
Since
µ
1
(
B
) =
µ
2
(
B
) and
µ
1
(
A
) =
µ
2
(
A
), we must have
µ
1
(
B \ A
) =
µ
2
(B \ A).
(iii) Let (A
n
) ∈ D be an increasing sequence with
S
A
n
= A. Then
µ
1
(A) = lim
n→∞
µ
1
(A
n
) = lim
n→∞
µ
2
(A
n
) = µ
2
(A).
So A ∈ D.
The assumption that
µ
1
(
E
) =
µ
2
(
E
)
< ∞
is necessary. The theorem does
not necessarily hold without it. We can see this from a simple counterexample:
Example. Let E = Z, and let E = P (E). We let
A = {{x, x + 1, x + 2, ···} : x ∈ E}∪ {∅}.
This is a
π
-system with
σ
(
A
) =
E
. We let
µ
1
(
A
) be the number of elements in
A
,
and
µ
2
= 2
µ
1
(
A
). Then obviously
µ
1
6
=
µ
2
, but
µ
1
(
A
) =
∞
=
µ
2
(
A
) for
A ∈ A
.
Definition
(Borel
σ
-algebra)
.
Let
E
be a topological space. We define the
Borel σ-algebra as
B(E) = σ({U ⊆ E : U is open}).
We write B for B(R).
Definition
(Borel measure and Radon measure)
.
A measure
µ
on (
E, B
(
E
)) is
called a Borel measure. If
µ
(
K
)
< ∞
for all
K ⊆ E
compact, then
µ
is a Radon
measure.
The most important example of a Borel measure we will consider is the
Lebesgue measure.
Theorem. There exists a unique Borel measure µ on R with µ([a, b]) = b − a.
Proof.
We first show uniqueness. Suppose
˜µ
is another measure on
B
satisfying
the above property. We want to apply the previous uniqueness theorem, but our
measure is not finite. So we need to carefully get around that problem.
For each n ∈ Z, we set
µ
n
(A) = µ(A ∩ (n, n + 1]))
˜µ
n
(A) = ˜µ(A ∩ (n, n + 1]))
Then
µ
n
and
˜µ
n
are finite measures on
B
which agree on the
π
-system of intervals
of the form (
a, b
] with
a, b ∈ R
,
a < b
. Therefore we have
µ
n
=
˜µ
n
for all
n ∈ Z
.
Now we have
µ(A) =
X
n∈Z
µ(A ∩ (n, n + 1]) =
X
n∈Z
µ
n
(A) =
X
n∈Z
˜µ
n
(A) = ˜µ(A)
for all Borel sets A.
To show existence, we want to use the Caratheodory extension theorem. We
let A be the collection of finite, disjoint unions of the form
A = (a
1
, b
1
] ∪ (a
2
, b
2
] ∪ ··· ∪ (a
n
, b
n
].
Then
A
is a ring of subsets of
R
, and
σ
(
A
) =
B
(details are to be checked on
the first example sheet).
We set
µ(A) =
n
X
i=1
(b
i
− a
i
).
We note that µ is well-defined, since if
A = (a
1
, b
1
] ∪ ··· ∪ (a
n
, b
n
] = (˜a
1
,
˜
b
1
] ∪ ··· ∪ (˜a
n
,
˜
b
n
],
then
n
X
i=1
(b
i
− a
i
) =
n
X
i=1
(
˜
b
i
− ˜a
i
).
Also, if
µ
is additive,
A, B ∈ A
,
A ∩ B
=
∅
and
A ∪ B ∈ A
, we obviously have
µ(A ∪ B) = µ(A) + µ(B). So µ is additive.
Finally, we have to show that
µ
is in fact countably additive. Let (
A
n
) be a
disjoint sequence in
A
, and let
A
=
S
∞
i=1
A
n
∈ A
. Then we need to show that
µ(A) =
P
∞
n=1
µ(A
n
).
Since µ is additive, we have
µ(A) = µ(A
1
) + µ(A \ A
1
)
= µ(A
1
) + µ(A
2
) + µ(A \ A
1
∪ A
2
)
=
n
X
i=1
µ(A
i
) + µ
A \
n
[
i=1
A
i
!
To finish the proof, we show that
µ
A \
n
[
i=1
A
i
!
→ 0 as n → ∞.
We are going to reduce this to the finite intersection property of compact sets in
R
:
if (
K
n
) is a sequence of compact sets in
R
with the property that
T
n
m=1
K
m
6
=
∅
for all n, then
T
∞
m=1
K
m
6= ∅.
We first introduce some new notation. We let
B
n
= A \
n
[
m=1
A
m
.
We now suppose, for contradiction, that
µ
(
B
n
)
6→
0 as
n → ∞
. Since the
B
n
’s
are decreasing, there must exist ε > 0 such that µ(B
n
) ≥ 2ε for every n.
For each
n
, we take
C
n
∈ A
with the property that
C
n
⊆ B
n
and
µ
(
B
n
\C
n
)
≤
ε
2
n
. This is possible since each
B
n
is just a finite union of intervals. Thus we
have
µ(B
n
) − µ
n
\
m=1
C
m
!
= µ
B
n
\
n
\
m=1
C
m
!
≤ µ
n
[
m=1
(B
m
\ C
m
)
!
≤
n
X
m=1
µ(B
m
\ C
m
)
≤
n
X
m=1
ε
2
m
≤ ε.
On the other hand, we also know that µ(B
n
) ≥ 2ε.
µ
n
\
m=1
C
m
!
≥ ε
for all
n
. We now let that
K
n
=
T
n
m=1
C
m
. Then
µ
(
K
n
)
≥ ε
, and in particular
K
n
6= ∅ for all n.
Thus, the finite intersection property says
∅ 6=
∞
\
n=1
K
n
⊆
∞
\
n=1
B
n
= ∅.
This is a contradiction. So we have µ(B
n
) → 0 as n → ∞. So done.
Definition
(Lebesgue measure)
.
The Lebesgue measure is the unique Borel
measure µ on R with µ([a, b]) = b − a.
Note that the Lebesgue measure is not a finite measure, since
µ
(
R
) =
∞
.
However, it is a σ-finite measure.
Definition
(
σ
-finite measure)
.
Let (
E, E
) be a measurable space, and
µ
a
measure. We say
µ
is
σ
-finite if there exists a sequence (
E
n
) in
E
such that
S
n
E
n
= E and µ(E
n
) < ∞ for all n.
This is the next best thing we can hope after finiteness, and often proofs
that involve finiteness carry over to σ-finite measures.
Proposition. The Lebesgue measure is translation invariant, i.e.
µ(A + x) = µ(A)
for all A ∈ B and x ∈ R, where
A + x = {y + x, y ∈ A}.
Proof. We use the uniqueness of the Lebesgue measure. We let
µ
x
(A) = µ(A + x)
for
A ∈ B
. Then this is a measure on
B
satisfying
µ
x
([
a, b
]) =
b − a
. So the
uniqueness of the Lebesgue measure shows that µ
x
= µ.
It turns out that translation invariance actually characterizes the Lebesgue
measure.
Proposition.
Let
˜µ
be a Borel measure on
R
that is translation invariant and
µ([0, 1]) = 1. Then ˜µ is the Lebesgue measure.
Proof. We show that any such measure must satisfy
µ([a, b]) = b − a.
By additivity and translation invariance, we can show that
µ
([
p, q
]) =
q −p
for all
rational
p < q
. By considering
µ
([
p, p
+ 1
/n
]) for all
n
and using the increasing
property, we know
µ
(
{p}
) = 0. So
µ
(([
p, q
)) =
µ
((
p, q
]) =
µ
((
p, q
)) =
q − p
for
all rational p, q.
Finally, by countable additivity, we can extend this to all real intervals. Then
the result follows from the uniqueness of the Lebesgue measure.
In the proof of the Caratheodory extension theorem, we constructed a measure
µ
∗
on the
σ
-algebra
M
of
µ
∗
-measurable sets which contains
A
. This contains
B
=
σ
(
A
), but could in fact be bigger than it. We call
M
the Lebesgue
σ
-algebra.
Indeed, it can be given by
M = {A ∪ N : A ∈ B, N ⊆ B ∈ B with µ(B) = 0}.
If A ∪ N ∈ M, then µ(A ∪ N ) = µ(A). The proof is left for the example sheet.
It is also true that
M
is strictly larger than
B
, so there exists
A ∈ M
with
A 6∈ B. Construction of such a set was on last year’s exam (2016).
On the other hand, it is also true that not all sets are Lebesgue measurable.
This is a rather funny construction.
Example.
For
x, y ∈
[0
,
1), we say
x ∼ y
if
x − y
is rational. This defines an
equivalence relation on [0
,
1). By the axiom of choice, we pick a representative
of each equivalence class, and put them into a set
S ⊆
[0
,
1). We will show that
S is not Lebesgue measurable.
Suppose that
S
were Lebesgue measurable. We are going to get a contra-
diction to the countable additivity of the Lebesgue measure. For each rational
r ∈ [0, 1) ∩ Q, we define
S
r
= {s + r mod 1 : s ∈ S}.
By translation invariance, we know
S
r
is also Lebesgue measurable, and
µ
(
S
r
) =
µ(S).
Also, by construction of
S
, we know (
S
r
)
r∈Q
is disjoint, and
S
r∈Q
S
r
= [0
,
1).
Now by countable additivity, we have
1 = µ([0, 1)) = µ
[
r∈Q
S
r
=
X
r∈Q
µ(S
r
) =
X
r∈Q
µ(S),
which is clearly not possible. Indeed, if
µ
(
S
) = 0, then this says 1 = 0; If
µ(S) > 0, then this says 1 = ∞. Both are absurd.
1.2 Probability measures
Since the course is called “probability and measure”, we’d better start talking
about probability! It turns out the notions we care about in probability theory
are very naturally just special cases of the concepts we have previously considered.
Definition
(Probability measure and probability space)
.
Let (
E, E
) be a measure
space with the property that
µ
(
E
) = 1. Then we often call
µ
a probability measure,
and (E, E, µ) a probability space.
Probability spaces are usually written as (Ω, F, P) instead.
Definition
(Sample space)
.
In a probability space (Ω
, F, P
), we often call Ω
the sample space.
Definition
(Events)
.
In a probability space (Ω
, F, P
), we often call the elements
of F the events.
Definition
(Probaiblity)
.
In a probability space (Ω
, F, P
), if
A ∈ F
, we often
call P[A] the probability of the event A.
These are exactly the same things as measures, but with different names!
However, thinking of them as probabilities could make us ask different questions
about these measure spaces. For example, in probability, one is often interested
in independence.
Definition
(Independence of events)
.
A sequence of events (
A
n
) is said to be
independent if
P
"
\
n∈J
A
n
#
=
Y
n∈J
P[A
n
]
for all finite subsets J ⊆ N.
However, it turns out that talking about independence of events is usually
too restrictive. Instead, we want to talk about the independence of σ-algebras:
Definition
(Independence of
σ
-algebras)
.
A sequence of
σ
-algebras (
A
n
) with
A
n
⊆ F
for all
n
is said to be independent if the following is true: If (
A
n
) is a
sequence where A
n
∈ A
n
for all n, them (A
n
) is independent.
Proposition.
Events (
A
n
) are independent iff the
σ
-algebras
σ
(
A
n
) are inde-
pendent.
While proving this directly would be rather tedious (but not too hard), it is
an immediate consequence of the following theorem:
Theorem. Suppose A
1
and A
2
are π-systems in F. If
P[A
1
∩ A
2
] = P[A
1
]P[A
2
]
for all A
1
∈ A
1
and A
2
∈ A
2
, then σ(A
1
) and σ(A
2
) are independent.
Proof.
This will follow from two applications of the fact that a finite measure is
determined by its values on a π-system which generates the entire σ-algebra.
We first fix A
1
∈ A
1
. We define the measures
µ(A) = P[A ∩ A
1
]
and
ν(A) = P[A]P[A
1
]
for all
A ∈ F
. By assumption, we know
µ
and
ν
agree on
A
2
, and we have that
µ(Ω) = P[A
1
] = ν(Ω) ≤ 1 < ∞. So µ and ν agree on σ(A
2
). So we have
P[A
1
∩ A
2
] = µ(A
2
) = ν(A
2
) = P[A
1
]P[A
2
]
for all A
2
∈ σ(A
2
).
So we have now shown that if
A
1
and
A
2
are independent, then
A
1
and
σ
(
A
2
) are independent. By symmetry, the same argument shows that
σ
(
A
1
)
and σ(A
2
) are independent.
Say we are rolling a dice. Instead of asking what the probability of getting
a 6, we might be interested instead in the probability of getting a 6 infinitely
often. Intuitively, the answer is “it happens with probability 1”, because in each
dice roll, we have a probability of
1
6
of getting a 6, and they are all independent.
We would like to make this precise and actually prove it. It turns out that
the notions of “occurs infinitely often” and also “occurs eventually” correspond
to more analytic notions of lim sup and lim inf.
Definition (limsup and liminf). Let (A
n
) be a sequence of events. We define
lim sup A
n
=
\
n
[
m≥n
A
m
lim inf A
n
=
[
n
\
m≥n
A
m
.
To parse these definitions more easily, we can read
∩
as “for all”, and
∪
as
“there exits”. For example, we can write
lim sup A
n
= ∀n, ∃m ≥ n such that A
m
occurs
= {x : ∀n, ∃m ≥ n, x ∈ A
m
}
= {A
m
occurs infinitely often}
= {A
m
i.o.}
Similarly, we have
lim inf A
n
= ∃n, ∀m ≥ n such that A
m
occurs
= {x : ∃n, ∀m ≥ n, x ∈ A
m
}
= {A
m
occurs eventually}
= {A
m
e.v.}
We are now going to prove two “obvious” results, known as the Borel–Cantelli
lemmas. These give us necessary conditions for an event to happen infinitely
often, and in the case where the events are independent, the condition is also
sufficient.
Lemma (Borel–Cantelli lemma). If
X
n
P[A
n
] < ∞,
then
P[A
n
i.o.] = 0.
Proof. For each k, we have
P[A
n
i.o] = P
\
n
[
m≥n
A
m
≤ P
[
m≥k
A
m
≤
∞
X
m=k
P[A
m
]
→ 0
as k → ∞. So we have P[A
n
i.o.] = 0.
Note that we did not need to use the fact that we are working with a
probability measure. So in fact this holds for any measure space.
Lemma (Borel–Cantelli lemma II). Let (A
n
) be independent events. If
X
n
P[A
n
] = ∞,
then
P[A
n
i.o.] = 1.
Note that independence is crucial. If we flip a fair coin, and we set all the
A
n
to be equal to “getting a heads”, then
P
n
P
[
A
n
] =
P
n
1
2
=
∞
, but we certainly
do not have P[A
n
i.o.] = 1. Instead it is just
1
2
.
Proof.
By example sheet, if (
A
n
) is independent, then so is (
A
C
n
). Then we have
P
"
N
\
m=n
A
C
m
#
=
N
Y
m=n
P[A
C
m
]
=
N
Y
m=n
(1 − P[A
m
])
≤
N
Y
m=n
exp(−P[A
m
])
= exp
−
N
X
m=n
P[A
m
]
!
→ 0
as N → ∞, as we assumed that
P
n
P[A
n
] = ∞. So we have
P
"
∞
\
m=n
A
C
m
#
= 0.
By countable subadditivity, we have
P
"
[
n
∞
\
m=n
A
C
m
#
= 0.
This in turn implies that
P
"
\
n
∞
[
m=n
A
m
#
= 1 − P
"
[
n
∞
\
m=n
A
C
m
#
= 1.
So we are done.
2 Measurable functions and random variables
We’ve had enough of measurable sets. As in most of mathematics, not only
should we talk about objects, but also maps between objects. Here we want to
talk about maps between measure spaces, known as measurable functions. In
the case of a probability space, a measurable function is a random variable!
In this chapter, we are going to start by defining a measurable function and
investigate some of its basic properties. In particular, we are going to prove the
monotone class theorem, which is the analogue of Dynkin’s lemma for measurable
functions. Afterwards, we turn to the probabilistic aspects, and see how we can
make sense of the independence of random variables. Finally, we are going to
consider different notions of “convergence” of functions.
2.1 Measurable functions
The definition of a measurable function is somewhat like the definition of a
continuous function, except that we replace “open” with “in the σ-algebra”.
Definition
(Measurable functions)
.
Let (
E, E
) and (
G, G
) be measure spaces.
A map f : E → G is measurable if for every A ∈ G, we have
f
−1
(A) = {x ∈ E : f(x) ∈ E} ∈ E.
If (G, G) = (R, B), then we will just say that f is measurable on E.
If (
G, G
) = ([0
, ∞
]
, B
), then we will just say that
f
is non-negative measurable.
If E is a topological space and E = B(E), then we call f a Borel function.
How do we actually check in practice that a function is measurable? It turns
out we are lucky. We can simply check that
f
−1
(
A
)
∈ E
for
A
in any generating
set Q of G.
Lemma.
Let (
E, E
) and (
G, G
) be measurable spaces, and
G
=
σ
(
Q
) for some
Q. If f
−1
(A) ∈ E for all A ∈ Q, then f is measurable.
Proof. We claim that
{A ⊆ G : f
−1
(A) ∈ E}
is a
σ
-algebra on
G
. Then the result follows immediately by definition of
σ
(
Q
).
Indeed, this follows from the fact that
f
−1
preserves everything. More
precisely, we have
f
−1
[
n
A
n
!
=
[
n
f
−1
(A
n
), f
−1
(A
C
) = (f
−1
(A))
C
, f
−1
(∅) = ∅.
So if, say, all A
n
∈ A, then so is
S
n
A
n
.
Example.
In the particular case where we have a function
f
:
E → R
, we know
that
B
=
B
(
R
) is generated by (
−∞, y
] for
y ∈ R
. So we just have to check that
{x ∈ E : f(x) ≤ y} = f
−1
((−∞, y])) ∈ E.
Example.
Let
E, F
be topological spaces, and
f
:
E → F
be continuous. We
will see that
f
is a measurable function (under the Borel
σ
-algebras). Indeed,
by definition, whenever
U ⊆ F
is open, we have
f
−1
(
U
) open as well. So
f
−1
(
U
)
∈ B
(
E
) for all
U ⊆ F
open. But since
B
(
F
) is the
σ
-algebra generated
by the open sets, this implies that f is measurable.
This is one very important example. We can do another very important
example.
Example.
Suppose that
A ⊆ E
. The indicator function of
A
is
1
A
(
x
) :
E →
{0, 1} given by
1
A
(x) =
(
1 x ∈ A
0 x 6∈ A
.
Suppose we give
{
0
,
1
}
the non-trivial measure. Then
1
A
is a measurable function
iff A ∈ E.
Example. The identity function is always measurable.
Example.
Composition of measurable functions are measurable. More precisely,
if (
E, E
), (
F, F
) and (
G, G
) are measurable spaces, and the functions
f
:
E → F
and
g
:
F → G
are measurable, then the composition
g◦f
:
E → G
is measurable.
Indeed, if
A ∈ G
, then
g
−1
(
A
)
∈ F
, so
f
−1
(
g
−1
(
A
))
∈ E
. But
f
−1
(
g
−1
(
A
)) =
(g ◦ f )
−1
(A). So done.
Definition
(
σ
-algebra generated by functions)
.
Now suppose we have a set
E
,
and a family of real-valued functions {f
i
: i ∈ I} on E. We then define
σ(f
i
: i ∈ I) = σ(f
−1
i
(A) : A ∈ B, i ∈ I).
This is the smallest
σ
-algebra on
E
which makes all the
f
i
’s measurable.
This is analogous to the notion of initial topologies for topological spaces.
If we want to construct more measurable functions, the following definition
will be rather useful:
Definition
(Product measurable space)
.
Let (
E, E
) and (
G, G
) be measure
spaces. We define the product measure space as
E × G
whose
σ
-algebra is
generated by the projections
E × G
E G
π
1
π
2
.
More explicitly, the σ-algebra is given by
E ⊗ G = σ({A × B : A ∈ E, B ∈ G}).
More generally, if (
E
i
, E
i
) is a collection of measure spaces, the product measure
space has underlying set
Q
i
E
i
, and the
σ
-algebra generated by the projection
maps π
i
:
Q
j
E
j
→ E
i
.
This satisfies the following property:
Proposition.
Let
f
i
:
E → F
i
be functions. Then
{f
i
}
are all measurable iff
(
f
i
) :
E →
Q
F
i
is measurable, where the function (
f
i
) is defined by setting the
ith component of (f
i
)(x) to be f
i
(x).
Proof.
If the map (
f
i
) is measurable, then by composition with the projections
π
i
, we know that each f
i
is measurable.
Conversely, if all
f
i
are measurable, then since the
σ
-algebra of
Q
F
i
is
generated by sets of the form
π
−1
j
(
A
) :
A ∈ F
j
, and the pullback of such sets
along (f
i
) is exactly f
−1
j
(A), we know the function (f
i
) is measurable.
Using this, we can prove that a whole lot more functions are measurable.
Proposition.
Let (
E, E
) be a measurable space. Let (
f
n
:
n ∈ N
) be a sequence
of non-negative measurable functions on
E
. Then the following are measurable:
f
1
+ f
2
, f
1
f
2
, max{f
1
, f
2
}, min{f
1
, f
2
},
inf
n
f
n
, sup
n
f
n
, lim inf
n
f
n
, lim sup
n
f
n
.
The same is true with “real” replaced with “non-negative”, provided the new
functions are real (i.e. not infinity).
Proof.
This is an (easy) exercise on the example sheet. For example, the sum
f
1
+ f
2
can be written as the following composition.
E [0, ∞]
2
[0, ∞].
(f
1
,f
2
)
+
We know the second map is continuous, hence measurable. The first function is
also measurable since the f
i
are. So the composition is also measurable.
The product follows similarly, but for the infimum and supremum, we need to
check explicitly that the corresponding maps [0
, ∞
]
N
→
[0
, ∞
] is measurable.
Notation. We will write
f ∧ g = min{f, g}, f ∨ g = max{f, g}.
We are now going to prove the monotone class theorem, which is a “Dynkin’s
lemma” for measurable functions. As in the case of Dynkin’s lemma, it will
sound rather awkward but will prove itself to be very useful.
Theorem
(Monotone class theorem)
.
Let (
E, E
) be a measurable space, and
A ⊆ E
be a
π
-system with
σ
(
A
) =
E
. Let
V
be a vector space of functions such
that
(i) The constant function 1 = 1
E
is in V.
(ii) The indicator functions 1
A
∈ V for all A ∈ A
(iii) V is closed under bounded, monotone limits.
More explicitly, if (
f
n
) is a bounded non-negative sequence in
V
,
f
n
% f
(pointwise) and f is also bounded, then f ∈ V.
Then V contains all bounded measurable functions.
Note that the conditions for
V
is pretty like the conditions for a d-system,
where taking a bounded, monotone limit is something like taking increasing
unions.
Proof. We first deduce that 1
A
∈ V for all A ∈ E.
D = {A ∈ E : 1
A
∈ V}.
We want to show that
D
=
E
. To do this, we have to show that
D
is a
d
-system.
(i) Since 1
E
∈ V, we know E ∈ D.
(ii) If 1
A
∈ V , then 1 − 1
A
= 1
E\A
∈ V. So E \ A ∈ D.
(iii)
If (
A
n
) is an increasing sequence in
D
, then
1
A
n
→ 1
S
A
n
monotonically
increasingly. So 1
S
A
n
is in D.
So, by Dynkin’s lemma, we know
D
=
E
. So
V
contains indicators of all measur-
able sets. We will now try to obtain any measurable function by approximating.
Suppose that
f
is bounded and non-negative measurable. We want to show
that f ∈ V. To do this, we approximate it by letting
f
n
= 2
−n
b2
n
fc =
∞
X
k=0
k2
−n
1
{k2
−n
≤f<(k+1)2
−n
}
.
Note that since
f
is bounded, this is a finite sum. So it is a finite linear
combination of indicators of elements in
E
. So
f
n
∈ V
,