2 Measurable functions and random variables
II Probability and Measure
2.3 Random variables
We are now going to look at these ideas in the context of probability. It turns
out they are concepts we already know and love!
Definition (Random variable). Let (Ω, F, P) be a probability space, and (E, E) a measurable space. Then an E-valued random variable is a measurable function X : Ω → E.
By default, we will assume the random variables are real.
Usually, when we have a random variable X, we might ask questions like “what is the probability that X ∈ A?”. In other words, we are asking for the “size” of the set of things that get sent to A. This is just the image measure!
Definition (Distribution/law). Given a random variable X : Ω → E, the distribution or law of X is the image measure µ_X = P ◦ X⁻¹. We usually write
P(X ∈ A) = µ_X(A) = P(X⁻¹(A)).
If E = R, then µ_X is determined by its values on the π-system of intervals (−∞, y]. We set
F_X(x) = µ_X((−∞, x]) = P(X ≤ x).
This is known as the distribution function of X.
Proposition. We have
F_X(x) → 0 as x → −∞, and F_X(x) → 1 as x → +∞.
Also, F_X is non-decreasing and right-continuous.
We call any function F with these properties a distribution function.
Definition (Distribution function). A distribution function is a non-decreasing, right-continuous function F : R → [0, 1] satisfying F(x) → 0 as x → −∞ and F(x) → 1 as x → +∞.
We now want to show that every distribution function is indeed the distribution function of some random variable.
Proposition. Let F be any distribution function. Then there exists a probability space (Ω, F, P) and a random variable X such that F_X = F.

Proof. Take (Ω, F, P) = ((0, 1), B(0, 1), Lebesgue). We take X : Ω → R to be
X(ω) = inf{x : ω ≤ F(x)}.
Then, using the right-continuity of F, we have
X(ω) ≤ x ⇐⇒ ω ≤ F(x).
So we have
F_X(x) = P[X ≤ x] = P[(0, F(x)]] = F(x).
Therefore F_X = F.
This construction is actually very useful in practice. If we are writing
a computer program and want to sample a random variable, we will use this
procedure. The computer usually comes with a uniform (pseudo)-random number
generator. Then using this procedure allows us to produce random variables of
any distribution from a uniform sample.
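This inverse-transform procedure is easy to sketch in code. Below is a minimal Python illustration (not part of the notes): it uses the Exponential(1) distribution, whose distribution function F(x) = 1 − e^{−x} has the explicit generalized inverse G(y) = −ln(1 − y), and checks that the empirical mean of the samples is close to the true mean 1.

```python
import math
import random

def G(y):
    # Generalized inverse G(y) = inf{x : y <= F(x)} for F(x) = 1 - e^{-x};
    # here it works out to the explicit formula -ln(1 - y).
    return -math.log(1.0 - y)

# Sample by applying G to uniform draws on (0, 1), as in the proposition above.
random.seed(0)
samples = [G(random.random()) for _ in range(100_000)]

# Exponential(1) has mean 1, so the sample mean should be near 1.
mean = sum(samples) / len(samples)
```

The same recipe works for any distribution function F; only the formula for G changes (and in general G has no closed form, so one tabulates or numerically inverts F).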
The next thing we want to consider is the notion of independence of random variables. Recall that for random variables X, Y, we used to say that they are independent if for any A, B, we have
P[X ∈ A, Y ∈ B] = P[X ∈ A]P[Y ∈ B].
But this is exactly the statement that the σ-algebras generated by X and Y are independent!
Definition (Independence of random variables). A family (X_n) of random variables is said to be independent if the family of σ-algebras (σ(X_n)) is independent.
Proposition. Two real-valued random variables X, Y are independent iff
P[X ≤ x, Y ≤ y] = P[X ≤ x]P[Y ≤ y]
for all x, y. More generally, if (X_n) is a sequence of real-valued random variables, then they are independent iff
P[X_1 ≤ x_1, ··· , X_n ≤ x_n] = ∏_{j=1}^{n} P[X_j ≤ x_j]
for all n and x_j.
Proof. The ⇒ direction is obvious. For the other direction, we simply note that {(−∞, x] : x ∈ R} is a generating π-system for the Borel σ-algebra of R.
In probability, we often say things like “let X_1, X_2, ··· be iid random variables”. However, how can we guarantee that iid random variables do indeed exist? We start with the less ambitious goal of finding iid Bernoulli(1/2) random variables:
Proposition. Let
(Ω, F, P) = ((0, 1), B(0, 1), Lebesgue)
be our probability space. Then there exists a sequence (R_n) of independent Bernoulli(1/2) random variables.
Proof. Suppose we have ω ∈ Ω = (0, 1). Then we write ω as a binary expansion
ω = ∑_{n=1}^{∞} ω_n 2^{−n},
where ω_n ∈ {0, 1}. We make the binary expansion unique by disallowing infinite sequences of zeroes.
We define R_n(ω) = ω_n. We will show that each R_n is measurable. Indeed, we can write
R_1(ω) = ω_1 = 1_{(1/2,1]}(ω),
where 1_{(1/2,1]} is the indicator function. Since indicator functions of measurable sets are measurable, we know R_1 is measurable. Similarly, we have
R_2(ω) = 1_{(1/4,1/2]}(ω) + 1_{(3/4,1]}(ω).
So this is also a measurable function. More generally, for any n we have
R_n(ω) = ∑_{j=1}^{2^{n−1}} 1_{(2^{−n}(2j−1), 2^{−n}(2j)]}(ω).
So each R_n is a random variable, as each can be expressed as a sum of indicators of measurable sets.
Now let’s calculate
P[R_n = 1] = ∑_{j=1}^{2^{n−1}} 2^{−n}((2j) − (2j − 1)) = ∑_{j=1}^{2^{n−1}} 2^{−n} = 1/2.
Then we have
P[R_n = 0] = 1 − P[R_n = 1] = 1/2
as well. So R_n ∼ Bernoulli(1/2).
We can straightforwardly check that (R_n) is an independent sequence, since for n ≠ m, we have
P[R_n = 0 and R_m = 0] = 1/4 = P[R_n = 0]P[R_m = 0].
We will now use the (R_n) to construct an independent sequence of random variables with any given distributions.
Proposition. Let
(Ω, F, P) = ((0, 1), B(0, 1), Lebesgue).
Given any sequence (F_n) of distribution functions, there is a sequence (X_n) of independent random variables with F_{X_n} = F_n for all n.
Proof. Let m : N² → N be any bijection, and relabel
Y_{k,n} = R_{m(k,n)},
where the R_j are as in the previous proposition. We let
Y_n = ∑_{k=1}^{∞} 2^{−k} Y_{k,n}.
Then we know that (Y_n) is an independent sequence of random variables, and each is uniform on (0, 1). As before, we define
G_n(y) = inf{x : y ≤ F_n(x)}.
We set X_n = G_n(Y_n). Then (X_n) is a sequence of random variables with F_{X_n} = F_n.
We end the section with a random fact: let (Ω, F, P) and R_j be as above. Then (1/n) ∑_{j=1}^{n} R_j is the average of n independent Bernoulli(1/2) random variables. The weak law of large numbers says for any ε > 0, we have
P[|(1/n) ∑_{j=1}^{n} R_j − 1/2| ≥ ε] → 0 as n → ∞.
The strong law of large numbers, which we will prove later, says that
P[ω : (1/n) ∑_{j=1}^{n} R_j(ω) → 1/2] = 1.
So “almost every number” in (0, 1) has an equal proportion of 0’s and 1’s in its binary expansion. This is known as the normal number theorem.
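This can be checked empirically for a “typical” ω: a long stream of fair coin flips stands in for the binary digits of a uniformly chosen point of (0, 1), and the running proportion of 1’s should settle near 1/2. A quick Python illustration (of course, a simulation is not a proof):

```python
import random

# The first n binary digits of a "random" omega in (0, 1),
# simulated as a stream of independent fair bits.
random.seed(0)
n = 200_000
digits = [random.getrandbits(1) for _ in range(n)]

# Proportion of 1's among the first n digits: the average (1/n) * sum_j R_j(omega).
proportion = sum(digits) / n
```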