II Probability and Measure - Measurable functions and random variables


2.3 Random variables

We are now going to look at these ideas in the context of probability. It turns

out they are concepts we already know and love!

Definition (Random variable). Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and $(E, \mathcal{E})$ a measurable space. Then an $E$-valued random variable is a measurable function $X : \Omega \to E$.

By default, we will assume the random variables are real.

Usually, when we have a random variable $X$, we might ask questions like “what is the probability that $X \in A$?”. In other words, we are asking for the “size” of the set of things that get sent to $A$. This is just the image measure!

Definition (Distribution/law). Given a random variable $X : \Omega \to E$, the distribution or law of $X$ is the image measure $\mu_X = \mathbb{P} \circ X^{-1}$. We usually write

$$\mathbb{P}(X \in A) = \mu_X(A) = \mathbb{P}(X^{-1}(A)).$$

If $E = \mathbb{R}$, then $\mu_X$ is determined by its values on the $\pi$-system of intervals $(-\infty, y]$. We set

$$F_X(x) = \mu_X((-\infty, x]) = \mathbb{P}(X \le x).$$

This is known as the distribution function of $X$.
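As a small illustration of the law and the distribution function, the following sketch works on a finite probability space. The choice of space (two fair dice) and the names `law` and `distribution_function` are assumptions made for this example only.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Illustrative finite probability space: two fair dice with the uniform measure.
omega_space = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega_space}

def X(w):
    """Random variable: the sum of the two dice."""
    return w[0] + w[1]

def law(X, P):
    """Image measure mu_X = P o X^{-1}, stored as a dict value -> mass."""
    mu = defaultdict(Fraction)
    for w, p in P.items():
        mu[X(w)] += p
    return dict(mu)

def distribution_function(mu, x):
    """F_X(x) = mu_X((-inf, x]) for a law supported on finitely many points."""
    return sum(p for value, p in mu.items() if value <= x)

mu_X = law(X, P)
```

Here `mu_X[7]` is the probability that the sum equals 7, and `distribution_function(mu_X, x)` is a step function that is 0 below 2 and 1 from 12 onwards.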

Proposition. We have

$$F_X(x) \to \begin{cases} 0 & x \to -\infty \\ 1 & x \to +\infty \end{cases}$$

Also, $F_X(x)$ is non-decreasing and right-continuous.

We call any function F with these properties a distribution function.

Definition (Distribution function). A distribution function is a non-decreasing, right-continuous function $F : \mathbb{R} \to [0, 1]$ satisfying

$$F(x) \to \begin{cases} 0 & x \to -\infty \\ 1 & x \to +\infty \end{cases}$$

We now want to show that every distribution function is indeed the distribution function of some random variable.

Proposition. Let $F$ be any distribution function. Then there exists a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and a random variable $X$ such that $F_X = F$.

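Proof. Take $(\Omega, \mathcal{F}, \mathbb{P}) = ((0, 1), \mathcal{B}(0, 1), \text{Lebesgue})$. We take $X : \Omega \to \mathbb{R}$ to be

$$X(\omega) = \inf\{x : \omega \le F(x)\}.$$

Then we have

$$X(\omega) \le x \iff \omega \le F(x).$$

Indeed, if $\omega \le F(x)$, then $x$ belongs to the set defining $X(\omega)$. Conversely, since $F$ is non-decreasing and right-continuous, the infimum is attained, so $X(\omega) \le x$ gives $\omega \le F(X(\omega)) \le F(x)$. So we have

$$F_X(x) = \mathbb{P}[X \le x] = \mathbb{P}[(0, F(x)]] = F(x).$$

Therefore $F_X = F$.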

This construction is actually very useful in practice. If we are writing

a computer program and want to sample a random variable, we will use this

procedure. The computer usually comes with a uniform (pseudo)-random number

generator. Then using this procedure allows us to produce random variables of

any distribution from a uniform sample.
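A minimal sketch of this sampling procedure follows. It approximates the generalized inverse $\inf\{x : u \le F(x)\}$ by bisection, which is an implementation choice made here and not part of the notes. The search bounds `lo` and `hi`, the tolerance, and the exponential distribution function used as an example are all assumptions for illustration.

```python
import math
import random

def generalized_inverse(F, u, lo=-1e6, hi=1e6, tol=1e-9):
    """Approximate inf{x : u <= F(x)} by bisection.

    Assumes F is non-decreasing and right-continuous, and that
    F(lo) < u <= F(hi), so the infimum lies in (lo, hi].
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if u <= F(mid):
            hi = mid   # mid is in the set, so the infimum is at most mid
        else:
            lo = mid   # mid is not in the set, so the infimum is above mid
    return hi

def exponential_cdf(x, rate=1.0):
    """Distribution function of an exponential random variable (example only)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def sample(F, rng):
    """Draw omega uniformly from (0, 1) and return X(omega)."""
    u = rng.random()
    while u == 0.0:    # random() returns values in [0, 1); omega = 0 is excluded
        u = rng.random()
    return generalized_inverse(F, u)

rng = random.Random(0)
draws = [sample(exponential_cdf, rng) for _ in range(5000)]
```

When a closed-form inverse of $F$ is available, it would normally be used in place of bisection. The bisection version has the advantage of also handling distribution functions with jumps or flat pieces.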

The next thing we want to consider is the notion of independence of random variables. Recall that for random variables $X, Y$, we used to say that they are independent if for any $A, B$, we have

$$\mathbb{P}[X \in A, Y \in B] = \mathbb{P}[X \in A]\,\mathbb{P}[Y \in B].$$

But this is exactly the statement that the $\sigma$-algebras generated by $X$ and $Y$ are independent!

Definition (Independence of random variables). A family $(X_n)$ of random variables is said to be independent if the family of $\sigma$-algebras $(\sigma(X_n))$ is independent.

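Proposition. Two real-valued random variables $X, Y$ are independent iff

$$\mathbb{P}[X \le x, Y \le y] = \mathbb{P}[X \le x]\,\mathbb{P}[Y \le y]$$

for all $x, y \in \mathbb{R}$. More generally, if $(X_n)$ is a sequence of real-valued random variables, then they are independent iff

$$\mathbb{P}[X_1 \le x_1, \cdots, X_n \le x_n] = \prod_{j=1}^n \mathbb{P}[X_j \le x_j]$$

for all $n$ and $x_j$.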

Proof. The $\Rightarrow$ direction is obvious. For the other direction, we simply note that $\{(-\infty, x] : x \in \mathbb{R}\}$ is a generating $\pi$-system for the Borel $\sigma$-algebra of $\mathbb{R}$.

In probability, we often say things like “let $X_1, X_2, \cdots$ be iid random variables”. However, how can we guarantee that iid random variables do indeed exist? We start with the less ambitious goal of finding iid Bernoulli(1/2) random variables:

Proposition. Let

$$(\Omega, \mathcal{F}, \mathbb{P}) = ((0, 1), \mathcal{B}(0, 1), \text{Lebesgue})$$

be our probability space. Then there exists a sequence $(R_n)$ of independent Bernoulli(1/2) random variables.

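Proof. Suppose we have $\omega \in \Omega = (0, 1)$. Then we write $\omega$ as a binary expansion

$$\omega = \sum_{n=1}^\infty \omega_n 2^{-n},$$

where $\omega_n \in \{0, 1\}$. We make the binary expansion unique by disallowing expansions that end in an infinite sequence of zeroes.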

We define $R_n(\omega) = \omega_n$. We will show that $R_n$ is measurable. Indeed, we can write

$$R_1(\omega) = \omega_1 = 1_{(1/2, 1]}(\omega),$$

where $1_{(1/2, 1]}$ is the indicator function. Since indicator functions of measurable sets are measurable, we know $R_1$ is measurable. Similarly, we have

$$R_2(\omega) = 1_{(1/4, 1/2]}(\omega) + 1_{(3/4, 1]}(\omega).$$

So this is also a measurable function. More generally, we can do this for any $R_n(\omega)$: we have

$$R_n(\omega) = \sum_{j=1}^{2^{n-1}} 1_{(2^{-n}(2j-1),\, 2^{-n}(2j)]}(\omega).$$

So each $R_n$ is a random variable, as each can be expressed as a sum of indicators of measurable sets.

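Now let's calculate

$$\mathbb{P}[R_n = 1] = \sum_{j=1}^{2^{n-1}} 2^{-n}((2j) - (2j - 1)) = \sum_{j=1}^{2^{n-1}} 2^{-n} = \frac{1}{2}.$$

Then we have

$$\mathbb{P}[R_n = 0] = 1 - \mathbb{P}[R_n = 1] = \frac{1}{2}$$

as well. So $R_n \sim$ Bernoulli(1/2).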

We can straightforwardly check that $(R_n)$ is an independent sequence, since for $n \ne m$, we have

$$\mathbb{P}[R_n = 0 \text{ and } R_m = 0] = \frac{1}{4} = \mathbb{P}[R_n = 0]\,\mathbb{P}[R_m = 0].$$
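The digit functions and the two probability computations in this proof can be checked exactly on a computer. The sketch below is illustrative only: the names `binary_digit` and `probability` are made up for this example, and the parity test is a reformulation of the indicator formula for the digits, not notation from the notes.

```python
from fractions import Fraction
from math import ceil

def binary_digit(n, omega):
    """n-th binary digit of omega, using the expansion that does not end in all zeroes.

    omega lies in (2^-n (2j - 1), 2^-n (2j)] for some j exactly when
    ceil(2^n * omega) is even.
    """
    return 1 if ceil(Fraction(omega) * 2 ** n) % 2 == 0 else 0

def probability(event, depth):
    """Lebesgue measure of {omega : event(omega)} for an event that depends only
    on the first `depth` digits.

    Such an event is constant on each dyadic cell of length 2^-depth, so it is
    enough to test one interior point (the midpoint) of every cell.
    """
    cells = 2 ** depth
    hits = sum(1 for k in range(1, cells + 1)
               if event(Fraction(2 * k - 1, 2 * cells)))
    return Fraction(hits, cells)

# P[R_3 = 1] and P[R_2 = 0 and R_5 = 0], computed exactly.
p_one = probability(lambda w: binary_digit(3, w) == 1, depth=3)
p_joint = probability(lambda w: binary_digit(2, w) == 0 and binary_digit(5, w) == 0,
                      depth=5)
```

Using `Fraction` keeps dyadic endpoints exact, which matters because the convention for dyadic rationals such as 1/2 (expansion 0.0111...) decides which interval they fall into.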

We will now use the $(R_n)$ to construct an independent sequence of random variables with any prescribed distributions.

Proposition. Let

$$(\Omega, \mathcal{F}, \mathbb{P}) = ((0, 1), \mathcal{B}(0, 1), \text{Lebesgue}).$$

Given any sequence $(F_n)$ of distribution functions, there is a sequence $(X_n)$ of independent random variables with $F_{X_n} = F_n$ for all $n$.

Proof. Let $m : \mathbb{N}^2 \to \mathbb{N}$ be any bijection, and relabel

$$Y_{k,n} = R_{m(k,n)},$$

where the $R_j$ are as in the previous proposition. We let

$$Y_n = \sum_{k=1}^\infty 2^{-k} Y_{k,n}.$$

Then we know that $(Y_n)$ is an independent sequence of random variables, and each is uniform on $(0, 1)$. As before, we define

$$G_n(y) = \inf\{x : y \le F_n(x)\}.$$

We set $X_n = G_n(Y_n)$. Then $(X_n)$ is a sequence of independent random variables with $F_{X_n} = F_n$.
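The digit-splitting step of this proof can be sketched in code. The example below makes several assumptions that are not in the notes: it uses the diagonal (Cantor-style) enumeration as one concrete choice of the bijection $m$, truncates each sum to `depth` terms, and draws the digits of $\omega$ as fair random bits instead of storing $\omega$ itself. All function names are hypothetical.

```python
import random

def cantor_pair(k, n):
    """One possible bijection m : N^2 -> N on positive integers,
    enumerating pairs along diagonals k + n = const."""
    s = k + n - 2
    return s * (s + 1) // 2 + k

def uniform_from_digits(digits, n, depth):
    """Truncation of Y_n = sum_{k >= 1} 2^-k R_{m(k, n)} to its first `depth` terms.

    digits[j - 1] plays the role of R_j(omega).
    """
    return sum(digits[cantor_pair(k, n) - 1] * 2.0 ** -k
               for k in range(1, depth + 1))

def independent_uniforms(count, depth, rng):
    """Draw the leading digits of a single omega and split them into
    `count` approximately uniform values Y_1, ..., Y_count."""
    needed = cantor_pair(depth, count)   # largest digit index used below
    digits = [rng.getrandbits(1) for _ in range(needed)]
    return [uniform_from_digits(digits, n, depth) for n in range(1, count + 1)]

ys = independent_uniforms(count=3, depth=30, rng=random.Random(0))
```

Because `cantor_pair` is injective, no digit of $\omega$ is used by two different $Y_n$, which is what makes the resulting values independent. Applying $G_n$ to each $Y_n$ then gives the $X_n$ of the proof.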

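We end the section with a random fact: let $(\Omega, \mathcal{F}, \mathbb{P})$ and $R_j$ be as above. Then $\frac{1}{n} \sum_{j=1}^n R_j$ is the average of $n$ independent Bernoulli(1/2) random variables. The weak law of large numbers says for any $\varepsilon > 0$, we have

$$\mathbb{P}\left[\left|\frac{1}{n} \sum_{j=1}^n R_j - \frac{1}{2}\right| \ge \varepsilon\right] \to 0 \text{ as } n \to \infty.$$

The strong law of large numbers, which we will prove later, says that

$$\mathbb{P}\left[\left\{\omega : \frac{1}{n} \sum_{j=1}^n R_j(\omega) \to \frac{1}{2}\right\}\right] = 1.$$

So “almost every number” in $(0, 1)$ has an equal proportion of 0's and 1's in its binary expansion. This is known as the normal number theorem.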
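The statement can be observed numerically. The sketch below is only a simulation, not a proof: it relies on the fact established above that the binary digits of a uniformly chosen number are independent fair coin flips, and so draws the first $n$ digits directly. The function name `digit_average` and the seed are arbitrary choices for this example.

```python
import random

def digit_average(n, rng):
    """Average of the first n binary digits of one uniformly sampled number.

    The first n digits of a uniform number in (0, 1) are n fair coin flips,
    so they are drawn directly as an n-bit integer.
    """
    bits = rng.getrandbits(n)
    return bin(bits).count("1") / n

rng = random.Random(2024)
averages = {n: digit_average(n, rng) for n in (10, 1000, 100000)}
```

For large `n` the averages should cluster around 1/2, in line with the law of large numbers.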