3.4 Multiple random variables
If we have two random variables, we can study the relationship between them.
Definition (Covariance). Given two random variables X, Y , the covariance is
cov(X, Y) = E[(X − E[X])(Y − E[Y])].
Proposition.
(i) cov(X, c) = 0 for constant c.
(ii) cov(X + c, Y ) = cov(X, Y ).
(iii) cov(X, Y ) = cov(Y, X).
(iv) cov(X, Y) = E[XY] − E[X]E[Y].
(v) cov(X, X) = var(X).
(vi) var(X + Y ) = var(X) + var(Y ) + 2 cov(X, Y ).
(vii) If X, Y are independent, cov(X, Y ) = 0.
These are all trivial to prove, and the proofs are omitted.
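The identities are also easy to check numerically. Below is a minimal Python sketch verifying (iv) and (vi) on a small finite joint distribution; the pmf and the names are arbitrary choices for illustration, not taken from the notes.

    # A small joint pmf P(X = x, Y = y), chosen arbitrarily for illustration.
    joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

    def E(f):
        """Expectation of f(X, Y) under the joint pmf."""
        return sum(p * f(x, y) for (x, y), p in joint.items())

    EX, EY = E(lambda x, y: x), E(lambda x, y: y)
    cov = E(lambda x, y: (x - EX) * (y - EY))          # definition of covariance
    # (iv): cov(X, Y) = E[XY] - E[X]E[Y]
    assert abs(cov - (E(lambda x, y: x * y) - EX * EY)) < 1e-12
    # (vi): var(X + Y) = var(X) + var(Y) + 2 cov(X, Y)
    varX = E(lambda x, y: (x - EX) ** 2)
    varY = E(lambda x, y: (y - EY) ** 2)
    varXY = E(lambda x, y: (x + y - EX - EY) ** 2)
    assert abs(varXY - (varX + varY + 2 * cov)) < 1e-12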
It is important to note that cov(X, Y) = 0 does not imply X and Y are independent.
Example. Let (X, Y) = (2, 0), (−1, −1) or (−1, 1) with equal probabilities of 1/3. These are not independent, since Y = 0 implies X = 2. However,
cov(X, Y) = E[XY] − E[X]E[Y] = 0 − 0 · 0 = 0.
If we randomly pick a point on the unit circle, and let the coordinates be (X, Y), then E[X] = E[Y] = E[XY] = 0 by symmetry. So cov(X, Y) = 0 but X and Y are clearly not independent (they have to satisfy x² + y² = 1).
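The first of these two examples can be checked directly; a short Python sketch (the variable names are my own):

    # The three equally likely points from the example above.
    points = [(2, 0), (-1, -1), (-1, 1)]
    EX  = sum(x for x, y in points) / 3      # = 0
    EY  = sum(y for x, y in points) / 3      # = 0
    EXY = sum(x * y for x, y in points) / 3  # = 0
    print(EXY - EX * EY)  # covariance = 0, yet Y = 0 forces X = 2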
The covariance is not that useful in measuring how well two variables correlate. For one, the covariance can (potentially) have dimensions, which means that the numerical value of the covariance can depend on what units we are using. Also, the magnitude of the covariance depends largely on the variance of X and Y themselves. To solve these problems, we define
Definition (Correlation coefficient). The correlation coefficient of X and Y is
corr(X, Y) = cov(X, Y) / √(var(X) var(Y)).
Proposition. |corr(X, Y)| ≤ 1.
Proof. Apply Cauchy–Schwarz to X − E[X] and Y − E[Y].
Again, zero correlation does not necessarily imply independence.
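To illustrate why dividing by the standard deviations helps, the following Python sketch (the sample data and names are invented for illustration) shows that rescaling the units changes the covariance but leaves the correlation coefficient unchanged, and within [−1, 1]:

    import math, random

    random.seed(0)
    X = [random.gauss(0, 1) for _ in range(10_000)]
    Y = [x + random.gauss(0, 1) for x in X]   # correlated with X

    def cov(a, b):
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)

    def corr(a, b):
        return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))

    X_cm = [100 * x for x in X]               # same quantity in different units
    print(cov(X, Y), cov(X_cm, Y))            # covariance scales by 100
    print(corr(X, Y), corr(X_cm, Y))          # correlation unchanged, |corr| <= 1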
Alternatively, apart from finding a fixed covariance or correlation number, we can see how the distribution of X depends on Y. Given two random variables X, Y, P(X = x, Y = y) is known as the joint distribution. From this joint distribution, we can retrieve the probabilities P(X = x) and P(Y = y). We can also consider different conditional expectations.
Definition (Conditional distribution). Let X and Y be random variables (in general not independent) with joint distribution P(X = x, Y = y). Then the marginal distribution (or simply distribution) of X is
P(X = x) = Σ_y P(X = x, Y = y).
The conditional distribution of X given Y is
P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y).
The conditional expectation of X given Y is
E[X | Y = y] = Σ_x x P(X = x | Y = y).
We can view E[X | Y] as a random variable in Y: given a value of Y, we return the expectation of X.
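A minimal Python sketch of these definitions, using an arbitrary small joint pmf (the table and the function names are mine, purely for illustration):

    # Joint pmf P(X = x, Y = y) as a dictionary; values chosen arbitrarily.
    joint = {(1, 0): 0.2, (2, 0): 0.1, (1, 1): 0.3, (2, 1): 0.4}

    def marginal_X(x):
        return sum(p for (xx, y), p in joint.items() if xx == x)

    def marginal_Y(y):
        return sum(p for (x, yy), p in joint.items() if yy == y)

    def conditional_X(x, y):
        return joint.get((x, y), 0) / marginal_Y(y)

    def cond_expectation_X(y):
        xs = {x for (x, _) in joint}
        return sum(x * conditional_X(x, y) for x in xs)

    print(marginal_X(1))            # P(X = 1) = 0.5
    print(conditional_X(1, 1))      # P(X = 1 | Y = 1) = 0.3 / 0.7
    print(cond_expectation_X(0))    # E[X | Y = 0] = (1*0.2 + 2*0.1) / 0.3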
Example. Consider a dice roll. Let Y = 1 denote an even roll and Y = 0 denote an odd roll. Let X be the value of the roll. Then
E[X | Y] = 3 + Y,
ie 4 if even, 3 if odd.
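A one-line check of this in Python (a rough sketch, assuming a fair six-sided die):

    rolls = range(1, 7)
    E_even = sum(x for x in rolls if x % 2 == 0) / 3   # E[X | Y = 1] = 4
    E_odd  = sum(x for x in rolls if x % 2 == 1) / 3   # E[X | Y = 0] = 3
    print(E_even, E_odd)  # matches E[X | Y] = 3 + Y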
Example. Let X_1, ···, X_n be iid B(1, p). Let Y = X_1 + ··· + X_n. Then
P(X_1 = 1 | Y = r) = P(X_1 = 1, X_2 + ··· + X_n = r − 1) / P(Y = r)
                   = p · C(n−1, r−1) p^(r−1) (1 − p)^((n−1)−(r−1)) / (C(n, r) p^r (1 − p)^(n−r))
                   = r/n,
where C(n, r) denotes the binomial coefficient. So
E[X_1 | Y] = 1 · (r/n) + 0 · (1 − r/n) = r/n = Y/n.
Note that this is a random variable!
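The identity P(X_1 = 1 | Y = r) = r/n can be verified exactly with the binomial pmf; below is a brief Python sketch (the specific n and p are arbitrary choices for the check):

    from math import comb

    n, p = 7, 0.3   # arbitrary illustration values
    for r in range(1, n + 1):
        # P(X_1 = 1, X_2 + ... + X_n = r - 1)
        num = p * comb(n - 1, r - 1) * p ** (r - 1) * (1 - p) ** (n - r)
        # P(Y = r)
        den = comb(n, r) * p ** r * (1 - p) ** (n - r)
        assert abs(num / den - r / n) < 1e-12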
Theorem. If X and Y are independent, then
E[X | Y] = E[X].
Proof.
E[X | Y = y] = Σ_x x P(X = x | Y = y)
             = Σ_x x P(X = x)
             = E[X].
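A quick Python sketch of this theorem on a product (independent) joint pmf; the marginals below are arbitrary illustration values:

    # X and Y independent: joint pmf is the product of the marginals.
    pX = {0: 0.4, 1: 0.6}
    pY = {0: 0.5, 1: 0.25, 2: 0.25}
    joint = {(x, y): px * py for x, px in pX.items() for y, py in pY.items()}

    EX = sum(x * px for x, px in pX.items())
    for y, py in pY.items():
        E_X_given_y = sum(x * joint[(x, y)] for x in pX) / py
        assert abs(E_X_given_y - EX) < 1e-12   # E[X | Y = y] = E[X]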
We know that the expected value of a dice roll given it is even is 4, and the
expected value given it is odd is 3. Since it is equally likely to be even or odd,
the expected value of the dice roll is 3.5. This is formally captured by
Theorem (Tower property of conditional expectation).
E_Y[E_X[X | Y]] = E_X[X],
where the subscripts indicate what variable the expectation is taken over.
Proof.
E_Y[E_X[X | Y]] = Σ_y P(Y = y) E[X | Y = y]
               = Σ_y P(Y = y) Σ_x x P(X = x | Y = y)
               = Σ_x Σ_y x P(X = x, Y = y)
               = Σ_x x Σ_y P(X = x, Y = y)
               = Σ_x x P(X = x)
               = E[X].
This is also called the law of total expectation. We can also state it as: suppose A_1, A_2, ···, A_n is a partition of Ω. Then
E[X] = Σ_{i : P(A_i) > 0} E[X | A_i] P(A_i).
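As a closing sanity check, here is a Python sketch of the tower property on a small joint distribution (the pmf below is an arbitrary example, not from the notes):

    joint = {(1, 'a'): 0.1, (2, 'a'): 0.2, (1, 'b'): 0.3, (3, 'b'): 0.4}

    P_Y = {}
    for (x, y), p in joint.items():
        P_Y[y] = P_Y.get(y, 0) + p                     # marginal of Y

    def cond_E_X(y):
        """E[X | Y = y]."""
        return sum(x * p for (x, yy), p in joint.items() if yy == y) / P_Y[y]

    lhs = sum(P_Y[y] * cond_E_X(y) for y in P_Y)       # E_Y[E_X[X | Y]]
    rhs = sum(x * p for (x, _), p in joint.items())    # E_X[X]
    assert abs(lhs - rhs) < 1e-12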