3 Discrete random variables
IA Probability
3.4 Multiple random variables
If we have two random variables, we can study the relationship between them.
Definition (Covariance). Given two random variables X, Y , the covariance is
cov(X, Y ) = E[(X − E[X])(Y − E[Y ])].
Proposition.
(i) cov(X, c) = 0 for constant c.
(ii) cov(X + c, Y ) = cov(X, Y ).
(iii) cov(X, Y ) = cov(Y, X).
(iv) cov(X, Y ) = E[XY ] − E[X]E[Y ].
(v) cov(X, X) = var(X).
(vi) var(X + Y ) = var(X) + var(Y ) + 2 cov(X, Y ).
(vii) If X, Y are independent, cov(X, Y ) = 0.
These are all trivial to prove, so the proofs are omitted.
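As a quick numerical illustration (not part of the proofs), properties (iv) and (vi) can be checked directly on a small joint distribution. The short Python sketch below uses a joint pmf chosen arbitrarily for the sake of the example.

# Check cov(X, Y) = E[XY] - E[X]E[Y] and
# var(X + Y) = var(X) + var(Y) + 2 cov(X, Y) on a small joint pmf.
# The joint distribution is made up purely for illustration.
joint = {(0, 1): 0.2, (1, 1): 0.3, (1, 2): 0.1, (2, 2): 0.4}

def E(f):
    # Expectation of f(X, Y) under the joint distribution.
    return sum(p * f(x, y) for (x, y), p in joint.items())

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
cov = E(lambda x, y: (x - EX) * (y - EY))
print(cov, E(lambda x, y: x * y) - EX * EY)       # the two agree
var_X = E(lambda x, y: (x - EX) ** 2)
var_Y = E(lambda x, y: (y - EY) ** 2)
var_sum = E(lambda x, y: (x + y - EX - EY) ** 2)
print(var_sum, var_X + var_Y + 2 * cov)           # the two agree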
It is important to note that cov(X, Y) = 0 does not imply X and Y are independent.
Example.
– Let (X, Y) = (2, 0), (−1, −1) or (−1, 1) with equal probabilities of 1/3. These are not independent since Y = 0 ⇒ X = 2. However, cov(X, Y) = E[XY] − E[X]E[Y] = 0 − 0 · 0 = 0 (see the numerical check after these examples).
– If we randomly pick a point on the unit circle, and let the coordinates be (X, Y), then E[X] = E[Y] = E[XY] = 0 by symmetry. So cov(X, Y) = 0 but X and Y are clearly not independent (they have to satisfy x² + y² = 1).
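The first example can be verified by direct enumeration; here is a minimal Python sketch of that check (an illustration only, not from the notes).

# (X, Y) takes the values (2, 0), (-1, -1), (-1, 1), each with probability 1/3.
points = [(2, 0), (-1, -1), (-1, 1)]
EX = sum(x for x, _ in points) / 3
EY = sum(y for _, y in points) / 3
EXY = sum(x * y for x, y in points) / 3
print(EXY - EX * EY)        # 0.0, so cov(X, Y) = 0
# Yet Y = 0 forces X = 2: P(X = 2, Y = 0) = 1/3, while P(X = 2) P(Y = 0) = 1/9,
# so X and Y are not independent.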
The covariance is not that useful in measuring how well two variables correlate.
For one, the covariance can (potentially) have dimensions, which means that the
numerical value of the covariance can depend on what units we are using. Also,
the magnitude of the covariance depends largely on the variance of X and Y themselves. To solve these problems, we define
Definition (Correlation coefficient). The correlation coefficient of X and Y is

corr(X, Y) = cov(X, Y) / √(var(X) var(Y)).
Proposition. |corr(X, Y )| ≤ 1.
Proof. Apply Cauchy-Schwarz to X − E[X] and Y − E[Y ].
Again, zero correlation does not necessarily imply independence.
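To illustrate why the normalisation helps (a Python sketch under assumed numbers, not part of the notes): with an arbitrarily chosen joint pmf, rescaling X, say by measuring it in different units, changes the covariance but leaves the correlation coefficient unchanged, and the latter stays within [−1, 1].

from math import sqrt

joint = {(0, 1): 0.2, (1, 1): 0.3, (1, 2): 0.1, (2, 2): 0.4}   # arbitrary example pmf

def stats(scale):
    # Covariance and correlation of (scale * X, Y).
    E = lambda f: sum(p * f(x, y) for (x, y), p in joint.items())
    EX, EY = E(lambda x, y: scale * x), E(lambda x, y: y)
    cov = E(lambda x, y: (scale * x - EX) * (y - EY))
    var_X = E(lambda x, y: (scale * x - EX) ** 2)
    var_Y = E(lambda x, y: (y - EY) ** 2)
    return cov, cov / sqrt(var_X * var_Y)

print(stats(1))     # covariance and correlation of (X, Y)
print(stats(100))   # covariance is 100 times larger, correlation is unchanged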
Alternatively, apart from finding a fixed covariance or correlation number, we can see how the distribution of X depends on Y. Given two random variables X, Y, P(X = x, Y = y) is known as the joint distribution. From this joint distribution, we can retrieve the probabilities P(X = x) and P(Y = y). We can also consider different conditional expectations.
Definition (Conditional distribution). Let X and Y be random variables (in general not independent) with joint distribution P(X = x, Y = y). Then the marginal distribution (or simply distribution) of X is

P(X = x) = Σ_{y ∈ Ω_Y} P(X = x, Y = y).

The conditional distribution of X given Y is

P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y).

The conditional expectation of X given Y is

E[X | Y = y] = Σ_{x ∈ Ω_X} x P(X = x | Y = y).
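These definitions translate directly into a computation over a joint pmf. Below is a minimal Python sketch (the joint distribution is an arbitrary assumption for illustration) that recovers the marginal of X, the conditional distribution of X given Y = y, and E[X | Y = y].

# Marginal, conditional distribution and conditional expectation from a joint pmf.
joint = {(1, 0): 0.1, (2, 0): 0.3, (1, 1): 0.4, (3, 1): 0.2}   # P(X = x, Y = y)

def marginal_X(x):
    return sum(p for (a, _), p in joint.items() if a == x)

def marginal_Y(y):
    return sum(p for (_, b), p in joint.items() if b == y)

def conditional_X(x, y):
    # P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)
    return joint.get((x, y), 0) / marginal_Y(y)

def conditional_E_X(y):
    # E[X | Y = y] = sum over x of x * P(X = x | Y = y)
    return sum(x * conditional_X(x, y) for x in {a for (a, _) in joint})

print(marginal_X(1))       # P(X = 1) = 0.1 + 0.4 = 0.5
print(conditional_X(1, 1)) # P(X = 1 | Y = 1) = 0.4 / 0.6
print(conditional_E_X(1))  # E[X | Y = 1]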
We can view E[X | Y] as a random variable in Y: given a value of Y, we return the expectation of X.
Example. Consider a dice roll. Let Y = 1 denote an even roll and Y = 0 denote an odd roll. Let X be the value of the roll. Then E[X | Y] = 3 + Y, ie 4 if even, 3 if odd.
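This can be checked by averaging the three outcomes on each side of the parity split; a short Python sketch (illustration only):

# E[X | Y] for a fair die, where Y = 1 on an even roll and Y = 0 on an odd roll.
rolls = range(1, 7)
for y in (0, 1):
    outcomes = [x for x in rolls if x % 2 == 1 - y]   # odd rolls for y = 0, even for y = 1
    print(y, sum(outcomes) / len(outcomes))           # prints 0 3.0, then 1 4.0, ie 3 + Y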
Example. Let X_1, ··· , X_n be iid B(1, p). Let Y = X_1 + ··· + X_n. Then

P(X_1 = 1 | Y = r) = P(X_1 = 1, Σ_{i=2}^n X_i = r − 1) / P(Y = r)
                   = p (n−1 choose r−1) p^{r−1} (1 − p)^{(n−1)−(r−1)} / [(n choose r) p^r (1 − p)^{n−r}]
                   = r/n.
So

E[X_1 | Y] = 1 · (r/n) + 0 · (1 − r/n) = r/n = Y/n.
Note that this is a random variable!
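The identity P(X_1 = 1 | Y = r) = r/n can also be confirmed by brute force, enumerating all 2^n outcomes; the Python sketch below does this for an arbitrarily chosen small n, p and r (assumed values, for illustration only).

# Exhaustive check that P(X_1 = 1 | Y = r) = r/n for iid B(1, p) trials.
from itertools import product

n, p, r = 5, 0.3, 2                 # arbitrary small example
num = den = 0.0
for xs in product((0, 1), repeat=n):
    prob = 1.0
    for x in xs:
        prob *= p if x == 1 else 1 - p
    if sum(xs) == r:
        den += prob                 # contributes to P(Y = r)
        if xs[0] == 1:
            num += prob             # contributes to P(X_1 = 1, Y = r)
print(num / den, r / n)             # the two numbers agree (0.4)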
Theorem. If X and Y are independent, then
E[X | Y] = E[X].

Proof.

E[X | Y = y] = Σ_x x P(X = x | Y = y)
             = Σ_x x P(X = x)          (since X and Y are independent)
             = E[X].
We know that the expected value of a dice roll given it is even is 4, and the
expected value given it is odd is 3. Since it is equally likely to be even or odd,
the expected value of the dice roll is 3.5. This is formally captured by
Theorem (Tower property of conditional expectation).
E_Y[E_X[X | Y]] = E_X[X],
where the subscripts indicate what variable the expectation is taken over.
Proof.
E
Y
[E
X
[X | Y ]] =
X
y
P(Y = y)E[X | Y = y]
=
X
y
P(Y = y)
X
x
xP(X = x | Y = y)
=
X
x
X
y
xP(X = x, Y = y)
=
X
x
x
X
y
P(X = x, Y = y)
=
X
x
xP(X = x)
= E[X].
This is also called the law of total expectation. We can also state it as:
suppose A_1, A_2, ··· , A_n is a partition of Ω. Then

E[X] = Σ_{i : P(A_i) > 0} E[X | A_i] P(A_i).
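Both forms can be checked on the fair-die example from above: E[X | Y] equals 4 with probability 1/2 and 3 with probability 1/2, and averaging recovers E[X] = 3.5. A minimal Python sketch of that check (illustration only):

# Tower property on the fair-die example: E_Y[E_X[X | Y]] = E_X[X] = 3.5.
rolls = range(1, 7)
EX = sum(rolls) / 6
tower = 0.0
for y in (0, 1):                                    # Y = 1 on even rolls, Y = 0 on odd rolls
    outcomes = [x for x in rolls if x % 2 == 1 - y]
    P_y = len(outcomes) / 6
    tower += P_y * (sum(outcomes) / len(outcomes))  # P(Y = y) * E[X | Y = y]
print(tower, EX)                                    # both print 3.5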