3 Discrete random variables

IA Probability

3.2 Inequalities
Here we prove a lot of different inequalities which may be useful for certain
calculations. In particular, Chebyshev’s inequality will allow us to prove the
weak law of large numbers.
Definition (Convex function). A function $f: (a, b) \to \mathbb{R}$ is convex if for all $x_1, x_2 \in (a, b)$ and $\lambda_1, \lambda_2 \geq 0$ such that $\lambda_1 + \lambda_2 = 1$,
\[
  \lambda_1 f(x_1) + \lambda_2 f(x_2) \geq f(\lambda_1 x_1 + \lambda_2 x_2).
\]
It is strictly convex if the inequality above is strict (except when $x_1 = x_2$ or $\lambda_1$ or $\lambda_2 = 0$).
[Figure: graph of a convex function. The chord joining $(x_1, f(x_1))$ and $(x_2, f(x_2))$ lies above the graph, so the value $\lambda_1 f(x_1) + \lambda_2 f(x_2)$ on the chord at $\lambda_1 x_1 + \lambda_2 x_2$ is at least $f(\lambda_1 x_1 + \lambda_2 x_2)$.]
A function $f$ is concave if $-f$ is convex.
A useful criterion for convexity is
Proposition. If $f$ is differentiable and $f''(x) \geq 0$ for all $x \in (a, b)$, then it is convex. It is strictly convex if $f''(x) > 0$.
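For example, $f(x) = x^2$ is strictly convex on $\mathbb{R}$, since $f''(x) = 2 > 0$, and $f(x) = -\log x$ is strictly convex on $(0, \infty)$, since $f''(x) = x^{-2} > 0$; we will use the latter in the proof of AM-GM below.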
Theorem (Jensen's inequality). If $f: (a, b) \to \mathbb{R}$ is convex, then
\[
  \sum_{i=1}^n p_i f(x_i) \geq f\left(\sum_{i=1}^n p_i x_i\right)
\]
for all $p_1, p_2, \cdots, p_n$ such that $p_i \geq 0$ and $\sum p_i = 1$, and $x_i \in (a, b)$.

This says that $\mathbb{E}[f(X)] \geq f(\mathbb{E}[X])$ (where $\mathbb{P}(X = x_i) = p_i$).
If $f$ is strictly convex, then equality holds only if all $x_i$ are equal, i.e. $X$ takes only one possible value.
Proof. Induct on $n$. It is true for $n = 2$ by the definition of convexity. Then
\begin{align*}
  f(p_1 x_1 + \cdots + p_n x_n) &= f\left(p_1 x_1 + (p_2 + \cdots + p_n)\frac{p_2 x_2 + \cdots + p_n x_n}{p_2 + \cdots + p_n}\right)\\
  &\leq p_1 f(x_1) + (p_2 + \cdots + p_n)f\left(\frac{p_2 x_2 + \cdots + p_n x_n}{p_2 + \cdots + p_n}\right)\\
  &\leq p_1 f(x_1) + (p_2 + \cdots + p_n)\left(\frac{p_2}{(\ )}f(x_2) + \cdots + \frac{p_n}{(\ )}f(x_n)\right)\\
  &= p_1 f(x_1) + \cdots + p_n f(x_n),
\end{align*}
where the $(\ )$ is $p_2 + \cdots + p_n$, and the second inequality is the induction hypothesis applied to $x_2, \cdots, x_n$ with weights $p_i/(p_2 + \cdots + p_n)$.

The strictly convex case is proved with $\leq$ replaced by $<$ by definition of strict convexity.
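For example, since $f(x) = \frac{1}{x}$ is convex on $(0, \infty)$, Jensen's inequality gives $\mathbb{E}[1/X] \geq 1/\mathbb{E}[X]$ for any random variable $X$ taking finitely many positive values.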
Corollary (AM-GM inequality). Given $x_1, \cdots, x_n$ positive reals, then
\[
  \left(\prod x_i\right)^{1/n} \leq \frac{1}{n}\sum x_i.
\]
Proof. Take $f(x) = -\log x$. This is convex since its second derivative is $x^{-2} > 0$.

Take $\mathbb{P}(x = x_i) = 1/n$. Then
\[
  \mathbb{E}[f(x)] = -\frac{1}{n}\sum \log x_i = -\log \mathrm{GM}
\]
and
\[
  f(\mathbb{E}[x]) = -\log\left(\frac{1}{n}\sum x_i\right) = -\log \mathrm{AM}.
\]
Since $f(\mathbb{E}[x]) \leq \mathbb{E}[f(x)]$, we have $\mathrm{AM} \geq \mathrm{GM}$. Since $-\log x$ is strictly convex, $\mathrm{AM} = \mathrm{GM}$ only if all $x_i$ are equal.
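For instance, with $x_1 = 1$, $x_2 = 2$, $x_3 = 4$, the geometric mean is $(1 \cdot 2 \cdot 4)^{1/3} = 2$, while the arithmetic mean is $\frac{1 + 2 + 4}{3} = \frac{7}{3} > 2$, as the inequality predicts.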
Theorem (Cauchy-Schwarz inequality). For any two random variables $X, Y$,
\[
  (\mathbb{E}[XY])^2 \leq \mathbb{E}[X^2]\mathbb{E}[Y^2].
\]
Proof. If $Y = 0$, then both sides are 0. Otherwise, $\mathbb{E}[Y^2] > 0$. Let
\[
  w = X - Y\cdot\frac{\mathbb{E}[XY]}{\mathbb{E}[Y^2]}.
\]
Then
\begin{align*}
  \mathbb{E}[w^2] &= \mathbb{E}\left[X^2 - 2XY\frac{\mathbb{E}[XY]}{\mathbb{E}[Y^2]} + Y^2\frac{(\mathbb{E}[XY])^2}{(\mathbb{E}[Y^2])^2}\right]\\
  &= \mathbb{E}[X^2] - 2\frac{(\mathbb{E}[XY])^2}{\mathbb{E}[Y^2]} + \frac{(\mathbb{E}[XY])^2}{\mathbb{E}[Y^2]}\\
  &= \mathbb{E}[X^2] - \frac{(\mathbb{E}[XY])^2}{\mathbb{E}[Y^2]}.
\end{align*}
Since $\mathbb{E}[w^2] \geq 0$, the Cauchy-Schwarz inequality follows.
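For example, taking $Y = 1$ (the constant random variable) gives $(\mathbb{E}[X])^2 \leq \mathbb{E}[X^2]$, which is just the statement that $\operatorname{var} X \geq 0$.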
Theorem (Markov inequality). If $X$ is a random variable with $\mathbb{E}|X| < \infty$ and $\varepsilon > 0$, then
\[
  \mathbb{P}(|X| \geq \varepsilon) \leq \frac{\mathbb{E}|X|}{\varepsilon}.
\]
Proof. We make use of the indicator function. We have
\[
  I[|X| \geq \varepsilon] \leq \frac{|X|}{\varepsilon}.
\]
This is proved by exhaustion: if $|X| \geq \varepsilon$, then LHS $= 1$ and RHS $\geq 1$; if $|X| < \varepsilon$, then LHS $= 0$ and RHS is non-negative.

Take the expected value to obtain
\[
  \mathbb{P}(|X| \geq \varepsilon) \leq \frac{\mathbb{E}|X|}{\varepsilon}.
\]
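For example, if $X$ is the number of heads in $100$ tosses of a fair coin, then $\mathbb{E}[X] = 50$, and the Markov inequality gives $\mathbb{P}(X \geq 75) \leq \frac{50}{75} = \frac{2}{3}$, a weak but assumption-free bound.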
Similarly, we have
Theorem (Chebyshev inequality). If $X$ is a random variable with $\mathbb{E}[X^2] < \infty$ and $\varepsilon > 0$, then
\[
  \mathbb{P}(|X| \geq \varepsilon) \leq \frac{\mathbb{E}[X^2]}{\varepsilon^2}.
\]
Proof. Again, we have
\[
  I[\{|X| \geq \varepsilon\}] \leq \frac{X^2}{\varepsilon^2}.
\]
Then take the expected value and the result follows.
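In fact, this is just the Markov inequality applied to the random variable $X^2$: $\mathbb{P}(|X| \geq \varepsilon) = \mathbb{P}(X^2 \geq \varepsilon^2) \leq \mathbb{E}[X^2]/\varepsilon^2$.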
Note that these are really powerful results, since they do not make any assumptions about the distribution of $X$. On the other hand, if we know something about the distribution, we can often get a tighter bound.
An important corollary is that if $\mu = \mathbb{E}[X]$, then
\[
  \mathbb{P}(|X - \mu| \geq \varepsilon) \leq \frac{\mathbb{E}[(X - \mu)^2]}{\varepsilon^2} = \frac{\operatorname{var} X}{\varepsilon^2}.
\]
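Continuing the coin-tossing example above, $\operatorname{var} X = 100 \cdot \frac{1}{2} \cdot \frac{1}{2} = 25$, so this corollary gives $\mathbb{P}(|X - 50| \geq 25) \leq \frac{25}{25^2} = \frac{1}{25}$, much better than the Markov bound on the same tail. This is the form of Chebyshev's inequality we will use to prove the weak law of large numbers.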