3.2 Inequalities
Here we prove a lot of different inequalities which may be useful for certain
calculations. In particular, Chebyshev’s inequality will allow us to prove the
weak law of large numbers.
Definition (Convex function). A function f : (a, b) → R is convex if for all
x_1, x_2 ∈ (a, b) and λ_1, λ_2 ≥ 0 such that λ_1 + λ_2 = 1,

    λ_1 f(x_1) + λ_2 f(x_2) ≥ f(λ_1 x_1 + λ_2 x_2).
It is strictly convex if the inequality above is strict (except when x_1 = x_2,
or λ_1 or λ_2 = 0).
[Figure: graph of a convex function, with x_1, x_2 and λ_1 x_1 + λ_2 x_2 marked on
the x-axis; the value λ_1 f(x_1) + λ_2 f(x_2) on the chord lies above the graph at
λ_1 x_1 + λ_2 x_2.]
A function is concave if −f is convex.
A useful criterion for convexity is
Proposition. If f is differentiable and f''(x) ≥ 0 for all x ∈ (a, b), then it is
convex. It is strictly convex if f''(x) > 0.
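As a purely illustrative sanity check of the definition (not needed for anything
that follows), one can test the chord inequality numerically for a convex function
such as f(x) = x^2, using only the Python standard library; the sample points and
weights below are arbitrary:

    import random

    f = lambda x: x * x                      # f''(x) = 2 > 0, so f is (strictly) convex

    for _ in range(10_000):
        x1, x2 = random.uniform(-5, 5), random.uniform(-5, 5)
        lam = random.random()                # λ_1; take λ_2 = 1 - λ_1
        lhs = lam * f(x1) + (1 - lam) * f(x2)
        rhs = f(lam * x1 + (1 - lam) * x2)
        assert lhs >= rhs - 1e-9             # chord lies on or above the graph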
Theorem (Jensen’s inequality). If f : (a, b) → R is convex, then
    ∑_{i=1}^n p_i f(x_i) ≥ f( ∑_{i=1}^n p_i x_i )

for all p_1, p_2, ··· , p_n such that p_i ≥ 0 and ∑ p_i = 1, and x_i ∈ (a, b).
This says that E[f(X)] ≥ f(E[X]) (where P(X = x_i) = p_i).
If f is strictly convex, then equality holds only if all x_i are equal, i.e. X
takes only one possible value.
Proof. Induct on n. It is true for n = 2 by the definition of convexity. Then
    f(p_1 x_1 + ··· + p_n x_n) = f( p_1 x_1 + (p_2 + ··· + p_n) · (p_2 x_2 + ··· + p_n x_n)/(p_2 + ··· + p_n) )
                               ≤ p_1 f(x_1) + (p_2 + ··· + p_n) f( (p_2 x_2 + ··· + p_n x_n)/(p_2 + ··· + p_n) )
                               ≤ p_1 f(x_1) + (p_2 + ··· + p_n) [ p_2 f(x_2)/(p_2 + ··· + p_n) + ··· + p_n f(x_n)/(p_2 + ··· + p_n) ]
                               = p_1 f(x_1) + ··· + p_n f(x_n),

where the first inequality uses convexity (the n = 2 case) and the second uses the
induction hypothesis with weights p_i/(p_2 + ··· + p_n).
The strictly convex case is proved in the same way, with ≤ replaced by <, using
the definition of strict convexity.
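In expectation form, Jensen's inequality is easy to test numerically. The minimal
sketch below (illustrative only, standard library Python) checks E[f(X)] ≥ f(E[X])
for a randomly chosen finite distribution and the convex function f(x) = e^x; the
number of values and their range are arbitrary:

    import math, random

    f = math.exp                                     # convex: f''(x) = e^x > 0

    x = [random.uniform(-2, 2) for _ in range(6)]    # possible values of X
    w = [random.random() for _ in range(6)]
    p = [wi / sum(w) for wi in w]                    # probabilities: p_i >= 0, sum = 1

    E_fX = sum(pi * f(xi) for pi, xi in zip(p, x))   # E[f(X)]
    f_EX = f(sum(pi * xi for pi, xi in zip(p, x)))   # f(E[X])
    assert E_fX >= f_EX - 1e-9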
Corollary (AM-GM inequality). Given x_1, ··· , x_n positive reals, then

    ( ∏ x_i )^{1/n} ≤ (1/n) ∑ x_i.
Proof. Take f(x) = −log x. This is convex since its second derivative is
x^{−2} > 0. Take P(X = x_i) = 1/n. Then

    E[f(X)] = (1/n) ∑ (−log x_i) = −log GM

and

    f(E[X]) = −log( (1/n) ∑ x_i ) = −log AM.

Since f(E[X]) ≤ E[f(X)], we have −log AM ≤ −log GM, i.e. AM ≥ GM. Since −log x is
strictly convex, AM = GM only if all x_i are equal.
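A quick numerical check of AM ≥ GM (again purely illustrative; math.prod assumes
Python 3.8 or later, and the sample values are arbitrary):

    import math, random

    x = [random.uniform(0.1, 10) for _ in range(8)]  # positive reals
    am = sum(x) / len(x)                             # arithmetic mean
    gm = math.prod(x) ** (1 / len(x))                # geometric mean
    assert gm <= am + 1e-9

    # with all x_i equal, the two means coincide
    assert math.isclose(math.prod([3.0] * 5) ** (1 / 5), sum([3.0] * 5) / 5)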
Theorem (Cauchy-Schwarz inequality). For any two random variables X, Y,

    (E[XY])^2 ≤ E[X^2] E[Y^2].
Proof. If Y = 0, then both sides are 0. Otherwise, E[Y^2] > 0. Let

    w = X − Y · E[XY]/E[Y^2].

Then

    E[w^2] = E[ X^2 − 2XY · E[XY]/E[Y^2] + Y^2 · (E[XY])^2/(E[Y^2])^2 ]
           = E[X^2] − 2 (E[XY])^2/E[Y^2] + (E[XY])^2/E[Y^2]
           = E[X^2] − (E[XY])^2/E[Y^2].

Since E[w^2] ≥ 0, the Cauchy-Schwarz inequality follows.
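For random variables taking n values with equal probabilities, the inequality
reduces to the familiar Cauchy-Schwarz inequality for vectors; a brief sketch
(illustrative only, with arbitrarily chosen sample values):

    import random

    n = 10
    X = [random.gauss(0, 1) for _ in range(n)]       # X, Y on a uniform n-point space
    Y = [random.gauss(0, 1) for _ in range(n)]

    E = lambda Z: sum(Z) / n                         # expectation with weights 1/n
    lhs = E([a * b for a, b in zip(X, Y)]) ** 2      # (E[XY])^2
    rhs = E([a * a for a in X]) * E([b * b for b in Y])
    assert lhs <= rhs + 1e-9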
Theorem (Markov inequality). If X is a random variable with E|X| < ∞ and ε > 0,
then

    P(|X| ≥ ε) ≤ E|X|/ε.
Proof. We make use of the indicator function. We have
    I[|X| ≥ ε] ≤ |X|/ε.

This is proved by exhaustion: if |X| ≥ ε, then LHS = 1 and RHS ≥ 1; if |X| < ε,
then LHS = 0 and RHS is non-negative.
Take the expected value to obtain
    P(|X| ≥ ε) ≤ E|X|/ε.
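To get a feel for how tight the bound is, here is a minimal simulation (not part
of the proof; the exponential distribution and the threshold are arbitrary
choices) comparing the Markov bound with the actual tail probability:

    import random

    N, eps = 100_000, 3.0
    xs = [random.expovariate(1.0) for _ in range(N)]  # |X| ~ Exp(1), so E|X| = 1
    tail = sum(x >= eps for x in xs) / N              # estimate of P(|X| >= eps)
    bound = sum(xs) / N / eps                         # Markov bound E|X|/eps (estimated)
    print(tail, "<=", bound)                          # roughly 0.05 <= 0.33

The bound holds, but it can be far from the true probability.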
Similarly, we have
Theorem (Chebyshev inequality). If X is a random variable with E[X^2] < ∞ and
ε > 0, then

    P(|X| ≥ ε) ≤ E[X^2]/ε^2.
Proof. Again, we have
    I[|X| ≥ ε] ≤ X^2/ε^2.
Then take the expected value and the result follows.
Note that these are really powerful results, since they do not make any
assumptions about the distribution of X. On the other hand, if we know something
about the distribution, we can often get a better bound.
An important corollary is that if µ = E[X], then

    P(|X − µ| ≥ ε) ≤ E[(X − µ)^2]/ε^2 = (var X)/ε^2.
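As an illustration of how much better we can do when the distribution is known,
the sketch below (illustrative only; the standard normal and the threshold are
arbitrary choices) compares var X/ε^2 with an estimate of the true tail
probability:

    import random

    N, eps = 100_000, 2.0
    xs = [random.gauss(0, 1) for _ in range(N)]       # X ~ N(0, 1): mu = 0, var X = 1
    tail = sum(abs(x) >= eps for x in xs) / N         # estimate of P(|X - mu| >= eps)
    print(tail, "<=", 1 / eps ** 2)                   # roughly 0.05 <= 0.25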