II Probability and Measure - Inequalities and L^p spaces

4 Inequalities and L^p spaces

4.1 Four inequalities

The four inequalities we are going to prove are the following:

(i) Chebyshev/Markov inequality

(ii) Jensen’s inequality

(iii) Hölder’s inequality

(iv) Minkowski’s inequality.

So let’s start proving the inequalities.

Proposition (Chebyshev’s/Markov’s inequality). Let $f$ be non-negative measurable and $\lambda > 0$. Then
\[
  \mu(\{f \geq \lambda\}) \leq \frac{1}{\lambda} \mu(f).
\]

This is often used when µ is a probability measure, so that we are bounding the probability that a random variable is big.

The proof is essentially one line.

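Proof. We write
\[
  f \geq f 1_{\{f \geq \lambda\}} \geq \lambda 1_{\{f \geq \lambda\}}.
\]
Taking µ of both sides gives $\mu(f) \geq \lambda \mu(\{f \geq \lambda\})$, and dividing by $\lambda$ gives the desired answer.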

This is incredibly simple, but also incredibly useful!
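As a quick sanity check of the statement, here is a minimal sketch on a finite measure space, where µ(f) is just a weighted sum. The weights, the function values and the helper names (`integral`, `measure_of_superlevel`, `markov_holds`) are made up for illustration and are not part of the notes.

```python
# A finite measure space with five points; weights[i] = mu({i}).
# The total mass need not be 1 for Markov's inequality.
weights = [0.5, 1.0, 0.25, 2.0, 0.75]   # hypothetical values
f_vals = [0.0, 1.0, 3.0, 0.5, 4.0]      # a non-negative function f


def integral(values, weights):
    """mu(f) = sum_i f(i) * mu({i})."""
    return sum(v * w for v, w in zip(values, weights))


def measure_of_superlevel(values, weights, lam):
    """mu({f >= lam})."""
    return sum(w for v, w in zip(values, weights) if v >= lam)


def markov_holds(values, weights, lam):
    """Check mu({f >= lam}) <= mu(f) / lam, with a small float tolerance."""
    lhs = measure_of_superlevel(values, weights, lam)
    rhs = integral(values, weights) / lam
    return lhs <= rhs + 1e-12


for lam in (0.5, 1.0, 2.0, 3.5):
    print(lam, measure_of_superlevel(f_vals, weights, lam),
          integral(f_vals, weights) / lam)
```

The bound is usually far from tight, which is consistent with how simple the proof is.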

The next inequality is Jensen’s inequality. To state it, we need to know what

a convex function is.

Definition (Convex function). Let $I \subseteq \mathbb{R}$ be an interval. Then $c : I \to \mathbb{R}$ is convex if for any $t \in [0, 1]$ and $x, y \in I$, we have
\[
  c(tx + (1 - t)y) \leq t c(x) + (1 - t) c(y).
\]

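[Figure: between x and y, the graph of a convex function c lies below the chord joining (x, c(x)) and (y, c(y)); the point (1 − t)x + ty and the chord height (1 − t)c(x) + tc(y) are marked.]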

Note that if $c$ is twice differentiable, then this is equivalent to $c'' \geq 0$.

Proposition (Jensen’s inequality). Let $X$ be an integrable random variable with values in $I$. If $c : I \to \mathbb{R}$ is convex, then we have
\[
  E[c(X)] \geq c(E[X]).
\]

It is crucial that this only applies to a probability space. We need the total

mass of the measure space to be 1 for it to work. Just being finite is not enough.
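To see why total mass 1 matters, here is a small sketch comparing a probability measure with a finite measure of total mass 4 on the same two-point space, using the convex function c(x) = x². The specific values and the helper name `integrate` are illustrative assumptions.

```python
def integrate(values, weights):
    """Integral of a function on a finite measure space as a weighted sum."""
    return sum(v * w for v, w in zip(values, weights))


def c(x):
    """A convex function: c(x) = x^2."""
    return x * x


x_vals = [0.0, 2.0]   # values of X at the two points
prob = [0.5, 0.5]     # total mass 1
heavy = [2.0, 2.0]    # total mass 4: finite, but not a probability measure

# Probability measure: E[c(X)] = 2 and c(E[X]) = 1, so Jensen holds.
lhs_prob = integrate([c(x) for x in x_vals], prob)
rhs_prob = c(integrate(x_vals, prob))

# Mass-4 measure: mu(c(X)) = 8 but c(mu(X)) = 16, so the inequality fails.
lhs_heavy = integrate([c(x) for x in x_vals], heavy)
rhs_heavy = c(integrate(x_vals, heavy))

print(lhs_prob, rhs_prob, lhs_heavy, rhs_heavy)
```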

Jensen’s inequality will be an easy consequence of the following lemma:

Lemma. If $c : I \to \mathbb{R}$ is a convex function and $m$ is in the interior of $I$, then there exist real numbers $a, b$ such that
\[
  c(x) \geq ax + b
\]
for all $x \in I$, with equality at $x = m$.

[Figure: the graph of c with the line ax + b lying below it and touching it at x = m.]

If the function is differentiable, then we can easily extract this from the

derivative. However, if it is not, then we need to be more careful.

Proof. If $c$ is smooth, then we know $c'' \geq 0$, and thus $c'$ is non-decreasing. We are going to show an analogous statement that does not mention the word “derivative”. Consider $x < m < y$ with $x, y, m \in I$. We want to show that
\[
  \frac{c(m) - c(x)}{m - x} \leq \frac{c(y) - c(m)}{y - m}.
\]

To show this, we turn off our brains and do the only thing we can do. We can

write

m = tx + (1 − t)y

for some t. Then convexity tells us

c(m) ≤ tc(x) + (1 − t)c(y).

Writing c(m) = tc(m) + (1 − t)c(m), this tells us

t(c(m) − c(x)) ≤ (1 − t)(c(y) − c(m)).

To conclude, we simply have to compute the actual value of $t$ and plug it in. We have
\[
  t = \frac{y - m}{y - x}, \quad 1 - t = \frac{m - x}{y - x}.
\]
So we obtain
\[
  \frac{y - m}{y - x} (c(m) - c(x)) \leq \frac{m - x}{y - x} (c(y) - c(m)).
\]
Cancelling the $y - x$ and dividing by the positive factors $(y - m)(m - x)$ gives the desired result.

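Now since $x$ and $y$ are arbitrary, every difference quotient on the left of $m$ is bounded above by every difference quotient on the right. So the supremum of the left-hand quotients is at most the infimum of the right-hand ones, and we can pick some $a \in \mathbb{R}$ in between, so that
\[
  \frac{c(m) - c(x)}{m - x} \leq a \leq \frac{c(y) - c(m)}{y - m}
\]
for all $x < m < y$. If we rearrange, then we obtain
\[
  c(t) \geq a(t - m) + c(m)
\]
for all $t \in I$. This is the claim with $b = c(m) - am$.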
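The construction of the slope a can be imitated numerically. The sketch below takes a convex function that is not differentiable at m = 0 and uses the largest left difference quotient over a sample grid as a stand-in for the supremum in the proof. The function `c`, the grid and the helper name `supporting_slope` are assumptions made for illustration. On a grid this only approximates the true supremum (which is −1 for this choice of c), but any value between the left and right quotients gives a valid supporting line.

```python
def supporting_slope(c, m, left_points):
    """Largest left difference quotient (c(m) - c(x)) / (m - x) over sampled x < m."""
    return max((c(m) - c(x)) / (m - x) for x in left_points)


def c(x):
    """Convex, but not differentiable at 0."""
    return abs(x) + x * x


m = 0.0
grid = [k / 10 for k in range(-20, 21)]   # sample of the interval I = [-2, 2]

a = supporting_slope(c, m, [x for x in grid if x < m])
b = c(m) - a * m                          # forces equality at x = m

# Smallest value of c(t) - (a t + b) over the grid; should be >= 0 up to rounding.
gap = min(c(t) - (a * t + b) for t in grid)
print(a, b, gap)
```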

Proof of Jensen’s inequality. To apply the previous result, we need to pick the right $m$. We take
\[
  m = E[X].
\]
To apply the lemma, we need to know that $m$ is in the interior of $I$. So we assume that $X$ is not a.s. constant (that case is boring). By the lemma, we can find some $a, b \in \mathbb{R}$ such that
\[
  c(X) \geq aX + b.
\]

We want to take the expectation of the LHS, but we have to make sure that $E[c(X)]$ is a sensible thing to talk about. To make sure it makes sense, we show that $E[c(X)^-] = E[(-c(X)) \vee 0]$ is finite.

We simply bound
\[
  [c(X)]^- = [-c(X)] \vee 0 \leq |a||X| + |b|.
\]
So we have
\[
  E[c(X)^-] \leq |a| E|X| + |b| < \infty
\]
since $X$ is integrable. So $E[c(X)]$ makes sense.

We then just take

E[c(X)] ≥ E[aX + b] = aE[X] + b = am + b = c(m) = c(E[X]).

So done.

We are now going to use Jensen’s inequality to prove Hölder’s inequality.

Before that, we take note of the following definition:

Definition (Conjugate). Let $p, q \in [1, \infty]$. We say that they are conjugate if
\[
  \frac{1}{p} + \frac{1}{q} = 1,
\]
where we take $1/\infty = 0$.

Proposition (Hölder’s inequality). Let $p, q \in (1, \infty)$ be conjugate. Then for $f, g$ measurable, we have
\[
  \mu(|fg|) = \|fg\|_1 \leq \|f\|_p \|g\|_q.
\]

When p = q = 2, then this is the Cauchy-Schwarz inequality.

We will provide two different proofs.

Proof. We assume that $\|f\|_p > 0$ and $\|f\|_p < \infty$. Otherwise, there is nothing to prove. By scaling, we may assume that $\|f\|_p = 1$. We make up a probability measure by
\[
  P[A] = \int_A |f|^p \, d\mu.
\]

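Since we know
\[
  \|f\|_p = \left( \int |f|^p \, d\mu \right)^{1/p} = 1,
\]
we know $P[\,\cdot\,]$ is a probability measure. Then we have
\begin{align*}
  \mu(|fg|) &= \mu(|fg| 1_{\{|f| > 0\}}) \\
  &= \mu\left( \frac{|g|}{|f|^{p-1}} 1_{\{|f| > 0\}} |f|^p \right) \\
  &= E\left[ \frac{|g|}{|f|^{p-1}} 1_{\{|f| > 0\}} \right].
\end{align*}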

Now use the fact that $(E|X|)^q \leq E[|X|^q]$ since $x \mapsto x^q$ is convex for $q > 1$. Then we obtain
\[
  \mu(|fg|) \leq \left( E\left[ \frac{|g|^q}{|f|^{(p-1)q}} 1_{\{|f| > 0\}} \right] \right)^{1/q}.
\]

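The key realization now is that $\frac{1}{q} + \frac{1}{p} = 1$ means that $q(p - 1) = p$ (multiply through by $pq$ to get $p + q = pq$). So this becomes
\[
  \left( E\left[ \frac{|g|^q}{|f|^p} 1_{\{|f| > 0\}} \right] \right)^{1/q} = \mu(|g|^q 1_{\{|f| > 0\}})^{1/q} \leq \mu(|g|^q)^{1/q} = \|g\|_q.
\]
Using the fact that $\|f\|_p = 1$, we obtain the desired result.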
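Hölder’s inequality is easy to test on a finite measure space, where every integral is a weighted sum. The sketch below does this for a few exponents. The weights, the functions and the helper names (`lp_norm`, `conjugate`, `holder_sides`) are hypothetical choices for illustration.

```python
def lp_norm(values, weights, p):
    """||f||_p = (sum_i |f(i)|^p * mu({i}))^(1/p) for finite p >= 1."""
    return sum(abs(v) ** p * w for v, w in zip(values, weights)) ** (1.0 / p)


def conjugate(p):
    """The q with 1/p + 1/q = 1, for 1 < p < infinity."""
    return p / (p - 1.0)


weights = [0.5, 1.0, 0.25, 2.0]   # mu({i}), hypothetical values
f = [1.0, -2.0, 0.0, 3.0]
g = [-1.0, 0.5, 4.0, 2.0]


def holder_sides(f, g, weights, p):
    """Return (||fg||_1, ||f||_p * ||g||_q) for the conjugate pair (p, q)."""
    q = conjugate(p)
    fg = [a * b for a, b in zip(f, g)]
    return lp_norm(fg, weights, 1), lp_norm(f, weights, p) * lp_norm(g, weights, q)


for p in (1.5, 2.0, 3.0, 10.0):
    lhs, rhs = holder_sides(f, g, weights, p)
    print(p, lhs, rhs)
```

With p = 2 the same check is a numerical instance of the Cauchy-Schwarz inequality.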

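Alternative proof. We wlog $0 < \|f\|_p, \|g\|_q < \infty$, or else there is nothing to prove. By scaling, we wlog $\|f\|_p = \|g\|_q = 1$. Then we have to show that
\[
  \int |f||g| \, d\mu \leq 1.
\]
To do so, we notice that if $\frac{1}{p} + \frac{1}{q} = 1$, then the concavity of $\log$ tells us that for any $a, b > 0$, we have
\[
  \frac{1}{p} \log a + \frac{1}{q} \log b \leq \log\left( \frac{a}{p} + \frac{b}{q} \right).
\]
Replacing $a$ with $a^p$ and $b$ with $b^q$, and then taking exponentials, tells us
\[
  ab \leq \frac{a^p}{p} + \frac{b^q}{q}.
\]
While we assumed $a, b > 0$ when deriving this, we observe that it is also valid when some of them are zero. So we have
\[
  \int |f||g| \, d\mu \leq \int \left( \frac{|f|^p}{p} + \frac{|g|^q}{q} \right) d\mu = \frac{1}{p} + \frac{1}{q} = 1.
\]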
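The pointwise inequality ab ≤ a^p/p + b^q/q that drives this proof can be checked directly, including the boundary case where a or b is zero. The helper name `young_gap` and the sample values are assumptions for illustration.

```python
def young_gap(a, b, p):
    """a^p/p + b^q/q - a*b for the conjugate exponent q; should be >= 0."""
    q = p / (p - 1.0)
    return a ** p / p + b ** q / q - a * b


samples = [0.0, 0.5, 1.0, 2.0, 7.0]   # includes 0 to cover the boundary case
exponents = [1.5, 2.0, 3.0]

smallest_gap = min(young_gap(a, b, p)
                   for a in samples for b in samples for p in exponents)
print(smallest_gap)

# Equality holds when a^p = b^q, for example a = b = 1.
print([young_gap(1.0, 1.0, p) for p in exponents])
```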

Just like Jensen’s inequality, this is very useful when bounding integrals, and it is also theoretically very important, because we are going to use it to prove the Minkowski inequality. The Minkowski inequality in turn tells us that the L^p norm is actually a norm.

Before we prove the Minkowski inequality, we prove the following tiny lemma

that we will use repeatedly:

Lemma. Let $a, b \geq 0$ and $p \geq 1$. Then
\[
  (a + b)^p \leq 2^p (a^p + b^p).
\]

This is a terrible bound, but is useful when we want to prove that things are

finite.

Proof. We wlog $a \leq b$. Then
\[
  (a + b)^p \leq (2b)^p \leq 2^p (a^p + b^p).
\]
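A short sketch confirming the lemma on a few sample values. The helper name `crude_bound_gap` and the samples are illustrative assumptions.

```python
def crude_bound_gap(a, b, p):
    """2^p (a^p + b^p) - (a + b)^p, which the lemma says is non-negative."""
    return 2.0 ** p * (a ** p + b ** p) - (a + b) ** p


samples = [0.0, 0.3, 1.0, 2.5, 10.0]
exponents = [1.0, 1.5, 2.0, 5.0]

worst = min(crude_bound_gap(a, b, p)
            for a in samples for b in samples for p in exponents)
print(worst)
```

Among these samples the gap vanishes only at a = b = 0, which matches the remark that the bound is crude.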

Theorem (Minkowski inequality). Let $p \in [1, \infty]$ and $f, g$ measurable. Then
\[
  \|f + g\|_p \leq \|f\|_p + \|g\|_p.
\]

Again the proof is magic.

Proof. We do the boring cases first. If $p = 1$, then
\[
  \|f + g\|_1 = \int |f + g| \leq \int (|f| + |g|) = \int |f| + \int |g| = \|f\|_1 + \|g\|_1.
\]
The proof of the case of $p = \infty$ is similar.

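Now note that if $\|f + g\|_p = 0$, then the result is trivial. On the other hand, if $\|f + g\|_p = \infty$, then since we have
\[
  |f + g|^p \leq (|f| + |g|)^p \leq 2^p (|f|^p + |g|^p),
\]
integrating shows that at least one of $\|f\|_p$ and $\|g\|_p$ is infinite, so the right hand side is infinite as well. So this case is also done.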

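Let’s now do the interesting case. We compute
\begin{align*}
  \mu(|f + g|^p) &= \mu(|f + g||f + g|^{p-1}) \\
  &\leq \mu(|f||f + g|^{p-1}) + \mu(|g||f + g|^{p-1}) \\
  &\leq \|f\|_p \||f + g|^{p-1}\|_q + \|g\|_p \||f + g|^{p-1}\|_q \\
  &= (\|f\|_p + \|g\|_p) \||f + g|^{p-1}\|_q \\
  &= (\|f\|_p + \|g\|_p) \mu(|f + g|^{(p-1)q})^{1 - 1/p} \\
  &= (\|f\|_p + \|g\|_p) \mu(|f + g|^p)^{1 - 1/p},
\end{align*}
where $q$ is the conjugate of $p$, so that $1/q = 1 - 1/p$ and $(p - 1)q = p$.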

So we know
\[
  \mu(|f + g|^p) \leq (\|f\|_p + \|g\|_p) \mu(|f + g|^p)^{1 - 1/p}.
\]
Then dividing both sides by $\mu(|f + g|^p)^{1 - 1/p}$ tells us
\[
  \mu(|f + g|^p)^{1/p} = \|f + g\|_p \leq \|f\|_p + \|g\|_p.
\]
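The triangle inequality for the L^p norm can be checked in the same finite setting, for finite p and for p = ∞. As before, the weights, the functions and the helper names (`lp_norm`, `minkowski_gap`) are made up for illustration. For p = ∞ the sketch uses the essential supremum, that is, the maximum over points of positive measure.

```python
import math


def lp_norm(values, weights, p):
    """L^p norm on a finite measure space, for p in [1, infinity]."""
    if math.isinf(p):
        # Essential supremum: ignore points of measure zero.
        return max((abs(v) for v, w in zip(values, weights) if w > 0), default=0.0)
    return sum(abs(v) ** p * w for v, w in zip(values, weights)) ** (1.0 / p)


weights = [0.5, 1.0, 0.25, 2.0]   # mu({i}), hypothetical values
f = [1.0, -2.0, 0.0, 3.0]
g = [-1.0, 0.5, 4.0, -2.0]
f_plus_g = [a + b for a, b in zip(f, g)]


def minkowski_gap(p):
    """||f||_p + ||g||_p - ||f + g||_p, which should be >= 0."""
    return (lp_norm(f, weights, p) + lp_norm(g, weights, p)
            - lp_norm(f_plus_g, weights, p))


for p in (1, 1.5, 2, 4, math.inf):
    print(p, minkowski_gap(p))
```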

Given these inequalities, we can go and prove some properties of L^p spaces.