4 Inequalities and $L^p$ spaces

4.1 Four inequalities
The four inequalities we are going to prove are the following:
(i) Chebyshev/Markov inequality
(ii) Jensen’s inequality
(iii) Hölder’s inequality
(iv) Minkowski’s inequality.
So let’s start proving the inequalities.
Proposition (Chebyshev’s/Markov’s inequality). Let $f$ be non-negative measurable and $\lambda > 0$. Then
\[
  \mu(\{f \geq \lambda\}) \leq \frac{1}{\lambda} \mu(f).
\]
This is often used when $\mu$ is a probability measure, so that we are bounding the probability that a random variable is big.
The proof is essentially one line.
Proof. We write
\[
  f \geq f \mathbf{1}_{\{f \geq \lambda\}} \geq \lambda \mathbf{1}_{\{f \geq \lambda\}}.
\]
Taking $\mu$ gives the desired answer.
This is incredibly simple, but also incredibly useful!
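For example, if $X$ is a random variable with finite variance, applying this with $f = (X - E[X])^2$ and $\lambda = \varepsilon^2$ recovers the classical form of Chebyshev’s inequality: for any $\varepsilon > 0$,
\[
  P[|X - E[X]| \geq \varepsilon] \leq \frac{\operatorname{var}(X)}{\varepsilon^2}.
\]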
The next inequality is Jensen’s inequality. To state it, we need to know what
a convex function is.
Definition (Convex function). Let $I \subseteq \mathbb{R}$ be an interval. Then $c: I \to \mathbb{R}$ is convex if for any $t \in [0, 1]$ and $x, y \in I$, we have
\[
  c(tx + (1 - t)y) \leq tc(x) + (1 - t)c(y).
\]
(Diagram: the chord from $(x, c(x))$ to $(y, c(y))$ lies above the graph of $c$.)
Note that if $c$ is twice differentiable, then this is equivalent to $c'' \geq 0$.
Proposition (Jensen’s inequality). Let $X$ be an integrable random variable with values in $I$. If $c: I \to \mathbb{R}$ is convex, then we have
\[
  E[c(X)] \geq c(E[X]).
\]
It is crucial that this only applies to a probability space. We need the total
mass of the measure space to be 1 for it to work. Just being finite is not enough.
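For example, take $\mu$ to be Lebesgue measure on $[0, 2]$, $f \equiv 1$ and $c(x) = x^2$. Then $\mu(c(f)) = 2$ while $c(\mu(f)) = 4$, so the conclusion fails, even though $\mu$ is finite.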
Jensen’s inequality will be an easy consequence of the following lemma:
Lemma. If $c: I \to \mathbb{R}$ is a convex function and $m$ is in the interior of $I$, then there exist real numbers $a, b$ such that
\[
  c(x) \geq ax + b
\]
for all $x \in I$, with equality at $x = m$.
(Diagram: the graph of $c$ with the supporting line $ax + b$ touching it at $x = m$.)
If the function is differentiable, then we can easily extract this from the
derivative. However, if it is not, then we need to be more careful.
Proof. If $c$ is smooth, then we know $c'' \geq 0$, and thus $c'$ is non-decreasing. We are going to show an analogous statement that does not mention the word “derivative”. Consider $x < m < y$ with $x, y, m \in I$. We want to show that
\[
  \frac{c(m) - c(x)}{m - x} \leq \frac{c(y) - c(m)}{y - m}.
\]
To show this, we turn off our brains and do the only thing we can do. We can write
\[
  m = tx + (1 - t)y
\]
for some $t$. Then convexity tells us
\[
  c(m) \leq tc(x) + (1 - t)c(y).
\]
Writing $c(m) = tc(m) + (1 - t)c(m)$, this tells us
\[
  t(c(m) - c(x)) \leq (1 - t)(c(y) - c(m)).
\]
To conclude, we simply have to compute the actual value of $t$ and plug it in. We have
\[
  t = \frac{y - m}{y - x}, \qquad 1 - t = \frac{m - x}{y - x}.
\]
So we obtain
\[
  \frac{y - m}{y - x}(c(m) - c(x)) \leq \frac{m - x}{y - x}(c(y) - c(m)).
\]
Cancelling the $y - x$ and dividing by the factors gives the desired result.

Now since $x$ and $y$ are arbitrary, we know there is some $a \in \mathbb{R}$ such that
\[
  \frac{c(m) - c(x)}{m - x} \leq a \leq \frac{c(y) - c(m)}{y - m}
\]
for all $x < m < y$. If we rearrange, then we obtain
\[
  c(t) \geq a(t - m) + c(m)
\]
for all $t \in I$.
Proof of Jensen’s inequality. To apply the previous result, we need to pick the right $m$. We take
\[
  m = E[X].
\]
To apply this, we need to know that $m$ is in the interior of $I$. So we assume that $X$ is not a.s. constant (that case is boring). By the lemma, we can find some $a, b \in \mathbb{R}$ such that
\[
  c(X) \geq aX + b.
\]
We want to take the expectation of the LHS, but we have to make sure $E[c(X)]$ is a sensible thing to talk about. To make sure it makes sense, we show that $E[c(X)^-] = E[(-c(X)) \vee 0]$ is finite.

We simply bound
\[
  [c(X)]^- = [-c(X)] \vee 0 \leq |a||X| + |b|.
\]
So we have
\[
  E[c(X)^-] \leq |a| E|X| + |b| < \infty
\]
since $X$ is integrable. So $E[c(X)]$ makes sense.

We then just take
\[
  E[c(X)] \geq E[aX + b] = aE[X] + b = am + b = c(m) = c(E[X]).
\]
So done.
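For example, since $x \mapsto |x|^q$ is convex for $q \geq 1$, Jensen’s inequality gives
\[
  (E|X|)^q \leq E[|X|^q].
\]
This is exactly the fact we will use in the first proof of Hölder’s inequality below.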
We are now going to use Jensen’s inequality to prove Hölder’s inequality.
Before that, we take note of the following definition:
Definition (Conjugate). Let $p, q \in [1, \infty]$. We say that they are conjugate if
\[
  \frac{1}{p} + \frac{1}{q} = 1,
\]
where we take $1/\infty = 0$.
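For example, $p = q = 2$ are conjugate, $p = 1$ and $q = \infty$ are conjugate, and $p = 4$ is conjugate to $q = 4/3$, since
\[
  \frac{1}{4} + \frac{3}{4} = 1.
\]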
Proposition (Hölder’s inequality). Let $p, q \in (1, \infty)$ be conjugate. Then for $f, g$ measurable, we have
\[
  \mu(|fg|) = \|fg\|_1 \leq \|f\|_p \|g\|_q.
\]
When $p = q = 2$, this is the Cauchy-Schwarz inequality.
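Explicitly, in this case it reads
\[
  \mu(|fg|) \leq \mu(f^2)^{1/2} \mu(g^2)^{1/2}.
\]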
We will provide two different proofs.
Proof. We assume that $\|f\|_p > 0$ and $\|f\|_p < \infty$. Otherwise, there is nothing to prove. By scaling, we may assume that $\|f\|_p = 1$. We make up a probability measure by
\[
  P[A] = \int |f|^p \mathbf{1}_A \,d\mu.
\]
Since we know
\[
  \|f\|_p = \left(\int |f|^p \,d\mu\right)^{1/p} = 1,
\]
we know $P[\,\cdot\,]$ is a probability measure. Then we have
\begin{align*}
  \mu(|fg|) &= \mu(|fg| \mathbf{1}_{\{|f| > 0\}}) \\
  &= \mu\left(\frac{|g|}{|f|^{p-1}} \mathbf{1}_{\{|f| > 0\}} |f|^p\right) \\
  &= E\left[\frac{|g|}{|f|^{p-1}} \mathbf{1}_{\{|f| > 0\}}\right].
\end{align*}
Now use the fact that $(E|X|)^q \leq E[|X|^q]$ since $x \mapsto x^q$ is convex for $q > 1$. Then we obtain
\[
  E\left[\frac{|g|}{|f|^{p-1}} \mathbf{1}_{\{|f| > 0\}}\right] \leq E\left[\frac{|g|^q}{|f|^{(p-1)q}} \mathbf{1}_{\{|f| > 0\}}\right]^{1/q}.
\]
The key realization now is that $\frac{1}{q} + \frac{1}{p} = 1$ means that $q(p - 1) = p$. So this becomes
\[
  E\left[\frac{|g|^q}{|f|^p} \mathbf{1}_{\{|f| > 0\}}\right]^{1/q} \leq \mu(|g|^q)^{1/q} = \|g\|_q.
\]
Using the fact that $\|f\|_p = 1$, we obtain the desired result.
Alternative proof. We wlog $0 < \|f\|_p, \|g\|_q < \infty$, or else there is nothing to prove. By scaling, we wlog $\|f\|_p = \|g\|_q = 1$. Then we have to show that
\[
  \int |f||g| \,d\mu \leq 1.
\]
To do so, we notice if $\frac{1}{p} + \frac{1}{q} = 1$, then the concavity of $\log$ tells us for any $a, b > 0$, we have
\[
  \frac{1}{p} \log a + \frac{1}{q} \log b \leq \log\left(\frac{a}{p} + \frac{b}{q}\right).
\]
Replacing $a$ with $a^p$, $b$ with $b^q$ and then taking exponentials tells us
\[
  ab \leq \frac{a^p}{p} + \frac{b^q}{q}.
\]
While we assumed $a, b > 0$ when deriving, we observe that it is also valid when some of them are zero. So we have
\[
  \int |f||g| \,d\mu \leq \int \left(\frac{|f|^p}{p} + \frac{|g|^q}{q}\right) d\mu = \frac{1}{p} + \frac{1}{q} = 1.
\]
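The pointwise bound we used, $ab \leq \frac{a^p}{p} + \frac{b^q}{q}$, is Young’s inequality. As a quick numerical check, take $p = q = 2$, $a = 2$ and $b = 3$: indeed
\[
  2 \cdot 3 = 6 \leq \frac{2^2}{2} + \frac{3^2}{2} = \frac{13}{2}.
\]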
Just like Jensen’s inequality, this is very useful when bounding integrals, and it is also theoretically very important, because we are going to use it to prove the Minkowski inequality. This tells us that the $L^p$ norm is actually a norm.
Before we prove the Minkowski inequality, we prove the following tiny lemma
that we will use repeatedly:
Lemma. Let $a, b \geq 0$ and $p \geq 1$. Then
\[
  (a + b)^p \leq 2^p(a^p + b^p).
\]
This is a terrible bound, but is useful when we want to prove that things are
finite.
Proof. We wlog $a \leq b$. Then
\[
  (a + b)^p \leq (2b)^p \leq 2^p b^p \leq 2^p(a^p + b^p).
\]
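To see how lossy this is, take $a = b = 1$ and $p = 2$: the lemma gives $4 \leq 8$, whereas convexity of $x \mapsto x^p$ in fact gives the sharper constant $2^{p-1}$, which is tight here. For proving finiteness, however, any constant will do.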
Theorem (Minkowski inequality). Let $p \in [1, \infty]$ and $f, g$ measurable. Then
\[
  \|f + g\|_p \leq \|f\|_p + \|g\|_p.
\]
Again the proof is magic.
Proof. We do the boring cases first. If $p = 1$, then
\[
  \|f + g\|_1 = \int |f + g| \leq \int (|f| + |g|) = \int |f| + \int |g| = \|f\|_1 + \|g\|_1.
\]
The proof of the case of $p = \infty$ is similar.
Now note that if $\|f + g\|_p = 0$, then the result is trivial. On the other hand, if $\|f + g\|_p = \infty$, then since we have
\[
  |f + g|^p \leq (|f| + |g|)^p \leq 2^p(|f|^p + |g|^p),
\]
we know the right hand side is infinite as well. So this case is also done.
Let’s now do the interesting case. We compute
\begin{align*}
  \mu(|f + g|^p) &= \mu(|f + g| |f + g|^{p-1}) \\
  &\leq \mu(|f| |f + g|^{p-1}) + \mu(|g| |f + g|^{p-1}) \\
  &\leq \|f\|_p \||f + g|^{p-1}\|_q + \|g\|_p \||f + g|^{p-1}\|_q \\
  &= (\|f\|_p + \|g\|_p) \||f + g|^{p-1}\|_q \\
  &= (\|f\|_p + \|g\|_p) \mu(|f + g|^{(p-1)q})^{1 - 1/p} \\
  &= (\|f\|_p + \|g\|_p) \mu(|f + g|^p)^{1 - 1/p},
\end{align*}
where $q$ is the conjugate of $p$, and the second inequality is Hölder’s. So we know
\[
  \mu(|f + g|^p) \leq (\|f\|_p + \|g\|_p) \mu(|f + g|^p)^{1 - 1/p}.
\]
Then dividing both sides by $\mu(|f + g|^p)^{1 - 1/p}$ tells us
\[
  \mu(|f + g|^p)^{1/p} = \|f + g\|_p \leq \|f\|_p + \|g\|_p.
\]
Given these inequalities, we can go and prove some properties of $L^p$ spaces.