4.3 Orthogonal projection in $L^2$
In the particular case $p = 2$, we have an extra structure on $L^2$, namely an inner product structure, given by
\[
  \langle f, g\rangle = \int fg \;\mathrm{d}\mu.
\]
This inner product induces the $L^2$ norm by
\[
  \|f\|_2^2 = \langle f, f\rangle.
\]
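For instance, taking $\mu$ to be the Lebesgue measure on $[0, 1]$ (any other choice of measure works just as well), the functions $f(x) = 1$ and $g(x) = x$ satisfy
\[
  \langle f, g\rangle = \int_0^1 x \;\mathrm{d}x = \frac{1}{2}, \quad \|f\|_2^2 = 1, \quad \|g\|_2^2 = \int_0^1 x^2 \;\mathrm{d}x = \frac{1}{3}.
\]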
Recall the following definition:
Definition (Hilbert space). A Hilbert space is a vector space with a complete inner product.
So $L^2$ is not only a Banach space, but a Hilbert space as well.
Somehow Hilbert spaces are much nicer than Banach spaces, because you have an inner product structure as well. One particular thing we can do is to take orthogonal complements.
Definition (Orthogonal functions). Two functions $f, g \in L^2$ are orthogonal if
\[
  \langle f, g\rangle = 0.
\]
Definition (Orthogonal complement). Let $V \subseteq L^2$. We then set
\[
  V^\perp = \{f \in L^2 : \langle f, v\rangle = 0 \text{ for all } v \in V\}.
\]
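For instance, on $[0, 1]$ with Lebesgue measure, the constant function $1$ and $g(x) = x - \frac{1}{2}$ are orthogonal, since $\int_0^1 (x - \frac{1}{2}) \;\mathrm{d}x = 0$. If $V$ is the subspace of a.e. constant functions, then $V^\perp$ is exactly the set of $f \in L^2$ with $\int f \;\mathrm{d}\mu = 0$.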
Note that we can always make these definitions for any inner product space.
However, the completeness of the space guarantees nice properties of the orthogonal complement.
Before we proceed further, we need to make a definition of what it means for a subspace of $L^2$ to be closed. This isn't the usual definition, since $L^2$ isn't really a normed vector space, so we need to accommodate for that fact.
Definition (Closed subspace). Let $V \subseteq L^2$. Then $V$ is closed if whenever $(f_n)$ is a sequence in $V$ with $f_n \to f$, then there exists $v \in V$ with $v = f$ a.e.
The main thing that makes $L^2$ nice is that we can use closed subspaces to decompose functions orthogonally.
Theorem. Let $V$ be a closed subspace of $L^2$. Then each $f \in L^2$ has an orthogonal decomposition
\[
  f = u + v,
\]
where $v \in V$ and $u \in V^\perp$. Moreover,
\[
  \|f - v\|_2 \leq \|f - g\|_2
\]
for all $g \in V$, with equality iff $g = v$ a.e.
To prove this result, we need two simple identities, which can be easily proven
by writing out the expression.
Lemma (Pythagoras identity).
\[
  \|f + g\|^2 = \|f\|^2 + \|g\|^2 + 2\langle f, g\rangle.
\]
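Indeed, expanding the inner product gives
\[
  \|f + g\|^2 = \langle f + g, f + g\rangle = \langle f, f\rangle + 2\langle f, g\rangle + \langle g, g\rangle,
\]
which is exactly the claimed identity.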
Lemma (Parallelogram law).
\[
  \|f + g\|^2 + \|f - g\|^2 = 2(\|f\|^2 + \|g\|^2).
\]
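This follows by writing out the Pythagoras identity for $f + g$ and for $f - g$ and adding: the cross terms $2\langle f, g\rangle$ and $-2\langle f, g\rangle$ cancel, leaving
\[
  \|f + g\|^2 + \|f - g\|^2 = 2\|f\|^2 + 2\|g\|^2.
\]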
To prove the existence of orthogonal decomposition, we need to use a slight
trick involving the parallelogram law.
Proof of orthogonal decomposition. Given $f \in L^2$, we take a sequence $(g_n)$ in $V$ such that
\[
  \|f - g_n\|_2 \to d(f, V) = \inf_{g \in V} \|f - g\|_2.
\]
We now want to show that the infimum is attained. To do so, we show that $(g_n)$ is a Cauchy sequence, and by the completeness of $L^2$, it will have a limit.
If we apply the parallelogram law with $u = f - g_n$ and $v = f - g_m$, then we know
\[
  \|u + v\|_2^2 + \|u - v\|_2^2 = 2(\|u\|_2^2 + \|v\|_2^2).
\]
Using our particular choice of $u$ and $v$, we obtain
\[
  \left\|2\left(f - \frac{g_n + g_m}{2}\right)\right\|_2^2 + \|g_n - g_m\|_2^2 = 2(\|f - g_n\|_2^2 + \|f - g_m\|_2^2).
\]
So we have
\[
  \|g_n - g_m\|_2^2 = 2(\|f - g_n\|_2^2 + \|f - g_m\|_2^2) - 4\left\|f - \frac{g_n + g_m}{2}\right\|_2^2.
\]
The first two terms on the right hand side each tend to $d(f, V)^2$, and the last term is bounded below in magnitude by $4d(f, V)^2$, since $\frac{g_n + g_m}{2} \in V$. So as $n, m \to \infty$, we must have $\|g_n - g_m\|_2 \to 0$. By completeness of $L^2$, there exists a $g \in L^2$ such that $g_n \to g$.
Now since $V$ is assumed to be closed, we can find a $v \in V$ such that $g = v$ a.e. Then we know
\[
  \|f - v\|_2 = \lim_{n \to \infty} \|f - g_n\|_2 = d(f, V).
\]
So $v$ attains the infimum. To show that this gives us an orthogonal decomposition, we want to show that
\[
  u = f - v \in V^\perp.
\]
Suppose $h \in V$. We need to show that $\langle u, h\rangle = 0$. We need to do another funny trick. Suppose $t \in \mathbb{R}$. Then we have
\[
  d(f, V)^2 \leq \|f - (v + th)\|_2^2 = \|f - v\|_2^2 + t^2\|h\|_2^2 - 2t\langle f - v, h\rangle.
\]
We think of this as a quadratic in $t$, which is minimized when
\[
  t = \frac{\langle f - v, h\rangle}{\|h\|_2^2}.
\]
But we know this quadratic is minimized when $t = 0$. So $\langle f - v, h\rangle = 0$.
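As a simple instance of the theorem, take a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and let $V$ be the subspace of a.e. constant random variables, which is closed since it is finite-dimensional. For $X \in L^2(\mathbb{P})$, the decomposition is
\[
  X = (X - \mathbb{E}[X]) + \mathbb{E}[X],
\]
with $v = \mathbb{E}[X] \in V$ and $u = X - \mathbb{E}[X] \in V^\perp$, since $\langle X - \mathbb{E}[X], c\rangle = c\,\mathbb{E}[X - \mathbb{E}[X]] = 0$ for every constant $c$. In particular, $\mathbb{E}[X]$ is the constant closest to $X$ in $\|\cdot\|_2$.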
We are now going to look at the relationship between conditional expectation
and orthogonal projection.
Definition (Conditional expectation). Suppose we have a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and $(G_n)$ is a collection of pairwise disjoint events with $\bigcup_n G_n = \Omega$. We let
\[
  \mathcal{G} = \sigma(G_n : n \in \mathbb{N}).
\]
The conditional expectation of $X$ given $\mathcal{G}$ is the random variable
\[
  Y = \sum_{n=1}^\infty \mathbb{E}[X \mid G_n] \mathbf{1}_{G_n},
\]
where
\[
  \mathbb{E}[X \mid G_n] = \frac{\mathbb{E}[X \mathbf{1}_{G_n}]}{\mathbb{P}[G_n]}
\]
for $\mathbb{P}[G_n] > 0$.
In other words, given any $x \in \Omega$, say $x \in G_n$, then $Y(x) = \mathbb{E}[X \mid G_n]$.
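For example, model a fair die by $\Omega = \{1, \ldots, 6\}$ with the uniform measure, let $X(x) = x$, and take $G_1 = \{1, 2\}$, $G_2 = \{3, 4\}$, $G_3 = \{5, 6\}$. Then
\[
  \mathbb{E}[X \mid G_1] = \frac{\mathbb{E}[X \mathbf{1}_{G_1}]}{\mathbb{P}[G_1]} = \frac{(1 + 2)/6}{1/3} = \frac{3}{2},
\]
and similarly $\mathbb{E}[X \mid G_2] = \frac{7}{2}$ and $\mathbb{E}[X \mid G_3] = \frac{11}{2}$. So $Y$ takes the value $\frac{3}{2}$ on $G_1$, $\frac{7}{2}$ on $G_2$, and $\frac{11}{2}$ on $G_3$.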
If $X \in L^2(\mathbb{P})$, then $Y \in L^2(\mathbb{P})$, and it is clear that $Y$ is $\mathcal{G}$-measurable. We claim that this is in fact the projection of $X$ onto the subspace $L^2(\mathcal{G}, \mathbb{P})$ of $\mathcal{G}$-measurable $L^2$ random variables in the ambient space $L^2(\mathbb{P})$.
Proposition. The conditional expectation of $X$ given $\mathcal{G}$ is the projection of $X$ onto the subspace $L^2(\mathcal{G}, \mathbb{P})$ of $\mathcal{G}$-measurable $L^2$ random variables in the ambient space $L^2(\mathbb{P})$.
In some sense, this tells us $Y$ is our best prediction of $X$ given only the information encoded in $\mathcal{G}$.
Proof. Let $Y$ be the conditional expectation. It suffices to show that $\mathbb{E}[(X - W)^2]$ is minimized for $W = Y$ among $\mathcal{G}$-measurable random variables. Suppose that $W$ is a $\mathcal{G}$-measurable random variable. Since $\mathcal{G} = \sigma(G_n : n \in \mathbb{N})$, it follows that
\[
  W = \sum_{n=1}^\infty a_n \mathbf{1}_{G_n},
\]
where $a_n \in \mathbb{R}$. Then
\begin{align*}
  \mathbb{E}[(X - W)^2] &= \mathbb{E}\left[\left(\sum_{n=1}^\infty (X - a_n)\mathbf{1}_{G_n}\right)^2\right]\\
  &= \mathbb{E}\left[\sum_n (X^2 + a_n^2 - 2a_n X)\mathbf{1}_{G_n}\right]\\
  &= \mathbb{E}\left[\sum_n (X^2 + a_n^2 - 2a_n \mathbb{E}[X \mid G_n])\mathbf{1}_{G_n}\right],
\end{align*}
where the second line uses the fact that the $G_n$ are pairwise disjoint, so all cross terms vanish.
We now optimize the quadratic
\[
  X^2 + a_n^2 - 2a_n \mathbb{E}[X \mid G_n]
\]
over $a_n$. We see that this is minimized for
\[
  a_n = \mathbb{E}[X \mid G_n].
\]
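Explicitly, completing the square in $a_n$ gives
\[
  X^2 + a_n^2 - 2a_n \mathbb{E}[X \mid G_n] = (a_n - \mathbb{E}[X \mid G_n])^2 + X^2 - \mathbb{E}[X \mid G_n]^2,
\]
and only the first term on the right depends on $a_n$.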
Note that the minimizing value of $a_n$ does not depend on the $X^2$ term of the quadratic, since that term is constant in $a_n$.
Therefore $\mathbb{E}[(X - W)^2]$ is minimized when $W = Y$.
We can also rephrase variance and covariance in terms of the $L^2$ spaces. Suppose $X, Y \in L^2(\mathbb{P})$ with
\[
  m_X = \mathbb{E}[X], \quad m_Y = \mathbb{E}[Y].
\]
Then variance and covariance just correspond to the $L^2$ inner product and norm. In fact, we have
\begin{align*}
  \operatorname{var}(X) &= \mathbb{E}[(X - m_X)^2] = \|X - m_X\|_2^2,\\
  \operatorname{cov}(X, Y) &= \mathbb{E}[(X - m_X)(Y - m_Y)] = \langle X - m_X, Y - m_Y\rangle.
\end{align*}
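One immediate payoff of this point of view is that the Cauchy-Schwarz inequality in $L^2$ gives
\[
  |\operatorname{cov}(X, Y)| = |\langle X - m_X, Y - m_Y\rangle| \leq \|X - m_X\|_2 \|Y - m_Y\|_2 = \sqrt{\operatorname{var}(X)\operatorname{var}(Y)},
\]
which is the familiar bound on the correlation of $X$ and $Y$.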
More generally, the covariance matrix of a random vector $X = (X_1, \ldots, X_n)$ is given by
\[
  \operatorname{var}(X) = (\operatorname{cov}(X_i, X_j))_{ij}.
\]
On the example sheet, we will see that the covariance matrix is a non-negative definite matrix.
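For instance, if $X_2 = X_1$ with $\operatorname{var}(X_1) = 1$, then
\[
  \operatorname{var}(X) = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},
\]
which has eigenvalues $2$ and $0$, so it is non-negative definite but not positive definite.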