4.3 Orthogonal projection in $L^2$
In the particular case $p = 2$, we have an extra structure on $L^2$, namely an inner product structure, given by
\[
  \langle f, g\rangle = \int fg \;\mathrm{d}\mu.
\]
This inner product induces the $L^2$ norm by
\[
  \|f\|_2^2 = \langle f, f\rangle.
\]
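For instance, taking $\mu$ to be the Lebesgue measure on $[0, 1]$ (any other choice of measure works just as well), the functions $f(x) = 1$ and $g(x) = x$ satisfy
\[
  \langle f, g\rangle = \int_0^1 x \;\mathrm{d}x = \frac{1}{2}, \quad \|f\|_2^2 = 1, \quad \|g\|_2^2 = \int_0^1 x^2 \;\mathrm{d}x = \frac{1}{3}.
\]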
Recall the following definition:
Definition (Hilbert space). A Hilbert space is a vector space with a complete inner product.
So $L^2$ is not only a Banach space, but a Hilbert space as well.
Somehow Hilbert spaces are much nicer than Banach spaces, because you have an inner product structure as well. One particular thing we can do is to take orthogonal complements.
Definition (Orthogonal functions). Two functions $f, g \in L^2$ are orthogonal if
\[
  \langle f, g\rangle = 0.
\]
Definition (Orthogonal complement). Let $V \subseteq L^2$. We then set
\[
  V^\perp = \{f \in L^2 : \langle f, v\rangle = 0 \text{ for all } v \in V\}.
\]
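For instance, on $[0, 1]$ with Lebesgue measure, the constant function $1$ and $g(x) = x - \frac{1}{2}$ are orthogonal, since $\int_0^1 (x - \frac{1}{2}) \;\mathrm{d}x = 0$. If $V$ is the subspace of a.e. constant functions, then $V^\perp$ is exactly the set of $f \in L^2$ with $\int f \;\mathrm{d}\mu = 0$.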
Note that we can always make these definitions for any inner product space.
However, the completeness of the space guarantees nice properties of the orthogonal complement.
Before we proceed further, we need to make a definition of what it means for a subspace of $L^2$ to be closed. This isn't the usual definition, since $L^2$ isn't really a normed vector space, so we need to accommodate for that fact.
Definition (Closed subspace). Let $V \subseteq L^2$. Then $V$ is closed if whenever $(f_n)$ is a sequence in $V$ with $f_n \to f$, then there exists $v \in V$ with $v = f$ a.e.
The main thing that makes $L^2$ nice is that we can use closed subspaces to decompose functions orthogonally.
Theorem. Let $V$ be a closed subspace of $L^2$. Then each $f \in L^2$ has an orthogonal decomposition
\[
  f = u + v,
\]
where $v \in V$ and $u \in V^\perp$. Moreover,
\[
  \|f - v\|_2 \leq \|f - g\|_2
\]
for all $g \in V$, with equality iff $g = v$ a.e.
To prove this result, we need two simple identities, which can be easily proven
by writing out the expression.
Lemma (Pythagoras identity).
\[
  \|f + g\|^2 = \|f\|^2 + \|g\|^2 + 2\langle f, g\rangle.
\]
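Indeed, expanding the inner product gives
\[
  \|f + g\|^2 = \langle f + g, f + g\rangle = \langle f, f\rangle + 2\langle f, g\rangle + \langle g, g\rangle,
\]
which is exactly the claimed identity.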
Lemma (Parallelogram law).
\[
  \|f + g\|^2 + \|f - g\|^2 = 2(\|f\|^2 + \|g\|^2).
\]
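This follows by writing out the Pythagoras identity for $f + g$ and for $f - g$ and adding: the cross terms $2\langle f, g\rangle$ and $-2\langle f, g\rangle$ cancel, leaving
\[
  \|f + g\|^2 + \|f - g\|^2 = 2\|f\|^2 + 2\|g\|^2.
\]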
To prove the existence of orthogonal decomposition, we need to use a slight
trick involving the parallelogram law.
Proof of orthogonal decomposition. Given $f \in L^2$, we take a sequence $(g_n)$ in $V$ such that
\[
  \|f - g_n\|_2 \to d(f, V) = \inf_{g \in V} \|f - g\|_2.
\]
We now want to show that the infimum is attained. To do so, we show that $(g_n)$ is a Cauchy sequence, and by the completeness of $L^2$, it will have a limit.
If we apply the parallelogram law with $u = f - g_n$ and $v = f - g_m$, then we know
\[
  \|u + v\|_2^2 + \|u - v\|_2^2 = 2(\|u\|_2^2 + \|v\|_2^2).
\]
Using our particular choice of $u$ and $v$, we obtain
\[
  \left\|2\left(f - \frac{g_n + g_m}{2}\right)\right\|_2^2 + \|g_n - g_m\|_2^2 = 2(\|f - g_n\|_2^2 + \|f - g_m\|_2^2).
\]
So we have
\[
  \|g_n - g_m\|_2^2 = 2(\|f - g_n\|_2^2 + \|f - g_m\|_2^2) - 4\left\|f - \frac{g_n + g_m}{2}\right\|_2^2.
\]
The first two terms on the right hand side each tend to $d(f, V)^2$, and the last term is bounded below in magnitude by $4d(f, V)^2$, since $\frac{g_n + g_m}{2} \in V$. So as $n, m \to \infty$, we must have $\|g_n - g_m\|_2 \to 0$. By completeness of $L^2$, there exists a $g \in L^2$ such that $g_n \to g$.
Now since $V$ is assumed to be closed, we can find a $v \in V$ such that $g = v$ a.e. Then we know
\[
  \|f - v\|_2 = \lim_{n \to \infty} \|f - g_n\|_2 = d(f, V).
\]
So $v$ attains the infimum. To show that this gives us an orthogonal decomposition, we want to show that
\[
  u = f - v \in V^\perp.
\]
Suppose $h \in V$. We need to show that $\langle u, h\rangle = 0$. We need to do another funny trick. Suppose $t \in \mathbb{R}$. Then we have
\[
  d(f, V)^2 \leq \|f - (v + th)\|_2^2 = \|f - v\|_2^2 + t^2\|h\|_2^2 - 2t\langle f - v, h\rangle.
\]
We think of this as a quadratic in $t$, which is minimized when
\[
  t = \frac{\langle f - v, h\rangle}{\|h\|_2^2}.
\]
But we know this quadratic is minimized when $t = 0$. So $\langle f - v, h\rangle = 0$.
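As a simple instance of the theorem, take a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and let $V$ be the subspace of a.e. constant random variables, which is closed since it is finite-dimensional. For $X \in L^2(\mathbb{P})$, the decomposition is
\[
  X = (X - \mathbb{E}[X]) + \mathbb{E}[X],
\]
with $v = \mathbb{E}[X] \in V$ and $u = X - \mathbb{E}[X] \in V^\perp$, since $\langle X - \mathbb{E}[X], c\rangle = c\,\mathbb{E}[X - \mathbb{E}[X]] = 0$ for every constant $c$. In particular, $\mathbb{E}[X]$ is the constant closest to $X$ in $\|\cdot\|_2$.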
We are now going to look at the relationship between conditional expectation
and orthogonal projection.
Definition (Conditional expectation). Suppose we have a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and $(G_n)$ is a collection of pairwise disjoint events with $\bigcup_n G_n = \Omega$. We let
\[
  \mathcal{G} = \sigma(G_n : n \in \mathbb{N}).
\]
The conditional expectation of $X$ given $\mathcal{G}$ is the random variable
\[
  Y = \sum_{n=1}^\infty \mathbb{E}[X \mid G_n] \mathbf{1}_{G_n},
\]
where
\[
  \mathbb{E}[X \mid G_n] = \frac{\mathbb{E}[X \mathbf{1}_{G_n}]}{\mathbb{P}[G_n]}
\]
for $\mathbb{P}[G_n] > 0$.
In other words, given any $x \in \Omega$, say $x \in G_n$, then $Y(x) = \mathbb{E}[X \mid G_n]$.
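For example, model a fair die by $\Omega = \{1, \ldots, 6\}$ with the uniform measure, let $X(x) = x$, and take $G_1 = \{1, 2\}$, $G_2 = \{3, 4\}$, $G_3 = \{5, 6\}$. Then
\[
  \mathbb{E}[X \mid G_1] = \frac{\mathbb{E}[X \mathbf{1}_{G_1}]}{\mathbb{P}[G_1]} = \frac{(1 + 2)/6}{1/3} = \frac{3}{2},
\]
and similarly $\mathbb{E}[X \mid G_2] = \frac{7}{2}$ and $\mathbb{E}[X \mid G_3] = \frac{11}{2}$. So $Y$ takes the value $\frac{3}{2}$ on $G_1$, $\frac{7}{2}$ on $G_2$, and $\frac{11}{2}$ on $G_3$.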
If $X \in L^2(\mathbb{P})$, then $Y \in L^2(\mathbb{P})$, and it is clear that $Y$ is $\mathcal{G}$-measurable. We claim that this is in fact the projection of $X$ onto the subspace $L^2(\mathcal{G}, \mathbb{P})$ of $\mathcal{G}$-measurable $L^2$ random variables in the ambient space $L^2(\mathbb{P})$.
Proposition. The conditional expectation of $X$ given $\mathcal{G}$ is the projection of $X$ onto the subspace $L^2(\mathcal{G}, \mathbb{P})$ of $\mathcal{G}$-measurable $L^2$ random variables in the ambient space $L^2(\mathbb{P})$.
In some sense, this tells us $Y$ is our best prediction of $X$ given only the information encoded in $\mathcal{G}$.
Proof. Let $Y$ be the conditional expectation. It suffices to show that $\mathbb{E}[(X - W)^2]$ is minimized for $W = Y$ among $\mathcal{G}$-measurable random variables. Suppose that $W$ is a $\mathcal{G}$-measurable random variable. Since $\mathcal{G} = \sigma(G_n : n \in \mathbb{N})$, it follows that
\[
  W = \sum_{n=1}^\infty a_n \mathbf{1}_{G_n},
\]
where $a_n \in \mathbb{R}$. Then
\begin{align*}
  \mathbb{E}[(X - W)^2] &= \mathbb{E}\left[\left(\sum_{n=1}^\infty (X - a_n)\mathbf{1}_{G_n}\right)^2\right]\\
  &= \mathbb{E}\left[\sum_n (X^2 + a_n^2 - 2a_n X)\mathbf{1}_{G_n}\right]\\
  &= \mathbb{E}\left[\sum_n (X^2 + a_n^2 - 2a_n \mathbb{E}[X \mid G_n])\mathbf{1}_{G_n}\right],
\end{align*}
where the second line uses the fact that the $G_n$ are pairwise disjoint, so all cross terms vanish.
We now optimize the quadratic
\[
  X^2 + a_n^2 - 2a_n \mathbb{E}[X \mid G_n]
\]
over $a_n$. We see that this is minimized for
\[
  a_n = \mathbb{E}[X \mid G_n].
\]
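Explicitly, completing the square in $a_n$ gives
\[
  X^2 + a_n^2 - 2a_n \mathbb{E}[X \mid G_n] = (a_n - \mathbb{E}[X \mid G_n])^2 + X^2 - \mathbb{E}[X \mid G_n]^2,
\]
and only the first term on the right depends on $a_n$.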
Note that the minimizing value of $a_n$ does not depend on the $X^2$ term of the quadratic, since that term is constant in $a_n$.
Therefore $\mathbb{E}[(X - W)^2]$ is minimized when $W = Y$.
We can also rephrase variance and covariance in terms of the $L^2$ spaces. Suppose $X, Y \in L^2(\mathbb{P})$ with
\[
  m_X = \mathbb{E}[X], \quad m_Y = \mathbb{E}[Y].
\]
Then variance and covariance just correspond to the $L^2$ inner product and norm. In fact, we have
\begin{align*}
  \operatorname{var}(X) &= \mathbb{E}[(X - m_X)^2] = \|X - m_X\|_2^2,\\
  \operatorname{cov}(X, Y) &= \mathbb{E}[(X - m_X)(Y - m_Y)] = \langle X - m_X, Y - m_Y\rangle.
\end{align*}
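One immediate payoff of this point of view is that the Cauchy-Schwarz inequality in $L^2$ gives
\[
  |\operatorname{cov}(X, Y)| = |\langle X - m_X, Y - m_Y\rangle| \leq \|X - m_X\|_2 \|Y - m_Y\|_2 = \sqrt{\operatorname{var}(X)\operatorname{var}(Y)},
\]
which is the familiar bound on the correlation of $X$ and $Y$.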
More generally, the covariance matrix of a random vector $X = (X_1, \ldots, X_n)$ is given by
\[
  \operatorname{var}(X) = (\operatorname{cov}(X_i, X_j))_{ij}.
\]
On the example sheet, we will see that the covariance matrix is a non-negative definite matrix.
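For instance, if $X_2 = X_1$ with $\operatorname{var}(X_1) = 1$, then
\[
  \operatorname{var}(X) = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},
\]
which has eigenvalues $2$ and $0$, so it is non-negative definite but not positive definite.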