Part II Linear Analysis
Based on lectures by J. W. Luk
Notes taken by Dexter Chua
Michaelmas 2015
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
Part IB Linear Algebra, Analysis II and Metric and Topological Spaces are essential
Normed and Banach spaces. Linear mappings, continuity, boundedness, and norms.
Finite-dimensional normed spaces. [4]
The Baire category theorem. The principle of uniform boundedness, the closed graph
theorem and the inversion theorem; other applications. [5]
The normality of compact Hausdorff spaces. Urysohn's lemma and Tietze's extension theorem. Spaces of continuous functions. The Stone–Weierstrass theorem and applications. Equicontinuity: the Ascoli–Arzelà theorem. [5]
Inner product spaces and Hilbert spaces; examples and elementary properties. Orthonormal systems, and the orthogonalization process. Bessel's inequality, the Parseval equation, and the Riesz–Fischer theorem. Duality; the self-duality of Hilbert space. [5]
Bounded linear operators, invariant subspaces, eigenvectors; the spectrum and resolvent set. Compact operators on Hilbert space; discreteness of spectrum. Spectral theorem for compact Hermitian operators. [5]
Contents
0 Introduction
1 Normed vector spaces
1.1 Bounded linear maps
1.2 Dual spaces
1.3 Adjoint
1.4 The double dual
1.5 Isomorphism
1.6 Finite-dimensional normed vector spaces
1.7 Hahn–Banach Theorem
2 Baire category theorem
2.1 The Baire category theorem
2.2 Some applications
3 The topology of C(K)
3.1 Normality of compact Hausdorff spaces
3.2 Tietze-Urysohn extension theorem
3.3 Arzelà–Ascoli theorem
3.4 Stone–Weierstrass theorem
4 Hilbert spaces
4.1 Inner product spaces
4.2 Riesz representation theorem
4.3 Orthonormal systems and basis
4.4 The isomorphism with $\ell^2$
4.5 Operators
4.6 Self-adjoint operators
0 Introduction
In IB Linear Algebra, we studied vector spaces in general. Most of the time, we
concentrated on finite-dimensional vector spaces, since these are easy to reason
about. For example, we know that every finite-dimensional vector space (by
definition) has a basis. Using the basis, we can represent vectors and linear maps
concretely as column vectors (in $\mathbb{F}^n$) and matrices.
However, in real life, often we have to work with infinite-dimensional vector
spaces instead. For example, we might want to consider the vector space of
all continuous (real) functions, or the vector space of infinite sequences. It is
difficult to analyse these spaces using the tools from IB Linear Algebra, since
many of those assume the vector space is finite-dimensional. Moreover, in these
cases, we often are not interested in the vector space structure itself. It’s just
that the objects we are interested in happen to have a vector space structure.
Instead, we want to look at notions like continuity and convergence. We want to
do analysis on vector spaces. These are not something the vector space structure
itself provides.
In this course, we are going to give our vector spaces some additional structure.
For the first half of the course, we will grant our vector space a norm. This
allows us to assign a “length” to each vector. With this, we can easily define
convergence and continuity. It turns out this allows us to understand a lot about,
say, function spaces and sequence spaces.
In the second half, we will grant a stronger notion, called the inner product.
Among many things, this allows us to define orthogonality of the elements of a
vector space, which is something we are familiar with from, say, IB Methods.
Most of the time, we will be focusing on infinite-dimensional vector spaces,
since finite-dimensional spaces are boring. In fact, we have a section dedicated
to proving that finite-dimensional vector spaces are boring. In particular, they
are all isomorphic to $\mathbb{R}^n$, and most of our theorems can be proved trivially for
finite-dimensional spaces using what we already know from IB Linear Algebra.
So we will not care much about them.
1 Normed vector spaces
In IB Linear Algebra, we have studied vector spaces in quite a lot of detail.
However, just knowing something is a vector space usually isn’t too helpful.
Often, we would want the vector space to have some additional structure. The
first structure we will study is a norm.
Definition (Normed vector space). A normed vector space is a pair $(V, \|\cdot\|)$, where $V$ is a vector space over a field $\mathbb{F}$ and $\|\cdot\|$ is a function $\|\cdot\| : V \to \mathbb{R}$, known as the norm, satisfying

(i) $\|v\| \ge 0$ for all $v \in V$, with equality iff $v = 0$.

(ii) $\|\lambda v\| = |\lambda| \|v\|$ for all $\lambda \in \mathbb{F}$, $v \in V$.

(iii) $\|v + w\| \le \|v\| + \|w\|$ for all $v, w \in V$.
Intuitively, we think of kvk as the “length” or “magnitude” of the vector.
Example. Let $V$ be a finite-dimensional vector space, and $\{e_1, \cdots, e_n\}$ a basis. Then, for any $v = \sum_{i=1}^n v_i e_i$, we can define a norm as
\[ \|v\| = \sqrt{\sum_{i=1}^n v_i^2}. \]
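As a quick sanity check, here is a minimal numerical sketch (not from the notes) verifying the three norm axioms for this Euclidean norm on randomly sampled vectors of $\mathbb{R}^5$; the small tolerances only absorb floating-point rounding.

```python
# Hedged sketch: empirically checking the norm axioms for the Euclidean norm.
import random

def euclidean_norm(v):
    return sum(x * x for x in v) ** 0.5

random.seed(0)
for _ in range(1000):
    v = [random.uniform(-1, 1) for _ in range(5)]
    w = [random.uniform(-1, 1) for _ in range(5)]
    lam = random.uniform(-3, 3)
    assert euclidean_norm(v) >= 0                      # non-negativity
    # homogeneity: ||lam v|| = |lam| ||v||
    assert abs(euclidean_norm([lam * x for x in v])
               - abs(lam) * euclidean_norm(v)) < 1e-9
    # triangle inequality: ||v + w|| <= ||v|| + ||w||
    assert (euclidean_norm([x + y for x, y in zip(v, w)])
            <= euclidean_norm(v) + euclidean_norm(w) + 1e-9)
```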
If we are given a norm on a vector space $V$, we immediately obtain two more structures on $V$ for free, namely a metric and a topology.

Recall from IB Metric and Topological Spaces that $(V, d)$ is a metric space if the metric $d : V \times V \to \mathbb{R}$ satisfies

(i) $d(x, y) \ge 0$ for all $x, y \in V$, with equality iff $x = y$.

(ii) $d(x, y) = d(y, x)$ for all $x, y \in V$.

(iii) $d(x, y) \le d(x, z) + d(z, y)$ for all $x, y, z \in V$.

Also, a topological space is a set $V$ together with a topology (a collection of open subsets) such that

(i) $\emptyset$ and $V$ are open subsets.

(ii) Arbitrary unions of open subsets are open.

(iii) Finite intersections of open subsets are open.

As we have seen in IB Metric and Topological Spaces, a norm on a vector space induces a metric by $d(v, w) = \|v - w\|$. This metric in turn defines a topology on $V$ where the open sets are given by "$U \subseteq V$ is open iff for any $x \in U$, there exists some $\varepsilon > 0$ such that $B(x, \varepsilon) = \{y \in V : d(x, y) < \varepsilon\} \subseteq U$".

This induced topology is not just a random topology on the vector space. It has the nice property that the vector space operations behave well under it.
Proposition. Addition $+ : V \times V \to V$ and scalar multiplication $\cdot : \mathbb{F} \times V \to V$ are continuous with respect to the topology induced by the norm (and the usual product topology).

Proof. Let $U$ be open in $V$. We want to show that $(+)^{-1}(U)$ is open. Let $(v_1, v_2) \in (+)^{-1}(U)$, i.e. $v_1 + v_2 \in U$. Since $v_1 + v_2 \in U$, there exists $\varepsilon > 0$ such that $B(v_1 + v_2, \varepsilon) \subseteq U$. By the triangle inequality, we know that $B(v_1, \frac{\varepsilon}{2}) + B(v_2, \frac{\varepsilon}{2}) \subseteq U$. Hence we have $(v_1, v_2) \in B(v_1, \frac{\varepsilon}{2}) \times B(v_2, \frac{\varepsilon}{2}) \subseteq (+)^{-1}(U)$. So $(+)^{-1}(U)$ is open.

Scalar multiplication can be done in a very similar way.
This motivates the following definition: we can do without the norm, and just require a topology in which addition and scalar multiplication are continuous.

Definition (Topological vector space). A topological vector space $(V, \mathcal{U})$ is a vector space $V$ together with a topology $\mathcal{U}$ such that addition and scalar multiplication are continuous maps, and moreover singleton points $\{v\}$ are closed sets.

The requirement that points are closed is just a rather technical requirement needed in certain proofs. We should, however, not pay too much attention to this when trying to understand it intuitively.
A natural question to ask is: when is a topological vector space normable? That is, given a topological vector space, can we find a norm that induces the topology? To answer this question, we will first need a few definitions.

Definition (Absolute convexity). Let $V$ be a vector space. Then $C \subseteq V$ is absolutely convex (or balanced convex) if for any $\lambda, \mu \in \mathbb{F}$ such that $|\lambda| + |\mu| \le 1$, we have $\lambda C + \mu C \subseteq C$. In other words, if $c_1, c_2 \in C$, we have $\lambda c_1 + \mu c_2 \in C$.
Proposition. If $(V, \|\cdot\|)$ is a normed vector space, then $B(t) = B(0, t) = \{v : \|v\| < t\}$ is absolutely convex.

Proof. By the triangle inequality.
Definition (Bounded subset). Let $V$ be a topological vector space. Then $B \subseteq V$ is bounded if for every open neighbourhood $U \subseteq V$ of $0$, there is some $s > 0$ such that $B \subseteq tU$ for all $t > s$.

At first sight, this might seem like a rather weird definition. Intuitively, it just means that $B$ is bounded if, whenever we take any open neighbourhood $U$ of $0$, by enlarging it by a scalar multiple, we can make it fully contain $B$.

Example. $B(t)$ in a normed vector space is bounded.
Proposition. A topological vector space $(V, \mathcal{U})$ is normable if and only if there exists an absolutely convex, bounded open neighbourhood of $0$.

Proof. One direction is obvious: if $V$ is normable, then $B(t)$ is an absolutely convex, bounded open neighbourhood of $0$.

The other direction is not too difficult either. We define the Minkowski functional $\mu_C : V \to \mathbb{R}$ by
\[ \mu_C(v) = \inf\{t > 0 : v \in tC\}, \]
where $C$ is our absolutely convex, bounded open neighbourhood.

Note that by definition, for any $t < \mu_C(v)$, we have $v \notin tC$. On the other hand, by absolute convexity, for any $t > \mu_C(v)$, we have $v \in tC$.

We now show that this is a norm on $V$:

(i) If $v = 0$, then $v \in tC$ for all $t > 0$. So $\mu_C(0) = 0$. On the other hand, suppose $v \ne 0$. Since a singleton point is closed, $U = V \setminus \{v\}$ is an open neighbourhood of $0$. Hence there is some $t$ such that $C \subseteq tU$. Equivalently, $\frac{1}{t}C \subseteq U$. Hence $v \notin \frac{1}{t}C$. So $\mu_C(v) \ge \frac{1}{t} > 0$. So $\mu_C(v) = 0$ iff $v = 0$.

(ii) We have
\[ \mu_C(\lambda v) = \inf\{t > 0 : \lambda v \in tC\} = |\lambda| \inf\{t > 0 : v \in tC\} = |\lambda| \mu_C(v), \]
using the fact that $C$ is balanced.

(iii) We want to show that
\[ \mu_C(v + w) \le \mu_C(v) + \mu_C(w). \]
This is equivalent to showing that
\[ \inf\{t > 0 : v + w \in tC\} \le \inf\{t > 0 : v \in tC\} + \inf\{r > 0 : w \in rC\}. \]
This is, in turn, equivalent to proving that if $v \in tC$ and $w \in rC$, then $(v + w) \in (t + r)C$.

Let $v' = v/t$, $w' = w/r$. Then we want to show that if $v' \in C$ and $w' \in C$, then $\frac{1}{t + r}(t v' + r w') \in C$. This is exactly what is required by convexity. So done.
In fact, the condition of absolute convexity can be replaced by "convex", where "convex" means that for every $t \in [0, 1]$, $tC + (1 - t)C \subseteq C$. This is since for every convex bounded $C$, we can always find an absolutely convex bounded $\tilde{C} \subseteq C$, which is not hard to prove.
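To see the Minkowski functional in action, here is a minimal numerical sketch (not from the notes; the open ellipse $C = \{(x, y) : (x/2)^2 + y^2 < 1\}$ is a hypothetical choice of absolutely convex bounded neighbourhood of $0$ in $\mathbb{R}^2$). We compute $\mu_C$ by bisecting on the membership test $v \in tC$, and compare with the closed form $\sqrt{(x/2)^2 + y^2}$, which is the norm that $C$ induces.

```python
# Hedged sketch: Minkowski functional of an ellipse, computed by bisection.

def in_C(x, y):
    # membership test for the assumed ellipse C
    return (x / 2.0) ** 2 + y ** 2 < 1.0

def minkowski(x, y, lo=1e-9, hi=1e9, iters=100):
    # v in tC  iff  v/t in C; bisect for the smallest such t
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if in_C(x / mid, y / mid):
            hi = mid
        else:
            lo = mid
    return hi

for (x, y) in [(1.0, 0.0), (0.0, 0.5), (3.0, -4.0)]:
    exact = ((x / 2.0) ** 2 + y ** 2) ** 0.5
    print(minkowski(x, y), exact)   # the two values agree to high precision
```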
Among all normed spaces, some are particularly nice, known as Banach spaces.

Definition (Banach space). A normed vector space is a Banach space if it is complete as a metric space, i.e. every Cauchy sequence converges.
Example.

(i) A finite-dimensional vector space (which is isomorphic to $\mathbb{F}^n$ for some $n$) is Banach.

(ii) Let $X$ be a compact Hausdorff space. Then let
\[ B(X) = \{f : X \to \mathbb{R} \text{ such that } f \text{ is bounded}\}. \]
This is obviously a vector space, and we can define the norm by $\|f\| = \sup_{x \in X} |f(x)|$. It is easy to show that this is a norm. It is less trivial to show that this is a Banach space.

Let $\{f_n\} \subseteq B(X)$ be a Cauchy sequence. Then for any $x$, $\{f_n(x)\} \subseteq \mathbb{R}$ is also Cauchy. So we can define $f(x) = \lim_{n \to \infty} f_n(x)$.

To show that $f_n \to f$, let $\varepsilon > 0$. By definition of $\{f_n\}$ being Cauchy, there is some $N$ such that for any $n, m > N$ and any fixed $x$, we have $|f_n(x) - f_m(x)| < \varepsilon$. Take the limit as $m \to \infty$. Then $f_m(x) \to f(x)$. So $|f_n(x) - f(x)| \le \varepsilon$. Since this is true for all $x$, for any $n > N$, we must have $\|f_n - f\| \le \varepsilon$. So $f_n \to f$.
(iii) Define $X$ as before, and let
\[ C(X) = \{f : X \to \mathbb{R} \text{ such that } f \text{ is continuous}\}. \]
Since any continuous $f$ on a compact space is bounded, we have $C(X) \subseteq B(X)$. We define the norm as before.

Since $C(X) \subseteq B(X)$, to show that $C(X)$ is Banach, it suffices to show that $C(X) \subseteq B(X)$ is closed, i.e. if $f_n \to f$ with $f_n \in C(X)$, then $f \in C(X)$, i.e. the uniform limit of continuous functions is continuous. The proof can be found in IB Analysis II.
(iv) For $1 \le p < \infty$, define
\[ \hat{L}^p([0,1]) = \{f : [0,1] \to \mathbb{R} \text{ such that } f \text{ is continuous}\}. \]
We define the norm $\|\cdot\|_{\hat{L}^p}$ by
\[ \|f\|_{\hat{L}^p} = \left( \int_0^1 |f|^p \, dx \right)^{1/p}. \]
It is easy to show that $\hat{L}^p$ is indeed a vector space, and we now check that this is a norm.

(a) $\|f\|_{\hat{L}^p} \ge 0$ is obvious. Also, suppose that $\|f\|_{\hat{L}^p} = 0$. Then we must have $f = 0$. Otherwise, if $f \ne 0$, say $|f(x)| = \varepsilon > 0$ for some $x$, then by continuity there is some $\delta$ such that for any $y \in (x - \delta, x + \delta)$, we have $|f(y)| \ge \frac{\varepsilon}{2}$. Hence
\[ \|f\|_{\hat{L}^p} = \left( \int_0^1 |f|^p \, dx \right)^{1/p} \ge \left[ 2\delta \left( \frac{\varepsilon}{2} \right)^p \right]^{1/p} > 0. \]

(b) $\|\lambda f\| = |\lambda| \|f\|$ is obvious.

(c) The triangle inequality is exactly what the Minkowski inequality says, which is on the example sheet.

It turns out that $\hat{L}^p$ is not a Banach space. We can brute-force a hard proof here, but we will later develop some tools that allow us to prove this much more easily.

Hence, we define $L^p([0,1])$ to be the completion of $\hat{L}^p([0,1])$. In IID Probability and Measure, we will show that $L^p([0,1])$ is in fact the space
\[ L^p([0,1]) = \left\{ f : [0,1] \to \mathbb{R} \text{ such that } \int_0^1 |f|^p \, dx < \infty \right\} \Big/ \sim, \]
where the integral is the Lebesgue integral, and we are quotienting by the relation $f \sim g$ if $f = g$ Lebesgue almost everywhere. You will understand what these terms mean in the IID Probability and Measure course.
(v) $\ell^p$ spaces: for $p \in [1, \infty)$, define
\[ \ell^p(\mathbb{F}) = \left\{ (x_1, x_2, \cdots) : x_i \in \mathbb{F}, \sum_{i=1}^\infty |x_i|^p < \infty \right\}, \]
with the norm
\[ \|x\|_{\ell^p} = \left( \sum_{i=1}^\infty |x_i|^p \right)^{1/p}. \]
It should be easy to check that this is a normed vector space. Moreover, this is a Banach space; the proof is on the example sheet. (A small numerical illustration of these norms follows this list of examples.)
(vi) $\ell^\infty$ space: we define
\[ \ell^\infty = \left\{ (x_1, x_2, \cdots) : x_i \in \mathbb{F}, \sup_{i \in \mathbb{N}} |x_i| < \infty \right\} \]
with norm
\[ \|x\|_{\ell^\infty} = \sup_{i \in \mathbb{N}} |x_i|. \]
Again, this is a Banach space.
(vii) Let $B = B(1)$ be the unit open ball in $\mathbb{R}^n$. Define $C(B)$ to be the set of continuous functions $f : B \to \mathbb{R}$. Note that unlike in our previous example, these functions need not be bounded. So our previous norm cannot be applied. However, we can still define a topology as follows:

Let $\{K_i\}_{i=1}^\infty$ be a sequence of compact subsets of $B$ such that $K_i \subseteq K_{i+1}$ and $\bigcup_{i=1}^\infty K_i = B$. We define the basis to include
\[ \left\{ f \in C(B) : \sup_{x \in K_i} |f(x)| < \frac{1}{m} \right\} \]
for each $m, i = 1, 2, \cdots$, as well as the translations of these sets.

This weird basis is chosen such that $f_n \to f$ in this topology iff $f_n \to f$ uniformly on every compact set. It can be shown that this is not normable.
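Here is the promised numerical illustration (a sketch, not from the notes): we approximate $\|x\|_{\ell^p}$ for the sequence $x_i = 1/i$ by truncation. For $p > 1$ the partial sums settle down, reflecting $x \in \ell^p$; for $p = 1$ they grow like $\log N$, reflecting $x \notin \ell^1$.

```python
# Hedged sketch: truncated l^p norms of the harmonic sequence x_i = 1/i.
def lp_norm_truncated(x_terms, p):
    return sum(abs(x) ** p for x in x_terms) ** (1.0 / p)

for p in [1.0, 1.5, 2.0]:
    for N in [10**2, 10**4, 10**6]:
        xs = (1.0 / i for i in range(1, N + 1))
        print(p, N, lp_norm_truncated(xs, p))
# for p = 1 the printed values keep growing; for p > 1 they stabilise
```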
1.1 Bounded linear maps
With vector spaces, we studied linear maps. These are maps that respect the
linear structure of a vector space. With normed vector spaces, the right kind of
maps to study is the bounded linear maps.
Definition (Bounded linear map). $T : X \to Y$ is a bounded linear map if there is a constant $C > 0$ such that $\|Tx\|_Y \le C\|x\|_X$ for all $x \in X$. We write $B(X, Y)$ for the set of bounded linear maps from $X$ to $Y$.

This is equivalent to saying $T(B_X(1)) \subseteq B_Y(C)$ for some $C > 0$. This is also equivalent to saying that $T(B)$ is bounded for every bounded subset $B$ of $X$. Note that this final characterization is also valid when we just have a topological vector space.
How does boundedness relate to the topological structure of the vector spaces?
It turns out that boundedness is the same as continuity, which is another reason
why we like bounded linear maps.
Proposition. Let $X$, $Y$ be normed vector spaces, $T : X \to Y$ a linear map. Then the following are equivalent:

(i) $T$ is continuous.

(ii) $T$ is continuous at $0$.

(iii) $T$ is bounded.

Proof. (i) $\Rightarrow$ (ii) is obvious.

(ii) $\Rightarrow$ (iii): Consider $B_Y(1) \subseteq Y$, the unit open ball. Since $T$ is continuous at $0$, $T^{-1}(B_Y(1)) \subseteq X$ is open. Hence there exists $\varepsilon > 0$ such that $B_X(\varepsilon) \subseteq T^{-1}(B_Y(1))$. So $T(B_X(\varepsilon)) \subseteq B_Y(1)$. So $T(B_X(1)) \subseteq B_Y\left(\frac{1}{\varepsilon}\right)$. So $T$ is bounded.

(iii) $\Rightarrow$ (i): Let $\varepsilon > 0$. Then
\[ \|Tx_1 - Tx_2\|_Y = \|T(x_1 - x_2)\|_Y \le C\|x_1 - x_2\|_X. \]
This is less than $\varepsilon$ if $\|x_1 - x_2\| < C^{-1}\varepsilon$. So done.
Using the obvious operations, $B(X, Y)$ can be made into a vector space. What about a norm?

Definition (Norm on $B(X, Y)$). Let $T : X \to Y$ be a bounded linear map. Define $\|T\|_{B(X,Y)}$ by
\[ \|T\|_{B(X,Y)} = \sup_{\|x\| \le 1} \|Tx\|_Y. \]
Alternatively, this is the minimum $C$ such that $\|Tx\|_Y \le C\|x\|_X$ for all $x$. In particular, we have
\[ \|Tx\|_Y \le \|T\|_{B(X,Y)} \|x\|_X. \]
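For a concrete feel for the operator norm, here is a minimal sketch (the setting is an assumption of this illustration: $X = Y = \mathbb{R}^n$ with the Euclidean norm and $T$ given by a matrix $A$, in which case $\|T\|$ is the largest singular value of $A$). We estimate it by power iteration on $A^{\mathsf{T}}A$, whose largest eigenvalue is $\|T\|^2$.

```python
# Hedged sketch: estimating ||T|| = sup_{||x|| <= 1} ||Tx|| for a matrix map.
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def norm(x):
    return sum(xi * xi for xi in x) ** 0.5

def operator_norm(A, iters=200):
    At = transpose(A)
    x = [1.0] * len(A[0])
    for _ in range(iters):
        y = matvec(At, matvec(A, x))     # apply T^T T
        x = [yi / norm(y) for yi in y]   # renormalise
    return norm(matvec(A, x))            # ||Tx|| at the top eigenvector

A = [[1.0, 2.0], [0.0, 1.0]]
print(operator_norm(A))  # ~2.414 = 1 + sqrt(2), the largest singular value
```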
1.2 Dual spaces

We will frequently be interested in one particular case of $B(X, Y)$.

Definition (Dual space). Let $V$ be a normed vector space. The dual space is
\[ V^* = B(V, \mathbb{F}). \]
We call the elements of $V^*$ functionals. The algebraic dual of $V$ is
\[ V' = L(V, \mathbb{F}), \]
where we do not require boundedness.

One particularly nice property of the dual is that $V^*$ is always a Banach space.

Proposition. Let $V$ be a normed vector space. Then $V^*$ is a Banach space.
Proof. Suppose $\{T_i\} \subseteq V^*$ is a Cauchy sequence. We define $T$ as follows: for any $v \in V$, $\{T_i(v)\} \subseteq \mathbb{F}$ is a Cauchy sequence. Since $\mathbb{F}$ is complete (it is either $\mathbb{R}$ or $\mathbb{C}$), we can define $T : V \to \mathbb{F}$ by
\[ T(v) = \lim_{n \to \infty} T_n(v). \]
Our objective is to show that $T_i \to T$. The first step is to show that we indeed have $T \in V^*$, i.e. $T$ is a bounded map.

Let $\|v\| \le 1$. Pick $\varepsilon = 1$. Then there is some $N$ such that for all $i > N$, we have $|T_i(v) - T(v)| < 1$. Then we have
\begin{align*}
|T(v)| &\le |T_i(v) - T(v)| + |T_i(v)| \\
&< 1 + \|T_i\|_{V^*} \|v\|_V \\
&\le 1 + \|T_i\|_{V^*} \\
&\le 1 + \sup_i \|T_i\|_{V^*}.
\end{align*}
Since $\{T_i\}$ is Cauchy, $\sup_i \|T_i\|_{V^*}$ is bounded. Since this bound does not depend on $v$ (and $N$), we get that $T$ is bounded.

Now we want to show that $\|T_i - T\|_{V^*} \to 0$ as $i \to \infty$.

For arbitrary $\varepsilon > 0$, there is some $N$ such that for all $i, j > N$, we have
\[ \|T_i - T_j\|_{V^*} < \varepsilon. \]
In particular, for any $v$ such that $\|v\| \le 1$, we have
\[ |T_i(v) - T_j(v)| < \varepsilon. \]
Taking the limit as $j \to \infty$, we obtain
\[ |T_i(v) - T(v)| \le \varepsilon. \]
Since this is true for any $v$, we have
\[ \|T_i - T\|_{V^*} \le \varepsilon \]
for all $i > N$. So $T_i \to T$.

Exercise: in general, for $X, Y$ normed vector spaces, what condition on $X$ and $Y$ guarantees that $B(X, Y)$ is a Banach space?
1.3 Adjoint

The idea of the adjoint is: given a $T \in B(X, Y)$, produce a "dual map", or an adjoint $T^* \in B(Y^*, X^*)$.

There is really only one (non-trivial) natural way of doing this. First we can think about what $T^*$ should do. It takes in something from $Y^*$ and produces something in $X^*$. By the definition of the dual space, this is equivalent to taking in a function $g : Y \to \mathbb{F}$ and returning a function $T^*(g) : X \to \mathbb{F}$.

To produce this $T^*(g)$, the only things we have on our hands to use are $T : X \to Y$ and $g : Y \to \mathbb{F}$. Thus the only option we have is to define $T^*(g)$ as the composition $g \circ T$, i.e. $T^*(g)(x) = g(T(x))$ (we also have the silly option of producing the zero map regardless of input, but this is silly). Indeed, this is the definition of the adjoint.

Definition (Adjoint). Let $X, Y$ be normed vector spaces. Given $T \in B(X, Y)$, we define the adjoint of $T$, denoted $T^*$, as the map $T^* \in B(Y^*, X^*)$ given by
\[ T^*(g)(x) = g(T(x)) \]
for $x \in X$, $g \in Y^*$. Alternatively, we can write
\[ T^*(g) = g \circ T. \]
It is easy to show that our $T^*$ is indeed linear. We now show it is bounded.
Proposition. $T^*$ is bounded.

Proof. We want to show that $\|T^*\|_{B(Y^*, X^*)}$ is finite. For simplicity of notation, the suprema are assumed to be taken over non-zero elements of the spaces. We have
\begin{align*}
\|T^*\|_{B(Y^*, X^*)} &= \sup_{g \in Y^*} \frac{\|T^*(g)\|_{X^*}}{\|g\|_{Y^*}} \\
&= \sup_{g \in Y^*} \sup_{x \in X} \frac{|T^*(g)(x)|/\|x\|_X}{\|g\|_{Y^*}} \\
&= \sup_{g \in Y^*} \sup_{x \in X} \frac{|g(Tx)|}{\|g\|_{Y^*}\|x\|_X} \\
&\le \sup_{g \in Y^*} \sup_{x \in X} \frac{\|g\|_{Y^*}\|Tx\|_Y}{\|g\|_{Y^*}\|x\|_X} \\
&\le \sup_{x \in X} \frac{\|T\|_{B(X,Y)}\|x\|_X}{\|x\|_X} \\
&= \|T\|_{B(X,Y)}.
\end{align*}
So it is finite.
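In finite dimensions the adjoint is just the transpose: identifying a functional $g$ on $\mathbb{R}^m$ with its vector of values $(g(e_1), \cdots, g(e_m))$, the relation $T^*(g)(x) = g(Tx)$ becomes $(A^{\mathsf{T}}g) \cdot x = g \cdot (Ax)$. A minimal sketch (not from the notes):

```python
# Hedged sketch: the adjoint of a matrix map is its transpose.
A = [[1.0, 2.0, 0.0],
     [3.0, -1.0, 4.0]]          # T : R^3 -> R^2

def apply(M, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

g = [2.0, -1.0]                 # a functional on R^2, as a vector of values
x = [1.0, 0.5, -2.0]

# g(Tx) computed directly ...
lhs = sum(gi * yi for gi, yi in zip(g, apply(A, x)))
# ... equals (T* g)(x), with T* represented by the transpose matrix
At = [list(col) for col in zip(*A)]
rhs = sum(hi * xi for hi, xi in zip(apply(At, g), x))
assert abs(lhs - rhs) < 1e-12
```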
1.4 The double dual

Definition (Double dual). Let $V$ be a normed vector space. Define $V^{**} = (V^*)^*$.

We want to define a map $\phi : V \to V^{**}$. Again, we can reason about what we expect this function to do. It takes in a $v \in V$, and produces a $\phi(v) \in V^{**}$. Expanding the definition, this gives a $\phi(v) : V^* \to \mathbb{F}$. Hence this $\phi(v)$ takes in a $g \in V^*$, and returns a $\phi(v)(g) \in \mathbb{F}$.

This is easy. Since $g \in V^*$, we know that $g$ is a function $g : V \to \mathbb{F}$. Given this function $g$ and a $v \in V$, it is easy to produce a $\phi(v)(g) \in \mathbb{F}$: just apply $g$ to $v$:
\[ \phi(v)(g) = g(v). \]
Proposition. Let $\phi : V \to V^{**}$ be defined by $\phi(v)(g) = g(v)$. Then $\phi$ is a bounded linear map and $\|\phi\|_{B(V, V^{**})} \le 1$.

Proof. Again, we are taking the suprema over non-zero elements. We have
\begin{align*}
\|\phi\|_{B(V, V^{**})} &= \sup_{v \in V} \frac{\|\phi(v)\|_{V^{**}}}{\|v\|_V} \\
&= \sup_{v \in V} \sup_{g \in V^*} \frac{|\phi(v)(g)|}{\|v\|_V\|g\|_{V^*}} \\
&= \sup_{v \in V} \sup_{g \in V^*} \frac{|g(v)|}{\|v\|_V\|g\|_{V^*}} \\
&\le 1.
\end{align*}
In fact, we will later show that $\|\phi\|_{B(V, V^{**})} = 1$.
1.5 Isomorphism

So far, we have discussed a lot about bounded linear maps, which are "morphisms" between normed vector spaces. It is thus natural to come up with a notion of isomorphism.

Definition (Isomorphism). Let $X, Y$ be normed vector spaces. Then $T : X \to Y$ is an isomorphism if it is a bounded linear map with a bounded linear inverse (i.e. it is a homeomorphism).

We say $X$ and $Y$ are isomorphic if there is an isomorphism $T : X \to Y$.

We say that $T : X \to Y$ is an isometric isomorphism if $T$ is an isomorphism and $\|Tx\|_Y = \|x\|_X$ for all $x \in X$. $X$ and $Y$ are isometrically isomorphic if there is an isometric isomorphism between them.
Example. Consider a finite-dimensional space $V = \mathbb{F}^n$ with the standard basis $\{e_1, \cdots, e_n\}$. For any $v = \sum v_i e_i$, the norm is defined by
\[ \|v\| = \left( \sum v_i^2 \right)^{1/2}. \]
Then any $g \in V^*$ is determined by $g(e_i)$ for $i = 1, \cdots, n$. We want to show that there are no restrictions on what $g(e_i)$ can be, i.e. whatever values I assign to them, $g$ will still be bounded. We have
\[ \|g\|_{V^*} = \sup_{v \in V} \frac{|g(v)|}{\|v\|} \le \sup_{v \in V} \frac{\sum |v_i||g(e_i)|}{\left(\sum |v_i|^2\right)^{1/2}} \le C \sup_{v \in V} \frac{\left(\sum |v_i|^2\right)^{1/2}}{\left(\sum |v_i|^2\right)^{1/2}} \sup_i |g(e_i)| = C \sup_i |g(e_i)| \]
for some $C$, where the second-to-last step is due to the Cauchy–Schwarz inequality. The supremum is finite since $\mathbb{F}^n$ is finite-dimensional.

Since $g$ is uniquely determined by the list of values $(g(e_1), g(e_2), \cdots, g(e_n))$, $V^*$ has dimension $n$. Therefore, $V^*$ is isomorphic to $\mathbb{F}^n$. By the same line of argument, $V^{**}$ is isomorphic to $\mathbb{F}^n$.

In fact, we can show that $\phi : V \to V^{**}$ given by $\phi(v)(g) = g(v)$ is an isometric isomorphism (this is not true for general normed vector spaces: just pick $V$ to be incomplete; then $V$ and $V^{**}$ cannot be isomorphic since $V^{**}$ is complete).
Example. Consider $\ell^p$ for $p \in [1, \infty)$. What is $(\ell^p)^*$?

Suppose $q$ is the conjugate exponent of $p$, i.e.
\[ \frac{1}{q} + \frac{1}{p} = 1 \]
(if $p = 1$, define $q = \infty$). It is easy to see that $\ell^q \subseteq (\ell^p)^*$ by the following: suppose $(x_1, x_2, \cdots) \in \ell^p$ and $(y_1, y_2, \cdots) \in \ell^q$. Define
\[ y(x) = \sum_{i=1}^\infty x_i y_i. \]
We will show that $y$ defined this way is a bounded linear map. Linearity is easy to see, and boundedness comes from the fact that
\[ \|y\|_{(\ell^p)^*} = \sup_x \frac{|y(x)|}{\|x\|_{\ell^p}} = \sup_x \frac{\left|\sum x_i y_i\right|}{\|x\|_{\ell^p}} \le \sup_x \frac{\|x\|_{\ell^p}\|y\|_{\ell^q}}{\|x\|_{\ell^p}} = \|y\|_{\ell^q}, \]
by Hölder's inequality. So every $(y_i) \in \ell^q$ determines a bounded linear map. In fact, we can show that $(\ell^p)^*$ is isomorphic to $\ell^q$.
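Here is a minimal numerical sketch (not from the notes) of the inequality doing the work above, Hölder's inequality $\left|\sum x_i y_i\right| \le \|x\|_{\ell^p}\|y\|_{\ell^q}$, checked on random finitely supported sequences:

```python
# Hedged sketch: empirically checking Hoelder's inequality.
import random

def norm_p(x, p):
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

random.seed(1)
p = 1.5
q = p / (p - 1.0)               # conjugate exponent: 1/p + 1/q = 1
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(20)]
    y = [random.uniform(-1, 1) for _ in range(20)]
    pairing = abs(sum(a * b for a, b in zip(x, y)))
    assert pairing <= norm_p(x, p) * norm_p(y, q) + 1e-9
```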
1.6 Finite-dimensional normed vector spaces
We are now going to look at a special case of normed vector spaces, where the
vector space is finite dimensional.
It turns out that finite-dimensional vector spaces are rather boring. In particular, we have:

(i) All norms are equivalent.

(ii) The closed unit ball is compact.

(iii) They are Banach spaces.

(iv) All linear maps whose domain is finite-dimensional are bounded.

These are what we are going to show in this section.

First of all, we need to say what we mean when we say all norms are "equivalent".
Definition (Equivalent norms). Let $V$ be a vector space, and $\|\cdot\|_1$, $\|\cdot\|_2$ be norms on $V$. We say that these are equivalent if there exists a constant $C > 0$ such that for any $v \in V$, we have
\[ C^{-1}\|v\|_2 \le \|v\|_1 \le C\|v\|_2. \]
It is an exercise to show that equivalent norms induce the same topology, and hence agree on continuity and convergence. Also, equivalence of norms is an equivalence relation (as the name suggests).
Now let $V$ be an $n$-dimensional vector space with basis $\{e_1, \cdots, e_n\}$. We can define the $\ell_p^n$ norm by
\[ \|v\|_{\ell_p^n} = \left( \sum_{i=1}^n |v_i|^p \right)^{1/p}, \]
where
\[ v = \sum_{i=1}^n v_i e_i. \]

Proposition. Let $V$ be an $n$-dimensional vector space. Then all norms on $V$ are equivalent to the norm $\|\cdot\|_{\ell_1^n}$.
Corollary. All norms on a finite-dimensional vector space are equivalent.
Proof. Let $\|\cdot\|$ be a norm on $V$.

Let $v = (v_1, \cdots, v_n) = \sum v_i e_i \in V$. Then we have
\[ \|v\| = \left\| \sum v_i e_i \right\| \le \sum_{i=1}^n |v_i|\|e_i\| \le \left( \sup_i \|e_i\| \right) \sum_{i=1}^n |v_i| \le c\|v\|_{\ell_1^n}, \]
where $c = \sup_i \|e_i\| < \infty$ since we are taking a finite supremum.

For the other way round, let $S_1 = \{v \in V : \|v\|_{\ell_1^n} = 1\}$. We will show the following two results:

(i) $\|\cdot\| : (S_1, \|\cdot\|_{\ell_1^n}) \to \mathbb{R}$ is continuous.

(ii) $S_1$ is a compact set.

We first see why this gives what we want. We know that for any continuous map from a compact set to $\mathbb{R}$, the image is bounded and the infimum is achieved. So there is some $v^* \in S_1$ such that
\[ \|v^*\| = \inf_{v \in S_1} \|v\|. \]
Since $v^* \ne 0$, we have $\|v^*\| > 0$. So there is some $c_0 > 0$ such that $\|v\| \ge c_0$ for all $v \in S_1$.

Now take an arbitrary non-zero $v \in V$. Since $\frac{v}{\|v\|_{\ell_1^n}} \in S_1$, we know that
\[ \left\| \frac{v}{\|v\|_{\ell_1^n}} \right\| \ge c_0, \]
which is to say that
\[ \|v\| \ge c_0\|v\|_{\ell_1^n}. \]
Since we have found $c, c_0 > 0$ such that
\[ c_0\|v\|_{\ell_1^n} \le \|v\| \le c\|v\|_{\ell_1^n}, \]
we now let $C = \max\left(c, \frac{1}{c_0}\right) > 0$. Then
\[ C^{-1}\|v\|_{\ell_1^n} \le \|v\| \le C\|v\|_{\ell_1^n}. \]
So the norms are equivalent. Now we can start to prove (i) and (ii).

First, let $v, w \in V$. We have
\[ \big| \|v\| - \|w\| \big| \le \|v - w\| \le c\|v - w\|_{\ell_1^n}. \]
Hence when $v$ is close to $w$ under $\ell_1^n$, then $\|v\|$ is close to $\|w\|$. So $\|\cdot\|$ is continuous.

To show (ii), it suffices to show that the unit ball $B = \{v \in V : \|v\|_{\ell_1^n} \le 1\}$ is compact, since $S_1$ is a closed subset of $B$. We will do so by showing it is sequentially compact.

Let $\{v^{(k)}\}_{k=1}^\infty$ be a sequence in $B$. Write
\[ v^{(k)} = \sum_{i=1}^n \lambda_i^{(k)} e_i. \]
Since $v^{(k)} \in B$, we have
\[ \sum_{i=1}^n |\lambda_i^{(k)}| \le 1. \]
Consider the sequence $\lambda_1^{(k)}$, which is a sequence in $\mathbb{F}$. We know that $|\lambda_1^{(k)}| \le 1$. So by Bolzano–Weierstrass, there is a convergent subsequence $\lambda_1^{(k_{j_1})}$.

Now look at $\lambda_2^{(k_{j_1})}$. Since this is bounded, there is a convergent subsequence $\lambda_2^{(k_{j_2})}$.

Iterate this through all $n$ coordinates to obtain a subsequence $k_{j_n}$ such that $\lambda_i^{(k_{j_n})}$ is convergent for all $i$. So $v^{(k_{j_n})}$ is a convergent subsequence.
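To illustrate, here is a minimal numerical sketch (not from the notes) estimating the equivalence constants between $\|\cdot\|_{\ell_1^n}$ and $\|\cdot\|_{\ell_2^n}$ on $\mathbb{R}^4$ by sampling the $\ell^1$ unit sphere; the exact constants are $\|v\|_2 \le \|v\|_1 \le \sqrt{n}\|v\|_2$.

```python
# Hedged sketch: sampling the ratio ||v||_2 / ||v||_1 on the l^1 unit sphere.
import random

def l1(v): return sum(abs(t) for t in v)
def l2(v): return sum(t * t for t in v) ** 0.5

random.seed(2)
n = 4
ratios = []
for _ in range(100000):
    v = [random.gauss(0, 1) for _ in range(n)]
    s = l1(v)
    v = [t / s for t in v]      # normalise so ||v||_1 = 1
    ratios.append(l2(v))        # record ||v||_2 on the l^1 unit sphere
print(min(ratios), max(ratios)) # approaches the interval [1/sqrt(n), 1]
```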
Proposition. Let $V$ be a finite-dimensional normed vector space. Then the closed unit ball
\[ \bar{B}(1) = \{v \in V : \|v\| \le 1\} \]
is compact.

Proof. This follows from the proof above.
Proposition. Let $V$ be a finite-dimensional normed vector space. Then $V$ is a Banach space.

Proof. Let $\{v_i\} \subseteq V$ be a Cauchy sequence. Since $\{v_i\}$ is Cauchy, it is bounded, i.e. $\{v_i\} \subseteq \bar{B}(R)$ for some $R > 0$. By the above, $\bar{B}(R)$ is compact. So $\{v_i\}$ has a convergent subsequence $v_{i_k} \to v$. Since $\{v_i\}$ is Cauchy, we must have $v_i \to v$. So $\{v_i\}$ converges.
Proposition. Let $V, W$ be normed vector spaces, with $V$ finite-dimensional. Also, let $T : V \to W$ be a linear map. Then $T$ is bounded.

Proof. Recall our discussion last time about $V^*$ for finite-dimensional $V$. We will do a similar proof.

Note that since $V$ is finite-dimensional, $\operatorname{im} T$ is finite-dimensional. So wlog $W$ is finite-dimensional. Since all norms are equivalent, it suffices to consider the case where the vector spaces have the $\ell_1^n$ and $\ell_1^m$ norms. Then $T$ can be represented by a matrix $T_{ij}$ such that
\[ T(x_1, \cdots, x_n) = \left( \sum_i T_{1i}x_i, \cdots, \sum_i T_{mi}x_i \right). \]
We can bound this by
\[ \|T(x_1, \cdots, x_n)\| \le \sum_{j=1}^m \sum_{i=1}^n |T_{ji}||x_i| \le m \left( \sup_{i,j} |T_{ij}| \right) \sum_{i=1}^n |x_i| \le C\|x\|_{\ell_1^n} \]
for some $C > 0$, since we are taking the supremum over a finite set. This implies that $\|T\|_{B(\ell_1^n, \ell_1^m)} \le C$.
There is another way to prove this statement.

Proof. (alternative) Let $T : V \to W$ be a linear map. We define a norm on $V$ by $\|v\|_0 = \|v\|_V + \|Tv\|_W$. It is easy to show that this is a norm.

Since $V$ is finite-dimensional, all norms are equivalent. So there is a constant $C > 0$ such that for all $v$, we have
\[ \|v\|_0 \le C\|v\|_V. \]
In particular, we have $\|Tv\| \le C\|v\|_V$. So done.
Among all these properties, compactness of $\bar{B}(1)$ characterizes finite-dimensionality.

Proposition. Let $V$ be a normed vector space. Suppose that the closed unit ball $\bar{B}(1)$ is compact. Then $V$ is finite-dimensional.

Proof. Consider the following open cover of $\bar{B}(1)$:
\[ \bar{B}(1) \subseteq \bigcup_{y \in \bar{B}(1)} B\left(y, \tfrac{1}{2}\right). \]
Since $\bar{B}(1)$ is compact, this has a finite subcover. So there are some $y_1, \cdots, y_n$ such that
\[ \bar{B}(1) \subseteq \bigcup_{i=1}^n B\left(y_i, \tfrac{1}{2}\right). \]
Now let $Y = \operatorname{span}\{y_1, \cdots, y_n\}$, which is a finite-dimensional subspace of $V$. We want to show that in fact we have $Y = V$.

Clearly, by definition of $Y$, the unit ball satisfies
\[ B(1) \subseteq Y + B\left(\tfrac{1}{2}\right), \]
i.e. for every $v \in B(1)$, there is some $y \in Y$ and $w \in B(\tfrac{1}{2})$ such that $v = y + w$. Multiplying everything by $\tfrac{1}{2}$, we get
\[ B\left(\tfrac{1}{2}\right) \subseteq Y + B\left(\tfrac{1}{4}\right). \]
Hence we also have
\[ B(1) \subseteq Y + B\left(\tfrac{1}{4}\right). \]
By induction, for every $n$, we have
\[ B(1) \subseteq Y + B\left(\tfrac{1}{2^n}\right). \]
As a consequence, $B(1) \subseteq \bar{Y}$.

Since $Y$ is finite-dimensional, we know that $Y$ is complete. So $Y$ is a closed subspace of $V$, i.e. $\bar{Y} = Y$. So in fact
\[ B(1) \subseteq Y. \]
Since every element in $V$ can be rescaled to an element of $B(1)$, we know that $V = Y$. Hence $V$ is finite-dimensional.

This concludes our discussion on finite-dimensional vector spaces. We'll end with an example that shows these results are not true for infinite-dimensional vector spaces.
Example. Consider $\ell^1$, and let $e_i = (0, 0, \cdots, 0, 1, 0, \cdots)$, where $e_i$ is $1$ in the $i$th entry and $0$ elsewhere.

Note that if $i \ne j$, then
\[ \|e_i - e_j\| = 2. \]
Since $e_i \in \bar{B}(1)$, we see that $\bar{B}(1)$ cannot be covered by finitely many open balls of radius $\tfrac{1}{2}$, since each such open ball can contain at most one of the $e_i$.
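A minimal sketch (not from the notes) of the computation behind this example:

```python
# Hedged sketch: standard basis vectors of l^1 are pairwise at distance 2,
# so the sequence (e_i) in the closed unit ball has no convergent subsequence.
def e(i, length):
    v = [0.0] * length
    v[i] = 1.0
    return v

n = 10
for i in range(n):
    for j in range(i + 1, n):
        dist = sum(abs(a - b) for a, b in zip(e(i, n), e(j, n)))
        assert dist == 2.0
```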
1.7 Hahn–Banach Theorem

Let $V$ be a real normed vector space. What can we say about $V^* = B(V, \mathbb{R})$? For instance, if $V$ is non-trivial, must $V^*$ be non-trivial?

The main goal of this section is to prove the Hahn–Banach theorem (surprise), which allows us to produce a lot of elements in $V^*$. Moreover, it doesn't just tell us that $V^*$ is non-empty (this is rather dull), but provides a tool to craft (or at least prove the existence of) elements of $V^*$ that satisfy some property we want.
Proposition. Let $V$ be a real normed vector space, and let $W \subseteq V$ have co-dimension 1. Assume we have the following two items:

- $p : V \to \mathbb{R}$ (not necessarily linear), which is positive homogeneous, i.e.
\[ p(\lambda v) = \lambda p(v) \]
for all $v \in V$, $\lambda > 0$, and subadditive, i.e.
\[ p(v_1 + v_2) \le p(v_1) + p(v_2) \]
for all $v_1, v_2 \in V$. We can think of it as something like a norm, but more general.

- $f : W \to \mathbb{R}$, a linear map such that $f(w) \le p(w)$ for all $w \in W$.

Then there exists an extension $\tilde{f} : V \to \mathbb{R}$ which is linear such that $\tilde{f}|_W = f$ and $\tilde{f}(v) \le p(v)$ for all $v \in V$.

Why do we want this weird theorem? Our objective is to find something in $V^*$. This theorem tells us that to find a bounded linear map in $V^*$, we just need something on $W$ bounded by a norm-like object, and then we can extend it to $V$.
Proof. Let $v_0 \in V \setminus W$. Since $W$ has co-dimension 1, every element $v \in V$ can be written uniquely as $v = w + a v_0$ for some $w \in W$, $a \in \mathbb{R}$. Therefore it suffices to define $\tilde{f}(v_0)$ and then extend linearly to $V$.

The condition we want to meet is
\[ \tilde{f}(w + a v_0) \le p(w + a v_0) \tag{$*$} \]
for all $w \in W$, $a \in \mathbb{R}$. If $a = 0$, then this is satisfied since $\tilde{f}$ restricts to $f$ on $W$.

If $a > 0$, then $(*)$ is equivalent to
\[ \tilde{f}(w) + a\tilde{f}(v_0) \le p(w + a v_0). \]
We can divide by $a$ to obtain
\[ \tilde{f}(a^{-1}w) + \tilde{f}(v_0) \le p(a^{-1}w + v_0). \]
We let $w' = a^{-1}w$. So we can write this as
\[ \tilde{f}(v_0) \le p(w' + v_0) - f(w') \]
for all $w' \in W$.

If $a < 0$, then $(*)$ is equivalent to
\[ \tilde{f}(w) + a\tilde{f}(v_0) \le p(w + a v_0). \]
We now divide by $a$ and flip the sign of the inequality. So we have
\[ \tilde{f}(a^{-1}w) + \tilde{f}(v_0) \ge a^{-1}p(w + a v_0) = -p(-a^{-1}w - v_0), \]
using positive homogeneity. In other words, we want
\[ \tilde{f}(v_0) \ge -p(-a^{-1}w - v_0) - f(a^{-1}w). \]
We let $w' = -a^{-1}w$. Then we are left with
\[ \tilde{f}(v_0) \ge -p(w' - v_0) + f(w') \]
for all $w' \in W$.

Hence we are done if we can define an $\tilde{f}(v_0)$ that satisfies these two conditions. This is possible if and only if
\[ -p(w_1 - v_0) + f(w_1) \le p(w_2 + v_0) - f(w_2) \]
for all $w_1, w_2 \in W$. This holds since
\begin{align*}
f(w_1) + f(w_2) &= f(w_1 + w_2) \\
&\le p(w_1 + w_2) \\
&= p(w_1 - v_0 + w_2 + v_0) \\
&\le p(w_1 - v_0) + p(w_2 + v_0).
\end{align*}
So the result follows.
The goal is to "iterate" this to get a similar result without the co-dimension 1 assumption. While we can do this directly finitely many times, this isn't helpful (since we already know a lot about finite-dimensional normed spaces). To perform an "infinite iteration", we need the mysterious result known as Zorn's lemma.
Digression on Zorn's lemma

We first need a few definitions before we can come to Zorn's lemma.

Definition (Partial order). A relation $\le$ on a set $X$ is a partial order if it satisfies

(i) $x \le x$ (reflexivity)

(ii) $x \le y$ and $y \le x$ implies $x = y$ (antisymmetry)

(iii) $x \le y$ and $y \le z$ implies $x \le z$ (transitivity)

Definition (Total order). Let $(S, \le)$ be a partial order. $T \subseteq S$ is totally ordered if for all $x, y \in T$, either $x \le y$ or $y \le x$, i.e. every two elements are related.

Definition (Upper bound). Let $(S, \le)$ be a partial order, and $S' \subseteq S$ a subset. We say $b \in S$ is an upper bound of this subset if $x \le b$ for all $x \in S'$.

Definition (Maximal element). Let $(S, \le)$ be a partial order. Then $m \in S$ is a maximal element if $x \ge m$ implies $x = m$.

The glorious Zorn's lemma tells us that:

Lemma (Zorn's lemma). Let $(S, \le)$ be a non-empty partially ordered set such that every totally-ordered subset $S'$ has an upper bound in $S$. Then $S$ has a maximal element.
We will not give a proof of this lemma here, but we can explain why it should be true.

We start by picking one element $x_0$ in $S$. If it is maximal, then done. Otherwise, there is some $x_1 > x_0$. If this is not maximal, then pick $x_2 > x_1$. We do this to infinity "and beyond": after picking infinitely many $x_i$, if we have not yet reached a maximal element, we take an upper bound of this set, and call it $x_\omega$. If this is not maximal, we can continue picking a larger element.

We can do this forever, but if this process never stops, even after infinite time, we would have picked out more elements than there are in $S$, which is clearly nonsense. Of course, this is hardly a formal proof. The proper proof can be found in the IID Logic and Set Theory course.
Back to vector spaces

The Hahn–Banach theorem is just our previous proposition without the constraint that $W$ has co-dimension 1.

Theorem (Hahn–Banach theorem*). Let $V$ be a real normed vector space, and $W \subseteq V$ a subspace. Assume we have the following two items:

- $p : V \to \mathbb{R}$ (not necessarily linear), which is positive homogeneous and subadditive;

- $f : W \to \mathbb{R}$, a linear map such that $f(w) \le p(w)$ for all $w \in W$.

Then there exists an extension $\tilde{f} : V \to \mathbb{R}$ which is linear such that $\tilde{f}|_W = f$ and $\tilde{f}(v) \le p(v)$ for all $v \in V$.
Proof. Let $S$ be the set of all pairs $(\tilde{V}, \tilde{f})$ such that

(i) $W \subseteq \tilde{V} \subseteq V$

(ii) $\tilde{f} : \tilde{V} \to \mathbb{R}$ is linear

(iii) $\tilde{f}|_W = f$

(iv) $\tilde{f}(\tilde{v}) \le p(\tilde{v})$ for all $\tilde{v} \in \tilde{V}$

We introduce a partial order $\le$ on $S$ by $(\tilde{V}_1, \tilde{f}_1) \le (\tilde{V}_2, \tilde{f}_2)$ if $\tilde{V}_1 \subseteq \tilde{V}_2$ and $\tilde{f}_2|_{\tilde{V}_1} = \tilde{f}_1$. It is easy to see that this is indeed a partial order.

We now check that this satisfies the assumptions of Zorn's lemma. Let $\{(\tilde{V}_\alpha, \tilde{f}_\alpha)\}_{\alpha \in A} \subseteq S$ be a totally ordered set. Define $(\tilde{V}, \tilde{f})$ by
\[ \tilde{V} = \bigcup_{\alpha \in A} \tilde{V}_\alpha, \quad \tilde{f}(x) = \tilde{f}_\alpha(x) \text{ for } x \in \tilde{V}_\alpha. \]
This is well-defined because $\{(\tilde{V}_\alpha, \tilde{f}_\alpha)\}_{\alpha \in A}$ is totally ordered: if $x \in \tilde{V}_{\alpha_1}$ and $x \in \tilde{V}_{\alpha_2}$, wlog assume $(\tilde{V}_{\alpha_1}, \tilde{f}_{\alpha_1}) \le (\tilde{V}_{\alpha_2}, \tilde{f}_{\alpha_2})$. Then $\tilde{f}_{\alpha_2}|_{\tilde{V}_{\alpha_1}} = \tilde{f}_{\alpha_1}$, so $\tilde{f}_{\alpha_1}(x) = \tilde{f}_{\alpha_2}(x)$.

It should be clear that $(\tilde{V}, \tilde{f}) \in S$ and that $(\tilde{V}, \tilde{f})$ is indeed an upper bound of $\{(\tilde{V}_\alpha, \tilde{f}_\alpha)\}_{\alpha \in A}$. So the conditions of Zorn's lemma are satisfied.

Hence by Zorn's lemma, there is a maximal element $(\tilde{W}, \tilde{f}) \in S$. Then by definition, $\tilde{f}$ is linear, restricts to $f$ on $W$, and is bounded by $p$. We now show that $\tilde{W} = V$.

Suppose not. Then there is some $v_0 \in V \setminus \tilde{W}$. Define $\tilde{V} = \operatorname{span}\{\tilde{W}, v_0\}$. Now $\tilde{W}$ is a co-dimension 1 subspace of $\tilde{V}$. By our previous result, we know that there is some $\tilde{\tilde{f}} : \tilde{V} \to \mathbb{R}$ linear such that $\tilde{\tilde{f}}|_{\tilde{W}} = \tilde{f}$ and $\tilde{\tilde{f}}(v) \le p(v)$ for all $v \in \tilde{V}$.

Hence we have $(\tilde{V}, \tilde{\tilde{f}}) \in S$ but $(\tilde{W}, \tilde{f}) < (\tilde{V}, \tilde{\tilde{f}})$. This contradicts the maximality of $(\tilde{W}, \tilde{f})$.
There is a particularly important special case of this, which is itself sometimes known as the Hahn–Banach theorem.

Corollary (Hahn–Banach theorem 2.0). Let $W \subseteq V$ be real normed vector spaces. Given $f \in W^*$, there exists an $\tilde{f} \in V^*$ such that $\tilde{f}|_W = f$ and $\|\tilde{f}\|_{V^*} = \|f\|_{W^*}$.

Proof. Use the Hahn–Banach theorem with $p(x) = \|f\|_{W^*}\|x\|_V$ for all $x \in V$. Positive homogeneity and subadditivity follow directly from the axioms of the norm. Then by definition $f(w) \le p(w)$ for all $w \in W$. So the Hahn–Banach theorem says that there is an $\tilde{f} : V \to \mathbb{R}$ linear such that $\tilde{f}|_W = f$ and
\[ \tilde{f}(v) \le p(v) = \|f\|_{W^*}\|v\|_V. \]
Now notice that
\[ \tilde{f}(v) \le \|f\|_{W^*}\|v\|_V, \quad -\tilde{f}(v) = \tilde{f}(-v) \le \|f\|_{W^*}\|v\|_V \]
together imply that $|\tilde{f}(v)| \le \|f\|_{W^*}\|v\|_V$ for all $v \in V$. So $\|\tilde{f}\|_{V^*} \le \|f\|_{W^*}$.

On the other hand, we have (again taking the supremum over non-zero elements)
\[ \|\tilde{f}\|_{V^*} = \sup_{v \in V} \frac{|\tilde{f}(v)|}{\|v\|_V} \ge \sup_{w \in W} \frac{|f(w)|}{\|w\|_W} = \|f\|_{W^*}. \]
So indeed we have $\|\tilde{f}\|_{V^*} = \|f\|_{W^*}$.
We'll have some quick corollaries of these theorems.

Proposition. Let $V$ be a real normed vector space. For every $v \in V \setminus \{0\}$, there is some $f_v \in V^*$ such that $f_v(v) = \|v\|_V$ and $\|f_v\|_{V^*} = 1$.

Proof. Apply the Hahn–Banach theorem (2.0) with $W = \operatorname{span}\{v\}$ and $f_v^0 \in W^*$ defined by $f_v^0(v) = \|v\|_V$.
Corollary. Let $V$ be a real normed vector space. Then $v = 0$ if and only if $f(v) = 0$ for all $f \in V^*$.

Corollary. Let $V$ be a non-trivial real normed vector space, $v, w \in V$ with $v \ne w$. Then there is some $f \in V^*$ such that $f(v) \ne f(w)$.

Corollary. If $V$ is a non-trivial real normed vector space, then $V^*$ is non-trivial.

We now want to restrict the discussion to double duals. We define $\phi : V \to V^{**}$ as before by $\phi(v)(f) = f(v)$ for $v \in V$, $f \in V^*$.

Proposition. The map $\phi : V \to V^{**}$ is an isometry, i.e. $\|\phi(v)\|_{V^{**}} = \|v\|_V$.
Proof. We have previously shown that
\[ \|\phi\|_{B(V, V^{**})} \le 1. \]
It thus suffices to show the reverse inequality, i.e. that
\[ \|\phi(v)\|_{V^{**}} \ge \|v\|_V. \]
We may assume $v \ne 0$, for otherwise the inequality is trivial. We have
\[ \|\phi(v)\|_{V^{**}} = \sup_{f \in V^*} \frac{|\phi(v)(f)|}{\|f\|_{V^*}} \ge \frac{|\phi(v)(f_v)|}{\|f_v\|_{V^*}} = |f_v(v)| = \|v\|_V, \]
where $f_v$ is the functional with $f_v(v) = \|v\|_V$ and $\|f_v\|_{V^*} = 1$ that we constructed previously. So done.

In particular, $\phi$ is injective and one can view $\phi$ as an isometric embedding of $V$ into $V^{**}$.
Definition (Reflexive). We say $V$ is reflexive if $\phi(V) = V^{**}$.

Note that any reflexive space is Banach, since $V^{**}$, being the dual of $V^*$, is complete.
You might have heard that for any infinite-dimensional vector space $V$, the dual of $V$ is always strictly larger than $V$. This does not prevent an infinite-dimensional vector space from being reflexive. When we said the dual of $V$ is always strictly larger than $V$, we were referring to the algebraic dual, i.e. the set of all linear maps from $V$ to $\mathbb{F}$. In the definition of reflexivity (and everywhere else where we mention "dual" in this course), we mean the continuous dual, where we look at the set of all bounded linear maps from $V$ to $\mathbb{F}$. It is indeed possible for the continuous dual to be isomorphic to the original space, even for infinite-dimensional spaces, as we will see later.

Example. Finite-dimensional normed vector spaces are reflexive. Also, $\ell^p$ is reflexive for $p \in (1, \infty)$.
Recall that given $T \in B(V, W)$, we defined $T^* \in B(W^*, V^*)$ by
\[ T^*(f)(v) = f(Tv) \]
for $v \in V$, $f \in W^*$. We have previously shown that
\[ \|T^*\|_{B(W^*, V^*)} \le \|T\|_{B(V, W)}. \]
We will now show that in fact equality holds.

Proposition. $\|T^*\|_{B(W^*, V^*)} = \|T\|_{B(V, W)}$.

Proof. We have already shown that
\[ \|T^*\|_{B(W^*, V^*)} \le \|T\|_{B(V, W)}. \]
For the other inequality, first let $\varepsilon > 0$. Since
\[ \|T\|_{B(V, W)} = \sup_{v \in V} \frac{\|Tv\|_W}{\|v\|_V} \]
by definition, there is some $v \in V$ with $\|v\|_V = 1$ such that
\[ \|Tv\|_W \ge \|T\|_{B(V, W)} - \varepsilon. \]
Therefore, we get that
\begin{align*}
\|T^*\|_{B(W^*, V^*)} &= \sup_{f \in W^*} \frac{\|T^*(f)\|_{V^*}}{\|f\|_{W^*}} \\
&\ge \|T^*(f_{Tv})\|_{V^*} \\
&\ge |T^*(f_{Tv})(v)| \\
&= |f_{Tv}(Tv)| \\
&= \|Tv\|_W \\
&\ge \|T\|_{B(V, W)} - \varepsilon,
\end{align*}
where $f_{Tv}$ is the functional provided by Hahn–Banach with $f_{Tv}(Tv) = \|Tv\|_W$, and we used the fact that $\|f_{Tv}\|_{W^*}$ and $\|v\|_V$ are both 1. Since $\varepsilon$ was arbitrary, we are done.
2 Baire category theorem

2.1 The Baire category theorem

When we first write the Baire category theorem down, it might seem a bit pointless. However, it turns out to be a really useful result, and we will be able to prove surprisingly many results from it.

In fact, the Baire category theorem itself does not involve normed vector spaces; it works on any metric space. However, most of the applications we have here are about normed vector spaces.

To state the theorem, we will need some terminology.

Definition (Nowhere dense set). Let $X$ be a topological space. A subset $E \subseteq X$ is nowhere dense if $\bar{E}$ has empty interior.

Usually, we will pick $E$ to be closed, so that the definition just says that $E$ has empty interior.

Definition (First/second category, meagre and residual). Let $X$ be a topological space. We say that $Z \subseteq X$ is of first category or meagre if it is a countable union of nowhere dense sets.

A subset is of second category or non-meagre if it is not of first category.

A subset $Z$ is residual if $X \setminus Z$ is meagre.
Theorem (Baire category theorem). Let $X$ be a (non-empty) complete metric space. Then $X$ is of second category.

This, by definition, means that it is not a countable union of nowhere dense sets. This is equivalent to saying that if we can write
\[ X = \bigcup_{n=1}^\infty C_n, \]
where each $C_n$ is closed, then $C_n$ has non-empty interior for some $n$.

Alternatively, we can say that if $\{U_n\}$ is a countable collection of open dense sets, then $\bigcap_{n=1}^\infty U_n \ne \emptyset$ (for if $U_n$ is open and dense, then $X \setminus U_n$ is closed with empty interior).
Proof. We will prove that the intersection of a countable collection of open dense sets is non-empty. Let $\{U_n\}$ be a countable collection of open dense sets.

The key to proving this is completeness, since that is the only information we have. The idea is to construct a sequence, show that it is Cauchy, and prove that the limit is in the intersection.

Construct a sequence $x_n \in X$ and $\varepsilon_n > 0$ as follows: let $x_1, \varepsilon_1$ be defined such that $\overline{B(x_1, \varepsilon_1)} \subseteq U_1$. This exists because $U_1$ is open and dense: by density, there is some $x_1 \in U_1$, and $\varepsilon_1$ exists by openness.

We define the $x_n$ iteratively. Suppose we already have $x_n$ and $\varepsilon_n$. Define $x_{n+1}, \varepsilon_{n+1}$ such that $\overline{B(x_{n+1}, \varepsilon_{n+1})} \subseteq B(x_n, \varepsilon_n) \cap U_{n+1}$. Again, this is possible because $U_{n+1}$ is open and dense. Moreover, we choose our $\varepsilon_{n+1}$ such that $\varepsilon_{n+1} < \frac{1}{n}$, so that $\varepsilon_n \to 0$.

Since $\varepsilon_n \to 0$ and $x_m \in B(x_n, \varepsilon_n)$ for all $m > n$, we know that $x_n$ is a Cauchy sequence. By completeness of $X$, we can find an $x \in X$ such that $x_n \to x$. Since $x$ is the limit of the $x_m$ with $m > n$, we know that $x \in \overline{B(x_n, \varepsilon_n)}$ for all $n$. In particular, $x \in U_n$ for all $n$. So done.
2.2 Some applications

We are going to look at a few applications of the Baire category theorem.

Proposition. $\mathbb{R} \setminus \mathbb{Q} \ne \emptyset$, i.e. there is an irrational number.

Of course, we can also prove this directly by, say, showing that $\sqrt{2}$ is irrational, or by noting that $\mathbb{R}$ is uncountable but $\mathbb{Q}$ is not. However, we can also use the Baire category theorem.

Proof. Recall that we defined $\mathbb{R}$ to be the completion of $\mathbb{Q}$. So we just have to show that $\mathbb{Q}$ is not complete.

First, note that $\mathbb{Q}$ is countable. Also, for all $q \in \mathbb{Q}$, $\{q\}$ is closed and has empty interior. Hence
\[ \mathbb{Q} = \bigcup_{q \in \mathbb{Q}} \{q\} \]
is a countable union of nowhere dense sets. So it is not complete, by the Baire category theorem.
We will now show that there are normed vector spaces which are not Banach spaces.

Proposition. Let $\hat{\ell}^1$ be the normed vector space whose underlying vector space is
\[ V = \{(x_1, x_2, \cdots) : x_i \in \mathbb{R}, \exists I \in \mathbb{N} \text{ such that } i > I \Rightarrow x_i = 0\}, \]
with componentwise addition and scalar multiplication. This is the space of all sequences that are eventually zero. We define the norm by
\[ \|x\|_{\hat{\ell}^1} = \sum_{i=1}^\infty |x_i|. \]
Then $\hat{\ell}^1$ is not a Banach space.

Note that $\hat{\ell}^1$ is not standard notation.

Proof. Let
\[ E_n = \{x \in \hat{\ell}^1 : x_i = 0, \forall i \ge n\}. \]
By definition,
\[ \hat{\ell}^1 = \bigcup_{n=1}^\infty E_n. \]
We now show that each $E_n$ is nowhere dense. We first show that $E_n$ is closed. If $x^{(j)} \to x$ in $\hat{\ell}^1$ with $x^{(j)} \in E_n$, then since each $x^{(j)}$ is $0$ from the $n$th component onwards, $x$ is also $0$ from the $n$th component onwards. So we must have $x \in E_n$. So $E_n$ is closed.

We now show that $E_n$ has empty interior. We need to show that for all $x \in E_n$ and $\varepsilon > 0$, there is some $y \in \hat{\ell}^1$ such that $\|y - x\| < \varepsilon$ but $y \notin E_n$. This is also easy. Given $x = (x_1, \cdots, x_{n-1}, 0, 0, \cdots)$, we consider
\[ y = (x_1, \cdots, x_{n-1}, \varepsilon/2, 0, 0, \cdots). \]
Then $\|y - x\|_{\hat{\ell}^1} < \varepsilon$ but $y \notin E_n$. Hence by the Baire category theorem, $\hat{\ell}^1$ is not complete.
Proposition. There exists an $f \in C([0,1])$ which is nowhere differentiable.

Proof. (sketch) We want to show that the set of all continuous functions which are differentiable at at least one point is contained in a meagre subset of $C([0,1])$. Then this set cannot be all of $C([0,1])$, since $C([0,1])$ is complete.

Let $E_{m,n}$ be the set of all $f \in C([0,1])$ such that
\[ (\exists x)(\forall y)\quad 0 < |y - x| < \tfrac{1}{m} \Rightarrow |f(y) - f(x)| < n|y - x| \]
(where the quantifiers range over $[0,1]$).

We now show that
\[ \{f \in C([0,1]) : f \text{ is differentiable somewhere}\} \subseteq \bigcup_{m,n=1}^\infty E_{m,n}. \]
This is easy from the definition. Suppose $f$ is differentiable at $x_0$. Then by definition,
\[ \lim_{y \to x_0} \frac{f(y) - f(x_0)}{y - x_0} = f'(x_0). \]
Let $n \in \mathbb{N}$ be such that $|f'(x_0)| < n$. Then by definition of the limit, there is some $m$ such that whenever $0 < |y - x_0| < \frac{1}{m}$, we have
\[ \frac{|f(y) - f(x_0)|}{|y - x_0|} < n. \]
So $f \in E_{m,n}$.

Finally, we need to show that each $E_{m,n}$ is closed and has empty interior. This is left as an exercise for the reader.
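As an aside (not part of the proof above), one can exhibit a concrete nowhere-differentiable function: the classical Weierstrass function $W(x) = \sum_k a^k \cos(b^k \pi x)$ with $0 < a < 1$, $b$ an odd integer and $ab > 1 + \frac{3\pi}{2}$. Here is a minimal numerical sketch evaluating a truncation, where one can watch the difference quotients refuse to settle:

```python
# Hedged sketch: a truncated Weierstrass function. Its difference quotients
# around a point do not converge as h shrinks, hinting at non-differentiability.
import math

A, B, TERMS = 0.5, 13, 12       # ab = 6.5 > 1 + 3*pi/2, so the criterion holds

def W(x):
    return sum(A ** k * math.cos(B ** k * math.pi * x) for k in range(TERMS))

x0 = 0.3
for h in [1e-2, 1e-4, 1e-6]:
    print(h, (W(x0 + h) - W(x0)) / h)   # quotients fail to settle down
```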
Theorem (Banach–Steinhaus theorem / uniform boundedness principle). Let $V$ be a Banach space and $W$ a normed vector space. Suppose $\{T_\alpha\}$ is a collection of bounded linear maps $T_\alpha : V \to W$ such that for each fixed $v \in V$,
\[ \sup_\alpha \|T_\alpha(v)\|_W < \infty. \]
Then
\[ \sup_\alpha \|T_\alpha\|_{B(V, W)} < \infty. \]
This says that if the set of linear maps is pointwise bounded, then it is uniformly bounded.

Proof. Let
\[ E_n = \{v \in V : \sup_\alpha \|T_\alpha(v)\|_W \le n\}. \]
Then by our conditions,
\[ V = \bigcup_{n=1}^\infty E_n. \]
We can write each $E_n$ as
\[ E_n = \bigcap_\alpha \{v \in V : \|T_\alpha(v)\|_W \le n\}. \]
Since each $T_\alpha$ is bounded and hence continuous, $\{v \in V : \|T_\alpha(v)\|_W \le n\}$ is the continuous preimage of a closed set, and is hence closed. So $E_n$, being the intersection of closed sets, is closed.

By the Baire category theorem, there is some $n$ such that $E_n$ has non-empty interior. In particular, there exist $n$, $\varepsilon > 0$ and $v_0 \in V$ such that for all $v \in B(v_0, \varepsilon)$, we have
\[ \sup_\alpha \|T_\alpha(v)\|_W \le n. \]
Now consider an arbitrary $v'$ with $\|v'\|_V \le 1$. Then
\[ v_0 + \frac{\varepsilon}{2}v' \in B(v_0, \varepsilon). \]
So
\[ \sup_\alpha \left\| T_\alpha\left( v_0 + \frac{\varepsilon v'}{2} \right) \right\|_W \le n. \]
Therefore
\[ \sup_\alpha \|T_\alpha v'\|_W \le \frac{2}{\varepsilon}\left( n + \sup_\alpha \|T_\alpha v_0\| \right). \]
Note that the right hand side is independent of $v'$. So
\[ \sup_{\|v'\| \le 1} \sup_\alpha \|T_\alpha v'\|_W < \infty. \]
Note that this result is not true for general (non-linear) functions. For example, consider $f_n : [0, 1] \to \mathbb{R}$ given by [figure: $f_n$ is a bump of height $n$ supported between dyadic points near $\frac{1}{2^n}$ (the axis shows $\frac{1}{2^{n+2}}, \frac{1}{2^{n+1}}, \frac{1}{2^n}, \frac{1}{2^{n-1}}$); as $n$ increases the bumps get taller, narrower, and slide towards $0$]. Then for all $x \in [0, 1]$, we have
\[ \sup_n |f_n(x)| < \infty, \]
but
\[ \sup_n \sup_x |f_n(x)| = \infty. \]
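A minimal sketch of this counterexample (the precise bump shape below is an assumption, since the figure is lost; any continuous bump of height $n$ supported near $\frac{1}{2^n}$ works):

```python
# Hedged sketch: pointwise bounded but not uniformly bounded bump functions.
# f_n is assumed to be a triangular bump of height n on (1/2^(n+1), 1/2^(n-1)).
def f(n, x):
    left, peak, right = 2.0 ** -(n + 1), 2.0 ** -n, 2.0 ** -(n - 1)
    if left < x < peak:
        return n * (x - left) / (peak - left)
    if peak <= x < right:
        return n * (right - x) / (right - peak)
    return 0.0

x = 0.1
print(max(f(n, x) for n in range(1, 60)))        # finite at each fixed x
print([f(n, 2.0 ** -n) for n in (5, 10, 20)])    # sup_x |f_n| = n, unbounded
```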
However, by a proof very similar to what we had above, we have:

Theorem (Osgood). Let $f_n : [0, 1] \to \mathbb{R}$ be a sequence of continuous functions such that for all $x \in [0, 1]$,
\[ \sup_n |f_n(x)| < \infty. \]
Then there are some $a, b$ with $0 \le a < b \le 1$ such that
\[ \sup_n \sup_{x \in [a,b]} |f_n(x)| < \infty. \]
Proof. See example sheet.
Theorem (Open mapping theorem). Let $V$ and $W$ be Banach spaces and $T : V \to W$ a bounded surjective linear map. Then $T$ is an open map, i.e. $T(U)$ is an open subset of $W$ whenever $U$ is an open subset of $V$.

Note that surjectivity is necessary. If $q \notin T(V)$, then we can scale $q$ down arbitrarily and still not be in the image of $T$. So $T(V)$ does not contain an open neighbourhood of $0$, and hence cannot be open.
Proof. We can break our proof into three parts:

(i) We first want an easy way to check if a map is an open map. We want to show that $T$ is open if and only if $T(B_V(1)) \supseteq B_W(\varepsilon)$ for some $\varepsilon > 0$. Note that one direction is easy: if $T$ is open, then by definition $T(B_V(1))$ is open, and hence we can find the $\varepsilon$ required. So we are going to prove the other direction.

(ii) We show that $\overline{T(B_V(1))} \supseteq B_W(\varepsilon)$ for some $\varepsilon > 0$.

(iii) By rescaling the norm in $W$, we may wlog assume the $\varepsilon$ obtained above is in fact $1$. We then show that if $\overline{T(B_V(1))} \supseteq B_W(1)$, then $T(B_V(1)) \supseteq B_W(\frac{1}{2})$.

We now prove them one by one.

(i) Suppose $T(B_V(1)) \supseteq B_W(\varepsilon)$ for some $\varepsilon > 0$. Let $U \subseteq V$ be an open set. We want to show that $T(U)$ is open. So let $p \in U$, $q = Tp$.

Since $U$ is open, there is some $\delta > 0$ such that $B_V(p, \delta) \subseteq U$. We can also write the ball as $B_V(p, \delta) = p + B_V(\delta)$. Then we have
\begin{align*}
T(U) &\supseteq T(p + B_V(\delta)) \\
&= Tp + T(B_V(\delta)) \\
&= Tp + \delta T(B_V(1)) \\
&\supseteq q + \delta B_W(\varepsilon) \\
&= q + B_W(\delta\varepsilon) \\
&= B_W(q, \delta\varepsilon).
\end{align*}
So done.

(ii) This is the step where we use the Baire category theorem. Since $T$ is surjective, we can write $W$ as
\[ W = \bigcup_{n=1}^\infty T(B_V(n)) = \bigcup_{n=1}^\infty T(nB_V(1)) = \bigcup_{n=1}^\infty \overline{T(nB_V(1))}. \]
We have written $W$ as a countable union of closed sets. Since $W$ is a Banach space, by the Baire category theorem, there is some $n \ge 1$ such that $\overline{T(nB_V(1))}$ has non-empty interior. But since $\overline{T(nB_V(1))} = n\overline{T(B_V(1))}$, and multiplication by $n$ is a homeomorphism, it follows that $\overline{T(B_V(1))}$ has non-empty interior. So there is some $\varepsilon > 0$ and $w_0 \in W$ such that
\[ \overline{T(B_V(1))} \supseteq B_W(w_0, \varepsilon). \]
We have now found an open ball in the closure of the image, but we want a ball centred at the origin. We will use linearity in two ways. Firstly, if $v \in B_V(1)$, then $-v \in B_V(1)$. By linearity of $T$, we know that
\[ \overline{T(B_V(1))} \supseteq B_W(-w_0, \varepsilon). \]
Then by linearity, intuitively, since the closure contains the balls $B_W(w_0, \varepsilon)$ and $B_W(-w_0, \varepsilon)$, it must contain everything in between. In particular, it must contain $B_W(\varepsilon)$.

To prove this properly, we need some additional work. This would be easy if we had $T(B_V(1)) \supseteq B_W(w_0, \varepsilon)$ instead of the closure of it: for any $w \in B_W(\varepsilon)$, we let $v_1, v_2 \in B_V(1)$ be such that $T(v_1) = w_0 + w$, $T(v_2) = -w_0 + w$. Then $v = \frac{v_1 + v_2}{2}$ satisfies $\|v\|_V < 1$ and $T(v) = w$.

Since we have the closure instead, we need to mess with sequences. Since $\overline{T(B_V(1))} \supseteq \pm w_0 + B_W(\varepsilon)$, for any $w \in B_W(\varepsilon)$, we can find sequences $(v_i)$ and $(u_i)$ such that $\|v_i\|_V, \|u_i\|_V < 1$ for all $i$ and $T(v_i) \to w_0 + w$, $T(u_i) \to -w_0 + w$. Now by the triangle inequality, we get
\[ \left\| \frac{v_i + u_i}{2} \right\| < 1, \]
and we also have
\[ T\left( \frac{v_i + u_i}{2} \right) \to \frac{w_0 + w}{2} + \frac{-w_0 + w}{2} = w. \]
So $w \in \overline{T(B_V(1))}$. So $\overline{T(B_V(1))} \supseteq B_W(\varepsilon)$.

(iii) Let $w \in B_W(\frac{1}{2})$. For any $\delta$, we know
\[ \overline{T(B_V(\delta))} \supseteq B_W(\delta). \]
Thus, picking $\delta = \frac{1}{2}$, we can find some $v_1 \in V$ such that
\[ \|v_1\|_V < \frac{1}{2}, \quad \|Tv_1 - w\| < \frac{1}{4}. \]
Suppose we have recursively found $v_n$ such that
\[ \|v_n\|_V < \frac{1}{2^n}, \quad \|T(v_1 + \cdots + v_n) - w\| < \frac{1}{2^{n+1}}. \]
Then picking $\delta = \frac{1}{2^{n+1}}$, we can find $v_{n+1}$ satisfying the properties listed above. Then the partial sums of $\sum_{n=1}^\infty v_n$ form a Cauchy sequence, hence convergent by completeness. Let $v$ be the limit. Then
\[ \|v\|_V \le \sum_{i=1}^\infty \|v_i\|_V < 1. \]
Moreover, by continuity of $T$, we know $Tv = w$. So we are done.
Note that in this proof, we required both $V$ and $W$ to be Banach spaces. However, we used the completeness in different ways. We used the completeness of $V$ to extract a limit, but we just used the completeness of $W$ to say it is of second category. In particular, it suffices to assume the image of $T$ is of second category, instead of assuming surjectivity. Hence if we know that $T : V \to W$ is a bounded linear map such that $V$ is Banach and $\operatorname{im} T$ is of second category, then $T$ is open. As a consequence, $T$ has to be surjective (its image contains a small open ball which we can scale up arbitrarily).

We are now going to look at certain applications of the open mapping theorem.
Theorem (Inverse mapping theorem). Let $V, W$ be Banach spaces, and $T : V \to W$ a bounded linear map which is both injective and surjective. Then $T^{-1}$ exists and is a bounded linear map.

Proof. We know that $T^{-1}$ exists as a function between sets. It is also easy to show that it is linear, since $T$ is linear. By the open mapping theorem, $T(U)$ is open for all $U \subseteq V$ open. So $(T^{-1})^{-1}(U)$ is open for all $U \subseteq V$ open. By definition, $T^{-1}$ is continuous. Hence $T^{-1}$ is bounded, since boundedness and continuity are equivalent.
Theorem (Closed graph theorem). Let $V, W$ be Banach spaces, and $T : V \to W$ a linear map. If the graph of $T$ is closed, i.e.
\[ \Gamma(T) = \{(v, T(v)) : v \in V\} \subseteq V \times W \]
is a closed subset of the product space (using the norm $\|(v, w)\|_{V \times W} = \max\{\|v\|_V, \|w\|_W\}$), then $T$ is bounded.

What does this mean? Closedness of the graph means that if $v_n \to v$ in $V$ and $T(v_n) \to w$, then $w = T(v)$. What we want to show is continuity, which is the stronger statement that if $v_n \to v$, then $T(v_n)$ converges, and converges to $T(v)$.
Proof. Consider $\phi : \Gamma(T) \to V$ defined by $\phi(v, T(v)) = v$. We want to apply the inverse mapping theorem to this. To do so, we need to show a few things.

First we need to show that the spaces are Banach spaces. This is easy: $\Gamma(T)$ is a Banach space since it is a closed subset of a complete space, and we are already given that $V$ is Banach.

Now we need to show surjectivity and injectivity. The map is surjective since for any $v \in V$, we have $\phi(v, T(v)) = v$. It is also injective since the function $T$ is single-valued.

Finally, we want to show $\phi$ is bounded. This is since
\[ \|v\|_V \le \max\{\|v\|, \|T(v)\|\} = \|(v, T(v))\|_{\Gamma(T)}. \]
By the inverse mapping theorem, $\phi^{-1}$ is bounded, i.e. there is some $C > 0$ such that
\[ \max\{\|v\|_V, \|T(v)\|\} \le C\|v\|_V. \]
In particular, $\|T(v)\| \le C\|v\|_V$. So $T$ is bounded.
Example. We define $D(T)$ to be equal to $C^1([0,1])$ as a vector space, but equipped with the $C([0,1])$ norm instead. We seek to show that $D(T)$ is not complete.

To do so, consider the differentiation map
\[ T : D(T) \to C([0,1]). \]
First of all, this is unbounded. Indeed, consider the sequence of functions $f_n(x) = x^n$. Then $\|f_n\|_{C([0,1])} = 1$ for all $n$. However,
\[ \|Tf_n\|_{C([0,1])} = \sup_{x \in [0,1]} |nx^{n-1}| = n. \]
So $T$ is unbounded.

We claim the graph of $T$ is closed. If so, then since $C([0,1])$ is complete, the closed graph theorem implies $D(T)$ is not complete.

To check this, suppose we have $f_n \to f$ in the $C([0,1])$ norm, and $f_n' \to g$, again in the $C([0,1])$ norm. We want to show that $f' = g$.

By the fundamental theorem of calculus, we have
\[ f_n(t) = f_n(0) + \int_0^t f_n'(x) \, dx. \]
Hence by uniform convergence of $f_n' \to g$ and $f_n \to f$, we have
\[ f(t) = f(0) + \int_0^t g(x) \, dx. \]
So by the fundamental theorem of calculus, we know that $f'(t) = g(t)$. So the graph of $T$ is closed.

Since we know that $C([0,1])$ is complete, this shows that $D(T)$ is not complete.
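A minimal numerical sketch (not from the notes) of this unboundedness: on a grid, $\|f_n\|_{C([0,1])} = 1$ while $\|Tf_n\|_{C([0,1])} = n$.

```python
# Hedged sketch: the differentiation map is unbounded in the sup norm.
N = 1000
grid = [k / N for k in range(N + 1)]

for n in (1, 5, 25, 125):
    sup_f = max(x ** n for x in grid)             # = 1, attained at x = 1
    sup_df = max(n * x ** (n - 1) for x in grid)  # = n, attained at x = 1
    print(n, sup_f, sup_df)
```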
Example. We can also use the Baire category theorem to understand Fourier series. Let $f : S^1 \to \mathbb{R}$ be continuous, i.e. $f : [-\pi, \pi] \to \mathbb{R}$ continuous with periodic boundary condition $f(-\pi) = f(\pi)$. We define the Fourier coefficients $\hat{f} : \mathbb{Z} \to \mathbb{C}$ by
\[ \hat{f}(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-ikx} f(x) \, dx. \]
We define the Fourier series by
\[ \sum_{k \in \mathbb{Z}} e^{ikx} \hat{f}(k). \]
In particular, we define the partial sums as
\[ S_n(f)(x) = \sum_{k=-n}^{n} e^{ikx} \hat{f}(k). \]
The question we want to ask is: does the Fourier series converge? We are not even asking if it converges back to $f$, just if it converges at all. More concretely, we want to know if $S_n(f)(x)$ has a limit as $n \to \infty$ for $f$ continuous.

Unfortunately, no. We can show that there exists a continuous function $f$ such that $S_n(f)(x)$ diverges. To show this, we consider $\phi_n : C(S^1) \to \mathbb{R}$ defined by $\phi_n(f) = S_n(f)(0)$. Assume that
\[ \sup_n |\phi_n(f)| \]
is finite for all continuous $f$. By the Banach–Steinhaus theorem, we have
\[ \sup_n \|\phi_n\|_{B(C(S^1), \mathbb{R})} < \infty. \]
On the other hand, we can show that
\[ \phi_n(f) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) \frac{\sin\left(\left(n + \frac{1}{2}\right)x\right)}{\sin\frac{x}{2}} \, dx. \]
It thus suffices to find a sequence $f_n$ such that $\|f_n\|_{C(S^1)} \le 1$ but
\[ \int_{-\pi}^{\pi} f_n(x) \frac{\sin\left(\left(n + \frac{1}{2}\right)x\right)}{\sin\frac{x}{2}} \, dx \to \infty, \]
which gives a contradiction. Details are left for the example sheet.

What's the role of the Banach–Steinhaus theorem here? If we wanted to prove the result directly, then we would need to find a single function $f \in C(S^1)$ such that $\phi_n(f)$ is unbounded. However, now we just have to find a sequence $f_n \in C(S^1)$ such that $\phi_n(f_n) \to \infty$. This is much easier.
There is another thing we can ask. Note that if $f$ is continuous, then $|\hat{f}(k)| \to 0$ as $k \to \pm\infty$. In fact, this is even true if $f \in L^1(S^1)$, i.e. $f$ is Lebesgue integrable and
\[ \int_{-\pi}^{\pi} |f(x)| \, dx < \infty. \]
A classical question is: do all sequences $\{a_k\} \subseteq \mathbb{C}$ with $|a_k| \to 0$ as $k \to \pm\infty$ arise as the Fourier coefficients of some $f \in L^1$? The answer is no. We let $\tilde{c}_0$ be the set of all such sequences. By the inverse mapping theorem, if the map $\phi : L^1(S^1) \to \tilde{c}_0$ sending $f$ to $\hat{f}$ were surjective, then the inverse would be bounded. But this is not true, since we can find a sequence $f_n$ such that $\|f_n\|_{L^1(S^1)} \to \infty$ but $\sup_\ell |\hat{f}_n(\ell)| \le 1$ for all $n$. Details are again left for the reader.
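Both counterexamples above ultimately rest on the growth of the $L^1$ norms of the Dirichlet kernels $D_n(x) = \frac{\sin((n + 1/2)x)}{\sin(x/2)}$, which grow like $\log n$. A minimal numerical sketch (not from the notes):

```python
# Hedged sketch: Riemann-sum approximation of (1/2pi) * integral of |D_n|.
import math

def dirichlet_L1(n, steps=200001):
    h = 2 * math.pi / (steps - 1)
    total = 0.0
    for k in range(steps):
        x = -math.pi + k * h
        if abs(x) < 1e-12:
            total += (2 * n + 1) * h       # limiting value of |D_n| at x = 0
        else:
            total += abs(math.sin((n + 0.5) * x) / math.sin(x / 2)) * h
    return total / (2 * math.pi)

for n in (1, 10, 100, 1000):
    print(n, dirichlet_L1(n))              # increases without bound, ~log n
```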
3 The topology of C(K)

Before we start the chapter, it helps to understand the title. In particular, what is $C(K)$? In this chapter, $K$ will denote a compact Hausdorff topological space. We will first define what it means for a space to be Hausdorff.

Definition (Hausdorff space). A topological space $X$ is Hausdorff if for all distinct $p, q \in X$, there are open subsets $U_p, U_q \subseteq X$ such that $p \in U_p$, $q \in U_q$ and $U_p \cap U_q = \emptyset$.

Example. Every metric space is Hausdorff.

What we want to look at here is compact Hausdorff spaces.

Example. $[0, 1]$ is a compact Hausdorff space.

Notation. $C(K)$ is the set of continuous functions $f : K \to \mathbb{R}$ with the norm
\[ \|f\|_{C(K)} = \sup_{x \in K} |f(x)|. \]
There are three themes we will discuss:

(i) There are many functions in $C(K)$. For example, we will show that given a finite set of points $\{p_i\}_{i=1}^n \subseteq K$ and $\{y_i\}_{i=1}^n \subseteq \mathbb{R}$, there is some $f \in C(K)$ such that $f(p_i) = y_i$. We will prove this later. Note that this is trivial for $C([0,1])$, since we can use piecewise linear functions. However, this is not easy to prove if $K$ is a general compact Hausdorff space. In fact, we can prove a much stronger statement, known as the Tietze–Urysohn theorem.

(ii) Elements of $C(K)$ can be approximated by nice functions. This should be thought of as a generalization of the Weierstrass approximation theorem, which states that polynomials are dense in $C([0,1])$, i.e. every continuous function can be approximated uniformly to arbitrary accuracy by polynomials.

(iii) Compact subsets of $C(K)$. One question we would like to understand is: given a sequence of functions $\{f_k\}_{k=1}^\infty \subseteq C(K)$, when can we extract a convergent subsequence?
3.1 Normality of compact Hausdorff spaces
At this point, it is helpful to introduce a new class of topological spaces.
Definition
(Normal space)
.
A topological space
X
is normal if for every disjoint
pair of closed subsets
C
1
, C
2
of
X
, there exists
U
1
, U
2
X
disjoint open such
that C
1
U
1
, C
2
U
2
.
This is similar to being Hausdorff, except that instead of requiring the ability
to separate points, we want the ability to separate closed subsets.
In general, one makes the following definition:
Definition ($T_i$ space). A topological space $X$ has the $T_1$ property if for all $x, y \in X$ with $x \neq y$, there exists $U \subseteq X$ open such that $x \in U$ and $y \notin U$.
A topological space $X$ has the $T_2$ property if $X$ is Hausdorff.
A topological space $X$ has the $T_3$ property if for any $x \in X$ and $C \subseteq X$ closed with $x \notin C$, there are disjoint open $U_x, U_C$ such that $x \in U_x$, $C \subseteq U_C$. These spaces are called regular.
A topological space $X$ has the $T_4$ property if $X$ is normal.
Note here that $T_4$ and $T_1$ together imply $T_2$. It suffices to notice that $T_1$ implies that $\{x\}$ is a closed set for each $x$: for every $y \neq x$, let $U_y$ be open with $y \in U_y$ and $x \notin U_y$. Then we can write
\[ X \setminus \{x\} = \bigcup_{y \neq x} U_y, \]
which is open since it is a union of open sets.
More importantly, we have the following theorem:
Theorem. Let $X$ be a Hausdorff space. If $C_1, C_2 \subseteq X$ are compact disjoint subsets, then there are disjoint open $U_1, U_2 \subseteq X$ such that $C_1 \subseteq U_1$, $C_2 \subseteq U_2$.
In particular, if $X$ is a compact Hausdorff space, then $X$ is normal (since closed subsets of compact spaces are compact).
Proof. Since $C_1$ and $C_2$ are disjoint, by the Hausdorff property, for every $p \in C_1$ and $q \in C_2$, there are disjoint open $U_{p,q}, V_{p,q} \subseteq X$ with $p \in U_{p,q}$, $q \in V_{p,q}$.
Now fix some $p$. Then $\bigcup_{q \in C_2} V_{p,q} \supseteq C_2$ is an open cover. Since $C_2$ is compact, there is a finite subcover, say
\[ C_2 \subseteq \bigcup_{i=1}^n V_{p,q_i} \quad \text{for some } \{q_1, \cdots, q_n\} \subseteq C_2. \]
Note that $n$ and the $q_i$ depend on which $p$ we picked at the beginning.
Define
\[ U_p = \bigcap_{i=1}^n U_{p,q_i}, \quad V_p = \bigcup_{i=1}^n V_{p,q_i}. \]
Since these are finite intersections and unions, $U_p$ and $V_p$ are open. Also, $U_p$ and $V_p$ are disjoint. We also know that $C_2 \subseteq V_p$.
Now note that $\bigcup_{p \in C_1} U_p \supseteq C_1$ is an open cover. By compactness of $C_1$, there is a finite subcover, say
\[ C_1 \subseteq \bigcup_{j=1}^m U_{p_j} \quad \text{for some } \{p_1, \cdots, p_m\} \subseteq C_1. \]
Now define
\[ U = \bigcup_{j=1}^m U_{p_j}, \quad V = \bigcap_{j=1}^m V_{p_j}. \]
Then $U$ and $V$ are disjoint open with $C_1 \subseteq U$, $C_2 \subseteq V$. So done.
Why do we care about this? It turns out it is easier to discuss continuous
functions on normal spaces rather than Hausdorff spaces. Hence, if we are given
that a space is compact Hausdorff (e.g. [0, 1]), then we know it is normal.
3.2 Tietze--Urysohn extension theorem
The objective of this chapter is to show that if we have a continuous function defined on a closed subset of a normal space, then we can extend it to the whole of the space.
We start with a special case of this theorem.
Lemma (Urysohn's lemma). Let $X$ be normal and $C_0, C_1$ be disjoint closed subsets of $X$. Then there is an $f \in C(X)$ such that $f|_{C_0} = 0$ and $f|_{C_1} = 1$, and $0 \leq f(x) \leq 1$ for all $x \in X$.
Before we prove this, let's look at a "stupid" example. Let $X = [-1, 2]$. This is compact Hausdorff. We let $C_0 = [-1, 0]$ and $C_1 = [1, 2]$. To construct the function $f$, we do the obvious thing:
(picture: the piecewise linear function that is $0$ on $[-1, 0]$, increases linearly from $0$ to $1$ on $[0, 1]$, and is $1$ on $[1, 2]$)
We can define this function $f$ (in $[0, 1]$) by
\[ f(x) = \inf\left\{\frac{a}{2^n} : a, n \in \mathbb{N},\ 0 \leq a \leq 2^n,\ x \leq \frac{a}{2^n}\right\}. \]
This is obviously a rather silly way to write our function out. However, this is
what we will end up doing in the proof below. So keep this in mind for now.
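As a quick sanity check (an illustration only), one can evaluate this infimum numerically up to a finite dyadic resolution and confirm that it reproduces the piecewise linear function above:

```python
def f(x, max_n=12):
    # inf over dyadic rationals a / 2^n in [0, 1] with x <= a / 2^n;
    # the infimum over the empty set is taken to be 1.
    best = 1.0
    for n in range(max_n + 1):
        for a in range(2**n + 1):
            q = a / 2**n
            if x <= q:
                best = min(best, q)
    return best

for x in [-0.5, 0.0, 0.3, 0.75, 1.0, 1.7]:
    print(x, f(x))   # prints 0, 0, ~0.3, 0.75, 1.0, 1.0
```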
Proof. In this proof, all subsets labelled $C$ are closed, and all subsets labelled $U$ are open.
First note that normality is equivalent to the following: suppose $C \subseteq U \subseteq X$, where $U$ is open and $C$ is closed. Then there are some $\tilde{C}$ closed and $\tilde{U}$ open such that
\[ C \subseteq \tilde{U} \subseteq \tilde{C} \subseteq U. \]
We start by defining $U_1 = X \setminus C_1$. Since $C_0$ and $C_1$ are disjoint, we know that $C_0 \subseteq U_1$. By normality, there exist $C_{1/2}$ and $U_{1/2}$ such that
\[ C_0 \subseteq U_{1/2} \subseteq C_{1/2} \subseteq U_1. \]
Then we can find $C_{1/4}, C_{3/4}, U_{1/4}, U_{3/4}$ such that
\[ C_0 \subseteq U_{1/4} \subseteq C_{1/4} \subseteq U_{1/2} \subseteq C_{1/2} \subseteq U_{3/4} \subseteq C_{3/4} \subseteq U_1. \]
Iterating this, we get that for all dyadic rationals $q = \frac{a}{2^n}$, $a, n \in \mathbb{N}$, $0 < a < 2^n$, there are some $U_q$ open and $C_q$ closed such that $U_q \subseteq C_q$, with $C_q \subseteq U_{q'}$ if $q < q'$.
We now define $f$ by
\[ f(x) = \inf\{q \in (0, 1] \text{ dyadic rational} : x \in U_q\}, \]
with the understanding that $\inf \emptyset = 1$. We now check the properties desired.
By definition, we have $0 \leq f(x) \leq 1$.
If $x \in C_0$, then $x \in U_q$ for all $q$. So $f(x) = 0$.
If $x \in C_1$, then $x \notin U_q$ for all $q$. So $f(x) = 1$.
To show $f$ is continuous, it suffices to check that $\{x : f(x) > \alpha\}$ and $\{x : f(x) < \alpha\}$ are open for all $\alpha \in \mathbb{R}$, as this shows that the pre-images of all open intervals in $\mathbb{R}$ are open. We know that
\[ f(x) < \alpha \iff \inf\{q : x \in U_q\} < \alpha \iff (\exists q < \alpha)\ x \in U_q \iff x \in \bigcup_{q < \alpha} U_q. \]
Hence we have
\[ \{x : f(x) < \alpha\} = \bigcup_{q < \alpha} U_q, \]
which is open, since each $U_q$ is open. Similarly, we know that
\[ f(x) > \alpha \iff \inf\{q : x \in U_q\} > \alpha \iff (\exists q > \alpha)\ x \notin C_q \iff x \in \bigcup_{q > \alpha} X \setminus C_q. \]
Since this is a union of complements of closed sets, this is open.
With this, we can already say that there are many continuous functions. We can just pick some values on some $C_0, C_1$, and then get a continuous function out of it. However, we can make a stronger statement.
Theorem (Tietze--Urysohn extension theorem). Let $X$ be a normal topological space, and $C \subseteq X$ be a closed subset. Suppose $f: C \to \mathbb{R}$ is a continuous function. Then there exists an extension $\tilde{f}: X \to \mathbb{R}$ which is continuous and satisfies $\tilde{f}|_C = f$ and $\|\tilde{f}\|_{C(X)} = \|f\|_{C(C)}$.
This is in some sense similar to the Hahn--Banach theorem, which states that we can extend linear maps to larger spaces.
Note that this implies Urysohn's lemma, since if $C_0$ and $C_1$ are disjoint closed sets, then $C_0 \cup C_1$ is closed, and the function on $C_0 \cup C_1$ that is $0$ on $C_0$ and $1$ on $C_1$ is continuous. However, we cannot be lazy and skip the proof of Urysohn's lemma, because the proof of this theorem relies on it.
Proof. The idea is to repeatedly use Urysohn's lemma to get better and better approximations. We can assume wlog that $0 \leq f(x) \leq 1$ for all $x \in C$. Otherwise, we just translate and rescale our function. Moreover, we can assume that $\sup_{x \in C} f(x) = 1$. It suffices to find $\tilde{f}: X \to \mathbb{R}$ with $\tilde{f}|_C = f$ and $0 \leq \tilde{f}(x) \leq 1$ for all $x \in X$.
We define sequences of continuous functions $f_i: C \to \mathbb{R}$ and $g_i: X \to \mathbb{R}$ for $i \in \mathbb{N}$. We want to think of the sums $\sum_{i=0}^n g_i$ as the approximations, and $f_{n+1}$ as the error on $C$.
Let $f_0 = f$. This is the error we have when we approximate with the zero function.
We first define $g_0$ on a subset of $X$ by
\[ g_0(x) = \begin{cases} 0 & x \in f_0^{-1}\left[0, \frac{1}{3}\right] \\ \frac{1}{3} & x \in f_0^{-1}\left[\frac{2}{3}, 1\right] \end{cases}. \]
We can then extend this to the whole of $X$ with $0 \leq g_0(x) \leq \frac{1}{3}$ for all $x$ by Urysohn's lemma.
(picture: the graph of $f$ together with the two-level function $g_0$ taking values $0$ and $\frac{1}{3}$, with the thresholds at heights $\frac{1}{3}$ and $\frac{2}{3}$ marked)
We define
\[ f_1 = f_0 - g_0|_C. \]
By construction, we know that $0 \leq f_1 \leq \frac{2}{3}$. This is our first approximation. Note that we have now lowered our maximum error from $1$ to $\frac{2}{3}$. We now repeat this.
Given $f_i: C \to \mathbb{R}$ with $0 \leq f_i \leq \left(\frac{2}{3}\right)^i$, we define $g_i$ by requiring
\[ g_i(x) = \begin{cases} 0 & x \in f_i^{-1}\left[0, \frac{1}{3}\left(\frac{2}{3}\right)^i\right] \\ \frac{1}{3}\left(\frac{2}{3}\right)^i & x \in f_i^{-1}\left[\left(\frac{2}{3}\right)^{i+1}, \left(\frac{2}{3}\right)^i\right] \end{cases}, \]
and then extending to the whole of $X$ with $0 \leq g_i \leq \frac{1}{3}\left(\frac{2}{3}\right)^i$ and $g_i$ continuous. Again, this exists by Urysohn's lemma. We then define $f_{i+1} = f_i - g_i|_C$.
We then have
\[ \sum_{i=0}^n g_i\bigg|_C = (f_0 - f_1) + (f_1 - f_2) + \cdots + (f_n - f_{n+1}) = f - f_{n+1}. \]
We also know that
\[ 0 \leq f_{i+1} \leq \left(\frac{2}{3}\right)^{i+1}. \]
We conclude by letting
\[ \tilde{f} = \sum_{i=0}^\infty g_i. \]
This exists because we have the bounds
\[ 0 \leq g_i \leq \frac{1}{3}\left(\frac{2}{3}\right)^i, \]
and hence the partial sums $\sum_{i=0}^n g_i$ are Cauchy in $C(X)$. So the limit exists and is continuous by the completeness of $C(X)$.
Now we check that
\[ \sum_{i=0}^n g_i\bigg|_C - f = -f_{n+1}. \]
Since $\|f_{n+1}\|_{C(C)} \to 0$, we know that
\[ \sum_{i=0}^\infty g_i\bigg|_C = \tilde{f}|_C = f. \]
Finally, we check the bounds. We need to show that $0 \leq \tilde{f}(x) \leq 1$. This is true since $g_i \geq 0$ for all $i$, and also
\[ |\tilde{f}(x)| \leq \sum_{i=0}^\infty g_i(x) \leq \sum_{i=0}^\infty \frac{1}{3}\left(\frac{2}{3}\right)^i = 1. \]
So done.
We can already show what was stated last time: if $K$ is compact Hausdorff, $\{p_1, \cdots, p_n\} \subseteq K$ is a finite set of points, and $\{y_1, \cdots, y_n\} \subseteq \mathbb{R}$, then there exists $f: K \to \mathbb{R}$ continuous such that $f(p_i) = y_i$. This is since compact Hausdorff spaces are normal, and singleton points are closed sets in Hausdorff spaces. In fact, we can prove this directly with Urysohn's lemma, by, say, requesting functions $f_i$ such that $f_i(p_i) = y_i$ and $f_i(p_j) = 0$ for $i \neq j$. Then we just sum all the $f_i$.
Note that normality is necessary for Urysohn's lemma. Since Urysohn's lemma is a special case of the Tietze--Urysohn extension theorem, normality is also necessary for the Tietze--Urysohn extension theorem. In fact, the lemma is equivalent to normality. Let $C_0, C_1$ be disjoint closed sets of $X$. If there is some $f: X \to \mathbb{R}$ such that $f|_{C_0} = 0$ and $f|_{C_1} = 1$, then
\[ U_0 = f^{-1}\left(-\infty, \frac{1}{3}\right), \quad U_1 = f^{-1}\left(\frac{2}{3}, \infty\right) \]
are open disjoint sets such that $C_0 \subseteq U_0$, $C_1 \subseteq U_1$.
Closedness of $C_0$ and $C_1$ is also necessary in Urysohn's lemma. For example, we cannot extend $f: [0, \frac{1}{2}) \cup (\frac{1}{2}, 1] \to \mathbb{R}$ to $[0, 1]$ continuously, where $f$ is defined as
\[ f(x) = \begin{cases} 0 & x < \frac{1}{2} \\ 1 & x > \frac{1}{2} \end{cases}. \]
3.3 Arzelà--Ascoli theorem
Let $K$ be compact Hausdorff, and $\{f_n\}_{n=1}^\infty$ be a sequence of continuous functions $f_n: K \to \mathbb{R}$ (or $\mathbb{C}$). When does $(f_n)$ have a convergent subsequence in $C(K)$? In other words, when is there a subsequence which converges uniformly?
This will be answered by the Arzelà--Ascoli theorem. Before we get to that, we look at some examples.
Example. Let $K = [0, 1]$ and $f_n(x) = n$. This does not have a convergent subsequence in $C(K)$, since it does not even have a subsequence that converges pointwise. This is since $f_n$ is unbounded.
We see that unboundedness is one "enemy" that prevents us from having a convergent subsequence.
Example. We again let $K = [0, 1]$, and let $f_n$ be the piecewise linear function with $f_n(0) = 1$, $f_n(x) = 0$ for $x \geq \frac{1}{n}$, and $f_n$ linear in between.
We know that $f_n$ does not have a convergent subsequence in $C(K)$, since any subsequence must converge pointwise to
\[ f(x) = \begin{cases} 0 & x \neq 0 \\ 1 & x = 0 \end{cases}, \]
which is not continuous.
What is happening here? For every $n$, fixed $x$ and every $\varepsilon$, by continuity of $f_n$, there is some $\delta$ such that $|x - y| < \delta$ implies $|f_n(x) - f_n(y)| < \varepsilon$, but this choice of $\delta$ depends on $n$, and there is no universal choice that works for us. This is another problem that leads to the lack of a limit.
The Arzelà--Ascoli theorem tells us these are the only two "enemies" that prevent us from extracting a convergent subsequence.
To state this theorem, we first need a definition.
Definition (Equicontinuous). Let $K$ be a topological space, and $F \subseteq C(K)$. We say $F$ is equicontinuous at $x \in K$ if for every $\varepsilon > 0$, there is some $U$ which is an open neighbourhood of $x$ such that
\[ (\forall f \in F)(\forall y \in U) \quad |f(y) - f(x)| < \varepsilon. \]
We say $F$ is equicontinuous if it is equicontinuous at $x$ for all $x \in K$.
Theorem (Arzelà--Ascoli theorem). Let $K$ be a compact topological space. Then $F \subseteq C(K)$ is pre-compact, i.e. $\bar{F}$ is compact, if and only if $F$ is bounded and equicontinuous.
This indeed applies to the problem of extracting a uniformly convergent subsequence, since $C(K)$ is a metric space, and compactness is equivalent to sequential compactness. Indeed, let $(f_n)$ be a bounded and equicontinuous sequence in $C(K)$. Then $F = \{f_n : n \in \mathbb{N}\} \subseteq C(K)$ is bounded and equicontinuous. So it is pre-compact, and hence $(f_n)$, being a sequence in $\bar{F}$, has a convergent subsequence.
To prove this, it helps to introduce some more terminology and a few lemmas
first.
Definition ($\varepsilon$-net). Let $X$ be a metric space, and let $E \subseteq X$. For $\varepsilon > 0$, we say that $N \subseteq X$ is an $\varepsilon$-net for $E$ if and only if $\bigcup_{x \in N} B(x, \varepsilon) \supseteq E$.
Definition (Totally bounded subset). Let $X$ be a metric space, and $E \subseteq X$. We say that $E$ is totally bounded if for every $\varepsilon > 0$, there is a finite $\varepsilon$-net $N_\varepsilon$ for $E$.
An important result about totally bounded subsets is the following:
Proposition. Let $X$ be a complete metric space. Then $E \subseteq X$ is totally bounded if and only if for every sequence $\{y_i\}_{i=1}^\infty \subseteq E$, there is a subsequence which is Cauchy.
By completeness, we can rewrite this as
Corollary. Let $X$ be a complete metric space. Then $E \subseteq X$ is totally bounded if and only if $\bar{E}$ is compact.
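Before proving these, here is a small computational aside (a sketch, assuming a Euclidean ambient metric): a finite $\varepsilon$-net, when one exists, can be found greedily, and the greedy procedure is exactly the dichotomy in the proofs below — either the loop terminates with a finite net, or it keeps producing points pairwise at distance $\geq \varepsilon$:

```python
import numpy as np

def greedy_net(E, eps):
    # Keep a point only if it is not already covered by a ball B(c, eps)
    # around a previously kept point; the kept points form an eps-net of E,
    # and they are pairwise at distance >= eps from each other.
    net = []
    for x in E:
        if all(np.linalg.norm(x - c) >= eps for c in net):
            net.append(x)
    return net

rng = np.random.default_rng(0)
E = rng.uniform(-1, 1, size=(2000, 2))    # a bounded subset of R^2
print(len(greedy_net(E, 0.25)))           # a small finite net
```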
We'll prove these later. For now, we assume this corollary and prove the Arzelà--Ascoli theorem.
Theorem (Arzelà--Ascoli theorem). Let $K$ be a compact topological space. Then $F \subseteq C(K)$ is pre-compact, i.e. $\bar{F}$ is compact, if and only if $F$ is bounded and equicontinuous.
Proof. By the previous corollary, it suffices to prove that $F$ is totally bounded if and only if $F$ is bounded and equicontinuous. We first do the boring direction.
($\Rightarrow$) Suppose $F$ is totally bounded. First notice that $F$ is obviously bounded, since $F$ can be covered by finitely many $\varepsilon$-balls, which must be bounded.
Now we show $F$ is equicontinuous. Let $\varepsilon > 0$. Since $F$ is totally bounded, there exists a finite $\varepsilon$-net for $F$, i.e. there is some $\{f_1, \cdots, f_n\} \subseteq F$ such that for every $f \in F$, there exists an $i \in \{1, \cdots, n\}$ such that $\|f - f_i\|_{C(K)} < \varepsilon$.
Consider a point $x \in K$. Since $f_1, \cdots, f_n$ are continuous, for each $i$ there exists a neighbourhood $U_i$ of $x$ such that $|f_i(y) - f_i(x)| < \varepsilon$ for all $y \in U_i$. Let
\[ U = \bigcap_{i=1}^n U_i. \]
Since this is a finite intersection, $U$ is open. Then for any $f \in F$ and $y \in U$, we can find some $i$ such that $\|f - f_i\|_{C(K)} < \varepsilon$. So
\[ |f(y) - f(x)| \leq |f(y) - f_i(y)| + |f_i(y) - f_i(x)| + |f_i(x) - f(x)| < 3\varepsilon. \]
So $F$ is equicontinuous at $x$. Since $x$ was arbitrary, $F$ is equicontinuous.
($\Leftarrow$) Suppose $F$ is bounded and equicontinuous. Let $\varepsilon > 0$. By equicontinuity, for every $x \in K$, there is some neighbourhood $U_x$ of $x$ such that $|f(y) - f(x)| < \varepsilon$ for all $y \in U_x$, $f \in F$. Obviously, we have
\[ \bigcup_{x \in K} U_x = K. \]
By the compactness of $K$, there are some $\{x_1, \cdots, x_n\}$ such that
\[ \bigcup_{i=1}^n U_{x_i} \supseteq K. \]
Consider the restriction of functions in $F$ to these points. This can be viewed as a bounded subset of $\ell_\infty^n$, the $n$-dimensional normed vector space with the supremum norm. Since this is finite-dimensional, boundedness implies total boundedness (due to, say, the compactness of the closed unit ball). In other words, there is a finite $\varepsilon$-net $\{f_1, \cdots, f_m\}$ such that for every $f \in F$, there is a $j \in \{1, \cdots, m\}$ such that
\[ \max_i |f(x_i) - f_j(x_i)| < \varepsilon. \]
Then for every $f \in F$, pick an $f_j$ such that the above holds. Then
\[ \|f - f_j\|_{C(K)} = \sup_y |f(y) - f_j(y)|. \]
Since $\{U_{x_i}\}$ covers $K$, we can write this as
\[ = \max_i \sup_{y \in U_{x_i}} |f(y) - f_j(y)| \leq \max_i \sup_{y \in U_{x_i}} \left(|f(y) - f(x_i)| + |f(x_i) - f_j(x_i)| + |f_j(x_i) - f_j(y)|\right) < \varepsilon + \varepsilon + \varepsilon = 3\varepsilon. \]
So done.
Now we return to prove the proposition we just used to prove Arzelà--Ascoli.
Proposition. Let $X$ be a (complete) metric space. Then $E \subseteq X$ is totally bounded if and only if for every sequence $\{y_i\}_{i=1}^\infty \subseteq E$, there is a subsequence which is Cauchy.
Proof. ($\Rightarrow$) Let $E \subseteq X$ be totally bounded, and $\{y_i\} \subseteq E$. For every $j \in \mathbb{N}$, there exists a finite $\frac{1}{j}$-net, call it $N_j$.
Now since $N_1$ is finite, there is some $x_1 \in N_1$ such that there are infinitely many $y_i$'s in $B(x_1, 1)$. Pick the first $y_i$ in $B(x_1, 1)$ and call it $y_{i_1}$.
Now there is some $x_2 \in N_2$ such that there are infinitely many $y_i$'s in $B(x_1, 1) \cap B(x_2, \frac{1}{2})$. Pick the one with smallest value of $i > i_1$, and call this $y_{i_2}$. Continue till infinity.
This procedure gives a sequence $x_j \in N_j$ and a subsequence $\{y_{i_k}\}$, and also
\[ y_{i_n} \in \bigcap_{j=1}^n B\left(x_j, \frac{1}{j}\right). \]
It is easy to see that $\{y_{i_n}\}$ is Cauchy, since if $m > n$, then $d(y_{i_m}, y_{i_n}) < \frac{2}{n}$.
($\Leftarrow$) Suppose $E$ is not totally bounded. So there is some $\varepsilon > 0$ for which there is no finite $\varepsilon$-net. Pick any $y_1$. Pick $y_2$ such that $d(y_1, y_2) \geq \varepsilon$. This exists because there is no finite $\varepsilon$-net.
Now given $y_1, \cdots, y_n$ such that $d(y_i, y_j) \geq \varepsilon$ for all $i, j = 1, \cdots, n$, $i \neq j$, we pick $y_{n+1}$ such that $d(y_{n+1}, y_j) \geq \varepsilon$ for all $j = 1, \cdots, n$. Again, this exists because there is no finite $\varepsilon$-net. Then clearly no subsequence of $\{y_n\}$ is Cauchy.
Note that the first part is similar to the proof of Bolzano--Weierstrass in $\mathbb{R}^n$ by repeated bisection.
Recall that at the beginning of the chapter, we have seen that the boundedness and equicontinuity assumptions are necessary. The compactness of $K$ is also important. Let $X = \mathbb{R}$, which is not compact, and let $\varphi \in C_c^\infty(\mathbb{R})$ be an infinitely differentiable function with compact support, say, a bump function.
(picture: a bump function supported on a bounded interval)
We now let $f_n(x) = \varphi(x - n)$, i.e. we shift our bump function to the right by $n$ units. This sequence is clearly bounded and equicontinuous, but it has no convergent subsequence: $f_n$ converges pointwise to the zero function, but the convergence is not uniform, and this is true for arbitrary subsequences as well.
We are going to look at some applications of the theorem:
Example. Let $K \subseteq \mathbb{R}$ be compact, and $\{f_n\}_{n=1}^\infty$ be a sequence of continuously differentiable functions in $C(K)$ such that
\[ \sup_x \sup_n \left(|f_n(x)| + |f_n'(x)|\right) < C \]
for some $C$. Then there is a convergent subsequence. We clearly have uniform boundedness. To obtain equicontinuity, since the derivative is bounded, by the mean value theorem, we have
\[ \frac{|f_n(x) - f_n(y)|}{|x - y|} = |f_n'(z)| \leq C \]
for some $z$ between $x$ and $y$. So
\[ |f_n(x) - f_n(y)| \leq C|x - y|. \]
Consider the ordinary differential equation $x' = f(x)$ with the boundary condition $x(0) = x_0 \in \mathbb{R}^n$. Recall from IB Analysis II that the Picard--Lindelöf theorem says that if $f$ is a Lipschitz function, then there exists some $\varepsilon > 0$ such that the ODE has a unique solution in $(-\varepsilon, \varepsilon)$.
What if $f$ is not Lipschitz? If so, we can get the following:
Theorem (Peano*). Given $f$ continuous, there is some $\varepsilon > 0$ such that $x' = f(x)$ with boundary condition $x(0) = x_0 \in \mathbb{R}$ has a solution in $(-\varepsilon, \varepsilon)$.
Note that uniqueness is false. For example, if $x' = \sqrt{|x|}$ and $x(0) = 0$, then $x(t) = 0$ and $x(t) = \frac{t^2}{4}$ (for $t \geq 0$) are both solutions.
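A hedged numerical illustration of this non-uniqueness (not part of the theorem): Euler's method applied to $x' = \sqrt{|x|}$ starting exactly at $0$ follows the zero solution, while an arbitrarily small perturbation of the initial value makes it shadow the solution $\frac{t^2}{4}$:

```python
import numpy as np

def euler(x0, t_end=2.0, h=1e-4):
    # Explicit Euler for x' = sqrt(|x|), x(0) = x0.
    x = x0
    for _ in range(int(t_end / h)):
        x += h * np.sqrt(abs(x))
    return x

print(euler(0.0))     # exactly 0: the zero solution
print(euler(1e-12))   # roughly t_end**2 / 4 = 1.0: the other solution
```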
Proof. (sketch) We approximate $f$ by a sequence of continuously differentiable functions $f_n$ such that $\|f - f_n\|_{C(K)} \to 0$ for a suitable compact $K \subseteq \mathbb{R}$. We use Picard--Lindelöf to get a solution for each $n$. Then we use the ODE to get estimates for the solutions. Finally, we can use Arzelà--Ascoli to extract a limit as $n \to \infty$. We can then show it is indeed a solution.
3.4 Stone–Weierstrass theorem
Here we will prove the Stone–Weierstrass theorem, which is a generalization of the classical Weierstrass approximation theorem.
Theorem (Weierstrass approximation theorem). The set of polynomials is dense in $C([0, 1])$.
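One classical proof of this theorem (due to Bernstein, and not the route taken below) is constructive: the Bernstein polynomials $B_n(f)(x) = \sum_{k=0}^n f\left(\frac{k}{n}\right)\binom{n}{k}x^k(1-x)^{n-k}$ converge uniformly to $f$. A short numerical sketch (illustration only):

```python
import numpy as np
from math import comb

def bernstein(f, n, x):
    # B_n(f)(x) = sum_k f(k/n) * C(n, k) * x^k * (1 - x)^(n - k)
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)            # continuous, but not a polynomial
x = np.linspace(0, 1, 1001)
for n in [5, 50, 500]:
    print(n, np.abs(bernstein(f, n, x) - f(x)).max())  # sup error shrinks
```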
This tells us that $C([0, 1])$ is not too "big", because it has a dense subset of "nice" functions.
We will try to generalize this to more general domains. Note that for this section, real-valued and complex-valued continuous functions behave somewhat differently. So we will write these as $C_\mathbb{R}(K)$ and $C_\mathbb{C}(K)$.
To state this theorem, we need some definitions.
Definition (Algebra). A vector space $(V, +)$ is called an algebra if there is an operation (called multiplication) $\cdot\,: V \times V \to V$ such that $(V, +, \cdot)$ is a rng (i.e. a ring not necessarily with multiplicative identity). Also, $\lambda(v \cdot w) = (\lambda v) \cdot w = v \cdot (\lambda w)$ for all $\lambda \in \mathbb{F}$, $v, w \in V$.
If $V$ is in addition a normed vector space and
\[ \|v \cdot w\|_V \leq \|v\|_V \cdot \|w\|_V \]
for all $v, w \in V$, then we say $V$ is a normed algebra.
If $V$ is a complete normed algebra, we say $V$ is a Banach algebra.
If $V$ is an algebra that is commutative as a rng, then we say $V$ is a commutative algebra.
If $V$ is an algebra with multiplicative identity, then $V$ is a unital algebra.
Example. C(K) is a commutative, unital, Banach algebra.
Example. Recall from the example sheets that if $T_1, T_2: V \to V$, then
\[ \|T_1 T_2\|_{B(V,V)} \leq \|T_1\|_{B(V,V)} \|T_2\|_{B(V,V)}. \]
So $B(V, V)$ is a unital normed algebra.
We will need this language to generalize the Weierstrass approximation
theorem. The main problem in doing so is that we have to figure out what we
can generalize polynomials to. This is why we need these funny definitions.
Theorem (Stone--Weierstrass theorem). Let $K$ be compact, and $A \subseteq C_\mathbb{R}(K)$ be a subalgebra (i.e. a subset closed under the operations) which separates points, i.e. for every distinct $x, y \in K$, there exists some $f \in A$ such that $f(x) \neq f(y)$. Then either $\bar{A} = C_\mathbb{R}(K)$ or there is some $x_0 \in K$ such that
\[ \bar{A} = \{f \in C_\mathbb{R}(K) : f(x_0) = 0\}. \]
Note that it is not possible that every $f \in \bar{A}$ vanishes at $2$ or more points, since $A$ separates points.
This indeed implies the Weierstrass approximation theorem, since polynomials separate points (consider the polynomial $x$), and the polynomial $1$ is never $0$ for any $x$. In fact, this also works for polynomials on compact subsets of $\mathbb{R}^n$.
Note, however, that the second case of the Stone--Weierstrass theorem can actually happen. For example, consider $K = [0, 1]$, which is compact, and let $A$ be the algebra generated by $x$. Then
\[ \bar{A} = \{f \in C_\mathbb{R}(K) : f(0) = 0\}. \]
We will prove this using two lemmas:
Lemma. Let $K$ be compact, and let $L \subseteq C_\mathbb{R}(K)$ be a subset which is closed under taking maximum and minimum, i.e. if $f, g \in L$, then $\max\{f, g\} \in L$ and $\min\{f, g\} \in L$ (with $\max\{f, g\}$ defined by $\max\{f, g\}(x) = \max\{f(x), g(x)\}$, and similarly for minimum).
Given $g \in C_\mathbb{R}(K)$, assume further that for any $\varepsilon > 0$ and $x, y \in K$, there exists $f_{x,y} \in L$ such that
\[ |f_{x,y}(x) - g(x)| < \varepsilon, \quad |f_{x,y}(y) - g(y)| < \varepsilon. \]
Then there exists some $f \in L$ such that
\[ \|f - g\|_{C_\mathbb{R}(K)} < \varepsilon, \]
i.e. $g \in \bar{L}$.
This means that if we are allowed to take maxima and minima, then to be able to approximate a function, we just need to be able to approximate it at any two points.
The idea is to next show that if $A$ is a subalgebra, then $\bar{A}$ is closed under taking maximum and minimum. Then we use the ability to separate points to find the $f_{x,y}$, and prove that we can approximate arbitrary functions.
Proof. Let $g \in C_\mathbb{R}(K)$ and $\varepsilon > 0$ be given. So for every $x, y \in K$, there is some $f_{x,y} \in L$ such that
\[ |f_{x,y}(x) - g(x)| < \varepsilon, \quad |f_{x,y}(y) - g(y)| < \varepsilon. \]
Claim. For each $x \in K$, there exists $f_x \in L$ such that $|f_x(x) - g(x)| < \varepsilon$ and $f_x(z) < g(z) + \varepsilon$ for all $z \in K$.
Since $f_{x,y}$ is continuous, there is some open $U_{x,y}$ containing $y$ such that
\[ |f_{x,y}(z) - g(z)| < \varepsilon \]
for all $z \in U_{x,y}$. Since
\[ \bigcup_{y \in K} U_{x,y} \supseteq K, \]
by compactness of $K$, there exist some $y_1, \cdots, y_n$ such that
\[ \bigcup_{i=1}^n U_{x,y_i} \supseteq K. \]
(picture: the graphs of $f_{x,y_1}$, $f_{x,y_2}$, $f_{x,y_3}$, each lying below $g + \varepsilon$ near the corresponding $y_i$, with the point $x$ marked)
We then let
\[ f_x(z) = \min\{f_{x,y_1}(z), \cdots, f_{x,y_n}(z)\} \]
for every $z \in K$. We then see that this works. Indeed, by assumption, $f_x \in L$. If $z \in K$ is some arbitrary point, then $z \in U_{x,y_i}$ for some $i$. Then
\[ f_{x,y_i}(z) < g(z) + \varepsilon. \]
Hence, since $f_x$ is the minimum of all such $f_{x,y_i}$, for any $z$, we have
\[ f_x(z) < g(z) + \varepsilon. \]
The property at $x$ is also clear.
Claim. There exists $f \in L$ such that $|f(z) - g(z)| < \varepsilon$ for all $z \in K$.
We are going to play the same game with this. By continuity of $f_x$, there is an open $V_x$ containing $x$ such that
\[ |f_x(w) - g(w)| < \varepsilon \]
for all $w \in V_x$. Since
\[ \bigcup_{x \in K} V_x \supseteq K, \]
by compactness of $K$, there is some $\{x_1, \cdots, x_m\}$ such that
\[ \bigcup_{j=1}^m V_{x_j} \supseteq K. \]
Define
\[ f(z) = \max\{f_{x_1}(z), \cdots, f_{x_m}(z)\}. \]
Again, by assumption, $f \in L$. Then we know that
\[ f(z) > g(z) - \varepsilon. \]
We still have our first bound
\[ f(z) < g(z) + \varepsilon. \]
Therefore we have
\[ \|f - g\|_{C_\mathbb{R}(K)} < \varepsilon. \]
Lemma. Let $A \subseteq C_\mathbb{R}(K)$ be a subalgebra that is a closed subset in the topology of $C_\mathbb{R}(K)$. Then $A$ is closed under taking maximum and minimum.
Proof. First note that
\[ \max\{f(x), g(x)\} = \frac{1}{2}(f(x) + g(x)) + \frac{1}{2}|f(x) - g(x)|, \]
\[ \min\{f(x), g(x)\} = \frac{1}{2}(f(x) + g(x)) - \frac{1}{2}|f(x) - g(x)|. \]
Since $A$ is an algebra, it suffices to show that $f \in A$ implies $|f| \in A$ for every $f$ such that $\|f\|_{C_\mathbb{R}(K)} \leq 1$.
The key observation is the following: consider the function $h(x) = \sqrt{x + \varepsilon^2}$. Then $h(x^2)$ approximates $|x|$. This has the property that the Taylor expansion of $h(x)$ centered at $x = \frac{1}{2}$ is uniformly convergent for $x \in [0, 1]$. Therefore there exists a polynomial $S(x)$ such that
\[ |S(x) - \sqrt{x + \varepsilon^2}| < \varepsilon \]
for $x \in [0, 1]$.
Now note that $S(x) - S(0)$ is a polynomial with no constant term. Therefore, since $A$ is an algebra, if $f \in A$, then $S(f^2) - S(0) \in A$ by closure.
Now look at
\[ \||f| - (S(f^2) - S(0))\|_{C_\mathbb{R}(K)} \leq \||f| - \sqrt{f^2 + \varepsilon^2}\| + \|\sqrt{f^2 + \varepsilon^2} - S(f^2)\| + \|S(0)\|. \]
We will make each individual term small. For the first term, note that
\[ \sup_{x \in [0,1]} |x - \sqrt{x^2 + \varepsilon^2}| = \sup_{x \in [0,1]} \frac{\varepsilon^2}{x + \sqrt{x^2 + \varepsilon^2}} \leq \varepsilon. \]
So the first term is at most $\varepsilon$. The second term is also easy, since $S$ is chosen such that $|S(x) - \sqrt{x + \varepsilon^2}| < \varepsilon$ for $x \in [0, 1]$, and $|f(x)^2| \leq 1$ for all $x$. So it is again bounded by $\varepsilon$.
By the same bound, $|S(0) - \sqrt{0 + \varepsilon^2}| < \varepsilon$. So $|S(0)| < 2\varepsilon$. So
\[ \||f| - (S(f^2) - S(0))\|_{C_\mathbb{R}(K)} < 4\varepsilon. \]
Since $\varepsilon > 0$ was arbitrary and $A$ is closed in the topology of $C_\mathbb{R}(K)$, $f \in A$ with $\|f\|_{C_\mathbb{R}(K)} \leq 1$ implies $|f| \in A$.
Note that if we have already proven the classical Weierstrass approximation theorem, we can just use it to get a polynomial approximation for $|f|$ directly.
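To see the trick concretely, here is a numerical sketch, using a least-squares fit as a stand-in for the truncated Taylor series of the proof (an assumption of this illustration, not part of the argument): a polynomial $S$ with $S(t) \approx \sqrt{t + \varepsilon^2}$ on $[0, 1]$ yields $S(f^2) - S(0) \approx |f|$ using only the algebra operations.

```python
import numpy as np

eps = 0.05
t = np.linspace(0, 1, 2001)
# S is a degree-20 polynomial with S(t) ~ sqrt(t + eps^2) on [0, 1].
S = np.polynomial.Polynomial.fit(t, np.sqrt(t + eps**2), deg=20)

x = np.linspace(-1, 1, 2001)    # stand-in for the values f(x), with |f| <= 1
approx = S(x**2) - S(0)         # a polynomial in f with zero constant term
print(np.abs(approx - np.abs(x)).max())   # small uniform error
```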
We will now combine both lemmas and prove the Stone--Weierstrass theorem.
Theorem (Stone--Weierstrass theorem). Let $K$ be compact, and $A \subseteq C_\mathbb{R}(K)$ be a subalgebra which separates points. Then either $\bar{A} = C_\mathbb{R}(K)$ or there is some $x_0 \in K$ such that
\[ \bar{A} = \{f \in C_\mathbb{R}(K) : f(x_0) = 0\}. \]
Proof. Note that there are two possible outcomes. We will first look at the first possibility.
Consider the case where for all $x \in K$, there is some $f \in A$ such that $f(x) \neq 0$. Let $g \in C_\mathbb{R}(K)$ be given. By our previous lemmas, to approximate $g$ in $\bar{A}$, we just need to show that we can approximate $g$ at two points. So given any $\varepsilon > 0$ and $x, y \in K$, we want to find $f_{x,y} \in A$ such that
\[ |f_{x,y}(x) - g(x)| < \varepsilon, \quad |f_{x,y}(y) - g(y)| < \varepsilon. \tag{$*$} \]
For every $x, y \in K$ with $x \neq y$, we first show that there exists $h_{x,y} \in A$ such that $h_{x,y}(x) \neq 0$ and $h_{x,y}(x) \neq h_{x,y}(y)$. This is easy to see. By our assumptions, we can find the following functions:
(i) There exists $h^{(1)}_{x,y}$ such that $h^{(1)}_{x,y}(x) \neq h^{(1)}_{x,y}(y)$.
(ii) There exists $h^{(2)}_{x,y}$ such that $h^{(2)}_{x,y}(x) \neq 0$.
(iii) There exists $h^{(3)}_{x,y}$ such that $h^{(3)}_{x,y}(y) \neq 0$.
Then it is an easy exercise to show that some linear combination of $h^{(1)}_{x,y}$, $h^{(2)}_{x,y}$ and $h^{(3)}_{x,y}$ works, say $h_{x,y}$.
We want to find our $f_{x,y}$ satisfying ($*$). But we will do better. We will make it equal $g$ at $x$ and $y$. The idea is to take linear combinations of $h_{x,y}$ and $h_{x,y}^2$. Instead of doing the messy algebra to show that we can find a working linear combination, just notice that $(h_{x,y}(x), h_{x,y}(y))$ and $(h_{x,y}(x)^2, h_{x,y}(y)^2)$ are linearly independent vectors in $\mathbb{R}^2$. Therefore there exist $\alpha, \beta \in \mathbb{R}$ such that
\[ \alpha(h_{x,y}(x), h_{x,y}(y)) + \beta(h_{x,y}(x)^2, h_{x,y}(y)^2) = (g(x), g(y)). \]
So done.
In the other case, suppose there is $x_0 \in K$ such that $f(x_0) = 0$ for all $f \in A$. Consider the algebra
\[ A_0 = A + \lambda 1 = \{f + \lambda 1 : f \in A, \lambda \in \mathbb{R}\}. \]
Since $A$ separates points, and for any $x \in K$ there is some $f \in A_0$ such that $f(x) \neq 0$ (e.g. $f = 1$), by the previous part, we know that
\[ \bar{A}_0 = C_\mathbb{R}(K). \]
Now note that
\[ \bar{A} \subseteq \{f \in C_\mathbb{R}(K) : f(x_0) = 0\} = B. \]
So it suffices to show that we have equality, i.e. for any $g \in B$ and $\varepsilon > 0$, there is some $f \in A$ such that
\[ \|f - g\|_{C_\mathbb{R}(K)} < \varepsilon. \]
Since $\bar{A}_0 = C_\mathbb{R}(K)$, given such $g$ and $\varepsilon$, there is some $f \in A$ and $\lambda \in \mathbb{R}$ such that
\[ \|g - (f + \lambda 1)\|_{C_\mathbb{R}(K)} < \varepsilon. \]
But $g(x_0) = f(x_0) = 0$, which implies that $|\lambda| < \varepsilon$. Therefore
\[ \|g - f\|_{C_\mathbb{R}(K)} < 2\varepsilon. \]
So done.
What happens for $C_\mathbb{C}(K)$?
Example. Let $K = \bar{B}(0, 1) \subseteq \mathbb{C}$, and let $A$ be the set of polynomials on $\bar{B}(0, 1)$. We will show that $\bar{A} \neq C_\mathbb{C}(\bar{B}(0, 1))$.
Consider $f(z) = \bar{z}$. This is not in the closure of $A$, since this is not holomorphic, but the uniform limit of any sequence of holomorphic functions is holomorphic (by Morera's theorem, i.e. $f$ is holomorphic iff $\oint_\gamma f(z) \,\mathrm{d}z = 0$ for all closed piecewise smooth curves $\gamma$).
Hence we need an additional condition on this. It turns out that this rather simple example is the only way in which things can break. We have the following:
Theorem (Complex version of Stone--Weierstrass theorem). Let $K$ be compact and $A \subseteq C_\mathbb{C}(K)$ be a subalgebra over $\mathbb{C}$ which separates points and is closed under complex conjugation (i.e. if $f \in A$, then $\bar{f} \in A$). Then either $\bar{A} = C_\mathbb{C}(K)$ or there is an $x_0$ such that
\[ \bar{A} = \{f \in C_\mathbb{C}(K) : f(x_0) = 0\}. \]
Proof. It suffices to show that either $\bar{A} \supseteq C_\mathbb{R}(K)$ or there exists a point $x_0$ such that $\bar{A} \supseteq \{f \in C_\mathbb{R}(K) : f(x_0) = 0\}$, since we can always break a complex function up into its real and imaginary parts.
Now consider
\[ A_0 = \left\{\frac{f + \bar{f}}{2} : f \in A\right\} \cup \left\{\frac{f - \bar{f}}{2i} : f \in A\right\}. \]
Now note that by closure of $A$ under conjugation, we know that $A_0$ is a subset of $A$ and is a subalgebra of $C_\mathbb{R}(K)$ over $\mathbb{R}$, which separates points. Hence by the real version of Stone--Weierstrass, either $\bar{A}_0 = C_\mathbb{R}(K)$ or there is some $x_0$ such that $\bar{A}_0 = \{f \in C_\mathbb{R}(K) : f(x_0) = 0\}$. So done.
4 Hilbert spaces
4.1 Inner product spaces
We have just looked at continuous functions on some compact space $K$. Another important space is the space of square-integrable functions. Consider the space
\[ L^2(\mathbb{R}) = \left\{f : f \text{ is Lebesgue measurable}, \int |f|^2 < \infty\right\}/{\sim}, \]
where $f \sim g$ if $f = g$ Lebesgue almost everywhere.
One thing we like to think about is the Fourier series. Recall that for $f \in C(S^1)$, we have defined, for each $k \in \mathbb{Z}$,
\[ \hat{f}(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-ikx} f(x) \,\mathrm{d}x, \]
and we have defined the partial sums
\[ S_N(f)(x) = \sum_{k=-N}^{N} \hat{f}(k) e^{ikx}. \]
We have previously seen that even if $f$ is continuous, it is possible that the partial sums $S_N$ do not converge, even pointwise. However, we can ask for something weaker:
Proposition. Let $f \in C(S^1)$. Then
\[ \lim_{N \to \infty} \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x) - S_N(f)(x)|^2 \,\mathrm{d}x = 0. \]
We will prove this later. However, the key point of the proof is the "orthogonality" of $\{e^{inx}\}$. More precisely, we have
\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{inx} e^{-imx} \,\mathrm{d}x = 0 \quad \text{if } m \neq n. \]
The introduction of Hilbert spaces is in particular a way to put this in a general framework. We want to introduce an extra structure that gives rise to "orthogonality".
Definition (Inner product). Let $V$ be a vector space over $\mathbb{R}$ or $\mathbb{C}$. We say $p: V \times V \to \mathbb{R}$ or $\mathbb{C}$ is an inner product on $V$ if it satisfies
(i) $p(v, w) = \overline{p(w, v)}$ for all $v, w \in V$. (conjugate symmetry)
(ii) $p(\lambda_1 v_1 + \lambda_2 v_2, u) = \lambda_1 p(v_1, u) + \lambda_2 p(v_2, u)$. (linearity in first argument)
(iii) $p(v, v) \geq 0$ for all $v \in V$, and equality holds iff $v = 0$. (non-negativity)
We will often denote an inner product by $p(v, w) = \langle v, w\rangle$. We call $(V, \langle \cdot, \cdot\rangle)$ an inner product space.
Definition (Orthogonality). In an inner product space, $v$ and $w$ are orthogonal if $\langle v, w\rangle = 0$.
Orthogonality and the inner product are important when dealing with vector spaces. For example, recall that when working with finite-dimensional spaces, we had things like Hermitian matrices, orthogonal matrices and normal matrices. All these are in some sense defined in terms of the inner product and orthogonality. More fundamentally, when we have a finite-dimensional vector space, we often write the vectors as a set of $n$ coordinates. To define this coordinate system, we start by picking $n$ orthogonal vectors (which are in fact orthonormal), and then the coordinates are just the projections onto these orthogonal vectors.
Hopefully, you are convinced that inner products are important. So let's see what we can get if we put inner products on arbitrary vector spaces.
We will look at some easy properties of the inner product.
Proposition (Cauchy--Schwarz inequality). Let $(V, \langle \cdot, \cdot\rangle)$ be an inner product space. Then for all $v, w \in V$,
\[ |\langle v, w\rangle| \leq \sqrt{\langle v, v\rangle \langle w, w\rangle}, \]
with equality iff there is some $\lambda \in \mathbb{R}$ or $\mathbb{C}$ such that $v = \lambda w$ or $w = \lambda v$.
Proof. wlog, we can assume $w \neq 0$. Otherwise, this is trivial. Moreover, assume $\langle v, w\rangle \in \mathbb{R}$. Otherwise, we can just multiply $w$ by some $e^{i\theta}$.
By non-negativity, we know that for all $t \in \mathbb{R}$, we have
\[ 0 \leq \langle v + tw, v + tw\rangle = \langle v, v\rangle + 2t\langle v, w\rangle + t^2 \langle w, w\rangle. \]
Therefore, the discriminant of this quadratic polynomial in $t$ is non-positive, i.e.
\[ 4\langle v, w\rangle^2 - 4\langle v, v\rangle \langle w, w\rangle \leq 0, \]
from which the result follows.
Finally, note that if equality holds, then the discriminant is $0$. So the quadratic has exactly one root. So there exists $t$ such that $v + tw = 0$, which of course implies $v = -tw$.
Proposition. Let $(V, \langle \cdot, \cdot\rangle)$ be an inner product space. Then
\[ \|v\| = \sqrt{\langle v, v\rangle} \]
defines a norm.
Proof. The first two axioms of the norm are easy to check, since it follows directly from the definition of the inner product that $\|v\| \geq 0$ with equality iff $v = 0$, and $\|\lambda v\| = |\lambda|\|v\|$.
The only non-trivial thing to check is the triangle inequality. We have
\[ \|v + w\|^2 = \langle v + w, v + w\rangle = \|v\|^2 + \|w\|^2 + \langle v, w\rangle + \langle w, v\rangle \leq \|v\|^2 + \|w\|^2 + 2\|v\|\|w\| = (\|v\| + \|w\|)^2. \]
Hence we know that $\|v + w\| \leq \|v\| + \|w\|$.
This motivates the following definition:
Definition (Euclidean space). A normed vector space $(V, \|\cdot\|)$ is a Euclidean space if there exists an inner product $\langle \cdot, \cdot\rangle$ such that
\[ \|v\| = \sqrt{\langle v, v\rangle}. \]
Proposition. Let $(E, \|\cdot\|)$ be a Euclidean space. Then there is a unique inner product $\langle \cdot, \cdot\rangle$ such that $\|v\| = \sqrt{\langle v, v\rangle}$.
Proof. The real and complex cases are slightly different.
First suppose $E$ is a vector space over $\mathbb{R}$, and suppose also that we have an inner product $\langle \cdot, \cdot\rangle$ such that $\|v\| = \sqrt{\langle v, v\rangle}$. Then
\[ \langle v + w, v + w\rangle = \|v\|^2 + 2\langle v, w\rangle + \|w\|^2. \]
So we get
\[ \langle v, w\rangle = \frac{1}{2}\left(\|v + w\|^2 - \|v\|^2 - \|w\|^2\right). \tag{$*$} \]
In particular, the inner product is completely determined by the norm. So it must be unique.
Now suppose $E$ is a vector space over $\mathbb{C}$. We have
\begin{align*}
\langle v + w, v + w\rangle &= \|v\|^2 + \|w\|^2 + \langle v, w\rangle + \langle w, v\rangle \tag{1} \\
\langle v - w, v - w\rangle &= \|v\|^2 + \|w\|^2 - \langle v, w\rangle - \langle w, v\rangle \tag{2} \\
\langle v + iw, v + iw\rangle &= \|v\|^2 + \|w\|^2 - i\langle v, w\rangle + i\langle w, v\rangle \tag{3} \\
\langle v - iw, v - iw\rangle &= \|v\|^2 + \|w\|^2 + i\langle v, w\rangle - i\langle w, v\rangle \tag{4}
\end{align*}
Now consider $(1) - (2) + i(3) - i(4)$. Then we obtain
\[ \|v + w\|^2 - \|v - w\|^2 + i\|v + iw\|^2 - i\|v - iw\|^2 = 4\langle v, w\rangle. \tag{$\dagger$} \]
So again $\langle v, w\rangle$ is determined by the norm.
The identities $(*)$ and $(\dagger)$ are sometimes known as the polarization identities.
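A quick numerical check of the complex polarization identity $(\dagger)$ (an illustration only), using the standard inner product $\langle v, w\rangle = \sum_i v_i \bar{w}_i$ on $\mathbb{C}^4$:

```python
import numpy as np

rng = np.random.default_rng(1)
v = rng.normal(size=4) + 1j * rng.normal(size=4)
w = rng.normal(size=4) + 1j * rng.normal(size=4)

n2 = lambda u: np.linalg.norm(u) ** 2    # ||u||^2
polar = (n2(v + w) - n2(v - w) + 1j * n2(v + 1j * w)
         - 1j * n2(v - 1j * w)) / 4
print(polar)
print(np.vdot(w, v))   # np.vdot(w, v) = sum_i v_i * conj(w_i) = <v, w>
```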
Definition (Hilbert space). A Euclidean space $(E, \|\cdot\|)$ is a Hilbert space if it is complete.
We will prove certain properties of the inner product.
Proposition (Parallelogram law). Let $(E, \|\cdot\|)$ be a Euclidean space. Then for $v, w \in E$, we have
\[ \|v - w\|^2 + \|v + w\|^2 = 2\|v\|^2 + 2\|w\|^2. \]
This is called the parallelogram law because it says that for any parallelogram, the sum of the squares of the lengths of the diagonals is the sum of the squares of the lengths of the four sides.
(picture: a parallelogram with sides $v$, $w$ and diagonals $v + w$, $v - w$)
Proof. This is just simple algebraic manipulation. We have
\begin{align*}
\|v - w\|^2 + \|v + w\|^2 &= \langle v - w, v - w\rangle + \langle v + w, v + w\rangle \\
&= \langle v, v\rangle - \langle v, w\rangle - \langle w, v\rangle + \langle w, w\rangle \\
&\quad + \langle v, v\rangle + \langle v, w\rangle + \langle w, v\rangle + \langle w, w\rangle \\
&= 2\langle v, v\rangle + 2\langle w, w\rangle.
\end{align*}
Proposition (Pythagoras theorem). Let $(E, \|\cdot\|)$ be a Euclidean space, and let $v, w \in E$ be orthogonal. Then
\[ \|v + w\|^2 = \|v\|^2 + \|w\|^2. \]
Proof.
\[ \|v + w\|^2 = \langle v + w, v + w\rangle = \langle v, v\rangle + \langle v, w\rangle + \langle w, v\rangle + \langle w, w\rangle = \langle v, v\rangle + 0 + 0 + \langle w, w\rangle = \|v\|^2 + \|w\|^2. \]
By induction, if $v_i \in E$ for $i = 1, \cdots, n$ are such that $\langle v_i, v_j\rangle = 0$ for $i \neq j$, i.e. they are mutually orthogonal, then
\[ \left\|\sum_{i=1}^n v_i\right\|^2 = \sum_{i=1}^n \|v_i\|^2. \]
Proposition. Let $(E, \|\cdot\|)$ be a Euclidean space. Then $\langle \cdot, \cdot\rangle: E \times E \to \mathbb{C}$ is continuous.
Proof. Let $(v, w) \in E \times E$ and $(\tilde{v}, \tilde{w}) \in E \times E$. We have
\begin{align*}
|\langle v, w\rangle - \langle \tilde{v}, \tilde{w}\rangle| &= |\langle v, w\rangle - \langle v, \tilde{w}\rangle + \langle v, \tilde{w}\rangle - \langle \tilde{v}, \tilde{w}\rangle| \\
&\leq |\langle v, w\rangle - \langle v, \tilde{w}\rangle| + |\langle v, \tilde{w}\rangle - \langle \tilde{v}, \tilde{w}\rangle| \\
&= |\langle v, w - \tilde{w}\rangle| + |\langle v - \tilde{v}, \tilde{w}\rangle| \\
&\leq \|v\|\|w - \tilde{w}\| + \|v - \tilde{v}\|\|\tilde{w}\|.
\end{align*}
Hence for $(v, w)$ sufficiently close to $(\tilde{v}, \tilde{w})$, we can get $|\langle v, w\rangle - \langle \tilde{v}, \tilde{w}\rangle|$ arbitrarily small. So it is continuous.
When we have an incomplete Euclidean space, we can of course take the completion of it to form a complete extension of the original normed vector space. However, it is not immediately obvious that the inner product can also be extended to the completion to give a Hilbert space. The following proposition tells us we can do so.
Proposition. Let $(E, \|\cdot\|)$ denote a Euclidean space, and $\bar{E}$ its completion. Then the inner product extends to an inner product on $\bar{E}$, turning $\bar{E}$ into a Hilbert space.
Proof. Recall we constructed the completion of a space as the equivalence classes of Cauchy sequences (where two Cauchy sequences $(x_n)$ and $(x_n')$ are equivalent if $\|x_n - x_n'\| \to 0$). Let $(x_n), (y_n)$ be two Cauchy sequences in $E$, and let $\tilde{x}, \tilde{y} \in \bar{E}$ denote their equivalence classes. We define the inner product as
\[ \langle \tilde{x}, \tilde{y}\rangle = \lim_{n \to \infty} \langle x_n, y_n\rangle. \tag{$*$} \]
We want to show this is well-defined. Firstly, we need to make sure the limit exists. We can show this by showing that $(\langle x_n, y_n\rangle)$ is a Cauchy sequence of scalars. We have
\begin{align*}
|\langle x_n, y_n\rangle - \langle x_m, y_m\rangle| &= |\langle x_n, y_n\rangle - \langle x_m, y_n\rangle + \langle x_m, y_n\rangle - \langle x_m, y_m\rangle| \\
&\leq |\langle x_n - x_m, y_n\rangle| + |\langle x_m, y_n - y_m\rangle| \\
&\leq \|x_n - x_m\|\|y_n\| + \|x_m\|\|y_n - y_m\|.
\end{align*}
Since Cauchy sequences are bounded, $\langle x_n, y_n\rangle$ is a Cauchy sequence, as $(x_n)$ and $(y_n)$ are.
We also need to show that $(*)$ does not depend on the representatives for $\tilde{x}$ and $\tilde{y}$. This is left as an exercise for the reader.
We also need to show that $\langle \cdot, \cdot\rangle_{\bar{E}}$ induces the norm $\|\cdot\|_{\bar{E}}$, which is yet another exercise.
Example. Consider the space
\[ \ell_2 = \left\{(x_1, x_2, \cdots) : x_i \in \mathbb{C}, \sum_{i=1}^\infty |x_i|^2 < \infty\right\}. \]
We already know that this is a complete Banach space. We can also define an inner product on this space by
\[ \langle a, b\rangle_{\ell_2} = \sum_{i=1}^\infty a_i \bar{b}_i. \]
We need to check that this actually converges. We prove this by showing absolute convergence. For each $n$, we can use Cauchy--Schwarz to obtain
\[ \sum_{i=1}^n |a_i \bar{b}_i| \leq \left(\sum_{i=1}^n |a_i|^2\right)^{\frac{1}{2}} \left(\sum_{i=1}^n |b_i|^2\right)^{\frac{1}{2}} \leq \|a\|_{\ell_2} \|b\|_{\ell_2}. \]
So it converges. Now notice that the $\ell_2$ norm is indeed induced by this inner product.
This is a significant example, since we will later show that every separable (i.e. having a countable basis) infinite-dimensional Hilbert space is isometrically isomorphic to $\ell_2$.
Definition (Orthogonal space). Let $E$ be a Euclidean space and $S \subseteq E$ an arbitrary subset. Then the orthogonal space of $S$, denoted by $S^\perp$, is given by
\[ S^\perp = \{v \in E : \forall w \in S, \langle v, w\rangle = 0\}. \]
Proposition. Let $E$ be a Euclidean space and $S \subseteq E$. Then $S^\perp$ is a closed subspace of $E$, and moreover
\[ S^\perp = (\operatorname{span} S)^\perp. \]
Proof. We first show it is a subspace. Let $u, v \in S^\perp$ and $\lambda, \mu \in \mathbb{C}$. We want to show $\lambda u + \mu v \in S^\perp$. Let $w \in S$. Then
\[ \langle \lambda u + \mu v, w\rangle = \lambda\langle u, w\rangle + \mu\langle v, w\rangle = 0. \]
To show it is closed, let $u_n \in S^\perp$ be a sequence such that $u_n \to u \in E$. Let $w \in S$. Then we know that
\[ \langle u_n, w\rangle = 0. \]
Hence, by the continuity of the inner product, we have
\[ 0 = \lim_{n \to \infty} \langle u_n, w\rangle = \langle \lim u_n, w\rangle = \langle u, w\rangle. \]
The remaining part is left as an exercise.
Note that if $V$ is a linear subspace, then $V \cap V^\perp = \{0\}$, since any $v \in V \cap V^\perp$ has to satisfy $\langle v, v\rangle = 0$. So $V + V^\perp$ is a direct sum.
Theorem. Let $(E, \|\cdot\|)$ be a Euclidean space, and $F \subseteq E$ a complete subspace. Then $F \oplus F^\perp = E$.
Hence, by definition of the direct sum, for $x \in E$, we can write $x = x_1 + x_2$, where $x_1 \in F$ and $x_2 \in F^\perp$. Moreover, $x_1$ is uniquely characterized by
\[ \|x_1 - x\| = \inf_{y \in F} \|y - x\|. \]
(picture: a point $x$ and its closest point $x_1$ in the subspace $F$)
Note that this is not necessarily true if $F$ is not complete.
Proof. We already know that $F \oplus F^\perp$ is a direct sum. It thus suffices to show that the sum is the whole of $E$.
Let $y_i \in F$ be a sequence with
\[ \lim_{i \to \infty} \|y_i - x\| = \inf_{y \in F} \|y - x\| = d. \]
We want to show that $(y_i)$ is a Cauchy sequence. Let $\varepsilon > 0$ be given. Let $n_0 \in \mathbb{N}$ be such that for all $i \geq n_0$, we have
\[ \|y_i - x\|^2 \leq d^2 + \varepsilon. \]
We now use the parallelogram law for $v = x - y_i$, $w = x - y_j$ with $i, j \geq n_0$. Then the parallelogram law says:
\[ \|v + w\|^2 + \|v - w\|^2 = 2\|v\|^2 + 2\|w\|^2, \]
or
\[ \|y_j - y_i\|^2 + \|2x - y_i - y_j\|^2 = 2\|y_i - x\|^2 + 2\|y_j - x\|^2. \]
Hence we know that
\[ \|y_i - y_j\|^2 \leq 2\|y_i - x\|^2 + 2\|y_j - x\|^2 - 4\left\|x - \frac{y_i + y_j}{2}\right\|^2 \leq 2(d^2 + \varepsilon) + 2(d^2 + \varepsilon) - 4d^2 \leq 4\varepsilon. \]
So $(y_i)$ is a Cauchy sequence. Since $F$ is complete, $y_i \to y$ for some $y \in F$. Moreover, by the continuity of $\|\cdot\|$, we know that
\[ d = \lim_{i \to \infty} \|y_i - x\| = \|y - x\|. \]
Now let $x_1 = y$ and $x_2 = x - y$. The only thing left to show is $x_2 \in F^\perp$. Suppose not. Then there is some $\tilde{y} \in F$ such that
\[ \langle \tilde{y}, x_2\rangle \neq 0. \]
The idea is that we can perturb $y$ by a little bit to get a point even closer to $x$. By multiplying $\tilde{y}$ with a scalar, we can assume
\[ \langle \tilde{y}, x_2\rangle > 0. \]
Then for $t > 0$, we have
\begin{align*}
\|(y + t\tilde{y}) - x\|^2 &= \langle y + t\tilde{y} - x, y + t\tilde{y} - x\rangle \\
&= \langle y - x, y - x\rangle + \langle t\tilde{y}, y - x\rangle + \langle y - x, t\tilde{y}\rangle + t^2\langle \tilde{y}, \tilde{y}\rangle \\
&= d^2 - 2t\langle \tilde{y}, x_2\rangle + t^2\|\tilde{y}\|^2.
\end{align*}
Hence for sufficiently small $t$, the $t^2$ term is negligible, and we can make this less than $d^2$. This is a contradiction, since $y + t\tilde{y} \in F$.
As a corollary, we can define the projection map as follows:
Corollary. Let $E$ be a Euclidean space and $F \subseteq E$ a complete subspace. Then there exists a projection map $P: E \to E$ defined by $P(x) = x_1$, where $x_1 \in F$ is as defined in the theorem above. Moreover, $P$ satisfies the following properties:
(i) $P(E) = F$, $P(F^\perp) = \{0\}$, and $P^2 = P$. In other words, $F^\perp \subseteq \ker P$.
(ii) $(I - P)(E) = F^\perp$, $(I - P)(F) = \{0\}$, $(I - P)^2 = (I - P)$.
(iii) $\|P\|_{B(E,E)} \leq 1$ and $\|I - P\|_{B(E,E)} \leq 1$, with equality if and only if $F \neq \{0\}$ and $F^\perp \neq \{0\}$ respectively.
Here $P$ projects our space onto $F$, while $I - P$ projects our space onto $F^\perp$.
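In finite dimensions the projection of the theorem is just least squares, which gives a concrete sanity check (an illustration with arbitrary made-up data): the residual $x_2 = x - P(x)$ is orthogonal to $F$.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 2))    # F = column span of A, a complete subspace
x = rng.normal(size=5)

coef, *_ = np.linalg.lstsq(A, x, rcond=None)   # minimise ||A c - x||
x1 = A @ coef                  # P(x): the closest point to x in F
x2 = x - x1                    # (I - P)(x), should lie in F-perp
print(np.abs(A.T @ x2).max())  # ~1e-16: x2 is orthogonal to F
```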
4.2 Riesz representation theorem
Our next theorem allows us to completely understand the duals of Hilbert spaces.
Consider the following map. Given a Hilbert space $H$ and $v \in H$, consider $\varphi_v \in H^*$ defined by
\[ \varphi_v(w) = \langle w, v\rangle. \]
Note that this construction requires the existence of an inner product.
Notice that this is indeed a bounded linear map, where boundedness comes from the Cauchy--Schwarz inequality
\[ |\langle w, v\rangle| \leq \|v\|_H \|w\|_H. \]
Therefore, $\varphi$ taking $v \mapsto \varphi_v$ is a map $\varphi: H \to H^*$.
Using this simple construction, we have managed to produce a lot of members of the dual. Are there any more things in the dual? The answer is no, and this is given by the Riesz representation theorem.
Proposition (Riesz representation theorem). Let $H$ be a Hilbert space. Then $\varphi: H \to H^*$ defined by $v \mapsto \langle \cdot, v\rangle$ is an isometric anti-isomorphism, i.e. it is isometric, bijective and
\[ \varphi(\lambda v + \mu w) = \bar{\lambda}\varphi(v) + \bar{\mu}\varphi(w). \]
Proof. We first prove all the easy bits, namely everything but surjectivity.
To show injectivity, if $\varphi_v = \varphi_u$, then $\langle w, v\rangle = \langle w, u\rangle$ for all $w$ by definition. So $\langle w, v - u\rangle = 0$ for all $w$. In particular, $\langle v - u, v - u\rangle = 0$. So $v - u = 0$.
To show that it is an anti-homomorphism, let $v, w, y \in H$ and $\lambda, \mu \in \mathbb{F}$. Then
\[ \varphi_{\lambda v + \mu w}(y) = \langle y, \lambda v + \mu w\rangle = \bar{\lambda}\langle y, v\rangle + \bar{\mu}\langle y, w\rangle = \bar{\lambda}\varphi_v(y) + \bar{\mu}\varphi_w(y). \]
To show it is isometric, let $v, w \in H$ with $\|w\|_H = 1$. Then
\[ |\varphi_v(w)| = |\langle w, v\rangle| \leq \|w\|_H \|v\|_H = \|v\|_H. \]
Hence $\|\varphi_v\|_{H^*} \leq \|v\|_H$ for all $v \in H$. To show $\|\varphi_v\|_{H^*}$ is exactly $\|v\|_H$, it suffices to note that
\[ |\varphi_v(v)| = \langle v, v\rangle = \|v\|_H^2. \]
So $\|\varphi_v\|_{H^*} \geq \|v\|_H^2 / \|v\|_H = \|v\|_H$.
Finally, we show surjectivity. Let $\xi \in H^*$. If $\xi = 0$, then $\xi = \varphi_0$.
Otherwise, suppose $\xi \neq 0$. The idea is that $(\ker \xi)^\perp$ is one-dimensional, and then the $v$ we are looking for will be an element in this complement. So we arbitrarily pick one, and then scale it appropriately.
We now write out the argument carefully. First, we note that since $\xi$ is continuous, $\ker \xi$ is closed, since it is the inverse image of the closed set $\{0\}$. So $\ker \xi$ is complete, and thus we have
\[ H = \ker \xi \oplus (\ker \xi)^\perp. \]
The next claim is that $\dim (\ker \xi)^\perp = 1$. This is an immediate consequence of the first isomorphism theorem, whose proof is the usual one, but since we didn't prove that, we will run the argument manually.
We pick any two elements $v_1, v_2 \in (\ker \xi)^\perp$. Then we can always find some $\lambda, \mu$, not both zero, such that
\[ \lambda\xi(v_1) + \mu\xi(v_2) = 0. \]
So $\lambda v_1 + \mu v_2 \in \ker \xi$. But it is also in $(\ker \xi)^\perp$ by linearity. Since $\ker \xi$ and $(\ker \xi)^\perp$ have trivial intersection, we deduce that $\lambda v_1 + \mu v_2 = 0$. Thus, any two vectors in $(\ker \xi)^\perp$ are dependent. Since $\xi \neq 0$, we know that $(\ker \xi)^\perp$ has dimension $1$.
Now pick any $v \in (\ker \xi)^\perp$ such that $\xi(v) \neq 0$. By scaling it appropriately, we can obtain a $v$ such that
\[ \xi(v) = \langle v, v\rangle. \]
Finally, we show that $\xi = \varphi_v$. To prove this, let $w \in H$. We decompose $w$ using the previous theorem to get
\[ w = \alpha v + w_0 \]
for some $w_0 \in \ker \xi$ and $\alpha \in \mathbb{F}$. Note that by definition of $(\ker \xi)^\perp$, we know that $\langle w_0, v\rangle = 0$. Hence we know that
\[ \xi(w) = \xi(\alpha v + w_0) = \xi(\alpha v) = \alpha\xi(v) = \alpha\langle v, v\rangle = \langle \alpha v, v\rangle = \langle \alpha v + w_0, v\rangle = \langle w, v\rangle. \]
Since $w$ was arbitrary, we are done.
Using this proposition twice, we know that all Hilbert spaces are reflexive, i.e. $H \cong H^{**}$.
We now return to the proof of the proposition we claimed at the beginning.
Proposition. For $f \in C(S^1)$, define, for each $k \in \mathbb{Z}$,
\[ \hat{f}(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-ikx} f(x) \,\mathrm{d}x. \]
The partial sums are then defined as
\[ S_N(f)(x) = \sum_{k=-N}^{N} \hat{f}(k) e^{ikx}. \]
Then we have
\[ \lim_{N \to \infty} \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x) - S_N(f)(x)|^2 \,\mathrm{d}x = 0. \]
Proof. Consider the Hilbert space $L^2(S^1)$ defined as the completion of $C_\mathbb{C}(S^1)$ under the inner product
\[ \langle f, g\rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\overline{g(x)} \,\mathrm{d}x. \]
Consider the closed subspace
\[ U_N = \operatorname{span}\{e^{inx} : |n| \leq N\}. \]
Then in fact $S_N$ defined above is the projection operator onto $U_N$. This is since we have the orthonormality condition
\[ \langle e^{inx}, e^{imx}\rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{inx} e^{-imx} \,\mathrm{d}x = \begin{cases} 1 & n = m \\ 0 & n \neq m \end{cases}. \]
Hence it is easy to check that if $f \in U_N$, say $f = \sum_{n=-N}^N a_n e^{inx}$, then $S_N f = f$, since
\[ S_N(f) = \sum_{n=-N}^{N} \hat{f}(n) e^{inx} = \sum_{n=-N}^{N} \langle f, e^{inx}\rangle e^{inx} = \sum_{n=-N}^{N} a_n e^{inx} = f, \]
using the orthogonality relation. But if $f \in U_N^\perp$, then
\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-inx} f(x) \,\mathrm{d}x = 0 \]
for all $|n| \leq N$. So $S_N(f) = 0$. So this is indeed a projection map.
In particular, we will use the fact that projection maps have norms $\leq 1$. Hence for any $P(x)$, we have
\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} |S_N(f)(x) - S_N(P)(x)|^2 \,\mathrm{d}x \leq \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x) - P(x)|^2 \,\mathrm{d}x. \]
Now consider the algebra $A$ generated by $\{e^{inx} : n \in \mathbb{Z}\}$. Notice that $A$ separates points and is closed under complex conjugation. Also, for every $x \in S^1$, there exists $f \in A$ such that $f(x) \neq 0$ (using, say, $f(x) = e^{ix}$). Hence, by the Stone--Weierstrass theorem, $\bar{A} = C_\mathbb{C}(S^1)$, i.e. for every $f \in C_\mathbb{C}(S^1)$ and $\varepsilon > 0$, there exists a polynomial $P$ in $e^{ix}$ and $e^{-ix}$ such that
\[ \|P - f\| < \varepsilon. \]
We are almost done. We now let $N > \deg P$ be a large number. Then in particular, we have $S_N(P) = P$. Then
\begin{align*}
\left(\frac{1}{2\pi} \int_{-\pi}^{\pi} |S_N(f) - f|^2 \,\mathrm{d}x\right)^{\frac{1}{2}} &\leq \left(\frac{1}{2\pi} \int_{-\pi}^{\pi} |S_N(f) - S_N(P)|^2 \,\mathrm{d}x\right)^{\frac{1}{2}} + \left(\frac{1}{2\pi} \int_{-\pi}^{\pi} |S_N(P) - P|^2 \,\mathrm{d}x\right)^{\frac{1}{2}} + \left(\frac{1}{2\pi} \int_{-\pi}^{\pi} |P - f|^2 \,\mathrm{d}x\right)^{\frac{1}{2}} \\
&\leq \varepsilon + 0 + \varepsilon = 2\varepsilon.
\end{align*}
So done.
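A numerical sketch of the proposition (an illustration only, taking $f(x) = |x|$ on $S^1$, which is continuous once extended periodically): the mean-square error of $S_N(f)$ decreases to $0$ as $N$ grows.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
f = np.abs(x)                        # continuous on S^1

def partial_sum(N):
    s = np.zeros_like(x, dtype=complex)
    for n in range(-N, N + 1):
        fhat = np.mean(np.exp(-1j * n * x) * f)   # (1/2pi) * integral
        s += fhat * np.exp(1j * n * x)
    return s.real

for N in [1, 4, 16, 64]:
    print(N, np.mean((f - partial_sum(N)) ** 2))  # L^2 error -> 0
```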
4.3 Orthonormal systems and basis
Definition (Orthonormal system). Let $E$ be a Euclidean space. A set of unit vectors $\{e_\alpha\}_{\alpha \in A}$ is called an orthonormal system if $\langle e_\alpha, e_\beta\rangle = 0$ whenever $\alpha \neq \beta$.
We want to define a "basis" for an infinite-dimensional vector space. The idea is that these should be orthonormal systems "big enough" to span everything. In finite dimensions, this was easy, since there is the notion of dimension: if we have $n$ dimensions, then we just take an orthonormal system of $n$ vectors, and we are done.
If we have infinite dimensions, this is trickier. We can keep adding vectors to our orthonormal system, but we might never get to such a "basis", especially if our "basis" has to be uncountable. Hence we have the idea of "maximality".
Definition (Maximal orthonormal system). Let $E$ be a Euclidean space. An orthonormal system is called maximal if it cannot be extended to a strictly larger orthonormal system.
By Zorn's lemma, a maximal orthonormal system always exists. We will later see that in certain nice cases, we can construct a maximal orthonormal system directly, without appealing to Zorn's lemma. The advantage of an explicit construction is that we will understand our system much more.
One important thing we would like to do is, given an orthonormal system, decide whether it is maximal. In general, this is difficult, and Zorn's lemma is completely useless.
Now suppose we are nicer and have a Hilbert space. What we would like to say is that if we have a maximal orthonormal system, then its span is the whole space $H$. However, this doesn't really work. The span of a set $S$ only allows us to take finite linear combinations, but by completeness of $H$, we want to have the infinite sums, i.e. the limits as well. So what we really have is the following.
Proposition. Let $H$ be a Hilbert space. Let $S$ be a maximal orthonormal system. Then $\overline{\operatorname{span}}\, S = H$.
While this might seem difficult to prove at first, it turns out the proof is pretty short and simple.
Proof. Recall that $S^\perp = (\operatorname{span} S)^\perp = (\overline{\operatorname{span}}\, S)^\perp$. Since $H$ is a Hilbert space, we have
\[ H = \overline{\operatorname{span}}\, S \oplus (\overline{\operatorname{span}}\, S)^\perp = \overline{\operatorname{span}}\, S \oplus S^\perp. \]
Since $S$ is maximal, $S^\perp = \{0\}$. So done.
How about the converse? It is also true. In fact, it is true even for Euclidean spaces, and the proof is easy.
Proposition. Let $E$ be Euclidean, and let $S$ be an orthonormal system. If $\overline{\operatorname{span}}\, S = E$, then $S$ is maximal.
Proof.
\[ S^\perp = (\overline{\operatorname{span}}\, S)^\perp = E^\perp = \{0\}. \]
So in a Hilbert space, we have an if and only if condition: a system is maximal if and only if the closure of the span is everything. In other words, given any vector $v \in H$, we can find a sequence $v_i$ in the span of the maximal system that converges to $v$. This sequence is clearly not unique, since we can just add a random term to the first item.
However, we can do something better. Consider our space $\ell_2$, and the element $(1, \frac{1}{2}, \frac{1}{4}, \cdots)$. There is a very natural way to write this as the limit of the sequence:
\[ (1, 0, 0, \cdots),\ \left(1, \tfrac{1}{2}, 0, \cdots\right),\ \left(1, \tfrac{1}{2}, \tfrac{1}{4}, 0, \cdots\right),\ \cdots. \]
What we are doing is that we are truncating the element at the $n$th component for each $n$. Alternatively, the $n$th term is what we get when we project our $v$ onto the space spanned by the first $n$ "basis" vectors. This is a nice and natural way to produce the sequence.
Definition (Hilbert space basis). Let $H$ be a Hilbert space. A maximal orthonormal system is called a Hilbert space basis.
Recall that at the beginning, we said we needed Zorn's lemma to get a maximal orthonormal system. In many cases, we can find a basis without using Zorn's lemma. This relies on the Gram--Schmidt procedure.
Proposition. Let $\{x_i\}_{i=1}^n$, $n \in \mathbb{N}$, be linearly independent. Then there exists an orthonormal system $\{e_i\}_{i=1}^n$ such that
\[ \operatorname{span}\{x_1, \cdots, x_j\} = \operatorname{span}\{e_1, \cdots, e_j\} \]
for all $j \leq n$.
Proof. Define $e_1$ by
\[ e_1 = \frac{x_1}{\|x_1\|}. \]
Assume we have defined $\{e_i\}_{i=1}^j$ orthonormal such that
\[ \operatorname{span}\{x_1, \cdots, x_j\} = \operatorname{span}\{e_1, \cdots, e_j\}. \]
Then by linear independence, we know that
\[ x_{j+1} \notin \operatorname{span}\{x_1, \cdots, x_j\} = \operatorname{span}\{e_1, \cdots, e_j\} = F_j. \]
We now define
\[ \tilde{x}_{j+1} = x_{j+1} - P_{F_j}(x_{j+1}), \]
where $P_{F_j}$ is the projection onto $F_j$ given by
\[ P_{F_j}(x) = \sum_{i=1}^j \langle x, e_i\rangle e_i. \]
Since $F_j$ is a closed, finite-dimensional subspace, we know that
\[ x_{j+1} - P_{F_j} x_{j+1} \in F_j^\perp. \]
Thus
\[ e_{j+1} = \frac{\tilde{x}_{j+1}}{\|\tilde{x}_{j+1}\|} \]
is the right choice. We can also write this in full as
\[ e_{j+1} = \frac{x_{j+1} - \sum_{i=1}^j \langle x_{j+1}, e_i\rangle e_i}{\left\|x_{j+1} - \sum_{i=1}^j \langle x_{j+1}, e_i\rangle e_i\right\|}. \]
So done.
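The formula above is a direct recipe; here is a minimal transcription of it (a sketch for vectors in $\mathbb{C}^n$ with the standard inner product):

```python
import numpy as np

def gram_schmidt(xs):
    es = []
    for x in xs:
        # Subtract the projection P_F(x) onto the span of e_1, ..., e_j,
        # then normalise the remainder.
        x_t = x - sum(np.vdot(e, x) * e for e in es)   # np.vdot(e, x) = <x, e>
        es.append(x_t / np.linalg.norm(x_t))
    return es

xs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
es = gram_schmidt(xs)
print(np.round([[np.vdot(a, b) for b in es] for a in es], 8))  # identity
```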
Note that projecting onto the span of the first $n$ basis vectors is exactly what we did when we wrote an element of $\ell_2$ as the limit of a sequence above.
This is a helpful result, since it is a constructive way of producing orthonormal systems. So if we are handed a set of vectors, we can just apply this result, plug our vectors into this ugly formula, and get a result. Of course, we want to apply this to infinite spaces.
Proposition. Let $H$ be separable, i.e. there is a countable set $\{y_i\}_{i \in \mathbb{N}}$ such that
\[ \overline{\operatorname{span}}\{y_i\} = H. \]
Then there exists a countable basis for $H$.
Proof. We find a subset $\{y_{i_j}\}$ such that $\operatorname{span}\{y_i\} = \operatorname{span}\{y_{i_j}\}$ and the $\{y_{i_j}\}$ are linearly independent. This is easy to do, since we can just throw away the useless dependent vectors. At this point, we apply Gram--Schmidt, and we are done.
Example. Consider $H = \ell_2$ and the sequence $\{e_i\}_{i \in \mathbb{N}}$ defined by
\[ e_i = (0, 0, \cdots, 0, 1, 0, \cdots), \]
with the $1$ in the $i$th position.
Note that $x \perp \{e_i\}_{i \in \mathbb{N}}$ if and only if each component of $x$ is zero, i.e. $x = 0$. So $\{e_i\}$ is maximal, and hence a basis.
Example. Consider $H = L^2$, the completion of $C(S^1)$ under the $L^2$ norm, i.e. the norm induced by
\[ \langle f, g\rangle = \int_{-\pi}^{\pi} f\bar{g} \,\mathrm{d}x. \]
Trigonometric polynomials are dense in $C(S^1)$ with respect to the supremum norm, due to Stone--Weierstrass. So in fact $\operatorname{span}\left\{\frac{1}{\sqrt{2\pi}}e^{inx} : n \in \mathbb{Z}\right\}$ is dense in $C(S^1)$. Hence it is dense in $C(S^1)$ under the $L^2$ norm, since convergence under the supremum norm implies convergence under $L^2$. In particular, it is dense in the $L^2$ space, since $L^2$ is the completion of $C(S^1)$. Moreover, this set is orthonormal in $C(S^1)$ under the $L^2$ inner product. So
\[ \left\{\frac{1}{\sqrt{2\pi}}e^{inx} : n \in \mathbb{Z}\right\} \]
is a basis for $L^2$.
Note that in these two examples, we have exhibited two different ways of constructing a basis. In the first case, we showed that it is maximal directly. In the second case, we showed that its span is a dense subset of the space. By our proposition, these are equivalent and valid ways of proving that it is a basis.
4.4 The isomorphism with $\ell_2$
We ended the previous section with two examples. Both of them are Hilbert spaces, and both have countable bases. Is there any way we can identify the two? This is a reasonable thing to ask. If we are given a Hilbert space $H$ of finite dimension $\dim H = n$, then we know that $H$ is indeed isomorphic to $\mathbb{R}^n$ (or $\mathbb{C}^n$) with the Euclidean norm. In some sense $\ell_2$ is just an "infinite version" of $\mathbb{R}^n$. So we might expect all other Hilbert spaces with countable basis to be isomorphic to $\ell_2$.
Recall that if we have a finite-dimensional Hilbert space $H$ with $\dim H = n$, and an orthonormal basis $\{e_1, \cdots, e_n\}$, then each $x \in H$ can be written as
\[ x = \sum_{i=1}^n \langle x, e_i\rangle e_i, \]
and
\[ \|x\|^2 = \sum_{i=1}^n |\langle x, e_i\rangle|^2. \]
Thus $H$ is isomorphic to $\ell_2^n$, the space $\mathbb{R}^n$ with the Euclidean norm, via the map
\[ x \mapsto (\langle x, e_1\rangle, \cdots, \langle x, e_n\rangle). \]
Can we push this to the infinite-dimensional case? Yes. We will have to replace our finite sum $\sum_{i=1}^n$ with an infinite sum. Of course, with an infinite sum, we need to make sure things converge. This is guaranteed by Bessel's inequality.
Lemma (Bessel's inequality). Let $E$ be Euclidean and $\{e_i\}_{i=1}^N$, with $N \in \mathbb{N} \cup \{\infty\}$, an orthonormal system. For any $x \in E$, define $x_i = \langle x, e_i\rangle$. Then for any $j \leq N$, we have
\[ \sum_{i=1}^j |x_i|^2 \leq \|x\|^2. \]
Proof. Consider the case where $j$ is finite first. Define
\[ F_j = \operatorname{span}\{e_1, \cdots, e_j\}. \]
This is a finite-dimensional subspace of $E$. Hence an orthogonal projection $P_{F_j}$ exists. Moreover, we have an explicit formula for it:
\[ P_{F_j}(x) = \sum_{i=1}^j \langle x, e_i\rangle e_i. \]
Thus
\[ \sum_{i=1}^j |x_i|^2 = \|P_{F_j} x\|^2 \leq \|x\|^2, \]
since we know that $\|P_{F_j}\| \leq 1$. Taking the limit as $j \to \infty$ proves the case for infinite $j$.
The only thing we required in the proof is for the space to be Euclidean. This is since we are talking about the sum
\[ \sum_{i=1}^\infty |x_i|^2, \]
and this is a sum of numbers. However, if we want to investigate the sum
\[ x = \sum_{i=1}^\infty \langle x, e_i\rangle e_i, \]
then we'd better require the space to be Hilbert, so that the sum has something to converge to.
Proposition. Let $H$ be a separable Hilbert space, with a countable basis $\{e_i\}_{i=1}^N$, where $N \in \mathbb{N} \cup \{\infty\}$. Let $x, y \in H$ and
\[ x_i = \langle x, e_i\rangle, \quad y_i = \langle y, e_i\rangle. \]
Then
\[ x = \sum_{i=1}^N x_i e_i, \quad y = \sum_{i=1}^N y_i e_i, \]
and
\[ \langle x, y\rangle = \sum_{i=1}^N x_i \bar{y}_i. \]
Moreover, the sum converges absolutely.
Proof. We only need to consider the case $N = \infty$. Otherwise, it is just finite-dimensional linear algebra.
First, note that our expression is written as an infinite sum. So we need to make sure it converges. We define the partial sums to be
\[ s_n = \sum_{i=1}^n x_i e_i. \]
We want to show $s_n \to x$. By Bessel's inequality, we know that
\[ \sum_{i=1}^\infty |x_i|^2 \leq \|x\|^2. \]
In particular, the sum is bounded, and hence converges.
For any $m < n$, we have
\[ \|s_n - s_m\|^2 = \sum_{i=m+1}^n |x_i|^2 \leq \sum_{i=m+1}^\infty |x_i|^2. \]
As $m \to \infty$, the right hand side must go to $0$. Thus $\{s_n\}$ is Cauchy. Since $H$ is Hilbert, $s_n$ converges, say
\[ s_n \to s = \sum_{i=1}^\infty x_i e_i. \]
Now we want to prove that this sum is indeed $x$ itself. Note that so far in the proof, we have not used the fact that $\{e_i\}$ is a basis. We just used the fact that it is orthonormal. Hence we should use this now. We notice that
\[ \langle s, e_i\rangle = \lim_{n \to \infty} \langle s_n, e_i\rangle = \lim_{n \to \infty} \sum_{j=1}^n x_j \langle e_j, e_i\rangle = x_i. \]
Hence we know that
\[ \langle x - s, e_i\rangle = 0 \]
for all $i$. So $x - s$ is perpendicular to all $e_i$. Since $\{e_i\}$ is a basis, we must have $x - s = 0$, i.e. $x = s$.
To show our formula for the inner product, we can compute
\begin{align*}
\langle x, y\rangle &= \lim_{n \to \infty} \left\langle \sum_{i=1}^n x_i e_i, \sum_{j=1}^n y_j e_j\right\rangle \\
&= \lim_{n \to \infty} \sum_{i,j=1}^n x_i \bar{y}_j \langle e_i, e_j\rangle \\
&= \lim_{n \to \infty} \sum_{i,j=1}^n x_i \bar{y}_j \delta_{ij} \\
&= \lim_{n \to \infty} \sum_{i=1}^n x_i \bar{y}_i \\
&= \sum_{i=1}^\infty x_i \bar{y}_i.
\end{align*}
Note that we know the limit exists, since the continuity of the inner product ensures the first line is always valid.
Finally, to show absolute convergence, note that for all finite $j$, we have
\[ \sum_{i=1}^j |x_i \bar{y}_i| \leq \sqrt{\sum_{i=1}^j |x_i|^2} \sqrt{\sum_{i=1}^j |y_i|^2} \leq \|x\|\|y\|. \]
Since this is a uniform bound for any $j$, the sum converges absolutely.
Note that in the case of $x = y$, our formula for the inner product gives
\[ \|x\|^2 = \sum_{i=1}^N |x_i|^2. \]
This is known as Parseval's equality.
What this proposition gives us is that given any separable Hilbert space, we can find "coordinates" for it, and in terms of these coordinates, our inner product and hence norm all act like $\ell_2$. In particular, we have the map
\[ x \mapsto \{\langle x, e_i\rangle\}_{i=1}^N \]
that takes $H$ into $\ell_2$. This is injective, since by Parseval's equality, if the image of $x$ is $0$, then $\|x\|^2 = \sum 0 = 0$. So $x = 0$.
This is good, but not good enough. We want the map to be an isomorphism. Hence, we need to show it is surjective. In other words, every element in $\ell_2$ is obtained. This is a theorem by Riesz and Fischer, and is in fact easy to prove, since there is an obvious candidate for the preimage of any $\{a_i\}_{i \in \mathbb{N}}$.
Proposition. Let $H$ be a separable Hilbert space with orthonormal basis $\{e_i\}_{i \in \mathbb{N}}$. Let $\{a_i\}_{i \in \mathbb{N}} \in \ell_2(\mathbb{C})$. Then there exists an $x \in H$ with $\langle x, e_i\rangle = a_i$. Moreover, this $x$ is exactly
\[ x = \sum_{i=1}^\infty a_i e_i. \]
Proof. The only thing we need to show is that this sum converges. For any $n \in \mathbb{N}$, define
\[ s_n = \sum_{i=1}^n a_i e_i \in H. \]
For $m < n$, we have
\[ \|s_n - s_m\|^2 = \sum_{i=m+1}^n |a_i|^2 \to 0 \]
as $m \to \infty$, because $\{a_i\} \in \ell_2$. Hence $s_n$ is Cauchy and as such converges to some $x$. Obviously, we have
\[ \langle x, e_i\rangle = \lim_{n \to \infty} \sum_{j=1}^n a_j \langle e_j, e_i\rangle = a_i. \]
So done.
This means we have an isomorphism between $\ell_2$ and $H$. Moreover, this map is continuous and in fact isometric. So this is a very strong result: all separable infinite-dimensional Hilbert spaces are (isometrically isomorphic to) $\ell_2$.
4.5 Operators
We are going to look at operators on Hilbert spaces. For example, we would like to see how differential operators behave on spaces of differentiable functions.
In this section, we will at least require the space to be Banach. So let $X$ be a Banach space over $\mathbb{C}$. We will consider $B(X) = B(X, X)$, the vector space of bounded linear maps from $X$ to itself. We have seen in the example sheets that $B(X)$ is a unital Banach algebra, i.e. it forms a complete algebra with composition as multiplication. Our goal is to generalize some considerations in finite dimensions, such as eigenvectors and eigenvalues.
Definition (Spectrum and resolvent set). Let $X$ be a Banach space and $T \in B(X)$. We define the spectrum of $T$, denoted by $\sigma(T)$, by
\[ \sigma(T) = \{\lambda \in \mathbb{C} : T - \lambda I \text{ is not invertible}\}. \]
The resolvent set, denoted by $\rho(T)$, is
\[ \rho(T) = \mathbb{C} \setminus \sigma(T). \]
Note that if $T - \lambda I$ is bijective, then by the inverse mapping theorem, we know it has a bounded inverse. So if $\lambda \in \sigma(T)$, then either $T - \lambda I$ is not injective, or it is not surjective. In other words, $\ker(T - \lambda I) \neq \{0\}$ or $\operatorname{im}(T - \lambda I) \neq X$. In finite dimensions, these are equivalent by, say, the rank-nullity theorem, but in general, they are not.
Example. Consider the shift operator $s: \ell^\infty \to \ell^\infty$ defined by
\[
  (a_1, a_2, a_3, \cdots) \mapsto (0, a_1, a_2, \cdots).
\]
Then this is injective but not surjective.
Now if $\lambda \in \rho(T)$, i.e. $T - \lambda I$ is invertible, then $(T - \lambda I)^{-1}$ is automatically bounded by the inverse mapping theorem. This is why we want to work with Banach spaces.
Definition (Resolvent). Let $X$ be a Banach space and $T \in B(X)$. The resolvent is the map $R: \rho(T) \to B(X)$ given by
\[
  \lambda \mapsto (T - \lambda I)^{-1}.
\]
Definition (Eigenvalue). We say $\lambda$ is an eigenvalue of $T$ if $\ker(T - \lambda I) \neq \{0\}$.
Definition (Point spectrum). Let $X$ be a Banach space. The point spectrum is
\[
  \sigma_p(T) = \{\lambda\in\mathbb{C} : \lambda \text{ is an eigenvalue of } T\}.
\]
Obviously, $\sigma_p(T) \subseteq \sigma(T)$, but they are in general not equal.
Definition (Approximate point spectrum). Let $X$ be a Banach space. The approximate point spectrum is defined as
\[
  \sigma_{ap}(T) = \{\lambda\in\mathbb{C} : \exists\{x_n\}\subseteq X \text{ with } \|x_n\|_X = 1 \text{ and } \|(T - \lambda I)x_n\|_X \to 0\}.
\]
Again, we have
\[
  \sigma_p(T) \subseteq \sigma_{ap}(T) \subseteq \sigma(T).
\]
The last inclusion follows from the fact that if an inverse exists, then the inverse is bounded.
An important characterization of the spectrum is the following theorem:
Theorem. Let $X$ be a Banach space, $T \in B(X)$. Then $\sigma(T)$ is a non-empty, closed subset of
\[
  \{\lambda\in\mathbb{C} : |\lambda| \leq \|T\|_{B(X)}\}.
\]
In finite dimensions, this in particular implies the existence of eigenvalues, since the spectrum is equal to the point spectrum. Notice this is only true for vector spaces over $\mathbb{C}$, as we know from linear algebra.
To prove this theorem, we will first prove two lemmas.
Lemma. Let $X$ be a Banach space, $T \in B(X)$ and $\|T\|_{B(X)} < 1$. Then $I - T$ is invertible.
Proof. To prove it is invertible, we construct an explicit inverse. We want to show
\[
  (I - T)^{-1} = \sum_{i=0}^\infty T^i.
\]
First, we check the right hand side is absolutely convergent. This is since
\[
  \sum_{i=0}^\infty \|T^i\|_{B(X)} \leq \sum_{i=0}^\infty \|T\|^i_{B(X)} \leq \frac{1}{1 - \|T\|_{B(X)}} < \infty.
\]
Since $X$ is Banach, and hence $B(X)$ is Banach, the limit is well-defined. Now it is easy to check that
\begin{align*}
  (I - T)\sum_{i=0}^\infty T^i &= (I - T)(I + T + T^2 + \cdots)\\
  &= I + (T - T) + (T^2 - T^2) + \cdots\\
  &= I.
\end{align*}
Similarly, we have
\[
  \left(\sum_{i=0}^\infty T^i\right)(I - T) = I.
\]
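The Neumann series in this proof is easy to test numerically. Here is a small sketch (Python with NumPy, illustrative only): we scale a random matrix so that its operator norm is below $1$, and compare the truncated series $\sum_i T^i$ against a directly computed inverse of $I - T$.

```python
import numpy as np

# Sketch: Neumann series (I - T)^{-1} = sum_{i>=0} T^i for ||T|| < 1.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
T = 0.5 * A / np.linalg.norm(A, 2)  # scale so that ||T||_2 = 0.5 < 1

S = np.zeros_like(T)  # partial sums of the series
P = np.eye(5)         # current power T^i
for _ in range(60):
    S += P
    P = P @ T

direct = np.linalg.inv(np.eye(5) - T)
print("series vs direct inverse:", np.linalg.norm(S - direct, 2))
```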
Lemma. Let $X$ be a Banach space, and let $S_1 \in B(X)$ be invertible. Then for all $S_2 \in B(X)$ such that
\[
  \|S_1^{-1}\|_{B(X)}\|S_1 - S_2\|_{B(X)} < 1,
\]
$S_2$ is invertible.
This is some sort of an “openness” statement for the invertible bounded linear maps: if $S_1$ is invertible, then any “nearby” bounded linear map is also invertible.
Proof. We can write
\[
  S_2 = S_1(I - S_1^{-1}(S_1 - S_2)).
\]
Since
\[
  \|S_1^{-1}(S_1 - S_2)\|_{B(X)} \leq \|S_1^{-1}\|_{B(X)}\|S_1 - S_2\|_{B(X)} < 1
\]
by assumption, by the previous lemma, $(I - S_1^{-1}(S_1 - S_2))^{-1}$ exists. Therefore the inverse of $S_2$ is
\[
  S_2^{-1} = (I - S_1^{-1}(S_1 - S_2))^{-1}S_1^{-1}.
\]
We can now return to prove our original theorem.
Theorem. Let $X$ be a Banach space, $T \in B(X)$. Then $\sigma(T)$ is a non-empty, closed subset of $\{\lambda\in\mathbb{C} : |\lambda| \leq \|T\|_{B(X)}\}$.
Note that it is not hard to prove that it is closed and a subset of $\{\lambda\in\mathbb{C} : |\lambda| \leq \|T\|_{B(X)}\}$. The hard part is to prove it is non-empty.
Proof. We first prove the closedness of the spectrum. It suffices to prove that the resolvent set $\rho(T) = \mathbb{C}\setminus\sigma(T)$ is open, by the definition of closedness.
Let $\lambda \in \rho(T)$. By definition, $S_1 = T - \lambda I$ is invertible. For $\mu\in\mathbb{C}$, define $S_2 = T - \mu I$. Then
\[
  \|S_1 - S_2\|_{B(X)} = \|(T - \lambda I) - (T - \mu I)\|_{B(X)} = |\lambda - \mu|.
\]
Hence if $|\lambda - \mu|$ is sufficiently small, then $T - \mu I$ is invertible by the above lemma. Hence $\mu \in \rho(T)$. So $\rho(T)$ is open.
To show $\sigma(T) \subseteq \{\lambda\in\mathbb{C} : |\lambda| \leq \|T\|_{B(X)}\}$ is equivalent to showing
\[
  \{\lambda\in\mathbb{C} : |\lambda| > \|T\|_{B(X)}\} \subseteq \mathbb{C}\setminus\sigma(T) = \rho(T).
\]
Suppose $|\lambda| > \|T\|_{B(X)}$. Then $I - \lambda^{-1}T$ is invertible since
\[
  \|\lambda^{-1}T\|_{B(X)} = |\lambda|^{-1}\|T\|_{B(X)} < 1.
\]
Therefore $(I - \lambda^{-1}T)^{-1}$ exists, and hence
\[
  (\lambda I - T)^{-1} = \lambda^{-1}(I - \lambda^{-1}T)^{-1}
\]
is well-defined. Therefore $\lambda I - T$, and hence $T - \lambda I$, is invertible. So $\lambda \in \rho(T)$.
Finally, we need to show it is non-empty. How did we prove this in the case of finite-dimensional vector spaces? In that case, it ultimately boiled down to the fundamental theorem of algebra. And how did we prove the fundamental theorem of algebra? We said that if $p(x)$ is a polynomial with no roots, then $\frac{1}{p(x)}$ is bounded and entire, hence constant.
We are going to do the same proof. We look at $(T - \lambda I)^{-1}$ as a function of $\lambda$. If $\sigma(T) = \emptyset$, then this is an everywhere well-defined function. We show that it is entire and bounded, and hence by “Liouville's theorem”, it must be constant, which is impossible (in the finite-dimensional case, we would have inserted a $\det$ there).
So suppose $\sigma(T) = \emptyset$, and consider the function $R: \mathbb{C}\to B(X)$ given by
\[
  R(\lambda) = (T - \lambda I)^{-1}.
\]
We first show this is entire. This, by definition, means $R$ is given by a power series near any point $\lambda_0 \in \mathbb{C}$. Fix such a point. Then, as before, we can expand
\begin{align*}
  T - \lambda I &= (T - \lambda_0 I)\left[I - (T - \lambda_0 I)^{-1}\big((T - \lambda_0 I) - (T - \lambda I)\big)\right]\\
  &= (T - \lambda_0 I)\left[I - (\lambda - \lambda_0)(T - \lambda_0 I)^{-1}\right].
\end{align*}
Then for $|\lambda - \lambda_0|$ small, we have
\begin{align*}
  (T - \lambda I)^{-1} &= \left(\sum_{i=0}^\infty (\lambda - \lambda_0)^i(T - \lambda_0 I)^{-i}\right)(T - \lambda_0 I)^{-1}\\
  &= \sum_{i=0}^\infty (\lambda - \lambda_0)^i(T - \lambda_0 I)^{-i-1}.
\end{align*}
So this is indeed given by an absolutely convergent power series near $\lambda_0$.
Next, we show $R$ is bounded, i.e.
\[
  \sup_{\lambda\in\mathbb{C}}\|R(\lambda)\|_{B(X)} < \infty.
\]
It suffices to prove this for $|\lambda|$ large. For $|\lambda| > \|T\|_{B(X)}$, we have
\[
  (T - \lambda I)^{-1} = \lambda^{-1}(\lambda^{-1}T - I)^{-1} = -\lambda^{-1}\sum_{i=0}^\infty \lambda^{-i}T^i.
\]
Hence we get
\[
  \|(\lambda I - T)^{-1}\|_{B(X)} \leq |\lambda|^{-1}\sum_{i=0}^\infty |\lambda|^{-i}\|T^i\|_{B(X)} \leq |\lambda|^{-1}\sum_{i=0}^\infty\left(|\lambda|^{-1}\|T\|_{B(X)}\right)^i = \frac{1}{|\lambda| - \|T\|_{B(X)}},
\]
which tends to $0$ as $|\lambda| \to \infty$. So it is bounded.
By “Liouville's theorem”, $R(\lambda)$ is constant, which is clearly a contradiction, since $R(\lambda) \neq R(\mu)$ for $\lambda \neq \mu$.
Of course, to do this properly, we need a version of Liouville’s theorem for
Banach-space valued functions as opposed to complex-valued functions. So let’s
prove this.
Proposition (Liouville's theorem for Banach space-valued analytic functions). Let $X$ be a Banach space, and $F: \mathbb{C}\to X$ be entire (in the sense that $F$ is given by an absolutely convergent power series in some neighbourhood of any point) and norm bounded, i.e.
\[
  \sup_{z\in\mathbb{C}}\|F(z)\|_X < \infty.
\]
Then $F$ is constant.
This is a generalization of Liouville's theorem to the case where the target of the map is a Banach space. To prove this, we reduce it to the case of complex-valued functions. To do so, we compose $F$ with a functional $X \to \mathbb{C}$.
Proof. Let $f \in X^*$. Then we show $f\circ F: \mathbb{C}\to\mathbb{C}$ is bounded and entire. To see it is bounded, just note that $f$ is a bounded linear map. So
\[
  \sup_{z\in\mathbb{C}}|f\circ F(z)| \leq \sup_{z\in\mathbb{C}}\|f\|_{X^*}\|F(z)\|_X < \infty.
\]
Analyticity can be shown in a similar fashion, exploiting the fact that $f$ is linear.
Hence Liouville's theorem implies $f\circ F$ is constant, i.e. $(f\circ F)(z) = (f\circ F)(0)$. In particular, this implies $f(F(z) - F(0)) = 0$. Moreover, this is true for all $f \in X^*$. Hence by (a corollary of) the Hahn–Banach theorem, we know
\[
  F(z) - F(0) = 0
\]
for all $z\in\mathbb{C}$. Therefore $F$ is constant.
We have thus completed our proof that $\sigma(T)$ is non-empty, closed and a subset of $\{\lambda\in\mathbb{C} : |\lambda| \leq \|T\|_{B(X)}\}$.
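For matrices, this containment is easy to observe numerically. A quick sketch (Python with NumPy, illustrative only): all eigenvalues of a random complex matrix lie in the closed disc of radius $\|T\|$, taking the operator $2$-norm, i.e. the largest singular value.

```python
import numpy as np

# Sketch: every spectral point satisfies |lambda| <= ||T||.
rng = np.random.default_rng(2)
T = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))

eigenvalues = np.linalg.eigvals(T)
op_norm = np.linalg.norm(T, 2)  # largest singular value
print("max |lambda| =", np.abs(eigenvalues).max(), "  ||T|| =", op_norm)
assert np.abs(eigenvalues).max() <= op_norm + 1e-12
```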
However, we want to know more. Apart from the spectrum itself, we also had the point spectrum $\sigma_p(T)$ and the approximate point spectrum $\sigma_{ap}(T)$, and we had the inclusions
\[
  \sigma_p(T) \subseteq \sigma_{ap}(T) \subseteq \sigma(T).
\]
We know that the largest set $\sigma(T)$ is non-empty, but we want the smaller ones to be non-empty as well. We have the following theorem:
Theorem. We have
\[
  \partial\sigma(T) \subseteq \sigma_{ap}(T),
\]
where $\partial\sigma(T)$ is the boundary of $\sigma(T)$ in the topology of $\mathbb{C}$. In particular, $\sigma_{ap}(T) \neq \emptyset$.
On the other hand, it is possible for $\sigma_p(T)$ to be empty (in infinite-dimensional cases).
Proof. Let $\lambda \in \partial\sigma(T)$. Pick a sequence $\{\lambda_n\}_{n=1}^\infty \subseteq \rho(T) = \mathbb{C}\setminus\sigma(T)$ such that $\lambda_n \to \lambda$. We claim that $R(\lambda_n) = (T - \lambda_n I)^{-1}$ satisfies $\|R(\lambda_n)\|_{B(X)} \to \infty$.
If this were the case, then we can pick $y_n \in X$ such that $\|y_n\| \to 0$ and $\|R(\lambda_n)(y_n)\| = 1$. Setting $x_n = R(\lambda_n)(y_n)$, we have
\begin{align*}
  \|(T - \lambda I)x_n\| &\leq \|(T - \lambda_n I)x_n\|_X + \|(\lambda - \lambda_n)x_n\|_X\\
  &= \|(T - \lambda_n I)(T - \lambda_n I)^{-1}y_n\|_X + \|(\lambda - \lambda_n)x_n\|\\
  &= \|y_n\|_X + |\lambda - \lambda_n|\\
  &\to 0.
\end{align*}
So $\lambda \in \sigma_{ap}(T)$.
Thus, it remains to prove that $\|R(\lambda_n)\|_{B(X)} \to \infty$. Recall from last time that if $S_1$ is invertible, and
\[
  \|S_1^{-1}\|_{B(X)}\|S_1 - S_2\|_{B(X)} < 1, \tag{$*$}
\]
then $S_2$ is invertible. Taking $S_1 = T - \lambda_n I$ and $S_2 = T - \mu I$ for $\mu \in \sigma(T)$ (so that $S_2$ is not invertible), $(*)$ must fail, and we get
\[
  \|R(\lambda_n)\|_{B(X)}|\mu - \lambda_n| = \|R(\lambda_n)\|_{B(X)}\|(T - \lambda_n I) - (T - \mu I)\|_{B(X)} \geq 1.
\]
Thus, it follows that
\[
  \|R(\lambda_n)\|_{B(X)} \geq \frac{1}{\inf\{|\mu - \lambda_n| : \mu\in\sigma(T)\}}.
\]
Since $\lambda_n \to \lambda$ and $\lambda \in \sigma(T)$ (the spectrum is closed, so it contains its boundary points), the infimum tends to $0$. So we are done.
Having proven so many theorems, we now look at a specific example.
Example. Consider the shift operator $S: \ell^\infty \to \ell^\infty$ defined by
\[
  (a_1, a_2, a_3, \cdots) \mapsto (0, a_1, a_2, \cdots).
\]
Then $S$ is a bounded linear operator with norm $\|S\|_{B(\ell^\infty)} = 1$. The theorem then tells us $\sigma(S)$ is a non-empty closed subset of $\{\lambda\in\mathbb{C} : |\lambda| \leq 1\}$.
First, we want to understand what the point spectrum is. In fact, it is empty. To show this, suppose
\[
  S(a_1, a_2, a_3, \cdots) = \lambda(a_1, a_2, a_3, \cdots)
\]
for some $\lambda\in\mathbb{C}$. In other words,
\[
  (0, a_1, a_2, \cdots) = \lambda(a_1, a_2, a_3, \cdots).
\]
First consider the possibility that $\lambda = 0$. Then the right hand side is zero, so both sides vanish, and in particular $a_i = 0$ for all $i$.
If $\lambda \neq 0$, then for the first coordinates to match, we must have $a_1 = 0$. Then for the second coordinates to match, we also need $a_2 = 0$. By induction, we need all $a_i = 0$. So $\ker(S - \lambda I) = \{0\}$ for all $\lambda\in\mathbb{C}$.
To find the spectrum, we will in fact show that
\[
  \sigma(S) = \bar{D} = \{\lambda\in\mathbb{C} : |\lambda| \leq 1\}.
\]
To prove this, we need to show that for any $\lambda \in \bar{D}$, $S - \lambda I$ is not surjective. The $\lambda = 0$ case is obvious. For the other cases, we first have a look at what the image of $S - \lambda I$ looks like.
We take
\[
  (b_1, b_2, b_3, \cdots) \in \ell^\infty.
\]
Suppose for some $\lambda \in \bar{D}\setminus\{0\}$, there exists $(a_1, a_2, \cdots)$ such that
\[
  (S - \lambda I)(a_1, a_2, \cdots) = (b_1, b_2, \cdots).
\]
In other words, we have
\[
  (0, a_1, a_2, \cdots) - (\lambda a_1, \lambda a_2, \lambda a_3, \cdots) = (b_1, b_2, b_3, \cdots).
\]
So $-\lambda a_1 = b_1$. Hence we have
\[
  a_1 = -\lambda^{-1}b_1.
\]
The next coordinate then gives
\[
  a_1 - \lambda a_2 = b_2.
\]
Hence
\[
  a_2 = \lambda^{-1}(a_1 - b_2) = -\lambda^{-1}(b_2 + \lambda^{-1}b_1).
\]
Inductively, we can show that
\[
  a_n = -\lambda^{-1}(b_n + \lambda^{-1}b_{n-1} + \lambda^{-2}b_{n-2} + \cdots + \lambda^{-n+1}b_1).
\]
Now if $|\lambda| \leq 1$, we pick $b_n$ with $|b_n| = 1$ such that for each $n$, the terms $\lambda^{-i}b_{n-i}$ all have the same argument and modulus $|\lambda|^{-i}$ (e.g. $b_m = (|\lambda|/\lambda)^m$). Then we must have
\[
  |a_n| = |\lambda|^{-1}\sum_{i=0}^{n-1}|\lambda|^{-i} \geq n \to \infty.
\]
Such a sequence $(a_n) \not\in \ell^\infty$. So $(b_n) \not\in \operatorname{im}(S - \lambda I)$. Therefore for $|\lambda| \leq 1$, $S - \lambda I$ is not surjective.
Hence we have $\sigma(S) \supseteq \bar{D}$. By the theorem, we also know $\sigma(S) \subseteq \bar{D}$. So in fact $\sigma(S) = \bar{D}$.
Finally, we show that
\[
  \sigma_{ap}(S) = \partial\bar{D} = \{\lambda\in\mathbb{C} : |\lambda| = 1\}.
\]
Our theorem tells us that $\partial\bar{D} \subseteq \sigma_{ap}(S) \subseteq \bar{D}$. To show that indeed $\partial\bar{D} = \sigma_{ap}(S)$, note that if $|\lambda| < 1$, then for all $x \in \ell^\infty$,
\[
  \|(S - \lambda I)x\|_{\ell^\infty} \geq \|Sx\|_{\ell^\infty} - |\lambda|\|x\|_{\ell^\infty} = \|x\|_{\ell^\infty} - |\lambda|\|x\|_{\ell^\infty} = (1 - |\lambda|)\|x\|_{\ell^\infty},
\]
using that $S$ is an isometry. So if $|\lambda| < 1$, then there exists no sequence $x_n$ with $\|x_n\|_{\ell^\infty} = 1$ and $\|(S - \lambda I)x_n\|_{\ell^\infty} \to 0$. So $\lambda$ is not in the approximate point spectrum.
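The lower bound above can be checked numerically. A small sketch (Python with NumPy, illustrative only): we realize the shift on finitely supported sequences, so that no coordinate is dropped, and verify $\|(S - \lambda I)x\|_{\ell^\infty} \geq (1 - |\lambda|)\|x\|_{\ell^\infty}$ on random vectors.

```python
import numpy as np

# Sketch: for the shift S and |lambda| < 1, check
# ||(S - lambda I)x||_inf >= (1 - |lambda|) ||x||_inf
# on finitely supported sequences (represented as length-n vectors).
rng = np.random.default_rng(3)
lam = 0.7

def shift_minus_lambda(x, lam):
    # (S - lambda I)x, returned as a sequence of length n + 1
    return np.concatenate(([0.0], x)) - lam * np.concatenate((x, [0.0]))

for _ in range(1000):
    x = rng.standard_normal(50)
    lhs = np.linalg.norm(shift_minus_lambda(x, lam), np.inf)
    assert lhs >= (1 - abs(lam)) * np.linalg.norm(x, np.inf) - 1e-12
print("lower bound verified on random finitely supported sequences")
```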
We see that these results are rather unpleasant, and the spectrum behaves rather unlike the finite-dimensional case. We saw last time that the spectrum $\sigma(T)$ can in general be complicated. In fact, any non-empty compact subset of $\mathbb{C}$ can be realized as the spectrum of some operator $T$ on some Hilbert space $H$. This is an exercise on the example sheet.
We are now going to introduce a class of “nice” operators whose spectrum behaves in a way more similar to the finite-dimensional case. In particular, most of $\sigma(T)$ consists of $\sigma_p(T)$ and is discrete. This class includes, at least, all finite rank operators (i.e. operators $T$ such that $\dim(\operatorname{im} T) < \infty$) and their limits.
Definition (Compact operator). Let $X, Y$ be Banach spaces. We say $T \in L(X, Y)$ is compact if for every bounded subset $E$ of $X$, $T(E)$ is totally bounded.
We write $B_0(X)$ for the set of all compact operators $T \in B(X)$.
Note that in the definition of compactness, we only required $T$ to be a linear map, not a bounded linear map. However, boundedness comes from the definition for free, because a totally bounded set is bounded.
There is a nice alternative characterization of compact operators:
Proposition. Let $X, Y$ be Banach spaces. Then $T \in L(X, Y)$ is compact if and only if $T(B(1))$ is totally bounded, if and only if $\overline{T(B(1))}$ is compact.
The first equivalence is obvious, since $B(1)$ is a bounded set, and given any bounded set $E$, we can rescale it to be contained in $B(1)$. The second equivalence comes from the fact that a space is compact if and only if it is totally bounded and complete.
The last characterization is what we will use most of the time, and this is
where the name “compact operator” came from.
Proposition. Let $X$ be a Banach space. Then $B_0(X)$ is a closed subspace of $B(X)$. Moreover, if $T \in B_0(X)$ and $S \in B(X)$, then $TS, ST \in B_0(X)$.
In a more algebraic language, this means $B_0(X)$ is a closed ideal of the algebra $B(X)$.
Proof. There are three things to prove. First, it is obvious that $B_0(X)$ is a subspace. To check it is closed, suppose $\{T_n\}_{n=1}^\infty \subseteq B_0(X)$ and $\|T_n - T\|_{B(X)} \to 0$. We need to show $T \in B_0(X)$, i.e. $T(B(1))$ is totally bounded.
Let $\varepsilon > 0$. Then there exists $N$ such that
\[
  \|T - T_n\|_{B(X)} < \varepsilon
\]
whenever $n \geq N$. Take such an $n$. Then $T_n(B(1))$ is totally bounded. So there exist $x_1, \cdots, x_k \in B(1)$ such that $\{T_n x_i\}_{i=1}^k$ is an $\varepsilon$-net for $T_n(B(1))$. We now claim that $\{Tx_i\}_{i=1}^k$ is a $3\varepsilon$-net for $T(B(1))$.
This is easy to show. Let $x \in X$ be such that $\|x\| \leq 1$. Then by the triangle inequality,
\begin{align*}
  \|Tx - Tx_i\|_X &\leq \|Tx - T_n x\| + \|T_n x - T_n x_i\| + \|T_n x_i - Tx_i\|\\
  &\leq \varepsilon + \|T_n x - T_n x_i\|_X + \varepsilon\\
  &= 2\varepsilon + \|T_n x - T_n x_i\|_X.
\end{align*}
Now since $\{T_n x_i\}$ is an $\varepsilon$-net for $T_n(B(1))$, there is some $i$ such that $\|T_n x - T_n x_i\| < \varepsilon$. So this gives
\[
  \|Tx - Tx_i\|_X \leq 3\varepsilon.
\]
Finally, let $T \in B_0(X)$ and $S \in B(X)$. Let $\{x_n\} \subseteq X$ be such that $\|x_n\|_X \leq 1$. Since $T$ is compact, i.e. $\overline{T(B(1))}$ is compact, there exists a convergent subsequence of $\{Tx_n\}$.
Since $S$ is bounded, it maps a convergent sequence to a convergent sequence. So $\{STx_n\}$ also has a convergent subsequence. So $\overline{ST(B(1))}$ is compact, and $ST$ is compact.
We also have to show that $TS(B(1))$ is totally bounded. Since $S$ is bounded, $S(B(1))$ is bounded. Since $T$ sends a bounded set to a totally bounded set, it follows that $TS(B(1))$ is totally bounded. So $TS$ is compact.
At this point, it helps to look at some examples of actual compact operators.
Example.
(i) Let $X, Y$ be Banach spaces. Then any finite rank operator in $B(X, Y)$ is compact. This is since $T(B(1))$ is a bounded subset of a finite-dimensional space (since $T$ is bounded), and any bounded subset of a finite-dimensional space is totally bounded.
In particular, any $f \in X^*$ is compact. Moreover, by the previous proposition, limits of finite-rank operators are also compact (since $B_0(X)$ is closed).
(ii) Let $X$ be a Banach space. Then $I: X \to X$ is compact if and only if $X$ is finite-dimensional. This is since $\overline{B(1)}$ is compact if and only if the space is finite-dimensional.
(iii) Let $K: \mathbb{R}\to\mathbb{R}$ be a smooth function. Define $T: C([0, 1])\to C([0, 1])$ by the convolution
\[
  (Tf)(x) = \int_0^1 K(x - y)f(y)\;\mathrm{d}y.
\]
We first show $T$ is bounded. This is since
\[
  \sup_x |Tf(x)| \leq \sup_{z\in[-1,1]}|K(z)|\sup_y|f(y)|.
\]
Since $K$ is smooth, it is bounded on $[-1, 1]$. So $Tf$ is bounded. In fact, $T$ is compact. To see this, notice
\[
  \sup_x\left|\frac{\mathrm{d}(Tf)}{\mathrm{d}x}(x)\right| \leq \sup_{z\in[-1,1]}|K'(z)|\sup_y|f(y)|.
\]
Therefore if we have a sequence $\{f_n\} \subseteq C([0, 1])$ with $\|f_n\|_{C([0,1])} \leq 1$, then $\{Tf_n\}_{n=1}^\infty$ is uniformly bounded with uniformly bounded derivative, and hence equicontinuous. By the Arzelà–Ascoli theorem, $\{Tf_n\}_{n=1}^\infty$ has a convergent subsequence. Therefore $\overline{T(B(1))}$ is compact.
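One can see this compactness numerically as well. A sketch (Python with NumPy, illustrative only): discretizing the convolution operator on a uniform grid gives a matrix whose singular values decay very rapidly for smooth $K$, so the operator is well approximated by finite-rank ones.

```python
import numpy as np

# Sketch: Riemann-sum discretization of (Tf)(x) = int_0^1 K(x - y) f(y) dy.
# Fast singular value decay reflects compactness (near finite rank).
n = 200
grid = np.linspace(0, 1, n)
K = lambda z: np.exp(-z**2)                 # a smooth kernel, as an example
T_h = K(grid[:, None] - grid[None, :]) / n  # (T_h)_{ij} = K(x_i - y_j) / n

sv = np.linalg.svd(T_h, compute_uv=False)
print("largest singular values:", sv[:5])
print("sv[10]/sv[0] =", sv[10] / sv[0])     # tiny: nearly finite rank
```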
There is much more we can say about the last example. For example, requiring $K$ to be smooth is clearly overkill; requiring, say, differentiability with bounded derivative is already sufficient. Alternatively, we can ask what we get if we work on, say, $L^2$ instead of $C([0, 1])$.
However, we don’t have enough time for that, and instead we should return
to developing some general theory. An important result is the following theorem,
characterizing the point spectrum and spectrum of a compact operator.
Theorem. Let $X$ be an infinite-dimensional Banach space, and $T \in B(X)$ a compact operator. Then $\sigma_p(T) = \{\lambda_i\}$ is at most countable. If $\sigma_p(T)$ is infinite, then $\lambda_i \to 0$.
The spectrum is given by $\sigma(T) = \sigma_p(T)\cup\{0\}$. Moreover, for every non-zero $\lambda_i \in \sigma_p(T)$, the eigenspace is finite-dimensional.
Note that it is still possible for a compact operator to have empty point spectrum. In that case, the spectrum is just $\{0\}$. An example of this is found on the example sheet.
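Before the proof, here is a quick numerical illustration of the eigenvalue statement (Python with NumPy, not from the lectures): the discretization of the convolution operator above, with its symmetric smooth kernel, is a symmetric matrix, and its eigenvalues visibly accumulate only at $0$.

```python
import numpy as np

# Sketch: eigenvalues of a discretized compact (convolution-type) operator
# accumulate at 0.  The symmetric kernel makes T_h a symmetric matrix.
n = 300
grid = np.linspace(0, 1, n)
T_h = np.exp(-(grid[:, None] - grid[None, :])**2) / n

lam = np.sort(np.abs(np.linalg.eigvalsh(T_h)))[::-1]
print("largest |eigenvalues|:", lam[:5])
print("20th largest:", lam[20])  # already negligible: lambda_i -> 0
```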
We will only prove this in the case where $X = H$ is a Hilbert space. In a lot of the proofs, we will have a closed subspace $V \subseteq X$, and we often want to pick an element of $X\setminus V$ that is “far away” from $V$ in some sense. If we have a Hilbert space, then we can pick this element from the orthogonal complement of $V$. If we are not in a Hilbert space, then we need to invoke Riesz's lemma, which we shall not go into.
We will break the full proof of the theorem into many pieces.
Proposition. Let $H$ be a Hilbert space, and $T \in B_0(H)$ a compact operator. Let $a > 0$. Then there are only finitely many linearly independent eigenvectors whose eigenvalues have magnitude $\geq a$.
This already gives most of the theorem, since it mandates that $\sigma_p(T)$ is at most countable, and that if $\sigma_p(T)$ is infinite, we must have $\lambda_i \to 0$. Since for each $a > 0$ there are only finitely many linearly independent eigenvectors with eigenvalue of magnitude $\geq a$, the eigenspaces of non-zero eigenvalues are also finite-dimensional.
Proof. Suppose not. Then there are infinitely many linearly independent $x_1, x_2, x_3, \cdots$ such that $Tx_i = \lambda_i x_i$ with $|\lambda_i| \geq a$.
Define $X_n = \operatorname{span}\{x_1, \cdots, x_n\}$. Since the $x_i$'s are linearly independent, there exists $y_n \in X_n \cap X_{n-1}^\perp$ with $\|y_n\|_H = 1$.
Now let
\[
  z_n = \frac{y_n}{\lambda_n}.
\]
Note that
\[
  \|z_n\|_H \leq \frac{1}{a}.
\]
Since $X_n$ is spanned by the eigenvectors, we know that $T$ maps $X_n$ into itself. So we have $Tz_n \in X_n$.
Moreover, we claim that $Tz_n - y_n \in X_{n-1}$. We can check this directly. Let
\[
  y_n = \sum_{k=1}^{n} c_k x_k.
\]
Then we have
\begin{align*}
  Tz_n - y_n &= \frac{1}{\lambda_n}T\left(\sum_{k=1}^{n} c_k x_k\right) - \sum_{k=1}^{n} c_k x_k\\
  &= \sum_{k=1}^{n} c_k\left(\frac{\lambda_k}{\lambda_n} - 1\right)x_k\\
  &= \sum_{k=1}^{n-1} c_k\left(\frac{\lambda_k}{\lambda_n} - 1\right)x_k \in X_{n-1},
\end{align*}
since the $k = n$ term vanishes.
We next claim that $\|Tz_n - Tz_m\|_H \geq 1$ whenever $n > m$. If this holds, then $T$ is not compact, since $\{z_n\}$ is bounded but $\{Tz_n\}$ has no convergent subsequence.
To show this, wlog assume $n > m$. We have
\[
  \|Tz_n - Tz_m\|^2_H = \|(Tz_n - y_n) - (Tz_m - y_n)\|^2_H.
\]
Note that $Tz_n - y_n \in X_{n-1}$, and since $m < n$, we also have $Tz_m \in X_m \subseteq X_{n-1}$. By construction, $y_n \perp X_{n-1}$. So by the Pythagorean theorem, we have
\[
  \|Tz_n - Tz_m\|^2_H = \|Tz_n - y_n - Tz_m\|^2_H + \|y_n\|^2_H \geq \|y_n\|^2_H = 1.
\]
So done.
To prove the previous theorem, the only remaining thing to prove is that $\sigma(T) = \sigma_p(T)\cup\{0\}$. In order to prove this, we need a lemma, which might seem a bit unmotivated at first, but will soon prove itself useful.
Lemma. Let $H$ be a Hilbert space, and $T \in B(H)$ compact. Then $\operatorname{im}(I - T)$ is closed.
Proof. We let $S$ be the orthogonal complement of $\ker(I - T)$, which is a closed subspace, hence a Hilbert space. We shall consider the restriction $(I - T)|_S$, which has the same image as $I - T$.
To show that $\operatorname{im}(I - T)$ is closed, it suffices to show that $(I - T)|_S$ is bounded below, i.e. there is some $C > 0$ such that
\[
  \|x\|_H \leq C\|(I - T)x\|_H
\]
for all $x \in S$. If this were the case, then if $(I - T)x_n \to y$ in $H$, then
\[
  \|x_n - x_m\| \leq C\|(I - T)(x_n - x_m)\| \to 0,
\]
and so $\{x_n\}$ is a Cauchy sequence. Write $x_n \to x$. Then by continuity, $(I - T)x = y$, and so $y \in \operatorname{im}(I - T)$.
Thus, suppose $(I - T)|_S$ is not bounded below. Pick $x_n \in S$ such that $\|x_n\|_H = 1$ but $(I - T)x_n \to 0$. Since $T$ is compact, we know $Tx_n$ has a convergent subsequence. We may wlog assume $Tx_n \to y$. Then since $\|Tx_n - x_n\|_H \to 0$, it follows that we also have $x_n \to y$. In particular, $\|y\| = 1 \neq 0$, and $y \in S$, since $S$ is closed.
But $x_n \to y$ also implies $Tx_n \to Ty$. So this means we must have $Ty = y$. But this is a contradiction, since no non-zero element of $S$ can lie in $\ker(I - T)$.
Proposition. Let $H$ be a Hilbert space, $T \in B(H)$ compact. If $\lambda \neq 0$ and $\lambda \in \sigma(T)$, then $\lambda \in \sigma_p(T)$.
Proof. We will prove that if $\lambda \neq 0$ and $\lambda \not\in \sigma_p(T)$, then $\lambda \not\in \sigma(T)$. In other words, let $\lambda \neq 0$ and $\ker(T - \lambda I) = \{0\}$. We will show that $T - \lambda I$ is surjective, i.e. $\operatorname{im}(T - \lambda I) = H$.
Suppose this is not the case. Denote $H_0 = H$ and $H_1 = \operatorname{im}(T - \lambda I)$. Since $T - \lambda I = -\lambda(I - \lambda^{-1}T)$ and $\lambda^{-1}T$ is compact, the previous lemma tells us $H_1$ is closed, and it is hence a Hilbert space. Moreover, $H_1 \subsetneq H_0$ by assumption.
We now define the sequence $\{H_n\}$ recursively by
\[
  H_n = (T - \lambda I)H_{n-1}.
\]
We claim that $H_n \subsetneq H_{n-1}$. This must be the case, because the map $(T - \lambda I)^{n-1}: H_0 \to H_{n-1}$ is an isomorphism (it is injective and surjective). So the inclusion $H_n \subseteq H_{n-1}$ is isomorphic to the inclusion $H_1 \subseteq H_0$, which is strict.
Thus we have a strictly decreasing sequence
\[
  H_0 \supsetneq H_1 \supsetneq H_2 \supsetneq \cdots.
\]
Let $y_n$ be such that $y_n \in H_n$, $y_n \perp H_{n+1}$ and $\|y_n\|_H = 1$. We now claim
\[
  \|Ty_n - Ty_m\| \geq |\lambda|
\]
if $n \neq m$. This then contradicts the compactness of $T$. To show this, again wlog we can assume that $n > m$. Then we have
\begin{align*}
  \|Ty_n - Ty_m\|^2_H &= \|(Ty_n - \lambda y_n) - (Ty_m - \lambda y_m) - \lambda y_m + \lambda y_n\|^2\\
  &= \|(T - \lambda I)y_n - (T - \lambda I)y_m - \lambda y_m + \lambda y_n\|^2_H.
\end{align*}
Now note that $(T - \lambda I)y_n \in H_{n+1} \subseteq H_{m+1}$, while $(T - \lambda I)y_m$ and $\lambda y_n$ are both in $H_{m+1}$. So $\lambda y_m$ is perpendicular to all of them, and the Pythagorean theorem tells us
\begin{align*}
  \|Ty_n - Ty_m\|^2_H &= |\lambda|^2\|y_m\|^2 + \|(T - \lambda I)y_n - (T - \lambda I)y_m + \lambda y_n\|^2\\
  &\geq |\lambda|^2\|y_m\|^2 = |\lambda|^2.
\end{align*}
This contradicts the compactness of $T$. Therefore $\operatorname{im}(T - \lambda I) = H$.
Finally, we can prove the initial theorem.
Theorem. Let $H$ be an infinite-dimensional Hilbert space, and $T \in B(H)$ a compact operator. Then $\sigma_p(T) = \{\lambda_i\}$ is at most countable. If $\sigma_p(T)$ is infinite, then $\lambda_i \to 0$.
The spectrum is given by $\sigma(T) = \sigma_p(T)\cup\{0\}$. Moreover, for every non-zero $\lambda_i \in \sigma_p(T)$, the eigenspace is finite-dimensional.
Proof. As mentioned, it remains to show that $\sigma(T) = \sigma_p(T)\cup\{0\}$. The previous proposition tells us $\sigma(T)\setminus\{0\} \subseteq \sigma_p(T)$. So it only remains to show that $0 \in \sigma(T)$.
There are two possible cases. The first is if $\{\lambda_i\}$ is infinite. We have already shown that $\lambda_i \to 0$. So $0 \in \sigma(T)$ by the closedness of the spectrum.
Otherwise, if $\{\lambda_i\}$ is finite, let $E_{\lambda_1}, \cdots, E_{\lambda_n}$ be the eigenspaces. Define
\[
  H_0 = \operatorname{span}\{E_{\lambda_1}, \cdots, E_{\lambda_n}\}^\perp.
\]
This is non-trivial, since each $E_{\lambda_i}$ is finite-dimensional, but $H$ is infinite-dimensional. Then $T$ restricts to $T|_{H_0}: H_0 \to H_0$.
Now $T|_{H_0}$ has no non-zero eigenvalues. By the previous discussion, we know $\sigma(T|_{H_0}) \subseteq \{0\}$. By non-emptiness of $\sigma(T|_{H_0})$, we know $0 \in \sigma(T|_{H_0}) \subseteq \sigma(T)$. So done.
4.6 Self-adjoint operators
We have just looked at compact operators. This time, we are going to add a condition of self-adjointness.
Definition (Self-adjoint operator). Let $H$ be a Hilbert space, $T \in B(H)$. Then $T$ is self-adjoint or Hermitian if for all $x, y \in H$, we have
\[
  \langle Tx, y\rangle = \langle x, Ty\rangle.
\]
It is important to note that we defined the term for bounded linear operators $T$. If we have unbounded operators instead, Hermitian means something different from self-adjoint, and we have to be careful.
Recall that we defined the adjoint of a linear map to be a map of the dual spaces. However, we will often abuse notation and call $T^*: H \to H$ the adjoint, which is the (unique) operator such that for all $x, y \in H$,
\[
  \langle Tx, y\rangle = \langle x, T^*y\rangle.
\]
It is an exercise to show that this is well-defined.
How is this related to the usual adjoint? Let $\tilde{T}: H^* \to H^*$ be the usual adjoint. Then we have
\[
  T^* = \phi^{-1}\circ\tilde{T}\circ\phi,
\]
where $\phi: H \to H^*$ is defined by
\[
  \phi(v)(w) = \langle w, v\rangle
\]
as in the Riesz representation theorem.
The main result regarding self-adjoint operators is the spectral theorem:
Theorem (Spectral theorem). Let $H$ be an infinite-dimensional Hilbert space and $T: H \to H$ a compact self-adjoint operator.
(i) $\sigma_p(T) = \{\lambda_i\}_{i=1}^N$ is at most countable.
(ii) $\sigma_p(T) \subseteq \mathbb{R}$.
(iii) $\sigma(T) = \{0\}\cup\sigma_p(T)$.
(iv) If $E_{\lambda_i}$ are the eigenspaces, then $\dim E_{\lambda_i}$ is finite if $\lambda_i \neq 0$.
(v) $E_{\lambda_i} \perp E_{\lambda_j}$ if $\lambda_i \neq \lambda_j$.
(vi) If $\{\lambda_i\}$ is infinite, then $\lambda_i \to 0$.
(vii) $T = \sum_{i=1}^N \lambda_i P_{E_{\lambda_i}}$.
We have already shown (i), (iii), (iv) and (vi). Parts (ii) and (v) we already did in IA Vectors and Matrices, but for completeness, we will do the proofs again; they do not require compactness. The only non-trivial bit left is the last part, (vii).
We first do the two easy bits.
Proposition. Let $H$ be a Hilbert space and $T \in B(H)$ self-adjoint. Then $\sigma_p(T) \subseteq \mathbb{R}$.
Proof. Let $\lambda \in \sigma_p(T)$ and $v \in \ker(T - \lambda I)\setminus\{0\}$. Then by definition of $v$, we have
\[
  \lambda = \frac{\langle Tv, v\rangle}{\|v\|^2_H} = \frac{\langle v, Tv\rangle}{\|v\|^2_H} = \bar{\lambda}.
\]
So $\lambda \in \mathbb{R}$.
Proposition. Let $H$ be a Hilbert space and $T \in B(H)$ self-adjoint. If $\lambda, \mu \in \sigma_p(T)$ and $\lambda \neq \mu$, then $E_\lambda \perp E_\mu$.
Proof. Let $v \in \ker(T - \lambda I)\setminus\{0\}$ and $w \in \ker(T - \mu I)\setminus\{0\}$. Then
\[
  \lambda\langle v, w\rangle = \langle Tv, w\rangle = \langle v, Tw\rangle = \bar{\mu}\langle v, w\rangle = \mu\langle v, w\rangle,
\]
using the fact that eigenvalues are real. Since $\lambda \neq \mu$ by assumption, we must have $\langle v, w\rangle = 0$.
To prove the final part, we need the following proposition:
Proposition. Let $H$ be a Hilbert space and $T \in B(H)$ a compact self-adjoint operator. If $T \neq 0$, then $T$ has a non-zero eigenvalue.
This is consistent with our spectral theorem, since if $T$ is non-zero, then something in the sum $\sum \lambda_i P_{E_{\lambda_i}}$ has to be non-zero. It turns out this is most of the work we need.
However, to prove this, we need the following lemma:
Lemma. Let $H$ be a Hilbert space, and $T \in B(H)$ a compact self-adjoint operator. Then
\[
  \|T\|_{B(H)} = \sup_{\|x\|_H = 1}|\langle x, Tx\rangle|.
\]
Proof. Write
\[
  \lambda = \sup_{\|x\|_H = 1}|\langle x, Tx\rangle|.
\]
Note that one direction is easy, since for all $x$, Cauchy–Schwarz gives
\[
  |\langle x, Tx\rangle| \leq \|Tx\|_H\|x\|_H \leq \|T\|_{B(H)}\|x\|^2_H.
\]
So it suffices to show the inequality in the other direction. We now claim that
\[
  \|T\|_{B(H)} = \sup_{\|x\|_H = 1,\, \|y\|_H = 1}|\langle Tx, y\rangle|.
\]
To show this, recall that $\phi: H \to H^*$ defined by $v \mapsto \langle\,\cdot\,, v\rangle$ is an isometry. By definition, we have
\[
  \|T\|_{B(H)} = \sup_{\|x\|_H = 1}\|Tx\|_H = \sup_{\|x\|_H = 1}\|\phi(Tx)\|_{H^*} = \sup_{\|x\|_H = 1}\sup_{\|y\|_H = 1}|\langle y, Tx\rangle|.
\]
Hence, it suffices to show that
\[
  \sup_{\|x\|_H = 1,\, \|y\|_H = 1}|\langle Tx, y\rangle| \leq \lambda.
\]
Take $x, y \in H$ such that $\|x\|_H = \|y\|_H = 1$. We first perform a trick similar to the polarization identity. By multiplying $y$ by an appropriate scalar, we can wlog assume $\langle Tx, y\rangle$ is real. Then we have
\begin{align*}
  |\langle T(x + y), x + y\rangle - \langle T(x - y), x - y\rangle| &= 2|\langle Tx, y\rangle + \langle Ty, x\rangle|\\
  &= 4|\langle Tx, y\rangle|.
\end{align*}
Hence we have
\begin{align*}
  |\langle Tx, y\rangle| &= \frac{1}{4}|\langle T(x + y), x + y\rangle - \langle T(x - y), x - y\rangle|\\
  &\leq \frac{1}{4}\left(\lambda\|x + y\|^2_H + \lambda\|x - y\|^2_H\right)\\
  &= \frac{\lambda}{4}\left(2\|x\|^2_H + 2\|y\|^2_H\right)\\
  &= \lambda,
\end{align*}
where we used the parallelogram law. So we have $\|T\|_{B(H)} \leq \lambda$.
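For Hermitian matrices, this lemma is easy to check numerically. A sketch (Python with NumPy, illustrative only): both sides equal the largest $|\lambda_i|$, and the supremum is attained at a corresponding unit eigenvector.

```python
import numpy as np

# Sketch: for a Hermitian matrix T, ||T||_2 = sup_{||x||=1} |<x, Tx>|.
rng = np.random.default_rng(4)
A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
T = (A + A.conj().T) / 2  # self-adjoint

w, V = np.linalg.eigh(T)        # real eigenvalues, orthonormal eigenvectors
x = V[:, np.argmax(np.abs(w))]  # unit eigenvector for the largest |lambda|
print(np.linalg.norm(T, 2), "=", abs(x.conj() @ T @ x))  # the two agree
```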
Finally, we can prove our proposition.
Proposition. Let $H$ be a Hilbert space and $T \in B(H)$ a compact self-adjoint operator. If $T \neq 0$, then $T$ has a non-zero eigenvalue.
Proof. Since $T \neq 0$, we have $\|T\|_{B(H)} \neq 0$. Let $\|T\|_{B(H)} = \lambda$. We now claim that either $\lambda$ or $-\lambda$ is an eigenvalue of $T$.
By the previous lemma, there exists a sequence $\{x_n\}_{n=1}^\infty \subseteq H$ such that $\|x_n\|_H = 1$ and $\langle x_n, Tx_n\rangle \to \pm\lambda$.
We consider the two cases separately. Suppose $\langle x_n, Tx_n\rangle \to \lambda$, and consider $Tx_n - \lambda x_n$. Since $T$ is compact, there exists a subsequence such that $Tx_{n_k} \to y$ for some $y \in H$. For simplicity of notation, we assume $Tx_n \to y$ itself. We have
\begin{align*}
  0 \leq \|Tx_n - \lambda x_n\|^2_H &= \langle Tx_n - \lambda x_n, Tx_n - \lambda x_n\rangle\\
  &= \|Tx_n\|^2_H - 2\lambda\langle Tx_n, x_n\rangle + \lambda^2\|x_n\|^2\\
  &\leq \lambda^2 - 2\lambda\langle Tx_n, x_n\rangle + \lambda^2\\
  &\to \lambda^2 - 2\lambda^2 + \lambda^2 = 0
\end{align*}
as $n \to \infty$. Note that we implicitly used the fact that $\langle Tx_n, x_n\rangle = \langle x_n, Tx_n\rangle$, since $\langle Tx_n, x_n\rangle$ is real. So we must have
\[
  \|Tx_n - \lambda x_n\|^2_H \to 0.
\]
In other words, since $Tx_n \to y$, we get
\[
  x_n \to \frac{1}{\lambda}y.
\]
Note that $y \neq 0$, since $\|x_n\| = 1$ forces $\|y\| = \lambda \neq 0$. Finally, we show $y$ is an eigenvector. This is easy, since
\[
  Ty = \lim_{n\to\infty}T(\lambda x_n) = \lambda y,
\]
as $\lambda x_n \to y$.
The case where $\langle x_n, Tx_n\rangle \to -\lambda$ is entirely analogous, and in this case $-\lambda$ is an eigenvalue. The proof is exactly the same, apart from some switching of signs.
Finally, we can prove the last part of the spectral theorem.
Proposition. Let $H$ be an infinite-dimensional Hilbert space and $T: H \to H$ a compact self-adjoint operator. Then
\[
  T = \sum_{i=1}^N \lambda_i P_{E_{\lambda_i}}.
\]
Proof. Let
\[
  U = \operatorname{span}\{E_{\lambda_1}, E_{\lambda_2}, \cdots\}.
\]
Firstly, we clearly have
\[
  T|_U = \sum_{i=1}^N \lambda_i P_{E_{\lambda_i}},
\]
since any $x \in U$ can be written as the finite sum
\[
  x = \sum_{i=1}^{n} P_{E_{\lambda_i}}x.
\]
Less trivially, this is also true for $\bar{U}$, i.e.
\[
  T|_{\bar{U}} = \sum_{i=1}^N \lambda_i P_{E_{\lambda_i}},
\]
but this is also clear from the definition once we stare at it hard enough, using the continuity of both sides.
We also know that
\[
  H = \bar{U}\oplus\bar{U}^\perp.
\]
It thus suffices to show that $T|_{\bar{U}^\perp} = 0$. But since $T|_{\bar{U}^\perp}$ is a compact self-adjoint operator with no non-zero eigenvalues, this follows from our previous proposition. So done.
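To close, the decomposition is easy to verify in finite dimensions. A numerical sketch (Python with NumPy; a random Hermitian matrix stands in for a compact self-adjoint operator): reassembling $T$ from its eigenvalues and the projections onto the eigenspaces recovers $T$ exactly.

```python
import numpy as np

# Sketch of T = sum_i lambda_i P_{E_i} for a Hermitian matrix.
# Generically the eigenvalues are simple, so each projection is
# rank one: P_i = v_i v_i^*.
rng = np.random.default_rng(5)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
T = (A + A.conj().T) / 2

w, V = np.linalg.eigh(T)
T_rebuilt = sum(w[i] * np.outer(V[:, i], V[:, i].conj()) for i in range(5))
print("reconstruction error:", np.linalg.norm(T - T_rebuilt))
```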