Part II Linear Analysis
Based on lectures by J. W. Luk
Notes taken by Dexter Chua
Michaelmas 2015
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
Part IB Linear Algebra, Analysis II and Metric and Topological Spaces are essential
Normed and Banach spaces. Linear mappings, continuity, boundedness, and norms.
Finite-dimensional normed spaces. [4]
The Baire category theorem. The principle of uniform boundedness, the closed graph
theorem and the inversion theorem; other applications. [5]
The normality of compact Hausdorff spaces. Urysohn's lemma and Tietze's extension theorem. Spaces of continuous functions. The Stone–Weierstrass theorem and applications. Equicontinuity: the Ascoli–Arzelà theorem. [5]
Inner product spaces and Hilbert spaces; examples and elementary properties. Orthonormal systems, and the orthogonalization process. Bessel's inequality, the Parseval equation, and the Riesz–Fischer theorem. Duality; the self-duality of Hilbert space. [5]
Bounded linear operators, invariant subspaces, eigenvectors; the spectrum and resolvent set. Compact operators on Hilbert space; discreteness of spectrum. Spectral theorem for compact Hermitian operators. [5]
Contents
0 Introduction
1 Normed vector spaces
1.1 Bounded linear maps
1.2 Dual spaces
1.3 Adjoint
1.4 The double dual
1.5 Isomorphism
1.6 Finite-dimensional normed vector spaces
1.7 Hahn–Banach Theorem
2 Baire category theorem
2.1 The Baire category theorem
2.2 Some applications
3 The topology of C(K)
3.1 Normality of compact Hausdorff spaces
3.2 Tietze-Urysohn extension theorem
3.3 Arzelà–Ascoli theorem
3.4 Stone–Weierstrass theorem
4 Hilbert spaces
4.1 Inner product spaces
4.2 Riesz representation theorem
4.3 Orthonormal systems and basis
4.4 The isomorphism with $\ell^2$
4.5 Operators
4.6 Self-adjoint operators
0 Introduction
In IB Linear Algebra, we studied vector spaces in general. Most of the time, we
concentrated on finite-dimensional vector spaces, since these are easy to reason
about. For example, we know that every finite-dimensional vector space (by
definition) has a basis. Using the basis, we can represent vectors and linear maps
concretely as column vectors (in $\mathbb{F}^n$) and matrices.
However, in real life, often we have to work with infinite-dimensional vector
spaces instead. For example, we might want to consider the vector space of
all continuous (real) functions, or the vector space of infinite sequences. It is
difficult to analyse these spaces using the tools from IB Linear Algebra, since
many of those assume the vector space is finite-dimensional. Moreover, in these
cases, we often are not interested in the vector space structure itself. It’s just
that the objects we are interested in happen to have a vector space structure.
Instead, we want to look at notions like continuity and convergence. We want to
do analysis on vector spaces. These are not something the vector space structure
itself provides.
In this course, we are going to give our vector spaces some additional structure.
For the first half of the course, we will grant our vector space a norm. This
allows us to assign a “length” to each vector. With this, we can easily define
convergence and continuity. It turns out this allows us to understand a lot about,
say, function spaces and sequence spaces.
In the second half, we will grant a stronger notion, called the inner product.
Among many things, this allows us to define orthogonality of the elements of a
vector space, which is something we are familiar with from, say, IB Methods.
Most of the time, we will be focusing on infinite-dimensional vector spaces,
since finite-dimensional spaces are boring. In fact, we have a section dedicated
to proving that finite-dimensional vector spaces are boring. In particular, they
are all isomorphic to $\mathbb{R}^n$, and most of our theorems can be proved trivially for
finite-dimensional spaces using what we already know from IB Linear Algebra.
So we will not care much about them.
1 Normed vector spaces
In IB Linear Algebra, we have studied vector spaces in quite a lot of detail.
However, just knowing something is a vector space usually isn’t too helpful.
Often, we would want the vector space to have some additional structure. The
first structure we will study is a norm.
Definition (Normed vector space). A normed vector space is a pair $(V, \|\cdot\|)$, where $V$ is a vector space over a field $\mathbb{F}$ and $\|\cdot\|$ is a function $\|\cdot\| : V \to \mathbb{R}$, known as the norm, satisfying

(i) $\|v\| \ge 0$ for all $v \in V$, with equality iff $v = 0$.

(ii) $\|\lambda v\| = |\lambda| \|v\|$ for all $\lambda \in \mathbb{F}$, $v \in V$.

(iii) $\|v + w\| \le \|v\| + \|w\|$ for all $v, w \in V$.
Intuitively, we think of kvk as the “length” or “magnitude” of the vector.
Example. Let $V$ be a finite-dimensional vector space, and $\{e_1, \cdots, e_n\}$ a basis. Then, for any $v = \sum_{i=1}^n v_i e_i$, we can define a norm as
\[ \|v\| = \sqrt{\sum_{i=1}^n v_i^2}. \]
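As a quick sanity check, here is a minimal numerical sketch (not from the notes) verifying the three norm axioms for this Euclidean norm on randomly sampled vectors of $\mathbb{R}^5$; the small tolerances only absorb floating-point rounding.

```python
# Hedged sketch: empirically checking the norm axioms for the Euclidean norm.
import random

def euclidean_norm(v):
    return sum(x * x for x in v) ** 0.5

random.seed(0)
for _ in range(1000):
    v = [random.uniform(-1, 1) for _ in range(5)]
    w = [random.uniform(-1, 1) for _ in range(5)]
    lam = random.uniform(-3, 3)
    assert euclidean_norm(v) >= 0                      # non-negativity
    # homogeneity: ||lam v|| = |lam| ||v||
    assert abs(euclidean_norm([lam * x for x in v])
               - abs(lam) * euclidean_norm(v)) < 1e-9
    # triangle inequality: ||v + w|| <= ||v|| + ||w||
    assert (euclidean_norm([x + y for x, y in zip(v, w)])
            <= euclidean_norm(v) + euclidean_norm(w) + 1e-9)
```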
If we are given a norm on a vector space $V$, we immediately obtain two more structures on $V$ for free, namely a metric and a topology.

Recall from IB Metric and Topological Spaces that $(V, d)$ is a metric space if the metric $d : V \times V \to \mathbb{R}$ satisfies

(i) $d(x, y) \ge 0$ for all $x, y \in V$, with equality iff $x = y$.

(ii) $d(x, y) = d(y, x)$ for all $x, y \in V$.

(iii) $d(x, y) \le d(x, z) + d(z, y)$ for all $x, y, z \in V$.

Also, a topological space is a set $V$ together with a topology (a collection of open subsets) such that

(i) $\emptyset$ and $V$ are open subsets.

(ii) Arbitrary unions of open subsets are open.

(iii) Finite intersections of open subsets are open.

As we have seen in IB Metric and Topological Spaces, a norm on a vector space induces a metric by $d(v, w) = \|v - w\|$. This metric in turn defines a topology on $V$ where the open sets are given by "$U \subseteq V$ is open iff for any $x \in U$, there exists some $\varepsilon > 0$ such that $B(x, \varepsilon) = \{y \in V : d(x, y) < \varepsilon\} \subseteq U$".

This induced topology is not just a random topology on the vector space. It has the nice property that the vector space operations behave well under it.
Proposition. Addition $+ : V \times V \to V$ and scalar multiplication $\cdot : \mathbb{F} \times V \to V$ are continuous with respect to the topology induced by the norm (and the usual product topology).

Proof. Let $U$ be open in $V$. We want to show that $(+)^{-1}(U)$ is open. Let $(v_1, v_2) \in (+)^{-1}(U)$, i.e. $v_1 + v_2 \in U$. Since $v_1 + v_2 \in U$, there exists $\varepsilon > 0$ such that $B(v_1 + v_2, \varepsilon) \subseteq U$. By the triangle inequality, we know that $B(v_1, \frac{\varepsilon}{2}) + B(v_2, \frac{\varepsilon}{2}) \subseteq U$. Hence we have $(v_1, v_2) \in B(v_1, \frac{\varepsilon}{2}) \times B(v_2, \frac{\varepsilon}{2}) \subseteq (+)^{-1}(U)$. So $(+)^{-1}(U)$ is open.

Scalar multiplication can be done in a very similar way.
This motivates the following definition: we can do without the norm, and just require a topology in which addition and scalar multiplication are continuous.

Definition (Topological vector space). A topological vector space $(V, \mathcal{U})$ is a vector space $V$ together with a topology $\mathcal{U}$ such that addition and scalar multiplication are continuous maps, and moreover singleton points $\{v\}$ are closed sets.

The requirement that points are closed is just a rather technical requirement needed in certain proofs. We should, however, not pay too much attention to this when trying to understand it intuitively.
A natural question to ask is: when is a topological vector space normable? That is, given a topological vector space, can we find a norm that induces the topology? To answer this question, we will first need a few definitions.

Definition (Absolute convexity). Let $V$ be a vector space. Then $C \subseteq V$ is absolutely convex (or balanced convex) if for any $\lambda, \mu \in \mathbb{F}$ such that $|\lambda| + |\mu| \le 1$, we have $\lambda C + \mu C \subseteq C$. In other words, if $c_1, c_2 \in C$, we have $\lambda c_1 + \mu c_2 \in C$.
Proposition. If $(V, \|\cdot\|)$ is a normed vector space, then $B(t) = B(0, t) = \{v : \|v\| < t\}$ is absolutely convex.

Proof. By the triangle inequality.
Definition (Bounded subset). Let $V$ be a topological vector space. Then $B \subseteq V$ is bounded if for every open neighbourhood $U \subseteq V$ of $0$, there is some $s > 0$ such that $B \subseteq tU$ for all $t > s$.

At first sight, this might seem like a rather weird definition. Intuitively, it just means that $B$ is bounded if, whenever we take any open neighbourhood $U$ of $0$, by enlarging it by a scalar multiple, we can make it fully contain $B$.

Example. $B(t)$ in a normed vector space is bounded.
Proposition. A topological vector space $(V, \mathcal{U})$ is normable if and only if there exists an absolutely convex, bounded open neighbourhood of $0$.

Proof. One direction is obvious: if $V$ is normable, then $B(t)$ is an absolutely convex, bounded open neighbourhood of $0$.

The other direction is not too difficult either. We define the Minkowski functional $\mu_C : V \to \mathbb{R}$ by
\[ \mu_C(v) = \inf\{t > 0 : v \in tC\}, \]
where $C$ is our absolutely convex, bounded open neighbourhood.

Note that by definition, for any $t < \mu_C(v)$, we have $v \notin tC$. On the other hand, by absolute convexity, for any $t > \mu_C(v)$, we have $v \in tC$.

We now show that this is a norm on $V$:

(i) If $v = 0$, then $v \in tC$ for all $t > 0$. So $\mu_C(0) = 0$. On the other hand, suppose $v \ne 0$. Since a singleton point is closed, $U = V \setminus \{v\}$ is an open neighbourhood of $0$. Hence there is some $t$ such that $C \subseteq tU$. Equivalently, $\frac{1}{t}C \subseteq U$. Hence $v \notin \frac{1}{t}C$. So $\mu_C(v) \ge \frac{1}{t} > 0$. So $\mu_C(v) = 0$ iff $v = 0$.

(ii) We have
\[ \mu_C(\lambda v) = \inf\{t > 0 : \lambda v \in tC\} = |\lambda| \inf\{t > 0 : v \in tC\} = |\lambda| \mu_C(v), \]
using the fact that $C$ is balanced.

(iii) We want to show that
\[ \mu_C(v + w) \le \mu_C(v) + \mu_C(w). \]
This is equivalent to showing that
\[ \inf\{t > 0 : v + w \in tC\} \le \inf\{t > 0 : v \in tC\} + \inf\{r > 0 : w \in rC\}. \]
This is, in turn, equivalent to proving that if $v \in tC$ and $w \in rC$, then $(v + w) \in (t + r)C$.

Let $v' = v/t$, $w' = w/r$. Then we want to show that if $v' \in C$ and $w' \in C$, then $\frac{1}{t + r}(t v' + r w') \in C$. This is exactly what is required by convexity. So done.
In fact, the condition of absolute convexity can be replaced by "convex", where "convex" means that for every $t \in [0, 1]$, $tC + (1 - t)C \subseteq C$. This is since for every convex bounded $C$, we can always find an absolutely convex bounded $\tilde{C} \subseteq C$, which is not hard to prove.
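To see the Minkowski functional in action, here is a minimal numerical sketch (not from the notes; the open ellipse $C = \{(x, y) : (x/2)^2 + y^2 < 1\}$ is a hypothetical choice of absolutely convex bounded neighbourhood of $0$ in $\mathbb{R}^2$). We compute $\mu_C$ by bisecting on the membership test $v \in tC$, and compare with the closed form $\sqrt{(x/2)^2 + y^2}$, which is the norm that $C$ induces.

```python
# Hedged sketch: Minkowski functional of an ellipse, computed by bisection.

def in_C(x, y):
    # membership test for the assumed ellipse C
    return (x / 2.0) ** 2 + y ** 2 < 1.0

def minkowski(x, y, lo=1e-9, hi=1e9, iters=100):
    # v in tC  iff  v/t in C; bisect for the smallest such t
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if in_C(x / mid, y / mid):
            hi = mid
        else:
            lo = mid
    return hi

for (x, y) in [(1.0, 0.0), (0.0, 0.5), (3.0, -4.0)]:
    exact = ((x / 2.0) ** 2 + y ** 2) ** 0.5
    print(minkowski(x, y), exact)   # the two values agree to high precision
```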
Among all normed spaces, some are particularly nice, known as Banach spaces.

Definition (Banach space). A normed vector space is a Banach space if it is complete as a metric space, i.e. every Cauchy sequence converges.
Example.

(i) A finite-dimensional vector space (which is isomorphic to $\mathbb{F}^n$ for some $n$) is Banach.

(ii) Let $X$ be a compact Hausdorff space. Then let
\[ B(X) = \{f : X \to \mathbb{R} \text{ such that } f \text{ is bounded}\}. \]
This is obviously a vector space, and we can define the norm by $\|f\| = \sup_{x \in X} |f(x)|$. It is easy to show that this is a norm. It is less trivial to show that this is a Banach space.

Let $\{f_n\} \subseteq B(X)$ be a Cauchy sequence. Then for any $x$, $\{f_n(x)\} \subseteq \mathbb{R}$ is also Cauchy. So we can define $f(x) = \lim_{n \to \infty} f_n(x)$.

To show that $f_n \to f$, let $\varepsilon > 0$. By definition of $\{f_n\}$ being Cauchy, there is some $N$ such that for any $n, m > N$ and any fixed $x$, we have $|f_n(x) - f_m(x)| < \varepsilon$. Take the limit as $m \to \infty$. Then $f_m(x) \to f(x)$. So $|f_n(x) - f(x)| \le \varepsilon$. Since this is true for all $x$, for any $n > N$, we must have $\|f_n - f\| \le \varepsilon$. So $f_n \to f$.
(iii) Define $X$ as before, and let
\[ C(X) = \{f : X \to \mathbb{R} \text{ such that } f \text{ is continuous}\}. \]
Since any continuous $f$ on a compact space is bounded, we have $C(X) \subseteq B(X)$. We define the norm as before.

Since $C(X) \subseteq B(X)$, to show that $C(X)$ is Banach, it suffices to show that $C(X) \subseteq B(X)$ is closed, i.e. if $f_n \to f$ with $f_n \in C(X)$, then $f \in C(X)$, i.e. the uniform limit of continuous functions is continuous. The proof can be found in IB Analysis II.
(iv) For $1 \le p < \infty$, define
\[ \hat{L}^p([0,1]) = \{f : [0,1] \to \mathbb{R} \text{ such that } f \text{ is continuous}\}. \]
We define the norm $\|\cdot\|_{\hat{L}^p}$ by
\[ \|f\|_{\hat{L}^p} = \left( \int_0^1 |f|^p \, dx \right)^{1/p}. \]
It is easy to show that $\hat{L}^p$ is indeed a vector space, and we now check that this is a norm.

(a) $\|f\|_{\hat{L}^p} \ge 0$ is obvious. Also, suppose that $\|f\|_{\hat{L}^p} = 0$. Then we must have $f = 0$. Otherwise, if $f \ne 0$, say $|f(x)| = \varepsilon > 0$ for some $x$, then by continuity there is some $\delta$ such that for any $y \in (x - \delta, x + \delta)$, we have $|f(y)| \ge \frac{\varepsilon}{2}$. Hence
\[ \|f\|_{\hat{L}^p} = \left( \int_0^1 |f|^p \, dx \right)^{1/p} \ge \left[ 2\delta \left( \frac{\varepsilon}{2} \right)^p \right]^{1/p} > 0. \]

(b) $\|\lambda f\| = |\lambda| \|f\|$ is obvious.

(c) The triangle inequality is exactly what the Minkowski inequality says, which is on the example sheet.

It turns out that $\hat{L}^p$ is not a Banach space. We can brute-force a hard proof here, but we will later develop some tools that allow us to prove this much more easily.

Hence, we define $L^p([0,1])$ to be the completion of $\hat{L}^p([0,1])$. In IID Probability and Measure, we will show that $L^p([0,1])$ is in fact the space
\[ L^p([0,1]) = \left\{ f : [0,1] \to \mathbb{R} \text{ such that } \int_0^1 |f|^p \, dx < \infty \right\} \Big/ \sim, \]
where the integral is the Lebesgue integral, and we are quotienting by the relation $f \sim g$ if $f = g$ Lebesgue almost everywhere. You will understand what these terms mean in the IID Probability and Measure course.
(v) $\ell^p$ spaces: for $p \in [1, \infty)$, define
\[ \ell^p(\mathbb{F}) = \left\{ (x_1, x_2, \cdots) : x_i \in \mathbb{F}, \sum_{i=1}^\infty |x_i|^p < \infty \right\}, \]
with the norm
\[ \|x\|_{\ell^p} = \left( \sum_{i=1}^\infty |x_i|^p \right)^{1/p}. \]
It should be easy to check that this is a normed vector space. Moreover, this is a Banach space; the proof is on the example sheet. (A small numerical illustration of these norms follows this list of examples.)
(vi) $\ell^\infty$ space: we define
\[ \ell^\infty = \left\{ (x_1, x_2, \cdots) : x_i \in \mathbb{F}, \sup_{i \in \mathbb{N}} |x_i| < \infty \right\} \]
with norm
\[ \|x\|_{\ell^\infty} = \sup_{i \in \mathbb{N}} |x_i|. \]
Again, this is a Banach space.
(vii) Let $B = B(1)$ be the unit open ball in $\mathbb{R}^n$. Define $C(B)$ to be the set of continuous functions $f : B \to \mathbb{R}$. Note that unlike in our previous example, these functions need not be bounded. So our previous norm cannot be applied. However, we can still define a topology as follows:

Let $\{K_i\}_{i=1}^\infty$ be a sequence of compact subsets of $B$ such that $K_i \subseteq K_{i+1}$ and $\bigcup_{i=1}^\infty K_i = B$. We define the basis to include
\[ \left\{ f \in C(B) : \sup_{x \in K_i} |f(x)| < \frac{1}{m} \right\} \]
for each $m, i = 1, 2, \cdots$, as well as the translations of these sets.

This weird basis is chosen such that $f_n \to f$ in this topology iff $f_n \to f$ uniformly on every compact set. It can be shown that this is not normable.
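Here is the promised numerical illustration (a sketch, not from the notes): we approximate $\|x\|_{\ell^p}$ for the sequence $x_i = 1/i$ by truncation. For $p > 1$ the partial sums settle down, reflecting $x \in \ell^p$; for $p = 1$ they grow like $\log N$, reflecting $x \notin \ell^1$.

```python
# Hedged sketch: truncated l^p norms of the harmonic sequence x_i = 1/i.
def lp_norm_truncated(x_terms, p):
    return sum(abs(x) ** p for x in x_terms) ** (1.0 / p)

for p in [1.0, 1.5, 2.0]:
    for N in [10**2, 10**4, 10**6]:
        xs = (1.0 / i for i in range(1, N + 1))
        print(p, N, lp_norm_truncated(xs, p))
# for p = 1 the printed values keep growing; for p > 1 they stabilise
```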
1.1 Bounded linear maps
With vector spaces, we studied linear maps. These are maps that respect the
linear structure of a vector space. With normed vector spaces, the right kind of
maps to study is the bounded linear maps.
Definition (Bounded linear map). $T : X \to Y$ is a bounded linear map if there is a constant $C > 0$ such that $\|Tx\|_Y \le C\|x\|_X$ for all $x \in X$. We write $B(X, Y)$ for the set of bounded linear maps from $X$ to $Y$.

This is equivalent to saying $T(B_X(1)) \subseteq B_Y(C)$ for some $C > 0$. This is also equivalent to saying that $T(B)$ is bounded for every bounded subset $B$ of $X$. Note that this final characterization is also valid when we just have a topological vector space.
How does boundedness relate to the topological structure of the vector spaces?
It turns out that boundedness is the same as continuity, which is another reason
why we like bounded linear maps.
Proposition. Let $X$, $Y$ be normed vector spaces, $T : X \to Y$ a linear map. Then the following are equivalent:

(i) $T$ is continuous.

(ii) $T$ is continuous at $0$.

(iii) $T$ is bounded.

Proof. (i) $\Rightarrow$ (ii) is obvious.

(ii) $\Rightarrow$ (iii): Consider $B_Y(1) \subseteq Y$, the unit open ball. Since $T$ is continuous at $0$, $T^{-1}(B_Y(1)) \subseteq X$ is open. Hence there exists $\varepsilon > 0$ such that $B_X(\varepsilon) \subseteq T^{-1}(B_Y(1))$. So $T(B_X(\varepsilon)) \subseteq B_Y(1)$. So $T(B_X(1)) \subseteq B_Y\left(\frac{1}{\varepsilon}\right)$. So $T$ is bounded.

(iii) $\Rightarrow$ (i): Let $\varepsilon > 0$. Then
\[ \|Tx_1 - Tx_2\|_Y = \|T(x_1 - x_2)\|_Y \le C\|x_1 - x_2\|_X. \]
This is less than $\varepsilon$ if $\|x_1 - x_2\| < C^{-1}\varepsilon$. So done.
Using the obvious operations, $B(X, Y)$ can be made into a vector space. What about a norm?

Definition (Norm on $B(X, Y)$). Let $T : X \to Y$ be a bounded linear map. Define $\|T\|_{B(X,Y)}$ by
\[ \|T\|_{B(X,Y)} = \sup_{\|x\| \le 1} \|Tx\|_Y. \]
Alternatively, this is the minimum $C$ such that $\|Tx\|_Y \le C\|x\|_X$ for all $x$. In particular, we have
\[ \|Tx\|_Y \le \|T\|_{B(X,Y)} \|x\|_X. \]
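For a concrete feel for the operator norm, here is a minimal sketch (the setting is an assumption of this illustration: $X = Y = \mathbb{R}^n$ with the Euclidean norm and $T$ given by a matrix $A$, in which case $\|T\|$ is the largest singular value of $A$). We estimate it by power iteration on $A^{\mathsf{T}}A$, whose largest eigenvalue is $\|T\|^2$.

```python
# Hedged sketch: estimating ||T|| = sup_{||x|| <= 1} ||Tx|| for a matrix map.
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def norm(x):
    return sum(xi * xi for xi in x) ** 0.5

def operator_norm(A, iters=200):
    At = transpose(A)
    x = [1.0] * len(A[0])
    for _ in range(iters):
        y = matvec(At, matvec(A, x))     # apply T^T T
        x = [yi / norm(y) for yi in y]   # renormalise
    return norm(matvec(A, x))            # ||Tx|| at the top eigenvector

A = [[1.0, 2.0], [0.0, 1.0]]
print(operator_norm(A))  # ~2.414 = 1 + sqrt(2), the largest singular value
```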
1.2 Dual spaces

We will frequently be interested in one particular case of $B(X, Y)$.

Definition (Dual space). Let $V$ be a normed vector space. The dual space is
\[ V^* = B(V, \mathbb{F}). \]
We call the elements of $V^*$ functionals. The algebraic dual of $V$ is
\[ V' = L(V, \mathbb{F}), \]
where we do not require boundedness.

One particularly nice property of the dual is that $V^*$ is always a Banach space.

Proposition. Let $V$ be a normed vector space. Then $V^*$ is a Banach space.
Proof. Suppose $\{T_i\} \subseteq V^*$ is a Cauchy sequence. We define $T$ as follows: for any $v \in V$, $\{T_i(v)\} \subseteq \mathbb{F}$ is a Cauchy sequence. Since $\mathbb{F}$ is complete (it is either $\mathbb{R}$ or $\mathbb{C}$), we can define $T : V \to \mathbb{F}$ by
\[ T(v) = \lim_{n \to \infty} T_n(v). \]
Our objective is to show that $T_i \to T$. The first step is to show that we indeed have $T \in V^*$, i.e. $T$ is a bounded map.

Let $\|v\| \le 1$. Pick $\varepsilon = 1$. Then there is some $N$ such that for all $i > N$, we have $|T_i(v) - T(v)| < 1$. Then we have
\begin{align*}
|T(v)| &\le |T_i(v) - T(v)| + |T_i(v)| \\
&< 1 + \|T_i\|_{V^*} \|v\|_V \\
&\le 1 + \|T_i\|_{V^*} \\
&\le 1 + \sup_i \|T_i\|_{V^*}.
\end{align*}
Since $\{T_i\}$ is Cauchy, $\sup_i \|T_i\|_{V^*}$ is bounded. Since this bound does not depend on $v$ (and $N$), we get that $T$ is bounded.

Now we want to show that $\|T_i - T\|_{V^*} \to 0$ as $i \to \infty$.

For arbitrary $\varepsilon > 0$, there is some $N$ such that for all $i, j > N$, we have
\[ \|T_i - T_j\|_{V^*} < \varepsilon. \]
In particular, for any $v$ such that $\|v\| \le 1$, we have
\[ |T_i(v) - T_j(v)| < \varepsilon. \]
Taking the limit as $j \to \infty$, we obtain
\[ |T_i(v) - T(v)| \le \varepsilon. \]
Since this is true for any $v$, we have
\[ \|T_i - T\|_{V^*} \le \varepsilon \]
for all $i > N$. So $T_i \to T$.

Exercise: in general, for $X, Y$ normed vector spaces, what condition on $X$ and $Y$ guarantees that $B(X, Y)$ is a Banach space?
1.3 Adjoint

The idea of the adjoint is: given a $T \in B(X, Y)$, produce a "dual map", or an adjoint $T^* \in B(Y^*, X^*)$.

There is really only one (non-trivial) natural way of doing this. First we can think about what $T^*$ should do. It takes in something from $Y^*$ and produces something in $X^*$. By the definition of the dual space, this is equivalent to taking in a function $g : Y \to \mathbb{F}$ and returning a function $T^*(g) : X \to \mathbb{F}$.

To produce this $T^*(g)$, the only things we have on our hands to use are $T : X \to Y$ and $g : Y \to \mathbb{F}$. Thus the only option we have is to define $T^*(g)$ as the composition $g \circ T$, i.e. $T^*(g)(x) = g(T(x))$ (we also have the silly option of producing the zero map regardless of input, but this is silly). Indeed, this is the definition of the adjoint.

Definition (Adjoint). Let $X, Y$ be normed vector spaces. Given $T \in B(X, Y)$, we define the adjoint of $T$, denoted $T^*$, as the map $T^* \in B(Y^*, X^*)$ given by
\[ T^*(g)(x) = g(T(x)) \]
for $x \in X$, $g \in Y^*$. Alternatively, we can write
\[ T^*(g) = g \circ T. \]
It is easy to show that our $T^*$ is indeed linear. We now show it is bounded.
Proposition. $T^*$ is bounded.

Proof. We want to show that $\|T^*\|_{B(Y^*, X^*)}$ is finite. For simplicity of notation, the suprema are assumed to be taken over non-zero elements of the spaces. We have
\begin{align*}
\|T^*\|_{B(Y^*, X^*)} &= \sup_{g \in Y^*} \frac{\|T^*(g)\|_{X^*}}{\|g\|_{Y^*}} \\
&= \sup_{g \in Y^*} \sup_{x \in X} \frac{|T^*(g)(x)|/\|x\|_X}{\|g\|_{Y^*}} \\
&= \sup_{g \in Y^*} \sup_{x \in X} \frac{|g(Tx)|}{\|g\|_{Y^*}\|x\|_X} \\
&\le \sup_{g \in Y^*} \sup_{x \in X} \frac{\|g\|_{Y^*}\|Tx\|_Y}{\|g\|_{Y^*}\|x\|_X} \\
&\le \sup_{x \in X} \frac{\|T\|_{B(X,Y)}\|x\|_X}{\|x\|_X} \\
&= \|T\|_{B(X,Y)}.
\end{align*}
So it is finite.
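In finite dimensions the adjoint is just the transpose: identifying a functional $g$ on $\mathbb{R}^m$ with its vector of values $(g(e_1), \cdots, g(e_m))$, the relation $T^*(g)(x) = g(Tx)$ becomes $(A^{\mathsf{T}}g) \cdot x = g \cdot (Ax)$. A minimal sketch (not from the notes):

```python
# Hedged sketch: the adjoint of a matrix map is its transpose.
A = [[1.0, 2.0, 0.0],
     [3.0, -1.0, 4.0]]          # T : R^3 -> R^2

def apply(M, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

g = [2.0, -1.0]                 # a functional on R^2, as a vector of values
x = [1.0, 0.5, -2.0]

# g(Tx) computed directly ...
lhs = sum(gi * yi for gi, yi in zip(g, apply(A, x)))
# ... equals (T* g)(x), with T* represented by the transpose matrix
At = [list(col) for col in zip(*A)]
rhs = sum(hi * xi for hi, xi in zip(apply(At, g), x))
assert abs(lhs - rhs) < 1e-12
```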
1.4 The double dual

Definition (Double dual). Let $V$ be a normed vector space. Define $V^{**} = (V^*)^*$.

We want to define a map $\phi : V \to V^{**}$. Again, we can reason about what we expect this function to do. It takes in a $v \in V$, and produces a $\phi(v) \in V^{**}$. Expanding the definition, this gives a $\phi(v) : V^* \to \mathbb{F}$. Hence this $\phi(v)$ takes in a $g \in V^*$, and returns a $\phi(v)(g) \in \mathbb{F}$.

This is easy. Since $g \in V^*$, we know that $g$ is a function $g : V \to \mathbb{F}$. Given this function $g$ and a $v \in V$, it is easy to produce a $\phi(v)(g) \in \mathbb{F}$: just apply $g$ to $v$:
\[ \phi(v)(g) = g(v). \]
Proposition. Let $\phi : V \to V^{**}$ be defined by $\phi(v)(g) = g(v)$. Then $\phi$ is a bounded linear map and $\|\phi\|_{B(V, V^{**})} \le 1$.

Proof. Again, we are taking the suprema over non-zero elements. We have
\begin{align*}
\|\phi\|_{B(V, V^{**})} &= \sup_{v \in V} \frac{\|\phi(v)\|_{V^{**}}}{\|v\|_V} \\
&= \sup_{v \in V} \sup_{g \in V^*} \frac{|\phi(v)(g)|}{\|v\|_V\|g\|_{V^*}} \\
&= \sup_{v \in V} \sup_{g \in V^*} \frac{|g(v)|}{\|v\|_V\|g\|_{V^*}} \\
&\le 1.
\end{align*}
In fact, we will later show that $\|\phi\|_{B(V, V^{**})} = 1$.
1.5 Isomorphism

So far, we have discussed a lot about bounded linear maps, which are "morphisms" between normed vector spaces. It is thus natural to come up with a notion of isomorphism.

Definition (Isomorphism). Let $X, Y$ be normed vector spaces. Then $T : X \to Y$ is an isomorphism if it is a bounded linear map with a bounded linear inverse (i.e. it is a homeomorphism).

We say $X$ and $Y$ are isomorphic if there is an isomorphism $T : X \to Y$.

We say that $T : X \to Y$ is an isometric isomorphism if $T$ is an isomorphism and $\|Tx\|_Y = \|x\|_X$ for all $x \in X$. $X$ and $Y$ are isometrically isomorphic if there is an isometric isomorphism between them.
Example. Consider a finite-dimensional space $V = \mathbb{F}^n$ with the standard basis $\{e_1, \cdots, e_n\}$. For any $v = \sum v_i e_i$, the norm is defined by
\[ \|v\| = \left( \sum v_i^2 \right)^{1/2}. \]
Then any $g \in V^*$ is determined by $g(e_i)$ for $i = 1, \cdots, n$. We want to show that there are no restrictions on what $g(e_i)$ can be, i.e. whatever values I assign to them, $g$ will still be bounded. We have
\[ \|g\|_{V^*} = \sup_{v \in V} \frac{|g(v)|}{\|v\|} \le \sup_{v \in V} \frac{\sum |v_i||g(e_i)|}{\left(\sum |v_i|^2\right)^{1/2}} \le C \sup_{v \in V} \frac{\left(\sum |v_i|^2\right)^{1/2}}{\left(\sum |v_i|^2\right)^{1/2}} \sup_i |g(e_i)| = C \sup_i |g(e_i)| \]
for some $C$, where the second-to-last step is due to the Cauchy–Schwarz inequality. The supremum is finite since $\mathbb{F}^n$ is finite-dimensional.

Since $g$ is uniquely determined by the list of values $(g(e_1), g(e_2), \cdots, g(e_n))$, $V^*$ has dimension $n$. Therefore, $V^*$ is isomorphic to $\mathbb{F}^n$. By the same line of argument, $V^{**}$ is isomorphic to $\mathbb{F}^n$.

In fact, we can show that $\phi : V \to V^{**}$ given by $\phi(v)(g) = g(v)$ is an isometric isomorphism (this is not true for general normed vector spaces: just pick $V$ to be incomplete; then $V$ and $V^{**}$ cannot be isomorphic since $V^{**}$ is complete).
Example. Consider $\ell^p$ for $p \in [1, \infty)$. What is $(\ell^p)^*$?

Suppose $q$ is the conjugate exponent of $p$, i.e.
\[ \frac{1}{q} + \frac{1}{p} = 1 \]
(if $p = 1$, define $q = \infty$). It is easy to see that $\ell^q \subseteq (\ell^p)^*$ by the following: suppose $(x_1, x_2, \cdots) \in \ell^p$ and $(y_1, y_2, \cdots) \in \ell^q$. Define
\[ y(x) = \sum_{i=1}^\infty x_i y_i. \]
We will show that $y$ defined this way is a bounded linear map. Linearity is easy to see, and boundedness comes from the fact that
\[ \|y\|_{(\ell^p)^*} = \sup_x \frac{|y(x)|}{\|x\|_{\ell^p}} = \sup_x \frac{\left|\sum x_i y_i\right|}{\|x\|_{\ell^p}} \le \sup_x \frac{\|x\|_{\ell^p}\|y\|_{\ell^q}}{\|x\|_{\ell^p}} = \|y\|_{\ell^q}, \]
by Hölder's inequality. So every $(y_i) \in \ell^q$ determines a bounded linear map. In fact, we can show that $(\ell^p)^*$ is isomorphic to $\ell^q$.
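Here is a minimal numerical sketch (not from the notes) of the inequality doing the work above, Hölder's inequality $\left|\sum x_i y_i\right| \le \|x\|_{\ell^p}\|y\|_{\ell^q}$, checked on random finitely supported sequences:

```python
# Hedged sketch: empirically checking Hoelder's inequality.
import random

def norm_p(x, p):
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

random.seed(1)
p = 1.5
q = p / (p - 1.0)               # conjugate exponent: 1/p + 1/q = 1
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(20)]
    y = [random.uniform(-1, 1) for _ in range(20)]
    pairing = abs(sum(a * b for a, b in zip(x, y)))
    assert pairing <= norm_p(x, p) * norm_p(y, q) + 1e-9
```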
1.6 Finite-dimensional normed vector spaces
We are now going to look at a special case of normed vector spaces, where the
vector space is finite dimensional.
It turns out that finite-dimensional vector spaces are rather boring. In particular, we have:

(i) All norms are equivalent.

(ii) The closed unit ball is compact.

(iii) They are Banach spaces.

(iv) All linear maps whose domain is finite-dimensional are bounded.

These are what we are going to show in this section.

First of all, we need to say what we mean when we say all norms are "equivalent".
Definition (Equivalent norms). Let $V$ be a vector space, and $\|\cdot\|_1$, $\|\cdot\|_2$ be norms on $V$. We say that these are equivalent if there exists a constant $C > 0$ such that for any $v \in V$, we have
\[ C^{-1}\|v\|_2 \le \|v\|_1 \le C\|v\|_2. \]
It is an exercise to show that equivalent norms induce the same topology, and hence agree on continuity and convergence. Also, equivalence of norms is an equivalence relation (as the name suggests).
Now let $V$ be an $n$-dimensional vector space with basis $\{e_1, \cdots, e_n\}$. We can define the $\ell_p^n$ norm by
\[ \|v\|_{\ell_p^n} = \left( \sum_{i=1}^n |v_i|^p \right)^{1/p}, \]
where
\[ v = \sum_{i=1}^n v_i e_i. \]

Proposition. Let $V$ be an $n$-dimensional vector space. Then all norms on $V$ are equivalent to the norm $\|\cdot\|_{\ell_1^n}$.
Corollary. All norms on a finite-dimensional vector space are equivalent.
Proof. Let $\|\cdot\|$ be a norm on $V$.

Let $v = (v_1, \cdots, v_n) = \sum v_i e_i \in V$. Then we have
\[ \|v\| = \left\| \sum v_i e_i \right\| \le \sum_{i=1}^n |v_i|\|e_i\| \le \left( \sup_i \|e_i\| \right) \sum_{i=1}^n |v_i| \le c\|v\|_{\ell_1^n}, \]
where $c = \sup_i \|e_i\| < \infty$ since we are taking a finite supremum.

For the other way round, let $S_1 = \{v \in V : \|v\|_{\ell_1^n} = 1\}$. We will show the following two results:

(i) $\|\cdot\| : (S_1, \|\cdot\|_{\ell_1^n}) \to \mathbb{R}$ is continuous.

(ii) $S_1$ is a compact set.

We first see why this gives what we want. We know that for any continuous map from a compact set to $\mathbb{R}$, the image is bounded and the infimum is achieved. So there is some $v^* \in S_1$ such that
\[ \|v^*\| = \inf_{v \in S_1} \|v\|. \]
Since $v^* \ne 0$, we have $\|v^*\| > 0$. So there is some $c_0 > 0$ such that $\|v\| \ge c_0$ for all $v \in S_1$.

Now take an arbitrary non-zero $v \in V$. Since $\frac{v}{\|v\|_{\ell_1^n}} \in S_1$, we know that
\[ \left\| \frac{v}{\|v\|_{\ell_1^n}} \right\| \ge c_0, \]
which is to say that
\[ \|v\| \ge c_0\|v\|_{\ell_1^n}. \]
Since we have found $c, c_0 > 0$ such that
\[ c_0\|v\|_{\ell_1^n} \le \|v\| \le c\|v\|_{\ell_1^n}, \]
we now let $C = \max\left(c, \frac{1}{c_0}\right) > 0$. Then
\[ C^{-1}\|v\|_{\ell_1^n} \le \|v\| \le C\|v\|_{\ell_1^n}. \]
So the norms are equivalent. Now we can start to prove (i) and (ii).

First, let $v, w \in V$. We have
\[ \big| \|v\| - \|w\| \big| \le \|v - w\| \le c\|v - w\|_{\ell_1^n}. \]
Hence when $v$ is close to $w$ under $\ell_1^n$, then $\|v\|$ is close to $\|w\|$. So $\|\cdot\|$ is continuous.

To show (ii), it suffices to show that the unit ball $B = \{v \in V : \|v\|_{\ell_1^n} \le 1\}$ is compact, since $S_1$ is a closed subset of $B$. We will do so by showing it is sequentially compact.

Let $\{v^{(k)}\}_{k=1}^\infty$ be a sequence in $B$. Write
\[ v^{(k)} = \sum_{i=1}^n \lambda_i^{(k)} e_i. \]
Since $v^{(k)} \in B$, we have
\[ \sum_{i=1}^n |\lambda_i^{(k)}| \le 1. \]
Consider the sequence $\lambda_1^{(k)}$, which is a sequence in $\mathbb{F}$. We know that $|\lambda_1^{(k)}| \le 1$. So by Bolzano–Weierstrass, there is a convergent subsequence $\lambda_1^{(k_{j_1})}$.

Now look at $\lambda_2^{(k_{j_1})}$. Since this is bounded, there is a convergent subsequence $\lambda_2^{(k_{j_2})}$.

Iterate this through all $n$ coordinates to obtain a subsequence $k_{j_n}$ such that $\lambda_i^{(k_{j_n})}$ is convergent for all $i$. So $v^{(k_{j_n})}$ is a convergent subsequence.
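To illustrate, here is a minimal numerical sketch (not from the notes) estimating the equivalence constants between $\|\cdot\|_{\ell_1^n}$ and $\|\cdot\|_{\ell_2^n}$ on $\mathbb{R}^4$ by sampling the $\ell^1$ unit sphere; the exact constants are $\|v\|_2 \le \|v\|_1 \le \sqrt{n}\|v\|_2$.

```python
# Hedged sketch: sampling the ratio ||v||_2 / ||v||_1 on the l^1 unit sphere.
import random

def l1(v): return sum(abs(t) for t in v)
def l2(v): return sum(t * t for t in v) ** 0.5

random.seed(2)
n = 4
ratios = []
for _ in range(100000):
    v = [random.gauss(0, 1) for _ in range(n)]
    s = l1(v)
    v = [t / s for t in v]      # normalise so ||v||_1 = 1
    ratios.append(l2(v))        # record ||v||_2 on the l^1 unit sphere
print(min(ratios), max(ratios)) # approaches the interval [1/sqrt(n), 1]
```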
Proposition. Let $V$ be a finite-dimensional normed vector space. Then the closed unit ball
\[ \bar{B}(1) = \{v \in V : \|v\| \le 1\} \]
is compact.

Proof. This follows from the proof above.
Proposition. Let $V$ be a finite-dimensional normed vector space. Then $V$ is a Banach space.

Proof. Let $\{v_i\} \subseteq V$ be a Cauchy sequence. Since $\{v_i\}$ is Cauchy, it is bounded, i.e. $\{v_i\} \subseteq \bar{B}(R)$ for some $R > 0$. By the above, $\bar{B}(R)$ is compact. So $\{v_i\}$ has a convergent subsequence $v_{i_k} \to v$. Since $\{v_i\}$ is Cauchy, we must have $v_i \to v$. So $\{v_i\}$ converges.
Proposition. Let $V, W$ be normed vector spaces, with $V$ finite-dimensional. Also, let $T : V \to W$ be a linear map. Then $T$ is bounded.

Proof. Recall our discussion last time about $V^*$ for finite-dimensional $V$. We will do a similar proof.

Note that since $V$ is finite-dimensional, $\operatorname{im} T$ is finite-dimensional. So wlog $W$ is finite-dimensional. Since all norms are equivalent, it suffices to consider the case where the vector spaces have the $\ell_1^n$ and $\ell_1^m$ norms. Then $T$ can be represented by a matrix $T_{ij}$ such that
\[ T(x_1, \cdots, x_n) = \left( \sum_i T_{1i}x_i, \cdots, \sum_i T_{mi}x_i \right). \]
We can bound this by
\[ \|T(x_1, \cdots, x_n)\| \le \sum_{j=1}^m \sum_{i=1}^n |T_{ji}||x_i| \le m \left( \sup_{i,j} |T_{ij}| \right) \sum_{i=1}^n |x_i| \le C\|x\|_{\ell_1^n} \]
for some $C > 0$, since we are taking the supremum over a finite set. This implies that $\|T\|_{B(\ell_1^n, \ell_1^m)} \le C$.
There is another way to prove this statement.

Proof. (alternative) Let $T : V \to W$ be a linear map. We define a norm on $V$ by $\|v\|_0 = \|v\|_V + \|Tv\|_W$. It is easy to show that this is a norm.

Since $V$ is finite-dimensional, all norms are equivalent. So there is a constant $C > 0$ such that for all $v$, we have
\[ \|v\|_0 \le C\|v\|_V. \]
In particular, we have $\|Tv\| \le C\|v\|_V$. So done.
Among all these properties, compactness of $\bar{B}(1)$ characterizes finite-dimensionality.

Proposition. Let $V$ be a normed vector space. Suppose that the closed unit ball $\bar{B}(1)$ is compact. Then $V$ is finite-dimensional.

Proof. Consider the following open cover of $\bar{B}(1)$:
\[ \bar{B}(1) \subseteq \bigcup_{y \in \bar{B}(1)} B\left(y, \tfrac{1}{2}\right). \]
Since $\bar{B}(1)$ is compact, this has a finite subcover. So there are some $y_1, \cdots, y_n$ such that
\[ \bar{B}(1) \subseteq \bigcup_{i=1}^n B\left(y_i, \tfrac{1}{2}\right). \]
Now let $Y = \operatorname{span}\{y_1, \cdots, y_n\}$, which is a finite-dimensional subspace of $V$. We want to show that in fact we have $Y = V$.

Clearly, by definition of $Y$, the unit ball satisfies
\[ B(1) \subseteq Y + B\left(\tfrac{1}{2}\right), \]
i.e. for every $v \in B(1)$, there is some $y \in Y$ and $w \in B(\tfrac{1}{2})$ such that $v = y + w$. Multiplying everything by $\tfrac{1}{2}$, we get
\[ B\left(\tfrac{1}{2}\right) \subseteq Y + B\left(\tfrac{1}{4}\right). \]
Hence we also have
\[ B(1) \subseteq Y + B\left(\tfrac{1}{4}\right). \]
By induction, for every $n$, we have
\[ B(1) \subseteq Y + B\left(\tfrac{1}{2^n}\right). \]
As a consequence, $B(1) \subseteq \bar{Y}$.

Since $Y$ is finite-dimensional, we know that $Y$ is complete. So $Y$ is a closed subspace of $V$, i.e. $\bar{Y} = Y$. So in fact
\[ B(1) \subseteq Y. \]
Since every element in $V$ can be rescaled to an element of $B(1)$, we know that $V = Y$. Hence $V$ is finite-dimensional.

This concludes our discussion on finite-dimensional vector spaces. We'll end with an example that shows these results are not true for infinite-dimensional vector spaces.
Example. Consider $\ell^1$, and let $e_i = (0, 0, \cdots, 0, 1, 0, \cdots)$, where $e_i$ is $1$ in the $i$th entry and $0$ elsewhere.

Note that if $i \ne j$, then
\[ \|e_i - e_j\| = 2. \]
Since $e_i \in \bar{B}(1)$, we see that $\bar{B}(1)$ cannot be covered by finitely many open balls of radius $\tfrac{1}{2}$, since each such open ball can contain at most one of the $e_i$.
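A minimal sketch (not from the notes) of the computation behind this example:

```python
# Hedged sketch: standard basis vectors of l^1 are pairwise at distance 2,
# so the sequence (e_i) in the closed unit ball has no convergent subsequence.
def e(i, length):
    v = [0.0] * length
    v[i] = 1.0
    return v

n = 10
for i in range(n):
    for j in range(i + 1, n):
        dist = sum(abs(a - b) for a, b in zip(e(i, n), e(j, n)))
        assert dist == 2.0
```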
1.7 Hahn–Banach Theorem

Let $V$ be a real normed vector space. What can we say about $V^* = B(V, \mathbb{R})$? For instance, if $V$ is non-trivial, must $V^*$ be non-trivial?

The main goal of this section is to prove the Hahn–Banach theorem (surprise), which allows us to produce a lot of elements in $V^*$. Moreover, it doesn't just tell us that $V^*$ is non-empty (this is rather dull), but provides a tool to craft (or at least prove the existence of) elements of $V^*$ that satisfy some property we want.
Proposition. Let $V$ be a real normed vector space, and let $W \subseteq V$ have co-dimension 1. Assume we have the following two items:

- $p : V \to \mathbb{R}$ (not necessarily linear), which is positive homogeneous, i.e.
\[ p(\lambda v) = \lambda p(v) \]
for all $v \in V$, $\lambda > 0$, and subadditive, i.e.
\[ p(v_1 + v_2) \le p(v_1) + p(v_2) \]
for all $v_1, v_2 \in V$. We can think of it as something like a norm, but more general.

- $f : W \to \mathbb{R}$, a linear map such that $f(w) \le p(w)$ for all $w \in W$.

Then there exists an extension $\tilde{f} : V \to \mathbb{R}$ which is linear such that $\tilde{f}|_W = f$ and $\tilde{f}(v) \le p(v)$ for all $v \in V$.

Why do we want this weird theorem? Our objective is to find something in $V^*$. This theorem tells us that to find a bounded linear map in $V^*$, we just need something on $W$ bounded by a norm-like object, and then we can extend it to $V$.
Proof. Let $v_0 \in V \setminus W$. Since $W$ has co-dimension 1, every element $v \in V$ can be written uniquely as $v = w + a v_0$ for some $w \in W$, $a \in \mathbb{R}$. Therefore it suffices to define $\tilde{f}(v_0)$ and then extend linearly to $V$.

The condition we want to meet is
\[ \tilde{f}(w + a v_0) \le p(w + a v_0) \tag{$*$} \]
for all $w \in W$, $a \in \mathbb{R}$. If $a = 0$, then this is satisfied since $\tilde{f}$ restricts to $f$ on $W$.

If $a > 0$, then $(*)$ is equivalent to
\[ \tilde{f}(w) + a\tilde{f}(v_0) \le p(w + a v_0). \]
We can divide by $a$ to obtain
\[ \tilde{f}(a^{-1}w) + \tilde{f}(v_0) \le p(a^{-1}w + v_0). \]
We let $w' = a^{-1}w$. So we can write this as
\[ \tilde{f}(v_0) \le p(w' + v_0) - f(w') \]
for all $w' \in W$.

If $a < 0$, then $(*)$ is equivalent to
\[ \tilde{f}(w) + a\tilde{f}(v_0) \le p(w + a v_0). \]
We now divide by $a$ and flip the sign of the inequality. So we have
\[ \tilde{f}(a^{-1}w) + \tilde{f}(v_0) \ge a^{-1}p(w + a v_0) = -p(-a^{-1}w - v_0), \]
using positive homogeneity. In other words, we want
\[ \tilde{f}(v_0) \ge -p(-a^{-1}w - v_0) - f(a^{-1}w). \]
We let $w' = -a^{-1}w$. Then we are left with
\[ \tilde{f}(v_0) \ge -p(w' - v_0) + f(w') \]
for all $w' \in W$.

Hence we are done if we can define an $\tilde{f}(v_0)$ that satisfies these two conditions. This is possible if and only if
\[ -p(w_1 - v_0) + f(w_1) \le p(w_2 + v_0) - f(w_2) \]
for all $w_1, w_2 \in W$. This holds since
\begin{align*}
f(w_1) + f(w_2) &= f(w_1 + w_2) \\
&\le p(w_1 + w_2) \\
&= p(w_1 - v_0 + w_2 + v_0) \\
&\le p(w_1 - v_0) + p(w_2 + v_0).
\end{align*}
So the result follows.
The goal is to "iterate" this to get a similar result without the co-dimension 1 assumption. While we can do this directly finitely many times, this isn't helpful (since we already know a lot about finite-dimensional normed spaces). To perform an "infinite iteration", we need the mysterious result known as Zorn's lemma.
Digression on Zorn's lemma

We first need a few definitions before we can come to Zorn's lemma.

Definition (Partial order). A relation $\le$ on a set $X$ is a partial order if it satisfies

(i) $x \le x$ (reflexivity)

(ii) $x \le y$ and $y \le x$ implies $x = y$ (antisymmetry)

(iii) $x \le y$ and $y \le z$ implies $x \le z$ (transitivity)

Definition (Total order). Let $(S, \le)$ be a partial order. $T \subseteq S$ is totally ordered if for all $x, y \in T$, either $x \le y$ or $y \le x$, i.e. every two elements are related.

Definition (Upper bound). Let $(S, \le)$ be a partial order, and $S' \subseteq S$ a subset. We say $b \in S$ is an upper bound of this subset if $x \le b$ for all $x \in S'$.

Definition (Maximal element). Let $(S, \le)$ be a partial order. Then $m \in S$ is a maximal element if $x \ge m$ implies $x = m$.

The glorious Zorn's lemma tells us that:

Lemma (Zorn's lemma). Let $(S, \le)$ be a non-empty partially ordered set such that every totally-ordered subset $S'$ has an upper bound in $S$. Then $S$ has a maximal element.
We will not give a proof of this lemma here, but we can explain why it should be true.

We start by picking one element $x_0$ in $S$. If it is maximal, then done. Otherwise, there is some $x_1 > x_0$. If this is not maximal, then pick $x_2 > x_1$. We do this to infinity "and beyond": after picking infinitely many $x_i$, if we have not yet reached a maximal element, we take an upper bound of this set, and call it $x_\omega$. If this is not maximal, we can continue picking a larger element.

We can do this forever, but if this process never stops, even after infinite time, we would have picked out more elements than there are in $S$, which is clearly nonsense. Of course, this is hardly a formal proof. The proper proof can be found in the IID Logic and Set Theory course.
Back to vector spaces

The Hahn–Banach theorem is just our previous proposition without the constraint that $W$ has co-dimension 1.

Theorem (Hahn–Banach theorem*). Let $V$ be a real normed vector space, and $W \subseteq V$ a subspace. Assume we have the following two items:

- $p : V \to \mathbb{R}$ (not necessarily linear), which is positive homogeneous and subadditive;

- $f : W \to \mathbb{R}$, a linear map such that $f(w) \le p(w)$ for all $w \in W$.

Then there exists an extension $\tilde{f} : V \to \mathbb{R}$ which is linear such that $\tilde{f}|_W = f$ and $\tilde{f}(v) \le p(v)$ for all $v \in V$.
Proof. Let $S$ be the set of all pairs $(\tilde{V}, \tilde{f})$ such that

(i) $W \subseteq \tilde{V} \subseteq V$

(ii) $\tilde{f} : \tilde{V} \to \mathbb{R}$ is linear

(iii) $\tilde{f}|_W = f$

(iv) $\tilde{f}(\tilde{v}) \le p(\tilde{v})$ for all $\tilde{v} \in \tilde{V}$

We introduce a partial order $\le$ on $S$ by $(\tilde{V}_1, \tilde{f}_1) \le (\tilde{V}_2, \tilde{f}_2)$ if $\tilde{V}_1 \subseteq \tilde{V}_2$ and $\tilde{f}_2|_{\tilde{V}_1} = \tilde{f}_1$. It is easy to see that this is indeed a partial order.

We now check that this satisfies the assumptions of Zorn's lemma. Let $\{(\tilde{V}_\alpha, \tilde{f}_\alpha)\}_{\alpha \in A} \subseteq S$ be a totally ordered set. Define $(\tilde{V}, \tilde{f})$ by
\[ \tilde{V} = \bigcup_{\alpha \in A} \tilde{V}_\alpha, \quad \tilde{f}(x) = \tilde{f}_\alpha(x) \text{ for } x \in \tilde{V}_\alpha. \]
This is well-defined because $\{(\tilde{V}_\alpha, \tilde{f}_\alpha)\}_{\alpha \in A}$ is totally ordered: if $x \in \tilde{V}_{\alpha_1}$ and $x \in \tilde{V}_{\alpha_2}$, wlog assume $(\tilde{V}_{\alpha_1}, \tilde{f}_{\alpha_1}) \le (\tilde{V}_{\alpha_2}, \tilde{f}_{\alpha_2})$. Then $\tilde{f}_{\alpha_2}|_{\tilde{V}_{\alpha_1}} = \tilde{f}_{\alpha_1}$, so $\tilde{f}_{\alpha_1}(x) = \tilde{f}_{\alpha_2}(x)$.

It should be clear that $(\tilde{V}, \tilde{f}) \in S$ and that $(\tilde{V}, \tilde{f})$ is indeed an upper bound of $\{(\tilde{V}_\alpha, \tilde{f}_\alpha)\}_{\alpha \in A}$. So the conditions of Zorn's lemma are satisfied.

Hence by Zorn's lemma, there is a maximal element $(\tilde{W}, \tilde{f}) \in S$. Then by definition, $\tilde{f}$ is linear, restricts to $f$ on $W$, and is bounded by $p$. We now show that $\tilde{W} = V$.

Suppose not. Then there is some $v_0 \in V \setminus \tilde{W}$. Define $\tilde{V} = \operatorname{span}\{\tilde{W}, v_0\}$. Now $\tilde{W}$ is a co-dimension 1 subspace of $\tilde{V}$. By our previous result, we know that there is some $\tilde{\tilde{f}} : \tilde{V} \to \mathbb{R}$ linear such that $\tilde{\tilde{f}}|_{\tilde{W}} = \tilde{f}$ and $\tilde{\tilde{f}}(v) \le p(v)$ for all $v \in \tilde{V}$.

Hence we have $(\tilde{V}, \tilde{\tilde{f}}) \in S$ but $(\tilde{W}, \tilde{f}) < (\tilde{V}, \tilde{\tilde{f}})$. This contradicts the maximality of $(\tilde{W}, \tilde{f})$.
There is a particularly important special case of this, which is itself sometimes known as the Hahn–Banach theorem.

Corollary (Hahn–Banach theorem 2.0). Let $W \subseteq V$ be real normed vector spaces. Given $f \in W^*$, there exists an $\tilde{f} \in V^*$ such that $\tilde{f}|_W = f$ and $\|\tilde{f}\|_{V^*} = \|f\|_{W^*}$.

Proof. Use the Hahn–Banach theorem with $p(x) = \|f\|_{W^*}\|x\|_V$ for all $x \in V$. Positive homogeneity and subadditivity follow directly from the axioms of the norm. Then by definition $f(w) \le p(w)$ for all $w \in W$. So the Hahn–Banach theorem says that there is an $\tilde{f} : V \to \mathbb{R}$ linear such that $\tilde{f}|_W = f$ and
\[ \tilde{f}(v) \le p(v) = \|f\|_{W^*}\|v\|_V. \]
Now notice that
\[ \tilde{f}(v) \le \|f\|_{W^*}\|v\|_V, \quad -\tilde{f}(v) = \tilde{f}(-v) \le \|f\|_{W^*}\|v\|_V \]
together imply that $|\tilde{f}(v)| \le \|f\|_{W^*}\|v\|_V$ for all $v \in V$. So $\|\tilde{f}\|_{V^*} \le \|f\|_{W^*}$.

On the other hand, we have (again taking the supremum over non-zero elements)
\[ \|\tilde{f}\|_{V^*} = \sup_{v \in V} \frac{|\tilde{f}(v)|}{\|v\|_V} \ge \sup_{w \in W} \frac{|f(w)|}{\|w\|_W} = \|f\|_{W^*}. \]
So indeed we have $\|\tilde{f}\|_{V^*} = \|f\|_{W^*}$.
We'll have some quick corollaries of these theorems.

Proposition. Let $V$ be a real normed vector space. For every $v \in V \setminus \{0\}$, there is some $f_v \in V^*$ such that $f_v(v) = \|v\|_V$ and $\|f_v\|_{V^*} = 1$.

Proof. Apply the Hahn–Banach theorem (2.0) with $W = \operatorname{span}\{v\}$ and $f_v^0 \in W^*$ defined by $f_v^0(v) = \|v\|_V$.
Corollary. Let $V$ be a real normed vector space. Then $v = 0$ if and only if $f(v) = 0$ for all $f \in V^*$.

Corollary. Let $V$ be a non-trivial real normed vector space, $v, w \in V$ with $v \ne w$. Then there is some $f \in V^*$ such that $f(v) \ne f(w)$.

Corollary. If $V$ is a non-trivial real normed vector space, then $V^*$ is non-trivial.

We now want to restrict the discussion to double duals. We define $\phi : V \to V^{**}$ as before by $\phi(v)(f) = f(v)$ for $v \in V$, $f \in V^*$.

Proposition. The map $\phi : V \to V^{**}$ is an isometry, i.e. $\|\phi(v)\|_{V^{**}} = \|v\|_V$.
Proof. We have previously shown that
\[ \|\phi\|_{B(V, V^{**})} \le 1. \]
It thus suffices to show the reverse inequality, i.e. that
\[ \|\phi(v)\|_{V^{**}} \ge \|v\|_V. \]
We may assume $v \ne 0$, for otherwise the inequality is trivial. We have
\[ \|\phi(v)\|_{V^{**}} = \sup_{f \in V^*} \frac{|\phi(v)(f)|}{\|f\|_{V^*}} \ge \frac{|\phi(v)(f_v)|}{\|f_v\|_{V^*}} = |f_v(v)| = \|v\|_V, \]
where $f_v$ is the functional with $f_v(v) = \|v\|_V$ and $\|f_v\|_{V^*} = 1$ that we constructed previously. So done.

In particular, $\phi$ is injective and one can view $\phi$ as an isometric embedding of $V$ into $V^{**}$.
Definition (Reflexive). We say $V$ is reflexive if $\phi(V) = V^{**}$.

Note that any reflexive space is Banach, since $V^{**}$, being the dual of $V^*$, is complete.
You might have heard that for any infinite-dimensional vector space $V$, the dual of $V$ is always strictly larger than $V$. This does not prevent an infinite-dimensional vector space from being reflexive. When we said the dual of $V$ is always strictly larger than $V$, we were referring to the algebraic dual, i.e. the set of all linear maps from $V$ to $\mathbb{F}$. In the definition of reflexivity (and everywhere else where we mention "dual" in this course), we mean the continuous dual, where we look at the set of all bounded linear maps from $V$ to $\mathbb{F}$. It is indeed possible for the continuous dual to be isomorphic to the original space, even for infinite-dimensional spaces, as we will see later.

Example. Finite-dimensional normed vector spaces are reflexive. Also, $\ell^p$ is reflexive for $p \in (1, \infty)$.
Recall that given $T \in B(V, W)$, we defined $T^* \in B(W^*, V^*)$ by
\[ T^*(f)(v) = f(Tv) \]
for $v \in V$, $f \in W^*$. We have previously shown that
\[ \|T^*\|_{B(W^*, V^*)} \le \|T\|_{B(V, W)}. \]
We will now show that in fact equality holds.

Proposition. $\|T^*\|_{B(W^*, V^*)} = \|T\|_{B(V, W)}$.

Proof. We have already shown that
\[ \|T^*\|_{B(W^*, V^*)} \le \|T\|_{B(V, W)}. \]
For the other inequality, first let $\varepsilon > 0$. Since
\[ \|T\|_{B(V, W)} = \sup_{v \in V} \frac{\|Tv\|_W}{\|v\|_V} \]
by definition, there is some $v \in V$ with $\|v\|_V = 1$ such that
\[ \|Tv\|_W \ge \|T\|_{B(V, W)} - \varepsilon. \]
Therefore, we get that
\begin{align*}
\|T^*\|_{B(W^*, V^*)} &= \sup_{f \in W^*} \frac{\|T^*(f)\|_{V^*}}{\|f\|_{W^*}} \\
&\ge \|T^*(f_{Tv})\|_{V^*} \\
&\ge |T^*(f_{Tv})(v)| \\
&= |f_{Tv}(Tv)| \\
&= \|Tv\|_W \\
&\ge \|T\|_{B(V, W)} - \varepsilon,
\end{align*}
where $f_{Tv}$ is the functional provided by Hahn–Banach with $f_{Tv}(Tv) = \|Tv\|_W$, and we used the fact that $\|f_{Tv}\|_{W^*}$ and $\|v\|_V$ are both 1. Since $\varepsilon$ was arbitrary, we are done.
2 Baire category theorem

2.1 The Baire category theorem

When we first write the Baire category theorem down, it might seem a bit pointless. However, it turns out to be a really useful result, and we will be able to prove surprisingly many results from it.

In fact, the Baire category theorem itself does not involve normed vector spaces; it works on any metric space. However, most of the applications we have here are about normed vector spaces.

To state the theorem, we will need some terminology.

Definition (Nowhere dense set). Let $X$ be a topological space. A subset $E \subseteq X$ is nowhere dense if $\bar{E}$ has empty interior.

Usually, we will pick $E$ to be closed, so that the definition just says that $E$ has empty interior.

Definition (First/second category, meagre and residual). Let $X$ be a topological space. We say that $Z \subseteq X$ is of first category or meagre if it is a countable union of nowhere dense sets.

A subset is of second category or non-meagre if it is not of first category.

A subset $Z$ is residual if $X \setminus Z$ is meagre.
Theorem (Baire category theorem). Let $X$ be a (non-empty) complete metric space. Then $X$ is of second category.

This, by definition, means that it is not a countable union of nowhere dense sets. This is equivalent to saying that if we can write
\[ X = \bigcup_{n=1}^\infty C_n, \]
where each $C_n$ is closed, then $C_n$ has non-empty interior for some $n$.

Alternatively, we can say that if $\{U_n\}$ is a countable collection of open dense sets, then $\bigcap_{n=1}^\infty U_n \ne \emptyset$ (for if $U_n$ is open and dense, then $X \setminus U_n$ is closed with empty interior).
Proof. We will prove that the intersection of a countable collection of open dense sets is non-empty. Let $\{U_n\}$ be a countable collection of open dense sets.

The key to proving this is completeness, since that is the only information we have. The idea is to construct a sequence, show that it is Cauchy, and prove that the limit is in the intersection.

Construct a sequence $x_n \in X$ and $\varepsilon_n > 0$ as follows: let $x_1, \varepsilon_1$ be defined such that $\overline{B(x_1, \varepsilon_1)} \subseteq U_1$. This exists because $U_1$ is open and dense: by density, there is some $x_1 \in U_1$, and $\varepsilon_1$ exists by openness.

We define the $x_n$ iteratively. Suppose we already have $x_n$ and $\varepsilon_n$. Define $x_{n+1}, \varepsilon_{n+1}$ such that $\overline{B(x_{n+1}, \varepsilon_{n+1})} \subseteq B(x_n, \varepsilon_n) \cap U_{n+1}$. Again, this is possible because $U_{n+1}$ is open and dense. Moreover, we choose our $\varepsilon_{n+1}$ such that $\varepsilon_{n+1} < \frac{1}{n}$, so that $\varepsilon_n \to 0$.

Since $\varepsilon_n \to 0$ and $x_m \in B(x_n, \varepsilon_n)$ for all $m > n$, we know that $x_n$ is a Cauchy sequence. By completeness of $X$, we can find an $x \in X$ such that $x_n \to x$. Since $x$ is the limit of the $x_m$ with $m > n$, we know that $x \in \overline{B(x_n, \varepsilon_n)}$ for all $n$. In particular, $x \in U_n$ for all $n$. So done.
2.2 Some applications

We are going to look at a few applications of the Baire category theorem.

Proposition. $\mathbb{R} \setminus \mathbb{Q} \ne \emptyset$, i.e. there is an irrational number.

Of course, we can also prove this directly by, say, showing that $\sqrt{2}$ is irrational, or by noting that $\mathbb{R}$ is uncountable but $\mathbb{Q}$ is not. However, we can also use the Baire category theorem.

Proof. Recall that we defined $\mathbb{R}$ to be the completion of $\mathbb{Q}$. So we just have to show that $\mathbb{Q}$ is not complete.

First, note that $\mathbb{Q}$ is countable. Also, for all $q \in \mathbb{Q}$, $\{q\}$ is closed and has empty interior. Hence
\[ \mathbb{Q} = \bigcup_{q \in \mathbb{Q}} \{q\} \]
is a countable union of nowhere dense sets. So it is not complete, by the Baire category theorem.
We will now show that there are normed vector spaces which are not Banach spaces.

Proposition. Let $\hat{\ell}^1$ be the normed vector space whose underlying vector space is
\[ V = \{(x_1, x_2, \cdots) : x_i \in \mathbb{R}, \exists I \in \mathbb{N} \text{ such that } i > I \Rightarrow x_i = 0\}, \]
with componentwise addition and scalar multiplication. This is the space of all sequences that are eventually zero. We define the norm by
\[ \|x\|_{\hat{\ell}^1} = \sum_{i=1}^\infty |x_i|. \]
Then $\hat{\ell}^1$ is not a Banach space.

Note that $\hat{\ell}^1$ is not standard notation.

Proof. Let
\[ E_n = \{x \in \hat{\ell}^1 : x_i = 0, \forall i \ge n\}. \]
By definition,
\[ \hat{\ell}^1 = \bigcup_{n=1}^\infty E_n. \]
We now show that each $E_n$ is nowhere dense. We first show that $E_n$ is closed. If $x^{(j)} \to x$ in $\hat{\ell}^1$ with $x^{(j)} \in E_n$, then since each $x^{(j)}$ is $0$ from the $n$th component onwards, $x$ is also $0$ from the $n$th component onwards. So we must have $x \in E_n$. So $E_n$ is closed.

We now show that $E_n$ has empty interior. We need to show that for all $x \in E_n$ and $\varepsilon > 0$, there is some $y \in \hat{\ell}^1$ such that $\|y - x\| < \varepsilon$ but $y \notin E_n$. This is also easy. Given $x = (x_1, \cdots, x_{n-1}, 0, 0, \cdots)$, we consider
\[ y = (x_1, \cdots, x_{n-1}, \varepsilon/2, 0, 0, \cdots). \]
Then $\|y - x\|_{\hat{\ell}^1} < \varepsilon$ but $y \notin E_n$. Hence by the Baire category theorem, $\hat{\ell}^1$ is not complete.
Proposition. There exists an $f \in C([0,1])$ which is nowhere differentiable.

Proof. (sketch) We want to show that the set of all continuous functions which are differentiable at at least one point is contained in a meagre subset of $C([0,1])$. Then this set cannot be all of $C([0,1])$, since $C([0,1])$ is complete.

Let $E_{m,n}$ be the set of all $f \in C([0,1])$ such that
\[ (\exists x)(\forall y)\quad 0 < |y - x| < \tfrac{1}{m} \Rightarrow |f(y) - f(x)| < n|y - x| \]
(where the quantifiers range over $[0,1]$).

We now show that
\[ \{f \in C([0,1]) : f \text{ is differentiable somewhere}\} \subseteq \bigcup_{m,n=1}^\infty E_{m,n}. \]
This is easy from the definition. Suppose $f$ is differentiable at $x_0$. Then by definition,
\[ \lim_{y \to x_0} \frac{f(y) - f(x_0)}{y - x_0} = f'(x_0). \]
Let $n \in \mathbb{N}$ be such that $|f'(x_0)| < n$. Then by definition of the limit, there is some $m$ such that whenever $0 < |y - x_0| < \frac{1}{m}$, we have
\[ \frac{|f(y) - f(x_0)|}{|y - x_0|} < n. \]
So $f \in E_{m,n}$.

Finally, we need to show that each $E_{m,n}$ is closed and has empty interior. This is left as an exercise for the reader.
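As an aside (not part of the proof above), one can exhibit a concrete nowhere-differentiable function: the classical Weierstrass function $W(x) = \sum_k a^k \cos(b^k \pi x)$ with $0 < a < 1$, $b$ an odd integer and $ab > 1 + \frac{3\pi}{2}$. Here is a minimal numerical sketch evaluating a truncation, where one can watch the difference quotients refuse to settle:

```python
# Hedged sketch: a truncated Weierstrass function. Its difference quotients
# around a point do not converge as h shrinks, hinting at non-differentiability.
import math

A, B, TERMS = 0.5, 13, 12       # ab = 6.5 > 1 + 3*pi/2, so the criterion holds

def W(x):
    return sum(A ** k * math.cos(B ** k * math.pi * x) for k in range(TERMS))

x0 = 0.3
for h in [1e-2, 1e-4, 1e-6]:
    print(h, (W(x0 + h) - W(x0)) / h)   # quotients fail to settle down
```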
Theorem (Banach–Steinhaus theorem / uniform boundedness principle). Let $V$ be a Banach space and $W$ a normed vector space. Suppose $\{T_\alpha\}$ is a collection of bounded linear maps $T_\alpha : V \to W$ such that for each fixed $v \in V$,
\[ \sup_\alpha \|T_\alpha(v)\|_W < \infty. \]
Then
\[ \sup_\alpha \|T_\alpha\|_{B(V, W)} < \infty. \]
This says that if the set of linear maps is pointwise bounded, then it is uniformly bounded.

Proof. Let
\[ E_n = \{v \in V : \sup_\alpha \|T_\alpha(v)\|_W \le n\}. \]
Then by our conditions,
\[ V = \bigcup_{n=1}^\infty E_n. \]
We can write each $E_n$ as
\[ E_n = \bigcap_\alpha \{v \in V : \|T_\alpha(v)\|_W \le n\}. \]
Since each $T_\alpha$ is bounded and hence continuous, $\{v \in V : \|T_\alpha(v)\|_W \le n\}$ is the continuous preimage of a closed set, and is hence closed. So $E_n$, being the intersection of closed sets, is closed.

By the Baire category theorem, there is some $n$ such that $E_n$ has non-empty interior. In particular, there exist $n$, $\varepsilon > 0$ and $v_0 \in V$ such that for all $v \in B(v_0, \varepsilon)$, we have
\[ \sup_\alpha \|T_\alpha(v)\|_W \le n. \]
Now consider an arbitrary $v'$ with $\|v'\|_V \le 1$. Then
\[ v_0 + \frac{\varepsilon}{2}v' \in B(v_0, \varepsilon). \]
So
\[ \sup_\alpha \left\| T_\alpha\left( v_0 + \frac{\varepsilon v'}{2} \right) \right\|_W \le n. \]
Therefore
\[ \sup_\alpha \|T_\alpha v'\|_W \le \frac{2}{\varepsilon}\left( n + \sup_\alpha \|T_\alpha v_0\| \right). \]
Note that the right hand side is independent of $v'$. So
\[ \sup_{\|v'\| \le 1} \sup_\alpha \|T_\alpha v'\|_W < \infty. \]
Note that this result is not true for general (non-linear) functions. For example, consider $f_n : [0, 1] \to \mathbb{R}$ given by [figure: $f_n$ is a bump of height $n$ supported between dyadic points near $\frac{1}{2^n}$ (the axis shows $\frac{1}{2^{n+2}}, \frac{1}{2^{n+1}}, \frac{1}{2^n}, \frac{1}{2^{n-1}}$); as $n$ increases the bumps get taller, narrower, and slide towards $0$]. Then for all $x \in [0, 1]$, we have
\[ \sup_n |f_n(x)| < \infty, \]
but
\[ \sup_n \sup_x |f_n(x)| = \infty. \]
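A minimal sketch of this counterexample (the precise bump shape below is an assumption, since the figure is lost; any continuous bump of height $n$ supported near $\frac{1}{2^n}$ works):

```python
# Hedged sketch: pointwise bounded but not uniformly bounded bump functions.
# f_n is assumed to be a triangular bump of height n on (1/2^(n+1), 1/2^(n-1)).
def f(n, x):
    left, peak, right = 2.0 ** -(n + 1), 2.0 ** -n, 2.0 ** -(n - 1)
    if left < x < peak:
        return n * (x - left) / (peak - left)
    if peak <= x < right:
        return n * (right - x) / (right - peak)
    return 0.0

x = 0.1
print(max(f(n, x) for n in range(1, 60)))        # finite at each fixed x
print([f(n, 2.0 ** -n) for n in (5, 10, 20)])    # sup_x |f_n| = n, unbounded
```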
However, by a proof very similar to what we had above, we have:

Theorem (Osgood). Let $f_n : [0, 1] \to \mathbb{R}$ be a sequence of continuous functions such that for all $x \in [0, 1]$,
\[ \sup_n |f_n(x)| < \infty. \]
Then there are some $a, b$ with $0 \le a < b \le 1$ such that
\[ \sup_n \sup_{x \in [a,b]} |f_n(x)| < \infty. \]
Proof. See example sheet.
Theorem (Open mapping theorem). Let $V$ and $W$ be Banach spaces and $T : V \to W$ a bounded surjective linear map. Then $T$ is an open map, i.e. $T(U)$ is an open subset of $W$ whenever $U$ is an open subset of $V$.

Note that surjectivity is necessary. If $q \notin T(V)$, then we can scale $q$ down arbitrarily and still not be in the image of $T$. So $T(V)$ does not contain an open neighbourhood of $0$, and hence cannot be open.
Proof. We can break our proof into three parts:

(i) We first want an easy way to check if a map is an open map. We want to show that $T$ is open if and only if $T(B_V(1)) \supseteq B_W(\varepsilon)$ for some $\varepsilon > 0$. Note that one direction is easy: if $T$ is open, then by definition $T(B_V(1))$ is open, and hence we can find the $\varepsilon$ required. So we are going to prove the other direction.

(ii) We show that $\overline{T(B_V(1))} \supseteq B_W(\varepsilon)$ for some $\varepsilon > 0$.

(iii) By rescaling the norm in $W$, we may wlog assume the $\varepsilon$ obtained above is in fact $1$. We then show that if $\overline{T(B_V(1))} \supseteq B_W(1)$, then $T(B_V(1)) \supseteq B_W(\frac{1}{2})$.

We now prove them one by one.

(i) Suppose $T(B_V(1)) \supseteq B_W(\varepsilon)$ for some $\varepsilon > 0$. Let $U \subseteq V$ be an open set. We want to show that $T(U)$ is open. So let $p \in U$, $q = Tp$.

Since $U$ is open, there is some $\delta > 0$ such that $B_V(p, \delta) \subseteq U$. We can also write the ball as $B_V(p, \delta) = p + B_V(\delta)$. Then we have
\begin{align*}
T(U) &\supseteq T(p + B_V(\delta)) \\
&= Tp + T(B_V(\delta)) \\
&= Tp + \delta T(B_V(1)) \\
&\supseteq q + \delta B_W(\varepsilon) \\
&= q + B_W(\delta\varepsilon) \\
&= B_W(q, \delta\varepsilon).
\end{align*}
So done.

(ii) This is the step where we use the Baire category theorem. Since $T$ is surjective, we can write $W$ as
\[ W = \bigcup_{n=1}^\infty T(B_V(n)) = \bigcup_{n=1}^\infty T(nB_V(1)) = \bigcup_{n=1}^\infty \overline{T(nB_V(1))}. \]
We have written $W$ as a countable union of closed sets. Since $W$ is a Banach space, by the Baire category theorem, there is some $n \ge 1$ such that $\overline{T(nB_V(1))}$ has non-empty interior. But since $\overline{T(nB_V(1))} = n\overline{T(B_V(1))}$, and multiplication by $n$ is a homeomorphism, it follows that $\overline{T(B_V(1))}$ has non-empty interior. So there is some $\varepsilon > 0$ and $w_0 \in W$ such that
\[ \overline{T(B_V(1))} \supseteq B_W(w_0, \varepsilon). \]
We have now found an open ball in the closure of the image, but we want a ball centred at the origin. We will use linearity in two ways. Firstly, if $v \in B_V(1)$, then $-v \in B_V(1)$. By linearity of $T$, we know that
\[ \overline{T(B_V(1))} \supseteq B_W(-w_0, \varepsilon). \]
Then by linearity, intuitively, since the closure contains the balls $B_W(w_0, \varepsilon)$ and $B_W(-w_0, \varepsilon)$, it must contain everything in between. In particular, it must contain $B_W(\varepsilon)$.

To prove this properly, we need some additional work. This would be easy if we had $T(B_V(1)) \supseteq B_W(w_0, \varepsilon)$ instead of the closure of it: for any $w \in B_W(\varepsilon)$, we let $v_1, v_2 \in B_V(1)$ be such that $T(v_1) = w_0 + w$, $T(v_2) = -w_0 + w$. Then $v = \frac{v_1 + v_2}{2}$ satisfies $\|v\|_V < 1$ and $T(v) = w$.

Since we have the closure instead, we need to mess with sequences. Since $\overline{T(B_V(1))} \supseteq \pm w_0 + B_W(\varepsilon)$, for any $w \in B_W(\varepsilon)$, we can find sequences $(v_i)$ and $(u_i)$ such that $\|v_i\|_V, \|u_i\|_V < 1$ for all $i$ and $T(v_i) \to w_0 + w$, $T(u_i) \to -w_0 + w$. Now by the triangle inequality, we get
\[ \left\| \frac{v_i + u_i}{2} \right\| < 1, \]
and we also have
\[ T\left( \frac{v_i + u_i}{2} \right) \to \frac{w_0 + w}{2} + \frac{-w_0 + w}{2} = w. \]
So $w \in \overline{T(B_V(1))}$. So $\overline{T(B_V(1))} \supseteq B_W(\varepsilon)$.

(iii) Let $w \in B_W(\frac{1}{2})$. For any $\delta$, we know
\[ \overline{T(B_V(\delta))} \supseteq B_W(\delta). \]
Thus, picking $\delta = \frac{1}{2}$, we can find some $v_1 \in V$ such that
\[ \|v_1\|_V < \frac{1}{2}, \quad \|Tv_1 - w\| < \frac{1}{4}. \]
Suppose we have recursively found $v_n$ such that
\[ \|v_n\|_V < \frac{1}{2^n}, \quad \|T(v_1 + \cdots + v_n) - w\| < \frac{1}{2^{n+1}}. \]
Then picking $\delta = \frac{1}{2^{n+1}}$, we can find $v_{n+1}$ satisfying the properties listed above. Then the partial sums of $\sum_{n=1}^\infty v_n$ form a Cauchy sequence, hence convergent by completeness. Let $v$ be the limit. Then
\[ \|v\|_V \le \sum_{i=1}^\infty \|v_i\|_V < 1. \]
Moreover, by continuity of $T$, we know $Tv = w$. So we are done.
Note that in this proof, we required both $V$ and $W$ to be Banach spaces. However, we used the completeness in different ways. We used the completeness of $V$ to extract a limit, but we just used the completeness of $W$ to say it is of second category. In particular, it suffices to assume the image of $T$ is of second category, instead of assuming surjectivity. Hence if we know that $T : V \to W$ is a bounded linear map such that $V$ is Banach and $\operatorname{im} T$ is of second category, then $T$ is open. As a consequence, $T$ has to be surjective (its image contains a small open ball which we can scale up arbitrarily).

We are now going to look at certain applications of the open mapping theorem.
Theorem (Inverse mapping theorem). Let $V, W$ be Banach spaces, and $T : V \to W$ a bounded linear map which is both injective and surjective. Then $T^{-1}$ exists and is a bounded linear map.

Proof. We know that $T^{-1}$ exists as a function between sets. It is also easy to show that it is linear, since $T$ is linear. By the open mapping theorem, $T(U)$ is open for all $U \subseteq V$ open. So $(T^{-1})^{-1}(U)$ is open for all $U \subseteq V$ open. By definition, $T^{-1}$ is continuous. Hence $T^{-1}$ is bounded, since boundedness and continuity are equivalent.
Theorem (Closed graph theorem). Let $V, W$ be Banach spaces, and $T : V \to W$ a linear map. If the graph of $T$ is closed, i.e.
\[ \Gamma(T) = \{(v, T(v)) : v \in V\} \subseteq V \times W \]
is a closed subset of the product space (using the norm $\|(v, w)\|_{V \times W} = \max\{\|v\|_V, \|w\|_W\}$), then $T$ is bounded.

What does this mean? Closedness of the graph means that if $v_n \to v$ in $V$ and $T(v_n) \to w$, then $w = T(v)$. What we want to show is continuity, which is the stronger statement that if $v_n \to v$, then $T(v_n)$ converges, and converges to $T(v)$.
Proof. Consider $\phi : \Gamma(T) \to V$ defined by $\phi(v, T(v)) = v$. We want to apply the inverse mapping theorem to this. To do so, we need to show a few things.

First we need to show that the spaces are Banach spaces. This is easy: $\Gamma(T)$ is a Banach space since it is a closed subset of a complete space, and we are already given that $V$ is Banach.

Now we need to show surjectivity and injectivity. The map is surjective since for any $v \in V$, we have $\phi(v, T(v)) = v$. It is also injective since the function $T$ is single-valued.

Finally, we want to show $\phi$ is bounded. This is since
\[ \|v\|_V \le \max\{\|v\|, \|T(v)\|\} = \|(v, T(v))\|_{\Gamma(T)}. \]
By the inverse mapping theorem, $\phi^{-1}$ is bounded, i.e. there is some $C > 0$ such that
\[ \max\{\|v\|_V, \|T(v)\|\} \le C\|v\|_V. \]
In particular, $\|T(v)\| \le C\|v\|_V$. So $T$ is bounded.
Example. We define $D(T)$ to be equal to $C^1([0,1])$ as a vector space, but equipped with the $C([0,1])$ norm instead. We seek to show that $D(T)$ is not complete.

To do so, consider the differentiation map
\[ T : D(T) \to C([0,1]). \]
First of all, this is unbounded. Indeed, consider the sequence of functions $f_n(x) = x^n$. Then $\|f_n\|_{C([0,1])} = 1$ for all $n$. However,
\[ \|Tf_n\|_{C([0,1])} = \sup_{x \in [0,1]} |nx^{n-1}| = n. \]
So $T$ is unbounded.

We claim the graph of $T$ is closed. If so, then since $C([0,1])$ is complete, the closed graph theorem implies $D(T)$ is not complete.

To check this, suppose we have $f_n \to f$ in the $C([0,1])$ norm, and $f_n' \to g$, again in the $C([0,1])$ norm. We want to show that $f' = g$.

By the fundamental theorem of calculus, we have
\[ f_n(t) = f_n(0) + \int_0^t f_n'(x) \, dx. \]
Hence by uniform convergence of $f_n' \to g$ and $f_n \to f$, we have
\[ f(t) = f(0) + \int_0^t g(x) \, dx. \]
So by the fundamental theorem of calculus, we know that $f'(t) = g(t)$. So the graph of $T$ is closed.

Since we know that $C([0,1])$ is complete, this shows that $D(T)$ is not complete.
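A minimal numerical sketch (not from the notes) of this unboundedness: on a grid, $\|f_n\|_{C([0,1])} = 1$ while $\|Tf_n\|_{C([0,1])} = n$.

```python
# Hedged sketch: the differentiation map is unbounded in the sup norm.
N = 1000
grid = [k / N for k in range(N + 1)]

for n in (1, 5, 25, 125):
    sup_f = max(x ** n for x in grid)             # = 1, attained at x = 1
    sup_df = max(n * x ** (n - 1) for x in grid)  # = n, attained at x = 1
    print(n, sup_f, sup_df)
```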
Example. We can also use the Baire category theorem to understand Fourier series. Let $f : S^1 \to \mathbb{R}$ be continuous, i.e. $f : [-\pi, \pi] \to \mathbb{R}$ continuous with periodic boundary condition $f(-\pi) = f(\pi)$. We define the Fourier coefficients $\hat{f} : \mathbb{Z} \to \mathbb{C}$ by
\[ \hat{f}(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-ikx} f(x) \, dx. \]
We define the Fourier series by
\[ \sum_{k \in \mathbb{Z}} e^{ikx} \hat{f}(k). \]
In particular, we define the partial sums as
\[ S_n(f)(x) = \sum_{k=-n}^{n} e^{ikx} \hat{f}(k). \]
The question we want to ask is: does the Fourier series converge? We are not even asking if it converges back to $f$, just if it converges at all. More concretely, we want to know if $S_n(f)(x)$ has a limit as $n \to \infty$ for $f$ continuous.

Unfortunately, no. We can show that there exists a continuous function $f$ such that $S_n(f)(x)$ diverges. To show this, we consider $\phi_n : C(S^1) \to \mathbb{R}$ defined by $\phi_n(f) = S_n(f)(0)$. Assume that
\[ \sup_n |\phi_n(f)| \]
is finite for all continuous $f$. By the Banach–Steinhaus theorem, we have
\[ \sup_n \|\phi_n\|_{B(C(S^1), \mathbb{R})} < \infty. \]
On the other hand, we can show that
\[ \phi_n(f) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) \frac{\sin\left(\left(n + \frac{1}{2}\right)x\right)}{\sin\frac{x}{2}} \, dx. \]
It thus suffices to find a sequence $f_n$ such that $\|f_n\|_{C(S^1)} \le 1$ but
\[ \int_{-\pi}^{\pi} f_n(x) \frac{\sin\left(\left(n + \frac{1}{2}\right)x\right)}{\sin\frac{x}{2}} \, dx \to \infty, \]
which gives a contradiction. Details are left for the example sheet.

What's the role of the Banach–Steinhaus theorem here? If we wanted to prove the result directly, then we would need to find a single function $f \in C(S^1)$ such that $\phi_n(f)$ is unbounded. However, now we just have to find a sequence $f_n \in C(S^1)$ such that $\phi_n(f_n) \to \infty$. This is much easier.
There is another thing we can ask. Note that if $f$ is continuous, then $|\hat{f}(k)| \to 0$ as $k \to \pm\infty$. In fact, this is even true if $f \in L^1(S^1)$, i.e. $f$ is Lebesgue integrable and
\[ \int_{-\pi}^{\pi} |f(x)| \, dx < \infty. \]
A classical question is: do all sequences $\{a_k\} \subseteq \mathbb{C}$ with $|a_k| \to 0$ as $k \to \pm\infty$ arise as the Fourier coefficients of some $f \in L^1$? The answer is no. We let $\tilde{c}_0$ be the set of all such sequences. By the inverse mapping theorem, if the map $\phi : L^1(S^1) \to \tilde{c}_0$ sending $f$ to $\hat{f}$ were surjective, then the inverse would be bounded. But this is not true, since we can find a sequence $f_n$ such that $\|f_n\|_{L^1(S^1)} \to \infty$ but $\sup_\ell |\hat{f}_n(\ell)| \le 1$ for all $n$. Details are again left for the reader.
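Both counterexamples above ultimately rest on the growth of the $L^1$ norms of the Dirichlet kernels $D_n(x) = \frac{\sin((n + 1/2)x)}{\sin(x/2)}$, which grow like $\log n$. A minimal numerical sketch (not from the notes):

```python
# Hedged sketch: Riemann-sum approximation of (1/2pi) * integral of |D_n|.
import math

def dirichlet_L1(n, steps=200001):
    h = 2 * math.pi / (steps - 1)
    total = 0.0
    for k in range(steps):
        x = -math.pi + k * h
        if abs(x) < 1e-12:
            total += (2 * n + 1) * h       # limiting value of |D_n| at x = 0
        else:
            total += abs(math.sin((n + 0.5) * x) / math.sin(x / 2)) * h
    return total / (2 * math.pi)

for n in (1, 10, 100, 1000):
    print(n, dirichlet_L1(n))              # increases without bound, ~log n
```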
3 The topology of C(K)

Before we start the chapter, it helps to understand the title. In particular, what is $C(K)$? In this chapter, $K$ will denote a compact Hausdorff topological space. We will first define what it means for a space to be Hausdorff.

Definition (Hausdorff space). A topological space $X$ is Hausdorff if for all distinct $p, q \in X$, there are open subsets $U_p, U_q \subseteq X$ such that $p \in U_p$, $q \in U_q$ and $U_p \cap U_q = \emptyset$.

Example. Every metric space is Hausdorff.

What we want to look at here is compact Hausdorff spaces.

Example. $[0, 1]$ is a compact Hausdorff space.

Notation. $C(K)$ is the set of continuous functions $f : K \to \mathbb{R}$ with the norm
\[ \|f\|_{C(K)} = \sup_{x \in K} |f(x)|. \]
There are three themes we will discuss:

(i) There are many functions in $C(K)$. For example, we will show that given a finite set of points $\{p_i\}_{i=1}^n \subseteq K$ and $\{y_i\}_{i=1}^n \subseteq \mathbb{R}$, there is some $f \in C(K)$ such that $f(p_i) = y_i$. We will prove this later. Note that this is trivial for $C([0,1])$, since we can use piecewise linear functions. However, this is not easy to prove if $K$ is a general compact Hausdorff space. In fact, we can prove a much stronger statement, known as the Tietze–Urysohn theorem.

(ii) Elements of $C(K)$ can be approximated by nice functions. This should be thought of as a generalization of the Weierstrass approximation theorem, which states that polynomials are dense in $C([0,1])$, i.e. every continuous function can be approximated uniformly to arbitrary accuracy by polynomials.

(iii) Compact subsets of $C(K)$. One question we would like to understand is: given a sequence of functions $\{f_k\}_{k=1}^\infty \subseteq C(K)$, when can we extract a convergent subsequence?
3.1 Normality of compact Hausdorff spaces
At this point, it is helpful to introduce a new class of topological spaces.
Definition
(Normal space)
.
A topological space
X
is normal if for every disjoint
pair of closed subsets
C
1
, C
2
of
X
, there exists
U
1
, U
2
X
disjoint open such
that C
1
U
1
, C
2
U
2
.
This is similar to being Hausdorff, except that instead of requiring the ability
to separate points, we want the ability to separate closed subsets.
In general, one makes the following definition:
Definition ($T_i$ space). A topological space $X$ has the $T_1$ property if for all $x, y \in X$ with $x \neq y$, there exists $U \subseteq X$ open such that $x \in U$ and $y \notin U$.
A topological space $X$ has the $T_2$ property if $X$ is Hausdorff.
A topological space $X$ has the $T_3$ property if for any $x \in X$ and $C \subseteq X$ closed with $x \notin C$, there are disjoint open $U_x, U_C$ such that $x \in U_x$, $C \subseteq U_C$. These spaces are called regular.
A topological space $X$ has the $T_4$ property if $X$ is normal.
Note here that $T_4$ and $T_1$ together imply $T_2$. It suffices to notice that $T_1$ implies that $\{x\}$ is a closed set for each $x$: for every $y \neq x$, let $U_y$ be open with $y \in U_y$ and $x \notin U_y$. Then we can write
\[ X \setminus \{x\} = \bigcup_{y \neq x} U_y, \]
which is open since it is a union of open sets.
More importantly, we have the following theorem:
Theorem. Let $X$ be a Hausdorff space. If $C_1, C_2 \subseteq X$ are compact disjoint subsets, then there are disjoint open $U_1, U_2 \subseteq X$ such that $C_1 \subseteq U_1$, $C_2 \subseteq U_2$.
In particular, if $X$ is a compact Hausdorff space, then $X$ is normal (since closed subsets of compact spaces are compact).
Proof. Since $C_1$ and $C_2$ are disjoint, by the Hausdorff property, for every $p \in C_1$ and $q \in C_2$, there are disjoint open $U_{p,q}, V_{p,q} \subseteq X$ with $p \in U_{p,q}$, $q \in V_{p,q}$.
Now fix some $p$. Then $\bigcup_{q \in C_2} V_{p,q} \supseteq C_2$ is an open cover. Since $C_2$ is compact, there is a finite subcover, say
\[ C_2 \subseteq \bigcup_{i=1}^n V_{p,q_i} \quad \text{for some } \{q_1, \cdots, q_n\} \subseteq C_2. \]
Note that $n$ and the $q_i$ depend on which $p$ we picked at the beginning.
Define
\[ U_p = \bigcap_{i=1}^n U_{p,q_i}, \quad V_p = \bigcup_{i=1}^n V_{p,q_i}. \]
Since these are finite intersections and unions, $U_p$ and $V_p$ are open. Also, $U_p$ and $V_p$ are disjoint. We also know that $C_2 \subseteq V_p$.
Now note that $\bigcup_{p \in C_1} U_p \supseteq C_1$ is an open cover. By compactness of $C_1$, there is a finite subcover, say
\[ C_1 \subseteq \bigcup_{j=1}^m U_{p_j} \quad \text{for some } \{p_1, \cdots, p_m\} \subseteq C_1. \]
Now define
\[ U = \bigcup_{j=1}^m U_{p_j}, \quad V = \bigcap_{j=1}^m V_{p_j}. \]
Then $U$ and $V$ are disjoint open with $C_1 \subseteq U$, $C_2 \subseteq V$. So done.
Why do we care about this? It turns out it is easier to discuss continuous
functions on normal spaces rather than Hausdorff spaces. Hence, if we are given
that a space is compact Hausdorff (e.g. [0, 1]), then we know it is normal.
3.2 Tietze--Urysohn extension theorem
The objective of this chapter is to show that if we have a continuous function defined on a closed subset of a normal space, then we can extend it to the whole of the space.
We start with a special case of this theorem.
Lemma (Urysohn's lemma). Let $X$ be normal and $C_0, C_1$ be disjoint closed subsets of $X$. Then there is an $f \in C(X)$ such that $f|_{C_0} = 0$ and $f|_{C_1} = 1$, and $0 \leq f(x) \leq 1$ for all $x \in X$.
Before we prove this, let's look at a "stupid" example. Let $X = [-1, 2]$. This is compact Hausdorff. We let $C_0 = [-1, 0]$ and $C_1 = [1, 2]$. To construct the function $f$, we do the obvious thing:
(picture: the piecewise linear function that is $0$ on $[-1, 0]$, increases linearly from $0$ to $1$ on $[0, 1]$, and is $1$ on $[1, 2]$)
We can define this function $f$ (in $[0, 1]$) by
\[ f(x) = \inf\left\{\frac{a}{2^n} : a, n \in \mathbb{N},\ 0 \leq a \leq 2^n,\ x \leq \frac{a}{2^n}\right\}. \]
This is obviously a rather silly way to write our function out. However, this is
what we will end up doing in the proof below. So keep this in mind for now.
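As a quick sanity check (an illustration only), one can evaluate this infimum numerically up to a finite dyadic resolution and confirm that it reproduces the piecewise linear function above:

```python
def f(x, max_n=12):
    # inf over dyadic rationals a / 2^n in [0, 1] with x <= a / 2^n;
    # the infimum over the empty set is taken to be 1.
    best = 1.0
    for n in range(max_n + 1):
        for a in range(2**n + 1):
            q = a / 2**n
            if x <= q:
                best = min(best, q)
    return best

for x in [-0.5, 0.0, 0.3, 0.75, 1.0, 1.7]:
    print(x, f(x))   # prints 0, 0, ~0.3, 0.75, 1.0, 1.0
```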
Proof. In this proof, all subsets labelled $C$ are closed, and all subsets labelled $U$ are open.
First note that normality is equivalent to the following: suppose $C \subseteq U \subseteq X$, where $U$ is open and $C$ is closed. Then there are some $\tilde{C}$ closed and $\tilde{U}$ open such that
\[ C \subseteq \tilde{U} \subseteq \tilde{C} \subseteq U. \]
We start by defining $U_1 = X \setminus C_1$. Since $C_0$ and $C_1$ are disjoint, we know that $C_0 \subseteq U_1$. By normality, there exist $C_{1/2}$ and $U_{1/2}$ such that
\[ C_0 \subseteq U_{1/2} \subseteq C_{1/2} \subseteq U_1. \]
Then we can find $C_{1/4}, C_{3/4}, U_{1/4}, U_{3/4}$ such that
\[ C_0 \subseteq U_{1/4} \subseteq C_{1/4} \subseteq U_{1/2} \subseteq C_{1/2} \subseteq U_{3/4} \subseteq C_{3/4} \subseteq U_1. \]
Iterating this, we get that for all dyadic rationals $q = \frac{a}{2^n}$, $a, n \in \mathbb{N}$, $0 < a < 2^n$, there are some $U_q$ open and $C_q$ closed such that $U_q \subseteq C_q$, with $C_q \subseteq U_{q'}$ if $q < q'$.
We now define $f$ by
\[ f(x) = \inf\{q \in (0, 1] \text{ dyadic rational} : x \in U_q\}, \]
with the understanding that $\inf \emptyset = 1$. We now check the properties desired.
By definition, we have $0 \leq f(x) \leq 1$.
If $x \in C_0$, then $x \in U_q$ for all $q$. So $f(x) = 0$.
If $x \in C_1$, then $x \notin U_q$ for all $q$. So $f(x) = 1$.
To show $f$ is continuous, it suffices to check that $\{x : f(x) > \alpha\}$ and $\{x : f(x) < \alpha\}$ are open for all $\alpha \in \mathbb{R}$, as this shows that the pre-images of all open intervals in $\mathbb{R}$ are open. We know that
\[ f(x) < \alpha \iff \inf\{q : x \in U_q\} < \alpha \iff (\exists q < \alpha)\ x \in U_q \iff x \in \bigcup_{q < \alpha} U_q. \]
Hence we have
\[ \{x : f(x) < \alpha\} = \bigcup_{q < \alpha} U_q, \]
which is open, since each $U_q$ is open. Similarly, we know that
\[ f(x) > \alpha \iff \inf\{q : x \in U_q\} > \alpha \iff (\exists q > \alpha)\ x \notin C_q \iff x \in \bigcup_{q > \alpha} X \setminus C_q. \]
Since this is a union of complements of closed sets, this is open.
With this, we can already say that there are many continuous functions. We can just pick some values on some $C_0, C_1$, and then get a continuous function out of it. However, we can make a stronger statement.
Theorem (Tietze--Urysohn extension theorem). Let $X$ be a normal topological space, and $C \subseteq X$ be a closed subset. Suppose $f: C \to \mathbb{R}$ is a continuous function. Then there exists an extension $\tilde{f}: X \to \mathbb{R}$ which is continuous and satisfies $\tilde{f}|_C = f$ and $\|\tilde{f}\|_{C(X)} = \|f\|_{C(C)}$.
This is in some sense similar to the Hahn--Banach theorem, which states that we can extend linear maps to larger spaces.
Note that this implies Urysohn's lemma, since if $C_0$ and $C_1$ are disjoint closed sets, then $C_0 \cup C_1$ is closed, and the function on $C_0 \cup C_1$ that is $0$ on $C_0$ and $1$ on $C_1$ is continuous. However, we cannot be lazy and skip the proof of Urysohn's lemma, because the proof of this theorem relies on it.
Proof. The idea is to repeatedly use Urysohn's lemma to get better and better approximations. We can assume wlog that $0 \leq f(x) \leq 1$ for all $x \in C$. Otherwise, we just translate and rescale our function. Moreover, we can assume that $\sup_{x \in C} f(x) = 1$. It suffices to find $\tilde{f}: X \to \mathbb{R}$ with $\tilde{f}|_C = f$ and $0 \leq \tilde{f}(x) \leq 1$ for all $x \in X$.
We define sequences of continuous functions $f_i: C \to \mathbb{R}$ and $g_i: X \to \mathbb{R}$ for $i \in \mathbb{N}$. We want to think of the sums $\sum_{i=0}^n g_i$ as the approximations, and $f_{n+1}$ as the error on $C$.
Let $f_0 = f$. This is the error we have when we approximate with the zero function.
We first define $g_0$ on a subset of $X$ by
\[ g_0(x) = \begin{cases} 0 & x \in f_0^{-1}\left[0, \frac{1}{3}\right] \\ \frac{1}{3} & x \in f_0^{-1}\left[\frac{2}{3}, 1\right] \end{cases}. \]
We can then extend this to the whole of $X$ with $0 \leq g_0(x) \leq \frac{1}{3}$ for all $x$ by Urysohn's lemma.
(picture: the graph of $f$ together with the two-level function $g_0$ taking values $0$ and $\frac{1}{3}$, with the thresholds at heights $\frac{1}{3}$ and $\frac{2}{3}$ marked)
We define
\[ f_1 = f_0 - g_0|_C. \]
By construction, we know that $0 \leq f_1 \leq \frac{2}{3}$. This is our first approximation. Note that we have now lowered our maximum error from $1$ to $\frac{2}{3}$. We now repeat this.
Given $f_i: C \to \mathbb{R}$ with $0 \leq f_i \leq \left(\frac{2}{3}\right)^i$, we define $g_i$ by requiring
\[ g_i(x) = \begin{cases} 0 & x \in f_i^{-1}\left[0, \frac{1}{3}\left(\frac{2}{3}\right)^i\right] \\ \frac{1}{3}\left(\frac{2}{3}\right)^i & x \in f_i^{-1}\left[\left(\frac{2}{3}\right)^{i+1}, \left(\frac{2}{3}\right)^i\right] \end{cases}, \]
and then extending to the whole of $X$ with $0 \leq g_i \leq \frac{1}{3}\left(\frac{2}{3}\right)^i$ and $g_i$ continuous. Again, this exists by Urysohn's lemma. We then define $f_{i+1} = f_i - g_i|_C$.
We then have
\[ \sum_{i=0}^n g_i\bigg|_C = (f_0 - f_1) + (f_1 - f_2) + \cdots + (f_n - f_{n+1}) = f - f_{n+1}. \]
We also know that
\[ 0 \leq f_{i+1} \leq \left(\frac{2}{3}\right)^{i+1}. \]
We conclude by letting
\[ \tilde{f} = \sum_{i=0}^\infty g_i. \]
This exists because we have the bounds
\[ 0 \leq g_i \leq \frac{1}{3}\left(\frac{2}{3}\right)^i, \]
and hence the partial sums $\sum_{i=0}^n g_i$ are Cauchy in $C(X)$. So the limit exists and is continuous by the completeness of $C(X)$.
Now we check that
\[ \sum_{i=0}^n g_i\bigg|_C - f = -f_{n+1}. \]
Since $\|f_{n+1}\|_{C(C)} \to 0$, we know that
\[ \sum_{i=0}^\infty g_i\bigg|_C = \tilde{f}|_C = f. \]
Finally, we check the bounds. We need to show that $0 \leq \tilde{f}(x) \leq 1$. This is true since $g_i \geq 0$ for all $i$, and also
\[ |\tilde{f}(x)| \leq \sum_{i=0}^\infty g_i(x) \leq \sum_{i=0}^\infty \frac{1}{3}\left(\frac{2}{3}\right)^i = 1. \]
So done.
We can already show what was stated last time: if $K$ is compact Hausdorff, $\{p_1, \cdots, p_n\} \subseteq K$ is a finite set of points, and $\{y_1, \cdots, y_n\} \subseteq \mathbb{R}$, then there exists $f: K \to \mathbb{R}$ continuous such that $f(p_i) = y_i$. This is since compact Hausdorff spaces are normal, and singleton points are closed sets in Hausdorff spaces. In fact, we can prove this directly with Urysohn's lemma, by, say, requesting functions $f_i$ such that $f_i(p_i) = y_i$ and $f_i(p_j) = 0$ for $i \neq j$. Then we just sum all the $f_i$.
Note that normality is necessary for Urysohn's lemma. Since Urysohn's lemma is a special case of the Tietze--Urysohn extension theorem, normality is also necessary for the Tietze--Urysohn extension theorem. In fact, the lemma is equivalent to normality. Let $C_0, C_1$ be disjoint closed sets of $X$. If there is some $f: X \to \mathbb{R}$ such that $f|_{C_0} = 0$ and $f|_{C_1} = 1$, then
\[ U_0 = f^{-1}\left(-\infty, \frac{1}{3}\right), \quad U_1 = f^{-1}\left(\frac{2}{3}, \infty\right) \]
are open disjoint sets such that $C_0 \subseteq U_0$, $C_1 \subseteq U_1$.
Closedness of $C_0$ and $C_1$ is also necessary in Urysohn's lemma. For example, we cannot extend $f: [0, \frac{1}{2}) \cup (\frac{1}{2}, 1] \to \mathbb{R}$ to $[0, 1]$ continuously, where $f$ is defined as
\[ f(x) = \begin{cases} 0 & x < \frac{1}{2} \\ 1 & x > \frac{1}{2} \end{cases}. \]
3.3 Arzelà--Ascoli theorem
Let $K$ be compact Hausdorff, and $\{f_n\}_{n=1}^\infty$ be a sequence of continuous functions $f_n: K \to \mathbb{R}$ (or $\mathbb{C}$). When does $(f_n)$ have a convergent subsequence in $C(K)$? In other words, when is there a subsequence which converges uniformly?
This will be answered by the Arzelà--Ascoli theorem. Before we get to that, we look at some examples.
Example. Let $K = [0, 1]$ and $f_n(x) = n$. This does not have a convergent subsequence in $C(K)$, since it does not even have a subsequence that converges pointwise. This is since $f_n$ is unbounded.
We see that unboundedness is one "enemy" that prevents us from having a convergent subsequence.
Example. We again let $K = [0, 1]$, and let $f_n$ be the piecewise linear function with $f_n(0) = 1$, $f_n(x) = 0$ for $x \geq \frac{1}{n}$, and $f_n$ linear in between.
We know that $f_n$ does not have a convergent subsequence in $C(K)$, since any subsequence must converge pointwise to
\[ f(x) = \begin{cases} 0 & x \neq 0 \\ 1 & x = 0 \end{cases}, \]
which is not continuous.
What is happening here? For every $n$, fixed $x$ and every $\varepsilon$, by continuity of $f_n$, there is some $\delta$ such that $|x - y| < \delta$ implies $|f_n(x) - f_n(y)| < \varepsilon$, but this choice of $\delta$ depends on $n$, and there is no universal choice that works for us. This is another problem that leads to the lack of a limit.
The Arzelà--Ascoli theorem tells us these are the only two "enemies" that prevent us from extracting a convergent subsequence.
To state this theorem, we first need a definition.
Definition (Equicontinuous). Let $K$ be a topological space, and $F \subseteq C(K)$. We say $F$ is equicontinuous at $x \in K$ if for every $\varepsilon > 0$, there is some $U$ which is an open neighbourhood of $x$ such that
\[ (\forall f \in F)(\forall y \in U) \quad |f(y) - f(x)| < \varepsilon. \]
We say $F$ is equicontinuous if it is equicontinuous at $x$ for all $x \in K$.
Theorem (Arzelà--Ascoli theorem). Let $K$ be a compact topological space. Then $F \subseteq C(K)$ is pre-compact, i.e. $\bar{F}$ is compact, if and only if $F$ is bounded and equicontinuous.
This indeed applies to the problem of extracting a uniformly convergent subsequence, since $C(K)$ is a metric space, and compactness is equivalent to sequential compactness. Indeed, let $(f_n)$ be a bounded and equicontinuous sequence in $C(K)$. Then $F = \{f_n : n \in \mathbb{N}\} \subseteq C(K)$ is bounded and equicontinuous. So it is pre-compact, and hence $(f_n)$, being a sequence in $\bar{F}$, has a convergent subsequence.
To prove this, it helps to introduce some more terminology and a few lemmas
first.
Definition ($\varepsilon$-net). Let $X$ be a metric space, and let $E \subseteq X$. For $\varepsilon > 0$, we say that $N \subseteq X$ is an $\varepsilon$-net for $E$ if and only if $\bigcup_{x \in N} B(x, \varepsilon) \supseteq E$.
Definition (Totally bounded subset). Let $X$ be a metric space, and $E \subseteq X$. We say that $E$ is totally bounded if for every $\varepsilon > 0$, there is a finite $\varepsilon$-net $N_\varepsilon$ for $E$.
An important result about totally bounded subsets is the following:
Proposition. Let $X$ be a complete metric space. Then $E \subseteq X$ is totally bounded if and only if for every sequence $\{y_i\}_{i=1}^\infty \subseteq E$, there is a subsequence which is Cauchy.
By completeness, we can rewrite this as
Corollary. Let $X$ be a complete metric space. Then $E \subseteq X$ is totally bounded if and only if $\bar{E}$ is compact.
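Before proving these, here is a small computational aside (a sketch, assuming a Euclidean ambient metric): a finite $\varepsilon$-net, when one exists, can be found greedily, and the greedy procedure is exactly the dichotomy in the proofs below — either the loop terminates with a finite net, or it keeps producing points pairwise at distance $\geq \varepsilon$:

```python
import numpy as np

def greedy_net(E, eps):
    # Keep a point only if it is not already covered by a ball B(c, eps)
    # around a previously kept point; the kept points form an eps-net of E,
    # and they are pairwise at distance >= eps from each other.
    net = []
    for x in E:
        if all(np.linalg.norm(x - c) >= eps for c in net):
            net.append(x)
    return net

rng = np.random.default_rng(0)
E = rng.uniform(-1, 1, size=(2000, 2))    # a bounded subset of R^2
print(len(greedy_net(E, 0.25)))           # a small finite net
```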
We'll prove these later. For now, we assume this corollary and prove the Arzelà--Ascoli theorem.
Theorem (Arzelà--Ascoli theorem). Let $K$ be a compact topological space. Then $F \subseteq C(K)$ is pre-compact, i.e. $\bar{F}$ is compact, if and only if $F$ is bounded and equicontinuous.
Proof. By the previous corollary, it suffices to prove that $F$ is totally bounded if and only if $F$ is bounded and equicontinuous. We first do the boring direction.
($\Rightarrow$) Suppose $F$ is totally bounded. First notice that $F$ is obviously bounded, since $F$ can be covered by finitely many $\varepsilon$-balls, which must be bounded.
Now we show $F$ is equicontinuous. Let $\varepsilon > 0$. Since $F$ is totally bounded, there exists a finite $\varepsilon$-net for $F$, i.e. there is some $\{f_1, \cdots, f_n\} \subseteq F$ such that for every $f \in F$, there exists an $i \in \{1, \cdots, n\}$ such that $\|f - f_i\|_{C(K)} < \varepsilon$.
Consider a point $x \in K$. Since $f_1, \cdots, f_n$ are continuous, for each $i$ there exists a neighbourhood $U_i$ of $x$ such that $|f_i(y) - f_i(x)| < \varepsilon$ for all $y \in U_i$. Let
\[ U = \bigcap_{i=1}^n U_i. \]
Since this is a finite intersection, $U$ is open. Then for any $f \in F$ and $y \in U$, we can find some $i$ such that $\|f - f_i\|_{C(K)} < \varepsilon$. So
\[ |f(y) - f(x)| \leq |f(y) - f_i(y)| + |f_i(y) - f_i(x)| + |f_i(x) - f(x)| < 3\varepsilon. \]
So $F$ is equicontinuous at $x$. Since $x$ was arbitrary, $F$ is equicontinuous.
($\Leftarrow$) Suppose $F$ is bounded and equicontinuous. Let $\varepsilon > 0$. By equicontinuity, for every $x \in K$, there is some neighbourhood $U_x$ of $x$ such that $|f(y) - f(x)| < \varepsilon$ for all $y \in U_x$, $f \in F$. Obviously, we have
\[ \bigcup_{x \in K} U_x = K. \]
By the compactness of $K$, there are some $\{x_1, \cdots, x_n\}$ such that
\[ \bigcup_{i=1}^n U_{x_i} \supseteq K. \]
Consider the restriction of functions in $F$ to these points. This can be viewed as a bounded subset of $\ell_\infty^n$, the $n$-dimensional normed vector space with the supremum norm. Since this is finite-dimensional, boundedness implies total boundedness (due to, say, the compactness of the closed unit ball). In other words, there is a finite $\varepsilon$-net $\{f_1, \cdots, f_m\}$ such that for every $f \in F$, there is a $j \in \{1, \cdots, m\}$ such that
\[ \max_i |f(x_i) - f_j(x_i)| < \varepsilon. \]
Then for every $f \in F$, pick an $f_j$ such that the above holds. Then
\[ \|f - f_j\|_{C(K)} = \sup_y |f(y) - f_j(y)|. \]
Since $\{U_{x_i}\}$ covers $K$, we can write this as
\[ = \max_i \sup_{y \in U_{x_i}} |f(y) - f_j(y)| \leq \max_i \sup_{y \in U_{x_i}} \left(|f(y) - f(x_i)| + |f(x_i) - f_j(x_i)| + |f_j(x_i) - f_j(y)|\right) < \varepsilon + \varepsilon + \varepsilon = 3\varepsilon. \]
So done.
Now we return to prove the proposition we just used to prove Arzelà--Ascoli.
Proposition. Let $X$ be a (complete) metric space. Then $E \subseteq X$ is totally bounded if and only if for every sequence $\{y_i\}_{i=1}^\infty \subseteq E$, there is a subsequence which is Cauchy.
Proof. ($\Rightarrow$) Let $E \subseteq X$ be totally bounded, and $\{y_i\} \subseteq E$. For every $j \in \mathbb{N}$, there exists a finite $\frac{1}{j}$-net, call it $N_j$.
Now since $N_1$ is finite, there is some $x_1 \in N_1$ such that there are infinitely many $y_i$'s in $B(x_1, 1)$. Pick the first $y_i$ in $B(x_1, 1)$ and call it $y_{i_1}$.
Now there is some $x_2 \in N_2$ such that there are infinitely many $y_i$'s in $B(x_1, 1) \cap B(x_2, \frac{1}{2})$. Pick the one with smallest value of $i > i_1$, and call this $y_{i_2}$. Continue till infinity.
This procedure gives a sequence $x_j \in N_j$ and a subsequence $\{y_{i_k}\}$, and also
\[ y_{i_n} \in \bigcap_{j=1}^n B\left(x_j, \frac{1}{j}\right). \]
It is easy to see that $\{y_{i_n}\}$ is Cauchy, since if $m > n$, then $d(y_{i_m}, y_{i_n}) < \frac{2}{n}$.
($\Leftarrow$) Suppose $E$ is not totally bounded. So there is some $\varepsilon > 0$ for which there is no finite $\varepsilon$-net. Pick any $y_1$. Pick $y_2$ such that $d(y_1, y_2) \geq \varepsilon$. This exists because there is no finite $\varepsilon$-net.
Now given $y_1, \cdots, y_n$ such that $d(y_i, y_j) \geq \varepsilon$ for all $i, j = 1, \cdots, n$, $i \neq j$, we pick $y_{n+1}$ such that $d(y_{n+1}, y_j) \geq \varepsilon$ for all $j = 1, \cdots, n$. Again, this exists because there is no finite $\varepsilon$-net. Then clearly no subsequence of $\{y_n\}$ is Cauchy.
Note that the first part is similar to the proof of Bolzano--Weierstrass in $\mathbb{R}^n$ by repeated bisection.
Recall that at the beginning of the chapter, we have seen that the boundedness and equicontinuity assumptions are necessary. The compactness of $K$ is also important. Let $X = \mathbb{R}$, which is not compact, and let $\varphi \in C_c^\infty(\mathbb{R})$ be an infinitely differentiable function with compact support, say, a bump function.
(picture: a bump function supported on a bounded interval)
We now let $f_n(x) = \varphi(x - n)$, i.e. we shift our bump function to the right by $n$ units. This sequence is clearly bounded and equicontinuous, but it has no convergent subsequence: $f_n$ converges pointwise to the zero function, but the convergence is not uniform, and this is true for arbitrary subsequences as well.
We are going to look at some applications of the theorem:
Example. Let $K \subseteq \mathbb{R}$ be compact, and $\{f_n\}_{n=1}^\infty$ be a sequence of continuously differentiable functions in $C(K)$ such that
\[ \sup_x \sup_n \left(|f_n(x)| + |f_n'(x)|\right) < C \]
for some $C$. Then there is a convergent subsequence. We clearly have uniform boundedness. To obtain equicontinuity, since the derivative is bounded, by the mean value theorem, we have
\[ \frac{|f_n(x) - f_n(y)|}{|x - y|} = |f_n'(z)| \leq C \]
for some $z$ between $x$ and $y$. So
\[ |f_n(x) - f_n(y)| \leq C|x - y|. \]
Consider the ordinary differential equation $x' = f(x)$ with the boundary condition $x(0) = x_0 \in \mathbb{R}^n$. Recall from IB Analysis II that the Picard--Lindelöf theorem says that if $f$ is a Lipschitz function, then there exists some $\varepsilon > 0$ such that the ODE has a unique solution in $(-\varepsilon, \varepsilon)$.
What if $f$ is not Lipschitz? If so, we can get the following:
Theorem (Peano*). Given $f$ continuous, there is some $\varepsilon > 0$ such that $x' = f(x)$ with boundary condition $x(0) = x_0 \in \mathbb{R}$ has a solution in $(-\varepsilon, \varepsilon)$.
Note that uniqueness is false. For example, if $x' = \sqrt{|x|}$ and $x(0) = 0$, then $x(t) = 0$ and $x(t) = \frac{t^2}{4}$ (for $t \geq 0$) are both solutions.
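A hedged numerical illustration of this non-uniqueness (not part of the theorem): Euler's method applied to $x' = \sqrt{|x|}$ starting exactly at $0$ follows the zero solution, while an arbitrarily small perturbation of the initial value makes it shadow the solution $\frac{t^2}{4}$:

```python
import numpy as np

def euler(x0, t_end=2.0, h=1e-4):
    # Explicit Euler for x' = sqrt(|x|), x(0) = x0.
    x = x0
    for _ in range(int(t_end / h)):
        x += h * np.sqrt(abs(x))
    return x

print(euler(0.0))     # exactly 0: the zero solution
print(euler(1e-12))   # roughly t_end**2 / 4 = 1.0: the other solution
```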
Proof. (sketch) We approximate $f$ by a sequence of continuously differentiable functions $f_n$ such that $\|f - f_n\|_{C(K)} \to 0$ for a suitable compact $K \subseteq \mathbb{R}$. We use Picard--Lindelöf to get a solution for each $n$. Then we use the ODE to get estimates for the solutions. Finally, we can use Arzelà--Ascoli to extract a limit as $n \to \infty$. We can then show it is indeed a solution.
3.4 Stone–Weierstrass theorem
Here we will prove the Stone–Weierstrass theorem, which is a generalization of the classical Weierstrass approximation theorem.
Theorem (Weierstrass approximation theorem). The set of polynomials is dense in $C([0, 1])$.
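One classical proof of this theorem (due to Bernstein, and not the route taken below) is constructive: the Bernstein polynomials $B_n(f)(x) = \sum_{k=0}^n f\left(\frac{k}{n}\right)\binom{n}{k}x^k(1-x)^{n-k}$ converge uniformly to $f$. A short numerical sketch (illustration only):

```python
import numpy as np
from math import comb

def bernstein(f, n, x):
    # B_n(f)(x) = sum_k f(k/n) * C(n, k) * x^k * (1 - x)^(n - k)
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)            # continuous, but not a polynomial
x = np.linspace(0, 1, 1001)
for n in [5, 50, 500]:
    print(n, np.abs(bernstein(f, n, x) - f(x)).max())  # sup error shrinks
```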
This tells us that $C([0, 1])$ is not too "big", because it has a dense subset of "nice" functions.
We will try to generalize this to more general domains. Note that for this section, real-valued and complex-valued continuous functions behave somewhat differently. So we will write these as $C_\mathbb{R}(K)$ and $C_\mathbb{C}(K)$.
To state this theorem, we need some definitions.
Definition (Algebra). A vector space $(V, +)$ is called an algebra if there is an operation (called multiplication) $\cdot\,: V \times V \to V$ such that $(V, +, \cdot)$ is a rng (i.e. a ring not necessarily with multiplicative identity). Also, $\lambda(v \cdot w) = (\lambda v) \cdot w = v \cdot (\lambda w)$ for all $\lambda \in \mathbb{F}$, $v, w \in V$.
If $V$ is in addition a normed vector space and
\[ \|v \cdot w\|_V \leq \|v\|_V \cdot \|w\|_V \]
for all $v, w \in V$, then we say $V$ is a normed algebra.
If $V$ is a complete normed algebra, we say $V$ is a Banach algebra.
If $V$ is an algebra that is commutative as a rng, then we say $V$ is a commutative algebra.
If $V$ is an algebra with multiplicative identity, then $V$ is a unital algebra.
Example. C(K) is a commutative, unital, Banach algebra.
Example. Recall from the example sheets that if $T_1, T_2: V \to V$, then
\[ \|T_1 T_2\|_{B(V,V)} \leq \|T_1\|_{B(V,V)} \|T_2\|_{B(V,V)}. \]
So $B(V, V)$ is a unital normed algebra.
We will need this language to generalize the Weierstrass approximation
theorem. The main problem in doing so is that we have to figure out what we
can generalize polynomials to. This is why we need these funny definitions.
Theorem (Stone--Weierstrass theorem). Let $K$ be compact, and $A \subseteq C_\mathbb{R}(K)$ be a subalgebra (i.e. a subset closed under the operations) which separates points, i.e. for every distinct $x, y \in K$, there exists some $f \in A$ such that $f(x) \neq f(y)$. Then either $\bar{A} = C_\mathbb{R}(K)$ or there is some $x_0 \in K$ such that
\[ \bar{A} = \{f \in C_\mathbb{R}(K) : f(x_0) = 0\}. \]
Note that it is not possible that every $f \in \bar{A}$ vanishes at $2$ or more points, since $A$ separates points.
This indeed implies the Weierstrass approximation theorem, since polynomials separate points (consider the polynomial $x$), and the polynomial $1$ is never $0$ for any $x$. In fact, this also works for polynomials on compact subsets of $\mathbb{R}^n$.
Note, however, that the second case of the Stone--Weierstrass theorem can actually happen. For example, consider $K = [0, 1]$, which is compact, and let $A$ be the algebra generated by $x$. Then
\[ \bar{A} = \{f \in C_\mathbb{R}(K) : f(0) = 0\}. \]
We will prove this using two lemmas:
Lemma. Let $K$ be compact, and let $L \subseteq C_\mathbb{R}(K)$ be a subset which is closed under taking maximum and minimum, i.e. if $f, g \in L$, then $\max\{f, g\} \in L$ and $\min\{f, g\} \in L$ (with $\max\{f, g\}$ defined by $\max\{f, g\}(x) = \max\{f(x), g(x)\}$, and similarly for minimum).
Given $g \in C_\mathbb{R}(K)$, assume further that for any $\varepsilon > 0$ and $x, y \in K$, there exists $f_{x,y} \in L$ such that
\[ |f_{x,y}(x) - g(x)| < \varepsilon, \quad |f_{x,y}(y) - g(y)| < \varepsilon. \]
Then there exists some $f \in L$ such that
\[ \|f - g\|_{C_\mathbb{R}(K)} < \varepsilon, \]
i.e. $g \in \bar{L}$.
This means that if we are allowed to take maxima and minima, then to be able to approximate a function, we just need to be able to approximate it at any two points.
The idea is to next show that if $A$ is a subalgebra, then $\bar{A}$ is closed under taking maximum and minimum. Then we use the ability to separate points to find the $f_{x,y}$, and prove that we can approximate arbitrary functions.
Proof. Let $g \in C_\mathbb{R}(K)$ and $\varepsilon > 0$ be given. So for every $x, y \in K$, there is some $f_{x,y} \in L$ such that
\[ |f_{x,y}(x) - g(x)| < \varepsilon, \quad |f_{x,y}(y) - g(y)| < \varepsilon. \]
Claim. For each $x \in K$, there exists $f_x \in L$ such that $|f_x(x) - g(x)| < \varepsilon$ and $f_x(z) < g(z) + \varepsilon$ for all $z \in K$.
Since $f_{x,y}$ is continuous, there is some open $U_{x,y}$ containing $y$ such that
\[ |f_{x,y}(z) - g(z)| < \varepsilon \]
for all $z \in U_{x,y}$. Since
\[ \bigcup_{y \in K} U_{x,y} \supseteq K, \]
by compactness of $K$, there exist some $y_1, \cdots, y_n$ such that
\[ \bigcup_{i=1}^n U_{x,y_i} \supseteq K. \]
(picture: the graphs of $f_{x,y_1}$, $f_{x,y_2}$, $f_{x,y_3}$, each lying below $g + \varepsilon$ near the corresponding $y_i$, with the point $x$ marked)
We then let
\[ f_x(z) = \min\{f_{x,y_1}(z), \cdots, f_{x,y_n}(z)\} \]
for every $z \in K$. We then see that this works. Indeed, by assumption, $f_x \in L$. If $z \in K$ is some arbitrary point, then $z \in U_{x,y_i}$ for some $i$. Then
\[ f_{x,y_i}(z) < g(z) + \varepsilon. \]
Hence, since $f_x$ is the minimum of all such $f_{x,y_i}$, for any $z$, we have
\[ f_x(z) < g(z) + \varepsilon. \]
The property at $x$ is also clear.
Claim. There exists $f \in L$ such that $|f(z) - g(z)| < \varepsilon$ for all $z \in K$.
We are going to play the same game with this. By continuity of $f_x$, there is an open $V_x$ containing $x$ such that
\[ |f_x(w) - g(w)| < \varepsilon \]
for all $w \in V_x$. Since
\[ \bigcup_{x \in K} V_x \supseteq K, \]
by compactness of $K$, there is some $\{x_1, \cdots, x_m\}$ such that
\[ \bigcup_{j=1}^m V_{x_j} \supseteq K. \]
Define
\[ f(z) = \max\{f_{x_1}(z), \cdots, f_{x_m}(z)\}. \]
Again, by assumption, $f \in L$. Then we know that
\[ f(z) > g(z) - \varepsilon. \]
We still have our first bound
\[ f(z) < g(z) + \varepsilon. \]
Therefore we have
\[ \|f - g\|_{C_\mathbb{R}(K)} < \varepsilon. \]
Lemma. Let $A \subseteq C_\mathbb{R}(K)$ be a subalgebra that is a closed subset in the topology of $C_\mathbb{R}(K)$. Then $A$ is closed under taking maximum and minimum.
Proof. First note that
\[ \max\{f(x), g(x)\} = \frac{1}{2}(f(x) + g(x)) + \frac{1}{2}|f(x) - g(x)|, \]
\[ \min\{f(x), g(x)\} = \frac{1}{2}(f(x) + g(x)) - \frac{1}{2}|f(x) - g(x)|. \]
Since $A$ is an algebra, it suffices to show that $f \in A$ implies $|f| \in A$ for every $f$ such that $\|f\|_{C_\mathbb{R}(K)} \leq 1$.
The key observation is the following: consider the function $h(x) = \sqrt{x + \varepsilon^2}$. Then $h(x^2)$ approximates $|x|$. This has the property that the Taylor expansion of $h(x)$ centered at $x = \frac{1}{2}$ is uniformly convergent for $x \in [0, 1]$. Therefore there exists a polynomial $S(x)$ such that
\[ |S(x) - \sqrt{x + \varepsilon^2}| < \varepsilon \]
for $x \in [0, 1]$.
Now note that $S(x) - S(0)$ is a polynomial with no constant term. Therefore, since $A$ is an algebra, if $f \in A$, then $S(f^2) - S(0) \in A$ by closure.
Now look at
\[ \||f| - (S(f^2) - S(0))\|_{C_\mathbb{R}(K)} \leq \||f| - \sqrt{f^2 + \varepsilon^2}\| + \|\sqrt{f^2 + \varepsilon^2} - S(f^2)\| + \|S(0)\|. \]
We will make each individual term small. For the first term, note that
\[ \sup_{x \in [0,1]} |x - \sqrt{x^2 + \varepsilon^2}| = \sup_{x \in [0,1]} \frac{\varepsilon^2}{x + \sqrt{x^2 + \varepsilon^2}} \leq \varepsilon. \]
So the first term is at most $\varepsilon$. The second term is also easy, since $S$ is chosen such that $|S(x) - \sqrt{x + \varepsilon^2}| < \varepsilon$ for $x \in [0, 1]$, and $|f(x)^2| \leq 1$ for all $x$. So it is again bounded by $\varepsilon$.
By the same bound, $|S(0) - \sqrt{0 + \varepsilon^2}| < \varepsilon$. So $|S(0)| < 2\varepsilon$. So
\[ \||f| - (S(f^2) - S(0))\|_{C_\mathbb{R}(K)} < 4\varepsilon. \]
Since $\varepsilon > 0$ was arbitrary and $A$ is closed in the topology of $C_\mathbb{R}(K)$, $f \in A$ with $\|f\|_{C_\mathbb{R}(K)} \leq 1$ implies $|f| \in A$.
Note that if we have already proven the classical Weierstrass approximation theorem, we can just use it to get a polynomial approximation for $|f|$ directly.
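To see the trick concretely, here is a numerical sketch, using a least-squares fit as a stand-in for the truncated Taylor series of the proof (an assumption of this illustration, not part of the argument): a polynomial $S$ with $S(t) \approx \sqrt{t + \varepsilon^2}$ on $[0, 1]$ yields $S(f^2) - S(0) \approx |f|$ using only the algebra operations.

```python
import numpy as np

eps = 0.05
t = np.linspace(0, 1, 2001)
# S is a degree-20 polynomial with S(t) ~ sqrt(t + eps^2) on [0, 1].
S = np.polynomial.Polynomial.fit(t, np.sqrt(t + eps**2), deg=20)

x = np.linspace(-1, 1, 2001)    # stand-in for the values f(x), with |f| <= 1
approx = S(x**2) - S(0)         # a polynomial in f with zero constant term
print(np.abs(approx - np.abs(x)).max())   # small uniform error
```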
We will now combine both lemmas and prove the Stone--Weierstrass theorem.
Theorem (Stone--Weierstrass theorem). Let $K$ be compact, and $A \subseteq C_\mathbb{R}(K)$ be a subalgebra which separates points. Then either $\bar{A} = C_\mathbb{R}(K)$ or there is some $x_0 \in K$ such that
\[ \bar{A} = \{f \in C_\mathbb{R}(K) : f(x_0) = 0\}. \]
Proof. Note that there are two possible outcomes. We will first look at the first possibility.
Consider the case where for all $x \in K$, there is some $f \in A$ such that $f(x) \neq 0$. Let $g \in C_\mathbb{R}(K)$ be given. By our previous lemmas, to approximate $g$ in $\bar{A}$, we just need to show that we can approximate $g$ at two points. So given any $\varepsilon > 0$ and $x, y \in K$, we want to find $f_{x,y} \in A$ such that
\[ |f_{x,y}(x) - g(x)| < \varepsilon, \quad |f_{x,y}(y) - g(y)| < \varepsilon. \tag{$*$} \]
For every $x, y \in K$ with $x \neq y$, we first show that there exists $h_{x,y} \in A$ such that $h_{x,y}(x) \neq 0$ and $h_{x,y}(x) \neq h_{x,y}(y)$. This is easy to see. By our assumptions, we can find the following functions:
(i) There exists $h^{(1)}_{x,y}$ such that $h^{(1)}_{x,y}(x) \neq h^{(1)}_{x,y}(y)$.
(ii) There exists $h^{(2)}_{x,y}$ such that $h^{(2)}_{x,y}(x) \neq 0$.
(iii) There exists $h^{(3)}_{x,y}$ such that $h^{(3)}_{x,y}(y) \neq 0$.
Then it is an easy exercise to show that some linear combination of $h^{(1)}_{x,y}$, $h^{(2)}_{x,y}$ and $h^{(3)}_{x,y}$ works, say $h_{x,y}$.
We want to find our $f_{x,y}$ satisfying ($*$). But we will do better. We will make it equal $g$ at $x$ and $y$. The idea is to take linear combinations of $h_{x,y}$ and $h_{x,y}^2$. Instead of doing the messy algebra to show that we can find a working linear combination, just notice that $(h_{x,y}(x), h_{x,y}(y))$ and $(h_{x,y}(x)^2, h_{x,y}(y)^2)$ are linearly independent vectors in $\mathbb{R}^2$. Therefore there exist $\alpha, \beta \in \mathbb{R}$ such that
\[ \alpha(h_{x,y}(x), h_{x,y}(y)) + \beta(h_{x,y}(x)^2, h_{x,y}(y)^2) = (g(x), g(y)). \]
So done.
In the other case, suppose there is $x_0 \in K$ such that $f(x_0) = 0$ for all $f \in A$. Consider the algebra
\[ A_0 = A + \lambda 1 = \{f + \lambda 1 : f \in A, \lambda \in \mathbb{R}\}. \]
Since $A$ separates points, and for any $x \in K$ there is some $f \in A_0$ such that $f(x) \neq 0$ (e.g. $f = 1$), by the previous part, we know that
\[ \bar{A}_0 = C_\mathbb{R}(K). \]
Now note that
\[ \bar{A} \subseteq \{f \in C_\mathbb{R}(K) : f(x_0) = 0\} = B. \]
So it suffices to show that we have equality, i.e. for any $g \in B$ and $\varepsilon > 0$, there is some $f \in A$ such that
\[ \|f - g\|_{C_\mathbb{R}(K)} < \varepsilon. \]
Since $\bar{A}_0 = C_\mathbb{R}(K)$, given such $g$ and $\varepsilon$, there is some $f \in A$ and $\lambda \in \mathbb{R}$ such that
\[ \|g - (f + \lambda 1)\|_{C_\mathbb{R}(K)} < \varepsilon. \]
But $g(x_0) = f(x_0) = 0$, which implies that $|\lambda| < \varepsilon$. Therefore
\[ \|g - f\|_{C_\mathbb{R}(K)} < 2\varepsilon. \]
So done.
What happens for $C_\mathbb{C}(K)$?
Example. Let $K = \bar{B}(0, 1) \subseteq \mathbb{C}$, and let $A$ be the set of polynomials on $\bar{B}(0, 1)$. We will show that $\bar{A} \neq C_\mathbb{C}(\bar{B}(0, 1))$.
Consider $f(z) = \bar{z}$. This is not in the closure of $A$, since this is not holomorphic, but the uniform limit of any sequence of holomorphic functions is holomorphic (by Morera's theorem, i.e. $f$ is holomorphic iff $\oint_\gamma f(z) \,\mathrm{d}z = 0$ for all closed piecewise smooth curves $\gamma$).
Hence we need an additional condition on this. It turns out that this rather simple example is the only way in which things can break. We have the following:
Theorem (Complex version of Stone--Weierstrass theorem). Let $K$ be compact and $A \subseteq C_\mathbb{C}(K)$ be a subalgebra over $\mathbb{C}$ which separates points and is closed under complex conjugation (i.e. if $f \in A$, then $\bar{f} \in A$). Then either $\bar{A} = C_\mathbb{C}(K)$ or there is an $x_0$ such that
\[ \bar{A} = \{f \in C_\mathbb{C}(K) : f(x_0) = 0\}. \]
Proof. It suffices to show that either $\bar{A} \supseteq C_\mathbb{R}(K)$ or there exists a point $x_0$ such that $\bar{A} \supseteq \{f \in C_\mathbb{R}(K) : f(x_0) = 0\}$, since we can always break a complex function up into its real and imaginary parts.
Now consider
\[ A_0 = \left\{\frac{f + \bar{f}}{2} : f \in A\right\} \cup \left\{\frac{f - \bar{f}}{2i} : f \in A\right\}. \]
Now note that by closure of $A$ under conjugation, we know that $A_0$ is a subset of $A$ and is a subalgebra of $C_\mathbb{R}(K)$ over $\mathbb{R}$, which separates points. Hence by the real version of Stone--Weierstrass, either $\bar{A}_0 = C_\mathbb{R}(K)$ or there is some $x_0$ such that $\bar{A}_0 = \{f \in C_\mathbb{R}(K) : f(x_0) = 0\}$. So done.
4 Hilbert spaces
4.1 Inner product spaces
We have just looked at continuous functions on some compact space $K$. Another important space is the space of square-integrable functions. Consider the space
\[ L^2(\mathbb{R}) = \left\{f : f \text{ is Lebesgue measurable}, \int |f|^2 < \infty\right\}/{\sim}, \]
where $f \sim g$ if $f = g$ Lebesgue almost everywhere.
One thing we like to think about is the Fourier series. Recall that for $f \in C(S^1)$, we have defined, for each $k \in \mathbb{Z}$,
\[ \hat{f}(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-ikx} f(x) \,\mathrm{d}x, \]
and we have defined the partial sums
\[ S_N(f)(x) = \sum_{k=-N}^{N} \hat{f}(k) e^{ikx}. \]
We have previously seen that even if $f$ is continuous, it is possible that the partial sums $S_N$ do not converge, even pointwise. However, we can ask for something weaker:
Proposition. Let $f \in C(S^1)$. Then
\[ \lim_{N \to \infty} \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x) - S_N(f)(x)|^2 \,\mathrm{d}x = 0. \]
We will prove this later. However, the key point of the proof is the "orthogonality" of $\{e^{inx}\}$. More precisely, we have
\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{inx} e^{-imx} \,\mathrm{d}x = 0 \quad \text{if } m \neq n. \]
The introduction of Hilbert spaces is in particular a way to put this in a general framework. We want to introduce an extra structure that gives rise to "orthogonality".
Definition (Inner product). Let $V$ be a vector space over $\mathbb{R}$ or $\mathbb{C}$. We say $p: V \times V \to \mathbb{R}$ or $\mathbb{C}$ is an inner product on $V$ if it satisfies
(i) $p(v, w) = \overline{p(w, v)}$ for all $v, w \in V$. (conjugate symmetry)
(ii) $p(\lambda_1 v_1 + \lambda_2 v_2, u) = \lambda_1 p(v_1, u) + \lambda_2 p(v_2, u)$. (linearity in first argument)
(iii) $p(v, v) \geq 0$ for all $v \in V$, and equality holds iff $v = 0$. (non-negativity)
We will often denote an inner product by $p(v, w) = \langle v, w\rangle$. We call $(V, \langle \cdot, \cdot\rangle)$ an inner product space.
Definition (Orthogonality). In an inner product space, $v$ and $w$ are orthogonal if $\langle v, w\rangle = 0$.
Orthogonality and the inner product are important when dealing with vector spaces. For example, recall that when working with finite-dimensional spaces, we had things like Hermitian matrices, orthogonal matrices and normal matrices. All these are in some sense defined in terms of the inner product and orthogonality. More fundamentally, when we have a finite-dimensional vector space, we often write the vectors as a set of $n$ coordinates. To define this coordinate system, we start by picking $n$ orthogonal vectors (which are in fact orthonormal), and then the coordinates are just the projections onto these orthogonal vectors.
Hopefully, you are convinced that inner products are important. So let's see what we can get if we put inner products on arbitrary vector spaces.
We will look at some easy properties of the inner product.
Proposition (Cauchy--Schwarz inequality). Let $(V, \langle \cdot, \cdot\rangle)$ be an inner product space. Then for all $v, w \in V$,
\[ |\langle v, w\rangle| \leq \sqrt{\langle v, v\rangle \langle w, w\rangle}, \]
with equality iff there is some $\lambda \in \mathbb{R}$ or $\mathbb{C}$ such that $v = \lambda w$ or $w = \lambda v$.
Proof. wlog, we can assume $w \neq 0$. Otherwise, this is trivial. Moreover, assume $\langle v, w\rangle \in \mathbb{R}$. Otherwise, we can just multiply $w$ by some $e^{i\theta}$.
By non-negativity, we know that for all $t \in \mathbb{R}$, we have
\[ 0 \leq \langle v + tw, v + tw\rangle = \langle v, v\rangle + 2t\langle v, w\rangle + t^2 \langle w, w\rangle. \]
Therefore, the discriminant of this quadratic polynomial in $t$ is non-positive, i.e.
\[ 4\langle v, w\rangle^2 - 4\langle v, v\rangle \langle w, w\rangle \leq 0, \]
from which the result follows.
Finally, note that if equality holds, then the discriminant is $0$. So the quadratic has exactly one root. So there exists $t$ such that $v + tw = 0$, which of course implies $v = -tw$.
Proposition. Let $(V, \langle \cdot, \cdot\rangle)$ be an inner product space. Then
\[ \|v\| = \sqrt{\langle v, v\rangle} \]
defines a norm.
Proof. The first two axioms of the norm are easy to check, since it follows directly from the definition of the inner product that $\|v\| \geq 0$ with equality iff $v = 0$, and $\|\lambda v\| = |\lambda|\|v\|$.
The only non-trivial thing to check is the triangle inequality. We have
\[ \|v + w\|^2 = \langle v + w, v + w\rangle = \|v\|^2 + \|w\|^2 + \langle v, w\rangle + \langle w, v\rangle \leq \|v\|^2 + \|w\|^2 + 2\|v\|\|w\| = (\|v\| + \|w\|)^2. \]
Hence we know that $\|v + w\| \leq \|v\| + \|w\|$.
This motivates the following definition:
Definition (Euclidean space). A normed vector space $(V, \|\cdot\|)$ is a Euclidean space if there exists an inner product $\langle \cdot, \cdot\rangle$ such that
\[ \|v\| = \sqrt{\langle v, v\rangle}. \]
Proposition. Let $(E, \|\cdot\|)$ be a Euclidean space. Then there is a unique inner product $\langle \cdot, \cdot\rangle$ such that $\|v\| = \sqrt{\langle v, v\rangle}$.
Proof. The real and complex cases are slightly different.
First suppose $E$ is a vector space over $\mathbb{R}$, and suppose also that we have an inner product $\langle \cdot, \cdot\rangle$ such that $\|v\| = \sqrt{\langle v, v\rangle}$. Then
\[ \langle v + w, v + w\rangle = \|v\|^2 + 2\langle v, w\rangle + \|w\|^2. \]
So we get
\[ \langle v, w\rangle = \frac{1}{2}\left(\|v + w\|^2 - \|v\|^2 - \|w\|^2\right). \tag{$*$} \]
In particular, the inner product is completely determined by the norm. So it must be unique.
Now suppose $E$ is a vector space over $\mathbb{C}$. We have
\begin{align*}
\langle v + w, v + w\rangle &= \|v\|^2 + \|w\|^2 + \langle v, w\rangle + \langle w, v\rangle \tag{1} \\
\langle v - w, v - w\rangle &= \|v\|^2 + \|w\|^2 - \langle v, w\rangle - \langle w, v\rangle \tag{2} \\
\langle v + iw, v + iw\rangle &= \|v\|^2 + \|w\|^2 - i\langle v, w\rangle + i\langle w, v\rangle \tag{3} \\
\langle v - iw, v - iw\rangle &= \|v\|^2 + \|w\|^2 + i\langle v, w\rangle - i\langle w, v\rangle \tag{4}
\end{align*}
Now consider $(1) - (2) + i(3) - i(4)$. Then we obtain
\[ \|v + w\|^2 - \|v - w\|^2 + i\|v + iw\|^2 - i\|v - iw\|^2 = 4\langle v, w\rangle. \tag{$\dagger$} \]
So again $\langle v, w\rangle$ is determined by the norm.
The identities $(*)$ and $(\dagger)$ are sometimes known as the polarization identities.
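A quick numerical check of the complex polarization identity $(\dagger)$ (an illustration only), using the standard inner product $\langle v, w\rangle = \sum_i v_i \bar{w}_i$ on $\mathbb{C}^4$:

```python
import numpy as np

rng = np.random.default_rng(1)
v = rng.normal(size=4) + 1j * rng.normal(size=4)
w = rng.normal(size=4) + 1j * rng.normal(size=4)

n2 = lambda u: np.linalg.norm(u) ** 2    # ||u||^2
polar = (n2(v + w) - n2(v - w) + 1j * n2(v + 1j * w)
         - 1j * n2(v - 1j * w)) / 4
print(polar)
print(np.vdot(w, v))   # np.vdot(w, v) = sum_i v_i * conj(w_i) = <v, w>
```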
Definition (Hilbert space). A Euclidean space $(E, \|\cdot\|)$ is a Hilbert space if it is complete.
We will prove certain properties of the inner product.
Proposition (Parallelogram law). Let $(E, \|\cdot\|)$ be a Euclidean space. Then for $v, w \in E$, we have
\[ \|v - w\|^2 + \|v + w\|^2 = 2\|v\|^2 + 2\|w\|^2. \]
This is called the parallelogram law because it says that for any parallelogram, the sum of the squares of the lengths of the diagonals is the sum of the squares of the lengths of the four sides.
(picture: a parallelogram with sides $v$, $w$ and diagonals $v + w$, $v - w$)
Proof. This is just simple algebraic manipulation. We have
\begin{align*}
\|v - w\|^2 + \|v + w\|^2 &= \langle v - w, v - w\rangle + \langle v + w, v + w\rangle \\
&= \langle v, v\rangle - \langle v, w\rangle - \langle w, v\rangle + \langle w, w\rangle \\
&\quad + \langle v, v\rangle + \langle v, w\rangle + \langle w, v\rangle + \langle w, w\rangle \\
&= 2\langle v, v\rangle + 2\langle w, w\rangle.
\end{align*}
Proposition (Pythagoras theorem). Let $(E, \|\cdot\|)$ be a Euclidean space, and let $v, w \in E$ be orthogonal. Then
\[ \|v + w\|^2 = \|v\|^2 + \|w\|^2. \]
Proof.
\[ \|v + w\|^2 = \langle v + w, v + w\rangle = \langle v, v\rangle + \langle v, w\rangle + \langle w, v\rangle + \langle w, w\rangle = \langle v, v\rangle + 0 + 0 + \langle w, w\rangle = \|v\|^2 + \|w\|^2. \]
By induction, if $v_i \in E$ for $i = 1, \cdots, n$ are such that $\langle v_i, v_j\rangle = 0$ for $i \neq j$, i.e. they are mutually orthogonal, then
\[ \left\|\sum_{i=1}^n v_i\right\|^2 = \sum_{i=1}^n \|v_i\|^2. \]
Proposition. Let $(E, \|\cdot\|)$ be a Euclidean space. Then $\langle \cdot, \cdot\rangle: E \times E \to \mathbb{C}$ is continuous.
Proof. Let $(v, w) \in E \times E$ and $(\tilde{v}, \tilde{w}) \in E \times E$. We have
\begin{align*}
|\langle v, w\rangle - \langle \tilde{v}, \tilde{w}\rangle| &= |\langle v, w\rangle - \langle v, \tilde{w}\rangle + \langle v, \tilde{w}\rangle - \langle \tilde{v}, \tilde{w}\rangle| \\
&\leq |\langle v, w\rangle - \langle v, \tilde{w}\rangle| + |\langle v, \tilde{w}\rangle - \langle \tilde{v}, \tilde{w}\rangle| \\
&= |\langle v, w - \tilde{w}\rangle| + |\langle v - \tilde{v}, \tilde{w}\rangle| \\
&\leq \|v\|\|w - \tilde{w}\| + \|v - \tilde{v}\|\|\tilde{w}\|.
\end{align*}
Hence for $(v, w)$ sufficiently close to $(\tilde{v}, \tilde{w})$, we can get $|\langle v, w\rangle - \langle \tilde{v}, \tilde{w}\rangle|$ arbitrarily small. So it is continuous.
When we have an incomplete Euclidean space, we can of course take the completion of it to form a complete extension of the original normed vector space. However, it is not immediately obvious that the inner product can also be extended to the completion to give a Hilbert space. The following proposition tells us we can do so.
Proposition. Let $(E, \|\cdot\|)$ denote a Euclidean space, and $\bar{E}$ its completion. Then the inner product extends to an inner product on $\bar{E}$, turning $\bar{E}$ into a Hilbert space.
Proof. Recall we constructed the completion of a space as the equivalence classes of Cauchy sequences (where two Cauchy sequences $(x_n)$ and $(x_n')$ are equivalent if $\|x_n - x_n'\| \to 0$). Let $(x_n), (y_n)$ be two Cauchy sequences in $E$, and let $\tilde{x}, \tilde{y} \in \bar{E}$ denote their equivalence classes. We define the inner product as
\[ \langle \tilde{x}, \tilde{y}\rangle = \lim_{n \to \infty} \langle x_n, y_n\rangle. \tag{$*$} \]
We want to show this is well-defined. Firstly, we need to make sure the limit exists. We can show this by showing that $(\langle x_n, y_n\rangle)$ is a Cauchy sequence of scalars. We have
\begin{align*}
|\langle x_n, y_n\rangle - \langle x_m, y_m\rangle| &= |\langle x_n, y_n\rangle - \langle x_m, y_n\rangle + \langle x_m, y_n\rangle - \langle x_m, y_m\rangle| \\
&\leq |\langle x_n - x_m, y_n\rangle| + |\langle x_m, y_n - y_m\rangle| \\
&\leq \|x_n - x_m\|\|y_n\| + \|x_m\|\|y_n - y_m\|.
\end{align*}
Since Cauchy sequences are bounded, $\langle x_n, y_n\rangle$ is a Cauchy sequence, as $(x_n)$ and $(y_n)$ are.
We also need to show that $(*)$ does not depend on the representatives for $\tilde{x}$ and $\tilde{y}$. This is left as an exercise for the reader.
We also need to show that $\langle \cdot, \cdot\rangle_{\bar{E}}$ induces the norm $\|\cdot\|_{\bar{E}}$, which is yet another exercise.
Example. Consider the space
\[ \ell_2 = \left\{(x_1, x_2, \cdots) : x_i \in \mathbb{C}, \sum_{i=1}^\infty |x_i|^2 < \infty\right\}. \]
We already know that this is a complete Banach space. We can also define an inner product on this space by
\[ \langle a, b\rangle_{\ell_2} = \sum_{i=1}^\infty a_i \bar{b}_i. \]
We need to check that this actually converges. We prove this by showing absolute convergence. For each $n$, we can use Cauchy--Schwarz to obtain
\[ \sum_{i=1}^n |a_i \bar{b}_i| \leq \left(\sum_{i=1}^n |a_i|^2\right)^{\frac{1}{2}} \left(\sum_{i=1}^n |b_i|^2\right)^{\frac{1}{2}} \leq \|a\|_{\ell_2} \|b\|_{\ell_2}. \]
So it converges. Now notice that the $\ell_2$ norm is indeed induced by this inner product.
This is a significant example, since we will later show that every separable (i.e. having a countable basis) infinite-dimensional Hilbert space is isometrically isomorphic to $\ell_2$.
Definition (Orthogonal space). Let $E$ be a Euclidean space and $S \subseteq E$ an arbitrary subset. Then the orthogonal space of $S$, denoted by $S^\perp$, is given by
\[ S^\perp = \{v \in E : \forall w \in S, \langle v, w\rangle = 0\}. \]
Proposition. Let $E$ be a Euclidean space and $S \subseteq E$. Then $S^\perp$ is a closed subspace of $E$, and moreover
\[ S^\perp = (\operatorname{span} S)^\perp. \]
Proof. We first show it is a subspace. Let $u, v \in S^\perp$ and $\lambda, \mu \in \mathbb{C}$. We want to show $\lambda u + \mu v \in S^\perp$. Let $w \in S$. Then
\[ \langle \lambda u + \mu v, w\rangle = \lambda\langle u, w\rangle + \mu\langle v, w\rangle = 0. \]
To show it is closed, let $u_n \in S^\perp$ be a sequence such that $u_n \to u \in E$. Let $w \in S$. Then we know that
\[ \langle u_n, w\rangle = 0. \]
Hence, by the continuity of the inner product, we have
\[ 0 = \lim_{n \to \infty} \langle u_n, w\rangle = \langle \lim u_n, w\rangle = \langle u, w\rangle. \]
The remaining part is left as an exercise.
Note that if $V$ is a linear subspace, then $V \cap V^\perp = \{0\}$, since any $v \in V \cap V^\perp$ has to satisfy $\langle v, v\rangle = 0$. So $V + V^\perp$ is a direct sum.
Theorem. Let $(E, \|\cdot\|)$ be a Euclidean space, and $F \subseteq E$ a complete subspace. Then $F \oplus F^\perp = E$.
Hence, by definition of the direct sum, for $x \in E$, we can write $x = x_1 + x_2$, where $x_1 \in F$ and $x_2 \in F^\perp$. Moreover, $x_1$ is uniquely characterized by
\[ \|x_1 - x\| = \inf_{y \in F} \|y - x\|. \]
(picture: a point $x$ and its closest point $x_1$ in the subspace $F$)
Note that this is not necessarily true if $F$ is not complete.
Proof. We already know that $F \oplus F^\perp$ is a direct sum. It thus suffices to show that the sum is the whole of $E$.
Let $y_i \in F$ be a sequence with
\[ \lim_{i \to \infty} \|y_i - x\| = \inf_{y \in F} \|y - x\| = d. \]
We want to show that $(y_i)$ is a Cauchy sequence. Let $\varepsilon > 0$ be given. Let $n_0 \in \mathbb{N}$ be such that for all $i \geq n_0$, we have
\[ \|y_i - x\|^2 \leq d^2 + \varepsilon. \]
We now use the parallelogram law for $v = x - y_i$, $w = x - y_j$ with $i, j \geq n_0$. Then the parallelogram law says:
\[ \|v + w\|^2 + \|v - w\|^2 = 2\|v\|^2 + 2\|w\|^2, \]
or
\[ \|y_j - y_i\|^2 + \|2x - y_i - y_j\|^2 = 2\|y_i - x\|^2 + 2\|y_j - x\|^2. \]
Hence we know that
\[ \|y_i - y_j\|^2 \leq 2\|y_i - x\|^2 + 2\|y_j - x\|^2 - 4\left\|x - \frac{y_i + y_j}{2}\right\|^2 \leq 2(d^2 + \varepsilon) + 2(d^2 + \varepsilon) - 4d^2 \leq 4\varepsilon. \]
So $(y_i)$ is a Cauchy sequence. Since $F$ is complete, $y_i \to y$ for some $y \in F$. Moreover, by the continuity of $\|\cdot\|$, we know that
\[ d = \lim_{i \to \infty} \|y_i - x\| = \|y - x\|. \]
Now let $x_1 = y$ and $x_2 = x - y$. The only thing left to show is $x_2 \in F^\perp$. Suppose not. Then there is some $\tilde{y} \in F$ such that
\[ \langle \tilde{y}, x_2\rangle \neq 0. \]
The idea is that we can perturb $y$ by a little bit to get a point even closer to $x$. By multiplying $\tilde{y}$ with a scalar, we can assume
\[ \langle \tilde{y}, x_2\rangle > 0. \]
Then for $t > 0$, we have
\begin{align*}
\|(y + t\tilde{y}) - x\|^2 &= \langle y + t\tilde{y} - x, y + t\tilde{y} - x\rangle \\
&= \langle y - x, y - x\rangle + \langle t\tilde{y}, y - x\rangle + \langle y - x, t\tilde{y}\rangle + t^2\langle \tilde{y}, \tilde{y}\rangle \\
&= d^2 - 2t\langle \tilde{y}, x_2\rangle + t^2\|\tilde{y}\|^2.
\end{align*}
Hence for sufficiently small $t$, the $t^2$ term is negligible, and we can make this less than $d^2$. This is a contradiction, since $y + t\tilde{y} \in F$.
As a corollary, we can define the projection map as follows:
Corollary. Let $E$ be a Euclidean space and $F \subseteq E$ a complete subspace. Then there exists a projection map $P: E \to E$ defined by $P(x) = x_1$, where $x_1 \in F$ is as defined in the theorem above. Moreover, $P$ satisfies the following properties:
(i) $P(E) = F$, $P(F^\perp) = \{0\}$, and $P^2 = P$. In other words, $F^\perp \subseteq \ker P$.
(ii) $(I - P)(E) = F^\perp$, $(I - P)(F) = \{0\}$, $(I - P)^2 = (I - P)$.
(iii) $\|P\|_{B(E,E)} \leq 1$ and $\|I - P\|_{B(E,E)} \leq 1$, with equality if and only if $F \neq \{0\}$ and $F^\perp \neq \{0\}$ respectively.
Here $P$ projects our space onto $F$, while $I - P$ projects our space onto $F^\perp$.
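In finite dimensions the projection of the theorem is just least squares, which gives a concrete sanity check (an illustration with arbitrary made-up data): the residual $x_2 = x - P(x)$ is orthogonal to $F$.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 2))    # F = column span of A, a complete subspace
x = rng.normal(size=5)

coef, *_ = np.linalg.lstsq(A, x, rcond=None)   # minimise ||A c - x||
x1 = A @ coef                  # P(x): the closest point to x in F
x2 = x - x1                    # (I - P)(x), should lie in F-perp
print(np.abs(A.T @ x2).max())  # ~1e-16: x2 is orthogonal to F
```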
4.2 Riesz representation theorem
Our next theorem allows us to completely understand the duals of Hilbert spaces.
Consider the following map. Given a Hilbert space $H$ and $v \in H$, consider $\varphi_v \in H^*$ defined by
\[ \varphi_v(w) = \langle w, v\rangle. \]
Note that this construction requires the existence of an inner product.
Notice that this is indeed a bounded linear map, where boundedness comes from the Cauchy--Schwarz inequality
\[ |\langle w, v\rangle| \leq \|v\|_H \|w\|_H. \]
Therefore, $\varphi$ taking $v \mapsto \varphi_v$ is a map $\varphi: H \to H^*$.
Using this simple construction, we have managed to produce a lot of members of the dual. Are there any more things in the dual? The answer is no, and this is given by the Riesz representation theorem.
Proposition (Riesz representation theorem). Let $H$ be a Hilbert space. Then $\varphi: H \to H^*$ defined by $v \mapsto \langle \cdot, v\rangle$ is an isometric anti-isomorphism, i.e. it is isometric, bijective and
\[ \varphi(\lambda v + \mu w) = \bar{\lambda}\varphi(v) + \bar{\mu}\varphi(w). \]
Proof. We first prove all the easy bits, namely everything but surjectivity.
To show injectivity, if $\varphi_v = \varphi_u$, then $\langle w, v\rangle = \langle w, u\rangle$ for all $w$ by definition. So $\langle w, v - u\rangle = 0$ for all $w$. In particular, $\langle v - u, v - u\rangle = 0$. So $v - u = 0$.
To show that it is an anti-homomorphism, let $v, w, y \in H$ and $\lambda, \mu \in \mathbb{F}$. Then
\[ \varphi_{\lambda v + \mu w}(y) = \langle y, \lambda v + \mu w\rangle = \bar{\lambda}\langle y, v\rangle + \bar{\mu}\langle y, w\rangle = \bar{\lambda}\varphi_v(y) + \bar{\mu}\varphi_w(y). \]
To show it is isometric, let $v, w \in H$ with $\|w\|_H = 1$. Then
\[ |\varphi_v(w)| = |\langle w, v\rangle| \leq \|w\|_H \|v\|_H = \|v\|_H. \]
Hence $\|\varphi_v\|_{H^*} \leq \|v\|_H$ for all $v \in H$. To show $\|\varphi_v\|_{H^*}$ is exactly $\|v\|_H$, it suffices to note that
\[ |\varphi_v(v)| = \langle v, v\rangle = \|v\|_H^2. \]
So $\|\varphi_v\|_{H^*} \geq \|v\|_H^2 / \|v\|_H = \|v\|_H$.
Finally, we show surjectivity. Let $\xi \in H^*$. If $\xi = 0$, then $\xi = \varphi_0$.
Otherwise, suppose $\xi \neq 0$. The idea is that $(\ker \xi)^\perp$ is one-dimensional, and then the $v$ we are looking for will be an element in this complement. So we arbitrarily pick one, and then scale it appropriately.
We now write out the argument carefully. First, we note that since $\xi$ is continuous, $\ker \xi$ is closed, since it is the inverse image of the closed set $\{0\}$. So $\ker \xi$ is complete, and thus we have
\[ H = \ker \xi \oplus (\ker \xi)^\perp. \]
The next claim is that $\dim (\ker \xi)^\perp = 1$. This is an immediate consequence of the first isomorphism theorem, whose proof is the usual one, but since we didn't prove that, we will run the argument manually.
We pick any two elements $v_1, v_2 \in (\ker \xi)^\perp$. Then we can always find some $\lambda, \mu$, not both zero, such that
\[ \lambda\xi(v_1) + \mu\xi(v_2) = 0. \]
So $\lambda v_1 + \mu v_2 \in \ker \xi$. But it is also in $(\ker \xi)^\perp$ by linearity. Since $\ker \xi$ and $(\ker \xi)^\perp$ have trivial intersection, we deduce that $\lambda v_1 + \mu v_2 = 0$. Thus, any two vectors in $(\ker \xi)^\perp$ are dependent. Since $\xi \neq 0$, we know that $(\ker \xi)^\perp$ has dimension $1$.
Now pick any $v \in (\ker \xi)^\perp$ such that $\xi(v) \neq 0$. By scaling it appropriately, we can obtain a $v$ such that
\[ \xi(v) = \langle v, v\rangle. \]
Finally, we show that $\xi = \varphi_v$. To prove this, let $w \in H$. We decompose $w$ using the previous theorem to get
\[ w = \alpha v + w_0 \]
for some $w_0 \in \ker \xi$ and $\alpha \in \mathbb{F}$. Note that by definition of $(\ker \xi)^\perp$, we know that $\langle w_0, v\rangle = 0$. Hence we know that
\[ \xi(w) = \xi(\alpha v + w_0) = \xi(\alpha v) = \alpha\xi(v) = \alpha\langle v, v\rangle = \langle \alpha v, v\rangle = \langle \alpha v + w_0, v\rangle = \langle w, v\rangle. \]
Since $w$ was arbitrary, we are done.
Using this proposition twice, we know that all Hilbert spaces are reflexive, i.e. $H \cong H^{**}$.
We now return to the proof of the proposition we claimed at the beginning.
Proposition. For $f \in C(S^1)$, define, for each $k \in \mathbb{Z}$,
\[ \hat{f}(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-ikx} f(x) \,\mathrm{d}x. \]
The partial sums are then defined as
\[ S_N(f)(x) = \sum_{k=-N}^{N} \hat{f}(k) e^{ikx}. \]
Then we have
\[ \lim_{N \to \infty} \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x) - S_N(f)(x)|^2 \,\mathrm{d}x = 0. \]
Proof. Consider the Hilbert space $L^2(S^1)$ defined as the completion of $C_\mathbb{C}(S^1)$ under the inner product
\[ \langle f, g\rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\overline{g(x)} \,\mathrm{d}x. \]
Consider the closed subspace
\[ U_N = \operatorname{span}\{e^{inx} : |n| \leq N\}. \]
Then in fact $S_N$ defined above is the projection operator onto $U_N$. This is since we have the orthonormality condition
\[ \langle e^{inx}, e^{imx}\rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{inx} e^{-imx} \,\mathrm{d}x = \begin{cases} 1 & n = m \\ 0 & n \neq m \end{cases}. \]
Hence it is easy to check that if $f \in U_N$, say $f = \sum_{n=-N}^N a_n e^{inx}$, then $S_N f = f$, since
\[ S_N(f) = \sum_{n=-N}^{N} \hat{f}(n) e^{inx} = \sum_{n=-N}^{N} \langle f, e^{inx}\rangle e^{inx} = \sum_{n=-N}^{N} a_n e^{inx} = f, \]
using the orthogonality relation. But if $f \in U_N^\perp$, then
\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-inx} f(x) \,\mathrm{d}x = 0 \]
for all $|n| \leq N$. So $S_N(f) = 0$. So this is indeed a projection map.
In particular, we will use the fact that projection maps have norms $\leq 1$. Hence for any $P(x)$, we have
\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} |S_N(f)(x) - S_N(P)(x)|^2 \,\mathrm{d}x \leq \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x) - P(x)|^2 \,\mathrm{d}x. \]
Now consider the algebra $A$ generated by $\{e^{inx} : n \in \mathbb{Z}\}$. Notice that $A$ separates points and is closed under complex conjugation. Also, for every $x \in S^1$, there exists $f \in A$ such that $f(x) \neq 0$ (using, say, $f(x) = e^{ix}$). Hence, by the Stone--Weierstrass theorem, $\bar{A} = C_\mathbb{C}(S^1)$, i.e. for every $f \in C_\mathbb{C}(S^1)$ and $\varepsilon > 0$, there exists a polynomial $P$ in $e^{ix}$ and $e^{-ix}$ such that
\[ \|P - f\| < \varepsilon. \]
We are almost done. We now let $N > \deg P$ be a large number. Then in particular, we have $S_N(P) = P$. Then
\begin{align*}
\left(\frac{1}{2\pi} \int_{-\pi}^{\pi} |S_N(f) - f|^2 \,\mathrm{d}x\right)^{\frac{1}{2}} &\leq \left(\frac{1}{2\pi} \int_{-\pi}^{\pi} |S_N(f) - S_N(P)|^2 \,\mathrm{d}x\right)^{\frac{1}{2}} + \left(\frac{1}{2\pi} \int_{-\pi}^{\pi} |S_N(P) - P|^2 \,\mathrm{d}x\right)^{\frac{1}{2}} + \left(\frac{1}{2\pi} \int_{-\pi}^{\pi} |P - f|^2 \,\mathrm{d}x\right)^{\frac{1}{2}} \\
&\leq \varepsilon + 0 + \varepsilon = 2\varepsilon.
\end{align*}
So done.
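A numerical sketch of the proposition (an illustration only, taking $f(x) = |x|$ on $S^1$, which is continuous once extended periodically): the mean-square error of $S_N(f)$ decreases to $0$ as $N$ grows.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
f = np.abs(x)                        # continuous on S^1

def partial_sum(N):
    s = np.zeros_like(x, dtype=complex)
    for n in range(-N, N + 1):
        fhat = np.mean(np.exp(-1j * n * x) * f)   # (1/2pi) * integral
        s += fhat * np.exp(1j * n * x)
    return s.real

for N in [1, 4, 16, 64]:
    print(N, np.mean((f - partial_sum(N)) ** 2))  # L^2 error -> 0
```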
4.3 Orthonormal systems and basis
Definition (Orthonormal system). Let $E$ be a Euclidean space. A set of unit vectors $\{e_\alpha\}_{\alpha \in A}$ is called an orthonormal system if $\langle e_\alpha, e_\beta\rangle = 0$ whenever $\alpha \neq \beta$.
We want to define a "basis" for an infinite-dimensional vector space. The idea is that these should be orthonormal systems "big enough" to span everything. In finite dimensions, this was easy, since there is the notion of dimension: if we have $n$ dimensions, then we just take an orthonormal system of $n$ vectors, and we are done.
If we have infinite dimensions, this is trickier. We can keep adding vectors to our orthonormal system, but we might never get to such a "basis", especially if our "basis" has to be uncountable. Hence we have the idea of "maximality".
Definition (Maximal orthonormal system). Let $E$ be a Euclidean space. An orthonormal system is called maximal if it cannot be extended to a strictly larger orthonormal system.
By Zorn's lemma, a maximal orthonormal system always exists. We will later see that in certain nice cases, we can construct a maximal orthonormal system directly, without appealing to Zorn's lemma. The advantage of an explicit construction is that we will understand our system much more.
One important thing we would like to do is, given an orthonormal system, decide whether it is maximal. In general, this is difficult, and Zorn's lemma is completely useless.
Now suppose we are nicer and have a Hilbert space. What we would like to say is that if we have a maximal orthonormal system, then its span is the whole space $H$. However, this doesn't really work. The span of a set $S$ only allows us to take finite linear combinations, but by completeness of $H$, we want to have the infinite sums, i.e. the limits as well. So what we really have is the following.
Proposition. Let $H$ be a Hilbert space. Let $S$ be a maximal orthonormal system. Then $\overline{\operatorname{span}}\, S = H$.
While this might seem difficult to prove at first, it turns out the proof is pretty short and simple.
Proof. Recall that $S^\perp = (\operatorname{span} S)^\perp = (\overline{\operatorname{span}}\, S)^\perp$. Since $H$ is a Hilbert space, we have
\[ H = \overline{\operatorname{span}}\, S \oplus (\overline{\operatorname{span}}\, S)^\perp = \overline{\operatorname{span}}\, S \oplus S^\perp. \]
Since $S$ is maximal, $S^\perp = \{0\}$. So done.
How about the converse? It is also true. In fact, it is true even for Euclidean spaces, and the proof is easy.
Proposition. Let $E$ be Euclidean, and let $S$ be an orthonormal system. If $\overline{\operatorname{span}}\, S = E$, then $S$ is maximal.
Proof.
\[ S^\perp = (\overline{\operatorname{span}}\, S)^\perp = E^\perp = \{0\}. \]
So in a Hilbert space, we have an if and only if condition: a system is maximal if and only if the closure of the span is everything. In other words, given any vector $v \in H$, we can find a sequence $v_i$ in the span of the maximal system that converges to $v$. This sequence is clearly not unique, since we can just add a random term to the first item.
However, we can do something better. Consider our space $\ell_2$, and the element $(1, \frac{1}{2}, \frac{1}{4}, \cdots)$. There is a very natural way to write this as the limit of the sequence:
\[ (1, 0, 0, \cdots),\ \left(1, \tfrac{1}{2}, 0, \cdots\right),\ \left(1, \tfrac{1}{2}, \tfrac{1}{4}, 0, \cdots\right),\ \cdots. \]
What we are doing is that we are truncating the element at the $n$th component for each $n$. Alternatively, the $n$th term is what we get when we project our $v$ onto the space spanned by the first $n$ "basis" vectors. This is a nice and natural way to produce the sequence.
Definition (Hilbert space basis). Let $H$ be a Hilbert space. A maximal orthonormal system is called a Hilbert space basis.
Recall that at the beginning, we said we needed Zorn's lemma to get a maximal orthonormal system. In many cases, we can find a basis without using Zorn's lemma. This relies on the Gram--Schmidt procedure.
Proposition. Let $\{x_i\}_{i=1}^n$, $n \in \mathbb{N}$, be linearly independent. Then there exists an orthonormal system $\{e_i\}_{i=1}^n$ such that
\[ \operatorname{span}\{x_1, \cdots, x_j\} = \operatorname{span}\{e_1, \cdots, e_j\} \]
for all $j \leq n$.
Proof. Define $e_1$ by
\[ e_1 = \frac{x_1}{\|x_1\|}. \]
Assume we have defined $\{e_i\}_{i=1}^j$ orthonormal such that
\[ \operatorname{span}\{x_1, \cdots, x_j\} = \operatorname{span}\{e_1, \cdots, e_j\}. \]
Then by linear independence, we know that
\[ x_{j+1} \notin \operatorname{span}\{x_1, \cdots, x_j\} = \operatorname{span}\{e_1, \cdots, e_j\} = F_j. \]
We now define
\[ \tilde{x}_{j+1} = x_{j+1} - P_{F_j}(x_{j+1}), \]
where $P_{F_j}$ is the projection onto $F_j$ given by
\[ P_{F_j}(x) = \sum_{i=1}^j \langle x, e_i\rangle e_i. \]
Since $F_j$ is a closed, finite-dimensional subspace, we know that
\[ x_{j+1} - P_{F_j} x_{j+1} \in F_j^\perp. \]
Thus
\[ e_{j+1} = \frac{\tilde{x}_{j+1}}{\|\tilde{x}_{j+1}\|} \]
is the right choice. We can also write this in full as
\[ e_{j+1} = \frac{x_{j+1} - \sum_{i=1}^j \langle x_{j+1}, e_i\rangle e_i}{\left\|x_{j+1} - \sum_{i=1}^j \langle x_{j+1}, e_i\rangle e_i\right\|}. \]
So done.
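The formula above is a direct recipe; here is a minimal transcription of it (a sketch for vectors in $\mathbb{C}^n$ with the standard inner product):

```python
import numpy as np

def gram_schmidt(xs):
    es = []
    for x in xs:
        # Subtract the projection P_F(x) onto the span of e_1, ..., e_j,
        # then normalise the remainder.
        x_t = x - sum(np.vdot(e, x) * e for e in es)   # np.vdot(e, x) = <x, e>
        es.append(x_t / np.linalg.norm(x_t))
    return es

xs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
es = gram_schmidt(xs)
print(np.round([[np.vdot(a, b) for b in es] for a in es], 8))  # identity
```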
Note that projecting onto the span of the first $n$ basis vectors is exactly what we did when we wrote an element of $\ell_2$ as the limit of a sequence above.
This is a helpful result, since it is a constructive way of producing orthonormal systems. So if we are handed a set of vectors, we can just apply this result, plug our vectors into this ugly formula, and get a result. Of course, we want to apply this to infinite spaces.
Proposition. Let $H$ be separable, i.e. there is a countable set $\{y_i\}_{i \in \mathbb{N}}$ such that
\[ \overline{\operatorname{span}}\{y_i\} = H. \]
Then there exists a countable basis for $H$.
Proof. We find a subset $\{y_{i_j}\}$ such that $\operatorname{span}\{y_i\} = \operatorname{span}\{y_{i_j}\}$ and the $\{y_{i_j}\}$ are linearly independent. This is easy to do, since we can just throw away the useless dependent vectors. At this point, we apply Gram--Schmidt, and we are done.
Example. Consider $H = \ell_2$ and the sequence $\{e_i\}_{i \in \mathbb{N}}$ defined by
\[ e_i = (0, 0, \cdots, 0, 1, 0, \cdots), \]
with the $1$ in the $i$th position.
Note that $x \perp \{e_i\}_{i \in \mathbb{N}}$ if and only if each component of $x$ is zero, i.e. $x = 0$. So $\{e_i\}$ is maximal, and hence a basis.
Example. Consider $H = L^2$, the completion of $C(S^1)$ under the $L^2$ norm, i.e. the norm induced by
\[ \langle f, g\rangle = \int_{-\pi}^{\pi} f\bar{g} \,\mathrm{d}x. \]
Trigonometric polynomials are dense in $C(S^1)$ with respect to the supremum norm, due to Stone--Weierstrass. So in fact $\operatorname{span}\left\{\frac{1}{\sqrt{2\pi}}e^{inx} : n \in \mathbb{Z}\right\}$ is dense in $C(S^1)$. Hence it is dense in $C(S^1)$ under the $L^2$ norm, since convergence under the supremum norm implies convergence under $L^2$. In particular, it is dense in the $L^2$ space, since $L^2$ is the completion of $C(S^1)$. Moreover, this set is orthonormal in $C(S^1)$ under the $L^2$ inner product. So
\[ \left\{\frac{1}{\sqrt{2\pi}}e^{inx} : n \in \mathbb{Z}\right\} \]
is a basis for $L^2$.
Note that in these two examples, we have exhibited two different ways of constructing a basis. In the first case, we showed that it is maximal directly. In the second case, we showed that its span is a dense subset of the space. By our proposition, these are equivalent and valid ways of proving that it is a basis.
4.4 The isomorphism with $\ell_2$
We ended the previous section with two examples. Both of them are Hilbert spaces, and both have countable bases. Is there any way we can identify the two? This is a reasonable thing to ask. If we are given a Hilbert space $H$ of finite dimension $\dim H = n$, then we know that $H$ is indeed isomorphic to $\mathbb{R}^n$ (or $\mathbb{C}^n$) with the Euclidean norm. In some sense $\ell_2$ is just an "infinite version" of $\mathbb{R}^n$. So we might expect all other Hilbert spaces with countable basis to be isomorphic to $\ell_2$.
Recall that if we have a finite-dimensional Hilbert space $H$ with $\dim H = n$, and an orthonormal basis $\{e_1, \cdots, e_n\}$, then each $x \in H$ can be written as
\[ x = \sum_{i=1}^n \langle x, e_i\rangle e_i, \]
and
\[ \|x\|^2 = \sum_{i=1}^n |\langle x, e_i\rangle|^2. \]
Thus $H$ is isomorphic to $\ell_2^n$, the space $\mathbb{R}^n$ with the Euclidean norm, via the map
\[ x \mapsto (\langle x, e_1\rangle, \cdots, \langle x, e_n\rangle). \]
Can we push this to the infinite-dimensional case? Yes. We will have to replace our finite sum $\sum_{i=1}^n$ with an infinite sum. Of course, with an infinite sum, we need to make sure things converge. This is guaranteed by Bessel's inequality.
Lemma (Bessel's inequality). Let $E$ be Euclidean and $\{e_i\}_{i=1}^N$, with $N \in \mathbb{N} \cup \{\infty\}$, an orthonormal system. For any $x \in E$, define $x_i = \langle x, e_i\rangle$. Then for any $j \leq N$, we have
\[ \sum_{i=1}^j |x_i|^2 \leq \|x\|^2. \]
Proof. Consider the case where $j$ is finite first. Define
\[ F_j = \operatorname{span}\{e_1, \cdots, e_j\}. \]
This is a finite-dimensional subspace of $E$. Hence an orthogonal projection $P_{F_j}$ exists. Moreover, we have an explicit formula for it:
\[ P_{F_j}(x) = \sum_{i=1}^j \langle x, e_i\rangle e_i. \]
Thus
\[ \sum_{i=1}^j |x_i|^2 = \|P_{F_j} x\|^2 \leq \|x\|^2, \]
since we know that $\|P_{F_j}\| \leq 1$. Taking the limit as $j \to \infty$ proves the case for infinite $j$.
The only thing we required in the proof is for the space to be Euclidean. This is since we are talking about the sum
\[ \sum_{i=1}^\infty |x_i|^2, \]
and this is a sum of numbers. However, if we want to investigate the sum
\[ x = \sum_{i=1}^\infty \langle x, e_i\rangle e_i, \]
then we'd better require the space to be Hilbert, so that the sum has something to converge to.
Proposition. Let $H$ be a separable Hilbert space, with a countable basis $\{e_i\}_{i=1}^N$, where $N \in \mathbb{N} \cup \{\infty\}$. Let $x, y \in H$ and
\[ x_i = \langle x, e_i\rangle, \quad y_i = \langle y, e_i\rangle. \]
Then
\[ x = \sum_{i=1}^N x_i e_i, \quad y = \sum_{i=1}^N y_i e_i, \]
and
\[ \langle x, y\rangle = \sum_{i=1}^N x_i \bar{y}_i. \]
Moreover, the sum converges absolutely.
Proof. We only need to consider the case $N = \infty$. Otherwise, it is just finite-dimensional linear algebra.
First, note that our expression is written as an infinite sum. So we need to make sure it converges. We define the partial sums to be
\[ s_n = \sum_{i=1}^n x_i e_i. \]
We want to show $s_n \to x$. By Bessel's inequality, we know that
\[ \sum_{i=1}^\infty |x_i|^2 \leq \|x\|^2. \]
In particular, the sum is bounded, and hence converges.
For any $m < n$, we have
\[ \|s_n - s_m\|^2 = \sum_{i=m+1}^n |x_i|^2 \leq \sum_{i=m+1}^\infty |x_i|^2. \]
As $m \to \infty$, the right hand side must go to $0$. Thus $\{s_n\}$ is Cauchy. Since $H$ is Hilbert, $s_n$ converges, say
\[ s_n \to s = \sum_{i=1}^\infty x_i e_i. \]
Now we want to prove that this sum is indeed $x$ itself. Note that so far in the proof, we have not used the fact that $\{e_i\}$ is a basis. We just used the fact that it is orthonormal. Hence we should use this now. We notice that
\[ \langle s, e_i\rangle = \lim_{n \to \infty} \langle s_n, e_i\rangle = \lim_{n \to \infty} \sum_{j=1}^n x_j \langle e_j, e_i\rangle = x_i. \]
Hence we know that
\[ \langle x - s, e_i\rangle = 0 \]
for all $i$. So $x - s$ is perpendicular to all $e_i$. Since $\{e_i\}$ is a basis, we must have $x - s = 0$, i.e. $x = s$.
To show our formula for the inner product, we can compute
\begin{align*}
\langle x, y\rangle &= \lim_{n \to \infty} \left\langle \sum_{i=1}^n x_i e_i, \sum_{j=1}^n y_j e_j\right\rangle \\
&= \lim_{n \to \infty} \sum_{i,j=1}^n x_i \bar{y}_j \langle e_i, e_j\rangle \\
&= \lim_{n \to \infty} \sum_{i,j=1}^n x_i \bar{y}_j \delta_{ij} \\
&= \lim_{n \to \infty} \sum_{i=1}^n x_i \bar{y}_i \\
&= \sum_{i=1}^\infty x_i \bar{y}_i.
\end{align*}
Note that we know the limit exists, since the continuity of the inner product ensures the first line is always valid.
Finally, to show absolute convergence, note that for all finite $j$, we have
\[ \sum_{i=1}^j |x_i \bar{y}_i| \leq \sqrt{\sum_{i=1}^j |x_i|^2} \sqrt{\sum_{i=1}^j |y_i|^2} \leq \|x\|\|y\|. \]
Since this is a uniform bound for any $j$, the sum converges absolutely.
Note that in the case of $x = y$, our formula for the inner product gives
\[ \|x\|^2 = \sum_{i=1}^N |x_i|^2. \]
This is known as Parseval's equality.
What this proposition gives us is that given any separable Hilbert space, we can find "coordinates" for it, and in terms of these coordinates, our inner product and hence norm all act like $\ell_2$. In particular, we have the map
\[ x \mapsto \{\langle x, e_i\rangle\}_{i=1}^N \]
that takes $H$ into $\ell_2$. This is injective, since by Parseval's equality, if the image of $x$ is $0$, then $\|x\|^2 = \sum 0 = 0$. So $x = 0$.
This is good, but not good enough. We want the map to be an isomorphism. Hence, we need to show it is surjective. In other words, every element in $\ell_2$ is obtained. This is a theorem by Riesz and Fischer, and is in fact easy to prove, since there is an obvious candidate for the preimage of any $\{a_i\}_{i \in \mathbb{N}}$.
Proposition. Let $H$ be a separable Hilbert space with orthonormal basis $\{e_i\}_{i \in \mathbb{N}}$. Let $\{a_i\}_{i \in \mathbb{N}} \in \ell_2(\mathbb{C})$. Then there exists an $x \in H$ with $\langle x, e_i\rangle = a_i$. Moreover, this $x$ is exactly
\[ x = \sum_{i=1}^\infty a_i e_i. \]
Proof. The only thing we need to show is that this sum converges. For any $n \in \mathbb{N}$, define
\[ s_n = \sum_{i=1}^n a_i e_i \in H. \]
For $m < n$, we have
\[ \|s_n - s_m\|^2 = \sum_{i=m+1}^n |a_i|^2 \to 0 \]
as $m \to \infty$, because $\{a_i\} \in \ell_2$. Hence $s_n$ is Cauchy and as such converges to some $x$. Obviously, we have
\[ \langle x, e_i\rangle = \lim_{n \to \infty} \sum_{j=1}^n a_j \langle e_j, e_i\rangle = a_i. \]
So done.
This means we have an isomorphism between $\ell_2$ and $H$. Moreover, this map is continuous and in fact isometric. So this is a very strong result: all separable infinite-dimensional Hilbert spaces are (isometrically isomorphic to) $\ell_2$.
4.5 Operators
We are going to look at operators on Hilbert spaces. For example, we would like to see how differential operators behave on spaces of differentiable functions.
In this section, we will at least require the space to be Banach. So let $X$ be a Banach space over $\mathbb{C}$. We will consider $B(X) = B(X, X)$, the vector space of bounded linear maps from $X$ to itself. We have seen in the example sheets that $B(X)$ is a unital Banach algebra, i.e. it forms a complete algebra with composition as multiplication. Our goal is to generalize some considerations in finite dimensions, such as eigenvectors and eigenvalues.
Definition (Spectrum and resolvent set). Let $X$ be a Banach space and $T \in B(X)$. We define the spectrum of $T$, denoted by $\sigma(T)$, by
\[ \sigma(T) = \{\lambda \in \mathbb{C} : T - \lambda I \text{ is not invertible}\}. \]
The resolvent set, denoted by $\rho(T)$, is
\[ \rho(T) = \mathbb{C} \setminus \sigma(T). \]
Note that if $T - \lambda I$ is bijective, then by the inverse mapping theorem, we know it has a bounded inverse. So if $\lambda \in \sigma(T)$, then either $T - \lambda I$ is not injective, or it is not surjective. In other words, $\ker(T - \lambda I) \neq \{0\}$ or $\operatorname{im}(T - \lambda I) \neq X$. In finite dimensions, these are equivalent by, say, the rank-nullity theorem, but in general, they are not.
Example. Consider the shift operator $s: \ell^\infty \to \ell^\infty$ defined by
\[
  (a_1, a_2, a_3, \cdots) \mapsto (0, a_1, a_2, \cdots).
\]
Then this is injective but not surjective.
Now if $\lambda \in \rho(T)$, i.e. $T - \lambda I$ is invertible, then $(T - \lambda I)^{-1}$ is automatically bounded by the inverse mapping theorem. This is why we want to work with Banach spaces.
Definition (Resolvent). Let $X$ be a Banach space and $T \in B(X)$. The resolvent is the map $R: \rho(T) \to B(X)$ given by
\[
  \lambda \mapsto (T - \lambda I)^{-1}.
\]
Definition (Eigenvalue). We say $\lambda$ is an eigenvalue of $T$ if $\ker(T - \lambda I) \neq \{0\}$.
Definition (Point spectrum). Let $X$ be a Banach space. The point spectrum is
\[
  \sigma_p(T) = \{\lambda\in\mathbb{C} : \lambda \text{ is an eigenvalue of } T\}.
\]
Obviously, $\sigma_p(T) \subseteq \sigma(T)$, but they are in general not equal.
Definition (Approximate point spectrum). Let $X$ be a Banach space. The approximate point spectrum is defined as
\[
  \sigma_{ap}(T) = \{\lambda\in\mathbb{C} : \exists\{x_n\}\subseteq X \text{ with } \|x_n\|_X = 1 \text{ and } \|(T - \lambda I)x_n\|_X \to 0\}.
\]
Again, we have
\[
  \sigma_p(T) \subseteq \sigma_{ap}(T) \subseteq \sigma(T).
\]
The last inclusion follows from the fact that if an inverse exists, then the inverse is bounded.
An important characterization of the spectrum is the following theorem:
Theorem. Let $X$ be a Banach space, $T \in B(X)$. Then $\sigma(T)$ is a non-empty, closed subset of
\[
  \{\lambda\in\mathbb{C} : |\lambda| \leq \|T\|_{B(X)}\}.
\]
In finite dimensions, this in particular implies the existence of eigenvalues, since the spectrum is equal to the point spectrum. Notice this is only true for vector spaces over $\mathbb{C}$, as we know from linear algebra.
To prove this theorem, we will first prove two lemmas.
Lemma. Let $X$ be a Banach space, $T \in B(X)$ and $\|T\|_{B(X)} < 1$. Then $I - T$ is invertible.
Proof. To prove it is invertible, we construct an explicit inverse. We want to show
\[
  (I - T)^{-1} = \sum_{i=0}^\infty T^i.
\]
First, we check the right hand side is absolutely convergent. This is since
\[
  \sum_{i=0}^\infty \|T^i\|_{B(X)} \leq \sum_{i=0}^\infty \|T\|^i_{B(X)} \leq \frac{1}{1 - \|T\|_{B(X)}} < \infty.
\]
Since $X$ is Banach, and hence $B(X)$ is Banach, the limit is well-defined. Now it is easy to check that
\begin{align*}
  (I - T)\sum_{i=0}^\infty T^i &= (I - T)(I + T + T^2 + \cdots)\\
  &= I + (T - T) + (T^2 - T^2) + \cdots\\
  &= I.
\end{align*}
Similarly, we have
\[
  \left(\sum_{i=0}^\infty T^i\right)(I - T) = I.
\]
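The Neumann series in this proof is easy to test numerically. Here is a small sketch (Python with NumPy, illustrative only): we scale a random matrix so that its operator norm is below $1$, and compare the truncated series $\sum_i T^i$ against a directly computed inverse of $I - T$.

```python
import numpy as np

# Sketch: Neumann series (I - T)^{-1} = sum_{i>=0} T^i for ||T|| < 1.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
T = 0.5 * A / np.linalg.norm(A, 2)  # scale so that ||T||_2 = 0.5 < 1

S = np.zeros_like(T)  # partial sums of the series
P = np.eye(5)         # current power T^i
for _ in range(60):
    S += P
    P = P @ T

direct = np.linalg.inv(np.eye(5) - T)
print("series vs direct inverse:", np.linalg.norm(S - direct, 2))
```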
Lemma. Let $X$ be a Banach space, and let $S_1 \in B(X)$ be invertible. Then for all $S_2 \in B(X)$ such that
\[
  \|S_1^{-1}\|_{B(X)}\|S_1 - S_2\|_{B(X)} < 1,
\]
$S_2$ is invertible.
This is some sort of an “openness” statement for the invertible bounded linear maps: if $S_1$ is invertible, then any “nearby” bounded linear map is also invertible.
Proof. We can write
\[
  S_2 = S_1(I - S_1^{-1}(S_1 - S_2)).
\]
Since
\[
  \|S_1^{-1}(S_1 - S_2)\|_{B(X)} \leq \|S_1^{-1}\|_{B(X)}\|S_1 - S_2\|_{B(X)} < 1
\]
by assumption, by the previous lemma, $(I - S_1^{-1}(S_1 - S_2))^{-1}$ exists. Therefore the inverse of $S_2$ is
\[
  S_2^{-1} = (I - S_1^{-1}(S_1 - S_2))^{-1}S_1^{-1}.
\]
We can now return to prove our original theorem.
Theorem. Let $X$ be a Banach space, $T \in B(X)$. Then $\sigma(T)$ is a non-empty, closed subset of $\{\lambda\in\mathbb{C} : |\lambda| \leq \|T\|_{B(X)}\}$.
Note that it is not hard to prove that it is closed and a subset of $\{\lambda\in\mathbb{C} : |\lambda| \leq \|T\|_{B(X)}\}$. The hard part is to prove it is non-empty.
Proof. We first prove the closedness of the spectrum. It suffices to prove that the resolvent set $\rho(T) = \mathbb{C}\setminus\sigma(T)$ is open, by the definition of closedness.
Let $\lambda \in \rho(T)$. By definition, $S_1 = T - \lambda I$ is invertible. For $\mu\in\mathbb{C}$, define $S_2 = T - \mu I$. Then
\[
  \|S_1 - S_2\|_{B(X)} = \|(T - \lambda I) - (T - \mu I)\|_{B(X)} = |\lambda - \mu|.
\]
Hence if $|\lambda - \mu|$ is sufficiently small, then $T - \mu I$ is invertible by the above lemma. Hence $\mu \in \rho(T)$. So $\rho(T)$ is open.
To show $\sigma(T) \subseteq \{\lambda\in\mathbb{C} : |\lambda| \leq \|T\|_{B(X)}\}$ is equivalent to showing
\[
  \{\lambda\in\mathbb{C} : |\lambda| > \|T\|_{B(X)}\} \subseteq \mathbb{C}\setminus\sigma(T) = \rho(T).
\]
Suppose $|\lambda| > \|T\|_{B(X)}$. Then $I - \lambda^{-1}T$ is invertible since
\[
  \|\lambda^{-1}T\|_{B(X)} = |\lambda|^{-1}\|T\|_{B(X)} < 1.
\]
Therefore $(I - \lambda^{-1}T)^{-1}$ exists, and hence
\[
  (\lambda I - T)^{-1} = \lambda^{-1}(I - \lambda^{-1}T)^{-1}
\]
is well-defined. Therefore $\lambda I - T$, and hence $T - \lambda I$, is invertible. So $\lambda \in \rho(T)$.
Finally, we need to show it is non-empty. How did we prove this in the case of finite-dimensional vector spaces? In that case, it ultimately boiled down to the fundamental theorem of algebra. And how did we prove the fundamental theorem of algebra? We said that if $p(x)$ is a polynomial with no roots, then $\frac{1}{p(x)}$ is bounded and entire, hence constant.
We are going to do the same proof. We look at $(T - \lambda I)^{-1}$ as a function of $\lambda$. If $\sigma(T) = \emptyset$, then this is an everywhere well-defined function. We show that it is entire and bounded, and hence by “Liouville's theorem”, it must be constant, which is impossible (in the finite-dimensional case, we would have inserted a $\det$ there).
So suppose $\sigma(T) = \emptyset$, and consider the function $R: \mathbb{C}\to B(X)$ given by
\[
  R(\lambda) = (T - \lambda I)^{-1}.
\]
We first show this is entire. This, by definition, means $R$ is given by a power series near any point $\lambda_0 \in \mathbb{C}$. Fix such a point. Then, as before, we can expand
\begin{align*}
  T - \lambda I &= (T - \lambda_0 I)\left[I - (T - \lambda_0 I)^{-1}\big((T - \lambda_0 I) - (T - \lambda I)\big)\right]\\
  &= (T - \lambda_0 I)\left[I - (\lambda - \lambda_0)(T - \lambda_0 I)^{-1}\right].
\end{align*}
Then for $|\lambda - \lambda_0|$ small, we have
\begin{align*}
  (T - \lambda I)^{-1} &= \left(\sum_{i=0}^\infty (\lambda - \lambda_0)^i(T - \lambda_0 I)^{-i}\right)(T - \lambda_0 I)^{-1}\\
  &= \sum_{i=0}^\infty (\lambda - \lambda_0)^i(T - \lambda_0 I)^{-i-1}.
\end{align*}
So this is indeed given by an absolutely convergent power series near $\lambda_0$.
Next, we show $R$ is bounded, i.e.
\[
  \sup_{\lambda\in\mathbb{C}}\|R(\lambda)\|_{B(X)} < \infty.
\]
It suffices to prove this for $|\lambda|$ large. For $|\lambda| > \|T\|_{B(X)}$, we have
\[
  (T - \lambda I)^{-1} = \lambda^{-1}(\lambda^{-1}T - I)^{-1} = -\lambda^{-1}\sum_{i=0}^\infty \lambda^{-i}T^i.
\]
Hence we get
\[
  \|(\lambda I - T)^{-1}\|_{B(X)} \leq |\lambda|^{-1}\sum_{i=0}^\infty |\lambda|^{-i}\|T^i\|_{B(X)} \leq |\lambda|^{-1}\sum_{i=0}^\infty\left(|\lambda|^{-1}\|T\|_{B(X)}\right)^i = \frac{1}{|\lambda| - \|T\|_{B(X)}},
\]
which tends to $0$ as $|\lambda| \to \infty$. So it is bounded.
By “Liouville's theorem”, $R(\lambda)$ is constant, which is clearly a contradiction, since $R(\lambda) \neq R(\mu)$ for $\lambda \neq \mu$.
Of course, to do this properly, we need a version of Liouville’s theorem for
Banach-space valued functions as opposed to complex-valued functions. So let’s
prove this.
Proposition (Liouville's theorem for Banach space-valued analytic functions). Let $X$ be a Banach space, and $F: \mathbb{C}\to X$ be entire (in the sense that $F$ is given by an absolutely convergent power series in some neighbourhood of any point) and norm bounded, i.e.
\[
  \sup_{z\in\mathbb{C}}\|F(z)\|_X < \infty.
\]
Then $F$ is constant.
This is a generalization of Liouville's theorem to the case where the target of the map is a Banach space. To prove this, we reduce it to the case of complex-valued functions. To do so, we compose $F$ with a functional $X \to \mathbb{C}$.
Proof. Let $f \in X^*$. Then we show $f\circ F: \mathbb{C}\to\mathbb{C}$ is bounded and entire. To see it is bounded, just note that $f$ is a bounded linear map. So
\[
  \sup_{z\in\mathbb{C}}|f\circ F(z)| \leq \sup_{z\in\mathbb{C}}\|f\|_{X^*}\|F(z)\|_X < \infty.
\]
Analyticity can be shown in a similar fashion, exploiting the fact that $f$ is linear.
Hence Liouville's theorem implies $f\circ F$ is constant, i.e. $(f\circ F)(z) = (f\circ F)(0)$. In particular, this implies $f(F(z) - F(0)) = 0$. Moreover, this is true for all $f \in X^*$. Hence by (a corollary of) the Hahn–Banach theorem, we know
\[
  F(z) - F(0) = 0
\]
for all $z\in\mathbb{C}$. Therefore $F$ is constant.
We have thus completed our proof that $\sigma(T)$ is non-empty, closed and a subset of $\{\lambda\in\mathbb{C} : |\lambda| \leq \|T\|_{B(X)}\}$.
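For matrices, this containment is easy to observe numerically. A quick sketch (Python with NumPy, illustrative only): all eigenvalues of a random complex matrix lie in the closed disc of radius $\|T\|$, taking the operator $2$-norm, i.e. the largest singular value.

```python
import numpy as np

# Sketch: every spectral point satisfies |lambda| <= ||T||.
rng = np.random.default_rng(2)
T = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))

eigenvalues = np.linalg.eigvals(T)
op_norm = np.linalg.norm(T, 2)  # largest singular value
print("max |lambda| =", np.abs(eigenvalues).max(), "  ||T|| =", op_norm)
assert np.abs(eigenvalues).max() <= op_norm + 1e-12
```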
However, we want to know more. Apart from the spectrum itself, we also had the point spectrum $\sigma_p(T)$ and the approximate point spectrum $\sigma_{ap}(T)$, and we had the inclusions
\[
  \sigma_p(T) \subseteq \sigma_{ap}(T) \subseteq \sigma(T).
\]
We know that the largest set $\sigma(T)$ is non-empty, but we want the smaller ones to be non-empty as well. We have the following theorem:
Theorem. We have
\[
  \partial\sigma(T) \subseteq \sigma_{ap}(T),
\]
where $\partial\sigma(T)$ is the boundary of $\sigma(T)$ in the topology of $\mathbb{C}$. In particular, $\sigma_{ap}(T) \neq \emptyset$.
On the other hand, it is possible for $\sigma_p(T)$ to be empty (in infinite-dimensional cases).
Proof. Let $\lambda \in \partial\sigma(T)$. Pick a sequence $\{\lambda_n\}_{n=1}^\infty \subseteq \rho(T) = \mathbb{C}\setminus\sigma(T)$ such that $\lambda_n \to \lambda$. We claim that $R(\lambda_n) = (T - \lambda_n I)^{-1}$ satisfies $\|R(\lambda_n)\|_{B(X)} \to \infty$.
If this were the case, then we can pick $y_n \in X$ such that $\|y_n\| \to 0$ and $\|R(\lambda_n)(y_n)\| = 1$. Setting $x_n = R(\lambda_n)(y_n)$, we have
\begin{align*}
  \|(T - \lambda I)x_n\| &\leq \|(T - \lambda_n I)x_n\|_X + \|(\lambda - \lambda_n)x_n\|_X\\
  &= \|(T - \lambda_n I)(T - \lambda_n I)^{-1}y_n\|_X + \|(\lambda - \lambda_n)x_n\|\\
  &= \|y_n\|_X + |\lambda - \lambda_n|\\
  &\to 0.
\end{align*}
So $\lambda \in \sigma_{ap}(T)$.
Thus, it remains to prove that $\|R(\lambda_n)\|_{B(X)} \to \infty$. Recall from last time that if $S_1$ is invertible, and
\[
  \|S_1^{-1}\|_{B(X)}\|S_1 - S_2\|_{B(X)} < 1, \tag{$*$}
\]
then $S_2$ is invertible. Taking $S_1 = T - \lambda_n I$ and $S_2 = T - \mu I$ for $\mu \in \sigma(T)$ (so that $S_2$ is not invertible), $(*)$ must fail, and we get
\[
  \|R(\lambda_n)\|_{B(X)}|\mu - \lambda_n| = \|R(\lambda_n)\|_{B(X)}\|(T - \lambda_n I) - (T - \mu I)\|_{B(X)} \geq 1.
\]
Thus, it follows that
\[
  \|R(\lambda_n)\|_{B(X)} \geq \frac{1}{\inf\{|\mu - \lambda_n| : \mu\in\sigma(T)\}}.
\]
Since $\lambda_n \to \lambda$ and $\lambda \in \sigma(T)$ (the spectrum is closed, so it contains its boundary points), the infimum tends to $0$. So we are done.
Having proven so many theorems, we now look at a specific example.
Example. Consider the shift operator $S: \ell^\infty \to \ell^\infty$ defined by
\[
  (a_1, a_2, a_3, \cdots) \mapsto (0, a_1, a_2, \cdots).
\]
Then $S$ is a bounded linear operator with norm $\|S\|_{B(\ell^\infty)} = 1$. The theorem then tells us $\sigma(S)$ is a non-empty closed subset of $\{\lambda\in\mathbb{C} : |\lambda| \leq 1\}$.
First, we want to understand what the point spectrum is. In fact, it is empty. To show this, suppose
\[
  S(a_1, a_2, a_3, \cdots) = \lambda(a_1, a_2, a_3, \cdots)
\]
for some $\lambda\in\mathbb{C}$. In other words,
\[
  (0, a_1, a_2, \cdots) = \lambda(a_1, a_2, a_3, \cdots).
\]
First consider the possibility that $\lambda = 0$. Then the right hand side is zero, so both sides vanish, and in particular $a_i = 0$ for all $i$.
If $\lambda \neq 0$, then for the first coordinates to match, we must have $a_1 = 0$. Then for the second coordinates to match, we also need $a_2 = 0$. By induction, we need all $a_i = 0$. So $\ker(S - \lambda I) = \{0\}$ for all $\lambda\in\mathbb{C}$.
To find the spectrum, we will in fact show that
\[
  \sigma(S) = \bar{D} = \{\lambda\in\mathbb{C} : |\lambda| \leq 1\}.
\]
To prove this, we need to show that for any $\lambda \in \bar{D}$, $S - \lambda I$ is not surjective. The $\lambda = 0$ case is obvious. For the other cases, we first have a look at what the image of $S - \lambda I$ looks like.
We take
\[
  (b_1, b_2, b_3, \cdots) \in \ell^\infty.
\]
Suppose for some $\lambda \in \bar{D}\setminus\{0\}$, there exists $(a_1, a_2, \cdots)$ such that
\[
  (S - \lambda I)(a_1, a_2, \cdots) = (b_1, b_2, \cdots).
\]
In other words, we have
\[
  (0, a_1, a_2, \cdots) - (\lambda a_1, \lambda a_2, \lambda a_3, \cdots) = (b_1, b_2, b_3, \cdots).
\]
So $-\lambda a_1 = b_1$. Hence we have
\[
  a_1 = -\lambda^{-1}b_1.
\]
The next coordinate then gives
\[
  a_1 - \lambda a_2 = b_2.
\]
Hence
\[
  a_2 = \lambda^{-1}(a_1 - b_2) = -\lambda^{-1}(b_2 + \lambda^{-1}b_1).
\]
Inductively, we can show that
\[
  a_n = -\lambda^{-1}(b_n + \lambda^{-1}b_{n-1} + \lambda^{-2}b_{n-2} + \cdots + \lambda^{-n+1}b_1).
\]
Now if $|\lambda| \leq 1$, we pick $b_n$ with $|b_n| = 1$ such that for each $n$, the terms $\lambda^{-i}b_{n-i}$ all have the same argument and modulus $|\lambda|^{-i}$ (e.g. $b_m = (|\lambda|/\lambda)^m$). Then we must have
\[
  |a_n| = |\lambda|^{-1}\sum_{i=0}^{n-1}|\lambda|^{-i} \geq n \to \infty.
\]
Such a sequence $(a_n) \not\in \ell^\infty$. So $(b_n) \not\in \operatorname{im}(S - \lambda I)$. Therefore for $|\lambda| \leq 1$, $S - \lambda I$ is not surjective.
Hence we have $\sigma(S) \supseteq \bar{D}$. By the theorem, we also know $\sigma(S) \subseteq \bar{D}$. So in fact $\sigma(S) = \bar{D}$.
Finally, we show that
\[
  \sigma_{ap}(S) = \partial\bar{D} = \{\lambda\in\mathbb{C} : |\lambda| = 1\}.
\]
Our theorem tells us that $\partial\bar{D} \subseteq \sigma_{ap}(S) \subseteq \bar{D}$. To show that indeed $\partial\bar{D} = \sigma_{ap}(S)$, note that if $|\lambda| < 1$, then for all $x \in \ell^\infty$,
\[
  \|(S - \lambda I)x\|_{\ell^\infty} \geq \|Sx\|_{\ell^\infty} - |\lambda|\|x\|_{\ell^\infty} = \|x\|_{\ell^\infty} - |\lambda|\|x\|_{\ell^\infty} = (1 - |\lambda|)\|x\|_{\ell^\infty},
\]
using that $S$ is an isometry. So if $|\lambda| < 1$, then there exists no sequence $x_n$ with $\|x_n\|_{\ell^\infty} = 1$ and $\|(S - \lambda I)x_n\|_{\ell^\infty} \to 0$. So $\lambda$ is not in the approximate point spectrum.
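The lower bound above can be checked numerically. A small sketch (Python with NumPy, illustrative only): we realize the shift on finitely supported sequences, so that no coordinate is dropped, and verify $\|(S - \lambda I)x\|_{\ell^\infty} \geq (1 - |\lambda|)\|x\|_{\ell^\infty}$ on random vectors.

```python
import numpy as np

# Sketch: for the shift S and |lambda| < 1, check
# ||(S - lambda I)x||_inf >= (1 - |lambda|) ||x||_inf
# on finitely supported sequences (represented as length-n vectors).
rng = np.random.default_rng(3)
lam = 0.7

def shift_minus_lambda(x, lam):
    # (S - lambda I)x, returned as a sequence of length n + 1
    return np.concatenate(([0.0], x)) - lam * np.concatenate((x, [0.0]))

for _ in range(1000):
    x = rng.standard_normal(50)
    lhs = np.linalg.norm(shift_minus_lambda(x, lam), np.inf)
    assert lhs >= (1 - abs(lam)) * np.linalg.norm(x, np.inf) - 1e-12
print("lower bound verified on random finitely supported sequences")
```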
We see that these results are rather unpleasant, and the spectrum behaves rather unlike the finite-dimensional case. We saw last time that the spectrum $\sigma(T)$ can in general be complicated. In fact, any non-empty compact subset of $\mathbb{C}$ can be realized as the spectrum of some operator $T$ on some Hilbert space $H$. This is an exercise on the example sheet.
We are now going to introduce a class of “nice” operators whose spectrum behaves in a way more similar to the finite-dimensional case. In particular, most of $\sigma(T)$ consists of $\sigma_p(T)$ and is discrete. This class includes, at least, all finite rank operators (i.e. operators $T$ such that $\dim(\operatorname{im} T) < \infty$) and their limits.
Definition (Compact operator). Let $X, Y$ be Banach spaces. We say $T \in L(X, Y)$ is compact if for every bounded subset $E$ of $X$, $T(E)$ is totally bounded.
We write $B_0(X)$ for the set of all compact operators $T \in B(X)$.
Note that in the definition of compactness, we only required $T$ to be a linear map, not a bounded linear map. However, boundedness comes from the definition for free, because a totally bounded set is bounded.
There is a nice alternative characterization of compact operators:
Proposition. Let $X, Y$ be Banach spaces. Then $T \in L(X, Y)$ is compact if and only if $T(B(1))$ is totally bounded, if and only if $\overline{T(B(1))}$ is compact.
The first equivalence is obvious, since $B(1)$ is a bounded set, and given any bounded set $E$, we can rescale it to be contained in $B(1)$. The second equivalence comes from the fact that a space is compact if and only if it is totally bounded and complete.
The last characterization is what we will use most of the time, and this is
where the name “compact operator” came from.
Proposition. Let $X$ be a Banach space. Then $B_0(X)$ is a closed subspace of $B(X)$. Moreover, if $T \in B_0(X)$ and $S \in B(X)$, then $TS, ST \in B_0(X)$.
In a more algebraic language, this means $B_0(X)$ is a closed ideal of the algebra $B(X)$.
Proof. There are three things to prove. First, it is obvious that $B_0(X)$ is a subspace. To check it is closed, suppose $\{T_n\}_{n=1}^\infty \subseteq B_0(X)$ and $\|T_n - T\|_{B(X)} \to 0$. We need to show $T \in B_0(X)$, i.e. $T(B(1))$ is totally bounded.
Let $\varepsilon > 0$. Then there exists $N$ such that
\[
  \|T - T_n\|_{B(X)} < \varepsilon
\]
whenever $n \geq N$. Take such an $n$. Then $T_n(B(1))$ is totally bounded. So there exist $x_1, \cdots, x_k \in B(1)$ such that $\{T_n x_i\}_{i=1}^k$ is an $\varepsilon$-net for $T_n(B(1))$. We now claim that $\{Tx_i\}_{i=1}^k$ is a $3\varepsilon$-net for $T(B(1))$.
This is easy to show. Let $x \in X$ be such that $\|x\| \leq 1$. Then by the triangle inequality,
\begin{align*}
  \|Tx - Tx_i\|_X &\leq \|Tx - T_n x\| + \|T_n x - T_n x_i\| + \|T_n x_i - Tx_i\|\\
  &\leq \varepsilon + \|T_n x - T_n x_i\|_X + \varepsilon\\
  &= 2\varepsilon + \|T_n x - T_n x_i\|_X.
\end{align*}
Now since $\{T_n x_i\}$ is an $\varepsilon$-net for $T_n(B(1))$, there is some $i$ such that $\|T_n x - T_n x_i\| < \varepsilon$. So this gives
\[
  \|Tx - Tx_i\|_X \leq 3\varepsilon.
\]
Finally, let $T \in B_0(X)$ and $S \in B(X)$. Let $\{x_n\} \subseteq X$ be such that $\|x_n\|_X \leq 1$. Since $T$ is compact, i.e. $\overline{T(B(1))}$ is compact, there exists a convergent subsequence of $\{Tx_n\}$.
Since $S$ is bounded, it maps a convergent sequence to a convergent sequence. So $\{STx_n\}$ also has a convergent subsequence. So $\overline{ST(B(1))}$ is compact, and $ST$ is compact.
We also have to show that $TS(B(1))$ is totally bounded. Since $S$ is bounded, $S(B(1))$ is bounded. Since $T$ sends a bounded set to a totally bounded set, it follows that $TS(B(1))$ is totally bounded. So $TS$ is compact.
At this point, it helps to look at some examples of actual compact operators.
Example.
(i) Let $X, Y$ be Banach spaces. Then any finite rank operator in $B(X, Y)$ is compact. This is since $T(B(1))$ is a bounded subset of a finite-dimensional space (since $T$ is bounded), and any bounded subset of a finite-dimensional space is totally bounded.
In particular, any $f \in X^*$ is compact. Moreover, by the previous proposition, limits of finite-rank operators are also compact (since $B_0(X)$ is closed).
(ii) Let $X$ be a Banach space. Then $I: X \to X$ is compact if and only if $X$ is finite-dimensional. This is since $\overline{B(1)}$ is compact if and only if the space is finite-dimensional.
(iii) Let $K: \mathbb{R}\to\mathbb{R}$ be a smooth function. Define $T: C([0, 1])\to C([0, 1])$ by the convolution
\[
  (Tf)(x) = \int_0^1 K(x - y)f(y)\;\mathrm{d}y.
\]
We first show $T$ is bounded. This is since
\[
  \sup_x |Tf(x)| \leq \sup_{z\in[-1,1]}|K(z)|\sup_y|f(y)|.
\]
Since $K$ is smooth, it is bounded on $[-1, 1]$. So $Tf$ is bounded. In fact, $T$ is compact. To see this, notice
\[
  \sup_x\left|\frac{\mathrm{d}(Tf)}{\mathrm{d}x}(x)\right| \leq \sup_{z\in[-1,1]}|K'(z)|\sup_y|f(y)|.
\]
Therefore if we have a sequence $\{f_n\} \subseteq C([0, 1])$ with $\|f_n\|_{C([0,1])} \leq 1$, then $\{Tf_n\}_{n=1}^\infty$ is uniformly bounded with uniformly bounded derivative, and hence equicontinuous. By the Arzelà–Ascoli theorem, $\{Tf_n\}_{n=1}^\infty$ has a convergent subsequence. Therefore $\overline{T(B(1))}$ is compact.
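One can see this compactness numerically as well. A sketch (Python with NumPy, illustrative only): discretizing the convolution operator on a uniform grid gives a matrix whose singular values decay very rapidly for smooth $K$, so the operator is well approximated by finite-rank ones.

```python
import numpy as np

# Sketch: Riemann-sum discretization of (Tf)(x) = int_0^1 K(x - y) f(y) dy.
# Fast singular value decay reflects compactness (near finite rank).
n = 200
grid = np.linspace(0, 1, n)
K = lambda z: np.exp(-z**2)                 # a smooth kernel, as an example
T_h = K(grid[:, None] - grid[None, :]) / n  # (T_h)_{ij} = K(x_i - y_j) / n

sv = np.linalg.svd(T_h, compute_uv=False)
print("largest singular values:", sv[:5])
print("sv[10]/sv[0] =", sv[10] / sv[0])     # tiny: nearly finite rank
```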
There is much more we can say about the last example. For example, requiring $K$ to be smooth is clearly overkill; requiring, say, differentiability with bounded derivative is already sufficient. Alternatively, we can ask what we get if we work on, say, $L^2$ instead of $C([0, 1])$.
However, we don’t have enough time for that, and instead we should return
to developing some general theory. An important result is the following theorem,
characterizing the point spectrum and spectrum of a compact operator.
Theorem. Let $X$ be an infinite-dimensional Banach space, and $T \in B(X)$ a compact operator. Then $\sigma_p(T) = \{\lambda_i\}$ is at most countable. If $\sigma_p(T)$ is infinite, then $\lambda_i \to 0$.
The spectrum is given by $\sigma(T) = \sigma_p(T)\cup\{0\}$. Moreover, for every non-zero $\lambda_i \in \sigma_p(T)$, the eigenspace is finite-dimensional.
Note that it is still possible for a compact operator to have empty point spectrum. In that case, the spectrum is just $\{0\}$. An example of this is found on the example sheet.
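Before the proof, here is a quick numerical illustration of the eigenvalue statement (Python with NumPy, not from the lectures): the discretization of the convolution operator above, with its symmetric smooth kernel, is a symmetric matrix, and its eigenvalues visibly accumulate only at $0$.

```python
import numpy as np

# Sketch: eigenvalues of a discretized compact (convolution-type) operator
# accumulate at 0.  The symmetric kernel makes T_h a symmetric matrix.
n = 300
grid = np.linspace(0, 1, n)
T_h = np.exp(-(grid[:, None] - grid[None, :])**2) / n

lam = np.sort(np.abs(np.linalg.eigvalsh(T_h)))[::-1]
print("largest |eigenvalues|:", lam[:5])
print("20th largest:", lam[20])  # already negligible: lambda_i -> 0
```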
We will only prove this in the case where $X = H$ is a Hilbert space. In a lot of the proofs, we will have a closed subspace $V \subseteq X$, and we often want to pick an element of $X\setminus V$ that is “far away” from $V$ in some sense. If we have a Hilbert space, then we can pick this element from the orthogonal complement of $V$. If we are not in a Hilbert space, then we need to invoke Riesz's lemma, which we shall not go into.
We will break the full proof of the theorem into many pieces.
Proposition. Let $H$ be a Hilbert space, and $T \in B_0(H)$ a compact operator. Let $a > 0$. Then there are only finitely many linearly independent eigenvectors whose eigenvalues have magnitude $\geq a$.
This already gives most of the theorem, since it mandates that $\sigma_p(T)$ is at most countable, and that if $\sigma_p(T)$ is infinite, we must have $\lambda_i \to 0$. Since for each $a > 0$ there are only finitely many linearly independent eigenvectors with eigenvalue of magnitude $\geq a$, the eigenspaces of non-zero eigenvalues are also finite-dimensional.
Proof. Suppose not. Then there are infinitely many linearly independent $x_1, x_2, x_3, \cdots$ such that $Tx_i = \lambda_i x_i$ with $|\lambda_i| \geq a$.
Define $X_n = \operatorname{span}\{x_1, \cdots, x_n\}$. Since the $x_i$'s are linearly independent, there exists $y_n \in X_n \cap X_{n-1}^\perp$ with $\|y_n\|_H = 1$.
Now let
\[
  z_n = \frac{y_n}{\lambda_n}.
\]
Note that
\[
  \|z_n\|_H \leq \frac{1}{a}.
\]
Since $X_n$ is spanned by the eigenvectors, we know that $T$ maps $X_n$ into itself. So we have $Tz_n \in X_n$.
Moreover, we claim that $Tz_n - y_n \in X_{n-1}$. We can check this directly. Let
\[
  y_n = \sum_{k=1}^{n} c_k x_k.
\]
Then we have
\begin{align*}
  Tz_n - y_n &= \frac{1}{\lambda_n}T\left(\sum_{k=1}^{n} c_k x_k\right) - \sum_{k=1}^{n} c_k x_k\\
  &= \sum_{k=1}^{n} c_k\left(\frac{\lambda_k}{\lambda_n} - 1\right)x_k\\
  &= \sum_{k=1}^{n-1} c_k\left(\frac{\lambda_k}{\lambda_n} - 1\right)x_k \in X_{n-1},
\end{align*}
since the $k = n$ term vanishes.
We next claim that $\|Tz_n - Tz_m\|_H \geq 1$ whenever $n > m$. If this holds, then $T$ is not compact, since $\{z_n\}$ is bounded but $\{Tz_n\}$ has no convergent subsequence.
To show this, wlog assume $n > m$. We have
\[
  \|Tz_n - Tz_m\|^2_H = \|(Tz_n - y_n) - (Tz_m - y_n)\|^2_H.
\]
Note that $Tz_n - y_n \in X_{n-1}$, and since $m < n$, we also have $Tz_m \in X_m \subseteq X_{n-1}$. By construction, $y_n \perp X_{n-1}$. So by the Pythagorean theorem, we have
\[
  \|Tz_n - Tz_m\|^2_H = \|Tz_n - y_n - Tz_m\|^2_H + \|y_n\|^2_H \geq \|y_n\|^2_H = 1.
\]
So done.
To prove the previous theorem, the only remaining thing to prove is that $\sigma(T) = \sigma_p(T)\cup\{0\}$. In order to prove this, we need a lemma, which might seem a bit unmotivated at first, but will soon prove itself useful.
Lemma. Let $H$ be a Hilbert space, and $T \in B(H)$ compact. Then $\operatorname{im}(I - T)$ is closed.
Proof. We let $S$ be the orthogonal complement of $\ker(I - T)$, which is a closed subspace, hence a Hilbert space. We shall consider the restriction $(I - T)|_S$, which has the same image as $I - T$.
To show that $\operatorname{im}(I - T)$ is closed, it suffices to show that $(I - T)|_S$ is bounded below, i.e. there is some $C > 0$ such that
\[
  \|x\|_H \leq C\|(I - T)x\|_H
\]
for all $x \in S$. If this were the case, then if $(I - T)x_n \to y$ in $H$, then
\[
  \|x_n - x_m\| \leq C\|(I - T)(x_n - x_m)\| \to 0,
\]
and so $\{x_n\}$ is a Cauchy sequence. Write $x_n \to x$. Then by continuity, $(I - T)x = y$, and so $y \in \operatorname{im}(I - T)$.
Thus, suppose $(I - T)|_S$ is not bounded below. Pick $x_n \in S$ such that $\|x_n\|_H = 1$ but $(I - T)x_n \to 0$. Since $T$ is compact, we know $Tx_n$ has a convergent subsequence. We may wlog assume $Tx_n \to y$. Then since $\|Tx_n - x_n\|_H \to 0$, it follows that we also have $x_n \to y$. In particular, $\|y\| = 1 \neq 0$, and $y \in S$, since $S$ is closed.
But $x_n \to y$ also implies $Tx_n \to Ty$. So this means we must have $Ty = y$. But this is a contradiction, since no non-zero element of $S$ can lie in $\ker(I - T)$.
Proposition. Let $H$ be a Hilbert space, $T \in B(H)$ compact. If $\lambda \neq 0$ and $\lambda \in \sigma(T)$, then $\lambda \in \sigma_p(T)$.
Proof. We will prove that if $\lambda \neq 0$ and $\lambda \not\in \sigma_p(T)$, then $\lambda \not\in \sigma(T)$. In other words, let $\lambda \neq 0$ and $\ker(T - \lambda I) = \{0\}$. We will show that $T - \lambda I$ is surjective, i.e. $\operatorname{im}(T - \lambda I) = H$.
Suppose this is not the case. Denote $H_0 = H$ and $H_1 = \operatorname{im}(T - \lambda I)$. Since $T - \lambda I = -\lambda(I - \lambda^{-1}T)$ and $\lambda^{-1}T$ is compact, the previous lemma tells us $H_1$ is closed, and it is hence a Hilbert space. Moreover, $H_1 \subsetneq H_0$ by assumption.
We now define the sequence $\{H_n\}$ recursively by
\[
  H_n = (T - \lambda I)H_{n-1}.
\]
We claim that $H_n \subsetneq H_{n-1}$. This must be the case, because the map $(T - \lambda I)^{n-1}: H_0 \to H_{n-1}$ is an isomorphism (it is injective and surjective). So the inclusion $H_n \subseteq H_{n-1}$ is isomorphic to the inclusion $H_1 \subseteq H_0$, which is strict.
Thus we have a strictly decreasing sequence
\[
  H_0 \supsetneq H_1 \supsetneq H_2 \supsetneq \cdots.
\]
Let $y_n$ be such that $y_n \in H_n$, $y_n \perp H_{n+1}$ and $\|y_n\|_H = 1$. We now claim
\[
  \|Ty_n - Ty_m\| \geq |\lambda|
\]
if $n \neq m$. This then contradicts the compactness of $T$. To show this, again wlog we can assume that $n > m$. Then we have
\begin{align*}
  \|Ty_n - Ty_m\|^2_H &= \|(Ty_n - \lambda y_n) - (Ty_m - \lambda y_m) - \lambda y_m + \lambda y_n\|^2\\
  &= \|(T - \lambda I)y_n - (T - \lambda I)y_m - \lambda y_m + \lambda y_n\|^2_H.
\end{align*}
Now note that $(T - \lambda I)y_n \in H_{n+1} \subseteq H_{m+1}$, while $(T - \lambda I)y_m$ and $\lambda y_n$ are both in $H_{m+1}$. So $\lambda y_m$ is perpendicular to all of them, and the Pythagorean theorem tells us
\begin{align*}
  \|Ty_n - Ty_m\|^2_H &= |\lambda|^2\|y_m\|^2 + \|(T - \lambda I)y_n - (T - \lambda I)y_m + \lambda y_n\|^2\\
  &\geq |\lambda|^2\|y_m\|^2 = |\lambda|^2.
\end{align*}
This contradicts the compactness of $T$. Therefore $\operatorname{im}(T - \lambda I) = H$.
Finally, we can prove the initial theorem.
Theorem. Let $H$ be an infinite-dimensional Hilbert space, and $T \in B(H)$ a compact operator. Then $\sigma_p(T) = \{\lambda_i\}$ is at most countable. If $\sigma_p(T)$ is infinite, then $\lambda_i \to 0$.
The spectrum is given by $\sigma(T) = \sigma_p(T)\cup\{0\}$. Moreover, for every non-zero $\lambda_i \in \sigma_p(T)$, the eigenspace is finite-dimensional.
Proof. As mentioned, it remains to show that $\sigma(T) = \sigma_p(T)\cup\{0\}$. The previous proposition tells us $\sigma(T)\setminus\{0\} \subseteq \sigma_p(T)$. So it only remains to show that $0 \in \sigma(T)$.
There are two possible cases. The first is if $\{\lambda_i\}$ is infinite. We have already shown that $\lambda_i \to 0$. So $0 \in \sigma(T)$ by the closedness of the spectrum.
Otherwise, if $\{\lambda_i\}$ is finite, let $E_{\lambda_1}, \cdots, E_{\lambda_n}$ be the eigenspaces. Define
\[
  H_0 = \operatorname{span}\{E_{\lambda_1}, \cdots, E_{\lambda_n}\}^\perp.
\]
This is non-trivial, since each $E_{\lambda_i}$ is finite-dimensional, but $H$ is infinite-dimensional. Then $T$ restricts to $T|_{H_0}: H_0 \to H_0$.
Now $T|_{H_0}$ has no non-zero eigenvalues. By the previous discussion, we know $\sigma(T|_{H_0}) \subseteq \{0\}$. By non-emptiness of $\sigma(T|_{H_0})$, we know $0 \in \sigma(T|_{H_0}) \subseteq \sigma(T)$. So done.
4.6 Self-adjoint operators
We have just looked at compact operators. This time, we are going to add a condition of self-adjointness.
Definition (Self-adjoint operator). Let $H$ be a Hilbert space, $T \in B(H)$. Then $T$ is self-adjoint or Hermitian if for all $x, y \in H$, we have
\[
  \langle Tx, y\rangle = \langle x, Ty\rangle.
\]
It is important to note that we defined the term for bounded linear operators $T$. If we have unbounded operators instead, Hermitian means something different from self-adjoint, and we have to be careful.
Recall that we defined the adjoint of a linear map to be a map of the dual spaces. However, we will often abuse notation and call $T^*: H \to H$ the adjoint, which is the (unique) operator such that for all $x, y \in H$,
\[
  \langle Tx, y\rangle = \langle x, T^*y\rangle.
\]
It is an exercise to show that this is well-defined.
How is this related to the usual adjoint? Let $\tilde{T}: H^* \to H^*$ be the usual adjoint. Then we have
\[
  T^* = \phi^{-1}\circ\tilde{T}\circ\phi,
\]
where $\phi: H \to H^*$ is defined by
\[
  \phi(v)(w) = \langle w, v\rangle
\]
as in the Riesz representation theorem.
The main result regarding self-adjoint operators is the spectral theorem:
Theorem (Spectral theorem). Let $H$ be an infinite-dimensional Hilbert space and $T: H \to H$ a compact self-adjoint operator.
(i) $\sigma_p(T) = \{\lambda_i\}_{i=1}^N$ is at most countable.
(ii) $\sigma_p(T) \subseteq \mathbb{R}$.
(iii) $\sigma(T) = \{0\}\cup\sigma_p(T)$.
(iv) If $E_{\lambda_i}$ are the eigenspaces, then $\dim E_{\lambda_i}$ is finite if $\lambda_i \neq 0$.
(v) $E_{\lambda_i} \perp E_{\lambda_j}$ if $\lambda_i \neq \lambda_j$.
(vi) If $\{\lambda_i\}$ is infinite, then $\lambda_i \to 0$.
(vii) $T = \sum_{i=1}^N \lambda_i P_{E_{\lambda_i}}$.
We have already shown (i), (iii), (iv) and (vi). Parts (ii) and (v) we already did in IA Vectors and Matrices, but for completeness, we will do the proofs again; they do not require compactness. The only non-trivial bit left is the last part, (vii).
We first do the two easy bits.
Proposition. Let $H$ be a Hilbert space and $T \in B(H)$ self-adjoint. Then $\sigma_p(T) \subseteq \mathbb{R}$.
Proof. Let $\lambda \in \sigma_p(T)$ and $v \in \ker(T - \lambda I)\setminus\{0\}$. Then by definition of $v$, we have
\[
  \lambda = \frac{\langle Tv, v\rangle}{\|v\|^2_H} = \frac{\langle v, Tv\rangle}{\|v\|^2_H} = \bar{\lambda}.
\]
So $\lambda \in \mathbb{R}$.
Proposition. Let $H$ be a Hilbert space and $T \in B(H)$ self-adjoint. If $\lambda, \mu \in \sigma_p(T)$ and $\lambda \neq \mu$, then $E_\lambda \perp E_\mu$.
Proof. Let $v \in \ker(T - \lambda I)\setminus\{0\}$ and $w \in \ker(T - \mu I)\setminus\{0\}$. Then
\[
  \lambda\langle v, w\rangle = \langle Tv, w\rangle = \langle v, Tw\rangle = \bar{\mu}\langle v, w\rangle = \mu\langle v, w\rangle,
\]
using the fact that eigenvalues are real. Since $\lambda \neq \mu$ by assumption, we must have $\langle v, w\rangle = 0$.
To prove the final part, we need the following proposition:
Proposition. Let $H$ be a Hilbert space and $T \in B(H)$ a compact self-adjoint operator. If $T \neq 0$, then $T$ has a non-zero eigenvalue.
This is consistent with our spectral theorem, since if $T$ is non-zero, then something in the sum $\sum \lambda_i P_{E_{\lambda_i}}$ has to be non-zero. It turns out this is most of the work we need.
However, to prove this, we need the following lemma:
Lemma. Let $H$ be a Hilbert space, and $T \in B(H)$ a compact self-adjoint operator. Then
\[
  \|T\|_{B(H)} = \sup_{\|x\|_H = 1}|\langle x, Tx\rangle|.
\]
Proof. Write
\[
  \lambda = \sup_{\|x\|_H = 1}|\langle x, Tx\rangle|.
\]
Note that one direction is easy, since for all $x$, Cauchy–Schwarz gives
\[
  |\langle x, Tx\rangle| \leq \|Tx\|_H\|x\|_H \leq \|T\|_{B(H)}\|x\|^2_H.
\]
So it suffices to show the inequality in the other direction. We now claim that
\[
  \|T\|_{B(H)} = \sup_{\|x\|_H = 1,\, \|y\|_H = 1}|\langle Tx, y\rangle|.
\]
To show this, recall that $\phi: H \to H^*$ defined by $v \mapsto \langle\,\cdot\,, v\rangle$ is an isometry. By definition, we have
\[
  \|T\|_{B(H)} = \sup_{\|x\|_H = 1}\|Tx\|_H = \sup_{\|x\|_H = 1}\|\phi(Tx)\|_{H^*} = \sup_{\|x\|_H = 1}\sup_{\|y\|_H = 1}|\langle y, Tx\rangle|.
\]
Hence, it suffices to show that
\[
  \sup_{\|x\|_H = 1,\, \|y\|_H = 1}|\langle Tx, y\rangle| \leq \lambda.
\]
Take $x, y \in H$ such that $\|x\|_H = \|y\|_H = 1$. We first perform a trick similar to the polarization identity. By multiplying $y$ by an appropriate scalar, we can wlog assume $\langle Tx, y\rangle$ is real. Then we have
\begin{align*}
  |\langle T(x + y), x + y\rangle - \langle T(x - y), x - y\rangle| &= 2|\langle Tx, y\rangle + \langle Ty, x\rangle|\\
  &= 4|\langle Tx, y\rangle|.
\end{align*}
Hence we have
\begin{align*}
  |\langle Tx, y\rangle| &= \frac{1}{4}|\langle T(x + y), x + y\rangle - \langle T(x - y), x - y\rangle|\\
  &\leq \frac{1}{4}\left(\lambda\|x + y\|^2_H + \lambda\|x - y\|^2_H\right)\\
  &= \frac{\lambda}{4}\left(2\|x\|^2_H + 2\|y\|^2_H\right)\\
  &= \lambda,
\end{align*}
where we used the parallelogram law. So we have $\|T\|_{B(H)} \leq \lambda$.
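For Hermitian matrices, this lemma is easy to check numerically. A sketch (Python with NumPy, illustrative only): both sides equal the largest $|\lambda_i|$, and the supremum is attained at a corresponding unit eigenvector.

```python
import numpy as np

# Sketch: for a Hermitian matrix T, ||T||_2 = sup_{||x||=1} |<x, Tx>|.
rng = np.random.default_rng(4)
A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
T = (A + A.conj().T) / 2  # self-adjoint

w, V = np.linalg.eigh(T)        # real eigenvalues, orthonormal eigenvectors
x = V[:, np.argmax(np.abs(w))]  # unit eigenvector for the largest |lambda|
print(np.linalg.norm(T, 2), "=", abs(x.conj() @ T @ x))  # the two agree
```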
Finally, we can prove our proposition.
Proposition. Let $H$ be a Hilbert space and $T \in B(H)$ a compact self-adjoint operator. If $T \neq 0$, then $T$ has a non-zero eigenvalue.
Proof. Since $T \neq 0$, we have $\|T\|_{B(H)} \neq 0$. Let $\|T\|_{B(H)} = \lambda$. We now claim that either $\lambda$ or $-\lambda$ is an eigenvalue of $T$.
By the previous lemma, there exists a sequence $\{x_n\}_{n=1}^\infty \subseteq H$ such that $\|x_n\|_H = 1$ and $\langle x_n, Tx_n\rangle \to \pm\lambda$.
We consider the two cases separately. Suppose $\langle x_n, Tx_n\rangle \to \lambda$, and consider $Tx_n - \lambda x_n$. Since $T$ is compact, there exists a subsequence such that $Tx_{n_k} \to y$ for some $y \in H$. For simplicity of notation, we assume $Tx_n \to y$ itself. We have
\begin{align*}
  0 \leq \|Tx_n - \lambda x_n\|^2_H &= \langle Tx_n - \lambda x_n, Tx_n - \lambda x_n\rangle\\
  &= \|Tx_n\|^2_H - 2\lambda\langle Tx_n, x_n\rangle + \lambda^2\|x_n\|^2\\
  &\leq \lambda^2 - 2\lambda\langle Tx_n, x_n\rangle + \lambda^2\\
  &\to \lambda^2 - 2\lambda^2 + \lambda^2 = 0
\end{align*}
as $n \to \infty$. Note that we implicitly used the fact that $\langle Tx_n, x_n\rangle = \langle x_n, Tx_n\rangle$, since $\langle Tx_n, x_n\rangle$ is real. So we must have
\[
  \|Tx_n - \lambda x_n\|^2_H \to 0.
\]
In other words, since $Tx_n \to y$, we get
\[
  x_n \to \frac{1}{\lambda}y.
\]
Note that $y \neq 0$, since $\|x_n\| = 1$ forces $\|y\| = \lambda \neq 0$. Finally, we show $y$ is an eigenvector. This is easy, since
\[
  Ty = \lim_{n\to\infty}T(\lambda x_n) = \lambda y,
\]
as $\lambda x_n \to y$.
The case where $\langle x_n, Tx_n\rangle \to -\lambda$ is entirely analogous, and in this case $-\lambda$ is an eigenvalue. The proof is exactly the same, apart from some switching of signs.
Finally, we can prove the last part of the spectral theorem.
Proposition. Let $H$ be an infinite-dimensional Hilbert space and $T: H \to H$ a compact self-adjoint operator. Then
\[
  T = \sum_{i=1}^N \lambda_i P_{E_{\lambda_i}}.
\]
Proof. Let
\[
  U = \operatorname{span}\{E_{\lambda_1}, E_{\lambda_2}, \cdots\}.
\]
Firstly, we clearly have
\[
  T|_U = \sum_{i=1}^N \lambda_i P_{E_{\lambda_i}},
\]
since any $x \in U$ can be written as the finite sum
\[
  x = \sum_{i=1}^{n} P_{E_{\lambda_i}}x.
\]
Less trivially, this is also true for $\bar{U}$, i.e.
\[
  T|_{\bar{U}} = \sum_{i=1}^N \lambda_i P_{E_{\lambda_i}},
\]
but this is also clear from the definition once we stare at it hard enough, using the continuity of both sides.
We also know that
\[
  H = \bar{U}\oplus\bar{U}^\perp.
\]
It thus suffices to show that $T|_{\bar{U}^\perp} = 0$. But since $T|_{\bar{U}^\perp}$ is a compact self-adjoint operator with no non-zero eigenvalues, this follows from our previous proposition. So done.
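To close, the decomposition is easy to verify in finite dimensions. A numerical sketch (Python with NumPy; a random Hermitian matrix stands in for a compact self-adjoint operator): reassembling $T$ from its eigenvalues and the projections onto the eigenspaces recovers $T$ exactly.

```python
import numpy as np

# Sketch of T = sum_i lambda_i P_{E_i} for a Hermitian matrix.
# Generically the eigenvalues are simple, so each projection is
# rank one: P_i = v_i v_i^*.
rng = np.random.default_rng(5)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
T = (A + A.conj().T) / 2

w, V = np.linalg.eigh(T)
T_rebuilt = sum(w[i] * np.outer(V[:, i], V[:, i].conj()) for i in range(5))
print("reconstruction error:", np.linalg.norm(T - T_rebuilt))
```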