Part IB — Linear Algebra
Based on lectures by S. J. Wadsley
Notes taken by Dexter Chua
Michaelmas 2015
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
Definition of a vector space (over R or C), subspaces, the space spanned by a subset. Linear independence, bases, dimension. Direct sums and complementary subspaces. [3]

Linear maps, isomorphisms. Relation between rank and nullity. The space of linear maps from U to V, representation by matrices. Change of basis. Row rank and column rank. [4]

Determinant and trace of a square matrix. Determinant of a product of two matrices and of the inverse matrix. Determinant of an endomorphism. The adjugate matrix. [3]

Eigenvalues and eigenvectors. Diagonal and triangular forms. Characteristic and minimal polynomials. Cayley-Hamilton Theorem over C. Algebraic and geometric multiplicity of eigenvalues. Statement and illustration of Jordan normal form. [4]

Dual of a finite-dimensional vector space, dual bases and maps. Matrix representation, rank and determinant of dual map. [2]

Bilinear forms. Matrix representation, change of basis. Symmetric forms and their link with quadratic forms. Diagonalisation of quadratic forms. Law of inertia, classification by rank and signature. Complex Hermitian forms. [4]

Inner product spaces, orthonormal sets, orthogonal projection, V = W ⊕ W⊥. Gram-Schmidt orthogonalisation. Adjoints. Diagonalisation of Hermitian matrices. Orthogonality of eigenvectors and properties of eigenvalues. [4]
Contents
0 Introduction
1 Vector spaces
1.1 Definitions and examples
1.2 Linear independence, bases and the Steinitz exchange lemma
1.3 Direct sums
2 Linear maps
2.1 Definitions and examples
2.2 Linear maps and matrices
2.3 The first isomorphism theorem and the rank-nullity theorem
2.4 Change of basis
2.5 Elementary matrix operations
3 Duality
3.1 Dual space
3.2 Dual maps
4 Bilinear forms I
5 Determinants of matrices
6 Endomorphisms
6.1 Invariants
6.2 The minimal polynomial
6.2.1 Aside on polynomials
6.2.2 Minimal polynomial
6.3 The Cayley-Hamilton theorem
6.4 Multiplicities of eigenvalues and Jordan normal form
7 Bilinear forms II
7.1 Symmetric bilinear forms and quadratic forms
7.2 Hermitian form
8 Inner product spaces
8.1 Definitions and basic properties
8.2 Gram-Schmidt orthogonalisation
8.3 Adjoints, orthogonal and unitary maps
8.4 Spectral theory
0 Introduction
In IA Vectors and Matrices, we have learnt about vectors (and matrices) in a
rather applied way. A vector was just treated as a “list of numbers” representing
a point in space. We used these to represent lines, conics, planes and many
other geometrical notions. A matrix was treated as a “physical operation” on
vectors that stretches and rotates them. For example, we studied the properties
of rotations, reflections and shears of space. We also used matrices to express
and solve systems of linear equations. We mostly took a practical approach in
the course.
In IB Linear Algebra, we are going to study vectors in an abstract setting.
Instead of treating vectors as “lists of numbers”, we view them as things we
can add and scalarmultiply. We will write down axioms governing how these
operations should behave, just like how we wrote down the axioms of group
theory. Instead of studying matrices as an array of numbers, we instead look at
linear maps between vector spaces abstractly.
In the course, we will, of course, prove that this abstract treatment of linear
algebra is just “the same as” our previous study of “vectors as a list of numbers”.
Indeed, in certain cases, results are much more easily proved by working with
matrices (as an array of numbers) instead of abstract linear maps, and we don’t
shy away from doing so. However, most of the time, looking at these abstractly
will provide a much better fundamental understanding of how things work.
1 Vector spaces
1.1 Definitions and examples
Notation. We will use F to denote an arbitrary field, usually R or C.
Intuitively, a vector space V over a field F (or an F-vector space) is a space with two operations:

– We can add two vectors v₁, v₂ ∈ V to obtain v₁ + v₂ ∈ V.

– We can multiply a scalar λ ∈ F with a vector v ∈ V to obtain λv ∈ V.
Of course, these two operations must satisfy certain axioms before we can
call it a vector space. However, before going into these details, we first look at a
few examples of vector spaces.
Example.
(i) R^n = {column vectors of length n with coefficients in R}, with the usual addition and scalar multiplication, is a vector space.
An m × n matrix A with coefficients in R can be viewed as a linear map from R^n to R^m via v ↦ Av.
This is a motivational example for vector spaces. When confused about definitions, we can often think what the definition means in terms of R^n and matrices to get some intuition.
(ii) Let X be a set and define R^X = {f : X → R}, with addition (f + g)(x) = f(x) + g(x) and scalar multiplication (λf)(x) = λf(x). This is a vector space.
More generally, if V is a vector space and X is a set, we can define V^X = {f : X → V} with addition and scalar multiplication as above.
(iii) Let [a, b] ⊆ R be a closed interval. Then
C([a, b], R) = {f ∈ R^[a,b] : f is continuous}
is a vector space, with operations as above. We also have
C^∞([a, b], R) = {f ∈ R^[a,b] : f is infinitely differentiable},
which is again a vector space.
(iv) The set of m × n matrices with coefficients in R, with componentwise addition and scalar multiplication, is a vector space.
Of course, we cannot take a random set, define some random operations
called addition and scalar multiplication, and call it a vector space. These
operations have to behave sensibly.
Definition (Vector space). An F-vector space is an (additive) abelian group V together with a function F × V → V, written (λ, v) ↦ λv, such that

(i) λ(µv) = (λµ)v for all λ, µ ∈ F, v ∈ V (associativity)

(ii) λ(u + v) = λu + λv for all λ ∈ F, u, v ∈ V (distributivity in V)

(iii) (λ + µ)v = λv + µv for all λ, µ ∈ F, v ∈ V (distributivity in F)

(iv) 1v = v for all v ∈ V (identity)

We always write 0 for the additive identity in V, and call this the identity. By abuse of notation, we also write 0 for the trivial vector space {0}.
In a general vector space, there is no notion of “coordinates”, length, angle or distance. For example, it would be difficult to assign these quantities to the vector space of real-valued continuous functions on [a, b].
From the axioms, there are a few results we can immediately prove.
Proposition. In any vector space V, we have 0v = 0 for all v ∈ V, and (−1)v = −v, where −v is the additive inverse of v.
Proof is left as an exercise.
In mathematics, whenever we define “something”, we would also like to define
a “subsomething”. In the case of vector spaces, this is a subspace.
Definition (Subspace). If V is an F-vector space, then U ⊆ V is an (F-linear) subspace if

(i) u, v ∈ U implies u + v ∈ U.

(ii) u ∈ U, λ ∈ F implies λu ∈ U.

(iii) 0 ∈ U.

These conditions can be expressed more concisely as “U is non-empty and if λ, µ ∈ F, u, v ∈ U, then λu + µv ∈ U”.
Alternatively, U is a subspace of V if it is itself a vector space, inheriting the operations from V.
We sometimes write U ≤ V if U is a subspace of V .
Example.
(i) {(x₁, x₂, x₃) ∈ R³ : x₁ + x₂ + x₃ = t} is a subspace of R³ iff t = 0.
(ii) Let X be a set. We define the support of f ∈ F^X to be supp(f) = {x ∈ X : f(x) ≠ 0}. Then the set of functions with finite support forms a vector subspace. This is since supp(f + g) ⊆ supp(f) ∪ supp(g), supp(λf) = supp(f) (for λ ≠ 0) and supp(0) = ∅.
If we have two subspaces U and W, there are several things we can do with them. For example, we can take the intersection U ∩ W. We will shortly show that this will be a subspace. However, taking the union will in general not produce a vector space. Instead, we need the sum:
Definition (Sum of subspaces). Suppose U, W are subspaces of an F-vector space V. The sum of U and W is
U + W = {u + w : u ∈ U, w ∈ W}.
Proposition. Let U, W be subspaces of V. Then U + W and U ∩ W are subspaces.

Proof. Let uᵢ + wᵢ ∈ U + W and λ, µ ∈ F. Then
λ(u₁ + w₁) + µ(u₂ + w₂) = (λu₁ + µu₂) + (λw₁ + µw₂) ∈ U + W.
Similarly, if vᵢ ∈ U ∩ W, then λv₁ + µv₂ ∈ U and λv₁ + µv₂ ∈ W. So λv₁ + µv₂ ∈ U ∩ W.
Both U ∩ W and U + W contain 0, and are non-empty. So done.
In addition to subsomethings, we often have quotientsomethings as well.
Definition (Quotient spaces). Let V be a vector space and U ⊆ V a subspace. Then the quotient group V/U can be made into a vector space, called the quotient space, where scalar multiplication is given by (λ, v + U) ↦ (λv) + U.
This is well-defined since if v + U = w + U ∈ V/U, then v − w ∈ U. Hence for λ ∈ F, we have λv − λw ∈ U. So λv + U = λw + U.
1.2 Linear independence, bases and the Steinitz exchange
lemma
Recall that in R^n, we had the “standard basis” made of vectors of the form eᵢ = (0, ··· , 0, 1, 0, ··· , 0), with 1 in the ith component and 0 otherwise. We call this a basis because everything in R^n can be (uniquely) written as a sum of (scalar multiples of) these basis elements. In other words, the whole of R^n is generated by taking sums and multiples of the basis elements.
We would like to capture this idea in general vector spaces. The most important result in this section is that for any vector space V, any two bases must contain the same number of elements. This means we can define the “dimension” of a vector space as the number of elements in a basis.
While this result sounds rather trivial, it is a very important one. We will in fact prove a slightly stronger statement than the one above, and this ensures that the dimension of a vector space is well-behaved. For example, a subspace of a vector space has a smaller dimension than the whole space (at least when the dimensions are finite).
This is not the case when we study modules in IB Groups, Rings and Modules, which are generalizations of vector spaces. Not all modules have bases, which makes it difficult to define the dimension. Even for those that have bases, the behaviour of the “dimension” is complicated when, say, we take submodules. The existence and well-behavedness of bases and dimension is what makes linear algebra different from the study of modules.
Definition (Span). Let V be a vector space over F and S ⊆ V. The span of S is defined as
⟨S⟩ = { Σᵢ₌₁ⁿ λᵢsᵢ : λᵢ ∈ F, sᵢ ∈ S, n ≥ 0 }.
This is the smallest subspace of V containing S.
Note that the sums must be finite. We will not play with infinite sums, since
the notion of convergence is not even well defined in a general vector space.
Example.
(i) Let V = R³ and S = {(1, 0, 0), (0, 1, 1), (1, 2, 2)} (written as column vectors). Then
⟨S⟩ = {(a, b, b) : a, b ∈ R}.
Note that any subset of S of order 2 has the same span as S.
(ii) Let X be a set and x ∈ X. Define the function δₓ : X → F by
δₓ(y) = 1 if y = x, and δₓ(y) = 0 if y ≠ x.
Then ⟨δₓ : x ∈ X⟩ is the set of all functions with finite support.
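These spans can also be checked numerically. The following is a small sketch (not part of the notes; the helper `in_span` is our own name), using the standard fact that a vector lies in the span of some columns exactly when appending it does not raise the rank:

```python
import numpy as np

# Elements of S from example (i), stored as the columns of a matrix.
S = np.array([[1., 0., 1.],
              [0., 1., 2.],
              [0., 1., 2.]])

def in_span(cols, v):
    # v lies in the span of the columns iff appending v leaves the rank unchanged
    return np.linalg.matrix_rank(np.column_stack([cols, v])) == np.linalg.matrix_rank(cols)

# Vectors of the form (a, b, b) lie in the span; (0, 0, 1) does not.
assert in_span(S, np.array([5., -3., -3.]))
assert not in_span(S, np.array([0., 0., 1.]))

# Any subset of S of order 2 has the same (2-dimensional) span as S.
for drop in (0, 1, 2):
    assert np.linalg.matrix_rank(np.delete(S, drop, axis=1)) == 2
```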
Definition (Spanning set). Let V be a vector space over F and S ⊆ V. We say S spans V if ⟨S⟩ = V.
Definition (Linear independence). Let V be a vector space over F and S ⊆ V. Then S is linearly independent if whenever
Σᵢ₌₁ⁿ λᵢsᵢ = 0 with λᵢ ∈ F and s₁, s₂, ··· , sₙ ∈ S distinct,
we must have λᵢ = 0 for all i.
If S is not linearly independent, we say it is linearly dependent.
Definition (Basis). Let V be a vector space over F and S ⊆ V. Then S is a basis for V if S is linearly independent and spans V.

Definition (Finite dimensional). A vector space is finite dimensional if there is a finite basis.
Ideally, we would want to define the dimension as the number of vectors in
the basis. However, we must first show that this is welldefined. It is certainly
plausible that a vector space has a basis of size 7 as well as a basis of size 3. We
must show that this can never happen, which is something we’ll do soon.
We will first have an example:
Example. Again, let V = R³ and S = {(1, 0, 0), (0, 1, 1), (1, 2, 2)}. Then S is linearly dependent since
1·(1, 0, 0) + 2·(0, 1, 1) + (−1)·(1, 2, 2) = 0.
S also does not span V, since (0, 0, 1) ∉ ⟨S⟩.
Note that no linearly independent set can contain 0, as 1 · 0 = 0. We also have ⟨∅⟩ = {0}, and ∅ is a basis for this space.
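The dependence in the example, and the rank test for independence, can be sketched numerically (our own illustration, with a helper `independent` that is not part of the notes):

```python
import numpy as np

s1, s2, s3 = np.array([1., 0., 0.]), np.array([0., 1., 1.]), np.array([1., 2., 2.])

# The dependence exhibited above: 1·s1 + 2·s2 + (−1)·s3 = 0.
assert np.allclose(1*s1 + 2*s2 + (-1)*s3, 0)

def independent(vectors):
    # A finite list is independent iff the matrix with those columns has full column rank.
    return np.linalg.matrix_rank(np.column_stack(vectors)) == len(vectors)

assert not independent([s1, s2, s3])  # S is linearly dependent
assert independent([s1, s2])          # but any two of its elements are independent
```

Note that a list containing the zero vector always fails the test, matching the remark above.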
There is an alternative way in which we can define linear independence.
Lemma. S ⊆ V is linearly dependent if and only if there are distinct s₀, ··· , sₙ ∈ S and λ₁, ··· , λₙ ∈ F such that
Σᵢ₌₁ⁿ λᵢsᵢ = s₀.
Proof. If S is linearly dependent, then there are some λ₁, ··· , λₙ ∈ F not all zero and distinct s₁, ··· , sₙ ∈ S such that Σ λᵢsᵢ = 0. Wlog, let λ₁ ≠ 0. Then
s₁ = Σᵢ₌₂ⁿ (−λᵢ/λ₁) sᵢ.
Conversely, if s₀ = Σᵢ₌₁ⁿ λᵢsᵢ, then
(−1)s₀ + Σᵢ₌₁ⁿ λᵢsᵢ = 0.
So S is linearly dependent.
This in turn gives an alternative characterization of what it means to be a
basis:
Proposition. If S = {e₁, ··· , eₙ} is a subset of V over F, then it is a basis if and only if every v ∈ V can be written uniquely as a finite linear combination of elements in S, i.e. as
v = Σᵢ₌₁ⁿ λᵢeᵢ.
Proof. We can view this as a combination of two statements: every v can be written as such a combination in at least one way, and in at most one way. We will see that the first part corresponds to S spanning V, and the second part corresponds to S being linearly independent.
In fact, S spanning V is defined exactly to mean that every v ∈ V can be written as a finite linear combination in at least one way.
Now suppose that S is linearly independent, and we have
v = Σᵢ₌₁ⁿ λᵢeᵢ = Σᵢ₌₁ⁿ µᵢeᵢ.
Then we have
0 = v − v = Σᵢ₌₁ⁿ (λᵢ − µᵢ)eᵢ.
Linear independence implies that λᵢ − µᵢ = 0 for all i. Hence λᵢ = µᵢ. So v can be expressed in a unique way.
On the other hand, if S is not linearly independent, then we have
0 = Σᵢ₌₁ⁿ λᵢeᵢ
where λᵢ ≠ 0 for some i. But we also know that
0 = Σᵢ₌₁ⁿ 0 · eᵢ.
So there are two ways to write 0 as a linear combination. So done.
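For V = R^n, the unique coefficients guaranteed by this proposition are recovered by solving a linear system whose columns are the basis vectors. A small sketch (our own illustration, not from the notes):

```python
import numpy as np

# A basis of R^3 as the columns of an invertible matrix E.
E = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [0., 0., 1.]])
v = np.array([2., 3., 4.])

# The unique coefficients λ_i with v = Σ λ_i e_i come from solving E λ = v.
lam = np.linalg.solve(E, v)
assert np.allclose(lam, [3., -1., 4.])
assert np.allclose(E @ lam, v)
```

Uniqueness corresponds precisely to E being invertible, i.e. the columns being a basis.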
Now we come to the key theorem:
Theorem (Steinitz exchange lemma). Let V be a vector space over F, let S = {e₁, ··· , eₙ} be a finite linearly independent subset of V, and let T be a spanning subset of V. Then there is some T′ ⊆ T of order n such that (T \ T′) ∪ S still spans V. In particular, |T| ≥ n.

What does this actually say? It says that if T is spanning and S is independent, there is a way of grabbing |S| many elements away from T and replacing them with S, and the result will still be spanning.
In some sense, the final remark is the most important part. It tells us that we cannot have an independent set larger than a spanning set, and most of our corollaries later will only use this remark.
This is sometimes stated in the following alternative way for |T| < ∞.
Corollary. Let {e₁, ··· , eₙ} be a linearly independent subset of V, and suppose {f₁, ··· , fₘ} spans V. Then there is a reordering of the {fᵢ} such that {e₁, ··· , eₙ, fₙ₊₁, ··· , fₘ} spans V.
The proof is going to be slightly technical and notationally daunting. So it helps to give a brief overview of what we are going to do in words first. The idea is to do the replacement one by one. The first one is easy. Start with e₁. Since T is spanning, we can write
e₁ = Σ λᵢtᵢ
for some tᵢ ∈ T and λᵢ ∈ F non-zero. We then replace t₁ with e₁. The result is still spanning, since the above formula allows us to write t₁ in terms of e₁ and the other tᵢ.
We continue inductively. For the rth element, we again write
eᵣ = Σ λᵢtᵢ.
We would like to just pick a random tᵢ and replace it with eᵣ. However, we cannot do this arbitrarily, since the lemma wants us to replace something in T with eᵣ. After all the replacements carried out before, some of the tᵢ might have actually come from S.
This is where the linear independence of S kicks in. While some of the tᵢ might be from S, we cannot possibly have all of them being from S, or else this would violate the linear independence of S. Hence there is something genuinely from T, and we can safely replace it with eᵣ.
We now write this argument properly and formally.

Proof. Suppose that we have already found T′ᵣ ⊆ T of order 0 ≤ r < n such that
Tᵣ = (T \ T′ᵣ) ∪ {e₁, ··· , eᵣ}
spans V.
(Note that the case r = 0 is trivial, since we can take T′ᵣ = ∅, and the case r = n is the theorem we want to achieve.)
Suppose we have these. Since Tᵣ spans V, we can write
eᵣ₊₁ = Σᵢ₌₁ᵏ λᵢtᵢ, with λᵢ ∈ F non-zero and tᵢ ∈ Tᵣ.
We know that the eᵢ are linearly independent, so not all of the tᵢ are eᵢ's. So there is some j such that tⱼ ∈ (T \ T′ᵣ). We can write this as
tⱼ = (1/λⱼ) eᵣ₊₁ + Σ_{i ≠ j} (−λᵢ/λⱼ) tᵢ.
We let T′ᵣ₊₁ = T′ᵣ ∪ {tⱼ}, of order r + 1, and
Tᵣ₊₁ = (T \ T′ᵣ₊₁) ∪ {e₁, ··· , eᵣ₊₁} = (Tᵣ \ {tⱼ}) ∪ {eᵣ₊₁}.
Since tⱼ is in the span of Tᵣ₊₁ by the formula above, we have tⱼ ∈ ⟨Tᵣ₊₁⟩. So
V ⊇ ⟨Tᵣ₊₁⟩ ⊇ ⟨Tᵣ⟩ = V.
So ⟨Tᵣ₊₁⟩ = V.
Hence we can inductively find Tₙ.
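The exchange procedure in the proof can be carried out mechanically for subspaces of R^n, using rank to test spanning. The following is our own algorithmic sketch of the argument (the function name and rank-based spanning test are assumptions, not part of the notes):

```python
import numpy as np

def steinitz_exchange(S, T):
    """Swap each e ∈ S for some element still belonging to T, keeping the
    result spanning (S independent, T spanning; vectors are 1-D arrays)."""
    space_dim = np.linalg.matrix_rank(np.column_stack(T))
    def spans(vectors):
        return np.linalg.matrix_rank(np.column_stack(vectors)) == space_dim
    current = list(T)
    swappable = list(range(len(T)))        # indices not yet replaced by S-elements
    for e in S:
        for idx in swappable:
            trial = current[:idx] + [e] + current[idx + 1:]
            if spans(trial):               # the proof guarantees some such idx works
                current[idx] = e
                swappable.remove(idx)
                break
    return current

S = [np.array([1., 1., 0.]), np.array([0., 1., 1.])]
T = [np.array([1., 0., 0.]), np.array([0., 1., 0.]), np.array([0., 0., 1.])]
result = steinitz_exchange(S, T)
assert np.linalg.matrix_rank(np.column_stack(result)) == 3   # still spans R^3
```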
From this lemma, we can immediately deduce a lot of important corollaries.
Corollary. Suppose V is a vector space over F with a basis of order n. Then
(i) Every basis of V has order n.
(ii) Any linearly independent set of order n is a basis.
(iii) Every spanning set of order n is a basis.
(iv) Every finite spanning set contains a basis.
(v) Every linearly independent subset of V can be extended to a basis.
Proof. Let S = {e₁, ··· , eₙ} be the basis for V.
(i) Suppose T is another basis. Since S is independent and T is spanning, |T| ≥ |S|.
The other direction is less trivial, since T might be infinite, and Steinitz does not immediately apply. Instead, we argue as follows: since T is linearly independent, every finite subset of T is independent. Also, S is spanning. So every finite subset of T has order at most |S|. So |T| ≤ |S|. So |T| = |S|.
(ii) Suppose now that T is a linearly independent subset of order n, but ⟨T⟩ ≠ V. Then there is some v ∈ V \ ⟨T⟩. We now show that T ∪ {v} is independent. Indeed, if
λ₀v + Σᵢ₌₁ᵐ λᵢtᵢ = 0
with λᵢ ∈ F and t₁, ··· , tₘ ∈ T distinct, then
λ₀v = Σᵢ₌₁ᵐ (−λᵢ)tᵢ.
Then λ₀v ∈ ⟨T⟩. So λ₀ = 0. As T is linearly independent, we then have λ₁ = ··· = λₘ = 0 as well. So T ∪ {v} is a linearly independent subset of size > n. This is a contradiction since S is a spanning set of size n.
(iii) Let T be a spanning set of order n. If T were linearly dependent, then there would be distinct t₀, ··· , tₘ ∈ T and λ₁, ··· , λₘ ∈ F such that
t₀ = Σ λᵢtᵢ.
So t₀ ∈ ⟨T \ {t₀}⟩, and hence ⟨T \ {t₀}⟩ = V. So T \ {t₀} is a spanning set of order n − 1, which is a contradiction.
(iv) Suppose T is any finite spanning set. Let T′ ⊆ T be a spanning set of least possible size. This exists because T is finite. If T′ has size n, then we are done by (iii). Otherwise, by the Steinitz exchange lemma, it has size |T′| > n. So T′ must be linearly dependent, because S is spanning and no independent set can be larger than it. So there are distinct t₀, ··· , tₘ ∈ T′ and λ₁, ··· , λₘ ∈ F such that t₀ = Σ λᵢtᵢ. Then T′ \ {t₀} is a smaller spanning set. Contradiction.
(v) Suppose T is a linearly independent set. Since S spans, there is some S′ ⊆ S of order |T| such that (S \ S′) ∪ T spans V, by the Steinitz exchange lemma. This spanning set has order at most n, hence exactly n, so by (iii), (S \ S′) ∪ T is a basis of V containing T.
Note that the last part is where we actually use the full result of Steinitz.
Finally, we can use this to define the dimension.
Definition (Dimension). If V is a vector space over F with finite basis S, then the dimension of V is
dim V = dim_F V = |S|.

By the corollary, dim V does not depend on the choice of S. However, it does depend on F. For example, dim_C C = 1 (since {1} is a basis), but dim_R C = 2 (since {1, i} is a basis).
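To make the second example concrete, here is a tiny sketch (our own illustration) of C viewed as a real vector space with basis {1, i}:

```python
# Every z ∈ C is uniquely z = a·1 + b·i with a, b ∈ R,
# so {1, i} is an R-basis and dim_R C = 2.
z = 3 - 4j
coords = (z.real, z.imag)          # coordinates with respect to {1, i}
assert coords[0] * 1 + coords[1] * 1j == z
assert len(coords) == 2            # two real coordinates: dim_R C = 2
```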
After defining the dimension, we can prove a few things about dimensions.
Lemma. If V is a finite-dimensional vector space over F and U ⊆ V is a proper subspace, then U is finite-dimensional and dim U < dim V.

Proof. Every linearly independent subset of V has size at most dim V. So let S ⊆ U be a linearly independent subset of largest size. We want to show that S spans U and |S| < dim V.
If v ∈ V \ ⟨S⟩, then S ∪ {v} is linearly independent. So v ∉ U by maximality of S. This means that ⟨S⟩ = U.
Since U ≠ V, there is some v ∈ V \ U = V \ ⟨S⟩. So S ∪ {v} is a linearly independent subset of order |S| + 1. So |S| + 1 ≤ dim V. In particular, dim U = |S| < dim V.
Proposition. If U, W are subspaces of a finite-dimensional vector space V, then
dim(U + W) = dim U + dim W − dim(U ∩ W).

The proof is not hard, as long as we manage to pick the right bases to do the proof. This is our slogan:
When you choose a basis, always choose the right basis.
We need bases for all four spaces involved, and we want to compare them. So we want to pick bases that are compatible.
Proof. Let R = {v₁, ··· , vᵣ} be a basis for U ∩ W. This is a linearly independent subset of U. So we can extend it to a basis of U,
S = {v₁, ··· , vᵣ, uᵣ₊₁, ··· , uₛ}.
Similarly, for W, we can obtain a basis
T = {v₁, ··· , vᵣ, wᵣ₊₁, ··· , wₜ}.
We want to show that dim(U + W) = s + t − r. It is sufficient to prove that S ∪ T is a basis for U + W.
We first show spanning. Suppose u + w ∈ U + W, with u ∈ U, w ∈ W. Then u ∈ ⟨S⟩ and w ∈ ⟨T⟩. So u + w ∈ ⟨S ∪ T⟩. So U + W = ⟨S ∪ T⟩.
To show linear independence, suppose we have a linear relation
Σᵢ₌₁ʳ λᵢvᵢ + Σⱼ₌ᵣ₊₁ˢ µⱼuⱼ + Σₖ₌ᵣ₊₁ᵗ νₖwₖ = 0.
So
Σ λᵢvᵢ + Σ µⱼuⱼ = −Σ νₖwₖ.
Since the left hand side is something in U, and the right hand side is something in W, they both lie in U ∩ W.
Since S is a basis of U, there is only one way of writing the left hand vector as a sum of the vᵢ and uⱼ. However, since R is a basis of U ∩ W, we can write the left hand vector just as a sum of vᵢ's. So we must have µⱼ = 0 for all j. Then the relation becomes
Σ λᵢvᵢ + Σ νₖwₖ = 0.
Finally, since T is linearly independent, λᵢ = νₖ = 0 for all i, k. So S ∪ T is linearly independent.
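The dimension formula can be verified numerically for concrete subspaces of R^n. A sketch of one standard approach (our own illustration; the kernel-of-[U | −W] trick for computing U ∩ W is an assumption we bring in, not something from the notes):

```python
import numpy as np

def dim(cols):
    return int(np.linalg.matrix_rank(cols))

def intersection_dim(U, W):
    # Solve Ux = Wy, i.e. find the kernel of [U | -W]; the vectors Ux
    # arising from kernel elements (x, y) span U ∩ W.
    A = np.column_stack([U, -W])
    _, sing, Vt = np.linalg.svd(A)
    rank = int(np.sum(sing > 1e-10))
    null = Vt[rank:].T                      # columns form a basis of ker A
    if null.shape[1] == 0:
        return 0
    return dim(U @ null[:U.shape[1], :])

U = np.array([[1., 0.], [0., 1.], [0., 0.]])   # the x-y plane in R^3
W = np.array([[1., 0.], [0., 0.], [0., 1.]])   # the x-z plane in R^3

# dim(U + W) = dim U + dim W − dim(U ∩ W):  3 = 2 + 2 − 1.
assert dim(np.column_stack([U, W])) == dim(U) + dim(W) - intersection_dim(U, W)
```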
Proposition. If V is a finite-dimensional vector space over F and U ⊆ V is a subspace, then
dim V = dim U + dim V/U.

We can view this as a linear algebra version of Lagrange’s theorem. Combined with the first isomorphism theorem for vector spaces, this gives the rank-nullity theorem.
Proof. Let {u₁, ··· , uₘ} be a basis for U and extend this to a basis {u₁, ··· , uₘ, vₘ₊₁, ··· , vₙ} for V. We want to show that {vₘ₊₁ + U, ··· , vₙ + U} is a basis for V/U.
It is easy to see that this spans V/U. If v + U ∈ V/U, then we can write
v = Σ λᵢuᵢ + Σ µᵢvᵢ.
Then
v + U = Σ µᵢ(vᵢ + U) + Σ λᵢ(uᵢ + U) = Σ µᵢ(vᵢ + U),
since uᵢ + U = U. So done.
To show that they are linearly independent, suppose that
Σ λᵢ(vᵢ + U) = 0 + U = U.
Then this requires
Σ λᵢvᵢ ∈ U.
Then we can write this as a linear combination of the uᵢ's. So
Σ λᵢvᵢ = Σ µⱼuⱼ
for some µⱼ. Since {u₁, ··· , uₘ, vₘ₊₁, ··· , vₙ} is a basis for V, we must have λᵢ = µⱼ = 0 for all i, j. So {vᵢ + U} is linearly independent.
1.3 Direct sums
We are going to define direct sums in many ways in order to confuse students.
Definition ((Internal) direct sum). Suppose V is a vector space over F and U, W ⊆ V are subspaces. We say that V is the (internal) direct sum of U and W if
(i) U + W = V
(ii) U ∩ W = 0.
We write V = U ⊕ W.

Equivalently, this requires that every v ∈ V can be written uniquely as u + w with u ∈ U, w ∈ W. We say that U and W are complementary subspaces of V.
You will show in the example sheets that given any subspace U ⊆ V, U must have a complementary subspace in V.
Example. Let V = R² and U = ⟨(0, 1)⟩. Then ⟨(1, 1)⟩ and ⟨(1, 0)⟩ are both complementary subspaces to U in V.
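Both claims in the example can be checked directly: in each case the two spanning vectors have full rank (so the sum is R² and the intersection is trivial), and any v decomposes uniquely. A small numerical sketch (our own illustration):

```python
import numpy as np

u = np.array([0., 1.])                         # spans U from the example
for w in (np.array([1., 1.]), np.array([1., 0.])):
    M = np.column_stack([u, w])
    # u and w together span R^2 (rank 2), so U + <w> = R^2 and U ∩ <w> = 0.
    assert np.linalg.matrix_rank(M) == 2
    # Unique decomposition of an arbitrary v as a·u + b·w:
    v = np.array([3., 5.])
    a, b = np.linalg.solve(M, v)
    assert np.allclose(a * u + b * w, v)
```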
Definition ((External) direct sum). If U, W are vector spaces over F, the (external) direct sum is
U ⊕ W = {(u, w) : u ∈ U, w ∈ W},
with addition and scalar multiplication componentwise:
(u₁, w₁) + (u₂, w₂) = (u₁ + u₂, w₁ + w₂), λ(u, w) = (λu, λw).
The difference between these two definitions is that the first decomposes V into smaller spaces, while the second builds a bigger space from two given spaces.
Note, however, that the external direct sum U ⊕ W is the internal direct sum of U and W viewed as subspaces of U ⊕ W, i.e. the internal direct sum of {(u, 0) : u ∈ U} and {(0, w) : w ∈ W}. So these two are indeed compatible notions, and this is why we give them the same name and notation.
Definition ((Multiple) (internal) direct sum). If U₁, ··· , Uₙ ⊆ V are subspaces of V, then V is the (internal) direct sum
V = U₁ ⊕ ··· ⊕ Uₙ
if every v ∈ V can be written uniquely as v = Σ uᵢ with uᵢ ∈ Uᵢ.

This can be extended to an infinite sum with the same definition, just noting that the sum v = Σ uᵢ has to be finite.
For more details, see example sheet 1 Q. 10, where we prove in particular that dim V = Σ dim Uᵢ.
Definition ((Multiple) (external) direct sum). If U₁, ··· , Uₙ are vector spaces over F, the external direct sum is
U₁ ⊕ ··· ⊕ Uₙ = {(u₁, ··· , uₙ) : uᵢ ∈ Uᵢ},
with pointwise operations.

This can be made into an infinite sum if we require that all but finitely many of the uᵢ are zero.
2 Linear maps
In mathematics, apart from studying objects, we would like to study functions
between objects as well. In particular, we would like to study functions that
respect the structure of the objects. With vector spaces, the kinds of functions
we are interested in are linear maps.
2.1 Definitions and examples
Definition (Linear map). Let U, V be vector spaces over F. Then α : U → V is a linear map if
(i) α(u₁ + u₂) = α(u₁) + α(u₂) for all u₁, u₂ ∈ U.
(ii) α(λu) = λα(u) for all λ ∈ F, u ∈ U.
We write L(U, V) for the set of linear maps U → V.
There are a few things we should take note of:

– If we are lazy, we can combine the two requirements into the single requirement that
α(λu₁ + µu₂) = λα(u₁) + µα(u₂).

– It is easy to see that if α is linear, then it is a group homomorphism (if we view vector spaces as groups). In particular, α(0) = 0.

– If we want to stress the field F, we say that α is F-linear. For example, complex conjugation is a map C → C that is R-linear but not C-linear.
Example.
(i) Let A be an n × m matrix with coefficients in F. We will write A ∈ Mₙ,ₘ(F). Then α : F^m → F^n defined by v ↦ Av is linear.
Recall matrix multiplication is defined by: if Aᵢⱼ is the ijth coefficient of A, then the ith coefficient of Av is Σⱼ Aᵢⱼvⱼ. So we have
α(λu + µv)ᵢ = Σⱼ₌₁ᵐ Aᵢⱼ(λu + µv)ⱼ = λ Σⱼ₌₁ᵐ Aᵢⱼuⱼ + µ Σⱼ₌₁ᵐ Aᵢⱼvⱼ = λα(u)ᵢ + µα(v)ᵢ.
So α is linear.
(ii) Let X be a set and g ∈ F^X. Then we define m_g : F^X → F^X by m_g(f)(x) = g(x)f(x). Then m_g is linear. For example, f(x) ↦ 2x²f(x) is linear.
(iii) Integration I : C([a, b], R) → C([a, b], R) defined by f ↦ ∫ₐˣ f(t) dt is linear.
(iv) Differentiation D : C^∞([a, b], R) → C^∞([a, b], R) defined by f ↦ f′ is linear.
(v) If α, β ∈ L(U, V), then α + β defined by (α + β)(u) = α(u) + β(u) is linear. Also, if λ ∈ F, then λα defined by (λα)(u) = λ(α(u)) is also linear.
In this way, L(U, V) is itself a vector space over F.
(vi) Composition of linear maps is linear. Using this, we can show that many things are linear, like differentiating twice, or adding and then multiplying linear maps.
Just like everything else, we want to define isomorphisms.
Definition (Isomorphism). We say a linear map α : U → V is an isomorphism if there is some linear β : V → U such that α ∘ β = id_V and β ∘ α = id_U.
If there exists an isomorphism U → V, we say U and V are isomorphic, and write U ≅ V.
Lemma. If U and V are vector spaces over F and α : U → V is a linear map, then α is an isomorphism iff α is bijective.

Proof. If α is an isomorphism, then it is clearly bijective, since it has an inverse function.
Suppose α is a linear bijection. Then as a function, it has an inverse β : V → U. We want to show that this is linear. Let v₁, v₂ ∈ V and λ, µ ∈ F. We have
αβ(λv₁ + µv₂) = λv₁ + µv₂ = λαβ(v₁) + µαβ(v₂) = α(λβ(v₁) + µβ(v₂)).
Since α is injective, we have
β(λv₁ + µv₂) = λβ(v₁) + µβ(v₂).
So β is linear.
Definition (Image and kernel). Let α : U → V be a linear map. Then the image of α is
im α = {α(u) : u ∈ U}.
The kernel of α is
ker α = {u ∈ U : α(u) = 0}.

It is easy to show that these are subspaces of V and U respectively.
Example.
(i) Let A ∈ Mₘ,ₙ(F) and let α : F^n → F^m be the linear map v ↦ Av. Then the system of linear equations
Σⱼ₌₁ⁿ Aᵢⱼxⱼ = bᵢ, 1 ≤ i ≤ m,
has a solution iff (b₁, ··· , bₘ) ∈ im α.
The kernel of α consists of all solutions to Σⱼ Aᵢⱼxⱼ = 0.
(ii) Let β : C^∞(R, R) → C^∞(R, R) be the map that sends
β(f)(t) = f″(t) + p(t)f′(t) + q(t)f(t)
for some fixed p, q ∈ C^∞(R, R).
Then if y(t) ∈ im β, there is a solution (in C^∞(R, R)) to the differential equation
f″(t) + p(t)f′(t) + q(t)f(t) = y(t).
Similarly, ker β consists of the solutions to the homogeneous equation
f″(t) + p(t)f′(t) + q(t)f(t) = 0.
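For the matrix case of example (i), both the solvability criterion and the kernel can be checked numerically. A small sketch (our own illustration; the helper `solvable` is an assumed name):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])      # rank 1, so im α is a line in R^2

def solvable(A, b):
    # b ∈ im α iff appending b as a column does not increase the rank
    return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

assert solvable(A, np.array([1., 2.]))        # lies on the line spanned by (1, 2)
assert not solvable(A, np.array([1., 0.]))    # Ax = (1, 0) has no solution

# (2, −1, 0) is in ker α: it solves the homogeneous system Ax = 0.
assert np.allclose(A @ np.array([2., -1., 0.]), 0)
```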
If two vector spaces are isomorphic, then it is not too surprising that they
have the same dimension, since isomorphic spaces are “the same”. Indeed this is
what we are going to show.
Proposition. Let α : U → V be an F-linear map. Then
(i) If α is injective and S ⊆ U is linearly independent, then α(S) is linearly independent in V.
(ii) If α is surjective and S ⊆ U spans U, then α(S) spans V.
(iii) If α is an isomorphism and S ⊆ U is a basis, then α(S) is a basis for V.

Here (iii) immediately shows that two isomorphic spaces have the same dimension.
Proof.
(i) We prove the contrapositive. Suppose that α is injective and α(S) is linearly dependent. Then there are distinct s₀, ··· , sₙ ∈ S and λ₁, ··· , λₙ ∈ F not all zero such that
α(s₀) = Σᵢ₌₁ⁿ λᵢα(sᵢ) = α(Σᵢ₌₁ⁿ λᵢsᵢ).
Since α is injective, we must have
s₀ = Σᵢ₌₁ⁿ λᵢsᵢ.
This is a non-trivial relation among the sᵢ in U. So S is linearly dependent.
(ii) Suppose α is surjective and S spans U. Pick v ∈ V. Then there is some u ∈ U such that α(u) = v. Since S spans U, there are some s₁, ··· , sₙ ∈ S and λ₁, ··· , λₙ ∈ F such that
u = Σᵢ₌₁ⁿ λᵢsᵢ.
Then
v = α(u) = Σᵢ₌₁ⁿ λᵢα(sᵢ).
So α(S) spans V.
(iii) Follows immediately from (i) and (ii).
Corollary. If U and V are finite-dimensional vector spaces over F and α : U → V is an isomorphism, then dim U = dim V.

Note that we restrict to finite-dimensional spaces since we have only shown that dimension is well-defined for finite-dimensional spaces. Otherwise, the proof works just fine for infinite-dimensional spaces.

Proof. Let S be a basis for U. Then α(S) is a basis for V. Since α is injective, |S| = |α(S)|. So done.
How about the other way round? If two vector spaces have the same dimension, are they necessarily isomorphic? The answer is yes, at least for finite-dimensional ones.
However, we will not just prove that they are isomorphic. We will show that they are isomorphic in many ways.

Proposition. Suppose V is an F-vector space of dimension n < ∞. Then writing e₁, ··· , eₙ for the standard basis of Fⁿ, there is a bijection
Φ : {isomorphisms Fⁿ → V} → {(ordered) bases (v₁, ··· , vₙ) for V},
defined by
α ↦ (α(e₁), ··· , α(eₙ)).
Proof.
We first make sure this is indeed a function — if
α
is an isomorphism,
then from our previous proposition, we know that it sends a basis to a basis. So
(α(e
1
), ··· , α(e
n
)) is indeed a basis for V .
We now have to prove surjectivity and injectivity.
Suppose
α, β
:
F
n
→ V
are isomorphism such that Φ(
α
) = Φ(
β
). In other
words, α(e
i
) = β(e
i
) for all i. We want to show that α = β. We have
α
x
1
.
.
.
x
n
= α
n
X
i=1
x
i
e
i
!
=
X
x
i
α(e
i
) =
X
x
i
β(e
i
) = β
x
1
.
.
.
x
n
.
Hence α = β.
Next, suppose that (v_1, ··· , v_n) is an ordered basis for V. Then define
α((x_1, ··· , x_n)^T) = ∑ x_i v_i.
It is easy to check that this is well-defined and linear. We also know that α is injective since (v_1, ··· , v_n) is linearly independent: if ∑ x_i v_i = ∑ y_i v_i, then x_i = y_i. Also, α is surjective since (v_1, ··· , v_n) spans V. So α is an isomorphism, and by construction Φ(α) = (v_1, ··· , v_n).
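When V = F^n this bijection is very concrete: an ordered basis, written as the columns of a matrix, is precisely an isomorphism F^n → V. A small numpy sketch (the basis vectors here are arbitrary illustrative choices):

```python
import numpy as np

# an ordered basis of R^3 (vectors chosen arbitrarily for illustration)
v1, v2, v3 = [1, 0, 1], [0, 2, 0], [1, 1, -1]

# the isomorphism alpha : F^3 -> V with alpha(e_i) = v_i is the matrix
# whose i-th column is v_i
alpha = np.column_stack([v1, v2, v3])

# alpha sends the standard basis to the chosen basis ...
assert np.array_equal(alpha @ np.eye(3)[:, 0], v1)
# ... and it is an isomorphism precisely because the columns form a basis,
# i.e. the matrix is invertible
assert abs(np.linalg.det(alpha)) > 1e-12
```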
2.2 Linear maps and matrices
Recall that our first example of linear maps was matrices acting on F^n. We will show that in fact, all linear maps come from matrices. Since we know that all finite-dimensional vector spaces are isomorphic to F^n, this means we can represent arbitrary linear maps on finite-dimensional vector spaces by matrices.
This is a useful result, since it is sometimes easier to argue about matrices
than linear maps.
Proposition. Suppose U, V are vector spaces over F and S = {e_1, ··· , e_n} is a basis for U. Then every function f : S → V extends uniquely to a linear map U → V.
The slogan is “to define a linear map, it suffices to define its values on a
basis”.
Proof. For uniqueness, first suppose α, β : U → V are linear and extend f : S → V. We have sort-of proved this already just now.
If u ∈ U, we can write u = ∑_{i=1}^n u_i e_i with u_i ∈ F since S spans. Then
α(u) = α(∑ u_i e_i) = ∑ u_i α(e_i) = ∑ u_i f(e_i).
Similarly,
β(u) = ∑ u_i f(e_i).
So α(u) = β(u) for every u. So α = β.
For existence, if u ∈ U, we can write u = ∑ u_i e_i in a unique way. So defining
α(u) = ∑ u_i f(e_i)
is unambiguous. To show linearity, let λ, µ ∈ F, u, v ∈ U. Then
α(λu + µv) = α(∑ (λu_i + µv_i)e_i)
           = ∑ (λu_i + µv_i)f(e_i)
           = λ ∑ u_i f(e_i) + µ ∑ v_i f(e_i)
           = λα(u) + µα(v).
Moreover, α does extend f.
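The construction in the proof is easy to mirror in code. The sketch below assumes U = R^n with the standard basis, so the coordinates u_i are just the entries of u; extend_to_linear is a hypothetical helper name, not from any library:

```python
import numpy as np

def extend_to_linear(f_values):
    """Given the values f(e_1), ..., f(e_n) on the standard basis,
    return the unique linear extension alpha(u) = sum_i u_i f(e_i)."""
    f_values = [np.asarray(fi, dtype=float) for fi in f_values]
    def alpha(u):
        return sum(ui * fi for ui, fi in zip(u, f_values))
    return alpha

# f is defined only on the standard basis of R^2 (values chosen for illustration)
alpha = extend_to_linear([[1.0, 0.0, 2.0],    # f(e_1)
                          [0.0, 3.0, 1.0]])   # f(e_2)

u, v = np.array([2.0, -1.0]), np.array([0.5, 4.0])
assert np.allclose(alpha(u + v), alpha(u) + alpha(v))   # additivity
assert np.allclose(alpha(3.0 * u), 3.0 * alpha(u))      # homogeneity
```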
Corollary. If U and V are finite-dimensional vector spaces over F with bases (e_1, ··· , e_m) and (f_1, ··· , f_n) respectively, then there is a bijection
Mat_{n,m}(F) → L(U, V),
sending A to the unique linear map α such that α(e_i) = ∑ a_{ji} f_j.
We can interpret this as follows: the ith column of A tells us how to write α(e_i) in terms of the f_j.
We can also draw a fancy diagram to display this result. Given a basis e_1, ··· , e_m, by our bijection, we get an isomorphism s(e_i) : U → F^m. Similarly, we get an isomorphism s(f_i) : V → F^n.
Since a matrix is a linear map A : F^m → F^n, given a matrix A, we can produce a linear map α : U → V via the composition
U −s(e_i)→ F^m −A→ F^n −s(f_i)^{−1}→ V.
We can put this into a square:

F^m −−A−→ F^n
 ↑          ↑
s(e_i)    s(f_i)
 U −−α−→ V

Then the corollary tells us that every A gives rise to an α, and every α corresponds to an A, fitting into this diagram.
Proof. If α is a linear map U → V, then for each 1 ≤ i ≤ m, we can write α(e_i) uniquely as
α(e_i) = ∑_{j=1}^n a_{ji} f_j
for some a_{ji} ∈ F. This gives a matrix A = (a_{ij}). The previous proposition tells us that every matrix A arises in this way, and α is determined by A.
Definition (Matrix representation). We call the matrix corresponding to a linear map α ∈ L(U, V) under the corollary the matrix representing α with respect to the bases (e_1, ··· , e_m) and (f_1, ··· , f_n).
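For a concrete instance, let U be the polynomials of degree at most 3 and V those of degree at most 2 over R, with monomial bases, and α = d/dx. The ith column of the representing matrix lists the coordinates of α(e_i) = d/dx(x^{i−1}). A sketch using numpy's polynomial module (which stores coefficients lowest degree first):

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Bases: (1, x, x^2, x^3) for U and (1, x, x^2) for V; alpha = d/dx.
# Column i holds the coordinates of alpha(e_i) = d/dx x^{i-1}:
A = np.array([[0, 1, 0, 0],
              [0, 0, 2, 0],
              [0, 0, 0, 3]], dtype=float)

p = np.array([5.0, -1.0, 4.0, 2.0])   # 5 - x + 4x^2 + 2x^3, coordinates in U
dp = A @ p                            # coordinates of the derivative in V

# agrees with numpy's own polynomial differentiation
assert np.allclose(dp, P.polyder(p))
```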
It is an exercise to show that the bijection Mat_{n,m}(F) → L(U, V) is an isomorphism of vector spaces, and to deduce that dim L(U, V) = (dim U)(dim V).
Proposition. Suppose U, V, W are finite-dimensional vector spaces over F with bases R = (u_1, ··· , u_r), S = (v_1, ··· , v_s) and T = (w_1, ··· , w_t) respectively. If α : U → V and β : V → W are linear maps represented by A and B respectively (with respect to R, S and T), then βα is linear and represented by BA with respect to R and T.

F^r −−A−→ F^s −−B−→ F^t
 ↑          ↑          ↑
s(R)      s(S)       s(T)
 U −−α−→ V −−β−→ W
Proof. Verifying βα is linear is straightforward. Next we write βα(u_i) as a linear combination of w_1, ··· , w_t:
βα(u_i) = β(∑_k A_{ki} v_k)
        = ∑_k A_{ki} β(v_k)
        = ∑_k A_{ki} ∑_j B_{jk} w_j
        = ∑_j (∑_k B_{jk} A_{ki}) w_j
        = ∑_j (BA)_{ji} w_j.
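The index bookkeeping above is exactly the definition of the matrix product, which is easy to sanity-check numerically over the standard bases (dimensions chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
r, s, t = 4, 3, 2
A = rng.standard_normal((s, r))   # represents alpha : F^r -> F^s
B = rng.standard_normal((t, s))   # represents beta  : F^s -> F^t

u = rng.standard_normal(r)
# applying beta after alpha agrees with the single matrix BA
assert np.allclose(B @ (A @ u), (B @ A) @ u)
```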
2.3 The first isomorphism theorem and the ranknullity
theorem
The main theorem of this section is the ranknullity theorem, which relates the
dimensions of the kernel and image of a linear map. This is in fact an easy
corollary of a stronger result, known as the first isomorphism theorem, which
directly relates the kernel and image themselves. This first isomorphism theorem is an exact analogue of that for groups, and should not be unfamiliar. We will also
provide another proof that does not involve quotients.
Theorem (First isomorphism theorem). Let α : U → V be a linear map. Then ker α and im α are subspaces of U and V respectively. Moreover, α induces an isomorphism
ᾱ : U/ ker α → im α
    u + ker α ↦ α(u).
Note that if we view a vector space as an abelian group, then this is exactly
the first isomorphism theorem of groups.
Proof. We know that 0 ∈ ker α and 0 ∈ im α.
Suppose u_1, u_2 ∈ ker α and λ_1, λ_2 ∈ F. Then
α(λ_1 u_1 + λ_2 u_2) = λ_1 α(u_1) + λ_2 α(u_2) = 0.
So λ_1 u_1 + λ_2 u_2 ∈ ker α. So ker α is a subspace.
Similarly, if α(u_1), α(u_2) ∈ im α, then λ_1 α(u_1) + λ_2 α(u_2) = α(λ_1 u_1 + λ_2 u_2) ∈ im α. So im α is a subspace.
Now by the first isomorphism theorem of groups, ᾱ is a well-defined isomorphism of groups. So it remains to show that ᾱ is a linear map. Indeed, we have
ᾱ(λ(u + ker α)) = α(λu) = λα(u) = λᾱ(u + ker α).
So ᾱ is a linear map.
Definition (Rank and nullity). If α : U → V is a linear map between finite-dimensional vector spaces over F (in fact we just need U to be finite-dimensional), the rank of α is the number r(α) = dim im α. The nullity of α is the number n(α) = dim ker α.
Corollary (Rank-nullity theorem). If α : U → V is a linear map and U is finite-dimensional, then
r(α) + n(α) = dim U.
Proof. By the first isomorphism theorem, we know that U/ ker α ≅ im α. So we have
dim im α = dim(U/ ker α) = dim U − dim ker α.
So the result follows.
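Rank and nullity can be computed independently with sympy, so the theorem can be checked on an example (the matrix below is an arbitrary illustrative choice with a built-in row dependency):

```python
import sympy as sp

# alpha : Q^5 -> Q^3; the third row equals row1 + row2, so r(alpha) = 2
A = sp.Matrix([[1, 2, 0, -1, 3],
               [0, 1, 1,  1, 0],
               [1, 3, 1,  0, 3]])

rank = A.rank()                # r(alpha) = dim im(alpha)
nullity = len(A.nullspace())   # n(alpha) = dim ker(alpha), computed directly

assert rank == 2
assert rank + nullity == A.cols   # rank-nullity: r(alpha) + n(alpha) = dim U
```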
We can also prove this result without the first isomorphism theorem, and
say a bit more in the meantime.
Proposition. If α : U → V is a linear map between finite-dimensional vector spaces over F, then there are bases (e_1, ··· , e_m) for U and (f_1, ··· , f_n) for V such that α is represented by the matrix
( I_r 0 )
( 0   0 ),
where r = r(α) and I_r is the r × r identity matrix.
In particular, r(α) + n(α) = dim U.
Proof. Let e_{k+1}, ··· , e_m be a basis for the kernel of α. Then we can extend this to a basis (e_1, ··· , e_m) of the whole of U.
Let f_i = α(e_i) for 1 ≤ i ≤ k. We now show that (f_1, ··· , f_k) is a basis for im α (and thus k = r). We first show that it spans. Suppose v ∈ im α. Then we have
v = α(∑_{i=1}^m λ_i e_i)
for some λ_i ∈ F. By linearity, we can write this as
v = ∑_{i=1}^m λ_i α(e_i) = ∑_{i=1}^k λ_i f_i + 0.
So v ∈ ⟨f_1, ··· , f_k⟩.
To show linear independence, suppose that
∑_{i=1}^k µ_i f_i = 0.
So we have
α(∑_{i=1}^k µ_i e_i) = 0.
So ∑_{i=1}^k µ_i e_i ∈ ker α. Since (e_{k+1}, ··· , e_m) is a basis for ker α, we can write
∑_{i=1}^k µ_i e_i = ∑_{i=k+1}^m µ_i e_i
for some µ_i (i = k + 1, ··· , m). Since (e_1, ··· , e_m) is a basis, we must have µ_i = 0 for all i. So they are linearly independent.
Now we extend (f_1, ··· , f_r) to a basis for V, and then
α(e_i) = { f_i  1 ≤ i ≤ k
         { 0    k + 1 ≤ i ≤ m,
so with respect to these bases α is represented by the required matrix.
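The proof is constructive and can be mirrored with sympy: take a basis of ker α, extend it to a basis of U by standard vectors, and use the images of the added vectors as the start of a basis for V. The sketch below uses a small example in which r = dim V, so those images already form a basis of V and no further extension is needed:

```python
import sympy as sp

# alpha : Q^3 -> Q^2 with rank r = 2 (an arbitrary illustrative matrix)
A = sp.Matrix([[1, 2, 0],
               [2, 4, 1]])
m, r = A.cols, A.rank()

ker = A.nullspace()            # basis e_{k+1}, ..., e_m of ker(alpha)

# extend the kernel basis to a basis of U = Q^m using standard vectors
extension = []
for i in range(m):
    v = sp.eye(m).col(i)
    trial = sp.Matrix.hstack(*(ker + extension + [v]))
    if trial.rank() == len(ker) + len(extension) + 1:
        extension.append(v)

# new basis of U: the extension vectors e_1, ..., e_k first, kernel last;
# f_i = alpha(e_i) for i <= k is then a basis of im(alpha) = V (as r = dim V)
P = sp.Matrix.hstack(*(extension + ker))            # basis change on U
Q = sp.Matrix.hstack(*[A * v for v in extension])   # basis change on V

B = Q.inv() * A * P    # matrix of alpha w.r.t. the new bases
assert B == sp.Matrix.hstack(sp.eye(r), sp.zeros(r, m - r))
```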
Example. Let
W = {x ∈ R^5 : x_1 + x_2 + x_5 = 0 = x_3 − x_4 − x_5}.
What is dim W ? Well, it clearly is 3, but how can we prove it?
We can consider the map α : R^5 → R^2 given by
(x_1, ··· , x_5)^T ↦ (x_1 + x_2 + x_5, x_3 − x_4 − x_5)^T.
Then ker α = W. So dim W = 5 − r(α). We know that α(1, 0, 0, 0, 0) = (1, 0) and α(0, 0, 1, 0, 0) = (0, 1). So r(α) = dim im α = 2. So dim W = 3.
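With respect to the standard bases, α is a 2 × 5 matrix, and the computation can be confirmed with numpy:

```python
import numpy as np

# matrix of alpha : R^5 -> R^2, x |-> (x1 + x2 + x5, x3 - x4 - x5)
A = np.array([[1, 1, 0,  0,  1],
              [0, 0, 1, -1, -1]], dtype=float)

r = np.linalg.matrix_rank(A)   # r(alpha) = 2
assert r == 2
assert 5 - r == 3              # dim W = n(alpha) = 5 - r(alpha) = 3
```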
More generally, the rank-nullity theorem shows that m linear equations in n unknowns have a space of solutions of dimension at least n − m.
Example. Suppose that U and W are subspaces of V, all of which are finite-dimensional vector spaces over F. We let
α : U ⊕ W → V
    (u, w) ↦ u + w,
where the ⊕ is the external direct sum. Then im α = U + W and
ker α = {(u, −u) : u ∈ U ∩ W} ≅ U ∩ W.
Then we have
dim U + dim W = dim(U ⊕ W ) = r(α) + n(α) = dim(U + W ) + dim(U ∩ W ).
This is a result we've previously obtained through fiddling with bases and horrible stuff.
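Since ker α ≅ U ∩ W, the theorem gives dim(U ∩ W) = dim(U ⊕ W) − r(α), which lets us compute intersection dimensions from ranks. A sketch with a concrete pair of subspaces of R^3 (the spanning vectors are illustrative choices):

```python
import numpy as np

# U = <e1, e2>, W = <e2, e3> inside V = R^3 (columns span the subspaces)
BU = np.array([[1, 0], [0, 1], [0, 0]], dtype=float)
BW = np.array([[0, 0], [1, 0], [0, 1]], dtype=float)

# matrix of alpha : U (+) W -> V, (u, w) |-> u + w, w.r.t. the chosen spans
M = np.hstack([BU, BW])

dim_sum = np.linalg.matrix_rank(M)               # dim(U + W) = r(alpha)
dim_int = BU.shape[1] + BW.shape[1] - dim_sum    # dim(U ∩ W) = n(alpha)

assert dim_sum == 3 and dim_int == 1
```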
Corollary. Suppose α : U → V is a linear map between vector spaces over F, both of dimension n < ∞. Then the following are equivalent:
(i) α is injective;
(ii) α is surjective;
(iii) α is an isomorphism.
Proof. It is clear that (iii) implies (i) and (ii), and that (i) and (ii) together imply (iii). So it suffices to show that (i) and (ii) are equivalent.
Note that α is injective iff n(α) = 0, and α is surjective iff r(α) = dim V = n. By the rank-nullity theorem, n(α) + r(α) = n. So the result follows immediately.
Lemma. Let A ∈ M_{n,n}(F) = M_n(F) be a square matrix. The following are equivalent:
(i) There exists B ∈ M_n(F) such that BA = I_n.
(ii) There exists C ∈ M_n(F) such that AC = I_n.
If these hold, then B = C. We call A invertible or non-singular, and write A^{−1} = B = C.
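Numerically, one can compute a left inverse B of a square matrix A by solving Aᵀ Bᵀ = I_n, and then observe that it is automatically a right inverse as well, as the lemma asserts:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])   # an invertible matrix (det = 1)

# find B with BA = I_n by solving A^T B^T = I_n (a left inverse, a priori)
B = np.linalg.solve(A.T, np.eye(2)).T
assert np.allclose(B @ A, np.eye(2))

# the lemma: for square matrices, a left inverse is automatically a right inverse
assert np.allclose(A @ B, np.eye(2))
```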