Part IA — Vectors and Matrices
Based on lectures by N. Peake
Notes taken by Dexter Chua
Michaelmas 2014
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
Complex numbers
Review of complex numbers, including complex conjugate, inverse, modulus, argument and Argand diagram. Informal treatment of complex logarithm, $n$th roots and complex powers. de Moivre's theorem. [2]
Vectors
Review of elementary algebra of vectors in $\mathbb{R}^3$, including scalar product. Brief discussion of vectors in $\mathbb{R}^n$ and $\mathbb{C}^n$; scalar product and the Cauchy–Schwarz inequality. Concepts of linear span, linear independence, subspaces, basis and dimension.
Suffix notation: including summation convention, $\delta_{ij}$ and $\varepsilon_{ijk}$. Vector product and triple product: definition and geometrical interpretation. Solution of linear vector equations. Applications of vectors to geometry, including equations of lines, planes and spheres. [5]
Matrices
Elementary algebra of $3 \times 3$ matrices, including determinants. Extension to $n \times n$ complex matrices. Trace, determinant, non-singular matrices and inverses. Matrices as linear transformations; examples of geometrical actions including rotations, reflections, dilations, shears; kernel and image. [4]
Simultaneous linear equations: matrix formulation; existence and uniqueness of solutions, geometric interpretation; Gaussian elimination. [3]
Symmetric, anti-symmetric, orthogonal, hermitian and unitary matrices. Decomposition of a general matrix into isotropic, symmetric trace-free and antisymmetric parts. [1]
Eigenvalues and Eigenvectors
Eigenvalues and eigenvectors; geometric significance. [2]
Proof that eigenvalues of hermitian matrix are real, and that distinct eigenvalues give
an orthogonal basis of eigenvectors. The effect of a general change of basis (similarity
transformations). Diagonalization of general matrices: sufficient conditions; examples
of matrices that cannot be diagonalized. Canonical forms for 2 × 2 matrices. [5]
Discussion of quadratic forms, including change of basis. Classification of conics,
cartesian and polar forms. [1]
Rotation matrices and Lorentz transformations as transformation groups. [1]
Contents
0 Introduction
1 Complex numbers
1.1 Basic properties
1.2 Complex exponential function
1.3 Roots of unity
1.4 Complex logarithm and power
1.5 De Moivre’s theorem
1.6 Lines and circles in C
2 Vectors
2.1 Definition and basic properties
2.2 Scalar product
2.2.1 Geometric picture ($\mathbb{R}^2$ and $\mathbb{R}^3$ only)
2.2.2 General algebraic definition
2.3 Cauchy–Schwarz inequality
2.4 Vector product
2.5 Scalar triple product
2.6 Spanning sets and bases
2.6.1 2D space
2.6.2 3D space
2.6.3 $\mathbb{R}^n$ space
2.6.4 $\mathbb{C}^n$ space
2.7 Vector subspaces
2.8 Suffix notation
2.9 Geometry
2.9.1 Lines
2.9.2 Plane
2.10 Vector equations
3 Linear maps
3.1 Examples
3.1.1 Rotation in $\mathbb{R}^3$
3.1.2 Reflection in $\mathbb{R}^3$
3.2 Linear Maps
3.3 Rank and nullity
3.4 Matrices
3.4.1 Examples
3.4.2 Matrix Algebra
3.4.3 Decomposition of an n × n matrix
3.4.4 Matrix inverse
3.5 Determinants
3.5.1 Permutations
3.5.2 Properties of determinants
3.5.3 Minors and Cofactors
4 Matrices and linear equations
4.1 Simple example, 2 × 2
4.2 Inverse of an n × n matrix
4.3 Homogeneous and inhomogeneous equations
4.3.1 Gaussian elimination
4.4 Matrix rank
4.5 Homogeneous problem Ax = 0
4.5.1 Geometrical interpretation
4.5.2 Linear mapping view of Ax = 0
4.6 General solution of Ax = d
5 Eigenvalues and eigenvectors
5.1 Preliminaries and definitions
5.2 Linearly independent eigenvectors
5.3 Transformation matrices
5.3.1 Transformation law for vectors
5.3.2 Transformation law for matrix
5.4 Similar matrices
5.5 Diagonalizable matrices
5.6 Canonical (Jordan normal) form
5.7 Cayley–Hamilton Theorem
5.8 Eigenvalues and eigenvectors of a Hermitian matrix
5.8.1 Eigenvalues and eigenvectors
5.8.2 Gram–Schmidt orthogonalization (non-examinable)
5.8.3 Unitary transformation
5.8.4 Diagonalization of n × n Hermitian matrices
5.8.5 Normal matrices
6 Quadratic forms and conics
6.1 Quadrics and conics
6.1.1 Quadrics
6.1.2 Conic sections (n = 2)
6.2 Focus-directrix property
7 Transformation groups
7.1 Groups of orthogonal matrices
7.2 Length preserving matrices
7.3 Lorentz transformations
0 Introduction
Vectors and matrices are the language in which a lot of mathematics is written.
In physics, many variables such as position and momentum are expressed as
vectors. Heisenberg also formulated quantum mechanics in terms of vectors and
matrices. In statistics, one might pack all the results of all experiments into a
single vector, and work with a large vector instead of many small quantities. In
group theory, matrices are used to represent the symmetries of space (as well as
many other groups).
So what is a vector? Vectors are very general objects, and can in theory represent very complex objects. However, in this course, our focus is on vectors in $\mathbb{R}^n$ or $\mathbb{C}^n$. We can think of each of these as an array of $n$ real or complex numbers. For example, $(1, 6, 4)$ is a vector in $\mathbb{R}^3$. These vectors are added in the obvious way. For example, $(1, 6, 4) + (3, 5, 2) = (4, 11, 6)$. We can also multiply vectors by numbers, say $2(1, 6, 4) = (2, 12, 8)$. Often, these vectors represent points in an $n$-dimensional space.
Matrices, on the other hand, represent functions between vectors, i.e. a function that takes in a vector and outputs another vector. These, however, are not arbitrary functions. Instead, matrices represent linear functions. These are functions that satisfy the equality $f(\lambda \mathbf{x} + \mu \mathbf{y}) = \lambda f(\mathbf{x}) + \mu f(\mathbf{y})$ for arbitrary numbers $\lambda, \mu$ and vectors $\mathbf{x}, \mathbf{y}$. It is important to note that the function $\mathbf{x} \mapsto \mathbf{x} + \mathbf{c}$ for some non-zero constant vector $\mathbf{c}$ is not linear according to this definition, even though it might look linear.
It turns out that for each linear function from $\mathbb{R}^n$ to $\mathbb{R}^m$, we can represent the function uniquely by an $m \times n$ array of numbers, which is what we call the matrix. Expressing a linear function as a matrix allows us to conveniently study many of its properties, which is why we usually talk about matrices instead of the function itself.
1 Complex numbers
In $\mathbb{R}$, not every polynomial equation has a solution. For example, there does not exist any $x$ such that $x^2 + 1 = 0$, since for any $x$, $x^2$ is non-negative, and $x^2 + 1$ can never be 0. To solve this problem, we introduce the "number" $i$ that satisfies $i^2 = -1$. Then $i$ is a solution to the equation $x^2 + 1 = 0$. Similarly, $-i$ is also a solution to the equation.
We can add and multiply numbers with $i$. For example, we can obtain numbers $3 + i$ or $1 + 3i$. These numbers are known as complex numbers. It turns out that by adding this single number $i$, every (non-constant) polynomial equation will have a root. In fact, for an $n$th order polynomial equation, we will later see that there will always be $n$ roots, if we account for multiplicity. We will go into details in Chapter 5.
Apart from solving equations, complex numbers have a lot of rather important
applications. For example, they are used in electronics to represent alternating
currents, and form an integral part in the formulation of quantum mechanics.
1.1 Basic properties
Definition (Complex number). A complex number is a number $z \in \mathbb{C}$ of the form $z = a + ib$ with $a, b \in \mathbb{R}$, where $i^2 = -1$. We write $a = \operatorname{Re}(z)$ and $b = \operatorname{Im}(z)$.
We have
$$\begin{aligned}
z_1 \pm z_2 &= (a_1 + ib_1) \pm (a_2 + ib_2) \\
&= (a_1 \pm a_2) + i(b_1 \pm b_2) \\
z_1 z_2 &= (a_1 + ib_1)(a_2 + ib_2) \\
&= (a_1 a_2 - b_1 b_2) + i(b_1 a_2 + a_1 b_2) \\
z^{-1} &= \frac{1}{a + ib} = \frac{a - ib}{a^2 + b^2} \quad (z \neq 0)
\end{aligned}$$
Definition (Complex conjugate). The complex conjugate of $z = a + ib$ is $a - ib$. It is written as $\bar{z}$ or $z^*$.
It is often helpful to visualize complex numbers in a diagram:
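Definition (Argand diagram). An Argand diagram is a diagram in which a complex number $z = x + iy$ is represented by a vector $\mathbf{p} = \begin{pmatrix} x \\ y \end{pmatrix}$. Addition of complex numbers corresponds to vector addition and $\bar{z}$ is the reflection of $z$ in the $x$-axis.
[Figure: Argand diagram with axes Re and Im, showing $z_1$, $z_2$, $\bar{z}_2$ and $z_1 + z_2$.]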
Definition (Modulus and argument of complex number). The modulus of $z = x + iy$ is $r = |z| = \sqrt{x^2 + y^2}$. The argument is $\theta = \arg z = \tan^{-1}(y/x)$, adjusted by $\pm\pi$ when $x < 0$ so that $\theta$ lies in the correct quadrant. The modulus is the length of the vector in the Argand diagram, and the argument is the angle between $z$ and the real axis. We have
$$z = r(\cos\theta + i\sin\theta).$$
Clearly the pair $(r, \theta)$ uniquely describes a complex number $z$, but each complex number $z \in \mathbb{C}$ can be described by many different $\theta$ since $\sin(2\pi + \theta) = \sin\theta$ and $\cos(2\pi + \theta) = \cos\theta$. Often we take the principal value $\theta \in (-\pi, \pi]$.
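When writing $z_k = r_k(\cos\theta_k + i\sin\theta_k)$ for $k = 1, 2$, we have
$$\begin{aligned}
z_1 z_2 &= r_1 r_2[(\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2) + i(\sin\theta_1\cos\theta_2 + \sin\theta_2\cos\theta_1)] \\
&= r_1 r_2[\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)]
\end{aligned}$$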
In other words, when multiplying complex numbers, the moduli multiply and
the arguments add.
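As a quick numerical illustration of this rule, the following sketch multiplies two numbers in polar form and compares the result with ordinary complex multiplication. The helper name `multiply_polar` and the sample values are assumptions made for this example, not part of the lectures.

```python
import cmath
import math

def multiply_polar(r1, theta1, r2, theta2):
    """Multiply two complex numbers given as (modulus, argument) pairs.

    Returns the modulus and the principal argument in (-pi, pi] of the product.
    """
    r = r1 * r2                # moduli multiply
    theta = theta1 + theta2    # arguments add
    # Wrap the summed argument back into the principal range.
    theta = math.atan2(math.sin(theta), math.cos(theta))
    return r, theta

# Hypothetical sample values: z1 = 2 e^{3 pi i / 4}, z2 = 1.5 e^{pi i / 2}.
z1 = cmath.rect(2.0, 3 * math.pi / 4)
z2 = cmath.rect(1.5, math.pi / 2)
r, theta = multiply_polar(2.0, 3 * math.pi / 4, 1.5, math.pi / 2)
```

Here the raw sum of arguments is $5\pi/4$, which lies outside $(-\pi, \pi]$, so the principal value returned is $-3\pi/4$.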
Proposition. $z\bar{z} = a^2 + b^2 = |z|^2$.
Proposition. $z^{-1} = \bar{z}/|z|^2$.
Theorem (Triangle inequality). For all $z_1, z_2 \in \mathbb{C}$, we have
$$|z_1 + z_2| \leq |z_1| + |z_2|.$$
Alternatively, we have $|z_1 - z_2| \geq \big||z_1| - |z_2|\big|$.
1.2 Complex exponential function
Exponentiation was originally defined for integer powers as repeated multiplication.
This is then extended to rational powers using roots. We can also extend
this to any real number since real numbers can be approximated arbitrarily
accurately by rational numbers. However, what does it mean to take an exponent
of a complex number?
To do so, we use the Taylor series definition of the exponential function:
Definition (Exponential function). The exponential function is defined as
$$\exp(z) = e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \cdots = \sum_{n=0}^\infty \frac{z^n}{n!}.$$
This automatically allows taking exponents of arbitrary complex numbers.
Having defined exponentiation this way, we want to check that it satisfies the usual properties, such as $\exp(z + w) = \exp(z)\exp(w)$. To prove this, we will first need a helpful lemma.
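Lemma.
$$\sum_{n=0}^\infty \sum_{m=0}^\infty a_{mn} = \sum_{r=0}^\infty \sum_{m=0}^r a_{r-m,m}$$
Proof.
$$\begin{aligned}
\sum_{n=0}^\infty \sum_{m=0}^\infty a_{mn} &= a_{00} + a_{01} + a_{02} + \cdots \\
&\quad + a_{10} + a_{11} + a_{12} + \cdots \\
&\quad + a_{20} + a_{21} + a_{22} + \cdots \\
&= (a_{00}) + (a_{10} + a_{01}) + (a_{20} + a_{11} + a_{02}) + \cdots \\
&= \sum_{r=0}^\infty \sum_{m=0}^r a_{r-m,m}
\end{aligned}$$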
This is not exactly a rigorous proof, since we should not hand-wave about infinite sums so casually. But in fact, we did not even show that the definition of $\exp(z)$ is well defined for all numbers $z$, since the sum might diverge. All these will be done in the IA Analysis I course.
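Theorem. $\exp(z_1)\exp(z_2) = \exp(z_1 + z_2)$
Proof.
$$\begin{aligned}
\exp(z_1)\exp(z_2) &= \sum_{n=0}^\infty \sum_{m=0}^\infty \frac{z_1^m}{m!}\frac{z_2^n}{n!} \\
&= \sum_{r=0}^\infty \sum_{m=0}^r \frac{z_1^{r-m}}{(r-m)!}\frac{z_2^m}{m!} \\
&= \sum_{r=0}^\infty \frac{1}{r!}\sum_{m=0}^r \frac{r!}{(r-m)!\,m!} z_1^{r-m} z_2^m \\
&= \sum_{r=0}^\infty \frac{(z_1 + z_2)^r}{r!}
\end{aligned}$$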
Again, to define the sine and cosine functions, instead of referring to “angles”
(since it doesn’t make much sense to refer to complex “angles”), we again use a
series definition.
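Definition (Sine and cosine functions). Define, for all $z \in \mathbb{C}$,
$$\begin{aligned}
\sin z &= \sum_{n=0}^\infty \frac{(-1)^n}{(2n+1)!} z^{2n+1} = z - \frac{1}{3!}z^3 + \frac{1}{5!}z^5 + \cdots \\
\cos z &= \sum_{n=0}^\infty \frac{(-1)^n}{(2n)!} z^{2n} = 1 - \frac{1}{2!}z^2 + \frac{1}{4!}z^4 + \cdots
\end{aligned}$$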
One very important result is the relationship between exp, sin and cos.
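Theorem. $e^{iz} = \cos z + i\sin z$.
Alternatively, since $\sin(-z) = -\sin z$ and $\cos(-z) = \cos z$, we have
$$\cos z = \frac{e^{iz} + e^{-iz}}{2}, \qquad \sin z = \frac{e^{iz} - e^{-iz}}{2i}.$$
Proof.
$$\begin{aligned}
e^{iz} &= \sum_{n=0}^\infty \frac{i^n}{n!} z^n \\
&= \sum_{n=0}^\infty \frac{i^{2n}}{(2n)!} z^{2n} + \sum_{n=0}^\infty \frac{i^{2n+1}}{(2n+1)!} z^{2n+1} \\
&= \sum_{n=0}^\infty \frac{(-1)^n}{(2n)!} z^{2n} + i\sum_{n=0}^\infty \frac{(-1)^n}{(2n+1)!} z^{2n+1} \\
&= \cos z + i\sin z
\end{aligned}$$
Thus we can write $z = r(\cos\theta + i\sin\theta) = re^{i\theta}$.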
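The series definitions can be checked numerically by truncating each sum. The sketch below is only an illustration: the function names and the number of terms are choices made for this example, and truncation gives an approximation rather than a proof.

```python
import cmath

def exp_series(z, terms=40):
    """Partial sum of sum_{n>=0} z^n / n!."""
    total, term = 0, 1            # term starts as z^0 / 0!
    for n in range(terms):
        total += term
        term = term * z / (n + 1)  # next term z^{n+1} / (n+1)!
    return total

def sin_series(z, terms=20):
    """Partial sum of sum_{n>=0} (-1)^n z^{2n+1} / (2n+1)!."""
    total, term = 0, z
    for n in range(terms):
        total += term
        term = -term * z * z / ((2 * n + 2) * (2 * n + 3))
    return total

def cos_series(z, terms=20):
    """Partial sum of sum_{n>=0} (-1)^n z^{2n} / (2n)!."""
    total, term = 0, 1
    for n in range(terms):
        total += term
        term = -term * z * z / ((2 * n + 1) * (2 * n + 2))
    return total

z = 0.7 + 0.3j   # arbitrary sample point
w = -0.2 + 1.1j  # second arbitrary sample point
euler_gap = abs(exp_series(1j * z) - (cos_series(z) + 1j * sin_series(z)))
product_gap = abs(exp_series(z) * exp_series(w) - exp_series(z + w))
```

Both gaps should be at the level of floating-point rounding error for points of moderate size such as these.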
1.3 Roots of unity
Definition (Roots of unity). The $n$th roots of unity are the roots to the equation $z^n = 1$ for $n \in \mathbb{N}$. Since this is a polynomial of order $n$, there are $n$ roots of unity. In fact, the $n$th roots of unity are $\exp\left(2\pi i \frac{k}{n}\right)$ for $k = 0, 1, 2, 3, \cdots, n - 1$.
Proposition. If $\omega = \exp\left(\frac{2\pi i}{n}\right)$ with $n \geq 2$, then $1 + \omega + \omega^2 + \cdots + \omega^{n-1} = 0$.
Proof. Two proofs are provided:
(i) Consider the equation $z^n - 1 = 0$. The coefficient of $z^{n-1}$ is minus the sum of all roots. Since the coefficient of $z^{n-1}$ is 0, the sum of all roots $= 1 + \omega + \omega^2 + \cdots + \omega^{n-1} = 0$.
(ii) Since $\omega^n - 1 = (\omega - 1)(1 + \omega + \cdots + \omega^{n-1})$ and $\omega \neq 1$, dividing by $(\omega - 1)$, we have $1 + \omega + \cdots + \omega^{n-1} = (\omega^n - 1)/(\omega - 1) = 0$.
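A short numerical check of the proposition, using the explicit formula for the roots. The helper `roots_of_unity` is a name introduced only for this sketch.

```python
import cmath

def roots_of_unity(n):
    """Return the n-th roots of unity exp(2 pi i k / n) for k = 0, ..., n - 1."""
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

# Sum of the roots for a few values of n >= 2; each should vanish up to rounding.
sums = {n: sum(roots_of_unity(n)) for n in range(2, 9)}
```

For $n = 1$ the only root is 1, so the sum is 1 rather than 0, which is why the proof needs $\omega \neq 1$.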
1.4 Complex logarithm and power
Definition (Complex logarithm). The complex logarithm $w = \log z$ is a solution to $e^w = z$. Writing $z = re^{i\theta}$, we have $\log z = \log(re^{i\theta}) = \log r + i\theta$. This can be multi-valued for different values of $\theta$ and, as above, we should select the $\theta$ that satisfies $-\pi < \theta \leq \pi$.
Example. $\log 2i = \log 2 + i\frac{\pi}{2}$
Definition (Complex power). The complex power $z^\alpha$ for $z, \alpha \in \mathbb{C}$ is defined as $z^\alpha = e^{\alpha \log z}$. This, again, can be multi-valued, as $z^\alpha = e^{\alpha \log |z|} e^{i\alpha\theta} e^{2in\pi\alpha}$ for $n \in \mathbb{Z}$ (there are finitely many values if $\alpha \in \mathbb{Q}$, infinitely many otherwise). Nevertheless, we make $z^\alpha$ single-valued by insisting $-\pi < \theta \leq \pi$.
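To see the multi-valuedness concretely, the sketch below lists the values $e^{\alpha(\log|z| + i\theta + 2in\pi)}$ for a chosen range of $n$. The function names and the sample inputs are assumptions made for this illustration; $n = 0$ corresponds to the principal value.

```python
import cmath
import math

def log_branches(z, ns):
    """Values log|z| + i(theta + 2 n pi) for each integer n in ns, with theta the principal argument."""
    theta = cmath.phase(z)
    return [math.log(abs(z)) + 1j * (theta + 2 * math.pi * n) for n in ns]

def power_values(z, alpha, ns):
    """Values of z^alpha obtained from the chosen branches of log z."""
    return [cmath.exp(alpha * w) for w in log_branches(z, ns)]

square_roots = power_values(4, 0.5, range(4))   # rational exponent: values repeat
principal_i_to_i = power_values(1j, 1j, [0])[0]  # principal value of i^i
```

With $\alpha = 1/2$ the values alternate between two numbers, while the principal value of $i^i$ is the real number $e^{-\pi/2}$.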
1.5 De Moivre’s theorem
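Theorem (De Moivre's theorem). For all $n \in \mathbb{Z}$,
$$\cos n\theta + i\sin n\theta = (\cos\theta + i\sin\theta)^n.$$
Proof. First prove for the $n \geq 0$ case by induction. The $n = 0$ case is true since it merely reads $1 = 1$. Assuming the result holds for $n$, we then have
$$\begin{aligned}
(\cos\theta + i\sin\theta)^{n+1} &= (\cos\theta + i\sin\theta)^n(\cos\theta + i\sin\theta) \\
&= (\cos n\theta + i\sin n\theta)(\cos\theta + i\sin\theta) \\
&= \cos(n+1)\theta + i\sin(n+1)\theta
\end{aligned}$$
If $n < 0$, let $m = -n$. Then $m > 0$ and
$$\begin{aligned}
(\cos\theta + i\sin\theta)^{-m} &= (\cos m\theta + i\sin m\theta)^{-1} \\
&= \frac{\cos m\theta - i\sin m\theta}{(\cos m\theta + i\sin m\theta)(\cos m\theta - i\sin m\theta)} \\
&= \frac{\cos(-m\theta) + i\sin(-m\theta)}{\cos^2 m\theta + \sin^2 m\theta} \\
&= \cos(-m\theta) + i\sin(-m\theta) \\
&= \cos n\theta + i\sin n\theta
\end{aligned}$$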
Note that "$\cos n\theta + i\sin n\theta = e^{in\theta} = (e^{i\theta})^n = (\cos\theta + i\sin\theta)^n$" is not a valid proof of De Moivre's theorem, since we do not know yet that $e^{in\theta} = (e^{i\theta})^n$. In fact, De Moivre's theorem tells us that this is a valid rule to apply.
Example. We have $\cos 5\theta + i\sin 5\theta = (\cos\theta + i\sin\theta)^5$. By binomial expansion of the RHS and taking real and imaginary parts, we have
$$\begin{aligned}
\cos 5\theta &= 5\cos\theta - 20\cos^3\theta + 16\cos^5\theta \\
\sin 5\theta &= 5\sin\theta - 20\sin^3\theta + 16\sin^5\theta
\end{aligned}$$
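The two multiple-angle formulas in this example can be spot-checked numerically. This is a sketch with function names chosen for the illustration; it tests a handful of angles and is not a substitute for the binomial expansion.

```python
import math

def cos5_from_powers(theta):
    """Right-hand side 5 cos(t) - 20 cos^3(t) + 16 cos^5(t)."""
    c = math.cos(theta)
    return 5 * c - 20 * c ** 3 + 16 * c ** 5

def sin5_from_powers(theta):
    """Right-hand side 5 sin(t) - 20 sin^3(t) + 16 sin^5(t)."""
    s = math.sin(theta)
    return 5 * s - 20 * s ** 3 + 16 * s ** 5

def de_moivre_gap(theta, n):
    """|(cos t + i sin t)^n - (cos nt + i sin nt)| for an integer n (may be negative)."""
    lhs = complex(math.cos(theta), math.sin(theta)) ** n
    rhs = complex(math.cos(n * theta), math.sin(n * theta))
    return abs(lhs - rhs)

angles = [0.0, 0.3, 1.0, 2.5, -1.7]  # arbitrary sample angles
```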
1.6 Lines and circles in C
Since complex numbers can be regarded as points on the 2D plane, we can often
use complex numbers to represent two dimensional objects.
Suppose that we want to represent a straight line through $z_0 \in \mathbb{C}$ parallel to $w \in \mathbb{C}$ (with $w \neq 0$). The obvious way to do so is to let $z = z_0 + \lambda w$ where $\lambda$ can take any real value. However, this is not an optimal way of doing so, since we are not using the power of complex numbers fully. This is just the same as the vector equation for straight lines, which you may or may not know from your A levels.
Instead, we rearrange the equation to give $\lambda = \frac{z - z_0}{w}$. We take the complex conjugate of this expression to obtain $\bar{\lambda} = \frac{\bar{z} - \bar{z}_0}{\bar{w}}$. The trick here is to realize that $\lambda$ is a real number. So we must have $\lambda = \bar{\lambda}$. This means that we must have
$$\begin{aligned}
\frac{z - z_0}{w} &= \frac{\bar{z} - \bar{z}_0}{\bar{w}} \\
z\bar{w} - \bar{z}w &= z_0\bar{w} - \bar{z}_0 w.
\end{aligned}$$
Theorem (Equation of straight line). The equation of a straight line through $z_0$ and parallel to $w$ is given by
$$z\bar{w} - \bar{z}w = z_0\bar{w} - \bar{z}_0 w.$$
The equation of a circle, on the other hand, is rather straightforward. Suppose that we want a circle with center $c \in \mathbb{C}$ and radius $\rho \in \mathbb{R}^+$. By definition of a circle, a point $z$ is on the circle iff its distance to $c$ is $\rho$, i.e. $|z - c| = \rho$. Recalling that $|z|^2 = z\bar{z}$, we obtain
$$\begin{aligned}
|z - c| &= \rho \\
|z - c|^2 &= \rho^2 \\
(z - c)(\bar{z} - \bar{c}) &= \rho^2 \\
z\bar{z} - \bar{c}z - c\bar{z} &= \rho^2 - c\bar{c}
\end{aligned}$$
Theorem. The general equation of a circle with center $c \in \mathbb{C}$ and radius $\rho \in \mathbb{R}^+$ can be given by
$$z\bar{z} - \bar{c}z - c\bar{z} = \rho^2 - c\bar{c}.$$
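Both equations translate directly into membership tests. The sketch below uses hypothetical helper names (`on_line`, `on_circle`) and a floating-point tolerance, which is an assumption of the illustration rather than part of the theory.

```python
import cmath

def on_line(z, z0, w, tol=1e-9):
    """Test z*conj(w) - conj(z)*w == z0*conj(w) - conj(z0)*w up to tol."""
    lhs = z * w.conjugate() - z.conjugate() * w
    rhs = z0 * w.conjugate() - z0.conjugate() * w
    return abs(lhs - rhs) < tol

def on_circle(z, c, rho, tol=1e-9):
    """Test z*conj(z) - conj(c)*z - c*conj(z) == rho^2 - c*conj(c) up to tol."""
    lhs = z * z.conjugate() - c.conjugate() * z - c * z.conjugate()
    rhs = rho ** 2 - c * c.conjugate()
    return abs(lhs - rhs) < tol

z0, w = 1 + 2j, 3 - 1j   # sample point and direction
c, rho = 1 - 1j, 2.0     # sample centre and radius
```

A point $z_0 + \lambda w$ with real $\lambda$ passes the line test, while $z_0 + iw$ (a step perpendicular to $w$) fails it.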
2 Vectors
We might have first learned vectors as arrays of numbers, and then defined
addition and multiplication in terms of the individual numbers in the vector.
This however, is not what we are going to do here. The array of numbers is just
a representation of the vector, instead of the vector itself.
Here, we will define vectors in terms of what they are, and then the various
operations are defined axiomatically according to their properties.
2.1 Definition and basic properties
Definition (Vector space). A vector space over $\mathbb{R}$ or $\mathbb{C}$ is a collection of vectors $\mathbf{v} \in V$, together with two operations: addition of two vectors and multiplication of a vector with a scalar (i.e. a number from $\mathbb{R}$ or $\mathbb{C}$, respectively).
Vector addition has to satisfy the following axioms:
(i) a + b = b + a (commutativity)
(ii) (a + b) + c = a + (b + c) (associativity)
(iii) There is a vector 0 such that a + 0 = a. (identity)
(iv) For all vectors a, there is a vector (−a) such that a + (−a) = 0 (inverse)
Scalar multiplication has to satisfy the following axioms:
(i) λ(a + b) = λa + λb.
(ii) (λ + µ)a = λa + µa.
(iii) λ(µa) = (λµ)a.
(iv) 1a = a.
Often, vectors have a length and direction. The length is denoted by $|\mathbf{v}|$. In this case, we can think of a vector as an "arrow" in space. Note that $\lambda\mathbf{a}$ is either parallel ($\lambda \geq 0$) or anti-parallel ($\lambda \leq 0$) to $\mathbf{a}$.
Definition (Unit vector). A unit vector is a vector with length 1. We write a unit vector as $\hat{\mathbf{v}}$.
Example. $\mathbb{R}^n$ is a vector space with component-wise addition and scalar multiplication. Note that the vector space $\mathbb{R}$ is a line, but not all lines are vector spaces. For example, $x + y = 1$ is not a vector space since it does not contain $\mathbf{0}$.
2.2 Scalar product
In a vector space, we can define the scalar product of two vectors, which returns
a scalar (i.e. a real or complex number). We will first look at the usual scalar
product defined for $\mathbb{R}^n$, and then define the scalar product axiomatically.
2.2.1 Geometric picture ($\mathbb{R}^2$ and $\mathbb{R}^3$ only)
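Definition (Scalar/dot product). $\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}||\mathbf{b}|\cos\theta$, where $\theta$ is the angle between $\mathbf{a}$ and $\mathbf{b}$. It satisfies the following properties:
(i) $\mathbf{a} \cdot \mathbf{b} = \mathbf{b} \cdot \mathbf{a}$
(ii) $\mathbf{a} \cdot \mathbf{a} = |\mathbf{a}|^2 \geq 0$
(iii) $\mathbf{a} \cdot \mathbf{a} = 0$ iff $\mathbf{a} = \mathbf{0}$
(iv) If $\mathbf{a} \cdot \mathbf{b} = 0$ and $\mathbf{a}, \mathbf{b} \neq \mathbf{0}$, then $\mathbf{a}$ and $\mathbf{b}$ are perpendicular.
Intuitively, this is the product of the parts of $\mathbf{a}$ and $\mathbf{b}$ that are parallel.
[Figure: vectors $\mathbf{a}$ and $\mathbf{b}$, with the lengths $|\mathbf{a}|$ and $|\mathbf{a}|\cos\theta$ marked.]
Using the dot product, we can write the projection of $\mathbf{b}$ onto $\mathbf{a}$ as $(|\mathbf{b}|\cos\theta)\hat{\mathbf{a}} = (\hat{\mathbf{a}} \cdot \mathbf{b})\hat{\mathbf{a}}$.
The cosine rule can be derived as follows:
$$\begin{aligned}
|\overrightarrow{BC}|^2 &= |\overrightarrow{AC} - \overrightarrow{AB}|^2 \\
&= (\overrightarrow{AC} - \overrightarrow{AB}) \cdot (\overrightarrow{AC} - \overrightarrow{AB}) \\
&= |\overrightarrow{AB}|^2 + |\overrightarrow{AC}|^2 - 2|\overrightarrow{AB}||\overrightarrow{AC}|\cos\theta
\end{aligned}$$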
We will later come up with a convenient algebraic way to evaluate this scalar
product.
2.2.2 General algebraic definition
Definition (Inner/scalar product). In a real vector space $V$, an inner product or scalar product is a map $V \times V \to \mathbb{R}$ that satisfies the following axioms. It is written as $\mathbf{x} \cdot \mathbf{y}$ or $\langle \mathbf{x} \mid \mathbf{y} \rangle$.
(i) $\mathbf{x} \cdot \mathbf{y} = \mathbf{y} \cdot \mathbf{x}$ (symmetry)
(ii) $\mathbf{x} \cdot (\lambda\mathbf{y} + \mu\mathbf{z}) = \lambda\mathbf{x} \cdot \mathbf{y} + \mu\mathbf{x} \cdot \mathbf{z}$ (linearity in 2nd argument)
(iii) $\mathbf{x} \cdot \mathbf{x} \geq 0$ with equality iff $\mathbf{x} = \mathbf{0}$ (positive definite)
Note that this is a definition only for real vector spaces, where the scalars are real. We will have a different set of definitions for complex vector spaces.
In particular, here we can use (i) and (ii) together to show linearity in 1st argument. However, this is generally not true for complex vector spaces.
Definition. The norm of a vector, written as $|\mathbf{a}|$ or $\|\mathbf{a}\|$, is defined as
$$|\mathbf{a}| = \sqrt{\mathbf{a} \cdot \mathbf{a}}.$$
Example. Instead of the usual $\mathbb{R}^n$ vector space, we can consider the set of all real (integrable) functions on $[0, 1]$ as a vector space. We can define the following inner product:
$$\langle f \mid g \rangle = \int_0^1 f(x)g(x)\,\mathrm{d}x.$$
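2.3 Cauchy–Schwarz inequality
Theorem (Cauchy–Schwarz inequality). For all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$,
$$|\mathbf{x} \cdot \mathbf{y}| \leq |\mathbf{x}||\mathbf{y}|.$$
Proof. Consider the expression $|\mathbf{x} - \lambda\mathbf{y}|^2$. We must have
$$\begin{aligned}
|\mathbf{x} - \lambda\mathbf{y}|^2 &\geq 0 \\
(\mathbf{x} - \lambda\mathbf{y}) \cdot (\mathbf{x} - \lambda\mathbf{y}) &\geq 0 \\
\lambda^2|\mathbf{y}|^2 - \lambda(2\mathbf{x} \cdot \mathbf{y}) + |\mathbf{x}|^2 &\geq 0.
\end{aligned}$$
Viewing this as a quadratic in $\lambda$, we see that the quadratic is non-negative and thus cannot have 2 distinct real roots. Thus the discriminant $\Delta \leq 0$. So
$$\begin{aligned}
4(\mathbf{x} \cdot \mathbf{y})^2 &\leq 4|\mathbf{y}|^2|\mathbf{x}|^2 \\
(\mathbf{x} \cdot \mathbf{y})^2 &\leq |\mathbf{x}|^2|\mathbf{y}|^2 \\
|\mathbf{x} \cdot \mathbf{y}| &\leq |\mathbf{x}||\mathbf{y}|.
\end{aligned}$$
Note that we proved this using the axioms of the scalar product. So this result holds for all possible scalar products on any (real) vector space.
Example. Let $\mathbf{x} = (\alpha, \beta, \gamma)$ and $\mathbf{y} = (1, 1, 1)$. Then by the Cauchy–Schwarz inequality, we have
$$\alpha + \beta + \gamma \leq \sqrt{3}\sqrt{\alpha^2 + \beta^2 + \gamma^2}.$$
Squaring both sides and expanding $(\alpha + \beta + \gamma)^2$ gives
$$\alpha^2 + \beta^2 + \gamma^2 \geq \alpha\beta + \beta\gamma + \gamma\alpha,$$
with equality if $\alpha = \beta = \gamma$.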
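The inequality and the example can be checked on concrete vectors. The sketch below uses plain tuples and helper names (`dot`, `norm`, `cauchy_schwarz_gap`) that are assumptions of this illustration.

```python
import math

def dot(x, y):
    """Standard scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Norm induced by the scalar product."""
    return math.sqrt(dot(x, x))

def cauchy_schwarz_gap(x, y):
    """|x||y| - |x . y|, which the theorem says is non-negative."""
    return norm(x) * norm(y) - abs(dot(x, y))

def example_gap(alpha, beta, gamma):
    """(a^2 + b^2 + c^2) - (ab + bc + ca) from the worked example."""
    return (alpha ** 2 + beta ** 2 + gamma ** 2) - (alpha * beta + beta * gamma + gamma * alpha)

samples = [((1, 2, 3), (-2, 0, 5)), ((0.5, -1.5, 2), (4, 1, -1)), ((2, 2, 2), (1, 1, 1))]
```

The last sample pair is parallel, so its gap is zero up to rounding, matching the equality case.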
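Corollary (Triangle inequality).
$$|\mathbf{x} + \mathbf{y}| \leq |\mathbf{x}| + |\mathbf{y}|.$$
Proof.
$$\begin{aligned}
|\mathbf{x} + \mathbf{y}|^2 &= (\mathbf{x} + \mathbf{y}) \cdot (\mathbf{x} + \mathbf{y}) \\
&= |\mathbf{x}|^2 + 2\mathbf{x} \cdot \mathbf{y} + |\mathbf{y}|^2 \\
&\leq |\mathbf{x}|^2 + 2|\mathbf{x}||\mathbf{y}| + |\mathbf{y}|^2 \\
&= (|\mathbf{x}| + |\mathbf{y}|)^2.
\end{aligned}$$
So
$$|\mathbf{x} + \mathbf{y}| \leq |\mathbf{x}| + |\mathbf{y}|.$$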
2.4 Vector product
Apart from the scalar product, we can also define the vector product. However,
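this is defined only for $\mathbb{R}^3$ space, but not spaces in general.
Definition (Vector/cross product). Consider $\mathbf{a}, \mathbf{b} \in \mathbb{R}^3$. Define the vector product
$$\mathbf{a} \times \mathbf{b} = |\mathbf{a}||\mathbf{b}|\sin\theta\,\hat{\mathbf{n}},$$
where $\hat{\mathbf{n}}$ is a unit vector perpendicular to both $\mathbf{a}$ and $\mathbf{b}$. Since there are two (opposite) unit vectors that are perpendicular to both of them, we pick $\hat{\mathbf{n}}$ to be the one that is perpendicular to $\mathbf{a}, \mathbf{b}$ in a right-handed sense.
[Figure: vectors $\mathbf{a}$ and $\mathbf{b}$, with $\mathbf{a} \times \mathbf{b}$ perpendicular to both.]
The vector product satisfies the following properties:
(i) $\mathbf{a} \times \mathbf{b} = -\mathbf{b} \times \mathbf{a}$.
(ii) $\mathbf{a} \times \mathbf{a} = \mathbf{0}$.
(iii) $\mathbf{a} \times \mathbf{b} = \mathbf{0} \Rightarrow \mathbf{a} = \lambda\mathbf{b}$ for some $\lambda \in \mathbb{R}$ (or $\mathbf{b} = \mathbf{0}$).
(iv) $\mathbf{a} \times (\lambda\mathbf{b}) = \lambda(\mathbf{a} \times \mathbf{b})$.
(v) $\mathbf{a} \times (\mathbf{b} + \mathbf{c}) = \mathbf{a} \times \mathbf{b} + \mathbf{a} \times \mathbf{c}$.
If we have a triangle $OAB$, its area is given by $\frac{1}{2}|\overrightarrow{OA}||\overrightarrow{OB}|\sin\theta = \frac{1}{2}|\overrightarrow{OA} \times \overrightarrow{OB}|$. We define the vector area as $\frac{1}{2}\overrightarrow{OA} \times \overrightarrow{OB}$, which is often a helpful notion when we want to do calculus with surfaces.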
There is a convenient way of calculating vector products:
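Proposition.
$$\begin{aligned}
\mathbf{a} \times \mathbf{b} &= (a_1\hat{\mathbf{i}} + a_2\hat{\mathbf{j}} + a_3\hat{\mathbf{k}}) \times (b_1\hat{\mathbf{i}} + b_2\hat{\mathbf{j}} + b_3\hat{\mathbf{k}}) \\
&= (a_2 b_3 - a_3 b_2)\hat{\mathbf{i}} + \cdots \\
&= \begin{vmatrix} \hat{\mathbf{i}} & \hat{\mathbf{j}} & \hat{\mathbf{k}} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}
\end{aligned}$$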
2.5 Scalar triple product
Definition (Scalar triple product). The scalar triple product is defined as
[a, b, c] = a · (b × c).
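Proposition. If a parallelepiped has sides represented by vectors $\mathbf{a}, \mathbf{b}, \mathbf{c}$ that form a right-handed system, then the volume of the parallelepiped is given by $[\mathbf{a}, \mathbf{b}, \mathbf{c}]$.
[Figure: parallelepiped with edges $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$.]
Proof. The area of the base of the parallelepiped is given by $|\mathbf{b}||\mathbf{c}|\sin\theta = |\mathbf{b} \times \mathbf{c}|$. Thus the volume $= |\mathbf{b} \times \mathbf{c}||\mathbf{a}|\cos\phi = |\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})|$, where $\phi$ is the angle between $\mathbf{a}$ and the normal to $\mathbf{b}$ and $\mathbf{c}$. However, since $\mathbf{a}, \mathbf{b}, \mathbf{c}$ form a right-handed system, we have $\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) \geq 0$. Therefore the volume is $\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})$.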
Since the order of a, b, c doesn’t affect the volume, we know that
[a, b, c] = [b, c, a] = [c, a, b] = −[b, a, c] = −[a, c, b] = −[c, b, a].
Theorem. a × (b + c) = a × b + a × c.
Proof. Let d = a × (b + c) − a × b − a × c. We have
d · d = d · [a × (b + c)] − d · (a × b) − d · (a × c)
= (b + c) · (d × a) − b · (d × a) − c · (d × a)
= 0
Thus d = 0.
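The component formula for the vector product makes these identities easy to test with integer vectors, where the arithmetic is exact. The helper names below are assumptions of this sketch.

```python
def dot(a, b):
    """Scalar product of two vectors in R^3."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Vector product via the component (determinant) formula."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def add(a, b):
    """Component-wise sum of two vectors."""
    return tuple(x + y for x, y in zip(a, b))

def triple(a, b, c):
    """Scalar triple product [a, b, c] = a . (b x c)."""
    return dot(a, cross(b, c))

# Arbitrary integer sample vectors.
a, b, c = (1, 2, 3), (-2, 0, 5), (4, 1, -1)
```

Cyclic permutations of the arguments leave `triple` unchanged, while swapping two arguments flips the sign.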
2.6 Spanning sets and bases
2.6.1 2D space
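Definition (Spanning set). A set of vectors $\{\mathbf{a}, \mathbf{b}\}$ spans $\mathbb{R}^2$ if for all vectors $\mathbf{r} \in \mathbb{R}^2$, there exist some $\lambda, \mu \in \mathbb{R}$ such that $\mathbf{r} = \lambda\mathbf{a} + \mu\mathbf{b}$.
In $\mathbb{R}^2$, two vectors span the space if $\mathbf{a} \times \mathbf{b} \neq \mathbf{0}$.
Theorem. The coefficients $\lambda, \mu$ are unique.
Proof. Suppose that $\mathbf{r} = \lambda\mathbf{a} + \mu\mathbf{b} = \lambda'\mathbf{a} + \mu'\mathbf{b}$. Take the vector product with $\mathbf{a}$ on both sides to get $(\mu - \mu')\mathbf{a} \times \mathbf{b} = \mathbf{0}$. Since $\mathbf{a} \times \mathbf{b} \neq \mathbf{0}$, then $\mu = \mu'$. Similarly, $\lambda = \lambda'$.
Definition (Linearly independent vectors in $\mathbb{R}^2$). Two vectors $\mathbf{a}$ and $\mathbf{b}$ are linearly independent if for $\alpha, \beta \in \mathbb{R}$, $\alpha\mathbf{a} + \beta\mathbf{b} = \mathbf{0}$ iff $\alpha = \beta = 0$. In $\mathbb{R}^2$, $\mathbf{a}$ and $\mathbf{b}$ are linearly independent if $\mathbf{a} \times \mathbf{b} \neq \mathbf{0}$.
Definition (Basis of $\mathbb{R}^2$). A set of vectors is a basis of $\mathbb{R}^2$ if it spans $\mathbb{R}^2$ and is linearly independent.
Example. $\{\hat{\mathbf{i}}, \hat{\mathbf{j}}\} = \{(1, 0), (0, 1)\}$ is a basis of $\mathbb{R}^2$. They are the standard basis of $\mathbb{R}^2$.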
2.6.2 3D space
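We can extend the above definitions of spanning set and linearly independent set to $\mathbb{R}^3$. Here we have
Theorem. If $\mathbf{a}, \mathbf{b}, \mathbf{c} \in \mathbb{R}^3$ are non-coplanar, i.e. $\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) \neq 0$, then they form a basis of $\mathbb{R}^3$.
Proof. For any $\mathbf{r}$, write $\mathbf{r} = \lambda\mathbf{a} + \mu\mathbf{b} + \nu\mathbf{c}$. Performing the scalar product with $\mathbf{b} \times \mathbf{c}$ on both sides, one obtains $\mathbf{r} \cdot (\mathbf{b} \times \mathbf{c}) = \lambda\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) + \mu\mathbf{b} \cdot (\mathbf{b} \times \mathbf{c}) + \nu\mathbf{c} \cdot (\mathbf{b} \times \mathbf{c}) = \lambda[\mathbf{a}, \mathbf{b}, \mathbf{c}]$. Thus $\lambda = [\mathbf{r}, \mathbf{b}, \mathbf{c}]/[\mathbf{a}, \mathbf{b}, \mathbf{c}]$. The values of $\mu$ and $\nu$ can be found similarly. Thus each $\mathbf{r}$ can be written as a linear combination of $\mathbf{a}, \mathbf{b}$ and $\mathbf{c}$.
By the formula derived above, it follows that if $\alpha\mathbf{a} + \beta\mathbf{b} + \gamma\mathbf{c} = \mathbf{0}$, then $\alpha = \beta = \gamma = 0$. Thus they are linearly independent.
Note that while we came up with formulas for $\lambda, \mu$ and $\nu$, we did not actually prove that these coefficients indeed work. This is rather unsatisfactory. We could, of course, expand everything out and show that this indeed works, but in IB Linear Algebra, we will prove a much more general result, saying that if we have an $n$-dimensional space and a set of $n$ linearly independent vectors, then they form a basis.
In $\mathbb{R}^3$, the standard basis is $\hat{\mathbf{i}}, \hat{\mathbf{j}}, \hat{\mathbf{k}}$, or $(1, 0, 0)$, $(0, 1, 0)$ and $(0, 0, 1)$.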
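The formula $\lambda = [\mathbf{r}, \mathbf{b}, \mathbf{c}]/[\mathbf{a}, \mathbf{b}, \mathbf{c}]$ and its analogues $\mu = [\mathbf{a}, \mathbf{r}, \mathbf{c}]/[\mathbf{a}, \mathbf{b}, \mathbf{c}]$, $\nu = [\mathbf{a}, \mathbf{b}, \mathbf{r}]/[\mathbf{a}, \mathbf{b}, \mathbf{c}]$ can be tried out directly. This sketch assumes integer components so that `fractions.Fraction` gives exact coefficients; the function names and sample vectors are illustrative only.

```python
from fractions import Fraction

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def triple(a, b, c):
    """Scalar triple product [a, b, c]."""
    return dot(a, cross(b, c))

def coefficients(r, a, b, c):
    """Return (lam, mu, nu) with r = lam*a + mu*b + nu*c, for integer vectors."""
    vol = triple(a, b, c)
    if vol == 0:
        raise ValueError("a, b, c are coplanar and do not form a basis")
    return (Fraction(triple(r, b, c), vol),
            Fraction(triple(a, r, c), vol),
            Fraction(triple(a, b, r), vol))

a, b, c = (1, 0, 1), (0, 2, 1), (1, 1, 0)   # sample basis, [a, b, c] = -3
r = (3, -1, 4)
lam, mu, nu = coefficients(r, a, b, c)
rebuilt = tuple(lam * a[i] + mu * b[i] + nu * c[i] for i in range(3))
```

Rebuilding $\mathbf{r}$ from the computed coefficients is exactly the check that the proof above leaves out.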
2.6.3 $\mathbb{R}^n$ space
In general, we can define
Definition (Linearly independent vectors). A set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \cdots, \mathbf{v}_m\}$ is linearly independent if
$$\sum_{i=1}^m \lambda_i \mathbf{v}_i = \mathbf{0} \Rightarrow (\forall i)\, \lambda_i = 0.$$
Definition (Spanning set). A set of vectors $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \cdots, \mathbf{u}_m\} \subseteq \mathbb{R}^n$ is a spanning set of $\mathbb{R}^n$ if
$$(\forall \mathbf{x} \in \mathbb{R}^n)(\exists \lambda_i)\, \sum_{i=1}^m \lambda_i \mathbf{u}_i = \mathbf{x}$$
Definition (Basis vectors). A basis of $\mathbb{R}^n$ is a linearly independent spanning set. The standard basis of $\mathbb{R}^n$ is $\mathbf{e}_1 = (1, 0, 0, \cdots, 0)$, $\mathbf{e}_2 = (0, 1, 0, \cdots, 0)$, $\cdots$, $\mathbf{e}_n = (0, 0, 0, \cdots, 1)$.
Definition (Orthonormal basis). A basis $\{\mathbf{e}_i\}$ is orthonormal if $\mathbf{e}_i \cdot \mathbf{e}_j = 0$ if $i \neq j$ and $\mathbf{e}_i \cdot \mathbf{e}_i = 1$ for all $i, j$.
Using the Kronecker Delta symbol, which we will define later, we can write this condition as $\mathbf{e}_i \cdot \mathbf{e}_j = \delta_{ij}$.
Definition (Dimension of vector space). The dimension of a vector space is the number of vectors in its basis. (Exercise: show that this is well-defined)
We usually denote the components of a vector $\mathbf{x}$ by $x_i$. So we have $\mathbf{x} = (x_1, x_2, \cdots, x_n)$.
Definition (Scalar product). The scalar product of $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ is defined as $\mathbf{x} \cdot \mathbf{y} = \sum x_i y_i$.
The reader should check that this definition coincides with the $|\mathbf{x}||\mathbf{y}|\cos\theta$ definition in the case of $\mathbb{R}^2$ and $\mathbb{R}^3$.
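2.6.4 $\mathbb{C}^n$ space
$\mathbb{C}^n$ is very similar to $\mathbb{R}^n$, except that we have complex numbers. As a result, we need a different definition of the scalar product. If we still defined $\mathbf{u} \cdot \mathbf{v} = \sum u_i v_i$, then if we let $\mathbf{u} = (0, i)$, then $\mathbf{u} \cdot \mathbf{u} = -1 < 0$. This would be bad if we want to use the scalar product to define a norm.
Definition ($\mathbb{C}^n$). $\mathbb{C}^n = \{(z_1, z_2, \cdots, z_n) : z_i \in \mathbb{C}\}$. It has the same standard basis as $\mathbb{R}^n$ but the scalar product is defined differently. For $\mathbf{u}, \mathbf{v} \in \mathbb{C}^n$, $\mathbf{u} \cdot \mathbf{v} = \sum u_i^* v_i$. The scalar product has the following properties:
(i) $\mathbf{u} \cdot \mathbf{v} = (\mathbf{v} \cdot \mathbf{u})^*$
(ii) $\mathbf{u} \cdot (\lambda\mathbf{v} + \mu\mathbf{w}) = \lambda(\mathbf{u} \cdot \mathbf{v}) + \mu(\mathbf{u} \cdot \mathbf{w})$
(iii) $\mathbf{u} \cdot \mathbf{u} \geq 0$ and $\mathbf{u} \cdot \mathbf{u} = 0$ iff $\mathbf{u} = \mathbf{0}$
Instead of linearity in the first argument, here we have $(\lambda\mathbf{u} + \mu\mathbf{v}) \cdot \mathbf{w} = \lambda^*\mathbf{u} \cdot \mathbf{w} + \mu^*\mathbf{v} \cdot \mathbf{w}$.
Example.
$$\begin{aligned}
\sum_{k=1}^4 (-i)^k |\mathbf{x} + i^k\mathbf{y}|^2 &= \sum (-i)^k \langle \mathbf{x} + i^k\mathbf{y} \mid \mathbf{x} + i^k\mathbf{y} \rangle \\
&= \sum (-i)^k \left(\langle \mathbf{x} + i^k\mathbf{y} \mid \mathbf{x} \rangle + i^k\langle \mathbf{x} + i^k\mathbf{y} \mid \mathbf{y} \rangle\right) \\
&= \sum (-i)^k \left(\langle \mathbf{x} \mid \mathbf{x} \rangle + (-i)^k\langle \mathbf{y} \mid \mathbf{x} \rangle + i^k\langle \mathbf{x} \mid \mathbf{y} \rangle + i^k(-i)^k\langle \mathbf{y} \mid \mathbf{y} \rangle\right) \\
&= \sum \left[(-i)^k(|\mathbf{x}|^2 + |\mathbf{y}|^2) + (-1)^k\langle \mathbf{y} \mid \mathbf{x} \rangle + \langle \mathbf{x} \mid \mathbf{y} \rangle\right] \\
&= (|\mathbf{x}|^2 + |\mathbf{y}|^2)\sum (-i)^k + \langle \mathbf{y} \mid \mathbf{x} \rangle\sum (-1)^k + \langle \mathbf{x} \mid \mathbf{y} \rangle\sum 1 \\
&= 4\langle \mathbf{x} \mid \mathbf{y} \rangle.
\end{aligned}$$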
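The identity in this example can be verified on sample vectors in $\mathbb{C}^3$. The sketch below conjugates the first argument, matching the definition $\mathbf{u} \cdot \mathbf{v} = \sum u_i^* v_i$; the helper names and sample values are assumptions of the illustration.

```python
def inner(u, v):
    """Scalar product on C^n, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def polarization_sum(x, y):
    """Compute sum_{k=1}^{4} (-i)^k |x + i^k y|^2."""
    total = 0
    for k in range(1, 5):
        z = [a + (1j ** k) * b for a, b in zip(x, y)]
        total += (-1j) ** k * inner(z, z)
    return total

x = [1 + 2j, -1j, 0.5]       # arbitrary sample vectors
y = [2 - 1j, 3, 1 + 1j]
gap = abs(polarization_sum(x, y) - 4 * inner(x, y))
```

With this definition the vector $(0, i)$ has $\mathbf{u} \cdot \mathbf{u} = 1$, avoiding the negative value obtained without the conjugate.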
We can prove the Cauchy–Schwarz inequality for complex vector spaces using the same proof as the real case, except that this time we have to first multiply $\mathbf{y}$ by some $e^{i\theta}$ so that $\mathbf{x} \cdot (e^{i\theta}\mathbf{y})$ is a real number. The factor of $e^{i\theta}$ will drop off at the end when we take the modulus signs.
2.7 Vector subspaces
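Definition (Vector subspace). A vector subspace of a vector space $V$ is a subset of $V$ that is also a vector space under the same operations. Both $V$ and $\{\mathbf{0}\}$ are subspaces of $V$. All others are proper subspaces.
A useful criterion is that a subset $U \subseteq V$ is a subspace iff
(i) $\mathbf{x}, \mathbf{y} \in U \Rightarrow (\mathbf{x} + \mathbf{y}) \in U$.
(ii) $\mathbf{x} \in U \Rightarrow \lambda\mathbf{x} \in U$ for all scalars $\lambda$.
(iii) $\mathbf{0} \in U$.
This can be more concisely written as "$U$ is non-empty and for all $\mathbf{x}, \mathbf{y} \in U$ and all scalars $\lambda, \mu$, $(\lambda\mathbf{x} + \mu\mathbf{y}) \in U$".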
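Example.
(i) If $\{\mathbf{a}, \mathbf{b}, \mathbf{c}\}$ is a basis of $\mathbb{R}^3$, then $\{\mathbf{a} + \mathbf{c}, \mathbf{b} + \mathbf{c}\}$ is a basis of a 2D subspace.
Suppose $\mathbf{x}, \mathbf{y} \in \operatorname{span}\{\mathbf{a} + \mathbf{c}, \mathbf{b} + \mathbf{c}\}$. Let
$$\begin{aligned}
\mathbf{x} &= \alpha_1(\mathbf{a} + \mathbf{c}) + \beta_1(\mathbf{b} + \mathbf{c}); \\
\mathbf{y} &= \alpha_2(\mathbf{a} + \mathbf{c}) + \beta_2(\mathbf{b} + \mathbf{c}).
\end{aligned}$$
Then
$$\lambda\mathbf{x} + \mu\mathbf{y} = (\lambda\alpha_1 + \mu\alpha_2)(\mathbf{a} + \mathbf{c}) + (\lambda\beta_1 + \mu\beta_2)(\mathbf{b} + \mathbf{c}) \in \operatorname{span}\{\mathbf{a} + \mathbf{c}, \mathbf{b} + \mathbf{c}\}.$$
Thus this is a subspace of $\mathbb{R}^3$.
Now check that $\mathbf{a} + \mathbf{c}, \mathbf{b} + \mathbf{c}$ is a basis. We only need to check linear independence. If $\alpha(\mathbf{a} + \mathbf{c}) + \beta(\mathbf{b} + \mathbf{c}) = \mathbf{0}$, then $\alpha\mathbf{a} + \beta\mathbf{b} + (\alpha + \beta)\mathbf{c} = \mathbf{0}$. Since $\{\mathbf{a}, \mathbf{b}, \mathbf{c}\}$ is a basis of $\mathbb{R}^3$, $\mathbf{a}, \mathbf{b}, \mathbf{c}$ are linearly independent and $\alpha = \beta = 0$. Therefore $\mathbf{a} + \mathbf{c}, \mathbf{b} + \mathbf{c}$ is a basis and the subspace has dimension 2.
(ii) Given a set of numbers $\alpha_i$, not all zero, let $U = \{\mathbf{x} \in \mathbb{R}^n : \sum_{i=1}^n \alpha_i x_i = 0\}$. We show that this is a vector subspace of $\mathbb{R}^n$: Take $\mathbf{x}, \mathbf{y} \in U$, then consider $\lambda\mathbf{x} + \mu\mathbf{y}$. We have $\sum \alpha_i(\lambda x_i + \mu y_i) = \lambda\sum \alpha_i x_i + \mu\sum \alpha_i y_i = 0$. Thus $\lambda\mathbf{x} + \mu\mathbf{y} \in U$.
The dimension of the subspace is $n - 1$ as we can freely choose $x_i$ for $i = 1, \cdots, n - 1$ and then $x_n$ is uniquely determined by the previous $x_i$'s (assuming $\alpha_n \neq 0$; otherwise relabel the coordinates).
(iii) Let $W = \{\mathbf{x} \in \mathbb{R}^n : \sum \alpha_i x_i = 1\}$. Then for $\mathbf{x}, \mathbf{y} \in W$, $\sum \alpha_i(\lambda x_i + \mu y_i) = \lambda + \mu$, which is not equal to 1 in general. Therefore $W$ is not a vector subspace.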
2.8 Suffix notation
Here we are going to introduce a powerful notation that can help us simplify a
lot of things.
First of all, let $\mathbf{v} \in \mathbb{R}^3$. We can write $\mathbf{v} = v_1\mathbf{e}_1 + v_2\mathbf{e}_2 + v_3\mathbf{e}_3 = (v_1, v_2, v_3)$. So in general, the $i$th component of $\mathbf{v}$ is written as $v_i$. We can thus write vector equations in component form. For example, $\mathbf{a} = \mathbf{b} \to a_i = b_i$ or $\mathbf{c} = \alpha\mathbf{a} + \beta\mathbf{b} \to c_i = \alpha a_i + \beta b_i$. A vector has one free suffix, $i$, while a scalar has none.
Notation (Einstein's summation convention). Consider a sum $\mathbf{x} \cdot \mathbf{y} = \sum x_i y_i$. The summation convention says that we can drop the $\sum$ symbol and simply write $\mathbf{x} \cdot \mathbf{y} = x_i y_i$. If a suffix is repeated once (i.e. appears twice in a term), summation is understood.
Note that $i$ is a dummy suffix and it doesn't matter what it's called, i.e. $x_i y_i = x_j y_j = x_k y_k$ etc.
The rules of this convention are:
(i) Suffix appears once in a term: free suffix
(ii) Suffix appears twice in a term: dummy suffix and is summed over
(iii) Suffix appears three times or more: WRONG!
Example. $[(\mathbf{a} \cdot \mathbf{b})\mathbf{c} - (\mathbf{a} \cdot \mathbf{c})\mathbf{b}]_i = a_j b_j c_i - a_j c_j b_i$, with summation over $j$ understood.
It is possible for an item to have more than one index. These objects are
known as tensors, which will be studied in depth in the IA Vector Calculus
course.
Here we will define two important tensors:
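Definition (Kronecker delta).
$$\delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}.$$
We have
$$\begin{pmatrix} \delta_{11} & \delta_{12} & \delta_{13} \\ \delta_{21} & \delta_{22} & \delta_{23} \\ \delta_{31} & \delta_{32} & \delta_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = I.$$
So the Kronecker delta represents an identity matrix.
Example.
(i) $a_i\delta_{i1} = a_1$. In general, $a_i\delta_{ij} = a_j$ ($i$ is dummy, $j$ is free).
(ii) $\delta_{ij}\delta_{jk} = \delta_{ik}$
(iii) $\delta_{ii} = n$ if we are in $\mathbb{R}^n$.
(iv) $a_p\delta_{pq}b_q = a_p b_p$ with $p, q$ both dummy suffixes and summed over.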
Definition (Alternating symbol $\varepsilon_{ijk}$). Consider rearrangements of $1, 2, 3$. We can divide them into even and odd permutations. Even permutations include $(1, 2, 3)$, $(2, 3, 1)$ and $(3, 1, 2)$. These are permutations obtained by performing two (or no) swaps of the elements of $(1, 2, 3)$. (Alternatively, it is any "rotation" of $(1, 2, 3)$.)
The odd permutations are $(2, 1, 3)$, $(1, 3, 2)$ and $(3, 2, 1)$. They are the permutations obtained by one swap only.
Define
$$\varepsilon_{ijk} = \begin{cases} +1 & ijk \text{ is even permutation} \\ -1 & ijk \text{ is odd permutation} \\ 0 & \text{otherwise (i.e. repeated suffixes)} \end{cases}$$
$\varepsilon_{ijk}$ has 3 free suffixes.
We have $\varepsilon_{123} = \varepsilon_{231} = \varepsilon_{312} = +1$ and $\varepsilon_{213} = \varepsilon_{132} = \varepsilon_{321} = -1$. $\varepsilon_{112} = \varepsilon_{111} = \cdots = 0$.
We have
(i) $\varepsilon_{ijk}\delta_{jk} = \varepsilon_{ijj} = 0$
(ii) If $a_{jk} = a_{kj}$ (i.e. $a_{ij}$ is symmetric), then $\varepsilon_{ijk}a_{jk} = \varepsilon_{ijk}a_{kj} = -\varepsilon_{ikj}a_{kj}$. Since $\varepsilon_{ijk}a_{jk} = \varepsilon_{ikj}a_{kj}$ (we simply renamed dummy suffixes), we have $\varepsilon_{ijk}a_{jk} = 0$.
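Proposition. $(\mathbf{a} \times \mathbf{b})_i = \varepsilon_{ijk}a_j b_k$
Proof. By expansion of formula
Theorem. $\varepsilon_{ijk}\varepsilon_{ipq} = \delta_{jp}\delta_{kq} - \delta_{jq}\delta_{kp}$
Proof. Proof by exhaustion:
$$\text{RHS} = \begin{cases} +1 & \text{if } j = p \text{ and } k = q \text{ (with } j \neq k\text{)} \\ -1 & \text{if } j = q \text{ and } k = p \text{ (with } j \neq k\text{)} \\ 0 & \text{otherwise} \end{cases}$$
LHS: Summing over $i$, the only non-zero terms are when $j, k \neq i$ and $p, q \neq i$. If $j = p$ and $k = q$, LHS is $(-1)^2$ or $(+1)^2 = 1$. If $j = q$ and $k = p$, LHS is $(+1)(-1)$ or $(-1)(+1) = -1$. All other possibilities result in 0.
Equally, we have $\varepsilon_{ijk}\varepsilon_{pqk} = \delta_{ip}\delta_{jq} - \delta_{jp}\delta_{iq}$ and $\varepsilon_{ijk}\varepsilon_{pjq} = \delta_{ip}\delta_{kq} - \delta_{iq}\delta_{kp}$.
Proposition.
$$\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = \mathbf{b} \cdot (\mathbf{c} \times \mathbf{a})$$
Proof. In suffix notation, we have
$$\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = a_i(\mathbf{b} \times \mathbf{c})_i = \varepsilon_{ijk}b_j c_k a_i = \varepsilon_{jki}b_j c_k a_i = \mathbf{b} \cdot (\mathbf{c} \times \mathbf{a}).$$
Theorem (Vector triple product).
$$\mathbf{a} \times (\mathbf{b} \times \mathbf{c}) = (\mathbf{a} \cdot \mathbf{c})\mathbf{b} - (\mathbf{a} \cdot \mathbf{b})\mathbf{c}.$$
Proof.
$$\begin{aligned}
[\mathbf{a} \times (\mathbf{b} \times \mathbf{c})]_i &= \varepsilon_{ijk}a_j(\mathbf{b} \times \mathbf{c})_k \\
&= \varepsilon_{ijk}\varepsilon_{kpq}a_j b_p c_q \\
&= \varepsilon_{ijk}\varepsilon_{pqk}a_j b_p c_q \\
&= (\delta_{ip}\delta_{jq} - \delta_{iq}\delta_{jp})a_j b_p c_q \\
&= a_j b_i c_j - a_j c_i b_j \\
&= (\mathbf{a} \cdot \mathbf{c})b_i - (\mathbf{a} \cdot \mathbf{b})c_i
\end{aligned}$$
Similarly, $(\mathbf{a} \times \mathbf{b}) \times \mathbf{c} = (\mathbf{a} \cdot \mathbf{c})\mathbf{b} - (\mathbf{b} \cdot \mathbf{c})\mathbf{a}$.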
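Because every suffix only takes the values 1, 2, 3, the $\varepsilon\varepsilon$ identity can be checked exhaustively by machine. In the sketch below, the closed form used for `epsilon` is a convenience chosen for this illustration (it agrees with the definition for suffixes in $\{1, 2, 3\}$), and all function names are hypothetical.

```python
from itertools import product

INDICES = (1, 2, 3)

def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0

def epsilon(i, j, k):
    """Alternating symbol for suffixes in {1, 2, 3}: +1, -1 or 0."""
    return (i - j) * (j - k) * (k - i) // 2

def epsilon_identity_holds():
    """Check eps_ijk eps_ipq = d_jp d_kq - d_jq d_kp for all j, k, p, q."""
    for j, k, p, q in product(INDICES, repeat=4):
        lhs = sum(epsilon(i, j, k) * epsilon(i, p, q) for i in INDICES)
        rhs = delta(j, p) * delta(k, q) - delta(j, q) * delta(k, p)
        if lhs != rhs:
            return False
    return True

def cross(a, b):
    """(a x b)_i = eps_ijk a_j b_k, with vectors stored as 0-based tuples."""
    return tuple(sum(epsilon(i, j, k) * a[j - 1] * b[k - 1]
                     for j in INDICES for k in INDICES) for i in INDICES)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a, b, c = (1, 2, 3), (-2, 0, 5), (4, 1, -1)   # arbitrary integer vectors
lhs = cross(a, cross(b, c))
rhs = tuple(dot(a, c) * b[i] - dot(a, b) * c[i] for i in range(3))
```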
Spherical trigonometry
Proposition. (a × b) · (a × c) = (a · a)(b · c) − (a · b)(a · c).
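Proof.
$$\begin{aligned}
\text{LHS} &= (\mathbf{a} \times \mathbf{b})_i(\mathbf{a} \times \mathbf{c})_i \\
&= \varepsilon_{ijk}a_j b_k\varepsilon_{ipq}a_p c_q \\
&= (\delta_{jp}\delta_{kq} - \delta_{jq}\delta_{kp})a_j b_k a_p c_q \\
&= a_j b_k a_j c_k - a_j b_k a_k c_j \\
&= (\mathbf{a} \cdot \mathbf{a})(\mathbf{b} \cdot \mathbf{c}) - (\mathbf{a} \cdot \mathbf{b})(\mathbf{a} \cdot \mathbf{c})
\end{aligned}$$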
Consider the unit sphere, center O, with a, b, c on the surface.
[Figure: spherical triangle with vertices A, B and C, with the side δ(A, B) and the angle α marked.]
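Suppose we are living on the surface of the sphere. So the distance from $A$ to $B$ is the arc length on the sphere. We can imagine this to be along the circumference of the circle through $A$ and $B$ with center $O$. So the distance is $\angle AOB$, which we shall denote by $\delta(A, B)$. So $\mathbf{a} \cdot \mathbf{b} = \cos \angle AOB = \cos \delta(A, B)$. We obtain similar expressions for other dot products. Similarly, we get $|\mathbf{a} \times \mathbf{b}| = \sin \delta(A, B)$.
Let $\alpha$ be the angle of the triangle at $A$, i.e. the angle between the planes $OAB$ and $OAC$, whose normals are $\mathbf{a} \times \mathbf{b}$ and $\mathbf{a} \times \mathbf{c}$. Then, using $\mathbf{a} \cdot \mathbf{a} = 1$,
\[
  \cos \alpha = \frac{(\mathbf{a} \times \mathbf{b}) \cdot (\mathbf{a} \times \mathbf{c})}{|\mathbf{a} \times \mathbf{b}|\,|\mathbf{a} \times \mathbf{c}|} = \frac{\mathbf{b} \cdot \mathbf{c} - (\mathbf{a} \cdot \mathbf{b})(\mathbf{a} \cdot \mathbf{c})}{|\mathbf{a} \times \mathbf{b}|\,|\mathbf{a} \times \mathbf{c}|}.
\]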
Putting in our expressions for the dot and cross products, we obtain
cos α sin δ(A, B) sin δ(A, C) = cos δ(B, C) − cos δ(A, B) cos δ(A, C).
This is the spherical cosine rule that applies when we live on the surface of a
sphere. What does this spherical geometry look like?
Consider a spherical equilateral triangle with side length $\delta$. Using the spherical cosine rule,
\[
  \cos \alpha = \frac{\cos \delta - \cos^2 \delta}{\sin^2 \delta} = 1 - \frac{1}{1 + \cos \delta}.
\]
Since $\cos \delta \leq 1$, we have $\cos \alpha \leq \frac{1}{2}$ and $\alpha \geq 60^\circ$. Equality holds iff $\delta = 0$, i.e. the triangle is simply a point. So on a sphere, each angle of an equilateral triangle is greater than $60^\circ$, and the angle sum of a triangle is greater than $180^\circ$.
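To get a feel for the numbers, the formula for the equilateral case can be evaluated directly. This is a small sketch on the unit sphere; the function name `equilateral_angle` and the restriction of the side length to 0 < δ ≤ 2π/3 (beyond which the formula gives cos α < -1) are choices made for this example.

```python
import math

def equilateral_angle(side):
    """Angle (in radians) of a spherical equilateral triangle with the given side length.

    Uses cos(alpha) = 1 - 1 / (1 + cos(side)) on the unit sphere.
    """
    if not 0 < side <= 2 * math.pi / 3:
        raise ValueError("side length must satisfy 0 < side <= 2*pi/3")
    cos_alpha = 1 - 1 / (1 + math.cos(side))
    # Clamp against rounding so acos never sees a value just outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_alpha)))

for side in (0.001, 0.5, math.pi / 2, 2.0):
    alpha = math.degrees(equilateral_angle(side))
    print(f"side = {side:.3f} rad, angle = {alpha:.4f} deg, angle sum = {3 * alpha:.4f} deg")
```

A side length of π/2 gives the triangle that covers one octant of the sphere, with three right angles, while very small triangles have angles only slightly above 60 degrees, as the flat limit suggests.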
2.9 Geometry
2.9.1 Lines
Any line through a and parallel to t can be written as
x = a + λt.
By crossing both sides of the equation with t, we have
Theorem. The equation of a straight line through a and parallel to t is
(x − a) × t = 0 or x × t = a × t.
2.9.2 Plane
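To define a plane $\Pi$, we need a normal $\mathbf{n}$ to the plane and a fixed point $\mathbf{b}$. For any $\mathbf{x} \in \Pi$, the vector $\mathbf{x} - \mathbf{b}$ is contained in the plane and is thus normal to $\mathbf{n}$, i.e. $(\mathbf{x} - \mathbf{b}) \cdot \mathbf{n} = 0$.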
Theorem. The equation of a plane through b with normal n is given by
x · n = b · n.
If $\mathbf{n} = \hat{\mathbf{n}}$ is a unit normal, then $d = \mathbf{x} \cdot \hat{\mathbf{n}} = \mathbf{b} \cdot \hat{\mathbf{n}}$ is the perpendicular distance from the origin to $\Pi$.
Alternatively, if a, b, c lie in the plane, then the equation of the plane is
(x − a) · [(b − a) × (c − a)] = 0.
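Example.
(i) Consider the intersection of the line $\mathbf{x} \times \mathbf{t} = \mathbf{a} \times \mathbf{t}$ with the plane $\mathbf{x} \cdot \mathbf{n} = \mathbf{b} \cdot \mathbf{n}$. Cross $\mathbf{n}$ on the right with the line equation to obtain
\[
  (\mathbf{x} \cdot \mathbf{n})\mathbf{t} - (\mathbf{t} \cdot \mathbf{n})\mathbf{x} = (\mathbf{a} \times \mathbf{t}) \times \mathbf{n}.
\]
Eliminate $\mathbf{x} \cdot \mathbf{n}$ using $\mathbf{x} \cdot \mathbf{n} = \mathbf{b} \cdot \mathbf{n}$:
\[
  (\mathbf{t} \cdot \mathbf{n})\mathbf{x} = (\mathbf{b} \cdot \mathbf{n})\mathbf{t} - (\mathbf{a} \times \mathbf{t}) \times \mathbf{n}.
\]
Provided $\mathbf{t} \cdot \mathbf{n}$ is nonzero, the point of intersection is
\[
  \mathbf{x} = \frac{(\mathbf{b} \cdot \mathbf{n})\mathbf{t} - (\mathbf{a} \times \mathbf{t}) \times \mathbf{n}}{\mathbf{t} \cdot \mathbf{n}}.
\]
Exercise: what if $\mathbf{t} \cdot \mathbf{n} = 0$?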
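(ii) Shortest distance between two lines. Let $L_1$ be $(\mathbf{x} - \mathbf{a}_1) \times \mathbf{t}_1 = \mathbf{0}$ and $L_2$ be $(\mathbf{x} - \mathbf{a}_2) \times \mathbf{t}_2 = \mathbf{0}$.
The distance of closest approach $s$ is along a line perpendicular to both $L_1$ and $L_2$, i.e. the line of closest approach is perpendicular to both lines and thus parallel to $\mathbf{t}_1 \times \mathbf{t}_2$. The distance $s$ can then be found by projecting $\mathbf{a}_1 - \mathbf{a}_2$ onto $\mathbf{t}_1 \times \mathbf{t}_2$. Thus
\[
  s = \left| (\mathbf{a}_1 - \mathbf{a}_2) \cdot \frac{\mathbf{t}_1 \times \mathbf{t}_2}{|\mathbf{t}_1 \times \mathbf{t}_2|} \right|.
\]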
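The two formulas in this example translate directly into code. Below is a minimal sketch in plain Python; the function names `line_plane_intersection` and `line_line_distance`, and the decision to raise an error in the degenerate cases (line parallel to the plane, or parallel lines), are assumptions of this illustration rather than anything prescribed by the notes.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def line_plane_intersection(a, t, b, n):
    """Intersect the line x = a + lambda t with the plane x.n = b.n."""
    tn = dot(t, n)
    if tn == 0:
        raise ValueError("t.n = 0: the line is parallel to the plane")
    w = cross(cross(a, t), n)          # (a x t) x n
    bn = dot(b, n)
    return tuple((bn * ti - wi) / tn for ti, wi in zip(t, w))

def line_line_distance(a1, t1, a2, t2):
    """Shortest distance between the lines through a1, a2 with directions t1, t2."""
    w = cross(t1, t2)
    norm = math.sqrt(dot(w, w))
    if norm == 0:
        raise ValueError("t1 x t2 = 0: the lines are parallel")
    diff = tuple(p - q for p, q in zip(a1, a2))
    return abs(dot(diff, w)) / norm

# A vertical line through (1, 2, 3) meets the plane z = 0 at (1, 2, 0).
print(line_plane_intersection((1, 2, 3), (0, 0, 1), (0, 0, 0), (0, 0, 1)))
# The x-axis and a line parallel to the y-axis at height 5 are a distance 5 apart.
print(line_line_distance((0, 0, 0), (1, 0, 0), (0, 0, 5), (0, 1, 0)))
```

The exact comparison `tn == 0` is only appropriate for exact inputs such as these; with measured floating-point data one would compare against a small tolerance instead.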
2.10 Vector equations
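Example. $\mathbf{x} - (\mathbf{x} \times \mathbf{a}) \times \mathbf{b} = \mathbf{c}$. Strategy: take the dot or cross of the equation with suitable vectors. The equation can be expanded to form
\[
  \mathbf{x} - (\mathbf{x} \cdot \mathbf{b})\mathbf{a} + (\mathbf{a} \cdot \mathbf{b})\mathbf{x} = \mathbf{c}.
\]
Dot this with $\mathbf{b}$ to obtain
\[
  \mathbf{x} \cdot \mathbf{b} - (\mathbf{x} \cdot \mathbf{b})(\mathbf{a} \cdot \mathbf{b}) + (\mathbf{a} \cdot \mathbf{b})(\mathbf{x} \cdot \mathbf{b}) = \mathbf{c} \cdot \mathbf{b}, \quad \text{i.e.} \quad \mathbf{x} \cdot \mathbf{b} = \mathbf{c} \cdot \mathbf{b}.
\]
Substituting this into the expanded equation, we have
\[
  \mathbf{x}(1 + \mathbf{a} \cdot \mathbf{b}) = \mathbf{c} + (\mathbf{c} \cdot \mathbf{b})\mathbf{a}.
\]
If $1 + \mathbf{a} \cdot \mathbf{b}$ is nonzero, then
\[
  \mathbf{x} = \frac{\mathbf{c} + (\mathbf{c} \cdot \mathbf{b})\mathbf{a}}{1 + \mathbf{a} \cdot \mathbf{b}}.
\]
Otherwise, when $1 + \mathbf{a} \cdot \mathbf{b} = 0$, if $\mathbf{c} + (\mathbf{c} \cdot \mathbf{b})\mathbf{a} \neq \mathbf{0}$, then a contradiction is reached. Otherwise, $\mathbf{x} \cdot \mathbf{b} = \mathbf{c} \cdot \mathbf{b}$ is the most general solution, which is a plane of solutions.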
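The closed-form solution can be checked by substituting it back into the equation. The sketch below does this in exact rational arithmetic for one arbitrarily chosen set of vectors; `solve_vector_equation` is a hypothetical helper that only handles the generic case 1 + a · b ≠ 0.

```python
from fractions import Fraction

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def solve_vector_equation(a, b, c):
    """Solve x - (x x a) x b = c for x, assuming 1 + a.b is nonzero."""
    denom = 1 + dot(a, b)
    if denom == 0:
        raise ValueError("1 + a.b = 0: either no solution or a plane of solutions")
    cb = dot(c, b)
    return tuple(Fraction(ci + cb * ai) / denom for ci, ai in zip(c, a))

# Arbitrary integer data (an assumption of this example).
a, b, c = (1, 0, 2), (0, 1, 1), (3, -1, 4)
x = solve_vector_equation(a, b, c)

# Substitute back: x - (x x a) x b should reproduce c.
lhs = tuple(xi - wi for xi, wi in zip(x, cross(cross(x, a), b)))
print(x)
print(lhs)
```

Using `Fraction` avoids rounding error, so the substituted left-hand side can be compared with `c` exactly.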
3 Linear maps
A linear map is a special type of function between vector spaces. In fact, most
of the time, these are the only functions we actually care about. They are maps
that satisfy the property f(λa + µb) = λf(a) + µf(b).
We will first look at two important examples of linear maps — rotations and
reflections, and then study their properties formally.
3.1 Examples
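3.1.1 Rotation in $\mathbb{R}^3$
In $\mathbb{R}^3$, first consider the simple case where we rotate about the $z$ axis by $\theta$. We call this rotation $R$ and write $\mathbf{x}' = R(\mathbf{x})$.
Suppose that initially, $\mathbf{x} = (x, y, z) = (r \cos \phi, r \sin \phi, z)$. Then after a rotation by $\theta$, we get
\[
\begin{aligned}
  \mathbf{x}' &= (r \cos(\phi + \theta), r \sin(\phi + \theta), z) \\
  &= (r \cos \phi \cos \theta - r \sin \phi \sin \theta, r \sin \phi \cos \theta + r \cos \phi \sin \theta, z) \\
  &= (x \cos \theta - y \sin \theta, x \sin \theta + y \cos \theta, z).
\end{aligned}
\]
We can represent this by a matrix $R$ such that $x'_i = R_{ij} x_j$. Using our formula above, we obtain
\[
  R = \begin{pmatrix}
    \cos \theta & -\sin \theta & 0 \\
    \sin \theta & \cos \theta & 0 \\
    0 & 0 & 1
  \end{pmatrix}.
\]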
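Now consider the general case where we rotate by $\theta$ about $\hat{\mathbf{n}}$.
[Figure: the vector x = OA is rotated by θ about the axis n̂ through O to x′ = OA′; B is the projection of A onto the axis and C lies on BA. A second panel shows the points B, A, A′ and C with the angle θ.]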
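We have $\mathbf{x}' = \overrightarrow{OB} + \overrightarrow{BC} + \overrightarrow{CA'}$. We know that
\[
\begin{aligned}
  \overrightarrow{OB} &= (\hat{\mathbf{n}} \cdot \mathbf{x})\hat{\mathbf{n}} \\
  \overrightarrow{BC} &= \overrightarrow{BA} \cos \theta \\
  &= (\overrightarrow{BO} + \overrightarrow{OA}) \cos \theta \\
  &= (-(\hat{\mathbf{n}} \cdot \mathbf{x})\hat{\mathbf{n}} + \mathbf{x}) \cos \theta
\end{aligned}
\]
Finally, to get $\overrightarrow{CA'}$, we know that
\[
  |\overrightarrow{CA'}| = |\overrightarrow{BA'}| \sin \theta = |\overrightarrow{BA}| \sin \theta = |\hat{\mathbf{n}} \times \mathbf{x}| \sin \theta.
\]
Also, $\overrightarrow{CA'}$ is parallel to $\hat{\mathbf{n}} \times \mathbf{x}$. So we must have $\overrightarrow{CA'} = (\hat{\mathbf{n}} \times \mathbf{x}) \sin \theta$.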
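Thus $\mathbf{x}' = \mathbf{x} \cos \theta + (1 - \cos \theta)(\hat{\mathbf{n}} \cdot \mathbf{x})\hat{\mathbf{n}} + \hat{\mathbf{n}} \times \mathbf{x} \sin \theta$. In components,
\[
  x'_i = x_i \cos \theta + (1 - \cos \theta) n_j x_j n_i - \varepsilon_{ijk} x_j n_k \sin \theta.
\]
We want to find an $R$ such that $x'_i = R_{ij} x_j$. So
\[
  R_{ij} = \delta_{ij} \cos \theta + (1 - \cos \theta) n_i n_j - \varepsilon_{ijk} n_k \sin \theta.
\]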
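The component formula for the rotation matrix can be coded almost verbatim and compared with the rotation about the z axis found earlier. This is a minimal sketch using 0-based suffixes; the names `rotation_matrix`, `apply` and `eps` are hypothetical helpers for this illustration, and the axis passed in is assumed to be a unit vector.

```python
import math

def eps(i, j, k):
    """Levi-Civita symbol; the formula works for suffixes 0, 1, 2 as well as 1, 2, 3."""
    return (j - i) * (k - i) * (k - j) // 2

def rotation_matrix(n, theta):
    """R_ij = delta_ij cos(theta) + (1 - cos(theta)) n_i n_j - eps_ijk n_k sin(theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[(c if i == j else 0.0)
             + (1 - c) * n[i] * n[j]
             - s * sum(eps(i, j, k) * n[k] for k in range(3))
             for j in range(3)]
            for i in range(3)]

def apply(R, x):
    """Compute x'_i = R_ij x_j."""
    return [sum(R[i][j] * x[j] for j in range(3)) for i in range(3)]

# Rotation about the z axis: should match the explicit matrix derived above.
theta = 0.3
Rz = rotation_matrix((0.0, 0.0, 1.0), theta)
for row in Rz:
    print([round(entry, 6) for entry in row])

# Rotation by 120 degrees about (1, 1, 1)/sqrt(3) cycles the coordinate axes.
n = tuple(1 / math.sqrt(3) for _ in range(3))
image = apply(rotation_matrix(n, 2 * math.pi / 3), (1.0, 0.0, 0.0))
print([round(component, 6) for component in image])
```

The second check is a useful test of the sign of the epsilon term: with the opposite sign, the x axis would be sent to the z axis instead of the y axis.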
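3.1.2 Reflection in $\mathbb{R}^3$
Suppose we want to reflect through a plane through $O$ with normal $\hat{\mathbf{n}}$. First of all the projection of $\mathbf{x}$ onto $\hat{\mathbf{n}}$ is given by $(\mathbf{x} \cdot \hat{\mathbf{n}})\hat{\mathbf{n}}$. So we get $\mathbf{x}' = \mathbf{x} - 2(\mathbf{x} \cdot \hat{\mathbf{n}})\hat{\mathbf{n}}$. In suffix notation, we have $x'_i = x_i - 2 x_j n_j n_i$. So our reflection matrix is $R_{ij} = \delta_{ij} - 2 n_i n_j$.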
[Figure: a vector x and its reflection x′ in the plane with normal n̂.]
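Two properties one expects of a reflection are that applying it twice gives the identity and that it reverses the normal while fixing vectors in the plane. The sketch below checks these for one unit normal with rational components; the helper names and the particular normal (3/5, 4/5, 0) are assumptions made for the example.

```python
from fractions import Fraction

def reflection_matrix(n):
    """R_ij = delta_ij - 2 n_i n_j for a unit normal n."""
    return [[(1 if i == j else 0) - 2 * n[i] * n[j] for j in range(3)]
            for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(R, x):
    return [sum(R[i][j] * x[j] for j in range(3)) for i in range(3)]

# A unit normal with exact rational components: (3/5)^2 + (4/5)^2 = 1.
n = (Fraction(3, 5), Fraction(4, 5), Fraction(0))
H = reflection_matrix(n)

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matmul(H, H) == identity)          # reflecting twice does nothing
print(apply(H, n) == [-c for c in n])    # the normal is reversed
print(apply(H, (4, -3, 0)) == [4, -3, 0])  # a vector in the plane is fixed
```

The formula relies on the normal having unit length; with an unnormalised normal the matrix is no longer a reflection and the first check fails.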
3.2 Linear Maps
Definition (Domain, codomain and image of map). Consider sets $A$ and $B$ and mapping $T : A \to B$ such that each $x \in A$ is mapped into a unique $x' = T(x) \in B$. $A$ is the domain of $T$ and $B$ is the codomain of $T$. Typically, we have $T : \mathbb{R}^n \to \mathbb{R}^m$ or $T : \mathbb{C}^n \to \mathbb{C}^m$.
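Definition (Linear map). Let $V, W$ be real (or complex) vector spaces, and $T : V \to W$. Then $T$ is a linear map if
(i) $T(\mathbf{a} + \mathbf{b}) = T(\mathbf{a}) + T(\mathbf{b})$ for all $\mathbf{a}, \mathbf{b} \in V$.
(ii) $T(\lambda \mathbf{a}) = \lambda T(\mathbf{a})$ for all $\mathbf{a} \in V$ and all $\lambda \in \mathbb{R}$ (or $\mathbb{C}$).
Equivalently, we have $T(\lambda \mathbf{a} + \mu \mathbf{b}) = \lambda T(\mathbf{a}) + \mu T(\mathbf{b})$.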
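Example.
(i) Consider a translation $T : \mathbb{R}^3 \to \mathbb{R}^3$ with $T(\mathbf{x}) = \mathbf{x} + \mathbf{a}$ for some fixed, given $\mathbf{a} \neq \mathbf{0}$. This is not a linear map since $T(\lambda \mathbf{x} + \mu \mathbf{y}) = \lambda \mathbf{x} + \mu \mathbf{y} + \mathbf{a}$, which in general differs from $\lambda T(\mathbf{x}) + \mu T(\mathbf{y}) = \lambda \mathbf{x} + \mu \mathbf{y} + (\lambda + \mu)\mathbf{a}$.
(ii) Rotation, reflection and projection are linear transformations.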
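The failure of linearity for a translation shows up with almost any choice of numbers. Here is a small sketch; the shift (1, 0, 0), the test vectors and the helper name `respects_combination` are arbitrary choices for illustration. Note that a check like this can only refute linearity for the inputs tried, not prove it in general.

```python
def translate(x, a=(1, 0, 0)):
    """Translation T(x) = x + a by a fixed nonzero vector a."""
    return tuple(xi + ai for xi, ai in zip(x, a))

def respects_combination(f, x, y, lam, mu):
    """Compare f(lam x + mu y) with lam f(x) + mu f(y) for one choice of inputs."""
    combination = tuple(lam * xi + mu * yi for xi, yi in zip(x, y))
    lhs = f(combination)
    rhs = tuple(lam * fx + mu * fy for fx, fy in zip(f(x), f(y)))
    return lhs == rhs

x, y = (1, 2, 3), (4, 5, 6)
print(respects_combination(translate, x, y, 2, 3))    # lam + mu = 5
print(respects_combination(translate, x, y, 2, -1))   # lam + mu = 1
```

The two sides differ by (λ + µ − 1)a, so the check fails for λ = 2, µ = 3 but happens to pass when λ + µ = 1, which is why a single passing test is not evidence of linearity.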
Definition (Image and kernel of map). The image of a map $f : U \to V$ is the subset $\{f(\mathbf{u}) : \mathbf{u} \in U\}$ of $V$. The kernel is the subset $\{\mathbf{u} \in U : f(\mathbf{u}) = \mathbf{0}\}$ of $U$.
Example.
(i) Consider $S : \mathbb{R}^3 \to \mathbb{R}^2$ with $S(x, y, z) = (x + y, 2x - z)$. Simple yet tedious algebra shows that this is linear. Now consider the effect of $S$ on the standard basis. $S(1, 0, 0) = (1, 2)$, $S(0, 1, 0) = (1, 0)$ and $S(0, 0, 1) = (0, -1)$. Clearly these are linearly dependent, but they do span the whole of $\mathbb{R}^2$. We can say $S(\mathbb{R}^3) = \mathbb{R}^2$. So the image is $\mathbb{R}^2$.
Now solve $S(x, y, z) = \mathbf{0}$. We need $x + y = 0$ and $2x - z = 0$. Thus $\mathbf{x} = (x, -x, 2x)$, i.e. it is parallel to $(1, -1, 2)$. So the set $\{\lambda(1, -1, 2) : \lambda \in \mathbb{R}\}$ is the kernel of $S$.
(ii) Consider a rotation in $\mathbb{R}^3$. The kernel is $\{\mathbf{0}\}$ and the image is $\mathbb{R}^3$.
(iii) Consider a projection of $\mathbf{x}$ onto a plane with normal $\hat{\mathbf{n}}$. The image is the plane itself, and the kernel is the set of all vectors parallel to $\hat{\mathbf{n}}$.
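The claims about the map S in the first example are simple to confirm by direct evaluation. The sketch below is illustrative only; `det2` is a hypothetical helper used to test whether two of the basis images are linearly independent, which is enough for them to span the plane.

```python
def S(v):
    """The map S(x, y, z) = (x + y, 2x - z) from the example."""
    x, y, z = v
    return (x + y, 2 * x - z)

def det2(u, v):
    """2 x 2 determinant with rows u and v; nonzero iff u, v are independent."""
    return u[0] * v[1] - u[1] * v[0]

# Images of the standard basis vectors.
basis_images = [S(e) for e in ((1, 0, 0), (0, 1, 0), (0, 0, 1))]
print(basis_images)

# Two independent images already span the whole plane.
spans_plane = det2(basis_images[0], basis_images[1]) != 0
print(spans_plane)

# Every multiple of (1, -1, 2) is sent to zero.
kernel_direction = (1, -1, 2)
print([S(tuple(lam * c for c in kernel_direction)) for lam in (-2, 1, 3)])
```

Since the image is two-dimensional and the kernel is a line, the dimensions add up to 3, the dimension of the domain.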
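Theorem. Consider a linear map $f : U \to V$, where $U, V$ are vector spaces. Then $\operatorname{im}(f)$ is a subspace of $V$, and $\ker(f)$ is a subspace of $U$.
Proof. Both are nonempty since $f(\mathbf{0}) = \mathbf{0}$.
If $\mathbf{x}, \mathbf{y} \in \operatorname{im}(f)$, then $\exists\, \mathbf{a}, \mathbf{b} \in U$ such that $\mathbf{x} = f(\mathbf{a})$, $\mathbf{y} = f(\mathbf{b})$. Then $\lambda \mathbf{x} + \mu \mathbf{y} = \lambda f(\mathbf{a}) + \mu f(\mathbf{b}) = f(\lambda \mathbf{a} + \mu \mathbf{b})$. Now $\lambda \mathbf{a} + \mu \mathbf{b} \in U$ since $U$ is a vector space, so there is an element in $U$ that maps to $\lambda \mathbf{x} + \mu \mathbf{y}$. So $\lambda \mathbf{x} + \mu \mathbf{y} \in \operatorname{im}(f)$ and $\operatorname{im}(f)$ is a subspace of $V$.
Suppose $\mathbf{x}, \mathbf{y} \in \ker(f)$, i.e. $f(\mathbf{x}) = f(\mathbf{y}) = \mathbf{0}$. Then $f(\lambda \mathbf{x} + \mu \mathbf{y}) = \lambda f(\mathbf{x}) + \mu f(\mathbf{y}) = \lambda \mathbf{0} + \mu \mathbf{0} = \mathbf{0}$. Therefore $\lambda \mathbf{x} + \mu \mathbf{y} \in \ker(f)$.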
3.3 Rank and nullity
Definition (Rank of linear map). The rank of a linear map $f : U \to V$, denoted by $r(f)$, is the dimension of the image of $f$.
Definition (Nullity of linear map). The nullity of $f$, denoted $n(f)$, is the dimension of the kernel of $f$.
Example. For the projection onto a plane in $\mathbb{R}^3$, the image is the whole plane and the rank is 2. The kernel is a line so the nullity is 1.
Theorem (Rank-nullity theorem). For a linear map $f : U \to V$,
\[
  r(f) + n(f) = \dim(U).
\]
Proof. (Non-examinable) Write $\dim(U) = n$ and $n(f) = m$. If $m = n$, then $f$ is the zero map, and the proof is trivial, since $r(f) = 0$. Otherwise, assume $m < n$.
Suppose $\{\mathbf{e}_1, \mathbf{e}_2, \cdots, \mathbf{e}_m\}$ is a basis of $\ker f$. Extend this to a basis of the whole of $U$ to get $\{\mathbf{e}_1, \mathbf{e}_2, \cdots, \mathbf{e}_m, \mathbf{e}_{m+1}, \cdots, \mathbf{e}_n\}$. To prove the theorem, we need to prove that $\{f(\mathbf{e}_{m+1}), f(\mathbf{e}_{m+2}), \cdots, f(\mathbf{e}_n)\}$ is a basis of $\operatorname{im}(f)$.
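(i) First show that it spans $\operatorname{im}(f)$. Take $\mathbf{y} \in \operatorname{im}(f)$. Thus $\exists\, \mathbf{x} \in U$ such that $\mathbf{y} = f(\mathbf{x})$. Then
\[
  \mathbf{y} = f(\alpha_1 \mathbf{e}_1 + \alpha_2 \mathbf{e}_2 + \cdots + \alpha_n \mathbf{e}_n),
\]
since $\mathbf{e}_1, \cdots, \mathbf{e}_n$ is a basis of $U$. Thus
\[
  \mathbf{y} = \alpha_1 f(\mathbf{e}_1) + \alpha_2 f(\mathbf{e}_2) + \cdots + \alpha_m f(\mathbf{e}_m) + \alpha_{m+1} f(\mathbf{e}_{m+1}) + \cdots + \alpha_n f(\mathbf{e}_n).
\]
The first $m$ terms map to $\mathbf{0}$, since $\mathbf{e}_1, \cdots, \mathbf{e}_m$ is the basis of the kernel of $f$. Thus
\[
  \mathbf{y} = \alpha_{m+1} f(\mathbf{e}_{m+1}) + \cdots + \alpha_n f(\mathbf{e}_n).
\]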
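(ii) To show that they are linearly independent, suppose
\[
  \alpha_{m+1} f(\mathbf{e}_{m+1}) + \cdots + \alpha_n f(\mathbf{e}_n) = \mathbf{0}.
\]
Then
\[
  f(\alpha_{m+1} \mathbf{e}_{m+1} + \cdots + \alpha_n \mathbf{e}_n) = \mathbf{0}.
\]
Thus $\alpha_{m+1} \mathbf{e}_{m+1} + \cdots + \alpha_n \mathbf{e}_n \in \ker(f)$. Since $\{\mathbf{e}_1, \cdots, \mathbf{e}_m\}$ span $\ker(f)$, there exist some $\alpha_1, \alpha_2, \cdots, \alpha_m$ such that
\[
  \alpha_{m+1} \mathbf{e}_{m+1} + \cdots + \alpha_n \mathbf{e}_n = \alpha_1 \mathbf{e}_1 + \cdots + \alpha_m \mathbf{e}_m.
\]
But $\mathbf{e}_1, \cdots, \mathbf{e}_n$ is a basis of $U$, so these vectors are linearly independent. So $\alpha_i = 0$ for all $i$. Then the only solution to the equation $\alpha_{m+1} f(\mathbf{e}_{m+1}) + \cdots + \alpha_n f(\mathbf{e}_n) = \mathbf{0}$ is $\alpha_i = 0$, and they are linearly independent by definition.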
Example. Calculate the kernel and image of $f : \mathbb{R}^3 \to \mathbb{R}^3$, defined by
\[
  f(x, y, z) = (x + y + z, 2x - y + 5z, x + 2z).
\]
First find the kernel: we've got the system of equations:
\[
\begin{aligned}
  x + y + z &= 0 \\
  2x - y + 5z &= 0 \\
  x + 2z &= 0
\end{aligned}
\]
Note that the first and second equations add to give $3x + 6z = 0$, which is equivalent to the third. Then using the first and third equations, we have $y = -x - z = z$. So the kernel is any vector of the form $(-2z, z, z)$ and is the span of $(-2, 1, 1)$.
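The kernel computation can be cross-checked against the rank-nullity theorem. The sketch below writes the map as a matrix and finds its rank by Gaussian elimination in exact rational arithmetic; the function `rank` is a hypothetical helper written for this example, not a library routine.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(entry) for entry in row] for row in rows]
    r = 0                                   # number of pivots found so far
    for col in range(len(m[0])):
        # Look for a nonzero entry in this column at or below row r.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Clear the entries below the pivot.
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

# Matrix of f(x, y, z) = (x + y + z, 2x - y + 5z, x + 2z).
A = [[1, 1, 1],
     [2, -1, 5],
     [1, 0, 2]]

def f(v):
    return tuple(sum(a * x for a, x in zip(row, v)) for row in A)

r = rank(A)
nullity = len(A[0]) - r                     # rank-nullity: r(f) + n(f) = 3
print(r, nullity)
print(f((-2, 1, 1)))
```

The rank comes out as 2, so the nullity is 1, which agrees with the kernel being the line spanned by (−2, 1, 1).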
To find the image, extend the basis of $\ker(f)$ to a basis of the whole of $\mathbb{R}^3$:
{
(
−
2
,
1
,
1)
,
(0
,
1
,