IA Vectors and Matrices (Full)

Part IA — Vectors and Matrices

Based on lectures by N. Peake

Notes taken by Dexter Chua

Michaelmas 2014

These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures. They are nowhere near accurate representations of what was actually lectured, and in particular, all errors are almost surely mine.

Complex numbers

Review of complex numbers, including complex conjugate, inverse, modulus, argument and Argand diagram. Informal treatment of complex logarithm, $n$-th roots and complex powers. de Moivre’s theorem. [2]

Vectors

Review of elementary algebra of vectors in $\mathbb{R}^3$, including scalar product. Brief discussion of vectors in $\mathbb{R}^n$ and $\mathbb{C}^n$; scalar product and the Cauchy-Schwarz inequality. Concepts of linear span, linear independence, subspaces, basis and dimension.

Suffix notation: including summation convention, $\delta_{ij}$ and $\varepsilon_{ijk}$. Vector product and triple product: definition and geometrical interpretation. Solution of linear vector equations. Applications of vectors to geometry, including equations of lines, planes and spheres. [5]

Matrices

Elementary algebra of $3 \times 3$ matrices, including determinants. Extension to $n \times n$ complex matrices. Trace, determinant, non-singular matrices and inverses. Matrices as linear transformations; examples of geometrical actions including rotations, reflections, dilations, shears; kernel and image. [4]

Simultaneous linear equations: matrix formulation; existence and uniqueness of solutions, geometric interpretation; Gaussian elimination. [3]

Symmetric, anti-symmetric, orthogonal, hermitian and unitary matrices. Decomposition of a general matrix into isotropic, symmetric trace-free and antisymmetric parts. [1]

Eigenvalues and Eigenvectors

Eigenvalues and eigenvectors; geometric significance. [2]

Proof that eigenvalues of hermitian matrix are real, and that distinct eigenvalues give an orthogonal basis of eigenvectors. The effect of a general change of basis (similarity transformations). Diagonalization of general matrices: sufficient conditions; examples of matrices that cannot be diagonalized. Canonical forms for 2 × 2 matrices. [5]

Discussion of quadratic forms, including change of basis. Classification of conics, cartesian and polar forms. [1]

Rotation matrices and Lorentz transformations as transformation groups. [1]

Contents

0 Introduction

1 Complex numbers

1.1 Basic properties

1.2 Complex exponential function

1.3 Roots of unity

1.4 Complex logarithm and power

1.5 De Moivre’s theorem

1.6 Lines and circles in C

2 Vectors

2.1 Definition and basic properties

2.2 Scalar product

2.2.1 Geometric picture ($\mathbb{R}^2$ and $\mathbb{R}^3$ only)

2.2.2 General algebraic definition

2.3 Cauchy-Schwarz inequality

2.4 Vector product

2.5 Scalar triple product

2.6 Spanning sets and bases

2.6.1 2D space

2.6.2 3D space

2.6.3 $\mathbb{R}^n$ space

2.6.4 $\mathbb{C}^n$ space

2.7 Vector subspaces

2.8 Suffix notation

2.9 Geometry

2.9.1 Lines

2.9.2 Plane

2.10 Vector equations

3 Linear maps

3.1 Examples

3.1.1 Rotation in $\mathbb{R}^3$

3.1.2 Reflection in $\mathbb{R}^3$

3.2 Linear Maps

3.3 Rank and nullity

3.4 Matrices

3.4.1 Examples

3.4.2 Matrix Algebra

3.4.3 Decomposition of an n × n matrix

3.4.4 Matrix inverse

3.5 Determinants

3.5.1 Permutations

3.5.2 Properties of determinants

3.5.3 Minors and Cofactors

4 Matrices and linear equations

4.1 Simple example, 2 × 2

4.2 Inverse of an n × n matrix

4.3 Homogeneous and inhomogeneous equations

4.3.1 Gaussian elimination

4.4 Matrix rank

4.5 Homogeneous problem Ax = 0

4.5.1 Geometrical interpretation

4.5.2 Linear mapping view of Ax = 0

4.6 General solution of Ax = d

5 Eigenvalues and eigenvectors

5.1 Preliminaries and definitions

5.2 Linearly independent eigenvectors

5.3 Transformation matrices

5.3.1 Transformation law for vectors

5.3.2 Transformation law for matrix

5.4 Similar matrices

5.5 Diagonalizable matrices

5.6 Canonical (Jordan normal) form

5.7 Cayley-Hamilton Theorem

5.8 Eigenvalues and eigenvectors of a Hermitian matrix

5.8.1 Eigenvalues and eigenvectors

5.8.2 Gram-Schmidt orthogonalization (non-examinable)

5.8.3 Unitary transformation

5.8.4 Diagonalization of n × n Hermitian matrices

5.8.5 Normal matrices

6 Quadratic forms and conics

6.1 Quadrics and conics

6.1.1 Quadrics

6.1.2 Conic sections (n = 2)

6.2 Focus-directrix property

7 Transformation groups

7.1 Groups of orthogonal matrices

7.2 Length preserving matrices

7.3 Lorentz transformations

0 Introduction

Vectors and matrices are the language in which a lot of mathematics is written. In physics, many variables such as position and momentum are expressed as vectors. Heisenberg also formulated quantum mechanics in terms of vectors and matrices. In statistics, one might pack all the results of all experiments into a single vector, and work with a large vector instead of many small quantities. In group theory, matrices are used to represent the symmetries of space (as well as many other groups).

So what is a vector? Vectors are very general objects, and can in theory represent very complex objects. However, in this course, our focus is on vectors in $\mathbb{R}^n$ or $\mathbb{C}^n$. We can think of each of these as an array of $n$ real or complex numbers. For example, $(1, 6, 4)$ is a vector in $\mathbb{R}^3$. These vectors are added in the obvious way. For example, $(1, 6, 4) + (3, 5, 2) = (4, 11, 6)$. We can also multiply vectors by numbers, say $2(1, 6, 4) = (2, 12, 8)$. Often, these vectors represent points in an $n$-dimensional space.

Matrices, on the other hand, represent functions between vectors, i.e. a function that takes in a vector and outputs another vector. These, however, are not arbitrary functions. Instead, matrices represent linear functions. These are functions that satisfy the equality $f(\lambda x + \mu y) = \lambda f(x) + \mu f(y)$ for arbitrary numbers $\lambda, \mu$ and vectors $x, y$. It is important to note that the function $x \mapsto x + c$ for some non-zero constant vector $c$ is not linear according to this definition, even though it might look linear.

It turns out that for each linear function from $\mathbb{R}^n$ to $\mathbb{R}^m$, we can represent the function uniquely by an $m \times n$ array of numbers, which is what we call the matrix. Expressing a linear function as a matrix allows us to conveniently study many of its properties, which is why we usually talk about matrices instead of the function itself.

1 Complex numbers

In $\mathbb{R}$, not every polynomial equation has a solution. For example, there does not exist any $x$ such that $x^2 + 1 = 0$, since for any $x$, $x^2$ is non-negative, and $x^2 + 1$ can never be $0$. To solve this problem, we introduce the “number” $i$ that satisfies $i^2 = -1$. Then $i$ is a solution to the equation $x^2 + 1 = 0$. Similarly, $-i$ is also a solution to the equation.

We can add and multiply numbers with $i$. For example, we can obtain numbers $3 + i$ or $1 + 3i$. These numbers are known as complex numbers. It turns out that by adding this single number $i$, every non-constant polynomial equation will have a root. In fact, for an $n$th order polynomial equation, we will later see that there will always be $n$ roots, if we account for multiplicity. We will go into details in Chapter 5.

Apart from solving equations, complex numbers have a lot of rather important applications. For example, they are used in electronics to represent alternating currents, and form an integral part of the formulation of quantum mechanics.

1.1 Basic properties

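Definition (Complex number). A complex number is a number $z \in \mathbb{C}$ of the form $z = a + ib$ with $a, b \in \mathbb{R}$, where $i^2 = -1$. We write $a = \operatorname{Re}(z)$ and $b = \operatorname{Im}(z)$.

We have
$$z_1 \pm z_2 = (a_1 + ib_1) \pm (a_2 + ib_2) = (a_1 \pm a_2) + i(b_1 \pm b_2)$$
$$z_1 z_2 = (a_1 + ib_1)(a_2 + ib_2) = (a_1 a_2 - b_1 b_2) + i(b_1 a_2 + a_1 b_2)$$
$$z^{-1} = \frac{1}{a + ib} = \frac{a - ib}{a^2 + b^2} \qquad (z \neq 0)$$

Definition (Complex conjugate). The complex conjugate of $z = a + ib$ is $a - ib$. It is written as $\bar z$ or $z^*$.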

It is often helpful to visualize complex numbers in a diagram:

Definition (Argand diagram). An Argand diagram is a diagram in which a complex number $z = x + iy$ is represented by a vector $p = (x, y)$. Addition of complex numbers corresponds to vector addition, and $\bar z$ is the reflection of $z$ in the $x$-axis.

Definition (Modulus and argument of complex number). The modulus of $z = x + iy$ is $r = |z| = \sqrt{x^2 + y^2}$. The argument is $\theta = \arg z = \tan^{-1}(y/x)$, with the quadrant of $\theta$ chosen according to the signs of $x$ and $y$. The modulus is the length of the vector in the Argand diagram, and the argument is the angle between $z$ and the real axis. We have
$$z = r(\cos\theta + i\sin\theta).$$
Clearly the pair $(r, \theta)$ uniquely describes a complex number $z$, but each complex number $z \in \mathbb{C}$ can be described by many different $\theta$ since $\sin(2\pi + \theta) = \sin\theta$ and $\cos(2\pi + \theta) = \cos\theta$. Often we take the principal value $\theta \in (-\pi, \pi]$.

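When writing $z_1 = r_1(\cos\theta_1 + i\sin\theta_1)$ and $z_2 = r_2(\cos\theta_2 + i\sin\theta_2)$, we have
$$\begin{aligned} z_1 z_2 &= r_1 r_2[(\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2) + i(\sin\theta_1\cos\theta_2 + \sin\theta_2\cos\theta_1)] \\ &= r_1 r_2[\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)] \end{aligned}$$
In other words, when multiplying complex numbers, the moduli multiply and the arguments add.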
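As a quick numerical illustration (not a proof), Python's standard `cmath` module can confirm that moduli multiply and arguments add. The two sample numbers are arbitrary choices; note that `cmath.polar` reports the argument in $[-\pi, \pi]$, so a sum of arguments beyond $\pi$ would come back shifted by $2\pi$.

```python
import cmath
import math

# Two arbitrary sample numbers built from polar data (r, theta).
z1 = cmath.rect(2.0, 0.5)  # r1 = 2, theta1 = 0.5
z2 = cmath.rect(3.0, 1.2)  # r2 = 3, theta2 = 1.2

# cmath.polar returns (modulus, argument) with the argument in [-pi, pi].
r, theta = cmath.polar(z1 * z2)

print(math.isclose(r, 2.0 * 3.0))      # True: moduli multiply
print(math.isclose(theta, 0.5 + 1.2))  # True: arguments add (1.7 lies in (-pi, pi])
```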

Proposition. $z\bar z = a^2 + b^2 = |z|^2$.

Proposition. $z^{-1} = \bar z/|z|^2$ for $z \neq 0$.

Theorem (Triangle inequality). For all $z_1, z_2 \in \mathbb{C}$, we have
$$|z_1 + z_2| \le |z_1| + |z_2|.$$
Alternatively, we have $|z_1 - z_2| \ge \big||z_1| - |z_2|\big|$.
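The propositions above can be spot-checked with Python's built-in `complex` type. This is a sketch under the assumption that a fixed tolerance of `1e-12` is adequate for the sample inputs; `check_basic_properties` is a hypothetical helper name.

```python
import math


def check_basic_properties(z1: complex, z2: complex) -> bool:
    """Return True if the identities of Section 1.1 hold (up to rounding); needs z1 != 0."""
    zz = z1 * z1.conjugate()
    # z * conj(z) = |z|^2, with vanishing imaginary part
    ok = math.isclose(zz.real, abs(z1) ** 2) and abs(zz.imag) < 1e-12
    # z^{-1} = conj(z) / |z|^2
    ok = ok and abs(1 / z1 - z1.conjugate() / abs(z1) ** 2) < 1e-12
    # triangle inequality and its reverse form
    ok = ok and abs(z1 + z2) <= abs(z1) + abs(z2)
    ok = ok and abs(z1 - z2) >= abs(abs(z1) - abs(z2))
    return ok


print(check_basic_properties(3 + 4j, -1 + 2j))  # True
```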

1.2 Complex exponential function

Exponentiation was originally defined for integer powers as repeated multiplication. This is then extended to rational powers using roots. We can also extend this to any real number since real numbers can be approximated arbitrarily accurately by rational numbers. However, what does it mean to take an exponent of a complex number?

To do so, we use the Taylor series definition of the exponential function:

Definition (Exponential function). The exponential function is defined as
$$\exp(z) = e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \cdots = \sum_{n=0}^{\infty}\frac{z^n}{n!}.$$
This automatically allows taking exponents of arbitrary complex numbers.

Having defined exponentiation this way, we want to check that it satisfies the usual properties, such as $\exp(z_1 + z_2) = \exp(z_1)\exp(z_2)$. To prove this, we will first need a helpful lemma.

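Lemma.
$$\sum_{n=0}^{\infty}\sum_{m=0}^{\infty} a_{mn} = \sum_{r=0}^{\infty}\sum_{m=0}^{r} a_{r-m,m}$$

Proof.
$$\begin{aligned}\sum_{n=0}^{\infty}\sum_{m=0}^{\infty} a_{mn} &= a_{00} + a_{01} + a_{02} + \cdots \\ &\quad + a_{10} + a_{11} + a_{12} + \cdots \\ &\quad + a_{20} + a_{21} + a_{22} + \cdots \\ &= (a_{00}) + (a_{10} + a_{01}) + (a_{20} + a_{11} + a_{02}) + \cdots \\ &= \sum_{r=0}^{\infty}\sum_{m=0}^{r} a_{r-m,m}\end{aligned}$$
Each bracket collects the terms whose two indices sum to $r$.

This is not exactly a rigorous proof, since we should not hand-wave about infinite sums so casually. But in fact, we did not even show that the definition of $\exp(z)$ is well defined for all numbers $z$, since the sum might diverge. All these will be done in the IA Analysis I course.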

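Theorem. $\exp(z_1)\exp(z_2) = \exp(z_1 + z_2)$.

Proof.
$$\begin{aligned}\exp(z_1)\exp(z_2) &= \sum_{n=0}^{\infty}\sum_{m=0}^{\infty}\frac{z_1^m}{m!}\frac{z_2^n}{n!} \\ &= \sum_{r=0}^{\infty}\sum_{m=0}^{r}\frac{z_1^{r-m}}{(r-m)!}\frac{z_2^m}{m!} \\ &= \sum_{r=0}^{\infty}\frac{1}{r!}\sum_{m=0}^{r}\frac{r!}{(r-m)!\,m!}z_1^{r-m}z_2^m \\ &= \sum_{r=0}^{\infty}\frac{(z_1 + z_2)^r}{r!} = \exp(z_1 + z_2),\end{aligned}$$
where the second line uses the lemma and the last line uses the binomial theorem.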
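The series definition and the addition law can be checked numerically by summing the series directly. A minimal sketch, assuming 60 terms is enough for the sample points (true for $|z|$ of order 1; much larger $|z|$ would need more terms); `exp_series` is a hypothetical helper name.

```python
import cmath


def exp_series(z: complex, terms: int = 60) -> complex:
    """Partial sum of sum_{n>=0} z**n / n!, built term by term."""
    total, term = 0j, 1 + 0j  # term holds z**n / n! for the current n
    for n in range(terms):
        total += term
        term *= z / (n + 1)
    return total


z1, z2 = 0.3 + 1.1j, -0.7 + 0.4j  # arbitrary sample points
lhs = exp_series(z1) * exp_series(z2)
rhs = exp_series(z1 + z2)
print(abs(lhs - rhs) < 1e-12)                 # True: exp(z1) exp(z2) = exp(z1 + z2)
print(abs(rhs - cmath.exp(z1 + z2)) < 1e-12)  # True: agrees with the library exp
```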

To define the sine and cosine functions, instead of referring to “angles” (since it doesn’t make much sense to refer to complex “angles”), we again use a series definition.

Definition (Sine and cosine functions). Define, for all $z \in \mathbb{C}$,
$$\sin z = \sum_{n=0}^{\infty}\frac{(-1)^n}{(2n+1)!}z^{2n+1} = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots$$
$$\cos z = \sum_{n=0}^{\infty}\frac{(-1)^n}{(2n)!}z^{2n} = 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \cdots$$

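One very important result is the relationship between exp, sin and cos.

Theorem. $e^{iz} = \cos z + i\sin z$.

Alternatively, since $\sin(-z) = -\sin z$ and $\cos(-z) = \cos z$, we have
$$\cos z = \frac{e^{iz} + e^{-iz}}{2}, \qquad \sin z = \frac{e^{iz} - e^{-iz}}{2i}.$$

Proof.
$$\begin{aligned} e^{iz} &= \sum_{n=0}^{\infty}\frac{i^n z^n}{n!} = \sum_{n=0}^{\infty}\frac{i^{2n}z^{2n}}{(2n)!} + \sum_{n=0}^{\infty}\frac{i^{2n+1}z^{2n+1}}{(2n+1)!} \\ &= \sum_{n=0}^{\infty}\frac{(-1)^n}{(2n)!}z^{2n} + i\sum_{n=0}^{\infty}\frac{(-1)^n}{(2n+1)!}z^{2n+1} \\ &= \cos z + i\sin z\end{aligned}$$

Thus we can write $z = r(\cos\theta + i\sin\theta) = re^{i\theta}$.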
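Euler's formula can be checked numerically straight from the two series definitions. A sketch under the assumption that 30 terms suffice for the modest sample argument; `sin_series` and `cos_series` are hypothetical helper names, and each loop updates the current term by the ratio of consecutive series terms.

```python
import cmath


def sin_series(z: complex, terms: int = 30) -> complex:
    """Partial sum of sum_{n>=0} (-1)^n z^(2n+1) / (2n+1)!."""
    total, term = 0j, complex(z)  # term = (-1)^n z^(2n+1) / (2n+1)!
    for n in range(terms):
        total += term
        term *= -z * z / ((2 * n + 2) * (2 * n + 3))
    return total


def cos_series(z: complex, terms: int = 30) -> complex:
    """Partial sum of sum_{n>=0} (-1)^n z^(2n) / (2n)!."""
    total, term = 0j, 1 + 0j  # term = (-1)^n z^(2n) / (2n)!
    for n in range(terms):
        total += term
        term *= -z * z / ((2 * n + 1) * (2 * n + 2))
    return total


z = 0.8 - 0.5j  # arbitrary complex sample point
print(abs(cmath.exp(1j * z) - (cos_series(z) + 1j * sin_series(z))) < 1e-12)  # True
print(abs(cos_series(z) - (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2) < 1e-12)  # True
```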

1.3 Roots of unity

Definition (Roots of unity). The $n$th roots of unity are the roots to the equation $z^n = 1$ for $n \in \mathbb{N}$. Since this is a polynomial of order $n$, there are $n$ roots of unity. In fact, the $n$th roots of unity are $\exp\left(2\pi i\frac{k}{n}\right)$ for $k = 0, 1, 2, \cdots, n - 1$.

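Proposition. If $\omega = \exp\left(\frac{2\pi i}{n}\right)$ with $n \ge 2$, then $1 + \omega + \omega^2 + \cdots + \omega^{n-1} = 0$.

Proof. Two proofs are provided:

(i) Consider the equation $z^n - 1 = 0$, whose roots are exactly $1, \omega, \omega^2, \cdots, \omega^{n-1}$. The coefficient of $z^{n-1}$ is minus the sum of all roots. Since the coefficient of $z^{n-1}$ is $0$ (as $n \ge 2$), the sum of all roots is $1 + \omega + \omega^2 + \cdots + \omega^{n-1} = 0$.

(ii) Since $\omega^n - 1 = (\omega - 1)(1 + \omega + \cdots + \omega^{n-1})$ and $\omega \neq 1$, dividing by $(\omega - 1)$, we have $1 + \omega + \cdots + \omega^{n-1} = (\omega^n - 1)/(\omega - 1) = 0$.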
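A short numerical check of the proposition, generating the roots as $\exp(2\pi i k/n)$. This is only an illustration; the tolerance `1e-12` is an assumption about floating-point error, and `roots_of_unity` is a hypothetical helper name. The case $n = 1$ is excluded, as in the proposition, since there the sum is just $1$.

```python
import cmath


def roots_of_unity(n: int) -> list:
    """The n complex numbers exp(2*pi*i*k/n) for k = 0, ..., n-1."""
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]


for n in range(2, 8):
    roots = roots_of_unity(n)
    assert all(abs(w ** n - 1) < 1e-12 for w in roots)  # each is an nth root of unity
    assert abs(sum(roots)) < 1e-12                        # 1 + omega + ... + omega^(n-1) = 0
print("ok")
```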

1.4 Complex logarithm and power

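Definition (Complex logarithm). For $z \neq 0$, the complex logarithm $w = \log z$ is a solution to $e^w = z$. Writing $z = re^{i\theta}$, we have $\log z = \log(re^{i\theta}) = \log r + i\theta$. This is multi-valued, since $\theta$ is only determined up to multiples of $2\pi$, and, as above, we should select the $\theta$ that satisfies $-\pi < \theta \le \pi$.

Example. $\log 2i = \log 2 + i\frac{\pi}{2}$.

Definition (Complex power). The complex power $z^\alpha$ for $z, \alpha \in \mathbb{C}$ is defined as $z^\alpha = e^{\alpha\log z}$. This, again, can be multi-valued, as $z^\alpha = e^{\alpha\log|z|}e^{i\alpha\theta}e^{2in\pi\alpha}$ for $n \in \mathbb{Z}$ (there are finitely many values if $\alpha \in \mathbb{Q}$, infinitely many otherwise). Nevertheless, we make $z^\alpha$ single-valued by insisting $-\pi < \theta \le \pi$.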
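Python's `cmath.log` returns a principal logarithm, so it reproduces the example $\log 2i = \log 2 + i\pi/2$. To see the multi-valuedness of $z^\alpha$, the sketch below evaluates $e^{\alpha(\log|z| + i(\theta + 2\pi n))}$ for several $n$. `power_values` is a hypothetical helper, and rounding to 9 decimal places to count distinct values is an assumed tolerance.

```python
import cmath
import math

# Principal logarithm of 2i.
w = cmath.log(2j)
print(math.isclose(w.real, math.log(2)), math.isclose(w.imag, math.pi / 2))  # True True


def power_values(z: complex, alpha: complex, branches: range) -> list:
    """Values exp(alpha * (log|z| + i(theta + 2*pi*n))) of z**alpha, one per n in branches.

    theta is the principal argument of z; z must be non-zero.
    """
    r, theta = abs(z), cmath.phase(z)
    return [cmath.exp(alpha * (math.log(r) + 1j * (theta + 2 * math.pi * n))) for n in branches]


# alpha = 1/3 is rational, so only finitely many (here 3) distinct values appear.
vals = power_values(8, 1 / 3, range(6))
distinct = {(round(v.real, 9), round(v.imag, 9)) for v in vals}
print(len(distinct))  # 3
```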

1.5 De Moivre’s theorem

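Theorem (De Moivre’s theorem). For all $n \in \mathbb{Z}$,
$$\cos n\theta + i\sin n\theta = (\cos\theta + i\sin\theta)^n.$$

Proof. First prove the $n \ge 0$ case by induction. The $n = 0$ case is true since it merely reads $1 = 1$. We then have
$$\begin{aligned}(\cos\theta + i\sin\theta)^{n+1} &= (\cos\theta + i\sin\theta)^n(\cos\theta + i\sin\theta) \\ &= (\cos n\theta + i\sin n\theta)(\cos\theta + i\sin\theta) \\ &= \cos(n+1)\theta + i\sin(n+1)\theta,\end{aligned}$$
where the last step uses the rule that arguments add under multiplication.

If $n < 0$, let $m = -n$. Then $m > 0$ and
$$\begin{aligned}(\cos\theta + i\sin\theta)^{-m} &= (\cos m\theta + i\sin m\theta)^{-1} \\ &= \frac{\cos m\theta - i\sin m\theta}{(\cos m\theta + i\sin m\theta)(\cos m\theta - i\sin m\theta)} \\ &= \frac{\cos(-m\theta) + i\sin(-m\theta)}{\cos^2 m\theta + \sin^2 m\theta} \\ &= \cos(-m\theta) + i\sin(-m\theta) \\ &= \cos n\theta + i\sin n\theta\end{aligned}$$

Note that “$\cos n\theta + i\sin n\theta = e^{in\theta} = (e^{i\theta})^n = (\cos\theta + i\sin\theta)^n$” is not a valid proof of De Moivre’s theorem, since we do not know yet that $e^{in\theta} = (e^{i\theta})^n$. In fact, De Moivre’s theorem tells us that this is a valid rule to apply.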

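Example. We have $\cos 5\theta + i\sin 5\theta = (\cos\theta + i\sin\theta)^5$. By binomial expansion of the RHS and taking real and imaginary parts (using $\sin^2\theta = 1 - \cos^2\theta$ in the real part and $\cos^2\theta = 1 - \sin^2\theta$ in the imaginary part), we have
$$\cos 5\theta = 5\cos\theta - 20\cos^3\theta + 16\cos^5\theta$$
$$\sin 5\theta = 5\sin\theta - 20\sin^3\theta + 16\sin^5\theta$$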
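The two quintuple-angle identities, together with De Moivre's theorem itself for $n = 5$, can be spot-checked on a grid of angles. A minimal sketch: the grid and the absolute tolerance `1e-12` are arbitrary assumptions, and `check_quintuple` is a hypothetical helper name.

```python
import math


def check_quintuple(theta: float) -> bool:
    """Check the n = 5 identities derived from De Moivre's theorem at one angle."""
    c, s = math.cos(theta), math.sin(theta)
    z = complex(c, s) ** 5  # (cos t + i sin t)^5
    ok_cos = math.isclose(z.real, 5 * c - 20 * c**3 + 16 * c**5, abs_tol=1e-12)
    ok_sin = math.isclose(z.imag, 5 * s - 20 * s**3 + 16 * s**5, abs_tol=1e-12)
    ok_dm = abs(z - complex(math.cos(5 * theta), math.sin(5 * theta))) < 1e-12
    return ok_cos and ok_sin and ok_dm


print(all(check_quintuple(t / 10) for t in range(-40, 41)))  # True
```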

1.6 Lines and circles in C

Since complex numbers can be regarded as points on the 2D plane, we can often use complex numbers to represent two dimensional objects.

Suppose that we want to represent a straight line through $z_0 \in \mathbb{C}$ parallel to $w \in \mathbb{C}$ (with $w \neq 0$). The obvious way to do so is to let $z = z_0 + \lambda w$ where $\lambda$ can take any real value. However, this is not an optimal way of doing so, since we are not using the power of complex numbers fully. This is just the same as the vector equation for straight lines, which you may or may not know from your A levels.

Instead, we rearrange the equation to give $\lambda = \frac{z - z_0}{w}$. We take the complex conjugate of this expression to obtain $\bar\lambda = \frac{\bar z - \bar z_0}{\bar w}$. The trick here is to realize that $\lambda$ is a real number. So we must have $\lambda = \bar\lambda$. This means that we must have
$$\frac{z - z_0}{w} = \frac{\bar z - \bar z_0}{\bar w}$$
$$z\bar w - \bar z w = z_0\bar w - \bar z_0 w.$$

Theorem (Equation of straight line). The equation of a straight line through $z_0$ and parallel to $w$ is given by
$$z\bar w - \bar z w = z_0\bar w - \bar z_0 w.$$

The equation of a circle, on the other hand, is rather straightforward. Suppose that we want a circle with center $c \in \mathbb{C}$ and radius $\rho > 0$. By definition of a circle, a point $z$ is on the circle iff its distance to $c$ is $\rho$, i.e. $|z - c| = \rho$. Recalling that $|z|^2 = z\bar z$, we obtain
$$\begin{aligned}|z - c| &= \rho \\ |z - c|^2 &= \rho^2 \\ (z - c)(\bar z - \bar c) &= \rho^2 \\ z\bar z - \bar c z - c\bar z &= \rho^2 - c\bar c\end{aligned}$$

Theorem. The general equation of a circle with center $c \in \mathbb{C}$ and radius $\rho > 0$ can be given by
$$z\bar z - \bar c z - c\bar z = \rho^2 - c\bar c.$$
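Both equations can be tested on sample points with Python's `complex` type. In this sketch the line data $z_0, w$ and circle data $c, \rho$ are arbitrary assumptions, as are the helper names `on_line` and `on_circle` and the tolerance `1e-9`.

```python
import cmath

z0, w = 1 + 2j, 3 - 1j   # sample point on the line and direction (arbitrary)
c, rho = -0.5 + 1j, 2.0  # sample centre and radius (arbitrary)


def on_line(z: complex) -> bool:
    """Test z*conj(w) - conj(z)*w = z0*conj(w) - conj(z0)*w."""
    lhs = z * w.conjugate() - z.conjugate() * w
    rhs = z0 * w.conjugate() - z0.conjugate() * w
    return abs(lhs - rhs) < 1e-9


def on_circle(z: complex) -> bool:
    """Test z*conj(z) - conj(c)*z - c*conj(z) = rho^2 - c*conj(c)."""
    lhs = z * z.conjugate() - c.conjugate() * z - c * z.conjugate()
    return abs(lhs - (rho**2 - c * c.conjugate())) < 1e-9


print(all(on_line(z0 + lam * w) for lam in (-2.0, 0.0, 0.5, 3.0)))          # True
print(on_line(z0 + 1j * w))                                                  # False
print(all(on_circle(c + rho * cmath.exp(1j * t)) for t in (0.0, 1.0, 2.5)))  # True
```

Points $z_0 + \lambda w$ with real $\lambda$ pass the line test, while $z_0 + iw$ (a non-real multiple) fails it, matching the argument that $\lambda = \bar\lambda$.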

2 Vectors

We might have first learned vectors as arrays of numbers, and then defined addition and multiplication in terms of the individual numbers in the vector. This, however, is not what we are going to do here. The array of numbers is just a representation of the vector, instead of the vector itself.

Here, we will define vectors in terms of what they are, and then the various operations are defined axiomatically according to their properties.

2.1 Definition and basic properties

Definition (Vector space). A vector space over $\mathbb{R}$ or $\mathbb{C}$ is a collection of vectors $v \in V$ together with two operations: addition of two vectors and multiplication of a vector with a scalar (i.e. a number from $\mathbb{R}$ or $\mathbb{C}$, respectively).

Vector addition has to satisfy the following axioms:

(i) a + b = b + a (commutativity)

(ii) (a + b) + c = a + (b + c) (associativity)

(iii) There is a vector 0 such that a + 0 = a. (identity)

(iv) For all vectors a, there is a vector (−a) such that a + (−a) = 0 (inverse)

Scalar multiplication has to satisfy the following axioms:

(i) λ(a + b) = λa + λb.

(ii) (λ + µ)a = λa + µa.

(iii) λ(µa) = (λµ)a.

(iv) 1a = a.

Often, vectors have a length and direction. The length is denoted by $|v|$. In this case, we can think of a vector as an “arrow” in space. Note that $\lambda a$ is either parallel to $a$ ($\lambda > 0$) or anti-parallel to $a$ ($\lambda < 0$).

Definition (Unit vector). A unit vector is a vector with length 1. We write a unit vector as $\hat v$.

Example. $\mathbb{R}^n$ is a vector space with component-wise addition and scalar multiplication. Note that the vector space $\mathbb{R}$ is a line, but not all lines are vector spaces. For example, the line $x + y = 1$ in $\mathbb{R}^2$ is not a vector space since it does not contain $0$.

2.2 Scalar product

In a vector space, we can define the scalar product of two vectors, which returns a scalar (i.e. a real or complex number). We will first look at the usual scalar product defined for $\mathbb{R}^n$, and then define the scalar product axiomatically.

2.2.1 Geometric picture ($\mathbb{R}^2$ and $\mathbb{R}^3$ only)

Definition (Scalar/dot product). $a \cdot b = |a||b|\cos\theta$, where $\theta$ is the angle between $a$ and $b$. It satisfies the following properties:

(i) $a \cdot b = b \cdot a$

(ii) $a \cdot a = |a|^2 \ge 0$

(iii) $a \cdot a = 0$ iff $a = 0$

(iv) If $a \cdot b = 0$ and $a, b \neq 0$, then $a$ and $b$ are perpendicular.

Intuitively, this is the product of the parts of $a$ and $b$ that are parallel: the length $|a|$ times the component $|b|\cos\theta$ of $b$ along $a$.

Using the dot product, we can write the projection of $b$ onto $a$ as $(|b|\cos\theta)\hat a = (\hat a \cdot b)\hat a$.

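The cosine rule can be derived as follows, where $\theta$ is the angle of triangle $ABC$ at $A$:
$$\begin{aligned}|\overrightarrow{BC}|^2 &= |\overrightarrow{AC} - \overrightarrow{AB}|^2 \\ &= (\overrightarrow{AC} - \overrightarrow{AB}) \cdot (\overrightarrow{AC} - \overrightarrow{AB}) \\ &= |\overrightarrow{AB}|^2 + |\overrightarrow{AC}|^2 - 2|\overrightarrow{AB}||\overrightarrow{AC}|\cos\theta\end{aligned}$$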

We will later come up with a convenient algebraic way to evaluate this scalar

product.
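The geometric formulas above can be illustrated on concrete vectors using the component formula $a \cdot b = \sum_i a_i b_i$, which appears later in these notes. This is a sketch under that assumption; the triangle's vertices are arbitrary, and `angle` and `projection` are hypothetical helper names.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def angle(a, b):
    """theta in [0, pi] with a . b = |a||b| cos(theta); a and b must be non-zero."""
    c = dot(a, b) / (norm(a) * norm(b))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp guards against rounding just outside [-1, 1]


def projection(b, a):
    """Projection of b onto a, i.e. (a_hat . b) a_hat."""
    a_hat = [x / norm(a) for x in a]
    k = dot(a_hat, b)
    return [k * x for x in a_hat]


# Cosine rule for a sample triangle ABC (vertices chosen arbitrarily).
A, B, C = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)
AB = [q - p for p, q in zip(A, B)]
AC = [q - p for p, q in zip(A, C)]
BC = [q - p for p, q in zip(B, C)]
theta = angle(AB, AC)
lhs = dot(BC, BC)
rhs = dot(AB, AB) + dot(AC, AC) - 2 * norm(AB) * norm(AC) * math.cos(theta)
print(math.isclose(lhs, rhs))              # True
print(projection((1.0, 3.0), (4.0, 0.0)))  # [1.0, 0.0]
```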

2.2.2 General algebraic definition

Definition (Inner/scalar product). In a real vector space $V$, an inner product or scalar product is a map $V \times V \to \mathbb{R}$ that satisfies the following axioms. It is written as $x \cdot y$ or $\langle x \mid y \rangle$.

(i) $x \cdot y = y \cdot x$ (symmetry)

(ii) $x \cdot (\lambda y + \mu z) = \lambda x \cdot y + \mu x \cdot z$ (linearity in 2nd argument)

(iii) $x \cdot x \ge 0$ with equality iff $x = 0$ (positive definite)

Note that this is a definition only for real vector spaces, where the scalars are real. We will have a different set of definitions for complex vector spaces. In particular, here we can use (i) and (ii) together to show linearity in the 1st argument. However, this is generally not true for complex vector spaces.

Definition. The norm of a vector, written as $|a|$ or $\|a\|$, is defined as $|a| = \sqrt{a \cdot a}$.

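Example. Instead of the usual $\mathbb{R}^n$ vector space, we can consider the set of all real continuous functions on $[0, 1]$ as a vector space. We can define the following inner product:
$$\langle f \mid g \rangle = \int_0^1 f(x)g(x)\,dx.$$
(Continuity ensures that $\langle f \mid f \rangle = 0$ only when $f = 0$.)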

2.3 Cauchy-Schwarz inequality

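Theorem (Cauchy-Schwarz inequality). For all $x, y \in \mathbb{R}^n$,
$$|x \cdot y| \le |x||y|.$$

Proof. If $y = 0$, both sides vanish, so assume $y \neq 0$. Consider the expression $|x - \lambda y|^2$ for $\lambda \in \mathbb{R}$. We must have
$$\begin{aligned}|x - \lambda y|^2 &\ge 0 \\ (x - \lambda y) \cdot (x - \lambda y) &\ge 0 \\ \lambda^2|y|^2 - \lambda(2x \cdot y) + |x|^2 &\ge 0.\end{aligned}$$
Viewing this as a quadratic in $\lambda$, we see that the quadratic is non-negative and thus cannot have two distinct real roots. Thus the discriminant $\Delta \le 0$. So
$$\begin{aligned}4(x \cdot y)^2 &\le 4|y|^2|x|^2 \\ (x \cdot y)^2 &\le |x|^2|y|^2 \\ |x \cdot y| &\le |x||y|.\end{aligned}$$

Note that we proved this using the axioms of the scalar product. So this result holds for all possible scalar products on any (real) vector space.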

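Example. Let $x = (\alpha, \beta, \gamma)$ and $y = (1, 1, 1)$. Then by the Cauchy-Schwarz inequality, we have
$$\alpha + \beta + \gamma \le |\alpha + \beta + \gamma| \le \sqrt{3}\sqrt{\alpha^2 + \beta^2 + \gamma^2}.$$
Squaring the right-hand inequality gives $(\alpha + \beta + \gamma)^2 \le 3(\alpha^2 + \beta^2 + \gamma^2)$; expanding and rearranging,
$$\alpha^2 + \beta^2 + \gamma^2 \ge \alpha\beta + \beta\gamma + \gamma\alpha,$$
with equality iff $\alpha = \beta = \gamma$.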

Corollary (Triangle inequality).
$$|x + y| \le |x| + |y|.$$

Proof.
$$\begin{aligned}|x + y|^2 &= (x + y) \cdot (x + y) \\ &= |x|^2 + 2x \cdot y + |y|^2 \\ &\le |x|^2 + 2|x||y| + |y|^2 \\ &= (|x| + |y|)^2.\end{aligned}$$
So $|x + y| \le |x| + |y|$.
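A randomized search for counterexamples is a cheap sanity check (of course not a proof) of both inequalities for the standard scalar product on $\mathbb{R}^n$. The sample sizes, the fixed seed and the `1e-12` rounding allowance are arbitrary assumptions.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    return math.sqrt(dot(x, x))


random.seed(0)  # fixed seed so the run is reproducible
for _ in range(1000):
    n = random.randint(1, 6)
    x = [random.uniform(-5, 5) for _ in range(n)]
    y = [random.uniform(-5, 5) for _ in range(n)]
    s = [a + b for a, b in zip(x, y)]
    # The small tolerance absorbs floating-point rounding when equality (nearly) holds.
    assert abs(dot(x, y)) <= norm(x) * norm(y) + 1e-12  # Cauchy-Schwarz
    assert norm(s) <= norm(x) + norm(y) + 1e-12         # triangle inequality
print("no counterexample found")
```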

2.4 Vector product

Apart from the scalar product, we can also define the vector product. However, this is defined only for $\mathbb{R}^3$, not for vector spaces in general.

Definition (Vector/cross product). Consider $a, b \in \mathbb{R}^3$. Define the vector product
$$a \times b = |a||b|\sin\theta\,\hat n,$$
where $\theta$ is the angle between $a$ and $b$, and $\hat n$ is a unit vector perpendicular to both $a$ and $b$. Since there are two (opposite) unit vectors that are perpendicular to both of them, we pick $\hat n$ to be the one that is perpendicular to $a, b$ in a right-handed sense.

The vector product satisfies the following properties:

(i) a × b = −b × a.

(ii) a × a = 0.

(iii) a × b = 0 ⇒ a = λb for some λ ∈ R (or b = 0).

(iv) a × (λb) = λ(a × b).

(v) a × (b + c) = a × b + a × c.

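If we have a triangle $OAB$, its area is given by $\frac{1}{2}|\overrightarrow{OA}||\overrightarrow{OB}|\sin\theta = \frac{1}{2}|\overrightarrow{OA} \times \overrightarrow{OB}|$. We define the vector area as $\frac{1}{2}\overrightarrow{OA} \times \overrightarrow{OB}$, which is often a helpful notion when we want to do calculus with surfaces.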
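A minimal sketch of the vector product in components, assuming the standard expansion $a \times b = (a_2b_3 - a_3b_2,\ a_3b_1 - a_1b_3,\ a_1b_2 - a_2b_1)$ in a right-handed basis. It checks perpendicularity, properties (i) and (ii), and the triangle-area formula on arbitrary integer samples; `triangle_area` is a hypothetical helper name.

```python
import math


def cross(a, b):
    """a x b in components, for a, b in R^3 (right-handed basis i, j, k)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangle_area(oa, ob):
    """Area of triangle OAB, i.e. |OA x OB| / 2."""
    v = cross(oa, ob)
    return 0.5 * math.sqrt(dot(v, v))


a, b = (1, 2, 3), (-2, 0, 5)  # arbitrary sample vectors
c = cross(a, b)
print(c, dot(c, a), dot(c, b))              # (10, -11, 4) 0 0
print(cross(b, a) == tuple(-x for x in c))  # True: b x a = -(a x b)
print(cross(a, a))                          # (0, 0, 0)
print(triangle_area((2, 0, 0), (0, 3, 0)))  # 3.0
```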

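There is a convenient way of calculating vector products:

Proposition.
$$\begin{aligned}a \times b &= (a_1 i + a_2 j + a_3 k) \times (b_1 i + b_2 j + b_3 k) \\ &= (a_2 b_3 - a_3 b_2)i + (a_3 b_1 - a_1 b_3)j + (a_1 b_2 - a_2 b_1)k \\ &= \begin{vmatrix} i & j & k \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}\end{aligned}$$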

2.5 Scalar triple product

Definition (Scalar triple product). The scalar triple product is defined as

[a, b, c] = a · (b × c).

Proposition. If a parallelepiped has sides represented by vectors $a, b, c$ that form a right-handed system, then the volume of the parallelepiped is given by $[a, b, c]$.

Proof. The area of the base of the parallelepiped is given by $|b||c|\sin\theta = |b \times c|$. Thus the volume is $|b \times c||a||\cos\phi| = |a \cdot (b \times c)|$, where $\phi$ is the angle between $a$ and the normal to $b$ and $c$. However, since $a, b, c$ form a right-handed system, we have $a \cdot (b \times c) \ge 0$. Therefore the volume is $a \cdot (b \times c)$.

Since the volume does not depend on the order of $a, b, c$, and cyclically permuting them preserves handedness while swapping two of them reverses it, we know that
$$[a, b, c] = [b, c, a] = [c, a, b] = -[b, a, c] = -[a, c, b] = -[c, b, a].$$

Theorem. a × (b + c) = a × b + a × c.

Proof. Let d = a × (b + c) − a × b − a × c. We have

d · d = d · [a × (b + c)] − d · (a × b) − d · (a × c)

= (b + c) · (d × a) − b · (d × a) − c · (d × a)

= 0

Thus d = 0.
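The permutation rules for $[a, b, c]$ and the distributive law can be checked with exact integer arithmetic. This sketch assumes the component formula for the cross product; the sample vectors are arbitrary (chosen so that $[a, b, c] > 0$), and `triple` and `add` are hypothetical helper names.

```python
def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triple(a, b, c):
    """Scalar triple product [a, b, c] = a . (b x c)."""
    return dot(a, cross(b, c))


def add(u, v):
    return tuple(x + y for x, y in zip(u, v))


a, b, c = (1, 0, 0), (1, 2, 0), (0, 1, 3)  # sample right-handed triple
print(triple(a, b, c))  # 6: volume of the parallelepiped spanned by a, b, c

cyclic = [triple(a, b, c), triple(b, c, a), triple(c, a, b)]
swapped = [triple(b, a, c), triple(a, c, b), triple(c, b, a)]
print(cyclic, swapped)  # [6, 6, 6] [-6, -6, -6]

# Distributivity a x (b + d) = a x b + a x d, checked exactly.
d = (2, -1, 4)
lhs = cross(a, add(b, d))
rhs = add(cross(a, b), cross(a, d))
print(lhs == rhs)  # True
```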

2.6 Spanning sets and bases

2.6.1 2D space

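Definition (Spanning set). A set of vectors $\{a, b\}$ spans $\mathbb{R}^2$ if for all vectors $r \in \mathbb{R}^2$, there exist some $\lambda, \mu \in \mathbb{R}$ such that $r = \lambda a + \mu b$.

In $\mathbb{R}^2$, two vectors span the space if $a \times b \neq 0$ (regarding $a, b$ as vectors in $\mathbb{R}^3$ with zero third component).

Theorem. The coefficients $\lambda, \mu$ are unique.

Proof. Suppose that $r = \lambda a + \mu b = \lambda' a + \mu' b$. Take the vector product with $a$ on both sides to get $(\mu - \mu')a \times b = 0$. Since $a \times b \neq 0$, then $\mu = \mu'$. Similarly, $\lambda = \lambda'$.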

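Definition (Linearly independent vectors in $\mathbb{R}^2$). Two vectors $a$ and $b$ are linearly independent if for $\alpha, \beta \in \mathbb{R}$, $\alpha a + \beta b = 0$ iff $\alpha = \beta = 0$. In $\mathbb{R}^2$, $a$ and $b$ are linearly independent if $a \times b \neq 0$.

Definition (Basis of $\mathbb{R}^2$). A set of vectors is a basis of $\mathbb{R}^2$ if it spans $\mathbb{R}^2$ and its vectors are linearly independent.

Example. $\{i, j\} = \{(1, 0), (0, 1)\}$ is a basis of $\mathbb{R}^2$. They are the standard basis of $\mathbb{R}^2$.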

2.6.2 3D space

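We can extend the above definitions of spanning set and linearly independent set to $\mathbb{R}^3$. Here we have

Theorem. If $a, b, c \in \mathbb{R}^3$ are non-coplanar, i.e. $a \cdot (b \times c) \neq 0$, then they form a basis of $\mathbb{R}^3$.

Proof. For any $r$, write $r = \lambda a + \mu b + \nu c$. Performing the scalar product with $b \times c$ on both sides, one obtains
$$r \cdot (b \times c) = \lambda a \cdot (b \times c) + \mu b \cdot (b \times c) + \nu c \cdot (b \times c) = \lambda[a, b, c],$$
since $b \times c$ is perpendicular to both $b$ and $c$. Thus $\lambda = [r, b, c]/[a, b, c]$. The values of $\mu$ and $\nu$ can be found similarly. Thus each $r$ can be written as a linear combination of $a$, $b$ and $c$.

By the formula derived above (taking $r = 0$), it follows that if $\alpha a + \beta b + \gamma c = 0$, then $\alpha = \beta = \gamma = 0$. Thus they are linearly independent.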

Note that while we came up with formulas for $\lambda$, $\mu$ and $\nu$, we did not actually prove that these coefficients indeed work. This is rather unsatisfactory. We could, of course, expand everything out and show that this indeed works, but in IB Linear Algebra, we will prove a much more general result, saying that if we have an $n$-dimensional space and a set of $n$ linearly independent vectors, then they form a basis.
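While not a proof, a numerical check shows the triple-product formulas reconstructing a vector exactly. The formulas for $\mu$ and $\nu$ used here, $\mu = [a, r, c]/[a, b, c]$ and $\nu = [a, b, r]/[a, b, c]$, are my own working of the "found similarly" step (dotting with $c \times a$ and $a \times b$ respectively). `coefficients` is a hypothetical helper, and `Fraction` keeps the arithmetic exact for integer inputs.

```python
from fractions import Fraction


def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triple(a, b, c):
    return dot(a, cross(b, c))


def coefficients(r, a, b, c):
    """Return (lambda, mu, nu) with r = lambda*a + mu*b + nu*c, via triple products.

    Requires [a, b, c] != 0; exact when all components are integers.
    """
    vol = triple(a, b, c)
    if vol == 0:
        raise ValueError("a, b, c are coplanar, so they are not a basis")
    lam = Fraction(triple(r, b, c), vol)  # from dotting with b x c
    mu = Fraction(triple(a, r, c), vol)   # from dotting with c x a
    nu = Fraction(triple(a, b, r), vol)   # from dotting with a x b
    return lam, mu, nu


a, b, c = (1, 0, 0), (1, 2, 0), (0, 1, 3)  # sample non-coplanar vectors
r = (4, 5, 6)
lam, mu, nu = coefficients(r, a, b, c)
rebuilt = tuple(lam * x + mu * y + nu * z for x, y, z in zip(a, b, c))
print(lam, mu, nu, rebuilt == r)  # 5/2 3/2 2 True
```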

In $\mathbb{R}^3$, the standard basis is $i, j, k$, or $(1, 0, 0)$, $(0, 1, 0)$ and $(0, 0, 1)$.

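2.6.3 $\mathbb{R}^n$ space

In general, we can define

Definition (Linearly independent vectors). A set of vectors $\{v_1, v_2, \cdots, v_m\}$ is linearly independent if
$$\sum_{i=1}^{m}\lambda_i v_i = 0 \Rightarrow (\forall i)\ \lambda_i = 0.$$

Definition (Spanning set). A set of vectors $\{u_1, u_2, \cdots, u_m\} \subseteq \mathbb{R}^n$ is a spanning set of $\mathbb{R}^n$ if
$$(\forall x \in \mathbb{R}^n)(\exists \lambda_i)\ \sum_{i=1}^{m}\lambda_i u_i = x.$$

Definition (Basis vectors). A basis of $\mathbb{R}^n$ is a linearly independent spanning set. The standard basis of $\mathbb{R}^n$ is $e_1 = (1, 0, 0, \cdots, 0)$, $e_2 = (0, 1, 0, \cdots, 0)$, $\cdots$, $e_n = (0, 0, 0, \cdots, 1)$.

Definition (Orthonormal basis). A basis $\{e_i\}$ is orthonormal if $e_i \cdot e_j = 0$ for $i \neq j$ and $e_i \cdot e_i = 1$ for all $i$.

Using the Kronecker delta symbol, which we will define later, we can write this condition as $e_i \cdot e_j = \delta_{ij}$.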

Definition (Dimension of vector space). The dimension of a vector space is the number of vectors in any of its bases. (Exercise: show that this is well-defined.)

We usually denote the components of a vector $x$ by $x_i$. So we have $x = (x_1, x_2, \cdots, x_n)$.

Definition (Scalar product). The scalar product of $x, y \in \mathbb{R}^n$ is defined as
$$x \cdot y = \sum_{i=1}^{n} x_i y_i.$$

The reader should check that this definition coincides with the $|x||y|\cos\theta$ definition in the case of $\mathbb{R}^2$ and $\mathbb{R}^3$.

2.6.4 $\mathbb{C}^n$ space

$\mathbb{C}^n$ is very similar to $\mathbb{R}^n$, except that we have complex numbers. As a result, we need a different definition of the scalar product. If we still defined $u \cdot v = \sum u_i v_i$, then if we let $u = (0, i)$, then $u \cdot u = -1 < 0$. This would be bad if we want to use the scalar product to define a norm.

Definition ($\mathbb{C}^n$). $\mathbb{C}^n = \{(z_1, z_2, \cdots, z_n) : z_i \in \mathbb{C}\}$. It has the same standard basis as $\mathbb{R}^n$ but the scalar product is defined differently. For $u, v \in \mathbb{C}^n$, $u \cdot v = \sum u_i^* v_i$. The scalar product has the following properties:

(i) $u \cdot v = (v \cdot u)^*$

(ii) $u \cdot (\lambda v + \mu w) = \lambda(u \cdot v) + \mu(u \cdot w)$

(iii) $u \cdot u \ge 0$ and $u \cdot u = 0$ iff $u = 0$

Instead of linearity in the first argument, here we have $(\lambda u + \mu v) \cdot w = \lambda^* u \cdot w + \mu^* v \cdot w$.

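Example. (All sums below run over $k = 1, \cdots, 4$.)
$$\begin{aligned}\sum_{k=1}^{4}(-i)^k|x + i^k y|^2 &= \sum (-i)^k\langle x + i^k y \mid x + i^k y\rangle \\ &= \sum (-i)^k\left(\langle x + i^k y \mid x\rangle + i^k\langle x + i^k y \mid y\rangle\right) \\ &= \sum (-i)^k\left(\langle x \mid x\rangle + (-i)^k\langle y \mid x\rangle + i^k\langle x \mid y\rangle + i^k(-i)^k\langle y \mid y\rangle\right) \\ &= \sum\left[(-i)^k(|x|^2 + |y|^2) + (-1)^k\langle y \mid x\rangle + \langle x \mid y\rangle\right] \\ &= (|x|^2 + |y|^2)\sum(-i)^k + \langle y \mid x\rangle\sum(-1)^k + \langle x \mid y\rangle\sum 1 \\ &= 4\langle x \mid y\rangle,\end{aligned}$$
since $\sum(-i)^k = \sum(-1)^k = 0$.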
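This identity can be checked numerically with the $\mathbb{C}^n$ convention $\langle u \mid v \rangle = \sum u_i^* v_i$ (conjugate-linear in the first argument). In this sketch the random vectors, fixed seed and tolerance are arbitrary assumptions, and `inner` and `polarization` are hypothetical helper names.

```python
import random


def inner(u, v):
    """<u | v> = sum conj(u_i) v_i, linear in the second argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def polarization(x, y):
    """sum_{k=1}^{4} (-i)^k |x + i^k y|^2."""
    total = 0j
    for k in range(1, 5):
        w = [a + (1j ** k) * b for a, b in zip(x, y)]
        total += (-1j) ** k * inner(w, w).real  # inner(w, w) is real: |w|^2
    return total


random.seed(1)  # reproducible random vectors in C^3
x = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3)]
y = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3)]
print(abs(polarization(x, y) - 4 * inner(x, y)) < 1e-12)  # True
```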

We can prove the Cauchy-Schwarz inequality for complex vector spaces using the same proof as the real case, except that this time we have to first multiply $y$ by some $e^{i\theta}$ so that $x \cdot (e^{i\theta}y)$ is a real number. The factor of $e^{i\theta}$ will drop off at the end when we take the modulus signs.

2.7 Vector subspaces

Definition (Vector subspace). A vector subspace of a vector space $V$ is a subset $U \subseteq V$ that is also a vector space under the same operations. Both $V$ and $\{0\}$ are subspaces of $V$. All others are proper subspaces.

A useful criterion is that a subset $U \subseteq V$ is a subspace iff

(i) $x, y \in U \Rightarrow (x + y) \in U$.

(ii) $x \in U \Rightarrow \lambda x \in U$ for all scalars $\lambda$.

(iii) $0 \in U$.

This can be more concisely written as “$U$ is non-empty and for all $x, y \in U$ and all scalars $\lambda, \mu$, $(\lambda x + \mu y) \in U$”.

Example.

(i) If $\{a, b, c\}$ is a basis of $\mathbb{R}^3$, then $\{a + c, b + c\}$ is a basis of a 2D subspace.

Suppose $x, y \in \operatorname{span}\{a + c, b + c\}$. Let
$$x = \alpha_1(a + c) + \beta_1(b + c);$$
$$y = \alpha_2(a + c) + \beta_2(b + c).$$
Then
$$\lambda x + \mu y = (\lambda\alpha_1 + \mu\alpha_2)(a + c) + (\lambda\beta_1 + \mu\beta_2)(b + c) \in \operatorname{span}\{a + c, b + c\}.$$
Thus this is a subspace of $\mathbb{R}^3$.

Now check that $a + c, b + c$ is a basis. We only need to check linear independence. If $\alpha(a + c) + \beta(b + c) = 0$, then $\alpha a + \beta b + (\alpha + \beta)c = 0$. Since $\{a, b, c\}$ is a basis of $\mathbb{R}^3$, $a, b, c$ are linearly independent, so $\alpha = \beta = 0$. Therefore $a + c, b + c$ is a basis and the subspace has dimension 2.

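(ii) Given a set of numbers $\alpha_i$, let $U = \{x \in \mathbb{R}^n : \sum_{i=1}^{n}\alpha_i x_i = 0\}$. We show that this is a vector subspace of $\mathbb{R}^n$: Take $x, y \in U$, then consider $\lambda x + \mu y$. We have $\sum_i \alpha_i(\lambda x_i + \mu y_i) = \lambda\sum_i\alpha_i x_i + \mu\sum_i\alpha_i y_i = 0$. Thus $\lambda x + \mu y \in U$.

The dimension of the subspace is $n - 1$ (assuming the $\alpha_i$ are not all zero, and labelling so that $\alpha_n \neq 0$) as we can freely choose $x_i$ for $i = 1, \cdots, n - 1$ and then $x_n$ is uniquely determined by the previous $x_i$’s.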

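(iii) Let $W = \{x \in \mathbb{R}^n : \sum_i \alpha_i x_i = 1\}$. Then for $x, y \in W$, $\sum_i \alpha_i(\lambda x_i + \mu y_i) = \lambda + \mu$, which is not $1$ in general (e.g. $\lambda = \mu = 1$). Therefore $W$ is not a vector subspace.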

2.8 Suffix notation

Here we are going to introduce a powerful notation that can help us simplify a lot of things.

First of all, let $v \in \mathbb{R}^3$. We can write $v = (v_1, v_2, v_3) = (v_i)$. So in general, the $i$th component of $v$ is written as $v_i$. We can thus write vector equations in component form. For example, $a = b \to a_i = b_i$ or $c = \alpha a + \beta b \to c_i = \alpha a_i + \beta b_i$. A vector has one free suffix, $i$, while a scalar has none.

Notation (Einstein’s summation convention). Consider a sum $x \cdot y = \sum x_i y_i$. The summation convention says that we can drop the $\sum$ symbol and simply write $x \cdot y = x_i y_i$. If a suffix is repeated once (i.e. appears exactly twice in a term), summation over it is understood.

Note that $i$ is a dummy suffix and it doesn’t matter what it’s called, i.e. $x_i y_i = x_j y_j = x_k y_k$ etc.

The rules of this convention are:

(i) Suffix appears once in a term: free suffix

(ii) Suffix appears twice in a term: dummy suffix and is summed over

(iii) Suffix appears three times or more: WRONG!

Example. $[(a \cdot b)c - (a \cdot c)b]_i = a_j b_j c_i - a_j c_j b_i$, summing over $j$ understood.

It is possible for an item to have more than one index. These objects are known as tensors, which will be studied in depth in the IA Vector Calculus course.

Here we will define two important tensors:

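Definition (Kronecker delta).
$$\delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}$$
We have
$$\begin{pmatrix} \delta_{11} & \delta_{12} & \delta_{13} \\ \delta_{21} & \delta_{22} & \delta_{23} \\ \delta_{31} & \delta_{32} & \delta_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = I.$$
So the Kronecker delta represents an identity matrix.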

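Example.

(i) $a_i\delta_{i1} = a_1$. In general, $a_i\delta_{ij} = a_j$ ($i$ is dummy, $j$ is free).

(ii) $\delta_{ij}\delta_{jk} = \delta_{ik}$

(iii) $\delta_{ii} = n$ if we are in $\mathbb{R}^n$.

(iv) $a_p\delta_{pq}b_q = a_p b_p$ with $p, q$ both dummy suffixes and summed over.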

Definition (Alternating symbol $\varepsilon_{ijk}$). Consider rearrangements of $1, 2, 3$. We can divide them into even and odd permutations. Even permutations include $(1, 2, 3)$, $(2, 3, 1)$ and $(3, 1, 2)$. These are permutations obtained by performing two (or no) swaps of the elements of $(1, 2, 3)$. (Alternatively, it is any “rotation” of $(1, 2, 3)$.)

The odd permutations are $(2, 1, 3)$, $(1, 3, 2)$ and $(3, 2, 1)$. They are the permutations obtained by one swap only.

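Define
$$\varepsilon_{ijk} = \begin{cases} +1 & ijk \text{ is an even permutation} \\ -1 & ijk \text{ is an odd permutation} \\ 0 & \text{otherwise (i.e. repeated suffixes)} \end{cases}$$
$\varepsilon_{ijk}$ has 3 free suffixes.

We have $\varepsilon_{123} = \varepsilon_{231} = \varepsilon_{312} = +1$ and $\varepsilon_{213} = \varepsilon_{132} = \varepsilon_{321} = -1$, while $\varepsilon_{112} = \varepsilon_{111} = \cdots = 0$.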

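We have

(i) $\varepsilon_{ijk}\delta_{jk} = \varepsilon_{ijj} = 0$

(ii) If $a_{jk} = a_{kj}$ (i.e. $a_{ij}$ is symmetric), then $\varepsilon_{ijk}a_{jk} = \varepsilon_{ijk}a_{kj} = -\varepsilon_{ikj}a_{kj}$. Since $\varepsilon_{ijk}a_{jk} = \varepsilon_{ikj}a_{kj}$ (we simply renamed dummy suffixes), we have $\varepsilon_{ijk}a_{jk} = 0$.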

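Proposition. $(a \times b)_i = \varepsilon_{ijk}a_j b_k$.

Proof. Expand the sum and compare with the component formula for $a \times b$. For example, for $i = 1$ the only non-zero terms are $\varepsilon_{123}a_2 b_3 + \varepsilon_{132}a_3 b_2 = a_2 b_3 - a_3 b_2$.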

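Theorem. $\varepsilon_{ijk}\varepsilon_{ipq} = \delta_{jp}\delta_{kq} - \delta_{jq}\delta_{kp}$.

Proof. Proof by exhaustion:
$$\text{RHS} = \begin{cases} +1 & \text{if } j = p \text{ and } k = q, \text{ with } j \neq k \\ -1 & \text{if } j = q \text{ and } k = p, \text{ with } j \neq k \\ 0 & \text{otherwise} \end{cases}$$
LHS: Summing over $i$, the only non-zero terms are when $j, k \neq i$ and $p, q \neq i$. If $j = p$ and $k = q$, LHS is $(-1)^2$ or $(+1)^2 = 1$. If $j = q$ and $k = p$, LHS is $(+1)(-1)$ or $(-1)(+1) = -1$. All other possibilities result in $0$.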

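Equally, we have $\varepsilon_{ijk}\varepsilon_{pqk} = \delta_{ip}\delta_{jq} - \delta_{jp}\delta_{iq}$ and $\varepsilon_{ijk}\varepsilon_{pjq} = \delta_{ip}\delta_{kq} - \delta_{iq}\delta_{kp}$.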
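Since every suffix takes only the values 1, 2, 3, the proof by exhaustion can literally be carried out by machine. This sketch loops over all $3^4$ choices of the free suffixes; `delta` and `epsilon` are hypothetical helper names implementing the two definitions above.

```python
from itertools import product

R = (1, 2, 3)  # suffix values in three dimensions


def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0


def epsilon(i, j, k):
    """Alternating symbol: +1 for even, -1 for odd permutations of (1, 2, 3), else 0."""
    if (i, j, k) in {(1, 2, 3), (2, 3, 1), (3, 1, 2)}:
        return 1
    if (i, j, k) in {(2, 1, 3), (1, 3, 2), (3, 2, 1)}:
        return -1
    return 0


# eps_ijk eps_ipq = delta_jp delta_kq - delta_jq delta_kp (sum over i), for all j, k, p, q
ok = all(
    sum(epsilon(i, j, k) * epsilon(i, p, q) for i in R)
    == delta(j, p) * delta(k, q) - delta(j, q) * delta(k, p)
    for j, k, p, q in product(R, repeat=4)
)

# eps_ijk eps_pqk = delta_ip delta_jq - delta_jp delta_iq (sum over k)
ok2 = all(
    sum(epsilon(i, j, k) * epsilon(p, q, k) for k in R)
    == delta(i, p) * delta(j, q) - delta(j, p) * delta(i, q)
    for i, j, p, q in product(R, repeat=4)
)
print(ok, ok2)  # True True
```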

Proposition. $a \cdot (b \times c) = b \cdot (c \times a)$.

Proof. In suffix notation, we have
$$a \cdot (b \times c) = a_i(b \times c)_i = \varepsilon_{ijk}b_j c_k a_i = \varepsilon_{jki}b_j c_k a_i = b \cdot (c \times a).$$

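Theorem (Vector triple product).
$$a \times (b \times c) = (a \cdot c)b - (a \cdot b)c.$$

Proof.
$$\begin{aligned}\big[a \times (b \times c)\big]_i &= \varepsilon_{ijk}a_j(b \times c)_k \\ &= \varepsilon_{ijk}\varepsilon_{kpq}a_j b_p c_q \\ &= \varepsilon_{ijk}\varepsilon_{pqk}a_j b_p c_q \\ &= (\delta_{ip}\delta_{jq} - \delta_{iq}\delta_{jp})a_j b_p c_q \\ &= a_j b_i c_j - a_j c_i b_j \\ &= (a \cdot c)b_i - (a \cdot b)c_i\end{aligned}$$

Similarly, $(a \times b) \times c = (a \cdot c)b - (b \cdot c)a$.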
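Both triple-product expansions can be confirmed exactly on random integer vectors, where no rounding occurs. The seed and vector range are arbitrary, and `check_triple_products` is a hypothetical helper name.

```python
import random


def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def check_triple_products(a, b, c):
    """Check both vector triple product identities exactly (integer inputs)."""
    lhs1 = cross(a, cross(b, c))
    rhs1 = tuple(dot(a, c) * bi - dot(a, b) * ci for bi, ci in zip(b, c))
    lhs2 = cross(cross(a, b), c)
    rhs2 = tuple(dot(a, c) * bi - dot(b, c) * ai for ai, bi in zip(a, b))
    return lhs1 == rhs1 and lhs2 == rhs2


random.seed(2)  # reproducible sample of integer vectors
vecs = [tuple(random.randint(-9, 9) for _ in range(3)) for _ in range(300)]
print(all(check_triple_products(*vecs[i:i + 3]) for i in range(0, 300, 3)))  # True
```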

Spherical trigonometry

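Proposition. $(a \times b) \cdot (a \times c) = (a \cdot a)(b \cdot c) - (a \cdot b)(a \cdot c)$.

Proof.
$$\begin{aligned}\text{LHS} &= (a \times b)_i(a \times c)_i \\ &= \varepsilon_{ijk}a_j b_k\varepsilon_{ipq}a_p c_q \\ &= (\delta_{jp}\delta_{kq} - \delta_{jq}\delta_{kp})a_j b_k a_p c_q \\ &= a_j b_k a_j c_k - a_j b_k a_k c_j \\ &= (a \cdot a)(b \cdot c) - (a \cdot b)(a \cdot c)\end{aligned}$$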

Consider the unit sphere, center $O$, with $a, b, c$ the position vectors of points $A, B, C$ on the surface.

Suppose we are living on the surface of the sphere. So the distance from $A$ to $B$ is the arc length on the sphere. We can imagine this to be along the circumference of the circle through $A$ and $B$ with center $O$. So the distance is $\angle AOB$, which we shall denote by $\delta(A, B)$. So $a \cdot b = \cos\angle AOB = \cos\delta(A, B)$. We obtain similar expressions for other dot products. Similarly, we get $|a \times b| = \sin\delta(A, B)$.

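Let $\alpha$ be the angle of the spherical triangle at $A$, i.e. the angle between the planes $OAB$ and $OAC$, which is also the angle between their normals $a \times b$ and $a \times c$. Then, using the proposition above with $a \cdot a = 1$,
$$\cos\alpha = \frac{(a \times b) \cdot (a \times c)}{|a \times b||a \times c|} = \frac{b \cdot c - (a \cdot b)(a \cdot c)}{|a \times b||a \times c|}.$$

Putting in our expressions for the dot and cross products, we obtain
$$\cos\alpha\sin\delta(A, B)\sin\delta(A, C) = \cos\delta(B, C) - \cos\delta(A, B)\cos\delta(A, C).$$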
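The spherical cosine rule can be checked on three random points of the unit sphere, computing $\alpha$ from the normals $a \times b$ and $a \times c$ as in the derivation. This is a sketch: Gaussian sampling is an assumed way to produce random directions, the seed is arbitrary, and `unit` and `arc` are hypothetical helper names.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)


def unit(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)


def arc(u, v):
    """Great-circle distance delta between unit vectors u and v (angle at the centre O)."""
    return math.acos(max(-1.0, min(1.0, dot(u, v))))


random.seed(3)  # reproducible random points A, B, C on the unit sphere
a, b, c = (unit(tuple(random.gauss(0, 1) for _ in range(3))) for _ in range(3))

n_ab, n_ac = cross(a, b), cross(a, c)  # normals to the planes OAB and OAC
cos_alpha = dot(n_ab, n_ac) / math.sqrt(dot(n_ab, n_ab) * dot(n_ac, n_ac))

lhs = cos_alpha * math.sin(arc(a, b)) * math.sin(arc(a, c))
rhs = math.cos(arc(b, c)) - math.cos(arc(a, b)) * math.cos(arc(a, c))
print(math.isclose(lhs, rhs, abs_tol=1e-12))  # True
```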

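This is the spherical cosine rule that applies when we live on the surface of a sphere. What does this spherical geometry look like?

Consider a spherical equilateral triangle, with all three sides equal to $\delta$, where $0 < \delta < \pi$. Using the spherical cosine rule,
$$\cos\alpha = \frac{\cos\delta - \cos^2\delta}{\sin^2\delta} = \frac{\cos\delta}{1 + \cos\delta} = 1 - \frac{1}{1 + \cos\delta}.$$
Since $\cos\delta \le 1$, we have $\cos\alpha \le \frac{1}{2}$ and $\alpha \ge 60^\circ$. Equality would require $\cos\delta = 1$, i.e. $\delta = 0$, in which case the triangle is simply a point. So on a sphere, each angle of a (non-degenerate) equilateral triangle is greater than $60^\circ$, and the angle sum of such a triangle is greater than $180^\circ$.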

2.9 Geometry

2.9.1 Lines

Any line through a and parallel to t can be written as

x = a + λt.

By crossing both sides of the equation with t, we have

Theorem. The equation of a straight line through a and parallel to t is

(x − a) × t = 0 or x × t = a × t.

2.9.2 Plane

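To define a plane $\Pi$, we need a normal $\mathbf{n}$ to the plane and a fixed point $\mathbf{b}$. For any $\mathbf{x}\in\Pi$, the vector $\mathbf{x} - \mathbf{b}$ is contained in the plane and is thus normal to $\mathbf{n}$, i.e. $(\mathbf{x} - \mathbf{b})\cdot\mathbf{n} = 0$.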

Theorem. The equation of a plane through b with normal n is given by

x · n = b · n.

If $\mathbf{n} = \hat{\mathbf{n}}$ is a unit normal, then $d = \mathbf{x}\cdot\hat{\mathbf{n}} = \mathbf{b}\cdot\hat{\mathbf{n}}$ is the (signed) perpendicular distance from the origin to $\Pi$.

Alternatively, if a, b, c lie in the plane, then the equation of the plane is

(x − a) · [(b − a) × (c − a)] = 0.

Example.

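(i) Consider the intersection between a line $\mathbf{x}\times\mathbf{t} = \mathbf{a}\times\mathbf{t}$ with the plane $\mathbf{x}\cdot\mathbf{n} = \mathbf{b}\cdot\mathbf{n}$. Cross $\mathbf{n}$ on the right with the line equation to obtain

$$(\mathbf{x}\cdot\mathbf{n})\mathbf{t} - (\mathbf{t}\cdot\mathbf{n})\mathbf{x} = (\mathbf{a}\times\mathbf{t})\times\mathbf{n}.$$

Eliminate $\mathbf{x}\cdot\mathbf{n}$ using $\mathbf{x}\cdot\mathbf{n} = \mathbf{b}\cdot\mathbf{n}$:

$$(\mathbf{t}\cdot\mathbf{n})\mathbf{x} = (\mathbf{b}\cdot\mathbf{n})\mathbf{t} - (\mathbf{a}\times\mathbf{t})\times\mathbf{n}.$$

Provided $\mathbf{t}\cdot\mathbf{n}$ is non-zero, the point of intersection is

$$\mathbf{x} = \frac{(\mathbf{b}\cdot\mathbf{n})\mathbf{t} - (\mathbf{a}\times\mathbf{t})\times\mathbf{n}}{\mathbf{t}\cdot\mathbf{n}}.$$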

Exercise: what if t · n = 0?
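The intersection formula is easy to sanity-check numerically. Below is a minimal sketch in plain Python (vectors as 3-tuples); the helper names `dot`, `cross` and `line_plane_intersection` are illustrative, not from the notes. It evaluates $\mathbf{x} = [(\mathbf{b}\cdot\mathbf{n})\mathbf{t} - (\mathbf{a}\times\mathbf{t})\times\mathbf{n}]/(\mathbf{t}\cdot\mathbf{n})$ and simply rejects the case $\mathbf{t}\cdot\mathbf{n} = 0$ left as an exercise.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def line_plane_intersection(a, t, b, n):
    """Intersect the line x × t = a × t with the plane x · n = b · n."""
    tn = dot(t, n)
    if tn == 0:
        raise ValueError("t · n = 0: the line is parallel to the plane")
    axt_n = cross(cross(a, t), n)  # (a × t) × n
    bn = dot(b, n)
    return tuple((bn * ti - ci) / tn for ti, ci in zip(t, axt_n))

# The line through (1, 2, 0) along (0, 0, 1) meets the plane z = 5.
x = line_plane_intersection(a=(1, 2, 0), t=(0, 0, 1), b=(0, 0, 5), n=(0, 0, 1))
print(x)  # (1.0, 2.0, 5.0)
```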

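(ii) Shortest distance between two lines. Let $L_1$ be $(\mathbf{x} - \mathbf{a}_1)\times\mathbf{t}_1 = \mathbf{0}$ and $L_2$ be $(\mathbf{x} - \mathbf{a}_2)\times\mathbf{t}_2 = \mathbf{0}$, and assume the lines are not parallel, so that $\mathbf{t}_1\times\mathbf{t}_2\neq\mathbf{0}$.

The distance of closest approach $s$ is along a line perpendicular to both $L_1$ and $L_2$, i.e. the line of closest approach is perpendicular to both lines and thus parallel to $\mathbf{t}_1\times\mathbf{t}_2$. The distance $s$ can then be found by projecting $\mathbf{a}_1 - \mathbf{a}_2$ onto $\mathbf{t}_1\times\mathbf{t}_2$. Thus

$$s = \left|(\mathbf{a}_1 - \mathbf{a}_2)\cdot\frac{\mathbf{t}_1\times\mathbf{t}_2}{|\mathbf{t}_1\times\mathbf{t}_2|}\right|.$$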
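A minimal sketch of the closest-approach formula, assuming the two lines are not parallel (so $\mathbf{t}_1\times\mathbf{t}_2\neq\mathbf{0}$); `line_distance` and its helpers are hypothetical names used only for illustration.

```python
import math

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def line_distance(a1, t1, a2, t2):
    """s = |(a1 - a2) · (t1 × t2)| / |t1 × t2| for the lines
    (x - a1) × t1 = 0 and (x - a2) × t2 = 0 (assumed non-parallel)."""
    m = cross(t1, t2)
    norm = math.sqrt(dot(m, m))
    if norm == 0:
        raise ValueError("t1 × t2 = 0: the lines are parallel")
    diff = tuple(p - q for p, q in zip(a1, a2))
    return abs(dot(diff, m)) / norm

# The x-axis and the line through (0, 0, 3) parallel to the y-axis.
print(line_distance((0, 0, 0), (1, 0, 0), (0, 0, 3), (0, 1, 0)))  # 3.0
```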



2.10 Vector equations

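Example. $\mathbf{x} - (\mathbf{x}\times\mathbf{a})\times\mathbf{b} = \mathbf{c}$. Strategy: take the dot or cross of the equation with suitable vectors. The equation can be expanded to form

$$\mathbf{x} - (\mathbf{x}\cdot\mathbf{b})\mathbf{a} + (\mathbf{a}\cdot\mathbf{b})\mathbf{x} = \mathbf{c}.$$

Dot this with $\mathbf{b}$ to obtain

$$\mathbf{x}\cdot\mathbf{b} - (\mathbf{x}\cdot\mathbf{b})(\mathbf{a}\cdot\mathbf{b}) + (\mathbf{a}\cdot\mathbf{b})(\mathbf{x}\cdot\mathbf{b}) = \mathbf{c}\cdot\mathbf{b}\quad\Rightarrow\quad\mathbf{x}\cdot\mathbf{b} = \mathbf{c}\cdot\mathbf{b}.$$

Substituting this into the expanded equation, we have

$$\mathbf{x}(1 + \mathbf{a}\cdot\mathbf{b}) = \mathbf{c} + (\mathbf{c}\cdot\mathbf{b})\mathbf{a}.$$

If $(1 + \mathbf{a}\cdot\mathbf{b})$ is non-zero, then

$$\mathbf{x} = \frac{\mathbf{c} + (\mathbf{c}\cdot\mathbf{b})\mathbf{a}}{1 + \mathbf{a}\cdot\mathbf{b}}.$$

Otherwise, when $(1 + \mathbf{a}\cdot\mathbf{b}) = 0$, if $\mathbf{c} + (\mathbf{c}\cdot\mathbf{b})\mathbf{a}\neq\mathbf{0}$, then a contradiction is reached. Otherwise, $\mathbf{x}\cdot\mathbf{b} = \mathbf{c}\cdot\mathbf{b}$ is the most general solution, which is a plane of solutions.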
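The closed-form solution can be checked by substituting it back into the original equation. This sketch (hypothetical helper names, plain Python floats) only covers the case $1 + \mathbf{a}\cdot\mathbf{b}\neq 0$; the degenerate case is reported as an error rather than solved.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def solve_vector_equation(a, b, c):
    """Solve x - (x × a) × b = c via x = (c + (c · b) a) / (1 + a · b).
    Only the case 1 + a · b != 0 is handled."""
    denom = 1 + dot(a, b)
    if denom == 0:
        raise ValueError("1 + a · b = 0: no solution or a whole plane of solutions")
    cb = dot(c, b)
    return tuple((ci + cb * ai) / denom for ai, ci in zip(a, c))

def lhs(x, a, b):
    """Evaluate x - (x × a) × b directly, to check a candidate solution."""
    return tuple(p - q for p, q in zip(x, cross(cross(x, a), b)))

a, b, c = (1, 2, 3), (1, 0, 1), (4, 5, 6)
x = solve_vector_equation(a, b, c)
print([round(v, 9) for v in lhs(x, a, b)])  # [4.0, 5.0, 6.0]
```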

3 Linear maps

A linear map is a special type of function between vector spaces. In fact, most

of the time, these are the only functions we actually care about. They are maps

that satisfy the property $f(\lambda\mathbf{a} + \mu\mathbf{b}) = \lambda f(\mathbf{a}) + \mu f(\mathbf{b})$.

We will first look at two important examples of linear maps — rotations and

reflections, and then study their properties formally.

3.1 Examples

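3.1.1 Rotation in $\mathbb{R}^3$

In $\mathbb{R}^3$, first consider the simple cases where we rotate about the $z$ axis by $\theta$. We call this rotation $R$ and write $\mathbf{x}' = R(\mathbf{x})$.

Suppose that initially, $\mathbf{x} = (x, y, z) = (r\cos\varphi, r\sin\varphi, z)$. Then after a rotation by $\theta$, we get

$$\begin{aligned}\mathbf{x}' &= (r\cos(\varphi + \theta), r\sin(\varphi + \theta), z)\\ &= (r\cos\varphi\cos\theta - r\sin\varphi\sin\theta, r\sin\varphi\cos\theta + r\cos\varphi\sin\theta, z)\\ &= (x\cos\theta - y\sin\theta, x\sin\theta + y\cos\theta, z).\end{aligned}$$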

We can represent this by a matrix $R$ such that $x'_i = R_{ij}x_j$. Using our formula above, we obtain

$$R = \begin{pmatrix}\cos\theta & -\sin\theta & 0\\ \sin\theta & \cos\theta & 0\\ 0 & 0 & 1\end{pmatrix}$$

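Now consider the general case where we rotate by $\theta$ about $\hat{\mathbf{n}}$.

[Figure: the point $A$ with position vector $\mathbf{x}$ is rotated about the axis $\hat{\mathbf{n}}$ through $O$ to $A'$ with position vector $\mathbf{x}'$.]

Here $B$ is the foot of the perpendicular from $A$ to the axis, and $C$ is the foot of the perpendicular from $A'$ to the line $BA$. We have $\mathbf{x}' = \overrightarrow{OB} + \overrightarrow{BC} + \overrightarrow{CA'}$. We know that

$$\begin{aligned}\overrightarrow{OB} &= (\hat{\mathbf{n}}\cdot\mathbf{x})\hat{\mathbf{n}}\\ \overrightarrow{BC} &= \overrightarrow{BA}\cos\theta = (\overrightarrow{BO} + \overrightarrow{OA})\cos\theta = (-(\hat{\mathbf{n}}\cdot\mathbf{x})\hat{\mathbf{n}} + \mathbf{x})\cos\theta.\end{aligned}$$

Finally, to get $\overrightarrow{CA'}$, we know that

$$|\overrightarrow{CA'}| = |\overrightarrow{BA'}|\sin\theta = |\overrightarrow{BA}|\sin\theta = |\hat{\mathbf{n}}\times\mathbf{x}|\sin\theta.$$

Also, $\overrightarrow{CA'}$ is parallel to $\hat{\mathbf{n}}\times\mathbf{x}$. So we must have $\overrightarrow{CA'} = (\hat{\mathbf{n}}\times\mathbf{x})\sin\theta$.

Thus $\mathbf{x}' = \mathbf{x}\cos\theta + (1 - \cos\theta)(\hat{\mathbf{n}}\cdot\mathbf{x})\hat{\mathbf{n}} + \hat{\mathbf{n}}\times\mathbf{x}\sin\theta$. In components,

$$x'_i = x_i\cos\theta + (1 - \cos\theta)n_jx_jn_i - \varepsilon_{ijk}x_jn_k\sin\theta.$$

We want to find an $R$ such that $x'_i = R_{ij}x_j$. So

$$R_{ij} = \delta_{ij}\cos\theta + (1 - \cos\theta)n_in_j - \varepsilon_{ijk}n_k\sin\theta.$$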
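The suffix formula for $R_{ij}$ translates directly into code. The sketch below uses Python with 0-based indices (so the axes are numbered 0, 1, 2); `rotation_matrix`, `levi_civita` and `apply` are illustrative names, and the axis is assumed to be a unit vector. Taking $\hat{\mathbf{n}} = \hat{\mathbf{z}}$ should reproduce the matrix for rotation about the $z$ axis.

```python
import math

def levi_civita(i, j, k):
    """epsilon_ijk for indices in {0, 1, 2}: +1, -1 or 0."""
    return (i - j) * (j - k) * (k - i) // 2

def rotation_matrix(theta, n):
    """R_ij = delta_ij cos(theta) + (1 - cos(theta)) n_i n_j - epsilon_ijk n_k sin(theta).
    Assumes n is a unit vector."""
    c, s = math.cos(theta), math.sin(theta)
    return [[(c if i == j else 0.0)
             + (1 - c) * n[i] * n[j]
             - s * sum(levi_civita(i, j, k) * n[k] for k in range(3))
             for j in range(3)]
            for i in range(3)]

def apply(R, x):
    """Matrix-vector product x'_i = R_ij x_j."""
    return [sum(R[i][j] * x[j] for j in range(3)) for i in range(3)]

theta = 0.3
Rz = [[math.cos(theta), -math.sin(theta), 0.0],
      [math.sin(theta), math.cos(theta), 0.0],
      [0.0, 0.0, 1.0]]
R = rotation_matrix(theta, (0.0, 0.0, 1.0))
# Only floating-point rounding should separate R from Rz.
print(max(abs(R[i][j] - Rz[i][j]) for i in range(3) for j in range(3)))
# A quarter turn about z carries e_x to (approximately) e_y.
print(apply(rotation_matrix(math.pi / 2, (0.0, 0.0, 1.0)), [1.0, 0.0, 0.0]))
```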

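3.1.2 Reflection in $\mathbb{R}^3$

Suppose we want to reflect through a plane through $O$ with normal $\hat{\mathbf{n}}$. First of all the projection of $\mathbf{x}$ onto $\hat{\mathbf{n}}$ is given by $(\mathbf{x}\cdot\hat{\mathbf{n}})\hat{\mathbf{n}}$. So we get

$$\mathbf{x}' = \mathbf{x} - 2(\mathbf{x}\cdot\hat{\mathbf{n}})\hat{\mathbf{n}}.$$

In suffix notation, we have $x'_i = x_i - 2x_jn_jn_i$. So our reflection matrix is $R_{ij} = \delta_{ij} - 2n_in_j$.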
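A short sketch of the reflection matrix $R_{ij} = \delta_{ij} - 2n_in_j$, assuming a unit normal (the function names are illustrative). Reflecting in the plane $z = 0$ should just flip the sign of the $z$ component.

```python
def reflection_matrix(n):
    """R_ij = delta_ij - 2 n_i n_j, for a unit normal n."""
    return [[(1.0 if i == j else 0.0) - 2 * n[i] * n[j] for j in range(3)]
            for i in range(3)]

def apply(R, x):
    """Matrix-vector product x'_i = R_ij x_j."""
    return [sum(R[i][j] * x[j] for j in range(3)) for i in range(3)]

R = reflection_matrix((0.0, 0.0, 1.0))   # reflect in the plane z = 0
print(apply(R, [1.0, 2.0, 3.0]))         # [1.0, 2.0, -3.0]
```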

3.2 Linear Maps

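Definition (Domain, codomain and image of map). Consider sets $A$ and $B$ and mapping $T: A\to B$ such that each $\mathbf{x}\in A$ is mapped into a unique $\mathbf{x}' = T(\mathbf{x})\in B$. $A$ is the domain of $T$ and $B$ is the co-domain of $T$. Typically, we have $T: \mathbb{R}^n\to\mathbb{R}^m$ or $T: \mathbb{C}^n\to\mathbb{C}^m$.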

Definition (Linear map). Let $V, W$ be real (or complex) vector spaces, and $T: V\to W$. Then $T$ is a linear map if

(i) T (a + b) = T (a) + T (b) for all a, b ∈ V .

(ii) T (λa) = λT (a) for all λ ∈ R (or C).

Equivalently, we have T (λa + µb) = λT (a) + µT (b).

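Example.

(i) Consider a translation $T: \mathbb{R}^n\to\mathbb{R}^n$ with $T(\mathbf{x}) = \mathbf{x} + \mathbf{a}$ for some fixed, given $\mathbf{a}$. This is not a linear map since

$$T(\lambda\mathbf{x} + \mu\mathbf{y}) = \lambda\mathbf{x} + \mu\mathbf{y} + \mathbf{a}\neq\lambda\mathbf{x} + \mu\mathbf{y} + (\lambda + \mu)\mathbf{a} = \lambda T(\mathbf{x}) + \mu T(\mathbf{y})$$

in general (whenever $\mathbf{a}\neq\mathbf{0}$ and $\lambda + \mu\neq 1$).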

(ii) Rotation, reflection and projection are linear transformations.

Definition (Image and kernel of map). The image of a map $f: U\to V$ is the subset of $V$ $\{f(\mathbf{u}): \mathbf{u}\in U\}$. The kernel is the subset of $U$ $\{\mathbf{u}\in U: f(\mathbf{u}) = \mathbf{0}\}$.

Example.

(i) Consider $S: \mathbb{R}^3\to\mathbb{R}^2$ with $S(x, y, z) = (x + y, 2x - z)$. Simple yet tedious algebra shows that this is linear. Now consider the effect of $S$ on the standard basis. $S(1, 0, 0) = (1, 2)$, $S(0, 1, 0) = (1, 0)$ and $S(0, 0, 1) = (0, -1)$. Clearly these are linearly dependent, but they do span the whole of $\mathbb{R}^2$. We can say $S(\mathbb{R}^3) = \mathbb{R}^2$. So the image is $\mathbb{R}^2$.

Now solve $S(x, y, z) = \mathbf{0}$. We need $x + y = 0$ and $2x - z = 0$. Thus $\mathbf{x} = (x, -x, 2x)$, i.e. it is parallel to $(1, -1, 2)$. So the set $\{\lambda(1, -1, 2): \lambda\in\mathbb{R}\}$ is the kernel of $S$.

(ii) Consider a rotation in $\mathbb{R}^3$. The kernel is the zero vector and the image is $\mathbb{R}^3$.

(iii) Consider a projection of $\mathbb{R}^3$ onto a plane with normal $\hat{\mathbf{n}}$. The image is the plane itself, and the kernel is any vector parallel to $\hat{\mathbf{n}}$.

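Theorem. Consider a linear map $f: U\to V$, where $U, V$ are vector spaces. Then $\operatorname{im}(f)$ is a subspace of $V$, and $\ker(f)$ is a subspace of $U$.

Proof. Both are non-empty since $f(\mathbf{0}) = \mathbf{0}$.

If $\mathbf{x}, \mathbf{y}\in\operatorname{im}(f)$, then $\exists\mathbf{a}, \mathbf{b}\in U$ such that $\mathbf{x} = f(\mathbf{a})$, $\mathbf{y} = f(\mathbf{b})$. Then $\lambda\mathbf{x} + \mu\mathbf{y} = \lambda f(\mathbf{a}) + \mu f(\mathbf{b}) = f(\lambda\mathbf{a} + \mu\mathbf{b})$. Now $\lambda\mathbf{a} + \mu\mathbf{b}\in U$ since $U$ is a vector space, so there is an element in $U$ that maps to $\lambda\mathbf{x} + \mu\mathbf{y}$. So $\lambda\mathbf{x} + \mu\mathbf{y}\in\operatorname{im}(f)$ and $\operatorname{im}(f)$ is a subspace of $V$.

Suppose $\mathbf{x}, \mathbf{y}\in\ker(f)$, i.e. $f(\mathbf{x}) = f(\mathbf{y}) = \mathbf{0}$. Then $f(\lambda\mathbf{x} + \mu\mathbf{y}) = \lambda f(\mathbf{x}) + \mu f(\mathbf{y}) = \lambda\mathbf{0} + \mu\mathbf{0} = \mathbf{0}$. Therefore $\lambda\mathbf{x} + \mu\mathbf{y}\in\ker(f)$.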

3.3 Rank and nullity

Definition (Rank of linear map). The rank of a linear map $f: U\to V$, denoted by $r(f)$, is the dimension of the image of $f$.

Definition (Nullity of linear map). The nullity of $f$, denoted $n(f)$, is the dimension of the kernel of $f$.

Example. For the projection onto a plane in $\mathbb{R}^3$, the image is the whole plane and the rank is 2. The kernel is a line so the nullity is 1.

Theorem (Rank-nullity theorem). For a linear map f : U → V ,

$$r(f) + n(f) = \dim(U).$$

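Proof. (Non-examinable) Write $\dim(U) = n$ and $n(f) = m$. If $m = n$, then $f$ is the zero map, and the proof is trivial, since $r(f) = 0$. Otherwise, assume $m < n$.

Suppose $\{\mathbf{e}_1, \mathbf{e}_2, \cdots, \mathbf{e}_m\}$ is a basis of $\ker f$. Extend this to a basis of the whole of $U$ to get $\{\mathbf{e}_1, \mathbf{e}_2, \cdots, \mathbf{e}_m, \mathbf{e}_{m+1}, \cdots, \mathbf{e}_n\}$. To prove the theorem, we need to prove that $\{f(\mathbf{e}_{m+1}), f(\mathbf{e}_{m+2}), \cdots, f(\mathbf{e}_n)\}$ is a basis of $\operatorname{im}(f)$.

(i) First show that it spans $\operatorname{im}(f)$. Take $\mathbf{y}\in\operatorname{im}(f)$. Thus $\exists\mathbf{x}\in U$ such that $\mathbf{y} = f(\mathbf{x})$. Then

$$\mathbf{y} = f(\alpha_1\mathbf{e}_1 + \alpha_2\mathbf{e}_2 + \cdots + \alpha_n\mathbf{e}_n),$$

since $\mathbf{e}_1, \cdots, \mathbf{e}_n$ is a basis of $U$. Thus

$$\mathbf{y} = \alpha_1f(\mathbf{e}_1) + \alpha_2f(\mathbf{e}_2) + \cdots + \alpha_mf(\mathbf{e}_m) + \alpha_{m+1}f(\mathbf{e}_{m+1}) + \cdots + \alpha_nf(\mathbf{e}_n).$$

The first $m$ terms map to $\mathbf{0}$, since $\mathbf{e}_1, \cdots, \mathbf{e}_m$ is the basis of the kernel of $f$. Thus

$$\mathbf{y} = \alpha_{m+1}f(\mathbf{e}_{m+1}) + \cdots + \alpha_nf(\mathbf{e}_n).$$

(ii) To show that they are linearly independent, suppose

$$\alpha_{m+1}f(\mathbf{e}_{m+1}) + \cdots + \alpha_nf(\mathbf{e}_n) = \mathbf{0}.$$

Then

$$f(\alpha_{m+1}\mathbf{e}_{m+1} + \cdots + \alpha_n\mathbf{e}_n) = \mathbf{0}.$$

Thus $\alpha_{m+1}\mathbf{e}_{m+1} + \cdots + \alpha_n\mathbf{e}_n\in\ker(f)$. Since $\{\mathbf{e}_1, \cdots, \mathbf{e}_m\}$ span $\ker(f)$, there exist some $\alpha_1, \alpha_2, \cdots, \alpha_m$ such that

$$\alpha_{m+1}\mathbf{e}_{m+1} + \cdots + \alpha_n\mathbf{e}_n = \alpha_1\mathbf{e}_1 + \cdots + \alpha_m\mathbf{e}_m.$$

But $\mathbf{e}_1, \cdots, \mathbf{e}_n$ is a basis of $U$, so these vectors are linearly independent. So $\alpha_i = 0$ for all $i$. Then the only solution to the equation $\alpha_{m+1}f(\mathbf{e}_{m+1}) + \cdots + \alpha_nf(\mathbf{e}_n) = \mathbf{0}$ is $\alpha_i = 0$, and they are linearly independent by definition.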

Example. Calculate the kernel and image of $f: \mathbb{R}^3\to\mathbb{R}^3$, defined by $f(x, y, z) = (x + y + z, 2x - y + 5z, x + 2z)$.

First find the kernel: we've got the system of equations:

$$\begin{aligned}x + y + z &= 0\\ 2x - y + 5z &= 0\\ x + 2z &= 0\end{aligned}$$

Note that the first and second equations add to give $3x + 6z = 0$, which is identical to the third. Then using the first and third equations, we have $x = -2z$ and $y = -x - z = z$. So the kernel is any vector in the form $(-2z, z, z)$ and is the span of $(-2, 1, 1)$.

To find the image, extend the basis of $\ker(f)$ to a basis of the whole of $\mathbb{R}^3$: $\{(-2, 1, 1), (0, 1, 0), (0, 0, 1)\}$. Apply $f$ to this basis to obtain $(0, 0, 0)$, $(1, -1, 0)$ and $(1, 5, 2)$. From the proof of the rank-nullity theorem, we know that $f(0, 1, 0)$ and $f(0, 0, 1)$ is a basis of the image.

To get the standard form of the image, we know that the normal to the plane is parallel to $(1, -1, 0)\times(1, 5, 2) = (-2, -2, 6)$, i.e. to $(1, 1, -3)$. Since $\mathbf{0}\in\operatorname{im}(f)$, the equation of the plane is $x + y - 3z = 0$.
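The kernel and image found above can be confirmed numerically. This is a quick check rather than a general algorithm: it evaluates $f$ on the kernel vector and on a small grid of integer points, using a hand-written `cross` helper (the names are illustrative).

```python
def f(x, y, z):
    return (x + y + z, 2 * x - y + 5 * z, x + 2 * z)

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def on_image_plane(p):
    """True if p satisfies x + y - 3z = 0."""
    return p[0] + p[1] - 3 * p[2] == 0

print(f(-2, 1, 1))            # (0, 0, 0), so (-2, 1, 1) is in the kernel
u, v = f(0, 1, 0), f(0, 0, 1)
print(u, v)                   # (1, -1, 0) (1, 5, 2)
print(cross(u, v))            # (-2, -2, 6), parallel to (1, 1, -3)
grid = range(-3, 4)
print(all(on_image_plane(f(x, y, z)) for x in grid for y in grid for z in grid))  # True
```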

3.4 Matrices

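In the examples above, we have represented our linear maps by some object $R$ such that $x'_i = R_{ij}x_j$. We call $R$ the matrix for the linear map. In general, let $\alpha: \mathbb{R}^n\to\mathbb{R}^m$ be a linear map, and $\mathbf{x}' = \alpha(\mathbf{x})$.

Let $\{\mathbf{e}_i\}$ be a basis of $\mathbb{R}^n$. Then $\mathbf{x} = x_j\mathbf{e}_j$ for some $x_j$. Then we get

$$\mathbf{x}' = \alpha(x_j\mathbf{e}_j) = x_j\alpha(\mathbf{e}_j).$$

So we get that

$$x'_i = [\alpha(\mathbf{e}_j)]_ix_j.$$

We now define $A_{ij} = [\alpha(\mathbf{e}_j)]_i$. Then $x'_i = A_{ij}x_j$. We write

$$A = \{A_{ij}\} = \begin{pmatrix}A_{11} & \cdots & A_{1n}\\ \vdots & A_{ij} & \vdots\\ A_{m1} & \cdots & A_{mn}\end{pmatrix}$$

Here $A_{ij}$ is the entry in the $i$th row of the $j$th column. We say that $A$ is an $m\times n$ matrix, and write $\mathbf{x}' = A\mathbf{x}$.

We see that the columns of the matrix are the images of the standard basis vectors under the mapping $\alpha$.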
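Since the columns of the matrix are the images of the standard basis vectors, the matrix of a linear map can be read off by evaluating it on $\mathbf{e}_1, \cdots, \mathbf{e}_n$. A minimal sketch using the map $S(x, y, z) = (x + y, 2x - z)$ from the earlier example; `matrix_of` and `matvec` are hypothetical helper names, and indices are 0-based.

```python
def S(v):
    x, y, z = v
    return (x + y, 2 * x - z)

def matrix_of(linear_map, n):
    """Matrix of a linear map on R^n: column j is the image of the j-th basis vector."""
    basis = [tuple(1 if i == j else 0 for i in range(n)) for j in range(n)]
    columns = [linear_map(e) for e in basis]
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def matvec(A, x):
    """x'_i = A_ij x_j."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)

A = matrix_of(S, 3)
print(A)                                      # [[1, 1, 0], [2, 0, -1]]
print(matvec(A, (3, -1, 4)), S((3, -1, 4)))   # (2, 2) (2, 2)
```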

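3.4.1 Examples

Example.

(i) In $\mathbb{R}^2$, consider a reflection in a line with an angle $\theta$ to the $x$ axis. We know that $\mathbf{i}\mapsto\cos 2\theta\,\mathbf{i} + \sin 2\theta\,\mathbf{j}$, with $\mathbf{j}\mapsto -\cos 2\theta\,\mathbf{j} + \sin 2\theta\,\mathbf{i}$. Then the matrix is

$$\begin{pmatrix}\cos 2\theta & \sin 2\theta\\ \sin 2\theta & -\cos 2\theta\end{pmatrix}.$$

(ii) In $\mathbb{R}^3$, as we've previously seen, a rotation by $\theta$ about the $z$ axis is given by

$$R = \begin{pmatrix}\cos\theta & -\sin\theta & 0\\ \sin\theta & \cos\theta & 0\\ 0 & 0 & 1\end{pmatrix}.$$

(iii) In $\mathbb{R}^3$, a reflection in a plane with normal $\hat{\mathbf{n}}$ is given by $R_{ij} = \delta_{ij} - 2\hat{n}_i\hat{n}_j$. Written as a matrix, we have

$$\begin{pmatrix}1 - 2\hat{n}_1^2 & -2\hat{n}_1\hat{n}_2 & -2\hat{n}_1\hat{n}_3\\ -2\hat{n}_2\hat{n}_1 & 1 - 2\hat{n}_2^2 & -2\hat{n}_2\hat{n}_3\\ -2\hat{n}_3\hat{n}_1 & -2\hat{n}_3\hat{n}_2 & 1 - 2\hat{n}_3^2\end{pmatrix}.$$

(iv) A dilation (“stretching”) $\mathbb{R}^3\to\mathbb{R}^3$ is given by a map $(x, y, z)\mapsto(\lambda x, \mu y, \nu z)$ for some $\lambda, \mu, \nu$. The matrix is

$$\begin{pmatrix}\lambda & 0 & 0\\ 0 & \mu & 0\\ 0 & 0 & \nu\end{pmatrix}.$$

(v) Shear: Consider $S: \mathbb{R}^3\to\mathbb{R}^3$ that shears in the $x$ direction.

[Figure: shear in the $x$ direction]

We have $(x, y, z)\mapsto(x + \lambda y, y, z)$. Then

$$S = \begin{pmatrix}1 & \lambda & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}.$$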

3.4.2 Matrix Algebra

This part consists mostly of definitions, saying what we can do with matrices and classifying them into different types.

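Definition (Addition of matrices). Consider two linear maps $\alpha, \beta: \mathbb{R}^n\to\mathbb{R}^m$. The sum of $\alpha$ and $\beta$ is defined by

$$(\alpha + \beta)(\mathbf{x}) = \alpha(\mathbf{x}) + \beta(\mathbf{x}).$$

In terms of the matrix, we have

$$(A + B)_{ij}x_j = A_{ij}x_j + B_{ij}x_j,$$

or

$$(A + B)_{ij} = A_{ij} + B_{ij}.$$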

Definition (Scalar multiplication of matrices). Define $(\lambda\alpha)\mathbf{x} = \lambda[\alpha(\mathbf{x})]$. So $(\lambda A)_{ij} = \lambda A_{ij}$.

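Definition (Matrix multiplication). Consider maps $\alpha: \mathbb{R}^\ell\to\mathbb{R}^n$ and $\beta: \mathbb{R}^n\to\mathbb{R}^m$. The composition is $\beta\alpha: \mathbb{R}^\ell\to\mathbb{R}^m$. Take $\mathbf{x}\in\mathbb{R}^\ell\mapsto\mathbf{x}''\in\mathbb{R}^m$. Then $\mathbf{x}'' = (BA)\mathbf{x} = B\mathbf{x}'$, where $\mathbf{x}' = A\mathbf{x}$. Using suffix notation, we have

$$x''_i = (B\mathbf{x}')_i = B_{ik}x'_k = B_{ik}A_{kj}x_j.$$

But $x''_i = (BA)_{ij}x_j$. So

$$(BA)_{ij} = B_{ik}A_{kj}.$$

Generally, an $m\times n$ matrix multiplied by an $n\times\ell$ matrix gives an $m\times\ell$ matrix. $(BA)_{ij}$ is given by the $i$th row of $B$ dotted with the $j$th column of $A$.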

Note that the number of columns of $B$ has to be equal to the number of rows of $A$ for multiplication to be defined. If $\ell = m$ as well, then both $AB$ and $BA$ make sense, but $AB\neq BA$ in general. In fact, they don't even have to have the same dimensions.

Also, since function composition is associative, we get A(BC) = (AB)C.
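The suffix formula $(BA)_{ij} = B_{ik}A_{kj}$ gives a direct implementation of matrix multiplication. The sketch below uses plain nested lists (the function name is illustrative) and shows that $AB\neq BA$ for a simple pair of $2\times 2$ matrices.

```python
def matmul(B, A):
    """(BA)_ij = B_ik A_kj; needs (columns of B) == (rows of A)."""
    if len(B[0]) != len(A):
        raise ValueError("columns of B must equal rows of A")
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))]
            for i in range(len(B))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(matmul(A, B))   # [[2, 1], [4, 3]]
print(matmul(B, A))   # [[3, 4], [1, 2]], so AB != BA
```

A $2\times 3$ matrix times a $3\times 1$ matrix gives a $2\times 1$ matrix, while the product in the other order raises an error, as the shape rule above predicts.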

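Definition (Transpose of matrix). If $A$ is an $m\times n$ matrix, the transpose $A^T$ is an $n\times m$ matrix defined by $(A^T)_{ij} = A_{ji}$.

Proposition.

(i) $(A^T)^T = A$.

(ii) If $\mathbf{x}$ is a column vector $\begin{pmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{pmatrix}$, then $\mathbf{x}^T$ is a row vector $(x_1\ x_2\ \cdots\ x_n)$.

(iii) $(AB)^T = B^TA^T$ since $(AB)^T_{ij} = (AB)_{ji} = A_{jk}B_{ki} = B_{ki}A_{jk} = (B^T)_{ik}(A^T)_{kj} = (B^TA^T)_{ij}$.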

Definition (Hermitian conjugate). Define $A^\dagger = (A^T)^*$. Similarly, $(AB)^\dagger = B^\dagger A^\dagger$.

Definition (Symmetric matrix). A matrix is symmetric if $A^T = A$.

Definition (Hermitian matrix). A matrix is Hermitian if $A^\dagger = A$. (The diagonal of a Hermitian matrix must be real.)

Definition (Anti/skew symmetric matrix). A matrix is anti-symmetric or skew symmetric if $A^T = -A$. The diagonals are all zero.

Definition (Skew-Hermitian matrix). A matrix is skew-Hermitian if $A^\dagger = -A$. The diagonals are pure imaginary.

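Definition (Trace of matrix). The trace of an $n\times n$ matrix $A$ is the sum of the diagonal: $\operatorname{tr}(A) = A_{ii}$.

Example. Consider the reflection matrix $R_{ij} = \delta_{ij} - 2\hat{n}_i\hat{n}_j$. We have $\operatorname{tr}(R) = R_{ii} = 3 - 2\hat{\mathbf{n}}\cdot\hat{\mathbf{n}} = 3 - 2 = 1$.

Proposition. $\operatorname{tr}(BC) = \operatorname{tr}(CB)$.

Proof. $\operatorname{tr}(BC) = B_{ik}C_{ki} = C_{ki}B_{ik} = (CB)_{kk} = \operatorname{tr}(CB)$.

Definition (Identity matrix). $I = \delta_{ij}$.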

3.4.3 Decomposition of an n × n matrix

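Any $n\times n$ matrix $B$ can be split as a sum of symmetric and antisymmetric parts. Write

$$B_{ij} = \underbrace{\tfrac{1}{2}(B_{ij} + B_{ji})}_{S_{ij}} + \underbrace{\tfrac{1}{2}(B_{ij} - B_{ji})}_{A_{ij}}.$$

We have $S_{ij} = S_{ji}$, so $S$ is symmetric, while $A_{ji} = -A_{ij}$, and $A$ is antisymmetric. So $B = S + A$.

Furthermore, we can decompose $S$ into an isotropic part (a scalar multiple of the identity) plus a trace-less part (i.e. sum of diagonal $= 0$). Write

$$S_{ij} = \underbrace{\tfrac{1}{n}\operatorname{tr}(S)\delta_{ij}}_{\text{isotropic part}} + \underbrace{\left(S_{ij} - \tfrac{1}{n}\operatorname{tr}(S)\delta_{ij}\right)}_{T_{ij}}.$$

We have $\operatorname{tr}(T) = T_{ii} = S_{ii} - \frac{1}{n}\operatorname{tr}(S)\delta_{ii} = \operatorname{tr}(S) - \frac{1}{n}\operatorname{tr}(S)(n) = 0$.

Putting all these together, and noting that $\operatorname{tr}(S) = \operatorname{tr}(B)$,

$$B = \frac{1}{n}\operatorname{tr}(B)I + \left[\frac{1}{2}(B + B^T) - \frac{1}{n}\operatorname{tr}(B)I\right] + \frac{1}{2}(B - B^T).$$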

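In three dimensions, we can write the antisymmetric part $A$ in terms of a single vector: we have

$$A = \begin{pmatrix}0 & a & -b\\ -a & 0 & c\\ b & -c & 0\end{pmatrix}$$

and we can consider

$$\varepsilon_{ijk}\omega_k = \begin{pmatrix}0 & \omega_3 & -\omega_2\\ -\omega_3 & 0 & \omega_1\\ \omega_2 & -\omega_1 & 0\end{pmatrix}.$$

So if we have $\boldsymbol{\omega} = (c, b, a)$, then $A_{ij} = \varepsilon_{ijk}\omega_k$.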

This decomposition can be useful in certain physical applications. For

example, if the matrix represents the stress of a system, different parts of the

decomposition will correspond to different types of stresses.
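A small sketch of the decomposition, using exact `fractions.Fraction` arithmetic so that the checks are exact. `decompose` returns the isotropic, symmetric trace-free and antisymmetric parts, and `omega` reads off $\boldsymbol{\omega}$ from the antisymmetric part of a $3\times 3$ matrix via $\omega_1 = A_{23}$, $\omega_2 = A_{31}$, $\omega_3 = A_{12}$ (0-based in the code). The function names and the sample matrix are illustrative.

```python
from fractions import Fraction

def decompose(B):
    """Split B into isotropic, symmetric trace-free and antisymmetric parts."""
    n = len(B)
    tr = sum(B[i][i] for i in range(n))
    iso = [[Fraction(tr, n) if i == j else Fraction(0) for j in range(n)]
           for i in range(n)]
    sym_tf = [[Fraction(B[i][j] + B[j][i], 2) - iso[i][j] for j in range(n)]
              for i in range(n)]
    anti = [[Fraction(B[i][j] - B[j][i], 2) for j in range(n)] for i in range(n)]
    return iso, sym_tf, anti

def omega(A):
    """For a 3x3 antisymmetric A, the vector with A_ij = eps_ijk omega_k."""
    return (A[1][2], A[2][0], A[0][1])

B = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
iso, T, A = decompose(B)
# The entries of omega happen to be integers for this B.
print(tuple(int(w) for w in omega(A)))  # (-1, 2, -1)
```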

3.4.4 Matrix inverse

Definition (Inverse of matrix). Consider an $m\times n$ matrix $A$ and $n\times m$ matrices $B$ and $C$. If $BA = I$, then we say $B$ is the left inverse of $A$. If $AC = I$, then we say $C$ is the right inverse of $A$. If $A$ is square ($n\times n$) and has both a left inverse $B$ and a right inverse $C$, then $B = B(AC) = (BA)C = C$, i.e. the left and right inverses coincide. Both are denoted by $A^{-1}$, the inverse of $A$. Therefore we have

$$AA^{-1} = A^{-1}A = I.$$

Note that not all square matrices have inverses. For example, the zero matrix

clearly has no inverse.

Definition (Invertible matrix). If A has an inverse, then A is invertible.

Proposition. $(AB)^{-1} = B^{-1}A^{-1}$.

Proof. $(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}B = I$.

Definition (Orthogonal and unitary matrices). A real $n\times n$ matrix is orthogonal if $A^TA = AA^T = I$, i.e. $A^T = A^{-1}$. A complex $n\times n$ matrix is unitary if $U^\dagger U = UU^\dagger = I$, i.e. $U^\dagger = U^{-1}$.

Note that an orthogonal matrix $A$ satisfies $A_{ik}(A^T)_{kj} = \delta_{ij}$, i.e. $A_{ik}A_{jk} = \delta_{ij}$.

We can see this as saying “the scalar product of two distinct rows is 0, and the scalar product of a row with itself is 1”. Alternatively, the rows (and, by considering $A^TA$, the columns) of an orthogonal matrix form an orthonormal set.

Similarly, for a unitary matrix, $U_{ik}U^\dagger_{kj} = \delta_{ij}$, i.e. $U_{ik}U^*_{jk} = \delta_{ij}$, i.e. the rows are orthonormal, using the definition of complex scalar product.

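Example.

(i) The reflection in a plane is an orthogonal matrix. Since $R_{ij} = \delta_{ij} - 2n_in_j$, we have

$$\begin{aligned}R_{ik}R_{jk} &= (\delta_{ik} - 2n_in_k)(\delta_{jk} - 2n_jn_k)\\ &= \delta_{ij} - 2\delta_{jk}n_in_k - 2\delta_{ik}n_jn_k + 4n_in_kn_jn_k\\ &= \delta_{ij} - 2n_in_j - 2n_jn_i + 4n_in_j(n_kn_k)\\ &= \delta_{ij}.\end{aligned}$$

(ii) The rotation is an orthogonal matrix. We could multiply out using suffix notation, but it would be cumbersome to do so. Alternatively, denote the rotation matrix by $\theta$ about $\hat{\mathbf{n}}$ as $R(\theta, \hat{\mathbf{n}})$. Clearly, $R(\theta, \hat{\mathbf{n}})^{-1} = R(-\theta, \hat{\mathbf{n}})$. We have

$$\begin{aligned}R_{ij}(-\theta, \hat{\mathbf{n}}) &= (\cos\theta)\delta_{ij} + n_in_j(1 - \cos\theta) + \varepsilon_{ijk}n_k\sin\theta\\ &= (\cos\theta)\delta_{ji} + n_jn_i(1 - \cos\theta) - \varepsilon_{jik}n_k\sin\theta\\ &= R_{ji}(\theta, \hat{\mathbf{n}}).\end{aligned}$$

In other words, $R(-\theta, \hat{\mathbf{n}}) = R(\theta, \hat{\mathbf{n}})^T$. So $R(\theta, \hat{\mathbf{n}})^{-1} = R(\theta, \hat{\mathbf{n}})^T$.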
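The identity $R(-\theta, \hat{\mathbf{n}}) = R(\theta, \hat{\mathbf{n}})^T$ and the orthogonality $R^TR = I$ can be checked numerically from the suffix formula for $R_{ij}$. This is a floating-point sketch with an arbitrarily chosen unit axis $(1, 2, 2)/3$ and angle $0.8$; the helper names are illustrative.

```python
import math

def levi_civita(i, j, k):
    """epsilon_ijk for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def rotation_matrix(theta, n):
    """R_ij(theta, n) from the suffix formula; n is assumed to be a unit vector."""
    c, s = math.cos(theta), math.sin(theta)
    return [[(c if i == j else 0.0) + (1 - c) * n[i] * n[j]
             - s * sum(levi_civita(i, j, k) * n[k] for k in range(3))
             for j in range(3)]
            for i in range(3)]

def transpose(M):
    return [list(row) for row in zip(*M)]

def max_diff(P, Q):
    return max(abs(P[i][j] - Q[i][j]) for i in range(3) for j in range(3))

n = (1 / 3, 2 / 3, 2 / 3)
theta = 0.8
R = rotation_matrix(theta, n)
RtR = [[sum(R[k][i] * R[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
I = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
print(max_diff(transpose(R), rotation_matrix(-theta, n)) < 1e-12)  # True
print(max_diff(RtR, I) < 1e-12)                                     # True
```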

3.5 Determinants

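Consider a linear map $\alpha: \mathbb{R}^3\to\mathbb{R}^3$. The standard basis $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ is mapped to $\mathbf{e}'_1, \mathbf{e}'_2, \mathbf{e}'_3$ with $\mathbf{e}'_i = A\mathbf{e}_i$. Thus the unit cube formed by $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ is mapped to the parallelepiped with volume

$$\begin{aligned}[\mathbf{e}'_1, \mathbf{e}'_2, \mathbf{e}'_3] &= \varepsilon_{ijk}(\mathbf{e}'_1)_i(\mathbf{e}'_2)_j(\mathbf{e}'_3)_k\\ &= \varepsilon_{ijk}\underbrace{A_{i\ell}(\mathbf{e}_1)_\ell}_{A_{i1}}\underbrace{A_{jm}(\mathbf{e}_2)_m}_{A_{j2}}\underbrace{A_{kp}(\mathbf{e}_3)_p}_{A_{k3}}\\ &= \varepsilon_{ijk}A_{i1}A_{j2}A_{k3}.\end{aligned}$$

We call this the determinant and write it as

$$\det(A) = \begin{vmatrix}A_{11} & A_{12} & A_{13}\\ A_{21} & A_{22} & A_{23}\\ A_{31} & A_{32} & A_{33}\end{vmatrix}.$$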

3.5.1 Permutations

To define the determinant for square matrices of arbitrary size, we first have to

consider permutations.

Definition (Permutation). A permutation of a set S is a bijection ε : S → S.

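Notation. Consider the set $S_n$ of all permutations of $1, 2, 3, \cdots, n$. $S_n$ contains $n!$ elements. Consider $\rho\in S_n$ with $i\mapsto\rho(i)$. We write

$$\rho = \begin{pmatrix}1 & 2 & \cdots & n\\ \rho(1) & \rho(2) & \cdots & \rho(n)\end{pmatrix}.$$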



Definition (Fixed point). A fixed point of $\rho$ is a $k$ such that $\rho(k) = k$. e.g. in $\begin{pmatrix}1 & 2 & 3 & 4\\ 4 & 1 & 3 & 2\end{pmatrix}$, 3 is the fixed point. By convention, we can omit the fixed point and write it as $\begin{pmatrix}1 & 2 & 4\\ 4 & 1 & 2\end{pmatrix}$.

Definition (Disjoint permutation). Two permutations are disjoint if numbers moved by one are fixed by the other, and vice versa. e.g.

$$\begin{pmatrix}1 & 2 & 4 & 5 & 6\\ 5 & 6 & 1 & 4 & 2\end{pmatrix} = \begin{pmatrix}2 & 6\\ 6 & 2\end{pmatrix}\begin{pmatrix}1 & 4 & 5\\ 5 & 1 & 4\end{pmatrix},$$

and the two cycles on the right hand side are disjoint.

Disjoint permutations commute, but in general non-disjoint permutations do

not.

Definition (Transposition and $k$-cycle). $\begin{pmatrix}2 & 6\\ 6 & 2\end{pmatrix}$ is a 2-cycle or a transposition, and we can simply write $(2\ 6)$. $\begin{pmatrix}1 & 4 & 5\\ 5 & 1 & 4\end{pmatrix}$ is a 3-cycle, and we can simply write $(1\ 5\ 4)$. (1 is mapped to 5; 5 is mapped to 4; 4 is mapped to 1.)

Proposition. Any q-cycle can be written as a product of 2-cycles.

Proof. (1 2 3 ··· n) = (1 2)(2 3)(3 4) ···(n − 1 n).

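Definition (Sign of permutation). The sign of a permutation $\rho$, written $\varepsilon(\rho)$, is $(-1)^r$, where $r$ is the number of 2-cycles when $\rho$ is written as a product of 2-cycles. If $\varepsilon(\rho) = +1$, it is an even permutation. Otherwise, it is an odd permutation. Note that $\varepsilon(\rho\sigma) = \varepsilon(\rho)\varepsilon(\sigma)$ and $\varepsilon(\rho^{-1}) = \varepsilon(\rho)$.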

The proof that this is well-defined can be found in IA Groups.

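Definition (Levi-Civita symbol). The Levi-Civita symbol is defined by

$$\varepsilon_{j_1j_2\cdots j_n} = \begin{cases}+1 & \text{if } j_1j_2\cdots j_n \text{ is an even permutation of } 1, 2, \cdots, n\\ -1 & \text{if it is an odd permutation}\\ 0 & \text{if any 2 of them are equal}\end{cases}$$

Clearly, $\varepsilon_{\rho(1)\rho(2)\cdots\rho(n)} = \varepsilon(\rho)$.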

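Definition (Determinant). The determinant of an $n\times n$ matrix $A$ is defined as:

$$\det(A) = \sum_{\sigma\in S_n}\varepsilon(\sigma)A_{\sigma(1)1}A_{\sigma(2)2}\cdots A_{\sigma(n)n},$$

or equivalently,

$$\det(A) = \varepsilon_{j_1j_2\cdots j_n}A_{j_11}A_{j_22}\cdots A_{j_nn}.$$

Proposition.

$$\begin{vmatrix}a & b\\ c & d\end{vmatrix} = ad - bc$$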
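The permutation-sum definition can be implemented literally, which is fine for small $n$ (there are $n!$ terms). This sketch uses 0-based indices and `itertools.permutations`; note that it computes the sign by counting inversions rather than by decomposing into transpositions, a standard equivalent characterisation swapped in for convenience. The function names are illustrative.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation of 0..n-1, via the parity of its number of inversions."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(A):
    """det A = sum over sigma of sign(sigma) * A[sigma(0)][0] * ... * A[sigma(n-1)][n-1]."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = sign(sigma)
        for col in range(n):
            term *= A[sigma[col]][col]
        total += term
    return total

print(det([[1, 2], [3, 4]]))                    # -2, i.e. ad - bc
print(det([[2, 4, 2], [3, 2, 1], [2, 0, 1]]))   # -8
```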

3.5.2 Properties of determinants

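Proposition. $\det(A) = \det(A^T)$.

Proof. Take a single term $A_{\sigma(1)1}A_{\sigma(2)2}\cdots A_{\sigma(n)n}$ and let $\rho$ be another permutation in $S_n$. We have

$$A_{\sigma(1)1}A_{\sigma(2)2}\cdots A_{\sigma(n)n} = A_{\sigma(\rho(1))\rho(1)}A_{\sigma(\rho(2))\rho(2)}\cdots A_{\sigma(\rho(n))\rho(n)}$$

since the right hand side is just re-ordering the order of multiplication. Choose $\rho = \sigma^{-1}$ and note that $\varepsilon(\sigma) = \varepsilon(\rho)$. Then

$$\det(A) = \sum_{\rho\in S_n}\varepsilon(\rho)A_{1\rho(1)}A_{2\rho(2)}\cdots A_{n\rho(n)} = \det(A^T).$$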

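Proposition. If matrix $B$ is formed by multiplying every element in a single row of $A$ by a scalar $\lambda$, then $\det(B) = \lambda\det(A)$. Consequently, $\det(\lambda A) = \lambda^n\det(A)$.

Proof. Each term in the sum contains exactly one element from that row, so each term is multiplied by $\lambda$, and the whole sum is multiplied by $\lambda$. Applying this to each of the $n$ rows in turn gives $\det(\lambda A) = \lambda^n\det(A)$.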

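Proposition. If 2 rows (or 2 columns) of $A$ are identical, the determinant is 0.

Proof. wlog, suppose columns 1 and 2 are the same. Then

$$\det(A) = \sum_{\sigma\in S_n}\varepsilon(\sigma)A_{\sigma(1)1}A_{\sigma(2)2}\cdots A_{\sigma(n)n}.$$

Now write an arbitrary $\sigma$ in the form $\sigma = \rho(1\ 2)$. Then $\varepsilon(\sigma) = \varepsilon(\rho)\varepsilon((1\ 2)) = -\varepsilon(\rho)$. So

$$\det(A) = \sum_{\rho\in S_n}-\varepsilon(\rho)A_{\rho(2)1}A_{\rho(1)2}A_{\rho(3)3}\cdots A_{\rho(n)n}.$$

But columns 1 and 2 are identical, so $A_{\rho(2)1} = A_{\rho(2)2}$ and $A_{\rho(1)2} = A_{\rho(1)1}$. So $\det(A) = -\det(A)$ and $\det(A) = 0$.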

Proposition. If 2 rows or 2 columns of a matrix are linearly dependent, then the determinant is zero.

Proof. Suppose in $A$, (column $r$) $+\ \lambda$(column $s$) $= \mathbf{0}$. Define

$$B_{ij} = \begin{cases}A_{ij} & j\neq r\\ A_{ij} + \lambda A_{is} & j = r\end{cases}$$

Then $\det(B) = \det(A) + \lambda\det(\text{matrix with column } r = \text{column } s) = \det(A)$. Then we can see that the $r$th column of $B$ is all zeroes. So each term in the sum contains one zero and $\det(A) = \det(B) = 0$.

Even if we don't have linearly dependent rows or columns, we can still run the exact same proof as above, and still get that $\det(B) = \det(A)$. Linear dependence is only required to show that $\det(B) = 0$. So in general, we can add a linear multiple of a column (or row) onto another column (or row) without changing the determinant.

Proposition. Given a matrix $A$, if $B$ is a matrix obtained by adding a multiple of a column (or row) of $A$ to another column (or row) of $A$, then $\det A = \det B$.

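Corollary. Swapping two rows or columns of a matrix negates the determinant.

Proof. We do the column case only. Let $A = (\mathbf{a}_1\cdots\mathbf{a}_i\cdots\mathbf{a}_j\cdots\mathbf{a}_n)$. Then

$$\begin{aligned}\det(\mathbf{a}_1\cdots\mathbf{a}_i\cdots\mathbf{a}_j\cdots\mathbf{a}_n) &= \det(\mathbf{a}_1\cdots\mathbf{a}_i + \mathbf{a}_j\cdots\mathbf{a}_j\cdots\mathbf{a}_n)\\ &= \det(\mathbf{a}_1\cdots\mathbf{a}_i + \mathbf{a}_j\cdots\mathbf{a}_j - (\mathbf{a}_i + \mathbf{a}_j)\cdots\mathbf{a}_n)\\ &= \det(\mathbf{a}_1\cdots\mathbf{a}_i + \mathbf{a}_j\cdots -\mathbf{a}_i\cdots\mathbf{a}_n)\\ &= \det(\mathbf{a}_1\cdots\mathbf{a}_j\cdots -\mathbf{a}_i\cdots\mathbf{a}_n)\\ &= -\det(\mathbf{a}_1\cdots\mathbf{a}_j\cdots\mathbf{a}_i\cdots\mathbf{a}_n)\end{aligned}$$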

Alternatively, we can prove this from the definition directly, using the fact that

the sign of a transposition is −1 (and that the sign is multiplicative).

Proposition. det(AB) = det(A) det(B).

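Proof. First note that

$$\sum_\sigma\varepsilon(\sigma)A_{\sigma(1)\rho(1)}A_{\sigma(2)\rho(2)}\cdots A_{\sigma(n)\rho(n)} = \varepsilon(\rho)\det(A),$$

i.e. swapping columns (or rows) an even/odd number of times gives a factor $\pm 1$ respectively. We can prove this by writing $\sigma = \mu\rho$.

Now

$$\begin{aligned}\det AB &= \sum_\sigma\varepsilon(\sigma)(AB)_{\sigma(1)1}(AB)_{\sigma(2)2}\cdots(AB)_{\sigma(n)n}\\ &= \sum_\sigma\varepsilon(\sigma)\sum_{k_1, k_2, \cdots, k_n}A_{\sigma(1)k_1}B_{k_11}\cdots A_{\sigma(n)k_n}B_{k_nn}\\ &= \sum_{k_1, \cdots, k_n}B_{k_11}\cdots B_{k_nn}\underbrace{\sum_\sigma\varepsilon(\sigma)A_{\sigma(1)k_1}A_{\sigma(2)k_2}\cdots A_{\sigma(n)k_n}}_{S}\end{aligned}$$

Now consider the many different $S$'s. If in $S$, two of the $k_i$ are equal, then $S$ is the determinant of a matrix with two columns the same, i.e. $S = 0$. So we only have to consider the sum over distinct $k_i$'s. Thus the $k_i$'s are a permutation of $1, \cdots, n$, say $k_i = \rho(i)$. Then we can write

$$\begin{aligned}\det AB &= \sum_\rho B_{\rho(1)1}\cdots B_{\rho(n)n}\sum_\sigma\varepsilon(\sigma)A_{\sigma(1)\rho(1)}\cdots A_{\sigma(n)\rho(n)}\\ &= \sum_\rho B_{\rho(1)1}\cdots B_{\rho(n)n}(\varepsilon(\rho)\det A)\\ &= \det A\sum_\rho\varepsilon(\rho)B_{\rho(1)1}\cdots B_{\rho(n)n}\\ &= \det A\det B\end{aligned}$$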

Corollary. If A is orthogonal, det A = ±1.

Proof.

$$\begin{aligned}AA^T &= I\\ \det AA^T &= \det I\\ \det A\det A^T &= 1\\ (\det A)^2 &= 1\\ \det A &= \pm 1\end{aligned}$$

Corollary. If U is unitary, |det U| = 1.

Proof. We have $\det U^\dagger = (\det U^T)^* = \det(U)^*$. Since $UU^\dagger = I$, we have $\det(U)\det(U)^* = |\det U|^2 = 1$.

Proposition. In $\mathbb{R}^3$, orthogonal matrices represent either a rotation ($\det = 1$) or a reflection ($\det = -1$).

3.5.3 Minors and Cofactors

Definition (Minor and cofactor). For an $n\times n$ matrix $A$, define $A^{ij}$ to be the $(n-1)\times(n-1)$ matrix in which row $i$ and column $j$ of $A$ have been removed.

The minor of the $ij$th element of $A$ is $M_{ij} = \det A^{ij}$.

The cofactor of the $ij$th element of $A$ is $\Delta_{ij} = (-1)^{i+j}M_{ij}$.

Notation. We use an overbar $\bar{\ }$ to denote a symbol which has been missed out of a natural sequence.

Example. $1, 2, 3, 5 = 1, 2, 3, \bar{4}, 5$.

The significance of these definitions is that we can use them to provide a

systematic way of evaluating determinants. We will also use them to find inverses

of matrices.

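Theorem (Laplace expansion formula). For any particular fixed $i$,

$$\det A = \sum_{j=1}^nA_{ji}\Delta_{ji}.$$

Since $\det A = \det A^T$, we may equally expand along a row: $\det A = \sum_{j=1}^nA_{ij}\Delta_{ij}$.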

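Proof.

$$\det A = \sum_{j_i=1}^nA_{j_ii}\sum_{j_1, \cdots, \bar{j_i}, \cdots, j_n}\varepsilon_{j_1j_2\cdots j_n}A_{j_11}A_{j_22}\cdots\overline{A_{j_ii}}\cdots A_{j_nn}$$

Let $\sigma\in S_n$ be the permutation which moves $j_i$ to the $i$th position, and leaves everything else in its natural order, i.e.

$$\sigma = \left(\begin{array}{ccccccccccc}1 & \cdots & i & i+1 & i+2 & \cdots & j_i-1 & j_i & j_i+1 & \cdots & n\\ 1 & \cdots & j_i & i & i+1 & \cdots & j_i-2 & j_i-1 & j_i+1 & \cdots & n\end{array}\right)$$

if $j_i > i$, and similarly for other cases. To perform this permutation, $|i - j_i|$ transpositions are made. So $\varepsilon(\sigma) = (-1)^{i-j_i}$.

Now consider the permutation $\rho\in S_n$ which fixes $j_i$ and sends the remaining numbers $1, \cdots, \bar{j_i}, \cdots, n$, in increasing order, to $j_1, \cdots, \bar{j_i}, \cdots, j_n$ respectively. The composition $\rho\sigma$ reorders $(1, \cdots, n)$ to $(j_1, j_2, \cdots, j_n)$. So

$$\varepsilon(\rho\sigma) = \varepsilon_{j_1\cdots j_n} = \varepsilon(\rho)\varepsilon(\sigma) = (-1)^{i-j_i}\varepsilon_{j_1\cdots\bar{j_i}\cdots j_n},$$

where $\varepsilon_{j_1\cdots\bar{j_i}\cdots j_n}$ is the sign of $\rho$ regarded as a permutation of the $n - 1$ numbers other than $j_i$. Hence the original equation becomes

$$\begin{aligned}\det A &= \sum_{j_i=1}^nA_{j_ii}\sum_{j_1\cdots\bar{j_i}\cdots j_n}(-1)^{i-j_i}\varepsilon_{j_1\cdots\bar{j_i}\cdots j_n}A_{j_11}\cdots\overline{A_{j_ii}}\cdots A_{j_nn}\\ &= \sum_{j_i=1}^nA_{j_ii}(-1)^{i-j_i}M_{j_ii}\\ &= \sum_{j_i=1}^nA_{j_ii}\Delta_{j_ii}\\ &= \sum_{j=1}^nA_{ji}\Delta_{ji}\end{aligned}$$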

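Example. $\det A = \begin{vmatrix}2 & 4 & 2\\ 3 & 2 & 1\\ 2 & 0 & 1\end{vmatrix}$. We can pick the first row and have

$$\begin{aligned}\det A &= 2\begin{vmatrix}2 & 1\\ 0 & 1\end{vmatrix} - 4\begin{vmatrix}3 & 1\\ 2 & 1\end{vmatrix} + 2\begin{vmatrix}3 & 2\\ 2 & 0\end{vmatrix}\\ &= 2(2 - 0) - 4(3 - 2) + 2(0 - 4)\\ &= -8.\end{aligned}$$

Alternatively, we can pick the second column and have

$$\begin{aligned}\det A &= -4\begin{vmatrix}3 & 1\\ 2 & 1\end{vmatrix} + 2\begin{vmatrix}2 & 2\\ 2 & 1\end{vmatrix} - 0\begin{vmatrix}2 & 2\\ 3 & 1\end{vmatrix}\\ &= -4(3 - 2) + 2(2 - 4) - 0\\ &= -8.\end{aligned}$$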
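A recursive sketch of the Laplace expansion along any row or any column, reproducing the two hand calculations above. `minor_matrix`, `det_row` and `det_col` are hypothetical names; indices are 0-based, which leaves the sign $(-1)^{i+j}$ unchanged.

```python
def minor_matrix(A, i, j):
    """A with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det_row(A, i=0):
    """Expansion along row i: det A = sum_j A_ij * Delta_ij."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (i + j) * A[i][j] * det_row(minor_matrix(A, i, j))
               for j in range(n))

def det_col(A, i=0):
    """Expansion along column i: det A = sum_j A_ji * Delta_ji."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (i + j) * A[j][i] * det_col(minor_matrix(A, j, i))
               for j in range(n))

A = [[2, 4, 2], [3, 2, 1], [2, 0, 1]]
print(det_row(A, 0), det_col(A, 1))   # -8 -8
```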

In practical terms, we use a combination of properties of determinants with

a sensible choice of i to evaluate det(A).

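Example. Consider $\begin{vmatrix}1 & a & a^2\\ 1 & b & b^2\\ 1 & c & c^2\end{vmatrix}$. Row 1 $-$ row 2 gives

$$\begin{vmatrix}0 & a - b & a^2 - b^2\\ 1 & b & b^2\\ 1 & c & c^2\end{vmatrix} = (a - b)\begin{vmatrix}0 & 1 & a + b\\ 1 & b & b^2\\ 1 & c & c^2\end{vmatrix}.$$

Do row 2 $-$ row 3. We obtain

$$(a - b)(b - c)\begin{vmatrix}0 & 1 & a + b\\ 0 & 1 & b + c\\ 1 & c & c^2\end{vmatrix}.$$

Row 1 $-$ row 2 gives

$$(a - b)(b - c)(a - c)\begin{vmatrix}0 & 0 & 1\\ 0 & 1 & b + c\\ 1 & c & c^2\end{vmatrix}.$$

Expanding along the first row, the remaining determinant is $\begin{vmatrix}0 & 1\\ 1 & c\end{vmatrix} = -1$, so the determinant is

$$-(a - b)(b - c)(a - c) = (a - b)(b - c)(c - a).$$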
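A quick numerical check of the sign in the final answer, using a hand-written $3\times 3$ determinant (the helper names are illustrative). Taking $a, b, c = 0, 1, 2$ distinguishes $(a - b)(b - c)(c - a)$ from $(a - b)(b - c)(a - c)$.

```python
def det3(M):
    """3x3 determinant by expansion along the first row."""
    (p, q, r), (s, t, u), (v, w, z) = M
    return p * (t * z - u * w) - q * (s * z - u * v) + r * (s * w - t * v)

def vandermonde(a, b, c):
    return det3([[1, a, a * a], [1, b, b * b], [1, c, c * c]])

print(vandermonde(0, 1, 2))          # 2
print((0 - 1) * (1 - 2) * (2 - 0))   # 2, which is (a - b)(b - c)(c - a)
print((0 - 1) * (1 - 2) * (0 - 2))   # -2, which is (a - b)(b - c)(a - c)
```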

4 Matrices and linear equations

4.1 Simple example, 2 × 2

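Consider the system of equations

$$\begin{aligned}A_{11}x_1 + A_{12}x_2 &= d_1\quad\text{(a)}\\ A_{21}x_1 + A_{22}x_2 &= d_2.\quad\text{(b)}\end{aligned}$$

We can write this as

$$A\mathbf{x} = \mathbf{d}.$$

If we do (a)$\times A_{22}$ $-$ (b)$\times A_{12}$ and similarly the other way round, we obtain

$$\begin{aligned}\underbrace{(A_{11}A_{22} - A_{12}A_{21})}_{\det A}x_1 &= A_{22}d_1 - A_{12}d_2\\ (A_{11}A_{22} - A_{12}A_{21})x_2 &= A_{11}d_2 - A_{21}d_1\end{aligned}$$

Dividing by $\det A$ and writing in matrix form, we have

$$\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \frac{1}{\det A}\begin{pmatrix}A_{22} & -A_{12}\\ -A_{21} & A_{11}\end{pmatrix}\begin{pmatrix}d_1\\ d_2\end{pmatrix}.$$

On the other hand, given the equation $A\mathbf{x} = \mathbf{d}$, if $A^{-1}$ exists, then by multiplying both sides on the left by $A^{-1}$, we obtain $\mathbf{x} = A^{-1}\mathbf{d}$.

Hence, we have constructed $A^{-1}$ in the $2\times 2$ case, and shown that the condition for its existence is $\det A\neq 0$, with

$$A^{-1} = \frac{1}{\det A}\begin{pmatrix}A_{22} & -A_{12}\\ -A_{21} & A_{11}\end{pmatrix}.$$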

4.2 Inverse of an n × n matrix

For larger matrices, the formula for the inverse is similar, but slightly more

complicated (and costly to evaluate). The key to finding the inverse is the

following:

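Lemma. $A_{ik}\Delta_{jk} = \delta_{ij}\det A$.

Proof. If $i\neq j$, then consider an $n\times n$ matrix $B$, which is identical to $A$ except the $j$th row is replaced by the $i$th row of $A$. So $\Delta_{jk}$ of $B$ $= \Delta_{jk}$ of $A$, since $\Delta_{jk}$ does not depend on the elements in row $j$. Since $B$ has a duplicate row, we know that

$$0 = \det B = \sum_{k=1}^nB_{jk}\Delta_{jk} = \sum_{k=1}^nA_{ik}\Delta_{jk}.$$

If $i = j$, then the expression is $\det A$ by the Laplace expansion formula.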

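Theorem. If $\det A\neq 0$, then $A^{-1}$ exists and is given by

$$(A^{-1})_{ij} = \frac{\Delta_{ji}}{\det A}.$$

Proof.

$$A_{ik}(A^{-1})_{kj} = A_{ik}\frac{\Delta_{jk}}{\det A} = \delta_{ij}.$$

So $AA^{-1} = I$. Running the same argument with columns in place of rows (valid since $\det A = \det A^T$) gives $\Delta_{ki}A_{kj} = \delta_{ij}\det A$, i.e. $A^{-1}A = I$.

The other direction is easy to prove. If $\det A = 0$, then it has no inverse, since for any matrix $B$, $\det AB = 0$, and hence $AB$ cannot be the identity.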

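Example. Consider the shear matrix

$$S = \begin{pmatrix}1 & \lambda & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}.$$

We have $\det S = 1$. The cofactors are

$$\begin{array}{lll}\Delta_{11} = 1 & \Delta_{12} = 0 & \Delta_{13} = 0\\ \Delta_{21} = -\lambda & \Delta_{22} = 1 & \Delta_{23} = 0\\ \Delta_{31} = 0 & \Delta_{32} = 0 & \Delta_{33} = 1\end{array}$$

So

$$S^{-1} = \begin{pmatrix}1 & -\lambda & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}.$$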
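The cofactor formula $(A^{-1})_{ij} = \Delta_{ji}/\det A$ can be sketched with exact fractions. This illustrates the formula rather than an efficient method, for the reasons discussed next; `inverse` and its helpers are hypothetical names, and the shear is taken with the sample value $\lambda = 5$.

```python
from fractions import Fraction

def minor_matrix(A, i, j):
    """A with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Laplace expansion along the first row (the empty matrix has determinant 1)."""
    if not A:
        return 1
    return sum((-1) ** j * A[0][j] * det(minor_matrix(A, 0, j)) for j in range(len(A)))

def inverse(A):
    """(A^-1)_ij = Delta_ji / det A, computed with exact fractions."""
    n = len(A)
    d = det(A)
    if d == 0:
        raise ValueError("det A = 0, so A has no inverse")
    cof = [[(-1) ** (i + j) * det(minor_matrix(A, i, j)) for j in range(n)]
           for i in range(n)]
    return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]

lam = 5
S = [[1, lam, 0], [0, 1, 0], [0, 0, 1]]
print([[int(x) for x in row] for row in inverse(S)])   # [[1, -5, 0], [0, 1, 0], [0, 0, 1]]
```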





How many arithmetic operations are involved in calculating the inverse of an $n\times n$ matrix? We just count multiplication operations since they are the most time-consuming. Suppose that calculating $\det A$ takes $f_n$ multiplications. This involves $n$ determinants of size $(n-1)\times(n-1)$, and you need $n$ more multiplications to put them together. So $f_n = nf_{n-1} + n$. So $f_n = O(n!)$ (in fact, taking $f_1 = 0$, we get $f_n = n!\sum_{k=1}^{n-1}\frac{1}{k!}\approx(e - 1)n!$).

To find the inverse, we need to calculate $n^2$ cofactors. Each is an $(n-1)\times(n-1)$ determinant, and each takes $O((n-1)!)$. So the time complexity is $O(n^2(n-1)!) = O(n\cdot n!)$.

This is incredibly slow. Hence while it is theoretically possible to solve systems of linear equations by inverting a matrix, sane people do not do so in general. Instead, we develop certain better methods to solve the equations. In fact, the “usual” method people use to solve equations by hand only has complexity $O(n^3)$, which is a much better complexity.
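The recurrence $f_n = nf_{n-1} + n$ is easy to tabulate. This sketch assumes $f_1 = 0$ (a $1\times 1$ determinant needs no multiplication, so that $f_2 = 2$ matches $ad - bc$) and compares the result with the closed form $f_n = n!\sum_{k=1}^{n-1}1/k!$, whose ratio to $n!$ approaches $e - 1$. The function name is illustrative.

```python
import math

def det_multiplications(n):
    """f_1 = 0 and f_n = n * f_(n-1) + n."""
    f = 0
    for k in range(2, n + 1):
        f = k * f + k
    return f

for n in (2, 3, 4, 10):
    f = det_multiplications(n)
    # the ratio f_n / n! creeps up towards e - 1 = 1.718...
    print(n, f, f / math.factorial(n))
```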

4.3 Homogeneous and inhomogeneous equations

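Consider $A\mathbf{x} = \mathbf{b}$ where $A$ is an $n\times n$ matrix, and $\mathbf{x}$ and $\mathbf{b}$ are $n\times 1$ column vectors.

Definition (Homogeneous equation). If $\mathbf{b} = \mathbf{0}$, then the system is homogeneous. Otherwise, it's inhomogeneous.

Suppose $\det A\neq 0$. Then there is a unique solution $\mathbf{x} = A^{-1}\mathbf{b}$ ($\mathbf{x} = A^{-1}\mathbf{0} = \mathbf{0}$ for homogeneous).

How can we understand this result? Recall that $\det A\neq 0$ means that the columns of $A$ are linearly independent. The columns are the images of the standard basis, $\mathbf{e}'_i = A\mathbf{e}_i$. So $\det A\neq 0$ means that the $\mathbf{e}'_i$ are linearly independent and form a basis of $\mathbb{R}^n$. Therefore the image is the whole of $\mathbb{R}^n$. This automatically ensures that $\mathbf{b}$ is in the image, i.e. there is a solution.

To show that there is exactly one solution, suppose $\mathbf{x}$ and $\mathbf{x}'$ are both solutions. Then $A\mathbf{x} = A\mathbf{x}' = \mathbf{b}$. So $A(\mathbf{x} - \mathbf{x}') = \mathbf{0}$. So $\mathbf{x} - \mathbf{x}'$ is in the kernel of $A$. But since the rank of $A$ is $n$, by the rank-nullity theorem, the nullity is 0. So the kernel is trivial. So $\mathbf{x} - \mathbf{x}' = \mathbf{0}$, i.e. $\mathbf{x} = \mathbf{x}'$.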

4.3.1 Gaussian elimination

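Consider a general system

$$\begin{aligned}A_{11}x_1 + A_{12}x_2 + \cdots + A_{1n}x_n &= d_1\\ A_{21}x_1 + A_{22}x_2 + \cdots + A_{2n}x_n &= d_2\\ &\ \,\vdots\\ A_{m1}x_1 + A_{m2}x_2 + \cdots + A_{mn}x_n &= d_m\end{aligned}$$

So we have $m$ equations and $n$ unknowns.

Assume $A_{11}\neq 0$ (if not, we can re-order the equations). We can use the first equation to eliminate $x_1$ from the remaining $(m-1)$ equations. Then use the second equation to eliminate $x_2$ from the remaining $(m-2)$ equations (if anything goes wrong, just re-order until things work). Repeat.

We are left with

$$\begin{aligned}A_{11}x_1 + A_{12}x_2 + A_{13}x_3 + \cdots + A_{1n}x_n &= d_1\\ A_{22}^{(2)}x_2 + A_{23}^{(2)}x_3 + \cdots + A_{2n}^{(2)}x_n &= d_2^{(2)}\\ &\ \,\vdots\\ A_{rr}^{(r)}x_r + \cdots + A_{rn}^{(r)}x_n &= d_r^{(r)}\\ 0 &= d_{r+1}^{(r)}\\ &\ \,\vdots\\ 0 &= d_m^{(r)}\end{aligned}$$

Here $A_{ii}^{(i)}\neq 0$ (which we can achieve by re-ordering), and the superfix $(i)$ refers to the “version number” of the coefficient, e.g. $A_{22}^{(2)}$ is the second version of the coefficient of $x_2$ in the second row.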

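Let's consider the different possibilities:

(i) $r < m$ and at least one of $d_{r+1}^{(r)}, \cdots, d_m^{(r)}\neq 0$. Then a contradiction is reached. The system is inconsistent and has no solution. We say it is overdetermined.

Example. Consider the system

$$\begin{aligned}3x_1 + 2x_2 + x_3 &= 3\\ 6x_1 + 3x_2 + 3x_3 &= 0\\ 6x_1 + 2x_2 + 4x_3 &= 6\end{aligned}$$

This becomes

$$\begin{aligned}3x_1 + 2x_2 + x_3 &= 3\\ 0 - x_2 + x_3 &= -6\\ 0 - 2x_2 + 2x_3 &= 0\end{aligned}$$

And then

$$\begin{aligned}3x_1 + 2x_2 + x_3 &= 3\\ 0 - x_2 + x_3 &= -6\\ 0 &= 12\end{aligned}$$

We have $d_3^{(3)} = 12\neq 0$ and there is no solution.

(ii) $r = n\leq m$, and all $d_{r+i}^{(r)} = 0$. Then from the $n$th equation, there is a unique solution for $x_n = d_n^{(n)}/A_{nn}^{(n)}$, and hence for all $x_i$ by back substitution. This system is determined.

Example.

$$\begin{aligned}2x_1 + 5x_2 &= 2\\ 4x_1 + 3x_2 &= 11\end{aligned}$$

This becomes

$$\begin{aligned}2x_1 + 5x_2 &= 2\\ -7x_2 &= 7\end{aligned}$$

So $x_2 = -1$ and thus $x_1 = 7/2$.

(iii) $r < n$ and $d_{r+i}^{(r)} = 0$, then $x_{r+1}, \cdots, x_n$ can be freely chosen, and there are infinitely many solutions. The system is under-determined. e.g.

$$\begin{aligned}x_1 + x_2 &= 1\\ 2x_1 + 2x_2 &= 2\end{aligned}$$

Which gives

$$\begin{aligned}x_1 + x_2 &= 1\\ 0 &= 0\end{aligned}$$

So $x_1 = 1 - x_2$ is a solution for any $x_2$.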

In the $n = m$ case, there are $O(n^3)$ operations involved, which is much less than inverting the matrix. So this is an efficient way of solving equations.
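The elimination procedure and the three cases can be sketched in a few lines, using exact `Fraction` arithmetic to avoid rounding. The function names and the returned labels are illustrative; rows are swapped whenever a pivot is zero, and back substitution is only attempted in the determined case $r = n$. The three examples above are reproduced.

```python
from fractions import Fraction

def eliminate(A, d):
    """Reduce [A | d] to echelon form with exact fractions, swapping rows when a
    pivot is zero. Returns the reduced rows and the pivot column of each pivot row."""
    m, n = len(A), len(A[0])
    M = [[Fraction(x) for x in row] + [Fraction(di)] for row, di in zip(A, d)]
    pivots, r = [], 0
    for col in range(n):
        if r == m:
            break
        p = next((i for i in range(r, m) if M[i][col] != 0), None)
        if p is None:
            continue  # no usable pivot in this column
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, m):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        pivots.append(col)
        r += 1
    return M, pivots

def solve(A, d):
    """Classify Ax = d as in cases (i)-(iii); back-substitute when r = n."""
    M, pivots = eliminate(A, d)
    r, n = len(pivots), len(A[0])
    if any(row[-1] != 0 for row in M[r:]):
        return "no solution", None        # case (i): some 0 = d_i with d_i != 0
    if r < n:
        return "infinitely many", None    # case (iii): free variables remain
    x = [Fraction(0)] * n                 # case (ii): r = n, back substitution
    for i in reversed(range(n)):
        x[i] = (M[i][-1] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return "unique", x

print(solve([[3, 2, 1], [6, 3, 3], [6, 2, 4]], [3, 0, 6])[0])   # no solution
print(solve([[2, 5], [4, 3]], [2, 11]))   # ('unique', [Fraction(7, 2), Fraction(-1, 1)])
print(solve([[1, 1], [2, 2]], [1, 2])[0])                       # infinitely many
```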

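This is also related to the determinant. Consider the case where $m = n$ and $A$ is square. Adding a multiple of one row to another does not change the determinant, and each swap of rows gives a factor of $(-1)$. So, if $k$ row swaps were made,

$$\det A = (-1)^k\begin{vmatrix}A_{11} & A_{12} & \cdots & \cdots & \cdots & A_{1n}\\ 0 & A_{22}^{(2)} & \cdots & \cdots & \cdots & A_{2n}^{(2)}\\ \vdots & \vdots & \ddots & & & \vdots\\ 0 & 0 & \cdots & A_{rr}^{(r)} & \cdots & A_{rn}^{(r)}\\ 0 & 0 & \cdots & 0 & \cdots & 0\\ \vdots & \vdots & & \vdots & & \vdots\end{vmatrix}$$

This determinant is an upper triangular one (all elements below the diagonal are 0) and the determinant is the product of its diagonal elements.

Hence if $r < n$ (and $d_i^{(r)} = 0$ for $i > r$), then we have case (iii) and $\det A = 0$. If $r = n$, then $\det A = (-1)^kA_{11}A_{22}^{(2)}\cdots A_{nn}^{(n)}\neq 0$.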

4.4 Matrix rank

Consider a linear map $\alpha: \mathbb{R}^n\to\mathbb{R}^m$. Recall the rank $r(\alpha)$ is the dimension of the image. Suppose that the matrix $A$ is associated with the linear map. We also call $r(A)$ the rank of $A$.

Recall that if the standard basis is $\mathbf{e}_1, \cdots, \mathbf{e}_n$, then $A\mathbf{e}_1, \cdots, A\mathbf{e}_n$ span the image (but are not necessarily linearly independent).

Further, $A\mathbf{e}_1, \cdots, A\mathbf{e}_n$ are the columns of the matrix $A$. Hence $r(A)$ is the number of linearly independent columns.

Definition (Column and row rank of linear map). The column rank of a matrix is the maximum number of linearly independent columns.

The row rank of a matrix is the maximum number of linearly independent rows.

Theorem. The column rank and row rank are equal for any m × n matrix.

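Proof. Let $r$ be the row rank of $A$. Write the biggest set of linearly independent rows as $\mathbf{v}_1^T, \mathbf{v}_2^T, \cdots, \mathbf{v}_r^T$, or in component form $\mathbf{v}_k^T = (v_{k1}, v_{k2}, \cdots, v_{kn})$ for $k = 1, 2, \cdots, r$.

Now denote the $i$th row of $A$ as $\mathbf{r}_i^T = (A_{i1}, A_{i2}, \cdots, A_{in})$.

Note that every row of $A$ can be written as a linear combination of the $\mathbf{v}$'s. (If $\mathbf{r}_i$ cannot be written as a linear combination of the $\mathbf{v}$'s, then it is independent of the $\mathbf{v}$'s and $\mathbf{v}$ is not the maximum collection of linearly independent rows.) Write

$$\mathbf{r}_i^T = \sum_{k=1}^rC_{ik}\mathbf{v}_k^T$$

for some coefficients $C_{ik}$ with $1\leq i\leq m$ and $1\leq k\leq r$.

Now the elements of $A$ are

$$A_{ij} = (\mathbf{r}_i)_j = \sum_{k=1}^rC_{ik}(\mathbf{v}_k)_j,$$

or

$$\begin{pmatrix}A_{1j}\\ A_{2j}\\ \vdots\\ A_{mj}\end{pmatrix} = \sum_{k=1}^rv_{kj}\begin{pmatrix}C_{1k}\\ C_{2k}\\ \vdots\\ C_{mk}\end{pmatrix}.$$

So every column of $A$ can be written as a linear combination of the $r$ column vectors $\mathbf{c}_k = (C_{1k}, C_{2k}, \cdots, C_{mk})^T$. Then the column rank of $A\leq r$, the row rank of $A$.

Apply the same argument to $A^T$ to see that the row rank is $\leq$ the column rank.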
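The theorem can be checked (not proved) numerically by computing the rank of $A$ and of $A^T$ with Gaussian elimination as in 4.3.1, counting pivots. A minimal sketch with exact fractions; `rank` and `transpose` are illustrative names.

```python
from fractions import Fraction

def rank(A):
    """Number of pivots left after Gaussian elimination (exact arithmetic)."""
    M = [[Fraction(x) for x in row] for row in A]
    m, n = len(M), len(M[0])
    r = 0
    for col in range(n):
        if r == m:
            break
        p = next((i for i in range(r, m) if M[i][col] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, m):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 0, 1, 0]]
print(rank(A), rank(transpose(A)))   # 2 2
```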

4.5 Homogeneous problem Ax = 0

We restrict our attention to the square case, i.e. number of unknowns = number

of equations. Here A is an n × n matrix. We want to solve Ax = 0.

First of all, if $\det A\neq 0$, then $A^{-1}$ exists and $\mathbf{x} = A^{-1}\mathbf{0} = \mathbf{0}$, which is the unique solution. Hence if $A\mathbf{x} = \mathbf{0}$ with $\mathbf{x}\neq\mathbf{0}$, then $\det A = 0$.

4.5.1 Geometrical interpretation

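We consider a $3 \times 3$ matrix
$$A = \begin{pmatrix}\mathbf{r}_1^T\\ \mathbf{r}_2^T\\ \mathbf{r}_3^T\end{pmatrix}.$$
$A\mathbf{x} = \mathbf{0}$ means that $\mathbf{r}_i \cdot \mathbf{x} = 0$ for all $i$. Each equation $\mathbf{r}_i \cdot \mathbf{x} = 0$ represents a plane through the origin. So the solution is the intersection of the three planes.

There are three possibilities:

(i) If $\det A = [\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3] \neq 0$, then $\operatorname{span}\{\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3\} = \mathbb{R}^3$ and thus $r(A) = 3$. By the rank-nullity theorem, $n(A) = 0$ and the kernel is $\{\mathbf{0}\}$. So $\mathbf{x} = \mathbf{0}$ is the unique solution.

(ii) If $\det A = 0$, then $\dim(\operatorname{span}\{\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3\}) = 1$ or $2$ (or $0$ in the trivial case below).

(a) If rank = 2, wlog assume $\mathbf{r}_1, \mathbf{r}_2$ are linearly independent. So $\mathbf{x}$ lies on the intersection of the two planes $\mathbf{x} \cdot \mathbf{r}_1 = 0$ and $\mathbf{x} \cdot \mathbf{r}_2 = 0$, which is the line
$$\{\mathbf{x} \in \mathbb{R}^3 : \mathbf{x} = \lambda\, \mathbf{r}_1 \times \mathbf{r}_2\}.$$
(Since $\mathbf{x}$ lies on the intersection of the two planes, it has to be normal to the normals of both planes.) All such points on this line also satisfy $\mathbf{x} \cdot \mathbf{r}_3 = 0$ since $\mathbf{r}_3$ is a linear combination of $\mathbf{r}_1$ and $\mathbf{r}_2$. The kernel is a line, $n(A) = 1$.

(b) If rank = 1, then $\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3$ are parallel. So $\mathbf{x} \cdot \mathbf{r}_1 = 0 \Rightarrow \mathbf{x} \cdot \mathbf{r}_2 = \mathbf{x} \cdot \mathbf{r}_3 = 0$. So all $\mathbf{x}$ that satisfy $\mathbf{x} \cdot \mathbf{r}_1 = 0$ are in the kernel, and the kernel now is a plane. $n(A) = 2$.

(We also have the trivial case where $r(A) = 0$: then $A$ is the zero mapping and the kernel is $\mathbb{R}^3$.)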
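For case (ii)(a), the kernel direction is $\mathbf{r}_1 \times \mathbf{r}_2$. A minimal Python sketch, using a hypothetical rank-2 matrix whose third row is $\mathbf{r}_1 + \mathbf{r}_2$, confirms that this vector is orthogonal to all three rows, so it (and every multiple of it) solves $A\mathbf{x} = \mathbf{0}$:

```python
def cross(u, v):
    """Vector (cross) product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical rows: r1, r2 independent and r3 = r1 + r2, so rank = 2.
r1, r2 = (1, 0, 1), (0, 1, 1)
r3 = tuple(a + b for a, b in zip(r1, r2))

x = cross(r1, r2)                        # direction of the kernel line
print(x, [dot(r, x) for r in (r1, r2, r3)])  # (-1, -1, 1) [0, 0, 0]
```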

4.5.2 Linear mapping view of Ax = 0

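In the general case, consider a linear map $\alpha: \mathbb{R}^n \to \mathbb{R}^n$, $\mathbf{x} \mapsto \mathbf{x}' = A\mathbf{x}$. The kernel $k(A) = \{\mathbf{x} \in \mathbb{R}^n : A\mathbf{x} = \mathbf{0}\}$ has dimension $n(A)$.

(i) If $n(A) = 0$, then $A(\mathbf{e}_1), A(\mathbf{e}_2), \cdots, A(\mathbf{e}_n)$ is a linearly independent set, and $r(A) = n$.

(ii) If $n(A) > 0$, then the image is not the whole of $\mathbb{R}^n$. Let $\{\mathbf{u}_i\}$, $i = 1, \cdots, n(A)$ be a basis of the kernel, i.e. given any solution to $A\mathbf{x} = \mathbf{0}$,
$$\mathbf{x} = \sum_{i=1}^{n(A)} \lambda_i \mathbf{u}_i$$
for some $\lambda_i$. Extend $\{\mathbf{u}_i\}$ to be a basis of $\mathbb{R}^n$ by introducing extra vectors $\mathbf{u}_i$ for $i = n(A) + 1, \cdots, n$. The vectors $A(\mathbf{u}_i)$ for $i = n(A) + 1, \cdots, n$ form a basis of the image.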

4.6 General solution of Ax = d

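Finally consider the general equation $A\mathbf{x} = \mathbf{d}$, where $A$ is an $n \times n$ matrix and $\mathbf{x}, \mathbf{d}$ are $n \times 1$ column vectors. We can separate into two main cases.

(i) $\det(A) \neq 0$. So $A^{-1}$ exists and $n(A) = 0$, $r(A) = n$. Then for any $\mathbf{d} \in \mathbb{R}^n$, a unique solution must exist and it is $\mathbf{x} = A^{-1}\mathbf{d}$.

(ii) $\det(A) = 0$. Then $A^{-1}$ does not exist, and $n(A) > 0$, $r(A) < n$. So the image of $A$ is not the whole of $\mathbb{R}^n$.

(a) If $\mathbf{d} \notin \operatorname{im} A$, then there is no solution (by definition of the image).

(b) If $\mathbf{d} \in \operatorname{im} A$, then by definition there exists at least one $\mathbf{x}$ such that $A\mathbf{x} = \mathbf{d}$. The general solution of $A\mathbf{x} = \mathbf{d}$ can be written as $\mathbf{x} = \mathbf{x}_0 + \mathbf{y}$, where $\mathbf{x}_0$ is a particular solution (i.e. $A\mathbf{x}_0 = \mathbf{d}$), and $\mathbf{y}$ is any vector in $\ker A$ (i.e. $A\mathbf{y} = \mathbf{0}$). (cf. Isomorphism theorem)

If $n(A) = 0$, then $\mathbf{y} = \mathbf{0}$ only, and then the solution is unique (i.e. case (i)). If $n(A) > 0$, then $\{\mathbf{u}_i\}$, $i = 1, \cdots, n(A)$ is a basis of the kernel. Hence
$$\mathbf{y} = \sum_{j=1}^{n(A)} \mu_j \mathbf{u}_j,$$
so
$$\mathbf{x} = \mathbf{x}_0 + \sum_{j=1}^{n(A)} \mu_j \mathbf{u}_j$$
for any $\mu_j$, i.e. there are infinitely many solutions.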

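Example. Consider
$$\begin{pmatrix}1 & 1\\ a & 1\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \begin{pmatrix}1\\ b\end{pmatrix}.$$
We have $\det A = 1 - a$. If $a \neq 1$, then $A^{-1}$ exists and
$$A^{-1} = \frac{1}{1 - a}\begin{pmatrix}1 & -1\\ -a & 1\end{pmatrix}.$$
Then
$$\mathbf{x} = \frac{1}{1 - a}\begin{pmatrix}1 - b\\ -a + b\end{pmatrix}.$$
If $a = 1$, then
$$A\mathbf{x} = \begin{pmatrix}x_1 + x_2\\ x_1 + x_2\end{pmatrix} = (x_1 + x_2)\begin{pmatrix}1\\ 1\end{pmatrix}.$$
So
$$\operatorname{im} A = \operatorname{span}\left\{\begin{pmatrix}1\\ 1\end{pmatrix}\right\} \quad\text{and}\quad \ker A = \operatorname{span}\left\{\begin{pmatrix}1\\ -1\end{pmatrix}\right\}.$$
If $b \neq 1$, then $\begin{pmatrix}1\\ b\end{pmatrix} \notin \operatorname{im} A$ and there is no solution. If $b = 1$, then $\begin{pmatrix}1\\ 1\end{pmatrix} \in \operatorname{im} A$.

We find a particular solution $\begin{pmatrix}1\\ 0\end{pmatrix}$. So the general solution is
$$\mathbf{x} = \begin{pmatrix}1\\ 0\end{pmatrix} + \lambda\begin{pmatrix}1\\ -1\end{pmatrix}.$$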
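The recipe of Section 4.6 (a particular solution plus an arbitrary element of the kernel) can be sketched in Python. This is an illustrative implementation using Gauss-Jordan elimination over exact rationals, not a method given in the notes; the name `solve_general` is hypothetical. It returns `None` when $\mathbf{d} \notin \operatorname{im} A$, and otherwise a particular solution (with every free variable set to 0) together with a basis of $\ker A$. On the example above with $a = b = 1$ it finds the particular solution $(1, 0)$ and the kernel vector $(-1, 1)$, a multiple of $(1, -1)$, so it describes the same line of solutions.

```python
from fractions import Fraction

def solve_general(A, d):
    """Solve A x = d over the rationals by Gauss-Jordan elimination.

    Returns None if d is not in im A; otherwise (x0, kernel) where A x0 = d
    and kernel is a basis of ker A, so every solution is
    x0 + sum_j mu_j * kernel[j].
    """
    m, n = len(A), len(A[0])
    # augmented matrix [A | d] with exact rational entries
    M = [[Fraction(v) for v in row] + [Fraction(di)] for row, di in zip(A, d)]
    pivots = []                      # pivot column of each non-zero row
    r = 0
    for col in range(n):
        p = next((i for i in range(r, m) if M[i][col] != 0), None)
        if p is None:
            continue                 # no pivot: this variable is free
        M[r], M[p] = M[p], M[r]
        piv = M[r][col]
        M[r] = [v / piv for v in M[r]]          # scale the pivot to 1
        for i in range(m):                      # clear the rest of the column
            if i != r and M[i][col] != 0:
                f = M[i][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(col)
        r += 1
        if r == m:
            break
    # a row (0 ... 0 | non-zero) means d is not in the image
    if any(all(v == 0 for v in M[i][:n]) and M[i][n] != 0 for i in range(m)):
        return None
    x0 = [Fraction(0)] * n           # particular solution: free variables = 0
    for i, col in enumerate(pivots):
        x0[col] = M[i][n]
    kernel = []                      # one basis vector per free variable
    for fcol in (c for c in range(n) if c not in pivots):
        u = [Fraction(0)] * n
        u[fcol] = Fraction(1)
        for i, col in enumerate(pivots):
            u[col] = -M[i][fcol]
        kernel.append(u)
    return x0, kernel

print(solve_general([[1, 1], [3, 1]], [1, 2]))  # a = 3, b = 2: unique solution
print(solve_general([[1, 1], [1, 1]], [1, 1]))  # a = b = 1: a line of solutions
print(solve_general([[1, 1], [1, 1]], [1, 2]))  # a = 1, b = 2: None
```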



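Example. Find the general solution of
$$\begin{pmatrix}a & a & b\\ b & a & a\\ a & b & a\end{pmatrix}\begin{pmatrix}x\\ y\\ z\end{pmatrix} = \begin{pmatrix}1\\ c\\ 1\end{pmatrix}.$$
We have $\det A = (a - b)^2(2a + b)$. If $a \neq b$ and $b \neq -2a$, then the inverse exists and there is a unique solution for any $c$. Otherwise, the possible cases are

(i) $a = b$, $b \neq -2a$. So $a \neq 0$. The kernel is the plane $x + y + z = 0$, which is
$$\operatorname{span}\left\{\begin{pmatrix}-1\\ 1\\ 0\end{pmatrix}, \begin{pmatrix}-1\\ 0\\ 1\end{pmatrix}\right\}.$$
We extend this basis to $\mathbb{R}^3$ by adding $\begin{pmatrix}1\\ 0\\ 0\end{pmatrix}$. So the image is the span of $A\begin{pmatrix}1\\ 0\\ 0\end{pmatrix} = a\begin{pmatrix}1\\ 1\\ 1\end{pmatrix}$, i.e. of $\begin{pmatrix}1\\ 1\\ 1\end{pmatrix}$. Hence if $c \neq 1$, then $\begin{pmatrix}1\\ c\\ 1\end{pmatrix}$ is not in the image and there is no solution. If $c = 1$, then a particular solution is $\begin{pmatrix}1/a\\ 0\\ 0\end{pmatrix}$ and the general solution is
$$\mathbf{x} = \begin{pmatrix}1/a\\ 0\\ 0\end{pmatrix} + \lambda\begin{pmatrix}-1\\ 1\\ 0\end{pmatrix} + \mu\begin{pmatrix}-1\\ 0\\ 1\end{pmatrix}.$$

(ii) If $a \neq b$ and $b = -2a$, then $a \neq 0$. The kernel satisfies
$$\begin{aligned} x + y - 2z &= 0\\ -2x + y + z &= 0\\ x - 2y + z &= 0\end{aligned}$$
This can be solved to give $x = y = z$, and the kernel is $\operatorname{span}\left\{\begin{pmatrix}1\\ 1\\ 1\end{pmatrix}\right\}$. We add $\begin{pmatrix}1\\ 0\\ 0\end{pmatrix}$ and $\begin{pmatrix}0\\ 1\\ 0\end{pmatrix}$ to form a basis of $\mathbb{R}^3$. So the image is the span of their images (divided by $a$)
$$\begin{pmatrix}1\\ -2\\ 1\end{pmatrix}, \begin{pmatrix}1\\ 1\\ -2\end{pmatrix}.$$
If $\begin{pmatrix}1\\ c\\ 1\end{pmatrix}$ is in the image, then
$$\begin{pmatrix}1\\ c\\ 1\end{pmatrix} = \lambda\begin{pmatrix}1\\ -2\\ 1\end{pmatrix} + \mu\begin{pmatrix}1\\ 1\\ -2\end{pmatrix}.$$
Then the only solution is $\mu = 0$, $\lambda = 1$, $c = -2$. Thus there is no solution if $c \neq -2$, and when $c = -2$, pick a particular solution $\begin{pmatrix}1/a\\ 0\\ 0\end{pmatrix}$ and the general solution is
$$\mathbf{x} = \begin{pmatrix}1/a\\ 0\\ 0\end{pmatrix} + \lambda\begin{pmatrix}1\\ 1\\ 1\end{pmatrix}.$$

(iii) If $a = b$ and $b = -2a$, then $a = b = 0$ and $\ker A = \mathbb{R}^3$. So there is no solution for any $c$.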
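The case analysis above can be spot-checked with exact arithmetic. The sketch below is illustrative: it picks the hypothetical value $a = 2$, checks the determinant formula $(a - b)^2(2a + b)$ at a few integer values of $b$, and verifies that the particular solutions and kernel vectors found in cases (i) and (ii) behave as claimed.

```python
from fractions import Fraction

def matvec(A, x):
    return [sum(Fraction(a) * xi for a, xi in zip(row, x)) for row in A]

def system(a, b):
    """Coefficient matrix of the example."""
    return [[a, a, b], [b, a, a], [a, b, a]]

def det3(A):
    """3 x 3 determinant by cofactor expansion along the first row."""
    (p, q, r), (s, t, u), (v, w, z) = A
    return p * (t * z - u * w) - q * (s * z - u * v) + r * (s * w - t * v)

a = 2  # hypothetical value, a != 0
for b in range(-5, 6):
    assert det3(system(a, b)) == (a - b) ** 2 * (2 * a + b)

# Case (i): a = b, c = 1
A = system(a, a)
x0 = [Fraction(1, a), 0, 0]
assert matvec(A, x0) == [1, 1, 1]
for u in ([-1, 1, 0], [-1, 0, 1]):
    assert matvec(A, u) == [0, 0, 0]

# Case (ii): b = -2a, c = -2
A = system(a, -2 * a)
assert matvec(A, x0) == [1, -2, 1]
assert matvec(A, [1, 1, 1]) == [0, 0, 0]
print("both cases check out")
```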

5 Eigenvalues and eigenvectors

Given a matrix $A$, an eigenvector is a non-zero vector $\mathbf{x}$ that satisfies $A\mathbf{x} = \lambda\mathbf{x}$ for some $\lambda$. We call $\lambda$ the associated eigenvalue. In some sense, these vectors are not modified by the matrix, and are just scaled by the matrix. We will look at the properties of eigenvectors and eigenvalues, and see their importance in diagonalizing matrices.

5.1 Preliminaries and definitions

Theorem (Fundamental theorem of algebra). Let $p(z)$ be a polynomial of degree $m \ge 1$, i.e.
$$p(z) = \sum_{j=0}^m c_j z^j,$$
where $c_j \in \mathbb{C}$ and $c_m \neq 0$.

Then $p(z) = 0$ has precisely $m$ (not necessarily distinct) roots in the complex plane, accounting for multiplicity.

Note that we have the disclaimer “accounting for multiplicity”. For example, $z^2 - 2z + 1 = 0$ has only one distinct root, $1$, but we say that this root has multiplicity 2, and is thus counted twice. Formally, multiplicity is defined as follows:

Definition (Multiplicity of root). The root $z = \omega$ has multiplicity $k$ if $(z - \omega)^k$ is a factor of $p(z)$ but $(z - \omega)^{k+1}$ is not.

Example. Let $p(z) = z^3 - z^2 - z + 1 = (z - 1)^2(z + 1)$. So $p(z) = 0$ has roots $1, 1, -1$, where $z = 1$ has multiplicity 2.

Definition (Eigenvector and eigenvalue). Let $\alpha: \mathbb{C}^n \to \mathbb{C}^n$ be a linear map with associated matrix $A$. Then $\mathbf{x} \neq \mathbf{0}$ is an eigenvector of $A$ if
$$A\mathbf{x} = \lambda\mathbf{x}$$
for some $\lambda$, and $\lambda$ is the associated eigenvalue. This means that the direction of the eigenvector is preserved by the mapping, but it is scaled by $\lambda$.

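There is a rather easy way of finding eigenvalues:

Theorem. $\lambda$ is an eigenvalue of $A$ iff
$$\det(A - \lambda I) = 0.$$

Proof. ($\Rightarrow$) Suppose that $\lambda$ is an eigenvalue and $\mathbf{x}$ is the associated eigenvector. We can rearrange the equation in the definition above to
$$(A - \lambda I)\mathbf{x} = \mathbf{0}$$
and thus
$$\mathbf{x} \in \ker(A - \lambda I).$$
But $\mathbf{x} \neq \mathbf{0}$. So $\ker(A - \lambda I)$ is non-trivial and $\det(A - \lambda I) = 0$.

($\Leftarrow$) Conversely, if $\det(A - \lambda I) = 0$, then $A - \lambda I$ is not invertible, so $\ker(A - \lambda I)$ is non-trivial. Any non-zero $\mathbf{x}$ in this kernel satisfies $A\mathbf{x} = \lambda\mathbf{x}$, so $\lambda$ is an eigenvalue.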

Definition (Characteristic equation of matrix). The characteristic equation of $A$ is
$$\det(A - \lambda I) = 0.$$

Definition (Characteristic polynomial of matrix). The characteristic polynomial of $A$ is
$$p_A(\lambda) = \det(A - \lambda I).$$

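From the definition of the determinant,
$$\begin{aligned} p_A(\lambda) &= \det(A - \lambda I)\\ &= \varepsilon_{j_1 j_2 \cdots j_n}(A_{j_1 1} - \lambda\delta_{j_1 1}) \cdots (A_{j_n n} - \lambda\delta_{j_n n})\\ &= c_0 + c_1\lambda + \cdots + c_n\lambda^n \end{aligned}$$
for some constants $c_0, \cdots, c_n$. From this, we see that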

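(i) $p_A(\lambda)$ has degree $n$ and has $n$ roots. So an $n \times n$ matrix has $n$ eigenvalues (accounting for multiplicity).

(ii) If $A$ is real, then all $c_i \in \mathbb{R}$. So eigenvalues are either real or come in complex conjugate pairs.

(iii) $c_n = (-1)^n$ and
$$c_{n-1} = (-1)^{n-1}(A_{11} + A_{22} + \cdots + A_{nn}) = (-1)^{n-1}\operatorname{tr}(A).$$
But since $p_A(\lambda) = (\lambda_1 - \lambda)(\lambda_2 - \lambda)\cdots(\lambda_n - \lambda)$, comparing the coefficients of $\lambda^{n-1}$ shows that, up to sign, $c_{n-1}$ is the sum of the roots, i.e. $c_{n-1} = (-1)^{n-1}(\lambda_1 + \lambda_2 + \cdots + \lambda_n)$, so
$$\operatorname{tr}(A) = \lambda_1 + \lambda_2 + \cdots + \lambda_n.$$
Finally, $p_A(0) = \det(A) = c_0$. Also $c_0$ is the product of all roots, i.e. $c_0 = \lambda_1\lambda_2\cdots\lambda_n$. So
$$\det A = \lambda_1\lambda_2\cdots\lambda_n.$$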
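The relations $\operatorname{tr}A = \sum_i \lambda_i$ and $\det A = \prod_i \lambda_i$ can be checked by computing $c_0, \cdots, c_n$ directly. The sketch below uses the Faddeev-LeVerrier recursion, a standard algorithm that is not covered in these notes; it works with $\det(\lambda I - A)$ internally and converts to $p_A(\lambda) = \det(A - \lambda I)$ by multiplying by $(-1)^n$. The function name is illustrative. The test matrix is the one used in example (ii) of Section 5.2, whose eigenvalues are $5, -3, -3$.

```python
from fractions import Fraction

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly_coeffs(A):
    """Return [c_0, ..., c_n] with det(A - lam*I) = sum_k c_k lam^k.

    Faddeev-LeVerrier: write det(lam*I - A) = sum_k a_k lam^k with a_n = 1.
    With M_0 = 0, for k = 1..n:
        M_k = A M_{k-1} + a_{n-k+1} I,   a_{n-k} = -tr(A M_k) / k.
    """
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    a = [Fraction(0)] * (n + 1)                 # coefficients of det(lam*I - A)
    a[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]   # M_0 = 0
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        M = [[AM[i][j] + (a[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AM = mat_mul(A, M)
        a[n - k] = -sum(AM[i][i] for i in range(n)) / k
    sign = (-1) ** n                  # det(A - lam*I) = (-1)^n det(lam*I - A)
    return [sign * x for x in a]

A = [[-2, 2, -3], [2, 1, -6], [-1, -2, 0]]
n = len(A)
c = char_poly_coeffs(A)
trace = sum(A[i][i] for i in range(n))
eigenvalues = [5, -3, -3]
print(c)
assert c[n - 1] == (-1) ** (n - 1) * trace == (-1) ** (n - 1) * sum(eigenvalues)
assert c[0] == 5 * -3 * -3          # det A = product of the eigenvalues
```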

The kernel of the matrix $A - \lambda I$ is the set $\{\mathbf{x} : A\mathbf{x} = \lambda\mathbf{x}\}$. This is a vector subspace because the kernel of any linear map is always a subspace.

Definition (Eigenspace). The eigenspace, denoted by $E_\lambda$, is the kernel of the matrix $A - \lambda I$, i.e. the set of eigenvectors with eigenvalue $\lambda$, together with the zero vector.

Definition (Algebraic multiplicity of eigenvalue). The algebraic multiplicity $M(\lambda)$ or $M_\lambda$ of an eigenvalue $\lambda$ is the multiplicity of $\lambda$ in $p_A(\lambda) = 0$. By the fundamental theorem of algebra,
$$\sum_\lambda M(\lambda) = n.$$
If $M(\lambda) > 1$, then the eigenvalue is degenerate.

Definition (Geometric multiplicity of eigenvalue). The geometric multiplicity $m(\lambda)$ or $m_\lambda$ of an eigenvalue $\lambda$ is the dimension of the eigenspace, i.e. the maximum number of linearly independent eigenvectors with eigenvalue $\lambda$.

Definition (Defect of eigenvalue). The defect $\Delta_\lambda$ of eigenvalue $\lambda$ is
$$\Delta_\lambda = M(\lambda) - m(\lambda).$$
It can be proven that $\Delta_\lambda \ge 0$, i.e. the geometric multiplicity is never greater than the algebraic multiplicity.

5.2 Linearly independent eigenvectors

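Theorem. Suppose an $n \times n$ matrix $A$ has distinct eigenvalues $\lambda_1, \lambda_2, \cdots, \lambda_n$. Then the corresponding eigenvectors $\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_n$ are linearly independent.

Proof. Proof by contradiction: Suppose $\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_n$ are linearly dependent. Then we can find non-zero constants $d_i$ for $i = 1, 2, \cdots, r$, such that
$$d_1\mathbf{x}_1 + d_2\mathbf{x}_2 + \cdots + d_r\mathbf{x}_r = \mathbf{0}.$$
Suppose that this is the shortest non-trivial linear combination that gives $\mathbf{0}$ (we may need to re-order the $\mathbf{x}_i$). Since eigenvectors are non-zero, $r \ge 2$.

Now apply $(A - \lambda_1 I)$ to the whole equation to obtain
$$d_1(\lambda_1 - \lambda_1)\mathbf{x}_1 + d_2(\lambda_2 - \lambda_1)\mathbf{x}_2 + \cdots + d_r(\lambda_r - \lambda_1)\mathbf{x}_r = \mathbf{0}.$$
We know that the first term is $\mathbf{0}$, while the coefficients of the others are non-zero (since we assumed $\lambda_i \neq \lambda_j$ for $i \neq j$). So
$$d_2(\lambda_2 - \lambda_1)\mathbf{x}_2 + \cdots + d_r(\lambda_r - \lambda_1)\mathbf{x}_r = \mathbf{0},$$
and we have found a shorter non-trivial linear combination that gives $\mathbf{0}$. Contradiction.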

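Example.

(i) $A = \begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}$. Then $p_A(\lambda) = \lambda^2 + 1 = 0$. So $\lambda_1 = i$ and $\lambda_2 = -i$.

To solve $(A - \lambda_1 I)\mathbf{x} = \mathbf{0}$, we obtain
$$\begin{pmatrix}-i & 1\\ -1 & -i\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \mathbf{0}.$$
So we obtain
$$\mathbf{x}_1 = \begin{pmatrix}1\\ i\end{pmatrix}$$
to be an eigenvector. Clearly any scalar multiple of $\begin{pmatrix}1\\ i\end{pmatrix}$ is also a solution, but still in the same eigenspace $E_i = \operatorname{span}\left\{\begin{pmatrix}1\\ i\end{pmatrix}\right\}$.

Solving $(A - \lambda_2 I)\mathbf{x} = \mathbf{0}$ gives
$$\mathbf{x}_2 = \begin{pmatrix}1\\ -i\end{pmatrix}.$$
So $E_{-i} = \operatorname{span}\left\{\begin{pmatrix}1\\ -i\end{pmatrix}\right\}$.

Note that $M(\pm i) = m(\pm i) = 1$, so $\Delta_{\pm i} = 0$. Also note that the two eigenvectors are linearly independent and form a basis of $\mathbb{C}^2$.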

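(ii) Consider
$$A = \begin{pmatrix}-2 & 2 & -3\\ 2 & 1 & -6\\ -1 & -2 & 0\end{pmatrix}.$$
Then $\det(A - \lambda I) = 0$ gives $45 + 21\lambda - \lambda^2 - \lambda^3 = 0$. So $\lambda_1 = 5$, $\lambda_2 = \lambda_3 = -3$.

The eigenvector with eigenvalue 5 is
$$\mathbf{x} = \begin{pmatrix}1\\ 2\\ -1\end{pmatrix}.$$
We can find that the eigenvectors with eigenvalue $-3$ are
$$\mathbf{x} = \begin{pmatrix}-2x_2 + 3x_3\\ x_2\\ x_3\end{pmatrix}$$
for any $x_2, x_3$. This gives two linearly independent eigenvectors, say $\begin{pmatrix}-2\\ 1\\ 0\end{pmatrix}, \begin{pmatrix}3\\ 0\\ 1\end{pmatrix}$.

So $M(5) = m(5) = 1$ and $M(-3) = m(-3) = 2$, and neither eigenvalue has a defect. Note that these three eigenvectors form a basis of $\mathbb{C}^3$.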

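(iii) Let
$$A = \begin{pmatrix}-3 & -1 & 1\\ -1 & -3 & 1\\ -2 & -2 & 0\end{pmatrix}.$$
Then $0 = p_A(\lambda) = -(\lambda + 2)^3$. So $\lambda = -2, -2, -2$. To find the eigenvectors, we have
$$(A + 2I)\mathbf{x} = \begin{pmatrix}-1 & -1 & 1\\ -1 & -1 & 1\\ -2 & -2 & 2\end{pmatrix}\begin{pmatrix}x_1\\ x_2\\ x_3\end{pmatrix} = \mathbf{0}.$$
So the eigenvectors are exactly the solutions of $x_1 + x_2 - x_3 = 0$, and the general solution is thus
$$\mathbf{x} = x_1\begin{pmatrix}1\\ 0\\ 1\end{pmatrix} + x_2\begin{pmatrix}0\\ 1\\ 1\end{pmatrix}.$$
The eigenspace is $E_{-2} = \operatorname{span}\left\{\begin{pmatrix}1\\ 0\\ 1\end{pmatrix}, \begin{pmatrix}0\\ 1\\ 1\end{pmatrix}\right\}$.

Hence $M(-2) = 3$ and $m(-2) = 2$. Thus the defect $\Delta_{-2} = 1$. So the eigenvectors do not form a basis of $\mathbb{C}^3$.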

(iv) Consider the reflection $R$ in the plane with normal $\mathbf{n}$. Clearly $R\mathbf{n} = -\mathbf{n}$. The eigenvalue is $-1$ and the eigenvector is $\mathbf{n}$. Then $E_{-1} = \operatorname{span}\{\mathbf{n}\}$. So $M(-1) = m(-1) = 1$.

If $\mathbf{p}$ is any vector in the plane, $R\mathbf{p} = \mathbf{p}$. So this has an eigenvalue of 1 and eigenvectors being any vector in the plane. So $M(1) = m(1) = 2$.

So the eigenvectors form a basis of $\mathbb{R}^3$.

(v) Consider a rotation $R$ by an angle $\theta$ about $\mathbf{n}$. Since $R\mathbf{n} = \mathbf{n}$, we have an eigenvalue of 1 and eigenspace $E_1 = \operatorname{span}\{\mathbf{n}\}$.

The other eigenvalues turn out to be $e^{\pm i\theta}$. If $\theta$ is not a multiple of $\pi$, these are not real, since the rotation changes the direction of every vector not parallel to $\mathbf{n}$; in that case there are 3 distinct eigenvalues and the eigenvectors form a basis of $\mathbb{C}^3$.

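(vi) Consider a shear
$$A = \begin{pmatrix}1 & \mu\\ 0 & 1\end{pmatrix}$$
with $\mu \neq 0$. The characteristic equation is $(1 - \lambda)^2 = 0$ and $\lambda = 1$. The eigenvectors corresponding to $\lambda = 1$ are the non-zero multiples of $\begin{pmatrix}1\\ 0\end{pmatrix}$. We have $M(1) = 2$ and $m(1) = 1$. So $\Delta_1 = 1$.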
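The geometric multiplicities in examples (ii), (iii) and (vi) can be computed as $m(\lambda) = \dim\ker(A - \lambda I) = n - r(A - \lambda I)$ by the rank-nullity theorem. The following is a minimal sketch with exact rational elimination; the helper names are illustrative, and the shear is taken with the hypothetical value $\mu = 1$.

```python
from fractions import Fraction

def rank(M):
    """Row rank via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def geometric_multiplicity(A, lam):
    """m(lam) = n - r(A - lam I), by rank-nullity."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

A2 = [[-2, 2, -3], [2, 1, -6], [-1, -2, 0]]   # example (ii)
A3 = [[-3, -1, 1], [-1, -3, 1], [-2, -2, 0]]  # example (iii)
shear = [[1, 1], [0, 1]]                      # example (vi) with mu = 1

print(geometric_multiplicity(A2, 5), geometric_multiplicity(A2, -3))  # 1 2
print(geometric_multiplicity(A3, -2))    # 2, so the defect is 3 - 2 = 1
print(geometric_multiplicity(shear, 1))  # 1, so the defect is 2 - 1 = 1
```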

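If an $n \times n$ matrix $A$ has $n$ distinct eigenvalues, and hence has $n$ linearly independent eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_n$, then with respect to this eigenvector basis, $A$ is diagonal.

In this basis, $\mathbf{v}_1 = (1, 0, \cdots, 0)$ etc. We know that $A\mathbf{v}_i = \lambda_i\mathbf{v}_i$ (no summation). So the image of the $i$th basis vector is $\lambda_i$ times the $i$th basis vector. Since the columns of $A$ are simply the images of the basis,
$$A = \begin{pmatrix}\lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n\end{pmatrix}.$$
The fact that $A$ can be diagonalized by changing the basis is an important observation. We will now look at how we can change bases and see how we can make use of this.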

5.3 Transformation matrices

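How do the components of a vector or a matrix change when we change the basis?

Let $\{\mathbf{e}_1, \mathbf{e}_2, \cdots, \mathbf{e}_n\}$ and $\{\tilde{\mathbf{e}}_1, \tilde{\mathbf{e}}_2, \cdots, \tilde{\mathbf{e}}_n\}$ be 2 different bases of $\mathbb{C}^n$. Then we can write
$$\tilde{\mathbf{e}}_j = \sum_{i=1}^n P_{ij}\mathbf{e}_i,$$
i.e. $P_{ij}$ is the $i$th component of $\tilde{\mathbf{e}}_j$ with respect to the basis $\{\mathbf{e}_1, \mathbf{e}_2, \cdots, \mathbf{e}_n\}$. Note that the sum is made as $P_{ij}\mathbf{e}_i$, not $P_{ij}\mathbf{e}_j$. This is different from the formula for matrix multiplication.

Matrix $P$ has as its columns the vectors $\tilde{\mathbf{e}}_j$ relative to $\{\mathbf{e}_1, \mathbf{e}_2, \cdots, \mathbf{e}_n\}$. So $P = (\tilde{\mathbf{e}}_1\ \tilde{\mathbf{e}}_2\ \cdots\ \tilde{\mathbf{e}}_n)$ and
$$P(\mathbf{e}_i) = \tilde{\mathbf{e}}_i.$$
Similarly, we can write
$$\mathbf{e}_i = \sum_{k=1}^n Q_{ki}\tilde{\mathbf{e}}_k$$
with $Q = (\mathbf{e}_1\ \mathbf{e}_2\ \cdots\ \mathbf{e}_n)$ relative to $\{\tilde{\mathbf{e}}_1, \cdots, \tilde{\mathbf{e}}_n\}$.

Substituting this into the equation for $\tilde{\mathbf{e}}_j$, we have
$$\tilde{\mathbf{e}}_j = \sum_{i=1}^n\left(\sum_{k=1}^n Q_{ki}\tilde{\mathbf{e}}_k\right)P_{ij} = \sum_{k=1}^n \tilde{\mathbf{e}}_k\left(\sum_{i=1}^n Q_{ki}P_{ij}\right).$$
But $\tilde{\mathbf{e}}_1, \tilde{\mathbf{e}}_2, \cdots, \tilde{\mathbf{e}}_n$ are linearly independent, so this is only possible if
$$\sum_{i=1}^n Q_{ki}P_{ij} = \delta_{kj},$$
which is just a fancy way of saying $QP = I$, or $Q = P^{-1}$.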

5.3.1 Transformation law for vectors

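With respect to basis $\{\mathbf{e}_i\}$, $\mathbf{u} = \sum_{i=1}^n u_i\mathbf{e}_i$. With respect to basis $\{\tilde{\mathbf{e}}_i\}$, $\mathbf{u} = \sum_{i=1}^n \tilde{u}_i\tilde{\mathbf{e}}_i$. Note that this is the same vector $\mathbf{u}$ but has different components with respect to different bases. Using the transformation matrix above for the basis, we have
$$\mathbf{u} = \sum_{j=1}^n \tilde{u}_j\sum_{i=1}^n P_{ij}\mathbf{e}_i = \sum_{i=1}^n\left(\sum_{j=1}^n P_{ij}\tilde{u}_j\right)\mathbf{e}_i.$$
By comparison, we know that
$$u_i = \sum_{j=1}^n P_{ij}\tilde{u}_j.$$

Theorem. Denote a vector as $\mathbf{u}$ with respect to $\{\mathbf{e}_i\}$ and $\tilde{\mathbf{u}}$ with respect to $\{\tilde{\mathbf{e}}_i\}$. Then
$$\mathbf{u} = P\tilde{\mathbf{u}} \quad\text{and}\quad \tilde{\mathbf{u}} = P^{-1}\mathbf{u}.$$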

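Example. Take the first basis as $\{\mathbf{e}_1 = (1, 0), \mathbf{e}_2 = (0, 1)\}$ and the second as $\{\tilde{\mathbf{e}}_1 = (1, 1), \tilde{\mathbf{e}}_2 = (-1, 1)\}$.

So $\tilde{\mathbf{e}}_1 = \mathbf{e}_1 + \mathbf{e}_2$ and $\tilde{\mathbf{e}}_2 = -\mathbf{e}_1 + \mathbf{e}_2$. We have
$$P = \begin{pmatrix}1 & -1\\ 1 & 1\end{pmatrix}.$$
Then for an arbitrary vector $\mathbf{u}$, we have
$$\begin{aligned}\mathbf{u} &= u_1\mathbf{e}_1 + u_2\mathbf{e}_2\\ &= u_1\left(\tfrac{1}{2}\tilde{\mathbf{e}}_1 - \tfrac{1}{2}\tilde{\mathbf{e}}_2\right) + u_2\left(\tfrac{1}{2}\tilde{\mathbf{e}}_1 + \tfrac{1}{2}\tilde{\mathbf{e}}_2\right)\\ &= \tfrac{1}{2}(u_1 + u_2)\tilde{\mathbf{e}}_1 + \tfrac{1}{2}(-u_1 + u_2)\tilde{\mathbf{e}}_2.\end{aligned}$$
Alternatively, using the formula above, we obtain
$$\tilde{\mathbf{u}} = P^{-1}\mathbf{u} = \frac{1}{2}\begin{pmatrix}1 & 1\\ -1 & 1\end{pmatrix}\begin{pmatrix}u_1\\ u_2\end{pmatrix} = \begin{pmatrix}\tfrac{1}{2}(u_1 + u_2)\\ \tfrac{1}{2}(-u_1 + u_2)\end{pmatrix},$$
which agrees with the above direct expansion.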
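The transformation law $\tilde{\mathbf{u}} = P^{-1}\mathbf{u}$ for this example can be sketched in a few lines of Python with exact fractions. The components $\mathbf{u} = (3, 5)$ are a hypothetical choice, and the helper names are illustrative.

```python
from fractions import Fraction

def inverse_2x2(P):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = P
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("columns are linearly dependent, not a basis")
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

P = [[1, -1],
     [1, 1]]      # columns: e~1 = (1, 1) and e~2 = (-1, 1)
u = [3, 5]        # hypothetical components with respect to {e1, e2}

u_tilde = apply(inverse_2x2(P), u)    # u~ = P^{-1} u
print(u_tilde)   # ((u1 + u2)/2, (-u1 + u2)/2), here 4 and 1
assert apply(P, u_tilde) == u         # u = P u~ recovers the original
```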

5.3.2 Transformation law for matrix

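Consider a linear map $\alpha: \mathbb{C}^n \to \mathbb{C}^n$ with associated $n \times n$ matrix $A$. We have
$$\mathbf{u}' = \alpha(\mathbf{u}) = A\mathbf{u}.$$
Denote $\mathbf{u}$ and $\mathbf{u}'$ as being with respect to basis $\{\mathbf{e}_i\}$ (i.e. the same basis in both spaces), and $\tilde{\mathbf{u}}, \tilde{\mathbf{u}}'$ with respect to $\{\tilde{\mathbf{e}}_i\}$. Using what we've got above, we have
$$P\tilde{\mathbf{u}}' = \mathbf{u}' = A\mathbf{u} = AP\tilde{\mathbf{u}},$$
so
$$\tilde{\mathbf{u}}' = P^{-1}AP\tilde{\mathbf{u}}.$$

Theorem. The matrix of $\alpha$ with respect to $\{\tilde{\mathbf{e}}_i\}$ is
$$\tilde{A} = P^{-1}AP.$$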

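Example. Consider the shear
$$A = \begin{pmatrix}1 & \lambda & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}$$
with respect to the standard basis. Choose a new set of basis vectors by rotating by $\theta$ about the $\mathbf{e}_3$ axis:
$$\begin{aligned}\tilde{\mathbf{e}}_1 &= \cos\theta\,\mathbf{e}_1 + \sin\theta\,\mathbf{e}_2\\ \tilde{\mathbf{e}}_2 &= -\sin\theta\,\mathbf{e}_1 + \cos\theta\,\mathbf{e}_2\\ \tilde{\mathbf{e}}_3 &= \mathbf{e}_3\end{aligned}$$
So we have
$$P = \begin{pmatrix}\cos\theta & -\sin\theta & 0\\ \sin\theta & \cos\theta & 0\\ 0 & 0 & 1\end{pmatrix}, \quad P^{-1} = \begin{pmatrix}\cos\theta & \sin\theta & 0\\ -\sin\theta & \cos\theta & 0\\ 0 & 0 & 1\end{pmatrix}.$$
Now use the basis transformation laws to obtain
$$\tilde{A} = \begin{pmatrix}1 + \lambda\sin\theta\cos\theta & \lambda\cos^2\theta & 0\\ -\lambda\sin^2\theta & 1 - \lambda\sin\theta\cos\theta & 0\\ 0 & 0 & 1\end{pmatrix}.$$
Clearly this is much more complicated than in our original basis. This shows that choosing a sensible basis is important.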

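More generally, given $\alpha: \mathbb{C}^m \to \mathbb{C}^n$, $\mathbf{x} \in \mathbb{C}^m$ and $\mathbf{x}' \in \mathbb{C}^n$ with $\mathbf{x}' = A\mathbf{x}$, we know that $A$ is an $n \times m$ matrix.

Suppose $\mathbb{C}^m$ has a basis $\{\mathbf{e}_i\}$ and $\mathbb{C}^n$ has a basis $\{\mathbf{f}_i\}$. Now change bases to $\{\tilde{\mathbf{e}}_i\}$ and $\{\tilde{\mathbf{f}}_i\}$.

We know that $\mathbf{x} = P\tilde{\mathbf{x}}$ with $P$ being an $m \times m$ matrix, and $\mathbf{x}' = R\tilde{\mathbf{x}}'$ with $R$ being an $n \times n$ matrix.

Combining both of these, we have
$$R\tilde{\mathbf{x}}' = \mathbf{x}' = A\mathbf{x} = AP\tilde{\mathbf{x}},$$
so
$$\tilde{\mathbf{x}}' = R^{-1}AP\tilde{\mathbf{x}}.$$
Therefore
$$\tilde{A} = R^{-1}AP.$$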

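Example. Consider $\alpha: \mathbb{R}^3 \to \mathbb{R}^2$, with respect to the standard bases in both spaces,
$$A = \begin{pmatrix}2 & 3 & 4\\ 1 & 6 & 3\end{pmatrix}.$$
Use a new basis $\left\{\begin{pmatrix}2\\ 1\end{pmatrix}, \begin{pmatrix}1\\ 5\end{pmatrix}\right\}$ in $\mathbb{R}^2$ and keep the standard basis in $\mathbb{R}^3$. The basis change matrix in $\mathbb{R}^3$ is simply $I$, while
$$R = \begin{pmatrix}2 & 1\\ 1 & 5\end{pmatrix}, \quad R^{-1} = \frac{1}{9}\begin{pmatrix}5 & -1\\ -1 & 2\end{pmatrix}$$
is the transformation matrix for $\mathbb{R}^2$. So
$$\tilde{A} = \begin{pmatrix}2 & 1\\ 1 & 5\end{pmatrix}^{-1}\begin{pmatrix}2 & 3 & 4\\ 1 & 6 & 3\end{pmatrix} = \frac{1}{9}\begin{pmatrix}5 & -1\\ -1 & 2\end{pmatrix}\begin{pmatrix}2 & 3 & 4\\ 1 & 6 & 3\end{pmatrix} = \begin{pmatrix}1 & 1 & 17/9\\ 0 & 1 & 2/9\end{pmatrix}.$$
We can alternatively do it this way: we know that
$$\tilde{\mathbf{f}}_1 = 2\mathbf{f}_1 + \mathbf{f}_2, \quad \tilde{\mathbf{f}}_2 = \mathbf{f}_1 + 5\mathbf{f}_2.$$
Then we know that
$$\begin{aligned}\tilde{\mathbf{e}}_1 = \mathbf{e}_1 &\mapsto 2\mathbf{f}_1 + \mathbf{f}_2 = \tilde{\mathbf{f}}_1\\ \tilde{\mathbf{e}}_2 = \mathbf{e}_2 &\mapsto 3\mathbf{f}_1 + 6\mathbf{f}_2 = \tilde{\mathbf{f}}_1 + \tilde{\mathbf{f}}_2\\ \tilde{\mathbf{e}}_3 = \mathbf{e}_3 &\mapsto 4\mathbf{f}_1 + 3\mathbf{f}_2 = \tfrac{17}{9}\tilde{\mathbf{f}}_1 + \tfrac{2}{9}\tilde{\mathbf{f}}_2\end{aligned}$$
and we can construct the matrix correspondingly.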
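The computation $\tilde{A} = R^{-1}AP$ with $P = I$ can be reproduced exactly with Python's `fractions` module. This is a small sketch; the helper names are illustrative, and the final check $R\tilde{A} = A$ is just the defining relation rearranged.

```python
from fractions import Fraction

A = [[2, 3, 4],
     [1, 6, 3]]                        # alpha: R^3 -> R^2, standard bases
R = [[2, 1],
     [1, 5]]                           # columns f~1 = (2, 1), f~2 = (1, 5)
P = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # basis of R^3 unchanged

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A_tilde = matmul(matmul(inverse_2x2(R), A), P)   # A~ = R^{-1} A P
for row in A_tilde:
    print([str(x) for x in row])
# ['1', '1', '17/9']
# ['0', '1', '2/9']
```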

5.4 Similar matrices

Definition (Similar matrices). Two $n \times n$ matrices $A$ and $B$ are similar if there exists an invertible matrix $P$ such that
$$B = P^{-1}AP,$$
i.e. they represent the same map under different bases. Alternatively, using the language from IA Groups, we say that they are in the same conjugacy class.

Proposition. Similar matrices have the following properties:

(i) Similar matrices have the same determinant.

(ii) Similar matrices have the same trace.

(iii) Similar matrices have the same characteristic polynomial.

Note that (iii) implies (i) and (ii), since the determinant and (up to sign) the trace are coefficients of the characteristic polynomial.

Proof. They are proven as follows:

(i) $\det B = \det(P^{-1}AP) = (\det A)(\det P)^{-1}(\det P) = \det A$.

(ii)
$$\begin{aligned}\operatorname{tr}B &= B_{ii}\\ &= P^{-1}_{ij}A_{jk}P_{ki}\\ &= A_{jk}P_{ki}P^{-1}_{ij}\\ &= A_{jk}(PP^{-1})_{kj}\\ &= A_{jk}\delta_{kj}\\ &= A_{jj}\\ &= \operatorname{tr}A\end{aligned}$$

(iii)
$$\begin{aligned}p_B(\lambda) &= \det(B - \lambda I)\\ &= \det(P^{-1}AP - \lambda I)\\ &= \det(P^{-1}AP - \lambda P^{-1}IP)\\ &= \det(P^{-1}(A - \lambda I)P)\\ &= \det(A - \lambda I)\\ &= p_A(\lambda)\end{aligned}$$

5.5 Diagonalizable matrices

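Definition (Diagonalizable matrices). An $n \times n$ matrix $A$ is diagonalizable if it is similar to a diagonal matrix. We showed above that this is equivalent to saying the eigenvectors form a basis of $\mathbb{C}^n$.

The requirement that matrix $A$ has $n$ distinct eigenvalues is a sufficient condition for diagonalizability as shown above. However, it is not necessary. Consider the second example in Section 5.2,
$$A = \begin{pmatrix}-2 & 2 & -3\\ 2 & 1 & -6\\ -1 & -2 & 0\end{pmatrix}.$$
We found three linearly independent eigenvectors
$$\begin{pmatrix}1\\ 2\\ -1\end{pmatrix}, \begin{pmatrix}-2\\ 1\\ 0\end{pmatrix}, \begin{pmatrix}3\\ 0\\ 1\end{pmatrix}.$$
If we let
$$P = \begin{pmatrix}1 & -2 & 3\\ 2 & 1 & 0\\ -1 & 0 & 1\end{pmatrix}, \quad P^{-1} = \frac{1}{8}\begin{pmatrix}1 & 2 & -3\\ -2 & 4 & 6\\ 1 & 2 & 5\end{pmatrix},$$
then
$$\tilde{A} = P^{-1}AP = \begin{pmatrix}5 & 0 & 0\\ 0 & -3 & 0\\ 0 & 0 & -3\end{pmatrix},$$
so $A$ is diagonalizable.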
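A quick exact-arithmetic check of this diagonalization, and of the invariance of the trace under similarity, is sketched below. It assumes only the matrices written above; the helper names are illustrative.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

A = [[-2, 2, -3], [2, 1, -6], [-1, -2, 0]]
P = [[1, -2, 3], [2, 1, 0], [-1, 0, 1]]          # eigenvectors as columns
P_inv = [[Fraction(x, 8) for x in row]
         for row in [[1, 2, -3], [-2, 4, 6], [1, 2, 5]]]

I3 = [[int(i == j) for j in range(3)] for i in range(3)]
assert matmul(P_inv, P) == I3                    # P_inv really is P^{-1}

D = matmul(matmul(P_inv, A), P)                  # A~ = P^{-1} A P
for row in D:
    print([str(x) for x in row])
# ['5', '0', '0']
# ['0', '-3', '0']
# ['0', '0', '-3']
assert trace(D) == trace(A)                      # similar matrices share the trace
```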

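Theorem. Let $\lambda_1, \lambda_2, \cdots, \lambda_r$, with $r \le n$, be the distinct eigenvalues of $A$. Let $B_1, B_2, \cdots, B_r$ be the bases of the eigenspaces $E_{\lambda_1}, E_{\lambda_2}, \cdots, E_{\lambda_r}$ correspondingly. Then the set
$$B = \bigcup_{i=1}^r B_i$$
is linearly independent.

This is similar to the proof we had for the case where the eigenvalues are distinct. However, we are going to do it much more concisely, and the actual meat of the proof is just a single line.

Proof. Write $B_1 = \{\mathbf{x}_1^{(1)}, \mathbf{x}_2^{(1)}, \cdots, \mathbf{x}_{m(\lambda_1)}^{(1)}\}$. Then $m(\lambda_1) = \dim(E_{\lambda_1})$, and similarly for all $B_i$.

Consider the following general linear combination of all elements in $B$, set equal to zero:
$$\sum_{i=1}^r\sum_{j=1}^{m(\lambda_i)}\alpha_{ij}\mathbf{x}_j^{(i)} = \mathbf{0}.$$
The first sum is summing over all eigenspaces, and the second sum sums over the basis vectors in $B_i$. Now apply the matrix
$$\prod_{k=1,2,\cdots,\bar{K},\cdots,r}(A - \lambda_k I)$$
(where the bar means that the factor with $k = K$ is omitted) to the above sum, for some arbitrary $K$. Each term with $i \neq K$ is killed by the factor $(A - \lambda_i I)$, and we obtain
$$\sum_{j=1}^{m(\lambda_K)}\alpha_{Kj}\left[\prod_{k=1,2,\cdots,\bar{K},\cdots,r}(\lambda_K - \lambda_k)\right]\mathbf{x}_j^{(K)} = \mathbf{0}.$$
The product in square brackets is non-zero since the eigenvalues are distinct. Since the $\mathbf{x}_j^{(K)}$ are linearly independent ($B_K$ is a basis), $\alpha_{Kj} = 0$ for all $j$. Since $K$ was arbitrary, all $\alpha_{ij}$ must be zero. So $B$ is linearly independent.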

Proposition. A is diagonalizable iff all its eigenvalues have zero defect.

5.6 Canonical (Jordan normal) form

Given a matrix $A$, if its eigenvalues all have zero defect, then we can find a basis in which it is diagonal. However, if some eigenvalue does have non-zero defect, we can still put it into an almost-diagonal form. This is known as the Jordan normal form.

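Theorem. Any $2 \times 2$ complex matrix $A$ is similar to exactly one of
$$\begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}\ (\lambda_1 \neq \lambda_2), \quad \begin{pmatrix}\lambda & 0\\ 0 & \lambda\end{pmatrix}, \quad \begin{pmatrix}\lambda & 1\\ 0 & \lambda\end{pmatrix}.$$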

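Proof. For each case:

(i) If $A$ has two distinct eigenvalues, then the eigenvectors are linearly independent. Then we can use $P$ formed from the eigenvectors as its columns.

(ii) If $\lambda_1 = \lambda_2 = \lambda$ and $\dim E_\lambda = 2$, then write $E_\lambda = \operatorname{span}\{\mathbf{u}, \mathbf{v}\}$, with $\mathbf{u}, \mathbf{v}$ linearly independent. Now use $\{\mathbf{u}, \mathbf{v}\}$ as a new basis of $\mathbb{C}^2$ and
$$\tilde{A} = P^{-1}AP = \begin{pmatrix}\lambda & 0\\ 0 & \lambda\end{pmatrix} = \lambda I.$$
Note that since $P^{-1}AP = \lambda I$, we have $A = P(\lambda I)P^{-1} = \lambda I$. So $A$ is isotropic, i.e. the same with respect to any basis.

(iii) If $\lambda_1 = \lambda_2 = \lambda$ and $\dim(E_\lambda) = 1$, then $E_\lambda = \operatorname{span}\{\mathbf{v}\}$. Now choose a basis of $\mathbb{C}^2$ as $\{\mathbf{v}, \mathbf{w}\}$, where $\mathbf{w} \in \mathbb{C}^2 \setminus E_\lambda$.

We know that $A\mathbf{w} \in \mathbb{C}^2$. So $A\mathbf{w} = \alpha\mathbf{v} + \beta\mathbf{w}$. Hence, if we change basis to $\{\mathbf{v}, \mathbf{w}\}$, then
$$\tilde{A} = P^{-1}AP = \begin{pmatrix}\lambda & \alpha\\ 0 & \beta\end{pmatrix}.$$
However, $A$ and $\tilde{A}$ both have eigenvalue $\lambda$ with algebraic multiplicity 2. So we must have $\beta = \lambda$. To make $\alpha = 1$, let $\mathbf{u} = (\tilde{A} - \lambda I)\mathbf{w}$. We know $\mathbf{u} \neq \mathbf{0}$ since $\mathbf{w}$ is not in the eigenspace. Then
$$(\tilde{A} - \lambda I)\mathbf{u} = (\tilde{A} - \lambda I)^2\mathbf{w} = \begin{pmatrix}0 & \alpha\\ 0 & 0\end{pmatrix}\begin{pmatrix}0 & \alpha\\ 0 & 0\end{pmatrix}\mathbf{w} = \mathbf{0}.$$
So $\mathbf{u}$ is an eigenvector of $\tilde{A}$ with eigenvalue $\lambda$.

We have $\mathbf{u} = \tilde{A}\mathbf{w} - \lambda\mathbf{w}$. So $\tilde{A}\mathbf{w} = \mathbf{u} + \lambda\mathbf{w}$.

Change basis to $\{\mathbf{u}, \mathbf{w}\}$. Then $\tilde{A}$ with respect to this basis is
$$\begin{pmatrix}\lambda & 1\\ 0 & \lambda\end{pmatrix}.$$
This is a two-stage process: $P$ sends the basis to $\{\mathbf{v}, \mathbf{w}\}$ and then the matrix $Q$ sends it to the basis $\{\mathbf{u}, \mathbf{w}\}$. So the similarity transformation is
$$Q^{-1}(P^{-1}AP)Q = (PQ)^{-1}A(PQ).$$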

Proposition. (Without proof) The canonical form, or Jordan normal form, exists for any $n \times n$ matrix $A$. Specifically, there exists a similarity transform such that $A$ is similar to a matrix $\tilde{A}$ that satisfies the following properties:

(i) $\tilde{A}_{\alpha\alpha} = \lambda_\alpha$, i.e. the diagonal consists of the eigenvalues.

(ii) $\tilde{A}_{\alpha,\alpha+1} = 0$ or $1$.

(iii) $\tilde{A}_{ij} = 0$ otherwise.

The actual theorem is stronger than this, and the Jordan normal form satisfies some additional properties in addition to the above. However, we shall not go into details, and this is left for the IB Linear Algebra course.

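Example. Let
$$A = \begin{pmatrix}-3 & -1 & 1\\ -1 & -3 & 1\\ -2 & -2 & 0\end{pmatrix}.$$
The eigenvalues are $-2, -2, -2$ and the eigenvectors are spanned by
$$\begin{pmatrix}-1\\ 1\\ 0\end{pmatrix}, \begin{pmatrix}1\\ 0\\ 1\end{pmatrix}.$$
Pick $\mathbf{w} = \begin{pmatrix}1\\ 0\\ 0\end{pmatrix}$. Write
$$\mathbf{u} = (A - \lambda I)\mathbf{w} = \begin{pmatrix}-1 & -1 & 1\\ -1 & -1 & 1\\ -2 & -2 & 2\end{pmatrix}\begin{pmatrix}1\\ 0\\ 0\end{pmatrix} = \begin{pmatrix}-1\\ -1\\ -2\end{pmatrix}.$$
Note that $A\mathbf{u} = -2\mathbf{u}$. We also have $A\mathbf{w} = \mathbf{u} - 2\mathbf{w}$. Form a basis $\{\mathbf{u}, \mathbf{w}, \mathbf{v}\}$, where $\mathbf{v}$ is another eigenvector linearly independent from $\mathbf{u}$, say
$$\mathbf{v} = \begin{pmatrix}1\\ 0\\ 1\end{pmatrix}.$$
Now change to this basis with
$$P = \begin{pmatrix}-1 & 1 & 1\\ -1 & 0 & 0\\ -2 & 0 & 1\end{pmatrix}.$$
Then the Jordan normal form is
$$P^{-1}AP = \begin{pmatrix}-2 & 1 & 0\\ 0 & -2 & 0\\ 0 & 0 & -2\end{pmatrix}.$$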
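The Jordan form can be verified without computing $P^{-1}$: for invertible $P$, $P^{-1}AP = J$ is equivalent to $AP = PJ$. The sketch below is illustrative (the helper names are not from the notes) and also checks that $\det P \neq 0$, so that $\{\mathbf{u}, \mathbf{w}, \mathbf{v}\}$ really is a basis.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def det3(M):
    """3 x 3 determinant by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[-3, -1, 1], [-1, -3, 1], [-2, -2, 0]]
P = [[-1, 1, 1], [-1, 0, 0], [-2, 0, 1]]   # columns u, w, v
J = [[-2, 1, 0], [0, -2, 0], [0, 0, -2]]

assert det3(P) != 0                  # {u, w, v} is a basis
assert matmul(A, P) == matmul(P, J)  # equivalent to P^{-1} A P = J
print("P^{-1} A P is the Jordan form J")
```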





5.7 Cayley-Hamilton Theorem

Theorem (Cayley-Hamilton theorem). Every $n \times n$ complex matrix satisfies its own characteristic equation.

Proof. We will only prove this for diagonalizable matrices here. So suppose for our matrix $A$, there is some $P$ such that $D = \operatorname{diag}(\lambda_1, \lambda_2, \cdots, \lambda_n) = P^{-1}AP$. Note that
$$D^k = (P^{-1}AP)(P^{-1}AP)\cdots(P^{-1}AP) = P^{-1}A^kP.$$
Hence
$$p_D(D) = p_D(P^{-1}AP) = P^{-1}[p_D(A)]P.$$
Since similar matrices have the same characteristic polynomial, $p_D = p_A$. So
$$p_A(D) = P^{-1}[p_A(A)]P.$$
However, we also know that $D^k = \operatorname{diag}(\lambda_1^k, \lambda_2^k, \cdots, \lambda_n^k)$. So
$$p_A(D) = \operatorname{diag}(p_A(\lambda_1), p_A(\lambda_2), \cdots, p_A(\lambda_n)) = \operatorname{diag}(0, 0, \cdots, 0)$$
since the eigenvalues are roots of $p_A(\lambda) = 0$. So $0 = p_A(D) = P^{-1}p_A(A)P$ and thus $p_A(A) = 0$.

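There are a few things to note.

(i) If $A^{-1}$ exists, then
$$A^{-1}p_A(A) = A^{-1}(c_0I + c_1A + c_2A^2 + \cdots + c_nA^n) = 0.$$
So
$$c_0A^{-1} + c_1I + c_2A + \cdots + c_nA^{n-1} = 0.$$
Since $A^{-1}$ exists, $c_0 = \det A \neq 0$. So
$$A^{-1} = -\frac{1}{c_0}(c_1I + c_2A + \cdots + c_nA^{n-1}).$$
So we can calculate $A^{-1}$ from positive powers of $A$.

(ii) We can define matrix exponentiation by
$$e^A = I + A + \frac{1}{2!}A^2 + \cdots + \frac{1}{n!}A^n + \cdots.$$
It is a fact that this always converges.

If $A$ is diagonalizable with $P^{-1}AP = D = \operatorname{diag}(\lambda_1, \lambda_2, \cdots, \lambda_n)$, then
$$\begin{aligned}P^{-1}e^AP &= P^{-1}IP + P^{-1}AP + \frac{1}{2!}P^{-1}A^2P + \cdots\\ &= I + D + \frac{1}{2!}D^2 + \cdots\\ &= \operatorname{diag}(e^{\lambda_1}, e^{\lambda_2}, \cdots, e^{\lambda_n}).\end{aligned}$$
So
$$e^A = P[\operatorname{diag}(e^{\lambda_1}, e^{\lambda_2}, \cdots, e^{\lambda_n})]P^{-1}.$$

(iii) For $2 \times 2$ matrices which are similar to
$$B = \begin{pmatrix}\lambda & 1\\ 0 & \lambda\end{pmatrix},$$
we see that the characteristic polynomial is $p_B(z) = \det(B - zI) = (\lambda - z)^2$. Then
$$p_B(B) = (\lambda I - B)^2 = \begin{pmatrix}0 & -1\\ 0 & 0\end{pmatrix}^2 = \begin{pmatrix}0 & 0\\ 0 & 0\end{pmatrix}.$$
If $A = QBQ^{-1}$, then $p_A = p_B$ and $p_A(A) = Q\,p_B(B)\,Q^{-1} = 0$. Since we have proved the result for diagonalizable matrices above, and every $2 \times 2$ matrix is similar to one of the three canonical forms, we now know that any $2 \times 2$ matrix satisfies the Cayley-Hamilton theorem.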
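The theorem and note (i) can be checked with exact arithmetic on the two $3 \times 3$ examples of Section 5.2. This is a sketch: the characteristic polynomials are taken from those examples, and `poly_at_matrix` (a hypothetical helper name) evaluates a polynomial at a matrix with Horner's rule. The second matrix is not diagonalizable, so it lies outside the proof given above, but it still satisfies $p_A(A) = 0$, as the general theorem asserts.

```python
from fractions import Fraction

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_at_matrix(c, A):
    """Evaluate c_0 I + c_1 A + ... + c_m A^m by Horner's rule."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for ck in reversed(c):
        result = matmul(result, A)
        result = [[result[i][j] + (ck if i == j else 0) for j in range(n)]
                  for i in range(n)]
    return result

def is_zero(M):
    return all(x == 0 for row in M for x in row)

# Example (ii), diagonalizable: p_A = 45 + 21 lam - lam^2 - lam^3
A = [[-2, 2, -3], [2, 1, -6], [-1, -2, 0]]
c = [45, 21, -1, -1]
assert is_zero(poly_at_matrix(c, A))

# Example (iii), not diagonalizable: p_B = -(lam + 2)^3
B = [[-3, -1, 1], [-1, -3, 1], [-2, -2, 0]]
assert is_zero(poly_at_matrix([-8, -12, -6, -1], B))

# Note (i): A^{-1} = -(1/c_0)(c_1 I + c_2 A + ... + c_n A^{n-1})
A_inv = [[-x / Fraction(c[0]) for x in row] for row in poly_at_matrix(c[1:], A)]
I3 = [[int(i == j) for j in range(3)] for i in range(3)]
assert matmul(A, A_inv) == I3
print("Cayley-Hamilton checks passed")
```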

In IB Linear Algebra, we will prove the Cayley Hamilton theorem properly for

all matrices without assuming diagonalizability.

5.8 Eigenvalues and eigenvectors of a Hermitian matrix

5.8.1 Eigenvalues and eigenvectors

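Theorem. The eigenvalues of a Hermitian matrix $H$ are real.

Proof. Suppose that $H$ has eigenvalue $\lambda$ with eigenvector $\mathbf{v} \neq \mathbf{0}$. Then
$$H\mathbf{v} = \lambda\mathbf{v}.$$
We pre-multiply by $\mathbf{v}^\dagger$, a $1 \times n$ row vector, to obtain
$$\mathbf{v}^\dagger H\mathbf{v} = \lambda\mathbf{v}^\dagger\mathbf{v}. \qquad (*)$$
We take the Hermitian conjugate of both sides. The left hand side is
$$(\mathbf{v}^\dagger H\mathbf{v})^\dagger = \mathbf{v}^\dagger H^\dagger\mathbf{v} = \mathbf{v}^\dagger H\mathbf{v}$$
since $H$ is Hermitian. The right hand side is
$$(\lambda\mathbf{v}^\dagger\mathbf{v})^\dagger = \lambda^*\mathbf{v}^\dagger\mathbf{v}.$$
So we have
$$\mathbf{v}^\dagger H\mathbf{v} = \lambda^*\mathbf{v}^\dagger\mathbf{v}.$$
From $(*)$, we know that $\lambda\mathbf{v}^\dagger\mathbf{v} = \lambda^*\mathbf{v}^\dagger\mathbf{v}$. Since $\mathbf{v} \neq \mathbf{0}$, we know that $\mathbf{v}^\dagger\mathbf{v} = \mathbf{v} \cdot \mathbf{v} \neq 0$. So $\lambda = \lambda^*$ and $\lambda$ is real.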

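Theorem. The eigenvectors of a Hermitian matrix $H$ corresponding to distinct eigenvalues are orthogonal.

Proof. Let
$$\begin{aligned}H\mathbf{v}_i &= \lambda_i\mathbf{v}_i \qquad \text{(i)}\\ H\mathbf{v}_j &= \lambda_j\mathbf{v}_j. \qquad \text{(ii)}\end{aligned}$$
Pre-multiply (i) by $\mathbf{v}_j^\dagger$ to obtain
$$\mathbf{v}_j^\dagger H\mathbf{v}_i = \lambda_i\mathbf{v}_j^\dagger\mathbf{v}_i. \qquad \text{(iii)}$$
Pre-multiply (ii) by $\mathbf{v}_i^\dagger$ and take the Hermitian conjugate (using $H^\dagger = H$ and $\lambda_j^* = \lambda_j$) to obtain
$$\mathbf{v}_j^\dagger H\mathbf{v}_i = \lambda_j\mathbf{v}_j^\dagger\mathbf{v}_i. \qquad \text{(iv)}$$
Equating (iii) and (iv) yields
$$\lambda_i\mathbf{v}_j^\dagger\mathbf{v}_i = \lambda_j\mathbf{v}_j^\dagger\mathbf{v}_i.$$
Since $\lambda_i \neq \lambda_j$, we must have $\mathbf{v}_j^\dagger\mathbf{v}_i = 0$. So their inner product is zero and they are orthogonal.

So we know that if an $n \times n$ Hermitian matrix has $n$ distinct eigenvalues, then its eigenvectors, once normalized, form an orthonormal basis. However, if there are degenerate eigenvalues, it is more difficult, and requires the Gram-Schmidt process.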

5.8.2 Gram-Schmidt orthogonalization (non-examinable)

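Suppose we have a set $B = \{\mathbf{w}_1, \mathbf{w}_2, \cdots, \mathbf{w}_n\}$ of linearly independent vectors. We want to find an orthogonal set $\tilde{B} = \{\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_n\}$.

Define the projection of $\mathbf{w}$ onto $\mathbf{v}$ by
$$P_{\mathbf{v}}(\mathbf{w}) = \frac{\langle\mathbf{v}|\mathbf{w}\rangle}{\langle\mathbf{v}|\mathbf{v}\rangle}\mathbf{v}.$$
Now construct $\tilde{B}$ iteratively:

(i) $\mathbf{v}_1 = \mathbf{w}_1$

(ii) $\mathbf{v}_2 = \mathbf{w}_2 - P_{\mathbf{v}_1}(\mathbf{w}_2)$

Then we get that $\langle\mathbf{v}_1|\mathbf{v}_2\rangle = \langle\mathbf{v}_1|\mathbf{w}_2\rangle - \left(\dfrac{\langle\mathbf{v}_1|\mathbf{w}_2\rangle}{\langle\mathbf{v}_1|\mathbf{v}_1\rangle}\right)\langle\mathbf{v}_1|\mathbf{v}_1\rangle = 0$

(iii) $\mathbf{v}_3 = \mathbf{w}_3 - P_{\mathbf{v}_1}(\mathbf{w}_3) - P_{\mathbf{v}_2}(\mathbf{w}_3)$

(iv) $\cdots$

(v) $\mathbf{v}_r = \mathbf{w}_r - \sum_{j=1}^{r-1}P_{\mathbf{v}_j}(\mathbf{w}_r)$

At each step, we subtract out the components of $\mathbf{w}_k$ that belong to the space $\operatorname{span}\{\mathbf{v}_1, \cdots, \mathbf{v}_{k-1}\}$. This ensures that all the vectors are orthogonal. Finally, we normalize each basis vector individually to obtain an orthonormal basis.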
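The iteration above translates directly into code. The following is a minimal sketch in plain Python with complex numbers, assuming vectors are lists and using $\langle\mathbf{v}|\mathbf{w}\rangle = \sum_i v_i^* w_i$; as in the notes, each $\mathbf{w}_k$ is projected onto the earlier unnormalized $\mathbf{v}_j$ and normalization happens only at the end. The input vectors are a hypothetical example.

```python
def inner(v, w):
    """<v|w> = sum_i conj(v_i) w_i, conjugate-linear in the first slot."""
    return sum(complex(vi).conjugate() * wi for vi, wi in zip(v, w))

def project(v, w):
    """P_v(w) = (<v|w> / <v|v>) v."""
    coeff = inner(v, w) / inner(v, v)
    return [coeff * vi for vi in v]

def gram_schmidt(ws):
    """Orthonormalise linearly independent vectors (classical Gram-Schmidt)."""
    vs = []
    for w in ws:
        v = list(w)
        for u in vs:                 # subtract P_{v_j}(w_k) for each earlier v_j
            p = project(u, w)
            v = [vi - pi for vi, pi in zip(v, p)]
        vs.append(v)
    # finally normalise each vector individually
    return [[vi / abs(inner(v, v)) ** 0.5 for vi in v] for v in vs]

ws = [[1, 1j, 0], [1, 0, 1], [0, 1, 1]]   # hypothetical independent vectors in C^3
es = gram_schmidt(ws)
for i, ei in enumerate(es):
    for j, ej in enumerate(es):
        target = 1 if i == j else 0
        assert abs(inner(ei, ej) - target) < 1e-12
print("orthonormal")
```

In floating point, the "modified" variant (projecting the partially reduced vector instead of the original $\mathbf{w}_k$) is usually preferred for stability; the classical form is used here because it matches the formulas in the text.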

5.8.3 Unitary transformation

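Suppose $U$ is the transformation between one orthonormal basis and a new orthonormal basis $\{\mathbf{u}_1, \mathbf{u}_2, \cdots, \mathbf{u}_n\}$, i.e. $\langle\mathbf{u}_i|\mathbf{u}_j\rangle = \delta_{ij}$. Then
$$U = \begin{pmatrix}(\mathbf{u}_1)_1 & (\mathbf{u}_2)_1 & \cdots & (\mathbf{u}_n)_1\\ (\mathbf{u}_1)_2 & (\mathbf{u}_2)_2 & \cdots & (\mathbf{u}_n)_2\\ \vdots & \vdots & \ddots & \vdots\\ (\mathbf{u}_1)_n & (\mathbf{u}_2)_n & \cdots & (\mathbf{u}_n)_n\end{pmatrix}.$$
Then
$$\begin{aligned}(U^\dagger U)_{ij} &= (U^\dagger)_{ik}U_{kj}\\ &= U^*_{ki}U_{kj}\\ &= (\mathbf{u}_i)^*_k(\mathbf{u}_j)_k\\ &= \langle\mathbf{u}_i|\mathbf{u}_j\rangle\\ &= \delta_{ij}.\end{aligned}$$
So $U$ is a unitary matrix.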

5.8.4 Diagonalization of n × n Hermitian matrices

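Theorem. An $n \times n$ Hermitian matrix has precisely $n$ orthogonal eigenvectors.

Proof. (Non-examinable) Let $\lambda_1, \lambda_2, \cdots, \lambda_r$ be the distinct eigenvalues of $H$ ($r \le n$), with a set of corresponding orthonormal eigenvectors $B = \{\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_r\}$. Extend to a basis of the whole of $\mathbb{C}^n$:
$$B' = \{\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_r, \mathbf{w}_1, \cdots, \mathbf{w}_{n-r}\}.$$
Now use Gram-Schmidt to create an orthonormal basis
$$\tilde{B} = \{\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_r, \mathbf{u}_1, \cdots, \mathbf{u}_{n-r}\}.$$
Now write
$$P = \begin{pmatrix}\uparrow & & \uparrow & \uparrow & & \uparrow\\ \mathbf{v}_1 & \cdots & \mathbf{v}_r & \mathbf{u}_1 & \cdots & \mathbf{u}_{n-r}\\ \downarrow & & \downarrow & \downarrow & & \downarrow\end{pmatrix}.$$
We have shown above that this is a unitary matrix, i.e. $P^{-1} = P^\dagger$. So if we change basis, we have
$$P^{-1}HP = P^\dagger HP = \begin{pmatrix}\lambda_1 & \cdots & 0 & 0 & \cdots & 0\\ \vdots & \ddots & \vdots & \vdots & & \vdots\\ 0 & \cdots & \lambda_r & 0 & \cdots & 0\\ 0 & \cdots & 0 & c_{11} & \cdots & c_{1,n-r}\\ \vdots & & \vdots & \vdots & \ddots & \vdots\\ 0 & \cdots & 0 & c_{n-r,1} & \cdots & c_{n-r,n-r}\end{pmatrix}.$$
(The off-diagonal blocks vanish because $\mathbf{u}_j^\dagger H\mathbf{v}_i = \lambda_i\mathbf{u}_j^\dagger\mathbf{v}_i = 0$ and $\mathbf{v}_i^\dagger H\mathbf{u}_j = (H\mathbf{v}_i)^\dagger\mathbf{u}_j = \lambda_i\mathbf{v}_i^\dagger\mathbf{u}_j = 0$, using $H^\dagger = H$ and $\lambda_i \in \mathbb{R}$.)

Here $C = (c_{ij})$ is an $(n - r) \times (n - r)$ Hermitian matrix. The eigenvalues of $C$ are also eigenvalues of $H$ because
$$\det(H - \lambda I) = \det(P^\dagger HP - \lambda I) = (\lambda_1 - \lambda)\cdots(\lambda_r - \lambda)\det(C - \lambda I).$$

We can keep repeating the process on $C$ until we finish all rows. For example, if the eigenvalues of $C$ are all distinct, there are $n - r$ orthonormal eigenvectors $\mathbf{w}_j$ (for $j = r + 1, \cdots, n$) of $C$. Let
$$Q = \begin{pmatrix}I_r & 0\\ 0 & \begin{matrix}\uparrow & \uparrow & & \uparrow\\ \mathbf{w}_{r+1} & \mathbf{w}_{r+2} & \cdots & \mathbf{w}_n\\ \downarrow & \downarrow & & \downarrow\end{matrix}\end{pmatrix}$$
with other entries 0 (where we have an $r \times r$ identity matrix block in the top left corner and an $(n - r) \times (n - r)$ block with columns formed by the $\mathbf{w}_j$).

Since the columns of $Q$ are orthonormal, $Q$ is unitary. So
$$Q^\dagger P^\dagger HPQ = \operatorname{diag}(\lambda_1, \lambda_2, \cdots, \lambda_r, \lambda_{r+1}, \cdots, \lambda_n),$$
where the first $r$ $\lambda$s are distinct and the remaining ones are copies of previous ones.

The $n$ linearly-independent eigenvectors are the columns of $PQ$.

So it now follows that $H$ is diagonalizable via the transformation $U = PQ$, which is a unitary matrix because $P$ and $Q$ are. We have
$$D = U^\dagger HU, \quad H = UDU^\dagger.$$
Note that a real symmetric matrix $S$ is a special case of Hermitian matrices. So we have
$$D = Q^TSQ, \quad S = QDQ^T,$$
with $Q$ an orthogonal matrix.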

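Example. Find the orthogonal matrix which diagonalizes the following real symmetric matrix:
$$S = \begin{pmatrix}1 & \beta\\ \beta & 1\end{pmatrix} \text{ with } \beta \neq 0 \in \mathbb{R}.$$
We find the eigenvalues by solving the characteristic equation $\det(S - \lambda I) = 0$, and obtain $\lambda = 1 \pm \beta$.

The corresponding eigenvectors satisfy $(S - \lambda I)\mathbf{x} = \mathbf{0}$, which gives
$$\mathbf{x}_\pm = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\ \pm 1\end{pmatrix}.$$
We change the basis from the standard basis to
$$\frac{1}{\sqrt{2}}\begin{pmatrix}1\\ 1\end{pmatrix}, \quad \frac{1}{\sqrt{2}}\begin{pmatrix}1\\ -1\end{pmatrix}$$
(which, up to the sign of the second vector, is just a rotation by $\pi/4$).

The transformation matrix is
$$Q = \begin{pmatrix}1/\sqrt{2} & 1/\sqrt{2}\\ 1/\sqrt{2} & -1/\sqrt{2}\end{pmatrix}.$$
Then we know that $S = QDQ^T$ with $D = \operatorname{diag}(1 + \beta, 1 - \beta)$.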
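The factorization $S = QDQ^T$ with $D = \operatorname{diag}(1 + \beta, 1 - \beta)$ can be checked numerically in floating point. The sketch below uses the hypothetical value $\beta = 0.5$ and a tolerance-based comparison; the helper names are illustrative.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def close(X, Y):
    return all(math.isclose(x, y, abs_tol=1e-12)
               for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

beta = 0.5                          # hypothetical non-zero value of beta
S = [[1, beta], [beta, 1]]
s = 1 / math.sqrt(2)
Q = [[s, s], [s, -s]]               # columns x_+ and x_-
D = [[1 + beta, 0], [0, 1 - beta]]

assert close(matmul(transpose(Q), Q), [[1, 0], [0, 1]])   # Q is orthogonal
assert close(matmul(matmul(Q, D), transpose(Q)), S)        # S = Q D Q^T
print(matmul(matmul(transpose(Q), S), Q))                  # approximately D
```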

5.8.5 Normal matrices

We have seen that the eigenvalues and eigenvectors of Hermitian matrices satisfy some nice properties. More generally, we can define the following:

Definition (Normal matrix). A normal matrix is a matrix $N$ that commutes with its own Hermitian conjugate, i.e.
$$NN^\dagger = N^\dagger N.$$
Hermitian, real symmetric, skew-Hermitian, real anti-symmetric, orthogonal and unitary matrices are all special cases of normal matrices.

It can be shown that:

Proposition.

(i) If $\lambda$ is an eigenvalue of $N$, then $\lambda^*$ is an eigenvalue of $N^\dagger$.

(ii) The eigenvectors of distinct eigenvalues are orthogonal.

(iii) A normal matrix can always be diagonalized with an orthonormal basis of eigenvectors.

6 Quadratic forms and conics

We want to study quantities like $x_1^2 + x_2^2$ and $3x_1^2 + 2x_1x_2 + 4x_2^2$. For example, conic sections generally take this form. The common characteristic of these is that each term has degree 2. Consequently, we can write them in the form $\mathbf{x}^\dagger A\mathbf{x}$ for some matrix $A$.

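Definition (Sesquilinear, Hermitian and quadratic forms). A sesquilinear form is a quantity $F = \mathbf{x}^\dagger A\mathbf{x} = x_i^*A_{ij}x_j$. If $A$ is Hermitian, then $F$ is a Hermitian form. If $A$ is real symmetric, then $F$ is a quadratic form.

Theorem. Hermitian forms are real.

Proof. $(\mathbf{x}^\dagger H\mathbf{x})^* = (\mathbf{x}^\dagger H\mathbf{x})^\dagger = \mathbf{x}^\dagger H^\dagger\mathbf{x} = \mathbf{x}^\dagger H\mathbf{x}$. So $(\mathbf{x}^\dagger H\mathbf{x})^* = \mathbf{x}^\dagger H\mathbf{x}$ and it is real.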

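We know that any Hermitian matrix can be diagonalized with a unitary transformation. So
$$F(\mathbf{x}) = \mathbf{x}^\dagger H\mathbf{x} = \mathbf{x}^\dagger UDU^\dagger\mathbf{x}.$$
Write $\mathbf{x}' = U^\dagger\mathbf{x}$. So
$$F = (\mathbf{x}')^\dagger D\mathbf{x}',$$
where $D = \operatorname{diag}(\lambda_1, \cdots, \lambda_n)$.

We know that $\mathbf{x}'$ is the vector $\mathbf{x}$ relative to the eigenvector basis. So
$$F(\mathbf{x}) = \sum_{i=1}^n\lambda_i|x_i'|^2.$$
The eigenvectors are known as the principal axes.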

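Example. Take $F = 2x^2 - 4xy + 5y^2 = \mathbf{x}^TS\mathbf{x}$, where
$$\mathbf{x} = \begin{pmatrix}x\\ y\end{pmatrix} \quad\text{and}\quad S = \begin{pmatrix}2 & -2\\ -2 & 5\end{pmatrix}.$$
Note that we can always choose the matrix to be symmetric. This is since for any real antisymmetric $A$ and real $\mathbf{x}$, we have $\mathbf{x}^TA\mathbf{x} = 0$. So we can just take the symmetric part.

The eigenvalues are $1, 6$ with corresponding eigenvectors
$$\frac{1}{\sqrt{5}}\begin{pmatrix}2\\ 1\end{pmatrix}, \quad \frac{1}{\sqrt{5}}\begin{pmatrix}1\\ -2\end{pmatrix}.$$
Now change basis with
$$Q = \frac{1}{\sqrt{5}}\begin{pmatrix}2 & 1\\ 1 & -2\end{pmatrix}.$$
Then
$$\mathbf{x}' = Q^T\mathbf{x} = \frac{1}{\sqrt{5}}\begin{pmatrix}2x + y\\ x - 2y\end{pmatrix}.$$
Then $F = (x')^2 + 6(y')^2$. So $F = c$ is an ellipse (for $c > 0$).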
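The change to principal axes can be spot-checked numerically: at any point, $2x^2 - 4xy + 5y^2$ should equal $(x')^2 + 6(y')^2$. A minimal floating-point sketch, with arbitrary sample points and illustrative function names:

```python
import math

def F(x, y):
    """The quadratic form 2x^2 - 4xy + 5y^2."""
    return 2 * x**2 - 4 * x * y + 5 * y**2

def principal(x, y):
    """Components along the principal axes: (x', y') = Q^T (x, y)."""
    r5 = math.sqrt(5)
    return (2 * x + y) / r5, (x - 2 * y) / r5

for x, y in [(1.0, 0.0), (0.3, -1.7), (-2.0, 4.5)]:   # arbitrary sample points
    xp, yp = principal(x, y)
    assert math.isclose(F(x, y), xp**2 + 6 * yp**2)
print("F = x'^2 + 6 y'^2 at all sample points")
```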

6.1 Quadrics and conics

6.1.1 Quadrics

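Definition (Quadric). A quadric is a surface in $\mathbb{R}^n$ defined by the zero of a real quadratic polynomial, i.e.
$$\mathbf{x}^TA\mathbf{x} + \mathbf{b}^T\mathbf{x} + c = 0,$$
where $A$ is a real $n \times n$ matrix, $\mathbf{x}, \mathbf{b}$ are $n$-dimensional column vectors and $c$ is a constant scalar.

As noted in the example, an anti-symmetric matrix $A$ has $\mathbf{x}^TA\mathbf{x} = 0$, so for any $A$ we can split it into symmetric and anti-symmetric parts, and just retain the symmetric part $S = (A + A^T)/2$. So we can have
$$\mathbf{x}^TS\mathbf{x} + \mathbf{b}^T\mathbf{x} + c = 0$$
with $S$ symmetric.

Since $S$ is real and symmetric, we can diagonalize it using $S = QDQ^T$ with $D$ diagonal. We write $\mathbf{x}' = Q^T\mathbf{x}$ and $\mathbf{b}' = Q^T\mathbf{b}$. So we have
$$(\mathbf{x}')^TD\mathbf{x}' + (\mathbf{b}')^T\mathbf{x}' + c = 0.$$
If $S$ is invertible, i.e. with no zero eigenvalues, then write $\mathbf{x}'' = \mathbf{x}' + \frac{1}{2}D^{-1}\mathbf{b}'$, which shifts the origin to eliminate the linear term $(\mathbf{b}')^T\mathbf{x}'$, and finally have (dropping the prime superscripts)
$$\mathbf{x}^TD\mathbf{x} = k,$$
where $k = \frac{1}{4}(\mathbf{b}')^TD^{-1}\mathbf{b}' - c$.

So through two transformations, we have ended up with a simple quadratic form.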

6.1.2 Conic sections (n = 2)

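From the equation above, we obtain
$$\lambda_1x_1^2 + \lambda_2x_2^2 = k.$$
We have the following cases:

(i) $\lambda_1\lambda_2 > 0$: we have ellipses with axes coinciding with the eigenvectors of $S$. (We require $\operatorname{sgn}(k) = \operatorname{sgn}(\lambda_1) = \operatorname{sgn}(\lambda_2)$, or else we would have no solutions at all.)

(ii) $\lambda_1\lambda_2 < 0$: say $\lambda_1 = k/a^2 > 0$, $\lambda_2 = -k/b^2 < 0$. So we obtain
$$\frac{x_1^2}{a^2} - \frac{x_2^2}{b^2} = 1,$$
which is a hyperbola.

(iii) $\lambda_1\lambda_2 = 0$: say $\lambda_1 \neq 0$, $\lambda_2 = 0$. Note that in this case, our symmetric matrix $S$ is not invertible and we cannot shift our origin as above.

From our initial equation, we have
$$\lambda_1(x_1')^2 + b_1'x_1' + b_2'x_2' + c = 0.$$
We perform the coordinate transform (which is simply completing the square!)
$$\begin{aligned}x_1'' &= x_1' + \frac{b_1'}{2\lambda_1}\\ x_2'' &= x_2' + \frac{c}{b_2'} - \frac{(b_1')^2}{4\lambda_1b_2'}\end{aligned}$$
to remove the $x_1'$ and constant term. Dropping the primes, we have
$$\lambda_1x_1^2 + b_2x_2 = 0,$$
which is a parabola.

Note that above we assumed $b_2' \neq 0$. If $b_2' = 0$, we have $\lambda_1(x_1')^2 + b_1'x_1' + c = 0$. If we solve this quadratic for $x_1'$, we obtain 0, 1 or 2 solutions for $x_1'$ (and $x_2'$ can be any value). So we have 0, 1 or 2 straight lines.

These are known as conic sections. As you will see in IA Dynamics and Relativity, these are the trajectories of planets under the influence of gravity.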

6.2 Focus-directrix property

Conic sections can be defined in a different way, in terms of

Definition

(Conic sections)

The eccentricity and scale are properties of a conic

section that satisfy the following:

Let the foci of a conic section be (±ae, 0) and the directrices be x = ±a/e.

A conic section is the set of points whose distance from focus is

e×

distance

from directrix which is closer to that of focus (unless

= 1, where we take the

distance to the other directrix).

Now consider the different cases of e:

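(i) $e < 1$. By definition, with focus $(ae, 0)$ and directrix $x = a/e$, a point $(x, y)$ on the conic satisfies

$$\sqrt{(x - ae)^2 + y^2} = e\left(\frac{a}{e} - x\right) \quad\Rightarrow\quad \frac{x^2}{a^2} + \frac{y^2}{a^2(1 - e^2)} = 1,$$

which is an ellipse with semi-major axis $a$ and semi-minor axis $a\sqrt{1 - e^2}$ (if $e = 0$, then we have a circle).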

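(ii) $e > 1$. So, with focus $(ae, 0)$ and directrix $x = a/e$, a point $(x, y)$ on the conic satisfies

$$\sqrt{(x - ae)^2 + y^2} = e\left(x - \frac{a}{e}\right) \quad\Rightarrow\quad \frac{x^2}{a^2} - \frac{y^2}{a^2(e^2 - 1)} = 1,$$

and we have a hyperbola.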

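(iii) $e = 1$: Then, with focus $(a, 0)$ and directrix $x = -a$, a point $(x, y)$ on the conic satisfies

$$\sqrt{(x - a)^2 + y^2} = x + a \quad\Rightarrow\quad y^2 = 4ax,$$

and we have a parabola.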
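The three derivations can be checked numerically: sample points on each curve and confirm that (distance to focus)/(distance to directrix) equals $e$. The parametrisations below ($(a\cos t, a\sqrt{1-e^2}\sin t)$, $(a\cosh t, a\sqrt{e^2-1}\sinh t)$ on the right branch, and $(at^2, 2at)$) and all helper names are my own choices for illustration, not taken from the notes.

```python
import math


def ratio(point, focus, directrix_x):
    """Distance from point to focus divided by distance to the line x = directrix_x."""
    x, y = point
    return math.hypot(x - focus[0], y - focus[1]) / abs(x - directrix_x)


def conic_point(a, e, t):
    """A point on the conic with scale a and eccentricity e (parameter t)."""
    if e < 1:   # ellipse x^2/a^2 + y^2/(a^2(1 - e^2)) = 1
        return a * math.cos(t), a * math.sqrt(1 - e * e) * math.sin(t)
    if e > 1:   # right branch of x^2/a^2 - y^2/(a^2(e^2 - 1)) = 1
        return a * math.cosh(t), a * math.sqrt(e * e - 1) * math.sinh(t)
    return a * t * t, 2 * a * t   # parabola y^2 = 4ax


def focus_and_directrix(a, e):
    """Focus (ae, 0) with directrix x = a/e; for e = 1, focus (a, 0) with x = -a."""
    if e == 1:
        return (a, 0.0), -a
    return (a * e, 0.0), a / e


for a, e in [(2.0, 0.5), (2.0, 1.0), (2.0, 3.0)]:
    focus, d = focus_and_directrix(a, e)
    for t in (-1.0, 0.3, 1.2):
        assert math.isclose(ratio(conic_point(a, e, t), focus, d), e)
```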

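Conics also work in polar coordinates. We introduce a new parameter $l$ such that $l/e$ is the distance from the focus to the directrix. So

$$l = a|1 - e^2|$$

for $e \neq 1$ (for the parabola, the focus $(a, 0)$ is at distance $2a$ from the directrix $x = -a$, so $l = 2a$).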

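We use polar coordinates $(r, \theta)$ centered on a focus. So the focus-directrix property is

$$r = e\left(\frac{l}{e} - r\cos\theta\right) \quad\Rightarrow\quad r = \frac{l}{1 + e\cos\theta}.$$

We see that $r \to \infty$ as $\theta \to \cos^{-1}(-1/e)$, which is only possible if $e \geq 1$, i.e. for a hyperbola or parabola. But ellipses have $e < 1$, so $1 + e\cos\theta \geq 1 - e > 0$ and $r$ is bounded, as expected.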
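A short sketch of the polar form, assuming the focus $(ae, 0)$ and the directrix $x = a/e$ of the ellipse case, with $\theta$ measured from the positive $x$-axis. It evaluates $r(\theta)$, finds the asymptotic angle $\cos^{-1}(-1/e)$ when it exists, and checks that the polar points land on the Cartesian ellipse. The function names are hypothetical.

```python
import math


def polar_radius(l, e, theta):
    """r = l / (1 + e cos(theta)), measured from a focus.

    Returns None when 1 + e cos(theta) <= 0: there the formula gives no point
    with r > 0 in that direction.
    """
    denom = 1 + e * math.cos(theta)
    return l / denom if denom > 0 else None


def asymptote_angle(e):
    """Angle where r -> infinity, theta = arccos(-1/e); exists only for e >= 1."""
    if e < 1:
        return None   # 1 + e cos(theta) >= 1 - e > 0, so r stays bounded
    return math.acos(-1 / e)


# Ellipse with a = 2, e = 0.5, so l = a(1 - e^2) = 1.5 and the focus is (ae, 0) = (1, 0).
a, e = 2.0, 0.5
l = a * (1 - e * e)
for theta in (0.0, 1.0, 2.5, math.pi):
    r = polar_radius(l, e, theta)
    x, y = a * e + r * math.cos(theta), r * math.sin(theta)   # back to centred coordinates
    assert math.isclose(x * x / a**2 + y * y / (a**2 * (1 - e * e)), 1.0)
```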

7 Transformation groups

We have previously seen that orthogonal matrices are used to transform between orthonormal bases. Alternatively, we can see them as transformations of space itself that preserve distances, which is something we will prove shortly.

Using this as the definition of an orthogonal matrix, we see that our definition of orthogonal matrices depends on our choice of the notion of distance, or metric. In special relativity, we will need to use a different metric, which will lead to the Lorentz matrices, the matrices that preserve distances in special relativity. We will have a brief look at these as well.

7.1 Groups of orthogonal matrices

Proposition. The set of all $n \times n$ orthogonal matrices $P$ forms a group under matrix multiplication.

Proof.

0. Closure: if $P, Q$ are orthogonal, then consider $R = PQ$. We have $RR^T = (PQ)(PQ)^T = P(QQ^T)P^T = PP^T = I$. So $R$ is orthogonal.

1. Identity: $I$ satisfies $II^T = I$. So $I$ is orthogonal and is an identity of the group.

2. Inverse: if $P$ is orthogonal, then $P^{-1} = P^T$ by definition, which is also orthogonal, since $P^T(P^T)^T = P^TP = I$.

3. Associativity: matrix multiplication is associative since function composition is associative.
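The closure and inverse steps can be spot-checked numerically. This sketch uses plain lists of rows, a floating-point tolerance in place of exact equality, and two example matrices of my own choosing (a rotation about the $z$-axis and a permutation matrix); the helper names are hypothetical.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def is_orthogonal(P, tol=1e-12):
    """Check P P^T = I entrywise, up to a floating-point tolerance."""
    n = len(P)
    PPt = matmul(P, transpose(P))
    return all(abs(PPt[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))


c, s = math.cos(0.7), math.sin(0.7)
P = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]   # rotation about the z-axis
Q = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]   # permutation matrix

assert is_orthogonal(P) and is_orthogonal(Q)
assert is_orthogonal(matmul(P, Q))   # closure: R = PQ is orthogonal
assert is_orthogonal(transpose(P))   # inverse: P^{-1} = P^T is orthogonal
```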

Definition (Orthogonal group). The orthogonal group $O(n)$ is the group of $n \times n$ orthogonal matrices.

Definition (Special orthogonal group). The special orthogonal group $SO(n)$ is the subgroup of $O(n)$ that consists of all orthogonal matrices with determinant 1.

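One can show that any matrix in $O(2)$ is of the form

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}.$$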
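A quick check of the two families in plain Python: the first has determinant 1 (these are the rotations, forming $SO(2)$) and the second has determinant $-1$ (reflections). The helper names are hypothetical, and the identity that two reflections compose to the rotation by $\theta - \phi$ is an extra illustration rather than a statement from the notes.

```python
import math


def rotation(theta):
    return [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]


def reflection(theta):
    return [[math.cos(theta), math.sin(theta)], [math.sin(theta), -math.cos(theta)]]


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def matmul2(A, B):
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)] for i in range(2)]


def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) <= tol for i in range(2) for j in range(2))


theta, phi = 0.9, -0.4
assert math.isclose(det2(rotation(theta)), 1.0)      # rotations lie in SO(2)
assert math.isclose(det2(reflection(theta)), -1.0)   # reflections have determinant -1
# composing two reflections gives the rotation by theta - phi
assert close(matmul2(reflection(theta), reflection(phi)), rotation(theta - phi))
```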

7.2 Length preserving matrices

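Theorem. Let $P$ be a real $n \times n$ matrix. Then the following are equivalent:

(i) $P$ is orthogonal.

(ii) $|P\mathbf{x}| = |\mathbf{x}|$ for all $\mathbf{x}$.

(iii) $(P\mathbf{x})^T(P\mathbf{y}) = \mathbf{x}^T\mathbf{y}$ for all $\mathbf{x}, \mathbf{y}$, i.e. $(P\mathbf{x}) \cdot (P\mathbf{y}) = \mathbf{x} \cdot \mathbf{y}$.

(iv) If $(\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_n)$ are orthonormal, so are $(P\mathbf{v}_1, P\mathbf{v}_2, \cdots, P\mathbf{v}_n)$.

(v) The columns of $P$ are orthonormal.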

Proof. We do them one by one:

(i) ⇒ (ii): $|P\mathbf{x}|^2 = (P\mathbf{x})^T(P\mathbf{x}) = \mathbf{x}^T P^T P\mathbf{x} = \mathbf{x}^T\mathbf{x} = |\mathbf{x}|^2$.

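(ii) ⇒ (iii): $|P(\mathbf{x} + \mathbf{y})|^2 = |\mathbf{x} + \mathbf{y}|^2$. The right hand side is

$$(\mathbf{x}^T + \mathbf{y}^T)(\mathbf{x} + \mathbf{y}) = \mathbf{x}^T\mathbf{x} + \mathbf{y}^T\mathbf{y} + \mathbf{y}^T\mathbf{x} + \mathbf{x}^T\mathbf{y} = |\mathbf{x}|^2 + |\mathbf{y}|^2 + 2\mathbf{x}^T\mathbf{y}.$$

Similarly, the left hand side is

$$|P\mathbf{x} + P\mathbf{y}|^2 = |P\mathbf{x}|^2 + |P\mathbf{y}|^2 + 2(P\mathbf{x})^T P\mathbf{y} = |\mathbf{x}|^2 + |\mathbf{y}|^2 + 2(P\mathbf{x})^T P\mathbf{y}.$$

So $(P\mathbf{x})^T P\mathbf{y} = \mathbf{x}^T\mathbf{y}$.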

(iii) ⇒ (iv): $(P\mathbf{v}_i)^T P\mathbf{v}_j = \mathbf{v}_i^T\mathbf{v}_j = \delta_{ij}$. So the $P\mathbf{v}_i$'s are also orthonormal.

(iv) ⇒ (v): Take the $\mathbf{v}_i$'s to be the standard basis vectors $\mathbf{e}_i$. So the columns of $P$, being $P\mathbf{e}_i$, are orthonormal.

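(v) ⇒ (i): The columns of $P$ are orthonormal. Then $(P^TP)_{ij} = (P_i) \cdot (P_j) = \delta_{ij}$, viewing $P_i$ as the $i$th column of $P$. So $P^TP = I$. Since $P$ is square, $P^T$ is then a two-sided inverse, so $PP^T = I$ as well and $P$ is orthogonal.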

Therefore the set of length-preserving matrices is precisely O(n).
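A numerical illustration of (ii) and (iii), with a rotation as the orthogonal example and a shear as a non-orthogonal contrast; both matrices and the helper names are my own choices. It only spot-checks particular vectors, so it illustrates the theorem rather than proving it.

```python
import math


def apply(P, x):
    """Matrix-vector product for P stored as a list of rows."""
    return [sum(P[i][j] * x[j] for j in range(len(x))) for i in range(len(P))]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    return math.sqrt(dot(x, x))


c, s = math.cos(1.1), math.sin(1.1)
P = [[c, -s], [s, c]]          # orthogonal (a rotation)
A = [[1.0, 1.0], [0.0, 1.0]]   # a shear: not orthogonal
x, y = [3.0, -1.0], [0.5, 2.0]

assert math.isclose(norm(apply(P, x)), norm(x))                 # property (ii)
assert math.isclose(dot(apply(P, x), apply(P, y)), dot(x, y))   # property (iii)
assert not math.isclose(norm(apply(A, x)), norm(x))             # the shear changes lengths
```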

7.3 Lorentz transformations

Consider the Minkowski $1 + 1$ dimensional spacetime (i.e. 1 space dimension and 1 time dimension).

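Definition (Minkowski inner product). The Minkowski inner product of 2 vectors $\mathbf{x}$ and $\mathbf{y}$ is

$$\langle \mathbf{x} \mid \mathbf{y} \rangle = \mathbf{x}^T J \mathbf{y}, \quad\text{where}\quad J = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

Then $\langle \mathbf{x} \mid \mathbf{y} \rangle = x_1y_1 - x_2y_2$.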

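This is to be compared to the usual Euclidean inner product of $\mathbf{x}, \mathbf{y} \in \mathbb{R}^2$ given by

$$\langle \mathbf{x} \mid \mathbf{y} \rangle = \mathbf{x}^T\mathbf{y} = \mathbf{x}^T I \mathbf{y} = x_1y_1 + x_2y_2.$$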

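Definition (Preservation of inner product). A transformation matrix $M$ preserves the Minkowski inner product if

$$\langle \mathbf{x} \mid \mathbf{y} \rangle = \langle M\mathbf{x} \mid M\mathbf{y} \rangle$$

for all $\mathbf{x}, \mathbf{y}$.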

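We need $\mathbf{x}^T J\mathbf{y} = (M\mathbf{x})^T J M\mathbf{y} = \mathbf{x}^T M^T J M\mathbf{y}$. Since this has to be true for all $\mathbf{x}$ and $\mathbf{y}$ (taking $\mathbf{x} = \mathbf{e}_i$, $\mathbf{y} = \mathbf{e}_j$ picks out the $(i, j)$ entry of each side), we must have

$$J = M^T J M.$$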

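We can show that $M$ takes the form of

$$H_\alpha = \begin{pmatrix} \cosh\alpha & \sinh\alpha \\ \sinh\alpha & \cosh\alpha \end{pmatrix} \quad\text{or}\quad K_{\alpha/2} = \begin{pmatrix} \cosh\alpha & -\sinh\alpha \\ \sinh\alpha & -\cosh\alpha \end{pmatrix},$$

where $H_\alpha$ is a hyperbolic rotation, and $K_{\alpha/2}$ is a hyperbolic reflection.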

This is technically not all matrices that preserve the metric, since these only include matrices with $M_{11} > 0$. In physics, these are the matrices we want, since $M_{11} < 0$ corresponds to inverting time, which is frowned upon.
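The condition $M^T J M = J$ can be checked directly for $H_\alpha$ and $K_{\alpha/2}$, together with the Minkowski inner product $x_1y_1 - x_2y_2$ itself. In this sketch the helper names are hypothetical, and `K(alpha)` returns the matrix written $K_{\alpha/2}$ above. The last check shows that $-H_\alpha$ also satisfies $M^T J M = J$ while having $M_{11} < 0$.

```python
import math

J = [[1.0, 0.0], [0.0, -1.0]]


def matmul2(A, B):
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)] for i in range(2)]


def transpose2(A):
    return [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]


def apply2(M, x):
    return [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]


def minkowski(x, y):
    """<x|y> = x^T J y = x1 y1 - x2 y2."""
    return x[0] * y[0] - x[1] * y[1]


def preserves_J(M, tol=1e-9):
    """Check M^T J M = J entrywise, up to a floating-point tolerance."""
    MtJM = matmul2(matmul2(transpose2(M), J), M)
    return all(abs(MtJM[i][j] - J[i][j]) <= tol for i in range(2) for j in range(2))


def H(alpha):   # hyperbolic rotation H_alpha
    ch, sh = math.cosh(alpha), math.sinh(alpha)
    return [[ch, sh], [sh, ch]]


def K(alpha):   # hyperbolic reflection, written K_{alpha/2} in the text
    ch, sh = math.cosh(alpha), math.sinh(alpha)
    return [[ch, -sh], [sh, -ch]]


x, y = [2.0, 1.0], [0.5, -3.0]
for alpha in (-1.5, 0.0, 0.8):
    assert preserves_J(H(alpha)) and preserves_J(K(alpha))
    M = H(alpha)
    assert math.isclose(minkowski(apply2(M, x), apply2(M, y)), minkowski(x, y))

# -H(alpha) also satisfies M^T J M = J, but its top-left entry is negative
minus_H = [[-v for v in row] for row in H(0.8)]
assert preserves_J(minus_H) and minus_H[0][0] < 0
```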

Definition (Lorentz matrix). A Lorentz matrix or a Lorentz boost is a matrix in the form

$$B_v = \frac{1}{\sqrt{1 - v^2}} \begin{pmatrix} 1 & v \\ v & 1 \end{pmatrix}.$$

Here $|v| < 1$, where we have chosen units in which the speed of light is equal to 1. We have $B_v = H_{\tanh^{-1} v}$.
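A small check of $B_v = H_{\tanh^{-1} v}$ using the standard-library hyperbolic functions. The choice $v = 0.6$ is convenient because $\tanh^{-1}(0.6) = \ln 2$, giving $\cosh = 5/4$ and $\sinh = 3/4$; the function names are hypothetical.

```python
import math


def boost(v):
    """Lorentz boost B_v = (1 - v^2)^(-1/2) [[1, v], [v, 1]], in units with c = 1."""
    if not -1 < v < 1:
        raise ValueError("need |v| < 1")
    g = 1 / math.sqrt(1 - v * v)
    return [[g, g * v], [g * v, g]]


def H(alpha):
    ch, sh = math.cosh(alpha), math.sinh(alpha)
    return [[ch, sh], [sh, ch]]


v = 0.6
Bv, Ha = boost(v), H(math.atanh(v))
assert all(math.isclose(Bv[i][j], Ha[i][j]) for i in range(2) for j in range(2))
```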

Definition (Lorentz group). The Lorentz group is the group of all Lorentz matrices under matrix multiplication.

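It is easy to prove that this is a group. Since $H_\alpha H_\beta = H_{\alpha + \beta}$ (by the addition formulae for $\cosh$ and $\sinh$), for the closure axiom we have $B_{v_1}B_{v_2} = B_{v_3}$, where

$$v_3 = \tanh(\tanh^{-1} v_1 + \tanh^{-1} v_2) = \frac{v_1 + v_2}{1 + v_1v_2}.$$

The set of all $B_v$ is a group of transformations which preserve the Minkowski inner product.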
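The closure formula can be confirmed numerically by multiplying two boosts and comparing with the boost at the composed velocity. This is a sketch with hypothetical helper names; the example velocities are arbitrary.

```python
import math


def boost(v):
    """Lorentz boost B_v in units with c = 1 (assumes |v| < 1)."""
    g = 1 / math.sqrt(1 - v * v)
    return [[g, g * v], [g * v, g]]


def matmul2(A, B):
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)] for i in range(2)]


def compose_velocities(v1, v2):
    """v3 = tanh(artanh v1 + artanh v2) = (v1 + v2) / (1 + v1 v2).

    1 - v3 = (1 - v1)(1 - v2) / (1 + v1 v2) > 0, so |v3| < 1 whenever |v1|, |v2| < 1.
    """
    return (v1 + v2) / (1 + v1 * v2)


v1, v2 = 0.5, 0.8
v3 = compose_velocities(v1, v2)   # 1.3 / 1.4, still below 1
lhs, rhs = matmul2(boost(v1), boost(v2)), boost(v3)
assert all(math.isclose(lhs[i][j], rhs[i][j]) for i in range(2) for j in range(2))
assert math.isclose(v3, math.tanh(math.atanh(v1) + math.atanh(v2)))
```

The inverse of $B_v$ is $B_{-v}$, which is again a boost.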