6 Endomorphisms

IB Linear Algebra

6.3 The Cayley-Hamilton theorem

We will first state the theorem, and then prove it later.

Recall that χ_α(t) = det(tι − α) for α ∈ End(V). Our main theorem of the
section (as you might have guessed from the title) is

Theorem (Cayley-Hamilton theorem). Let V be a finite-dimensional vector
space and α ∈ End(V). Then χ_α(α) = 0, i.e. M_α(t) | χ_α(t). In particular,
deg M_α ≤ n.

We will not prove this yet, but just talk about it first. It is tempting to prove
this by substituting t = α into det(tι − α) and getting det(α − α) = 0, but this is
meaningless, since what the statement χ_α(t) = det(tι − α) tells us to do is to
expand the determinant of the matrix

    [t − a_11    −a_12    ···    −a_1n ]
    [ −a_21    t − a_22   ···    −a_2n ]
    [   ⋮          ⋮       ⋱       ⋮   ]
    [ −a_n1     −a_n2     ···   t − a_nn]

to obtain a polynomial, and we clearly cannot substitute t = A in this expression.
However, we will later see that this idea can be made to work, if we are just a
bit more careful.

However, we can later show that we can use this idea to prove it, but just be a

bit more careful.

Note also that if ρ(t) ∈ F[t] and

    A = [λ_1          ]
        [     ⋱       ]
        [          λ_n],

then

    ρ(A) = [ρ(λ_1)               ]
           [          ⋱          ]
           [               ρ(λ_n)].

Since χ_A(t) is defined as ∏_{i=1}^{n} (t − λ_i), it follows that χ_A(A) = 0. So if α is
diagonalizable, then the theorem is clear.
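This diagonal case is easy to check numerically; a minimal sketch (NumPy, with illustrative eigenvalues 1, 2, 3 assumed for the example):

```python
import numpy as np

# For a diagonal matrix, p(A) is the diagonal matrix of the p(lambda_i),
# so chi_A(A) = prod_i (A - lambda_i I) = 0.
lams = np.array([1.0, 2.0, 3.0])  # illustrative eigenvalues
A = np.diag(lams)
I = np.eye(3)

# chi_A(A) = (A - 1*I)(A - 2*I)(A - 3*I)
chi_of_A = (A - lams[0] * I) @ (A - lams[1] * I) @ (A - lams[2] * I)
print(np.allclose(chi_of_A, 0))  # True
```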

This was easy. Diagonalizable matrices are nice. The next best thing we can

look at is upper-triangular matrices.

Definition (Triangulable). An endomorphism α ∈ End(V) is triangulable if
there is a basis for V such that α is represented by an upper triangular matrix

    [a_11  a_12  ···  a_1n]
    [ 0    a_22  ···  a_2n]
    [ ⋮     ⋮     ⋱    ⋮  ]
    [ 0     0    ···  a_nn].

We have a similar lemma telling us when matrices are triangulable.

Lemma. An endomorphism α is triangulable if and only if χ_α(t) can be written
as a product of linear factors, not necessarily distinct. In particular, if F = C
(or any algebraically closed field), then every endomorphism is triangulable.

Proof. Suppose that α is triangulable and represented by

    [λ_1   ∗   ···   ∗ ]
    [ 0   λ_2  ···   ∗ ]
    [ ⋮    ⋮    ⋱    ⋮ ]
    [ 0    0   ···  λ_n].

Then

    χ_α(t) = det [t − λ_1     ∗      ···     ∗   ]
                 [   0     t − λ_2   ···     ∗   ]
                 [   ⋮        ⋮       ⋱      ⋮   ]
                 [   0        0      ···  t − λ_n]
           = ∏_{i=1}^{n} (t − λ_i).

So it is a product of linear factors.

We are going to prove the converse by induction on the dimension of our
space. The base case dim V = 1 is trivial, since every 1 × 1 matrix is already
upper triangular.

Suppose α ∈ End(V), the result holds for all spaces of dimension < dim V,
and χ_α is a product of linear factors. In particular, χ_α(t) has a root,
say λ ∈ F.

Now let U = E(λ) ≠ 0, and let W be a complementary subspace to U in V,
i.e. V = U ⊕ W. Let u_1, ···, u_r be a basis for U and w_{r+1}, ···, w_n be a basis
for W, so that u_1, ···, u_r, w_{r+1}, ···, w_n is a basis for V, and α is represented by

    [λI_r  stuff]
    [  0     B  ].

We know χ_α(t) = (t − λ)^r χ_B(t). So χ_B(t) is also a product of linear factors. We
let β : W → W be the map defined by B with respect to w_{r+1}, ···, w_n.

(Note that in general, β is not α|_W, since α does not necessarily map W to W.
However, we can say that (α − β)(w) ∈ U for all w ∈ W. This can
be much more elegantly expressed in terms of quotient spaces, but unfortunately
that is not officially part of the course.)

Since dim W < dim V, there is a basis v_{r+1}, ···, v_n for W such that β is
represented by C, which is upper triangular.

For j = 1, ···, n − r, we have

    α(v_{j+r}) = u + ∑_{k=1}^{n−r} C_{kj} v_{k+r}

for some u ∈ U. So α is represented by

    [λI_r  stuff]
    [  0     C  ]

with respect to (u_1, ···, u_r, v_{r+1}, ···, v_n), which is upper triangular.

Example. Consider the real rotation matrix

    [ cos θ   sin θ]
    [−sin θ   cos θ].

This is not similar to a real upper triangular matrix (if θ is not an integer
multiple of π). This is because the eigenvalues are e^{±iθ}, which are not real. On the
other hand, as a complex matrix, it is triangulable, and in fact diagonalizable,
since the eigenvalues are distinct.

For this reason, in the rest of the section, we are mostly going to work in C.
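This claim about the eigenvalues can be sanity-checked numerically; a small sketch (NumPy, with an arbitrary illustrative angle θ = 0.7):

```python
import numpy as np

theta = 0.7  # an arbitrary angle, not an integer multiple of pi
R = np.array([[np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

# The eigenvalues are e^{+i theta} and e^{-i theta}: complex, so R is not
# triangulable over R, but over C it is diagonalizable (they are distinct).
eig = np.linalg.eigvals(R)
expected = np.array([np.exp(1j * theta), np.exp(-1j * theta)])
print(np.allclose(np.sort_complex(eig), np.sort_complex(expected)))  # True
```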

We can now prove the Cayley-Hamilton theorem.

Theorem (Cayley-Hamilton theorem). Let V be a finite-dimensional vector
space and α ∈ End(V). Then χ_α(α) = 0, i.e. M_α(t) | χ_α(t). In particular,
deg M_α ≤ n.

Proof. In this proof, we will work over C. By the lemma, we can choose a basis
{e_1, ···, e_n} for V such that α is represented by an upper triangular matrix

    A = [λ_1   ∗   ···   ∗ ]
        [ 0   λ_2  ···   ∗ ]
        [ ⋮    ⋮    ⋱    ⋮ ]
        [ 0    0   ···  λ_n].

We must prove that

    χ_α(α) = χ_A(α) = ∏_{i=1}^{n} (α − λ_i ι) = 0.

Write V_j = ⟨e_1, ···, e_j⟩. So we have the inclusions

    V_0 = 0 ⊆ V_1 ⊆ ··· ⊆ V_{n−1} ⊆ V_n = V.

We also know that dim V_j = j. This increasing sequence is known as a flag.

Now note that since A is upper-triangular, we get

    α(e_i) = ∑_{k=1}^{i} A_{ki} e_k ∈ V_i.

So α(V_j) ⊆ V_j for all j = 0, ···, n.

Moreover, we have

    (α − λ_j ι)(e_j) = ∑_{k=1}^{j−1} A_{kj} e_k ∈ V_{j−1}

for all j = 1, ···, n. So every time we apply one of these factors, we get to a
smaller space. Hence by induction on n − j, we have

    ∏_{i=j}^{n} (α − λ_i ι)(V_n) ⊆ V_{j−1}.

In particular, when j = 1, we get

    ∏_{i=1}^{n} (α − λ_i ι)(V) ⊆ V_0 = 0.

So χ_α(α) = 0 as required.

Note that if our field F is not C but just a subfield of C, say R, we can just
pretend it is a complex matrix and do the same proof.

We can see this proof more “visually” as follows: for simplicity of expression,
we suppose n = 4. In the basis where α is upper-triangular, the matrices
A − λ_i I look like this:

    A − λ_1 I = [0 ∗ ∗ ∗]    A − λ_2 I = [∗ ∗ ∗ ∗]
                [0 ∗ ∗ ∗]                [0 0 ∗ ∗]
                [0 0 ∗ ∗]                [0 0 ∗ ∗]
                [0 0 0 ∗]                [0 0 0 ∗]

    A − λ_3 I = [∗ ∗ ∗ ∗]    A − λ_4 I = [∗ ∗ ∗ ∗]
                [0 ∗ ∗ ∗]                [0 ∗ ∗ ∗]
                [0 0 0 ∗]                [0 0 ∗ ∗]
                [0 0 0 ∗]                [0 0 0 0]

Then we just multiply out directly (from the right):

    ∏_{i=1}^{4} (A − λ_i I)

      = [0 ∗ ∗ ∗][∗ ∗ ∗ ∗][∗ ∗ ∗ ∗][∗ ∗ ∗ ∗]
        [0 ∗ ∗ ∗][0 0 ∗ ∗][0 ∗ ∗ ∗][0 ∗ ∗ ∗]
        [0 0 ∗ ∗][0 0 ∗ ∗][0 0 0 ∗][0 0 ∗ ∗]
        [0 0 0 ∗][0 0 0 ∗][0 0 0 ∗][0 0 0 0]

      = [0 ∗ ∗ ∗][∗ ∗ ∗ ∗][∗ ∗ ∗ ∗]
        [0 ∗ ∗ ∗][0 0 ∗ ∗][0 ∗ ∗ ∗]
        [0 0 ∗ ∗][0 0 ∗ ∗][0 0 0 0]
        [0 0 0 ∗][0 0 0 ∗][0 0 0 0]

      = [0 ∗ ∗ ∗][∗ ∗ ∗ ∗]
        [0 ∗ ∗ ∗][0 0 0 0]
        [0 0 ∗ ∗][0 0 0 0]
        [0 0 0 ∗][0 0 0 0]

      = [0 0 0 0]
        [0 0 0 0]
        [0 0 0 0]
        [0 0 0 0].

This is exactly what we showed in the proof — after multiplying out the first

k

elements of the product (counting from the right), the image is contained in the

span of the first n −k basis vectors.
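This row-vanishing pattern can itself be checked numerically at every step; a sketch (NumPy, with a random upper triangular matrix assumed for illustration):

```python
import numpy as np

# A random 4x4 upper triangular matrix (assumed for illustration).
rng = np.random.default_rng(0)
A = np.triu(rng.standard_normal((4, 4)))
lams = np.diag(A)
n = 4

# Multiply the factors (A - lambda_i I) starting from the right: after
# k factors the image lies in the span of the first n - k basis vectors,
# i.e. the last k rows of the partial product vanish.
P = np.eye(n)
for k, i in enumerate(range(n - 1, -1, -1), start=1):
    P = (A - lams[i] * np.eye(n)) @ P
    assert np.allclose(P[n - k:, :], 0)  # last k rows are zero
print(np.allclose(P, 0))  # True: the full product chi_A(A) is zero
```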

Proof. We’ll now prove the theorem again, in a way that is somewhat a formalization
of the “nonsense proof” where we just substitute t = α into det(tι − α).

Let α be represented by A, and let B = tI − A. Then

    B adj B = (det B) I_n = χ_α(t) I_n.

But we know that adj B is a matrix with entries in F[t] of degree at most n − 1.
So we can write

    adj B = B_{n−1} t^{n−1} + B_{n−2} t^{n−2} + ··· + B_0,

with B_i ∈ Mat_n(F). We can also write

    χ_α(t) = t^n + a_{n−1} t^{n−1} + ··· + a_0.

Then we get the result

    (tI_n − A)(B_{n−1} t^{n−1} + B_{n−2} t^{n−2} + ··· + B_0) = (t^n + a_{n−1} t^{n−1} + ··· + a_0) I_n.

We would like to just throw in t = A and get the desired result, but in all these
derivations, t is assumed to be a scalar, and tI_n − A is the matrix

    [t − a_11    −a_12    ···    −a_1n ]
    [ −a_21    t − a_22   ···    −a_2n ]
    [   ⋮          ⋮       ⋱       ⋮   ]
    [ −a_n1     −a_n2     ···   t − a_nn].

It doesn’t make sense to put our A in there.

However, what we can do is to note that since this is true for all values of t,
the coefficients on both sides must be equal. Equating coefficients of t^k, we have

    −A B_0 = a_0 I_n
    B_0 − A B_1 = a_1 I_n
        ⋮
    B_{n−2} − A B_{n−1} = a_{n−1} I_n
    B_{n−1} = I_n.

We now multiply each row by a suitable power of A to obtain

    −A B_0 = a_0 I_n
    A B_0 − A^2 B_1 = a_1 A
        ⋮
    A^{n−1} B_{n−2} − A^n B_{n−1} = a_{n−1} A^{n−1}
    A^n B_{n−1} = A^n.

Summing these up then gives χ_α(A) = 0: the left-hand sides telescope to zero,
while the right-hand sides sum to χ_α(A).
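The conclusion χ_A(A) = 0 can be verified numerically for an arbitrary matrix by evaluating the characteristic polynomial at A with Horner's scheme; a sketch (NumPy, random test matrix assumed for illustration):

```python
import numpy as np

# A random test matrix (assumed for illustration).
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

# np.poly gives the coefficients [1, a_{n-1}, ..., a_0] of det(tI - A).
coeffs = np.poly(A)

# Evaluate chi_A at the matrix A by Horner's scheme.
chi_of_A = np.zeros_like(A)
for c in coeffs:
    chi_of_A = chi_of_A @ A + c * np.eye(4)
print(np.allclose(chi_of_A, 0))  # True
```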

This proof suggests that we really ought to be able to just substitute in t = α
and be done. In fact, we can do this, after we develop sufficient machinery. This
will be done in the IB Groups, Rings and Modules course.

Lemma. Let α ∈ End(V) and λ ∈ F. Then the following are equivalent:

(i) λ is an eigenvalue of α.

(ii) λ is a root of χ_α(t).

(iii) λ is a root of M_α(t).

Proof.

– (i) ⇔ (ii): λ is an eigenvalue of α if and only if (α − λι)(v) = 0 has a
non-trivial solution, iff det(α − λι) = 0.

– (iii) ⇒ (ii): This follows from the Cayley-Hamilton theorem, since M_α | χ_α.

– (i) ⇒ (iii): Let λ be an eigenvalue, and v be a corresponding eigenvector.
Then by definition of M_α, we have

    M_α(α)(v) = 0(v) = 0.

We also know that

    M_α(α)(v) = M_α(λ)v.

Since v is non-zero, we must have M_α(λ) = 0.

– (iii) ⇒ (i): This is not necessary, since it follows from the above, but
we could as well do it explicitly. Suppose λ is a root of M_α(t). Then
M_α(t) = (t − λ)g(t) for some g ∈ F[t]. But deg g < deg M_α. Hence by
minimality of M_α, we must have g(α) ≠ 0. So there is some v ∈ V such
that g(α)(v) ≠ 0. Then

    (α − λι)g(α)(v) = M_α(α)(v) = 0.

So we must have α(g(α)(v)) = λg(α)(v). So g(α)(v) ∈ E_α(λ) \ {0}. So (i)
holds.

Example. What is the minimal polynomial of

    A = [1  0  −2]
        [0  1   1]
        [0  0   2]?

We can compute χ_A(t) = (t − 1)^2 (t − 2). So we know that the minimal polynomial
is one of (t − 1)^2 (t − 2) and (t − 1)(t − 2).

By direct and boring computations, we can find (A − I)(A − 2I) = 0. So we
know that M_A(t) = (t − 1)(t − 2). So A is diagonalizable.