6 Endomorphisms

IB Linear Algebra

6.3 The Cayley-Hamilton theorem

We will first state the theorem, and then prove it later.

Recall that χ_α(t) = det(tι − α) for α ∈ End(V). Our main theorem of the
section (as you might have guessed from the title) is

Theorem (Cayley-Hamilton theorem). Let V be a finite-dimensional vector
space and α ∈ End(V). Then χ_α(α) = 0, i.e. M_α(t) | χ_α(t). In particular,
deg M_α ≤ n.

We will not prove this yet, but just talk about it first. It is tempting to prove
this by substituting t = α into det(tι − α) and getting det(α − α) = 0, but this is
meaningless, since what the statement χ_α(t) = det(tι − α) tells us to do is to
expand the determinant of the matrix

    [t − a_11    −a_12    ···    −a_1n ]
    [ −a_21    t − a_22   ···    −a_2n ]
    [   ⋮          ⋮       ⋱       ⋮   ]
    [ −a_n1     −a_n2     ···   t − a_nn]

to obtain a polynomial, and we clearly cannot substitute t = A in this expression.
However, we will later see that this idea can be made to work, if we are just a
bit more careful.

However, we can later show that we can use this idea to prove it, but just be a

bit more careful.

Note also that if ρ(t) ∈ F[t] and

    A = [λ_1          ]
        [     ⋱       ]
        [          λ_n],

then

    ρ(A) = [ρ(λ_1)               ]
           [          ⋱          ]
           [               ρ(λ_n)].

Since χ_A(t) is defined as ∏_{i=1}^{n} (t − λ_i), it follows that χ_A(A) = 0. So if α is
diagonalizable, then the theorem is clear.
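This diagonal case is easy to check numerically; a minimal sketch (NumPy, with illustrative eigenvalues 1, 2, 3 assumed for the example):

```python
import numpy as np

# For a diagonal matrix, p(A) is the diagonal matrix of the p(lambda_i),
# so chi_A(A) = prod_i (A - lambda_i I) = 0.
lams = np.array([1.0, 2.0, 3.0])  # illustrative eigenvalues
A = np.diag(lams)
I = np.eye(3)

# chi_A(A) = (A - 1*I)(A - 2*I)(A - 3*I)
chi_of_A = (A - lams[0] * I) @ (A - lams[1] * I) @ (A - lams[2] * I)
print(np.allclose(chi_of_A, 0))  # True
```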

This was easy. Diagonalizable matrices are nice. The next best thing we can

look at is upper-triangular matrices.

Definition (Triangulable). An endomorphism α ∈ End(V) is triangulable if
there is a basis for V such that α is represented by an upper triangular matrix

    [a_11  a_12  ···  a_1n]
    [ 0    a_22  ···  a_2n]
    [ ⋮     ⋮     ⋱    ⋮  ]
    [ 0     0    ···  a_nn].

We have a similar lemma telling us when matrices are triangulable.

Lemma. An endomorphism α is triangulable if and only if χ_α(t) can be written
as a product of linear factors, not necessarily distinct. In particular, if F = C
(or any algebraically closed field), then every endomorphism is triangulable.

Proof. Suppose that α is triangulable and represented by

    [λ_1   ∗   ···   ∗ ]
    [ 0   λ_2  ···   ∗ ]
    [ ⋮    ⋮    ⋱    ⋮ ]
    [ 0    0   ···  λ_n].

Then

    χ_α(t) = det [t − λ_1     ∗      ···     ∗   ]
                 [   0     t − λ_2   ···     ∗   ]
                 [   ⋮        ⋮       ⋱      ⋮   ]
                 [   0        0      ···  t − λ_n]
           = ∏_{i=1}^{n} (t − λ_i).

So it is a product of linear factors.

We are going to prove the converse by induction on the dimension of our
space. The base case dim V = 1 is trivial, since every 1 × 1 matrix is already
upper triangular.

Suppose α ∈ End(V), the result holds for all spaces of dimension < dim V,
and χ_α is a product of linear factors. In particular, χ_α(t) has a root,
say λ ∈ F.

Now let U = E(λ) ≠ 0, and let W be a complementary subspace to U in V,
i.e. V = U ⊕ W. Let u_1, ···, u_r be a basis for U and w_{r+1}, ···, w_n be a basis
for W, so that u_1, ···, u_r, w_{r+1}, ···, w_n is a basis for V, and α is represented by

    [λI_r  stuff]
    [  0     B  ].

We know χ_α(t) = (t − λ)^r χ_B(t). So χ_B(t) is also a product of linear factors. We
let β : W → W be the map defined by B with respect to w_{r+1}, ···, w_n.

(Note that in general, β is not α|_W, since α does not necessarily map W to W.
However, we can say that (α − β)(w) ∈ U for all w ∈ W. This can
be much more elegantly expressed in terms of quotient spaces, but unfortunately
that is not officially part of the course.)

Since dim W < dim V, there is a basis v_{r+1}, ···, v_n for W such that β is
represented by C, which is upper triangular.

For j = 1, ···, n − r, we have

    α(v_{j+r}) = u + ∑_{k=1}^{n−r} C_{kj} v_{k+r}

for some u ∈ U. So α is represented by

    [λI_r  stuff]
    [  0     C  ]

with respect to (u_1, ···, u_r, v_{r+1}, ···, v_n), which is upper triangular.

Example. Consider the real rotation matrix

    [ cos θ   sin θ]
    [−sin θ   cos θ].

This is not similar to a real upper triangular matrix (if θ is not an integer
multiple of π). This is because the eigenvalues are e^{±iθ}, which are not real. On the
other hand, as a complex matrix, it is triangulable, and in fact diagonalizable,
since the eigenvalues are distinct.

For this reason, in the rest of the section, we are mostly going to work in C.
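This claim about the eigenvalues can be sanity-checked numerically; a small sketch (NumPy, with an arbitrary illustrative angle θ = 0.7):

```python
import numpy as np

theta = 0.7  # an arbitrary angle, not an integer multiple of pi
R = np.array([[np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

# The eigenvalues are e^{+i theta} and e^{-i theta}: complex, so R is not
# triangulable over R, but over C it is diagonalizable (they are distinct).
eig = np.linalg.eigvals(R)
expected = np.array([np.exp(1j * theta), np.exp(-1j * theta)])
print(np.allclose(np.sort_complex(eig), np.sort_complex(expected)))  # True
```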

We can now prove the Cayley-Hamilton theorem.

Theorem (Cayley-Hamilton theorem). Let V be a finite-dimensional vector
space and α ∈ End(V). Then χ_α(α) = 0, i.e. M_α(t) | χ_α(t). In particular,
deg M_α ≤ n.

Proof. In this proof, we will work over C. By the lemma, we can choose a basis
{e_1, ···, e_n} for V such that α is represented by an upper triangular matrix

    A = [λ_1   ∗   ···   ∗ ]
        [ 0   λ_2  ···   ∗ ]
        [ ⋮    ⋮    ⋱    ⋮ ]
        [ 0    0   ···  λ_n].

We must prove that

    χ_α(α) = χ_A(α) = ∏_{i=1}^{n} (α − λ_i ι) = 0.

Write V_j = ⟨e_1, ···, e_j⟩. So we have the inclusions

    V_0 = 0 ⊆ V_1 ⊆ ··· ⊆ V_{n−1} ⊆ V_n = V.

We also know that dim V_j = j. This increasing sequence is known as a flag.

Now note that since A is upper-triangular, we get

    α(e_i) = ∑_{k=1}^{i} A_{ki} e_k ∈ V_i.

So α(V_j) ⊆ V_j for all j = 0, ···, n.

Moreover, we have

    (α − λ_j ι)(e_j) = ∑_{k=1}^{j−1} A_{kj} e_k ∈ V_{j−1}

for all j = 1, ···, n. So every time we apply one of these factors, we get to a
smaller space. Hence by induction on n − j, we have

    ∏_{i=j}^{n} (α − λ_i ι)(V_n) ⊆ V_{j−1}.

In particular, when j = 1, we get

    ∏_{i=1}^{n} (α − λ_i ι)(V) ⊆ V_0 = 0.

So χ_α(α) = 0 as required.

Note that if our field F is not C but just a subfield of C, say R, we can just
pretend it is a complex matrix and do the same proof.

We can see this proof more “visually” as follows: for simplicity of expression,
we suppose n = 4. In the basis where α is upper-triangular, the matrices
A − λ_i I look like this:

    A − λ_1 I = [0 ∗ ∗ ∗]    A − λ_2 I = [∗ ∗ ∗ ∗]
                [0 ∗ ∗ ∗]                [0 0 ∗ ∗]
                [0 0 ∗ ∗]                [0 0 ∗ ∗]
                [0 0 0 ∗]                [0 0 0 ∗]

    A − λ_3 I = [∗ ∗ ∗ ∗]    A − λ_4 I = [∗ ∗ ∗ ∗]
                [0 ∗ ∗ ∗]                [0 ∗ ∗ ∗]
                [0 0 0 ∗]                [0 0 ∗ ∗]
                [0 0 0 ∗]                [0 0 0 0]

Then we just multiply out directly (from the right):

    ∏_{i=1}^{4} (A − λ_i I)

      = [0 ∗ ∗ ∗][∗ ∗ ∗ ∗][∗ ∗ ∗ ∗][∗ ∗ ∗ ∗]
        [0 ∗ ∗ ∗][0 0 ∗ ∗][0 ∗ ∗ ∗][0 ∗ ∗ ∗]
        [0 0 ∗ ∗][0 0 ∗ ∗][0 0 0 ∗][0 0 ∗ ∗]
        [0 0 0 ∗][0 0 0 ∗][0 0 0 ∗][0 0 0 0]

      = [0 ∗ ∗ ∗][∗ ∗ ∗ ∗][∗ ∗ ∗ ∗]
        [0 ∗ ∗ ∗][0 0 ∗ ∗][0 ∗ ∗ ∗]
        [0 0 ∗ ∗][0 0 ∗ ∗][0 0 0 0]
        [0 0 0 ∗][0 0 0 ∗][0 0 0 0]

      = [0 ∗ ∗ ∗][∗ ∗ ∗ ∗]
        [0 ∗ ∗ ∗][0 0 0 0]
        [0 0 ∗ ∗][0 0 0 0]
        [0 0 0 ∗][0 0 0 0]

      = [0 0 0 0]
        [0 0 0 0]
        [0 0 0 0]
        [0 0 0 0].

This is exactly what we showed in the proof — after multiplying out the first

k

elements of the product (counting from the right), the image is contained in the

span of the first n −k basis vectors.
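This row-vanishing pattern can itself be checked numerically at every step; a sketch (NumPy, with a random upper triangular matrix assumed for illustration):

```python
import numpy as np

# A random 4x4 upper triangular matrix (assumed for illustration).
rng = np.random.default_rng(0)
A = np.triu(rng.standard_normal((4, 4)))
lams = np.diag(A)
n = 4

# Multiply the factors (A - lambda_i I) starting from the right: after
# k factors the image lies in the span of the first n - k basis vectors,
# i.e. the last k rows of the partial product vanish.
P = np.eye(n)
for k, i in enumerate(range(n - 1, -1, -1), start=1):
    P = (A - lams[i] * np.eye(n)) @ P
    assert np.allclose(P[n - k:, :], 0)  # last k rows are zero
print(np.allclose(P, 0))  # True: the full product chi_A(A) is zero
```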

Proof. We’ll now prove the theorem again, in a way that is somewhat a formalization
of the “nonsense proof” where we just substitute t = α into det(tι − α).

Let α be represented by A, and let B = tI − A. Then

    B adj B = (det B) I_n = χ_α(t) I_n.

But we know that adj B is a matrix with entries in F[t] of degree at most n − 1.
So we can write

    adj B = B_{n−1} t^{n−1} + B_{n−2} t^{n−2} + ··· + B_0,

with B_i ∈ Mat_n(F). We can also write

    χ_α(t) = t^n + a_{n−1} t^{n−1} + ··· + a_0.

Then we get the result

    (tI_n − A)(B_{n−1} t^{n−1} + B_{n−2} t^{n−2} + ··· + B_0) = (t^n + a_{n−1} t^{n−1} + ··· + a_0) I_n.

We would like to just throw in t = A and get the desired result, but in all these
derivations, t is assumed to be a scalar, and tI_n − A is the matrix

    [t − a_11    −a_12    ···    −a_1n ]
    [ −a_21    t − a_22   ···    −a_2n ]
    [   ⋮          ⋮       ⋱       ⋮   ]
    [ −a_n1     −a_n2     ···   t − a_nn].

It doesn’t make sense to put our A in there.

However, what we can do is to note that since this is true for all values of t,
the coefficients on both sides must be equal. Equating coefficients of t^k, we have

    −A B_0 = a_0 I_n
    B_0 − A B_1 = a_1 I_n
        ⋮
    B_{n−2} − A B_{n−1} = a_{n−1} I_n
    B_{n−1} = I_n.

We now multiply each row by a suitable power of A to obtain

    −A B_0 = a_0 I_n
    A B_0 − A^2 B_1 = a_1 A
        ⋮
    A^{n−1} B_{n−2} − A^n B_{n−1} = a_{n−1} A^{n−1}
    A^n B_{n−1} = A^n.

Summing these up then gives χ_α(A) = 0: the left-hand sides telescope to zero,
while the right-hand sides sum to χ_α(A).
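The conclusion χ_A(A) = 0 can be verified numerically for an arbitrary matrix by evaluating the characteristic polynomial at A with Horner's scheme; a sketch (NumPy, random test matrix assumed for illustration):

```python
import numpy as np

# A random test matrix (assumed for illustration).
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

# np.poly gives the coefficients [1, a_{n-1}, ..., a_0] of det(tI - A).
coeffs = np.poly(A)

# Evaluate chi_A at the matrix A by Horner's scheme.
chi_of_A = np.zeros_like(A)
for c in coeffs:
    chi_of_A = chi_of_A @ A + c * np.eye(4)
print(np.allclose(chi_of_A, 0))  # True
```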

This proof suggests that we really ought to be able to just substitute in t = α
and be done. In fact, we can do this, after we develop sufficient machinery. This
will be done in the IB Groups, Rings and Modules course.

Lemma. Let α ∈ End(V) and λ ∈ F. Then the following are equivalent:

(i) λ is an eigenvalue of α.

(ii) λ is a root of χ_α(t).

(iii) λ is a root of M_α(t).

Proof.

– (i) ⇔ (ii): λ is an eigenvalue of α if and only if (α − λι)(v) = 0 has a
non-trivial solution, iff det(α − λι) = 0.

– (iii) ⇒ (ii): This follows from the Cayley-Hamilton theorem, since M_α | χ_α.

– (i) ⇒ (iii): Let λ be an eigenvalue, and v be a corresponding eigenvector.
Then by definition of M_α, we have

    M_α(α)(v) = 0(v) = 0.

We also know that

    M_α(α)(v) = M_α(λ)v.

Since v is non-zero, we must have M_α(λ) = 0.

– (iii) ⇒ (i): This is not necessary, since it follows from the above, but
we could as well do it explicitly. Suppose λ is a root of M_α(t). Then
M_α(t) = (t − λ)g(t) for some g ∈ F[t]. But deg g < deg M_α. Hence by
minimality of M_α, we must have g(α) ≠ 0. So there is some v ∈ V such
that g(α)(v) ≠ 0. Then

    (α − λι)g(α)(v) = M_α(α)(v) = 0.

So we must have α(g(α)(v)) = λg(α)(v). So g(α)(v) ∈ E_α(λ) \ {0}. So (i)
holds.

Example. What is the minimal polynomial of

    A = [1  0  −2]
        [0  1   1]
        [0  0   2]?

We can compute χ_A(t) = (t − 1)^2 (t − 2). So we know that the minimal polynomial
is one of (t − 1)^2 (t − 2) and (t − 1)(t − 2).

By direct and boring computations, we can find (A − I)(A − 2I) = 0. So we
know that M_A(t) = (t − 1)(t − 2). So A is diagonalizable.