6 Endomorphisms

IB Linear Algebra

6.3 The Cayley-Hamilton theorem

We will first state the theorem, and then prove it later.

Recall that χ_α(t) = det(tι − α) for α ∈ End(V). Our main theorem of the
section (as you might have guessed from the title) is

Theorem (Cayley-Hamilton theorem). Let V be a finite-dimensional vector
space and α ∈ End(V). Then χ_α(α) = 0, i.e. M_α(t) | χ_α(t). In particular,
deg M_α ≤ n.

We will not prove this yet, but just talk about it first. It is tempting to prove
this by substituting t = α into det(tι − α) and getting det(α − α) = 0, but this is
meaningless, since what the statement χ_α(t) = det(tι − α) tells us to do is to
expand the determinant of the matrix

    [t − a_11    −a_12    ···    −a_1n ]
    [ −a_21    t − a_22   ···    −a_2n ]
    [   ⋮          ⋮       ⋱       ⋮   ]
    [ −a_n1     −a_n2     ···   t − a_nn]

to obtain a polynomial, and we clearly cannot substitute t = A in this expression.
However, we will later see that this idea can be made to work, if we are just a
bit more careful.

However, we can later show that we can use this idea to prove it, but just be a

bit more careful.

Note also that if ρ(t) ∈ F[t] and

    A = [λ_1          ]
        [     ⋱       ]
        [          λ_n],

then

    ρ(A) = [ρ(λ_1)               ]
           [          ⋱          ]
           [               ρ(λ_n)].

Since χ_A(t) is defined as ∏_{i=1}^{n} (t − λ_i), it follows that χ_A(A) = 0. So if α is
diagonalizable, then the theorem is clear.
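This diagonal case is easy to check numerically; a minimal sketch (NumPy, with illustrative eigenvalues 1, 2, 3 assumed for the example):

```python
import numpy as np

# For a diagonal matrix, p(A) is the diagonal matrix of the p(lambda_i),
# so chi_A(A) = prod_i (A - lambda_i I) = 0.
lams = np.array([1.0, 2.0, 3.0])  # illustrative eigenvalues
A = np.diag(lams)
I = np.eye(3)

# chi_A(A) = (A - 1*I)(A - 2*I)(A - 3*I)
chi_of_A = (A - lams[0] * I) @ (A - lams[1] * I) @ (A - lams[2] * I)
print(np.allclose(chi_of_A, 0))  # True
```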

This was easy. Diagonalizable matrices are nice. The next best thing we can

look at is upper-triangular matrices.

Definition (Triangulable). An endomorphism α ∈ End(V) is triangulable if
there is a basis for V such that α is represented by an upper triangular matrix

    [a_11  a_12  ···  a_1n]
    [ 0    a_22  ···  a_2n]
    [ ⋮     ⋮     ⋱    ⋮  ]
    [ 0     0    ···  a_nn].

We have a similar lemma telling us when matrices are triangulable.

Lemma. An endomorphism α is triangulable if and only if χ_α(t) can be written
as a product of linear factors, not necessarily distinct. In particular, if F = C
(or any algebraically closed field), then every endomorphism is triangulable.

Proof. Suppose that α is triangulable and represented by

    [λ_1   ∗   ···   ∗ ]
    [ 0   λ_2  ···   ∗ ]
    [ ⋮    ⋮    ⋱    ⋮ ]
    [ 0    0   ···  λ_n].

Then

    χ_α(t) = det [t − λ_1     ∗      ···     ∗   ]
                 [   0     t − λ_2   ···     ∗   ]
                 [   ⋮        ⋮       ⋱      ⋮   ]
                 [   0        0      ···  t − λ_n]
           = ∏_{i=1}^{n} (t − λ_i).

So it is a product of linear factors.

We are going to prove the converse by induction on the dimension of our
space. The base case dim V = 1 is trivial, since every 1 × 1 matrix is already
upper triangular.

Suppose α ∈ End(V), the result holds for all spaces of dimension < dim V,
and χ_α is a product of linear factors. In particular, χ_α(t) has a root,
say λ ∈ F.

Now let U = E(λ) ≠ 0, and let W be a complementary subspace to U in V,
i.e. V = U ⊕ W. Let u_1, ···, u_r be a basis for U and w_{r+1}, ···, w_n be a basis
for W, so that u_1, ···, u_r, w_{r+1}, ···, w_n is a basis for V, and α is represented by

    [λI_r  stuff]
    [  0     B  ].

We know χ_α(t) = (t − λ)^r χ_B(t). So χ_B(t) is also a product of linear factors. We
let β : W → W be the map defined by B with respect to w_{r+1}, ···, w_n.

(Note that in general, β is not α|_W, since α does not necessarily map W to W.
However, we can say that (α − β)(w) ∈ U for all w ∈ W. This can
be much more elegantly expressed in terms of quotient spaces, but unfortunately
that is not officially part of the course.)

Since dim W < dim V, there is a basis v_{r+1}, ···, v_n for W such that β is
represented by C, which is upper triangular.

For j = 1, ···, n − r, we have

    α(v_{j+r}) = u + ∑_{k=1}^{n−r} C_{kj} v_{k+r}

for some u ∈ U. So α is represented by

    [λI_r  stuff]
    [  0     C  ]

with respect to (u_1, ···, u_r, v_{r+1}, ···, v_n), which is upper triangular.

Example. Consider the real rotation matrix

    [ cos θ   sin θ]
    [−sin θ   cos θ].

This is not similar to a real upper triangular matrix (if θ is not an integer
multiple of π). This is because the eigenvalues are e^{±iθ}, which are not real. On the
other hand, as a complex matrix, it is triangulable, and in fact diagonalizable,
since the eigenvalues are distinct.

For this reason, in the rest of the section, we are mostly going to work in C.
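This claim about the eigenvalues can be sanity-checked numerically; a small sketch (NumPy, with an arbitrary illustrative angle θ = 0.7):

```python
import numpy as np

theta = 0.7  # an arbitrary angle, not an integer multiple of pi
R = np.array([[np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

# The eigenvalues are e^{+i theta} and e^{-i theta}: complex, so R is not
# triangulable over R, but over C it is diagonalizable (they are distinct).
eig = np.linalg.eigvals(R)
expected = np.array([np.exp(1j * theta), np.exp(-1j * theta)])
print(np.allclose(np.sort_complex(eig), np.sort_complex(expected)))  # True
```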

We can now prove the Cayley-Hamilton theorem.

Theorem (Cayley-Hamilton theorem). Let V be a finite-dimensional vector
space and α ∈ End(V). Then χ_α(α) = 0, i.e. M_α(t) | χ_α(t). In particular,
deg M_α ≤ n.

Proof. In this proof, we will work over C. By the lemma, we can choose a basis
{e_1, ···, e_n} for V such that α is represented by an upper triangular matrix

    A = [λ_1   ∗   ···   ∗ ]
        [ 0   λ_2  ···   ∗ ]
        [ ⋮    ⋮    ⋱    ⋮ ]
        [ 0    0   ···  λ_n].

We must prove that

    χ_α(α) = χ_A(α) = ∏_{i=1}^{n} (α − λ_i ι) = 0.

Write V_j = ⟨e_1, ···, e_j⟩. So we have the inclusions

    V_0 = 0 ⊆ V_1 ⊆ ··· ⊆ V_{n−1} ⊆ V_n = V.

We also know that dim V_j = j. This increasing sequence is known as a flag.

Now note that since A is upper-triangular, we get

    α(e_i) = ∑_{k=1}^{i} A_{ki} e_k ∈ V_i.

So α(V_j) ⊆ V_j for all j = 0, ···, n.

Moreover, we have

    (α − λ_j ι)(e_j) = ∑_{k=1}^{j−1} A_{kj} e_k ∈ V_{j−1}

for all j = 1, ···, n. So every time we apply one of these factors, we get to a
smaller space. Hence by induction on n − j, we have

    ∏_{i=j}^{n} (α − λ_i ι)(V_n) ⊆ V_{j−1}.

In particular, when j = 1, we get

    ∏_{i=1}^{n} (α − λ_i ι)(V) ⊆ V_0 = 0.

So χ_α(α) = 0 as required.

Note that if our field F is not C but just a subfield of C, say R, we can just
pretend it is a complex matrix and do the same proof.

We can see this proof more “visually” as follows: for simplicity of expression,
we suppose n = 4. In the basis where α is upper-triangular, the matrices
A − λ_i I look like this:

    A − λ_1 I = [0 ∗ ∗ ∗]    A − λ_2 I = [∗ ∗ ∗ ∗]
                [0 ∗ ∗ ∗]                [0 0 ∗ ∗]
                [0 0 ∗ ∗]                [0 0 ∗ ∗]
                [0 0 0 ∗]                [0 0 0 ∗]

    A − λ_3 I = [∗ ∗ ∗ ∗]    A − λ_4 I = [∗ ∗ ∗ ∗]
                [0 ∗ ∗ ∗]                [0 ∗ ∗ ∗]
                [0 0 0 ∗]                [0 0 ∗ ∗]
                [0 0 0 ∗]                [0 0 0 0]

Then we just multiply out directly (from the right):

    ∏_{i=1}^{4} (A − λ_i I)

      = [0 ∗ ∗ ∗][∗ ∗ ∗ ∗][∗ ∗ ∗ ∗][∗ ∗ ∗ ∗]
        [0 ∗ ∗ ∗][0 0 ∗ ∗][0 ∗ ∗ ∗][0 ∗ ∗ ∗]
        [0 0 ∗ ∗][0 0 ∗ ∗][0 0 0 ∗][0 0 ∗ ∗]
        [0 0 0 ∗][0 0 0 ∗][0 0 0 ∗][0 0 0 0]

      = [0 ∗ ∗ ∗][∗ ∗ ∗ ∗][∗ ∗ ∗ ∗]
        [0 ∗ ∗ ∗][0 0 ∗ ∗][0 ∗ ∗ ∗]
        [0 0 ∗ ∗][0 0 ∗ ∗][0 0 0 0]
        [0 0 0 ∗][0 0 0 ∗][0 0 0 0]

      = [0 ∗ ∗ ∗][∗ ∗ ∗ ∗]
        [0 ∗ ∗ ∗][0 0 0 0]
        [0 0 ∗ ∗][0 0 0 0]
        [0 0 0 ∗][0 0 0 0]

      = [0 0 0 0]
        [0 0 0 0]
        [0 0 0 0]
        [0 0 0 0].

This is exactly what we showed in the proof — after multiplying out the first

k

elements of the product (counting from the right), the image is contained in the

span of the first n −k basis vectors.
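This row-vanishing pattern can itself be checked numerically at every step; a sketch (NumPy, with a random upper triangular matrix assumed for illustration):

```python
import numpy as np

# A random 4x4 upper triangular matrix (assumed for illustration).
rng = np.random.default_rng(0)
A = np.triu(rng.standard_normal((4, 4)))
lams = np.diag(A)
n = 4

# Multiply the factors (A - lambda_i I) starting from the right: after
# k factors the image lies in the span of the first n - k basis vectors,
# i.e. the last k rows of the partial product vanish.
P = np.eye(n)
for k, i in enumerate(range(n - 1, -1, -1), start=1):
    P = (A - lams[i] * np.eye(n)) @ P
    assert np.allclose(P[n - k:, :], 0)  # last k rows are zero
print(np.allclose(P, 0))  # True: the full product chi_A(A) is zero
```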

Proof. We’ll now prove the theorem again, in a way that is somewhat a formalization
of the “nonsense proof” where we just substitute t = α into det(tι − α).

Let α be represented by A, and let B = tI − A. Then

    B adj B = (det B) I_n = χ_α(t) I_n.

But we know that adj B is a matrix with entries in F[t] of degree at most n − 1.
So we can write

    adj B = B_{n−1} t^{n−1} + B_{n−2} t^{n−2} + ··· + B_0,

with B_i ∈ Mat_n(F). We can also write

    χ_α(t) = t^n + a_{n−1} t^{n−1} + ··· + a_0.

Then we get the result

    (tI_n − A)(B_{n−1} t^{n−1} + B_{n−2} t^{n−2} + ··· + B_0) = (t^n + a_{n−1} t^{n−1} + ··· + a_0) I_n.

We would like to just throw in t = A and get the desired result, but in all these
derivations, t is assumed to be a scalar, and tI_n − A is the matrix

    [t − a_11    −a_12    ···    −a_1n ]
    [ −a_21    t − a_22   ···    −a_2n ]
    [   ⋮          ⋮       ⋱       ⋮   ]
    [ −a_n1     −a_n2     ···   t − a_nn].

It doesn’t make sense to put our A in there.

However, what we can do is to note that since this is true for all values of t,
the coefficients on both sides must be equal. Equating coefficients of t^k, we have

    −A B_0 = a_0 I_n
    B_0 − A B_1 = a_1 I_n
        ⋮
    B_{n−2} − A B_{n−1} = a_{n−1} I_n
    B_{n−1} = I_n.

We now multiply each row by a suitable power of A to obtain

    −A B_0 = a_0 I_n
    A B_0 − A^2 B_1 = a_1 A
        ⋮
    A^{n−1} B_{n−2} − A^n B_{n−1} = a_{n−1} A^{n−1}
    A^n B_{n−1} = A^n.

Summing these up then gives χ_α(A) = 0: the left-hand sides telescope to zero,
while the right-hand sides sum to χ_α(A).
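The conclusion χ_A(A) = 0 can be verified numerically for an arbitrary matrix by evaluating the characteristic polynomial at A with Horner's scheme; a sketch (NumPy, random test matrix assumed for illustration):

```python
import numpy as np

# A random test matrix (assumed for illustration).
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

# np.poly gives the coefficients [1, a_{n-1}, ..., a_0] of det(tI - A).
coeffs = np.poly(A)

# Evaluate chi_A at the matrix A by Horner's scheme.
chi_of_A = np.zeros_like(A)
for c in coeffs:
    chi_of_A = chi_of_A @ A + c * np.eye(4)
print(np.allclose(chi_of_A, 0))  # True
```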

This proof suggests that we really ought to be able to just substitute in t = α
and be done. In fact, we can do this, after we develop sufficient machinery. This
will be done in the IB Groups, Rings and Modules course.

Lemma. Let α ∈ End(V) and λ ∈ F. Then the following are equivalent:

(i) λ is an eigenvalue of α.

(ii) λ is a root of χ_α(t).

(iii) λ is a root of M_α(t).

Proof.

– (i) ⇔ (ii): λ is an eigenvalue of α if and only if (α − λι)(v) = 0 has a
non-trivial solution, iff det(α − λι) = 0.

– (iii) ⇒ (ii): This follows from the Cayley-Hamilton theorem, since M_α | χ_α.

– (i) ⇒ (iii): Let λ be an eigenvalue, and v be a corresponding eigenvector.
Then by definition of M_α, we have

    M_α(α)(v) = 0(v) = 0.

We also know that

    M_α(α)(v) = M_α(λ)v.

Since v is non-zero, we must have M_α(λ) = 0.

– (iii) ⇒ (i): This is not necessary, since it follows from the above, but
we could as well do it explicitly. Suppose λ is a root of M_α(t). Then
M_α(t) = (t − λ)g(t) for some g ∈ F[t]. But deg g < deg M_α. Hence by
minimality of M_α, we must have g(α) ≠ 0. So there is some v ∈ V such
that g(α)(v) ≠ 0. Then

    (α − λι)g(α)(v) = M_α(α)(v) = 0.

So we must have α(g(α)(v)) = λg(α)(v). So g(α)(v) ∈ E_α(λ) \ {0}. So (i)
holds.

Example. What is the minimal polynomial of

    A = [1  0  −2]
        [0  1   1]
        [0  0   2]?

We can compute χ_A(t) = (t − 1)^2 (t − 2). So we know that the minimal polynomial
is one of (t − 1)^2 (t − 2) and (t − 1)(t − 2).

By direct and boring computations, we can find (A − I)(A − 2I) = 0. So we
know that M_A(t) = (t − 1)(t − 2). So A is diagonalizable.