6 Endomorphisms

IB Linear Algebra

6.2 The minimal polynomial

6.2.1 Aside on polynomials

Before we talk about minimal polynomials, we first talk about polynomials in general.

Definition (Polynomial). A polynomial over F is an object of the form

f(t) = a_m t^m + a_{m−1} t^{m−1} + ··· + a_1 t + a_0,

with m ≥ 0, a_0, ··· , a_m ∈ F.

We write F[t] for the set of polynomials over F.

Note that we don’t identify a polynomial f with the corresponding function it represents. For example, if F = Z/pZ, then t^p and t are different polynomials, even though they define the same function (by Fermat’s little theorem/Lagrange’s theorem). Two polynomials are equal if and only if they have the same coefficients.

However, we will later see that if F is R or C, then polynomials are equal if and only if they represent the same function, and this distinction is not as important.

Definition (Degree). Let f ∈ F[t]. Then the degree of f, written deg f, is the largest n such that a_n ≠ 0. In particular, deg 0 = −∞.

Notice that deg(fg) = deg f + deg g and deg(f + g) ≤ max{deg f, deg g}.

Lemma (Polynomial division). If f, g ∈ F[t] (and g ≠ 0), then there exist q, r ∈ F[t] with deg r < deg g such that

f = qg + r.

Proof is omitted.
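Although the proof is omitted, the division itself is easy to experiment with. A small sketch using sympy's polynomial division (the polynomials f and g are made up for illustration):

```python
from sympy import symbols, div, degree

t = symbols('t')
f = t**3 + 2*t + 1   # made-up example dividend
g = t**2 - 1         # made-up example divisor

# q, r with f = q*g + r and deg r < deg g
q, r = div(f, g, t, domain='QQ')

assert (q*g + r).expand() == f.expand()
assert degree(r, t) < degree(g, t)
```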

Lemma. If λ ∈ F is a root of f, i.e. f(λ) = 0, then there is some g such that

f(t) = (t − λ)g(t).

Proof. By polynomial division, let

f(t) = (t − λ)g(t) + r(t)

for some g(t), r(t) ∈ F[t] with deg r < deg(t − λ) = 1. So r has to be constant, i.e. r(t) = a_0 for some a_0 ∈ F. Now evaluate this at λ. So

0 = f(λ) = (λ − λ)g(λ) + r(λ) = a_0.

So a_0 = 0. So r = 0. So done.

Definition (Multiplicity of a root). Let f ∈ F[t] and λ a root of f. We say λ has multiplicity k if (t − λ)^k is a factor of f but (t − λ)^{k+1} is not, i.e.

f(t) = (t − λ)^k g(t)

for some g(t) ∈ F[t] with g(λ) ≠ 0.
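As a quick sanity check of the definition, sympy reports roots together with their multiplicities (the polynomial below is a made-up example):

```python
from sympy import symbols, roots, expand

t = symbols('t')
# (t - 1)^2 divides f but (t - 1)^3 does not, so 1 has multiplicity 2
f = expand((t - 1)**2 * (t + 2))

assert roots(f, t) == {1: 2, -2: 1}
```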

We can use the last lemma and induction to show that any non-zero f ∈ F[t] can be written as

f = g(t) Π_{i=1}^k (t − λ_i)^{a_i},

where λ_1, ··· , λ_k are all distinct, a_i ≥ 1, and g is a polynomial with no roots in F.

Hence we obtain the following:

Lemma. A non-zero polynomial f ∈ F[t] has at most deg f roots, counted with multiplicity.

Corollary. Let f, g ∈ F[t] have degree < n. If there are λ_1, ··· , λ_n distinct such that f(λ_i) = g(λ_i) for all i, then f = g.

Proof. Given the lemma, consider f − g. This has degree less than n, and (f − g)(λ_i) = 0 for i = 1, ··· , n. Since it has at least n > deg(f − g) roots, we must have f − g = 0. So f = g.

Corollary. If F is infinite, then f and g are equal if and only if they agree on all points.
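A minimal numerical sketch of the corollary, using sympy's interpolate: a degree-2 polynomial is pinned down by its values at 3 distinct points, so interpolating f there recovers f exactly (f and the points are made up):

```python
from sympy import symbols, interpolate, expand

t = symbols('t')
f = 2*t**2 - t + 5                       # deg f = 2 < 3 (made-up example)
xs = [0, 1, 4]                           # 3 distinct points
g = interpolate([(x, f.subs(t, x)) for x in xs], t)

# g has degree < 3 and agrees with f at 3 points, so g = f
assert expand(g - f) == 0
```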

More importantly, we have the following:

Theorem (The fundamental theorem of algebra). Every non-constant polynomial over C has a root in C.

We will not prove this.

We say C is an algebraically closed field.

It thus follows that every polynomial over C of degree n > 0 has precisely n roots, counted with multiplicity, since if we write f(t) = g(t) Π (t − λ_i)^{a_i} and g has no roots, then g is constant. So the number of roots is Σ a_i = deg f, counted with multiplicity.

It also follows that every polynomial over R factors into linear polynomials and quadratic polynomials with no real roots (since complex roots of real polynomials come in complex conjugate pairs).
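For instance (an illustrative polynomial, checked with sympy), t^3 − 1 has one real root, and its two non-real roots pair up into the real quadratic t^2 + t + 1:

```python
from sympy import symbols, factor, roots

t = symbols('t')
f = t**3 - 1            # illustrative real polynomial

# Over R: one linear factor and one quadratic with no real roots
assert factor(f) == (t - 1)*(t**2 + t + 1)

# The two non-real roots form a complex conjugate pair
nonreal = [r for r in roots(f, t) if not r.is_real]
assert len(nonreal) == 2
assert nonreal[0].conjugate() == nonreal[1]
```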

6.2.2 Minimal polynomial

Notation. Given f(t) = Σ_{i=0}^m a_i t^i ∈ F[t], A ∈ Mat_n(F) and α ∈ End(V), we can write

f(A) = Σ_{i=0}^m a_i A^i,  f(α) = Σ_{i=0}^m a_i α^i,

where A^0 = I and α^0 = ι.
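A sketch of this notation in code (numpy, with a made-up f and A): evaluating f at a matrix just replaces t^i by A^i, with A^0 = I.

```python
import numpy as np

def poly_at_matrix(coeffs, A):
    """Compute f(A) = sum_i coeffs[i] * A**i, where coeffs[i] = a_i."""
    out = np.zeros_like(A, dtype=float)
    power = np.eye(A.shape[0])   # A^0 = I
    for a in coeffs:
        out += a * power
        power = power @ A
    return out

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
# f(t) = t^2 - 2t + 1 = (t - 1)^2, and (A - I)^2 = 0:
assert np.allclose(poly_at_matrix([1.0, -2.0, 1.0], A), 0)
```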

Theorem (Diagonalizability theorem). Suppose α ∈ End(V). Then α is diagonalizable if and only if there exists non-zero p(t) ∈ F[t] such that p(α) = 0, and p(t) can be factored as a product of distinct linear factors.

Proof. Suppose α is diagonalizable. Let λ_1, ··· , λ_k be the distinct eigenvalues of α. We have

V = ⊕_{i=1}^k E(λ_i).

So each v ∈ V can be written (uniquely) as

v = Σ_{i=1}^k v_i with α(v_i) = λ_i v_i.

Now let

p(t) = Π_{i=1}^k (t − λ_i).

Then for any v, we get

p(α)(v) = Σ_{i=1}^k p(α)(v_i) = Σ_{i=1}^k p(λ_i)v_i = 0.

So p(α) = 0. By construction, p has distinct linear factors.

Conversely, suppose we have our polynomial

p(t) = Π_{i=1}^k (t − λ_i),

with λ_1, ··· , λ_k ∈ F distinct, and p(α) = 0 (we can wlog assume p is monic, i.e. the leading coefficient is 1). We will show that

V = Σ_{i=1}^k E_α(λ_i).

In other words, we want to show that for all v ∈ V, there is some v_i ∈ E_α(λ_i) for i = 1, ··· , k such that v = Σ v_i.

To find these v_i, we let

q_j(t) = Π_{i≠j} (t − λ_i)/(λ_j − λ_i).

This is a polynomial of degree k − 1, and q_j(λ_i) = δ_ij.

Now consider

q(t) = Σ_{i=1}^k q_i(t).

We still have deg q ≤ k − 1, but q(λ_i) = 1 for any i. Since q and 1 agree on k points, we must have q = 1.

Let π_j : V → V be given by π_j = q_j(α). Then the above says that

Σ_{j=1}^k π_j = ι.

Hence given v ∈ V, we know that v = Σ π_j(v).

We now check that π_j(v) ∈ E_α(λ_j). This is true since

(α − λ_j ι)π_j(v) = (1/Π_{i≠j}(λ_j − λ_i)) Π_{i=1}^k (α − λ_i ι)(v) = (1/Π_{i≠j}(λ_j − λ_i)) p(α)(v) = 0.

So α(π_j(v)) = λ_j π_j(v). So done.

In the above proof, if v ∈ E_α(λ_i), then π_j(v) = δ_ij v. So π_i is a projection onto E_α(λ_i).
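The projections π_j can be computed concretely. A sympy sketch on a made-up diagonalizable matrix A with eigenvalues 2 and 3, checking that Σ π_j = ι and that each π_j lands in the corresponding eigenspace (the helper at_matrix is an assumption, not from the text):

```python
from sympy import Matrix, Poly, eye, prod, symbols, zeros

t = symbols('t')
A = Matrix([[2, 1],
            [0, 3]])          # made-up diagonalizable matrix
lams = [2, 3]                 # its distinct eigenvalues

def at_matrix(p, M):
    """Evaluate the polynomial p(t) at the matrix M (Horner's rule)."""
    out = zeros(*M.shape)
    for c in Poly(p, t).all_coeffs():   # highest degree first
        out = out * M + c * eye(M.shape[0])
    return out

# q_j(t) = prod_{i != j} (t - lam_i) / (lam_j - lam_i)
qs = [prod((t - lams[i]) / (lams[j] - lams[i])
           for i in range(len(lams)) if i != j)
      for j in range(len(lams))]
pis = [at_matrix(qj, A) for qj in qs]

assert sum(pis, zeros(2, 2)) == eye(2)        # the pi_j sum to the identity
for lam, pi in zip(lams, pis):
    assert A * pi == lam * pi                  # columns of pi_j lie in E(lam)
```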

Definition (Minimal polynomial). The minimal polynomial of α ∈ End(V) is the non-zero monic polynomial M_α(t) of least degree such that M_α(α) = 0.

The monic requirement is just for things to look nice, since we can always divide by the leading coefficient of a polynomial to get a monic version.

Note that if A represents α, then for all p ∈ F[t], p(A) represents p(α). Thus p(α) is zero iff p(A) = 0. So the minimal polynomial of α is the minimal polynomial of A if we define M_A analogously.

There are two things we want to know: whether the minimal polynomial exists, and whether it is unique.

Existence is always guaranteed in finite-dimensional cases. If dim V = n < ∞, then dim End(V) = n^2. So ι, α, α^2, ··· , α^{n^2} are linearly dependent. So there are some λ_0, ··· , λ_{n^2} ∈ F not all zero such that

Σ_{i=0}^{n^2} λ_i α^i = 0.

So deg M_α ≤ n^2. So we must have a minimal polynomial.
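The existence argument can be run directly: stack the vectorized powers of A as columns until they become linearly dependent; the first dependence, normalized to be monic, gives the minimal polynomial. A sympy sketch (the helper name and example matrix are made up):

```python
from sympy import Matrix, eye

def minimal_poly_coeffs(A):
    """Monic coefficients [a_0, ..., a_{d-1}, 1] of M_A (made-up helper)."""
    n = A.shape[0]
    powers = [eye(n)]
    while True:
        powers.append(powers[-1] * A)
        # Columns are vec(A^0), ..., vec(A^d); a nullspace vector is a
        # linear dependence, i.e. a polynomial killing A.
        S = Matrix.hstack(*[P.reshape(n * n, 1) for P in powers])
        ns = S.nullspace()
        if ns:
            v = ns[0] / ns[0][len(powers) - 1]   # normalize to be monic
            return list(v)

A = Matrix([[1, 1],
            [0, 1]])
# M_A(t) = (t - 1)^2 = t^2 - 2t + 1, i.e. coefficients [1, -2, 1]
assert minimal_poly_coeffs(A) == [1, -2, 1]
```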

To show that the minimal polynomial is unique, we will prove the following stronger characterization of the minimal polynomial:

Lemma. Let α ∈ End(V), and p ∈ F[t]. Then p(α) = 0 if and only if M_α(t) is a factor of p(t). In particular, M_α is unique.

Proof. For all such p, we can write p(t) = q(t)M_α(t) + r(t) for some r of degree less than deg M_α. Then

p(α) = q(α)M_α(α) + r(α) = r(α).

So r(α) = 0 iff p(α) = 0. But deg r < deg M_α. By the minimality of M_α, we must have r(α) = 0 iff r = 0. So p(α) = 0 iff M_α(t) | p(t).

So if M_1 and M_2 are both minimal polynomials for α, then M_1 | M_2 and M_2 | M_1. So M_2 is just a scalar multiple of M_1. But since M_1 and M_2 are monic, they must be equal.

Example. Let V = F^2, and consider the following matrices (written row by row):

A = (1 0; 0 1),  B = (1 1; 0 1).

Consider the polynomial p(t) = (t − 1)^2. We can compute p(A) = p(B) = 0. So M_A(t) and M_B(t) are factors of (t − 1)^2. There aren’t many factors of (t − 1)^2. So the minimal polynomials are either (t − 1) or (t − 1)^2. Since A − I = 0 and B − I ≠ 0, the minimal polynomial of A is t − 1 and the minimal polynomial of B is (t − 1)^2.
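These computations are quick to verify numerically, for instance with numpy:

```python
import numpy as np

I = np.eye(2)
A = np.eye(2)                       # the identity matrix from the example
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])

assert np.allclose(A - I, 0)                  # so M_A(t) = t - 1
assert not np.allclose(B - I, 0)              # t - 1 does not kill B
assert np.allclose((B - I) @ (B - I), 0)      # so M_B(t) = (t - 1)^2
```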

We can now re-state our diagonalizability theorem.

Theorem (Diagonalizability theorem 2.0). Let α ∈ End(V). Then α is diagonalizable if and only if M_α(t) is a product of distinct linear factors.

Proof. (⇐) This follows directly from the previous diagonalizability theorem.

(⇒) Suppose α is diagonalizable. Then there is some p ∈ F[t] non-zero such that p(α) = 0 and p is a product of distinct linear factors. Since M_α divides p, M_α also has distinct linear factors.

Theorem. Let α, β ∈ End(V) be both diagonalizable. Then α and β are simultaneously diagonalizable (i.e. there exists a basis with respect to which both are diagonal) if and only if αβ = βα.

This is important in quantum mechanics. This means that if two operators do not commute, then they do not have a common eigenbasis. Hence we have the uncertainty principle.
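A numeric sketch of the easy direction (the rotation angle and eigenvalues below are made up): two matrices diagonal in the same basis commute, and the shared basis vectors are simultaneous eigenvectors.

```python
import numpy as np

# A shared (made-up) orthonormal eigenbasis: the columns of the rotation P
theta = 0.3
P = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A = P @ np.diag([1.0, 2.0]) @ P.T   # alpha in that basis: diag(1, 2)
B = P @ np.diag([5.0, 7.0]) @ P.T   # beta  in that basis: diag(5, 7)

assert np.allclose(A @ B, B @ A)    # simultaneously diagonalizable => commute

# The columns of P are simultaneous eigenvectors of A and B
for j, (la, lb) in enumerate([(1.0, 5.0), (2.0, 7.0)]):
    v = P[:, j]
    assert np.allclose(A @ v, la * v)
    assert np.allclose(B @ v, lb * v)
```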

Proof. (⇒) If there exists a basis (e_1, ··· , e_n) for V such that α and β are represented by A and B respectively, with both diagonal, then by direct computation, AB = BA. But AB represents αβ and BA represents βα. So αβ = βα.

(⇐) Suppose αβ = βα. The idea is to consider each eigenspace of α individually, and then diagonalize β in each of the eigenspaces. Since α is diagonalizable, we can write

V = ⊕_{i=1}^k E_α(λ_i),

where λ_i are the eigenvalues of α. We write E_i for E_α(λ_i). We want to show that β sends E_i to itself, i.e. β(E_i) ⊆ E_i. Let v ∈ E_i. Then we want β(v) to be in E_i. This is true since

α(β(v)) = β(α(v)) = β(λ_i v) = λ_i β(v).

So β(v) ∈ E_α(λ_i) = E_i.

Now we can view β|_{E_i} ∈ End(E_i). Note that

M_β(β|_{E_i}) = M_β(β)|_{E_i} = 0.

Since M_β(t) is a product of distinct linear factors, it follows that β|_{E_i} is diagonalizable. So we can choose a basis B_i of eigenvectors for β|_{E_i}. We can do this for all i.

Then since V is a direct sum of the E_i’s, we know that B = ∪_{i=1}^k B_i is a basis for V consisting of eigenvectors for both α and β. So done.