Part IB Groups, Rings and Modules
Based on lectures by O. Randal-Williams
Notes taken by Dexter Chua
Lent 2016
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
Groups
Basic concepts of group theory recalled from Part IA Groups. Normal subgroups,
quotient groups and isomorphism theorems. Permutation groups. Groups acting on
sets, permutation representations. Conjugacy classes, centralizers and normalizers.
The centre of a group. Elementary properties of finite $p$-groups. Examples of finite
linear groups and groups arising from geometry. Simplicity of $A_n$.
Sylow subgroups and Sylow theorems. Applications, groups of small order. [8]
Rings
Definition and examples of rings (commutative, with 1). Ideals, homomorphisms,
quotient rings, isomorphism theorems. Prime and maximal ideals. Fields. The
characteristic of a field. Field of fractions of an integral domain.
Factorization in rings; units, primes and irreducibles. Unique factorization in principal
ideal domains, and in polynomial rings. Gauss’ Lemma and Eisenstein’s irreducibility
criterion.
Rings $\mathbb{Z}[\alpha]$ of algebraic integers as subsets of $\mathbb{C}$ and quotients of $\mathbb{Z}[x]$. Examples of
Euclidean domains and uniqueness and non-uniqueness of factorization. Factorization
in the ring of Gaussian integers; representation of integers as sums of two squares.
Ideals in polynomial rings. Hilbert basis theorem. [10]
Modules
Definitions, examples of vector spaces, abelian groups and vector spaces with an
endomorphism. Sub-modules, homomorphisms, quotient modules and direct sums.
Equivalence of matrices, canonical form. Structure of finitely generated modules over
Euclidean domains, applications to abelian groups and Jordan normal form. [6]
Contents
0 Introduction
1 Groups
1.1 Basic concepts
1.2 Normal subgroups, quotients, homomorphisms, isomorphisms
1.3 Actions of permutations
1.4 Conjugacy, centralizers and normalizers
1.5 Finite p-groups
1.6 Finite abelian groups
1.7 Sylow theorems
2 Rings
2.1 Definitions and examples
2.2 Homomorphisms, ideals, quotients and isomorphisms
2.3 Integral domains, field of fractions, maximal and prime ideals
2.4 Factorization in integral domains
2.5 Factorization in polynomial rings
2.6 Gaussian integers
2.7 Algebraic integers
2.8 Noetherian rings
3 Modules
3.1 Definitions and examples
3.2 Direct sums and free modules
3.3 Matrices over Euclidean domains
3.4 Modules over F[X] and normal forms for matrices
3.5 Conjugacy of matrices*
0 Introduction
The course is naturally divided into three sections: Groups, Rings, and Modules.
In IA Groups, we learnt about some basic properties of groups, and studied
several interesting groups in depth. In the first part of this course, we will
further develop some general theory of groups. In particular, we will prove two
more isomorphism theorems of groups. While we will not find these theorems
particularly useful in this course, we will be able to formulate analogous theorems
for other algebraic structures such as rings and modules, as we will later find in
the course.
In the next part of the course, we will study rings. These are things that
behave somewhat like $\mathbb{Z}$, where we can add, subtract, multiply but not (necessarily)
divide. While $\mathbb{Z}$ has many nice properties, these are not necessarily available
in arbitrary rings. Hence we will classify rings into different types, depending on
how many properties of $\mathbb{Z}$ they inherit. We can then try to reconstruct certain IA
Numbers and Sets results in these rings, such as unique factorization of numbers
into primes and Bézout’s theorem.
Finally, we move on to modules. The definition of a module is very similar
to that of a vector space, except that instead of allowing scalar multiplication
by elements of a field, we have scalar multiplication by elements of a ring. It
turns out modules are completely unlike vector spaces, and can have much more
complicated structures. Perhaps because of this richness, many things turn out
to be modules. Using module theory, we will be able to prove certain important
theorems such as the classification of finite abelian groups and the Jordan normal
form theorem.
1 Groups
1.1 Basic concepts
We will begin by quickly recapping some definitions and results from IA Groups.
Definition (Group). A group is a triple $(G, \cdot, e)$, where $G$ is a set, $\cdot : G \times G \to G$
is a function and $e \in G$ is an element such that
(i) For all $a, b, c \in G$, we have $(a \cdot b) \cdot c = a \cdot (b \cdot c)$. (associativity)
(ii) For all $a \in G$, we have $a \cdot e = e \cdot a = a$. (identity)
(iii) For all $a \in G$, there exists $a^{-1} \in G$ such that $a \cdot a^{-1} = a^{-1} \cdot a = e$. (inverse)
Some people add a stupid axiom that says $g \cdot h \in G$ for all $g, h \in G$, but this
is already implied by saying $\cdot$ is a function to $G$. You can write that down as
well, and no one will say you are stupid. But they might secretly think so.
Lemma. The inverse of an element is unique.
Proof. Let $a^{-1}, b$ be inverses of $a$. Then
$b = b \cdot e = b \cdot a \cdot a^{-1} = e \cdot a^{-1} = a^{-1}$.
Definition (Subgroup). If $(G, \cdot, e)$ is a group and $H \subseteq G$ is a subset, it is a
subgroup if
(i) $e \in H$,
(ii) $a, b \in H$ implies $a \cdot b \in H$,
(iii) $\cdot : H \times H \to H$ makes $(H, \cdot, e)$ a group.
We write $H \leq G$ if $H$ is a subgroup of $G$.
Note that the last condition in some sense encompasses the first two, but we
need the first two conditions to hold before the last statement makes sense at all.
Lemma. $H \subseteq G$ is a subgroup if $H$ is non-empty and for any $h_1, h_2 \in H$, we
have $h_1 h_2^{-1} \in H$.
Definition (Abelian group). A group $G$ is abelian if $a \cdot b = b \cdot a$ for all $a, b \in G$.
Example. We have the following familiar examples of groups:
(i) $(\mathbb{Z}, +, 0)$, $(\mathbb{Q}, +, 0)$, $(\mathbb{R}, +, 0)$, $(\mathbb{C}, +, 0)$.
(ii) We also have groups of symmetries:
(a) The symmetric group $S_n$ is the collection of all permutations of $\{1, 2, \cdots, n\}$.
(b) The dihedral group $D_{2n}$ is the symmetries of a regular $n$-gon.
(c) The group $GL_n(\mathbb{R})$ is the group of invertible $n \times n$ real matrices,
which also is the group of invertible $\mathbb{R}$-linear maps from the vector
space $\mathbb{R}^n$ to itself.
(iii) The alternating group $A_n \leq S_n$.
(iv) The cyclic group $C_n \leq D_{2n}$.
(v) The special linear group $SL_n(\mathbb{R}) \leq GL_n(\mathbb{R})$, the subgroup of matrices of
determinant 1.
(vi) The Klein four-group $C_2 \times C_2$.
(vii) The quaternions $Q_8 = \{\pm 1, \pm i, \pm j, \pm k\}$ with $ij = k$, $ji = -k$,
$i^2 = j^2 = k^2 = -1$, $(-1)^2 = 1$.
With groups and subgroups, we can talk about cosets.
Definition (Coset). If $H \leq G$, $g \in G$, the left coset $gH$ is the set
$gH = \{x \in G : x = g \cdot h \text{ for some } h \in H\}$.
For example, since $H$ is a subgroup, we know $e \in H$. So for any $g \in G$, we
must have $g \in gH$.
The collection of $H$-cosets in $G$ forms a partition of $G$, and furthermore,
all $H$-cosets $gH$ are in bijection with $H$ itself, via $h \mapsto gh$. An immediate
consequence is
Theorem (Lagrange’s theorem). Let $G$ be a finite group, and $H \leq G$. Then
$|G| = |H| |G : H|$,
where $|G : H|$ is the number of $H$-cosets in $G$.
We can do exactly the same thing with right cosets and get the same
conclusion.
We have implicitly used the following notation:
Definition (Order of group). The order of a group $G$ is the number of elements
in $G$, written $|G|$.
Instead of order of the group, we can ask what the order of an element is.
Definition (Order of element). The order of an element $g \in G$ is the smallest
positive $n$ such that $g^n = e$. If there is no such $n$, we say $g$ has infinite order.
We write $\operatorname{ord}(g) = n$.
A basic lemma is as follows:
Lemma. If $G$ is a finite group and $g \in G$ has order $n$, then $n \mid |G|$.
Proof. Consider the following subset:
$H = \{e, g, g^2, \cdots, g^{n-1}\}$.
This is a subgroup of $G$, because it is non-empty and $g^r g^{-s} = g^{r-s}$ is on the list
(we might have to add $n$ to the power of $g$ to make it positive, but this is fine
since $g^n = e$). Moreover, there are no repeats in the list: if $g^i = g^j$, with wlog
$i \geq j$, then $g^{i-j} = e$. Since $0 \leq i - j < n$, the minimality of $n$ forces $i - j = 0$,
i.e. $i = j$.
Hence Lagrange’s theorem tells us $n = |H| \mid |G|$.
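As a sanity check of this lemma, here is a minimal sketch that computes the order of every element of $S_3$ and confirms each one divides $|S_3| = 6$. The encoding of permutations as tuples on $\{0, 1, 2\}$ and the helper names `compose` and `order` are choices made for this illustration, not part of the notes.

```python
from itertools import permutations

def compose(f, g):
    """Return f∘g (apply g first, then f); permutations are tuples on 0..n-1."""
    return tuple(f[g[i]] for i in range(len(g)))

def order(g):
    """Smallest positive n with g^n equal to the identity."""
    identity = tuple(range(len(g)))
    power, n = g, 1
    while power != identity:
        power = compose(g, power)
        n += 1
    return n

S3 = list(permutations(range(3)))
orders = sorted(order(g) for g in S3)
divides = all(len(S3) % order(g) == 0 for g in S3)
print(orders, divides)  # [1, 2, 2, 2, 3, 3] True
```

The identity has order 1, the three transpositions have order 2, and the two 3-cycles have order 3, all of which divide 6.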
1.2 Normal subgroups, quotients, homomorphisms, isomorphisms
We all (hopefully) recall what the definition of a normal subgroup is. However,
instead of just stating the definition and proving things about it, we can try to
motivate the definition, and see how one could naturally come up with it.
Let $H \leq G$ be a subgroup. The objective is to try to make the collection of
cosets
$G/H = \{gH : g \in G\}$
into a group.
Before we do that, we quickly come up with a criterion for when two cosets
$gH$ and $g'H$ are equal. Notice that if $gH = g'H$, then $g \in g'H$. So $g = g' \cdot h$ for
some $h \in H$. In other words, $(g')^{-1} \cdot g = h \in H$. So if two elements represent the
same coset, their difference is in $H$. The argument is also reversible. Hence two
elements $g, g'$ represent the same $H$-coset if and only if $(g')^{-1} g \in H$.
Suppose we try to make the set $G/H = \{gH : g \in G\}$ into a group, by the
obvious formula
$(g_1 H) \cdot (g_2 H) = g_1 g_2 H$.
However, this doesn’t necessarily make sense. If we take a different representative
for the same coset, we want to make sure it gives the same answer.
If $g_2 H = g_2' H$, then we know $g_2' = g_2 \cdot h$ for some $h \in H$. So
$(g_1 H) \cdot (g_2' H) = g_1 g_2' H = g_1 g_2 h H = g_1 g_2 H = (g_1 H) \cdot (g_2 H)$.
So all is good.
What if we change $g_1$? If $g_1 H = g_1' H$, then $g_1' = g_1 \cdot h$ for some $h \in H$. So
$(g_1' H) \cdot (g_2 H) = g_1' g_2 H = g_1 h g_2 H$.
Now we are stuck. We would really want the equality
$g_1 h g_2 H = g_1 g_2 H$
to hold. This requires
$(g_1 g_2)^{-1} g_1 h g_2 \in H$.
This is equivalent to
$g_2^{-1} h g_2 \in H$.
So for $G/H$ to actually be a group under this operation, we must have, for any
$h \in H$ and $g \in G$, the property $g^{-1} h g \in H$ to hold.
This is not necessarily true for an arbitrary $H$. Those nice ones that satisfy
this property are known as normal subgroups.
Definition (Normal subgroup). A subgroup $H \leq G$ is normal if for any $h \in H$
and $g \in G$, we have $g^{-1} h g \in H$. We write $H \lhd G$.
This allows us to make the following definition:
Definition (Quotient group). If $H \lhd G$ is a normal subgroup, then the set $G/H$
of left $H$-cosets forms a group with multiplication
$(g_1 H) \cdot (g_2 H) = g_1 g_2 H$,
with identity $eH = H$. This is known as the quotient group.
This is indeed a group. Normality was defined such that this is well-defined.
Multiplication is associative since multiplication in $G$ is associative. The inverse
of $gH$ is $g^{-1} H$, and $eH$ is easily seen to be the identity.
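To see the normality condition $g^{-1} h g \in H$ succeed and fail on a concrete group, the following sketch tests two subgroups of $S_3$. The tuple encoding of permutations and the helper `is_normal` are assumptions of this illustration; the two subgroups chosen are simply convenient examples.

```python
from itertools import permutations

def compose(f, g):
    """Return f∘g (apply g first, then f); permutations are tuples on 0..n-1."""
    return tuple(f[g[i]] for i in range(len(g)))

def inverse(f):
    """Return the inverse permutation of f."""
    inv = [0] * len(f)
    for i, image in enumerate(f):
        inv[image] = i
    return tuple(inv)

def is_normal(H, G):
    """Check that g^{-1} h g lies in H for every h in H and g in G."""
    members = set(H)
    return all(compose(inverse(g), compose(h, g)) in members
               for g in G for h in H)

S3 = list(permutations(range(3)))
e = (0, 1, 2)
H = [e, (1, 0, 2)]              # generated by a single transposition
A3 = [e, (1, 2, 0), (2, 0, 1)]  # identity and the two 3-cycles

print(is_normal(H, S3), is_normal(A3, S3))
```

The subgroup generated by a transposition fails the test, so the coset product on $S_3/H$ is not well-defined, whereas $A_3$ passes and $S_3/A_3$ is a genuine group of order 2.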
So far, we’ve just been looking at groups themselves. We would also like to
know how groups interact with each other. In other words, we want to study
functions between groups. However, we don’t allow arbitrary functions, since
groups have some structure, and we would like the functions to respect the group
structures. These nice functions are known as homomorphisms.
Definition (Homomorphism). If $(G, \cdot, e_G)$ and $(H, *, e_H)$ are groups, a function
$\phi : G \to H$ is a homomorphism if $\phi(e_G) = e_H$, and for $g, g' \in G$, we have
$\phi(g \cdot g') = \phi(g) * \phi(g')$.
If we think carefully, $\phi(e_G) = e_H$ can be derived from the second condition,
but it doesn’t hurt to put it in as well.
Lemma. If $\phi : G \to H$ is a homomorphism, then
$\phi(g^{-1}) = \phi(g)^{-1}$.
Proof. We compute $\phi(g \cdot g^{-1})$ in two ways. On the one hand, we have
$\phi(g \cdot g^{-1}) = \phi(e) = e$.
On the other hand, we have
$\phi(g \cdot g^{-1}) = \phi(g) * \phi(g^{-1})$.
By the uniqueness of inverse, we must have
$\phi(g^{-1}) = \phi(g)^{-1}$.
Given any homomorphism, we can build two groups out of it:
Definition (Kernel). The kernel of a homomorphism $\phi : G \to H$ is
$\ker(\phi) = \{g \in G : \phi(g) = e\}$.
Definition (Image). The image of a homomorphism $\phi : G \to H$ is
$\operatorname{im}(\phi) = \{h \in H : h = \phi(g) \text{ for some } g \in G\}$.
Lemma. For a homomorphism $\phi : G \to H$, the kernel $\ker(\phi)$ is a normal
subgroup, and the image $\operatorname{im}(\phi)$ is a subgroup of $H$.
Proof. There is only one possible way we can prove this.
To see $\ker(\phi)$ is a subgroup, let $g, h \in \ker \phi$. Then
$\phi(g \cdot h^{-1}) = \phi(g) * \phi(h)^{-1} = e * e^{-1} = e$.
So $g h^{-1} \in \ker \phi$. Also, $\phi(e) = e$. So $\ker(\phi)$ is non-empty. So it is a subgroup.
To show it is normal, let $g \in \ker(\phi)$. Let $x \in G$. We want to show
$x^{-1} g x \in \ker(\phi)$. We have
$\phi(x^{-1} g x) = \phi(x^{-1}) * \phi(g) * \phi(x) = \phi(x^{-1}) * \phi(x) = \phi(x^{-1} x) = \phi(e) = e$.
So $x^{-1} g x \in \ker(\phi)$. So $\ker(\phi)$ is normal.
Also, if $\phi(g), \phi(h) \in \operatorname{im}(\phi)$, then
$\phi(g) * \phi(h)^{-1} = \phi(g h^{-1}) \in \operatorname{im}(\phi)$.
Also, $e \in \operatorname{im}(\phi)$. So $\operatorname{im}(\phi)$ is non-empty. So $\operatorname{im}(\phi)$ is a subgroup.
Definition (Isomorphism). An isomorphism is a homomorphism that is also a
bijection.
Definition (Isomorphic group). Two groups $G$ and $H$ are isomorphic if there
is an isomorphism between them. We write $G \cong H$.
Usually, we identify two isomorphic groups as being “the same”, and do not
distinguish isomorphic groups.
It is an exercise to show the following:
Lemma. If $\phi$ is an isomorphism, then the inverse $\phi^{-1}$ is also an isomorphism.
When studying groups, it is often helpful to break the group apart into smaller
groups, which are hopefully easier to study. We will have three isomorphism
theorems to do so. These isomorphism theorems tell us what happens when we
take quotients of different things. Then if a miracle happens, we can patch what
we know about the quotients together to get information about the big group.
Even if miracles do not happen, these are useful tools to have.
The first isomorphism theorem relates the kernel to the image.
Theorem (First isomorphism theorem). Let $\phi : G \to H$ be a homomorphism.
Then $\ker(\phi) \lhd G$ and
$G / \ker(\phi) \cong \operatorname{im}(\phi)$.
Proof. We have already proved that $\ker(\phi)$ is a normal subgroup. We now
have to construct a homomorphism $f : G/\ker(\phi) \to \operatorname{im}(\phi)$, and prove it is an
isomorphism.
Define our function as follows:
$f : G/\ker(\phi) \to \operatorname{im}(\phi)$
$g \ker(\phi) \mapsto \phi(g)$.
We first tackle the obvious problem that this might not be well-defined, since we
are picking a representative for the coset. If $g \ker(\phi) = g' \ker(\phi)$, then we know
$g^{-1} \cdot g' \in \ker(\phi)$. So $\phi(g^{-1} \cdot g') = e$. So we know
$e = \phi(g^{-1} \cdot g') = \phi(g)^{-1} * \phi(g')$.
Multiplying the whole thing by $\phi(g)$ gives $\phi(g) = \phi(g')$. Hence this function is
well-defined.
Next we show it is a homomorphism. To see $f$ is a homomorphism, we have
$f(g \ker(\phi) \cdot g' \ker(\phi)) = f(g g' \ker(\phi))$
$= \phi(g g')$
$= \phi(g) * \phi(g')$
$= f(g \ker(\phi)) * f(g' \ker(\phi))$.
So $f$ is a homomorphism. Finally, we show it is a bijection.
To show it is surjective, let $h \in \operatorname{im}(\phi)$. Then $h = \phi(g)$ for some $g$. So
$h = f(g \ker(\phi))$ is in the image of $f$.
To show injectivity, suppose $f(g \ker(\phi)) = f(g' \ker(\phi))$. So $\phi(g) = \phi(g')$. So
$\phi(g^{-1} \cdot g') = e$. Hence $g^{-1} \cdot g' \in \ker(\phi)$, and hence $g \ker(\phi) = g' \ker(\phi)$. So
done.
Before we move on to further isomorphism theorems, we see how we can use
these to identify two groups which are not obviously the same.
Example. Consider the function $\phi : \mathbb{C} \to \mathbb{C} \setminus \{0\}$ given by $z \mapsto e^z$. We
know that
$e^{z+w} = e^z e^w$.
This means $\phi$ is a homomorphism if we think of it as $\phi : (\mathbb{C}, +) \to (\mathbb{C} \setminus \{0\}, \times)$.
What is the image of this homomorphism? The existence of $\log$ shows that
$\phi$ is surjective. So $\operatorname{im} \phi = \mathbb{C} \setminus \{0\}$. What about the kernel? It is given by
$\ker(\phi) = \{z \in \mathbb{C} : e^z = 1\} = 2\pi i \mathbb{Z}$,
i.e. the set of all integer multiples of $2\pi i$. The conclusion is that
$(\mathbb{C}/(2\pi i \mathbb{Z}), +) \cong (\mathbb{C} \setminus \{0\}, \times)$.
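The same bookkeeping can be carried out by machine on a finite group. The sketch below uses a hypothetical example not taken from the notes, namely $\phi : \mathbb{Z}/12 \to \mathbb{Z}/12$, $x \mapsto 4x$, and checks that the map $g\ker(\phi) \mapsto \phi(g)$ from the first isomorphism theorem is well-defined and a bijection onto the image.

```python
n = 12
G = range(n)                      # the additive group Z/12

def phi(x):
    """The homomorphism x -> 4x on Z/12 (a hypothetical example)."""
    return (4 * x) % n

kernel = {x for x in G if phi(x) == 0}
image = {phi(x) for x in G}

# Cosets g + ker(phi), stored as frozensets so they can live in a set.
cosets = {frozenset((g + k) % n for k in kernel) for g in G}

# f(coset) = phi(representative) is well-defined iff phi is constant on cosets.
well_defined = all(len({phi(x) for x in c}) == 1 for c in cosets)
f = {c: phi(min(c)) for c in cosets}
bijective = set(f.values()) == image and len(f) == len(image)

print(sorted(kernel), sorted(image), well_defined, bijective)
# [0, 3, 6, 9] [0, 4, 8] True True
```

Here the kernel has 4 elements, so there are $12 / 4 = 3$ cosets, matching the 3 elements of the image.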
The second isomorphism theorem is a slightly more complicated theorem.
Theorem (Second isomorphism theorem). Let $H \leq G$ and $K \lhd G$. Then
$HK = \{h \cdot k : h \in H, k \in K\}$ is a subgroup of $G$, and $H \cap K \lhd H$. Moreover,
$HK / K \cong H / (H \cap K)$.
Proof. Let $hk, h'k' \in HK$. Then
$h'k'(hk)^{-1} = h'k'k^{-1}h^{-1} = (h'h^{-1})(hk'k^{-1}h^{-1})$.
The first term is in $H$, while the second term is $k'k^{-1} \in K$ conjugated by
$h$, which also has to be in $K$ by normality. So this is something in $H$ times
something in $K$, and hence in $HK$. $HK$ also contains $e$, and is hence a group.
To show $H \cap K \lhd H$, consider $x \in H \cap K$ and $h \in H$. Consider $h^{-1}xh$. Since
$x \in K$, the normality of $K$ implies $h^{-1}xh \in K$. Also, since $x, h \in H$, closure
implies $h^{-1}xh \in H$. So $h^{-1}xh \in H \cap K$. So $H \cap K \lhd H$.
Now we can start to prove the second isomorphism theorem. To do so, we
apply the first isomorphism theorem to it. Define
$\phi : H \to G/K$
$h \mapsto hK$.
This is easily seen to be a homomorphism. We apply the first isomorphism
theorem to this homomorphism. The image is all $K$-cosets represented by
something in $H$, i.e.
$\operatorname{im}(\phi) = HK/K$.
Then the kernel of $\phi$ is
$\ker(\phi) = \{h \in H : hK = eK\} = \{h \in H : h \in K\} = H \cap K$.
So the first isomorphism theorem says
$H / (H \cap K) \cong HK / K$.
Notice we did more work than we really had to. We could have started by
writing down $\phi$ and checked it is a homomorphism. Then since $H \cap K$ is its
kernel, it has to be a normal subgroup.
Before we move on to the third isomorphism theorem, we notice that if
$K \lhd G$, then there is a bijection between subgroups of $G/K$ and subgroups of $G$
containing $K$, given by
$\{\text{subgroups of } G/K\} \longleftrightarrow \{\text{subgroups of } G \text{ which contain } K\}$
$X \leq G/K \longmapsto \{g \in G : gK \in X\}$
$L/K \leq G/K \longleftarrow K \lhd L \leq G$.
This specializes to the bijection of normal subgroups:
$\{\text{normal subgroups of } G/K\} \longleftrightarrow \{\text{normal subgroups of } G \text{ which contain } K\}$
using the same bijection.
It is an elementary exercise to show that these are inverses of each other.
This correspondence will be useful later.
Theorem (Third isomorphism theorem). Let $K \leq L \leq G$ be normal subgroups
of $G$. Then
$(G/K) / (L/K) \cong G/L$.
Proof. Define the homomorphism
$\phi : G/K \to G/L$
$gK \mapsto gL$.
As always, we have to check this is well-defined. If $gK = g'K$, then
$g^{-1}g' \in K \subseteq L$. So $gL = g'L$. This is also a homomorphism since
$\phi(gK \cdot g'K) = \phi(gg'K) = gg'L = (gL) \cdot (g'L) = \phi(gK) \cdot \phi(g'K)$.
This clearly is surjective, since any coset $gL$ is the image $\phi(gK)$. So the image
is $G/L$. The kernel is then
$\ker(\phi) = \{gK : gL = L\} = \{gK : g \in L\} = L/K$.
So the conclusion follows by the first isomorphism theorem.
The general idea of these theorems is to take a group, find a normal subgroup,
and then quotient it out. Then hopefully the normal subgroup and the quotient
group will be simpler. However, this doesn’t always work.
Definition (Simple group). A (non-trivial) group $G$ is simple if it has no normal
subgroups except $\{e\}$ and $G$.
In general, simple groups are complicated. However, if we only look at abelian
groups, then life is simpler. Note that by commutativity, the normality condition
is always trivially satisfied. So any subgroup is normal. Hence an abelian group
can be simple only if it has no non-trivial subgroups at all.
Lemma. An abelian group is simple if and only if it is isomorphic to the cyclic
group $C_p$ for some prime number $p$.
Proof. By Lagrange’s theorem, any subgroup of $C_p$ has order dividing $|C_p| = p$.
Hence if $p$ is prime, then it has no such divisors, and any subgroup must have
order 1 or $p$, i.e. it is either $\{e\}$ or $C_p$ itself. Hence in particular any normal
subgroup must be $\{e\}$ or $C_p$. So it is simple.
Now suppose $G$ is abelian and simple. Let $e \neq g \in G$ be a non-trivial element,
and consider $H = \{\cdots, g^{-2}, g^{-1}, e, g, g^2, \cdots\}$. Since $G$ is abelian, conjugation
does nothing, and every subgroup is normal. So $H$ is a normal subgroup. As $G$
is simple, $H = \{e\}$ or $H = G$. Since it contains $g \neq e$, it is non-trivial. So we
must have $H = G$. So $G$ is cyclic.
If $G$ is infinite cyclic, then it is isomorphic to $\mathbb{Z}$. But $\mathbb{Z}$ is not simple, since
$2\mathbb{Z} \lhd \mathbb{Z}$. So $G$ is a finite cyclic group, i.e. $G \cong C_m$ for some finite $m$.
If $n \mid m$, then $g^{m/n}$ generates a subgroup of $G$ of order $n$. So this is a normal
subgroup. Therefore $n$ must be $m$ or 1. Hence $G$ cannot be simple unless $m$ has
no divisors except 1 and $m$, i.e. $m$ is a prime.
One reason why simple groups are important is the following:
Theorem. Let $G$ be any finite group. Then there are subgroups
$G = H_1 \rhd H_2 \rhd H_3 \rhd H_4 \rhd \cdots \rhd H_n = \{e\}$
such that $H_i / H_{i+1}$ is simple.
Note that here we only claim that $H_{i+1}$ is normal in $H_i$. This does not say
that, say, $H_3$ is a normal subgroup of $H_1$.
Proof. If $G$ is simple, let $H_2 = \{e\}$. Then we are done.
If $G$ is not simple, let $H_2$ be a maximal proper normal subgroup of $G$. We
now claim that $G/H_2$ is simple.
If $G/H_2$ is not simple, it contains a proper non-trivial normal subgroup
$L \lhd G/H_2$ such that $L \neq \{e\}, G/H_2$. However, there is a correspondence
between normal subgroups of $G/H_2$ and normal subgroups of $G$ containing $H_2$.
So $L$ must be $K/H_2$ for some $K \lhd G$ such that $K \geq H_2$. Moreover, since $L$ is
non-trivial and not $G/H_2$, we know $K$ is not $G$ or $H_2$. So $K$ is a larger proper normal
subgroup. Contradiction.
So we have found an $H_2 \lhd G$ such that $G/H_2$ is simple. Iterating this process
on $H_2$ gives the desired result. Note that this process eventually stops, as
$H_{i+1} < H_i$, and hence $|H_{i+1}| < |H_i|$, and all these numbers are finite.
1.3 Actions of permutations
When we first motivated groups, we wanted to use them to represent some
collection of “symmetries”. Roughly, a symmetry of a set $X$ is a permutation
of $X$, i.e. a bijection $X \to X$ that leaves some nice properties unchanged. For
example, a symmetry of a square is a permutation of the vertices that leaves the
overall shape of the square unchanged.
Instead of just picking some nice permutations, we can consider the group of
all permutations. We call this the symmetric group.
Definition (Symmetric group). The symmetric group $S_n$ is the group of all
permutations of $\{1, \cdots, n\}$, i.e. the set of all bijections of this set with itself.
A convenient way of writing permutations is to use the disjoint cycle notation,
such as writing (1 2 3)(4 5)(6) for the permutation that maps
$1 \mapsto 2$, $4 \mapsto 5$,
$2 \mapsto 3$, $5 \mapsto 4$,
$3 \mapsto 1$, $6 \mapsto 6$.
Unfortunately, the convention for writing permutations is weird. Since permutations
are bijections, and hence functions, they are multiplied the wrong way, i.e.
$f \circ g$ means first apply $g$, then apply $f$. In particular, (1 2 3)(3 4) requires first
applying the second permutation, then the first, and is in fact (1 2 3 4).
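The right-to-left convention is easy to get wrong, so here is a small sketch that multiplies (1 2 3)(3 4) both ways. Permutations are stored as dictionaries on $\{1, \ldots, 4\}$; the helpers `from_cycles` and `compose` are names invented for this illustration.

```python
def from_cycles(cycles, n):
    """Build the permutation of {1..n} given by disjoint cycles, as a dict."""
    perm = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        # Each entry of the cycle is sent to the next one, wrapping around.
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            perm[a] = b
    return perm

def compose(f, g):
    """Return f∘g: first apply g, then apply f."""
    return {x: f[g[x]] for x in g}

left = from_cycles([(1, 2, 3)], 4)
right = from_cycles([(3, 4)], 4)

product = compose(left, right)        # (1 2 3)(3 4): apply (3 4) first
other_order = compose(right, left)    # (3 4)(1 2 3): apply (1 2 3) first

print(product == from_cycles([(1, 2, 3, 4)], 4))      # True
print(other_order == from_cycles([(1, 2, 4, 3)], 4))  # True
```

Applying the factors in the other order gives the different 4-cycle (1 2 4 3), which is why the convention matters.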
We know that any permutation is a product of transpositions. Hence we
make the following definition.
Definition (Even and odd permutation). A permutation $\sigma \in S_n$ is even if it
can be written as a product of evenly many transpositions; odd otherwise.
In IA Groups, we spent a lot of time proving this is well-defined, and we are
not doing that again. (Note that this definition by itself is well-defined: if a
permutation can be written both as an even number of transpositions and as an odd
number of transpositions, the definition says it is even. However, this is not what
we really want, since we cannot immediately conclude that, say, (1 2) is odd.)
This allows us to define the homomorphism:
$\operatorname{sgn} : S_n \to (\{\pm 1\}, \times)$
$\sigma \mapsto \begin{cases} +1 & \sigma \text{ is even} \\ -1 & \sigma \text{ is odd} \end{cases}$
Definition (Alternating group). The alternating group $A_n \leq S_n$ is the subgroup
of even permutations, i.e. $A_n$ is the kernel of $\operatorname{sgn}$.
This immediately tells us $A_n \lhd S_n$, and we can immediately work out its
index, since
$S_n / A_n \cong \operatorname{im}(\operatorname{sgn}) = \{\pm 1\}$,
unless $n = 1$. So $A_n$ has index 2.
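The following sketch checks these claims for $n = 4$. It computes the sign through the parity of the number of inversions, which is a standard equivalent of the transposition-count definition but is an assumption of this illustration rather than the definition used in the notes.

```python
from itertools import permutations

def compose(f, g):
    """Return f∘g (apply g first, then f); permutations are tuples on 0..n-1."""
    return tuple(f[g[i]] for i in range(len(g)))

def sgn(p):
    """Sign of p, computed as the parity of its number of inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

S4 = list(permutations(range(4)))

# sgn is multiplicative, i.e. a homomorphism to ({+1, -1}, x).
is_homomorphism = all(sgn(compose(f, g)) == sgn(f) * sgn(g)
                      for f in S4 for g in S4)

A4 = [p for p in S4 if sgn(p) == 1]   # the kernel of sgn
index = len(S4) // len(A4)

print(is_homomorphism, len(A4), index)  # True 12 2
```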
More generally, for a set X, we can define its symmetric group as follows:
Definition (Symmetric group of $X$). Let $X$ be a set. We write $\operatorname{Sym}(X)$ for the
group of all permutations of $X$.
However, we don’t always want the whole symmetric group. Sometimes, we
just want some subgroups of symmetric groups, as in our initial motivation. So
we make the following definition.
Definition (Permutation group). A group $G$ is called a permutation group if it
is a subgroup of $\operatorname{Sym}(X)$ for some $X$, i.e. it is given by some, but not necessarily
all, permutations of some set.
We say $G$ is a permutation group of order $n$ if in addition $|X| = n$.
This is not really too interesting a definition, since, as we will soon see, every
group is (isomorphic to) a permutation group. However, in some cases, thinking
of a group as a permutation group of some object gives us better intuition on
what the group is about.
Example. $S_n$ and $A_n$ are obviously permutation groups. Also, the dihedral
group $D_{2n}$ is a permutation group of order $n$, viewing it as a permutation of the
vertices of a regular $n$-gon.
We would next want to recover the idea of a group being a “permutation”.
If $G \leq \operatorname{Sym}(X)$, then each $g \in G$ should be able to give us a permutation of $X$,
in a way that is consistent with the group structure. We say the group $G$ acts
on $X$. In general, we make the following definition:
Definition (Group action). An action of a group $(G, \cdot)$ on a set $X$ is a function
$* : G \times X \to X$
such that
(i) $g_1 * (g_2 * x) = (g_1 \cdot g_2) * x$ for all $g_1, g_2 \in G$ and $x \in X$.
(ii) $e * x = x$ for all $x \in X$.
There is another way of defining group actions, which is arguably a better
way of thinking about group actions.
Lemma. An action of $G$ on $X$ is equivalent to a homomorphism $\phi : G \to \operatorname{Sym}(X)$.
Note that the statement by itself is useless, since it doesn’t tell us how to
translate between the homomorphism and a group action. The important part
is the proof.
Proof. Let $* : G \times X \to X$ be an action. Define $\phi : G \to \operatorname{Sym}(X)$ by sending
$g$ to the function $\phi(g) = (g * \,\cdot\, : X \to X)$. This is indeed a permutation:
$g^{-1} * \,\cdot\,$ is an inverse since
$\phi(g^{-1})(\phi(g)(x)) = g^{-1} * (g * x) = (g^{-1} \cdot g) * x = e * x = x$,
and a similar argument shows $\phi(g) \circ \phi(g^{-1}) = \operatorname{id}_X$. So $\phi$ is at least a well-defined
function.
To show it is a homomorphism, just note that
$\phi(g_1)(\phi(g_2)(x)) = g_1 * (g_2 * x) = (g_1 \cdot g_2) * x = \phi(g_1 \cdot g_2)(x)$.
Since this is true for all $x \in X$, we know $\phi(g_1) \circ \phi(g_2) = \phi(g_1 \cdot g_2)$. Also,
$\phi(e)(x) = e * x = x$. So $\phi(e)$ is indeed the identity. Hence $\phi$ is a homomorphism.
We now do the same thing backwards. Given a homomorphism $\phi : G \to \operatorname{Sym}(X)$,
define a function by $g * x = \phi(g)(x)$. We now check it is indeed a group
action. Using the definition of a homomorphism, we know
(i) $g_1 * (g_2 * x) = \phi(g_1)(\phi(g_2)(x)) = (\phi(g_1) \circ \phi(g_2))(x) = \phi(g_1 \cdot g_2)(x) = (g_1 \cdot g_2) * x$.
(ii) $e * x = \phi(e)(x) = \operatorname{id}_X(x) = x$.
So this homomorphism gives a group action. These two operations are clearly
inverses to each other. So group actions of $G$ on $X$ are the same as homomorphisms
$G \to \operatorname{Sym}(X)$.
Definition (Permutation representation). A permutation representation of a
group $G$ is a homomorphism $G \to \operatorname{Sym}(X)$.
We have thus shown that a permutation representation is the same as a
group action.
The good thing about thinking of group actions as homomorphisms is that
we can use all we know about homomorphisms on them.
Notation. For an action of $G$ on $X$ given by $\phi : G \to \operatorname{Sym}(X)$, we write
$G^X = \operatorname{im}(\phi)$ and $G_X = \ker(\phi)$.
The first isomorphism theorem immediately gives
Proposition. $G_X \lhd G$ and $G/G_X \cong G^X$.
In particular, if $G_X = \{e\}$ is trivial, then $G \cong G^X \leq \operatorname{Sym}(X)$.
Example. Let $G$ be the group of symmetries of a cube. Let $X$ be the set of
diagonals of the cube.
Then $G$ acts on $X$, and so we get $\phi : G \to \operatorname{Sym}(X)$. What is its kernel? To
preserve the diagonals, it either does nothing to the diagonal, or flips the two vertices.
So $G_X = \ker(\phi) = \{\operatorname{id}, \text{symmetry that sends each vertex to its opposite}\} \cong C_2$.
How about the image? We have $G^X = \operatorname{im}(\phi) \leq \operatorname{Sym}(X) \cong S_4$. It is an
exercise to show that $\operatorname{im}(\phi) = \operatorname{Sym}(X)$, i.e. that $\phi$ is surjective. We are not
proving this because this is an exercise in geometry, not group theory. Then the
first isomorphism theorem tells us
$G^X \cong G/G_X$.
So
$|G| = |G^X| |G_X| = 4! \cdot 2 = 48$.
This is an example of how we can use group actions to count elements in a
group.
Example (Cayley’s theorem). For any group $G$, we have an action of $G$ on $G$
itself via
$g * g_1 = g g_1$.
It is trivial to check this is indeed an action. This gives a group homomorphism
$\phi : G \to \operatorname{Sym}(G)$. What is its kernel? If $g \in \ker(\phi)$, then it acts trivially on
every element. In particular, it acts trivially on the identity. So $g * e = e$, which
means $g = e$. So $\ker(\phi) = \{e\}$. By the first isomorphism theorem, we get
$G \cong G/\{e\} \cong \operatorname{im} \phi \leq \operatorname{Sym}(G)$.
So we know every group is (isomorphic to) a subgroup of a symmetric group.
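As an illustration of Cayley’s theorem, the sketch below builds the left-multiplication action for the Klein four-group $C_2 \times C_2$ and checks that $g \mapsto \phi(g)$ is an injective homomorphism into the permutations of its four elements. The choice of group, the pair encoding of its elements, and the helper names are assumptions made for this example.

```python
from itertools import product

# Klein four-group C2 x C2, written additively as pairs of bits.
elements = list(product(range(2), repeat=2))
index = {x: i for i, x in enumerate(elements)}

def mul(a, b):
    """Group operation: componentwise addition mod 2."""
    return ((a[0] + b[0]) % 2, (a[1] + b[1]) % 2)

def phi(g):
    """Permutation of positions 0..3 induced by left multiplication by g."""
    return tuple(index[mul(g, x)] for x in elements)

def compose(f, g):
    """Return f∘g (apply g first, then f)."""
    return tuple(f[g[i]] for i in range(len(g)))

images = [phi(g) for g in elements]
injective = len(set(images)) == len(elements)
homomorphism = all(phi(mul(g, h)) == compose(phi(g), phi(h))
                   for g in elements for h in elements)

print(injective, homomorphism)  # True True
```

Each `phi(g)` is a tuple encoding an element of $\operatorname{Sym}(G) \cong S_4$, so the image is a copy of the Klein four-group inside $S_4$.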
Example. Let $H$ be a subgroup of $G$, and $X = G/H$ be the set of left cosets of
$H$. We let $G$ act on $X$ via
$g * g_1 H = g g_1 H$.
It is easy to check this is well-defined and is indeed a group action. So we get
$\phi : G \to \operatorname{Sym}(X)$.
Now consider $G_X = \ker(\phi)$. If $g \in G_X$, then for every $g_1 \in G$, we have
$g * g_1 H = g_1 H$. This means $g_1^{-1} g g_1 \in H$. In other words, we have
$g \in g_1 H g_1^{-1}$.
This has to happen for all $g_1 \in G$. So
$G_X \subseteq \bigcap_{g_1 \in G} g_1 H g_1^{-1}$.
This argument is completely reversible: if $g \in \bigcap_{g_1 \in G} g_1 H g_1^{-1}$, then for each
$g_1 \in G$, we know
$g_1^{-1} g g_1 \in H$,
and hence
$g g_1 H = g_1 H$.
So
$g * g_1 H = g_1 H$.
So $g \in G_X$. Hence we indeed have equality:
$\ker(\phi) = G_X = \bigcap_{g_1 \in G} g_1 H g_1^{-1}$.
Since this is a kernel, this is a normal subgroup of $G$, and is contained in $H$.
Starting with an arbitrary subgroup $H$, this allows us to generate a normal
subgroup, and this is indeed the biggest normal subgroup of $G$ that is contained
in $H$, if we stare at it long enough.
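To make the intersection of conjugates concrete, here is a sketch that computes it for a dihedral subgroup of order 8 inside $S_4$. The specific generators, the closure routine `generate`, and the name `core` for the intersection are choices of this illustration and do not appear in the notes.

```python
from itertools import permutations

def compose(f, g):
    """Return f∘g (apply g first, then f); permutations are tuples on 0..n-1."""
    return tuple(f[g[i]] for i in range(len(g)))

def inverse(f):
    """Return the inverse permutation of f."""
    inv = [0] * len(f)
    for i, image in enumerate(f):
        inv[image] = i
    return tuple(inv)

def generate(gens):
    """Subgroup generated by gens, by closing up under left multiplication."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(g, x)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return group

def core(H, G):
    """Intersection of all conjugates g H g^{-1} for g in G."""
    result = set(H)
    for g in G:
        result &= {compose(g, compose(h, inverse(g))) for h in H}
    return result

S4 = list(permutations(range(4)))
# Symmetries of a square on vertices 0..3: a rotation and a reflection.
D8 = generate([(1, 2, 3, 0), (0, 3, 2, 1)])
K = core(D8, S4)

print(len(D8), len(K))  # 8 4
```

Here $H$ has index $24 / 8 = 3$ and the intersection $K$ has order 4, so $|G/K| = 6$, consistent with $G/K$ embedding in $S_3$.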
We can use this to prove the following theorem.
Theorem. Let $G$ be a finite group, and $H \leq G$ a subgroup of index $n$. Then
there is a normal subgroup $K \lhd G$ with $K \leq H$ such that $G/K$ is isomorphic to
a subgroup of $S_n$. Hence $|G/K| \mid n!$ and $|G/K| \geq n$.
Proof. We apply the previous example, giving $\phi : G \to \operatorname{Sym}(G/H)$, and let $K$
be the kernel of this homomorphism. We have already shown that $K \leq H$. Then
the first isomorphism theorem gives
$G/K \cong \operatorname{im} \phi \leq \operatorname{Sym}(G/H) \cong S_n$.
Then by Lagrange’s theorem, we know $|G/K| \mid |S_n| = n!$, and we also have
$|G/K| \geq |G/H| = n$.
Corollary. Let $G$ be a non-abelian simple group. Let $H \leq G$ be a proper
subgroup of index $n$. Then $G$ is isomorphic to a subgroup of $A_n$. Moreover, we
must have $n \geq 5$, i.e. $G$ cannot have a subgroup of index less than 5.
Proof. The action of $G$ on $X = G/H$ gives a homomorphism $\phi : G \to \operatorname{Sym}(X)$.
Then $\ker(\phi) \lhd G$. Since $G$ is simple, $\ker(\phi)$ is either $G$ or $\{e\}$. We first show
that it cannot be $G$. If $\ker(\phi) = G$, then every element of $G$ acts trivially on
$X = G/H$. But if $g \in G \setminus H$, which exists since the index of $H$ is not 1, then
$g * H = gH \neq H$. So $g$ does not act trivially. So the kernel cannot be the whole
of $G$. Hence $\ker(\phi) = \{e\}$.
Thus by the first isomorphism theorem, we get
$G \cong \operatorname{im}(\phi) \leq \operatorname{Sym}(X) \cong S_n$.
We now need to show that $G$ is in fact a subgroup of $A_n$.
We know $A_n \lhd S_n$. So $\operatorname{im}(\phi) \cap A_n \lhd \operatorname{im}(\phi) \cong G$. As $G$ is simple,
$\operatorname{im}(\phi) \cap A_n$ is either $\{e\}$ or $G \cong \operatorname{im}(\phi)$. We want to show that the second thing happens,
i.e. the intersection is not the trivial group. We use the second isomorphism
theorem. If $\operatorname{im}(\phi) \cap A_n = \{e\}$, then
$\operatorname{im}(\phi) \cong \operatorname{im}(\phi) / (\operatorname{im}(\phi) \cap A_n) \cong \operatorname{im}(\phi) A_n / A_n \leq S_n / A_n \cong C_2$.
So $G \cong \operatorname{im}(\phi)$ is a subgroup of $C_2$, i.e. either $\{e\}$ or $C_2$ itself. Neither of these is
non-abelian. So this cannot be the case. So we must have $\operatorname{im}(\phi) \cap A_n = \operatorname{im}(\phi)$,
i.e. $\operatorname{im}(\phi) \leq A_n$.
The last part follows from the fact that $S_1, S_2, S_3, S_4$ have no non-abelian
simple subgroups, which you can check by going to a quiet room and listing out
all their subgroups.
Let’s recall some old definitions from IA Groups.
Definition (Orbit). If $G$ acts on a set $X$, the orbit of $x \in X$ is
$G \cdot x = \{g * x \in X : g \in G\}$.
Definition (Stabilizer). If $G$ acts on a set $X$, the stabilizer of $x \in X$ is
$G_x = \{g \in G : g * x = x\}$.
The main theorem about these concepts is the orbit-stabilizer theorem.
Theorem (Orbit-stabilizer theorem). Let $G$ act on $X$. Then for any $x \in X$,
there is a bijection between $G \cdot x$ and $G/G_x$, given by $g * x \leftrightarrow g \cdot G_x$.
In particular, if $G$ is finite, it follows that
$|G| = |G_x| |G \cdot x|$.
It takes some work to show this is well-defined and a bijection, but you’ve
done it in IA Groups. In IA Groups, you probably learnt the second statement
instead, but this result is more generally true for infinite groups.
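A quick numerical check of the counting form of the theorem, using a hypothetical action chosen for this sketch: $S_4$ acting on the 2-element subsets of $\{0, 1, 2, 3\}$ by applying the permutation to each element.

```python
from itertools import permutations

G = list(permutations(range(4)))   # S_4, permutations as tuples on 0..3

def act(g, subset):
    """Apply the permutation g to every element of a subset."""
    return frozenset(g[i] for i in subset)

x = frozenset({0, 1})
orbit = {act(g, x) for g in G}
stabilizer = [g for g in G if act(g, x) == x]

print(len(G), len(orbit), len(stabilizer))  # 24 6 4
```

The orbit consists of all six 2-element subsets, the stabilizer consists of the four permutations that preserve $\{0, 1\}$ as a set, and $6 \cdot 4 = 24 = |S_4|$.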
1.4 Conjugacy, centralizers and normalizers
We have seen that every group acts on itself by multiplying on the left. A group
$G$ can also act on itself in a different way, by conjugation:
$g * g_1 = g g_1 g^{-1}$.
Let $\phi : G \to \operatorname{Sym}(G)$ be the associated permutation representation. We know,
by definition, that $\phi(g)$ is a bijection from $G$ to $G$ as sets. However, here $G$ is
not an arbitrary set, but is a group. A natural question to ask is whether $\phi(g)$
is a homomorphism or not. Indeed, we have
$\phi(g)(g_1 \cdot g_2) = g g_1 g_2 g^{-1} = (g g_1 g^{-1})(g g_2 g^{-1}) = \phi(g)(g_1) \phi(g)(g_2)$.
So $\phi(g)$ is a homomorphism from $G$ to $G$. Since $\phi(g)$ is bijective (as in any
group action), it is in fact an isomorphism.
Thus, for any group $G$, there are many isomorphisms from $G$ to itself, one
for every $g \in G$, and can be obtained from a group action of $G$ on itself.
We can, of course, take the collection of all isomorphisms of $G$, and form a
new group out of it.
Definition (Automorphism group). The automorphism group of $G$ is
$\operatorname{Aut}(G) = \{f : G \to G : f \text{ is a group isomorphism}\}$.
This is a group under composition, with the identity map as the identity.
This is a subgroup of
Sym
(
G
), and the homomorphism
φ
:
G Sym
(
G
) by
conjugation lands in Aut(G).
This is pretty fun we can use this to cook up some more groups, by taking
a group and looking at its automorphism group.
We can also take a group, take its automorphism group, and then take its
automorphism group again, and do it again, and see if this process stabilizes, or
becomes periodic, or something. This is left as an exercise for the reader.
Definition (Conjugacy class). The conjugacy class of $g \in G$ is
$\operatorname{ccl}_G(g) = \{hgh^{-1} : h \in G\}$,
i.e. the orbit of $g \in G$ under the conjugation action.
Definition (Centralizer). The centralizer of $g \in G$ is
$C_G(g) = \{h \in G : hgh^{-1} = g\}$,
i.e. the stabilizer of g under the conjugation action. This is alternatively the set
of all $h \in G$ that commute with g.
Definition (Center). The center of a group G is
$Z(G) = \{h \in G : hgh^{-1} = g \text{ for all } g \in G\} = \bigcap_{g \in G} C_G(g) = \ker(\varphi)$.
These are the elements of the group that commute with everything else.
By the orbit-stabilizer theorem, for each $x \in G$, we obtain a bijection
$\operatorname{ccl}(x) \leftrightarrow G/C_G(x)$.
Proposition. Let G be a finite group. Then
$|\operatorname{ccl}(x)| = |G : C_G(x)| = |G|/|C_G(x)|$.
In particular, the size of each conjugacy class divides the order of the group.
Another useful notion is the normalizer.
Definition (Normalizer). Let $H \leq G$. The normalizer of H in G is
$N_G(H) = \{g \in G : g^{-1}Hg = H\}$.
Note that we certainly have $H \leq N_G(H)$. Even better, $H \lhd N_G(H)$, essentially
by definition. This is in fact the biggest subgroup of G in which H is normal.
We are now going to look at conjugacy classes of $S_n$. We recall from IA
Groups that permutations in $S_n$ are conjugate if and only if they have the same
cycle type when written as a product of disjoint cycles. We can think of the
cycle types as partitions of n. For example, the partition 2, 2, 1 of 5 corresponds
to the conjugacy class of (1 2)(3 4)(5). So the conjugacy classes of $S_n$ correspond
exactly to the partitions of n.
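To see this correspondence in a small case, the following Python sketch computes the conjugacy classes of $S_4$ by brute force and compares them with cycle types. The representation (tuples with `p[i]` the image of `i`) and the function names are assumptions made for illustration.

```python
from itertools import permutations

def cycle_type(p):
    """Sorted cycle lengths of a permutation p (p[i] is the image of i)."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = p[i]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

def conjugacy_classes(n):
    """Partition S_n into conjugacy classes {h g h^-1 : h in S_n}."""
    G = list(permutations(range(n)))

    def compose(p, q):  # (p o q)(i) = p[q[i]]
        return tuple(p[q[i]] for i in range(n))

    def inverse(p):
        inv = [0] * n
        for i, j in enumerate(p):
            inv[j] = i
        return tuple(inv)

    classes, remaining = [], set(G)
    while remaining:
        g = next(iter(remaining))
        cls = {compose(compose(h, g), inverse(h)) for h in G}
        classes.append(cls)
        remaining -= cls
    return classes

classes = conjugacy_classes(4)
# The set of cycle types occurring in each class (should be a single type).
types = [{cycle_type(p) for p in cls} for cls in classes]
```

For n = 4 this gives five classes, one for each of the partitions 4, 3+1, 2+2, 2+1+1 and 1+1+1+1, and every class size divides 24.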
We will use this fact in the proof of the following theorem:
Theorem. The alternating groups $A_n$ are simple for $n \geq 5$ (also for n = 1, 2, 3).
The cases in brackets follow from a direct check since $A_1 \cong A_2 \cong \{e\}$
and $A_3 \cong C_3$, all of which are simple. We can also check manually that $A_4$ has
a non-trivial proper normal subgroup, and hence is not simple.
Recall we also proved that $A_5$ is simple in IA Groups by brute force: we
listed all its conjugacy classes, and saw that they cannot be put together to make a
normal subgroup. This obviously cannot be easily generalized to higher values
of n. Hence we need to prove this with a different approach.
Proof. We start with the following claim:
Claim. $A_n$ is generated by 3-cycles.
As any element of $A_n$ is a product of evenly-many transpositions, it suffices
to show that every product of two transpositions is also a product of 3-cycles.
There are three possible cases: let a, b, c, d be distinct. Then
(i) (a b)(a b) = e.
(ii) (a b)(b c) = (a b c).
(iii) (a b)(c d) = (a c b)(a c d).
So we have shown that every possible product of two transpositions is a product
of three-cycles.
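The three identities can be checked mechanically. Below is a small Python sketch, assuming permutations are stored as dictionaries on {1, ..., 6} and that products are read right to left (the rightmost factor is applied first), which is the convention that makes the identities above hold. The helpers `cycle` and `mul` are illustrative names.

```python
from itertools import permutations

def cycle(*pts, n=6):
    """The cycle (pts[0] pts[1] ...) on {1, ..., n}, as a dict i -> image of i."""
    p = {i: i for i in range(1, n + 1)}
    for a, b in zip(pts, pts[1:] + pts[:1]):
        p[a] = b
    return p

def mul(*perms):
    """Product of permutations, applying the rightmost factor first."""
    result = {i: i for i in perms[0]}
    for p in reversed(perms):
        result = {i: p[result[i]] for i in result}
    return result

identity = cycle()

# Check the three cases for every choice of distinct a, b, c, d in {1, ..., 5}.
for a, b, c, d in permutations(range(1, 6), 4):
    assert mul(cycle(a, b), cycle(a, b)) == identity
    assert mul(cycle(a, b), cycle(b, c)) == cycle(a, b, c)
    assert mul(cycle(a, b), cycle(c, d)) == mul(cycle(a, c, b), cycle(a, c, d))
```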
Claim. Let $H \lhd A_n$. If H contains a 3-cycle, then $H = A_n$.
We show that if H contains a 3-cycle, then every 3-cycle is in H. Then we
are done since $A_n$ is generated by 3-cycles. For concreteness, suppose we know
$(a\ b\ c) \in H$, and we want to show $(1\ 2\ 3) \in H$.
Since they have the same cycle type, there is some $\sigma \in S_n$ such that
$(a\ b\ c) = \sigma (1\ 2\ 3) \sigma^{-1}$. If $\sigma$ is even, i.e. $\sigma \in A_n$, then we have that
$(1\ 2\ 3) \in \sigma^{-1} H \sigma = H$, by the normality of H, and we are trivially done.
If $\sigma$ is odd, replace it by $\bar\sigma = \sigma \cdot (4\ 5)$. Here is where we use the fact that
$n \geq 5$ (we will use it again later). Then we have
$\bar\sigma (1\ 2\ 3) \bar\sigma^{-1} = \sigma (4\ 5)(1\ 2\ 3)(4\ 5) \sigma^{-1} = \sigma (1\ 2\ 3) \sigma^{-1} = (a\ b\ c)$,
using the fact that (1 2 3) and (4 5) commute. Now $\bar\sigma$ is even. So $(1\ 2\ 3) \in H$ as
above.
What we’ve got so far is that if $H \lhd A_n$ contains any 3-cycle, then it is $A_n$.
Finally, we have to show that every normal subgroup must contain at least one
3-cycle.
Claim. Let $H \lhd A_n$ be non-trivial. Then H contains a 3-cycle.
We separate this into many cases.
(i) Suppose H contains an element which can be written in disjoint cycle
notation as
$\sigma = (1\ 2\ 3\ \cdots\ r)\tau$,
for $r \geq 4$. We now let $\delta = (1\ 2\ 3) \in A_n$. Then by normality of H, we know
$\delta^{-1}\sigma\delta \in H$. Then $\sigma^{-1}\delta^{-1}\sigma\delta \in H$. Also, we notice that $\tau$ does not contain
1, 2, 3. So it commutes with $\delta$, and also trivially with $(1\ 2\ 3\ \cdots\ r)$. We
can expand this mess to obtain
$\sigma^{-1}\delta^{-1}\sigma\delta = (r\ \cdots\ 2\ 1)(1\ 3\ 2)(1\ 2\ 3\ \cdots\ r)(1\ 2\ 3) = (2\ 3\ r)$,
which is a 3-cycle. So done.
The same argument goes through if $\sigma = (a_1\ a_2\ \cdots\ a_r)\tau$ for any $a_1, \cdots, a_r$.
(ii) Suppose H contains an element consisting of at least two 3-cycles in disjoint
cycle notation, say
$\sigma = (1\ 2\ 3)(4\ 5\ 6)\tau$.
We now let $\delta = (1\ 2\ 4)$, and again calculate
$\sigma^{-1}\delta^{-1}\sigma\delta = (1\ 3\ 2)(4\ 6\ 5)(1\ 4\ 2)(1\ 2\ 3)(4\ 5\ 6)(1\ 2\ 4) = (1\ 2\ 4\ 3\ 6)$.
This is a 5-cycle, which is necessarily in H. By the previous case, we get a
3-cycle in H too, and hence $H = A_n$.
(iii) Suppose H contains $\sigma = (1\ 2\ 3)\tau$, with $\tau$ a product of disjoint 2-cycles (if $\tau$
contains anything longer, then it would fit in one of the previous two cases). Then
$\tau^2 = e$, so $\sigma^2 = (1\ 2\ 3)^2 = (1\ 3\ 2)$ is a 3-cycle in H.
(iv) Suppose H contains $\sigma = (1\ 2)(3\ 4)\tau$, where $\tau$ is a product of 2-cycles. We
first let $\delta = (1\ 2\ 3)$ and calculate
$u = \sigma^{-1}\delta^{-1}\sigma\delta = (1\ 2)(3\ 4)(1\ 3\ 2)(1\ 2)(3\ 4)(1\ 2\ 3) = (1\ 4)(2\ 3)$,
which is again in H. We landed in the same case, but instead of two
transpositions times a mess, we just have two transpositions, which is nicer.
Now let
$v = (1\ 5\ 2)u(1\ 2\ 5) = (1\ 3)(4\ 5) \in H$.
Note that we used $n \geq 5$ again. We have yet again landed in the same case.
Notice however, that these are not the same transpositions. We multiply
$uv = (1\ 4)(2\ 3)(1\ 3)(4\ 5) = (1\ 2\ 3\ 4\ 5) \in H$.
This is then covered by the first case, and we are done.
So done. Phew.
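The explicit products in cases (i), (ii) and (iv) are easy to get wrong by hand, so here is a hedged Python sketch that re-computes them. It assumes products are read right to left, takes $\tau$ to be trivial, and uses r = 6 in case (i); the helpers `cycle`, `mul` and `inv` are illustrative names, not anything from the course.

```python
def cycle(*pts, n=6):
    """The cycle (pts[0] pts[1] ...) on {1, ..., n}, as a dict i -> image of i."""
    p = {i: i for i in range(1, n + 1)}
    for a, b in zip(pts, pts[1:] + pts[:1]):
        p[a] = b
    return p

def mul(*perms):
    """Product of permutations, applying the rightmost factor first."""
    result = {i: i for i in perms[0]}
    for p in reversed(perms):
        result = {i: p[result[i]] for i in result}
    return result

def inv(p):
    """Inverse permutation."""
    return {image: i for i, image in p.items()}

def commutator(sigma, delta):
    """sigma^-1 delta^-1 sigma delta."""
    return mul(inv(sigma), inv(delta), sigma, delta)

# Case (i) with r = 6: the result should be the 3-cycle (2 3 r).
case_i = commutator(cycle(1, 2, 3, 4, 5, 6), cycle(1, 2, 3))

# Case (ii): the result should be the 5-cycle (1 2 4 3 6).
case_ii = commutator(mul(cycle(1, 2, 3), cycle(4, 5, 6)), cycle(1, 2, 4))

# Case (iv): u, v and their product uv.
u = commutator(mul(cycle(1, 2), cycle(3, 4)), cycle(1, 2, 3))
v = mul(cycle(1, 5, 2), u, cycle(1, 2, 5))
uv = mul(u, v)
```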
1.5 Finite p-groups
Note that when studying the orders of groups and subgroups, we always talk
about divisibility, since that is what Lagrange’s theorem tells us about. We
never talk about things like the sum of the orders of two subgroups. When it
comes to divisibility, the simplest case would be when the order is a prime, and
we have done that already. The next best thing we can hope for is that the order
is a power of a prime.
Definition (p-group). A finite group G is a p-group if $|G| = p^n$ for some prime
number p and $n \geq 1$.
Theorem. If G is a finite p-group, then $Z(G) = \{x \in G : xg = gx \text{ for all } g \in G\}$
is non-trivial.
This immediately tells us that for $n \geq 2$, a p-group is never simple.
Proof. Let G act on itself by conjugation. The orbits of this action (i.e. the
conjugacy classes) have size dividing $|G| = p^n$. So each orbit is either a singleton, or
its size is divisible by p.
Since the conjugacy classes partition G, we know the total size of the conjugacy
classes is |G|. In particular,
$|G| = (\text{number of conjugacy classes of size } 1) + \sum (\text{sizes of all other conjugacy classes})$.
We know the second term is divisible by p. Also $|G| = p^n$ is divisible by p. Hence
the number of conjugacy classes of size 1 is divisible by p. We know {e} is a
conjugacy class of size 1. So there must be at least p conjugacy classes of size 1.
Since the smallest prime number is 2, there is a conjugacy class $\{x\} \neq \{e\}$.
But if {x} is a conjugacy class on its own, then by definition $g^{-1}xg = x$ for
all $g \in G$, i.e. $xg = gx$ for all $g \in G$. So $x \in Z(G)$. So Z(G) is non-trivial.
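The counting argument can be illustrated on a concrete 2-group. The sketch below is an assumption-laden example rather than part of the proof: it realizes the dihedral group of order 8 as permutations of the vertices 0, 1, 2, 3 of a square, and the names `generate` and `conj_class` are hypothetical helpers.

```python
def compose(p, q):
    """(p o q)(i) = p[q[i]] for permutations stored as tuples."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def generate(gens):
    """All products of the generators (enough to get the group, as it is finite)."""
    group = set(gens)
    frontier = list(gens)
    while frontier:
        g = frontier.pop()
        for h in gens:
            new = compose(g, h)
            if new not in group:
                group.add(new)
                frontier.append(new)
    return group

r = (1, 2, 3, 0)   # rotation of the square
s = (0, 3, 2, 1)   # reflection fixing vertices 0 and 2
G = generate([r, s])

def conj_class(x):
    """The conjugacy class {g x g^-1 : g in G}."""
    return frozenset(compose(compose(g, x), inverse(g)) for g in G)

classes = {conj_class(x) for x in G}
sizes = sorted(len(c) for c in classes)
center = [x for x in G if len(conj_class(x)) == 1]
```

Here the class equation reads 8 = 1 + 1 + 2 + 2 + 2: the number of singleton classes is 2, which is divisible by p = 2, and these two elements form the center.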
The theorem allows us to prove interesting things about p-groups by induction:
we can quotient G by Z(G), and get a smaller p-group. One way to do this is
via the lemma below.
Lemma. For any group G, if G/Z(G) is cyclic, then G is abelian.
In other words, if G/Z(G) is cyclic, then it is in fact trivial, since the center
of an abelian group is the abelian group itself.
Proof. Let $gZ(G)$ be a generator of the cyclic group G/Z(G). Hence every
coset of Z(G) is of the form $g^r Z(G)$. So every element $x \in G$ must be of the
form $g^r z$ for $z \in Z(G)$ and $r \in \mathbb{Z}$. To show G is abelian, let $\bar x = g^{\bar r} \bar z$ be another
element, with $\bar z \in Z(G)$, $\bar r \in \mathbb{Z}$. Note that $z$ and $\bar z$ are in the center, and hence
commute with every element. So we have
$x \bar x = g^r z g^{\bar r} \bar z = g^r g^{\bar r} z \bar z = g^{\bar r} g^r \bar z z = g^{\bar r} \bar z g^r z = \bar x x$.
So they commute. So G is abelian.
This is a general lemma for groups, but is particularly useful when applied
to p-groups.
Corollary. If p is prime and $|G| = p^2$, then G is abelian.
Proof. Since $Z(G) \leq G$, its order must be 1, p or $p^2$. Since it is not trivial, it
can only be p or $p^2$. If it has order $p^2$, then it is the whole group and the group
is abelian. Otherwise, G/Z(G) has order $p^2/p = p$. But then it must be cyclic,
and thus G must be abelian by the lemma. Then Z(G) = G has order $p^2$, contradicting
the assumption that Z(G) has order p. So this case cannot occur, and G is abelian.
Theorem. Let G be a group of order $p^a$, where p is a prime number. Then it
has a subgroup of order $p^b$ for any $0 \leq b \leq a$.
This means there is a subgroup of every conceivable order. This is not true
for general groups. For example, $A_5$ has no subgroup of order 30, or else that
would be a normal subgroup (since it would have index 2).
Proof. We induct on a. If a = 1, then {e}, G give subgroups of order $p^0$ and $p^1$.
So done.
Now suppose a > 1, and we want to construct a subgroup of order $p^b$. If
b = 0, then this is trivial, namely $\{e\} \leq G$ has order 1.
Otherwise, we know Z(G) is non-trivial. So let $x \in Z(G)$ with $x \neq e$. Since
$\operatorname{ord}(x) \mid |G|$, its order is a power of p. If it in fact has order $p^c$, then $x^{p^{c-1}}$ has
order p. So we can suppose, by renaming, that x has order p. We have thus
generated a subgroup $\langle x \rangle$ of order exactly p. Moreover, since x is in the center,
$\langle x \rangle$ commutes with everything in G. So $\langle x \rangle$ is in fact a normal subgroup of G.
This is the point of choosing it in the center. Therefore $G/\langle x \rangle$ has order $p^{a-1}$.
Since this is a strictly smaller group, we can by induction suppose $G/\langle x \rangle$ has
a subgroup of any order. In particular, it has a subgroup L of order $p^{b-1}$. By
the subgroup correspondence, there is some $K \leq G$ such that $L = K/\langle x \rangle$ and
$\langle x \rangle \lhd K$. But then K has order $|L| \cdot |\langle x \rangle| = p^{b-1} \cdot p = p^b$. So done.
1.6 Finite abelian groups
We now move on to a small section, which is small because we will come back to
it later, and actually prove what we claim.
It turns out finite abelian groups are very easy to classify. We can just write
down a list of all finite abelian groups. We write down the classification theorem,
and then prove it in the last part of the course, where we hit this with a huge
sledgehammer.
Theorem (Classification of finite abelian groups). Let G be a finite abelian
group. Then there exist some $d_1, \cdots, d_r$ such that
$G \cong C_{d_1} \times C_{d_2} \times \cdots \times C_{d_r}$.
Moreover, we can pick the $d_i$ such that $d_{i+1} \mid d_i$ for each i, and this expression is
unique.
It turns out the best way to prove this is not to think of it as a group, but
as a Z-module, which is something we will come to later.
Example. The abelian groups of order 8 are $C_8$, $C_4 \times C_2$, $C_2 \times C_2 \times C_2$.
Sometimes this is not the most useful form of decomposition. To get a nicer
decomposition, we use the following lemma:
Lemma. If n and m are coprime, then $C_{mn} \cong C_m \times C_n$.
This is a grown-up version of the Chinese remainder theorem. This is what
the Chinese remainder theorem really says.
Proof. It suffices to find an element of order nm in $C_m \times C_n$. Then since
$C_m \times C_n$ has order nm, it must be cyclic, and hence isomorphic to $C_{nm}$.
Let $g \in C_m$ have order m and $h \in C_n$ have order n, and consider $(g, h) \in C_m \times C_n$.
Suppose the order of (g, h) is k. Then $(g, h)^k = (e, e)$. Hence $(g^k, h^k) = (e, e)$.
So the orders of g and h divide k, i.e. $m \mid k$ and $n \mid k$. As m and n are coprime,
this means that $mn \mid k$.
As $k = \operatorname{ord}((g, h))$ and (g, h) is an element of $C_m \times C_n$, a group of order mn, we must
have $k \mid nm$. So k = nm.
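The key step, that (g, h) has order mn exactly when m and n are coprime, can be checked numerically. This is a minimal sketch assuming we write $C_m \times C_n$ additively as $\mathbb{Z}/m \times \mathbb{Z}/n$ and take g = h = 1; the function name is illustrative.

```python
from math import gcd

def order_of_one_one(m, n):
    """Order of the element (1, 1) in Z/m x Z/n, written additively."""
    k, x = 1, (1 % m, 1 % n)
    while x != (0, 0):
        x = ((x[0] + 1) % m, (x[1] + 1) % n)
        k += 1
    return k

for m in range(1, 13):
    for n in range(1, 13):
        generates = order_of_one_one(m, n) == m * n
        # (1, 1) generates the whole product exactly when m and n are coprime.
        assert generates == (gcd(m, n) == 1)
```

Note that the code only inspects the single element (1, 1); the lemma itself only needs the coprime direction.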
Corollary. For any finite abelian group G, we have
$G \cong C_{d_1} \times C_{d_2} \times \cdots \times C_{d_r}$,
where each $d_i$ is some prime power.
Proof.
From the classification theorem, iteratively apply the previous lemma to
break each component up into products of prime powers.
As promised, this is short.
1.7 Sylow theorems
We finally get to the big theorem of this part of the course.
Theorem (Sylow theorems). Let G be a finite group of order $p^a \cdot m$, with p a
prime and $p \nmid m$. Then
(i) The set of Sylow p-subgroups of G, given by
$\operatorname{Syl}_p(G) = \{P \leq G : |P| = p^a\}$,
is non-empty. In other words, G has a subgroup of order $p^a$.
(ii) All elements of $\operatorname{Syl}_p(G)$ are conjugate in G.
(iii) The number of Sylow p-subgroups $n_p = |\operatorname{Syl}_p(G)|$ satisfies $n_p \equiv 1 \pmod p$
and $n_p \mid |G|$ (in fact $n_p \mid m$, since p is not a factor of $n_p$).
These are sometimes known as Sylow’s first/second/third theorem respectively.
We will not prove this just yet. We first look at how we can apply this
theorem. We can use it without knowing how to prove it.
Lemma. If $n_p = 1$, then the Sylow p-subgroup is normal in G.
Proof. Let P be the unique Sylow p-subgroup, and let $g \in G$, and consider
$g^{-1}Pg$. Since this is isomorphic to P, we must have $|g^{-1}Pg| = p^a$, i.e. it is also
a Sylow p-subgroup. Since there is only one, we must have $P = g^{-1}Pg$. So P is
normal.
Corollary. Let G be a non-abelian simple group. Then $|G| \mid \frac{n_p!}{2}$ for every prime
p such that $p \mid |G|$.
Proof. The group G acts on $\operatorname{Syl}_p(G)$ by conjugation. So it gives a permutation
representation $\varphi : G \to \operatorname{Sym}(\operatorname{Syl}_p(G)) \cong S_{n_p}$. We know $\ker \varphi \lhd G$. But G is
simple. So $\ker(\varphi) = \{e\}$ or G. We want to show it is not the whole of G.
If we had $G = \ker(\varphi)$, then $g^{-1}Pg = P$ for all $g \in G$. Hence P is a normal
subgroup. As G is simple, either $P = \{e\}$, or $P = G$. We know P cannot be
trivial since $p \mid |G|$. But if G = P, then G is a p-group, has a non-trivial center,
and hence G is not non-abelian simple. So we must have $\ker(\varphi) = \{e\}$.
Then by the first isomorphism theorem, we know $G \cong \operatorname{im} \varphi \leq S_{n_p}$. We have
proved the theorem without the divide-by-two part. To prove the whole result,
we need to show that in fact $\operatorname{im}(\varphi) \leq A_{n_p}$. Consider the following composition
of homomorphisms:
$G \xrightarrow{\ \varphi\ } S_{n_p} \xrightarrow{\ \operatorname{sgn}\ } \{\pm 1\}$.
If this is surjective, then $\ker(\operatorname{sgn} \circ \varphi) \lhd G$ has index 2 (since the index is the size
of the image), and is not the whole of G. This means G is not simple (the case
where $G \cong C_2$ is ruled out since it is abelian).
So the kernel must be the whole of G, and $\operatorname{sgn} \circ \varphi$ is the trivial map. In other
words, $\operatorname{sgn}(\varphi(g)) = +1$. So $\varphi(g) \in A_{n_p}$. So in fact we have
$G \cong \operatorname{im}(\varphi) \leq A_{n_p}$.
So we get $|G| \mid \frac{n_p!}{2}$.
Example. Suppose |G| = 1000. Then G is not simple. To show this, we need
to factorize 1000. We have $|G| = 2^3 \cdot 5^3$. We pick our favorite prime to be p = 5.
We know $n_5 \equiv 1 \pmod 5$, and $n_5 \mid 2^3 = 8$. The only number that satisfies this
is $n_5 = 1$. So the Sylow 5-subgroup is normal, and hence G is not simple.
Example. Let $|G| = 132 = 2^2 \cdot 3 \cdot 11$. We want to show this is not simple. So
for a contradiction suppose it is.
We start by looking at p = 11. We know $n_{11} \equiv 1 \pmod{11}$. Also $n_{11} \mid 12$.
As G is simple, we must have $n_{11} = 12$.
Now look at p = 3. We have $n_3 \equiv 1 \pmod 3$ and $n_3 \mid 44$. As G is simple,
$n_3 \neq 1$, so the possible values of $n_3$ are 4 and 22.
If $n_3 = 4$, then the corollary says $|G| \mid \frac{4!}{2} = 12$, which is of course nonsense.
So $n_3 = 22$.
At this point, we count how many elements of each order there are. This is
particularly useful if $p \mid |G|$ but $p^2 \nmid |G|$, i.e. the Sylow p-subgroups have order
p and hence are cyclic.
As all Sylow 11-subgroups are disjoint, apart from {e}, we know there are
$12 \cdot (11 - 1) = 120$ elements of order 11. We do the same thing with the Sylow
3-subgroups. We need $22 \cdot (3 - 1) = 44$ elements of order 3. But 120 + 44 = 164 is more
elements than the group has. This can’t happen. So G cannot be simple.
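The numerical constraints used in these two examples are easy to enumerate. The following Python sketch lists the values of $n_p$ permitted by Sylow's third theorem; the function name `sylow_candidates` is an illustrative choice, and the code only encodes the two conditions $n_p \equiv 1 \pmod p$ and $n_p \mid m$, nothing more.

```python
def sylow_candidates(order, p):
    """Values of n_p allowed by n_p = 1 (mod p) and n_p | m, where order = p^a * m."""
    m = order
    while m % p == 0:
        m //= p  # strip off p^a, leaving m with p not dividing m
    return [d for d in range(1, m + 1) if m % d == 0 and d % p == 1]

# |G| = 1000: only n_5 = 1 is possible, so the Sylow 5-subgroup is normal.
n5_for_1000 = sylow_candidates(1000, 5)

# |G| = 132: the candidates before using simplicity.
n11_for_132 = sylow_candidates(132, 11)
n3_for_132 = sylow_candidates(132, 3)

# Element count if n_11 = 12 and n_3 = 22 (Sylow subgroups of prime order
# intersect trivially), compared against |G| = 132.
elements_needed = 12 * (11 - 1) + 22 * (3 - 1)
```

Simplicity then rules out the value 1 in each list, and the corollary rules out $n_3 = 4$, leaving exactly the case analysed above.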
We now get to prove our big theorem. This involves some non-trivial amount
of trickery.
Proof of Sylow’s theorem. Let G be a finite group with $|G| = p^a m$, and $p \nmid m$.
(i) We need to show that $\operatorname{Syl}_p(G) \neq \emptyset$, i.e. we need to find some subgroup of
order $p^a$. As always, we find something clever for G to act on. We let
$\Omega = \{X \text{ subset of } G : |X| = p^a\}$.
We let G act on $\Omega$ by
$g * \{g_1, g_2, \cdots, g_{p^a}\} = \{gg_1, gg_2, \cdots, gg_{p^a}\}$.
Let $\Sigma \subseteq \Omega$ be an orbit.
We first note that if $\{g_1, \cdots, g_{p^a}\} \in \Sigma$, then by the definition of an orbit,
for every $g \in G$,
$gg_1^{-1} * \{g_1, \cdots, g_{p^a}\} = \{g, gg_1^{-1}g_2, \cdots, gg_1^{-1}g_{p^a}\} \in \Sigma$.
The important thing is that this set contains g. So for each g, $\Sigma$ contains
a set X which contains g. Since each set X has size $p^a$, we must have
$|\Sigma| \geq \frac{|G|}{p^a} = m$.
Suppose $|\Sigma| = m$. Then the orbit-stabilizer theorem says the stabilizer H of
any $\{g_1, \cdots, g_{p^a}\} \in \Sigma$ has index m, hence $|H| = p^a$, and thus $H \in \operatorname{Syl}_p(G)$.
So we need to show that not every orbit $\Sigma$ can have size > m. Again, by
the orbit-stabilizer theorem, the size of any orbit divides the order of the group,
$|G| = p^a m$. So if $|\Sigma| > m$, then $p \mid |\Sigma|$. Suppose we can show that $p \nmid |\Omega|$.
Then not every orbit $\Sigma$ can have size > m, since $\Omega$ is the disjoint union of
all the orbits, and thus we are done.
So we have to show $p \nmid |\Omega|$. This is just some basic counting. We have
$|\Omega| = \binom{|G|}{p^a} = \binom{p^a m}{p^a} = \prod_{j=0}^{p^a - 1} \frac{p^a m - j}{p^a - j}$.
Now note that for $0 < j < p^a$, the largest power of p dividing $p^a m - j$ is the largest
power of p dividing j. Similarly, the largest power of p dividing $p^a - j$ is also the
largest power of p dividing j. So we have the same power of p on top and
bottom for each such item in the product, and they cancel. The remaining j = 0
term is $p^a m / p^a = m$, which is not divisible by p. So the result is not
divisible by p.
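The divisibility claim can be spot-checked numerically. This Python sketch uses `math.comb` from the standard library (available from Python 3.8) to compute $|\Omega|$ for a range of small parameters; the function name `omega_size` is illustrative.

```python
from math import comb

def omega_size(p, a, m):
    """|Omega|: the number of subsets of size p^a in a set of size p^a * m."""
    return comb(p**a * m, p**a)

for p in (2, 3, 5, 7):
    for a in (1, 2, 3):
        for m in range(1, 20):
            if m % p != 0:
                # p does not divide |Omega| whenever p does not divide m
                assert omega_size(p, a, m) % p != 0
```

For instance, with p = 2, a = 2, m = 3 we get $\binom{12}{4} = 495$, which is odd.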
This proof is not straightforward. We first needed the clever idea of letting
G act on $\Omega$. But then if we are given this set, the obvious thing to do
would be to find something in $\Omega$ that is also a group. This is not what we
do. Instead, we find an orbit whose stabilizer is a Sylow p-subgroup.
(ii) We instead prove something stronger: if $Q \leq G$ is a p-subgroup (i.e.
$|Q| = p^b$, for b not necessarily a), and $P \leq G$ is a Sylow p-subgroup, then
there is a $g \in G$ such that $g^{-1}Qg \leq P$. Applying this to the case where
Q is another Sylow p-subgroup says there is a g such that $g^{-1}Qg \leq P$, but
since $g^{-1}Qg$ has the same size as P, they must be equal.
We let Q act on the set of cosets G/P via
$q * gP = qgP$.
We know the orbits of this action have size dividing |Q|, so each has size 1 or
size divisible by p. But they can’t all be divisible by p, since |G/P| = m is coprime
to p. So at least one of them has size 1, say {gP}. In other words, for
every $q \in Q$, we have $qgP = gP$. This means $g^{-1}qg \in P$. This holds for
every element $q \in Q$. So we have found a g such that $g^{-1}Qg \leq P$.
(iii) Finally, we need to show that $n_p \equiv 1 \pmod p$ and $n_p \mid |G|$, where
$n_p = |\operatorname{Syl}_p(G)|$.
The second part is easier: by Sylow’s second theorem, the action of G on
$\operatorname{Syl}_p(G)$ by conjugation has one orbit. By the orbit-stabilizer theorem,
the size of the orbit, which is $|\operatorname{Syl}_p(G)| = n_p$, divides |G|. This proves the
second part.
For the first part, let $P \in \operatorname{Syl}_p(G)$. Consider the action by conjugation
of P on $\operatorname{Syl}_p(G)$. Again by the orbit-stabilizer theorem, the orbits each
have size 1 or size divisible by p. But we know there is one orbit of size 1,
namely {P} itself. To show $n_p = |\operatorname{Syl}_p(G)| \equiv 1 \pmod p$, it is enough to
show there are no other orbits of size 1.
Suppose {Q} is an orbit of size 1. This means for every $p \in P$, we get
$p^{-1}Qp = Q$.
In other words, $P \leq N_G(Q)$. Now $N_G(Q)$ is itself a group, and we
can look at its Sylow p-subgroups. We know $Q \leq N_G(Q) \leq G$. So
$p^a \mid |N_G(Q)| \mid p^a m$. So $p^a$ is the biggest power of p that divides $|N_G(Q)|$.
So Q is a Sylow p-subgroup of $N_G(Q)$.
Now we know $P \leq N_G(Q)$ is also a Sylow p-subgroup of $N_G(Q)$. By Sylow’s
second theorem, they must be conjugate in $N_G(Q)$. But conjugating
Q by something in $N_G(Q)$ gives back Q, by definition of
$N_G(Q)$. So we must have P = Q. So the only orbit of size 1 is {P} itself.
So done.
This is all the theory of groups we’ve got. In the remaining time, we will
look at some interesting examples of groups.
Example. Let $G = GL_n(\mathbb{Z}/p)$, i.e. the set of invertible n × n matrices with
entries in $\mathbb{Z}/p$, the integers modulo p. Here p is obviously a prime. When we do
rings later, we will study this properly.
First of all, we would like to know the size of this group. A matrix
$A \in GL_n(\mathbb{Z}/p)$ is the same as n linearly independent vectors in the vector space
$(\mathbb{Z}/p)^n$. We can just work out how many there are. This is not too difficult,
when you know how.
We can pick the first vector, which can be anything except zero. So there
are $p^n - 1$ ways of choosing the first vector. Next, we need to pick the second
vector. This can be anything that is not in the span of the first vector, and this
rules out p possibilities. So there are $p^n - p$ ways of choosing the second vector.
Continuing this chain of thought, we have
$|GL_n(\mathbb{Z}/p)| = (p^n - 1)(p^n - p)(p^n - p^2) \cdots (p^n - p^{n-1})$.
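For very small n and p, the count can be confirmed by brute force. The sketch below enumerates all n × n matrices over $\mathbb{Z}/p$ and tests invertibility via the determinant mod p (valid because p is prime); the function names are illustrative, and the approach is only feasible for tiny cases.

```python
from itertools import product

def det_mod_p(M, p):
    """Determinant of a square matrix mod p by Laplace expansion (tiny n only)."""
    n = len(M)
    if n == 1:
        return M[0][0] % p
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det_mod_p(minor, p)
    return total % p

def count_gl(n, p):
    """Count invertible n x n matrices over Z/p by enumerating all p^(n^2) matrices."""
    count = 0
    for entries in product(range(p), repeat=n * n):
        M = [list(entries[i * n:(i + 1) * n]) for i in range(n)]
        if det_mod_p(M, p) != 0:
            count += 1
    return count

def gl_order_formula(n, p):
    """(p^n - 1)(p^n - p) ... (p^n - p^(n-1))."""
    result = 1
    for k in range(n):
        result *= p**n - p**k
    return result
```

For example, `count_gl(2, 3)` and `gl_order_formula(2, 3)` both give 48.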
What is a Sylow p-subgroup of $GL_n(\mathbb{Z}/p)$? We first work out what the order of
this is. We can factorize that as
$|GL_n(\mathbb{Z}/p)| = (1 \cdot p \cdot p^2 \cdots p^{n-1})((p^n - 1)(p^{n-1} - 1) \cdots (p - 1))$.
So the largest power of p that divides $|GL_n(\mathbb{Z}/p)|$ is $p^{\binom{n}{2}}$. Let’s find a subgroup
of size $p^{\binom{n}{2}}$. We consider matrices of the form
$U = \left\{ \begin{pmatrix} 1 & * & * & \cdots & * \\ 0 & 1 & * & \cdots & * \\ 0 & 0 & 1 & \cdots & * \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix} \in GL_n(\mathbb{Z}/p) \right\}$.
Then we know $|U| = p^{\binom{n}{2}}$ as each $*$ can be chosen to be anything in $\mathbb{Z}/p$, and
there are $\binom{n}{2}$ $*$s.
Is the Sylow p-subgroup unique? No. We can take the lower triangular
matrices and get another Sylow p-subgroup.
Example. Let’s be less ambitious and consider $GL_2(\mathbb{Z}/p)$. So
$|G| = p(p^2 - 1)(p - 1) = p(p - 1)^2(p + 1)$.
Let $\ell$ be another prime number such that $\ell \mid p - 1$. Suppose the largest power of
$\ell$ that divides |G| is $\ell^2$. Can we (explicitly) find a Sylow $\ell$-subgroup?
First, we want to find an element of order $\ell$. How is p − 1 related to p (apart
from the obvious way)? We know that
$(\mathbb{Z}/p)^\times = \{x \in \mathbb{Z}/p : (\exists y)\ xy \equiv 1 \pmod p\} \cong C_{p-1}$.
So as $\ell \mid p - 1$, there is a subgroup $C_\ell \leq C_{p-1} \cong (\mathbb{Z}/p)^\times$. Then we immediately
know where to find a subgroup of order $\ell^2$: we have
$C_\ell \times C_\ell \leq (\mathbb{Z}/p)^\times \times (\mathbb{Z}/p)^\times \leq GL_2(\mathbb{Z}/p)$,
where the final inclusion is the diagonal matrices, identifying
$(a, b) \leftrightarrow \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$.
So this is a Sylow $\ell$-subgroup.
2 Rings
2.1 Definitions and examples
We now move on to something completely different: rings. In a ring, we are
allowed to add, subtract, multiply but not divide. Our canonical example of a
ring would be $\mathbb{Z}$, the integers, as studied in IA Numbers and Sets.
In this course, we are only going to consider rings in which multiplication
is commutative, since these rings behave like “number systems”, where we can
study number theory. However, some of these rings do not behave like $\mathbb{Z}$. Thus
one major goal of this part is to understand the different properties of $\mathbb{Z}$, whether
they are present in arbitrary rings, and how different properties relate to one
another.
Definition (Ring). A ring is a quintuple $(R, +, \cdot, 0_R, 1_R)$ where $0_R, 1_R \in R$,
and $+, \cdot : R \times R \to R$ are binary operations such that
(i) $(R, +, 0_R)$ is an abelian group.
(ii) The operation $\cdot : R \times R \to R$ satisfies associativity, i.e.
$a \cdot (b \cdot c) = (a \cdot b) \cdot c$,
and identity:
$1_R \cdot r = r \cdot 1_R = r$.
(iii) Multiplication distributes over addition, i.e.
$r_1 \cdot (r_2 + r_3) = (r_1 \cdot r_2) + (r_1 \cdot r_3)$
$(r_1 + r_2) \cdot r_3 = (r_1 \cdot r_3) + (r_2 \cdot r_3)$.
Notation. If R is a ring and $r \in R$, we write $-r$ for the inverse to r in $(R, +, 0_R)$.
This satisfies $r + (-r) = 0_R$. We write $r - s$ to mean $r + (-s)$ etc.
Some people don’t insist on the existence of the multiplicative identity, but
we will for the purposes of this course.
Since we can add and multiply two elements, by induction, we can add and
multiply any finite number of elements. However, the notions of infinite sum and
product are undefined. It doesn’t make sense to ask if an infinite sum converges.
Definition (Commutative ring). We say a ring R is commutative if $a \cdot b = b \cdot a$
for all $a, b \in R$.
From now onwards, all rings in this course are going to be commutative.
Just as we have groups and subgroups, we also have subrings.
Definition (Subring). Let $(R, +, \cdot, 0_R, 1_R)$ be a ring, and $S \subseteq R$ be a subset.
We say S is a subring of R if $0_R, 1_R \in S$, and the operations $+, \cdot$ make S into a
ring in its own right. In this case we write $S \leq R$.
Example. The familiar number systems are all rings: we have $\mathbb{Z} \leq \mathbb{Q} \leq \mathbb{R} \leq \mathbb{C}$,
under the usual 0, 1, +, ·.
Example. The set $\mathbb{Z}[i] = \{a + ib : a, b \in \mathbb{Z}\} \leq \mathbb{C}$ is the Gaussian integers, which
is a ring.
We also have the ring $\mathbb{Q}[\sqrt{2}] = \{a + b\sqrt{2} \in \mathbb{R} : a, b \in \mathbb{Q}\} \leq \mathbb{R}$.
We will use the square brackets notation quite frequently. It should be clear
what it should mean, and we will define it properly later.
In general, elements in a ring do not have inverses. This is not a bad thing.
This is what makes rings interesting. For example, the division algorithm would
be rather contentless if everything in $\mathbb{Z}$ had an inverse. Fortunately, $\mathbb{Z}$ only has
two invertible elements: 1 and −1. We call these units.
Definition (Unit). An element $u \in R$ is a unit if there is another element $v \in R$
such that $u \cdot v = 1_R$.
It is important that this depends on R, not just on u. For example, $2 \in \mathbb{Z}$ is
not a unit, but $2 \in \mathbb{Q}$ is a unit (since $\frac{1}{2}$ is an inverse).
A special case is when (almost) everything is a unit.
Definition (Field). A field is a non-zero ring where every $u \in R$ with $u \neq 0_R$ is a unit.
We will later show that $0_R$ cannot be a unit except in a very degenerate case.
Example. $\mathbb{Z}$ is not a field, but $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$ are all fields.
Similarly, $\mathbb{Z}[i]$ is not a field, while $\mathbb{Q}[\sqrt{2}]$ is.
Example. Let R be a ring. Then $0_R + 0_R = 0_R$, since this is true in the group
$(R, +, 0_R)$. Then for any $r \in R$, we get
$r \cdot (0_R + 0_R) = r \cdot 0_R$.
We now use the fact that multiplication distributes over addition. So
$r \cdot 0_R + r \cdot 0_R = r \cdot 0_R$.
Adding $-(r \cdot 0_R)$ to both sides gives
$r \cdot 0_R = 0_R$.
This is true for any element $r \in R$. From this, it follows that if $R \neq \{0\}$, then
$1_R \neq 0_R$: if they were equal, then take $r \neq 0_R$. So
$r = r \cdot 1_R = r \cdot 0_R = 0_R$,
which is a contradiction.
Note, however, that {0} forms a ring (with the only possible operations
and identities), the zero ring, albeit a boring one. However, this is often a
counterexample to many things.
Definition (Product of rings). Let R, S be rings. Then the product R × S is a
ring via
$(r, s) + (r', s') = (r + r', s + s')$, $(r, s) \cdot (r', s') = (r \cdot r', s \cdot s')$.
The zero is $(0_R, 0_S)$ and the one is $(1_R, 1_S)$.
We can (but won’t) check that this is indeed a ring.
Definition (Polynomial). Let R be a ring. Then a polynomial with coefficients
in R is an expression
$f = a_0 + a_1 X + a_2 X^2 + \cdots + a_n X^n$,
with $a_i \in R$. The symbols $X^i$ are formal symbols.
We identify f and $f + 0_R \cdot X^{n+1}$ as the same things.
Definition (Degree of polynomial). The degree of a polynomial f is the largest
m such that $a_m \neq 0$.
Definition (Monic polynomial). Let f have degree m. If $a_m = 1$, then f is
called monic.
Definition (Polynomial ring). We write R[X] for the set of all polynomials
with coefficients in R. The operations are performed in the obvious way, i.e. if
$f = a_0 + a_1 X + \cdots + a_n X^n$ and $g = b_0 + b_1 X + \cdots + b_k X^k$ are polynomials,
then
$f + g = \sum_{i=0}^{\max\{n,k\}} (a_i + b_i) X^i$,
and
$f \cdot g = \sum_{i=0}^{n+k} \left( \sum_{j=0}^{i} a_j b_{i-j} \right) X^i$,
where we take $a_i = 0$ for i > n and $b_i = 0$ for i > k.
We identify R with the constant polynomials, i.e. polynomials $\sum a_i X^i$ with
$a_i = 0$ for i > 0. In particular, $0_R \in R$ and $1_R \in R$ are the zero and one of R[X].
This is in fact a ring.
Note that a polynomial is just a sequence of numbers, interpreted as the
coefficients of some formal symbols. While it does indeed induce a function in
the obvious way, we shall not identify the polynomial with the function given by
it, since different polynomials can give rise to the same function.
For example, in $\mathbb{Z}/2\mathbb{Z}[X]$, $f = X^2 + X$ is not the zero polynomial, since its
coefficients are not zero. However, f(0) = 0 and f(1) = 0. As a function, this is
identically zero. So $f \neq 0$ as a polynomial but f = 0 as a function.
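The distinction is easy to see if we store a polynomial as its list of coefficients, which is the point of view taken above. A small Python sketch, assuming coefficients are listed as `[a_0, a_1, ...]` and arithmetic is done mod 2 (the helper names are illustrative):

```python
def evaluate(coeffs, x, p):
    """Evaluate the polynomial with coefficients [a_0, a_1, ...] at x, mod p."""
    return sum(a * x**i for i, a in enumerate(coeffs)) % p

def is_zero_polynomial(coeffs, p):
    """A polynomial is zero exactly when all its coefficients are zero mod p."""
    return all(a % p == 0 for a in coeffs)

f = [0, 1, 1]  # X + X^2 in (Z/2Z)[X]

# Values of f at every element of Z/2Z.
values = [evaluate(f, x, 2) for x in range(2)]
```

So `values` is all zeros, while `is_zero_polynomial(f, 2)` is false.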
Definition (Power series). We write R[[X]] for the ring of power series on R,
i.e.
$f = a_0 + a_1 X + a_2 X^2 + \cdots$,
where each $a_i \in R$. This has addition and multiplication the same as for
polynomials, but without upper limits.
A power series is very much not a function. We don’t talk about whether
the sum converges or not, because it is not a sum.
Example. Is $1 - X \in R[X]$ a unit? For every $g = a_0 + \cdots + a_n X^n$ (with $a_n \neq 0$),
we get
$(1 - X)g = \text{stuff} + \cdots - a_n X^{n+1}$,
which is not 1. So g cannot be the inverse of (1 − X). So (1 − X) is not a unit.
However, $1 - X \in R[[X]]$ is a unit, since
$(1 - X)(1 + X + X^2 + X^3 + \cdots) = 1$.
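Both halves of this example can be illustrated by multiplying coefficient lists and discarding terms of degree N or more. This is only a sketch over the integers with an arbitrary cut-off N, not a model of R[[X]] itself; `mul_truncated` is an illustrative helper.

```python
def mul_truncated(f, g, N):
    """Product of two series given by coefficient lists, keeping terms below X^N."""
    h = [0] * N
    for i, a in enumerate(f[:N]):
        for j, b in enumerate(g[:N - i]):
            h[i + j] += a * b
    return h

N = 10
one_minus_x = [1, -1]
geometric = [1] * N  # 1 + X + ... + X^(N-1), a truncation of the power series

# Up to degree N - 1, the product is 1: all higher coefficients cancel.
series_product = mul_truncated(one_minus_x, geometric, N)

# For a genuine polynomial g = 1 + X + X^2, the top term -X^3 survives.
poly_product = mul_truncated(one_minus_x, [1, 1, 1], 5)
```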
Definition (Laurent polynomials). The Laurent polynomials on R is the set
$R[X, X^{-1}]$, i.e. each element is of the form
$f = \sum_{i \in \mathbb{Z}} a_i X^i$
where $a_i \in R$ and only finitely many $a_i$ are non-zero. The operations are the
obvious ones.
We can also think of Laurent series, but we have to be careful. We allow
infinitely many non-zero coefficients of positive powers, but only finitely many of
negative powers. Otherwise, in the formula for multiplication, we would have an
infinite sum, which is undefined.
Example. Let X be a set, and R be a ring. Then the set of all functions on X,
i.e. functions $f : X \to R$, is a ring with ring operations given by
$(f + g)(x) = f(x) + g(x)$, $(f \cdot g)(x) = f(x) \cdot g(x)$.
Here zero is the constant function 0 and one is the constant function 1.
Usually, we don’t want to consider all functions $X \to R$. Instead, we look at
some subrings of this. For example, we can consider the ring of all continuous
functions $\mathbb{R} \to \mathbb{R}$. This contains, for example, the polynomial functions, which
is just $\mathbb{R}[X]$ (since over $\mathbb{R}$, distinct polynomials give distinct functions).
2.2 Homomorphisms, ideals, quotients and isomorphisms
Just like groups, we will come up with analogues of homomorphisms, normal
subgroups (which are now known as ideals), and quotients.
Definition (Homomorphism of rings). Let R, S be rings. A function $\varphi : R \to S$
is a ring homomorphism if it preserves everything we can think of, i.e.
(i) $\varphi(r_1 + r_2) = \varphi(r_1) + \varphi(r_2)$,
(ii) $\varphi(0_R) = 0_S$,
(iii) $\varphi(r_1 \cdot r_2) = \varphi(r_1) \cdot \varphi(r_2)$,
(iv) $\varphi(1_R) = 1_S$.
Definition (Isomorphism of rings). If a homomorphism $\varphi : R \to S$ is a bijection,
we call it an isomorphism.
Definition (Kernel). The kernel of a homomorphism $\varphi : R \to S$ is
$\ker(\varphi) = \{r \in R : \varphi(r) = 0_S\}$.
Definition (Image). The image of $\varphi : R \to S$ is
$\operatorname{im}(\varphi) = \{s \in S : s = \varphi(r) \text{ for some } r \in R\}$.
Lemma. A homomorphism $\varphi : R \to S$ is injective if and only if $\ker \varphi = \{0_R\}$.
Proof. A ring homomorphism is in particular a group homomorphism
$\varphi : (R, +, 0_R) \to (S, +, 0_S)$ of abelian groups. So this follows from the case of
groups.
In the group scenario, we had groups, subgroups and normal subgroups,
which are special subgroups. Here, we have a special kind of subsets of a ring
that act like normal subgroups, known as ideals.
Definition (Ideal). A subset $I \subseteq R$ is an ideal, written $I \lhd R$, if
(i) It is an additive subgroup of $(R, +, 0_R)$, i.e. it is closed under addition and
additive inverses. (additive closure)
(ii) If $a \in I$ and $b \in R$, then $a \cdot b \in I$. (strong closure)
We say I is a proper ideal if $I \neq R$.
Note that the multiplicative closure is stronger than what we require for
subrings: for subrings, it has to be closed under multiplication by its own
elements; for ideals, it has to be closed under multiplication by everything in
the world. This is similar to how normal subgroups not only have to be closed
under internal multiplication, but also conjugation by external elements.
Lemma. If $\varphi : R \to S$ is a homomorphism, then $\ker(\varphi) \lhd R$.
Proof. Since $\varphi : (R, +, 0_R) \to (S, +, 0_S)$ is a group homomorphism, the kernel is
a subgroup of $(R, +, 0_R)$.
For the second part, let $a \in \ker(\varphi)$, $b \in R$. We need to show that their
product is in the kernel. We have
$\varphi(a \cdot b) = \varphi(a) \cdot \varphi(b) = 0 \cdot \varphi(b) = 0$.
So $a \cdot b \in \ker(\varphi)$.
Example. Suppose $I \lhd R$ is an ideal, and $1_R \in I$. Then for any $r \in R$, the
axioms entail $1_R \cdot r \in I$. But $1_R \cdot r = r$. So if $1_R \in I$, then I = R.
In other words, every proper ideal does not contain 1. In particular, every
proper ideal is not a subring, since a subring must contain 1.
We are starting to diverge from groups. In groups, a normal subgroup is a
subgroup, but here a proper ideal is not a subring.
Example. We can generalize the above a bit. Suppose $I \lhd R$ and $u \in I$ is a
unit, i.e. there is some $v \in R$ such that $u \cdot v = 1_R$. Then by strong closure,
$1_R = u \cdot v \in I$. So I = R.
Hence proper ideals are not allowed to contain any unit at all, not just $1_R$.
Example. Consider the ring $\mathbb{Z}$ of integers. Then every ideal of $\mathbb{Z}$ is of the form
\[ n\mathbb{Z} = \{\cdots, -2n, -n, 0, n, 2n, \cdots\} \subseteq \mathbb{Z}. \]
It is easy to see this is indeed an ideal.
To show these are all the ideals, let $I \lhd \mathbb{Z}$. If $I = \{0\}$, then $I = 0\mathbb{Z}$. Otherwise, let $n \in \mathbb{N}$ be the smallest positive element of $I$. We want to show in fact $I = n\mathbb{Z}$. Certainly $n\mathbb{Z} \subseteq I$ by strong closure.
Now let $m \in I$. By the Euclidean algorithm, we can write
\[ m = q \cdot n + r \]
with $0 \leq r < n$. Now $n, m \in I$. So by strong closure, $m, q \cdot n \in I$. So $r = m - q \cdot n \in I$. As $n$ is the smallest positive element of $I$, and $r < n$, we must have $r = 0$. So $m = q \cdot n \in n\mathbb{Z}$. So $I \subseteq n\mathbb{Z}$. So $I = n\mathbb{Z}$.
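As a sanity check of this argument, the following Python sketch enumerates a finite window of the ideal generated by two integers and looks for its smallest positive element. The generators 12 and 18, the coefficient bound and the helper name `ideal_window` are illustrative choices, not part of the lectures.

```python
from math import gcd

def ideal_window(gens, bound):
    """Finite window of the ideal (a1, ..., ak) in Z:
    all sums a1*r1 + ... + ak*rk with every |ri| <= bound."""
    elems = {0}
    for a in gens:
        elems = {e + a * r for e in elems for r in range(-bound, bound + 1)}
    return elems

gens = [12, 18]
window = ideal_window(gens, 5)

# Smallest positive element found in the window.
n = min(x for x in window if x > 0)

# Every listed element is a multiple of n, as the proof predicts.
all_multiples = all(x % n == 0 for x in window)
```

Here the smallest positive element agrees with `gcd(12, 18)`, which is what one would expect if the ideal is $n\mathbb{Z}$ for that $n$.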
The key to proving this was that we can perform the Euclidean algorithm on $\mathbb{Z}$. Thus, for any ring $R$ in which we can “do Euclidean algorithm”, every ideal is of the form $aR = \{a \cdot r : r \in R\}$ for some $a \in R$. We will make this notion precise later.
Definition (Generator of ideal). For an element $a \in R$, we write
\[ (a) = aR = \{a \cdot r : r \in R\} \lhd R. \]
This is the ideal generated by $a$.
In general, let $a_1, a_2, \cdots, a_k \in R$, we write
\[ (a_1, a_2, \cdots, a_k) = \{a_1 r_1 + \cdots + a_k r_k : r_1, \cdots, r_k \in R\}. \]
This is the ideal generated by $a_1, \cdots, a_k$.
We can also have ideals generated by infinitely many objects, but we have to
be careful, since we cannot have infinite sums.
Definition (Generator of ideal). For $A \subseteq R$ a subset, the ideal generated by $A$ is
\[ (A) = \left\{ \sum_{a \in A} r_a \cdot a : r_a \in R, \text{ only finitely many non-zero} \right\}. \]
These ideals are rather nice ideals, since they are easy to describe, and often
have some nice properties.
Definition (Principal ideal). An ideal $I$ is a principal ideal if $I = (a)$ for some $a \in R$.
So what we have just shown for $\mathbb{Z}$ is that all ideals are principal. Not all rings are like this. These are special types of rings, which we will study more in depth later.
Example. Consider the following subset:
\[ \{f \in R[X] : \text{the constant coefficient of } f \text{ is } 0\}. \]
This is an ideal, as we can check manually (alternatively, it is the kernel of the “evaluate at 0” homomorphism). It turns out this is a principal ideal. In fact, it is $(X)$.
We have said ideals are like normal subgroups. The key idea is that we can
divide by ideals.
Definition (Quotient ring). Let $I \lhd R$. The quotient ring $R/I$ consists of the (additive) cosets $r + I$ with the zero and one as $0_R + I$ and $1_R + I$, and operations
\[ (r_1 + I) + (r_2 + I) = (r_1 + r_2) + I \]
\[ (r_1 + I) \cdot (r_2 + I) = r_1 r_2 + I. \]
Proposition. The quotient ring is a ring, and the function
\[ R \to R/I, \quad r \mapsto r + I \]
is a ring homomorphism.
This is true, because we defined ideals to be those things that can be
quotiented by. So we just have to check we made the right definition.
Just as we could have come up with the definition of a normal subgroup by
requiring operations on the cosets to be well-defined, we could have come up
with the definition of an ideal by requiring the multiplication of cosets to be
well-defined, and we would end up with the strong closure property.
Proof. We know the group $(R/I, +, 0_{R/I})$ is well-defined, since $I$ is a (normal) subgroup of $R$. So we only have to check multiplication is well-defined.
Suppose $r_1 + I = r_1' + I$ and $r_2 + I = r_2' + I$. Then $r_1' - r_1 = a_1 \in I$ and $r_2' - r_2 = a_2 \in I$. So
\[ r_1' r_2' = (r_1 + a_1)(r_2 + a_2) = r_1 r_2 + r_1 a_2 + r_2 a_1 + a_1 a_2. \]
By the strong closure property, the last three objects are in $I$. So $r_1' r_2' + I = r_1 r_2 + I$.
It is easy to check that $0_R + I$ and $1_R + I$ are indeed the zero and one, and the function given is clearly a homomorphism.
Example. We have the ideals $n\mathbb{Z} \lhd \mathbb{Z}$. So we have the quotient rings $\mathbb{Z}/n\mathbb{Z}$. The elements are of the form $m + n\mathbb{Z}$, so they are just
\[ 0 + n\mathbb{Z}, 1 + n\mathbb{Z}, 2 + n\mathbb{Z}, \cdots, (n - 1) + n\mathbb{Z}. \]
Addition and multiplication are just what we are used to: addition and multiplication modulo $n$.
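The well-definedness of coset multiplication can be checked numerically in $\mathbb{Z}/n\mathbb{Z}$. The sketch below changes both representatives by elements of $n\mathbb{Z}$ and compares the resulting product cosets. The modulus 6 and the specific representatives are arbitrary illustrative values.

```python
n = 6

def coset(r):
    """Canonical representative in {0, ..., n-1} of the coset r + nZ."""
    return r % n

r1, r2 = 4, 5
a1, a2 = 3 * n, -7 * n      # arbitrary elements of the ideal nZ

# Same cosets, different representatives: r1 + a1 and r2 + a2.
lhs = coset((r1 + a1) * (r2 + a2))
rhs = coset(r1 * r2)
```

Both sides give the same coset, matching the computation $r_1' r_2' = r_1 r_2 + r_1 a_2 + r_2 a_1 + a_1 a_2$ in the proof, where the last three terms lie in the ideal.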
Note that it is easier to come up with ideals than normal subgroups: we can just pick up random elements, and then take the ideal generated by them.
Example. Consider $(X) \lhd \mathbb{C}[X]$. What is $\mathbb{C}[X]/(X)$? Elements are represented by
\[ a_0 + a_1 X + a_2 X^2 + \cdots + a_n X^n + (X). \]
But everything but the first term is in $(X)$. So every such thing is equivalent to $a_0 + (X)$. It is not hard to convince yourself that this representation is unique. So in fact $\mathbb{C}[X]/(X) \cong \mathbb{C}$, with the bijection $a_0 + (X) \leftrightarrow a_0$.
If we want to prove things like this, we have to convince ourselves this
representation is unique. We can do that by hand here, but in general, we want
to be able to do this properly.
Proposition (Euclidean algorithm for polynomials). Let $F$ be a field and $f, g \in F[X]$ with $g \neq 0$. Then there are some $r, q \in F[X]$ such that
\[ f = gq + r, \]
with $\deg r < \deg g$ (taking $\deg 0 = -\infty$).
This is like the usual Euclidean algorithm, except that instead of the absolute
value, we use the degree to measure how “big” the polynomial is.
Proof. Let $\deg(f) = n$. So
\[ f = \sum_{i=0}^{n} a_i X^i, \]
and $a_n \neq 0$. Similarly, if $\deg g = m$, then
\[ g = \sum_{i=0}^{m} b_i X^i, \]
with $b_m \neq 0$. If $n < m$, we let $q = 0$ and $r = f$, and done.
Otherwise, suppose $n \geq m$, and proceed by induction on $n$.
We let
\[ f_1 = f - a_n b_m^{-1} X^{n-m} g. \]
This is possible since $b_m \neq 0$, and $F$ is a field. Then by construction, the coefficients of $X^n$ cancel out. So $\deg(f_1) < n$.
If $n = m$, then $\deg(f_1) < n = m$. So we can write
\[ f = (a_n b_m^{-1} X^{n-m}) g + f_1, \]
and $\deg(f_1) < \deg(f) = \deg(g)$. So done. Otherwise, if $n > m$, then as $\deg(f_1) < n$, by induction, we can find $r_1, q_1$ such that
\[ f_1 = g q_1 + r_1, \]
and $\deg(r_1) < \deg g = m$. Then
\[ f = a_n b_m^{-1} X^{n-m} g + q_1 g + r_1 = (a_n b_m^{-1} X^{n-m} + q_1) g + r_1. \]
So done.
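The induction in this proof is effectively an algorithm: repeatedly subtract $a_n b_m^{-1} X^{n-m} g$ until the degree drops below $\deg g$. Below is a minimal Python sketch over $F = \mathbb{Q}$, assuming polynomials are stored as coefficient lists with the lowest degree first. The function name `poly_divmod` and this representation are choices made here for illustration.

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Divide f by g in Q[X]. Polynomials are coefficient lists, lowest degree first.
    Returns (q, r) with f = g*q + r and deg r < deg g. Requires g != 0."""
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    while g and g[-1] == 0:              # strip leading zeros of g
        g.pop()
    if not g:
        raise ZeroDivisionError("g must be non-zero")
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    r = f[:]
    while r and r[-1] == 0:              # strip leading zeros of f
        r.pop()
    while len(r) >= len(g):
        shift = len(r) - len(g)          # n - m
        coeff = r[-1] / g[-1]            # a_n * b_m^{-1}
        q[shift] += coeff
        for i, b in enumerate(g):
            r[i + shift] -= coeff * b    # subtract coeff * X^(n-m) * g
        while r and r[-1] == 0:          # the top coefficient has cancelled
            r.pop()
    return q, r

# X^3 + 2X + 1 divided by X^2 + 1
q, r = poly_divmod([1, 2, 0, 1], [1, 0, 1])
```

In this example the quotient is $X$ and the remainder is $X + 1$, stored as `[0, 1]` and `[1, 1]`. An empty list for `r` represents the zero polynomial.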
Now that we have a Euclidean algorithm for polynomials, we should be able to show that every ideal of $F[X]$ is generated by one polynomial. We will not prove it specifically here, but later show that in general, in every ring where the Euclidean algorithm is possible, all ideals are principal.
We now look at some applications of the Euclidean algorithm.
Example. Consider $\mathbb{R}[X]$, and consider the principal ideal $(X^2 + 1) \lhd \mathbb{R}[X]$. We let $R = \mathbb{R}[X]/(X^2 + 1)$.
Elements of $R$ are polynomials
\[ \underbrace{a_0 + a_1 X + a_2 X^2 + \cdots + a_n X^n}_{f} + (X^2 + 1). \]
By the Euclidean algorithm, we have
\[ f = q(X^2 + 1) + r, \]
with $\deg(r) < 2$, i.e. $r = b_0 + b_1 X$. Thus $f + (X^2 + 1) = r + (X^2 + 1)$. So every element of $\mathbb{R}[X]/(X^2 + 1)$ is representable as $a + bX$ for some $a, b \in \mathbb{R}$.
Is this representation unique? If $a + bX + (X^2 + 1) = a' + b'X + (X^2 + 1)$, then the difference $(a - a') + (b - b')X \in (X^2 + 1)$. So it is $(X^2 + 1)q$ for some $q$. This is possible only if $q = 0$, since for non-zero $q$, we know $(X^2 + 1)q$ has degree at least 2. So we must have $(a - a') + (b - b')X = 0$. So $a + bX = a' + b'X$. So the representation is unique.
What we’ve got is that every element in $R$ is of the form $a + bX$, and $X^2 + 1 = 0$, i.e. $X^2 = -1$. This sounds like the complex numbers, just that we are calling it $X$ instead of $i$.
To show this formally, we define the function
\[ \varphi : \mathbb{R}[X]/(X^2 + 1) \to \mathbb{C}, \quad a + bX + (X^2 + 1) \mapsto a + bi. \]
This is well-defined and a bijection. It is also clearly additive. So to prove this is an isomorphism, we have to show it is multiplicative. We check this manually. We have
\begin{align*}
\varphi((a + bX + (X^2 + 1))(c + dX + (X^2 + 1))) &= \varphi(ac + (ad + bc)X + bdX^2 + (X^2 + 1)) \\
&= \varphi((ac - bd) + (ad + bc)X + (X^2 + 1)) \\
&= (ac - bd) + (ad + bc)i \\
&= (a + bi)(c + di) \\
&= \varphi(a + bX + (X^2 + 1))\varphi(c + dX + (X^2 + 1)).
\end{align*}
So this is indeed an isomorphism.
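The multiplicativity check can be mirrored in a few lines of Python, representing the coset of $a + bX$ by the pair `(a, b)` and reducing products with $X^2 = -1$. The names `mul_mod` and `phi`, and the sample elements, are illustrative assumptions.

```python
def mul_mod(p, q):
    """Multiply a + bX and c + dX in R[X]/(X^2 + 1), reducing with X^2 = -1."""
    a, b = p
    c, d = q
    return (a * c - b * d, a * d + b * c)

def phi(p):
    """The map a + bX + (X^2 + 1) -> a + bi."""
    a, b = p
    return complex(a, b)

p, q = (1, 2), (3, -4)
lhs = phi(mul_mod(p, q))     # multiply in the quotient, then map to C
rhs = phi(p) * phi(q)        # map to C, then multiply
```

For these sample elements both routes give $11 + 2i$. This only spot-checks one pair, of course; the calculation above is what proves the general statement.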
This is pretty tedious. Fortunately, we have some helpful results we can use,
namely the isomorphism theorems. These are exactly analogous to those for
groups.
Theorem (First isomorphism theorem). Let $\varphi : R \to S$ be a ring homomorphism. Then $\ker(\varphi) \lhd R$, and
\[ \frac{R}{\ker(\varphi)} \cong \operatorname{im}(\varphi) \leq S. \]
Proof. We have already seen $\ker(\varphi) \lhd R$. Now define
\[ \Phi : R/\ker(\varphi) \to \operatorname{im}(\varphi), \quad r + \ker(\varphi) \mapsto \varphi(r). \]
This is well-defined, since if $r + \ker(\varphi) = r' + \ker(\varphi)$, then $r - r' \in \ker(\varphi)$. So $\varphi(r - r') = 0$. So $\varphi(r) = \varphi(r')$.
We don’t have to check this is bijective and additive, since that comes for free from the (proof of the) isomorphism theorem of groups. So we just have to check it is multiplicative. To show $\Phi$ is multiplicative, we have
\begin{align*}
\Phi((r + \ker(\varphi))(t + \ker(\varphi))) &= \Phi(rt + \ker(\varphi)) \\
&= \varphi(rt) \\
&= \varphi(r)\varphi(t) \\
&= \Phi(r + \ker(\varphi))\Phi(t + \ker(\varphi)).
\end{align*}
This is more-or-less the same proof as the one for groups, just that we had a
few more things to check.
Since there is the first isomorphism theorem, we, obviously, have more
coming.
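Theorem (Second isomorphism theorem). Let $R \leq S$ and $J \lhd S$. Then $J \cap R \lhd R$, and
\[ \frac{R + J}{J} = \{r + J : r \in R\} \leq \frac{S}{J} \]
is a subring, and
\[ \frac{R}{R \cap J} \cong \frac{R + J}{J}. \]
Proof. Define the function
\[ \varphi : R \to S/J, \quad r \mapsto r + J. \]
Since this is the quotient map, it is a ring homomorphism. The kernel is
\[ \ker(\varphi) = \{r \in R : r + J = 0, \text{ i.e. } r \in J\} = R \cap J. \]
Then the image is
\[ \operatorname{im}(\varphi) = \{r + J : r \in R\} = \frac{R + J}{J}. \]
Then by the first isomorphism theorem, we know $R \cap J \lhd R$, and $\frac{R + J}{J} \leq \frac{S}{J}$, and
\[ \frac{R}{R \cap J} \cong \frac{R + J}{J}. \]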
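Before we get to the third isomorphism theorem, recall we had the subgroup correspondence for groups. Analogously, for $I \lhd R$,
\[ \{\text{subrings of } R/I\} \longleftrightarrow \{\text{subrings of } R \text{ which contain } I\} \]
\[ L \leq \frac{R}{I} \longmapsto \{x \in R : x + I \in L\} \]
\[ \frac{S}{I} \leq \frac{R}{I} \longleftarrow I \lhd S \leq R. \]
This is exactly the same formula as for groups.
For groups, we had a correspondence for normal subgroups. Here, we have a correspondence between ideals
\[ \{\text{ideals of } R/I\} \longleftrightarrow \{\text{ideals of } R \text{ which contain } I\} \]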
It is important to note here that quotienting in groups and rings have different purposes. In groups, we take quotients so that we have simpler groups to work with. In rings, we often take quotients to get more interesting rings. For example, $\mathbb{R}[X]$ is quite boring, but $\mathbb{R}[X]/(X^2 + 1) \cong \mathbb{C}$ is more interesting. Thus this ideal correspondence allows us to occasionally get interesting ideals from boring ones.
Theorem (Third isomorphism theorem). Let $I \lhd R$ and $J \lhd R$, and $I \subseteq J$. Then $J/I \lhd R/I$ and
\[ \left(\frac{R}{I}\right) \Big/ \left(\frac{J}{I}\right) \cong \frac{R}{J}. \]
Proof. We define the map
\[ \varphi : R/I \to R/J, \quad r + I \mapsto r + J. \]
This is well-defined and surjective by the groups case. Also it is a ring homomorphism since multiplication in $R/I$ and $R/J$ are “the same”. The kernel is
\[ \ker(\varphi) = \{r + I : r + J = 0, \text{ i.e. } r \in J\} = \frac{J}{I}. \]
So the result follows from the first isomorphism theorem.
Note that for any ring $R$, there is a unique ring homomorphism $\mathbb{Z} \to R$, given by
\[ \iota : \mathbb{Z} \to R \]
\[ n \geq 0 \mapsto \underbrace{1_R + 1_R + \cdots + 1_R}_{n \text{ times}} \]
\[ n \leq 0 \mapsto -(\underbrace{1_R + 1_R + \cdots + 1_R}_{-n \text{ times}}) \]
Any homomorphism $\mathbb{Z} \to R$ must be given by this formula, since it must send the unit to the unit, and we can show this is indeed a homomorphism by distributivity. So the ring homomorphism is unique. In fancy language, we say $\mathbb{Z}$ is the initial object in (the category of) rings.
We then know $\ker(\iota) \lhd \mathbb{Z}$. Thus $\ker(\iota) = n\mathbb{Z}$ for some $n$.
Definition (Characteristic of ring). Let $R$ be a ring, and $\iota : \mathbb{Z} \to R$ be the unique such map. The characteristic of $R$ is the unique non-negative $n$ such that $\ker(\iota) = n\mathbb{Z}$.
Example. The rings $\mathbb{Z}, \mathbb{Q}, \mathbb{R}, \mathbb{C}$ all have characteristic 0. The ring $\mathbb{Z}/n\mathbb{Z}$ has characteristic $n$. In particular, all natural numbers can be characteristics.
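For $R = \mathbb{Z}/n\mathbb{Z}$ the characteristic can be found by literally adding $1_R$ to itself until the sum is zero, which is what $\iota$ does on positive integers. The helper below is a small illustrative sketch with a made-up name, and it only handles rings of the form $\mathbb{Z}/n\mathbb{Z}$ with $n \geq 1$.

```python
def characteristic(n):
    """Characteristic of Z/nZ for n >= 1:
    the least k >= 1 with 1 + 1 + ... + 1 (k times) equal to 0 in Z/nZ."""
    one_sum = 0
    for k in range(1, n + 1):
        one_sum = (one_sum + 1) % n   # add 1_R once more
        if one_sum == 0:
            return k

chars = [characteristic(n) for n in (2, 6, 7, 12)]
```

The loop always terminates by step $k = n$, in line with the statement that $\mathbb{Z}/n\mathbb{Z}$ has characteristic $n$.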
The notion of the characteristic will not be too useful in this course. However, fields of non-zero characteristic often provide interesting examples and counterexamples to some later theory.
2.3 Integral domains, field of fractions, maximal and prime ideals
Many rings can be completely nothing like $\mathbb{Z}$. For example, in $\mathbb{Z}$, we know that if $a, b \neq 0$, then $ab \neq 0$. However, in, say, $\mathbb{Z}/6\mathbb{Z}$, we get $2, 3 \neq 0$, but $2 \cdot 3 = 0$. Also, $\mathbb{Z}$ has some nice properties such as every ideal is principal, and every integer has an (essentially) unique factorization. We will now classify rings according to which properties they have.
We start with the most fundamental property, that the product of two non-zero elements is non-zero. We will almost exclusively work with rings that satisfy this property.
Definition (Integral domain). A non-zero ring $R$ is an integral domain if for all $a, b \in R$, if $a \cdot b = 0_R$, then $a = 0_R$ or $b = 0_R$.
An element that violates this property is known as a zero divisor.
Definition (Zero divisor). An element $x \in R$ is a zero divisor if $x \neq 0$ and there is a $y \neq 0$ such that $x \cdot y = 0 \in R$.
In other words, a ring is an integral domain if it has no zero divisors.
Example. All fields are integral domains, since if $a \cdot b = 0$, and $b \neq 0$, then $a = a \cdot (b \cdot b^{-1}) = (a \cdot b) \cdot b^{-1} = 0$. Similarly, if $a \neq 0$, then $b = 0$.
Example. A subring of an integral domain is an integral domain, since a zero
divisor in the small ring would also be a zero divisor in the big ring.
Example. Immediately, we know $\mathbb{Z}, \mathbb{Q}, \mathbb{R}, \mathbb{C}$ are integral domains, since $\mathbb{C}$ is a field, and the others are subrings of it. Also, $\mathbb{Z}[i] \leq \mathbb{C}$ is an integral domain.
These are the nice rings we like in number theory, since there we can sensibly talk about things like factorization.
It turns out there are no interesting finite integral domains.
Lemma. Let $R$ be a finite ring which is an integral domain. Then $R$ is a field.
Proof. Let $a \in R$ be non-zero, and consider the homomorphism of additive groups
\[ a \cdot - : R \to R, \quad b \mapsto a \cdot b. \]
(This is not a ring homomorphism in general, but it is additive by distributivity, which is all we need.) We want to show this is injective. For this, it suffices to show the kernel is trivial. If $r \in \ker(a \cdot -)$, then $a \cdot r = 0$. So $r = 0$ since $R$ is an integral domain. So the kernel is trivial.
Since $R$ is finite, $a \cdot -$ must also be surjective. In particular, there is an element $b \in R$ such that $a \cdot b = 1_R$. So $a$ has an inverse. Since $a$ was arbitrary, $R$ is a field.
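The injective-hence-surjective step is easy to see concretely in $\mathbb{Z}/7\mathbb{Z}$, and its failure is just as visible in $\mathbb{Z}/6\mathbb{Z}$, which is not an integral domain. The moduli and the elements chosen below are illustrative.

```python
p, a = 7, 3

# Image of the map b -> a*b on Z/7Z: every residue is hit exactly once.
image7 = sorted((a * b) % p for b in range(p))

# Since the map is surjective, some b satisfies a*b = 1.
inverse = next(b for b in range(p) if (a * b) % p == 1)

# In Z/6Z, multiplication by the zero divisor 2 is not injective,
# so its image is a proper subset and 1 is never reached.
image6 = sorted({(2 * b) % 6 for b in range(6)})
```

Here `image7` is all of $\{0, \ldots, 6\}$ and the inverse of 3 is 5, while `image6` misses 1, so 2 has no inverse in $\mathbb{Z}/6\mathbb{Z}$.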
So far, we know fields are integral domains, and subrings of integral domains are integral domains. We have another good source of integral domains as follows:
Lemma. Let $R$ be an integral domain. Then $R[X]$ is also an integral domain.
Proof. We need to show that the product of two non-zero elements is non-zero. Let $f, g \in R[X]$ be non-zero, say
\[ f = a_0 + a_1 X + \cdots + a_n X^n \in R[X] \]
\[ g = b_0 + b_1 X + \cdots + b_m X^m \in R[X], \]
with $a_n, b_m \neq 0$. Then the coefficient of $X^{n+m}$ in $fg$ is $a_n b_m$. This is non-zero since $R$ is an integral domain. So $fg$ is non-zero. So $R[X]$ is an integral domain.
So, for instance, $\mathbb{Z}[X]$ is an integral domain.
We can also iterate this.
Notation. Write $R[X, Y]$ for $(R[X])[Y]$, the polynomial ring of $R$ in two variables. In general, write $R[X_1, \cdots, X_n] = (\cdots((R[X_1])[X_2]) \cdots)[X_n]$.
Then if $R$ is an integral domain, so is $R[X_1, \cdots, X_n]$.
We now mimic the familiar construction of $\mathbb{Q}$ from $\mathbb{Z}$. For any integral domain $R$, we want to construct a field $F$ that consists of “fractions” of elements in $R$. Recall that a subring of any field is an integral domain. This says the converse: every integral domain is the subring of some field.
Definition (Field of fractions). Let $R$ be an integral domain. A field of fractions $F$ of $R$ is a field with the following properties
(i) $R \leq F$
(ii) Every element of $F$ may be written as $a \cdot b^{-1}$ for $a, b \in R$, where $b^{-1}$ means the multiplicative inverse to $b \neq 0$ in $F$.
For example, $\mathbb{Q}$ is the field of fractions of $\mathbb{Z}$.
Theorem. Every integral domain has a field of fractions.
Proof. The construction is exactly how we construct the rationals from the integers, as equivalence classes of pairs of integers. We let
\[ S = \{(a, b) \in R \times R : b \neq 0\}. \]
We think of $(a, b) \in S$ as $\frac{a}{b}$. We define the equivalence relation $\sim$ on $S$ by
\[ (a, b) \sim (c, d) \Leftrightarrow ad = bc. \]
We need to show this is indeed an equivalence relation. Symmetry and reflexivity are obvious. To show transitivity, suppose
\[ (a, b) \sim (c, d), \quad (c, d) \sim (e, f), \]
i.e.
\[ ad = bc, \quad cf = de. \]
We multiply the first equation by $f$ and the second by $b$, to obtain
\[ adf = bcf, \quad bcf = bed. \]
Rearranging, we get
\[ d(af - be) = 0. \]
Since $d$ is in the denominator, $d \neq 0$. Since $R$ is an integral domain, we must have $af - be = 0$, i.e. $af = be$. So $(a, b) \sim (e, f)$. This is where being an integral domain is important.
Now let
\[ F = S/{\sim} \]
be the set of equivalence classes. We now want to check this is indeed the field of fractions. We first want to show it is a field. We write $\frac{a}{b} = [(a, b)] \in F$, and define the operations by
\[ \frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd} \]
\[ \frac{a}{b} \cdot \frac{c}{d} = \frac{ac}{bd}. \]
These are well-defined, and make $(F, +, \cdot, \frac{0}{1}, \frac{1}{1})$ into a ring. There are many things to check, but those are straightforward, and we will not waste time doing that here.
Finally, we need to show every non-zero element has an inverse. Let $\frac{a}{b} \neq 0_F$, i.e. $\frac{a}{b} \neq \frac{0}{1}$, or $a \cdot 1 \neq b \cdot 0 \in R$, i.e. $a \neq 0$. Then $\frac{b}{a} \in F$ is defined, and
\[ \frac{b}{a} \cdot \frac{a}{b} = \frac{ba}{ba} = 1_F. \]
So $\frac{a}{b}$ has a multiplicative inverse. So $F$ is a field.
We now need to construct a subring of $F$ that is isomorphic to $R$. To do so, we need to define an injective homomorphism $\varphi : R \to F$. This is given by
\[ \varphi : R \to F, \quad r \mapsto \frac{r}{1}. \]
This is a ring homomorphism, as one can check easily. The kernel is the set of all $r \in R$ such that $\frac{r}{1} = 0$, i.e. $r = 0$. So the kernel is trivial, and $\varphi$ is injective. Then by the first isomorphism theorem, $R \cong \operatorname{im}(\varphi) \leq F$.
Finally, we need to show everything is a quotient of two things in $R$. We have
\[ \frac{a}{b} = \frac{a}{1} \cdot \frac{1}{b} = \frac{a}{1} \cdot \left(\frac{b}{1}\right)^{-1}, \]
as required.
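The construction can be mirrored directly in code for $R = \mathbb{Z}$. The class below is a minimal sketch that stores unreduced pairs $(a, b)$ and implements the equivalence relation and operations from the proof. The class name `Frac` and its method names are inventions for this illustration, and no attempt is made to reduce fractions or support hashing.

```python
class Frac:
    """Element a/b of the field of fractions of Z, stored as an unreduced pair (a, b)."""

    def __init__(self, a, b=1):
        if b == 0:
            raise ZeroDivisionError("denominator must be non-zero")
        self.a, self.b = a, b

    def __eq__(self, other):
        # (a, b) ~ (c, d) if and only if ad = bc
        return self.a * other.b == self.b * other.a

    def __add__(self, other):
        # a/b + c/d = (ad + bc)/(bd)
        return Frac(self.a * other.b + self.b * other.a, self.b * other.b)

    def __mul__(self, other):
        # a/b * c/d = (ac)/(bd)
        return Frac(self.a * other.a, self.b * other.b)

    def inverse(self):
        # b/a is defined exactly when a != 0, i.e. when a/b != 0/1
        return Frac(self.b, self.a)

half, third = Frac(1, 2), Frac(1, 3)
same = Frac(2, 4) == half            # different pairs, same class
total = half + third                 # the pair (5, 6)
unit = Frac(2, 3) * Frac(2, 3).inverse()
```

Note that `bd` is non-zero in `__add__` and `__mul__` only because $\mathbb{Z}$ is an integral domain, which is the same point at which the proof uses that hypothesis.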
This gives us a very useful tool. Since this gives us a field from an integral
domain, this allows us to use field techniques to study integral domains. Moreover,
we can use this to construct new interesting fields from integral domains.
Example. Consider the integral domain $\mathbb{C}[X]$. Its field of fractions is the field of all rational functions $\frac{p(X)}{q(X)}$, where $p, q \in \mathbb{C}[X]$ and $q \neq 0$.
To some people, it is a shame to think of rings as having elements. Instead,
we should think of a ring as a god-like object, and the only things we should
ever mention are its ideals. We should also not think of the ideals as containing
elements, but just some abstract objects, and all we know is how ideals relate to
one another, e.g. if one contains the other.
Under this philosophy, we can think of a field as follows:
Lemma. A (non-zero) ring $R$ is a field if and only if its only ideals are $\{0\}$ and $R$.
Note that we don’t need elements to define the ideals $\{0\}$ and $R$. $\{0\}$ can be defined as the ideal that all other ideals contain, and $R$ is the ideal that contains all other ideals. Alternatively, we can reword this as “$R$ is a field if and only if it has only two ideals” to avoid mentioning explicit ideals.
Proof. ($\Rightarrow$) Let $I \lhd R$ and $R$ be a field. Suppose $x \neq 0 \in I$. Then as $x$ is a unit, $I = R$.
($\Leftarrow$) Suppose $x \neq 0 \in R$. Then $(x)$ is an ideal of $R$. It is not $\{0\}$ since it contains $x$. So $(x) = R$. In other words $1_R \in (x)$. But $(x)$ is defined to be $\{x \cdot y : y \in R\}$. So there is some $u \in R$ such that $x \cdot u = 1_R$. So $x$ is a unit. Since $x$ was arbitrary, $R$ is a field.
This is another reason why fields are special. They have the simplest possible
ideal structure.
This motivates the following definition:
Definition (Maximal ideal). An ideal $I$ of a ring $R$ is maximal if $I \neq R$ and for any ideal $J$ with $I \leq J \leq R$, either $J = I$ or $J = R$.
The relation with what we’ve done above is quite simple. There is an easy
way to recognize if an ideal is maximal.
Lemma. An ideal $I \lhd R$ is maximal if and only if $R/I$ is a field.
Proof. $R/I$ is a field if and only if $\{0\}$ and $R/I$ are the only ideals of $R/I$. By the ideal correspondence, this is equivalent to saying $I$ and $R$ are the only ideals of $R$ which contain $I$, i.e. $I$ is maximal. So done.
This is a nice result. This makes a correspondence between properties of
ideals I and properties of the quotient R/I. Here is another one:
Definition (Prime ideal). An ideal $I$ of a ring $R$ is prime if $I \neq R$ and whenever $a, b \in R$ are such that $a \cdot b \in I$, then $a \in I$ or $b \in I$.
This is like the opposite of the property of being an ideal: being an ideal means if we have something in the ideal and something outside, the product is always in the ideal. This does the backwards. If the product of two random things is in the ideal, then one of them must be from the ideal.
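Example. A non-zero ideal $n\mathbb{Z} \lhd \mathbb{Z}$ is prime if and only if $n$ is a prime.
To show this, first suppose $n = p$ is a prime, and $a \cdot b \in p\mathbb{Z}$. So $p \mid a \cdot b$. So $p \mid a$ or $p \mid b$, i.e. $a \in p\mathbb{Z}$ or $b \in p\mathbb{Z}$.
For the other direction, suppose $n = pq$ is a composite number ($p, q \neq 1$). Then $n \in n\mathbb{Z}$ but $p \notin n\mathbb{Z}$ and $q \notin n\mathbb{Z}$, since $0 < p, q < n$. So $n\mathbb{Z}$ is not prime.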
So instead of talking about prime numbers, we can talk about prime ideals
instead, because ideals are better than elements.
We prove a result similar to the above:
Lemma. An ideal $I \lhd R$ is prime if and only if $R/I$ is an integral domain.
Proof. Let $I$ be prime. Let $a + I, b + I \in R/I$, and suppose $(a + I)(b + I) = 0_{R/I}$. By definition, $(a + I)(b + I) = ab + I$. So we must have $ab \in I$. As $I$ is prime, either $a \in I$ or $b \in I$. So $a + I = 0_{R/I}$ or $b + I = 0_{R/I}$. So $R/I$ is an integral domain.
Conversely, suppose $R/I$ is an integral domain. Let $a, b \in R$ be such that $ab \in I$. Then $(a + I)(b + I) = ab + I = 0_{R/I} \in R/I$. Since $R/I$ is an integral domain, either $a + I = 0_{R/I}$ or $b + I = 0_{R/I}$, i.e. $a \in I$ or $b \in I$. So $I$ is a prime ideal.
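For $R = \mathbb{Z}$ and $I = n\mathbb{Z}$, this lemma combined with the earlier example says that $\mathbb{Z}/n\mathbb{Z}$ has no zero divisors exactly when $n$ is prime. The brute-force Python sketch below checks this for small $n$. The function names and the search range are illustrative choices.

```python
from math import isqrt

def has_no_zero_divisors(n):
    """True if no two non-zero residues multiply to 0 in Z/nZ (n >= 2)."""
    return all((a * b) % n != 0 for a in range(1, n) for b in range(1, n))

def is_prime(n):
    """Trial division primality test."""
    return n >= 2 and all(n % d != 0 for d in range(2, isqrt(n) + 1))

# Compare the two properties for every n from 2 to 39.
agree = all(has_no_zero_divisors(n) == is_prime(n) for n in range(2, 40))
```

The two predicates agree on the whole range, with $\mathbb{Z}/6\mathbb{Z}$ failing because of $2 \cdot 3 = 0$.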
Prime ideals and maximal ideals are the main types of ideals we care about.
Note that every field is an integral domain. So we immediately have the following
result:
Proposition. Every maximal ideal is a prime ideal.
Proof. $I \lhd R$ is maximal implies $R/I$ is a field implies $R/I$ is an integral domain implies $I$ is prime.
The converse is not true. For example, $\{0\} \subseteq \mathbb{Z}$ is prime but not maximal. Less stupidly, $(X) \lhd \mathbb{Z}[X, Y]$ is prime but not maximal (since $\mathbb{Z}[X, Y]/(X) \cong \mathbb{Z}[Y]$). We can provide a more explicit proof of the proposition, which is essentially the same.
Alternative proof. Let $I$ be a maximal ideal, and suppose $a, b \notin I$ but $ab \in I$. Then by maximality, $I + (a) = I + (b) = R = (1)$. So we can find some $p, q \in R$ and $n, m \in I$ such that $n + ap = m + bq = 1$. Then
\[ 1 = (n + ap)(m + bq) = nm + apm + bqn + abpq \in I, \]
since $n, m, ab \in I$. This is a contradiction.
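Lemma. Let $R$ be an integral domain. Then its characteristic is either 0 or a prime number.
Proof. Consider the unique map $\varphi : \mathbb{Z} \to R$, and $\ker(\varphi) = n\mathbb{Z}$. Then $n$ is the characteristic of $R$ by definition.
By the first isomorphism theorem, $\mathbb{Z}/n\mathbb{Z} \cong \operatorname{im}(\varphi) \leq R$. So $\mathbb{Z}/n\mathbb{Z}$ is a subring of an integral domain, hence an integral domain. So $n\mathbb{Z} \lhd \mathbb{Z}$ is a prime ideal. So $n = 0$ or a prime number.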
2.4 Factorization in integral domains
We now move on to tackle the problem of factorization in rings. For sanity,
we suppose throughout the section that
R
is an integral domain. We start by
making loads of definitions.
Definition (Unit). An element $a \in R$ is a unit if there is a $b \in R$ such that $ab = 1_R$. Equivalently, if the ideal $(a) = R$.
Definition (Division). For elements $a, b \in R$, we say $a$ divides $b$, written $a \mid b$, if there is a $c \in R$ such that $b = ac$. Equivalently, if $(b) \subseteq (a)$.
Definition (Associates). We say $a, b \in R$ are associates if $a = bc$ for some unit $c$. Equivalently, if $(a) = (b)$. Equivalently, if $a \mid b$ and $b \mid a$.
In the integers, this can only happen if $a$ and $b$ differ by a sign, but in more interesting rings, more interesting things can happen.
When considering division in rings, we often consider two associates to be “the same”. For example, in $\mathbb{Z}$, we can factorize 6 as
\[ 6 = 2 \cdot 3 = (-2) \cdot (-3), \]
but this does not violate unique factorization, since 2 and $-2$ are associates (and so are 3 and $-3$), and we consider these two factorizations to be “the same”.
Definition (Irreducible). We say $a \in R$ is irreducible if $a \neq 0$, $a$ is not a unit, and if $a = xy$, then $x$ or $y$ is a unit.
For integers, being irreducible is the same as being a prime number. However, “prime” means something different in general rings.
Definition (Prime). We say $a \in R$ is prime if $a$ is non-zero, not a unit, and whenever $a \mid xy$, either $a \mid x$ or $a \mid y$.
It is important to note all these properties depend on the ring, not just the
element itself.
Example. $2 \in \mathbb{Z}$ is a prime, but $2 \in \mathbb{Q}$ is not (since it is a unit).
Similarly, the polynomial $2X \in \mathbb{Q}[X]$ is irreducible (since 2 is a unit), but $2X \in \mathbb{Z}[X]$ is not irreducible.
We have two things called prime, so they had better be related.
Lemma. A principal ideal $(r)$ is a prime ideal in $R$ if and only if $r = 0$ or $r$ is prime.
Proof. ($\Rightarrow$) Let $(r)$ be a prime ideal. If $r = 0$, then done. Otherwise, as prime ideals are proper, i.e. not the whole ring, $r$ is not a unit. Now suppose $r \mid a \cdot b$. Then $a \cdot b \in (r)$. But $(r)$ is prime. So $a \in (r)$ or $b \in (r)$. So $r \mid a$ or $r \mid b$. So $r$ is prime.
($\Leftarrow$) If $r = 0$, then $(0) = \{0\} \lhd R$, which is prime since $R$ is an integral domain. Otherwise, let $r \neq 0$ be prime. Then $(r) \neq R$ since $r$ is not a unit. Suppose $a \cdot b \in (r)$. This means $r \mid a \cdot b$. So $r \mid a$ or $r \mid b$. So $a \in (r)$ or $b \in (r)$. So $(r)$ is prime.
Note that in
Z
, prime numbers exactly match the irreducibles, but prime
numbers are also prime (surprise!). In general, it is not true that irreducibles
are the same as primes. However, one direction is always true.
Lemma. Let $r \in R$ be prime. Then it is irreducible.
Proof. Let $r \in R$ be prime, and suppose $r = ab$. Since $r \mid r = ab$, and $r$ is prime, we must have $r \mid a$ or $r \mid b$. wlog, $r \mid a$. So $a = rc$ for some $c \in R$. So $r = ab = rcb$. Since we are in an integral domain and $r \neq 0$, we must have $1 = cb$. So $b$ is a unit.
We now do a long interesting example.
Example. Let
\[ R = \mathbb{Z}[\sqrt{-5}] = \{a + b\sqrt{-5} : a, b \in \mathbb{Z}\} \leq \mathbb{C}. \]
By definition, it is a subring of a field. So it is an integral domain. What are the units of the ring? There is a nice trick we can use, when things are lying inside $\mathbb{C}$. Consider the function
\[ N : R \to \mathbb{Z}_{\geq 0} \]
given by
\[ a + b\sqrt{-5} \mapsto a^2 + 5b^2. \]
It is convenient to think of this as $z \mapsto z\bar{z} = |z|^2$. This satisfies $N(z \cdot w) = N(z)N(w)$. This is a desirable thing to have for a ring, since it immediately implies all units have norm 1: if $r \cdot s = 1$, then $1 = N(1) = N(rs) = N(r)N(s)$. So $N(r) = N(s) = 1$.
So to find the units, we need to solve $a^2 + 5b^2 = 1$, for $a$ and $b$ integers. The only solutions are $a = \pm 1$, $b = 0$. So only $\pm 1 \in R$ can be units, and these obviously are units. So these are all the units.
Next, we claim $2 \in R$ is irreducible. We again use the norm. Suppose $2 = ab$. Then $4 = N(2) = N(a)N(b)$. Now note that nothing has norm 2: $a^2 + 5b^2$ can never be 2 for integers $a, b \in \mathbb{Z}$. So we must have, wlog, $N(a) = 4$, $N(b) = 1$. So $b$ must be a unit. Similarly, we see that $3$, $1 + \sqrt{-5}$, $1 - \sqrt{-5}$ are irreducible (since there is also no element of norm 3).
We have four irreducible elements in this ring. Are they prime? No! Note that
\[ (1 + \sqrt{-5})(1 - \sqrt{-5}) = 6 = 2 \cdot 3. \]
We now claim 2 does not divide $1 + \sqrt{-5}$ or $1 - \sqrt{-5}$. So 2 is not prime.
To show this, suppose $2 \mid 1 + \sqrt{-5}$. Then $N(2) \mid N(1 + \sqrt{-5})$. But $N(2) = 4$ and $N(1 + \sqrt{-5}) = 6$, and $4 \nmid 6$. Similarly, $N(1 - \sqrt{-5}) = 6$ as well. So $2 \nmid 1 \pm \sqrt{-5}$.
There are several life lessons here. First is that primes and irreducibles are
not the same thing in general. We’ve always thought they were the same because
we’ve been living in the fantasy land of the integers. But we need to grow up.
The second one is that factorization into irreducibles is not necessarily unique, since $2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5})$ are two factorizations into irreducibles.
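The norm computations in the $\mathbb{Z}[\sqrt{-5}]$ example are small enough to verify by machine. The sketch below represents $a + b\sqrt{-5}$ as the pair `(a, b)`. The function names and the size of the search box are choices made for this illustration.

```python
def norm(a, b):
    """N(a + b*sqrt(-5)) = a^2 + 5*b^2."""
    return a * a + 5 * b * b

def mul(x, y):
    """Product (a + b*s)(c + d*s) where s = sqrt(-5), so s^2 = -5."""
    a, b = x
    c, d = y
    return (a * c - 5 * b * d, a * d + b * c)

# a^2 + 5*b^2 <= 3 forces b = 0 and |a| <= 1, so this box contains
# every element that could possibly have norm 2 or 3.
small_norms = {norm(a, b) for a in range(-3, 4) for b in range(-3, 4)}

product = mul((1, 1), (1, -1))        # (1 + s)(1 - s)
divides = norm(1, 1) % norm(2, 0) == 0  # would be needed for 2 | 1 + s
```

No element has norm 2 or 3, the product is $6$, and $N(2) = 4$ does not divide $N(1 + \sqrt{-5}) = 6$, in agreement with the argument above.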
However, there is one situation when unique factorization holds. This is when we have a Euclidean algorithm available.
Definition (Euclidean domain). An integral domain $R$ is a Euclidean domain (ED) if there is a Euclidean function $\varphi : R \setminus \{0\} \to \mathbb{Z}_{\geq 0}$ such that
(i) $\varphi(a \cdot b) \geq \varphi(b)$ for all $a, b \neq 0$
(ii) If $a, b \in R$, with $b \neq 0$, then there are $q, r \in R$ such that
\[ a = b \cdot q + r, \]
and either $r = 0$ or $\varphi(r) < \varphi(b)$.
What are examples? Every time in this course where we said “Euclidean
algorithm”, we have an example.
Example. Z is a Euclidean domain with φ(n) = |n|.
Example. For any field $F$, $F[X]$ is a Euclidean domain with
\[ \varphi(f) = \deg(f). \]
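Example. The Gaussian integers $R = \mathbb{Z}[i] \leq \mathbb{C}$ is a Euclidean domain with $\varphi(z) = N(z) = |z|^2$. We now check this:
(i) We have $\varphi(zw) = \varphi(z)\varphi(w) \geq \varphi(z)$, since $\varphi(w)$ is a positive integer.
(ii) Given $a, b \in \mathbb{Z}[i]$, $b \neq 0$. We consider the complex number
\[ \frac{a}{b} \in \mathbb{C}. \]
Consider the following complex plane, where the red dots are points in $\mathbb{Z}[i]$.
[Figure: the complex plane with axes Re and Im, the lattice points of $\mathbb{Z}[i]$ marked, and the point $\frac{a}{b}$.]
By looking at the picture, we know that there is some $q \in \mathbb{Z}[i]$ such that $\left|\frac{a}{b} - q\right| < 1$. So we can write
\[ \frac{a}{b} = q + c \]
with $|c| < 1$. Then we have
\[ a = b \cdot q + \underbrace{b \cdot c}_{r}. \]
We know $r = a - bq \in \mathbb{Z}[i]$, and $\varphi(r) = N(bc) = N(b)N(c) < N(b) = \varphi(b)$. So done.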
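The “look at the picture” step amounts to rounding the real and imaginary parts of $\frac{a}{b}$ to the nearest integers, which puts $q$ within distance $\frac{1}{\sqrt{2}} < 1$ of $\frac{a}{b}$. Here is a minimal Python sketch using exact integer arithmetic, with Gaussian integers stored as pairs `(re, im)`. The function name `gauss_divmod` and this representation are assumptions made for illustration.

```python
def gauss_divmod(a, b):
    """Division with remainder in Z[i]. a and b are pairs (re, im), b != (0, 0).
    Returns (q, r) with a = b*q + r and N(r) < N(b)."""
    (a0, a1), (b0, b1) = a, b
    nb = b0 * b0 + b1 * b1            # N(b)
    # a / b = a * conj(b) / N(b); x and y are the integer numerators
    x = a0 * b0 + a1 * b1
    y = a1 * b0 - a0 * b1
    # nearest integers to x/nb and y/nb, computed without floats
    q0 = (2 * x + nb) // (2 * nb)
    q1 = (2 * y + nb) // (2 * nb)
    # r = a - b*q
    r0 = a0 - (b0 * q0 - b1 * q1)
    r1 = a1 - (b0 * q1 + b1 * q0)
    return (q0, q1), (r0, r1)

q, r = gauss_divmod((27, 23), (8, 1))   # divide 27 + 23i by 8 + i
```

For this input the quotient is $4 + 2i$ and the remainder is $-3 + 3i$, whose norm 18 is less than $N(8 + i) = 65$.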
This is not just true for the Gaussian integers. All we really needed was that $R \leq \mathbb{C}$, and for any $x \in \mathbb{C}$, there is some point in $R$ that is less than 1 away from $x$. If we draw some more pictures, we will see this is not true for $\mathbb{Z}[\sqrt{-5}]$.
Before we move on to prove unique factorization, we first derive something we’ve previously mentioned. Recall we showed that every ideal in $\mathbb{Z}$ is principal, and we proved this by the Euclidean algorithm. So we might expect this to be true in an arbitrary Euclidean domain.
Definition (Principal ideal domain). A ring $R$ is a principal ideal domain (PID) if it is an integral domain, and every ideal is a principal ideal, i.e. for all $I \lhd R$, there is some $a$ such that $I = (a)$.
Example. Z is a principal ideal domain.
Proposition. Let $R$ be a Euclidean domain. Then $R$ is a principal ideal domain.
We have already proved this, just that we did it for a particular Euclidean
domain Z. Nonetheless, we shall do it again.
Proof. Let $R$ have a Euclidean function $\varphi : R \setminus \{0\} \to \mathbb{Z}_{\geq 0}$. We let $I \lhd R$ be a non-zero ideal, and let $b \in I \setminus \{0\}$ be an element with $\varphi(b)$ minimal. Then for any $a \in I$, we write
\[ a = bq + r, \]
with $r = 0$ or $\varphi(r) < \varphi(b)$. However, any such $r$ must be in $I$ since $r = a - bq \in I$. So we cannot have $\varphi(r) < \varphi(b)$. So we must have $r = 0$. So $a = bq$. So $a \in (b)$. Since this is true for all $a \in I$, we must have $I \subseteq (b)$. On the other hand, since $b \in I$, we must have $(b) \subseteq I$. So we must have $I = (b)$.
This is exactly, word by word, the same proof as we gave for the integers,
except we replaced the absolute value with φ.
Example.
Z
is a Euclidean domain, and hence a principal ideal domain. Also,
for any field F, F[X] is a Euclidean domain, hence a principal ideal domain.
Also, Z[i] is a Euclidean domain, and hence a principal ideal domain.
What is a non-example of principal ideal domains? In $\mathbb{Z}[X]$, the ideal $(2, X) \lhd \mathbb{Z}[X]$ is not a principal ideal. Suppose it were. Then $(2, X) = (f)$. Since $2 \in (2, X) = (f)$, we know $2 \in (f)$, i.e. $2 = f \cdot g$ for some $g$. So $f$ has degree zero, and hence is constant. So $f = \pm 1$ or $\pm 2$.
If $f = \pm 1$, since $\pm 1$ are units, then $(f) = \mathbb{Z}[X]$. But $(2, X) \neq \mathbb{Z}[X]$, since, say, $1 \notin (2, X)$. If $f = \pm 2$, then since $X \in (2, X) = (f)$, we must have $\pm 2 \mid X$, but this is clearly false. So $(2, X)$ cannot be a principal ideal.
Example. Let $A \in M_{n \times n}(F)$ be an $n \times n$ matrix over a field $F$. We consider the following set
\[ I = \{f \in F[X] : f(A) = 0\}. \]
This is an ideal: if $f, g \in I$, then $(f + g)(A) = f(A) + g(A) = 0$. Similarly, if $f \in I$ and $h \in F[X]$, then $(fh)(A) = f(A)h(A) = 0$.
But we know $F[X]$ is a principal ideal domain. So there must be some $m \in F[X]$ such that $I = (m)$.
Suppose $f \in F[X]$ is such that $f(A) = 0$, i.e. $f \in I$. Then $m \mid f$. So $m$ is a polynomial that divides all polynomials that kill $A$, i.e. $m$ is the minimal polynomial of $A$.
We have just proved that all matrices have minimal polynomials, and that the minimal polynomial divides all other polynomials that kill $A$. Also, the minimal polynomial is unique up to multiplication by units.
Let’s get further into number theory-like things. For a general ring, we
cannot factorize things into irreducibles uniquely. However, in some rings, this
is possible.
Definition (Unique factorization domain). An integral domain $R$ is a unique factorization domain (UFD) if
(i) Every non-zero non-unit may be written as a product of irreducibles;
(ii) If $p_1 p_2 \cdots p_n = q_1 \cdots q_m$ with $p_i, q_j$ irreducibles, then $n = m$, and they can be reordered such that $p_i$ is an associate of $q_i$.
This is a really nice property, and here we can do things we are familiar with
in number theory. So how do we know if something is a unique factorization
domain?
Our goal is to show that all principal ideal domains are unique factorization
domains. To do so, we are going to prove several lemmas that give us some
really nice properties of principal ideal domains.
Recall we saw that every prime is an irreducible, but in $\mathbb{Z}[\sqrt{-5}]$, there are some irreducibles that are not prime. However, this cannot happen in principal ideal domains.
Lemma. Let $R$ be a principal ideal domain. If $p \in R$ is irreducible, then it is prime.
Note that this is also true for general unique factorization domains, which we can prove directly by unique factorization.
Proof. Let $p \in R$ be irreducible, and suppose $p \mid a \cdot b$. Also, suppose $p \nmid a$. We need to show $p \mid b$.
Consider the ideal $(p, a) \lhd R$. Since $R$ is a principal ideal domain, there is some $d \in R$ such that $(p, a) = (d)$. So $d \mid p$ and $d \mid a$.
Since $d \mid p$, there is some $q_1$ such that $p = q_1 d$. As $p$ is irreducible, either $q_1$ or $d$ is a unit.
If $q_1$ is a unit, then $d = q_1^{-1} p$, and this divides $a$. So $a = q_1^{-1} p x$ for some $x$. This is a contradiction, since $p \nmid a$.
Therefore $d$ is a unit. So $(p, a) = (d) = R$. In particular, $1_R \in (p, a)$. So suppose $1_R = rp + sa$, for some $r, s \in R$. We now take the whole thing and multiply by $b$. Then we get
\[ b = rpb + sab. \]
We observe that $ab$ is divisible by $p$, and so is $rpb$. So $b$ is divisible by $p$. So done.
This is similar to the argument for integers. For integers, we would say if $p \nmid a$, then $p$ and $a$ are coprime. Therefore there are some $r, s$ such that $1 = rp + sa$. Then we continue the proof as above. Hence what we did in the middle is to do something similar to showing $p$ and $a$ are “coprime”.
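To see the integer version of this argument concretely, here is a minimal sketch in Python. The helper name `extended_gcd` and the sample values of `p`, `a`, `b` are our own choices for illustration and do not come from the lectures. It finds $r, s$ with $1 = rp + sa$ and then multiplies through by $b$, as in the proof.

```python
def extended_gcd(x, y):
    """Return (d, r, s) with d = gcd(x, y) and d == r*x + s*y (x, y positive)."""
    old_r, r = x, y
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

# Illustrative values: p is prime, p does not divide a, but p divides a*b.
p, a, b = 7, 10, 21
d, r, s = extended_gcd(p, a)
# (p, a) is the whole ring, i.e. 1 = r*p + s*a.
print(d, r * p + s * a)
# Multiplying by b: both r*p*b and s*a*b are divisible by p, hence so is b.
print(b == r * p * b + s * a * b, b % p == 0)
```

In a general principal ideal domain there is no algorithm like this available, which is why the proof works with the ideal $(p, a)$ instead.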
Another nice property of principal ideal domains is the following:
Lemma. Let $R$ be a principal ideal domain. Let $I_1 \subseteq I_2 \subseteq I_3 \subseteq \cdots$ be a chain of ideals. Then there is some $N \in \mathbb{N}$ such that $I_n = I_{n+1}$ for all $n \geq N$.
So in a principal ideal domain, we cannot have an infinite chain of bigger
and bigger ideals.
Definition (Ascending chain condition). A ring satisfies the ascending chain
condition (ACC) if there is no infinite strictly increasing chain of ideals.
Definition (Noetherian ring). A ring that satisfies the ascending chain condition
is known as a Noetherian ring.
So we are proving that every principal ideal domain is Noetherian.
Proof. The obvious thing to do when we have an infinite chain of ideals is to take the union of them. We let
\[ I = \bigcup_{n \geq 1} I_n, \]
which is again an ideal. Since $R$ is a principal ideal domain, $I = (a)$ for some $a \in R$. We know $a \in I = \bigcup_{n \geq 1} I_n$. So $a \in I_N$ for some $N$. Then we have
\[ (a) \subseteq I_N \subseteq I = (a). \]
So we must have $I_N = I$. So $I_n = I_N = I$ for all $n \geq N$.
Notice it is not important that $I$ is generated by one element. If, for some reason, we know $I$ is generated by finitely many elements, then the same argument works. So if every ideal is finitely generated, then the ring must be Noetherian. It turns out this is an if-and-only-if: if you are Noetherian, then every ideal is finitely generated. We will prove this later on in the course.
Finally, we have done the setup, and we can prove the proposition promised.
Proposition. Let $R$ be a principal ideal domain. Then $R$ is a unique factorization domain.
Proof. We first need to show any non-zero non-unit $r \in R$ is a product of irreducibles.
Suppose $r \in R$ cannot be factored as a product of irreducibles. Then it is certainly not irreducible. So we can write $r = r_1 s_1$, with $r_1, s_1$ both non-units. Since $r$ cannot be factored as a product of irreducibles, wlog $r_1$ cannot be factored as a product of irreducibles (if both can, then $r$ would be a product of irreducibles). So we can write $r_1 = r_2 s_2$, with $r_2, s_2$ not units. Again, wlog $r_2$ cannot be factored as a product of irreducibles. We continue this way.
By assumption, the process does not end, and then we have the following chain of ideals:
\[ (r) \subseteq (r_1) \subseteq (r_2) \subseteq \cdots \subseteq (r_n) \subseteq \cdots \]
But then we have an ascending chain of ideals. By the ascending chain condition, these are all eventually equal, i.e. there is some $n$ such that $(r_n) = (r_{n+1}) = (r_{n+2}) = \cdots$. In particular, since $(r_n) = (r_{n+1})$, and $r_n = r_{n+1} s_{n+1}$, then $s_{n+1}$ is a unit. But this is a contradiction, since $s_{n+1}$ is not a unit. So $r$ must be a product of irreducibles.
To show uniqueness, we let $p_1 p_2 \cdots p_n = q_1 q_2 \cdots q_m$, with $p_i, q_i$ irreducible. So in particular $p_1 \mid q_1 \cdots q_m$. Since $p_1$ is irreducible, it is prime. So $p_1$ divides some $q_i$. We reorder and suppose $p_1 \mid q_1$. So $q_1 = p_1 \cdot a$ for some $a$. But since $q_1$ is irreducible, $a$ must be a unit. So $p_1, q_1$ are associates. Since $R$ is a principal ideal domain, hence integral domain, we can cancel $p_1$ to obtain
\[ p_2 p_3 \cdots p_n = (a q_2) q_3 \cdots q_m. \]
We now rename $a q_2$ as $q_2$, so that we in fact have
\[ p_2 p_3 \cdots p_n = q_2 q_3 \cdots q_m. \]
We can then continue to show that $p_i$ and $q_i$ are associates for all $i$. This also shows that $n = m$, or else if $n = m + k$, say, then $p_{m+1} \cdots p_n$ is a unit, which is a contradiction since irreducibles are not units.
We can now use this to define other familiar notions from number theory.
Definition (Greatest common divisor). $d$ is a greatest common divisor (gcd) of $a_1, a_2, \cdots, a_n$ if $d \mid a_i$ for all $i$, and if any other $d'$ satisfies $d' \mid a_i$ for all $i$, then $d' \mid d$.
Note that the gcd of a set of numbers, if it exists, is not unique. It is only well-defined up to a unit.
This is a definition that says what it means to be a greatest common divisor. However, it does not always have to exist.
Lemma. Let $R$ be a unique factorization domain. Then greatest common divisors exist, and are unique up to associates.
Proof. We construct the greatest common divisor using the good old way of prime factorization.
We let $p_1, p_2, \cdots, p_m$ be a list of all irreducible factors of the $a_i$, such that no two of these are associates of each other. We now write
\[ a_i = u_i \prod_{j=1}^m p_j^{n_{ij}}, \]
where $n_{ij} \in \mathbb{N}$ and $u_i$ are units. We let
\[ m_j = \min_i \{n_{ij}\}, \]
and choose
\[ d = \prod_{j=1}^m p_j^{m_j}. \]
As, by definition, $m_j \leq n_{ij}$ for all $i$, we know $d \mid a_i$ for all $i$.
Finally, if $d' \mid a_i$ for all $i$, then we let
\[ d' = v \prod_{j=1}^m p_j^{t_j}. \]
Then we must have $t_j \leq n_{ij}$ for all $i, j$. So we must have $t_j \leq m_j$ for all $j$. So $d' \mid d$.
Uniqueness is immediate since any two greatest common divisors have to divide each other.
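The construction in this proof can be tried out in the UFD $\mathbb{Z}$. The following is only a sketch: the function names `factorize` and `gcd_by_factorization` are ours, the inputs are assumed to be positive integers, and trial division is used purely for simplicity.

```python
from collections import Counter

def factorize(n):
    """Prime factorization of a positive integer as {prime: exponent}."""
    factors = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] += 1
            n //= p
        p += 1
    if n > 1:
        factors[n] += 1
    return factors

def gcd_by_factorization(numbers):
    """gcd as the product of p_j ** m_j, where m_j = min_i n_ij."""
    facs = [factorize(a) for a in numbers]
    primes = set().union(*facs)        # all irreducible factors of the a_i
    d = 1
    for p in primes:
        # A Counter returns exponent 0 for primes that do not occur.
        d *= p ** min(f[p] for f in facs)
    return d

print(gcd_by_factorization([12, 18, 30]))
```

In practice one would use the Euclidean algorithm instead, but that is only available in Euclidean domains, whereas the argument above works in any UFD.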
2.5 Factorization in polynomial rings
Since polynomial rings are a bit more special than general integral domains, we
can say a bit more about them.
Recall that for $F$ a field, we know $F[X]$ is a Euclidean domain, hence a principal ideal domain, hence a unique factorization domain. Therefore we know
(i) If $I \lhd F[X]$, then $I = (f)$ for some $f \in F[X]$.
(ii) If $f \in F[X]$, then $f$ is irreducible if and only if $f$ is prime.
(iii) Let $f$ be irreducible, and suppose $(f) \subseteq J \subseteq F[X]$. Then $J = (g)$ for some $g$. Since $(f) \subseteq (g)$, we must have $f = gh$ for some $h$. But $f$ is irreducible. So either $g$ or $h$ is a unit. If $g$ is a unit, then $(g) = F[X]$. If $h$ is a unit, then $(f) = (g)$. So $(f)$ is a maximal ideal. Note that this argument is valid for any PID, not just polynomial rings.
(iv) Let $(f)$ be a non-zero prime ideal. Then $f$ is prime. So $f$ is irreducible. So $(f)$ is maximal. But we also know in complete generality that maximal ideals are prime. So in $F[X]$, non-zero prime ideals are the same as maximal ideals. Again, this is true for all PIDs in general.
(v) Thus $f$ is irreducible if and only if $F[X]/(f)$ is a field.
To use the last item, we can first show that $F[X]/(f)$ is a field, and then use this to deduce that $f$ is irreducible. But we can also do something more interesting: find an irreducible $f$, and then generate an interesting field $F[X]/(f)$.
So we want to understand reducibility, i.e. we want to know whether we can factorize a polynomial $f$. Firstly, we want to get rid of the trivial case where we just factor out a scalar, e.g. $2X^2 + 2 = 2(X^2 + 1) \in \mathbb{Z}[X]$ is a boring factorization.
Definition (Content). Let $R$ be a UFD and $f = a_0 + a_1 X + \cdots + a_n X^n \in R[X]$. The content $c(f)$ of $f$ is
\[ c(f) = \gcd(a_0, a_1, \cdots, a_n) \in R. \]
Again, since the gcd is only defined up to a unit, so is the content.
Definition (Primitive polynomial). A polynomial $f$ is primitive if $c(f)$ is a unit, i.e. the $a_i$ are coprime.
Note that this is the best we can do. We cannot ask for $c(f)$ to be exactly 1, since the gcd is only well-defined up to a unit.
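For $R = \mathbb{Z}$ these definitions are easy to experiment with. In the sketch below (our own illustration, not part of the lectures), a polynomial is represented by its list of coefficients `[a_0, ..., a_n]`, and `content` picks the non-negative representative of the gcd, which is one of the two possible choices up to the units $\pm 1$.

```python
from functools import reduce
from math import gcd

def content(coeffs):
    """Non-negative representative of the content of an integer polynomial."""
    return reduce(gcd, coeffs, 0)

def primitive_part(coeffs):
    """Divide out the content; assumes the polynomial is non-zero."""
    c = content(coeffs)
    return [a // c for a in coeffs]

boring = [2, 0, 2]                 # 2X^2 + 2
print(content(boring))             # the scalar we can factor out
print(primitive_part(boring))      # coefficients of X^2 + 1
print(content([1, 1, 0, 1]) == 1)  # X^3 + X + 1 is primitive
```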
We now want to prove the following important lemma:
Lemma (Gauss’ lemma). Let $R$ be a UFD, and $f \in R[X]$ be a primitive polynomial. Then $f$ is reducible in $R[X]$ if and only if $f$ is reducible in $F[X]$, where $F$ is the field of fractions of $R$.
We can’t do this right away. We first need some preparation. Before that,
we do some examples.
Example. Consider $f = X^3 + X + 1 \in \mathbb{Z}[X]$. This has content 1 so is primitive. We show it is not reducible in $\mathbb{Z}[X]$, and hence not reducible in $\mathbb{Q}[X]$.
Suppose $f$ is reducible in $\mathbb{Q}[X]$. Then by Gauss’ lemma, this is reducible in $\mathbb{Z}[X]$. So we can write
\[ X^3 + X + 1 = gh, \]
for some polynomials $g, h \in \mathbb{Z}[X]$, with $g, h$ not units. But if $g$ and $h$ are not units, then they cannot be constant, since the coefficients of $X^3 + X + 1$ are all 1 or 0. So they have degree at least 1. Since the degrees add up to 3, we wlog suppose $g$ has degree 1 and $h$ has degree 2. So suppose
\[ g = b_0 + b_1 X, \quad h = c_0 + c_1 X + c_2 X^2. \]
Multiplying out and equating coefficients, we get
\[ b_0 c_0 = 1, \quad c_2 b_1 = 1. \]
So $b_0$ and $b_1$ must be $\pm 1$. So $g$ is either $1 + X$, $1 - X$, $-1 + X$ or $-1 - X$, and hence has $\pm 1$ as a root. But this is a contradiction, since $\pm 1$ is not a root of $X^3 + X + 1$. So $f$ is not reducible in $\mathbb{Q}[X]$. In particular, $f$ has no root in $\mathbb{Q}$.
We see the advantage of using Gauss’ lemma: if we worked in $\mathbb{Q}$ instead, we could have gotten to the step $b_0 c_0 = 1$, and then we can do nothing, since $b_0$ and $c_0$ can be many things if we live in $\mathbb{Q}$.
Now we start working towards proving this.
Lemma. Let $R$ be a UFD. If $f, g \in R[X]$ are primitive, then so is $fg$.
Proof. We let
\[ f = a_0 + a_1 X + \cdots + a_n X^n, \]
\[ g = b_0 + b_1 X + \cdots + b_m X^m, \]
where $a_n, b_m \neq 0$, and $f, g$ are primitive. We want to show that the content of $fg$ is a unit.
Now suppose $fg$ is not primitive. Then $c(fg)$ is not a unit. Since $R$ is a UFD, we can find an irreducible $p$ which divides $c(fg)$.
By assumption, $c(f)$ and $c(g)$ are units. So $p \nmid c(f)$ and $p \nmid c(g)$. So suppose $p \mid a_0$, $p \mid a_1$, \ldots, $p \mid a_{k-1}$ but $p \nmid a_k$. Note it is possible that $k = 0$. Similarly, suppose $p \mid b_0$, $p \mid b_1$, \ldots, $p \mid b_{\ell-1}$, $p \nmid b_\ell$.
We look at the coefficient of $X^{k+\ell}$ in $fg$. It is given by
\[ \sum_{i+j=k+\ell} a_i b_j = a_{k+\ell} b_0 + \cdots + a_{k+1} b_{\ell-1} + a_k b_\ell + a_{k-1} b_{\ell+1} + \cdots + a_0 b_{\ell+k}. \]
By assumption, this is divisible by $p$. So
\[ p \mid \sum_{i+j=k+\ell} a_i b_j. \]
However, the sum $a_{k+\ell} b_0 + \cdots + a_{k+1} b_{\ell-1}$ is divisible by $p$, as $p \mid b_j$ for $j < \ell$. Similarly, $a_{k-1} b_{\ell+1} + \cdots + a_0 b_{\ell+k}$ is divisible by $p$. So we must have $p \mid a_k b_\ell$. As $p$ is irreducible, and hence prime, we must have $p \mid a_k$ or $p \mid b_\ell$. This is a contradiction. So $c(fg)$ must be a unit.
Corollary. Let $R$ be a UFD. Then for $f, g \in R[X]$, we have that $c(fg)$ is an associate of $c(f)c(g)$.
Again, we cannot say they are equal, since content is only well-defined up to a unit.
Proof. We can write $f = c(f) f_1$ and $g = c(g) g_1$, with $f_1$ and $g_1$ primitive. Then
\[ fg = c(f) c(g) f_1 g_1. \]
Since $f_1 g_1$ is primitive, $c(f)c(g)$ is a gcd of the coefficients of $fg$, and so is $c(fg)$, by definition. So they are associates.
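We can sanity-check the corollary over $\mathbb{Z}$ on a small example. This is an illustrative sketch only: the helpers `poly_mul` and `content` and the two sample polynomials are our own, and `content` returns the non-negative representative, so "associate" becomes plain equality here.

```python
from functools import reduce
from math import gcd

def poly_mul(f, g):
    """Multiply polynomials given as coefficient lists [a_0, ..., a_n]."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

def content(coeffs):
    return reduce(gcd, coeffs, 0)

f = [6, 4, 10]      # 2 * (3 + 2X + 5X^2)
g = [9, 3]          # 3 * (3 + X)
fg = poly_mul(f, g)
print(content(f), content(g), content(fg))

# The primitive parts multiply to a primitive polynomial.
f1, g1 = [3, 2, 5], [3, 1]
print(content(poly_mul(f1, g1)))
```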
Finally, we can prove Gauss’ lemma.
Lemma (Gauss’ lemma). Let $R$ be a UFD, and $f \in R[X]$ be a primitive polynomial. Then $f$ is reducible in $R[X]$ if and only if $f$ is reducible in $F[X]$, where $F$ is the field of fractions of $R$.
Proof. We will show that a primitive $f \in R[X]$ is reducible in $R[X]$ if and only if $f$ is reducible in $F[X]$.
One direction is almost immediately obvious. Let $f = gh$ be a product in $R[X]$ with $g, h$ not units. As $f$ is primitive, so are $g$ and $h$. So both have degree $> 0$. So $g, h$ are not units in $F[X]$. So $f$ is reducible in $F[X]$.
The other direction is less obvious. We let $f = gh$ in $F[X]$, with $g, h$ not units. So $g$ and $h$ have degree $> 0$, since $F$ is a field. So we can clear denominators by finding $a, b \in R$ such that $(ag), (bh) \in R[X]$ (e.g. let $a$ be the product of denominators of coefficients of $g$). Then we get
\[ abf = (ag)(bh), \]
and this is a factorization in $R[X]$. Here we have to be careful: $(ag)$ is one thing that lives in $R[X]$, and is not necessarily a product in $R[X]$, since $g$ might not be in $R[X]$. So we should just treat it as a single symbol.
We now write
\[ (ag) = c(ag) g_1, \]
\[ (bh) = c(bh) h_1, \]
where $g_1, h_1$ are primitive. So we have
\[ ab = c(abf) = c((ag)(bh)) = u \cdot c(ag) c(bh), \]
where $u \in R$ is a unit, by the previous corollary. But also we have
\[ abf = c(ag) c(bh) g_1 h_1 = u^{-1} ab g_1 h_1. \]
So cancelling $ab$ gives
\[ f = u^{-1} g_1 h_1 \in R[X]. \]
Since $g_1, h_1$ have the same positive degrees as $g, h$, they are not units. So $f$ is reducible in $R[X]$.
If this looks fancy and magical, you can try to do this explicitly in the case
where R = Z and F = Q. Then you will probably get enlightened.
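Here is one way to carry out that explicit check, as a sketch rather than a definitive implementation. The factorization $6X^2 + 5X + 1 = (X + \frac{1}{2})(6X + 2)$ is an example we made up, the helper names are ours, and `math.lcm` assumes Python 3.9 or later.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm

def poly_mul(f, g):
    """Multiply polynomials given as coefficient lists [a_0, ..., a_n]."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

def clear_denominators(poly):
    """Return (a, a*poly) with a a positive integer and a*poly in Z[X]."""
    a = reduce(lcm, (c.denominator for c in poly), 1)
    return a, [int(c * a) for c in poly]

def primitive_part(coeffs):
    c = reduce(gcd, coeffs, 0)
    return [x // c for x in coeffs]

g = [Fraction(1, 2), Fraction(1)]   # X + 1/2, not in Z[X]
h = [Fraction(2), Fraction(6)]      # 6X + 2
f = poly_mul(g, h)                  # 6X^2 + 5X + 1, primitive in Z[X]

a, ag = clear_denominators(g)       # (ag) = 2X + 1
b, bh = clear_denominators(h)       # (bh) = 6X + 2
g1, h1 = primitive_part(ag), primitive_part(bh)
print(g1, h1, poly_mul(g1, h1) == f)
```

In this example the unit $u$ happens to be 1; for other inputs over $\mathbb{Z}$ the product $g_1 h_1$ may differ from $f$ by a sign.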
We will do another proof performed in a similar manner.
Proposition. Let $R$ be a UFD, and $F$ be its field of fractions. Let $g \in R[X]$ be primitive. We let
\[ J = (g) \lhd R[X], \quad I = (g) \lhd F[X]. \]
Then
\[ J = I \cap R[X]. \]
In other words, if $f \in R[X]$ and we can write it as $f = gh$, with $h \in F[X]$, then in fact $h \in R[X]$.
Proof. The strategy is the same: we clear denominators in the equation $f = gh$, and then use contents to get that down in $R[X]$.
We certainly have $J \subseteq I \cap R[X]$. Now let $f \in I \cap R[X]$. So we can write
\[ f = gh, \]
with $h \in F[X]$. So we can choose $b \in R$ such that $bh \in R[X]$. Then we know
\[ bf = g(bh) \in R[X]. \]
We let
\[ (bh) = c(bh) h_1, \]
for $h_1 \in R[X]$ primitive. Thus
\[ bf = c(bh) g h_1. \]
Since $g$ is primitive, so is $g h_1$. So $c(bh) = u c(bf)$ for $u$ a unit. But $bf$ is really a product in $R[X]$. So we have
\[ c(bf) = c(b) c(f) = b c(f). \]
So we have
\[ bf = u b c(f) g h_1. \]
Cancelling $b$ gives
\[ f = g (u c(f) h_1). \]
So $g \mid f$ in $R[X]$. So $f \in J$.
From this we can get ourselves a large class of UFDs.
Theorem. If $R$ is a UFD, then $R[X]$ is a UFD.
In particular, if $R$ is a UFD, then $R[X_1, \cdots, X_n]$ is also a UFD.
Proof. We know $R[X]$ has a notion of degree. So we will combine this with the fact that $R$ is a UFD.
Let $f \in R[X]$. We can write $f = c(f) f_1$, with $f_1$ primitive. Firstly, as $R$ is a UFD, we may factor
\[ c(f) = p_1 p_2 \cdots p_n, \]
for $p_i \in R$ irreducible (and also irreducible in $R[X]$). Now we want to deal with $f_1$.
If $f_1$ is not irreducible, then we can write
\[ f_1 = f_2 f_3, \]
with $f_2, f_3$ both not units. Since $f_1$ is primitive, $f_2, f_3$ also cannot be constants. So we must have $\deg f_2, \deg f_3 > 0$. Also, since $\deg f_2 + \deg f_3 = \deg f_1$, we must have $\deg f_2, \deg f_3 < \deg f_1$. If $f_2, f_3$ are irreducible, then done. Otherwise, keep on going. We will eventually stop since the degrees have to keep on decreasing. So we can write it as
\[ f_1 = q_1 \cdots q_m, \]
with $q_i$ irreducible. So we can write
\[ f = p_1 p_2 \cdots p_n q_1 q_2 \cdots q_m, \]
a product of irreducibles.
For uniqueness, we first deal with the $p$’s. We note that
\[ c(f) = p_1 p_2 \cdots p_n \]
is a unique factorization of the content, up to reordering and associates, as $R$ is a UFD. So cancelling the content, we only have to show that primitives can be factored uniquely.
Suppose we have two factorizations
\[ f_1 = q_1 q_2 \cdots q_m = r_1 r_2 \cdots r_\ell. \]
Note that each $q_i$ and each $r_i$ is a factor of the primitive polynomial $f_1$, so are also primitive. Now we do (maybe) the unexpected thing. We let $F$ be the field of fractions of $R$, and consider $q_i, r_i \in F[X]$. Since $F$ is a field, $F[X]$ is a Euclidean domain, hence principal ideal domain, hence unique factorization domain.
By Gauss’ lemma, since the $q_i$ and $r_i$ are irreducible in $R[X]$, they are also irreducible in $F[X]$. As $F[X]$ is a UFD, we find that $\ell = m$, and after reordering, $r_i$ and $q_i$ are associates, say
\[ r_i = u_i q_i, \]
with $u_i \in F[X]$ a unit. What we want to say is that $r_i$ is a unit times $q_i$ in $R[X]$. Firstly, note that $u_i \in F$ as it is a unit. Clearing denominators, we can write
\[ a_i r_i = b_i q_i \in R[X]. \]
Taking contents, since $r_i, q_i$ are primitives, we know $a_i$ and $b_i$ are associates, say
\[ b_i = v_i a_i, \]
with $v_i \in R$ a unit. Cancelling $a_i$ on both sides, we know $r_i = v_i q_i$ as required.
The key idea is to use Gauss’ lemma to say the reducibility in $R[X]$ is the same as reducibility in $F[X]$, as long as we are primitive. The first part about contents is just to turn everything into primitives.
Note that the last part of the proof is just our previous proposition. We could have applied it, but we decide to spell it out in full for clarity.
Example. We know $\mathbb{Z}[X]$ is a UFD, and if $R$ is a UFD, then $R[X_1, \cdots, X_n]$ is also a UFD.
This is a useful thing to know. In particular, it gives us examples of UFDs that are not PIDs. However, in such rings, we would also like to have an easy way to determine whether something is reducible. Fortunately, we have the following criterion:
Proposition (Eisenstein’s criterion). Let $R$ be a UFD, and let
\[ f = a_0 + a_1 X + \cdots + a_n X^n \in R[X] \]
be primitive with $a_n \neq 0$. Let $p \in R$ be irreducible (hence prime) such that
(i) $p \nmid a_n$;
(ii) $p \mid a_i$ for all $0 \leq i < n$;
(iii) $p^2 \nmid a_0$.
Then $f$ is irreducible in $R[X]$, and hence in $F[X]$ (where $F$ is the field of fractions of $R$).
It is important that we work in $R[X]$ all the time, until the end where we apply Gauss’ lemma. Otherwise, we cannot possibly apply Eisenstein’s criterion since there are no primes in $F$.
Proof. Suppose we have a factorization $f = gh$ with
\[ g = r_0 + r_1 X + \cdots + r_k X^k, \]
\[ h = s_0 + s_1 X + \cdots + s_\ell X^\ell, \]
for $r_k, s_\ell \neq 0$.
We know $r_k s_\ell = a_n$. Since $p \nmid a_n$, we have $p \nmid r_k$ and $p \nmid s_\ell$. We can also look at bottom coefficients. We know $r_0 s_0 = a_0$. We know $p \mid a_0$ and $p^2 \nmid a_0$. So $p$ divides exactly one of $r_0$ and $s_0$. wlog, $p \mid r_0$ and $p \nmid s_0$.
Now let $j$ be such that
\[ p \mid r_0, \quad p \mid r_1, \quad \cdots, \quad p \mid r_{j-1}, \quad p \nmid r_j. \]
We now look at $a_j$. This is, by definition,
\[ a_j = r_0 s_j + r_1 s_{j-1} + \cdots + r_{j-1} s_1 + r_j s_0. \]
We know $r_0, \cdots, r_{j-1}$ are all divisible by $p$. So
\[ p \mid r_0 s_j + r_1 s_{j-1} + \cdots + r_{j-1} s_1. \]
Also, since $p \nmid r_j$ and $p \nmid s_0$, we know $p \nmid r_j s_0$, using the fact that $p$ is prime. So $p \nmid a_j$. So we must have $j = n$.
We also know that $j \leq k \leq n$. So we must have $j = k = n$. So $\deg g = n$. Hence $\ell = n - k = 0$. So $h$ is a constant. But we also know $f$ is primitive. So $h$ must be a unit. So this is not a proper factorization.
Example. Consider the polynomial $X^n - p \in \mathbb{Z}[X]$ for $p$ a prime. Apply Eisenstein’s criterion with $p$, and observe all the conditions hold. This is certainly primitive, since this is monic. So $X^n - p$ is irreducible in $\mathbb{Z}[X]$, hence in $\mathbb{Q}[X]$. In particular, $X^n - p$ has no rational roots, i.e. $\sqrt[n]{p}$ is irrational (for $n > 1$).
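The hypotheses of Eisenstein’s criterion over $\mathbb{Z}$ are mechanical to check. The sketch below uses a hypothetical helper `eisenstein_applies` of our own; it assumes `p` is already known to be prime and that the polynomial has degree at least 1. A result of `False` only means that the criterion is inconclusive for this `p`, not that the polynomial is reducible.

```python
from functools import reduce
from math import gcd

def eisenstein_applies(coeffs, p):
    """Check the hypotheses for f = a_0 + ... + a_n X^n at the prime p.

    coeffs is the list [a_0, ..., a_n] of integers with a_n != 0.
    """
    *lower, leading = coeffs
    primitive = reduce(gcd, coeffs, 0) == 1
    return (primitive
            and leading % p != 0                 # (i)   p does not divide a_n
            and all(a % p == 0 for a in lower)   # (ii)  p divides a_i for i < n
            and coeffs[0] % (p * p) != 0)        # (iii) p^2 does not divide a_0

print(eisenstein_applies([-2, 0, 0, 1], 2))   # X^3 - 2
print(eisenstein_applies([-4, 0, 1], 2))      # X^2 - 4 = (X - 2)(X + 2)
```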
Example. Consider a polynomial
\[ f = X^{p-1} + X^{p-2} + \cdots + X^2 + X + 1 \in \mathbb{Z}[X], \]
where $p$ is a prime number. If we look at this, we notice Eisenstein’s criterion does not apply. What should we do? We observe that
\[ f = \frac{X^p - 1}{X - 1}. \]
So it might be a good idea to let $Y = X - 1$. Then we get a new polynomial
\[ \hat{f} = \hat{f}(Y) = \frac{(Y + 1)^p - 1}{Y} = Y^{p-1} + \binom{p}{1} Y^{p-2} + \binom{p}{2} Y^{p-3} + \cdots + \binom{p}{p-1}. \]
When we look at it hard enough, we notice Eisenstein’s criterion can be applied: we know $p \mid \binom{p}{i}$ for $1 \leq i \leq p - 1$, but $p^2 \nmid \binom{p}{p-1} = p$. So $\hat{f}$ is irreducible in $\mathbb{Z}[Y]$.
Now if we had a factorization
\[ f(X) = g(X) h(X) \in \mathbb{Z}[X], \]
then we get
\[ \hat{f}(Y) = g(Y + 1) h(Y + 1) \]
in $\mathbb{Z}[Y]$. So $f$ is irreducible.
Hence, for $p > 2$, none of the roots of $f$ are rational (but we already know that they are not even real!).
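The substitution $X = Y + 1$ can be checked by machine for small primes. In this sketch (helper name `shift_by_one` is ours), we expand $f(Y + 1)$ by Horner’s rule and compare the coefficients with the binomial coefficients predicted above.

```python
from math import comb

def shift_by_one(coeffs):
    """Coefficients of f(Y + 1), given the coefficients [a_0, ..., a_n] of f(X)."""
    result = []
    for a in reversed(coeffs):
        # Horner step: result <- result * (Y + 1) + a
        times_y = [0] + result          # result * Y
        times_1 = result + [0]          # result * 1, padded to the same length
        result = [s + t for s, t in zip(times_y, times_1)]
        result[0] += a
    return result

for p in (3, 5, 7, 11):
    f = [1] * p                          # 1 + X + ... + X^(p-1)
    f_hat = shift_by_one(f)
    # The coefficient of Y^k in f_hat should be binom(p, k + 1).
    assert f_hat == [comb(p, k + 1) for k in range(p)]
    # Eisenstein's conditions at p.
    assert f_hat[-1] % p != 0
    assert all(c % p == 0 for c in f_hat[:-1])
    assert f_hat[0] % (p * p) != 0

print(shift_by_one([1] * 5))
```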
2.6 Gaussian integers
We’ve mentioned the Gaussian integers already.
Definition (Gaussian integers). The Gaussian integers is the subring
\[ \mathbb{Z}[i] = \{a + bi : a, b \in \mathbb{Z}\} \leq \mathbb{C}. \]
We have already shown that the norm $N(a + ib) = a^2 + b^2$ is a Euclidean function for $\mathbb{Z}[i]$. So $\mathbb{Z}[i]$ is a Euclidean domain, hence principal ideal domain, hence a unique factorization domain.
Since the units must have norm 1, they are precisely $\pm 1, \pm i$. What does factorization in $\mathbb{Z}[i]$ look like? What are the primes? We know we are going to get new primes, i.e. primes that are not integers, while we will lose some other primes. For example, we have
\[ 2 = (1 + i)(1 - i). \]
So 2 is not irreducible, hence not prime. However, 3 is a prime. We have $N(3) = 9$. So if $3 = uv$, with $u, v$ not units, then $9 = N(u)N(v)$, and neither $N(u)$ nor $N(v)$ is 1. So $N(u) = N(v) = 3$. However, $3 = a^2 + b^2$ has no solutions with $a, b \in \mathbb{Z}$. So there is nothing of norm 3. So 3 is irreducible, hence a prime.
Also, 5 is not prime, since
\[ 5 = (1 + 2i)(1 - 2i). \]
How can we understand which primes stay as primes in the Gaussian integers?
Proposition. A prime number $p \in \mathbb{Z}$ is prime in $\mathbb{Z}[i]$ if and only if $p \neq a^2 + b^2$ for $a, b \in \mathbb{Z} \setminus \{0\}$.
The proof is exactly what we have done so far.
Proof. If $p = a^2 + b^2$, then $p = (a + ib)(a - ib)$. So $p$ is not irreducible.
Now suppose $p = uv$, with $u, v$ not units. Taking norms, we get $p^2 = N(u)N(v)$. So if $u$ and $v$ are not units, then $N(u) = N(v) = p$. Writing $u = a + ib$, then this says $a^2 + b^2 = p$.
So what we have to do is to understand when a prime
p
can be written as a
sum of two squares. We will need the following helpful lemma:
Lemma. Let $p$ be a prime number. Let $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$ be the field with $p$ elements. Let $\mathbb{F}_p^\times = \mathbb{F}_p \setminus \{0\}$ be the group of invertible elements under multiplication. Then $\mathbb{F}_p^\times \cong C_{p-1}$.
Proof. Certainly $\mathbb{F}_p^\times$ has order $p - 1$, and is abelian. We know from the classification of finite abelian groups that if $\mathbb{F}_p^\times$ is not cyclic, then it must contain a subgroup $C_m \times C_m$ for $m > 1$ (we can write it as $C_d \times C_{d'} \times \cdots$, with $d' \mid d$. So $C_d$ has a subgroup isomorphic to $C_{d'}$).
We consider the polynomial $X^m - 1 \in \mathbb{F}_p[X]$, which is a UFD. At best, this factors into $m$ linear factors. So $X^m - 1$ has at most $m$ distinct roots. But if $C_m \times C_m \leq \mathbb{F}_p^\times$, then we can find $m^2$ elements of order dividing $m$. So there are $m^2$ elements of $\mathbb{F}_p$ which are roots of $X^m - 1$. This is a contradiction. So $\mathbb{F}_p^\times$ is cyclic.
This is a funny proof, since we have not found any element that has order $p - 1$.
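Although the proof is not constructive, for small $p$ we can simply search for an element of order $p - 1$. This brute-force sketch is our own illustration (the function names are hypothetical) and is only sensible for small primes.

```python
def multiplicative_order(a, p):
    """Order of a in the multiplicative group of integers mod p (p prime, p not dividing a)."""
    order, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        order += 1
    return order

def generators(p):
    """All elements of order p - 1 in the group of units mod p."""
    return [a for a in range(1, p) if multiplicative_order(a, p) == p - 1]

print(generators(7))
# The lemma guarantees this list is non-empty for every prime p.
print(all(generators(p) for p in (2, 3, 5, 11, 13, 17)))
```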
Proposition. The primes in $\mathbb{Z}[i]$ are, up to associates,
(i) Prime numbers $p \in \mathbb{Z} \leq \mathbb{Z}[i]$ such that $p \equiv 3 \pmod 4$.
(ii) Gaussian integers $z \in \mathbb{Z}[i]$ with $N(z) = z\bar{z} = p$ for some prime $p$ such that $p = 2$ or $p \equiv 1 \pmod 4$.
Proof. We first show these are primes. If $p \equiv 3 \pmod 4$, then $p \neq a^2 + b^2$, since a square number mod 4 is always 0 or 1. So these are primes in $\mathbb{Z}[i]$.
On the other hand, if $N(z) = p$, and $z = uv$, then $N(u)N(v) = p$. So $N(u)$ is 1 or $N(v)$ is 1. So $u$ or $v$ is a unit. Note that we did not use the condition that $p \not\equiv 3 \pmod 4$. This is not needed, since $N(z)$ is always a sum of squares, and hence $N(z)$ cannot be a prime that is 3 mod 4.
Now let $z \in \mathbb{Z}[i]$ be irreducible, hence prime. Then $\bar{z}$ is also irreducible. So $N(z) = z\bar{z}$ is a factorization of $N(z)$ into irreducibles. Let $p \in \mathbb{Z}$ be an ordinary prime number dividing $N(z)$, which exists since $N(z) \neq 1$.
Now if $p \equiv 3 \pmod 4$, then $p$ itself is prime in $\mathbb{Z}[i]$ by the first part of the proof. So $p \mid N(z) = z\bar{z}$. So $p \mid z$ or $p \mid \bar{z}$. Note that if $p \mid \bar{z}$, then $p \mid z$ by taking complex conjugates. So we get $p \mid z$. Since both $p$ and $z$ are irreducible, they must be equal up to associates.
Otherwise, we get $p = 2$ or $p \equiv 1 \pmod 4$. If $p \equiv 1 \pmod 4$, then $p - 1 = 4k$ for some $k \in \mathbb{Z}$. As $\mathbb{F}_p^\times \cong C_{p-1} = C_{4k}$, there is a unique element of order 2 (this is true for any cyclic group of order $4k$: think $\mathbb{Z}/4k\mathbb{Z}$). This must be $[-1] \in \mathbb{F}_p$. Now let $a \in \mathbb{F}_p^\times$ be an element of order 4. Then $a^2$ has order 2. So $[a^2] = [-1]$.
This is a complicated way of saying we can find an $a$ such that $p \mid a^2 + 1$. Thus $p \mid (a + i)(a - i)$. In the case where $p = 2$, we know by checking directly that $2 = (1 + i)(1 - i)$.
In either case, we deduce that $p$ (or 2) is not prime (hence not irreducible), since it clearly does not divide $a \pm i$ (or $1 \pm i$). So we can write $p = z_1 z_2$, for $z_1, z_2 \in \mathbb{Z}[i]$ not units. Now we get
\[ p^2 = N(p) = N(z_1) N(z_2). \]
As the $z_i$ are not units, we know $N(z_1) = N(z_2) = p$. By definition, this means $p = z_1 \bar{z}_1 = z_2 \bar{z}_2$. But also $p = z_1 z_2$. So we must have $\bar{z}_1 = z_2$.
Finally, we have $p = z_1 \bar{z}_1 \mid N(z) = z\bar{z}$. All these $z$, $z_i$ are irreducible. So $z$ must be an associate of $z_1$ (or maybe $\bar{z}_1$). So in particular $N(z) = p$.
Corollary. An integer $n \in \mathbb{Z}_{\geq 0}$ may be written as $x^2 + y^2$ (as the sum of two squares) if and only if “when we write $n = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$ as a product of powers of distinct primes, then $p_i \equiv 3 \pmod 4$ implies $n_i$ is even”.
We have proved this in the case when $n$ is a prime.
Proof. If $n = x^2 + y^2$, then we have
\[ n = (x + iy)(x - iy) = N(x + iy). \]
Let $z = x + iy$. So we can write $z = \alpha_1 \cdots \alpha_q$ as a product of irreducibles in $\mathbb{Z}[i]$. By the proposition, each $\alpha_i$ is, up to associates, either $\alpha_i = p$ (a genuine prime number with $p \equiv 3 \pmod 4$), or $N(\alpha_i) = p$ is a prime number which is either 2 or $\equiv 1 \pmod 4$. We now take the norm to obtain
\[ n = x^2 + y^2 = N(z) = N(\alpha_1) N(\alpha_2) \cdots N(\alpha_q). \]
Now each $N(\alpha_i)$ is either $p^2$ with $p \equiv 3 \pmod 4$, or is just $p$ for $p = 2$ or $p \equiv 1 \pmod 4$. So if $p^m$ is the largest power of $p$ that divides $n$, we find that $m$ must be even if $p \equiv 3 \pmod 4$.
Conversely, let $n = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$ be a product of powers of distinct primes. Now for each $p_i$, either $p_i \equiv 3 \pmod 4$, and $n_i$ is even, in which case
\[ p_i^{n_i} = (p_i^2)^{n_i/2} = N(p_i^{n_i/2}); \]
or $p_i = 2$ or $p_i \equiv 1 \pmod 4$, in which case, the above proof shows that $p_i = N(\alpha_i)$ for some $\alpha_i$. So $p_i^{n_i} = N(\alpha_i^{n_i})$.
Since the norm is multiplicative, we can write $n$ as the norm of some $z \in \mathbb{Z}[i]$. So
\[ n = N(z) = N(x + iy) = x^2 + y^2, \]
as required.
Example. We know $65 = 5 \times 13$. Since $5, 13 \equiv 1 \pmod 4$, it is a sum of squares. Moreover, the proof tells us how to find 65 as the sum of squares. We have to factor 5 and 13 in $\mathbb{Z}[i]$. We have
\[ 5 = (2 + i)(2 - i), \]
\[ 13 = (2 + 3i)(2 - 3i). \]
So we know
\[ 65 = N(2 + i) N(2 + 3i) = N((2 + i)(2 + 3i)) = N(1 + 8i) = 1^2 + 8^2. \]
But there is a choice here. We had to pick which factor is $\alpha$ and which is $\bar{\alpha}$. So we can also write
\[ 65 = N((2 + i)(2 - 3i)) = N(7 - 4i) = 7^2 + 4^2. \]
So not only are we able to write them as sum of squares, but this also gives us many ways of writing 65 as a sum of squares.
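The arithmetic in this example, and the corollary itself, can be checked numerically. The sketch below is our own illustration: a Gaussian integer $a + bi$ is stored as the pair `(a, b)`, and the function names are hypothetical. It multiplies the factors as above and compares the criterion of the corollary against a brute-force search for small $n$.

```python
from math import isqrt

def gauss_mul(z, w):
    """Multiply Gaussian integers stored as pairs (real part, imaginary part)."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

def norm(z):
    a, b = z
    return a * a + b * b

def is_sum_of_two_squares(n):
    """Brute-force search for x with n - x^2 a perfect square."""
    return any(isqrt(n - x * x) ** 2 == n - x * x for x in range(isqrt(n) + 1))

def satisfies_criterion(n):
    """True if every prime p = 3 (mod 4) divides n to an even power (n >= 1)."""
    p = 2
    while p * p <= n:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        if p % 4 == 3 and e % 2 == 1:
            return False
        p += 1
    # Whatever is left is 1 or a single prime occurring to the first power.
    return not (n > 1 and n % 4 == 3)

print(gauss_mul((2, 1), (2, 3)), gauss_mul((2, 1), (2, -3)))
print(all(is_sum_of_two_squares(n) == satisfies_criterion(n) for n in range(1, 500)))
```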
2.7 Algebraic integers
We generalize the idea of Gaussian integers to algebraic integers.
Definition (Algebraic integer). An $\alpha \in \mathbb{C}$ is called an algebraic integer if it is a root of a monic polynomial in $\mathbb{Z}[X]$, i.e. there is a monic $f \in \mathbb{Z}[X]$ such that $f(\alpha) = 0$.
We can immediately check that this is a sensible definition: not all complex numbers are algebraic integers, since there are only countably many polynomials with integer coefficients, hence only countably many algebraic integers, but there are uncountably many complex numbers.
Notation. For $\alpha$ an algebraic integer, we write $\mathbb{Z}[\alpha] \leq \mathbb{C}$ for the smallest subring containing $\alpha$.
This can also be defined for arbitrary complex numbers, but it is less interesting.
We can also construct $\mathbb{Z}[\alpha]$ by taking it as the image of the map $\phi: \mathbb{Z}[X] \to \mathbb{C}$ given by $g \mapsto g(\alpha)$. So we can also write
\[ \mathbb{Z}[\alpha] \cong \frac{\mathbb{Z}[X]}{I}, \quad I = \ker \phi. \]
Note that $I$ is non-zero, since, say, $f \in I$, by definition of an algebraic integer.
Proposition. Let $\alpha \in \mathbb{C}$ be an algebraic integer. Then the ideal
\[ I = \ker(\phi: \mathbb{Z}[X] \to \mathbb{C}, f \mapsto f(\alpha)) \]
is principal, and equal to $(f_\alpha)$ for some irreducible monic $f_\alpha$.
This is a non-trivial theorem, since $\mathbb{Z}[X]$ is not a principal ideal domain. So there is no immediate guarantee that $I$ is generated by one polynomial.
Definition (Minimal polynomial). Let $\alpha \in \mathbb{C}$ be an algebraic integer. Then the minimal polynomial of $\alpha$ is the irreducible monic polynomial $f_\alpha$ such that $I = \ker(\phi) = (f_\alpha)$.
Proof. By definition, there is a monic $f \in \mathbb{Z}[X]$ such that $f(\alpha) = 0$. So $f \in I$. So $I \neq 0$. Now let $f_\alpha \in I$ be a non-zero polynomial of minimal degree. We may suppose that $f_\alpha$ is primitive. We want to show that $I = (f_\alpha)$, and that $f_\alpha$ is irreducible.
Let $h \in I$. We pretend we are living in $\mathbb{Q}[X]$. Then we have the Euclidean algorithm. So we can write
\[ h = f_\alpha q + r, \]
with $r = 0$ or $\deg r < \deg f_\alpha$. This was done over $\mathbb{Q}[X]$, not $\mathbb{Z}[X]$. We now clear denominators. We multiply by some $a \in \mathbb{Z}$ to get
\[ ah = f_\alpha (aq) + (ar), \]
where now $(aq), (ar) \in \mathbb{Z}[X]$. We now evaluate these polynomials at $\alpha$. Then we have
\[ a h(\alpha) = f_\alpha(\alpha) \, aq(\alpha) + ar(\alpha). \]
We know $f_\alpha(\alpha) = h(\alpha) = 0$, since $f_\alpha$ and $h$ are both in $I$. So $ar(\alpha) = 0$. So $(ar) \in I$. As $f_\alpha \in I$ has minimal degree, we cannot have $\deg(r) = \deg(ar) < \deg(f_\alpha)$. So we must have $r = 0$.
Hence we know
\[ ah = f_\alpha \cdot (aq) \]
is a factorization in $\mathbb{Z}[X]$. This is almost right, but we want to factor $h$, not $ah$. Again, taking contents of everything, we get
\[ a c(h) = c(ah) = c(f_\alpha (aq)) = c(aq), \]
as $f_\alpha$ is primitive. In particular, $a \mid c(aq)$. This, by definition of content, means $(aq)$ can be written as $a\bar{q}$, where $\bar{q} \in \mathbb{Z}[X]$. Cancelling, we get $q = \bar{q} \in \mathbb{Z}[X]$. So we know
\[ h = f_\alpha q \in (f_\alpha). \]
So we know $I = (f_\alpha)$.
To show $f_\alpha$ is irreducible, note that
\[ \frac{\mathbb{Z}[X]}{(f_\alpha)} = \frac{\mathbb{Z}[X]}{\ker \phi} \cong \operatorname{im}(\phi) = \mathbb{Z}[\alpha] \leq \mathbb{C}. \]
Since $\mathbb{C}$ is an integral domain, so is $\operatorname{im}(\phi)$. So we know $\mathbb{Z}[X]/(f_\alpha)$ is an integral domain. So $(f_\alpha)$ is prime. So $f_\alpha$ is prime, hence irreducible.
If this final line looks magical, we can unravel this proof as follows: suppose $f_\alpha = pq$ for some non-units $p, q$. Then since $f_\alpha(\alpha) = 0$, we know $p(\alpha) q(\alpha) = 0$. Since $p(\alpha), q(\alpha) \in \mathbb{C}$, which is an integral domain, we must have, say, $p(\alpha) = 0$. But then $\deg p < \deg f_\alpha$, so $p \notin I = (f_\alpha)$. Contradiction.
Example.
(i) We know $\alpha = i$ is an algebraic integer with $f_\alpha = X^2 + 1$.
(ii) Also, $\alpha = \sqrt{2}$ is an algebraic integer with $f_\alpha = X^2 - 2$.
(iii) More interestingly, $\alpha = \frac{1}{2}(1 + \sqrt{-3})$ is an algebraic integer with $f_\alpha = X^2 - X + 1$.
(iv) The polynomial $X^5 - X + d \in \mathbb{Z}[X]$ with $d \in \mathbb{Z}_{>0}$ has precisely one real root $\alpha$, which is an algebraic integer. It is a theorem, which will be proved in IID Galois Theory, that this $\alpha$ cannot be constructed from integers via $+, -, \times, \div, \sqrt[n]{\cdot}$. It is also a theorem, found in IID Galois Theory, that degree 5 polynomials are the smallest degree for which this can happen (the proof involves writing down formulas analogous to the quadratic formula for degree 3 and 4 polynomials).
Lemma. Let $\alpha \in \mathbb{Q}$ be an algebraic integer. Then $\alpha \in \mathbb{Z}$.
Proof. Let $f_\alpha \in \mathbb{Z}[X]$ be the minimal polynomial, which is irreducible. In $\mathbb{Q}[X]$, the polynomial $X - \alpha$ must divide $f_\alpha$. However, by Gauss’ lemma, we know $f_\alpha \in \mathbb{Q}[X]$ is irreducible. So we must have $f_\alpha = X - \alpha \in \mathbb{Z}[X]$. So $\alpha$ is an integer.
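As a small experiment illustrating the lemma, we can search for rational roots of integer polynomials by brute force and observe that, in the monic case, only integers turn up. The polynomials and the search bound below are arbitrary choices of ours, and the function names are hypothetical.

```python
from fractions import Fraction

def evaluate(coeffs, x):
    """Horner evaluation of a_0 + a_1 x + ... + a_n x^n."""
    value = 0
    for a in reversed(coeffs):
        value = value * x + a
    return value

def rational_roots(coeffs, bound):
    """Rational roots r/s with |r| <= bound and 1 <= s <= bound."""
    candidates = {Fraction(r, s)
                  for r in range(-bound, bound + 1)
                  for s in range(1, bound + 1)}
    return sorted(x for x in candidates if evaluate(coeffs, x) == 0)

monic = [6, -7, 0, 1]        # X^3 - 7X + 6 = (X - 1)(X - 2)(X + 3)
roots = rational_roots(monic, 10)
print(roots, all(x.denominator == 1 for x in roots))

print(rational_roots([-2, 0, 1], 10))   # X^2 - 2 has no rational roots
print(rational_roots([-1, 2], 10))      # 2X - 1 is not monic, and its root is not an integer
```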
It turns out the collection of all algebraic integers forms a subring of $\mathbb{C}$. This is not at all obvious: given $f, g \in \mathbb{Z}[X]$ monic such that $f(\alpha) = g(\beta) = 0$, there is no easy way to find a new monic $h$ such that $h(\alpha + \beta) = 0$. We will prove this much later on in the course.
2.8 Noetherian rings
We now revisit the idea of Noetherian rings, something we have briefly mentioned
when proving that PIDs are UFDs.
Definition (Noetherian ring). A ring is Noetherian if for any chain of ideals
\[ I_1 \subseteq I_2 \subseteq I_3 \subseteq \cdots, \]
there is some $N$ such that $I_N = I_{N+1} = I_{N+2} = \cdots$.
This condition is known as the ascending chain condition.
Example. Every finite ring is Noetherian. This is since there are only finitely many possible ideals.
Example. Every field is Noetherian. This is since there are only two possible ideals.
Example. Every principal ideal domain (e.g. $\mathbb{Z}$) is Noetherian. This is easy to check directly, but the next proposition will make this utterly trivial.
Most rings we love and know are indeed Noetherian. However, we can explicitly construct some non-Noetherian rings.
Example. The ring $\mathbb{Z}[X_1, X_2, X_3, \cdots]$ is not Noetherian. This has the chain of strictly increasing ideals
\[ (X_1) \subset (X_1, X_2) \subset (X_1, X_2, X_3) \subset \cdots. \]
We have the following proposition that makes Noetherian rings much more
concrete, and makes it obvious why PIDs are Noetherian.
Definition (Finitely generated ideal). An ideal
I
is finitely generated if it can
be written as I = (r
1
, ··· , r
n
) for some r
1
, ··· , r
n
R.
Proposition. A ring is Noetherian if and only if every ideal is finitely generated.
Every PID trivially satisfies this condition. So we know every PID is Noethe-
rian.
Proof. We start with the easier direction from concrete to abstract.
Suppose every ideal of
R
is finitely generated. Given the chain
I
1
I
2
···
,
consider the ideal
I = I
1
I
2
I
3
··· .
This is obviously an ideal, and you will check this manually in example sheet 2.
We know I is finitely generated, say I = (r
1
, ··· , r
n
), with r
i
I
k
i
. Let
K = max
i=1,··· ,n
{k
i
}.
Then r
1
, ··· , r
n
I
K
. So I
K
= I. So I
K
= I
K+1
= I
K+2
= ···.
To prove the other direction, suppose there is an ideal $I \lhd R$ that is not
finitely generated. We pick $r_1 \in I$. Since $I$ is not finitely generated, we know
$(r_1) \neq I$. So we can find some $r_2 \in I \setminus (r_1)$.
Again $(r_1, r_2) \neq I$. So we can find $r_3 \in I \setminus (r_1, r_2)$. We continue on, and then
can find an infinite strictly ascending chain
\[ (r_1) \subsetneq (r_1, r_2) \subsetneq (r_1, r_2, r_3) \subsetneq \cdots. \]
So R is not Noetherian.
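To see the ascending chain condition at work, here is a minimal Python sketch in the ring $\mathbb{Z}$. It assumes the standard fact that $(a_1, \cdots, a_k) = (\gcd(a_1, \cdots, a_k))$ in $\mathbb{Z}$; the function name `chain_generators` is made up for illustration. Each entry of the output generates the next ideal in the chain $(a_1) \subseteq (a_1, a_2) \subseteq \cdots$, and the chain stabilizes as soon as the generator stops changing.

```python
from itertools import accumulate
from math import gcd

def chain_generators(elements):
    """Generators of the ideals (a_1), (a_1, a_2), ... in Z (running gcds)."""
    return list(accumulate(elements, gcd))

# The chain (12) ⊆ (12, 18) ⊆ (12, 18, 8) ⊆ ... becomes (12) ⊆ (6) ⊆ (2) ⊆ (1) = (1).
print(chain_generators([12, 18, 8, 5, 7]))  # [12, 6, 2, 1, 1]
```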
When we have developed some properties or notions, a natural thing to ask
is whether it passes on to subrings and quotients.
If $R$ is Noetherian, does every subring of $R$ have to be Noetherian? The
answer is no. For example, since $\mathbb{Z}[X_1, X_2, \cdots]$ is an integral domain, we can
take its field of fractions, which is a field, hence Noetherian, but $\mathbb{Z}[X_1, X_2, \cdots]$
is a subring of its field of fractions.
How about quotients?
Proposition. Let $R$ be a Noetherian ring and $I$ be an ideal of $R$. Then $R/I$ is
Noetherian.
Proof. Whenever we see quotients, we should think of them as the image of a
homomorphism. Consider the quotient map
\[ \pi : R \to R/I, \quad x \mapsto x + I. \]
We can prove this result using either finite generation or the ascending chain condition. We
go for the former. Let $J \lhd R/I$ be an ideal. We want to show that $J$ is finitely
generated. Consider the inverse image $\pi^{-1}(J)$. This is an ideal of $R$, and is
hence finitely generated, since $R$ is Noetherian. So $\pi^{-1}(J) = (r_1, \cdots, r_n)$ for
some $r_1, \cdots, r_n \in R$. Since $\pi$ is surjective, $J$ is generated by $\pi(r_1), \cdots, \pi(r_n)$. So done.
This gives us many examples of Noetherian rings. But there is one important
case we have not tackled yet: polynomial rings. We know $\mathbb{Z}[X]$ is not a PID,
since $(2, X)$ is not principal. However, this is finitely generated. So we are not
dead. We might try to construct some non-finitely generated ideal, but we are
bound to fail. This is since $\mathbb{Z}[X]$ is a Noetherian ring. This is a special case of
the following powerful theorem:
Theorem (Hilbert basis theorem). Let $R$ be a Noetherian ring. Then so is $R[X]$.
Since $\mathbb{Z}$ is Noetherian, we know $\mathbb{Z}[X]$ also is. Hence so is $\mathbb{Z}[X, Y]$ etc.
The Hilbert basis theorem was, surprisingly, proven by Hilbert himself. Before
that, there were many mathematicians studying something known as invariant
theory. The idea is that we have some interesting objects, and we want to look
at their symmetries. Often, there are infinitely many possible such symmetries,
and one interesting question to ask is whether there is a finite set of symmetries
that generate all possible symmetries.
This sounds like an interesting problem, so people devoted much time, writing
down funny proofs, showing that the symmetries are finitely generated. However,
such collections of symmetries are often just ideals of some funny ring. So
Hilbert came along and proved the Hilbert basis theorem, and showed once and
for all that those rings are Noetherian, and hence the symmetries are finitely
generated.
Proof. The proof is not too hard, but we will need to use both the ascending
chain condition and the fact that all ideals are finitely generated.
Let $I \lhd R[X]$ be an ideal. We want to show it is finitely generated. Since we
know $R$ is Noetherian, we want to generate some ideals of $R$ from $I$.
How can we do this? We can do the silly thing of taking all constants of $I$,
i.e. $I \cap R$. But we can do better. We can consider all linear polynomials in $I$, and
take their leading coefficients. Thinking for a while, this is indeed an ideal (once we include 0).
In general, for $n = 0, 1, 2, \cdots$, we let
\[ I_n = \{r \in R : \text{there is some } f \in I \text{ such that } f = rX^n + \cdots\} \cup \{0\}. \]
Then it is easy to see, using the strong closure property, that each $I_n$ is an
ideal of $R$. Moreover, they form a chain, since if $f \in I$, then $Xf \in I$, by strong
closure. So $I_n \subseteq I_{n+1}$ for all $n$.
By the ascending chain condition of $R$, we know there is some $N$ such that
$I_N = I_{N+1} = \cdots$. Now for each $0 \leq n \leq N$, since $R$ is Noetherian, we can write
\[ I_n = (r_1^{(n)}, r_2^{(n)}, \cdots, r_{k(n)}^{(n)}). \]
Now for each $r_i^{(n)}$, we choose some $f_i^{(n)} \in I$ with $f_i^{(n)} = r_i^{(n)} X^n + \cdots$.
We now claim the polynomials $f_i^{(n)}$ for $0 \leq n \leq N$ and $1 \leq i \leq k(n)$ generate $I$.
Suppose not. We pick $g \in I$ of minimal degree not generated by the $f_i^{(n)}$.
There are two possible cases. If $\deg g = n \leq N$, suppose
\[ g = rX^n + \cdots. \]
We know $r \in I_n$. So we can write
\[ r = \sum_i \lambda_i r_i^{(n)} \]
for some $\lambda_i \in R$, since that's what generating an ideal means. Then we know
\[ \sum_i \lambda_i f_i^{(n)} = rX^n + \cdots \in I. \]
But if $g$ is not in the span of the $f_i^{(j)}$, then neither is $g - \sum_i \lambda_i f_i^{(n)}$. But this has
a lower degree than $g$. This is a contradiction.
Now suppose $\deg g = n > N$. This might look scary, but it is not, since
$I_n = I_N$. So we write the same proof. We write
\[ g = rX^n + \cdots. \]
But we know $r \in I_n = I_N$. So we know
\[ r = \sum_i \lambda_i r_i^{(N)}. \]
Then we know
\[ X^{n-N} \sum_i \lambda_i f_i^{(N)} = rX^n + \cdots \in I. \]
Hence $g - X^{n-N} \sum_i \lambda_i f_i^{(N)}$ has smaller degree than $g$, but is not in the span of the $f_i^{(j)}$.
This is again a contradiction.
As an aside, let $E \subseteq F[X_1, X_2, \cdots, X_n]$ be any set of polynomials. We view
this as a set of equations $f = 0$ for each $f \in E$. The claim is that to solve the
potentially infinite set of equations $E$, we actually only have to solve finitely
many equations.
Consider the ideal $(E) \lhd F[X_1, \cdots, X_n]$. By the Hilbert basis theorem, there
is a finite list $f_1, \cdots, f_k$ such that
\[ (f_1, \cdots, f_k) = (E). \]
We want to show that we only have to solve $f_i(x) = 0$ for these $f_i$. Given
$(\alpha_1, \cdots, \alpha_n) \in F^n$, consider the homomorphism
\[ \phi_\alpha : F[X_1, \cdots, X_n] \to F, \quad X_i \mapsto \alpha_i. \]
Then we know $(\alpha_1, \cdots, \alpha_n) \in F^n$ is a solution to the equations $E$ if and only
if $(E) \subseteq \ker(\phi_\alpha)$. By our choice of $f_i$, this is true if and only if
$(f_1, \cdots, f_k) \subseteq \ker(\phi_\alpha)$. By inspection, this is true if and only if $(\alpha_1, \cdots, \alpha_n)$ is a solution to
all of $f_1, \cdots, f_k$. So solving $E$ is the same as solving $f_1, \cdots, f_k$. This is useful
in, say, algebraic geometry.
3 Modules
Finally, we are going to look at modules. Recall that to define a vector space,
we first pick some base field
F
. We then defined a vector space to be an abelian
group
V
with an action of
F
on
V
(i.e. scalar multiplication) that is compatible
with the multiplicative and additive structure of F.
In the definition, we did not at all mention division in
F
. So in fact we can
make the same definition, but allow
F
to be a ring instead of a field. We call
these modules. Unfortunately, most results we prove about vector spaces do use
the fact that
F
is a field. So many linear algebra results do not apply to modules,
and modules have much richer structures.
3.1 Definitions and examples
Definition (Module). Let $R$ be a commutative ring. We say a quadruple
$(M, +, 0_M, \cdot)$ is an $R$-module if
(i) $(M, +, 0_M)$ is an abelian group
(ii) The operation $\cdot : R \times M \to M$ satisfies
(a) $(r_1 + r_2) \cdot m = (r_1 \cdot m) + (r_2 \cdot m)$;
(b) $r \cdot (m_1 + m_2) = (r \cdot m_1) + (r \cdot m_2)$;
(c) $r_1 \cdot (r_2 \cdot m) = (r_1 \cdot r_2) \cdot m$; and
(d) $1_R \cdot m = m$.
Note that there are two different additions going on: addition in the ring
and addition in the module, and similarly two notions of multiplication. However,
it is easy to distinguish them since they operate on different things. If needed,
we can make them explicit by writing, say, $+_R$ and $+_M$.
We can imagine modules as rings acting on abelian groups, just as groups
can act on sets. Hence we might say
R
acts on
M
to mean
M
is an
R
-module.
Example. Let
F
be a field. An
F
-module is precisely the same as a vector space
over F (the axioms are the same).
Example. For any ring $R$, we have the $R$-module $R^n = R \times R \times \cdots \times R$ via
\[ r \cdot (r_1, \cdots, r_n) = (rr_1, \cdots, rr_n), \]
using the ring multiplication. This is the same as the definition of the vector
space $F^n$ for fields $F$.
Example. Let $I \lhd R$ be an ideal. Then it is an $R$-module via
\[ r \cdot_I a = r \cdot_R a, \quad r_1 +_I r_2 = r_1 +_R r_2. \]
Also, $R/I$ is an $R$-module via
\[ r \cdot_{R/I} (a + I) = (r \cdot_R a) + I. \]
Example. A $\mathbb{Z}$-module is precisely the same as an abelian group. For $A$ an
abelian group, we have
\[ \mathbb{Z} \times A \to A, \quad (n, a) \mapsto \underbrace{a + \cdots + a}_{n \text{ times}}, \]
where we adopt the notation
\[ \underbrace{a + \cdots + a}_{-n \text{ times}} = \underbrace{(-a) + \cdots + (-a)}_{n \text{ times}}, \]
and adding something to itself 0 times is just 0.
This definition is essentially forced upon us, since by the axioms of a module,
we must have $(1, a) \mapsto a$. Then we must send, say, $(2, a) = (1 + 1, a) \mapsto a + a$.
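As a minimal sketch of this action, we can implement multiplication by $n \in \mathbb{Z}$ using only the group operations, adding $-a$ repeatedly when $n$ is negative. The function name `z_action` and the choice of the group $\mathbb{Z}/6\mathbb{Z}$ are assumptions made for illustration only.

```python
def z_action(n, a, add, neg, zero):
    """Act by the integer n on the element a of an abelian group.

    The group is described by its addition `add`, negation `neg`
    and identity element `zero`.
    """
    step = a if n >= 0 else neg(a)
    result = zero
    for _ in range(abs(n)):
        result = add(result, step)
    return result

# Illustration in the abelian group Z/6Z.
add6 = lambda x, y: (x + y) % 6
neg6 = lambda x: (-x) % 6

print(z_action(4, 5, add6, neg6, 0))   # 5 + 5 + 5 + 5 = 20 = 2 in Z/6Z
print(z_action(-2, 5, add6, neg6, 0))  # (-5) + (-5) = 1 + 1 = 2 in Z/6Z
```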
Example. Let $F$ be a field and $V$ a vector space over $F$, and $\alpha : V \to V$ be a
linear map. Then $V$ is an $F[X]$-module via
\[ F[X] \times V \to V, \quad (f, v) \mapsto f(\alpha)(v). \]
This is a module.
Note that we cannot just say that
V
is an
F
[
X
]-module. We have to specify
the α as well. Picking a different α will give a different F[X]-module structure.
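The following Python sketch illustrates how the module structure depends on $\alpha$. It represents $\alpha$ by a square matrix with integer entries (viewed inside $\mathbb{Q}$) and a polynomial by its list of coefficients, lowest degree first; the names `mat_vec` and `poly_act` are hypothetical. It evaluates $f(\alpha)(v)$ by Horner's rule, which is one of several ways to do this.

```python
def mat_vec(A, v):
    """Apply the linear map with matrix A to the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def poly_act(coeffs, A, v):
    """Compute f(alpha)(v), where coeffs[i] is the coefficient of X^i in f."""
    result = [0] * len(v)
    for c in reversed(coeffs):
        # Horner step: result <- c * v + alpha(result)
        result = [c * x + y for x, y in zip(v, mat_vec(A, result))]
    return result

f = [1, 2, 3]                   # f = 1 + 2X + 3X^2
nilpotent = [[0, 1], [0, 0]]    # alpha with alpha^2 = 0
identity = [[1, 0], [0, 1]]     # alpha = id

print(poly_act(f, nilpotent, [0, 1]))  # [2, 1]: v + 2 alpha(v)
print(poly_act(f, identity, [0, 1]))   # [0, 6]: f(1) v = 6v
```

The same $f$ and $v$ give different answers for the two choices of $\alpha$, so these are genuinely different $F[X]$-module structures on the same vector space.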
Example. Let $\phi : R \to S$ be a homomorphism of rings. Then any $S$-module $M$
may be considered as an $R$-module via
\[ R \times M \to M, \quad (r, m) \mapsto \phi(r) \cdot_M m. \]
Definition (Submodule). Let $M$ be an $R$-module. A subset $N \subseteq M$ is an
$R$-submodule if it is a subgroup of $(M, +, 0_M)$, and if $n \in N$ and $r \in R$, then
$rn \in N$. We write $N \leq M$.
Example. We know
R
itself is an
R
-module. Then a subset of
R
is a submodule
if and only if it is an ideal.
Example. A subset of an
F
-module
V
, where
F
is a field, is an
F
-submodule if
and only if it is a vector subspace of V .
Definition (Quotient module). Let $N \leq M$ be an $R$-submodule. The quotient
module $M/N$ is the set of $N$-cosets in $(M, +, 0_M)$, with the $R$-action given by
\[ r \cdot (m + N) = (r \cdot m) + N. \]
It is easy to check this is well-defined and is indeed a module.
Note that modules are different from rings and groups. In groups, we had
subgroups, and we have some really nice ones called normal subgroups. We are
only allowed to quotient by normal subgroups. In rings, we have subrings and
ideals, which are unrelated objects, and we only quotient by ideals. In modules,
we only have submodules, and we can quotient by arbitrary submodules.
Definition ($R$-module homomorphism and isomorphism). A function $f : M \to N$
between $R$-modules is an $R$-module homomorphism if it is a homomorphism
of abelian groups, and satisfies
\[ f(r \cdot m) = r \cdot f(m) \]
for all $r \in R$ and $m \in M$.
An isomorphism is a bijective homomorphism, and two $R$-modules are
isomorphic if there is an isomorphism between them.
Note that on the left of the displayed equation, the multiplication is the action in $M$, while on the
right, it is the action in $N$.
Example. If
F
is a field and
V, W
are
F
-modules (i.e. vector spaces over
F
),
then an F-module homomorphism is precisely an F-linear map.
Theorem (First isomorphism theorem). Let $f : M \to N$ be an $R$-module
homomorphism. Then
\[ \ker f = \{m \in M : f(m) = 0\} \leq M \]
is an $R$-submodule of $M$. Similarly,
\[ \operatorname{im} f = \{f(m) : m \in M\} \leq N \]
is an $R$-submodule of $N$. Then
\[ \frac{M}{\ker f} \cong \operatorname{im} f. \]
We will not prove this again. The proof is exactly the same.
Theorem (Second isomorphism theorem). Let $A, B \leq M$. Then
\[ A + B = \{m \in M : m = a + b \text{ for some } a \in A, b \in B\} \leq M, \]
and
\[ A \cap B \leq M. \]
We then have
\[ \frac{A + B}{A} \cong \frac{B}{A \cap B}. \]
Theorem (Third isomorphism theorem). Let $N \leq L \leq M$. Then we have
\[ \frac{M}{L} \cong \left(\frac{M}{N}\right) \Big/ \left(\frac{L}{N}\right). \]
Also, we have a correspondence
\[ \{\text{submodules of } M/N\} \longleftrightarrow \{\text{submodules of } M \text{ which contain } N\}. \]
It is an exercise to see what these mean in the cases where $R$ is a field, and
modules are vector spaces.
We now have something new. We have a new concept that was not present
in rings and groups.
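Definition (Annihilator). Let $M$ be an $R$-module, and $m \in M$. The annihilator
of $m$ is
\[ \operatorname{Ann}(m) = \{r \in R : r \cdot m = 0\}. \]
For any set $S \subseteq M$, we define
\[ \operatorname{Ann}(S) = \{r \in R : r \cdot m = 0 \text{ for all } m \in S\} = \bigcap_{m \in S} \operatorname{Ann}(m). \]
In particular, for the module $M$ itself, we have
\[ \operatorname{Ann}(M) = \{r \in R : r \cdot m = 0 \text{ for all } m \in M\} = \bigcap_{m \in M} \operatorname{Ann}(m). \]
Note that the annihilator is a subset of $R$. Moreover it is an ideal: if
$r \cdot m = 0$ and $s \cdot m = 0$, then $(r + s) \cdot m = r \cdot m + s \cdot m = 0$. So $r + s \in \operatorname{Ann}(m)$.
Moreover, if $r \cdot m = 0$, then also $(sr) \cdot m = s \cdot (r \cdot m) = 0$. So $sr \in \operatorname{Ann}(m)$.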
What is this good for? We first note that any $m \in M$ generates a submodule
$Rm$ as follows:
Definition (Submodule generated by element). Let $M$ be an $R$-module, and
$m \in M$. The submodule generated by $m$ is
\[ Rm = \{r \cdot m \in M : r \in R\}. \]
We consider the $R$-module homomorphism
\[ \phi : R \to M, \quad r \mapsto rm. \]
This is clearly a homomorphism. Then we have
\[ Rm = \operatorname{im}(\phi), \quad \operatorname{Ann}(m) = \ker(\phi). \]
The conclusion is that
\[ Rm \cong R / \operatorname{Ann}(m). \]
As we mentioned, rings acting on modules is like groups acting on sets. We can
think of this as the analogue of the orbit-stabilizer theorem.
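As a small sanity check of $Rm \cong R/\operatorname{Ann}(m)$, the following Python sketch works in the $\mathbb{Z}$-module $\mathbb{Z}/12\mathbb{Z}$. The function name and the choice of example are assumptions for illustration. For $m = 8$, the submodule generated by $m$ has three elements and the annihilator is $(3)$, matching $\mathbb{Z}m \cong \mathbb{Z}/3\mathbb{Z}$.

```python
def generated_submodule_and_annihilator(m, modulus):
    """For m in the Z-module Z/modulus, return (Zm, a) where Ann(m) = (a)."""
    # Multiples r * m for r = 0, ..., modulus - 1 already exhaust Zm.
    submodule = sorted({(r * m) % modulus for r in range(modulus)})
    # Ann(m) is an ideal of Z, generated by its smallest positive element.
    generator = next(r for r in range(1, modulus + 1) if (r * m) % modulus == 0)
    return submodule, generator

submodule, generator = generated_submodule_and_annihilator(8, 12)
print(submodule)   # [0, 4, 8]
print(generator)   # 3
```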
In general, we can generate a submodule with many elements.
Definition (Finitely generated module). An $R$-module $M$ is finitely generated
if there is a finite list of elements $m_1, \cdots, m_k$ such that
\[ M = Rm_1 + Rm_2 + \cdots + Rm_k = \{r_1 m_1 + r_2 m_2 + \cdots + r_k m_k : r_i \in R\}. \]
This is in some sense analogous to the idea of a vector space being finite-dimensional.
However, it behaves rather differently.
While this definition is rather concrete, it is often not the most helpful
characterization of finitely-generated modules. Instead, we use the following
lemma:
Lemma. An $R$-module $M$ is finitely-generated if and only if there is a surjective
$R$-module homomorphism $f : R^k \to M$ for some finite $k$.
Proof. If
\[ M = Rm_1 + Rm_2 + \cdots + Rm_k, \]
we define $f : R^k \to M$ by
\[ (r_1, \cdots, r_k) \mapsto r_1 m_1 + \cdots + r_k m_k. \]
It is clear that this is an $R$-module homomorphism. This is by definition
surjective. So done.
Conversely, given a surjection $f : R^k \to M$, we let
\[ m_i = f(0, 0, \cdots, 0, 1, 0, \cdots, 0), \]
where the 1 appears in the $i$th position. We now claim that
\[ M = Rm_1 + Rm_2 + \cdots + Rm_k. \]
So let $m \in M$. As $f$ is surjective, we know
\[ m = f(r_1, r_2, \cdots, r_k) \]
for some $r_i$. We then have
\begin{align*}
f(r_1, r_2, \cdots, r_k)
&= f((r_1, 0, \cdots, 0) + (0, r_2, 0, \cdots, 0) + \cdots + (0, 0, \cdots, 0, r_k)) \\
&= f(r_1, 0, \cdots, 0) + f(0, r_2, 0, \cdots, 0) + \cdots + f(0, 0, \cdots, 0, r_k) \\
&= r_1 f(1, 0, \cdots, 0) + r_2 f(0, 1, 0, \cdots, 0) + \cdots + r_k f(0, 0, \cdots, 0, 1) \\
&= r_1 m_1 + r_2 m_2 + \cdots + r_k m_k.
\end{align*}
So the $m_i$ generate $M$.
This view is a convenient way of thinking about finitely-generated modules.
For example, we can immediately prove the following corollary:
Corollary. Let $N \leq M$ and $M$ be finitely-generated. Then $M/N$ is also finitely
generated.
Proof. Since $M$ is finitely generated, we have some surjection $f : R^k \to M$.
Moreover, we have the surjective quotient map $q : M \to M/N$. Then we get the
following composition
\[ R^k \xrightarrow{\;f\;} M \xrightarrow{\;q\;} M/N, \]
which is a surjection, since it is a composition of surjections. So $M/N$ is finitely
generated.
It is very tempting to believe that if a module is finitely generated, then its
submodules are also finitely generated. It would be very wrong to think so.
Example. A submodule of a finitely-generated module need not be finitely
generated.
We let $R = \mathbb{C}[X_1, X_2, \cdots]$. We consider the $R$-module $M = R$, which is
finitely generated (by 1). A submodule of the ring is the same as an ideal.
Moreover, an ideal is finitely generated as an ideal if and only if it is finitely
generated as a module. We pick the submodule
\[ I = (X_1, X_2, \cdots), \]
which we have already shown to be not finitely-generated. So done.
Example. For a complex number $\alpha$, the ring $\mathbb{Z}[\alpha]$ (i.e. the smallest subring
of $\mathbb{C}$ containing $\alpha$) is finitely generated as a $\mathbb{Z}$-module if and only if $\alpha$ is an
algebraic integer.
The proof is left as an exercise for the reader on the last example sheet. This allows
us to prove that algebraic integers are closed under addition and multiplication,
since it is easier to argue about whether $\mathbb{Z}[\alpha]$ is finitely generated.
3.2 Direct sums and free modules
We’ve been secretly using the direct sum in many examples, but we shall define
it properly now.
Definition (Direct sum of modules). Let $M_1, M_2, \cdots, M_k$ be $R$-modules. The
direct sum is the $R$-module
\[ M_1 \oplus M_2 \oplus \cdots \oplus M_k, \]
which is the set $M_1 \times M_2 \times \cdots \times M_k$, with addition given by
\[ (m_1, \cdots, m_k) + (m_1', \cdots, m_k') = (m_1 + m_1', \cdots, m_k + m_k'), \]
and the $R$-action given by
\[ r \cdot (m_1, \cdots, m_k) = (rm_1, \cdots, rm_k). \]
We've been using one example of the direct sum already, namely
\[ R^n = \underbrace{R \oplus R \oplus \cdots \oplus R}_{n \text{ times}}. \]
Recall we said modules are like vector spaces. So we can try to define things like
basis and linear independence. However, we will fail massively, since we really
can’t prove much about them. Still, we can define them.
Definition (Linear independence). Let $m_1, \cdots, m_k \in M$. Then $\{m_1, \cdots, m_k\}$
is linearly independent if
\[ \sum_{i=1}^k r_i m_i = 0 \]
implies $r_1 = r_2 = \cdots = r_k = 0$.
Lots of modules will not have a basis in the sense we are used to. The next
best thing would be the following:
Definition (Freely generate). A subset $S \subseteq M$ generates $M$ freely if
(i) $S$ generates $M$
(ii) Any set function $\psi : S \to N$ to an $R$-module $N$ extends to an $R$-module
map $\theta : M \to N$.
Note that if $\theta_1, \theta_2$ are two such extensions, we can consider $\theta_1 - \theta_2 : M \to N$.
Then $\theta_1 - \theta_2$ sends everything in $S$ to 0. So $S \subseteq \ker(\theta_1 - \theta_2) \leq M$. So the
submodule generated by $S$ lies in $\ker(\theta_1 - \theta_2)$ too. But this is by definition $M$.
So $M \leq \ker(\theta_1 - \theta_2) \leq M$, i.e. equality holds. So $\theta_1 - \theta_2 = 0$. So $\theta_1 = \theta_2$. So
any such extension is unique.
Thus, what this definition tells us is that giving a map from $M$ to $N$ is
exactly the same thing as giving a function from $S$ to $N$.
Definition (Free module and basis). An $R$-module is free if it is freely generated
by some subset $S \subseteq M$, and $S$ is called a basis.
We will soon prove that if
R
is a field, then every module is free. However, if
R is not a field, then there are non-free modules.
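Example. The $\mathbb{Z}$-module $\mathbb{Z}/2\mathbb{Z}$ is not freely generated. Suppose $\mathbb{Z}/2\mathbb{Z}$ were
freely generated by some $S \subseteq \mathbb{Z}/2\mathbb{Z}$. Then this can only possibly be $S = \{1\}$. Then
this implies there is a homomorphism $\theta : \mathbb{Z}/2\mathbb{Z} \to \mathbb{Z}$ sending 1 to 1. But then $\theta$
would have to send $0 = 1 + 1$ to $1 + 1 = 2 \neq 0$, which is impossible since homomorphisms send 0 to 0. So $\mathbb{Z}/2\mathbb{Z}$ is not
freely generated.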
We now want to formulate free modules in a way more similar to what we
do in linear algebra.
Proposition. For a subset $S = \{m_1, \cdots, m_k\} \subseteq M$, the following are equivalent:
(i) $S$ generates $M$ freely.
(ii) $S$ generates $M$ and the set $S$ is independent.
(iii) Every element of $M$ is uniquely expressible as
\[ r_1 m_1 + r_2 m_2 + \cdots + r_k m_k \]
for some $r_i \in R$.
Proof. The fact that (ii) and (iii) are equivalent is something we would expect
from what we know from linear algebra, and in fact the proof is the same. So
we only show that (i) and (ii) are equivalent.
Let $S$ generate $M$ freely. If $S$ is not independent, then we can write
\[ r_1 m_1 + \cdots + r_k m_k = 0, \]
with $r_i \in R$ and, say, $r_1$ non-zero. We define the set function $\psi : S \to R$ by
sending $m_1 \mapsto 1_R$ and $m_i \mapsto 0$ for all $i \neq 1$. As $S$ generates $M$ freely, this
extends to an $R$-module homomorphism $\theta : M \to R$.
By definition of a homomorphism, we can compute
\begin{align*}
0 &= \theta(0) \\
&= \theta(r_1 m_1 + r_2 m_2 + \cdots + r_k m_k) \\
&= r_1 \theta(m_1) + r_2 \theta(m_2) + \cdots + r_k \theta(m_k) \\
&= r_1.
\end{align*}
This is a contradiction. So $S$ must be independent.
To prove the other direction, suppose every element can be uniquely written
as $r_1 m_1 + \cdots + r_k m_k$. Given any set function $\psi : S \to N$, we define $\theta : M \to N$
by
\[ \theta(r_1 m_1 + \cdots + r_k m_k) = r_1 \psi(m_1) + \cdots + r_k \psi(m_k). \]
This is well-defined by uniqueness, and is clearly a homomorphism. So it follows
that $S$ generates $M$ freely.
Example. The set $\{2, 3\} \subseteq \mathbb{Z}$ generates $\mathbb{Z}$. However, it does not generate $\mathbb{Z}$
freely, since
\[ 3 \cdot 2 + (-2) \cdot 3 = 0. \]
Recall from linear algebra that if a set $S$ spans a vector space $V$, and it is not
independent, then we can just pick some useless vectors and throw them away
in order to get a basis. However, this is no longer the case in modules. Neither
2 nor 3 alone generates $\mathbb{Z}$.
Definition (Relations). If $M$ is a finitely-generated $R$-module, we have shown
that there is a surjective $R$-module homomorphism $\phi : R^k \to M$. We call $\ker(\phi)$
the relation module for those generators.
Definition (Finitely presented module). A finitely-generated module is finitely
presented if we have a surjective homomorphism $\phi : R^k \to M$ and $\ker \phi$ is finitely
generated.
Being finitely presented means I can tell you everything about the module
with a finite amount of paper. More precisely, if $\{m_1, \cdots, m_k\}$ generate $M$ and
$\{n_1, n_2, \cdots, n_\ell\}$ generate $\ker(\phi)$, then each
\[ n_i = (r_{i1}, \cdots, r_{ik}) \]
corresponds to the relation
\[ r_{i1} m_1 + r_{i2} m_2 + \cdots + r_{ik} m_k = 0 \]
in $M$. So $M$ is the module generated by writing down $R$-linear combinations
of $m_1, \cdots, m_k$, and saying two elements are the same if they are related to one
another by these relations. Since there are only finitely many generators and
finitely many such relations, we can specify the module with a finite amount of
information.
A natural question we might ask is: if $n \neq m$, are $R^n$ and $R^m$ the same?
In vector spaces, they obviously must be different, since basis and dimension are
well-defined concepts.
Proposition (Invariance of dimension/rank). Let $R$ be a non-zero ring. If
$R^n \cong R^m$ as $R$-modules, then $n = m$.
We know this is true if $R$ is a field. We now want to reduce the case of a
general ring to the case where $R$ is a field.
If
R
is an integral domain, then we can produce a field by taking the field of
fractions, and this might be a good starting point. However, we want to do this
for general rings. So we need some more magic.
We will need the following construction:
Let $I \lhd R$ be an ideal, and let $M$ be an $R$-module. We define
\[ IM = \{a_1 m_1 + \cdots + a_k m_k \in M : a_i \in I, m_i \in M\} \leq M, \]
i.e. the submodule generated by the products $am$ with $a \in I$ and $m \in M$.
So we can take the quotient module $M/IM$, which is an $R$-module again.
Now if $b \in I$, then its action on $M/IM$ is
\[ b(m + IM) = bm + IM = IM. \]
So everything in $I$ kills everything in $M/IM$. So we can consider $M/IM$ as an
$R/I$-module by
\[ (r + I) \cdot (m + IM) = r \cdot m + IM. \]
So we have proved that
Proposition. If $I \lhd R$ is an ideal and $M$ is an $R$-module, then $M/IM$ is an
$R/I$-module in a natural way.
We next need to use the following general fact:
Proposition. Every non-zero ring has a maximal ideal.
This is a rather strong statement, since it talks about “all rings”, and we can
have weird rings. We need to use a more subtle argument, namely via Zorn’s
lemma. You probably haven’t seen it before, in which case you might want to
skip the proof and just take the lecturer’s word on it.
Proof. We observe that an ideal $I \lhd R$ is proper if and only if $1_R \notin I$. So every
increasing union of proper ideals is proper. Then by Zorn's lemma, there is a
maximal ideal (roughly, Zorn's lemma says that if an arbitrary union of increasing things is
still a thing, then there is a maximal such thing).
With these two notions, we get
Proposition (Invariance of dimension/rank). Let $R$ be a non-zero ring. If
$R^n \cong R^m$ as $R$-modules, then $n = m$.
Proof. Let $I$ be a maximal ideal of $R$. Suppose we have $R^n \cong R^m$. Then we
must have
\[ \frac{R^n}{IR^n} \cong \frac{R^m}{IR^m}, \]
as $R/I$-modules.
But staring at it long enough, we figure that
\[ \frac{R^n}{IR^n} \cong \left(\frac{R}{I}\right)^n, \]
and similarly for $m$. Since $R/I$ is a field, the result follows by linear algebra.
The point of this proposition is not the result itself (which is not too interesting),
but the general constructions used behind the proof.
3.3 Matrices over Euclidean domains
This is the part of the course where we deliver all our promises about proving
the classification of finite abelian groups and Jordan normal forms.
Until further notice, we will assume $R$ is a Euclidean domain, and we write
$\phi : R \setminus \{0\} \to \mathbb{Z}_{\geq 0}$ for its Euclidean function. We know that in such a Euclidean
domain, the greatest common divisor $\gcd(a, b)$ exists for all $a, b \in R$.
We will consider some matrices with entries in R.
Definition (Elementary row operations). Elementary row operations on an
$m \times n$ matrix $A$ with entries in $R$ are operations of the form
(i) Add $c \in R$ times the $i$th row to the $j$th row. This may be done by
multiplying by the following matrix on the left:
\[ \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & 1 & c & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix}, \]
where $c$ appears in the $i$th column of the $j$th row.
(ii) Swap the $i$th and $j$th rows. This can be done by left-multiplication of the
matrix
\[ \begin{pmatrix} 1 & & & & & & \\ & \ddots & & & & & \\ & & 0 & & 1 & & \\ & & & \ddots & & & \\ & & 1 & & 0 & & \\ & & & & & \ddots & \\ & & & & & & 1 \end{pmatrix}. \]
Again, the rows and columns we have messed with are the $i$th and $j$th
rows and columns.
(iii) We multiply the $i$th row by a unit $c \in R$. We do this via the following
matrix:
\[ \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & c & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix}. \]
Notice that if $R$ is a field, then we can multiply any row by any non-zero
number, since they are all units.
We also have elementary column operations defined in a similar fashion, corresponding
to right multiplication of the matrices. Notice all these matrices are
invertible.
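The correspondence between operations and matrices can be checked on a small example. The following Python sketch builds the three kinds of elementary matrices over $\mathbb{Z}$ and multiplies them against a sample matrix. The helper names are hypothetical, and rows and columns are indexed from 0 in the code, unlike in the text.

```python
def identity(n):
    return [[int(a == b) for b in range(n)] for a in range(n)]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def add_row_multiple(n, i, j, c):
    """Matrix adding c times row i to row j (c sits in row j, column i)."""
    E = identity(n)
    E[j][i] = c
    return E

def swap_rows(n, i, j):
    """Matrix swapping rows i and j."""
    E = identity(n)
    E[i], E[j] = E[j], E[i]
    return E

def scale_row(n, i, c):
    """Matrix multiplying row i by c (c should be a unit, e.g. -1 over Z)."""
    E = identity(n)
    E[i][i] = c
    return E

A = [[1, 2], [3, 4]]
print(matmul(add_row_multiple(2, 0, 1, 5), A))  # [[1, 2], [8, 14]]
print(matmul(swap_rows(2, 0, 1), A))            # [[3, 4], [1, 2]]
print(matmul(scale_row(2, 1, -1), A))           # [[1, 2], [-3, -4]]
# Right multiplication acts on columns: this adds 5 times column 1 to column 0.
print(matmul(A, add_row_multiple(2, 0, 1, 5)))  # [[11, 2], [23, 4]]
```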
Definition (Equivalent matrices). Two matrices are equivalent if we can get from
one to the other via a sequence of such elementary row and column operations.
Note that if $A$ and $B$ are equivalent, then we can write
\[ B = QAT^{-1} \]
for some invertible matrices $Q$ and $T^{-1}$.
The aim of the game is to find, for each matrix, a matrix equivalent to it
that is as simple as possible. Recall from IB Linear Algebra that if $R$ is a field,
then we can put any matrix into the form
\[ \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \]
via elementary row and column operations. This is no longer true when working
with rings. For example, over $\mathbb{Z}$, we cannot put the matrix
\[ \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} \]
into that form, since no operation can turn the 2 into a 1. What we get is the
following result:
Theorem (Smith normal form). An $m \times n$ matrix over a Euclidean domain $R$
is equivalent to a diagonal matrix
\[ \begin{pmatrix} d_1 & & & & & & \\ & d_2 & & & & & \\ & & \ddots & & & & \\ & & & d_r & & & \\ & & & & 0 & & \\ & & & & & \ddots & \\ & & & & & & 0 \end{pmatrix}, \]
with the $d_i$ all non-zero and
\[ d_1 \mid d_2 \mid d_3 \mid \cdots \mid d_r. \]
Note that the divisibility criterion is similar to the classification of finitely-
generated abelian groups. In fact, we will derive that as a consequence of the
Smith normal form.
Definition (Invariant factors). The $d_k$ obtained in the Smith normal form are
called the invariant factors of $A$.
We first exhibit the algorithm for producing the Smith normal form with an
example in $\mathbb{Z}$.
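Example. We start with the matrix
\[ \begin{pmatrix} 3 & 7 & 4 \\ 1 & -1 & 2 \\ 3 & 5 & 1 \end{pmatrix}. \]
We want to move the 1 to the top-left corner. So we swap the first and second
rows to obtain
\[ \begin{pmatrix} 1 & -1 & 2 \\ 3 & 7 & 4 \\ 3 & 5 & 1 \end{pmatrix}. \]
We then try to eliminate the other entries in the first row by column operations.
We add multiples of the first column to the second and third to obtain
\[ \begin{pmatrix} 1 & 0 & 0 \\ 3 & 10 & -2 \\ 3 & 8 & -5 \end{pmatrix}. \]
We similarly clear the first column to get
\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 10 & -2 \\ 0 & 8 & -5 \end{pmatrix}. \]
We are left with a $2 \times 2$ matrix to fiddle with.
We swap the second and third columns so that 2 is in the $(2, 2)$ entry, and
secretly change sign to get
\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 10 \\ 0 & 5 & 8 \end{pmatrix}. \]
We notice that $\gcd(2, 5) = 1$. So we can use linear combinations to introduce a 1 at
the bottom
\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 10 \\ 0 & 1 & -12 \end{pmatrix}. \]
Swapping rows, we get
\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -12 \\ 0 & 2 & 10 \end{pmatrix}. \]
We then clear the remaining rows and columns to get
\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 34 \end{pmatrix}. \]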
Proof. Throughout the process, we will keep calling our matrix $A$, even though
it keeps changing in each step, so that we don't have to invent hundreds of names
for these matrices.
If $A = 0$, then done! So suppose $A \neq 0$. So some entry is not zero, say,
$A_{ij} \neq 0$. Swapping the $i$th and first row, then $j$th and first column, we arrange
that $A_{11} \neq 0$. We now try to reduce $A_{11}$ as much as possible. We have the
following two possible moves:
(i) If there is an $A_{1j}$ not divisible by $A_{11}$, then we can use Euclidean
division to write
\[ A_{1j} = qA_{11} + r. \]
By assumption, $r \neq 0$. So $\phi(r) < \phi(A_{11})$ (where $\phi$ is the Euclidean
function).
So we subtract $q$ copies of the first column from the $j$th column. Then
in position $(1, j)$, we now have $r$. We swap the first and $j$th column such
that $r$ is in position $(1, 1)$, and we have strictly reduced the value of $\phi$ at
the first entry.
(ii) If there is an $A_{i1}$ not divisible by $A_{11}$, we do the same thing with rows, and this
again reduces $\phi(A_{11})$.
We keep performing these until no move is possible. Since the value of $\phi(A_{11})$
strictly decreases every move, we stop after finitely many applications. Then we
know that we must have $A_{11}$ dividing all $A_{1j}$ and $A_{i1}$. Now we can just subtract
appropriate multiples of the first column from others so that $A_{1j} = 0$ for $j \neq 1$.
We do the same thing with rows so that the first column is cleared. Then we have a
matrix of the form
\[ A = \begin{pmatrix} d & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & C & \\ 0 & & & \end{pmatrix}. \]
We would like to say "do the same thing with $C$", but then this would get us a
regular diagonal matrix, not necessarily in Smith normal form. So we need some
preparation.
(iii) Suppose there is an entry of $C$ not divisible by $d$, say $A_{ij}$ with $i, j > 1$.
\[ A = \begin{pmatrix} d & 0 & \cdots & 0 & \cdots & 0 \\ 0 & & & & & \\ \vdots & & & & & \\ 0 & & & A_{ij} & & \\ \vdots & & & & & \\ 0 & & & & & \end{pmatrix} \]
We suppose
\[ A_{ij} = qd + r, \]
with $r \neq 0$ and $\phi(r) < \phi(d)$. We add column 1 to column $j$, and subtract
$q$ times row 1 from row $i$. Now we get $r$ in the $(i, j)$th entry, and we want
to send it back to the $(1, 1)$ position. We swap row $i$ with row 1, swap
column $j$ with column 1, so that $r$ is in the $(1, 1)$th entry, and $\phi(r) < \phi(d)$.
Now we have messed up the first row and column. So we go back and do
(i) and (ii) again until the first row and columns are cleared. Then we get
\[ A = \begin{pmatrix} d' & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & C' & \\ 0 & & & \end{pmatrix}, \]
where
\[ \phi(d') \leq \phi(r) < \phi(d). \]
As this strictly decreases the value of $\phi(A_{11})$, we can only repeat this finitely
many times. When we stop, we will end up with a matrix
\[ A = \begin{pmatrix} d & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & C & \\ 0 & & & \end{pmatrix}, \]
and $d$ divides every entry of $C$. Now we apply the entire process to $C$. When
we do this process, notice all allowed operations don't change the fact that $d$
divides every entry of $C$.
So applying this recursively, we obtain a diagonal matrix with the claimed
divisibility property.
Note that if we didn’t have to care about the divisibility property, we can
just do (i) and (ii), and we can get a diagonal matrix. The magic to get to the
Smith normal form is (iii).
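The proof is really an algorithm, and it can be sketched in Python for $R = \mathbb{Z}$ with $\phi(x) = |x|$. This is a minimal sketch rather than an efficient implementation, and it deviates from the proof in two labelled ways: it always moves an entry of smallest absolute value to the pivot position before reducing, and in move (iii) it adds the offending row to the pivot row instead of adding column 1 to column $j$. The function name `smith_diagonal` is made up, and the function returns only the non-zero diagonal entries, normalized to be positive.

```python
def smith_diagonal(matrix):
    """Return [d_1, ..., d_r] with d_1 | d_2 | ... | d_r for an integer matrix."""
    A = [list(row) for row in matrix]
    m = len(A)
    n = len(A[0]) if m else 0
    diagonal = []
    t = 0  # we work on the block A[t:, t:]
    while t < min(m, n):
        # Find a non-zero entry of smallest absolute value in the block.
        best = None
        for i in range(t, m):
            for j in range(t, n):
                if A[i][j] != 0 and (best is None or abs(A[i][j]) < abs(A[best[0]][best[1]])):
                    best = (i, j)
        if best is None:
            break  # the remaining block is zero
        bi, bj = best
        A[t], A[bi] = A[bi], A[t]           # row swap
        for row in A:                        # column swap
            row[t], row[bj] = row[bj], row[t]
        d = A[t][t]
        # Moves (i) and (ii): reduce the pivot row and column modulo d.
        for j in range(t + 1, n):
            q = A[t][j] // d
            for i in range(t, m):
                A[i][j] -= q * A[i][t]
        for i in range(t + 1, m):
            q = A[i][t] // d
            for j in range(t, n):
                A[i][j] -= q * A[t][j]
        if any(A[t][j] for j in range(t + 1, n)) or any(A[i][t] for i in range(t + 1, m)):
            continue  # a smaller non-zero remainder appeared; pick a new pivot
        # Move (iii): d must divide every entry of the remaining block C.
        bad = next((i for i in range(t + 1, m) for j in range(t + 1, n)
                    if A[i][j] % d != 0), None)
        if bad is not None:
            for j in range(t, n):
                A[t][j] += A[bad][j]  # bring the offending entry into the pivot row
            continue
        diagonal.append(abs(d))
        t += 1
    return diagonal

print(smith_diagonal([[2, 0], [0, 3]]))  # [1, 6]
```

Termination follows the same idea as in the proof: every time the loop repeats without advancing `t`, the smallest absolute value of a non-zero entry in the block strictly decreases, either immediately or on the next pass.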
Recall that the $d_i$ are called the invariant factors. So it would be nice if we
can prove that the $d_i$ are indeed invariant. It is not clear from the algorithm
that we will always end up with the same $d_i$. Indeed, we can multiply a whole
row by $-1$ and get different invariant factors. However, it turns out that these
are unique up to multiplication by units.
To study the uniqueness of the invariant factors of a matrix
A
, we relate
them to other invariants, which involves minors.
Definition (Minor). A $k \times k$ minor of a matrix $A$ is the determinant of a $k \times k$
sub-matrix of $A$ (i.e. a matrix formed by removing all but $k$ rows and all but $k$
columns).
Any given matrix has many minors, since we get to decide which rows and
columns we can throw away. The idea is to consider the ideal generated by all
the minors of the matrix.
Definition (Fitting ideal). For a matrix $A$, the $k$th Fitting ideal $\mathrm{Fit}_k(A) \lhd R$ is
the ideal generated by the set of all $k \times k$ minors of $A$.
A key property is that equivalent matrices have the same Fitting ideal, even
if they might have very different minors.
Lemma. Let $A$ and $B$ be equivalent matrices. Then
\[ \mathrm{Fit}_k(A) = \mathrm{Fit}_k(B) \]
for all $k$.
Proof. It suffices to show that changing $A$ by a row or column operation does
not change the Fitting ideal. Since taking the transpose does not change the
determinant, i.e. $\mathrm{Fit}_k(A) = \mathrm{Fit}_k(A^T)$, it suffices to consider the row operations.
The most difficult one is taking linear combinations. Let $B$ be the result of
adding $c$ times the $i$th row to the $j$th row, and fix $C$ a $k \times k$ submatrix of $A$. Suppose
the corresponding submatrix of $B$ is $C'$. We then want to show that $\det C' \in \mathrm{Fit}_k(A)$.
If the $j$th row is outside of $C$, then the minor $\det C$ is unchanged. If both
the $i$th and $j$th rows are in $C$, then the submatrix $C$ changes by a row operation,
which does not affect the determinant. These are the boring cases.
Suppose the $j$th row is in $C$ and the $i$th row is not. Suppose the entries of the $i$th row
in the columns of $C$ are $f_1, \cdots, f_k$. Then $C$ is changed to $C'$, with the $j$th row being
\[ (C_{j1} + cf_1, C_{j2} + cf_2, \cdots, C_{jk} + cf_k). \]
We compute $\det C'$ by expanding along this row. Then we get
\[ \det C' = \det C + c \det D, \]
where $D$ is the matrix obtained by replacing the $j$th row of $C$ with $(f_1, \cdots, f_k)$.
The point is that $\det C$ is definitely a minor of $A$, and $\det D$ is still a minor of
$A$ (up to sign), just another one. Since ideals are closed under addition and multiplication,
we know
\[ \det(C') \in \mathrm{Fit}_k(A). \]
The other operations are much simpler. They just follow by standard properties
of the effect of swapping rows or multiplying rows on determinants. So after any
row operation, the resultant submatrix $C'$ satisfies
\[ \det(C') \in \mathrm{Fit}_k(A). \]
Since this is true for all minors, we must have
\[ \mathrm{Fit}_k(B) \subseteq \mathrm{Fit}_k(A). \]
But row operations are invertible. So we must have
\[ \mathrm{Fit}_k(A) \subseteq \mathrm{Fit}_k(B) \]
as well. So they must be equal. So done.
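We now notice that if we have a matrix in Smith normal form, say
\[ B = \begin{pmatrix} d_1 & & & & & & \\ & d_2 & & & & & \\ & & \ddots & & & & \\ & & & d_r & & & \\ & & & & 0 & & \\ & & & & & \ddots & \\ & & & & & & 0 \end{pmatrix}, \]
then we can immediately read off
\[ \mathrm{Fit}_k(B) = (d_1 d_2 \cdots d_k). \]
This is clear once we notice that the only possible contributing minors are from
the diagonal submatrices, and the minor from the top left square submatrix
divides all other diagonal ones. So we have
Corollary. If $A$ has Smith normal form
\[ B = \begin{pmatrix} d_1 & & & & & & \\ & d_2 & & & & & \\ & & \ddots & & & & \\ & & & d_r & & & \\ & & & & 0 & & \\ & & & & & \ddots & \\ & & & & & & 0 \end{pmatrix}, \]
then
\[ \mathrm{Fit}_k(A) = (d_1 d_2 \cdots d_k). \]
So $d_k$ is unique up to associates.
This is since we can find $d_k$ by dividing the generator of $\mathrm{Fit}_k(A)$ by the
generator of $\mathrm{Fit}_{k-1}(A)$.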
Example. Consider the matrix over $\mathbb{Z}$:
\[
  A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}.
\]
This is diagonal, but not in Smith normal form. We could apply the algorithm, but that would be messy. We notice that
\[
  \mathrm{Fit}_1(A) = (2, 3) = (1).
\]
So we know $d_1 = \pm 1$. We can then look at the second Fitting ideal
\[
  \mathrm{Fit}_2(A) = (6).
\]
So $d_1 d_2 = \pm 6$. So we must have $d_2 = \pm 6$. So the Smith normal form is
\[
  \begin{pmatrix} 1 & 0 \\ 0 & 6 \end{pmatrix}.
\]
That was much easier.
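The same computation can be sketched in code. The following is a minimal illustration over $\mathbb{Z}$ only, where a generator of $\mathrm{Fit}_k$ is the gcd of all $k \times k$ minors; the function names `det`, `fitting_generator` and `invariant_factors` are made up for this sketch and are not from any library.

```python
from itertools import combinations
from math import gcd

def det(m):
    """Determinant of a square integer matrix by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def fitting_generator(a, k):
    """Non-negative generator of Fit_k(a) over Z: the gcd of all k x k minors."""
    g = 0
    for rs in combinations(range(len(a)), k):
        for cs in combinations(range(len(a[0])), k):
            g = gcd(g, det([[a[r][c] for c in cs] for r in rs]))
    return g

def invariant_factors(a):
    """d_1, d_2, ... obtained by dividing successive Fitting generators."""
    ds, prev = [], 1
    for k in range(1, min(len(a), len(a[0])) + 1):
        g = fitting_generator(a, k)
        if g == 0:  # all k x k minors vanish, so there are no further d_k
            break
        ds.append(g // prev)
        prev = g
    return ds

print(invariant_factors([[2, 0], [0, 3]]))  # [1, 6]
```

Checking every minor is exponential in the size of the matrix, so this is only sensible for small examples like the one above.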
We are now going to use Smith normal forms to do things. We will need
some preparation, in the form of the following lemma:
Lemma. Let $R$ be a principal ideal domain. Then any submodule of $R^m$ is generated by at most $m$ elements.
This is obvious for vector spaces, but is slightly more difficult here.
Proof. Let $N \leq R^m$ be a submodule. Consider the ideal
\[
  I = \{r \in R : (r, r_2, \cdots, r_m) \in N \text{ for some } r_2, \cdots, r_m \in R\}.
\]
It is clear this is an ideal. Since $R$ is a principal ideal domain, we must have $I = (a)$ for some $a \in R$. We now choose an
\[
  n = (a, a_2, \cdots, a_m) \in N.
\]
Then for any vector $(r_1, r_2, \cdots, r_m) \in N$, we know that $r_1 \in I$. So $a \mid r_1$. So we can write
\[
  r_1 = ra.
\]
Then we can form
\[
  (r_1, r_2, \cdots, r_m) - r(a, a_2, \cdots, a_m) = (0, r_2 - ra_2, \cdots, r_m - ra_m) \in N.
\]
This lies in $N' = N \cap (\{0\} \times R^{m-1}) \leq R^{m-1}$. Thus everything in $N$ can be written as a multiple of $n$ plus something in $N'$. But by induction, since $N' \leq R^{m-1}$, we know $N'$ is generated by at most $m - 1$ elements. So there are $n_2, \cdots, n_m \in N'$ generating $N'$. So $n, n_2, \cdots, n_m$ generate $N$.
If we have a submodule of $R^m$, then it has at most $m$ generators. However, these might generate the submodule in a terrible way. The next theorem tells us there is a nice way of finding generators.
Theorem. Let $R$ be a Euclidean domain, and let $N \leq R^m$ be a submodule. Then there exists a basis $v_1, \cdots, v_m$ of $R^m$ such that $N$ is generated by $d_1 v_1, d_2 v_2, \cdots, d_r v_r$ for some $0 \leq r \leq m$ and some $d_i \in R$ such that
\[
  d_1 \mid d_2 \mid \cdots \mid d_r.
\]
This is not hard, given what we’ve developed so far.
Proof. By the previous lemma, $N$ is generated by some elements $x_1, \cdots, x_n$ with $n \leq m$. Each $x_i$ is an element of $R^m$. So we can think of it as a column vector of length $m$, and we can form a matrix
\[
  A = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix}.
\]
We’ve got an $m \times n$ matrix. So we can put it in Smith normal form! Since there are no more columns than there are rows, this is of the form
\[
  \begin{pmatrix}
    d_1 \\
    & d_2 \\
    & & \ddots \\
    & & & d_r \\
    & & & & 0 \\
    & & & & & \ddots \\
    & & & & & & 0 \\
    0 & \cdots & \cdots & \cdots & \cdots & \cdots & 0 \\
    \vdots & & & & & & \vdots \\
    0 & \cdots & \cdots & \cdots & \cdots & \cdots & 0
  \end{pmatrix}
\]
Recall that we got to the Smith normal form by row and column operations. Performing row operations is just changing the basis of $R^m$, while each column operation changes the generators of $N$.
So what this tells us is that there is a new basis $v_1, \cdots, v_m$ of $R^m$ such that $N$ is generated by $d_1 v_1, \cdots, d_r v_r$. By definition of Smith normal form, the divisibility condition holds.
Corollary. Let $R$ be a Euclidean domain. A submodule of $R^m$ is free of rank at most $m$. In other words, a submodule of a free module is free, and of a smaller (or equal) rank.
Proof. Let $N \leq R^m$ be a submodule. By the above, there is a basis $v_1, \cdots, v_m$ of $R^m$ such that $N$ is generated by $d_1 v_1, \cdots, d_r v_r$ for $r \leq m$. So it is certainly generated by at most $m$ elements. So we only have to show that $d_1 v_1, \cdots, d_r v_r$ are independent. But if they were linearly dependent, then, since the $d_i$ are non-zero and $R$ is an integral domain, so would be $v_1, \cdots, v_r$. But $v_1, \cdots, v_m$ are a basis, hence independent. So $d_1 v_1, \cdots, d_r v_r$ generate $N$ freely. So $N \cong R^r$.
Note that this is not true for all rings. For example, $(2, X) \lhd \mathbb{Z}[X]$ is a submodule of $\mathbb{Z}[X]$, but is not isomorphic to $\mathbb{Z}[X]$.
Theorem (Classification of finitely-generated modules over a Euclidean domain). Let $R$ be a Euclidean domain, and $M$ be a finitely generated $R$-module. Then
\[
  M \cong \frac{R}{(d_1)} \oplus \frac{R}{(d_2)} \oplus \cdots \oplus \frac{R}{(d_r)} \oplus R \oplus R \oplus \cdots \oplus R
\]
for some $d_i \neq 0$, and $d_1 \mid d_2 \mid \cdots \mid d_r$.
This is either a good or bad thing. If you are pessimistic, this says the world
of finitely generated modules is boring, since there are only these modules we
already know about. If you are optimistic, this tells you all finitely-generated
modules are of this simple form, so we can prove things about them assuming
they look like this.
Proof. Since $M$ is finitely-generated, there is a surjection $\phi: R^m \to M$. So by the first isomorphism theorem, we have
\[
  M \cong \frac{R^m}{\ker \phi}.
\]
Since $\ker \phi$ is a submodule of $R^m$, by the previous theorem, there is a basis $v_1, \cdots, v_m$ of $R^m$ such that $\ker \phi$ is generated by $d_1 v_1, \cdots, d_r v_r$ for $0 \leq r \leq m$ and $d_1 \mid d_2 \mid \cdots \mid d_r$. So we know
\[
  M \cong \frac{R^m}{((d_1, 0, \cdots, 0), (0, d_2, 0, \cdots, 0), \cdots, (0, \cdots, 0, d_r, 0, \cdots, 0))}.
\]
This is just
\[
  \frac{R}{(d_1)} \oplus \frac{R}{(d_2)} \oplus \cdots \oplus \frac{R}{(d_r)} \oplus R \oplus \cdots \oplus R,
\]
with $m - r$ copies of $R$.
This is particularly useful in the case where $R = \mathbb{Z}$, where $R$-modules are abelian groups.
Example. Let $A$ be the abelian group generated by $a, b, c$ with relations
\begin{align*}
  2a + 3b + c &= 0,\\
  a + 2b &= 0,\\
  5a + 6b + 7c &= 0.
\end{align*}
In other words, we have
\[
  A = \frac{\mathbb{Z}^3}{((2, 3, 1), (1, 2, 0), (5, 6, 7))}.
\]
We would like to get a better description of $A$. It is not even obvious if this module is the zero module or not.
To work out a good description, we consider the matrix
\[
  X = \begin{pmatrix} 2 & 1 & 5 \\ 3 & 2 & 6 \\ 1 & 0 & 7 \end{pmatrix}.
\]
To figure out the Smith normal form, we find the Fitting ideals. We have
\[
  \mathrm{Fit}_1(X) = (1, \cdots) = (1).
\]
So $d_1 = 1$.
We have to work out the second Fitting ideal. In principle, we have to check all the minors, but we immediately notice
\[
  \begin{vmatrix} 2 & 1 \\ 3 & 2 \end{vmatrix} = 1.
\]
So $\mathrm{Fit}_2(X) = (1)$, and $d_2 = 1$. Finally, we find
\[
  \mathrm{Fit}_3(X) = \left(\begin{vmatrix} 2 & 1 & 5 \\ 3 & 2 & 6 \\ 1 & 0 & 7 \end{vmatrix}\right) = (3).
\]
So $d_3 = 3$. So we know
\[
  A \cong \frac{\mathbb{Z}}{(1)} \oplus \frac{\mathbb{Z}}{(1)} \oplus \frac{\mathbb{Z}}{(3)} \cong \frac{\mathbb{Z}}{(3)} \cong C_3.
\]
If you don’t feel like computing determinants, doing row and column reduction
is often as quick and straightforward.
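For comparison, here is a rough sketch of the row and column reduction over $\mathbb{Z}$. It is one possible way to organize the reduction, not necessarily the algorithm as presented in lectures, and the name `smith_diagonal` is invented for this illustration. It returns only the non-zero diagonal entries $d_1 \mid d_2 \mid \cdots$, up to sign.

```python
def smith_diagonal(a):
    """Non-zero diagonal entries of a Smith normal form of an integer matrix."""
    a = [row[:] for row in a]  # work on a copy
    m, n = len(a), len(a[0])
    diag, t = [], 0
    while t < min(m, n):
        # Choose a non-zero entry of least absolute value in the remaining block.
        entries = [(abs(a[i][j]), i, j)
                   for i in range(t, m) for j in range(t, n) if a[i][j] != 0]
        if not entries:
            break
        _, i, j = min(entries)
        a[t], a[i] = a[i], a[t]                   # row swap
        for row in a:
            row[t], row[j] = row[j], row[t]       # column swap
        pivot = a[t][t]
        dirty = False
        for i in range(t + 1, m):                 # clear column t by row operations
            q = a[i][t] // pivot
            a[i] = [x - q * y for x, y in zip(a[i], a[t])]
            dirty = dirty or a[i][t] != 0
        for j in range(t + 1, n):                 # clear row t by column operations
            q = a[t][j] // pivot
            for row in a:
                row[j] -= q * row[t]
            dirty = dirty or a[t][j] != 0
        if dirty:
            continue  # a smaller remainder appeared, so pick a new pivot
        bad = [i for i in range(t + 1, m)
               for j in range(t + 1, n) if a[i][j] % pivot != 0]
        if bad:
            # The pivot does not divide everything yet: add an offending row
            # to row t and reduce again.
            a[t] = [x + y for x, y in zip(a[t], a[bad[0]])]
            continue
        diag.append(abs(pivot))
        t += 1
    return diag

print(smith_diagonal([[2, 1, 5], [3, 2, 6], [1, 0, 7]]))  # [1, 1, 3]
```

Each pass either finishes a diagonal entry or strictly shrinks the smallest non-zero entry within two passes, which is the same Euclidean-function argument that makes the reduction terminate.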
We re-state the previous theorem in the specific case where $R$ is $\mathbb{Z}$, since this is particularly useful.
Corollary (Classification of finitely-generated abelian groups). Any finitely-generated abelian group is isomorphic to
\[
  C_{d_1} \times \cdots \times C_{d_r} \times C_\infty \times \cdots \times C_\infty,
\]
where $C_\infty \cong \mathbb{Z}$ is the infinite cyclic group, with
\[
  d_1 \mid d_2 \mid \cdots \mid d_r.
\]
Proof. Let $R = \mathbb{Z}$, and apply the classification of finitely generated $R$-modules.
Note that if the group is finite, then there cannot be any $C_\infty$ factors. So it is just a product of finite cyclic groups.
Corollary. If $A$ is a finite abelian group, then
\[
  A \cong C_{d_1} \times \cdots \times C_{d_r},
\]
with
\[
  d_1 \mid d_2 \mid \cdots \mid d_r.
\]
This is the result we stated at the beginning of the course.
Recall that we were also able to decompose a finite abelian group into products of the form $C_{p^k}$, where $p$ is a prime, and we said it was just the Chinese remainder theorem. This is again true in general, but we, again, need the Chinese remainder theorem.
Lemma (Chinese remainder theorem). Let $R$ be a Euclidean domain, and $a, b \in R$ be such that $\gcd(a, b) = 1$. Then
\[
  \frac{R}{(ab)} \cong \frac{R}{(a)} \times \frac{R}{(b)}
\]
as $R$-modules.
The proof is just that of the Chinese remainder theorem written in ring
language.
Proof. Consider the $R$-module homomorphism
\[
  \phi: \frac{R}{(a)} \times \frac{R}{(b)} \to \frac{R}{(ab)}
\]
by
\[
  (r_1 + (a), r_2 + (b)) \mapsto br_1 + ar_2 + (ab).
\]
To show this is well-defined, suppose
\[
  (r_1 + (a), r_2 + (b)) = (r_1' + (a), r_2' + (b)).
\]
Then
\begin{align*}
  r_1 &= r_1' + xa\\
  r_2 &= r_2' + yb.
\end{align*}
So
\[
  br_1 + ar_2 + (ab) = br_1' + xab + ar_2' + yab + (ab) = br_1' + ar_2' + (ab).
\]
So this is indeed well-defined. It is clear that this is a module map, by inspection.
We now have to show it is surjective and injective. So far, we have not used the hypothesis that $\gcd(a, b) = 1$. As we know $\gcd(a, b) = 1$, by the Euclidean algorithm, we can write
\[
  1 = ax + by
\]
for some $x, y \in R$. So we have
\[
  \phi(y + (a), x + (b)) = by + ax + (ab) = 1 + (ab).
\]
So $1 \in \operatorname{im} \phi$. Since this is an $R$-module map, we get
\[
  \phi(r(y + (a), x + (b))) = r \cdot (1 + (ab)) = r + (ab).
\]
The key fact is that $R/(ab)$ as an $R$-module is generated by $1$. Thus we know $\phi$ is surjective.
Finally, we have to show it is injective, i.e. that the kernel is trivial. Suppose
\[
  \phi(r_1 + (a), r_2 + (b)) = 0 + (ab).
\]
Then
\[
  br_1 + ar_2 \in (ab).
\]
So we can write
\[
  br_1 + ar_2 = abx
\]
for some $x \in R$. Since $a \mid ar_2$ and $a \mid abx$, we know $a \mid br_1$. Since $a$ and $b$ are coprime, unique factorization implies $a \mid r_1$. Similarly, we know $b \mid r_2$. So
\[
  (r_1 + (a), r_2 + (b)) = (0 + (a), 0 + (b)).
\]
So the kernel is trivial.
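As a sanity check in the case $R = \mathbb{Z}$, the map $\phi$ from the proof can be tabulated directly. This is a small sketch only; `crt_map` and `is_bijection` are names chosen for the illustration.

```python
def crt_map(a, b):
    """Table of phi(r1 + (a), r2 + (b)) = b*r1 + a*r2 + (ab) over R = Z."""
    return {(r1, r2): (b * r1 + a * r2) % (a * b)
            for r1 in range(a) for r2 in range(b)}

def is_bijection(a, b):
    """phi is a bijection exactly when it hits all a*b residues mod ab."""
    return len(set(crt_map(a, b).values())) == a * b

print(is_bijection(4, 9))  # True: gcd(4, 9) = 1
print(is_bijection(4, 6))  # False: gcd(4, 6) = 2, and every image is even
```

The second call shows why the coprimality hypothesis is needed: the map is still a well-defined homomorphism, but it is neither injective nor surjective.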
Theorem (Prime decomposition theorem). Let $R$ be a Euclidean domain, and $M$ be a finitely-generated $R$-module. Then
\[
  M \cong N_1 \oplus N_2 \oplus \cdots \oplus N_t,
\]
where each $N_i$ is either $R$ or is $R/(p^n)$ for some prime $p \in R$ and some $n \geq 1$.
Proof. We already know
\[
  M \cong \frac{R}{(d_1)} \oplus \cdots \oplus \frac{R}{(d_r)} \oplus R \oplus \cdots \oplus R.
\]
So it suffices to show that each $R/(d_i)$ can be written in that form. Writing $d$ for $d_i$, we let
\[
  d = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}
\]
with $p_i$ distinct primes. So the $p_i^{n_i}$ are pairwise coprime. So by the lemma iterated, we have
\[
  \frac{R}{(d)} \cong \frac{R}{(p_1^{n_1})} \oplus \cdots \oplus \frac{R}{(p_k^{n_k})}.
\]
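For $R = \mathbb{Z}$ the passage from invariant factors to prime powers is just integer factorization. The following sketch uses naive trial division, which is an assumption made for simplicity, and the helper names are hypothetical.

```python
def prime_power_factors(d):
    """Prime-power factors p^n of a positive integer d, found by trial division."""
    factors = []
    p = 2
    while p * p <= d:
        if d % p == 0:
            q = 1
            while d % p == 0:
                d //= p
                q *= p
            factors.append(q)
        p += 1
    if d > 1:
        factors.append(d)  # whatever is left is a single prime
    return factors

def primary_decomposition(invariant_factors):
    """Orders of the cyclic prime-power summands of C_{d_1} x ... x C_{d_r}."""
    return sorted(q for d in invariant_factors if d > 1
                  for q in prime_power_factors(d))

print(primary_decomposition([2, 12]))  # [2, 3, 4], i.e. C_2 x C_12 = C_2 x C_3 x C_4
```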
3.4 Modules over F[X] and normal forms for matrices
That was one promise delivered. We next want to consider the Jordan normal form. This is less straightforward, since considering $V$ directly as an $F$-module would not be too helpful (since that would just be pure linear algebra). Instead, we use the following trick:
For a field $F$, the polynomial ring $F[X]$ is a Euclidean domain, so the results of the last few sections apply. If $V$ is a vector space over $F$, and $\alpha: V \to V$ is a linear map, then we can make $V$ into an $F[X]$-module via
\begin{align*}
  F[X] \times V &\to V\\
  (f, v) &\mapsto (f(\alpha))(v).
\end{align*}
We write $V_\alpha$ for this $F[X]$-module.
Lemma. If $V$ is a finite-dimensional vector space, then $V_\alpha$ is a finitely-generated $F[X]$-module.
Proof. If $v_1, \cdots, v_n$ generate $V$ as an $F$-module, i.e. they span $V$ as a vector space over $F$, then they also generate $V_\alpha$ as an $F[X]$-module, since $F \leq F[X]$.
Example. Suppose $V_\alpha \cong F[X]/(X^r)$ as $F[X]$-modules. Then in particular they are isomorphic as $F$-modules (since being a map of $F$-modules has fewer requirements than being a map of $F[X]$-modules).
Under this bijection, the elements $1, X, X^2, \cdots, X^{r-1} \in F[X]/(X^r)$ form a vector space basis for $V_\alpha$. Viewing $F[X]/(X^r)$ as an $F$-vector space, the action of $X$ has the matrix
\[
  \begin{pmatrix}
    0 & 0 & \cdots & 0 & 0\\
    1 & 0 & \cdots & 0 & 0\\
    0 & 1 & \cdots & 0 & 0\\
    \vdots & \vdots & \ddots & \vdots & \vdots\\
    0 & 0 & \cdots & 1 & 0
  \end{pmatrix}.
\]
We also know that in $V_\alpha$, the action of $X$ is by definition the linear map $\alpha$. So under this basis, $\alpha$ also has matrix
\[
  \begin{pmatrix}
    0 & 0 & \cdots & 0 & 0\\
    1 & 0 & \cdots & 0 & 0\\
    0 & 1 & \cdots & 0 & 0\\
    \vdots & \vdots & \ddots & \vdots & \vdots\\
    0 & 0 & \cdots & 1 & 0
  \end{pmatrix}.
\]
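Example. Suppose
\[
  V_\alpha \cong \frac{F[X]}{((X - \lambda)^r)}
\]
for some $\lambda \in F$. Consider the new linear map
\[
  \beta = \alpha - \lambda \cdot \mathrm{id}: V \to V.
\]
Then $V_\beta \cong F[Y]/(Y^r)$, for $Y = X - \lambda$. So there is a basis for $V$ so that $\beta$ looks like
\[
  \begin{pmatrix}
    0 & 0 & \cdots & 0 & 0\\
    1 & 0 & \cdots & 0 & 0\\
    0 & 1 & \cdots & 0 & 0\\
    \vdots & \vdots & \ddots & \vdots & \vdots\\
    0 & 0 & \cdots & 1 & 0
  \end{pmatrix}.
\]
So we know $\alpha$ has matrix
\[
  \begin{pmatrix}
    \lambda & 0 & \cdots & 0 & 0\\
    1 & \lambda & \cdots & 0 & 0\\
    0 & 1 & \cdots & 0 & 0\\
    \vdots & \vdots & \ddots & \vdots & \vdots\\
    0 & 0 & \cdots & 1 & \lambda
  \end{pmatrix}
\]
So it is a Jordan block (except the Jordan blocks are the other way round, with zeroes below the diagonal).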
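Example. Suppose $V_\alpha \cong F[X]/(f)$ for some polynomial $f$, for
\[
  f = a_0 + a_1 X + \cdots + a_{r-1} X^{r-1} + X^r.
\]
This has a basis $1, X, X^2, \cdots, X^{r-1}$ as well, in which $\alpha$ is
\[
  c(f) = \begin{pmatrix}
    0 & 0 & \cdots & 0 & -a_0\\
    1 & 0 & \cdots & 0 & -a_1\\
    0 & 1 & \cdots & 0 & -a_2\\
    \vdots & \vdots & \ddots & \vdots & \vdots\\
    0 & 0 & \cdots & 1 & -a_{r-1}
  \end{pmatrix}.
\]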
We call this the companion matrix for the monic polynomial f .
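A quick way to convince oneself of the signs in the last column is to build $c(f)$ and check that $f(c(f)) = 0$. The sketch below works with integer coefficients only, and the helper names `companion`, `mat_mul` and `evaluate_monic` are invented for this illustration.

```python
def companion(coeffs):
    """Companion matrix of the monic f = a_0 + a_1 X + ... + a_{r-1} X^{r-1} + X^r,
    where coeffs = [a_0, ..., a_{r-1}]."""
    r = len(coeffs)
    c = [[0] * r for _ in range(r)]
    for i in range(1, r):
        c[i][i - 1] = 1            # X sends the basis vector X^{i-1} to X^i
    for i in range(r):
        c[i][r - 1] = -coeffs[i]   # X * X^{r-1} = X^r = -(a_0 + ... + a_{r-1} X^{r-1})
    return c

def mat_mul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def evaluate_monic(coeffs, m):
    """f(m) for the monic f above, by Horner's rule."""
    r = len(m)
    result = [[int(i == j) for j in range(r)] for i in range(r)]  # leading coefficient 1
    for a in reversed(coeffs):
        prod = mat_mul(result, m)
        result = [[prod[i][j] + a * int(i == j) for j in range(r)] for i in range(r)]
    return result

f = [2, 1]                              # f = X^2 + X + 2
print(companion(f))                     # [[0, -2], [1, -1]]
print(evaluate_monic(f, companion(f)))  # [[0, 0], [0, 0]]
```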
These are different things that can possibly happen. Since we have already classified all finitely generated $F[X]$-modules, this allows us to put matrices in a rather nice form.
Theorem (Rational canonical form). Let $\alpha: V \to V$ be a linear endomorphism of a finite-dimensional vector space over $F$, and $V_\alpha$ be the associated $F[X]$-module. Then
\[
  V_\alpha \cong \frac{F[X]}{(f_1)} \oplus \frac{F[X]}{(f_2)} \oplus \cdots \oplus \frac{F[X]}{(f_s)},
\]
with $f_1 \mid f_2 \mid \cdots \mid f_s$. Thus there is a basis for $V$ in which the matrix for $\alpha$ is the block diagonal
\[
  \begin{pmatrix}
    c(f_1) & 0 & \cdots & 0\\
    0 & c(f_2) & \cdots & 0\\
    \vdots & \vdots & \ddots & \vdots\\
    0 & 0 & \cdots & c(f_s)
  \end{pmatrix}
\]
This is the sort of theorem whose statement is longer than the proof.
Proof. We already know that $V_\alpha$ is a finitely-generated $F[X]$-module. By the structure theorem of $F[X]$-modules, we know
\[
  V_\alpha \cong \frac{F[X]}{(f_1)} \oplus \frac{F[X]}{(f_2)} \oplus \cdots \oplus \frac{F[X]}{(f_s)} \oplus 0.
\]
We know there are no copies of $F[X]$, since $V_\alpha = V$ is finite-dimensional over $F$, but $F[X]$ is not. The divisibility criterion also follows from the structure theorem. Then the form of the matrix is immediate.
This is really a canonical form. The Jordan normal form is not canonical, since we can move the blocks around. The structure theorem determines the factors $f_i$ up to units, and once we require them to be monic, there is no choice left.
In terms of matrices, this says that if $\alpha$ is represented by a matrix $A \in M_{n,n}(F)$ in some basis, then $A$ is conjugate to a matrix of the form above.
From the rational canonical form, we can immediately read off the minimal polynomial as $f_s$. This is since if we view $V_\alpha$ as the decomposition above, we find that $f_s(\alpha)$ kills everything in $F[X]/(f_s)$. It also kills the other factors since $f_i \mid f_s$ for all $i$. So $f_s(\alpha) = 0$. We also know no non-zero polynomial of smaller degree kills $V$, since it does not kill $F[X]/(f_s)$.
Similarly, we find that the characteristic polynomial of $\alpha$ is $f_1 f_2 \cdots f_s$.
Recall we had a different way of decomposing a module over a Euclidean
domain, namely the prime decomposition, and this gives us the Jordan normal
form.
Before we can use that, we need to know what the primes are. This is why
we need to work over C.
Lemma. The prime elements of $\mathbb{C}[X]$ are the $X - \lambda$ for $\lambda \in \mathbb{C}$ (up to multiplication by units).
Proof. Let $f \in \mathbb{C}[X]$. If $f$ is constant, then it is either a unit or $0$. Otherwise, by the fundamental theorem of algebra, it has a root $\lambda$. So it is divisible by $X - \lambda$. So if $f$ is irreducible, it must have degree $1$. And clearly everything of degree $1$ is prime.
Applying the prime decomposition theorem to $\mathbb{C}[X]$-modules gives us the Jordan normal form.
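Theorem (Jordan normal form). Let $\alpha: V \to V$ be an endomorphism of a finite-dimensional vector space $V$ over $\mathbb{C}$, and $V_\alpha$ be the associated $\mathbb{C}[X]$-module. Then
\[
  V_\alpha \cong \frac{\mathbb{C}[X]}{((X - \lambda_1)^{a_1})} \oplus \frac{\mathbb{C}[X]}{((X - \lambda_2)^{a_2})} \oplus \cdots \oplus \frac{\mathbb{C}[X]}{((X - \lambda_t)^{a_t})},
\]
where $\lambda_i \in \mathbb{C}$ do not have to be distinct. So there is a basis of $V$ in which $\alpha$ has matrix
\[
  \begin{pmatrix}
    J_{a_1}(\lambda_1) & & & 0\\
    & J_{a_2}(\lambda_2)\\
    & & \ddots\\
    0 & & & J_{a_t}(\lambda_t)
  \end{pmatrix},
\]
where
\[
  J_m(\lambda) = \begin{pmatrix}
    \lambda & 0 & \cdots & 0\\
    1 & \lambda & \cdots & 0\\
    \vdots & \ddots & \ddots & \vdots\\
    0 & \cdots & 1 & \lambda
  \end{pmatrix}
\]
is an $m \times m$ matrix.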
Proof. Apply the prime decomposition theorem to $V_\alpha$. Then all primes are of the form $X - \lambda$. We then use our second example at the beginning of the section to get the form of the matrix.
The blocks $J_m(\lambda)$ are called the Jordan $\lambda$-blocks. It turns out that the Jordan blocks are unique up to reordering, but it does not immediately follow from what we have so far, and we will not prove it. It is done in the IB Linear Algebra course.
We can also read off the minimal polynomial and characteristic polynomial of $\alpha$. The minimal polynomial is
\[
  \prod_\lambda (X - \lambda)^{a_\lambda},
\]
where $a_\lambda$ is the size of the largest $\lambda$-block. The characteristic polynomial of $\alpha$ is
\[
  \prod_\lambda (X - \lambda)^{b_\lambda},
\]
where $b_\lambda$ is the sum of the sizes of the $\lambda$-blocks. Alternatively, it is
\[
  \prod_{i=1}^t (X - \lambda_i)^{a_i}.
\]
From the Jordan normal form, we can also read off another invariant, namely the dimension of the $\lambda$-eigenspace of $\alpha$, which is the number of $\lambda$-blocks.
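These three invariants are simple bookkeeping on the list of blocks. As a small sketch, suppose a Jordan normal form is described by a list of hypothetical `(eigenvalue, block_size)` pairs; the function name `jordan_invariants` is made up for this illustration.

```python
from collections import defaultdict

def jordan_invariants(blocks):
    """Given Jordan blocks as (eigenvalue, size) pairs, return three dicts keyed
    by eigenvalue: the exponent in the minimal polynomial (largest block), the
    exponent in the characteristic polynomial (sum of sizes), and the eigenspace
    dimension (number of blocks)."""
    largest = defaultdict(int)
    total = defaultdict(int)
    count = defaultdict(int)
    for lam, size in blocks:
        largest[lam] = max(largest[lam], size)
        total[lam] += size
        count[lam] += 1
    return dict(largest), dict(total), dict(count)

minimal, characteristic, eigenspace = jordan_invariants([(2, 3), (2, 1), (5, 2)])
print(minimal)         # {2: 3, 5: 2}, i.e. (X - 2)^3 (X - 5)^2
print(characteristic)  # {2: 4, 5: 2}, i.e. (X - 2)^4 (X - 5)^2
print(eigenspace)      # {2: 2, 5: 1}
```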
We can also use the idea of viewing $V$ as an $F[X]$-module to prove the Cayley-Hamilton theorem. In fact, we don’t need $F$ to be a field.
Theorem (Cayley-Hamilton theorem). Let $M$ be a finitely-generated $R$-module, where $R$ is some commutative ring. Let $\alpha: M \to M$ be an $R$-module homomorphism. Let $A$ be a matrix representation of $\alpha$ under some choice of generators, and let $p(t) = \det(tI - A)$. Then $p(\alpha) = 0$.
Proof. We consider $M$ as an $R[X]$-module with action given by
\[
  (f(X))(m) = f(\alpha)m.
\]
Suppose $e_1, \cdots, e_n$ span $M$, and that for all $i$, we have
\[
  \alpha(e_i) = \sum_{j=1}^n a_{ij} e_j.
\]
Then
\[
  \sum_{j=1}^n (X\delta_{ij} - a_{ij}) e_j = 0.
\]
We write $C$ for the matrix with entries
\[
  c_{ij} = X\delta_{ij} - a_{ij} \in R[X].
\]
We now use the fact that
\[
  \operatorname{adj}(C) C = \det(C) I,
\]
which we proved in IB Linear Algebra (and the proof did not assume that the underlying ring is a field). Expanding this out, and writing $\chi_\alpha(X) = \det(XI - A) = p(X)$, we get the following equation (in $R[X]$):
\[
  \chi_\alpha(X) I = \det(XI - A) I = (\operatorname{adj}(XI - A))(XI - A).
\]
Writing this in components, and multiplying by $e_k$, we have
\[
  \chi_\alpha(X) \delta_{ik} e_k = \sum_{j=1}^n (\operatorname{adj}(XI - A)_{ij})(X\delta_{jk} - a_{jk}) e_k.
\]
Then for each $i$, we sum over $k$ to obtain
\[
  \sum_{k=1}^n \chi_\alpha(X) \delta_{ik} e_k = \sum_{j,k=1}^n (\operatorname{adj}(XI - A)_{ij})(X\delta_{jk} - a_{jk}) e_k = 0,
\]
by our choice of $a_{ij}$. But the left hand side is just $\chi_\alpha(X) e_i$. So $\chi_\alpha(X)$ acts trivially on all of the generators $e_i$. So it in fact acts trivially. So $\chi_\alpha(\alpha)$ is the zero map (since acting by $X$ is the same as acting by $\alpha$, by construction).
Note that if we want to prove this just for matrices, we don’t really need the
theory of rings and modules. It just provides a convenient language to write the
proof in.
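For integer matrices the statement is easy to test numerically. The sketch below computes the coefficients of $\det(tI - A)$ with the Faddeev-LeVerrier recursion, which is a technique swapped in here for convenience and not the method of the proof above, and then evaluates the polynomial at $A$. The names `char_poly` and `evaluate` are chosen for this illustration.

```python
def mat_mul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def char_poly(a):
    """Coefficients [c_0, ..., c_n] of det(tI - a) = c_0 + c_1 t + ... + c_n t^n,
    via the Faddeev-LeVerrier recursion (integer entries assumed)."""
    n = len(a)
    coeffs = [0] * n + [1]
    m = [[0] * n for _ in range(n)]
    for k in range(1, n + 1):
        am = mat_mul(a, m)
        m = [[am[i][j] + coeffs[n - k + 1] * int(i == j) for j in range(n)]
             for i in range(n)]
        am = mat_mul(a, m)
        trace = sum(am[i][i] for i in range(n))
        coeffs[n - k] = -trace // k  # the division is exact for integer matrices
    return coeffs

def evaluate(coeffs, a):
    """p(a) for p = sum coeffs[i] t^i, by Horner's rule."""
    n = len(a)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        prod = mat_mul(result, a)
        result = [[prod[i][j] + c * int(i == j) for j in range(n)] for i in range(n)]
    return result

A = [[2, 1, 5], [3, 2, 6], [1, 0, 7]]
p = char_poly(A)
print(p)               # [-3, 24, -11, 1], i.e. t^3 - 11 t^2 + 24 t - 3
print(evaluate(p, A))  # the 3 x 3 zero matrix
```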
3.5 Conjugacy of matrices*
We are now going to do some fun computations of conjugacy classes of matrices,
using what we have got so far.
Lemma. Let $\alpha, \beta: V \to V$ be two linear maps. Then $V_\alpha \cong V_\beta$ as $F[X]$-modules if and only if $\alpha$ and $\beta$ are conjugate as linear maps, i.e. there is some invertible $\gamma: V \to V$ such that $\alpha = \gamma^{-1}\beta\gamma$.
This is not a deep theorem. This is in some sense just some tautology. All
we have to do is to unwrap what these statements say.
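Proof. Let $\gamma: V_\alpha \to V_\beta$ be an $F[X]$-module isomorphism. Then for $v \in V$, we notice that $\beta(v)$ is just $X \cdot v$ in $V_\beta$, and $\alpha(v)$ is just $X \cdot v$ in $V_\alpha$. So we get
\[
  \beta \circ \gamma(v) = X \cdot (\gamma(v)) = \gamma(X \cdot v) = \gamma \circ \alpha(v),
\]
using the definition of an $F[X]$-module homomorphism. So we know
\[
  \beta\gamma = \gamma\alpha.
\]
So
\[
  \alpha = \gamma^{-1}\beta\gamma.
\]
Conversely, let $\gamma: V \to V$ be a linear isomorphism such that $\gamma^{-1}\beta\gamma = \alpha$. We now claim that $\gamma: V_\alpha \to V_\beta$ is an $F[X]$-module isomorphism. We just have to check that, for $f = a_0 + a_1 X + \cdots + a_n X^n$,
\begin{align*}
  \gamma(f \cdot v) &= \gamma(f(\alpha)(v))\\
  &= \gamma((a_0 + a_1\alpha + \cdots + a_n\alpha^n)(v))\\
  &= \gamma(a_0 v) + \gamma(a_1\alpha(v)) + \gamma(a_2\alpha^2(v)) + \cdots + \gamma(a_n\alpha^n(v))\\
  &= (a_0 + a_1\beta + a_2\beta^2 + \cdots + a_n\beta^n)(\gamma(v))\\
  &= f \cdot \gamma(v).
\end{align*}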
So classifying linear maps up to conjugation is the same as classifying modules.
We can reinterpret this a little bit, using our classification of finitely-generated
modules.
Corollary. There is a bijection between conjugacy classes of $n \times n$ matrices over $F$ and sequences of non-constant monic polynomials $d_1, \cdots, d_r$ such that $d_1 \mid d_2 \mid \cdots \mid d_r$ and $\deg(d_1 \cdots d_r) = n$.
Example. Let’s classify conjugacy classes in $\mathrm{GL}_2(F)$, i.e. we need to classify $F[X]$-modules of the form
\[
  \frac{F[X]}{(d_1)} \oplus \frac{F[X]}{(d_2)} \oplus \cdots \oplus \frac{F[X]}{(d_r)}
\]
which are two-dimensional as $F$-modules. As we must have $\deg(d_1 d_2 \cdots d_r) = 2$, we either have a quadratic thing or two linear things, i.e. either
(i) $r = 1$ and $\deg(d_1) = 2$,
(ii) $r = 2$ and $\deg(d_1) = \deg(d_2) = 1$. In this case, since we have $d_1 \mid d_2$, and they are both monic linear, we must have $d_1 = d_2 = X - \lambda$ for some $\lambda$.
In the first case, the module is
\[
  \frac{F[X]}{(d_1)},
\]
where, say,
\[
  d_1 = X^2 + a_1 X + a_2.
\]
In the second case, we get
\[
  \frac{F[X]}{(X - \lambda)} \oplus \frac{F[X]}{(X - \lambda)}.
\]
What does this say? In the first case, we use the basis $1, X$, and the linear map has matrix
\[
  \begin{pmatrix} 0 & -a_2 \\ 1 & -a_1 \end{pmatrix}
\]
In the second case, this is
\[
  \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}.
\]
Do these cases overlap? Suppose the two of them are conjugate. Then they have the same determinant and same trace. So we know
\begin{align*}
  -a_1 &= 2\lambda\\
  a_2 &= \lambda^2
\end{align*}
So in fact our polynomial is
\[
  X^2 + a_1 X + a_2 = X^2 - 2\lambda X + \lambda^2 = (X - \lambda)^2.
\]
This is just the polynomial of a Jordan block. So the matrix
\[
  \begin{pmatrix} 0 & -a_2 \\ 1 & -a_1 \end{pmatrix}
\]
is conjugate to the Jordan block
\[
  \begin{pmatrix} \lambda & 0 \\ 1 & \lambda \end{pmatrix},
\]
but this is not conjugate to $\lambda I$, e.g. by looking at eigenspaces. So these cases are disjoint.
Note that we have done more work than we really needed, since $\lambda I$ is invariant under conjugation.
But the first case is not too satisfactory. We can further classify it as follows. If $X^2 + a_1 X + a_2$ is reducible, then it is
\[
  (X - \lambda)(X - \mu)
\]
for some $\mu, \lambda \in F$. If $\lambda = \mu$, then the matrix is conjugate to
\[
  \begin{pmatrix} \lambda & 0 \\ 1 & \lambda \end{pmatrix}
\]
Otherwise, it is conjugate to
\[
  \begin{pmatrix} \lambda & 0 \\ 0 & \mu \end{pmatrix}.
\]
In the case where $X^2 + a_1 X + a_2$ is irreducible, there is nothing we can do in general. However, we can look at some special scenarios and see if there is anything we can do.
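Example. Consider $\mathrm{GL}_2(\mathbb{Z}/3)$. We want to classify its conjugacy classes. By the general theory, we know everything is conjugate to
\[
  \begin{pmatrix} \lambda & 0 \\ 0 & \mu \end{pmatrix},\quad
  \begin{pmatrix} \lambda & 0 \\ 1 & \lambda \end{pmatrix},\quad
  \begin{pmatrix} 0 & -a_2 \\ 1 & -a_1 \end{pmatrix},
\]
with $X^2 + a_1 X + a_2$ irreducible. So we need to figure out what the irreducibles are.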
A reasonable strategy is to guess. Given any quadratic, it is easy to see if it is irreducible, since we can check whether it has any roots, and there are just three things to try. However, we can be a bit more clever. We first count how many irreducibles we are expecting, and then find that many of them.
There are 9 monic quadratic polynomials in total, since $a_1, a_2 \in \mathbb{Z}/3$. The reducibles are $(X - \lambda)^2$ or $(X - \lambda)(X - \mu)$ with $\lambda \neq \mu$. There are three of each kind. So we have 6 reducible polynomials, and so 3 irreducible ones.
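We can then check that
\[
  X^2 + 1,\quad X^2 + X + 2,\quad X^2 + 2X + 2
\]
are the irreducible polynomials. So every matrix in $\mathrm{GL}_2(\mathbb{Z}/3)$ is conjugate to one of
\[
  \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},\quad
  \begin{pmatrix} 0 & -2 \\ 1 & -1 \end{pmatrix},\quad
  \begin{pmatrix} 0 & -2 \\ 1 & -2 \end{pmatrix},\quad
  \begin{pmatrix} \lambda & 0 \\ 0 & \mu \end{pmatrix},\quad
  \begin{pmatrix} \lambda & 0 \\ 1 & \lambda \end{pmatrix},
\]
where $\lambda, \mu \in (\mathbb{Z}/3)^\times$ (since the matrix has to be invertible). The number of conjugacy classes of each type are $1, 1, 1, 3, 2$. So there are 8 conjugacy classes.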
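The first three classes have elements of order 4, 8, 8 respectively, by trying. We notice that the identity matrix has order 1, and
\[
  \begin{pmatrix} \lambda & 0 \\ 0 & \mu \end{pmatrix}
\]
has order 2 otherwise. Finally, for the last type, we have
\[
  \operatorname{ord}\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} = 3,\quad
  \operatorname{ord}\begin{pmatrix} 2 & 0 \\ 1 & 2 \end{pmatrix} = 6
\]
Note that we also have
\[
  |\mathrm{GL}_2(\mathbb{Z}/3)| = 48 = 2^4 \cdot 3.
\]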
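Since the group is tiny, these counts can be confirmed by brute force. In the sketch below a matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is stored as the tuple `(a, b, c, d)`; this encoding and all helper names are choices made for the illustration.

```python
from itertools import product

P = 3
IDENTITY = (1, 0, 0, 1)

def mul(x, y):
    """Product of two 2 x 2 matrices over Z/P, each stored as (a, b, c, d)."""
    a, b, c, d = x
    e, f, g, h = y
    return ((a * e + b * g) % P, (a * f + b * h) % P,
            (c * e + d * g) % P, (c * f + d * h) % P)

# All invertible 2 x 2 matrices over Z/3.
GROUP = [m for m in product(range(P), repeat=4)
         if (m[0] * m[3] - m[1] * m[2]) % P != 0]

def inverse(m):
    return next(x for x in GROUP if mul(m, x) == IDENTITY)

def order(m):
    n, power = 1, m
    while power != IDENTITY:
        power = mul(power, m)
        n += 1
    return n

def conjugacy_classes():
    seen, classes = set(), []
    for m in GROUP:
        if m not in seen:
            cls = {mul(mul(inverse(g), m), g) for g in GROUP}
            seen |= cls
            classes.append(cls)
    return classes

classes = conjugacy_classes()
print(len(GROUP), len(classes))                       # 48 8
print(sorted(order(next(iter(c))) for c in classes))  # [1, 2, 2, 3, 4, 6, 8, 8]
```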
Since there is no element of order 16, the Sylow 2-subgroup of $\mathrm{GL}_2(\mathbb{Z}/3)$ is not cyclic.
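To construct the Sylow 2-subgroup, we might start with an element of order 8, say
\[
  B = \begin{pmatrix} 0 & 1 \\ 1 & 2 \end{pmatrix}.
\]
To make a subgroup of order 16, a sensible guess would be to take an element of order 2, but the most obvious one doesn’t help, since $B^4$ is already an element of order 2 in $\langle B \rangle$. Instead, we pick
\[
  A = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}.
\]
We notice
\[
  A^{-1}BA
  = \begin{pmatrix} 0 & 1 \\ 2 & 0 \end{pmatrix}
    \begin{pmatrix} 0 & 1 \\ 1 & 2 \end{pmatrix}
    \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}
  = \begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix}
    \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}
  = \begin{pmatrix} 2 & 2 \\ 2 & 0 \end{pmatrix}
  = B^3.
\]
So this is a bit like the dihedral group.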
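We know that
\[
  \langle B \rangle \lhd \langle A, B \rangle.
\]
Also, we know $|\langle B \rangle| = 8$. So if we can show that $\langle B \rangle$ has index 2 in $\langle A, B \rangle$, then this is the Sylow 2-subgroup. By the second isomorphism theorem, something we have never used in our life, we know
\[
  \frac{\langle A, B \rangle}{\langle B \rangle} \cong \frac{\langle A \rangle}{\langle A \rangle \cap \langle B \rangle}.
\]
We can list things out, and then find
\[
  \langle A \rangle \cap \langle B \rangle = \left\langle \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \right\rangle \cong C_2.
\]
We also know $\langle A \rangle \cong C_4$. So we know
\[
  \frac{|\langle A, B \rangle|}{|\langle B \rangle|} = 2.
\]
So $|\langle A, B \rangle| = 16$. So this is the Sylow 2-subgroup. In fact, it is
\[
  \langle A, B \mid A^4 = B^8 = e, A^{-1}BA = B^3 \rangle
\]
We call this the semi-dihedral group of order 16, because it is a bit like a dihedral group.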
Note that finding this subgroup was purely guesswork. There is no method
to know that A and B are the right choices.
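Once a guess has been made, it is at least easy to check by machine. The sketch below closes $\{A, B\}$ under multiplication in $\mathrm{GL}_2(\mathbb{Z}/3)$, storing each matrix as a tuple `(a, b, c, d)` of its entries read row by row; the encoding and the helper names are assumptions of this illustration.

```python
P = 3
IDENTITY = (1, 0, 0, 1)

def mul(x, y):
    """Product of two 2 x 2 matrices over Z/P, each stored as (a, b, c, d)."""
    a, b, c, d = x
    e, f, g, h = y
    return ((a * e + b * g) % P, (a * f + b * h) % P,
            (c * e + d * g) % P, (c * f + d * h) % P)

def power(x, n):
    result = IDENTITY
    for _ in range(n):
        result = mul(result, x)
    return result

def generated(gens):
    """Subgroup generated by gens. In a finite group, closing the identity
    under right multiplication by the generators is enough."""
    elems, frontier = {IDENTITY}, [IDENTITY]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = mul(x, g)
            if y not in elems:
                elems.add(y)
                frontier.append(y)
    return elems

A = (0, 2, 1, 0)
B = (0, 1, 1, 2)
A_inv = power(A, 3)  # A has order 4, so A^{-1} = A^3

print(mul(mul(A_inv, B), A) == power(B, 3))        # True
print(len(generated([B])), len(generated([A, B])))  # 8 16
print(generated([A]) & generated([B]))              # the identity and 2I, in some order
```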