Part II Logic and Set Theory
Based on lectures by I. B. Leader
Notes taken by Dexter Chua
Lent 2015
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
No specific prerequisites.
Ordinals and cardinals
Well-orderings and order-types. Examples of countable ordinals. Uncountable ordi-
nals and Hartogs’ lemma. Induction and recursion for ordinals. Ordinal arithmetic.
Cardinals; the hierarchy of alephs. Cardinal arithmetic. [5]
Posets and Zorn’s lemma
Partially ordered sets; Hasse diagrams, chains, maximal elements. Lattices and Boolean
algebras. Complete and chain-complete posets; fixed-point theorems. The axiom of
choice and Zorn’s lemma. Applications of Zorn’s lemma in mathematics. The well-
ordering principle. [5]
Propositional logic
The propositional calculus. Semantic and syntactic entailment. The deduction and
completeness theorems. Applications: compactness and decidability. [3]
Predicate logic
The predicate calculus with equality. Examples of first-order languages and theories.
Statement of the completeness theorem; *sketch of proof*. The compactness theorem
and the Löwenheim–Skolem theorems. Limitations of first-order logic. Model theory. [5]
Set theory
Set theory as a first-order theory; the axioms of ZF set theory. Transitive closures,
epsilon-induction and epsilon-recursion. Well-founded relations. Mostowski’s collapsing
theorem. The rank function and the von Neumann hierarchy. [5]
Consistency
*Problems of consistency and independence* [1]
Contents
0 Introduction
1 Propositional calculus
1.1 Propositions
1.2 Semantic entailment
1.3 Syntactic implication
2 Well-orderings and ordinals
2.1 Well-orderings
2.2 New well-orderings from old
2.3 Ordinals
2.4 Successors and limits
2.5 Ordinal arithmetic
2.6 Normal functions*
3 Posets and Zorn’s lemma
3.1 Partial orders
3.2 Zorn’s lemma and axiom of choice
3.3 Bourbaki-Witt theorem*
4 Predicate logic
4.1 Language of predicate logic
4.2 Semantic entailment
4.3 Syntactic implication
4.4 Peano Arithmetic
4.5 Completeness and categoricity*
5 Set theory
5.1 Axioms of set theory
5.2 Properties of ZF
5.3 Picture of the universe
6 Cardinals
6.1 Definitions
6.2 Cardinal arithmetic
7 Incompleteness*
0 Introduction
Most people are familiar with the notion of “sets” (here “people” is defined
to be mathematics students). However, most of the time, we only have an
intuitive picture of what set theory should look like: there are sets, and we can
take intersections, unions and subsets. We can write down sets
like {x : ϕ(x) is true}.
Historically, mathematicians were content with this vague notion of sets.
However, it turns out that our last statement wasn’t really correct. We cannot
just arbitrarily write down sets we like. This is evidenced by the famous Russell’s
paradox, where the set X is defined as X = {x : x ∉ x}. Then we have
X ∈ X ⇔ X ∉ X, which is a contradiction.
This led to the formal study of set theory, where set theory is given a formal
foundation based on some axioms of set theory. This is known as axiomatic
set theory. This is similar to Euclid’s axioms of geometry, and, in some sense,
the group axioms. Unfortunately, while axiomatic set theory appears to avoid
paradoxes like Russell’s paradox, as Gödel proved in his incompleteness theorem,
we cannot prove that our axioms are free of contradictions.
Closely related to set theory is formal logic. Similarly, we want to put logic
on a solid foundation. We want to formally define our everyday notions such as
propositions, truth and proofs. The main result we will have is that a statement
is true if and only if we can prove it. This assures that looking for proofs is a
sensible way of showing that a statement is true.
It is important to note that having studied formal logic does not mean that
we should always reason with formal logic. In fact, this is impossible, as we
ultimately need informal logic to reason about formal logic itself!
Throughout the course, we will interleave topics from set theory and formal
logic. This is necessary as we need tools from set theory to study formal logic,
while we also want to define set theory within the framework of formal logic.
One is not allowed to complain that this involves circular reasoning.
As part of the course, we will also side-track to learn about well-orderings
and partial orders, as these are very useful tools in the study of logic and set
theory. Their importance will become evident as we learn more about them.
1 Propositional calculus
Propositional calculus is the study of logical statements such as p → p and
p → (q → p). As opposed to predicate calculus, which will be studied in Chapter 4,
the statements will not have quantifier symbols like ∀, ∃.
When we say “p → p is correct”, there are two ways we can interpret this.
We can interpret it as “no matter what truth value p takes, p → p always has
the truth value of true”. Alternatively, we can interpret it as “there is a
proof that p → p”.
The first notion is concerned with truth, and does not care about whether we
can prove things. The second notion, on the other hand, only talks about proofs.
We do not, in any way, assert that p → p is “true”. One of the main objectives
of this chapter is to show that these two notions are consistent with each other.
A statement is true (in the sense that it always has the truth value of “true”) if
and only if we can prove it. It turns out that this equivalence has a few rather
striking consequences.
Before we start, it is important to understand that there is no “standard”
logical system. What we present here is just one of the many possible ways of
doing formal logic. In particular, do not expect anyone else to know exactly how
your system works without first describing it. Fortunately, no one really writes
proofs with formal logic, and the important theorems we prove (completeness,
compactness etc.) do not depend much on the exact implementation details of
the systems.
1.1 Propositions
We’ll start by defining propositions, which are the statements we will consider
in propositional calculus.
Definition (Propositions). Let P be a set of primitive propositions. These are a
bunch of (meaningless) symbols (e.g. p, q, r), which are used as the basic building
blocks of our more interesting propositions. These are usually interpreted to
take a truth value. Usually, any symbol (composed of alphabets and subscripts)
is in the set of primitive propositions.
The set of propositions, written as L or L(P), is defined inductively by
(i) If p ∈ P, then p ∈ L.
(ii) ⊥ ∈ L, where ⊥ is read as “false” (also a meaningless symbol).
(iii) If p, q ∈ L, then (p → q) ∈ L.
Example. If our set of primitive propositions is P = {p, q, r}, then p → q,
p → ⊥, ((p → q) → (p → r)) are propositions.
To define L formally, we let
L_0 = {⊥} ∪ P
L_{n+1} = L_n ∪ {(p → q) : p, q ∈ L_n}.
Then we define L = L_0 ∪ L_1 ∪ L_2 ∪ · · · .
In formal language terms, L is the set of finite strings of symbols from the
alphabet ⊥ → ( ) p_1 p_2 · · · that satisfy some formal grammar rule (e.g. brackets
have to match).
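These notes contain no code, but the inductive definition is easy to play with
on a computer. Here is a minimal Python sketch (our own illustration, not part
of the lectures; the representation of propositions as nested tuples is an
arbitrary choice) that generates the first few levels L_n over a two-element P:

    # Propositions are modelled as nested tuples: a primitive is a string,
    # "false" is the constant BOT, and an implication (p -> q) is ('->', p, q).
    BOT = 'BOT'
    PRIMITIVES = {'p', 'q'}        # a small choice of P

    def next_level(Ln):
        """L_{n+1} = L_n together with (p -> q) for all p, q in L_n."""
        return Ln | {('->', p, q) for p in Ln for q in Ln}

    L0 = {BOT} | PRIMITIVES
    L1 = next_level(L0)
    L2 = next_level(L1)
    print(len(L0), len(L1), len(L2))   # 3 12 147 -- the levels grow quickly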
Note here that officially, the only connective we have is →. The familiar “not”,
“and” and “or” do not exist. Instead, we define them to be abbreviations of
certain expressions:
Definition (Logical symbols).
¬p (“not p”) is an abbreviation for (p → ⊥)
p ∧ q (“p and q”) is an abbreviation for ¬(p → (¬q))
p ∨ q (“p or q”) is an abbreviation for (¬p) → q
The advantage of having just one symbol → is that when we prove something
about our theories, we only have to prove it for →, instead of all of →, ¬, ∧ and
∨ individually.
1.2 Semantic entailment
The idea of semantic entailment is to assign truth values to propositions, where
we declare each proposition to be “true” or “false”. This assignment is performed
by a valuation.
Definition (Valuation). A valuation on L is a function v : L → {0, 1} such
that:
v(⊥) = 0,
v(p → q) = 0 if v(p) = 1, v(q) = 0, and 1 otherwise.
We interpret v(p) to be the truth value of p, with 0 denoting “false” and 1
denoting “true”.
Note that we do not impose any restriction on v(p) when p is a primitive
proposition.
For those people who like homomorphisms, we can first give the set {0, 1} a
binary operation → by
a → b = 0 if a = 1, b = 0, and 1 otherwise,
as well as a constant ⊥ = 0. Then a valuation can be defined as a homomorphism
between L and {0, 1} that preserves → and ⊥.
It should be clear that a valuation is uniquely determined by its values on
the primitive propositions, as the values on all other propositions follow from
the definition of a valuation. In particular, we have
Proposition.
(i) If v and v
are valuations with v(p) = v
(p) for all p P , then v = v
.
(ii)
For any function
w
:
P {
0
,
1
}
, we can extend it to a valuation
v
such
that v(p) = w(p) for all p L.
Proof.
(i) Recall that L is defined inductively. We are given that v(p) = v′(p) on
L_0. Then every p ∈ L_1 is either in L_0, or of the form q → r for some
q, r ∈ L_0. In the latter case, v(q → r) = v′(q → r), since the value of v on
an implication is uniquely determined by the definition. So for all p ∈ L_1,
v(p) = v′(p).
Continue inductively to show that v(p) = v′(p) for all p ∈ L_n for any n.
(ii) Set v to agree with w for all p ∈ P, and set v(⊥) = 0. Then define v on
L_n inductively according to the definition.
Example. Suppose v is a valuation with v(p) = v(q) = 1, v(r) = 0. Then
v((p → q) → r) = 0.
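The extension in part (ii) of the proposition is a structural recursion, which
we can spell out concretely. A small Python sketch (again our own, with
propositions as nested tuples, an implication (p → q) stored as ('->', p, q)):

    def v(prop, w):
        """The unique valuation agreeing with w on the primitive propositions."""
        if prop == 'BOT':
            return 0
        if isinstance(prop, str):          # a primitive proposition
            return w[prop]
        _, p, q = prop                     # prop is an implication (p -> q)
        return 0 if v(p, w) == 1 and v(q, w) == 0 else 1

    # The example above: v(p) = v(q) = 1, v(r) = 0 gives v((p -> q) -> r) = 0.
    w = {'p': 1, 'q': 1, 'r': 0}
    print(v(('->', ('->', 'p', 'q'), 'r'), w))   # 0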
Often, we are interested in propositions that are always true, such as p → p.
These are known as tautologies.
Definition (Tautology). t is a tautology, written as |= t, if v(t) = 1 for all
valuations v.
To show that a statement is a tautology, we can use a truth table, where we
simply list out all possible valuations and find the value of v(t).
Example.
(i) |= p → (q → p). “A true statement is implied by anything.”

v(p)  v(q)  v(q → p)  v(p → (q → p))
 1     1      1            1
 1     0      1            1
 0     1      0            1
 0     0      1            1

(ii) |= (¬¬p) → p. Recall that ¬¬p is defined as ((p → ⊥) → ⊥).

v(p)  v(p → ⊥)  v((p → ⊥) → ⊥)  v(((p → ⊥) → ⊥) → p)
 1       0            1                  1
 0       1            0                  1

(iii) |= [p → (q → r)] → [(p → q) → (p → r)].
Instead of creating a truth table, which would be horribly long, we show
this by reasoning: Suppose it is not a tautology. So there is a v such that
v(p → (q → r)) = 1 and v((p → q) → (p → r)) = 0. For the second
equality to hold, we must have v(p → q) = 1 and v(p → r) = 0. So
v(p) = 1, v(r) = 0, v(q) = 1. But then v(p → (q → r)) = 0.
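A truth table is just an exhaustive enumeration of the valuations on the
primitives occurring in t, so checking tautologies is entirely mechanical. A
Python sketch (our own, same nested-tuple representation as before):

    from itertools import product

    def primitives(prop):
        """The set of primitive propositions occurring in prop."""
        if prop == 'BOT':
            return set()
        if isinstance(prop, str):
            return {prop}
        return primitives(prop[1]) | primitives(prop[2])

    def v(prop, w):
        if prop == 'BOT':
            return 0
        if isinstance(prop, str):
            return w[prop]
        return 0 if v(prop[1], w) == 1 and v(prop[2], w) == 0 else 1

    def is_tautology(t):
        """Check v(t) = 1 for every valuation, one truth-table row at a time."""
        ps = sorted(primitives(t))
        return all(v(t, dict(zip(ps, row))) == 1
                   for row in product([0, 1], repeat=len(ps)))

    print(is_tautology(('->', 'p', ('->', 'q', 'p'))))                    # (i): True
    print(is_tautology(('->', ('->', ('->', 'p', 'BOT'), 'BOT'), 'p')))   # (ii): True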
Sometimes, we don’t want to make statements as strong as “t is always true”.
Instead, we might want to say “t is true whenever S is true”. This is known as
semantic entailment.
Definition (Semantic entailment). For S ⊆ L, t ∈ L, we say S entails t, S
semantically implies t, or S |= t if, for any v such that v(s) = 1 for all s ∈ S, we
have v(t) = 1.
Here we are using the symbol |= again. This is not an attempt to confuse
students: |= t is equivalent to the statement ∅ |= t.
Example. {p → q, q → r} |= (p → r).
We want to show that for any valuation v with v(p → q) = v(q → r) = 1, we
have v(p → r) = 1. We prove the contrapositive.
If v(p → r) = 0, then v(p) = 1 and v(r) = 0. If v(q) = 0, then v(p → q) = 0.
If v(q) = 1, then v(q → r) = 0. So v(p → r) = 0 only if one of v(p → q) or
v(q → r) is zero.
Note that {p} |= q and p → q both intuitively mean “if p is true, then q is
true”. However, these are very distinct notions. p → q is a proposition within
our theory. It is true (or false) in the sense that valuations take the value 1 (or
0) on it.
On the other hand, when we say {p} |= q, this is a statement in the meta-
theory. It is true (or false) in the sense that we decided it is true (or false) based
on some arguments and (informal) proofs, performed in the real world instead
of inside propositional calculus.
The same distinction should be made when we define syntactic implication
later.
Before we dive into syntactic implication, we will define a few convenient
terms for later use.
Definition (Truth and model). If v(t) = 1, then we say that t is true in v, or
v is a model of t. For S ⊆ L, a valuation v is a model of S if v(s) = 1 for all
s ∈ S.
1.3 Syntactic implication
While semantic implication captures the idea of truthfulness, syntactic impli-
cation captures the idea of proofs. We want to say S syntactically implies t if
we can prove t from S.
To prove propositions, we need two things. Firstly, we need axioms, which
are statements we can assume to be true in a proof. These are the basic building
blocks from which we will prove our theorems.
Other than axioms, we also need deduction rules. These allow us to deduce
statements from other statements.
Our system of deduction consists of the following three axioms:
1. p → (q → p)
2. [p → (q → r)] → [(p → q) → (p → r)]
3. (¬¬p) → p
and the deduction rule of modus ponens: from p and p → q, we can deduce q.
At first sight, our axioms look a bit weird, especially the second one. We
will later see how this particular choice of axioms allows us to prove certain
theorems more easily. This choice of axioms can also be motivated by combinatory
logic, but we shall not go into the details.
Definition (Proof and syntactic entailment). For any S ⊆ L, a proof of t from
S is a finite sequence t_1, t_2, · · · , t_n of propositions, with t_n = t, such that each
t_i is one of the following:
(i) An axiom
(ii) A member of S
(iii) A proposition t_i such that there exist j, k < i with t_j being t_k → t_i.
If there is a proof of t from S, we say that S proves or syntactically entails t,
written S ⊢ t.
If ∅ ⊢ t, we say t is a theorem and write ⊢ t.
In a proof of t from S, t is the conclusion and S is the set of hypotheses or
premises.
Example. {p → q, q → r} ⊢ p → r.
We go for (p → q) → (p → r) via Axiom 2.
1. [p → (q → r)] → [(p → q) → (p → r)]   Axiom 2
2. q → r                                  Hypothesis
3. (q → r) → [p → (q → r)]                Axiom 1
4. p → (q → r)                            MP on 2, 3
5. (p → q) → (p → r)                      MP on 1, 4
6. p → q                                  Hypothesis
7. p → r                                  MP on 5, 6
Example. ⊢ (p → p).
We go for [p → (p → p)] → (p → p).
1. [p → ((p → p) → p)] → [(p → (p → p)) → (p → p)]   Axiom 2
2. p → ((p → p) → p)                                  Axiom 1
3. [p → (p → p)] → (p → p)                            MP on 1, 2
4. p → (p → p)                                        Axiom 1
5. p → p                                              MP on 3, 4
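Checking a purported proof, unlike finding one, is completely mechanical: each
line must be an instance of an axiom scheme, a hypothesis, or follow from two
earlier lines by MP. Here is a Python sketch of such a checker for our system
(our own illustration; formulas are nested tuples as before), run on the proof
of p → p above:

    def imp(a, b):
        return ('->', a, b)

    def is_imp(f):
        return isinstance(f, tuple) and f[0] == '->'

    def is_axiom(f):
        if not is_imp(f):
            return False
        a, b = f[1], f[2]
        if is_imp(b) and b[2] == a:                   # 1: A -> (B -> A)
            return True
        if (is_imp(a) and is_imp(a[2]) and is_imp(b)  # 2: [A -> (B -> C)] ->
                and is_imp(b[1]) and is_imp(b[2])     #    [(A -> B) -> (A -> C)]
                and b[1] == imp(a[1], a[2][1])
                and b[2] == imp(a[1], a[2][2])):
            return True
        return a == imp(imp(b, 'BOT'), 'BOT')         # 3: (not not A) -> A

    def checks(lines, hypotheses):
        """Is `lines` a proof from `hypotheses`?"""
        for i, t in enumerate(lines):
            if not (is_axiom(t) or t in hypotheses
                    or any(is_imp(tk) and tk[2] == t and tk[1] in lines[:i]
                           for tk in lines[:i])):     # modus ponens
                return False
        return True

    p = 'p'
    proof = [imp(imp(p, imp(imp(p, p), p)), imp(imp(p, imp(p, p)), imp(p, p))),
             imp(p, imp(imp(p, p), p)),
             imp(imp(p, imp(p, p)), imp(p, p)),
             imp(p, imp(p, p)),
             imp(p, p)]
    print(checks(proof, hypotheses=set()))   # True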
This seems like a really tedious way to prove things. We now prove the
deduction theorem, which allows us to find proofs much more easily.
Proposition (Deduction theorem). Let S ⊆ L and p, q ∈ L. Then we have
S ⊢ (p → q) ⇔ S ∪ {p} ⊢ q.
This says that ⊢ behaves like the connective → in the language.
Proof. (⇒) Given a proof of p → q from S, append the lines
p   Hypothesis
q   MP
to obtain a proof of q from S ∪ {p}.
(⇐) Let t_1, t_2, · · · , t_n = q be a proof of q from S ∪ {p}. We’ll show that
S ⊢ p → t_i for all i.
We consider the different possibilities of t_i:
– t_i is an axiom: Write down
t_i → (p → t_i)   Axiom 1
t_i               Axiom
p → t_i           MP
– t_i ∈ S: Write down
t_i → (p → t_i)   Axiom 1
t_i               Hypothesis
p → t_i           MP
– t_i = p: Write down our proof of p → p from our example above.
– t_i is obtained by MP: we have some j, k < i such that t_k = (t_j → t_i). We
can assume that S ⊢ (p → t_j) and S ⊢ (p → t_k) by induction on i. Now
we can write down
[p → (t_j → t_i)] → [(p → t_j) → (p → t_i)]   Axiom 2
p → (t_j → t_i)                               Known already
(p → t_j) → (p → t_i)                         MP
p → t_j                                       Known already
p → t_i                                       MP
to get S ⊢ (p → t_i).
This is the reason why we have this weird-looking Axiom 2: it enables us to
easily prove the deduction theorem.
This theorem has a “converse”. Suppose we have a deduction system
that admits modus ponens, and the deduction theorem holds for this system.
Then axioms (1) and (2) must hold in the system, since we can prove them using
the deduction theorem and modus ponens. However, we are not able to deduce
axiom (3) from just modus ponens and the deduction theorem.
Example. We want to show {p → q, q → r} ⊢ (p → r). By the deduction
theorem, it is enough to show that {p → q, q → r, p} ⊢ r, which is trivial by
applying MP twice.
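Note that the proof of the deduction theorem is really an algorithm: it takes a
proof of q from S ∪ {p} and mechanically rewrites it, case by case, into a proof
of p → q from S. A Python sketch of the transformation (our own; each input
line carries its justification, and formulas are nested tuples as before):

    def imp(a, b):
        return ('->', a, b)

    def proof_of_p_implies_p(p):
        """The five-line proof of p -> p from the earlier example."""
        pp = imp(p, p)
        return [imp(imp(p, imp(pp, p)), imp(imp(p, pp), pp)),   # Axiom 2
                imp(p, imp(pp, p)),                             # Axiom 1
                imp(imp(p, pp), pp),                            # MP
                imp(p, pp),                                     # Axiom 1
                pp]                                             # MP

    def deduction(annotated, p):
        """annotated: list of (t_i, tag), tag one of 'ax', 'hyp', ('mp', j, k)
        with t_k = t_j -> t_i.  Returns a proof containing p -> t_i for each i."""
        out = []
        for t, tag in annotated:
            if t == p:
                out += proof_of_p_implies_p(p)
            elif tag in ('ax', 'hyp'):      # t is an axiom or a member of S
                out += [t, imp(t, imp(p, t)), imp(p, t)]        # t, Axiom 1, MP
            else:                           # t obtained by MP
                tj = annotated[tag[1]][0]
                out += [imp(imp(p, imp(tj, t)),
                            imp(imp(p, tj), imp(p, t))),        # Axiom 2
                        imp(imp(p, tj), imp(p, t)),             # MP
                        imp(p, t)]                              # MP
        return out

    # {p -> q, q -> r, p} |- r by MP twice, so {p -> q, q -> r} |- p -> r:
    p, q, r = 'p', 'q', 'r'
    proof = [(imp(p, q), 'hyp'), (p, 'hyp'), (q, ('mp', 1, 0)),
             (imp(q, r), 'hyp'), (r, ('mp', 2, 3))]
    print(deduction(proof, p)[-1])   # ('->', 'p', 'r')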
Now we have two notions: |= and ⊢. How are they related? We want to
show that they are equal: if something is true, we can prove it; if we can prove
something, it must be true.
Aim. Show that S ⊢ t if and only if S |= t.
This is known as the completeness theorem, which is made up of two directions:
(i) Soundness: If S ⊢ t, then S |= t. “Our axioms aren’t absurd.”
(ii) Adequacy: If S |= t, then S ⊢ t. “Our axioms are strong enough to be able to
deduce, from S, all semantic consequences of S.”
We first prove the easy bit:
Proposition (Soundness theorem). If S ⊢ t, then S |= t.
Proof. Given a valuation v with v(s) = 1 for all s ∈ S, we need to show that
v(t) = 1. We will show that every line t_i in the proof has v(t_i) = 1.
If t_i is an axiom, then v(t_i) = 1 since axioms are tautologies. If t_i is a
hypothesis, then by assumption v(t_i) = 1. If t_i is obtained by modus ponens, say
from t_j and t_j → t_i, then since v(t_j) = 1 and v(t_j → t_i) = 1, we must have
v(t_i) = 1.
Note that soundness holds whenever our axioms are all tautologies. Even if
we had silly axioms that are able to prove almost nothing, as long as they are
all tautologies, it will be sound.
Now we have to prove adequacy. It seems like a big and scary thing to prove.
Given that a statement is true, we have to find a proof for it, but we all know
that finding proofs is hard!
We first prove a special case. To do so, we first define consistency.
Definition (Consistent). S is inconsistent if S ⊢ ⊥. S is consistent if it is not
inconsistent.
The special case we will prove is the following:
Theorem (Model existence theorem). If S |= ⊥, then S ⊢ ⊥, i.e. if S has no
model, then S is inconsistent. Equivalently, if S is consistent, then S has a
model.
While we called this a “special case”, it is in fact all we need to know to
prove adequacy. If we are given S |= t, then S ∪ {¬t} |= ⊥. Hence using the
model existence theorem, we know that S ∪ {¬t} ⊢ ⊥. Hence by the deduction
theorem, we know that S ⊢ ¬¬t. But (¬¬t) → t by Axiom 3. So S ⊢ t.
As a result, some books call this the “completeness theorem” instead, because
the rest of the completeness theorem follows trivially from this.
The idea of the proof is that we’d like to define v : L → {0, 1} by
p ↦ 1 if p ∈ S, 0 if p ∉ S.
However, this is obviously going to fail, because we could have some p such that
S ⊢ p but p ∉ S, i.e. S is not deductively closed. Yet this is not a serious problem:
we take the deductive closure first, i.e. add to S all the statements that S can prove.
But there is a more serious problem. There might be a p with S ⊬ p and
S ⊬ ¬p. This is the case if, say, p never appears in S. The idea here is to
arbitrarily declare that p is true or false, and add p or ¬p to S. What we have
to prove is that we can do so without making S inconsistent.
We’ll prove this in the following lemma:
Lemma. For consistent S ⊆ L and p ∈ L, at least one of S ∪ {p} and S ∪ {¬p}
is consistent.
Proof. Suppose instead that both S ∪ {p} ⊢ ⊥ and S ∪ {¬p} ⊢ ⊥. Then by the
deduction theorem, S ⊢ ¬p and S ⊢ ¬¬p. So S ⊢ ⊥, contradicting the consistency
of S.
Now we can prove the model existence theorem. Here we’ll assume that the
set of primitives P, and hence the language L, is countable. This is a reasonable
thing to assume, since we can only talk about finitely many primitives (we only
have a finite life), so uncountably many primitives would be of no use.
However, this is not a good enough excuse to not prove our things properly.
To prove the theorem in full generality, we will need to use Zorn’s lemma,
which we will discuss in Chapter 3. For now, we will just prove the countable
case.
Proof. Assuming that L is countable, list L as {t_1, t_2, · · · }.
Let S_0 = S. Then at least one of S_0 ∪ {t_1} and S_0 ∪ {¬t_1} is consistent. Pick
S_1 to be the consistent one. Then let S_2 be S_1 ∪ {t_2} or S_1 ∪ {¬t_2}, chosen such
that S_2 is consistent. Continue inductively.
Set S̄ = S_0 ∪ S_1 ∪ S_2 ∪ · · · . Then p ∈ S̄ or ¬p ∈ S̄ for each p ∈ L by
construction.
Also, we know that S̄ is consistent. If we had S̄ ⊢ ⊥, then since proofs are finite,
there is some S_n that contains all the assumptions used in the proof of ⊥. Hence
S_n ⊢ ⊥, but we know that all the S_n are consistent.
Finally, we check that S̄ is deductively closed: if S̄ ⊢ p, we must have p ∈ S̄.
Otherwise, ¬p ∈ S̄. But this implies that S̄ is inconsistent.
Define v : L → {0, 1} by
p ↦ 1 if p ∈ S̄, 0 if not.
All that is left to show is that this is indeed a valuation.
First of all, we have v(⊥) = 0, as ⊥ ∉ S̄ (since S̄ is consistent).
For p → q, we check all possible cases.
(i) If v(p) = 1, v(q) = 0, we have p ∈ S̄, q ∉ S̄. We want to show p → q ∉ S̄.
Suppose instead that p → q ∈ S̄. Then S̄ ⊢ q by modus ponens. Hence
q ∈ S̄ since S̄ is deductively closed. This is a contradiction. Hence we
must have v(p → q) = 0.
(ii) If v(q) = 1, then q ∈ S̄. We want to show p → q ∈ S̄. By our first axiom,
we know that ⊢ q → (p → q). So S̄ ⊢ p → q. So p → q ∈ S̄ by deductive
closure. Hence we have v(p → q) = 1.
(iii) If v(p) = 0, then p ∉ S̄. So ¬p ∈ S̄. We want to show p → q ∈ S̄.
This is equivalent to showing ¬p ⊢ p → q.
By the deduction theorem, this is equivalent to proving {p, ¬p} ⊢ q.
We know that {p, ¬p} ⊢ ⊥. So it is sufficient to show ⊥ ⊢ q.
By Axiom 3, this is equivalent to showing ⊥ ⊢ ¬¬q.
By the deduction theorem, this is again equivalent to showing ⊢ ⊥ → ¬¬q.
By the definition of ¬, this is equivalent to showing ⊢ ⊥ → (¬q → ⊥).
But this is just an instance of the first axiom. So we know that S̄ ⊢ p → q.
So v(p → q) = 1.
Note that the last case is the first time we really use Axiom 3.
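For a finite fragment of the language we can watch the construction of S̄
happen. For finite sets, “consistent” coincides with “satisfiable” (by soundness
and the model existence theorem), so the following Python sketch (our own)
tests consistency with a truth table and adds t_n or ¬t_n accordingly:

    from itertools import product

    def v(prop, w):
        if prop == 'BOT':
            return 0
        if isinstance(prop, str):
            return w[prop]
        return 0 if v(prop[1], w) == 1 and v(prop[2], w) == 0 else 1

    def primitives(prop):
        if prop == 'BOT':
            return set()
        if isinstance(prop, str):
            return {prop}
        return primitives(prop[1]) | primitives(prop[2])

    def satisfiable(S):
        ps = sorted(set().union(*(primitives(t) for t in S)))
        return any(all(v(t, dict(zip(ps, row))) == 1 for t in S)
                   for row in product([0, 1], repeat=len(ps)))

    def complete(S, enumeration):
        """Add t or (t -> BOT), whichever keeps the set consistent."""
        S = set(S)
        for t in enumeration:
            S.add(t if satisfiable(S | {t}) else ('->', t, 'BOT'))
        return S

    print(complete({('->', 'p', 'q')}, ['p', 'q', ('->', 'q', 'p')]))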
By the remark before the proof, we have
Corollary (Adequacy theorem). Let S ⊆ L, t ∈ L. Then S |= t implies S ⊢ t.
Theorem (Completeness theorem). Let S ⊆ L and t ∈ L. Then S |= t if and
only if S ⊢ t.
This theorem has two nice consequences.
Corollary (Compactness theorem). Let S ⊆ L and t ∈ L with S |= t. Then
there is some finite S′ ⊆ S with S′ |= t.
Proof. Trivial with |= replaced by ⊢, because proofs are finite.
Sometimes when people say “compactness theorem”, they mean the special
case where t = ⊥. This says that if every finite subset of S has a model, then
S has a model. This result can be used to prove some rather unexpected results in
other fields such as graph theory, but we will not go into the details.
Corollary (Decidability theorem). Let S ⊆ L be a finite set and t ∈ L. Then
there exists an algorithm that determines, in finite and bounded time, whether
or not S ⊢ t.
Proof. Trivial with ⊢ replaced by |=, by making a truth table.
This is a rather nice result, because we know that proofs are hard to find in
general. However, this theorem only tells you whether a proof exists, without
giving you the proof itself!
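As the proof says, the algorithm is just a truth table: by the completeness
theorem, to decide S ⊢ t it suffices to check S |= t over every valuation of the
finitely many primitives involved. A Python sketch (our own, same nested-tuple
representation as in the earlier sketches):

    from itertools import product

    def v(prop, w):
        if prop == 'BOT':
            return 0
        if isinstance(prop, str):
            return w[prop]
        return 0 if v(prop[1], w) == 1 and v(prop[2], w) == 0 else 1

    def primitives(prop):
        if prop == 'BOT':
            return set()
        if isinstance(prop, str):
            return {prop}
        return primitives(prop[1]) | primitives(prop[2])

    def proves(S, t):
        """Decide S |- t for finite S by checking S |= t (completeness)."""
        ps = sorted(set().union(*(primitives(s) for s in [t, *S])))
        for row in product([0, 1], repeat=len(ps)):
            w = dict(zip(ps, row))
            if all(v(s, w) == 1 for s in S) and v(t, w) == 0:
                return False     # a model of S that is not a model of t
        return True

    print(proves({('->', 'p', 'q'), ('->', 'q', 'r')}, ('->', 'p', 'r')))   # True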
2 Well-orderings and ordinals
In the coming two sections, we will study different orderings. The focus of this
chapter is well-orders, while the focus of the next is partial orders.
A well-order on a set
S
is a special type of total order where every non-empty
subset of
S
has a least element. Among the many nice properties of well-orders,
it is possible to do induction and recursion on well-orders.
Our interest, however, does not lie in the well-orders themselves. Instead, we are
interested in the “lengths” of well-orders. Officially, we call them the order
types of the well-orders. Each order type is known as an ordinal.
There are many things we can do with ordinals. We can add and multiply
them to form “longer” well-orders. While we will not make much use of them in
this chapter, in later chapters, we will use ordinals to count “beyond infinity”,
similar to how we count finite things using natural numbers.
2.1 Well-orderings
We start with a few definitions.
Definition ((Strict) total order). A (strict) total order or linear order is a pair
(X, <), where X is a set and < is a relation on X that satisfies
(i) x ≮ x for all x (irreflexivity)
(ii) If x < y and y < z, then x < z (transitivity)
(iii) x < y or x = y or y < x (trichotomy)
We have the usual shorthands for total orders. For example, x > y means
y < x, and x ≤ y means (x < y or x = y).
It is also possible for a total order to be defined in terms of ≤ instead of <.
Definition ((Non-strict) total order). A (non-strict) total order is a pair (X, ≤),
where X is a set and ≤ is a relation on X that satisfies
(i) x ≤ x (reflexivity)
(ii) x ≤ y and y ≤ z implies x ≤ z (transitivity)
(iii) x ≤ y and y ≤ x implies x = y (antisymmetry)
(iv) x ≤ y or y ≤ x (trichotomy)
Example.
(i) N, Z, Q, R with the usual orders are total orders.
(ii) On N⁺ (the positive integers), the relation given by x < y if x | y and
x ≠ y is not trichotomous, and so not a total order.
(iii) On P(S), define x ≤ y if x ⊆ y. This is not a total order since it is not
trichotomous (for |S| > 1).
While there can be different total orders, the particular kind we are interested
in is well-orders.
Definition (Well order). A total order (X, <) is a well-ordering if every (non-
empty) subset has a least element, i.e.
(∀S ⊆ X)[S ≠ ∅ ⇒ (∃x ∈ S)(∀y ∈ S) y ≥ x].
Example.
(i) N with the usual ordering is a well-order.
(ii) Z, Q, R are not well-ordered because the whole set itself does not have a
least element.
(iii) {x ∈ Q : x ≥ 0} is not well-ordered. For example, {x : x > 0} has no
least element.
(iv) In R, {1 − 1/n : n = 2, 3, 4, · · · } is well-ordered because it is isomorphic to
the naturals.
(v) {1 − 1/n : n = 2, 3, 4, · · · } ∪ {1} is also well-ordered. If the subset contains only
1, then 1 is the least element. Otherwise, take the least element of the
remaining set.
(vi) Similarly, {1 − 1/n : n = 2, 3, 4, · · · } ∪ {2} is well-ordered.
(vii) {1 − 1/n : n = 2, 3, 4, · · · } ∪ {2 − 1/n : n = 2, 3, 4, · · · } is also well-ordered.
This is a good example to keep in mind.
There is another way to characterize well-orderings, in terms of infinite decreasing
sequences.
Proposition. A total order is a well-ordering if and only if it has no infinite
strictly decreasing sequence.
Proof. If x_1 > x_2 > x_3 > · · · , then {x_i : i ∈ N} has no least element.
Conversely, if a non-empty S ⊆ X has no least element, then each x ∈ S has
some x′ ∈ S with x′ < x. Similarly, we can find some x′′ < x′, ad infinitum. So
x > x′ > x′′ > x′′′ > · · ·
is an infinite decreasing sequence.
Like all other axiomatic theories we study, we identify two total orders to be
isomorphic if they are “the same up to renaming of elements”.
Definition (Order isomorphism). Say the total orders X, Y are isomorphic if
there exists a bijection f : X → Y that is order-preserving, i.e. x < y ⇒ f(x) <
f(y).
Example.
(i) N and {1 − 1/n : n = 2, 3, 4, · · · } are isomorphic.
(ii) {1 − 1/n : n = 2, 3, 4, · · · } ∪ {1} is isomorphic to {1 − 1/n : n = 2, 3, 4, · · · }
∪ {2}.
(iii) {1 − 1/n : n = 2, 3, 4, · · · } and {1 − 1/n : n = 2, 3, 4, · · · } ∪ {1} are not
isomorphic because the second has a greatest element but the first doesn’t.
Recall from IA Numbers and Sets that in N, the well-ordering principle is
equivalent to the principle of induction. We proved that we can do induction simply
by assuming that N is well-ordered. Using the same proof, we should be able to
prove that we can do induction on any well-ordered set.
Of course, a general well-ordered set does not have the concept of “+1”, so
we won’t be able to formulate weak induction. Instead, our principle of induction
is the form taken by strong induction.
Proposition (Principle of induction). Let X be a well-ordered set. Suppose
S ⊆ X has the property:
(∀x)[((∀y) y < x ⇒ y ∈ S) ⇒ x ∈ S],
then S = X.
In particular, if a property P(x) satisfies
(∀x)[((∀y) y < x ⇒ P(y)) ⇒ P(x)],
then P(x) holds for all x.
Proof. Suppose S ≠ X. Let x be the least element of X \ S. Then by the minimality
of x, for all y, y < x ⇒ y ∈ S. Hence x ∈ S. Contradiction.
Using proof by induction, we can prove the following property of well-orders:
Proposition. Let
X
and
Y
be isomorphic well-orderings. Then there is a unique
isomorphism between X and Y .
This is something special to well-orders. This is not true for general total
orderings. For example, x ↦ x and x ↦ x − 13 are both isomorphisms Z → Z. It
is also not true for, say, groups. For example, there are two possible isomorphisms
from Z_3 to itself.
Proof. Let f and g be two isomorphisms X → Y. To show that f = g, it is
enough, by induction, to show f(x) = g(x) given f(y) = g(y) for all y < x.
Given a fixed x, let S = {f(y) : y < x}. We know that Y \ S is non-empty
since f(x) ∉ S. So let a be the least member of Y \ S. Then we must have
f(x) = a. Otherwise, we will have a < f(x) by the minimality of a, which implies
that f⁻¹(a) < x since f is order-preserving. However, by the definition of S, this
implies that a = f(f⁻¹(a)) ∈ S. This is a contradiction since a ∈ Y \ S.
By the induction hypothesis, for y < x, we have f(y) = g(y). So we have
S = {g(y) : y < x} as well. Hence g(x) = min(Y \ S) = f(x).
If we have an ordered set, we can decide to cut off the top of the set and
keep the bottom part. What is left is an initial segment.
Definition (Initial segment). A subset Y of a totally ordered set X is an initial
segment if
x ∈ Y, y < x ⇒ y ∈ Y.
Example. For any x ∈ X, the set I_x = {y ∈ X : y < x} is an initial segment.
However, not every initial segment of X needs to be of this form. For example,
{x : x ≤ 3} ⊆ R and {x : x < 0 or x^2 < 2} ⊆ Q are both initial segments not of
this form.
The next nice property of well-orders we have is that every proper initial
segment is of this form.
Proposition. Every proper initial segment Y of a well-ordered set X is of the form
I_x = {y ∈ X : y < x}.
Proof. Take x = min(X \ Y). Then for any y ∈ I_x, we have y < x. So y ∈ Y by
the definition of x. So I_x ⊆ Y.
On the other hand, if y ∈ Y, then definitely y ≠ x. We also cannot have
y > x, since this implies x ∈ Y. Hence we must have y < x. So y ∈ I_x. Hence
Y ⊆ I_x. So Y = I_x.
The next nice property we want to show is that in a well-ordering
X
, every
subset S is isomorphic to an initial segment.
Note that this is very false for a general total order. For example, in Z, no
initial segment is isomorphic to the subset {1, 2, 3}, since every initial segment
of Z is either infinite or empty. Alternatively, in R, Q is not isomorphic to an
initial segment.
It is intuitively obvious how we can prove this. We simply send the minimum
element of S to the minimum of X, and then continue recursively. However, how
can we justify recursion? If we have the well-order {1/2, 2/3, 3/4, · · · , 1}, then we will
never reach 1 if we attempt to write down each step of the recursion, since we
can only get there after infinitely many steps. We will need to define some sort
of “infinite recursion” to justify our recursion on general well-orders.
We first define the restriction of a function:
Definition (Restriction of function). For f : A → B and C ⊆ A, the restriction
of f to C is
f|_C = {(x, f(x)) : x ∈ C}.
In this theorem (and the subsequent proof), we are treating functions
explicitly as sets of ordered pairs (x, f(x)). We will perform set operations on
functions and use unions to “stitch together” functions.
Theorem (Definition by recursion). Let X be a well-ordered set and Y be any
set. Then for any function G : P(X × Y) → Y, there exists a function f : X → Y
such that
f(x) = G(f|_{I_x})
for all x.
This is a rather weird definition. Intuitively, it means that G takes the previous
values of f and returns the desired output. This means that in defining f at
x, we are allowed to make use of the values of f on I_x. For example, we define
f(n) = n · f(n − 1) for the factorial function, with f(0) = 1.
Proof. We might want to jump into the proof and define f(0) = G(∅), where 0
is the minimum element. Then we define f(1) = G({(0, f(0))}) etc. But doing so is
simply recursion, which is the thing we want to prove works!
Instead, we use the following clever trick: we define “h is an attempt” to
mean
h : I → Y for some initial segment I of X, and h(x) = G(h|_{I_x}) for x ∈ I.
The idea is to show that for any x, there is an attempt h that is defined at x.
Then we take the value f(x) to be h(x). However, we must show this is well-defined
first:
Claim. If attempts h and h′ are defined at x, then h(x) = h′(x).
By induction on x, it is enough to show that h(x) = h′(x) assuming h(y) =
h′(y) for all y < x. But then h(x) = G(h|_{I_x}) = G(h′|_{I_x}) = h′(x). So done.
Claim. For any x, there must exist an attempt h that is defined at x.
Again, we may assume (by induction) that for each y < x, there exists an
attempt h_y defined at y. Then we put all these functions together, and take
h′ = ⋃_{y<x} h_y. This is defined for all y < x, and is well-defined since the h_y
never disagree.
Finally, add to it the pair (x, G(h′|_{I_x})). Then h = h′ ∪ {(x, G(h′|_{I_x}))} is an
attempt defined at x.
Now define f : X → Y by f(x) = y if there exists an attempt h, defined at
x, with h(x) = y.
Claim. There is a unique such f.
Suppose f and f′ both work. Then if f(y) = f′(y) for all y < x, then
f(x) = f′(x) by definition. So by induction, we know that for all x, we have
f(x) = f′(x).
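On a finite well-order the construction can be carried out directly: build the
attempts in increasing order, with each new value given by G applied to the
attempt built so far. A Python sketch (our own, not part of the notes), with
the factorial example from above:

    def recurse(X, G):
        """X: the elements of a finite well-order, in increasing order.
        Returns the unique f with f(x) = G(f restricted to I_x)."""
        f = {}                       # the attempt built so far, as pairs x: f(x)
        for x in X:
            f[x] = G(dict(f))        # what we have built is exactly f|I_x
        return f

    # Factorial: G of the empty restriction is 1; otherwise multiply the
    # latest value by its position's successor, i.e. f(n) = n * f(n - 1).
    G = lambda prev: 1 if not prev else (max(prev) + 1) * prev[max(prev)]
    print(recurse(range(8), G)[7])   # 5040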
With the tool of recursion, we can prove that every subset of a well-ordering
is isomorphic to an initial segment of it.
Lemma (Subset collapse). Let X be a well-ordering and let Y ⊆ X. Then Y is
isomorphic to an initial segment of X. Moreover, this initial segment is unique.
Proof. For f : Y → X to be an order-preserving bijection with an initial segment
of X, we need to map x to the smallest thing not yet mapped to, i.e.
f(x) = min(X \ {f(y) : y < x}).
To be able to take the minimum, we have to make sure the set is non-empty, i.e.
{f(y) : y < x} ≠ X. We can show this by proving that f(z) < x for all z < x by
induction, and hence x ∉ {f(y) : y < x}.
Then by the recursion theorem, this function exists and is unique.
This implies that a well-ordered X can never be isomorphic to a proper
initial segment of itself, since X is isomorphic to itself by the identity
function, and uniqueness shows that it cannot also be isomorphic to a proper initial
segment.
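For finite examples the collapsing map can be computed exactly as in the proof:
send each element of Y, in increasing order, to the least element of X not yet
used. A Python sketch (our own):

    def collapse(X, Y):
        """The order-isomorphism from Y onto an initial segment of X.
        X: a sorted list; Y: a sorted list of elements of X."""
        f, used = {}, set()
        for x in Y:                                    # in increasing order
            f[x] = min(v for v in X if v not in used)  # min(X \ {f(y) : y < x})
            used.add(f[x])
        return f

    print(collapse([0, 1, 2, 3, 4, 5, 6], [1, 3, 4, 6]))
    # {1: 0, 3: 1, 4: 2, 6: 3} -- Y collapses onto the initial segment I_4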
Using the idea of initial segments, we can define an order comparing different
well-orders themselves.
Notation. Write X ≤ Y if X is isomorphic to an initial segment of Y.
We write X < Y if X ≤ Y but X is not isomorphic to Y, i.e. X is isomorphic
to a proper initial segment of Y.
Example. If X = N, Y = {1/2, 2/3, 3/4, · · · } ∪ {1}, then X < Y.
We will show that this is a total order. Of course, we identify two well-orders
as “equal” when they are isomorphic.
Reflexivity and transitivity are straightforward. So we prove trichotomy and
antisymmetry:
Theorem. Let X, Y be well-orderings. Then X ≤ Y or Y ≤ X.
Proof. We attempt to define f : X → Y by
f(x) = min(Y \ {f(y) : y < x}).
By the law of excluded middle, this function is either well-defined or not.
If it is well-defined, then it is an isomorphism from X to an initial segment
of Y.
If it is not, then there is some x such that {f(y) : y < x} = Y and we cannot
take the minimum. Then f is a bijection between I_x = {y : y < x} and Y. So f⁻¹
is an isomorphism between Y and an initial segment of X.
Hence either X ≤ Y or Y ≤ X.
Theorem. Let X, Y be well-orderings with X ≤ Y and Y ≤ X. Then X and
Y are isomorphic.
Proof. Since X ≤ Y, there is an order-preserving function f : X → Y that
bijects X with an initial segment of Y. Similarly, since Y ≤ X, we get an
analogous g : Y → X. Then g ∘ f : X → X defines an isomorphism between X and
an initial segment of X.
Since X cannot be isomorphic to a proper initial segment of itself,
the image of g ∘ f must be X itself. Hence g ∘ f is a bijection.
Similarly, f ∘ g is a bijection. Hence f and g are both bijections, and X and
Y are isomorphic.
2.2 New well-orderings from old
Given a well-ordering
X
, we want to create more well-orderings. We’ve previously
shown that we can create a shorter one by taking an initial segment. In this
section, we will explore two ways to make longer well-orderings.
Add one element
We can extend a well-ordering by exactly one element. This is known as the
successor.
Definition (Successor). Given X, choose some x ∉ X and define a well-ordering
on X ∪ {x} by setting y < x for all y ∈ X. This is the successor of X, written
X⁺.
We clearly have X < X⁺.
Put some together
More interestingly, we want to “stitch together” many well-orderings. However,
we cannot just arbitrarily stitch well-orderings together. The well-orderings must
satisfy certain nice conditions for this to be well-defined.
Definition (Extension). For well-orderings (X, <_X) and (Y, <_Y), we say Y
extends X if X is a proper initial segment of Y and <_X and <_Y agree when both
are defined.
Note that we explicitly require X to be an initial segment of Y. X simply
being a subset of Y will not work, for reasons that will become clear shortly.
Definition (Nested family). We say well-orderings {X_i : i ∈ I} form a nested
family if for any i, j ∈ I, either X_i extends X_j, or X_j extends X_i.
Proposition. Let {X_i : i ∈ I} be a nested set of well-orderings. Then there
exists a well-ordering X with X_i ≤ X for all i.
Proof. Let X = ⋃_{i∈I} X_i, with < defined on X as ⋃_{i∈I} <_i (where <_i is the
ordering of X_i), i.e. we inherit the orders from the X_i’s. This is clearly a total
ordering. Since {X_i : i ∈ I} is a nested family, each X_i is an initial segment of
X.
To show that it is a well-ordering, let S ⊆ X be a non-empty subset of X.
Then S ∩ X_i is non-empty for some i. Let x be the minimum element (in X_i) of
S ∩ X_i. Then also for any y ∈ S, we must have x ≤ y, as X_i is an initial segment
of X.
Note that if we didn’t require X to be an initial segment of Y when defining
“extension”, then the above proof would not work. For example, we can take the
collection of all subsets X_n = {x ∈ Z : x ≥ −n}, and their union would be Z,
which is not well-ordered.
2.3 Ordinals
We have already shown that the collection of all well-orderings is a total order.
But is it a well-ordering itself? To investigate this issue further, we first define
ourselves a convenient way of talking about well-orderings.
Definition (Ordinal). An ordinal is a well-ordered set, with two regarded as
the same if they are isomorphic. We write ordinals as Greek letters α, β etc.
We would want to define ordinals as equivalence classes of well-orders under
isomorphism, but we cannot, because they do not form a set. We will provide a
formal definition of ordinals later when we study set theory.
Definition (Order type). If a well-ordering X has corresponding ordinal α, we
say X has order type α, and write otp(X) = α.
Notation. For each k ∈ N, we write k for the order type of the (unique)
well-ordering of size k. We write ω for the order type of N.
Example. In R, {2, 3, 5, 6} has order type 4. {1/2, 2/3, 3/4, · · · } has order type ω.
Notation. For ordinals α, β, write α ≤ β if X ≤ Y for some X of order type α,
Y of order type β. This does not depend on the choice of X and Y (since any
two choices must be isomorphic).
Proposition. Let α be an ordinal. Then the ordinals < α form a well-ordering
of order type α.
Notation. Write I_α = {β : β < α}.
Proof. Let X have order type α. The well-orderings < X are precisely (up to
isomorphism) the proper initial segments of X (by the uniqueness of subset collapse).
But these are the I_x for x ∈ X. So we can biject X with the well-orderings
< X by x ↦ I_x.
Finally, we can prove that the ordinals are well-ordered.
Proposition. Let S be a non-empty set of ordinals. Then S has a least element.
Proof. Choose α ∈ S. If it is minimal, done.
If not, then S ∩ I_α is non-empty. But I_α is well-ordered. So S ∩ I_α has a
least element, β. Then this is a minimal element of S.
However, it would be wrong to say that the ordinals form a well-ordered set,
for the very reason that they don’t form a set.
Theorem (Burali-Forti paradox). The ordinals do not form a set.
Proof. Suppose not. Let X be the set of ordinals. Then X is a well-ordering.
Let its order type be α. Then X is isomorphic to I_α, a proper initial segment of
X. Contradiction.
Recall that we could create new well-orderings from old via adding one
element and taking unions. We can translate these into ordinal language.
Given an ordinal α, suppose that X is the corresponding well-order. Then
we define α⁺ to be the order type of X⁺.
If we have a set {α_i : i ∈ I} of ordinals, we can stitch them together to form
a new well-order. In particular, we apply “nested well-orders” to the initial
segments {I_{α_i} : i ∈ I}. This produces an upper bound of the ordinals α_i. Since
the ordinals are well-ordered, we know that there is a least upper bound. We
call this the supremum of the set {α_i : i ∈ I}, written sup{α_i : i ∈ I}. In fact,
the upper bound created by nesting well-orders is the least upper bound.
Example. {2, 4, 6, 8, ···} has supremum ω.
Now we have two ways of producing ordinals: +1 and supremum.
We can generate a lot of ordinals now:
0, 1, 2, . . . , ω, ω + 1, ω + 2, . . . , ω + ω = ω · 2, ω · 2 + 1, . . . , ω · 3, . . . ,
ω · 4, . . . , ω · ω = ω^2, ω^2 + 1, . . . , ω^2 + ω, . . . , ω^2 · 2, . . . , ω^3, . . . ,
ω^ω, ω^(ω+1), . . . , ω^(ω+2), . . . , ω^(ω·2), . . . , ω^(ω^2), . . . , ω^(ω^ω), . . . ,
ε_0 = ω^(ω^(ω^···)), ε_0 + 1, . . . , ε_0 · 2, . . . , ε_0^2, . . . , ε_0^(ε_0), . . . ,
ε_0^(ε_0^(ε_0^···)) = ε_1, . . .
Here we introduced a lot of different notations. For example, we wrote ω + 1 to
mean ω⁺, and ω · 2 = sup{ω, ω + 1, ω + 2, · · · }. We will formally define these
notations later.
We have written a lot of ordinals above, some of which are really huge.
However, all the ordinals above are countable. The operations we have done so
far are adding one element and taking countable unions. So the results are all
countable. So is there an uncountable ordinal?
Theorem. There is an uncountable ordinal.
Proof. This is easy by looking at the supremum of the set of all countable
ordinals. However, this works only if the collection of countable ordinals is a set.
Let A = {R ∈ P(N × N) : R is a well-ordering of a subset of N}. So
A ⊆ P(N × N) is a set. Then B = {order type of R : R ∈ A} is the set of all countable
ordinals.
Let ω_1 = sup B. Then ω_1 is uncountable. Indeed, if ω_1 were countable, then
it would be the greatest countable ordinal, but ω_1 + 1 is greater and is also
countable.
By definition, ω_1 is the least uncountable ordinal, and everything in our
previous big list of ordinals is less than ω_1.
There are two strange properties of ω_1:
(i) ω_1 is an uncountable ordering, yet for every x ∈ ω_1, the set {y : y < x} is
countable.
(ii) Every sequence in ω_1 is bounded, since its supremum is a countable union
of countable sets, which is countable.
In general, we have the following theorem:
Theorem (Hartogs’ lemma). For any set
X
, there is an ordinal that does not
inject into X.
Proof. As before, with B = {α : α injects into X}.
Notation. Write γ(X) for the least ordinal that does not inject into X, e.g.
γ(ω) = ω_1.
2.4 Successors and limits
In general, we can divide ordinals into two categories. The criterion is as follows:
given an ordinal α, is there a greatest element below α? i.e. does I_α = {β : β < α}
have a greatest element?
If yes, say β is the greatest element. Then γ ∈ I_α ⇔ γ ≤ β. So I_α = {β} ∪ I_β.
In other words, α = β⁺.
Definition (Successor ordinal). An ordinal α is a successor ordinal if there is a
greatest element β below it. Then α = β⁺.
On the other hand, if no, then for any
γ < α
, there exists
β < α
such that
β > γ. So α = sup{β : β < α}.
Definition (Limit ordinal). An ordinal
α
is a limit if it has no greatest element
below it. We usually write λ for limit ordinals.
Example. 5 and ω⁺ are successors. ω and 0 are limits (0 is a limit because it
has no element below it, let alone a greatest one!).
2.5 Ordinal arithmetic
We want to define ordinal arithmetic, such as + and ×, so that we can make
formal sense of our notations such as ω + ω in our huge list of ordinals.
We first start with addition.
Definition (Ordinal addition (inductive)). Define α + β by recursion on β (α
is fixed):
(i) α + 0 = α.
(ii) α + β⁺ = (α + β)⁺.
(iii) α + λ = sup{α + γ : γ < λ} for non-zero limit λ.
Note that officially, we cannot do “recursion on the ordinals”, since the
ordinals don’t form a set. So what we officially do is define α + γ on
{γ : γ < β} recursively for each ordinal β. Then by the uniqueness of recursions, we
can show that this addition is well-defined.
Example.
ω + 1 = (ω + 0)⁺ = ω⁺.
ω + 2 = (ω + 1)⁺ = ω⁺⁺.
1 + ω = sup{1 + n : n < ω} = sup{1, 2, 3, · · · } = ω.
It is very important to note that addition is not commutative! This asymmetry
arises from our decision to perform recursion on β instead of α.
On the other hand, addition is associative.
Proposition. Addition is associative, i.e. (α + β) + γ = α + (β + γ).
Proof. Since we defined addition by recursion, it makes sense to prove this by
induction. Since we recursed on the right-hand term in the definition, it only
makes sense to induct on γ (and fix α and β).
(i) If γ = 0, then α + (β + 0) = α + β = (α + β) + 0.
(ii) If γ = δ⁺ is a successor, then
α + (β + δ⁺) = α + (β + δ)⁺ = [α + (β + δ)]⁺ = [(α + β) + δ]⁺ = (α + β) + δ⁺ = (α + β) + γ.
(iii) If γ = λ is a non-zero limit ordinal, we have
(α + β) + λ = sup{(α + β) + γ : γ < λ} = sup{α + (β + γ) : γ < λ},
using the induction hypothesis.
If we want to evaluate α + (β + λ), we have to first know whether β + λ is
a successor or a limit. We now claim it is a limit:
β + λ = sup{β + γ : γ < λ}. We show that this cannot have a greatest
element: for any β + γ, since λ is a limit ordinal, we can find a γ′ such that
γ < γ′ < λ. So β + γ′ > β + γ. So β + γ cannot be the greatest element.
So
α + (β + λ) = sup{α + δ : δ < β + λ}.
We need to show that
sup{α + δ : δ < β + λ} = sup{α + (β + γ) : γ < λ}.
Note that the two sets need not be equal. For example, if β = 3 and λ = ω,
then the left contains α + 2 but the right does not.
So we show that the left is both ≤ and ≥ the right.
≥: Each element of the right-hand set is an element of the left.
≤: For δ < β + λ, we have δ < sup{β + γ : γ < λ}. So δ < β + γ for some
γ < λ. Hence α + δ < α + (β + γ).
Note that it is easy to prove that β < γ ⇒ α + β < α + γ by induction on γ
(which we implicitly assumed above). But it is not true if we add on the right:
1 < 2 but 1 + ω = 2 + ω.
The definition we had above is called the inductive definition. There is an
alternative definition of + based on actual well-orders. This is known as the
synthetic definition.
Intuitively, we first write out all the elements of α, then write out all the
elements of β after it. Then α + β is the order type of the combined mess.
Definition (Ordinal addition (synthetic)). α + β is the order type of α ⊔ β (α
disjoint union β, e.g. α × {0} ∪ β × {1}), ordered with all of α before all of β.
Example. ω + 1 = ω⁺, and 1 + ω = ω.
With this definition, associativity is trivial:
α + (β + γ) = α ⊔ β ⊔ γ = (α + β) + γ.
Now that we have given two definitions, we must show that they are the same:
Proposition. The inductive and synthetic definitions of + coincide.
Proof. Write + for the inductive definition, and +′ for the synthetic one. We want
to show that α + β = α +′ β. We induct on β.
(i) α + 0 = α = α +′ 0.
(ii) α + β⁺ = (α + β)⁺ = (α +′ β)⁺ = otp(α ⊔ β ⊔ {·}) = α +′ β⁺.
(iii) α + λ = sup{α + γ : γ < λ} = sup{α +′ γ : γ < λ} = α +′ λ. This works
because taking the supremum is the same as taking the union: α ⊔ λ is the
union of the nested well-orders α ⊔ γ for γ < λ.
The synthetic definition is usually easier to work with, if possible. For
example, it was very easy to show associativity using the synthetic definition. It
is also easier to see why addition is not commutative. However, if we want to do
induction, the inductive definition is usually easier.
After addition, we can define multiplication. Again, we first give an inductive
definition, and then a synthetic one.
Definition (Ordinal multiplication (inductive)). We define α · β by induction
on β by:
(i) α · 0 = 0.
(ii) α · (β⁺) = α · β + α.
(iii) α · λ = sup{α · γ : γ < λ} for λ a non-zero limit.
Example.
ω · 1 = ω · 0 + ω = 0 + ω = ω.
ω · 2 = ω · 1 + ω = ω + ω.
2 · ω = sup{2 · n : n < ω} = ω.
ω · ω = sup{ω · n : n < ω} = sup{ω, ω · 2, ω · 3, · · · }.
We also have a synthetic definition.
Definition (Ordinal multiplication (synthetic)). α · β is the order type of α × β,
with (x, y) < (x′, y′) if y < y′ or (y = y′ and x < x′). Intuitively, we stack β
copies of α on top of one another.
Example. ω · 2 = ω ⊔ ω = ω + ω, while 2 · ω = 2 ⊔ 2 ⊔ 2 ⊔ · · · = ω.
We can check that the definitions coincide, prove associativity etc. similar to
what we did for addition.
We can define ordinal exponentiation, towers etc. similarly:
Definition (Ordinal exponentiation (inductive)). α^β is defined as
(i) α^0 = 1
(ii) α^(β⁺) = α^β · α
(iii) α^λ = sup{α^γ : γ < λ} for λ a non-zero limit.
Example.
ω^1 = ω^0 · ω = 1 · ω = ω.
ω^2 = ω^1 · ω = ω · ω.
2^ω = sup{2^n : n < ω} = ω.
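Although we will not prove it here, every ordinal below ε_0 can be written
uniquely in Cantor normal form ω^(α_1) · c_1 + · · · + ω^(α_k) · c_k, with ordinal
exponents α_1 > · · · > α_k and positive integer coefficients c_i. In this
representation the definitions of + and · above have computable closed forms,
which the following Python sketch (our own illustration, not part of the notes)
implements, reproducing the examples:

    # An ordinal below epsilon_0 is a tuple of (exponent, coefficient) pairs
    # with strictly decreasing exponents (themselves such tuples); () is 0.
    ZERO, ONE = (), (((), 1),)        # 0 and omega^0 * 1
    OMEGA = (((((), 1),), 1),)        # omega^1 * 1
    TWO = (((), 2),)

    def cmp(a, b):
        """Compare two ordinals in Cantor normal form lexicographically."""
        for (e1, c1), (e2, c2) in zip(a, b):
            if cmp(e1, e2) != 0:
                return cmp(e1, e2)
            if c1 != c2:
                return (c1 > c2) - (c1 < c2)
        return (len(a) > len(b)) - (len(a) < len(b))

    def add(a, b):
        """a + b: terms of a below the leading exponent of b are absorbed."""
        if not b:
            return a
        e = b[0][0]
        keep = tuple(t for t in a if cmp(t[0], e) > 0)
        same = [t for t in a if cmp(t[0], e) == 0]
        if same:
            return keep + ((e, same[0][1] + b[0][1]),) + b[1:]
        return keep + b

    def mul(a, b):
        """a * b, using left-distributivity over the terms of b."""
        if not a or not b:
            return ZERO
        (e1, c1), out = a[0], ZERO
        for e, c in b:
            if e == ZERO:                 # finite part: only the head swells
                out = add(out, ((e1, c1 * c),) + a[1:])
            else:                         # a * omega^e = omega^(e1 + e)
                out = add(out, ((add(e1, e), c),))
        return out

    print(add(ONE, OMEGA) == OMEGA)              # 1 + omega = omega
    print(add(OMEGA, ONE) != OMEGA)              # omega + 1 > omega
    print(mul(TWO, OMEGA) == OMEGA)              # 2 * omega = omega
    print(mul(OMEGA, TWO) == add(OMEGA, OMEGA))  # omega * 2 = omega + omega

Exponentiation could be added along the same lines; we stop here to keep the
sketch short.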
2.6 Normal functions*
Note: this content was not lectured during the year.
When we have ordinals, we would like to consider functions On → On.
Since the ordinals are totally ordered, it would make sense to consider the
order-preserving functions, i.e. the increasing ones. However, ordinals have an
additional property: we can take suprema of ordinals. If we want our function
to preserve this as well, we are led to the following definition:
Definition (Normal function). A function f : On → On is normal if
(i) For any ordinal α, we have f(α) < f(α⁺).
(ii) If λ is a non-zero limit ordinal, then f(λ) = sup{f(γ) : γ < λ}.
Some replace condition (i) by requiring that f be strictly increasing. The two
versions are easily seen to be equivalent by transfinite induction.
Example. By definition, we see that for each β > 1, the function α ↦ β^α is
normal.
We start by a few technical lemmas.
Lemma. Let f be a normal function. Then f is strictly increasing.
Proof. Let α be a fixed ordinal. We show, by induction on β > α, that f(α) < f(β).
If β = α⁺, then the result is obvious.
If β = γ⁺ with γ ≠ α, then α < γ. So f(α) < f(γ) < f(γ⁺) = f(β) by
induction.
If β is a limit and is greater than α, then
f(β) = sup{f(γ) : γ < β} ≥ f(α⁺) > f(α),
since α⁺ < β. So the result follows.
Lemma. Let f be a normal function, and α an ordinal. Then f(α) ≥ α.
Proof. We prove this by induction. It is trivial for zero. For successors, we have
f(α⁺) > f(α) ≥ α, so f(α⁺) ≥ α⁺. For limits, we have
f(λ) = sup{f(γ) : γ < λ} ≥ sup{γ : γ < λ} = λ.
The following is a convenient refinement of the continuity result:
Lemma. If f is a normal function, then for any non-empty set {α_i}_{i∈I} of
ordinals, we have
f(sup{α_i : i ∈ I}) = sup{f(α_i) : i ∈ I}.
Proof. If {α_i} has a maximal element, then the result is obvious, as f is increasing
and the supremum is a maximum.
Otherwise, let
α = sup{α_i : i ∈ I}.
Since the α_i have no maximal element, we know α must be a limit ordinal. So we
have
f(α) = sup{f(β) : β < α}.
So it suffices to prove that
sup{f(β) : β < α} = sup{f(α_i) : i ∈ I}.
Since all α_i < α, we have sup{f(β) : β < α} ≥ sup{f(α_i) : i ∈ I}.
For the other direction, it suffices, by the definition of the supremum, to show that
f(β) ≤ sup{f(α_i) : i ∈ I}
for all β < α.
Given such a β, since α is the supremum of the α_i, we can find some particular
α_i such that β < α_i. So f(β) < f(α_i) ≤ sup{f(α_i) : i ∈ I}. So we are done.
Because of these results, some define normal functions to be functions that
are strictly increasing and preserve all suprema.
We now proceed to prove two important properties of normal functions (with
easy proofs!):
Lemma (Fixed-point lemma). Let f be a normal function. Then for each
ordinal α, there is some β ≥ α such that f(β) = β.
Since the supremum of fixed points is also a fixed point (by normality), it
follows that we can define a function g : On → On that enumerates the fixed
points. Now this function is itself again normal, so it has fixed points as well. . .
Proof. We thus define
β = sup{f(α), f(f(α)), f(f(f(α))), · · · }.
If the sequence eventually stops, then we have found a fixed point. Otherwise,
β is a limit ordinal, and thus normality gives
f(β) = sup{f(f(α)), f(f(f(α))), f(f(f(f(α)))), · · · } = β.
So β is a fixed point, and β ≥ f(α) ≥ α.
Lemma (Division algorithm for normal functions). Let f be a normal function.
Then for all α ≥ f(0), there is some maximal γ such that α ≥ f(γ).
Proof. Let γ = sup{β : f(β) ≤ α}. Then, by the previous lemma,
f(γ) = sup{f(β) : f(β) ≤ α} ≤ α.
This γ is clearly maximal.