Part II Logic and Set Theory
Based on lectures by I. B. Leader
Notes taken by Dexter Chua
Lent 2015
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
No specific prerequisites.
Ordinals and cardinals
Well-orderings and order-types. Examples of countable ordinals. Uncountable ordi-
nals and Hartogs’ lemma. Induction and recursion for ordinals. Ordinal arithmetic.
Cardinals; the hierarchy of alephs. Cardinal arithmetic. [5]
Posets and Zorn’s lemma
Partially ordered sets; Hasse diagrams, chains, maximal elements. Lattices and Boolean
algebras. Complete and chain-complete p osets; fixed-point theorems. The axiom of
choice and Zorn’s lemma. Applications of Zorn’s lemma in mathematics. The well-
ordering principle. [5]
Propositional logic
The propositional calculus. Semantic and syntactic entailment. The deduction and
completeness theorems. Applications: compactness and decidability. [3]
Predicate logic
The predicate calculus with equality. Examples of first-order languages and theories.
Statement of the completeness theorem; *sketch of proof*. The compactness theorem
and the Lowenheim-Skolem theorems. Limitations of first-order logic. Model theory. [5]
Set theory
Set theory as a first-order theory; the axioms of ZF set theory. Transitive closures,
epsilon-induction and epsilon-recursion. Well-founded relations. Mostowski’s collapsing
theorem. The rank function and the von Neumann hierarchy. [5]
Consistency
*Problems of consistency and independence* [1]
Contents
0 Introduction
1 Propositional calculus
1.1 Propositions
1.2 Semantic entailment
1.3 Syntactic implication
2 Well-orderings and ordinals
2.1 Well-orderings
2.2 New well-orderings from old
2.3 Ordinals
2.4 Successors and limits
2.5 Ordinal arithmetic
2.6 Normal functions*
3 Posets and Zorn’s lemma
3.1 Partial orders
3.2 Zorn’s lemma and axiom of choice
3.3 Bourbaki-Witt theorem*
4 Predicate logic
4.1 Language of predicate logic
4.2 Semantic entailment
4.3 Syntactic implication
4.4 Peano Arithmetic
4.5 Completeness and categoricity*
5 Set theory
5.1 Axioms of set theory
5.2 Properties of ZF
5.3 Picture of the universe
6 Cardinals
6.1 Definitions
6.2 Cardinal arithmetic
7 Incompleteness*
0 Introduction
Most people are familiar with the notion of “sets” (here “people” is defined
to be mathematics students). However, most of the time, we only have an
intuitive picture of what set theory should look like there are sets, we can
take intersections, unions, intersections and subsets. We can write down sets
like {x : ϕ(x) is true}.
Historically, mathematicians were content with this vague notion of sets.
However, it turns out that our last statement wasn’t really correct. We cannot
just arbitrarily write down sets we like. This is evidenced by the famous Russel’s
paradox, where the set
X
is defined as
X
=
{x
:
x ∈ x}
. Then we have
X X X ∈ X, which is a contradiction.
This lead to the formal study of set theory, where set theory is given a formal
foundation based on some axioms of set theory. This is known as axiomatic
set theory. This is similar to Euclid’s axioms of geometry, and, in some sense,
the group axioms. Unfortunately, while axiomatic set theory appears to avoid
paradoxes like Russel’s paradox, as odel proved in his incompleteness theorem,
we cannot prove that our axioms are free of contradictions.
Closely related to set theory is formal logic. Similarly, we want to put logic
on a solid foundation. We want to formally define our everyday notions such as
propositions, truth and proofs. The main result we will have is that a statement
is true if and only if we can prove it. This assures that looking for proofs is a
sensible way of showing that a statement is true.
It is important to note that having studied formal logic does not mean that
we should always reason with formal logic. In fact, this is impossible, as we
ultimately need informal logic to reason about formal logic itself!
Throughout the course, we will interleave topics from set theory and formal
logic. This is necessary as we need tools from set theory to study formal logic,
while we also want to define set theory within the framework of formal logic.
One is not allowed to complain that this involves circular reasoning.
As part of the course, we will also side-track to learn about well-orderings
and partial orders, as these are very useful tools in the study of logic and set
theory. Their importance will become evident as we learn more about them.
1 Propositional calculus
Propositional calculus is the study of logical statements such
p p
and
p
(
q p
). As opposed to predicate calculus, which will be studied in Chapter 4,
the statements will not have quantifier symbols like , .
When we say
p p
is a correct”, there are two ways we can interpret this.
We can interpret this as “no matter what truth value
p
takes,
p p
always has
the truth value of “true”.” Alternatively, we can interpret this as “there is a
proof that p p”.
The first notion is concerned with truth, and does not care about whether we
can prove things. The second notion, on the other hand, on talks about proofs.
We do not, in any way, assert that
p p
is “true”. One of the main objectives
of this chapter is to show that these two notions are consistent with each other.
A statement is true (in the sense that it always has the truth value of “true”) if
and only if we can prove it. It turns out that this equivalence has a few rather
striking consequences.
Before we start, it is important to understand that there is no “standard”
logical system. What we present here is just one of the many possible ways of
doing formal logic. In particular, do not expect anyone else to know exactly how
your system works without first describing it. Fortunately, no one really writes
proof with formal logic, and the important theorems we prove (completeness,
compactness etc.) do not depend much on the exact implementation details of
the systems.
1.1 Propositions
We’ll start by defining propositions, which are the statements we will consider
in propositional calculus.
Definition (Propositions). Let
P
be a set of primitive propositions. These are a
bunch of (meaningless) symbols (e.g.
p
,
q
,
r
), which are used as the basic building
blocks of our more interesting propositions. These are usually interpreted to
take a truth value. Usually, any symbol (composed of alphabets and subscripts)
is in the set of primitive propositions.
The set of propositions, written as L or L(P ), is defined inductively by
(i) If p P , then p L.
(ii) L, where is read as “false” (also a meaningless symbol).
(iii) If p, q L, then (p q) L.
Example. If our set of primitive propositions is
P
=
{p, q, r}
, then
p q
,
p , ((p q) (p r)) are propositions.
To define L formally, we let
L
0
= {⊥} P
L
n+1
= L
n
{(p q) : p, q L
n
}.
Then we define L = L
0
L
1
L
2
···.
In formal language terms,
L
is the set of finite strings of symbols from the
alphabet
( )
p
1
p
2
···
that satisfy some formal grammar rule (e.g. brackets
have to match).
Note here that officially, the only relation we have is
. The familiar “not”,
“and” and “or” do not exist. Instead, we define them to be abbreviations of
certain expressions:
Definition (Logical symbols).
¬p (“not p”) is an abbreviation for (p )
p q (“p and q”) is an abbreviation for ¬(p (¬q))
p q (“p or q”) is an abbreviation for (¬p) q
The advantage of having just one symbol
is that when we prove something
about our theories, we only have to prove it for
, instead of all
, ¬,
and
individually.
1.2 Semantic entailment
The idea of semantic entailment is to assign truth values to propositions, where
we declare each proposition to be “true” or “false”. This assignment is performed
by a valuation.
Definition (Valuation). A valuation on
L
is a function
v
:
L {
0
,
1
}
such
that:
v() = 0,
v(p q) =
(
0 if v(p) = 1, v(q) = 0,
1 otherwise
We interpret
v
(
p
) to be the truth value of
p
, with 0 denoting “false” and 1
denoting “true”.
Note that we do not impose any restriction of
v
(
p
) when
p
is a primitive
proposition.
For those people who like homomorphisms, we can first give the set
{
0
,
1
}
a
binary operation by
a b =
(
0 if a = 1, b = 0
1 otherwise
as well as a constant
= 0. Then a valuation can be defined as a homomorphism
between L and {0, 1} that preserves and .
It should be clear that a valuation is uniquely determined by its values on
the primitive propositions, as the values on all other propositions follow from
the definition of a valuation. In particular, we have
Proposition.
(i) If v and v
are valuations with v(p) = v
(p) for all p P , then v = v
.
(ii)
For any function
w
:
P {
0
,
1
}
, we can extend it to a valuation
v
such
that v(p) = w(p) for all p L.
Proof.
(i)
Recall that
L
is defined inductively. We are given that
v
(
p
) =
v
(
p
) on
L
0
. Then for all
p L
1
,
p
must be in the form
q r
for
q, r L
0
. Then
v
(
q r
) =
v
(
p q
) since the value of
v
is uniquely determined by the
definition. So for all p L
1
, v(p) = v
(p).
Continue inductively to show that v(p) = v
(p) for all p L
n
for any n.
(ii)
Set
v
to agree with
w
for all
p P
, and set
v
(
) = 0. Then define
v
on
L
n
inductively according to the definition.
Example. Suppose v is a valuation with v(p) = v(q) = 1, v(r) = 0. Then
v((p q) r) = 0.
Often, we are interested in propositions that are always true, such as
p p
.
These are known as tautologies.
Definition (Tautology).
t
is a tautology, written as
|
=
t
, if
v
(
t
) = 1 for all
valuations v.
To show that a statement is a tautology, we can use a truth table, where we
simply list out all possible valuations and find the value of v(t).
Example.
(i) |= p (q p) “A true statement is implied by anything”
v(p) v(q) v(q p) v(p (q p))
1 1 1 1
1 0 1 1
0 1 0 1
0 0 1 1
(ii) |= (¬¬p) p. Recall that ¬¬p is defined as ((p ) ).
v(p) v(p ) v((p ) ) v(((p ) ) p)
1 0 1 1
0 1 0 1
(iii) |= [p (q r)] [(p q) (p r)].
Instead of creating a truth table, which would be horribly long, we show
this by reasoning: Suppose it is not a tautology. So there is a
v
such that
v
(
p
(
q r
)) = 1 and
v
((
p q
)
(
p r
)) = 0. For the second
equality to hold, we must have
v
(
p q
) = 1 and
v
(
p r
) = 0. So
v(p) = 1, v(r) = 0, v(q) = 1. But then v(p (q r)) = 0.
Sometimes, we don’t want to make statements as strong as
t
is always true”.
Instead, we might want to say
t
is true whenever
S
is true”. This is known as
semantic entailment.
Definition (Semantic entailment). For
S L
,
t L
, we say
S
entails
t
,
S
semantically implies t or
S |
=
t
if, for any
v
such that
v
(
s
) = 1 for all
s S
, we
have v(t) = 1.
Here we are using the symbol
|
= again. This is not an attempt to confuse
students. |= t is equivalent to the statement |= t.
Example. {p q, q r} |= (p r).
We want to show that for any valuation
v
with
v
(
p q
) =
v
(
q r
) = 1, we
have v(p r) = 1. We prove the contrapositive.
If
v
(
p r
) = 0, then
v
(
p
) = 1 and
v
(
r
) = 0. If
v
(
q
) = 0, then
v
(
p q
) = 0.
If
v
(
q
) = 1, then
v
(
q r
) = 0. So
v
(
p r
) = 0 only if one of
v
(
p q
) or
v(q r) is zero.
Note that
{p} |
=
q
and
p q
both intuitively mean “if
p
is true, then
q
is
true”. However, these are very distinct notions.
p q
is a proposition within
our theory. It is true (or false) in the sense that valuations take the value 0 (or
1).
On the other hand, when we say
{p} |
=
q
, this is a statement in the meta-
theory. It is true (or false) in the sense that we decided it is true (or false) based
on some arguments and (informal) proofs, performed in the real world instead
of inside propositional calculus.
The same distinction should be made when we define syntactic implication
later.
Before we dive into syntactic implication, we will define a few convenient
terms for later use.
Definition (Truth and model). If
v
(
t
) = 1, then we say that
t
is true in
v
, or
v
is a model of
t
. For
S L
, a valuation
v
is a model of
S
if
v
(
s
) = 1 for all
s S
.
1.3 Syntactic implication
While semantic implication captures the idea of truthfulness, syntactic impli-
cation captures the idea of proofs. We want to say
S
syntactically implies
t
if
there we can prove t from S.
To prove propositions, we need two things. Firstly, we need axioms, which
are statements we can assume to be true in a proof. These are the basic building
blocks from which we will prove our theorems.
Other than axioms, we also need deduction rules. This allows as to make
deduce statements from other statements.
Our system of deduction composes of the following three axioms:
1. p (q p)
2. [p (q r)] [(p q) (p r)]
3. (¬¬p) p
and the deduction rule of modus ponens: from p and p q, we can deduce q.
At first sight, our axioms look a bit weird, especially the second one. We
will later see that how this particular choice of axioms allows us to prove certain
theorems more easily. This choice of axioms can also be motivated by combinatory
logic, but we shall not go into details of these.
Definition (Proof and syntactic entailment). For any
S L
, a proof of
t
from
S
is a finite sequence
t
1
, t
2
, ···t
n
of propositions, with
t
n
=
t
, such that each
t
i
is one of the following:
(i) An axiom
(ii) A member of S
(iii) A proposition t
i
such that there exist j, k < i with t
j
being t
k
t
i
.
If there is a proof of
t
from
S
, we say that
S
proves or syntactically entails
t
,
written S t.
If t, we say t is a theorem and write t.
In a proof of
t
from
S
,
t
is the conclusion and
S
is the set of hypothesis or
premises.
Example. {p q, q r} p r
We go for (p q) (p r) via Axiom 2.
1. [p (q r)] [(p q) (p r)] Axiom 2
2. q r Hypothesis
3. (q r) [p (q r)] Axiom 1
4. p (q r) MP on 2, 3
5. (p q) (p r) MP on 1, 4
6. p q Hypothesis
7. p r MP on 5, 6
Example. (p p)
We go for [p (p p)] (p p).
1. [p ((p p) p)] [(p (p p)) (p p)] Axiom 2
2. p ((p p) p) Axiom 1
3. [p (p p)] (p p) MP on 1, 2
4. p (p p) Axiom 1
5. p p MP on 3, 4
This seems like a really tedious way to prove things. We now prove that the
deduction theorem, which allows as to find proofs much more easily.
Proposition (Deduction theorem). Let S L and p, q L. Then we have
S (p q) S {p} q.
This says that behaves like the connective in the language.
Proof. () Given a proof of p q from S, append the lines
p Hypothesis
q MP
to obtain a proof of q from S {p}.
(
) Let
t
1
, t
2
, ··· , t
n
=
q
be a proof of
q
from
S {p}
. We’ll show that
S p t
i
for all i.
We consider different possibilities of t
i
:
t
i
is an axiom: Write down
t
i
(p t
i
) Axiom 1
t
i
Axiom
p t
i
MP
t
i
S: Write down
t
i
(p t
i
) Axiom 1
t
i
Hypothesis
p t
i
MP
t
i
= p: Write down our proof of p p from our example above.
t
i
is obtained by MP: we have some
j, k < i
such that
t
k
= (
t
j
t
i
). We
can assume that
S
(
p t
j
) and
S
(
p t
k
) by induction on
i
. Now
we can write down
[p (t
j
t
i
)] [(p t
j
) (p t
i
)] Axiom 2
p (t
j
t
i
) Known already
(p t
j
) (p t
i
) MP
p t
j
Known already
p t
i
MP
to get S (p t
i
).
This is the reason why we have this weird-looking Axiom 2 it enables us to
easily prove the deduction theorem.
This theorem has a “converse”. Suppose we have a deduction system system
that admits modus ponens, and the deduction theorem holds for this system.
Then axioms (1) and (2) must hold in the system, since we can prove them using
the deduction theorem and modus ponens. However, we are not able to deduce
axiom (3) from just modus ponens and the deduction theorem.
Example. We want to show
{p q, q r}
(
p r
). By the deduction
theorem, it is enough to show that
{p q, q r, p} r
, which is trivial by
applying MP twice.
Now we have two notions:
|
= and
. How are they related? We want to
show that they are equal: if something is true, we can prove it; if we can prove
something, it must be true.
Aim. Show that S t if and only if S |= t.
This is known as the completeness theorem, which is made up of two directions:
(i) Soundness: If S t, then S |= t. “Our axioms aren’t absurd”
(ii)
Adequacy: If
S |
=
t
,
S t
. “Our axioms are strong enough to be able to
deduce, from S, all semantic consequences of S.”
We first prove the easy bit:
Proposition (Soundness theorem). If S t, then S |= t.
Proof.
Given valuation
v
with
v
(
s
) = 1 for all
s S
, we need to show that
v(t) = 1. We will show that every line t
i
in the proof has v(t
i
) = 1.
If
t
i
is an axiom, then
v
(
t
i
) = 1 since axioms are tautologies. If
t
i
is a
hypothesis, then by assumption
v
(
s
) = 1. If
t
i
is obtained by modus ponens, say
from t
j
t
i
, since v(t
j
) = 1 and v(t
j
t
i
) = 1, we must have v(t
i
) = 1.
Note that soundness holds whenever our axioms are all tautologies. Even if
we had silly axioms that are able to prove almost nothing, as long as they are
all tautologies, it will be sound.
Now we have to prove adequacy. It seems like a big and scary thing to prove.
Given that a statement is true, we have to find a proof for it, but we all know
that finding proofs is hard!
We first prove a special case. To do so, we first define consistency.
Definition (Consistent).
S
is inconsistent if
S
.
S
is consistent if it is not
inconsistent.
The special case we will prove is the following:
Theorem (Model existence theorem). If
S |
=
, then
S
. i.e. if
S
has no
model, then
S
is inconsistent. Equivalently, if
S
is consistent, then
S
has a
model.
While we called this a “special case”, it is in fact all we need to know to
prove adequacy. If we are given
S |
=
t
, then
S t} |
=
. Hence using the
model existence theorem, we know that
S t}
. Hence by the deduction
theorem, we know that S ¬¬t. But (¬¬t) t by Axiom 3. So S t.
As a result, some books call this the “completeness theorem” instead, because
the rest of the completeness theorem follows trivially from this.
The idea of the proof is that we’d like to define v : L {0, 1} by
p 7→
(
1 if p S
0 if p ∈ S
However, this is obviously going to fail, because we could have some
p
such that
S p
but
p ∈ S
, i.e.
S
is not deductively closed. Yet this is not a serious problem
we take the deductive closure first, i.e. add all the statements that
S
can prove.
But there is a more serious problem. There might be a
p
with
S ⊢ p
and
S ⊢ ¬p
. This is the case if, say,
p
never appears in
S
. The idea here is to
arbitrarily declare that
p
is true or false, and add
p
or
¬p
to
S
. What we have
to prove is that we can do so without making S consistent.
We’ll prove this in the following lemma:
Lemma. For consistent
S L
and
p L
, at least one of
S {p}
and
S p}
is consistent.
Proof.
Suppose instead that both
S {p}
and
S p}
. Then by the
deduction theorem,
S p
and
S ¬p
. So
S
, contradicting consistency of
S.
Now we can prove the completeness theorem. Here we’ll assume that the
primitives
P
, and hence the language
L
is countable. This is a reasonable thing
to assume, since we can only talk about finitely many primitives (we only have
a finite life), so uncountably many primitives would be of no use.
However, this is not a good enough excuse to not prove our things properly.
To prove the whole completeness theorem, we will need to use Zorn’s lemma,
which we will discuss in Chapter 3. For now, we will just prove the countable
case.
Proof. Assuming that L is countable, list L as {t
1
, t
2
, ···}.
Let
S
0
=
S
. Then at least one of
S {t
1
}
and
S t
1
}
is consistent. Pick
S
1
to be the consistent one. Then let
S
2
=
S
1
{t
2
}
or
S
1
t
2
}
such that
S
2
is consistent. Continue inductively.
Set
¯
S
=
S
0
S
1
S
2
···
. Then
p
¯
S
or
¬p
¯
S
for each
p L
by construction.
Also, we know that
¯
S
is consistent. If we had
¯
S
, then since proofs are finite,
there is some
S
n
that contains all assumptions used in the proof of
¯
S
. Hence
S
n
, but we know that all S
n
are consistent.
Finally, we check that
¯
S
is deductively closed: if
¯
S p
, we must have
p
¯
S
.
Otherwise, ¬p
¯
S. But this implies that
¯
S is inconsistent.
Define v : L {0, 1} by
p 7→
(
1 if p
¯
S
0 if not
.
All that is left to show is that this is indeed a valuation.
First of all, we have v() = 0 as ∈
¯
S (since
¯
S is consistent).
For p q, we check all possible cases.
(i)
If
v
(
p
) = 1
, v
(
q
) = 0, we have
p
¯
S
,
q ∈
¯
S
. We want to show
p q ∈
¯
S
.
Suppose instead that
p q
¯
S
. Then
¯
S q
by modus ponens. Hence
q
¯
S
since
¯
S
is deductively closed. This is a contradiction. Hence we
must have v(p q) = 0.
(ii)
If
v
(
q
) = 1, then
q
¯
S
. We want to show
p q
¯
S
. By our first axiom,
we know that
q
(
p q
). So
¯
S p q
. So
p q
¯
S
by deductive
closure. Hence we have v(p q) = 1.
(iii) If v(p) = 0, then p ∈
¯
S. So ¬p
¯
S. We want to show p q
¯
S.
This is equivalent to showing ¬p p q.
By the deduction theorem, this is equivalent to proving {p, ¬p} q.
We know that {p, ¬p} . So it is sufficient to show q.
By axiom 3, this is equivalent to showing ¬¬q.
By the deduction theorem, this is again equivalent to showing
¬¬q.
By definition of ¬, this is equivalent to showing (¬q ).
But this is just an instance of the first axiom. So we know that
¯
S p q
.
So v(p q) = 1.
Note that the last case is the first time we really use Axiom 3.
By remark before the proof, we have
Corollary (Adequacy theorem). Let S L, t L. Then S |= t implies S t.
Theorem (Completeness theorem). Le
S L
and
t L
. Then
S |
=
t
if and
only if S t.
This theorem has two nice consequences.
Corollary (Compactness theorem). Let
S L
and
t L
with
S |
=
t
. Then
there is some finite S
S has S
|= t.
Proof. Trivial with |= replaced by , because proofs are finite.
Sometimes when people say compactness theorem, they mean the special
case where
t
=
. This says that if every finite subset of
S
has a model, then
S
has a model. This result can be used to prove some rather unexpected result in
other fields such as graph theory, but we will not go into details.
Corollary (Decidability theorem). Let
S L
be a finite set and
t L
. Then
there exists an algorithm that determines, in finite and bounded time, whether
or not S t.
Proof. Trivial with replaced by |=, by making a truth table.
This is a rather nice result, because we know that proofs are hard to find in
general. However, this theorem only tells you whether a proof exists, without
giving you the proof itself!
2 Well-orderings and ordinals
In the coming two sections, we will study different orderings. The focus of this
chapter is well-orders, while the focus of the next is partial orders.
A well-order on a set
S
is a special type of total order where every non-empty
subset of
S
has a least element. Among the many nice properties of well-orders,
it is possible to do induction and recursion on well-orders.
Our interest, however, does not lie in well-orders itself. Instead, we are
interested in the “lengths” of well-orders. Officially, we call them them the order
types of the well-orders. Each order type is known as an ordinal.
There are many things we can do with ordinals. We can add and multiply
them to form “longer” well-orders. While we will not make much use of them in
this chapter, in later chapters, we will use ordinals to count “beyond infinity”,
similar to how we count finite things using natural numbers.
2.1 Well-orderings
We start with a few definitions.
Definition ((Strict) total order). A (strict) total order or linear order is a pair
(X, <), where X is a set and < is a relation on X that satisfies
(i) x < x for all x (irreflexivity)
(ii) If x < y, y < z, then x < z (transitivity)
(iii) x < y or x = y or y < x (trichotomy)
We have the usual shorthands for total orders. For example,
x > y
means
y < x and x y means (x < y or x = y).
It is also possible for a total order to be defined in terms of instead of <.
Definition ((Non-strict) total order). A (non-strict) total order is a pair (
X,
),
where X is a set and is a relation on X that satisfies
(i) x x (reflexivity)
(ii) x y and y z implies x z (transitivity)
(iii) x y and y x implies x = y (antisymmetry)
(iv) x y or y x (trichotomy)
Example.
(i) N, Z, Q, R with usual the usual orders are total orders.
(ii)
On
N
+
(the positive integers),
x < y
if
x | y
and
x
=
y
is not trichotomous,
and so not a total order.
(iii)
On
P
(
S
), define
x y
if
x y
. This is not a total order since it is not
trichotomous (for |X| > 1).
While there can be different total orders, the particular kind we are interested
in is well-orders.
Definition (Well order). A total order (
X, <
) is a well-ordering if every (non-
empty) subset has a least element, i.e.
(S X)[S = (x S)(y S) y x].
Example.
(i) N with usual ordering is a well-order.
(ii) Z, Q, R
are not well-ordered because the whole set itself does not have a
least element.
(iii) {x Q
:
x
0
}
is not well-ordered. For example,
{x X
:
x >
0
}
has no
least element.
(iv)
In
R
,
{
1
1
/n
:
n
= 2
,
3
,
4
, ···}
is well-ordered because it is isomorphic to
the naturals.
(v) {
1
1
/n
:
n
= 2
,
3
,
4
, ···} {
1
}
is also well-ordered, If the subset is only
1, then 1 is the least element. Otherwise take the least element of the
remaining set.
(vi) Similarly, {1 1/n : n = 2, 3, 4, ···} {2} is well-ordered.
(vii) {
1
1
/n
:
n
= 2
,
3
,
4
, ···} {
2
1
/n
:
n
= 2
,
3
,
4
, ···}
is also well-ordered.
This is a good example to keep in mind.
There is another way to characterize total orders in terms of infinite decreasing
sequences.
Proposition. A total order is a well-ordering if and only if it has no infinite
strictly decreasing sequence.
Proof. If x
1
> x
2
> x
3
> ···, then {x
i
: i N} has no least element.
Conversely, if non-empty
S X
has no least element, then each
x S
have
x
S with x
< x. Similarly, we can find some x
′′
< x
ad infinitum. So
x > x
> x
′′
> x
′′′
> ···
is an infinite decreasing sequence.
Like all other axiomatic theories we study, we identify two total orders to be
isomorphic if they are “the same up to renaming of elements”.
Definition (Order isomorphism). Say the total orders
X, Y
are isomorphic if
there exists a bijection
f
:
X Y
that is order-preserving, i.e.
x < y f
(
x
)
<
f(y).
Example.
(i) N and {1 1/n : n = 2, 3, 4, ···} are isomorphic.
(ii) {
1
1
/n
:
n
= 2
,
3
,
4
, ···}{
1
}
is isomorphic to
{
1
1
/n
:
n
= 2
,
3
,
4
, ···}
{2}.
(iii) {
1
1
/n
:
n
= 2
,
3
,
4
, ···}
and
{
1
1
/n
:
n
= 2
,
3
,
4
, ···} {
1
}
are not
isomorphic because the second has a greatest element but the first doesn’t.
Recall from IA Numbers and Sets that in
N
, the well-ordering principle is
equivalent to the principle induction. We proved that we can do induction simply
by assuming that
N
is well-ordered. Using the same proof, we should be able to
prove that we can do induction on any well-ordered set.
Of course, a general well-ordered set does not have the concept of “+1”, so
we won’t be able to formulate weak induction. Instead, our principle of induction
is the form taken by strong induction.
Proposition (Principle by induction). Let
X
be a well-ordered set. Suppose
S X has the property:
(x)
(y) y < x y S
x S
,
then S = X.
In particular, if a property P (x) satisfies
(x)
(y) y < x P (y)
P (x)
,
then P (x) for all x.
Proof.
Suppose
S
=
X
. Let
x
be the least element of
X \S
. Then by minimality
of x, for all y, y < x y S. Hence x S. Contradiction.
Using proof by induction, we can prove the following property of well-orders:
Proposition. Let
X
and
Y
be isomorphic well-orderings. Then there is a unique
isomorphism between X and Y .
This is something special to well-orders. This is not true for general total
orderings. For example,
x 7→ x
and
x 7→ x
13 are both isomorphisms
Z Z
. It
is also not true for, say, groups. For example, there are two possible isomorphisms
from Z
3
to itself.
Proof.
Let
f
and
g
be two isomorphisms
X Y
. To show that
f
=
g
, it is
enough, by induction, to show f(x) = g(x) given f(y) = g(y) for all y < x.
Given a fixed
x
, let
S
=
{f
(
y
) :
y < x}
. We know that
Y \ S
is non-empty
since
f
(
x
)
∈ S
. So let
a
be the least member of
Y \ S
. Then we must have
f
(
x
) =
a
. Otherwise, we will have
a < f
(
x
) by minimality of
a
, which implies
that
f
1
(
a
)
< x
since
f
is order-preserving. However, by definition of
S
, this
implies that a = f(f
1
(a)) S. This is a contradiction since a Y \ S.
By the induction hypothesis, for
y < x
, we have
f
(
y
) =
g
(
y
). So we have
S = {g(y) : y < x} as well. Hence g(x) = min(Y \ S) = f(x).
If we have an ordered set, we can decide to cut off the top of the set and
keep the bottom part. What is left is an initial segment.
Definition (Initial segment). A subset
Y
of a totally ordered
X
is an initial
segment if
x Y, y < x y Y,
Y
X
Example. For any
x X
, the set
I
x
=
{y X
:
y < x}
is an initial segment.
However, not every initial segment of
X
need to be in this form. For example,
{x
:
x
3
} R
and
{x
:
x <
0
or x
2
<
2
} Q
are both initial segments not of
this form.
The next nice property of well-orders we have is that every proper initial
segment is of this form.
Proposition. Every initial segment
Y
of a well-ordered set
X
is of the form
I
x
= {y X : y < x}.
Proof.
Take
x
=
min X \ Y
. Then for any
y I
x
, we have
y < x
. So
y Y
by
definition of x. So I
x
Y .
On the other hand, if
y Y
, then definitely
y
=
x
. We also cannot have
y > x
since this implies
x Y
. Hence we must have
y < x
. So
y I
x
. Hence
Y I
x
. So Y = I
x
.
The next nice property we want to show is that in a well-ordering
X
, every
subset S is isomorphic to an initial segment.
Note that this is very false for a general total order. For example, in
Z
, no
initial segment is isomorphic to the subset
{
1
,
2
,
3
}
since every initial segment
of
Z
is either infinite or empty. Alternatively, in
R
,
Q
is not isomorphic to an
initial segment.
It is intuitively obvious how we can prove this. We simply send the minimum
element of
S
to the minimum of
X
, and the continue recursively. However, how
can we justify recursion? If we have the well-order
{
1
2
,
2
3
,
3
4
, ··· ,
1
}
, then we will
never reach 1 if we attempt to write down each step of the recursion, since we
can only get there after infinitely many steps. We will need to define some sort
of “infinite recursion” to justify our recursion on general well-orders.
We first define the restriction of a function:
Definition (Restriction of function). For
f
:
A B
and
C A
, the restriction
of f to C is
f|
C
= {(x, f(x)) : x C}.
In this theorem (and the subsequent proof), we are treating functions as
explicitly a set of ordered pairs (
x, f
(
x
)). We will perform set operations on
functions and use unions to “stitch together” functions.
Theorem (Definition by recursion). Let
X
be a well-ordered set and
Y
be any
set. Then for any function
G
:
P
(
X ×Y
)
Y
, there exists a function
f
:
X Y
such that
f(x) = G(f |
I
x
)
for all x.
This is a rather weird definition. Intuitively, it means that
G
takes previous
values of
f
(
x
) and returns the desired output. This means that in defining
f
at
x
, we are allowed to make use of values of
f
on
I
x
. For example, we define
f(n) = nf (n 1) for the factorial function, with f(0) = 1.
Proof.
We might want to jump into the proof and define
f
(0) =
G
(
), where 0
is the minimum element. Then we define
f
(1) =
G
(
f
(0)) etc. But doing so is
simply recursion, which is the thing we want to prove that works!
Instead, we use the following clever trick: We define an
h
is an attempt” to
mean
h : I Y for some initial segment I of X, and h(x) = G(h|
I
x
) for x I.
The idea is to show that for any
x
, there is an attempt
h
that is defined at
x
.
Then take the value
f
(
x
) to be
h
(
x
). However, we must show this is well-defined
first:
Claim. If attempts h and h
are defined at x, then h(x) = h
(x).
By induction on
x
, it is enough to show that
h
(
x
) =
h
(
x
) assuming
h
(
y
) =
h
(y) for all y < x. But then h(x) = G(h|
I
x
) = G(h
|
I
x
) = h
(x). So done.
Claim. For any x, there must exist an attempt h that is defined at x.
Again, we may assume (by induction) that for each
y < x
, there exists an
attempt
h
y
defined at
y
. Then we put all these functions together, and take
h
=
S
y<x
h
y
. This is defined for all
y < x
, and is well-defined since the
h
y
never disagree.
Finally, add to it (
x, G
(
h
|
I
x
)). Then
h
=
h
(
x, G
(
h
|
I
x
)) is an attempt
defined at x.
Now define
f
:
X Y
by
f
(
x
) =
y
if there exists an attempt
h
, defined at
x, with h(x) = y.
Claim. There is a unique such f.
Suppose
f
and
f
both work. Then if
f
(
y
) =
f
(
y
) for all
y < x
, then
f
(
x
) =
f
(
x
) by definition. So by induction, we know for all
x
, we have
f
(x) = f(x).
With the tool of recursion, we can prove that every subset of a well-order.
Lemma (Subset collapse). Let
X
be a well-ordering and let
Y X
. Then
Y
is
isomorphic to an initial segment of
X
. Moreover, this initial segment is unique.
Proof.
For
f
:
Y X
to be an order-preserving bijection with an initial segment
of X, we need to map x to the smallest thing not yet mapped to, i.e.
f(x) = min(X \ {f(y) : y < x}).
To be able to take the minimum, we have to make sure the set is non-empty, i.e.
{f
(
y
) :
y < x}
=
X
. We can show this by proving that
f
(
z
)
< x
for all
z < x
by
induction, and hence x ∈ {f(y) : y < x}.
Then by the recursion theorem, this function exists and is unique.
This implies that a well-ordered
X
can never be isomorphic to a proper
initial segment of itself. This is since
X
is isomorphic to itself by the identity
function, and uniqueness shows that it cannot be isomorphic to another initial
segment.
Using the idea of initial segments, we can define an order comparing different
well-orders themselves.
Notation. Write X Y if X is isomorphic to an initial segment of Y .
We write
X < Y
if
X Y
but
X
is not isomorphic to
Y
, i.e.
X
is isomorphic
to a proper initial segment of Y .
Example. If X = N, Y = {
1
2
,
2
3
,
3
4
, ···} {1}, then X Y .
We will show that this is a total order. Of course, we identify two well-orders
as “equal” when they are isomorphic.
Reflexivity and transitivity are straightforward. So we prove trichotomy and
antisymmetry:
Theorem. Let X, Y be well-orderings. Then X Y or Y X.
Proof. We attempt to define f : X Y by
f(x) = min(Y \ {f (y) : y < x}).
By the law of excluded middle, this function is either well-defined or not.
If it is well-defined, then it is an isomorphism from
X
to an initial segment
of Y .
If it is not, then there is some
x
such that
{f
(
y
) :
y < x}
=
Y
and we cannot
take the minimum. Then
f
is a bijection between
I
x
=
{y
:
y < x}
and
Y
. So
f
is an isomorphism between Y and an initial segment of X.
Hence either X Y or Y X.
Theorem. Let
X, Y
be well-orderings with
X Y
and
Y X
. Then
X
and
Y are isomorphic.
Proof.
Since
X Y
, there is an order-preserving function
f
:
X Y
that
bijects
X
with an initial segment of
Y
. Similarly, since
Y X
, we get an
analogous
g
:
Y X
. Then
g f
:
X X
defines a bijection between
X
and
an initial segment of X.
Since there is no bijection between
X
and a proper initial segment of itself,
the image of g f must be X itself. Hence g f is a bijection.
Similarly,
f g
is a bijection. Hence
f
and
g
are both bijections, and
X
and
Y are isomorphic.
2.2 New well-orderings from old
Given a well-ordering
X
, we want to create more well-orderings. We’ve previously
shown that we can create a shorter one by taking an initial segment. In this
section, we will explore two ways to make longer well-orderings.
Add one element
We can extend a well-ordering by exactly one element. This is known as the
successor.
Definition (Successor). Given
X
, choose some
x ∈ X
and define a well-ordering
on
X {x}
by setting
y < x
for all
y X
. This is the successor of
X
, written
X
+
.
We clearly have X < X
+
.
Put some together
More interestingly, we want to “stitch together” many well-orderings. However,
we cannot just arbitrarily stitch well-orderings together. The well-orderings must
satisfy certain nice conditions for this to be well-defined.
Definition (Extension). For well-orderings (
X, <
X
) and (
Y, <
Y
), we say
Y
extends
X
if
X
is a proper initial segment of
Y
and
<
X
and
<
Y
agree when
defined.
Note that we explicitly require
X
to be an initial segment of
Y
.
X
simply
being a subset of Y will not work, for reasons that will become clear shortly.
Definition (Nested family). We say well-orderings
{X
i
:
i I}
form a nested
family if for any i, j I, either X
i
extends X
j
, or X
j
extends X
i
.
Proposition. Let
{X
i
:
i I}
be a nested set of well-orderings. Then there
exists a well-ordering X with X
i
X for all i.
Proof.
Let
X
=
S
iI
X
i
with
<
defined on
X
as
S
iI
<
i
(where
<
i
is the
ordering of
X
i
), i.e. we inherit the orders from the
X
i
’s. This is clearly a total
ordering. Since
{X
i
:
i I}
is a nested family, each
X
i
is an initial segment of
X.
To show that it is a well-ordering, let
S X
be a non-empty subset of
X
.
Then
S X
i
is non-empty for some
i
. Let
x
be the minimum element (in
X
i
) of
S X
i
. Then also for any
y S
, we must have
x y
, as
X
i
is an initial segment
of X.
Note that if we didn’t require
X
to be an initial segment of
Y
when defining
’extension’, then the above proof will not work. For example, we can take the
collection of all subsets
X
n
=
{x n
:
x Z}
, and their union would be
Z
,
which is not well-ordered.
2.3 Ordinals
We have already shown that the collection of all well-orderings is a total order.
But is it a well-ordering itself? To investigate this issue further, we first define
ourselves a convenient way of talking about well-orderings.
Definition (Ordinal). An ordinal is a well-ordered set, with two regarded as
the same if they are isomorphic. We write ordinals as Greek letters α, β etc.
We would want to define ordinals as equivalence classes of well-orders under
isomorphism, but we cannot, because they do not form a set. We will provide a
formal definition of ordinals later when we study set theory.
Definition (Order type). If a well-ordering
X
has corresponding ordinal
α
, we
say X has order type α, and write otp(X) = α.
Notation. For each
k N
, we write
k
for the order type of the (unique)
well-ordering of size k. We write ω for the order type of N.
Example. In R, {2, 3, 5, 6} has order type 4. {
1
2
,
2
3
,
3
4
, ···} has order type ω.
Notation. For ordinals
α, β
, write
α β
if
X Y
for some
X
of order type
α
,
Y
of order type
β
. This does not depend on the choice of
X
and
Y
(since any
two choices must be isomorphic).
Proposition. Let
α
be an ordinal. Then the ordinals
< α
form a well-ordering
of order type α.
Notation. Write I
α
= {β : β < α}.
Proof.
Let
X
have order type
α
. The well-orderings
< X
are precisely (up to
isomorphism) the proper initial segments of
X
(by uniqueness of subset collapse).
But these are the
I
x
for all
x X
. So we can biject
X
with the well-orderings
< X by x 7→ I
x
.
Finally, we can prove that the ordinals are well-ordered.
Proposition. Let
S
be a non-empty set of ordinals. Then
S
has a least element.
Proof. Choose α S. If it is minimal, done.
If not, then
S I
α
is non-empty. But
I
α
is well-ordered. So
S I
α
has a
least element, β. Then this is a minimal element of S.
However, it would be wrong to say that the ordinals form a well-ordered set,
for the very reason that they don’t form a set.
Theorem (Burali-Forti paradox). The ordinals do not form a set.
Proof.
Suppose not. Let
X
be the set of ordinals. Then
X
is a well-ordering.
Let its order-type be
α
. Then
X
is isomorphic to
I
α
, a proper initial subset of
X. Contradiction.
Recall that we could create new well-orderings from old via adding one
element and taking unions. We can translate these into ordinal language.
Given an ordinal
α
, suppose that
X
is the corresponding well-order. Then
we define α
+
to be the order type of X
+
.
If we have a set
{α
i
:
i I}
of ordinals, we can stitch them together to form
a new well-order. In particular, we apply “nested well-orders” to the initial
segments
{I
α
i
:
i I}
. This produces an upper bound of the ordinals
α
i
. Since
the ordinals are well-ordered, we know that there is a least upper bound. We
call this the supremum of the set
{α
i
:
i I}
, written
sup{α
i
:
i I}
. In fact,
the upper bound created by nesting well-orders is the least upper bound.
Example. {2, 4, 6, 8, ···} has supremum ω.
Now we have two ways of producing ordinals: +1 and supremum.
We can generate a lot of ordinals now:
0 ω · 2 + 1 ω
2
+ 1 ω
2
· 3 ω
ω+2
ε
0
+ 1
1 ω · 2 + 2 ω
2
+ 2 ω
2
· 4
.
.
.
.
.
.
2 ω · 2 + 3 ω
2
+ 3 ω
2
· 5 ω
ω·2
ε
0
· 2
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
ω ω · 3 ω
2
+ ω ω
3
ω
ω
2
ε
2
0
ω + 1 ω · 4
.
.
.
.
.
.
.
.
.
.
.
.
ω + 2 ω · 5 ω
2
+ ω · 2 ω
ω
ω
ω
ω
ε
ε
0
0
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
ω + ω = ω · 2 ω · ω = ω
2
ω
2
· 2 ω
ω+1
ω
ω
.
.
.
= ε
0
ε
ε
.
.
.
0
0
= ε
1
Here we introduced a lot of different notations. For example, we wrote
ω
+ 1 to
mean
ω
+
, and
ω ·
2 =
sup{ω, ω
+ 1
, ω
+ 2
, ···}
. We will formally define these
notations later.
We have written a lot of ordinals above, some of which are really huge.
However, all the ordinals above are countable. The operations we have done so
far is adding one element and taking countable unions. So the results are all
countable. So is there an uncountable ordinal?
Theorem. There is an uncountable ordinal.
Proof.
This is easy by looking at the supremum of the set of all countable
ordinals. However, this works only if the collection of countable ordinals is a set.
Let
A
=
{R P
(
N × N
) :
R is a well-ordering of a subset of N}
. So
A
P
(
N × N
). Then
B
=
{order type of R
:
R A}
is the set of all countable
ordinals.
Let
ω
1
=
sup B
. Then
ω
1
is uncountable. Indeed, if
ω
1
were countable, then
it would be the greatest countable ordinal, but
ω
1
+ 1 is greater and is also
countable.
By definition,
ω
1
is the least uncountable ordinal, and everything in our
previous big list of ordinals is less than ω
1
.
There are two strange properties of ω
1
:
(i) ω
1
is an uncountable ordering, yet for every
x ω
1
, the set
{y
:
y < x}
is
countable.
(ii)
Every sequence in
ω
1
is bounded, since its supremum is a countable union
of countable sets, which is countable.
In general, we have the following theorem:
Theorem (Hartogs’ lemma). For any set
X
, there is an ordinal that does not
inject into X.
Proof. As before, with B = {α : α injects into X}.
Notation. Write
γ
(
X
) for the least ordinal that does not inject into
X
. e.g.
γ(ω) = ω
1
.
2.4 Successors and limits
In general, we can divide ordinals into two categories. The criteria is as follows:
Given an ordinal
α
, is there a greatest element of
α
? i.e. does
I
α
=
{β
:
β < α}
have a greatest element?
If yes, say
β
is the greatest element. Then
γ I
α
γ β
. So
I
α
=
{β}I
β
.
In other words, α = β
+
.
Definition (Successor ordinal). An ordinal
α
is a successor ordinal if there is a
greatest element β below it. Then α = β
+
.
On the other hand, if no, then for any
γ < α
, there exists
β < α
such that
β > γ. So α = sup{β : β < α}.
Definition (Limit ordinal). An ordinal
α
is a limit if it has no greatest element
below it. We usually write λ for limit ordinals.
Example. 5 and
ω
+
are successors.
ω
and 0 are limits (0 is a limit because it
has no element below it, let alone a greatest one!).
2.5 Ordinal arithmetic
We want to define ordinals arithmetic such as + and
×
, so that we can make
formal sense out of our notations such as ω + ω in our huge list of ordinals.
We first start with addition.
Definition (Ordinal addition (inductive)). Define
α
+
β
by recursion on
β
(
α
is fixed):
α + 0 = α.
α + β
+
= (α + β)
+
.
α + λ = sup{α + γ : γ < λ} for non-zero limit λ.
Note that officially, we cannot do “recursion on the ordinals”, since the
ordinals don’t form a set. So what we officially do is that we define
α
+
γ
on
{γ
:
γ < β}
recursively for each ordinal
β
. Then by uniqueness of recursions, we
can show that this addition is well-defined.
Example. ω + 1 = (ω + 0)
+
= ω
+
.
ω + 2 = (ω + 1)
+
= ω
++
.
1 + ω = sup{1 + n : n ω} = sup{1, 2, 3, ···} = ω.
It is very important to note that addition is not commutative! This asymmetry
arises from our decision to perform recursion on β instead of α.
On the other hand, addition is associative.
Proposition. Addition is associative, i.e. (α + β) + γ = α + (β + γ).
Proof.
Since we define addition by recursion, it makes sense to prove this by
induction. Since we recursed on the right-hand term in the definition, it only
makes sense to induct on γ (and fix α + β).
(i) If γ = 0, then α + (β + 0) = α + β = (α + β) + 0.
(ii) If γ = δ
+
is a successor, then
α + (β + δ
+
) = α + (β + δ)
+
= [α + (β + δ)]
+
= [(α + β) + δ]
+
= (α + β) + δ
+
= (α + β) + γ.
(iii) If γ is a limit ordinal, we have
(α + β) + λ = sup{(α + β) + γ : γ < λ}
= sup{α + (β + γ) : γ < λ}
If we want to evaluate
α
+ (
β
+
λ
), we have to first know whether
β
+
λ
is
a successor or a limit. We now claim it is a limit:
β
+
λ
=
sup{β
+
γ
:
γ < λ}
. We show that this cannot have a greatest
element: for any
β
+
γ
, since
λ
is a limit ordinal, we can find a
γ
such that
γ < γ
< λ. So β + γ
> β + γ. So β + γ cannot be the greatest element.
So
α + (β + λ) = sup{α + δ : δ < β + λ}.
We need to show that
sup{α + δ : δ < β + λ} = sup{α + (β + γ) : γ < λ}.
Note that the two sets are not equal. For example, if
β
= 3 and
λ
=
ω
,
then the left contains α + 2 but the right does not.
So we show that the left is both and the right.
: Each element of the right hand set is an element of the left.
: For
δ < β
+
λ
, we have
δ < sup{β
+
γ
:
γ < λ}
. So
δ < β
+
γ
for some
γ < λ. Hence α + δ < α + (β + γ).
Note that it is easy to prove that
β < γ α
+
β < α
+
γ
by induction on
γ
(which we implicitly assumed above). But it is not true if we add on the right:
1 < 2 but 1 + ω = 2 + ω.
The definition we had above is called the inductive definition. There is an
alternative definition of + based on actual well-orders. This is known as the
synthetic definition.
Intuitively, we first write out all the elements of
α
, then write out all the
elements of β after it. The α + β is the order type of the combined mess.
Definition (Ordinal addition (synthetic)).
α
+
β
is the order type of
α β
(
α
disjoint union β, e.g. α × {0} β × {1}), with all α before all of β
α + β = α β
Example. ω + 1 = ω
+
.
1 + ω = ω.
With this definition, associativity is trivial:.
α + (β + γ) = α β γ = (α + β) + γ.
Now that we have given two definitions, we must show that they are the same:
Proposition. The inductive and synthetic definition of + coincide.
Proof.
Write + for inductive definition, and +
for synthetic. We want to show
that α + β = α +
β. We induct on β.
(i) α + 0 = α = α +
0.
(ii) α + β
+
= (α + β)
+
= (α +
β)
+
= otp α
β · = α +
β
+
(iii) α
+
λ
=
sup{α
+
γ
:
γ < λ}
=
sup{α
+
γ
:
γ < λ}
=
α
+
λ
. This works
because taking the supremum is the same as taking the union.
α γ γ
γ
′′
···λ
The synthetic definition is usually easier to work with, if possible. For
example, it was very easy to show associativity using the synthetic definition. It
is also easier to see why addition is not commutative. However, if we want to do
induction, the inductive definition is usually easier.
After addition, we can define multiplication. Again, we first give an inductive
definition, and then a synthetic one.
Definition (Ordinal multiplication (inductive)). We define
α · β
by induction
on β by:
(i) α · 0 = 0.
(ii) α · (β
+
) = α · β + α.
(iii) α · λ = sup{α · γ : γ < λ} for λ a non-zero limit.
Example.
ω · 1 = ω · 0 + ω = 0 + ω = ω.
ω · 2 = ω · 1 + ω = ω + ω.
2 · ω = sup{2 · n : n < ω} = ω.
ω · ω = sup{ω · n : n < ω} = sup{ω, ω
2
, ω
3
, ···}.
We also have a synthetic definition.
Definition (Ordinal multiplication (synthetic)).
β
α
.
.
.
α
α
Formally,
α ·β
is the order type of
α ×β
, with (
x, y
)
<
(
x
, y
) if
y < y
or (
y
=
y
and x < x
).
Example. ω · 2 =
ω
ω
= ω + ω.
Also 2 · ω = ω
··
.
.
.
··
··
= ω
We can check that the definitions coincide, prove associativity etc. similar to
what we did for addition.
We can define ordinal exponentiation, towers etc. similarly:
Definition (Ordinal exponentiation (inductive)). α
β
is defined as
(i) α
0
= 1
(ii) α
β
+
= α
β
· α
(iii) α
λ
= sup{α
γ
: γ < λ}.
Example. ω
1
= ω
0
· ω = 1 · ω = ω.
ω
2
= ω
1
· ω = ω · ω.
2
ω
= sup{2
n
: n < ω} = ω.
2.6 Normal functions*
Note: These content were not lectured during the year.
When we have ordinals, we would like to consider functions
On On
.
Since the ordinals are totally ordered, it would make sense to consider the
order-preserving functions, i.e. the increasing ones. However, ordinals have an
additional property we could take suprema of ordinals. If we want our function
to preserve this as well, we are lead to the following definition:
Definition (Normal function). A function f : On On is normal if
(i) For any ordinal α, we have f(α) < f(α
+
).
(ii) If λ is a non-zero limit ordinal, then f (λ) = sup{f(γ) : γ < λ}.
Some replace the increasing condition by
f
(
α
)
< f
(
α
+
). These are easily
seen to be equivalent by transfinite induction.
Example. By definition, we see that for each
β >
1, the function
α 7→ β
α
is
normal.
We start by a few technical lemmas.
Lemma. Let f be a normal function. Then f is strictly increasing.
Proof. Let α be a fixed ordinal. We induct on all β > α that f(α) < f(β).
If β = α
+
, then the result is obvious.
If
β
=
γ
+
with
γ
=
α
, then
α < γ
. So
f
(
α
)
< f
(
γ
)
< f
(
γ
+
) =
f
(
β
) by
induction.
If β is a limit and is greater than α, then
f(β) = sup{f(γ) : γ < β} f(α
+
) > f (α),
since α
+
< β. So the result follows.
Lemma. Let f be a normal function, and α an ordinal. Then f (α) α.
Proof.
We prove by induction. It is trivial for zero. For successors, we have
f(α
+
) > f (α) α, so f(α
+
) α
+
. For limits, we have
f(λ) = sup{f (γ) : γ < λ} sup{γ : γ < λ} = λ.
The following is a convenient refinement of the continuity result:
Lemma. If
f
is a normal function, then for any non-empty set
{α
i
}
iI
, we have
f(sup{α
i
: i I}) = sup{f(α
i
) : i I}.
Proof.
If
{α
i
}
has a maximal element, then the result is obvious, as
f
is increasing,
and the supremum is a maximum.
Otherwise, let
α = sup{α
i
: i I}
Since the
α
i
has no maximal element, we know
α
must be a limit ordinal. So we
have
f(α) = sup{f (β) : β < α}.
So it suffices to prove that
sup{f(β) : β < α} = sup{f(α
i
) : i I}.
Since all α
i
< α, we have sup{f (β) : β < α} sup{f (α
i
) : i I}.
For the other direction, it suffices, by definition, to show that
f(β) sup{f(α
γ
) : i I}
for all β < α.
Given such a
β
, since
α
is the supremum of the
α
i
, we can find some particular
α
i
such that
β < α
i
. So
f
(
β
)
< f
(
α
i
)
sup{f
(
α
i
) :
i I}
. So we are done.
Because of these results, some define normal functions to be functions that
are strictly increasing and preserve all suprema.
We now proceed to prove two important properties of normal functions (with
easy proofs!):
Lemma (Fixed-point lemma). Let
f
be a normal function. Then for each
ordinal α, there is some β α such that f(β) = β.
Since the supremum of fixed points is also a fixed point (by normality), it
follows that we can define a function
g
:
On On
that enumerates the fixed
points. Now this function itself is again normal, so it has fixed points as well. . .
Proof. We thus define
β = sup{f (α), f(f (α)), f(f (f (α))), ···}.
If the sequence eventually stops, then we have found a fixed point. Otherwise,
β
is a limit ordinal, and thus normality gives
f(β) = sup{f(f(α)), f (f(f(α))), f (f(f(f(α)))), ···} = β.
So β is a fixed point, and β f(α) α.
Lemma (Division algorithm for normal functions). Let
f
be a normal function.
Then for all α, there is some maximal γ such that α f(γ).
Proof. Let γ = sup{β : f(β) α}. Then we have
f(γ) = sup{f(β) : f (b) α} α.
This is clearly maximal.
3 Posets and Zorn’s lemma
In this chapter, we study partial orders. While there are many examples of
partial orders, the most important example is the power set
P
(
X
) for any set
X
,
ordered under inclusion. We will also consider subsets of the power set.
The two main theorems of this chapter are Knaster-Tarski fixed point theorem
and Zorn’s lemma. We will use Zorn’s lemma to prove a lot of useful results in
different fields, including the completeness theorem in propositional calculus.
Finally, we will investigate the relationship between Zorn’s lemma and Axiom of
Choice.
3.1 Partial orders
Definition (Partial ordering (poset)). A partially ordered set or poset is a pair
(X, ), where X is a set and is a relation on X that satisfies
(i) x x for all x X (reflexivity)
(ii) x y and y z x z (transitivity)
(iii) x y and y x x = y (antisymmetry)
We write
x < y
to mean
x y
and
x
=
y
. We can also define posets in terms
of <:
(i) x < x for all x X (irreflexive)
(ii) x < y and y < z x < z (transitive)
Example.
(i) Any total order is (trivially) a partial order.
(ii) N with x y if x | y is a partial order.
(iii) P(S) with for any set S is a partial order.
(iv) Any subset of P(S) with inclusion is a partial order.
(v) We can use a diagram
a
b
c
d
e
Where “above” means “greater”. So
a b c
,
a d e
, and what
follows by transitivity. This is a Hasse diagram.
Definition (Hasse diagram). A Hasse diagram for a poset
X
consists of
a drawing of the points of
X
in the plane with an upwards line from
x
to
y if y covers x:
Definition (Cover). In a poset,
y
covers
x
if
y > x
and no
z
has
y > z > x
.
Hasse diagrams can be useful e.g. N, or useless, e.g. Q.
(vi)
The following example shows that we cannot assign “heights” or “ranks”
to posets:
a
d
b
e
c
(vii) We can also have complicated structures:
a
b
c
d
e
(viii)
Or the empty poset (let
X
be any set and nothing is less than anything
else).
While there are many examples of posets, all we care about are actually
power sets and their subsets only.
Often, we want to study subsets of posets. For example, we might want to
know if a subset has a least element. All subsets are equal, but some subsets are
more equal than others. A particular interesting class of subsets is a chain.
Definition (Chain and antichain). In a poset, a subset
S
is a chain if it is
totally ordered, i.e. for all
x
,
y
,
x y
or
y x
. An antichain is a subset in
which no two things are related.
Example. In (N, |), 1, 2, 4, 8, 16, ··· is a chain.
In (v), {a, b, c} or {a, c} are chains.
R is a chain in R.
Definition (Upper bound and supremum). For
S X
, an upper bound for
S
is
an x X such that y S : x y.
x X
is a least upper bound, supremum or join of
S
, written
x
=
sup S
or
x
=
W
S
, if
x
is an upper bound for
S
, and for all
y X
, if
y
is an upper bound,
then y x.
Example.
(i) In R, {x : x <
2} has an upper bound 7, and has a supremum
2.
(ii)
In (v) above, consider
{a, b}
. Upper bounds are
b
and
c
. So
sup
=
b
.
However, {b, d} has no upper bound!
(iii) In (vii), {a, b} has upper bounds c, d, e, but has no least upper bound.
Definition (Complete poset). A poset
X
is complete if every
S X
has a
supremum. In particular, it has a greatest element (i.e.
x
such that
y
:
x y
),
namely sup X, and least element (i.e. x such that y : x y), namely sup .
It is very important to remember that this definition does not require that
the subset
S
is bounded above or non-empty. This is different from the definition
of metric space completeness.
Example.
R is not complete because R itself has no supremum.
[0
,
1] is complete because every subset is bounded above, and so has a least
upper bound. Also, has a supremum of 0.
(0, 1) is not complete because (0, 1) has no upper bound.
P
(
S
) for any
S
is always complete, because given any
{A
i
:
i A}
, where
each A
i
S,
S
A
i
is its supremum.
Now we are going to derive fixed-point theorems for complete posets. We
start with a few definitions:
Definition (Fixed point). A fixed point of a function
f
:
X X
is an
x
such
that f (x) = x.
Definition (Order-preserving function). For a poset
X
,
f
:
X X
is order-
preserving of x y f(x) f(y).
Example.
On N, x 7→ x + 1 is order-preserving
On Z, x 7→ x 1 is order-preserving
On (0
,
1),
x 7→
1+x
2
is order-preserving (this function halves the distance
from x to 1).
On P(S), let some fixed i S. Then A 7→ A {i} is order-preserving.
Not every order-preserving
f
has a fixed point (e.g. first two above). However,
we have
Theorem (Knaster-Tarski fixed point theorem). Let
X
be a complete poset,
and f : X X be a order-preserving function. Then f has a fixed point.
Proof.
To show that
f
(
x
) =
x
, we need
f
(
x
)
x
and
f
(
x
)
x
. Let’s not be
too greedy and just want half of it:
Let
E
=
{x
:
x f
(
x
)
}
. Let
s
=
sup E
. We claim that this is a fixed point,
by showing f(s) s and s f(s).
To show
s f
(
s
), we use the fact that
s
is the least upper bound. So if
we can show that
f
(
s
) is also an upper bound, then
s f
(
s
). Now let
x E
.
So
x s
. Therefore
f
(
x
)
f
(
s
) by order-preservingness. Since
x f
(
x
) (by
definition of E) x f (x) f(s). So f(s) is an upper bound.
To show
f
(
s
)
s
, we simply have to show
f
(
s
)
E
, since
s
is an upper
bound. But we already know
s f
(
s
). By order-preservingness,
f
(
s
)
f
(
f
(
s
)).
So f (s) E by definition.
While this proof looks rather straightforward, we need to first establish that
s f
(
s
), then use this fact to show
f
(
s
)
s
. If we decided to show
f
(
s
)
s
first, then we would fail!
The very typical application of Knaster-Tarski is the quick, magic proof of
Cantor-Shr¨oder-Bernstein theorem.
Corollary (Cantor-Schr¨oder-Bernstein theorem). Let
A, B
be sets. Let
f
:
A
B and g : B A be injections. Then there is a bijection h : A B.
Proof.
We try to partition
A
into
P
and
Q
, and
B
into
R
and
S
, such that
f(P ) = R and g(S) = Q. Then we let h = f on R and g
1
on Q.
Q
P
S
R
f
g
Since S = B \ R and Q = A \ P , so we want
P = A \ g(B \ f(P ))
Since the function
P 7→ A \ g
(
B \ f
(
P
)) from
P
(
A
) to
P
(
A
) is order-preserving
(and P(a) is complete), the result follows.
The next result we have is Zorn’s lemma. The main focus of Zorn’s lemma is
on maximal elements.
Definition (Maximal element). In a poset
X
,
x X
is maximal if no
y X
has y > x.
Caution! Under no circumstances confuse a maximal element with a maximum
element, except under confusing circumstances! A maximum element is defined
as an
x
such that all
y X
satisfies
y x
. These two notions are the same in
totally ordered sets, but are very different in posets.
Example. In the poset
a
b
c
d
e
c and e are maximal.
Not every poset has a maximal element, e.g.
N, Q, R
. In each of these, not
only are they incomplete. They have chains that are not bounded above.
Theorem (Zorn’s lemma). Assuming Axiom of Choice, let
X
be a (non-empty)
poset in which every chain has an upper bound. Then it has a maximal element.
Note that “non-empty” is not a strictly necessary condition, because if
X
is
an empty poset, then the empty chain has no upper bound. So the conditions
can never be satisfied.
The actual proof of Zorn’s lemma is rather simple, given what we’ve had so
far. We “hunt” for the maximal element. We start with
x
0
. If it is maximal,
done. If not, we find a bigger
x
1
. If
x
1
is maximal, done. Otherwise, keep go on.
If we never meet a maximal element, then we have an infinite chain. This
has an upper bound
x
ω
. If this is maximal, done. If not, find
x
ω+1
> x
ω
. Keep
going on.
We have not yet reached a contradiction. But suppose we never meet a
maximal element. If
X
is countable, and we can reach
x
ω
1
, then we have found
uncountably many elements in a countable set, which is clearly nonsense!
Since the ordinals can be arbitrarily large (Hartogs’ lemma), if we never
reach a maximal element, then we can get find more elements that X has.
Proof.
Suppose not. So for each
x X
, we have
x
X
with
x
> x
. We denote
the-element-larger-than-x by x
.
We know that each chain C has an upper bound, say u(C).
Let γ = γ(X), the ordinal-larger-than-X by Hartogs’ lemma.
We pick x X, and define x
α
for α < γ recursively by
x
0
= x
x
α
+
= x
α
x
λ
= u({x
α
: α < λ})
for non-zero limit λ
Of course, we have to show that
{x
α
:
α < λ}
is a chain. This is trivial by
induction.
Then α 7→ x
α
is an injection from γ X. Contradiction.
Note that we could as well have defined
x
λ
=
u
(
{x
α
:
α < λ}
), and we can
easily prove it is still an injection. However, we are lazy and put the “prime”
just to save a few lines of proof.
This proof was rather easy. However, this is only because we are given
ordinals, definition by recursion, and Hartogs’ lemma. Without these tools, it is
rather difficult to prove Zorn’s lemma.
A typical application of Zorn’s lemma is: Does every vector space have a
basis? Recall that a basis of
V
is a subset of
V
that is linearly independent (no
finite linear combination = 0) and spanning (ie every
x V
is a finite linear
combination from it).
Example.
Let V be the space of all real polynomials. A basis is {1, x, x
2
, x
3
, ···}.
Let
V
be the space of all real sequences. Let e
i
be the sequence with all
0 except 1 in the
i
th place. However,
{
e
i
}
is not a basis, since 1
,
1
, ···
cannot be written as a finite linear combination of them. In fact, there is
no countable basis (easy exercise). It turns out that there is no “explicit”
basis.
Take
R
as a vector space over
Q
. A basis here, if exists, is called a Hamel
basis.
Using Zorn’s lemma, we can prove that the answer is positive.
Theorem. Every vector space V has a basis.
Proof. We go for a maximal linearly independent subset.
Let
X
be the set of all linearly independent subsets of
V
, ordered by inclusion.
We want to find a maximal
B X
. Then
B
is a basis. Otherwise, if
B
does
not span
V
, choose
x ∈ span B
. Then
B {x}
is independent, contradicting
maximality.
So we have to find such a maximal
B
. By Zorn’s lemma, we simply have to
show that every chain has an upper bound.
Given a chain
{A
i
:
i I}
in
X
, a reasonable guess is to try the union. Let
A
=
S
A
i
. Then
A A
i
for all
i
, by definition. So it is enough to check that
A X, i.e. is linearly independent.
Suppose not. Say
λ
1
x
1
+
···
+
λ
n
x
n
= 0 for some
λ
1
···λ
n
scalars (not all
0). Suppose
x
1
A
i
1
, ···x
n
A
i
n
for some
i
1
, ···i
n
I
. Then there is some
A
i
m
that contains all
A
i
k
, since they form a finite chain. So
A
i
m
contains all
x
i
.
This contradicts the independence of A
i
m
.
Hence by Zorn’s lemma, X has a maximal element. Done.
Another application is the completeness theorem for propositional logic when
P , the primitives, can be uncountable.
Theorem (Model existence theorem (uncountable case)). Let
S L
(
P
) for any
set of primitive propositions P . Then if S is consistent, S has a model.
Proof.
We need a consistent
¯
S S
such that
t L
,
t
¯
S
or
¬t
¯
S
. Then we
have a valuation
v
(
t
) =
(
1 t
¯
S
0 t ∈
¯
S
, as in our original proof for the countable
case.
So we seek a maximal consistent
¯
S S
. If
¯
S
is maximal, then if
t ∈
¯
S
, then
we must have
¯
S {t}
inconsistent, i.e.
¯
S {t}
. By deduction theorem, this
means that
¯
S ¬t
. By maximality, we must have
¬t
¯
S
. So either
t
or
¬t
is in
¯
S.
Now we show that there is such a maximal
¯
S
. Let
X
=
{T L
:
T is consistent , T S}
. Then
X
=
since
S X
. We show that any
non-empty chain has an upper bound. An obvious choice is, again the union.
Let
{T
i
:
i I}
be a non-empty chain. Let
T
=
S
T
i
. Then
T T
i
for all
i
.
So to show that T is an upper bound, we have to show T X.
Certainly,
T S
, as any
T
i
contains
S
(and the chain is non-empty). So we
want to show
T
is consistent. Suppose
T
. So we have
t
1
, ··· , t
n
T
with
{t
1
, ··· , t
n
}
, since proofs are finite. Then some
T
k
contains all
t
i
since
T
i
are nested. So
T
k
is inconsistent. This is a contradiction. Therefore
T
must be
consistent.
Hence by Zorn’s lemma, there is a maximal element of X.
This proof is basically the same proof that every vector space has a basis! In
fact, most proofs involving Zorn’s lemma are similar.
3.2 Zorn’s lemma and axiom of choice
Recall that in the proof of Zorn’s, we picked
x
, then picked
x
, then picked
x
′′
,
ad infinitum. Here we are making arbitrary choices of
x
. In particular, we have
made infinitely many arbitrary choices.
We did the same in IA Numbers and Sets, when proving a countable union
of countable sets is countable, because we chose, for each
A
n
, a listing of
A
n
,
and then count them diagonally. We needed to make a choice because each
A
n
has a lot of possible listings, and we have to pick exactly one.
In terms of “rules for producing sets”, we are appealing to the axiom of choice,
which states that you can pick an element of each
A
i
whenever
{A
i
:
i I}
is a
family of non-empty sets. Formally,
Axiom (Axiom of choice). Given any family
{A
i
:
i I}
of non-empty sets,
there is a choice function f : i
S
A
i
such that f(i) A
i
.
This is of a different character from the other set-building rules (e.g. unions
and power sets exist). The difference is that the other rules are concrete. We
know exactly what
A B
is, and there is only one possible candidate for what
A B
might be. “Union” uniquely specifies what it produces. However, the
choice function is not.
{A
i
:
i I}
can have many choice functions, and the
axiom of choice does not give us a solid, explicit choice function. We say the
axiom of choice is non-constructive.
We are not saying that it’s wrong, but it’s weird. For this reason, it is often
of interest to ask “Did I use AC?” and “Do I need AC?”.
(It is important to note that the Axiom of Choice is needed only to make
infinite choices. It is trivially true if
|I|
= 1, since
A
=
by definition means
x A. We can also do it for two sets. Similarly, for |I| finite, we can do it by
induction. However, in general, AC is required to make infinite choices, i.e. it
cannot be deduced from the other axioms of set theory)
In the proof of Zorn’s we used Choice. However, do we need it? Is it possible
to prove it without Choice?
The answer is it is necessary, since we can deduce AC from Zorn’s. In other
words, we can write down a proof of AC from Zorn’s, using only the other
set-building rules.
Theorem. Zorn’s Lemma Axiom of choice.
As in the past uses of Zorn’s lemma, we have a big scary choice function to
produce. We know that we can do it for small cases, such as when
|I|
= 1. So we
start with small attempts and show that the maximal attempt is what we want.
Proof.
We have already proved that AC
Zorn. We now proved the other way
round.
Given a family
{A
i
:
i I}
of non-empty sets. We say a partial choice
function is a function
f
:
J
S
iI
A
i
(for some
J I
) such that
f
(
j
)
A
for
all j J.
Let
X
=
{
(
J, f
) :
f is a partial choice function with domain J}
. We order
by extension, i.e. (
J, f
)
(
J
, f
) iff
J J
and
f
agrees with
f
when both are
defined.
Given a chain
{
(
J
k
, f
k
) :
k K}
, we have an upper bound
(
S
J
k
,
S
f
k
)
, ie
the function obtained by combining all functions in the chain. So by Zorn’s, it
has a maximal element (J, f).
Suppose
J
=
I
. Then pick
i I \ J
. Then pick
x A
i
. Set
J
=
J {i}
and
f
=
f {
(
i, x
)
}
. Then this is greater than (
J, f
). This contradicts the
maximality of (
J, f
). So we must have
J
=
I
, i.e.
f
is a full choice function.
We have shown that Zorn’s lemma is equivalent to the Axiom of Choice.
There is a third statement that is also equivalent to both of these:
Theorem (Well-ordering theorem). Axiom of choice
every set
X
can be
well-ordered.
This might be very surprising at first for, say
X
=
R
, since there is no obvious
way we can well-order
R
. However, it is much less surprising given Hartogs’
lemma, since Hartogs’ lemma says that there is a (well-ordered) ordinal even
bigger than R. So well-ordering R shouldn’t be hard.
Proof.
The idea is to pick an element from
X
and call it the first; pick another
element and call it the second, and continue transfinitely until we pick everything.
For each
A X
with
A
=
X
, we let
y
A
be an element of
X \A
. Here we are
using Choice to pick out y
A
.
Define
x
α
recursively: Having defined
x
β
for all
β < α
, if
{x
β
:
β < α}
=
X
,
then stop. Otherwise, set
x
α
=
y
{x
β
:β}
, ie pick
x
α
to be something not yet
chosen.
We must stop at some time. Otherwise, we have injected
γ
(
X
) (ie the ordinal
larger than
X
) into
X
, which is a contradiction. So when stop, we have bijected
X
with an well-ordered set (i.e.
I
α
, where
α
is when you’ve stopped). Hence we
have well-ordered X.
Did we need AC? Yes, trivially.
Theorem. Well-ordering theorem Axiom of Choice.
Proof.
Given non-empty sets
{A
i
:
i I}
, well-order
S
A
i
. Then define
f
(
i
) to
be the least element of A
i
.
Our conclusion is:
Axiom of Choice Zorn’s lemma Well-ordering theorem.
Before we end, we need to do a small sanity check: we showed that these three
statements are equivalents using a lot of ordinal theory. Our proofs above make
sense only if we did not use AC when building our ordinal theory. Fortunately,
we did not, apart from the remark that
ω
1
is not a countable supremum which
used the fact that a countable union of countable sets is countable.
3.3 Bourbaki-Witt theorem*
Finally, we’ll quickly present a second (non-examinable) fixed-point theorem.
This time, we are concerned about chain-complete posets and inflationary
functions.
Definition (Chain-complete poset). We say a poset
X
is chain-complete if
X = and every non-empty chain has a supremum.
Example.
Every complete poset is chain-complete.
Any finite (non-empty) poset is chain complete, since every chain is finite
and has a greatest element.
{A V
:
A is linearly independent}
for any vector space
V
is chain-
complete, as shown in the proof that every vector space has a basis.
Definition (Inflationary function). A function
f
:
X X
is inflationary if
f(x) x for all x.
Theorem (Bourbaki-Witt theorem). If
X
is chain-complete and
f
:
X X
is
inflationary, then f has a fixed point.
This is follows instantly from Zorn’s, since
X
has a maximal element
x
, and
since
f
(
x
)
x
, we must have
f
(
x
) =
x
. However, we can prove Bourbaki-Witt
without choice. In the proof of Zorn’s, we had to “pick” and
x
1
> x
0
. Here, we
can simply let
x
0
f
7− x
1
f
7− x
2
f
7− x
3
···
Since each chain has a supremum instead of an upper bound, we also don’t need
Choice to pick our favorite upper bound of each chain.
Then we can do the same proof as Zorn’s to find a fixed point.
We can view this as the “AC-free” part of Zorn’s. It can be used to prove
Zorn’s lemma, but the proof is totally magic.
4 Predicate logic
In the first chapter, we studied propositional logic. However, it isn’t sufficient
for most mathematics we do.
In, say, group theory, we have a set of objects, operations and constants. For
example, in group theory, we have the operations multiplication
m
:
A
2
A
,
inverse
i
:
A A
, and a constant
e A
. For each of these, we assign a number
known as the arity, which specifies how many inputs each operation takes. For
example, multiplication has arity 2, inverse has arity 1 and
e
has arity 0 (we can
view e as a function A
0
A, that takes no inputs and gives a single output).
The study of these objects is known as predicate logic. Compared to proposi-
tional logic, we have a much richer language, which includes all the operations
and possibly relations. For example, with group theory, we have
m, i, e
in our
language, as well as things like
,
etc. Note that unlike propositional logic,
different theories give rise to different languages.
Instead of a valuation, now we have a structure, which is a solid object plus
the operations and relations required. For example, a structure of group theory
will be an actual concrete group with the group operations.
Similar to what we did in propositional logic, we will take
S |
=
t
to mean
“for any structure in which
S
is true,
t
is true”. For example, “Axioms of group
theory”
|
=
m
(
e, e
) =
e
, i.e. in any set that satisfies the group axioms,
m
(
e, e
) =
e
.
We also have S t meaning we can prove t from S.
4.1 Language of predicate logic
We start with the definition of the language. This is substantially more compli-
cated than what we’ve got in propositional logic.
Definition (Language). Let (function symbols) and Π (relation symbols) be
disjoint sets, and α : Π N a function (’arity’).
The language L = L(Ω, Π, α) is the set of formulae, defined as follows:
Variables: we have some variables
x
1
, x
2
, ···
. Sometimes (i.e. always), we
write x, y, z, ··· instead.
Terms: these are defined inductively by
(i) Every variable is a term
(ii)
If
f
Ω,
α
(
f
) =
n
, and
t
1
, ··· , t
n
are terms, then
ft
1
···t
n
is a term.
We often write f (t
1
, ··· , t
n
) instead.
Example. In the language of groups =
{m, i, e}
, Π =
, and
α
(
m
) =
2, α(i) = 1, α(e) = 0. Then e, x
1
, m(x
1
, x
2
), i(m(x
1
, x
1
)) are terms.
Atomic formulae: there are three sorts:
(i)
(ii) (s = t) for any terms s, t.
(iii) (ϕt
1
···t
n
) for any ϕ Π with α(ϕ) = n and t
1
, ··· , t
n
terms.
Example. In the language of posets, =
, Π =
{≤}
and
α
(
) = 2.
Then (x
1
= x
1
), x
1
x
2
(really means ( x
1
x
2
)) are atomic formulae.
Formulae: defined inductively by
(i) Atomic formulae are formulae
(ii) (p q) is a formula for any formulae p, q.
(iii) (x)p is a formula for any formula p and variable x.
Example. In the language of groups,
e
=
e
,
x
1
=
e
,
m
(
e, e
) =
e
,
(
x
)
m
(
x, i
(
x
)) =
e
, (
x
)(
m
(
x, x
) =
e
(
y
)(
m
(
y, y
) =
x
)) are formu-
lae.
It is important to note that a formula is a string of meaningless symbol. It
doesn’t make sense to ask whether it is true or false. In particular, the function
and relation symbols are not assigned any meaning. The only thing the language
cares is the arity of the symbol.
Again, we have the usual abbreviations
¬p
,
p q
,
p q
etc. Also, we have
(x)p for ¬(x)(¬p).
Definition (Closed term). A term is closed if it has no variables.
Example. In the language of groups,
e, m
(
e, e
) are closed terms. However,
m
(
x, i
(
x
)) is not closed even though we think it is always
e
. Apart from the
fact that it is by definition not closed (it has a variable
x
), we do not have the
groups axioms stating that m(x, i(x)) = e.
Definition (Free and bound variables). An occurrence of a variable
x
in a
formula
p
is bound if it is inside brackets of a (
x
) quantifier. It is free otherwise.
Example. In (x)(m(x, x) = e), x is a bound variable.
In (y)(m(y, y) = x m(x, y) = m(y, x)), y is bound while x is free.
We are technically allowed to have a formula with
x
both bound and free,
but DO NOT DO IT. For example,
m
(
x, x
) =
e
(
x
)(
y
)(
m
(
x, y
) =
m
(
y, x
))
is a valid formula (first two x are free, while the others are bound).
Definition (Sentence). A sentence is a formula with no free variables.
Example.
m
(
e, e
) =
e
and (
x
)(
m
(
x, x
) =
e
) are sentences, while
m
(
x, i
(
x
)) =
e
is not.
Definition (Substitution). For a formula
p
, a variable
x
and a term
t
, the
substitution p[t/x] is obtained by replacing each free occurrence of x with t.
Example. If
p
is the statement (
y
)(
m
(
y, y
) =
x
), then
p
[
e/x
] is (
y
)(
m
(
y, y
) =
e.
4.2 Semantic entailment
In propositional logic, we can’t say whether a proposition is true or false unless
we have a valuation. What would be a “valuation” in the case of predicate logic?
It is a set with the operations of the right arity. We call this a structure.
For example, in the language of groups, a structure is a set
G
with the correct
operations. Note that it does not have to satisfy the group axioms in order to
qualify as a structure.
Definition (Structure). An
L
-structure is a non-empty set
A
with a function
f
A
:
A
n
A
for each
f
, α
(
f
) =
n
, and a relation
ϕ
A
A
n
, for each
ϕ
Π,
α(ϕ) = n.
Note that we explicitly forbid
A
from being empty. It is possible to formulate
predicate logic that allows empty structures, but we will have to make many
exceptions when defining our axioms, as we will see later. Since empty structures
are mostly uninteresting (and don’t exist if there is at least one constant), it
isn’t a huge problem if we ignore it. (There is a small caveat here we are
working with single-sorted logic here, so everything is of the same “type”. If
we want to work with multi-sorted logic, where there can be things of different
types, it would then be interesting to consider the case where some of the types
could be empty).
Example. In the language of posets
A
, a structure is a set
A
with a relation
A
A × A.
In the language of groups, a structure is a set
A
with functions
m
A
:
A×A
A, i
A
: A A and e
A
A.
Again, these need not be genuine posets/groups since we do not have the
axioms yet.
Now we want to define
p
holds in
A
for a sentence
p L
and a
L
-structure
A.
For example, we want (
x
)(
m
(
x, x
) =
e
) to be true in
A
iff for each
a A
,
we have
m
A
(
a, a
) =
e
A
. So to translate
p
into something about
A
, you “add
subscript
A
to each function-symbol and relation-symbol, insert
A
after the
quantifiers, and say it aloud”. We call this the interpretation of the sentence p.
This is not great as a definition. So we define it formally, and then quickly
forget about it.
Definition (Interpretation). To define the interpretation
p
A
0, 1
for each
sentence p and L-structure A, we define inductively:
(i) Closed terms: define t
A
A for each closed term t by
(ft
1
, ··· , t
n
)
A
= f
A
(t
1
A
, t
2
A
··· , t
n
A
)
for any f Ω, α(f) = n, and closed terms t
1
, ··· , t
n
.
Example. (m(m(e, e), e))
A
= m
A
(m
A
(e
A
, e
A
), e
A
).
(ii) Atomic formulae:
A
= 0
(s = t)
A
=
(
1 s
A
= t
A
0 s
A
= t
A
(ϕt
1
···t
n
)
A
=
(
1 (t
1
A
, ··· , t
n
A
) ϕ
A
0 otherwise
.
(iii) Sentences:
(p q)
A
=
(
0 p
A
= 1, q
A
= 0
1 otherwise
((x)p)
A
=
(
1 pa/x]
¯
A
for all a A
0 otherwise
where for any
a A
, we define a new language
L
by adding a constant
¯a
and make A into an L
structure
¯
A by setting ¯a
¯
A
= a.
Now that we have formally defined truth, just forget about it!
Note that we have only defined the interpretation only for sentences. We
can also define it for functions with free variables. For any formula
p
with
n
free
variables, we can define the interpretation as the set of all things that satisfy
p
.
For example, if p is (y)(m(y, y) = a), then
p
A
= {a A : b A such that m(b, b) = a}.
However, we are mostly interested in sentences, and don’t have to worry about
these.
Now we can define models and entailment as in propositional logic.
Definition (Theory). A theory is a set of sentences.
Definition (Model). If a sentence
p
has
p
A
= 1, we say that
p
holds in
A
, or
p
is true in A, or A is a model of p.
For a theory S, a model of S is a structure that is a model for each s S.
Definition (Semantic entailment). For a theory
S
and a sentence
t
,
S
entails
t
,
written as S |= t, if every model of S is a model of t.
“Whenever S is true, t is also true”.
Definition (Tautology).
t
is a tautology, written
|
=
t
, if
|
=
t
, i.e. it is true
everywhere.
Example. (x)(x = x) is a tautology.
Example.
(i) Groups:
The language
L
is = (
m, i, e
) and Π =
, with arities 2, 1, 0
respectively.
Let T be
{(x)(y)(z)m(x, m(y, z)) = m(m(x, y), z),
(x)(m(x, e) = x m(e, x) = x),
(x)(m(x, i(x)) = e m(i(x), x) = e)}.
Then an
L
-structure
A
is a model for
T
iff
A
is a group. We say
T
axiomatizes the theory of groups/class of groups. Sometimes we call the
members of T the axioms of T .
Note that we could use a different language and theory to axiomatize group
theory. For example, we can have = (
m, e
) and change the last axiom to
(x)(y)(m(x, y) = e m(y, x) = e)}.
(ii) Fields:
The language
L
is = (+
, ×, ,
0
,
1) and Π =
, with arities 2, 2, 1,
0, 0.
The theory T consists of:
Abelian group under +
× is commutative, associative, and distributes over +
¬(0 = 1)
(x)((¬(x = 0)) (y)(y × x = 1).
Then an
L
-structure is a model of
T
iff it is a field. Then we have
T |= ”inverses are unique”, i.e.
T |= (x)
¬(x = 0)
(y)(z)
(xy = 1 xz = 1) (y = z)
(iii) Posets:
The language is = , and Π = {≤} with arity 2.
The theory T is
{(x)(x x),
(x)(y)(z)
(x y) (y z) x z
,
(x)(y)
x y y z x = y
}
Then T axiomatizes the theory of posets.
(iv) Graphs:
The language
L
is =
and Π =
{a}
with arity 2. This relation is
“adjacent to”. So a(x, y) means there is an edge between x and y.
The theory is
{(x)(¬a(x, x)),
(x)(y)(a(x, y) a(y, x)}
Predicate logic is also called “first-order logic”. It is “first-order” because our
quantifiers range over elements of the structure only, and not subsets. It would
be difficult (and in fact impossible) to axiomatize, say, a complete ordered field,
since the definition requires says every bounded subset has a least upper bound.
4.3 Syntactic implication
Again, to define syntactic implication, we need axioms and deduction rules.
Definition (Axioms of predicate logic). The axioms of predicate logic consists
of the 3 usual axioms, 2 to explain how = works, and 2 to explain how
works.
They are
1. p (q p) for any formulae p, q.
2. [p (q r)] [(p q) (p r)] for any formulae p, q, r.
3. (¬¬p p) for any formula p.
4. (x)(x = x) for any variable x.
5.
(
x
)(
y
)
(
x
=
y
)
(
p p
[
y/x
])
for any variable
x, y
and formula
p
,
with y not occurring bound in p.
6.
[(
x
)
p
]
p
[
t/x
] for any formula
p
, variable
x
, term
t
with no free variable
of t occurring bound in p.
7.
[(
x
)(
p q
)]
[
p
(
x
)
q
] for any formulae
p, q
with variable
x
not
occurring free in p.
The deduction rules are
1. Modus ponens: From p and p q, we can deduce q.
2.
Generalization: From
r
, we can deduce (
x
)
r
, provided that no premise
used in the proof so far had x as a free variable.
Again, we can define proofs, theorems etc.
Definition (Proof). A proof of
p
from
S
is a sequence of statements, in which
each statement is either an axiom, a statement in
S
, or obtained via modus
ponens or generalization.
Definition (Syntactic implication). If there exists a proof a formula
p
form a
set of formulae S, we write S p S proves t”.
Definition (Theorem). If
S p
, we say
p
is a theorem of
S
. (e.g. a theorem of
group theory)
Note that these definitions are exactly the same as those we had in propo-
sitional logic. The only thing that changed is the set of axioms and deduction
rules.
Example. {x = y, x = z} y = z.
We go for x = z giving y = z using Axiom 5.
1. (x)(y)((x = y) (x = z y = z)) Axiom 5
2. [(x)(y)((x = y) (x = z y = z))] (y)(x = y y = z) Axiom 6
3. (y)((x = y) x = z y = z) MP on 1, 2
4. [(y)((x = y) x = z y = z)] [(x = y) (x = z y = z) Axiom 6
5. (x = y) [(x = z) (y = z)] MP on 3, 4
6. x = y Premise
7. (x = z) (y = z) MP 6, 7
8. x = z Premise
9. y = z MP on 7, 8
Note that in the first 5 rows, we are merely doing tricks to get rid of the
signs.
We can now revisit why we forbid
from being a structure. If we allowed
,
then (
x
)
holds in
. However, axioms 6 states that ((
x
)
)
. So we can
deduce
in the empty structure! To fix this, we will have to add some weird
clauses to our axioms, or simply forbid the empty structure!
Now we will prove the theorems we had for propositional logic.
Proposition (Deduction theorem). Let
S L
, and
p, q L
. Then
S {p} q
if and only if S p q.
Proof.
The proof is exactly the same as the one for propositional logic, except
in the case, we have to check Gen.
Suppose we have lines
r
(x)r Gen
and we have a proof of
S p r
(by induction). We want to seek a proof of
p (x)r from S.
We know that no premise used in the proof of
r
from
S {p}
had
x
as a free
variable, as required by the conditions of the use of Gen. Hence no premise used
in the proof of p r from S had x as a free variable.
Hence S (x)(p r).
If x is not free in p, then we get S p (x)r by Axiom 7 (and MP).
If
x
is free in
p
, then we did not use premise
p
in our proof
r
from
S {p}
(by the conditions of the use of Gen). So
S r
, and hence
S
(
x
)
r
by Gen.
So S p (x)r.
Now we want to show
S p
iff
S |
=
p
. For example, a sentence that holds in
all groups should be deducible from the axioms of group theory.
Proposition (Soundness theorem). Let
S
be a set of sentences,
p
a sentence.
Then S p implies S |= p.
Proof.
(non-examinable) We have a proof of
p
from
S
, and want to show that
for every model of S, p holds.
This is an easy induction on the lines of the proof, since our axioms are
tautologies and our rules of deduction are sane.
The hard part is proving
S |= p S p.
This is, by the deduction theorem,
S p} |= S p} .
This is equivalent to the contrapositive:
S p} ⊢ S p} |= .
Theorem (Model existence lemma). Let
S
be a consistent set of sentences.
Then S has a model.
We need several ideas to prove the lemma:
(i)
We need to find a structure. Where can we start from? The only thing we
have is the language. So we start form the language. Let
A
= set of all
closed terms, with the obvious operations.
For example, in the theory of fields, we have “1 + 1”, “0 + 1“ etc in the
structure. Then (1 + 1) +
A
(0 + 1) = (1 + 1) + (0 + 1).
(ii)
However, we have a problem. In, say, the language of fields, and
S
our
field axioms, our
A
has distinct elements “1 + 0”, “0 + 1”, “0 + 1 + 0”
etc. However,
S
1 + 0 = 0 + 1 etc. So we can’t have them as distinct
elements. The solution is to quotient out by the equivalence relation
s t
if
S
(
s
=
t
), i.e. our structure is the set of equivalence classes. It is
trivial check to check that the +,
×
operations are well-defined for the
equivalence classes.
(iii)
We have the next problem: If
S
is ”field of characteristic 2 or 3“, i.e.
S
has a field axiom plus 1 + 1 = 0
1 + 1 + 1 = 0. Then
S ⊢
1 + 1 = 0. Also
S ⊢
1 + 1 + 1 = 0. So [1 + 1]
= [0], and [1 + 1 + 1]
= [0]. But then our
S
has neither characteristic 2 or 3.
This is similar to the problem we had in the propositional logic case, where
we didn’t know what to do with
p
3
if
S
only talks about
p
1
and
p
2
. So we
first extend S to a maximal consistent (or complete)
¯
S.
(iv)
Next problem: Let
S
= “fields with a square root of 2”, i.e.
S
is the
field axioms plus (
x
)(
xx
= 1 + 1). However, there is no closed term
t
which is equivalent to
2
. We say we lack witnesses to the statement
(
x
)(
xx
= 1 + 1). So we add a witness. We add a constant
c
to the
language, and add the axiom
cc
= 1 + 1” to
S
. We do this for each such
existential statement.
(v)
Now what? We have added new symbols to
S
, so our new
S
is no longer
complete! Of course, we go back to (iii), and take the completion again.
Then we have new existential statements to take care of, and we do (iv)
again. Then we’re back to (iii) again! It won’t terminate!
So we keep on going, and finally take the union of all stages.
Proof.
(non-examinable) Suppose we have a consistent
S
in the language
L
=
L
(Ω
,
Π). Extend
S
to a consistent
S
1
such that
p S
1
or (
¬p
)
S
for each
sentence
p L
(by applying Zorn’s lemma to get a maximal consistent
S
1
). In
particular, S
1
is complete, meaning S
1
p or S
1
¬p for all p.
Then for each sentence of the form (
x
)
p
in
S
1
, add a new constant
c
to
L
and add
p
[
c/x
] to
S
1
obtaining
T
1
in language
L
1
=
L
(Ω
C
1
,
Π). It is easy
to check that T
1
is consistent.
Extend
T
1
to a complete theory
S
2
L
1
, and add witnesses to form
T
2
L
2
= L(Ω C
1
C
2
, Π). Continue inductively.
Let
¯
S
=
S
1
S
2
···
in language
¯
L
=
L
1
L
2
···
(i.e.
¯
L
=
L
(Ω
C
1
C
2
··· , Π)).
Claim.
¯
S
is consistent, complete, and has witnesses, i.e. if (
x
)
p
¯
S
, then
p[t/x]
¯
S For some term t.
It is consistent since if
¯
S
, then some
S
n
since proofs are finite. But
all S
n
are consistent. So
¯
S is consistent.
To show completeness, for sentence
p
¯
L
, we have
p L
n
for some
n
, as
p
has only finitely many symbols. So
S
n+1
p
or
S
n+1
¬p
. Hence
¯
S p
or
¯
S ¬p.
To show existence of witnesses, if (
x
)
p
¯
S
, then (
x
)
p S
n
for some
n
.
Hence (by construction of T
n
), we have p[c/x] T
n
for some constant c.
Now define an equivalence relation
on closed term of
¯
L
by
s t
if
¯
S
(
s
=
t
). It is easy to check that this is indeed an equivalence relation. Let
A
be the set of equivalence classes. Define
(i) f
A
([t
1
], ··· , [t
n
]) = [ft
1
, ··· , t
n
] for each formula f Ω, α(f) = n.
(ii) ϕ
A
=
{
([
t
1
]
, ··· ,
[
t
n
]) :
¯
S ϕ
(
t
1
, ··· , t
n
)
}
for each relation
ϕ
Π and
α(ϕ) = n.
It is easy to check that this is well-defined.
Claim. For each sentence
p
,
¯
S p
(i.e.
p
¯
S
) if and only if
p
holds in
A
, i.e.
p
A
= 1.
We prove this by an easy induction.
Atomic sentences:
:
¯
S ⊢ , and
A
= 0. So good.
s
=
t
:
¯
S s
=
t
iff [
s
] = [
t
] (by definition) iff
s
A
=
t
A
(by definition
of s
A
) iff (s = t)
A
. So done.
ϕt
1
, ··· , t
n
is the same.
Induction step:
p q
:
¯
S
(
p q
) iff
¯
S
(
¬p
) or
¯
S q
(justification: if
¯
S ⊢ ¬p
and
¯
S ⊢ q
, then
¯
S p
and
¯
S ¬q
by completeness, hence
¯
S ¬
(
p q
),
contradiction). This is true iff p
A
= 0 or q
A
= 1 iff (p q)
A
= 1.
(
x
)
p
:
¯
S
(
x
)
p
iff
¯
S p
[
t/x
] for some closed term
t
. This is true
since
¯
S
has witnesses. Now this holds iff
p
[
t/x
]
A
= 1 for some closed
term
t
(by induction). This is the same as saying (
x
)
p
holds in
A
,
because A is the set of (equivalence classes of) closed terms.
Here it is convenient to pretend
is the primitive symbol instead of
.
Then we can define (
x
)
p
to be
¬
(
x
)
¬p
, instead of the other way round.
It is clear that the two approaches are equivalent, but using
as primitive
makes the proof look clearer here.
Hence A is a model of
¯
S. Hence it is also a model of S. So S has a model.
Again, if
L
is countable (i.e.
,
Π are countable), then Zorn’s Lemma is not
needed.
From the Model Existence lemma, we obtain:
Corollary (Adequacy theorem). Let
S
be a theory, and
p
a sentence. Then
S |= p implies S p.
Theorem (G¨odel’s completeness theorem (for first order logic)). Let
S
be a
theory, p a sentence. Then S p if and only if S |= p.
Proof. () Soundness, () Adequacy.
Corollary (Compactness theorem). Let
S
be a theory such that every finite
subset of S has a model. Then so does S.
Proof.
Trivial if we replace “has a model” with “is consistent”, because proofs
are finite.
We can look at some applications of this:
Can we axiomatize the theory of finite groups (in the language of groups)?
i.e. is there a set of sentences T such that models of T are finite groups.
Corollary. The theory of finite groups cannot be axiomatized (in the language
of groups).
It is extraordinary that we can prove this, as opposed to just “believing it
should be true”.
Proof.
Suppose theory
T
has models all finite groups and nothing else. Let
T
be T together with
(x
1
)(x
2
)(x
1
= x
2
) (intuitively, |G| 2)
(x
1
)(x
2
)(x
3
)(x
1
= x
2
= x
3
) (intuitively, |G| 3)
···
Then
T
has no model, since each model has to be simultaneously arbitrarily
large and finite. But every finite subset of
T
does have a model (e.g.
Z
n
for
some n). Contradiction.
This proof looks rather simple, but it is not “easy” in any sense. We are
using the full power of completeness (via compactness), and this is not easy to
prove!
Corollary. Let
S
be a theory with arbitrarily large models. Then
S
has an
infinite model.
“Finiteness is not a first-order property”
Proof. Same as above.
Similarly, we have
Corollary (Upward owenheim-Skolem theorem). Let
S
be a theory with an
infinite model. Then S has an uncountable model.
Proof. Add constants {c
i
: i I} to L for some uncountable I.
Let T = S
S
{c
i
= c
j
: i, j I, i = j}.
Then any finite
T
T
has a model, since it can only mention finitely many
of the
C
i
. So any infinite model of
S
will do. Hence by compactness,
T
has a
model
Similarly, we have a model for
S
that does not inject into
X
, for any chosen
set X. For example, we can add γ(X) constants, or P(X) constants.
Example. There exists an infinite field (
Q
). So there exists an uncountable
field (e.g. R). Also, there is a field that does not inject into P(P(R)), say,
Theorem (Downward owenheim-Skolem theorem). Let
L
be a countable
language (i.e. and Π are countable). Then if
S
has a model, then it has a
countable model.
Proof.
The model constructed in the proof of model existence theorem is count-
able.
Note that the proof of the model existence theorem is non-examinable, but
the proof of this is examinable! So we are supposed to magically know that the
model constructed in the proof is countable without knowing what the proof
itself is!
4.4 Peano Arithmetic
As an example, we will make the usual axioms for
N
into a first-order theory.
We take =
{
0
, s,
+
, ×}
, and Π =
. The arities are
α
(0) = 0
, α
(
s
) = 1
, α
(+) =
α(×) = 2.
The operation
s
is the successor operation, which should be thought of as
“+1”.
Our axioms are
Definition (Peano’s axioms). The axioms of Peano’s arithmetic (PA) are
(i) (x)¬(s(x) = 0).
(ii) (x)(y)((s(x) = s(y)) (x = y)).
(iii) (y
1
) ···(y
n
)
[
p
[0
/x
]
(
x
)(
p p
[
s
(
x
)
/x
])]
(
x
)
p
. This is actually
infinitely many axioms one for each formula
p
, free variables
y
1
, ··· , y
n
, x
,
i.e. it is an axiom scheme.
(iv) (x)(x + 0 = x).
(v) (x)(y)(x + s(y) = s(x + y)).
(vi) (x)(x × 0 = 0).
(vii) (x)(y)(x × s(y) = (x + y) + x).
Note that our third axiom looks rather funny with all the (
y
i
) in front. Our
first guess at writing it would be
[p[0/x] (x)(p p[s(x)/x])] (x)p.
However, this is in fact not sufficient. Suppose we want to prove that for all
x
and
y
,
x
+
y
=
y
+
x
. The natural thing to do would be to fix a
y
and induct
on
x
(or the other way round). We want to be able to fix any
y
to do so. So
we need a (
y
) in front of our induction axiom, so that we can prove it for all
values of
y
all at once, instead of proving it once for
y
= 0, once for
y
= 1 , once
for
y
= 1 + 1 etc. This is important, since we might have an uncountable model
of PA, and we cannot name all
y
. When we actually think about it, we can just
forget about the (y
i
)s. But just remember that formally we need them.
We know that PA has a model
N
that is infinite. So it has an uncountable
model by Upward owenheim-Skolem. Then clearly this model is not isomorphic
to
N
. However, we are also told that the axioms of arithmetic characterize
N
completely. Why is this so?
This is since Axiom 3 is not full induction, but a “first-order” version. The
proper induction axiom talks about subsets of N, i.e.
(S N)((0 S x S s(x) S) S = N).
However, there are uncountably many subsets of
N
, and countably many formulae
p. So our Axiom 3 only talks about some subsets of N, not all.
Now the important question is: is PA complete?
odel’s incompleteness theorem says: no! There exists some
p
with PA
⊢ p
and PA ⊢ ¬p.
Hence, there is a p that is true in N, but PA ⊢ p.
Note that this does not contradict odel’s completeness theorem. The
completeness theorem tells us that if
p
holds in every model of PA (not just in
N), then P A p.
4.5 Completeness and categoricity*
We now study some “completeness” property of theories. For convenience, we
will assume that the language is countable.
Definition (Complete theory). A theory
T
is complete if for all propositions
p
in the language, either T p or T ¬p.
Complete theories are rather hard to find. So for the moment, we will content
ourselves by just looking at theories that are not complete.
Example. The theory of groups is not complete, since the proposition
(g)(h)(gh = hg)
can neither be proven or disproven (there are abelian groups and non-abelian
groups).
Another “completeness” notion we might be interested in is categoricity. We
will need to use the notion of a cardinal, which will be introduced later in the
course. Roughly, a cardinal is a mathematical object that denotes the sizes of
sets, and a set “has cardinality κ if it bijects with κ.
Definition (
κ
-categorical). Let
κ
be an infinite cardinal. Then a theory
T
is
κ
-categorical if there is a unique model of the theory of cardinality
κ
up to
isomorphism.
Here the notion of “isomorphism” is the obvious one a homomorphism of
models is a function between the structures that preserves everything, and an
isomorphism is a homomorphism with an inverse (that is also a homomorphism).
Example (Pure identity theory). The pure identity theory has an empty language
and no axioms. Then this theory is
κ
-categorical for any
κ
, since if two models
have the same cardinality, then by definition there is a bijection between them,
and any such bijection is an isomorphism because there are no properties to be
preserve.
One nice thing about categorical theories is that they are complete!
Proposition. Let
T
be a theory that is
κ
categorical for some
κ
, and suppose
T has no finite models. Then T is complete.
Note that the requirement that
T
has no finite models is necessary. For
example, pure identity theory is
κ
-categorical for all
κ
but is not complete, since
the statement (x)(y)(¬(x = y)) is neither provable nor disprovable.
Proof.
Let
p
be a proposition. Suppose
T ⊢ p
and
T ⊢ ¬p
. Then there are
infinite models of
T {p}
and
T p}
(since the models cannot be finite), and
so by the owenhein–Skolem theorems, we can find such models of cardinality
κ
. But since one satisfies
p
and the other does not, they cannot be isomorphic.
This contradicts κ-categoricity.
We are now going to use this idea to prove the Ax-Grothendieck theorem
Theorem (Ax-Grothendieck theorem). Let
f
:
C
n
C
n
be a complex polyno-
mial. If f is injective, then it is in fact a bijection.
We will use the following result from field theory without proof:
Lemma. Any two uncountable algebraically closed fields with the same di-
mension and same characteristic are isomorphic. In other words, the theory of
algebraically closed fields of characteristic
p
(for
p
a prime or 0) is
κ
-categorical
for all uncountable cardinals κ, and in particular complete.
The rough idea (for the field theorists) is that an algebraically closed field is
uniquely determined by its transcendence degree over the base field Q or F
p
.
In the following proof, we will also use some (very) elementary results about
field theory that can be found in IID Galois Theory.
Proof of Ax-Grothendieck.
We will use compactness and completeness to show
that we only have to prove this for fields of positive characteristic, and the result
can be easily proven since we end up dealing with finite fields.
Let
ACF
be the theory of algebraically closed fields. The language is the
language of rings, and the axioms are the usual axioms of a field, plus the
following axiom for each n > 0:
(a
0
, a
1
, ··· , a
n1
)(x)(x
n
+ a
n1
x
n1
+ ··· + a
1
x + a
0
= 0).
Let
ACF
0
denote the theory of algebraically closed fields of characteristic 0,
where we add the axiom
1 + 1 + ··· + 1
| {z }
n times
= 0 ()
for all n to ACF
n
.
Let
ACF
p
denote the theory of algebraically closed fields of characteristic
p
,
where we add the axiom
1 + 1 + ··· + 1
| {z }
p times
= 0
to ACF.
We now notice the following fact: if
r
is a proposition that is a theorem of
ACF
p
for all
p
, then it is true of
ACF
0
. Indeed, we know that
ACF
0
is complete.
So if
r
is not a theorem in
ACF
0
, then
¬r
is a theorem. But the proof is finite,
so it can only use finitely many instances of (
). So there is some large
p
where
¬r can be proven in ACF
p
, which is a contradiction.
Now the statement “If
f
is a polynomial of degree
d
and
f
is injective, then
f
is surjective” can be expressed as a first-order statement. So we just have to
prove it for all fields of characteristic
p >
0. Moreover, by completeness, for each
p
, we only need to prove it for some algebraically complete field of characteristic
p.
Fix a prime
p
, and consider
F
=
¯
F
p
, the algebraic closure of
F
p
. This is an
algebraically closed field with the property that every element is algebraic over
F
p
, i.e. the field generated by any finite subset of elements is finite.
Let
f
:
F
n
F
n
be a polynomial function involving coefficients
a
1
, ··· , a
K
.
Let
b
= (
b
1
, ··· , b
n
)
F
n
be a point. Then
F
restricts to a function from the
field
˜
F
generated by
{b
1
, ··· , b
n
, a
1
, ··· , a
K
}
to itself. But
˜
F
is finite, so any
function
f|
˜
F
:
˜
F
˜
F
that is injective must also be surjective. So
b
is in the
image of f. So f is surjective. So done.
We conclude the section by stating, without proof, a theorem by Morely:
Theorem (Morley’s categoricity theorem). Let
T
be a theory with a countable
language. If
T
is
κ
-categorical for some uncountable cardinal
κ
, then it is
µ-categorical for all uncountable cardinals µ.
Hence we often just say a theory is uncountably categorical when the theory
is categorical for some (hence all) uncountable cardinals.
5 Set theory
Here we’ll axiomatize set theory as “just another first-order theory”, with
signatures, structures etc. There are many possible formulations, but the most
common one is Zermelo Fraenkel set theory (with the axiom of choice), which is
what we will study.
5.1 Axioms of set theory
Definition (Zermelo-Fraenkel set theory). Zermelo-Fraenkel set theory (ZF)
has language = , Π = {∈}, with arity 2.
Then a “universe of sets” will simply mean a model of these axioms, a pair
(
V,
), where
V
is a set and
is a binary relation on
V
in which the axioms are
true (officially, we should write
V
, but it’s so weird that we don’t do it). Each
model (
V,
) will (hopefully) contain a copy of all of maths, and so will look
very complicated!
ZF will have 2 axioms to get started with, 4 to build things, and 3 more
weird ones one might not realize are needed.
Axiom (Axiom of extension). “If two sets have the same elements, they are the
same set”.
(x)(y)((z)(z x z y) x = y).
We could replace the
with an
, but the converse statement
x
=
y
(
z x z y
) is an instance of a logical axiom, and we don’t have to explicitly
state it.
Axiom (Axiom of separation). “Can form subsets of sets”. More precisely, for
any set x and a formula p, we can form {z x : p(z)}.
(t
1
) ···(t
n
)(x)(y)(z)(z y (z x p)).
This is an axiom scheme, with one instance for each formula
p
with free variables
t
1
, ··· , t
n
, z.
Note again that we have those funny (
t
i
). We do need them to form, e.g.
{z x : t z}, where t is a parameter.
This is sometimes known as Axiom of comprehension.
Axiom (Axiom of empty set). “The empty-set exists”
(x)(y)(y ∈ x).
We write
for the (unique, by extension) set with no members. This is an
abbreviation:
p
(
) means (
x
)(
x has no members p
(
x
)). Similarly, we tend to
write {z x : p(z)} for the set given by separation.
Axiom (Axiom of pair set). “Can form {x, y}”.
(x)(y)(z)(t)(t z (t = x t = y)).
We write {x, y} for this set. We write {x} for {x, x}.
We can now define ordered pairs:
Definition (Ordered pair). An ordered pair (x, y) is {{x}, {x, y}}.
We define x is an ordered pair” to mean (y)(z)(x = (y, z)).
We can show that (a, b) = (c, d) (a = c) (b = d).
Definition (Function). We define f is a function to mean
(x)(x f x is an ordered pair)
(x)(y)(z)[(x, y) f (x, z) f ] y = z.
We define
x
=
dom f
to mean
f
is a function and (
y
)(
y x
(
z
)((
y, z
)
f
)).
We define f : x y to mean f is a function and dom f = x and
(z)[(t)((t, z) f) z y].
Once we’ve defined them formally, let’s totally forget about the definition
and move on with life.
Axiom (Axiom of union). “We can form unions” Intuitively, we have
a b c
=
{x
:
x a or x b or x c}
. but instead of
a b c
, we write
S
{a, b, c}
so that
we can express infinite unions as well.
(x)(y)(z)(z y (t)(t x z t)).
We tend to write
S
x for the set given above. We also write x y for
S
{x, y}.
Note that we can define intersection
T
x
as a subset of
y
(for any
y x
) by
separation, so we don’t need an axiom for that.
Axiom (Axiom of power set). “Can form power sets”.
(x)(y)(z)(z y z x),
where z x means (t)(t z t x).
We tend to write P(x) for the set generated above.
We can now form
x × y
, as a subset of
P
(
P
(
x y
)), because for
t x, s y
,
we have (t, s) = {{t}, {t, s}} P(P(x y)).
Similarly, we can form the set of all functions from
x
to
y
as a subset of
P(x × y) (or P(P(P(x y)))!).
Now we’ve got the easy axioms. Time for the weird ones.
Axiom of infinity
From the axioms so far, we cannot prove that there is an infinite set! We start
from the empty set, and all the operations above can only produce finite sets.
So we need an axiom to say that the natural numbers exists.
But how could we do so in finitely many words? So far, we do have infinitely
many sets. For example, if we write
x
+
for
x {x}
, it is easy to check that
,
+
,
++
, ··· are all distinct.
(Writing them out explicitly, we have
+
=
{∅}
,
++
=
{∅, {∅}}
,
+++
=
{∅, {∅}, {∅, {∅}}}
. We can also write 0 for
, 1 for
+
, 2 for
++
. So 0 =
,
1 = {0}, 2 = {0, 1}, ···)
Note that all of these are individually sets, and so
V
is definitely infinite.
However, inside
V
,
V
is not a set, or else we can apply separation to obtain
Russell’s paradox.
So the collection of all 0
,
1
, ··· ,
need not be a set. Therefore we want an
axiom that declares this is a set. So we want an axiom stating
x
such that
x,
+
x,
++
x, ···
. But this is an infinite statement, and we need a
smarter way of formulating this.
Axiom (Axiom of infinity). “There is an infinite set”.
(x)( x (y)(y x y
+
x)).
We say any set that satisfies the above axiom is a successor set.
A successor set can contain a lot of nonsense elements. How do we want to
obtain N, without nonsense?
We know that the intersection of successors is also a successor set. So there
is a least successor set, i.e. the intersection of all successor sets. Call this
ω
. This
will be our copy of N in V . So
(x)(x ω (y)(y is a successor set x y)).
Therefore, if we have
(x)[(x ω x is a successor set) x = ω],
by definition of
ω
. By the definition of “is a successor set”, we can write this as
(x)[(x ω x (y)(y x y
+
x)) x = ω].
This is genuine induction (in
V
)! It is not just our weak first-order axiom in
Peano’s axioms.
Also, we have
(x)(x ω x
+
= ) and (x)(y)((x ω y ω x
+
= y
+
) x = y)
by
ω
-induction (i.e. induction on
ω
). Hence
ω
satisfies the axioms of natural
numbers.
We can now write, e.g.
x
is finite” for (
y
)(
f
)(
y x f bijects x with y
).
Similarly, we define
x
is countable” to mean
x
is finite or
x
bijects with
ω”.
That’s about it for this axiom. Time for the next one:
Axiom of foundation
“Sets are built out of simpler sets”. We want to ban weird things like
x x
,
or
x y y x
, or similarly for longer chains. We also don’t want infinite
descending chains like ···x
3
x
2
x
1
x
0
.
How can we capture the “wrongness” of these weird things all together? In
the first case
x x
, we see that
{x}
has no
-minimal element (we say
y
is
-minimal in
x
if (
z x
)(
z ∈ y
)). In the second case,
{x, y}
has no minimal
element. In the last case, {x
0
, x
1
, x
2
, ···} has no -minimal element.
Axiom (Axiom of foundation). “Every (non-empty) set has an
-minimal
member”
(x)(x = (y)(y x (z)(z x z ∈ y))).
This is sometimes known as the Axiom of regularity.
Note that most of the time, we don’t actually use the Axiom of Foundation.
It’s here just so that our universe “looks nice”. Most results in set theory don’t
rely on foundation.
We will later show that this entails that all sets in
V
can be “built out of”
the empty set, but that’s left for a later time.
Axiom of replacement
In ordinary mathematics, we often say things like “for each
x I
, we have some
A
i
. Now take
{A
i
:
i I}
”. For example, (after defining ordinals), we want to
have the set {ω + i : i ω}.
How do we know that
{A
i
:
i I}
is a set? How do we know that
i 7→ A
i
is a
function, i.e. that
{
(
i, A
i
) :
i I}
is a set? It feels like it should be. We want an
axiom that says something along the line of “the image of a set under a function
is a set”. However, we do not know that the thing is a function yet. So we will
have “the image of a set under something that looks like a function is a set”.
To formally talk about “something that looks like a function”, we need a
digression on classes:
Digression on classes
x 7→ {x}
for all
x
looks like a function, but isn’t it, since every function
f
has a
(set) domain, defined as a suitable subset of
SS
f
, but our function here has
domain V .
So what is this x 7→ {x}? We call it a class.
Definition (Class). Let (
V,
) be an
L
-structure. A class is a collection
C
of
points of
V
such that, for some formula
p
with free variable
x
(and maybe more
funny parameters), we have
x C p holds.
Intuitively, everything of the form {x V : p(x)} is a class.
Note that here we are abusing notation. When we say
x C
, the symbol
does not mean the membership relation in
V
. Inside the theory, we should view
x C as a shorthand of p(x) holds”.
Example.
(i) V is a class, by letting p be x = x”.
(ii)
For any
t
,
{x V
:
t x}
is a class, with
p
being
t x
. Here
t
is a
parameter.
(iii) For any set y V , y is a class let p be x y”.
Definition (Proper class). We say
C
is a proper class if
C
is not a set (in
V
), ie
¬(y)(x)(x y p).
Similarly, we can have
Definition (Function-class). A function-class
F
is a collection of ordered pairs
such that there is a formula
p
with free variables
x, y
(and maybe more) such
that
(x, y) F p holds, and (x, y) F (x, z) F y = z.
Example. x 7→ {x} is a function-class: take p to be y = {x}.
Back to replacement
How do we talk about function-classes? We cannot refer to classes inside
V
.
Instead, we must refer to the first-order formula p.
Axiom (Axiom of replacement). “The image of a set under a function-class is a
set”. This is an axiom scheme, with an instance for each first-order formula p:
(t
1
) ···(t
n
)
| {z }
parameters
[(x)(y)(z)((p p[z/y]) y = z)]
| {z }
p defines a function-class
[(x)(y)(z y (t)(t x p[t/x, z/y]))]
| {z }
image of x under F is a set
.
Example. For any set
x
, we can form
{{t}
:
t x}
by
p
being
y
=
{x}
”.
However, this is a bad use of replacement, because we can already obtain that
by power set (and separation).
We will give a good example later.
So that’s all the axioms of ZF. Note that we did not include the Axiom of
choice!
Axiom of choice
Definition (ZFC). ZFC is the axioms ZF + AC, where AC is the axiom of
choice, “every family of non-empty sets has a choice function”.
(f)[(x)(x dom f f(x) = )
(g)(dom g = dom f) (x)(x dom g g(x) f (x))]
Here we define a family of sets
{A
i
:
i I}
to be a function
f
:
I V
such that
i 7→ A
i
.
5.2 Properties of ZF
Now, what does V look like? We start with the following definition:
Definition (Transitive set). A set
x
is transitive if every member of a member
of x is a member of x:
(y)((z)(y z z x) y x).
This can be more concisely written as
S
x x
, but is very confusing and
impossible to understand!
This notion seems absurd and it is. It is only of importance in the context
of set theory and understanding what
V
looks like. It is an utterly useless notion
in, say, group theory.
Example.
{∅, {∅}}
is transitive. In general, for any
n ω
,
n
is transitive.
ω
is
also transitive.
Lemma. Every x is contained in a transitive set.
Note that this is a theorem of
ZF
, i.e. it officially means: let (
V,
) be a
model of ZF. Then in V , <stuff above>. Equivalently, ZF <stuff above>.
We know that any intersection of transitive sets is transitive. So this lemma
will tell us that
x
is contained in a least transitive set, called the transitive
closure of x, or T C(x).
Proof.
We’d like to form
x
(
S
x
)
(
SS
x
)
(
SSS
x
)
···
”. If this makes
sense, then we are done, since the final product is clearly transitive. This will be
a set by the union axiom applied to
{x,
S
x,
SS
x, ···}
, which itself is a set by
replacement applied to
ω
, for the function-class 0
7→ x
, 1
7→
S
x
, 2
7→
SS
x
etc.
Of course we have to show that the above is a function class, i.e. can be
expressed as a first order relation. We might want to write the sentence as:
p(s, t) is (s = 0 t = x) (u)(v)(s = u + 1 t =
S
v p(u, v)),
but this is complete nonsense! We are defining p in terms of itself!
The solution would be to use attempts, as we did previously for recursion. We
define
f
is an attempt” to mean
f
is a function and
dom f ω
and
dom f
=
and
f
(0) =
x
and (
n
)(
n ω n dom f
)
f
(
n
) =
S
f
(
n
1), i.e.
f
is
defined for some natural numbers and meet our requirements when it is defined.
Then it is easy to show that two attempts
f
and
f
agree whenever both are
defined. Also,
n ω
, there is an attempt
f
defined for
n
(both by
ω
-induction).
Note that the definition of an attempt is a first-order formula. So our function
class is
p(s, t) is (f )(f is an attempt y dom f f (y) = z).
This is a good use of replacement, unlike our example above.
We previously said that Foundation captures the notion “every set is built
out of simpler sets”. What does that exactly mean? If this is true, we should be
able to do induction on it: if p(y) for all y x, then p(x).
If this sounds weird, think about the natural numbers: everything is built
from 0 using +1. So we can do induction on ω.
Theorem (Principle of
-induction). For each formula
p
, with free variables
t
1
, ··· , t
n
, x,
(t
1
) ···(t
n
)
[(x)((y)(y x p(y))) p(x)] (x)(p(x))
Note that officially, p(y) means p[y/x] and p(x) is simply x.
Proof.
Given
t
1
, ··· , t
n
, suppose
¬
(
x
)
p
(
x
). So we have some
x
with
¬p
(
x
).
Similar to how we proved regular induction on naturals from the well-ordering
principle (in IA Numbers and Sets), we find a minimal
x
such that
p
(
x
) does
not hold.
While foundation allows us to take the minimal element of a set,
{y
:
¬p
(
y
)
}
need not be a set e.g. if p(y) is y = y.
Instead, we pick a single
x
such that
¬p
(
x
). Let
u
=
T C
(
{x}
). Then
{y u
:
¬p
(
y
)
}
=
, since
x u
. So it has an
-minimal element, say
y
, by
Foundation. Then each
z y
has
z u
since
u
is transitive. Hence
p
(
z
) by
minimality of y. But this implies p(y). Contradiction.
Note that here we used the transitive closure to reduce from reasoning about
the whole scary V , to an actual set T C(x). This technique is rather useful.
We have now used Foundation to prove
-induction. It turns out that the
two are in fact equivalent.
Proposition. -induction Foundation.
Proof.
To deduce foundation from
-induction, the obvious
p
(
x
)
x
has an
-minimal member, doesn’t work.
Instead, consider p(x) given by
(y) x y y has an -minimal member.
If
p
(
x
) is true, we say
x
is regular. To show that (
x
)
p
(
x
), it is enough to show
that: if every y x is regular, then x is regular.
Given any
z
with
x z
, we want to show that
z
has an
-minimal member.
If
x
is itself minimal in
z
, then done. Otherwise, then
y z
for some
y x
.
But since y x, y is regular. So z has a minimal element.
Hence all
x
is regular. Since all non-empty sets contain at least one element
(by definition), all sets have -minimal member.
This looked rather simple to prove. However, it is because we had the clever
idea of regularity. If we didn’t we would be stuck for a long time!
Now what about recursion? Can we define f(x) using f(y) for all y x?
Theorem (
-recursion theorem). Let
G
be a function-class, everywhere defined.
Then there is a function-class
F
such that
F
(
x
) =
G
(
F |
x
) for all
x
. Moreover,
F is unique (cf. definition of recursion on well-orderings).
Note that F |
x
= {(y, F (y)) : y x} is a set, by replacement.
Proof.
We first show existence. Again, we prove this with attempts. Define
f
is an attempt” to mean
f
is a function and
dom f
is transitive and (
x
)(
x
dom f f (x) = G(f |
x
))”.
Then by simple -induction, we have
(x)(f
)[(f an attempt defined at x
f
an attempt defined at x) f(x) = f
(x)].
Also, (
x
)(
f
)(
f an attempt defined at x
), again by
-induction: suppose for
each
y x
, there exists an attempt defined at
y
. So there exists a unique attempt
f
y
with domain
T C
(
{y}
). Set
f
=
S
yx
f
y
, and let
f
=
f
S
{
(
x, G
(
f|
x
)
}
. Then
this is an attempt defined at x.
So we take q(x, y) to be
(f)(f is an attempt defined at x with f(x) = y).
Uniqueness follows form -induction.
Note that this is exactly the same as the proof of recursion on well-orderings.
It is just that we have some extra mumbling about transitive closures.
So we proved recursion and induction for
. What property of the relation-
class
(with
p
(
x, y
) defined as
x y
) did we use? We most importantly used
the Axiom of foundation, which says that
(i) p is well-founded: every set has a p-minimal element.
(ii) p
is local: ie,
{x
:
p
(
x, y
)
}
is a set for each
y
. We needed this to form the
transitive closure.
Definition (Well-founded relation). A relation-class
R
is well-founded if every
set has a R-minimal element.
Definition (Local relation). A relation-class
R
is local if
{x
:
p
(
x, y
)
}
is a set
for each y.
So we will have
Proposition.
p
-induction and
p
-recursion are well-defined and valid for any
p(x, y) that is well-founded and local.
Proof. Same as above.
Note that if
r
is a relation on a set
a
, then
r
is trivially local. So we just
need r to be well-founded. In particular, our induction and recursion theorems
for well-orderings are special cases of this.
We have almost replicated everything we proved for well-orderings, except
for subset collapse. We will now do that.
This is motivated by the following question: can we model a given relation
on a set by ?
For example, let
a
=
{b, c, d}
, with
b r c
and
c r d
. Can we find a set
{b
, c
, d
}
such that
b
c
and
c
d
? Yes. We can put
b
=
,
c
=
{∅}
,
d
=
{{∅}}
.
Moreover, a
= {b
, c
, d
} is transitive.
Definition (Extensional relation). We say a relation
r
on set
a
is extensional if
(x a)(y a)((z a)(z r x z r y) x = y).
i.e. it obeys the axiom of extension.
Theorem (Mostowski collapse theorem). Let
r
be a relation on a set
a
that is
well-founded and extensional. Then there exists a transitive
b
and a bijection
f
:
a b
such that (
x, y a
)(
x r y f
(
x
)
f
(
y
)). Moreover,
b
and
f
are
unique.
Note that the two conditions “well-founded” and “extensional” are trivially
necessary, since is both well-founded and extensional.
Proof.
Existence: define
f
on
a
the obvious way
f
(
x
) =
{f
(
y
) :
y r x}
. This
is well-defined by
r
-recursion, and is a genuine function, not just of a function
class by replacement it is an image of a.
Let
b
=
{f
(
x
) :
x a}
(this is a set by replacement). We need to show that
it is transitive and bijective.
By definition of
f
,
b
is transitive, and
f
is surjective as
b
is defined to be the
image of f. So we have to show that f is injective.
We’ll show that (
x a
)(
f
(
y
) =
f
(
x
)
y
=
x
) for each
x a
, by
r
-
induction. Given
y a
, with
f
(
y
) =
f
(
x
), we have
{f
(
t
) :
t r y}
=
{f
(
s
) :
s r y}
by definition of
f
. So
{t
:
t r y}
=
{s
:
s r x}
by the induction hypothesis. Hence
x = y since r is extensional.
So we have constructed such an
b
and
f
. Now we show it is unique: for any
suitable f, f
, we have f(x) = f
(x) for all x a by r-induction.
Recall that we defined the ordinals to be the “equivalence class” of all well-
orderings. But this is not a good definition since the ordinals won’t be sets.
Hence we define them (formally) as follows:
Definition (Ordinal). An ordinal is a transitive set, totally ordered by .
This is automatically well-ordered by , by foundation.
Example.
,
{∅}
,
{∅, {∅}}
are ordinals. Any
n ω
as
n
=
{
0
,
1
, ··· , n
1
}
, as
well as ω itself, are ordinals.
Why is this a good definition? Mostowski says that each well-ordering is
order-isomorphic to a unique ordinal (using our definition of ordinal above)
this is its order-type. So here, instead of saying that the ordinals is the
equivalence class of well-orderings, we simply choose one representative of each
equivalence class (given by Mostowski collapse), and call that the ordinal.
For any ordinal
α
,
I
α
=
{β
:
β < α}
is a well-ordering of order-type
α
.
Applying Mostowski collapse, we get α = {β : β < α}. So β < α iff β α.
So, for example,
α
+
=
α {α}
, and
sup{α
k
:
k I}
=
S
{α
i
:
i I}
,
Set theorists are often write suprema as unions, but this is a totally unhelpful
notation!
5.3 Picture of the universe
What does the universe look like? We start with the empty set, and take the
power set, repeatedly, transfinitely.
Definition (von Neumann hierarchy). Define sets
V
α
for
α On
(where
On
is
the class of ordinals) by -recursion:
(i) V
0
= .
(ii) V
α+1
= P(V
α
).
(iii) V
λ
=
S
{V
γ
: γ < λ} for λ a non-zero limit ordinal.
On
V
0
=
V
1
= {∅}
V
7
.
.
.
.
.
.
V
ω
V
ω+1
.
.
.
.
.
.
V
ω+ω
.
.
.
.
.
.
Note that by definition, we have x V
α
x V
α+1
.
We would like every
x
to be in some
V
α
, and that is indeed true. We prove
this though a series of lemmas:
Lemma. Each V
α
is transitive.
Proof.
Since we define
V
α
by recursion, it is sensible to prove this by induction:
By induction on α:
(i) Zero: V
0
= is transitive.
(ii)
Successors: If
x
is transitive, then so is
P
(
x
): given
y z P
(
x
), we want
to show that
y P
(
x
). Since
y
is in a member of
P
(
x
), i.e. a subset of
x
,
we must have y x. So y x since x is transitive. So y P(x).
(iii) Limits: Any union of transitive sets is transitive.
Lemma. If α β, then V
α
V
β
.
Proof. Fix α, and induct on β.
(i) β = α: trivial
(ii)
Successors:
V
β
+
V
β
since
x P
(
x
) for transitive
x
. So
V
α
V
β
V
α
V
β
+
.
(iii) Limits: Trivial by definition
Finally we are ready for:
Theorem. Every x belongs to some V
α
. Intuitively, we want to say
V =
[
αOn
V
α
,
We’ll need some terminology to start with. If
x V
α
for some
α
, then there
is a least α with x V
α
. We call this the rank of x.
For example,
rank
(
) = 0,
rank
(
{∅}
) = 1. Also
rank
(
ω
) =
ω
. In fact,
rank(α) = α for all α On.
Note that we want the least α such that x V
α
, not x V
α
.
Proof. We’ll show that (x)(α)(x V
α
) by -induction on x.
So we are allowed to assume that for each
y x
, we have
y V
α
for some
α
.
So y V
rank(y)
, or y V
rank(y)+1
.
Let
α
=
sup{
(
rank
(
y
)
+
:
y x}
. Then
y V
α
for every
y x
. So
x V
α
.
Our definition of rank is easy to get wrong it is easy to be off by of 1. So
the official definition is
Definition (Rank). The rank of a set x is defined recursively by
rank(x) = sup{(rank y)
+
: y x}.
Then the initial definition we had is now a proposition.
Proposition. rank(x) is the first α such that x V
α
.
6 Cardinals
In this chapter, we will look at the “sizes” of (infinite) sets (finite sets are
boring!). We work in ZFC, since things become really weird without Choice.
Since we will talk about bijections a lot, we will have the following notation:
Notation. Write x y for f : f is a bijection from x to y.
6.1 Definitions
We want to define
card
(
x
) (the cardinality, or size of
x
) in such a way that
card
(
x
) =
card
(
y
)
x y
. We can’t define
card
(
x
) =
{y
:
y x}
as it may
not be a set. So we want to pick a “representative” of the sets that biject with
x, like how we defined the ordinals.
So why not use the ordinals? By Choice, we know that all
x
is well-orderable.
So x α for some α. So we define:
Definition (Cardinality). The cardinality of a set
x
, written
card
(
x
), is the
least ordinal α such that x α.
Then we have (trivially) card(x) = card(y) x y.
(What if we don’t have Choice, i.e. if we are in ZF? This will need a really
clever trick, called the Scott trick. In our universe of ZF, there is a huge of
blob of things that biject with
x
. We cannot take the whole blob (it won’t be a
set), or pick one of them (requires Choice). So we “chop off” the blob at fixed,
determined point, so we are left with a set.
Define the essential rank of
x
to be the least rank of all
y
such that
y x
.
Then set card(x) = {y V
essrank(x)
+
: y x}.)
So what cardinals are there? Clearly we have 1, 2, 3, ···. What else?
Definition (Initial ordinal). We say an ordinal α is initial if
(β < α)(¬β α),
i.e. it is the smallest ordinal of that cardinality.
Then 0
,
1
,
2
,
3
, ··· , ω, ω
1
, γ
(
X
) for any
X
are all initial. However,
ω
ω
is not
initial, as it bijects with ω (both are countable).
Can we find them all? Yes!
Definition (Omega ordinals). We define ω
α
for each α On by
(i) ω
0
= ω;
(ii) ω
α+1
= γ(ω
α
);
(iii) ω
λ
= sup{ω
α
: α < λ} for non-zero limit λ.
It is easy induction to show that each
ω
α
is initial. We can also show that
every initial
δ
(for
δ ω
) is an
ω
α
. We know that the
ω
α
are unbounded since,
say ω
α
α for all α. So there is a least α with ω
α
δ.
If
α
is a successor, then let
α
=
β
+
. Then
ω
β
< δ ω
α
. But there is no
initial ordinal between
ω
β
and
ω
α
=
γ
(
ω
β
), since
γ
(
X
) is defined as the least
ordinal that does not biject with X. So we must have δ = ω
α
.
If
α
is a limit, then since
ω
α
is defined as a supremum, by definition we
cannot have
δ < ω
α
, or else there is some
β < α
with
δ < ω
β
. So
ω
α
=
δ
as well.
Definition (Aleph number). Write
α
(“aleph-α”) for card(ω
α
).
Then from the argument above, we have
Theorem. The
α
are the cardinals of all infinite sets (or, in ZF, the cardinals
of all infinite well-orderable sets). For example, card(ω) =
0
, card ω
1
=
1
.
We will use lower case letters to denote cardinalities and upper case for the
sets with that cardinality. e.g. card(N) = n.
Definition (Cardinal (in)equality). For cardinals
n
and
m
, write
m n
if
M
injects into
N
, where
card M
=
m, card N
=
n
. This clearly does not depend on
M and N .
So
m n
and
n m
implies
n
=
m
by Schr¨oder-Bernstein. Write
m < n
if
m n by m = n.
Example. card(P(ω)) > card(ω).
This is a partial order. Moreover, it is a total order (assuming AC).
6.2 Cardinal arithmetic
Definition (Cardinal addition, multiplication and exponentiation). For cardinals
m, n
, write
m
+
n
for
card
(
M N
);
mn
for
card
(
M ×N
); and
m
n
for
card
(
M
N
),
where
M
N
=
{f
:
f is a function N M}
. Note that this coincides with our
usual definition of X
n
for finite n.
Example. R P(ω) 2
ω
. So card(R) = card(P
ω
) = 2
0
.
Similarly, define
X
iI
m
i
= card
G
iI
M
i
!
.
Example. How many sequences of reals are there? A real sequence is a function
from ω R. We have
card(R
ω
) = (2
0
)
0
= 2
0
×ℵ
0
= 2
0
= card(R)
Note that we used facts like
Proposition.
(i) m + n = n + m since N M N N with the obvious bijection.
(ii) mn = nm using the obvious bijection
(iii)
(
m
n
)
p
=
m
np
as (
M
N
)
P
M
N×P
since both objects take in a
P
and an
N and returns an M.
It is important to note that cardinal exponentiation is different from ordinal
exponentiation. For example,
ω
ω
(ordinal exponentiation) is countable, but
0
0
2
0
>
0
(cardinal exponentiation).
From IA Numbers and sets, we know that
0
0
=
0
. What about
1
1
?
Or
3
0
?
It turns out that cardinal sums and multiplications are utterly boring:
Theorem. For every ordinal α,
α
α
=
α
.
This is the best we could ever ask for. What can be simpler?
Proof.
Since the Alephs are defined by induction, it makes sense to prove it by
induction.
In the following proof, there is a small part that doesn’t work nicely with
α = 0. But α = 0 case (ie
0
0
=
0
) is already done. So assume α = 0.
Induct on
α
. We want
ω
α
× ω
α
to biject with
ω
α
, i.e. well-order
ω
α
× ω
α
to
an ordering of length ω
α
.
Using the ordinal product clearly doesn’t work. The ordinal product counts
the product in rows, so we have many copies of
ω
α
. When we proved
0
0
=
0
,
we counted them diagonally. But counting diagonally here doesn’t look very
nice, since we will have to “jump over” infinities. Instead, we count in squares
ω
α
ω
α
We set (
x, y
)
<
(
x
, y
) if either
max
(
x, y
)
< max
(
x
, y
) (this says that (
x
, y
)
is in a bigger square), or, (say
max
(
x, y
) =
max
(
x
, y
) =
β
and
y
=
β, y < β
or
x
=
x
=
β, y < y
or
y
=
y
=
β, x < x
) (nonsense used to order things in the
same square utterly unimportant).
How do we show that this has order type
ω
α
? We show that any initial
segment has order type < ω
α
.
For any proper initial segment I
(x,y)
, we have
I
(x,y)
β × β
for some β < ω
α
, since ω
α
is a limit, with wlog β infinite. So
β × β β
by induction hypothesis (their cardinality is less that ω
α
). So
card(β × β) < card(ω
α
).
Hence
I
(x,y)
has order type
< ω
α
. Thus the order type of our well-order is
ω
α
.
So
ω
α
× ω
α
injects into
ω
α
. Since trivially
ω
α
injects into
ω
α
× ω
α
, we have
ω
α
× ω
α
ω
α
.
So why did we say cardinal arithmetic is boring? We have
Corollary. Let α β. Then
α
+
β
=
α
β
=
β
.
Proof.
β
α
+
β
β
+
β
= 2
β
β
×
β
=
β
,
So done
Example. X X bijects with X, for infinite X (in ZFC).
However, cardinal exponentiation is very hard. For example, is 2
0
=
1
?
This is the continuum hypothesis, and cannot be proved or disproved in ZFC!
Even today, not all implications among values of
β
α
are known, i.e. we don’t
know whether they are true, false or independent!
7 Incompleteness*
The big goal of this (non-examinable) chapter is to show that PA is incomplete,
i.e. there is a sentence p such that PA ⊢ p and PA ⊢ ¬p.
The strategy is to find a p that is true in N, but PA ⊢ p.
Here we say “true” to mean “true in
N
”, and “provable” to mean “PA proves
it”.
The idea is to find a
p
that says “I am not provable”. More precisely, we
want a
p
such that
p
is true if and only if
p
is not provable. We are then done:
p
must be true, since if
p
were false, then it is provable, i.e.
PA p
. So
p
holds
in every model of PA, and in particular,
p
holds in
N
. Contradiction. So
p
is
true. So p is not provable.
We’ll have to “code” formulae, proofs etc. inside PA, i.e. as numbers. But
this doesn’t seem possible it seems like, in any format,
p
is not provable”
must be longer than p. So p cannot say p is not provable“!
So be prepared for some magic to come up in the middle for the proof!
Definability
We first start with some notions of definability.
Definition (Definability). A subset
S N
is definable if there is a formula
p
with one free variable such that
m N : m S p(m) holds.
Similarly, f : N N is definable if there exists a formula p(x, y) such that
m, n N : f (m) = n p(m, n) holds.
Example. The set of primes is definable: p(x) is
x = 1 (y)(z)(yz = x (y = 1 z = 1)).
We can say m is prime” is definable.
How about powers of 2? We don’t have exponentiation here. What can we
do? We can take p(x) to be
(y)((y is prime y | x) y = 2),
where 2 is a shorthand for
s
(
s
(0)), and
y | x
is a shorthand for (
z
)(
yz
=
x
). So
this is also definable.
The function m 7→ m
2
is also definable: take p(x, y) to be yy = x.
Here we will assume:
Fact. Any function given by an algorithm is definable.
Proof will not be given here. See, eg, PTJ’s book for detailed proof.
Example. m 7→ 2
m
is definable.
Coding
Our language has 12 symbols:
s,
0
, ×,
+
, , ,
(
,
)
,
=
, x,
,
(where the variables
are now called x, x
, x
′′
, x
′′′
.
We assign values to them, say
v
(
s
) = 1
, v
(0) = 2
, ··· , v
(
) = 12. To code a
formula, we can take
2
v(first symbol)
· 3
v(second symbol)
···(nth prime)
v(nth symbol)
.
For example, (x)(x = x) is coded as
2
7
3
12
5
10
7
8
11
7
13
10
17
9
23
8
.
Not every
m
codes a formula, e.g. 2
7
3
12
5
10
is translated to (
x
, which is clearly
nonsense. Similarly, 2
7
7
3
or 2
100
can’t even be translated at all.
However,
m
codes a formula” is definable, as there is an algorithm that
checks that.
Write
S
m
for the formula coded by
m
(and set
S
m
to be
if
m
does not
code a formula). Similarly, write c(p) for the code of p.
Given a finite sequence p
1
, ···p
n
for formulae, code it as
2
c(p
1
)
3
c(p
2
)
···(nth prime)
c(p
n
)
.
Alternatively, we can add a separator character to our 12 symbols and concatenate
the sequence of formulae with the separator character.
Now,
m
codes an axiom (either logical or axiom of PA)” is definable, as
there exists an algorithm to check it. For example, an instance of the first logical
axiom can be validated by the following regex:
^\([s0()=x’\+×⊥ ]+\)([s0()=x’\+×⊥ ]+ \1)$
(after translating to a sentence and verifying it is a valid logical formula)
Also,
ϕ(ℓ, m, n) =“S
n
obtained from S
, S
m
via MP”
is definable, and similarly for generalization.
So
Θ(m, n) = n codes a proof of S
m
is definable.
Thus
Ψ(m) = S
m
is provable”
is definable as
Ψ(m) (n)Θ(m, n).
So far, everything is rather boring. We all know that strings of symbols can be
coded into numbers that’s what computers do!
Clever bit
Consider the statement χ(m) that states
m codes a formula S
m
with one free variable, and S
m
(m) is not provable.”
This is clearly definable. Suppose this is defined by p. So
χ(n) p[n/x]
Suppose c(p) = N. Then χ(N) asserts
N codes a formula S
N
with one free variable and S
N
(N) is not provable.”
But we already know that
N
codes the formula
χ
. So
χ
(
N
) asserts that
χ
(
N
) is
not provable.
So
Theorem (G¨odel’s incompleteness theorem). PA is incomplete.
Maybe that’s because PA is rubbish. Could we add some clever axiom
t
(true
in
N
) to PA, so that PA
∪{t}
is complete? No! Just run the same proof with
“PA” replaced by “PA∪{t} to show that PA∪{t} is incomplete.
But we can certainly extend PA to a complete theory let
T
=
{p
:
p holds in N}. What will go wrong in our proof? It can only be because
Theorem. “Truth is not definable”
T = {p : p holds in N} is not definable. This officially means
{m : m codes a member of T }
is not a definable set.
Next question: we proved that our clever statement
p
is true but not provable.
Why doesn’t this formalize into PA? The answer is, in the proof, we also used
the fact that PA has a model,
N
. By completeness, this means that we used the
statement con(PA), i.e. the statement that PA is consistent, or
(m)(m does not code a proof of ).
With this statement, the proof does formalize to PA. So PA
∪{con(PA)} p
.
Hence
Theorem. PA ⊢ con(PA).
The same proof works in ZF. So ZF is incomplete and ZF does not prove its
own consistency.