II Logic and Set Theory (Full)

Part II — Logic and Set Theory

Based on lectures by I. B. Leader

Notes taken by Dexter Chua

Lent 2015

These notes are not endorsed by the lecturers, and I have modified them (often

significantly) after lectures. They are nowhere near accurate representations of what

was actually lectured, and in particular, all errors are almost surely mine.

No specific prerequisites.

Ordinals and cardinals

Well-orderings and order-types. Examples of countable ordinals. Uncountable ordinals and Hartogs’ lemma. Induction and recursion for ordinals. Ordinal arithmetic. Cardinals; the hierarchy of alephs. Cardinal arithmetic. [5]

Posets and Zorn’s lemma

Partially ordered sets; Hasse diagrams, chains, maximal elements. Lattices and Boolean algebras. Complete and chain-complete posets; fixed-point theorems. The axiom of choice and Zorn’s lemma. Applications of Zorn’s lemma in mathematics. The well-ordering principle. [5]

Propositional logic

The propositional calculus. Semantic and syntactic entailment. The deduction and

completeness theorems. Applications: compactness and decidability. [3]

Predicate logic

The predicate calculus with equality. Examples of first-order languages and theories.

Statement of the completeness theorem; *sketch of proof*. The compactness theorem

and the Löwenheim-Skolem theorems. Limitations of first-order logic. Model theory. [5]

Set theory

Set theory as a first-order theory; the axioms of ZF set theory. Transitive closures,

epsilon-induction and epsilon-recursion. Well-founded relations. Mostowski’s collapsing

theorem. The rank function and the von Neumann hierarchy. [5]

Consistency

*Problems of consistency and independence* [1]

Contents

0 Introduction

1 Propositional calculus

1.1 Propositions

1.2 Semantic entailment

1.3 Syntactic implication

2 Well-orderings and ordinals

2.1 Well-orderings

2.2 New well-orderings from old

2.3 Ordinals

2.4 Successors and limits

2.5 Ordinal arithmetic

2.6 Normal functions*

3 Posets and Zorn’s lemma

3.1 Partial orders

3.2 Zorn’s lemma and axiom of choice

3.3 Bourbaki-Witt theorem*

4 Predicate logic

4.1 Language of predicate logic

4.2 Semantic entailment

4.3 Syntactic implication

4.4 Peano Arithmetic

4.5 Completeness and categoricity*

5 Set theory

5.1 Axioms of set theory

5.2 Properties of ZF

5.3 Picture of the universe

6 Cardinals

6.1 Definitions

6.2 Cardinal arithmetic

7 Incompleteness*

0 Introduction

Most people are familiar with the notion of “sets” (here “people” is defined

to be mathematics students). However, most of the time, we only have an

intuitive picture of what set theory should look like — there are sets, we can

take unions, intersections and subsets. We can write down sets

like {x : ϕ(x) is true}.

Historically, mathematicians were content with this vague notion of sets.

However, it turns out that our last statement wasn’t really correct. We cannot

just arbitrarily write down sets we like. This is evidenced by the famous Russell’s paradox, where the set X is defined as X = {x : x ∉ x}. Then we have X ∈ X ⇔ X ∉ X, which is a contradiction.

This led to the formal study of set theory, where set theory is given a formal

foundation based on some axioms of set theory. This is known as axiomatic

set theory. This is similar to Euclid’s axioms of geometry, and, in some sense,

the group axioms. Unfortunately, while axiomatic set theory appears to avoid paradoxes like Russell’s paradox, as Gödel proved in his incompleteness theorem, we cannot prove, from within the theory itself, that our axioms are free of contradictions.

Closely related to set theory is formal logic. Similarly, we want to put logic

on a solid foundation. We want to formally define our everyday notions such as

propositions, truth and proofs. The main result we will have is that a statement

is true if and only if we can prove it. This assures that looking for proofs is a

sensible way of showing that a statement is true.

It is important to note that having studied formal logic does not mean that

we should always reason with formal logic. In fact, this is impossible, as we

ultimately need informal logic to reason about formal logic itself!

Throughout the course, we will interleave topics from set theory and formal

logic. This is necessary as we need tools from set theory to study formal logic,

while we also want to define set theory within the framework of formal logic.

One is not allowed to complain that this involves circular reasoning.

As part of the course, we will also side-track to learn about well-orderings

and partial orders, as these are very useful tools in the study of logic and set

theory. Their importance will become evident as we learn more about them.

1 Propositional calculus

Propositional calculus is the study of logical statements such as p ⇒ p and p ⇒ (q ⇒ p). As opposed to predicate calculus, which will be studied in Chapter 4, the statements will not have quantifier symbols like ∀, ∃.

When we say “p ⇒ p is a correct statement”, there are two ways we can interpret this. We can interpret this as “no matter what truth value p takes, p ⇒ p always has the truth value of “true”.” Alternatively, we can interpret this as “there is a proof that p ⇒ p”.

The first notion is concerned with truth, and does not care about whether we

can prove things. The second notion, on the other hand, only talks about proofs. We do not, in any way, assert that p ⇒ p is “true”. One of the main objectives of this chapter is to show that these two notions are consistent with each other.

A statement is true (in the sense that it always has the truth value of “true”) if

and only if we can prove it. It turns out that this equivalence has a few rather

striking consequences.

Before we start, it is important to understand that there is no “standard”

logical system. What we present here is just one of the many possible ways of

doing formal logic. In particular, do not expect anyone else to know exactly how

your system works without first describing it. Fortunately, no one really writes
proofs with formal logic, and the important theorems we prove (completeness,

compactness etc.) do not depend much on the exact implementation details of

the systems.

1.1 Propositions

We’ll start by defining propositions, which are the statements we will consider

in propositional calculus.

Definition (Propositions). Let P be a set of primitive propositions. These are a bunch of (meaningless) symbols (e.g. p, q, r), which are used as the basic building blocks of our more interesting propositions. These are usually interpreted to take a truth value. Usually, any symbol (composed of letters and subscripts) is in the set of primitive propositions.

The set of propositions, written as L or L(P ), is defined inductively by

(i) If p ∈ P , then p ∈ L.

(ii) ⊥ ∈ L, where ⊥ is read as “false” (also a meaningless symbol).

(iii) If p, q ∈ L, then (p ⇒ q) ∈ L.

Example. If our set of primitive propositions is P = {p, q, r}, then p ⇒ q, p ⇒ ⊥, ((p ⇒ q) ⇒ (p ⇒ r)) are propositions.

To define L formally, we let
L_0 = {⊥} ∪ P
L_{n+1} = L_n ∪ {(p ⇒ q) : p, q ∈ L_n}.
Then we define L = L_0 ∪ L_1 ∪ L_2 ∪ ···.

In formal language terms, L is the set of finite strings of symbols from the alphabet ⊥ ⇒ ( ) p_1 p_2 ··· that satisfy some formal grammar rule (e.g. brackets have to match).

Note here that officially, the only connective we have is ⇒. The familiar “not”,

“and” and “or” do not exist. Instead, we define them to be abbreviations of

certain expressions:

Definition (Logical symbols).

¬p (“not p”) is an abbreviation for (p ⇒ ⊥)

p ∧ q (“p and q”) is an abbreviation for ¬(p ⇒ (¬q))

p ∨ q (“p or q”) is an abbreviation for (¬p) ⇒ q

The advantage of having just one symbol ⇒ is that when we prove something about our theories, we only have to prove it for ⇒, instead of all ⇒, ¬, ∧ and ∨ individually.
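The inductive definition of L(P) and the abbreviations above can be mirrored by a small data type. The following is a minimal sketch, assuming propositions are encoded as nested tuples; the names `prim`, `imp`, `neg`, `conj`, `disj`, `level` and `show` are illustrative and not part of the notes.

```python
# Assumed encoding for this sketch: a proposition is a nested tuple.
BOT = ("bot",)                          # the symbol ⊥

def prim(name):
    """Rule (i): a primitive proposition."""
    return ("prim", name)

def imp(p, q):
    """Rule (iii): the proposition (p ⇒ q)."""
    return ("imp", p, q)

def neg(p):
    """¬p abbreviates (p ⇒ ⊥)."""
    return imp(p, BOT)

def conj(p, q):
    """p ∧ q abbreviates ¬(p ⇒ (¬q))."""
    return neg(imp(p, neg(q)))

def disj(p, q):
    """p ∨ q abbreviates (¬p) ⇒ q."""
    return imp(neg(p), q)

def level(t):
    """Least n such that t lies in L_n."""
    if t[0] in ("bot", "prim"):
        return 0
    return 1 + max(level(t[1]), level(t[2]))

def show(t):
    """Render a proposition in ASCII, writing F for ⊥ and => for ⇒."""
    if t[0] == "bot":
        return "F"
    if t[0] == "prim":
        return t[1]
    return "(" + show(t[1]) + " => " + show(t[2]) + ")"

p, q = prim("p"), prim("q")
print(show(conj(p, q)))   # ((p => (q => F)) => F)
print(level(disj(p, q)))  # 2
```

Since ¬, ∧ and ∨ are only abbreviations, the functions `neg`, `conj` and `disj` build ordinary ⇒-propositions, so anything written for `imp` and `BOT` handles them automatically.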

1.2 Semantic entailment

The idea of semantic entailment is to assign truth values to propositions, where

we declare each proposition to be “true” or “false”. This assignment is performed

by a valuation.

Definition (Valuation). A valuation on L is a function v : L → {0, 1} such that:
– v(⊥) = 0,
– v(p ⇒ q) = 0 if v(p) = 1 and v(q) = 0, and v(p ⇒ q) = 1 otherwise.

We interpret v(p) to be the truth value of p, with 0 denoting “false” and 1 denoting “true”.
Note that we do not impose any restriction on v(p) when p is a primitive proposition.

For those people who like homomorphisms, we can first give the set {0, 1} a binary operation ⇒ by
a ⇒ b = 0 if a = 1 and b = 0, and a ⇒ b = 1 otherwise,
as well as a constant ⊥ = 0. Then a valuation can be defined as a homomorphism between L and {0, 1} that preserves ⊥ and ⇒.

It should be clear that a valuation is uniquely determined by its values on

the primitive propositions, as the values on all other propositions follow from

the definition of a valuation. In particular, we have

Proposition.

(i) If v and v′ are valuations with v(p) = v′(p) for all p ∈ P, then v = v′.
(ii) For any function w : P → {0, 1}, we can extend it to a valuation v such that v(p) = w(p) for all p ∈ P.

Proof.

(i) Recall that L is defined inductively. We are given that v(p) = v′(p) on L_0 (both take the value 0 at ⊥ and agree on P). Then every p ∈ L_1 that is not already in L_0 must be of the form q ⇒ r for q, r ∈ L_0. Then v(q ⇒ r) = v′(q ⇒ r) since the value of v(q ⇒ r) is uniquely determined by v(q) and v(r). So for all p ∈ L_1, v(p) = v′(p).
Continue inductively to show that v(p) = v′(p) for all p ∈ L_n for any n.

(ii) Set v to agree with w for all p ∈ P, and set v(⊥) = 0. Then define v on L_n inductively according to the definition.

Example. Suppose v is a valuation with v(p) = v(q) = 1, v(r) = 0. Then

v((p ⇒ q) ⇒ r) = 0.
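Part (ii) of the proposition is effectively a recursive evaluator. Here is a minimal sketch, assuming propositions are nested tuples and w is given as a dictionary on the primitives; the name `valuation` and the tuple encoding are assumptions of this illustration.

```python
# Assumed encoding: ("bot",) is ⊥, ("prim", name) is a primitive, ("imp", a, b) is (a ⇒ b).
BOT = ("bot",)

def imp(a, b):
    return ("imp", a, b)

def valuation(w):
    """Extend w : P -> {0, 1} (a dict) to a valuation v : L -> {0, 1}."""
    def v(t):
        if t == BOT:
            return 0                      # v(⊥) = 0
        if t[0] == "prim":
            return w[t[1]]                # v agrees with w on primitives
        # v(a ⇒ b) = 0 exactly when v(a) = 1 and v(b) = 0
        return 0 if (v(t[1]) == 1 and v(t[2]) == 0) else 1
    return v

p, q, r = ("prim", "p"), ("prim", "q"), ("prim", "r")
v = valuation({"p": 1, "q": 1, "r": 0})
print(v(imp(imp(p, q), r)))  # 0, as in the example above
```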

Often, we are interested in propositions that are always true, such as p ⇒ p. These are known as tautologies.

Definition (Tautology). t is a tautology, written as |= t, if v(t) = 1 for all valuations v.

To show that a statement is a tautology, we can use a truth table, where we

simply list out all possible valuations and find the value of v(t).

Example.

(i) |= p ⇒ (q ⇒ p) “A true statement is implied by anything”

v(p) v(q) v(q ⇒ p) v(p ⇒ (q ⇒ p))

1 1 1 1

1 0 1 1

0 1 0 1

0 0 1 1

(ii) |= (¬¬p) ⇒ p. Recall that ¬¬p is defined as ((p ⇒ ⊥) ⇒ ⊥).

v(p) v(p ⇒ ⊥) v((p ⇒ ⊥) ⇒ ⊥) v(((p ⇒ ⊥) ⇒ ⊥) ⇒ p)

1 0 1 1

0 1 0 1

(iii) |= [p ⇒ (q ⇒ r)] ⇒ [(p ⇒ q) ⇒ (p ⇒ r)].

Instead of creating a truth table, which would be horribly long, we show this by reasoning: Suppose it is not a tautology. So there is a v such that v(p ⇒ (q ⇒ r)) = 1 and v((p ⇒ q) ⇒ (p ⇒ r)) = 0. For the second equality to hold, we must have v(p ⇒ q) = 1 and v(p ⇒ r) = 0. So v(p) = 1, v(r) = 0, v(q) = 1. But then v(p ⇒ (q ⇒ r)) = 0, which is a contradiction.
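The truth-table method can be mechanised: list every assignment to the primitives that occur in t and evaluate. The sketch below assumes the same nested-tuple encoding of propositions as an illustration device; `prims`, `value` and `is_tautology` are hypothetical names.

```python
from itertools import product

BOT = ("bot",)

def imp(a, b):
    return ("imp", a, b)

def neg(a):
    return imp(a, BOT)

def prims(t):
    """Names of the primitive propositions occurring in t."""
    if t == BOT:
        return set()
    if t[0] == "prim":
        return {t[1]}
    return prims(t[1]) | prims(t[2])

def value(t, w):
    """Truth value of t under the assignment w (a dict on primitives)."""
    if t == BOT:
        return 0
    if t[0] == "prim":
        return w[t[1]]
    return 0 if (value(t[1], w) == 1 and value(t[2], w) == 0) else 1

def is_tautology(t):
    """Check v(t) = 1 on every row of the truth table."""
    names = sorted(prims(t))
    return all(value(t, dict(zip(names, bits))) == 1
               for bits in product((0, 1), repeat=len(names)))

p, q, r = ("prim", "p"), ("prim", "q"), ("prim", "r")
ex1 = imp(p, imp(q, p))
ex2 = imp(neg(neg(p)), p)
ex3 = imp(imp(p, imp(q, r)), imp(imp(p, q), imp(p, r)))
print(is_tautology(ex1), is_tautology(ex2), is_tautology(ex3))  # True True True
```

The table has 2^n rows for n primitives, so this is only practical for small propositions, which is why example (iii) argues by contradiction instead.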

Sometimes, we don’t want to make statements as strong as “t is always true”. Instead, we might want to say “t is true whenever S is true”. This is known as semantic entailment.

Definition (Semantic entailment). For S ⊆ L, t ∈ L, we say S entails t, S semantically implies t or S |= t if, for any v such that v(s) = 1 for all s ∈ S, we have v(t) = 1.

Here we are using the symbol |= again. This is not an attempt to confuse

students. |= t is equivalent to the statement ∅ |= t.

Example. {p ⇒ q, q ⇒ r} |= (p ⇒ r).

We want to show that for any valuation v with v(p ⇒ q) = v(q ⇒ r) = 1, we have v(p ⇒ r) = 1. We prove the contrapositive.
If v(p ⇒ r) = 0, then v(p) = 1 and v(r) = 0. If v(q) = 0, then v(p ⇒ q) = 0. If v(q) = 1, then v(q ⇒ r) = 0. So v(p ⇒ r) = 0 only if one of v(p ⇒ q) or v(q ⇒ r) is zero.
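For a finite set S, semantic entailment can be checked by brute force over all assignments to the primitives involved. This is a minimal sketch under the assumption that S is a finite list and propositions are nested tuples; `entails` is a hypothetical name.

```python
from itertools import product

BOT = ("bot",)

def imp(a, b):
    return ("imp", a, b)

def neg(a):
    return imp(a, BOT)

def prims(t):
    if t == BOT:
        return set()
    if t[0] == "prim":
        return {t[1]}
    return prims(t[1]) | prims(t[2])

def value(t, w):
    if t == BOT:
        return 0
    if t[0] == "prim":
        return w[t[1]]
    return 0 if (value(t[1], w) == 1 and value(t[2], w) == 0) else 1

def entails(S, t):
    """Check S |= t for a finite list S by trying every assignment."""
    names = sorted(set().union(prims(t), *(prims(s) for s in S)))
    for bits in product((0, 1), repeat=len(names)):
        w = dict(zip(names, bits))
        # a counterexample is a model of S in which t is false
        if all(value(s, w) == 1 for s in S) and value(t, w) == 0:
            return False
    return True

p, q, r = ("prim", "p"), ("prim", "q"), ("prim", "r")
print(entails([imp(p, q), imp(q, r)], imp(p, r)))  # True
print(entails([imp(p, q)], imp(q, p)))             # False
```

With S empty, `entails([], t)` reduces to checking that t is a tautology, matching the remark that |= t means ∅ |= t.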

Note that {p} |= q and p ⇒ q both intuitively mean “if p is true, then q is true”. However, these are very distinct notions. p ⇒ q is a proposition within our theory. It is true (or false) in the sense that valuations take the value 1 (or 0).

On the other hand, when we say {p} |= q, this is a statement in the meta-theory. It is true (or false) in the sense that we decided it is true (or false) based

on some arguments and (informal) proofs, performed in the real world instead

of inside propositional calculus.

The same distinction should be made when we define syntactic implication

later.

Before we dive into syntactic implication, we will define a few convenient

terms for later use.

Definition (Truth and model). If v(t) = 1, then we say that t is true in v, or v is a model of t. For S ⊆ L, a valuation v is a model of S if v(s) = 1 for all s ∈ S.

1.3 Syntactic implication

While semantic implication captures the idea of truthfulness, syntactic implication captures the idea of proofs. We want to say S syntactically implies t if we can prove t from S.

To prove propositions, we need two things. Firstly, we need axioms, which

are statements we can assume to be true in a proof. These are the basic building

blocks from which we will prove our theorems.

Other than axioms, we also need deduction rules. These allow us to deduce statements from other statements.
Our system of deduction consists of the following three axioms:

1. p ⇒ (q ⇒ p)

2. [p ⇒ (q ⇒ r)] ⇒ [(p ⇒ q) ⇒ (p ⇒ r)]

3. (¬¬p) ⇒ p

and the deduction rule of modus ponens: from p and p ⇒ q, we can deduce q.

At first sight, our axioms look a bit weird, especially the second one. We will later see how this particular choice of axioms allows us to prove certain theorems more easily. This choice of axioms can also be motivated by combinatory logic, but we shall not go into the details.

Definition (Proof and syntactic entailment). For any S ⊆ L, a proof of t from S is a finite sequence t_1, t_2, ···, t_n of propositions, with t_n = t, such that each t_i is one of the following:
(i) An axiom
(ii) A member of S
(iii) A proposition t_i such that there exist j, k < i with t_j being t_k ⇒ t_i
If there is a proof of t from S, we say that S proves or syntactically entails t, written S ⊢ t.

If ∅ ⊢ t, we say t is a theorem and write ⊢ t.

In a proof of t from S, t is the conclusion and S is the set of hypotheses or premises.

Example. {p ⇒ q, q ⇒ r} ⊢ p ⇒ r

We go for (p ⇒ q) ⇒ (p ⇒ r) via Axiom 2.

1. [p ⇒ (q ⇒ r)] ⇒ [(p ⇒ q) ⇒ (p ⇒ r)] Axiom 2

2. q ⇒ r Hypothesis

3. (q ⇒ r) ⇒ [p ⇒ (q ⇒ r)] Axiom 1

4. p ⇒ (q ⇒ r) MP on 2, 3

5. (p ⇒ q) ⇒ (p ⇒ r) MP on 1, 4

6. p ⇒ q Hypothesis

7. p ⇒ r MP on 5, 6

Example. ⊢ (p ⇒ p)

We go for [p ⇒ (p ⇒ p)] ⇒ (p ⇒ p).

1. [p ⇒ ((p ⇒ p) ⇒ p)] ⇒ [(p ⇒ (p ⇒ p)) ⇒ (p ⇒ p)] Axiom 2

2. p ⇒ ((p ⇒ p) ⇒ p) Axiom 1

3. [p ⇒ (p ⇒ p)] ⇒ (p ⇒ p) MP on 1, 2

4. p ⇒ (p ⇒ p) Axiom 1

5. p ⇒ p MP on 3, 4
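Checking that a sequence of lines really is a proof is purely mechanical. The sketch below is one possible checker, assuming propositions are nested tuples and axiom schemes are written with plain strings as schema variables; `match`, `is_axiom` and `check_proof` are hypothetical names.

```python
BOT = ("bot",)

def imp(a, b):
    return ("imp", a, b)

def match(pat, t, env):
    """Match proposition t against a schema; schema variables are plain strings."""
    if isinstance(pat, str):
        if pat in env:
            return env[pat] == t
        env[pat] = t
        return True
    if pat == BOT:
        return t == BOT
    return t[0] == "imp" and match(pat[1], t[1], env) and match(pat[2], t[2], env)

AXIOMS = [
    imp("A", imp("B", "A")),                                              # Axiom 1
    imp(imp("A", imp("B", "C")), imp(imp("A", "B"), imp("A", "C"))),      # Axiom 2
    imp(imp(imp("A", BOT), BOT), "A"),                                    # Axiom 3
]

def is_axiom(t):
    return any(match(ax, t, {}) for ax in AXIOMS)

def check_proof(lines, S):
    """True if each line is an axiom, a member of S, or follows by MP from earlier lines."""
    for i, t in enumerate(lines):
        by_mp = any(lines[j] == imp(lines[k], t)
                    for j in range(i) for k in range(i))
        if not (is_axiom(t) or t in S or by_mp):
            return False
    return True

p = ("prim", "p")
pp = imp(p, p)
proof_pp = [
    imp(imp(p, imp(pp, p)), imp(imp(p, pp), pp)),  # Axiom 2
    imp(p, imp(pp, p)),                            # Axiom 1
    imp(imp(p, pp), pp),                           # MP on 1, 2
    imp(p, pp),                                    # Axiom 1
    pp,                                            # MP on 3, 4
]
print(check_proof(proof_pp, set()))  # True
```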

This seems like a really tedious way to prove things. We now prove the deduction theorem, which allows us to find proofs much more easily.

Proposition (Deduction theorem). Let S ⊂ L and p, q ∈ L. Then we have

S ⊢ (p ⇒ q) ⇔ S ∪ {p} ⊢ q.

This says that ⊢ behaves like the connective ⇒ in the language.

Proof. (⇒) Given a proof of p ⇒ q from S, append the lines

– p Hypothesis

– q MP

to obtain a proof of q from S ∪ {p}.

(⇐) Let t_1, t_2, ···, t_n = q be a proof of q from S ∪ {p}. We’ll show that S ⊢ p ⇒ t_i for all i.

We consider different possibilities of t_i:
– t_i is an axiom: Write down
◦ t_i ⇒ (p ⇒ t_i) Axiom 1
◦ t_i Axiom
◦ p ⇒ t_i MP
– t_i ∈ S: Write down
◦ t_i ⇒ (p ⇒ t_i) Axiom 1
◦ t_i Hypothesis
◦ p ⇒ t_i MP
– t_i = p: Write down our proof of p ⇒ p from our example above.
– t_i is obtained by MP: we have some j, k < i such that t_k = (t_j ⇒ t_i). We can assume that S ⊢ (p ⇒ t_j) and S ⊢ (p ⇒ t_k) by induction on i. Now we can write down
◦ [p ⇒ (t_j ⇒ t_i)] ⇒ [(p ⇒ t_j) ⇒ (p ⇒ t_i)] Axiom 2
◦ p ⇒ (t_j ⇒ t_i) Known already
◦ (p ⇒ t_j) ⇒ (p ⇒ t_i) MP
◦ p ⇒ t_j Known already
◦ p ⇒ t_i MP
to get S ⊢ (p ⇒ t_i).

This is the reason why we have this weird-looking Axiom 2 — it enables us to

easily prove the deduction theorem.
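The (⇐) direction of the proof is constructive: it rewrites a proof of q from S ∪ {p} line by line into a proof of p ⇒ q from S. The following sketch follows the four cases above, assuming the input really is a valid proof and using a nested-tuple encoding; `discharge`, `check_proof` and the other names are hypothetical.

```python
BOT = ("bot",)

def imp(a, b):
    return ("imp", a, b)

def match(pat, t, env):
    """Match t against an axiom schema whose variables are plain strings."""
    if isinstance(pat, str):
        if pat in env:
            return env[pat] == t
        env[pat] = t
        return True
    if pat == BOT:
        return t == BOT
    return t[0] == "imp" and match(pat[1], t[1], env) and match(pat[2], t[2], env)

AXIOMS = [
    imp("A", imp("B", "A")),
    imp(imp("A", imp("B", "C")), imp(imp("A", "B"), imp("A", "C"))),
    imp(imp(imp("A", BOT), BOT), "A"),
]

def is_axiom(t):
    return any(match(ax, t, {}) for ax in AXIOMS)

def check_proof(lines, S):
    return all(is_axiom(t) or t in S or
               any(lines[j] == imp(lines[k], t) for j in range(i) for k in range(i))
               for i, t in enumerate(lines))

def p_implies_p(p):
    """The five-line proof of p ⇒ p from the earlier example."""
    pp = imp(p, p)
    return [imp(imp(p, imp(pp, p)), imp(imp(p, pp), pp)),
            imp(p, imp(pp, p)), imp(imp(p, pp), pp), imp(p, pp), pp]

def discharge(lines, S, p):
    """Turn a valid proof of q from S ∪ {p} into a proof of (p ⇒ q) from S."""
    out = []
    for i, t in enumerate(lines):
        if t == p:
            out += p_implies_p(p)
        elif is_axiom(t) or t in S:
            out += [imp(t, imp(p, t)), t, imp(p, t)]       # Axiom 1, t itself, MP
        else:
            # t was obtained by MP from some earlier t_j and (t_j ⇒ t)
            tj = next(a for a in lines[:i] if imp(a, t) in lines[:i])
            out += [imp(imp(p, imp(tj, t)), imp(imp(p, tj), imp(p, t))),  # Axiom 2
                    imp(imp(p, tj), imp(p, t)),                           # MP
                    imp(p, t)]                                            # MP
    return out

p, q, r = ("prim", "p"), ("prim", "q"), ("prim", "r")
S = {imp(p, q), imp(q, r)}
short = [p, imp(p, q), q, imp(q, r), r]      # proof of r from S ∪ {p}
long = discharge(short, S, p)                # proof of p ⇒ r from S
print(len(long), check_proof(long, S))
```

In the MP case the lines p ⇒ (t_j ⇒ t_i) and p ⇒ t_j are not re-emitted, because earlier iterations have already placed them in the output; this is the "Known already" step of the proof.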

This theorem has a “converse”. Suppose we have a deduction system

that admits modus ponens, and the deduction theorem holds for this system.

Then axioms (1) and (2) must hold in the system, since we can prove them using

the deduction theorem and modus ponens. However, we are not able to deduce

axiom (3) from just modus ponens and the deduction theorem.

Example. We want to show {p ⇒ q, q ⇒ r} ⊢ (p ⇒ r). By the deduction theorem, it is enough to show that {p ⇒ q, q ⇒ r, p} ⊢ r, which is trivial by applying MP twice.

Now we have two notions: |= and ⊢. How are they related? We want to show that they are equivalent: if something is true, we can prove it; if we can prove something, it must be true.

Aim. Show that S ⊢ t if and only if S |= t.

This is known as the completeness theorem, which is made up of two directions:

(i) Soundness: If S ⊢ t, then S |= t. “Our axioms aren’t absurd”

(ii) Adequacy: If S |= t, then S ⊢ t. “Our axioms are strong enough to be able to deduce, from S, all semantic consequences of S.”

We first prove the easy bit:

Proposition (Soundness theorem). If S ⊢ t, then S |= t.

Proof. Given a valuation v with v(s) = 1 for all s ∈ S, we need to show that v(t) = 1. We will show that every line t_i in the proof has v(t_i) = 1.
If t_i is an axiom, then v(t_i) = 1 since axioms are tautologies. If t_i is a hypothesis, then by assumption v(t_i) = 1. If t_i is obtained by modus ponens, say from t_j and t_j ⇒ t_i, then since v(t_j) = 1 and v(t_j ⇒ t_i) = 1, we must have v(t_i) = 1.

Note that soundness holds whenever our axioms are all tautologies. Even if

we had silly axioms that are able to prove almost nothing, as long as they are

all tautologies, it will be sound.

Now we have to prove adequacy. It seems like a big and scary thing to prove.

Given that a statement is true, we have to find a proof for it, but we all know

that finding proofs is hard!

We first prove a special case. To do so, we first define consistency.

Definition (Consistent). S is inconsistent if S ⊢ ⊥. S is consistent if it is not inconsistent.

The special case we will prove is the following:

Theorem (Model existence theorem). If S |= ⊥, then S ⊢ ⊥. i.e. if S has no model, then S is inconsistent. Equivalently, if S is consistent, then S has a model.

While we called this a “special case”, it is in fact all we need to know to

prove adequacy. If we are given S |= t, then S ∪ {¬t} |= ⊥. Hence using the model existence theorem, we know that S ∪ {¬t} ⊢ ⊥. Hence by the deduction

theorem, we know that S ⊢ ¬¬t. But ⊢ (¬¬t) ⇒ t by Axiom 3. So S ⊢ t.

As a result, some books call this the “completeness theorem” instead, because

the rest of the completeness theorem follows trivially from this.

The idea of the proof is that we’d like to define v : L → {0, 1} by
v(p) = 1 if p ∈ S, and v(p) = 0 if p ∉ S.

However, this is obviously going to fail, because we could have some p such that S ⊢ p but p ∉ S, i.e. S is not deductively closed. Yet this is not a serious problem: we take the deductive closure first, i.e. add all the statements that S can prove.

But there is a more serious problem. There might be a p with S ⊬ p and S ⊬ ¬p. This is the case if, say, p never appears in S. The idea here is to arbitrarily declare that p is true or false, and add p or ¬p to S. What we have to prove is that we can do so without making S inconsistent.

We’ll prove this in the following lemma:

Lemma. For consistent S ⊂ L and p ∈ L, at least one of S ∪ {p} and S ∪ {¬p} is consistent.

Proof. Suppose instead that both S ∪ {p} ⊢ ⊥ and S ∪ {¬p} ⊢ ⊥. Then by the deduction theorem, S ⊢ ¬p and S ⊢ ¬¬p. Since ¬¬p is (¬p) ⇒ ⊥, modus ponens gives S ⊢ ⊥, contradicting consistency of S.

Now we can prove the completeness theorem. Here we’ll assume that the primitives P, and hence the language L, are countable. This is a reasonable thing to assume, since we can only talk about finitely many primitives (we only have a finite life), so uncountably many primitives would be of no use.

However, this is not a good enough excuse to not prove our things properly.

To prove the whole completeness theorem, we will need to use Zorn’s lemma,

which we will discuss in Chapter 3. For now, we will just prove the countable

case.

Proof. Assuming that L is countable, list L as {t_1, t_2, ···}.
Let S_0 = S. Then at least one of S ∪ {t_1} and S ∪ {¬t_1} is consistent. Pick S_1 to be the consistent one. Then let S_2 = S_1 ∪ {t_2} or S_1 ∪ {¬t_2} such that S_2 is consistent. Continue inductively.
Set S̄ = S_0 ∪ S_1 ∪ S_2 ∪ ···. Then p ∈ S̄ or ¬p ∈ S̄ for each p ∈ L by construction. Also, we know that S̄ is consistent. If we had S̄ ⊢ ⊥, then since proofs are finite, there is some S_n that contains all assumptions used in the proof of S̄ ⊢ ⊥. Hence S_n ⊢ ⊥, but we know that all S_n are consistent.
Finally, we check that S̄ is deductively closed: if S̄ ⊢ p, we must have p ∈ S̄. Otherwise, ¬p ∈ S̄. But this implies that S̄ is inconsistent.

Define v : L → {0, 1} by
v(p) = 1 if p ∈ S̄, and v(p) = 0 if not.
All that is left to show is that this is indeed a valuation.
First of all, we have v(⊥) = 0 as ⊥ ∉ S̄ (since S̄ is consistent).

For p ⇒ q, we check all possible cases.

(i) If v(p) = 1, v(q) = 0, we have p ∈ S̄, q ∉ S̄. We want to show p ⇒ q ∉ S̄. Suppose instead that p ⇒ q ∈ S̄. Then S̄ ⊢ q by modus ponens. Hence q ∈ S̄ since S̄ is deductively closed. This is a contradiction. Hence we must have v(p ⇒ q) = 0.
(ii) If v(q) = 1, then q ∈ S̄. We want to show p ⇒ q ∈ S̄. By our first axiom, we know that ⊢ q ⇒ (p ⇒ q). So S̄ ⊢ p ⇒ q. So p ⇒ q ∈ S̄ by deductive closure. Hence we have v(p ⇒ q) = 1.
(iii) If v(p) = 0, then p ∉ S̄. So ¬p ∈ S̄. We want to show p ⇒ q ∈ S̄.
– Since ¬p ∈ S̄ and S̄ is deductively closed, it suffices to show {¬p} ⊢ p ⇒ q.
– By the deduction theorem, this is equivalent to proving {p, ¬p} ⊢ q.
– We know that {p, ¬p} ⊢ ⊥. So it is sufficient to show {⊥} ⊢ q.
– By axiom 3, it suffices to show {⊥} ⊢ ¬¬q.
– By the deduction theorem, this is again equivalent to showing ⊢ ⊥ ⇒ ¬¬q.
– By definition of ¬, this is equivalent to showing ⊢ ⊥ ⇒ (¬q ⇒ ⊥).
But this is just an instance of the first axiom. So we know that S̄ ⊢ p ⇒ q, and hence p ⇒ q ∈ S̄ by deductive closure. So v(p ⇒ q) = 1.

Note that the last case is the first time we really use Axiom 3.

By the remark before the proof, we have

Corollary (Adequacy theorem). Let S ⊂ L, t ∈ L. Then S |= t implies S ⊢ t.

Theorem (Completeness theorem). Let S ⊂ L and t ∈ L. Then S |= t if and only if S ⊢ t.

This theorem has two nice consequences.

Corollary (Compactness theorem). Let S ⊂ L and t ∈ L with S |= t. Then there is some finite S′ ⊂ S with S′ |= t.

Proof. Trivial with |= replaced by ⊢, because proofs are finite.

Sometimes when people say compactness theorem, they mean the special case where t = ⊥. This says that if every finite subset of S has a model, then S has a model. This result can be used to prove some rather unexpected results in other fields such as graph theory, but we will not go into details.

Corollary (Decidability theorem). Let S ⊂ L be a finite set and t ∈ L. Then there exists an algorithm that determines, in finite and bounded time, whether or not S ⊢ t.

Proof. Trivial with ⊢ replaced by |=, by making a truth table.

This is a rather nice result, because we know that proofs are hard to find in

general. However, this theorem only tells you whether a proof exists, without

giving you the proof itself!

2 Well-orderings and ordinals

In the coming two sections, we will study different orderings. The focus of this

chapter is well-orders, while the focus of the next is partial orders.

A well-order on a set S is a special type of total order where every non-empty subset of S has a least element. Among the many nice properties of well-orders, it is possible to do induction and recursion on well-orders.

Our interest, however, does not lie in well-orders themselves. Instead, we are interested in the “lengths” of well-orders. Officially, we call them the order types of the well-orders. Each order type is known as an ordinal.

There are many things we can do with ordinals. We can add and multiply

them to form “longer” well-orders. While we will not make much use of them in

this chapter, in later chapters, we will use ordinals to count “beyond infinity”,

similar to how we count finite things using natural numbers.

2.1 Well-orderings

We start with a few definitions.

Definition ((Strict) total order). A (strict) total order or linear order is a pair

(X, <), where X is a set and < is a relation on X that satisfies

(i) x ≮ x for all x (irreflexivity)

(ii) If x < y, y < z, then x < z (transitivity)

(iii) x < y or x = y or y < x (trichotomy)

We have the usual shorthands for total orders. For example, x > y means y < x and x ≤ y means (x < y or x = y).

It is also possible for a total order to be defined in terms of ≤ instead of <.

Definition ((Non-strict) total order). A (non-strict) total order is a pair (X, ≤), where X is a set and ≤ is a relation on X that satisfies

(i) x ≤ x (reflexivity)

(ii) x ≤ y and y ≤ z implies x ≤ z (transitivity)

(iii) x ≤ y and y ≤ x implies x = y (antisymmetry)

(iv) x ≤ y or y ≤ x (trichotomy)

Example.

(i) N, Z, Q, R with the usual orders are total orders.
(ii) On N^+ (the positive integers), ‘x < y if x | y and x ≠ y’ is not trichotomous, and so not a total order.
(iii) On P(X), define ‘x ≤ y’ if x ⊆ y. This is not a total order since it is not trichotomous (for |X| > 1).
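On a finite set, the three axioms of a strict total order can be checked exhaustively. The sketch below does this for finite pieces of the examples above; the function name `strict_order_axioms` and the choice of the finite range 1 to 6 are assumptions made for illustration.

```python
from itertools import product

def strict_order_axioms(X, lt):
    """Report which strict-total-order axioms the relation lt satisfies on the finite set X."""
    X = list(X)
    return {
        "irreflexive": all(not lt(x, x) for x in X),
        "transitive": all(lt(x, z) for x, y, z in product(X, repeat=3)
                          if lt(x, y) and lt(y, z)),
        "trichotomous": all(lt(x, y) or x == y or lt(y, x)
                            for x, y in product(X, repeat=2)),
    }

usual = strict_order_axioms(range(1, 7), lambda x, y: x < y)
# x < y if x divides y and x != y
divides = strict_order_axioms(range(1, 7), lambda x, y: x != y and y % x == 0)
# proper inclusion on the power set of {1, 2}
subsets = [frozenset(s) for s in ([], [1], [2], [1, 2])]
inclusion = strict_order_axioms(subsets, lambda a, b: a < b)

print(usual)
print(divides)    # fails only trichotomy, e.g. 2 and 3 are incomparable
print(inclusion)  # fails only trichotomy, e.g. {1} and {2} are incomparable
```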

While there can be different total orders, the particular kind we are interested

in is well-orders.

Definition (Well order). A total order (X, <) is a well-ordering if every (non-empty) subset has a least element, i.e.
(∀S ⊆ X)[S ≠ ∅ ⇒ (∃x ∈ S)(∀y ∈ S) y ≥ x].

Example.

(i) N with usual ordering is a well-order.

(ii) Z, Q, R

are not well-ordered because the whole set itself does not have a

least element.

(iii) {x ∈ Q : x ≥ 0} is not well-ordered. For example, {x ∈ Q : x > 0} has no least element.

(iv) {1 − 1/n : n = 2, 3, 4, ···} is well-ordered because it is isomorphic to the naturals.
(v) {1 − 1/n : n = 2, 3, 4, ···} ∪ {1} is also well-ordered. If the subset is only {1}, then 1 is the least element. Otherwise take the least element of the remaining set.

(vi) Similarly, {1 − 1/n : n = 2, 3, 4, ···} ∪ {2} is well-ordered.

(vii) {1 − 1/n : n = 2, 3, 4, ···} ∪ {2 − 1/n : n = 2, 3, 4, ···} is also well-ordered. This is a good example to keep in mind.

There is another way to characterize well-orderings in terms of infinite decreasing sequences.

Proposition. A total order is a well-ordering if and only if it has no infinite

strictly decreasing sequence.

Proof. If x_1 > x_2 > x_3 > ···, then {x_i : i ∈ N} has no least element.
Conversely, if non-empty S ⊂ X has no least element, then each x ∈ S has x′ ∈ S with x′ < x. Similarly, we can find some x′′ < x′ ad infinitum. So
x > x′ > x′′ > x′′′ > ···
is an infinite decreasing sequence.

Like all other axiomatic theories we study, we identify two total orders to be

isomorphic if they are “the same up to renaming of elements”.

Definition (Order isomorphism). Say the total orders X, Y are isomorphic if there exists a bijection f : X → Y that is order-preserving, i.e. x < y ⇒ f(x) < f(y).

Example.

(i) N and {1 −1/n : n = 2, 3, 4, ···} are isomorphic.

(ii) {1 − 1/n : n = 2, 3, 4, ···} ∪ {1} is isomorphic to {1 − 1/n : n = 2, 3, 4, ···} ∪ {2}.
(iii) {1 − 1/n : n = 2, 3, 4, ···} and {1 − 1/n : n = 2, 3, 4, ···} ∪ {1} are not isomorphic because the second has a greatest element but the first doesn’t.
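For example (i), one candidate isomorphism sends k to 1 − 1/(k + 2). The sketch below checks, on a finite range only and with exact rational arithmetic, that this map is strictly increasing and stays below 1; it assumes N starts at 0, and the name `f` is illustrative.

```python
from fractions import Fraction

def f(k):
    """Candidate isomorphism N -> {1 - 1/n : n = 2, 3, 4, ...}, sending k to 1 - 1/(k + 2)."""
    return Fraction(1) - Fraction(1, k + 2)

values = [f(k) for k in range(50)]

# order-preserving on the sampled range: f(0) < f(1) < f(2) < ...
order_preserving = all(values[i] < values[i + 1] for i in range(len(values) - 1))

# every value lies strictly below 1, so adjoining 1 (or 2) adds a greatest element
below_one = all(v < 1 for v in values)

print(order_preserving, below_one)
```

A finite check of course proves nothing about the infinite sets; it only illustrates why adjoining a top element, as in (ii) and (iii), changes the order type.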

Recall from IA Numbers and Sets that in N, the well-ordering principle is equivalent to the principle of induction. We proved that we can do induction simply by assuming that N is well-ordered. Using the same proof, we should be able to prove that we can do induction on any well-ordered set.

Of course, a general well-ordered set does not have the concept of “+1”, so

we won’t be able to formulate weak induction. Instead, our principle of induction

is the form taken by strong induction.

Proposition (Principle of induction). Let X be a well-ordered set. Suppose S ⊆ X has the property:
(∀x)[((∀y) y < x ⇒ y ∈ S) ⇒ x ∈ S],
then S = X.
In particular, if a property P(x) satisfies
(∀x)[((∀y) y < x ⇒ P(y)) ⇒ P(x)],
then P(x) for all x.

Proof. Suppose S ≠ X. Let x be the least element of X \ S. Then by minimality of x, for all y, y < x ⇒ y ∈ S. Hence x ∈ S. Contradiction.

Using proof by induction, we can prove the following property of well-orders:

Proposition. Let X and Y be isomorphic well-orderings. Then there is a unique isomorphism between X and Y.

This is something special to well-orders. This is not true for general total

orderings. For example, x ↦ x and x ↦ x − 13 are both isomorphisms Z → Z. It is also not true for, say, groups. For example, there are two possible isomorphisms from Z to itself.

Proof. Let f and g be two isomorphisms X → Y. To show that f = g, it is enough, by induction, to show f(x) = g(x) given f(y) = g(y) for all y < x.
Given a fixed x, let S = {f(y) : y < x}. We know that Y \ S is non-empty since f(x) ∉ S. So let a be the least member of Y \ S. Then we must have f(x) = a. Otherwise, we will have a < f(x) by minimality of a, which implies that f^{-1}(a) < x since f is order-preserving. However, by definition of S, this implies that a = f(f^{-1}(a)) ∈ S. This is a contradiction since a ∈ Y \ S.
By the induction hypothesis, for y < x, we have f(y) = g(y). So we have S = {g(y) : y < x} as well. Hence g(x) = min(Y \ S) = f(x).

If we have an ordered set, we can decide to cut off the top of the set and

keep the bottom part. What is left is an initial segment.

Definition (Initial segment). A subset Y of a totally ordered X is an initial segment if
x ∈ Y, y < x ⇒ y ∈ Y.

Example. For any x ∈ X, the set I_x = {y ∈ X : y < x} is an initial segment. However, not every initial segment of X needs to be of this form. For example, {x : x ≤ 3} ⊆ R and {x : x < 0 or x^2 < 2} ⊆ Q are both initial segments not of this form.

The next nice property of well-orders we have is that every proper initial

segment is of this form.

Proposition. Every proper initial segment Y of a well-ordered set X is of the form I_x = {y ∈ X : y < x}.

Proof. Take x = min(X \ Y), which exists since Y is proper. Then for any y ∈ I_x, we have y < x. So y ∈ Y by definition of x. So I_x ⊆ Y.
On the other hand, if y ∈ Y, then definitely y ≠ x. We also cannot have y > x since this implies x ∈ Y. Hence we must have y < x. So y ∈ I_x. Hence Y ⊆ I_x. So Y = I_x.

The next nice property we want to show is that in a well-ordering X, every subset S is isomorphic to an initial segment.

Note that this is very false for a general total order. For example, in Z, no initial segment is isomorphic to the subset {1, 2, 3} since every initial segment is either infinite or empty. Alternatively, in R, Q is not isomorphic to an initial segment.

It is intuitively obvious how we can prove this. We simply send the minimum element of S to the minimum of X, and then continue recursively. However, how can we justify recursion? If we have the well-order {1/2, 2/3, 3/4, ···, 1}, then we will never reach 1 if we attempt to write down each step of the recursion, since we can only get there after infinitely many steps. We will need to define some sort of “infinite recursion” to justify our recursion on general well-orders.

We first define the restriction of a function:

Definition (Restriction of function). For f : A → B and C ⊆ A, the restriction of f to C is
f|_C = {(x, f(x)) : x ∈ C}.

In this theorem (and the subsequent proof), we are treating functions as explicitly a set of ordered pairs (x, f(x)). We will perform set operations on functions and use unions to “stitch together” functions.

Theorem (Definition by recursion). Let X be a well-ordered set and Y be any set. Then for any function G : P(X × Y) → Y, there exists a function f : X → Y such that
f(x) = G(f|_{I_x})
for all x.

This is a rather weird definition. Intuitively, it means that G takes previous values of f(x) and returns the desired output. This means that in defining f at x, we are allowed to make use of values of f on I_x. For example, we define f(n) = nf(n − 1) for the factorial function, with f(0) = 1.
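On a finite well-order the theorem can be illustrated directly: process the elements in increasing order and hand G the part of f built so far. This is only a sketch for finite X, where ordinary iteration suffices; it does not replace the proof for general well-orders. The names `define_by_recursion`, `G_factorial` and `G_fibonacci` are hypothetical, and a dictionary stands in for the set of ordered pairs.

```python
def define_by_recursion(X, G):
    """X: elements of a finite well-order listed in increasing order.
    G: takes a dict representing f restricted to I_x and returns f(x)."""
    f = {}
    for x in X:
        # here f is exactly f|_{I_x}: it is defined on all y < x and nowhere else
        f[x] = G(dict(f))
    return f

def G_factorial(h):
    # h is defined on {0, ..., n - 1}, so n = len(h) is the point being defined
    n = len(h)
    return 1 if n == 0 else n * h[n - 1]

def G_fibonacci(h):
    # uses two previous values, which plain "f(n) from f(n - 1)" could not express
    n = len(h)
    return n if n < 2 else h[n - 1] + h[n - 2]

fact = define_by_recursion(range(6), G_factorial)
fib = define_by_recursion(range(11), G_fibonacci)
print(fact[5], fib[10])  # 120 55
```

Passing the whole restriction to G, rather than just the previous value, is what lets the same scheme cover points with no immediate predecessor in an infinite well-order.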

Proof. We might want to jump into the proof and define f(0) = G(∅), where 0 is the minimum element. Then we define f(1) = G(f(0)) etc. But doing so is simply recursion, which is the thing we want to prove that works!

Instead, we use the following clever trick: We define an “

is an attempt” to

mean

h : I → Y for some initial segment I of X, and h(x) = G(h|

) for x ∈ I.

The idea is to show that for any

, there is an attempt

that is defined at

Then take the value

(

) to be

(

). However, we must show this is well-defined

first:

Claim. If attempts h and h′ are defined at x, then h(x) = h′(x).

By induction on x, it is enough to show that h(x) = h′(x) assuming h(y) = h′(y) for all y < x. But then h(x) = G(h|_{I_x}) = G(h′|_{I_x}) = h′(x). So done.

Claim. For any x, there must exist an attempt h that is defined at x.

Again, we may assume (by induction) that for each y < x, there exists an attempt h_y defined at y. Then we put all these functions together, and take h′ = ⋃_{y<x} h_y. This is defined for all y < x, and is well-defined since the h_y never disagree.

Finally, add to it (x, G(h′|_{I_x})). Then h = h′ ∪ {(x, G(h′|_{I_x}))} is an attempt defined at x.

Now define f : X → Y by f(x) = y if there exists an attempt h, defined at x, with h(x) = y.

Claim. There is a unique such f.

Suppose f and f′ both work. Then if f(y) = f′(y) for all y < x, then f(x) = f′(x) by definition. So by induction, we know for all x, we have f(x) = f′(x).

With the tool of recursion, we can prove that every subset of a well-order is isomorphic to an initial segment.

Lemma (Subset collapse). Let X be a well-ordering and let Y ⊆ X. Then Y is isomorphic to an initial segment of X. Moreover, this initial segment is unique.

Proof. For f : Y → X to be an order-preserving bijection with an initial segment of X, we need to map x to the smallest thing not yet mapped to, i.e.

f(x) = min(X \ {f(y) : y < x}).

To be able to take the minimum, we have to make sure the set is non-empty, i.e. {f(y) : y < x} ≠ X. We can show this by proving that f(z) < x for all z < x by induction, and hence x ∉ {f(y) : y < x}.

Then by the recursion theorem, this function exists and is unique.
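On a finite well-order the collapsing map can be computed directly from the formula f(x) = min(X \ {f(y) : y < x}). This is a sketch only; the function name subset_collapse and the sample sets are illustrative assumptions.

```python
def subset_collapse(X, Y):
    """Send each y in Y to the least element of X not used by smaller y.

    X is a finite set of integers with its usual order and Y is a subset.
    """
    f = {}
    for y in sorted(Y):
        used = set(f.values())  # the values f(z) for z in Y with z < y
        f[y] = min(x for x in sorted(X) if x not in used)
    return f


X = range(10)
Y = [2, 3, 7, 9]
f = subset_collapse(X, Y)
print(f)  # {2: 0, 3: 1, 7: 2, 9: 3}
```

The image {0, 1, 2, 3} is an initial segment of X, and f preserves the order of Y.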

This implies that a well-ordered X can never be isomorphic to a proper initial segment of itself. This is since X is isomorphic to itself by the identity function, and uniqueness shows that it cannot be isomorphic to another initial segment.

Using the idea of initial segments, we can define an order comparing different

well-orders themselves.

Notation. Write X ≤ Y if X is isomorphic to an initial segment of Y.

We write X < Y if X ≤ Y but X is not isomorphic to Y, i.e. X is isomorphic to a proper initial segment of Y.

Example. If X = N, Y = {1/2, 2/3, 3/4, ···} ∪ {1}, then X ≤ Y.

We will show that this is a total order. Of course, we identify two well-orders

as “equal” when they are isomorphic.

Reflexivity and transitivity are straightforward. So we prove trichotomy and

antisymmetry:

Theorem. Let X, Y be well-orderings. Then X ≤ Y or Y ≤ X.

Proof. We attempt to define f : X → Y by

f(x) = min(Y \ {f (y) : y < x}).

By the law of excluded middle, this function is either well-defined or not.

If it is well-defined, then it is an isomorphism from X to an initial segment of Y.

If it is not, then there is some x such that {f(y) : y < x} = Y and we cannot take the minimum. Then f is a bijection between I_x = {y : y < x} and Y. So f^{-1} is an isomorphism between Y and an initial segment of X.

Hence either X ≤ Y or Y ≤ X.

Theorem. Let X, Y be well-orderings with X ≤ Y and Y ≤ X. Then X and Y are isomorphic.

Proof. Since X ≤ Y, there is an order-preserving function f : X → Y that bijects X with an initial segment of Y. Similarly, since Y ≤ X, we get an analogous g : Y → X. Then g ◦ f : X → X is an order-preserving bijection between X and an initial segment of X.

Since X is not isomorphic to a proper initial segment of itself, the image of g ◦ f must be X itself. Hence g ◦ f is a bijection.

Similarly, f ◦ g is a bijection. Hence f and g are both bijections, and X and Y are isomorphic.

2.2 New well-orderings from old

Given a well-ordering X, we want to create more well-orderings. We’ve previously shown that we can create a shorter one by taking an initial segment. In this section, we will explore two ways to make longer well-orderings.

Add one element

We can extend a well-ordering by exactly one element. This is known as the

successor.

Definition (Successor). Given X, choose some x ∉ X and define a well-ordering on X ∪ {x} by setting y < x for all y ∈ X. This is the successor of X, written X^+.

We clearly have X < X^+.

Put some together

More interestingly, we want to “stitch together” many well-orderings. However,

we cannot just arbitrarily stitch well-orderings together. The well-orderings must

satisfy certain nice conditions for this to be well-defined.

Definition (Extension). For well-orderings (X, <_X) and (Y, <_Y), we say Y extends X if X is a proper initial segment of Y and <_X and <_Y agree when defined.

Note that we explicitly require X to be an initial segment of Y; simply being a subset of Y will not work, for reasons that will become clear shortly.

Definition (Nested family). We say well-orderings {X_i : i ∈ I} form a nested family if for any i, j ∈ I, either X_i extends X_j, or X_j extends X_i.

Proposition. Let {X_i : i ∈ I} be a nested set of well-orderings. Then there exists a well-ordering X with X_i ≤ X for all i.

Proof. Let X = ⋃_{i∈I} X_i with < defined on X as ⋃_{i∈I} <_i (where <_i is the ordering of X_i), i.e. we inherit the orders from the X_i’s. This is clearly a total ordering. Since {X_i : i ∈ I} is a nested family, each X_i is an initial segment of X.

To show that it is a well-ordering, let S ⊆ X be a non-empty subset of X. Then S ∩ X_i is non-empty for some i. Let x be the minimum element (in X_i) of S ∩ X_i. Then also for any y ∈ S, we must have x ≤ y, as X_i is an initial segment of X.

Note that if we didn’t require X to be an initial segment of Y when defining “extension”, then the above proof will not work. For example, we can take the collection of all subsets X_n = {x ≥ −n : x ∈ Z}, and their union would be Z, which is not well-ordered.

2.3 Ordinals

We have already shown that the collection of all well-orderings is a total order.

But is it a well-ordering itself? To investigate this issue further, we first define

ourselves a convenient way of talking about well-orderings.

Definition (Ordinal). An ordinal is a well-ordered set, with two regarded as

the same if they are isomorphic. We write ordinals as Greek letters α, β etc.

We would want to define ordinals as equivalence classes of well-orders under

isomorphism, but we cannot, because they do not form a set. We will provide a

formal definition of ordinals later when we study set theory.

Definition (Order type). If a well-ordering X has corresponding ordinal α, we say X has order type α, and write otp(X) = α.

Notation. For each k ∈ N, we write k for the order type of the (unique) well-ordering of size k. We write ω for the order type of N.

Example. In R, {2, 3, 5, 6} has order type 4. {1/2, 2/3, 3/4, ···} has order type ω.

Notation. For ordinals α, β, write α ≤ β if X ≤ Y for some X of order type α and Y of order type β. This does not depend on the choice of X and Y (since any two choices must be isomorphic).

Proposition. Let α be an ordinal. Then the ordinals < α form a well-ordering of order type α.

Notation. Write I_α = {β : β < α}.

Proof. Let X have order type α. The well-orderings < X are precisely (up to isomorphism) the proper initial segments of X (by uniqueness of subset collapse). But these are the I_x for all x ∈ X. So we can biject X with the well-orderings < X by x ↦ I_x.

Finally, we can prove that the ordinals are well-ordered.

Proposition. Let S be a non-empty set of ordinals. Then S has a least element.

Proof. Choose α ∈ S. If it is minimal, done.

If not, then S ∩ I_α is non-empty. But I_α is well-ordered. So S ∩ I_α has a least element, β. Then this is a minimal element of S.

However, it would be wrong to say that the ordinals form a well-ordered set,

for the very reason that they don’t form a set.

Theorem (Burali-Forti paradox). The ordinals do not form a set.

Proof. Suppose not. Let X be the set of ordinals. Then X is a well-ordering. Let its order-type be α. Then X is isomorphic to I_α, a proper initial subset of X. Contradiction.

Recall that we could create new well-orderings from old via adding one

element and taking unions. We can translate these into ordinal language.

Given an ordinal α, suppose that X is the corresponding well-order. Then we define α^+ to be the order type of X^+.

If we have a set {α_i : i ∈ I} of ordinals, we can stitch them together to form a new well-order. In particular, we apply “nested well-orders” to the initial segments {I_{α_i} : i ∈ I}. This produces an upper bound of the ordinals α_i. Since the ordinals are well-ordered, we know that there is a least upper bound. We call this the supremum of the set {α_i : i ∈ I}, written sup{α_i : i ∈ I}. In fact, the upper bound created by nesting well-orders is the least upper bound.

Example. {2, 4, 6, 8, ···} has supremum ω.

Now we have two ways of producing ordinals: +1 and supremum.

We can generate a lot of ordinals now:

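0, 1, 2, ···, ω, ω + 1, ω + 2, ···, ω + ω = ω · 2,

ω · 2 + 1, ω · 2 + 2, ω · 2 + 3, ···, ω · 3, ω · 4, ω · 5, ···, ω · ω = ω^2,

ω^2 + 1, ω^2 + 2, ω^2 + 3, ···, ω^2 + ω, ···, ω^2 + ω · 2, ···, ω^2 · 2,

ω^2 · 3, ω^2 · 4, ω^2 · 5, ···, ω^3, ···, ω^ω, ···, ω^(ω+1),

ω^(ω+2), ···, ω^(ω·2), ···, ω^(ω^(ω^···)) = ε_0,

ε_0 + 1, ···, ε_0 · 2, ···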

Here we introduced a lot of different notations. For example, we wrote ω + 1 to mean ω^+, and ω · 2 = sup{ω, ω + 1, ω + 2, ···}. We will formally define these notations later.

We have written a lot of ordinals above, some of which are really huge. However, all the ordinals above are countable. The operations we have done so far are adding one element and taking countable unions. So the results are all countable. So is there an uncountable ordinal?

Theorem. There is an uncountable ordinal.

Proof. This is easy by looking at the supremum of the set of all countable ordinals. However, this works only if the collection of countable ordinals is a set.

Let A = {R ∈ P(N × N) : R is a well-ordering of a subset of N}. So A ⊆ P(N × N). Then B = {order type of R : R ∈ A} is the set of all countable ordinals.

Let ω_1 = sup B. Then ω_1 is uncountable. Indeed, if ω_1 were countable, then it would be the greatest countable ordinal, but ω_1 + 1 is greater and is also countable.

By definition, ω_1 is the least uncountable ordinal, and everything in our previous big list of ordinals is less than ω_1.

There are two strange properties of ω_1:

(i) ω_1 is an uncountable ordering, yet for every x ∈ ω_1, the set {y : y < x} is countable.

(ii) Every sequence in ω_1 is bounded, since its supremum is a countable union of countable sets, which is countable.

In general, we have the following theorem:

Theorem (Hartogs’ lemma). For any set X, there is an ordinal that does not inject into X.

Proof. As before, with B = {α : α injects into X}.

Notation. Write γ(X) for the least ordinal that does not inject into X. e.g. γ(ω) = ω_1.

2.4 Successors and limits

In general, we can divide ordinals into two categories. The criterion is as follows:

Given an ordinal α, is there a greatest element of α? i.e. does I_α = {β : β < α} have a greatest element?

If yes, say β is the greatest element. Then γ ∈ I_α ⇔ γ ≤ β. So I_α = {β} ∪ I_β. In other words, α = β^+.

Definition (Successor ordinal). An ordinal α is a successor ordinal if there is a greatest element β below it. Then α = β^+.

On the other hand, if no, then for any γ < α, there exists β < α such that β > γ. So α = sup{β : β < α}.

Definition (Limit ordinal). An ordinal α is a limit if it has no greatest element below it. We usually write λ for limit ordinals.

Example. 5 and ω^+ are successors. ω and 0 are limits (0 is a limit because it has no element below it, let alone a greatest one!).

2.5 Ordinal arithmetic

We want to define ordinal arithmetic such as + and ·, so that we can make formal sense out of our notations such as ω + ω in our huge list of ordinals.

We first start with addition.

Definition (Ordinal addition (inductive)). Define α + β by recursion on β (α is fixed):

– α + 0 = α.

– α + β^+ = (α + β)^+.

– α + λ = sup{α + γ : γ < λ} for non-zero limit λ.

Note that officially, we cannot do “recursion on the ordinals”, since the ordinals don’t form a set. So what we officially do is that we define α + γ on {γ : γ < β} recursively for each ordinal β. Then by uniqueness of recursions, we can show that this addition is well-defined.

Example. ω + 1 = (ω + 0)^+ = ω^+.

ω + 2 = (ω + 1)^+ = ω^{++}.

1 + ω = sup{1 + n : n < ω} = sup{1, 2, 3, ···} = ω.

It is very important to note that addition is not commutative! This asymmetry

arises from our decision to perform recursion on β instead of α.
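The failure of commutativity can be checked mechanically for ordinals below ω · ω. The sketch below assumes a hypothetical encoding, not one used in these notes: the pair (a, b) stands for ω · a + b with a, b natural numbers, and the rule in add follows from 1 + ω = ω.

```python
from typing import NamedTuple


class Ord(NamedTuple):
    """The ordinal ω·a + b with natural numbers a, b (so it is below ω·ω)."""
    a: int
    b: int


def add(x: Ord, y: Ord) -> Ord:
    # If y has an ω-part, the finite tail of x is absorbed, since b + ω = ω.
    if y.a > 0:
        return Ord(x.a + y.a, y.b)
    return Ord(x.a, x.b + y.b)


ONE, OMEGA = Ord(0, 1), Ord(1, 0)
print(add(ONE, OMEGA) == OMEGA)  # True: 1 + ω = ω
print(add(OMEGA, ONE))           # Ord(a=1, b=1), i.e. ω + 1
```

Tuple comparison on (a, b) agrees with the ordering of these ordinals, so add(OMEGA, ONE) > add(ONE, OMEGA) holds as expected.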

On the other hand, addition is associative.

Proposition. Addition is associative, i.e. (α + β) + γ = α + (β + γ).

Proof.

Since we define addition by recursion, it makes sense to prove this by

induction. Since we recursed on the right-hand term in the definition, it only

makes sense to induct on γ (and fix α + β).

(i) If γ = 0, then α + (β + 0) = α + β = (α + β) + 0.

(ii) If γ = δ^+ is a successor, then

α + (β + δ^+) = α + (β + δ)^+

= [α + (β + δ)]^+

= [(α + β) + δ]^+

= (α + β) + δ^+

= (α + β) + γ.

(iii) If γ = λ is a limit ordinal, we have

(α + β) + λ = sup{(α + β) + γ : γ < λ}

= sup{α + (β + γ) : γ < λ}

If we want to evaluate α + (β + λ), we have to first know whether β + λ is a successor or a limit. We now claim it is a limit:

β + λ = sup{β + γ : γ < λ}. We show that this cannot have a greatest element: for any β + γ, since λ is a limit ordinal, we can find a γ′ such that γ < γ′ < λ. So β + γ′ > β + γ. So β + γ cannot be the greatest element.

Since β + λ is a limit ordinal, we have

α + (β + λ) = sup{α + δ : δ < β + λ}.

We need to show that

sup{α + δ : δ < β + λ} = sup{α + (β + γ) : γ < λ}.

Note that the two sets are not equal. For example, if β = 3 and λ = ω, then the left contains α + 2 but the right does not.

So we show that the left is both ≥ and ≤ the right.

≥: Each element of the right hand set is an element of the left.

≤: For δ < β + λ, we have δ < sup{β + γ : γ < λ}. So δ < β + γ for some γ < λ. Hence α + δ < α + (β + γ).

Note that it is easy to prove that β < γ ⇒ α + β < α + γ by induction on γ (which we implicitly assumed above). But it is not true if we add on the right: 1 < 2 but 1 + ω = 2 + ω.

The definition we had above is called the inductive definition. There is an

alternative definition of + based on actual well-orders. This is known as the

synthetic definition.

Intuitively, we first write out all the elements of α, then write out all the elements of β after it. Then α + β is the order type of the combined mess.

Definition (Ordinal addition (synthetic)). α + β is the order type of α ⊔ β (α disjoint union β, e.g. α × {0} ∪ β × {1}), with all of α before all of β.

(Figure: α + β drawn as the block α followed by the block β.)

Example. ω + 1 = ω^+.

1 + ω = ω.

With this definition, associativity is trivial:

α + (β + γ) = otp(α ⊔ β ⊔ γ) = (α + β) + γ.

Now that we have given two definitions, we must show that they are the same:

Proposition. The inductive and synthetic definition of + coincide.

Proof. Write + for the inductive definition, and +′ for the synthetic one. We want to show that α + β = α +′ β. We induct on β.

(i) α + 0 = α = α +′ 0.

(ii) α + β^+ = (α + β)^+ = (α +′ β)^+ = otp(α ⊔ β ⊔ {·}) = α +′ β^+.

(iii) α + λ = sup{α + γ : γ < λ} = sup{α +′ γ : γ < λ} = α +′ λ. This works because taking the supremum is the same as taking the union.

(Figure: the block α followed by the nested initial segments γ, γ′, γ′′, ··· of λ.)

The synthetic definition is usually easier to work with, if possible. For

example, it was very easy to show associativity using the synthetic definition. It

is also easier to see why addition is not commutative. However, if we want to do

induction, the inductive definition is usually easier.

After addition, we can define multiplication. Again, we first give an inductive

definition, and then a synthetic one.

Definition (Ordinal multiplication (inductive)). We define α · β by induction on β by:

(i) α · 0 = 0.

(ii) α · (β^+) = α · β + α.

(iii) α · λ = sup{α · γ : γ < λ} for λ a non-zero limit.

Example.

– ω · 1 = ω · 0 + ω = 0 + ω = ω.

– ω · 2 = ω · 1 + ω = ω + ω.

– 2 · ω = sup{2 · n : n < ω} = ω.

– ω · ω = sup{ω · n : n < ω} = sup{ω, ω · 2, ω · 3, ···}.

We also have a synthetic definition.

Definition (Ordinal multiplication (synthetic)). (Figure: β copies of the block α placed one after another.)

Formally, α · β is the order type of α × β, with (x, y) < (x′, y′) if y < y′ or (y = y′ and x < x′).

Example. ω · 2 is ω followed by ω, so ω · 2 = ω + ω.

Also 2 · ω is ω copies of 2 in a row, so 2 · ω = ω.

We can check that the definitions coincide, prove associativity etc. similar to

what we did for addition.
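As a sanity check on ω · 2 ≠ 2 · ω, here is a sketch that multiplies an ordinal below ω · ω by a natural number on either side. The encoding (a, b) for ω · a + b and both function names are assumptions made for illustration; the closed formulas are derived from the definitions above (using n · ω = ω for n ≥ 1) rather than taken from the notes.

```python
def times_nat_on_right(a, b, n):
    """(ω·a + b) · n, returned as the pair (a2, b2) meaning ω·a2 + b2."""
    if n == 0:
        return (0, 0)
    if a == 0:
        return (0, b * n)
    return (a * n, b)  # only the last copy keeps its finite tail b


def times_nat_on_left(n, a, b):
    """n · (ω·a + b), using n · ω = ω for n ≥ 1."""
    if n == 0:
        return (0, 0)
    return (a, n * b)


print(times_nat_on_right(1, 0, 2))  # (2, 0): ω · 2 = ω + ω
print(times_nat_on_left(2, 1, 0))   # (1, 0): 2 · ω = ω
```

For instance, (ω + 1) · 2 = ω + 1 + ω + 1 = ω · 2 + 1, which is what times_nat_on_right(1, 1, 2) returns.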

We can define ordinal exponentiation, towers etc. similarly:

Definition (Ordinal exponentiation (inductive)). α^β is defined as

(i) α^0 = 1

(ii) α^{β^+} = α^β · α

(iii) α^λ = sup{α^γ : γ < λ}.

Example. ω^1 = ω^0 · ω = 1 · ω = ω.

ω^2 = ω^1 · ω = ω · ω.

2^ω = sup{2^n : n < ω} = ω.

2.6 Normal functions*

Note: This content was not lectured during the year.

When we have ordinals, we would like to consider functions On → On. Since the ordinals are totally ordered, it would make sense to consider the order-preserving functions, i.e. the increasing ones. However, ordinals have an additional property: we could take suprema of ordinals. If we want our function to preserve this as well, we are led to the following definition:

Definition (Normal function). A function f : On → On is normal if

(i) For any ordinal α, we have f(α) < f(α^+).

(ii) If λ is a non-zero limit ordinal, then f(λ) = sup{f(γ) : γ < λ}.

Some replace the increasing condition by f(α) < f(β) for all α < β. These are easily seen to be equivalent by transfinite induction.

Example. By definition, we see that for each β > 1, the function α ↦ β^α is normal.

We start by a few technical lemmas.

Lemma. Let f be a normal function. Then f is strictly increasing.

Proof. Let α be a fixed ordinal. We induct on all β > α that f(α) < f(β).

If β = α^+, then the result is obvious.

If β = γ^+ with γ ≠ α, then α < γ. So f(α) < f(γ) < f(γ^+) = f(β) by induction.

If β is a limit and is greater than α, then

f(β) = sup{f(γ) : γ < β} ≥ f(α^+) > f(α),

since α^+ < β. So the result follows.

Lemma. Let f be a normal function, and α an ordinal. Then f (α) ≥ α.

Proof. We prove by induction. It is trivial for zero. For successors, we have f(α^+) > f(α) ≥ α, so f(α^+) ≥ α^+. For limits, we have

f(λ) = sup{f(γ) : γ < λ} ≥ sup{γ : γ < λ} = λ.

The following is a convenient refinement of the continuity result:

Lemma. If f is a normal function, then for any non-empty set {α_i}_{i∈I}, we have

f(sup{α_i : i ∈ I}) = sup{f(α_i) : i ∈ I}.

Proof. If {α_i} has a maximal element, then the result is obvious, as f is increasing, and the supremum is a maximum.

Otherwise, let

α = sup{α_i : i ∈ I}

Since the α_i have no maximal element, we know α must be a limit ordinal. So we have

f(α) = sup{f(β) : β < α}.

So it suffices to prove that

sup{f(β) : β < α} = sup{f(α_i) : i ∈ I}.

Since all α_i < α, we have sup{f(β) : β < α} ≥ sup{f(α_i) : i ∈ I}.

For the other direction, it suffices, by definition, to show that

f(β) ≤ sup{f(α_i) : i ∈ I}

for all β < α.

Given such a β, since α is the supremum of the α_i, we can find some particular α_i such that β < α_i. So f(β) < f(α_i) ≤ sup{f(α_i) : i ∈ I}. So we are done.

Because of these results, some define normal functions to be functions that

are strictly increasing and preserve all suprema.

We now proceed to prove two important properties of normal functions (with

easy proofs!):

Lemma (Fixed-point lemma). Let f be a normal function. Then for each ordinal α, there is some β ≥ α such that f(β) = β.

Since the supremum of fixed points is also a fixed point (by normality), it follows that we can define a function g : On → On that enumerates the fixed points. Now this function itself is again normal, so it has fixed points as well. . .

Proof. We define

β = sup{f(α), f(f(α)), f(f(f(α))), ···}.

If the sequence eventually stops, then we have found a fixed point. Otherwise, β is a limit ordinal, and thus normality gives

f(β) = sup{f(f(α)), f(f(f(α))), f(f(f(f(α)))), ···} = β.

So β is a fixed point, and β ≥ f(α) ≥ α.

Lemma (Division algorithm for normal functions). Let f be a normal function. Then for all α, there is some maximal γ such that α ≥ f(γ).

Proof. Let γ = sup{β : f(β) ≤ α}. Then we have

f(γ) = sup{f(β) : f(β) ≤ α} ≤ α.

This is clearly maximal.

3 Posets and Zorn’s lemma

In this chapter, we study partial orders. While there are many examples of partial orders, the most important example is the power set P(X) for any set X, ordered under inclusion. We will also consider subsets of the power set.

The two main theorems of this chapter are Knaster-Tarski fixed point theorem

and Zorn’s lemma. We will use Zorn’s lemma to prove a lot of useful results in

different fields, including the completeness theorem in propositional calculus.

Finally, we will investigate the relationship between Zorn’s lemma and Axiom of

Choice.

3.1 Partial orders

Definition (Partial ordering (poset)). A partially ordered set or poset is a pair

(X, ≤), where X is a set and ≤ is a relation on X that satisfies

(i) x ≤ x for all x ∈ X (reflexivity)

(ii) x ≤ y and y ≤ z ⇒ x ≤ z (transitivity)

(iii) x ≤ y and y ≤ x ⇒ x = y (antisymmetry)

We write x < y to mean x ≤ y and x ≠ y. We can also define posets in terms of <:

(i) x ≮ x for all x ∈ X (irreflexive)

(ii) x < y and y < z ⇒ x < z (transitive)

Example.

(i) Any total order is (trivially) a partial order.

(ii) N with “x ≤ y” if x | y is a partial order.

(iii) P(S) with ⊆ for any set S is a partial order.

(iv) Any subset of P(S) with inclusion is a partial order.

(v) We can use a diagram

where “above” means “greater”. So a ≤ b ≤ c, a ≤ d ≤ e, and what follows by transitivity. This is a Hasse diagram.

Definition (Hasse diagram). A Hasse diagram for a poset X consists of a drawing of the points of X in the plane with an upwards line from x to y if y covers x:

Definition (Cover). In a poset, y covers x if y > x and no z has y > z > x.

Hasse diagrams can be useful, e.g. for N, or useless, e.g. for Q.

(vi) The following example shows that we cannot assign “heights” or “ranks” to posets:

(vii) We can also have complicated structures:

(viii) Or the empty poset (let X be any set and nothing is less than anything else).

While there are many examples of posets, all we care about are actually

power sets and their subsets only.

Often, we want to study subsets of posets. For example, we might want to

know if a subset has a least element. All subsets are equal, but some subsets are

more equal than others. A particular interesting class of subsets is a chain.

Definition (Chain and antichain). In a poset, a subset S is a chain if it is totally ordered, i.e. for all x, y ∈ S, x ≤ y or y ≤ x. An antichain is a subset in which no two things are related.

Example. In (N, |), 1, 2, 4, 8, 16, ··· is a chain.

In (v), {a, b, c} or {a, c} are chains.

R is a chain in R.
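For finite subsets of (N, |) these definitions can be tested directly. The helper names below (divides, is_chain, is_antichain) are illustrative assumptions, and the inputs are taken to be lists of distinct positive integers.

```python
from itertools import combinations


def divides(x, y):
    return y % x == 0


def is_chain(S):
    # Every pair of distinct elements must be comparable under divisibility.
    return all(divides(x, y) or divides(y, x) for x, y in combinations(S, 2))


def is_antichain(S):
    # No pair of distinct elements may be comparable.
    return not any(divides(x, y) or divides(y, x) for x, y in combinations(S, 2))


print(is_chain([1, 2, 4, 8, 16]))  # True
print(is_chain([2, 3]))            # False
print(is_antichain([4, 6, 9]))     # True
```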

Definition (Upper bound and supremum). For S ⊂ X, an upper bound for S is an x ∈ X such that ∀y ∈ S : x ≥ y.

x ∈ X is a least upper bound, supremum or join of S, written sup S, if x is an upper bound for S, and for all y ∈ X, if y is an upper bound, then y ≥ x.

Example.

(i) In R, {x : x < √2} has an upper bound 7, and has a supremum √2.

(ii) In (v) above, consider {a, b}. Upper bounds are b and c. So sup {a, b} = b. However, {b, d} has no upper bound!

(iii) In (vii), {a, b} has upper bounds c, d, e, but has no least upper bound.

Definition (Complete poset). A poset X is complete if every S ⊆ X has a supremum. In particular, it has a greatest element (i.e. x such that ∀y : x ≥ y), namely sup X, and least element (i.e. x such that ∀y : x ≤ y), namely sup ∅.

It is very important to remember that this definition does not require that the subset S is bounded above or non-empty. This is different from the definition of metric space completeness.

Example.

– R is not complete because R itself has no supremum.

– [0, 1] is complete because every subset is bounded above, and so has a least upper bound. Also, ∅ has a supremum of 0.

– (0, 1) is not complete because (0, 1) has no upper bound.

– P(S) for any S is always complete, because given any {A_i : i ∈ I}, where each A_i ⊆ S, ⋃ A_i is its supremum.

Now we are going to derive fixed-point theorems for complete posets. We

start with a few definitions:

Definition (Fixed point). A fixed point of a function f : X → X is an x such that f(x) = x.

Definition (Order-preserving function). For a poset X, f : X → X is order-preserving if x ≤ y ⇒ f(x) ≤ f(y).

Example.

– On N, x ↦ x + 1 is order-preserving

– On Z, x ↦ x − 1 is order-preserving

– On (0, 1), x ↦ (1 + x)/2 is order-preserving (this function halves the distance from x to 1).

– On P(S), fix some i ∈ S. Then A ↦ A ∪ {i} is order-preserving.

Not every order-preserving f has a fixed point (e.g. first two above). However, we have

Theorem (Knaster-Tarski fixed point theorem). Let X be a complete poset, and f : X → X be an order-preserving function. Then f has a fixed point.

Proof. To show that f(x) = x, we need f(x) ≤ x and f(x) ≥ x. Let’s not be too greedy and just want half of it:

Let E = {x : x ≤ f(x)}. Let s = sup E. We claim that this is a fixed point, by showing f(s) ≤ s and s ≤ f(s).

To show s ≤ f(s), we use the fact that s is the least upper bound. So if we can show that f(s) is also an upper bound, then s ≤ f(s). Now let x ∈ E. So x ≤ s. Therefore f(x) ≤ f(s) by order-preservingness. Since x ≤ f(x) (by definition of E), x ≤ f(x) ≤ f(s). So f(s) is an upper bound.

To show f(s) ≤ s, we simply have to show f(s) ∈ E, since s is an upper bound. But we already know s ≤ f(s). By order-preservingness, f(s) ≤ f(f(s)). So f(s) ∈ E by definition.

While this proof looks rather straightforward, we need to first establish that s ≤ f(s), then use this fact to show f(s) ≤ s. If we decided to show f(s) ≤ s first, then we would fail!
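The construction s = sup E can be carried out by brute force on the power set of a small finite set, where sup is union. This is a sketch only: the names powerset and knaster_tarski_fixed_point and the sample map f are assumptions chosen for illustration.

```python
from itertools import combinations


def powerset(S):
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def knaster_tarski_fixed_point(S, f):
    """Return s = sup E, where E = {A ⊆ S : A ⊆ f(A)} and sup is union."""
    E = [A for A in powerset(S) if A <= f(A)]
    return frozenset().union(*E)


def f(A):
    # Order-preserving on P({0, 1, 2, 3}): keep 1 and 2 if present, add 0.
    return frozenset((A & {1, 2}) | {0})


s = knaster_tarski_fixed_point({0, 1, 2, 3}, f)
print(sorted(s), f(s) == s)  # [0, 1, 2] True
```

Here E consists of the subsets of {0, 1, 2}, so s = {0, 1, 2}, and indeed f(s) = s.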

The very typical application of Knaster-Tarski is the quick, magic proof of the Cantor-Schröder-Bernstein theorem.

Corollary (Cantor-Schröder-Bernstein theorem). Let A, B be sets. Let f : A → B and g : B → A be injections. Then there is a bijection h : A → B.

Proof. We try to partition A into P and Q, and B into R and S, such that f(P) = R and g(S) = Q. Then we let h = f on P and g^{-1} on Q.

Since S = B \ R and Q = A \ P, we want

P = A \ g(B \ f(P))

Since the function P ↦ A \ g(B \ f(P)) from P(A) to P(A) is order-preserving (and P(A) is complete), the result follows.
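The recipe in this proof can be traced on finite sets, with the caveat that injections between finite sets in both directions are already bijections, so the example only illustrates the mechanics. The sketch finds P by iterating the map from the empty set (an assumption that works because A is finite); the name csb_bijection and the sample data are hypothetical.

```python
def csb_bijection(A, B, f, g):
    """Build a bijection h : A -> B from injections f : A -> B, g : B -> A.

    f and g are dicts. P is a fixed point of P -> A minus g(B minus f(P)),
    found by iterating from the empty set.
    """
    A, B = set(A), set(B)

    def step(P):
        R = {f[a] for a in P}               # R = f(P)
        return A - {g[b] for b in B - R}    # A minus g(S), where S = B minus R

    P = set()
    while step(P) != P:
        P = step(P)
    g_inv = {a: b for b, a in g.items()}
    # h is f on P and the inverse of g on Q = A minus P.
    return {a: (f[a] if a in P else g_inv[a]) for a in A}


A, B = {0, 1, 2}, {"a", "b", "c"}
f = {0: "a", 1: "b", 2: "c"}
g = {"a": 1, "b": 2, "c": 0}
h = csb_bijection(A, B, f, g)
print(sorted(h.values()))  # ['a', 'b', 'c']
```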

The next result we have is Zorn’s lemma. The main focus of Zorn’s lemma is

on maximal elements.

Definition (Maximal element). In a poset X, x ∈ X is maximal if no y ∈ X has y > x.

Caution! Under no circumstances confuse a maximal element with a maximum element, except under confusing circumstances! A maximum element is defined as an x such that all y ∈ X satisfy y ≤ x. These two notions are the same in totally ordered sets, but are very different in posets.

Example. In the poset from (v) above (with a ≤ b ≤ c and a ≤ d ≤ e), c and e are maximal.

Not every poset has a maximal element, e.g. N, Q, R. Not only are these incomplete; they have chains that are not bounded above.
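The difference between maximal and maximum is easy to see in a finite poset. The sketch below uses {1, ···, 10} under divisibility, an example not taken from the notes; the function names are hypothetical.

```python
def maximal_elements(X, less_than):
    """Elements x of X such that no y in X has x < y."""
    return [x for x in X if not any(less_than(x, y) for y in X)]


def maximum_element(X, less_equal):
    """The x with y ≤ x for all y in X, or None if there is none."""
    for x in X:
        if all(less_equal(y, x) for y in X):
            return x
    return None


def divides(x, y):
    return y % x == 0


def strictly_divides(x, y):
    return x != y and y % x == 0


X = range(1, 11)
print(maximal_elements(X, strictly_divides))  # [6, 7, 8, 9, 10]
print(maximum_element(X, divides))            # None
```

There are five maximal elements but no maximum, since no element of X is a multiple of all the others.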

Theorem (Zorn’s lemma). Assuming Axiom of Choice, let X be a (non-empty) poset in which every chain has an upper bound. Then it has a maximal element.

Note that “non-empty” is not a strictly necessary condition, because if X is an empty poset, then the empty chain has no upper bound. So the conditions can never be satisfied.

The actual proof of Zorn’s lemma is rather simple, given what we’ve had so far. We “hunt” for the maximal element. We start with x_0. If it is maximal, done. If not, we find a bigger x_1. If x_1 is maximal, done. Otherwise, keep going.

If we never meet a maximal element, then we have an infinite chain. This has an upper bound x_ω. If this is maximal, done. If not, find x_{ω+1} > x_ω. Keep going on.

We have not yet reached a contradiction. But suppose we never meet a maximal element. If X is countable, and we can reach x_{ω_1}, then we have found uncountably many elements in a countable set, which is clearly nonsense!

Since the ordinals can be arbitrarily large (Hartogs’ lemma), if we never reach a maximal element, then we can find more elements than X has.

Proof. Suppose not. So for each x ∈ X, we have x′ ∈ X with x′ > x. We denote the-element-larger-than-x by x′.

We know that each chain C has an upper bound, say u(C).

Let γ = γ(X), the ordinal-larger-than-X by Hartogs’ lemma.

We pick x ∈ X, and define x_α for α < γ recursively by

– x_0 = x

– x_{α^+} = x_α′

– x_λ = u({x_α : α < λ})′ for non-zero limit λ

Of course, we have to show that {x_α : α < λ} is a chain. This is trivial by induction.

Then α ↦ x_α is an injection from γ → X. Contradiction.

Note that we could as well have defined x_λ = u({x_α : α < λ}), and we can easily prove it is still an injection. However, we are lazy and put the “prime” just to save a few lines of proof.

This proof was rather easy. However, this is only because we are given

ordinals, definition by recursion, and Hartogs’ lemma. Without these tools, it is

rather difficult to prove Zorn’s lemma.

A typical application of Zorn’s lemma is: Does every vector space have a basis? Recall that a basis of V is a subset of V that is linearly independent (no non-trivial finite linear combination = 0) and spanning (i.e. every x ∈ V is a finite linear combination from it).

Example.

– Let V be the space of all real polynomials. A basis is {1, x, x^2, x^3, ···}.

– Let V be the space of all real sequences. Let e_i be the sequence with all 0 except 1 in the ith place. However, {e_i} is not a basis, since (1, 1, 1, ···) cannot be written as a finite linear combination of them. In fact, there is no countable basis (easy exercise). It turns out that there is no “explicit” basis.

– Take R as a vector space over Q. A basis here, if it exists, is called a Hamel basis.

Using Zorn’s lemma, we can prove that the answer is positive.

Theorem. Every vector space V has a basis.

Proof. We go for a maximal linearly independent subset.

Let X be the set of all linearly independent subsets of V, ordered by inclusion. We want to find a maximal B ∈ X. Then B is a basis. Otherwise, if B does not span V, choose x ∉ span B. Then B ∪ {x} is independent, contradicting maximality.

So we have to find such a maximal B. By Zorn’s lemma, we simply have to show that every chain has an upper bound.

Given a chain {A_i : i ∈ I}, a reasonable guess is to try the union. Let A = ⋃ A_i. Then A_i ⊆ A for all i, by definition. So it is enough to check that A ∈ X, i.e. is linearly independent.

Suppose not. Say λ_1 x_1 + ··· + λ_n x_n = 0 for some scalars λ_1, ···, λ_n (not all 0). Suppose x_1 ∈ A_{i_1}, ···, x_n ∈ A_{i_n} for some i_1, ···, i_n ∈ I. Then there is some A_{i_m} that contains all the A_{i_k}, since they form a finite chain. So A_{i_m} contains all the x_i. This contradicts the independence of A_{i_m}.

Hence by Zorn’s lemma, X has a maximal element. Done.
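In finite dimensions no appeal to Zorn’s lemma is needed: a maximal linearly independent subset can be built greedily. The sketch below does this over the two-element field, with vectors encoded as integer bitmasks; the encoding and the function names are assumptions made for illustration.

```python
def reduce_vector(v, pivots):
    """Reduce bitmask v against the pivot table; 0 means v is in the span."""
    while v:
        top = v.bit_length() - 1
        if top not in pivots:
            return v
        v ^= pivots[top]  # adding a kept vector clears the leading bit
    return 0


def maximal_independent_subset(vectors):
    """Keep each vector over GF(2) that is not in the span of those kept."""
    pivots, kept = {}, []
    for v in vectors:
        r = reduce_vector(v, pivots)
        if r:
            pivots[r.bit_length() - 1] = r
            kept.append(v)
    return kept


vectors = [0b110, 0b011, 0b101, 0b111]
basis = maximal_independent_subset(vectors)
print([bin(v) for v in basis])  # ['0b110', '0b11', '0b111']
```

The vector 0b101 is dropped because it equals 0b110 + 0b011 over GF(2). The kept set cannot be enlarged within the list, which is the finite analogue of the maximal B in the proof.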

Another application is the completeness theorem for propositional logic when

P , the primitives, can be uncountable.

Theorem (Model existence theorem (uncountable case)). Let

S ⊆ L

(

) for any

set of primitive propositions P . Then if S is consistent, S has a model.

Proof.

We need a consistent S̄ ⊇ S such that for all t ∈ L, t ∈ S̄ or ¬t ∈ S̄. Then we have a valuation v given by v(t) = 1 if t ∈ S̄ and v(t) = 0 if t ∉ S̄, as in our original proof for the countable case.

So we seek a maximal consistent S̄ ⊇ S. If S̄ is maximal, then if t ∉ S̄, then we must have S̄ ∪ {t} inconsistent, i.e. S̄ ∪ {t} ⊢ ⊥. By the deduction theorem, this means that S̄ ⊢ ¬t. By maximality, we must have ¬t ∈ S̄. So either t or ¬t is in S̄.

Now we show that there is such a maximal S̄. Let X = {T ⊆ L : T is consistent, T ⊇ S}. Then X ≠ ∅ since S ∈ X. We show that any non-empty chain has an upper bound. An obvious choice is, again, the union.

Let {T_i : i ∈ I} be a non-empty chain. Let T = ⋃ T_i. Then T ⊇ T_i for all i. So to show that T is an upper bound, we have to show T ∈ X.

Certainly, T ⊇ S, as any T_i contains S (and the chain is non-empty). So we want to show T is consistent. Suppose T ⊢ ⊥. So we have t_1, ··· , t_n ∈ T with {t_1, ··· , t_n} ⊢ ⊥, since proofs are finite. Then some T_k contains all t_i since the T_i are nested. So T_k is inconsistent. This is a contradiction. Therefore T must be consistent.

Hence by Zorn’s lemma, there is a maximal element of X.
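When there are only finitely many primitives, a set S̄ with t ∈ S̄ or ¬t ∈ S̄ can be built by stepping through the formulae one at a time, without Zorn’s lemma. The sketch below is an illustration under stated assumptions: formulae are encoded as nested tuples (an encoding chosen here, not taken from the notes), and syntactic consistency is replaced by satisfiability checked by truth tables, which agrees with consistency by the completeness theorem.

```python
from itertools import product


def value(t, v):
    """Truth value of formula t under the valuation v (a dict on primitives)."""
    tag = t[0]
    if tag == "var":
        return v[t[1]]
    if tag == "not":
        return not value(t[1], v)
    if tag == "imp":
        return (not value(t[1], v)) or value(t[2], v)
    raise ValueError(tag)


def satisfiable(S, primitives):
    """Brute-force search for a valuation making every formula in S true."""
    return any(
        all(value(t, dict(zip(primitives, bits))) for t in S)
        for bits in product([False, True], repeat=len(primitives))
    )


def complete(S, formulas, primitives):
    """Extend S so that, for each listed formula t, it contains t or its negation."""
    S_bar = list(S)
    for t in formulas:
        if satisfiable(S_bar + [t], primitives):
            S_bar.append(t)
        else:
            S_bar.append(("not", t))   # S_bar ∪ {t} fails, so S_bar ∪ {¬t} works
    return S_bar


primitives = ["p", "q"]
p, q = ("var", "p"), ("var", "q")
S = [("not", p)]
S_bar = complete(S, [p, q], primitives)
```

Each step keeps the set satisfiable, mirroring the way every member of the chain in the proof is consistent.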

This proof is basically the same proof that every vector space has a basis! In

fact, most proofs involving Zorn’s lemma are similar.

3.2 Zorn’s lemma and axiom of choice

Recall that in the proof of Zorn’s, we picked x, then picked x′, then picked x′′, ad infinitum. Here we are making arbitrary choices of x′. In particular, we have made infinitely many arbitrary choices.

We did the same in IA Numbers and Sets, when proving that a countable union of countable sets is countable: we chose, for each A_n, a listing of A_n, and then counted them diagonally. We needed to make a choice because each A_n has a lot of possible listings, and we have to pick exactly one.

In terms of “rules for producing sets”, we are appealing to the axiom of choice, which states that you can pick an element of each A_i whenever {A_i : i ∈ I} is a family of non-empty sets. Formally,

Axiom (Axiom of choice). Given any family {A_i : i ∈ I} of non-empty sets, there is a choice function f : I → ⋃_{i∈I} A_i such that f(i) ∈ A_i for all i ∈ I.

This is of a different character from the other set-building rules (e.g. unions

and power sets exist). The difference is that the other rules are concrete. We

know exactly what

A ∪ B

is, and there is only one possible candidate for what

A ∪ B

might be. “Union” uniquely specifies what it produces. A choice function, however, is not uniquely specified: {A_i : i ∈ I} can have many choice functions, and the axiom of choice does not give us a solid, explicit choice function. We say the axiom of choice is non-constructive.

We are not saying that it’s wrong, but it’s weird. For this reason, it is often

of interest to ask “Did I use AC?” and “Do I need AC?”.

(It is important to note that the Axiom of Choice is needed only to make infinitely many choices. It is trivially true if |I| = 1, since A ≠ ∅ by definition means ∃x ∈ A. We can also do it for two sets. Similarly, for |I| finite, we can do it by induction. However, in general, AC is required to make infinitely many choices, i.e. it cannot be deduced from the other axioms of set theory.)

In the proof of Zorn’s we used Choice. However, do we need it? Is it possible

to prove it without Choice?

The answer is it is necessary, since we can deduce AC from Zorn’s. In other

words, we can write down a proof of AC from Zorn’s, using only the other

set-building rules.

Theorem. Zorn’s Lemma ⇔ Axiom of choice.

As in the past uses of Zorn’s lemma, we have a big scary choice function to produce. We know that we can do it for small cases, such as when |I| = 1. So we start with small attempts and show that the maximal attempt is what we want.

Proof.

We have already proved that AC ⇒ Zorn. We now prove the other way round.

Suppose we are given a family {A_i : i ∈ I} of non-empty sets. We say a partial choice function is a function f : J → ⋃_{i∈I} A_i (for some J ⊆ I) such that f(j) ∈ A_j for all j ∈ J.

Let X = {(J, f) : f is a partial choice function with domain J}. We order X by extension, i.e. (J, f) ≤ (J′, f′) iff J ⊆ J′ and f′ agrees with f when both are defined.

Given a chain {(J_k, f_k) : k ∈ K}, we have an upper bound (⋃ J_k, ⋃ f_k), i.e. the function obtained by combining all functions in the chain. So by Zorn’s, X has a maximal element (J, f).

Suppose J ≠ I. Then pick i ∈ I \ J. Then pick x ∈ A_i. Set J′ = J ∪ {i} and f′ = f ∪ {(i, x)}. Then (J′, f′) is greater than (J, f). This contradicts the maximality of (J, f). So we must have J = I, i.e. f is a full choice function.
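For a finite index set the extension step of this proof can be run directly: keep extending a partial choice function by one index until no extension exists. This is only a sketch of the finite case, where a loop stands in for Zorn’s lemma; the dictionary encoding and the names `extend` and `maximal_choice_function` are assumptions made for illustration.

```python
def extend(partial, family):
    """Return a strict extension (J ∪ {i}, f ∪ {(i, x)}) or None if J = I already."""
    for i, A_i in family.items():
        if i not in partial:
            x = next(iter(A_i))          # some element of the non-empty set A_i
            return {**partial, i: x}
    return None


def maximal_choice_function(family):
    """Start from the empty partial choice function and extend until maximal."""
    f = {}
    while (g := extend(f, family)) is not None:
        f = g
    return f


family = {"a": {1, 2}, "b": {3}, "c": {4, 5, 6}}
f = maximal_choice_function(family)
```

The loop stops exactly when the domain is all of I, which is the contradiction step of the proof read forwards.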

We have shown that Zorn’s lemma is equivalent to the Axiom of Choice.

There is a third statement that is also equivalent to both of these:

Theorem (Well-ordering theorem). Axiom of choice ⇒ every set X can be well-ordered.

This might be very surprising at first for, say, X = R, since there is no obvious way we can well-order R. However, it is much less surprising given Hartogs’ lemma, since Hartogs’ lemma says that there is a (well-ordered) ordinal even bigger than R. So well-ordering R shouldn’t be hard.

Proof.

The idea is to pick an element from X and call it the first; pick another element and call it the second, and continue transfinitely until we pick everything.

For each A ⊆ X with A ≠ X, we let y_A be an element of X \ A. Here we are using Choice to pick out y_A.

Define x_α recursively: Having defined x_β for all β < α, if {x_β : β < α} = X, then stop. Otherwise, set x_α = y_{{x_β : β < α}}, i.e. pick x_α to be something not yet chosen.

We must stop at some time. Otherwise, we have injected γ(X) (i.e. the ordinal given by Hartogs’ lemma that does not inject into X) into X, which is a contradiction. So when we stop, we have bijected X with a well-ordered set (i.e. I_α, where α is when you’ve stopped). Hence we have well-ordered X.

Did we need AC? Yes, trivially.

Theorem. Well-ordering theorem ⇒ Axiom of Choice.

Proof.

Given non-empty sets {A_i : i ∈ I}, well-order ⋃ A_i. Then define f(i) to be the least element of A_i.
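A small illustration of this proof, assuming the well-ordering of the union is supplied as a Python list (earlier entries are smaller); the name `choice_from_well_order` is hypothetical.

```python
def choice_from_well_order(family, order):
    """Define f(i) as the least element of A_i with respect to the given ordering."""
    rank = {x: k for k, x in enumerate(order)}      # position of x in the well-order
    return {i: min(A_i, key=rank.__getitem__) for i, A_i in family.items()}


family = {0: {"c", "b"}, 1: {"a", "c"}, 2: {"c"}}
order = ["a", "b", "c"]                              # a well-ordering of the union
f = choice_from_well_order(family, order)
```

No arbitrary choice is made here: once the ordering is fixed, f is determined.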

Our conclusion is:

Axiom of Choice ⇔ Zorn’s lemma ⇔ Well-ordering theorem.

Before we end, we need to do a small sanity check: we showed that these three statements are equivalent using a lot of ordinal theory. Our proofs above make sense only if we did not use AC when building our ordinal theory. Fortunately, we did not, apart from the remark that ω_1 is not a countable supremum, which used the fact that a countable union of countable sets is countable.

3.3 Bourbaki-Witt theorem*

Finally, we’ll quickly present a second (non-examinable) fixed-point theorem.

This time, we are concerned about chain-complete posets and inflationary

functions.

Definition (Chain-complete poset). We say a poset X is chain-complete if X ≠ ∅ and every non-empty chain has a supremum.

Example.

– Every complete poset is chain-complete.

– Any finite (non-empty) poset is chain-complete, since every non-empty chain is finite and has a greatest element.

– {A ⊆ V : A is linearly independent} for any vector space V is chain-complete, as shown in the proof that every vector space has a basis.

Definition (Inflationary function). A function f : X → X is inflationary if f(x) ≥ x for all x.

Theorem (Bourbaki-Witt theorem). If X is chain-complete and f : X → X is inflationary, then f has a fixed point.

This follows instantly from Zorn’s, since X has a maximal element x, and since f(x) ≥ x, we must have f(x) = x. However, we can prove Bourbaki-Witt without choice. In the proof of Zorn’s, we had to “pick” an x_1 > x_0. Here, we can simply let

x_0 ↦ x_1 = f(x_0) ↦ x_2 = f(x_1) ↦ ···

Since each chain has a supremum instead of an upper bound, we also don’t need Choice to pick our favorite upper bound of each chain.

Then we can do the same proof as Zorn’s to find a fixed point.

We can view this as the “AC-free” part of Zorn’s. It can be used to prove

Zorn’s lemma, but the proof is totally magic.
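On a finite poset the iteration x_0 ↦ f(x_0) ↦ f(f(x_0)) ↦ ··· already reaches a fixed point, with no suprema of infinite chains required. A minimal sketch, assuming the poset of subsets of {1, ..., 20} ordered by inclusion and an inflationary map chosen purely for illustration:

```python
def iterate_to_fixed_point(f, x0):
    """Apply f repeatedly, starting from x0, until f(x) == x."""
    x = x0
    while (y := f(x)) != x:
        x = y
    return x


def add_doubles(S):
    """Inflationary on subsets of {1..20}: S is always contained in add_doubles(S)."""
    return S | frozenset(2 * x for x in S if 2 * x <= 20)


fixed = iterate_to_fixed_point(add_doubles, frozenset({1, 3}))
```

Termination relies on finiteness: each step that is not a fixed point strictly enlarges the set.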

4 Predicate logic

In the first chapter, we studied propositional logic. However, it isn’t sufficient

for most mathematics we do.

In, say, group theory, we have a set of objects, operations and constants. For example, in group theory, we have the operations multiplication m : A^2 → A, inverse i : A → A, and a constant e ∈ A. For each of these, we assign a number known as the arity, which specifies how many inputs each operation takes. For example, multiplication has arity 2, inverse has arity 1 and e has arity 0 (we can view e as a function A^0 → A, that takes no inputs and gives a single output).

The study of these objects is known as predicate logic. Compared to propositional logic, we have a much richer language, which includes all the operations and possibly relations. For example, with group theory, we have m, i, e in our language, as well as things like ∀, ⇒ etc. Note that unlike propositional logic, different theories give rise to different languages.

Instead of a valuation, now we have a structure, which is a solid object plus

the operations and relations required. For example, a structure of group theory

will be an actual concrete group with the group operations.

Similar to what we did in propositional logic, we will take S |= t to mean “for any structure in which S is true, t is true”. For example, “Axioms of group theory” |= m(e, e) = e, i.e. in any set that satisfies the group axioms, m(e, e) = e. We also have S ⊢ t meaning we can prove t from S.

4.1 Language of predicate logic

We start with the definition of the language. This is substantially more complicated than what we’ve got in propositional logic.

Definition (Language). Let Ω (function symbols) and Π (relation symbols) be

disjoint sets, and α : Ω ∪ Π → N a function (’arity’).

The language L = L(Ω, Π, α) is the set of formulae, defined as follows:

– Variables: we have some variables x_1, x_2, ···. Sometimes (i.e. always), we write x, y, z, ··· instead.

– Terms: these are defined inductively by

(i) Every variable is a term

(ii) If f ∈ Ω, α(f) = n, and t_1, ··· , t_n are terms, then f t_1 ··· t_n is a term. We often write f(t_1, ··· , t_n) instead.

Example. In the language of groups Ω = {m, i, e}, Π = ∅, and α(m) = 2, α(i) = 1, α(e) = 0. Then e, x_1, m(x_1, x_2), i(m(x_1, x_1)) are terms.

– Atomic formulae: there are three sorts:

(i) ⊥

(ii) (s = t) for any terms s, t.

(iii) (ϕ t_1 ··· t_n) for any ϕ ∈ Π with α(ϕ) = n and t_1, ··· , t_n terms.

Example. In the language of posets, Ω = ∅, Π = {≤} and α(≤) = 2. Then (x_1 = x_1), x_1 ≤ x_2 (really means (≤ x_1 x_2)) are atomic formulae.

– Formulae: defined inductively by

(i) Atomic formulae are formulae

(ii) (p ⇒ q) is a formula for any formulae p, q.

(iii) (∀x)p is a formula for any formula p and variable x.

Example. In the language of groups, m(e, e) = e, (∀x)(m(x, i(x)) = e), (∀x)(m(x, x) = e ⇒ (∃y)(m(y, y) = x)) are formulae.

It is important to note that a formula is a string of meaningless symbols. It doesn’t make sense to ask whether it is true or false. In particular, the function and relation symbols are not assigned any meaning. The only thing the language cares about is the arity of each symbol.

Again, we have the usual abbreviations ¬p, p ∧ q, p ∨ q etc. Also, we have (∃x)p for ¬(∀x)(¬p).

Definition (Closed term). A term is closed if it has no variables.

Example. In the language of groups, e, m(e, e) are closed terms. However, m(x, i(x)) is not closed even though we think it is always e. Apart from the fact that it is by definition not closed (it has a variable x), we do not have the group axioms stating that m(x, i(x)) = e.

Definition (Free and bound variables). An occurrence of a variable x in a formula p is bound if it is inside the brackets of a (∀x) quantifier. It is free otherwise.

Example. In (∀x)(m(x, x) = e), x is a bound variable.

In (∀y)(m(y, y) = x ⇒ m(x, y) = m(y, x)), y is bound while x is free.

We are technically allowed to have a formula with x both bound and free, but DO NOT DO IT. For example,

m(x, x) = e ⇒ (∀x)(∀y)(m(x, y) = m(y, x))

is a valid formula (the first two x are free, while the others are bound).

Definition (Sentence). A sentence is a formula with no free variables.

Example. m(e, e) = e and (∀x)(m(x, x) = e) are sentences, while m(x, i(x)) = e is not.

Definition (Substitution). For a formula p, a variable x and a term t, the substitution p[t/x] is obtained by replacing each free occurrence of x with t.

Example. If p is the statement (∃y)(m(y, y) = x), then p[e/x] is (∃y)(m(y, y) = e).
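The syntactic notions above (terms, formulae, free variables, sentences, substitution) can be made concrete with a small sketch. The tuple encoding and every function name below are assumptions chosen for illustration; ∃ is treated as the abbreviation ¬(∀x)¬, and the substitution assumes that no variable of the substituted term is bound in the formula.

```python
BOT = ("bot",)


def var(name):
    return ("var", name)


def fn(symbol, *args):
    return ("fn", symbol, args)


def eq(s, t):
    return ("eq", s, t)


def imp(p, q):
    return ("imp", p, q)


def forall(x, p):
    return ("all", x, p)


def exists(x, p):
    # (∃x)p abbreviates ¬(∀x)(¬p), where ¬q is q ⇒ ⊥
    return imp(forall(x, imp(p, BOT)), BOT)


def term_vars(t):
    if t[0] == "var":
        return {t[1]}
    return set().union(*(term_vars(a) for a in t[2]))


def free_vars(p):
    tag = p[0]
    if tag == "bot":
        return set()
    if tag == "eq":
        return term_vars(p[1]) | term_vars(p[2])
    if tag == "rel":
        return set().union(*(term_vars(a) for a in p[2]))
    if tag == "imp":
        return free_vars(p[1]) | free_vars(p[2])
    if tag == "all":
        return free_vars(p[2]) - {p[1]}      # the quantifier binds its variable
    raise ValueError(tag)


def subst_term(t, x, s):
    if t[0] == "var":
        return s if t[1] == x else t
    return ("fn", t[1], tuple(subst_term(a, x, s) for a in t[2]))


def subst(p, x, s):
    """p[s/x]: replace each free occurrence of x in p by the term s."""
    tag = p[0]
    if tag == "bot":
        return p
    if tag == "eq":
        return ("eq", subst_term(p[1], x, s), subst_term(p[2], x, s))
    if tag == "rel":
        return ("rel", p[1], tuple(subst_term(a, x, s) for a in p[2]))
    if tag == "imp":
        return ("imp", subst(p[1], x, s), subst(p[2], x, s))
    if tag == "all":
        if p[1] == x:
            return p                         # occurrences of x below are bound
        return ("all", p[1], subst(p[2], x, s))
    raise ValueError(tag)


x, y, e = var("x"), var("y"), fn("e")
p = exists("y", eq(fn("m", y, y), x))        # (∃y)(m(y, y) = x)
p_e = subst(p, "x", e)                       # (∃y)(m(y, y) = e)
```

A formula is a sentence exactly when `free_vars` returns the empty set, so p is not a sentence while p[e/x] is.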

4.2 Semantic entailment

In propositional logic, we can’t say whether a proposition is true or false unless

we have a valuation. What would be a “valuation” in the case of predicate logic?

It is a set with the operations of the right arity. We call this a structure.

For example, in the language of groups, a structure is a set G with the correct

operations. Note that it does not have to satisfy the group axioms in order to

qualify as a structure.

Definition (Structure). An L-structure is a non-empty set A with a function f_A : A^n → A for each f ∈ Ω, α(f) = n, and a relation ϕ_A ⊆ A^n, for each ϕ ∈ Π, α(ϕ) = n.

Note that we explicitly forbid A from being empty. It is possible to formulate

predicate logic that allows empty structures, but we will have to make many

exceptions when defining our axioms, as we will see later. Since empty structures

are mostly uninteresting (and don’t exist if there is at least one constant), it

isn’t a huge problem if we ignore it. (There is a small caveat here — we are

working with single-sorted logic here, so everything is of the same “type”. If

we want to work with multi-sorted logic, where there can be things of different

types, it would then be interesting to consider the case where some of the types

could be empty).

Example. In the language of posets L, a structure is a set A with a relation ≤_A ⊆ A × A.

In the language of groups, a structure is a set A with functions m_A : A × A → A, i_A : A → A and e_A ∈ A.

Again, these need not be genuine posets/groups since we do not have the

axioms yet.

Now we want to define “p holds in A” for a sentence p ∈ L and an L-structure A.

For example, we want (∀x)(m(x, x) = e) to be true in A iff for each a ∈ A we have m_A(a, a) = e_A. So to translate p into something about A, you “add subscript A to each function-symbol and relation-symbol, insert ∈ A after the quantifiers, and say it aloud”. We call this the interpretation of the sentence p.

This is not great as a definition. So we define it formally, and then quickly

forget about it.

Definition (Interpretation). To define the interpretation p_A ∈ {0, 1} for each sentence p and L-structure A, we define inductively:

(i) Closed terms: define t_A ∈ A for each closed term t by

(f t_1 ··· t_n)_A = f_A((t_1)_A, (t_2)_A, ··· , (t_n)_A)

for any f ∈ Ω, α(f) = n, and closed terms t_1, ··· , t_n.

Example. (m(m(e, e), e))_A = m_A(m_A(e_A, e_A), e_A).

(ii) Atomic formulae:

⊥_A = 0

(s = t)_A = 1 if s_A = t_A, and 0 if s_A ≠ t_A

(ϕ t_1 ··· t_n)_A = 1 if ((t_1)_A, ··· , (t_n)_A) ∈ ϕ_A, and 0 otherwise

(iii) Sentences:

(p ⇒ q)_A = 0 if p_A = 1 and q_A = 0, and 1 otherwise

((∀x)p)_A = 1 if p[ā/x]_Ā = 1 for all a ∈ A, and 0 otherwise

where for any a ∈ A, we define a new language L′ by adding a constant ā and make A into an L′-structure Ā by setting ā_Ā = a.

Now that we have formally defined truth, just forget about it!
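For a finite structure the inductive definition can be executed directly. The sketch below is an illustration only: it uses a hypothetical tuple encoding of terms and formulae, and it handles the (∀x) clause with an environment mapping variables to elements instead of adding constants ā to the language, which has the same effect.

```python
def eval_term(t, A, env):
    """Interpretation t_A of a term, with free variables looked up in env."""
    if t[0] == "var":
        return env[t[1]]
    f_A = A["functions"][t[1]]
    return f_A(*(eval_term(a, A, env) for a in t[2]))


def holds(p, A, env=None):
    """Return True iff p_A = 1 in the finite structure A."""
    env = env or {}
    tag = p[0]
    if tag == "bot":
        return False
    if tag == "eq":
        return eval_term(p[1], A, env) == eval_term(p[2], A, env)
    if tag == "rel":
        return tuple(eval_term(a, A, env) for a in p[2]) in A["relations"][p[1]]
    if tag == "imp":
        return (not holds(p[1], A, env)) or holds(p[2], A, env)
    if tag == "all":
        return all(holds(p[2], A, {**env, p[1]: a}) for a in A["domain"])
    raise ValueError(tag)


# Z/3 under addition as a structure in the language of groups.
Z3 = {
    "domain": range(3),
    "functions": {"m": lambda a, b: (a + b) % 3, "i": lambda a: (-a) % 3, "e": lambda: 0},
    "relations": {},
}
# Same symbols, but m is multiplication mod 3: a structure that is not a group.
BAD = {**Z3, "functions": {**Z3["functions"], "m": lambda a, b: (a * b) % 3}}

x = ("var", "x")
e = ("fn", "e", ())
inverse_axiom = ("all", "x", ("eq", ("fn", "m", (x, ("fn", "i", (x,)))), e))
exponent_two = ("all", "x", ("eq", ("fn", "m", (x, x)), e))
```

Both Z3 and BAD are structures for the language of groups; only the first satisfies the inverse axiom, which illustrates that a structure need not be a model of the theory.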

Note that we have defined the interpretation only for sentences. We can also define it for formulae with free variables. For any formula p with n free variables, we can define the interpretation as the set of all things that satisfy p. For example, if p is (∃y)(m(y, y) = x), then

p_A = {a ∈ A : ∃b ∈ A such that m_A(b, b) = a}.

However, we are mostly interested in sentences, and don’t have to worry about

these.

Now we can define models and entailment as in propositional logic.

Definition (Theory). A theory is a set of sentences.

Definition (Model). If a sentence p has p_A = 1, we say that p holds in A, or p is true in A, or A is a model of p.

For a theory S, a model of S is a structure that is a model for each s ∈ S.

Definition (Semantic entailment). For a theory S and a sentence t, S entails t, written as S |= t, if every model of S is a model of t.

“Whenever S is true, t is also true”.

Definition (Tautology). t is a tautology, written |= t, if ∅ |= t, i.e. it is true everywhere.

Example. (∀x)(x = x) is a tautology.

Example.

(i) Groups:

– The language L is Ω = (m, i, e) and Π = ∅, with arities 2, 1, 0 respectively.

– Let T be

{(∀x)(∀y)(∀z)m(x, m(y, z)) = m(m(x, y), z),

(∀x)(m(x, e) = x ∧ m(e, x) = x),

(∀x)(m(x, i(x)) = e ∧ m(i(x), x) = e)}.

Then an L-structure A is a model for T iff A is a group. We say T axiomatizes the theory of groups/class of groups. Sometimes we call the members of T the axioms of T.

Note that we could use a different language and theory to axiomatize group theory. For example, we can have Ω = (m, e) and change the last axiom to

(∀x)(∃y)(m(x, y) = e ∧ m(y, x) = e).

(ii) Fields:

– The language L is Ω = (+, ×, −, 0, 1) and Π = ∅, with arities 2, 2, 1, 0, 0.

– The theory T consists of:

◦ Abelian group under +

◦ × is commutative, associative, has identity 1, and distributes over +

◦ ¬(0 = 1)

◦ (∀x)((¬(x = 0)) ⇒ (∃y)(y × x = 1)).

Then an L-structure is a model of T iff it is a field. Then we have T |= “inverses are unique”, i.e.

T |= (∀x)(¬(x = 0) ⇒ (∀y)(∀z)((xy = 1 ∧ xz = 1) ⇒ (y = z)))

(iii) Posets:

– The language is Ω = ∅, and Π = {≤} with arity 2.

– The theory T is

{(∀x)(x ≤ x),

(∀x)(∀y)(∀z)((x ≤ y) ∧ (y ≤ z) ⇒ x ≤ z),

(∀x)(∀y)(x ≤ y ∧ y ≤ x ⇒ x = y)}

Then T axiomatizes the theory of posets.
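To see what “A is a model of T” means in practice, reflexivity, transitivity and antisymmetry can be checked by brute force on a finite structure. A minimal sketch; the function name `is_poset` and the example relations are illustrative.

```python
def is_poset(domain, leq):
    """Check reflexivity, transitivity and antisymmetry of leq on a finite domain."""
    domain = list(domain)
    reflexive = all(leq(x, x) for x in domain)
    transitive = all(
        leq(x, z)
        for x in domain for y in domain for z in domain
        if leq(x, y) and leq(y, z)
    )
    antisymmetric = all(
        x == y
        for x in domain for y in domain
        if leq(x, y) and leq(y, x)
    )
    return reflexive and transitive and antisymmetric


divides = lambda x, y: y % x == 0        # divisibility on positive integers
everything = lambda x, y: True           # relates every pair, so not antisymmetric
```

Both relations give structures for the language of posets, but only divisibility on {1, ..., 6} is a model of the theory.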

(iv) Graphs:

– The language L is Ω = ∅ and Π = {a} with arity 2. This relation is “adjacent to”. So a(x, y) means there is an edge between x and y.

– The theory is

{(∀x)(¬a(x, x)),

(∀x)(∀y)(a(x, y) ⇔ a(y, x))}

Predicate logic is also called “first-order logic”. It is “first-order” because our quantifiers range over elements of the structure only, and not subsets. It would be difficult (and in fact impossible) to axiomatize, say, a complete ordered field, since the definition says that every bounded subset has a least upper bound.

4.3 Syntactic implication

Again, to define syntactic implication, we need axioms and deduction rules.

Definition (Axioms of predicate logic). The axioms of predicate logic consist of the 3 usual axioms, 2 to explain how = works, and 2 to explain how ∀ works. They are

1. p ⇒ (q ⇒ p) for any formulae p, q.

2. [p ⇒ (q ⇒ r)] ⇒ [(p ⇒ q) ⇒ (p ⇒ r)] for any formulae p, q, r.

3. (¬¬p ⇒ p) for any formula p.

4. (∀x)(x = x) for any variable x.

5. (∀x)(∀y)((x = y) ⇒ (p ⇒ p[y/x])) for any variables x, y and formula p, with y not occurring bound in p.

6. [(∀x)p] ⇒ p[t/x] for any formula p, variable x, term t with no free variable of t occurring bound in p.

7. [(∀x)(p ⇒ q)] ⇒ [p ⇒ (∀x)q] for any formulae p, q with variable x not occurring free in p.

The deduction rules are

1. Modus ponens: From p and p ⇒ q, we can deduce q.

2. Generalization: From r, we can deduce (∀x)r, provided that no premise used in the proof so far had x as a free variable.

Again, we can define proofs, theorems etc.

Definition (Proof). A proof of p from S is a sequence of statements, in which each statement is either an axiom, a statement in S, or obtained via modus ponens or generalization.

Definition (Syntactic implication). If there exists a proof of a formula p from a set of formulae S, we write S ⊢ p, “S proves p”.

Definition (Theorem). If S ⊢ p, we say p is a theorem of S. (e.g. a theorem of group theory)

Note that these definitions are exactly the same as those we had in propositional logic. The only thing that changed is the set of axioms and deduction rules.

Example. {x = y, x = z} ⊢ y = z.

We go for x = z giving y = z using Axiom 5.

1. (∀x)(∀y)((x = y) ⇒ (x = z ⇒ y = z)) Axiom 5

2. [(∀x)(∀y)((x = y) ⇒ (x = z ⇒ y = z))] ⇒ (∀y)((x = y) ⇒ (x = z ⇒ y = z)) Axiom 6

3. (∀y)((x = y) ⇒ (x = z ⇒ y = z)) MP on 1, 2

4. [(∀y)((x = y) ⇒ (x = z ⇒ y = z))] ⇒ [(x = y) ⇒ (x = z ⇒ y = z)] Axiom 6

5. (x = y) ⇒ [(x = z) ⇒ (y = z)] MP on 3, 4

6. x = y Premise

7. (x = z) ⇒ (y = z) MP on 5, 6

8. x = z Premise

9. y = z MP on 7, 8

Note that in the first 5 rows, we are merely doing tricks to get rid of the ∀ signs.

We can now revisit why we forbid ∅ from being a structure. If we allowed ∅, then (∀x)⊥ holds in ∅. However, axiom 6 states that ((∀x)⊥) ⇒ ⊥. So we can deduce ⊥ in the empty structure! To fix this, we will have to add some weird clauses to our axioms, or simply forbid the empty structure!

Now we will prove the theorems we had for propositional logic.

Proposition (Deduction theorem). Let S ⊆ L, and p, q ∈ L. Then S ∪ {p} ⊢ q if and only if S ⊢ p ⇒ q.

Proof.

The proof is exactly the same as the one for propositional logic, except

in the ⇒ case, we have to check Gen.

Suppose we have lines

– r

– (∀x)r Gen

and we have a proof of S ⊢ p ⇒ r (by induction). We want to seek a proof of p ⇒ (∀x)r from S.

We know that no premise used in the proof of r from S ∪ {p} had x as a free variable, as required by the conditions of the use of Gen. Hence no premise used in the proof of p ⇒ r from S had x as a free variable.

Hence S ⊢ (∀x)(p ⇒ r).

If x is not free in p, then we get S ⊢ p ⇒ (∀x)r by Axiom 7 (and MP).

If x is free in p, then we did not use premise p in our proof of r from S ∪ {p} (by the conditions of the use of Gen). So S ⊢ r, and hence S ⊢ (∀x)r by Gen. So S ⊢ p ⇒ (∀x)r (by Axiom 1 and MP).

Now we want to show S ⊢ p iff S |= p. For example, a sentence that holds in all groups should be deducible from the axioms of group theory.

Proposition (Soundness theorem). Let S be a set of sentences, p a sentence. Then S ⊢ p implies S |= p.

Proof.

(non-examinable) We have a proof of p from S, and want to show that for every model of S, p holds.

This is an easy induction on the lines of the proof, since our axioms are

tautologies and our rules of deduction are sane.

The hard part is proving

S |= p ⇒ S ⊢ p.

This is, by the deduction theorem,

S ∪ {¬p} |= ⊥ ⇒ S ∪ {¬p} ⊢ ⊥.

This is equivalent to the contrapositive:

S ∪ {¬p} ⊬ ⊥ ⇒ S ∪ {¬p} ⊭ ⊥.

Theorem (Model existence lemma). Let S be a consistent set of sentences. Then S has a model.

We need several ideas to prove the lemma:

(i) We need to find a structure. Where can we start from? The only thing we have is the language. So we start from the language. Let A = set of all closed terms, with the obvious operations.

For example, in the theory of fields, we have “1 + 1”, “0 + 1” etc in the structure. Then (1 + 1) +_A (0 + 1) = (1 + 1) + (0 + 1).

(ii) However, we have a problem. In, say, the language of fields, and S our field axioms, our A has distinct elements “1 + 0”, “0 + 1”, “0 + 1 + 0” etc. However, S ⊢ 1 + 0 = 0 + 1 etc. So we can’t have them as distinct elements. The solution is to quotient out by the equivalence relation s ∼ t if S ⊢ (s = t), i.e. our structure is the set of equivalence classes. It is a trivial check that the +, × operations are well-defined for the equivalence classes.

(iii) We have the next problem: Suppose S is “field of characteristic 2 or 3”, i.e. S has the field axioms plus 1 + 1 = 0 ∨ 1 + 1 + 1 = 0. Then S ⊬ 1 + 1 = 0. Also S ⊬ 1 + 1 + 1 = 0. So [1 + 1] ≠ [0], and [1 + 1 + 1] ≠ [0]. But then our A has neither characteristic 2 nor 3.

This is similar to the problem we had in the propositional logic case, where we didn’t know what to do with p_3 if S only talks about p_1 and p_2. So we first extend S to a maximal consistent (or complete) S̄.

(iv) Next problem: Let S = “fields with a square root of 2”, i.e. S is the field axioms plus (∃x)(xx = 1 + 1). However, there is no closed term t which is equivalent to √2. We say we lack witnesses to the statement (∃x)(xx = 1 + 1). So we add a witness. We add a constant c to the language, and add the axiom “cc = 1 + 1” to S. We do this for each such existential statement.

(v) Now what? We have added new symbols to S, so our new S is no longer complete! Of course, we go back to (iii), and take the completion again. Then we have new existential statements to take care of, and we do (iv) again. Then we’re back to (iii) again! It won’t terminate!

So we keep on going, and finally take the union of all stages.

Proof.

(non-examinable) Suppose we have a consistent S in the language L = L(Ω, Π). Extend S to a consistent S_1 such that p ∈ S_1 or (¬p) ∈ S_1 for each sentence p ∈ L (by applying Zorn’s lemma to get a maximal consistent S_1). In particular, S_1 is complete, meaning S_1 ⊢ p or S_1 ⊢ ¬p for all p.

Then for each sentence of the form (∃x)p in S_1, add a new constant c to L and add p[c/x] to S_1, obtaining T_1 in language L_1 = L(Ω ∪ C_1, Π), where C_1 is the set of new constants. It is easy to check that T_1 is consistent.

Extend T_1 to a complete theory S_2 ⊆ L_1, and add witnesses to form T_2 ⊆ L_2 = L(Ω ∪ C_1 ∪ C_2, Π). Continue inductively.

Let S̄ = S_1 ∪ S_2 ∪ ··· in language L̄ = L_1 ∪ L_2 ∪ ··· (i.e. L̄ = L(Ω ∪ C_1 ∪ C_2 ∪ ··· , Π)).

Claim. S̄ is consistent, complete, and has witnesses, i.e. if (∃x)p ∈ S̄, then p[t/x] ∈ S̄ for some term t.

It is consistent since if S̄ ⊢ ⊥, then some S_n ⊢ ⊥ since proofs are finite. But all S_n are consistent. So S̄ is consistent.

To show completeness, for a sentence p ∈ L̄, we have p ∈ L_n for some n, as p has only finitely many symbols. So S_{n+1} ⊢ p or S_{n+1} ⊢ ¬p. Hence S̄ ⊢ p or S̄ ⊢ ¬p.

To show existence of witnesses, if (∃x)p ∈ S̄, then (∃x)p ∈ S_n for some n. Hence (by construction of T_n), we have p[c/x] ∈ T_n for some constant c, and T_n ⊆ S_{n+1} ⊆ S̄.

Now define an equivalence relation ∼ on closed terms of L̄ by s ∼ t if S̄ ⊢ (s = t). It is easy to check that this is indeed an equivalence relation. Let A be the set of equivalence classes. Define

(i) f_A([t_1], ··· , [t_n]) = [f t_1 ··· t_n] for each function symbol f ∈ Ω, α(f) = n.

(ii) ϕ_A = {([t_1], ··· , [t_n]) : S̄ ⊢ ϕ(t_1, ··· , t_n)} for each relation ϕ ∈ Π and α(ϕ) = n.

It is easy to check that this is well-defined.

Claim. For each sentence p, S̄ ⊢ p (i.e. p ∈ S̄) if and only if p holds in A, i.e. p_A = 1.

We prove this by an easy induction.

– Atomic sentences:

◦ ⊥: S̄ ⊬ ⊥, and ⊥_A = 0. So good.

◦ s = t: S̄ ⊢ s = t iff [s] = [t] (by definition) iff s_A = t_A (by definition of s_A) iff (s = t)_A = 1. So done.

◦ ϕ t_1 ··· t_n is the same.

– Induction step:

◦ p ⇒ q: S̄ ⊢ (p ⇒ q) iff S̄ ⊢ (¬p) or S̄ ⊢ q (justification: if S̄ ⊬ ¬p and S̄ ⊬ q, then S̄ ⊢ p and S̄ ⊢ ¬q by completeness, hence S̄ ⊢ ¬(p ⇒ q), contradiction). This is true iff p_A = 0 or q_A = 1 iff (p ⇒ q)_A = 1.

◦ (∃x)p: S̄ ⊢ (∃x)p iff S̄ ⊢ p[t/x] for some closed term t. This is true since S̄ has witnesses. Now this holds iff p[t/x]_A = 1 for some closed term t (by induction). This is the same as saying (∃x)p holds in A, because A is the set of (equivalence classes of) closed terms.

Here it is convenient to pretend ∃ is the primitive symbol instead of ∀. Then we can define (∀x)p to be ¬(∃x)¬p, instead of the other way round. It is clear that the two approaches are equivalent, but using ∃ as primitive makes the proof look clearer here.

Hence A is a model of S̄. Hence it is also a model of S. So S has a model.

Again, if L is countable (i.e. Ω, Π are countable), then Zorn’s Lemma is not needed.

From the Model Existence lemma, we obtain:

Corollary (Adequacy theorem). Let S be a theory, and p a sentence. Then S |= p implies S ⊢ p.

Theorem (Gödel’s completeness theorem (for first order logic)). Let S be a theory, p a sentence. Then S ⊢ p if and only if S |= p.

Proof. (⇒) Soundness, (⇐) Adequacy.

Corollary (Compactness theorem). Let S be a theory such that every finite subset of S has a model. Then so does S.

Proof.

Trivial if we replace “has a model” with “is consistent”, because proofs

are finite.

We can look at some applications of this:

Can we axiomatize the theory of finite groups (in the language of groups)? i.e. is there a set of sentences T such that the models of T are exactly the finite groups?

Corollary. The theory of finite groups cannot be axiomatized (in the language

of groups).

It is extraordinary that we can prove this, as opposed to just “believing it

should be true”.

Proof.

Suppose theory T has as its models all finite groups and nothing else. Let T′ be T together with

– (∃x_1)(∃x_2)(¬(x_1 = x_2)) (intuitively, |G| ≥ 2)

– (∃x_1)(∃x_2)(∃x_3)(¬(x_1 = x_2) ∧ ¬(x_1 = x_3) ∧ ¬(x_2 = x_3)) (intuitively, |G| ≥ 3)

– ···

Then T′ has no model, since each model has to be simultaneously arbitrarily large and finite. But every finite subset of T′ does have a model (e.g. Z_n for some n). Contradiction.
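The two ingredients of this argument can be spelled out mechanically: the sentence saying “there are at least n elements”, and the choice of a model for any finite subset of T′. The sketch below is illustrative only; `at_least` produces the sentence as a string, and `model_size_for` is a hypothetical helper returning an n for which the cyclic group Z_n satisfies every size constraint in the finite subset.

```python
from itertools import combinations


def at_least(n):
    """The sentence 'there exist n pairwise distinct elements', as a string."""
    xs = [f"x{k}" for k in range(1, n + 1)]
    quantifiers = "".join(f"(∃{x})" for x in xs)
    distinct = " ∧ ".join(f"¬({a} = {b})" for a, b in combinations(xs, 2))
    return f"{quantifiers}({distinct})"


def model_size_for(sizes):
    """Given the finitely many k with '|G| ≥ k' in the subset, return a suitable n."""
    return max(sizes, default=1)


subset = {2, 5, 9}                 # a finite subset mentions only these constraints
n = model_size_for(subset)         # Z_n is a finite group with at least k elements
```

No single n works for all of T′ at once, which is exactly why T′ has no model while every finite subset does.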

This proof looks rather simple, but it is not “easy” in any sense. We are

using the full power of completeness (via compactness), and this is not easy to

prove!

Corollary. Let S be a theory with arbitrarily large finite models. Then S has an infinite model.

“Finiteness is not a first-order property”

Proof. Same as above.

Similarly, we have

Corollary (Upward Löwenheim-Skolem theorem). Let S be a theory with an infinite model. Then S has an uncountable model.

Proof. Add constants {c_i : i ∈ I} to L for some uncountable I.

Let T = S ∪ {“c_i ≠ c_j” : i, j ∈ I, i ≠ j}.

Then any finite T′ ⊆ T has a model, since it can only mention finitely many of the c_i. So any infinite model of S will do. Hence by compactness, T has a model, and this model is uncountable since the c_i are interpreted as distinct elements.

Similarly, we have a model for S that does not inject into X, for any chosen set X. For example, we can add γ(X) constants, or P(X) constants.

Example. There exists an infinite field (Q). So there exists an uncountable field (e.g. R). Also, there is a field that does not inject into P(P(R)), say.

Theorem (Downward Löwenheim-Skolem theorem). Let L be a countable language (i.e. Ω and Π are countable). Then if S has a model, then it has a countable model.

Proof.

The model constructed in the proof of the model existence theorem is countable.

Note that the proof of the model existence theorem is non-examinable, but

the proof of this is examinable! So we are supposed to magically know that the

model constructed in the proof is countable without knowing what the proof

itself is!

4.4 Peano Arithmetic

As an example, we will make the usual axioms for N into a first-order theory. We take Ω = {0, s, +, ×}, and Π = ∅. The arities are α(0) = 0, α(s) = 1, α(+) = α(×) = 2.

The operation s is the successor operation, which should be thought of as “+1”.

Our axioms are

Definition (Peano’s axioms). The axioms of Peano’s arithmetic (PA) are

(i) (∀x)¬(s(x) = 0).

(ii) (∀x)(∀y)((s(x) = s(y)) ⇒ (x = y)).

(iii) (∀y_1) ··· (∀y_n)([p[0/x] ∧ (∀x)(p ⇒ p[s(x)/x])] ⇒ (∀x)p). This is actually infinitely many axioms, one for each formula p with free variables y_1, ··· , y_n, x, i.e. it is an axiom scheme.

(iv) (∀x)(x + 0 = x).

(v) (∀x)(∀y)(x + s(y) = s(x + y)).

(vi) (∀x)(x × 0 = 0).

(vii) (∀x)(∀y)(x × s(y) = (x × y) + x).
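Axioms (iv) to (vii) are recursive definitions of + and × in terms of the successor function, and in the standard model N they can be evaluated directly. A minimal sketch, assuming Python integers stand for elements of N and taking the multiplication clause to be the standard one, x × s(y) = (x × y) + x:

```python
def s(n):
    """Successor: should be thought of as +1."""
    return n + 1


def add(x, y):
    # (iv) x + 0 = x;  (v) x + s(y) = s(x + y)
    return x if y == 0 else s(add(x, y - 1))


def mul(x, y):
    # (vi) x × 0 = 0;  (vii) x × s(y) = (x × y) + x
    return 0 if y == 0 else add(mul(x, y - 1), x)
```

For example, `mul(3, 4)` unfolds (vii) four times and then uses (v) repeatedly to compute each sum.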

Note that our third axiom looks rather funny with all the (∀y_i) in front. Our first guess at writing it would be

[p[0/x] ∧ (∀x)(p ⇒ p[s(x)/x])] ⇒ (∀x)p.

However, this is in fact not sufficient. Suppose we want to prove that for all x and y, x + y = y + x. The natural thing to do would be to fix a y and induct on x (or the other way round). We want to be able to fix any y to do so. So we need a (∀y) in front of our induction axiom, so that we can prove it for all values of y all at once, instead of proving it once for y = 0, once for y = 1, once for y = 1 + 1 etc. This is important, since we might have an uncountable model of PA, and we cannot name all y. When we actually think about it, we can just forget about the (∀y_i)s. But just remember that formally we need them.

We know that PA has a model N that is infinite. So it has an uncountable model by Upward Löwenheim-Skolem. Then clearly this model is not isomorphic to N. However, we are also told that the axioms of arithmetic characterize N completely. Why is this so?

This is since Axiom 3 is not full induction, but a “first-order” version. The proper induction axiom talks about subsets of N, i.e.

(∀S ⊆ N)((0 ∈ S ∧ (∀x)(x ∈ S ⇒ s(x) ∈ S)) ⇒ S = N).

However, there are uncountably many subsets of N, and countably many formulae p. So our Axiom 3 only talks about some subsets of N, not all.

Now the important question is: is PA complete?

Gödel’s incompleteness theorem says: no! There exists some p with PA ⊬ p and PA ⊬ ¬p.

Hence, there is a p that is true in N, but PA ⊬ p.

Note that this does not contradict Gödel’s completeness theorem. The completeness theorem tells us that if p holds in every model of PA (not just in N), then PA ⊢ p.

4.5 Completeness and categoricity*

We now study some “completeness” property of theories. For convenience, we

will assume that the language is countable.

Definition (Complete theory). A theory T is complete if for all propositions p in the language, either T ⊢ p or T ⊢ ¬p.

Complete theories are rather hard to find. So for the moment, we will content

ourselves by just looking at theories that are not complete.

Example. The theory of groups is not complete, since the proposition

(∀g)(∀h)(gh = hg)

can be neither proven nor disproven (there are abelian groups and non-abelian

groups).

Another “completeness” notion we might be interested in is categoricity. We

will need to use the notion of a cardinal, which will be introduced later in the

course. Roughly, a cardinal is a mathematical object that denotes the sizes of

sets, and a set “has cardinality κ” if it bijects with κ.

Definition (κ-categorical). Let κ be an infinite cardinal. Then a theory T is κ-categorical if there is a unique model of the theory of cardinality κ up to isomorphism.

Here the notion of “isomorphism” is the obvious one — a homomorphism of

models is a function between the structures that preserves everything, and an

isomorphism is a homomorphism with an inverse (that is also a homomorphism).

Example (Pure identity theory). The pure identity theory has an empty language and no axioms. Then this theory is κ-categorical for any κ, since if two models have the same cardinality, then by definition there is a bijection between them, and any such bijection is an isomorphism because there are no properties to be preserved.

One nice thing about categorical theories is that they are complete!

Proposition. Let T be a theory that is κ-categorical for some κ, and suppose T has no finite models. Then T is complete.

Note that the requirement that T has no finite models is necessary. For example, pure identity theory is κ-categorical for all κ but is not complete, since the statement (∃x)(∃y)(¬(x = y)) is neither provable nor disprovable.

Proof. Let p be a proposition. Suppose T ⊬ p and T ⊬ ¬p. Then T ∪ {¬p} and T ∪ {p} are both consistent, so both have models, and these models are infinite (since the models cannot be finite). So by the Löwenheim–Skolem theorems, we can find such models of cardinality κ. But since one satisfies p and the other does not, they cannot be isomorphic. This contradicts κ-categoricity.

We are now going to use this idea to prove the Ax-Grothendieck theorem.

Theorem (Ax-Grothendieck theorem). Let f : C^n → C^n be a complex polynomial. If f is injective, then it is in fact a bijection.

We will use the following result from field theory without proof:

Lemma. Any two uncountable algebraically closed fields with the same cardinality and the same characteristic are isomorphic. In other words, the theory of algebraically closed fields of characteristic p (for p a prime or 0) is κ-categorical for all uncountable cardinals κ, and in particular complete.

The rough idea (for the field theorists) is that an algebraically closed field is uniquely determined by its transcendence degree over the base field Q or F_p, and for an uncountable field the transcendence degree equals the cardinality.

In the following proof, we will also use some (very) elementary results about

field theory that can be found in IID Galois Theory.

Proof of Ax-Grothendieck.

We will use compactness and completeness to show

that we only have to prove this for fields of positive characteristic, and the result

can be easily proven since we end up dealing with finite fields.

Let ACF be the theory of algebraically closed fields. The language is the language of rings, and the axioms are the usual axioms of a field, plus the following axiom for each n > 0:

(∀a_0, a_1, ··· , a_{n−1})(∃x)(x^n + a_{n−1} x^{n−1} + ··· + a_1 x + a_0 = 0).

Let ACF_0 denote the theory of algebraically closed fields of characteristic 0, where we add the axiom

1 + 1 + ··· + 1 (n times) ≠ 0 (∗)

for all n > 0 to ACF.

Let ACF_p denote the theory of algebraically closed fields of characteristic p, where we add the axiom

1 + 1 + ··· + 1 (p times) = 0

to ACF.

We now notice the following fact: if r is a proposition that is a theorem of ACF_p for all primes p, then it is a theorem of ACF_0. Indeed, we know that ACF_0 is complete. So if r is not a theorem in ACF_0, then ¬r is a theorem. But the proof is finite, so it can only use finitely many instances of (∗). So there is some large p (larger than every n appearing in those instances) where ¬r can be proven in ACF_p, which is a contradiction.

Now the statement “If f is a polynomial of degree d and f is injective, then f is surjective” can be expressed as a first-order statement. So we just have to prove it for all algebraically closed fields of characteristic p > 0. Moreover, by completeness, for each p, we only need to prove it for some algebraically closed field of characteristic p.

Fix a prime p, and consider F̄_p, the algebraic closure of F_p. This is an algebraically closed field with the property that every element is algebraic over F_p, i.e. the field generated by any finite subset of elements is finite.

Let f : F̄_p^n → F̄_p^n be an injective polynomial function involving coefficients a_1, ··· , a_K. Let b = (b_1, ··· , b_n) ∈ F̄_p^n be a point. Let F be the field generated by {b_1, ··· , b_n, a_1, ··· , a_K}. Then f restricts to a function from F^n to itself. But F is finite, so any function F^n → F^n that is injective must also be surjective. So b is in the image of f. So f is surjective. So done.

We conclude the section by stating, without proof, a theorem by Morley:

Theorem (Morley’s categoricity theorem). Let T be a theory with a countable language. If T is κ-categorical for some uncountable cardinal κ, then it is µ-categorical for all uncountable cardinals µ.

Hence we often just say a theory is uncountably categorical when the theory

is categorical for some (hence all) uncountable cardinals.

5 Set theory

Here we’ll axiomatize set theory as “just another first-order theory”, with

signatures, structures etc. There are many possible formulations, but the most

common one is Zermelo-Fraenkel set theory (with the axiom of choice), which is

what we will study.

5.1 Axioms of set theory

Definition (Zermelo-Fraenkel set theory). Zermelo-Fraenkel set theory (ZF)

has language Ω = ∅, Π = {∈}, with arity 2.

Then a “universe of sets” will simply mean a model of these axioms, a pair (V, ∈), where V is a set and ∈ is a binary relation on V in which the axioms are true (officially, we should write ∈_V, but it’s so weird that we don’t do it). Each model (V, ∈) will (hopefully) contain a copy of all of maths, and so will look very complicated!

ZF will have 2 axioms to get started with, 4 to build things, and 3 more

weird ones one might not realize are needed.

Axiom (Axiom of extension). “If two sets have the same elements, they are the

same set”.

(∀x)(∀y)((∀z)(z ∈ x ⇔ z ∈ y) ⇒ x = y).

We could replace the ⇒ with an ⇔, but the converse statement x = y ⇒ (z ∈ x ⇔ z ∈ y) is an instance of a logical axiom, and we don’t have to explicitly state it.

Axiom (Axiom of separation). “Can form subsets of sets”. More precisely, for

any set x and a formula p, we can form {z ∈ x : p(z)}.

(∀t_1) ··· (∀t_n)(∀x)(∃y)(∀z)(z ∈ y ⇔ (z ∈ x ∧ p)).

This is an axiom scheme, with one instance for each formula p with free variables t_1, ··· , t_n, z.

Note again that we have those funny (∀t_i). We do need them to form, e.g. {z ∈ x : t ∈ z}, where t is a parameter.

This is sometimes known as Axiom of comprehension.

Axiom (Axiom of empty set). “The empty-set exists”

(∃x)(∀y)(y ∉ x).

We write ∅ for the (unique, by extension) set with no members. This is an abbreviation: p(∅) means (∃x)(x has no members ∧ p(x)). Similarly, we tend to write {z ∈ x : p(z)} for the set given by separation.

Axiom (Axiom of pair set). “Can form {x, y}”.

(∀x)(∀y)(∃z)(∀t)(t ∈ z ⇔ (t = x ∨ t = y)).

We write {x, y} for this set. We write {x} for {x, x}.

We can now define ordered pairs:

Definition (Ordered pair). An ordered pair (x, y) is {{x}, {x, y}}.

We define “x is an ordered pair” to mean (∃y)(∃z)(x = (y, z)).

We can show that (a, b) = (c, d) ⇔ (a = c) ∧(b = d).
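If you don't feel like proving this, you can at least test it on small sets. The following sketch models hereditarily finite sets as Python frozensets (our own modelling choice) and brute-forces the characteristic property of the pair over a handful of sets.

```python
from itertools import product


def pair(x, y):
    """Kuratowski ordered pair (x, y) = {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})


EMPTY = frozenset()
atoms = [EMPTY, frozenset({EMPTY}), frozenset({frozenset({EMPTY})})]

# (a, b) = (c, d)  iff  a = c and b = d, checked over all 3^4 choices.
ok = all(
    (pair(a, b) == pair(c, d)) == (a == c and b == d)
    for a, b, c, d in product(atoms, repeat=4)
)
```

Note the degenerate case: (x, x) collapses to {{x}}, a set with a single member, and the property still holds.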

Definition (Function). We define “f is a function” to mean

(∀x)(x ∈ f ⇒ x is an ordered pair) ∧

(∀x)(∀y)(∀z)[((x, y) ∈ f ∧ (x, z) ∈ f) ⇒ y = z].

We define x = dom f to mean f is a function and (∀y)(y ∈ x ⇔ (∃z)((y, z) ∈ f)).

We define f : x → y to mean f is a function and dom f = x and

(∀z)[(∃t)((t, z) ∈ f) ⇒ z ∈ y].

Once we’ve defined them formally, let’s totally forget about the definition

and move on with life.

Axiom (Axiom of union). “We can form unions”. Intuitively, we have a ∪ b ∪ c = {x : x ∈ a or x ∈ b or x ∈ c}, but instead of a ∪ b ∪ c, we write ⋃{a, b, c} so that we can express infinite unions as well.

(∀x)(∃y)(∀z)(z ∈ y ⇔ (∃t)(t ∈ x ∧ z ∈ t)).

We tend to write ⋃x for the set given above. We also write x ∪ y for ⋃{x, y}.

Note that we can define the intersection ⋂x as a subset of y (for any y ∈ x) by separation, so we don’t need an axiom for that.

Axiom (Axiom of power set). “Can form power sets”.

(∀x)(∃y)(∀z)(z ∈ y ⇔ z ⊆ x),

where z ⊆ x means (∀t)(t ∈ z ⇒ t ∈ x).

We tend to write P(x) for the set generated above.

We can now form x × y, as a subset of P(P(x ∪ y)), because for t ∈ x, s ∈ y, we have (t, s) = {{t}, {t, s}} ∈ P(P(x ∪ y)).

Similarly, we can form the set of all functions from x to y as a subset of P(x × y) (or P(P(P(x ∪ y)))!).
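To see that the product really does sit inside P(P(x ∪ y)), here is a small sketch with frozensets. The integers 0, 1, 2 are used as stand-in elements purely for readability (in ZF they would themselves be sets), and the helper names are our own.

```python
from itertools import chain, combinations


def powerset(x):
    """P(x): the set of all subsets of x."""
    items = list(x)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return frozenset(frozenset(s) for s in subsets)


def pair(t, s):
    """Kuratowski ordered pair (t, s) = {{t}, {t, s}}."""
    return frozenset({frozenset({t}), frozenset({t, s})})


def cartesian(x, y):
    """x × y, carved out of P(P(x ∪ y)) in the style of separation."""
    big = powerset(powerset(x | y))
    wanted = {pair(t, s) for t in x for s in y}
    return frozenset(z for z in big if z in wanted)


x = frozenset({0, 1})
y = frozenset({1, 2})
prod = cartesian(x, y)
```

Here P(P(x ∪ y)) has 2^8 = 256 members, of which only 4 are ordered pairs of the right shape.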

Now we’ve got the easy axioms. Time for the weird ones.

Axiom of infinity

From the axioms so far, we cannot prove that there is an infinite set! We start

from the empty set, and all the operations above can only produce finite sets.

So we need an axiom to say that the natural numbers exist.

But how could we do so in finitely many words? So far, we do have infinitely many sets. For example, if we write x^+ for x ∪ {x}, it is easy to check that ∅, ∅^+, ∅^++, ··· are all distinct.

(Writing them out explicitly, we have ∅^+ = {∅}, ∅^++ = {∅, {∅}}, ∅^+++ = {∅, {∅}, {∅, {∅}}}. We can also write 0 for ∅, 1 for ∅^+, 2 for ∅^++. So 0 = ∅, 1 = {0}, 2 = {0, 1}, ···)
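These sets are easy to build by machine. A minimal sketch, with sets modelled as Python frozensets (the names succ and numeral are ours):

```python
EMPTY = frozenset()


def succ(x):
    """The successor x^+ = x ∪ {x}."""
    return x | frozenset({x})


def numeral(n):
    """The n-fold successor of the empty set, i.e. the set we call n."""
    x = EMPTY
    for _ in range(n):
        x = succ(x)
    return x
```

For instance, numeral(3) has exactly the three members numeral(0), numeral(1), numeral(2), matching 3 = {0, 1, 2}.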

Note that all of these are individually sets, and so V is definitely infinite. However, inside V, V is not a set, or else we can apply separation to obtain Russell’s paradox.

So the collection of all 0, 1, ··· need not be a set. Therefore we want an axiom that declares this is a set. So we want an axiom stating ∃x such that ∅ ∈ x, ∅^+ ∈ x, ∅^++ ∈ x, ···. But this is an infinite statement, and we need a smarter way of formulating this.

Axiom (Axiom of infinity). “There is an infinite set”.

(∃x)(∅ ∈ x ∧ (∀y)(y ∈ x ⇒ y^+ ∈ x)).

We say any set that satisfies the above axiom is a successor set.

A successor set can contain a lot of nonsense elements. How do we want to

obtain N, without nonsense?

We know that the intersection of successor sets is also a successor set. So there is a least successor set, i.e. the intersection of all successor sets. Call this ω. This will be our copy of N in V. So

(∀x)(x ∈ ω ⇔ (∀y)(y is a successor set ⇒ x ∈ y)).

Therefore, we have

(∀x)[(x ⊆ ω ∧ x is a successor set) ⇒ x = ω],

by definition of ω. By the definition of “is a successor set”, we can write this as

(∀x)[(x ⊆ ω ∧ ∅ ∈ x ∧ (∀y)(y ∈ x ⇒ y^+ ∈ x)) ⇒ x = ω].

This is genuine induction (in V)! It is not just our weak first-order axiom in Peano’s axioms.

Also, we have

(∀x)(x ∈ ω ⇒ x^+ ≠ ∅) and (∀x)(∀y)((x ∈ ω ∧ y ∈ ω ∧ x^+ = y^+) ⇒ x = y)

by ω-induction (i.e. induction on ω). Hence ω satisfies the axioms of natural numbers.

We can now write, e.g. “x is finite” for (∃y)(∃f)(y ∈ ω ∧ f bijects x with y). Similarly, we define “x is countable” to mean “x is finite or x bijects with ω”.

That’s about it for this axiom. Time for the next one:

Axiom of foundation

“Sets are built out of simpler sets”. We want to ban weird things like x ∈ x, or x ∈ y ∧ y ∈ x, or similarly for longer chains. We also don’t want infinite descending chains like ··· x_3 ∈ x_2 ∈ x_1.

How can we capture the “wrongness” of these weird things all together? In the first case x ∈ x, we see that {x} has no ∈-minimal element (we say y is ∈-minimal in x if y ∈ x and (∀z ∈ x)(z ∉ y)). In the second case, {x, y} has no minimal element. In the last case, {x_1, x_2, ···} has no ∈-minimal element.

Axiom (Axiom of foundation). “Every (non-empty) set has an ∈-minimal member”

(∀x)(x ≠ ∅ ⇒ (∃y)(y ∈ x ∧ (∀z)(z ∈ x ⇒ z ∉ y))).

This is sometimes known as the Axiom of regularity.

Note that most of the time, we don’t actually use the Axiom of Foundation.

It’s here just so that our universe “looks nice”. Most results in set theory don’t

rely on foundation.

We will later show that this entails that all sets in V can be “built out of” the empty set, but that’s left for a later time.

Axiom of replacement

In ordinary mathematics, we often say things like “for each i ∈ I, we have some A_i. Now take {A_i : i ∈ I}”. For example, (after defining ordinals), we want to have the set {ω + i : i ∈ ω}.

How do we know that {A_i : i ∈ I} is a set? How do we know that i ↦ A_i is a function, i.e. that {(i, A_i) : i ∈ I} is a set? It feels like it should be. We want an axiom that says something along the lines of “the image of a set under a function is a set”. However, we do not know that the thing is a function yet. So we will have “the image of a set under something that looks like a function is a set”.

To formally talk about “something that looks like a function”, we need a

digression on classes:

Digression on classes

x ↦ {x} for all x looks like a function, but isn’t one, since every function f has a (set) domain, defined as a suitable subset of ⋃⋃f, but our function here has domain V.

So what is this x ↦ {x}? We call it a class.

Definition (Class). Let (V, ∈) be an L-structure. A class is a collection C of points of V such that, for some formula p with free variable x (and maybe more funny parameters), we have

x ∈ C ⇔ p holds.

Intuitively, everything of the form {x ∈ V : p(x)} is a class.

Note that here we are abusing notation. When we say x ∈ C, the symbol ∈ does not mean the membership relation in V. Inside the theory, we should view x ∈ C as a shorthand of “p(x) holds”.

Example.

(i) V is a class, by letting p be “x = x”.

(ii) For any t, {x ∈ V : t ∈ x} is a class, with p being t ∈ x. Here t is a parameter.

(iii) For any set y ∈ V , y is a class — let p be “x ∈ y”.

Definition (Proper class). We say C is a proper class if C is not a set (in V), i.e.

¬(∃y)(∀x)(x ∈ y ⇔ p).

Similarly, we can have

Definition (Function-class). A function-class F is a collection of ordered pairs such that there is a formula p with free variables x, y (and maybe more) such that

(x, y) ∈ F ⇔ p holds, and (x, y) ∈ F ∧ (x, z) ∈ F ⇒ y = z.

Example. x ↦ {x} is a function-class: take p to be y = {x}.

Back to replacement

How do we talk about function-classes? We cannot refer to classes inside V. Instead, we must refer to the first-order formula p.

Axiom (Axiom of replacement). “The image of a set under a function-class is a set”. This is an axiom scheme, with an instance for each first-order formula p:

(∀t_1) ··· (∀t_n)([(∀x)(∀y)(∀z)((p ∧ p[z/y]) ⇒ y = z)] ⇒ [(∀x)(∃y)(∀z)(z ∈ y ⇔ (∃t)(t ∈ x ∧ p[t/x, z/y]))]).

Here t_1, ··· , t_n are the parameters, the first bracket says that p defines a function-class F, and the second bracket says that the image of x under F is a set.



Example. For any set x, we can form {{t} : t ∈ x}, with p being “y = {x}”.

However, this is a bad use of replacement, because we can already obtain that

by power set (and separation).

We will give a good example later.

So that’s all the axioms of ZF. Note that we did not include the Axiom of

choice!

Axiom of choice

Definition (ZFC). ZFC is the axioms ZF + AC, where AC is the axiom of

choice, “every family of non-empty sets has a choice function”.

(∀f)[(∀x)(x ∈ dom f ⇒ f(x) ≠ ∅) ⇒

(∃g)((dom g = dom f) ∧ (∀x)(x ∈ dom g ⇒ g(x) ∈ f(x)))]

Here we define a family of sets {A_i : i ∈ I} to be a function f : I → V such that i ↦ A_i.

5.2 Properties of ZF

Now, what does V look like? We start with the following definition:

Definition (Transitive set). A set x is transitive if every member of a member of x is a member of x:

(∀y)((∃z)(y ∈ z ∧ z ∈ x) ⇒ y ∈ x).

This can be more concisely written as ⋃x ⊆ x, but is very confusing and impossible to understand!

This notion seems absurd, and it is. It is only of importance in the context of set theory and understanding what V looks like. It is an utterly useless notion in, say, group theory.

Example. {∅, {∅}} is transitive. In general, for any n ∈ ω, n is transitive. ω is also transitive.

Lemma. Every x is contained in a transitive set.

Note that this is a theorem of ZF, i.e. it officially means: let (V, ∈) be a model of ZF. Then in V, <stuff above>. Equivalently, ZF ⊢ <stuff above>.

We know that any intersection of transitive sets is transitive. So this lemma will tell us that x is contained in a least transitive set, called the transitive closure of x, or TC(x).

Proof. We’d like to form “x ∪ (⋃x) ∪ (⋃⋃x) ∪ (⋃⋃⋃x) ∪ ···”. If this makes sense, then we are done, since the final product is clearly transitive. This will be a set by the union axiom applied to {x, ⋃x, ⋃⋃x, ···}, which itself is a set by replacement applied to ω, for the function-class 0 ↦ x, 1 ↦ ⋃x, 2 ↦ ⋃⋃x etc.

Of course we have to show that the above is a function class, i.e. can be

expressed as a first order relation. We might want to write the sentence as:

p(s, t) is (s = 0 ∧ t = x) ∨ (∃u)(∃v)(s = u + 1 ∧ t = ⋃v ∧ p(u, v)),

but this is complete nonsense! We are defining p in terms of itself!

The solution would be to use attempts, as we did previously for recursion. We define “f is an attempt” to mean “f is a function and dom f ∈ ω and dom f ≠ ∅ and f(0) = x and (∀n)((n ∈ ω ∧ n ∈ dom f ∧ n ≠ 0) ⇒ f(n) = ⋃f(n − 1))”, i.e. f is defined for some natural numbers and meets our requirements when it is defined.

Then it is easy to show that two attempts f and f′ agree whenever both are defined. Also, ∀n ∈ ω, there is an attempt f defined for n (both by ω-induction).

Note that the definition of an attempt is a first-order formula. So our function class is

p(y, z) is (∃f)(f is an attempt ∧ y ∈ dom f ∧ f(y) = z).

This is a good use of replacement, unlike our example above.
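For hereditarily finite sets, the iteration x, ⋃x, ⋃⋃x, ··· stops after finitely many steps, so we can simply compute it. The following is a sketch under that assumption, with sets modelled as frozensets whose members are again frozensets; the function names are ours.

```python
def big_union(x):
    """⋃x: the set of all members of members of x."""
    return frozenset(z for y in x for z in y)


def transitive_closure(x):
    """x ∪ ⋃x ∪ ⋃⋃x ∪ ···, computed until the layers run out."""
    result, layer = frozenset(), x
    while layer:
        result |= layer
        layer = big_union(layer)
    return result


def is_transitive(x):
    """Every member of x is also a subset of x."""
    return all(y <= x for y in x)


e = frozenset()
a = frozenset({e})      # {∅}
b = frozenset({a})      # {{∅}}
x = frozenset({b})      # {{{∅}}}
tc = transitive_closure(x)
```

Here x itself is not transitive, but tc = {∅, {∅}, {{∅}}} is, and it contains x as a subset. In an infinite setting the loop would not terminate, which is exactly why the proof needs replacement and union.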

We previously said that Foundation captures the notion “every set is built

out of simpler sets”. What does that exactly mean? If this is true, we should be

able to do induction on it: if p(y) for all y ∈ x, then p(x).

If this sounds weird, think about the natural numbers: everything is built

from 0 using +1. So we can do induction on ω.

Theorem (Principle of ∈-induction). For each formula p, with free variables t_1, ··· , t_n, x,

(∀t_1) ··· (∀t_n)([(∀x)((∀y)(y ∈ x ⇒ p(y)) ⇒ p(x))] ⇒ (∀x)(p(x))).

Note that officially, p(y) means p[y/x] and p(x) is simply p.

Proof. Given t_1, ··· , t_n, suppose ¬(∀x)p(x). So we have some x with ¬p(x). Similar to how we proved regular induction on naturals from the well-ordering principle (in IA Numbers and Sets), we find a minimal x such that p(x) does not hold.

While foundation allows us to take the minimal element of a set, {y : ¬p(y)} need not be a set, e.g. if p(y) is y ≠ y.

Instead, we pick a single x such that ¬p(x). Let u = TC({x}). Then {y ∈ u : ¬p(y)} ≠ ∅, since x ∈ u. So it has an ∈-minimal element, say y, by Foundation. Then each z ∈ y has z ∈ u since u is transitive. Hence p(z) by minimality of y. But this implies p(y). Contradiction.

Note that here we used the transitive closure to reduce from reasoning about the whole scary V, to an actual set TC({x}). This technique is rather useful.

We have now used Foundation to prove

∈

-induction. It turns out that the

two are in fact equivalent.

Proposition. ∈-induction ⇒ Foundation.

Proof. To deduce foundation from ∈-induction, the obvious p(x), “x has an ∈-minimal member”, doesn’t work.

Instead, consider p(x) given by

(∀y)(x ∈ y ⇒ y has an ∈-minimal member).

If p(x) is true, we say x is regular. To show that (∀x)p(x), it is enough to show that: if every y ∈ x is regular, then x is regular.

Given any z with x ∈ z, we want to show that z has an ∈-minimal member. If x is itself minimal in z, then done. Otherwise, y ∈ z for some y ∈ x. But since y ∈ x, y is regular. So z has a minimal element.

Hence all x are regular. Since all non-empty sets contain at least one element (by definition), all non-empty sets have an ∈-minimal member.

This looked rather simple to prove. However, it is because we had the clever

idea of regularity. If we didn’t we would be stuck for a long time!

Now what about recursion? Can we define f(x) using f(y) for all y ∈ x?

Theorem (∈-recursion theorem). Let G be a function-class, everywhere defined. Then there is a function-class F such that F(x) = G(F|_x) for all x. Moreover, F is unique (cf. definition of recursion on well-orderings).

Note that F|_x = {(y, F(y)) : y ∈ x} is a set, by replacement.

Proof. We first show existence. Again, we prove this with attempts. Define “f is an attempt” to mean “f is a function and dom f is transitive and (∀x)(x ∈ dom f ⇒ f(x) = G(f|_x))”.

Then by simple ∈-induction, we have

(∀x)(∀f)(∀f′)[(f an attempt defined at x ∧ f′ an attempt defined at x) ⇒ f(x) = f′(x)].

Also, (∀x)(∃f)(f an attempt defined at x), again by ∈-induction: suppose for each y ∈ x, there exists an attempt defined at y. So there exists a unique attempt f_y with domain TC({y}). Set f = ⋃_{y∈x} f_y, and let f′ = f ∪ {(x, G(f|_x))}. Then this is an attempt defined at x.

So we take q(x, y) to be

(∃f)(f is an attempt defined at x with f(x) = y).

Uniqueness follows from ∈-induction.

Note that this is exactly the same as the proof of recursion on well-orderings.

It is just that we have some extra mumbling about transitive closures.

So we proved recursion and induction for ∈. What property of the relation-class ∈ (with p(x, y) defined as x ∈ y) did we use? We used two properties, the first of which is exactly the Axiom of foundation:

(i) p is well-founded: every non-empty set has a p-minimal element.

(ii) p is local: i.e. {x : p(x, y)} is a set for each y. We needed this to form the transitive closure.

Definition (Well-founded relation). A relation-class R is well-founded if every non-empty set has an R-minimal element.

Definition (Local relation). A relation-class R is local if {x : x R y} is a set for each y.

So we will have

Proposition. p-induction and p-recursion are well-defined and valid for any p(x, y) that is well-founded and local.

Proof. Same as above.

Note that if r is a relation on a set a, then r is trivially local. So we just need r to be well-founded. In particular, our induction and recursion theorems for well-orderings are special cases of this.

We have almost replicated everything we proved for well-orderings, except

for subset collapse. We will now do that.

This is motivated by the following question: can we model a given relation

on a set by ∈?

For example, let a = {b, c, d}, with b r c and c r d. Can we find a set {b′, c′, d′} such that b′ ∈ c′ and c′ ∈ d′? Yes. We can put b′ = ∅, c′ = {∅}, d′ = {{∅}}. Moreover, a′ = {b′, c′, d′} is transitive.

Definition (Extensional relation). We say a relation r on a set a is extensional if

(∀x ∈ a)(∀y ∈ a)((∀z ∈ a)(z r x ⇔ z r y) ⇒ x = y).

i.e. it obeys the axiom of extension.

Theorem (Mostowski collapse theorem). Let r be a relation on a set a that is well-founded and extensional. Then there exists a transitive b and a bijection f : a → b such that (∀x, y ∈ a)(x r y ⇔ f(x) ∈ f(y)). Moreover, b and f are unique.

Note that the two conditions “well-founded” and “extensional” are trivially

necessary, since ∈ is both well-founded and extensional.

Proof. Existence: define f the obvious way: f(x) = {f(y) : y r x}. This is well-defined by r-recursion, and is a genuine function, not just a function-class, by replacement: it is an image of a.

Let b = {f(x) : x ∈ a} (this is a set by replacement). We need to show that b is transitive and that f is bijective.

By definition of f, b is transitive, and f is surjective as b is defined to be the image of f. So we have to show that f is injective.

We’ll show that (∀y ∈ a)(f(y) = f(x) ⇒ y = x) for each x ∈ a, by r-induction on x. Given y ∈ a, with f(y) = f(x), we have {f(t) : t r y} = {f(s) : s r x} by definition of f. So {t : t r y} = {s : s r x} by the induction hypothesis. Hence x = y since r is extensional.

So we have constructed such a b and f. Now we show it is unique: for any suitable f, f′, we have f(x) = f′(x) for all x ∈ a by r-induction.
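For a finite set a, the recursion f(x) = {f(y) : y r x} can be run directly. The sketch below assumes r is given as a set of pairs (y, x) meaning y r x, and that r is well-founded (otherwise the recursion never bottoms out); the representation and names are our own choices.

```python
def collapse(a, r):
    """Mostowski collapse f(x) = {f(y) : y r x} of a finite well-founded r on a."""
    cache = {}

    def f(x):
        if x not in cache:
            # Recurse on the r-predecessors of x.
            cache[x] = frozenset(f(y) for (y, z) in r if z == x)
        return cache[x]

    return {x: f(x) for x in a}


# The example from the text: b r c and c r d.
image = collapse({"b", "c", "d"}, {("b", "c"), ("c", "d")})

# A well-ordering 0 < 1 < 2 collapses to the sets 0, 1, 2 built from ∅.
order = collapse(range(3), {(i, j) for j in range(3) for i in range(j)})
```

In the first example we recover b′ = ∅, c′ = {∅}, d′ = {{∅}}. In the second, order[2] = {order[0], order[1]}, which is how well-orderings collapse to ordinals.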

Recall that we defined the ordinals to be the “equivalence class” of all well-

orderings. But this is not a good definition since the ordinals won’t be sets.

Hence we define them (formally) as follows:

Definition (Ordinal). An ordinal is a transitive set, totally ordered by ∈.

This is automatically well-ordered by ∈, by foundation.

Example. ∅, {∅}, {∅, {∅}} are ordinals. Any n ∈ ω, being n = {0, 1, ··· , n − 1}, as well as ω itself, are ordinals.
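For hereditarily finite sets we can test the definition mechanically. This is a sketch with frozensets; it only checks that any two members are comparable under ∈ or equal, and leaves well-foundedness to the fact that Python values cannot contain themselves.

```python
def is_transitive(x):
    """Every member of x is also a subset of x."""
    return all(y <= x for y in x)


def is_ordinal(x):
    """Transitive, and any two distinct members are related by ∈."""
    comparable = all(y == z or y in z or z in y for y in x for z in x)
    return is_transitive(x) and comparable


zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})
three = frozenset({zero, one, two})

not_transitive = frozenset({one})                            # {{∅}}
not_total = frozenset({zero, one, frozenset({one})})          # {∅, {∅}, {{∅}}}
```

The last set is transitive but not an ordinal: ∅ and {{∅}} are not related by ∈ in either direction.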

Why is this a good definition? Mostowski says that each well-ordering is order-isomorphic to a unique ordinal (using our definition of ordinal above), and this is its order-type. So here, instead of saying that an ordinal is an equivalence class of well-orderings, we simply choose one representative of each equivalence class (given by Mostowski collapse), and call that the ordinal.

For any ordinal α, I_α = {β : β < α} is a well-ordering of order-type α. Applying Mostowski collapse, we get α = {β : β < α}. So β < α iff β ∈ α.

So, for example, α^+ = α ∪ {α}, and sup{α_k : k ∈ I} = ⋃{α_k : k ∈ I}. Set theorists often write suprema as unions, but this is a totally unhelpful notation!

5.3 Picture of the universe

What does the universe look like? We start with the empty set, and take the

power set, repeatedly, transfinitely.

Definition (von Neumann hierarchy). Define sets V_α for α ∈ On (where On is the class of ordinals) by ∈-recursion:

(i) V_0 = ∅.

(ii) V_{α+1} = P(V_α).

(iii) V_λ = ⋃{V_γ : γ < λ} for λ a non-zero limit ordinal.

[Figure: picture of the hierarchy, with labelled levels V_0 = ∅, V_1 = {∅}, V_{ω+1}, V_{ω+ω}.]

Note that by definition, we have x ⊆ V_α ⇔ x ∈ V_{α+1}.

We would like every x to be in some V_α, and that is indeed true. We prove this through a series of lemmas:

Lemma. Each V_α is transitive.

Proof. Since we define V_α by recursion, it is sensible to prove this by induction on α:

(i) Zero: V_0 = ∅ is transitive.

(ii) Successors: If x is transitive, then so is P(x): given y ∈ z ∈ P(x), we want to show that y ∈ P(x). Since y is in a member of P(x), i.e. a subset of x, we must have y ∈ x. So y ⊆ x since x is transitive. So y ∈ P(x).

(iii) Limits: Any union of transitive sets is transitive.

Lemma. If α ≤ β, then V_α ⊆ V_β.

Proof. Fix α, and induct on β.

(i) β = α: trivial.

(ii) Successors: V_β ⊆ V_{β+1} since x ⊆ P(x) for transitive x. So V_α ⊆ V_β ⇒ V_α ⊆ V_{β+1}.

(iii) Limits: Trivial by definition.

Finally we are ready for:

Theorem. Every x belongs to some V_α. Intuitively, we want to say V = ⋃_{α∈On} V_α.

We’ll need some terminology to start with. If x ⊆ V_α for some α, then there is a least α with x ⊆ V_α. We call this the rank of x.

For example, rank(∅) = 0, rank({∅}) = 1. Also rank(ω) = ω. In fact, rank(α) = α for all α ∈ On.

Note that we want the least α such that x ⊆ V_α, not x ∈ V_α.

Proof. We’ll show that (∀x)(∃α)(x ∈ V_α) by ∈-induction on x.

So we are allowed to assume that for each y ∈ x, we have y ⊆ V_α for some α. So y ⊆ V_{rank(y)}, or y ∈ V_{rank(y)+1}.

Let α = sup{(rank(y))^+ : y ∈ x}. Then y ∈ V_α for every y ∈ x. So x ⊆ V_α, i.e. x ∈ V_{α+1}.

Our definition of rank is easy to get wrong: it is easy to be off by 1. So the official definition is

Definition (Rank). The rank of a set x is defined recursively by

rank(x) = sup{(rank y)^+ : y ∈ x}.

Then the initial definition we had is now a proposition.

Proposition. rank(x) is the first α such that x ⊆ V_α.
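The finite levels of the hierarchy are small enough to compute, which gives a cheap check of the proposition. This is a sketch for finite n only, with sets as frozensets and helper names of our own; V(5) already has 65536 members, so we stop at V(4).

```python
from itertools import chain, combinations


def powerset(x):
    """P(x): the set of all subsets of x."""
    items = list(x)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return frozenset(frozenset(s) for s in subsets)


def V(n):
    """The finite level V_n: start from ∅ and take the power set n times."""
    level = frozenset()
    for _ in range(n):
        level = powerset(level)
    return level


def rank(x):
    """rank(x) = sup{rank(y) + 1 : y ∈ x}, with the sup of nothing being 0."""
    return max((rank(y) + 1 for y in x), default=0)


sizes = [len(V(n)) for n in range(5)]
```

The sizes come out as 0, 1, 2, 4, 16, and for every x in V(4) one can check that rank(x) is the least n with x ⊆ V(n).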

6 Cardinals

In this chapter, we will look at the “sizes” of (infinite) sets (finite sets are

boring!). We work in ZFC, since things become really weird without Choice.

Since we will talk about bijections a lot, we will have the following notation:

Notation. Write x ↔ y for ∃f : f is a bijection from x to y.

6.1 Definitions

We want to define card(x) (the cardinality, or size of x) in such a way that card(x) = card(y) ⇔ x ↔ y. We can’t define card(x) = {y : y ↔ x} as it may not be a set. So we want to pick a “representative” of the sets that biject with x, like how we defined the ordinals.

So why not use the ordinals? By Choice, we know that every x is well-orderable. So x ↔ α for some α. So we define:

Definition (Cardinality). The cardinality of a set x, written card(x), is the least ordinal α such that x ↔ α.

Then we have (trivially) card(x) = card(y) ⇔ x ↔ y.

(What if we don’t have Choice, i.e. if we are in ZF? This will need a really clever trick, called the Scott trick. In our universe of ZF, there is a huge blob of things that biject with x. We cannot take the whole blob (it won’t be a set), or pick one of them (requires Choice). So we “chop off” the blob at a fixed, determined point, so we are left with a set.

Define the essential rank of x to be the least rank of all y such that y ↔ x. Then set card(x) = {y ∈ V_{essrank(x)+1} : y ↔ x}.)

So what cardinals are there? Clearly we have 1, 2, 3, ···. What else?

Definition (Initial ordinal). We say an ordinal α is initial if

(∀β < α)(¬β ↔ α),

i.e. it is the smallest ordinal of that cardinality.

Then 0, 1, 2, ··· , ω, ω_1, γ(X) for any X are all initial. However, ω^ω is not initial, as it bijects with ω (both are countable).

Can we find them all? Yes!

Definition (Omega ordinals). We define ω_α for each α ∈ On by

(i) ω_0 = ω;

(ii) ω_{α+1} = γ(ω_α);

(iii) ω_λ = sup{ω_α : α < λ} for non-zero limit λ.

It is an easy induction to show that each ω_α is initial. We can also show that every initial δ (for δ ≥ ω) is an ω_α. We know that the ω_α are unbounded since, say, ω_α ≥ α for all α. So there is a least α with ω_α ≥ δ.

If α = 0, then ω ≤ δ ≤ ω_0 = ω, so δ = ω_0.

If α is a successor, then let α = β^+. Then ω_β < δ ≤ ω_α. But there is no initial ordinal strictly between ω_β and ω_α = γ(ω_β), since γ(X) is defined as the least ordinal that does not inject into X. So we must have δ = ω_α.

If α is a limit, then since ω_α is defined as a supremum, by definition we cannot have δ < ω_α, or else there is some β < α with δ < ω_β. So δ = ω_α as well.

Definition (Aleph number). Write ℵ_α (“aleph-α”) for card(ω_α).

Then from the argument above, we have

Theorem. The ℵ_α are the cardinals of all infinite sets (or, in ZF, the cardinals of all infinite well-orderable sets). For example, card(ω) = ℵ_0, card(ω_1) = ℵ_1.

We will use lower case letters to denote cardinalities and upper case for the

sets with that cardinality. e.g. card(N) = n.

Definition (Cardinal (in)equality). For cardinals m and n, write m ≤ n if M injects into N, where card M = m, card N = n. This clearly does not depend on M and N.

So m ≤ n and n ≤ m implies n = m by Schröder-Bernstein. Write m < n if m ≤ n but m ≠ n.

Example. card(P(ω)) > card(ω).

This ≤ is a partial order. Moreover, it is a total order (assuming AC).

6.2 Cardinal arithmetic

Definition (Cardinal addition, multiplication and exponentiation). For cardinals m, n, write m + n for card(M ⊔ N); mn for card(M × N); and m^n for card(M^N), where M^N = {f : f is a function N → M}. Note that this coincides with our usual definition of X^n for finite n.

Example. R ↔ P(ω) ↔ 2^ω. So card(R) = card(P(ω)) = 2^{ℵ_0}.

Similarly, define Σ_{i∈I} m_i = card(⊔_{i∈I} M_i).

Example. How many sequences of reals are there? A real sequence is a function from ω → R. We have

card(R^ω) = (2^{ℵ_0})^{ℵ_0} = 2^{ℵ_0 × ℵ_0} = 2^{ℵ_0} = card(R).

Note that we used facts like

Proposition.

(i) m + n = n + m since N ⊔ M ↔ M ⊔ N with the obvious bijection.

(ii) mn = nm using the obvious bijection.

(iii) (m^n)^p = m^{np} as (M^N)^P ↔ M^{N×P} since both objects take in a P and an N and return an M.
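The bijection in (iii) is just uncurrying, and for finite sets we can list everything and check it. This is a sketch: functions are encoded as frozensets of (input, output) pairs, and the particular sets M, N, P are arbitrary choices of ours.

```python
from itertools import product


def functions(dom, cod):
    """All functions dom → cod, each encoded as a frozenset of (input, output) pairs."""
    dom, cod = list(dom), list(cod)
    return [frozenset(zip(dom, values)) for values in product(cod, repeat=len(dom))]


def uncurry(g):
    """Send g : P → M^N to the function N × P → M given by (n, p) ↦ g(p)(n)."""
    return frozenset(((n, p), m) for p, h in g for n, m in h)


M, N, P = ["a", "b"], [0, 1, 2], ["u", "v"]
lhs = functions(P, functions(N, M))      # (M^N)^P
rhs = functions(product(N, P), M)        # M^(N×P)
images = {uncurry(g) for g in lhs}
```

Both sides have 2^6 = 64 members, and uncurry hits each member of the right-hand side exactly once.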

It is important to note that cardinal exponentiation is different from ordinal exponentiation. For example, ω^ω (ordinal exponentiation) is countable, but ℵ_0^{ℵ_0} ≥ 2^{ℵ_0} > ℵ_0 (cardinal exponentiation).

From IA Numbers and Sets, we know that ℵ_0 ℵ_0 = ℵ_0. What about ℵ_1 ℵ_1? Or ℵ_3 ℵ_0?

It turns out that cardinal sums and multiplications are utterly boring:

Theorem. For every ordinal α, ℵ_α ℵ_α = ℵ_α.

This is the best we could ever ask for. What can be simpler?

Proof. Since the alephs are defined by induction, it makes sense to prove it by induction.

In the following proof, there is a small part that doesn't work nicely with α = 0. But the α = 0 case (i.e. ℵ_0 ℵ_0 = ℵ_0) is already done. So assume α ≠ 0.

Induct on α. We want ω_α × ω_α to biject with ω_α, i.e. to well-order ω_α × ω_α with an ordering of length ω_α.

Using the ordinal product clearly doesn't work. The ordinal product counts the product in rows, so we have many copies of ω_α. When we proved ℵ_0 ℵ_0 = ℵ_0, we counted them diagonally. But counting diagonally here doesn't look very nice, since we will have to “jump over” infinities. Instead, we count in squares.

We set (x, y) < (x′, y′) if either max(x, y) < max(x′, y′) (this says that (x′, y′) is in a bigger square), or max(x, y) = max(x′, y′) = β, say, and one of the following holds: y′ = β and y < β; or x = x′ = β and y < y′; or y = y′ = β and x < x′ (nonsense used to order things in the same square, utterly unimportant).
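To see what “counting in squares” looks like, here is a finite analogue in Python: a sort key that realises the order above on n × n, where n stands for {0, ..., n − 1}. The names `square_key` and `square_order` are invented for this sketch; the actual proof of course runs on ω_α × ω_α, not on a finite grid.

```python
from itertools import product

def square_key(point):
    """Sort key realising the 'count in squares' order on pairs."""
    x, y = point
    b = max(x, y)
    if y < b:              # right edge of the square: x == b, ordered by y
        return (b, 0, y)
    return (b, 1, x)       # top edge: y == b, ordered by x, corner (b, b) last

def square_order(n):
    """List n x n in the square order."""
    return sorted(product(range(n), repeat=2), key=square_key)

order = square_order(4)
# The first k^2 points are exactly the square k x k, for every k.
for k in range(5):
    assert set(order[: k * k]) == set(product(range(k), repeat=2))

print(square_order(2))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

The loop is the finite shadow of the key step in the proof: every initial segment sits inside some square β × β.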

How do we show that this has order type ω_α? We show that any proper initial segment has order type < ω_α.

For any proper initial segment I_{(x,y)}, we have I_{(x,y)} ⊆ β × β for some β < ω_α, since ω_α is a limit, with wlog β infinite. So β × β ↔ β by the induction hypothesis (the cardinality of β is less than ℵ_α). So card(β × β) < card(ω_α).

Hence I_{(x,y)} has order type < ω_α. Thus the order type of our well-order is ≤ ω_α. So ω_α × ω_α injects into ω_α. Since trivially ω_α injects into ω_α × ω_α, we have ω_α × ω_α ↔ ω_α.

So why did we say cardinal arithmetic is boring? We have

Corollary. Let α ≤ β. Then ℵ_α + ℵ_β = ℵ_α ℵ_β = ℵ_β.

Proof. ℵ_β ≤ ℵ_α + ℵ_β ≤ ℵ_β + ℵ_β = 2ℵ_β ≤ ℵ_β × ℵ_β = ℵ_β, and similarly ℵ_β ≤ ℵ_α ℵ_β ≤ ℵ_β ℵ_β = ℵ_β. So done.

Example. X ⊔ X bijects with X, for infinite X (in ZFC).

However, cardinal exponentiation is very hard. For example, is 2^{ℵ_0} = ℵ_1? This is the continuum hypothesis, and cannot be proved or disproved in ZFC!

Even today, not all implications among values of ℵ_α^{ℵ_β} are known, i.e. we don't know whether they are true, false or independent!

7 Incompleteness*

The big goal of this (non-examinable) chapter is to show that PA is incomplete, i.e. there is a sentence p such that PA ⊬ p and PA ⊬ ¬p.

The strategy is to find a p that is true in ℕ, but PA ⊬ p.

Here we say “true” to mean “true in ℕ”, and “provable” to mean “PA proves it”.

The idea is to find a p that says “I am not provable”. More precisely, we want a p such that p is true if and only if p is not provable. We are then done: p must be true, since if p were false, then it is provable, i.e. PA ⊢ p. So p holds in every model of PA, and in particular, p holds in ℕ. Contradiction. So p is true. So p is not provable.

We'll have to “code” formulae, proofs etc. inside PA, i.e. as numbers. But this doesn't seem possible: it seems like, in any format, “p is not provable” must be longer than p. So p cannot say “p is not provable”!

So be prepared for some magic to come up in the middle for the proof!

Definability

We first start with some notions of definability.

Definition (Definability). A subset S ⊆ ℕ is definable if there is a formula p with one free variable such that

∀m ∈ ℕ : m ∈ S ⇔ p(m) holds.

Similarly, f : ℕ → ℕ is definable if there exists a formula p(x, y) such that

∀m, n ∈ ℕ : f(m) = n ⇔ p(m, n) holds.

Example. The set of primes is definable: p(x) is

x ≠ 1 ∧ (∀y)(∀z)(yz = x ⇒ (y = 1 ∨ z = 1)).

So “m is prime” is definable.

How about powers of 2? We don't have exponentiation here. What can we do? We can take p(x) to be

(∀y)((y is prime ∧ y | x) ⇒ y = 2),

where 2 is a shorthand for s(s(0)), and y | x is a shorthand for (∃z)(yz = x). So this is also definable.
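The two defining formulae can be evaluated directly, provided we bound the quantifiers. The Python sketch below does this; the function names and the particular bounds are choices made for this illustration (the formulae themselves quantify over all of ℕ), and the comments explain why each bound loses nothing.

```python
def is_prime(x):
    """x != 1 and (for all y, z)(y*z == x  =>  (y == 1 or z == 1))."""
    # Bounding y, z by x is harmless: for x >= 1 any factorisation has
    # y, z <= x, and for x == 0 the pair y = z = 0 already refutes the formula.
    bound = x + 1
    return x != 1 and all(
        y == 1 or z == 1
        for y in range(bound) for z in range(bound)
        if y * z == x
    )

def divides(y, x):
    """(exists z)(y*z == x); a witness z, if any, can be taken <= x."""
    return any(y * z == x for z in range(x + 1))

def is_power_of_two(x):
    """(for all y)((y is prime and y | x) => y == 2)."""
    # The bound must reach the prime 3, which divides 0, so 0 is rejected.
    bound = max(x, 3) + 1
    return all(y == 2 for y in range(bound) if is_prime(y) and divides(y, x))

print([x for x in range(20) if is_prime(x)])         # [2, 3, 5, 7, 11, 13, 17, 19]
print([x for x in range(20) if is_power_of_two(x)])  # [1, 2, 4, 8, 16]
```

Note that 0 fails both tests and 1 = 2^0 passes the second, exactly as the formulae dictate.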

The function m ↦ m^2 is also definable: take p(x, y) to be xx = y.

Here we will assume:

Fact. Any function given by an algorithm is definable.

The proof will not be given here. See, e.g., PTJ's book for a detailed proof.

Example. m ↦ 2^m is definable.

Coding

Our language has 12 symbols: s, 0, ×, +, ⊥, ⇒, (, ), =, x, ′, ∀ (where the variables are now called x, x′, x′′, x′′′, ...). We assign values to them, say v(s) = 1, v(0) = 2, ..., v(∀) = 12. To code a formula, we can take

2^{v(first symbol)} · 3^{v(second symbol)} ··· (nth prime)^{v(nth symbol)}.

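For example, (∀x)(x = x) is coded as

2^7 · 3^12 · 5^10 · 7^8 · 11^7 · 13^10 · 17^9 · 19^10 · 23^8.

Not every m codes a formula, e.g. 2^7 · 3^12 · 5^10 is translated to (∀x, which is clearly nonsense. Similarly, 2^7 · 7^3 (which skips the primes 3 and 5) or 2^100 (100 is not the value of any symbol) can't even be translated at all.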

However, “m codes a formula” is definable, as there is an algorithm that checks that.

Write S_m for the formula coded by m (and set S_m to be “⊥” if m does not code a formula). Similarly, write c(p) for the code of p.
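The coding is easy to play with on a computer. Below is a minimal Python sketch; it assumes the symbols are numbered in the order s, 0, ×, +, ⊥, ⇒, (, ), =, x, ′, ∀ (so v(s) = 1 and v(∀) = 12), and the names `encode` and `decode` are made up here. Note that `decode` only recovers a string of symbols; checking that the string is a well-formed formula is a separate step that this sketch does not attempt.

```python
SYMBOLS = ["s", "0", "×", "+", "⊥", "⇒", "(", ")", "=", "x", "′", "∀"]
VALUE = {sym: i + 1 for i, sym in enumerate(SYMBOLS)}   # assumed numbering

def primes():
    """Yield 2, 3, 5, ... by trial division (fine for short strings)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def encode(formula):
    """The product of (k-th prime) ** v(k-th symbol)."""
    code = 1
    for p, sym in zip(primes(), formula):
        code *= p ** VALUE[sym]
    return code

def decode(m):
    """Return the symbol string coded by m, or None if m codes no string."""
    if m < 2:
        return None
    out = []
    for p in primes():
        if m == 1:
            return "".join(out)
        e = 0
        while m % p == 0:
            m //= p
            e += 1
        if not 1 <= e <= len(SYMBOLS):   # skipped prime, or exponent too large
            return None
        out.append(SYMBOLS[e - 1])

code = encode("(∀x)(x=x)")
assert code == 2**7 * 3**12 * 5**10 * 7**8 * 11**7 * 13**10 * 17**9 * 19**10 * 23**8
assert decode(code) == "(∀x)(x=x)"
assert decode(2**7 * 3**12 * 5**10) == "(∀x"   # a string, but not a formula
assert decode(2**100) is None
```

Even this nine-symbol formula has a code with dozens of digits, which is harmless: all we need is that coding and decoding are given by algorithms.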

Given a finite sequence p_1, ..., p_n of formulae, code it as

2^{c(p_1)} · 3^{c(p_2)} ··· (nth prime)^{c(p_n)}.

Alternatively, we can add a separator character to our 12 symbols and concatenate

the sequence of formulae with the separator character.

Now, “m codes an axiom (either logical or axiom of PA)” is definable, as there exists an algorithm to check it. For example, an instance of the first logical axiom can be validated by the following regex:

^([s0()=x′+×⊥⇒∀]+)⇒\(([s0()=x′+×⊥⇒∀]+)⇒\1\)$

(after translating to a sentence and verifying it is a valid logical formula).
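Here is one way such a check might look in Python's `re` syntax. This is a sketch under assumptions: the pattern is written for instances of the shape p⇒(q⇒p) with no spaces and no outer brackets, and it only tests the shape of the string, not that p and q are themselves formulae. The names `AXIOM_1` and `looks_like_axiom_1` are invented for the example.

```python
import re

# One or more symbols of the language (the 12 symbols, as a character class).
SYMBOL_RUN = "[s0()=x′+×⊥⇒∀]+"

# p ⇒ (q ⇒ p): group 1 is p, group 2 is q, and \1 forces the final p to
# be literally the same string as the first one.
AXIOM_1 = re.compile(rf"^({SYMBOL_RUN})⇒\(({SYMBOL_RUN})⇒\1\)$")

def looks_like_axiom_1(sentence):
    return AXIOM_1.match(sentence) is not None

assert looks_like_axiom_1("x=x⇒(x′=x⇒x=x)")        # p is x=x, q is x′=x
assert not looks_like_axiom_1("x=x⇒(x′=x⇒x′=x)")   # the two copies of p differ
assert not looks_like_axiom_1("x=x⇒x=x")           # wrong shape altogether
```

The backreference is what makes this more than a regular-language check, but any algorithm will do for our purposes.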

Also,

ϕ(ℓ, m, n) = “S_n is obtained from S_ℓ, S_m via MP”

is definable, and similarly for generalization. So

Θ(m, n) = “n codes a proof of S_m”

is definable.

Thus

Ψ(m) = “S_m is provable”

is definable as

Ψ(m) ⇔ (∃n)Θ(m, n).

So far, everything is rather boring. We all know that strings of symbols can be

coded into numbers — that’s what computers do!

Clever bit

Consider the statement χ(m) that states

“m codes a formula S_m with one free variable, and S_m(m) is not provable.”

This is clearly definable. Suppose this is defined by p. So

χ(n) ⇔ p[n/x]

Suppose c(p) = N. Then χ(N) asserts

“N codes a formula S_N with one free variable and S_N(N) is not provable.”

But we already know that N codes the formula p, which defines χ, so S_N(N) is p[N/x], i.e. χ(N). So χ(N) asserts that χ(N) is not provable.
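The trick of feeding a formula its own code can be mimicked with strings. The Python sketch below is only an analogy, not part of the proof: a template with a placeholder `#` plays the role of a formula with one free variable, `repr` plays the role of the coding c, and the hypothetical function `diagonalize` plays the role of m ↦ S_m(m).

```python
def diagonalize(template):
    """Substitute the quoted template for its own placeholder '#'."""
    return template.replace("#", repr(template))

# The analogue of chi: "the diagonalization of # is not provable".
CHI = "diagonalize(#) is not provable"

# The analogue of chi(N): chi applied to its own code.
SENTENCE = diagonalize(CHI)
print(SENTENCE)
# diagonalize('diagonalize(#) is not provable') is not provable

# The expression the sentence talks about evaluates to the sentence itself,
# so the sentence says of itself that it is not provable.
SUBJECT = SENTENCE[: -len(" is not provable")]
assert eval(SUBJECT) == SENTENCE
```

So there is no need for p to literally contain “p is not provable”: it is enough for p to contain a description of how to build p, which is much shorter.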

Theorem (Gödel's incompleteness theorem). PA is incomplete.

Maybe that's because PA is rubbish. Could we add some clever axiom t (true in ℕ) to PA, so that PA ∪ {t} is complete? No! Just run the same proof with “PA” replaced by “PA ∪ {t}” to show that PA ∪ {t} is incomplete.

But we can certainly extend PA to a complete theory: let T = {p : p holds in ℕ}. What will go wrong in our proof? It can only be because

Theorem. “Truth is not definable”

T = {p : p holds in ℕ} is not definable. This officially means

{m : m codes a member of T}

is not a definable set.

Next question: we proved that our clever statement p is true but not provable. Why doesn't this formalize into PA? The answer is, in the proof, we also used the fact that PA has a model, ℕ. By completeness, this means that we used the statement con(PA), i.e. the statement that PA is consistent, or

(∀m)(m does not code a proof of ⊥).

With this statement, the proof does formalize to PA. So PA ∪ {con(PA)} ⊢ p.

Hence

Theorem. PA ⊢ con(PA).

The same proof works in ZF. So ZF is incomplete and ZF does not prove its

own consistency.