Part IA — Numbers and Sets
Based on lectures by A. G. Thomason
Notes taken by Dexter Chua
Michaelmas 2014
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
Introduction to number systems and logic
Overview of the natural numbers, integers, real numbers, rational and irrational
numbers, algebraic and transcendental numbers. Brief discussion of complex numbers;
statement of the Fundamental Theorem of Algebra.
Ideas of axiomatic systems and proof within mathematics; the need for proof; the
role of counterexamples in mathematics. Elementary logic; implication and negation;
examples of negation of compound statements. Proof by contradiction. [2]
Sets, relations and functions
Union, intersection and equality of sets. Indicator (characteristic) functions; their use in
establishing set identities. Functions; injections, surjections and bijections. Relations,
and equivalence relations. Counting the combinations or permutations of a set. The
Inclusion-Exclusion Principle. [4]
The integers
The natural numbers: mathematical induction and the well-ordering principle. Examples, including the Binomial Theorem. [2]
Elementary number theory
Prime numbers: existence and uniqueness of factorisation into primes; highest
common factors and least common multiples. Euclid's proof of the infinity of primes.
Euclid’s algorithm. Solution in integers of ax + by = c.
Modular arithmetic (congruences). Units modulo n. Chinese Remainder Theorem.
Wilson's Theorem; the Fermat-Euler Theorem. Public key cryptography and the RSA
algorithm. [8]
The real numbers
Least upper bounds; simple examples. Least upper bound axiom. Sequences and series;
convergence of bounded monotonic sequences. Irrationality of √2 and e. Decimal
expansions. Construction of a transcendental number. [4]
Countability and uncountability
Definitions of finite, infinite, countable and uncountable sets. A countable union of
countable sets is countable. Uncountability of R. Non-existence of a bijection from a
set to its power set. Indirect proof of existence of transcendental numbers. [4]
Contents
0 Introduction
1 Proofs and logic
1.1 Proofs
1.2 Examples of proofs
1.3 Logic
2 Sets, functions and relations
2.1 Sets
2.2 Functions
2.3 Relations
3 Division
3.1 Euclid’s Algorithm
3.2 Primes
4 Counting and integers
4.1 Basic counting
4.2 Combinations
4.3 Well-ordering and induction
5 Modular arithmetic
5.1 Modular arithmetic
5.2 Multiple moduli
5.3 Prime moduli
5.4 Public-key (asymmetric) cryptography
6 Real numbers
6.1 Construction of numbers
6.2 Sequences
6.3 Series
6.4 Irrational numbers
6.5 Euler’s number
6.6 Algebraic numbers
7 Countability
0 Introduction
According to the Faculty, this course is not aimed at teaching you any new
knowledge. In particular, the Faculty says
This course is concerned not so much with teaching you new parts
of mathematics . . .
Instead, this course is intended to teach you about how to do maths properly.
The objective of this course is to start from a few axioms, which are assumptions
about our system, and try to prove everything rigorously from the axioms.
This is different from how mathematics is usually done in secondary school.
In secondary school, we just accept that certain statements are true. For example,
probably no one rigorously proved to you that each natural number has a unique
prime factorization. In this course, nothing is handwaved. Everything will be
obtained as a logical consequence of the axioms and previous (proven) results.
In the course, you shouldn’t focus too much on the content itself. Instead,
the major takeaway should be how we do mathematics. In particular, how we
construct and present arguments.
The actual content of this course is rather diverse. As the name suggests,
the course touches on numbers and sets. The “numbers” part is mostly basic
number theory, which you may have come across if you have participated in
mathematical olympiads. We also study some properties of real numbers, but
the actual serious study of real numbers is done in IA Analysis I.
The “sets” part starts with some basic definitions of sets, functions and
relations, which are really important and will crop up in all courses you will
encounter. In some sense, these form the language in which mathematics is
written. At the end of the course, we will touch on countability, which tells us
how “big” a set is.
1 Proofs and logic
1.1 Proofs
As this is the first course in pure mathematics, we will take an (informal) look at proofs and logic.
In mathematics, we often want to prove things. This is different from most
other disciplines. For example, in science, we perform experiments to convince
ourselves that our theory is correct. However, no matter how many experiments
we do, we still cannot be absolutely sure that our theory is correct. For example,
Newtonian mechanics was believed to be correct for a long time, but is nowadays
only considered to be an approximation of reality.
On the other hand, when we prove a theorem in mathematics, we are
completely sure that the theorem is true. Also, when we actually prove a
theorem, (hopefully) we can also understand why it is true, and gain further
insight.
Definition (Proof). A proof is a sequence of true statements, without logical gaps, that is a logical argument establishing some conclusion.
To prove things, we need to start from some assumptions. These assumptions
are known as axioms. When we call something an axiom, it does not mean that
we take these statements to be true without questioning. Instead, we are saying
“if we assume these axioms, then these results hold”. Two people can disagree on
what the axioms are and still be friends.
We also tend to define concepts as unambiguously as possible. Of course, just
like a dictionary cannot define all words without being circular, we do not define
everything in mathematics. To prove things, we have to start somewhere, with
some agreed assumptions (axioms). We also don’t define everything rigorously
(or else how could one start speaking?).
In mathematics, we are often concerned about truth. Often, we only care
about statements that can take some truth value.
Definition (Statement). A statement is a sentence that can have a truth value.
Example. The following are statements:
(i) There are infinitely many primes of the form n^2 + 1
(ii) There is always a prime number between n and 2n
(iii) There is no computer program that can factorize an n-digit number in n^3 steps
(iv) For every polynomial p(x) = a_n x^n + a_{n−1} x^{n−1} + ··· + a_0, where the a_i are complex numbers, n ≥ 1, a_n ≠ 0, there exists (possibly complex) z such that p(z) = 0
(v) m × n = n × m for all natural numbers n, m
(vi) 2 + 2 = 4
The current status (as of 2015) of these statements is:
(i) No one has a proof but it is probably true
(ii) This is known to be true
(iii) No one knows (related to P = NP problem)
(iv) This is known to be true (Fundamental Theorem of Algebra)
(v) This is known to be true
(vi) This is known to be true (obviously — does this need to be proved?)
1.2 Examples of proofs
Apart from having a proof, it is very important that a proof is correct. Here we
will look at some examples of proofs and nonproofs.
We first start with a simple example.
Proposition. For all natural numbers n, n^3 − n is a multiple of 3.
Proof. We have n^3 − n = (n − 1)n(n + 1). One of the three consecutive integers is divisible by 3. Hence so is their product.
Proposition. If n^2 is even, then so is n.
Proof. If n is even, then n = 2k for some integer k. Then n^2 = 4k^2, which is even.
This is incorrect! We wanted to prove "n^2 is even" ⇒ "n is even", but what we proved is "n is even" ⇒ "n^2 is even", which is a different statement.
Instead, a correct proof is as follows:
Proof. Suppose n is odd. Then n = 2k + 1 for some integer k. Then n^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1, which is odd. This contradicts our assumption that n^2 is even.
This is an example of proof by contradiction. We assume what we want to
prove is false, and show that this leads to nonsense.
Proposition. The solutions to x^2 − 5x + 6 = 0 are x = 2 and x = 3.
Note that these are actually 2 different statements:
(i) x = 2 and x = 3 are solutions
(ii) There are no other solutions
We can write this as an "if and only if" statement: x is a solution if and only if x = 2 or x = 3. Alternatively, we say "x is a solution iff x = 2 or x = 3"; or "x is a solution ⇔ x = 2 or x = 3".
Proof.
(i) If x = 2 or x = 3, then x − 2 = 0 or x − 3 = 0. So (x − 2)(x − 3) = 0.
(ii) If x^2 − 5x + 6 = 0, then (x − 2)(x − 3) = 0. So x − 2 = 0 or x − 3 = 0. Then x = 2 or x = 3.
Note that the second direction is simply the first argument reversed. We can
write this all in one go:
x = 3 or x = 2 ⇔ x − 3 = 0 or x − 2 = 0
⇔ (x − 3)(x − 2) = 0
⇔ x^2 − 5x + 6 = 0
Note that we used the “if and only if” sign between all lines.
We’ll do another nonproof.
Proposition. Every positive number is ≥ 1.
Proof. Let r be the smallest positive real. Then either r < 1, r = 1 or r > 1.
If r < 1, then 0 < r^2 < r. Contradiction. If r > 1, then 0 < √r < r. Contradiction. So r = 1.
Now this is obviously false, since 0.5 < 1. The problem with this proof is
that the smallest positive real need not exist. So we have to make sure we justify
all our claims.
1.3 Logic
Mathematics is full of logical statements, which are made of statements and
logical connectives. Usually, we use shorthands for the logical connectives.
Let P and Q be statements. Then P ∧ Q stands for "P and Q"; P ∨ Q stands for "P or Q"; P ⇒ Q stands for "P implies Q"; P ⇔ Q stands for "P iff Q"; ¬P stands for "not P". The truth of these statements depends on the truth of P and Q. It can be shown by a truth table:
P Q P ∧ Q P ∨ Q P ⇒ Q P ⇔ Q ¬P
T T T T T T F
T F F T F F F
F T F T T F T
F F F F T T T
Certain logical propositions are equivalent, which we denote using the ⇔ sign. For example,
For example,
¬(P ∧ Q) ⇔ (¬P ∨ ¬Q),
or
(P ⇒ Q) ⇔ (¬P ∨ Q) ⇔ (¬Q ⇒ ¬P ).
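These equivalences can be verified mechanically by checking all four truth assignments, mirroring the truth table above (a quick Python sketch):

```python
from itertools import product

def implies(p, q):
    # P ⇒ Q is false only when P is true and Q is false
    return (not p) or q

# Check the equivalences above for every assignment of truth values.
for P, Q in product([True, False], repeat=2):
    # ¬(P ∧ Q) ⇔ (¬P ∨ ¬Q)  (De Morgan)
    assert (not (P and Q)) == ((not P) or (not Q))
    # (P ⇒ Q) ⇔ (¬P ∨ Q) ⇔ (¬Q ⇒ ¬P)  (contrapositive)
    assert implies(P, Q) == ((not P) or Q) == implies(not Q, not P)
```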
By convention, negation has the highest precedence when bracketing. For
example, ¬P ∨ ¬Q should be bracketed as (¬P ) ∨ (¬Q).
We also have quantifiers. (∀x)P(x) means "for all x, P(x) is true", while (∃x)P(x) means "there exists x such that P(x) is true".
The quantifiers are usually bounded, i.e. we write ∀x ∈ X or ∃x ∈ X to mean
“for all x in the set X” and “there exists x in the set X” respectively.
Quantifiers are negated as follows:
¬(∀x)P (x) ⇔ (∃x)(¬P (x));
¬(∃x)P (x) ⇔ (∀x)(¬P (x)).
2 Sets, functions and relations
In this chapter, we will look at the basic building blocks of mathematics, namely
sets, functions and relations. Most of mathematics can be expressed in terms of
these notions, and it is helpful to be familiar with the relevant terms.
2.1 Sets
Definition (Set). A set is a collection of stuff, without regard to order. Elements in a set are only counted once. For example, if a = 2, b = c = 1, then A = {a, b, c} has only two members. We write x ∈ X if x is a member of the set X.
Example. Common sets and the symbols used to denote them:
– N = {1, 2, 3, ···} is the natural numbers
– N_0 = {0, 1, 2, ···} is the natural numbers with 0
– Z = {··· , −2, −1, 0, 1, 2, ···} is the integers
– Q = {a/b : a, b ∈ Z, b ≠ 0} is the rational numbers
– R is the real numbers
– C is the complex numbers
It is still debated whether 0 is a natural number. Those who believe that 0 is a natural number usually write N for {0, 1, 2, ···}, and N^+ for the positive natural numbers. However, most of the time it doesn't matter, and when it does, you should specify it explicitly.
Definition (Equality of sets). A is equal to B, written as A = B, if
(∀x) x ∈ A ⇔ x ∈ B,
i.e. two sets are equal if they have the same elements.
Definition (Subsets). A is a subset of B, written as A ⊆ B or A ⊂ B, if all elements in A are in B, i.e.
(∀x) x ∈ A ⇒ x ∈ B.
Theorem. (A = B) ⇔ (A ⊆ B and B ⊆ A)
Suppose X is a set and P is a property of some elements of X. We can write a set {x ∈ X : P(x)} for the subset of X comprising the elements for which P(x) is true, e.g. {n ∈ N : n is prime} is the set of all primes.
Definition (Intersection, union, set difference, symmetric difference and power set). Given two sets A and B, we define the following:
– Intersection: A ∩ B = {x : x ∈ A and x ∈ B}
– Union: A ∪B = {x : x ∈ A or x ∈ B}
– Set difference: A \ B = {x ∈ A : x ∉ B}
– Symmetric difference: A ∆ B = {x : x ∈ A xor x ∈ B}, i.e. the elements in exactly one of the two sets
– Power set: P(A) = {X : X ⊆ A}, i.e. the set of all subsets
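These operations correspond directly to Python's built-in set operators, which gives a quick way to experiment with them (an illustrative sketch; the power set helper is our own construction):

```python
from itertools import chain, combinations

A, B = {1, 2, 3}, {2, 3, 4}

assert A & B == {2, 3}            # intersection
assert A | B == {1, 2, 3, 4}      # union
assert A - B == {1}               # set difference
assert A ^ B == {1, 4}            # symmetric difference

def power_set(X):
    """All subsets of X, as a set of frozensets."""
    xs = list(X)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))}

assert len(power_set(A)) == 2 ** len(A)  # |P(A)| = 2^|A|
```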
New sets can only be created via the above operations on old sets (plus replacement, which says that you can replace an element of a set with another element). One cannot arbitrarily create sets such as X = {x : x is a set and x ∉ x}. Otherwise paradoxes will arise.
We have several rules regarding how these set operations behave, which
should be intuitively obvious.
Proposition.
– (A ∩ B) ∩ C = A ∩ (B ∩ C)
– (A ∪ B) ∪ C = A ∪ (B ∪ C)
– A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
Notation. If the A_α are sets for all α ∈ I, then
⋂_{α∈I} A_α = {x : (∀α ∈ I) x ∈ A_α}
and
⋃_{α∈I} A_α = {x : (∃α ∈ I) x ∈ A_α}.
Definition (Ordered pair). An ordered pair (a, b) is a pair of two items in which order matters. Formally, it is defined as {{a}, {a, b}}. We have (a, b) = (a′, b′) iff a = a′ and b = b′.
Definition (Cartesian product). Given two sets A, B, the Cartesian product of A and B is A × B = {(a, b) : a ∈ A, b ∈ B}. This can be extended to n products, e.g. R^3 = R × R × R = {(x, y, z) : x, y, z ∈ R} (which is officially {(x, (y, z)) : x, y, z ∈ R}).
2.2 Functions
Definition (Function/map). A function (or map) f : A → B is a "rule" that assigns, for each a ∈ A, precisely one element f(a) ∈ B. We can write a ↦ f(a).
A and B are called the domain and codomain respectively.
If we wish to be very formal, we can define a function to be a subset f ⊆ A × B such that for any a ∈ A, there exists a unique b ∈ B such that (a, b) ∈ f. We then think of (a, b) ∈ f as saying f(a) = b. However, while this might act as a formal definition of a function, it is a terrible way to think about functions.
Example. x^2 : R → R is a function that sends x to x^2. 1/x : R → R is not a function since f(0) is not defined. ±x : R → R is also not a function since it is multi-valued.
It is often helpful to categorize functions into different categories.
Definition (Injective function). A function f : X → Y is injective if it hits everything at most once, i.e.
(∀x, y ∈ X) f (x) = f(y) ⇒ x = y.
Definition (Surjective function). A function f : X → Y is surjective if it hits everything at least once, i.e.
(∀y ∈ Y )(∃x ∈ X) f(x) = y
Example. f : R → R^+ ∪ {0} with x ↦ x^2 is surjective but not injective.
Definition (Bijective function). A function is bijective if it is both injective and surjective, i.e. it hits everything exactly once.
Definition (Permutation). A permutation of A is a bijection A → A.
Definition (Composition of functions). The composition of two functions is a function you get by applying one after another. In particular, if f : X → Y and g : Y → Z, then g ◦ f : X → Z is defined by (g ◦ f)(x) = g(f(x)). Note that function composition is associative.
Definition (Image of function). If f : A → B and U ⊆ A, then f(U) = {f(u) : u ∈ U}. f(A) is the image of A.
By definition, f is surjective iff f(A) = B.
Definition (Preimage of function). If f : A → B and V ⊆ B, then f^{−1}(V) = {a ∈ A : f(a) ∈ V}.
This is the preimage of V under f, and it acts on subsets of B. It is defined for any function f. It is important to note that we use the same symbol f^{−1} to denote the inverse function, which we will define later, but they are very distinct entities. For example, we will see that the inverse function exists only for bijective functions.
To define the inverse function, we will first need some preliminary definitions.
Definition (Identity map). The identity map id_A : A → A is defined as the map a ↦ a.
Definition (Left inverse of function). Given f : A → B, a left inverse of f is a function g : B → A such that g ◦ f = id_A.
Definition (Right inverse of function). Given f : A → B, a right inverse of f is a function g : B → A such that f ◦ g = id_B.
Theorem. The left inverse of f exists iff f is injective.
Proof. (⇒) If the left inverse g exists, then ∀a, a′ ∈ A, f(a) = f(a′) ⇒ g(f(a)) = g(f(a′)) ⇒ a = a′. Therefore f is injective.
(⇐) If f is injective, we can construct a g defined by
g(b) = a if b ∈ f(A), where f(a) = b; g(b) = anything otherwise.
Then g is a left inverse of f.
Theorem. The right inverse of f exists iff f is surjective.
Proof. (⇒) We have f(g(B)) = B since f ◦ g is the identity function. Thus f must be surjective since its image is B.
(⇐) If f is surjective, we can construct a g as follows: for each b ∈ B, pick one a ∈ A with f(a) = b, and put g(b) = a.
(Note that to prove the second part, for each b, we need to pick an a such that f(a) = b. If B is infinite, doing so involves making infinitely many arbitrary choices. Are we allowed to do so?
To make infinitely many choices, we need to use the Axiom of Choice, which explicitly says that this is allowed. In particular, it says that given a family of sets A_i for i ∈ I, there exists a choice function f : I → ⋃ A_i such that f(i) ∈ A_i for all i.
So can we prove the theorem without the Axiom of Choice? The answer is no, since if we assume surjective functions have right inverses, then we can prove the Axiom of Choice.
Assume any surjective function f has a right inverse. Given a family of non-empty sets A_i for i ∈ I (wlog assume they are disjoint), define a function f : ⋃ A_i → I that sends each element to the set that contains it. This is surjective since each set is non-empty. Then it has a right inverse, and the right inverse must send each set to an element of that set, i.e. it is a choice function for the A_i.)
Definition (Inverse of function). An inverse of f is a function that is both a left inverse and a right inverse. It is written as f^{−1} : B → A. It exists iff f is bijective, and is necessarily unique.
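For a function on a finite set given explicitly, the inverse can be built by reversing the graph of f, and the construction fails precisely when f is not bijective (a Python sketch; the particular function here is chosen only for illustration):

```python
def inverse(f, A, B):
    """Inverse of f : A -> B as a lookup table, assuming f is a bijection
    (A and B are finite collections, f a Python function on A)."""
    g = {f(a): a for a in A}                     # reverse the graph of f
    assert set(g) == set(B), "f is not surjective"
    assert len(g) == len(list(A)), "f is not injective"
    return g

f = lambda x: (3 * x + 1) % 7       # a bijection on {0, ..., 6}
g = inverse(f, range(7), range(7))
assert all(g[f(x)] == x for x in range(7))   # g o f = id
assert all(f(g[y]) == y for y in range(7))   # f o g = id
```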
2.3 Relations
Definition (Relation). A relation R on A specifies that some elements of A are related to some others. Formally, a relation is a subset R ⊆ A × A. We write
aRb iff (a, b) ∈ R.
Example. The following are examples of relations on natural numbers:
(i) aRb iff a and b have the same final digit, e.g. 37R57.
(ii) aRb iff a divides b, e.g. 2R6 but not 2R7.
(iii) aRb iff a ≠ b.
(iv) aRb iff a = b = 1.
(v) aRb iff a − b ≤ 3.
(vi) aRb iff either a, b ≥ 5 or a, b ≤ 4.
Again, we wish to classify different relations.
Definition (Reflexive relation). A relation R is reflexive if
(∀a) aRa.
Definition (Symmetric relation). A relation R is symmetric iff
(∀a, b) aRb ⇔ bRa.
Definition (Transitive relation). A relation R is transitive iff
(∀a, b, c) aRb ∧ bRc ⇒ aRc.
Example. With regards to the examples above,
Examples     (i)  (ii)  (iii)  (iv)  (v)  (vi)
Reflexive     ✓    ✓     ✗      ✗    ✓    ✓
Symmetric     ✓    ✗     ✓      ✓    ✓    ✓
Transitive    ✓    ✓     ✗      ✓    ✗    ✓
Definition (Equivalence relation). A relation is an equivalence relation if it is
reflexive, symmetric and transitive. e.g. (i) and (vi) in the above examples are
equivalence relations.
If R is an equivalence relation, we usually write ∼ instead of R. As the name suggests, equivalence relations are used to describe relations that are similar to equality. For example, if we want to represent rational numbers as pairs of integers, we might have an equivalence relation defined by (n, m) ∼ (p, q) iff nq = mp, so that two pairs are equivalent if they represent the same rational number.
Example. If we consider a deck of cards, define two cards to be related if they have the same suit.
As mentioned, we like to think of things related by ∼ as equal. Hence we want to identify all "equal" things together and form one new object.
Definition (Equivalence class). If ∼ is an equivalence relation, then the equivalence class [x] is the set of all elements that are related via ∼ to x.
Example. In the cards example, [8♥] is the set of all hearts.
Definition (Partition of set). A partition of a set X is a collection of subsets A_α of X such that each element of X is in exactly one of the A_α.
Theorem. If ∼ is an equivalence relation on A, then the equivalence classes of ∼ form a partition of A.
Proof. By reflexivity, we have a ∈ [a]. Thus the equivalence classes cover the whole set. We must now show that for all a, b ∈ A, either [a] = [b] or [a] ∩ [b] = ∅.
Suppose [a] ∩ [b] ≠ ∅. Then ∃c ∈ [a] ∩ [b]. So a ∼ c, b ∼ c. By symmetry, c ∼ b. By transitivity, we have a ∼ b. For all b′ ∈ [b], we have b ∼ b′. Thus by transitivity, we have a ∼ b′. Thus [b] ⊆ [a]. By symmetry, [a] ⊆ [b] and [a] = [b].
On the other hand, each partition defines an equivalence relation in which
two elements are related iff they are in the same partition. Thus partitions and
equivalence relations are “the same thing”.
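As a concrete instance, grouping elements by a representative of each class recovers the partition. Here we use example (i) above, same final digit, on {1, ..., 30} (a Python sketch; comparing against a single representative per class is only valid because ∼ is symmetric and transitive):

```python
def equivalence_classes(elements, related):
    """Group elements into classes of the equivalence relation `related`."""
    classes = []
    for x in elements:
        for cls in classes:
            if related(x, next(iter(cls))):  # one representative suffices
                cls.add(x)
                break
        else:
            classes.append({x})
    return classes

same_final_digit = lambda a, b: a % 10 == b % 10
classes = equivalence_classes(range(1, 31), same_final_digit)

# The classes are pairwise disjoint and cover the whole set: a partition.
assert sum(len(c) for c in classes) == 30
assert set().union(*classes) == set(range(1, 31))
```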
Definition (Quotient map). The quotient map q maps each element in A to the equivalence class containing it, i.e. a ↦ [a], e.g. q(8♥) = {♥}.
3 Division
If you think you already know how to divide, well perhaps you are right. However,
in this chapter, we will define these notions formally and properly prove things
we already know (plus maybe some new things).
3.1 Euclid’s Algorithm
Definition (Factor of integers). Given a, b ∈ Z, we say a divides b, a is a factor of b, or a | b, if (∃c ∈ Z) b = ac. For any b, ±1 and ±b are always factors of b. The other factors are called proper factors.
Theorem (Division Algorithm). Given a, b ∈ Z, b ≠ 0, there are unique q, r ∈ Z with a = qb + r and 0 ≤ r < b.
Despite the name, the division algorithm is not an algorithm in the usual sense. Instead, it merely states that you can divide. Even the proof does not specify a (non-brute-force) way of how to divide.
Proof. Choose q = max{q : qb ≤ a}. This maximum exists because the set of all q such that qb ≤ a is non-empty and bounded above. Now write r = a − qb. We have 0 ≤ r < b, and thus q and r are found.
To show that they are unique, suppose that a = qb + r = q′b + r′. We have (q − q′)b = r′ − r. Since both r and r′ are between 0 and b, we have −b < r′ − r < b. However, r′ − r is a multiple of b. Thus q − q′ = r′ − r = 0. Consequently, q = q′ and r = r′.
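For b > 0, floor division picks out exactly this q and r; a quick check of the defining property (a Python sketch):

```python
def division(a, b):
    """Return (q, r) with a = q*b + r and 0 <= r < b, for b > 0."""
    q = a // b          # floor division: the largest q with q*b <= a
    r = a - q * b
    assert a == q * b + r and 0 <= r < b
    return q, r

assert division(57, 42) == (1, 15)
assert division(-7, 3) == (-3, 2)   # works for negative a too
```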
Definition (Common factor of integers). A common factor of a and b is a number c ∈ Z such that c | a and c | b.
Definition (Highest common factor/greatest common divisor). The highest common factor or greatest common divisor of two numbers a, b ∈ N is a number d ∈ N such that d is a common factor of a and b, and if c is also a common factor, then c | d.
Clearly, if the hcf exists, it must be the largest common factor, since all other common factors divide it; it is thus necessarily unique.
You might reasonably think it more natural to define hcf(a, b) to be the largest common factor, and then show that it has the property that all common factors divide it. But the above definition is superior because it does not require a prior ordering of the natural numbers (and it can be extended to any ring even if it is not ordered, as we will do in IB Groups, Rings and Modules).
Notation. We write d = hcf(a, b) = gcd(a, b) = (a, b).
Here we use (a, b) to stand for a number, and it has nothing to do with an ordered pair.
Proposition. If c | a and c | b, then c | (ua + vb) for all u, v ∈ Z.
Proof. By definition, we have a = kc and b = lc. Then ua + vb = ukc + vlc = (uk + vl)c. So c | (ua + vb).
Theorem. Let a, b ∈ N. Then (a, b) exists.
Proof. Let S = {ua + vb : u, v ∈ Z} be the set of all linear combinations of a, b. Let d be the smallest positive member of S, say d = xa + yb. Hence if c | a and c | b, then c | d. So we need to show that d | a and d | b, and thus d = (a, b).
By the division algorithm, there exist numbers q, r ∈ Z with a = qd + r and 0 ≤ r < d. Then r = a − qd = a(1 − qx) − qyb. Therefore r is a linear combination of a and b. Since d is the smallest positive member of S and 0 ≤ r < d, we have r = 0 and thus d | a. Similarly, we can show that d | b.
Corollary (from the proof). Let d = (a, b). Then d is the smallest positive linear combination of a and b.
Corollary (Bézout's identity). Let a, b ∈ N and c ∈ Z. Then there exist u, v ∈ Z with c = ua + vb iff (a, b) | c.
Proof. (⇒) Let d = (a, b). If c is a linear combination of a and b, then d | c because d | a and d | b.
(⇐) Suppose that d | c. Let d = xa + yb and c = kd. Then c = (kx)a + (ky)b. Thus c is a linear combination of a and b.
Note that the proof that (a, b) exists is existential, not constructive. How can we actually find d, and how can we find x, y such that d = xa + yb?
While it might be easy to simply inspect d for small numbers, how would you find common factors of, say, 4931 and 3795? We cannot use primes because (a) prime factorization is hard; and (b) primes are not yet defined.
You might spot that if c | 4931 and c | 3795, then c | (4931 − 3795) = 1136. The process is also reversible: if c | 1136 and c | 3795, then c | (1136 + 3795) = 4931. Thus the problem is equivalent to finding common factors of 3795 and 1136. The process can be repeated until we have small numbers.
Proposition (Euclid's Algorithm). If we continuously break down a and b by the following procedure:
a = q_1 b + r_1
b = q_2 r_1 + r_2
r_1 = q_3 r_2 + r_3
...
r_{n−2} = q_n r_{n−1}
then the highest common factor is r_{n−1}.
Proof. We have (common factors of a, b) = (common factors of b, r_1) = (common factors of r_1, r_2) = ··· = (factors of r_{n−1}).
This gives an alternative proof that hcfs exist.
How efficient is this algorithm? For every step, we have a ≥ b + r_1 > 2r_1. Thus every two steps, the number on the left goes down by at least half. Hence the number of digits goes down every 8 steps, so the time needed is ≤ 8 × the number of digits, and the algorithm has time complexity O(log b).
Example. Suppose a = 57 and b = 42.
common factors of 57 and 42 57 = 1 × 42 + 15
= common factors of 42 and 15 42 = 2 × 15 + 12
= common factors of 15 and 12 15 = 1 × 12 + 3
= common factors of 12 and 3 12 = 4 × 3 + 0
= common factors of 3 and 0
= factors of 3.
So the hcf is 3.
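The procedure translates into a short loop, replacing (a, b) by (b, r) until the remainder vanishes (a Python sketch):

```python
def hcf(a, b):
    """Highest common factor by Euclid's Algorithm."""
    while b != 0:
        a, b = b, a % b   # replace (a, b) by (b, r), where a = qb + r
    return a

assert hcf(57, 42) == 3
assert hcf(4931, 3795) == hcf(3795, 1136)  # the reduction step from the text
```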
By reversing Euclid’s Algorithm, we can find the hcf of two numbers as a
linear combination of a and b.
Example. Consider 57 and 21.
57 = 2 × 21 + 15
21 = 1 × 15 + 6
15 = 2 × 6 + 3
6 = 2 × 3
In the opposite direction, we have
3 = 15 − 2 × 6
= 15 − 2 × (21 − 15)
= 3 × 15 − 2 × 21
= 3 × (57 − 2 × 21) − 2 × 21
= 3 × 57 − 8 × 21
This gives an alternative constructive proof of Bézout's identity. Moreover, it gives us a quick way of expressing (a, b) = ax + by. However, this algorithm requires storing the whole process of Euclid's Algorithm and is not efficient space-wise.
To achieve higher space efficiency, we attempt to find a recurrence relation for coefficients A_j, B_j such that a × B_j − b × A_j = (−1)^j r_j. The possible factor of −1 is there just so that the recurrence relation will look nicer. Suppose that this is satisfied for all indices less than j. Then we have
(−1)^j r_j = (−1)^j (r_{j−2} − q_j r_{j−1})
          = (−1)^{j−2} r_{j−2} + q_j (−1)^{j−1} r_{j−1}
          = a(B_{j−2} + q_j B_{j−1}) − b(A_{j−2} + q_j A_{j−1}).
Hence we can obtain the following recurrence relation:
A_j = q_j A_{j−1} + A_{j−2}
B_j = q_j B_{j−1} + B_{j−2}
with
a × B_j − b × A_j = (−1)^j r_j.
In particular, a × B_{n−1} − b × A_{n−1} = (−1)^{n−1} r_{n−1} = ±(a, b).
Also, by an easy induction, A_j B_{j−1} − B_j A_{j−1} = (−1)^j. So (A_j, B_j) = 1.
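The recurrence can be run alongside Euclid's Algorithm, keeping only the last two coefficient pairs. The sketch below tracks coefficients with x·a + y·b equal to the current remainder directly, which is the same recurrence with the (−1)^j bookkeeping absorbed into the signs of x and y (Python):

```python
def bezout(a, b):
    """Return (g, x, y) with x*a + y*b == g == hcf(a, b),
    updating the coefficients alongside Euclid's Algorithm."""
    # Invariant: old = x0*a + y0*b and cur = x1*a + y1*b.
    old, cur = a, b
    x0, y0, x1, y1 = 1, 0, 0, 1
    while cur != 0:
        q = old // cur
        old, cur = cur, old - q * cur
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return old, x0, y0

g, x, y = bezout(57, 21)
assert (g, x * 57 + y * 21) == (3, 3)
assert (x, y) == (3, -8)          # matches 3 × 57 − 8 × 21 = 3 above
```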
These coefficients also play another role. We can put Euclid's Algorithm's equations in the following form:
57/21 = 2 + 15/21
21/15 = 1 + 6/15
15/6 = 2 + 3/6
6/3 = 2
Then we can write out the fraction 57/21 in continued fraction form:
57/21 = 2 + 1/(1 + 1/(2 + 1/2))
Expanding this continued fraction term by term, we obtain the sequence 2, 2 + 1/1 = 3, 2 + 1/(1 + 1/2) = 8/3. These are called the "convergents". The sequence happens to be A_i/B_i.
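The convergents can be generated from the quotients q_j with the same recurrence, using exact fractions (a Python sketch):

```python
from fractions import Fraction

def convergents(quotients):
    """Convergents of a continued fraction [q1; q2, q3, ...],
    via A_j = q_j*A_{j-1} + A_{j-2}, B_j = q_j*B_{j-1} + B_{j-2}."""
    A0, B0, A1, B1 = 0, 1, 1, 0   # seed values for the recurrence
    out = []
    for q in quotients:
        A0, B0, A1, B1 = A1, B1, q * A1 + A0, q * B1 + B0
        out.append(Fraction(A1, B1))
    return out

# Quotients of 57/21 from Euclid's Algorithm: 2, 1, 2, 2
assert convergents([2, 1, 2, 2]) == [Fraction(2), Fraction(3),
                                     Fraction(8, 3), Fraction(19, 7)]
assert Fraction(19, 7) == Fraction(57, 21)  # the last convergent is 57/21
```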
3.2 Primes
There are a lot of ways we can define prime numbers in N. The definition we will use is the following:
Definition (Prime number). p ∈ N is a prime if p > 1 and the only factors of p (in Z) are ±1 and ±p.
In this chapter, the objective is to prove things we already know and think
are obvious.
Theorem. Every number can be written as a product of primes.
Proof. If n ∈ N is not a prime itself, then by definition n = ab for some smaller a, b. If either a or b is not prime, then that number can itself be written as a product, say b = cd. Then n = acd, and so on. Since these numbers are getting smaller, the process will stop when they are all prime.
In the proof, we handwaved a bit when we said “and so on”. We will later
come up with the principle of (strong) induction that rigorously justifies this.
This is the case for many proofs we will have here.
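Trial division makes the "and so on" explicit: repeatedly split off the smallest factor greater than 1, which is necessarily prime (a Python sketch):

```python
def prime_factors(n):
    """Write n > 1 as a product of primes by repeatedly splitting
    off the smallest factor greater than 1 (which must be prime)."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)   # what remains is itself prime
    return factors

assert prime_factors(60) == [2, 2, 3, 5]
assert prime_factors(4931) == [4931]  # 4931 is prime
```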
Theorem. There are infinitely many primes.
Proof. (Euclid's proof) Suppose there are finitely many primes, say p_1, p_2, ···, p_n. Then N = p_1 p_2 ··· p_n + 1 is divisible by none of the primes. Otherwise, p_j | (N − p_1 p_2 ··· p_n), i.e. p_j | 1, which is impossible. However, N is a product of primes, so there must be primes not amongst p_1, p_2, ···, p_n.
Proof. (Erdős 1930) Suppose that there are finitely many primes, p_1, p_2, ···, p_k. Consider all numbers that are products of these primes, i.e. p_1^{j_1} p_2^{j_2} ··· p_k^{j_k}, where j_i ≥ 0. Factor out all squares to obtain the form m^2 p_1^{i_1} p_2^{i_2} ··· p_k^{i_k}, where m ∈ N and each i_j is 0 or 1.
Let N ∈ N. Given any number x ≤ N, when put in the above form, we have m ≤ √N. So there are at most √N possible values of m. For each m, there are 2^k numbers of the form m^2 p_1^{i_1} p_2^{i_2} ··· p_k^{i_k}. So there are only √N × 2^k possible values of x of this kind.
Now pick N > 4^k. Then N > √N × 2^k. So there must be a number ≤ N not of this form, i.e. it has a prime factor not in the list.
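The counting can be checked numerically. With the list 2, 3, 5 (so k = 3) and N = 100, far fewer than N numbers up to N factor entirely over the list (a Python sketch):

```python
def smooth(x, primes):
    """True if every prime factor of x lies in `primes`."""
    for p in primes:
        while x % p == 0:
            x //= p
    return x == 1

primes, N = [2, 3, 5], 100
count = sum(smooth(x, primes) for x in range(1, N + 1))
# The bound sqrt(N) * 2^k = 10 * 8 = 80 < N, so some x <= N must escape.
assert count <= 80 < N
assert not smooth(7, primes)  # e.g. 7 has a prime factor outside the list
```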
Historically, many people have come up with "new" proofs that there are infinitely many primes. However, most of these proofs were just Euclid's proof in disguise. Erdős' proof is genuinely a new proof. For example, Euclid's proof comes up with a particular number N, and says all its factors are not in the list of primes. On the other hand, Erdős' proof says that there is some number, which we don't know, with at least one factor not in the list.
Also, the proofs give different bounds on when we should expect to see the kth prime. For example, Euclid tells us that the kth prime must be less than 2^{2^k}, while Erdős tells us it is less than 4^k.
Theorem. If a | bc and (a, b) = 1, then a | c.
Proof. From Euclid's algorithm, there exist integers u, v ∈ Z such that ua + vb = 1. Multiplying by c, we have uac + vbc = c. Since a | bc, a divides the left-hand side. So a | c.
Definition (Coprime numbers). We say a, b are coprime if (a, b) = 1.
Corollary. If p is a prime and p | ab, then p | a or p | b. (True for all p, a, b.)
Proof. We know that (p, a) = p or 1 because p is a prime. If (p, a) = p, then p | a. Otherwise, (p, a) = 1 and p | b by the theorem above.
Corollary. If p is a prime and p | n_1 n_2 ··· n_k, then p | n_i for some i.
Note that when we defined primes, we defined them in terms of the factors of p. This corollary is the opposite: it is about how p behaves as a factor of other numbers.
Theorem (Fundamental Theorem of Arithmetic). Every natural number is expressible as a product of primes in exactly one way. In particular, if p_1 p_2 ··· p_k = q_1 q_2 ··· q_l, where the p_i, q_i are primes but not necessarily distinct, then k = l, and q_1, ···, q_l are p_1, ···, p_k in some order.
Proof. Since we already showed that there is at least one way above, we only need to show uniqueness.

Let $p_1 \cdots p_k = q_1 \cdots q_l$. We know that $p_1 \mid q_1 \cdots q_l$. Then $p_1 \mid q_1(q_2 q_3 \cdots q_l)$. Thus $p_1 \mid q_i$ for some $i$. wlog assume $i = 1$. Then $p_1 = q_1$ since both are primes. Thus $p_2 p_3 \cdots p_k = q_2 q_3 \cdots q_l$. Likewise, we have $p_2 = q_2, \cdots$ and so on.
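The existence half of the theorem is easy to exercise computationally: trial division always terminates, and by the uniqueness half, any correct factoring procedure must return the same multiset of primes. A small illustrative sketch (the function name is ours):

```python
def prime_factors(n):
    """Factor n > 1 by trial division; returns its prime factors in increasing order."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains has no divisor <= sqrt(n), so it is prime
    return factors

assert prime_factors(360) == [2, 2, 2, 3, 3, 5]  # 360 = 2^3 * 3^2 * 5
```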
Corollary. If $a = p_1^{i_1} p_2^{i_2} \cdots p_r^{i_r}$ and $b = p_1^{j_1} p_2^{j_2} \cdots p_r^{j_r}$, where the $p_k$ are distinct primes (exponents can be zero), then $(a, b) = \prod p_k^{\min\{i_k, j_k\}}$. Likewise, $\operatorname{lcm}(a, b) = \prod p_k^{\max\{i_k, j_k\}}$. We have $\operatorname{hcf}(a, b) \times \operatorname{lcm}(a, b) = ab$.
However, this is not an efficient way to calculate $(a, b)$, since prime factorization is very hard.

Note that this is a property peculiar to the natural numbers. There are “arithmetical systems” (permitting addition, multiplication and subtraction) where factorization is not unique, e.g. the even numbers.
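For small numbers the corollary can still be checked directly against Euclid's algorithm. A hedged sketch (helper names and test values ours):

```python
from collections import Counter
from math import gcd, prod

def factor(n):
    """Prime factorization of n as a Counter {prime: exponent}, by trial division."""
    c, d = Counter(), 2
    while d * d <= n:
        while n % d == 0:
            c[d] += 1
            n //= d
        d += 1
    if n > 1:
        c[n] += 1
    return c

a, b = 360, 756  # 2^3 * 3^2 * 5 and 2^2 * 3^3 * 7
fa, fb = factor(a), factor(b)
primes = set(fa) | set(fb)
hcf = prod(p ** min(fa[p], fb[p]) for p in primes)
lcm = prod(p ** max(fa[p], fb[p]) for p in primes)
assert hcf == gcd(a, b)   # agrees with Euclid's algorithm
assert hcf * lcm == a * b  # hcf(a, b) * lcm(a, b) = ab
```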
Example. The following systems do not have unique prime factorization:

(i) The even numbers. “Primes” are twice an odd number. So 6 is a prime (not divisible by 2 within the system!) while 8 is not. We have $60 = 2 \times 30 = 6 \times 10$, where $2, 6, 10, 30$ are primes. However, this example is not “proper” since there is no identity element (i.e. it is not a ring).
(ii) Consider $\mathbb{Z}[\sqrt{-5}] = \{a + b\sqrt{-5} : a, b \in \mathbb{Z}\}$. We have $6 = 2 \times 3 = (1 - \sqrt{-5})(1 + \sqrt{-5})$. It can be shown that these are primes (see IB Groups, Rings and Modules).
Exercise: Where does the proof of the Fundamental Theorem of Arithmetic
fail in these examples?
4 Counting and integers
This chapter exists because experience shows that mathematicians do not know
how to count.
4.1 Basic counting
A useful theorem is the pigeonhole principle.
Theorem (Pigeonhole Principle). If we put $mn + 1$ pigeons into $n$ pigeonholes, then some pigeonhole has at least $m + 1$ pigeons.
Example. In Cambridge, there are 2 people who have the same number of hairs on their heads, since the population of Cambridge exceeds the maximum possible number of hairs on a head.
Another useful tool for counting is the indicator function.
Definition (Indicator function/characteristic function). Let $X$ be a set. For each $A \subseteq X$, the indicator function or characteristic function of $A$ is the function $i_A: X \to \{0, 1\}$ with $i_A(x) = 1$ if $x \in A$, and $0$ otherwise. It is sometimes written as $\chi_A$.
Proposition.

(i) $i_A = i_B \Leftrightarrow A = B$

(ii) $i_{A \cap B} = i_A i_B$

(iii) $i_{\bar{A}} = 1 - i_A$

(iv) $i_{A \cup B} = 1 - i_{\overline{A \cup B}} = 1 - i_{\bar{A} \cap \bar{B}} = 1 - i_{\bar{A}} i_{\bar{B}} = 1 - (1 - i_A)(1 - i_B) = i_A + i_B - i_{A \cap B}$.

(v) $i_{A \setminus B} = i_{A \cap \bar{B}} = i_A i_{\bar{B}} = i_A(1 - i_B) = i_A - i_{A \cap B}$
Example. We can use the indicator function to prove certain properties about sets:

(i) Proof that $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$:
\[
i_{A \cap (B \cup C)} = i_A i_{B \cup C} = i_A(i_B + i_C - i_B i_C) = i_A i_B + i_A i_C - i_A i_B i_C
\]
\[
i_{(A \cap B) \cup (A \cap C)} = i_{A \cap B} + i_{A \cap C} - i_{A \cap C} i_{A \cap B} = i_A i_B + i_A i_C - i_A i_C i_A i_B = i_A i_B + i_A i_C - i_A i_B i_C
\]
Therefore $i_{A \cap (B \cup C)} = i_{(A \cap B) \cup (A \cap C)}$, and thus $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$.

Note that $i_A = i_A^2$ since $i_A = 0$ or $1$, and $0^2 = 0$ and $1^2 = 1$.
(ii) Proof that the symmetric difference is associative: Observe that $i_{A \Delta B} \equiv i_A + i_B \pmod 2$. Thus $i_{(A \Delta B) \Delta C} = i_{A \Delta (B \Delta C)} \equiv i_A + i_B + i_C \pmod 2$.
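These identities are finite Boolean computations, so they can be spot-checked mechanically. A small sketch (the representation and the particular sets are ours):

```python
def indicator(A, X):
    """The indicator function i_A of A within X, as a dict X -> {0, 1}."""
    return {x: 1 if x in A else 0 for x in X}

X = set(range(10))
A, B = {1, 2, 3}, {2, 3, 4}
iA, iB = indicator(A, X), indicator(B, X)

# i_{A ∩ B} = i_A i_B
assert all(iA[x] * iB[x] == indicator(A & B, X)[x] for x in X)
# i_{A ∪ B} = i_A + i_B - i_A i_B
assert all(iA[x] + iB[x] - iA[x] * iB[x] == indicator(A | B, X)[x] for x in X)
# i_{A Δ B} ≡ i_A + i_B (mod 2)
assert all((iA[x] + iB[x]) % 2 == indicator(A ^ B, X)[x] for x in X)
```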
Indicator functions are handy for computing the sizes of finite sets, because if $A \subseteq X$, then $|A| = \sum_{x \in X} i_A(x)$.
Proposition. $|A \cup B| = |A| + |B| - |A \cap B|$
Proof.
\[
|A \cup B| = \sum_{x \in X} i_{A \cup B}(x) = \sum \left(i_A(x) + i_B(x) - i_{A \cap B}(x)\right) = \sum i_A(x) + \sum i_B(x) - \sum i_{A \cap B}(x) = |A| + |B| - |A \cap B|
\]
More importantly, we will use indicator functions to prove a powerful result.
Theorem (Inclusion-Exclusion Principle). Let $A_i$ be subsets of a finite set $X$, for $1 \leq i \leq n$. Then
\[
|\bar{A}_1 \cap \cdots \cap \bar{A}_n| = |X| - \sum_i |A_i| + \sum_{i < j} |A_i \cap A_j| - \cdots + (-1)^n |A_1 \cap \cdots \cap A_n|.
\]
Equivalently,
\[
|A_1 \cup \cdots \cup A_n| = \sum_i |A_i| - \sum_{i < j} |A_i \cap A_j| + \cdots + (-1)^{n-1} |A_1 \cap \cdots \cap A_n|.
\]
The two forms are equivalent since $|A_1 \cup \cdots \cup A_n| = |X| - |\bar{A}_1 \cap \cdots \cap \bar{A}_n|$.
Proof. Using indicator functions,
\[
i_{\bar{A}_1 \cap \bar{A}_2 \cap \cdots \cap \bar{A}_n} = \prod_j i_{\bar{A}_j} = \prod_j (1 - i_{A_j}) = 1 - \sum_i i_{A_i} + \sum_{i < j} i_{A_i} i_{A_j} - \cdots + (-1)^n i_{A_1} i_{A_2} \cdots i_{A_n} = 1 - \sum_i i_{A_i} + \sum_{i < j} i_{A_i \cap A_j} - \cdots + (-1)^n i_{A_1 \cap A_2 \cap \cdots \cap A_n}
\]
Thus
\[
|\bar{A}_1 \cap \cdots \cap \bar{A}_n| = \sum_{x \in X} i_{\bar{A}_1 \cap \cdots \cap \bar{A}_n}(x) = \sum_x 1 - \sum_i \sum_x i_{A_i}(x) + \sum_{i < j} \sum_x i_{A_i \cap A_j}(x) - \cdots + \sum_x (-1)^n i_{A_1 \cap \cdots \cap A_n}(x) = |X| - \sum_i |A_i| + \sum_{i < j} |A_i \cap A_j| - \sum_{i < j < k} |A_i \cap A_j \cap A_k| + \cdots + (-1)^n |A_1 \cap A_2 \cap \cdots \cap A_n|
\]
Example. How many numbers $\leq 200$ are coprime to 110?

Let $X = \{1, \cdots, 200\}$, and $A_1 = \{x : 2 \mid x\}$, $A_2 = \{x : 5 \mid x\}$, $A_3 = \{x : 11 \mid x\}$. We know that
\[
|A_1| = \lfloor 200/2 \rfloor = 100,\quad |A_2| = \lfloor 200/5 \rfloor = 40,\quad |A_3| = \lfloor 200/11 \rfloor = 18,
\]
\[
|A_1 \cap A_2| = \lfloor 200/10 \rfloor = 20,\quad |A_1 \cap A_3| = \lfloor 200/22 \rfloor = 9,\quad |A_2 \cap A_3| = \lfloor 200/55 \rfloor = 3,
\]
\[
|A_1 \cap A_2 \cap A_3| = \lfloor 200/110 \rfloor = 1.
\]
Then the answer is $200 - 100 - 40 - 18 + 20 + 9 + 3 - 1 = 73$.
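This example is small enough to verify by brute force, comparing a direct count against the inclusion-exclusion terms above:

```python
from math import gcd

# Direct count of numbers <= 200 coprime to 110
direct = sum(1 for x in range(1, 201) if gcd(x, 110) == 1)

# Inclusion-exclusion over the prime divisors 2, 5, 11 of 110
ie = (200
      - (200 // 2 + 200 // 5 + 200 // 11)
      + (200 // 10 + 200 // 22 + 200 // 55)
      - 200 // 110)
assert direct == ie == 73
```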
4.2 Combinations
“Combinations” is about counting the ways we can pick things without regard to order. We can formulate these problems in the language of sets: given a set $X$, how many subsets of $X$ are there that satisfy some particular properties? For example, if we want to pick 3 people from 10, we can let $X$ be the set of the 10 people. Then the number of ways of picking the 3 people is the number of subsets of size three.
Example. How many subsets of $\{1, 2, \cdots, n\}$ are there? There are $2 \times 2 \times \cdots \times 2 = 2^n$, since every element is either in or out of a subset, and there are two choices for each element. Equivalently, there are $2^n$ possible indicator functions, i.e. functions $\{1, 2, 3, \cdots, n\} \to \{0, 1\}$.
Definition (Combination $\binom{n}{r}$). The number of subsets of $\{1, 2, 3, \cdots, n\}$ of size $r$ is denoted by $\binom{n}{r}$. The symbol is pronounced “$n$ choose $r$”.
This is the definition of $\binom{n}{r}$. It does not in any way specify how we can actually calculate the value of $\binom{n}{r}$.
Proposition. By definition,
\[
\binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n} = 2^n
\]
Theorem (Binomial theorem). For $n \in \mathbb{N}$ with $a, b \in \mathbb{R}$, we have
\[
(a + b)^n = \binom{n}{0} a^n b^0 + \binom{n}{1} a^{n-1} b^1 + \cdots + \binom{n}{r} a^{n-r} b^r + \cdots + \binom{n}{n} a^0 b^n
\]
Proof. We have $(a + b)^n = (a + b)(a + b) \cdots (a + b)$. When we expand the product, we get all terms obtained by choosing $b$ from some brackets and $a$ from the rest. The term $a^{n-r} b^r$ comes from choosing $b$ from $r$ brackets and $a$ from the rest, and there are $\binom{n}{r}$ ways to make such a choice.
This theorem is not immediately useful since we do not know the value of $\binom{n}{r}$! Because of this theorem, $\binom{n}{r}$ is sometimes called a “binomial coefficient”.
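The theorem can nevertheless be sanity-checked numerically, using Python's `math.comb` for $\binom{n}{r}$ (the particular values are ours):

```python
from math import comb

# Check (a + b)^n == sum_r C(n, r) a^(n-r) b^r for a few concrete values
for a, b, n in [(2, 3, 5), (1, -1, 7), (10, 1, 4)]:
    expansion = sum(comb(n, r) * a ** (n - r) * b ** r for r in range(n + 1))
    assert expansion == (a + b) ** n
```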
Proposition.

(i) $\binom{n}{r} = \binom{n}{n - r}$. This is because choosing $r$ things to keep is the same as choosing $n - r$ things to throw away.

(ii) $\binom{n}{r - 1} + \binom{n}{r} = \binom{n + 1}{r}$ (Pascal's identity). The RHS counts the number of ways to choose a team of $r$ players from $n + 1$ available players, one of whom is Pietersen. If Pietersen is chosen, there are $\binom{n}{r - 1}$ ways to choose the remaining players. Otherwise, there are $\binom{n}{r}$ ways. The total number of ways is thus $\binom{n}{r - 1} + \binom{n}{r}$.
Now given that $\binom{n}{0} = \binom{n}{n} = 1$, since there is only one way to choose nothing or everything, we can construct Pascal's triangle:

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1

where each number is the sum of the two numbers above it, and the $r$th item of the $n$th row is $\binom{n}{r}$ (the first row is row 0).
(iii) $\binom{n}{k} \binom{k}{r} = \binom{n}{r} \binom{n - r}{k - r}$. We are counting the number of pairs of sets $(Y, Z)$ with $|Y| = k$, $|Z| = r$ and $Z \subseteq Y$. On the LHS, we first choose $Y$ and then choose $Z \subseteq Y$. The RHS chooses $Z$ first and then chooses the remaining $Y \setminus Z$ from $\{1, 2, \cdots, n\} \setminus Z$.
(iv) $\binom{a}{r} \binom{b}{0} + \binom{a}{r - 1} \binom{b}{1} + \cdots + \binom{a}{r - k} \binom{b}{k} + \cdots + \binom{a}{0} \binom{b}{r} = \binom{a + b}{r}$ (Vandermonde's convolution). Suppose we have $a$ men and $b$ women, and we need to choose a committee of $r$ people. The right hand side is the total number of choices. The left hand side breaks the choices up according to the number of men vs women.
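All four identities are easy to spot-check with `math.comb` (the particular values are ours):

```python
from math import comb

n, k, r = 10, 6, 3
assert comb(n, r) == comb(n, n - r)                                 # (i) symmetry
assert comb(n, r - 1) + comb(n, r) == comb(n + 1, r)                # (ii) Pascal's identity
assert comb(n, k) * comb(k, r) == comb(n, r) * comb(n - r, k - r)   # (iii) subset-of-subset
a, b = 6, 7
assert sum(comb(a, r - j) * comb(b, j) for j in range(r + 1)) == comb(a + b, r)  # (iv) Vandermonde
```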
Example. A greengrocer stocks $n$ kinds of fruit. In how many ways can we choose a bag of $r$ fruits? If we are only allowed to choose one of each kind, then the answer is $\binom{n}{r}$. But we might have $r = 4$, and we want to allow picking 2 apples, 1 plum and 1 quince. The total number of ways to choose is $\binom{n + r - 1}{r}$. Why?

Each choice can be represented by a binary string of length $n + r - 1$, with $r$ 0s and $n - 1$ 1s. The string can be constructed as follows (by example): when $n = 5$ and $r = 8$, a possible binary string is 000100110010. The blocks of 0s correspond to the number of each fruit chosen, and the 1s separate the choices. In the string above, we have 3 of type 1, 2 of type 2, 0 of type 3, 2 of type 4 and 1 of type 5.

Then clearly the number of possible strings is $\binom{n + r - 1}{r}$.
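The count can be checked against a direct enumeration of multisets, e.g. with `itertools.combinations_with_replacement` (the values are ours):

```python
from itertools import combinations_with_replacement
from math import comb

n, r = 5, 4  # 5 kinds of fruit, bag of 4 with repetition allowed
bags = list(combinations_with_replacement(range(n), r))
assert len(bags) == comb(n + r - 1, r)  # stars-and-bars count
```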
Proposition. $\binom{n}{r} = \dfrac{n!}{(n - r)!\, r!}$.

Proof. There are $n(n - 1)(n - 2) \cdots (n - r + 1) = \frac{n!}{(n - r)!}$ ways to choose $r$ elements in order. Each choice of subset is chosen this way in $r!$ orders, so the number of subsets is $\frac{n!}{(n - r)!\, r!}$.
We might write $x^{(r)}$ for the polynomial $x(x - 1) \cdots (x - r + 1)$. We call this “$x$ to the $r$ falling”. We can write $\binom{n}{r} = \frac{n^{(r)}}{r!}$. Multiplying Vandermonde by $r!$, we obtain the “falling binomial theorem”
\[
\binom{r}{0} a^{(r)} b^{(0)} + \binom{r}{1} a^{(r-1)} b^{(1)} + \cdots + \binom{r}{r} a^{(0)} b^{(r)} = (a + b)^{(r)}.
\]
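A quick numerical check of $\binom{n}{r} = n^{(r)}/r!$ and of the falling binomial theorem (the function name and values are ours):

```python
from math import comb, factorial

def falling(x, r):
    """x to the r falling: x(x-1)...(x-r+1)."""
    out = 1
    for k in range(r):
        out *= x - k
    return out

n, r = 9, 4
assert comb(n, r) == falling(n, r) // factorial(r)

# Falling binomial theorem: sum_k C(r,k) a^{(r-k)} b^{(k)} = (a+b)^{(r)}
a, b = 5, 6
lhs = sum(comb(r, k) * falling(a, r - k) * falling(b, k) for k in range(r + 1))
assert lhs == falling(a + b, r)
```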
Example. A bank prepares a letter for each of its $n$ customers, saying how much it cares. (Each of these letters costs the customer £40.) There are $n!$ ways to put the letters in the envelopes. In how many ways can this be done so that no one gets the right letter (i.e. how many derangements are there of $n$ elements)?

We let $X$ be the set of all envelopings (permutations of $n$), so $|X| = n!$. For each $i$, let $A_i = \{x \in X : x \text{ assigns the correct letter to customer } i\}$. We want to know $\left|\bigcap_i \bar{A}_i\right|$. We know that $|A_i| = (n - 1)!$, since $i$'s letter goes into $i$'s envelope and all the others can be placed arbitrarily. We have $|A_i \cap A_j| = (n - 2)!$ as well. Similarly, $|A_i \cap A_j \cap A_k| = (n - 3)!$.

By the inclusion-exclusion formula, we have
\[
\left|\bigcap_i \bar{A}_i\right| = |X| - \sum |A_i| + \sum |A_i \cap A_j| - \cdots = n! - \binom{n}{1}(n - 1)! + \binom{n}{2}(n - 2)! - \cdots = n!\left(1 - \frac{1}{1!} + \frac{1}{2!} - \cdots + \frac{(-1)^n}{n!}\right) \approx n!\, e^{-1}
\]
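For small $n$ we can enumerate all permutations and compare with both the inclusion-exclusion formula and the $n!/e$ approximation. A sketch (the function name is ours):

```python
from itertools import permutations
from math import e, factorial

def derangements(n):
    """Count permutations of {0, ..., n-1} with no fixed point, by brute force."""
    return sum(1 for p in permutations(range(n))
               if all(p[i] != i for i in range(n)))

for n in range(1, 8):
    # n! (1 - 1/1! + 1/2! - ... + (-1)^n / n!), computed exactly
    formula = sum((-1) ** k * factorial(n) // factorial(k) for k in range(n + 1))
    assert derangements(n) == formula == round(factorial(n) / e)
```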
4.3 Wellordering and induction
Several proofs so far have involved “take the least integer such that”, e.g. the division algorithm; or a sequence of moves “and so on. . . ”, e.g. Euclid's algorithm and the proof that every number is a product of primes. These rely on the following:
Theorem (Weak Principle of Induction). Let $P(n)$ be a statement about the natural number $n$. Suppose that

(i) $P(1)$ is true

(ii) $(\forall n)\, P(n) \Rightarrow P(n + 1)$

Then $P(n)$ is true for all $n \geq 1$.
Example (Tower of Hanoi). Referring to the image below,
n
.
.
.
3
2
1
A B C
The objective is to move the
n
rings on peg A to peg B, with the constraints
that you can only move one ring at a time, and you can never place a larger ring
onto a smaller ring.
We claim that this needs exactly $2^n - 1$ moves.
Let $P(n)$ be “$n$ rings need $2^n - 1$ moves”. Note that this statement contains two assertions — (1) we can do it in $2^n - 1$ moves; (2) we cannot do it in fewer.

First consider $P(1)$: we simply have to move the ring from A to B.

Now suppose we have $n + 1$ rings. We can move the top $n$ rings to peg C, then move the bottom ring to peg B, then move the $n$ rings from C to B. Assuming $P(n)$ is true, this needs at most $2 \times (2^n - 1) + 1 = 2^{n+1} - 1$ moves.
Can we do it in fewer moves? To succeed, we must free the bottom ring, so we must shift the top $n$ rings to another peg. This needs $\geq 2^n - 1$ moves by $P(n)$. Then we need to shift the bottom ring. Then we need to shift the $n$ smaller rings onto the big one. This needs $\geq 2^n - 1$ moves by $P(n)$. So this needs $\geq 2^{n+1} - 1$ moves altogether.
So we have shown that $P(n) \Rightarrow P(n + 1)$ (we used $P(n)$ four times). By the WPI, $P(n)$ is true for all $n$.
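The recursive strategy in the inductive step translates directly into code, and the move count matches $2^n - 1$. A sketch (names ours):

```python
def hanoi(n, src="A", dst="B", spare="C"):
    """Return the list of moves shifting n rings from src to dst."""
    if n == 0:
        return []
    return (hanoi(n - 1, src, spare, dst)    # free the bottom ring
            + [(src, dst)]                   # move the bottom ring
            + hanoi(n - 1, spare, dst, src)) # stack the rest back on top

for n in range(1, 10):
    assert len(hanoi(n)) == 2 ** n - 1
```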
Example. All numbers are equal. Let $P(n)$ be “if $\{a_1, \cdots, a_n\}$ is a set of $n$ numbers, then $a_1 = a_2 = \cdots = a_n$”. $P(1)$ is trivially true. Suppose we have $\{a_1, a_2, \cdots, a_{n+1}\}$. Assuming $P(n)$, apply it to $\{a_1, a_2, \cdots, a_n\}$ and $\{a_2, \cdots, a_{n+1}\}$; then $a_1 = \cdots = a_n$ and $a_2 = a_3 = \cdots = a_{n+1}$. So $a_1 = a_2 = \cdots = a_{n+1}$. Hence $P(n) \Rightarrow P(n + 1)$. So $P(n)$ is true for all $n \in \mathbb{N}$. (Of course this is absurd: the two sets overlap only when $n \geq 2$, so the step $P(1) \Rightarrow P(2)$ fails.)
Theorem (Inclusion-exclusion principle).
Proof. Let $P(n)$ be the statement “for any sets $A_1, \cdots, A_n$, we have $|A_1 \cup \cdots \cup A_n| = \sum_i |A_i| - \sum_{i<j} |A_i \cap A_j| + \cdots \pm |A_1 \cap A_2 \cap \cdots \cap A_n|$”.
$P(1)$ is trivially true. $P(2)$ is also true (see above). Now given $A_1, \cdots, A_{n+1}$, let $B_i = A_i \cap A_{n+1}$ for $1 \leq i \leq n$. We apply $P(n)$ both to the $A_i$ and to the $B_i$.
Now observe that $B_i \cap B_j = A_i \cap A_j \cap A_{n+1}$. Likewise, $B_i \cap B_j \cap B_k = A_i \cap A_j \cap A_k \cap A_{n+1}$. Now
\[
\begin{aligned}
|A_1 \cup A_2 \cup \cdots \cup A_{n+1}| &= |A_1 \cup \cdots \cup A_n| + |A_{n+1}| - |(A_1 \cup \cdots \cup A_n) \cap A_{n+1}| \\
&= |A_1 \cup \cdots \cup A_n| + |A_{n+1}| - |B_1 \cup \cdots \cup B_n| \\
&= \sum_{i \leq n} |A_i| - \sum_{i < j \leq n} |A_i \cap A_j| + \cdots + |A_{n+1}| - \sum_{i \leq n} |B_i| + \sum_{i < j \leq n} |B_i \cap B_j| - \cdots
\end{aligned}
\]
Note that $\sum_{i \leq n} |B_i| = \sum_{i \leq n} |A_i \cap A_{n+1}|$. So $\sum_{i < j \leq n} |A_i \cap A_j| + \sum_{i \leq n} |B_i| = \sum_{i < j \leq n+1} |A_i \cap A_j|$,
and similarly for the other terms. So
\[
= \sum_{i \leq n+1} |A_i| - \sum_{i < j \leq n+1} |A_i \cap A_j| + \cdots
\]
So $P(n) \Rightarrow P(n + 1)$ for $n \geq 2$. By the WPI, $P(n)$ is true for all $n$.
However, WPI is not quite what we want for “every number is a product of
primes”. We need a different form of induction.
Theorem (Strong principle of induction). Let $P(n)$ be a statement about $n \in \mathbb{N}$. Suppose that

(i) $P(1)$ is true

(ii) $\forall n \in \mathbb{N}$, if $P(k)$ is true $\forall k < n$, then $P(n)$ is true.

Then $P(n)$ is true for all $n \in \mathbb{N}$.
Note that (i) is redundant as it follows from (ii), but we state it for clarity.
Example (“Evolutionary trees”). Imagine that we have a mutant that can produce two offspring. Each offspring is either an animal or another mutant. A possible evolutionary tree is as follows:
mutant
mutant
pig
mutant
slug
man
mutant
gnu
ibex
Let $P(n)$ be the statement “$n - 1$ mutants produce $n$ animals”. Given some tree with $n$ animals, remove the top mutant to get two subtrees, with $n_1$ and $n_2$ animals, where $n_1 + n_2 = n$. If $P(k)$ is true $\forall k < n$, then $P(n_1)$ and $P(n_2)$ are true. So the total number of mutants is $1 + (n_1 - 1) + (n_2 - 1) = n - 1$. So $P(n)$ is true. Hence by the strong principle of induction, $P(n)$ is true for all $n$.
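The inductive step can be exercised on randomly generated trees: removing the top mutant splits the tree into two subtrees, and counting bottom-up always gives one fewer mutant than animals. A sketch (the tree representation is ours):

```python
import random

def random_tree(n_animals):
    """A random evolutionary tree with n_animals animals; internal nodes are mutants."""
    if n_animals == 1:
        return "animal"
    left = random.randint(1, n_animals - 1)  # split the animals between two subtrees
    return (random_tree(left), random_tree(n_animals - left))

def count(tree):
    """Return (mutants, animals) in the tree."""
    if tree == "animal":
        return (0, 1)
    (m1, a1), (m2, a2) = count(tree[0]), count(tree[1])
    return (1 + m1 + m2, a1 + a2)  # the top mutant plus both subtrees

for n in range(1, 30):
    mutants, animals = count(random_tree(n))
    assert animals == n and mutants == n - 1
```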
Theorem. The strong principle of induction is equivalent to the weak principle of induction.

Proof. Clearly the strong principle implies the weak principle, since if $P(n) \Rightarrow P(n + 1)$, then $(P(1) \wedge P(2) \wedge \cdots \wedge P(n)) \Rightarrow P(n + 1)$.
Now we show that the weak principle implies the strong principle. Suppose that $P(1)$ is true and $(\forall n)\, P(1) \wedge P(2) \wedge \cdots \wedge P(n - 1) \Rightarrow P(n)$. We want to show that $P(n)$ is true for all $n$ using the weak principle.

Let $Q(n)$ = “$P(k)$ is true $\forall k \leq n$”. Then $Q(1)$ is true. Suppose that $Q(n)$ is true. Then $P(1) \wedge P(2) \wedge \cdots \wedge P(n)$ is true. So $P(n + 1)$ is true. Hence $Q(n + 1)$ is true. By the weak principle, $Q(n)$ is true for all $n$. So $P(n)$ is true for all $n$.
While strong and weak induction are practically equivalent, they are rather
distinct conceptually. Weak induction is expressed in terms of “adding 1”, while
strong induction is based on the ordering of natural numbers.
It turns out that there is another statement about the natural numbers that
can be stated in terms of orders. We first formally define what it means to be
an order.
Definition (Partial order). A partial order on a set is a reflexive, antisymmetric ($(aRb) \wedge (bRa) \Leftrightarrow a = b$) and transitive relation.
Example. The ordinary ordering $a \leq b$ is a partial order of $\mathbb{N}$. The divisibility relation $a \mid b$ on $\mathbb{N}$ is also a partial order.
Definition (Total order). A total order is a partial order where $\forall a \neq b$, exactly one of $aRb$ or $bRa$ holds. This means that every two things must be related.
Definition (Well-ordered total order). A total order is well-ordered if every non-empty subset has a minimal element, i.e. if $S \neq \emptyset$, then $\exists m \in S$ such that $x < m \Rightarrow x \notin S$.
Example. $\mathbb{Z}$ with the usual order is not well-ordered, since the set of even integers has no minimum. The positive rationals are also not well-ordered under the usual order.
Theorem (Well-ordering principle). $\mathbb{N}$ is well-ordered under the usual order, i.e. every non-empty subset of $\mathbb{N}$ has a minimal element.
Theorem. The well-ordering principle is equivalent to the strong principle of induction.
Proof. First we prove that well-ordering implies strong induction. Consider a proposition $P(n)$, and suppose that $P(k)$ being true $\forall k < n$ implies $P(n)$.

Assume the contrary, and consider the (non-empty) set $S = \{n \in \mathbb{N} : \neg P(n)\}$. Then $S$ has a minimal element $m$. Since $m$ is the minimal counterexample to $P$, $P(k)$ is true for all $k < m$. However, this implies that $P(m)$ is true, which is a contradiction. Therefore $P(n)$ must be true for all $n$.
To show that strong induction implies well-ordering, let $S \subseteq \mathbb{N}$. Suppose that $S$ has no minimal element. We need to show that $S$ is empty. Let $P(n)$ be