Part IA Numbers and Sets
Based on lectures by A. G. Thomason
Notes taken by Dexter Chua
Michaelmas 2014
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
Introduction to number systems and logic
Overview of the natural numbers, integers, real numbers, rational and irrational
numbers, algebraic and transcendental numbers. Brief discussion of complex numbers;
statement of the Fundamental Theorem of Algebra.
Ideas of axiomatic systems and proof within mathematics; the need for proof; the
role of counter-examples in mathematics. Elementary logic; implication and negation;
examples of negation of compound statements. Proof by contradiction. [2]
Sets, relations and functions
Union, intersection and equality of sets. Indicator (characteristic) functions; their use in
establishing set identities. Functions; injections, surjections and bijections. Relations,
and equivalence relations. Counting the combinations or permutations of a set. The
Inclusion-Exclusion Principle. [4]
The integers
The natural numbers: mathematical induction and the well-ordering principle. Examples, including the Binomial Theorem. [2]
Elementary number theory
Prime numbers: existence and uniqueness of factorisation into primes; highest common factors and least common multiples. Euclid's proof of the infinity of primes. Euclid's algorithm. Solution in integers of ax + by = c.
Modular arithmetic (congruences). Units modulo n. Chinese Remainder Theorem. Wilson's Theorem; the Fermat-Euler Theorem. Public key cryptography and the RSA algorithm. [8]
The real numbers
Least upper bounds; simple examples. Least upper bound axiom. Sequences and series; convergence of bounded monotonic sequences. Irrationality of √2 and e. Decimal expansions. Construction of a transcendental number. [4]
Countability and uncountability
Definitions of finite, infinite, countable and uncountable sets. A countable union of
countable sets is countable. Uncountability of R. Non-existence of a bijection from a
set to its power set. Indirect proof of existence of transcendental numbers. [4]
Contents
0 Introduction
1 Proofs and logic
1.1 Proofs
1.2 Examples of proofs
1.3 Logic
2 Sets, functions and relations
2.1 Sets
2.2 Functions
2.3 Relations
3 Division
3.1 Euclid’s Algorithm
3.2 Primes
4 Counting and integers
4.1 Basic counting
4.2 Combinations
4.3 Well-ordering and induction
5 Modular arithmetic
5.1 Modular arithmetic
5.2 Multiple moduli
5.3 Prime moduli
5.4 Public-key (asymmetric) cryptography
6 Real numbers
6.1 Construction of numbers
6.2 Sequences
6.3 Series
6.4 Irrational numbers
6.5 Euler’s number
6.6 Algebraic numbers
7 Countability
0 Introduction
According to the Faculty, this course is not aimed at teaching you any new
knowledge. In particular, the Faculty says
This course is concerned not so much with teaching you new parts
of mathematics . . .
Instead, this course is intended to teach you about how to do maths properly.
The objective of this course is to start from a few axioms, which are assumptions
about our system, and try to prove everything rigorously from the axioms.
This is different from how mathematics is usually done in secondary school.
In secondary school, we just accept that certain statements are true. For example,
probably no one rigorously proved to you that each natural number has a unique
prime factorization. In this course, nothing is handwaved. Everything will be
obtained as a logical consequence of the axioms and previous (proven) results.
In the course, you shouldn’t focus too much on the content itself. Instead,
the major takeaway should be how we do mathematics. In particular, how we
construct and present arguments.
The actual content of this course is rather diverse. As the name suggests,
the course touches on numbers and sets. The “numbers” part is mostly basic
number theory, which you may have come across if you have participated in
mathematical olympiads. We also study some properties of real numbers, but
the actual serious study of real numbers is done in IA Analysis I.
The “sets” part starts with some basic definitions of sets, functions and
relations, which are really important and will crop up in all courses you will
encounter. In some sense, these form the language in which mathematics is
written. At the end of the course, we will touch on countability, which tells us
how “big” a set is.
1 Proofs and logic
1.1 Proofs
As the first course in pure mathematics, we will have an (informal) look at proofs
and logic.
In mathematics, we often want to prove things. This is different from most
other disciplines. For example, in science, we perform experiments to convince
ourselves that our theory is correct. However, no matter how many experiments
we do, we still cannot be absolutely sure that our theory is correct. For example,
Newtonian mechanics was believed to be correct for a long time, but is nowadays
only considered to be an approximation of reality.
On the other hand, when we prove a theorem in mathematics, we are
completely sure that the theorem is true. Also, when we actually prove a
theorem, (hopefully) we can also understand why it is true, and gain further
insight.
Definition (Proof). A proof is a sequence of true statements, without logical gaps, that is a logical argument establishing some conclusion.
To prove things, we need to start from some assumptions. These assumptions
are known as axioms. When we call something an axiom, it does not mean that
we take these statements to be true without questioning. Instead, we are saying
“if we assume these axioms, then these results hold”. Two people can disagree on
what the axioms are and still be friends.
We also tend to define concepts as unambiguously as possible. Of course, just
like a dictionary cannot define all words without being circular, we do not define
everything in mathematics. To prove things, we have to start somewhere, with
some agreed assumptions (axioms). We also don’t define everything rigorously
(or else how could one start speaking?).
In mathematics, we are often concerned about truth. Often, we only care
about statements that can take some truth value.
Definition (Statement). A statement is a sentence that can have a truth value.
Example. The following are statements:
(i) There are infinitely many primes of the form n^2 + 1
(ii) There is always a prime number between n and 2n
(iii) There is no computer program that can factorize an n-digit number in n^3 steps
(iv) For every polynomial p(x) = a_n x^n + a_{n−1} x^{n−1} + ··· + a_0, where the a_i are complex numbers, n ≥ 1 and a_n ≠ 0, there exists a (possibly complex) z such that p(z) = 0
(v) m × n = n × m for all natural numbers n, m
(vi) 2 + 2 = 4
The current status (as of 2015) of these statements is:
(i) No one has a proof, but it is probably true
(ii) This is known to be true
(iii) No one knows (this is related to the P = NP problem)
(iv) This is known to be true (the Fundamental Theorem of Algebra)
(v) This is known to be true
(vi) This is known to be true (obviously; does this even need to be proved?)
1.2 Examples of proofs
Apart from having a proof, it is very important that a proof is correct. Here we
will look at some examples of proofs and non-proofs.
We first start with a simple example.
Proposition. For all natural numbers n, n^3 − n is a multiple of 3.
Proof. We have n^3 − n = (n − 1)n(n + 1). One of the three consecutive integers is divisible by 3. Hence so is their product.
Proposition. If n^2 is even, then so is n.
Proof. If n is even, then n = 2k for some integer k. Then n^2 = 4k^2, which is even.
This is incorrect! We wanted to prove "n^2 is even" ⇒ "n is even", but what we proved is "n is even" ⇒ "n^2 is even", which are distinct statements.
Instead, a correct proof is as follows:
Proof. Suppose n is odd. Then n = 2k + 1 for some integer k. Then n^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1, which is odd. This contradicts our assumption that n^2 is even.
This is an example of proof by contradiction. We assume what we want to
prove is false, and show that this leads to nonsense.
Proposition. The solutions to x^2 − 5x + 6 = 0 are x = 2 and x = 3.
Note that these are actually 2 different statements:
(i) x = 2 and x = 3 are solutions
(ii) There are no other solutions
We can write this as an "if and only if" statement: x is a solution if and only if x = 2 or x = 3. Alternatively, we say "x is a solution iff x = 2 or x = 3"; or "x is a solution ⇔ x = 2 or x = 3".
Proof.
(i) If x = 2 or x = 3, then x − 2 = 0 or x − 3 = 0. So (x − 2)(x − 3) = 0.
(ii) If x^2 − 5x + 6 = 0, then (x − 2)(x − 3) = 0. So x − 2 = 0 or x − 3 = 0. Then x = 2 or x = 3.
Note that the second direction is simply the first argument reversed. We can write this all in one go:
x = 3 or x = 2 ⇔ x − 3 = 0 or x − 2 = 0
⇔ (x − 3)(x − 2) = 0
⇔ x^2 − 5x + 6 = 0
Note that we used the "if and only if" sign between all lines.
We’ll do another non-proof.
Proposition. Every positive number is ≥ 1.
Proof. Let r be the smallest positive real. Then either r < 1, r = 1 or r > 1.
If r < 1, then 0 < r^2 < r. Contradiction. If r > 1, then 0 < √r < r. Contradiction. So r = 1.
Now this is obviously false, since 0.5 < 1. The problem with this proof is that the smallest positive real need not exist. So we have to make sure we justify all our claims.
1.3 Logic
Mathematics is full of logical statements, which are made of statements and
logical connectives. Usually, we use shorthands for the logical connectives.
Let P and Q be statements. Then P ∧ Q stands for "P and Q"; P ∨ Q stands for "P or Q"; P ⇒ Q stands for "P implies Q"; P ⇔ Q stands for "P iff Q"; ¬P stands for "not P". The truth of these statements depends on the truth of P and Q. It can be shown by a truth table:

P Q | P ∧ Q  P ∨ Q  P ⇒ Q  P ⇔ Q  ¬P
T T |   T      T      T      T     F
T F |   F      T      F      F     F
F T |   F      T      T      F     T
F F |   F      F      T      T     T

Certain logical propositions are equivalent, which we denote using the ⇔ sign. For example,
¬(P ∧ Q) ⇔ (¬P ∨ ¬Q),
or
(P ⇒ Q) ⇔ (¬P ∨ Q) ⇔ (¬Q ⇒ ¬P).
By convention, negation has the highest precedence when bracketing. For example, ¬P ∨ ¬Q should be bracketed as (¬P) ∨ (¬Q).
We also have quantifiers. (∀x)P(x) means "for all x, P(x) is true", while (∃x)P(x) means "there exists x such that P(x) is true".
The quantifiers are usually bounded, i.e. we write ∀x ∈ X or ∃x ∈ X to mean "for all x in the set X" and "there exists x in the set X" respectively.
Quantifiers are negated as follows:
¬(∀x)P(x) ⇔ (∃x)(¬P(x));
¬(∃x)P(x) ⇔ (∀x)(¬P(x)).
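The following short Python check is my own addition (not part of the lectures): it simply runs through the truth table above and confirms the stated equivalences for every assignment of P and Q.

```python
from itertools import product

# Check the equivalences above over all rows of the truth table.
for P, Q in product([False, True], repeat=2):
    # De Morgan: not (P and Q)  <=>  (not P) or (not Q)
    assert (not (P and Q)) == ((not P) or (not Q))
    # (P => Q)  <=>  (not P) or Q  <=>  (not Q) => (not P)
    implies = (not P) or Q
    contrapositive = (not (not Q)) or (not P)
    assert implies == contrapositive
print("all rows of the truth table agree")
```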
2 Sets, functions and relations
In this chapter, we will look at the basic building blocks of mathematics, namely
sets, functions and relations. Most of mathematics can be expressed in terms of
these notions, and it is helpful to be familiar with the relevant terms.
2.1 Sets
Definition (Set). A set is a collection of stuff, without regard to order. Elements in a set are only counted once. For example, if a = 2, b = c = 1, then A = {a, b, c} has only two members. We write x ∈ X if x is a member of the set X.
Example. Common sets and the symbols used to denote them:
N = {1, 2, 3, ···} is the natural numbers
N_0 = {0, 1, 2, ···} is the natural numbers with 0
Z = {··· , −2, −1, 0, 1, 2, ···} is the integers
Q = {a/b : a, b ∈ Z, b ≠ 0} is the rational numbers
R is the real numbers
C is the complex numbers
It is still debated whether 0 is a natural number. Those who believe that 0 is a natural number usually write N for {0, 1, 2, ···}, and N^+ for the positive natural numbers. However, most of the time, it doesn't matter, and when it does, you should specify it explicitly.
Definition (Equality of sets). A is equal to B, written as A = B, if
(∀x) x ∈ A ⇔ x ∈ B,
i.e. two sets are equal if they have the same elements.
Definition (Subsets). A is a subset of B, written as A ⊆ B or A ⊂ B, if all elements in A are in B, i.e.
(∀x) x ∈ A ⇒ x ∈ B.
Theorem. (A = B) ⇔ (A ⊆ B and B ⊆ A)
Suppose X is a set and P is a property of some elements of X. We can write a set {x ∈ X : P(x)} for the subset of X comprising the elements for which P(x) is true, e.g. {n ∈ N : n is prime} is the set of all primes.
Definition (Intersection, union, set difference, symmetric difference and power set). Given two sets A and B, we define the following:
Intersection: A ∩ B = {x : x ∈ A and x ∈ B}
Union: A ∪ B = {x : x ∈ A or x ∈ B}
Set difference: A \ B = {x ∈ A : x ∉ B}
Symmetric difference: A Δ B = {x : x ∈ A xor x ∈ B}, i.e. the elements in exactly one of the two sets
Power set: P(A) = {X : X ⊆ A}, i.e. the set of all subsets
New sets can only be created via the above operations on old sets (plus replacement, which says that you can replace an element of a set with another element). One cannot arbitrarily create sets such as X = {x : x is a set and x ∉ x}. Otherwise paradoxes will arise.
We have several rules regarding how these set operations behave, which
should be intuitively obvious.
Proposition.
(A ∩ B) ∩ C = A ∩ (B ∩ C)
(A ∪ B) ∪ C = A ∪ (B ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
Notation. If A_α are sets for all α ∈ I, then
⋂_{α∈I} A_α = {x : (∀α ∈ I) x ∈ A_α}
and
⋃_{α∈I} A_α = {x : (∃α ∈ I) x ∈ A_α}.
Definition (Ordered pair). An ordered pair (a, b) is a pair of two items in which order matters. Formally, it is defined as {{a}, {a, b}}. We have (a, b) = (a', b') iff a = a' and b = b'.
Definition (Cartesian product). Given two sets A, B, the Cartesian product of A and B is A × B = {(a, b) : a ∈ A, b ∈ B}. This can be extended to n products, e.g. R^3 = R × R × R = {(x, y, z) : x, y, z ∈ R} (which is officially {(x, (y, z)) : x, y, z ∈ R}).
2.2 Functions
Definition (Function/map). A function (or map) f : A → B is a "rule" that assigns, for each a ∈ A, precisely one element f(a) ∈ B. We can write a ↦ f(a). A and B are called the domain and co-domain respectively.
If we wish to be very formal, we can define a function to be a subset f ⊆ A × B such that for any a ∈ A, there exists a unique b ∈ B such that (a, b) ∈ f. We then think of (a, b) ∈ f as saying f(a) = b. However, while this might act as a formal definition of a function, it is a terrible way to think about functions.
Example. x^2 : R → R is a function that sends x to x^2. 1/x : R → R is not a function since f(0) is not defined. ±x : R → R is also not a function since it is multi-valued.
It is often helpful to categorize functions into different categories.
Definition (Injective function). A function f : X → Y is injective if it hits everything at most once, i.e.
(∀x, y ∈ X) f(x) = f(y) ⇒ x = y.
Definition (Surjective function). A function f : X → Y is surjective if it hits everything at least once, i.e.
(∀y ∈ Y)(∃x ∈ X) f(x) = y.
Example. f : R → R^+ ∪ {0} with x ↦ x^2 is surjective but not injective.
Definition (Bijective function). A function is bijective if it is both injective and surjective, i.e. it hits everything exactly once.
Definition (Permutation). A permutation of A is a bijection A → A.
Definition (Composition of functions). The composition of two functions is a function you get by applying one after another. In particular, if f : X → Y and g : Y → Z, then g ∘ f : X → Z is defined by (g ∘ f)(x) = g(f(x)). Note that function composition is associative.
Definition (Image of function). If f : A → B and U ⊆ A, then f(U) = {f(u) : u ∈ U}. f(A) is the image of A.
By definition, f is surjective iff f(A) = B.
Definition (Pre-image of function). If f : A → B and V ⊆ B, then f^{−1}(V) = {a ∈ A : f(a) ∈ V}.
This is the pre-image of V under the function f, and it acts on subsets of B. It is defined for any function f. It is important to note that we use the same symbol f^{−1} to denote the inverse function, which we will define later, but they are very distinct entities. For example, we will see that the inverse function exists only for bijective functions.
To define the inverse function, we will first need some preliminary definitions.
Definition (Identity map). The identity map id_A : A → A is defined as the map a ↦ a.
Definition (Left inverse of function). Given f : A → B, a left inverse of f is a function g : B → A such that g ∘ f = id_A.
Definition (Right inverse of function). Given f : A → B, a right inverse of f is a function g : B → A such that f ∘ g = id_B.
Theorem. The left inverse of f exists iff f is injective.
Proof. (⇒) If the left inverse g exists, then for all a, a' ∈ A, f(a) = f(a') ⇒ g(f(a)) = g(f(a')) ⇒ a = a'. Therefore f is injective.
(⇐) If f is injective, we can construct a g defined by
g(b) = a if b ∈ f(A), where f(a) = b;
g(b) = anything otherwise.
Then g is a left inverse of f.
Theorem. The right inverse of f exists iff f is surjective.
Proof. (⇒) We have f(g(B)) = B since f ∘ g is the identity function. Thus f must be surjective since its image is B.
(⇐) If f is surjective, we can construct a g such that for each b ∈ B, we pick one a ∈ A with f(a) = b, and put g(b) = a.
(Note that to prove the second part, for each b, we need to pick an a such that f(a) = b. If B is infinite, doing so involves making infinitely many arbitrary choices. Are we allowed to do so?
To make infinitely many choices, we need to use the Axiom of Choice, which explicitly says that this is allowed. In particular, it says that given a family of sets A_i for i ∈ I, there exists a choice function f : I → ⋃ A_i such that f(i) ∈ A_i for all i.
So can we prove the theorem without the Axiom of Choice? The answer is no, since if we assume surjective functions have right inverses, then we can prove the Axiom of Choice.
Assume any surjective function f has a right inverse. Given a family of non-empty sets A_i for i ∈ I (wlog assume they are disjoint), define a function f : ⋃ A_i → I that sends each element to the set that contains it. This is surjective since each set is non-empty. Then it has a right inverse, and the right inverse must send each set to an element in that set, i.e. it is a choice function for the A_i.)
Definition (Inverse of function). An inverse of f is a function that is both a left inverse and a right inverse. It is written as f^{−1} : B → A. It exists iff f is bijective, and is then necessarily unique.
2.3 Relations
Definition (Relation). A relation R on A specifies that some elements of A are related to some others. Formally, a relation is a subset R ⊆ A × A. We write aRb iff (a, b) ∈ R.
Example. The following are examples of relations on natural numbers:
(i) aRb iff a and b have the same final digit, e.g. (37)R(57).
(ii) aRb iff a divides b, e.g. 2R6 holds but 2R7 does not.
(iii) aRb iff a ≠ b.
(iv) aRb iff a = b = 1.
(v) aRb iff |a − b| ≤ 3.
(vi) aRb iff either a, b ≥ 5 or a, b ≤ 4.
Again, we wish to classify different relations.
Definition (Reflexive relation). A relation R is reflexive if
(∀a) aRa.
Definition (Symmetric relation). A relation R is symmetric iff
(∀a, b) aRb ⇔ bRa.
Definition (Transitive relation). A relation R is transitive iff
(∀a, b, c) aRb ∧ bRc ⇒ aRc.
Example. With regards to the examples above,

Examples     (i)  (ii)  (iii)  (iv)  (v)  (vi)
Reflexive     ✓    ✓     ✗     ✗    ✓    ✓
Symmetric     ✓    ✗     ✓     ✓    ✓    ✓
Transitive    ✓    ✓     ✗     ✓    ✗    ✓

Definition (Equivalence relation). A relation is an equivalence relation if it is reflexive, symmetric and transitive, e.g. (i) and (vi) in the above examples are equivalence relations.
If it is an equivalence relation, we usually write ∼ instead of R. As the name suggests, equivalence relations are used to describe relations that are similar to equality. For example, if we want to represent rational numbers as pairs of integers, we might have an equivalence relation defined by (n, m) ∼ (p, q) iff nq = mp, such that two pairs are equivalent if they represent the same rational number.
Example. If we consider a deck of cards, define two cards to be related if they have the same suit.
As mentioned, we like to think of things related by ∼ as equal. Hence we want to identify all "equal" things together and form one new object.
Definition (Equivalence class). If ∼ is an equivalence relation, then the equivalence class [x] is the set of all elements that are related via ∼ to x.
Example. In the cards example, [8♥] is the set of all hearts.
Definition (Partition of set). A partition of a set X is a collection of subsets A_α of X such that each element of X is in exactly one of the A_α.
Theorem. If ∼ is an equivalence relation on A, then the equivalence classes of ∼ form a partition of A.
Proof. By reflexivity, we have a ∈ [a]. Thus the equivalence classes cover the whole set. We must now show that for all a, b ∈ A, either [a] = [b] or [a] ∩ [b] = ∅.
Suppose [a] ∩ [b] ≠ ∅. Then there is some c ∈ [a] ∩ [b]. So a ∼ c and b ∼ c. By symmetry, c ∼ b. By transitivity, we have a ∼ b. For all b' ∈ [b], we have b ∼ b'. Thus by transitivity, we have a ∼ b'. Thus [b] ⊆ [a]. By symmetry, [a] ⊆ [b], and so [a] = [b].
On the other hand, each partition defines an equivalence relation in which
two elements are related iff they are in the same partition. Thus partitions and
equivalence relations are “the same thing”.
Definition (Quotient map). The quotient map q maps each element in A to the equivalence class containing it, i.e. a ↦ [a], e.g. q(8♥) is the set of all hearts.
3 Division
If you think you already know how to divide, well perhaps you are right. However,
in this chapter, we will define these notions formally and properly prove things
we already know (plus maybe some new things).
3.1 Euclid’s Algorithm
Definition (Factor of integers). Given a, b ∈ Z, we say a divides b, a is a factor of b, or a | b, if (∃c ∈ Z) b = ac. For any b, ±1 and ±b are always factors of b. The other factors are called proper factors.
Theorem (Division Algorithm). Given a, b ∈ Z with b ≠ 0, there are unique q, r ∈ Z with a = qb + r and 0 ≤ r < b.
Despite the name, the division algorithm is not an algorithm in the usual
sense. Instead, it merely states that you can divide. Even the proof does not
specify a (non-brute force) way of how to divide.
Proof. Choose q = max{q : qb ≤ a}. This maximum exists because the set of all q such that qb ≤ a is non-empty and bounded above. Now write r = a − qb. We have 0 ≤ r < b, and thus q and r are found.
To show that they are unique, suppose that a = qb + r = q'b + r'. Then (q − q')b = r' − r. Since both r and r' are between 0 and b, we have −b < r' − r < b. However, r' − r is a multiple of b. Thus q − q' = r' − r = 0. Consequently, q = q' and r = r'.
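As an aside (my own illustration, not from the lectures), Python's built-in divmod returns exactly the q and r guaranteed by the division algorithm when b > 0:

```python
# divmod(a, b) gives the q, r of the division algorithm (for positive b).
a, b = 57, 42                 # the pair used in the worked example later on
q, r = divmod(a, b)
assert a == q * b + r and 0 <= r < b
print(q, r)                   # prints: 1 15
```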
Definition (Common factor of integers). A common factor of a and b is a number c ∈ Z such that c | a and c | b.
Definition (Highest common factor/greatest common divisor). The highest common factor or greatest common divisor of two numbers a, b ∈ N is a number d ∈ N such that d is a common factor of a and b, and if c is also a common factor, then c | d.
Clearly if the hcf exists, it must be the largest common factor, since all other common factors divide it, and it is thus necessarily unique.
You might reasonably think it more natural to define hcf(a, b) to be the largest common factor, and then show that it has the property that all common factors divide it. But the above definition is superior because it does not require a prior ordering of the natural numbers (and it can be extended to any ring, even one that is not ordered, as we will see in IB Groups, Rings and Modules).
Notation. We write d = hcf(a, b) = gcd(a, b) = (a, b).
Here (a, b) stands for a number, and has nothing to do with an ordered pair.
Proposition. If c | a and c | b, then c | (ua + vb) for all u, v ∈ Z.
Proof. By definition, we have a = kc and b = lc. Then ua + vb = ukc + vlc = (uk + vl)c. So c | (ua + vb).
Theorem. Let a, b ∈ N. Then (a, b) exists.
Proof. Let S = {ua + vb : u, v ∈ Z} be the set of all linear combinations of a, b. Let d be the smallest positive member of S, say d = xa + yb. Hence if c | a and c | b, then c | d. So we need to show that d | a and d | b, and thus d = (a, b).
By the division algorithm, there exist numbers q, r ∈ Z with a = qd + r and 0 ≤ r < d. Then r = a − qd = a(1 − qx) − qyb. Therefore r is a linear combination of a and b. Since d is the smallest positive member of S and 0 ≤ r < d, we have r = 0 and thus d | a. Similarly, we can show that d | b.
Corollary (from the proof). Let d = (a, b). Then d is the smallest positive linear combination of a and b.
Corollary (Bézout's identity). Let a, b ∈ N and c ∈ Z. Then there exist u, v ∈ Z with c = ua + vb iff (a, b) | c.
Proof. (⇒) Let d = (a, b). If c is a linear combination of a and b, then d | c because d | a and d | b.
(⇐) Suppose that d | c. Let d = xa + yb and c = kd. Then c = (kx)a + (ky)b. Thus c is a linear combination of a and b.
Note that the proof that (a, b) exists is existential, not constructive. How can we actually find d, and how can we find x, y such that d = xa + yb?
While it might be easy to simply inspect d for small numbers, how would you find common factors of, say, 4931 and 3795? We cannot use primes because (a) prime factorization is hard; and (b) primes are not yet defined.
You might spot that if c | 4931 and c | 3795, then c | (4931 − 3795) = 1136. The process is also reversible: if c | 1136 and c | 3795, then c | (1136 + 3795) = 4931. Thus the problem is equivalent to finding common factors of 3795 and 1136. The process can be repeated until we have small numbers.
Proposition (Euclid's Algorithm). If we continuously break down a and b by the following procedure:
a = q_1 b + r_1
b = q_2 r_1 + r_2
r_1 = q_3 r_2 + r_3
···
r_{n−2} = q_n r_{n−1}
then the highest common factor is r_{n−1}.
Proof. We have (common factors of a, b) = (common factors of b, r_1) = (common factors of r_1, r_2) = ··· = (factors of r_{n−1}).
This gives an alternative proof that hcfs exist.
How efficient is this algorithm? For every step, we have a ≥ b + r_1 > 2r_1. Thus every two steps, the number on the left goes down by at least half. Hence the number of digits goes down at least once every 8 steps, so the time needed is at most about 8 × (number of digits), and the algorithm has time complexity O(log b).
Example. Suppose a = 57 and b = 42.
common factors of 57 and 42      (57 = 1 × 42 + 15)
= common factors of 42 and 15    (42 = 2 × 15 + 12)
= common factors of 15 and 12    (15 = 1 × 12 + 3)
= common factors of 12 and 3     (12 = 4 × 3 + 0)
= common factors of 3 and 0
= factors of 3.
So the hcf is 3.
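Here is a short Python sketch (my own, not from the lectures) of the procedure we just carried out by hand; hcf is simply an illustrative helper name.

```python
def hcf(a, b):
    """Highest common factor by repeatedly replacing (a, b) with (b, r)."""
    while b != 0:
        a, b = b, a % b       # a = qb + r: keep b and the remainder r
    return a

assert hcf(57, 42) == 3       # agrees with the worked example above
print(hcf(4931, 3795))        # the pair mentioned earlier; prints 1
```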
By reversing Euclid's Algorithm, we can express the hcf of two numbers as a linear combination of a and b.
Example. Consider 57 and 21.
57 = 2 × 21 + 15
21 = 1 × 15 + 6
15 = 2 × 6 + 3
6 = 2 × 3
In the opposite direction, we have
3 = 15 − 2 × 6
  = 15 − 2 × (21 − 15)
  = 3 × 15 − 2 × 21
  = 3 × (57 − 2 × 21) − 2 × 21
  = 3 × 57 − 8 × 21
This gives an alternative, constructive proof of Bézout's identity. Moreover, it gives us a quick way of expressing (a, b) = ax + by. However, this algorithm requires storing the whole run of Euclid's Algorithm, and so is not efficient space-wise.
To achieve higher space efficiency, we attempt to find a recurrence relation for coefficients A_j, B_j such that a × B_j − b × A_j = (−1)^j r_j. The possible factor of −1 is there just so that the recurrence relation will look nicer. Suppose that this is satisfied for all indices less than j. Then we have
(−1)^j r_j = (−1)^j (r_{j−2} − q_j r_{j−1})
           = (−1)^{j−2} r_{j−2} + q_j (−1)^{j−1} r_{j−1}
           = a(B_{j−2} + q_j B_{j−1}) − b(A_{j−2} + q_j A_{j−1}).
Hence we can obtain the following recurrence relation:
A_j = q_j A_{j−1} + A_{j−2}
B_j = q_j B_{j−1} + B_{j−2}
with
a × B_j − b × A_j = (−1)^j r_j.
In particular, a × B_{n−1} − b × A_{n−1} = (−1)^{n−1} r_{n−1} = ±(a, b).
Also, by an easy induction, A_j B_{j−1} − B_j A_{j−1} = (−1)^j. So (A_j, B_j) = 1.
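Here is a small Python sketch (mine, not from the notes) of the same idea carried out iteratively; it keeps only the last two coefficients, in the spirit of the A_j, B_j recurrence, though not in exactly the same notation. The name bezout is my own.

```python
def bezout(a, b):
    """Return (d, x, y) with d = hcf(a, b) and d = x*a + y*b."""
    x0, y0, x1, y1 = 1, 0, 0, 1   # invariant: a = x0*a0 + y0*b0, b = x1*a0 + y1*b0
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r
        x0, x1 = x1, x0 - q * x1  # update the two stored coefficient pairs
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0

d, x, y = bezout(57, 21)
assert (d, x, y) == (3, 3, -8)    # 3 = 3 * 57 - 8 * 21, as found above
```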
These coefficients also play another role. We can put the equations of Euclid's Algorithm in the following form:
57/21 = 2 + 15/21
21/15 = 1 + 6/15
15/6 = 2 + 3/6
6/3 = 2
Then we can write out the fraction 57/21 in continued fraction form:
57/21 = 2 + 1/(1 + 1/(2 + 1/2)).
Expanding this continued fraction term by term, we get the sequence 2, 2 + 1/1 = 3, 2 + 1/(1 + 1/2) = 8/3, ···. These are called the "convergents". The sequence happens to be A_i/B_i.
3.2 Primes
There are a lot of ways we can define prime numbers in
N
. The definition we
will use is the following:
Definition (Prime number). p ∈ N is a prime if p > 1 and the only factors of p (in Z) are ±1 and ±p.
In this chapter, the objective is to prove things we already know and think
are obvious.
Theorem. Every number can be written as a product of primes.
Proof. If n ∈ N is not a prime itself, then by definition n = ab. If either a or b is not prime, then that number can be written as a product, say b = cd. Then n = acd, and so on. Since these numbers are getting smaller, the process will stop when they are all prime.
In the proof, we handwaved a bit when we said “and so on”. We will later
come up with the principle of (strong) induction that rigorously justifies this.
This is the case for many proofs we will have here.
Theorem. There are infinitely many primes.
Proof. (Euclid's proof) Suppose there are finitely many primes, say p_1, p_2, ··· , p_n. Then N = p_1 p_2 ··· p_n + 1 is divisible by none of these primes. Otherwise, p_j | (N − p_1 p_2 ··· p_n), i.e. p_j | 1, which is impossible. However, N is a product of primes, so there must be primes not amongst p_1, p_2, ··· , p_n.
Proof. (Erdős 1930) Suppose that there are finitely many primes, p_1, p_2, ··· , p_k. Consider all numbers that are products of these primes, i.e. p_1^{j_1} p_2^{j_2} ··· p_k^{j_k}, where j_i ≥ 0. Factor out all squares to write each such number in the form m^2 p_1^{i_1} p_2^{i_2} ··· p_k^{i_k}, where m ∈ N and each i_j = 0 or 1.
Let N ∈ N. Given any number x ≤ N, when put in the above form, we have m ≤ √N. So there are at most √N possible values of m. For each m, there are 2^k numbers of the form m^2 p_1^{i_1} p_2^{i_2} ··· p_k^{i_k}. So there are only √N × 2^k possible values of x of this kind.
Now pick N > 4^k. Then N > √N × 2^k. So there must be a number ≤ N not of this form, i.e. it has a prime factor not in the list.
Historically, many people have come up with "new" proofs that there are infinitely many primes. However, most of these proofs were just Euclid's proof in disguise. Erdős' proof is genuinely a new proof. For example, Euclid's proof comes up with a particular number N, and says all its factors are not in the list of primes. On the other hand, Erdős' proof says that there is some number, which we don't know, with at least one prime factor not in the list.
Also, the proofs give different bounds on when we should expect to see the k-th prime. For example, Euclid tells us that the k-th prime must be less than 2^{2^k}, while Erdős tells us it is less than 4^k.
Theorem. If a | bc and (a, b) = 1, then a | c.
Proof. From Euclid's algorithm, there exist integers u, v ∈ Z such that ua + vb = 1. Multiplying by c, we have uac + vbc = c. Since a | bc, a divides the left-hand side. So a | c.
Definition (Coprime numbers). We say a, b are coprime if (a, b) = 1.
Corollary. If p is a prime and p | ab, then p | a or p | b. (True for all p, a, b)
Proof. We know that (p, a) = p or 1 because p is a prime. If (p, a) = p, then p | a. Otherwise, (p, a) = 1 and p | b by the theorem above.
Corollary. If p is a prime and p | n_1 n_2 ··· n_k, then p | n_i for some i.
Note that when we defined primes, we defined them in terms of the factors of p. This corollary is the opposite: it is about how p behaves as a factor of other numbers.
Theorem (Fundamental Theorem of Arithmetic). Every natural number is expressible as a product of primes in exactly one way. In particular, if p_1 p_2 ··· p_k = q_1 q_2 ··· q_l, where the p_i, q_i are primes but not necessarily distinct, then k = l and q_1, ··· , q_l are p_1, ··· , p_k in some order.
Proof. Since we already showed that there is at least one way above, we only need to show uniqueness.
Let p_1 ··· p_k = q_1 ··· q_l. We know that p_1 | q_1 ··· q_l = q_1(q_2 q_3 ··· q_l). Thus p_1 | q_i for some i; wlog assume i = 1. Then p_1 = q_1 since both are primes. Thus p_2 p_3 ··· p_k = q_2 q_3 ··· q_l. Likewise, we have p_2 = q_2, ··· and so on.
Corollary. Let a = p_1^{i_1} p_2^{i_2} ··· p_r^{i_r} and b = p_1^{j_1} p_2^{j_2} ··· p_r^{j_r}, where the p_k are distinct primes (exponents can be zero). Then (a, b) = ∏_k p_k^{min{i_k, j_k}}. Likewise, lcm(a, b) = ∏_k p_k^{max{i_k, j_k}}. We have hcf(a, b) × lcm(a, b) = ab.
However, this is not an efficient way to calculate (a, b), since prime factorization is very hard.
Note that this is a property peculiar to the natural numbers. There are "arithmetical systems" (permitting addition, multiplication and subtraction) where factorization is not unique, e.g. the even numbers.
Example. The following systems do not have unique prime factorization:
(i) The even numbers. Here the "primes" are twice an odd number. So 6 is a prime (NOT divisible by 2 within this system!) while 8 is not. We have 60 = 2 × 30 = 6 × 10, where 2, 6, 10, 30 are primes. However, this example is not "proper" since there is no identity element (i.e. it is not a ring).
(ii) Consider Z[√−5] = {a + b√−5 : a, b ∈ Z}. We have 6 = 2 × 3 = (1 − √−5)(1 + √−5). It can be shown that these factors are primes (see IB Groups, Rings and Modules).
Exercise: where does the proof of the Fundamental Theorem of Arithmetic fail in these examples?
4 Counting and integers
This chapter exists because experience shows that mathematicians do not know
how to count.
4.1 Basic counting
A useful theorem is the pigeonhole principle.
Theorem (Pigeonhole Principle). If we put mn + 1 pigeons into n pigeonholes, then some pigeonhole has at least m + 1 pigeons.
Example.
In Cambridge, there are 2 people who have the same number of
hairs.
Another useful tool for counting is the indicator function.
Definition (Indicator function/characteristic function). Let X be a set. For each A ⊆ X, the indicator function or characteristic function of A is the function i_A : X → {0, 1} with i_A(x) = 1 if x ∈ A, and 0 otherwise. It is sometimes written as χ_A.
Proposition.
(i) i_A = i_B ⇔ A = B
(ii) i_{A∩B} = i_A i_B
(iii) i_Ā = 1 − i_A
(iv) i_{A∪B} = 1 − i_{\overline{A∪B}} = 1 − i_{Ā∩B̄} = 1 − i_Ā i_B̄ = 1 − (1 − i_A)(1 − i_B) = i_A + i_B − i_{A∩B}
(v) i_{A\B} = i_{A∩B̄} = i_A i_B̄ = i_A(1 − i_B) = i_A − i_{A∩B}
Example. We can use indicator functions to prove certain properties about sets:
(i) Proof that A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C):
i_{A∩(B∪C)} = i_A i_{B∪C} = i_A (i_B + i_C − i_B i_C) = i_A i_B + i_A i_C − i_A i_B i_C
i_{(A∩B)∪(A∩C)} = i_{A∩B} + i_{A∩C} − i_{A∩B} i_{A∩C} = i_A i_B + i_A i_C − i_A i_B i_A i_C = i_A i_B + i_A i_C − i_A i_B i_C
Therefore i_{A∩(B∪C)} = i_{(A∩B)∪(A∩C)}, and thus A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
Note that i_A = i_A^2, since i_A = 0 or 1, and 0^2 = 0 and 1^2 = 1.
(ii) Proof that the symmetric difference is associative: observe that i_{AΔB} ≡ i_A + i_B (mod 2). Thus i_{(AΔB)ΔC} = i_{AΔ(BΔC)} ≡ i_A + i_B + i_C (mod 2).
Indicator functions are handy for computing the sizes of finite sets, because if A ⊆ X, then |A| = Σ_{x∈X} i_A(x).
Proposition. |A ∪ B| = |A| + |B| − |A ∩ B|.
Proof.
|A ∪ B| = Σ_{x∈X} i_{A∪B}(x)
        = Σ (i_A(x) + i_B(x) − i_{A∩B}(x))
        = Σ i_A(x) + Σ i_B(x) − Σ i_{A∩B}(x)
        = |A| + |B| − |A ∩ B|
More importantly, we will use indicator functions to prove a powerful result.
Theorem (Inclusion-Exclusion Principle). Let A_i be subsets of a finite set X, for 1 ≤ i ≤ n. Then
|Ā_1 ∩ ··· ∩ Ā_n| = |X| − Σ_i |A_i| + Σ_{i<j} |A_i ∩ A_j| − ··· + (−1)^n |A_1 ∩ ··· ∩ A_n|.
Equivalently,
|A_1 ∪ ··· ∪ A_n| = Σ_i |A_i| − Σ_{i<j} |A_i ∩ A_j| + ··· + (−1)^{n−1} |A_1 ∩ ··· ∩ A_n|.
The two forms are equivalent since |A_1 ∪ ··· ∪ A_n| = |X| − |Ā_1 ∩ ··· ∩ Ā_n|.
Proof. Using indicator functions,
i_{Ā_1∩Ā_2∩···∩Ā_n} = ∏_j i_{Ā_j} = ∏_j (1 − i_{A_j})
= 1 − Σ_i i_{A_i} + Σ_{i<j} i_{A_i} i_{A_j} − ··· + (−1)^n i_{A_1} i_{A_2} ··· i_{A_n}
= 1 − Σ_i i_{A_i} + Σ_{i<j} i_{A_i∩A_j} − ··· + (−1)^n i_{A_1∩A_2∩···∩A_n}.
Thus
|Ā_1 ∩ ··· ∩ Ā_n| = Σ_{x∈X} i_{Ā_1∩Ā_2∩···∩Ā_n}(x)
= Σ_x 1 − Σ_i Σ_x i_{A_i}(x) + Σ_{i<j} Σ_x i_{A_i∩A_j}(x) − ··· + Σ_x (−1)^n i_{A_1∩A_2∩···∩A_n}(x)
= |X| − Σ_i |A_i| + Σ_{i<j} |A_i ∩ A_j| − Σ_{i<j<k} |A_i ∩ A_j ∩ A_k| + ··· + (−1)^n |A_1 ∩ A_2 ∩ ··· ∩ A_n|
Example. How many numbers ≤ 200 are coprime to 110?
Let X = {1, ··· , 200}, and let A_1 = {x : 2 | x}, A_2 = {x : 5 | x}, A_3 = {x : 11 | x}. We know that
|A_1| = ⌊200/2⌋ = 100
|A_2| = ⌊200/5⌋ = 40
|A_3| = ⌊200/11⌋ = 18
|A_1 ∩ A_2| = ⌊200/10⌋ = 20
|A_1 ∩ A_3| = ⌊200/22⌋ = 9
|A_2 ∩ A_3| = ⌊200/55⌋ = 3
|A_1 ∩ A_2 ∩ A_3| = ⌊200/110⌋ = 1
Then the answer is 200 − 100 − 40 − 18 + 20 + 9 + 3 − 1 = 73.
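For what it's worth, the count can be confirmed by brute force (my own check, not part of the notes):

```python
from math import gcd

# Brute-force confirmation of the inclusion-exclusion count above.
count = sum(1 for x in range(1, 201) if gcd(x, 110) == 1)
assert count == 73
```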
4.2 Combinations
“Combinations” is about counting the ways we can pick things without regard
to order. We can formulate these problems in the language of sets: given a set
X
, how many subsets of
X
are there that satisfy some particular properties?
For example, if we want to pick 3 people from 10, we can let
X
be the set of the
10 people. Then the number of ways of picking the 3 people is the number of
subsets of size three.
Example. How many subsets of {1, 2, ··· , n} are there? There are 2 × 2 × ··· × 2 = 2^n, since every element is either in or out of a given subset, and there are two choices for each element. Equivalently, there are 2^n possible indicator functions, i.e. functions {1, 2, 3, ··· , n} → {0, 1}.
Definition (Combination \binom{n}{r}). The number of subsets of {1, 2, 3, ··· , n} of size r is denoted by \binom{n}{r}. The symbol is pronounced "n choose r".
This is the definition of \binom{n}{r}. It does not in any way specify how we can actually calculate the value of \binom{n}{r}.
Proposition. By definition,
\binom{n}{0} + \binom{n}{1} + ··· + \binom{n}{n} = 2^n.
Theorem (Binomial theorem). For n ∈ N and a, b ∈ R, we have
(a + b)^n = \binom{n}{0} a^n b^0 + \binom{n}{1} a^{n−1} b^1 + ··· + \binom{n}{r} a^{n−r} b^r + ··· + \binom{n}{n} a^0 b^n.
Proof. We have (a + b)^n = (a + b)(a + b) ··· (a + b). When we expand the product, we get all the terms attained by choosing b from some brackets and a from the rest. The term a^{n−r} b^r comes from choosing b from r brackets and a from the rest, and there are \binom{n}{r} ways to make such a choice.
This theorem is not immediately useful since we do not know the value of \binom{n}{r}!
Because of this theorem, \binom{n}{r} is sometimes called a "binomial coefficient".
Proposition.
(i) \binom{n}{r} = \binom{n}{n−r}. This is because choosing r things to keep is the same as choosing n − r things to throw away.
(ii) \binom{n}{r−1} + \binom{n}{r} = \binom{n+1}{r} (Pascal's identity). The RHS counts the number of ways to choose a team of r players from n + 1 available players, one of whom is Pietersen. If Pietersen is chosen, there are \binom{n}{r−1} ways to choose the remaining players. Otherwise, there are \binom{n}{r} ways. The total number of ways is thus \binom{n}{r−1} + \binom{n}{r}.
Now given that \binom{n}{0} = \binom{n}{n} = 1, since there is only one way to choose nothing or everything, we can construct Pascal's triangle:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
where each number is the sum of the two numbers above it, and the r-th item of the n-th row is \binom{n}{r} (the first row is row 0).
(iii) \binom{n}{k} \binom{k}{r} = \binom{n}{r} \binom{n−r}{k−r}. We are counting the number of pairs of sets (Y, Z) with |Y| = k, |Z| = r and Z ⊆ Y. On the LHS, we first choose Y, then choose Z ⊆ Y. The RHS chooses Z first and then chooses the remaining Y \ Z from {1, 2, ··· , n} \ Z.
(iv) \binom{a}{r} \binom{b}{0} + \binom{a}{r−1} \binom{b}{1} + ··· + \binom{a}{r−k} \binom{b}{k} + ··· + \binom{a}{0} \binom{b}{r} = \binom{a+b}{r} (Vandermonde's convolution). Suppose we have a men and b women, and we need to choose a committee of r people. The right-hand side is the total number of choices. The left-hand side breaks the choices up according to the number of men vs women.
Example. A greengrocer stocks n kinds of fruit. In how many ways can we choose a bag of r fruits? If we are only allowed to choose one of each kind, then the answer is \binom{n}{r}. But we might have r = 4, and we want to allow picking 2 apples, 1 plum and 1 quince. The total number of ways to choose is \binom{n+r−1}{r}. Why?
Each choice can be represented by a binary string of length n + r − 1, with r 0s and n − 1 1s. The string can be constructed as follows (by example): when n = 5 and r = 8, a possible binary string is 000100110010. The blocks of zeros correspond to the number of each fruit chosen, and the 1s separate the choices. In the string above, we have 3 of type 1, 2 of type 2, 0 of type 3, 2 of type 4 and 1 of type 5.
Then clearly the number of possible strings is \binom{n+r−1}{r}.
Proposition.
\binom{n}{r} = n! / ((n − r)! r!).
Proof. There are n(n − 1)(n − 2) ··· (n − r + 1) = n!/(n − r)! ways to choose r elements in order. Each choice of subset is chosen this way in r! orders, so the number of subsets is n!/((n − r)! r!).
We might write x^{\underline{r}} for the polynomial x(x − 1) ··· (x − r + 1). We call this "x to the r falling". We can write \binom{n}{r} = n^{\underline{r}}/r!. Multiplying Vandermonde's convolution by r!, we obtain the "falling binomial theorem"
\binom{r}{0} a^{\underline{r}} b^{\underline{0}} + \binom{r}{1} a^{\underline{r−1}} b^{\underline{1}} + ··· + \binom{r}{r} a^{\underline{0}} b^{\underline{r}} = (a + b)^{\underline{r}}.
Example. A bank prepares a letter for each of its n customers, saying how much it cares. (Each of these letters costs the customer £40.) There are n! ways to put the letters into the envelopes. In how many ways can this be done so that no one gets the right letter (i.e. how many derangements are there of n elements)?
We let X be the set of all envelopings (permutations of n). |X| = n!. For each i, let A_i = {x ∈ X : x assigns the correct letter to customer i}. We want to know |⋂_i Ā_i|. We know that |A_i| = (n − 1)!, since i's letter goes into i's envelope and all others can be placed randomly. We have |A_i ∩ A_j| = (n − 2)! as well, and similarly |A_i ∩ A_j ∩ A_k| = (n − 3)!.
By the inclusion-exclusion formula, we have
|⋂_i Ā_i| = |X| − Σ |A_i| + Σ |A_i ∩ A_j| − ···
= n! − \binom{n}{1} (n − 1)! + \binom{n}{2} (n − 2)! − ···
= n! (1 − 1/1! + 1/2! − ··· + (−1)^n/n!)
≈ n! e^{−1}
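As a quick numerical check (mine, not from the lectures), the inclusion-exclusion formula for derangements agrees with brute-force enumeration for small n; derangements is just an illustrative helper name.

```python
from itertools import permutations
from math import factorial

def derangements(n):
    """n! * (1 - 1/1! + 1/2! - ... + (-1)^n/n!), computed exactly."""
    return sum((-1) ** k * factorial(n) // factorial(k) for k in range(n + 1))

for n in range(1, 7):
    brute = sum(1 for p in permutations(range(n)) if all(p[i] != i for i in range(n)))
    assert brute == derangements(n)
```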
4.3 Well-ordering and induction
Several proofs so far involved “take the least integer such that”, e.g. division
algorithm; or involved a sequence of moves “and so on. . . e.g. Euclid’s algorithm,
every number is a product of primes. We rely on the following:
Theorem (Weak Principle of Induction). Let P(n) be a statement about the natural number n. Suppose that
(i) P(1) is true, and
(ii) (∀n) P(n) ⇒ P(n + 1).
Then P(n) is true for all n ≥ 1.
Example (Tower of Hanoi). [Figure: three pegs A, B and C, with rings 1, 2, 3, ··· , n stacked on peg A, smallest on top.]
The objective is to move the n rings on peg A to peg B, with the constraints that you can only move one ring at a time, and you can never place a larger ring onto a smaller ring.
We claim that this needs exactly 2^n − 1 moves.
Let P(n) be "n rings need exactly 2^n − 1 moves". Note that this statement contains two assertions: (1) we can do it in 2^n − 1 moves; (2) we can't do it in fewer.
First consider P(1). We simply have to move the ring from A to B, which takes 1 = 2^1 − 1 move.
Suppose we have n + 1 rings. We can move the top n rings to peg C, then move the bottom ring to peg B, then move the n rings from C to B. Assuming P(n) is true, this needs at most 2 × (2^n − 1) + 1 = 2^{n+1} − 1 moves.
Can we do it in fewer moves? To succeed, we must free the bottom ring, so we must shift the top n rings to another peg. This needs ≥ 2^n − 1 moves by P(n). Then we need to shift the bottom ring. Then we need to shift the n smaller rings onto the big one. This needs ≥ 2^n − 1 moves by P(n). So this needs ≥ 2^{n+1} − 1 moves altogether.
So we have shown that P(n) ⇒ P(n + 1) (we used P(n) four times). By the WPI, P(n) is true for all n.
Example. "All numbers are equal." Let P(n) be "if {a_1, ··· , a_n} is a set of n numbers, then a_1 = a_2 = ··· = a_n". P(1) is trivially true. Suppose we have {a_1, a_2, ··· , a_{n+1}}. Assuming P(n), apply it to {a_1, a_2, ··· , a_n} and {a_2, ··· , a_{n+1}}; then a_1 = ··· = a_n and a_2 = a_3 = ··· = a_{n+1}. So a_1 = a_2 = ··· = a_{n+1}. Hence P(n) ⇒ P(n + 1). So P(n) is true for all n ∈ N. (The flaw is that the step P(1) ⇒ P(2) fails: the two sets {a_1} and {a_2} do not overlap.)
Theorem (Inclusion-exclusion principle).
Proof. Let P(n) be the statement "for any sets A_1, ··· , A_n, we have |A_1 ∪ ··· ∪ A_n| = Σ_i |A_i| − Σ_{i<j} |A_i ∩ A_j| + ··· ± |A_1 ∩ A_2 ∩ ··· ∩ A_n|".
P(1) is trivially true. P(2) is also true (see above). Now given A_1, ··· , A_{n+1}, let B_i = A_i ∩ A_{n+1} for 1 ≤ i ≤ n. We apply P(n) both to the A_i and to the B_i.
Now observe that B_i ∩ B_j = A_i ∩ A_j ∩ A_{n+1}. Likewise, B_i ∩ B_j ∩ B_k = A_i ∩ A_j ∩ A_k ∩ A_{n+1}. Now
|A_1 ∪ A_2 ∪ ··· ∪ A_{n+1}| = |A_1 ∪ ··· ∪ A_n| + |A_{n+1}| − |(A_1 ∪ ··· ∪ A_n) ∩ A_{n+1}|
= |A_1 ∪ ··· ∪ A_n| + |A_{n+1}| − |B_1 ∪ ··· ∪ B_n|
= Σ_{i≤n} |A_i| − Σ_{i<j≤n} |A_i ∩ A_j| + ··· + |A_{n+1}| − Σ_{i≤n} |B_i| + Σ_{i<j≤n} |B_i ∩ B_j| − ···
Note that Σ_{i≤n} |B_i| = Σ_{i≤n} |A_i ∩ A_{n+1}|. So Σ_{i<j≤n} |A_i ∩ A_j| + Σ_{i≤n} |B_i| = Σ_{i<j≤n+1} |A_i ∩ A_j|, and similarly for the other terms. So the above
= Σ_{i≤n+1} |A_i| − Σ_{i<j≤n+1} |A_i ∩ A_j| + ···
So P(n) ⇒ P(n + 1) for n ≥ 2. By the WPI, P(n) is true for all n.
However, WPI is not quite what we want for “every number is a product of
primes”. We need a different form of induction.
Theorem (Strong principle of induction). Let P(n) be a statement about n ∈ N. Suppose that
(i) P(1) is true, and
(ii) for all n ∈ N, if P(k) is true for all k < n, then P(n) is true.
Then P(n) is true for all n ∈ N.
Note that (i) is redundant as it follows from (ii), but we state it for clarity.
Example ("Evolutionary trees"). Imagine that we have a mutant that can produce two offspring. Each offspring is either an animal or another mutant. A possible evolutionary tree is as follows: [Tree: the top mutant produces two mutants; the first of these produces a pig and a mutant (which in turn produces a slug and a man), and the second produces a gnu and an ibex.]
Let P(n) be the statement "n − 1 mutants produce n animals". Given some tree with n animals, remove the top mutant to get two sub-trees, with n_1 and n_2 animals respectively, where n_1 + n_2 = n. If P(k) is true for all k < n, then P(n_1) and P(n_2) are true. So the total number of mutants is 1 + (n_1 − 1) + (n_2 − 1) = n − 1. So P(n) is true. Hence by the strong principle of induction, P(n) is true for all n.
Theorem.
The strong principle of induction is equivalent to the weak principle
of induction.
Proof. Clearly the strong principle implies the weak principle, since if P(n) ⇒ P(n + 1), then (P(1) ∧ P(2) ∧ ··· ∧ P(n)) ⇒ P(n + 1).
Now we show that the weak principle implies the strong principle. Suppose that P(1) is true and (∀n) P(1) ∧ P(2) ∧ ··· ∧ P(n − 1) ⇒ P(n). We want to show that P(n) is true for all n using the weak principle.
Let Q(n) = "P(k) is true for all k ≤ n". Then Q(1) is true. Suppose that Q(n) is true. Then P(1) ∧ P(2) ∧ ··· ∧ P(n) is true. So P(n + 1) is true. Hence Q(n + 1) is true. By the weak principle, Q(n) is true for all n. So P(n) is true for all n.
While strong and weak induction are practically equivalent, they are rather
distinct conceptually. Weak induction is expressed in terms of “adding 1”, while
strong induction is based on the ordering of natural numbers.
It turns out that there is another statement about the natural numbers that
can be stated in terms of orders. We first formally define what it means to be
an order.
Definition (Partial order). A partial order on a set is a reflexive, antisymmetric ((aRb) ∧ (bRa) ⇒ a = b) and transitive relation.
Example. The ordinary ordering of N, a ≤ b, is a partial order of N. Also, a | b on N is also a partial order.
Definition (Total order). A total order is a partial order where for all a ≠ b, exactly one of aRb or bRa holds. This means that every two things must be related.
Definition (Well-ordered total order). A total order is well-ordered if every non-empty subset has a minimal element, i.e. if S ≠ ∅, then there exists m ∈ S such that x < m ⇒ x ∉ S.
Example. Z with the usual order is not well-ordered, since the set of even integers has no minimum. The positive rationals are also not well-ordered under the usual order.
Theorem (Well-ordering principle). N is well-ordered under the usual order, i.e. every non-empty subset of N has a minimal element.
Theorem. The well-ordering principle is equivalent to the strong principle of induction.
Proof. First we prove that well-ordering implies strong induction. Consider a proposition P(n), and suppose that P(k) being true for all k < n implies P(n).
Assume the contrary, i.e. P(n) is not true for all n. Consider the set S = {n ∈ N : ¬P(n)}. Then S has a minimal element m. Since m is the minimal counterexample to P, P(k) is true for all k < m. However, this implies that P(m) is true, which is a contradiction. Therefore P(n) must be true for all n.
To show that strong induction implies well-ordering, let S ⊆ N. Suppose that S has no minimal element. We need to show that S is empty. Let P(n) be the statement n ∉ S.
Certainly 1 ∉ S, or else it would be the minimal element. So P(1) is true. Suppose we know that P(k) is true for all k < n, i.e. k ∉ S for all k < n. Now n ∉ S, or else n would be the minimal element. So P(n) is true. By strong induction, P(n) is true for all n, i.e. S is empty.
The well-ordering principle enables us to show that P(n) is true as follows: if P(n) fails for some n, then there is a minimal counterexample m. Then we try to show that this leads to a contradiction.
Example. Proof that every number is a product of primes by strong induction: Assume the contrary. Then there exists a minimal n that cannot be written as a product of primes (by the well-ordering principle). If n is a prime, then n is a product of primes. Otherwise, write n = ab, where 1 < a, b < n. By minimality of n, both a and b are products of primes. Hence so is n. Contradiction.
Example.
All numbers are interesting. Suppose that there are uninteresting
numbers. Then there exists a smallest uninteresting number. Then the property
of being the smallest uninteresting number is itself interesting. Contradiction.
Example. Consider a total order on N × N given by "lexicographic" or "dictionary" order, i.e. (a, b) ≤ (c, d) if a < c or (a = c ∧ b ≤ d).
The Ackermann function is the function a : N_0 × N_0 → N defined by
a(m, n) = n + 1                      if m = 0,
a(m, n) = a(m − 1, 1)                if m > 0 and n = 0,
a(m, n) = a(m − 1, a(m, n − 1))      if m > 0 and n > 0.
We want to show that this is well-defined.
Note that a(m, n) is expressed in terms of a at points (x, y) < (m, n). So a is well-defined if the lexicographic order is well-ordered, i.e. every non-empty subset has a minimal element (if a were not well-defined, there would be a smallest place where the definition is bad; but the definition at that point is in terms of smaller points, which are well-defined).
We can see that N_0 × N_0 is well-ordered: if S ⊆ N_0 × N_0 is non-empty, let S_x = {x ∈ N_0 : (∃y) (x, y) ∈ S}, i.e. the set of all x-coordinates of S. By the well-ordering principle, S_x has a minimal element m. Then let S_y = {y ∈ N_0 : (m, y) ∈ S}. Then S_y has a minimal element n. Then (m, n) is the minimal element of S.
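As an illustration (my own, not from the lectures), the definition transcribes directly into Python; the recursion terminates precisely because the arguments decrease in the lexicographic order, though only very small inputs are feasible in practice.

```python
import sys
from functools import lru_cache

sys.setrecursionlimit(10000)

@lru_cache(maxsize=None)          # memoise to avoid recomputing small cases
def a(m, n):
    if m == 0:
        return n + 1
    if n == 0:
        return a(m - 1, 1)
    return a(m - 1, a(m, n - 1))

assert a(2, 3) == 9
assert a(3, 3) == 61
```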
5 Modular arithmetic
We are going to study modular arithmetic. In modular arithmetic, we first pick
a particular natural number to be the modulus, say 7. Then we consider two
numbers to be “equal” if their difference is a multiple of the modulus. For
example, modulo 7, we will think that 3 and 10 are “the same”, while 2 and 4
are different.
We will study arithmetic under this number system. Like the integers, we
are allowed to add and multiply numbers. However, while in
Z
, we can only
divide by 1 and -1, in modular arithmetic, more numbers can be divided. For
example, modulo 10, we are allowed to divide by 3.
An important application of modular arithmetic is RSA encryption. This is
a widely deployed asymmetric encryption algorithm. By asymmetric, we mean
that the key for encryption is different from the key for decryption. This is very
useful in real life, since we can broadcast the encryption key to the world, and
keep the decryption key to ourselves. This way anyone can send you encrypted
messages that only you can decrypt.
5.1 Modular arithmetic
Definition (Modulo). If a, b ∈ Z have the same remainder after division by m, i.e. m | (a − b), we say a and b are congruent modulo m, and write
a ≡ b (mod m)
Example. The check digits of the ISBN (or Hong Kong ID Card Number) are calculated modulo 11.
Example. 9 ≡ 0 (mod 3), 11 ≡ 6 (mod 5).
Proposition. If a ≡ b (mod m) and d | m, then a ≡ b (mod d).
Proof. a ≡ b (mod m) if and only if m | (a − b), hence d | (a − b), i.e. a ≡ b (mod d).
Observe that with m fixed, a ≡ b (mod m) is an equivalence relation. The set of equivalence classes is written as Z_m or Z/(mZ).
Example. Z_3 = {[0], [1], [2]}
Proposition. If a ≡ b (mod m) and u ≡ v (mod m), then a + u ≡ b + v (mod m) and au ≡ bv (mod m).
Proof. Since a ≡ b (mod m) and u ≡ v (mod m), we have m | (a − b) + (u − v) = (a + u) − (b + v). So a + u ≡ b + v (mod m).
Since a ≡ b (mod m) and u ≡ v (mod m), we have m | (a − b)u + b(u − v) = au − bv. So au ≡ bv (mod m).
This means that we can do arithmetic modulo m. Formally, we are doing arithmetic with the congruence classes, i.e. in Z_m. For example, in Z_7, [4] + [5] = [9] = [2].
Modular arithmetic can sometimes be used to show that equations have no
solutions.
Example. 2a^2 + 3b^3 = 1 has no solutions in Z. If there were a solution, then 2a^2 ≡ 1 (mod 3). But 2 · 0^2 ≡ 0, 2 · 1^2 ≡ 2 and 2 · 2^2 ≡ 2 (mod 3). So there is no solution to the congruence, and hence none to the original equation.
Observe that all odd numbers are either ≡ 1 (mod 4) or ≡ 3 ≡ −1 (mod 4). So we can classify primes depending on their value modulo 4.
Theorem. There are infinitely many primes that are ≡ −1 (mod 4).
Proof. Suppose not. So let p_1, ··· , p_k be all the primes ≡ −1 (mod 4). Let N = 4p_1 p_2 ··· p_k − 1. Then N ≡ −1 (mod 4). Now N is a product of primes, say N = q_1 q_2 ··· q_ℓ. But 2 ∤ N and p_i ∤ N for all i. So q_i ≡ 1 (mod 4) for all i. But then that implies N = q_1 q_2 ··· q_ℓ ≡ 1 (mod 4), which is a contradiction.
Example. Solve 7x ≡ 2 (mod 10). Note that 3 · 7 ≡ 1 (mod 10). If we multiply the equation by 3, then we get 3 · 7 · x ≡ 3 · 2 (mod 10). So x ≡ 6 (mod 10). Effectively, we divided by 7.
"Division" doesn't always work for all numbers, e.g. you cannot divide by 2 mod 10. We give a name to the numbers we can divide by.
Definition (Unit (modular arithmetic)). u is a unit if there exists v such that uv ≡ 1 (mod m).
Theorem. u is a unit modulo m if and only if (u, m) = 1.
Proof. (⇒) Suppose u is a unit. Then there exists v such that uv ≡ 1 (mod m). Then uv = 1 + mn for some n, i.e. uv − mn = 1. So 1 can be written as a linear combination of u and m. So (u, m) = 1.
(⇐) Suppose that (u, m) = 1. Then there exist a, b with ua + mb = 1. Thus ua ≡ 1 (mod m).
Using the above proof, we can find the inverse of a unit efficiently by Euclid's algorithm.
Corollary. If (a, m) = 1, then the congruence ax ≡ b (mod m) has a unique solution (mod m).
Proof. If ax ≡ b (mod m) and (a, m) = 1, then there exists a^{−1} such that a^{−1} a ≡ 1 (mod m). So a^{−1} ax ≡ a^{−1} b (mod m), and thus x ≡ a^{−1} b (mod m). Finally we check that x ≡ a^{−1} b (mod m) is indeed a solution: ax ≡ a a^{−1} b ≡ b (mod m).
Proposition. There is a solution to ax ≡ b (mod m) if and only if (a, m) | b.
If d = (a, m) | b, then the solution is the unique solution to (a/d) x ≡ b/d (mod m/d).
Proof. Let d = (a, m). If there is a solution to ax ≡ b (mod m), then m | ax − b. So d | ax − b and d | b.
On the contrary, if d | b, we have ax ≡ b (mod m) ⇔ ax − b = km for some k ∈ Z. Write a = da', b = db' and m = dm'. So ax ≡ b (mod m) ⇔ da'x − db' = dkm' ⇔ a'x − b' = km' ⇔ a'x ≡ b' (mod m'). Note that (a', m') = 1, since we divided by their greatest common factor. Then this has a unique solution modulo m'.
Example. 2x ≡ 3 (mod 4) has no solution since (2, 4) = 2, which does not divide 3.
5.2 Multiple moduli
In this section, we are concerned with multiple equations involving different
moduli.
Suppose we are given x ≡ 2 (mod 3) and x ≡ 1 (mod 4). What is the general solution for x? We work in mod 12. Since we are given that x ≡ 2 (mod 3), we know that x ≡ 2, 5, 8 or 11 (mod 12). Similarly, since x ≡ 1 (mod 4), we must have x ≡ 1, 5 or 9 (mod 12). Combining these results, we must have x ≡ 5 (mod 12).
On the other hand, if x ≡ 5 (mod 12), then x ≡ 5 ≡ 2 (mod 3) and x ≡ 5 ≡ 1 (mod 4). So x ≡ 5 (mod 12) is indeed the most general solution.
In general, we have the Chinese remainder theorem.
Theorem (Chinese remainder theorem). Let (m, n) = 1 and a, b ∈ Z. Then there is a unique solution (modulo mn) to the simultaneous congruences
x ≡ a (mod m)
x ≡ b (mod n),
i.e. there is an x satisfying both, and every other solution is ≡ x (mod mn).
Proof. Since (m, n) = 1, there exist u, v ∈ Z with um + vn = 1. Then vn ≡ 1 (mod m) and um ≡ 1 (mod n). Put x = umb + vna. So x ≡ a (mod m) and x ≡ b (mod n).
To show it is unique, suppose both y and x are solutions to the equations. Then
y ≡ a (mod m) and y ≡ b (mod n)
⇒ y ≡ x (mod m) and y ≡ x (mod n)
⇒ m | y − x and n | y − x
⇒ mn | y − x
⇒ y ≡ x (mod mn)
This shows a congruence (mod
mn
) is equivalent to one (mod
n
) and another
(mod m).
We can easily extend this to more than two moduli by repeatedly applying
this theorem.
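Here is a small Python sketch (mine, not from the notes) of the construction in the proof; crt is an illustrative helper name, and the coefficients are found with pow(·, -1, ·) rather than by writing out um + vn = 1 explicitly, which is all the argument needs.

```python
from math import gcd

def crt(a, m, b, n):
    """Solve x ≡ a (mod m), x ≡ b (mod n) for coprime m, n."""
    assert gcd(m, n) == 1
    u = pow(m, -1, n)             # um ≡ 1 (mod n)
    v = pow(n, -1, m)             # vn ≡ 1 (mod m)
    return (u * m * b + v * n * a) % (m * n)   # x = umb + vna, as in the proof

assert crt(2, 3, 1, 4) == 5       # the x ≡ 2 (mod 3), x ≡ 1 (mod 4) example above
```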
Proposition. Given any (m, n) = 1, c is a unit mod mn iff c is a unit both mod m and mod n.
Proof. (⇒) If there exists u such that cu ≡ 1 (mod mn), then cu ≡ 1 (mod m) and cu ≡ 1 (mod n). So c is a unit mod m and mod n.
(⇐) Suppose there exist u, v such that cu ≡ 1 (mod m) and cv ≡ 1 (mod n). Then by the CRT, there exists w with w ≡ u (mod m) and w ≡ v (mod n). Then cw ≡ cu ≡ 1 (mod m) and cw ≡ cv ≡ 1 (mod n).
But we know that 1 ≡ 1 (mod m) and 1 ≡ 1 (mod n). So both cw and 1 solve the pair of congruences z ≡ 1 (mod m), z ≡ 1 (mod n). By the "uniqueness" part of the Chinese remainder theorem, we must have cw ≡ 1 (mod mn).
Definition (Euler's totient function). We denote by φ(m) the number of integers a with 0 ≤ a < m such that (a, m) = 1, i.e. a is a unit mod m. Note that φ(1) = 1.
Proposition.
(i) φ(mn) = φ(m)φ(n) if (m, n) = 1, i.e. φ is multiplicative.
(ii) If p is a prime, φ(p) = p − 1.
(iii) If p is a prime, φ(p^k) = p^k − p^{k−1} = p^k (1 − 1/p).
(iv) φ(m) = m ∏_{p|m} (1 − 1/p).
Proof. We will only prove (iv). In fact, we will prove it twice.
(i) Suppose m = p_1^{k_1} p_2^{k_2} ··· p_ℓ^{k_ℓ}. Then
φ(m) = φ(p_1^{k_1}) φ(p_2^{k_2}) ··· φ(p_ℓ^{k_ℓ})
     = p_1^{k_1}(1 − 1/p_1) p_2^{k_2}(1 − 1/p_2) ··· p_ℓ^{k_ℓ}(1 − 1/p_ℓ)
     = m ∏_{p|m} (1 − 1/p)
(ii) Let m = p_1^{k_1} p_2^{k_2} ··· p_ℓ^{k_ℓ}. Let X = {0, ··· , m − 1} and A_j = {x ∈ X : p_j | x}. Then |X| = m, |A_j| = m/p_j, |A_i ∩ A_j| = m/(p_i p_j), etc. So
φ(m) = |Ā_1 ∩ Ā_2 ∩ ··· ∩ Ā_ℓ| = m ∏_{p|m} (1 − 1/p).
Example. $\varphi(60) = 60(1 - 1/2)(1 - 1/3)(1 - 1/5) = 16$.
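For computation, formula (iv) is the convenient one. A small sketch (mine, not from the lectures) that factorises $m$ by trial division and applies the product formula:

```python
def totient(m):
    """phi(m) = m * prod over primes p | m of (1 - 1/p)."""
    result, n, p = m, m, 2
    while p * p <= n:
        if n % p == 0:
            result -= result // p      # multiply result by (1 - 1/p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:                          # a single prime factor remains
        result -= result // n
    return result

print(totient(60))   # 16, as in the example
```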
If $a, b$ are both units (mod $m$), then so is $ab$, for if $au \equiv 1$ and $bv \equiv 1$, then $(ab)(uv) \equiv 1$. So the units form a multiplicative group of size $\varphi(m)$.
5.3 Prime moduli
Modular arithmetic has some nice properties when the modulus is a prime
number.
Theorem (Wilson's theorem). $(p - 1)! \equiv -1 \pmod p$ if $p$ is a prime.
Proof. If $p$ is a prime, then $1, 2, \cdots, p - 1$ are units. Among these, we can pair each number up with its inverse (e.g. 3 with 4 in modulo 11). The only elements that cannot be paired with a different number are 1 and $-1$, which are self-inverse, as shown below:
\begin{align*}
x^2 \equiv 1 \pmod p &\Leftrightarrow p \mid (x^2 - 1) \\
&\Leftrightarrow p \mid (x - 1)(x + 1) \\
&\Leftrightarrow p \mid x - 1 \text{ or } p \mid x + 1 \\
&\Leftrightarrow x \equiv \pm 1 \pmod p.
\end{align*}
Now $(p - 1)!$ is a product of $(p - 3)/2$ inverse pairs together with 1 and $-1$. So the product is $-1$.
Theorem (Fermat's little theorem). Let $p$ be a prime. Then $a^p \equiv a \pmod p$ for all $a \in \mathbb{Z}$. Equivalently, $a^{p-1} \equiv 1 \pmod p$ if $a \not\equiv 0 \pmod p$.
Proof. Two proofs are offered:
(i) The numbers $\{1, 2, \cdots, p - 1\}$ are units modulo $p$ and form a group of order $p - 1$. So $a^{p-1} \equiv 1$ by Lagrange's theorem.

(ii) If $a \not\equiv 0$, then $a$ is a unit. So $ax \equiv ay$ iff $x \equiv y$. Then $a, 2a, 3a, \cdots, (p-1)a$ are distinct mod $p$. So they are congruent to $1, 2, \cdots, p - 1$ in some order. Hence $a \cdot 2a \cdot 3a \cdots (p-1)a \equiv 1 \cdot 2 \cdot 3 \cdots (p-1)$. So $a^{p-1}(p-1)! \equiv (p-1)!$, and hence $a^{p-1} \equiv 1 \pmod p$.
Neither Wilson's nor Fermat's theorem holds if the modulus is not prime. However, Fermat's theorem can be generalized:

Theorem (Fermat-Euler Theorem). Let $a, m$ be coprime. Then
\[ a^{\varphi(m)} \equiv 1 \pmod m. \]
Proof. Lagrange’s theorem: The units mod m form a group of size φ(m).
Alternatively, let $U = \{x \in \mathbb{N} : 0 < x < m, (x, m) = 1\}$. These are the $\varphi(m)$ units. Since $a$ is a unit, $ax \equiv ay \pmod m$ only if $x \equiv y \pmod m$. So if $U = \{u_1, u_2, \cdots, u_{\varphi(m)}\}$, then $\{au_1, au_2, \cdots, au_{\varphi(m)}\}$ are distinct and are units. So they must be $u_1, \cdots, u_{\varphi(m)}$ in some order. Then $au_1 \cdot au_2 \cdots au_{\varphi(m)} \equiv u_1u_2\cdots u_{\varphi(m)}$. So $a^{\varphi(m)}z \equiv z$, where $z = u_1u_2\cdots u_{\varphi(m)}$. Since $z$ is a unit, we can multiply by its inverse and obtain $a^{\varphi(m)} \equiv 1$.
Definition (Quadratic residues). The quadratic residues mod $p$ are the "squares" mod $p$, i.e. $1^2, 2^2, \cdots, (p-1)^2$.

Note that if $a^2 \equiv b^2 \pmod p$, then $p \mid a^2 - b^2 = (a - b)(a + b)$. Then $p \mid a - b$ or $p \mid a + b$. So $a \equiv \pm b \pmod p$. Thus every square is a square of exactly two numbers.
Example. If $p = 7$, then $1^2 \equiv 6^2 \equiv 1$, $2^2 \equiv 5^2 \equiv 4$, $3^2 \equiv 4^2 \equiv 2$. So 1, 2, 4 are quadratic residues; 3, 5, 6 are not.

Proposition. If $p$ is an odd prime, then $-1$ is a quadratic residue if and only if $p \equiv 1 \pmod 4$.
Proof. If $p \equiv 1 \pmod 4$, say $p = 4k + 1$, then by Wilson's theorem,
\[ -1 \equiv (p-1)! \equiv 1 \cdot 2 \cdots \frac{p-1}{2}\left(-\frac{p-1}{2}\right)\cdots(-2)(-1) \equiv (-1)^{\frac{p-1}{2}}\left(\left(\frac{p-1}{2}\right)!\right)^2 = (-1)^{2k}((2k)!)^2 \equiv ((2k)!)^2. \]
So $-1$ is a quadratic residue.

When $p \equiv -1 \pmod 4$, i.e. $p = 4k + 3$, suppose $-1$ were a square, i.e. $-1 \equiv z^2$. Then by Fermat's little theorem, $1 \equiv z^{p-1} \equiv z^{4k+2} \equiv (z^2)^{2k+1} \equiv (-1)^{2k+1} \equiv -1$. Contradiction.
Proposition (unproven). An odd prime $p$ is the sum of two squares if and only if $p \equiv 1 \pmod 4$.
Proposition. There are infinitely many primes $\equiv 1 \pmod 4$.

Proof. Suppose not, and $p_1, \cdots, p_k$ are all the primes $\equiv 1 \pmod 4$. Let $N = (2p_1\cdots p_k)^2 + 1$. Then $N$ is not divisible by 2 or by $p_1, \cdots, p_k$. Let $q$ be a prime with $q \mid N$. Then $q \not\equiv 1 \pmod 4$. But $N \equiv 0 \pmod q$ and hence $(2p_1\cdots p_k)^2 + 1 \equiv 0 \pmod q$, i.e. $(2p_1\cdots p_k)^2 \equiv -1 \pmod q$. So $-1$ is a quadratic residue mod $q$, which is a contradiction since $q \not\equiv 1 \pmod 4$.
Proposition. Let $p = 4k + 3$ be a prime. Then if $a$ is a quadratic residue, i.e. $a \equiv z^2 \pmod p$ for some $z$, then $z \equiv \pm a^{k+1}$.

Proof. By Fermat's little theorem, $a^{2k+1} \equiv z^{4k+2} \equiv z^{p-1} \equiv 1$. If we multiply by $a$, then $a^{2k+2} \equiv a \pmod p$. So $(\pm a^{k+1})^2 \equiv a \pmod p$.
This allows us to take square roots efficiently. It requires an efficient way of computing powers of $a$, which can be done by repeated squaring. For example, to find $a^{37}$, we can calculate $a^{37} = a^{32}a^4a^1 = ((((a^2)^2)^2)^2)^2 \cdot (a^2)^2 \cdot a$. Thus calculation of $a^n$ has time complexity $O(\log n)$, as opposed to $O(n)$ if you take powers manually.
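A sketch of both ideas in code: repeated squaring for $a^n \bmod m$, and the square-root recipe $z \equiv \pm a^{k+1}$ for a prime $p = 4k + 3$. (Python's built-in pow(a, n, m) already uses repeated squaring; the explicit loop is only for illustration.)

```python
def power_mod(a, n, m):
    """a^n mod m by repeated squaring: O(log n) multiplications."""
    result, a = 1, a % m
    while n > 0:
        if n & 1:                      # this bit of the exponent is set
            result = result * a % m
        a = a * a % m                  # square
        n >>= 1
    return result

def sqrt_mod(a, p):
    """A square root of a mod a prime p = 3 (mod 4), assuming a is a QR."""
    assert p % 4 == 3
    z = power_mod(a, (p + 1) // 4, p)  # a^(k+1), where p = 4k + 3
    assert z * z % p == a % p, "a is not a quadratic residue mod p"
    return z

print(power_mod(3, 37, 1000))   # 363
print(sqrt_mod(2, 7))           # 4, and indeed 4^2 = 16 = 2 (mod 7)
```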
Suppose $a$ is a square mod $n$, where $n = pq$ and $p, q$ are distinct primes. Then $a$ is a square mod $p$ and a square mod $q$. So there exists some $s$ with $(\pm s)^2 \equiv a \pmod p$ and some $t$ with $(\pm t)^2 \equiv a \pmod q$. By the Chinese remainder theorem, we can find a unique solution for each choice of signs, so we get 4 square roots of $a$ modulo $n$.
5.4 Public-key (asymmetric) cryptography
Tossing a coin over a phone
Suppose we have Alice and Bob who wish to toss a coin fairly over the phone.
Alice chooses two 100-digit primes $p, q$ with $p, q \equiv 3 \pmod 4$. Then she tells Bob the product $n = pq$. Bob picks a number $u$ coprime to $n$, computes $a \equiv u^2 \pmod n$ and tells Alice the value of $a$.

Alice can compute the square roots of $a$ by the above algorithm ($O(\log n)$), obtaining $\pm u, \pm v$, and tells Bob one of these pairs.

Now if Alice picks $\pm u$, Bob says "you win". Otherwise, Bob says "you lose".
Can Bob cheat? If he says "you lose" when Alice says $\pm u$, Bob must produce the other pair $\pm v$, but he can't know $\pm v$ without factorizing $n$. (If he knows $\pm u$ and $\pm v$, then $u^2 \equiv v^2 \pmod n$, so $n \mid (u - v)(u + v)$. But $n \nmid (u - v)$ and $n \nmid (u + v)$. So $p \mid (u - v)$ and $q \mid (u + v)$, say. Then $p = (n, u - v)$ and $q = (n, u + v)$, which we can calculate efficiently by Euclid's algorithm.)
Thus cheating is as hard as prime factorization.
Note that a difficult part is to generate the 100-digit primes. While there are sufficiently many primes that we can keep trying random numbers until we hit one, we need an efficient method to test whether a number is prime. We cannot do this by factorization, since that is slow. So we need an efficient prime-checking function.

We can test whether a large number $p$ is prime by doing Fermat-like checks: we choose random numbers $a$, raise them to the $(p-1)$th power mod $p$, and see if the result is 1. If it is not 1 for some $a$, then $p$ is definitely not a prime. If we do sufficiently many tests that all result in 1, we can be reasonably certain that it is a prime (though not with 100% certainty).

(Recent advances have produced efficient deterministic primality tests, but they are generally slower than the above algorithm and are not widely used.)
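A sketch of the Fermat-style test just described (my own helper; a composite number can occasionally pass a single round, which is why several random bases are tried):

```python
import random

def fermat_test(n, rounds=20):
    """Probably-prime test: check a^(n-1) = 1 (mod n) for random bases a."""
    if n < 4:
        return n in (2, 3)
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:      # Fermat's little theorem fails
            return False               # definitely composite
    return True                        # probably prime

print(fermat_test(2**61 - 1))   # True: this Mersenne number is prime
print(fermat_test(2**61 + 1))   # False: it is divisible by 3
```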
It is currently believed that it is hard to prime factorize a number, so this is
secure as far as we know.
RSA encryption
Theorem (RSA Encryption).
We want people to be able to send a message to
Bob without Eve eavesdropping. So the message must be encrypted. We want
an algorithm that allows anyone to encrypt, but only Bob to decrypt (e.g. many
parties sending passwords with the bank).
Let us first agree to write messages as sequences of numbers, e.g. in ASCII
or UTF-8.
After encoding, the encryption part is often done with RSA encryption (Rivest, Shamir, Adleman). Bob thinks of two large primes $p, q$. Let $n = pq$ and pick $e$ coprime to $\varphi(n) = (p-1)(q-1)$. Then work out $d$ with $de \equiv 1 \pmod{\varphi(n)}$ (i.e. $de = k\varphi(n) + 1$). Bob then publishes the pair $(n, e)$.
For Alice to encrypt a message, Alice splits the message into numbers $M < n$. Alice sends $M^e \pmod n$ to Bob.

Bob then computes $(M^e)^d = M^{k\varphi(n)+1} \equiv M \pmod n$ by the Fermat-Euler theorem.
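A toy numerical sketch of the whole exchange, with tiny primes rather than 100-digit ones (the specific values are illustrative only):

```python
from math import gcd

p, q = 61, 53                    # Bob's secret primes (toy-sized)
n = p * q                        # 3233, published
phi = (p - 1) * (q - 1)          # 3120
e = 17                           # public exponent, coprime to phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)              # 2753, since 17 * 2753 = 1 (mod 3120)

M = 65                           # Alice's message, M < n
C = pow(M, e, n)                 # Alice sends M^e mod n, here 2790
assert pow(C, d, n) == M         # Bob recovers M = (M^e)^d mod n
print(n, e, d, C)                # 3233 17 2753 2790
```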
How can Eve find $M$? She could, of course, factorize $n$, find $d$ efficiently, and be in the same position as Bob. However, it is currently believed that this is hard. Is there any other way? Currently we do not know if RSA can be broken without factorizing (cf. the RSA problem).
6 Real numbers
So far, we have only worked with natural numbers and integers. Unfortunately
the real world often involves rational numbers and even real numbers. In this
chapter, our goal is to study real numbers. To do so, we will first have to define
the real numbers. To do so, we will start from the natural numbers.
Before we start, an important (philosophical) point has to be made. The idea
is to define, say, the “real numbers” as a set with some operations (e.g. addition,
multiplication) that satisfies some particular properties, known as the axioms.
When we do this, there are two questions we can ask: is there actually a set that satisfies these properties, and is it unique?
The first question can be answered by an explicit construction, i.e. we find
a concrete set that actually satisfies these properties. However, it is important
to note that we perform the construction only to show that it makes sense to
talk about such a set. For example, we will construct a real number as a pair of
subsets of
Q
, but it would be absurd to actually think that a real number “is” a
pair of sets. It’s just that it can be constructed as a pair of sets. You would be
considered insane if you asked whether
\[ \forall x: x \in 3 \Rightarrow x \in \pi \]
holds, even though it is a valid thing to ask if we view each real number as a set (and is in fact true).
The next problem of uniqueness is more subtle. Firstly, it is clear that the constructions themselves aren't unique: instead of constructing the natural number 0 as the set $\emptyset$, as we will later do, we could as well define it as $\{\{\emptyset\}\}$,
and the whole construction will go through. However, we could still hope that
all possible constructions are “isomorphic” in some way. It turns out this is true
for what we have below, but the proofs are not trivial.
However, while this is nice, it doesn’t really matter. This is since we don’t
care how, say, the real numbers are constructed. When working with them, we
just assume that they satisfy the relevant defining properties. So we can just
choose anything that satisfies the axioms, and use it. The fact that there are
other ones not isomorphic to this is not a problem.
Since we are mostly interested in the natural numbers and the real numbers,
we will provide both the axiomatic description and explicit construction for these.
However, we will only give explicit constructions for the integers and rationals,
and we will not give detailed proofs of why they work.
6.1 Construction of numbers
Construction of natural numbers
Our construction of natural numbers will include 0.
Definition (Natural numbers). The natural numbers $\mathbb{N}$ are defined by Peano's axioms. We call a set $\mathbb{N}$ "natural numbers" if it has a special element 0 and a map $S : \mathbb{N} \to \mathbb{N}$ that maps $n$ to its "successor" (intuitively, it is $+1$) such that:

(i) $S(n) \neq 0$ for all $n \in \mathbb{N}$.

(ii) For all $n, m \in \mathbb{N}$, if $S(n) = S(m)$, then $n = m$.

(iii) For any subset $A \subseteq \mathbb{N}$, if $0 \in A$ and $n \in A \Rightarrow S(n) \in A$, then in fact $A = \mathbb{N}$.
The last axiom is the axiom of induction.
We write $1 = S(0)$, $2 = S(1)$, $3 = S(2)$ etc. We can (but will not) prove that the axiom of induction allows us to define functions on $\mathbb{N}$ recursively (cf. IID Logic and Set Theory). Assuming this, we can define addition and multiplication recursively by
\begin{align*}
n + 0 &= n & n \times 0 &= 0 \\
n + S(m) &= S(n + m) & n \times S(m) &= n \times m + n.
\end{align*}
We can show by induction that these satisfy the usual rules (e.g. associativity, distributivity).
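These recursive definitions translate directly into code. A purely illustrative sketch, representing a natural number by an ordinary Python int and the successor by $+1$:

```python
def add(n, m):
    """n + 0 = n,  n + S(m) = S(n + m)."""
    if m == 0:
        return n
    return add(n, m - 1) + 1           # S(n + m)

def mul(n, m):
    """n * 0 = 0,  n * S(m) = n * m + n."""
    if m == 0:
        return 0
    return add(mul(n, m - 1), n)

print(add(3, 4), mul(3, 4))   # 7 12
```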
We can construct this explicitly by $0 = \emptyset$, $1 = \{0\}$, $2 = \{0, 1\}$ etc. In general, we define $S(n) = \{n\} \cup n$. Note that this is in some sense a circular definition, since we are defining the natural numbers recursively, but to do recursion, we need the natural numbers. Also, it is not clear how we can show this satisfies the axioms above. To actually do this properly, we will need to approach this in a slightly different way, and the details are better left for the IID Logic and Set Theory course.
Construction of integers
Definition (Integers). $\mathbb{Z}$ is obtained from $\mathbb{N}$ by allowing subtraction. Formally, we define $\mathbb{Z}$ to be the equivalence classes of $\mathbb{N} \times \mathbb{N}$ under the equivalence relation
\[ (a, b) \sim (c, d) \text{ if and only if } a + d = b + c. \]
Intuitively, we think of $(a, b)$ as $a - b$.

We write $a$ for $[(a, 0)]$ and $-a$ for $[(0, a)]$, and define the operations by
\begin{align*}
(a, b) + (c, d) &= (a + c, b + d) \\
(a, b) \times (c, d) &= (ac + bd, bc + ad).
\end{align*}
We can check that these are well-defined and satisfy the usual properties.
Construction of rationals
Definition (Rationals). $\mathbb{Q}$ is obtained from $\mathbb{Z}$ by allowing division. Formally, we define $\mathbb{Q}$ to be the equivalence classes of $\mathbb{Z} \times \mathbb{N}$ under the relation
\[ (a, b) \sim (c, d) \text{ if and only if } ad = bc. \]
We write $\frac{a}{b}$ for $[(a, b)]$. We define
\begin{align*}
(a, b) + (c, d) &= (ad + bc, bd) \\
(a, b) \times (c, d) &= (ac, bd).
\end{align*}
We can check that these are well-defined and satisfy the usual properties.

Algebraically, we say $\mathbb{Q}$ is a "totally ordered field".
Definition (Totally ordered field). A set $F$ equipped with binary operations $+$, $\times$ and a relation $\leq$ is a totally ordered field if

(i) $F$ is an additive abelian group with identity 0.

(ii) $F \setminus \{0\}$ is a multiplicative abelian group with identity 1.

(iii) Multiplication distributes over addition: $a(b + c) = ab + ac$.

(iv) $\leq$ is a total order.

(v) For any $p, q, r \in F$, if $p \leq q$, then $p + r \leq q + r$.

(vi) For any $p, q, r \in F$, if $p \leq q$ and $0 \leq r$, then $pr \leq qr$.

Proposition. $\mathbb{Q}$ is a totally ordered field.

Examples of non-totally-ordered fields include $\mathbb{Z}_p$, which is a field but not totally ordered.
Proposition. $\mathbb{Q}$ is densely ordered, i.e. for any $p, q \in \mathbb{Q}$, if $p < q$, then there is some $r \in \mathbb{Q}$ such that $p < r < q$.

Proof. Take $r = \frac{p + q}{2}$.
However, Q is not enough for our purposes.
Proposition. There is no rational $q \in \mathbb{Q}$ with $q^2 = 2$.
Proof. Suppose not, and $(\frac{a}{b})^2 = 2$, where $b$ is chosen as small as possible. We will derive a contradiction in four ways.

(i) $a^2 = 2b^2$. So $a$ is even. Let $a = 2a_0$. Then $b^2 = 2a_0^2$. So $b$ is even as well, say $b = 2b_0$. But then $\frac{a}{b} = \frac{a_0}{b_0}$ with a smaller $b_0$. Contradiction.

(ii) We know that $b$ is a product of primes if $b \neq 1$. Let $p \mid b$. Then $a^2 = 2b^2$. So $p \mid a^2$. So $p \mid a$. This contradicts the minimality of $b$.

(iii) (Dirichlet) We have $\frac{a}{b} = \frac{2b}{a}$, since $a^2 = 2b^2$. For any $u, v$, we have $a^2v = 2b^2v$ and thus $uab + a^2v = uab + 2b^2v$. So $\frac{a}{b} = \frac{au + 2bv}{bu + av}$. Put $u = -1$, $v = 1$. Then $\frac{a}{b} = \frac{2b - a}{a - b}$. Since $a < 2b$, we have $a - b < b$. So we have found a rational with smaller $b$.

(iv) Same as (iii), but pick $u, v$ so that $bu + av = 1$, which is possible since $a$ and $b$ are coprime. Then $\frac{a}{b} = au + 2bv$ is an integer, which is absurd since $1 < \frac{a}{b} < 2$.
Construction of real numbers

As shown above, the rational numbers don't have all the numbers we need. What exactly do we mean by "numbers are missing"? We might want to say that the problem is that not all polynomial equations have solutions. However, this is not the real problem. First of all, even if we are working with the reals, not all equations have solutions, e.g. $x^2 + 1 = 0$. Also, some real numbers such as $\pi$ are not solutions to polynomial equations (with integer coefficients), but we still want them.
The real problem is expressed in terms of least upper bounds, or suprema.
Definition (Least upper bound/supremum and greatest lower bound/infimum). For an ordered set $X$, $s \in X$ is a least upper bound (or supremum) for the set $S \subseteq X$, denoted by $s = \sup S$, if

(i) $s$ is an upper bound for $S$, i.e. for every $x \in S$, we have $x \leq s$;

(ii) if $t$ is any upper bound for $S$, then $s \leq t$.

Similarly, $s \in X$ is a greatest lower bound (or infimum) if $s$ is a lower bound and any lower bound $t \leq s$.

By definition, the least upper bound for $S$, if it exists, is unique.
The problem with $\mathbb{Q}$ is that if we let $S = \{q \in \mathbb{Q} : q^2 < 2\}$, then $S$ has no supremum in $\mathbb{Q}$.
Recall that
Q
is a totally ordered field. We will define the real numbers
axiomatically to be a totally ordered field without this problem.
Definition (Real numbers). The real numbers are a totally ordered field containing $\mathbb{Q}$ that satisfies the least upper bound axiom.

Axiom (Least upper bound axiom). Every non-empty set of real numbers that has an upper bound has a least upper bound.
We have the requirement "non-empty" since every number is an upper bound of $\emptyset$, but $\emptyset$ has no least upper bound.
We will leave the construction to the end of the section.
Note that $\mathbb{Q}$ is a subset of $\mathbb{R}$, in the sense that we can find a copy of $\mathbb{Q}$ inside $\mathbb{R}$. By definition of a field, there is a multiplicative identity $1 \in \mathbb{R}$. We can then define the natural numbers by
\[ n = \underbrace{1 + \cdots + 1}_{n \text{ times}}. \]
We can then define the negative integers by letting $-n$ be the additive inverse of $n$. Then $\frac{1}{n}$ is the multiplicative inverse of $n$ (for $n \neq 0$), and $\frac{m}{n}$ is just $m$ copies of $\frac{1}{n}$ added together. This is our canonical copy of $\mathbb{Q}$.
Corollary. Every non-empty set of the real numbers bounded below has an infimum.

Proof. Let $S$ be non-empty and bounded below. Then $-S = \{-x : x \in S\}$ is a non-empty set bounded above, and $\inf S = -\sup(-S)$.
Alternatively, we can prove it just using the ordering of R:
Proof. Let $S$ be non-empty and bounded below. Let $L$ be the set of all lower bounds of $S$. Since $S$ is bounded below, $L$ is non-empty. Also, $L$ is bounded above by any member of $S$. So $L$ has a least upper bound $\sup L$.

For each $x \in S$, we know $x$ is an upper bound of $L$. So we have $\sup L \leq x$ by definition. So $\sup L$ is indeed a lower bound of $S$. Also, by definition, every lower bound of $S$ is less than (or equal to) $\sup L$. So this is the infimum.
Now the set $\{q \in \mathbb{Q} : q^2 < 2\}$ has a supremum in $\mathbb{R}$ (by the least upper bound axiom).
We make some useful definitions.
Definition (Closed and open intervals). A closed interval $[a, b]$ with $a \leq b \in \mathbb{R}$ is the set $\{x \in \mathbb{R} : a \leq x \leq b\}$.

An open interval $(a, b)$ with $a \leq b \in \mathbb{R}$ is the set $\{x \in \mathbb{R} : a < x < b\}$.

Similarly, we can have $[a, b) = \{x \in \mathbb{R} : a \leq x < b\}$ and $(a, b] = \{x \in \mathbb{R} : a < x \leq b\}$.
Example. Let $S = [0, 1]$. Then $S \neq \emptyset$. Also $S$ has an upper bound, e.g. 2. Hence $\sup S$ exists.

To find it explicitly, notice that 1 is an upper bound for $S$ by definition, and if $t < 1$, then $t$ is not an upper bound for $S$ since $1 \in S$ but $1 \not\leq t$. So every upper bound is at least 1 and therefore 1 is the supremum of $S$.

Now let $T = (0, 1)$. Again $T$ is non-empty and has an upper bound (e.g. 2). So again $\sup T$ exists. We know that 1 is an upper bound. If $t < 0$, then $0.5 \in T$ but $0.5 \not\leq t$. So $t$ is not an upper bound. Now suppose $0 \leq t < 1$. Then $0 < \frac{1+t}{2} < 1$ and $t < \frac{1+t}{2}$, so $\frac{1+t}{2} \in T$ but $\frac{1+t}{2} \not\leq t$. So $t$ is not an upper bound. So $\sup T = 1$.

Note that these cases differ in that $\sup S \in S$ but $\sup T \not\in T$. $S$ has a maximum element 1, and the maximum is the supremum. $T$ doesn't have a maximum, but the supremum can still exist.
The real numbers has a rather interesting property.
Theorem (Axiom of Archimedes). Given $r \in \mathbb{R}$, there exists $n \in \mathbb{N}$ with $n > r$.
This was considered an axiom by Archimedes but we can prove this with the
least upper bound axiom.
Proof. Assume the contrary. Then $r$ is an upper bound for $\mathbb{N}$. $\mathbb{N}$ is not empty since $1 \in \mathbb{N}$. By the least upper bound axiom, $s = \sup \mathbb{N}$ exists. Since $s$ is the least upper bound for $\mathbb{N}$, $s - 1$ is not an upper bound for $\mathbb{N}$. So $\exists m \in \mathbb{N}$ with $m > s - 1$. Then $m + 1 \in \mathbb{N}$ but $m + 1 > s$, which contradicts the statement that $s$ is an upper bound.
Notice that every non-empty set $S \subseteq \mathbb{R}$ which is bounded below has a greatest lower bound (or infimum). In particular, we have

Proposition. $\inf\{\frac{1}{n} : n \in \mathbb{N}\} = 0$.

Proof. Certainly 0 is a lower bound for the set. If $t > 0$, by the Axiom of Archimedes there exists $n \in \mathbb{N}$ such that $n > 1/t$. Then $\frac{1}{n} < t$ and $\frac{1}{n}$ is in the set. So $t$ is not a lower bound.
Theorem. $\mathbb{Q}$ is dense in $\mathbb{R}$, i.e. given $r, s \in \mathbb{R}$ with $r < s$, $\exists q \in \mathbb{Q}$ with $r < q < s$.
Proof. Wlog assume $r \geq 0$ (just multiply everything by $-1$ if $r < 0$ and swap $r$ and $s$). Since $s - r > 0$, there is some $n \in \mathbb{N}$ such that $\frac{1}{n} < s - r$. By the Axiom of Archimedes, $\exists N \in \mathbb{N}$ such that $N > sn$.

Let $T = \{k \in \mathbb{N} : \frac{k}{n} \geq s\}$. $T$ is not empty, since $N \in T$. Then by the well-ordering principle, $T$ has a minimum element $m$. Now $m \neq 1$ since $\frac{1}{n} < s - r \leq s$. Let $q = \frac{m-1}{n}$. Since $m - 1 \not\in T$, $q < s$. If $q = \frac{m-1}{n} < r$, then $\frac{m}{n} < r + \frac{1}{n} < s$, so $m \not\in T$, contradiction. So $r < q < s$.
Theorem. There exists $x \in \mathbb{R}$ with $x^2 = 2$.

Proof. Let $S = \{r \in \mathbb{R} : r^2 \leq 2\}$. Then $0 \in S$, so $S \neq \emptyset$. Also, for every $r \in S$, we have $r \leq 3$. So $S$ is bounded above. So $x = \sup S$ exists and $0 \leq x \leq 3$.

By trichotomy, either $x^2 < 2$, $x^2 > 2$ or $x^2 = 2$.
Suppose $x^2 < 2$. Let $0 < t < 1$. Then consider $(x + t)^2 = x^2 + 2xt + t^2 \leq x^2 + 6t + t \leq x^2 + 7t$. Pick $t < \frac{2 - x^2}{7}$; then $(x + t)^2 < 2$. So $x + t \in S$. This contradicts the fact that $x$ is an upper bound of $S$.

Now suppose $x^2 > 2$. Let $0 < t < 1$. Then consider $(x - t)^2 = x^2 - 2xt + t^2 \geq x^2 - 6t$. Pick $t < \frac{x^2 - 2}{6}$. Then $(x - t)^2 > 2$, so $x - t$ is an upper bound for $S$. This contradicts the fact that $x$ is the least upper bound of $S$.

So by trichotomy, $x^2 = 2$.
Now let's try to construct the real numbers from the rationals. The idea is that each set like $\{q \in \mathbb{Q} : q^2 < 2\}$ represents a "missing number", namely the supremum $\sqrt{2}$. However, many different sets can correspond to the same missing number, e.g. $\{q \in \mathbb{Q} : q^2 < 2\} \cup \{-3\}$ also "should have" supremum $\sqrt{2}$.
So we need to pick a particular one that represents this missing number. To do so, we pick the maximal one, i.e. a subset $S$ such that for $x \in S$ and $y \in \mathbb{Q}$, if $y < x$, then $y \in S$ (if $y$ were not in $S$, we could add it to $S$, and $S \cup \{y\}$ would "have the same upper bound"). Each rational $q \in \mathbb{Q}$ can also be represented by the set $\{x \in \mathbb{Q} : x \leq q\}$. These are known as Dedekind cuts. By convention, a Dedekind cut is instead defined as the pair $(S, \mathbb{Q} \setminus S)$, and can be characterized by the following definition:
Definition (Dedekind cut). A Dedekind cut of $\mathbb{Q}$ is a partition of $\mathbb{Q}$ into sets $L$ and $R$ such that
\[ (\forall l \in L)(\forall r \in R)\ l < r, \]
and $R$ has no minimum.

The requirement that $R$ has no minimum corresponds to our (arbitrary) decision that the rationals should be embedded as
\[ q \mapsto (\{x \in \mathbb{Q} : x \leq q\},\ \{x \in \mathbb{Q} : x > q\}), \]
instead of
\[ q \mapsto (\{x \in \mathbb{Q} : x < q\},\ \{x \in \mathbb{Q} : x \geq q\}). \]
We can then construct the set $\mathbb{R}$ from $\mathbb{Q}$ by letting $\mathbb{R}$ be the set of all Dedekind cuts. The supremum of any bounded set of real numbers is obtained by taking the union of (the left sides of) the Dedekind cuts. The definition of the arithmetic operations is left as an exercise for the reader (to actually define them is tedious but not hard).
6.2 Sequences
Here we will look at sequences, with series in the next chapter. Only a brief
introduction to these topics will be provided here, as these topics will be studied
later in Analysis I.
Definition (Sequence). A sequence is a function $\mathbb{N} \to \mathbb{R}$. If $a$ is a sequence, instead of $a(1), a(2), \cdots$, we usually write $a_1, a_2, \cdots$. To emphasize that it is a sequence, we write it as $(a_n)$.
We want to capture the notion of a sequence tending to a limit. For example, we want to say that $1, \frac{1}{2}, \frac{1}{3}, \cdots$ tends to 0, while $1, 2, 3, \cdots$ does not converge to a limit.

The idea is that if $a_n \to \ell$, then we can get as close to $\ell$ as we like, as long as we are sufficiently far down the sequence. More precisely, given any "error threshold" $\varepsilon$, we can find a (possibly large) number $N$ such that whenever $n \geq N$, we have $|a_n - \ell| < \varepsilon$.
Definition (Limit of sequence). The sequence $(a_n)$ tends to $\ell \in \mathbb{R}$ as $n$ tends to infinity if and only if
\[ (\forall \varepsilon > 0)(\exists N \in \mathbb{N})(\forall n \geq N)\ |a_n - \ell| < \varepsilon. \]
If $a_n$ tends to $\ell$ as $n$ tends to infinity, we write $a_n \to \ell$ as $n \to \infty$; $\lim_{n \to \infty} a_n = \ell$; or $a_n$ converges to $\ell$.

Intuitively, if $a_n \to \ell$, we mean that given any $\varepsilon$, for sufficiently large $n$, $a_n$ is always within $\ell \pm \varepsilon$.

The definition of $a_n \not\to \ell$ is the negation of the above statement:
\[ (\exists \varepsilon > 0)(\forall N \in \mathbb{N})(\exists n \geq N)\ |a_n - \ell| \geq \varepsilon. \]
Definition (Convergence of sequence). The sequence $(a_n)$ converges if there exists an $\ell$ such that $a_n \to \ell$. The sequence diverges if it doesn't converge.

Every proof of $a_n \to \ell$ looks like: "Given $\varepsilon > 0$, (argument to show $N$ exists, possibly depending on $\varepsilon$), such that $\forall n \geq N$, $|a_n - \ell| < \varepsilon$."
Example. Show that $a_n = 1 - \frac{1}{n} \to 1$.

Given $\varepsilon > 0$, choose $N > \frac{1}{\varepsilon}$, which exists by the Axiom of Archimedes. If $n \geq N$, then $|a_n - 1| = \frac{1}{n} \leq \frac{1}{N} < \varepsilon$. So $a_n \to 1$.
Example. Let
\[ a_n = \begin{cases} \frac{1}{n} & n \text{ is prime} \\ \frac{1}{2n} & n \text{ is not prime.} \end{cases} \]
We will show that $a_n \to 0$. Given $\varepsilon > 0$, choose $N > \frac{1}{\varepsilon}$. Then $\forall n \geq N$, $|a_n - 0| \leq \frac{1}{n} < \varepsilon$.
Example. Prove that
\[ a_n = \begin{cases} 1 & n \text{ is prime} \\ 0 & n \text{ is not prime} \end{cases} \]
diverges.

Let $\varepsilon = \frac{1}{3}$ and suppose $a_n \to \ell$ for some $\ell \in \mathbb{R}$. If $\ell < \frac{1}{2}$, then $|a_n - \ell| > \varepsilon$ when $n$ is prime. If $\ell \geq \frac{1}{2}$, then $|a_n - \ell| > \varepsilon$ when $n$ is not prime. Since the primes and the non-primes are both unbounded, $(\forall N)\ \exists n > N$ such that $|a_n - \ell| > \varepsilon$. So $a_n$ diverges.
An important property of R is the following:
Theorem. Every bounded monotonic sequence converges.
In case of confusion, the terms are defined as follows: $(a_n)$ is increasing if $m \leq n$ implies $a_m \leq a_n$. Decreasing is defined similarly. A sequence is monotonic if it is increasing or decreasing. $(a_n)$ is bounded if there is some $B \in \mathbb{R}$ such that $|a_n| \leq B$ for all $n$.
Proof. Wlog assume $(a_n)$ is increasing. The set $\{a_n : n \geq 1\}$ is bounded and non-empty. So it has a supremum $\ell$ (least upper bound axiom). We show that $\ell$ is the limit:

Given any $\varepsilon > 0$, $\ell - \varepsilon$ is not an upper bound of $\{a_n\}$. So $\exists N$ such that $a_N > \ell - \varepsilon$. Since $(a_n)$ is increasing, we know that $\ell \geq a_m \geq a_N > \ell - \varepsilon$ for all $m \geq N$. So $\forall n \geq N$, $|a_n - \ell| < \varepsilon$. So $a_n \to \ell$.
We can show that this theorem is equivalent to the least upper bound axiom.
Definition (Subsequence). A subsequence of $(a_n)$ is $(a_{g(n)})$ where $g : \mathbb{N} \to \mathbb{N}$ is strictly increasing, e.g. $a_2, a_3, a_5, a_7, \cdots$ is a subsequence of $(a_n)$.
Theorem. Every sequence has a monotonic subsequence.
Proof. Call a point $a_k$ a "peak" if $(\forall m \geq k)\ a_m \leq a_k$. If there are infinitely many peaks, then they form a decreasing subsequence. If there are only finitely many peaks, then $\exists N$ such that no $a_n$ with $n > N$ is a peak. Pick $a_{N_1}$ with $N_1 > N$. Then pick $a_{N_2}$ with $N_2 > N_1$ and $a_{N_2} > a_{N_1}$. This is possible because $a_{N_1}$ is not a peak. Then pick $a_{N_3}$ with $N_3 > N_2$ and $a_{N_3} > a_{N_2}$, ad infinitum. Then we have a monotonic subsequence.
We will now prove the following basic properties of convergence:
Theorem.

(i) If $a_n \to a$ and $a_n \to b$, then $a = b$ (i.e. limits are unique).

(ii) If $a_n \to a$ and $b_n = a_n$ for all but finitely many $n$, then $b_n \to a$.

(iii) If $a_n = a$ for all $n$, then $a_n \to a$.

(iv) If $a_n \to a$ and $b_n \to b$, then $a_n + b_n \to a + b$.

(v) If $a_n \to a$ and $b_n \to b$, then $a_nb_n \to ab$.

(vi) If $a_n \to a \neq 0$, and $\forall n\ (a_n \neq 0)$, then $1/a_n \to 1/a$.

(vii) If $a_n \to a$ and $b_n \to a$, and $\forall n\ (a_n \leq c_n \leq b_n)$, then $c_n \to a$. (Sandwich theorem)
Many students are confused as to why we should prove these "obvious" properties of convergence. It seems "obvious" that if $a_n$ converges to $a$ and $b_n$ converges to $b$, then the sum converges to $a + b$. However, it is not obvious (at least for first-time learners) that if $(\forall \varepsilon > 0)(\exists N)(\forall n \geq N)\ |a_n - a| < \varepsilon$ and $(\forall \varepsilon > 0)(\exists N)(\forall n \geq N)\ |b_n - b| < \varepsilon$, then $(\forall \varepsilon > 0)(\exists N)(\forall n \geq N)\ |(a_n + b_n) - (a + b)| < \varepsilon$.

In some sense, what we are trying to prove is that our attempt at defining convergence actually satisfies the "obvious" properties we think convergence should satisfy.

In proving this, we will make frequent use of the triangle inequality: $|x + y| \leq |x| + |y|$.
Proof.

(i) Suppose instead $a < b$. Then choose $\varepsilon = \frac{b - a}{2}$. By the definition of the limit, $\exists N_1$ such that $\forall n \geq N_1$, $|a_n - a| < \varepsilon$. There also $\exists N_2$ such that $\forall n \geq N_2$, $|a_n - b| < \varepsilon$. Let $N = \max\{N_1, N_2\}$. If $n \geq N$, then $|a - b| \leq |a - a_n| + |a_n - b| < 2\varepsilon = b - a$. Contradiction. So $a = b$.

(ii) Given $\varepsilon > 0$, there $\exists N_1$ such that $\forall n \geq N_1$, we have $|a_n - a| < \varepsilon$. Since $b_n = a_n$ for all but finitely many $n$, there exists $N_2$ such that $\forall n \geq N_2$, $a_n = b_n$. Let $N = \max\{N_1, N_2\}$. Then $\forall n \geq N$, we have $|b_n - a| = |a_n - a| < \varepsilon$. So $b_n \to a$.

(iii) $\forall \varepsilon$, take $N = 1$. Then $|a_n - a| = 0 < \varepsilon$ for all $n \geq 1$.

(iv) Given $\varepsilon > 0$, $\exists N_1$ such that $\forall n \geq N_1$, we have $|a_n - a| < \varepsilon/2$. Similarly, $\exists N_2$ such that $\forall n \geq N_2$, we have $|b_n - b| < \varepsilon/2$. Let $N = \max\{N_1, N_2\}$. Then $\forall n \geq N$, $|(a_n + b_n) - (a + b)| \leq |a_n - a| + |b_n - b| < \varepsilon$.

(v) Given $\varepsilon > 0$, there exist $N_1, N_2, N_3$ such that
\begin{align*}
n \geq N_1&: |a_n - a| < \frac{\varepsilon}{2(|b| + 1)} \\
n \geq N_2&: |b_n - b| < \frac{\varepsilon}{2(|a| + 1)} \\
n \geq N_3&: |b_n - b| < 1 \Rightarrow |b_n| < |b| + 1.
\end{align*}
Then let $N = \max\{N_1, N_2, N_3\}$. Then $\forall n \geq N$,
\begin{align*}
|a_nb_n - ab| &= |b_n(a_n - a) + a(b_n - b)| \\
&\leq |b_n||a_n - a| + |a||b_n - b| \\
&< (|b| + 1)|a_n - a| + (|a| + 1)|b_n - b| \\
&< \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\end{align*}

(vi) Given $\varepsilon > 0$, $\exists N_1, N_2$ such that $\forall n \geq N_1$, $|a_n - a| < \frac{|a|^2}{2}\varepsilon$, and $\forall n \geq N_2$, $|a_n - a| < \frac{|a|}{2}$ (so that $|a_n| > \frac{|a|}{2}$). Let $N = \max\{N_1, N_2\}$. Then $\forall n \geq N$,
\[ \left|\frac{1}{a_n} - \frac{1}{a}\right| = \frac{|a_n - a|}{|a_n||a|} < \frac{2}{|a|^2}|a_n - a| < \varepsilon. \]

(vii) By (iii) to (v), we know that $b_n - a_n \to 0$. Let $\varepsilon > 0$. Then $\exists N$ such that $\forall n \geq N$, we have $|b_n - a_n| < \varepsilon$. Since $a_n \leq c_n \leq b_n$, also $|c_n - a_n| < \varepsilon$. So $c_n - a_n \to 0$. So $c_n = (c_n - a_n) + a_n \to a$.
Example. Let $x_n = \frac{n^2(n+1)(2n+1)}{n^4 + 1}$. Then we have
\[ x_n = \frac{(1 + 1/n)(2 + 1/n)}{1 + 1/n^4} \to \frac{1 \cdot 2}{1} = 2 \]
by the theorem (applied many times).
Example. Let $y_n = \frac{100^n}{n!}$. Since $\frac{y_{n+1}}{y_n} = \frac{100}{n+1} < \frac{1}{2}$ for $n > 200$, we know that $0 \leq y_n < y_{200}\cdot\frac{2^{200}}{2^n}$ for $n > 200$. Since $y_{200}\cdot\frac{2^{200}}{2^n} \to 0$, we know that $y_n \to 0$ as well.
6.3 Series
In a field, the sum of two numbers is defined. By induction, the sum of finitely
many numbers is defined as well. However, infinite sums (“series”) are not. We
will define what it means to take an infinite sum. Of course, infinite sums exist
only for certain nice sums. For example, 1 + 1 + 1 + ··· does not exist.
Definition (Series and partial sums). Let $(a_n)$ be a sequence. Then $s_m = \sum_{n=1}^m a_n$ is the $m$th partial sum of $(a_n)$. We write
\[ \sum_{n=1}^\infty a_n = \lim_{m \to \infty} s_m \]
if the limit exists.
Example. Let $a_n = \frac{1}{n(n-1)}$ for $n \geq 2$. Then
\[ s_m = \sum_{n=2}^m \frac{1}{n(n-1)} = \sum_{n=2}^m \left(\frac{1}{n-1} - \frac{1}{n}\right) = 1 - \frac{1}{m} \to 1. \]
So
\[ \sum_{n=2}^\infty \frac{1}{n(n-1)} = 1. \]
Example. Let $a_n = \frac{1}{n^2}$. Then $s_m = \sum_{n=1}^m \frac{1}{n^2}$. We know that $s_m$ is increasing. We also know that $s_m \leq 1 + \sum \frac{1}{n(n-1)} \leq 2$, i.e. it is bounded above. So $s_m$ converges and $\sum_{n=1}^\infty \frac{1}{n^2}$ exists (in fact it is $\pi^2/6$).
Example (Geometric series). Suppose $a_n = r^n$, where $|r| < 1$. Then
\[ s_m = r\cdot\frac{1 - r^m}{1 - r} \to \frac{r}{1 - r} \]
since $r^m \to 0$. So
\[ \sum_{n=1}^\infty r^n = \frac{r}{1 - r}. \]
Example (Harmonic series). Let $a_n = \frac{1}{n}$. Consider
\begin{align*}
s_{2^k} &= 1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\right) + \cdots + \frac{1}{2^k} \\
&\geq 1 + \frac{1}{2} + \left(\frac{1}{4} + \frac{1}{4}\right) + \left(\frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8}\right) + \cdots \\
&\geq 1 + \frac{k}{2}.
\end{align*}
So $\sum_{n=1}^\infty \frac{1}{n}$ diverges.
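A quick numerical sketch of the three examples: the telescoping and geometric partial sums settle down near their limits, while the harmonic partial sums keep growing (roughly like $\log m$).

```python
def partial_sum(term, lo, m):
    """The m-th partial sum of a series whose terms start at index lo."""
    return sum(term(n) for n in range(lo, m + 1))

for m in (10, 100, 1000):
    print(m,
          round(partial_sum(lambda n: 1 / (n * (n - 1)), 2, m), 4),  # -> 1
          round(partial_sum(lambda n: 0.5 ** n, 1, m), 4),           # -> 1
          round(partial_sum(lambda n: 1 / n, 1, m), 4))              # grows
```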
Decimal expansions
Definition (Decimal expansion). Let $(d_n)$ be a sequence with $d_n \in \{0, 1, \cdots, 9\}$. Then
\[ \sum_{n=1}^\infty \frac{d_n}{10^n} \]
converges to a limit $r$ with $0 \leq r \leq 1$, since the partial sums $s_m$ are increasing and bounded by $\sum \frac{9}{10^n} \leq 1$ (geometric series). We say $r = 0.d_1d_2d_3\cdots$, the decimal expansion of $r$.
Does every $x$ with $0 \leq x < 1$ have a decimal expansion? Pick $d_1$ maximal such that $\frac{d_1}{10} \leq x$. Then $0 \leq x - \frac{d_1}{10} < \frac{1}{10}$ since $d_1$ is maximal. Then pick $d_2$ maximal such that $\frac{d_2}{100} \leq x - \frac{d_1}{10}$. By maximality, $0 \leq x - \frac{d_1}{10} - \frac{d_2}{100} < \frac{1}{100}$. Repeat inductively, picking maximal $d_n$ with
\[ \frac{d_n}{10^n} \leq x - \sum_{j=1}^{n-1}\frac{d_j}{10^j}, \quad\text{so that}\quad 0 \leq x - \sum_{j=1}^n \frac{d_j}{10^j} < \frac{1}{10^n}. \]
Since both bounding sides tend to 0, by the sandwich theorem, $x - \sum_{j=1}^\infty \frac{d_j}{10^j} = 0$, i.e. $x = 0.d_1d_2\cdots$.
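The greedy digit-picking argument is itself an algorithm. A sketch using exact rational arithmetic (the helper is mine):

```python
from fractions import Fraction

def decimal_digits(x, k):
    """First k decimal digits of x with 0 <= x < 1: at each step take the
    maximal digit d with d / 10^n <= what is left."""
    digits, rest = [], Fraction(x)
    for _ in range(k):
        rest *= 10
        d = int(rest)                  # maximal digit not exceeding the rest
        digits.append(d)
        rest -= d
    return digits

print(decimal_digits(Fraction(1, 7), 8))   # [1, 4, 2, 8, 5, 7, 1, 4]
```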
We have shown that every such $x$ has at least one decimal expansion. Can the same number have two different decimal expansions? I.e. if $0.a_1a_2\cdots = 0.b_1b_2\cdots$, must $a_i = b_i$ for all $i$?
Now suppose that the $a_j$ and $b_j$ are equal until position $k$, i.e. $a_j = b_j$ for $j < k$; wlog assume $a_k < b_k$. Then
\[ \sum_{j=k+1}^\infty \frac{a_j}{10^j} \leq \sum_{j=k+1}^\infty \frac{9}{10^j} = \frac{9}{10^{k+1}}\cdot\frac{1}{1 - 1/10} = \frac{1}{10^k}. \]
So we must have $b_k = a_k + 1$, $a_j = 9$ for $j > k$ and $b_j = 0$ for $j > k$. For example, $0.47999\cdots = 0.48000\cdots$.
6.4 Irrational numbers
Recall $\mathbb{Q} \subseteq \mathbb{R}$.

Definition (Irrational number). Numbers in $\mathbb{R} \setminus \mathbb{Q}$ are irrational.

Definition (Periodic number). A decimal is periodic if, after a finite number $\ell$ of digits, it repeats in blocks of $k$ for some $k$, i.e. $d_{n+k} = d_n$ for $n > \ell$.
Proposition. A number is periodic iff it is rational.
Proof. Clearly a periodic decimal is rational: say $x = 0.7413157157157\cdots$. Then
\begin{align*}
10^\ell x = 10^4x &= 7413.157157\cdots \\
&= 7413 + 157\left(\frac{1}{10^3} + \frac{1}{10^6} + \frac{1}{10^9} + \cdots\right) \\
&= 7413 + 157\cdot\frac{1}{10^3}\cdot\frac{1}{1 - 1/10^3} \in \mathbb{Q}.
\end{align*}
Conversely, let $x \in \mathbb{Q}$. Then $x$ has a periodic decimal. Suppose $x = \frac{p}{2^c5^dq}$ with $(q, 10) = 1$. Then $10^{\max(c,d)}x = \frac{a}{q} = n + \frac{b}{q}$ for some $a, b, n \in \mathbb{Z}$ and $0 \leq b < q$. However, since $(q, 10) = 1$, by Fermat-Euler, $10^{\varphi(q)} \equiv 1 \pmod q$, i.e. $10^{\varphi(q)} - 1 = kq$ for some $k$. Then
\[ \frac{b}{q} = \frac{kb}{kq} = \frac{kb}{999\cdots9} = kb\left(\frac{1}{10^{\varphi(q)}} + \frac{1}{10^{2\varphi(q)}} + \cdots\right). \]
Since $kb < kq < 10^{\varphi(q)}$, write $kb = d_1d_2\cdots d_{\varphi(q)}$ (as a string of $\varphi(q)$ digits, allowing leading zeros). So $\frac{b}{q} = 0.d_1d_2\cdots d_{\varphi(q)}d_1d_2\cdots$ and $x$ is periodic.
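In practice the period can be found by long division: the decimal repeats as soon as a remainder recurs, and for $(q, 10) = 1$ the proof shows the period divides $\varphi(q)$. A sketch (helper name mine):

```python
def decimal_period(b, q):
    """Length of the repeating block of b/q, assuming gcd(q, 10) = 1 and
    0 < b < q: do long division and wait for a remainder to repeat."""
    seen, r, pos = {}, b % q, 0
    while r not in seen:
        seen[r] = pos
        r = (r * 10) % q
        pos += 1
    return pos - seen[r]

print(decimal_period(1, 7))    # 6, and phi(7) = 6
print(decimal_period(1, 11))   # 2, which divides phi(11) = 10
```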
Example. $x = 0.01101010001010\cdots$, where 1s appear exactly in the prime positions, is irrational: the gaps between consecutive primes grow arbitrarily large, so the digits cannot eventually repeat.
6.5 Euler’s number
Definition (Euler's number).
\[ e = \sum_{j=0}^\infty \frac{1}{j!} = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots \]
This sum exists because the partial sums are increasing and bounded by $1 + 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots = 3$. So $2 < e < 3$.
Proposition. e is irrational.
Proof. Suppose $e \in \mathbb{Q}$, say $e = \frac{p}{q}$. We know $q \geq 2$ since $e$ is not an integer (it is between 2 and 3). Then $q!e \in \mathbb{N}$. But
\[ q!e = \underbrace{q! + q! + \frac{q!}{2!} + \frac{q!}{3!} + \cdots + \frac{q!}{q!}}_{n} + \underbrace{\frac{q!}{(q+1)!} + \frac{q!}{(q+2)!} + \cdots}_{x}, \]
where $n \in \mathbb{N}$. We also have
\[ x = \frac{1}{q+1} + \frac{1}{(q+1)(q+2)} + \cdots. \]
We can bound it by
\[ 0 < x < \frac{1}{q+1} + \frac{1}{(q+1)^2} + \frac{1}{(q+1)^3} + \cdots = \frac{1}{q+1}\cdot\frac{1}{1 - 1/(q+1)} = \frac{1}{q} < 1. \]
This is a contradiction, since $q!e$ must be in $\mathbb{N}$ but it is a sum of an integer $n$ plus a non-integer $x$.
6.6 Algebraic numbers
Rational numbers are “nice”, because they can be written as fractions. Irrational
numbers are bad. However, some irrational numbers are worse than others. We
can further classify some irrational numbers as being transcendental.
Definition
(Algebraic and transcendental numbers)
.
An algebraic number is a
root of a polynomial with integer coefficients (or rational coefficients). A number
is transcendental if it is not algebraic.
Proposition. All rational numbers are algebraic.

Proof. Let $x = \frac{p}{q}$; then $x$ is a root of $qx - p = 0$.

Example. $\sqrt{2}$ is irrational but algebraic, since it is a root of $x^2 - 2 = 0$.
So do transcendental numbers exist?
Theorem (Liouville 1851; non-examinable). $L$ is transcendental, where
\[ L = \sum_{n=1}^\infty \frac{1}{10^{n!}} = 0.110001000\cdots \]
with 1s in the factorial positions.
Proof. Suppose instead that $f(L) = 0$ where $f(x) = a_kx^k + a_{k-1}x^{k-1} + \cdots + a_0$, with $a_i \in \mathbb{Z}$, $a_k \neq 0$.

For any rational $p/q$, we have
\[ f\left(\frac{p}{q}\right) = a_k\left(\frac{p}{q}\right)^k + \cdots + a_0 = \frac{\text{integer}}{q^k}. \]
So if $p/q$ is not a root of $f$, then $|f(p/q)| \geq q^{-k}$.

For any $m$, we can write $L = (\text{first } m \text{ terms}) + (\text{rest of the terms}) = s + t$.

Now consider $|f(s)| = |f(L) - f(s)|$ (since $f(L) = 0$). We have
\begin{align*}
|f(L) - f(s)| &= \left|\sum a_i(L^i - s^i)\right| \leq \sum |a_i||L^i - s^i| \\
&= \sum |a_i|(L - s)(L^{i-1} + \cdots + s^{i-1}) \leq \sum |a_i|(L - s)\,i = (L - s)\sum i|a_i| = tC,
\end{align*}
with $C = \sum i|a_i|$.

Writing $s$ as a fraction, its denominator is at most $10^{m!}$. So $|f(s)| \geq 10^{-k\cdot m!}$. Combining with the above, we have $tC \geq 10^{-k\cdot m!}$. We can bound $t$ by
\[ t = \sum_{j=m+1}^\infty 10^{-j!} \leq \sum_{\ell=(m+1)!}^\infty 10^{-\ell} = \frac{10}{9}\cdot 10^{-(m+1)!}. \]
So $(10C/9)\,10^{-(m+1)!} \geq 10^{-k\cdot m!}$, i.e. $10C/9 \geq 10^{(m+1)! - k\cdot m!} = 10^{(m+1-k)m!}$. Pick $m \in \mathbb{N}$ so that $m > k$ and $10^{m!} > \frac{10C}{9}$. This is always possible since both $k$ and $10C/9$ are constants. Then the inequality gives $10C/9 \geq 10^{(m+1-k)m!} \geq 10^{m!} > 10C/9$, which is a contradiction.
Theorem (Hermite 1873). $e$ is transcendental.

Theorem (Lindemann 1882). $\pi$ is transcendental.
7 Countability
After messing with numbers, we finally get back to sets. Here we are concerned with the sizes of sets. We can count how big a set is by constructing bijections: two sets have the same number of things if there is a bijection between them. In particular, a set has $n$ things if we can biject it with $[n] = \{1, 2, 3, \cdots, n\}$.

We first prove a few preliminary properties about bijecting with $[n]$ that should be obviously true.
Lemma. If $f : [n] \to [n]$ is injective, then $f$ is bijective.

Proof. Perform induction on $n$: it is true for $n = 1$. Suppose $n > 1$. Let $j = f(n)$. Define $g : [n] \to [n]$ by
\[ g(j) = n,\quad g(n) = j,\quad g(i) = i \text{ otherwise}. \]
Then $g$ is a bijection. So the map $g \circ f$ is injective. It fixes $n$, i.e. $g \circ f(n) = n$. So the map $h : [n-1] \to [n-1]$ given by $h(i) = g \circ f(i)$ is well-defined and injective. By the induction hypothesis, $h$ is surjective. So $h$ is bijective. So $g \circ f$ is bijective, and so is $f$.
Corollary. If $A$ is a set and $f : A \to [n]$ and $g : A \to [m]$ are both bijections, then $m = n$.

Proof. Wlog assume $n \leq m$. Let $h : [n] \to [m]$ be given by $h(i) = i$, which is injective. Then the map $h \circ f \circ g^{-1} : [m] \to [m]$ is injective. Then by the lemma this is surjective. So $h$ must be surjective. So $n \geq m$. Hence $n = m$.
This shows that we cannot biject a set to two different numbers, or a set
cannot have two different sizes!
Definition (Finite set and cardinality of set). The set $A$ is finite if there exists a bijection $A \to [n]$ for some $n \in \mathbb{N}_0$. The cardinality or size of $A$, written $|A|$, is $n$. By the above corollary, this is well-defined.
Lemma. Let $S \subseteq \mathbb{N}$. Then either $S$ is finite or there is a bijection $g : \mathbb{N} \to S$.

Proof. If $S \neq \emptyset$, by the well-ordering principle, there is a least element $s_1 \in S$. If $S \setminus \{s_1\} \neq \emptyset$, it has a least element $s_2$. If $S \setminus \{s_1, s_2\}$ is not empty, there is a least element $s_3$. If at some point the process stops, then $S = \{s_1, s_2, \cdots, s_n\}$, which is finite. Otherwise, if it goes on forever, the map $g : \mathbb{N} \to S$ given by $g(i) = s_i$ is well-defined and is an injection. It is also a surjection, because if $k \in S$, then $k$ is a natural number and there are at most $k$ elements of $S$ less than $k$. So $k$ will be mapped to $s_i$ for some $i \leq k$.
Definition (Countable set). A set $A$ is countable if $A$ is finite or there is a bijection between $A$ and $\mathbb{N}$. A set $A$ is uncountable if $A$ is not countable.

This is one possible definition of countability, but there are some (often more helpful) equivalent characterizations.

Theorem. The following are equivalent:

(i) $A$ is countable.

(ii) There is an injection from $A$ to $\mathbb{N}$.

(iii) $A = \emptyset$ or there is a surjection from $\mathbb{N}$ to $A$.
Proof.

(i) $\Rightarrow$ (iii): If $A$ is finite and non-empty, there is a bijection $f : A \to S$ for some $S \subseteq \mathbb{N}$. For all $x \in \mathbb{N}$, if $x \in S$, then map $x \mapsto f^{-1}(x)$; otherwise, map $x$ to any fixed element of $A$. This is a surjection, since $\forall a \in A$, we have $f(a) \mapsto a$. If $A$ is infinite and countable, the bijection $\mathbb{N} \to A$ is itself a surjection.

(iii) $\Rightarrow$ (ii): Suppose $A \neq \emptyset$ and $f : \mathbb{N} \to A$ is a surjection. Define a map $g : A \to \mathbb{N}$ by $g(a) = \min f^{-1}(\{a\})$, which exists by well-ordering. Then $g$ is an injection.

(ii) $\Rightarrow$ (i): If there is an injection $f : A \to \mathbb{N}$, then $f$ gives a bijection between $A$ and $S = f(A) \subseteq \mathbb{N}$. If $S$ is finite, so is $A$. If $S$ is infinite, there is a bijection $g$ between $S$ and $\mathbb{N}$. So there is a bijection $g \circ f$ between $A$ and $\mathbb{N}$.
Often, the injection definition is the most helpful.
Proposition. The integers $\mathbb{Z}$ are countable.

Proof. The map $f : \mathbb{Z} \to \mathbb{N}$ given by
\[ f(n) = \begin{cases} 2n & n > 0 \\ 2(-n) + 1 & n \leq 0 \end{cases} \]
is a bijection.

Proposition. $\mathbb{N} \times \mathbb{N}$ is countable.

Proof. We can map $(a, b) \mapsto 2^a3^b$ injectively by the fundamental theorem of arithmetic. So $\mathbb{N} \times \mathbb{N}$ is countable.
We can also obtain an explicit bijection by counting diagonally: $(a, b) \mapsto \binom{a+b}{2} - a + 1$. The first few values are tabulated below, with $a$ indexing the rows and $b$ the columns:

  a\b    1    2    3    4
   1     1    3    6   10
   2     2    5    9   14
   3     4    8   13   19
   4     7   12   18   25
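A small check of the diagonal formula against the table, using $\binom{a+b}{2} = \frac{(a+b)(a+b-1)}{2}$:

```python
def pair(a, b):
    """The diagonal bijection N x N -> N: (a, b) |-> C(a+b, 2) - a + 1."""
    s = a + b
    return s * (s - 1) // 2 - a + 1

for a in range(1, 5):
    print([pair(a, b) for b in range(1, 5)])
# [1, 3, 6, 10]
# [2, 5, 9, 14]
# [4, 8, 13, 19]
# [7, 12, 18, 25]
```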
Since $\mathbb{Z}$ is countable, we have an injection $\mathbb{Z} \to \mathbb{N}$, so there is an injection $\mathbb{Z} \times \mathbb{N} \to \mathbb{N} \times \mathbb{N} \to \mathbb{N}$. So $\mathbb{Z} \times \mathbb{N}$ is countable. However, the rationals are the equivalence classes of $\mathbb{Z} \times \mathbb{N}$. So $\mathbb{Q}$ is countable.

Proposition. If there is an injection $A \to B$ and $B$ is countable, then $A$ is countable (since we can inject $B \to \mathbb{N}$).
Proposition. $\mathbb{Z}^k$ is countable for all $k \in \mathbb{N}$.

Proof. By induction: $\mathbb{Z}$ is countable. If $\mathbb{Z}^k$ is countable, then $\mathbb{Z}^{k+1} = \mathbb{Z} \times \mathbb{Z}^k$. Since we can map $\mathbb{Z}^k \to \mathbb{N}$ injectively by the induction hypothesis, we can map $\mathbb{Z}^{k+1} \to \mathbb{Z} \times \mathbb{N}$ injectively, and we can map that to $\mathbb{N}$ injectively.
Theorem. A countable union of countable sets is countable.
Proof. Let $I$ be a countable index set, and for each $\alpha \in I$, let $A_\alpha$ be a countable set. We need to show that $\bigcup_{\alpha \in I}A_\alpha$ is countable. It is enough to construct an injection $h : \bigcup_{\alpha \in I}A_\alpha \to \mathbb{N} \times \mathbb{N}$, because $\mathbb{N} \times \mathbb{N}$ is countable. We know that $I$ is countable, so there exists an injection $f : I \to \mathbb{N}$. For each $\alpha \in I$, there exists an injection $g_\alpha : A_\alpha \to \mathbb{N}$.

For $a \in \bigcup A_\alpha$, pick $m = \min\{j \in \mathbb{N} : a \in A_\alpha \text{ and } f(\alpha) = j \text{ for some } \alpha \in I\}$, and let $\alpha$ be the corresponding index such that $f(\alpha) = m$. We then set $h(a) = (m, g_\alpha(a))$, and this is an injection.
Proposition. $\mathbb{Q}$ is countable.

Proof. This can be proved in two ways:

(i) $\mathbb{Q} = \bigcup_{n \geq 1}\frac{1}{n}\mathbb{Z} = \bigcup_{n \geq 1}\{\frac{m}{n} : m \in \mathbb{Z}\}$, which is a countable union of countable sets.

(ii) $\mathbb{Q}$ can be mapped injectively to $\mathbb{Z} \times \mathbb{N}$ by $a/b \mapsto (a, b)$, where $b > 0$ and $(a, b) = 1$.
Theorem. The set of algebraic numbers is countable.
Proof. Let $P_k$ be the set of polynomials of degree $k$ with integer coefficients. Then $a_kx^k + a_{k-1}x^{k-1} + \cdots + a_0 \mapsto (a_k, a_{k-1}, \cdots, a_0)$ is an injection $P_k \to \mathbb{Z}^{k+1}$. Since $\mathbb{Z}^{k+1}$ is countable, so is $P_k$.

Let $P$ be the set of all polynomials with integer coefficients. Then clearly $P = \bigcup P_k$. This is a countable union of countable sets. So $P$ is countable.

For each polynomial $p \in P$, let $R_p$ be the set of its roots. Then $R_p$ is finite and thus countable. Hence $\bigcup_{p \in P}R_p$, the set of all algebraic numbers, is countable.
Theorem. The set of real numbers R is uncountable.
Proof. (Cantor's diagonal argument) Assume $\mathbb{R}$ is countable. Then we can list the reals as $r_1, r_2, r_3, \cdots$ so that every real number is in the list. Write each $r_n$ uniquely in decimal form (i.e. without infinite trailing 9s). List them out vertically:
\begin{align*}
r_1 &= n_1\,.\,d_{11}\,d_{12}\,d_{13}\,d_{14}\cdots \\
r_2 &= n_2\,.\,d_{21}\,d_{22}\,d_{23}\,d_{24}\cdots \\
r_3 &= n_3\,.\,d_{31}\,d_{32}\,d_{33}\,d_{34}\cdots \\
r_4 &= n_4\,.\,d_{41}\,d_{42}\,d_{43}\,d_{44}\cdots
\end{align*}
Define $r = 0.d_1d_2d_3d_4\cdots$ by
\[ d_n = \begin{cases} 0 & d_{nn} \neq 0 \\ 1 & d_{nn} = 0. \end{cases} \]
Then by construction, $r$ differs from the $n$th number in the list in the $n$th digit, and is so different from every number in the list. Then $r$ is a real number but not in the list. Contradiction.
Corollary. There are uncountably many transcendental numbers.

Proof. If not, then the reals, being the union of the transcendentals and the algebraic numbers, would be countable. But the reals are uncountable.

This is an easy but non-constructive proof that transcendental numbers exist. "If we can't find one, find lots!" (It is debatable whether this proof is constructive or not. Some argue that we can use it to construct a transcendental number, by listing all the algebraic numbers and performing the diagonal argument to obtain a number not in the list, i.e. a transcendental number. So this is in fact constructive.)
Example. Let $F_k = \{Y \subseteq \mathbb{N} : |Y| = k\}$, i.e. the set of all subsets of $\mathbb{N}$ of size $k$. We can inject $F_k \to \mathbb{Z}^k$ in the obvious way, e.g. $\{1, 3, 7\} \mapsto (1, 3, 7)$ etc. So it is countable. So $F = \bigcup_{k \geq 0}F_k$, the set of all finite subsets of $\mathbb{N}$, is countable.
Example. Recall $\mathcal{P}(X) = \{Y : Y \subseteq X\}$. Now suppose $\mathcal{P}(\mathbb{N})$ is countable. Let $S_1, S_2, S_3, \cdots$ be the list of all subsets of $\mathbb{N}$. Let $S = \{n : n \not\in S_n\}$. But then $S$ is not in the list. Contradiction. So $\mathcal{P}(\mathbb{N})$ is uncountable.
Example. Let $\Sigma$ be the set of all functions $\mathbb{N} \to \mathbb{N}$ (i.e. the set of all integer sequences). If $\Sigma$ were countable, we could list it as $f_1, f_2, f_3, \cdots$. But then consider $f$ given by
\[ f(n) = \begin{cases} 1 & f_n(n) \neq 1 \\ 2 & f_n(n) = 1. \end{cases} \]
Again $f$ is not in the list. Contradiction. So $\Sigma$ is uncountable.

Alternatively, there is a bijection between $\mathcal{P}(\mathbb{N})$ and the set of 0,1 sequences, given by $S \mapsto$ its indicator function. So we can inject $\mathcal{P}(\mathbb{N}) \to \Sigma$ by $S \mapsto$ (indicator function of $S$) $+ 1$. So $\Sigma$ cannot be countable (since $\mathcal{P}(\mathbb{N})$ is uncountable).

Or, we can let $\Sigma^* \subseteq \Sigma$ be the set of bijections $\mathbb{N} \to \mathbb{N}$, and $\Sigma^{**} \subseteq \Sigma^*$ the bijections of the special form: for every $n$,
\[ \text{either}\quad \begin{cases} f(2n-1) = 2n-1 \\ f(2n) = 2n \end{cases} \quad\text{or}\quad \begin{cases} f(2n-1) = 2n \\ f(2n) = 2n-1, \end{cases} \]
i.e. for every odd-even pair, we either flip them or keep them the same.

But there is a bijection between $\Sigma^{**}$ and the 0,1 sequences: if the $n$th term of the sequence is 0, don't flip the $n$th pair; otherwise flip it. Hence $\Sigma^{**}$ is uncountable, and therefore so are $\Sigma^*$ and $\Sigma$.
Theorem. Let $A$ be a set. Then there is no surjection from $A$ to $\mathcal{P}(A)$.

Proof. Suppose $f : A \to \mathcal{P}(A)$ is surjective. Let $S = \{a \in A : a \not\in f(a)\}$. Since $f$ is surjective, there must exist $s \in A$ such that $f(s) = S$. If $s \in S$, then $s \not\in S$ by the definition of $S$. Conversely, if $s \not\in S$, then $s \in S$. Contradiction. So $f$ cannot exist.
This shows that there are infinitely many different possible “infinite sizes” of
sets.
We conclude with two statements that we will not prove.
Theorem (Cantor-Schröder-Bernstein theorem). Suppose there are injections $A \to B$ and $B \to A$. Then there is a bijection $A \to B$.

Continuum hypothesis. There is no set whose size lies strictly between those of $\mathbb{N}$ and $\mathbb{R}$.

In 1963, Paul Cohen proved that it is impossible to prove or disprove this statement (in ZFC). The proof can be found in the Part III Topics in Set Theory course.