Part IA Numbers and Sets
Based on lectures by A. G. Thomason
Notes taken by Dexter Chua
Michaelmas 2014
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
Introduction to number systems and logic
Overview of the natural numbers, integers, real numbers, rational and irrational
numbers, algebraic and transcendental numbers. Brief discussion of complex numbers;
statement of the Fundamental Theorem of Algebra.
Ideas of axiomatic systems and proof within mathematics; the need for proof; the
role of counter-examples in mathematics. Elementary logic; implication and negation;
examples of negation of compound statements. Proof by contradiction. [2]
Sets, relations and functions
Union, intersection and equality of sets. Indicator (characteristic) functions; their use in
establishing set identities. Functions; injections, surjections and bijections. Relations,
and equivalence relations. Counting the combinations or permutations of a set. The
Inclusion-Exclusion Principle. [4]
The integers
The natural numbers: mathematical induction and the well-ordering principle. Examples, including the Binomial Theorem. [2]
Elementary number theory
Prime numbers: existence and uniqueness of factorisation into primes; highest common factors and least common multiples. Euclid's proof of the infinity of primes. Euclid's algorithm. Solution in integers of ax + by = c.
Modular arithmetic (congruences). Units modulo n. Chinese Remainder Theorem. Wilson's Theorem; the Fermat-Euler Theorem. Public key cryptography and the RSA algorithm. [8]
The real numbers
Least upper bounds; simple examples. Least upper bound axiom. Sequences and series; convergence of bounded monotonic sequences. Irrationality of √2 and e. Decimal expansions. Construction of a transcendental number. [4]
Countability and uncountability
Definitions of finite, infinite, countable and uncountable sets. A countable union of
countable sets is countable. Uncountability of R. Non-existence of a bijection from a
set to its power set. Indirect proof of existence of transcendental numbers. [4]
Contents
0 Introduction
1 Proofs and logic
1.1 Proofs
1.2 Examples of proofs
1.3 Logic
2 Sets, functions and relations
2.1 Sets
2.2 Functions
2.3 Relations
3 Division
3.1 Euclid’s Algorithm
3.2 Primes
4 Counting and integers
4.1 Basic counting
4.2 Combinations
4.3 Well-ordering and induction
5 Modular arithmetic
5.1 Modular arithmetic
5.2 Multiple moduli
5.3 Prime moduli
5.4 Public-key (asymmetric) cryptography
6 Real numbers
6.1 Construction of numbers
6.2 Sequences
6.3 Series
6.4 Irrational numbers
6.5 Euler’s number
6.6 Algebraic numbers
7 Countability
0 Introduction
According to the Faculty, this course is not aimed at teaching you any new
knowledge. In particular, the Faculty says
This course is concerned not so much with teaching you new parts
of mathematics . . .
Instead, this course is intended to teach you about how to do maths properly.
The objective of this course is to start from a few axioms, which are assumptions
about our system, and try to prove everything rigorously from the axioms.
This is different from how mathematics is usually done in secondary school.
In secondary school, we just accept that certain statements are true. For example,
probably no one rigorously proved to you that each natural number has a unique
prime factorization. In this course, nothing is handwaved. Everything will be
obtained as a logical consequence of the axioms and previous (proven) results.
In the course, you shouldn’t focus too much on the content itself. Instead,
the major takeaway should be how we do mathematics. In particular, how we
construct and present arguments.
The actual content of this course is rather diverse. As the name suggests,
the course touches on numbers and sets. The “numbers” part is mostly basic
number theory, which you may have come across if you have participated in
mathematical olympiads. We also study some properties of real numbers, but
the actual serious study of real numbers is done in IA Analysis I.
The “sets” part starts with some basic definitions of sets, functions and
relations, which are really important and will crop up in all courses you will
encounter. In some sense, these form the language in which mathematics is
written. At the end of the course, we will touch on countability, which tells us
how “big” a set is.
1 Proofs and logic
1.1 Proofs
As the first course in pure mathematics, we will have an (informal) look at proofs
and logic.
In mathematics, we often want to prove things. This is different from most
other disciplines. For example, in science, we perform experiments to convince
ourselves that our theory is correct. However, no matter how many experiments
we do, we still cannot be absolutely sure that our theory is correct. For example,
Newtonian mechanics was believed to be correct for a long time, but is nowadays
only considered to be an approximation of reality.
On the other hand, when we prove a theorem in mathematics, we are
completely sure that the theorem is true. Also, when we actually prove a
theorem, (hopefully) we can also understand why it is true, and gain further
insight.
Definition (Proof). A proof is a sequence of true statements, without logical gaps, that is a logical argument establishing some conclusion.
To prove things, we need to start from some assumptions. These assumptions
are known as axioms. When we call something an axiom, it does not mean that
we take these statements to be true without questioning. Instead, we are saying
“if we assume these axioms, then these results hold”. Two people can disagree on
what the axioms are and still be friends.
We also tend to define concepts as unambiguously as possible. Of course, just
like a dictionary cannot define all words without being circular, we do not define
everything in mathematics. To prove things, we have to start somewhere, with
some agreed assumptions (axioms). We also don’t define everything rigorously
(or else how could one start speaking?).
In mathematics, we are often concerned about truth. Often, we only care
about statements that can take some truth value.
Definition (Statement). A statement is a sentence that can have a truth value.
Example. The following are statements:
(i) There are infinitely many primes of the form n^2 + 1
(ii) There is always a prime number between n and 2n
(iii) There is no computer program that can factorize an n-digit number in n^3 steps
(iv) For every polynomial p(x) = a_n x^n + a_{n−1} x^{n−1} + ··· + a_0, where the a_i are complex numbers, n ≥ 1 and a_n ≠ 0, there exists a (possibly complex) z such that p(z) = 0
(v) m × n = n × m for all natural numbers n, m
(vi) 2 + 2 = 4
The current status (as of 2015) of these statements is:
(i) No one has a proof, but it is probably true
(ii) This is known to be true
(iii) No one knows (this is related to the P = NP problem)
(iv) This is known to be true (the Fundamental Theorem of Algebra)
(v) This is known to be true
(vi) This is known to be true (obviously; does this even need to be proved?)
1.2 Examples of proofs
Apart from having a proof, it is very important that a proof is correct. Here we
will look at some examples of proofs and non-proofs.
We first start with a simple example.
Proposition. For all natural numbers n, n^3 − n is a multiple of 3.
Proof. We have n^3 − n = (n − 1)n(n + 1). One of the three consecutive integers is divisible by 3. Hence so is their product.
Proposition. If n^2 is even, then so is n.
Proof. If n is even, then n = 2k for some integer k. Then n^2 = 4k^2, which is even.
This is incorrect! We wanted to prove "n^2 is even" ⇒ "n is even", but what we proved is "n is even" ⇒ "n^2 is even", which are distinct statements.
Instead, a correct proof is as follows:
Proof. Suppose n is odd. Then n = 2k + 1 for some integer k. Then n^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1, which is odd. This contradicts our assumption that n^2 is even.
This is an example of proof by contradiction. We assume what we want to
prove is false, and show that this leads to nonsense.
Proposition. The solutions to x^2 − 5x + 6 = 0 are x = 2 and x = 3.
Note that these are actually 2 different statements:
(i) x = 2 and x = 3 are solutions
(ii) There are no other solutions
We can write this as an "if and only if" statement: x is a solution if and only if x = 2 or x = 3. Alternatively, we say "x is a solution iff x = 2 or x = 3"; or "x is a solution ⇔ x = 2 or x = 3".
Proof.
(i) If x = 2 or x = 3, then x − 2 = 0 or x − 3 = 0. So (x − 2)(x − 3) = 0.
(ii) If x^2 − 5x + 6 = 0, then (x − 2)(x − 3) = 0. So x − 2 = 0 or x − 3 = 0. Then x = 2 or x = 3.
Note that the second direction is simply the first argument reversed. We can write this all in one go:
x = 3 or x = 2 ⇔ x − 3 = 0 or x − 2 = 0
⇔ (x − 3)(x − 2) = 0
⇔ x^2 − 5x + 6 = 0
Note that we used the "if and only if" sign between all lines.
We’ll do another non-proof.
Proposition. Every positive number is ≥ 1.
Proof. Let r be the smallest positive real. Then either r < 1, r = 1 or r > 1.
If r < 1, then 0 < r^2 < r. Contradiction. If r > 1, then 0 < √r < r. Contradiction. So r = 1.
Now this is obviously false, since 0.5 < 1. The problem with this proof is that the smallest positive real need not exist. So we have to make sure we justify all our claims.
1.3 Logic
Mathematics is full of logical statements, which are made of statements and
logical connectives. Usually, we use shorthands for the logical connectives.
Let P and Q be statements. Then P ∧ Q stands for "P and Q"; P ∨ Q stands for "P or Q"; P ⇒ Q stands for "P implies Q"; P ⇔ Q stands for "P iff Q"; ¬P stands for "not P". The truth of these statements depends on the truth of P and Q. It can be shown by a truth table:

P Q | P ∧ Q  P ∨ Q  P ⇒ Q  P ⇔ Q  ¬P
T T |   T      T      T      T     F
T F |   F      T      F      F     F
F T |   F      T      T      F     T
F F |   F      F      T      T     T

Certain logical propositions are equivalent, which we denote using the ⇔ sign. For example,
¬(P ∧ Q) ⇔ (¬P ∨ ¬Q),
or
(P ⇒ Q) ⇔ (¬P ∨ Q) ⇔ (¬Q ⇒ ¬P).
By convention, negation has the highest precedence when bracketing. For example, ¬P ∨ ¬Q should be bracketed as (¬P) ∨ (¬Q).
We also have quantifiers. (∀x)P(x) means "for all x, P(x) is true", while (∃x)P(x) means "there exists x such that P(x) is true".
The quantifiers are usually bounded, i.e. we write ∀x ∈ X or ∃x ∈ X to mean "for all x in the set X" and "there exists x in the set X" respectively.
Quantifiers are negated as follows:
¬(∀x)P(x) ⇔ (∃x)(¬P(x));
¬(∃x)P(x) ⇔ (∀x)(¬P(x)).
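The following short Python check is my own addition (not part of the lectures): it simply runs through the truth table above and confirms the stated equivalences for every assignment of P and Q.

```python
from itertools import product

# Check the equivalences above over all rows of the truth table.
for P, Q in product([False, True], repeat=2):
    # De Morgan: not (P and Q)  <=>  (not P) or (not Q)
    assert (not (P and Q)) == ((not P) or (not Q))
    # (P => Q)  <=>  (not P) or Q  <=>  (not Q) => (not P)
    implies = (not P) or Q
    contrapositive = (not (not Q)) or (not P)
    assert implies == contrapositive
print("all rows of the truth table agree")
```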
2 Sets, functions and relations
In this chapter, we will look at the basic building blocks of mathematics, namely
sets, functions and relations. Most of mathematics can be expressed in terms of
these notions, and it is helpful to be familiar with the relevant terms.
2.1 Sets
Definition (Set). A set is a collection of stuff, without regard to order. Elements in a set are only counted once. For example, if a = 2, b = c = 1, then A = {a, b, c} has only two members. We write x ∈ X if x is a member of the set X.
Example. Common sets and the symbols used to denote them:
N = {1, 2, 3, ···} is the natural numbers
N_0 = {0, 1, 2, ···} is the natural numbers with 0
Z = {··· , −2, −1, 0, 1, 2, ···} is the integers
Q = {a/b : a, b ∈ Z, b ≠ 0} is the rational numbers
R is the real numbers
C is the complex numbers
It is still debated whether 0 is a natural number. Those who believe that 0 is a natural number usually write N for {0, 1, 2, ···}, and N^+ for the positive natural numbers. However, most of the time, it doesn't matter, and when it does, you should specify it explicitly.
Definition (Equality of sets). A is equal to B, written as A = B, if
(∀x) x ∈ A ⇔ x ∈ B,
i.e. two sets are equal if they have the same elements.
Definition (Subsets). A is a subset of B, written as A ⊆ B or A ⊂ B, if all elements in A are in B, i.e.
(∀x) x ∈ A ⇒ x ∈ B.
Theorem. (A = B) ⇔ (A ⊆ B and B ⊆ A)
Suppose X is a set and P is a property of some elements of X. We can write a set {x ∈ X : P(x)} for the subset of X comprising the elements for which P(x) is true, e.g. {n ∈ N : n is prime} is the set of all primes.
Definition (Intersection, union, set difference, symmetric difference and power set). Given two sets A and B, we define the following:
Intersection: A ∩ B = {x : x ∈ A and x ∈ B}
Union: A ∪ B = {x : x ∈ A or x ∈ B}
Set difference: A \ B = {x ∈ A : x ∉ B}
Symmetric difference: A Δ B = {x : x ∈ A xor x ∈ B}, i.e. the elements in exactly one of the two sets
Power set: P(A) = {X : X ⊆ A}, i.e. the set of all subsets
New sets can only be created via the above operations on old sets (plus replacement, which says that you can replace an element of a set with another element). One cannot arbitrarily create sets such as X = {x : x is a set and x ∉ x}. Otherwise paradoxes will arise.
We have several rules regarding how these set operations behave, which
should be intuitively obvious.
Proposition.
(A ∩ B) ∩ C = A ∩ (B ∩ C)
(A ∪ B) ∪ C = A ∪ (B ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
Notation. If A_α are sets for all α ∈ I, then
⋂_{α∈I} A_α = {x : (∀α ∈ I) x ∈ A_α}
and
⋃_{α∈I} A_α = {x : (∃α ∈ I) x ∈ A_α}.
Definition (Ordered pair). An ordered pair (a, b) is a pair of two items in which order matters. Formally, it is defined as {{a}, {a, b}}. We have (a, b) = (a', b') iff a = a' and b = b'.
Definition (Cartesian product). Given two sets A, B, the Cartesian product of A and B is A × B = {(a, b) : a ∈ A, b ∈ B}. This can be extended to n products, e.g. R^3 = R × R × R = {(x, y, z) : x, y, z ∈ R} (which is officially {(x, (y, z)) : x, y, z ∈ R}).
2.2 Functions
Definition (Function/map). A function (or map) f : A → B is a "rule" that assigns, for each a ∈ A, precisely one element f(a) ∈ B. We can write a ↦ f(a). A and B are called the domain and co-domain respectively.
If we wish to be very formal, we can define a function to be a subset f ⊆ A × B such that for any a ∈ A, there exists a unique b ∈ B such that (a, b) ∈ f. We then think of (a, b) ∈ f as saying f(a) = b. However, while this might act as a formal definition of a function, it is a terrible way to think about functions.
Example. x^2 : R → R is a function that sends x to x^2. 1/x : R → R is not a function since f(0) is not defined. ±x : R → R is also not a function since it is multi-valued.
It is often helpful to categorize functions into different categories.
Definition (Injective function). A function f : X → Y is injective if it hits everything at most once, i.e.
(∀x, y ∈ X) f(x) = f(y) ⇒ x = y.
Definition (Surjective function). A function f : X → Y is surjective if it hits everything at least once, i.e.
(∀y ∈ Y)(∃x ∈ X) f(x) = y.
Example. f : R → R^+ ∪ {0} with x ↦ x^2 is surjective but not injective.
Definition (Bijective function). A function is bijective if it is both injective and surjective, i.e. it hits everything exactly once.
Definition (Permutation). A permutation of A is a bijection A → A.
Definition (Composition of functions). The composition of two functions is a function you get by applying one after another. In particular, if f : X → Y and g : Y → Z, then g ∘ f : X → Z is defined by (g ∘ f)(x) = g(f(x)). Note that function composition is associative.
Definition (Image of function). If f : A → B and U ⊆ A, then f(U) = {f(u) : u ∈ U}. f(A) is the image of A.
By definition, f is surjective iff f(A) = B.
Definition (Pre-image of function). If f : A → B and V ⊆ B, then f^{−1}(V) = {a ∈ A : f(a) ∈ V}.
This is the pre-image of V under the function f, and it acts on subsets of B. It is defined for any function f. It is important to note that we use the same symbol f^{−1} to denote the inverse function, which we will define later, but they are very distinct entities. For example, we will see that the inverse function exists only for bijective functions.
To define the inverse function, we will first need some preliminary definitions.
Definition (Identity map). The identity map id_A : A → A is defined as the map a ↦ a.
Definition (Left inverse of function). Given f : A → B, a left inverse of f is a function g : B → A such that g ∘ f = id_A.
Definition (Right inverse of function). Given f : A → B, a right inverse of f is a function g : B → A such that f ∘ g = id_B.
Theorem. The left inverse of f exists iff f is injective.
Proof. (⇒) If the left inverse g exists, then for all a, a' ∈ A, f(a) = f(a') ⇒ g(f(a)) = g(f(a')) ⇒ a = a'. Therefore f is injective.
(⇐) If f is injective, we can construct a g defined by
g(b) = a if b ∈ f(A), where f(a) = b;
g(b) = anything otherwise.
Then g is a left inverse of f.
Theorem. The right inverse of f exists iff f is surjective.
Proof. (⇒) We have f(g(B)) = B since f ∘ g is the identity function. Thus f must be surjective since its image is B.
(⇐) If f is surjective, we can construct a g such that for each b ∈ B, we pick one a ∈ A with f(a) = b, and put g(b) = a.
(Note that to prove the second part, for each b, we need to pick an a such that f(a) = b. If B is infinite, doing so involves making infinitely many arbitrary choices. Are we allowed to do so?
To make infinitely many choices, we need to use the Axiom of Choice, which explicitly says that this is allowed. In particular, it says that given a family of sets A_i for i ∈ I, there exists a choice function f : I → ⋃ A_i such that f(i) ∈ A_i for all i.
So can we prove the theorem without the Axiom of Choice? The answer is no, since if we assume surjective functions have right inverses, then we can prove the Axiom of Choice.
Assume any surjective function f has a right inverse. Given a family of non-empty sets A_i for i ∈ I (wlog assume they are disjoint), define a function f : ⋃ A_i → I that sends each element to the set that contains it. This is surjective since each set is non-empty. Then it has a right inverse, and the right inverse must send each set to an element in that set, i.e. it is a choice function for the A_i.)
Definition (Inverse of function). An inverse of f is a function that is both a left inverse and a right inverse. It is written as f^{−1} : B → A. It exists iff f is bijective, and is then necessarily unique.
2.3 Relations
Definition (Relation). A relation R on A specifies that some elements of A are related to some others. Formally, a relation is a subset R ⊆ A × A. We write aRb iff (a, b) ∈ R.
Example. The following are examples of relations on natural numbers:
(i) aRb iff a and b have the same final digit, e.g. (37)R(57).
(ii) aRb iff a divides b, e.g. 2R6 holds but 2R7 does not.
(iii) aRb iff a ≠ b.
(iv) aRb iff a = b = 1.
(v) aRb iff |a − b| ≤ 3.
(vi) aRb iff either a, b ≥ 5 or a, b ≤ 4.
Again, we wish to classify different relations.
Definition (Reflexive relation). A relation R is reflexive if
(∀a) aRa.
Definition (Symmetric relation). A relation R is symmetric iff
(∀a, b) aRb ⇔ bRa.
Definition (Transitive relation). A relation R is transitive iff
(∀a, b, c) aRb ∧ bRc ⇒ aRc.
Example. With regards to the examples above,

Examples     (i)  (ii)  (iii)  (iv)  (v)  (vi)
Reflexive     ✓    ✓     ✗     ✗    ✓    ✓
Symmetric     ✓    ✗     ✓     ✓    ✓    ✓
Transitive    ✓    ✓     ✗     ✓    ✗    ✓

Definition (Equivalence relation). A relation is an equivalence relation if it is reflexive, symmetric and transitive, e.g. (i) and (vi) in the above examples are equivalence relations.
If it is an equivalence relation, we usually write ∼ instead of R. As the name suggests, equivalence relations are used to describe relations that are similar to equality. For example, if we want to represent rational numbers as pairs of integers, we might have an equivalence relation defined by (n, m) ∼ (p, q) iff nq = mp, such that two pairs are equivalent if they represent the same rational number.
Example. If we consider a deck of cards, define two cards to be related if they have the same suit.
As mentioned, we like to think of things related by ∼ as equal. Hence we want to identify all "equal" things together and form one new object.
Definition (Equivalence class). If ∼ is an equivalence relation, then the equivalence class [x] is the set of all elements that are related via ∼ to x.
Example. In the cards example, [8♥] is the set of all hearts.
Definition (Partition of set). A partition of a set X is a collection of subsets A_α of X such that each element of X is in exactly one of the A_α.
Theorem. If ∼ is an equivalence relation on A, then the equivalence classes of ∼ form a partition of A.
Proof. By reflexivity, we have a ∈ [a]. Thus the equivalence classes cover the whole set. We must now show that for all a, b ∈ A, either [a] = [b] or [a] ∩ [b] = ∅.
Suppose [a] ∩ [b] ≠ ∅. Then there is some c ∈ [a] ∩ [b]. So a ∼ c and b ∼ c. By symmetry, c ∼ b. By transitivity, we have a ∼ b. For all b' ∈ [b], we have b ∼ b'. Thus by transitivity, we have a ∼ b'. Thus [b] ⊆ [a]. By symmetry, [a] ⊆ [b], and so [a] = [b].
On the other hand, each partition defines an equivalence relation in which
two elements are related iff they are in the same partition. Thus partitions and
equivalence relations are “the same thing”.
Definition (Quotient map). The quotient map q maps each element in A to the equivalence class containing it, i.e. a ↦ [a], e.g. q(8♥) is the set of all hearts.
3 Division
If you think you already know how to divide, well perhaps you are right. However,
in this chapter, we will define these notions formally and properly prove things
we already know (plus maybe some new things).
3.1 Euclid’s Algorithm
Definition (Factor of integers). Given a, b ∈ Z, we say a divides b, a is a factor of b, or a | b, if (∃c ∈ Z) b = ac. For any b, ±1 and ±b are always factors of b. The other factors are called proper factors.
Theorem (Division Algorithm). Given a, b ∈ Z with b ≠ 0, there are unique q, r ∈ Z with a = qb + r and 0 ≤ r < b.
Despite the name, the division algorithm is not an algorithm in the usual
sense. Instead, it merely states that you can divide. Even the proof does not
specify a (non-brute force) way of how to divide.
Proof. Choose q = max{q : qb ≤ a}. This maximum exists because the set of all q such that qb ≤ a is non-empty and bounded above. Now write r = a − qb. We have 0 ≤ r < b, and thus q and r are found.
To show that they are unique, suppose that a = qb + r = q'b + r'. Then (q − q')b = r' − r. Since both r and r' are between 0 and b, we have −b < r' − r < b. However, r' − r is a multiple of b. Thus q − q' = r' − r = 0. Consequently, q = q' and r = r'.
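As an aside (my own illustration, not from the lectures), Python's built-in divmod returns exactly the q and r guaranteed by the division algorithm when b > 0:

```python
# divmod(a, b) gives the q, r of the division algorithm (for positive b).
a, b = 57, 42                 # the pair used in the worked example later on
q, r = divmod(a, b)
assert a == q * b + r and 0 <= r < b
print(q, r)                   # prints: 1 15
```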
Definition (Common factor of integers). A common factor of a and b is a number c ∈ Z such that c | a and c | b.
Definition (Highest common factor/greatest common divisor). The highest common factor or greatest common divisor of two numbers a, b ∈ N is a number d ∈ N such that d is a common factor of a and b, and if c is also a common factor, then c | d.
Clearly if the hcf exists, it must be the largest common factor, since all other common factors divide it, and it is thus necessarily unique.
You might reasonably think it more natural to define hcf(a, b) to be the largest common factor, and then show that it has the property that all common factors divide it. But the above definition is superior because it does not require a prior ordering of the natural numbers (and it can be extended to any ring, even one that is not ordered, as we will see in IB Groups, Rings and Modules).
Notation. We write d = hcf(a, b) = gcd(a, b) = (a, b).
Here (a, b) stands for a number, and has nothing to do with an ordered pair.
Proposition. If c | a and c | b, then c | (ua + vb) for all u, v ∈ Z.
Proof. By definition, we have a = kc and b = lc. Then ua + vb = ukc + vlc = (uk + vl)c. So c | (ua + vb).
Theorem. Let a, b ∈ N. Then (a, b) exists.
Proof. Let S = {ua + vb : u, v ∈ Z} be the set of all linear combinations of a, b. Let d be the smallest positive member of S, say d = xa + yb. Hence if c | a and c | b, then c | d. So we need to show that d | a and d | b, and thus d = (a, b).
By the division algorithm, there exist numbers q, r ∈ Z with a = qd + r and 0 ≤ r < d. Then r = a − qd = a(1 − qx) − qyb. Therefore r is a linear combination of a and b. Since d is the smallest positive member of S and 0 ≤ r < d, we have r = 0 and thus d | a. Similarly, we can show that d | b.
Corollary (from the proof). Let d = (a, b). Then d is the smallest positive linear combination of a and b.
Corollary (Bézout's identity). Let a, b ∈ N and c ∈ Z. Then there exist u, v ∈ Z with c = ua + vb iff (a, b) | c.
Proof. (⇒) Let d = (a, b). If c is a linear combination of a and b, then d | c because d | a and d | b.
(⇐) Suppose that d | c. Let d = xa + yb and c = kd. Then c = (kx)a + (ky)b. Thus c is a linear combination of a and b.
Note that the proof that (a, b) exists is existential, not constructive. How can we actually find d, and how can we find x, y such that d = xa + yb?
While it might be easy to simply inspect d for small numbers, how would you find common factors of, say, 4931 and 3795? We cannot use primes because (a) prime factorization is hard; and (b) primes are not yet defined.
You might spot that if c | 4931 and c | 3795, then c | (4931 − 3795) = 1136. The process is also reversible: if c | 1136 and c | 3795, then c | (1136 + 3795) = 4931. Thus the problem is equivalent to finding common factors of 3795 and 1136. The process can be repeated until we have small numbers.
Proposition (Euclid's Algorithm). If we continuously break down a and b by the following procedure:
a = q_1 b + r_1
b = q_2 r_1 + r_2
r_1 = q_3 r_2 + r_3
···
r_{n−2} = q_n r_{n−1}
then the highest common factor is r_{n−1}.
Proof. We have (common factors of a, b) = (common factors of b, r_1) = (common factors of r_1, r_2) = ··· = (factors of r_{n−1}).
This gives an alternative proof that hcfs exist.
How efficient is this algorithm? For every step, we have a ≥ b + r_1 > 2r_1. Thus every two steps, the number on the left goes down by at least half. Hence the number of digits goes down at least once every 8 steps, so the time needed is at most about 8 × (number of digits), and the algorithm has time complexity O(log b).
Example. Suppose a = 57 and b = 42.
common factors of 57 and 42      (57 = 1 × 42 + 15)
= common factors of 42 and 15    (42 = 2 × 15 + 12)
= common factors of 15 and 12    (15 = 1 × 12 + 3)
= common factors of 12 and 3     (12 = 4 × 3 + 0)
= common factors of 3 and 0
= factors of 3.
So the hcf is 3.
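Here is a short Python sketch (my own, not from the lectures) of the procedure we just carried out by hand; hcf is simply an illustrative helper name.

```python
def hcf(a, b):
    """Highest common factor by repeatedly replacing (a, b) with (b, r)."""
    while b != 0:
        a, b = b, a % b       # a = qb + r: keep b and the remainder r
    return a

assert hcf(57, 42) == 3       # agrees with the worked example above
print(hcf(4931, 3795))        # the pair mentioned earlier; prints 1
```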
By reversing Euclid's Algorithm, we can express the hcf of two numbers as a linear combination of a and b.
Example. Consider 57 and 21.
57 = 2 × 21 + 15
21 = 1 × 15 + 6
15 = 2 × 6 + 3
6 = 2 × 3
In the opposite direction, we have
3 = 15 − 2 × 6
  = 15 − 2 × (21 − 15)
  = 3 × 15 − 2 × 21
  = 3 × (57 − 2 × 21) − 2 × 21
  = 3 × 57 − 8 × 21
This gives an alternative, constructive proof of Bézout's identity. Moreover, it gives us a quick way of expressing (a, b) = ax + by. However, this algorithm requires storing the whole run of Euclid's Algorithm, and so is not efficient space-wise.
To achieve higher space efficiency, we attempt to find a recurrence relation for coefficients A_j, B_j such that a × B_j − b × A_j = (−1)^j r_j. The possible factor of −1 is there just so that the recurrence relation will look nicer. Suppose that this is satisfied for all indices less than j. Then we have
(−1)^j r_j = (−1)^j (r_{j−2} − q_j r_{j−1})
           = (−1)^{j−2} r_{j−2} + q_j (−1)^{j−1} r_{j−1}
           = a(B_{j−2} + q_j B_{j−1}) − b(A_{j−2} + q_j A_{j−1}).
Hence we can obtain the following recurrence relation:
A_j = q_j A_{j−1} + A_{j−2}
B_j = q_j B_{j−1} + B_{j−2}
with
a × B_j − b × A_j = (−1)^j r_j.
In particular, a × B_{n−1} − b × A_{n−1} = (−1)^{n−1} r_{n−1} = ±(a, b).
Also, by an easy induction, A_j B_{j−1} − B_j A_{j−1} = (−1)^j. So (A_j, B_j) = 1.
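Here is a small Python sketch (mine, not from the notes) of the same idea carried out iteratively; it keeps only the last two coefficients, in the spirit of the A_j, B_j recurrence, though not in exactly the same notation. The name bezout is my own.

```python
def bezout(a, b):
    """Return (d, x, y) with d = hcf(a, b) and d = x*a + y*b."""
    x0, y0, x1, y1 = 1, 0, 0, 1   # invariant: a = x0*a0 + y0*b0, b = x1*a0 + y1*b0
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r
        x0, x1 = x1, x0 - q * x1  # update the two stored coefficient pairs
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0

d, x, y = bezout(57, 21)
assert (d, x, y) == (3, 3, -8)    # 3 = 3 * 57 - 8 * 21, as found above
```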
These coefficients also play another role. We can put the equations of Euclid's Algorithm in the following form:
57/21 = 2 + 15/21
21/15 = 1 + 6/15
15/6 = 2 + 3/6
6/3 = 2
Then we can write out the fraction 57/21 in continued fraction form:
57/21 = 2 + 1/(1 + 1/(2 + 1/2)).
Expanding this continued fraction term by term, we get the sequence 2, 2 + 1/1 = 3, 2 + 1/(1 + 1/2) = 8/3, ···. These are called the "convergents". The sequence happens to be A_i/B_i.
3.2 Primes
There are a lot of ways we can define prime numbers in
N
. The definition we
will use is the following:
Definition (Prime number). p ∈ N is a prime if p > 1 and the only factors of p (in Z) are ±1 and ±p.
In this chapter, the objective is to prove things we already know and think
are obvious.
Theorem. Every number can be written as a product of primes.
Proof. If n ∈ N is not a prime itself, then by definition n = ab. If either a or b is not prime, then that number can be written as a product, say b = cd. Then n = acd, and so on. Since these numbers are getting smaller, the process will stop when they are all prime.
In the proof, we handwaved a bit when we said “and so on”. We will later
come up with the principle of (strong) induction that rigorously justifies this.
This is the case for many proofs we will have here.
Theorem. There are infinitely many primes.
Proof. (Euclid's proof) Suppose there are finitely many primes, say p_1, p_2, ··· , p_n. Then N = p_1 p_2 ··· p_n + 1 is divisible by none of these primes. Otherwise, p_j | (N − p_1 p_2 ··· p_n), i.e. p_j | 1, which is impossible. However, N is a product of primes, so there must be primes not amongst p_1, p_2, ··· , p_n.
Proof. (Erdős 1930) Suppose that there are finitely many primes, p_1, p_2, ··· , p_k. Consider all numbers that are products of these primes, i.e. p_1^{j_1} p_2^{j_2} ··· p_k^{j_k}, where j_i ≥ 0. Factor out all squares to write each such number in the form m^2 p_1^{i_1} p_2^{i_2} ··· p_k^{i_k}, where m ∈ N and each i_j = 0 or 1.
Let N ∈ N. Given any number x ≤ N, when put in the above form, we have m ≤ √N. So there are at most √N possible values of m. For each m, there are 2^k numbers of the form m^2 p_1^{i_1} p_2^{i_2} ··· p_k^{i_k}. So there are only √N × 2^k possible values of x of this kind.
Now pick N > 4^k. Then N > √N × 2^k. So there must be a number ≤ N not of this form, i.e. it has a prime factor not in the list.
Historically, many people have come up with "new" proofs that there are infinitely many primes. However, most of these proofs were just Euclid's proof in disguise. Erdős' proof is genuinely a new proof. For example, Euclid's proof comes up with a particular number N, and says all its factors are not in the list of primes. On the other hand, Erdős' proof says that there is some number, which we don't know, with at least one prime factor not in the list.
Also, the proofs give different bounds on when we should expect to see the k-th prime. For example, Euclid tells us that the k-th prime must be less than 2^{2^k}, while Erdős tells us it is less than 4^k.
Theorem. If a | bc and (a, b) = 1, then a | c.
Proof. From Euclid's algorithm, there exist integers u, v ∈ Z such that ua + vb = 1. Multiplying by c, we have uac + vbc = c. Since a | bc, a divides the left-hand side. So a | c.
Definition (Coprime numbers). We say a, b are coprime if (a, b) = 1.
Corollary. If p is a prime and p | ab, then p | a or p | b. (True for all p, a, b)
Proof. We know that (p, a) = p or 1 because p is a prime. If (p, a) = p, then p | a. Otherwise, (p, a) = 1 and p | b by the theorem above.
Corollary. If p is a prime and p | n_1 n_2 ··· n_k, then p | n_i for some i.
Note that when we defined primes, we defined them in terms of the factors of p. This corollary is the opposite: it is about how p behaves as a factor of other numbers.
Theorem (Fundamental Theorem of Arithmetic). Every natural number is expressible as a product of primes in exactly one way. In particular, if p_1 p_2 ··· p_k = q_1 q_2 ··· q_l, where the p_i, q_i are primes but not necessarily distinct, then k = l and q_1, ··· , q_l are p_1, ··· , p_k in some order.
Proof. Since we already showed that there is at least one way above, we only need to show uniqueness.
Let p_1 ··· p_k = q_1 ··· q_l. We know that p_1 | q_1 ··· q_l = q_1(q_2 q_3 ··· q_l). Thus p_1 | q_i for some i; wlog assume i = 1. Then p_1 = q_1 since both are primes. Thus p_2 p_3 ··· p_k = q_2 q_3 ··· q_l. Likewise, we have p_2 = q_2, ··· and so on.
Corollary. Let a = p_1^{i_1} p_2^{i_2} ··· p_r^{i_r} and b = p_1^{j_1} p_2^{j_2} ··· p_r^{j_r}, where the p_k are distinct primes (exponents can be zero). Then (a, b) = ∏_k p_k^{min{i_k, j_k}}. Likewise, lcm(a, b) = ∏_k p_k^{max{i_k, j_k}}. We have hcf(a, b) × lcm(a, b) = ab.
However, this is not an efficient way to calculate (a, b), since prime factorization is very hard.
Note that this is a property peculiar to the natural numbers. There are "arithmetical systems" (permitting addition, multiplication and subtraction) where factorization is not unique, e.g. the even numbers.
Example. The following systems do not have unique prime factorization:
(i) The even numbers. Here the "primes" are twice an odd number. So 6 is a prime (NOT divisible by 2 within this system!) while 8 is not. We have 60 = 2 × 30 = 6 × 10, where 2, 6, 10, 30 are primes. However, this example is not "proper" since there is no identity element (i.e. it is not a ring).
(ii) Consider Z[√−5] = {a + b√−5 : a, b ∈ Z}. We have 6 = 2 × 3 = (1 − √−5)(1 + √−5). It can be shown that these factors are primes (see IB Groups, Rings and Modules).
Exercise: where does the proof of the Fundamental Theorem of Arithmetic fail in these examples?
4 Counting and integers
This chapter exists because experience shows that mathematicians do not know
how to count.
4.1 Basic counting
A useful theorem is the pigeonhole principle.
Theorem (Pigeonhole Principle). If we put mn + 1 pigeons into n pigeonholes, then some pigeonhole has at least m + 1 pigeons.
Example.
In Cambridge, there are 2 people who have the same number of
hairs.
Another useful tool for counting is the indicator function.
Definition (Indicator function/characteristic function). Let X be a set. For each A ⊆ X, the indicator function or characteristic function of A is the function i_A : X → {0, 1} with i_A(x) = 1 if x ∈ A, and 0 otherwise. It is sometimes written as χ_A.
Proposition.
(i) i_A = i_B ⇔ A = B
(ii) i_{A∩B} = i_A i_B
(iii) i_Ā = 1 − i_A
(iv) i_{A∪B} = 1 − i_{\overline{A∪B}} = 1 − i_{Ā∩B̄} = 1 − i_Ā i_B̄ = 1 − (1 − i_A)(1 − i_B) = i_A + i_B − i_{A∩B}
(v) i_{A\B} = i_{A∩B̄} = i_A i_B̄ = i_A(1 − i_B) = i_A − i_{A∩B}
Example. We can use indicator functions to prove certain properties about sets:
(i) Proof that A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C):
i_{A∩(B∪C)} = i_A i_{B∪C} = i_A (i_B + i_C − i_B i_C) = i_A i_B + i_A i_C − i_A i_B i_C
i_{(A∩B)∪(A∩C)} = i_{A∩B} + i_{A∩C} − i_{A∩B} i_{A∩C} = i_A i_B + i_A i_C − i_A i_B i_A i_C = i_A i_B + i_A i_C − i_A i_B i_C
Therefore i_{A∩(B∪C)} = i_{(A∩B)∪(A∩C)}, and thus A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
Note that i_A = i_A^2, since i_A = 0 or 1, and 0^2 = 0 and 1^2 = 1.
(ii) Proof that the symmetric difference is associative: observe that i_{AΔB} ≡ i_A + i_B (mod 2). Thus i_{(AΔB)ΔC} = i_{AΔ(BΔC)} ≡ i_A + i_B + i_C (mod 2).
Indicator functions are handy for computing the sizes of finite sets, because if A ⊆ X, then |A| = Σ_{x∈X} i_A(x).
Proposition. |A ∪ B| = |A| + |B| − |A ∩ B|.
Proof.
|A ∪ B| = Σ_{x∈X} i_{A∪B}(x)
        = Σ (i_A(x) + i_B(x) − i_{A∩B}(x))
        = Σ i_A(x) + Σ i_B(x) − Σ i_{A∩B}(x)
        = |A| + |B| − |A ∩ B|
More importantly, we will use indicator functions to prove a powerful result.
Theorem (Inclusion-Exclusion Principle). Let A_i be subsets of a finite set X, for 1 ≤ i ≤ n. Then
|Ā_1 ∩ ··· ∩ Ā_n| = |X| − Σ_i |A_i| + Σ_{i<j} |A_i ∩ A_j| − ··· + (−1)^n |A_1 ∩ ··· ∩ A_n|.
Equivalently,
|A_1 ∪ ··· ∪ A_n| = Σ_i |A_i| − Σ_{i<j} |A_i ∩ A_j| + ··· + (−1)^{n−1} |A_1 ∩ ··· ∩ A_n|.
The two forms are equivalent since |A_1 ∪ ··· ∪ A_n| = |X| − |Ā_1 ∩ ··· ∩ Ā_n|.
Proof. Using indicator functions,
i_{Ā_1∩Ā_2∩···∩Ā_n} = ∏_j i_{Ā_j} = ∏_j (1 − i_{A_j})
= 1 − Σ_i i_{A_i} + Σ_{i<j} i_{A_i} i_{A_j} − ··· + (−1)^n i_{A_1} i_{A_2} ··· i_{A_n}
= 1 − Σ_i i_{A_i} + Σ_{i<j} i_{A_i∩A_j} − ··· + (−1)^n i_{A_1∩A_2∩···∩A_n}.
Thus
|Ā_1 ∩ ··· ∩ Ā_n| = Σ_{x∈X} i_{Ā_1∩Ā_2∩···∩Ā_n}(x)
= Σ_x 1 − Σ_i Σ_x i_{A_i}(x) + Σ_{i<j} Σ_x i_{A_i∩A_j}(x) − ··· + Σ_x (−1)^n i_{A_1∩A_2∩···∩A_n}(x)
= |X| − Σ_i |A_i| + Σ_{i<j} |A_i ∩ A_j| − Σ_{i<j<k} |A_i ∩ A_j ∩ A_k| + ··· + (−1)^n |A_1 ∩ A_2 ∩ ··· ∩ A_n|
Example. How many numbers ≤ 200 are coprime to 110?
Let X = {1, ··· , 200}, and let A_1 = {x : 2 | x}, A_2 = {x : 5 | x}, A_3 = {x : 11 | x}. We know that
|A_1| = ⌊200/2⌋ = 100
|A_2| = ⌊200/5⌋ = 40
|A_3| = ⌊200/11⌋ = 18
|A_1 ∩ A_2| = ⌊200/10⌋ = 20
|A_1 ∩ A_3| = ⌊200/22⌋ = 9
|A_2 ∩ A_3| = ⌊200/55⌋ = 3
|A_1 ∩ A_2 ∩ A_3| = ⌊200/110⌋ = 1
Then the answer is 200 − 100 − 40 − 18 + 20 + 9 + 3 − 1 = 73.
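For what it's worth, the count can be confirmed by brute force (my own check, not part of the notes):

```python
from math import gcd

# Brute-force confirmation of the inclusion-exclusion count above.
count = sum(1 for x in range(1, 201) if gcd(x, 110) == 1)
assert count == 73
```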
4.2 Combinations
“Combinations” is about counting the ways we can pick things without regard
to order. We can formulate these problems in the language of sets: given a set
X
, how many subsets of
X
are there that satisfy some particular properties?
For example, if we want to pick 3 people from 10, we can let
X
be the set of the
10 people. Then the number of ways of picking the 3 people is the number of
subsets of size three.
Example. How many subsets of {1, 2, ··· , n} are there? There are 2 × 2 × ··· × 2 = 2^n, since every element is either in or out of a given subset, and there are two choices for each element. Equivalently, there are 2^n possible indicator functions, i.e. functions {1, 2, 3, ··· , n} → {0, 1}.
Definition (Combination \binom{n}{r}). The number of subsets of {1, 2, 3, ··· , n} of size r is denoted by \binom{n}{r}. The symbol is pronounced "n choose r".
This is the definition of \binom{n}{r}. It does not in any way specify how we can actually calculate the value of \binom{n}{r}.
Proposition. By definition,
\binom{n}{0} + \binom{n}{1} + ··· + \binom{n}{n} = 2^n.
Theorem (Binomial theorem). For n ∈ N and a, b ∈ R, we have
(a + b)^n = \binom{n}{0} a^n b^0 + \binom{n}{1} a^{n−1} b^1 + ··· + \binom{n}{r} a^{n−r} b^r + ··· + \binom{n}{n} a^0 b^n.
Proof. We have (a + b)^n = (a + b)(a + b) ··· (a + b). When we expand the product, we get all the terms attained by choosing b from some brackets and a from the rest. The term a^{n−r} b^r comes from choosing b from r brackets and a from the rest, and there are \binom{n}{r} ways to make such a choice.
This theorem is not immediately useful since we do not know the value of \binom{n}{r}!
Because of this theorem, \binom{n}{r} is sometimes called a "binomial coefficient".
Proposition.
(i) \binom{n}{r} = \binom{n}{n−r}. This is because choosing r things to keep is the same as choosing n − r things to throw away.
(ii) \binom{n}{r−1} + \binom{n}{r} = \binom{n+1}{r} (Pascal's identity). The RHS counts the number of ways to choose a team of r players from n + 1 available players, one of whom is Pietersen. If Pietersen is chosen, there are \binom{n}{r−1} ways to choose the remaining players. Otherwise, there are \binom{n}{r} ways. The total number of ways is thus \binom{n}{r−1} + \binom{n}{r}.
Now given that \binom{n}{0} = \binom{n}{n} = 1, since there is only one way to choose nothing or everything, we can construct Pascal's triangle:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
where each number is the sum of the two numbers above it, and the r-th item of the n-th row is \binom{n}{r} (the first row is row 0).
(iii) \binom{n}{k} \binom{k}{r} = \binom{n}{r} \binom{n−r}{k−r}. We are counting the number of pairs of sets (Y, Z) with |Y| = k, |Z| = r and Z ⊆ Y. On the LHS, we first choose Y, then choose Z ⊆ Y. The RHS chooses Z first and then chooses the remaining Y \ Z from {1, 2, ··· , n} \ Z.
(iv) \binom{a}{r} \binom{b}{0} + \binom{a}{r−1} \binom{b}{1} + ··· + \binom{a}{r−k} \binom{b}{k} + ··· + \binom{a}{0} \binom{b}{r} = \binom{a+b}{r} (Vandermonde's convolution). Suppose we have a men and b women, and we need to choose a committee of r people. The right-hand side is the total number of choices. The left-hand side breaks the choices up according to the number of men vs women.
Example. A greengrocer stocks n kinds of fruit. In how many ways can we choose a bag of r fruits? If we are only allowed to choose one of each kind, then the answer is \binom{n}{r}. But we might have r = 4, and we want to allow picking 2 apples, 1 plum and 1 quince. The total number of ways to choose is \binom{n+r−1}{r}. Why?
Each choice can be represented by a binary string of length n + r − 1, with r 0s and n − 1 1s. The string can be constructed as follows (by example): when n = 5 and r = 8, a possible binary string is 000100110010. The blocks of zeros correspond to the number of each fruit chosen, and the 1s separate the choices. In the string above, we have 3 of type 1, 2 of type 2, 0 of type 3, 2 of type 4 and 1 of type 5.
Then clearly the number of possible strings is \binom{n+r−1}{r}.
Proposition.
\binom{n}{r} = n! / ((n − r)! r!).
Proof. There are n(n − 1)(n − 2) ··· (n − r + 1) = n!/(n − r)! ways to choose r elements in order. Each choice of subset is chosen this way in r! orders, so the number of subsets is n!/((n − r)! r!).
We might write x^{\underline{r}} for the polynomial x(x − 1) ··· (x − r + 1). We call this "x to the r falling". We can write \binom{n}{r} = n^{\underline{r}}/r!. Multiplying Vandermonde's convolution by r!, we obtain the "falling binomial theorem"
\binom{r}{0} a^{\underline{r}} b^{\underline{0}} + \binom{r}{1} a^{\underline{r−1}} b^{\underline{1}} + ··· + \binom{r}{r} a^{\underline{0}} b^{\underline{r}} = (a + b)^{\underline{r}}.
Example. A bank prepares a letter for each of its n customers, saying how much it cares. (Each of these letters costs the customer £40.) There are n! ways to put the letters into the envelopes. In how many ways can this be done so that no one gets the right letter (i.e. how many derangements are there of n elements)?
We let X be the set of all envelopings (permutations of n). |X| = n!. For each i, let A_i = {x ∈ X : x assigns the correct letter to customer i}. We want to know |⋂_i Ā_i|. We know that |A_i| = (n − 1)!, since i's letter goes into i's envelope and all others can be placed randomly. We have |A_i ∩ A_j| = (n − 2)! as well, and similarly |A_i ∩ A_j ∩ A_k| = (n − 3)!.
By the inclusion-exclusion formula, we have
|⋂_i Ā_i| = |X| − Σ |A_i| + Σ |A_i ∩ A_j| − ···
= n! − \binom{n}{1} (n − 1)! + \binom{n}{2} (n − 2)! − ···
= n! (1 − 1/1! + 1/2! − ··· + (−1)^n/n!)
≈ n! e^{−1}
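As a quick numerical check (mine, not from the lectures), the inclusion-exclusion formula for derangements agrees with brute-force enumeration for small n; derangements is just an illustrative helper name.

```python
from itertools import permutations
from math import factorial

def derangements(n):
    """n! * (1 - 1/1! + 1/2! - ... + (-1)^n/n!), computed exactly."""
    return sum((-1) ** k * factorial(n) // factorial(k) for k in range(n + 1))

for n in range(1, 7):
    brute = sum(1 for p in permutations(range(n)) if all(p[i] != i for i in range(n)))
    assert brute == derangements(n)
```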
4.3 Well-ordering and induction
Several proofs so far involved “take the least integer such that”, e.g. division
algorithm; or involved a sequence of moves “and so on. . . e.g. Euclid’s algorithm,
every number is a product of primes. We rely on the following:
Theorem (Weak Principle of Induction). Let P(n) be a statement about the natural number n. Suppose that
(i) P(1) is true, and
(ii) (∀n) P(n) ⇒ P(n + 1).
Then P(n) is true for all n ≥ 1.
Example (Tower of Hanoi). [Figure: three pegs A, B and C, with rings 1, 2, 3, ··· , n stacked on peg A, smallest on top.]
The objective is to move the n rings on peg A to peg B, with the constraints that you can only move one ring at a time, and you can never place a larger ring onto a smaller ring.
We claim that this needs exactly 2^n − 1 moves.
Let P(n) be "n rings need exactly 2^n − 1 moves". Note that this statement contains two assertions: (1) we can do it in 2^n − 1 moves; (2) we can't do it in fewer.
First consider P(1). We simply have to move the ring from A to B, which takes 1 = 2^1 − 1 move.
Suppose we have n + 1 rings. We can move the top n rings to peg C, then move the bottom ring to peg B, then move the n rings from C to B. Assuming P(n) is true, this needs at most 2 × (2^n − 1) + 1 = 2^{n+1} − 1 moves.
Can we do it in fewer moves? To succeed, we must free the bottom ring, so we must shift the top n rings to another peg. This needs ≥ 2^n − 1 moves by P(n). Then we need to shift the bottom ring. Then we need to shift the n smaller rings onto the big one. This needs ≥ 2^n − 1 moves by P(n). So this needs ≥ 2^{n+1} − 1 moves altogether.
So we have shown that P(n) ⇒ P(n + 1) (we used P(n) four times). By the WPI, P(n) is true for all n.
Example. "All numbers are equal." Let P(n) be "if {a_1, ··· , a_n} is a set of n numbers, then a_1 = a_2 = ··· = a_n". P(1) is trivially true. Suppose we have {a_1, a_2, ··· , a_{n+1}}. Assuming P(n), apply it to {a_1, a_2, ··· , a_n} and {a_2, ··· , a_{n+1}}; then a_1 = ··· = a_n and a_2 = a_3 = ··· = a_{n+1}. So a_1 = a_2 = ··· = a_{n+1}. Hence P(n) ⇒ P(n + 1). So P(n) is true for all n ∈ N. (The flaw is that the step P(1) ⇒ P(2) fails: the two sets {a_1} and {a_2} do not overlap.)
Theorem (Inclusion-exclusion principle).
Proof. Let P(n) be the statement "for any sets A_1, ··· , A_n, we have |A_1 ∪ ··· ∪ A_n| = Σ_i |A_i| − Σ_{i<j} |A_i ∩ A_j| + ··· ± |A_1 ∩ A_2 ∩ ··· ∩ A_n|".
P(1) is trivially true. P(2) is also true (see above). Now given A_1, ··· , A_{n+1}, let B_i = A_i ∩ A_{n+1} for 1 ≤ i ≤ n. We apply P(n) both to the A_i and to the B_i.
Now observe that B_i ∩ B_j = A_i ∩ A_j ∩ A_{n+1}. Likewise, B_i ∩ B_j ∩ B_k = A_i ∩ A_j ∩ A_k ∩ A_{n+1}. Now
|A_1 ∪ A_2 ∪ ··· ∪ A_{n+1}| = |A_1 ∪ ··· ∪ A_n| + |A_{n+1}| − |(A_1 ∪ ··· ∪ A_n) ∩ A_{n+1}|
= |A_1 ∪ ··· ∪ A_n| + |A_{n+1}| − |B_1 ∪ ··· ∪ B_n|
= Σ_{i≤n} |A_i| − Σ_{i<j≤n} |A_i ∩ A_j| + ··· + |A_{n+1}| − Σ_{i≤n} |B_i| + Σ_{i<j≤n} |B_i ∩ B_j| − ···
Note that Σ_{i≤n} |B_i| = Σ_{i≤n} |A_i ∩ A_{n+1}|. So Σ_{i<j≤n} |A_i ∩ A_j| + Σ_{i≤n} |B_i| = Σ_{i<j≤n+1} |A_i ∩ A_j|, and similarly for the other terms. So the above
= Σ_{i≤n+1} |A_i| − Σ_{i<j≤n+1} |A_i ∩ A_j| + ···
So P(n) ⇒ P(n + 1) for n ≥ 2. By the WPI, P(n) is true for all n.
However, WPI is not quite what we want for “every number is a product of
primes”. We need a different form of induction.
Theorem (Strong principle of induction). Let P(n) be a statement about n ∈ N. Suppose that
(i) P(1) is true, and
(ii) for all n ∈ N, if P(k) is true for all k < n, then P(n) is true.
Then P(n) is true for all n ∈ N.
Note that (i) is redundant as it follows from (ii), but we state it for clarity.
Example ("Evolutionary trees"). Imagine that we have a mutant that can produce two offspring. Each offspring is either an animal or another mutant. A possible evolutionary tree is as follows: [Tree: the top mutant produces two mutants; the first of these produces a pig and a mutant (which in turn produces a slug and a man), and the second produces a gnu and an ibex.]
Let P(n) be the statement "n − 1 mutants produce n animals". Given some tree with n animals, remove the top mutant to get two sub-trees, with n_1 and n_2 animals respectively, where n_1 + n_2 = n. If P(k) is true for all k < n, then P(n_1) and P(n_2) are true. So the total number of mutants is 1 + (n_1 − 1) + (n_2 − 1) = n − 1. So P(n) is true. Hence by the strong principle of induction, P(n) is true for all n.
Theorem.
The strong principle of induction is equivalent to the weak principle
of induction.
Proof. Clearly the strong principle implies the weak principle, since if P(n) ⇒ P(n + 1), then (P(1) ∧ P(2) ∧ ··· ∧ P(n)) ⇒ P(n + 1).
Now we show that the weak principle implies the strong principle. Suppose that P(1) is true and (∀n) P(1) ∧ P(2) ∧ ··· ∧ P(n − 1) ⇒ P(n). We want to show that P(n) is true for all n using the weak principle.
Let Q(n) = "P(k) is true for all k ≤ n". Then Q(1) is true. Suppose that Q(n) is true. Then P(1) ∧ P(2) ∧ ··· ∧ P(n) is true. So P(n + 1) is true. Hence Q(n + 1) is true. By the weak principle, Q(n) is true for all n. So P(n) is true for all n.
While strong and weak induction are practically equivalent, they are rather
distinct conceptually. Weak induction is expressed in terms of “adding 1”, while
strong induction is based on the ordering of natural numbers.
It turns out that there is another statement about the natural numbers that
can be stated in terms of orders. We first formally define what it means to be
an order.
Definition (Partial order). A partial order on a set is a reflexive, antisymmetric ((aRb) ∧ (bRa) ⇒ a = b) and transitive relation.
Example. The ordinary ordering of N, a ≤ b, is a partial order of N. Also, a | b on N is also a partial order.
Definition (Total order). A total order is a partial order where for all a ≠ b, exactly one of aRb or bRa holds. This means that every two things must be related.
Definition (Well-ordered total order). A total order is well-ordered if every non-empty subset has a minimal element, i.e. if S ≠ ∅, then there exists m ∈ S such that x < m ⇒ x ∉ S.
Example. Z with the usual order is not well-ordered, since the set of even integers has no minimum. The positive rationals are also not well-ordered under the usual order.
Theorem (Well-ordering principle). N is well-ordered under the usual order, i.e. every non-empty subset of N has a minimal element.
Theorem. The well-ordering principle is equivalent to the strong principle of induction.
Proof. First we prove that well-ordering implies strong induction. Consider a proposition P(n), and suppose that P(k) being true for all k < n implies P(n).
Assume the contrary, i.e. P(n) is not true for all n. Consider the set S = {n ∈ N : ¬P(n)}. Then S has a minimal element m. Since m is the minimal counterexample to P, P(k) is true for all k < m. However, this implies that P(m) is true, which is a contradiction. Therefore P(n) must be true for all n.
To show that strong induction implies well-ordering, let S ⊆ N. Suppose that S has no minimal element. We need to show that S is empty. Let P(n) be the statement n ∉ S.
Certainly 1 ∉ S, or else it would be the minimal element. So P(1) is true. Suppose we know that P(k) is true for all k < n, i.e. k ∉ S for all k < n. Now n ∉ S, or else n would be the minimal element. So P(n) is true. By strong induction, P(n) is true for all n, i.e. S is empty.
The well-ordering principle enables us to show that P(n) is true as follows: if P(n) fails for some n, then there is a minimal counterexample m. Then we try to show that this leads to a contradiction.
Example. Proof that every number is a product of primes by strong induction: Assume the contrary. Then there exists a minimal n that cannot be written as a product of primes (by the well-ordering principle). If n is a prime, then n is a product of primes. Otherwise, write n = ab, where 1 < a, b < n. By minimality of n, both a and b are products of primes. Hence so is n. Contradiction.
Example.
All numbers are interesting. Suppose that there are uninteresting
numbers. Then there exists a smallest uninteresting number. Then the property
of being the smallest uninteresting number is itself interesting. Contradiction.
Example. Consider a total order on N × N given by "lexicographic" or "dictionary" order, i.e. (a, b) ≤ (c, d) if a < c or (a = c ∧ b ≤ d).
The Ackermann function is the function a : N_0 × N_0 → N defined by
a(m, n) = n + 1                      if m = 0,
a(m, n) = a(m − 1, 1)                if m > 0 and n = 0,
a(m, n) = a(m − 1, a(m, n − 1))      if m > 0 and n > 0.
We want to show that this is well-defined.
Note that a(m, n) is expressed in terms of a at points (x, y) < (m, n). So a is well-defined if the lexicographic order is well-ordered, i.e. every non-empty subset has a minimal element (if a were not well-defined, there would be a smallest place where the definition is bad; but the definition at that point is in terms of smaller points, which are well-defined).
We can see that N_0 × N_0 is well-ordered: if S ⊆ N_0 × N_0 is non-empty, let S_x = {x ∈ N_0 : (∃y) (x, y) ∈ S}, i.e. the set of all x-coordinates of S. By the well-ordering principle, S_x has a minimal element m. Then let S_y = {y ∈ N_0 : (m, y) ∈ S}. Then S_y has a minimal element n. Then (m, n) is the minimal element of S.
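As an illustration (my own, not from the lectures), the definition transcribes directly into Python; the recursion terminates precisely because the arguments decrease in the lexicographic order, though only very small inputs are feasible in practice.

```python
import sys
from functools import lru_cache

sys.setrecursionlimit(10000)

@lru_cache(maxsize=None)          # memoise to avoid recomputing small cases
def a(m, n):
    if m == 0:
        return n + 1
    if n == 0:
        return a(m - 1, 1)
    return a(m - 1, a(m, n - 1))

assert a(2, 3) == 9
assert a(3, 3) == 61
```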
5 Modular arithmetic
We are going to study modular arithmetic. In modular arithmetic, we first pick
a particular natural number to be the modulus, say 7. Then we consider two
numbers to be “equal” if their difference is a multiple of the modulus. For
example, modulo 7, we will think that 3 and 10 are “the same”, while 2 and 4
are different.
We will study arithmetic under this number system. Like the integers, we
are allowed to add and multiply numbers. However, while in
Z
, we can only
divide by 1 and -1, in modular arithmetic, more numbers can be divided. For
example, modulo 10, we are allowed to divide by 3.
An important application of modular arithmetic is RSA encryption. This is
a widely deployed asymmetric encryption algorithm. By asymmetric, we mean
that the key for encryption is different from the key for decryption. This is very
useful in real life, since we can broadcast the encryption key to the world, and
keep the decryption key to ourselves. This way anyone can send you encrypted
messages that only you can decrypt.
5.1 Modular arithmetic
Definition (Modulo). If a, b ∈ Z have the same remainder after division by m, i.e. m | (a − b), we say a and b are congruent modulo m, and write
a ≡ b (mod m)
Example. The check digits of the ISBN (or Hong Kong ID Card Number) are calculated modulo 11.
Example. 9 ≡ 0 (mod 3), 11 ≡ 6 (mod 5).
Proposition. If a ≡ b (mod m) and d | m, then a ≡ b (mod d).
Proof. a ≡ b (mod m) if and only if m | (a − b), hence d | (a − b), i.e. a ≡ b (mod d).
Observe that with m fixed, a ≡ b (mod m) is an equivalence relation. The set of equivalence classes is written as Z_m or Z/(mZ).
Example. Z_3 = {[0], [1], [2]}
Proposition. If a ≡ b (mod m) and u ≡ v (mod m), then a + u ≡ b + v (mod m) and au ≡ bv (mod m).
Proof. Since a ≡ b (mod m) and u ≡ v (mod m), we have m | (a − b) + (u − v) = (a + u) − (b + v). So a + u ≡ b + v (mod m).
Since a ≡ b (mod m) and u ≡ v (mod m), we have m | (a − b)u + b(u − v) = au − bv. So au ≡ bv (mod m).
This means that we can do arithmetic modulo m. Formally, we are doing arithmetic with the congruence classes, i.e. in Z_m. For example, in Z_7, [4] + [5] = [9] = [2].
Modular arithmetic can sometimes be used to show that equations have no
solutions.
Example. 2a^2 + 3b^3 = 1 has no solutions in Z. If there were a solution, then 2a^2 ≡ 1 (mod 3). But 2 · 0^2 ≡ 0, 2 · 1^2 ≡ 2 and 2 · 2^2 ≡ 2 (mod 3). So there is no solution to the congruence, and hence none to the original equation.
Observe that all odd numbers are either ≡ 1 (mod 4) or ≡ 3 ≡ −1 (mod 4). So we can classify primes depending on their value modulo 4.
Theorem. There are infinitely many primes that are ≡ −1 (mod 4).
Proof. Suppose not. So let p_1, ··· , p_k be all the primes ≡ −1 (mod 4). Let N = 4p_1 p_2 ··· p_k − 1. Then N ≡ −1 (mod 4). Now N is a product of primes, say N = q_1 q_2 ··· q_ℓ. But 2 ∤ N and p_i ∤ N for all i. So q_i ≡ 1 (mod 4) for all i. But then that implies N = q_1 q_2 ··· q_ℓ ≡ 1 (mod 4), which is a contradiction.
Example. Solve 7x ≡ 2 (mod 10). Note that 3 · 7 ≡ 1 (mod 10). If we multiply the equation by 3, then we get 3 · 7 · x ≡ 3 · 2 (mod 10). So x ≡ 6 (mod 10). Effectively, we divided by 7.
"Division" doesn't always work for all numbers, e.g. you cannot divide by 2 mod 10. We give a name to the numbers we can divide by.
Definition (Unit (modular arithmetic)). u is a unit if there exists v such that uv ≡ 1 (mod m).
Theorem. u is a unit modulo m if and only if (u, m) = 1.
Proof. (⇒) Suppose u is a unit. Then there exists v such that uv ≡ 1 (mod m). Then uv = 1 + mn for some n, i.e. uv − mn = 1. So 1 can be written as a linear combination of u and m. So (u, m) = 1.
(⇐) Suppose that (u, m) = 1. Then there exist a, b with ua + mb = 1. Thus ua ≡ 1 (mod m).
Using the above proof, we can find the inverse of a unit efficiently by Euclid's algorithm.
Corollary. If (a, m) = 1, then the congruence ax ≡ b (mod m) has a unique solution (mod m).
Proof. If ax ≡ b (mod m) and (a, m) = 1, then there exists a^{−1} such that a^{−1} a ≡ 1 (mod m). So a^{−1} ax ≡ a^{−1} b (mod m), and thus x ≡ a^{−1} b (mod m). Finally we check that x ≡ a^{−1} b (mod m) is indeed a solution: ax ≡ a a^{−1} b ≡ b (mod m).
Proposition. There is a solution to ax ≡ b (mod m) if and only if (a, m) | b.
If d = (a, m) | b, then the solution is the unique solution to (a/d) x ≡ b/d (mod m/d).
Proof. Let d = (a, m). If there is a solution to ax ≡ b (mod m), then m | ax − b. So d | ax − b and d | b.
On the contrary, if d | b, we have ax ≡ b (mod m) ⇔ ax − b = km for some k ∈ Z. Write a = da', b = db' and m = dm'. So ax ≡ b (mod m) ⇔ da'x − db' = dkm' ⇔ a'x − b' = km' ⇔ a'x ≡ b' (mod m'). Note that (a', m') = 1, since we divided by their greatest common factor. Then this has a unique solution modulo m'.
Example. 2x ≡ 3 (mod 4) has no solution since (2, 4) = 2, which does not divide 3.
5.2 Multiple moduli
In this section, we are concerned with multiple equations involving different
moduli.
Suppose we are given x ≡ 2 (mod 3) and x ≡ 1 (mod 4). What is the general solution for x? We work in mod 12. Since we are given that x ≡ 2 (mod 3), we know that x ≡ 2, 5, 8 or 11 (mod 12). Similarly, since x ≡ 1 (mod 4), we must have x ≡ 1, 5 or 9 (mod 12). Combining these results, we must have x ≡ 5 (mod 12).
On the other hand, if x ≡ 5 (mod 12), then x ≡ 5 ≡ 2 (mod 3) and x ≡ 5 ≡ 1 (mod 4). So x ≡ 5 (mod 12) is indeed the most general solution.
In general, we have the Chinese remainder theorem.
Theorem (Chinese remainder theorem). Let (m, n) = 1 and a, b ∈ Z. Then there is a unique solution (modulo mn) to the simultaneous congruences
x ≡ a (mod m)
x ≡ b (mod n),
i.e. there is an x satisfying both, and every other solution is ≡ x (mod mn).
Proof. Since (m, n) = 1, there exist u, v ∈ Z with um + vn = 1. Then vn ≡ 1 (mod m) and um ≡ 1 (mod n). Put x = umb + vna. So x ≡ a (mod m) and x ≡ b (mod n).
To show it is unique, suppose both y and x are solutions to the equations. Then
y ≡ a (mod m) and y ≡ b (mod n)
⇒ y ≡ x (mod m) and y ≡ x (mod n)
⇒ m | y − x and n | y − x
⇒ mn | y − x
⇒ y ≡ x (mod mn)
This shows a congruence (mod
mn
) is equivalent to one (mod
n
) and another
(mod m).
We can easily extend this to more than two moduli by repeatedly applying
this theorem.
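Here is a small Python sketch (mine, not from the notes) of the construction in the proof; crt is an illustrative helper name, and the coefficients are found with pow(·, -1, ·) rather than by writing out um + vn = 1 explicitly, which is all the argument needs.

```python
from math import gcd

def crt(a, m, b, n):
    """Solve x ≡ a (mod m), x ≡ b (mod n) for coprime m, n."""
    assert gcd(m, n) == 1
    u = pow(m, -1, n)             # um ≡ 1 (mod n)
    v = pow(n, -1, m)             # vn ≡ 1 (mod m)
    return (u * m * b + v * n * a) % (m * n)   # x = umb + vna, as in the proof

assert crt(2, 3, 1, 4) == 5       # the x ≡ 2 (mod 3), x ≡ 1 (mod 4) example above
```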
Proposition. Given any (m, n) = 1, c is a unit mod mn iff c is a unit both mod m and mod n.
Proof. (⇒) If there exists u such that cu ≡ 1 (mod mn), then cu ≡ 1 (mod m) and cu ≡ 1 (mod n). So c is a unit mod m and mod n.
(⇐) Suppose there exist u, v such that cu ≡ 1 (mod m) and cv ≡ 1 (mod n). Then by the CRT, there exists w with w ≡ u (mod m) and w ≡ v (mod n). Then cw ≡ cu ≡ 1 (mod m) and cw ≡ cv ≡ 1 (mod n).
But we know that 1 ≡ 1 (mod m) and 1 ≡ 1 (mod n). So both cw and 1 solve the pair of congruences z ≡ 1 (mod m), z ≡ 1 (mod n). By the "uniqueness" part of the Chinese remainder theorem, we must have cw ≡ 1 (mod mn).
Definition (Euler's totient function). We denote by φ(m) the number of integers a with 0 ≤ a < m such that (a, m) = 1, i.e. a is a unit mod m. Note that φ(1) = 1.
Proposition.
(i) φ(mn) = φ(m)φ(n) if (m, n) = 1, i.e. φ is multiplicative.
(ii) If p is a prime, φ(p) = p − 1.
(iii) If p is a prime, φ(p^k) = p^k − p^{k−1} = p^k (1 − 1/p).
(iv) φ(m) = m ∏_{p|m} (1 − 1/p).
Proof. We will only prove (iv). In fact, we will prove it twice.
(i) Suppose m = p_1^{k_1} p_2^{k_2} ··· p_ℓ^{k_ℓ}. Then
φ(m) = φ(p_1^{k_1}) φ(p_2^{k_2}) ··· φ(p_ℓ^{k_ℓ})
     = p_1^{k_1}(1 − 1/p_1) p_2^{k_2}(1 − 1/p_2) ··· p_ℓ^{k_ℓ}(1 − 1/p_ℓ)
     = m ∏_{p|m} (1 − 1/p)
(ii) Let m = p_1^{k_1} p_2^{k_2} ··· p_ℓ^{k_ℓ}. Let X = {0, ··· , m − 1} and A_j = {x ∈ X : p_j | x}. Then |X| = m, |A_j| = m/p_j, |A_i ∩ A_j| = m/(p_i p_j), etc. So
φ(m) = |Ā_1 ∩ Ā_2 ∩ ··· ∩ Ā_ℓ| = m ∏_{p|m} (1 − 1/p).
Example. $\varphi(60) = 60(1 - 1/2)(1 - 1/3)(1 - 1/5) = 16$.
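For computation, formula (iv) is the convenient one. A small sketch (mine, not from the lectures) that factorises $m$ by trial division and applies the product formula:

```python
def totient(m):
    """phi(m) = m * prod over primes p | m of (1 - 1/p)."""
    result, n, p = m, m, 2
    while p * p <= n:
        if n % p == 0:
            result -= result // p      # multiply result by (1 - 1/p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:                          # a single prime factor remains
        result -= result // n
    return result

print(totient(60))   # 16, as in the example
```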
If $a, b$ are both units (mod $m$), then so is $ab$, for if $au \equiv 1$ and $bv \equiv 1$, then $(ab)(uv) \equiv 1$. So the units form a multiplicative group of size $\varphi(m)$.
5.3 Prime moduli
Modular arithmetic has some nice properties when the modulus is a prime
number.
Theorem (Wilson's theorem). $(p - 1)! \equiv -1 \pmod p$ if $p$ is a prime.
Proof. If $p$ is a prime, then $1, 2, \cdots, p - 1$ are units. Among these, we can pair each number up with its inverse (e.g. 3 with 4 in modulo 11). The only elements that cannot be paired with a different number are 1 and $-1$, which are self-inverse, as shown below:
\begin{align*}
x^2 \equiv 1 \pmod p &\Leftrightarrow p \mid (x^2 - 1) \\
&\Leftrightarrow p \mid (x - 1)(x + 1) \\
&\Leftrightarrow p \mid x - 1 \text{ or } p \mid x + 1 \\
&\Leftrightarrow x \equiv \pm 1 \pmod p.
\end{align*}
Now $(p - 1)!$ is a product of $(p - 3)/2$ inverse pairs together with 1 and $-1$. So the product is $-1$.
Theorem (Fermat's little theorem). Let $p$ be a prime. Then $a^p \equiv a \pmod p$ for all $a \in \mathbb{Z}$. Equivalently, $a^{p-1} \equiv 1 \pmod p$ if $a \not\equiv 0 \pmod p$.
Proof. Two proofs are offered:
(i) The numbers $\{1, 2, \cdots, p - 1\}$ are units modulo $p$ and form a group of order $p - 1$. So $a^{p-1} \equiv 1$ by Lagrange's theorem.

(ii) If $a \not\equiv 0$, then $a$ is a unit. So $ax \equiv ay$ iff $x \equiv y$. Then $a, 2a, 3a, \cdots, (p-1)a$ are distinct mod $p$. So they are congruent to $1, 2, \cdots, p - 1$ in some order. Hence $a \cdot 2a \cdot 3a \cdots (p-1)a \equiv 1 \cdot 2 \cdot 3 \cdots (p-1)$. So $a^{p-1}(p-1)! \equiv (p-1)!$, and hence $a^{p-1} \equiv 1 \pmod p$.
Neither Wilson's nor Fermat's theorem holds if the modulus is not prime. However, Fermat's theorem can be generalized:

Theorem (Fermat-Euler Theorem). Let $a, m$ be coprime. Then
\[ a^{\varphi(m)} \equiv 1 \pmod m. \]
Proof. Lagrange’s theorem: The units mod m form a group of size φ(m).
Alternatively, let $U = \{x \in \mathbb{N} : 0 < x < m, (x, m) = 1\}$. These are the $\varphi(m)$ units. Since $a$ is a unit, $ax \equiv ay \pmod m$ only if $x \equiv y \pmod m$. So if $U = \{u_1, u_2, \cdots, u_{\varphi(m)}\}$, then $\{au_1, au_2, \cdots, au_{\varphi(m)}\}$ are distinct and are units. So they must be $u_1, \cdots, u_{\varphi(m)}$ in some order. Then $au_1 \cdot au_2 \cdots au_{\varphi(m)} \equiv u_1u_2\cdots u_{\varphi(m)}$. So $a^{\varphi(m)}z \equiv z$, where $z = u_1u_2\cdots u_{\varphi(m)}$. Since $z$ is a unit, we can multiply by its inverse and obtain $a^{\varphi(m)} \equiv 1$.
Definition (Quadratic residues). The quadratic residues mod $p$ are the "squares" mod $p$, i.e. $1^2, 2^2, \cdots, (p-1)^2$.

Note that if $a^2 \equiv b^2 \pmod p$, then $p \mid a^2 - b^2 = (a - b)(a + b)$. Then $p \mid a - b$ or $p \mid a + b$. So $a \equiv \pm b \pmod p$. Thus every square is a square of exactly two numbers.
Example. If $p = 7$, then $1^2 \equiv 6^2 \equiv 1$, $2^2 \equiv 5^2 \equiv 4$, $3^2 \equiv 4^2 \equiv 2$. So 1, 2, 4 are quadratic residues; 3, 5, 6 are not.

Proposition. If $p$ is an odd prime, then $-1$ is a quadratic residue if and only if $p \equiv 1 \pmod 4$.
Proof. If $p \equiv 1 \pmod 4$, say $p = 4k + 1$, then by Wilson's theorem,
\[ -1 \equiv (p-1)! \equiv 1 \cdot 2 \cdots \frac{p-1}{2}\left(-\frac{p-1}{2}\right)\cdots(-2)(-1) \equiv (-1)^{\frac{p-1}{2}}\left(\left(\frac{p-1}{2}\right)!\right)^2 = (-1)^{2k}((2k)!)^2 \equiv ((2k)!)^2. \]
So $-1$ is a quadratic residue.

When $p \equiv -1 \pmod 4$, i.e. $p = 4k + 3$, suppose $-1$ were a square, i.e. $-1 \equiv z^2$. Then by Fermat's little theorem, $1 \equiv z^{p-1} \equiv z^{4k+2} \equiv (z^2)^{2k+1} \equiv (-1)^{2k+1} \equiv -1$. Contradiction.
Proposition (unproven). An odd prime $p$ is the sum of two squares if and only if $p \equiv 1 \pmod 4$.
Proposition. There are infinitely many primes $\equiv 1 \pmod 4$.

Proof. Suppose not, and $p_1, \cdots, p_k$ are all the primes $\equiv 1 \pmod 4$. Let $N = (2p_1\cdots p_k)^2 + 1$. Then $N$ is not divisible by 2 or by $p_1, \cdots, p_k$. Let $q$ be a prime with $q \mid N$. Then $q \not\equiv 1 \pmod 4$. But $N \equiv 0 \pmod q$ and hence $(2p_1\cdots p_k)^2 + 1 \equiv 0 \pmod q$, i.e. $(2p_1\cdots p_k)^2 \equiv -1 \pmod q$. So $-1$ is a quadratic residue mod $q$, which is a contradiction since $q \not\equiv 1 \pmod 4$.
Proposition. Let $p = 4k + 3$ be a prime. Then if $a$ is a quadratic residue, i.e. $a \equiv z^2 \pmod p$ for some $z$, then $z \equiv \pm a^{k+1}$.

Proof. By Fermat's little theorem, $a^{2k+1} \equiv z^{4k+2} \equiv z^{p-1} \equiv 1$. If we multiply by $a$, then $a^{2k+2} \equiv a \pmod p$. So $(\pm a^{k+1})^2 \equiv a \pmod p$.
This allows us to take square roots efficiently. It requires an efficient way of computing powers of $a$, which can be done by repeated squaring. For example, to find $a^{37}$, we can calculate $a^{37} = a^{32}a^4a^1 = ((((a^2)^2)^2)^2)^2 \cdot (a^2)^2 \cdot a$. Thus calculation of $a^n$ has time complexity $O(\log n)$, as opposed to $O(n)$ if you take powers manually.
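A sketch of both ideas in code: repeated squaring for $a^n \bmod m$, and the square-root recipe $z \equiv \pm a^{k+1}$ for a prime $p = 4k + 3$. (Python's built-in pow(a, n, m) already uses repeated squaring; the explicit loop is only for illustration.)

```python
def power_mod(a, n, m):
    """a^n mod m by repeated squaring: O(log n) multiplications."""
    result, a = 1, a % m
    while n > 0:
        if n & 1:                      # this bit of the exponent is set
            result = result * a % m
        a = a * a % m                  # square
        n >>= 1
    return result

def sqrt_mod(a, p):
    """A square root of a mod a prime p = 3 (mod 4), assuming a is a QR."""
    assert p % 4 == 3
    z = power_mod(a, (p + 1) // 4, p)  # a^(k+1), where p = 4k + 3
    assert z * z % p == a % p, "a is not a quadratic residue mod p"
    return z

print(power_mod(3, 37, 1000))   # 363
print(sqrt_mod(2, 7))           # 4, and indeed 4^2 = 16 = 2 (mod 7)
```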
Suppose $a$ is a square mod $n$, where $n = pq$ and $p, q$ are distinct primes. Then $a$ is a square mod $p$ and a square mod $q$. So there exists some $s$ with $(\pm s)^2 \equiv a \pmod p$ and some $t$ with $(\pm t)^2 \equiv a \pmod q$. By the Chinese remainder theorem, we can find a unique solution for each choice of signs, so we get 4 square roots of $a$ modulo $n$.
5.4 Public-key (asymmetric) cryptography
Tossing a coin over a phone
Suppose we have Alice and Bob who wish to toss a coin fairly over the phone.
Alice chooses two 100-digit primes $p, q$ with $p, q \equiv 3 \pmod 4$. Then she tells Bob the product $n = pq$. Bob picks a number $u$ coprime to $n$, computes $a \equiv u^2 \pmod n$ and tells Alice the value of $a$.

Alice can compute the square roots of $a$ by the above algorithm ($O(\log n)$), obtaining $\pm u, \pm v$, and tells Bob one of these pairs.

Now if Alice picks $\pm u$, Bob says "you win". Otherwise, Bob says "you lose".
Can Bob cheat? If he says "you lose" when Alice says $\pm u$, Bob must produce the other pair $\pm v$, but he can't know $\pm v$ without factorizing $n$. (If he knows $\pm u$ and $\pm v$, then $u^2 \equiv v^2 \pmod n$, so $n \mid (u - v)(u + v)$. But $n \nmid (u - v)$ and $n \nmid (u + v)$. So $p \mid (u - v)$ and $q \mid (u + v)$, say. Then $p = (n, u - v)$ and $q = (n, u + v)$, which we can calculate efficiently by Euclid's algorithm.)
Thus cheating is as hard as prime factorization.
Note that a difficult part is to generate the 100-digit primes. While there are sufficiently many primes that we can keep trying random numbers until we hit one, we need an efficient method to test whether a number is prime. We cannot do this by factorization, since that is slow. So we need an efficient prime-checking function.

We can test whether a large number $p$ is prime by doing Fermat-like checks: we choose random numbers $a$, raise them to the $(p-1)$th power mod $p$, and see if the result is 1. If it is not 1 for some $a$, then $p$ is definitely not a prime. If we do sufficiently many tests that all result in 1, we can be reasonably certain that it is a prime (though not with 100% certainty).

(Recent advances have produced efficient deterministic primality tests, but they are generally slower than the above algorithm and are not widely used.)
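A sketch of the Fermat-style test just described (my own helper; a composite number can occasionally pass a single round, which is why several random bases are tried):

```python
import random

def fermat_test(n, rounds=20):
    """Probably-prime test: check a^(n-1) = 1 (mod n) for random bases a."""
    if n < 4:
        return n in (2, 3)
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:      # Fermat's little theorem fails
            return False               # definitely composite
    return True                        # probably prime

print(fermat_test(2**61 - 1))   # True: this Mersenne number is prime
print(fermat_test(2**61 + 1))   # False: it is divisible by 3
```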
It is currently believed that it is hard to prime factorize a number, so this is
secure as far as we know.
RSA encryption
Theorem (RSA Encryption).
We want people to be able to send a message to
Bob without Eve eavesdropping. So the message must be encrypted. We want
an algorithm that allows anyone to encrypt, but only Bob to decrypt (e.g. many
parties sending passwords with the bank).
Let us first agree to write messages as sequences of numbers, e.g. in ASCII
or UTF-8.
After encoding, the encryption part is often done with RSA encryption (Rivest, Shamir, Adleman). Bob thinks of two large primes $p, q$. Let $n = pq$ and pick $e$ coprime to $\varphi(n) = (p-1)(q-1)$. Then work out $d$ with $de \equiv 1 \pmod{\varphi(n)}$ (i.e. $de = k\varphi(n) + 1$). Bob then publishes the pair $(n, e)$.
For Alice to encrypt a message, Alice splits the message into numbers $M < n$. Alice sends $M^e \pmod n$ to Bob.

Bob then computes $(M^e)^d = M^{k\varphi(n)+1} \equiv M \pmod n$ by the Fermat-Euler theorem.
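A toy numerical sketch of the whole exchange, with tiny primes rather than 100-digit ones (the specific values are illustrative only):

```python
from math import gcd

p, q = 61, 53                    # Bob's secret primes (toy-sized)
n = p * q                        # 3233, published
phi = (p - 1) * (q - 1)          # 3120
e = 17                           # public exponent, coprime to phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)              # 2753, since 17 * 2753 = 1 (mod 3120)

M = 65                           # Alice's message, M < n
C = pow(M, e, n)                 # Alice sends M^e mod n, here 2790
assert pow(C, d, n) == M         # Bob recovers M = (M^e)^d mod n
print(n, e, d, C)                # 3233 17 2753 2790
```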
How can Eve find $M$? She could, of course, factorize $n$, find $d$ efficiently, and be in the same position as Bob. However, it is currently believed that this is hard. Is there any other way? Currently we do not know if RSA can be broken without factorizing (cf. the RSA problem).
6 Real numbers
So far, we have only worked with natural numbers and integers. Unfortunately
the real world often involves rational numbers and even real numbers. In this
chapter, our goal is to study real numbers. To do so, we will first have to define
the real numbers. To do so, we will start from the natural numbers.
Before we start, an important (philosophical) point has to be made. The idea
is to define, say, the “real numbers” as a set with some operations (e.g. addition,
multiplication) that satisfies some particular properties, known as the axioms.
When we do this, there are two questions we can ask: is there actually a set that satisfies these properties, and is it unique?
The first question can be answered by an explicit construction, i.e. we find
a concrete set that actually satisfies these properties. However, it is important
to note that we perform the construction only to show that it makes sense to
talk about such a set. For example, we will construct a real number as a pair of
subsets of
Q
, but it would be absurd to actually think that a real number “is” a
pair of sets. It’s just that it can be constructed as a pair of sets. You would be
considered insane if you asked whether
\[ \forall x: x \in 3 \Rightarrow x \in \pi \]
holds, even though it is a valid thing to ask if we view each real number as a set (and is in fact true).
The next problem of uniqueness is more subtle. Firstly, it is clear that the constructions themselves aren't unique: instead of constructing the natural number 0 as the set $\emptyset$, as we will later do, we could as well define it as $\{\{\emptyset\}\}$,
and the whole construction will go through. However, we could still hope that
all possible constructions are “isomorphic” in some way. It turns out this is true
for what we have below, but the proofs are not trivial.
However, while this is nice, it doesn’t really matter. This is since we don’t
care how, say, the real numbers are constructed. When working with them, we
just assume that they satisfy the relevant defining properties. So we can just
choose anything that satisfies the axioms, and use it. The fact that there are
other ones not isomorphic to this is not a problem.
Since we are mostly interested in the natural numbers and the real numbers,
we will provide both the axiomatic description and explicit construction for these.
However, we will only give explicit constructions for the integers and rationals,
and we will not give detailed proofs of why they work.
6.1 Construction of numbers
Construction of natural numbers
Our construction of natural numbers will include 0.
Definition (Natural numbers). The natural numbers $\mathbb{N}$ are defined by Peano's axioms. We call a set $\mathbb{N}$ "natural numbers" if it has a special element 0 and a map $S : \mathbb{N} \to \mathbb{N}$ that maps $n$ to its "successor" (intuitively, it is $+1$) such that:

(i) $S(n) \neq 0$ for all $n \in \mathbb{N}$.

(ii) For all $n, m \in \mathbb{N}$, if $S(n) = S(m)$, then $n = m$.

(iii) For any subset $A \subseteq \mathbb{N}$, if $0 \in A$ and $n \in A \Rightarrow S(n) \in A$, then in fact $A = \mathbb{N}$.
The last axiom is the axiom of induction.
We write $1 = S(0)$, $2 = S(1)$, $3 = S(2)$ etc. We can (but will not) prove that the axiom of induction allows us to define functions on $\mathbb{N}$ recursively (cf. IID Logic and Set Theory). Assuming this, we can define addition and multiplication recursively by
\begin{align*}
n + 0 &= n & n \times 0 &= 0 \\
n + S(m) &= S(n + m) & n \times S(m) &= n \times m + n.
\end{align*}
We can show by induction that these satisfy the usual rules (e.g. associativity, distributivity).
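These recursive definitions translate directly into code. A purely illustrative sketch, representing a natural number by an ordinary Python int and the successor by $+1$:

```python
def add(n, m):
    """n + 0 = n,  n + S(m) = S(n + m)."""
    if m == 0:
        return n
    return add(n, m - 1) + 1           # S(n + m)

def mul(n, m):
    """n * 0 = 0,  n * S(m) = n * m + n."""
    if m == 0:
        return 0
    return add(mul(n, m - 1), n)

print(add(3, 4), mul(3, 4))   # 7 12
```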
We can construct this explicitly by $0 = \emptyset$, $1 = \{0\}$, $2 = \{0, 1\}$ etc. In general, we define $S(n) = \{n\} \cup n$. Note that this is in some sense a circular definition, since we are defining the natural numbers recursively, but to do recursion, we need the natural numbers. Also, it is not clear how we can show this satisfies the axioms above. To actually do this properly, we will need to approach this in a slightly different way, and the details are better left for the IID Logic and Set Theory course.
Construction of integers
Definition (Integers). $\mathbb{Z}$ is obtained from $\mathbb{N}$ by allowing subtraction. Formally, we define $\mathbb{Z}$ to be the equivalence classes of $\mathbb{N} \times \mathbb{N}$ under the equivalence relation
\[ (a, b) \sim (c, d) \text{ if and only if } a + d = b + c. \]
Intuitively, we think of $(a, b)$ as $a - b$.

We write $a$ for $[(a, 0)]$ and $-a$ for $[(0, a)]$, and define the operations by
\begin{align*}
(a, b) + (c, d) &= (a + c, b + d) \\
(a, b) \times (c, d) &= (ac + bd, bc + ad).
\end{align*}
We can check that these are well-defined and satisfy the usual properties.
Construction of rationals
Definition (Rationals). $\mathbb{Q}$ is obtained from $\mathbb{Z}$ by allowing division. Formally, we define $\mathbb{Q}$ to be the equivalence classes of $\mathbb{Z} \times \mathbb{N}$ under the relation
\[ (a, b) \sim (c, d) \text{ if and only if } ad = bc. \]
We write $\frac{a}{b}$ for $[(a, b)]$. We define
\begin{align*}
(a, b) + (c, d) &= (ad + bc, bd) \\
(a, b) \times (c, d) &= (ac, bd).
\end{align*}
We can check that these are well-defined and satisfy the usual properties.

Algebraically, we say $\mathbb{Q}$ is a "totally ordered field".
Definition (Totally ordered field). A set $F$ equipped with binary operations $+$, $\times$ and a relation $\leq$ is a totally ordered field if

(i) $F$ is an additive abelian group with identity 0.

(ii) $F \setminus \{0\}$ is a multiplicative abelian group with identity 1.

(iii) Multiplication distributes over addition: $a(b + c) = ab + ac$.

(iv) $\leq$ is a total order.

(v) For any $p, q, r \in F$, if $p \leq q$, then $p + r \leq q + r$.

(vi) For any $p, q, r \in F$, if $p \leq q$ and $0 \leq r$, then $pr \leq qr$.

Proposition. $\mathbb{Q}$ is a totally ordered field.

Examples of non-totally-ordered fields include $\mathbb{Z}_p$, which is a field but not totally ordered.
Proposition. $\mathbb{Q}$ is densely ordered, i.e. for any $p, q \in \mathbb{Q}$, if $p < q$, then there is some $r \in \mathbb{Q}$ such that $p < r < q$.

Proof. Take $r = \frac{p + q}{2}$.
However, Q is not enough for our purposes.
Proposition. There is no rational $q \in \mathbb{Q}$ with $q^2 = 2$.
Proof. Suppose not, and $(\frac{a}{b})^2 = 2$, where $b$ is chosen as small as possible. We will derive a contradiction in four ways.

(i) $a^2 = 2b^2$. So $a$ is even. Let $a = 2a_0$. Then $b^2 = 2a_0^2$. So $b$ is even as well, say $b = 2b_0$. But then $\frac{a}{b} = \frac{a_0}{b_0}$ with a smaller $b_0$. Contradiction.

(ii) We know that $b$ is a product of primes if $b \neq 1$. Let $p \mid b$. Then $a^2 = 2b^2$. So $p \mid a^2$. So $p \mid a$. This contradicts the minimality of $b$.

(iii) (Dirichlet) We have $\frac{a}{b} = \frac{2b}{a}$, since $a^2 = 2b^2$. For any $u, v$, we have $a^2v = 2b^2v$ and thus $uab + a^2v = uab + 2b^2v$. So $\frac{a}{b} = \frac{au + 2bv}{bu + av}$. Put $u = -1$, $v = 1$. Then $\frac{a}{b} = \frac{2b - a}{a - b}$. Since $a < 2b$, we have $a - b < b$. So we have found a rational with smaller $b$.

(iv) Same as (iii), but pick $u, v$ so that $bu + av = 1$, which is possible since $a$ and $b$ are coprime. Then $\frac{a}{b} = au + 2bv$ is an integer, which is absurd since $1 < \frac{a}{b} < 2$.
Construction of real numbers

As shown above, the rational numbers don't have all the numbers we need. What exactly do we mean by "numbers are missing"? We might want to say that the problem is that not all polynomial equations have solutions. However, this is not the real problem. First of all, even if we are working with the reals, not all equations have solutions, e.g. $x^2 + 1 = 0$. Also, some real numbers such as $\pi$ are not solutions to polynomial equations (with integer coefficients), but we still want them.
The real problem is expressed in terms of least upper bounds, or suprema.
Definition (Least upper bound/supremum and greatest lower bound/infimum). For an ordered set $X$, $s \in X$ is a least upper bound (or supremum) for the set $S \subseteq X$, denoted by $s = \sup S$, if

(i) $s$ is an upper bound for $S$, i.e. for every $x \in S$, we have $x \leq s$;

(ii) if $t$ is any upper bound for $S$, then $s \leq t$.

Similarly, $s \in X$ is a greatest lower bound (or infimum) if $s$ is a lower bound and any lower bound $t \leq s$.

By definition, the least upper bound for $S$, if it exists, is unique.
The problem with $\mathbb{Q}$ is that if we let $S = \{q \in \mathbb{Q} : q^2 < 2\}$, then $S$ has no supremum in $\mathbb{Q}$.
Recall that
Q
is a totally ordered field. We will define the real numbers
axiomatically to be a totally ordered field without this problem.
Definition (Real numbers). The real numbers are a totally ordered field containing $\mathbb{Q}$ that satisfies the least upper bound axiom.

Axiom (Least upper bound axiom). Every non-empty set of real numbers that has an upper bound has a least upper bound.
We have the requirement "non-empty" since every number is an upper bound of $\emptyset$, but $\emptyset$ has no least upper bound.
We will leave the construction to the end of the section.
Note that $\mathbb{Q}$ is a subset of $\mathbb{R}$, in the sense that we can find a copy of $\mathbb{Q}$ inside $\mathbb{R}$. By definition of a field, there is a multiplicative identity $1 \in \mathbb{R}$. We can then define the natural numbers by
\[ n = \underbrace{1 + \cdots + 1}_{n \text{ times}}. \]
We can then define the negative integers by letting $-n$ be the additive inverse of $n$. Then $\frac{1}{n}$ is the multiplicative inverse of $n$ (for $n \neq 0$), and $\frac{m}{n}$ is just $m$ copies of $\frac{1}{n}$ added together. This is our canonical copy of $\mathbb{Q}$.
Corollary. Every non-empty set of the real numbers bounded below has an infimum.

Proof. Let $S$ be non-empty and bounded below. Then $-S = \{-x : x \in S\}$ is a non-empty set bounded above, and $\inf S = -\sup(-S)$.
Alternatively, we can prove it just using the ordering of R:
Proof. Let $S$ be non-empty and bounded below. Let $L$ be the set of all lower bounds of $S$. Since $S$ is bounded below, $L$ is non-empty. Also, $L$ is bounded above by any member of $S$. So $L$ has a least upper bound $\sup L$.

For each $x \in S$, we know $x$ is an upper bound of $L$. So we have $\sup L \leq x$ by definition. So $\sup L$ is indeed a lower bound of $S$. Also, by definition, every lower bound of $S$ is less than (or equal to) $\sup L$. So this is the infimum.
Now the set $\{q \in \mathbb{Q} : q^2 < 2\}$ has a supremum in $\mathbb{R}$ (by the least upper bound axiom).
We make some useful definitions.
Definition (Closed and open intervals). A closed interval $[a, b]$ with $a \leq b \in \mathbb{R}$ is the set $\{x \in \mathbb{R} : a \leq x \leq b\}$.

An open interval $(a, b)$ with $a \leq b \in \mathbb{R}$ is the set $\{x \in \mathbb{R} : a < x < b\}$.

Similarly, we can have $[a, b) = \{x \in \mathbb{R} : a \leq x < b\}$ and $(a, b] = \{x \in \mathbb{R} : a < x \leq b\}$.
Example. Let $S = [0, 1]$. Then $S \neq \emptyset$. Also $S$ has an upper bound, e.g. 2. Hence $\sup S$ exists.

To find it explicitly, notice that 1 is an upper bound for $S$ by definition, and if $t < 1$, then $t$ is not an upper bound for $S$ since $1 \in S$ but $1 \not\leq t$. So every upper bound is at least 1 and therefore 1 is the supremum of $S$.

Now let $T = (0, 1)$. Again $T$ is non-empty and has an upper bound (e.g. 2). So again $\sup T$ exists. We know that 1 is an upper bound. If $t < 0$, then $0.5 \in T$ but $0.5 \not\leq t$. So $t$ is not an upper bound. Now suppose $0 \leq t < 1$. Then $0 < \frac{1+t}{2} < 1$ and $t < \frac{1+t}{2}$, so $\frac{1+t}{2} \in T$ but $\frac{1+t}{2} \not\leq t$. So $t$ is not an upper bound. So $\sup T = 1$.

Note that these cases differ in that $\sup S \in S$ but $\sup T \not\in T$. $S$ has a maximum element 1, and the maximum is the supremum. $T$ doesn't have a maximum, but the supremum can still exist.
The real numbers has a rather interesting property.
Theorem (Axiom of Archimedes). Given $r \in \mathbb{R}$, there exists $n \in \mathbb{N}$ with $n > r$.
This was considered an axiom by Archimedes but we can prove this with the
least upper bound axiom.
Proof. Assume the contrary. Then $r$ is an upper bound for $\mathbb{N}$. $\mathbb{N}$ is not empty since $1 \in \mathbb{N}$. By the least upper bound axiom, $s = \sup \mathbb{N}$ exists. Since $s$ is the least upper bound for $\mathbb{N}$, $s - 1$ is not an upper bound for $\mathbb{N}$. So $\exists m \in \mathbb{N}$ with $m > s - 1$. Then $m + 1 \in \mathbb{N}$ but $m + 1 > s$, which contradicts the statement that $s$ is an upper bound.
Notice that every non-empty set $S \subseteq \mathbb{R}$ which is bounded below has a greatest lower bound (or infimum). In particular, we have

Proposition. $\inf\{\frac{1}{n} : n \in \mathbb{N}\} = 0$.

Proof. Certainly 0 is a lower bound for the set. If $t > 0$, by the Axiom of Archimedes there exists $n \in \mathbb{N}$ such that $n > 1/t$. Then $\frac{1}{n} < t$ and $\frac{1}{n}$ is in the set. So $t$ is not a lower bound.
Theorem. $\mathbb{Q}$ is dense in $\mathbb{R}$, i.e. given $r, s \in \mathbb{R}$ with $r < s$, $\exists q \in \mathbb{Q}$ with $r < q < s$.
Proof. Wlog assume $r \geq 0$ (just multiply everything by $-1$ if $r < 0$ and swap $r$ and $s$). Since $s - r > 0$, there is some $n \in \mathbb{N}$ such that $\frac{1}{n} < s - r$. By the Axiom of Archimedes, $\exists N \in \mathbb{N}$ such that $N > sn$.

Let $T = \{k \in \mathbb{N} : \frac{k}{n} \geq s\}$. $T$ is not empty, since $N \in T$. Then by the well-ordering principle, $T$ has a minimum element $m$. Now $m \neq 1$ since $\frac{1}{n} < s - r \leq s$. Let $q = \frac{m-1}{n}$. Since $m - 1 \not\in T$, $q < s$. If $q = \frac{m-1}{n} < r$, then $\frac{m}{n} < r + \frac{1}{n} < s$, so $m \not\in T$, contradiction. So $r < q < s$.
Theorem. There exists $x \in \mathbb{R}$ with $x^2 = 2$.

Proof. Let $S = \{r \in \mathbb{R} : r^2 \leq 2\}$. Then $0 \in S$, so $S \neq \emptyset$. Also, for every $r \in S$, we have $r \leq 3$. So $S$ is bounded above. So $x = \sup S$ exists and $0 \leq x \leq 3$.

By trichotomy, either $x^2 < 2$, $x^2 > 2$ or $x^2 = 2$.
Suppose $x^2 < 2$. Let $0 < t < 1$. Then consider $(x + t)^2 = x^2 + 2xt + t^2 \leq x^2 + 6t + t \leq x^2 + 7t$. Pick $t < \frac{2 - x^2}{7}$; then $(x + t)^2 < 2$. So $x + t \in S$. This contradicts the fact that $x$ is an upper bound of $S$.

Now suppose $x^2 > 2$. Let $0 < t < 1$. Then consider $(x - t)^2 = x^2 - 2xt + t^2 \geq x^2 - 6t$. Pick $t < \frac{x^2 - 2}{6}$. Then $(x - t)^2 > 2$, so $x - t$ is an upper bound for $S$. This contradicts the fact that $x$ is the least upper bound of $S$.

So by trichotomy, $x^2 = 2$.
Now let's try to construct the real numbers from the rationals. The idea is that each set like $\{q \in \mathbb{Q} : q^2 < 2\}$ represents a "missing number", namely the supremum $\sqrt{2}$. However, many different sets can correspond to the same missing number, e.g. $\{q \in \mathbb{Q} : q^2 < 2\} \cup \{-3\}$ also "should have" supremum $\sqrt{2}$.
So we need to pick a particular one that represents this missing number. To do so, we pick the maximal one, i.e. a subset $S$ such that for $x \in S$ and $y \in \mathbb{Q}$, if $y < x$, then $y \in S$ (if $y$ were not in $S$, we could add it to $S$, and $S \cup \{y\}$ would "have the same upper bound"). Each rational $q \in \mathbb{Q}$ can also be represented by the set $\{x \in \mathbb{Q} : x \leq q\}$. These are known as Dedekind cuts. By convention, a Dedekind cut is instead defined as the pair $(S, \mathbb{Q} \setminus S)$, and can be characterized by the following definition:
Definition (Dedekind cut). A Dedekind cut of $\mathbb{Q}$ is a partition of $\mathbb{Q}$ into sets $L$ and $R$ such that
\[ (\forall l \in L)(\forall r \in R)\ l < r, \]
and $R$ has no minimum.

The requirement that $R$ has no minimum corresponds to our (arbitrary) decision that the rationals should be embedded as
\[ q \mapsto (\{x \in \mathbb{Q} : x \leq q\},\ \{x \in \mathbb{Q} : x > q\}), \]
instead of
\[ q \mapsto (\{x \in \mathbb{Q} : x < q\},\ \{x \in \mathbb{Q} : x \geq q\}). \]
We can then construct the set $\mathbb{R}$ from $\mathbb{Q}$ by letting $\mathbb{R}$ be the set of all Dedekind cuts. The supremum of any bounded set of real numbers is obtained by taking the union of (the left sides of) the Dedekind cuts. The definition of the arithmetic operations is left as an exercise for the reader (to actually define them is tedious but not hard).
6.2 Sequences
Here we will look at sequences, with series in the next chapter. Only a brief
introduction to these topics will be provided here, as these topics will be studied
later in Analysis I.
Definition (Sequence). A sequence is a function $\mathbb{N} \to \mathbb{R}$. If $a$ is a sequence, instead of $a(1), a(2), \cdots$, we usually write $a_1, a_2, \cdots$. To emphasize that it is a sequence, we write it as $(a_n)$.
We want to capture the notion of a sequence tending to a limit. For example, we want to say that $1, \frac{1}{2}, \frac{1}{3}, \cdots$ tends to 0, while $1, 2, 3, \cdots$ does not converge to a limit.

The idea is that if $a_n \to \ell$, then we can get as close to $\ell$ as we like, as long as we are sufficiently far down the sequence. More precisely, given any "error threshold" $\varepsilon$, we can find a (possibly large) number $N$ such that whenever $n \geq N$, we have $|a_n - \ell| < \varepsilon$.
Definition (Limit of sequence). The sequence $(a_n)$ tends to $\ell \in \mathbb{R}$ as $n$ tends to infinity if and only if
\[ (\forall \varepsilon > 0)(\exists N \in \mathbb{N})(\forall n \geq N)\ |a_n - \ell| < \varepsilon. \]
If $a_n$ tends to $\ell$ as $n$ tends to infinity, we write $a_n \to \ell$ as $n \to \infty$; $\lim_{n \to \infty} a_n = \ell$; or $a_n$ converges to $\ell$.

Intuitively, if $a_n \to \ell$, we mean that given any $\varepsilon$, for sufficiently large $n$, $a_n$ is always within $\ell \pm \varepsilon$.

The definition of $a_n \not\to \ell$ is the negation of the above statement:
\[ (\exists \varepsilon > 0)(\forall N \in \mathbb{N})(\exists n \geq N)\ |a_n - \ell| \geq \varepsilon. \]
Definition (Convergence of sequence). The sequence $(a_n)$ converges if there exists an $\ell$ such that $a_n \to \ell$. The sequence diverges if it doesn't converge.

Every proof of $a_n \to \ell$ looks like: "Given $\varepsilon > 0$, (argument to show $N$ exists, possibly depending on $\varepsilon$), such that $\forall n \geq N$, $|a_n - \ell| < \varepsilon$."
Example. Show that $a_n = 1 - \frac{1}{n} \to 1$.

Given $\varepsilon > 0$, choose $N > \frac{1}{\varepsilon}$, which exists by the Axiom of Archimedes. If $n \geq N$, then $|a_n - 1| = \frac{1}{n} \leq \frac{1}{N} < \varepsilon$. So $a_n \to 1$.
Example. Let
\[ a_n = \begin{cases} \frac{1}{n} & n \text{ is prime} \\ \frac{1}{2n} & n \text{ is not prime.} \end{cases} \]
We will show that $a_n \to 0$. Given $\varepsilon > 0$, choose $N > \frac{1}{\varepsilon}$. Then $\forall n \geq N$, $|a_n - 0| \leq \frac{1}{n} < \varepsilon$.
Example. Prove that
\[ a_n = \begin{cases} 1 & n \text{ is prime} \\ 0 & n \text{ is not prime} \end{cases} \]
diverges.

Let $\varepsilon = \frac{1}{3}$ and suppose $a_n \to \ell$ for some $\ell \in \mathbb{R}$. If $\ell < \frac{1}{2}$, then $|a_n - \ell| > \varepsilon$ when $n$ is prime. If $\ell \geq \frac{1}{2}$, then $|a_n - \ell| > \varepsilon$ when $n$ is not prime. Since the primes and the non-primes are both unbounded, $(\forall N)\ \exists n > N$ such that $|a_n - \ell| > \varepsilon$. So $a_n$ diverges.
An important property of R is the following:
Theorem. Every bounded monotonic sequence converges.
In case of confusion, the terms are defined as follows: $(a_n)$ is increasing if $m \leq n$ implies $a_m \leq a_n$. Decreasing is defined similarly. A sequence is monotonic if it is increasing or decreasing. $(a_n)$ is bounded if there is some $B \in \mathbb{R}$ such that $|a_n| \leq B$ for all $n$.
Proof. Wlog assume $(a_n)$ is increasing. The set $\{a_n : n \geq 1\}$ is bounded and non-empty. So it has a supremum $\ell$ (least upper bound axiom). We show that $\ell$ is the limit:

Given any $\varepsilon > 0$, $\ell - \varepsilon$ is not an upper bound of $\{a_n\}$. So $\exists N$ such that $a_N > \ell - \varepsilon$. Since $(a_n)$ is increasing, we know that $\ell \geq a_m \geq a_N > \ell - \varepsilon$ for all $m \geq N$. So $\forall n \geq N$, $|a_n - \ell| < \varepsilon$. So $a_n \to \ell$.
We can show that this theorem is equivalent to the least upper bound axiom.
Definition (Subsequence). A subsequence of $(a_n)$ is $(a_{g(n)})$ where $g : \mathbb{N} \to \mathbb{N}$ is strictly increasing, e.g. $a_2, a_3, a_5, a_7, \cdots$ is a subsequence of $(a_n)$.
Theorem. Every sequence has a monotonic subsequence.
Proof. Call a point $a_k$ a "peak" if $(\forall m \geq k)\ a_m \leq a_k$. If there are infinitely many peaks, then they form a decreasing subsequence. If there are only finitely many peaks, then $\exists N$ such that no $a_n$ with $n > N$ is a peak. Pick $a_{N_1}$ with $N_1 > N$. Then pick $a_{N_2}$ with $N_2 > N_1$ and $a_{N_2} > a_{N_1}$. This is possible because $a_{N_1}$ is not a peak. Then pick $a_{N_3}$ with $N_3 > N_2$ and $a_{N_3} > a_{N_2}$, ad infinitum. Then we have a monotonic subsequence.
We will now prove the following basic properties of convergence:
Theorem.

(i) If $a_n \to a$ and $a_n \to b$, then $a = b$ (i.e. limits are unique).

(ii) If $a_n \to a$ and $b_n = a_n$ for all but finitely many $n$, then $b_n \to a$.

(iii) If $a_n = a$ for all $n$, then $a_n \to a$.

(iv) If $a_n \to a$ and $b_n \to b$, then $a_n + b_n \to a + b$.

(v) If $a_n \to a$ and $b_n \to b$, then $a_nb_n \to ab$.

(vi) If $a_n \to a \neq 0$, and $\forall n\ (a_n \neq 0)$, then $1/a_n \to 1/a$.

(vii) If $a_n \to a$ and $b_n \to a$, and $\forall n\ (a_n \leq c_n \leq b_n)$, then $c_n \to a$. (Sandwich theorem)
Many students are confused as to why we should prove these "obvious" properties of convergence. It seems "obvious" that if $a_n$ converges to $a$ and $b_n$ converges to $b$, then the sum converges to $a + b$. However, it is not obvious (at least for first-time learners) that if $(\forall \varepsilon > 0)(\exists N)(\forall n \geq N)\ |a_n - a| < \varepsilon$ and $(\forall \varepsilon > 0)(\exists N)(\forall n \geq N)\ |b_n - b| < \varepsilon$, then $(\forall \varepsilon > 0)(\exists N)(\forall n \geq N)\ |(a_n + b_n) - (a + b)| < \varepsilon$.

In some sense, what we are trying to prove is that our attempt at defining convergence actually satisfies the "obvious" properties we think convergence should satisfy.

In proving this, we will make frequent use of the triangle inequality: $|x + y| \leq |x| + |y|$.
Proof.

(i) Suppose instead $a < b$. Then choose $\varepsilon = \frac{b - a}{2}$. By the definition of the limit, $\exists N_1$ such that $\forall n \geq N_1$, $|a_n - a| < \varepsilon$. There also $\exists N_2$ such that $\forall n \geq N_2$, $|a_n - b| < \varepsilon$. Let $N = \max\{N_1, N_2\}$. If $n \geq N$, then $|a - b| \leq |a - a_n| + |a_n - b| < 2\varepsilon = b - a$. Contradiction. So $a = b$.

(ii) Given $\varepsilon > 0$, there $\exists N_1$ such that $\forall n \geq N_1$, we have $|a_n - a| < \varepsilon$. Since $b_n = a_n$ for all but finitely many $n$, there exists $N_2$ such that $\forall n \geq N_2$, $a_n = b_n$. Let $N = \max\{N_1, N_2\}$. Then $\forall n \geq N$, we have $|b_n - a| = |a_n - a| < \varepsilon$. So $b_n \to a$.

(iii) $\forall \varepsilon$, take $N = 1$. Then $|a_n - a| = 0 < \varepsilon$ for all $n \geq 1$.

(iv) Given $\varepsilon > 0$, $\exists N_1$ such that $\forall n \geq N_1$, we have $|a_n - a| < \varepsilon/2$. Similarly, $\exists N_2$ such that $\forall n \geq N_2$, we have $|b_n - b| < \varepsilon/2$. Let $N = \max\{N_1, N_2\}$. Then $\forall n \geq N$, $|(a_n + b_n) - (a + b)| \leq |a_n - a| + |b_n - b| < \varepsilon$.

(v) Given $\varepsilon > 0$, there exist $N_1, N_2, N_3$ such that
\begin{align*}
n \geq N_1&: |a_n - a| < \frac{\varepsilon}{2(|b| + 1)} \\
n \geq N_2&: |b_n - b| < \frac{\varepsilon}{2(|a| + 1)} \\
n \geq N_3&: |b_n - b| < 1 \Rightarrow |b_n| < |b| + 1.
\end{align*}
Then let $N = \max\{N_1, N_2, N_3\}$. Then $\forall n \geq N$,
\begin{align*}
|a_nb_n - ab| &= |b_n(a_n - a) + a(b_n - b)| \\
&\leq |b_n||a_n - a| + |a||b_n - b| \\
&< (|b| + 1)|a_n - a| + (|a| + 1)|b_n - b| \\
&< \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\end{align*}

(vi) Given $\varepsilon > 0$, $\exists N_1, N_2$ such that $\forall n \geq N_1$, $|a_n - a| < \frac{|a|^2}{2}\varepsilon$, and $\forall n \geq N_2$, $|a_n - a| < \frac{|a|}{2}$ (so that $|a_n| > \frac{|a|}{2}$). Let $N = \max\{N_1, N_2\}$. Then $\forall n \geq N$,
\[ \left|\frac{1}{a_n} - \frac{1}{a}\right| = \frac{|a_n - a|}{|a_n||a|} < \frac{2}{|a|^2}|a_n - a| < \varepsilon. \]

(vii) By (iii) to (v), we know that $b_n - a_n \to 0$. Let $\varepsilon > 0$. Then $\exists N$ such that $\forall n \geq N$, we have $|b_n - a_n| < \varepsilon$. Since $a_n \leq c_n \leq b_n$, also $|c_n - a_n| < \varepsilon$. So $c_n - a_n \to 0$. So $c_n = (c_n - a_n) + a_n \to a$.
Example. Let $x_n = \frac{n^2(n+1)(2n+1)}{n^4 + 1}$. Then we have
\[ x_n = \frac{(1 + 1/n)(2 + 1/n)}{1 + 1/n^4} \to \frac{1 \cdot 2}{1} = 2 \]
by the theorem (applied many times).
Example. Let $y_n = \frac{100^n}{n!}$. Since $\frac{y_{n+1}}{y_n} = \frac{100}{n+1} < \frac{1}{2}$ for $n > 200$, we know that $0 \leq y_n < y_{200}\cdot\frac{2^{200}}{2^n}$ for $n > 200$. Since $y_{200}\cdot\frac{2^{200}}{2^n} \to 0$, we know that $y_n \to 0$ as well.
6.3 Series
In a field, the sum of two numbers is defined. By induction, the sum of finitely
many numbers is defined as well. However, infinite sums (“series”) are not. We
will define what it means to take an infinite sum. Of course, infinite sums exist
only for certain nice sums. For example, 1 + 1 + 1 + ··· does not exist.
Definition (Series and partial sums). Let $(a_n)$ be a sequence. Then $s_m = \sum_{n=1}^m a_n$ is the $m$th partial sum of $(a_n)$. We write
\[ \sum_{n=1}^\infty a_n = \lim_{m \to \infty} s_m \]
if the limit exists.
Example. Let $a_n = \frac{1}{n(n-1)}$ for $n \geq 2$. Then
\[ s_m = \sum_{n=2}^m \frac{1}{n(n-1)} = \sum_{n=2}^m \left(\frac{1}{n-1} - \frac{1}{n}\right) = 1 - \frac{1}{m} \to 1. \]
So
\[ \sum_{n=2}^\infty \frac{1}{n(n-1)} = 1. \]
Example. Let $a_n = \frac{1}{n^2}$. Then $s_m = \sum_{n=1}^m \frac{1}{n^2}$. We know that $s_m$ is increasing. We also know that $s_m \leq 1 + \sum \frac{1}{n(n-1)} \leq 2$, i.e. it is bounded above. So $s_m$ converges and $\sum_{n=1}^\infty \frac{1}{n^2}$ exists (in fact it is $\pi^2/6$).
Example (Geometric series). Suppose $a_n = r^n$, where $|r| < 1$. Then
\[ s_m = r\cdot\frac{1 - r^m}{1 - r} \to \frac{r}{1 - r} \]
since $r^m \to 0$. So
\[ \sum_{n=1}^\infty r^n = \frac{r}{1 - r}. \]
Example (Harmonic series). Let $a_n = \frac{1}{n}$. Consider
\begin{align*}
s_{2^k} &= 1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\right) + \cdots + \frac{1}{2^k} \\
&\geq 1 + \frac{1}{2} + \left(\frac{1}{4} + \frac{1}{4}\right) + \left(\frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8}\right) + \cdots \\
&\geq 1 + \frac{k}{2}.
\end{align*}
So $\sum_{n=1}^\infty \frac{1}{n}$ diverges.
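A quick numerical sketch of the three examples: the telescoping and geometric partial sums settle down near their limits, while the harmonic partial sums keep growing (roughly like $\log m$).

```python
def partial_sum(term, lo, m):
    """The m-th partial sum of a series whose terms start at index lo."""
    return sum(term(n) for n in range(lo, m + 1))

for m in (10, 100, 1000):
    print(m,
          round(partial_sum(lambda n: 1 / (n * (n - 1)), 2, m), 4),  # -> 1
          round(partial_sum(lambda n: 0.5 ** n, 1, m), 4),           # -> 1
          round(partial_sum(lambda n: 1 / n, 1, m), 4))              # grows
```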
Decimal expansions
Definition (Decimal expansion). Let $(d_n)$ be a sequence with $d_n \in \{0, 1, \cdots, 9\}$. Then
\[ \sum_{n=1}^\infty \frac{d_n}{10^n} \]
converges to a limit $r$ with $0 \leq r \leq 1$, since the partial sums $s_m$ are increasing and bounded by $\sum \frac{9}{10^n} \leq 1$ (geometric series). We say $r = 0.d_1d_2d_3\cdots$, the decimal expansion of $r$.
Does every $x$ with $0 \leq x < 1$ have a decimal expansion? Pick $d_1$ maximal such that $\frac{d_1}{10} \leq x$. Then $0 \leq x - \frac{d_1}{10} < \frac{1}{10}$ since $d_1$ is maximal. Then pick $d_2$ maximal such that $\frac{d_2}{100} \leq x - \frac{d_1}{10}$. By maximality, $0 \leq x - \frac{d_1}{10} - \frac{d_2}{100} < \frac{1}{100}$. Repeat inductively, picking maximal $d_n$ with
\[ \frac{d_n}{10^n} \leq x - \sum_{j=1}^{n-1}\frac{d_j}{10^j}, \quad\text{so that}\quad 0 \leq x - \sum_{j=1}^n \frac{d_j}{10^j} < \frac{1}{10^n}. \]
Since both bounding sides tend to 0, by the sandwich theorem, $x - \sum_{j=1}^\infty \frac{d_j}{10^j} = 0$, i.e. $x = 0.d_1d_2\cdots$.
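The greedy digit-picking argument is itself an algorithm. A sketch using exact rational arithmetic (the helper is mine):

```python
from fractions import Fraction

def decimal_digits(x, k):
    """First k decimal digits of x with 0 <= x < 1: at each step take the
    maximal digit d with d / 10^n <= what is left."""
    digits, rest = [], Fraction(x)
    for _ in range(k):
        rest *= 10
        d = int(rest)                  # maximal digit not exceeding the rest
        digits.append(d)
        rest -= d
    return digits

print(decimal_digits(Fraction(1, 7), 8))   # [1, 4, 2, 8, 5, 7, 1, 4]
```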
We have shown that every such $x$ has at least one decimal expansion. Can the same number have two different decimal expansions? I.e. if $0.a_1a_2\cdots = 0.b_1b_2\cdots$, must $a_i = b_i$ for all $i$?
Now suppose that the $a_j$ and $b_j$ are equal until position $k$, i.e. $a_j = b_j$ for $j < k$; wlog assume $a_k < b_k$. Then
\[ \sum_{j=k+1}^\infty \frac{a_j}{10^j} \leq \sum_{j=k+1}^\infty \frac{9}{10^j} = \frac{9}{10^{k+1}}\cdot\frac{1}{1 - 1/10} = \frac{1}{10^k}. \]
So we must have $b_k = a_k + 1$, $a_j = 9$ for $j > k$ and $b_j = 0$ for $j > k$. For example, $0.47999\cdots = 0.48000\cdots$.
6.4 Irrational numbers
Recall $\mathbb{Q} \subseteq \mathbb{R}$.

Definition (Irrational number). Numbers in $\mathbb{R} \setminus \mathbb{Q}$ are irrational.

Definition (Periodic number). A decimal is periodic if, after a finite number $\ell$ of digits, it repeats in blocks of $k$ for some $k$, i.e. $d_{n+k} = d_n$ for $n > \ell$.
Proposition. A number is periodic iff it is rational.
Proof. Clearly a periodic decimal is rational: say $x = 0.7413157157157\cdots$. Then
\begin{align*}
10^\ell x = 10^4x &= 7413.157157\cdots \\
&= 7413 + 157\left(\frac{1}{10^3} + \frac{1}{10^6} + \frac{1}{10^9} + \cdots\right) \\
&= 7413 + 157\cdot\frac{1}{10^3}\cdot\frac{1}{1 - 1/10^3} \in \mathbb{Q}.
\end{align*}
Conversely, let $x \in \mathbb{Q}$. Then $x$ has a periodic decimal. Suppose $x = \frac{p}{2^c5^dq}$ with $(q, 10) = 1$. Then $10^{\max(c,d)}x = \frac{a}{q} = n + \frac{b}{q}$ for some $a, b, n \in \mathbb{Z}$ and $0 \leq b < q$. However, since $(q, 10) = 1$, by Fermat-Euler, $10^{\varphi(q)} \equiv 1 \pmod q$, i.e. $10^{\varphi(q)} - 1 = kq$ for some $k$. Then
\[ \frac{b}{q} = \frac{kb}{kq} = \frac{kb}{999\cdots9} = kb\left(\frac{1}{10^{\varphi(q)}} + \frac{1}{10^{2\varphi(q)}} + \cdots\right). \]
Since $kb < kq < 10^{\varphi(q)}$, write $kb = d_1d_2\cdots d_{\varphi(q)}$ (as a string of $\varphi(q)$ digits, allowing leading zeros). So $\frac{b}{q} = 0.d_1d_2\cdots d_{\varphi(q)}d_1d_2\cdots$ and $x$ is periodic.
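In practice the period can be found by long division: the decimal repeats as soon as a remainder recurs, and for $(q, 10) = 1$ the proof shows the period divides $\varphi(q)$. A sketch (helper name mine):

```python
def decimal_period(b, q):
    """Length of the repeating block of b/q, assuming gcd(q, 10) = 1 and
    0 < b < q: do long division and wait for a remainder to repeat."""
    seen, r, pos = {}, b % q, 0
    while r not in seen:
        seen[r] = pos
        r = (r * 10) % q
        pos += 1
    return pos - seen[r]

print(decimal_period(1, 7))    # 6, and phi(7) = 6
print(decimal_period(1, 11))   # 2, which divides phi(11) = 10
```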
Example. $x = 0.01101010001010\cdots$, where 1s appear exactly in the prime positions, is irrational: the gaps between consecutive primes grow arbitrarily large, so the digits cannot eventually repeat.
6.5 Euler’s number
Definition (Euler's number).
\[ e = \sum_{j=0}^\infty \frac{1}{j!} = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots \]
This sum exists because the partial sums are increasing and bounded by $1 + 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots = 3$. So $2 < e < 3$.
Proposition. e is irrational.
Proof. Suppose $e \in \mathbb{Q}$, say $e = \frac{p}{q}$. We know $q \geq 2$ since $e$ is not an integer (it is between 2 and 3). Then $q!e \in \mathbb{N}$. But
\[ q!e = \underbrace{q! + q! + \frac{q!}{2!} + \frac{q!}{3!} + \cdots + \frac{q!}{q!}}_{n} + \underbrace{\frac{q!}{(q+1)!} + \frac{q!}{(q+2)!} + \cdots}_{x}, \]
where $n \in \mathbb{N}$. We also have
\[ x = \frac{1}{q+1} + \frac{1}{(q+1)(q+2)} + \cdots. \]
We can bound it by
\[ 0 < x < \frac{1}{q+1} + \frac{1}{(q+1)^2} + \frac{1}{(q+1)^3} + \cdots = \frac{1}{q+1}\cdot\frac{1}{1 - 1/(q+1)} = \frac{1}{q} < 1. \]
This is a contradiction, since $q!e$ must be in $\mathbb{N}$ but it is a sum of an integer $n$ plus a non-integer $x$.
6.6 Algebraic numbers
Rational numbers are “nice”, because they can be written as fractions. Irrational
numbers are bad. However, some irrational numbers are worse than others. We
can further classify some irrational numbers as being transcendental.
Definition
(Algebraic and transcendental numbers)
.
An algebraic number is a
root of a polynomial with integer coefficients (or rational coefficients). A number
is transcendental if it is not algebraic.
Proposition. All rational numbers are algebraic.

Proof. Let $x = \frac{p}{q}$; then $x$ is a root of $qx - p = 0$.

Example. $\sqrt{2}$ is irrational but algebraic, since it is a root of $x^2 - 2 = 0$.
So do transcendental numbers exist?
Theorem (Liouville 1851; non-examinable). $L$ is transcendental, where
\[ L = \sum_{n=1}^\infty \frac{1}{10^{n!}} = 0.110001000\cdots \]
with 1s in the factorial positions.
Proof. Suppose instead that $f(L) = 0$ where $f(x) = a_kx^k + a_{k-1}x^{k-1} + \cdots + a_0$, with $a_i \in \mathbb{Z}$, $a_k \neq 0$.

For any rational $p/q$, we have
\[ f\left(\frac{p}{q}\right) = a_k\left(\frac{p}{q}\right)^k + \cdots + a_0 = \frac{\text{integer}}{q^k}. \]
So if $p/q$ is not a root of $f$, then $|f(p/q)| \geq q^{-k}$.

For any $m$, we can write $L = (\text{first } m \text{ terms}) + (\text{rest of the terms}) = s + t$.

Now consider $|f(s)| = |f(L) - f(s)|$ (since $f(L) = 0$). We have
\begin{align*}
|f(L) - f(s)| &= \left|\sum a_i(L^i - s^i)\right| \leq \sum |a_i||L^i - s^i| \\
&= \sum |a_i|(L - s)(L^{i-1} + \cdots + s^{i-1}) \leq \sum |a_i|(L - s)\,i = (L - s)\sum i|a_i| = tC,
\end{align*}
with $C = \sum i|a_i|$.

Writing $s$ as a fraction, its denominator is at most $10^{m!}$. So $|f(s)| \geq 10^{-k\cdot m!}$. Combining with the above, we have $tC \geq 10^{-k\cdot m!}$. We can bound $t$ by
\[ t = \sum_{j=m+1}^\infty 10^{-j!} \leq \sum_{\ell=(m+1)!}^\infty 10^{-\ell} = \frac{10}{9}\cdot 10^{-(m+1)!}. \]
So $(10C/9)\,10^{-(m+1)!} \geq 10^{-k\cdot m!}$, i.e. $10C/9 \geq 10^{(m+1)! - k\cdot m!} = 10^{(m+1-k)m!}$. Pick $m \in \mathbb{N}$ so that $m > k$ and $10^{m!} > \frac{10C}{9}$. This is always possible since both $k$ and $10C/9$ are constants. Then the inequality gives $10C/9 \geq 10^{(m+1-k)m!} \geq 10^{m!} > 10C/9$, which is a contradiction.
Theorem (Hermite 1873). $e$ is transcendental.

Theorem (Lindemann 1882). $\pi$ is transcendental.
7 Countability
After messing with numbers, we finally get back to sets. Here we are concerned with the sizes of sets. We can count how big a set is by constructing bijections: two sets have the same number of things if there is a bijection between them. In particular, a set has $n$ things if we can biject it with $[n] = \{1, 2, 3, \cdots, n\}$.

We first prove a few preliminary properties about bijecting with $[n]$ that should be obviously true.
Lemma. If $f : [n] \to [n]$ is injective, then $f$ is bijective.

Proof. Perform induction on $n$: it is true for $n = 1$. Suppose $n > 1$. Let $j = f(n)$. Define $g : [n] \to [n]$ by
\[ g(j) = n,\quad g(n) = j,\quad g(i) = i \text{ otherwise}. \]
Then $g$ is a bijection. So the map $g \circ f$ is injective. It fixes $n$, i.e. $g \circ f(n) = n$. So the map $h : [n-1] \to [n-1]$ given by $h(i) = g \circ f(i)$ is well-defined and injective. By the induction hypothesis, $h$ is surjective. So $h$ is bijective. So $g \circ f$ is bijective, and so is $f$.
Corollary. If $A$ is a set and $f : A \to [n]$ and $g : A \to [m]$ are both bijections, then $m = n$.

Proof. Wlog assume $n \leq m$. Let $h : [n] \to [m]$ be given by $h(i) = i$, which is injective. Then the map $h \circ f \circ g^{-1} : [m] \to [m]$ is injective. Then by the lemma this is surjective. So $h$ must be surjective. So $n \geq m$. Hence $n = m$.
This shows that we cannot biject a set to two different numbers, or a set
cannot have two different sizes!
Definition (Finite set and cardinality of set). The set $A$ is finite if there exists a bijection $A \to [n]$ for some $n \in \mathbb{N}_0$. The cardinality or size of $A$, written $|A|$, is $n$. By the above corollary, this is well-defined.
Lemma. Let $S \subseteq \mathbb{N}$. Then either $S$ is finite or there is a bijection $g : \mathbb{N} \to S$.

Proof. If $S \neq \emptyset$, by the well-ordering principle, there is a least element $s_1 \in S$. If $S \setminus \{s_1\} \neq \emptyset$, it has a least element $s_2$. If $S \setminus \{s_1, s_2\}$ is not empty, there is a least element $s_3$. If at some point the process stops, then $S = \{s_1, s_2, \cdots, s_n\}$, which is finite. Otherwise, if it goes on forever, the map $g : \mathbb{N} \to S$ given by $g(i) = s_i$ is well-defined and is an injection. It is also a surjection, because if $k \in S$, then $k$ is a natural number and there are at most $k$ elements of $S$ less than $k$. So $k$ will be mapped to $s_i$ for some $i \leq k$.
Definition (Countable set). A set $A$ is countable if $A$ is finite or there is a bijection between $A$ and $\mathbb{N}$. A set $A$ is uncountable if $A$ is not countable.

This is one possible definition of countability, but there are some (often more helpful) equivalent characterizations.

Theorem. The following are equivalent:

(i) $A$ is countable.

(ii) There is an injection from $A$ to $\mathbb{N}$.

(iii) $A = \emptyset$ or there is a surjection from $\mathbb{N}$ to $A$.
Proof.

(i) $\Rightarrow$ (iii): If $A$ is finite and non-empty, there is a bijection $f : A \to S$ for some $S \subseteq \mathbb{N}$. For all $x \in \mathbb{N}$, if $x \in S$, then map $x \mapsto f^{-1}(x)$; otherwise, map $x$ to any fixed element of $A$. This is a surjection, since $\forall a \in A$, we have $f(a) \mapsto a$. If $A$ is infinite and countable, the bijection $\mathbb{N} \to A$ is itself a surjection.

(iii) $\Rightarrow$ (ii): Suppose $A \neq \emptyset$ and $f : \mathbb{N} \to A$ is a surjection. Define a map $g : A \to \mathbb{N}$ by $g(a) = \min f^{-1}(\{a\})$, which exists by well-ordering. Then $g$ is an injection.

(ii) $\Rightarrow$ (i): If there is an injection $f : A \to \mathbb{N}$, then $f$ gives a bijection between $A$ and $S = f(A) \subseteq \mathbb{N}$. If $S$ is finite, so is $A$. If $S$ is infinite, there is a bijection $g$ between $S$ and $\mathbb{N}$. So there is a bijection $g \circ f$ between $A$ and $\mathbb{N}$.
Often, the injection definition is the most helpful.
Proposition. The integers $\mathbb{Z}$ are countable.

Proof. The map $f : \mathbb{Z} \to \mathbb{N}$ given by
\[ f(n) = \begin{cases} 2n & n > 0 \\ 2(-n) + 1 & n \leq 0 \end{cases} \]
is a bijection.

Proposition. $\mathbb{N} \times \mathbb{N}$ is countable.

Proof. We can map $(a, b) \mapsto 2^a3^b$ injectively by the fundamental theorem of arithmetic. So $\mathbb{N} \times \mathbb{N}$ is countable.
We can also obtain an explicit bijection by counting diagonally: $(a, b) \mapsto \binom{a+b}{2} - a + 1$. The first few values are tabulated below, with $a$ indexing the rows and $b$ the columns:

  a\b    1    2    3    4
   1     1    3    6   10
   2     2    5    9   14
   3     4    8   13   19
   4     7   12   18   25
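A small check of the diagonal formula against the table, using $\binom{a+b}{2} = \frac{(a+b)(a+b-1)}{2}$:

```python
def pair(a, b):
    """The diagonal bijection N x N -> N: (a, b) |-> C(a+b, 2) - a + 1."""
    s = a + b
    return s * (s - 1) // 2 - a + 1

for a in range(1, 5):
    print([pair(a, b) for b in range(1, 5)])
# [1, 3, 6, 10]
# [2, 5, 9, 14]
# [4, 8, 13, 19]
# [7, 12, 18, 25]
```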
Since $\mathbb{Z}$ is countable, we have an injection $\mathbb{Z} \to \mathbb{N}$, so there is an injection $\mathbb{Z} \times \mathbb{N} \to \mathbb{N} \times \mathbb{N} \to \mathbb{N}$. So $\mathbb{Z} \times \mathbb{N}$ is countable. However, the rationals are the equivalence classes of $\mathbb{Z} \times \mathbb{N}$. So $\mathbb{Q}$ is countable.

Proposition. If there is an injection $A \to B$ and $B$ is countable, then $A$ is countable (since we can inject $B \to \mathbb{N}$).
Proposition. $\mathbb{Z}^k$ is countable for all $k \in \mathbb{N}$.

Proof. By induction: $\mathbb{Z}$ is countable. If $\mathbb{Z}^k$ is countable, then $\mathbb{Z}^{k+1} = \mathbb{Z} \times \mathbb{Z}^k$. Since we can map $\mathbb{Z}^k \to \mathbb{N}$ injectively by the induction hypothesis, we can map $\mathbb{Z}^{k+1} \to \mathbb{Z} \times \mathbb{N}$ injectively, and we can map that to $\mathbb{N}$ injectively.
Theorem. A countable union of countable sets is countable.
Proof. Let $I$ be a countable index set, and for each $\alpha \in I$, let $A_\alpha$ be a countable set. We need to show that $\bigcup_{\alpha \in I}A_\alpha$ is countable. It is enough to construct an injection $h : \bigcup_{\alpha \in I}A_\alpha \to \mathbb{N} \times \mathbb{N}$, because $\mathbb{N} \times \mathbb{N}$ is countable. We know that $I$ is countable, so there exists an injection $f : I \to \mathbb{N}$. For each $\alpha \in I$, there exists an injection $g_\alpha : A_\alpha \to \mathbb{N}$.

For $a \in \bigcup A_\alpha$, pick $m = \min\{j \in \mathbb{N} : a \in A_\alpha \text{ and } f(\alpha) = j \text{ for some } \alpha \in I\}$, and let $\alpha$ be the corresponding index such that $f(\alpha) = m$. We then set $h(a) = (m, g_\alpha(a))$, and this is an injection.
Proposition. $\mathbb{Q}$ is countable.

Proof. This can be proved in two ways:

(i) $\mathbb{Q} = \bigcup_{n \geq 1}\frac{1}{n}\mathbb{Z} = \bigcup_{n \geq 1}\{\frac{m}{n} : m \in \mathbb{Z}\}$, which is a countable union of countable sets.

(ii) $\mathbb{Q}$ can be mapped injectively to $\mathbb{Z} \times \mathbb{N}$ by $a/b \mapsto (a, b)$, where $b > 0$ and $(a, b) = 1$.
Theorem. The set of algebraic numbers is countable.
Proof. Let $P_k$ be the set of polynomials of degree $k$ with integer coefficients. Then $a_kx^k + a_{k-1}x^{k-1} + \cdots + a_0 \mapsto (a_k, a_{k-1}, \cdots, a_0)$ is an injection $P_k \to \mathbb{Z}^{k+1}$. Since $\mathbb{Z}^{k+1}$ is countable, so is $P_k$.

Let $P$ be the set of all polynomials with integer coefficients. Then clearly $P = \bigcup P_k$. This is a countable union of countable sets. So $P$ is countable.

For each polynomial $p \in P$, let $R_p$ be the set of its roots. Then $R_p$ is finite and thus countable. Hence $\bigcup_{p \in P}R_p$, the set of all algebraic numbers, is countable.
Theorem. The set of real numbers R is uncountable.
Proof. (Cantor's diagonal argument) Assume $\mathbb{R}$ is countable. Then we can list the reals as $r_1, r_2, r_3, \cdots$ so that every real number is in the list. Write each $r_n$ uniquely in decimal form (i.e. without infinite trailing 9s). List them out vertically:
\begin{align*}
r_1 &= n_1\,.\,d_{11}\,d_{12}\,d_{13}\,d_{14}\cdots \\
r_2 &= n_2\,.\,d_{21}\,d_{22}\,d_{23}\,d_{24}\cdots \\
r_3 &= n_3\,.\,d_{31}\,d_{32}\,d_{33}\,d_{34}\cdots \\
r_4 &= n_4\,.\,d_{41}\,d_{42}\,d_{43}\,d_{44}\cdots
\end{align*}
Define $r = 0.d_1d_2d_3d_4\cdots$ by
\[ d_n = \begin{cases} 0 & d_{nn} \neq 0 \\ 1 & d_{nn} = 0. \end{cases} \]
Then by construction, $r$ differs from the $n$th number in the list in the $n$th digit, and is so different from every number in the list. Then $r$ is a real number but not in the list. Contradiction.
Corollary. There are uncountably many transcendental numbers.

Proof. If not, then the reals, being the union of the transcendentals and the algebraic numbers, would be countable. But the reals are uncountable.

This is an easy but non-constructive proof that transcendental numbers exist. "If we can't find one, find lots!" (It is debatable whether this proof is constructive or not. Some argue that we can use it to construct a transcendental number, by listing all the algebraic numbers and performing the diagonal argument to obtain a number not in the list, i.e. a transcendental number. So this is in fact constructive.)
Example. Let $F_k = \{Y \subseteq \mathbb{N} : |Y| = k\}$, i.e. the set of all subsets of $\mathbb{N}$ of size $k$. We can inject $F_k \to \mathbb{Z}^k$ in the obvious way, e.g. $\{1, 3, 7\} \mapsto (1, 3, 7)$ etc. So it is countable. So $F = \bigcup_{k \geq 0}F_k$, the set of all finite subsets of $\mathbb{N}$, is countable.
Example. Recall $\mathcal{P}(X) = \{Y : Y \subseteq X\}$. Now suppose $\mathcal{P}(\mathbb{N})$ is countable. Let $S_1, S_2, S_3, \cdots$ be the list of all subsets of $\mathbb{N}$. Let $S = \{n : n \not\in S_n\}$. But then $S$ is not in the list. Contradiction. So $\mathcal{P}(\mathbb{N})$ is uncountable.
Example. Let $\Sigma$ be the set of all functions $\mathbb{N} \to \mathbb{N}$ (i.e. the set of all integer sequences). If $\Sigma$ were countable, we could list it as $f_1, f_2, f_3, \cdots$. But then consider $f$ given by
\[ f(n) = \begin{cases} 1 & f_n(n) \neq 1 \\ 2 & f_n(n) = 1. \end{cases} \]
Again $f$ is not in the list. Contradiction. So $\Sigma$ is uncountable.

Alternatively, there is a bijection between $\mathcal{P}(\mathbb{N})$ and the set of 0,1 sequences, given by $S \mapsto$ its indicator function. So we can inject $\mathcal{P}(\mathbb{N}) \to \Sigma$ by $S \mapsto$ (indicator function of $S$) $+ 1$. So $\Sigma$ cannot be countable (since $\mathcal{P}(\mathbb{N})$ is uncountable).

Or, we can let $\Sigma^* \subseteq \Sigma$ be the set of bijections $\mathbb{N} \to \mathbb{N}$, and $\Sigma^{**} \subseteq \Sigma^*$ the bijections of the special form: for every $n$,
\[ \text{either}\quad \begin{cases} f(2n-1) = 2n-1 \\ f(2n) = 2n \end{cases} \quad\text{or}\quad \begin{cases} f(2n-1) = 2n \\ f(2n) = 2n-1, \end{cases} \]
i.e. for every odd-even pair, we either flip them or keep them the same.

But there is a bijection between $\Sigma^{**}$ and the 0,1 sequences: if the $n$th term of the sequence is 0, don't flip the $n$th pair; otherwise flip it. Hence $\Sigma^{**}$ is uncountable, and therefore so are $\Sigma^*$ and $\Sigma$.
Theorem. Let $A$ be a set. Then there is no surjection from $A$ to $\mathcal{P}(A)$.

Proof. Suppose $f : A \to \mathcal{P}(A)$ is surjective. Let $S = \{a \in A : a \not\in f(a)\}$. Since $f$ is surjective, there must exist $s \in A$ such that $f(s) = S$. If $s \in S$, then $s \not\in S$ by the definition of $S$. Conversely, if $s \not\in S$, then $s \in S$. Contradiction. So $f$ cannot exist.
This shows that there are infinitely many different possible “infinite sizes” of
sets.
We conclude with two statements that we will not prove.
Theorem (Cantor-Schröder-Bernstein theorem). Suppose there are injections $A \to B$ and $B \to A$. Then there is a bijection $A \to B$.

Continuum hypothesis. There is no set whose size lies strictly between those of $\mathbb{N}$ and $\mathbb{R}$.

In 1963, Paul Cohen proved that it is impossible to prove or disprove this statement (in ZFC). The proof can be found in the Part III Topics in Set Theory course.