III Ramsey Theory (Full)

Part III — Ramsey Theory

Based on lectures by B. P. Narayanan

Notes taken by Dexter Chua

Lent 2017

These notes are not endorsed by the lecturers, and I have modified them (often

significantly) after lectures. They are nowhere near accurate representations of what

was actually lectured, and in particular, all errors are almost surely mine.

What happens when we cut up a mathematical structure into many ‘pieces’? How

big must the original structure be in order to guarantee that at least one of the pieces

has a specific property of interest? These are the kinds of questions at the heart of

Ramsey theory. A prototypical result in the area is van der Waerden’s theorem, which

states that whenever we partition the natural numbers into finitely many classes, there

is a class that contains arbitrarily long arithmetic progressions.

The course will cover both classical material and more recent developments in the

subject. Some of the classical results that I shall cover include Ramsey’s theorem, van

der Waerden’s theorem and the Hales–Jewett theorem; I shall discuss some applications

of these results as well. More recent developments that I hope to cover include the

properties of non-Ramsey graphs, topics in geometric Ramsey theory, and finally,

connections between Ramsey theory and topological dynamics. I will also indicate a

number of open problems.

Pre-requisites

There are almost no pre-requisites and the material presented in this course will, by

and large, be self-contained. However, students familiar with the basic notions of graph

theory and point-set topology are bound to find the course easier.

Contents

0 Introduction

1 Graph Ramsey theory

1.1 Infinite graphs

1.2 Finite graphs

2 Ramsey theory on the integers

3 Partition Regularity

4 Topological Dynamics in Ramsey Theory

5 Sums and products*

0 Introduction

Vaguely, the main idea of Ramsey theory is as follows: we have some object $X$ that comes with some structure, and then we break $X$ up into finitely many pieces. Can we find a piece that retains some of the structure of $X$? Usually, we

phrase this in terms of colourings. We pick a list of finitely many colours, and

colour each element of X. Then we want to find a monochromatic subset of X

that satisfies some properties.

The most classical example of this is a graph colouring problem. We take a graph and colour each edge with either red or blue.

[Figures omitted: a graph, and a red-blue colouring of its edges.]

We now try to find a (complete) subgraph that is monochromatic, i.e. a subgraph all of whose edges are the same colour. With a bit of effort, we can find a red monochromatic subgraph of size 4 in the pictured colouring.

Are we always guaranteed that we can find a monochromatic subgraph of size 4?

If not, how big does the initial graph have to be? These are questions we will

answer in this course. Often, we will ask the question in an “infinite” form —

given an infinite graph, is it always possible to find an infinite monochromatic

subgraph? The answer is yes, and it turns out we can deduce results about the

finitary version just by knowing the infinite version.

These questions about graphs will take up the first chapter of the course.

In the remainder of the course, we will discuss Ramsey theory on the integers, which, for the purpose of this course, will always mean $\mathbb{N} = \{1, 2, 3, \cdots\}$. We will now try to answer questions about the arithmetic structure of $\mathbb{N}$. For example, when we finitely colour $\mathbb{N}$, can we find an infinite monochromatic arithmetic progression? With some thought, we can see that the answer is no. However, it turns out we can find arbitrarily long monochromatic arithmetic progressions.

More generally, suppose we have some system of linear equations

3x + 6y + 2z = 0

2x + 7y + 3z = 0.

If we finitely colour $\mathbb{N}$, can we always find a monochromatic solution to these equations?

Remarkably, there is a complete characterization of all linear systems that

always admit monochromatic solutions. This is given by Rado’s theorem, which

is one of the hardest theorems we are going to prove in the course.

If we are brave, then we can try to get the multiplicative structure of $\mathbb{N}$ into the picture. We begin by asking the most naive question: if we finitely colour $\mathbb{N}$, can we always find $x, y \in \mathbb{N}$, not both 2, such that $x + y$ and $xy$ are the same colour? This is a very hard question, and the answer turns out to be yes.

1 Graph Ramsey theory

1.1 Infinite graphs

We begin by laying out some notations and definitions.

Definition (N and [n]). We write

N = {1, 2, 3, 4, ···}.

We also write

[n] = {1, 2, 3, ··· , n}.

Notation. For a set $X$, we write $X^{(r)}$ for the set of subsets of $X$ of size $r$. The elements are known as $r$-sets.

We all know what a graph is, hopefully, but for concreteness we provide a

definition:

Definition (Graph). A graph $G$ is a pair $(V, E)$, where $E \subseteq V^{(2)}$.

In particular, our graphs are not directed.

We are mostly interested in graphs with $V = \mathbb{N}$ or $[n]$. In this case, we will write an edge $\{i, j\}$ as $ij$, and always assume that $i < j$. More generally, in $\mathbb{N}^{(r)}$, we write $i_1 i_2 \cdots i_r$ for an $r$-set, and implicitly assume $i_1 < i_2 < \cdots < i_r$.

Definition (k-colouring). A k-colouring of a set X is a map c : X → [k].

Of course, we often replace $[k]$ with some actual set of colours, e.g. $\{\text{red}, \text{blue}\}$.

We make more obvious definitions:

Definition (Monochromatic set). Let $X$ be a set with a $k$-colouring. We say a subset $Y \subseteq X$ is monochromatic if the colouring restricted to $Y$ is constant.

The first question we are interested in is the following: if $\mathbb{N}^{(2)}$ is $k$-coloured, can we find a complete infinite subgraph that is monochromatic? In other words, is there an infinite subset $X \subseteq \mathbb{N}$ such that $X^{(2)} \subseteq \mathbb{N}^{(2)}$ is monochromatic?

We begin by trying some examples. The correct way to read these examples

is to stare at the colouring, and then try to find a monochromatic subset yourself,

instead of immediately reading on to see what the monochromatic subset is.

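Example. Suppose we had the colouring $c : \mathbb{N}^{(2)} \to \{\text{red}, \text{blue}\}$ by
$$c(ij) = \begin{cases} \text{red} & i + j \text{ is even} \\ \text{blue} & i + j \text{ is odd} \end{cases}$$
Then we can pick $X = \{2, 4, 6, 8, \cdots\}$, and this is an infinite monochromatic set.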

Example. Consider $c : \mathbb{N}^{(2)} \to \{0, 1, 2\}$, where
$$c(ij) = \max\{n : 2^n \mid i + j\} \bmod 3.$$
In this case, taking $X = \{8, 8^2, 8^3, \cdots\}$ gives an infinite monochromatic set of colour 0.
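The two colourings above are easy to experiment with on a computer. The following Python sketch evaluates them on finite initial segments of two candidate sets, the even numbers and the powers of 8 (these candidate sets, and the names `parity_colour`, `two_adic_colour` and `colours_used`, are choices made for this illustration). A finite check of course proves nothing about the infinite set; it is only a sanity check.

```python
from itertools import combinations

def parity_colour(i, j):
    """First example: red if i + j is even, blue if it is odd."""
    return "red" if (i + j) % 2 == 0 else "blue"

def two_adic_colour(i, j):
    """Second example: (largest n with 2^n dividing i + j) mod 3."""
    s, n = i + j, 0
    while s % 2 == 0:
        s //= 2
        n += 1
    return n % 3

def colours_used(points, colour):
    """Set of colours appearing on the pairs of a finite set of points."""
    return {colour(i, j) for i, j in combinations(sorted(points), 2)}

evens = [2 * t for t in range(1, 41)]         # 2, 4, ..., 80
powers_of_8 = [8 ** t for t in range(1, 21)]  # 8, 8^2, ..., 8^20

print(colours_used(evens, parity_colour))
print(colours_used(powers_of_8, two_adic_colour))
```

Both calls report a single colour (red and 0 respectively), whereas `colours_used(range(1, 6), parity_colour)` reports both red and blue.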

Example. Let $c : \mathbb{N}^{(2)} \to \{\text{red}, \text{blue}\}$ by
$$c(ij) = \begin{cases} \text{red} & i + j \text{ has an even number of distinct prime factors} \\ \text{blue} & \text{otherwise} \end{cases}$$
It is in fact an open problem to find an explicit infinite monochromatic set for this colouring, or even to determine for which colour an infinite monochromatic set exists. However, we can prove that such a set must exist!

Theorem (Ramsey's theorem). Whenever we $k$-colour $\mathbb{N}^{(2)}$, there exists an infinite monochromatic set $X$, i.e. given any map $c : \mathbb{N}^{(2)} \to [k]$, there exists a subset $X \subseteq \mathbb{N}$ such that $X$ is infinite and $c|_{X^{(2)}}$ is a constant function.

Proof. Pick an arbitrary point $a_1 \in \mathbb{N}$. Then by the pigeonhole principle, there must exist an infinite set $B_1 \subseteq \mathbb{N} \setminus \{a_1\}$ such that all the $a_1$-$B_1$ edges (i.e. edges of the form $(a_1, b_1)$ with $b_1 \in B_1$) are of the same colour $c_1$.

Now again arbitrarily pick an element $a_2 \in B_1$. Again, we find some infinite $B_2 \subseteq B_1 \setminus \{a_2\}$ such that all $a_2$-$B_2$ edges are the same colour $c_2$. We proceed inductively.

We obtain a sequence $a_1, a_2, \cdots$ and a sequence of colours $c_1, c_2, \cdots$ such that $c(a_i a_j) = c_i$ for $i < j$.

Now again by the pigeonhole principle, since there are finitely many colours, there exists an infinite subsequence $c_{i_1}, c_{i_2}, \cdots$ that is constant. Then $\{a_{i_1}, a_{i_2}, \cdots\}$ is an infinite monochromatic set, since all edges are of the colour $c_{i_1} = c_{i_2} = \cdots$. So we are done.
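The two passes of this proof can be imitated on a finite vertex set, where "infinite" is replaced by "largest colour class". The following Python sketch is only a finite analogue under that assumption, not the argument itself; the function names are made up for this illustration. As a test colouring it uses the number of distinct prime factors of $i + j$.

```python
from itertools import combinations

def distinct_prime_factors(n):
    """Number of distinct primes dividing n, by trial division."""
    count, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:
                n //= p
        p += 1
    return count + (1 if n > 1 else 0)

def prime_factor_colour(i, j):
    return "red" if distinct_prime_factors(i + j) % 2 == 0 else "blue"

def monochromatic_clique(vertices, colour):
    """Finite analogue of the two-pass argument."""
    remaining = list(vertices)
    picked = []                      # pairs (a_i, c_i)
    while remaining:
        a = remaining.pop(0)
        if not remaining:
            picked.append((a, None)) # last vertex has no forward edges
            break
        # First pass: keep the largest colour class of edges leaving a.
        classes = {}
        for b in remaining:
            classes.setdefault(colour(a, b), []).append(b)
        c, remaining = max(classes.items(), key=lambda kv: len(kv[1]))
        picked.append((a, c))
    # Second pass: keep the vertices whose forward colour is the commonest one.
    counts = {}
    for _, c in picked:
        if c is not None:
            counts[c] = counts.get(c, 0) + 1
    if not counts:
        return [a for a, _ in picked]
    best = max(counts, key=counts.get)
    return [a for a, c in picked if c == best or c is None]

def is_monochromatic(points, colour):
    return len({colour(i, j) for i, j in combinations(points, 2)}) <= 1

clique = monochromatic_clique(range(1, 200), prime_factor_colour)
print(clique, is_monochromatic(clique, prime_factor_colour))
```

Since each first-pass step keeps at least half of the remaining vertices when there are two colours, the clique found has size roughly $\frac{1}{2}\log_2$ of the number of vertices, which mirrors the bounds discussed for finite graphs.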

This proof exhibits many common themes we will see in the different Ramsey theory proofs we will later do. The first is that the proof is highly non-constructive. Not only does it not give us the infinite monochromatic set; it doesn't even tell us what the colour is.

Another feature of the proof is that we did not obtain the infinite monochromatic set in one go. Instead, we had to first pass through that intermediate structure, and then obtain an infinite monochromatic set from that. In future proofs, we might have to go through many passes, instead of just two.

This theorem looks rather innocuous, but it has many interesting applications.

Corollary (Bolzano–Weierstrass theorem). Let $(x_i)_{i \geq 0}$ be a bounded sequence of real numbers. Then it has a convergent subsequence.

Proof. We define a colouring $c : \mathbb{N}^{(2)} \to \{\uparrow, \downarrow\}$, where
$$c(ij) = \begin{cases} \uparrow & x_i < x_j \\ \downarrow & x_j \leq x_i \end{cases}$$
Then Ramsey's theorem gives us an infinite monochromatic set, which is the same as a monotone subsequence. Since this is bounded, it must converge.

With a very similar technique, we can prove that we can do this for $\mathbb{N}^{(r)}$ for any $r$, and not just $\mathbb{N}^{(2)}$.

Theorem (Ramsey's theorem for $r$-sets). Whenever $\mathbb{N}^{(r)}$ is $k$-coloured, there exists an infinite monochromatic set, i.e. for any $c : \mathbb{N}^{(r)} \to [k]$, there exists an infinite $X \subseteq \mathbb{N}$ such that $c|_{X^{(r)}}$ is constant.

We can again try some concrete examples:

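Example. We define $c : \mathbb{N}^{(3)} \to \{\text{red}, \text{blue}\}$ by
$$c(ijk) = \begin{cases} \text{red} & i \mid j + k \\ \text{blue} & \text{otherwise} \end{cases}$$
Then $X = \{2^0, 2^1, 2^2, \cdots\}$ is an infinite monochromatic set.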
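As a quick check of this example, here is a small Python sketch that colours the 3-subsets of an initial segment of the powers of 2 (the set $\{2^0, 2^1, 2^2, \cdots\}$ is taken as the candidate here; `divisibility_colour` is a name made up for this illustration).

```python
from itertools import combinations

def divisibility_colour(i, j, k):
    """Colour of the 3-set i < j < k: red if i divides j + k, blue otherwise."""
    return "red" if (j + k) % i == 0 else "blue"

powers_of_2 = [2 ** t for t in range(15)]   # 1, 2, 4, ..., 2^14
colours = {divisibility_colour(i, j, k)
           for i, j, k in combinations(powers_of_2, 3)}
print(colours)
```

Only red appears, because for $a < b < c$ the number $2^a$ divides $2^b + 2^c$.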

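Proof. We induct on $r$. This is trivial when $r = 1$. Assume $r > 1$. We fix $a_1 \in \mathbb{N}$. We induce a $k$-colouring $c_1$ of $(\mathbb{N} \setminus \{a_1\})^{(r-1)}$ by
$$c_1(F) = c(F \cup \{a_1\}).$$
By induction, there exists an infinite $B_1 \subseteq \mathbb{N} \setminus \{a_1\}$ such that $B_1$ is monochromatic for $c_1$, i.e. all $a_1$-$B_1$ $r$-sets have the same colour, which we also call $c_1$.

We proceed inductively as before. We get $a_1, a_2, a_3, \cdots$ and colours $c_1, c_2, \cdots$ such that for any $r$-set $F$ contained in $\{a_1, a_2, \cdots\}$, we have $c(F) = c_i$, where $a_i = \min F$. Then again, there exist $c_{i_1}, c_{i_2}, c_{i_3}, \cdots$ all identical, and our monochromatic set is $\{a_{i_1}, a_{i_2}, a_{i_3}, \cdots\}$.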

Now a natural question to ask is — what happens when we have infinitely

many colours? Clearly an infinite monochromatic subset cannot be guaranteed,

because we can just colour all edges with different colours.

The natural compromise is to ask if we can find an infinite $X$ such that either $c|_{X^{(2)}}$ is monochromatic, or $c|_{X^{(2)}}$ is injective. After a little thought, we realize this is also impossible. We can construct a colouring on $\mathbb{N}^{(2)}$ as follows: we first colour all edges that involve 1 with the colour 1, then all remaining edges that involve 2 with the colour 2, etc., so that $c(ij) = i$ for $i < j$.

It is easy to see that we cannot find an infinite monochromatic subset or an

infinite subset with all edges of different colours.

However, this counterexample still has a lot of structure: the colour of each edge is uniquely determined by its first element. It turns out this is the only missing piece (plus the obvious dual case).

With this, we can answer our previous question:

Theorem (Canonical Ramsey theorem). For any $c : \mathbb{N}^{(2)} \to \mathbb{N}$, there exists an infinite $X \subseteq \mathbb{N}$ such that one of the following holds:

(i) $c|_{X^{(2)}}$ is constant.

(ii) $c|_{X^{(2)}}$ is injective.

(iii) $c(ij) = c(k\ell)$ iff $i = k$ for all $i, j, k, \ell \in X$.

(iv) $c(ij) = c(k\ell)$ iff $j = \ell$ for all $i, j, k, \ell \in X$.

Recall that when we write $ij$, we always implicitly assume $i < j$, so that (iii) and (iv) make sense.

In previous proofs, we only had to go through two passes to get the desired

set. This time we will need more.

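Proof. Consider the following colouring of $\mathbb{N}^{(4)}$: let $c_1$ be the 2-colouring
$$c_1(ijk\ell) = \begin{cases} \text{SAME} & c(ij) = c(k\ell) \\ \text{DIFF} & \text{otherwise} \end{cases}$$
Then we know there is some infinite monochromatic set $X_1 \subseteq \mathbb{N}$ for $c_1$. If $X_1$ is coloured SAME, then we are done. Indeed, for any pair $ij$ and $i'j'$ in $X_1$, we can pick some huge $k, \ell \in X_1$ such that $j, j' < k < \ell$, and then
$$c(ij) = c(k\ell) = c(i'j')$$
as we know $c_1(ijk\ell) = c_1(i'j'k\ell) = \text{SAME}$.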

What if $X_1$ is coloured DIFF? We next look at what happens when we have edges that are nested in each other. We define $c_2 : X_1^{(4)} \to \{\text{SAME}, \text{DIFF}\}$ by
$$c_2(ijk\ell) = \begin{cases} \text{SAME} & c(i\ell) = c(jk) \\ \text{DIFF} & \text{otherwise} \end{cases}$$
Again, we can find an infinite monochromatic subset $X_2 \subseteq X_1$ with respect to $c_2$.

We now note that $X_2$ cannot be coloured SAME. Indeed, we can just look at six points $i < j < k < \ell < m < n$ of $X_2$. If $X_2$ were SAME, we would have
$$c(\ell m) = c(in) = c(jk),$$
which is impossible since $X_1$ is coloured DIFF under $c_1$.

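So $X_2$ is DIFF. Now consider $c_3 : X_2^{(4)} \to \{\text{SAME}, \text{DIFF}\}$ given by
$$c_3(ijk\ell) = \begin{cases} \text{SAME} & c(ik) = c(j\ell) \\ \text{DIFF} & \text{otherwise} \end{cases}$$
Again find an infinite monochromatic subset $X_3 \subseteq X_2$ for $c_3$. Then $X_3$ cannot be SAME: for $i < j < k < \ell < m < n$ in $X_3$ we would get $c(ik) = c(jm) = c(\ell n)$, contradicting the fact that $c_1$ is DIFF. So we know $X_3$ is DIFF.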

We have now ended up in this set $X_3$ such that any two edges with no endpoint in common must have different colours.

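We now want to look at cases where things share a vertex. Consider $c_4 : X_3^{(3)} \to \{\text{SAME}, \text{DIFF}\}$ given by
$$c_4(ijk) = \begin{cases} \text{SAME} & c(ij) = c(jk) \\ \text{DIFF} & \text{otherwise} \end{cases}$$
Let $X_4 \subseteq X_3$ be an infinite monochromatic set for $c_4$. Now $X_4$ cannot be coloured SAME: for $i < j < k < \ell$ in $X_4$ we would get $c(ij) = c(jk) = c(k\ell)$, which contradicts the fact that $c_1$ is DIFF. So we know $X_4$ is also coloured DIFF under $c_4$.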

We are almost there. We need to deal with edges that share a vertex in the sense of (iii) and (iv). We look at $c_5 : X_4^{(3)} \to \{\text{LSAME}, \text{LDIFF}\}$ given by
$$c_5(ijk) = \begin{cases} \text{LSAME} & c(ij) = c(ik) \\ \text{LDIFF} & \text{otherwise} \end{cases}$$
Again we find $X_5 \subseteq X_4$, an infinite monochromatic set for $c_5$. We don't separate into cases yet, because we know both cases are possible, but move on to classify the right case as well. Define $c_6 : X_5^{(3)} \to \{\text{RSAME}, \text{RDIFF}\}$ given by
$$c_6(ijk) = \begin{cases} \text{RSAME} & c(ik) = c(jk) \\ \text{RDIFF} & \text{otherwise} \end{cases}$$
Let $X_6 \subseteq X_5$ be an infinite monochromatic subset under $c_6$.

As before, we can check that it is impossible to get both LSAME and RSAME: for $i < j < k$ in $X_6$ we would get $c(ij) = c(ik) = c(jk)$, contradicting $c_4$ being DIFF.

Then the remaining cases (LDIFF, RDIFF), (LSAME, RDIFF) and (LDIFF, RSAME) correspond to the cases (ii), (iii) and (iv).

Note that we could prove this theorem in one pass only, instead of six, by considering a much more complicated colouring $(c_1, c_2, \cdots, c_6)$ with values in
$$\{\text{SAME}, \text{DIFF}\}^4 \times \{\text{LSAME}, \text{LDIFF}\} \times \{\text{RSAME}, \text{RDIFF}\},$$
but we still have to do the same analysis and it just complicates matters more.
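To get a feel for the four canonical patterns, the following Python sketch tests which of the conditions (i) to (iv) a given colouring satisfies on a finite set of points. It is only an illustration on finite sets (the theorem is about infinite $X$), and `canonical_types` is a name made up here.

```python
from itertools import combinations

def canonical_types(points, c):
    """List the conditions (i)-(iv) that the colouring c satisfies on `points`.

    c takes an edge as two arguments i < j and returns its colour.
    """
    edges = list(combinations(sorted(points), 2))

    def matches(same):
        # c(ij) = c(kl) must hold exactly when same(i, j, k, l) does.
        return all((c(i, j) == c(k, l)) == same(i, j, k, l)
                   for (i, j) in edges for (k, l) in edges)

    patterns = {
        "(i) constant": lambda i, j, k, l: True,
        "(ii) injective": lambda i, j, k, l: (i, j) == (k, l),
        "(iii) first vertex": lambda i, j, k, l: i == k,
        "(iv) second vertex": lambda i, j, k, l: j == l,
    }
    return [name for name, same in patterns.items() if matches(same)]

X = range(1, 9)
print(canonical_types(X, lambda i, j: 0))            # constant colouring
print(canonical_types(X, lambda i, j: (i, j)))       # all edges different
print(canonical_types(X, lambda i, j: i))            # colour = first vertex
print(canonical_types(X, lambda i, j: j))            # colour = second vertex
print(canonical_types(X, lambda i, j: (i + j) % 2))  # none of the four on X
```

The last colouring matches none of the patterns on $\{1, \cdots, 8\}$, but it becomes constant after passing to the even numbers, which is the kind of passage to a subset the theorem allows.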

There is a generalization of this to $r$-sets. One way we can rewrite the theorem is to say that the colour is uniquely determined by some subset of the vertices. The cases (i), (ii), (iii), (iv) correspond to no vertices, all vertices, first vertex, and second vertex respectively. Then for $r$-sets, we have $2^r$ possibilities, one for each subset of the $r$ coordinates.

Theorem (Higher dimensional canonical Ramsey theorem). Let $c : \mathbb{N}^{(r)} \to \mathbb{N}$ be a colouring. Then there exists $D \subseteq [r]$ and an infinite subset $X \subseteq \mathbb{N}$ such that for all $x, y \in X^{(r)}$, we have $c(x) = c(y)$ if and only if $\{i : x_i = y_i\} \supseteq D$, where $x = \{x_1 < x_2 < \cdots < x_r\}$ (and similarly for $y$).

1.2 Finite graphs

We now move on to discuss a finitary version of the above problem. Of course, if

we finitely colour an infinite graph, then we can obtain a finite monochromatic

subgraph of any size we want, because this is just a much weaker version of

Ramsey’s theorem. However, given

n < N

, if we finitely colour a graph of size

N, are we guaranteed to have a monochromatic subgraph of size n?

Before we begin, we note some definitions. Recall again that a graph $G$ is a pair $(V, E)$ where $E \subseteq V^{(2)}$.

Example. The path graph on $n$ vertices $P_n$ is given by
$$V = [n], \quad E = \{12, 23, 34, \cdots, (n-1)n\}.$$

Example. The $n$-cycle $C_n$ is
$$V = [n], \quad E = \{12, 23, \cdots, (n-1)n, 1n\}.$$

Finally, we have

Example. The complete graph $K_n$ on $n$ vertices is
$$V = [n], \quad E = V^{(2)}.$$

Concretely, the question we want to answer is the following — how big

does our (complete) graph have to be to guarantee a complete monochromatic

subgraph of size n?

In this section, we will usually restrict to 2 colours. Everything we say will

either be trivially generalizable to more colours, or we have no idea how to

do so. It is an exercise for the reader to figure out which it is.

Definition (Ramsey number). We let $R(n) = R(K_n)$ be the smallest $N \in \mathbb{N}$ such that whenever we red-blue colour the edges of $K_N$, there is a monochromatic copy of $K_n$.
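For very small $n$, the Ramsey number can be found by exhaustive search straight from the definition. The following Python sketch does this; the function names are made up for this illustration, it assumes $n \geq 2$, and it is hopeless beyond $n = 3$ since $K_N$ has $2^{\binom{N}{2}}$ colourings.

```python
from itertools import combinations, product

def has_mono_clique(N, n, colouring):
    """Does this colouring of the edges of K_N contain a monochromatic K_n?"""
    return any(len({colouring[e] for e in combinations(S, 2)}) == 1
               for S in combinations(range(1, N + 1), n))

def every_colouring_has_mono_clique(N, n):
    edges = list(combinations(range(1, N + 1), 2))
    for bits in product((0, 1), repeat=len(edges)):   # 0 = red, 1 = blue
        if not has_mono_clique(N, n, dict(zip(edges, bits))):
            return False
    return True

def ramsey_number(n, max_N):
    """Smallest N <= max_N forcing a monochromatic K_n, or None if there is none."""
    for N in range(1, max_N + 1):
        if every_colouring_has_mono_clique(N, n):
            return N
    return None

print(ramsey_number(3, 6))
```

This recovers the well-known value $R(3) = 6$: some colouring of $K_5$ has no monochromatic triangle, but every colouring of $K_6$ has one.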

It is not immediately obvious that $R(n)$ has to exist, i.e. it is a finite number.

It turns out we can easily prove this from the infinite Ramsey theorem.

Theorem (Finite Ramsey theorem). For all n, we have R(n) < ∞.

Proof. Suppose not. Let $n$ be such that $R(n) = \infty$. Then for any $m \geq n$, there is a 2-colouring $c_m$ of $K_m = [m]^{(2)}$ such that there is no monochromatic set of size $n$.

We want to use these colourings to build up a colouring of $\mathbb{N}^{(2)}$ with no monochromatic set of size $n$. We want to say we take the “limit” of these colourings, but what does this mean? To do so, we need these colourings to be nested.

By the pigeonhole principle, there exists an infinite set $M_1 \subseteq \mathbb{N}$ and some fixed 2-colouring $d_n$ of $[n]^{(2)}$ such that $c_m|_{[n]^{(2)}} = d_n$ for all $m \in M_1$.

Similarly, there exists an infinite $M_2 \subseteq M_1$ such that $c_m|_{[n+1]^{(2)}} = d_{n+1}$ for $m \in M_2$, again for some 2-colouring $d_{n+1}$ of $[n+1]^{(2)}$. We repeat this over and over again. Then we get a sequence $d_n, d_{n+1}, \cdots$ of colourings such that $d_i$ is a 2-colouring of $[i]^{(2)}$ without a monochromatic $K_n$, and further for $i < j$, we have
$$d_j|_{[i]^{(2)}} = d_i.$$
We then define a 2-colouring $c$ of $\mathbb{N}^{(2)}$ by
$$c(ij) = d_m(ij)$$
for any $m \geq n$ with $m > i, j$. Clearly, there exists no monochromatic $K_n$ in $c$, as any $K_n$ is finite. This massively contradicts the infinite Ramsey theorem.

There are other ways of obtaining the finite Ramsey theorem from the infinite

one. For example, we can use the compactness theorem for first order logic.

Proof. Suppose $R(n) = \infty$. Consider the theory with proposition letters $p_{ij}$ for each $ij \in \mathbb{N}^{(2)}$. We will think of the edge $ij$ as red if $p_{ij} = \top$, and blue if $p_{ij} = \bot$. For each subset of $\mathbb{N}$ of size $n$, we add in the axiom that says that set is not monochromatic.

Then given any finite subset of the axioms, it mentions only finitely many subsets of $\mathbb{N}$. Suppose it mentions vertices only up to $m \in \mathbb{N}$. Then by assumption, there is a 2-colouring of $[m]^{(2)}$ with no monochromatic subset of size $n$. So by assigning $p_{ij}$ accordingly (and randomly assigning the remaining ones), we have found a model of this finite subtheory. Thus every finite fragment of the theory is consistent. Hence the original theory is consistent. So there is a model, i.e. a colouring of $\mathbb{N}^{(2)}$ with no monochromatic subset of size $n$.

This contradicts the infinite Ramsey theorem.

Similarly, we can do it by compactness of the space $\{1, 2\}^{\mathbb{N}}$ endowed with metric
$$d(f, g) = \frac{1}{2^n} \quad \text{if } n = \min\{i : f_i \neq g_i\}.$$
By Tychonoff's theorem, we know this is compact, and we can deduce the theorem from this.

While these proofs save work by reusing the infinite Ramsey theorem, they are highly non-constructive. They are useless if we want to obtain an actual bound on $R(n)$. We now go back and see what we did when we proved the infinite Ramsey theorem.

To prove the infinite Ramsey theorem, we arbitrarily picked a point $a_1$. Then there are some things that are connected by red to it, and others connected by blue.

Next we arbitrarily pick a point in one of the red or blue sets, and try to move on from there. Suppose we landed in the red one. Now note that if we find a blue $K_n$ in the red set, then we are done. But on the other hand, we only need a red $K_{n-1}$, and not a full blown $K_n$. When we moved to this red set, the problem is no longer symmetric.

Thus, to figure out a good bound, it is natural to consider an asymmetric

version of the problem.

Definition (Off-diagonal Ramsey number). We define $R(n, m) = R(K_n, K_m)$ to be the minimum $N \in \mathbb{N}$ such that whenever we red-blue colour the edges of $K_N$, we either get a red $K_n$ or a blue $K_m$.

Clearly we have

R(n, m) ≤ R(max{n, m}).

In particular, they are finite. Once we make this definition, it is easy to find a

bound on R.

Theorem. We have
$$R(n, m) \leq R(n-1, m) + R(n, m-1)$$
for all $n, m \geq 2$. Consequently, we have
$$R(n, m) \leq \binom{n+m-2}{n-1}.$$

Proof. We induct on $n + m$. It is clear that
$$R(1, n) = R(n, 1) = 1, \quad R(n, 2) = R(2, n) = n.$$
Now suppose $N \geq R(n-1, m) + R(n, m-1)$. Consider any red-blue colouring of $K_N$ and any vertex $v \in V(K_N)$. We write
$$V(K_N) \setminus \{v\} = A \cup B,$$
where each vertex in $A$ is joined by a red edge to $v$, and each vertex in $B$ is joined by a blue edge to $v$. Then
$$|A| + |B| = N - 1 \geq R(n-1, m) + R(n, m-1) - 1.$$
It follows that either $|A| \geq R(n-1, m)$ or $|B| \geq R(n, m-1)$. Suppose it is the former; the other case is analogous. Then by definition of $R(n-1, m)$, we know $A$ contains either a blue copy of $K_m$ or a red copy of $K_{n-1}$. In the first case, we are done, and in the second case, we just add $v$ to the red $K_{n-1}$.

The last formula is just a straightforward property of binomial coefficients.
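The "straightforward property" is Pascal's rule. As a sketch, the following Python code unrolls the recursive inequality with the base cases from the proof and compares the result with the closed form $\binom{n+m-2}{n-1}$ (`upper_bound` is a name made up for this illustration; it computes the bound the recursion gives, not the Ramsey numbers themselves).

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def upper_bound(n, m):
    """Bound on R(n, m) obtained by unrolling R(n,m) <= R(n-1,m) + R(n,m-1)."""
    if n == 1 or m == 1:
        return 1          # R(1, m) = R(n, 1) = 1
    if n == 2:
        return m          # R(2, m) = m
    if m == 2:
        return n          # R(n, 2) = n
    return upper_bound(n - 1, m) + upper_bound(n, m - 1)

for n in range(1, 7):
    print([upper_bound(n, m) for m in range(1, 7)])

# The recursion reproduces Pascal's triangle exactly.
print(all(upper_bound(n, m) == comb(n + m - 2, n - 1)
          for n in range(1, 12) for m in range(1, 12)))
```

For instance, `upper_bound(3, 3)` is 6 and `upper_bound(4, 4)` is 20.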

In particular, we find
$$R(n) \leq \binom{2n-2}{n-1} \leq \frac{4^n}{\sqrt{n}}.$$
We genuinely have no idea whether $\sim 4^n$ is the correct growth rate, i.e. if there is some $\varepsilon > 0$ such that $R(n) \leq (4 - \varepsilon)^n$. However, we do know that for any $c > 0$, we eventually have
$$R(n) \leq \frac{4^n}{n^c}.$$

That was an upper bound. Do we have a lower bound? In particular, does $R(n)$ have to grow exponentially? The answer is yes, and it comes from a very classical construction of Erdős.

Theorem. We have $R(n) \geq \sqrt{2}^n$ for sufficiently large $n \in \mathbb{N}$.

The proof is remarkable in that before this was shown, we had no sensible

bound at all. However, the proof is incredibly simple, and revolutionized how

we think about colourings.

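Proof. Let $N \leq \sqrt{2}^n$. We perform a red-blue colouring of $K_N$ randomly, where each edge is coloured red independently of the others with probability $\frac{1}{2}$.

We let $X_R$ be the number of red copies of $K_n$ in such a colouring. Then since expectation is linear, we know the expected value is
$$\mathbb{E}[X_R] = \binom{N}{n} \left(\frac{1}{2}\right)^{\binom{n}{2}} \leq \left(\frac{eN}{n}\right)^n \left(\frac{1}{2}\right)^{\binom{n}{2}} \leq \left(\frac{e}{n}\right)^n \sqrt{2}^{n^2} \sqrt{2}^{-n^2 + n} = \left(\frac{\sqrt{2}\, e}{n}\right)^n < \frac{1}{2}$$
for sufficiently large $n$.

Similarly, we have $\mathbb{E}[X_B] < \frac{1}{2}$, where $X_B$ is the number of blue copies of $K_n$. So the expected number of monochromatic $K_n$ is $< 1$. So in particular there must be some colouring with no monochromatic $K_n$.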
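The expectation in this proof can be computed exactly for concrete $n$. The following Python sketch does so with exact rational arithmetic, taking $N = \lfloor \sqrt{2}^n \rfloor$ (the function names are made up for this illustration).

```python
from fractions import Fraction
from math import comb, isqrt

def expected_monochromatic(N, n):
    """Expected number of monochromatic K_n in a uniformly random
    red-blue colouring of K_N: 2 * C(N, n) * (1/2)^C(n, 2)."""
    return 2 * Fraction(comb(N, n), 2 ** comb(n, 2))

def largest_N(n):
    """floor(sqrt(2)^n), computed exactly as the integer square root of 2^n."""
    return isqrt(2 ** n)

for n in (4, 6, 8, 10, 12):
    N = largest_N(n)
    print(n, N, float(expected_monochromatic(N, n)))
```

In every row the expectation is below 1, so some colouring of $K_N$ has no monochromatic $K_n$, which is exactly the statement $R(n) > N$.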

Recall the bound
$$R(m, n) \leq \binom{m+n-2}{m-1}.$$
If we think of $m$ as being fixed, then this tells us
$$R(m, n) \lesssim (n + m)^{m-1}.$$
For example, if $m$ is 3, then we have
$$R(3, n) \leq \binom{n+1}{2} = \frac{n(n+1)}{2} \sim n^2.$$

We can sort-of imagine where this bound came from. Suppose we arbitrarily pick a vertex $v_1$. Then if it is connected to at least $n$ other vertices by a red edge, then we are done: if there is even one red edge among those $n$ things, then we have a red triangle. Otherwise, all edges are blue, and we've found a complete blue $K_n$. So we may assume $v_1$ is connected to at most $n - 1$ things by a red edge. So if our graph is big enough, we can pick some $v_2$ connected to $v_1$ by a blue edge, and do the same thing to $v_2$. We keep going on, and by the time we reach $v_n$, we would have found $v_1, \cdots, v_n$ all connected to each other by blue edges, and we are done. So we have $R(3, n) \lesssim n^2$.

But this argument is rather weak, because we didn't use that large pool of blue edges we've found at $v_1$. So in fact this time we can do better than $n^2$.

Theorem. We have
$$R(3, n) \leq \frac{100 n^2}{\log n}$$
for sufficiently large $n \in \mathbb{N}$.

Here the 100 is, obviously, just some random big number.

Proof. Let $N \geq 100 n^2 / \log n$, and consider a red-blue colouring of the edges of $K_N$ with no red $K_3$. We want to find a blue $K_n$ in such a colouring.

We may assume that

(i) No vertex $v$ has $\geq n$ red edges incident to it, as argued just now.

(ii) If we have $v_1, v_2, v_3$ such that $v_1 v_2$ and $v_1 v_3$ are red, then $v_2 v_3$ is blue.

We now let
$$\mathcal{F} = \{W : W \subseteq V(K_N) \text{ such that } W^{(2)} \text{ is monochromatic and blue}\}.$$
We want to find some $W \in \mathcal{F}$ such that $|W| \geq n$, i.e. a blue $K_n$. How can we go about finding this?

Let $W$ be a uniformly random member of $\mathcal{F}$. We will be done if we can show that $\mathbb{E}[|W|] \geq n$.

We are going to define a bunch of random variables. For each vertex $v \in V(K_N)$, we define the variable
$$X_v = n \mathbf{1}_{\{v \in W\}} + |\{u : uv \text{ is red and } u \in W\}|.$$

Claim. $\mathbb{E}[X_v] > \frac{\log n}{10}$ for each vertex $v$.

To see this, let
$$A = \{u : uv \text{ is red}\}$$
and let
$$B = \{u : uv \text{ is blue}\}.$$
Then from the properties we've noted down, we know that $|A| < n$ and $A^{(2)}$ is blue. So we know very well what is happening in $A$, and nothing about what is in $B$.

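We fix a set $S \subseteq B$ such that $S \in \mathcal{F}$, i.e. $S^{(2)}$ is blue. What can we say about $W$ if we condition on $B \cap W = S$?

Let $T \subseteq A$ be the set of vertices that are joined to $S$ only by blue edges. Write $|T| = x$. Then if $B \cap W = S$, then either $W \subseteq S \cup T$, or $W = S \cup \{v\}$ (since $v$ is joined to every vertex of $A$ by a red edge). So there are exactly $2^x + 1$ choices of $W$. So we know
$$\mathbb{E}[X_v \mid W \cap B = S] \geq \frac{n}{2^x + 1} + \frac{2^x}{2^x + 1} \left(\mathbb{E}[|\text{random subset of } T|]\right) = \frac{n}{2^x + 1} + \frac{2^{x-1} x}{2^x + 1}.$$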

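Now if $x < \frac{1}{2} \log n$, then
$$\frac{n}{2^x + 1} \geq \frac{n}{\sqrt{n} + 1} \geq \frac{\log n}{10}$$
for all sufficiently large $n$. On the other hand, if $x \geq \frac{1}{2} \log n$, then
$$\frac{2^{x-1} x}{2^x + 1} \geq \frac{1}{4} \cdot \frac{\log n}{2} \geq \frac{\log n}{10}.$$
So we are done.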
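The conditional expectation used in this claim comes from a simple count: one choice of $W$ contains $v$ and contributes $n$, while each of the $2^x$ subsets $T'$ of $T$ contributes $|T'|$. As a sketch, the following Python code enumerates these choices for small $x$ and compares the average with the closed form $\frac{n}{2^x+1} + \frac{2^{x-1}x}{2^x+1}$ (the function names are made up for this illustration).

```python
from fractions import Fraction
from itertools import combinations

def conditional_expectation(n, x):
    """Average of X_v over the 2^x + 1 equally likely choices of W,
    given W ∩ B = S and |T| = x."""
    values = [n]                                  # W = S ∪ {v}: X_v = n
    for size in range(x + 1):
        for subset in combinations(range(x), size):
            values.append(len(subset))            # W = S ∪ T': X_v = |T'|
    return Fraction(sum(values), len(values))

def closed_form(n, x):
    return Fraction(n, 2 ** x + 1) + Fraction(x * 2 ** x, 2 * (2 ** x + 1))

print(conditional_expectation(5, 3), closed_form(5, 3))
```

The two agree because the sizes of all subsets of an $x$-element set sum to $x 2^{x-1}$.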

Claim. $\sum_{v \in V} X_v \leq 2n|W|$.

To see this, for each vertex $v$, we know that if $v \in W$, then it contributes $n$ to the sum via the first term. Also, by our initial observation, we know that $v \in W$ has at most $n$ red neighbours. So it contributes at most $n$ to the second term (acting as the “u”).

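Finally, we know that
$$\mathbb{E}[|W|] \geq \frac{1}{2n} \sum_{v \in V} \mathbb{E}[X_v] \geq \frac{N}{2n} \cdot \frac{\log n}{10} \geq \frac{100 n^2}{\log n} \cdot \frac{\log n}{20 n} = 5n.$$
Therefore we can always find some $W \in \mathcal{F}$ such that $|W| \geq n$.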

Now of course, there is the following obvious generalization of Ramsey

numbers:

Definition ($R(G, H)$). Let $G, H$ be graphs. Then we define $R(G, H)$ to be the smallest $N$ such that any red-blue colouring of $K_N$ has either a red copy of $G$ or a blue copy of $H$.

Obviously, we have $R(G, H) \leq R(|G|, |H|)$. So the natural question is if we can do better than that.

Exercise. Show that
$$R(P_n, P_n) \leq 2n.$$

So sometimes it can be much better.

2 Ramsey theory on the integers

So far, we’ve been talking about what happens when we finitely colour graphs.

What if we k-colour the integers N? What can we say about it?

It is a trivial statement that this colouring has a monochromatic subset, by the

pigeonhole principle. Interesting questions arise when we try to take the additive structure of $\mathbb{N}$ into account. So we could ask: can we find a monochromatic “copy” of $\mathbb{N}$?

One way to make this question concrete is to ask if there is an infinite

monochromatic arithmetic progression.

The answer is easily a “no”! There are only countably many progressions, so

for each arithmetic progression, we pick two things in the progression and colour

them differently.

We can also construct this more concretely. We can colour the first number red, the next two blue, the next three red, etc. Then it is easy to see that it doesn't have an infinite monochromatic arithmetic progression, since the blocks eventually become longer than any fixed common difference.
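The block colouring just described is easy to write down explicitly. The following Python sketch implements it and, for a range of starting points and common differences, finds the first term of the progression whose colour differs from that of the first term (`block_colour` and `first_colour_change` are names made up for this illustration).

```python
from math import isqrt

def block_colour(n):
    """Colour of n when 1 is red, 2-3 are blue, 4-6 are red, 7-10 are blue, ..."""
    b = (isqrt(8 * n - 7) + 1) // 2   # index of the block containing n
    return "red" if b % 2 == 1 else "blue"

def first_colour_change(a, d, max_terms=10_000):
    """Index t >= 1 of the first term a + t*d whose colour differs from
    the colour of a, or None if none is found within max_terms terms."""
    start = block_colour(a)
    for t in range(1, max_terms):
        if block_colour(a + t * d) != start:
            return t
    return None

print([block_colour(n) for n in range(1, 11)])
print(first_colour_change(1, 7))
```

Once the blocks are longer than $d$, the progression hits every block, so it sees both colours; hence `first_colour_change` always succeeds given enough terms.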

But this is somewhat silly, because there is clearly a significant amount of

structure in the sequence there. It turns out the following is true:

Theorem (van der Waerden theorem). Let $m, k \in \mathbb{N}$. Then there is some $N = W(m, k)$ such that whenever $[N]$ is $k$-coloured, there is a monochromatic arithmetic progression of length $m$.
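For tiny parameters, the smallest such $N$ can be found by brute force directly from the statement. The following Python sketch tries every $k$-colouring of $[N]$ for increasing $N$; it assumes $m \geq 2$, the function names are made up for this illustration, and the search is exponential in $N$, so it is only usable for very small cases.

```python
from itertools import product

def has_mono_ap(colouring, m):
    """Does this colouring of [N] (a tuple, position i holds the colour of i + 1)
    contain a monochromatic arithmetic progression of length m?"""
    N = len(colouring)
    for a in range(N):
        for d in range(1, N):
            if a + (m - 1) * d >= N:
                break
            if len({colouring[a + i * d] for i in range(m)}) == 1:
                return True
    return False

def van_der_waerden(m, k, max_n):
    """Smallest N <= max_n such that every k-colouring of [N] has a
    monochromatic AP of length m, or None if there is no such N."""
    for N in range(1, max_n + 1):
        if all(has_mono_ap(c, m) for c in product(range(k), repeat=N)):
            return N
    return None

print(van_der_waerden(3, 2, 10))
```

This reports 9 for two colours and length 3: the colouring red, red, blue, blue, red, red, blue, blue of $[8]$ has no monochromatic progression of length 3, but every 2-colouring of $[9]$ has one.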

The idea is to do induction on $m$. We will be using colourings with many more than $k$ colours to deduce the existence of $W(m, k)$.

We can try a toy example first. Let's try to show that $W(3, 2)$ exists. Suppose we have three natural numbers. By the pigeonhole principle, there must be two things that are the same colour, say the first and the third are red.

If this is the case, then if we don't want to have an arithmetic progression of length 3, then the fifth number must be blue.

We now cut the universe into blocks of 5 things. Again by the pigeonhole principle, there must be two blocks that look the same. Say it's this block again.

Now we have two sequences, and the point at the end belongs to both of the two sequences. And no matter what colour it is, we are done.

For $k = 3$, we can still find such a block, but now that third point could be a third colour, say, green. This will not stop us. We can now look at these big blocks, and we know that eventually, these big blocks must repeat themselves.

Here we did the case $m = 2$, and we used the pigeonhole principle. When we do $m > 2$, we will use van der Waerden theorem for smaller $m$ inductively.

We now come up with names to describe the scenario we had above.

Definition (Focused progression). We say a collection of arithmetic progressions $A_1, A_2, \cdots, A_r$ of length $m$ with
$$A_i = \{a_i, a_i + d_i, \cdots, a_i + (m-1) d_i\}$$
are focused at $f$ if $a_i + m d_i = f$ for all $1 \leq i \leq r$.

Example. {1, 4} and {5, 6} are focused at 7.

Definition (Colour focused progression). If $A_1, \cdots, A_r$ are focused at $f$, and each $A_i$ is monochromatic and no two are the same colour, then we say they are colour focused at $f$.
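These two definitions translate directly into code. In the following Python sketch a progression is represented by a triple `(a, d, m)` of first term, common difference and length; this representation and the function names are choices made for this illustration.

```python
def terms(a, d, m):
    """The progression a, a + d, ..., a + (m - 1)d."""
    return [a + i * d for i in range(m)]

def focus(a, d, m):
    """The term that would come next: a + m*d."""
    return a + m * d

def are_focused(aps):
    """All progressions share the same focus."""
    return len({focus(*ap) for ap in aps}) == 1

def are_colour_focused(aps, colour):
    """Focused, each progression monochromatic, and all colours distinct."""
    if not are_focused(aps):
        return False
    colours = []
    for ap in aps:
        seen = {colour(t) for t in terms(*ap)}
        if len(seen) != 1:
            return False
        colours.append(seen.pop())
    return len(set(colours)) == len(colours)

aps = [(1, 3, 2), (5, 1, 2)]            # {1, 4} and {5, 6}
colouring = {1: "red", 4: "red", 5: "blue", 6: "blue"}
print(focus(1, 3, 2), focus(5, 1, 2), are_focused(aps))
print(are_colour_focused(aps, colouring.get))
```

With this colouring, whatever colour 7 receives out of red and blue, it extends one of the two progressions to a monochromatic progression of length 3.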

We can now write the proof.

Proof. We induct on $m$. The result is clearly trivial when $m = 1$, and follows easily from the pigeonhole principle when $m = 2$.

Suppose $m > 1$, and assume inductively that $W(m-1, k')$ exists for any $k' \in \mathbb{N}$.

Here is the claim we are trying to establish:

Claim. For each $r \leq k$, there is a natural number $n$ such that whenever we $k$-colour $[n]$, then either

(i) There exists a monochromatic AP of length $m$; or

(ii) There are $r$ colour-focused AP's of length $m - 1$.

It is clear that this claim implies the theorem, as we can pick $r = k$. Then if there isn't a monochromatic AP of length $m$, then we look at the colour of the common focus, and it must be one of the colours of those AP's.

To prove the claim, we induct on $r$. When $r = 1$, we may take $n = W(m-1, k)$. Now suppose $r > 1$, and some $n'$ works for $r - 1$. With the benefit of hindsight, we shall show that
$$n = W(m-1, k^{2n'}) \, 2n'$$
works for $r$.

We consider any $k$-colouring of $[n]$, and suppose it has no monochromatic AP of length $m$. We need to find $r$ colour-focused progressions of length $m - 1$.

We view this $k$-colouring of $[n]$ as a $k^{2n'}$-colouring of blocks of length $2n'$, of which there are $W(m-1, k^{2n'})$.

Then by definition of $W(m-1, k^{2n'})$, we can find blocks
$$B_s, B_{s+t}, \cdots, B_{s+(m-2)t}$$
which are coloured identically. By the inductive hypothesis, we know each of these blocks contains $r - 1$ colour-focused AP's of length $m - 1$, say $A_1, \cdots, A_{r-1}$ in $B_s$, with first terms $a_1, \cdots, a_{r-1}$ and common differences $d_1, \cdots, d_{r-1}$, and also their focus $f$, because the length of $B_s$ is $2n'$, not just $n'$.

Since we assumed there is no monochromatic progression of length $m$, we can assume $f$ has a different colour than the $A_i$.

Now consider $A_1', A_2', \cdots, A_{r-1}'$, where $A_i'$ has first term $a_i$, common difference $d_i + 2n't$, and length $m - 1$. This difference sends us to the next block, and then the next term in the AP. We also pick $A_r'$ to consist of all the foci of the blocks $B_{s+jt}$, namely
$$A_r' = \{f, f + 2n't, \cdots, f + 2n't(m-2)\}.$$
These progressions are monochromatic with distinct colours, and focused at $f + (2n't)(m-1)$. So we are done.

This argument, where one looks at the induced colourings of blocks, is called

a product argument.

The bounds we obtain from this argument are, as one would expect, terrible.

We have

W (3, k) ≤ k

where the tower of k’s has length k − 1.

Now we can generalize in a different way. What can we say about monochromatic structures in a $k$-colouring of $\mathbb{N}^d$? What is the right generalization of van der Waerden theorem?

To figure out the answer, we need to find the right generalization of arithmetic

progressions.

Definition (Homothetic copy). Given a finite $S \subseteq \mathbb{N}^d$, a homothetic copy of $S$ is a set of the form
$$\ell S + M,$$
where $\ell \in \mathbb{N}$, $\ell \neq 0$, and $M \in \mathbb{N}^d$.

Example. An arithmetic progression of length $m$ is just a homothetic copy of $[m] = \{1, 2, \cdots, m\}$.

Thus, the theorem we want to say is the following:

Theorem (Gallai). Whenever $\mathbb{N}^d$ is $k$-coloured, there exists a monochromatic (homothetic) copy of $S$ for each finite $S \subseteq \mathbb{N}^d$.

In order to prove this, we prove the beautiful Hales–Jewett theorem, which

is in some sense the abstract core of the argument we used for van der Waerden

theorem.

We need a bit of set up. As usual, we write
$$[m]^n = \{(x_1, \cdots, x_n) : x_i \in [m]\}.$$

Here is the important definition:

Definition (Combinatorial line). A combinatorial line $L \subseteq [m]^n$ is a set of the form
$$\{(x_1, \cdots, x_n) : x_i = x_j \text{ for all } i, j \in I;\ x_i = a_i \text{ for all } i \notin I\}$$
for some fixed non-empty set of coordinates $I \subseteq [n]$ and $a_i \in [m]$.

$I$ is called the set of active coordinates.

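Example. Here is a line in $[3]^2$: the set $\{(1, 1), (2, 1), (3, 1)\}$. This line is given by $I = \{1\}$, $a_2 = 1$.

In total, $[3]^2$ has seven lines: three with $I = \{1\}$, three with $I = \{2\}$, and the diagonal with $I = \{1, 2\}$.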

It is easy to see that any line has exactly $m$ elements. We write $L^-$ and $L^+$ for the first and last point of the line, i.e. the points where the active coordinates are all 1 and $m$ respectively. It is clear that any line $L$ is uniquely determined by $L^-$ and its active coordinates.

Example. In $[3]^3$, we have the line

\[ L = \{(1, 2, 1), (2, 2, 2), (3, 2, 3)\}. \]

This is a line with $I = \{1, 3\}$ and $a_2 = 2$. The first and last points are $(1, 2, 1)$ and $(3, 2, 3)$.
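To make the definition concrete, here is a minimal Python sketch that enumerates all combinatorial lines of $[m]^n$. It is an illustration only: the function name `combinatorial_lines` and the wildcard symbol `'*'` (marking an active coordinate) are choices made here, not notation from the lectures.

```python
from itertools import product

def combinatorial_lines(m, n):
    """Yield every combinatorial line in [m]^n as a tuple of m points.

    A template fixes each coordinate to a value in 1..m or marks it
    active with '*'; at least one coordinate must be active.
    """
    symbols = list(range(1, m + 1)) + ['*']
    for template in product(symbols, repeat=n):
        if '*' not in template:
            continue  # the active coordinate set I must be non-empty
        # The v-th point sets every active coordinate to v.
        yield tuple(
            tuple(v if t == '*' else t for t in template)
            for v in range(1, m + 1)
        )

lines_3_2 = list(combinatorial_lines(3, 2))
lines_3_3 = list(combinatorial_lines(3, 3))
print(len(lines_3_2), len(lines_3_3))
```

Each template with at least one wildcard gives exactly one line, so $[m]^n$ has $(m+1)^n - m^n$ lines; the first point of each emitted tuple is $L^-$ and the last is $L^+$.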

Then we have the following theorem:

Theorem (Hales–Jewett theorem). For all $m, k \in \mathbb{N}$, there exists $N = HJ(m, k)$ such that whenever $[m]^N$ is $k$-coloured, there exists a monochromatic line.

Note that this theorem implies van der Waerden’s theorem easily. The idea is to embed $[m]^N$ into $\mathbb{N}$ linearly, so that a monochromatic line in $[m]^N$ gives an arithmetic progression of length $m$. Explicitly, given a $k$-colouring $c : \mathbb{N} \to [k]$, we define $c' : [m]^N \to [k]$ by

\[ c'(x_1, x_2, \dots, x_N) = c(x_1 + x_2 + \dots + x_N). \]

Now a monochromatic line gives us an arithmetic progression of length $m$. For example, if the line is

\[ L = \{(1, 2, 1), (2, 2, 2), (3, 2, 3)\}, \]

then we get the monochromatic progression $4, 6, 8$ of length 3. In general, the monochromatic AP defined has $d = |I|$.
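The embedding can be checked on the example line with a few lines of Python. This is only a sketch of the coordinate-sum map described above; the helper name `line_to_progression` is an illustrative choice.

```python
def line_to_progression(line):
    """Send each point of a combinatorial line to the sum of its coordinates."""
    return [sum(point) for point in line]

line = [(1, 2, 1), (2, 2, 2), (3, 2, 3)]   # active coordinates I = {1, 3}
terms = line_to_progression(line)
common_difference = terms[1] - terms[0]
print(terms, common_difference)
```

Moving one step along the line raises every active coordinate by 1, so the sum grows by $|I|$ each time, which is why the common difference equals the number of active coordinates.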

The proof is essentially what we did for van der Waerden theorem. We are

going to build a lot of almost-lines that point at the same vertex, and then no

matter what colour the last vertex is, we are happy.

Definition (Focused lines). We say lines $L_1, \dots, L_r$ are focused at $f \in [m]^N$ if $L_i^+ = f$ for all $i = 1, \dots, r$.

Definition (Colour focused lines). If $L_1, \dots, L_r$ are focused lines, and $L_i \setminus \{L_i^+\}$ is monochromatic for each $i = 1, \dots, r$ and all these colours are distinct, then we say $L_1, \dots, L_r$ are colour focused at $f$.

Proof. We proceed by induction on $m$. This is clearly trivial for $m = 1$, as a line only has a single point.

Now suppose $m > 1$, and that $HJ(m - 1, k)$ exists for all $k \in \mathbb{N}$. As before, we will prove the following claim:

Claim. For each $1 \le r \le k$, there is an $n \in \mathbb{N}$ such that in any $k$-colouring of $[m]^n$, either

(i) there exists a monochromatic line; or

(ii) there exist $r$ colour-focused lines.

Again, the result is immediate from the claim, as we just use it for $r = k$ and look at the colour of the focus.

To prove this claim, we induct on $r$. If $r = 1$, then picking $n = HJ(m - 1, k)$ works, as a single colour-focused line of length $m$ is just a monochromatic line of length $m - 1$, and $[m - 1]^n \subseteq [m]^n$ naturally.

Now suppose $r > 1$ and $n$ works for $r - 1$. Then we will show that $n + n'$ works, where

\[ n' = HJ(m - 1, k^{m^n}). \]

Consider a colouring $c : [m]^{n+n'} \to [k]$, and we assume this has no monochromatic lines.

We think of $[m]^{n+n'}$ as $[m]^n \times [m]^{n'}$. So for each point in $[m]^{n'}$, we have a whole cube $[m]^n$. Consider a $k^{m^n}$-colouring of $[m]^{n'}$ as follows: given any $x \in [m]^{n'}$, we look at the subset of $[m]^{n+n'}$ containing the points whose last $n'$ coordinates are $x$. Then we define the new colour of $x$ to be the restriction of $c$ to this copy of $[m]^n$, and there are $k^{m^n}$ possibilities.

Now, by the choice of $n'$ (applied to $[m-1]^{n'} \subseteq [m]^{n'}$), there exists a line $L$ in $[m]^{n'}$ such that $L \setminus \{L^+\}$ is monochromatic for the new colouring. This means for all $a \in [m]^n$ and for all $b, b' \in L \setminus \{L^+\}$, we have

\[ c(a, b) = c(a, b'). \]

Let $c''(a)$ denote this common colour for all $a \in [m]^n$. This is a $k$-colouring of $[m]^n$ with no monochromatic line (because $c$ doesn’t have any). So by definition of $n$, there exist lines $L_1, L_2, \dots, L_{r-1}$ in $[m]^n$ which are colour-focused at some $f \in [m]^n$ for $c''$.

In the proof of van der Waerden, we had a progression within each block, and also the way we jump between blocks. Here we have the same thing. We have the lines in $[m]^n$, and also the “external” line $L$.

Consider the lines $L_1', L_2', \dots, L_{r-1}'$ in $[m]^{n+n'}$, where

\[ (L_i')^- = (L_i^-, L^-), \]

and the active coordinate set is $I_i \cup I$, where $I_i$ is the active coordinate set of $L_i$ and $I$ is the active coordinate set of $L$. Also consider the line $F$ with $F^- = (f, L^-)$ and active coordinate set $I$.

Then we see that $L_1', \dots, L_{r-1}', F$ are $r$ colour-focused lines with focus $(f, L^+)$.

We can now prove our generalized van der Waerden.

Theorem (Gallai). Whenever $\mathbb{N}^d$ is $k$-coloured, there exists a monochromatic (homothetic) copy of $S$ for each finite $S \subseteq \mathbb{N}^d$.

Proof. Let $S = \{S(1), S(2), \dots, S(m)\} \subseteq \mathbb{N}^d$. Given a $k$-colouring $c : \mathbb{N}^d \to [k]$, we induce a $k$-colouring $c' : [m]^N \to [k]$ by

\[ c'(x_1, \dots, x_N) = c(S(x_1) + S(x_2) + \dots + S(x_N)). \]

By Hales–Jewett, for sufficiently large $N$, there exists a monochromatic line $L$ for $c'$, which gives us a monochromatic homothetic copy of $S$ in $\mathbb{N}^d$. For example, if the line is $(1\ 2\ 1)$, $(2\ 2\ 2)$ and $(3\ 2\ 3)$, then we know

\[ c(S(1) + S(2) + S(1)) = c(S(2) + S(2) + S(2)) = c(S(3) + S(2) + S(3)). \]

So we have the monochromatic homothetic copy $\lambda S + \mu$, where $\lambda = 2$ (the number of active coordinates), and $\mu = S(2)$.
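The worked example can be replayed numerically. The sketch below assumes a hypothetical three-point set $S \subseteq \mathbb{N}^2$ chosen only for illustration, and the helper name `image_of_line` is likewise made up; it applies the map $(x_1, \dots, x_N) \mapsto S(x_1) + \dots + S(x_N)$ to the line above.

```python
def image_of_line(S, line):
    """Apply (x_1, ..., x_N) -> S(x_1) + ... + S(x_N) to each point of a line.

    S maps the labels 1..m to points of N^d, stored as tuples.
    """
    d = len(next(iter(S.values())))
    return [
        tuple(sum(S[x][t] for x in point) for t in range(d))
        for point in line
    ]

# Hypothetical example set in N^2.
S = {1: (1, 1), 2: (2, 1), 3: (1, 2)}
line = [(1, 2, 1), (2, 2, 2), (3, 2, 3)]
copy = image_of_line(S, line)

# Expected shape: 2 * S(i) + S(2), i.e. lambda = 2 and mu = S(2).
expected = [tuple(2 * s + t for s, t in zip(S[i], S[2])) for i in (1, 2, 3)]
print(copy, expected)
```

The two lists agree, matching the claim that the scaling factor is the number of active coordinates and the translation comes from the fixed coordinates.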

Largeness in Ramsey theory*

In van der Waerden, we proved that for each $k, m$, there is some $N$ such that whenever we $k$-colour $[N]$, then there is a monochromatic AP of length $m$. How much of this relies on the underlying set being $[N]$? Or is it just that if we finitely colour $[N]$, then one of the colours must contain a lot of numbers, and if we have a lot of numbers, then we are guaranteed to have a monochromatic AP?

Of course, by “contains a lot of numbers”, we do not mean the actual number of numbers we have. It is certainly not true that for some $n$, whenever we $k$-colour any set of $n$ integers, we must have a monochromatic $m$-AP, because an arbitrary set of $n$ integers need not even contain an $m$-AP at all, let alone a monochromatic one. Thus, what we really mean is density.

Definition (Density). For $A \subseteq \mathbb{N}$, we define the density of $A$ as

\[ d(A) = \limsup_{(b - a) \to \infty} \frac{|A \cap [a, b]|}{|b - a|}. \]

Clearly, in any finite $k$-colouring of $\mathbb{N}$, there exists a colour class with positive density. Thus, we want to know if merely a positive density implies the existence of progressions. Remarkably, the answer is yes!

Theorem (Szemerédi theorem). Let $\delta > 0$ and $m \in \mathbb{N}$. Then there exists some $N = S(m, \delta) \in \mathbb{N}$ such that any subset $A \subseteq [N]$ with $|A| \ge \delta N$ contains an $m$-term arithmetic progression.

The proof of this theorem is usually the subject of an entire lecture course, so we are not going to attempt to prove this. Even the particular case $m = 3$ is very hard.
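Although the theorem is deep, its statement can be explored by brute force for very small $N$. The following sketch (function names `contains_ap` and `largest_ap_free` are illustrative, and the search is exponential, so it is only usable for tiny inputs) computes the size of the largest subset of $[N]$ with no $m$-term progression.

```python
from itertools import combinations

def contains_ap(A, m):
    """Return True if the set A contains an m-term arithmetic progression."""
    S = set(A)
    top = max(S)
    for a in S:
        d = 1
        while a + (m - 1) * d <= top:
            if all(a + i * d in S for i in range(m)):
                return True
            d += 1
    return False

def largest_ap_free(N, m):
    """Size of the largest subset of [N] with no m-term AP (brute force)."""
    for size in range(N, 0, -1):
        for A in combinations(range(1, N + 1), size):
            if not contains_ap(A, m):
                return size
    return 0

print(largest_ap_free(8, 3), largest_ap_free(9, 3))
```

Any subset of $[N]$ larger than the value returned must contain an $m$-AP, so these numbers divided by $N$ give the smallest density thresholds for which the conclusion of the theorem already holds at that particular $N$.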

This theorem has a lot of very nice Ramsey consequences. In the case of graph colourings, we asked ourselves what happens if we colour a graph with infinitely many colours. Suppose we now have a colouring $c : \mathbb{N} \to \mathbb{N}$. Can we still find a monochromatic progression of length $m$? Of course not, because $c$ can be injective.

Theorem. For any $c : \mathbb{N} \to \mathbb{N}$ and any $m \in \mathbb{N}$, there exists an $m$-AP on which either

(i) $c$ is constant; or

(ii) $c$ is injective.

It is also possible to prove this directly, but it is easy with Szemerédi.

Proof. We set

\[ \delta = \frac{1}{m^2 (m + 1)^2}. \]

We let $N = S(m, \delta)$. We write

\[ [N] = A_1 \cup A_2 \cup \dots \cup A_k, \]

where the $A_i$’s are the colour classes of $c|_{[N]}$. By choice of $N$, we are done if $|A_i| \ge \delta N$ for some $1 \le i \le k$. So suppose not.

Let’s try to count the number of arithmetic progressions in $[N]$. There are more than $N^2/(m + 1)^2$ of these, as we can take any $a, d \in [N/(m + 1)]$. We want to show that there is an AP that hits each $A_i$ at most once.

So, fixing an $i$, how many APs are there that hit $A_i$ at least twice? We need to pick two terms in $A_i$, and also decide which two terms in the progression they are, e.g. they can be the first and second term, or the 5th and 17th term. So there are at most $m^2 |A_i|^2$ such APs.

So the number of APs on which $c$ is injective is greater than

\[ \frac{N^2}{(m + 1)^2} - \sum_{i=1}^k m^2 |A_i|^2 \ge \frac{N^2}{(m + 1)^2} - \frac{1}{\delta} m^2 (\delta N)^2 = \frac{N^2}{(m + 1)^2} - \delta m^2 N^2 \ge 0. \]

So we are done. Here the first inequality follows from the fact that $\sum |A_i| = N$ and each $|A_i| < \delta N$.

Our next theorem will mix arithmetic and graph-theoretic properties. Consider a colouring $c : \mathbb{N}^{(2)} \to [2]$. As before, we say a set $X$ is monochromatic if $c|_{X^{(2)}}$ is constant. Now we want to try to find a monochromatic set with some arithmetic properties.

The first obvious question to ask is: can we find a monochromatic 3-term arithmetic progression? The answer is no. For example, we can define $c(ij)$ to be the parity of the largest power of 2 dividing $j - i$. In a 3-AP with common difference $d$, the edges of length $d$ and $2d$ receive different colours, and so there is no monochromatic 3-term arithmetic progression.
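The counterexample is easy to test by machine on an initial segment. The sketch below uses illustrative names (`v2`, `edge_colour`, `has_monochromatic_3ap`) and only checks progressions inside $\{1, \dots, 200\}$; the general statement follows because the edges of length $d$ and $2d$ always have valuations differing by one.

```python
from itertools import combinations

def v2(x):
    """Exponent of the largest power of 2 dividing the positive integer x."""
    e = 0
    while x % 2 == 0:
        x //= 2
        e += 1
    return e

def edge_colour(i, j):
    """Colour of the edge ij: parity of the largest power of 2 dividing j - i."""
    return v2(abs(j - i)) % 2

def has_monochromatic_3ap(limit):
    """Search {1, ..., limit} for a 3-AP all of whose three edges share a colour."""
    for a in range(1, limit + 1):
        for d in range(1, (limit - a) // 2 + 1):
            points = (a, a + d, a + 2 * d)
            colours = {edge_colour(i, j) for i, j in combinations(points, 2)}
            if len(colours) == 1:
                return True
    return False

found = has_monochromatic_3ap(200)
print(found)
```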

What if we make some concessions — can we find a blue 10-AP, or if not,

find an infinite red set? The answer is again no. This construction is slightly

more involved. To construct a counterexample, we can make progressively larger

red cliques, and take all other edges blue. If we double the size of the red cliques

every time, it is not hard to check that there is no blue 4-AP, and no infinite

red set.

···

What if we further relax our condition, and only require an arbitrarily large red

set?

Theorem. For any c : N

(2)

→ {red, blue}, either

(i) There exists a blue m-AP for each m ∈ N; or

(ii) There exists arbitrarily large red sets.

Proof. Suppose we can’t find a blue $m$-AP for some fixed $m$. We induct on $r$, and try to find a red set of size $r$.

Say $A \subseteq \mathbb{N}$ is a progression of length $N$. Since $A$ has no blue $m$-term progression, it must contain many red edges. Indeed, each $m$-AP in $A$ must contain a red edge. Also each edge specifies two points, and this can be extended to an $m$-term progression in at most $m^2$ ways. Since there are more than $N^2/(m + 1)^2$ progressions of length $m$ in $A$, there are at least

\[ \frac{N^2}{m^2 (m + 1)^2} \]

red edges. With the benefit of hindsight, we set

\[ \delta = \frac{1}{m^2 (m + 1)^2}. \]

The idea is that since we have lots of red edges, we can try to find a point with a lot of red edges connected to it, and we hope to find a progression in it.

We say X, Y ⊆ N form an (r, k)-structure if

(i) They are disjoint

(ii) X is red;

(iii) Y is an arithmetic progression;

(iv) All X-Y edges are red;

(v) |X| = r and |Y | = k.

···

We show by induction that there is an $(r, k)$-structure for each $r$ and $k$.

A $(1, k)$-structure is just a vertex connected by red edges to a $k$-point progression. If we take $N = S(k, \delta)$, we know among the first $N$ natural numbers, there are at least $\delta N^2$ red edges inside $[N]$. So in particular, some $v \in [N]$ has at least $\delta N$ red neighbours in $[N]$, and so by Szemerédi’s theorem we know $v$ is connected to some $k$-AP by red edges. That’s the base case done.

Now suppose we can find an $(r - 1, k')$-structure for all $k' \in \mathbb{N}$. We set

\[ k' = S(k, \delta), \]

with the same $\delta$ as before. We let $(X, Y)$ be an $(r - 1, k')$-structure. As before, we can find $v \in Y$ such that $v$ has at least $\delta |Y|$ red neighbours in $Y$. Then we can find a progression $Y'$ of length $k$ in the red neighbourhood of $v$, and we are done, as $(X \cup \{v\}, Y')$ is an $(r, k)$-structure, and an “arithmetic progression” within an arithmetic progression is still an arithmetic progression.

Before we end this chapter, we make a quick remark. Everything we looked for in this chapter involved the additive structure of the naturals. What about the multiplicative structure? For example, given a finite colouring of $\mathbb{N}$, can we find a monochromatic geometric progression? The answer is trivially yes. We can look at $\{2^x : x \in \mathbb{N}\}$, and then multiplication inside this set just looks like addition in the naturals.

But what if we want to mix additive and multiplicative structures? For example, can we always find a monochromatic set of the form $\{x + y, xy\}$? Of course, there is the trivial answer $x = y = 2$, but is there any other? This question was answered positively in 2016! We will return to this at the end of the course.

3 Partition Regularity

In the previous chapter, the problems we studied were mostly “linear” in nature.

We had some linear system, namely that encoding the fact that a sequence is an

AP, and we wanted to know if it had a monochromatic solution. More generally,

we can define the following:

Definition (Partition regularity). We say an $m \times n$ matrix $A$ over $\mathbb{Q}$ is partition regular if whenever $\mathbb{N}$ is finitely coloured, there exists an $x \in \mathbb{N}^n$ such that $Ax = 0$ and $x$ is monochromatic, i.e. all coordinates of $x$ have the same colour.

Recall that $\mathbb{N}$ does not include 0.

There is no loss in generality by assuming $A$ in fact has entries in $\mathbb{Z}$, by scaling $A$, but sometimes it is (notationally) convenient to consider cases where the entries are rational.

The question of the chapter is the following — when is a matrix partition

regular? We begin by looking at some examples.

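Example (Schur’s theorem). Schur’s theorem says whenever $\mathbb{N}$ is finitely coloured, there exists a monochromatic set of the form $\{x, y, x + y\}$. In other words the matrix $\begin{pmatrix} 1 & 1 & -1 \end{pmatrix}$ is partition regular, since

\[ \begin{pmatrix} 1 & 1 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = 0 \iff z = x + y. \]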
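For two colours, the smallest instance of Schur’s theorem can be verified exhaustively. This is a brute-force sketch with illustrative names; following the matrix formulation, it allows $x = y$, so that $\{1, 2\}$ with $1 + 1 = 2$ counts as a solution.

```python
from itertools import product

def has_monochromatic_schur_triple(colouring):
    """colouring[i - 1] is the colour of i; look for x, y, x + y of one colour."""
    n = len(colouring)
    for x in range(1, n + 1):
        for y in range(x, n - x + 1):  # ensures x + y <= n
            if colouring[x - 1] == colouring[y - 1] == colouring[x + y - 1]:
                return True
    return False

def every_colouring_has_triple(n, k):
    """True if every k-colouring of {1, ..., n} solves x + y = z in one colour."""
    return all(
        has_monochromatic_schur_triple(c)
        for c in product(range(k), repeat=n)
    )

print(every_colouring_has_triple(4, 2), every_colouring_has_triple(5, 2))
```

With two colours, $\{1, 2, 3, 4\}$ can still be coloured without a monochromatic solution (colour 1 and 4 one way, 2 and 3 the other), but $\{1, \dots, 5\}$ cannot.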

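Example. How about $\begin{pmatrix} 2 & 3 & -5 \end{pmatrix}$? This is partition regular, because we can pick any $x$, and we have

\[ \begin{pmatrix} 2 & 3 & -5 \end{pmatrix} \begin{pmatrix} x \\ x \\ x \end{pmatrix} = 0. \]

This is a trivial example.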

How about van der Waerden’s theorem?

Example. Van der Waerden’s theorem says there is a monochromatic 3-AP $\{x_1, x_2, x_3\}$ whenever $\mathbb{N}$ is finitely-coloured. We know $x_1, x_2, x_3$ forms a 3-AP iff

\[ x_3 - x_2 = x_2 - x_1, \]

or equivalently

\[ x_3 + x_1 = 2 x_2. \]

This implies that $\begin{pmatrix} 1 & -2 & 1 \end{pmatrix}$ is partition regular. But this is actually not a very interesting thing to say, because $x_1 = x_2 = x_3$ is always a solution to this equation. So this falls into the previous “trivial” case.

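On the second example sheet we saw a stronger version of van der Waerden. It says we can always find a monochromatic set of the form

\[ \{d, a, a + d, a + 2d, \dots, a + md\}. \]

By including the variable $d$, we can write down the property of being a progression in a non-trivial format by

\[ \begin{pmatrix} 1 & 1 & 0 & -1 \\ 2 & 1 & -1 & 0 \end{pmatrix} \begin{pmatrix} d \\ a \\ x_2 \\ x_1 \end{pmatrix} = 0, \]

which says $x_1 = a + d$ and $x_2 = a + 2d$. This obviously generalizes to an arbitrary $m$-AP, with matrix

\[ \begin{pmatrix} 1 & 1 & -1 & 0 & 0 & \cdots & 0 \\ 1 & 2 & 0 & -1 & 0 & \cdots & 0 \\ 1 & 3 & 0 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & m & 0 & 0 & 0 & \cdots & -1 \end{pmatrix}. \]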

We’ve seen three examples of partition regular matrices. Of course, not every matrix is partition regular. The matrix $\begin{pmatrix} 1 & 1 \end{pmatrix}$ is not partition regular, for the silly reason that two positive things cannot add up to zero.

Let’s now look at a first non-trivial matrix that is not partition regular.

Example. The matrix $\begin{pmatrix} 2 & -1 \end{pmatrix}$ is not partition regular, since we can put $c(x) = (-1)^n$, where $n$ is the maximum integer such that $2^n \mid x$. Then $\{x, 2x\}$ is never monochromatic.
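A quick numerical check of this colouring, as a sketch with an illustrative function name `colour` and an arbitrary search bound of 10000:

```python
def colour(x):
    """Return (-1)^n, where 2^n is the largest power of 2 dividing x."""
    n = 0
    while x % 2 == 0:
        x //= 2
        n += 1
    return (-1) ** n

# Any x with colour(x) == colour(2x) would be a monochromatic solution of 2x - y = 0.
bad = [x for x in range(1, 10001) if colour(x) == colour(2 * x)]
print(bad)
```

The list is empty because doubling $x$ raises the exponent $n$ by exactly one and therefore flips the sign.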

A similar argument shows that if $\lambda \in \mathbb{Q}$ is such that $\begin{pmatrix} \lambda & -1 \end{pmatrix}$ is partition regular, then $\lambda = 1$.

But what if we write down a random matrix, say $\begin{pmatrix} 2 & 3 & -6 \end{pmatrix}$? The goal of this chapter is to find a complete characterization of matrices that are partition regular.

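Definition (Columns property). Let

\[ A = \begin{pmatrix} \uparrow & \uparrow & & \uparrow \\ c^{(1)} & c^{(2)} & \cdots & c^{(n)} \\ \downarrow & \downarrow & & \downarrow \end{pmatrix}. \]

We say $A$ has the columns property if there is a partition $[n] = B_1 \cup B_2 \cup \dots \cup B_d$ such that

\[ \sum_{i \in B_s} c^{(i)} \in \operatorname{span}\{c^{(i)} : i \in B_1 \cup \dots \cup B_{s-1}\} \]

for $s = 1, \dots, d$. In particular,

\[ \sum_{i \in B_1} c^{(i)} = 0. \]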

What does this mean? Let’s look at the matrices we’ve seen so far.

Example. $\begin{pmatrix} 1 & 1 & -1 \end{pmatrix}$ has the columns property by picking $B_1 = \{1, 3\}$ and $B_2 = \{2\}$.

Example. $\begin{pmatrix} 2 & 3 & -5 \end{pmatrix}$ has the columns property by picking $B_1 = \{1, 2, 3\}$.

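Example. The matrix

\[ \begin{pmatrix} 1 & 1 & -1 & 0 & 0 & \cdots & 0 \\ 1 & 2 & 0 & -1 & 0 & \cdots & 0 \\ 1 & 3 & 0 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & m & 0 & 0 & 0 & \cdots & -1 \end{pmatrix} \]

has the columns property. Indeed, we take $B_1 = \{1, 3, \dots, m + 2\}$, and since $\{c^{(3)}, \dots, c^{(m+2)}\}$ spans all of $\mathbb{R}^m$, we know picking $B_2 = \{2\}$ works.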

Example. $\begin{pmatrix} \ell & -1 \end{pmatrix}$ has the columns property iff $\ell = 1$. In particular, $\begin{pmatrix} 1 & 1 \end{pmatrix}$ does not have the columns property.

Given these examples, it is natural to conjecture the following:

Theorem (Rado’s theorem). A matrix $A$ is partition regular iff it has the columns property.

This is a remarkable theorem! The property of being partition regular

involves a lot of quantifiers, over infinitely many things, but the column property

is entirely finite, and we can get a computer to check it for us easily.
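To illustrate that the columns property really is a finite check, here is a small Python sketch. It is a naive exhaustive search over blocks using exact rational arithmetic, written for clarity rather than speed; the function names (`rank`, `in_span`, `columns_partition`) are illustrative and the approach is an assumption of this example, not an algorithm from the lectures.

```python
from fractions import Fraction
from itertools import combinations

def rank(vectors):
    """Rank over Q of a list of vectors, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def in_span(v, vectors):
    """True if v lies in the rational span of vectors (the empty span is {0})."""
    return rank(vectors + [v]) == rank(vectors)

def columns_partition(A):
    """Return blocks B_1, ..., B_d (1-based column indices) witnessing the
    columns property of the matrix A (a list of rows), or None if none exist."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]

    def extend(used, remaining):
        if not remaining:
            return []
        earlier = [cols[j] for j in used]
        for size in range(1, len(remaining) + 1):
            for block in combinations(remaining, size):
                total = [sum(cols[j][i] for j in block) for i in range(m)]
                if in_span(total, earlier):
                    rest = [j for j in remaining if j not in block]
                    tail = extend(used + list(block), rest)
                    if tail is not None:
                        return [tuple(j + 1 for j in block)] + tail
        return None

    return extend([], list(range(n)))

print(columns_partition([[1, 1, -1]]))
print(columns_partition([[2, 3, -6]]))
print(columns_partition([[1, 1, -1, 0], [1, 2, 0, -1]]))
```

On the Schur matrix this recovers the partition $B_1 = \{1, 3\}$, $B_2 = \{2\}$ from the examples above, and on $\begin{pmatrix} 2 & 3 & -6 \end{pmatrix}$ it reports that no valid partition exists.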

Another remarkable property of this theorem is that neither direction is obvious! It turns out that showing partition regularity implies the columns property is (marginally) easier, because if we know something is partition regular, then we can try to cook up some interesting colourings and see what happens. The other direction is harder.

To get a feel of the result, we will first prove it in the case of a single equation. The columns property in particular becomes much simpler. It just means that there exists a non-empty subset of the non-zero $a_i$’s that sums up to zero.

Theorem. If $a_1, \dots, a_n \in \mathbb{Q} \setminus \{0\}$, then $\begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix}$ is partition regular iff there exists a non-empty $I \subseteq [n]$ such that

\[ \sum_{i \in I} a_i = 0. \]
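For a single equation, the criterion is a zero-sum subset search. The sketch below (illustrative function name `zero_sum_subset`, exhaustive over all subsets, so exponential in $n$) returns a witnessing index set or `None`.

```python
from fractions import Fraction
from itertools import combinations

def zero_sum_subset(coeffs):
    """Return a non-empty 1-based index set I with sum_{i in I} a_i = 0, or None."""
    a = [Fraction(c) for c in coeffs]
    n = len(a)
    for size in range(1, n + 1):
        for I in combinations(range(n), size):
            if sum(a[i] for i in I) == 0:
                return tuple(i + 1 for i in I)
    return None

print(zero_sum_subset([1, 1, -1]))   # Schur: x + y - z = 0
print(zero_sum_subset([2, 3, -5]))
print(zero_sum_subset([2, 3, -6]))
```

By the theorem, the first two equations are partition regular, while $2x + 3y - 6z = 0$ is not, which answers the question raised earlier about the "random" matrix $\begin{pmatrix} 2 & 3 & -6 \end{pmatrix}$.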

For a fixed prime $p$, we let $d(x)$ denote the last non-zero digit of $x$ in base $p$, i.e. if

\[ x = d_n p^n + d_{n-1} p^{n-1} + \dots + d_0, \]

then

\[ L(x) = \min\{i : d_i \ne 0\} \]

and $d(x) = d_{L(x)}$. We now prove the easy direction of the theorem.
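The pair $(L(x), d(x))$ is simple to compute. The following sketch uses an illustrative function name and assumes $x$ is a positive integer and $p \ge 2$.

```python
def last_nonzero_digit(x, p):
    """Return (L(x), d(x)): the position and value of the last non-zero
    base-p digit of the positive integer x."""
    L = 0
    while x % p == 0:
        x //= p
        L += 1
    return L, x % p

# 45 = 1*27 + 2*9 + 0*3 + 0 is 1200 in base 3.
print(last_nonzero_digit(45, 3))
print(last_nonzero_digit(7, 5))
```

Colouring each $x$ by $d(x)$ uses exactly the $p - 1$ colours $1, \dots, p - 1$, which is the colouring used in the proof that follows.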

Proposition. If $a_1, \dots, a_n \in \mathbb{Q} \setminus \{0\}$ and $\begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix}$ is partition regular, then

\[ \sum_{i \in I} a_i = 0 \]

for some non-empty $I$.

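Proof. We wlog $a_i \in \mathbb{Z}$, by scaling. Fix a suitably large prime

\[ p > \sum_{i=1}^n |a_i|, \]

and consider the $(p - 1)$-colouring of $\mathbb{N}$ where $x$ is coloured $d(x)$. We find $x_1, \dots, x_n$ such that

\[ \sum_{i=1}^n a_i x_i = 0 \]

and $d(x_i) = d$ for some $d \in \{1, \dots, p - 1\}$. We write out everything in base $p$, and let

\[ L = \min\{L(x_i) : 1 \le i \le n\}, \]

and set

\[ I = \{i : L(x_i) = L\}. \]

Then for all $i \in I$, we have

\[ x_i \equiv d p^L \pmod{p^{L+1}}, \]

while $x_i \equiv 0 \pmod{p^{L+1}}$ for $i \notin I$. On the other hand, we are given that

\[ \sum_{i=1}^n a_i x_i = 0. \]

Taking mod $p^{L+1}$ gives us

\[ \sum_{i \in I} a_i d p^L \equiv 0 \pmod{p^{L+1}}. \]

Since $d$ is invertible mod $p$, we know

\[ \sum_{i \in I} a_i \equiv 0 \pmod p. \]

But by our choice of $p$, this implies that $\sum_{i \in I} a_i = 0$.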

Here we knew what prime $p$ to pick ahead. If we didn’t, then we could consider all primes $p$, and for each $p$ we find $I_p \subseteq [n]$ such that

\[ \sum_{i \in I_p} a_i \equiv 0 \pmod p. \]

Then some $I_p$ has to occur infinitely often, and then we are done. Note that this is merely about the fact that if a fixed number is 0 mod $p$ for arbitrarily large $p$, then it must be zero. This is not some deep local-global correspondence number theoretic fact.

It turns out this is essentially the only way we know for proving this theorem.

One possible variation is to use the “first non-zero digit” to do the colouring,

but this is harder.

Let’s now try and do the other direction. Before we do that, we start by doing a warm up. Last time, we proved that if we had $\begin{pmatrix} 1 & \lambda \end{pmatrix}$, then this is partition regular iff $\lambda = -1$. We now prove a three element version.

Proposition. The equation $\begin{pmatrix} 1 & \lambda & -1 \end{pmatrix}$ is partition regular for all $\lambda \in \mathbb{Q}$.

Proof. We may wlog $\lambda > 0$. If $\lambda = 0$, then this is trivial, and if $\lambda < 0$, then we can multiply the whole equation by $-1$.

Say

\[ \lambda = \frac{r}{s} \]

with $r, s \in \mathbb{N}$. The idea is to try to find solutions of this in long arithmetic progressions.

Suppose $\mathbb{N}$ is $k$-coloured. We let

\[ \{a, a + d, \dots, a + nd\} \]

be a monochromatic AP, for $n$ sufficiently large.

If $sd$ were the same colour as this AP, then we’d be done, as we can take

\[ x = a, \quad y = sd, \quad z = a + \frac{r}{s} sd. \]

In fact, if any of $sd, 2sd, \dots, \lfloor n/r \rfloor sd$ have the same colour as the AP, then we’d be done by taking

\[ x = a, \quad y = isd, \quad z = a + \frac{r}{s} isd = a + ird \le a + nd. \]

If this were not the case, then $\{sd, 2sd, \dots, \lfloor n/r \rfloor sd\}$ is $(k - 1)$-coloured, and this is just a scaled up copy of an initial segment of $\mathbb{N}$. So we are done by induction on $k$.

Note that when $\lambda = 1$, we have that $\begin{pmatrix} 1 & 1 & -1 \end{pmatrix}$ is partition regular, and this may be proved by Ramsey’s theorem. Can we prove this more general result by Ramsey’s theorem as well? The answer is, we don’t know.

It turns out this is not just a warm up, but the main ingredient of what we

are going to do.

Theorem. If $a_1, \dots, a_n \in \mathbb{Q}$, then $\begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix}$ is partition regular iff there exists a non-empty $I \subseteq [n]$ such that

\[ \sum_{i \in I} a_i = 0. \]

Proof. One direction was done. To do the other direction, we recall that we had a really easy case of, say, $\begin{pmatrix} 2 & 3 & -5 \end{pmatrix}$, because we can just make all the variables the same.

In the general case, we can’t quite do this, but we may try to solve this equation with the least number of variables possible. In fact, we shall find some monochromatic $x, y, z$, and then assign each of $x_1, \dots, x_n$ to be one of $x, y, z$.

We know

\[ \sum_{i \in I} a_i = 0. \]

We now partition $I$ into two pieces. We fix $i_0 \in I$, and set

\[ x_i = \begin{cases} x & i = i_0 \\ z & i \in I \setminus \{i_0\} \\ y & i \notin I \end{cases}. \]

We would be done if whenever we finitely colour $\mathbb{N}$, we can find monochromatic $x, y, z$ such that

\[ a_{i_0} x + \Big( \sum_{i \in I \setminus \{i_0\}} a_i \Big) z + \Big( \sum_{i \notin I} a_i \Big) y = 0. \]

But, since

\[ \sum_{i \in I} a_i = 0, \]

this is equivalent to

\[ a_{i_0} x - a_{i_0} z + (\text{something}) y = 0. \]

Since $a_{i_0}$ is non-zero, we can divide out by $a_{i_0}$, and we are done by the previous case.

Note that our proof of the theorem shows that if an equation is partition regular for all last-digit base $p$ colourings, then it is partition regular for all colourings. This might sound like an easier thing to prove than the full-blown Rado’s theorem, but it turns out the only proof we have for this implication is Rado’s theorem.

We now prove the general case. We first do the easy direction, because it is

largely the same as the single-equation case.

Proposition. If $A$ is an $m \times n$ matrix with rational entries which is partition regular, then $A$ has the columns property.

Proof. We again wlog all entries of $A$ are integers. Let the columns of $A$ be $c^{(1)}, \dots, c^{(n)}$. Given a prime $p$, we consider the $(p - 1)$-colouring of $\mathbb{N}$, where $x$ is coloured $d(x)$, the last non-zero digit in the base $p$ expansion. Since $A$ is partition regular, we obtain a monochromatic solution.

We then get a monochromatic $x_1, \dots, x_n$ such that $Ax = 0$, i.e.

\[ \sum_{i=1}^n x_i c^{(i)} = 0. \]

Any such solution with colour $d$ induces a partition of $[n] = B_1 \cup B_2 \cup \dots \cup B_r$, where

– For all $i, j \in B_s$, we have $L(x_i) = L(x_j)$; and

– For all $s < t$ and $i \in B_s$, $j \in B_t$, we have $L(x_i) < L(x_j)$.

Last time, with the benefit of hindsight, we were able to choose some large prime $p$ that made the argument work. Here we do not know which prime to pick ahead of time, so we use the trick we mentioned after the proof last time.

Since there are finitely many possible partitions of $[n]$, we may assume that this particular partition is generated by infinitely many primes $p$. Call this set of primes $P$. We introduce some notation. We say two vectors $u, v \in \mathbb{Z}^m$ satisfy $u \equiv v \pmod p$ if $u_i \equiv v_i \pmod p$ for all $i = 1, \dots, m$.

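Now we know that

\[ \sum_{i=1}^n x_i c^{(i)} = 0. \]

Looking at the last non-zero digit in the base $p$ expansion, we have

\[ \sum_{i \in B_1} d\, c^{(i)} \equiv 0 \pmod p. \]

From this, we conclude that, by multiplying by $d^{-1}$,

\[ \sum_{i \in B_1} c^{(i)} \equiv 0 \pmod p \]

for all $p \in P$. So we deduce that

\[ \sum_{i \in B_1} c^{(i)} = 0. \]

Similarly, for higher $s$, we find that for each base $p$ colouring, we have

\[ \sum_{i \in B_s} d p^t c^{(i)} + \sum_{i \in B_1 \cup \dots \cup B_{s-1}} x_i c^{(i)} \equiv 0 \pmod{p^{t+1}} \]

for all $s \ge 2$, and some $t$ dependent on $s$ and $p$. Multiplying by $d^{-1}$, we find

\[ \sum_{i \in B_s} p^t c^{(i)} + \sum_{i \in B_1 \cup \dots \cup B_{s-1}} (d^{-1} x_i) c^{(i)} \equiv 0 \pmod{p^{t+1}}. \quad (*) \]

We claim that this implies

\[ \sum_{i \in B_s} c^{(i)} \in \langle c^{(i)} : i \in B_1 \cup \dots \cup B_{s-1} \rangle. \]

This is not exactly immediate, because the values of $x_i$ in $(*)$ may change as we change our $p$. But it is still some easy linear algebra.

Suppose this were not true. Since we are living in a Euclidean space, we have an inner product, and we can find some $v \in \mathbb{Z}^m$ such that

\[ \langle v, c^{(i)} \rangle = 0 \text{ for all } i \in B_1 \cup \dots \cup B_{s-1}, \]

and

\[ \Big\langle v, \sum_{i \in B_s} c^{(i)} \Big\rangle \ne 0. \]

But then, taking inner products with $(*)$ gives

\[ p^t \Big\langle v, \sum_{i \in B_s} c^{(i)} \Big\rangle \equiv 0 \pmod{p^{t+1}}. \]

Equivalently, we have

\[ \Big\langle v, \sum_{i \in B_s} c^{(i)} \Big\rangle \equiv 0 \pmod p, \]

but this is a contradiction, since it holds for infinitely many primes $p$ while the left-hand side is a fixed non-zero integer. So we showed that $A$ has the columns property with respect to the partition $[n] = B_1 \cup \dots \cup B_r$.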

We now prove the hard direction. We want a gadget analogous to the $\begin{pmatrix} 1 & \lambda & -1 \end{pmatrix}$ we had for the single-equation case. The definition will seem rather mysterious, but it turns out to be what we want, and its purpose becomes more clear as we look at some examples.

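Definition ($(m, p, c)$-set). For $m, p, c \in \mathbb{N}$, a set $S \subseteq \mathbb{N}$ is an $(m, p, c)$-set with generators $x_1, \dots, x_m$ if $S$ has the form

\[ S = \left\{ \sum_{i=1}^m \lambda_i x_i : \text{for some } j,\ \lambda_i = 0 \text{ for all } i < j,\ \lambda_j = c,\ \lambda_i \in [-p, p] \text{ for all } i > j \right\}. \]

In other words, we have

\[ S = \bigcup_{j=1}^m \{c x_j + \lambda_{j+1} x_{j+1} + \dots + \lambda_m x_m : \lambda_i \in [-p, p]\}. \]

For each $j$, the set $\{c x_j + \lambda_{j+1} x_{j+1} + \dots + \lambda_m x_m : \lambda_i \in [-p, p]\}$ is called a row of $S$.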

Example. What does a $(2, p, 1)$ set look like? It has the form

\[ \{x_1 - p x_2, x_1 - (p - 1) x_2, \dots, x_1 + p x_2\} \cup \{x_2\}. \]

In other words, this is just an arithmetic progression with its common difference.

Example. A $(2, p, 3)$-set has the form

\[ \{3 x_1 - p x_2, \dots, 3 x_1, \dots, 3 x_1 + p x_2\} \cup \{3 x_2\}. \]

The idea of an $(m, p, c)$ set is that we “iterate” this process many times, and so an $(m, p, c)$-set consists of “iterated APs and various patterns of their common differences”.
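Small $(m, p, c)$-sets can be generated directly from the definition. In this sketch the function name `mpc_rows` is illustrative, and the generators are chosen by hand so that all elements are positive (the definition requires $S \subseteq \mathbb{N}$, which the code does not check).

```python
from itertools import product

def mpc_rows(generators, p, c):
    """Return the rows of the (m, p, c)-set on the given generators.

    Row j consists of c*x_j + lambda_{j+1}*x_{j+1} + ... + lambda_m*x_m
    with every lambda_i ranging over the integers in [-p, p].
    """
    m = len(generators)
    rows = []
    for j in range(m):
        tail = generators[j + 1:]
        row = {
            c * generators[j] + sum(l * x for l, x in zip(lams, tail))
            for lams in product(range(-p, p + 1), repeat=len(tail))
        }
        rows.append(row)
    return rows

print(mpc_rows([10, 1], p=2, c=1))
print(mpc_rows([10, 1], p=2, c=3))
```

With generators $(10, 1)$ and $p = 2$, the first call gives the progression $8, 9, \dots, 12$ together with its common difference $1$, matching the $(2, p, 1)$ example; the second gives the progression around $3 x_1 = 30$ together with $3 x_2 = 3$.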

Our proof strategy is to show that whenever we finitely-colour $\mathbb{N}$, we can always find a monochromatic $(m, p, c)$-set, and given any matrix $A$ with the columns property and any $(m, p, c)$-set (for suitable $m, p, c$), there will always be a solution in there.

Proposition. Let $m, p, c \in \mathbb{N}$. Then whenever $\mathbb{N}$ is finitely coloured, there exists a monochromatic $(m, p, c)$-set.

Proof. It suffices to find an $(m, p, c)$-set all of whose rows are monochromatic, since when $\mathbb{N}$ is $k$-coloured, an $(m', p, c)$-set with $m' = mk + 1$ has $m$ monochromatic rows of the same colour by pigeonhole, and these rows contain a monochromatic $(m, p, c)$-set, by restricting to the elements where a lot of the $\lambda_i$ are zero. In this proof, whenever we say $(m, p, c)$-set, we mean one all of whose rows are monochromatic.

We will prove this by induction. We have a $k$-colouring of $[n]$, where $n$ is very very very large. This contains a $k$-colouring of

\[ B = \left\{ c, 2c, \dots, \left\lfloor \frac{n}{c} \right\rfloor c \right\}. \]

Since $c$ is fixed, we can pick $n$ so that $n/c$ is large. By van der Waerden, we find some monochromatic set

\[ A = \{c x_1 - Nd, c x_1 - (N - 1) d, \dots, c x_1 + Nd\} \subseteq B, \]

with $N$ very very large. Since each element is a multiple of $c$ by assumption, we know that $c \mid d$. By induction, we may find an $(m - 1, p, c)$-set in the set $\{d, 2d, \dots, Md\}$, where $M$ is large. We are now done by the $(m, p, c)$ set on generators $x_1, \dots, x_m$, provided

\[ c x_1 + \sum_{i=2}^m \lambda_i x_i \in A \]

for all $\lambda_i \in [-p, p]$, which is easily seen to be the case, provided $N \ge (m - 1) p M$.

Note that the argument itself is quite similar to that in the $\begin{pmatrix} 1 & \lambda & -1 \end{pmatrix}$ case.

Recall that Schur’s theorem said that whenever we finitely-colour $\mathbb{N}$, we can find a monochromatic $\{x, y, x + y\}$. More generally, for $x_1, x_2, \dots, x_n \in \mathbb{N}$, we let

\[ FS(x_1, \dots, x_n) = \left\{ \sum_{i \in I} x_i : I \subseteq [n],\ I \ne \emptyset \right\}. \]

The existence of monochromatic $(m, 1, 1)$-sets gives us

Corollary (Finite sum theorem). For every fixed $n$, whenever we finitely-colour $\mathbb{N}$, there exist $x_1, \dots, x_n$ such that $FS(x_1, \dots, x_n)$ is monochromatic.

This is since an $(n, 1, 1)$-set contains more things than $FS(x_1, \dots, x_n)$. This was discovered independently by a bunch of different people, including Folkman, Rado and Sanders.
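The containment behind the corollary can be seen on a small example. In the sketch below, `finite_sums` and `n11_set` are illustrative names, and the generators $(100, 10, 1)$ are chosen only so that every element of the set is positive and the sums are easy to read off.

```python
from itertools import combinations, product

def finite_sums(xs):
    """FS(x_1, ..., x_n): all sums over non-empty index sets."""
    return {
        sum(sub)
        for r in range(1, len(xs) + 1)
        for sub in combinations(xs, r)
    }

def n11_set(xs):
    """Union of the rows of the (n, 1, 1)-set on generators xs."""
    out = set()
    for j in range(len(xs)):
        tail = xs[j + 1:]
        for lams in product((-1, 0, 1), repeat=len(tail)):
            out.add(xs[j] + sum(l * x for l, x in zip(lams, tail)))
    return out

xs = [100, 10, 1]
fs = finite_sums(xs)
big = n11_set(xs)
print(sorted(fs), fs <= big)
```

A sum $\sum_{i \in I} x_i$ whose smallest index is $j$ lies in row $j$ of the $(n, 1, 1)$-set (take $\lambda_i = 1$ for $i \in I$ and $0$ otherwise), so a monochromatic $(n, 1, 1)$-set contains a monochromatic $FS(x_1, \dots, x_n)$.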

Similarly, if we let

\[ FP(x_1, \dots, x_n) = \left\{ \prod_{i \in I} x_i : I \subseteq [n],\ I \ne \emptyset \right\}, \]

then we can find these as well. For example, we can restrict attention to $\{2^n : n \in \mathbb{N}\}$, and use the finite sum theorem. This is the same idea as we had when we used van der Waerden to find geometric progressions.

But what if we want both? Can we have $FS(x_1, \dots, x_n) \cup FP(x_1, \dots, x_n)$ in the same colour class? The answer is actually not known! Even the case when $n = 2$, i.e. finding a monochromatic set of the form $\{x, y, x + y, xy\}$, is open. Until mid 2016, we did not know if we can find $\{x + y, xy\}$ monochromatic ($x, y > 2$).

To finish the proof of Rado’s theorem, we need the following proposition:

Proposition. If $A$ is a rational matrix with the columns property, then there is some $m, p, c \in \mathbb{N}$ such that $Ax = 0$ has a solution inside any $(m, p, c)$ set, i.e. all entries of the solution lie in the $(m, p, c)$ set.

In the case of a single equation, we reduced the general problem to the case

of 3 variables only. Here we are going to do something similar — we will use the

columns property to reduce the solution to something much smaller.

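Proof. We again write

\[ A = \begin{pmatrix} \uparrow & \uparrow & & \uparrow \\ c^{(1)} & c^{(2)} & \cdots & c^{(n)} \\ \downarrow & \downarrow & & \downarrow \end{pmatrix}. \]

Re-ordering the columns of $A$ if necessary, we assume that we have

\[ [n] = B_1 \cup \dots \cup B_r \]

such that $\max(B_s) < \min(B_{s+1})$ for all $s$, and we have

\[ \sum_{i \in B_s} c^{(i)} = \sum_{i \in B_1 \cup \dots \cup B_{s-1}} q_{is} c^{(i)} \]

for some $q_{is} \in \mathbb{Q}$. These $q_{is}$ only depend on the matrix. In other words, we have

\[ \sum_{i=1}^n d_{is} c^{(i)} = 0, \]

where

\[ d_{is} = \begin{cases} -q_{is} & i \in B_1 \cup \dots \cup B_{s-1} \\ 1 & i \in B_s \\ 0 & \text{otherwise} \end{cases}. \]

For a fixed $s$, if we scan these coefficients starting from $i = n$ and then keep decreasing $i$, then the first non-zero coefficient we see is 1, which is good, because it looks like what we see in an $(m, p, c)$ set.

Now we try to write down a general solution with $r$ many free variables. Given $x_1, \dots, x_r \in \mathbb{N}$, we look at

\[ y_i = \sum_{s=1}^r d_{is} x_s. \]

It is easy to check that $Ay = 0$ since

\[ \sum_{i=1}^n y_i c^{(i)} = \sum_{i} \sum_{s} d_{is} x_s c^{(i)} = \sum_{s} x_s \sum_{i} d_{is} c^{(i)} = 0. \]

Now take $m = r$, and pick $c$ large enough such that $c d_{is} \in \mathbb{Z}$ for all $i, s$, and finally, $p = \max\{|c d_{is}| : i \in [n], s \in [r]\}$.

Thus, if we consider the $(m, p, c)$-set on generators $(x_1, \dots, x_m)$ and $y_i$ as defined above, then we have $Ay = 0$ and hence $A(cy) = 0$. Since $cy$ is integral, and lies in the $(m, p, c)$ set, we are done!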

We have thus proved Rado’s theorem.

Theorem (Rado’s theorem). A matrix $A$ is partition regular iff it has the columns property.

So we have a complete characterization of all partition regular matrices.

Note that Rado’s theorem reduces Schur’s theorem, van der Waerden’s

theorem, finite sums theorem etc. to just checking if certain matrices have the

columns property, which are rather straightforward computations.

More interestingly, we can prove some less obvious “theoretical” results.

Corollary (Consistency theorem). If $A$, $B$ are partition regular in independent variables, then

\[ \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix} \]

is partition regular. In other words, we can solve $Ax = 0$ and $By = 0$ simultaneously in the same colour class.

Proof. The matrix

\[ \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix} \]

has the columns property if $A$ and $B$ do.

In fact, much much more is true.

Corollary. Whenever $\mathbb{N}$ is finitely-coloured, one colour class contains solutions to all partition regular systems!

Proof. Suppose not. Then we have $\mathbb{N} = D_1 \cup \dots \cup D_k$ such that for each $D_i$, there is some partition regular matrix $A_i$ such that we cannot solve $A_i x = 0$ inside $D_i$. But this contradicts the fact that $\operatorname{diag}(A_1, A_2, \dots, A_k)$ is partition regular (by applying the consistency theorem many times).

Where did this whole idea of the $(m, p, c)$-sets come from? The original proof by Rado didn’t use $(m, p, c)$-sets, and this idea appeared only a while later, when people tried to prove a more general result.

In general, we call a set $D \subseteq \mathbb{N}$ partition regular if we can solve any partition regular system in $D$. Then we know that $\mathbb{N}$ and $2\mathbb{N}$ are partition regular sets, but $2\mathbb{N} + 1$ is not (because we can’t solve $x + y = z$, say). Then what Rado’s theorem says is that whenever we finitely partition $\mathbb{N}$, then one piece of $\mathbb{N}$ is partition regular.

In the 1930s, Rado conjectured that there is nothing special about $\mathbb{N}$ to begin with: whenever we break up a partition regular set, then one of the pieces is partition regular. This conjecture was proved by Deuber in the 1970s, who introduced the idea of the $(m, p, c)$-set.

It is not hard to check that $D$ is partition regular iff $D$ contains an $(m, p, c)$ set of each size. Then Deuber’s proof involves showing that for all $m, p, c, k \in \mathbb{N}$, there exists $n, q, d \in \mathbb{N}$ such that any $k$-colouring of an $(n, q, d)$-set contains a monochromatic $(m, p, c)$ set. The proof is quite similar to how we found $(m, p, c)$-sets in the naturals, but instead of using van der Waerden’s theorem, we need the Hales–Jewett theorem.

We end by mentioning an open problem in this area. Suppose $A$ is an $m \times n$ matrix that is not partition regular. That is, there is some $k$-colouring of $\mathbb{N}$ with no solution to $Ax = 0$ in a colour class. Can we find some bound $f(m, n)$, such that every such $A$ has a “bad” colouring with $k < f(m, n)$? This is an open problem, first conjectured by Rado, and we think the answer is yes.

What do we actually know about this problem? The answer is trivially yes for $f(1, 2)$, as there aren’t many matrices of size $1 \times 2$, up to rescaling. It is a non-trivial theorem that $f(1, 3)$ exists, and in fact $f(1, 3) \le 24$. We don’t know anything more than that.

4 Topological Dynamics in Ramsey Theory

Recall the finite sum theorem — for any fixed

, whenever

is finitely coloured,

we can find

, ··· , x

such that

F S

(

, x

, ··· , x

) is monochromatic. One

natural generalization is to ask the following question — can we find an infinite

set A such that

F S(A) =

(

x∈B

x : B ⊆ A finite non-empty

)

is monochromatic? The first thought upon hearing this question is probably

either “this is so strong that there is no way it can be true”, or “we can deduce

it easily, perhaps via some compactness argument, from the finite sum theorem”.

Both of these thoughts are wrong. It is true that it is always possible to find these $x_1, x_2, \cdots$, but the proof is hard. This is Hindman’s theorem.
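To make the set $FS(A)$ concrete, here is a minimal Python sketch for a finite list of integers. The helper names `finite_sums` and `is_monochromatic`, and the parity colouring, are illustrative assumptions and do not come from the notes.

```python
from itertools import combinations


def finite_sums(xs):
    """Return FS(xs): the sums of all non-empty subsets of the finite list xs."""
    xs = list(xs)
    sums = set()
    for size in range(1, len(xs) + 1):
        for subset in combinations(xs, size):
            sums.add(sum(subset))
    return sums


def is_monochromatic(colour, values):
    """True if the colouring `colour` (a function on integers) is constant on `values`."""
    return len({colour(v) for v in values}) <= 1


# Hypothetical example: colour n by its parity.
parity = lambda n: n % 2
print(sorted(finite_sums([1, 2, 4])))                    # every integer from 1 to 7
print(is_monochromatic(parity, finite_sums([2, 4, 8])))  # True: all sums are even
```

Of course, a finite computation like this only illustrates the finite sum theorem; Hindman’s theorem concerns an infinite set $A$.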

The way we are going to approach this is via topological dynamics, which

is the title of this chapter. The idea is we construct a metric space (and in

particular a dynamical system) out of the Ramsey-theoretic problem we are

interested in, and then convert the Ramsey-theoretic problem into a question about topological properties of the space we have constructed.

In the case of Hindman’s theorem, it turns out the “topological” form is a

general topological fact, which we can prove for a general dynamical system.

What is the advantage of this approach? When we construct the dynamical

system we are interested in, what we get is highly “non-geometric”. It would

be very difficult to think of it as an actual “space”. It is just something that

happens to satisfy the definition of a metric space. However, when we prove the

general topological facts, we will think of our metric space as a genuine space, and this makes it much easier to visualize what is going on in the proofs.

A less obvious benefit of this approach is that we will apply Tychonoff’s theorem many times, and also use the fact that the (possibly infinite) intersection of nested non-empty compact sets is non-empty. If we don’t formulate our problem in topological terms, then we might find ourselves re-proving these facts repeatedly, which further complicates the proof.

We now begin our journey.

Definition (Dynamical system). A dynamical system is a pair $(X, T)$ where $X$ is a compact metric space and $T$ is a homeomorphism on $X$.

Example. $(\mathbb{T} = \mathbb{R}/\mathbb{Z},\ T_\alpha(x) = x + \alpha)$ for $\alpha \in \mathbb{R}$ is a dynamical system.

In this course, this is not the dynamical system we are interested in. In fact,

we are only ever going to be interested in one dynamical system.

We fix the number of colours $k \in \mathbb{N}$. Instead of looking at a particular colouring, we consider the space of all $k$-colourings. We let

$C = \{c : \mathbb{Z} \to [k]\} \cong [k]^{\mathbb{Z}}.$

We endow this with the product topology. Since $[k]$ has the discrete topology, our basic open sets have the form

$U = \{c : \mathbb{Z} \to [k] : c(i_1) = c_1, \cdots, c(i_s) = c_s\}$

for some $i_1, \cdots, i_s \in \mathbb{Z}$ and $c_1, \cdots, c_s \in [k]$.

Since $[k]$ is trivially compact, by Tychonoff’s theorem, we know that $C$ is compact. But we don’t really need Tychonoff, since we know that $C$ is metrizable, with metric

$\rho(c_1, c_2) = \frac{1}{n + 1},$

where $n$ is the largest integer for which $c_1$ and $c_2$ agree on $[-n, n]$. This metric is easily seen to give rise to the same topology, and by hand, we can see this is sequentially compact.

We now have to define the operation on C we are interested in.

Definition (Left shift). The left shift operator L : C → C is defined by

(Lc)(i) = c(i + 1).

We observe that this map is invertible, with inverse given by the right shift. This is one good reason to work over $\mathbb{Z}$ instead of $\mathbb{N}$. Moreover, we see that $L$ is nice and uniformly continuous, since if

$\rho(c_1, c_2) \leq \frac{1}{n + 1},$

then

$\rho(Lc_1, Lc_2) \leq \frac{1}{n}.$

Similarly, $L^{-1}$ is uniformly continuous. So it follows that $(C, L)$ is a dynamical system.
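The metric and the shift can be tried out on concrete colourings. The following Python sketch represents a colouring as a function on the integers and only inspects a finite window, so it approximates $\rho$ rather than computing it. The names `agreement_radius`, `rho`, `left_shift` and the `cutoff` parameter are assumptions made for illustration, as is the convention of returning distance 2 when two colourings already differ at 0 (a case the formula above does not cover).

```python
from fractions import Fraction


def agreement_radius(c1, c2, cutoff):
    """Largest n <= cutoff such that c1 and c2 agree on [-n, n]; -1 if they differ at 0."""
    n = -1
    for i in range(cutoff + 1):
        if c1(i) != c2(i) or c1(-i) != c2(-i):
            break
        n = i
    return n


def rho(c1, c2, cutoff=1000):
    """Approximate rho(c1, c2) = 1 / (n + 1), inspecting only [-cutoff, cutoff]."""
    n = agreement_radius(c1, c2, cutoff)
    if n == -1:
        return Fraction(2)      # assumed convention: the colourings differ at 0
    if n == cutoff:
        return Fraction(0)      # no disagreement seen inside the window
    return Fraction(1, n + 1)


def left_shift(c):
    """The left shift L, with (Lc)(i) = c(i + 1)."""
    return lambda i: c(i + 1)


# Hypothetical colourings: c1 alternates, c2 copies c1 on [-5, 5] and is 2 elsewhere.
c1 = lambda i: i % 2
c2 = lambda i: i % 2 if abs(i) <= 5 else 2

print(rho(c1, c2))                          # 1/6, since n = 5
print(rho(left_shift(c1), left_shift(c2)))  # 1/5, which is <= 1/n for n = 5
```

In this example the shifted colourings agree on $[-4, 4]$ only, which matches the bound used for uniform continuity of $L$.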

Instead of proving Hindman’s theorem directly, we begin by re-proving van

der Waerden’s theorem using this approach, to understand better how it works.

Theorem (Topological van der Waerden). Let $(X, T)$ be a dynamical system. Then for every $\varepsilon > 0$ and every $r \in \mathbb{N}$, we can find $x \in X$ and $n \in \mathbb{N}$ such that $\rho(x, T^{in} x) < \varepsilon$ for all $i = 1, \cdots, r$.

In words, this says if we flow around $X$ under $T$, then eventually some point must go back near to itself, and must do that as many times as we wish to, in regular intervals.

To justify calling this the topological van der Waerden theorem, we should

be able to easily deduce van der Waerden’s theorem from this. It is not difficult

to imagine this is vaguely related to van der Waerden in some sense, but it is

not obvious how we can actually do the deduction. After all, topological van

der Waerden theorem talks about the whole space of all colourings, and we do

not get to control what $x$ we get back from it. What we want is to start with a

single colouring c and try to say something about this particular c.

Now of course, we cannot restrict to the sub-system consisting only of $c$ itself, because, apart from it being really silly, the function $L$ does not restrict to a map $\{c\} \to \{c\}$. We might think of considering the set of all translates of $c$. While this is indeed closed under $L$, this is not a dynamical system, since it is not compact! The solution is to take the closure of that.

Definition (Orbital closure). Let $(X, T)$ be a dynamical system, and $x \in X$. The orbital closure $\bar{x}$ of $x$ is the set $\mathrm{cl}\{T^s x : s \in \mathbb{Z}\}$.

Observe that ¯x is a closed subset of a compact space, hence compact.

Proposition. $(\bar{x}, T)$ is a dynamical system.

Proof. It suffices to show that $\bar{x}$ is closed under $T$. If $y \in \bar{x}$, then we have some $s_i$ such that $T^{s_i} x \to y$. Since $T$ is continuous, we know $T^{s_i + 1} x \to T y$. So $T y \in \bar{x}$. Similarly, $T^{-1} \bar{x} = \bar{x}$.

Once we notice that we can produce such a dynamical system from our

favorite point, it is straightforward to prove van der Waerden.

Corollary (van der Waerden theorem). Let $r, k \in \mathbb{N}$. Then whenever $\mathbb{Z}$ is $k$-coloured, there is a monochromatic arithmetic progression of length $r$.

Proof. Let $c : \mathbb{Z} \to [k]$. Consider $(\bar{c}, L)$. By topological van der Waerden, we can find $x \in \bar{c}$ and $n \in \mathbb{N}$ such that $\rho(x, L^{in} x) < 1$ for all $i = 1, \cdots, r$. In particular, we know that $x$ and $L^{in} x$ agree at $0$. So we know that

$x(0) = L^{in} x(0) = x(in)$

for all $i = 0, \cdots, r$.

We are not quite done, because we just know that $x \in \bar{c}$, and not that $x = L^s c$ for some $s \in \mathbb{Z}$.

But this is not bad. Since $x \in \bar{c}$, we can find some $s \in \mathbb{Z}$ such that

$\rho(L^s c, x) \leq \frac{1}{rn + 1}.$

This means $x$ and $L^s c$ agree on $[-rn, rn]$, and in particular at $0, n, \cdots, rn$. So we know

$c(s) = c(s + n) = \cdots = c(s + rn).$
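The conclusion of the corollary is easy to test on a finite window by exhaustive search. The sketch below is a brute-force check, not the topological argument above; the function name `find_monochromatic_ap` and the block colouring are assumptions made for illustration.

```python
def find_monochromatic_ap(colour, terms, lo, hi):
    """Search the window [lo, hi] for s and n >= 1 with
    colour(s) == colour(s + n) == ... == colour(s + (terms - 1) * n).
    Returns (s, n), or None if the window contains no such progression."""
    max_step = (hi - lo) // max(terms - 1, 1)
    for n in range(1, max_step + 1):
        for s in range(lo, hi - (terms - 1) * n + 1):
            if len({colour(s + i * n) for i in range(terms)}) == 1:
                return s, n
    return None


# Hypothetical 2-colouring of Z in blocks of three.
blocks = lambda i: (i // 3) % 2
s, n = find_monochromatic_ap(blocks, 4, 0, 34)
print(s, n, [blocks(s + i * n) for i in range(4)])
```

For this colouring a progression certainly exists in the window, since $0, 6, 12, 18$ all receive colour 0; the search may return a different one with a smaller common difference.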

Since we will be looking at these $\bar{c}$ a lot, it is convenient to figure out what this $\bar{c}$ actually looks like. It turns out there is a reasonably good characterization of these orbital closures.

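Let’s look at some specific examples. Consider $c$ given by

[Figure: a colouring of $\mathbb{Z}$; not recoverable from the source.]

Then the orbital closure has just two points, $c$ and $Lc$.

What if we instead had the following?

[Figure: a second colouring of $\mathbb{Z}$, in red and blue; not recoverable from the source.]

Then $\bar{c}$ contains all translates of these, but also

[Figure: the all-red and the all-blue colourings.]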

We define the following:

Definition (Seq). Let c : Z → [k]. We define

Seq(c) = {(c(i), ··· , c(i + r)) : i ∈ Z, r ∈ N}.

It turns out these sequences tell us everything we want to know about orbital

closures.

Proposition. We have $c' \in \bar{c}$ iff $\mathrm{Seq}(c') \subseteq \mathrm{Seq}(c)$.

The proof is just following the definition and checking everything works.

Proof. We first prove $\Rightarrow$. Suppose $c' \in \bar{c}$. Let $(c'(i), \cdots, c'(i + r - 1)) \in \mathrm{Seq}(c')$. Then we have $s \in \mathbb{Z}$ such that

$\rho(c', L^s c) < \frac{1}{1 + \max(|i|, |i + r - 1|)},$

which implies

$(c(s + i), \cdots, c(s + i + r - 1)) = (c'(i), \cdots, c'(i + r - 1)).$

So we are done.

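For $\Leftarrow$, if $\mathrm{Seq}(c') \subseteq \mathrm{Seq}(c)$, then for all $n \in \mathbb{N}$, there exists $s_n \in \mathbb{Z}$ such that

$(c'(-n), \cdots, c'(n)) = (c(s_n - n), \cdots, c(s_n + n)).$

Then we have

$L^{s_n} c \to c'.$

So we have $c' \in \bar{c}$.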
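The criterion $\mathrm{Seq}(c') \subseteq \mathrm{Seq}(c)$ can be explored on a computer by truncating to short blocks inside a finite window. This is only evidence, not a proof, since $\mathrm{Seq}$ is an infinite set in general. In the Python sketch below, the name `words`, the parameters `max_len` and `window`, and the two example colourings are assumptions made for illustration.

```python
def words(c, max_len, window):
    """Blocks (c(i), ..., c(i + r - 1)) with 1 <= r <= max_len lying inside [-window, window]."""
    found = set()
    for i in range(-window, window + 1):
        for r in range(1, max_len + 1):
            if i + r - 1 <= window:
                found.add(tuple(c(i + k) for k in range(r)))
    return found


# Hypothetical colourings with colours 0 ("red") and 1 ("blue").
step = lambda i: 0 if i < 0 else 1   # red on the negatives, blue from 0 onwards
all_red = lambda i: 0

print(words(all_red, 4, 20) <= words(step, 4, 20))   # True
print(words(step, 4, 20) <= words(all_red, 4, 20))   # False
```

This is consistent with the proposition: every block of the all-red colouring appears in `step`, so the all-red colouring lies in the orbital closure of `step`, but not the other way round.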

We now return to talking about topological dynamics. We saw that in our last example, the orbital closure of $c$ has three “kinds” of things: translates of $c$, and the all-red and all-blue ones. The all-red and all-blue ones are different, because they seem to have “lost” the information of $c$. In fact, the orbital closure of each of them is just that colouring itself. This is a strict subset of $\bar{c}$. These are phenomena we don’t want.

Definition (Minimal dynamical system). We say $(X, T)$ is minimal if $\bar{x} = X$ for all $x \in X$. We say $x \in X$ is minimal if $(\bar{x}, T)$ is a minimal dynamical system.

Proposition. Every dynamical system (X, T ) has a minimal point.

Thus, most of the time, when we want to prove something about dynamical

systems, we can just pass to a minimal subsystem, and work with it.

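Proof. Let $U = \{\bar{x} : x \in X\}$. This is a family of closed sets, ordered by inclusion. We want to apply Zorn’s lemma to obtain a minimal element. Consider a chain $S$ in $U$. If

$\bar{x}_1 \supseteq \bar{x}_2 \supseteq \cdots \supseteq \bar{x}_n,$

then their intersection is $\bar{x}_n$, which is in particular non-empty. So any finite collection in $S$ has non-empty intersection. Since $X$ is compact, we know

$\bigcap_{\bar{x} \in S} \bar{x} \neq \emptyset.$

We pick

$z \in \bigcap_{\bar{x} \in S} \bar{x}.$

Then we know that $\bar{z} \subseteq \bar{x}$ for all $\bar{x} \in S$. So we know

$\bar{z} \subseteq \bigcap_{\bar{x} \in S} \bar{x},$

that is, $\bar{z} \in U$ is a lower bound for the chain $S$. So by Zorn’s lemma, we can find a minimal element (in both senses).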

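Example. Consider $c$ given by

[Figure: a colouring of $\mathbb{Z}$ in red and blue; not recoverable from the source.]

Then we saw that $\bar{c}$ contains all translates of these, and also

[Figure: the all-red and the all-blue colourings.]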

Then this system is not minimal, since the orbital closure of the all-red one is a

strict subsystem of ¯c.

We are now going to derive some nice properties of minimal systems.

Lemma. If $(X, T)$ is a minimal system, then for all $\varepsilon > 0$, there is some $m \in \mathbb{N}$ such that for all $x, y \in X$, we have

$\min_{|s| \leq m} \rho(T^s x, y) < \varepsilon.$

Proof. Suppose not. Then there exists $\varepsilon > 0$ and, for each $i \in \mathbb{N}$, points $x_i, y_i \in X$ such that

$\min_{|s| \leq i} \rho(T^s x_i, y_i) \geq \varepsilon.$

By compactness, we may pass to subsequences, and assume $x_i \to x$ and $y_i \to y$. By continuity, it follows that

$\rho(T^s x, y) \geq \varepsilon$

for all $s \in \mathbb{Z}$. This is a contradiction, since $\bar{x} = X$ by minimality.

We now want to prove topological van der Waerden.

Theorem (Topological van der Waerden). Let $(X, T)$ be a dynamical system. Then for every $\varepsilon > 0$ and every $r \in \mathbb{N}$, we can find $x \in X$ and $n \in \mathbb{N}$ such that $\rho(x, T^{in} x) < \varepsilon$ for all $i = 1, \cdots, r$.

Proof. Without loss of generality, we may assume $(X, T)$ is a minimal system. We induct on $r$. If $r = 1$, we can just choose $y \in X$, and consider $T y, T^2 y, \cdots$. Then note that by compactness, we have $s_1 < s_2 \in \mathbb{N}$ such that

$\rho(T^{s_1} y, T^{s_2} y) < \varepsilon,$

and then take $x = T^{s_1} y$ and $n = s_2 - s_1$.

Now suppose $r > 1$, and that we have the result for all $\varepsilon > 0$ and $r - 1$.

Claim. For all $\varepsilon > 0$, there is some point $y \in X$ such that there is an $x \in X$ and $n \in \mathbb{N}$ such that

$\rho(T^{in} x, y) < \varepsilon$

for all $1 \leq i \leq r$.

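Note that this is a different statement from the theorem, because we are not starting at $y$. In fact, this is a triviality. Indeed, let $(x_0, n)$ be as guaranteed by the induction hypothesis. That is, $\rho(x_0, T^{in} x_0) < \varepsilon$ for all $1 \leq i \leq r - 1$. Then pick $y = x_0$ and $x = T^{-n} x_0$, so that $T^{in} x = T^{(i - 1)n} x_0$ is within $\varepsilon$ of $y$ for all $1 \leq i \leq r$.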

The next goal is to show that there is nothing special about this y.

Claim. For all $\varepsilon > 0$ and for all $z \in X$, there exists $x_z \in X$ and $n \in \mathbb{N}$ for which $\rho(T^{in} x_z, z) < \varepsilon$ for all $1 \leq i \leq r$.

The idea is just to shift the picture so that $y$ gets close to $z$, and see where

we send x to. We will use continuity to make sure we don’t get too far away.

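We choose $m$ as in the previous lemma for $\frac{\varepsilon}{2}$. Since $T^{-m}, T^{-m+1}, \cdots, T^{m}$ are all uniformly continuous, we can choose $\varepsilon'$ such that $\rho(a, b) < \varepsilon'$ implies

$\rho(T^s a, T^s b) < \frac{\varepsilon}{2}$

for all $|s| \leq m$.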

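Given $z \in X$, we obtain $x$ and $y$ by applying our first claim to $\varepsilon'$. Then we can find $s \in \mathbb{Z}$ with $|s| \leq m$ such that $\rho(T^s y, z) < \frac{\varepsilon}{2}$. Consider $x_z = T^s x$. Then

$\rho(T^{in} x_z, z) \leq \rho(T^{in} x_z, T^s y) + \rho(T^s y, z)$

$\leq \rho(T^s(T^{in} x), T^s y) + \frac{\varepsilon}{2}$

$< \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$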

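Claim. For all $\varepsilon > 0$ and $z \in X$, there exists $x \in X$, $n \in \mathbb{N}$ and $\varepsilon' > 0$ such that $T^{in}(B(x, \varepsilon')) \subseteq B(z, \varepsilon)$ for all $1 \leq i \leq r$.

We choose $\varepsilon'$ by continuity, using the previous claim.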

The idea is that we repeatedly apply the final claim, and then as we keep

moving back, eventually two points will be next to each other.

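We pick $z_0 \in X$ and set $\varepsilon_0 = \frac{\varepsilon}{2}$. By the final claim, there exists $z_1 \in X$ and some $n_1 \in \mathbb{N}$ such that $T^{in_1}(B(z_1, \varepsilon_1)) \subseteq B(z_0, \varepsilon_0)$ for some $0 < \varepsilon_1 \leq \varepsilon_0$ and all $1 \leq i \leq r$.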

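Inductively, we find $z_s \in X$, $n_s \in \mathbb{N}$ and some $0 < \varepsilon_s \leq \varepsilon_{s-1}$ such that

$T^{in_s}(B(z_s, \varepsilon_s)) \subseteq B(z_{s-1}, \varepsilon_{s-1})$

for all $1 \leq i \leq r$.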

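By compactness, $(z_s)$ has a convergent subsequence, and in particular, there exists $i < j \in \mathbb{N}$ such that $\rho(z_i, z_j) < \frac{\varepsilon}{2}$.

Now take $x = z_j$, and

$n = n_j + n_{j-1} + \cdots + n_{i+1}.$

Then

$T^{\ell n}(B(x, \varepsilon_j)) \subseteq B(z_i, \varepsilon_i)$

for all $1 \leq \ell \leq r$. But we know

$\rho(z_i, z_j) \leq \frac{\varepsilon}{2}$

and

$\varepsilon_i \leq \varepsilon_0 \leq \frac{\varepsilon}{2}.$

So we have

$\rho(T^{\ell n} x, x) \leq \varepsilon$

for all $1 \leq \ell \leq r$.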

In the remainder of the chapter, we will actually prove Hindman’s theorem. We will again first prove a “topological” version, but unlike the topological van der Waerden theorem, it will look nothing like the actual Hindman’s theorem. So we still need to do some work to deduce the actual version from the topological version.

To state topological Hindman, we need the following definition:

Definition (Proximal points). We say $x, y \in X$ are proximal if

$\inf_{s \in \mathbb{Z}} \rho(T^s x, T^s y) = 0.$

Example. Two colourings $c_1, c_2 \in C$ are proximal iff there are arbitrarily large intervals where they agree.

The topological form of Hindman’s theorem says the following:

Theorem (Topological Hindman’s theorem). Let $(X, T)$ be a dynamical system, and suppose $X = \bar{x}$ for some $x \in X$. Then for any minimal subsystem $Y \subseteq X$, there is some $y \in Y$ such that $x$ and $y$ are proximal.

We will first show how this implies Hindman’s theorem, and then prove this.

To do the translation, we need to, of course, translate concepts in dynamical

systems to concepts about colourings. We previously showed that for colourings $c, c'$, we have $c' \in \bar{c}$ iff $\mathrm{Seq}(c') \subseteq \mathrm{Seq}(c)$. The next thing to figure out is when $c : \mathbb{Z} \to [k]$ is minimal. By definition, this means for any $x \in \bar{c}$, we have $\bar{x} = \bar{c}$. Equivalently, we have $\mathrm{Seq}(x) = \mathrm{Seq}(c)$.

We introduce some notation.

Notation. For an interval $I \subseteq \mathbb{Z}$, say $I = \{a, a + 1, \cdots, b\}$, we write $c(I)$ for the sequence $(c(a), c(a + 1), \cdots, c(b))$.

Clearly, we have

Seq(c) = {c(I) : I ⊆ Z is an interval}.

For intervals $I_1, I_2 \subseteq \mathbb{Z}$, we write $c(I_1) \preccurlyeq c(I_2)$ if $c(I_1)$ is a subsequence of consecutive terms of $c(I_2)$.

Definition (Bounded gaps property). We say a colouring $c : \mathbb{Z} \to [k]$ has the bounded gaps property if for each interval $I \subseteq \mathbb{Z}$, there exists some $M > 0$ such that $c(I) \preccurlyeq c(U)$ for every interval $U \subseteq \mathbb{Z}$ of length at least $M$.

For example, every periodic colouring has the bounded gaps property.
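As a sanity check of the periodic case, the following Python sketch tests the bounded gaps condition on a range of intervals. For a colouring of period $p$ and an interval $I$ of length $\ell$, the value $M = \ell + p - 1$ works, since any $p$ consecutive starting points include one congruent to $\min I$ modulo $p$. The names `occurs_in` and `has_gap_bound` and the period-3 colouring are assumptions made for illustration.

```python
def occurs_in(word, c, start, length):
    """True if `word` appears as consecutive values of c inside [start, start + length - 1]."""
    k = len(word)
    return any(
        tuple(c(start + off + j) for j in range(k)) == word
        for off in range(length - k + 1)
    )


def has_gap_bound(c, a, b, M, starts):
    """Check that c([a, b]) occurs in every length-M interval beginning at a point of `starts`."""
    word = tuple(c(i) for i in range(a, b + 1))
    return all(occurs_in(word, c, u, M) for u in starts)


# Hypothetical colouring of period 3: colour 1 on multiples of 3, colour 0 elsewhere.
c = lambda i: int(i % 3 == 0)

print(has_gap_bound(c, 0, 4, 7, range(-50, 50)))  # True: M = 5 + 3 - 1 suffices
print(has_gap_bound(c, 0, 4, 6, range(-50, 50)))  # False: M = 6 is too short
```

The check only looks at finitely many intervals, but for a periodic colouring that is enough, because the answer depends only on the starting point modulo the period.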

It turns out this is precisely what we need to characterize minimal colourings.

Proposition. A colouring $c : \mathbb{Z} \to [k]$ is minimal iff $c$ has the bounded gaps property.

Proof. Suppose $c$ has the bounded gaps property. We want to show that for all $x \in \bar{c}$, we have $\mathrm{Seq}(x) = \mathrm{Seq}(c)$. We certainly have one containment, $\mathrm{Seq}(x) \subseteq \mathrm{Seq}(c)$. Consider any interval $I \subseteq \mathbb{Z}$. We want to show that $c(I) \in \mathrm{Seq}(x)$. By the bounded gaps property, there is some $M$ such that any interval $U \subseteq \mathbb{Z}$ of length $> M$ satisfies $c(I) \preccurlyeq c(U)$. But since $x \in \bar{c}$, we know that

$x([0, \cdots, M]) = c([t, t + 1, \cdots, t + M])$

for some $t \in \mathbb{Z}$. So $c(I) \preccurlyeq x([0, \cdots, M])$, and this implies $\mathrm{Seq}(c) \subseteq \mathrm{Seq}(x)$.

For the other direction, suppose $c$ does not have the bounded gaps property. So there is some bad interval $I \subseteq \mathbb{Z}$. Then for any $n \in \mathbb{N}$, there is some $t_n$ such that

$c(I) \not\preccurlyeq c([t_n - n, t_n - n + 1, \cdots, t_n + n]).$

Now consider $x_n = L^{t_n}(c)$. By passing to a subsequence if necessary, we may assume that $x_n \to x$. Clearly, $c(I) \notin \mathrm{Seq}(x)$. So we have found something in $\mathrm{Seq}(c) \setminus \mathrm{Seq}(x)$. So $c \notin \bar{x}$, and $c$ is not minimal.

It turns out once we have equated minimality with the bounded gaps property,

it is not hard to deduce Hindman’s theorem.

Theorem (Hindman’s theorem). If $c : \mathbb{N} \to [k]$ is a $k$-colouring, then there exists an infinite $A \subseteq \mathbb{N}$ such that $FS(A)$ is monochromatic.

Proof. We extend the colouring $c$ to a colouring of $\mathbb{Z}$ by defining $x : \mathbb{Z} \to [k + 1]$ with

$x(i) = x(-i) = c(i)$

for $i > 0$, and $x(0) = k + 1$. Then it suffices to find an infinite $A \subseteq \mathbb{Z}$ such that $FS(A)$ is monochromatic with respect to $x$.

We apply topological Hindman to $(\bar{x}, L)$ to find a minimal colouring $y$ such that $x$ and $y$ are proximal. Then either

$\inf_{n \in \mathbb{N}} \rho(L^n x, L^n y) = 0$ or $\inf_{n \in \mathbb{N}} \rho(L^{-n} x, L^{-n} y) = 0.$

We may assume wlog that we are in the first case, i.e. $x$ and $y$ are positively proximal. We now build up this infinite set $A$ we want one element at a time.

Let the colour of $y(0)$ be red. We shall in fact pick $0 < a_1 < a_2 < \cdots$ inductively so that $x(t) = y(t) = \text{red}$ for all $t \in FS(a_1, a_2, \cdots)$.

By the bounded gaps property of $y$, there exists some $M_1$ such that any interval $I \subseteq \mathbb{Z}$ of length $\geq M_1$ contains $y([0])$, i.e. $y([0]) \preccurlyeq y(I)$.

Since $x$ and $y$ are positively proximal, there exists an interval $I \subseteq \mathbb{Z}$ with $|I| \geq M_1$ and $\min(I) > 0$ such that $x(I) = y(I)$. Then we pick $a_1 \in I$ such that

$x(a_1) = y(a_1) = y(0).$

Now we just have to keep doing this. Suppose we have found $a_1 < \cdots < a_n$ as required. Consider the interval $J = [0, \cdots, a_1 + \cdots + a_n]$. Again by the bounded gaps property, there exists some $M_{n+1}$ such that if $I \subseteq \mathbb{Z}$ has $|I| \geq M_{n+1}$, then $y(J) \preccurlyeq y(I)$.

We choose an interval I ⊆ Z such that

(i) x(I) = y(I)

(ii) $|I| \geq M_{n+1}$

(iii) $\min I > \sum_{i=1}^{n} a_i$

Then we know that

$y([0, \cdots, a_1 + \cdots + a_n]) \preccurlyeq y(I).$

Let $a_{n+1}$ denote the position at which $y([0, \cdots, a_1 + \cdots + a_n])$ occurs in $y(I)$, so that $y(a_{n+1} + t) = y(t)$ for all $0 \leq t \leq a_1 + \cdots + a_n$. It is then immediate that $x(z) = y(z) = \text{red}$ for all $z \in FS(a_1, \cdots, a_{n+1})$, as any element in $FS(a_1, \cdots, a_{n+1})$ is either

– $t'$ for some $t' \in FS(a_1, \cdots, a_n)$;

– $a_{n+1}$; or

– $a_{n+1} + t'$ for some $t' \in FS(a_1, \cdots, a_n)$,

and these are all red by construction.

Then $A = \{a_1, a_2, \cdots\}$ is the required set.

The heart of the proof is that we used topological Hindman to obtain a minimal proximal point, and this is why we can iteratively get the $a_i$.

It remains to prove topological Hindman.

Theorem (Topological Hindman’s theorem). If $(X, T)$ is a dynamical system, $X = \bar{x}$ and $Y \subseteq X$ is minimal, then there exists $y \in Y$ such that $x$ and $y$ are proximal.

This is a long proof, and we will leave out proofs of basic topological facts.

Proof. If $X$ is a compact metric space, then $X^X = \{f : X \to X\}$ is compact under the product topology by Tychonoff. The basic open sets are of the form

$\{f : X \to X \mid f(x_i) \in U_i \text{ for } i = 1, \cdots, k\}$

for some fixed $x_i \in X$ and $U_i \subseteq X$ open.

Now any function $g : X \to X$ can act on $X^X$ in two obvious ways:

– Post-composition $L_g : X^X \to X^X$, given by $L_g(f) = g \circ f$.

– Pre-composition $R_g : X^X \to X^X$, given by $R_g(f) = f \circ g$.

The useful thing to observe is that $R_g$ is always continuous, while $L_g$ is continuous if $g$ is continuous.

Now if $(X, T)$ is a dynamical system, we let

$E_T = \mathrm{cl}\{T^s : s \in \mathbb{Z}\} \subseteq X^X.$

The idea is that we look at what is going on in this space instead. Let’s note a few things about this subspace $E_T$:

– $E_T$ is compact, as it is a closed subset of a compact set.

– $f \in E_T$ iff for all $\varepsilon > 0$ and points $x_1, \cdots, x_k \in X$, there exists $s \in \mathbb{Z}$ such that $\rho(f(x_i), T^s(x_i)) < \varepsilon$ for all $i$.

– $E_T$ is closed under composition.

So in fact, $E_T$ is a compact semi-group inside $X^X$. It turns out every proof of

Hindman’s theorem involves working with a semi-group-like object, and then

trying to find an idempotent element.

Theorem (Idempotent theorem). If $E \subseteq X^X$ is a compact semi-group, then there exists $g \in E$ such that $g^2 = g$.

This is a strange thing to try to prove, but it turns out once we came up

with this statement, it is not hard to prove. The hard part is figuring out this is

the thing we want to prove.

Proof. Let $\mathcal{F}$ denote the collection of all non-empty compact sub-semi-groups of $E$. Let $A \in \mathcal{F}$ be minimal with respect to inclusion. To see $A$ exists, it suffices (by Zorn) to check that any chain $S$ has a lower bound in $\mathcal{F}$, but this is immediate, since the intersection of nested non-empty compact sets is non-empty.

Claim. Ag = A for all g ∈ A.

We first observe that $Ag$ is compact since $R_g$ is continuous. Also, $Ag$ is a semigroup, since if $f_1 g, f_2 g \in Ag$, then

$f_1 g f_2 g = (f_1 g f_2) g \in Ag.$

Finally, since g ∈ A, we have Ag ⊆ A. So by minimality, we have Ag = A.

Now let

$B_g = \{f \in A : fg = g\}.$

Claim. $B_g = A$ for all $g \in A$.

Note that $B_g$ is non-empty, because $Ag = A$. Moreover, $B_g$ is closed, as $B_g$ is the inverse image of $\{g\}$ under $R_g$. So $B_g$ is compact. Finally, it is clear that $B_g$ is a semi-group. So by minimality, we have $B_g = A$.

So pick any $g \in A$. Then $g \in B_g$, that is, $g^2 = g$.
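For a finite semigroup the existence of an idempotent can be seen far more directly: the powers $g, g^2, g^3, \cdots$ of any element eventually cycle, and the cycle contains an idempotent. The Python sketch below illustrates this finite analogue only; it is not the Zorn’s lemma argument used in the proof. Maps on $\{0, \cdots, n - 1\}$ are stored as tuples, and the names `compose` and `idempotent_power` are assumptions made for illustration.

```python
def compose(f, g):
    """The composite f o g of two self-maps of {0, ..., n - 1}, stored as tuples."""
    return tuple(f[g[i]] for i in range(len(g)))


def idempotent_power(g):
    """Return the first power g^k (k >= 1) that satisfies g^k o g^k == g^k."""
    power = g
    while compose(power, power) != power:
        power = compose(power, g)
    return power


# Hypothetical map on {0, 1, 2, 3}: 0 -> 1 -> 2 -> 0 and 3 -> 0.
g = (1, 2, 0, 0)
e = idempotent_power(g)
print(e)                   # (0, 1, 2, 2), which is g^3
print(compose(e, e) == e)  # True
```

The loop terminates because there are only finitely many self-maps of a finite set, so some power of `g` is idempotent.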

Proof of topological Hindman (continued). We are actually going to work in a slightly smaller compact semigroup than $E_T$. We let $F \subseteq E_T$ be defined by

$F = \{f \in E_T : f(x) \in Y\}.$

Claim. $F \subseteq X^X$ is a compact semigroup.

Before we prove the claim, we see why it makes sense to consider this $F$. Suppose the claim is true. Then by applying the idempotent theorem, we get $g \in F$ such that $g^2 = g$. We now check that $x, g(x)$ are proximal.

Since $g \in F \subseteq E_T$, we know for all $\varepsilon$, there exists $s \in \mathbb{Z}$ such that

$\rho(g(x), T^s(x)) < \frac{\varepsilon}{2}, \quad \rho(g(g(x)), T^s(g(x))) < \frac{\varepsilon}{2}.$

But we are done, since $g(x) = g(g(x))$, and then we conclude from the triangle inequality that

$\rho(T^s(x), T^s(g(x))) < \varepsilon.$

So $x$ and $g(x) \in Y$ are proximal.

It now remains to prove the final claim. We first note that $F$ is non-empty. Indeed, pick any $y \in Y$. Since $X = \bar{x}$, there exists some sequence $T^{n_i}(x) \to y$. Now by compactness of $E_T$, we can find some $f \in E_T$ such that $(T^{n_i})_{i \geq 0}$ cluster at $f$, i.e. every open neighbourhood of $f$ contains infinitely many $T^{n_i}$. Then since $X$ is Hausdorff, it follows that $f(x) = y$. So $f \in F$.

To show $F$ is compact, it suffices to show that it is closed. But since $Y = \bar{y}$ for all $y \in Y$ by minimality, we know $Y$ is in particular closed, and so $F$ is closed in the product topology.

Finally, we have to show that $F$ is closed under composition, so that it is a semi-group. To show this, it suffices to show that any map $f \in E_T$ sends $Y \to Y$. Indeed, by minimality of $Y$, this is true for $T^s$ for $s \in \mathbb{Z}$, and then we are done since $Y$ is closed.

The actual hard part of the proof is deciding we should prove the idempotent

theorem. It is not an obvious thing to try!

5 Sums and products*

The goal of this final (non-examinable) chapter is to prove the following theorem:

Theorem (Moreira, 2016). If $\mathbb{N}$ is $r$-coloured, then there exist infinitely many $x, y$ such that $\{x, x + y, xy\}$ is monochromatic.

This is a rather hard theorem, but the proof is completely elementary, and

just involves a very clever use of van der Waerden’s theorem. One of the main

ideas is to consider syndetic sets.
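To get a feel for the statement of Moreira’s theorem, one can list monochromatic configurations $\{x, x + y, xy\}$ for a given colouring by brute force. The Python sketch below does this for small values; the function name `monochromatic_pairs`, the restriction to $x, y \geq 2$ (to skip the degenerate case $xy = x$) and the parity colouring are assumptions made for illustration.

```python
def monochromatic_pairs(colour, bound):
    """All (x, y) with x, y >= 2 and x * y <= bound such that
    x, x + y and x * y receive the same colour."""
    hits = []
    for x in range(2, bound // 2 + 1):
        for y in range(2, bound // x + 1):
            if colour(x) == colour(x + y) == colour(x * y):
                hits.append((x, y))
    return hits


# Hypothetical 2-colouring of N by parity.
parity = lambda n: n % 2
print(monochromatic_pairs(parity, 20))
```

For the parity colouring the pairs found are exactly those with $x$ and $y$ both even, for instance $(4, 2)$, which gives $\{4, 6, 8\}$.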

Definition (Syndetic set). We say $A = \{a_1 < a_2 < \cdots\} \subseteq \mathbb{N}$ is syndetic if it has bounded gaps, i.e. there is some $b$ such that $a_{i+1} - a_i < b$ for all $i$.

It turns out these sets contain a lot of structure. What happens when we

finitely colour a syndetic set? Are we guaranteed to have a monochromatic

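syndetic set? Let’s take the simplest syndetic set, $\mathbb{N}$. Let’s say we 2-colour $\mathbb{N}$. Does one of the classes have to be syndetic? Of course not. For example, we can look at this:

[Figure: a 2-colouring of $\mathbb{N}$ in which neither colour class is syndetic; not recoverable from the source.]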

But this set obviously has some very nice structure. So we are motivated to

consider the following definition:

Definition (Piecewise syndetic set). We say $A \subseteq \mathbb{N}$ is piecewise syndetic if there exists $b > 0$ such that $A$ has gaps bounded by $b$ on arbitrarily long intervals.

More explicitly, if we again write $A = \{a_1 < a_2 < \cdots\}$, then there exists $b$ such that for all $N > 0$, there is $i \in \mathbb{N}$ such that $a_{k+1} - a_k < b$ for $k = i, \cdots, i + N$.
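The difference between the two definitions shows up clearly on a finite truncation of a set. In the Python sketch below, `gaps` and `longest_run_with_gaps_below` are hypothetical helper names, and the example set (blocks of consecutive integers starting at powers of 2) is chosen for illustration. A finite computation can only suggest, not prove, that a set is piecewise syndetic.

```python
def gaps(elements):
    """Differences between consecutive elements of a finite set of integers."""
    xs = sorted(set(elements))
    return [b - a for a, b in zip(xs, xs[1:])]


def longest_run_with_gaps_below(elements, b):
    """Largest number of consecutive gaps that are all < b."""
    best = run = 0
    for g in gaps(elements):
        run = run + 1 if g < b else 0
        best = max(best, run)
    return best


# Hypothetical set: blocks {2^k, ..., 2^k + k} for k = 1, ..., 10.
A = [x for k in range(1, 11) for x in range(2**k, 2**k + k + 1)]

print(max(gaps(A)))                       # 503: the gaps between blocks keep growing
print(longest_run_with_gaps_below(A, 2))  # 10: but the runs of gap-1 points get longer
```

If the blocks are continued for all $k$, the resulting set is not syndetic, since the gaps between blocks are unbounded, but it is piecewise syndetic with $b = 2$.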

We begin with two propositions which are standard results.

Proposition. If $A$ is piecewise syndetic and $A = X \cup Y$, then one of $X$ and $Y$ is piecewise syndetic.

Proof. We fix $b > 0$ and the sequence of intervals $I_1, I_2, \cdots$ with $|I_n| \geq n$ such that $A$ has gaps $\leq b$ in each $I_n$. We may, of course, wlog assume the $I_n$ are disjoint.

Each element in $A \cap I_n$ is either in $X$ or not. We let $t_n$ denote the largest number of consecutive things in $A \cap I_n$ that are in $Y$. If $t_n$ is unbounded, then $Y$ is piecewise syndetic. If $t_n \leq K$, then $X$ is piecewise syndetic, with gaps bounded by $(K + 1)b$.

Of course, this can be iterated to any finite partition. There is nothing clever

so far.

Our next proposition is going to be a re-statement of van der Waerden’s

theorem in the world of piecewise syndetic sets.

Proposition. Let $A \subseteq \mathbb{N}$ be piecewise syndetic. Then for all $m \in \mathbb{N}$, there exists $d \in \mathbb{N}$ such that the set

$A^* = \{x \in \mathbb{N} : x, x + d, \cdots, x + md \in A\}$

is piecewise syndetic.

Proof.

Let’s in fact assume that $A$ is syndetic, with gaps bounded by $b$. The

proof for a piecewise syndetic set is similar, but we have to pass to longer and

longer intervals, and then piece them back together.

We want to apply van der Waerden’s theorem. By definition,

N = A ∪ (A + 1) ∪··· ∪ (A + b).

Let $c : \mathbb{N} \to \{0, \cdots, b\}$ be given by

c(x) = min{i : x ∈ A + i}.

Then by van der Waerden, we can find some $a_1, d_1$ such that

$a_1, a_1 + d_1, \cdots, a_1 + m d_1 \in A + i.$

Of course, this is equivalent to

$(a_1 - i), (a_1 - i) + d_1, \cdots, (a_1 - i) + m d_1 \in A.$

So we wlog $i = 0$.

But van der Waerden doesn’t require the whole set of $\mathbb{N}$. We can split $\mathbb{N}$ into huge blocks of size $W(m + 1, b + 1)$, and then we run the same argument in each of these blocks. So we find $(a_1, d_1), (a_2, d_2), \cdots$ such that

$a_i, a_i + d_i, \cdots, a_i + m d_i \in A.$

Moreover, $\{a_1, a_2, \cdots\}$ is syndetic with gaps bounded by $W(m + 1, b + 1)$. But we need a fixed $d$ that works for everything.

We now observe that by construction, we must have $d_i \leq W(m + 1, b + 1)$ for all $i$. So this induces a finite colouring on the $a_i$, where the colour of $a_i$ is just $d_i$. Now we are done by the previous proposition, which tells us one of the partitions must be piecewise syndetic.
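The set $A^*$ from the proposition is easy to compute on a finite truncation of $A$. The Python sketch below does so; the function name `progression_starts`, the `limit` parameter and the example set are assumptions made for illustration, and the truncation means points near the top of the range may be missed.

```python
def progression_starts(A, d, m, limit):
    """Points x in [1, limit] with x, x + d, ..., x + m*d all in the finite set A."""
    A = set(A)
    return [x for x in range(1, limit + 1)
            if all(x + j * d in A for j in range(m + 1))]


# Hypothetical syndetic set: the multiples of 3 (truncated at 99).
A = range(3, 100, 3)
print(progression_starts(A, 3, 2, 30))  # the multiples of 3 up to 30
print(progression_starts(A, 1, 2, 30))  # []: d = 1 never works for this A
```

The example shows why the proposition asserts only that some $d$ exists: the right common difference depends on $A$.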

That was the preliminary work we had to do to prove the theorem. The

proof will look very magical, and things only start to make sense as we reach the

end. This is not a deficiency of the proof. There is some reason this problem

was open for more than 40 years.

The original proof (arxiv:1605.01469) is a bit less magical, and used concepts

from topological dynamics. Interested people can find the original paper to read.

However, we will sacrifice “deep understanding” for having a more elementary

proof that just uses van der Waerden.

Proof of theorem. Let $\mathbb{N} = C_1 \cup \cdots \cup C_r$. We build the following sequences:

(i) $y_1, y_2, \cdots \in \mathbb{N}$;

(ii) $B_0, B_1, \cdots$ and $D_1, D_2, \cdots$ which are piecewise syndetic sets;

(iii) $t_0, t_1, \cdots \in [r]$ which are colours.

We will pick these such that $B_i \subseteq C_{t_i}$.

We first observe that one of the colour classes $C_i$ is piecewise syndetic. We pick $t_0 \in [r]$ such that $C_{t_0}$ is piecewise syndetic, and pick $B_0 = C_{t_0}$. We want to apply the previous proposition to obtain $D_1$ from $B_0$. We pick $m = 2$, and then we can find a $y_1$ such that

$D_1 = \{z \mid z, z + y_1 \in B_0\}$

is piecewise syndetic.

We now want to get $t_1$. We notice that $y_1 D_1$ is also piecewise syndetic, just with a bigger gap. So there is a colour class $t_1 \in [r]$ such that $y_1 D_1 \cap C_{t_1}$ is piecewise syndetic, and we let $B_1 = y_1 D_1 \cap C_{t_1}$.

In general, having constructed $y_1, \cdots, y_{i-1}$; $B_0, \cdots, B_{i-1}$; $D_1, \cdots, D_{i-1}$ and $t_0, \cdots, t_{i-1}$, we proceed in the following magical way:

We apply the previous proposition with the really large $m = y_1^2 y_2^2 \cdots y_{i-1}^2$ to find $y_i \in \mathbb{N}$ such that the set

$D_i = \{z \mid z, z + y_i, \cdots, z + (y_1^2 y_2^2 \cdots y_{i-1}^2) y_i \in B_{i-1}\}$

is piecewise syndetic. The main thing is that we are not using all points in this progression, but are just using some important points in there. In particular, we use the fact that

$z,\ z + y_i,\ z + (y_j^2 \cdots y_{i-1}^2) y_i \in B_{i-1}$

for all $1 \leq j \leq i - 1$. It turns out these squares are important.

We have now picked $y_i$ and $D_i$, but we still have to pick $B_i$ and $t_i$. But this is just the same as what we did. We know that $y_i D_i$ is piecewise syndetic. So we know there is some $t_i \in [r]$ such that $B_i = y_i D_i \cap C_{t_i}$ is piecewise syndetic.

Let’s write down some properties of these $B_i$’s. Clearly, $B_i \subseteq C_{t_i}$, but we can say much more. We know that

$B_i \subseteq y_i B_{i-1} \subseteq \cdots \subseteq y_i y_{i-1} \cdots y_{j+1} B_j.$

Of course, there exists some $t_j, t_i$ with $j < i$ such that $t_j = t_i$. We set

$y = y_i y_{i-1} \cdots y_{j+1}.$

Let’s choose any $z \in B_i$. Then we can find some $x \in B_j$ such that

$z = xy.$

We want to check that this gives us what we want. By construction, we have

$x \in B_j \subseteq C_{t_j}, \quad xy = z \in B_i \subseteq C_{t_i}.$

So they have the same colour.

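How do we figure out the colour of $x + y$? Consider

$y(x + y) = z + y^2 \in y_i D_i + y^2.$

By definition, if $a \in D_i$, then we know

$a + (y_{j+1}^2 \cdots y_{i-1}^2) y_i \in B_{i-1}.$

So we have

$D_i \subseteq B_{i-1} - (y_{j+1}^2 \cdots y_{i-1}^2) y_i \subseteq y_{i-1} \cdots y_{j+1} B_j - (y_{j+1}^2 \cdots y_{i-1}^2) y_i.$

Multiplying through by $y_i$, it follows that

$y(x + y) \in y_i y_{i-1} \cdots y_{j+1} B_j - y^2 + y^2 = y B_j.$

So $x + y \in B_j \subseteq C_{t_j} = C_{t_i}$. So $\{x, x + y, xy\}$ have the same colour.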