Part III — Ramsey Theory
Based on lectures by B. P. Narayanan
Notes taken by Dexter Chua
Lent 2017
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
What happens when we cut up a mathematical structure into many ‘pieces’ ? How
big must the original structure be in order to guarantee that at least one of the pieces
has a specific property of interest? These are the kinds of questions at the heart of
Ramsey theory. A prototypical result in the area is van der Waerden’s theorem, which
states that whenever we partition the natural numb ers into finitely many classes, there
is a class that contains arbitrarily long arithmetic progressions.
The course will cover both classical material and more recent developments in the
subject. Some of the classical results that I shall cover include Ramsey’s theorem, van
der Waerden’s theorem and the Hales–Jewett theorem; I shall discuss some applications
of these results as well. More recent developments that I hope to cover include the
prop erties of non-Ramsey graphs, topics in geometric Ramsey theory, and finally,
connections between Ramsey theory and top ological dynamics. I will also indicate a
number of open problems.
Pre-requisites
There are almost no pre-requisites and the material presented in this course will, by
and large, be self-contained. However, students familiar with the basic notions of graph
theory and point-set topology are bound to find the course easier.
0 Introduction
Vaguely, the main idea of Ramsey theory is as follows — we have some object
X
that comes with some structure, and then we break
X
up into finitely many
pieces. Can we find a piece that retains some of the structure of
X
? Usually, we
phrase this in terms of colourings. We pick a list of finitely many colours, and
colour each element of X. Then we want to find a monochromatic subset of X
that satisfies some properties.
The most classical example of this is a graph colouring problem. We take a
graph, say
We then colour each edge with either red or blue:
We now try to find a (complete) subgraph that is monochromatic, i.e. a subgraph
all of whose edges are the same colour. With a bit of effort, we can find a red
monochromatic subgraph of size 4:
Are we always guaranteed that we can find a monochromatic subgraph of size 4?
If not, how big does the initial graph have to be? These are questions we will
answer in this course. Often, we will ask the question in an “infinite” form —
given an infinite graph, is it always possible to find an infinite monochromatic
subgraph? The answer is yes, and it turns out we can deduce results about the
finitary version just by knowing the infinite version.
These questions about graphs will take up the first chapter of the course.
In the remaining of the course, we will discuss Ramsey theory on the integers,
which, for the purpose of this course, will always mean
N
=
{
1
,
2
,
3
, ···}
. We will
now try to answer questions about the arithmetic structure of
N
. For example,
when we finitely colour
N
, can we find an infinite arithmetic progression? With
some thought, we can see that the answer is no. However, it turns out we can
find arbitrarily long arithmetic progressions.
More generally, suppose we have some system of linear equations
3x + 6y + 2z = 0
2x + 7y + 3z = 0.
If we finitely colour
N
, can we always find a monochromatic solution to these
equations?
Remarkably, there is a complete characterization of all linear systems that
always admit monochromatic solutions. This is given by Rado’s theorem, which
is one of the hardest theorems we are going to prove in the course.
If we are brave, then we can try to get the multiplicative structure of
N
in
the picture. We begin by asking the most naive question — if we finitely colour
N
, can we always find
x, y ∈ N
, not both 2, such that
x
+
y
and
xy
are the same
colour? This is a very hard question, and the answer turns out to be yes.
1 Graph Ramsey theory
1.1 Infinite graphs
We begin by laying out some notations and definitions.
Definition (N and [n]). We write
N = {1, 2, 3, 4, ···}.
We also write
[n] = {1, 2, 3, ··· , n}.
Notation.
For a set
X
, we write
X
(r)
for the subsets of
X
of size
r
. The
elements are known as r-sets.
We all know what a graph is, hopefully, but for concreteness we provide a
definition:
Definition (Graph). A graph G is a pair (V, E), where E ⊆ V
(2)
.
In particular, our graphs are not directed.
We are mostly interested in graphs with
V
=
N
or [
n
]. In this case, we will
write an edge
{i, j}
as
ij
, and always assume that
i < j
. More generally, in
N
(r)
,
we write i
1
i
2
···i
r
for an r-set, and implicitly assume i
1
< i
2
< ··· < i
r
.
Definition (k-colouring). A k-colouring of a set X is a map c : X → [k].
Of course, we often replace
k
with some actual set of colours, e.g.
{red, blue}
.
We make more obvious definitions:
Definition
(Monochromatic set)
.
Let
X
be a set with a
k
-colouring. We say a
subset Y ⊆ X is monochromatic if the colouring restricted to Y is constant.
The first question we are interested is in the following — if
N
(2)
is
k
-coloured,
can we find a complete infinite subgraph that is monochromatic? In other words,
is there an infinite subset X ⊆ N such that X
(2)
⊆ N
(2)
is monochromatic?
We begin by trying some examples. The correct way to read these examples
is to stare at the colouring, and then try to find a monochromatic subset yourself,
instead of immediately reading on to see what the monochromatic subset is.
Example. Suppose we had the colouring c : N
(2)
→ {red, blue} by
c(ij) =
(
red i + j is even
blue i + j is odd
.
Then we can pick
X
=
{
2
,
4
,
6
,
8
, ···}
, and this is an infinite monochromatic set.
Example. Consider c : N
(2)
→ {0, 1, 2}, where
c(ij) = max{n : 2
n
| i + j} mod 3
In this case, taking
X
=
{
8
,
8
2
,
8
3
, ···}
gives an infinite monochromatic set of
colour 0.
Example. Let c : N
(2)
→ {red, blue} by
c(ij) =
(
red i + j has an even number of distinct prime factors
blue otherwise
It is in fact an open problem to find an explicit infinite monochromatic set
for this colouring, or even for which colour do we have an infinite monochromatic
set. However, we can prove that such a set must exist!
Theorem
(Ramsey’s theorem)
.
Whenever we
k
-colour
N
(2)
, there exists an
infinite monochromatic set
X
, i.e. given any map
c
:
N
(2)
→
[
k
], there exists a
subset X ⊆ N such that X is infinite and c|
X
(2)
is a constant function.
Proof.
Pick an arbitrary point
a
1
∈ N
. Then by the pigeonhole principle, there
must exist an infinite set
B
1
⊆ N \ {a
1
}
such that all the
a
1
-
B
1
edges (i.e. edges
of the form (a
1
, b
1
) with b
1
∈ B
1
) are of the same colour c
1
.
Now again arbitrarily pick an element
a
2
∈ B
1
. Again, we find some infinite
B
2
⊆ B
1
such that all
a
2
-
B
2
edges are the same colour
c
2
. We proceed inductively.
a
1
a
2
We obtain a sequence
{a
1
, a
2
, ···}
and a sequence of colours
{c
1
, c
2
, ···}
such
that c(a
i
, a
j
) = c
i
, for i < j.
Now again by the pigeonhole principle, since there are finitely many colours,
there exists an infinite subsequence
c
i
1
, c
i
2
, ···
that is constant. Then
a
i
1
, a
i
2
, ···
is an infinite monochromatic set, since all edges are of the colour
c
i
1
=
c
i
2
=
···
.
So we are done.
This proof exhibits many common themes we will see in the different Ram-
sey theory proofs we will later do. The first is that the proof is highly non-
constructive. Not only does it not give us the infinite monochromatic set; it
doesn’t even tell us what the colour is.
Another feature of the proof is that we did not obtain the infinite monochro-
matic set in one go. Instead, we had to first pass through that intermediate
structure, and then obtain an infinite monochromatic set from that. In future
proofs, we might have to go through many passes, instead of just two.
This theorem looks rather innocuous, but it has many interesting applications.
Corollary
(Bolzano-Weierstrass theorem)
.
Let (
x
i
)
i≥0
be a bounded sequence
of real numbers. Then it has a convergent subsequence.
Proof. We define a colouring c : N
(2)
→ {↑, ↓}, where
c(ij) =
(
↑ x
i
< x
j
↓ x
j
≤ x
i
Then Ramsey’s theorem gives us an infinite monochromatic set, which is the
same as a monotone subsequence. Since this is bounded, it must convergence.
With a very similar technique, we can prove that we can do this for
N
(r)
for
any r, and not just N
(2)
.
Theorem
(Ramsey’s theorem for
r
sets)
.
Whenever
N
(r)
is
k
-coloured, there
exists an infinite monochromatic set, i.e. for any
c
:
N
(r)
→
[
k
], there exists an
infinite X ⊆ N such that c|
X
(r)
is constant.
We can again try some concrete examples:
Example. We define c : N
(3)
→ {red, blue} by
c(ijk) =
(
red i | j + k
blue otherwise
Then X = {2
0
, 2
1
, 2
2
, ···} is an infinite monochromatic set.
Proof.
We induct on
r
. This is trivial when
r
= 1. Assume
r >
1. We fix
a
1
∈ N
.
We induce a k-colouring c
1
of (N \ {a
1
})
(r−1)
by
c
1
(F ) = c(F ∪ {a
1
}).
By induction, there exists an infinite
B
1
⊆ N\{a
1
}
such that
B
1
is monochromatic
for c
1
, i.e. all a
1
-B
1
r-sets have the same colour c
1
.
We proceed inductively as before. We get
a
1
, a
2
, a
3
, ···
and colours
c
1
, c
2
, ···
etc. such that for any
r
-set
F
contained in
{a
1
, a
2
, ···}
, we have
c
(
F
) =
c
min F
.
Then again, there exists
c
i
1
, c
i
2
, c
i
3
, ···
all identical, and our monochromatic
set is {a
i
1
, a
i
2
, a
i
3
, ···}.
Now a natural question to ask is — what happens when we have infinitely
many colours? Clearly an infinite monochromatic subset cannot be guaranteed,
because we can just colour all edges with different colours.
The natural compromise is to ask if we can find an
X
such that either
c|
X
is
monochromatic, or
c|
X
is injective. After a little thought, we realize this is also
impossible. We can construct a colouring on
N
(2)
as follows: we first colour all
edges that involve 1 with the colour 1, then all edges that involve 2 with the
colour 2 etc:
. . .
It is easy to see that we cannot find an infinite monochromatic subset or an
infinite subset with all edges of different colours.
However, this counterexample we came up with still has a high amount of
structure — the colour of the edges are uniquely determined by the first element.
It turns out this is the only missing piece (plus the obvious dual case).
With this, we can answer our previous question:
Theorem
(Canonical Ramsey theorem)
.
For any
c
:
N
(2)
→ N
, there exists an
infinite X ⊆ N such that one of the following hold:
(i) c|
X
(2)
is constant.
(ii) c|
X
(2)
is injective.
(iii) c(ij) = c(k`) iff i = k for all i, j, k, ` ∈ X.
(iv) c(ij) = c(k`) iff j = ` for all i, j, k, ` ∈ X.
Recall that when we write
ij
, we always implicitly assume
i < j
, so that (iii)
and (iv) make sense.
In previous proofs, we only had to go through two passes to get the desired
set. This time we will need more.
Proof. Consider the following colouring of X
(4)
: let c
1
be a 2-colouring
c
1
(ijk`) =
(
SAME c(ij) = c(k`)
DIFF otherwise
Then we know there is some infinite monochromatic set
X
1
⊆ N
for
c
1
. If
X
1
is
coloured
SAME
, then we are done. Indeed, for any pair
ij
and
i
0
j
0
in
X
1
, we can
pick some huge k, ` such that j, j
0
< k < `, and then
c(ij) = c(k`) = c(i
0
j
0
)
as we know c
1
(ijk`) = c
1
(i
0
j
0
k`) = SAME.
What if
X
1
is coloured
DIFF
? We next look at what happens when we have
edges that are nested each other. We define
c
2
:
X
(4)
1
→ {SAME, DIFF}
defined by
c
2
(ijk`) =
(
SAME c(i`) = c(jk)
DIFF otherwise
Again, we can find an infinite monochromatic subset
X
2
⊆ X
1
with respect to
c
2
.
We now note that
X
2
cannot be coloured
SAME
. Indeed, we can just look at
i j
k `
m n
So if X
2
were SAME, we would have
c(`m) = c(in) = c(jk),
which is impossible since X
1
is coloured DIFF under c
1
.
So X
2
is DIFF. Now consider c
3
: X
(4)
2
→ {SAME, DIFF} given by
c
3
(ijk`) =
(
SAME c(ik) = c(j`)
DIFF otherwise
Again find an infinite monochromatic subset
X
3
⊆ X
2
for
c
3
. Then
X
3
cannot
be SAME, this time using the following picture:
contradicting the fact that c
2
is DIFF. So we know X
3
is DIFF.
We have now have ended up in this set
X
3
such that if we have any two pairs
of edges with different end points, then they must be different.
We now want to look at cases where things share a vertex. Consider
c
4
:
X
(3)
3
→ {SAME, DIFF} given by
c
4
(ijk) =
(
SAME c(ij) = c(jk)
DIFF otherwise
Let
X
4
⊆ X
3
be an infinite monochromatic set for
c
4
. Now
X
4
cannot be
coloured SAME, using the following picture:
which contradicts the fact that
c
1
is
DIFF
. So we know
X
4
is also coloured
DIFF
under c
4
.
We are almost there. We need to deal with edges that nest in the sense of
(iii) and (iv). We look at c
5
: X
(3)
4
→ {LSAME, LDIFF} given by
c
5
(ijk) =
(
LSAME c(ij) = c(ik)
LDIFF otherwise
Again we find
X
5
⊆ X
4
, an infinite monochromatic set for
c
5
. We don’t separate
into cases yet, because we know both cases are possible, but move on to classify
the right case as well. Define c
6
: X
(3)
5
→ {RSAME, RDIFF} given by
c
5
(ijk) =
(
RSAME c(ik) = c(jk)
RDIFF otherwise
Let X
6
⊆ X
5
be an infinite monochromatic subset under c
5
.
As before, we can check that it is impossible to get both
LSAME
and
RSAME
,
using the following picture:
contradicting c
4
being DIFF.
Then the remaining cases (
LDIFF, RDIFF
), (
LSAME, RDIFF
) and (
RDIFF, LSAME
)
corresponds to the cases (ii), (iii) and (iv).
Note that we could this theorem in one pass only, instead of six, by considering
a much more complicated colouring (c
1
, c
2
, c
3
, c
4
, c
5
, c
6
) with values in
{SAME, DIFF}
4
× {LSAME, LDIFF} × {RSAME, RDIFF},
but we still have to do the same analysis and it just complicates matters more.
There is a generalization of this to
r
-sets. One way we can rewrite the
theorem is to say that the colour is uniquely determined by some subset of the
vertices. The cases (i), (ii), (iii), (iv) correspond to no vertices, all vertices, first
vertex, and second vertex respectively. Then for
r
-sets, we have 2
r
possibilities,
one for each subset of the r-coordinates.
Theorem
(Higher dimensional canonical Ramsey theorem)
.
Let
c
:
N
(r)
→ N
be a colouring. Then there exists
D ⊆
[
r
] and an infinite subset
X ⊆ N
such
that for all
x, y ∈ X
(r)
, we have
c
(
x
) =
c
(
y
) if
{i
:
x
i
=
y
i
} ⊇ D
, where
x = {x
1
< x
2
< ··· < x
r
} (and similarly for y).
1.2 Finite graphs
We now move on to discuss a finitary version of the above problem. Of course, if
we finitely colour an infinite graph, then we can obtain a finite monochromatic
subgraph of any size we want, because this is just a much weaker version of
Ramsey’s theorem. However, given
n < N
, if we finitely colour a graph of size
N, are we guaranteed to have a monochromatic subgraph of size n?
Before we begin, we note some definitions. Recall again that a graph
G
is a
pair (V, E) where E ⊆ V
(2)
.
Example. The path graph on n vertices P
n
is
We can write
V = [n], E = {12, 23, 34, ··· , (n − 1)n}.
Example. The n-cycle C
n
is
V = [n], E = {12, 23, ··· , (n − 1)n, 1n}.
Finally, we have
Example. The complete graph V
n
on n vertices is
V = [n], E = V
(2)
.
Concretely, the question we want to answer is the following — how big
does our (complete) graph have to be to guarantee a complete monochromatic
subgraph of size n?
In this section, we will usually restrict to 2 colours. Everything we say will
either be trivially be generalizable to more colours, or we have no idea how to
do so. It is an exercise for the reader to figure out which it is.
Definition
(Ramsey number)
.
We let
R
(
n
) =
R
(
K
n
) to be the smallest
N ∈ N
whenever we red-blue colour the edges of
K
N
, then there is a monochromatic
copy of K
n
.
It is not immediately obvious that
R
(
n
) has to exist, i.e. it is a finite number.
It turns out we can easily prove this from the infinite Ramsey theorem.
Theorem (Finite Ramsey theorem). For all n, we have R(n) < ∞.
Proof.
Suppose not. Let
n
be such that
R
(
n
) =
∞
. Then for any
m ≥ n
, there
is a 2-colouring
c
m
of
K
m
= [
m
]
(2)
such that there is no monochromatic set of
size n.
We want to use these colourings to build up a colouring of
N
(2)
with no
monochromatic set of size
n
. We want to say we take the “limit” of these
colourings, but what does this mean? To do so, we need these colourings to be
nested.
By the pigeonhole principle, there exists an infinite set
M
1
⊆ N
and some
fixed 2-colouring d
n
of [n] such that c
m
|
[n]
(2)
= d
n
for all m ∈ M
1
.
Similarly, there exists an infinite
M
2
⊆ M
1
such that
c
m
|
[n+1]
(2)
=
d
n+1
for
m ∈ M
2
, again for some 2-colouring
d
n+1
of [
n
+ 1]. We repeat this over and
over again. Then we get a sequence
d
n
, d
n+1
, ···
of colourings such that
d
i
is a
2-colouring of [
i
]
(2)
without a monochromatic
K
n
, and further for
i < j
, we have
d
j
|
[i]
(2)
= d
i
.
We then define a 2-colouring c of N
(2)
by
c(ij) = d
m
(ij)
for any m > i, j. Clearly, there exists no monochromatic K
n
in c, as any K
n
is
finite. This massively contradicts the infinite Ramsey theorem.
There are other ways of obtaining the finite Ramsey theorem from the infinite
one. For example, we can use the compactness theorem for first order logic.
Proof.
Suppose
R
(
n
) =
∞
. Consider the theory with proposition letters
p
ij
for
each
ij ∈ N
(2)
. We will think of the edge
ij
as red if
p
ij
=
>
, and blue if
p
ij
=
⊥
.
For each subset of
N
of size
n
, we add in the axiom that says that set is not
monochromatic.
Then given any finite subset of the axioms, it mentions only finitely many
subsets of
N
. Suppose it mentions vertices only up to
m ∈ N
. Then by assumption,
there is a 2-colouring of [
m
]
(2)
with no monochromatic subset of size
n
. So by
assigning
p
ij
accordingly (and randomly assigning the remaining ones), we have
found a model of this finite subtheory. Thus every finite fragment of the theory
is consistent. Hence the original theory is consistent. So there is a model, i.e. a
colouring of N
(2)
with no monochromatic subset.
This contradicts the infinite Ramsey theorem.
Similarly, we can do it by compactness of the space
{
1
,
2
}
N
endowed with
metric
d(f, g) =
1
2
n
if n = min{i : f
i
6= g
i
}.
By Tychonoff theorem, we know this is compact, and we can deduce the theorem
from this.
While these theorems save work by reusing the infinite Ramsey theorem, it
is highly non-constructive. It is useless if we want to obtain an actual bound on
R
(
n
). We now go back and see what we did when we proved the infinite Ramsey
theorem.
To prove the infinite Ramsey theory, We randomly picked a point
a
1
. Then
there are some things that are connected by red to it, and others connected by
blue:
a
1
Next we randomly pick a point in one of the red or blue sets, and try to move on
from there. Suppose we landed in the red one. Now note that if we find a blue
K
n
in the red set, then we are done. But on the other hand, we only need a red
K
n−1
, and not a full blown
K
n
. When we moved to this red set, the problem is
no longer symmetric.
Thus, to figure out a good bound, it is natural to consider an asymmetric
version of the problem.
Definition
(Off-diagonal Ramsey number)
.
We define
R
(
n, m
) =
R
(
K
n
, K
m
)
to be the minimum
N ∈ N
such that whenever we red-blue colour the edges of
K
N
, we either get a red K
n
or a blue K
m
.
Clearly we have
R(n, m) ≤ R(max{n, m}).
In particular, they are finite. Once we make this definition, it is easy to find a
bound on R.
Theorem. We have
R(n, m) ≤ R(n − 1, m) + R(n, m − 1).
for all n, m ∈ N. Consequently, we have
R(n, m) ≤
n + m − 1
n − 2
.
Proof. We induct on n + m. It is clear that
R(1, n) = R(n, 1) = 1, R(n, 2) = R(2, n) = n
Now suppose
N ≥ R
(
n −
1
, m
) +
R
(
n, m −
1). Consider any red-blue colouring
of K
N
and any vertex v ∈ V (K
n
). We write
v(K
n
) \ {v} = A ∪ B,
where each vertex
A
is joined by a red edge to
v
, and each vertex in
B
is joined
by a blue edge to v. Then
|A| + |B| ≥ N − 1 ≥ R(n − 1, m) + R(n, m − 1) − 1.
It follows that either
|A| ≥ R
(
n −
1
, m
) or
|B| ≥ R
(
n, m −
1). We wlog it is the
former. Then by definition of
R
(
n −
1
, m
), we know
A
contains either a blue
copy of
K
m
or a red copy of
K
n−1
. In the first case, we are done, and in the
second case, we just add v to the red K
n−1
.
The last formula is just a straightforward property of binomial coefficients.
In particular, we find
R(n) ≤
2n − 1
n − 2
≤
4
n
√
n
.
We genuinely have no idea whether
∼
4
n
is the correct growth rate, i.e. if there
is some
ε
such that
R
(
n
)
≤
(4
− ε
)
n
. However, we do know that for any
c >
0,
we eventually have
R(n) ≤
4
n
n
c
.
That was an upper bound. Do we have a lower bound? In particular, does
R
(
n
)
have to grow exponentially? The answer is yes, and the answer is a very classical
construction of Erd¨os.
Theorem. We have R(n) ≥
√
2
n
for sufficiently large n ∈ N.
The proof is remarkable in that before this was shown, we had no sensible
bound at all. However, the proof is incredibly simple, and revolutionized how
we think about colourings.
Proof.
Let
N ≤
√
2
n
. We perform a red-blue colouring of
K
N
randomly, where
each edge is coloured red independently of the others with probability
1
2
.
We let
X
R
be the number of red copies of
K
n
in such a colouring. Then
since expectation is linear, we know the expected value is
[X
R
] =
N
n
1
2
(
n
2
)
≤
eN
n
n
1
2
(
n
2
)
≤
e
n
n
√
2
n
2
√
2
−n
2
=
e
n
n
<
1
2
for sufficiently large n.
Similarly, we have [
X
B
]
<
1
2
. So the expected number of monochromatic
K
n
is
<
1. So in particular there must be some colouring with no monochromatic
K
n
.
Recall the bound
R(m, n) ≤
m + n − 2
m − 1
.
If we think of m as being fixed, then this tells us
R(m, n) ∼ (n + m)
m−1
.
For example, if m is 3, then we have
R(3, n) ≤
n + 1
2
=
n(n + 1)
2
∼ n
2
.
We can sort-of imagine where this bound came from. Suppose we randomly pick
a vertex
v
1
. Then if it is connected to at least
n
other vertices by a red edge,
then we are done — if there is even one red edge among those
n
things, then we
have a red triangle. Otherwise, all edges are blue, and we’ve found a complete
blue K
n
.
So this is connected to at most
n −
1 things by a red edge. So if our graph
is big enough, we can pick some
v
2
connected to
v
1
by a blue edge, and do the
same thing to
v
2
. We keep going on, and by the time we reach
v
n
, we would
have found
v
1
, ··· , v
n
all connected to each other by blue edges, and we are
done. So we have K(3, n) ∼ n
2
.
v
1
v
2
v
3
But this argument is rather weak, because we didn’t use that large pool of blue
edges we’ve found at v
1
. So in fact this time we can do better than n
2
.
Theorem. We have
R(3, n) ≤
100n
2
log n
for sufficiently large n ∈ N.
Here the 100 is, obviously, just some random big number.
Proof.
Let
N ≥
100
n
2
/ log n
, and consider a red-blue colouring of the edges of
K
N
with no red K
3
. We want to find a blue K
n
in such a colouring.
We may assume that
(i) No vertex v has ≥ n red edges incident to it, as argued just now.
(ii) If we have v
1
, v
2
, v
3
such that v
1
v
2
and v
1
v
3
are red, then v
2
v
3
is blue:
v
1
v
2
v
3
We now let
F = {W : W ⊆ V (K
N
) such that W
(2)
is monochromatic and blue}.
We want to find some
W ∈ F
such that
|W | ≥ n
, i.e. a blue
K
n
. How can we
go about finding this?
Let
W
be a uniformly random member of
F
. We will be done if we can show
that that E[|W |] ≥ n.
We are going to define a bunch of random variables. For each vertex
v ∈
V (K
N
), we define the variable
X
v
= n1
{v∈W }
+ |{u : uv is red and u ∈ W }|.
Claim.
E[X
v
] >
log n
10
for each vertex v.
To see this, let
A = {u : uv is red}
and let
B = {u : uv is blue}.
then from the properties we’ve noted down, we know that
|A| < n
and
A
(2)
is
blue. So we know very well what is happening in
A
, and nothing about what is
in B.
We fix a set
S ⊆ B
such that
S ∈ F
, i.e.
S
(2)
is blue. What can we say about
W if we condition on B ∩ W = S?
a
1
S
B
A
T
Let
T ⊆ A
be the set of vertices that are joined to
S
only by blue edges. Write
|T |
=
x
. Then if
B ∩W
=
S
, then either
W ⊆ S ∪T
, or
W ⊆ S ∪{v}
. So there
are exactly 2
x
+ 1 choices of W . So we know
E[X
v
| W ∩ B = S] ≥
n
2
x
+ 1
+
2
x
2
x
+ 1
(E[|random subset of T|])
=
n
2
x
+ 1
+
2
x−1
x
2
x
+ 1
.
Now if
x <
log n
2
,
then
n
2
x
+ 1
≥
n
√
n + 1
≥
log n
10
for all sufficiently large n. On the other hand, if
x ≥
log n
2
,
then
2
x−1
x
2
x
+ 1
≥
1
4
·
log n
2
≥
log n
10
.
So we are done.
Claim.
X
v∈V
X
v
≤ 2n|W |.
To see this, for each vertex
v
, we know that if
v ∈ W
, then it contributes
n
to the sum via the first term. Also, by our initial observation, we know that
v ∈ W
has at most
n
neighbours. So it contributes at most
n
to the second term
(acting as the “u”).
Finally, we know that
E[|W |] ≥
1
2n
X
E[X
V
] ≥
N
2n
log n
10
≥
100n
2
log n
·
log n
20n
≥ 5n.
Therefore we can always find some W ∈ F such that |W | ≥ n.
Now of course, there is the following obvious generalization of Ramsey
numbers:
Definition
(
R
(
G, H
))
.
Let
G, H
be graphs. Then we define
R
(
G, H
) to be the
smallest
N
such that any red-blue colouring of
K
N
has either a red copy of
G
or a blue copy of H.
Obviously, we have
R
(
G, H
)
≤ R
(
|G|, |H|
). So the natural question is if we
can do better than that.
Exercise. Show that
R(P
n
, P
n
) ≤ 2n.
So sometimes it can be much better.
2 Ramsey theory on the integers
So far, we’ve been talking about what happens when we finitely colour graphs.
What if we k-colour the integers N? What can we say about it?
It is a trivial statement that this colouring has a monochromatic subset, by the
pigeonhole principle. Interesting questions arise when we try to take the additive
structure of
N
into account. So we could ask, can we find a monochromatic
“copy” of N.
One way to make this question concrete is to ask if there is an infinite
monochromatic arithmetic progression.
The answer is easily a “no”! There are only countably many progressions, so
for each arithmetic progression, we pick two things in the progression and colour
them differently.
We can also construct this more concretely. We can colour the first number
red, the next two blue, the next three red etc. then it is easy to see that it
doesn’t have an infinite arithmetic progression.
···
But this is somewhat silly, because there is clearly a significant amount of
structure in the sequence there. It turns out the following is true:
Theorem
(van der Waerden theorem)
.
Let
m, k ∈ N
. Then there is some
N
=
W
(
m, k
) such that whenever [
N
] is
k
-coloured, then there is a monochromatic
arithmetic progression of length n.
The idea is to do induction on
m
. We will be using colourings with much
greater than k colours to deduce the existence of W (m, k).
We can try a toy example first. Let’s try to show that
W
(3
,
2) exists. Suppose
we have three natural numbers:
By the pigeonhole principle, there must be two things that are the same colour,
say
If this is the case, then if we don’t want to have an arithmetic progression of
length 3, then the fifth number must be blue
We now cut the universe into blocks into 5 things. Again by the pigeonhole
principle, there must be two blocks that look the same. Say it’s this block again.
Now we have two sequences, and the point at the end belongs to both of the
two sequences. And no matter what colour it is, we are done.
For
k
= 3, we can still find such a block, but now that third point could be a
third colour, say, green. This will not stop us. We can now look at these big
blocks, and we know that eventually, these big blocks must repeat themselves.
Here we did the case
m
= 2, and we used the pigeonhole principle. When we do
m > 2, we will use van der Waerden theorem for smaller m inductively.
We now come up with names to describe the scenario we had above.
Definition
(Focused progression)
.
We say a collection of arithmetic progressions
A
1
, A
2
, ··· , A
r
of length m with
A
i
= {a
i
, a
i
+ d
i
, ··· , a
i
+ (m − 1)d
i
}
are focused at f if a
i
+ md
i
= f for all 1 ≤ i ≤ r.
Example. {1, 4} and {5, 6} are focused at 7.
Definition
(Colour focused progression)
.
If
A
1
, ··· , A
r
are focused at
f
, and
each
A
i
is monochromatic and no two are the same colour, then we say they are
colour focused at f .
We can now write the proof.
Proof.
We induct on
m
. The result is clearly trivial when
m
= 1, and follows
easily from the pigeonhole principle when m = 2.
Suppose
m >
1, and assume inductively that
W
(
m −
1
, k
0
) exists for any
k
0
∈ N.
Here is the claim we are trying to established:
Claim.
For each
r ≤ k
, there is a natural number
n
such that whenever we
k-colour [n], then either
(i) There exists a monochromatic AP of length m; or
(ii) There are r colour-focused AP’s of length m − 1.
It is clear that this claim implies the theorem, as we can pick
r
=
k
. Then if
there isn’t a monochromatic AP of length
m
, then we look at the colour of the
common focus, and it must be one of the colours of those AP’s.
To prove the claim, we induct on
r
. When
n
= 1, we may take
W
(
m −
1
, k
).
Now suppose
r >
1, and some
n
0
works for
r −
1. With the benefit of hindsight,
we shall show that
n = W (m − 1, k
2n
0
)2n
0
works for r.
We consider any
k
-colouring of [
n
], and suppose it has no monochromatic
AP of length m. We need to find r colour-focused progressions of length n − 1.
We view this
k
-colouring of [
n
] as a
k
2n
0
colouring of blocks of length 2
n
0
, of
which there are W (m − 1, k
2n
0
).
Then by definition of W (m − 1, k
2n
0
), we can find blocks
B
s
, B
s+t
, ··· , B
s+(m−2)t
which are coloured identically. By the inductive hypothesis, we know each
B
s
contains
r −
1 colour-focused AP’s of length
m −
1, say
A
1
, .., A
r−1
with first
terms
a
1
, ··· , a
r
and common difference
d
1
, ··· , d
r−1
, and also their focus
f
,
because the length of B
s
is 2n
0
, not just n
0
.
Since we assumed there is no monochromatic progression of length
n
, we can
assume f has a different colour than the A
i
.
Now consider
A
0
1
, A
0
2
, ··· , A
0
r−1
, where
A
0
i
has first term
a
i
, common difference
d
i
+ 2
n
0
t
, and length
m −
1. This difference sends us to the next block, and then
the next term in the AP. We also pick
A
0
r
to consist of the all the focus of the
blocks B
i
, namely
A
0
r
= {f, f + 2n
0
t, ··· , f + 2n
0
t(m − 2)}
These progressions are monochromatic with distinct colours, and focused at
f + (2n
0
t)(m − 1). So we are done.
This argument, where one looks at the induced colourings of blocks, is called
a product argument.
The bounds we obtain from this argument are, as one would expect, terrible.
We have
W (3, k) ≤ k
.
.
.
k
4k
,
where the tower of k’s has length k − 1.
Now we can generalize in a different way. What can we say about monochro-
matic structures a
k
-colouring of
N
d
? What is the right generalization of van
der Waerden theorem?
To figure out the answer, we need to find the right generalization of arithmetic
progressions.
Definition
(Homothetic copy)
.
Given a finite
S ⊆ N
d
, a homothetic copy of
S
is a set of the form
`S + M,
where `, M ∈ N
d
and ` 6= 0.
Example.
An arithmetic progression of length
m
is just a homothetic copy of
[m] = {1, 2, ··· , m}.
Thus, the theorem we want to say is the following:
Theorem
(Gallai)
.
Whenever
N
d
is
k
-coloured, there exists a monochromatic
(homothetic) copy of S for each finite S ⊆ N
d
.
In order to prove this, we prove the beautiful Hales–Jewett theorem, which
is in some sense the abstract core of the argument we used for van der Waerden
theorem.
We need a bit of set up. As usual, we write
[m]
n
= {(x
1
, ··· , x
n
) : x
i
∈ [m]}.
Here is the important definition:
Definition
(Combinatorial line)
.
A combinatorial line
L ⊆
[
m
]
n
is a set of the
form
{(x
1
, ··· , x
n
n) : x
i
= x
j
for all i, j ∈ I; x
i
= a
i
for all i 6∈ I}
for some fixed non-empty set of coordinates I ⊆ [n] and a
i
∈ [m].
I is called the set of active coordinates.
Example. Here is a line in [3]
2
This line is given by I = {1}, a
2
= 1.
The following shows all the lines we have:
It is easy to see that any line has exactly [
m
] elements. We write
L
−
and
L
+
for the first and last point of the line, i.e. the points where the active coordinates
are all 1 and
m
respectively. It is clear that any line
L
is uniquely determined
by L
−
and its active coordinates.
Example. In [3]
3
, we have the line
L = {(1, 2, 1), (2, 2, 2), (3, 2, 3)}.
This is a line with
I
=
{
1
,
3
}
and
a
2
= 2. The first and last points are (1
,
2
,
1)
and (3, 2, 3).
Then we have the following theorem:
Theorem
(Hales–Jewett theorem)
.
For all
m, k ∈ N
, there exists
N
=
HJ
(
m, k
)
such that whenever [m]
N
is k-coloured, there exists a monochromatic line.
Note that this theorem implies van der Waerden’s theorem easily. The idea
is to embed [
m
]
N
into
N
linearly, so that a monochromatic line in [
m
]
N
gives an
arithmetic progression of length
m
. Explicitly, given a
k
-colouring
c
:
N →
[
k
],
we define c
0
: [m]
N
→ [k] by
c
0
(x
1
, x
2
, ··· , x
N
) = c(x
1
+ x
2
+ ··· + x
N
).
Now a monochromatic line gives us an arithmetic progression of length m. For
example, if the line is
L = {(1, 2, 1), (2, 2, 2), (3, 2, 3)},
then we get the monochromatic progression 4
,
6
,
8 of length 3. In general, the
monochromatic AP defined has d = |I|.
The proof is essentially what we did for van der Waerden theorem. We are
going to build a lot of almost-lines that point at the same vertex, and then no
matter what colour the last vertex is, we are happy.
Definition
(Focused lines)
.
We say lines
L
1
, ··· , L
r
are focused at
f ∈
[
m
]
N
if
L
+
i
= f for all i = 1, ··· , r.
Definition
(Colour focused lines)
.
If
L
1
, ··· , L
r
are focused lines, and
L
i
\{L
+
i
}
is monochromatic for each
i
= 1
, ··· , r
and all these colours are distinct, then
we say L
1
, ··· , L
r
are colour focused at f .
Proof.
We proceed by induction on
m
. This is clearly trivial for
m
= 1, as a line
only has a single point.
Now suppose
m >
1, and that
HJ
(
m −
1
, k
) exists for all
k ∈ N
. As before,
we will prove the following claim:
Claim.
For each 1
≤ r ≤ k
, there is an
n ∈ N
such that in any
k
-colouring of
[m]
n
, either
(i) there exists a monochromatic line; or
(ii) there exists r colour-focused lines.
Again, the result is immediate from the claim, as we just use it for
r
=
k
and
look at the colour of the focus.
The prove this claim, we induct on
r
. If
r
= 1, then picking
n
=
HJ
(
m−
1
, k
)
works, as a single colour-focused line of length
n
is just a monochromatic line of
length n − 1, and [m − 1]
n
⊆ [m]
n
naturally.
Now suppose
r >
1 and
n
works for
r −
1. Then we will show that
n
+
n
0
works, where
n
0
= HJ(m −1, k
m
n
).
Consider a colouring
c
: [
m
]
n+n
0
→
[
k
], and we assume this has no monochromatic
lines.
We think of [
m
]
n+n
0
as [
m
]
n
×
[
m
]
n
0
. So for each point in [
m
]
n
0
, we have
a whole cube [
m
]
n
. Consider a
k
m
n
colouring of [
m
]
n
0
as follows — given any
x ∈
[
m
]
n
0
, we look at the subset of [
m
]
n+n
0
containing the points with last
n
0
coordinates
x
. Then we define the new colouring of [
m
]
n
0
to be the restriction
of c to this [m]
n
, and there are m
n
possibilities.
Now there exists a line
L
such that
L \ L
+
is monochromatic for the new
colouring. This means for all a ∈ [m]
n
and for all b, b
0
∈ L \ L
+
, we have
c(a, b) = c(a, b
0
).
Let
c
00
(
a
) denote this common colour for all
a ∈
[
m
]
n
. this is a
k
-colouring of
[
m
]
n
with no monochromatic line (because
c
doesn’t have any). So by definition
of
n
, there exists lines
L
1
, L
2
, ··· , L
r−1
in [
m
]
n
which are colour-focused at some
f ∈ [m]
n
for c
00
.
In the proof of van der Waerden, we had a progression within each block,
and also how we just between blocks. Here we have the same thing. We have
the lines in [m]
n
, and also the “external” line L.
L
Consider the line L
0
1
, L
0
2
, ··· , L
0
r−1
in [m]
n+n
0
, where
(L
0
i
)
−
= (L
−
i
, L
−
),
and the active coordinate set is
I
i
∪ I
, where
I
is the active coordinate set of
L
.
Also consider the line F with F
−
= (f, L
−
) and active coordinate set I.
Then we see that
L
0
1
, ··· , L
0
r−1
, F
are
r
colour-focused lines with focus
(f, L
+
).
We can now prove our generalized van der Waerden.
Theorem
(Gallai)
.
Whenever
N
d
is
k
-coloured, there exists a monochromatic
(homothetic) copy of S for each finite S ⊆ N
d
.
Proof.
Let
S
=
{S
(1)
, S
(2)
, ··· , S
(
m
)
} ⊆ N
d
. Given a
k
-colouring
N
d
→
[
k
], we
induce a k-colouring c : [m]
N
→ [k] by
c
0
(x
1
, ··· , x
N
) = c(S(x
1
) + S(x
2
) + ··· + S(x
N
)).
By Hales-Jewett, for sufficiently large
N
, there exists a monochromatic line
L
for
c
0
, which gives us a monochromatic homothetic copy of
S
in
N
d
. For example, if
the line is (1 2 1), (2 2 2) and (3 2 3), then we know
c(S(1) + S(2) + S(1)) = c(S(2) + S(2) + S(2)) = c(S(3) + S(2) + S(3)).
So we have the monochromatic homothetic copy
λS
+
µ
, where
λ
= 2 (the
number of active coordinates), and µ = S(2).
Largeness in Ramsey theory*
In van der Waerden, we proved that for each
k, m
, there is some
n
such that
whenever we
k
-colour [
n
], then there is a monochromatic AP of length
m
. How
much of this is relies on the underlying set being [
n
]? Or is it just that if we
finitely colour [
n
], then one of the colours must contain a lot of numbers, and if
we have a lot of numbers, then we are guaranteed to have a monochromatic AP?
Of course, by “contains a lot of numbers”, we do not mean the actual number
of numbers we have. It is certainly not true that for some
n
, whenever we
k
-colour any set of
n
integers, we must have a monochromatic
m
-AP, because
an arbitrary set of
n
integers need not even contain an
m
-AP at all, let alone a
monochromatic one. Thus, what we really mean is density.
Definition (Density). For A ⊆ N, we let the density of A as
¯
d(A) = lim sup
(b−a)→∞
A ∩ [a, b]
|b − a|
.
Clearly, in any finite
k
-colouring of
N
, there exists a colour class with positive
density. Thus, we want to know if merely a positive density implies the existence
of progressions. Remarkably, the answer is yes!
Theorem
(Szemer´edi theorem)
.
Let
δ >
0 and
m ∈ N
. Then there exists some
N
=
S
(
m, δ
)
∈ N
such that any subset
A ⊆
[
N
] with
|A| ≥ δN
contains an
m-term arithmetic progression.
The proof of this theorem is usually the subject of an entire lecture course,
so we are not going to attempt to prove this. Even the particular case
m
= 3 is
very hard.
This theorem has a lot of very nice Ramsey consequences. In the case of
graph colourings, we asked ourselves what happens if we colour a graph with
infinitely many colours. Suppose we now have a colouring
c
:
N → N
. Can we
still find a monochromatic progression of length
m
? Of course not, because
c
can be injective.
Theorem. For any c : N → N, there exists a m-AP on which either
(i) c is constant; or
(ii) c is injective.
It is also possible to prove this directly, but it is easy with Szemer´edi.
Proof. We set
δ =
1
m
2
(m + 1)
2
.
We let N = S(m, δ). We write
[N] = A
1
∪ A
2
∪ ··· ∪ A
k
,
where the
A
i
’s are the colour-classes of
c|
[N]
. By choice of
N
, we are done if
|A
i
| ≥ δN for some 1 ≤ i ≤ k. So suppose not.
Let’s try to count the number of arithmetic progressions in [
N
]. There are
more than
N
2
/
(
m
+ 1)
2
of these, as we can take any
a, d ∈
[
N/m
+ 1]. We want
to show that there is an AP that hits each A
i
at most once.
So, fixing an
i
, how many AP’s are there that hit
A
i
at least twice? We need
to pick two terms in
A
i
, and also decide which two terms in the progression they
are in, e.g. they can be the first and second term, or the 5th and 17th term. So
there are at most m
2
|A
i
|
2
terms.
So the number of AP’s on which c is injective is greater than
N
2
(m + 1)
2
− k|A
i
|
2
m
2
≥
N
2
(m + 1)
2
−
1
δ
(δN)
2
m
2
=
N
2
(m + 1)
2
− δN
2
m
2
≥ 0.
So we are done. Here the first inequality follows from the fact that
P
|A
i
|
= [
N
]
and each |A
i
| < δN.
Our next theorem will mix arithmetic and graph-theoretic properties. Con-
sider a colouring
c
:
N
(2)
→
[2]. As before, we say a set
X
is monochromatic if
c|
X
(2)
is constant. Now we want to try to find a monochromatic set with some
arithmetic properties.
The first obvious question to ask is — can we find a monochromatic 3-term
arithmetic progression? The answer is no. For example, we can define
c
(
ij
) to be
the parity of largest power of 2 dividing
j −i
, and then there is no monochromatic
3-term arithmetic progression.
What if we make some concessions — can we find a blue 10-AP, or if not,
find an infinite red set? The answer is again no. This construction is slightly
more involved. To construct a counterexample, we can make progressively larger
red cliques, and take all other edges blue. If we double the size of the red cliques
every time, it is not hard to check that there is no blue 4-AP, and no infinite
red set.
···
What if we further relax our condition, and only require an arbitrarily large red
set?
Theorem. For any c : N
(2)
→ {red, blue}, either
(i) There exists a blue m-AP for each m ∈ N; or
(ii) There exists arbitrarily large red sets.
Proof.
Suppose we can’t find a blue
m
-AP for some fixed
m
. We induct on
r
,
and try to find a red set of size r.
Say
A ⊆ N
is a progression of length
N
. Since
A
has no blue
m
-term
progression, so it must contain many red edges. Indeed, each
m
-AP in
A
must
contain a red edge. Also each edge specifies two points, and this can be extended
to an
m
-term progression in at most
m
2
ways. Since there are
N
2
/
(
m
+ 1)
2
. So
there are at least
N
2
m
2
(m + 1)
2
red edges. With the benefit of hindsight, we set
δ =
1
2m
2
(m + 1)
2
.
The idea is that since we have lots of red edges, we can try to find a point with
a lot of red edges connected to it, and we hope to find a progression in it.
We say X, Y ⊆ N form an (r, k)-structure if
(i) They are disjoint
(ii) X is red;
(iii) Y is an arithmetic progression;
(iv) All X-Y edges are red;
(v) |X| = r and |Y | = k.
···
We show by induction that there is an (r, k)-structure for each r and k.
A (1
, k
) structure is just a vertex connected by red edges to a
k
-point structure.
If we take
N
=
S
(
δ, k
), we know among the first
N
natural numbers, there are
at least
N
2
/
(
m
2
(
m
+ 1)
2
) red edges inside [
N
]. So in particular, some
v ∈
[
N
]
has at least
δN
red neighbours in [
N
], and so we know
v
is connected to some
k-AP by red edges. That’s the base case done.
Now suppose we can find an (r − 1, k
0
)-structure for all k
0
∈ N. We set
k
0
= S
1
2m
2
(m + 1)
2
, k
.
We let (
X, Y
) be an (
r −
1
, k
0
)-structure. As before, we can find
v ∈ Y
such
that
v
has
δ|Y |
red neighbours in
Y
. Then we can find a progression
Y
0
of
length
k
in the red neighbourhood of
v
, and we are done, as (
X ∪ {v}, Y
0
) is an
(
r, k
)-structure, and an “arithmetic progression” within an arithmetic progression
is still an arithmetic progression.
Before we end this chapter, we make a quick remark. Everything we looked
for in this chapter involved the additive structure of the naturals. What about
the multiplicative structure? For example, given a finite colouring of
N
, can we
find a monochromatic geometric progression? The answer is trivially yes. We
can look at
{
2
x
:
x ∈ N}
, and then multiplication inside this set just looks like
addition in the naturals.
But what if we want to mix additive and multiplicative structures? For
example, can we always find a monochromatic set of the form
{x
+
y, xy}
? Of
course, there is the trivial answer
x
=
y
= 2, but is there any other? This
question was answered positively in 2016! We will return to this at the end of
the course.
3 Partition Regularity
In the previous chapter, the problems we studied were mostly “linear” in nature.
We had some linear system, namely that encoding the fact that a sequence is an
AP, and we wanted to know if it had a monochromatic solution. More generally,
we can define the following:
Definition
(Partition regularity)
.
We say an
m ×n
matrix
A
over
Q
is partition
regular if whenever
N
is finitely coloured, there exists an
x ∈ N
n
such that
Ax = 0 and x is monochromatic, i.e. all coordinates of x have the same colour.
Recall that N does not include 0.
There is no loss in generality by assuming
A
in fact has entries in
Z
, by
scaling
A
, but sometimes it is (notationally) convenient to consider cases where
the entries are rational.
The question of the chapter is the following — when is a matrix partition
regular? We begin by looking at some examples.
Example
(Schur’s theorem)
.
Schur’s theorem says whenever
N
is finitely
coloured, there exists a monochromatic set of the form
{x, y, x
+
y}
. In other
words the matrix
1 1 −1
is partition regular, since
1 1 −1
x
y
z
= 0 ⇐⇒ z = x + y.
Example.
How about
2 3 −5
. This is partition regular, because we can
pick any x, and we have
2 3 −5
x
x
x
= 0.
This is a trivial example.
How about van der Waerden’s theorem?
Example.
Van der Waerden’s theorem says there is a monochromatic 3-AP
{x
1
, x
2
, x
3
}
whenever
N
is finitely-coloured. We know
x
1
, x
2
, x
3
forms a 3-AP iff
x
3
− x
2
= x
2
− x
1
,
or equivalently
x
3
+ x
1
= 2x
2
.
This implies that
1 −2 1
is partition regular. But this is actually not a
very interesting thing to say, because
x
1
=
x
2
=
x
3
is always a solution to this
equation. So this falls into the previous “trivial” case.
On the second example sheet we saw a stronger version of van der Waerden.
It says we can always find a monochromatic set of the form
{d, a, a + d, a + 2d, ··· , a + md}.
By including this variable, we can write down the property of being a progression
in a non-trivial format by
1 1 0 −1
2 1 −1 0
d
a
x
2
x
3
= 0
This obviously generalizes to an arbitrary m-AP, with matrix
1 1 −1 0 0 ··· 0
1 2 0 −1 0 ··· 0
1 3 0 0 −1 ··· 0
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1 m 0 0 0 ··· −1
.
We’ve seen three examples of partition regular matrices. Of course, not every
matrix is partition regular. The matrix
1 1
is not partition regular, for the
silly reason that two positive things cannot add up to zero.
Let’s now look at some non-trivial first matrix that is not partition regular.
Example.
The matrix
2 −1
is not partition regular, since we can put
c
(
x
) = (
−
1)
n
, where
n
is the maximum integer such that 2
n
| x
. Then
{x,
2
x}
is
never monochromatic.
A similar argument shows that if
λ ∈ Q
is such that
λ, −1
is partition
regular, then λ = 1.
But if we write down a random matrix, say
2 3 −6
? The goal of this
chapter is to find a complete characterization of matrices that are partition
regular.
Definition (Columns property). Let
A =
↑ ↑ ↑
c
(1)
c
(2)
··· c
(n)
↓ ↓ ↓
.
We say
A
has the columns property if there is a partition [
n
] =
B
1
∪B
2
∪···∪B
d
such that
X
i∈B
s
c
(i)
∈ span{c
(i)
: c
(i)
∈ B
1
∪ ··· ∪ B
s−1
}
for s = 1, ··· , d. In particular,
X
i∈B
1
c
(i)
= 0
What does this mean? Let’s look at the matrices we’ve seen so far.
Example.
1 1 −1
has the columns property by picking
B
1
=
{
1
,
3
}
and
B
2
= {2}.
Example.
2 3 −5
has the columns property by picking B
1
= {1, 2, 3}.
Example. The matrix
1 1 −1 0 0 ··· 0
1 2 0 −1 0 ··· 0
1 3 0 0 −1 ··· 0
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1 m 0 0 0 ··· −1
has the column property. Indeed, we take
B
1
=
{
1
,
3
, ··· , m
+ 2
}
, and since
{c
(3)
, ··· , c
(m+2)
} spans all of R
m
, we know picking B
2
= {2} works.
Example.
` −1
has the columns property iff
`
= 1. In particular,
1 1
does not have a columns property.
Given these examples, it is natural to conjecture the following:
Theorem
(Rado’s theorem)
.
A matrix
A
is partition regular iff it has the
column property.
This is a remarkable theorem! The property of being partition regular
involves a lot of quantifiers, over infinitely many things, but the column property
is entirely finite, and we can get a computer to check it for us easily.
Another remarkable property of this theorem is that neither direction is obvi-
ous! It turns out partition regularity implies the column property is (marginally)
easier, because if we know something is partition regular, then we can try to
cook up some interesting colourings and see what happens. The other direction
is harder.
To get a feel of the result, we will first prove it in the case of a single equation.
The columns property in particular becomes much simpler. It just means that
there exists a non-empty subset of the non-zero a
i
’s that sums up to zero.
Theorem.
If
a
1
, ··· , a
n
∈ Q \ {
0
}
, then
a
1
··· a
n
is partition regular iff
there exists a non-empty I ⊆ [n] such that
X
i∈I
a
i
= 0.
For a fixed prime
p
, we let
d
(
x
) denote the last non-zero digit of
x
in base
p
,
i.e. if
x = d
n
p
n
+ d
n−1
p
n−1
+ ··· + d
0
,
then
L(x) = min{i : d
i
6= 0}
and d(x) = d
L(x)
. We now prove the easy direction of the theorem.
Proposition.
If
a
2
, ··· , a
n
∈ Q\{
0
}
and
a
1
a
2
··· a
n
is partition regular,
then
X
i∈I
a
i
= 0
for some non-empty I.
Proof. We wlog a
i
∈ Z, by scaling. Fix a suitably large prime
p >
n
X
i=1
|a
i
|,
and consider the (
p −
1)-colouring of
N
where
x
is coloured
d
(
x
). We find
x
1
, ··· , x
n
such that
X
a
i
x
i
= 0.
and
d
(
x
i
) =
d
for some
d ∈ {
1
, ··· , p −
1
}
. We write out everything in base
p
,
and let
L = min{L(x
i
) : 1 ≤ i ≤ n},
and set
I = {i : L(x
i
) = L}.
Then for all i ∈ I, we have
x
i
≡ d (mod p
L+1
).
On the other hand, we are given that
X
a
i
x
i
= 0.
Taking mod p
L+1
gives us
X
i∈I
a
i
d = 0 (mod p
L+1
).
Since d is invertible, we know
X
i∈I
a
i
= 0 (mod p
L+1
).
But by our choice of p, this implies that
P
i∈I
a
i
= 0.
Here we knew what prime
p
to pick ahead. If we didn’t, then we could
consider all primes p, and for each p we find I
p
⊆ [n] such that
X
i∈I
p
a
i
= 0 (mod p).
Then some
I
p
has to occur infinitely often, and then we are done. Note that this
is merely about the fact that if a fixed number is 0 mod
n
for arbitrarily large
n
,
then it must be zero. This is not some deep local-global correspondence number
theoretic fact.
It turns out this is essentially the only way we know for proving this theorem.
One possible variation is to use the “first non-zero digit” to do the colouring,
but this is harder.
Let’s now try and do the other direction. Before we do that, we start by
doing a warm up. Last time, we proved that if we had
1 λ
, then this is
partition regular iff λ = −1. We now prove a three element version.
Proposition. The equation
1 λ −1
is partition regular for all λ ∈ Q.
Proof.
We may wlog
λ >
0. If
λ
= 0, then this is trivial, and if
λ <
0, then we
can multiply the whole equation by −1.
Say
λ =
r
s
.
The idea is to try to find solutions of this in long arithmetic progressions.
Suppose N is k-coloured. We let
{a, a + d, ··· , a + nd}
be a monochromatic AP, for n sufficiently large.
If sd were the same colour as this AP, then we’d be done, as we can take
x = a, y = sd, z = a +
r
s
sd.
In fact, if any of
sd,
2
sd, ··· ,
n
r
sd
have the same colour as the AP, then we’d
be done by taking
x = a, y = isd, z = a +
r
s
isd = a + ird ≤ a + nd.
If this were not the case, then
{sd,
2
sd, ··· ,
n
r
sd}
is (
k −
1)-coloured, and this
is just a scaled up copy of N. So we are done by induction on k.
Note that when
λ
= 1, we have
1 1 −1
is partition regular, and this
may be proved by Ramsey’s theorem. Can we prove this more general result by
Ramsey’s theorem as well? The answer is, we don’t know.
It turns out this is not just a warm up, but the main ingredient of what we
are going to do.
Theorem.
If
a
1
, ··· , a
n
∈ Q
, then
a
1
··· a
n
is partition regular iff there
exists a non-empty I ⊆ [n] such that
X
i∈I
a
i
= 0.
Proof.
One direction was done. To do the other direction, we recall that we had
a really easy case of, say,
2 3 −5
,
because we can just make all the variables the same?
In the general case, we can’t quite do this, but we may try to solve this
equation with the least number of variables possible. In fact, we shall find some
monochromatic x, y, z, and then assign each of x
1
, ··· , x
n
to be one of x, y, z.
We know
X
i∈I
a
i
= 0.
We now partition I into two pieces. We fix i
0
∈ I, and set
x
i
=
x i = i
0
y i ∈ I \ {i
0
}
z i 6∈ I
.
We would be done if whenever we finitely colour
N
, we can find monochromatic
x, y, z such that
a
i
0
x +
X
i∈I\{i
0
}
a
i
z +
X
i6∈I
a
i
y = 0.
But, since
X
i∈I
a
i
= 0,
this is equivalent to
a
i
0
x − a
i
0
z + (something)y = 0.
Since all these coefficients were non-zero, we can divide out by
a
i
0
, and we are
done by the previous case.
Note that our proof of the theorem shows that if an equation is partition
regular for all last-digit base
p
colourings, then it is partition regular for all
colourings. This might sound like an easier thing to prove that the full-blown
Rado’s theorem, but it turns out the only proof we have for this implication is
Rado’s theorem.
We now prove the general case. We first do the easy direction, because it is
largely the same as the single-equation case.
Proposition.
If
A
is an
m × n
matrix with rational entries which is partition
regular, then A has the columns property.
Proof.
We again wlog all entries of
A
are integers. Let the columns of
A
be
c
(1)
, ··· , c
(n)
. Given a prime
p
, we consider the (
p −
1)-colouring of
N
, where
x
is coloured
d
(
x
), the last non-zero digit in the base
p
expansion. Since
A
is
partition regular, we obtain a monochromatic solution.
We then get a monochromatic x
1
, ··· , x
n
such that Ax = 0, i.e.
X
x
i
c
(i)
= 0.
Any such solution with colour
d
induces a partition of [
n
] =
B
1
∪ B
2
∪ ··· ∪ B
r
,
where
– For all i, j ∈ B
s
, we have L(x
i
) = L(x
j
); and
– For all s < t and i ∈ B
s
, j ∈ B
t
, the L(x
i
) < L(x
j
).
Last time, with the benefit of hindsight, we were able to choose some large prime
p
that made the argument work. So we use the trick we mentioned after the
proof last time.
Since there are finitely many possible partitions of [
n
], we may assume that
this particular partition is generated by infinitely many primes
p
. Call these
primes
P
. We introduce some notation. We say two vectors
u, v ∈ Z
m
satisfy
u ≡ v (mod p) if u
i
≡ v
i
(mod p) for all i = 1, ··· , m.
Now we know that
X
x
i
c
(i)
= 0.
Looking at the first non-zero digit in the base p expansion, we have
X
i∈B
1
dc
(i)
≡ 0 (mod p).
From this, we conclude that, by multiplying by d
−1
X
i∈B
1
c
(i)
≡ 0 (mod p),
for all p ∈ P. So we deduce that
X
i∈B
1
c
(i)
= 0.
Similarly, for higher s, we find that for each base p colouring, we have
X
i∈B
s
p
t
dc
(i)
+
X
i∈B
1
∪...∪B
s
x
i
c
(i)
≡ 0 (mod p
t+1
)
for all s ≥ 2, and some t dependent on s and p. Multiplying by d
−1
, we find
X
i∈B
s
p
t
c
(i)
+
X
i∈B
1
∪...∪B
s−1
(d
−1
x
i
)c
(i)
≡ 0 (mod p
t+1
). (∗)
We claim that this implies
X
i∈B
s
c
(i)
∈ hc
(i)
: i ∈ B
1
∪ ··· ∪ B
s−1
i.
This is not exactly immediate, because the values of
x
i
in (
∗
) may change as we
change our p. But it is still some easy linear algebra.
Suppose this were not true. Since we are living in a Euclidean space, we have
an inner product, and we can find some v ∈ Z
m
such that
hv, c
(i)
i = 0 for all i ∈ B
1
∪ ··· ∪ B
s−1
,
and
*
v,
X
i∈B
s
c
(i)
+
6= 0.
But then, taking inner products with gives
p
t
*
v,
X
i∈B
s
c
(i)
+
≡ 0 (mod p
t+1
).
Equivalently, we have
*
v,
X
i∈B
s
c
(i)
+
≡ 0 (mod p),
but this is a contradiction. So we showed that
A
has the columns property with
respect to the partition [n] = B
1
∪ ··· ∪ B
r
.
We now prove the hard direction. We want an analogous gadget to our
1 λ −1
we had for the single-equation case. The definition will seem rather
mysterious, but it turns out to be what we want, and its purpose becomes more
clear as we look at some examples.
Definition
((
m, p, c
)-set)
.
For
m, p, c ∈ N
, a set
S ⊆ N
is an (
m, p, c
)-set with
generators x
1
, ··· , x
m
if S has the form
S =
m
X
i=0
λ
i
x
i
:
λ
i
= 0 for all i < j
λ
j
= c
λ
i
∈ [−p, p]
.
In other words, we have
S =
m
[
j=1
{cx
j
+ λ
j+1
x
j+1
+ ··· + λ
m
x
m
: λ
i
∈ [−p, p]}.
For each
j
, the set
{cx
j
+
λ
j+1
x
j+1
+
···
+
λ
m
x
m
:
λ
i
∈
[
−p, p
]
}
is called a row
of S.
Example. What does a (2, p, 1) set look like? It has the form
{x
1
− px
2
, x
1
− (p − 1)x
2
, ··· , x
1
+ px
2
} ∪ {x
2
}.
In other words, this is just an arithmetic progression with its common difference.
Example. A (2, p, 3)-set has the form
{3x
1
− px
2
, ··· , 3x
1
, ··· , 3x
1
= px
2
} ∪ {3x
2
}.
The idea of an (
m, p, c
) set is that we “iterate” this process many times, and
so an (
m, p, c
)-set is an “iterated AP’s and various patterns of their common
differences”.
Our proof strategy is to show that that whenever we finitely-colour
N
, we can
always find an (
m, p, c
)-set, and given any matrix
A
with the columns property
and any (
m, p, c
)-set (for suitable
m
,
p
,
c
), there will always be a solution in
there.
Proposition.
Let
m, p, c ∈ N
. Then whenever
N
is finitely coloured, there
exists a monochromatic (m, p, c)-set.
Proof.
It suffices to find an (
m, p, c
)-set all of whose rows are monochromatic,
since when
N
is
k
-coloured, and (
m
0
, p, c
)-set with
m
0
=
mk
+ 1 has
m
monochro-
matic rows of the same colour by pigeonhole, and these rows contain a monochro-
matic (m, p, c)-set, by restricting to the elements where a lot of the λ
i
are zero.
In this proof, whenever we say (
m, p, c
)-set, we mean one all of whose rows are
monochromatic.
We will prove this by induction. We have a
k
-colouring of [
n
], where
n
is
very very very large. This contains a k-colouring of
B =
n
c, 2c, ··· ,
j
n
c
k
c
o
.
Since
c
is fixed, we can pick this so that
n
c
is large. By van der Waerden, we
find some set monochromatic
A = {cx
1
− Nd, cx
1
− (N − 1)d, ··· , cx
1
+ Nd} ⊆ B,
with
N
very very large. Since each element is a multiple of
c
by assumption,
we know that
c | d
. By induction, we may find an (
m −
1
, p, c
)-set in the set
{d,
2
d, ··· , M d}
, where
M
is large. We are now done by the (
m, p, c
) set on
generators x
1
, ··· , x
n
, provided
cx
1
+
m
X
i=2
λ
i
x
i
∈ A
for all
λ
i
∈
[
−p, p
], which is easily seen to be the case, provided
N ≥
(
m −
1)pM.
Note that the argument itself is quite similar to that in the
1 λ −1
case.
Recall that Schur’s theorem said that whenever we finitely-colour
N
, we can
find a monochromatic
{x, y, x
+
y}
. More generally, for
x
1
, x
2
, ··· , x
m
∈ N
, we
let
F S(x
1
, ··· , x
m
) =
(
X
i∈I
x
i
: I ⊆ [n], I 6= ∅
)
.
The existence of a monochromatic (m, 1, 1)-sets gives us
Corollary
(Finite sum theorem)
.
For every fixed
m
, whenever we finitely-colour
N, there exists x
1
, ··· , x
m
such that F S(x
1
, ··· , x
m
) is monochromatic.
This is since an (
m,
1
,
1)-set contains more things than
F S
(
x
1
, ··· , x
m
). This
was discovered independently by a bunch of different people, including Folkman,
Rado and Sanders.
Similarly, if we let
F P (x
1
, ··· , x
m
) =
(
Y
i∈I
x
i
: I ⊆ [n], I 6= ∅
)
,
then we can find these as well. For example, we can restrict attention to
{
2
n
:
n ∈ N}
, and use the finite sum theorem. This is the same idea as we had
when we used van der Waerden to find geometric progressions.
But what if we want both? Can we have
F S
(
x
1
, ··· , x
m
)
∪ F P
(
x
1
, ··· , x
m
)
in the same colour class? The answer is actually not known! Even the case
when
m
= 2, i.e. finding a monochromatic set of the form
{x, y, x
+
y, xy}
is
open. Until mid 2016, we did not know if we can find
{x
+
y, xy}
monochromatic
(x, y > 2).
To finish he proof of Rado’s theorem, we need the following proposition:
Proposition.
If
A
is a rational matrix with the columns property, then there
is some
m, p, c ∈ N
such that
Ax
= 0 has a solution inside any (
m, p, c
) set, i.e.
all entries of the solution lie in the (m, p, c) set.
In the case of a single equation, we reduced the general problem to the case
of 3 variables only. Here we are going to do something similar — we will use the
columns property to reduce the solution to something much smaller.
Proof. We again write
A =
↑ ↑ ↑
c
(1)
c
(2)
··· c
(n)
↓ ↓ ↓
.
Re-ordering the columns of A if necessary, we assume that we have
[n] = B
1
∪ ··· ∪ B
r
such that max(B
s
) < min(B
s+1
) for all s, and we have
X
i∈B
s
c
(i)
=
X
i∈B
1
∪...∪B
s−1
q
is
c
(i)
for some
q
is
∈ Q
. These
q
is
only depend on the matrix. In other words, we have
X
d
is
c
(i)
= 0,
where
d
is
=
−q
is
i ∈ B
1
∪ ···B
s−1
1 i ∈ B
s
0 otherwise
For a fixed
s
, if we scan these coefficients starting from
i
=
n
and then keep
decreasing
i
, then the first non-zero coefficient we see is 1, which is good, because
it looks like what we see in an (m, p, c) set.
Now we try to write down a general solution with
r
many free variables.
Given x
1
, ··· , x
r
∈ N
r
, we look at
y
i
=
r
X
s=1
d
is
x
s
.
It is easy to check that Ay = 0 since
X
y
i
c
(i)
=
X
i
X
s
d
is
x
s
c
(i)
=
X
s
x
s
X
i
d
is
c
(i)
= 0.
Now take
m
=
r
, and pick
c
large enough such that
cd
is
∈ Z
for all
i, s
, and
finally, p = max{cd
is
: i, s ∈ Q}.
Thus, if we consider the (
m, p, c
)-set on generators (
x
m
, ··· , x
1
) and
y
i
as
defined above, then we have
Ay
= 0 and hence
A
(
cy
) = 0. Since
cy
is integral,
and lies in the (m, p, c) set, we are done!
We have thus proved Rado’s theorem.
Theorem
(Rado’s theorem)
.
A matrix
A
is partition regular iff it has the
column property.
So we have a complete characterization of all partition regular matrices.
Note that Rado’s theorem reduces Schur’s theorem, van der Waerden’s
theorem, finite sums theorem etc. to just checking if certain matrices have the
columns property, which are rather straightforward computations.
More interestingly, we can prove some less obvious “theoretical” results.
Corollary
(Consistency theorem)
.
If
A
,
B
are partition regular in independent
variables, then
A 0
0 B
is partition regular. In other words, we can solve
Ax
= 0 and
By
= 0 simultane-
ously in the same colour class.
Proof. The matrix
A 0
0 B
has the columns property if A and B do.
In fact, much much more is true.
Corollary.
Whenever
N
is finitely-coloured, one colour class contains solutions
to all partition regular systems!
Proof.
Suppose not. Then we have
N
=
D
1
∪ ··· ∪ D
k
such that for each
D
i
,
there is some partition regular matrix
A
i
such that we cannot solve
A
i
x
= 0
inside
D
i
. But this contradicts the fact that
diag
(
A
1
, A
2
, ··· , A
k
) is partition
regular (by applying consistency theorem many times).
Where did this whole idea of the (
m, p, c
)-sets come from? The original proof
by Rado didn’t use (
m, p, c
)-sets, and this idea only appeared only a while later,
when we tried to prove a more general result.
In general, we call a set
D ⊆ N
partition regular if we can solve any partition
regular system in
D
. Then we know that
N,
2
N
are partition regular sets, but
2
N
+ 1 is not (because we can’t solve
x
+
y
=
z
, say). Then what Rado’s theorem
says is that whenever we finitely partition
N
, then one piece of
N
is partition
regular.
In the 1930’s, Rado conjectured that there is nothing special about
N
to
begin with — whenever we break up a partition regular set, then one of the
pieces is partition regular. This conjecture was proved by Deuber in the 1970s
who introduced the idea of the (m, p, c)-set.
It is not hard to check that
D
is partition regular iff
D
contains an (
m, p, c
)
set of each size. Then Deuber’s proof involves showing that for all
m, p, c, k ∈ N
,
there exists
n, q, d ∈ N
such that any
k
-colouring of an (
n, q, d
)-set contains
a monochromatic (
m, p, c
) set. The proof is quite similar to how we found
(
m, p, c
)-sets in the naturals, but instead of using van der Waerden theorem, we
need the Hales–Jewett theorem.
We end by mentioning an open problem in this area. Suppose
A
is an
m ×n
matrix
A
that is not partition regular. That is, there is some
k
-colouring of
N
with no solution to Ax = 0 in a colour class. Can we find some bound f(m, n),
such that every such
A
has a “bad” colouring with
k < f
(
m, n
)? This is an open
problem, first conjectured by Rado, and we think the answer is yes.
What do we actually know about this problem? The answer is trivially yes
for
f
(1
,
2), as there aren’t many matrices of size 1
×
2, up to rescaling. It is a
non-trivial theorem that
f
(1
,
3) exists, and in fact
f
(1
,
3)
≤
24. We don’t know
anything more than that.
4 Topological Dynamics in Ramsey Theory
Recall the finite sum theorem — for any fixed
k
, whenever
N
is finitely coloured,
we can find
x
1
, ··· , x
k
such that
F S
(
x
1
, x
2
, ··· , x
k
) is monochromatic. One
natural generalization is to ask the following question — can we find an infinite
set A such that
F S(A) =
(
X
x∈B
x : B ⊆ A finite non-empty
)
is monochromatic? The first thought upon hearing this question is probably
either “this is so strong that there is no way it can be true”, or “we can deduce
it easily, perhaps via some compactness argument, from the finite sum theorem”.
Both of these thoughts are wrong. It is true that it is always possible to find
these x
1
, x
2
, ···, but the proof is hard. This is Hindman’s theorem.
The way we are going to approach this is via topological dynamics, which
is the title of this chapter. The idea is we construct a metric space (and in
particular a dynamical system) out of the Ramsey-theoretic problem we are
interested in, and then convert the Ramsey-theoretic problem into a question
about topological properties of the space we are constructed.
In the case of Hindman’s theorem, it turns out the “topological” form is a
general topological fact, which we can prove for a general dynamical system.
What is the advantage of this approach? When we construct the dynamical
system we are interested in, what we get is highly “non-geometric”. It would
be very difficult to think of it as an actual “space”. It is just something that
happens to satisfy the definition of a metric space. However, when we prove the
general topological facts, we will think of our metric space as a genuine space
like
R
n
, and this makes it much easier to visualize what is going on in the proofs.
A less obvious benefit of this approach is that we will apply Tychonoff’s
theorem many times, and also use the fact that the possibly infinite intersection
of nested compact sets is open. If we don’t formulate our problem in topological
terms, then we might find ourselves re-proving these repeatedly, which further
complicates the proof.
We now begin our journey.
Definition
(Dynamical system)
.
A dynamical system is a pair (
X, T
) where
X
is the compact metric space and T is a homeomorphism on X.
Example. (T = R/Z, T
α
(x) = x + α) for α ∈ R is a dynamical system.
In this course, this is not the dynamical system we are interested in. In fact,
we are only ever going to be interested in one dynamical system.
We fix number of colours
k ∈ N
. Instead of looking at a particular colouring,
we consider the space of all k-colourings. We let
C = {c : Z → [k]}
∼
=
[k]
Z
.
We endow this with the product topology. Since [
k
] has the discrete topology,
our basic open sets have the form
U = {c : Z → [k] : c(i
1
) = c
1
, ··· , c(i
s
) = c
s
}
for some i
1
, ··· , i
s
∈ Z and c
1
, ··· , c
s
∈ [k].
Since [
k
] is trivially compact, by Tychonoff’s theorem, we know that
C
is
compact. But we don’t really need Tychonoff, since we know that
C
is metrizable,
with metric
ρ(c
1
, c
2
) =
1
n + 1
,
where
n
is the largest integer for which
c
1
and
c
2
agree on [
−n, n
]. This metric
is easily seen to give rise to the same topology, and by hand, we can see this is
sequentially compact.
We now have to define the operation on C we are interested in.
Definition (Left shift). The left shift operator L : C → C is defined by
(Lc)(i) = c(i + 1).
We observe that this map is invertible, with inverse given by the right shift. This
is one good reason to work over
Z
instead of
N
. Moreover, we see that
L
is nice
and uniformly continuous, since if
ρ(c
1
, c
2
) ≤
1
n + 1
,
then
ρ(Lc
1
, Lc
2
) ≤
1
n
.
Similarly,
L
−1
is uniformly continuous. So it follows that (
C, L
) is a dynamical
system.
Instead of proving Hindman’s theorem directly, we begin by re-proving van
der Waerden’s theorem using this approach, to understand better how it works.
Theorem
(Topological van der Waerden)
.
Let (
X, T
) be a dynamical system.
Then there exists an
ε >
0 such that whenever
r ∈ N
, then we can find
x ∈ X
and n ∈ N such that ρ(x, T
in
x) < ε for all i = 1, ··· , r.
In words, this says if we flow around
X
under
T
, then eventually some point
must go back near to itself, and must do that as many times as we wish to, in
regular intervals.
To justify calling this the topological van der Waerden theorem, we should
be able to easily deduce van der Waerden’s theorem from this. It is not difficult
to imagine this is vaguely related to van der Waerden in some sense, but it is
not obvious how we can actually do the deduction. After all, topological van
der Waerden theorem talks about the whole space of all colourings, and we do
not get to control what
x
we get back from it. What we want is to start with a
single colouring c and try to say something about this particular c.
Now of course, we cannot restrict to the sub-system consisting only of
c
itself,
because, apart from it being really silly, the function
L
does not restrict to a
map
{c} → {c}
. We might think of considering the set of all translates of
c
.
While this is indeed closed under
L
, this is not a dynamical system, since it is
not compact! The solution is to take the closure of that.
Definition
(Orbital closure)
.
Let (
X, T
) be a dynamical system, and
x ∈ X
.
The orbital closure ¯x of x is the set cl{T
s
x : s ∈ Z}.
Observe that ¯x is a closed subset of a compact space, hence compact.
Proposition. (
¯
X, T ) is a dynamical system.
Proof.
It suffices to show that
¯x
is closed under
T
. If
y ∈ ¯x
, then we have
some
s
i
such that
T
s
i
x → y
. Since
T
is continuous, we know
T
s
i
+1
x → T y
. So
T y ∈ ¯x. Similarly, T
−1
¯x = ¯x.
Once we notice that we can produce such a dynamical system from our
favorite point, it is straightforward to prove van der Waerden.
Corollary
(van der Waerden theorem)
.
Let
r, k ∈ N
. Then whenever
Z
is
k-coloured, then there is a monochromatic arithmetic progression of length r.
Proof.
Let
c
:
Z →
[
k
]. Consider (
¯c, L
). By topological van der Waerden, we can
find
x ∈ ¯c
and
n ∈ N
such that
ρ
(
x, L
in
x
)
<
1 for all
i
= 1
, ··· , r
. In particular,
we know that x and L
in
x agree at 0. So we know that
x(0) = L
in
x(0) = x(in)
for all i = 0, ··· , r.
We are not quite done, because we just know that
x ∈ ¯c
, and not that
x = L
k
x for some k.
But this is not bad. Since x ∈ ¯c, we can find some s ∈ Z such that
ρ(T
s
c, x) ≤
1
rn + 1
.
This means x and T
s
c agree on the first rn + 1 elements. So we know
c(s) = c(s + n) = ··· = c(s + rn).
Since we will be looking at these
¯c
a lot, it is convenient to figure out what
this
¯c
actually looks like. It turns out there is a reasonably good characterization
of these orbital closures.
Let’s look at some specific examples. Consider c given by
Then the orbital closure has just two points, c and Lc.
What if we instead had the following?
Then ¯c contains all translates of these, but also
,
We define the following:
Definition (Seq). Let c : Z → [k]. We define
Seq(c) = {(c(i), ··· , c(i + r)) : i ∈ Z, r ∈ N}.
It turns out these sequences tell us everything we want to know about orbital
closures.
Proposition. We have c
0
∈ ¯c iff Seq(c
0
) ⊆ Seq(c).
The proof is just following the definition and checking everything works.
Proof.
We first prove
⇒
. Suppose
c
0
∈ ¯c
. Let (
c
0
(
i
)
, ··· , c
0
(
i
+
r −
1))
∈ Seq
(
c
).
Then we have s ∈ Z such that
ρ(c
0
, L
s
c) <
1
1 + max(|i|, |i + s − 1|)
,
which implies
(c(s + i), ··· , c(s + i + r − 1)) = (c
0
(i), ··· , c
0
(i + s − 1)).
So we are done.
For ⇐, if Seq(c
0
) ⊆ Seq(c), then for all n ∈ N, there exists s
n
∈ Z such that
(c
0
(−n), ··· , c
0
(n)) = (c(s
n
− n), ··· , c(s
n
+ n)).
Then we have
L
s
i
c → c
0
.
So we have c
0
∈ ¯c.
We now return to talking about topological dynamics. We saw that in our
last example, the orbital closure of
c
has three “kinds” of things — translates of
c
, and the all-red and all-blue ones. The all-red and all-blue ones are different,
because they seem to have “lost” the information of
c
. In fact, the orbital closure
of each of them is just that colouring itself. This is a strict subset of
¯c
. These
are phenomena we don’t want.
Definition
(Minimal dynamical system)
.
We say (
X, T
) is minimal if
¯x
=
X
for all
x ∈ X
. We say
x ∈ X
is minimal if (
¯x, T
) is a minimal dynamical system.
Proposition. Every dynamical system (X, T ) has a minimal point.
Thus, most of the time, when we want to prove something about dynamical
systems, we can just pass to a minimal subsystem, and work with it.
Proof.
Let
U
=
{¯x
:
x ∈ X}
. Thus is a family of closed sets, ordered by inclusion.
We want to apply Zorn’s lemma to obtain a minimal element. Consider a chain
S in U. If
¯x
1
⊇ ¯x
2
⊇ ··· ⊇ ¯x
n
,
then their intersection is
¯x
n
, which is in particular non-empty. So any finite
collection in S has non-empty intersection. Since X is compact, we know
\
¯x∈S
¯x 6= ∅.
We pick
z ∈
\
¯x∈S
¯x 6= ∅.
Then we know that
¯z ⊆ ¯x
for all ¯x ∈ S. So we know
¯z ⊆
\
¯x∈S
¯x 6= ∅.
So by Zorn’s lemma, we can find a minimal element (in both senses).
Example. Consider c given by
Then we saw that ¯c contains all translates of these, and also
,
Then this system is not minimal, since the orbital closure of the all-red one is a
strict subsystem of ¯c.
We are now going to derive some nice properties of minimal systems.
Lemma.
If (
X, T
) is a minimal system, then for all
ε >
0, there is some
m ∈ N
such that for all x, y ∈ X, we have
min
|s|≤m
ρ(T
s
x, y) < ε.
Proof. Suppose not. Then there exists ε > 0 and points x
i
, y
i
∈ X such that
min
|s|≤i
ρ(T
s
x
i
, y
i
) ≥ ε.
By compactness, we may pass to subsequences, and assume
x
i
→ x
and
y
i
→ y
.
By continuity, it follows that
ρ(T
s
x, y) ≥ ε
for all s ∈ Z. This is a contradiction, since ¯x = X by minimality.
We now want to prove topological van der Waerden.
Theorem
(Topological van der Waerden)
.
Let (
X, T
) be a dynamical system.
Then there exists an
ε >
0 such that whenever
r ∈ N
, then we can find
x ∈ X
and n ∈ N such that ρ(x, T
in
x) < ε for all i = 1, ··· , r.
Proof.
Without loss of generality, we may assume (
X, T
) is a minimal system.
We induct on
r
. If
r
= 1, we can just choose
y ∈ X
, and consider
T y, T
2
y, ···
.
Then note that by compactness, we have s
1
, s
2
∈ N such that
ρ(T
s
1
y, T
s
2
y) < ε,
and then take x = T
s
1
y and n = s
2
− s
1
.
Now suppose the result is true for
r >
1, and that we have the result for all
ε > 0 and r − 1.
Claim.
For all
ε >
0, there is some point
y ∈ X
such that there is an
x ∈ X
and n ∈ N such that
ρ(t
in
x, y) < ε
for all 1 ≤ i ≤ r.
Note that this is a different statement from the theorem, because we are not
starting at
x
. In fact, this is a triviality. Indeed, let (
x
0
, n
) be as guaranteed by
the hypothesis. That is,
ρ
(
x
0
, T
in
x
0
)
< ε
for all 1
≤ i ≤ r
. Then pick
y
=
x
0
and x = T
−n
x
0
.
The next goal is to show that there is nothing special about this y.
Claim.
For all
ε >
0 and for all
z ∈ X
, there exists
x
z
∈ X
and
n ∈ N
for
which ρ(T
in
x
z
, z) < ε.
The idea is just to shift the picture so that
y
gets close to
z
, and see where
we send x to. We will use continuity to make sure we don’t get too far away.
We choose
m
as in the previous lemma for
ε
2
. Since
T
−m
, T
−m+1
, ··· , T
m
are all uniformly continuous, we can choose
ε
0
such that
ρ
(
a, b
)
< ε
0
implies
ρ(T
s
a, T
s
b) <
ε
2
for all |s| ≤ m.
Given
z ∈ X
, we obtain
x
and
y
by applying our first claim to
ε
0
. Then we
can find
s ∈ Z
with
|s| ≤ m
such that
ρ
(
T
s
y, z
)
<
ε
2
. Consider
x
z
=
T
s
x
. Then
ρ(T
in
x
z
, z) ≤ ρ(T
in
x
z
, T
s
y) + ρ(T
s
y, z)
≤ ρ(T
s
(T
in
x), T
s
y) +
ε
2
≤
ε
2
+
ε
2
= ε
Claim.
For all
ε >
0 and
z ∈ X
, there exists
x ∈ X
,
n ∈ N
and
ε
0
>
0 such
that T
in
(B(x, ε
0
)) ⊆ B(z, ε) for all 1 ≤ i ≤ r.
We choose ε
0
by continuity, using the previous claim.
The idea is that we repeatedly apply the final claim, and then as we keep
moving back, eventually two points will be next to each other.
We pick
z
0
∈ X
and set
ε
0
=
ε
2
. By the final claim, there exists
z
1
∈ X
and
some
n
1
∈ N
such that
T
in
1
(
B
(
z
1
, ε
1
))
⊆ B
(
z
0
, ε
0
) for some 0
< ε
1
≤ ε
0
and all
1 ≤ i ≤ r.
Inductively, we find z
s
∈ X, n
s
∈ N and some 0 < ε
s
≤ ε
s−1
such that
T
in
s
(B(z
s
, ε
s
)) ⊆ B(z
s−1
, ε
s−1
)
for all 1 ≤ i ≤ r.
By compactness, (
z
s
) has a convergent subsequence, and in particular, there
exists i < j ∈ N such that ρ(z
i
, z
j
) <
ε
2
.
Now take x = z
j
, and
n = n
j
+ n
j−1
+ ··· + n
i+1
Then
T
`n
(B(x, ε
j
)) ⊆ B(z
i
, ε
i
).
for all 1 ≤ ` ≤ r. But we know
ρ(z
i
, z
j
) ≤
ε
2
,
and
ε
i
≤ ε
0
≤
ε
2
.
So we have
ρ(T
`n
x, x) ≤ ε
for all 1 ≤ ` ≤ r.
In the remaining of the chapter, we will actually prove Hindman’s theorem.
We will again first prove a “topological” version, but unlike the topological van
der Waerden theorem, it will look nothing like the actually Hindman’s theorem.
So we still need to do some work to deduce the actual version from the topological
version.
To state topological Hindman, we need the following definition:
Definition (Proximal points). We say x, y ∈ X are proximal if
inf
s∈Z
ρ(T
s
x, T
s
y) = 0.
Example.
Two colourings
c
1
, c
2
∈ C
are proximal iff there are arbitrarily large
intervals where they agree.
The topological form of Hindman’s theorem says the following:
Theorem
(Topological Hindman’s theorem)
.
Let (
X, T
) be a dynamical system,
and suppose
X
=
¯x
for some
x ∈ X
. Then for any minimal subsystem
Y ⊆ X
,
then there is some y ∈ Y such that x and y are proximal.
We will first show how this implies Hindman’s theorem, and then prove this.
To do the translation, we need to, of course, translate concepts in dynamical
systems to concepts about colourings. We previously showed that for colourings
c, c
0
, we have
c
0
∈ ¯c
iff
Seq
(
c
0
)
⊆ Seq
(
c
). The next thing to figure out is when
c
:
Z →
[
k
] is minimal. By definition, this means for any
x ∈ ¯c
, we have
¯x
=
¯c
.
Equivalently, we have Seq(x) = Seq(c).
We introduce some notation.
Notation.
For an interval
I ⊆ Z
, we say
I
=
{a, a
+ 1
, ··· , b}
, we write
c
(
I
) for
the sequence (c(a), c(a + 1), ··· , c(b)).
Clearly, we have
Seq(c) = {c(I) : I ⊆ Z is an interval}.
We say for intervals
I
1
, I
2
⊆ Z
, we have
c
(
I
1
)
4 c
(
I
2
), if
c
(
I
1
) is a subsequence
of c(I
2
).
Definition
(Bounded gaps property)
.
We say a colouring
c
:
Z →
[
k
] has the
bounded gaps property if for each interval
I ⊆ Z
, there exists some
M >
0 such
that c(I) 4 c(U) for every interval U ⊆ Z of length at least M.
For example, every periodic colouring has the bounded gaps property.
It turns out this is precisely what we need to characterize minimal colourings.
Proposition.
A colouring
c
:
Z →
[
k
] is minimal iff
c
has the bounded gaps
property.
Proof.
Suppose
c
has the bounded gaps property. We want to show that for all
x ∈ ¯c
, we have
Seq
(
x
) =
Seq
(
c
). We certainly got one containment
Seq
(
x
)
⊆
Seq
(
c
). Consider any interval
I ⊆ Z
. We want to show that
c
(
I
)
∈ Seq
(
x
). By
the bounded gaps property, there is some
M
such that any interval
U ⊆ Z
of
length > M satisfies c(I) 4 c(U). But since x ∈ ¯c, we know that
x([0, ··· , M ]) = c([t, t + 1, ··· , t + M ])
for some t ∈ Z. So c(I) 4 x([0, ··· , M]), and this implies Seq(c) ⊆ Seq(x).
For the other direction, suppose
c
does not have the bounded gaps property.
So there is some bad interval
I ⊆ Z
. Then for any
n ∈ N
, there is some
t
n
such
that
c(I) 64 c([t
n
− n, t
n
− n + 1, ··· , t
n
+ n]).
Now consider
x
n
=
L
t
n
(
c
). By passing to a subsequence if necessary, we may
assume that
x
n
→ x
. Clearly,
c
(
I
)
6∈ Seq
(
x
). So we have found something in
Seq(c) \ Seq(x). So c 6∈ ¯x.
It turns out once we have equated minimality with the bounded gaps property,
it is not hard to deduce Hindman’s theorem.
Theorem
(Hindman’s theorem)
.
If
c
:
N →
[
k
] is a
k
-colouring, then there
exists an infinite A ⊆ N such that F S(A) is monochromatic.
Proof.
We extend the colouring
c
to a colouring of
Z
by defining
x
:
Z →
[
k
+ 1]
with
x(i) = x(−i) = c(i)
if
x 6
= 0, and
x
(0) =
k
+ 1. Then it suffices to find an infinite an infinite
A ⊆ Z
such that F S(A) is monochromatic with respect to x.
We apply topological Hindman to (
¯x, L
) to find a minimal colouring
y
such
that x and y are proximal. Then either
inf
n∈N
ρ(L
n
x, L
n
y) = 0 or inf
n∈N
ρ(L
−n
x, L
−n
y) = 0.
We wlog it is the first case, i.e.
x
and
y
are positively proximal. We now build
up this infinite set A we want one by one.
Let the colour of
y
(0) be red. We shall in fact pick 0
< a
1
< a
2
< ···
inductively so that x(t) = y(t) = red for all t ∈ F S(a
1
, a
2
, ···).
By the bounded gaps property of
y
, there exists some
M
1
such that for any
I ⊆ Z of length ≥ M
1
, then it contains y([0]), i.e. y([0]) 4 y(I).
Since
x
and
y
are positively proximal, there exists
I ⊆ Z
such that
|I| ≥ M
and
min
(
I
)
>
0 such that
x
(
I
) =
y
(
I
). Then we pick
a
1
∈ I
such that
x(a
1
) = y(a
1
) = y(0).
Now we just have to keep doing this. Suppose we have found
a
1
< ··· < a
n
as required. Consider the interval
J
= [0
, ··· , a
1
+
a
2
+
···
+
a
n
]. Again by
the bounded gaps property, there exists some
M
n+1
such that if
I ⊆ Z
has
|I| ≥ M
n+1
, then y(J) 4 y(I).
We choose an interval I ⊆ Z such that
(i) x(I) = y(I)
(ii) |I| ≥ M
n+1
(iii) min I >
P
n
i=1
a
i
.
Then we know that
y([0, ··· , a
1
+ ··· + a
n
]) 4 y(I).
Let
a
n+1
denote by the position at which
y
([0
, ··· , a
1
+
···
+
a
n
]) occurs in
y
(
I
).
It is then immediate that
x
(
z
) =
y
(
z
) =
red
for all
z ∈ F S
(
a
1
, ··· , a
n+1
), as any
element in F S(a
1
, ··· , a
n+1
) is either
– t
0
for some t
0
∈ F S(a
1
, . . . , a
n
);
– a
n+1
; or
– a
n+1
+ t
0
for some t
0
∈ F S(a
1
, ··· , a
n
),
and these are all red by construction.
Then A = {a
1
, a
2
, ···} is the required set.
The heart of the proof is that we used topological Hindman to obtain a
minimal proximal point, and this is why we can iteratively get the a
n
.
It remains to prove topological Hindman.
Theorem
(Topological Hindman’s theorem)
.
If (
X, T
) is a dynamical system,
X
=
¯x
and
Y ⊆ X
is minimal, then there exists
y ∈ Y
such that
x
and
y
are
proximal.
This is a long proof, and we will leave out proofs of basic topological facts.
Proof.
If
X
is a compact metric space, then
X
X
=
{f
:
X → X}
is compact
under the product topology by Tychonoff. The basic open sets are of the form
{f : X → Y | f(x
i
) ∈ U
i
for i = 1, ··· , k}
for some fixed x
i
∈ X and U
i
⊆ X open.
Now any function g : X → X can act on X
X
in two obvious ways —
– Post-composition L
g
: X
X
→ X
X
be given by L
g
(f) = g ◦ f .
– Pre-composition R
g
: X
X
→ X
X
be given by R
g
(f) = f ◦ g.
The useful thing to observe is that
R
g
is always continuous, while
L
g
is continuous
if g is continuous.
Now if (X, T ) is a dynamical system, we let
E
T
= cl{T
s
: s ∈ Z} ⊆ X
X
.
The idea is that we look at what is going on in this space instead. Let’s note a
few things about this subspace E
T
.
– E
T
is compact, as it is a closed subset of a compact set.
– f ∈ E
T
iff for all
ε >
0 and points
x
1
, ··· , x
k
∈ X
, there exists
s ∈ Z
such
that ρ(f (x
i
), T
s
(x
i
)) < ε.
– E
T
is closed under composition.
So in fact,
E
T
is a compact semi-group inside
X
X
. It turns out every proof of
Hindman’s theorem involves working with a semi-group-like object, and then
trying to find an idempotent element.
Theorem
(Idempotent theorem)
.
If
E ⊆ X
X
is a compact semi-group, then
there exists g ∈ E such that g
2
= g.
This is a strange thing to try to prove, but it turns out once we came up
with this statement, it is not hard to prove. The hard part is figuring out this is
the thing we want to prove.
Proof.
Let
F
denote the collection of all compact semi-groups of
E
. Let
A ∈ F
be minimal with respect to inclusion. To see
A
exists, it suffices (by Zorn) to
check that any chain
S
in
F
has a lower bound in
F
, but this is immediate,
since the intersection of nested compact sets is noon-empty.
Claim. Ag = A for all g ∈ A.
We first observe that
Ag
is compact since
R
g
is continuous. Also,
Ag
is a
semigroup, since if f
1
g, f
2
g ∈ Ag, then
f
1
gf
2
g = (f
1
gf
2
)g ∈ A
g
.
Finally, since g ∈ A, we have Ag ⊆ A. So by minimality, we have Ag = A.
Now let
B
g
= {f ∈ A : fg = g}.
Claim. B
g
= A for all g ∈ A.
Note that
B
g
is non-empty, because
Ag
=
A
. Moreover,
B
is closed, as
B
is
the inverse of
{g}
under
R
g
. So
B
is compact. Finally, it is clear that
B
is a
semi-group. So by minimality, we have B
g
= A.
So pick any g ∈ A, and then g
2
= g.
Proof of topological Hindman (continued).
We are actually going to work in a
slightly smaller compact semigroup than E
T
. We let F ⊆ E
T
be defined by
F = {f ∈ E
T
: f(x) ∈ Y }.
Claim. F ⊆ X
X
is a compact semigroup.
Before we prove the claim, we see why it makes sense to consider this
F
.
Suppose the claim is true. Then by applying the idempotent theorem, to get
g ∈ F such that g
2
= g. We now check that x, g(x) are proximal.
Since g ∈ F ⊆ E
T
, we know for all ε, there exists s ∈ Z such that
ρ(g(x), T
s
(x)) <
ε
2
, ρ(g(g(x)), T
s
(g(x))) <
ε
2
.
But we are done, since
g
(
x
) =
g
(
g
(
x
)), and then we conclude from the triangle
inequality that
ρ(T
s
(x), T
s
(g(x))) < ε.
So x and g(x) ∈ Y are proximal.
It now remains to prove the final claim. We first note that
F
is non-empty.
Indeed, pick any
y ∈ Y
. Since
X
=
¯x
, there exists some sequence
T
n
i
(
x
)
→ y
.
Now by compactness of
E
T
, we can find some
f ∈ E
T
such that (
T
n
i
)
i≥0
cluster
at
f
, i.e. every open neighbourhood of
f
contains infinitely many
T
n
i
. Then
since X is Hausdorff, it follows that f(x) = y. So f ∈ F.
To show
F
is compact, it suffices to show that it is closed. But since
Y
=
¯y
for all
y ∈ Y
by minimality, we know
Y
is in particular closed, and so
F
is closed
in the product topology.
Finally, we have to show that
F
is closed under composition, so that it is a
semi-group. To show this, it suffices to show that any map
f ∈ E
T
sends
Y
to
Y
. Indeed, by minimality of
Y
, this is true for
T
s
for
s ∈ Z
, and then we are
done since Y is closed.
The actual hard part of the proof is deciding we should prove the idempotent
theorem. It is not an obvious thing to try!
5 Sums and products*
The goal of this final (non-examinable) chapter is to prove the following theorem:
Theorem
(Moreira, 2016)
.
If
N
is
k
-coloured, then there exists infinitely many
x, y such that {x, x + y, xy} is monochromatic.
This is a rather hard theorem, but the proof is completely elementary, and
just involves a very clever use of van der Waerden’s theorem. One of the main
ideas is to consider syndetic sets.
Definition
(Syndetic set)
.
We say
A
=
{a
1
< a
2
< ···} ⊆ N
is syndetic if it
has bounded gaps, i.e. a
i+1
− a
i
< b for all i.
It turns out these sets contain a lot of structure. What happens when we
finitely colour a syndetic set? Are we guaranteed to have a monochromatic
syndetic set? Let’s take the simplest syndetic set —
N
. Let’s say we 2-colour
N
.
Does one of the classes have to by syndetic? Of course not. For example, we
can look at this:
···
But this set obviously has some very nice structure. So we are motivated to
consider the following definition:
Definition
(Piecewise syndetic set)
.
We say
A ⊆ N
is piecewise syndetic if there
exists b > 0 such that A has gaps bounded by b on arbitrarily long intervals.
More explicitly, if we again write
A
=
{a
1
< a
2
< ···}
, then there exists
b
such that for all
N >
0, there is
i ∈ N
such that
a
k+1
−a
k
< b
for
k
=
i, ··· , i
+
N
.
We begin with two propositions which are standard results.
Proposition.
If
A
is piecewise syndetic and
A
=
X ∪ Y
, then one of
X
and
Y
is piecewise syndetic.
Proof.
We fix
b >
0 and the sequence of intervals
I
1
, I
2
, ···
with
|I
n
| ≥ n
such
that A has gaps ≤ b in each I
n
. We may, of course, wlog I
n
are disjoint.
Each element in
I
n
is either in
X
or not. We let
t
n
denote the largest
number of consecutive things in
I
n
that are in
X
. If
t
n
is unbounded, then
X
is
piecewise syndetic. If
t
n
≤ K
, then
Y
is piecewise syndetic, with gaps bounded
by (K + 1)b.
Of course, this can be iterated to any finite partition. There is nothing clever
so far.
Our next proposition is going to be a re-statement of van der Waerden’s
theorem in the world of piecewise syndetic sets.
Proposition.
Let
A ⊆ N
be piecewise syndetic. Then for all
m ∈ N
, there
exists d ∈ N such that the set
A
∗
= {x ∈ N : x, x + d, ··· , x + md ∈ A}.
is piecewise syndetic.
Proof.
Let’s in fact assume that
A
is syndetic, with gaps bounded by
b
. The
proof for a piecewise syndetic set is similar, but we have to pass to longer and
longer intervals, and then piece them back together.
We want to apply van der Waerden’s theorem. By definition,
N = A ∪ (A + 1) ∪··· ∪ (A + b).
Let c : N → {0, ··· , b} by given by
c(x) = min{i : x ∈ A + i}.
Then by van der Waerden, we can find some a
1
, d
1
such that
a
1
, a
1
+ d
1
, ··· , a
1
+ md
1
∈ A + i.
Of course, this is equivalent to
(a
1
− i), (a
1
+ i) + d
1
, ··· , (a
1
− i) + md
1
∈ A.
So we wlog i = 0.
But van der Waerden doesn’t require the whole set of
N
. We can split
N
into
huge blocks of size
W
(
m
+ 1
, b
+ 1), and then we run the same argument in each
of these blocks. So we find (a
1
, d
1
), (a
2
, d
2
), ··· such that
a
i
, a
i
+ d
i
, ··· , a
i
+ md
i
∈ A.
Moreover,
˜
A
=
{a
1
, a
2
, ···}
is syndetic with gaps bounded by
W
(
m
+ 1
, b
+ 1).
But we need a fixed d that works for everything.
We now observe that by construction, we must have
d
i
≤ W
(
m
+ 1
, b
+ 1)
for all
i
. So this induces a finite colouring on the
a
i
, where the colour of
a
i
is
just
d
i
. Now we are done by the previous proposition, which tells us one of the
partitions must be piecewise syndetic.
That was the preliminary work we had to do to prove the theorem. The
proof will look very magical, and things only start to make sense as we reach the
end. This is not a deficiency of the proof. There is some reason this problem
was open for more than 40 years.
The original proof (arxiv:1605.01469) is a bit less magical, and used concepts
from topological dynamics. Interested people can find the original paper to read.
However, we will sacrifice “deep understanding” for having a more elementary
proof that just uses van der Waerden.
Proof of theorem. Let N = C
1
∪ ··· ∪ C
r
. We build the following sequences:
(i) y
1
, y
2
, ··· ∈ N;
(ii) B
0
, B
1
, ··· and D
1
, D
2
, ··· which are piecewise syndetic sets;
(iii) t
0
, t
1
, ··· ∈ [r] which are colours.
We will pick these such that B
i
⊆ C
t
i
.
We first observe that one of the colour classes
C
i
is piecewise syndetic. We
pick
t
0
∈
[
r
] be such that
C
t
0
is piecewise syndetic, and pick
B
0
=
C
t
0
. We want
to apply the previous proposition to obtain
D
0
from
B
0
. We pick
m
= 2, and
then we can find a y
1
such that
D
1
= {z | z, z + y
1
∈ B
0
}
is piecewise syndetic.
We now want to get
B
1
. We notice that
y
1
D
1
is also piecewise syndetic, just
with a bigger gap. So there is a colour class
t
1
∈
[
r
] such that
y
1
D
1
∩ C
t
1
is
piecewise syndetic, and we let B
1
= y
1
D
1
∩ C
t
1
.
In general, having constructed
y
1
, ··· , y
i−1
;
B
0
, ··· , B
i−1
;
D
1
, ··· , D
i−1
and
t
0
, . . . , t
i−1
, we proceed in the following magical way:
We apply the previous proposition with the really large
m
=
y
2
1
y
2
2
···y
2
i−1
to
find y
i
∈ N such that the set
D
i
= {z | z, z + y
i
, ··· , z + (y
2
1
y
2
2
···y
2
i−1
)y
i
∈ B
i−1
}
is piecewise syndetic. The main thing is that we are not using all points in this
progression, but are just using some important points in there. In particular, we
use the fact that
z, z + y
i
, z + (y
2
j
···y
2
i−1
)y
i
∈ B
i−1
for all 1 ≤ j ≤ i − 1. It turns out these squares are important.
We have now picked
y
i
and
D
i
, but we still have to pick
B
i
. But this is just
the same as what we did. We know that
y
i
D
i
is piecewise syndetic. So we know
there is some t
i
∈ [r] such that B
i
= y
i
D
i
∩ C
t
i
is piecewise syndetic.
Let’s write down some properties of these
B
i
’s. Clearly,
B
i
⊆ C
t
i
, but we
can say much more. We know that
B
i
⊆ y
i
B
i−1
⊆ ··· ⊆ y
i
y
i−1
···y
j+1
B
j
.
Of course, there exists some t
j
, t
i
with j < i such that t
j
= t
i
. We set
y = y
i
y
i−1
···y
j+1
.
Let’s choose any z ∈ B
i
. Then we can find some x ∈ B
j
such that
z = xy,
We want to check that this gives us what we want. By construction, we have
x ∈ B
j
⊆ C
t
i
, xy = z ∈ B
i
⊆ C
t
i
.
So they have the same colour.
How do we figure out the colour of x + y? Consider
y(x + y) = z + y
2
∈ y
i
D
i
+ y
2
.
By definition, if a ∈ D
i
, then we know
a + (y
2
j+1
···y
2
i−1
)y
i
∈ B
i−1
.
So we have
D
i
⊆ B
i−1
− (y
2
j+1
···y
2
i+1
)y
i
⊆ y
j+1
···y
i−1
B
j
− (y
2
j+1
···y
2
i−1
)y
i
So it follows that
y(x + y) ⊆ y
i
(y
i−1
···y
j+1
)B
j
− y
2
+ y
2
= yB
j
.
So x + y ∈ B
j
⊆ C
t
j
= C
t
i
. So {x, x + y, xy} have the same colour.