Part III — Combinatorics
Based on lectures by B. Bollobas
Notes taken by Dexter Chua
Michaelmas 2017
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
What can one say about a collection of subsets of a finite set satisfying certain conditions
in terms of containment, intersection and union? In the past fifty years or so, a good
many fundamental results have been proved about such questions: in the course we
shall present a selection of these results and their applications, with emphasis on the
use of algebraic and probabilistic arguments.
The topics to be covered are likely to include the following:
– The de Bruijn–Erd¨os theorem and its extensions.
– The Graham–Pollak theorem and its extensions.
– The theorems of Sperner, EKR, LYMB, Katona, Frankl and F¨uredi.
–
Isoperimetric inequalities: Kruskal–Katona, Harper, Bernstein, BTBT, and their
applications.
–
Correlation inequalities, including those of Harris, van den Berg and Kesten, and
the Four Functions Inequality.
– Alon’s Combinatorial Nullstellensatz and its applications.
– LLLL and its applications.
Pre-requisites
The main requirement is mathematical maturity, but familiarity with the basic graph
theory course in Part II would be helpful.
1 Hall’s theorem
We shall begin with a discussion of Hall’s theorem. Ideally, you’ve already met
it in IID Graph Theory, but we shall nevertheless go through it again.
Definition
(Bipartite graph)
.
We say
G
= (
X, Y
;
E
) is a bipartite graph with
bipartition
X
and
Y
if (
X t Y, E
) is a graph such that every edge is between a
vertex in X and a vertex in Y .
We say such a bipartite graph is (
k, `
)-regular if every vertex in
X
has degree
k
and every vertex in
Y
has degree
`
. A bipartite graph that is (
k, `
)-regular for
some k, ` ≥ 1 is said to be biregular.
Definition
(Complete matching)
.
Let
G
= (
X, Y
;
E
) be a bipartite graph
with bipartition
X
and
Y
. A complete matching from
X
to
Y
is an injection
f : X → Y such that x f(x) is an edge for every x ∈ X.
Hall’s theorem gives us a necessary and sufficient condition for the existence
of a complete matching. Let’s try to first come up with a necessary condition.
If there is a complete matching, then for any subset
S ⊆ X
, we certainly have
|
Γ(
S
)
| ≥ |S|
, where Γ(
S
) is the set of neighbours of
S
. Hall’s theorem says this
is also sufficient.
Theorem
(Hall, 1935)
.
A bipartite graph
G
= (
X, Y
;
E
) has a complete match-
ing from X to Y if and only if |Γ(S)| ≥ |S| for all S ⊆ X.
This condition is known as Hall’s condition.
Proof.
We may assume
G
is edge-minimal satisfying Hall’s condition. We show
that
G
is a complete matching from
X
to
Y
. For
G
to be a complete matching,
we need the following two properties:
(i) Every vertex in X has degree 1
(ii) Every vertex in Y has degree 0 or 1.
We first examine the second condition. Suppose
y ∈ Y
is such that there
exists edges
x
1
y, x
2
y ∈ E
. Then the minimality of
G
implies there are sets,
X
1
, X
2
⊆ X
such that
x
i
∈ X
i
such that
|
Γ(
X
i
)
|
=
|X
i
|
and
x
i
is the only
neighbour of y in X
i
.
Now consider the set
X
1
∩ X
2
. We know Γ(
X
1
∩ X
2
)
⊆
Γ(
X
1
)
∩
Γ(
X
2
).
Moreover, this is strict, as y is in the RHS but not the LHS. So we have
Γ(X
1
∩ X
2
) ≤ |Γ(X
i
) ∩Γ(X
2
)| −1.
But also
|X
1
∩ X
2
| ≤ |Γ(X
1
∩ X
2
)|
≤ |Γ(X
1
) ∩Γ(X
2
)| −1
= |Γ(X
1
)| + |Γ(X
2
)| −|Γ(X
1
) ∪Γ(X
2
)| −1
= |X
1
| + |X
2
| −|Γ(X
1
∪ X
2
)| −1
≤ |X
1
| + |X
2
| −|X
1
∪ X
2
| −1
= |X
1
∩ X
2
| −1,
which contradicts Hall’s condition.
One then sees that the first condition is also satisfied — if
x ∈ X
is a vertex,
then the degree of
x
certainly cannot be 0, or else
|
Γ(
{x}
)
| < |{x}|
, and we see
that
d
(
x
) cannot be
>
1 or else we can just remove an edge from
x
without
violating Hall’s condition.
We shall now describe some consequences of Hall’s theorem. They will
be rather straightforward applications, but we shall later see they have some
interesting consequences.
Let
A
=
{A
1
, . . . , A
m
}
be a set system. All sets are finite. A set of distinct
representatives of A is a set {a
1
, . . . a
m
} of distinct elements a
i
∈ A
i
.
Under what condition do we have a set of distinct representatives? If we
have one, then for any I ⊆ [m] = {1, 2, . . . , m}, we have
[
i∈I
A
i
≥ |I|.
We might hope this is sufficient.
Theorem. A has a set of distinct representatives iff for all B ⊆ A, we have
[
B∈B
B
≥ |B|.
This is an immediate consequence of Hall’s theorem.
Proof.
Define a bipartite graph as follows — we let
X
=
A
, and
Y
=
S
i∈[m]
A
i
.
Then draw an edge from
x
to
A
i
if
x ∈ A
i
. Then there is a complete matching
of this graph iff
A
has a set of distinct representations, and the condition in the
theorem is exactly Hall’s condition. So we are done by Hall’s theorem.
Theorem.
Let
G
= (
X, Y
;
E
) be a bipartite graph such that
d
(
x
)
≥ d
(
y
) for
all x ∈ X and y ∈ Y . Then there is a complete matching from X to Y .
Proof.
Let
d
be such that
d
(
x
)
≥ d ≥ d
(
y
) for all
x ∈ X
and
y ∈ Y
. For
S ⊆ X
and
T ⊆ Y
, we let
e
(
S, T
) be the number of edges between
S
and
T
. Let
S ⊆ X
,
and T = Γ(S). Then we have
e(S, T ) =
X
x∈S
d(x) ≥ d|S|,
but on the other hand, we have
e(S, T ) ≤
X
y ∈T
d(y) ≤ d|T|.
So we find that |T | ≥ |S|. So Hall’s condition is satisfied.
Corollary.
If
G
= (
X, Y
;
E
) is a (
k, `
)-regular bipartite graph with 1
≤ ` ≤ k
,
then there is a complete matching from X to Y .
Theorem. Let G = (X, Y ; E) be biregular and A ⊆ X. Then
|Γ(A)|
|Y |
≥
|A|
|X|
.
Proof. Suppose G is (k, `)-regular. Then
k|A| = e(A, Γ(A)) ≤ `|Γ(A)|.
Thus we have
|Γ(A)|
|Y |
≥
k|A|
`|Y |
.
On the other hand, we can count that
|E| = |X|k = |Y |`,
and so
k
`
=
|Y |
|X|
.
So we are done.
Briefly, this says biregular graphs “expand”.
Corollary.
Let
G
= (
X, Y
;
E
) be biregular and let
|X| ≤ |Y |
. Then there is a
complete matching of X into Y .
In particular, for any biregular graph, there is always a complete matching
from one side of the graph to the other.
Notation.
Given a set
X
, we write
X
(r)
for the set of all subsets of
X
with
r
elements, and similarly for X
(≥r)
and X
(≤r)
.
If |X| = n, then |X
(r)
| =
n
r
.
Now given a set
X
and two numbers
r < s
, we can construct a biregular
graph (X
(r)
, X
(s)
; E), where A ∈ X
(r)
is joined to B ∈ X
(s)
if A ⊆ B.
Corollary.
Let 1
≤ r < s ≤ |X|
=
n
. Suppose
|
n
2
− r| ≥ |
n
2
− s|
. Then there
exists an injection f : X
(r)
→ X
(s)
such that A ⊆ f(A) for all A ∈ X
(r)
.
If
|
n
2
− r| ≤ |
n
2
− s|
, then there exists an injection
g
:
X
(s)
→ X
(r)
such that
A ⊇ g(A) for all A ∈ X
(s)
.
Proof. Note that |
n
2
− r| ≤ |
n
2
− s| iff
n
r
≥
n
s
.
2 Sperner systems
In the next few chapters, we are going to try to understand the power set
P
(
X
)
of a set. One particularly important structure of
P
(
X
) is that it is a graded
poset. A lot of the questions we ask can be formulated for arbitrary (graded)
posets, but often we will only answer them for power sets, since that is what we
are interested in.
Definition
(Chain)
.
A subset
C ⊆ S
of a poset is a chain if any two of its
elements are comparable.
Definition
(Anti-chain)
.
A subset
A ⊆ S
is an anti-chain if no two of its
elements are comparable.
Given a set
X
, the power set
P
(
X
) of
X
can be viewed as a Boolean lattice.
This is a poset by saying A < B if A ( B.
In general, there are many questions we can ask about a poset
P
. For
example, we may ask what is the largest possible of an anti-chain in
P
. While
this is quite hard in general, we may be able to produce answers if we impose
some extra structure on our posets. One particularly useful notion is that of a
graded poset.
Definition
(Graded poset)
.
We say
P
= (
S, <
) is a graded poset if we can write
S as a disjoint union
S =
n
a
i=0
S
i
such that
– S
i
is an anti-chain; and
– x < y
iff there exists elements
x
=
z
i
< z
i+1
< ··· < z
j
=
y
such that
z
h
∈ S
h
.
Example. If X is a set, P(X) is an anti-chain with X
i
= X
(i)
.
If we want to obtain the largest anti-chain as possible, then we might try
X
i
with
i
=
b
n
2
c
. But is this actually the largest possible? Or can we construct
some funny-looking anti-chain that is even larger? Sperner says no.
Theorem
(Sperner, 1928)
.
For
|X|
=
n
, the maximal size of an antichain in
P(X) is
n
bn/2c
, witnessed by X
bn/2c
.
Proof.
If
C
is a chain and
A
is an antichain, then
|A ∩C| ≤
1. So it suffices to
partition P(X) into
m = max
k
n
k
=
n
bn/2c
=
n
dn/2e
many chains.
We can do so using the injections constructed at the end of the previous
section. For
i ≥ b
n
2
c
, we can construct injections
f
i
:
X
i−1
→ X
i
such that
A ⊆ f
i
(
A
) for all
A
. By chaining these together, we get
m
chains ending in
X
b
n
2
c
.
Similarly, we can partition
X
(≤bn/2c)
into
m
chains with each chain ending
in X
(bn/2c)
. Then glue them together.
Another way to prove this result is to provide an alternative measure on how
large an antichain can be, and this gives a stronger result.
Theorem
(LYM inequality)
.
Let
A
be an antichain in
P
(
X
) with
|X|
=
n
.
Then
n
X
r=0
|A ∩X
(r)
|
n
r
≤ 1.
In particular, |A| ≤ max
r
n
r
=
n
bn/2c
, as we already know.
Proof.
A chain
C
0
⊆ C
1
⊆ ··· ⊆ C
m
is maximal if it has
n
+ 1 elements.
Moreover, there are
n
! maximal chains, since we start with the empty set and
then, given
C
i
, we produce
C
i+1
by picking one unused element and adding it
to C
i
.
For every maximal chain
C
, we have
|C ∩ A| ≤
1. Moreover, every set of
k
elements appears in
k
!(
n −k
)! maximal chains, by a similar counting argument
as above. So
X
A∈A
|A|!(n −|A|)! ≤ n!.
Then the result follows.
There are analogous results for posets more general than just
P
(
X
). To
formulate these results, we must introduce the following new terminology.
Definition (Shadow). Given A ⊆ S
i
, the shadow at level i −1 is
∂A = {x ∈ S
i−1
: x < y for some y ∈ A}.
Definition
(Downward-expanding poset)
.
A graded poset
P
= (
S, <
) is said to
be downward-expanding if
|∂A|
|S
i−1
|
≥
|A|
|S
i
|
for all A ⊆ A
i
.
We similarly define upward-expanding, and say a poset is expanding if it is
upward or downward expanding.
Definition (Weight). The weight of a set A ⊆ S is
w(A) =
n
X
i=0
|A ∩S
i
|
|S
i
|
.
The theorem is that the LYM inequality holds in general for any downward
expanding posets.
Theorem.
If
P
is downward expanding and
A
is an anti-chain, then
w
(
A
)
≤
1.
In particular, |A| ≤ max
i
|S
i
|.
Since each S
i
is an anti-chain, the largest anti-chain has size max
i
|S
i
|.
Proof. We define the span of A to be
span A = max
A
j
6=∅
j − min
A
i
6=∅
i.
We do induction on span A.
If
span A
= 0, then we are done. Otherwise, let
h
i
=
max
A
j
6=0
j
, and set
B
h−1
= ∂A
h
. Then since A is an anti-chain, we know A
h−1
∩ B
h−1
= ∅.
We set
A
0
=
A\A
h
∪B
h−1
. This is then another anti-chain, by the transitivity
of <. We then have
w(A) = w(A
0
) + w(A
h
) −w(B
h−1
) ≤ w(A
0
) ≤ 1,
where the first inequality uses the downward-expanding hypothesis and the
second is the induction hypothesis.
We may want to mimic our other proof of the fact that the largest size of an
antichain in P(X) is
n
bn/2c
. This requires the notion of a regular poset.
Definition
(Regular poset)
.
We say a graded poset (
S, <
) is regular if for each
i
, there exists
r
i
, s
i
such that if
x ∈ A
i
, then
x
dominates
r
i
elements at level
i −1, and is dominated by s
i
elements at level i + 1.
Proposition. An anti-chain in a regular poset has weight ≤ 1.
Proof.
Let
M
be the number of maximal chains of length (
n
+ 1), and for each
x ∈ S
k
, let m(x) be the number of maximal chains through x. Then
m(x) =
k
Y
i=1
r
i
n−1
Y
i=k
s
i
.
So if x, y ∈ S
i
, then m(x) = m(y).
Now since every maximal chain passes through a unique element in
S
i
, for
each x ∈ S
i
, we have
M =
X
x∈S
i
m(x) = |S
i
|m(x).
This gives the formula
m(x) =
M
|S
i
|
.
now let
A
be an anti-chain. Then
A
meets each chain in
≤
1 elements. So we
have
M =
X
maximal chains
1 ≥
X
x∈A
m(x) =
n
X
i=0
|A ∩S
i
| ·
M
|S
i
|
.
So it follows that
X
|A ∩S
i
|
|S
i
|
≤ 1.
Let’s now turn to a different problem. Suppose
x
1
, . . . , x
n
∈ C
, with each
|x
i
| ≥ 1. Given A ⊆ [n], we let
x
A
=
X
i∈A
x
i
.
We now seek the largest size of
A
such that
|x
A
− x
B
| <
1 for all
A, B ∈ A
.
More precisely, we want to find the best choice of
x
1
, . . . , x
n
and
A
so that
|A|
is as large as possible while satisfying the above condition.
If we are really lazy, then we might just choose
x
i
= 1 for all
i
. By taking
A = [n]
bn/2c
, we can obtain |A| =
n
bn/2c
.
Erd¨os noticed this is the best bound if we require the x
i
to be real.
Theorem (Erd¨os, 1945). Let x
i
be all real, |x
i
| ≥ 1. For A ⊆ [n], let
x
A
=
X
i∈A
x
i
.
Let A ⊆ P(n). Then |A| ≤
n
bn/2c
.
Proof.
We claim that we may assume
x
i
≥
1 for all
i
. To see this, suppose we
instead had
x
1
=
−
2, say. Then whether or not
i ∈ A
determines whether
x
A
should include 0 or
−
2 in the sum. If we replace
x
i
with 2, then whether or not
i ∈ A
determines whether
x
A
should include 0 or 2. So replacing
x
i
with 2 just
essentially shifts all terms by 2, which doesn’t affect the difference.
But if we assume that
x
i
≥
1 for all
i
, then we are done, since
A
must be an
anti-chain, for if A, B ∈ A and A ( B, then x
B
− x
A
= x
B\A
≥ 1.
Doing it for complex numbers is considerably harder. In 1970, Kleitman
found a gorgeous proof for every normed space. This involves the notion of a
symmetric decomposition. To motivate this, we first consider the notion of a
symmetric chain.
Definition
(Symmetric chain)
.
We say a chain
C
=
{C
i
, C
i+1
, . . . , C
n−i
}
is
symmetric if |C
j
| = j for all j.
Theorem. P(n) has a decomposition into symmetric chain.
Proof.
We prove by induction. In the case
n
= 1, we simply have to take
{∅, {1}}.
Now suppose
P
(
n −
1) has a symmetric chain decomposition
C
1
∪ ··· ∪ C
t
.
Given a symmetric chain
C
j
= {C
i
, C
i+1
, . . . , C
n−1−i
},
we obtain two chains C
(0)
j
, C
(1)
j
in P(n) by
C
(0)
j
= {C
i
, C
i+1
, . . . , C
n−1−i
, C
n−1−i
∪ {n}}
C
(1)
j
= {C
i
∪ {n}, C
i+1
∪ {n}, . . . , C
n−2−i
∪ {n}}.
Note that if
|C
j
|
= 1, then
C
(1)
j
=
∅
, and we drop this. Under this convention, we
note that every A ∈ P(n) appears in exactly one C
(ε)
j
, and so we are done.
We are not going to actually need the notion of symmetric chains in our
proof. What we need is the “profile” of a symmetric chain decomposition. By a
simple counting argument, we see that for 0
≤ i ≤
n
2
, the number of chains with
n + 1 − 2i sets is
`(n, i) ≡
n
i
−
n
i −1
.
Theorem
(Kleitman, 1970)
.
Let
x
1
, x
2
, . . . , x
n
be vectors in a normed space
with norm kx
I
k ≥ 1 for all i. For A ∈ P(n), we set
x
A
=
X
i∈A
x
i
.
Let A ⊆ P(n) be such that kx
A
− x
B
k < 1. Then kAk ≤
n
bn/2c
.
This bound is indeed the best, since we can pick
x
i
=
x
for some
kxk ≥
1,
and then we can pick A = [n]
bn/2c
.
Proof.
Call
F ⊆ P
(
n
) sparse if
kx
E
− x
F
k ≥
1 for all
E, F ∈ F
,
E 6
=
F
. Note
that if
F
is sparse, then
|F ∩ A| ≤
1. So if we can find a decomposition of
P
(
n
)
into
n
bn/2c
sparse sets, then we are done.
We call a partition
P
(
n
) =
F
1
∪··· ∪F
t
symmetric if the number of families
with
n
+ 1
−
2
i
sets is
`
(
n, i
), i.e. the “profile” is that of a symmetric chain
decomposition.
Claim. P(n) has a symmetric decomposition into sparse families.
We again induct on
n
. When
n
= 1, we can take
{∅, {
1
}}
. Now suppose
∆
n−1
is a symmetric decomposition of P(n −1) as F
1
∪ ··· ∪ F
t
.
Given
F
j
, we construct
F
(0)
j
and
F
(1)
j
“as before”. We pick some
D ∈ F
j
, to
be decided later, and we take
F
(0)
j
= F
j
∪ {D ∪ {n}}
F
(1)
j
= {E ∪ {n} : E ∈ F
j
\ {D}}.
The resulting set is certainly still symmetric. The question is whether it is sparse,
and this is where the choice of
D
comes in. The collection
F
(1)
j
is certainly still
sparse, and we must pick a D such that F
(0)
j
is sparse.
To do so, we use Hahn–Banach to obtain a linear functional
f
such that
kfk
= 1 and
f
(
x
n
) =
kx
n
k ≥
1. We can then pick
D
to maximize
f
(
x
D
). Then
we check that if E ∈ F
j
, then
f(x
D∪{n}
− x
2
) = f(x
D
) −f (x
E
) + f (x
n
).
By assumption,
f
(
x
n
)
≥
1 and
f
(
x
D
)
≥ f
(
x
E
). So this is
≥
1. Since
kfk
= 1, it
follows that kx
D∪{n}
− x
E
k ≥ 1.
3 The Kruskal–Katona theorem
For A ⊆ X
(r)
, recall we defined the lower shadow to be
∂A = {B ∈ X
(r−1)
: B ⊆ A for some A ∈ A}.
The question we wish to understand is how small we can make
∂A
, relative to
A. Crudely, we can bound the size by
|∂A| ≥ |A|
n
r−1
n
r
=
n −r
r
|A|.
But surely we can do better than this. To do so, one reasonable strategy is to
first produce some choice of
A
we think is optimal, and see how we can prove
that it is indeed optimal.
To do so, let’s look at some examples.
Example. Take n = 6 and r = 3. We pick
A = {123, 456, 124, 256}.
Then we have
∂A = {12, 13, 23, 45, 46, 56, 14, 24, 25, 26},
and this has 10 elements.
But if we instead had
A = {123, 124, 134, 234},
then
∂A = {12, 13, 14, 23, 24, 34},
and this only has 6 elements, and this is much better.
Intuitively, the second choice of
A
is better because the terms are “bunched”
together.
More generally, we would expect that if we have
|A|
=
k
r
, then the best
choice should be
A
= [
k
]
(r)
, with
|∂A|
=
k
r−1
. For other choices of
A
, perhaps
a reasonable strategy is to find the largest
k
such that
k
r
< |A|
, and then take
A
to be [
k
]
(r)
plus some elements. To give a concrete description of which extra
elements to pick, our strategy is to define a total order on [
n
]
(r)
, and say we
should pick the initial segment of length |A|.
This suggests the following proof strategy:
(i)
Come up with a total order on [
n
]
(r)
, or even
N
(r)
such that [
k
]
(r)
are
initial segments for all k.
(ii)
Construct some “compression” operators
P
(
N
(r)
)
→ P
(
N
(r)
) that pushes
each element down the ordering without increasing the |∂A|.
(iii)
Show that the only subsets of
N
(r)
that are fixed by the compression
operators are the initial segments.
There are two natural orders one can put on [n]
(r)
:
– lex: We say A < B if min A∆B ∈ A.
– colex: We say A < B if max A∆B ∈ B.
Example. For r = 3, the elements of X
(3)
in colex order are
123, 124, 134, 234, 125, 135, 235, 145, 245, 345, 126, . . .
In fact, colex is an order on
N
(r)
, and we see that the initial segment with
n
r
elements is exactly [n]
(r)
. So this is a good start.
If we believe that colex is indeed the right order to do, then we ought to
construct some compression operators. For
i 6
=
j
, we define the (
i, j
)-compression
as follows: for a set A ∈ X
(r)
, we define
C
ij
(A) =
(
(A \{j}) ∪{i} j ∈ A, i 6∈ A
A otherwise
For a set system, we define
C
ij
(A) = {C
ij
(A) : A ∈ A}∪ {A ∈ A : C
ij
(A) ∈ A}
We can picture our universe of sets as follows:
B ∪ {j}
B ∪ {i}
The set system
A
is some subset of all these points, and we what we are doing
is that we are pushing everything down when possible.
It is clear that we have |C
ij
(A)| = |A|. We further observe that
Lemma. We have
∂C
ij
(A) ⊆ C
ij
(∂A).
In particular, |∂C
ij
(A)| ≤ |∂A|.
Given
A ⊆ X
(r)
, we say
A
is left-compressed if
c
ij
(
A
) =
A
for all
i < j
. Is
this good enough?
Of course initial segments are left-compressed. However, it turns out the
converse is not true.
Example. {123, 124, 125, 126} are left-compressed, but not an initial segment.
So we want to come up with “more powerful” compressions. For
U, V ∈ X
(s)
with
U ∩V
=
∅
, we define a (
U, V
)-compression as follows: for
A ⊆ X
, we define
C
UV
(A) =
(
(A \V ) ∪U A ∩(U ∪ V ) = V
A otherwise
Again, for A ⊆ X
(r)
, we can define
C
UV
(A) = {C
UV
(A) : A ∈ A}∪ {A ∈ A : C
UV
(A) ∈ A}.
Again, A is (U, V )-compressed if C
UV
(A) = A.
This time the behaviour of the compression is more delicate.
Lemma.
Let
A ⊆ X
(r)
and
U, V ∈ X
(s)
,
U ∩ V
=
∅
. Suppose for all
u ∈ U
,
there exists v such that A is (U \ {u}, V \ {v})-compressed. Then
∂C
UV
(A) ⊆ C
UV
(∂A).
Lemma. A ⊆ X
(r)
is an initial segment of
X
(r)
in colex if and only if it is
(U, V )-compressed for all U, V disjoint with |U| = |V | and max V > max U .
Proof. ⇒
is clear. Suppose
A
is (
U, V
) compressed for all such
U, V
. If
A
is not
an initial segment, then there exists
B ∈ A
and
C 6∈ A
such that
C < B
. Then
A is not (C \ B, B \ C)-compressed. A contradiction.
Lemma.
Given
A ∈ X
(r)
, there exists
B ⊆ X
(r)
such that
B
is (
U, V
)-
compressed for all |U| = |V |, U ∩V = ∅, max V > max U, and moreover
|B| = |A|, |∂B| ≤ |∂A|. (∗)
Proof. Let B be such that
X
B∈B
X
i∈B
2
i
is minimal among those
B
’s that satisfy (
∗
). We claim that this
B
will do. Indeed,
if there exists (
U, V
) such that
|U|
=
|V |
,
max V > max U
and
C
UV
(
B
)
6
=
B
,
then pick such a pair with
|U|
minimal. Then apply a (
U, V
)-compression, which
is valid since given any
u ∈ U
we can pick any
v ∈ V
that is not
max V
to satisfy
the requirements of the previous lemma. This decreases the sum, which is a
contradiction.
From these, we conclude that
Theorem
(Kruskal 1963, Katona 1968)
.
Let
A ⊆ X
(r)
, and let
C ⊆ X
(r)
be
the initial segment with |C| = |A|. Then
|∂A| ≥ |∂C|.
We can now define the shadow function
∂
(r)
(m) = min{|∂A| : A ⊆ X
(r)
, |A| = m}.
This does not depend on the size of
X
as long as
X
is large enough to accommo-
date
m
sets, i.e.
n
r
≥ m
. It would be nice if we can concretely understand this
function. So let’s try to produce some initial segments.
Essentially by definition, an initial segment is uniquely determined by the
last element. So let’s look at some examples.
Example.
Take
r
= 4. What is the size of the initial segment ending in 3479?
We note that anything that ends in something less than 8 is less that 3479,
and there are
8
4
such elements. If you end in 9, then you are still fine if the
second-to-last digit is less than 7, and there are
6
3
such elements. Continuing,
we find that there are
8
4
+
6
3
+
4
2
such elements.
Given
m
r
> m
r−1
> ··· > m
s
≥ s
, we let
B
(r)
(
m
r
, m
r−1
, . . . , m
s
) be the
initial segment ending in the element
m
r
+ 1, m
r−1
+ 1, . . . , m
s+1
+ 1, m
s
, m
s
− 1, m
s
− 2, . . . , m
s
− (s − 1).
This consists of the sets
{a
1
< a
2
< ··· < a
r
}
such that there exists
j ∈
[
s, r
]
with a
i
= m
i
+ 1 for i > j, and a
j
≤ m
j
.
To construct an element in
B
(r)
(
m
r
, . . . , m
s
), we need to first pick a
j
, and
then select j elements that are ≤ m
j
. Thus, we find that
|B
(r)
(m
r
, . . . , m
s
)| =
r
X
j=s
m
j
j
= b
(r)
(m
1
, . . . , m
s
).
We see that this
B
(r)
is indeed the initial segment in the colex order ending
in that element. So we know that for all
m ∈ N
, there is a unique sequence
m
r
> m
r−1
> . . . , > m
s
≥ s such that n =
P
r
j=0
m
j
j
.
It is also not difficult to find the shadow of this set. After a bit of thinking,
we see that it is given by
B
(r−1)
(m
r
, . . . , m
s
).
Thus, we find that
∂
(r)
r
X
i=s
m
i
i
!
=
r
X
i=s
m
i
i −1
,
and moreover every
m
can be expressed in the form
P
r
i=s
m
i
i
for some unique
choices of m
i
.
In particular, we have
∂
(r)
n
r
=
n
r − 1
.
Since it might be slightly annoying to write
m
in the form
P
r
i=s
m
i
i
, Lov´asz
provided another slightly more convenient bound.
Theorem (Lov´asz, 1979). If A ⊆ X
(r)
with |A| =
x
r
for x ≥ 1, x ∈ R, then
|∂A| ≥
x
r − 1
.
This is best possible if x is an integer.
Proof. Let
A
0
= {A ∈ A : 1 6∈ A}
A
1
= {A ∈ A : 1 ∈ A}.
For convenience, we write
A
1
− 1 = {A \ {1} : A ∈ A
1
}.
We may assume
A
is (
i, j
)-compressed for all
i < j
. We induct on
r
and then on
|A|. We have
|A
0
| = |A|−|A
1
|.
We note that A
1
is non-empty, as A is left-compressed. So |A
0
| < |A|.
If r = 1 and |A| = 1 then there is nothing to do.
Now observe that
∂A ⊆ A
1
−
1, since if
A ∈ A
, 1
6∈ A
, and
B ⊆ A
is such
that
|A \ B|
= 1, then
B ∪ {
1
} ∈ A
1
since
A
is left-compressed. So it follows
that
|∂A
0
| ≤ |A
1
|.
Suppose |A
1
| <
x−1
r−1
. Then
|A
0
| >
x
r
−
x −1
r − 1
=
x −1
r
.
Therefore by induction, we have
|∂A
0
| >
x −1
r − 1
.
This is a contradiction, since
|∂A
0
| ≤ |A
1
|
. Hence
|A
1
| ≥
x−1
r−1
. Hence we are
done, since
|∂A| ≥ |∂A
1
| = |A
1
| + |∂(A
1
− 1)| ≥
x −1
r − 1
+
x −1
r − 2
=
x
r − 1
.
4 Isoperimetric inequalities
We are now going to ask a question similar to the one answered by Kruskal–
Katona. Kruskal–Katona answered the question of how small can
∂A
be among
all
A ⊆ X
(r)
of fixed size. Clearly, we obtain the same answer if we sought to
minimized the upper shadow instead of the lower. But what happens if we want
to minimize both the upper shadow and the lower shadow? Or, more generally,
if we allow
A ⊆ P
(
X
) to contain sets of different sizes, how small can the set of
“neighbours” of A be?
Definition
(Boundary)
.
Let
G
be a graph and
A ⊆ V
(
A
). Then the boundary
b(A) is the set of all x ∈ G such that x 6∈ A but x is adjacent to A.
Example. In the following graph
the boundary of the green vertices is the red vertices.
An isoperimetric inequality on G is an inequality of the form
|b(A)| ≥ f(|A|)
for all
A ⊆ G
. Of course, we could set
f ≡
0, but we would like to do better
than that.
The “continuous version” of this problem is well-known. For example, in a
plane, given a fixed area, the perimeter of the area is minimized if we pick the
area to be a disc. Similarly, among subsets of
R
3
of a given volume, the solid
sphere has the smallest surface area. Slightly more exotically, among subsets of
S
2
of given area, the circular cap has smallest perimeter.
Before we proceed, we note the definition of a neighbourhood :
Definition
(Neighbourhood)
.
Let
G
be a graph and
A ⊆ V
(
A
). Then the
neighbourhood of A is N (A) = A ∪ b(A).
Of course,
|b
(
A
)
|
=
|N
(
A
)
| −|A|
, and it is often convenient to express and
prove our isoperimetric inequalities in terms of the neighbourhood instead.
If we look at our continuous cases, then we observe that all our optimal
figures are balls, i.e. they consist of all the points a distance at most
r
from a
point, for some
r
and single point. We would hope that this pattern generalizes.
Of course, it would be a bit ambitious to hope that balls are optimal for all
graphs. However, we can at least show that it is true for the graphs we care
about, namely graphs obtained from power sets.
Definition
(Discrete cube)
.
Given a set
X
, we turn
P
(
X
) into a graph as
follows: join
x
to
y
if
|x
∆
y|
= 1, i.e. if
x
=
y ∪ {a}
for some
a 6∈ y
, or vice versa.
This is the discrete cube Q
n
, where n = |X|.
Example. Q
3
looks like
123
1312 23
21 3
∅
This looks like a cube! Indeed, if we identify each
x ∈ Q
with the 0-1 sequence
of length
n
(e.g. 13
7→
101000
···
0), or, in other words, its indicator function,
then Q
n
is naturally identified with the unit cube in R
n
.
∅
1
3
2 12
13
23 123
Note that in this picture, the topmost layer is the points that do have 3, and the
bottom layer consists of those that do not have a 3, and we can make similar
statements for the other directions.
Example.
Take
Q
3
, and try to find a size
A
of size 4 that has minimum
boundary. There are two things we might try — we can take a slice, or we can
take a ball. In this case, we see the ball is the best.
We can do more examples, and it appears that the ball
X
(≤r)
is the best all
the time. So that might be a reasonable thing to try to prove. But what if we
have |A| such that |X
(≤r)
| < |A| < |X
(≤r+1)
?
It is natural to think that we should pick an
A
with
X
(≤r)
⊆ A ⊆ X
(≤r+1)
,
so we set
A
=
X
(≤r)
∪B
, where
B ⊆ X
(r+1)
. Such an
A
is known as a Hamming
ball.
What B should we pick? Observe that
N(A) = X
(≤r+1)
∪ ∂
+
B.
So we want to pick
B
to minimize the upper shadow. So by Kruskal–Katona,
we know we should pick B to be the initial segment in the lex order.
Thus, if we are told to pick 1000 points to minimize the boundary, we go up
in levels, and in each level, we go up in lex.
Definition
(Simplicial ordering)
.
The simplicial ordering on
Q
n
is defined by
x < y if either |x| < |y|, or |x| = |y| and x < y in lex.
Our aim is to show that the initial segments of the simplicial order minimize
the neighbourhood. Similar to Kruskal–Katona, a reasonable strategy would be
to prove it by compression.
For
A ⊆ Q
n
, and 1
≤ i ≤ n
, the
i
-sections of
A
are
A
(i)
+
, A
(i)
−
⊆ P
(
X \ {i}
)
defined by
A
(i)
−
= {x ∈ A : i 6∈ x}
A
(i)
+
= {x \{i} : x ∈ A, i ∈ x}.
These are the top and bottom layers in the i direction.
The
i
-compression (or co-dimension 1 compression) of
A
is
C
i
(
A
), defined by
C
i
(A)
+
= first |A
+
| elements of P(X \ {i}) in simplicial
C
i
(A)
−
= first |A
−
| elements of P(X \ {i}) in simplicial
Example. Suppose we work in Q
4
, where the original set is
i
The resulting set is then
i
Clearly, we have
|C
i
(
A
)
|
=
|A|
, and
C
i
(
A
) “looks more like” an initial segment
in simplicial ordering than A did.
We say A is i-compressed if C
i
(A) = A.
Lemma. For A ⊆ Q
n
, we have |N(C
i
(A))| ≤ |N(A)|.
Proof. We have
|N(A)| = |N(A
+
) ∪A
−
| + |N (A
−
) ∪A
+
|
Take B = C
i
(A). Then
|N(B)| = |N(B
+
) ∪B
−
| + |N (B
−
) ∪B
+
|
= max{|N(B
+
)|, |B
−
|} + max{|N (B
−
)|, |B
+
|}
≤ max{|N(A
+
)|, |A
−
|} + max{|N (A
−
)|, |A
+
|}
≤ |N(A
+
) ∪A
i
| + |N (A
−
) ∪A
+
|
= |N(A)|
Since each compression moves us down in the simplicial order, we can keep
applying compressions, and show that
Lemma. For any A ⊆ Q
n
, there is a compressed set B ⊆ Q
n
such that
|B| = |A|, |N(B)| ≤ |N(A)|.
Are we done? Does being compressed imply being an initial segment? No!
For
n
= 3, we can take
{∅,
1
,
2
,
12
}
, which is obviously compressed, but is not
an initial segment. To obtain the actual initial segment, we should replace 12
with 3.
∅
1
3
2 12
13
23 123
For
n
= 4, we can take
{∅,
1
,
2
,
3
,
4
,
12
,
13
,
23
}
, which is again compressed by
not an initial segment. It is an initial segment only if we replace 23 with 14.
∅
1
3
2 12
13
23 123
4 14
34
24 124
134
234 1234
We notice that these two examples have a common pattern. The “swap” we
have to perform to get to an initial segment is given by replacing an element
with its complement, or equivalently, swapping something the opposite diagonal
element. This is indeed general.
Lemma.
For each
n
, there exists a unique element
z ∈ Q
n
such that
z
c
is the
successor of z.
Moreover, if
B ⊆ Q
n
is compressed but not an initial segment, then
|B|
=
2
n−1
and
B
is obtained from taking the initial segment of size 2
n−1
and replacing
x with x
c
.
Proof.
For the first part, simply note that complementation is an order-reversing
bijection
Q
n
→ Q
n
, and
|Q
n
|
is even. So the 2
n−1
th element is the only such
element z.
Now if
B
is not an initial segment, then we can find some
x < y
such that
x 6∈ B
and
y ∈ B
. Since
B
is compressed, it must be the case that for each
i
,
there is exactly one of
x
and
y
that contains
i
. Hence
x
=
y
c
. Note that this is
true for all
x < y
such that
x 6∈ B
and
y ∈ B
. So if we write out the simplicial
order, then B must look like
···
since any
x 6∈ B
such that
x < y
must be given by
x
=
y
c
, and so there must be
a unique such
x
, and similarly the other way round. So it must be the case that
y is the successor of x, and so x = z.
We observe that these anomalous compressed sets are worse off than the
initial segments (exercise!). So we deduce that
Theorem
(Harper, 1967)
.
Let
A ⊆ Q
n
, and let
C
be the initial segment in the
simplicial order with |C| = |A|. Then |N(A)| ≥ |N(C)|. In particular,
|A| =
r
X
i=0
n
i
implies |N (A)| ≥
r+1
X
i=0
n
i
.
The edge isoperimetric inequality in the cube
Let
A ⊆ V
be a subset of vertices in a graph
G
= (
V, E
). Consider the edge
boundary
∂
e
A = {xy ∈ E : x ∈ A, y 6∈ A}.
Given a graph
G
, and given the size of
A
, can we give a lower bound for the size
of ∂
e
A?
Example.
Take
G
=
Q
3
. For the vertex isoperimetric inequality, our optimal
solution with |A| = 4 was given by
∅
1
3
2 12
13
23 123
The edge boundary has size 6. However, if we just pick a slice, then the edge
boundary has size 4 only.
More generally, consider
Q
n
=
Q
2k+1
, and take the Hamming ball
B
k
=
X
(≤`)
. Then
∂
e
B
k
= {AB : A ⊆ B ⊆ X : |A| = k, |B| = k + 1}.
So we have
|∂
e
B
k
| =
2k + 1
k + 1
· (k + 1) ∼
2
n
√
n
√
2π
.
However, if we pick the bottom face of
Q
n
. Then
|A|
= 2
n−1
and
|∂
e
A|
= 2
n−1
.
This is much much better.
More generally, it is not unreasonable to suppose that sub-cubes are always
the best. For a k-dimensional sub-cube in Q
n
, we have
|∂
k
| = 2
k
(n −k).
If we want to prove this, and also further solve the problem for
|A|
not a power
of 2, then as our previous experience would suggest, we should define an order
on P(X).
Definition
(Binary order)
.
The binary order on
Q
n
∼
=
P
(
X
) is given by
x < y
if max x∆y ∈ y.
Equivalently, define ϕ : P(X) → N by
ϕ(x) =
X
i∈x
2
i
.
Then x < y if ϕ(x) < ϕ(y).
The idea is that we avoid large elements. The first few elements in the
elements would like like
∅, 1, 2, 123, 13, 23, 123, . . . .
Theorem.
Let
A ⊆ Q
n
be a subset, and let
C ⊆ Q
n
be the initial segment of
length |A| in the binary order. Then |∂
e
C| ≤ |∂
e
A|.
Proof.
We induct on
n
using codimension-1 compressions. Recall that we
previously defined the sets A
(i)
±
.
The
i
-compression of
A
is the set
B ⊆ Q
n
such that
|B
(i)
±
|
=
|A
(i)
±
|
, and
B
(i)
±
are initial segments in the binary order. We set D
i
(A) = B.
Observe that performing
D
i
reduces the edge boundary. Indeed, given any
A, we have
|∂
e
A| = |∂
e
A
(i)
+
| + |∂
e
A
(i)
−
| + |A
(i)
+
∆A
(i)
i
|.
Applying
D
i
clearly does not increase any of those factors. So we are happy.
Now note that if A 6= D
i
A, then
X
x∈A
X
i∈x
2
i
<
X
x∈D
i
A
X
i∈x
2
i
.
So after applying compressions finitely many times, we are left with a compressed
set.
We now hope that a compressed subset must be an initial segment, but this
is not quite true.
Claim. If A is compressed but not an initial, then
A =
˜
B = P(X \ {n}) \{123 ···(n −1)} ∪ {n}.
By direct computation, we have
|∂
e
˜
B| = 2
n−1
− 2(n − 2),
and so the initial segment is better. So we are done.
The proof of the claim is the same as last time. Indeed, by definition, we can
find some
x < y
such that
x 6∈ A
and
y ∈ A
. As before, for any
i
, it cannot be
the case that both
x
and
y
contain
i
or neither contain
i
, since
A
is compressed.
So x = y
c
, and we are done as before.
5 Sum sets
Let G be an abelian group, and A, B ⊆ G. Define
A + B = {a + b : a ∈ A, b ∈ B}.
For example, suppose
G
=
R
and
A
=
{a
1
< a
2
< ··· < a
n
}
and
B
=
{b
1
<
b
2
< ··· < b
m
}
. Surely,
A
+
B ≤ nm
, and this bound can be achieved. Can we
bound it from below? The elements
a
1
+ b
1
, a
1
+ b
2
, . . . , a
1
+ b
m
, a
2
+ b
m
, . . . , a
n
+ b
m
are certainly distinct as well, since they are in strictly increasing order. So
|A + B| ≥ m + n − 1 = |A| + |B|− 1.
What if we are working in a finite group? In general, we don’t have an order, so
we can’t make the same argument. Indeed, the same inequality cannot always
be true, since
|G
+
G|
=
|G|
. Slightly more generally, if
H
is a subgroup of
G
,
then |H + H| = |H|.
So let’s look at a group with no subgroups. In other words, pick G = Z
p
.
Theorem
(Cauchy–Davenport theorem)
.
Let
A
and
B
be non-empty subsets
of Z
p
with p a prime, and |A| + |B| ≤ p + 1. Then
|A + B| ≥ |A| + |B|−1.
Proof.
We may assume 1
≤ |A| ≤ |B|
. Apply induction on
|A|
. If
|A|
= 1, then
there is nothing to do. So assume A ≥ 2.
Since everything is invariant under translation, we may assume 0
, a ∈ A
with
a 6
= 0. Then
{a,
2
a, . . . , pa}
=
Z
p
. So there exists
k ≥
0 such that
ka ∈ B
and
(k + 1)a 6∈ B.
By translating B, we may assume 0 ∈ B and a 6∈ B.
Now 0 ∈ A ∩B, while a ∈ A \ B. Therefore we have
1 ≤ |A ∩B| < |A|.
Hence
|(A ∩B) + (A ∪B)| ≥ |A ∩ B|+ |A ∪ B|− 1 = |A| + |B|− 1.
Also, clearly
(A ∩B) + (A ∪B) ⊆ A + B.
So we are done.
Corollary. Let A
1
, . . . , A
k
be non-empty subsets of Z
p
such that
d
X
i=1
|A
i
| ≤ p + k −1.
Then
|A
1
+ . . . + A
k
| ≥
k
X
i=1
|A
i
| −k + 1.
What if we don’t take sets, but sequences? Let
a
1
, . . . , a
m
∈ Z
n
. What
m
do we need to take to guarantee that there are
m
elements that sums to 0? By
the pigeonhole principle, m ≥ n suffices. Indeed, consider the sequence
a
1
, a
1
+ a
2
, a
1
+ a
2
+ a
3
, ··· , a
1
+ ··· + a
n
.
If they are all distinct, then one of them must be zero, and so we are done. If
they are not distinct, then by the pigeonhole principle, there must be
k < k
0
such that
a
1
+ ··· + a
k
= a
1
+ ··· + a
k
0
.
So it follows that
a
k+1
+ ··· + a
k
0
.
So in fact we can even require the elements we sum over to be consecutive. On
the other hand, m ≥ n is also necessary, since we can take a
i
= 1 for all i.
We can tackle a harder question, where we require that the sum of a fixed
number of things vanishes.
Theorem
(Erd¨os–Ginzburg–Ziv)
.
Let
a
1
, . . . , a
2n−1
∈ Z
n
. Then there exists
I ∈ [2n −1]
(n)
such that
X
i∈I
a
i
= 0
in Z
n
.
Proof. First consider the case n = p is a prime. Write
0 ≤ a
1
≤ a
2
≤ ··· ≤ a
2p−1
< p.
If
a
i
=
a
i+p−1
, then there are
p
terms that are the same, and so we are done
by adding them up. Otherwise, set
A
i
=
{a
i
, a
i+p−1
}
for
i
= 1
, . . . , p −
1, and
A
p
= {a
2p−1
}, then |A
i
| = 2 for i = 1, . . . , p −1 and |A
p
| = 1. Hence we know
|A
1
+ ··· + A
p
| ≥ (2(p − 1) + 1) − p + 1 = p.
Thus, every element in
Z
p
is a sum of some
p
of our terms, and in particular 0 is.
In general, suppose
n
is not a prime. Write
n
=
pm
, where
p
is a prime and
m >
1. By induction, for every 2
m −
1 terms, we can find
m
terms whose sum
is a multiple of m.
Select disjoint S
1
, S
2
, . . . , S
2p−1
∈ [2n −1]
(m)
such that
X
j∈S
i
a
j
= mb
i
.
This can be done because after selecting, say, S
1
, . . . , S
2p−2
, we have
(2n −1) − (2p −2)m = 2m − 1
elements left, and so we can pick the next one.
We are essentially done, because we can pick
j
1
, . . . , j
p
such that
P
p
k=1
b
i
k
is
a multiple of p. Then
p
X
k=1
X
j∈S
i
k
a
j
is a sum of mp = n terms whose sum is a multiple of mp.
6 Projections
So far, we have been considering discrete objects only. For a change, let’s work
with something continuous.
Let K ⊆ R
n
be a bounded open set. For A ⊆ [n], we set
K
A
= {(x
i
)
i∈A
: ∃y ∈ K, y
i
− x
i
for all i ∈ A} ⊆ R
A
.
We write
|K
A
|
for the Lebesgue measure of
K
A
as a subset of
R
A
. The question
we are interested in is given some of these
|K
A
|
, can we bound
|K|
? In some
cases, it is completely trivial.
Example. If we have a partition of A
1
∪ ··· ∪ A
m
= [n], then we have
|K| ≤
m
Y
i=1
|K
A
i
|.
But, for example, in R
3
, can we bound |K| given |K
12
|, |K
13
| and |K
23
|?
It is clearly not possible if we only know, say
|K
12
|
and
|K
13
|
. For example,
we can consider the boxes
0,
1
n
× (0, n) ×(0, n).
Proposition. Let K be a body in R
3
. Then
|K|
2
≤ |K
12
||K
13
||K
23
|.
This is actually quite hard to prove! However, given what we have done so
far, it is natural to try to compress
K
in some sense. Indeed, we know equality
holds for a box, and if we can make
K
look more like a box, then maybe we can
end up with a proof.
For K ⊆ R
n
, its n-sections are the sets K(x) ⊆ R
n−1
defined by
K(x) = {(x
1
, . . . , x
n−1
) ∈ R
n−1
: (x
1
, . . . , x
n−1
, x) ∈ K}.
Proof. Suppose first that each section of K is a square, i.e.
K(x) = (0, f(x)) × (0, f(x)) dx
for all x and some f. Then
|K| =
Z
f(x)
2
dx.
Moreover,
|K
12
| =
sup
x
f(x)
2
≡ M
2
, |K
13
| = |K
23
| =
Z
f(x) dx.
So we have to show that
Z
f(x)
2
dx
2
≤ M
2
Z
f(x) dx
2
,
but this is trivial, because f(x) ≤ M for all x.
Let’s now consider what happens when we compress
K
. For the general case,
define a new body L ⊆ R
3
by setting its sections to be
L(x) = (0,
p
|K(x)|) ×(0,
p
|K(x)|).
Then |L| = |K|, and observe that
|L
12
| ≤ sup |K(x)| ≤
[
K(x)
= |K
12
|.
To understand the other two projections, we introduce
g(x) = |K(x)
1
|, h(x) = |K(x)
2
|.
Now observe that
|L(x)| = |K(x)| ≤ g(x)h(x),
Since
L
(
x
) is a square, it follows that
L
(
x
) has side length
≤ g
(
x
)
1/2
h
(
x
)
1/2
. So
|L
13
| = |L
23
| ≤
Z
g(x)
1/2
h(x)
1/2
dx.
So we want to show that
Z
g
1/2
h
1/2
dx
2
≤
Z
g dx
Z
h dx
.
Observe that this is just the Cauchy–Schwarz inequality applied to
g
1/2
and
h
1/2
. So we are done.
Let’s try to generalize this.
Definition ((Uniform) cover). We say a family A
1
, . . . , A
r
⊆ [n] covers [n] if
r
[
i=1
A
r
= [n],
and is a uniform k-cover if each i ∈ [n] is in exactly k many of the sets.
Example.
With
n
= 3, the singletons
{
1
}, {
2
}, {
3
}
form a 1-uniform cover,
and so does
{
1
}, {
2
,
3
}
. Also,
{
1
,
2
}, {
1
,
3
}
and
{
2
,
3
}
form a uniform 2-cover.
However, {1, 2} and {2, 3} do not form a uniform cover of [3].
Note that we allow repetitions.
Example. {1}, {1}, {2, 3}, {2}, {3} is a 2-uniform cover of [3].
Theorem
(Uniform cover inequality)
.
If
A
1
, . . . , A
r
is a uniform
k
-cover of [
n
],
then
|K|
k
=
r
Y
i=1
|K
A
|.
Proof. Let A be a k-uniform cover of [k]. Note that A is a multiset. Write
A
−
= {A ∈ A : n 6∈ A}
A
+
= {A \{n} ∈ A : n ∈ A}
We have |A
+
| = k, and A
+
∪ A
−
forms a k-uniform cover of [n − 1].
Now note that if K = R
n
and n 6∈ A, then
|K
A
| ≥ |K(x)
A
| (1)
for all x. Also, if n ∈ A, then
|K
A
| =
Z
|K(x)
A\{n}
| dx. (2)
In the previous proof, we used Cauchy–Schwarz. What we need here is H¨older’s
inequality
Z
fg dx ≤
Z
f
p
dx
1/p
Z
g
q
dx
1/q
,
where
1
p
+
1
q
= 1. Iterating this, we get
Z
f
1
···f
k
dx ≤
k
Y
i=1
Z
f
k
i
dx
1/k
.
Now to perform the proof, we induct on
n
. We are done if
n
= 1. Otherwise,
given K ⊆ R
n
and n ≥ 2, by induction,
|K| =
Z
|K(x)| dx
≤
Z
Y
A∈A
−
|K(x)
A
|
1/k
Y
A∈A
+
|K(x)
A
|
1/k
dx (by induction)
≤
Y
A∈A
−
|K
A
|
1/k
Z
Y
A∈A
+
|K(x)
A
|
1/k
dx (by (1))
≤
Y
A≤|A
−
|K
A
|
1/k
Y
A∈A
+
Z
|K(x)
A
|
1/k
(by H¨older)
=
Y
A∈A
|K
A
|
1/k
Y
A∈A
+
|K
A∪{n}
|
1/k
. (by (2))
This theorem is great, but we can do better. In fact,
Theorem
(Box Theorem (Bollob´as, Thomason))
.
Given a body
K ⊆ R
n
, i.e.
a non-empty bounded open set, there exists a box
L
such that
|L|
=
|K|
and
|L
A
| ≤ |K
A
| for all A ⊆ [n].
Of course, this trivially implies the uniform cover theorem. Perhaps more
surprisingly, we can deduce this from the uniform cover inequality.
To prove this, we first need a lemma.
Definition
(Irreducible cover)
.
A uniform
k
-cover is reducible if it is the disjoint
union of two uniform covers. Otherwise, it is irreducible.
Lemma. There are only finitely many irreducible covers of [n].
Proof.
Let
A
and
B
be covers. We say
A < B
if
A
is a “subset” of
B
, i.e. for
each A ⊆ [n], the multiplicity of A in A is less than the multiplicity in B.
Then note that the set of irreducible uniform
k
-covers form an anti-chain,
and observe that there cannot be an infinite anti-chain.
Proof of box theorem. For A an irreducible cover, we have
|K|
k
≤
Y
A∈A
|K
A
|.
Also,
|K
A
| ≤
Y
i∈A
|K
{i}
|.
Let
{x
A
:
A ⊆
[
n
]
}
be a minimal array with
x
A
≤ |K
A
|
such that for each
irreducible k-cover A, we have
|K|
k
≤
Y
A∈A
x
A
(1)
and moreover
x
A
≤
Y
i∈A
x
{i}
(2)
for all
A ⊆
[
n
]. We know this exists since there are only finitely many inequalities
to be satisfied, and we can just decrease the
x
A
’s one by one. Now again by
finiteness, for each
x
A
, there must be at least one inequality involving
x
A
on the
right-hand side that is in fact an equality.
Claim.
For each
i ∈
[
n
], there exists a uniform
k
i
-cover
C
i
containing
{i}
with
equality
|K|
k
i
=
Y
A∈C
i
x
A
.
Indeed if
x
i
occurs on the right of (1), then we are done. Otherwise, it occurs
on the right of (2), and then there is some
A
such that (2) holds with equality.
Now there is some cover
A
containing
A
such that (1) holds with equality. Then
replace A in A with {{j} : j ∈ A}, and we are done.
Now let
C =
n
[
i=1
C
i
, C
0
= C \ {{1}, {2}, . . . , {n}}, k =
n
X
i=1
k
i
.
Then
|K|
k
=
Y
A∈C
x
A
=
Y
A∈C
1
x
A
!
≥ |K|
k−1
n
Y
i=1
x
i
.
So we have
|K| ≥
n
Y
i=1
x
i
.
But we of course also have the reverse inequality. So it must be the case that
they are equal.
Finally, for each
A
, consider
A
=
{A} ∪{{i}
:
i 6∈ A}
. Then dividing (1) by
Q
i∈A
x
i
gives us
Y
i6∈A
x
i
≤ x
A
.
By (2), we have the inverse equality. So we have
x
A
=
Y
i∈A
x
i
for all i. So we are done by taking L to be the box with side length x
i
.
Corollary.
If
K
is a union of translates of the unit cube, then for any (not
necessarily uniform) k-cover A, we have
|K|
k
≤
Y
A∈A
|K
A
|.
Here a k-cover is a cover where every element is covered at least k times.
Proof.
Observe that if
B ⊆ A
, then
|K
B
| ≤ |K
A
|
. So we can reduce
A
to a
uniform k-cover.
7 Alon’s combinatorial Nullstellensatz
Alon’s combinatorial Nullstellensatz is a seemingly unexciting result that has
surprisingly many useful consequences.
Theorem
(Alon’s combinatorial Nullstellensatz)
.
Let
F
be a field, and let
S
1
, . . . , S
n
be non-empty finite subsets of
F
with
|S
i
|
=
d
i
+ 1. Let
f ∈
F
[
X
1
, . . . , X
n
] have degree
d
=
P
n
i=1
d
i
, and let the coefficient of
X
d
1
1
···X
d
n
n
be non-zero. Then f is not identically zero on S = S
1
× ··· × S
n
.
Its proof follows from generalizing facts we know about polynomials in one
variable. Here
R
will always be a ring;
F
always a field, and
F
q
the unique field
of order q = p
n
. Recall the following result:
Proposition
(Division algorithm)
.
Let
f, g ∈ R
[
X
] with
g
monic. Then we can
write
f = hg + r,
where deg h ≤ deg f − deg g and deg r < deg g.
Our convention is that deg 0 = −∞.
Let
X
= (
X
1
, . . . , X
n
) be a sequence of variables, and write
R
[
X
] =
R[X
1
, . . . , X
n
].
Lemma.
Let
f ∈ R
[
X
], and for
i
= 1
, . . . , n
, let
g
i
(
X
i
)
∈ R
[
X
i
]
⊆ R
[
X
]
be monic of degree
deg g
i
=
deg
X
i
g
i
=
d
i
. Then there exists polynomials
h
1
, . . . , h
n
, r ∈ R[X] such that
f =
X
f
i
g
i
+ r,
where
deg h
i
≤ deg f − deg d
i
deg
X
i
r ≤ d
i
− 1
deg
X
i
h
i
≤ deg
X
i
f − d
i
deg
X
i
r ≤ deg
X
i
f
deg
X
j
h
i
≤ deg
X
j
f deg r ≤ deg f
for all i, j.
Proof.
Consider
f
as a polynomial with coefficients in
R
[
X
2
, . . . , X
n
], then divide
by g
1
using the division algorithm. So we write
f = h
1
g
1
+ r
1
.
Then we have
deg
X
1
h
1
≤ deg
X
1
f − d
1
deg
X
1
r
1
≤ d
1
− 1
deg h
1
≤ deg f deg
X
j
r
1
≤ deg
X
j
f
deg
X
j
h
1
≤ deg
X
j
f deg r ≤ deg f.
Then repeat this with f replaced by r
1
, g
1
by g
2
, and X
1
by X
2
.
We also know that a polynomial of one variable of degree
n ≥
1 over a field
has at most n zeroes.
Lemma.
Let
S
1
, . . . , S
n
be non-empty finite subsets of a field
F
, and let
h ∈ F
[
X
]
be such that
deg
X
i
h < |S
i
|
for
i
= 1
, . . . , n
. Suppose
h
is identically 0 on
S = S
1
× ··· × S
n
⊆ F
n
. Then h is the zero polynomial.
Proof.
Let
d
i
=
|S
i
|−
1. We induct on
n
. If
n
= 1, then we are done. For
n ≥
2,
consider
h
as a one-variable polynomial in
F
[
X
1
, . . . , X
n−1
] in
X
n
. Then we can
write
h =
d
n
X
i=0
g
i
(X
1
, . . . , X
n−1
)X
i
m
.
Fix (
x
1
, . . . , x
n−1
)
∈ S
1
× ···S
n−1
, and set
c
i
=
g
i
(
x
1
, . . . , x
n−1
)
∈ F
. Then
P
d
n
i=0
c
i
X
i
n
vanishes on
S
n
. So
c
i
=
g
i
(
x
1
, . . . , x
n−1
) = 0 for all (
x
1
, . . . , x
n−1
)
∈
S
1
× ··· × S
n−1
. So by induction, g
i
= 0. So h = 0.
Another fact we know about polynomials in one variables is that if
f ∈ F
[
X
]
vanishes at z
1
, . . . , z
n
, then f is a multiple of
Q
n
i=1
(X − z
i
).
Lemma. For i = 1, . . . , n, let S
i
be a non-empty finite subset of F, and let
g
i
(X
i
) =
Y
s∈S
i
(X
i
− s) ∈ F[X
i
] ⊆ F [X].
Then if
f ∈ F
[
X
] is identically zero on
S
=
S
1
× ··· × S
n
, then there exists
h
i
∈ F[X], deg h
i
≤ deg f − |S
i
| and
f =
n
X
i=1
h
i
g
i
.
Proof. By the division algorithm, we can write
f =
n
X
i=1
h
i
g
i
+ r,
where
r
satisfies
deg
X
i
r < deg g
i
. But then
r
vanishes on
S
1
×···×S
n
, as both
f and g
i
do. So r = 0.
We finally get to Alon’s combinatorial Nullstellensatz.
Theorem
(Alon’s combinatorial Nullstellensatz)
.
Let
S
1
, . . . , S
n
be non-empty
finite subsets of
F
with
|S
i
|
=
d
i
+ 1. Let
f ∈ F
[
X
] have degree
d
=
P
n
i=1
d
i
,
and let the coefficient of
X
d
1
1
···X
d
n
n
be non-zero. Then
f
is not identically zero
on S = S
1
× ··· × S
n
.
Proof.
Suppose for contradiction that
f
is identically zero on
S
. Define
g
i
(
X
i
)
and h
i
as before such that
f =
X
h
i
g
i
.
Since the coefficient of
X
d
1
1
···X
d
n
n
is non-zero in
f
, it is non-zero in some
h
j
g
j
.
But that’s impossible, since
deg h
j
≤
n
X
i=1
d
i
!
− deg g
j
=
X
i6=j
d
i
− 1,
and so h
j
cannot contain a X
d
1
1
···
ˆ
X
j
d
j
···X
d
n
n
term.
Let’s look at some applications. Here
p
is a prime,
q
=
p
k
, and
F
q
is the
unique field of order q.
Theorem (Chevalley, 1935). Let f
1
, . . . , f
m
∈ F
q
[X
1
, . . . , X
n
] be such that
m
X
i=1
deg f
i
< n.
Then the f
i
cannot have exactly one common zero.
Proof.
Suppose not. We may assume that the common zero is 0 = (0
, . . . ,
0).
Define
f =
m
Y
i=1
(1 −f
i
(X)
q−1
) −γ
n
Y
i=1
Y
s∈F
×
q
(X
i
− s),
where γ is chosen so that F (0) = 0, namely the inverse of
Q
s∈F
×
q
(−s)
m
.
Now observe that for any non-zero
x
, the value of
f
i
(
x
)
q−1
= 1, so
f
(
x
) = 0.
Thus, we can set
S
i
=
F
q
, and they satisfy the hypothesis of the theorem. In
particular, the coefficient of
X
q −1
1
···X
q −1
n
is
γ 6
= 0. However,
f
vanishes on
F
n
q
.
This is a contradiction.
It is possible to prove similar results without using the combinatorial Null-
stellensatz. These results are often collectively refered to as Chevalley–Warning
theorems.
Theorem
(Warning)
.
Let
f
(
X
) =
f
(
X
1
, . . . , X
n
)
∈ F
q
[
X
] have degree
< n
.
Then N (f), the number of zeroes of f is a multiple of p.
One nice trick in doing these things is that finite fields naturally come with
an “indicator function”. Since the multiplicative group has order
q −
1, we know
that if x ∈ F
q
, then
x
q −1
=
(
1 x 6= 0
0 x = 0
.
Proof. We have
1 −f (x)
q −1
=
(
1 f(x) = 0
0 otherwise
.
Thus, we know
N(f) =
X
x∈F
n
q
(1 −f (x)
q−1
) = −
X
x∈F
n
q
f(x)
q −1
∈ F
q
.
Further, we know that if k ≥ 0, then
X
x∈F
n
q
x
k
=
(
−1 k = q − 1
0 otherwise
.
So let’s write
f
(
x
)
q −1
as a linear combination of monomials. Each monomial
has degree
< n
(
q −
1). So there is at least one
k
such that the power of
X
k
in
that monomial is
< q −
1. Then the sum over
X
k
vanishes for this monomial.
So each monomial contributes 0 to the sum.
We can use Alon’s combinatorial Nullstellensatz to effortlessly prove some of
our previous theorems.
Theorem
(Cauchy–Davenport theorem)
.
Let
p
be a prime and
A, B ⊆ Z
p
be
non-empty subsets with |A| + |B| ≤ p + 1. Then |A + B| ≥ |A|+ |B| − 1.
Proof.
Suppose for contradiction that
A
+
B ⊆ C ⊆ Z
p
, and
|C|
=
|A|
+
|B|−
2.
Let’s come up with a polynomial that encodes the fact that
C
contains the sum
A + B. We let
f(X, Y ) =
Y
c∈C
(X + Y − c).
Then f vanishes on A × B, and deg f = |C|.
To apply the theorem, we check that the coefficient of
X
|A|−1
Y
|B|−1
is
|C|
|A|−1
,
which is non-zero in
Z
p
, since
C < p
. This contradicts Alon’s combinatorial
Nullstellensatz.
We can also use this to prove Erd¨os–Ginzburg–Ziv again.
Theorem
(Erd¨os–Ginzburg–Ziv)
.
Let
p
be a prime and
a
1
, . . . , a
2p+1
∈ Z
p
.
Then there exists I ∈ [2p −1]
(p)
such that
X
i∈I
a
i
= 0 ∈ Z
p
.
Proof. Define
f
1
(X
1
, . . . , X
2p−1
) =
2p−1
X
i=1
X
p−1
i
.
f
2
(X
1
, . . . , X
2p−1
) =
2p−1
X
i=1
a
i
X
p−1
i
.
Then by Chevalley’s theorem, we know there cannot be exactly one common
zero. But 0 is one common zero. So there must be another. Take this solution,
and let
I
=
{i
:
x
i
6
= 0
}
. Then
f
1
(
X
) = 0 is the same as saying
|I|
=
p
, and
f
2
(X) = 0 is the same as saying
P
i∈I
a
i
= 0.
We can also consider restricted sums. We set
A
·
+ B = {a + b : a ∈ A, b ∈ B, a 6= b}.
Example. If n 6= m, then
[n]
·
+ [m] = {3, 4, . . . , m + n}
[n]
·
+ [n] = {3, 4, . . . , 2n −1}
From this example, we show that if
|A| ≥
2, then
|A
·
+ A|
can be as small as
2|A| −3. In 1964, Erd¨os and Heilbronn
Conjecture
(Erd¨os–Heilbronn, 1964)
.
If 2
|A| ≤ p
+ 3, then
|A
·
+ A| ≥
2
|A|−
3.
This remained open for 30 years, and was proved by Dias da Silva and
Hamidoune. A much much simpler proof was given by Alon, Nathanson and
Ruzsa in 1996.
Theorem.
Let
A, B ⊆ Z
p
be such that 2
≤ |A| < |B|
and
|A|
+
|B| ≤ p
+ 2.
Then A
·
+ B ≥ |A| + |B|−2.
The above example shows we cannot do better.
Proof. Suppose not. Define
f(X, Y ) = (X − Y )
Y
c∈C
(X + Y − c),
where A
·
+ B ⊆ C ⊆ Z
p
and |C| = |A| + |B|−3.
Then deg g = |A| + |B| −2, and the coefficient of X
|A|−1
Y
|B|−1
is
|A| + |B| − 3
|A| −2
−
|A| + |B| − 3
|A| −1
6= 0.
Hence by Alon’s combinatorial Nullstellensatz,
f
(
x, y
) is not identically zero on
A ×B. A contradiction.
Corollary
(Erd¨os–Heilbronn conjecture)
.
If
A, B ⊆ Z
p
, non-empty and
|A|
+
|B| ≤ p + 3, and p is a prime, then |A
·
+ B| ≥ |A|+ |B| −3.
Proof. We may assume 2 ≤ |A| ≤ |B|. Pick a ∈ A, and set A
0
= A \{a}. Then
|A
·
+ B| ≥ |A
0
·
+ B| ≥ |A
0
| + |B| − 2 = |A| + |B| − 3.
Now consider the following problem: suppose we have a circular table
Z
2n+1
.
Suppose the host invites
n
couples, and the host, being a terrible person, wants
the
i
th couple to be a disatnce
d
i
apart for some 1
≤ d
i
≤ n
. Can this be done?
Theorem. If 2n + 1 is a prime, then this can be done.
Proof.
We may wlog assume the host is at 0. We want to partition
Z
p
\{
0
}
=
Z
×
p
into
n
pairs
{x
i
, x
i
+
d
i
}
. Consider the polynomial ring
Z
p
[
X
1
, . . . , X
n
] =
Z
p
[
X
].
We define
f(x) =
Y
i
X
i
(X
i
+d
i
)
Y
i<j
(X
i
−X
j
)(X
i
+d
i
−X
j
)(X
i
−X
j
−d
j
)(X
i
+d
i
−X
j
−d
j
).
We want to show this is not identically zero on Z
n
p
First of all, we have
deg f = 4
n
2
+ 2n = 2n
2
.
So we are good. The coefficient of X
2n
1
···X
2n
n
is the same as that in
Y
X
2
i
Y
i<j
(X
i
− X
j
)
4
=
Y
X
2
i
Y
i6=j
(X
i
− X
j
)
2
=
Y
X
2n
i
Y
i6=j
1 −
X
i
X
j
2
.
This, we are looking for the constant term in
Y
i6=j
1 −
X
i
X
j
2
.
By a question on the example sheet, this is
2n
2, 2, . . . , 2
6= 0 in Z
p
.
Our final example is as follows: suppose we are in
Z
p
, and
a
1
, . . . , a
p
and
c
1
, . . . , c
p
are enumerations of the elements, and
b
i
=
c
i
− a
i
. Then clearly we
have
P
b
i
= 0. Is the converse true? The answer is yes!
Theorem.
If
b
1
, . . . , b
p
∈ Z
p
are such that
P
b
i
= 0, then there exists numer-
ations
a
1
, . . . , a
p
and
b
1
, . . . , b
p
of the elements of
Z
p
such that for each
i
, we
have
a
i
+ b
i
= c
i
.
Proof.
It suffices to show that for all (
b
i
), there are distinct
a
1
, ··· , a
p−1
such
that a
i
+ b
i
6= a
j
+ b
j
for all i 6= j. Consider the polynomial
Y
i<j
(X
i
− X
j
)(X
i
+ b
i
− X
j
− b
j
).
The degree is
2
p −1
2
= (p −1)(p − 2).
We then inspect the coefficient of
X
p−2
1
···X
p−2
p−1
, and checking that this is
non-zero is the same as above.