1Proof theory and constructive logic
III Logic
1.2 Curry–Howard correspondence
Return to the days where we did IID Logic and Set Theory. Instead of natural
deduction, we bootstrapped our system with two axioms K and S:
K: A → (B → A)
S: [A → (B → C)] → [(A → B) → (A → C)]
It is a fact that
{K, S}
axiomatizes the conditional (
→
) fragment of constructive
propositional logic. To axiomatize the conditional fragment of classical logic, we
need a new law, Peirce’s law:
((A → B) → A) → A.
In particular, we cannot prove Peirce’s law from K and S.
How can we go about proving that Peirce’s law is unprovable? The most
elementary way of doing so would be to try to allow for more truth values, such
that
K
,
S
and modus ponens hold, but Peirce’s law does not. It turns out three
truth values are sufficient. We consider the following truth table for →:
→ 1 2 3
1 1 2 3
2 1 1 3
3 1 1 1
Here we interpret 1 as “true”, 3 as “false”, and 2 as “maybe”.
By writing out the truth tables, one can verify that the formulas
K
and
S
always take truth value 1. Also, if
A
and
A → B
are both 1, then so is
B
. So
by induction on the structure of the proof, we know that anything deduced from
K and S and modus ponens will have truth value 1.
However, if we put
A
= 2 and
B
= 3, then ((
A → B
)
→ A
)
→ A
has value 2,
not 1. So it cannot possibly be deduced from K and S.
This is, of course, a very unsatisfactory answer. The truth table was just
pulled out of a hat, and we just checked to see that Peirce’s law doesn’t hold. It
doesn’t tell us where the truth table came from, and why we can’t prove Peirce’s
law.
We will answer the second question first using the Curry–Howard correspon-
dence. Then we will see how we can come up with this truth table using possible
world semantics in the next chapter.
Consider the piece of syntax
A → B.
So far, we have viewed this as a logical formula —
A
and
B
are formulae, and
→
denotes implication. However, in mathematics, there is another common use
of this notation — we can think of
A
and
B
as sets, and
A → B
is the set of
all functions from
A
to
B
. The idea of the Curry–Howard correspondence is to
view implication as functions between sets.
While this might seem like a bizarre thing to do, it actually makes sense.
Given a formula
A
, we can view
A
as a “set”, whose “elements” are the proofs
of
A
. Now if we know
A → B
, this means any proof of
A
becomes a proof of
B
. In other words, we have a function
A → B
. As we move on, we will see that
this idea of viewing a formula
A
as the set of all proofs of
A
gives a very close
correspondence between the logical connectives and set-theoretic constructions.
This is also known as propositions as types, and it is common to call
A
,
B
etc “types” instead of “sets”.
Under this correspondence, modus ponens has a very natural interpretation.
We can just think of the rule
A A → B
B
as function application! If we have an element
a ∈ A
, and an element
f ∈ A → B
,
then we have
f
(
a
)
∈ B
. Now it is awkward to write
f ∈ A → B
instead of
f
:
A → B
. So in fact, we will write
f
:
A → B
instead. Moreover, we will use
the colon instead of ∈ for all types. So we can write
x : A f : A → B
f(x) : B
Often, we will just write
fx
instead of
f
(
x
). If we write
fxy
, then it means
(fx)y, but we will usually put brackets to indicate.
Example.
What is the interpretation of
K
under the Curry–Howard correspon-
dence? It corresponds to a function
A → (B → A).
Let’s try to think of a function that does this. Given an element
a
:
A
, we need
to find a function
B → A
. Since the only thing we know about is
a
, we could
return the constant function that always returns
a
. So for each
a
:
A
, we define
K(a) to be the constant function a.
This is indeed where the name
K
came from. It’s just that the inventor was
German, and so it is not C.
What about conjunction? To prove
A ∧ B
, we need to prove
A
and
B
. In
other words, an element of
A ∧ B
should be an element of the Cartesian product
A × B.
We can write the introduction rule as
x : A y : B
hx, yi : A ∧ B
Here we are using the angled brackets to denote ordered pairs. The elimination
rules are just given by the projection fst : A × B → A and snd : A × B → B:
x : A ∧ B
fst(x) : A
x : A ∧ B
snd(x) : B
Some people write fst and snd as π
1
and π
2
instead.
We are not going to talk about the rules for “or”, because they are horrendous.
But if we think hard about it, then
A ∨ B
corresponds to the disjoint union of
A and B.
So far, we only know how to decorate our LEGO pieces. When we assemble
them to get proofs, we can get more complicated tress. Suppose we had the tree
[A ∧ B]
1
∧-elim
B
[A ∧ B]
1
∧-elim
A
∧-int
B ∧ A
→-int (1)
(A ∧ B) → (B ∧ A)
We now try to decorate this tree:
x : [A ∧ B]
1
∧-elim
snd(x) : B
x : [A ∧ B]
1
∧-elim
fst(x) : A
∧-int
hsnd(x), fst(x)i : B ∧ A
→-int (1)
(A ∧ B) → (B ∧ A)
Note that both [
A ∧ B
]
1
are decorated using the same letter
x
. This is since
they are cancelled at the same time, so we are using the same instance of
A ∧ B
in the statement (A ∧ B) → (B ∧ A).
Now the remaining question to answer is how we are going to decorate the
→
-introduction. We want some notation that denotes the function that takes
x
and returns
hsnd
(
x
)
, fst
(
x
). More generally, if we have an expression
ϕ
(
x
) in
x
,
we want a notation that denotes the function that takes in x and returns ϕ(x).
The convention is to denote this using a λ:
λx.ϕ(x).
These expressions we produce are known as λ expressions.
For example, in the previous case, the final result is
λx.hfst(x), snd(x)i : (A ∧ B) → (B ∧ A).
The general decoration rule for →-introduction is
x : [A]
1
.
.
.
ϕ(x) : C
→-int (1)
λx.ϕ(x) : A → C
Recall that previously we had a rule for getting rid of maximal formulae. Given
a tree of the form
P
.
.
.
Q
→-int
P → Q
.
.
.
P
→-elim
Q
We can just move the second column to the top of the first to get
.
.
.
Q
This conversion corresponds to the conversion
(λx.ϕ(x))a ϕ(a),
which is just applying the function! This process is known as β-reduction.
Since there is
β
-reduction, there is, of course, also
α
-reduction, but this
is boring. It is just re-lettering
λx.φ
(
x
) to
λy.φ
(
y
). There will also be an
η-reduction, and we will talk about that later.
In the following exercise/example, note that we have a convention that if we
write A → B → C, then we mean A → (B → C).
Exercise. Find λ-expressions that encode proofs of
(i) ((A → B) ∧ (C → D)) → ((A ∧ C) → (B ∧ D))
(ii) (A → B) → (B → C) → (A → C)
(iii) ((A → B) → A) → (A → B) → B
(iv) (A → B → C) → (B → A → C)
(v) (A → B → C) → ((A ∧ B) → C)
(vi) (B ∧ A → C) → (A → B → C)
Note that statements of the form
A → B → C
are to be interpreted as
A →
(B → C).
Solutions.
(i) λf.λx.hfst(f)(fst(x)), snd(f)(snd(x))i
(ii) λf.λg.λa.g(fa)
(iii) λf.λg.g(fg)
(iv) λf.λb.λa.(fa)b
(v) λf.λx.f(fst x)(snd x)
(vi) λf.λa.λb.fhb, ai
One can write out the corresponding trees explicitly. For example, (iii) can be
done by
g : [A → B]
1
f : [(A → B) → A]
2
→-elim
fg : A g : [A → B]
1
→-elim
g(fg)
→-int (1)
λg.g(fg) : (A → B) → B
→-int (2)
λf.λg.g(fg) : (A → B) → A → (A → B) → B
Note that we always decorate assumptions with a single variable, say
f
, even
if they are very complicated. For example, if we have an assumption
A ∧ B
, it
might be tempting to decorate it as ha, bi, but we do not.
Now what we have is that for any proof tree, we obtain a
λ
formula. But for
any formula, it can have many proofs, and so it can be given by many
λ
formula.
Example. Consider
x : [A]
1
f : [A → A]
2
→-elim
fx : A
→-int (1)
λx.fx : A → A
→-int (2)
λf.λx.fx : (A → A) → (A → A)
This is really just the identity function!
We can also try something slightly more complicated.
x : [A]
1
f : [A → A]
2
fx : A f : [A → A]
2
f(fx) : A
→-int (1)
λx.f(f(x)) : A → A
→-int (2)
λf.λx.f(fx) : (A → A) → (A → A)
This is another
λ
term for (
A → A
)
→
(
A → A
). This says given any
f
,
we return the function that does
f
twice. These are two genuinely different
functions, and thus these two
λ
terms correspond to two different proofs of
(A → A) → (A → A).
Note that given any proof tree, we can produce some
λ
expression representing
the proof rather systematically. However, given any
λ
expression, it is difficult
to figure out what it is a proof of, or if it represents a proof at all. For example,
if we write down
λx.xx,
then we have no idea what this can possibly mean.
The problem is that we don’t know what formula, or type each variable is
representing. The expression
λf.λx.f(fx)
is confusing, because we need to guess that
x
is probably just some arbitrary
variable of type
A
, and that
f
is probably a function
A → A
. Instead, we can
make these types explicit:
λf
A→A
.λx
A
.f
A→A
(f
A→A
x
A
).
Given such a term, we can systematically deduce that it is of type (
A → A
)
→
(
A → A
), i.e. it encodes a proof of (
A → A
)
→
(
A → A
). Such an expression
with explicit types is known as a typed
λ
term, and one without types is known
as an untyped λ term.
Moreover, it is also straightforward to reconstruct the proof tree from this
expression. Of course, we need to make sure the typing of the expression is valid.
For example, we cannot write
f
A→A
x
B
,
since
f
is not a function from
B
. Assuming we restrict to these valid terms, we
have established a correspondence
Proofs in propositional natural deduction are in a 1-to-1 correspondence
with typed λ terms.
That is, if we also make it clear what the rules for ∨ are.
Finally, what about
⊥
? To what set does the proposition
⊥
correspond? The
proposition
⊥
should have no proofs. So we would think
⊥
should correspond
to ∅.
In propositional calculus, the defining property of
⊥
is that we can prove any
proposition we like from
⊥
. Indeed, the empty set is a set that admits a function
to any set
X
whatsoever, and in fact this function can be uniformly defined for
all sets, by the empty function. We again see that the ideas of propositional
calculus correspond nicely with the properties of sets.
Example. Consider the law of excluded middle:
((A → ⊥) → ⊥) → A.
Can we write down a λ expressions for this?
We will argue rather informally. If there is a
λ
expression of this type, then
it should define the function “uniformly” without regards to what
A
is about.
We now see what this function could be.
–
If
A
is empty, then
A → ⊥
contains exactly the empty function. So
(
A → ⊥
)
→ ⊥
is the empty set, since there is no function from a non-
empty set to the empty set. And there is a function
∅ → A
, namely the
empty function again.
–
If
A
is non-empty, then
A → ⊥
contains no functions, so (
A → ⊥
)
→ ⊥
contains the empty function. So ((
A → ⊥
)
→ ⊥
)
→ A
is given by exactly
an element of A.
We see that to construct such a function, we need to first know if
A
is empty or
not. If it is non-empty, then we have to arbitrarily pick an element of
A
. Thus,
there is no way we can uniformly construct such a function without knowing
anything about A.
Proposition.
We cannot prove ((
A → B
)
→ A
)
→ A
in natural deduction
without the law of excluded middle.
Proof. Suppose there were a lambda term P : ((A → B) → A) → A.
We pick
B = {0, 1}, A = {0, 1, 2, 3, 4}.
In this setting, any function f : A → B identifies a distinguished member of B,
namely the one that is hit by more members of
A
. We know
B ⊆ A
, so this
is an element in
A
. So we have a function
F
: (
A → B
)
→ A
. Then
P
(
F
) is a
member of A. We claim that in fact P (F ) ∈ B.
Indeed, we write
P
(
F
) =
a ∈ A
. Let
π
be a 3-cycle moving everything in
A \ B
and fixing
B
. Then this induces an action on functions among
A
and
B
by conjugation. We have.
π(P (F )) = π(a).
But since
P
is given by a
λ
term, it cannot distinguish between the different
memebers of A \ B. So we have
P (π(F )) = π(a).
We now claim that
π
(
F
) =
F
. By construction,
π
(
F
) =
π
−1
◦ F ◦ π
. Then for
all f : A → B, we have
π(F )(f) = (π
−1
◦ f ◦ π)(f) = (π
−1
◦ F )(π(f))
= π
−1
(F (π(f ))) = π
−1
(F (f)) = F (f).
Noting that
π
fixes
f
, the only equality that isn’t obvious from unravelling the
definitions is
F
(
π
(
f
)) =
F
(
f
). Indeed, we have
π
(
f
) =
π
−1
◦ f ◦ π
=
f ◦ π
. But
F
(
f ◦ π
) =
F
(
f
) because our explicit construction of
F
depends only on the
image of
f
. So
π
(
F
) =
F
. So this implies
a
=
π
(
a
), which means
a ∈ B
. So we
always have P (F ) ∈ B.
But this means we have found a way of uniformly picking a distinguished
element out of a two-membered set. Indeed, given any such set
B
, we pick any
three random objects, and combine it with
B
to obtain
A
. Then we just apply
P (F ). This is clearly nonsense.