1.2 Curry–Howard correspondence
Return to the days where we did IID Logic and Set Theory. Instead of natural deduction, we bootstrapped our system with two axioms K and S:
\[
  \mathrm{K}\colon A \to (B \to A)
\]
\[
  \mathrm{S}\colon [A \to (B \to C)] \to [(A \to B) \to (A \to C)]
\]
It is a fact that $\{\mathrm{K}, \mathrm{S}\}$ axiomatizes the conditional ($\to$) fragment of constructive propositional logic. To axiomatize the conditional fragment of classical logic, we need a new law, Peirce's law:
\[
  ((A \to B) \to A) \to A.
\]
In particular, we cannot prove Peirce's law from K and S.
How can we go about proving that Peirce's law is unprovable? The most elementary way of doing so would be to try to allow for more truth values, such that K, S and modus ponens hold, but Peirce's law does not. It turns out three truth values are sufficient. We consider the following truth table for $\to$:
\[
  \begin{array}{c|ccc}
    \to & 1 & 2 & 3 \\ \hline
    1 & 1 & 2 & 3 \\
    2 & 1 & 1 & 3 \\
    3 & 1 & 1 & 1
  \end{array}
\]
Here we interpret 1 as "true", 3 as "false", and 2 as "maybe".
By writing out the truth tables, one can verify that the formulas K and S always take truth value 1. Also, if $A$ and $A \to B$ are both 1, then so is $B$. So by induction on the structure of the proof, we know that anything deduced from K and S and modus ponens will have truth value 1.

However, if we put $A = 2$ and $B = 3$, then $((A \to B) \to A) \to A$ has value 2, not 1. So it cannot possibly be deduced from K and S.
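As a quick check, here is a small Haskell sketch; the names `imp` and `vals` are ours, and the closed form `if a >= b then 1 else b` is just one way of encoding the truth table above. It enumerates all assignments of the three values:

```haskell
-- The three-valued implication table: a -> b is 1 when a >= b
-- (recall 1 = "true", 3 = "false"), and b otherwise.
imp :: Int -> Int -> Int
imp a b = if a >= b then 1 else b

vals :: [Int]
vals = [1, 2, 3]

main :: IO ()
main = do
  -- K and S take value 1 under every assignment ...
  print (and [ imp a (imp b a) == 1 | a <- vals, b <- vals ])
  print (and [ imp (imp a (imp b c)) (imp (imp a b) (imp a c)) == 1
             | a <- vals, b <- vals, c <- vals ])
  -- ... but Peirce's law fails, exactly at A = 2, B = 3
  print [ (a, b) | a <- vals, b <- vals
                 , imp (imp (imp a b) a) a /= 1 ]
```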
This is, of course, a very unsatisfactory answer. The truth table was just pulled out of a hat, and we just checked to see that Peirce's law doesn't hold. It doesn't tell us where the truth table came from, and why we can't prove Peirce's law.

We will answer the second question first using the Curry–Howard correspondence. Then we will see how we can come up with this truth table using possible world semantics in the next chapter.
Consider the piece of syntax
\[
  A \to B.
\]
So far, we have viewed this as a logical formula: $A$ and $B$ are formulae, and $\to$ denotes implication. However, in mathematics, there is another common use of this notation: we can think of $A$ and $B$ as sets, and $A \to B$ is the set of all functions from $A$ to $B$. The idea of the Curry–Howard correspondence is to view implication as functions between sets.
While this might seem like a bizarre thing to do, it actually makes sense. Given a formula $A$, we can view $A$ as a "set", whose "elements" are the proofs of $A$. Now if we know $A \to B$, this means any proof of $A$ becomes a proof of $B$. In other words, we have a function $A \to B$. As we move on, we will see that this idea of viewing a formula $A$ as the set of all proofs of $A$ gives a very close correspondence between the logical connectives and set-theoretic constructions. This is also known as propositions as types, and it is common to call $A$, $B$ etc. "types" instead of "sets".
Under this correspondence, modus ponens has a very natural interpretation. We can just think of the rule
\[
  \frac{A \quad A \to B}{B}
\]
as function application! If we have an element $a \in A$ and an element $f \in A \to B$, then we have $f(a) \in B$. Now it is awkward to write $f \in A \to B$ instead of $f : A \to B$. So in fact, we will write $f : A \to B$ instead. Moreover, we will use the colon instead of $\in$ for all types. So we can write
\[
  \frac{x : A \quad f : A \to B}{f(x) : B}
\]
Often, we will just write $fx$ instead of $f(x)$. If we write $fxy$, then it means $(fx)y$, though we will usually put in brackets to make this clear.
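Since we will use Haskell for the occasional illustration in this section, note that Haskell's notation agrees with this convention: application is juxtaposition and associates to the left. A trivial sketch:

```haskell
-- Modus ponens is function application: from x :: a and f :: a -> b, get b.
modusPonens :: a -> (a -> b) -> b
modusPonens x f = f x

-- Application associates to the left, so f x y means (f x) y:
leftAssoc :: (a -> b -> c) -> a -> b -> c
leftAssoc f x y = (f x) y   -- identical to f x y
```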
Example. What is the interpretation of K under the Curry–Howard correspondence? It corresponds to a function
\[
  A \to (B \to A).
\]
Let's try to think of a function that does this. Given an element $a : A$, we need to find a function $B \to A$. Since the only thing we have is $a$, we can return the constant function that always returns $a$. So for each $a : A$, we define $K(a)$ to be the constant function with value $a$.

This is indeed where the name K came from. It's just that the inventor was German, and so it is not C.
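As a sketch, both K and S can be written down directly in Haskell, and the inferred types are exactly the two axioms (Haskell's Prelude already provides K under the name `const`):

```haskell
-- K: A -> (B -> A), the constant function construction described above
k :: a -> b -> a
k x = \_ -> x        -- K(x) is the constant function with value x

-- S: [A -> (B -> C)] -> [(A -> B) -> (A -> C)]
s :: (a -> b -> c) -> (a -> b) -> (a -> c)
s f g = \x -> f x (g x)
```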
What about conjunction? To prove $A \wedge B$, we need to prove $A$ and $B$. In other words, an element of $A \wedge B$ should be an element of the Cartesian product $A \times B$. We can write the introduction rule as
\[
  \frac{x : A \quad y : B}{\langle x, y \rangle : A \wedge B}
\]
Here we are using the angled brackets to denote ordered pairs. The elimination rules are just given by the projections $\mathrm{fst} : A \times B \to A$ and $\mathrm{snd} : A \times B \to B$:
\[
  \frac{x : A \wedge B}{\mathrm{fst}(x) : A} \qquad \frac{x : A \wedge B}{\mathrm{snd}(x) : B}
\]
Some people write $\mathrm{fst}$ and $\mathrm{snd}$ as $\pi_1$ and $\pi_2$ instead.
We are not going to talk about the rules for "or", because they are horrendous. But if we think hard about it, then $A \vee B$ corresponds to the disjoint union of $A$ and $B$.
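In Haskell terms, conjunction is the pair type, and the disjoint union for "or" is `Either`; a minimal sketch:

```haskell
-- /\-introduction forms a pair; fst and snd are the two eliminations.
andIntro :: a -> b -> (a, b)
andIntro x y = (x, y)

-- \/ corresponds to Either: Left and Right introduce, case analysis eliminates.
orElim :: (a -> c) -> (b -> c) -> Either a b -> c
orElim f _ (Left x)  = f x
orElim _ g (Right y) = g y
```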
So far, we only know how to decorate our LEGO pieces. When we assemble them to get proofs, we can get more complicated trees. Suppose we had the tree
\[
  \infer[\to\text{-int } (1)]{(A \wedge B) \to (B \wedge A)}{
    \infer[\wedge\text{-int}]{B \wedge A}{
      \infer[\wedge\text{-elim}]{B}{[A \wedge B]^1}
      &
      \infer[\wedge\text{-elim}]{A}{[A \wedge B]^1}
    }
  }
\]
We now try to decorate this tree:
\[
  \infer[\to\text{-int } (1)]{(A \wedge B) \to (B \wedge A)}{
    \infer[\wedge\text{-int}]{\langle \mathrm{snd}(x), \mathrm{fst}(x) \rangle : B \wedge A}{
      \infer[\wedge\text{-elim}]{\mathrm{snd}(x) : B}{x : [A \wedge B]^1}
      &
      \infer[\wedge\text{-elim}]{\mathrm{fst}(x) : A}{x : [A \wedge B]^1}
    }
  }
\]
Note that both $[A \wedge B]^1$ are decorated using the same letter $x$. This is because they are cancelled at the same time, so we are using the same instance of $A \wedge B$ in the statement $(A \wedge B) \to (B \wedge A)$.
Now the remaining question to answer is how we are going to decorate the $\to$-introduction. We want some notation that denotes the function that takes $x$ and returns $\langle \mathrm{snd}(x), \mathrm{fst}(x) \rangle$. More generally, if we have an expression $\varphi(x)$ in $x$, we want a notation that denotes the function that takes in $x$ and returns $\varphi(x)$. The convention is to denote this using a $\lambda$:
\[
  \lambda x. \varphi(x).
\]
The expressions we produce are known as λ-expressions.
For example, in the previous case, the final result is
\[
  \lambda x. \langle \mathrm{snd}(x), \mathrm{fst}(x) \rangle : (A \wedge B) \to (B \wedge A).
\]
The general decoration rule for $\to$-introduction is
\[
  \infer[\to\text{-int } (1)]{\lambda x. \varphi(x) : A \to C}{
    \deduce{\varphi(x) : C}{\deduce{\vdots}{x : [A]^1}}
  }
\]
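The decorated tree above then assembles into a single Haskell term whose type is exactly $(A \wedge B) \to (B \wedge A)$; we call it `swapPair` (`Data.Tuple.swap` is the library version):

```haskell
-- \x. <snd(x), fst(x)> : (A /\ B) -> (B /\ A)
swapPair :: (a, b) -> (b, a)
swapPair = \x -> (snd x, fst x)
```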
Recall that previously we had a rule for getting rid of maximal formulae. Given a tree of the form
\[
  \infer[\to\text{-elim}]{Q}{
    \infer[\to\text{-int}]{P \to Q}{\deduce{Q}{\deduce{\vdots}{P}}}
    &
    \deduce{P}{\vdots}
  }
\]
we can just move the second column to the top of the first to get
\[
  \deduce{Q}{\vdots}
\]
This conversion corresponds to the conversion
\[
  (\lambda x. \varphi(x))\, a \rightsquigarrow \varphi(a),
\]
which is just applying the function! This process is known as $\beta$-reduction.
Since there is $\beta$-reduction, there is, of course, also $\alpha$-reduction, but this is boring. It is just re-lettering $\lambda x. \varphi(x)$ to $\lambda y. \varphi(y)$. There will also be an $\eta$-reduction, and we will talk about that later.
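As an illustration of β-reduction as an actual operation on syntax, here is a minimal Haskell sketch (a made-up `Term` type; we gloss over variable capture by assuming all bound variables are distinct):

```haskell
data Term = Var String | Lam String Term | App Term Term
  deriving Show

-- subst x s t replaces the free occurrences of x in t by s
subst :: String -> Term -> Term -> Term
subst x s (Var y)   = if x == y then s else Var y
subst x s (Lam y t) = if x == y then Lam y t else Lam y (subst x s t)
subst x s (App t u) = App (subst x s t) (subst x s u)

-- One step of beta-reduction at the root: (\x. t) a ~> t[a/x]
beta :: Term -> Term
beta (App (Lam x t) a) = subst x a t
beta t                 = t

-- beta (App (Lam "x" (Var "x")) (Var "y")) evaluates to Var "y"
```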
In the following exercise/example, note that we have a convention that if we write $A \to B \to C$, then we mean $A \to (B \to C)$.

Exercise. Find λ-expressions that encode proofs of
(i) $((A \to B) \wedge (C \to D)) \to ((A \wedge C) \to (B \wedge D))$
(ii) $(A \to B) \to (B \to C) \to (A \to C)$
(iii) $((A \to B) \to A) \to (A \to B) \to B$
(iv) $(A \to B \to C) \to (B \to A \to C)$
(v) $(A \to B \to C) \to ((A \wedge B) \to C)$
(vi) $((B \wedge A) \to C) \to (A \to B \to C)$
Solutions.
(i) $\lambda f. \lambda x. \langle \mathrm{fst}(f)(\mathrm{fst}(x)), \mathrm{snd}(f)(\mathrm{snd}(x)) \rangle$
(ii) $\lambda f. \lambda g. \lambda a. g(fa)$
(iii) $\lambda f. \lambda g. g(fg)$
(iv) $\lambda f. \lambda b. \lambda a. (fa)b$
(v) $\lambda f. \lambda x. f(\mathrm{fst}\ x)(\mathrm{snd}\ x)$
(vi) $\lambda f. \lambda a. \lambda b. f\langle b, a \rangle$
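One way to check the solutions is to transcribe them into Haskell, where the typechecker verifies that each term has the stated type; the names `ex1`–`ex6` are ours, and pairs play the role of conjunction:

```haskell
ex1 :: (a -> b, c -> d) -> (a, c) -> (b, d)
ex1 f x = (fst f (fst x), snd f (snd x))

ex2 :: (a -> b) -> (b -> c) -> (a -> c)
ex2 f g a = g (f a)

ex3 :: ((a -> b) -> a) -> (a -> b) -> b
ex3 f g = g (f g)

ex4 :: (a -> b -> c) -> (b -> a -> c)
ex4 f b a = (f a) b

ex5 :: (a -> b -> c) -> ((a, b) -> c)
ex5 f x = f (fst x) (snd x)

ex6 :: ((b, a) -> c) -> (a -> b -> c)
ex6 f a b = f (b, a)
```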
One can write out the corresponding trees explicitly. For example, (iii) can be done by
\[
  \infer[\to\text{-int } (2)]{\lambda f. \lambda g. g(fg) : ((A \to B) \to A) \to (A \to B) \to B}{
    \infer[\to\text{-int } (1)]{\lambda g. g(fg) : (A \to B) \to B}{
      \infer[\to\text{-elim}]{g(fg) : B}{
        \infer[\to\text{-elim}]{fg : A}{g : [A \to B]^1 & f : [(A \to B) \to A]^2}
        &
        g : [A \to B]^1
      }
    }
  }
\]
Note that we always decorate assumptions with a single variable, say $f$, even if they are very complicated. For example, if we have an assumption $A \wedge B$, it might be tempting to decorate it as $\langle a, b \rangle$, but we do not.
Now what we have is that for any proof tree, we obtain a λ-term. But a formula can have many proofs, and so it can be given by many different λ-terms.
Example. Consider
\[
  \infer[\to\text{-int } (2)]{\lambda f. \lambda x. fx : (A \to A) \to (A \to A)}{
    \infer[\to\text{-int } (1)]{\lambda x. fx : A \to A}{
      \infer[\to\text{-elim}]{fx : A}{x : [A]^1 & f : [A \to A]^2}
    }
  }
\]
This is really just the identity function!
We can also try something slightly more complicated.
\[
  \infer[\to\text{-int } (2)]{\lambda f. \lambda x. f(fx) : (A \to A) \to (A \to A)}{
    \infer[\to\text{-int } (1)]{\lambda x. f(fx) : A \to A}{
      \infer[\to\text{-elim}]{f(fx) : A}{
        \infer[\to\text{-elim}]{fx : A}{x : [A]^1 & f : [A \to A]^2}
        &
        f : [A \to A]^2
      }
    }
  }
\]
This is another λ-term for $(A \to A) \to (A \to A)$. This says: given any $f$, we return the function that does $f$ twice. These are two genuinely different functions, and thus these two λ-terms correspond to two different proofs of $(A \to A) \to (A \to A)$.
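In Haskell, these are two genuinely different programs of the same type:

```haskell
once, twice :: (a -> a) -> (a -> a)
once  f = \x -> f x      -- \f. \x. f x: essentially the identity
twice f = \x -> f (f x)  -- \f. \x. f (f x): apply f two times

-- once (+1) 0 == 1, but twice (+1) 0 == 2
```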
Note that given any proof tree, we can produce some λ-expression representing the proof rather systematically. However, given any λ-expression, it is difficult to figure out what it is a proof of, or if it represents a proof at all. For example, if we write down
\[
  \lambda x. xx,
\]
then we have no idea what this can possibly mean. The problem is that we don't know what formula, or type, each variable is representing. The expression
\[
  \lambda f. \lambda x. f(fx)
\]
is confusing, because we need to guess that $x$ is probably just some arbitrary variable of type $A$, and that $f$ is probably a function $A \to A$. Instead, we can make these types explicit:
\[
  \lambda f^{A \to A}. \lambda x^A.\, f^{A \to A}(f^{A \to A} x^A).
\]
Given such a term, we can systematically deduce that it is of type $(A \to A) \to (A \to A)$, i.e. it encodes a proof of $(A \to A) \to (A \to A)$. Such an expression with explicit types is known as a typed λ-term, and one without types is known as an untyped λ-term.
Moreover, it is also straightforward to reconstruct the proof tree from this expression. Of course, we need to make sure the typing of the expression is valid. For example, we cannot write
\[
  f^{A \to A} x^B,
\]
since $f$ is not a function from $B$. Assuming we restrict to these valid terms, we have established a correspondence:

Proofs in propositional natural deduction are in a 1-to-1 correspondence with typed λ-terms.

That is, if we also make it clear what the rules for $\wedge$ are.
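To make "systematically deduce" concrete, here is a minimal sketch of type reconstruction for explicitly annotated terms (all names here are ours):

```haskell
data Ty = TVar String | Ty :-> Ty
  deriving (Eq, Show)

data Tm = Var String | Lam String Ty Tm | App Tm Tm

type Ctx = [(String, Ty)]

-- infer reconstructs the type of an annotated term, i.e. the formula it
-- proves; Nothing means the typing is invalid.
infer :: Ctx -> Tm -> Maybe Ty
infer ctx (Var x)     = lookup x ctx
infer ctx (Lam x t m) = (t :->) <$> infer ((x, t) : ctx) m
infer ctx (App m n)   = do
  a :-> b <- infer ctx m              -- m must be a function ...
  a'      <- infer ctx n
  if a == a' then Just b else Nothing -- ... whose domain matches n's type
                                      -- (so no f^(A->A) x^B)

-- With a = TVar "A", the term \f^(A->A). \x^A. f (f x) is
--   Lam "f" (a :-> a) (Lam "x" a (App (Var "f") (App (Var "f") (Var "x"))))
-- and infer [] of it returns Just ((a :-> a) :-> (a :-> a)).
```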
Finally, what about $\bot$? To what set does the proposition $\bot$ correspond? The proposition $\bot$ should have no proofs. So we would think $\bot$ should correspond to $\emptyset$.

In propositional calculus, the defining property of $\bot$ is that we can prove any proposition we like from $\bot$. Indeed, the empty set is a set that admits a function to any set $X$ whatsoever, and in fact this function can be uniformly defined for all sets, by the empty function. We again see that the ideas of propositional calculus correspond nicely with the properties of sets.
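Haskell makes this literal: $\bot$ is the empty type `Data.Void.Void`, and the uniformly defined empty function is `absurd`:

```haskell
import Data.Void (Void, absurd)

-- From a proof of bottom, anything: the empty function, uniform in a.
exFalso :: Void -> a
exFalso = absurd
```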
Example. Consider the law of excluded middle, in the equivalent form of double-negation elimination:
\[
  ((A \to \bot) \to \bot) \to A.
\]
Can we write down a λ-expression for this?

We will argue rather informally. If there is a λ-expression of this type, then it should define the function "uniformly", without regard to what $A$ is. We now see what this function could be.

– If $A$ is empty, then $A \to \bot$ contains exactly the empty function. So $(A \to \bot) \to \bot$ is the empty set, since there is no function from a non-empty set to the empty set. And there is a function $((A \to \bot) \to \bot) \to A$, namely the empty function again.

– If $A$ is non-empty, then $A \to \bot$ contains no functions, so $(A \to \bot) \to \bot$ contains exactly the empty function. So a function $((A \to \bot) \to \bot) \to A$ is given by exactly an element of $A$.

We see that to construct such a function, we need to first know if $A$ is empty or not. If it is non-empty, then we have to arbitrarily pick an element of $A$. Thus, there is no way we can uniformly construct such a function without knowing anything about $A$.
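The contrast is visible in Haskell too, reading $\bot$ as `Void` (a sketch with our own names): double-negation introduction is a perfectly uniform program, while the informal argument above says the elimination direction admits no such program:

```haskell
import Data.Void (Void)

-- Reading "not A" as A -> Void, double-negation introduction is uniform:
dni :: a -> ((a -> Void) -> Void)
dni a = \k -> k a

-- But there is no (total, uniform) term of type ((a -> Void) -> Void) -> a:
-- it would have to produce an `a` while knowing nothing about `a`.
```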
Proposition. We cannot prove $((A \to B) \to A) \to A$ in natural deduction without the law of excluded middle.

Proof. Suppose there were a λ-term $P : ((A \to B) \to A) \to A$. We pick
\[
  B = \{0, 1\}, \qquad A = \{0, 1, 2, 3, 4\}.
\]
In this setting, any function $f : A \to B$ identifies a distinguished member of $B$, namely the one that is hit by more members of $A$; this is well-defined since $|A|$ is odd. We know $B \subseteq A$, so this is an element of $A$. So we have a function $F : (A \to B) \to A$. Then $P(F)$ is a member of $A$. We claim that in fact $P(F) \in B$.
Indeed, we write $P(F) = a \in A$. Let $\pi$ be a 3-cycle moving everything in $A \setminus B$ and fixing $B$. Then this induces an action on functions between $A$ and $B$ by conjugation. We have
\[
  \pi(P(F)) = \pi(a).
\]
But since $P$ is given by a λ-term, it cannot distinguish between the different members of $A \setminus B$. So we have
\[
  P(\pi(F)) = \pi(a).
\]
We now claim that $\pi(F) = F$. By construction, $\pi(F) = \pi^{-1} \circ F \circ \pi$. Then for all $f : A \to B$, we have
\[
  \pi(F)(f) = (\pi^{-1} \circ F \circ \pi)(f) = (\pi^{-1} \circ F)(\pi(f))
  = \pi^{-1}(F(\pi(f))) = \pi^{-1}(F(f)) = F(f).
\]
Noting that $\pi$ fixes $B$ pointwise, the only equality that isn't obvious from unravelling the definitions is $F(\pi(f)) = F(f)$. Indeed, we have $\pi(f) = \pi^{-1} \circ f \circ \pi = f \circ \pi$. But $F(f \circ \pi) = F(f)$, because our explicit construction of $F$ depends only on how many members of $A$ are sent to each member of $B$, and precomposing with the bijection $\pi$ does not change these counts. So $\pi(F) = F$. This implies $a = \pi(a)$; since $\pi$ moves every member of $A \setminus B$, it means $a \in B$. So we always have $P(F) \in B$.
But this means we have found a way of uniformly picking a distinguished element out of a two-membered set. Indeed, given any such set $B$, we pick any three random objects, and combine them with $B$ to obtain $A$. Then we just compute $P(F)$. This is clearly nonsense.
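For concreteness, the function $F$ from the proof can be sketched in Haskell (the names `bigA`, `bigB`, `bigF` are ours); the proposition says precisely that no uniform λ-term $P$ could consume such an $F$:

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

bigA, bigB :: [Int]
bigA = [0 .. 4]   -- A = {0, 1, 2, 3, 4}
bigB = [0, 1]     -- B = {0, 1}

-- F f = the member of B hit by more members of A; well-defined as |A| is odd.
bigF :: (Int -> Int) -> Int
bigF f = maximumBy (comparing hits) bigB
  where hits b = length [ a | a <- bigA, f a == b ]
```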