3Computability theory
III Logic
3.5 Computability by λ-calculus
We first introduced
λ
-calculus in the setting of decorating natural deduction
proofs. Remarkably, it turns out
λ
-calculus can be used to encode any program
at all! For the sake of concreteness, we will properly define λ calculus.
Definition (λ terms). The set Λ of λ terms is defined recursively as follows:
– If x is any variable, then x ∈ Λ.
– If x is a variable and g ∈ Λ, then λx. g ∈ Λ.
– If f, g ∈ Λ, then (f g) ∈ Λ.
As always, we use variables sensibly, so that we don’t write things like
x
(
λx. fx
). We also write
f
(
x
) to denote a
λ
term that has a free variable
x
,
and then f(y) would mean taking f and replacing all occurrences of x with y.
We will often be lazy and write λxy.f instead of λx. λy. f.
We can define free variables and bound variables in the obvious way. If we
want to be really formal, we can define them as follows:
Definition (Free and bound variables).
– In the λ term x, we say x is a free variable.
– In the λ term λx. g, the free variables are all free variables of g except x.
–
In the
λ
term (
f g
), the free variables is the union of the free variables of
f and those of g,
The variables that are not free are said to be bound.
The definitions become more subtle when we have, say, a variable that is
free in
f
and bound in
g
, but this is not a course on pedantry. We will just not
worry about these, and keep ourself disciplined and not wrote things like that.
We will gloss over subtleties like this in our upcoming definitions as well.
As mentioned previously, we want to think of
λx. f
(
x
) as some sort of
function. So we want things like
(λx. f(x)) y = f (y)
to be true. Of course, they are not actually equal as expressions, but they are
more-or-less equivalent.
Definition
(
α
-equivalence)
.
We say two
λ
terms are
α
-equivalent if they are
the same up to renaming of bound variables.
Example. λx. x and λy. y are α-equivalent.
For any sane purposes at all in
λ
calculus,
α
-equivalent terms are really
considered equal.
Definition
(
β
-reduction)
.
If
f
= (
λx. y
(
x
))
z
and
g
=
y
(
z
), then we say
g
is
obtained from f via β-reduction.
This captures the idea that λ expressions are really functions.
Definition
(
η
-reduction)
. η
-conversion is the conversion from
λx.
(
f x
) to
f
,
whenever x is not free in f.
This captures “function extensionality”, i.e. the idea that functions are the
same iff they give the same results for all inputs.
Definition
(
β
-normal form)
.
We say a term is in
β
-normal form if it has no
possible β-reduction.
Our model of computation with
λ
calculus is as follows. A program is a
λ
expression
f
, and an input is another
λ
expression
x
. To run
f
on
x
, we take
f x
, and then try to
β
-reduce it as far as possible. If it reaches a
β
-normal form,
then we are done, and say that is the output. If we never get to a
β
-normal
form, then we say we don’t terminate.
To do so, we need a reduction strategy. It is possible that a
λ
expression can
be β-reduced in many ways, e.g. if we have
(λx. (λy. f y x) x) z,
then we can reduce this to either (
λy. f y z
)
z
or (
λx f x x
)
z
. These are not
α
-equivalent, but of course performing one more
β
-reduction will yield the same
terms. A reduction strategy is a prescription of which
β
-reduction we should
perform when there is more than one choice.
There are some things that might worry us. The first is that if we reduce in
different ways, then we might end up in different normal forms. Even worse, it
could be that one reduction strategy would not terminate, while another might
halt in 3 steps!
For this to work, we need the following theorem:
Theorem.
Every
λ
expression can be reduced to at most one
β
-normal form.
Moreover, there exists a reduction strategy such that whenever a
λ
expression
can be reduced to a
β
-normal form, then it will be reduced to it via this reduction
strategy.
This magic reduction strategy is just to always perform
β
-reduction on the
leftmost thing possible.
Notation ( ). We write f g if f β-reduces to g.
There is another thing we should note, which is currying. If we want to
implement a function with, say, 2 inputs, say a function
f
:
N × N → N
, we can
instead implement it as a function
˜
f : N → (N → N), where
˜
f(x)(y) = f(x, y).
This allows us to always work with functions of one variable that can potentially
return functions. We will usually just write the type as N → N → N.
Note that in this setting, officially, our
λ
terms do not have “types”. However,
it is often convenient to imagine they have types to make things less confusing,
and we will often do so.
With the formalities out of the way, we should start doing things. We claimed
that we can do programming with
λ
terms. The first thing we might want to
try is to encode numbers. How can we do so?
Recalling that types don’t exist in
λ
calculus, consider a “generic type”
A
.
We will imagine that natural numbers take in “functions”
A → A
and returns a
new function A → A. We write
N = (A → A) → (A → A)
for the “type” of natural numbers. The idea is that
n
represents the
λ
term that
sends f to the nth iteration f
n
.
Formally, we define them as follows:
Definition (Church numerals). We define
0 = K(id) = λf. λx. x : N
succ = λn. λf. λx. f ((n f) x) : N → N
We write n = succ
n
(0).
We can define arithmetic as follows:
plus = λn. λm. λf. λx. (n f) ((m f) x) : N → N → N
mult = λn. λm. λf. λx. (n (m f)) x : N → N → N
exp = λn. λm. λf. λx. ((n m) f)) x : N → N → N
Here exp n m = n
m
.
Of course, our “typing” doesn’t really make sense, but helps us figure out
what we are trying to say.
Certain people might prefer to write mult instead as
mult = λn. λm. λf. n (m f).
We can define a lot of other things. It is possible to give motivation for these,
and if one thinks hard enough, these are “obviously” the right definitions to
make. However, we are not really concerned about this, because what we really
want to talk about is recursion. We can define
true = λxy. x
false = λxy. y
if b then x else y = λbxy. b x y
iszero = λn. n (λx. false) true
pair = λxyf. f x y
fst = λp. p true
snd = λp. p false
nil = λx. true
null = λp. p (λxy. false)
We can check that
fst (pair x y) = x
snd (pair x y) = y
Using
pair
,
fst
and
snd
, we can implement lists just as pairs of pair, with functions
head, tail, cons satisfying
cons (head x) (tail x) = x.
We by convention always end our lists with a
nil
, and
null
is a test of whether
something is nil.
Exercise.
Implement the predecessor function that sends
n
to (
n −
1), if
n >
1,
and 0 otherwise. Also implement
and
,
or
, and test of equality for two general
church numerals.
We want to show that we can in fact represent all primitive recursive functions
with
λ
-terms. We will in fact do something stronger — we will implement
general recursion. This is a version of recursion that doesn’t always guarantee
termination, but will be useful if we want to implement “infinite structures”.
We look at an example. We take our favorite example fact : N → N
fact n = if iszero n then 1 else n · fact (n − 1).
But this is not a valid definition, since it is circular, and doesn’t really give us a
λ
expression. The trick is to introduce a new function
metafact
: (
N → N
)
→
(N → N), given by
metafact f n = if n = 0 then 1 else n · f (n − 1)
This is a genuinely legitimate definition of a function. Now suppose
f
is a fixed
point of metafact, i.e. f = metafact f. Then we must have
f(n) = (metafact f)(n) = if n = 0 then 1 else n · (f(n − 1)).
So f is in fact the factorial function!
Thus, we have reduced the problem of recursion to the problem of computing
fixed points. If we have a general fixed point operator, then we can implement
all recursive functions.
Definition (Y-combinator). The Y-combinator is
Y = λf.
h
(λx.f(x x))(λx.f(x x))
i
.
Theorem. For any g, we have
Y g g (Y g).
Proof.
Y g λf.
h
(λx.f(x x))(λx.f(x x))
i
g
(λx.(g(x x)))(λx.g(x x))
g
(λx.(g(x x)))(λx.g(x x))
We also need to implement the minimization operator. To compute
f
−1
(17),
our plan is as follows:
(i) Produce an “infinite list” (stream) of natural numbers.
(ii) Apply a function f to every element of the stream.
(iii) Find the first element in the stream that is sent to 17.
Note that infinite lists are genuinely allowed in
λ
-calculus. These behave like
lists, but they never have an ending
nil
element. We can always extract more
and more terms out of it.
We do them in a slightly different order
(ii)
We implement a function
map
that takes in a list or stream, and applies a
function f to every element of the list. We simply define it as
map f x = if null x then x else cons (f (head x)) (map f (tail x)).
Of course, we need to use our Y combinator to actually achieve this.
(i)
We can produce a stream of natural numbers by asking for the fixed point
of
λ`. cons 0(map succ `).
(iii)
We will actually apply
map
to
λn. pair n
(
f n
), so that we remember the
index. We then use the function
find n x = if snd (head x) = n then fst (head x) else find n (tail x).