3 Computability theory
3.1 Computability
We begin by discussing some number theory. Consider the Diophantine equation
x^2 + y^2 = z^2. (∗)
We can certainly write down some solutions to this equation, e.g.
3^2 + 4^2 = 5^2.
But is it possible to find all solutions? It is a well-known result that we can. Of
course, to do so, it suffices to enumerate all solutions where x, y, z are coprime.
It turns out every such solution of (∗) can be written as
x = m^2 − n^2, y = 2mn, z = m^2 + n^2
for some m, n ∈ N, and conversely, for any m, n ∈ N, the (x, y, z) given by this
formula is a solution. Thus, if we want to list all solutions to (∗), then it suffices
to plug in all possible values of m and n. In other words, we have an algorithm
that enumerates all solutions to (∗).
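To make this concrete, here is a short Haskell sketch (an illustration of mine, not part of the notes) that enumerates these solutions by sweeping over the pairs (m, n) along the diagonals m + n = s, so that every pair is reached after finitely many steps:

    -- Enumerate the triples (m^2 - n^2, 2mn, m^2 + n^2) for all m > n >= 1,
    -- walking the diagonals m + n = s so the list covers every such pair.
    triples :: [(Integer, Integer, Integer)]
    triples = [ (m*m - n*n, 2*m*n, m*m + n*n)
              | s <- [2..], n <- [1..s-1], let m = s - n, m > n ]

    -- take 4 triples == [(3,4,5),(8,6,10),(15,8,17),(5,12,13)]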
Given any Diophantine equation, it would be great to have some similar
algorithm that enumerates all solutions to the equation. But even before that, it
would be great to have some algorithms that tell us whether there is any solution
at all. This was Hilbert’s 10th problem — is there an algorithm that determines
whether or not a Diophantine equation has a solution?
If such an algorithm existed, then to prove this, we would merely have to exhibit
the algorithm and prove that it works. But how could we prove that there
isn’t one? To do so, we must formulate an “algorithm” as a mathematical object.
Alternatively, we want a formal notion of what a “computable” function is.
One way to do so is to define “computable functions” recursively. We will be
concerned with functions N^n → N^m for various n and m, and the idea is to list
some functions that any sensible computer should be able to compute, and then
close it under operations that really should preserve computability. We then
hope that the set of functions produced this way is exactly the functions that
can be “computed” by an “algorithm”.
We start with a naive attempt, and what we obtain is known as “primitive
recursive” functions.
Definition (Primitive recursive functions). The class of primitive recursive
functions N^n → N^m for all n, m is defined inductively as follows:
– The constantly zero function f : N^n → N, f(x) = 0, is primitive recursive.
– The successor function succ : N → N sending a natural number to its
successor (i.e. “plus one”) is primitive recursive.
– The identity function id : N → N is primitive recursive.
– The projection functions
proj^j_i : N^j → N, (x_1, · · · , x_j) ↦ x_i
are primitive recursive.
Moreover,
– Let f : N^k → N^m and g_1, · · · , g_k : N^n → N be primitive recursive. Then
the function
(x_1, · · · , x_n) ↦ f(g_1(x_1, · · · , x_n), · · · , g_k(x_1, · · · , x_n)) : N^n → N^m
is primitive recursive.
Finally, we have closure under primitive recursion:
– If g : N^k → N^m and f : N^{m+k+1} → N^m are primitive recursive, then so is
the function h : N^{k+1} → N^m defined by
h(0, x) = g(x)
h(succ n, x) = f(h(n, x), n, x).
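To see the shape of this scheme, here is a minimal Haskell sketch (my own illustration, not from the notes), modelling N by the Natural type and letting the type variable x stand for the tuple of parameters:

    import Numeric.Natural

    -- primRec g f is the function h of the definition:
    --   h(0, x)      = g(x)
    --   h(succ n, x) = f(h(n, x), n, x)
    primRec :: (x -> y) -> (y -> Natural -> x -> y) -> Natural -> x -> y
    primRec g _ 0 x = g x
    primRec g f n x = f (primRec g f (n - 1) x) (n - 1) x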
Do these give us everything we want? We first notice that we can define
basic arithmetic using primitive recursion by
plus(0, n) = n
plus(succ m, n) = succ(plus(m, n))
mult(0, n) = 0
mult(succ m, n) = plus(n, mult(m, n)).
Similarly, we can define exp, etc.
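In the Haskell sketch above, these equations transcribe directly:

    plus, mult :: Natural -> Natural -> Natural
    plus = primRec id        (\r _ _ -> succ r)    -- plus(succ m, n) = succ(plus(m, n))
    mult = primRec (const 0) (\r _ n -> plus n r)  -- mult(succ m, n) = plus(n, mult(m, n))

    -- plus 2 3 == 5, mult 4 3 == 12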
What else can we do? We might want to define the Fibonacci function
Fib(0) = Fib(1) = 1, Fib(n + 1) = Fib(n) + Fib(n − 1)?
But the definition of primitive recursion only allows us to refer to h(n, x) when
computing h(succ n, x). This isn’t really a problem. The trick is to not define
Fib directly, but to define the function
Fib′(n) = (Fib(n), Fib(n − 1))
instead, using primitive recursion, and then projecting back to the first
component. But this requires us to fix the number of things we are required to refer
back to. What if we want to refer back to everything less than n?
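Before moving on, here is the Fib trick in the running Haskell sketch, with a Haskell pair standing for a value of N^2 (shifted by one so that we never need Fib(−1)):

    -- fibPair n = (Fib(n + 1), Fib(n)), by a single primitive recursion;
    -- Fib itself is recovered by projecting out a component.
    fibPair :: Natural -> (Natural, Natural)
    fibPair n = primRec (\_ -> (1, 1)) (\(a, b) _ _ -> (a + b, a)) n ()

    fib :: Natural -> Natural
    fib = snd . fibPair

    -- map fib [0..6] == [1,1,2,3,5,8,13]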
The solution to this is that we can actually encode pairs as natural numbers
themselves:
Proposition. There exist primitive recursive functions pair : N^2 → N and
unpair : N → N^2 such that
unpair(pair(x, y)) = (x, y)
for all x, y ∈ N.
Proof. We can define the pairing function by
pair(x, y) = (x + y + 1 choose 2) + y = (x + y)(x + y + 1)/2 + y.
The unpairing function can also be shown to be primitive recursive, but it is
messier to write down.
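In the running Haskell sketch, the pairing function and a naive unpairing by search (rather than the messy closed form) look like this:

    -- pair(x, y) = (x + y + 1 choose 2) + y
    pair :: Natural -> Natural -> Natural
    pair x y = (x + y) * (x + y + 1) `div` 2 + y

    -- Since pair x y >= x and pair x y >= y, a bounded search suffices.
    unpair :: Natural -> (Natural, Natural)
    unpair n = head [ (x, y) | x <- [0..n], y <- [0..n], pair x y == n ]

    -- unpair (pair 3 7) == (3, 7)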
The benefit of this is that we can now encode lists as naturals. What would
it mean to be a list? We will informally denote a list by square brackets, e.g.
[x_1, · · · , x_n]. The idea is that we have functions head, tail, cons such that
head [x_1, · · · , x_n] = x_1
tail [x_1, · · · , x_n] = [x_2, · · · , x_n]
cons(x, [x_1, · · · , x_n]) = [x, x_1, · · · , x_n]
If we think a bit harder, then what we are really looking for is the following
property:
Corollary. There exist cons : N^2 → N, head : N → N and tail : N → N such that
cons(head x, tail x) = x
for all x ∈ N.
Proof. Take cons = pair; head to be the first projection of the unpairing and
tail to be the second projection.
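Here is one concrete way to realise such a coding in the Haskell sketch; the shift by one is my own convention (not in the notes), reserving 0 as a code for the empty list:

    -- Code [] as 0 and a non-empty list by 1 + pair(head, code of tail).
    nil :: Natural
    nil = 0

    cons :: Natural -> Natural -> Natural
    cons x xs = 1 + pair x xs

    hd, tl :: Natural -> Natural   -- undefined on nil, like head/tail on []
    hd n = fst (unpair (n - 1))
    tl n = snd (unpair (n - 1))

    -- hd (cons 4 (cons 5 nil)) == 4, tl (cons 4 (cons 5 nil)) == cons 5 nil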
So we can encode lists as well. We can lock ourselves in a room with enough
pen and paper, and try to prove that most functions N → N we encounter “in
daily life” are indeed primitive recursive.
However, it turns out there are some contrived examples of functions that really
should be computable, but are not primitive recursive.
Definition (Ackermann function). The Ackermann function is defined to be
A(0, n) = n + 1
A(m, 0) = A(m − 1, 1)
A(m + 1, n + 1) = A(m, A(m + 1, n)).
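The definition transcribes directly into the Haskell sketch; it is a perfectly good program, it just takes a very long time on even small inputs:

    -- The Ackermann function, with patterns tried top to bottom.
    ack :: Natural -> Natural -> Natural
    ack 0 n = n + 1
    ack m 0 = ack (m - 1) 1
    ack m n = ack (m - 1) (ack m (n - 1))

    -- ack 2 3 == 9, ack 3 3 == 61, and ack 4 2 already has 19729 digits.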
Proposition. The Ackermann function is well-defined.
Proof. To see this, note that computing A(m + 1, n + 1) requires knowledge of
the values of A(m + 1, n) and A(m, A(m + 1, n)).
Consider N × N ordered lexicographically. Then computing A(m + 1, n + 1)
requires knowledge of the values of A at pairs lying below ⟨m + 1, n + 1⟩ in this
order. Since the lexicographic order of N × N is well-ordered (it has order type
ω × ω), by transfinite induction, we know A is well-defined.
We can indeed use this definition to compute A, and it is going to take
forever. But we can still do it.
However, this function is not primitive recursive! Intuitively, this is because
the definition of A does “transfinite recursion over ω × ω”, while the definition
of primitive recursion only allows us to do “transfinite recursion over ω”.
That is the idea, but it is obviously not a proof. A proper proof proceeds by
showing that the Ackermann function “grows too fast” to be primitive recursive.
Definition (Dominating function). Let f, g : N → N be functions. Then we
write f < g if for all sufficiently large n, we have f(n) < g(n). We say g
dominates f.
It is a remarkable theorem that
Theorem. The function n ↦ A(n, n) dominates all primitive recursive functions.
We will not prove this, as it is messy and not really the point of the course.
The point of bringing up the Ackermann function is just to show that what we
have so far is not good enough!
The obvious fix would be to allow some more complicated form of recursion,
and hope that we will be safe. But this doesn’t really do the job.
It turns out the problem is that we are only defining total functions. If we
try to compute any of the primitive recursive functions, we are guaranteed
that we can do it in finite time. In particular, our functions are always defined
everywhere. But anyone with programming experience knows very well that
most programs we write are not everywhere defined. Often, they are nowhere
defined, and just don’t run, or perhaps get stuck in a loop forever! But we
did write some code for such a program, so surely this nowhere-defined function
should be computable as well!
Our way of thinking about functions that are not everywhere defined is via
non-termination. We imagine we have a computer program trying to compute
a function f. If f(42) is undefined, instead of the program running for a while
and then saying “I give up”, the program keeps running forever.
It turns out we can “fix” the problem merely by introducing one operation —
inverses. This is slightly subtle. Imagine we are writing a computer program, and
suppose we have a (computable) function f : N → N. We want to compute an
inverse to f. Say we want to find f^{−1}(17). To compute this, we just compute
f(0), f(1), f(2), etc., and if 17 is in the image, then we will eventually find
something that gets sent to 17.
There are two minor problems and one major problem with this:
– f might not be injective, and there are multiple values that get sent to 17.
If this happens, we just agree that we return the first natural number that
gets sent to 17, which is what is going to happen if we perform the above
procedure.
– f might not be surjective, and we can never find something that gets sent
to 17. But this is not a problem, since we just decided that our functions
don’t have to be total! If f is not surjective, we can just let the program
keep running forever.
But there is still a major problem. Since we decided functions need not be total,
what happens if f is not total? We might have f(23) = 17, while f(11) never
halts. Then the search above gets stuck computing f(11), and we never learn
that f(23) = 17.
We might think we can get around this problem by “diagonalizing”. We
assume our computers perform computations in discrete time steps. Then we
compute f(1) for one step; then f(1) and f(2) for two steps; then f(1), f(2)
and f(3) for three steps, etc. Then if f(23) = 17, and it takes 100 steps of
computation to compute this, then we will figure this out on the 100th iteration
of this process.
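In the Haskell sketch, this diagonalized search looks as follows, where runFor is a hypothetical ingredient (not defined here): runFor x t is meant to be Just v exactly if the computation of f(x) halts within t steps with output v, and Nothing otherwise.

    -- At stage t, run f(1), ..., f(t) for t steps each, returning the first
    -- argument observed to be sent to y. The search may run forever.
    dovetail :: (Natural -> Natural -> Maybe Natural) -> Natural -> Natural
    dovetail runFor y =
      head [ x | t <- [1..], x <- [1..t], runFor x t == Just y ]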
This doesn’t really solve our problem, though. If f is not injective, then this
procedure is not guaranteed to return the minimum x such that f(x) = 17. It
just returns some arbitrary x such that f(x) = 17. This is bad.
So we make a compromise. The obvious compromise is to just allow computing
inverses of total functions, but it is hard to tell which functions are total, so we
shall make a further compromise and just allow computing inverses of primitive
recursive functions, which are guaranteed to be total.
Before we continue, we define some useful conventions:
Notation. We write f(x) ↑ if f(x) is undefined, or alternatively, after we define
what this actually means, if the computation of f(x) doesn’t halt. We write
f(x) ↓ otherwise.
Definition (Partial recursive function). The class of partial recursive functions
is given inductively by
– Every primitive recursive function is partial recursive.
– The inverse of every primitive recursive function is partial recursive, where if
f : N^{k+n} → N^m, then the/an inverse of f is the function f^{−1} : N^{k+m} → N^n
given by
f^{−1}(x; y) = min{z : f(x, z) = y} if such a z exists, and f^{−1}(x; y) ↑ otherwise.
– The set of partial recursive functions is closed under primitive recursion
and composition.
Of course, this allows us to compute the inverse of functions of several inputs
using pairing and unpairing functions.
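In the one-variable case, inversion is just unbounded search, which we can sketch in Haskell (modelling the primitive recursive function simply as a total Haskell function):

    -- Least z with f z == y; the search diverges if no such z exists,
    -- which matches the ↑ case of the definition.
    invert :: (Natural -> Natural) -> Natural -> Natural
    invert f y = head [ z | z <- [0..], f z == y ]

    -- invert (\z -> z * z) 49 == 7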
It turns out this definition is “correct”. What do we mean by this?
What we have described so far are syntactic declarations of functions. We
describe how we can build up our functions. We can also define computability
semantically, by describing some rules for constructing machines that compute
functions. In other words, we actually define what a “computer” is, and then a
function is computable if it can be computed by a computer. What we want to
prove is that the partial recursive functions are exactly the functions that can
be computed by a machine.
We will not go into details about how we can define these machines. It is possible
to do so; it doesn’t really matter; and the actual details are messy. However,
any sensible construction of machines will satisfy the following description:
– A machine can be described by a finite string. In particular, it is possible
to number all the possible machines, so each machine can be specified
uniquely by a Gödel number. We write {m} for the machine corresponding
to the number m.
– A machine has a countable (enumerated) set of possible states, one of
which is called HALT. A machine also has a register that can store a
natural number.
– A machine can take in a fixed number of inputs. Afterwards, in every
discrete time step, it (potentially) changes the content of its register, and
moves to a new state. However, if the current state is HALT, then it does
nothing.
For inputs x_1, · · · , x_n into the machine {m}, we write
{m}(x_1, · · · , x_n) = the value in the register after halting, if the machine
halts, and {m}(x_1, · · · , x_n) ↑ otherwise.
Thus, we will also write {m} for the function computed by {m}.
– The above process is “computable” in the following precise sense — we
define Kleene’s T function T : N^3 → N, whose value T(m, i, t) encodes (in a
primitive recursive way) the states and register contents during the first
t steps in the evaluation of {m}(i). Then T is a primitive recursive function.
– For any partial recursive function f, there is a machine {m} such that
{m}(x) = f(x) for all x ∈ N. In particular, {m}(x) halts if and only if f(x) ↓.
Now of course, given our requirements for what a “machine” should be, it is
absolutely obvious that the functions computable by machines in the above
sense are exactly the partial recursive functions. The only perhaps
slightly non-trivial part is to show that every function computed by a machine
is partial recursive. To prove this, given a machine {m}, inversion gives us the
function
g(i) = min{t : T (m, i, t) has final state HALT},
and then we can define h(i) by computing T(m, i, g(i)) and then extracting the
register content in the final state, which we assumed is a primitive recursive
process. Then h(i) = {m}(i), and so {m} is a partial recursive function.