Part III — Quantum Computation
Based on lectures by R. Jozsa
Notes taken by Dexter Chua
Michaelmas 2016
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
Quantum mechanical processes can be exploited to provide new modes of information
pro cessing that are beyond the capabilities of any classical computer. This leads to
remarkable new kinds of algorithms (so-called quantum algorithms) that can offer
a dramatically increased efficiency for the execution of some computational tasks.
Notable examples include integer factorisation (and consequent efficient breaking of
commonly used public key crypto systems) and database searching. In addition to such
p otential practical benefits, the study of quantum computation has great theoretical
interest, combining concepts from computational complexity theory and quantum
physics to provide striking fundamental insights into the nature of both disciplines.
The course will cover the following topics:
Notion of qubits, quantum logic gates, circuit model of quantum computation. Basic
notions of quantum computational complexity, oracles, query complexity.
The quantum Fourier transform. Exp osition of fundamental quantum algorithms
including the Deutsch-Jozsa algorithm, Shor’s factoring algorithm, Grovers searching
algorithm.
A selection from the following further topics (and possibly others):
(i)
Quantum teleportation and the measurement-based model of quantum computa-
tion;
(ii) Lower bounds on quantum query complexity;
(iii) Phase estimation and applications in quantum algorithms;
(iv) Quantum simulation for local hamiltonians.
Pre-requisites
It is desirable to have familiarity with the basic formalism of quantum mechanics
especially in the simple context of finite dimensional state spaces (state vectors, Dirac
notation, composite systems, unitary matrices, Born rule for quantum measurements).
Prerequisite notes will be provided on the course webpage giving an account of the
necessary material including exercises on the use of notations and relevant calculational
techniques of linear algebra. It would be desirable for you to look through this material
at (or slightly before) the start of the course. Any encounter with basic ideas of classical
theoretical computer science (complexity theory) would b e helpful but is not essential.
Contents
0 Introduction
1 Classical computation theory
2 Quantum computation
3 Some quantum algorithms
3.1 Balanced vs constant problem
3.2 Quantum Fourier transform and periodicities
3.3 Shor’s algorithm
3.4 Search problems and Grover’s algorithm
3.5 Amplitude amplification
4 Measurement-based quantum computing
5 Phase estimation algorithm
6 Hamiltonian simulation
0 Introduction
Quantum computation is currently a highly significant and important subject,
and is very active in international research.
First of all, it is a fundamental connection between physics and computing.
We can think of physics as computing, where in physics, we label states with
parameters (i.e. numbers), and physical evolution changes these parameters.
So we can think of these parameters as encoding information, and physical
evolution changes the information. Thus, this evolution can be thought of as a
computational process.
More strikingly, we can also view computing as physics! We all have com-
puters, and usually represent information as bits, 0 or 1. We often think of
computation as manipulation of these bits, i.e. as discrete maths. However, there
is no actual discrete bits — when we build a computer, we need physical devices
to represent these bits. When we run a computation on a computer, it has to
obey the laws of physics. So we arrive at the idea that the limits of computation
are not a part of mathematics, but depend on the laws of physics. Thus, we can
associate a “computing power” with any theory of physics!
On the other hand, there is also a technology/engineering aspect of quantum
computation. Historically, we have been trying to reduce the size of computers.
Eventually, we will want to try to achieve miniaturization of computer compo-
nents to essentially the subatomic scale. The usual boolean operations we base
our computations on do not work so well on this small scale, since quantum
effects start to kick in. We could try to mitigate these quantum issues and
somehow force the bits to act classically, but we can also embrace the quantum
effects, and build a quantum computer! There is a lot of recent progress in
quantum technology. We are now expecting a 50-qubit quantum computer in full
coherent control soon. However, we are not going to talk about implementation
in this course.
Finally, apart from the practical problem of building quantum computers, we
also have theoretical quantum computer science, where we try to understand how
quantum algorithms behave. This is about how we can actually exploit quantum
physical facts for computational possibilities beyond classical computers. This
will be the focus of the course.
1 Classical computation theory
To appreciate the difference between quantum and classical computing, we need
to first understand classical computing. We will only briefly go over the main
ideas instead of working out every single technical detail. Hence some of the
definitions might be slightly vague.
We start with the notion of “computable”. To define computability, one
has to come up with a sensible mathematical model of a computer, and then
“computable” means that theoretical computer can compute it. So far, any two
sensible mathematical models of computations we manage to come up with are
equivalent, so we can just pick any one of them. Consequently, we will not spend
much time working out a technical definition of computable.
Example.
Let
N
be an integer. We want to figure out if
N
a prime. This is
clearly computable, since we can try all numbers less than
N
and see if it divides
N.
This is not too surprising, but it turns out there are some problems that are
not computable! Most famously, we have the Halting problem.
Example
(Halting problem)
.
Given the code of a computer program, we want
to figure out if the computer will eventually halt. In 1936, Turing proved that
this problem is uncomputable! So we cannot have a program that determines if
an arbitrary program halts.
For a less arbitrary problem, we have
Example.
Given a polynomial with integer coefficients with many variables,
e.g. 2
x
2
y −
17
zw
19
+
x
5
w
3
+ 1, does this have a root in the integers? It was
shown in 1976 that this problem is uncomputable as well!
These results are all for classical computing. If we exp e ct quantum computing
to be somehow different, can we get around this problems? This turns out not to
be the case, for the very reason that all the laws of quantum physics (e.g. state
descriptions, evolution equations) are all computable on a classical computer
(in principle). So it follows that quantum computing, being a quantum process,
cannot compute any classical uncomputable problem.
Despite this limitation, quantum computation is still interesting! In practice,
we do not only care about computability. We care about how efficient we are at
doing the computation. This is the problem of complexity — the complexity of
a quantum computation might be much simpler than the classical counterpart.
To make sense of complexity, we need to make our notion of computations a
bit more precise.
Definition
(Input string)
.
An input bit string is a sequence of bits
x
=
i
1
i
2
···i
n
,
where each
i
k
is either 0 or 1. We write
B
n
for the set of all
n
-bit string, and
B
=
S
n∈N
B
n
. The input size is the length
n
. So in particular, if the input is
regarded as an actual number, the size is not the number itself, but its logarithm.
Definition (Language). A language is a subset L ⊆ B.
Definition
(Decision problem)
.
Given a language
L
, the decision problem is to
determine whether an arbitrary
x ∈ B
is a member of
L
. The output is thus 1
bit of information, namely yes or no.
Of course, we can have a more general task with multiple outputs, but for
simplicity, we will not consider that case here.
Example.
If
L
is the set of all prime numbers, then the corresponding decision
problem is determining whether a number is prime.
We also have to talk about models of computations. We will only give an
intuitive and classical description of it.
Definition
(Computational model)
.
A computational model is a process with
discrete steps (elementary computational steps), where each step requires a
constant amount of effort/resources to implement.
If we think about actual computers that works with bits, we can imagine a
step as an operation such as “and” or “or”. Note that addition and multiplication
are not considered a single step — as the number gets larger, it takes more effort
to add or multiply them.
Sometimes it is helpful to allow some randomness.
Definition
(Randomized/probabilistic computation)
.
This is the same as a usual
computational mo del, but the process also has access to a string
r
1
, r
2
, r
3
, ···
of independent, uniform random bits. In this case, we will often require the
answer/output to be correct with “suitably good” probability.
In computer science, there is a separate notion of “non-deterministic” com-
putation, which is different from probabilistic computation. In probabilistic
computation, every time we ask for a random number, we just pick one of the
possible output and follows that. With a non-deterministic computer, we simul-
taneously consider all possible choices with no extra overhead. This is extremely
powerful, and also obviously physically impossible, but it is a convenient thing
to consider theoretically.
Definition
(Complexity of a computational task (or an algorithm))
.
The com-
plexity of a computational task or algorithm is the “consumption of resources as
a function of input size n”. The resources are usually the time
T (n) = number of computational steps needed,
and space
Sp(n) = number of memory/work space needed.
In each case, we take the worse case input of a given size n.
We usually consider the worst-case scenario, since, e.g. for primality testing,
there are always some numbers which we can easily rule out as being not
prime (e.g. even numbers). Sometimes, we will also want to study the average
complexity.
In the course, we will mostly focus on the time complexity, and not work
with the space complexity itself.
As one would imagine, the actual time or space taken would vary a lot on the
actual computational model. Thus, the main question we ask will be whether
T (n) grows polynomially or sup er-polynomially (“exponentially”) with n.
Definition (Polynomial growth). We say T (n) grows polynomially, and write
T (n) = O(poly(n)) = O(n
k
)
for some
k
, if there is some constant
c
, and some integer
k
and some integer
n
0
such that T (n) < cn
k
for all n > n
0
.
The other possible cases are exponential growth, e.g.
T
(
n
) =
c
1
2
c
2
n
, or
super-polynomial and sub-exponential growth such as T (n) = 2
√
n
or n
log n
.
We will usually regard polynomial time processes as “feasible in practice”,
while super-polynomial ones are considered “infeasible”. Of course, this is not
always actually true. For example, we might have a polynomial time of
n
10
10
10
,
or an exponential time of 2
0.0000...0001n
. However, this distinction of polynomial
vs non-polynomial is robust, since any computational model can “simulate” other
computational models in polynomial time. So if something is polynomial in one
computational model, it is polynomial in all models.
In general, we can have a more refined complexity classes of decision problems:
(i) P
(polynomial time): The class of decision problems having deterministic
polynomial-time algorithm.
(ii) BPP
(bounded error, probabilistic polynomial time): The class of decision
problems having probabilistic polynomial time algorithms such that for
every input,
Prob(answer is correct) ≥
2
3
.
The number
2
3
is sort of arbitrary — we see that we cannot put
1
2
, or
else we can just randomly guess a number. So we need something greater
than
1
2
, and “bounded” refers to it being bounded away from
1
2
. We could
replace
2
3
with any other constant
1
2
+
δ
with 0
< δ <
1
2
, and
BPP
is the
same. This is because if we have a
1
2
+
δ
algorithm, we simply repeat the
algorithm
K
times, and take the majority vote. By the Chernoff bound (a
result in probability), the probability that the majority vote is correct is
>
1
− e
−2δ
2
K
. So as we do more and more runs, the probability of getting
a right answer grows exponentially. This can be bigger than an 1
− ε
by
a suitably large
K
. Since
K
times a polynomial time is still polynomial
time, we still have a polynomial time algorithm.
These two are often considered as “classically feasible computations”, or “com-
putable in practice”. In the second case, we tolerate small errors, but that is fine
in practice, since in genuine computers, random cosmic rays and memory failures
can also cause small errors in the result, even for a deterministic algorithm.
It is clear that
P
is contained in
BPP
, but we do not know about the other
direction. It is not known whether
P
and
BPP
are the same — in general, not
much is known about whether two complexity classes are the same.
Example
(Primality testing)
.
Let
N
be an integer. We want to determine if it
is prime. The input size is
log
2
N
. The naive method of primality testing is to
test all numbers and see if it divides
N
. We only need to test up to
√
N
, since if
N
has a factor, there must be one below
√
N
. The is not polynomial time, since
we need
√
N = 2
1
2
log N
operations, we see that this is exponential time.
How about a probabilistic algorithm? We can choose a random
k < N
, and
see if
k
divides
N
. This is a probabilistic, polynomial time algorithm, but it is
not bounded, since the probability of getting a correct answer is not >
1
2
.
In reality, primality testing is known to be in
BPP
(1976), and it is also
known to be in P (2004).
Finally, we quickly describe a simple model of (classical) computation that we
will use to build upon later on. While the most famous model for computation
is probably the Turing machine, for our purposes, it is much simpler to work
with the circuit model.
The idea is simple. In general, we are working with bits, and a program is a
function
f
:
B
m
→ B
n
. It is a mathematical fact that any such function can be
constructed by combinations of boolean
AND
,
OR
and
NOT
gates. We say that
this is a universal set of gates. Thus a “program” is a specification of how to
arrange these gates in order to give the function we want, and the time taken by
the circuit is simply the number of gates we need.
Of course, we could have chosen a different universal set of gates, and the
programs would be different. However, since only a fixed number of gates is
needed to construct
AND
,
OR
and
NOT
from any universal set, and vice versa,
it follows that the difference in time is always just polynomial.
2 Quantum computation
We are now going to start talking ab out quantum computation. Our model of
quantum computation will be similar to the circuit model.
The main difference is that instead of working with bits, we work with qubits.
A single qubit is an element of C
2
, with basis vectors
|0i =
1
0
, |1i =
0
1
.
When we have multiple qubits, we write them as
|ai|bi
, which is a shorthand
for |ai ⊗ |bi etc.
Now any classical bit string x = i
1
i
2
···i
n
can be encoded as a qubit
|i
1
i|i
2
i···|i
n
i|0i···|0i ∈
n+k
O
i=0
C
2
∼
=
C
2
n+k
,
where we padded
k
extra zeroes at the end. In classical computation, there was
no such need, because within any computation, we were free to introduce or
remove extra bits. However, one peculiarity of quantum computation is that all
processes are invertible, and in particular, the number of qubits is always fixed.
So if we want to do something “on the side” during the computations, the extra
bits needed to do so must be supplied at the beginning.
Now the quantum gates are not just bo olean functions, but unitary operators
on the states. The standard gates will operate on one or two qubits only, and
we can chain them together to get larger operators.
We now list our standard unitary gates. The four main (families) single-qubit
gates we will need are the following (in the standard |0i, |1i basis):
X =
0 1
1 0
, Z =
1 0
0 −1
, H =
1
√
2
1 1
1 −1
, P
ϕ
=
1 0
0 e
ϕ
We also have two “controlled” gates. These controlled gates take in two qubits.
It does not directly change the first qubit, and will decide whether or not to act
on the second bit depending on the value of the first. They are given by
CX |ii|ji = |iiX
i
|ji, CZ |ii|ji = |ii
In the basis {|0i|0i, |0i|1i, |1i|0i, |1i|1i}, we can write these operators as
CX =
1 0 0 0
0 1 0 0
0 0 0 1
0 0 1 0
, CZ =
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 −1
,
These will be taken as the basic unitary gates.
The other thing we are allowed to do in a quantum system is measurement.
What does this do? We wlog we are measuring the first qubit. Suppose the state
before the measurement is given by
c
0
|0i|ai + c
1
|1i|bi,
where |ai, |bi are (n −1)-qubit states of unit norm, and |c
0
|
2
+ |c
1
|
2
= 1.
Then when we measure the first qubit, we have probability
|c
0
|
2
of getting 0,
and probability
|c
1
|
2
of getting 1. After measuring, the resulting state is either
|0i|ai or |1i|bi, depending on whether we measured 0 or 1 respectively.
Measurements are irreversible, and in particular aren’t given by unitary
matrices. We will allow ourselves to make classical computations based on the
results of the measurements, and decide which future quantum gates we apply
based on the results of the classical computations.
While this seems like a very useful thing to do, it is a mathematical fact
that we can modify such a system to an equivalent quantum one where all
the measurements are done at the end instead. So we are not actually adding
anything new to our mathematical model of computation by allowing such
classical manipluations.
Now what is the analogous notion of a universal set of gates? In the classical
case, the set of boolean functions is discrete, and therefore a finite set of gates
can be universal. However, in the quantum case, the possible unitary matrices
are continuous, so no finite set can be universal (more mathematically, there is
an uncountable number of unitary matrices, but a finite collection of gates can
only generate a countable subgroup of matrices).
Thus, instead of asking for universality, we ask for approximate universality.
To appreciate this, we can take the example of rotations — there is no single
rotation that generates all possible rotations in
R
2
. However, we can pick a
rotation by an irrational angle, and then the set of rotations generated by this
rotation is dense in the set of all rotations, and this is good enough.
Definition
(Approximate universality)
.
A collection of gates is approximately
universal if for any unitary matrix
U
and any
ε >
0, there is some circuit
˜
U
built out of the collection of gates such that
U −
˜
U
< ε.
In other words, we have
sup
kψk=1
U |ψi −
˜
U |ψi
< ε,
where we take the usual norm on the vectors (any two norms are equivalent if
the state space is finite dimensional, so it doesn’t really matter).
We will provide some examples without proof.
Example. The infinite set {CX} ∪ {all 1-qubit gates} is exactly universal.
Example. The collection
{H, T = P
π/4
, CX}
is approximately universal.
Similar to the case of classical computation, we can define the following
complexity class:
Definition
(
BQP
)
.
The complexity class
BQP
(bounded error, quantum
polynomial time) is the class of all decision problems computable with polynomial
quantum circuits with at least 2/3 probability of being correct.
We can show that
BQP
is independent of choice of approximately universal
gate set. This is not as trivial as the classical case, since when we switch to
a different set, we cannot just replace a gate with an equivalent circuit — we
can only do so approximately, and we have to make sure we control the error
appropriately to maintain the bound of 2/3.
We will consider
BQP
to be the feasible computations with quantum com-
putations.
It is also a fact that
BPP
is a subset of
BQP
. This, again, is not a trivial
result. In a quantum computation, we act by unitary matrices, which are
invertible. However, boolean functions in classical computing are not invertible
in general. So there isn’t any straightforward plug-in replacement.
However, it turns out that for any classical computation, there is an equivalent
computation that uses reversible/invertible boolean gates, with a modest (i.e.
polynomial) overhead of both space and time resources. Indeed, let
f
:
B
m
→ B
n
be a boolean function. We consider the function
˜
f : B
m+n
→ B
m+n
(x, y) 7→ (x, y ⊕ f (x)),
where
⊕
is the bitwise addition (i.e. addition in (
Z/
2
Z
)
n
, e.g. 011
⊕
110 = 101).
We call x and y the input register and output register respectively.
Now if we set
y
= 0, then we get
f
(
x
) in the second component of
˜
f
. So we
can easily obtain f from
˜
f, and vice versa.
Lemma. For any boolean function f : B
m
→ B
n
, the function
˜
f : B
m+n
→ B
m+n
(x, y) 7→ (x, y ⊕ f (x)),
is invertible, and in fact an involution, i.e. is its own inverse.
Proof.
Simply note that
x ⊕ x
= 0 for any
x
, and bitwise addition is associative.
So we can just consider boolean functions that are invertible. There is an
easy way of making this a unitary matrix.
Lemma.
Let
g
:
B
k
→ B
k
be a reversible permutation of
k
-bit strings. Then
the linear map on C
k
defined by
A : |xi 7→ |g(x)i
on k qubits is unitary.
Proof.
This is because the
x
th column of the matrix of
A
is in fact
A |xi
=
|g(x)i
,
and since g is bijective, the collection of all |g(x)i are all orthonormal.
Thus, given any
f
:
B
m
→ B
n
, we get the
n
+
m
-qubit unitary matrix
denoted by U
f
, given by
U
f
|xi|yi = |xi|y ⊕ f (x)i.
In particular, if we set |yi = |0 ···0i, then we get
|xi|0 ···0i |xi|f(x)i
U
f
,
which we can use to evaluate f(x).
What does quantum computation give us? Our gate
U
f
is unitary, and in
particular acts linearly on the states. So by linearity, we have
1
√
2
n
X
x
|xi|0 ···0i
1
√
2
n
X
x
|xi|f(x)i
U
f
.
Now one run of
U
f
gives us this state that embodies all exp onentially many
values of
f
(
x
)’s. Of course, we still need to figure out how we can extract useful
information from this mixed state, and we will figure that out later on.
While the state
|ψi =
1
√
2
n
X
x
|xi
has exponentially many terms, it can be made in polynomial (and in fact linear)
time by n applications of H. Indeed, recall that H is given by
|0i
1
√
2
(|0i + |1i)
H
So for an n-qubit state, we have
|0i···|0i
1
√
2
n
(|0i + |1i) ···(|0i + |1i)
H⊗···⊗H
,
and expanding the right hand side gives us exactly what we want.
3 Some quantum algorithms
3.1 Balanced vs constant problem
We are going to come to our first quantum algorithm.
Here our computational task is a bit special. Instead of an input
i
1
, ··· , i
n
∈
B
n
, we are given a black box /oracle that computes some
f
:
B
m
→ B
n
. We may
have some a priori promise on
f
, and we want to determine some property of
the function f . The only access to f is querying the oracle with its inputs.
The use of
f
(classical) or
U
f
(quantum) counts as one step of computation.
The query complexity of this task is the least number of times the oracle needs
to be queried. Usually, we do not care much about how many times the other
gates are used.
Obviously, if we just query all the values of the function, then we can
determine anything about the function, since we have complete information. So
the goal is to see if we can do it with fewer queries.
The problem we are going to look at is the balanced vs constant problem.
The input black box is a function
f
:
B
n
→ B
1
. The promise is that
f
is either
(i) a constant function, i.e. f(x) = 0 for all x, or f(x) = 1 for all x; or
(ii) a balanced function, i.e. exactly half of the values in B
n
are sent to 1.
We want to determine if f is (i) or (ii) with certainty.
Classically, if we want to find the answer with certainty, in the worst case
scenario, we will have to perform 2
n−1
+ 1 queries — if you are really unlucky,
you might query a balanced function 2
n−1
times and get 0 all the time, and you
can’t distinguish it from a constant 0 function.
Quantumly, we have the Deutsch-Jozsa algorithm, that answers the question
in 1 query!
A trick we are going to use is something known as “phase kickback”. Instead
of encoding the result as a single bit, we encode them as
±
signs, i.e. as phases
of the quantum bits. The “kickback” part is about using the fact that we have
|ai(e
iθ
|bi) = (e
iθ
|ai) |bi,
So we might do something to
|bi
to give it a phase, and then we “kick back” the
phase on |ai, and hopefully obtain something when we measure |ai.
Recall that we have
U
f
|xi|yi = |xi|y ⊕ f (x)i.
Here |xi has n qubits, and |yi has 1 qubit.
The non-obvious thing to do is to set the output register to
|αi =
|0i − |1i
2
= H |1i = HX |0i.
We then note that U
f
acts by
|xi
|0i − |1i
2
7→ |xi
|f(x)i − |1 ⊕f(x)i
√
2
=
(
|xi
|0i−|1i
√
2
if f (x) = 0
|xi
|1i−|0i
√
2
if f (x) = 1
= (−1)
f(x)
|xi|αi.
Now we do this to the superposition over all possible x:
1
√
2
n
X
|xi|αi 7→
1
√
2
n
X
(−1)
f(x)
|xi
|αi.
So one query gives us
|ξ
f
i =
1
√
2
n
X
(−1)
f(x)
|xi.
The key observation now is simply that if
f
is constant, then all the signs are
the same. If
x
is balanced, then exactly half of the signs are + and
−
. The
crucial thing is that
|ξ
f
const
i
is orthogonal to
|ξ
f
balanced
i
. This is a good thing,
since orthogonality is something we can p erfectly distinguish with a quantum
measurement.
There is a slight technical problem here. We allow only measurements in the
standard
|0i
,
|1i
basis. So we need want to “rotate” our states to the standard
basis. Fortunately, recall that
|0i···|0i
1
√
2
n
P
|xi
H⊗···⊗H
,
Now recall that H is self-inverse, so
H
2
=
I
. Thus, if we apply H
⊗ ··· ⊗
H to
1
√
2
n
P
|xi, then we obtain |0i···|0i.
We write
|η
f
i = H ⊗··· ⊗H |ξ
f
i.
Since H is unitary, we still have
|η
f
const
i ⊥ |η
f
balanced
i.
Now we note that if f is constant, then
η
f
const
= ±|0i···|0i.
If we look at what
|η
f
balanced
i
is, it will be a huge mess, but it doesn’t really
matter — all that matters is that it is perpendicular to |0i···|0i.
Now when we measure
η
f
, if
f
is a constant function, then we obtain 0
···
0
with probability 1. If it is balanced, then we obtain something that is not 0 with
probability 1. So we can determine the result with probability 1.
|0i
.
.
.
|0i
|0i
H
H
H
X H
U
f
H
H
H
input
output
measure
discard
This uses exactly one query, with 1 + (
n
+ 1) +
n
+
n
=
O
(
n
) elementary gates
and measurements.
What if we tolerate error in the balanced vs constant problem? In other
words, we only require that the answer is correct with probability 1
− ε
with
0 < ε <
1
2
.
In the quantum case, nothing much changes, since we are probably not going
to do better than 1 query. However, we no longer have a huge benefit over
classical algorithms. There is a classical randomized algorithm with
O
(
log
(1
/ε
))
queries, and in particular does not depend on n.
Indeed, we do it the obvious way — we choose some
K x
-values uniformly at
random from
B
n
, say
x
1
, ··· , x
n
(where
K
is fixed and determined later). We
then evaluate f(x
1
), ··· , f (x
n
).
If all the outputs are the same, then we say
f
is constant. If they are not
the same, then we say f is balanced.
If
f
actually is constant, then the answer is correct with probability 1. If
f
is balanced, then each
f
(
x
i
) is 0 or 1 with equal probability. So the probability
of getting the same values for all x
i
is
2
2
K
= 2
1−K
.
This is our failure probability. So if we pick
K > log
2
(ε
−1
) + 1,
then we have a failure probability of less than ε.
Can we decide every yes/no question about
f
:
B
n
→ B
1
’s by quantum
algorithms with “a few” queries? The answer is no. One prominent example is
the SAT problem (satisfiability problem) — given an arbitrary
f
, we want to
determine if there an
x
such that
f
(
x
) = 1? It can be shown that any quantum
algorithm (even if we allow for bounded errors) needs at least
O
(
√
2
n
) queries,
which is achieved by Grover’s algorithm. Classically, we need
O
(2
n
) queries. So
we have achieved a square root speedup, which is good, but not as good as the
Deutsch-Jozsa algorithm.
In any case, the Deutsch-Jozsa algorithm demonstrates how we can achieve
an exponential benefit with quantum algorithms, but it happ ens only when
we have no error tolerance. In real life scenario, external factors will lead to
potential errors anyway, and requiring that we are always correct is not a sensible
requirement.
There are other problems where quantum algorithms are better:
Example
(Simon’s algorithm)
.
The Simon’s problem is a promise problem ab out
f
:
B
n
→ B
n
with provably exponential separation between classical (
O
(2
n/4
))
and quantum (
O
(
n
)) query complexity even with bounded error. The details
are on the first example sheet.
3.2 Quantum Fourier transform and periodicities
We’ve just seen some nice examples of benefits of quantum algorithms. However,
oracles are rather unnatural problems — it is rare to just have a black-box access
to a function without knowing anything else about the function.
How about more “normal” problems? The issue with trying to compare
quantum and classical algorithms for “normal” problems is that we don’t actually
have any method to find the lower bound for the computation complexity. For
example, while we have not managed to find polynomial prime factorization
algorithms, we cannot prove for sure that there isn’t any classical algorithm that
is polynomial time. However, for the prime factorization problem, we do have a
quantum algorithm that does much better than all known classical algorithms.
This is Shor’s algorithm, which relies on the toolkit of the quantum Fourier
transform.
We start by defining the quantum Fourier transform.
Definition
(Quantum Fourier transform mod
N
)
.
Suppose we have an
N
-
dimensional state space with basis
|0i, |1i, ··· , |N − 1i
labelled by
Z/N Z
. The
quantum Fourier transform mod N is defined by
QFT : |ai 7→
1
√
N
N−1
X
b=0
e
2πiab/N
|bi.
The matrix entries are
[QFT]
ab
=
1
√
N
ω
ab
, ω = e
2πi/N
,
where
a, b
= 0
,
1
, ··· , N −
1. We write
QFT
n
for the quantum Fourier transform
mod n.
Note that we are civilized and start counting at 0, not 1.
We observe that the matrix
√
NQFT is
(i) Symmetric
(ii) The first (i.e. 0th) row and column are all 1’s.
(iii)
Each row and column is a geometric progression 1
, r, r
2
, ··· , r
n−1
, where
r = ω
k
for the kth row or column.
Example.
If we look at
QFT
2
, then we get our good old H. However,
QFT
4
is
not H ⊗ H.
Proposition. QFT is unitary.
Proof. We use the fact that
1 + r + ··· + r
N−1
=
(
1−r
N
1−r
r 6= 1
N r = 1
.
So if r = ω
k
, then we get
1 + r + ··· + r
N−1
=
(
0 k 6≡ 0 mod N
N k ≡ 0 mod N
.
Then we have
(QFT
†
QFT)
ij
=
1
√
N
2
X
k
ω
−ik
ω
jk
=
1
N
X
k
ω
(j−i)k
=
(
1 i = j
0 i 6= j
.
We now use the quantum Fourier Transform to solve the periodicity problem.
Example.
Suppose we are given
f
:
Z/N Z → Y
(for some set
Y
). We are
promised that f is periodic with some period r | N, so that
f(x + r) = f (x)
for all x. We also assume that f is injective in each period, so that
0 ≤ x
1
6= x
2
≤ r − 1 implies f(x
1
) 6= f (x
2
).
The problem is to find
r
, with any constant level of error 1
− ε
independent of
N. Since this is not a decision problem, we can allow ε >
1
2
.
In the classical setting, if
f
is viewed as an oracle, then
O
(
√
N
) queries are
necessary and sufficient. We are going to show that quantumly,
O
(
log log N
)
queries with
O
(
poly
(
log N
)) processing steps suffice. In later applications, we
will see that the relevant input size is
log N
, not
N
. So the classical algorithm is
exponential time, while the quantum algorithm is polynomial time.
Why would we care about such a problem? It turns out that later we will
see that we can reduce prime factorization into a periodicity problem. While
we will actually have a very explicit formula for
f
, there isn’t much we can do
with it, and treating it as a black box and using a slight modification of what
we have here will be much more efficient than any known classical algorithm.
The quantum algorithm is given as follows:
(i)
Make
1
√
N
P
N−1
x=0
|xi
. For example, if
N
= 2
n
, then we can make this using
H
⊗ ··· ⊗
H. If
N
is not a power of 2, it is not immediately obvious how
we can make this state, but we will discuss this problem later.
(ii) We make one query to get
|fi =
1
√
N
X
|xi|f(x)i.
(iii)
We now recall that
r | N
, Write
N
=
Ar
, so that
A
is the number of
periods. We measure the second register, and we will see some
y
=
f
(
x
).
We let
x
0
be the least
x
with
f
(
x
) =
y
, i.e. it is in the first period. Note
that we don’t know what x
0
is. We just know what y is.
By periodicity, we know there are exactly
A
values of
x
such that
f
(
x
) =
y
,
namely
x
0
, x
0
+ r, x
0
+ 2r, ··· , x
0
+ (A − 1)r.
By the Born rule, the first register is collapsed to
|peri =
1
√
A
A−1
X
j=0
|x
0
+ jri
|f(x
0
)i.
We throw the second register away. Note that
x
0
is chosen randomly from
the first period 0, 1, ··· , r − 1 with equal probability.
What do we do next? If we measure
|peri
, we obtain a random
j
-value, so
what we actually get is a random element (
x
0
th) of a random p eriod (
j
th),
namely a uniformly chosen random number in 0
,
1
, ··· , N
. This is not too
useful.
(iv)
The solution is the use the quantum Fourier transform, which is not sur-
prising, since Fourier transforms are classically used to extract perio dicity
information.
Apply QFT
N
to |peri now gives
QFT
N
|peri =
1
√
NA
n−1
X
j=0
N−1
X
y=0
ω
(x
0
+jr)y
|yi
=
1
√
NA
N−1
X
y=0
ω
x
0
y
n−1
X
j=0
ω
jry
|yi
We now see the inner sum is a geometric series. If
ω
ry
= 1, then this sum
is just A. Otherwise, we have
A−1
X
j=0
ω
jry
=
1 − ω
rA
1 − ω
ry
=
1 − 1
1 − ω
ry
= 0.
So we are left with
QFT
n
|peri =
r
A
N
r−1
X
k=0
ω
x
0
kN/r
k
N
r
.
Note that before the Fourier transform, the random shift of
x
0
lied in the
label
|x
0
+ jri
. After the Fourier transform, it is now encoded in the phase
instead.
(v)
Now we can measure the label, and we will get some
C
which is a multiple
k
0
N
r
, where 0
≤ k
0
≤ r −
1 is chosen uniformly at random. We rewrite
this equation as
k
0
r
=
C
N
.
We know
C
, because we just measured it, and
N
is a given in the question.
Also,
k
0
is randomly chosen, and
r
is what we want. So how do we extract
that out?
If by good chance, we have
k
0
coprime to
r
, then we can cancel
C/N
to
lowest terms and read off
r
as the resulting denominator
˜r
. Note that
cancelling
C/N
to lowest terms can be done quickly by Euclidean algorithm.
But how likely are we to be so lucky? We can just find some number theory
book, and figure out that the number of natural numbers
< r
that are
coprime to
r
grows as
O
(
r/ log log r
). More precisely, it is
∼ e
−γ
r/ log log r
,
where γ is the other Euler’s constant. We note that
O
r
log log r
> O
1
log log N
.
So if
k
0
is chosen uniformly and randomly, the probability that
k
0
is
coprime to r is at least O(1/ log log N).
Note that if
k
0
is not coprime with
r
, then we have
˜r | r
, and in particular
˜r < r
. So we can check if
˜r
is a true period — we compute
f
(0) and
f
(
˜r
),
and see if they are the same. If
˜r
is wrong, then they cannot be equal as
f
is injective in the perio d.
While the probability of getting a right answer decreases as
N → ∞
, we just
have to do the experiment many times. From elementary probability, if an
event has some (small) success probability
p
, then given any 0
<
1
−ε <
1,
for
M
=
−
log ε
p
trials, the probability that there is at least one success is
>
1
− ε
. So if we repeat the quantum algorithm
O
(
log log N
) times, and
check
˜r
each time, then we can get a true
r
with any constant level of
probability.
(vi)
We can further improve this process — if we have obtained two attempts
˜r, ˜r
0
, then we know
r
is at least their least common multiple. So we can
in fact achieve this in constant time, if we do a bit more number theory.
However, the other parts of the algorithm (e.g. cancelling
C/N
down to
lowest terms) still use time polynomial in
log N
. So we have a polynomial
time algorithm.
There is one thing we swept under the carpet. We need to find an efficient
way of computing the quantum Fourier transform, or else we just hid all our
complexity in the quantum Fourier transform.
In general, we would expect that a general unitary operations on
n
qubits
needs
exp
(
n
) elementary circuits. However, the quantum Fourier transform is
special.
Fact. QFT
2
n
can be implemented by a quantum circuit of size O(n
2
).
The idea of the construction is to mimic the classical fast Fourier transform.
An important ingredient of it is:
Fact. The state
QFT
2
n
|xi =
1
2
n/2
2
n
−1
X
y=0
ω
xy
|yi
is in fact a product state.
We will not go into the details of implementation.
We can generalize the periodicity problem to arbitrary groups, known as the
hidden subgroup problem. We are given some oracle for
f
:
G → Y
, and we are
promised that there is a subgroup
H < G
such that
f
is constant and distinct on
cosets of
H
in
G
. We want to find
H
(we can make “find” more precise in two
ways — we can either ask for a set of generators, or provide a way of sam pling
uniformly from H).
In our case, we had G = (Z/N Z, +), and our subgroup was
H = {0, r, 2r, ··· , (A − 1)r}.
Unfortunately, we do not know how to do this efficiently for a group in general.
3.3 Shor’s algorithm
All that was a warm up for Shor’s algorithm. This is a quantum algorithm that
factorizes numbers in polynomial time. The crux of the algorithm will be a
modified version of the quantum Fourier transform.
The precise statement of the problem is as follows — given an integer
N
with
n
=
log N
digits, we want to find a factor 1
< K < N
. Shor’s algorithm will
achieve this with constant probability (1
− ε
) in
O
(
n
3
) time. The best known
classical algorithm is e
O(n
1/3
(log n)
2/3
)
.
To do this, we will use the periodicity algorithm. However, there is one
subtlety involved. Instead of working in
Z/nZ
, we need to work in
Z
. Since
computers cannot work with infinitely many numbers, we will have to truncate
it somehow. Since we have no idea what the period of our function will be, we
must truncate it randomly, and we need to make sure we can control the error
introduced by the truncation.
We shall now begin. Given an
N
, we first choose some 1
< a < N
uniformly
randomly, and compute
hcf
(
a, N
). If it is not equal to 1, then we are finished.
Otherwise, by Euler’s theorem, there is a least power
r
of
a
such that
a
r
≡
1
mod N
. The number
r
is called the order of
a
mod
N
. It follows that the
function
f
:
Z → Z/NZ
given by
f
(
k
) =
a
k
mod N
has period
r
, and is injective
in each period.
Note that
f
(
k
) can be efficiently computed in
poly
(
log k
) time, by repeated
squaring. Also note that classically, it is hard to find
r
, even though
f
has a
simple formula!
It was known to Legendre in 1800 that knowing
r
means we can factor
n
.
Suppose we can find r, and further suppose r is even. Then we have
a
r
− 1 ≡ (a
r/2
+ 1)(a
r/2
− 1) ≡ 0 (mod N).
So
N
exactly divides the product. By minimality of
r
, we know
N
does not
divide
a
r/2
−
1. So if
N
does not divide
a
r/2
+ 1 as well, then
hcf
(
N, a
r/2
±
1)
are non-trivial factors of N.
For this to work, we needed two assumptions –
r
is even, and
a
r/2
6≡ −
1
(
mod N
). Fortunately, there is a theorem in number theory that says if
N
is
odd and not a prime power, and
a
is chosen uniformly at random, then the
probability that these two things happen is at least
1
2
. In fact, it is
≥
1
−
1
2
m−1
,
where m is the number of prime factors of N.
So if we repeat this
k
times, the probability that they all fail to give a factor
is less than
1
2
k
. So this can be as small as we wish.
What about the other possibilities? If
N
is even, then we would have noticed
by looking at the last digit, and we can just write down 2. If
N
=
c
`
for
c, ` >
2,
then there is a classical polynomial time algorithm that outputs
c
, which is a
factor. So these are the easy cases.
Everything we’ve done so far is classical! The quantum part comes in when
we want to compute
r
. We know that
f
(
k
) =
a
k
is periodic on
Z
, which is an
infinite domain. So we cannot just apply our periodicity algorithm.
By number theory, we know that
r
is at most
N
. But other than that, we
have no idea what
r
actually is, nor do we know of any multiple of
r
. So we
cannot apply the periodicity argument directly. Instead, we pick a big number
2
m
, and work on the domain
D
=
{
0
,
1
, ··· ,
2
m
−
1
}
=
Z/
2
m
Z
. How do we
choose
m
? The idea is that we want 0
, ··· ,
2
m
−
1 to contain
B
full periods,
plus some extra “corrupt” noise b, so
2
m
= Br + b,
with 0
≤ b < r
. Since we want to separate out the periodicity information from
the corrupt noise, we will want
b
to be relatively small, compared to
Br
. We
know the size of
b
is bounded by
r
, hence by
N
. So we need 2
m
to be “much
larger” than
N
. It turns out picking 2
m
> N
2
is enough, and we will pick
m
to
be the smallest number such that this holds.
We now study the effect of corruption on the periodicity algorithm. We again
make the state
|fi =
1
√
2
m
X
|xi|f(x)i.
and measure the value of f. We then get
|peri =
1
√
A
A−1
X
k=0
|x
0
+ kri,
where
A
=
B
or
B
+ 1, depending on whether
x
0
≤ b
or not. As before, we
apply QFT
2
m
to obtain
QFT
2
m
|peri =
2
n
−1
X
c=0
˜
f(c) |ci.
When we did this before, with an exact period, most of the
ˆ
f
(
c
) is zero. However,
this time things are a bit more messy. As before, we have
˜
f(c) =
ω
cx
0
√
A
√
2
m
[1 + α + ··· + α
A−1
], α = e
2πicr/2
m
.
The important ques tion is, when we measure this, which
c
’s will we see with
“good probability”? With exact periodicity, we knew that
2
m
r
=
A
is an exact
integer. So
˜
f
(
c
) = 0 except when
c
is a multiple of
A
. Intuitively, we can think
of this as interference, and we had totally destructive and totally constructive
interference respectively.
In the inexact case, we will get constructive interference for those
c
such that
the phase
α
is close to 1. These are the
c
’s with
cr
2
m
nearest to integers
k
, and
the powers up to
α
A−1
don’t spread too far around the unit circle. So we avoid
cancellations.
So we look at those special
c
’s having this particular property. As
c
increases
from 0 to 2
m
−
1, the angle
cr
2
m
increments by
r
2
m
each time from 0 up to
r
. So
we have c
k
’s for each k = 0, 1, ··· , r − 1 such that
c
k
r
2
m
− k
<
1
2
·
r
2
m
.
In other words, we have
c
k
− k
2
m
r
<
1
2
.
So the c
k
are the integers nearest to the multiples of 2
m
/r.
In
˜
f
(
c
), the
α
’s corresponding to the
c
k
’s have the smallest phases, i.e. nearest
to the positive real axis. We write
c
k
r
2
m
= k + ξ,
where
k ∈ Z, |ξ| <
1
2
r
2
m
.
Then we have
α
n
= exp
2πi
c
k
r
2
m
n
= exp (eπi(k + ξ)n) = exp(2πiξn)
Now for
n < A
, we know that
|
2
ξn| < π
, and thus 1
, α, α
2
, ··· , α
A−1
all lie in
the lower half plane or upper half plane.
Doing all the algebra needed, we find that if
QFT |peri
is measured, then for
any c
k
as above, we have
Prob(c
k
) >
γ
r
,
where
γ =
4
π
2
≈ 0.4.
Recall that in the exact periodicity case, the points
c
k
hit the integers exactly,
and instead of γ we had 1. The distribution of the c’s then look like:
With inexact periods, we obtain something like
Now how do we get r from a c
k
? We know
c
k
2
m
−
k
r
<
1
2
m+1
<
1
2N
2
.
We claim that there is at most 1 fraction
k
r
with denominator
< N
such that
this inequality holds. So this inequality does uniquely determine k/r.
Indeed, suppose
k
r
and
k
0
r
0
both work. Then we have
k
0
r
0
−
k
r
=
|k
0
r − r
0
k|
rr
0
>
1
rr
0
>
1
N
2
.
However, we also have
k
0
r
0
−
k
r
≤
k
0
r
0
−
c
k
2
m
+
c
k
2
m
−
k
r
<
1
2N
2
+
1
2N
2
=
1
N
2
.
So it follows that we must have
k
0
r
0
=
k
r
.
We introduce the notion of a “good”
c
k
value, which is when
k
is coprime to
r. The probability of getting a good c
k
is again
O(1/ log log r) > O(1/ log log N).
Note that this is the same rate as the case of exact periodicity, since we have
only lost a constant factor of
γ
! If we did have such a
c
k
, then now
r
is uniquely
determined.
However, there is still the problem of finding
r
from a good
c
k
value. At this
point, this is just classical number theory.
We can certainly try all
k
0
r
0
with
k
0
< r
0
< N
and find the closest one to
c
k
/
2
m
, but there are
O
(
N
2
) fractions to try, but we want a
O
(
poly
(
log N
))
algorithm. Indeed, if we were to do it this way, we might as well try all numbers
less than N and see if they divide N . And this is just O(N)!
The answer comes from the nice theory of continued fractions. Any rational
number
s
t
< 1 has a continued fraction expansion
s
t
=
1
a
1
+
1
a
2
+
1
a
3
+ ···
.
Indeed to do this, we simply write
s
t
=
1
t
s
=
1
a
1
+
s
1
t
1
,
where we divide
t
by
s
to get
t
=
a
1
s
+
s
1
, and then put
t
1
=
s
. We then keep
going on with
s
1
t
1
. Since the numbers
s
i
, t
i
keep getting smaller, it follows that
this process will eventually terminate.
Since it is very annoying to type these continued fractions in L
A
T
E
X, we often
write the continued fraction as
s
t
= [a
1
, a
2
, a
3
, ··· , a
n
].
We define the kth convergent of
s
t
to be
p
k
q
k
= [a
1
, a
2
, ··· , a
k
].
There are some magic results from number theory that gives us a simple recur-
rence relation for the convergents.
Lemma. For a
1
, a
2
, ··· , a
`
any positive reals, we set
p
0
= 0 q
0
= 1
p
1
= 1 q
1
= a
1
We then define
p
k
= a
k
p
k−1
+ p
k−2
q
k
= a
k
q
k−1
+ q
k−2
Then we have
(i) We have
[a
1
, ··· , a
k
] =
p
k
q
k
.
(ii) We also have
q
k
p
k−1
− p
k
q
k−1
= (−1)
k
.
In particular, p
k
and q
k
are coprime.
From a bit more number theory, we find that
Fact.
If
s < t
are
m
-bit integers, then the continued fraction has length
O
(
m
),
and all convergents
p
k
q
k
can be computed in O(m
3
) time.
More importantly, we have the following result:
Fact. Let 0 < x < 1 be rational, and supp ose
p
q
is rational with
x −
p
q
<
1
2q
2
.
Then
p
q
is a convergent of the continued fraction of x.
Then by this theorem, for a good
c
, we know
k
r
must be a convergent of
c
2
m
.
So we compute all convergents find a (unique) one whose denominator is less
than
N
and is within
1
2N
2
of
c
2
m
. This gives us the value of
r
, and we are done.
In fact, this last classical part is the slowest part of the algorithm.
Example.
Suppose we want to factor
N
= 39. Suppose the random
a
we chose
is
a
= 7
<
39, which is coporime to
N
. Let
r
be the period of
f
(
x
) = 7
x
mod
39.
We notice
1024 = 2
10
< N
2
= 1621 < 2
11
= 2048.
So we pick m = 11. Suppose the measurement of QFT
2
m
|peri yeilds c = 853.
By the theory, this has a constant probability (approximately 0
.
4) to satisfy
853
2
m
−
k
r
<
1
2
m+1
=
1
2
12
<
1
2N
2
.
We also have a probability of
O
(1
/ log log r
) to have
k
and
r
coprime. In this
case, c is indeed “goo d”. So there is a unique
k
r
satisfying
853
2048
−
k
r
<
1
2
12
.
So to find
k
r
, we do the continued fraction expansion of
853
2048
. We have
853
2048
=
1
2048
853
=
1
2 +
342
853
=
1
2 +
1
853
342
=
1
2 +
1
2 +
169
342
= ··· = [2, 2, 2, 42, 4].
We can then compute the convergents
[2] =
1
2
[2, 2] =
2
5
[2, 2, 2] =
5
12
[2, 2, 2, 42] =
212
509
[2, 2, 2, 42, 4] =
853
2048
Of all these numbers, only
5
12
is within
1
2
12
of
853
2048
and whose denominator is
less than N = 39.
If we do not assume k and r are coprime, then the possible
k
r
are
5
12
,
10
24
,
15
36
.
If we assume that
k
r
are coprime, then r = 12. Indeed, we can try that
7
12
≡ 1 (mod 39).
So we now know that
39 | (7
6
+ 1)(7
6
− 1).
We now hope/expect with probability
>
1
2
exactly that it goes partly into each
factor. We can compute
7
6
+ 1 = 117650 ≡ 26 (mod 39)
7
6
− 1 = 117648 ≡ 24 (mod 39)
We can then compute
hcf(26, 39) = 13, hcf(24, 39) = 3 (mod 39).
We see that 3 and 13 are factors of 39.
3.4 Search problems and Grover’s algorithm
We are now going to turn our attention to search problems. These are very
important problems in computing, as we can formulate almost all problems as
some sort of search problems.
One important example is simultaneous constraint satisfaction. Here we have
a large configuration space of options, and we want to find some configuration
that satisfies some constraints. For example, when designing a lecture timetable
for Part III courses, we need to schedule the courses so that we don’t clash
two popular courses in the same area, and the courses need to have big enough
lecture halls, and we have to make sure a lecturer doesn’t have to simultaneously
lecture two courses at the same time. This is very complicated.
In general, search problems have some common features:
(i)
Given any instance of solution attempt, it is easy to check if it is good or
not.
(ii) There are exponentially many possible instances to try out.
One example is the boolean satisfiability problem, which we have already
seen before.
Example
(Boolean satisfiability problem)
.
The boolean satisfiability problem
(SAT ) is as follows — given a Boolean formula
f
:
B
n
→ B
, we want to know if
there is a “satisfying argument”, i.e. if there is an x with f (x) = 1.
This has complexity class
NP
, standing for non-deterministic polynomial
time. There are many ways to define
NP
, and here we will provide two. The
first definition of NP will involve the notion of a verifier:
Definition (Verifier). Suppose we have a language L ⊆ B
∗
, where
B
∗
=
[
n∈N
B
n
is the set of all bit strings.
A verifier for L is a computation V (w, c) with two inputs w, c such that
(i) V halts on all inputs.
(ii) If w ∈ L, then for some c, V (w, c) halts with “accept”.
(iii) If w 6∈ L, then for all c, V (w, c) halts with “reject”.
A polynomial time verifier is a
V
that runs in polynomial time in
|w|
(not
|w|+ |c|!).
We can think of
c
as “certificate of membership”. So if you are a member,
you can exhibit a certificate of membership that you are in there, and we can
check if the certification is valid. However, if you are not a member, you cannot
“fake” a certificate.
Definition
(Non-deterministic polynomial time problem)
. NP
is the class of
languages that have polynomial time verifiers.
Example.
The SAT problem is in
NP
. Here
c
is the satisfying argument, and
V (f, c) just computes f(c) and checks whether it is 1.
Example.
Determining if a number is composite is in
NP
, where a certificate
is a factor of the number.
However, it is not immediately obvious that testing if a numb er is prime is
in
NP
. It is an old result that it indeed is, and recent progress shows that it is
in fact in P.
It is rather clear that
P ⊆ NP
. Indeed, if we can check membership in
polynomial time, then we can also construct a verifier in polynomial time that
just throws the certificate away and check directly.
There is another model of
NP
, via non-deterministic computation. Recall
that in probabilistic computation, in some steps, we had to pick a random
number, and picking a different number would lead to a different “branch”. In
the case of non-deterministic computation, we are allowed to take all paths at
the same time. If some of the paths end up being accepting, then we accept the
input. If all paths reject, then we reject the input. Then we can alternatively
say a problem is in
NP
if there is a polynomial-time non-deterministic machine
that checks if the string is in the language.
It is not difficult to see that these definitions of
NP
are equivalent. Suppose
we have a non-deterministic machine that checks if a string is in the language.
Then we can construct a verifier whose certificate is a prescription of which
particular branch we should follow. Then the verifier just takes the prescription,
follows the path described and see if we end up being accepted.
Conversely, if we have a verifier, we can construct a non-deterministic machine
by testing a string on all possible certificates, and check if any of them accepts.
Unfortunately, we don’t know anything about how these different complexity
classes compare. We clearly have
P ⊆ BPP ⊆ BQP
and
P ⊆ NP
. However,
we do not know if these inclusions are strict, or how
NP
compares to the others.
Unstructured search problem and Grover’s algorithm
Usually, when we want to search something, the search space we have is structured
in some way, and this greatly helps our searching problem.
For example, if we have a phone book, then the names are ordered alpha-
betically. If we want to find someone’s phone number, we don’t have to look
through the whole book. We just open to the middle of the book, and see if the
person’s name is before or after the names on the page. By one lookup like this,
we have already eliminated half of the phone bo ok we have to search through,
and we can usually very quickly locate the name.
However, if we know someone’s phone number and want to figure out their
name, it is pretty much hopeless! This is the problem with unstructured data!
So the problem is as follows: we are given an unstructured database with
N
= 2
n
items and a unique good item (or no good items). We can query any
item for good or bad-ness. The problem is to find the good item, or determine if
one exists.
Classically,
O
(
N
) queries are necessary and sufficient. Even if we are asking
for a right result with fixed probability
c
, if we pick items randomly to check,
then the probability of seeing the “good” one in
k
queries is given by
k/N
. So
we still need O(N) queries for any fixed probability.
Quantumly, we have Grover’s algorithm. This needs
O
(
√
N
) queries, and
this is both necessary and sufficient.
The database of
N
= 2
n
items will be considered as an oracle
f
:
B
n
→ B
1
.
it is promised that there is a unique
x
0
∈ B
n
with
f
(
x
0
) = 1. The problem is to
find x
0
. Again, we have the quantum version
U
f
|xi|yi = |xi|y ⊗ f (x)i.
However, we’ll use instead I
x
0
on n qubits given by
I
x
0
|xi =
(
|xi x 6= x
0
−|xi x 6= x
0
.
This can be constructed from
U
f
as we’ve done before, and one use of
I
x
0
can
be done with one use of U
f
.
|si I
x
0
|si
|0i−|1i
2
|0i−|1i
2
U
f
We can write I
x
0
as
I
x
0
= I − 2 |x
0
ihx
0
|,
where I is the identity operator.
We are now going to state the Grover’s algorithm, and then later prove that
it works.
For convenience, we write
H
n
= H ⊗··· ⊗ H
| {z }
n times
We start with a uniform superposition
|ψ
0
i = H
n
|0 ···0i =
1
√
n
X
all x
|xi.
We consider the Grover iteration operator on n qubits given by
Q = −H
n
I
0
H
n
I
x
0
.
Here running
I
x
0
requires one query (whereas
I
0
is “free” because it is just
I − 2 |0ih0|).
Note that all these operators are all real. So we can pretend we are living
in the real world and have nice geometric pictures of what is going on. We let
P(x
0
) be the (real) plane spanned by |x
0
i and |ψ
0
i. We claim that
(i) In this plane P(x
0
), this operator Q is a rotation by 2α, where
sin α =
1
√
N
= hx
0
|ψ
0
i.
(ii) In the orthogonal complement P(x
0
)
⊥
, we have Q = −I.
We will prove these later on. But if we know this, then we can repeatedly apply
Q
to
|ψ
0
i
to rotate it near to
|x
0
i
, and then measure. Then we will obtain
|x
0
i
with very high probability:
P(x
0
)
|ψ
0
i
|x
0
i
β
α
The initial angle is
cos β = hx
0
|ψ
0
i =
1
√
N
.
So the number of iterations needed is
cos
−1
(1/
√
N)
2 sin
−1
(1/
√
N)
=
β
2α
.
In general, this is not an integer, but applying a good integer approximation to it
will bring us to very close to
|x
0
i
, and thus we measure
x
0
with high probability.
For large n, the number of iterations is approximately
π/2
2/
√
N
=
π
4
√
N.
Example. Let’s do a boring example with N = 4. The initial angle satisfies
cos β =
1
√
4
=
1
2
.
So we know
β =
π
3
.
Similarly, we have
2α = 2 sin
−1
1
2
=
π
3
.
So 1 iteration of
Q
will rotate
|ψ
0
i
exactly to
|x
0
i
, so we can find it with certainty
with 1 lookup.
Now we prove that this thing actually works. In general, for any unitary
U
and I
|ψi
= I − 2 |ψihψ|, we have
UI
|ψi
U
†
= U IU
†
− 2U |ψihψ|U
†
= I
U|ψi
.
In particular, since
H
n
is self-adjoint, i.e.
H
†
n
H
n
, and that by definition
H
n
|0i
=
|ψ
0
i, we know
Q = −H
n
I
0
H
n
I
x
0
= −I
|ψ
0
i
I
x
0
.
Next we note that for any |ψi and |ξi, we know by definition
I
|ψi
|ξi = |ξi −2 |ψihψ|ξi.
So this modifies |ξi by some multiple of |ψi. So we know our operator
Q|ψi = −I
|ψ
0
i
I
x
0
|ψi
modifies
|ψi
first by some multiple of
|x
0
i
, then by some multiple of
ψ
0
. So if
|ξi ∈ P(x
0
), then Q|ψi ∈ P(x
0
) too! So Q preserves P(x
0
).
We know that
Q
is a unitary, and it is “real”. So it must be a rotation or
a reflection, since these are the only things in O(2). We can explicitly figure
out what it is. In the plane
P
(
x
0
), we know
I
|x
0
i
is reflection in the mirror line
perpendicular to
|x
0
i
. Similarly,
I
|ψ
0
i
is reflection in the mirror line perpendicular
to |ψ
0
i.
We now use the following facts about 2D Euclidean geometry:
(i)
If
R
is a reflection in mirror
M
along
|Mi
, then
−R
is reflection in mirror
M
⊥
along
M
⊥
.
To see this, we know any vector can be written as a |Mi + b
M
⊥
. Then
R
sends this to
a |Mi − b
M
⊥
, while
−R
sends it to
−a |Mi
+
b
M
⊥
,
and this is reflection in
M
⊥
.
(ii) Suppose we have mirrors M
1
and M
2
making an angle of θ:
M
1
M
2
θ
Then reflection in
M
1
then reflection in
M
2
is the same as rotating coun-
terclockwise by 2θ.
So we know
Q = −I
|ψ
0
i
I
|x
0
i
is reflection in
x
⊥
0
then reflection in
ψ
⊥⊥
0
=
|ψ
0
i
. So this is a rotation by 2
α
,
where α is the angle between
x
⊥
0
and |ψ
0
i, i.e.
sin α = cos β = hx
0
|ψ
0
i.
To prove our second claim that
Q
acts as
−1
in
P
(
x
0
)
⊥
, we simply note that if
|ξi ∈ P(x
0
)
⊥
, then |ξi ⊥ |ψi
0
and ξ ⊥ |x
0
i. So both I
|x
0
i
and I
|ψ
0
i
fix |ξi.
In fact, Grover’s algorithm is the best algorithm we can achieve.
Theorem.
Let
A
be any quantum algorithm that solves the unique search
problem with probability 1
− ε
(for any constant
ε
), with
T
queries. Then
T
is
at least O(
√
N). In fact, we have
T ≥
π
4
(1 − ε)
√
N.
So Grover’s algorithm is not only optimal in the growth rate, but in the
constant as well, asymptotically.
Proof is omitted.
Further generalizations
Suppose we have multiple good items instead, say
r
of them. We then replace
I
x
0
with I
f
, where
I
f
|xi =
(
−|xi x good
|xi x bad
We run the same algorithm as before. We let
|ψ
good
i =
1
√
r
X
x good
|xi.
Then now
Q
is a rotation through 2
α
in the plane spanned by
|ψ
good
i
and
|ψ
0
i
with
sin α = hψ
good
|ψ
0
i =
r
r
N
.
So for large N , we need
π/2
2
p
r/N
=
π
4
r
N
r
,
i.e. we have a
√
r
reduction over the unique case. We will prove that these
numbers are right later when we prove a much more general result.
What if we don’t know what
r
is? The above algorithm would not work,
because we will not know when to stop the rotation. However, there are some
tricks we can do to fix it. This involves cleverly picking angles of rotation at
random, and we will not go into the details.
3.5 Amplitude amplification
In fact, the techniques from Grover’s algorithm is completely general. Let
G
be
any subspace (“good” subspace) of the state space
H
, and
G
⊥
be its orthogonal
complement (“bad” subspace). Then
H = G ⊕ G
⊥
.
Given any normalized vector
|ψi ∈ H
, we have a unique decomposition with
real, non-negative coefficients
|ψi = sin θ |ψ
g
i + cos θ |ψ
b
i
such that
|ψ
g
i ∈ G, |ψ
b
i ∈ G
⊥
are normalized.
We define the reflections
I
|ψi
= I − 2 |ψihψ|, I
g
= I − 2P,
where P is the projection onto G given by
P =
X
b
|bihb|
for any orthonormal basis {|bi} of G. This P satisfies
P |ψi =
(
|ψi |ψi ∈ G
0 |ψi ∈ G
⊥
.
We now define the Grover operator
Q = −I
ψ
I
G
.
Theorem
(Amplitude amplification thoerem)
.
In the 2-dimensional subspace
spanned by |ψ
g
i and |ψi (or equivalently by |ψ
g
i and |ψ
b
i), where
|ψi = sin θ |ψ
g
i + cos θ |ψ
b
i,
we have that Q is rotation by 2θ.
Proof. We have
I
G
|ψ
g
i = −|ψ
g
i, I
G
|ψ
b
i = |ψ
b
i.
So
Q|ψ
g
i = I
ψ
|ψ
g
i, Q|ψ
b
i = −I
ψ
|ψ
b
i.
We know that
I
ψ
= I − 2 |ψihψ|.
So we have
Q|ψ
g
i = I
ψ
|ψ
g
i
= |ψ
g
i − 2(sin θ |ψ
g
i + cos θ |ψ
b
i)(sin θ)
= (1 −2 sin
2
θ) |ψ
g
i − 2 sin θ cos θ |ψ
b
i
= cos 2θ |ψ
g
i − sin 2θ |ψ
b
i
Q|ψ
b
i = −I
ψ
|ψ
b
i
= −|ψ
b
i + 2(sin θ |ψ
g
i + cos θ |ψ
b
i)(cos θ)
= 2 sin θ cos θ |ψ
g
i + (2 cos
2
θ − 1) |ψ
b
i
= sin 2θ |ψ
g
i + cos 2θ |ψ
b
i.
So this is rotation by 2θ.
If we iterate this
n
times, then we have rotated by 2
nθ
, but we started at
θ
from the |ψ
b
i direction. So we have
Q
n
|ψi = sin(2n + 1)θ |ψ
g
i + cos(2n + 1)θ |ψ
b
i.
If we measure Q
n
|ψi for good versus bad, we know
P(good) = sin
2
(2n + 1)θ,
and this is a maximum, when (2n + 1)θ =
π
2
, i.e.
n =
π
4θ
−
1
2
.
For a general
θ
, we know that
n
is not a n integer. So we use
n
the nearest
integer to
π
4θ
−
1
2
, which is approximately
π
4θ
= O(θ
−1
) = O(1/ sin θ) = O
1
kgood projection of |ψik
.
Example.
Suppose we want to do a Grover search for
r
good items in
N
objects.
We start with
|ψi =
1
√
N
X
all x
|xi =
r
r
N
1
√
r
X
good x
|xi
+
r
N − r
N
1
√
N − r
X
bad x
|xi
!
.
Then G is the subspace spanned by the good x’s, and
sin θ =
r
r
N
,
So Q is a rotation by 2θ, where
θ = sin
−1
r
r
N
≈
r
r
N
for r N. So we will use O(
p
r/N) operations.
Example.
Let
A
be any quantum circuit on start state
|0 ···0i
. Then the final
state is
A |0 ···0i
. The good states are the desired computational outcomes. For
example, if
A
is Shor’s algorithm, then the desired outcomes might be the good
c-values. We can write
A |0 ···0i = a |ψ
g
i + b |ψ
b
i.
The probability of a success in one run is
|a|
2
. So we normally need
O
(1
/|a|
2
)
repetitions of A to succeed with a given constant probability 1 − ε.
Instead of just measuring the result and hoping for the best, we can use
amplitude amplification. We assume we can check if
x
is good or bad, so we can
implement I
G
. We consider
|ψi = A |0 ···0i.
Then we define
Q = −I
A|0···0i
I
G
= −AI
|0···0i
A
†
I
G
.
Here we can construct
A
†
just by reversing the gates in
A
. So all parts are
implementable.
By amplitude amplification, Q is rotation by 2θ, where sin θ = |a|. So after
n ≈
π
4θ
= O(|a|
−1
)
repetitions,
A |0 ···0i
will be rotated to very near to
|ψ
g
i
, and this will succeed
with high probability. This gives us a square root speedup over the naive method.
4 Measurement-based quantum computing
In this chapter, we are going to look at an alternative model of quantum
computation. This is rather weird. Instead of using unitary gates and letting
them act on state, we prepare a generic starting state known as a graph state, then
we just keep measuring them. Somehow, by cleverly planning the measurements,
we will be able to simulate any quantum computation in the usual sense with
such things.
We will need a bunch of new notation.
Notation. We write
|±
α
i =
1
√
2
(|0i ± e
−iα
|1i).
In particular, we have
|±
0
i = |±i =
1
√
2
(|0i ± |1i)
Then
B(α) = {|+
α
i, |−
α
i}
is an orthonormal basis. We have 1-qubit gates
J(α) =
1
√
2
1 e
iα
1 −e
iα
= HP(α),
where
H =
1
√
2
1 1
1 −1
, P(α) =
1 0
0 e
iα
.
We also have the “Pauli gates”
X =
0 1
1 0
, Z =
1 0
0 −1
= P(π)
We also have the 2-qubit gates
E = CZ = diag(1, 1, 1, −1).
We also have 1-qubit measurements
M
i
(α) = measurement of qubit i in basis B(α).
The outcome |+
α
i is denoted 0 and the outcome |−
α
i is denoted 1.
We also have M
i
(Z), which is measurement of qubit
i
in the standard basis
{|0i, |1i}.
Finally, we have the notion of a graph state. Suppose we have an undirected
graph
G
= (
V, E
) with vertices
V
and edges
E
with no self-loops and at most
one edge between two vertices, we can define the graph state
|ψ
G
i
that is a state
of
|V |
qubits as follows: for each vertex
i ∈ V
, introduce a qubit
|+i
i
. For each
edge
e
:
i → j
, we apply
E
ij
(i.e.
E
operating on the qubits
i
and
j
). Since all
these E
ij
commute, the order does not matter.
Example. If G
1
is
0 1
then we have
|ψ
G
1
i = E
12
|+i
1
|+i
2
=
1
2
[|00i + |01i + |10i − |11i],
and this is an entangled state.
If G
2
is
0 1 2
then we have
|ψ
G
2
i = E
12
E
23
|+i
1
|+i
2
|+i
3
.
A cluster state is a graph state |ψ
G
i for G being a rectangular 2D grid.
The main result of measurement-based quantum computation is the following:
Theorem.
Let
C
be any quantum circuit on
n
qubits with a sequence of
gates
U
1
, ··· , U
K
(in order). We have an input state
|ψ
in
i
, and we perform
Z-measurements on the output states on specified qubits
j
=
i
1
, ··· , i
k
to obtain
a k-bit string.
We can always simulate the pro cess as follows:
(i)
The starting resource is a graph state
|ψ
G
i
, where
G
is chosen depending
on the connectivity structure of C.
(ii)
The computational steps are 1-qubit measurements of the form M
i
(
α
), i.e.
measurement in the basis
B
(
α
). This is adaptive —
α
may depend on the
(random) outcomes s
1
, s
2
, ··· of previous measurements.
(iii)
The computational pro cess is a prescribed (adaptive) sequence M
i
1
(
α
1
),
M
i
2
(α
2
), ···, M
i
N
(α
N
), where the qubit labels i
1
, i
2
, ··· , i
N
all distinct.
(iv)
To obtain the output of the process, we perform further measurements
M(Z) on
k
specified qubits not previously measured, and we get results
s
i
1
, ··· , s
i
k
, and finally the output is obtained by further (simple) classical
computations on s
i
1
, ··· , s
i
k
as well as the previous M
i
(α) outcomes.
The idea of the last part is that the final measurement
s
i
1
, ··· , s
i
k
has to be
re-interpret in light of the results M
i
(α
i
).
This is a funny process, because the result of each measurement
M
i
(
α
) is
uniformly random, with probability
1
2
for each outcome, but somehow we can
obtain useful information by doing adaptive measurements.
We now start the process of constructing such a system. We start with the
following result:
Fact. The 1-qubit gates J(α) with E
i,i±1
is a universal set of gate.
In particular, any 1-qubit U is a product of 3 J’s.
We call these E
i,i±1
nearest neighbour E
ij
’s.
Proof. This is just some boring algebra.
So we can assume that our circuit
C
’s gates are all of the form J(
α
)’s or E
0
ij
s
,
and it suffices to try to implement these gates in our weird system.
The next result we need is what we call the J-lemma:
Lemma (J-lemma). Given any 1-qubit state |ψi, consider the state
E
12
(|ψi
1
|+i
2
).
Suppose we now measure M
1
(
α
), and supp os e the outcome is
s
1
∈ {
0
,
1
}
. Then
after measurement, the state of 2 is
X
s
1
J(α) |ψi.
Also, two outcomes
s
= 0
,
1 always occurs with probability
1
2
, regardless of the
values of |ψib, α.
Proof. We just write it out. We write
|ψi = a |0i + b |1i.
Then we have
E
12
(|ψi
1
|+i
2
) =
1
√
2
E
12
(a |0i|0i + a |0i|1i + b |1i|0i + b |1i|1i)
=
1
√
2
(a |0i|0i + a |0i|1i + b |1i|0i −b |1i|1i)
So if we measured 0, then we would get something proportional to
h+
α
|
1
E
12
(|ψi
1
|+i
2
) =
1
2
(a |0i + a |1i + be
iα
|0i − be
iα
|1i)
=
1
2
1 e
iα
1 −e
iα
a
b
,
as required. Similarly, if we measured 1, then we get XJ(α) |ψi.
We will usually denote processes by diagrams. In this case, we started with
the graph state
|ψi |+i
and the measurement can be pictured as
α
s
1
|ψi |+i
If we measure Z, we denote that by
Z
i
In fact, this can be extended if 1 is just a single qubit part of a larger multi-qubit
system 1S, i.e.
Lemma. Suppose we start with a state
|ψi
1S
= |0i
1
|ai
S
+ |1i
1
|bi
S
.
We then apply the
J
-lemma process by adding a new qubit
|+i
for 2
6∈ S
, and
then query 1. Then the resulting state is
X
s
1
2
J
2
(α) |ψi
2S
.
So the J-lemma allows us to simulate J-gates with measurements. But we
want to do many J gates. So we need the concatenation lemma:
Lemma
(Concatenation lemma)
.
If we concatenate the process of J-lemma on a
row of qubits 1
,
2
,
3
, ···
to apply a sequence of J(
α
) gates, then all the entangling
operators E
12
, E
23
, ··· can be done first before any measurements are applied.
It is a fact that for any composite quantum system
A ⊗ B
, any local actions
(unitary gates or measurements) done on
A
always commutes with anything
done on
B
, which is easy to check by expanding out the definition. So the proof
of this is simple:
Proof.
For a state
|ψi
1
|+i
2
|+i
3
···
, we can look at the sequence of J-processes
in the sequence of operations (left to right):
E
12
M
1
(α
1
)E
23
M
2
(α
2
)E
34
M
3
(α
3
) ···
It is then clear that each E
ij
commutes with all the measurements before it. So
we are safe.
We can now determine the measurement-based quantum computation process
corresponding to a quantum circuit
C
of gates
U
1
, U
2
, ··· , U
K
with each
U
i
either
a J(
α
) or a nearest-neighbour E
ij
. We may wlog assume the input state to
C
is
|+i···|+i
as any 1-qubit product state may be written as
|ψi = U |+i
for suitable
U
, which is then represented as at most three J(
α
)’s. So we simply
prefix C with these J(α) gates.
Example. We can write |ji for j = 0, 1 as
|ji = X
j
H |+i.
We also have
H = J(0), X = J(π)J(0).
So the idea is that we implement these J(
α
) gates by the J-processes we just
described, and the nearest-neighbour E
ij
gates will just be performed when we
create the graph state.
We first do a simple example:
Example. Consider the circuit C given by
|+i
|+i
J(α
1
)
J(α
2
)
J(α
3
)
where the vertical line denotes the
E
12
operators. At the end, we measure the
outputs i
1
, i
2
by M(Z) measurements.
We use the graph state
In other words, we put a node for a
|+i
, horizontal line for a J(
α
) and a vertical
line for an E.
If we just measured all the qubits for the
J
-process in the order
α
1
, α
2
, α
3
,
and then finally read off the final results i
1
, i
2
:
s
1
α
1
s
2
α
2
s
3
α
3
i
1
Z
i
2
Z
then we would have effected the circuit
|+i
|+i
J(α
1
)
J(α
2
)
X
s
1
X
s
2
J(α
3
)
X
s
3
Now the problem is to get rid of the
X
i
’s. We know each
X
i
comes with
probability
1
2
. So the probability of them all not appearing is tiny for more
complicated circuits, and we cannot just rely on pure chance for it to turn out
right.
To deal with the unwanted
X
i
“errors”, we want to commute them out to
the end of the circuit. But they do not commute, so we are going to use the
following commutation relations:
J(α)X = e
iα
ZJ(−α)
In other words, up to an overall phase, the following diagrams are equivalent:
X
J(α)
is equivalent to
J(−α)
X
More generally, we can write
J
i
(α)X
s
i
= e
−iαs
Z
s
i
J
i
((−1)
s
α)
J
i
(α)Z
s
i
= X
s
i
J
i
(α)
E
ij
Z
s
i
= Z
s
i
E
ij
E
ij
X
s
i
= X
s
i
Z
s
i
E
ij
Here the subscripts tell us which qubit the gates are acting on.
The last one corresponds to
X
is equivalent to
X
Z
All of these are good, except for the first one where we have a funny phase and
the angle is negatived. The phase change is irrelevant because it doesn’t affect
measurements, but the sign changes are problematic. To fix this, we need to use
adaptive measurements.
Example. Consider the simpler 1-qubit circuit
|+i J(α
1
) J(α
2
)
We first prepare the graph sate
We now measure the first qubit to get
r
1
α
1
We have thus done
|+i J(α
1
)
X
r
1
To deal with the unwanted X
r
1
, we note that
X
r
1
J(α
2
)
is equivalent to
J((−1)
r
1
α
2
)
Z
r
1
So we adapt the sign of the second measurement angle to depend on the previous
measurement result:
r
1
α
1
r
2
(−1)
r
1
α
2
Then this measurement results in
J(α
1
)
X
r
1
J((−1)
r
1
α
2
)
X
r
2
which is equivalent to
J(α
1
) J(α
2
)
Z
r
1
X
r
2
If we had further J-gates, we need to commute both Z
r
1
and X
r
2
over.
Note that while we are introducing a lot of funny X’s and Z’s, these are all
we’ve got, and the order of applying them does not matter, as they anti-commute:
XZ = −ZX.
So if we don’t care about the phase, they effectively commute.
Also, since X
2
= Z
2
=
I
, we only need to count the number of X’s and Z’s
mod 2, which is very helpful.
Now what do we do with the Z and X at the end? For the final Z-measurement,
having moved everything to the end, we simply reinterpret the final, actual
Z-measurement result j:
(i)
The Z-gate do es not affect outcome or probability of a Z-measurement,
becasuse if
|ψi = a |0i + b |1i,
then
Z |ψi = a |0i − b |1i.
So the probabilities of |0i and |1i are |a|
2
and |b|
2
regardless.
(ii)
The X gate simply interchanges the labels, while leavining probabilities
the same , because if
|ψi = a |0i + b |1i,
then
X |ψi = a |1i + b |0i.
So we ignore all Z-errors, and for each X
r
error, we just modify the seen
measurement outcome j by j 7→ j ⊕ r.
If we actually implement measurement-based quantum computations, the
measurements can always be done “left to right”, implementing the gates in order.
However, we don’t have to do that. Recall that quantum operations on disjoint
qubits always commute. Since the only thing we are doing are measurements,
all
M
i
(
α
) measurements can be performed simultaneously if the angles
α
do
not depend on other measurements. This gives us a novel way to parallel a
computation.
For example, in our simple example, we can start by first measuring
r
1
and
j
, and then measuring
r
2
after we know
r
1
. In particular, we can first measure
the “answer”
j
, before we do any other thing! The remaining measurements just
tell us how we should interpret the answer.
In general, we can divide the measurements into “layers” — the first layer
consists of all measurements that do not require any adaptation. The second
layer then consists of the measurements that only depends on the first layer. The
logical depth is the least number of layers needed, and this somewhat measures
the complexity of our circuit.
5 Phase estimation algorithm
We now describe a quantum algorithm that estimates the eigenvalues of a unitary
operator. Suppose we are given a unitary operator
U
and an eigenstate
|v
ϕ
i
.
Then we can write
U |V
ϕ
i = e
2πiϕ
|v
ϕ
i
with 0 ≤ ϕ < 1. Our objective is to estimate ϕ to n binary bits of precision:
ϕ ≈ 0.i
1
i
2
i
3
···i
n
=
i
1
2
+
i
2
2
2
+ ··· +
i
n
2
n
.
We will need the controlled U
k
gate c - U
k
for integers k, defined by
c - U
k
|0i|ξi = |0i|ξi
c - U
k
|1i|ξi = |1iU
k
|ξi,
where |0i, |1i are 1-qubit states, and |ξi is a general d-dimensional register.
Note that we have
U
k
|v
ϕ
i = e
2πikϕ
|v
ϕ
i,
and we have
c - U
k
= (c - U )
k
,
Note that if we are given
U
as a formula or a circuit description, then we can
readily implement c
- U
by adding control to each gate. However, if
U
is a
quantum black-box, then we need further information. For example, it suffices to
have an eigenstate
|αi
with known eigenvalue
e
iα
. However, we will not bother
ourselves with that, and just assume that we can indeed implement it.
In fact, we will use a “generalized” controlled U given by
|xi|ξi 7→ |xiU
x
|ξi,
where |xi has n qubits. We will make this from c - U
k
= (c - U )
k
as follows: for
x = x
n−1
···x
1
x
0
= x
0
+ 2
1
x
1
+ 2
2
x
2
+ ··· + 2
n−1
x
n−1
,
we write c - U
k
i
for the controlled U
k
controlled by i. Then we just construct
U
2
0
0
U
2
1
1
···U
2
n−1
n−1
.
Now if input |ξi = |v
ϕ
i, then we get
e
2πiϕx
|xi|v
ϕ
i.
To do phase estimation, we superpose the above over all
x
= 0
,
1
,
2
, ··· ,
2
n−1
and use |ξi = |v
ϕ
i. So we construct our starting state by
|si = H ⊗··· ⊗H |0 ···0i =
1
√
2
n
X
all x
|xi.
Now if we apply the generalized control U, we obtain
1
√
2
n
X
x
e
2πiϕx
|xi
!
| {z }
|Ai
|v
ψ
i.
Finally, we apply the inverse Fourier transform
QFT
−1
2
n
to
|Ai
and measure to
see y
0
, y
1
, ··· , y
n−1
on lines 0, 1, ··· , n −1. Then we simply output
0.y
0
y
1
···y
n−1
=
y
0
2
+
y
1
4
+ ··· +
y
n−1
2
n
=
y
0
y
1
···y
n−1
2
n
.
as the estimate of ϕ.
Why does this work? Suppose
ϕ
actually only had
n
binary digits. Then we
have
ϕ = 0.z
0
z
1
······z
n−1
=
z
2
n
,
where z ∈ Z
2
n
. Then we have
|Ai =
1
√
2
n
X
x
2
2πixz/2
n
|xi,
which is the Fourier transform of
|zi
. So the inverse Fourier transform of
|Ai
is
exactly |Zi and we get ϕ exactly with certainty.
If ϕ has more than n bits, say
ϕ = 0.z
0
z
1
···z
n−1
z
n
z
n+1
··· ,
then we have
Theorem.
If the measurements in the above algorithm give
y
0
, y
1
, ··· , y
n
and
we output
θ = 0.y
0
y
1
···y
n−1
,
then
(i) The probability that θ is ϕ to n digits is at least
4
π
2
.
(ii) The probability that |θ − ϕ| ≥ ε is at most O(1/(2
n
ε)).
The proofs are rather boring and easy algebra.
So for any fixed desired accuracy
ε
, the probability to fail to get
ϕ
to this
accuracy falls exponentially with n.
Note that if c
- U
2
k
is implemented as (c
- U
)
2
k
, then the algorithm would
need
1 + 2 + 4 + ··· + 2
n−1
= 2
n−1
many c
- U
gates. But for some special
U
’s, this c
- U
2
k
can be implem ented in
polynomial time in k.
For example, in Kitaev’s factoring algorithm, for
hcf
(
a, N
) = 1, we will use
the function
U : |mi 7→ |am mod Ni.
Then we have
U
2
k
|mi =
a
2
k
m
E
,
which we can implement by repeated squaring.
Now what if we didn’t have an eigenstate to being with? If instead of
|v
ϕ
i
,
we used a general input state |ξi, then we can write
|ξi =
X
j
c
j
v
ϕ
j
,
where
U
v
ϕ
j
= e
2πiϕ
j
v
ϕ
j
.
Then in the phase estimation algorithm, just before the final measurement, we
have managed to get ourselves
|0 ···0i|ξi →
X
j
c
j
|ϕ
j
i
v
ϕ
j
.
Then when we measure, we get one of the
ϕ
j
’s (or an approximation of it) with
probability
|c
j
|
2
. Note that this is not some average of them. Of course, we
don’t know which one we got, but we still get some meaningful answer.
Quantum counting
An application of this is the quantum counting problem. Given
f
:
B
n
→ B
with k good x’s, we want to estimate the number k.
Recall the Grove iteration operator
Q
G
is rotation through 2
θ
in a 2-
dimensional plane spanned by
|ψ
0
i =
1
√
2
n
X
x
|xi
and its good projection, and θ is given by
sin θ ≈ θ =
r
k
N
.
Now the eigenvalues of this rotation in the plane are
e
2iθ
, e
−2iθ
.
So either eigenvalue will suffice to get k.
We will equivalently write
e
i2θ
= e
2πiϕ
with
0 ≤ ϕ < 1.
Then ±2θ is equivalent to ϕ or 1 −ϕ, where ϕ is small.
Now we don’t have an eigenstate, but we can start with any state in the
plane,
|ψ
0
i
. We then do phase estimation with it. We will then get either
ϕ
or
1
− ϕ
with some probabilities, but we don’t mind which one we get, since we
can obtain one from the other, and we can tell them apart because ϕ is small.
6 Hamiltonian simulation
So. Suppose we did manage to invent a usable quantum computer. What would
it be goo d for? Grover’s algorithm is nice, but it seems a bit theoretical. You
might say we can use Shor’s algorithm to c rack encryption, but then if quantum
computers are available, then no one would be foolish enough to use encryption
that is susceptible to such attacks. So what can we actually do?
One useful thing would be to simulate physical systems. If we have a quantum
system and with n qubits, then we want to simulate its evolution over time, as
governed by Schr¨odinger’s equation. Classically, a
n
-qubit system is specified by
2
n
complex numbers, so we would expect any such algorithm to have performance
at best
O
(2
n
). However, one would imagine that to simulate a quantum
n
-qubit
system in a quantum computer, we only need
n
-qubits! Indeed, we will see that
we will be able to simulate quantum systems in polynomial time in n.
In a quantum physical system, the state of the system is given by a state
|ψi
, and the evolution is governed by a Hamiltonian
H
. This is a self-adjoint
(Hermitian) operator, and in physics, this represents the energy of the system.
Thus, we have
hψ|H |ψi = average value obtained in measurement of energy.
The time evolution of the particle is given by the Schr¨odinger equation
d
dt
|ψ(t)i = −iH |ψ(t)i.
We’ll consider only time-independent Hamiltonians
H
(
t
) =
H
. Then the solution
can be written down as
|ψ(t)i = e
−iHt
|ψ(0)i.
Here e
−iHt
is the matrix exponential given by
e
A
= I + A +
A
2
2
+ ··· ,
Thus, given a Hamiltonian
H
and a time
t
, we want to simulate
U
(
t
) =
e
−iHt
to suitable approximations.
Before we begin, we note the following useful definitions:
Definition (Operator norm). The operator norm of an operator A is
kAk = max
k|ψik=1
kA |ψik.
If A is diagonalizable, then this is the maximum eigenvalue of A.
The following properties are easy to see:
Proposition.
kA + Bk ≤ kAk + kBk
kABk ≤ kAkkBk.
We now begin. There will be a slight catch in what we do. We will have
to work with special Hamiltonians known as
k
-local Hamiltonians for a fixed
k
.
Then for this fixed
k
, the time required to simulate the system will be polynomial
in
n
. However, we should not expect the complexity to grow nicely as we increase
k!
So what is a
k
-local Hamiltonian? This is a Hamiltonian in which each
interaction governed by the Hamiltonian only involves
k
qubits. In other words,
this Hamiltonian can be written as a sum of operators, each of which only
touches
k
qubits. This is not too bad a restriction, because in real life, most
Hamiltonians are indeed local, so that if each qubit represents a particle, then
the behaviour of the particle will only be affected by the particles near it.
Definition (k-local Hamiltonian). We say a Hamiltonian H is k-local (for k a
fixed constant) on n qubits if
H =
m
X
j=1
H
j
,
where each
H
j
acts on at most
k
qubits (not necessarily adjacent), i.e. we can
write
H
j
=
˜
H
j
⊗ I,
where
˜
H
j
acts on some k qubits, and I acts on all other qubits as the identity.
The number m of terms we need is bounded by
m ≤
n
k
= O(n
k
),
which is polynomial in n.
Example. The Hamiltonian
H = X ⊗I ⊗ I − Z ⊗I ⊗ Y
is 2-local on 3 qubits.
We write M
(i)
to denote the operator M acting on the ith qubit.
Example. We could write
X ⊗ I ⊗ I = X
(1)
.
Example
(Ising model)
.
The Ising model on an
n × n
square lattice of qubits
is given by
H = J
n−1
X
i,j=1
Z
(i,j)
Z
i,j+1
+ Z
(i,j)
Z
(i+1,j)
.
Example (Heisenberg model). The Heisenberg model on a line is given by
H =
n−1
X
i=1
J
x
X
(i)
X
(i+1)
+ J
y
Y
(i)
Y
(i+1)
+ J
z
Z
(i)
Z
(i+1)
,
where J
x
, J
y
and J
z
are real constants.
This is useful in modelling magnetic system.
The idea is that we simulate each
e
iH
j
t
separately, and then put them together.
However, if {H
j
} doesn’t commute, then in general
e
−i
P
j
H
j
t
6=
Y
j
e
−iH
j
t
.
So we need to somehow solve this problem. But putting it aside, we can start
working on the quantum simulation problem.
We will make use of the following theorem:
Theorem (Solovay-Kitaev theorem). Let U be a unitary operator on k qubits
and
S
any universal set of quantum gates. Then
U
can be approximated to
within ε using O(log
c
1
ε
) from S, where c < 4.
In other words, we can simulate each
e
−iH
j
t
with very modest overhead in
circuit size for improved error, assuming we fix k.
Proof. Omitted.
We will also need to keep track of the accumulation of errors. The following
lemma will be useful:
Lemma. Let {U
i
} and {V
i
} be sets of unitary operators with
kU
i
− V
i
k ≤ ε.
Then
kU
m
···U
1
− V
m
···V
1
k ≤ mε.
This is remarkable!
Proof.
See example sheet 2. The idea is that unitary gates preserve the size of
vectors, hence do not blow up errors.
We start by doing a warm-up: we solve the easy case where the terms in the
Hamiltonian commute.
Proposition. Let
H =
m
X
j=1
H
j
be any k-local Hamiltonian with commuting terms.
Then for any t, e
−iHt
can be approximated to within ε by a circuit of
O
m poly
log
m
ε
gates from any given universal set.
Proof.
We pick
ε
0
=
ε
m
, and approximate
e
−iH
j
t
to within
ε
0
. Then the total
error is bounded by mε
0
= ε, and this uses
O
m poly
log
m
ε
gates.
We now do the full non-commutative case. To do so, we need to keep track
of how much e
iH
i
t
e
−iH
j
t
differs from e
i(H
i
+H
j
)t
.
Notation. For a matrix X, we write
X + O(ε)
for X + E with kEk = O(ε).
Then we have
Lemma
(Lie-Trotter product formula)
.
Let
A, B
be matrices with
kAk, kBk ≤
K < 1. Then we have
e
−iA
e
−iB
= e
−i(A+B)
+ O(K
2
).
Proof. We have
e
−iA
= 1 −iA +
∞
X
k=2
(iA)
k
k!
= I − iA + (iA)
2
∞
X
k=0
(−iA)
k
(k + 2)!
We notice that
k
(
iA
)
2
k ≤ K
2
, the final sum has norm bounded by
e
K
< e
. So
we have
e
−iA
= I − iA + O(K
2
).
Then we have
e
−iA
e
−iB
= (I − iA + O(K
2
))(I − iB + O(K
2
))
= I − i(A + B) + O(K
2
)
= e
−i(A+B)
+ O(K
2
).
Here we needed the fact that
kA
+
Bk ≤
2
K
=
O
(
K
) and
kABk ≤ K
2
=
O(K
2
).
We now apply this repeatedly to accumulate sums
H
1
, H
2
, .., H
m
in the
exponent. First of all, we note that if each
kH
i
k < K
, then
kH
i
+
···
+
H
`
k < `K
.
We want this to be
<
1 for all
` ≤ m
. So for now, we assume
K <
1
m
. Also, we
take t = 1 for now. Then consider
e
−iH
1
e
−iH
2
···e
−iH
m
= (e
−i(H
1
+H
2
)
+ O(K
2
))e
−iH
3
···e
−iH
m
= e
−i(H
1
+H
2
)
e
−iH
3
···e
−iH
m
+ O(K
2
)
= e
−i(H
1
+H
2
+H
3
)
e
−iH
4
···e
−iH
m
+ O((2K)
2
) + O(K
2
)
= e
−i
P
H
j
+ O(m
3
K
2
),
where we used the fact that
1
2
+ 2
2
+ ··· + m
2
= O(m
3
).
We write the error as Cm
3
K
2
.
This is fine if
K
is super small, but it won’t be in general. For general
K
and t values, we introduce a large N such that
H
j
t
N
<
Kt
N
≤
˜
K < 1.
In other words, we divide time up into small
t
N
intervals. We then try to simulate
U = e
−i(H
1
+···+H
m
)t
=
e
−i
(
H
1
t
N
+...+
H
n
t
N
)
N
.
This equality holds because we know that
t
N
(
H
1
+
···
+
H
n
) commutes with
itself (as does everything in the world).
We now want to make sure the final error for
U
is
< ε
. So we know each
term
e
−i
(
H
1
t
n
+...+
H
n
t
n
)
needs to be approximated to
ε
N
. So using our previous
formula, we want that
Cm
3
˜
K
2
<
ε
N
,
Doing some algebraic manipulation, we find that we need
N >
Cm
3
K
2
t
2
ε
.
We now have Nm gates of the form e
iH
j
t/N
. So the circuit size is at most
O
m
4
(Kt)
2
ε
.
Recall for
n
-qubits, a general
k
-local Hamiltonian has
m
=
O
(
n
k
). So the circuit
size is
|C| = O
n
4k
(Kt)
2
ε
.
Now this is in terms of the number of
e
iH
j
t/N
gates. If we want to express this
in terms of universal gates, then each gate needs to be approximated to
O
(
ε/|C|
).
We then need
O
(
log
c
(
|C|
ε
)) gates for each, for some
c <
4. So we only get a
modest extra multiplicative factor in |C|.
Note that for a fixed
n
with a variable
t
, then a quantum process
e
−iHt
runs
in time
t
, but our simulation needs time
O
(
t
2
). This can be improved to
O
(
t
1+δ
)
for any δ > 0 by using “better” Lie-Trotter expansions.
Local Hamiltonian ground state problem
There are many other things we might want to do with a
k
-local Hamiltonian.
One question we might be interested in is the eigenvalues of
H
. Suppose we are
given a 5-local Hamiltonian
H =
m
X
i=1
H
j
on
n
qubits (this was the original result proved). We suppose
kH
i
k <
1, and we
are given two numbers
a < b
, e.g.
a
=
1
3
and
b
=
2
3
. We are promised that the
smallest eigenvalue
E
0
of
H
is
< a
or
> b
. The problem is to decide whether
E
0
< a.
The reason we have these funny
a, b
is so that we don’t have to worry about
precision problems. If we only had a single
a
and we want to determine if
E
0
> a
or
E
0
< a
, then it would be difficult to figure out if
E
0
happens to b e very close
to a.
Kitaev’s theorem says that the above problem is complete for a complexity
class known as
QMA
, i.e. it is the “hardest” problem in
QMA
. In other words,
any problem in QMA can be translated into a local Hamiltonian ground state
problem with polynomial overhead. A brief survey can be found on arXiv:quant-
ph/0210077.
What is this
QMA
? We will not go into details, but roughly, it is a quantum
version of
NP
. In case you are wondering, MA stands for Merlin and Arthur. . .