III Quantum Computation (Full)

Part III — Quantum Computation

Based on lectures by R. Jozsa

Notes taken by Dexter Chua

Michaelmas 2016

These notes are not endorsed by the lecturers, and I have modified them (often

significantly) after lectures. They are nowhere near accurate representations of what

was actually lectured, and in particular, all errors are almost surely mine.

Quantum mechanical processes can be exploited to provide new modes of information

processing that are beyond the capabilities of any classical computer. This leads to

remarkable new kinds of algorithms (so-called quantum algorithms) that can offer

a dramatically increased efficiency for the execution of some computational tasks.

Notable examples include integer factorisation (and consequent efficient breaking of

commonly used public key crypto systems) and database searching. In addition to such

potential practical benefits, the study of quantum computation has great theoretical

interest, combining concepts from computational complexity theory and quantum

physics to provide striking fundamental insights into the nature of both disciplines.

The course will cover the following topics:

Notion of qubits, quantum logic gates, circuit model of quantum computation. Basic

notions of quantum computational complexity, oracles, query complexity.

The quantum Fourier transform. Exposition of fundamental quantum algorithms
including the Deutsch-Jozsa algorithm, Shor’s factoring algorithm, Grover’s searching
algorithm.

A selection from the following further topics (and possibly others):

(i) Quantum teleportation and the measurement-based model of quantum computation;

(ii) Lower bounds on quantum query complexity;

(iii) Phase estimation and applications in quantum algorithms;

(iv) Quantum simulation for local Hamiltonians.

Pre-requisites

It is desirable to have familiarity with the basic formalism of quantum mechanics

especially in the simple context of finite dimensional state spaces (state vectors, Dirac

notation, composite systems, unitary matrices, Born rule for quantum measurements).

Prerequisite notes will be provided on the course webpage giving an account of the

necessary material including exercises on the use of notations and relevant calculational

techniques of linear algebra. It would be desirable for you to look through this material

at (or slightly before) the start of the course. Any encounter with basic ideas of classical

theoretical computer science (complexity theory) would be helpful but is not essential.

Contents

0 Introduction

1 Classical computation theory

2 Quantum computation

3 Some quantum algorithms

3.1 Balanced vs constant problem

3.2 Quantum Fourier transform and periodicities

3.3 Shor’s algorithm

3.4 Search problems and Grover’s algorithm

3.5 Amplitude amplification

4 Measurement-based quantum computing

5 Phase estimation algorithm

6 Hamiltonian simulation

0 Introduction

Quantum computation is currently a highly significant and important subject,

and is very active in international research.

First of all, it is a fundamental connection between physics and computing.

We can think of physics as computing, where in physics, we label states with

parameters (i.e. numbers), and physical evolution changes these parameters.

So we can think of these parameters as encoding information, and physical

evolution changes the information. Thus, this evolution can be thought of as a

computational process.

More strikingly, we can also view computing as physics! We all have com-

puters, and usually represent information as bits, 0 or 1. We often think of

computation as manipulation of these bits, i.e. as discrete maths. However, there

are no actual discrete bits: when we build a computer, we need physical devices

to represent these bits. When we run a computation on a computer, it has to

obey the laws of physics. So we arrive at the idea that the limits of computation

are not a part of mathematics, but depend on the laws of physics. Thus, we can

associate a “computing power” with any theory of physics!

On the other hand, there is also a technology/engineering aspect of quantum

computation. Historically, we have been trying to reduce the size of computers.

Eventually, we will want to try to achieve miniaturization of computer compo-

nents to essentially the subatomic scale. The usual boolean operations we base

our computations on do not work so well on this small scale, since quantum

effects start to kick in. We could try to mitigate these quantum issues and

somehow force the bits to act classically, but we can also embrace the quantum

effects, and build a quantum computer! There is a lot of recent progress in

quantum technology. We are now expecting a 50-qubit quantum computer in full

coherent control soon. However, we are not going to talk about implementation

in this course.

Finally, apart from the practical problem of building quantum computers, we

also have theoretical quantum computer science, where we try to understand how

quantum algorithms behave. This is about how we can actually exploit quantum

physical facts for computational possibilities beyond classical computers. This

will be the focus of the course.

1 Classical computation theory

To appreciate the difference between quantum and classical computing, we need

to first understand classical computing. We will only briefly go over the main

ideas instead of working out every single technical detail. Hence some of the

definitions might be slightly vague.

We start with the notion of “computable”. To define computability, one

has to come up with a sensible mathematical model of a computer, and then

“computable” means that this theoretical computer can compute it. So far, any two

sensible mathematical models of computations we manage to come up with are

equivalent, so we can just pick any one of them. Consequently, we will not spend

much time working out a technical definition of computable.

Example. Let $N$ be an integer. We want to figure out if $N$ is a prime. This is
clearly computable, since we can try all numbers less than $N$ and see if any of them divides $N$.

This is not too surprising, but it turns out there are some problems that are

not computable! Most famously, we have the Halting problem.

Example (Halting problem). Given the code of a computer program, we want

to figure out if the computer will eventually halt. In 1936, Turing proved that

this problem is uncomputable! So we cannot have a program that determines if

an arbitrary program halts.

For a less arbitrary problem, we have

Example. Given a polynomial with integer coefficients with many variables,
e.g. $2x^2 y - 17zw^{19} + x^5 w^3 + 1$, does this have a root in the integers? It was
shown in 1970 that this problem is uncomputable as well!

These results are all for classical computing. If we expect quantum computing
to be somehow different, can we get around these problems? This turns out not to

be the case, for the very reason that all the laws of quantum physics (e.g. state

descriptions, evolution equations) are all computable on a classical computer

(in principle). So it follows that quantum computing, being a quantum process,

cannot compute any classical uncomputable problem.

Despite this limitation, quantum computation is still interesting! In practice,

we do not only care about computability. We care about how efficient we are at

doing the computation. This is the problem of complexity — the complexity of

a quantum computation might be much lower than that of its classical counterpart.

To make sense of complexity, we need to make our notion of computations a

bit more precise.

Definition (Input string). An input bit string is a sequence of bits $x = i_1 i_2 \cdots i_n$,
where each $i_k$ is either 0 or 1. We write $B_n$ for the set of all $n$-bit strings, and
$B = \bigcup_{n \in \mathbb{N}} B_n$. The input size is the length $n$. So in particular, if the input is
regarded as an actual number, the size is not the number itself, but its logarithm.

Definition (Language). A language is a subset L ⊆ B.

Definition (Decision problem). Given a language $L$, the decision problem is to
determine whether an arbitrary $x \in B$ is a member of $L$. The output is thus 1
bit of information, namely yes or no.

Of course, we can have a more general task with multiple outputs, but for

simplicity, we will not consider that case here.

Example. If $L \subseteq B$ is the set of all prime numbers, then the corresponding decision
problem is determining whether a number is prime.

We also have to talk about models of computations. We will only give an

intuitive and classical description of it.

Definition (Computational model). A computational model is a process with

discrete steps (elementary computational steps), where each step requires a

constant amount of effort/resources to implement.

If we think about actual computers that work with bits, we can imagine a
step as an operation such as “and” or “or”. Note that addition and multiplication
are not considered a single step: as the numbers get larger, it takes more effort
to add or multiply them.

Sometimes it is helpful to allow some randomness.

Definition (Randomized/probabilistic computation). This is the same as a usual
computational model, but the process also has access to a string $r_1, r_2, r_3, \cdots$
of independent, uniform random bits. In this case, we will often require the
answer/output to be correct with “suitably good” probability.

In computer science, there is a separate notion of “non-deterministic” computation,
which is different from probabilistic computation. In probabilistic
computation, every time we ask for a random number, we just pick one of the
possible outputs and follow it. With a non-deterministic computer, we simultaneously
consider all possible choices with no extra overhead. This is extremely

powerful, and also obviously physically impossible, but it is a convenient thing

to consider theoretically.

Definition (Complexity of a computational task (or an algorithm)). The complexity
of a computational task or algorithm is the “consumption of resources as
a function of input size $n$”. The resources are usually the time

    $T(n)$ = number of computational steps needed,

and space

    $Sp(n)$ = amount of memory/work space needed.

In each case, we take the worst-case input of a given size $n$.

We usually consider the worst-case scenario, since, e.g. for primality testing,

there are always some numbers which we can easily rule out as being not

prime (e.g. even numbers). Sometimes, we will also want to study the average

complexity.

In the course, we will mostly focus on the time complexity, and not work

with the space complexity itself.

As one would imagine, the actual time or space taken depends heavily on the
actual computational model. Thus, the main question we ask will be whether
$T(n)$ grows polynomially or super-polynomially (“exponentially”) with $n$.

Definition (Polynomial growth). We say $T(n)$ grows polynomially, and write

    $$T(n) = O(\mathrm{poly}(n)) = O(n^k)$$

for some $k$, if there is some constant $c$ and some integer $n_0$
such that $T(n) < c n^k$ for all $n > n_0$.

The other possible cases are exponential growth, e.g. $T(n) = c_1 2^{c_2 n}$, or
super-polynomial and sub-exponential growth such as $T(n) = 2^{\sqrt{n}}$ or $n^{\log n}$.

We will usually regard polynomial time processes as “feasible in practice”,

while super-polynomial ones are considered “infeasible”. Of course, this is not

always actually true. For example, we might have a polynomial time of $n^{10^{10^{10}}}$,
or an exponential time of $2^{0.0000\ldots0001n}$. However, this distinction of polynomial

vs non-polynomial is robust, since any computational model can “simulate” other

computational models in polynomial time. So if something is polynomial in one

computational model, it is polynomial in all models.

In general, we can define more refined complexity classes of decision problems:

(i) P (polynomial time): The class of decision problems having a deterministic
polynomial-time algorithm.

(ii) BPP (bounded error, probabilistic polynomial time): The class of decision
problems having probabilistic polynomial time algorithms such that for
every input,

    $$\mathrm{Prob}(\text{answer is correct}) \geq \frac{2}{3}.$$

The number $\frac{2}{3}$ is sort of arbitrary. We see that we cannot put $\frac{1}{2}$, or
else we can just randomly guess the answer. So we need something greater
than $\frac{1}{2}$, and “bounded” refers to it being bounded away from $\frac{1}{2}$. We could
replace $\frac{2}{3}$ with any other constant $\frac{1}{2} + \delta$ with $0 < \delta < \frac{1}{2}$, and BPP is the
same. This is because if we have a $\frac{1}{2} + \delta$ algorithm, we simply repeat the
algorithm $K$ times, and take the majority vote. By the Chernoff bound (a
result in probability), the probability that the majority vote is correct is
$> 1 - e^{-2\delta^2 K}$. So as we do more and more runs, the probability of getting
a wrong answer decays exponentially. The success probability can be made bigger than any $1 - \varepsilon$
by taking a suitably large $K$. Since $K$ times a polynomial time is still polynomial
time, we still have a polynomial time algorithm.
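
The amplification argument can be checked numerically. The following is a minimal sketch, not taken from the lectures: it computes the exact probability that a majority vote over $K$ independent runs is correct, assuming each run is independently correct with probability $\frac{1}{2} + \delta$, and compares it with the bound $1 - e^{-2\delta^2 K}$. The function names are illustrative.

```python
from math import comb, exp

def majority_correct_prob(p: float, K: int) -> float:
    """Exact probability that more than K/2 of K independent runs are correct,
    when each run is correct with probability p (use odd K to avoid ties)."""
    return sum(comb(K, j) * p**j * (1 - p)**(K - j)
               for j in range(K // 2 + 1, K + 1))

def chernoff_lower_bound(delta: float, K: int) -> float:
    """The lower bound 1 - exp(-2 delta^2 K) quoted in the text."""
    return 1 - exp(-2 * delta**2 * K)

delta = 1 / 6  # single-run success probability 1/2 + delta = 2/3
for K in (1, 11, 51, 101):
    exact = majority_correct_prob(0.5 + delta, K)
    bound = chernoff_lower_bound(delta, K)
    print(f"K={K:4d}  exact={exact:.6f}  bound={bound:.6f}")
```

The exact value always sits above the bound, and both approach 1 as $K$ grows, which is all the argument needs.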

These two are often considered as “classically feasible computations”, or “com-

putable in practice”. In the second case, we tolerate small errors, but that is fine

in practice, since in genuine computers, random cosmic rays and memory failures

can also cause small errors in the result, even for a deterministic algorithm.

It is clear that P is contained in BPP, but we do not know about the other
direction. It is not known whether P and BPP are the same. In general, not
much is known about whether two complexity classes are the same.

Example (Primality testing). Let $N$ be an integer. We want to determine if it
is prime. The input size is $\log_2 N$. The naive method of primality testing is to
test all smaller numbers and see if any of them divides $N$. We only need to test up to $\sqrt{N}$, since if
$N$ has a non-trivial factor, there must be one not exceeding $\sqrt{N}$. This is not polynomial time:
we need $\sqrt{N} = 2^{\frac{1}{2} \log N}$ operations, which is exponential in the input size.

How about a probabilistic algorithm? We can choose a random $k < N$, and
see if $k$ divides $N$. This is a probabilistic, polynomial time algorithm, but it is
not bounded, since the probability of getting a correct answer is not $> \frac{1}{2}$.

In reality, primality testing is known to be in BPP (1976), and it is also
known to be in P (2004).
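
To make the exponential cost of the naive method concrete, here is a small sketch (an illustration, not part of the lectures) of trial division up to $\sqrt{N}$ that also counts the trial divisions performed. For simplicity each division is counted as one step, although, as noted earlier, arithmetic on large numbers is not really a single elementary step.

```python
from math import isqrt

def is_prime_trial_division(N: int) -> tuple[bool, int]:
    """Return (whether N is prime, number of trial divisions performed)."""
    if N < 2:
        return False, 0
    steps = 0
    for k in range(2, isqrt(N) + 1):
        steps += 1
        if N % k == 0:       # found a non-trivial factor
            return False, steps
    return True, steps

for N in (97, 7919, 1_000_003):
    prime, steps = is_prime_trial_division(N)
    # N.bit_length() is the input size; for primes the step count is about sqrt(N)
    print(N, N.bit_length(), prime, steps)
```

For prime inputs the number of divisions is about $\sqrt{N}$, so each extra pair of input bits roughly doubles the work.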

Finally, we quickly describe a simple model of (classical) computation that we

will use to build upon later on. While the most famous model for computation

is probably the Turing machine, for our purposes, it is much simpler to work

with the circuit model.

The idea is simple. In general, we are working with bits, and a program is a
function $f: B_m \to B_n$. It is a mathematical fact that any such function can be
constructed by combinations of boolean AND and NOT gates. We say that
this is a universal set of gates. Thus a “program” is a specification of how to
arrange these gates in order to give the function we want, and the time taken by
the circuit is simply the number of gates we need.
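
As a small illustration of universality (a sketch under the assumption that wires may be copied freely, as is usual in the classical circuit model), the following builds OR and XOR, and from them a half adder, using only AND and NOT. The function names are illustrative.

```python
def AND(a: int, b: int) -> int:
    return a & b

def NOT(a: int) -> int:
    return 1 - a

def OR(a: int, b: int) -> int:
    # De Morgan: a OR b = NOT(NOT a AND NOT b)
    return NOT(AND(NOT(a), NOT(b)))

def XOR(a: int, b: int) -> int:
    # a XOR b = (a OR b) AND NOT(a AND b)
    return AND(OR(a, b), NOT(AND(a, b)))

def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum bit, carry bit) of a + b, built from AND and NOT only."""
    return XOR(a, b), AND(a, b)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, OR(a, b), XOR(a, b), half_adder(a, b))
```

Counting calls, OR uses 4 basic gates and XOR uses 7, so swapping one gate set for another changes the gate count only by a constant factor per gate.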

Of course, we could have chosen a different universal set of gates, and the
programs would be different. However, since only a fixed number of gates is
needed to construct AND and NOT from any universal set, and vice versa,
it follows that the difference in time is always just polynomial.

2 Quantum computation

We are now going to start talking about quantum computation. Our model of

quantum computation will be similar to the circuit model.

The main difference is that instead of working with bits, we work with qubits.

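A single qubit is an element of $\mathbb{C}^2$, with basis vectors

    $$|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

When we have multiple qubits, we write them as $|a\rangle|b\rangle$, which is a shorthand
for $|a\rangle \otimes |b\rangle$ etc.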

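Now any classical bit string $x = i_1 i_2 \cdots i_n$ can be encoded as a state of qubits

    $$|i_1\rangle|i_2\rangle \cdots |i_n\rangle|0\rangle \cdots |0\rangle \in \bigotimes_{i=1}^{n+k} \mathbb{C}^2 \cong \mathbb{C}^{2^{n+k}},$$

where we padded $k$ extra zeroes at the end. In classical computation, there was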

no such need, because within any computation, we were free to introduce or

remove extra bits. However, one peculiarity of quantum computation is that all

processes are invertible, and in particular, the number of qubits is always fixed.

So if we want to do something “on the side” during the computations, the extra

bits needed to do so must be supplied at the beginning.

Now the quantum gates are not just boolean functions, but unitary operators

on the states. The standard gates will operate on one or two qubits only, and

we can chain them together to get larger operators.

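We now list our standard unitary gates. The four main (families of) single-qubit
gates we will need are the following (in the standard $|0\rangle, |1\rangle$ basis):

    $$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
      Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad
      H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad
      P_\varphi = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\varphi} \end{pmatrix}.$$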

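We also have two “controlled” gates. These controlled gates take in two qubits.
They do not directly change the first qubit, and decide whether or not to act
on the second qubit depending on the value of the first. They are given by

    $$CX\,|i\rangle|j\rangle = |i\rangle X^i |j\rangle, \quad CZ\,|i\rangle|j\rangle = |i\rangle Z^i |j\rangle.$$

In the basis $\{|0\rangle|0\rangle, |0\rangle|1\rangle, |1\rangle|0\rangle, |1\rangle|1\rangle\}$, we can write these operators as

    $$CX = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad
      CZ = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}.$$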

These will be taken as the basic unitary gates.
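
As a sanity check, the gates listed above can be written down as plain matrices and tested for unitarity. This is only an illustrative sketch in pure Python; the helper names (`matmul`, `dagger`, `is_unitary`, `controlled`) are not from the notes.

```python
import cmath
from math import sqrt, pi

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    """Conjugate transpose."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def is_unitary(A, tol=1e-12):
    prod = matmul(dagger(A), A)
    n = len(A)
    return all(abs(prod[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))

X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
H = [[1 / sqrt(2), 1 / sqrt(2)], [1 / sqrt(2), -1 / sqrt(2)]]

def P(phi):
    return [[1, 0], [0, cmath.exp(1j * phi)]]

def controlled(U):
    """4x4 matrix of |i>|j> -> |i> U^i |j> in the basis |00>, |01>, |10>, |11>."""
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, U[0][0], U[0][1]],
            [0, 0, U[1][0], U[1][1]]]

CX, CZ = controlled(X), controlled(Z)
print(all(is_unitary(G) for G in (X, Z, H, P(pi / 4), CX, CZ)))
```

The same helpers can be used to confirm identities such as $HZH = X$ up to floating-point error.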

The other thing we are allowed to do in a quantum system is measurement.
What does this do? Wlog, suppose we are measuring the first qubit. Suppose the state
before the measurement is given by

    $$c_0 |0\rangle|a\rangle + c_1 |1\rangle|b\rangle,$$

where $|a\rangle, |b\rangle$ are $(n-1)$-qubit states of unit norm, and $|c_0|^2 + |c_1|^2 = 1$.
Then when we measure the first qubit, we have probability $|c_0|^2$ of getting 0,
and probability $|c_1|^2$ of getting 1. After measuring, the resulting state is either
$|0\rangle|a\rangle$ or $|1\rangle|b\rangle$, depending on whether we measured 0 or 1 respectively.
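
The measurement rule can be simulated directly on a list of amplitudes. The sketch below is illustrative only: it assumes the state is stored as $2^n$ amplitudes indexed by bit strings read left to right, so the first half of the list corresponds to the first qubit being 0. The function name is hypothetical.

```python
import random

def measure_first_qubit(state, rng=random):
    """Measure the first qubit. Returns (outcome, normalised post-measurement state)."""
    half = len(state) // 2
    p0 = sum(abs(a) ** 2 for a in state[:half])      # |c_0|^2
    outcome = 0 if rng.random() < p0 else 1
    kept = state[:half] if outcome == 0 else state[half:]
    norm = sum(abs(a) ** 2 for a in kept) ** 0.5      # |c_0| or |c_1|
    post = [0.0] * len(state)
    offset = 0 if outcome == 0 else half
    for i, a in enumerate(kept):
        post[offset + i] = a / norm
    return outcome, post

# Example state 0.6|0>|0> + 0.8|1>|1>, so outcome 0 has probability 0.36.
state = [0.6, 0.0, 0.0, 0.8]
print(measure_first_qubit(state))
```

Note that the post-measurement state is renormalised, which is why $|a\rangle$ and $|b\rangle$ are taken to have unit norm in the text.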

Measurements are irreversible, and in particular aren’t given by unitary

matrices. We will allow ourselves to make classical computations based on the

results of the measurements, and decide which future quantum gates we apply

based on the results of the classical computations.

While this seems like a very useful thing to do, it is a mathematical fact

that we can modify such a system to an equivalent quantum one where all

the measurements are done at the end instead. So we are not actually adding

anything new to our mathematical model of computation by allowing such

classical manipulations.

Now what is the analogous notion of a universal set of gates? In the classical

case, the set of boolean functions is discrete, and therefore a finite set of gates

can be universal. However, in the quantum case, the possible unitary matrices

are continuous, so no finite set can be universal (more mathematically, there is

an uncountable number of unitary matrices, but a finite collection of gates can

only generate a countable subgroup of matrices).

Thus, instead of asking for universality, we ask for approximate universality.

To appreciate this, we can take the example of rotations: there is no single
rotation that generates all possible rotations of the plane $\mathbb{R}^2$. However, we can pick a
rotation by an irrational angle (i.e. an irrational multiple of $2\pi$), and then the set of rotations generated by this
rotation is dense in the set of all rotations, and this is good enough.
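
The density claim can be illustrated numerically. In the sketch below (an illustration, with hypothetical names), we repeatedly apply a rotation by 1 radian, which is not a rational multiple of $2\pi$, and search for the first power that lands within $\varepsilon$ of a target angle.

```python
from math import pi

def steps_to_approximate(theta, target, eps, max_steps=10**6):
    """Smallest n >= 1 such that n*theta is within eps of target (mod 2*pi),
    or None if no such n <= max_steps is found."""
    angle = 0.0
    for n in range(1, max_steps + 1):
        angle = (angle + theta) % (2 * pi)
        diff = abs(angle - target) % (2 * pi)
        if min(diff, 2 * pi - diff) < eps:   # distance measured around the circle
            return n
    return None

theta = 1.0  # 1 radian: an irrational multiple of 2*pi
for eps in (1e-1, 1e-2, 1e-3):
    print(eps, steps_to_approximate(theta, target=pi / 3, eps=eps))
```

By contrast, a rotation by a rational multiple of $2\pi$, such as $\pi/2$, only ever reaches finitely many angles, so the search fails for small $\varepsilon$. The price of higher accuracy is a longer product of gates.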

Definition (Approximate universality). A collection of gates is approximately
universal if for any unitary matrix $U$ and any $\varepsilon > 0$, there is some circuit $\tilde{U}$
built out of the collection of gates such that

    $$\| U - \tilde{U} \| < \varepsilon.$$

In other words, we have

    $$\sup_{\|\psi\| = 1} \left\| U |\psi\rangle - \tilde{U} |\psi\rangle \right\| < \varepsilon,$$

where we take the usual norm on the vectors (any two norms are equivalent if
the state space is finite dimensional, so it doesn’t really matter).

We will provide some examples without proof.

Example. The infinite set {CX} ∪ {all 1-qubit gates} is exactly universal.

Example. The collection $\{H, T = P_{\pi/4}, CX\}$ is approximately universal.

Similar to the case of classical computation, we can define the following

complexity class:

Definition (BQP). The complexity class BQP (bounded error, quantum
polynomial time) is the class of all decision problems computable with polynomial-size
quantum circuits with at least 2/3 probability of being correct.

We can show that BQP is independent of the choice of approximately universal

gate set. This is not as trivial as the classical case, since when we switch to

a different set, we cannot just replace a gate with an equivalent circuit — we

can only do so approximately, and we have to make sure we control the error

appropriately to maintain the bound of 2/3.

We will consider BQP to be the class of computations that are feasible on a quantum computer.

It is also a fact that BPP is a subset of BQP. This, again, is not a trivial

result. In a quantum computation, we act by unitary matrices, which are

invertible. However, boolean functions in classical computing are not invertible

in general. So there isn’t any straightforward plug-in replacement.

However, it turns out that for any classical computation, there is an equivalent

computation that uses reversible/invertible boolean gates, with a modest (i.e.

polynomial) overhead of both space and time resources. Indeed, let $f: B_m \to B_n$
be a boolean function. We consider the function

    $$\tilde{f}: B_{m+n} \to B_{m+n}, \quad (x, y) \mapsto (x, y \oplus f(x)),$$

where $\oplus$ is the bitwise addition (i.e. addition in $(\mathbb{Z}/2\mathbb{Z})^n$, e.g. $011 \oplus 110 = 101$).

We call x and y the input register and output register respectively.

Now if we set $y = 0$, then we get $f(x)$ in the second component of $\tilde{f}$. So we
can easily obtain $f$ from $\tilde{f}$, and vice versa.

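Lemma. For any boolean function $f: B_m \to B_n$, the function

    $$\tilde{f}: B_{m+n} \to B_{m+n}, \quad (x, y) \mapsto (x, y \oplus f(x)),$$

is invertible, and in fact an involution, i.e. is its own inverse.

Proof. Simply note that $x \oplus x = 0$ for any $x$, and bitwise addition is associative,
so $\tilde{f}(\tilde{f}(x, y)) = (x, y \oplus f(x) \oplus f(x)) = (x, y)$.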
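
The construction is easy to try out on a small example. The sketch below is illustrative: it represents bit strings as Python integers and takes the (non-invertible) AND of two bits as a hypothetical choice of $f$, with $m = 2$ and $n = 1$.

```python
from itertools import product

def reversible_extension(f):
    """Return f_tilde(x, y) = (x, y XOR f(x)), with bit strings stored as integers."""
    def f_tilde(x: int, y: int):
        return x, y ^ f(x)
    return f_tilde

# Hypothetical example: f(x) = AND of the two bits of x, which is not invertible.
f = lambda x: (x >> 1) & x & 1
f_tilde = reversible_extension(f)

pairs = list(product(range(4), range(2)))            # all (x, y) in B_2 x B_1
images = [f_tilde(x, y) for x, y in pairs]

is_bijection = sorted(images) == sorted(pairs)        # f_tilde permutes the pairs
is_involution = all(f_tilde(*f_tilde(x, y)) == (x, y) for x, y in pairs)
print(is_bijection, is_involution)
print([f_tilde(x, 0)[1] for x in range(4)])           # setting y = 0 recovers f(x)
```

Both checks succeed for any choice of `f`, since the argument in the proof never uses any property of $f$.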

So we can just consider boolean functions that are invertible. There is an

easy way of making this a unitary matrix.

Lemma. Let $g: B_k \to B_k$ be a reversible permutation of $k$-bit strings. Then
the linear map on $\mathbb{C}^{2^k}$ defined by

    $$A: |x\rangle \mapsto |g(x)\rangle$$

on $k$ qubits is unitary.

Proof. This is because the $x$th column of the matrix of $A$ is in fact $A|x\rangle = |g(x)\rangle$,
and since $g$ is bijective, the $|g(x)\rangle$ are all orthonormal.

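Thus, given any $f: B_m \to B_n$, we get the $(n+m)$-qubit unitary matrix
denoted by $U_f$, given by

    $$U_f |x\rangle|y\rangle = |x\rangle|y \oplus f(x)\rangle.$$

In particular, if we set $|y\rangle = |0 \cdots 0\rangle$, then we get

    $$|x\rangle|0 \cdots 0\rangle \xrightarrow{U_f} |x\rangle|f(x)\rangle,$$

which we can use to evaluate $f(x)$.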

What does quantum computation give us? Our gate $U_f$ is unitary, and in
particular acts linearly on the states. So by linearity, we have

    $$\frac{1}{\sqrt{2^m}} \sum_{x \in B_m} |x\rangle|0 \cdots 0\rangle \xrightarrow{U_f} \frac{1}{\sqrt{2^m}} \sum_{x \in B_m} |x\rangle|f(x)\rangle.$$

Now one run of $U_f$ gives us this state that embodies all exponentially many
values of $f(x)$. Of course, we still need to figure out how we can extract useful
information from this superposition, and we will figure that out later on.

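While a uniform superposition of all $n$-bit strings

    $$|\psi\rangle = \frac{1}{\sqrt{2^n}} \sum_{x \in B_n} |x\rangle$$

has exponentially many terms, it can be made in polynomial (and in fact linear)
time by $n$ applications of $H$. Indeed, recall that $H$ is given by

    $$|0\rangle \xrightarrow{H} \frac{1}{\sqrt{2}} (|0\rangle + |1\rangle).$$

So for an $n$-qubit state, we have

    $$|0\rangle \cdots |0\rangle \xrightarrow{H \otimes \cdots \otimes H} \frac{1}{\sqrt{2^n}} (|0\rangle + |1\rangle) \cdots (|0\rangle + |1\rangle),$$

and expanding the right hand side gives us exactly what we want.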
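
The following sketch simulates this on a classical state vector, applying $H$ to one qubit at a time. It is only an illustration: the helper names are made up, and the classical simulation itself needs memory exponential in $n$, whereas the quantum circuit uses just $n$ gates.

```python
from math import sqrt, isclose

def apply_single_qubit_gate(state, gate, target, n):
    """Apply a 2x2 gate to qubit `target` (0 = leftmost) of an n-qubit state vector."""
    new = [0.0] * len(state)
    shift = n - 1 - target
    for idx, amp in enumerate(state):
        bit = (idx >> shift) & 1                      # current value of the target qubit
        for out_bit in (0, 1):
            out_idx = (idx & ~(1 << shift)) | (out_bit << shift)
            new[out_idx] += gate[out_bit][bit] * amp
    return new

H = [[1 / sqrt(2), 1 / sqrt(2)], [1 / sqrt(2), -1 / sqrt(2)]]

def uniform_superposition(n):
    state = [0.0] * (2 ** n)
    state[0] = 1.0                                    # the state |0...0>
    for q in range(n):                                # n applications of H
        state = apply_single_qubit_gate(state, H, q, n)
    return state

psi = uniform_superposition(3)
print(all(isclose(a, 1 / sqrt(8)) for a in psi))      # every amplitude equals 1/sqrt(2^n)
```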

3 Some quantum algorithms

3.1 Balanced vs constant problem

We are going to come to our first quantum algorithm.

Here our computational task is a bit special. Instead of an input $i_1 \cdots i_n \in B_n$,
we are given a black box/oracle that computes some $f: B_m \to B_n$. We may
have some a priori promise on $f$, and we want to determine some property of
the function $f$. The only access to $f$ is querying the oracle with its inputs.

The use of $f$ (classical) or $U_f$ (quantum) counts as one step of computation.

The query complexity of this task is the least number of times the oracle needs

to be queried. Usually, we do not care much about how many times the other

gates are used.

Obviously, if we just query all the values of the function, then we can

determine anything about the function, since we have complete information. So

the goal is to see if we can do it with fewer queries.

The problem we are going to look at is the balanced vs constant problem.

The input black box is a function $f: B_n \to B_1$. The promise is that $f$ is either

(i) a constant function, i.e. $f(x) = 0$ for all $x$, or $f(x) = 1$ for all $x$; or

(ii) a balanced function, i.e. exactly half of the values in $B_n$ are sent to 1.

We want to determine if f is (i) or (ii) with certainty.

Classically, if we want to find the answer with certainty, in the worst case
scenario, we will have to perform $2^{n-1} + 1$ queries: if you are really unlucky,
you might query a balanced function $2^{n-1}$ times and get 0 all the time, and you
can’t distinguish it from a constant 0 function.

Quantumly, we have the Deutsch-Jozsa algorithm, that answers the question

in 1 query!

A trick we are going to use is something known as “phase kickback”. Instead
of encoding the result as a single bit, we encode it as $\pm$ signs, i.e. as phases
of the quantum bits. The “kickback” part is about using the fact that we have

    $$|a\rangle (e^{i\theta} |b\rangle) = (e^{i\theta} |a\rangle) |b\rangle.$$

So we might do something to $|b\rangle$ to give it a phase, and then we “kick back” the
phase on $|a\rangle$, and hopefully obtain something when we measure $|a\rangle$.

Recall that we have

    $$U_f |x\rangle|y\rangle = |x\rangle|y \oplus f(x)\rangle.$$

Here $|x\rangle$ has $n$ qubits, and $|y\rangle$ has 1 qubit.

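The non-obvious thing to do is to set the output register to

    $$|\alpha\rangle = \frac{|0\rangle - |1\rangle}{\sqrt{2}} = H|1\rangle = HX|0\rangle.$$

We then note that $U_f$ acts by

    $$|x\rangle \left( \frac{|0\rangle - |1\rangle}{\sqrt{2}} \right) \mapsto |x\rangle \, \frac{|f(x)\rangle - |1 \oplus f(x)\rangle}{\sqrt{2}}
      = \begin{cases} |x\rangle \frac{|0\rangle - |1\rangle}{\sqrt{2}} & \text{if } f(x) = 0 \\ |x\rangle \frac{|1\rangle - |0\rangle}{\sqrt{2}} & \text{if } f(x) = 1 \end{cases}
      = (-1)^{f(x)} |x\rangle|\alpha\rangle.$$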

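Now we do this to the superposition over all possible $x$:

    $$\frac{1}{\sqrt{2^n}} \sum_x |x\rangle|\alpha\rangle \mapsto \left( \frac{1}{\sqrt{2^n}} \sum_x (-1)^{f(x)} |x\rangle \right) |\alpha\rangle.$$

So one query gives us

    $$|\xi_f\rangle = \frac{1}{\sqrt{2^n}} \sum_x (-1)^{f(x)} |x\rangle.$$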

The key observation now is simply that if $f$ is constant, then all the signs are
the same. If $f$ is balanced, then exactly half of the signs are $+$ and half are $-$. The
crucial thing is that $|\xi_{f\,\mathrm{const}}\rangle$ is orthogonal to $|\xi_{f\,\mathrm{balanced}}\rangle$. This is a good thing,
since orthogonality is something we can perfectly distinguish with a quantum
measurement.

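There is a slight technical problem here. We allow only measurements in the
standard $|0\rangle, |1\rangle$ basis. So we want to “rotate” our states to the standard
basis. Fortunately, recall that

    $$|0\rangle \cdots |0\rangle \xrightarrow{H \otimes \cdots \otimes H} \frac{1}{\sqrt{2^n}} \sum_x |x\rangle.$$

Now recall that $H$ is self-inverse, so $H^2 = I$. Thus, if we apply $H \otimes \cdots \otimes H$ to
$\frac{1}{\sqrt{2^n}} \sum_x |x\rangle$, then we obtain $|0\rangle \cdots |0\rangle$.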

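We write

    $$|\eta_f\rangle = H \otimes \cdots \otimes H \, |\xi_f\rangle.$$

Since $H$ is unitary, we still have

    $$|\eta_{f\,\mathrm{const}}\rangle \perp |\eta_{f\,\mathrm{balanced}}\rangle.$$

Now we note that if $f$ is constant, then $|\eta_{f\,\mathrm{const}}\rangle = \pm |0\rangle \cdots |0\rangle$.
If we look at what $|\eta_{f\,\mathrm{balanced}}\rangle$ is, it will be a huge mess, but it doesn’t really
matter. All that matters is that it is perpendicular to $|0\rangle \cdots |0\rangle$.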

Now when we measure $|\eta_f\rangle$, if $f$ is a constant function, then we obtain $0 \cdots 0$
with probability 1. If it is balanced, then we obtain something that is not $0 \cdots 0$ with
probability 1. So we can determine the result with probability 1.

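[Figure: circuit for the Deutsch-Jozsa algorithm. The $n$ input qubits, each initially $|0\rangle$, pass through $H$, then $U_f$, then $H$ again, and are measured. The output qubit, initially $|0\rangle$, passes through $X$ and $H$ before $U_f$, and is discarded afterwards.]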

This uses exactly one query, with $1 + (n + 1) + n + n = O(n)$ elementary gates
and measurements.
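
The whole algorithm can be simulated classically for small $n$. The sketch below is illustrative only: it tracks just the input register, since the output register stays in $|\alpha\rangle$ and factors out, and it implements $H \otimes \cdots \otimes H$ with a fast Walsh-Hadamard transform. The function names and the two sample functions are assumptions made for the example.

```python
from math import sqrt

def hadamard_all(state):
    """Apply H to every qubit of a state vector (fast Walsh-Hadamard transform)."""
    state = list(state)
    N = len(state)
    h = 1
    while h < N:
        for start in range(0, N, 2 * h):
            for i in range(start, start + h):
                a, b = state[i], state[i + h]
                state[i], state[i + h] = (a + b) / sqrt(2), (a - b) / sqrt(2)
        h *= 2
    return state

def prob_all_zero(f, n):
    """Probability of measuring 0...0 on the input register after one query to f."""
    N = 2 ** n
    psi = hadamard_all([1.0] + [0.0] * (N - 1))               # uniform superposition
    xi = [(-1) ** f(x) * amp for x, amp in enumerate(psi)]    # one query, phase kickback
    eta = hadamard_all(xi)                                     # rotate back to standard basis
    return abs(eta[0]) ** 2

n = 4
constant = lambda x: 1
balanced = lambda x: bin(x).count("1") % 2     # parity of the bits, a balanced function
p_const = prob_all_zero(constant, n)
p_bal = prob_all_zero(balanced, n)
print(p_const, p_bal)
```

Up to floating-point error, `p_const` is 1 and `p_bal` is 0, matching the analysis above.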

What if we tolerate error in the balanced vs constant problem? In other
words, we only require that the answer is correct with probability $1 - \varepsilon$ with
$0 < \varepsilon < \frac{1}{2}$.

In the quantum case, nothing much changes, since we are probably not going
to do better than 1 query. However, we no longer have a huge benefit over
classical algorithms. There is a classical randomized algorithm with $O(\log(1/\varepsilon))$
queries, and in particular the number of queries does not depend on $n$.

Indeed, we do it the obvious way. We choose some $K$ $x$-values uniformly at
random from $B_n$, say $x_1, \cdots, x_K$ (where $K$ is fixed and determined later). We
then evaluate $f(x_1), \cdots, f(x_K)$.

If all the outputs are the same, then we say $f$ is constant. If they are not
the same, then we say $f$ is balanced.

If $f$ actually is constant, then the answer is correct with probability 1. If $f$
is balanced, then each $f(x_i)$ is 0 or 1 with equal probability. So the probability
of getting the same values for all $x_i$ is

    $$\frac{2}{2^K} = 2^{1-K}.$$

This is our failure probability. So if we pick

    $$K > \log_2(\varepsilon^{-1}) + 1,$$

then we have a failure probability of less than $\varepsilon$.
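
This randomized algorithm is short enough to write out in full. The sketch below is an illustration with hypothetical names; it chooses the smallest integer $K$ strictly above $\log_2(\varepsilon^{-1}) + 1$ and samples inputs with replacement.

```python
import random
from math import log2

def classify(f, n, eps, rng=random):
    """Guess whether f: B_n -> B_1 is 'constant' or 'balanced' using K random queries.
    Returns (guess, K). The guess is wrong with probability < eps when f is balanced,
    and never wrong when f is constant."""
    K = int(log2(1 / eps) + 1) + 1           # smallest integer with K > log2(1/eps) + 1
    values = {f(rng.randrange(2 ** n)) for _ in range(K)}
    guess = "constant" if len(values) == 1 else "balanced"
    return guess, K

print(classify(lambda x: 0, n=20, eps=0.01))        # a constant function
print(classify(lambda x: x & 1, n=20, eps=0.01))    # last bit of x: a balanced function
```

The number of queries $K$ depends only on $\varepsilon$, not on $n$, which is the point of the comparison with the Deutsch-Jozsa algorithm.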

Can we decide every yes/no question about functions $f: B_n \to B_1$ by quantum
algorithms with “a few” queries? The answer is no. One prominent example is
the SAT problem (satisfiability problem): given an arbitrary $f$, we want to
determine if there is an $x$ such that $f(x) = 1$. It can be shown that any quantum
algorithm (even if we allow for bounded errors) needs at least $\Omega(\sqrt{2^n})$ queries,
which is achieved by Grover’s algorithm. Classically, we need $\Omega(2^n)$ queries. So
we have achieved a square root speedup, which is good, but not as good as the
Deutsch-Jozsa algorithm.

In any case, the Deutsch-Jozsa algorithm demonstrates how we can achieve

an exponential benefit with quantum algorithms, but it happens only when
we have no error tolerance. In real-life scenarios, external factors will lead to

potential errors anyway, and requiring that we are always correct is not a sensible

requirement.

There are other problems where quantum algorithms are better:

Example (Simon’s algorithm). Simon’s problem is a promise problem about
$f: B_n \to B_n$ with provably exponential separation between classical ($\Omega(2^{n/4})$)
and quantum ($O(n)$) query complexity even with bounded error. The details
are on the first example sheet.

3.2 Quantum Fourier transform and periodicities

We’ve just seen some nice examples of benefits of quantum algorithms. However,

oracles are rather unnatural problems — it is rare to just have a black-box access

to a function without knowing anything else about the function.

How about more “normal” problems? The issue with trying to compare

quantum and classical algorithms for “normal” problems is that we don’t actually

have any method to find the lower bound for the computational complexity. For

example, while we have not managed to find polynomial prime factorization

algorithms, we cannot prove for sure that there isn’t any classical algorithm that

is polynomial time. However, for the prime factorization problem, we do have a

quantum algorithm that does much better than all known classical algorithms.

This is Shor’s algorithm, which relies on the toolkit of the quantum Fourier

transform.

We start by defining the quantum Fourier transform.

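Definition (Quantum Fourier transform mod $N$). Suppose we have an $N$-dimensional
state space with basis $|0\rangle, |1\rangle, \cdots, |N - 1\rangle$ labelled by $\mathbb{Z}/N\mathbb{Z}$. The
quantum Fourier transform mod $N$ is defined by

    $$\mathrm{QFT}: |a\rangle \mapsto \frac{1}{\sqrt{N}} \sum_{b=0}^{N-1} e^{2\pi i ab/N} |b\rangle.$$

The matrix entries are

    $$[\mathrm{QFT}]_{ab} = \frac{1}{\sqrt{N}} \omega^{ab}, \quad \omega = e^{2\pi i/N},$$

where $a, b = 0, \cdots, N - 1$. We write $\mathrm{QFT}_n$ for the quantum Fourier transform
mod $n$.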

Note that we are civilized and start counting at 0, not 1.

We observe that the matrix $\sqrt{N}\,\mathrm{QFT}$ has the following properties:

(i) It is symmetric.

(ii) The first (i.e. 0th) row and column are all 1’s.

(iii) Each row and column is a geometric progression $1, r, r^2, \cdots, r^{N-1}$, where
$r = \omega^k$ for the $k$th row or column.

Example. If we look at $\mathrm{QFT}_2$, then we get our good old $H$. However, $\mathrm{QFT}_4$ is
not $H \otimes H$.

Proposition. QFT is unitary.

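Proof. We use the fact that

    $$1 + r + \cdots + r^{N-1} = \begin{cases} \frac{1 - r^N}{1 - r} & r \neq 1 \\ N & r = 1 \end{cases}.$$

So if $r = \omega^k$, then we get

    $$1 + r + \cdots + r^{N-1} = \begin{cases} 0 & k \not\equiv 0 \bmod N \\ N & k \equiv 0 \bmod N \end{cases}.$$

Then we have

    $$(\mathrm{QFT}^\dagger \mathrm{QFT})_{ij} = \frac{1}{\sqrt{N}^2} \sum_k \omega^{-ik} \omega^{jk} = \frac{1}{N} \sum_k \omega^{(j-i)k} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}.$$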
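
These facts are easy to confirm numerically for small $N$. The following sketch (illustrative, with made-up helper names) builds the QFT matrix from its definition, checks unitarity, and compares $\mathrm{QFT}_2$ with $H$ and $\mathrm{QFT}_4$ with $H \otimes H$.

```python
import cmath
from math import sqrt, pi

def qft_matrix(N):
    """Matrix with entries omega^(a*b) / sqrt(N), where omega = exp(2*pi*i/N)."""
    omega = cmath.exp(2j * pi / N)
    return [[omega ** (a * b) / sqrt(N) for b in range(N)] for a in range(N)]

def is_unitary(M, tol=1e-9):
    N = len(M)
    for i in range(N):
        for j in range(N):
            entry = sum(M[k][i].conjugate() * M[k][j] for k in range(N))  # (M^dagger M)_{ij}
            if abs(entry - (1 if i == j else 0)) > tol:
                return False
    return True

def kron(A, B):
    """Tensor (Kronecker) product of two matrices."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def close(A, B, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))

H = [[1 / sqrt(2), 1 / sqrt(2)], [1 / sqrt(2), -1 / sqrt(2)]]
print(all(is_unitary(qft_matrix(N)) for N in range(2, 9)))
print(close(qft_matrix(2), H), close(qft_matrix(4), kron(H, H)))
```

The difference between $\mathrm{QFT}_4$ and $H \otimes H$ is already visible in one entry: $[\mathrm{QFT}_4]_{11} = i/2$, while $[H \otimes H]_{11} = -1/2$.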

We now use the quantum Fourier Transform to solve the periodicity problem.

Example. Suppose we are given $f: \mathbb{Z}/N\mathbb{Z} \to Y$ (for some set $Y$). We are
promised that $f$ is periodic with some period $r \mid N$, so that

    $$f(x + r) = f(x)$$

for all $x$. We also assume that $f$ is injective in each period, so that

    $$0 \leq x_1 \neq x_2 \leq r - 1 \text{ implies } f(x_1) \neq f(x_2).$$

The problem is to find $r$, with success probability at least $1 - \varepsilon$ for some constant $\varepsilon$ independent of
$N$. Since this is not a decision problem, we can allow $\varepsilon > \frac{1}{2}$.

In the classical setting, if $f$ is viewed as an oracle, then $O(\sqrt{N})$ queries are
necessary and sufficient. We are going to show that quantumly, $O(\log \log N)$
queries with $O(\mathrm{poly}(\log N))$ processing steps suffice. In later applications, we
will see that the relevant input size is $\log N$, not $N$. So the classical algorithm is
exponential time, while the quantum algorithm is polynomial time.

Why would we care about such a problem? It turns out that later we will

see that we can reduce prime factorization into a periodicity problem. While

we will actually have a very explicit formula for $f$, there isn't much we can do

with it, and treating it as a black box and using a slight modification of what

we have here will be much more efficient than any known classical algorithm.

The quantum algorithm is given as follows:

(i) Make $\frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x\rangle$. For example, if $N = 2^n$, then we can make this using $H \otimes \cdots \otimes H$. If $N$ is not a power of 2, it is not immediately obvious how we can make this state, but we will discuss this problem later.

(ii) We make one query to get

\[ |f\rangle = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x\rangle |f(x)\rangle. \]

(iii) We now recall that $r \mid N$. Write $N = Ar$, so that $A$ is the number of periods. We measure the second register, and we will see some $y = f(x)$. We let $x_0$ be the least $x$ with $f(x) = y$, i.e. it is in the first period. Note that we don't know what $x_0$ is. We just know what $y$ is.

By periodicity, we know there are exactly $A$ values of $x$ such that $f(x) = y$, namely

\[ x_0, x_0 + r, x_0 + 2r, \cdots, x_0 + (A - 1)r. \]

By the Born rule, the first register is collapsed to

\[ |\mathrm{per}\rangle = \left( \frac{1}{\sqrt{A}} \sum_{j=0}^{A-1} |x_0 + jr\rangle \right) |f(x_0)\rangle. \]

We throw the second register away. Note that $x_0$ is chosen randomly from the first period $0, 1, \cdots, r - 1$ with equal probability.

What do we do next? If we measure $|\mathrm{per}\rangle$, we obtain a random $j$-value, so what we actually get is a random element ($x_0$th) of a random period ($j$th), namely a uniformly chosen random number in $0, \cdots, N - 1$. This is not too useful.

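(iv) The solution is to use the quantum Fourier transform, which is not surprising, since Fourier transforms are classically used to extract periodicity information.

Applying $\mathrm{QFT}_N$ to $|\mathrm{per}\rangle$ now gives

\[ \mathrm{QFT}_N |\mathrm{per}\rangle = \frac{1}{\sqrt{NA}} \sum_{j=0}^{A-1} \sum_{y=0}^{N-1} \omega^{(x_0 + jr)y} |y\rangle = \frac{1}{\sqrt{NA}} \sum_{y=0}^{N-1} \omega^{x_0 y} \left( \sum_{j=0}^{A-1} \omega^{jry} \right) |y\rangle. \]

We now see the inner sum is a geometric series. If $\omega^{ry} = 1$, then this sum is just $A$. Otherwise, we have

\[ \sum_{j=0}^{A-1} \omega^{jry} = \frac{1 - \omega^{rAy}}{1 - \omega^{ry}} = \frac{1 - 1}{1 - \omega^{ry}} = 0, \]

using $rA = N$. Now $\omega^{ry} = 1$ exactly when $y$ is a multiple of $N/r$. So we are left with

\[ \mathrm{QFT}_N |\mathrm{per}\rangle = \sqrt{\frac{A}{N}} \sum_{k=0}^{r-1} \omega^{x_0 kN/r} \left| k \frac{N}{r} \right\rangle. \]

Note that before the Fourier transform, the random shift $x_0$ lay in the label $|x_0 + jr\rangle$. After the Fourier transform, it is now encoded in the phase instead.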

(v) Now we can measure the label, and we will get some $C$ which is a multiple $k_0 \frac{N}{r}$, where $0 \leq k_0 \leq r - 1$ is chosen uniformly at random. We rewrite this equation as

\[ \frac{k_0}{r} = \frac{C}{N}. \]

We know $C$, because we just measured it, and $N$ is a given in the question. Also, $k_0$ is randomly chosen, and $r$ is what we want. So how do we extract that out?

If by good chance, we have $k_0$ coprime to $r$, then we can cancel $C/N$ to lowest terms and read off $r$ as the resulting denominator $\tilde{r}$. Note that cancelling $C/N$ to lowest terms can be done quickly by the Euclidean algorithm.

But how likely are we to be so lucky? We can just find some number theory book, and figure out that the number of natural numbers $< r$ that are coprime to $r$ grows as $O(r/\log \log r)$. More precisely, it is asymptotically at least $e^{-\gamma} r/\log \log r$, where $\gamma$ is the other Euler's constant. We note that

\[ O\left( \frac{1}{\log \log r} \right) > O\left( \frac{1}{\log \log N} \right). \]

So if $k_0$ is chosen uniformly and randomly, the probability that $k_0$ is coprime to $r$ is at least $O(1/\log \log N)$.

Note that if $k_0$ is not coprime with $r$, then we have $\tilde{r} \mid r$, and in particular $\tilde{r} < r$. So we can check if $\tilde{r}$ is a true period: we compute $f(0)$ and $f(\tilde{r})$ and see if they are the same. If $\tilde{r}$ is wrong, then they cannot be equal as $f$ is injective in the period.

While the probability of getting a right answer decreases as $N \to \infty$, we just have to do the experiment many times. From elementary probability, if an event has some (small) success probability $p$, then given any $0 < 1 - \varepsilon < 1$, for $M = -\frac{\log \varepsilon}{p}$ trials, the probability that there is at least one success is $> 1 - \varepsilon$, since all trials fail with probability $(1 - p)^M < e^{-pM} = \varepsilon$. So if we repeat the quantum algorithm $O(\log \log N)$ times, and check $\tilde{r}$ each time, then we can get a true $r$ with any constant level of probability.

(vi) We can further improve this process. If we have obtained two attempts $\tilde{r}, \tilde{r}'$, then we know $r$ is at least their least common multiple. So we can in fact achieve this in constant time, if we do a bit more number theory. However, the other parts of the algorithm (e.g. cancelling $C/N$ down to lowest terms) still use time polynomial in $\log N$. So we have a polynomial time algorithm.
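Steps (iii) to (v) can be followed on a toy instance by simulating the state vector classically. The sketch below is only an illustration under our own assumptions: the values $N = 24$, $r = 6$, $x_0 = 4$ are made up, and `qft_of_per` and `candidate_period` are hypothetical helper names. It computes the amplitudes of $\mathrm{QFT}_N |\mathrm{per}\rangle$, confirms that only multiples of $N/r$ can be measured, and reduces each possible $C/N$ to lowest terms.

```python
import cmath
import math
from fractions import Fraction

def qft_of_per(N, r, x0):
    """Amplitudes of QFT_N applied to |per> = A^(-1/2) sum_j |x0 + j r>, where N = A r."""
    A = N // r
    omega = cmath.exp(2j * cmath.pi / N)
    amplitudes = []
    for y in range(N):
        total = sum(omega ** (((x0 + j * r) * y) % N) for j in range(A))
        amplitudes.append(total / math.sqrt(N * A))
    return amplitudes

def candidate_period(C, N):
    """Cancel C/N to lowest terms and return the denominator (the candidate r~)."""
    return Fraction(C, N).denominator

N, r, x0 = 24, 6, 4   # toy values chosen for illustration, with r | N
probabilities = [abs(a) ** 2 for a in qft_of_per(N, r, x0)]

# Labels C that can actually be observed: exactly the multiples of N/r.
support = [y for y, p in enumerate(probabilities) if p > 1e-9]

# Candidate periods obtained from each possible measurement outcome.
candidates = [candidate_period(C, N) for C in support]
```

Here `candidates` equals the true period only for the outcomes $C = k_0 N/r$ with $k_0$ coprime to $r$ (for these toy values, $k_0 = 1$ and $k_0 = 5$); the other outcomes give proper divisors of $r$, which the check $f(0) = f(\tilde{r})$ rejects.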

There is one thing we swept under the carpet. We need to find an efficient

way of computing the quantum Fourier transform, or else we just hid all our

complexity in the quantum Fourier transform.

In general, we would expect that a general unitary operation on $n$ qubits needs $\exp(n)$ elementary circuits. However, the quantum Fourier transform is special.

Fact. $\mathrm{QFT}_{2^n}$ can be implemented by a quantum circuit of size $O(n^2)$.

The idea of the construction is to mimic the classical fast Fourier transform. An important ingredient of it is:

Fact. The state

\[ \mathrm{QFT}_{2^n} |x\rangle = \frac{1}{2^{n/2}} \sum_{y=0}^{2^n - 1} \omega^{xy} |y\rangle \]

is in fact a product state.

We will not go into the details of implementation.
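The product-state fact can at least be checked numerically. The sketch below assumes the standard factorisation $\mathrm{QFT}_{2^n}|x\rangle = \bigotimes_{l=1}^{n} \frac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i x/2^l}|1\rangle\right)$, with the qubits ordered from most significant to least significant bit of $y$. This explicit form and the function names are our additions, not something stated in the lectures.

```python
import cmath
import math

def qft_column(n, x):
    """Amplitudes of QFT_{2^n}|x>, indexed by y = 0, ..., 2^n - 1."""
    N = 2 ** n
    return [cmath.exp(2j * cmath.pi * ((x * y) % N) / N) / math.sqrt(N) for y in range(N)]

def product_form(n, x):
    """Tensor product over l = 1..n of (|0> + exp(2 pi i x / 2^l)|1>) / sqrt(2).

    Qubit l = 1 is the most significant bit of the label y.
    """
    state = [1.0 + 0j]
    for l in range(1, n + 1):
        phase = cmath.exp(2j * cmath.pi * x / 2 ** l)
        qubit = [1 / math.sqrt(2), phase / math.sqrt(2)]
        # Appending a qubit doubles the index: new index = 2 * old index + new bit.
        state = [s * q for s in state for q in qubit]
    return state

n = 3
max_error = max(
    abs(u - v)
    for x in range(2 ** n)
    for u, v in zip(qft_column(n, x), product_form(n, x))
)
```

The two agree because writing $y = \sum_l y_l 2^{n-l}$ gives $\omega^{xy} = \prod_l e^{2\pi i x y_l / 2^l}$, so the amplitude factorises bit by bit.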

We can generalize the periodicity problem to arbitrary groups, known as the hidden subgroup problem. We are given some oracle for $f: G \to Y$, and we are promised that there is a subgroup $H < G$ such that $f$ is constant and distinct on cosets of $H$. We want to find $H$ (we can make "find" more precise in two ways: we can either ask for a set of generators, or provide a way of sampling uniformly from $H$).

In our case, we had $G = (\mathbb{Z}/N\mathbb{Z}, +)$, and our subgroup was

\[ H = \{0, r, 2r, \cdots, (A - 1)r\}. \]

Unfortunately, we do not know how to do this efficiently for a group in general.

3.3 Shor’s algorithm

All that was a warm up for Shor’s algorithm. This is a quantum algorithm that

factorizes numbers in polynomial time. The crux of the algorithm will be a

modified version of the quantum Fourier transform.

The precise statement of the problem is as follows: given an integer $N$ with $n = \log N$ digits, we want to find a factor $1 < K < N$. Shor's algorithm will achieve this with constant probability $(1 - \varepsilon)$ in $O(n^3)$ time. The best known classical algorithm is $e^{O(n^{1/3} (\log n)^{2/3})}$.

To do this, we will use the periodicity algorithm. However, there is one

subtlety involved. Instead of working in $\mathbb{Z}/n\mathbb{Z}$, we need to work in $\mathbb{Z}$. Since

computers cannot work with infinitely many numbers, we will have to truncate

it somehow. Since we have no idea what the period of our function will be, we

must truncate it randomly, and we need to make sure we can control the error

introduced by the truncation.

We shall now begin. Given an $N$, we first choose some $1 < a < N$ uniformly randomly, and compute $\mathrm{hcf}(a, N)$. If it is not equal to 1, then we are finished. Otherwise, by Euler's theorem, there is a least power $r$ of $a$ such that $a^r \equiv 1 \bmod N$. The number $r$ is called the order of $a$ mod $N$. It follows that the function $f: \mathbb{Z} \to \mathbb{Z}/N\mathbb{Z}$ given by $f(k) = a^k \bmod N$ has period $r$, and is injective in each period.

Note that $f(k)$ can be efficiently computed in $\mathrm{poly}(\log k)$ time, by repeated squaring. Also note that classically, it is hard to find $r$, even though $f$ has a simple formula!
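To make the contrast concrete, here is a minimal sketch of repeated squaring next to the naive way of finding the order. The function names `power_mod` and `order_brute_force` are ours; Python's built-in `pow(a, k, N)` already performs modular exponentiation and is used only as a cross-check.

```python
def power_mod(a, k, N):
    """Compute a^k mod N by repeated squaring, using O(log k) multiplications."""
    result = 1
    base = a % N
    while k > 0:
        if k & 1:                      # current binary digit of k is 1
            result = (result * base) % N
        base = (base * base) % N       # a, a^2, a^4, a^8, ...
        k >>= 1
    return result

def order_brute_force(a, N):
    """Least r >= 1 with a^r = 1 mod N, assuming hcf(a, N) = 1.

    This walks through a, a^2, a^3, ... and so takes up to r steps,
    which can be exponential in the number of digits of N.
    """
    r, value = 1, a % N
    while value != 1:
        value = (value * a) % N
        r += 1
    return r

agrees_with_builtin = all(
    power_mod(a, k, 39) == pow(a, k, 39) for a in range(1, 39) for k in range(0, 50)
)
r = order_brute_force(7, 39)
```

Evaluating $f$ at a single point is cheap, but nothing in the formula reveals $r$ short of searching for it, which is the step the quantum algorithm replaces.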

It was known to Legendre in 1800 that knowing $r$ means we can factor $N$. Suppose we can find $r$, and further suppose $r$ is even. Then we have

\[ a^r - 1 \equiv (a^{r/2} + 1)(a^{r/2} - 1) \equiv 0 \pmod{N}. \]

So $N$ exactly divides the product. By minimality of $r$, we know $N$ does not divide $a^{r/2} - 1$. So if $N$ does not divide $a^{r/2} + 1$ as well, then $\mathrm{hcf}(N, a^{r/2} \pm 1)$ are non-trivial factors of $N$.

For this to work, we needed two assumptions: $r$ is even, and $a^{r/2} \not\equiv -1 \pmod{N}$. Fortunately, there is a theorem in number theory that says if $N$ is odd and not a prime power, and $a$ is chosen uniformly at random, then the probability that these two things happen is at least $\frac{1}{2}$. In fact, it is $\geq 1 - \frac{1}{2^{m-1}}$, where $m$ is the number of prime factors of $N$.

So if we repeat this $k$ times, the probability that they all fail to give a factor is less than $\frac{1}{2^k}$. So this can be as small as we wish.

What about the other possibilities? If $N$ is even, then we would have noticed by looking at the last digit, and we can just write down 2. If $N = c^\ell$ for $c, \ell \geq 2$, then there is a classical polynomial time algorithm that outputs $c$, which is a factor. So these are the easy cases.
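The classical reduction from order finding to factoring is short enough to write out. In the sketch below the order is found by brute force, standing in for the quantum subroutine; `order` and `factors_from_order` are illustrative names, and $N = 39$ is just a small test value.

```python
from math import gcd

def order(a, N):
    """Order of a mod N by brute force; assumes hcf(a, N) = 1."""
    r, value = 1, a % N
    while value != 1:
        value = (value * a) % N
        r += 1
    return r

def factors_from_order(a, r, N):
    """Return (hcf(a^(r/2) - 1, N), hcf(a^(r/2) + 1, N)), or None if a is unlucky.

    Unlucky means r is odd, or a^(r/2) = -1 mod N.
    """
    if r % 2 == 1:
        return None
    half = pow(a, r // 2, N)
    if half == N - 1:
        return None
    return gcd(half - 1, N), gcd(half + 1, N)

N = 39
coprime = [a for a in range(2, N) if gcd(a, N) == 1]
lucky = [a for a in coprime if factors_from_order(a, order(a, N), N) is not None]
success_rate = len(lucky) / len(coprime)
```

For $N = 39$, which has $m = 2$ prime factors, the theorem quoted above guarantees that `success_rate` is at least $\frac{1}{2}$, and every lucky $a$ yields two non-trivial factors.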

Everything we've done so far is classical! The quantum part comes in when we want to compute $r$. We know that $f(k) = a^k$ is periodic on $\mathbb{Z}$, which is an infinite domain. So we cannot just apply our periodicity algorithm.

By number theory, we know that $r$ is at most $N$. But other than that, we have no idea what $r$ actually is, nor do we know of any multiple of $r$. So we cannot apply the periodicity argument directly. Instead, we pick a big number $2^m$, and work on the domain $D = \{0, 1, \cdots, 2^m - 1\}$. How do we choose $m$? The idea is that we want $0, \cdots, 2^m - 1$ to contain $B$ full periods, plus some extra "corrupt" noise $b$, so

\[ 2^m = Br + b, \]

with $0 \leq b < r$. Since we want to separate out the periodicity information from the corrupt noise, we will want $b$ to be relatively small, compared to $Br$. We know the size of $b$ is bounded by $r$, hence by $N$. So we need $2^m$ to be "much larger" than $N$. It turns out picking $2^m > N^2$ is enough, and we will pick $m$ to be the smallest number such that this holds.

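We now study the effect of corruption on the periodicity algorithm. We again make the state

\[ |f\rangle = \frac{1}{\sqrt{2^m}} \sum_{x \in D} |x\rangle |f(x)\rangle \]

and measure the value of $f$. We then get

\[ |\mathrm{per}\rangle = \frac{1}{\sqrt{A}} \sum_{k=0}^{A-1} |x_0 + kr\rangle, \]

where $A = B + 1$ or $B$, depending on whether $x_0 < b$ or not. As before, we apply $\mathrm{QFT}_{2^m}$ to obtain

\[ \mathrm{QFT}_{2^m} |\mathrm{per}\rangle = \sum_{c=0}^{2^m - 1} \tilde{f}(c) |c\rangle. \]

When we did this before, with an exact period, most of the $\tilde{f}(c)$ were zero. However, this time things are a bit more messy. As before, we have

\[ \tilde{f}(c) = \frac{\omega^{c x_0}}{\sqrt{A} \sqrt{2^m}} [1 + \alpha + \cdots + \alpha^{A-1}], \quad \alpha = e^{2\pi i cr/2^m}, \]

where now $\omega = e^{2\pi i/2^m}$.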

The important question is, when we measure this, which $c$'s will we see with "good probability"? With exact periodicity, we knew that $\frac{2^m}{r} = A$ is an exact integer. So $\tilde{f}(c) = 0$ except when $c$ is a multiple of $A$. Intuitively, we can think of this as interference, and we had totally destructive and totally constructive interference respectively.

In the inexact case, we will get constructive interference for those $c$ such that the phase $\alpha$ is close to 1. These are the $c$'s with $\frac{cr}{2^m}$ nearest to integers $k$, and the powers up to $\alpha^{A-1}$ don't spread too far around the unit circle. So we avoid cancellations.

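So we look at those special $c$'s having this particular property. As $c$ increases from 0 to $2^m - 1$, the angle $\frac{cr}{2^m}$ increments by $\frac{r}{2^m}$ each time from 0 up to $r$. So we have $c_k$'s for each $k = 0, 1, \cdots, r - 1$ such that

\[ \left| \frac{c_k r}{2^m} - k \right| < \frac{1}{2} \cdot \frac{r}{2^m}. \]

In other words, we have

\[ \left| c_k - k \frac{2^m}{r} \right| < \frac{1}{2}. \]

So the $c_k$ are the integers nearest to the multiples of $2^m/r$.

In $\tilde{f}(c)$, the $\alpha$'s corresponding to the $c_k$'s have the smallest phases, i.e. nearest to the positive real axis. We write

\[ \frac{c_k r}{2^m} = k + \xi, \]

where

\[ k \in \mathbb{Z}, \quad |\xi| < \frac{1}{2} \frac{r}{2^m}. \]

Then we have

\[ \alpha^n = \exp\left( 2\pi i \frac{c_k r}{2^m} n \right) = \exp(2\pi i (k + \xi) n) = \exp(2\pi i \xi n). \]

Now for $n < A$, we know that $|2\pi \xi n| < \pi$, and thus $1, \alpha, \alpha^2, \cdots, \alpha^{A-1}$ all lie in the lower half plane or all lie in the upper half plane.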

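Doing all the algebra needed, we find that if $\mathrm{QFT}_{2^m} |\mathrm{per}\rangle$ is measured, then for any $c_k$ as above, we have

\[ \mathrm{Prob}(c_k) > \frac{\gamma}{r}, \]

where

\[ \gamma = \frac{4}{\pi^2} \approx 0.4. \]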
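The bound can be sanity-checked numerically without doing the algebra. The following sketch takes $N = 39$ and period $r = 12$ (the order of 7 mod 39) as a test case of our choosing, picks the smallest $m$ with $2^m > N^2$, and evaluates $|\tilde{f}(c_k)|^2$ directly for every $k$ and every possible $x_0$; `prob_c` is an illustrative helper name.

```python
import cmath
import math

def prob_c(c, x0, r, m):
    """Probability of measuring c after QFT_{2^m} on |per> = A^(-1/2) sum_k |x0 + k r>."""
    size = 2 ** m
    xs = range(x0, size, r)          # x0, x0 + r, ... inside {0, ..., 2^m - 1}
    A = len(xs)
    amplitude = sum(cmath.exp(2j * cmath.pi * ((c * x) % size) / size) for x in xs)
    return abs(amplitude) ** 2 / (A * size)

N, r = 39, 12                        # r = 12 is the order of 7 mod 39
m = 1
while 2 ** m <= N * N:               # smallest m with 2^m > N^2
    m += 1

gamma = 4 / math.pi ** 2
peaks = [round(k * 2 ** m / r) for k in range(r)]   # integers c_k nearest to k 2^m / r

# Smallest peak probability over all k and all shifts x0 in the first period.
worst = min(prob_c(c, x0, r, m) for c in peaks for x0 in range(r))

# Total probability of landing on some c_k, for one particular shift.
total_on_peaks = sum(prob_c(c, 5, r, m) for c in peaks)
```

In this instance `worst` stays above $\gamma/r$, so a single measurement lands on one of the $c_k$ with probability greater than $\gamma$.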

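Recall that in the exact periodicity case, the points $c_k$ hit the integers exactly, and instead of $\gamma$ we had 1. The distribution of the $c$'s then looks like:

[Figure: probability against $c$, with equal spikes exactly at the multiples of $2^m/r$ and zero elsewhere.]

With inexact periods, we obtain something like

[Figure: probability against $c$, with peaks at the integers $c_k$ nearest to the multiples of $2^m/r$ and small non-zero probability elsewhere.]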

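Now how do we get $r$ from a $c_k$? We know

\[ \left| \frac{c_k}{2^m} - \frac{k}{r} \right| < \frac{1}{2^{m+1}} < \frac{1}{2N^2}. \]

We claim that there is at most 1 fraction $\frac{k}{r}$ with denominator $< N$ such that this inequality holds. So this inequality does uniquely determine $k/r$.

Indeed, suppose $\frac{k}{r}$ and $\frac{k'}{r'}$ both work and are distinct. Then we have

\[ \left| \frac{k'}{r'} - \frac{k}{r} \right| = \frac{|k' r - r' k|}{r r'} \geq \frac{1}{r r'} > \frac{1}{N^2}. \]

However, we also have

\[ \left| \frac{k'}{r'} - \frac{k}{r} \right| \leq \left| \frac{k'}{r'} - \frac{c_k}{2^m} \right| + \left| \frac{c_k}{2^m} - \frac{k}{r} \right| < \frac{1}{2N^2} + \frac{1}{2N^2} = \frac{1}{N^2}. \]

This is a contradiction, so it follows that we must have $\frac{k'}{r'} = \frac{k}{r}$.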

We introduce the notion of a "good" $c_k$ value, which is when $k$ is coprime to $r$. The probability of getting a good $c_k$ is again

\[ O(1/\log \log r) > O(1/\log \log N). \]

Note that this is the same rate as the case of exact periodicity, since we have only lost a constant factor of $\gamma$! If we did have such a $c_k$, then now $r$ is uniquely determined.

However, there is still the problem of finding $r$ from a good $c_k$ value. At this point, this is just classical number theory.

We can certainly try all $\frac{k'}{r'}$ with $k' < r' < N$ and find the closest one to $\frac{c_k}{2^m}$, but there are $O(N^2)$ fractions to try, whereas we want an $O(\mathrm{poly}(\log N))$ algorithm. Indeed, if we were to do it this way, we might as well try all numbers less than $N$ and see if they divide $N$. And this is just $O(N)$!

The answer comes from the nice theory of continued fractions. Any rational number $\frac{s}{t} < 1$ has a continued fraction expansion

\[ \frac{s}{t} = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}}. \]

Indeed to do this, we simply write

\[ \frac{s}{t} = \frac{1}{\frac{t}{s}} = \frac{1}{a_1 + \frac{s_1}{t_1}}, \]

where we divide $t$ by $s$ to get $t = a_1 s + s_1$, and then put $t_1 = s$. We then keep going on with $\frac{s_1}{t_1}$. Since the numbers $s_i, t_i$ keep getting smaller, it follows that this process will eventually terminate.

Since it is very annoying to type these continued fractions in LaTeX, we often write the continued fraction as

\[ \frac{s}{t} = [a_1, a_2, \cdots, a_n]. \]

We define the $k$th convergent of $\frac{s}{t}$ to be

\[ \frac{p_k}{q_k} = [a_1, a_2, \cdots, a_k]. \]

There are some magic results from number theory that give us a simple recurrence relation for the convergents.

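Lemma. For $a_1, a_2, \cdots, a_\ell$ any positive reals, we set

\[ p_0 = 0, \quad q_0 = 1, \quad p_1 = 1, \quad q_1 = a_1. \]

We then define

\[ p_k = a_k p_{k-1} + p_{k-2}, \quad q_k = a_k q_{k-1} + q_{k-2}. \]

Then we have

(i) $[a_1, \cdots, a_k] = \frac{p_k}{q_k}$.

(ii) $q_k p_{k-1} - p_k q_{k-1} = (-1)^k$. In particular, $p_k$ and $q_k$ are coprime.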

From a bit more number theory, we find that

Fact. If $s < t$ are $m$-bit integers, then the continued fraction has length $O(m)$, and all convergents $\frac{p_k}{q_k}$ can be computed in $O(m^3)$ time.

More importantly, we have the following result:

Fact. Let $0 < x < 1$ be rational, and suppose $\frac{p}{q}$ is rational with

\[ \left| x - \frac{p}{q} \right| < \frac{1}{2q^2}. \]

Then $\frac{p}{q}$ is a convergent of the continued fraction of $x$.

Then by this theorem, for a good $c$, we know $\frac{k}{r}$ must be a convergent of $\frac{c}{2^m}$. So we compute all convergents and find a (unique) one whose denominator is less than $N$ and which is within $\frac{1}{2N^2}$ of $\frac{c}{2^m}$. This gives us the value of $r$, and we are done.

In fact, this last classical part is the slowest part of the algorithm.
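The classical post-processing described above fits in a few lines. The sketch below follows the division procedure and the recurrence of the lemma; the function names are ours, and the acceptance test uses the bound $\frac{1}{2^{m+1}}$ (which is at most $\frac{1}{2N^2}$ when $2^m > N^2$). It is tried on the values $c = 853$, $m = 11$, $N = 39$.

```python
from fractions import Fraction

def continued_fraction(s, t):
    """Expansion [a_1, ..., a_n] of s/t with 0 < s < t, by repeated division."""
    terms = []
    while s != 0:
        a, remainder = divmod(t, s)      # t = a * s + remainder
        terms.append(a)
        s, t = remainder, s
    return terms

def convergents(terms):
    """Convergents (p_k, q_k) via p_k = a_k p_{k-1} + p_{k-2}, and likewise for q_k."""
    p_prev, q_prev = 0, 1                # p_0, q_0
    p, q = 1, terms[0]                   # p_1, q_1
    result = [(p, q)]
    for a in terms[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append((p, q))
    return result

def recover_denominator(c, m, N):
    """Denominator of the convergent of c/2^m with q < N lying within 1/2^(m+1), if any."""
    if c == 0:
        return None
    target = Fraction(c, 2 ** m)
    for p, q in convergents(continued_fraction(c, 2 ** m)):
        if q < N and abs(target - Fraction(p, q)) < Fraction(1, 2 ** (m + 1)):
            return q
    return None

terms = continued_fraction(853, 2048)
fractions_found = convergents(terms)
r_candidate = recover_denominator(853, 11, 39)
```

Exact rational arithmetic (`fractions.Fraction`) is used for the comparison so that no rounding error can blur the strict inequality.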

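Example. Suppose we want to factor $N = 39$. Suppose the random $a$ we chose is $a = 7 < 39$, which is coprime to $N$. Let $r$ be the period of $f(x) = 7^x \bmod 39$.

We notice

\[ 1024 = 2^{10} < N^2 = 1521 < 2^{11} = 2048. \]

So we pick $m = 11$. Suppose the measurement of $\mathrm{QFT}_{2^m} |\mathrm{per}\rangle$ yields $c = 853$.

By the theory, this has a constant probability (approximately 0.4) to satisfy

\[ \left| \frac{853}{2^m} - \frac{k}{r} \right| < \frac{1}{2^{m+1}} = \frac{1}{2^{12}} < \frac{1}{2N^2}. \]

We also have a probability of $O(1/\log \log r)$ to have $k$ and $r$ coprime. In this case, $c$ is indeed "good". So there is a unique $\frac{k}{r}$ satisfying

\[ \left| \frac{853}{2048} - \frac{k}{r} \right| < \frac{1}{2^{12}}. \]

So to find $\frac{k}{r}$, we do the continued fraction expansion of $\frac{853}{2048}$. We have

\[ \frac{853}{2048} = \frac{1}{\frac{2048}{853}} = \frac{1}{2 + \frac{342}{853}} = \frac{1}{2 + \frac{1}{\frac{853}{342}}} = \frac{1}{2 + \frac{1}{2 + \frac{169}{342}}} = \cdots = [2, 2, 2, 42, 4]. \]

We can then compute the convergents

\[ [2] = \frac{1}{2}, \quad [2, 2] = \frac{2}{5}, \quad [2, 2, 2] = \frac{5}{12}, \quad [2, 2, 2, 42] = \frac{212}{509}, \quad [2, 2, 2, 42, 4] = \frac{853}{2048}. \]

Of all these numbers, only $\frac{5}{12}$ is within $\frac{1}{2^{12}}$ of $\frac{853}{2048}$ and has denominator less than $N = 39$.

If we do not assume $k$ and $r$ are coprime, then the possible $\frac{k}{r}$ are

\[ \frac{5}{12}, \quad \frac{10}{24}, \quad \frac{15}{36}. \]

If we assume that $k$ and $r$ are coprime, then $r = 12$. Indeed, we can check that

\[ 7^{12} \equiv 1 \pmod{39}. \]

So we now know that

\[ 39 \mid (7^6 + 1)(7^6 - 1). \]

We now hope/expect, with probability $> \frac{1}{2}$, that it goes partly into each factor. We can compute

\[ 7^6 + 1 = 117650 \equiv 26 \pmod{39}, \]
\[ 7^6 - 1 = 117648 \equiv 24 \pmod{39}. \]

We can then compute

\[ \mathrm{hcf}(26, 39) = 13, \quad \mathrm{hcf}(24, 39) = 3. \]

We see that 3 and 13 are factors of 39.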

3.4 Search problems and Grover’s algorithm

We are now going to turn our attention to search problems. These are very

important problems in computing, as we can formulate almost all problems as

some sort of search problems.

One important example is simultaneous constraint satisfaction. Here we have

a large configuration space of options, and we want to find some configuration

that satisfies some constraints. For example, when designing a lecture timetable

for Part III courses, we need to schedule the courses so that we don’t clash

two popular courses in the same area, and the courses need to have big enough

lecture halls, and we have to make sure a lecturer doesn’t have to simultaneously

lecture two courses at the same time. This is very complicated.

In general, search problems have some common features:

(i)

Given any instance of solution attempt, it is easy to check if it is good or

not.

(ii) There are exponentially many possible instances to try out.

One example is the boolean satisfiability problem, which we have already

seen before.

Example (Boolean satisfiability problem). The boolean satisfiability problem (SAT) is as follows: given a Boolean formula $f: B^n \to B$, we want to know if there is a "satisfying argument", i.e. if there is an $x$ with $f(x) = 1$.

This has complexity class NP, standing for non-deterministic polynomial time. There are many ways to define NP, and here we will provide two. The first definition of NP will involve the notion of a verifier:

Definition (Verifier). Suppose we have a language $L \subseteq B^*$, where

\[ B^* = \bigcup_{n \in \mathbb{N}} B^n \]

is the set of all bit strings.

A verifier for $L$ is a computation $V(w, c)$ with two inputs $w, c$ such that

(i) $V$ halts on all inputs.

(ii) If $w \in L$, then for some $c$, $V(w, c)$ halts with "accept".

(iii) If $w \notin L$, then for all $c$, $V(w, c)$ halts with "reject".

A polynomial time verifier is a $V$ that runs in polynomial time in $|w|$ (not $|w| + |c|$!).

We can think of $c$ as a "certificate of membership". So if you are a member,

you can exhibit a certificate of membership that you are in there, and we can

check if the certification is valid. However, if you are not a member, you cannot

“fake” a certificate.

Definition (Non-deterministic polynomial time problem). NP is the class of languages that have polynomial time verifiers.

Example. The SAT problem is in NP. Here $c$ is the satisfying argument, and $V(f, c)$ just computes $f(c)$ and checks whether it is 1.
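As an illustration of what such a verifier looks like, here is a small sketch for formulas given in conjunctive normal form. The clause encoding (a signed integer per literal) and the function names are our own conventions, not part of the lectures.

```python
from itertools import product

def evaluate_cnf(clauses, assignment):
    """Evaluate a CNF formula on a tuple of 0/1 bits.

    Each clause is a list of non-zero integers; literal k refers to
    variable |k| (1-indexed) and is negated when k < 0.
    """
    return all(
        any((assignment[abs(lit) - 1] == 1) == (lit > 0) for lit in clause)
        for clause in clauses
    )

def verifier(clauses, certificate):
    """V(f, c): compute f(c) and accept exactly when it is 1."""
    return "accept" if evaluate_cnf(clauses, certificate) else "reject"

def brute_force_sat(clauses, n):
    """Decide satisfiability by trying all 2^n certificates (exponential time)."""
    return any(evaluate_cnf(clauses, bits) for bits in product((0, 1), repeat=n))

# f = (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
f = [[1, -2], [2, 3], [-1, -3]]
good_certificate = (1, 1, 0)
bad_certificate = (0, 1, 0)

# g = x1 and (not x1) has no satisfying argument, so every certificate is rejected.
g = [[1], [-1]]
```

The verifier runs in time polynomial in the size of the formula, whereas the obvious way to decide membership without a certificate is the exponential search in `brute_force_sat`.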

Example. Determining if a number is composite is in NP, where a certificate is a factor of the number.

However, it is not immediately obvious that testing if a number is prime is in NP. It is an old result that it indeed is, and recent progress shows that it is in fact in P.

It is rather clear that

P ⊆ NP

. Indeed, if we can check membership in

polynomial time, then we can also construct a verifier in polynomial time that

just throws the certificate away and checks directly.

There is another model of NP, via non-deterministic computation. Recall that in probabilistic computation, in some steps, we had to pick a random number, and picking a different number would lead to a different "branch". In the case of non-deterministic computation, we are allowed to take all paths at the same time. If some of the paths end up being accepting, then we accept the input. If all paths reject, then we reject the input. Then we can alternatively say a problem is in NP if there is a polynomial-time non-deterministic machine that checks if the string is in the language.

It is not difficult to see that these definitions of NP are equivalent. Suppose we have a non-deterministic machine that checks if a string is in the language. Then we can construct a verifier whose certificate is a prescription of which particular branch we should follow. Then the verifier just takes the prescription, follows the path described and sees if we end up being accepted.

Conversely, if we have a verifier, we can construct a non-deterministic machine by testing a string on all possible certificates, and checking if any of them accepts.

Unfortunately, we know very little about how these different complexity classes compare. We clearly have $P \subseteq BPP \subseteq BQP$ and $P \subseteq NP$. However, we do not know if these inclusions are strict, or how NP compares to the others.

Unstructured search problem and Grover’s algorithm

Usually, when we want to search something, the search space we have is structured

in some way, and this greatly helps our searching problem.

For example, if we have a phone book, then the names are ordered alpha-

betically. If we want to find someone’s phone number, we don’t have to look

through the whole book. We just open to the middle of the book, and see if the

person’s name is before or after the names on the page. By one lookup like this,

we have already eliminated half of the phone book we have to search through,

and we can usually very quickly locate the name.

However, if we know someone’s phone number and want to figure out their

name, it is pretty much hopeless! This is the problem with unstructured data!

So the problem is as follows: we are given an unstructured database with $N = 2^n$ items and a unique good item (or no good items). We can query any item for good or bad-ness. The problem is to find the good item, or determine if one exists.

Classically, $O(N)$ queries are necessary and sufficient. Even if we are asking for a right result with fixed probability $c$, if we pick items randomly to check, then the probability of seeing the "good" one in $k$ queries is given by $k/N$. So we still need $O(N)$ queries for any fixed probability.

Quantumly, we have Grover's algorithm. This needs $O(\sqrt{N})$ queries, and this is both necessary and sufficient.

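The database of $N = 2^n$ items will be considered as an oracle $f: B^n \to B$. It is promised that there is a unique $x_0 \in B^n$ with $f(x_0) = 1$. The problem is to find $x_0$. Again, we have the quantum version

\[ U_f |x\rangle |y\rangle = |x\rangle |y \oplus f(x)\rangle. \]

However, we'll use instead $I_{x_0}$ on $n$ qubits given by

\[ I_{x_0} |x\rangle = \begin{cases} |x\rangle & x \neq x_0 \\ -|x\rangle & x = x_0 \end{cases} \]

This can be constructed from $U_f$ as we've done before, and one use of $I_{x_0}$ can be done with one use of $U_f$:

[Circuit: $U_f$ applied to $|s\rangle \otimes \frac{|0\rangle - |1\rangle}{\sqrt{2}}$ outputs $I_{x_0} |s\rangle \otimes \frac{|0\rangle - |1\rangle}{\sqrt{2}}$.]

We can write $I_{x_0}$ as

\[ I_{x_0} = I - 2 |x_0\rangle \langle x_0|, \]

where $I$ is the identity operator.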

We are now going to state Grover's algorithm, and then later prove that it works.

For convenience, we write

\[ H_n = \underbrace{H \otimes \cdots \otimes H}_{n \text{ times}}. \]

We start with a uniform superposition

\[ |\psi_0\rangle = H_n |0 \cdots 0\rangle = \frac{1}{\sqrt{N}} \sum_{\text{all } x} |x\rangle. \]

We consider the Grover iteration operator on $n$ qubits given by

\[ Q = -H_n I_0 H_n I_{x_0}. \]

Here running $I_{x_0}$ requires one query (whereas $I_0$ is "free" because it is just $I - 2 |0\rangle \langle 0|$).

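Note that all these operators are real. So we can pretend we are living in the real world and have nice geometric pictures of what is going on. We let $P(x_0)$ be the (real) plane spanned by $|x_0\rangle$ and $|\psi_0\rangle$. We claim that

(i) In this plane $P(x_0)$, this operator $Q$ is a rotation by $2\alpha$, where

\[ \sin \alpha = \frac{1}{\sqrt{N}} = \langle x_0 | \psi_0 \rangle. \]

(ii) In the orthogonal complement $P(x_0)^\perp$, we have $Q = -I$.

We will prove these later on. But if we know this, then we can repeatedly apply $Q$ to $|\psi_0\rangle$ to rotate it near to $|x_0\rangle$, and then measure. Then we will obtain $|x_0\rangle$ with very high probability:

[Figure: the plane $P(x_0)$, with $|\psi_0\rangle$ at angle $\beta$ from $|x_0\rangle$ and at angle $\alpha$ from $|x_0^\perp\rangle$.]

The initial angle is

\[ \cos \beta = \langle x_0 | \psi_0 \rangle = \frac{1}{\sqrt{N}}. \]

So the number of iterations needed is

\[ \frac{\cos^{-1}(1/\sqrt{N})}{2 \sin^{-1}(1/\sqrt{N})} = \frac{\beta}{2\alpha}. \]

In general, this is not an integer, but applying a good integer approximation to it will bring us very close to $|x_0\rangle$, and thus we measure $x_0$ with high probability. For large $n$, the number of iterations is approximately

\[ \frac{\pi/2}{2/\sqrt{N}} = \frac{\pi}{4} \sqrt{N}. \]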

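Example. Let's do a boring example with $N = 4$. The initial angle satisfies

\[ \cos \beta = \frac{1}{\sqrt{4}} = \frac{1}{2}. \]

So we know

\[ \beta = \frac{\pi}{3}. \]

Similarly, we have

\[ 2\alpha = 2 \sin^{-1} \frac{1}{2} = \frac{\pi}{3}. \]

So 1 iteration of $Q$ will rotate $|\psi_0\rangle$ exactly to $|x_0\rangle$, so we can find it with certainty with 1 lookup.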
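Since all amplitudes are real, the algorithm is easy to simulate classically for small $n$. The sketch below is an illustration rather than an implementation on qubits: it stores the full state vector and uses the identity $-H_n I_0 H_n = 2|\psi_0\rangle\langle\psi_0| - I$ (proved later in this section), which acts on the amplitudes as "inversion about the mean". The function names are hypothetical.

```python
import math

def grover_success_probability(n_qubits, x0, iterations):
    """Probability of measuring x0 after the given number of Grover iterations Q."""
    N = 2 ** n_qubits
    state = [1 / math.sqrt(N)] * N                   # |psi_0>, the uniform superposition
    for _ in range(iterations):
        state[x0] = -state[x0]                       # I_{x0}: this is the one oracle query
        mean = sum(state) / N
        state = [2 * mean - amp for amp in state]    # 2|psi_0><psi_0| - I
    return state[x0] ** 2

def grover_iterations(N):
    """Nearest integer to beta / (2 alpha), with cos(beta) = sin(alpha) = 1/sqrt(N)."""
    alpha = math.asin(1 / math.sqrt(N))
    beta = math.acos(1 / math.sqrt(N))
    return round(beta / (2 * alpha))

p_four = grover_success_probability(2, x0=2, iterations=grover_iterations(4))
p_sixty_four = grover_success_probability(6, x0=37, iterations=grover_iterations(64))
```

For $N = 4$ a single iteration gives the marked item with certainty, matching the example; for $N = 64$ the rounded iteration count is 6 and the success probability is close to, but not exactly, 1.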

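Now we prove that this thing actually works. In general, for any unitary $U$ and $I_{|\psi\rangle} = I - 2 |\psi\rangle \langle \psi|$, we have

\[ U I_{|\psi\rangle} U^\dagger = U I U^\dagger - 2 U |\psi\rangle \langle \psi| U^\dagger = I_{U|\psi\rangle}. \]

In particular, since $H_n$ is self-adjoint, i.e. $H_n^\dagger = H_n$, and by definition $H_n |0\rangle = |\psi_0\rangle$, we know

\[ Q = -H_n I_0 H_n I_{x_0} = -I_{|\psi_0\rangle} I_{x_0}. \]

Next we note that for any $|\psi\rangle$ and $|\xi\rangle$, we know by definition

\[ I_{|\psi\rangle} |\xi\rangle = |\xi\rangle - 2 |\psi\rangle \langle \psi | \xi \rangle. \]

So this modifies $|\xi\rangle$ by some multiple of $|\psi\rangle$. So we know our operator

\[ Q |\psi\rangle = -I_{|\psi_0\rangle} I_{x_0} |\psi\rangle \]

modifies $|\psi\rangle$ first by some multiple of $|x_0\rangle$, then by some multiple of $|\psi_0\rangle$. So if $|\psi\rangle \in P(x_0)$, then $Q |\psi\rangle \in P(x_0)$ too! So $Q$ preserves $P(x_0)$.

We know that $Q$ is a unitary, and it is "real". So it must be a rotation or a reflection, since these are the only things in $O(2)$. We can explicitly figure out what it is. In the plane $P(x_0)$, we know $I_{x_0}$ is reflection in the mirror line perpendicular to $|x_0\rangle$. Similarly, $I_{|\psi_0\rangle}$ is reflection in the mirror line perpendicular to $|\psi_0\rangle$.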

We now use the following facts about 2D Euclidean geometry:

(i) If $R$ is a reflection in mirror $M$ along $|M\rangle$, then $-R$ is reflection in mirror $M^\perp$ along $|M^\perp\rangle$.

To see this, we know any vector can be written as $a |M\rangle + b |M^\perp\rangle$. Then $R$ sends this to $a |M\rangle - b |M^\perp\rangle$, while $-R$ sends it to $-a |M\rangle + b |M^\perp\rangle$, and this is reflection in $|M^\perp\rangle$.

(ii) Suppose we have mirrors $M_1$ and $M_2$ making an angle of $\theta$:

[Figure: two mirror lines $M_1$ and $M_2$ meeting at angle $\theta$.]

Then reflection in $M_1$ then reflection in $M_2$ is the same as rotating counterclockwise by $2\theta$.

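So we know

\[ Q = -I_{|\psi_0\rangle} I_{|x_0\rangle} \]

is reflection in $|x_0^\perp\rangle$ then reflection in $|\psi_0^{\perp\perp}\rangle = |\psi_0\rangle$. So this is a rotation by $2\alpha$, where $\alpha$ is the angle between $|x_0^\perp\rangle$ and $|\psi_0\rangle$, i.e.

\[ \sin \alpha = \cos \beta = \langle x_0 | \psi_0 \rangle. \]

To prove our second claim that $Q$ acts as $-1$ in $P(x_0)^\perp$, we simply note that if $|\xi\rangle \in P(x_0)^\perp$, then $|\xi\rangle \perp |\psi_0\rangle$ and $|\xi\rangle \perp |x_0\rangle$. So both $I_{x_0}$ and $I_{|\psi_0\rangle}$ fix $|\xi\rangle$, and hence $Q |\xi\rangle = -|\xi\rangle$.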

In fact, Grover's algorithm is the best algorithm we can achieve.

Theorem. Let $A$ be any quantum algorithm that solves the unique search problem with probability $1 - \varepsilon$ (for any constant $\varepsilon$), with $T$ queries. Then $T$ is at least $O(\sqrt{N})$. In fact, we have

\[ T \geq \frac{\pi}{4} (1 - \varepsilon) \sqrt{N}. \]

So Grover's algorithm is not only optimal in the growth rate, but in the constant as well, asymptotically.

Proof is omitted.

Further generalizations

Suppose we have multiple good items instead, say $r$ of them. We then replace $I_{x_0}$ with $I_f$, where

\[ I_f |x\rangle = \begin{cases} -|x\rangle & x \text{ good} \\ |x\rangle & x \text{ bad} \end{cases} \]

We run the same algorithm as before. We let

\[ |\psi_{\mathrm{good}}\rangle = \frac{1}{\sqrt{r}} \sum_{x \text{ good}} |x\rangle. \]

Then now $Q$ is a rotation through $2\alpha$ in the plane spanned by $|\psi_{\mathrm{good}}\rangle$ and $|\psi_0\rangle$ with

\[ \sin \alpha = \langle \psi_{\mathrm{good}} | \psi_0 \rangle = \sqrt{\frac{r}{N}}. \]

So for large $N$, we need

\[ \frac{\pi/2}{2 \sqrt{r/N}} = \frac{\pi}{4} \sqrt{\frac{N}{r}} \]

iterations, i.e. we have a $\sqrt{r}$ reduction over the unique case. We will prove that these numbers are right later when we prove a much more general result.

What if we don't know what $r$ is? The above algorithm would not work, because we will not know when to stop the rotation. However, there are some tricks we can do to fix it. This involves cleverly picking angles of rotation at random, and we will not go into the details.

3.5 Amplitude amplification

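In fact, the techniques from Grover's algorithm are completely general. Let $G$ be any subspace ("good" subspace) of the state space $\mathcal{H}$, and $G^\perp$ be its orthogonal complement ("bad" subspace). Then

\[ \mathcal{H} = G \oplus G^\perp. \]

Given any normalized vector $|\psi\rangle \in \mathcal{H}$, we have a unique decomposition with real, non-negative coefficients

\[ |\psi\rangle = \sin \theta |\psi_g\rangle + \cos \theta |\psi_b\rangle \]

such that

\[ |\psi_g\rangle \in G, \quad |\psi_b\rangle \in G^\perp \]

are normalized.

We define the reflections

\[ I_{|\psi\rangle} = I - 2 |\psi\rangle \langle \psi|, \quad I_G = I - 2P, \]

where $P$ is the projection onto $G$ given by

\[ P = \sum_b |b\rangle \langle b| \]

for any orthonormal basis $\{|b\rangle\}$ of $G$. This $P$ satisfies

\[ P |\psi\rangle = \begin{cases} |\psi\rangle & |\psi\rangle \in G \\ 0 & |\psi\rangle \in G^\perp \end{cases} \]

We now define the Grover operator

\[ Q = -I_{|\psi\rangle} I_G. \]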

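Theorem (Amplitude amplification theorem). In the 2-dimensional subspace spanned by $|\psi_g\rangle$ and $|\psi\rangle$ (or equivalently by $|\psi_g\rangle$ and $|\psi_b\rangle$), where

\[ |\psi\rangle = \sin \theta |\psi_g\rangle + \cos \theta |\psi_b\rangle, \]

we have that $Q$ is rotation by $2\theta$.

Proof. We have

\[ I_G |\psi_g\rangle = -|\psi_g\rangle, \quad I_G |\psi_b\rangle = |\psi_b\rangle. \]

So

\[ Q |\psi_g\rangle = I_{|\psi\rangle} |\psi_g\rangle, \quad Q |\psi_b\rangle = -I_{|\psi\rangle} |\psi_b\rangle. \]

We know that

\[ I_{|\psi\rangle} = I - 2 |\psi\rangle \langle \psi|. \]

So we have

\begin{align*}
Q |\psi_g\rangle &= I_{|\psi\rangle} |\psi_g\rangle \\
&= |\psi_g\rangle - 2 (\sin \theta |\psi_g\rangle + \cos \theta |\psi_b\rangle)(\sin \theta) \\
&= (1 - 2 \sin^2 \theta) |\psi_g\rangle - 2 \sin \theta \cos \theta |\psi_b\rangle \\
&= \cos 2\theta |\psi_g\rangle - \sin 2\theta |\psi_b\rangle \\
Q |\psi_b\rangle &= -I_{|\psi\rangle} |\psi_b\rangle \\
&= -|\psi_b\rangle + 2 (\sin \theta |\psi_g\rangle + \cos \theta |\psi_b\rangle)(\cos \theta) \\
&= 2 \sin \theta \cos \theta |\psi_g\rangle + (2 \cos^2 \theta - 1) |\psi_b\rangle \\
&= \sin 2\theta |\psi_g\rangle + \cos 2\theta |\psi_b\rangle.
\end{align*}

So this is rotation by $2\theta$.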

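If we iterate this $n$ times, then we have rotated by $2n\theta$, but we started at $\theta$ from the $|\psi_b\rangle$ direction. So we have

\[ Q^n |\psi\rangle = \sin (2n + 1)\theta \, |\psi_g\rangle + \cos (2n + 1)\theta \, |\psi_b\rangle. \]

If we measure $Q^n |\psi\rangle$ for good versus bad, we know

\[ P(\text{good}) = \sin^2 (2n + 1)\theta, \]

and this is a maximum when $(2n + 1)\theta = \frac{\pi}{2}$, i.e.

\[ n = \frac{\pi}{4\theta} - \frac{1}{2}. \]

For a general $\theta$, we know that this $n$ is not an integer. So we use the nearest integer to $\frac{\pi}{4\theta} - \frac{1}{2}$, which is approximately

\[ \frac{\pi}{4\theta} = O(\theta^{-1}) = O(1/\sin \theta) = O\left( \frac{1}{\| \text{good projection of } |\psi\rangle \|} \right). \]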

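Example. Suppose we want to do a Grover search for $r$ good items in $N$ objects. We start with

\[ |\psi\rangle = \frac{1}{\sqrt{N}} \sum_{\text{all } x} |x\rangle = \sqrt{\frac{r}{N}} \left( \frac{1}{\sqrt{r}} \sum_{\text{good } x} |x\rangle \right) + \sqrt{\frac{N - r}{N}} \left( \frac{1}{\sqrt{N - r}} \sum_{\text{bad } x} |x\rangle \right). \]

Then $G$ is the subspace spanned by the good $x$'s, and

\[ \sin \theta = \sqrt{\frac{r}{N}}. \]

So $Q$ is a rotation by $2\theta$, where

\[ \theta = \sin^{-1} \sqrt{\frac{r}{N}} \approx \sqrt{\frac{r}{N}} \]

for $r \ll N$. So we will use $O(\sqrt{N/r})$ operations.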
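The prediction $P(\text{good}) = \sin^2 (2n+1)\theta$ can be compared against a direct simulation of $Q = -I_{|\psi\rangle} I_G$. The sketch below does this for the uniform starting state with a made-up set of good items; the values $N = 64$, the set `good` and the function name `amplified_probability` are assumptions chosen for illustration.

```python
import math

def amplified_probability(N, good, n_iter):
    """Apply Q = -I_{|psi>} I_G n_iter times to the uniform state; return P(good)."""
    psi = [1 / math.sqrt(N)] * N                     # |psi>, fixed reference state
    state = list(psi)
    for _ in range(n_iter):
        # I_G = I - 2P flips the sign of the good components.
        state = [-a if x in good else a for x, a in enumerate(state)]
        # -I_{|psi>} = 2|psi><psi| - I.
        overlap = sum(p * a for p, a in zip(psi, state))
        state = [2 * overlap * p - a for p, a in zip(psi, state)]
    return sum(state[x] ** 2 for x in good)

N = 64
good = {3, 17, 40, 59}                               # hypothetical good items, r = 4
theta = math.asin(math.sqrt(len(good) / N))
n_iter = round(math.pi / (4 * theta) - 0.5)          # nearest integer to pi/(4 theta) - 1/2

simulated = amplified_probability(N, good, n_iter)
predicted = math.sin((2 * n_iter + 1) * theta) ** 2
```

With $r = 4$ and $N = 64$ we have $\sin\theta = \frac{1}{4}$, three iterations are used, and the simulated and predicted probabilities agree to numerical precision.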

Example. Let $A$ be any quantum circuit on start state $|0 \cdots 0\rangle$. Then the final state is $A |0 \cdots 0\rangle$. The good states are the desired computational outcomes. For example, if $A$ is Shor's algorithm, then the desired outcomes might be the good $c$-values. We can write

\[ A |0 \cdots 0\rangle = a |\psi_g\rangle + b |\psi_b\rangle. \]

The probability of a success in one run is $|a|^2$. So we normally need $O(1/|a|^2)$ repetitions of $A$ to succeed with a given constant probability $1 - \varepsilon$.

Instead of just measuring the result and hoping for the best, we can use amplitude amplification. We assume we can check if $x$ is good or bad, so we can implement $I_G$. We consider

\[ |\psi\rangle = A |0 \cdots 0\rangle. \]

Then we define

\[ Q = -I_{A |0 \cdots 0\rangle} I_G = -A I_{|0 \cdots 0\rangle} A^\dagger I_G. \]

Here we can construct $A^\dagger$ just by reversing the order of the gates in $A$ and inverting each one. So all parts are implementable.

By amplitude amplification, $Q$ is rotation by $2\theta$, where $\sin \theta = |a|$. So after

\[ n \approx \frac{\pi}{4\theta} = O(|a|^{-1}) \]

repetitions, $A |0 \cdots 0\rangle$ will be rotated to very near to $|\psi_g\rangle$, and this will succeed with high probability. This gives us a square root speedup over the naive method.

4 Measurement-based quantum computing

In this chapter, we are going to look at an alternative model of quantum

computation. This is rather weird. Instead of using unitary gates and letting
them act on a state, we prepare a generic starting state known as a graph state, then
we just keep measuring its qubits. Somehow, by cleverly planning the measurements,

we will be able to simulate any quantum computation in the usual sense with

such things.

We will need a bunch of new notation.

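Notation. We write

\[ |\pm_\alpha\rangle = \frac{1}{\sqrt{2}} (|0\rangle \pm e^{-i\alpha} |1\rangle). \]

In particular, we have

\[ |\pm_0\rangle = |\pm\rangle = \frac{1}{\sqrt{2}} (|0\rangle \pm |1\rangle). \]

Then

\[ B(\alpha) = \{ |+_\alpha\rangle, |-_\alpha\rangle \} \]

is an orthonormal basis. We have 1-qubit gates

\[ J(\alpha) = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & e^{i\alpha} \\ 1 & -e^{i\alpha} \end{pmatrix} = H P(\alpha), \]

where

\[ H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad P(\alpha) = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\alpha} \end{pmatrix}. \]

We also have the "Pauli gates"

\[ X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = P(\pi). \]

We also have the 2-qubit gates

\[ E = CZ = \mathrm{diag}(1, 1, 1, -1). \]

We also have 1-qubit measurements

\[ M_i(\alpha) = \text{measurement of qubit } i \text{ in basis } B(\alpha). \]

The outcome $|+_\alpha\rangle$ is denoted 0 and the outcome $|-_\alpha\rangle$ is denoted 1.

We also have $M_i(Z)$, which is measurement of qubit $i$ in the standard basis $\{|0\rangle, |1\rangle\}$.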

Finally, we have the notion of a graph state. Suppose we have an undirected graph $G = (V, E)$ with vertices $V$ and edges $E$, with no self-loops and at most one edge between two vertices. We can define the graph state $|\psi_G\rangle$, a state of $|V|$ qubits, as follows: for each vertex $i \in V$, introduce a qubit $|+\rangle_i$. For each edge $i \to j$, we apply $E_{ij}$ (i.e. $E$ operating on the qubits $i$ and $j$). Since all these $E_{ij}$ commute, the order does not matter.

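Example. If $G$ is the graph with two vertices 0 and 1 joined by an edge, then we have

\[ |\psi_G\rangle = E_{01} |+\rangle_0 |+\rangle_1 = \frac{1}{2} [|00\rangle + |01\rangle + |10\rangle - |11\rangle], \]

and this is an entangled state.

If $G$ is the line graph on three vertices 0, 1, 2 with edges between 0 and 1 and between 1 and 2, then we have

\[ |\psi_G\rangle = E_{01} E_{12} |+\rangle_0 |+\rangle_1 |+\rangle_2. \]

A cluster state is a graph state $|\psi_G\rangle$ for $G$ being a rectangular 2D grid.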
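Graph states on a handful of qubits can be written down explicitly. The sketch below builds the amplitude vector by starting from $|+\rangle^{\otimes |V|}$ and flipping the sign of every basis state in which both endpoints of an edge are 1, which is exactly the action of $E_{ij} = CZ$. The function name and the bit-ordering convention (vertex 0 is the leftmost bit) are our own choices.

```python
import math

def graph_state(num_vertices, edges):
    """Amplitudes of |psi_G>, indexed by basis labels with vertex 0 as the leftmost bit."""
    dim = 2 ** num_vertices
    state = [1 / math.sqrt(dim)] * dim               # |+> on every vertex
    for i, j in edges:
        for basis in range(dim):
            bit_i = (basis >> (num_vertices - 1 - i)) & 1
            bit_j = (basis >> (num_vertices - 1 - j)) & 1
            if bit_i and bit_j:                      # CZ gives a -1 on |..1..1..>
                state[basis] = -state[basis]
    return state

two_vertex = graph_state(2, [(0, 1)])
line_of_three = graph_state(3, [(0, 1), (1, 2)])

# A 2-qubit pure state is a product state iff c00 * c11 - c01 * c10 = 0.
c00, c01, c10, c11 = two_vertex
entanglement_witness = c00 * c11 - c01 * c10
```

For the two-vertex graph this reproduces $\frac{1}{2}[|00\rangle + |01\rangle + |10\rangle - |11\rangle]$, and the non-zero value of `entanglement_witness` confirms that the state is entangled. Because each $CZ$ is diagonal, the order in which the edges are processed does not affect the result.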

The main result of measurement-based quantum computation is the following:

Theorem. Let $C$ be any quantum circuit on $n$ qubits with a sequence of gates $U_1, \cdots, U_K$ (in order). We have an input state $|\psi_{\mathrm{in}}\rangle$, and we perform $Z$-measurements on the output states on specified qubits $i_1, \cdots, i_k$ to obtain a $k$-bit string.

We can always simulate the process as follows:

(i) The starting resource is a graph state $|\psi_G\rangle$, where $G$ is chosen depending on the connectivity structure of $C$.

(ii) The computational steps are 1-qubit measurements of the form $M_i(\alpha)$, i.e. measurement in the basis $B(\alpha)$. This is adaptive: $\alpha$ may depend on the (random) outcomes $s_1, s_2, \cdots$ of previous measurements.

(iii) The computational process is a prescribed (adaptive) sequence $M_{i_1}(\alpha_1), M_{i_2}(\alpha_2), \cdots, M_{i_N}(\alpha_N)$, where the qubit labels $i_1, i_2, \cdots, i_N$ are all distinct.

(iv) To obtain the output of the process, we perform further measurements $M(Z)$ on $k$ specified qubits not previously measured, and we get results $s_{i_1}, \cdots, s_{i_k}$, and finally the output is obtained by further (simple) classical computations on $s_{i_1}, \cdots, s_{i_k}$ as well as the previous $M_i(\alpha)$ outcomes.

The idea of the last part is that the final measurement results $s_{i_1}, \cdots, s_{i_k}$ have to be reinterpreted in light of the results of the earlier measurements $M_i(\alpha_i)$.

This is a funny process, because the result of each measurement $M_i(\alpha)$ is uniformly random, with probability $\frac{1}{2}$ for each outcome, but somehow we can obtain useful information by doing adaptive measurements.

We now start the process of constructing such a system. We start with the

following result:

Fact. The 1-qubit gates $J(\alpha)$ together with $E_{i,i\pm 1}$ form a universal set of gates. In particular, any 1-qubit $U$ is a product of 3 $J$’s.

We call these $E_{i,i\pm 1}$ nearest neighbour $E_{ij}$’s.

Proof. This is just some boring algebra.

So we can assume that our circuit $C$’s gates are all of the form $J(\alpha)$ or $E_{ij}$, and it suffices to try to implement these gates in our weird system.

The next result we need is what we call the J-lemma:

Lemma (J-lemma). Given any 1-qubit state $|\psi\rangle$, consider the state
\[
  E_{12}(|\psi\rangle_1|+\rangle_2).
\]
Suppose we now measure $M_1(\alpha)$, and suppose the outcome is $s_1 \in \{0, 1\}$. Then after measurement, the state of qubit 2 is
\[
  X^{s_1}J(\alpha)|\psi\rangle.
\]
Also, the two outcomes $s_1 = 0, 1$ always occur with probability $\frac{1}{2}$ each, regardless of the values of $|\psi\rangle$ and $\alpha$.

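Proof. We just write it out. We write
\[
  |\psi\rangle = a|0\rangle + b|1\rangle.
\]
Then we have
\begin{align*}
  E_{12}(|\psi\rangle_1|+\rangle_2) &= \frac{1}{\sqrt{2}}E_{12}\left(a|0\rangle|0\rangle + a|0\rangle|1\rangle + b|1\rangle|0\rangle + b|1\rangle|1\rangle\right)\\
  &= \frac{1}{\sqrt{2}}\left(a|0\rangle|0\rangle + a|0\rangle|1\rangle + b|1\rangle|0\rangle - b|1\rangle|1\rangle\right).
\end{align*}
So if we measured 0, then we would get something proportional to
\begin{align*}
  \langle +_\alpha|_1 E_{12}(|\psi\rangle_1|+\rangle_2) &= \frac{1}{2}\left(a|0\rangle + a|1\rangle + be^{i\alpha}|0\rangle - be^{i\alpha}|1\rangle\right)\\
  &= \frac{1}{2}\begin{pmatrix} 1 & e^{i\alpha} \\ 1 & -e^{i\alpha} \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix},
\end{align*}
as required. Similarly, if we measured 1, then we get $XJ(\alpha)|\psi\rangle$.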
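The computation in the proof can be replayed numerically. The sketch below projects qubit 1 of $E_{12}(|\psi\rangle|+\rangle)$ onto $\langle\pm_\alpha|$ and compares the renormalised state of qubit 2 with $X^{s}J(\alpha)|\psi\rangle$. The function names, the input state and the angle are hypothetical choices made for illustration.

```python
import cmath
import math

R = 1 / math.sqrt(2)

def j_process(psi, alpha, s):
    """Measure qubit 1 of E_12(|psi>|+>) in B(alpha), conditioned on outcome s.

    Returns (probability of outcome s, normalised state of qubit 2).
    """
    a, b = psi
    # Amplitudes of E_12(|psi>_1 |+>_2) in the order |00>, |01>, |10>, |11>.
    state = [a * R, a * R, b * R, -b * R]
    # <+-_alpha| = (<0| +- e^{i alpha} <1|) / sqrt(2), applied to qubit 1.
    phase = (-1) ** s * cmath.exp(1j * alpha)
    out = [R * (state[j] + phase * state[2 + j]) for j in range(2)]
    prob = sum(abs(c) ** 2 for c in out)
    return prob, [c / math.sqrt(prob) for c in out]

def predicted(psi, alpha, s):
    """X^s J(alpha)|psi>, the state claimed by the J-lemma."""
    a, b = psi
    e = cmath.exp(1j * alpha)
    v = [R * (a + e * b), R * (a - e * b)]
    return v[::-1] if s else v

psi = [0.6, 0.8j]   # hypothetical normalised input state
alpha = 0.9         # hypothetical measurement angle
results = {s: j_process(psi, alpha, s) for s in (0, 1)}
```

For both outcomes the probability should be $\frac{1}{2}$ and the output state should agree with `predicted`, matching the statement of the lemma.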

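We will usually denote processes by diagrams. In this case, we started with the graph state

[Diagram: nodes $|\psi\rangle$ and $|+\rangle$ joined by an edge]

and the measurement can be pictured as

[Diagram: the same graph with the first node marked as measured]

If we measure $Z$, we denote that by

[Diagram: a node marked with a $Z$-measurement]

In fact, this can be extended if 1 is just a single qubit part of a larger multi-qubit system $1S$, i.e.

Lemma. Suppose we start with a state
\[
  |\psi\rangle_{1S} = |0\rangle_1|a\rangle_S + |1\rangle_1|b\rangle_S.
\]
We then apply the $J$-lemma process by adding a new qubit $|+\rangle$ for $2 \notin S$, and then measure qubit 1. Then the resulting state is
\[
  X_2^{s_1}J_2(\alpha)|\psi\rangle_{2S}.
\]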

So the J-lemma allows us to simulate J-gates with measurements. But we

want to do many J gates. So we need the concatenation lemma:

Lemma (Concatenation lemma). If we concatenate the process of the $J$-lemma on a row of qubits $1, 2, 3, \cdots$ to apply a sequence of $J(\alpha)$ gates, then all the entangling operators $E_{12}, E_{23}, \cdots$ can be done first before any measurements are applied.

It is a fact that for any composite quantum system $A \otimes B$, any local action (unitary gate or measurement) done on $A$ always commutes with anything done on $B$, which is easy to check by expanding out the definition. So the proof of this is simple:

Proof. For a state $|\psi\rangle_1|+\rangle_2 \cdots |+\rangle_n$, we can look at the sequence of $J$-processes in the sequence of operations (left to right):
\[
  E_{12}M_1(\alpha_1)E_{23}M_2(\alpha_2)E_{34}M_3(\alpha_3)\cdots
\]
It is then clear that each $E_{ij}$ commutes with all the measurements before it. So we are safe.

We can now determine the measurement-based quantum computation process corresponding to a quantum circuit $C$ of gates $U_1, U_2, \cdots, U_K$ with each $U_i$ either a $J(\alpha)$ or a nearest-neighbour $E_{ij}$. We may wlog assume the input state to $C$ is $|+\rangle \cdots |+\rangle$, as any 1-qubit product state may be written as $|\psi\rangle = U|+\rangle$ for suitable $U$, which is then represented as at most three $J(\alpha)$’s. So we simply prefix $C$ with these $J(\alpha)$ gates.

Example. We can write $|j\rangle$ for $j = 0, 1$ as
\[
  |j\rangle = X^jH|+\rangle.
\]
We also have
\[
  H = J(0), \quad X = J(\pi)J(0).
\]
So the idea is that we implement these $J(\alpha)$ gates by the $J$-processes we just described, and the nearest-neighbour $E_{ij}$ gates will just be performed when we create the graph state.

We first do a simple example:

Example. Consider the circuit $C$ given by

[Circuit diagram: qubit lines starting in $|+\rangle$, with gates $J(\alpha_1)$, $J(\alpha_2)$, $J(\alpha_3)$ and a vertical line joining qubit lines]

where the vertical line denotes the $E$ operators. At the end, we measure the outputs $i_1, i_2$ by $M(Z)$ measurements.

We use the graph state

[Diagram: the corresponding graph]

In other words, we put a node for a $|+\rangle$, a horizontal line for a $J(\alpha)$ and a vertical line for an $E$.

If we just measured all the qubits for the $J$-process in the order $\alpha_1, \alpha_2, \alpha_3$, and then finally read off the final results $i_1, i_2$, then we would have effected the circuit

[Circuit diagram: the same circuit with an extra $X^{s_i}$ after each $J(\alpha_i)$]

Now the problem is to get rid of the $X$’s. We know each $X$ comes with probability $\frac{1}{2}$. So the probability of them all not appearing is tiny for more complicated circuits, and we cannot just rely on pure chance for it to turn out right.

To deal with the unwanted $X$ “errors”, we want to commute them out to the end of the circuit. But they do not commute, so we are going to use the following commutation relations:
\[
  J(\alpha)X = e^{i\alpha}ZJ(-\alpha).
\]
In other words, up to an overall phase, the following diagrams are equivalent:

[Diagram: $X$ followed by $J(\alpha)$ is equivalent to $J(-\alpha)$ followed by $Z$]

More generally, we can write
\begin{align*}
  J_i(\alpha)X_i^s &= e^{i\alpha s}Z_i^sJ_i((-1)^s\alpha)\\
  J_i(\alpha)Z_i^s &= X_i^sJ_i(\alpha)\\
  E_{ij}Z_i^s &= Z_i^sE_{ij}\\
  E_{ij}X_i^s &= X_i^sZ_j^sE_{ij}
\end{align*}
Here the subscripts tell us which qubit the gates are acting on.

The last one corresponds to

[Diagram: $X^s$ on line $i$ followed by $E_{ij}$ is equivalent to $E_{ij}$ followed by $X^s$ on line $i$ and $Z^s$ on line $j$]

All of these are good, except for the first one where we have a funny phase and the angle is negated. The phase change is irrelevant because it doesn’t affect measurements, but the sign changes are problematic. To fix this, we need to use adaptive measurements.
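The commutation relations can be checked directly by multiplying out small matrices. The following sketch does this for $s = 1$ at one arbitrary angle, with qubit $i$ as the first tensor factor and qubit $j$ as the second; the helper names are our own.

```python
import cmath
import math

R = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]

def J(alpha):
    e = cmath.exp(1j * alpha)
    return [[R, R * e], [R, -R * e]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):
    """Kronecker product; the first factor acts on qubit i, the second on qubit j."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol
               for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

alpha = 1.1  # arbitrary test angle (an assumption of this sketch)
X_i, Z_i, Z_j = kron(X, I2), kron(Z, I2), kron(I2, Z)

relations = {
    "J X = e^{i alpha} Z J(-alpha)": close(
        matmul(J(alpha), X),
        scale(cmath.exp(1j * alpha), matmul(Z, J(-alpha)))),
    "J Z = X J": close(matmul(J(alpha), Z), matmul(X, J(alpha))),
    "E Z_i = Z_i E": close(matmul(CZ, Z_i), matmul(Z_i, CZ)),
    "E X_i = X_i Z_j E": close(matmul(CZ, X_i), matmul(matmul(X_i, Z_j), CZ)),
}
```

Every entry of `relations` should be true; the case $s = 0$ is trivial since all the Pauli factors become the identity.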

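Example. Consider the simpler 1-qubit circuit

[Circuit diagram: $|+\rangle$, then $J(\alpha_1)$, then $J(\alpha_2)$]

We first prepare the graph state

[Diagram: three nodes in a row]

We now measure the first qubit to get

[Diagram: the first node measured at angle $\alpha_1$ with outcome $s_1$]

We have thus done

[Circuit diagram: $|+\rangle$, then $J(\alpha_1)$, then $X^{s_1}$]

To deal with the unwanted $X^{s_1}$, we note that

[Diagram: $X^{s_1}$ followed by $J(\alpha_2)$ is equivalent to $J((-1)^{s_1}\alpha_2)$ followed by $Z^{s_1}$]

So we adapt the sign of the second measurement angle to depend on the previous measurement result:

[Diagram: first node measured at angle $\alpha_1$ with outcome $s_1$, second node measured at angle $(-1)^{s_1}\alpha_2$ with outcome $s_2$]

Then this measurement results in

[Circuit diagram: $J(\alpha_1)$, $X^{s_1}$, $J((-1)^{s_1}\alpha_2)$, $X^{s_2}$]

which is equivalent to

[Circuit diagram: $J(\alpha_1)$, $J(\alpha_2)$, $Z^{s_1}$, $X^{s_2}$]

If we had further $J$-gates, we need to commute both $Z^{s_1}$ and $X^{s_2}$ over.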
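To see the adaptive step in action, the following sketch simulates this example on a 3-qubit line graph state: it projects qubit 1 at angle $\alpha_1$, then qubit 2 at the adapted angle $(-1)^{s_1}\alpha_2$, and compares the remaining qubit with $X^{s_2}Z^{s_1}J(\alpha_2)J(\alpha_1)|+\rangle$ up to a global phase. The function names and the test angles are assumptions of this sketch.

```python
import cmath
import itertools
import math

R = 1 / math.sqrt(2)

def J(alpha):
    e = cmath.exp(1j * alpha)
    return [[R, R * e], [R, -R * e]]

def apply(M, v):
    return [sum(M[i][k] * v[k] for k in range(2)) for i in range(2)]

def line_graph_state():
    """|psi_G> for the path 1 - 2 - 3: amplitude (-1)^(b1 b2 + b2 b3) / sqrt(8)."""
    return [(-1) ** (b1 * b2 + b2 * b3) / math.sqrt(8)
            for b1, b2, b3 in itertools.product((0, 1), repeat=3)]

def measure_first(state, alpha, s):
    """Project the first remaining qubit onto <+_alpha| (s=0) or <-_alpha| (s=1)."""
    half = len(state) // 2
    phase = (-1) ** s * cmath.exp(1j * alpha)
    out = [R * (state[j] + phase * state[half + j]) for j in range(half)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in out))
    return [c / norm for c in out]

def run(alpha1, alpha2, s1, s2):
    """State of qubit 3 after outcomes s1, s2, with the second angle adapted to s1."""
    state = measure_first(line_graph_state(), alpha1, s1)
    return measure_first(state, (-1) ** s1 * alpha2, s2)

def target(alpha1, alpha2, s1, s2):
    """X^{s2} Z^{s1} J(alpha2) J(alpha1) |+>."""
    v = apply(J(alpha2), apply(J(alpha1), [R, R]))
    if s1:
        v = [v[0], -v[1]]   # Z^{s1}
    if s2:
        v = [v[1], v[0]]    # X^{s2}
    return v

def overlap(u, v):
    """|<u|v>|, which equals 1 exactly when u and v agree up to a global phase."""
    return abs(sum(a.conjugate() * b for a, b in zip(u, v)))

overlaps = {(s1, s2): overlap(run(0.3, 1.2, s1, s2), target(0.3, 1.2, s1, s2))
            for s1 in (0, 1) for s2 in (0, 1)}
```

All four overlaps should equal 1: whatever the random outcomes, the desired gates $J(\alpha_1)$, $J(\alpha_2)$ are applied, followed only by Pauli byproducts.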

Note that while we are introducing a lot of funny X’s and Z’s, these are all

we’ve got, and the order of applying them does not matter, as they anti-commute:

XZ = −ZX.

So if we don’t care about the phase, they effectively commute.

Also, since $X^2 = Z^2 = I$, we only need to count the number of $X$’s and $Z$’s mod 2, which is very helpful.

Now what do we do with the $Z$ and $X$ at the end? For the final $Z$-measurement, having moved everything to the end, we simply reinterpret the final, actual $Z$-measurement result $j$:

(i) The $Z$-gate does not affect the outcome or probability of a $Z$-measurement, because if
\[
  |\psi\rangle = a|0\rangle + b|1\rangle,
\]
then
\[
  Z|\psi\rangle = a|0\rangle - b|1\rangle.
\]
So the probabilities of $|0\rangle$ and $|1\rangle$ are $|a|^2$ and $|b|^2$ regardless.

(ii) The $X$ gate simply interchanges the labels, while leaving the probabilities the same, because if
\[
  |\psi\rangle = a|0\rangle + b|1\rangle,
\]
then
\[
  X|\psi\rangle = a|1\rangle + b|0\rangle.
\]

So we ignore all $Z$-errors, and for each $X^r$ error, we just modify the seen measurement outcome $j$ by $j \mapsto j \oplus r$.

If we actually implement measurement-based quantum computations, the measurements can always be done “left to right”, implementing the gates in order. However, we don’t have to do that. Recall that quantum operations on disjoint qubits always commute. Since the only things we are doing are measurements, all $M_i(\alpha)$ measurements can be performed simultaneously if the angles do not depend on other measurements. This gives us a novel way to parallelize a computation.

For example, in our simple 1-qubit example, we can start by first measuring $s_1$ and $j$, and then measuring $s_2$ after we know $s_1$. In particular, we can first measure the “answer” $j$, before we do any other thing! The remaining measurements just tell us how we should interpret the answer.

In general, we can divide the measurements into “layers” — the first layer

consists of all measurements that do not require any adaptation. The second

layer then consists of the measurements that depend only on the first layer. The

logical depth is the least number of layers needed, and this somewhat measures

the complexity of our circuit.

5 Phase estimation algorithm

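We now describe a quantum algorithm that estimates the eigenvalues of a unitary operator. Suppose we are given a unitary operator $U$ and an eigenstate $|v_\varphi\rangle$. Then we can write
\[
  U|v_\varphi\rangle = e^{2\pi i\varphi}|v_\varphi\rangle
\]
with $0 \leq \varphi < 1$. Our objective is to estimate $\varphi$ to $n$ binary bits of precision:
\[
  \varphi \approx 0.i_1i_2\cdots i_n = \frac{i_1}{2} + \frac{i_2}{2^2} + \cdots + \frac{i_n}{2^n}.
\]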

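We will need the controlled $U^k$ gate $c\text{-}U^k$ for integers $k$, defined by
\begin{align*}
  c\text{-}U^k|0\rangle|\xi\rangle &= |0\rangle|\xi\rangle\\
  c\text{-}U^k|1\rangle|\xi\rangle &= |1\rangle U^k|\xi\rangle,
\end{align*}
where $|0\rangle, |1\rangle$ are 1-qubit states, and $|\xi\rangle$ is a general $d$-dimensional register.

Note that we have
\[
  U^k|v_\varphi\rangle = e^{2\pi ik\varphi}|v_\varphi\rangle
\]
and we have
\[
  c\text{-}U^k = (c\text{-}U)^k.
\]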

Note that if we are given $U$ as a formula or a circuit description, then we can readily implement $c\text{-}U$ by adding control to each gate. However, if $U$ is a quantum black-box, then we need further information. For example, it suffices to have an eigenstate $|\alpha\rangle$ with known eigenvalue $e^{i\alpha}$. However, we will not bother ourselves with that, and just assume that we can indeed implement it.

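In fact, we will use a “generalized” controlled $U$ given by
\[
  |x\rangle|\xi\rangle \mapsto |x\rangle U^x|\xi\rangle,
\]
where $|x\rangle$ has $n$ qubits. We will make this from $c\text{-}U^k = (c\text{-}U)^k$ as follows: for
\[
  x = x_{n-1}\cdots x_1x_0 = x_0 + 2^1x_1 + 2^2x_2 + \cdots + 2^{n-1}x_{n-1},
\]
we write $c\text{-}U^k_i$ for the controlled $U^k$ controlled by qubit $i$. Then we just construct
\[
  c\text{-}U^{2^0}_0\, c\text{-}U^{2^1}_1 \cdots c\text{-}U^{2^{n-1}}_{n-1}.
\]
Now if the input is $|\xi\rangle = |v_\varphi\rangle$, then we get
\[
  e^{2\pi i\varphi x}|x\rangle|v_\varphi\rangle.
\]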

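To do phase estimation, we superpose the above over all $x = 0, 1, \cdots, 2^n - 1$ and use $|\xi\rangle = |v_\varphi\rangle$. So we construct our starting state by
\[
  |s\rangle = H \otimes \cdots \otimes H|0\cdots 0\rangle = \frac{1}{\sqrt{2^n}}\sum_{\text{all } x}|x\rangle.
\]
Now if we apply the generalized controlled $U$, we obtain
\[
  \underbrace{\left(\frac{1}{\sqrt{2^n}}\sum_x e^{2\pi i\varphi x}|x\rangle\right)}_{|A\rangle}|v_\varphi\rangle.
\]
Finally, we apply the inverse Fourier transform $\mathrm{QFT}^{-1}_{2^n}$ to $|A\rangle$ and measure to see $y_0, y_1, \cdots, y_{n-1}$ on lines $0, 1, \cdots, n-1$. Then we simply output
\[
  0.y_0y_1\cdots y_{n-1} = \frac{y_0}{2} + \frac{y_1}{4} + \cdots + \frac{y_{n-1}}{2^n} = \frac{y_0y_1\cdots y_{n-1}}{2^n}
\]
as the estimate of $\varphi$, where the numerator $y_0y_1\cdots y_{n-1}$ is read as a binary integer.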

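Why does this work? Suppose $\varphi$ actually only had $n$ binary digits. Then we have
\[
  \varphi = 0.z_0z_1\cdots z_{n-1} = \frac{z}{2^n},
\]
where $z \in \mathbb{Z}_{2^n}$. Then we have
\[
  |A\rangle = \frac{1}{\sqrt{2^n}}\sum_x e^{2\pi ixz/2^n}|x\rangle,
\]
which is the Fourier transform of $|z\rangle$. So the inverse Fourier transform of $|A\rangle$ is exactly $|z\rangle$ and we get $\varphi$ exactly with certainty.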

If $\varphi$ has more than $n$ bits, say
\[
  \varphi = 0.z_0z_1\cdots z_{n-1}z_nz_{n+1}\cdots,
\]
then we have

Theorem. If the measurements in the above algorithm give $y_0, y_1, \cdots, y_{n-1}$ and we output
\[
  \theta = 0.y_0y_1\cdots y_{n-1},
\]
then

(i) The probability that $\theta$ is $\varphi$ to $n$ digits is at least $\frac{4}{\pi^2}$.

(ii) The probability that $|\theta - \varphi| \geq \varepsilon$ is at most $O(1/(2^n\varepsilon))$.

The proofs are rather boring and easy algebra.
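The outcome distribution can be computed classically for small $n$, which gives a feel for both the exact case and the theorem. The sketch below evaluates the amplitude of $|y\rangle$ in $\mathrm{QFT}^{-1}|A\rangle$, namely $\frac{1}{2^n}\sum_x e^{2\pi ix(\varphi - y/2^n)}$, treating $y$ as an integer. The function name and the two test phases are our own choices; the second phase lies exactly halfway between two $n$-bit values, which we take as the hardest case.

```python
import cmath
import math

def outcome_distribution(phi, n):
    """Probability of each outcome y in {0, ..., 2^n - 1} of phase estimation."""
    N = 2 ** n
    probs = []
    for y in range(N):
        # Amplitude of |y> after applying the inverse QFT to
        # |A> = N^{-1/2} sum_x e^{2 pi i phi x} |x>.
        amp = sum(cmath.exp(2j * math.pi * x * (phi - y / N)) for x in range(N)) / N
        probs.append(abs(amp) ** 2)
    return probs

exact = outcome_distribution(5 / 16, 4)      # phi has exactly 4 binary digits
halfway = outcome_distribution(4.5 / 16, 4)  # phi midway between 4/16 and 5/16
bound = 4 / math.pi ** 2
```

In the exact case all the probability sits on $y = 5$. In the halfway case the two neighbouring outcomes $y = 4$ and $y = 5$ each still occur with probability above $4/\pi^2 \approx 0.405$.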

So for any fixed desired accuracy $\varepsilon$, the probability to fail to get $\varphi$ to this accuracy falls exponentially with $n$.

Note that if $c\text{-}U^{2^k}$ is implemented as $(c\text{-}U)^{2^k}$, then the algorithm would need
\[
  1 + 2 + 4 + \cdots + 2^{n-1} = 2^n - 1
\]
many $c\text{-}U$ gates. But for some special $U$’s, this $c\text{-}U^{2^k}$ can be implemented in polynomial time in $k$.

For example, in Kitaev’s factoring algorithm, for $\mathrm{hcf}(a, N) = 1$, we will use the function
\[
  U: |m\rangle \mapsto |am \bmod N\rangle.
\]
Then we have
\[
  U^{2^k}|m\rangle = |a^{2^k}m \bmod N\rangle,
\]
which we can implement by repeated squaring.
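The classical arithmetic behind this is short. The sketch below computes $a^{2^k} \bmod N$ with $k$ modular squarings rather than $2^k$ multiplications, and applies it to a basis label $m$. The function names and the numbers in the example are illustrative choices.

```python
def power_of_two_multiplier(a, k, N):
    """Return a^(2^k) mod N using k modular squarings."""
    c = a % N
    for _ in range(k):
        c = (c * c) % N
    return c

def apply_U_power(m, a, k, N):
    """Classical action of U^(2^k) on a basis label: m -> a^(2^k) m mod N."""
    return (power_of_two_multiplier(a, k, N) * m) % N

# Hypothetical small example: a = 7, N = 15, so hcf(a, N) = 1.
multiplier = power_of_two_multiplier(7, 3, 15)   # 7^8 mod 15
image = apply_U_power(2, 7, 1, 15)               # U^2 applied to |2>
```

The point is that the cost grows linearly in $k$, whereas applying $U$ itself $2^k$ times would grow exponentially.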

Now what if we didn’t have an eigenstate to begin with? If instead of $|v_\varphi\rangle$ we used a general input state $|\xi\rangle$, then we can write
\[
  |\xi\rangle = \sum_j c_j|v_{\varphi_j}\rangle,
\]
where
\[
  U|v_{\varphi_j}\rangle = e^{2\pi i\varphi_j}|v_{\varphi_j}\rangle.
\]
Then in the phase estimation algorithm, just before the final measurement, we have managed to get ourselves
\[
  |0\cdots 0\rangle|\xi\rangle \to \sum_j c_j|\varphi_j\rangle|v_{\varphi_j}\rangle.
\]
Then when we measure, we get one of the $\varphi_j$’s (or an approximation of it) with probability $|c_j|^2$. Note that this is not some average of them. Of course, we don’t know which one we got, but we still get some meaningful answer.

Quantum counting

An application of this is the quantum counting problem. Given $f: B_n \to B$ with $k$ good $x$’s, we want to estimate the number $k$.

Recall the Grover iteration operator $Q_G$ is rotation through $2\theta$ in a 2-dimensional plane spanned by
\[
  |\psi_0\rangle = \frac{1}{\sqrt{N}}\sum_x|x\rangle
\]
and its good projection, where $N = 2^n$ and $\theta$ is given by
\[
  \sin\theta = \sqrt{\frac{k}{N}}, \quad\text{so that } \theta \approx \sqrt{\frac{k}{N}} \text{ when } k \ll N.
\]
Now the eigenvalues of this rotation in the plane are
\[
  e^{2i\theta}, \quad e^{-2i\theta}.
\]
So either eigenvalue will suffice to get $k$.

We will equivalently write
\[
  e^{i2\theta} = e^{2\pi i\varphi}
\]
with $0 \leq \varphi < 1$. Then $\pm 2\theta$ is equivalent to $\varphi$ or $1 - \varphi$, where $\varphi$ is small.

Now we don’t have an eigenstate, but we can start with any state in the plane, say $|\psi_0\rangle$. We then do phase estimation with it. We will then get either $\varphi$ or $1 - \varphi$ with some probabilities, but we don’t mind which one we get, since we can obtain one from the other, and we can tell them apart because $\varphi$ is small.
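The classical post-processing step can be sketched as follows. From $e^{2i\theta} = e^{2\pi i\varphi}$ we get $\theta = \pi\varphi$, and then $k = N\sin^2\theta$. This sketch uses the exact relation $\sin^2\theta = k/N$ rather than the small-angle approximation, and it ignores the rounding error of phase estimation; the function name and the example values are our own.

```python
import math

def count_from_phase(phi, n):
    """Recover the number of good items from an estimated phase phi.

    Assumes the Grover rotation eigenvalue is e^{2 pi i phi} with theta = pi * phi
    and sin^2(theta) = k / N, where N = 2^n.
    """
    N = 2 ** n
    theta = math.pi * phi
    return N * math.sin(theta) ** 2

# Hypothetical instance: n = 10 qubits (N = 1024) with k = 16 good items.
n, k = 10, 16
phi = math.asin(math.sqrt(k / 2 ** n)) / math.pi
from_phi = count_from_phase(phi, n)
from_one_minus_phi = count_from_phase(1 - phi, n)
```

Both `from_phi` and `from_one_minus_phi` come out as 16 up to rounding, since $\sin^2(\pi(1 - \varphi)) = \sin^2(\pi\varphi)$. This is one way to see why it does not matter which of the two eigenvalues the measurement picks.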

6 Hamiltonian simulation

So. Suppose we did manage to invent a usable quantum computer. What would it be good for? Grover’s algorithm is nice, but it seems a bit theoretical. You might say we can use Shor’s algorithm to crack encryption, but then if quantum computers are available, then no one would be foolish enough to use encryption that is susceptible to such attacks. So what can we actually do?

One useful thing would be to simulate physical systems. If we have a quantum system with $n$ qubits, then we want to simulate its evolution over time, as governed by Schrödinger’s equation. Classically, an $n$-qubit system is specified by $2^n$ complex numbers, so we would expect any such algorithm to have performance at best $O(2^n)$. However, one would imagine that to simulate a quantum $n$-qubit system in a quantum computer, we only need $n$ qubits! Indeed, we will see that we will be able to simulate quantum systems in polynomial time in $n$.

In a quantum physical system, the state of the system is given by a state $|\psi\rangle$, and the evolution is governed by a Hamiltonian $H$. This is a self-adjoint (Hermitian) operator, and in physics, this represents the energy of the system. Thus, we have
\[
  \langle\psi|H|\psi\rangle = \text{average value obtained in measurement of energy}.
\]
The time evolution of the particle is given by the Schrödinger equation
\[
  \frac{\mathrm{d}}{\mathrm{d}t}|\psi(t)\rangle = -iH|\psi(t)\rangle.
\]
We’ll consider only time-independent Hamiltonians $H(t) = H$. Then the solution can be written down as
\[
  |\psi(t)\rangle = e^{-iHt}|\psi(0)\rangle.
\]
Here $e^{-iHt}$ is the matrix exponential given by
\[
  e^A = I + A + \frac{A^2}{2} + \cdots.
\]
Thus, given a Hamiltonian $H$ and a time $t$, we want to simulate $U(t) = e^{-iHt}$ to suitable approximations.

Before we begin, we note the following useful definitions:

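Definition (Operator norm). The operator norm of an operator $A$ is
\[
  \|A\| = \max_{\||\psi\rangle\| = 1}\|A|\psi\rangle\|.
\]
If $A$ is normal (for example Hermitian or unitary), then this is the largest absolute value of an eigenvalue of $A$.

The following properties are easy to see:

Proposition.
\begin{align*}
  \|A + B\| &\leq \|A\| + \|B\|\\
  \|AB\| &\leq \|A\|\|B\|.
\end{align*}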

We now begin. There will be a slight catch in what we do. We will have to work with special Hamiltonians known as $k$-local Hamiltonians for a fixed $k$. Then for this fixed $k$, the time required to simulate the system will be polynomial in $n$. However, we should not expect the complexity to grow nicely as we increase $k$.

So what is a $k$-local Hamiltonian? This is a Hamiltonian in which each interaction governed by the Hamiltonian only involves $k$ qubits. In other words, this Hamiltonian can be written as a sum of operators, each of which only touches $k$ qubits. This is not too bad a restriction, because in real life, most Hamiltonians are indeed local, so that if each qubit represents a particle, then the behaviour of the particle will only be affected by the particles near it.

Definition (k-local Hamiltonian). We say a Hamiltonian $H$ is $k$-local (for $k$ a fixed constant) on $n$ qubits if
\[
  H = \sum_{j=1}^m H_j,
\]
where each $H_j$ acts on at most $k$ qubits (not necessarily adjacent), i.e. we can write
\[
  H_j = \tilde{H}_j \otimes I,
\]
where $\tilde{H}_j$ acts on some $k$ qubits, and $I$ acts on all other qubits as the identity.

The number $m$ of terms we need is bounded by
\[
  m \leq \binom{n}{k} = O(n^k),
\]
which is polynomial in $n$.

Example. The Hamiltonian
\[
  H = X \otimes I \otimes I - Z \otimes I \otimes Y
\]
is 2-local on 3 qubits.

We write $M_{(i)}$ to denote the operator $M$ acting on the $i$th qubit.

Example. We could write
\[
  X \otimes I \otimes I = X_{(1)}.
\]

Example (Ising model). The Ising model on an $n \times n$ square lattice of qubits is given by
\[
  H = J\sum_{i,j=1}^{n-1}\left(Z_{(i,j)}Z_{(i,j+1)} + Z_{(i,j)}Z_{(i+1,j)}\right).
\]

Example (Heisenberg model). The Heisenberg model on a line is given by
\[
  H = \sum_{i=1}^{n-1}\left(J_xX_{(i)}X_{(i+1)} + J_yY_{(i)}Y_{(i+1)} + J_zZ_{(i)}Z_{(i+1)}\right),
\]
where $J_x$, $J_y$ and $J_z$ are real constants.

This is useful in modelling magnetic systems.

The idea is that we simulate each $e^{-iH_jt}$ separately, and then put them together. However, if the $H_j$ do not commute, then in general
\[
  e^{-i\sum_j H_jt} \neq \prod_j e^{-iH_jt}.
\]
So we need to somehow solve this problem. But putting it aside, we can start working on the quantum simulation problem.

We will make use of the following theorem:

Theorem (Solovay-Kitaev theorem). Let $U$ be a unitary operator on $k$ qubits and $S$ any universal set of quantum gates. Then $U$ can be approximated to within $\varepsilon$ using $O(\log^c\frac{1}{\varepsilon})$ gates from $S$, where $c < 4$.

In other words, we can simulate each $e^{-iH_jt}$ with very modest overhead in circuit size for improved error, assuming we fix $k$.

Proof. Omitted.

We will also need to keep track of the accumulation of errors. The following

lemma will be useful:

Lemma. Let $\{U_i\}$ and $\{V_i\}$ be sets of unitary operators with
\[
  \|U_i - V_i\| \leq \varepsilon.
\]
Then
\[
  \|U_m\cdots U_1 - V_m\cdots V_1\| \leq m\varepsilon.
\]
This is remarkable!

Proof. See example sheet 2. The idea is that unitary gates preserve the size of vectors, hence do not blow up errors.
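A small numerical experiment illustrates the linear error growth. The sketch below takes $U_i = J(\alpha_i)$ and $V_i = J(\alpha_i + \delta)$ for a few arbitrary angles, computes operator norms of $2 \times 2$ matrices in closed form, and compares the error of the product with $m\varepsilon$. The choice of gates, the angles and $\delta$ are assumptions made purely for illustration.

```python
import cmath
import math

R = 1 / math.sqrt(2)

def J(alpha):
    e = cmath.exp(1j * alpha)
    return [[R, R * e], [R, -R * e]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sub(A, B):
    return [[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def op_norm(M):
    """Operator norm of a 2x2 matrix: sqrt of the largest eigenvalue of M^dagger M."""
    p = abs(M[0][0]) ** 2 + abs(M[1][0]) ** 2
    r = abs(M[0][1]) ** 2 + abs(M[1][1]) ** 2
    q = M[0][0].conjugate() * M[0][1] + M[1][0].conjugate() * M[1][1]
    largest = (p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + abs(q) ** 2)
    return math.sqrt(largest)

def product(gates):
    """U_m ... U_1 for gates listed in order of application."""
    out = [[1, 0], [0, 1]]
    for gate in gates:
        out = matmul(gate, out)   # later gates multiply on the left
    return out

angles = [0.3, 1.1, 2.0, 0.7, 1.9]   # arbitrary gate angles
delta = 1e-3                          # small perturbation of each angle
Us = [J(a) for a in angles]
Vs = [J(a + delta) for a in angles]

eps = max(op_norm(sub(U, V)) for U, V in zip(Us, Vs))
total_error = op_norm(sub(product(Us), product(Vs)))
```

Here `total_error` should be at most `len(angles) * eps`, in line with the lemma.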

We start by doing a warm-up: we solve the easy case where the terms in the

Hamiltonian commute.

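Proposition. Let
\[
  H = \sum_{j=1}^m H_j
\]
be any $k$-local Hamiltonian with commuting terms.

Then for any $t$, $e^{-iHt}$ can be approximated to within $\varepsilon$ by a circuit of
\[
  O\left(m\,\mathrm{poly}\left(\log\frac{m}{\varepsilon}\right)\right)
\]
gates from any given universal set.

Proof. Since the terms commute, $e^{-iHt} = \prod_j e^{-iH_jt}$. We pick $\varepsilon' = \frac{\varepsilon}{m}$, and approximate each $e^{-iH_jt}$ to within $\varepsilon'$. Then the total error is bounded by $m\varepsilon' = \varepsilon$, and this uses
\[
  O\left(m\,\mathrm{poly}\left(\log\frac{m}{\varepsilon}\right)\right)
\]
gates.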

We now do the full non-commutative case. To do so, we need to keep track of how much $e^{-iH_1t}e^{-iH_2t}$ differs from $e^{-i(H_1 + H_2)t}$.

Notation. For a matrix $X$, we write
\[
  X + O(\varepsilon)
\]
for $X + E$ with $\|E\| = O(\varepsilon)$.

Then we have

Lemma (Lie-Trotter product formula). Let $A, B$ be matrices with $\|A\|, \|B\| \leq K < 1$. Then we have
\[
  e^{-iA}e^{-iB} = e^{-i(A+B)} + O(K^2).
\]

Proof. We have
\[
  e^{-iA} = I - iA + \sum_{k=2}^\infty\frac{(-iA)^k}{k!} = I - iA + (iA)^2\sum_{k=0}^\infty\frac{(-iA)^k}{(k+2)!}.
\]
We notice that $\|(iA)^2\| \leq K^2$, and the final sum has norm bounded by $e^K < e$. So we have
\[
  e^{-iA} = I - iA + O(K^2).
\]
Then we have
\begin{align*}
  e^{-iA}e^{-iB} &= (I - iA + O(K^2))(I - iB + O(K^2))\\
  &= I - i(A + B) + O(K^2)\\
  &= e^{-i(A+B)} + O(K^2).
\end{align*}
Here we needed the fact that $\|A + B\| \leq 2K = O(K)$ and $\|AB\| \leq K^2 = O(K^2)$.

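We now apply this repeatedly to accumulate the sum $H_1 + H_2 + \cdots + H_m$ in the exponent. First of all, we note that if each $\|H_i\| < K$, then
\[
  \|H_1 + \cdots + H_\ell\| < \ell K.
\]
We want this to be $< 1$ for all $\ell \leq m$. So for now, we assume $K < \frac{1}{m}$. Also, we take $t = 1$ for now. Then consider
\begin{align*}
  e^{-iH_1}e^{-iH_2}\cdots e^{-iH_m} &= (e^{-i(H_1 + H_2)} + O(K^2))e^{-iH_3}\cdots e^{-iH_m}\\
  &= e^{-i(H_1 + H_2)}e^{-iH_3}\cdots e^{-iH_m} + O(K^2)\\
  &= e^{-i(H_1 + H_2 + H_3)}e^{-iH_4}\cdots e^{-iH_m} + O((2K)^2) + O(K^2)\\
  &= e^{-i\sum_j H_j} + O(m^3K^2),
\end{align*}
where we used the fact that
\[
  1^2 + 2^2 + \cdots + m^2 = O(m^3).
\]
We write the error as $Cm^3K^2$.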

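This is fine if $K$ is super small, but it won’t be in general. For general $K$ and $t$ values, we introduce a large $N$ such that
\[
  \left\|\frac{H_jt}{N}\right\| \leq \frac{Kt}{N} < 1.
\]
In other words, we divide time up into small $t/N$ intervals. We then try to simulate
\[
  U = e^{-i(H_1 + \cdots + H_m)t} = \left(e^{-i(H_1 + \cdots + H_m)\frac{t}{N}}\right)^N.
\]
This equality holds because we know that $(H_1 + \cdots + H_m)\frac{t}{N}$ commutes with itself (as does everything in the world).

We now want to make sure the final error for $U$ is $< \varepsilon$. So we know each term $e^{-i(H_1 + \cdots + H_m)\frac{t}{N}}$ needs to be approximated to within $\frac{\varepsilon}{N}$. So using our previous formula, we want that
\[
  Cm^3\left(\frac{Kt}{N}\right)^2 < \frac{\varepsilon}{N}.
\]
Doing some algebraic manipulation, we find that we need
\[
  N > \frac{Cm^3K^2t^2}{\varepsilon}.
\]
We now have $Nm$ gates of the form $e^{-iH_jt/N}$. So the circuit size is at most
\[
  O\left(\frac{m^4(Kt)^2}{\varepsilon}\right).
\]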

Recall for $n$ qubits, a general $k$-local Hamiltonian has $m = O(n^k)$. So the circuit size is
\[
  |C| = O\left(\frac{n^{4k}(Kt)^2}{\varepsilon}\right).
\]
Now this is in terms of the number of $e^{-iH_jt/N}$ gates. If we want to express this in terms of universal gates, then each gate needs to be approximated to $O(\varepsilon/|C|)$. We then need $O(\log^c(|C|/\varepsilon))$ gates for each, for some $c < 4$. So we only get a modest extra multiplicative factor in $|C|$.

Note that for a fixed $n$ with a variable $t$, a quantum process $e^{-iHt}$ runs in time $t$, but our simulation needs time $O(t^2)$. This can be improved to $O(t^{1+\delta})$ for any $\delta > 0$ by using “better” Lie-Trotter expansions.
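The effect of splitting time into $N$ steps can be seen on the smallest non-commuting example. The sketch below takes the 1-qubit Hamiltonian $H = X + Z$ (our choice, not from the notes), for which all exponentials have closed forms, and measures $\|(e^{-iXt/N}e^{-iZt/N})^N - e^{-i(X+Z)t}\|$ for growing $N$.

```python
import cmath
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def op_norm(M):
    """Operator norm of a 2x2 matrix: sqrt of the largest eigenvalue of M^dagger M."""
    p = abs(M[0][0]) ** 2 + abs(M[1][0]) ** 2
    r = abs(M[0][1]) ** 2 + abs(M[1][1]) ** 2
    q = M[0][0].conjugate() * M[0][1] + M[1][0].conjugate() * M[1][1]
    largest = (p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + abs(q) ** 2)
    return math.sqrt(largest)

def exp_x(theta):
    """e^{-i theta X} = cos(theta) I - i sin(theta) X."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -1j * s], [-1j * s, c]]

def exp_z(theta):
    """e^{-i theta Z} = diag(e^{-i theta}, e^{i theta})."""
    return [[cmath.exp(-1j * theta), 0], [0, cmath.exp(1j * theta)]]

def exact(t):
    """e^{-i (X + Z) t}, using ((X + Z) / sqrt(2))^2 = I."""
    w = math.sqrt(2) * t
    c, s = math.cos(w), math.sin(w) / math.sqrt(2)
    return [[c - 1j * s, -1j * s], [-1j * s, c + 1j * s]]

def trotter(t, N):
    """(e^{-i X t/N} e^{-i Z t/N})^N."""
    step = matmul(exp_x(t / N), exp_z(t / N))
    out = [[1, 0], [0, 1]]
    for _ in range(N):
        out = matmul(step, out)
    return out

def trotter_error(t, N):
    approx, target = trotter(t, N), exact(t)
    diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(approx, target)]
    return op_norm(diff)

errors = {N: trotter_error(1.0, N) for N in (10, 100, 1000)}
```

The errors should shrink roughly in proportion to $1/N$, which matches the analysis above: each of the $N$ steps contributes an error of order $(t/N)^2$.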

Local Hamiltonian ground state problem

There are many other things we might want to do with a $k$-local Hamiltonian. One question we might be interested in is the eigenvalues of $H$. Suppose we are given a 5-local Hamiltonian
\[
  H = \sum_{i=1}^m H_i
\]
on $n$ qubits (this was the original result proved). We suppose $\|H_i\| < 1$, and we are given two numbers $a < b$. We are promised that the smallest eigenvalue $E_0$ of $H$ is either $< a$ or $> b$. The problem is to decide whether $E_0 < a$.

The reason we have these funny $a, b$ is so that we don’t have to worry about precision problems. If we only had a single $a$ and we wanted to determine if $E_0 > a$ or $E_0 < a$, then it would be difficult to figure out if $E_0$ happens to be very close to $a$.

Kitaev’s theorem says that the above problem is complete for a complexity class known as QMA, i.e. it is the “hardest” problem in QMA. In other words, any problem in QMA can be translated into a local Hamiltonian ground state problem with polynomial overhead. A brief survey can be found on arXiv:quant-ph/0210077.

What is this QMA? We will not go into details, but roughly, it is a quantum version of NP. In case you are wondering, MA stands for Merlin and Arthur. . .