Part III Advanced Quantum Field Theory
Based on lectures by D. B. Skinner
Notes taken by Dexter Chua
Lent 2017
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
Quantum Field Theory (QFT) provides the most profound description of Nature we
currently possess. As well as being the basic theoretical framework for describing
elementary particles and their interactions (excluding gravity), QFT also plays a major
role in areas of physics and mathematics as diverse as string theory, condensed matter
physics, topology and geometry, astrophysics and cosmology.
This course builds on the Michaelmas Quantum Field Theory course, using techniques
of path integrals and functional methods to study quantum gauge theories. Gauge
Theories are a generalisation of electrodynamics and form the backbone of the Standard
Model, our best theory encompassing all particle physics. In a gauge theory, fields
have an infinitely redundant description; we can transform the fields by a different
element of a Lie Group at every point in space-time and yet still describe the same
physics. Quantising a gauge theory requires us to eliminate this infinite redundancy.
In the path integral approach, this is done using tools such as ghost fields and BRST
symmetry. We discuss the construction of gauge theories and their most important
observables, Wilson Loops. Time permitting, we will explore the possibility that a
classical symmetry may be broken by quantum effects. Such anomalies have many
important consequences, from constraints on interactions between matter and gauge
fields, to the ability to actually render a QFT inconsistent.
A further major component of the course is to study Renormalization. Wilson’s
picture of Renormalisation is one of the deepest insights into QFT: it explains
why we can do physics at all! The essential point is that the physics we see depends
on the scale at which we look. In QFT, this dependence is governed by evolution
along the Renormalisation Group (RG) flow. The course explores renormalisation
systematically, from the use of dimensional regularisation in perturbative loop integrals,
to the difficulties inherent in trying to construct a quantum field theory of gravity. We
discuss the various possible behaviours of a QFT under RG flow, showing in particular
that the coupling constant of a non-Abelian gauge theory can effectively become small
at high energies. Known as “asymptotic freedom”, this phenomenon revolutionised our
understanding of the strong interactions. We introduce the notion of an Effective Field
Theory that describes the low energy limit of a more fundamental theory and helps
parametrise possible departures from this low energy approximation. From a modern
perspective, the Standard Model itself appears to be but an effective field theory.
Pre-requisites
Knowledge of the Michaelmas term Quantum Field Theory course will be assumed.
Familiarity with the course Symmetries, Fields and Particles would be very helpful.
Contents
0 Introduction
0.1 What is quantum field theory
0.2 Building a quantum field theory
1 QFT in zero dimensions
1.1 Free theories
1.2 Interacting theories
1.3 Feynman diagrams
1.4 An effective theory
1.5 Fermions
2 QFT in one dimension (i.e. QM)
2.1 Quantum mechanics
2.2 Feynman rules
2.3 Effective quantum field theory
2.4 Quantum gravity in one dimension
3 Symmetries of the path integral
3.1 Ward identities
3.2 The Ward–Takahashi identity
4 Wilsonian renormalization
4.1 Background setting
4.2 Integrating out modes
4.3 Correlation functions and anomalous dimensions
4.4 Renormalization group flow
4.5 Taking the continuum limit
4.6 Calculating RG evolution
5 Perturbative renormalization
5.1 Cutoff regularization
5.2 Dimensional regularization
5.3 Renormalization of the $\phi^4$ coupling
5.4 Renormalization of QED
6 Non-abelian gauge theory
6.1 Bundles, connections and curvature
6.2 Yang–Mills theory
6.3 Quantum Yang–Mills theory
6.4 Faddeev–Popov ghosts
6.5 BRST symmetry and cohomology
6.6 Feynman rules for Yang–Mills
6.7 Renormalization of Yang–Mills theory
0 Introduction
0.1 What is quantum field theory
What is Quantum Field Theory? The first answer we might give is it’s just a
quantum version of a field theory (duh!). We know the world is described by
fields, e.g. electromagnetic fields, and the world is quantum. So naturally we
want to find a quantum version of field theory. Indeed, this is what drove the
study of quantum field theory historically.
But there are other things we can use quantum field theory for, and they
are not so obviously fields. In a lot of condensed matter applications, we can
use quantum field theory techniques to study, say, vibrations of a crystal, and
its quanta are known as “phonons”. More generally, we can use quantum field
theory to study things like phase transitions (e.g. boiling water). This is not an
obvious place to expect quantum field theory to appear, but it does. Some
people even use quantum field theory techniques to study the spread of diseases
in a population!
We can also use quantum field theory to study problems in mathematics!
QFT is used to study knot invariants, which are ways to assign labels to different
types of knots to figure out if they are the same. Donaldson won a Fields Medal
for showing that there are inequivalent differentiable structures on $\mathbb{R}^4$, and this
used techniques coming from quantum Yang–Mills theory as well.
These are not things we would traditionally think of as quantum field theory.
In this course, we are not going to be calculating, say, Higgs corrections to
γγ-interactions. Instead, we try to understand better this machinery known as
quantum field theory.
0.2 Building a quantum field theory
We now try to give a very brief outline of how one does quantum field theory.
Roughly, we follow the steps below:
(i) We pick a space to represent our “universe”. This will always be a manifold,
but we usually impose additional structure on it.
- In particle physics, we often pick the manifold to be a 4-dimensional
  pseudo-Riemannian manifold of signature $+---$. Usually, we in
  fact pick $(M, g) = (\mathbb{R}^4, \eta)$, where $\eta$ is the usual Minkowski metric.
- In condensed matter physics, we often choose $(M, g) = (\mathbb{R}^3, \delta)$, where
  $\delta$ is the flat Euclidean metric.
- In string theory, we have fields living on a Riemann surface $\Sigma$ (e.g.
  sphere, torus). Instead of specifying a metric, we only specify the
  conformal equivalence class $[g]$ of the metric, i.e. the metric up to a
  scalar factor.
- In QFT for knots, we pick $M$ to be some oriented 3-manifold, e.g. $S^3$,
  but with no metric.
- In this course, we will usually take $(M, g) = (\mathbb{R}^d, \delta)$ for some $d$, where
  $\delta$ is again the flat, Euclidean metric.
  We might think this is a very sensible and easy choice, because we are
  all used to working in $(\mathbb{R}^d, \delta)$. However, mathematically, this space is
  non-compact, and it will lead to a lot of annoying things happening.
(ii) We pick some fields. The simplest choice is just a function
$\phi: M \to \mathbb{R}$ or $\mathbb{C}$. This is a scalar field. Slightly more generally, we can also have
$\phi: M \to N$ for some other manifold $N$. We call $N$ the target space.
For example, quantum mechanics is a quantum field theory. Here we
choose $M$ to be some interval $M = I = [0, 1]$, which we think of as time,
and the field is a map $\phi: I \to \mathbb{R}^3$. We think of this as a path in $\mathbb{R}^3$.
[Figure: a path $\phi$ in $\mathbb{R}^3$.]
In string theory, we often have fields $\phi: \Sigma \to N$, where $N$ is a Calabi–Yau
manifold.
In pion physics, $\pi(x)$ describes a map $\phi: (\mathbb{R}^4, \eta) \to G/H$, where $G$ and $H$
are Lie groups.
Of course, we can go beyond scalar fields. We can also have fields with
non-zero spin such as fermions or gauge fields, e.g. a connection on a
principal $G$-bundle, as we will figure out later. These are fields that carry
a non-trivial representation of the Lorentz group. There is a whole load of
things we can choose for our field.
Whatever field we choose, we let $\mathcal{C}$ be the space of all field configurations,
i.e. a point $\phi \in \mathcal{C}$ represents a picture of our field across all of $M$. Thus $\mathcal{C}$
is some form of function space and will typically be infinite dimensional.
This infinite-dimensionality is what makes QFT hard, and also what makes
it interesting.
(iii) We choose an action. An action is just a function $S: \mathcal{C} \to \mathbb{R}$. You tell
me what the field looks like, and I will give you a number, the value of
the action. We often choose our action to be local, in the sense that we
assume there exists some function $\mathcal{L}(\phi, \partial\phi, \cdots)$ such that
\[
  S[\phi] = \int_M d^4x\, \sqrt{g}\, \mathcal{L}(\phi(x), \partial\phi(x), \cdots).
\]
The physics motivation behind this choice is obvious: we don’t want
what is happening over here to depend on what is happening at the far
side of Pluto. However, this assumption is actually rather suspicious, and
we will revisit this later.
For example, we have certainly met actions that look like
\[
  S[\phi] = \int d^4x \left(\frac{1}{2}(\partial\phi)^2 + \frac{m^2}{2}\phi^2 + \frac{\lambda}{4!}\phi^4\right),
\]
and for gauge fields we might have seen
\[
  S[A] = -\frac{1}{4}\int d^4x\, F_{\mu\nu}F^{\mu\nu}.
\]
If we have a coupled fermion field, we might have
\[
  S[A, \psi] = -\frac{1}{4}\int d^4x\, F_{\mu\nu}F^{\mu\nu} + \bar\psi(\not{D} + m)\psi.
\]
But recall when we first encountered Lagrangians in classical dynamics,
we worked with lots of different Lagrangians. We can do whatever thing
we like, make the particle roll down the hill and jump into space etc, and
we get to deal with a whole family of different Lagrangians. But when we
come to quantum field theory, the choices seem to be rather restrictive.
Why can’t we choose something like
\[
  S[A] = \int F^2 + F^4 + \cosh(F^2) + \cdots?
\]
It turns out we can, and in fact we must. We will have to work with
something much more complicated.
But then what were we doing in the QFT course? Did we just waste time
coming up with tools that just work with these very specific examples? It
turns out not. We will see that there are very good reasons to study these
specific actions.
(iv) What do we compute? In this course, the main object we’ll study is the
partition function
\[
  Z = \int_{\mathcal{C}} D\phi\, e^{-S[\phi]/\hbar},
\]
which is some sort of integral over the space of all fields. Note that the
minus sign in the exponential is for the Euclidean signature. If we are not
Euclidean, we get some $i$’s instead.
We will see that the factor of $e^{-S[\phi]/\hbar}$ means that as $\hbar \to 0$, the dominant
contribution to the partition function comes from stationary points of $S[\phi]$
over $\mathcal{C}$, and this starts to bring us back to the classical theory of fields.
The effect of $e^{-S[\phi]/\hbar}$ is to try to suppress “wild” contributions to $Z$, e.g.
where $\phi$ is varying very rapidly or $\phi$ takes very large values.
Heuristically, just as in statistical physics, we have a competition between
the two factors $D\phi$ and $e^{-S[\phi]/\hbar}$. The action part tries to suppress crazy
things, but how well this happens depends on how many crazy things are
happening, which is measured by the measure $D\phi$. We can think of this
$D\phi$ as “entropy”.
However, the problem is, the measure $D\phi$ on $\mathcal{C}$ doesn’t actually exist!
Understanding what we mean by this path integral, and what this measure
actually is, and how we can actually compute this thing, and how this has
got to do with the canonical quantization operators we had previously, is
the main focus of the first part of this course.
1 QFT in zero dimensions
We start from the simplest case we can think of, namely quantum field theory in
zero dimensions. This might seem like an absurd thing to study: the universe
is far from being zero-dimensional. However, it turns out this is the case where
we can make sense of the theory mathematically. Thus, it is important to study
0-dimensional field theories and understand what is going on.
There are two reasons for this. As mentioned, we cannot actually define the
path integral in higher dimensions. Thus, if we were to do this “properly”, we will
have to define it as the limit of something that is effectively a zero-dimensional
quantum field theory. The other reason is that in this course, we are not going
to study higher-dimensional path integrals “rigorously”. What we are going
to do is that we will study zero-dimensional field theories rigorously, and then
assume that analogous results hold for higher-dimensional field theories.
One drawback of this approach is that what we do in this section will have
little physical content or motivation. We will just assume that the partition
function is something we are interested in, without actually relating it to any
physical processes. In the next chapter, on one-dimensional field theories, we
are going to see why this is an interesting thing to study.
Let’s begin. In $d = 0$, if our universe $M$ is connected, then the only choice of
$M$ is $\{\mathrm{pt}\}$. There is no possibility for a field to have spin, because the Lorentz
group is trivial. Our fields are scalar, and the simplest choice is just a single
field $\phi: \{\mathrm{pt}\} \to \mathbb{R}$, i.e. just a real variable. Similarly, we simply have $\mathcal{C} = \mathbb{R}$.
This is not an infinite-dimensional space.
The action is just a normal function $S: \mathcal{C} = \mathbb{R} \to \mathbb{R}$ of one real variable.
The path integral measure $D\phi$ can be taken to just be the standard (Lebesgue)
measure $d\phi$ on $\mathbb{R}$. So our partition function is just
\[
  Z = \int_{\mathbb{R}} d\phi\, e^{-S(\phi)/\hbar},
\]
where we assume $S$ is chosen so that this converges. This happens if $S$ grows
sufficiently quickly as $\phi \to \pm\infty$.
More generally, we may wish to compute correlation functions, i.e. we pick
another function $f(\phi)$ and compute the expectation
\[
  \langle f(\phi) \rangle = \frac{1}{Z}\int d\phi\, f(\phi)\, e^{-S(\phi)/\hbar}.
\]
Again, we can pick whatever $f$ we like as long as the integral converges. In this
case, $\frac{1}{Z}e^{-S(\phi)/\hbar}$ is a probability distribution on $\mathbb{R}$, and as the name suggests,
$\langle f(\phi) \rangle$ is just the expectation value of $f$ in this distribution. Later on, when we
study quantum field theory in higher dimensions, we can define more complicated
$f$ by evaluating $\phi$ at different points, and we can use this to figure out how the
values of the field at different points relate to each other.
Our action is taken to have a series expansion in $\phi$, so in particular we can
write
\[
  S(\phi) = \frac{m^2\phi^2}{2} + \sum_{n=3}^{N} g_n\frac{\phi^n}{n!}.
\]
We didn’t put in a constant term, as it would just give us a constant factor in $Z$.
We could have included linear terms, but we shall not. The important thing is
that $N$ has to be even, so that the asymptotic behaviour of $S$ is the same as
$\phi \to +\infty$ and as $\phi \to -\infty$.
Now the partition function is a function of all these terms: $Z = Z(m^2, g_n)$.
Similarly, $\langle f \rangle$ is again a function of $m^2$ and $g_n$, and possibly other things used
to define $f$ itself.
Note that nothing depends on the field, because we are integrating over all
possible fields.
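Since the zero-dimensional path integral is an ordinary integral, it can be evaluated numerically. The following is a minimal sketch, not part of the lectures: it assumes the particular action $S(\phi) = \frac{m^2}{2}\phi^2 + \frac{\lambda}{4!}\phi^4$, and the helper names (`trapezoid`, `action`, `partition_function`, `expectation`), the integration window and the step count are all illustrative choices.

```python
import math

def trapezoid(f, lo=-12.0, hi=12.0, steps=4800):
    """Composite trapezoid rule on [lo, hi]; adequate for smooth, rapidly decaying integrands."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

def action(phi, m2=1.0, lam=0.0):
    """Assumed example action S(phi) = m^2 phi^2 / 2 + lam phi^4 / 4!."""
    return 0.5 * m2 * phi**2 + lam * phi**4 / 24.0

def partition_function(m2=1.0, lam=0.0, hbar=1.0):
    """Z = integral of exp(-S/hbar) over the real line (truncated to a finite window)."""
    return trapezoid(lambda p: math.exp(-action(p, m2, lam) / hbar))

def expectation(f, m2=1.0, lam=0.0, hbar=1.0):
    """<f(phi)> = (1/Z) * integral of f(phi) exp(-S/hbar)."""
    num = trapezoid(lambda p: f(p) * math.exp(-action(p, m2, lam) / hbar))
    return num / partition_function(m2, lam, hbar)

# With lam = 0 these reduce to ordinary Gaussian integrals.
print(partition_function(), expectation(lambda p: p * p))
print(partition_function(lam=0.1), expectation(lambda p: p * p, lam=0.1))
```

Note that the result is a number depending only on the parameters $(m^2, \lambda, \hbar)$ and on $f$, never on $\phi$, which is integrated over.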
1.1 Free theories
We consider the simplest possible QFT. These QFT’s are free, and so $S(\phi)$ is at
most quadratic. Classically, this implies the equations of motion are linear, and
so there is superposition, and thus the particles do not interact.
Let $\phi: \{\mathrm{pt}\} \to \mathbb{R}^n$ be a field with coordinates $\phi^a$, and define
\[
  S(\phi) = \frac{1}{2}M(\phi, \phi) = \frac{1}{2}M_{ab}\phi^a\phi^b,
\]
where $M: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ is a positive-definite symmetric matrix. Then the
partition function $Z(M)$ is just a Gaussian integral:
\[
  Z(M) = \int_{\mathbb{R}^n} d^n\phi\, e^{-\frac{1}{2\hbar}M(\phi, \phi)} = \frac{(2\pi\hbar)^{n/2}}{\sqrt{\det M}}.
\]
Indeed, to compute this integral, since $M$ is symmetric, there exists an orthogonal
transformation $O: \mathbb{R}^n \to \mathbb{R}^n$ that diagonalizes it. The measure $d^n\phi$ is invariant
under orthogonal transformations. So in terms of the eigenvectors of $M$, this
just reduces to a product of $n$ 1D Gaussian integrals of this type, and this is a
standard integral:
\[
  \int d\chi\, e^{-m\chi^2/2\hbar} = \sqrt{\frac{2\pi\hbar}{m}}.
\]
In our case, $m > 0$ runs over all eigenvalues of $M$, and the product of eigenvalues
is exactly $\det M$.
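A small generalization is useful. We let
\[
  S(\phi) = \frac{1}{2}M(\phi, \phi) + J(\phi),
\]
where $J: \mathbb{R}^n \to \mathbb{R}$ is some linear map (we can think of $J$ as a (co)vector, and
also write $J(\phi) = J \cdot \phi$). $J$ is a source in the classical case. Then in this theory,
we have
\[
  Z(M, J) = \int_{\mathbb{R}^n} d^n\phi\, \exp\left(-\frac{1}{\hbar}\left(\frac{1}{2}M(\phi, \phi) + J(\phi)\right)\right).
\]
To do this integral, we complete the square by letting $\tilde\phi = \phi + M^{-1}J$. In other
words,
\[
  \tilde\phi^a = \phi^a + (M^{-1})^{ab}J_b.
\]
The inverse exists because $M$ is assumed to be positive definite. We can now
complete the square to find
\[
\begin{aligned}
  Z(M, J) &= \int_{\mathbb{R}^n} d^n\tilde\phi\, \exp\left(-\frac{1}{2\hbar}M(\tilde\phi, \tilde\phi) + \frac{1}{2\hbar}M^{-1}(J, J)\right) \\
  &= \exp\left(\frac{1}{2\hbar}M^{-1}(J, J)\right)\int_{\mathbb{R}^n} d^n\tilde\phi\, \exp\left(-\frac{1}{2\hbar}M(\tilde\phi, \tilde\phi)\right) \\
  &= \exp\left(\frac{1}{2\hbar}M^{-1}(J, J)\right)\frac{(2\pi\hbar)^{n/2}}{\sqrt{\det M}}.
\end{aligned}
\]
In the long run, we really don’t care about the case with a source. However, we
will use this general case to compute some correlation functions.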
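The closed form for $Z(M, J)$ can be checked against brute-force integration in the case $n = 2$. This is only a sketch under stated assumptions: the function names, the particular matrix $M$ and source $J$ in the usage line, and the grid parameters are illustrative and not taken from the notes.

```python
import math

def gaussian_z_numeric(M, J, hbar=1.0, half_width=10.0, steps=400):
    """2D trapezoid estimate of Z(M, J) for a symmetric positive-definite 2x2 matrix M."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        wx = 0.5 if i in (0, steps) else 1.0
        for j in range(steps + 1):
            y = -half_width + j * h
            wy = 0.5 if j in (0, steps) else 1.0
            quad = M[0][0] * x * x + 2.0 * M[0][1] * x * y + M[1][1] * y * y  # M(phi, phi)
            src = J[0] * x + J[1] * y                                          # J(phi)
            total += wx * wy * math.exp(-(0.5 * quad + src) / hbar)
    return total * h * h

def gaussian_z_closed(M, J, hbar=1.0):
    """exp(M^{-1}(J, J) / 2 hbar) * (2 pi hbar)^{n/2} / sqrt(det M), specialised to n = 2."""
    det = M[0][0] * M[1][1] - M[0][1] ** 2
    # M^{-1}(J, J) written out with the explicit inverse of a 2x2 matrix
    minv_jj = (M[1][1] * J[0] ** 2 - 2.0 * M[0][1] * J[0] * J[1] + M[0][0] * J[1] ** 2) / det
    return math.exp(minv_jj / (2.0 * hbar)) * (2.0 * math.pi * hbar) / math.sqrt(det)

M = [[2.0, 0.5], [0.5, 1.0]]   # hypothetical example matrix
J = [0.3, -0.2]                # hypothetical example source
print(gaussian_z_numeric(M, J), gaussian_z_closed(M, J))
```

Setting `J = [0.0, 0.0]` recovers the source-free result $(2\pi\hbar)^{n/2}/\sqrt{\det M}$.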
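We return to the case without a source, and let $P: \mathbb{R}^n \to \mathbb{R}$ be a polynomial.
We want to compute
\[
  \langle P(\phi) \rangle = \frac{1}{Z(M)}\int_{\mathbb{R}^n} d^n\phi\, P(\phi)\exp\left(-\frac{1}{2\hbar}M(\phi, \phi)\right).
\]
By linearity, it suffices to consider the case where $P$ is just a monomial, so
\[
  P(\phi) = \prod_{i=1}^{m} \ell_i(\phi),
\]
for $\ell_i: \mathbb{R}^n \to \mathbb{R}$ linear maps. Now if $m$ is odd, then clearly $\langle P(\phi) \rangle = 0$, since
this is an integral of an odd function. When $m = 2k$, then we have
\[
  \langle P(\phi) \rangle = \frac{1}{Z(M)}\int d^n\phi\, (\ell_1 \cdot \phi)\cdots(\ell_{2k} \cdot \phi)\exp\left(-\frac{1}{2\hbar}M(\phi, \phi) - \frac{J \cdot \phi}{\hbar}\right).
\]
Here we are eventually going to set $J = 0$, but for the time being, we will be
silly and put the source there. The relevance is that we can then think of our
factors $\ell_i \cdot \phi$ as derivatives with respect to $J$:
\[
  \langle P(\phi) \rangle = \frac{(-\hbar)^{2k}}{Z(M)}\int d^n\phi \prod_{i=1}^{2k}\left(\ell_i \cdot \frac{\partial}{\partial J}\right)\exp\left(-\frac{1}{2\hbar}M(\phi, \phi) - \frac{J \cdot \phi}{\hbar}\right).
\]
Since the integral is absolutely convergent, we can move the derivative out of
the integral, and get
\[
\begin{aligned}
  &= \frac{(-\hbar)^{2k}}{Z(M)}\prod_{i=1}^{2k}\left(\ell_i \cdot \frac{\partial}{\partial J}\right)\int d^n\phi\, \exp\left(-\frac{1}{2\hbar}M(\phi, \phi) - \frac{J \cdot \phi}{\hbar}\right) \\
  &= \hbar^{2k}\prod_{i=1}^{2k}\left(\ell_i \cdot \frac{\partial}{\partial J}\right)\exp\left(\frac{1}{2\hbar}M^{-1}(J, J)\right).
\end{aligned}
\]
When each derivative $\ell_i \cdot \frac{\partial}{\partial J}$ acts on the exponential, we obtain a factor of
\[
  \frac{1}{\hbar}M^{-1}(J, \ell_i)
\]
in front. At the end, we are going to set $J = 0$. So we get a contribution if
and only if exactly half (i.e. $k$) of the derivatives act on the exponential, and the
other $k$ act on the factor in front to get rid of the $J$.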
We let $\sigma$ denote a (complete) pairing of the set $\{1, \cdots, 2k\}$, and $\Pi_{2k}$ be the
set of all such pairings. For example, if we have $k = 2$, then the possible pairings
are $\{(1, 2), (3, 4)\}$, $\{(1, 3), (2, 4)\}$ and $\{(1, 4), (2, 3)\}$.
In general, we have
\[
  |\Pi_{2k}| = \frac{(2k)!}{2^k k!},
\]
and we have
Theorem (Wick’s theorem). For a monomial
\[
  P(\phi) = \prod_{i=1}^{2k}\ell_i(\phi),
\]
we have
\[
  \langle P(\phi) \rangle = \hbar^k\sum_{\sigma \in \Pi_{2k}}\ \prod_{i \in \{1, \cdots, 2k\}/\sigma} M^{-1}(\ell_i, \ell_{\sigma(i)}),
\]
where the $\{1, \cdots, 2k\}/\sigma$ says we take the product over each pair $\{i, \sigma(i)\}$ only once, rather
than once for $(i, \sigma(i))$ and another for $(\sigma(i), i)$.
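This is in fact the version of Wick’s theorem for this 0d QFT, and $M^{-1}$ plays
the role of the propagator.
For example, we have
\[
  \langle \ell_1(\phi)\ell_2(\phi) \rangle = \hbar M^{-1}(\ell_1, \ell_2).
\]
We can represent this by the diagram
[Diagram: a line joining the vertices 1 and 2, labelled $M^{-1}$.]
Similarly, we have
\[
\begin{aligned}
  \langle \ell_1(\phi)\cdots\ell_4(\phi) \rangle = \hbar^2\Big(&M^{-1}(\ell_1, \ell_2)M^{-1}(\ell_3, \ell_4) \\
  &+ M^{-1}(\ell_1, \ell_3)M^{-1}(\ell_2, \ell_4) + M^{-1}(\ell_1, \ell_4)M^{-1}(\ell_2, \ell_3)\Big).
\end{aligned}
\]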
Note that we have now reduced the problem of computing an integral for the
correlation function into a purely combinatorial problem of counting the number
of ways to pair things up.
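The combinatorial content of Wick’s theorem is easy to mechanise. Below is a small sketch that enumerates complete pairings and sums the corresponding products of propagators; the function names and the example matrix standing in for $M^{-1}$ are hypothetical, and each linear map $\ell_i$ is represented by its list of components.

```python
import math

def pairings(items):
    """Yield every complete pairing of an even-length list, each exactly once."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def wick_expectation(ells, Minv, hbar=1.0):
    """<l_1(phi) ... l_m(phi)> in the free theory, as hbar^k times a sum over pairings."""
    if len(ells) % 2:
        return 0.0  # an odd monomial integrates to zero

    def propagator(a, b):
        # M^{-1}(a, b) for covectors a, b given by their components
        return sum(a[i] * Minv[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))

    total = 0.0
    for sigma in pairings(list(range(len(ells)))):
        term = 1.0
        for i, j in sigma:
            term *= propagator(ells[i], ells[j])
        total += term
    return hbar ** (len(ells) // 2) * total

# Number of pairings of 2k points for k = 1, 2, 3, 4
print([sum(1 for _ in pairings(list(range(2 * k)))) for k in range(1, 5)])

# <phi_1^2 phi_2^2> for a hypothetical inverse matrix M^{-1}
Minv = [[0.5, 0.1], [0.1, 0.25]]
print(wick_expectation([[1, 0], [1, 0], [0, 1], [0, 1]], Minv))
```

In one dimension with $M = m^2$ this reproduces the familiar moments, e.g. $\langle\phi^4\rangle = 3\hbar^2/m^4$.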
1.2 Interacting theories
Physically interesting theories contain interactions, i.e. $S(\phi)$ is non-quadratic.
Typically, if we are trying to compute
\[
  \int d^n\phi\, P(\phi)\exp\left(-\frac{S(\phi)}{\hbar}\right),
\]
these things involve transcendental functions and, except in very special
circumstances, are just too hard. We usually cannot perform these integrals analytically.
Naturally, in perturbation theory, we would thus want to approximate
\[
  Z = \int d^n\phi\, \exp\left(-\frac{S(\phi)}{\hbar}\right)
\]
by some series. Unfortunately, the integral very likely diverges if $\hbar < 0$, as
usually $S(\phi) \to \infty$ as $\phi \to \pm\infty$. So it can’t have a Taylor series expansion
around $\hbar = 0$, as such expansions have to be valid in a disk in the complex plane.
The best we can hope for is an asymptotic series.
Recall that a series
\[
  \sum_{n=0}^{\infty} f_n(\hbar)
\]
is an asymptotic series for $Z(\hbar)$ if for any fixed $N \in \mathbb{N}$, if we write $Z_N(\hbar)$ for
the sum of the terms up to $n = N$ on the RHS, then
\[
  \lim_{\hbar \to 0^+}\frac{|Z(\hbar) - Z_N(\hbar)|}{\hbar^N} = 0.
\]
Thus as $\hbar \to 0^+$, we get an arbitrarily good approximation to what we really
wanted from any finite number of terms. But the series will in general diverge if
we try to fix $\hbar \in \mathbb{R}_{>0}$, and include increasingly many terms. Most of the expansions
we do in quantum field theories are of this nature.
We will assume standard results about asymptotic series. Suppose $S(\phi)$ is
smooth with a global minimum at $\phi = \phi_0 \in \mathbb{R}^n$, where the Hessian
\[
  \left.\frac{\partial^2 S}{\partial\phi^a\,\partial\phi^b}\right|_{\phi_0}
\]
is positive definite. Then by Laplace’s method/Watson’s lemma, we have an
asymptotic series of the form
\[
  Z(\hbar) \sim (2\pi\hbar)^{n/2}\frac{\exp\left(-S(\phi_0)/\hbar\right)}{\sqrt{\det \partial_a\partial_b S(\phi_0)}}\left(1 + A\hbar + B\hbar^2 + \cdots\right).
\]
We will not prove this, but the proof is available in any standard asymptotic
methods textbook. The leading term involves the action evaluated on the
classical solution $\phi_0$, and is known as the semiclassical term. The remaining
terms are called the quantum corrections.
In Quantum Field Theory last term, the tree diagrams we worked with were
just about calculating the leading term. So we weren’t actually doing quantum
field theory.
Example. Let’s consider a single scalar field $\phi$ with action
\[
  S(\phi) = \frac{m^2}{2}\phi^2 + \frac{\lambda}{4!}\phi^4,
\]
where $m^2, \lambda > 0$. The action has a unique global minimum at $\phi_0 = 0$. The
action evaluated at $\phi_0 = 0$ vanishes, and $\partial^2 S = m^2$. So the leading term in the
asymptotic expansion of $Z(\hbar, m, \lambda)$ is
\[
  \frac{(2\pi\hbar)^{1/2}}{m}.
\]
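Further, we can find the whole series expansion by
\[
\begin{aligned}
  Z(\hbar, m, \lambda) &= \int_{\mathbb{R}} d\phi\, \exp\left(-\frac{1}{\hbar}\left(\frac{m^2}{2}\phi^2 + \frac{\lambda}{4!}\phi^4\right)\right) \\
  &= \frac{\sqrt{2\hbar}}{m}\int d\tilde\phi\, \exp\left(-\tilde\phi^2\right)\exp\left(-\frac{4\lambda\hbar}{4!\,m^4}\tilde\phi^4\right) \\
  &\sim \frac{\sqrt{2\hbar}}{m}\int d\tilde\phi\, e^{-\tilde\phi^2}\sum_{n=0}^{N}\frac{1}{n!}\left(\frac{-4\lambda\hbar}{4!\,m^4}\right)^n\tilde\phi^{4n} \\
  &= \frac{\sqrt{2\hbar}}{m}\sum_{n=0}^{N}\left(\frac{-4\lambda\hbar}{4!\,m^4}\right)^n\frac{1}{n!}\int d\tilde\phi\, e^{-\tilde\phi^2}\tilde\phi^{4n} \\
  &= \frac{\sqrt{2\hbar}}{m}\sum_{n=0}^{N}\left(\frac{-4\lambda\hbar}{4!\,m^4}\right)^n\frac{1}{n!}\,\Gamma\left(2n + \frac{1}{2}\right).
\end{aligned}
\]
Here we substituted $\phi = \frac{\sqrt{2\hbar}}{m}\tilde\phi$ in the second line, and the last line uses the
definition of the Gamma function.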
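We can plug in the value of the $\Gamma$ function, namely
$\Gamma\left(2n + \frac{1}{2}\right) = \frac{(4n)!\sqrt{\pi}}{4^{2n}(2n)!}$, to get
\[
\begin{aligned}
  Z(\hbar, m, \lambda) &\sim \frac{\sqrt{2\pi\hbar}}{m}\sum_{n=0}^{N}\left(\frac{-\hbar\lambda}{m^4}\right)^n\frac{1}{(4!)^n n!}\frac{(4n)!}{4^n(2n)!} \\
  &= \frac{\sqrt{2\pi\hbar}}{m}\left(1 - \frac{\hbar\lambda}{8m^4} + \frac{35}{384}\frac{\hbar^2\lambda^2}{m^8} + \cdots\right).
\end{aligned}
\]
We will get to understand these coefficients much more in terms of Feynman
diagrams later.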
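The first few coefficients can be reproduced with exact rational arithmetic. The snippet below is an illustrative sketch (the function name `coefficient` is not from the notes): it multiplies the factor $1/((4!)^n n!)$ from expanding the exponential by the pairing count $(4n)!/(4^n (2n)!)$ and the sign $(-1)^n$.

```python
from fractions import Fraction
from math import factorial

def coefficient(n):
    """Coefficient of (hbar * lambda / m^4)^n in Z / (sqrt(2 pi hbar) / m)."""
    wick = Fraction(factorial(4 * n), 4 ** n * factorial(2 * n))  # pairings of 4n points
    expo = Fraction(1, 24 ** n * factorial(n))                    # from expanding exp
    return (-1) ** n * expo * wick

for n in range(5):
    print(n, coefficient(n))
```

The values for $n = 1$ and $n = 2$ match the $-\frac{1}{8}$ and $\frac{35}{384}$ appearing in the expansion above.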
Note that apart from the factor $\frac{\sqrt{2\pi\hbar}}{m}$, the series depends on $(\hbar, \lambda)$ only
through the product $\hbar\lambda$. So we can equally view it as an asymptotic series in
the coupling constant $\lambda$. This view has the benefit that we can allow ourselves
to set $\hbar = 1$, which is what we are usually going to do.
As we emphasized previously, we get an asymptotic series, not a Taylor series.
Why is this so?
When we first started our series expansion, we wrote $\exp(-S(\phi)/\hbar)$ as a
series in $\lambda$. This is absolutely fine, as $\exp$ is a very well-behaved function when
it comes to power series. The problem comes when we want to interchange the
integral with the sum. We know this is allowed only if the sum is absolutely
convergent, but it is not, as the integral does not converge for negative $\hbar$. Thus,
the best we can do is that for any $N$, we truncate the sum at $N$, and then do
the exchange. This is legal since finite sums always commute with integrals.
We can in fact see the divergence of the asymptotic series for finite $(\hbar\lambda) > 0$
from the series itself, using Stirling’s approximation. Recall we have
\[
  n! \sim e^{n\log n}.
\]
So via a straightforward manipulation, we have
\[
  \frac{1}{(4!)^n n!}\frac{(4n)!}{4^n(2n)!} \sim e^{n\log n}.
\]
So the coefficients grow faster than exponentially, and thus the series has vanishing
radius of convergence.
We can actually plot out the partial sums. For example, we can pick $\hbar = m = 1$,
$\lambda = 0.1$, and let $Z_n$ be the $n$th partial sum of the series. We can start
plotting this for $n$ up to 40 (the red line is the true value):
[Figure: the partial sums $Z_n$ against $n$ for $0 \le n \le 40$, with the vertical axis running from 2.46 to 2.54.]
After the initial few terms, it seems like the sum has converged, and indeed
to the true value. Up to around $n = 30$, there is no visible change, and indeed,
the differences tend to be of the order $10^{-7}$. However, after around 33, the
sum starts to change drastically.
We can try to continue plotting. Since the values become ridiculously large,
we have to plot this in a logarithmic scale, and thus we plot $|Z_n|$ instead. Do
note that for sufficiently large $n$, the actual sign flips every time we increment
$n$! This thing actually diverges really badly.
[Figure: $|Z_n|$ against $n$ for $0 \le n \le 60$ on a logarithmic scale, with the vertical axis running from $10^0$ to $10^{12}$.]
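The behaviour described by these plots can be reproduced numerically. The following sketch, assuming the same values $\hbar = m = 1$, $\lambda = 0.1$, builds the series terms from the ratio of consecutive closed-form coefficients, $a_{n+1}/a_n = -\frac{\hbar\lambda}{m^4}\frac{(4n+3)(4n+1)}{24(n+1)}$, and compares the partial sums with a direct numerical evaluation of the integral. The function names and the quadrature parameters are illustrative choices.

```python
import math

def series_terms(count, lam=0.1, hbar=1.0, m=1.0):
    """Terms a_n (n = 0 .. count-1) of the series for Z / (sqrt(2 pi hbar) / m)."""
    x = hbar * lam / m ** 4
    terms, a = [], 1.0
    for n in range(count):
        terms.append(a)
        # ratio of consecutive closed-form coefficients
        a *= -x * (4 * n + 3) * (4 * n + 1) / (24.0 * (n + 1))
    return terms

def partial_sums(count, lam=0.1, hbar=1.0, m=1.0):
    """Z_n = prefactor * (a_0 + ... + a_n) for n = 0 .. count-1."""
    prefactor = math.sqrt(2.0 * math.pi * hbar) / m
    sums, running = [], 0.0
    for a in series_terms(count, lam, hbar, m):
        running += a
        sums.append(prefactor * running)
    return sums

def exact_z(lam=0.1, hbar=1.0, m=1.0, half_width=12.0, steps=4800):
    """Trapezoid-rule value of the integral defining Z."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        phi = -half_width + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * math.exp(-(0.5 * m * m * phi ** 2 + lam * phi ** 4 / 24.0) / hbar)
    return total * h

z_true = exact_z()
sums = partial_sums(61)
for n in (2, 5, 15, 30, 40, 50, 60):
    print(n, sums[n], abs(sums[n] - z_true))
```

With these parameters the terms shrink until roughly $n = 15$ and grow in magnitude afterwards, which is the usual pattern for an asymptotic series: truncating near the smallest term gives the best approximation.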
We can also see other weird phenomena here. Suppose we decided that we
have $m^2 < 0$ instead (which is a rather weird thing to do). Then it is clear that
$Z(m^2, \lambda)$ still exists, as the $\phi^4$ eventually dominates, and so what happens at
the $\phi^2$ term doesn’t really matter. However, the asymptotic series is invalid in
this region.
Indeed, if we look at the fourth line of our derivation, each term in the
asymptotic series will be obtained by integrating $e^{\tilde\phi^2}\tilde\phi^{4n}$, instead of integrating
$e^{-\tilde\phi^2}\tilde\phi^{4n}$, and each of these integrals would diverge.
Fundamentally, this is because when $m^2 < 0$, the global minima of $S(\phi)$ are
now at
\[
  \phi_0 = \pm\sqrt{-\frac{6m^2}{\lambda}}.
\]
[Figure: plot of $S(\phi)$ against $\phi$, with a minimum $\phi_0$ marked.]
Our old minimum $\phi = 0$ is now a (local) maximum! This is the wrong point to
expand about! Fields with $m^2 < 0$ are called tachyons, and these are always
signs of some form of instability.
Actions whose minima occur at non-zero field values are often associated
with spontaneous symmetry breaking. Our original theory has a $\mathbb{Z}/2$ symmetry,
namely under the transformation $\phi \mapsto -\phi$. But once we pick a minimum $\phi_0$
to expand about, we have broken the symmetry. This is an interesting kind of
symmetry breaking because the asymmetry doesn’t really exist in our theory.
It’s just our arbitrary choice of minimum that breaks it.
1.3 Feynman diagrams
We now try to understand the terms in the asymptotic series in terms of Feynman
diagrams. The easy bits are the powers of
\[
  \left(\frac{-\hbar\lambda}{m^4}\right)^n.
\]
This combination is essentially fixed by dimensional analysis. What we really
want to understand are the combinatorial factors, given by
\[
  \frac{1}{(4!)^n n!} \times \frac{(4n)!}{4^n(2n)!}.
\]
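To understand this, we can write the path integral in a different way. We have
\[
\begin{aligned}
  Z(\hbar, m, \lambda) &= \int d\phi\, \exp\left(-\frac{1}{\hbar}\left(\frac{m^2}{2}\phi^2 + \frac{\lambda}{4!}\phi^4\right)\right) \\
  &= \int d\phi\, \sum_{n=0}^{\infty}\frac{1}{n!}\left(\frac{-\lambda}{4!\,\hbar}\right)^n\phi^{4n}\exp\left(-\frac{m^2}{2\hbar}\phi^2\right) \\
  &\sim \sum_{n=0}^{N}\frac{1}{(4!)^n n!}\left(\frac{-\lambda}{\hbar}\right)^n\langle\phi^{4n}\rangle_{\mathrm{free}}\, Z_{\mathrm{free}}.
\end{aligned}
\]
Thus, we can apply our previous Wick’s theorem to compute this! We now see
that there is a factor of $\frac{1}{(4!)^n n!}$ coming from expanding the exponential, and
the factor of $\frac{(4n)!}{4^n(2n)!}$ came from Wick’s theorem: it is the number of ways to
pair up $4n$ vertices together. There will be a factor of $\hbar^{2n}$ coming from Wick’s
theorem when evaluating $\langle\phi^{4n}\rangle$, so the expansion is left with a $\hbar^n$ factor.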
Let’s try to compute the order $\lambda$ term. By Wick’s theorem, we should
consider diagrams with 4 external vertices, joined by straight lines. There are
three of these:
But we want to think of these as interaction vertices of valency 4. So instead of
drawing them this way, we “group” the four vertices together. For example, the
first one becomes
Similarly, at order $\lambda^2$, we have a diagram
Note that the way we are counting diagrams in Wick’s theorem means we
consider the following diagrams to be distinct:
However, topologically, these are the same diagrams, and we don’t want to count
them separately. Combinatorially, these diagrams can be obtained from each
other by permuting the outgoing edges at each vertex, or by permuting the
vertices. In other words, in the “expanded view”, when we group our vertices as
we are allowed to permute the vertices within each block, or permute the blocks
themselves. We let $D_n$ be the set of all graphs, and $G_n$ be the group consisting
of these “allowed” permutations. This $G_n$ is a semi-direct product $(S_4)^n \rtimes S_n$.
Then by some combinatorics,
\[
  |D_n| = \frac{(4n)!}{4^n(2n)!}, \quad |G_n| = (4!)^n n!.
\]
Recall that $|D_n|$ is the number Wick’s theorem gives us, and we now see that
$\frac{1}{|G_n|}$ happens to be the factor we obtained from expanding the exponential. Of
course, this isn’t a coincidence: we chose to put $\frac{\lambda}{4!}$ instead of, say, $\frac{\lambda}{4}$ in front
of the $\phi^4$ term so that this works out.
We can now write the asymptotic series as
\[
  \frac{Z(m^2, \lambda)}{Z(m^2, 0)} \sim \sum_{n=0}^{N}\frac{|D_n|}{|G_n|}\left(\frac{-\hbar\lambda}{m^4}\right)^n.
\]
It turns out a bit of pure mathematics will allow us to express this in terms
of the graphs up to topological equivalence. By construction, two graphs are
topologically equivalent if they can be related to each other via $G_n$. In other
words, the graphs correspond to the set of orbits $O_n$ of $D_n$ under $G_n$. For $n = 1$,
there is only one graph up to topological equivalence, so $|O_1| = 1$. It is not hard
to see that $|O_2| = 3$.
In general, each graph $\Gamma$ has some automorphisms that fix the graph. For
example, in the following graph:
[Diagram: a graph whose four half-edges are labelled 1, 2, 3 and 4.]
we see that the graph is preserved by the permutations $(1\ 3)$, $(2\ 4)$ and $(1\ 2)(3\ 4)$.
In general, the automorphism group $\mathrm{Aut}(\Gamma)$ is the subgroup of $G_n$ consisting of
all permutations that preserve the graph $\Gamma$. In this case, $\mathrm{Aut}(\Gamma)$ is generated by
the above three permutations, and $|\mathrm{Aut}(\Gamma)| = 8$.
Recall from IA Groups that the orbit-stabilizer theorem tells us
\[
  \frac{|D_n|}{|G_n|} = \sum_{\Gamma \in O_n}\frac{1}{|\mathrm{Aut}(\Gamma)|}.
\]
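For small $n$ this identity can be verified by brute force. The sketch below is an illustration under explicit assumptions: half-edges are labelled $0, \ldots, 4n-1$ with vertex $v$ owning labels $4v, \ldots, 4v+3$, a graph is a perfect matching of half-edges, and $G_n$ acts by permuting half-edges within each vertex and permuting the vertices. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

def pairings(items):
    """Yield every complete pairing of an even-length list, each exactly once."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        for tail in pairings(rest[:k] + rest[k + 1:]):
            yield [(first, partner)] + tail

def symmetry_group(n):
    """All elements of (S_4)^n semi-direct S_n, as maps on half-edge labels 0 .. 4n-1."""
    elems = []
    for vperm in permutations(range(n)):
        for local in product(list(permutations(range(4))), repeat=n):
            g = [0] * (4 * n)
            for v in range(n):
                for s in range(4):
                    g[4 * v + s] = 4 * vperm[v] + local[v][s]
            elems.append(tuple(g))
    return elems

def act(g, graph):
    """Image of a matching (a set of unordered pairs) under the relabelling g."""
    return frozenset(frozenset((g[a], g[b])) for a, b in graph)

def automorphism_orders(n):
    """Return (|D_n|, |G_n|, sorted list of |Aut| with one entry per orbit)."""
    graphs = {frozenset(frozenset(p) for p in sigma) for sigma in pairings(list(range(4 * n)))}
    group = symmetry_group(n)
    seen, orders = set(), []
    for graph in graphs:
        if graph in seen:
            continue
        seen |= {act(g, graph) for g in group}                         # the whole orbit
        orders.append(sum(1 for g in group if act(g, graph) == graph))  # stabilizer size
    return len(graphs), len(group), sorted(orders)

for n in (1, 2):
    d, g, orders = automorphism_orders(n)
    print(n, d, g, orders, sum(Fraction(1, o) for o in orders), Fraction(d, g))
```

The stabilizers are counted directly rather than inferred from the orbit sizes, so the agreement of the two sides is a genuine check.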
Thus, if we write
\[
  O = \bigcup_{n \in \mathbb{N}} O_n,
\]
then we have
\[
  \frac{Z(m^2, \lambda)}{Z(m^2, 0)} \sim \sum_{\Gamma \in O}\frac{1}{|\mathrm{Aut}(\Gamma)|}\left(\frac{-\hbar\lambda}{m^4}\right)^n,
\]
where the $n$ appearing in the exponent is the number of vertices in the graph $\Gamma$.
The last loose end to tie up is the factor of $\left(\frac{-\hbar\lambda}{m^4}\right)^n$ appearing in the sum.
While this is a pretty straightforward thing to come up with in this case, for
more complicated fields with more interactions, we need a better way of figuring
out what factor to put there.
If we look back at how we derived this factor, we had a factor of $-\frac{\lambda}{\hbar}$
coming from each interaction vertex, whereas when computing the correlation
functions $\langle\phi^{4n}\rangle$, every edge contributes a factor of $\hbar m^{-2}$ (since, in the language
we expressed Wick’s theorem, we had $M = m^2$). Thus, we can imagine this as
saying we have a propagator with value $\hbar/m^2$ and a vertex with value $-\lambda/\hbar$.
Note that the negative sign appears because we have $e^{-S}$ in “Euclidean” QFT.
In Minkowski spacetime, we have a factor of $i$ instead.
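After all this work, we can now expand the partition function as a sum over
vacuum diagrams, one for each term:
\[
  \frac{Z(m^2, \lambda)}{Z(m^2, 0)} \sim 1 + \left(\frac{-\lambda\hbar}{m^4}\right)\frac{1}{8} + \frac{\lambda^2\hbar^2}{m^8}\frac{1}{48} + \frac{\lambda^2\hbar^2}{m^8}\frac{1}{16} + \frac{\lambda^2\hbar^2}{m^8}\frac{1}{128} + \cdots
\]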
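In general, if we have a theory with several fields with propagators of value
$\frac{1}{P_i}$, and many different interactions with coupling constants $\lambda_\alpha$, then
\[
  \frac{Z(\{\lambda_\alpha\})}{Z(\{0\})} \sim \sum_{\Gamma \in O}\frac{1}{|\mathrm{Aut}\,\Gamma|}\frac{\prod_\alpha(-\lambda_\alpha)^{|v_\alpha(\Gamma)|}}{\prod_i |P_i|^{|e_i(\Gamma)|}}\hbar^{E(\Gamma) - V(\Gamma)},
\]
where
- $e_i(\Gamma)$ is the number of edges of type $i$ in $\Gamma$;
- $v_\alpha(\Gamma)$ is the number of vertices of type $\alpha$ in $\Gamma$;
- $V(\Gamma)$ is the number of vertices in total; and
- $E(\Gamma)$ is the total number of edges.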
There are a few comments we can make. Usually, counting all graphs is rather
tedious. Instead, we can consider the quantity
\[
  \log(Z/Z_0).
\]
This is the sum of all connected graphs. For example, we can see that the term for
the two-figure-of-eight is just the square of the term for a single figure-of-eight
times the factor of $\frac{1}{2!}$, and one can convince oneself that this generalizes to
arbitrary combinations.
In this case, we can simplify the exponent of $\hbar$. In general, by Euler’s theorem,
we have
\[
  E - V = L - C,
\]
where $L$ is the number of loops in the graph, and $C$ is the number of connected
components. In the case of $\log(Z/Z_0)$, we have $C = 1$. Then the factor of $\hbar$ is
just $\hbar^{L-1}$. Thus, the more loops we have, the less it contributes to the sum.
It is not difficult to see that if we want to compute correlation functions, say
$\langle\phi^2\rangle$, then we should consider graphs with two external vertices. Note that in
this case, when computing the combinatorial factors, the automorphisms of a
graph are required to fix the external vertices. To see this mathematically, note
that the factors in
\[
  \frac{1}{|G_n|} = \frac{1}{(4!)^n n!}
\]
came from Taylor expanding $\exp$ and the coefficient $\frac{1}{4!}$ of $\phi^4$. We do not get
these kinds of factors from $\phi^2$ when computing
\[
  \int d\phi\, \phi^2 e^{-S[\phi]}.
\]
Thus, we should consider the group of automorphisms that fix the external
vertices to get the right factor of $\frac{1}{|G_n|}$.
Finally, note that Feynman diagrams don’t let us compute exact answers
in general. Even if we had some magical means to compute all terms in the
expansions using Feynman diagrams, if we try to sum all of them up, it will
diverge! We only have an asymptotic series, not a Taylor series. However, it
turns out in some special theories, the leading term is the exact answer! The tail
terms somehow manage to all cancel each other out. For example, this happens
for supersymmetric theories, but we will not go into details.
1.4 An effective theory
We now play with another toy theory. Suppose we have two scalar fields $\phi, \chi \in \mathbb{R}$, and consider the action
\[ S(\phi, \chi) = \frac{m^2}{2}\phi^2 + \frac{M^2}{2}\chi^2 + \frac{\lambda}{4}\phi^2\chi^2. \]
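For convenience, we will set $\hbar = 1$. We have Feynman rules in which the $\phi$ propagator (solid line) has value $1/m^2$ and the $\chi$ propagator has value $1/M^2$, with a vertex of value $-\lambda$ joining two $\phi$ lines and two $\chi$ lines.
We can use these to compute correlation functions and expectation values. For example, we might want to compute
\[ \log(Z/Z_0). \]
Summing the connected vacuum diagrams, one term per diagram, we have
\[ \log\left(\frac{Z(m^2, \lambda)}{Z(m^2, 0)}\right) \sim -\frac{\lambda}{4m^2M^2} + \frac{\lambda^2}{16m^4M^4} + \frac{\lambda^2}{16m^4M^4} + \frac{\lambda^2}{8m^4M^4} + \cdots \]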
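We can also try to compute $\langle\phi^2\rangle$. To do so, we consider diagrams with two external vertices, each attached to a solid line. The relevant diagrams give, one term per diagram,
\[ \langle\phi^2\rangle \sim \frac{1}{m^2} - \frac{\lambda}{2m^4M^2} + \frac{\lambda^2}{4m^6M^4} + \frac{\lambda^2}{2m^6M^4} + \frac{\lambda^2}{4m^6M^4} + \cdots \]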
Let’s arrive at this result in a different way. Suppose we think of $\chi$ as “heavy”, so we cannot access it directly using our experimental equipment. In particular, if we are interested only in correlation functions that depend only on $\phi$, then we could try to “integrate out” $\chi$ first.
Suppose we have a function $f(\phi)$, and consider the integral
\[ \int_{\mathbb{R}^2} d\phi\, d\chi\; f(\phi) e^{-S(\phi,\chi)/\hbar} = \int_{\mathbb{R}} d\phi \left(f(\phi) \int_{\mathbb{R}} d\chi\; e^{-S(\phi,\chi)/\hbar}\right). \]
We define the effective action for $\phi$, $S_{\mathrm{eff}}(\phi)$, by
\[ S_{\mathrm{eff}}(\phi) = -\hbar \log\left(\int_{\mathbb{R}} d\chi\; e^{-S(\phi,\chi)/\hbar}\right). \]
Then the above integral becomes
\[ \int_{\mathbb{R}} d\phi\; f(\phi) e^{-S_{\mathrm{eff}}(\phi)/\hbar}. \]
So doing any computation with $\phi$ only is equivalent to pretending $\chi$ doesn’t exist, and using this effective action instead.
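In general, the integral is very difficult to compute, and we can only find an asymptotic series for the effective action. However, in this very particular example, we have chosen $S$ such that it is only quadratic in $\chi$, and one can compute to get
\[ \int_{\mathbb{R}} d\chi\; e^{-S(\phi,\chi)/\hbar} = e^{-m^2\phi^2/2\hbar} \sqrt{\frac{2\pi\hbar}{M^2 + \lambda\phi^2/2}}. \]
Therefore we have
\begin{align*}
S_{\mathrm{eff}}(\phi) &= \frac{m^2}{2}\phi^2 + \frac{\hbar}{2}\log\left(1 + \frac{\lambda\phi^2}{2M^2}\right) + \frac{\hbar}{2}\log\left(\frac{M^2}{2\pi\hbar}\right) \\
&= \left(\frac{m^2}{2} + \frac{\hbar\lambda}{4M^2}\right)\phi^2 - \frac{\hbar\lambda^2}{16M^4}\phi^4 + \frac{\hbar\lambda^3}{48M^6}\phi^6 + \cdots + \frac{\hbar}{2}\log\left(\frac{M^2}{2\pi\hbar}\right) \\
&= \frac{m_{\mathrm{eff}}^2}{2}\phi^2 + \frac{\lambda_4}{4!}\phi^4 + \frac{\lambda_6}{6!}\phi^6 + \cdots + \frac{\hbar}{2}\log\left(\frac{M^2}{2\pi\hbar}\right),
\end{align*}
where
\[ \lambda_{2k} = (-1)^{k+1}\frac{\hbar(2k)!}{2^{k+1}k}\frac{\lambda^k}{M^{2k}}. \]
This follows from $\log(1 + x) = \sum_{k \geq 1} (-1)^{k+1}x^k/k$. For example, $\lambda_4 = -3\hbar\lambda^2/2M^4$ and $\lambda_6 = 15\hbar\lambda^3/M^6$, matching the coefficients of $\phi^4$ and $\phi^6$ above.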
We see that once we have integrated out $\chi$, we have generated an infinite series of new interactions for $\phi$ in $S_{\mathrm{eff}}(\phi)$. Moreover, the mass has also shifted as
\[ m^2 \mapsto m_{\mathrm{eff}}^2 = m^2 + \frac{\hbar\lambda}{2M^2}. \]
It is important to notice that the new vertices generated are quantum effects. They vanish as $\hbar \to 0$ (they happen to be linear in $\hbar$ here, but that is just a coincidence). They are also suppressed by powers of $\frac{1}{M^2}$. So if $\chi$ is very heavy, we might expect these new couplings to have very tiny effects.
This is a very useful technique. If the universe is very complicated with
particles at very high energies, then it would be rather hopeless to try to account
for all of the effects of the particles we cannot see. In fact, it is impossible to
know about these very high energy particles, as their energies are too high to
reach. All we can know about them is their incarnation in terms of these induced
couplings on lower energy fields.
A few things to note:
- The original action had a $\mathbb{Z}_2 \times \mathbb{Z}_2$ symmetry, given by $(\phi, \chi) \mapsto (\pm\phi, \pm\chi)$. This symmetry is preserved, and we do not generate any vertices with odd powers of $\phi$.
- The effective action $S_{\mathrm{eff}}(\phi)$ also contains a field-independent term
\[ \frac{\hbar}{2}\log\left(\frac{M^2}{2\pi\hbar}\right). \]
This plays no role in correlation functions $\langle f(\phi)\rangle$. Typically, we are just going to drop it. However, this is one of the biggest problems in physics. This term contributes to the cosmological constant, and the scales that are relevant to this term are much larger than the actual observed cosmological constant. Thus, to obtain the observed cosmological constant, we must have some magic cancelling of these cosmological constants.
- In this case, passing to the effective action produced a lot of new interactions. However, if we already had a complicated theory, e.g. when we started off with an effective action, and then decided to integrate out more things, then instead of introducing new interactions, the effect of this becomes shifting the coupling constants, just as we shifted the mass term.
In general, how can we compute the effective potential? We are still computing an integral of the form
\[ \int_{\mathbb{R}} d\chi\; e^{-S(\phi,\chi)/\hbar}, \]
so our previous technique of using Feynman diagrams should still work. We will treat $\phi$ as a constant in the theory. So instead of having a vertex of value $-\lambda$ with two $\phi$ lines and two $\chi$ lines attached, we drop all the $\phi$ lines and are left with a vertex of value $-\lambda\phi^2$ that has only the two $\chi$ lines attached.
But actually, this would be rather confusing. So instead what we do is that we still draw the solid $\phi$ lines, but whenever such a line appears, we have to terminate the line immediately, and this contributes a factor of $\phi$. The vertex together with its two terminated lines again has value $-\lambda\phi^2$.
For our accounting purposes, we will say the internal vertex contributes a factor of $-\frac{\lambda}{2}$ (since the automorphism group of this vertex has order 2, and the action had $\frac{\lambda}{4}$), and each terminating blue vertex contributes a factor of $\phi$.
Since we have a “constant” term $\frac{m^2}{2}\phi^2$ as well, we need to add one more diagram to account for that. We allow a single solid edge, terminated at both ends, of value $-m^2$.
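With these ingredients, we can compute the effective potential as a sum of connected diagrams, one term per diagram (with $\hbar = 1$):
\[ -S_{\mathrm{eff}}(\phi) \sim -\frac{m^2}{2}\phi^2 - \frac{\lambda}{4M^2}\phi^2 + \frac{\lambda^2\phi^4}{16M^4} - \frac{\lambda^3\phi^6}{48M^6} + \cdots \]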
These are just the terms we’ve previously found. There are, again, a few things to note:
- The diagram expansion is pretty straightforward in this case, because we started with a rather simple interacting theory. More complicated examples will have diagram expansions that are actually interesting.
- We only sum over connected diagrams, as $S_{\mathrm{eff}}$ is the logarithm of the integral.
- We see that the new/shifted couplings in $S_{\mathrm{eff}}(\phi)$ are generated by loops of $\chi$ fields.
- When we computed the effective action directly, we had a “cosmological constant” term $\frac{\hbar}{2}\log\left(\frac{M^2}{2\pi\hbar}\right)$, but this doesn’t appear in the Feynman diagram calculations. This is expected, since what our Feynman rules compute for us are things of the form $\log(Z/Z_0)$, and the $Z_0$ term is that “cosmological constant”.
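Example. We can use the effective potential to compute $\langle\phi^2\rangle$:
\[ \langle\phi^2\rangle \sim \frac{1}{m_{\mathrm{eff}}^2} - \frac{\lambda_4}{2m_{\mathrm{eff}}^6} + \cdots \]
Expanding $1/m_{\mathrm{eff}}^2$ in powers of $\lambda$ and using $\lambda_4 = -3\lambda^2/2M^4$ (with $\hbar = 1$), we can see that this agrees with our earlier calculation correct to order $\lambda^2$.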
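The agreement between the full theory and the effective theory can also be checked numerically. The following is a minimal sketch, not part of the lectures: it sets $\hbar = 1$, picks illustrative values of $m$, $M$ and $\lambda$ (an assumption), and compares $\langle\phi^2\rangle$ computed from the two-variable integral, from the one-variable integral weighted by $e^{-S_{\mathrm{eff}}}$, and from the perturbative series truncated at order $\lambda^2$. The helper names are hypothetical.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoid rule for f on [a, b] using n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def expectation_phi2(weight, m, n=400):
    """<phi^2> for an unnormalised weight(phi), integrating over |phi| <= 8/m."""
    cut = 8.0 / m
    num = trapezoid(lambda p: p * p * weight(p), -cut, cut, n)
    den = trapezoid(weight, -cut, cut, n)
    return num / den

m, M, lam = 1.0, 2.0, 0.2  # illustrative values only, with hbar = 1

def chi_integrated_weight(phi):
    """Integrate exp(-S(phi, chi)) over chi numerically."""
    def integrand(chi):
        S = 0.5 * m**2 * phi**2 + 0.5 * M**2 * chi**2 + 0.25 * lam * phi**2 * chi**2
        return math.exp(-S)
    return trapezoid(integrand, -8.0 / M, 8.0 / M, 400)

def effective_weight(phi):
    """exp(-S_eff(phi)), with the field-independent constant dropped."""
    S_eff = 0.5 * m**2 * phi**2 + 0.5 * math.log(1.0 + lam * phi**2 / (2.0 * M**2))
    return math.exp(-S_eff)

full = expectation_phi2(chi_integrated_weight, m)
effective = expectation_phi2(effective_weight, m)
# Perturbative series through order lambda^2 (the three second-order diagrams summed).
series = 1 / m**2 - lam / (2 * m**4 * M**2) + lam**2 / (m**6 * M**4)
print(full, effective, series)
```

The first two numbers should agree to quadrature accuracy, since integrating out $\chi$ is exact here. The truncated series differs from them at order $\lambda^3$, as expected of an asymptotic expansion.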
At this moment, this is not incredibly impressive, since it takes a lot of work to compute $S_{\mathrm{eff}}(\phi)$. But the computation of $S_{\mathrm{eff}}(\phi)$ can be reused to compute any correlation function involving only $\phi$. So if we do many computations with $\phi$, then we can save time.
But the point is not that this saves work. The point is that this is what we
do when we do physics. We can never know if there is some really high energy
field we can’t reach with our experimental apparatus. Thus, we can only assume
that the actions we discover experimentally are the effective actions coming from
integrating out some high energy fields.
1.5 Fermions
So far, we have been dealing with Bosons. These particles obey Bose statistics,
so different fields $\phi$ commute with each other. However, we want to study
Fermions as well. This would involve variables that anti-commute with each
other. When we did canonical quantization, this didn’t pose any problem to our
theory, because fields were represented by operators, and operators can obey
any commutation or anti-commutation relations we wanted them to. However,
in the path integral approach, our fields are actual numbers, and they are forced
to commute.
So how can we take care of fermions in our theory? We cannot actually use path integrals, as these involve things that commute. Instead, we will now treat the fields as formal symbols.
The theory is rather trivial if there is only one field. So let’s say we have many fields $\theta_1, \cdots, \theta_n$. Then a “function” of these fields will be a formal polynomial expression in the symbols $\theta_1, \cdots, \theta_n$ subject to the relations $\theta_i\theta_j = -\theta_j\theta_i$. In particular, $\theta_i^2 = 0$. So, for example, the action might be
\[ S[\theta_1, \theta_2] = \frac{1}{2}m^2\theta_1\theta_2 = -\frac{1}{2}m^2\theta_2\theta_1. \]
Now the path integral is defined as
\[ \int d\theta_1 \cdots d\theta_n\; e^{-S[\theta_1, \ldots, \theta_n]}. \]
There are two things we need to make sense of: the exponential, and the integral.
The exponential is easy. In general, for any analytic function $f: \mathbb{R} \to \mathbb{R}$ (or $f: \mathbb{C} \to \mathbb{C}$), we can write it as a power series
\[ f(x) = \sum_{i=0}^\infty a_i x^i. \]
Then for any polynomial $p$ in the fields, we can similarly define
\[ f(p) = \sum_{i=0}^\infty a_i p^i. \]
Crucially, this expression is a finite sum, since we have only finitely many fields, and any monomial in the fields of degree greater than the number of fields must vanish by anti-commutativity.
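Example.
\begin{align*}
e^{\theta_1 + \theta_2} &= 1 + \theta_1 + \theta_2 + \frac{1}{2}(\theta_1 + \theta_2)^2 + \cdots \\
&= 1 + \theta_1 + \theta_2 + \frac{1}{2}(\theta_1\theta_2 + \theta_2\theta_1) \\
&= 1 + \theta_1 + \theta_2.
\end{align*}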
How about integration? Since our fields are just formal symbols, and do not
represent real/complex numbers, it doesn’t make sense to actually integrate it.
However, we can still define a linear functional from the space of all polynomials
in the fields to the reals (or complexes) that is going to act as integration.
If we have a single field $\theta$, then the most general polynomial expression in $\theta$ is $a + b\theta$. It turns out the correct definition for the “integral” is
\[ \int d\theta\; (a + b\theta) = b. \]
This is known as the Berezin integral.
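How can we generalize this to more fields? Heuristically, the rule is that $\int d\theta$ is an expression that should anti-commute with all fields. Thus, for example,
\begin{align*}
\int d\theta_1\, d\theta_2\; (3\theta_2 + 2\theta_1\theta_2) &= \int d\theta_1 \left(3\int d\theta_2\; \theta_2 + 2\int d\theta_2\; \theta_1\theta_2\right) \\
&= \int d\theta_1 \left(3 - 2\theta_1\int d\theta_2\; \theta_2\right) \\
&= \int d\theta_1\; (3 - 2\theta_1) \\
&= -2.
\end{align*}
On the other hand, we have
\begin{align*}
\int d\theta_1\, d\theta_2\; (3\theta_2 + 2\theta_2\theta_1) &= \int d\theta_1 \left(3\int d\theta_2\; \theta_2 + 2\int d\theta_2\; \theta_2\theta_1\right) \\
&= \int d\theta_1 \left(3 + 2\left(\int d\theta_2\; \theta_2\right)\theta_1\right) \\
&= \int d\theta_1\; (3 + 2\theta_1) \\
&= 2.
\end{align*}
Formally speaking, we can define the integral by
\[ \int d\theta_1 \cdots d\theta_n\; \theta_n \cdots \theta_1 = 1, \]
and then sending all monomials of lower degree to 0 and extending linearly.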
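These rules are mechanical enough to implement directly. The following is a minimal sketch, not part of the lectures: a polynomial in the $\theta_i$ is stored as a dictionary from sorted index tuples to coefficients, and the function names (`mul`, `exp`, `berezin`, `integrate`) are hypothetical. It reproduces the exponential example and the two integrals above.

```python
from fractions import Fraction

def sort_sign(indices):
    """Sort a product of generators; return (sign, sorted tuple), or (0, ()) if one repeats."""
    idx = list(indices)
    if len(set(idx)) < len(idx):
        return 0, ()  # theta_i^2 = 0
    sign = 1
    # Bubble sort: every swap of neighbouring generators flips the sign.
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)

def add(p, q):
    out = dict(p)
    for mono, c in q.items():
        out[mono] = out.get(mono, 0) + c
    return {k: v for k, v in out.items() if v != 0}

def scale(p, c):
    return {k: c * v for k, v in p.items() if c * v != 0}

def mul(p, q):
    out = {}
    for a, ca in p.items():
        for b, cb in q.items():
            sign, mono = sort_sign(a + b)
            if sign:
                out[mono] = out.get(mono, 0) + sign * ca * cb
    return {k: v for k, v in out.items() if v != 0}

def exp(p, n_fields):
    """exp(p) for p without constant term; the series stops because p^(n_fields+1) = 0."""
    result = {(): Fraction(1)}
    term = {(): Fraction(1)}
    for k in range(1, n_fields + 1):
        term = scale(mul(term, p), Fraction(1, k))
        result = add(result, term)
    return result

def berezin(p, i):
    """Integrate over theta_i: anticommute theta_i to the front, then strip it."""
    out = {}
    for mono, c in p.items():
        if i in mono:
            pos = mono.index(i)
            rest = mono[:pos] + mono[pos + 1:]
            out[rest] = out.get(rest, 0) + (-1) ** pos * c
    return {k: v for k, v in out.items() if v != 0}

def integrate(p, order):
    """Compute the iterated integral d theta_{order[0]} ... d theta_{order[-1]}, innermost first."""
    for i in reversed(order):
        p = berezin(p, i)
    return p.get((), 0)

theta1, theta2 = {(1,): 1}, {(2,): 1}
exp_result = exp(add(theta1, theta2), 2)
first = integrate(add(scale(theta2, 3), scale(mul(theta1, theta2), 2)), [1, 2])
second = integrate(add(scale(theta2, 3), scale(mul(theta2, theta1), 2)), [1, 2])
normalisation = integrate(mul(theta2, theta1), [1, 2])
print(exp_result, first, second, normalisation)
```

Storing only sorted monomials means anti-commutativity is applied once, at multiplication time, so equality of polynomials is just equality of dictionaries.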
When actually constructing a fermion field, we want an action that is “sensible”. So we would want a mass term $\frac{1}{2}m^2\theta^2$ in the action. But for anti-commuting variables, this must vanish!
The solution is to imagine that our fields are complex, and we have two fields $\theta$ and $\bar\theta$. Of course, formally, these are just two formal symbols, and bear no relation to each other. However, we will think of them as being complex conjugates to each other. We can then define the action to be
\[ S[\bar\theta, \theta] = \frac{1}{2}m^2\bar\theta\theta. \]
Then the partition function is
\[ Z = \int d\theta\, d\bar\theta\; e^{-S(\bar\theta, \theta)}. \]
Similar to the previous computations, we can evaluate this to be
\begin{align*}
Z &= \int d\theta\, d\bar\theta\; e^{-S(\bar\theta, \theta)} \\
&= \int d\theta \left(\int d\bar\theta \left(1 - \frac{1}{2}m^2\bar\theta\theta\right)\right) \\
&= \int d\theta \left(\int d\bar\theta - \frac{1}{2}m^2\left(\int d\bar\theta\; \bar\theta\right)\theta\right) \\
&= \int d\theta \left(-\frac{1}{2}m^2\theta\right) \\
&= -\frac{1}{2}m^2.
\end{align*}
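We will need the following formula:
Proposition. For an invertible $n \times n$ matrix $B$ and $\eta_i, \bar\eta_i, \theta_i, \bar\theta_i$ independent fermionic variables for $i = 1, \ldots, n$, we have
\[ Z(\eta, \bar\eta) = \int d^n\theta\, d^n\bar\theta\; \exp\left(\bar\theta_i B_{ij}\theta_j + \bar\eta_i\theta_i + \bar\theta_i\eta_i\right) = \det B\, \exp\left(-\bar\eta_i (B^{-1})_{ij}\eta_j\right). \]
The sign in the last exponent comes from completing the square, $\bar\theta B\theta + \bar\eta\theta + \bar\theta\eta = (\bar\theta + \bar\eta B^{-1})B(\theta + B^{-1}\eta) - \bar\eta B^{-1}\eta$. In particular, we have
\[ Z = Z(0, 0) = \det B. \]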
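The statement $Z(0, 0) = \det B$ can be checked by brute force for small matrices. The sketch below is an illustration under a stated assumption: the measure is taken to be $\prod_i d\theta_i\, d\bar\theta_i$, normalised so that each pair satisfies $\int d\theta_i\, d\bar\theta_i\; \bar\theta_i\theta_i = 1$. With that convention the integral simply picks out the coefficient of $\bar\theta_1\theta_1 \cdots \bar\theta_n\theta_n$ in the exponential. The function name `gaussian_Z` is hypothetical.

```python
from fractions import Fraction
from math import factorial

def sort_sign(indices):
    """Sort a product of generators; return (sign, sorted tuple), or (0, ()) if one repeats."""
    idx = list(indices)
    if len(set(idx)) < len(idx):
        return 0, ()
    sign = 1
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)

def mul(p, q):
    out = {}
    for a, ca in p.items():
        for b, cb in q.items():
            sign, mono = sort_sign(a + b)
            if sign:
                out[mono] = out.get(mono, 0) + sign * ca * cb
    return {k: v for k, v in out.items() if v != 0}

def gaussian_Z(B):
    """Coefficient of thetabar_1 theta_1 ... thetabar_n theta_n in exp(thetabar_i B_ij theta_j).

    Generators are labelled so that thetabar_i -> 2i and theta_i -> 2i + 1
    (i counted from 0), which makes the sorted top monomial the desired ordering.
    """
    n = len(B)
    exponent = {}
    for i in range(n):
        for j in range(n):
            sign, mono = sort_sign((2 * i, 2 * j + 1))
            exponent[mono] = exponent.get(mono, 0) + sign * Fraction(B[i][j])
    # Only the n-th power of the exponent can reach the top monomial.
    power = {(): Fraction(1)}
    for _ in range(n):
        power = mul(power, exponent)
    return power.get(tuple(range(2 * n)), 0) / factorial(n)

print(gaussian_Z([[2, 1], [5, 3]]))                    # det = 2*3 - 1*5
print(gaussian_Z([[1, 2, 0], [0, 1, 3], [4, 0, 1]]))   # det = 1 + 24
```

A different ordering convention for $d^n\theta\, d^n\bar\theta$ can change the result by an overall sign that depends only on $n$, which is why the normalisation is fixed explicitly here.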
As before, for any function $f$, we can again define the correlation function
\[ \langle f(\bar\theta, \theta)\rangle = \frac{1}{Z(0, 0)}\int d^n\theta\, d^n\bar\theta\; e^{-S(\bar\theta, \theta)} f(\bar\theta, \theta). \]
Note that usually, $S$ contains even-degree terms only. Then it doesn’t matter if we place $f$ on the left or the right of the exponential.
It is an exercise on the first example sheet to prove these, derive Feynman
rules, and investigate further examples.
2 QFT in one dimension (i.e. QM)
2.1 Quantum mechanics
In $d = 1$, there are two possible connected, compact manifolds $M$ we can have: $M = S^1$ or $M = I = [0, 1]$.
[Figure: the worldline $M = I$, carrying the field $\phi$.]
We will mostly be considering the case where $M = I$. In this case, we need to specify boundary conditions on the field in the path integral, and we will see that this corresponds to providing start and end points to compute matrix elements.
We let $t \in [0, 1]$ be the worldline coordinate parametrizing the field, and we write our field as
\[ x: I \to N \]
for some Riemannian manifold $(N, g)$, which we call the target manifold. If $U \subseteq N$ has coordinates $x^a$ with $a = 1, \cdots, n = \dim(N)$, then we usually write $x^a(t)$ for the coordinates of $x(t)$. The standard choice of action is
\[ S[x] = \int_I \left(\frac{1}{2}g(\dot x, \dot x) + V(x)\right) dt, \]
where $\dot x$ is as usual the time derivative, and $V(x)$ is some potential term.
We call this theory a non-linear $\sigma$-model. It is called a $\sigma$-model because when people first wrote this theory down, they used $\sigma$ for the name of the field, and the name stuck. It is non-linear because the term $g(\dot x, \dot x)$ can be very complicated when the metric $g$ is not flat. Note that $+V(x)$ is the correct sign for a Euclidean worldline.
Classically, we look for the extrema of $S[x]$ (for fixed end points), and the solutions are given by the solutions to
\[ \frac{d^2x^a}{dt^2} + \Gamma^a_{bc}\dot x^b\dot x^c = g^{ab}\frac{\partial V}{\partial x^b}. \]
The left-hand side is just the familiar geodesic equation we know from general relativity, and the right-hand side corresponds to some non-gravitational force.
In the case of zero-dimensional quantum field theory, we just wrote down the partition function
\[ Z = \int e^{-S}, \]
and assumed it was an interesting thing to calculate. There wasn’t really a better option because it is difficult to give a physical interpretation to a zero-dimensional quantum field theory. In the one-dimensional case, we will see that path integrals naturally arise when we try to do quantum mechanics.
To do quantum mechanics, we first pick a Hilbert space $\mathcal{H}$. We usually take it as
\[ \mathcal{H} = L^2(N, d\mu), \]
the space of square-integrable functions on $N$.
To describe dynamics, we pick a Hamiltonian operator $H: \mathcal{H} \to \mathcal{H}$, with the usual choice being
\[ H = -\frac{1}{2}\Delta + V, \]
where the Laplacian is given by
\[ \Delta = \frac{1}{\sqrt{g}}\partial_a\left(\sqrt{g}\, g^{ab}\partial_b\right). \]
As usual, $\sqrt{g}$ refers to the square root of the determinant of $g$.
We will work in the Heisenberg picture. Under this set up, the amplitude for a particle to travel from $x \in N$ to $y \in N$ in time $T$ is given by the heat kernel
\[ K_T(y, x) = \langle y|e^{-HT}|x\rangle. \]
Note that we have $e^{-HT}$ instead of $e^{-iHT}$ because we are working in a Euclidean world. Strictly speaking, this doesn’t really make sense, because $|x\rangle$ and $|y\rangle$ are not genuine elements of $\mathcal{H} = L^2(N, d\mu)$. They are $\delta$-functions, which aren’t functions. There are some ways to fix this problem. One way is to see that the above suggests $K_T$ satisfies
\[ \frac{\partial}{\partial t}K_t(y, x) + HK_t(y, x) = 0, \]
where we view $x$ as a fixed parameter, and $K_t$ is a function of $y$, so that $H$ can act on $K_t$. The boundary condition is
\[ \lim_{t \to 0} K_t(y, x) = \delta(y - x), \]
and this uniquely specifies $K_t$. So we can define $K_t$ to be the unique solution to this problem.
We can reconnect this back to the usual quantum mechanics by replacing the Euclidean time $T$ with $it$, and the above equation gives us the Schrödinger equation
\[ i\frac{\partial K_t}{\partial t}(y, x) = HK_t(y, x). \]
We first focus on the case where $V = 0$. In the case where $(N, g) = (\mathbb{R}^n, \delta)$, we know from, say, IB Methods, that the solution is given by
\[ K_t(y, x) = \frac{1}{(2\pi t)^{n/2}}\exp\left(-\frac{|x - y|^2}{2t}\right). \]
For an arbitrary Riemannian manifold $(N, g)$, it is in general very hard to write down a closed-form expression for $K_t$. However, we can find the asymptotic form
\[ \lim_{t \to 0} K_t(y, x) \sim \frac{a(x)}{(2\pi t)^{n/2}}\exp\left(-\frac{d(y, x)^2}{2t}\right), \]
where $d(x, y)$ is the geodesic distance between $x$ and $y$, and $a(x)$ is some invariant of our manifold built from (integrals of) polynomials of the Riemann curvature that isn’t too important.
Here comes the magic. We notice that since
\[ I = \int d^nz\; |z\rangle\langle z| \]
is the identity operator, we can write, with $T = t_1 + t_2$,
\begin{align*}
K_{t_1 + t_2}(y, x) &= \langle y|e^{-TH}|x\rangle \\
&= \int d^nz\; \langle y|e^{-t_1H}|z\rangle\langle z|e^{-t_2H}|x\rangle \\
&= \int d^nz\; K_{t_1}(y, z)K_{t_2}(z, x).
\end{align*}
For flat space, this just reduces to the formula for convolution of Gaussians. This is the concatenation property of the heat kernel.
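For flat space in one dimension the concatenation property is easy to confirm numerically. The sketch below is only an illustration: it uses the flat heat kernel quoted earlier with $n = 1$, and the times and end points are arbitrary sample values (assumptions).

```python
import math

def heat_kernel(t, y, x):
    """Flat one-dimensional heat kernel K_t(y, x)."""
    return math.exp(-(x - y) ** 2 / (2 * t)) / math.sqrt(2 * math.pi * t)

def concatenate(t1, t2, y, x, half_width=12.0, n=4000):
    """Trapezoid approximation of the integral of K_{t1}(y, z) K_{t2}(z, x) over z."""
    h = 2 * half_width / n
    total = 0.0
    for i in range(n + 1):
        z = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * heat_kernel(t1, y, z) * heat_kernel(t2, z, x)
    return total * h

# Sample values, chosen arbitrarily for the check.
lhs = concatenate(0.3, 0.7, 1.0, -0.5)
rhs = heat_kernel(0.3 + 0.7, 1.0, -0.5)
print(lhs, rhs)
```

The two printed numbers should coincide to high accuracy, reflecting the fact that convolving Gaussians adds their variances.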
Using this many times, we can break up our previous heat kernel by setting
\[ \Delta t = \frac{T}{N} \]
for some large $N \in \mathbb{N}$. Then we have
\[ K_T(y, x) = \int \prod_{i=1}^{N-1} d^nx_i\; \prod_{i=1}^{N} K_{\Delta t}(x_i, x_{i-1}), \]
where we conveniently set $x_0 = x$ and $x_N = y$.
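The purpose of introducing these $\Delta t$ is that we can now use the asymptotic form of $K_{\Delta t}(y, x)$. We can now say
\[ \langle y_1|e^{-HT}|y_0\rangle = \lim_{N \to \infty}\left(\frac{1}{2\pi\Delta t}\right)^{nN/2}\int \prod_{i=1}^{N-1} d^nx_i\; a(x_i)\exp\left(-\frac{\Delta t}{2}\left(\frac{d(x_{i+1}, x_i)}{\Delta t}\right)^2\right). \]
This looks more-or-less like a path integral! We now dubiously introduce the path integral measure
\[ \mathcal{D}x \overset{?}{\equiv} \lim_{N \to \infty}\left(\frac{1}{2\pi\Delta t}\right)^{nN/2}\prod_{i=1}^{N-1} d^nx_i\; a(x_i), \]
and also assume our map $x(t)$ is at least once-continuously differentiable, so that
\[ \lim_{N \to \infty}\prod_{i=1}^{N-1}\exp\left(-\frac{\Delta t}{2}\left(\frac{d(x_{i+1}, x_i)}{\Delta t}\right)^2\right) \overset{?}{\equiv} \exp\left(-\frac{1}{2}\int dt\; g(\dot x, \dot x)\right) = \exp(-S[x]). \]
Assuming these things actually make sense (we’ll later figure out they don’t), we can write
\[ \langle y_1|e^{-HT}|y_0\rangle = \int_{C_T[y_1, y_0]} \mathcal{D}x\; e^{-S[x]}, \]
where $C_T[y_1, y_0]$ is the space of “all” maps $I \to N$ such that $x(0) = y_0$ and $x(1) = y_1$.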
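Now consider an arbitrary $V \neq 0$. Then we can write
\[ H = H_0 + V(x), \]
where $H_0$ is the free Hamiltonian. Then we note that for small $\Delta t$, we have
\[ e^{-H\Delta t} = e^{-H_0\Delta t}e^{-V(x)\Delta t} + o(\Delta t). \]
Thus, for small $\Delta t$, we have
\begin{align*}
K_{\Delta t}(y, x) &= \langle y|e^{-H\Delta t}|x\rangle \\
&\approx \langle y|e^{-H_0\Delta t}e^{-V(x)\Delta t}|x\rangle \\
&\sim \frac{a(x)}{(2\pi\Delta t)^{n/2}}\exp\left(-\left(\frac{1}{2}\left(\frac{d(y, x)}{\Delta t}\right)^2 + V(x)\right)\Delta t\right).
\end{align*}
Then repeating the above derivations will again give us
\[ \langle y_1|e^{-HT}|y_0\rangle = \int_{C_T[y_1, y_0]} \mathcal{D}x\; e^{-S[x]}. \]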
Before we move on to express other things in terms of path integrals, and then realize our assumptions are all wrong, we make a small comment on the phenomenon we see here.
Notice that the states $|y_0\rangle \in \mathcal{H}$ and $\langle y_1| \in \mathcal{H}^*$ we used to evaluate our propagator here arise as boundary conditions on the map $x$. This is a general phenomenon. The co-dimension-1 subspaces (i.e. subspaces of $M$ of one dimension lower than $M$) are associated to states in our Hilbert space $\mathcal{H}$. Indeed, when we did quantum field theory via canonical quantization, the states corresponded to the state of the universe at some fixed time, which is a co-dimension-1 slice of spacetime.
The partition function
We can naturally interpret a lot of the things we meet in quantum mechanics via path integrals. In statistical physics, we often called the quantity $\operatorname{Tr}_{\mathcal{H}}(e^{-HT})$ the partition function. Here we can compute it as
\[ \operatorname{Tr}_{\mathcal{H}}(e^{-HT}) = \int d^ny\; \langle y|e^{-HT}|y\rangle. \]
Using our path integral formulation, we can write this as
\[ \operatorname{Tr}_{\mathcal{H}}(e^{-HT}) = \int d^ny \int_{C_I[y, y]} \mathcal{D}x\; e^{-S} = \int_{C_{S^1}} \mathcal{D}x\; e^{-S}, \]
where we integrate over all maps from the circle into $N$. This is the partition function $Z(S^1, (N, g, V))$ of our theory. If one is worried about convergence issues, then there are indeed some problems with this definition. If we work in flat space with $V = 0$, then $K_T(y, y) = \langle y|e^{-HT}|y\rangle$ is independent of $y$. So when we integrate over all $y$, the answer diverges (as long as $K_T(y, y)$ is non-zero). However, we would get a finite result if we had a compact manifold instead.
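The trace formula can be tested in a case where the answer is known. The sketch below is an illustration under assumptions not made in the text: it takes flat one-dimensional space with a harmonic potential $V(x) = \frac{1}{2}\omega^2x^2$ (which also makes the trace finite), discretizes both time and position, and compares the trace of the product of short-time kernels with the standard result $\sum_n e^{-T\omega(n + 1/2)} = 1/(2\sinh(\omega T/2))$. All parameter values and function names are hypothetical.

```python
import math

def matmul(A, B):
    """Plain-Python product of two square matrices."""
    columns = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns] for row in A]

def discretized_partition_function(omega=1.0, T=2.0, slices=32, half_width=5.0, points=61):
    """Trace of (short-time kernel)^slices on a uniform position grid."""
    if slices & (slices - 1):
        raise ValueError("slices must be a power of two for repeated squaring")
    dt = T / slices
    h = 2 * half_width / (points - 1)
    xs = [-half_width + i * h for i in range(points)]
    # One time step: free heat kernel times exp(-V dt), times h for the integral over x.
    step = [[math.exp(-(y - x) ** 2 / (2 * dt) - 0.5 * omega**2 * x**2 * dt)
             / math.sqrt(2 * math.pi * dt) * h
             for x in xs] for y in xs]
    M = step
    k = 1
    while k < slices:  # repeated squaring: step -> step^2 -> step^4 -> ...
        M = matmul(M, M)
        k *= 2
    return sum(M[i][i] for i in range(points))

Z_discrete = discretized_partition_function()
Z_exact = 1.0 / (2.0 * math.sinh(0.5 * 1.0 * 2.0))
print(Z_discrete, Z_exact)
```

The small remaining difference comes from the finite time step; it shrinks as `slices` is increased, which is the discretized path integral converging to the operator trace.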
Correlation functions
More interestingly, we can do correlation functions. We will begin by considering the simplest choice: local operators.
Definition (Local operator). A local operator $O(t)$ is one which depends on the values of the fields and finitely many derivatives just at one point $t \in M$.
We further restrict to local operators that do not depend on derivatives. These are given by functions $O: N \to \mathbb{R}$, and then by pullback we obtain an operator $O(x(t))$.
Suppose the corresponding quantum operator is $\hat O = O(\hat x)$, characterized by the property
\[ \hat O|x\rangle = O(x)|x\rangle. \]
If we want to evaluate this at time $t$, then we would compute
\[ \langle y_1|\hat O(t)|y_0\rangle = \langle y_1|e^{-H(T - t)}\hat O e^{-Ht}|y_0\rangle. \]
But, inserting a complete set of states, this is equal to
\begin{align*}
&\int d^nx\; \langle y_1|e^{-H(T - t)}|x\rangle\langle x|\hat O e^{-Ht}|y_0\rangle \\
&= \int d^nx\; O(x)\langle y_1|e^{-H(T - t)}|x\rangle\langle x|e^{-Ht}|y_0\rangle.
\end{align*}
Simplifying notation a bit, we can write
\[ \langle y_1|\hat O(t)|y_0\rangle = \int d^nx\; O(x(t))\int_{C_{[T, t]}[y_1, x_t]} \mathcal{D}x\; e^{-S[x]}\int_{C_{[t, 0]}[x_t, y_0]} \mathcal{D}x\; e^{-S[x]}. \]
But this is just the same as
\[ \int_{C_{[T, 0]}[y_1, y_0]} \mathcal{D}x\; O(x(t))e^{-S[x]}. \]
More generally, suppose we have a sequence of operators $O_n, \cdots, O_1$ we want to evaluate at times $T > t_n > t_{n-1} > \cdots > t_1 > 0$. Then by the same argument, we find
\[ \langle y_1|e^{-HT}\hat O_n(t_n) \cdots \hat O_1(t_1)|y_0\rangle = \int_{C_{[T, 0]}[y_1, y_0]} \mathcal{D}x\; O_n(x(t_n)) \cdots O_1(x(t_1))e^{-S[x]}. \]
Note that it is crucial that the operators are ordered in this way, as one sees when actually trying to prove this. Indeed, the $\hat O_i$ are operators, but the objects in the path integral, i.e. the $O_i(x(t_i))$, are just functions. Multiplication of operators does not commute, but multiplication of functions does. In general, if $\{t_i\} \subseteq (0, T)$ are a collection of times, then we have
\[ \int \mathcal{D}x\; \prod_{i=1}^n O_i(x(t_i))e^{-S[x]} = \langle y_1|e^{-HT}\,\mathcal{T}\prod_{i=1}^n \hat O_i(t_i)|y_0\rangle, \]
where $\mathcal{T}$ denotes the time ordering operator. For example, for $n = 2$, we have
\[ \mathcal{T}[\hat O_1(t_1)\hat O_2(t_2)] = \Theta(t_2 - t_1)\hat O_2(t_2)\hat O_1(t_1) + \Theta(t_1 - t_2)\hat O_1(t_1)\hat O_2(t_2), \]
where $\Theta$ is the step function.
It is interesting to note that in this path integral formulation, we see that we get non-trivial correlation between operators at different times only because of the kinetic (derivative) term in the action. Indeed, for a free theory, the discretized version of the action looked like
\[ S_{\mathrm{kin}}[x] = \sum_i \frac{1}{2}\left(\frac{x_{i+1} - x_i}{\Delta t}\right)^2\Delta t. \]
Now if we don’t have this term, and the action is purely potential:
\[ S_{\mathrm{pot}}[x] = \sum_i V(x_i)\,\Delta t, \]
then the discretized path integral would have factorized into a product of integrals at these sampling points $x_i$. It would follow that for any operators $\{O_i\}$ that depend only on position, we have
\[ \left\langle \prod_i O_i(x(t_i))\right\rangle = \prod_i \langle O_i(x(t_i))\rangle, \]
and this is incredibly boring. When we work with higher dimensional universes, the corresponding result shows that if there are no derivative terms in the action, then events at one position in spacetime have nothing to do with events at any other position.
We have already seen this phenomenon when we did quantum field theory with perturbation theory: the interactions between different times and places are given by propagators, and these propagators arise from the kinetic terms in the Lagrangian.
Derivative terms
We now move on to consider more general functions of the field and its derivative. Consider operators $O_i(x, \dot x, \cdots)$. We might expect that the value of this operator is related to path integrals of the form
\[ \int \mathcal{D}x\; O_1(x, \dot x, \cdots)|_{t_1}\, O_2(x, \dot x, \cdots)|_{t_2}\, e^{-S[x]}. \]
But this can’t be right. We were told that one of the most important properties of quantum mechanics is that operators do not commute. In particular, for $p_i = \dot x_i$, we had the renowned commutator relation
\[ [\hat x^i, \hat p_j] = \delta^i_j. \]
But in this path integral formulation, we feed in functions to the path integral, and it knows nothing about how we order $x$ and $\dot x$ in the operators $O_i$. So what can we do?
The answer to this is really really important. The answer is that path
integrals don’t work.
The path integral measure
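Recall that to express our correlation functions as path integrals, we had to take the limits
\[ \mathcal{D}x \overset{?}{=} \lim_{N \to \infty}\frac{1}{(2\pi\Delta t)^{nN/2}}\prod_{i=1}^{N-1} d^nx_i\; a(x_i), \]
and also
\[ S[x] \overset{?}{=} \lim_{N \to \infty}\sum_{n=1}^{N-1}\frac{1}{2}\left(\frac{x_{n+1} - x_n}{\Delta t}\right)^2\Delta t. \]
Do these actually make sense?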
What we are trying to do with these expressions is to regularize our path integral, i.e. find a finite-dimensional approximation of the path integral. For quantum field theory in higher dimensions, this is essentially a lattice regularization.
Before we move on and try to see if this makes sense, we look at another way of regularizing our path integral. To do so, we decompose our field into Fourier modes:
\[ x^a(t) = \sum_{k \in \mathbb{Z}} x^a_k e^{-2\pi ikt/T}, \]
and then we can obtain a regularized form of the action as
\[ S_N[x] = \sum_{k=-N}^{N}\frac{1}{2}k^2\bar x^a_k x^a_k. \]
Under this decomposition, we can take the regularized path integral measure to be
\[ \mathcal{D}_Nx = \prod_{|k| \leq N} d^nx_k. \]
This is analogous to high-energy cutoff regularization. Now the natural question to ask is whether the limits
\[ \lim_{N \to \infty}\int \mathcal{D}_Nx, \quad \lim_{N \to \infty} S_N[x] \]
exist.
The answer is, again: NO! This is in fact not a problem of us not being able
to produce limits well. It is a general fact of life that we cannot have a Lebesgue
measure on an infinite dimensional inner product space (i.e. vector space with
an inner product).
Recall the following definition:
Definition (Lebesgue measure). A Lebesgue measure $d\mu$ on an inner product space $V$ obeys the following properties:
- For all non-empty open subsets $U \subseteq V$, we have
\[ \operatorname{vol}(U) = \int_U d\mu > 0. \]
- If $U'$ is obtained by translating $U$, then
\[ \operatorname{vol}(U') = \operatorname{vol}(U). \]
- Every $x \in V$ is contained in at least one open neighbourhood $U_x$ with finite volume.
Note that we don’t really need the inner product structure in this definition. We just need it to tell us what the word “open” means.
We now prove that there cannot be any Lebesgue measure on an infinite
dimensional inner product space.
We first consider the case of a finite-dimensional inner product space. Any such inner product space is isomorphic to $\mathbb{R}^D$ for some $D$. Write $C(L)$ for an open hypercube of side length $L$. By translation invariance, the volume of any two such hypercubes would be the same.
Now we note that given any such hypercube, we can cut it up into $2^D$ hypercubes of side length $L/2$.
Then since $C(L)$ contains $2^D$ disjoint copies of $C(L/2)$ (note that it is not exactly the union of them, since we are missing some boundary points), we know that
\[ \operatorname{vol}(C(L)) \geq \sum_{i=1}^{2^D}\operatorname{vol}(C(L/2)) = 2^D\operatorname{vol}(C(L/2)). \]
Now in the case of an infinite dimensional vector space, $C(L)$ will contain infinitely many disjoint copies of $C(L/2)$. So since $\operatorname{vol}(C(L/2))$ must be non-zero, as it is open, we know $\operatorname{vol}(C(L))$ must be infinite, and this is true for any $L$. Since any open set must contain some open hypercube, it follows that all open sets have infinite measure. This contradicts the requirement that every point has a neighbourhood of finite volume, and we are dead.
Theorem.
There are no Lebesgue measures on an infinite dimensional inner
product space.
This means whenever we do path integrals, we need to understand that we are not actually doing an integral in the usual sense, but we are just using a shorthand for the limit of the discretized integral
\[ \lim_{N \to \infty}\left(\frac{1}{2\pi\Delta t}\right)^{nN/2}\int \prod_{i=1}^{N-1} d^nx_i\; \exp\left(-\frac{1}{2}\left(\frac{|x_{i+1} - x_i|}{\Delta t}\right)^2\Delta t\right) \]
as a whole. In particular, we cannot expect the familiar properties of integrals to always hold for our path integrals.
If we just forget about this problem and start to do path integrals, then we
would essentially be writing down nonsense. We can follow perfectly logical steps
and prove things, but the output will still be nonsense. Then we would have
to try to invent some new nonsense to make sense of the nonsense. This was,
in fact, how renormalization was invented! But as we will see, that is not what
renormalization really is about.
Note that we showed that the measure $\mathcal{D}x$ doesn’t exist, but what we really needed wasn’t $\mathcal{D}x$. What we really needed was
\[ \int \mathcal{D}x\; e^{-S[x]}. \]
This is no longer translation invariant, so it is conceivable that it exists. Indeed,
in the case of a 1D quantum field theory, it does, and is known as the Wiener
measure.
In higher dimensions, we are less certain. We know it doesn’t exist for QED,
and we believe it does not exist for the standard model. However, we believe
that it does exist for Yang–Mills theory in four dimensions.
Non-commutativity in QM
Now we know that the path integral measure doesn’t exist, and this will solve our problem with non-commutativity. Indeed, as we analyze the discretization of the path integral, the fact that
\[ [\hat x, \hat p] \neq 0 \]
will fall out naturally.
Again, consider a free theory, and pick times
\[ T > t_+ > t > t_- > 0. \]
We will consider
\begin{align*}
\int \mathcal{D}x\; x(t)\dot x(t_-)e^{-S} &= \langle y_1|e^{-H(T - t)}\hat x e^{-H(t - t_-)}\hat p e^{-Ht_-}|y_0\rangle, \\
\int \mathcal{D}x\; x(t)\dot x(t_+)e^{-S} &= \langle y_1|e^{-H(T - t_+)}\hat p e^{-H(t_+ - t)}\hat x e^{-Ht}|y_0\rangle.
\end{align*}
As we take the limit $t_\pm \to t$, the difference of the right hand sides becomes
\[ \langle y_1|e^{-H(T - t)}[\hat x, \hat p]e^{-Ht}|y_0\rangle = \langle y_1|e^{-HT}|y_0\rangle \neq 0. \]
On the other hand, in the continuum path integral, the limit seems to give the
same expression in both cases, and the difference vanishes, naively. The problem
is that we need to regularize. We cannot just bring two operators together in
time and expect it to behave well. We saw that the path integral was sensitive to
the time-ordering of the operators, so we expect something “discrete” to happen
when the times cross. This is just like in perturbation theory, when we bring two
events together in time, we have to worry that the propagators become singular.
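Normally, we would have something like
\[ x(t)\dot x(t_-) - x(t)\dot x(t_+) = x_t\left(\frac{x_{t_-} - x_{t_- - \delta t}}{\delta t}\right) - x_t\left(\frac{x_{t_+ + \delta t} - x_{t_+}}{\delta t}\right). \]
In the regularized integral, we can keep increasing $t_-$ and decreasing $t_+$, until we get to the point
\[ x_t\left(\frac{x_t - x_{t - \Delta t}}{\Delta t}\right) - x_t\left(\frac{x_{t + \Delta t} - x_t}{\Delta t}\right). \]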
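Now that $t_\pm$ have hit $t$, we need to look carefully at what happens to the individual heat kernels. In general, we stop taking the limit as soon as any part of the discretized derivative touches $x_t$. The part of the integral that depends on $x_t$ looks like
\[ \int d^nx_t\; K_{\Delta t}(x_{t + \Delta t}, x_t)\, x_t\left(\frac{x_t - x_{t - \Delta t}}{\Delta t} - \frac{x_{t + \Delta t} - x_t}{\Delta t}\right)K_{\Delta t}(x_t, x_{t - \Delta t}). \]
Using the fact that
\[ K_{\Delta t}(x_t, x_{t - \Delta t}) \sim \exp\left(-\frac{(x_t - x_{t - \Delta t})^2}{2\Delta t}\right), \]
we can write the integral as
\[ -\int d^nx_t\; x_t\frac{\partial}{\partial x_t}\Big(K_{\Delta t}(x_{t + \Delta t}, x_t)K_{\Delta t}(x_t, x_{t - \Delta t})\Big). \]
Now integrating by parts, we get that this is equal to
\[ \int d^nx_t\; K_{\Delta t}(x_{t + \Delta t}, x_t)K_{\Delta t}(x_t, x_{t - \Delta t}) = K_{2\Delta t}(x_{t + \Delta t}, x_{t - \Delta t}). \]
So we get the same as in the operator approach.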
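The integration-by-parts step can be confirmed numerically in one dimension. The sketch below is an illustration only: it uses the flat heat kernel with $n = 1$, and the time step and end points are arbitrary sample values (assumptions). It evaluates the integral containing the difference of the two discretized derivatives and compares it with $K_{2\Delta t}$.

```python
import math

def heat_kernel(t, y, x):
    """Flat one-dimensional heat kernel K_t(y, x)."""
    return math.exp(-(x - y) ** 2 / (2 * t)) / math.sqrt(2 * math.pi * t)

def commutator_integral(dt, x_plus, x_minus, half_width=10.0, n=4000):
    """Integrate K(x_plus, x_t) * x_t * (backward - forward difference) * K(x_t, x_minus) over x_t."""
    h = 2 * half_width / n
    total = 0.0
    for i in range(n + 1):
        xt = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        bracket = (xt - x_minus) / dt - (x_plus - xt) / dt
        total += (weight * heat_kernel(dt, x_plus, xt) * xt * bracket
                  * heat_kernel(dt, xt, x_minus))
    return total * h

# Sample values, chosen arbitrarily for the check.
lhs = commutator_integral(0.2, 0.4, -0.3)
rhs = heat_kernel(2 * 0.2, 0.4, -0.3)
print(lhs, rhs)
```

The result does not vanish as the naive continuum argument would suggest. It equals the plain propagator over the interval $2\Delta t$, which is the path integral counterpart of $[\hat x, \hat p] = 1$.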
2.2 Feynman rules
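Consider a theory with a single field $x: S^1 \to \mathbb{R}$, and action
\[ S[x] = \int_{S^1} dt\left(\frac{1}{2}\dot x^2 + \frac{1}{2}m^2x^2 + \frac{\lambda}{4!}x^4\right). \]
We pick $S^1$ as our universe so that we don’t have to worry about boundary conditions. Then the path integral for the partition function is
\begin{align*}
\frac{Z}{Z_0} &= \frac{1}{Z_0}\int_{S^1} \mathcal{D}x\; e^{-S[x]} \\
&\sim \frac{1}{Z_0}\sum_{n=0}^{N}\int \prod_{i=1}^n dt_i \int_{S^1} \mathcal{D}x\; e^{-S_{\mathrm{free}}[x]}\frac{(-\lambda)^n}{(4!)^n n!}\prod_{i=1}^n x(t_i)^4 \\
&= \sum_{n=0}^{N}\int \prod_{i=1}^n dt_i\; \frac{(-\lambda)^n}{(4!)^n n!}\left\langle \prod_{i=1}^n x(t_i)^4\right\rangle_{\mathrm{free}}.
\end{align*}
So we have again reduced the problem to computing the correlators of the free theory.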
Instead of trying to compute these correlators directly, we move to momentum space. We write
\[ x(t) = \sum_{k \in \mathbb{Z}} x_k e^{-ikt}. \]
For the sake of brevity (or rather, laziness of the author), we shall omit all factors of $2\pi$. Using orthogonality relations, we have
\[ S[x] = \sum_{k \in \mathbb{Z}}\frac{1}{2}(k^2 + m^2)|x_k|^2 + \sum_{k_1, k_2, k_3, k_4 \in \mathbb{Z}}\delta(k_1 + k_2 + k_3 + k_4)\frac{\lambda}{4!}x_{k_1}x_{k_2}x_{k_3}x_{k_4}. \]
Note that here the $\delta$ is the “discrete” version, so it is 1 if the argument vanishes, and 0 otherwise.
Thus, we may equivalently represent
\[ \frac{Z}{Z_0} \sim \sum_{n=0}^{N}\sum_{\{k_j^{(i)}\}}\frac{(-\lambda)^n}{(4!)^n n!}\prod_{i=1}^n \delta\left(k_1^{(i)} + k_2^{(i)} + k_3^{(i)} + k_4^{(i)}\right)\left\langle \prod_{i=1}^n\prod_{j=1}^4 x_{k_j^{(i)}}\right\rangle_{\mathrm{free}}. \]
This time, we are summing over momentum-space correlators. But in momentum space, the free part of the action just looks like countably many free, decoupled 0-dimensional fields! Moreover, each correlator involves only finitely many of these fields. So we can reuse our results for the 0-dimensional field, i.e. we can compute these correlators using Feynman diagrams! This time, the propagators have value $\frac{1}{k^2 + m^2}$.
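If we are working over a non-compact space, then we have a Fourier transform instead of a Fourier series, and we get
\[ \frac{Z}{Z_0} \sim \sum_{n=0}^{N}\int \prod_{i=1}^n\prod_{j=1}^4 dk_j^{(i)}\; \frac{(-\lambda)^n}{(4!)^n n!}\prod_{i=1}^n \delta\left(k_1^{(i)} + k_2^{(i)} + k_3^{(i)} + k_4^{(i)}\right)\left\langle \prod_{i=1}^n\prod_{j=1}^4 x(k_j^{(i)})\right\rangle_{\mathrm{free}}. \]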
2.3 Effective quantum field theory
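We now see what happens when we try to obtain effective field theories in 1 dimension. Suppose we have two real-valued fields $x, y: S^1 \to \mathbb{R}$. We pick the circle as our universe so that we won’t have to worry about boundary conditions. We pick the action
\[ S[x, y] = \int_{S^1}\left(\frac{1}{2}\dot x^2 + \frac{1}{2}\dot y^2 + \frac{1}{2}m^2x^2 + \frac{1}{2}M^2y^2 + \frac{\lambda}{4}x^2y^2\right)dt. \]
As in the zero-dimensional case, we have Feynman rules: the $x$ propagator has value $1/(k^2 + m^2)$, the $y$ propagator has value $1/(k^2 + M^2)$, and the vertex joining two $x$ lines and two $y$ lines has value $-\lambda$.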
As in the case of zero-dimensional QFT, if we are only interested in the correlations involving $x(t)$, then we can integrate out the field $y(t)$ first. The $y$-dependent part of the path integral can be written as
\[ \int \mathcal{D}y\; \exp\left(-\frac{1}{2}\int_{S^1} y\left(-\frac{d^2}{dt^2} + M^2 + \frac{\lambda x^2}{2}\right)y\; dt\right), \]
where we integrated by parts to turn $\dot y^2$ into $-y\ddot y$.
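We start doing dubious things. Recall that we previously found that for a bilinear form $M: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, we have
\[ \int_{\mathbb{R}^n} d^nx\; \exp\left(-\frac{1}{2}M(x, x)\right) = \frac{(2\pi)^{n/2}}{\sqrt{\det M}}. \]
Now, we can view our previous integral just as a Gaussian integral over the operator
\[ (y, \tilde y) \mapsto \int_{S^1} y\left(-\frac{d^2}{dt^2} + M^2 + \frac{\lambda x^2}{2}\right)\tilde y\; dt \tag{$*$} \]
on the vector space of fields. Thus, (ignoring the factors of $(2\pi)^{n/2}$) we can formally write the integral as
\[ \det\left(-\frac{d^2}{dt^2} + M^2 + \frac{\lambda x^2}{2}\right)^{-1/2}. \]
$S_{\mathrm{eff}}[x]$ thus looks like
\[ S_{\mathrm{eff}}[x] = \int_{S^1}\frac{1}{2}(\dot x^2 + m^2x^2)\; dt + \frac{1}{2}\log\det\left(-\frac{d^2}{dt^2} + M^2 + \frac{\lambda x^2}{2}\right). \]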
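We now continue with our formal manipulations. Note that $\log\det = \operatorname{tr}\log$, since $\det$ is the product of eigenvalues and $\operatorname{tr}$ is the sum of them. Then if we factor our operators as
\[ -\frac{d^2}{dt^2} + M^2 + \frac{\lambda x^2}{2} = \left(-\frac{d^2}{dt^2} + M^2\right)\left(1 - \lambda\left(\frac{d^2}{dt^2} - M^2\right)^{-1}\frac{x^2}{2}\right), \]
then we can write the last term in the effective potential as
\[ \frac{1}{2}\operatorname{tr}\log\left(-\frac{d^2}{dt^2} + M^2\right) + \frac{1}{2}\operatorname{tr}\log\left(1 - \lambda\left(\frac{d^2}{dt^2} - M^2\right)^{-1}\frac{x^2}{2}\right). \]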
The first term is field independent, so we might as well drop it. We now look
carefully at the second term. The next dodgy step to take is to realize we know
what the inverse of the differential operator
\[
  \frac{d^2}{dt^2} - M^2
\]
is. It is just the Green’s function! More precisely, it is the convolution with the
Green’s function. In other words, it is given by the function $G(t, t')$ such that
\[
  \left( \frac{d^2}{dt^2} - M^2 \right) G(t, t') = \delta(t - t').
\]
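Up to an overall sign, this is the propagator of the $y$ field. If we actually try
to solve this on a circle of circumference $T$, we find that we have
\[
  G(t, t') = -\frac{1}{2M} \sum_{n \in \mathbb{Z}} \exp\left( -M \left| t - t' + nT \right| \right).
\]
We don’t actually need this formula. The part that will be important is that its
magnitude is $\sim \frac{1}{M}$.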
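The claim that the Green’s function is of order $1/M$ can be checked numerically. The sketch below assumes that $T$ denotes the circumference of the circle and works with the magnitude of the sum over images, $\frac{1}{2M} \sum_n \exp(-M|s + nT|)$ with $s = t - t'$. It compares a truncated sum with the closed form $\cosh(M(s - T/2)) / (2M \sinh(MT/2))$, valid for $0 \leq s \leq T$, which follows from summing two geometric series. The function names and parameter values are illustrative.

```python
import math


def image_sum(s, mass, circumference, n_max=200):
    """Truncated sum over images: (1/2M) * sum_n exp(-M |s + n T|)."""
    total = sum(math.exp(-mass * abs(s + n * circumference))
                for n in range(-n_max, n_max + 1))
    return total / (2.0 * mass)


def closed_form(s, mass, circumference):
    """cosh(M (s - T/2)) / (2 M sinh(M T / 2)), valid for 0 <= s <= T."""
    return (math.cosh(mass * (s - circumference / 2.0))
            / (2.0 * mass * math.sinh(mass * circumference / 2.0)))


for s in (0.0, 0.3, 1.0, 1.7):
    print(s, image_sum(s, 3.0, 2.0), closed_form(s, 3.0, 2.0))

# For large M T the coincident-point value approaches 1 / (2M).
print(2.0 * 50.0 * image_sum(0.0, 50.0, 2.0))
```

For large $MT$ only the $n = 0$ image matters at coincident points, which is why the last printed number is essentially $1$.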
We now try to evaluate the effective potential. When we expand
\[
  \log\left( 1 - \lambda G(t, t') \frac{x^2}{2} \right),
\]
the first term in the expansion is
\[
  -\lambda G(t, t') \frac{x^2}{2}.
\]
What does it mean to take the trace of this? We pick a basis for the space we
are working on, say $\{\delta(t - t_0) : t_0 \in S^1\}$. Then the trace is given by
\[
  \int_{t_0 \in S^1} dt_0 \int_{t \in S^1} dt\, \delta(t - t_0) \left( \int_{t' \in S^1} dt'\, (-\lambda) G(t, t') \frac{x^2(t')}{2} \delta(t' - t_0) \right).
\]
We can dissect this slowly. The rightmost integral is nothing but the definition
of how $G$ acts by convolution. Then the next $t$ integral is the definition of how
bilinear forms act, as in ($*$). Finally, the integral over $t_0$ is summing over all
basis vectors, which is what gives us the trace. This simplifies rather significantly
to
\[
  -\frac{\lambda}{2} \int_{t \in S^1} G(t, t) x^2(t)\, dt.
\]
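In general, we find that we obtain
\[
  \operatorname{tr}\log\left( 1 - \lambda G(t, t') \frac{x^2}{2} \right)
  = -\frac{\lambda}{2} \int_{S^1} G(t, t) x^2(t)\, dt
  - \frac{\lambda^2}{8} \int_{S^1 \times S^1} dt\, dt'\, G(t', t) x^2(t) G(t, t') x^2(t') - \cdots
\]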
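The structure of this expansion is just $\operatorname{tr}\log(1 - A) = -\operatorname{tr} A - \frac{1}{2}\operatorname{tr} A^2 - \cdots$, together with $\log\det = \operatorname{tr}\log$. A finite-dimensional sketch in Python makes this concrete; here a small $2 \times 2$ matrix with made-up entries stands in for the operator $\lambda G x^2/2$, so this illustrates the algebra only, not the actual functional trace.

```python
import math


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def tr_log_one_minus(a, order):
    """Partial sum of tr log(1 - a) = -sum_{k >= 1} tr(a^k) / k up to k = order."""
    power = [row[:] for row in a]
    total = 0.0
    for k in range(1, order + 1):
        total -= trace(power) / k
        power = matmul(power, a)
    return total


# A small 2x2 matrix standing in for lambda * G * x^2 / 2 (hypothetical numbers).
A = [[0.10, 0.05], [0.02, -0.08]]
det_one_minus_a = (1 - A[0][0]) * (1 - A[1][1]) - A[0][1] * A[1][0]
log_det = math.log(det_one_minus_a)

two_terms = tr_log_one_minus(A, 2)     # -tr A - tr(A^2)/2
many_terms = tr_log_one_minus(A, 30)
print(log_det, two_terms, many_terms)
```

Keeping only the first two terms already reproduces $\log\det(1 - A)$ to a few parts in $10^4$ for this small matrix, mirroring the truncation at order $\lambda^2$ above.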
These terms in the effective field theory are non-local! They involve integrating
over many different points in $S^1$. In fact, we should have expected this
non-locality from the corresponding Feynman diagrams. The first term corresponds
to
[Diagram: a single vertex with two external legs, both labelled $x(t)$.]
Here $G(t, t)$ corresponds to the $y$ field propagator, and the $-\frac{\lambda}{2}$ comes from the
vertex.
The second diagram we have looks like this:
[Diagram: two vertices joined by propagators, with two external legs labelled $x(t)$ and two labelled $x(t')$.]
We see that the first diagram is local, as there is just one vertex at time $t$. But
in the second diagram, we use the propagators to allow the $x$ at time $t$ to talk
to $x$ at time $t'$. This is non-local!
Non-locality is generic. Whenever we integrate out our fields, we get non-local
terms. But non-locality is terrible in physics. It means that the equations of
motion we get, even in the classical limit, are going to be integro-differential
equations, not just ordinary differential equations. For a particle to figure out
what it should do here, it needs to know what is happening in the far side of the
universe!
To make progress, we note that if $M$ is very large, then we would expect
$G(t, t')$ to be highly suppressed for $t \neq t'$. So we can try to expand around
$t = t'$. Recall that the second term is given by
\[
  \int dt\, dt'\, G(t, t')^2 x^2(t) x^2(t').
\]
We can write out $x^2(t')$ as
\[
  x^2(t') = x^2(t) + 2 x(t) \dot{x}(t) (t' - t) + \left( \dot{x}^2(t) + x(t) \ddot{x}(t) \right) (t - t')^2 + \cdots.
\]
Using the fact that $G(t, t')$ depends on $t'$ only through $M(t' - t)$, by dimensional
analysis, we get an expansion that looks like
\[
  \frac{1}{M^2} \int dt \left[ \frac{\alpha}{M} x^4(t) + \frac{\beta}{M^3} \left( x^2 \dot{x}^2 + x^3 \ddot{x} \right) + \frac{\gamma}{M^5} (\text{4-derivative terms}) + \cdots \right].
\]
Here $\alpha, \beta, \gamma$ are dimensionless quantities.
Thus, we know that every extra derivative is accompanied by a further power
of $\frac{1}{M}$. Thus, provided $x(t)$ is slowly varying on scales of order $\frac{1}{M}$, we may hope
to truncate the series.
Thus, at energies $E \ll M$, our theory looks approximately local. So as long
as we only use our low-energy approximation to answer low-energy questions, we
are fine. However, if we try to take our low-energy theory and try to extrapolate
it to higher and higher energies, up to $E \sim M$, it is going to be nonsense. In
particular, it becomes non-unitary, and probability is not preserved.
This makes sense. By truncating the series at the first term, we are ignoring
all the higher interactions governed by the
y
fields. By ignoring them, we are
ignoring some events that have non-zero probability of happening, and thus we
would expect probability not to be conserved.
There are two famous examples of this. The first is weak interactions. At
very low energies, weak interactions are responsible for $\beta$-decay. The effective
action contains a quartic interaction
\[
  \int d^4 x\, \bar{\psi}_e\, n\, \nu_e\, p\, G_{\text{weak}}.
\]
This coupling constant $G_{\text{weak}}$ has mass dimension $-2$. At low energies, this is
a perfectly good description of beta decay. However, this is suspicious. The fact
that we have a coupling constant with negative mass dimension suggests this
came from integrating some fields out.
At high energies, we find that this 4-Fermi theory becomes non-unitary, and
$G_{\text{weak}}$ is revealed as an approximation to a $W$-boson propagator. Instead of an
interaction that looks like this:
[Diagram: four fermion lines meeting at a single point.]
what we really have is
[Diagram: two fermion vertices joined by an internal $W$ line.]
There are many other theories we can write down whose couplings have negative
mass dimension, the most famous one being general relativity.
2.4 Quantum gravity in one dimension
In quantum gravity, we also include a (path) integral over all metrics on our
spacetime, up to diffeomorphism (isometric) invariance. We also sum over all
possible topologies of $M$. In $d = 1$ (and $d = 2$ for string theory), we can just do
this.
In $d = 1$, a metric $g$ only has one component $g_{tt}(t) = e(t)$. There is no
curvature, and the only diffeomorphism invariant of this metric is the total
length
\[
  T = \oint e(t)\, dt.
\]
So the instruction to integrate over all metrics modulo diffeomorphism is just
the instruction to integrate over all possible lengths of the worldline $T \in (0, \infty)$,
which is easy. Let’s look at that.
The path integral is given by
\[
  \int_T dT \int_{C_{[0,T]}[y,x]} \mathcal{D}x\, e^{-S[x]},
\]
where as usual
\[
  S[x] = \frac{1}{2} \int_0^T \dot{x}^2\, dt.
\]
Just for fun, we will include a “cosmological constant” term into our action, so
that we instead have
\[
  S[x] = \int_0^T \left( \frac{1}{2} \dot{x}^2 + \frac{m^2}{2} \right) dt.
\]
The reason for this will be revealed soon.
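We can think of the path integral as the heat kernel, so we can write it as
\begin{align*}
  \int_0^\infty dT\, \langle y | e^{-HT} | x \rangle
  &= \int_0^\infty dT \int \frac{d^n p\, d^n q}{(2\pi)^n}\, \langle y | q \rangle \langle q | e^{-HT} | p \rangle \langle p | x \rangle \\
  &= \int_0^\infty dT \int \frac{d^n p\, d^n q}{(2\pi)^n}\, e^{ip \cdot x - iq \cdot y} e^{-T(p^2 + m^2)/2}\, \delta^n(p - q) \\
  &= \int_0^\infty dT \int \frac{d^n p}{(2\pi)^n}\, e^{ip \cdot (x - y)} e^{-T(p^2 + m^2)/2} \\
  &= 2 \int \frac{d^n p}{(2\pi)^n}\, \frac{e^{ip \cdot (x - y)}}{p^2 + m^2} = 2D(x, y),
\end{align*}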
where $D(x, y)$ is the propagator for a scalar field on the target space $\mathbb{R}^n$ with
action
\[
  S[\Phi] = \int d^n x \left( \frac{1}{2} (\partial\Phi)^2 + \frac{m^2}{2} \Phi^2 \right).
\]
So a 1-dimensional quantum gravity theory with values in $\mathbb{R}^n$ is equivalent to
(or at least has deep connections to) a scalar field theory on $\mathbb{R}^n$.
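The only non-trivial step in turning the heat kernel into a propagator is the integral over the length $T$, namely $\int_0^\infty dT\, e^{-T(p^2 + m^2)/2} = 2/(p^2 + m^2)$. The following Python sketch checks this numerically with a simple midpoint rule; the values of $p^2$ and $m^2$, the upper cut-off and the step count are arbitrary illustrative choices.

```python
import math


def schwinger_integral(p_squared, m_squared, t_max=100.0, steps=100000):
    """Midpoint-rule estimate of the integral of exp(-T (p^2 + m^2) / 2)
    over T in [0, t_max]."""
    rate = 0.5 * (p_squared + m_squared)
    h = t_max / steps
    return h * sum(math.exp(-rate * (k + 0.5) * h) for k in range(steps))


p_squared, m_squared = 1.3, 0.7          # arbitrary illustrative values
numeric = schwinger_integral(p_squared, m_squared)
exact = 2.0 / (p_squared + m_squared)
print(numeric, exact)
```

The factor of $2$ in $2D(x, y)$ is exactly the $2$ appearing in this integral, which comes from the $\frac{1}{2}$ in the exponent.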
How about interactions? So far, we have been taking rather unexciting
1-dimensional manifolds as our universe, and there are only two possible choices.
If we allow singularities in our manifolds, then we would allow graphs instead of
just a line and a circle. Quantum gravity then says we should not only integrate
over all possible lengths, but also all possible graphs.
For example, to compute correlation functions such as $\langle \Phi(x_1) \cdots \Phi(x_n) \rangle$ in
$\Phi^4$ theory on $\mathbb{R}^n$, say, we consider all 4-valent graphs with $n$ external legs with one
endpoint at each of the $x_i$, and then we proceed just as in quantum gravity.
For example, we get a contribution to $\langle \Phi(x) \Phi(y) \rangle$ from the graph
[Diagram: a line from $x$ to $y$ with a single loop attached at an internal vertex.]
The contribution to the quantum gravity expression is
\[
  \int_{z \in \mathbb{R}^n} d^n z \int_{[0,\infty)^3} dT_1\, dT_2\, dT_3
  \int_{C_{T_1}[z,x]} \mathcal{D}x\, e^{-S_{T_1}[x]}
  \int_{C_{T_2}[z,z]} \mathcal{D}x\, e^{-S_{T_2}[x]}
  \int_{C_{T_3}[y,z]} \mathcal{D}x\, e^{-S_{T_3}[x]},
\]
where
\[
  S_T[x] = \frac{1}{2} \int_0^T \dot{x}^2\, dt + \frac{m^2}{2} \int_0^T dt.
\]
We should think of the second term as the “cosmological constant”, while the
1D integrals over the $T_i$ are the “1d quantum gravity” part of the path integral
(also known as the Schwinger parameters for the graph).
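We can write this as
\[
  \int d^n z\, dT_1\, dT_2\, dT_3\, \langle z | e^{-HT_1} | x \rangle \langle z | e^{-HT_2} | z \rangle \langle y | e^{-HT_3} | z \rangle.
\]
Inserting a complete set of eigenstates between the position states and the time
evolution operators, and doing the $T_i$ integrals, we get (up to overall numerical
factors)
\begin{align*}
  &\int d^n z \int \frac{d^n p\, d^n \ell\, d^n q}{(2\pi)^{3n}}\, \frac{e^{ip \cdot (x - z)}}{p^2 + m^2}\, \frac{e^{iq \cdot (y - z)}}{q^2 + m^2}\, \frac{e^{i\ell \cdot (z - z)}}{\ell^2 + m^2} \\
  &\qquad = \int \frac{d^n p\, d^n \ell}{(2\pi)^{2n}}\, \frac{e^{ip \cdot (x - y)}}{(p^2 + m^2)^2}\, \frac{1}{\ell^2 + m^2},
\end{align*}
where the $z$ integral sets $q = -p$.
This is exactly what we would have expected if we viewed the above diagram as
a Feynman diagram:
[Diagram: a line from $x$ to $y$ carrying momentum $p$ on both segments, with a loop carrying momentum $\ell$.]
This is the worldline perspective on QFT, and it was indeed Feynman’s original
approach to doing QFT.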
3 Symmetries of the path integral
From now on, we will work with quantum field theory in general, and impose no
restrictions on the dimension of our universe. The first subject to study is the
notion of symmetries.
We first review what we had in classical field theory. In classical field theory,
Noether’s theorem relates symmetries to conservation laws. For simplicity, we
will work with the case of a flat space.
Suppose we had a variation
\[
  \delta\phi = \varepsilon f(\phi, \partial\phi)
\]
of the field. The most common case is when $f(\phi, \partial\phi)$ depends on $\phi$ only locally,
in which case we can think of the transformation as being generated by the
vector
\[
  V_f = \int_M d^d x\, f(\phi, \partial\phi) \frac{\delta}{\delta\phi(x)}
\]
acting on the “space of fields”.
If the function $S[\phi]$ is invariant under $V_f$ when $\varepsilon$ is constant, then for general
$\varepsilon(x)$, we must have
\[
  \delta S = \int d^d x\, j^\mu(x)\, \partial_\mu \varepsilon
\]
for some field-dependent current $j^\mu(x)$ (we can actually find an explicit expression
for $j^\mu$). If we choose $\varepsilon(x)$ to have compact support, then we can integrate by parts
and write
\[
  \delta S = -\int d^d x\, (\partial_\mu j^\mu)\, \varepsilon(x).
\]
On solutions of the field equation, we know the action is stationary under
arbitrary variations. So $\delta S = 0$. Since $\varepsilon(x)$ was arbitrary, we must have
\[
  \partial_\mu j^\mu = 0.
\]
So we know that $j^\mu$ is a conserved current.
Given any such conserved current, we can define the charge $Q[N]$ associated
to an (oriented) co-dimension 1 hypersurface $N$ as
\[
  Q[N] = \int_N n_\mu j^\mu\, d^{d-1} x,
\]
where $n^\mu$ is the normal vector to $N$. Usually, $N$ is a time slice, and the normal
points in the future direction.
Now if $N_0$ and $N_1$ are two such hypersurfaces bounding a region $M' \subseteq M$,
then by Stokes’ theorem, we have
\[
  Q[N_0] - Q[N_1] = \left( \int_{N_0} - \int_{N_1} \right) n_\mu j^\mu\, d^{d-1} x = \int_{M'} (\partial_\mu j^\mu)\, d^d x = 0.
\]
So we find
\[
  Q[N_0] = Q[N_1].
\]
This is the conservation of charge!
3.1 Ward identities
The derivation of Noether’s theorem used the classical equation of motion. But in
a quantum theory, the equation of motion no longer holds. We must re-examine
what happens in the quantum theory.
Suppose a transformation $\phi \mapsto \phi'$ of the fields has the property that
\[
  \mathcal{D}\phi\, e^{-S[\phi]} = \mathcal{D}\phi'\, e^{-S[\phi']}.
\]
In theory, the whole expression $\mathcal{D}\phi\, e^{-S[\phi]}$ is what is important in the quantum
theory. In practice, we often look for symmetries where $\mathcal{D}\phi$ and $e^{-S[\phi]}$ are
separately invariant. In fact, what we will do is that we look for a symmetry
that preserves $S[\phi]$, and then try to find a regularization of $\mathcal{D}\phi$ that is manifestly
invariant under the symmetry.
Often, it is not the case that the obvious choice of regularization of $\mathcal{D}\phi$ is
manifestly invariant under the symmetry. For example, we might have an $S[\phi]$
that is rotationally invariant. However, if we regularize the path integral measure
by picking a lattice and sampling $\phi$ on different points, it is rarely the case that
this lattice is rotationally invariant.
In general, there are two possibilities:
(i)
The symmetry could be restored in the limit. This typically means there
exists a regularized path integral measure manifestly invariant under this
symmetry, but we just didn’t use it. For example, rotational invariance is
not manifestly present in lattice regularization, but it is when we do the
cut-off regularization.
(ii)
It could be that the symmetry is not restored. It is said to be anomalous,
i.e. broken in the quantum theory. In this case, there can be no invariant
path integral measure. An example is scale invariance in QED, if the mass
of the electron is 0.
Sometimes, it can be hard to tell. For now, let’s just assume we are in a situation
where $\mathcal{D}\phi = \mathcal{D}\phi'$ when $\varepsilon$ is constant. Then for any $\varepsilon(x)$, we clearly have
\[
  \mathcal{Z} = \int \mathcal{D}\phi\, e^{-S[\phi]} = \int \mathcal{D}\phi'\, e^{-S[\phi']},
\]
since this is just renaming of variables. But using the fact that the measure is
invariant when $\varepsilon$ is constant, we can expand the right-hand integral in $\varepsilon$,
and again argue that it must be of the form
\[
  \int \mathcal{D}\phi'\, e^{-S[\phi']} = \int \mathcal{D}\phi\, e^{-S[\phi]} \left( 1 - \int_M j^\mu \partial_\mu \varepsilon\, d^d x \right)
\]
to first order in $\varepsilon$. Note that in general, $j^\mu$ can receive contributions from $S[\phi]$
and $\mathcal{D}\phi$. But if it doesn’t receive any contribution from $\mathcal{D}\phi$, then it would just
be the classical current.
Using this expansion, we deduce that we must have
\[
  \int \mathcal{D}\phi\, e^{-S[\phi]} \int_M j^\mu \partial_\mu \varepsilon\, d^d x = 0.
\]
Integrating by parts, and using the definition of the expectation, we can write
this as
\[
  -\int_M \varepsilon\, \partial_\mu \langle j^\mu(x) \rangle\, d^d x = 0
\]
for any $\varepsilon$ with compact support. Note that we dropped a normalization factor
of $\mathcal{Z}$ in the definition of $\langle j^\mu(x) \rangle$, because $\mathcal{Z}$ times zero is still zero. So we know
that $\langle j^\mu(x) \rangle$ is a conserved current, just as we had classically.
Symmetries of correlation functions
Having a current is nice, but we want to say something about actual observable
quantities, i.e. we want to look at how symmetries manifest themselves in
correlation functions. Let’s look at what we might expect. For example, if our
theory is translation invariant, we might expect, say
\[
  \langle \phi(x) \phi(y) \rangle = \langle \phi(x - a) \phi(y - a) \rangle
\]
for any $a$. This is indeed the case.
Suppose we have an operator $\mathcal{O}(\phi)$. Then under a transformation $\phi \mapsto \phi'$,
our operator transforms as
\[
  \mathcal{O}(\phi) \mapsto \mathcal{O}(\phi').
\]
By definition, the correlation function is
\[
  \langle \mathcal{O}(\phi) \rangle = \frac{1}{\mathcal{Z}} \int \mathcal{D}\phi\, e^{-S[\phi]} \mathcal{O}(\phi).
\]
Note that despite the appearance of $\phi$ on the left, it is not a free variable. For
example, the correlation $\langle \phi(x_1) \phi(x_2) \rangle$ is not a function of $\phi$.
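We suppose the transformation $\phi \mapsto \phi'$ is a symmetry. By a trivial renaming
of variables, we have
\[
  \langle \mathcal{O}(\phi) \rangle = \frac{1}{\mathcal{Z}} \int \mathcal{D}\phi'\, e^{-S[\phi']} \mathcal{O}(\phi').
\]
By assumption, the function $\mathcal{D}\phi\, e^{-S[\phi]}$ is invariant under the transformation.
So this is equal to
\[
  \frac{1}{\mathcal{Z}} \int \mathcal{D}\phi\, e^{-S[\phi]} \mathcal{O}(\phi') = \langle \mathcal{O}(\phi') \rangle.
\]
This is, of course, not surprising. To make this slightly more concrete, we look
at an example.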
Example. Consider $(M, g) = (\mathbb{R}^4, \delta)$, and consider the spatial translation
$x \mapsto x' = x - a$ for a constant vector $a$. A scalar field $\phi$ transforms as
\[
  \phi(x) \mapsto \phi'(x) = \phi(x - a).
\]
In most cases, this is a symmetry.
We suppose $\mathcal{O}(\phi)$ can be written as
\[
  \mathcal{O}(\phi) = \mathcal{O}_1(\phi(x_1)) \cdots \mathcal{O}_n(\phi(x_n)),
\]
where $\mathcal{O}_i$ depends only on the value of $\phi$ at $x_i$. A canonical example is when
$\mathcal{O}(\phi) = \phi(x_1) \cdots \phi(x_n)$, which gives an $n$-point correlation function.
Then the above result tells us that
\[
  \langle \mathcal{O}_1(\phi(x_1)) \cdots \mathcal{O}_n(\phi(x_n)) \rangle = \langle \mathcal{O}_1(\phi(x_1 - a)) \cdots \mathcal{O}_n(\phi(x_n - a)) \rangle.
\]
So the correlation depends only on the separations $x_i - x_j$. We can obtain similar
results if the action and measure are rotationally or Lorentz invariant.
Example. Suppose we have a complex field $\phi$, and we have a transformation
$\phi \mapsto \phi' = e^{i\alpha} \phi$ for some constant $\alpha \in \mathbb{R}/2\pi\mathbb{Z}$. Then the conjugate field
transforms as
\[
  \bar{\phi} \mapsto \bar{\phi}' = e^{-i\alpha} \bar{\phi}.
\]
Suppose this transformation preserves the action and measure. For example,
the measure will be preserved if we integrate over the same number of $\phi$ and $\bar{\phi}$
modes. Consider the operators
\[
  \mathcal{O}_i(\phi, \bar{\phi}) = \phi(x_i)^{s_i} \bar{\phi}(x_i)^{r_i}.
\]
Then the operators transform as
\[
  \mathcal{O}_i(\phi, \bar{\phi}) \mapsto \mathcal{O}_i(\phi', \bar{\phi}') = e^{i\alpha(s_i - r_i)} \mathcal{O}_i(\phi, \bar{\phi}).
\]
So symmetry entails
\[
  \left\langle \prod_{i=1}^m \mathcal{O}_i(x_i) \right\rangle = \exp\left( i\alpha \sum_{i=1}^m (s_i - r_i) \right) \left\langle \prod_{i=1}^m \mathcal{O}_i(x_i) \right\rangle.
\]
Since this is true for all $\alpha$, the correlator must vanish unless
\[
  \sum_{i=1}^m r_i = \sum_{i=1}^m s_i.
\]
So we need the same number of $\phi$ and $\bar{\phi}$ insertions in total.
We can interpret this in terms of Feynman diagrams: each propagator joins
up a $\phi$ and a $\bar{\phi}$. So if we don’t have an equal number of $\phi$ and $\bar{\phi}$, then we can’t draw
any Feynman diagrams at all! So the correlator must vanish.
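A zero-dimensional toy version of this selection rule can be checked numerically. In the sketch below the "field" is a single complex number $z$ with weight $e^{-|z|^2}$, which is invariant under $z \mapsto e^{i\alpha} z$; the moment $\int d^2 z\, e^{-|z|^2} z^s \bar{z}^r$ then plays the role of the correlator. The toy weight, the grid sizes and the function name are assumptions made for illustration only.

```python
import cmath
import math


def moment(s, r, n_radial=1000, n_angular=32, rho_max=8.0):
    """Estimate the integral of exp(-|z|^2) z^s conj(z)^r over the complex plane,
    using polar coordinates with a midpoint rule in |z| and a uniform angular grid."""
    h = rho_max / n_radial
    d_theta = 2.0 * math.pi / n_angular
    total = 0j
    for a in range(n_radial):
        rho = (a + 0.5) * h
        weight = math.exp(-rho * rho) * rho * h * d_theta
        for b in range(n_angular):
            z = cmath.rect(rho, b * d_theta)
            total += weight * z ** s * z.conjugate() ** r
    return total


balanced = moment(1, 1)      # one z and one conj(z): invariant, survives
unbalanced = moment(2, 1)    # charge mismatch: picks up a phase, must vanish
print(balanced, unbalanced)
```

The balanced moment comes out as $\pi$, while the unbalanced one vanishes up to rounding, because the angular integral of $e^{i(s - r)\theta}$ is zero unless $s = r$.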
Ward identity for correlators
What we’ve done with correlators was rather expected, and in some sense trivial.
Let’s try to do something more interesting.
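Again, consider an operator that depends only on the value of $\phi$ at finitely
many points, say
\[
  \mathcal{O}(\phi) = \mathcal{O}_1(\phi(x_1)) \cdots \mathcal{O}_n(\phi(x_n)).
\]
As before, we will write the operators as $\mathcal{O}_i(x_i)$ when there is no confusion.
As in the derivation of Noether’s theorem, suppose we have an infinitesimal
transformation, with $\phi \mapsto \phi + \varepsilon\, \delta\phi$, that is a symmetry when $\varepsilon$ is constant. Then
for general $\varepsilon(x)$, we have
\begin{align*}
  &\int \mathcal{D}\phi\, e^{-S[\phi]} \mathcal{O}_1(\phi(x_1)) \cdots \mathcal{O}_n(\phi(x_n)) \\
  &\quad = \int \mathcal{D}\phi'\, e^{-S[\phi']} \mathcal{O}_1(\phi'(x_1)) \cdots \mathcal{O}_n(\phi'(x_n)) \\
  &\quad = \int \mathcal{D}\phi\, e^{-S[\phi]} \left( 1 - \int j^\mu(x)\, \partial_\mu \varepsilon(x)\, d^d x \right) \left( \mathcal{O}_1(\phi(x_1)) \cdots \mathcal{O}_n(\phi(x_n)) + \sum_{i=1}^n \varepsilon(x_i)\, \delta\mathcal{O}_i(x_i) \prod_{j \neq i} \mathcal{O}_j(x_j) \right),
\end{align*}
where
\[
  \delta\mathcal{O}(x_i) = \frac{\partial\mathcal{O}}{\partial\phi}\, \delta\phi.
\]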
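Again, the zeroth order piece of the correlation function cancels, and to lowest
non-trivial order, we find that we must have
\[
  \int \partial_\mu \varepsilon(x) \left\langle j^\mu(x) \prod_{i=1}^n \mathcal{O}_i(x_i) \right\rangle d^d x = \sum_{i=1}^n \varepsilon(x_i) \left\langle \delta\mathcal{O}_i(x_i) \prod_{j \neq i} \mathcal{O}_j(x_j) \right\rangle.
\]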
On the left, we can again integrate by parts to shift the derivative to the current
term. On the right, we want to write it in the form $\int \varepsilon(x) \cdots d^d x$, so that we
can get rid of the $\varepsilon$ term. To do so, we introduce some $\delta$-functions. Then we
obtain
\[
  \int \varepsilon(x)\, \partial_\mu \left\langle j^\mu(x) \prod_{i=1}^n \mathcal{O}_i(x_i) \right\rangle d^d x
  = -\sum_{i=1}^n \int \varepsilon(x)\, \delta^d(x - x_i) \left\langle \delta\mathcal{O}_i(x_i) \prod_{j \neq i} \mathcal{O}_j(x_j) \right\rangle d^d x.
\]
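Since this holds for arbitrary $\varepsilon(x)$ (with compact support), we must have
\[
  \partial_\mu \left\langle j^\mu(x) \prod_i \mathcal{O}_i(x_i) \right\rangle = -\sum_{i=1}^n \delta^d(x - x_i) \left\langle \delta\mathcal{O}_i(x_i) \prod_{j \neq i} \mathcal{O}_j(x_j) \right\rangle.
\]
This is the Ward identity for correlation functions. It says that the vector field
\[
  f^\mu(x, x_i) = \left\langle j^\mu(x) \prod_i \mathcal{O}_i(x_i) \right\rangle
\]
is divergence free except at the insertions $x_i$.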
This allows us to recover the previous invariance of correlations. Suppose $M$
is compact without boundary. We then integrate the Ward identity over all of $M$.
By Stokes’ theorem, we know the integral of any divergence term vanishes. So
we obtain
\[
  0 = \int_M \partial_\mu f^\mu(x, x_i)\, d^d x = -\sum_{i=1}^n \left\langle \delta\mathcal{O}_i(x_i) \prod_{j \neq i} \mathcal{O}_j(x_j) \right\rangle = -\delta \left\langle \prod_{i=1}^n \mathcal{O}_i(x_i) \right\rangle.
\]
Of course, this is what we would obtain if we set $\varepsilon(x) \equiv 1$ above.
That was nothing new, but suppose $M' \subseteq M$ is a region with boundary
$N_1 - N_0$.
[Diagram: a region $M'$ bounded by the hypersurfaces $N_0$ and $N_1$.]
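Let’s see what integrating the Ward identity over $M'$ gives us. The left hand
side gives us
\begin{align*}
  \int_{M'} \partial_\mu \left\langle j^\mu(x) \prod_{i=1}^n \mathcal{O}_i(x_i) \right\rangle d^d x
  &= \int_{N_1 - N_0} n_\mu \left\langle j^\mu(x) \prod_{i=1}^n \mathcal{O}_i(x_i) \right\rangle d^{d-1} x \\
  &= \left\langle Q[N_1] \prod_{i=1}^n \mathcal{O}_i(x_i) \right\rangle - \left\langle Q[N_0] \prod_{i=1}^n \mathcal{O}_i(x_i) \right\rangle.
\end{align*}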
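The right hand side of Ward’s identity just gives us the sum over all points inside
$M'$:
\[
  \left\langle Q[N_1] \prod_{i=1}^n \mathcal{O}_i(x_i) \right\rangle - \left\langle Q[N_0] \prod_{i=1}^n \mathcal{O}_i(x_i) \right\rangle = -\sum_{x_i \in M'} \left\langle \delta\mathcal{O}_i(x_i) \prod_{j \neq i} \mathcal{O}_j(x_j) \right\rangle.
\]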
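In particular, if $M'$ contains only one insertion point, say $x_1$, and we choose the region to
be infinitesimally thin, then in the canonical picture, we have
\[
  \langle \Omega | T [\hat{Q}, \hat{\mathcal{O}}_1(x_1)] \prod_{j=2}^n \hat{\mathcal{O}}_j(x_j) | \Omega \rangle = -\langle \Omega | T \left( \delta\hat{\mathcal{O}}_1 \prod_{j=2}^n \hat{\mathcal{O}}_j(x_j) \right) | \Omega \rangle,
\]
where $|\Omega\rangle$ is some (vacuum) state. So in the canonical picture, we find that
\[
  \delta\hat{\mathcal{O}} = -[\hat{Q}, \hat{\mathcal{O}}].
\]
So we see that the change of $\hat{\mathcal{O}}$ under some transformation is given (up to sign) by the
commutator with the charge operator.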
3.2 The Ward–Takahashi identity
We focus on an important example of this. The QED action is given by
\[
  S[A, \psi] = \int d^d x \left( \frac{1}{4} F_{\mu\nu} F^{\mu\nu} + \bar{\psi} \slashed{D} \psi + m \bar{\psi} \psi \right).
\]
This is invariant under the global transformations
\[
  \psi(x) \mapsto e^{i\alpha} \psi(x), \quad A_\mu(x) \mapsto A_\mu(x)
\]
for constant $\alpha \in \mathbb{R}/2\pi\mathbb{Z}$. The path integral measure is also invariant under this
transformation provided we integrate over equal numbers of $\psi$ and $\bar{\psi}$ modes in
the regularization.
In this case, the classical current is given by
\[
  j^\mu(x) = \bar{\psi}(x) \gamma^\mu \psi(x).
\]
We will assume that $\mathcal{D}\psi\, \mathcal{D}\bar{\psi}$ is invariant under a position-dependent $\alpha$. This is a
reasonable assumption to make, if we want our measure to be gauge invariant.
In this case, the classical current is also the quantum current.
Noting that the infinitesimal change in $\psi$ is just (proportional to) $\psi$ itself,
the Ward identity applied to $\langle \psi(x_1) \bar{\psi}(x_2) \rangle$ gives
\[
  \partial_\mu \langle j^\mu(x) \psi(x_1) \bar{\psi}(x_2) \rangle = -\delta^4(x - x_1) \langle \psi(x_1) \bar{\psi}(x_2) \rangle + \delta^4(x - x_2) \langle \psi(x_1) \bar{\psi}(x_2) \rangle.
\]
We now try to understand what these individual terms mean. We first understand
the correlators $\langle \psi(x_1) \bar{\psi}(x_2) \rangle$.
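Recall that when we did perturbation theory, the propagator was defined as
the Fourier transform of the free theory correlator $\langle \psi(x_1) \bar{\psi}(x_2) \rangle$. This is given
by
\begin{align*}
  D(k_1, k_2) &= \int d^4 x_1\, d^4 x_2\, e^{ik_1 \cdot x_1} e^{-ik_2 \cdot x_2} \langle \psi(x_1) \bar{\psi}(x_2) \rangle \\
  &= \int d^4 y\, d^4 x_2\, e^{i(k_1 - k_2) \cdot x_2} e^{ik_1 \cdot y} \langle \psi(y) \bar{\psi}(0) \rangle \\
  &= \delta^4(k_1 - k_2) \int d^4 y\, e^{ik \cdot y} \langle \psi(y) \bar{\psi}(0) \rangle,
\end{align*}
where $y = x_1 - x_2$, we used translation invariance, and $k$ denotes the common value
of $k_1 = k_2$.
Thus, we can interpret the interacting correlator $\langle \psi(x_1) \bar{\psi}(x_2) \rangle$ as the
propagator with “quantum corrections” due to the interacting field.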
Definition (Exact propagator). The exact (electron) propagator is defined by
\[
  S(k) = \int d^4 y\, e^{ik \cdot y} \langle \psi(y) \bar{\psi}(0) \rangle,
\]
evaluated in the full, interacting theory.
Usually, we don’t want to evaluate this directly. Just as we can compute the
sum over all diagrams by computing the sum over all connected diagrams and
then taking the exponential, in this case, one useful notion is a one-particle
irreducible graph.
Definition (One-particle irreducible graph). A one-particle irreducible graph for
$\langle \psi \bar{\psi} \rangle$ is a connected Feynman diagram (in momentum space) with two external
vertices $\bar{\psi}$ and $\psi$ such that the graph cannot be disconnected by the removal of
one internal line.
This definition is rather abstract, but we can look at some examples to see
what this actually means.
Example. The following are one-particle irreducible graphs:
[Diagrams: two graphs with external vertices $\bar{\psi}$ and $\psi$ and internal photon lines $\gamma$.]
while the following is not:
[Diagram: a graph with external vertices $\bar{\psi}$ and $\psi$.]
We will write
[Diagram: a blob labelled 1PI] $= \Sigma(\slashed{k})$
for the sum of all contributions due to one-particle irreducible graphs. This is
known as the electron self-energy. Note that we do not include the contributions
of the propagators connecting us to the external legs. It is not difficult to see
that any Feynman diagram with external vertices $\bar{\psi}, \psi$ is just a bunch of 1PI’s
joined together. Thus, we can expand
[Diagram: the free line between $\bar{\psi}$ and $\psi$ plus a blob of corrections between $\bar{\psi}$ and $\psi$.]
\[
  S(k) \sim \frac{1}{i\slashed{k} + m} + \text{quantum corrections},
\]
with the quantum corrections given by
[Diagram: the blob between $\bar{\psi}$ and $\psi$ equals one 1PI insertion, plus two 1PI insertions in a row, plus $\cdots$.]
This sum is easy to perform. The diagram with $n$ many 1PI’s has contributions
from $n$ many 1PI’s and $n + 1$ many propagators. Also, momentum conservation
forces them to all have the same momentum. So we simply have a geometric
series
\begin{align*}
  S(k) &\sim \frac{1}{i\slashed{k} + m} + \frac{1}{i\slashed{k} + m} \Sigma(\slashed{k}) \frac{1}{i\slashed{k} + m} + \frac{1}{i\slashed{k} + m} \Sigma(\slashed{k}) \frac{1}{i\slashed{k} + m} \Sigma(\slashed{k}) \frac{1}{i\slashed{k} + m} + \cdots \\
  &= \frac{1}{i\slashed{k} + m - \Sigma(\slashed{k})}.
\end{align*}
We can interpret this result as saying integrating out the virtual photons
gives us a shift in the kinetic term by $\Sigma(\slashed{k})$.
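The resummation is an ordinary geometric series, which can be illustrated with complex numbers in Python. The sketch below is a scalar stand-in, not a spinor computation: $i\slashed{k} + m$ is replaced by the complex number $m + ik$, and the value used for $\Sigma$ is made up. This is a reasonable toy model under the assumption that every factor is a function of the single matrix $\slashed{k}$, so that all factors commute.

```python
def dyson_partial_sum(inverse_free, self_energy, n_terms):
    """Sum the first n_terms of S0 + S0*Sigma*S0 + S0*Sigma*S0*Sigma*S0 + ..."""
    free = 1.0 / inverse_free
    term = free
    total = 0j
    for _ in range(n_terms):
        total += term
        term = term * self_energy * free
    return total


# Scalar stand-ins: i*k + m for the inverse free propagator, and a made-up
# value for the self-energy Sigma (both purely illustrative).
inverse_free = complex(1.0, 2.0)      # m = 1, k = 2
self_energy = complex(0.3, 0.1)

resummed = 1.0 / (inverse_free - self_energy)
partial = dyson_partial_sum(inverse_free, self_energy, 60)
print(partial, resummed)
```

The partial sums converge to $1/(ik + m - \Sigma)$ whenever $|\Sigma / (ik + m)| < 1$, which holds for the numbers chosen here.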
We now move on to study the other term. It was the expectation
\[
  \langle j^\mu(x) \psi(x_1) \bar{\psi}(x_2) \rangle.
\]
We note that using the definition of $D$, our classical action can be written as
\[
  S[A, \psi] = \int d^d x \left( \frac{1}{4} F_{\mu\nu} F^{\mu\nu} + \bar{\psi} \slashed{\partial} \psi + j^\mu A_\mu + m \bar{\psi} \psi \right).
\]
In position space, this gives interaction vertices of the form
[Diagram: a vertex at $x$ joined to the points $x_1$ and $x_2$.]
Again, we want to consider quantum corrections to this interaction vertex. It
turns out the interesting correlation function is exactly $\langle j^\mu(x) \psi(x_1) \bar{\psi}(x_2) \rangle$.
This might seem a bit odd. Why do we not just look at the vertex itself, and
just consider $\langle j^\mu \rangle$? Looking at $\psi j^\mu \bar{\psi}$ instead corresponds to including
the propagators coming from the external legs. The point is that there can be
photons that stretch across the vertex, looking like
[Diagram: the vertex at $x$ joined to $x_1$ and $x_2$, with a photon line stretching across the vertex.]
So when doing computations, we must involve the two external electron
propagators as well. (We do not include the photon propagator. We explore what
happens when we do that in the example sheet.)
We again take the Fourier transform of the correlator, and define
Definition (Exact electromagnetic vertex). The exact electromagnetic vertex
$\Gamma^\mu(k_1, k_2)$ is defined by
\begin{multline*}
  \delta^4(p + k_1 - k_2)\, S(k_1) \Gamma^\mu(k_1, k_2) S(k_2) \\
  = \int d^4 x\, d^4 x_1\, d^4 x_2\, \langle j^\mu(x) \psi(x_1) \bar{\psi}(x_2) \rangle\, e^{ip \cdot x} e^{ik_1 \cdot x_1} e^{-ik_2 \cdot x_2}.
\end{multline*}
Note that we divided out the $S(k_1)$ and $S(k_2)$ in the definition of $\Gamma^\mu(k_1, k_2)$,
because ultimately, we are really just interested in the vertex itself.
Can we figure out what this $\Gamma$ is? Up to first order, we have
\[
  \langle \psi(x_1) j^\mu(x) \bar{\psi}(x_2) \rangle \sim \langle \psi(x_1) \bar{\psi}(x) \rangle \gamma^\mu \langle \psi(x) \bar{\psi}(x_2) \rangle + \text{quantum corrections}.
\]
So in momentum space, after dividing out by the exact propagators, we obtain
\[
  \Gamma^\mu(k_1, k_2) = \gamma^\mu + \text{quantum corrections}.
\]
This first order term corresponds to diagrams that do not include photons going
across the two propagators, and just corresponds to the classical $\gamma^\mu$ inside the
definition of $j^\mu$. The quantum corrections are the interesting parts.
In the case of the exact electron propagator, we had this clever idea of
one-particle irreducible graphs that allowed us to simplify the propagator
computations. Do we have a similar clever idea here? Unfortunately, we don’t. But we
don’t have to! The Ward identity relates the exact vertex to the exact electron
propagator.
Taking the Fourier transform of the Ward identity, and dropping some
$\delta$-functions, we obtain
\[
  (k_1 - k_2)_\mu S(k_1) \Gamma^\mu(k_1, k_2) S(k_2) = iS(k_1) - iS(k_2).
\]
Recall that $S(k_i)$ are matrices in spinor space, and we wrote them as $\frac{1}{\cdots}$. So it is
easy to invert them, and we find
\begin{align*}
  (k_1 - k_2)_\mu \Gamma^\mu(k_1, k_2) &= iS^{-1}(k_2) - iS^{-1}(k_1) \\
  &= -i \left( i\slashed{k}_1 + m - \Sigma(\slashed{k}_1) - i\slashed{k}_2 - m + \Sigma(\slashed{k}_2) \right) \\
  &= (k_1 - k_2)_\mu \gamma^\mu + i \left( \Sigma(\slashed{k}_1) - \Sigma(\slashed{k}_2) \right).
\end{align*}
This gives us an explicit expression for the quantum corrections of the exact
vertex $\Gamma^\mu$ in terms of the quantum corrections of the exact propagator $S(k)$.
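At lowest order, where $\Gamma^\mu = \gamma^\mu$ and $S(k)$ is the free propagator $(i\slashed{k} + m)^{-1}$, the identity $(k_1 - k_2)_\mu S(k_1) \gamma^\mu S(k_2) = iS(k_1) - iS(k_2)$ is pure matrix algebra and can be verified directly. The Python sketch below does this with explicit $4 \times 4$ matrices. The particular representation of Hermitian Euclidean gamma matrices, the momenta and the mass are choices made for this sketch, not taken from the lectures.

```python
I2 = [[1, 0], [0, 1]]
ZERO2 = [[0, 0], [0, 0]]
PAULI = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]


def scale(c, m):
    return [[c * x for x in row] for row in m]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def block(a, b, c, d):
    """Assemble the 4x4 matrix [[a, b], [c, d]] from 2x2 blocks."""
    return [a[i] + b[i] for i in range(2)] + [c[i] + d[i] for i in range(2)]


# One possible set of Hermitian Euclidean gamma matrices with
# {gamma^mu, gamma^nu} = 2 delta^{mu nu} (a representation chosen for this sketch).
GAMMA = [block(ZERO2, scale(-1j, s), scale(1j, s), ZERO2) for s in PAULI]
GAMMA.append(block(ZERO2, I2, I2, ZERO2))
IDENTITY4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]


def slash(k):
    """k_mu gamma^mu as a 4x4 matrix."""
    out = [[0] * 4 for _ in range(4)]
    for k_mu, gamma_mu in zip(k, GAMMA):
        out = add(out, scale(k_mu, gamma_mu))
    return out


def free_propagator(k, m):
    """S_0(k) = (i kslash + m)^{-1} = (m - i kslash) / (k^2 + m^2)."""
    k_squared = sum(x * x for x in k)
    numerator = add(scale(m, IDENTITY4), scale(-1j, slash(k)))
    return scale(1.0 / (k_squared + m * m), numerator)


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


k1 = [0.3, -1.2, 0.7, 2.0]
k2 = [1.1, 0.4, -0.5, 0.9]
mass = 1.5
s1, s2 = free_propagator(k1, mass), free_propagator(k2, mass)

# Left side: (k1 - k2)_mu S0(k1) gamma^mu S0(k2); right side: i S0(k1) - i S0(k2).
q_slash = slash([a - b for a, b in zip(k1, k2)])
lhs = matmul(matmul(s1, q_slash), s2)
rhs = add(scale(1j, s1), scale(-1j, s2))
ward_residual = max_abs_diff(lhs, rhs)

# Check that free_propagator really inverts i kslash + m.
inverse_s1 = add(scale(1j, slash(k1)), scale(mass, IDENTITY4))
inverse_residual = max_abs_diff(matmul(inverse_s1, s1), IDENTITY4)
print(ward_residual, inverse_residual)
```

Both residuals are at the level of floating-point rounding. The check only uses $\slashed{k}_1 - \slashed{k}_2 = -i\,(S^{-1}(k_1) - S^{-1}(k_2))$, which is the same algebra that produces the $\Sigma(\slashed{k}_1) - \Sigma(\slashed{k}_2)$ term once the self-energy is included.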
Note that very little of this calculation relied on what field we actually worked
with. We could have included more fields in the theory, and everything would
still go through. We might obtain a different value of $\Sigma(\slashed{k})$, but this relation
between the quantum corrections of $\Gamma^\mu$ and the quantum corrections of $S$ still
holds.
What is the “philosophical” meaning of this? Recall that the contributions
to the propagator come from the $\bar{\psi} \slashed{\partial} \psi$ term, while the contributions to the
vertex come from the $\bar{\psi} \slashed{A} \psi$ term. The fact that their quantum corrections are
correlated in such a simple way suggests that our quantum theory treats the
$\bar{\psi} \slashed{D} \psi$ term as a whole, and so they receive the “same” quantum corrections. In
other words, the quantum theory respects gauge transformations. When we first
studied QED, we didn’t understand renormalization very well, and the Ward
identity provided a sanity check that we didn’t mess up the gauge invariance of
our theory when regularizing.
How could we have messed up? In the derivations, there was one crucial
assumption we made, namely that $\mathcal{D}\psi\, \mathcal{D}\bar{\psi}$ is invariant under position-dependent
transformations $\psi(x) \mapsto e^{i\alpha(x)} \psi(x)$. This was needed for $j^\mu$ to be the classical
current. This is true if we regularized by sampling our field at different points
in space, as long as we included the same number of $\psi$ and $\bar{\psi}$ terms.
However, historically, this is not what we used. Instead, we imposed cutoffs
in the Fourier modes, asking $k^2 \leq \Lambda_0$. This is not compatible with arbitrary
changes $\psi(x) \mapsto e^{i\alpha(x)} \psi(x)$, as we can introduce some really high frequency
changes in $\psi$ by picking a wild $\alpha$.
4 Wilsonian renormalization
4.1 Background setting
We are now going to study renormalization. Most of the time, we will assume we
are talking about a real scalar field ϕ, but the ideas and results are completely
general.
Suppose we did some experiments, obtained some results, and figured that
we are probably working with a quantum field theory described by some action.
But we didn’t test our theory to arbitrarily high energies. We don’t know what
“real physics” looks like at high energy scales. So we can’t really write down a
theory that we can expect is valid to arbitrarily high energies.
However, we have previously seen that “integrating out” high energy particles
has the same effect as just modifying the coupling constants of our theory.
Similarly, even with a single fixed field $\varphi$, we can integrate out the high energy
modes of the field $\varphi$, and obtain an effective theory. Suppose we integrate out
all modes with $k^2 \geq \Lambda_0$, and obtain an effective action
\[
  S_{\Lambda_0}[\varphi] = \int_M d^d x \left[ \frac{1}{2} (\partial\varphi)^2 + \sum_i g_i \mathcal{O}_i(\varphi, \partial\varphi) \right].
\]
This, by definition, means the partition function of the theory is now given by
\[
  \mathcal{Z} = \int_{C^\infty(M)_{\leq \Lambda_0}} \mathcal{D}\varphi\, e^{-S_{\Lambda_0}[\varphi]},
\]
where $C^\infty(M)_{\leq \Lambda_0}$ denotes the space of all functions on $M$ consisting of sums
(integrals) of eigenmodes of the Laplacian with eigenvalues $\leq \Lambda_0$ (in “layman”
terms, these are fields with momentum $k^2 \leq \Lambda_0$). This effective action can
answer questions about “low energy physics”, at scales $< \Lambda_0$, which we can use
to test our theory against experiments.
Note that in the case of a compact universe, the Laplacian has discrete
eigenvalues. For example, if we work on a flat torus (equivalently,
R
n
with
periodic boundary conditions), then the possible eigenvalues of the Laplacian lie
on a lattice. Then after imposing a cutoff, there are only finitely many energy
modes, and we have successfully regularized the theory into something that
makes mathematical sense.
In the case of a non-compact universe, this doesn’t happen. But still, in
perturbation theory, this theory will give finite answers, not infinite ones. The
loop integrals will have a finite cut-off, and will thus give finite answers. Of
course, we are not saying that summing all Feynman diagrams will give a finite
answer; this doesn’t happen even for 0-dimensional QFTs. (This isn’t exactly
true. There are further subtleties due to “infrared divergences”. We’ll mostly
ignore these, as they are not really problems related to renormalization.)
Either way, we have managed to find ourselves a theory that we are reasonably
confident in, and gives us the correct predictions in our experiments.
Now suppose 10 years later, we got the money and built a bigger accelerator.
We can then test our theories at higher energy scales. We can then try to write
down an effective action at this new energy scale. Of course, it will be a different
action, since we have changed the energy scale. However, the two actions are
not unrelated! Indeed, they must give the same answers for our “low energy”
experiments. Thus, we would like to understand how the action changes when
we change this energy scale.
In general, the coupling constants are a function of the energy scale $\Lambda_0$, and
we will write them as $g_i(\Lambda_0)$. The most general (local) action can be written as
\[
  S_{\Lambda_0}[\varphi] = \int_M d^d x \left[ \frac{1}{2} (\partial\varphi)^2 + \sum_i g_i(\Lambda_0) \Lambda_0^{d - d_i} \mathcal{O}_i(\varphi, \partial\varphi) \right],
\]
where $\mathcal{O}_i(\varphi, \partial\varphi)$ are monomials in fields and derivatives, and $d_i$ is the mass
dimension $[\mathcal{O}_i]$. Note that this expression assumes that the kinetic term does not
depend on $\Lambda_0$. This is generally not the case, and we will address this issue later.
We inserted the factor of $\Lambda_0^{d - d_i}$ such that the coupling constants $g_i(\Lambda_0)$ are
dimensionless. This is useful, as we are going to use dimensional analysis a
lot. However, this has the slight disadvantage that even without the effects of
integrating out fields, the coupling constants $g_i$ must change as we change the
energy scale $\Lambda_0$.
To do this, we need to actually figure out the mass dimension of our operators
$\mathcal{O}_i$. Thus, we need to figure out the dimensions of $\varphi$ and $\partial$. We know that $S$
itself is dimensionless, since we want to stick it into an exponential. Thus, any
term appearing in the integrand must have mass dimension $d$ (as the measure
has mass dimension $-d$).
By looking at the kinetic term in the Lagrangian, we deduce that we must
have
\[
  [(\partial\varphi)^2] = d,
\]
as we have to eventually integrate it over space.
Also, we know that $[\partial_\mu] = 1$. Thus, we must have
Proposition.
\[
  [\partial_\mu] = 1, \quad [\varphi] = \frac{d - 2}{2}.
\]
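With $[\varphi] = (d - 2)/2$ and $[\partial_\mu] = 1$, the mass dimension $d_i$ of any monomial, and hence the power $d - d_i$ of the cut-off that multiplies its dimensionless coupling, is a matter of counting. The following Python sketch does the bookkeeping with exact fractions; the function names are made up for this illustration.

```python
from fractions import Fraction


def field_dimension(d):
    """Mass dimension of a scalar field in d dimensions: (d - 2) / 2."""
    return Fraction(d - 2, 2)


def operator_dimension(d, n_fields, n_derivatives):
    """Mass dimension d_i of a monomial with n_fields fields and
    n_derivatives derivatives."""
    return n_fields * field_dimension(d) + n_derivatives


def coupling_power(d, n_fields, n_derivatives):
    """Exponent d - d_i of the cut-off multiplying a dimensionless coupling."""
    return d - operator_dimension(d, n_fields, n_derivatives)


kinetic = operator_dimension(4, 2, 2)    # (d phi)^2 in d = 4
phi4 = coupling_power(4, 4, 0)
phi6 = coupling_power(4, 6, 0)
phi2 = coupling_power(4, 2, 0)
print(kinetic, phi4, phi6, phi2)
```

In $d = 4$ this gives $d - d_i = 2, 0, -2$ for $\varphi^2, \varphi^4, \varphi^6$ respectively, and confirms that the kinetic term has dimension exactly $d$.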
4.2 Integrating out modes
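Suppose, for some magical reason, we know exactly what the theory at the
energy scale $\Lambda_0$ is, and it is given by the coupling coefficients $g_i(\Lambda_0)$. What
happens when we integrate out some high energy modes?
We pick some $\Lambda < \Lambda_0$, and split our field $\varphi$ into “low” and “high” energy
modes as follows:
\begin{align*}
  \varphi(x) &= \int_{|p| \leq \Lambda_0} \frac{d^d p}{(2\pi)^d}\, \tilde{\varphi}(p) e^{ip \cdot x} \\
  &= \int_{0 \leq |p| \leq \Lambda} \frac{d^d p}{(2\pi)^d}\, \tilde{\varphi}(p) e^{ip \cdot x} + \int_{\Lambda < |p| \leq \Lambda_0} \frac{d^d p}{(2\pi)^d}\, \tilde{\varphi}(p) e^{ip \cdot x}.
\end{align*}
We thus define
\begin{align*}
  \phi(x) &= \int_{0 \leq |p| \leq \Lambda} \frac{d^d p}{(2\pi)^d}\, \tilde{\varphi}(p) e^{ip \cdot x}, \\
  \chi(x) &= \int_{\Lambda < |p| \leq \Lambda_0} \frac{d^d p}{(2\pi)^d}\, \tilde{\varphi}(p) e^{ip \cdot x},
\end{align*}
and so
\[
  \varphi(x) = \phi(x) + \chi(x).
\]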
Let’s consider the effective theory we obtain by integrating out $\chi$. As before, we
define the scale $\Lambda$ effective action
\[
  S_\Lambda[\phi] = -\hbar \log \left[ \int_{C^\infty(M)_{\Lambda < |p| < \Lambda_0}} \mathcal{D}\chi\, e^{-S_{\Lambda_0}[\phi + \chi]/\hbar} \right]. \tag{$*$}
\]
Of course, this can be done for any $\Lambda < \Lambda_0$, and so defines a map from $[0, \Lambda_0]$ to
the “space of all actions”. More generally, for any $\varepsilon$, this procedure allows us to
take a scale $\Lambda$ action and produce a scale $\Lambda - \varepsilon$ effective action from it. This is
somewhat like a group (or monoid) action on the “space of all actions”, and thus
the equation ($*$) is known as the Wilsonian renormalization group equation.
Just as we saw in low-dimensional examples, when we do this, the coupling
constants of the interactions will shift. For each $\Lambda < \Lambda_0$, we can define the
shifted coefficients $g_i(\Lambda)$, as well as $Z_\Lambda$ and $\delta m^2$, by the equation
\[
  S_\Lambda[\phi] = \int_M d^d x \left[ \frac{Z_\Lambda}{2} (\partial\phi)^2 + \sum_i \Lambda^{d - d_i} Z_\Lambda^{n_i/2} g_i(\Lambda) \mathcal{O}_i(\phi, \partial\phi) \right],
\]
where $n_i$ is the number of times $\phi$ or $\partial\phi$ appears in $\mathcal{O}_i$.
Note that we normalized the $g_i(\Lambda)$ in terms of the new $\Lambda$ and
$Z_\Lambda$. So even if, by some miracle, our couplings receive no new
corrections, the coefficients still transform by
\[ g_i(\Lambda) = \left(\frac{\Lambda_0}{\Lambda}\right)^{d - d_i} g_i(\Lambda_0). \]
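As a rough numerical illustration of this purely classical rescaling, the sketch below evaluates $(\Lambda_0/\Lambda)^{d - d_i} g_i(\Lambda_0)$ for two operators in $d = 4$. The function name and the sample numbers are illustrative assumptions, not part of the notes, and loop corrections are ignored entirely.

```python
def classical_coupling(g0, lam0, lam, d, d_i):
    """Dimensionless coupling at cutoff `lam`, given its value `g0` at cutoff `lam0`.

    Only the classical factor (lam0 / lam)**(d - d_i) is included; this is the
    "no new corrections" case, not the full RG flow.
    """
    return (lam0 / lam) ** (d - d_i) * g0


# Illustrative values (assumed): lower the cutoff by a factor of 10 in d = 4.
g_mass = classical_coupling(g0=0.01, lam0=100.0, lam=10.0, d=4, d_i=2)  # phi^2 term
g_six = classical_coupling(g0=0.01, lam0=100.0, lam=10.0, d=4, d_i=6)   # phi^6 term
print(g_mass, g_six)
```

The mass-like coupling ($d_i < d$) grows as the cutoff is lowered, while the $\varphi^6$ coupling ($d_i > d$) shrinks.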
The factor $Z_\Lambda$ accounts for the fact that there could be new
contributions to the kinetic term for $\varphi$. This is called wavefunction
renormalization. The factor $Z_\Lambda$ is not to be confused with the partition
function, which we denote by a calligraphic $\mathcal{Z}$ instead. We will
explore these in more detail later.
We define
\[
  \mathcal{Z}(\Lambda, g_i(\Lambda)) = \int_{C^\infty(M)_{\leq \Lambda}} D\phi\; e^{-S_\Lambda[\phi]/\hbar}.
\]
Then by construction, we must have
\[ \mathcal{Z}(\Lambda_0, g_i(\Lambda_0)) = \mathcal{Z}(\Lambda, g_i(\Lambda)) \]
for all $\Lambda < \Lambda_0$. This is a completely trivial fact, because we
obtained $\mathcal{Z}(\Lambda, g_i(\Lambda))$ simply by doing part of the
integral and leaving the others intact.
We will assume that $\mathcal{Z}$ varies continuously with $\Lambda$ (which is
actually not the case when the allowed modes are discrete, but whatever). It is
then convenient to write the above expression infinitesimally, by taking the
derivative. Instead of the usual $\frac{d}{d\Lambda}$, it is more convenient to
talk about the operator $\Lambda \frac{d}{d\Lambda}$ instead, as this is a
dimensionless operator.
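Differentiating the above equation, we obtain
\[
  \Lambda \frac{d\mathcal{Z}}{d\Lambda}(\Lambda, g_i(\Lambda))
  = \Lambda \left.\frac{\partial \mathcal{Z}}{\partial \Lambda}\right|_{g_i}
  + \sum_i \left.\frac{\partial \mathcal{Z}}{\partial g_i}\right|_{\Lambda} \Lambda \frac{\partial g_i}{\partial \Lambda} = 0. \tag{$\dagger$}
\]
This is the Callan-Symanzik equation for the partition function.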
It is convenient to refer to the following object:

Definition (Beta function). The beta function of the coupling $g_i$ is
\[ \beta_i(g_j) = \Lambda \frac{\partial g_i}{\partial \Lambda}. \]
As mentioned before, even if our coupling constants magically receive no
corrections, they will still change. Thus, it is convenient to separate out the
boring part, and write
\[ \beta_i(g_i) = (d_i - d) g_i + \beta_i^{\mathrm{quantum}}(\{g_j\}). \]
Notice that perturbatively, the $\beta_i^{\mathrm{quantum}}(\{g_i\})$ come from
loops coming from integrating out diagrams. So generically, we expect them to
depend on all other coupling constants.
We will later need the following definition, which at this point is rather
unmotivated:

Definition (Anomalous dimension). The anomalous dimension of $\varphi$ is
defined by
\[ \gamma_\varphi = -\frac{1}{2} \Lambda \frac{\partial \log Z_\Lambda}{\partial \Lambda}. \]
Of course, at any given scale, we can absorb, say, $Z_\Lambda$ by defining a
new field
\[ \phi(x) = \sqrt{Z_\Lambda}\, \varphi \]
so as to give $\phi(x)$ canonically normalized kinetic terms. Of course, if we
do this at any particular scale, and then try to integrate out more modes, then
the coefficient will re-appear.
4.3 Correlation functions and anomalous dimensions
Let’s say we now want to compute correlation functions. We will write
\[
  S_\Lambda[\varphi, g_i] = \int_M d^d x \left[ \frac{1}{2} (\partial\varphi)^2 + \frac{m^2}{2} \varphi^2
  + \sum_i g_i \Lambda^{d - d_i} O_i(\varphi, \partial\varphi) \right],
\]
where, as before, we will assume $m^2$ is one of the $g_i$. Note that the
action of the $\varphi$ we produced by integrating out modes is not
$S_\Lambda[\varphi, g_i(\Lambda)]$, because we had the factor of $Z_\Lambda$
sticking out in the action. Instead, it is given by
$S_\Lambda[Z_\Lambda^{1/2} \varphi, g_i(\Lambda)]$.
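Now we can write a general $n$-point correlation function as
\[
  \langle \varphi(x_1) \cdots \varphi(x_n) \rangle = \frac{1}{\mathcal{Z}} \int^{\leq \Lambda} D\varphi\;
  e^{-S_\Lambda[Z_\Lambda^{1/2} \varphi,\, g_i(\Lambda)]}\, \varphi(x_1) \cdots \varphi(x_n).
\]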
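We can invent a canonically normalized field
\[ \phi(x) = \sqrt{Z_\Lambda}\, \varphi, \]
so that the kinetic term looks right. Then defining
\[
  \langle \phi(x_1) \cdots \phi(x_n) \rangle = \frac{1}{\mathcal{Z}} \int^{\leq \Lambda} D\varphi\;
  e^{-S_\Lambda[\phi,\, g_i(\Lambda)]}\, \phi(x_1) \cdots \phi(x_n),
\]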
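we find, since $\varphi = Z_\Lambda^{-1/2} \phi$,
\[
  \langle \varphi(x_1) \cdots \varphi(x_n) \rangle = Z_\Lambda^{-n/2} \langle \phi(x_1) \cdots \phi(x_n) \rangle.
\]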
Note that we don’t have to worry about the factor of $Z_\Lambda^{1/2}$ coming
from the scaling of the path integral measure, as the partition function
$\mathcal{Z}$ is scaled by the same amount.
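Definition ($\Gamma_\Lambda^{(n)}$). We write
\[
  \Gamma_\Lambda^{(n)}(\{x_i\}, g_i) = \frac{1}{\mathcal{Z}} \int^{\leq \Lambda} D\varphi\;
  e^{-S_\Lambda[\varphi, g_i]}\, \varphi(x_1) \cdots \varphi(x_n)
  = \langle \phi(x_1) \cdots \phi(x_n) \rangle.
\]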
Now suppose $0 < s < 1$, and that we’ve chosen to insert fields only with
energies $< s\Lambda$. Then we should equally be able to compute the correlator
using the low energy theory $S_{s\Lambda}$. We’re then going to find
\[
  Z_{s\Lambda}^{-n/2}\, \Gamma_{s\Lambda}^{(n)}(x_1, \cdots, x_n, g_i(s\Lambda))
  = Z_\Lambda^{-n/2}\, \Gamma_\Lambda^{(n)}(x_1, \cdots, x_n, g_i(\Lambda)).
\]
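Differentiating this with respect to $s$, and using the definitions of
$\beta_i$ and $\gamma_\varphi$, we find
\[
  Z_\Lambda^{n/2}\, \Lambda \frac{d}{d\Lambda} \left[ Z_\Lambda^{-n/2}\, \Gamma_\Lambda^{(n)}(x_1, \cdots, x_n, g_i(\Lambda)) \right]
  = \left( \Lambda \frac{\partial}{\partial \Lambda} + \beta_i \frac{\partial}{\partial g_i} + n \gamma_\varphi \right)
  \Gamma_\Lambda^{(n)}(\{x_i\}, g_i(\Lambda)) = 0.
\]
This is the Callan-Symanzik equation for the correlation functions.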
There is an alternative way of thinking about what happens when we change
$\Lambda$. We will assume we work over $\mathbb{R}^d$, so that it makes sense
to scale our universe (on a general Riemannian manifold, we could achieve the
same effect by scaling the metric). The coordinates change by $x \mapsto sx$.
How does $\Gamma_\Lambda(x_1, \ldots, x_n, g_i)$ relate to
$\Gamma_\Lambda(sx_1, \ldots, sx_n, g_i)$?
We unwrap the definitions
\[
  \Gamma_\Lambda^{(n)}(\{sx_i\}, g_i) = \frac{1}{\mathcal{Z}} \int^{\leq \Lambda} D\varphi\;
  e^{-S_\Lambda[\varphi, g_i]}\, \varphi(sx_1) \cdots \varphi(sx_n).
\]
We make the substitution $\phi(x) = a \varphi(sx)$, with a constant $a$ to be
chosen later so that things work out. Again, we don’t have to worry about how
$D\varphi$ transforms. However, this change of variables does scale the Fourier
modes, so the new cutoff of $\phi$ is in fact $s\Lambda$. How does
$S_\Lambda[\varphi, g_i]$ transform? Using the chain rule, we have
\[
  S_{s\Lambda}[\phi, g_i] = \int_M d^d x \left[ \frac{1}{2} (\partial\phi)^2 + \frac{m^2}{2} \phi^2
  + \sum_i g_i (s\Lambda)^{d - d_i} O_i(\phi, \partial\phi) \right].
\]
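Putting in the definition of $\phi$, and substituting $y = sx$, we have
\[
  S_{s\Lambda}[\phi, g_i] = s^{-d} \int_M d^d y \left[ \frac{1}{2} a^2 s^2 (\partial\varphi)^2 + \frac{m^2}{2} a^2 \varphi^2
  + \sum_i g_i (s\Lambda)^{d - d_i} O_i(a\varphi, as\,\partial\varphi) \right],
\]
where all fields are evaluated at $y$. We want this to be equal to
$S_\Lambda[\varphi, g_i]$. By looking at the kinetic term, we need
$a^2 s^{2 - d} = 1$, i.e.
\[ a = s^{(d - 2)/2}. \]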
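By a careful analysis, we see that the other terms also work out (or we know
they must, by dimensional analysis). Since $\varphi(sx) = a^{-1} \phi(x) =
s^{(2 - d)/2} \phi(x)$, we have
\[
\begin{aligned}
  \Gamma_\Lambda^{(n)}(\{sx_i\}, g_i)
  &= \frac{1}{\mathcal{Z}} \int^{\leq \Lambda} D\varphi\; e^{-S_\Lambda[\varphi, g_i]}\, \varphi(sx_1) \cdots \varphi(sx_n) \\
  &= \frac{1}{\mathcal{Z}} \int^{\leq s\Lambda} D\phi\; e^{-S_{s\Lambda}[\phi, g_i]}\, s^{(2 - d)n/2}\, \phi(x_1) \cdots \phi(x_n) \\
  &= s^{(2 - d)n/2}\, \Gamma_{s\Lambda}^{(n)}(\{x_i\}, g_i).
\end{aligned}
\]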
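Thus, we can write
\[
\begin{aligned}
  \Gamma_\Lambda^{(n)}(x_1, \cdots, x_n, g_i(\Lambda))
  &= \left( \frac{Z_\Lambda}{Z_{s\Lambda}} \right)^{n/2} \Gamma_{s\Lambda}^{(n)}(x_1, \cdots, x_n, g_i(s\Lambda)) \\
  &= \left( \frac{Z_\Lambda\, s^{d - 2}}{Z_{s\Lambda}} \right)^{n/2} \Gamma_\Lambda^{(n)}(sx_1, \cdots, sx_n, g_i(s\Lambda)).
\end{aligned}
\]
Note that in the second step, we don’t change the values of the $g_i$! We are
just changing units for measuring things. We are not integrating out modes.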
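Equivalently, if $y_i = sx_i$, then what we found is that
\[
  \Gamma_\Lambda^{(n)}\left( \frac{y_1}{s}, \cdots, \frac{y_n}{s}, g_i(\Lambda) \right)
  = \left( \frac{Z_\Lambda\, s^{d - 2}}{Z_{s\Lambda}} \right)^{n/2} \Gamma_\Lambda^{(n)}(y_1, \cdots, y_n, g_i(s\Lambda)).
\]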
What does this equation say? Here we are cutting off the $\Gamma$ at the same
energy level. As we reduce $s$, the right hand side has the positions fixed,
while on the left hand side, the points get further and further apart. So on
the left hand side, as $s \to 0$, we are probing the theory at longer and longer
distances. Thus, what we have found is that “zooming out” in our theory is the
same as flowing down the couplings $g_i(s\Lambda)$ to a scale appropriate for
the low energy theory.
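Infinitesimally, let $s = 1 - \delta s$, with $0 < \delta s \ll 1$. Then we have
\[
  \left( \frac{Z_\Lambda (1 - \delta s)^{d - 2}}{Z_{(1 - \delta s)\Lambda}} \right)^{1/2}
  \approx 1 - \left( \frac{d - 2}{2} + \gamma_\varphi \right) \delta s,
\]
where $\gamma_\varphi$ is the anomalous dimension of $\varphi$ we defined before.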
Classically, we’d expect the correlation function
$\langle \varphi(sx_1) \cdots \varphi(sx_n) \rangle$ to scale with $s$ as
\[ s^{-n(d - 2)/2}, \]
since that’s what dimensional analysis would tell us. But quantum mechanically,
we see that there is a correction given by $\gamma_\varphi$, and what we really
have is
\[ s^{-n \Delta_\varphi}, \qquad \Delta_\varphi = \frac{d - 2}{2} + \gamma_\varphi. \]
So the dependence of the correlation on the distance is not just what we expect
from dimensional analysis, but it gains a quantum correction factor. Thus, we
say $\gamma_\varphi$ is the “anomalous dimension” of the field.
4.4 Renormalization group flow
We now study the renormalization group flow. In other words, we want to
understand how the coupling constants actually change as we move to the
infrared, i.e. take $\Lambda \to 0$. The actual computations are difficult, so
in this section, we are going to understand the scenario rather qualitatively
and geometrically.
We can imagine that there is a configuration space whose points are the
possible combinations of the $g_i$, and as we take $\Lambda \to 0$, we trace out
a trajectory in this configuration space. We want to understand what these
trajectories look like.
As in most of physics, we start at an equilibrium point.
Definition (Critical point). A critical point is a point in the configuration
space, i.e. a choice of couplings $g_i = g_i^*$ such that $\beta_i(g_i^*) = 0$.
One such example of a critical point is the Gaussian theory, with all couplings,
including the mass term, vanishing. Since there are no interactions at all, nothing
happens when we integrate out modes. It is certainly imaginable that there are
other critical points. We might have a theory where the classical dimensions
of all couplings are zero, and also by a miracle, all quantum corrections vanish.
This happens, for example, in some supersymmetric theories. Alternatively, the
classical dimensions are non-zero, but the quantum corrections happen to exactly
compensate the effect of the classical dimension.
In either case, we have some couplings $g_i^*$ that are independent of scale,
and thus the anomalous dimension $\gamma_\varphi(g_i^*) = \gamma_\varphi^*$
would also be independent of scale. This has important consequences.
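Example. At a critical point, the renormalization group equation for a
two-point function becomes
\[
  0 = \left( \Lambda \frac{\partial}{\partial \Lambda} + \beta_i(g_i^*) \frac{\partial}{\partial g_i} + 2 \gamma_\varphi(g_i^*) \right)
  \Gamma_\Lambda^{(2)}(x, y).
\]
But the $\beta$-function is zero, and $\gamma_\varphi$ is independent of scale. So
\[
  \Lambda \frac{\partial}{\partial \Lambda} \Gamma_\Lambda^{(2)}(x, y) = -2 \gamma_\varphi^*\, \Gamma_\Lambda^{(2)}(x, y).
\]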
On the other hand, on dimensional grounds, $\Gamma$ must be of the form
\[ \Gamma_\Lambda^{(2)}(x, y, g_i^*) = f(\Lambda |x - y|, g_i^*)\, \Lambda^{d - 2} \]
for some function $f$. Feeding this into the RG equation, we find that $\Gamma$
must be of the form
\[
  \Gamma_\Lambda^{(2)}(x, y, g_i^*) = \frac{\Lambda^{d - 2}}{\Lambda^{2\Delta_\varphi}} \frac{c(g_i^*)}{|x - y|^{2\Delta_\varphi}}
  \propto \frac{c(g_i^*)}{|x - y|^{2\Delta_\varphi}},
\]
where $c(g_i^*)$ are some constants independent of the points. This is an
example of what we were saying before. This scales as
$|x - y|^{-2\Delta_\varphi}$, instead of $|x - y|^{2 - d}$, and the anomalous
dimension is the necessary correction.
Now a Gaussian universe is pretty boring. What happens when we start close to
a critical point? As in, say, IA Differential Equations, we can try to Taylor
expand, and look at the second derivatives to understand the behaviour of the
system. This corresponds to Taylor-expanding the $\beta$-function, which is by
itself the first derivative.
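We set our couplings to be
\[ g_i = g_i^* + \delta g_i. \]
Then we can write
\[
  \left. \Lambda \frac{\partial g_i}{\partial \Lambda} \right|_{g_i^* + \delta g_i}
  = B_{ij}(\{g_k^*\})\, \delta g_j + O(\delta g^2),
\]
where $B_{ij}$ is (sort of) the Hessian matrix, which is an infinite dimensional
matrix.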
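As in IA Differential Equations, we consider the eigenvectors of $B_{ij}$.
Suppose we have an “eigencoupling” $\sigma_j$. Classically, we expect
\[ g_i(\Lambda) = \left( \frac{\Lambda}{\Lambda_0} \right)^{d_i - d} g_i(\Lambda_0), \]
and so $\delta g_j = \delta_{ij}$ gives an eigenvector with eigenvalue
$d_i - d$. In the fully quantum case, we will write the eigenvalue as
$\Delta_i - d$, and we define
\[ \gamma_i = \Delta_i - d_i \]
to be the anomalous dimension of the operator. Since $\sigma_j$ was an
eigenvector, we find that
\[ \Lambda \frac{\partial \sigma_i}{\partial \Lambda} = (\Delta_i - d) \sigma_i. \]
Consequently, we find
\[ \sigma_i(\Lambda) = \left( \frac{\Lambda}{\Lambda_0} \right)^{\Delta_i - d} \sigma_i(\Lambda_0) \]
to this order.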
Suppose $\Delta_i > d$. Then as we lower the cutoff from $\Lambda_0$ to $0$, we
find that $\sigma_i(\Lambda) \to 0$ as a positive power of $\Lambda$ (i.e.
exponentially in $\log \Lambda$). So we flow back to the theory at $g_i^*$ as
we move to lower energies. These operators are called irrelevant.
Assuming that quantum corrections do not play a very large role near the
critical point, we know that there must be infinitely many such operators, as we
can always increase $d_i$, hence $\Delta_i$, by adding more derivatives or
fields (for $d > 2$). So we know the critical surface is infinite dimensional.
On the other hand, if $\Delta_i < d$, then $\sigma_i(\Lambda)$ increases as we
go to the infrared. These operators hence become more significant. These are
called relevant operators. There are only finitely many such relevant operators,
at least for $d > 2$. Any RG trajectory emanating from $g_i^*$ is called a
critical trajectory.
We can draw a picture. The critical surface $C$ consists of (the span of) all
irrelevant modes, and is typically infinite dimensional with finite codimension.
A generic QFT will start at scale $\Lambda_0$ with both relevant and irrelevant
operators turned on. As we flow along the RG trajectory, we focus towards the
critical trajectory. This focusing is called universality.
This is, in fact, the reason we can do physics! We don’t know about the
detailed microscopic information about the universe. Further, there are infinitely
many coupling constants that can be non-zero. But at low energies, we don’t
need to know them! Most of them are irrelevant, and at low energies, only the
relevant operators matter, and there are only finitely many of them.
One thing we left out in our discussion is marginal operators, i.e. those
with $\Delta_i = d$. To lowest order, these are unchanged under RG flow, but we
have to examine higher order corrections to decide whether these operators are
marginally relevant or marginally irrelevant, or perhaps exactly marginal.
Marginally relevant or marginally irrelevant operators may stay roughly
constant for long periods of RG evolution. Because of this, these operators are
often important phenomenologically. We’ll see that most of the couplings we see
in QED and QCD, apart from mass terms, are marginal operators, at least to
lowest order.
If we use the classical dimension in place of $\Delta_i$, it is straightforward
to figure out what the relevant and marginal operators are. We know that in $d$
dimensions, the mass dimension $[\varphi] = \frac{d - 2}{2}$ for a scalar field,
and $[\partial] = 1$. Focusing on the even operators only, we find that the only
ones are

  Dimension d   Relevant operators               Marginal operators
  2             $\varphi^{2k}$ for all $k > 0$   $(\partial\varphi)^2$, $\varphi^{2k} (\partial\varphi)^2$ for all $k > 0$
  3             $\varphi^{2k}$ for $k = 1, 2$    $(\partial\varphi)^2$, $\varphi^6$
  4             $\varphi^2$                      $(\partial\varphi)^2$, $\varphi^4$
  > 4           $\varphi^2$                      $(\partial\varphi)^2$

Of course, there are infinitely many irrelevant operators, and we do not attempt
to write them out.
Thus, with the exception of $d = 2$, we see that there is a short, finite list
of relevant and marginal operators, at least if we just use the classical
dimension.
Note that here we ignored all quantum corrections, and that sort-of defeats
the purpose of doing renormalization. In general, the eigen-operators will not be
simple monomials, and can in fact look very complicated!
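The classical power counting behind the table can be sketched in a few lines. The helper names below are illustrative assumptions, and only the classical dimension $d_i = n_\varphi (d - 2)/2 + n_\partial$ is used, so quantum corrections are ignored exactly as in the table.

```python
from fractions import Fraction


def classical_dimension(d, n_fields, n_derivs=0):
    """Classical mass dimension of an operator built from `n_fields` powers of
    the scalar field and `n_derivs` derivatives, in d dimensions."""
    return n_fields * Fraction(d - 2, 2) + n_derivs


def classify(d, n_fields, n_derivs=0):
    """Compare the classical dimension with d: below is relevant, equal is marginal."""
    d_i = classical_dimension(d, n_fields, n_derivs)
    if d_i < d:
        return "relevant"
    if d_i == d:
        return "marginal"
    return "irrelevant"


# Reproduce a few rows of the table (even operators only).
for d in (2, 3, 4, 5):
    row = {f"phi^{2 * k}": classify(d, 2 * k) for k in (1, 2, 3)}
    print(d, row, "kinetic:", classify(d, 2, 2))
```

Exact rational arithmetic is used so that the marginal case in odd dimensions (such as $\varphi^6$ in $d = 3$) is detected by an exact comparison rather than a floating point one.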
4.5 Taking the continuum limit
So far, we assumed we started with an effective theory at a high energy scale
$\Lambda_0$, and studied the behaviour of the couplings as we flow down to low
energies. This is pretty much what we do in, say, condensed matter physics. We
have some detailed description of the system, and then we want to know what
happens when we zoom out. Since we have a fixed lattice of atoms, there is a
natural energy scale $\Lambda_0$ to cut off at, based on the spacing and phonon
modes of the lattice.
However, in high energy physics, we want to do the opposite. We instead
want to use what we know about the low energy version of the system, and then
project and figure out what the high energy theory is. In other words, we are
trying to take the continuum limit $\Lambda_0 \to \infty$.
What do we actually mean by that? Suppose our theory is defined at a critical
point $g_i^*$ and some cutoff $\Lambda_0$. Then by definition, in our path
integral
\[
  \mathcal{Z}(\Lambda_0, g_i^*) = \int_{C^\infty(M)_{\leq \Lambda_0}} D\phi\; e^{-S_{\Lambda_0}[\phi, g_i^*]},
\]
no matter what values of $\Lambda_0$ we pick (while keeping the $g_i^*$ fixed),
we are going to get the same path integral, and obtain the same answers, as
that is what “critical point” means. In particular, we are free to take the
limit $\Lambda_0 \to \infty$, and then we are now integrating over “all paths”.
What if we don’t start at a critical point? Suppose we start somewhere on the
critical surface, $\{g_i\}$. We keep the same constants, but raise the value of
$\Lambda_0$. What does the effective theory at a scale $\Lambda$ look like? As
we increase $\Lambda_0$, the amount of “energy scale” we have to flow down to
get to $\Lambda$ increases. So as we raise $\Lambda_0$, the coupling constants
at scale $\Lambda$ flow towards the critical point. As we take this continuum
limit $\Lambda_0 \to \infty$, we end up at a critical point, namely a conformal
field theory. This is perhaps a Gaussian, which is not very interesting, but at
least we got something.
However, suppose our theory has some relevant operators turned on. Then as we
take the limit $\Lambda_0 \to \infty$, the coupling constants of our theory
diverge! This sounds bad.
It might seem a bit weird that we fix the constants and raise the values of
$\Lambda_0$. However, sometimes, this is a reasonable thing to do. For example,
if we think in terms of the “probing distances” of the theory, as we previously
discussed, then this is equivalent to taking the same theory but “zooming out”
and probing it at larger and larger distances. It turns out, when we do
perturbation theory, the “naive” thing to do is to do exactly this. Of course,
we now know that the right thing to do is that we should change our couplings
as we raise $\Lambda_0$, so as to give the same physical predictions at any
fixed scale $\Lambda < \Lambda_0$. In other words, we are trying to backtrace
the renormalization group flow to see where we came from! This is what we are
going to study in the next chapter.
4.6 Calculating RG evolution
We now want to actually compute the RG evolution of a theory. To do so, we of
course need to make some simplifying assumptions, and we will also leave out a
lot of details. We note that (in $d > 2$), the only marginal or relevant
operator that involves derivatives is the kinetic term $(\partial\phi)^2$. This
suggests we can find a simple truncation of the RG evolution by restricting to
actions of the form
\[ S[\phi] = \int d^d x \left( \frac{1}{2} (\partial\phi)^2 + V(\phi) \right), \]
and write the potential as
\[ V(\phi) = \sum_k \Lambda^{d - k(d - 2)} \frac{g_{2k}}{(2k)!} \phi^{2k}. \]
In other words, we leave out higher terms that involve derivatives. This is
known as the local potential approximation (LPA). As before, we split our field
into low and high energy modes,
\[ \phi = \varphi + \chi, \]
and we want to compute the effective action for $\varphi$:
\[
  S_\Lambda^{\mathrm{eff}}[\varphi] = -\hbar \log \int_{C^\infty(M)_{[\Lambda, \Lambda_0]}} D\chi\; e^{-S[\varphi + \chi]/\hbar}.
\]
This is still a very complicated path integral to do. To make progress, we
assume we lower the cutoff just infinitesimally, $\Lambda = \Lambda_0 - \delta\Lambda$.
The action at scale $\Lambda$ now becomes
\[
  S[\varphi + \chi] = S[\varphi] + \int d^d x \left[ \frac{1}{2} (\partial\chi)^2 + \frac{1}{2} \chi^2 V''(\varphi)
  + \frac{1}{3!} \chi^3 V'''(\varphi) + \cdots \right],
\]
where it can be argued that we can leave out the terms linear in $\chi$.
Since we’re just doing the path integral over modes with energies in
$(\Lambda - \delta\Lambda, \Lambda]$, each loop integral takes the form
\[
  \int_{\Lambda - \delta\Lambda \leq |p| \leq \Lambda} d^d p \cdots = \Lambda^{d - 1} \delta\Lambda \int_{S^{d - 1}} d\Omega \cdots,
\]
where $d\Omega$ denotes an integral over the unit $(d - 1)$-sphere. Since each
loop integral comes with a factor of $\delta\Lambda$, to leading order, we need
to consider only 1-loop diagrams.
A connected graph with $E$ edges and $L$ loops and $V_i$ vertices of
$\chi$-valency $i$ (and arbitrarily many valency in $\varphi$) obeys
\[ L - 1 = E - \sum_{i = 2}^\infty V_i. \]
Note that by assumption, there are no single-$\chi$ vertices.
Also, every edge contributes to two vertices, as there are no $\chi$ loose
ends. On the other hand, each vertex of type $i$ has $i$ many $\chi$ lines. So
we have
\[ 2E = \sum_i i V_i. \]
Combining these two formulae, we have
\[ L = 1 + \sum_{i = 2}^\infty \frac{i - 2}{2} V_i. \]
The sum on the right is non-negative, and vanishes iff $V_i = 0$ for all
$i \geq 3$. Hence, for 1-loop diagrams, we only need to consider vertices with
precisely two $\chi$-lines attached. Thus, all the contributions look like

[Diagrams: single $\chi$ loops with one, two, three, ... vertices, each vertex
carrying exactly two $\chi$ lines and any number of external $\varphi$ legs.]
We can thus truncate the action as
\[
  S[\varphi + \chi] - S[\varphi] = \int d^d x \left[ \frac{1}{2} (\partial\chi)^2 + \frac{1}{2} \chi^2 V''(\varphi) \right].
\]
This is still not very feasible to compute. We are only going to do this
integral in a very specific case, where $\varphi$ is chosen to be constant.
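We use the fact that the Fourier modes of $\chi$ only live between
$\Lambda - \delta\Lambda < |p| < \Lambda$. Then taking the Fourier transform and
doing the integral in momentum space, we have
\[
\begin{aligned}
  S[\varphi + \chi] - S[\varphi]
  &= \int_{\Lambda - \delta\Lambda < |p| \leq \Lambda} \frac{d^d p}{2 (2\pi)^d}\, \tilde{\chi}(-p) (p^2 + V''(\varphi)) \tilde{\chi}(p) \\
  &= \frac{\Lambda^{d - 1} \delta\Lambda}{2 (2\pi)^d} [\Lambda^2 + V''(\varphi)] \int_{S^{d - 1}} d\Omega\; \tilde{\chi}(-\Lambda\hat{p}) \tilde{\chi}(\Lambda\hat{p}).
\end{aligned}
\]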
The $\chi$ path integral is finite if we work on a compact space, say $T^d$
with side length $L$, in which case there are only finitely many Fourier modes.
Then the momenta are $p_\mu = \frac{2\pi}{L} n_\mu$, and the path integral is
just a product of Gaussian integrals. Going through the computations, we find
\[
  e^{-\delta_\Lambda S} = \int D\chi\; e^{-(S[\varphi + \chi] - S[\varphi])}
  = C \left( \frac{\pi}{\Lambda^2 + V''(\varphi)} \right)^{N/2},
\]
where $N$ is the number of $\chi$ modes in our shell of radius $\Lambda$ and
thickness $\delta\Lambda$, and $C$ is some constant. From our previous formula,
we see that it is just
\[
  N = \mathrm{vol}(S^{d - 1}) \Lambda^{d - 1} \delta\Lambda \cdot \left( \frac{L}{2\pi} \right)^d
  \equiv 2a \Lambda^{d - 1} \delta\Lambda\, L^d,
\]
where
\[
  a = \frac{\mathrm{vol}(S^{d - 1})}{2 (2\pi)^d} = \frac{1}{(4\pi)^{d/2}\, \Gamma(d/2)}.
\]
Consequently, up to field-independent numerical factors, integrating out $\chi$
leads to a change in the effective action
\[
  \frac{\delta_\Lambda S^{\mathrm{eff}}}{\delta\Lambda} = a \log(\Lambda^2 + V''(\varphi))\, \Lambda^{d - 1} L^d.
\]
This diverges as $L \to \infty$! This is an infrared divergence, and it can be
traced to our simplifying assumption that $\varphi$ is everywhere constant. More
generally, it is not unreasonable to believe that we have
\[
  \frac{\delta_\Lambda S^{\mathrm{eff}}}{\delta\Lambda} = a \Lambda^{d - 1} \int d^d x\; \log(\Lambda^2 + V''(\varphi)).
\]
This isn’t actually quite the result, but up to the local approximation, it is.
Now we can write down the $\beta$-function. As expected, integrating out some
modes has led to new terms. We have
\[
  \Lambda \frac{dg_{2k}}{d\Lambda} = [k(d - 2) - d] g_{2k}
  - a \Lambda^{k(d - 2)} \left. \frac{\partial^{2k}}{\partial\varphi^{2k}} \log(\Lambda^2 + V''(\varphi)) \right|_{\varphi = 0}.
\]
As before, the first term does not relate to quantum corrections. It is just
due to us rescaling our normalization factors. On the other hand, the $2k$th
derivative is just a fancy way to extract the factor of $\varphi^{2k}$ in the
term.
We can actually compute these things!
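Example. Then we find that
\[
\begin{aligned}
  \Lambda \frac{dg_2}{d\Lambda} &= -2 g_2 - \frac{a g_4}{1 + g_2} \\
  \Lambda \frac{dg_4}{d\Lambda} &= (d - 4) g_4 - \frac{a g_6}{(1 + g_2)} + \frac{3 a g_4^2}{(1 + g_2)^2} \\
  \Lambda \frac{dg_6}{d\Lambda} &= (2d - 6) g_6 - \frac{a g_8}{(1 + g_2)} + \frac{15 a g_4 g_6}{(1 + g_2)^2} - \frac{30 a g_4^3}{(1 + g_2)^3}.
\end{aligned}
\]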
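Note that the first term on the right hand side is just the classical behaviour
of the dimensionless couplings. It has nothing to do with the $\chi$ field. The
remaining terms are quantum corrections ($\sim \hbar$), and each comes from
specific Feynman diagrams. For example,
\[ \frac{a g_4}{1 + g_2} \]
involves one quartic vertex, and this comes from the diagram

[Diagram: a single $\chi$ loop attached to one quartic vertex, with two external
$\varphi$ legs.]

There is one $\chi$ propagator, and this gives rise to the one $1 + g_2$ factor
in the denominator.
The first quantum term in $\beta_4$ comes from

[Diagram: a single $\chi$ loop attached to one sextic vertex, with four external
$\varphi$ legs.]

The second term comes from

[Diagram: a $\chi$ loop passing through two quartic vertices, each with two
external $\varphi$ legs.]

Note that $g_2$ is just the dimensionless mass coupling of $\chi$,
\[ g_2 = \frac{m^2}{\Lambda^2}. \]
At least perturbatively, we expect this to be a relevant coupling. It increases
as $\Lambda \to 0$. Consequently, at scales $\Lambda \ll m$, the quantum
corrections to these $\beta$-functions are strongly suppressed! This makes sense!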
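The quantum terms in the example can be cross-checked by expanding $\log(1 + V'')$ as a power series in $\varphi^2$, working in units where $\Lambda = 1$. The sketch below does this with exact rational arithmetic; the function name, the dictionary layout for the couplings and the sample values are assumptions made for illustration.

```python
from fractions import Fraction
from math import factorial


def lpa_quantum_terms(g, a=Fraction(1), kmax=3):
    """Quantum part of beta_{2k} for k = 1..kmax in the local potential
    approximation, i.e. -a * (d/dphi)^{2k} log(1 + V''(phi)) at phi = 0
    with Lambda = 1. `g` maps 2k -> g_{2k} and needs keys up to 2*kmax + 2.
    """
    denom = 1 + Fraction(g[2])
    # V'' = g_2 + sum_j g_{2j+2} phi^{2j} / (2j)!, so with t = phi^2
    # log(1 + V'') = log(1 + g_2) + log(1 + sum_j u_j t^j).
    u = [Fraction(0)] + [
        Fraction(g[2 * j + 2]) / (factorial(2 * j) * denom)
        for j in range(1, kmax + 1)
    ]
    log_c = [Fraction(0)] * (kmax + 1)
    for n in range(1, kmax + 1):
        # From (1 + u) L' = u':  n L_n = n u_n - sum_{j<n} (n - j) L_{n-j} u_j.
        s = sum(((n - j) * log_c[n - j] * u[j] for j in range(1, n)), Fraction(0))
        log_c[n] = u[n] - s / n
    # The coefficient of phi^{2k} times (2k)! is the 2k-th derivative at 0.
    return {2 * k: -a * factorial(2 * k) * log_c[k] for k in range(1, kmax + 1)}


# Sample couplings (assumed): g_2 = 1, g_4 = 2, g_6 = 3, g_8 = 5, with a = 1.
sample = lpa_quantum_terms({2: 1, 4: 2, 6: 3, 8: 5})
print(sample)
```

For these sample values the closed-form expressions in the example give $-g_4/(1 + g_2) = -1$, $-g_6/(1 + g_2) + 3g_4^2/(1 + g_2)^2 = 3/2$ and $-g_8/(1 + g_2) + 15 g_4 g_6/(1 + g_2)^2 - 30 g_4^3/(1 + g_2)^3 = -10$, which is what the series expansion reproduces.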
The Gaussian fixed point
From the formulae we derived, we saw that there is only one critical point, with
$g_{2k}^* = 0$ for all $k \geq 2$. This is free since there are no vertices at
all, and hence no corrections can happen. This is just the free theory.
In a neighbourhood of this critical point, we can expand the $\beta$-functions
in lowest order in $\delta g_i = g_i - g_i^*$. We have
\[
  \beta_{2k} = \Lambda \frac{\partial g_{2k}}{\partial \Lambda} = [k(d - 2) - d] g_{2k} - a g_{2k + 2}.
\]
Writing this linearized $\beta$-function as
\[ \beta_{2i} = B_{ij} g_{2j}, \]
we see that $B_{ij}$ is upper triangular, and hence its eigenvalues are just the
diagonal entries, which are
\[ k(d - 2) - d = 2k - 4, \]
in four dimensions.
So vertices $\varphi^{2k}$ with $k \geq 3$ are irrelevant. If we turn them on at
some scale, they become negligible as we fall towards the infrared. The mass
term $g_2$ is relevant, as we said before, so even a small mass becomes
increasingly significant in the infrared. Of course, we are making these
conclusions based on a rather perturbative way of computing the
$\beta$-function, and our predictions about what happens at the far infrared
should be taken with a grain of salt. However, we can go back to check our
formulation, and see that our conclusion still holds.
The interesting term is $\varphi^4$, which is marginal in $d = 4$ to lowest
order. This means we have to go to higher order. To next non-trivial order, we
have
\[ \Lambda \frac{dg_4}{d\Lambda} = 3 a g_4^2 + O(g_4^2 g_2), \]
where we neglected $g_6$ as it is irrelevant. Using the specific value
$a = \frac{1}{16\pi^2}$ in $d = 4$, we find that, to this order,
\[ \frac{1}{g_4(\Lambda)} = C - \frac{3}{16\pi^2} \log \Lambda. \]
Equivalently, we have
\[ g_4(\Lambda) = \frac{16\pi^2}{3} \left( \log\left( \frac{\mu}{\Lambda} \right) \right)^{-1} \]
for some scale $\mu$. If we have no higher order terms, then for the theory to
make sense, we must have $g_4 > 0$. This implies that we must pick $\mu > \Lambda$.
How does this coefficient run? Our coupling $g_4$ is marginal to leading order,
and consequently it doesn’t run as some power of $\Lambda$. It runs only
logarithmically in $\Lambda$.
This coupling is marginally irrelevant. In the infrared limit, as we take
$\Lambda \to 0$, we find $g_4 \to 0$, and so we move towards the Gaussian fixed
point. This is rather boring.
On the other hand, if we take $\Lambda \to \infty$, then eventually
$\frac{\mu}{\Lambda}$ hits 1, and we divide by zero. So our perturbation theory
breaks! Notice that we are not saying that our theory goes out of control as
$\Lambda \to \infty$. This perturbation theory breaks at some finite energy
scale!
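To see the logarithmic running and the finite breakdown scale concretely, one can integrate $\Lambda\, dg_4/d\Lambda = 3 a g_4^2$ numerically and compare with the closed form. This is only a sketch of the leading-order equation in $d = 4$: the function names, the starting value $g_4(\Lambda_0) = 1$ and the step count are illustrative assumptions.

```python
import math

A4 = 1.0 / (16.0 * math.pi ** 2)  # value of a in d = 4


def g4_exact(lam, lam0, g0):
    """Closed-form solution of Lambda dg4/dLambda = 3 a g4^2 with g4(lam0) = g0."""
    return 1.0 / (1.0 / g0 - 3.0 * A4 * math.log(lam / lam0))


def g4_numeric(lam, lam0, g0, steps=10000):
    """Same equation integrated in t = log(Lambda) with a classical RK4 stepper."""
    def rhs(g):
        return 3.0 * A4 * g * g

    h = math.log(lam / lam0) / steps
    g = g0
    for _ in range(steps):
        k1 = rhs(g)
        k2 = rhs(g + 0.5 * h * k1)
        k3 = rhs(g + 0.5 * h * k2)
        k4 = rhs(g + h * k3)
        g += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return g


def landau_scale(lam0, g0):
    """Scale mu at which 1/g4 reaches zero in the leading-order solution."""
    return lam0 * math.exp(1.0 / (3.0 * A4 * g0))


# Flow from Lambda_0 = 1 down to Lambda = 10^-3 (assumed sample values).
print(g4_exact(1e-3, 1.0, 1.0), g4_numeric(1e-3, 1.0, 1.0))
print(landau_scale(1.0, 1.0))
```

Lowering the cutoff by three orders of magnitude changes $g_4$ only mildly, while the scale $\mu$ at which the leading-order solution blows up is finite but exponentially far above $\Lambda_0$ when $g_4(\Lambda_0)$ is small.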
Recall that last term, we were studying $\varphi^4$ theory. We didn’t really
run into trouble, because we only worked at tree level (and hence weren’t doing
quantum field theory). But if we actually try to do higher loop integrals, then
everything breaks down. The $\varphi^4$ theory doesn’t actually exist.
The Wilson–Fisher critical point
Last time, we ignored all derivative terms, and we found, disappointingly, that
the only fixed point we can find is the free theory. This was bad.
Wilson and Fisher, motivated by condensed matter physics rather than fundamental
physics, found another non-trivial fixed point. What they did was rather
peculiar. They set the dimension to $d = 4 - \varepsilon$, for some small values
of $\varepsilon$. This might seem rather absurd, because non-integral dimensions
do not exist (unless we want to talk about fractals, but doing physics on
fractals is hard), but we can still do manipulations formally, and see if we get
anything sensible.
They proved that there exists a new fixed point with
\[
\begin{aligned}
  g_2^* &= -\frac{1}{6} \varepsilon + O(\varepsilon^2) \\
  g_4^* &= \frac{\varepsilon}{3a} + O(\varepsilon^2) \\
  g_{2k}^* &\sim O(\varepsilon^k)
\end{aligned}
\]
for $k \geq 3$. To study the behaviour near this critical point, we again expand
\[ g_i = g_i^* + \delta g_i \]
in the $\beta$-function we found earlier to lowest non-trivial order, this time
expanding around the Wilson–Fisher fixed point.
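If we do this, then in the $(g_2, g_4)$ subspace, we find that
\[
  \Lambda \frac{\partial}{\partial \Lambda} \begin{pmatrix} \delta g_2 \\ \delta g_4 \end{pmatrix}
  = \begin{pmatrix} \frac{\varepsilon}{3} - 2 & -a \left( 1 + \frac{\varepsilon}{6} \right) \\ 0 & \varepsilon \end{pmatrix}
  \begin{pmatrix} \delta g_2 \\ \delta g_4 \end{pmatrix}.
\]
The eigenvalues are $\frac{\varepsilon}{3} - 2$ and $\varepsilon$, with
eigenvectors
\[
  \sigma_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad
  \sigma_2 = \begin{pmatrix} -a \left( 3 + \frac{\varepsilon}{2} \right) \\ 2 (3 + \varepsilon) \end{pmatrix}.
\]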
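Notice that while the mass term itself is an eigenvector, the quartic coupling
is not! Using the asymptotic expansion
\[ \Gamma\left( -\frac{\varepsilon}{2} \right) \sim -\frac{2}{\varepsilon} - \gamma + O(\varepsilon), \]
where $\gamma \approx 0.577$ is the Euler–Mascheroni constant, plus the fact
that $\Gamma(x + 1) = x \Gamma(x)$, we find that in $d = 4 - \varepsilon$, we
have
\[
  a = \left. \frac{1}{(4\pi)^{d/2}} \frac{1}{\Gamma(d/2)} \right|_{d = 4 - \varepsilon}
  \sim \frac{1}{16\pi^2} + \frac{\varepsilon}{32\pi^2} (1 - \gamma + \log 4\pi) + O(\varepsilon^2).
\]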
Since $\varepsilon$ is small, we know that the eigenvalue of $\sigma_1$ is
negative. This means it is a relevant operator. On the other hand, $\sigma_2$
is an irrelevant operator. We thus have the following picture of the RG flow:

[Figure: RG flow in the $(g_2, g_4)$ plane, divided into four regions labelled
I, II, III and IV.]
We see that theories in region I are massless and free in the deep UV, but flow
to become massive and interacting in the IR. Theories in region II behave
similarly, but now the mass coefficient is negative. Consequently, $\varphi = 0$
is a local maximum of the effective potential. These theories tend to exhibit
spontaneous symmetry breaking.
Finally, theories in III and IV do not have a sensible continuum limit as
both couplings increase without bound. So at least within perturbation theory,
these theories don’t exist. They can only manifest themselves as effective field
theories.
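The location of the Wilson–Fisher point and the entries of the stability matrix can be checked numerically. The sketch below uses the $\beta_2$ and $\beta_4$ found earlier with $g_6$ and all higher couplings set to zero, which is an assumption of this illustration; in that truncation the non-trivial zero is available in closed form, $g_2^* = -\varepsilon/(6 + \varepsilon)$ and $g_4^* = \varepsilon (1 + g_2^*)^2 / (3a)$. All function names are hypothetical.

```python
import math


def a_coeff(d):
    """a = 1 / ((4 pi)^{d/2} Gamma(d/2))."""
    return 1.0 / ((4.0 * math.pi) ** (d / 2.0) * math.gamma(d / 2.0))


def betas(g2, g4, eps):
    """beta_2 and beta_4 in d = 4 - eps, truncated by setting g_6 = 0 (assumption)."""
    a = a_coeff(4.0 - eps)
    b2 = -2.0 * g2 - a * g4 / (1.0 + g2)
    b4 = -eps * g4 + 3.0 * a * g4 ** 2 / (1.0 + g2) ** 2
    return b2, b4


def wilson_fisher(eps):
    """Non-trivial zero of the truncated beta functions."""
    a = a_coeff(4.0 - eps)
    g2 = -eps / (6.0 + eps)
    g4 = eps * (1.0 + g2) ** 2 / (3.0 * a)
    return g2, g4


def stability_matrix(eps, h=1e-6):
    """Jacobian d(beta_i)/d(g_j) at the fixed point, by central differences."""
    point = list(wilson_fisher(eps))
    rows = []
    for i in range(2):
        row = []
        for j in range(2):
            up, down = point[:], point[:]
            up[j] += h
            down[j] -= h
            row.append((betas(up[0], up[1], eps)[i] - betas(down[0], down[1], eps)[i]) / (2.0 * h))
        rows.append(row)
    return rows


eps = 0.1
print(wilson_fisher(eps), betas(*wilson_fisher(eps), eps))
print(stability_matrix(eps))
```

The diagonal and upper-right entries reproduce $\varepsilon/3 - 2$, $\varepsilon$ and $-a(1 + \varepsilon/6)$. The lower-left entry is formally $O(\varepsilon^2)$, which is why it is written as $0$ in the matrix above, although its numerical size at finite $\varepsilon$ is enhanced by a factor of $1/a$.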
5 Perturbative renormalization
5.1 Cutoff regularization
Our discussion of renormalization has been theoretical so far. Historically, this
was not what people were studying when doing quantum field theory. Instead,
what they did was that they had to evaluate integrals in Feynman diagrams,
and the results happened to be infinite!
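For example, consider the scalar $\varphi^4$ theory in $d = 4$ dimensions, with
action given by
\[
  S[\varphi] = \int \left( \frac{1}{2} (\partial\varphi)^2 + \frac{1}{2} m^2 \varphi^2 + \frac{\lambda}{4!} \varphi^4 \right) d^4 x.
\]
We want to compute the two point function $\langle \varphi\varphi \rangle$. This
has, of course, the tree level diagram given by just the propagator. There is
also a 1-loop diagram given by

[Diagram: a $\varphi$ propagator carrying momentum $k$, with a single loop of
momentum $p$ attached at one quartic vertex.]

We will ignore the propagators coming from the legs, and just look at the loop
integral. The diagram has a symmetry factor of 2, and thus the loop integral is
given by
\[ \frac{\lambda}{2 (2\pi)^4} \int \frac{d^4 p}{p^2 + m^2}. \]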
This integral diverges. Indeed, we can integrate out the angular components,
and this becomes
\[ \frac{\lambda}{2 (2\pi)^4} \mathrm{vol}(S^3) \int \frac{|p|^3\, d|p|}{|p|^2 + m^2}. \]
The integrand tends to infinity as we take $p \to \infty$, so this clearly
diverges.
Well, this is bad, isn’t it. In light of what we have been discussing so far,
what we should do is to not view $S$ as the Lagrangian in “continuum theory”,
but instead just as a Lagrangian under some cutoff $k^2 \leq \Lambda_0^2$. Then
when doing the loop integral, instead of integrating over all $p$, we should
integrate over all $p$ such that $p^2 \leq \Lambda_0^2$. And this certainly
gives a finite answer.
But we want to take the continuum limit, and so we want to take
$\Lambda_0 \to \infty$. Of course the loop integral will diverge if we fix our
coupling constants. So we might think, based on what we learnt previously, that
we should tune the coupling constants as we go.
This is actually very hard, because we have no idea what is the “correct”
way to tune them. Historically, and practically, what people did was just to
introduce some random terms to cancel off the infinity.
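The idea is to introduce a counterterm action
\[
  S^{CT}[\varphi, \Lambda] = \hbar \int \left[ \frac{\delta Z}{2} (\partial\varphi)^2 + \frac{\delta m^2}{2} \varphi^2 + \frac{\delta\lambda}{4!} \varphi^4 \right] d^4 x,
\]
where $\delta Z$, $\delta m^2$ and $\delta\lambda$ are some functions of
$\Lambda$ to be determined. We then set the full action at scale $\Lambda$ to be
\[ S_\Lambda[\varphi] = S[\varphi] + S^{CT}[\varphi, \Lambda]. \]
This action depends on $\Lambda$. Then for any physical quantity
$\langle O \rangle$ we are interested in, we take it to be
\[
  \langle O \rangle = \lim_{\Lambda \to \infty} \Big( \langle O \rangle \text{ computed with cutoff } \Lambda \text{ and action } S_\Lambda \Big).
\]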
Note that in the counterterm action, we included a factor of $\hbar$ in front of
everything. This means in perturbation theory, the tree-level contributions from
$S^{CT}$ would be of the same order as the 1-loop diagrams in $S$.
For example, in the above 1-loop diagram, we obtain further contributions
to the quadratic terms, given by

[Diagrams: two $\varphi$ propagators with a counterterm insertion $\times$, one
labelled $k^2 \delta Z$ and one labelled $\delta m^2$.]
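We first evaluate the original loop integral properly. We use the mathematical
fact that $\mathrm{vol}(S^3) = 2\pi^2$. Then the integral is
\[
\begin{aligned}
  \frac{\lambda}{16\pi^2} \int_0^{\Lambda_0} \frac{p^3\, dp}{p^2 + m^2}
  &= \frac{\lambda m^2}{32\pi^2} \int_0^{\Lambda_0^2/m^2} \frac{x\, dx}{1 + x} \\
  &= \frac{\lambda}{32\pi^2} \left[ \Lambda_0^2 - m^2 \log\left( 1 + \frac{\Lambda_0^2}{m^2} \right) \right],
\end{aligned}
\]
where we substituted $x = p^2/m^2$ in the middle.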
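As a sanity check on the closed form, the radial integral $\int_0^{\Lambda_0} p^3/(p^2 + m^2)\, dp = \frac{1}{2}[\Lambda_0^2 - m^2 \log(1 + \Lambda_0^2/m^2)]$ can be compared against a simple quadrature. The sketch below uses a composite Simpson rule; the function names and the sample values of the cutoff and mass are assumptions for illustration.

```python
import math


def loop_integral_closed(cutoff, m):
    """Closed form of int_0^cutoff p^3 / (p^2 + m^2) dp."""
    return 0.5 * (cutoff ** 2 - m ** 2 * math.log(1.0 + cutoff ** 2 / m ** 2))


def loop_integral_simpson(cutoff, m, n=2000):
    """Composite Simpson approximation of the same integral (n must be even)."""
    h = cutoff / n

    def f(p):
        return p ** 3 / (p ** 2 + m ** 2)

    total = f(0.0) + f(cutoff)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0


# Sample values (assumed): m = 1 and two different cutoffs.
for cutoff in (10.0, 100.0):
    print(cutoff, loop_integral_closed(cutoff, 1.0), loop_integral_simpson(cutoff, 1.0))
```

The printed values grow roughly like $\Lambda_0^2/2$, which is the quadratic divergence that the counterterms are introduced to cancel.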
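Including these counter-terms, the 1-loop contribution to
$\langle \varphi\varphi \rangle$ is
\[
  \frac{\lambda}{32\pi^2} \left[ \Lambda_0^2 - m^2 \log\left( 1 + \frac{\Lambda_0^2}{m^2} \right) \right] + k^2 \delta Z + \delta m^2.
\]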
The objective is, of course, to pick $\delta Z$, $\delta m^2$, $\delta\lambda$
so that we always get finite answers in the limit. There are many ways we can
pick these quantities, and of course, different ways will give different
answers. However, what we can do is that we can fix some prescriptions for how
to pick these quantities, and this gives us a well-defined theory. Any such
prescription is known as a renormalization scheme.
It is important that we describe it this way. Each individual loop integral
still diverges as we take $\Lambda \to \infty$, as we didn’t change it. Instead,
for each fixed $\Lambda$, we have to add up, say, all the 1-loop contributions,
and then after adding up, we take the limit $\Lambda \to \infty$. Then we do get
a finite answer.
On-shell renormalization scheme
We will study one renormalization scheme, known as the on-shell renormalization
scheme. Consider the exact momentum space propagator
\[ \int d^4 x\; e^{ik\cdot x} \langle \varphi(x) \varphi(0) \rangle. \]
Classically, this is just given by
\[ \frac{1}{k^2 + m^2}, \]
where $m^2$ is the original mass term in $S[\varphi]$.
In the on-shell renormalization scheme, we pick our counterterms such that
the exact momentum space propagator satisfies the following two properties:
• It has a simple pole when $k^2 = -m_{\mathrm{phys}}^2$, where
  $m_{\mathrm{phys}}^2$ is the physical mass of the particle; and
• The residue at this pole is 1.
Note that we are viewing $k^2$ as a single variable when considering poles.
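To find the right values of $\delta m^2$ and $\delta Z$, we recall that we had
the one-particle irreducible graphs, which we write as

$\Pi(k^2) =$ [Diagram: a 1PI blob with momentum $k$ flowing through it and
dashed external legs],

where the dashed line indicates that we do not include the propagator
contributions. For example, this 1PI includes graphs of the form

[Diagrams: loop corrections to the two-point function with external momentum $k$]

as well as counterterm contributions

[Diagrams: counterterm insertions $\times$ labelled $k^2 \delta Z$ and $\delta m^2$.]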
Then the exact momentum propagator is the sum of the free propagator and the
chains with one, two, ... 1PI insertions:
\[
\begin{aligned}
  \Delta(k^2) &= \frac{1}{k^2 + m^2} - \frac{1}{k^2 + m^2} \Pi(k^2) \frac{1}{k^2 + m^2}
  + \frac{1}{k^2 + m^2} \Pi(k^2) \frac{1}{k^2 + m^2} \Pi(k^2) \frac{1}{k^2 + m^2} - \cdots \\
  &= \frac{1}{k^2 + m^2 + \Pi(k^2)}.
\end{aligned}
\]
The negative sign arises because we are working in Euclidean signature with
path integrals weighted by $e^{-S}$.
Thus, if we choose our original parameter $m^2$ to be the measured
$m_{\mathrm{phys}}^2$, then in the on-shell scheme, we want
\[ \Pi(-m_{\mathrm{phys}}^2) = 0, \]
and also
\[ \left. \frac{\partial}{\partial k^2} \Pi(k^2) \right|_{k^2 = -m_{\mathrm{phys}}^2} = 0. \]
To 1-loop, the computations at the beginning of the chapter tells us
Π(k
2
) = δm
2
+ k
2
δZ +
λ
32π
2
Λ
2
0
m
2
log
1 +
Λ
2
0
m
2

.
We see that no 1-loop contribution involves $k$, which we can see in our unique
1-loop diagram, because the loop integral doesn't involve $k$ in any way.
The second condition therefore forces $\delta Z = 0$ at this order, and the first
condition then fixes $\delta m^2$:
\begin{align*}
  \delta Z &= O(\lambda^2)\\
  \delta m^2 &= -\frac{\lambda}{32\pi^2} \left(\Lambda_0^2 - m^2 \log \left(1 + \frac{\Lambda_0^2}{m^2}\right)\right) + O(\lambda^2).
\end{align*}
Here to 1-loop, we don't need wavefunction renormalization, but this is merely a
coincidence, not a general phenomenon.
Of course, if we consider higher loop diagrams, then we have further
corrections to the counterterms.
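As a sanity check on the 1-loop formula, the following sketch integrates the cutoff tadpole numerically and compares it with the closed form $\frac{\lambda}{32\pi^2}(\Lambda_0^2 - m^2 \log(1 + \Lambda_0^2/m^2))$. The function names and the sample values of $\lambda$, $m$ and $\Lambda_0$ are illustrative assumptions, not part of the lectures.

```python
import math


def tadpole_cutoff_numeric(lam, m, cutoff, steps=20000):
    """(lam/2) * int d^4p/(2 pi)^4 1/(p^2 + m^2) over |p| <= cutoff.

    The angular integral gives vol(S^3) = 2 pi^2; the remaining radial
    integral of p^3/(p^2 + m^2) is done with Simpson's rule (steps even).
    """
    def f(p):
        return p**3 / (p**2 + m**2)

    h = cutoff / steps
    total = f(0.0) + f(cutoff)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    radial = total * h / 3
    return 0.5 * lam * 2 * math.pi**2 / (2 * math.pi)**4 * radial


def tadpole_cutoff_closed(lam, m, cutoff):
    """Closed form quoted in the text for the same integral."""
    return lam / (32 * math.pi**2) * (
        cutoff**2 - m**2 * math.log(1 + cutoff**2 / m**2)
    )


def delta_m2_on_shell(lam, m, cutoff):
    """1-loop on-shell mass counterterm: it cancels the loop contribution."""
    return -tadpole_cutoff_closed(lam, m, cutoff)


if __name__ == "__main__":
    lam, m, cutoff = 0.1, 1.0, 10.0  # illustrative values only
    print(tadpole_cutoff_numeric(lam, m, cutoff))
    print(tadpole_cutoff_closed(lam, m, cutoff))
    print(delta_m2_on_shell(lam, m, cutoff))
```

The counterterm grows like $\Lambda_0^2$, which is the quadratic divergence of the scalar mass in this regularization.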
5.2 Dimensional regularization
People soon realized this was a terrible way to get rid of the infinities. Doing
integrals from $0$ to $\Lambda_0$ is usually much harder than integrals from $0$ to $\infty$.
Moreover, in gauge theory, it is (at least naively) incompatible with gauge
invariance. Indeed, say in U(1) gauge theory, a transformation
\[
  \psi(x) \to e^{i\lambda(x)} \psi(x)
\]
can potentially introduce a lot of high energy modes. So our theory will not be
gauge invariant.
For these reasons, people invented a different way to get rid of infinite
integrals. This is known as dimensional regularization. This method of getting
rid of infinities doesn't fit into the ideas we've previously discussed. It is just
magic. Moreover, this method only works perturbatively: it tells us how to
get rid of infinities in loops. It doesn't give any definition of a regularized path
integral measure, or describe any full, coherent non-perturbative theory that
predicts the results.
Yet, this method avoids all the problems we mentioned above, and is rather
easy to use. Hence, we will mostly use dimensional regularization in the rest of
the course.
To do dimensional regularization, we will study our theory in an arbitrary
dimension $d$, and do the integrals of loop calculations. For certain dimensions,
the integral will converge, and give us a sensible answer. For others, it won't. In
particular, for $d = 4$, it probably won't (or else we have nothing to do!).
After obtaining the results for the dimensions where the integral converges, we
attempt to analytically continue them as a function of $d$. Of course, the analytic
continuation is non-unique (e.g. we can add any multiple of $\sin \pi d$ and still
get the same result for integer $d$), but there is often an “obvious” choice. This
does not solve our problem yet: this analytic continuation tends to still have a
pole at $d = 4$. However, after doing this analytic continuation, it becomes
clearer how we are supposed to get rid of the pole.
Note that we are not in any way suggesting the universe has a non-integer
dimension, or that non-integer dimensions even make any sense at all. This is
just a mathematical tool to get rid of infinities.
Let’s actually do it. Consider the same theory as before, but in arbitrary
dimensions:
\[
  S[\phi] = \int d^d x \left(\frac{1}{2} (\partial \phi)^2 + \frac{1}{2} m^2 \phi^2 + \frac{\lambda}{4!} \phi^4\right).
\]
In $d$ dimensions, this $\lambda$ is no longer dimensionless. We have
\[
  [\phi] = \frac{d - 2}{2}.
\]
So for the action to be dimensionless, we must have
\[
  [\lambda] = 4 - d.
\]
Thus, we write
\[
  \lambda = \mu^{4 - d} g(\mu)
\]
for some arbitrary mass scale $\mu$. This $\mu$ is not a cutoff, since we are not
going to impose one. It is just some arbitrary mass scale, so that $g(\mu)$ is now
dimensionless.
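We can then compute the loop integral
\begin{align*}
  \text{(1-loop tadpole with loop momentum $p$)} &= \frac{1}{2} g \mu^{4 - d} \int \frac{d^d p}{(2\pi)^d} \frac{1}{p^2 + m^2}\\
  &= \frac{g \mu^{4 - d}}{2 (2\pi)^d} \mathrm{vol}(S^{d - 1}) \int_0^\infty \frac{p^{d - 1}\; dp}{p^2 + m^2}.
\end{align*}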
We note the mathematical fact that
\[
  \mathrm{vol}(S^{d - 1}) = \frac{2 \pi^{d/2}}{\Gamma(d/2)}.
\]
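The formula $\mathrm{vol}(S^{d-1}) = 2\pi^{d/2}/\Gamma(d/2)$ can be checked against the familiar integer cases, and then evaluated at a non-integer $d$. This is a minimal sketch; the function name `sphere_volume` is just an illustrative choice.

```python
import math


def sphere_volume(d):
    """vol(S^{d-1}) = 2 pi^{d/2} / Gamma(d/2), valid for any real d > 0."""
    return 2 * math.pi ** (d / 2) / math.gamma(d / 2)


if __name__ == "__main__":
    # Integer cases: S^0 is two points, S^1 a circle, S^2 a sphere.
    print(sphere_volume(1), 2.0)
    print(sphere_volume(2), 2 * math.pi)
    print(sphere_volume(3), 4 * math.pi)
    print(sphere_volume(4), 2 * math.pi**2)
    # The same expression still makes sense away from the integers.
    print(sphere_volume(3.5))
```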
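While $S^{d - 1}$ does not make sense when $d$ is not an integer, the right-hand
expression does. So replacing the volume with this expression, we can analytically
continue this to all $d$, and obtain
\begin{align*}
  \mu^{4 - d} \int_0^\infty \frac{p^{d - 1}\; dp}{p^2 + m^2} &= \frac{1}{2} \mu^{4 - d} \int_0^\infty \frac{(p^2)^{d/2 - 1}\; dp^2}{p^2 + m^2}\\
  &= \frac{m^2}{2} \left(\frac{\mu}{m}\right)^{4 - d} \Gamma\left(\frac{d}{2}\right) \Gamma\left(1 - \frac{d}{2}\right).
\end{align*}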
The detailed computations are entirely uninteresting, but if one were to do this
manually, it is helpful to note that
\[
  \int_0^1 u^{s - 1} (1 - u)^{t - 1}\; du = \frac{\Gamma(s) \Gamma(t)}{\Gamma(s + t)}.
\]
The appearance of $\Gamma$-functions is typical in dimensional regularization.
Combining all factors, we find that
\[
  \text{(1-loop tadpole)} = \frac{g m^2}{2 (4\pi)^{d/2}} \left(\frac{\mu}{m}\right)^{4 - d} \Gamma\left(1 - \frac{d}{2}\right).
\]
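This formula makes sense for any value of $d$ in the complex plane, away from
the poles of the $\Gamma$-function. Let’s see what happens when we try to approach
$d = 4$. We set $d = 4 - \varepsilon$, and use the Laurent series
\begin{align*}
  \Gamma(\varepsilon) &= \frac{1}{\varepsilon} - \gamma + O(\varepsilon)\\
  x^{\varepsilon/2} &= 1 + \frac{\varepsilon}{2} \log x + O(\varepsilon^2),
\end{align*}
plus the following usual property of the $\Gamma$ function:
\[
  \Gamma(x + 1) = x \Gamma(x).
\]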
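Then, as $d \to 4$, we can asymptotically expand
\[
  \text{(1-loop tadpole)} = -\frac{g m^2}{32 \pi^2} \left(\frac{2}{\varepsilon} - \gamma + 1 + \log \left(\frac{4\pi \mu^2}{m^2}\right) + O(\varepsilon)\right),
\]
where the constant $1$ comes from $\Gamma(-1 + \varepsilon/2) = \Gamma(\varepsilon/2)/(-1 + \varepsilon/2)$.
Unsurprisingly, this diverges as $\varepsilon \to 0$, as $\Gamma$ has a (simple) pole at $-1$. The
pole in $\frac{1}{\varepsilon}$ reflects the divergence of this loop integral as $\Lambda_0 \to \infty$ in the
cutoff regularization.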
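The expansion near $d = 4$ can be checked numerically by comparing the exact $\Gamma$-function expression for the tadpole with its Laurent series $-\frac{g m^2}{32\pi^2}\left(\frac{2}{\varepsilon} - \gamma + 1 + \log\frac{4\pi\mu^2}{m^2}\right)$ at small $\varepsilon$. This is only a sketch: the function names and the sample values of $g$, $m$, $\mu$ are assumptions made for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def tadpole_dimreg(g, m, mu, d):
    """Exact expression g m^2 / (2 (4 pi)^{d/2}) (mu/m)^{4-d} Gamma(1 - d/2)."""
    return (g * m**2 / (2 * (4 * math.pi) ** (d / 2))
            * (mu / m) ** (4 - d) * math.gamma(1 - d / 2))


def tadpole_laurent(g, m, mu, eps):
    """Laurent expansion around d = 4 - eps, dropping O(eps) terms."""
    bracket = (2 / eps - EULER_GAMMA + 1
               + math.log(4 * math.pi * mu**2 / m**2))
    return -g * m**2 / (32 * math.pi**2) * bracket


if __name__ == "__main__":
    g, m, mu = 1.0, 1.0, 2.0  # illustrative values only
    for eps in (1e-1, 1e-2, 1e-3):
        exact = tadpole_dimreg(g, m, mu, 4 - eps)
        approx = tadpole_laurent(g, m, mu, eps)
        # The difference should shrink linearly with eps.
        print(eps, exact, approx, exact - approx)
```

The difference between the two columns shrinks in proportion to $\varepsilon$, as expected for a neglected $O(\varepsilon)$ term.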
We need to obtain a finite limit as $d \to 4$ by adding counterterms. This time,
the counterterms are not dependent on a cutoff, because there isn’t one. They
are also not (explicitly) dependent on the mass scale $\mu$, because $\mu$ was arbitrary.
Instead, they are now functions of $\varepsilon$.
So again, we introduce a new term, the mass counterterm vertex
[Diagram: $\phi$ propagator with a counterterm insertion $\times$ of value $\delta m^2$.]
Again, we need to make a choice of this. We need to choose a renormalization
scheme. We can again use the on-shell renormalization. However, we could have
done on-shell renormalization without doing all this weird dimension thing. Once
we have done it, there are more convenient choices available in dimensional
regularization:
(i) Minimal subtraction (MS): we choose
\[
  \delta m^2 = \frac{g m^2}{16 \pi^2 \varepsilon}
\]
so as to cancel just the pole.
(ii) Modified minimal subtraction ($\overline{\mathrm{MS}}$): We set
\[
  \delta m^2 = \frac{g m^2}{32 \pi^2} \left(\frac{2}{\varepsilon} - \gamma + \log 4\pi\right)
\]
to get rid of some pesky constants, because no one likes the
Euler–Mascheroni constant.
In practice, we are mostly going to use the $\overline{\mathrm{MS}}$ scheme, because we really, really,
hate the Euler–Mascheroni constant.
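Note that at the end, after subtracting off these counter-terms and taking
$\varepsilon \to 0$, there is still an explicit $\mu$ dependence! In this case, we are left with
\[
  -\frac{g m^2}{32 \pi^2} \left(\log \left(\frac{\mu^2}{m^2}\right) + 1\right),
\]
where the constant $1$ comes from the expansion
$\Gamma(-1 + \varepsilon/2) = -\left(\frac{2}{\varepsilon} - \gamma + 1\right) + O(\varepsilon)$.
Of course, the actual physical predictions of the theory must not depend on
$\mu$, because $\mu$ was an arbitrary mass scale. This means $g$ must genuinely be a
function of $\mu$, just like when we did renormalization!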
What is the physical interpretation of this? We might think that since $\mu$
is arbitrary, there is no significance. However, it turns out when we do actual
computations with perturbation theory, the quantum corrections tend to look like
$\log \left(\frac{\Lambda^2}{\mu^2}\right)$, where $\Lambda$ is the relevant “energy scale”. Thus, if we want perturbation
theory to work well (or the quantum corrections to be small), we must pick $\mu$ to
be close to the “energy scale” of the scenario we are interested in. Thus, we can
still think of $g(\mu)$ as the coupling constant of the theory “at scale $\mu$”.
5.3 Renormalization of the $\phi^4$ coupling
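We now try to renormalize the $\phi^4$ coupling. At 1-loop, in momentum space, we
receive contributions from
[Diagram: 1-loop four-point graph with external points $x_1, \ldots, x_4$, incoming momenta $k_1$, $k_2$, and internal momenta $p$ and $p + k_1 + k_2$.]
and also a counter-term:
[Diagram: four-point counterterm vertex $\times$.]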
We first do the first loop integral. It is given by
\[
  \frac{g^2 \mu^{4 - d}}{2} \int \frac{d^d p}{(2\pi)^d} \frac{1}{p^2 + m^2} \frac{1}{(p + k_1 + k_2)^2 + m^2}.
\]
This is a complicated beast. Unlike the loop integral we did for the propagator,
this loop integral knows about the external momenta $k_1$, $k_2$. We can imagine
ourselves expanding the integrand in $k_1$ and $k_2$, and then the result involves
some factors of $k_1$ and $k_2$. If we invert the Fourier transform to get back to
position space, then multiplication by $k_i$ becomes differentiation. So these give
contributions to terms of the form, say, $(\partial \phi)^2 \phi^2$, in addition to the $\phi^4$, which is
what we really care about.
One can check that only the $\phi^4$ contribution is divergent in $d = 4$. This
reflects the fact that these higher operators are all irrelevant.
So we focus on the contribution to $\phi^4$. This is $k_i$-independent, and is given
by the leading part of the integral:
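\[
  \frac{g^2 \mu^{4 - d}}{2 (2\pi)^d} \int \frac{d^d p}{(p^2 + m^2)^2} = \frac{1}{2} \frac{g^2}{(4\pi)^{d/2}} \left(\frac{\mu}{m}\right)^{4 - d} \Gamma\left(2 - \frac{d}{2}\right).
\]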
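How about the other two loop integrals? They give different integrals, but they
differ only in where the $k_i$ appear in the denominator. So up to leading order,
they are the same thing. So we just multiply our result by 3, and find that the
loop contributions are
\begin{align*}
  &-\delta\lambda + \frac{3 g^2}{2 (4\pi)^{d/2}} \left(\frac{\mu}{m}\right)^{4 - d} \Gamma\left(2 - \frac{d}{2}\right)\\
  &\sim -\delta\lambda + \frac{3 g^2}{32 \pi^2} \left(\frac{2}{\varepsilon} - \gamma + \log \left(\frac{4\pi \mu^2}{m^2}\right)\right) + O(\varepsilon).
\end{align*}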
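Therefore, in the $\overline{\mathrm{MS}}$ scheme, we choose
\[
  \delta\lambda = \frac{3 g^2}{32 \pi^2} \left(\frac{2}{\varepsilon} - \gamma + \log 4\pi\right),
\]
and so up to $O(\hbar)$, the loop contribution to the $\phi^4$ coupling is
\[
  \frac{3 g^2}{32 \pi^2} \log \left(\frac{\mu^2}{m^2}\right).
\]
So in $\lambda\phi^4$ theory, to subleading order, with an (arbitrary) dimensional
regularization scale $\mu$, we have
\[
  \text{(tree vertex)} + \text{(1-loop graph)} + \text{two more} + \text{(counterterm $\times$)} + \cdots
  \sim -\frac{g}{\hbar} + \frac{3 g^2}{32 \pi^2} \log \left(\frac{\mu^2}{m^2}\right) + O(\hbar).
\]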
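Now note that nothing physical (such as this 4-point function) can depend on
our arbitrary scale $\mu$. Consequently, the coupling $g(\mu)$ must run so that
\[
  \mu \frac{\partial}{\partial \mu} \left(-\frac{g}{\hbar} + \frac{3 g^2}{32 \pi^2} \log \left(\frac{\mu^2}{m^2}\right) + O(\hbar)\right) = 0.
\]
Since $\mu \frac{\partial}{\partial \mu} \log \left(\frac{\mu^2}{m^2}\right) = 2$, this tells us that to lowest order we must have
\[
  \beta(g) = \mu \frac{\partial g}{\partial \mu} = \frac{3 g^2 \hbar}{16 \pi^2}.
\]
Note that this is the same $\beta$-function as we had when we did local potential
approximation!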
We can solve this equation for the coupling $g(\mu)$, and find that the couplings
at scales $\mu$ and $\mu'$ are related by
\[
  \frac{1}{g(\mu)} = \frac{1}{g(\mu')} + \frac{3}{16 \pi^2} \log \left(\frac{\mu'}{\mu}\right).
\]
Thus, if we find that at energy scale $\mu$, the coupling takes value $g_0$, then at the
scale
\[
  \mu' = \mu e^{16 \pi^2/(3 g_0)},
\]
the coupling $g(\mu')$ diverges. Our theory breaks down in the UV!
This is to be taken with a pinch of salt, because we are just doing perturbation
theory, with a semi-vague interpretation of what the running of $g$ signifies.
So claiming that $g(\mu')$ diverges only says something about our perturbative
approximation, and not the theory itself. Unfortunately, more sophisticated
non-perturbative analysis of the theory also suggests the theory doesn’t exist.
5.4 Renormalization of QED
That was pretty disappointing. How about the other theory we studied in QFT,
namely QED? Does it exist?
We again try to do dimensional regularization. This will be slightly
subtle, because in QED, we have the universe and also a spinor space. In genuine
QED, both of these have dimension 4. If we were to do this properly, we would
have to change the dimensions of both of them to $d$, and then do computations.
In this case, it is okay to just keep working with 4-dimensional spinors. We
can just think of this as picking a slightly different renormalization scheme than
$\overline{\mathrm{MS}}$.
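In $d$ dimensions, the classical action for QED in Euclidean signature is
\[
  S[A, \psi] = \int d^d x \left(\frac{1}{4 e^2} F_{\mu\nu} F^{\mu\nu} + \bar{\psi} \slashed{D} \psi + m \bar{\psi} \psi\right),
\]
where
\[
  \slashed{D} \psi = \gamma^\mu (\partial_\mu + i A_\mu) \psi.
\]
Note that in the Euclidean signature, we have lost a factor of $i$, and also we have
\[
  \{\gamma^\mu, \gamma^\nu\} = 2 \delta^{\mu\nu}.
\]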
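To do perturbation theory, we’d like the photon kinetic term to be canonically
normalized. So we introduce
\[
  A^{\mathrm{new}}_\mu = \frac{1}{e} A^{\mathrm{old}}_\mu,
\]
and then
\[
  S[A^{\mathrm{new}}, \psi] = \int d^d x \left(\frac{1}{4} F_{\mu\nu} F^{\mu\nu} + \bar{\psi} (\slashed{\partial} + i e \slashed{A}) \psi + m \bar{\psi} \psi\right).
\]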
The original photon field necessarily has $[A^{\mathrm{old}}] = 1$, as it goes together with the
derivative. So in $d$ dimensions, we have
\[
  [e] = \frac{4 - d}{2}.
\]
Thus, we find
\[
  [A^{\mathrm{new}}] = [A^{\mathrm{old}}] - [e] = \frac{d - 2}{2}.
\]
From now on, unless otherwise specified, we will use $A^{\mathrm{new}}$, and just call it $A$.
As before, we introduce a dimensionless coupling constant in terms of an
arbitrary scale $\mu$ by
\[
  e^2 = \mu^{4 - d} g^2(\mu).
\]
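Let’s consider the exact photon propagator in momentum space, i.e.
\[
  \Delta_{\mu\nu}(q) = \int d^d x\; e^{iq\cdot x} \langle A_\mu(x) A_\nu(0)\rangle
\]
in Lorenz gauge $\partial^\mu A_\mu = 0$.
We can expand this perturbatively as
\[
  \text{(free photon propagator)} + \text{(one 1PI insertion)} + \text{(two 1PI insertions)} + \cdots
\]
The first term is the classical propagator
\[
  \Delta^0_{\mu\nu}(q) = \frac{1}{q^2} \left(\delta_{\mu\nu} - \frac{q_\mu q_\nu}{q^2}\right),
\]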
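and then as before, we can write the whole thing as
\[
  \Delta_{\mu\nu}(q) = \Delta^0_{\mu\nu}(q) + \Delta^0_{\mu}{}^{\rho}(q)\, \Pi_\rho{}^\sigma(q)\, \Delta^0_{\sigma\nu}(q) + \Delta^0_{\mu}{}^{\rho}\, \Pi_\rho{}^\sigma\, \Delta^0_{\sigma}{}^{\lambda}\, \Pi_\lambda{}^\kappa\, \Delta^0_{\kappa\nu} + \cdots,
\]
where $\Pi_{\rho\sigma}(q)$ is the photon self-energy, given by the one-particle irreducible
graphs.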
We will postpone the computation of the self-energy for the moment, and
just quote that the result is
\[
  \Pi_\rho{}^\sigma(q) = q^2 \left(\delta_\rho^\sigma - \frac{q_\rho q^\sigma}{q^2}\right) \pi(q^2)
\]
for some scalar function $\pi(q^2)$. This operator
\[
  P_\rho^\sigma = \delta_\rho^\sigma - \frac{q_\rho q^\sigma}{q^2}
\]
is a projection operator onto transverse polarizations. In particular, like any
projection operator, it is idempotent:
\[
  P_\rho^\sigma P_\sigma^\lambda = P_\rho^\lambda.
\]
This allows us to simplify the expression of the exact propagator, and write it as
\[
  \Delta_{\mu\nu}(q) = \Delta^0_{\mu\nu} (1 + \pi(q^2) + \pi^2(q^2) + \cdots) = \frac{\Delta^0_{\mu\nu}}{1 - \pi(q^2)}.
\]
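The properties of the transverse projector $P_\rho^\sigma = \delta_\rho^\sigma - q_\rho q^\sigma/q^2$ that make the geometric series collapse can be checked directly for a sample Euclidean momentum. This is a small sketch with plain nested lists; the helper names and the sample vector are illustrative assumptions.

```python
def transverse_projector(q):
    """Matrix P_ij = delta_ij - q_i q_j / q^2 for a nonzero Euclidean vector q."""
    q2 = sum(c * c for c in q)
    n = len(q)
    return [[(1.0 if i == j else 0.0) - q[i] * q[j] / q2 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def max_abs_diff(a, b):
    """Largest entrywise difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


if __name__ == "__main__":
    q = [1.0, 2.0, -0.5, 3.0]  # arbitrary sample momentum in d = 4
    P = transverse_projector(q)
    print("idempotent up to", max_abs_diff(matmul(P, P), P))
    print("P q =", [sum(P[i][j] * q[j] for j in range(4)) for i in range(4)])
    print("trace =", sum(P[i][i] for i in range(4)))  # d - 1 transverse directions
```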
Just as the classical propagator came from the kinetic term
\[
  S_{\mathrm{kin}} = \frac{1}{4} \int F_{\mu\nu} F^{\mu\nu}\; d^d x = \frac{1}{2} \int q^2 \left(\delta_{\mu\nu} - \frac{q_\mu q_\nu}{q^2}\right) \tilde{A}_\mu(-q) \tilde{A}_\nu(q)\; d^d q,
\]
so too our exact propagator is what we’d get from an action whose quadratic
term is
\[
  S_{\mathrm{quant}} = \frac{1}{2} \int (1 - \pi(q^2))\, q^2 \left(\delta_{\mu\nu} - \frac{q_\mu q_\nu}{q^2}\right) \tilde{A}_\mu(-q) \tilde{A}_\nu(q)\; d^d q.
\]
Expanding $\pi(q^2)$ around $q^2 = 0$, the leading term just corrects the kinetic term
of the photon. So we have
\[
  S_{\mathrm{quant}} \sim \frac{1}{4} (1 - \pi(0)) \int F_{\mu\nu} F^{\mu\nu}\; d^d x + \text{higher derivative terms}.
\]
One can check that the higher derivative terms will be irrelevant in $d = 4$.
Computing the photon self-energy
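We now actually compute the self energy. This is mostly just doing some horrible
integrals.
To leading order, using the classical action (i.e. not including the
counterterms), we have
\begin{align*}
  \Pi^{\rho\sigma}_{\text{1-loop}} &= \text{(fermion loop with external photon legs $A^\sigma$, $A^\rho$ of momentum $q$ and internal momenta $p$, $p - q$)}\\
  &= g^2 \mu^{4 - d} \int \frac{d^d p}{(2\pi)^d} \operatorname{tr} \left(\frac{(-i \slashed{p} + m) \gamma^\rho}{p^2 + m^2} \frac{(-i (\slashed{p} - \slashed{q}) + m) \gamma^\sigma}{(p - q)^2 + m^2}\right),
\end{align*}
where we take the trace to take into account the fact that we are considering
all possible spins.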
To compute this, we need a whole series of tricks. They are just tricks. We
first need the partial fraction identity
\[
  \frac{1}{AB} = \frac{1}{B - A} \left(\frac{1}{A} - \frac{1}{B}\right) = \int_0^1 \frac{dx}{((1 - x) A + x B)^2}.
\]
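The identity $\frac{1}{AB} = \int_0^1 \frac{dx}{((1-x)A + xB)^2}$ is easy to confirm numerically for positive $A$ and $B$. The sketch below uses Simpson's rule; the function name and the sample values are illustrative assumptions.

```python
def feynman_parameter_integral(a, b, steps=2000):
    """Simpson's-rule value of int_0^1 dx / ((1 - x) a + x b)^2 for a, b > 0."""
    def f(x):
        return 1.0 / ((1 - x) * a + x * b) ** 2

    h = 1.0 / steps  # steps must be even
    total = f(0.0) + f(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3


if __name__ == "__main__":
    for a, b in ((2.0, 5.0), (1.0, 1.5), (3.0, 0.7)):
        # The two printed numbers should agree to high accuracy.
        print(feynman_parameter_integral(a, b), 1.0 / (a * b))
```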
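Applying this to the two denominators in our propagators gives
\begin{align*}
  \frac{1}{(p^2 + m^2)((p - q)^2 + m^2)} &= \int_0^1 \frac{dx}{((p^2 + m^2)(1 - x) + ((p - q)^2 + m^2) x)^2}\\
  &= \int_0^1 \frac{dx}{(p^2 + m^2 - 2 x\, p \cdot q + q^2 x)^2}\\
  &= \int_0^1 \frac{dx}{((p - x q)^2 + m^2 + q^2 x (1 - x))^2}.
\end{align*}
Letting $p' = p - q x$, and then dropping the prime, our loop integral becomes
\[
  \frac{g^2 \mu^{4 - d}}{(2\pi)^d} \int d^d p \int_0^1 dx\; \frac{\operatorname{tr}((-i (\slashed{p} + \slashed{q} x) + m) \gamma^\rho (-i (\slashed{p} - \slashed{q} (1 - x)) + m) \gamma^\sigma)}{(p^2 + \Delta)^2},
\]
where $\Delta = m^2 + q^2 x (1 - x)$.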
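We next do the trace over Dirac spinor indices. As mentioned, if we are
working in $d$ dimensions, then we should be honest and work with spinors in
$d$ dimensions, but in this case, we’ll get away with just pretending it is four
dimensions.
We have
\[
  \operatorname{tr}(\gamma^\rho \gamma^\sigma) = 4 \delta^{\rho\sigma}, \quad \operatorname{tr}(\gamma^\mu \gamma^\rho \gamma^\nu \gamma^\sigma) = 4 (\delta^{\mu\rho} \delta^{\nu\sigma} - \delta^{\mu\nu} \delta^{\rho\sigma} + \delta^{\mu\sigma} \delta^{\rho\nu}).
\]
Then the huge trace expression we have just becomes
\begin{align*}
  4 \Big(&-(p + q x)^\rho (p - q (1 - x))^\sigma + (p + q x) \cdot (p - q (1 - x))\, \delta^{\rho\sigma}\\
  &- (p + q x)^\sigma (p - q (1 - x))^\rho + m^2 \delta^{\rho\sigma}\Big).
\end{align*}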
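Whenever $d \in \mathbb{N}$, we certainly get zero if any component of $p^\sigma$ appears an odd
number of times. Consequently, in the numerator, we can replace
\[
  p^\rho p^\sigma \mapsto \frac{p^2}{d} \delta^{\rho\sigma}.
\]
Similarly, we have
\[
  p^\mu p^\rho p^\nu p^\sigma \mapsto \frac{(p^2)^2}{d (d + 2)} (\delta^{\mu\nu} \delta^{\rho\sigma} + \delta^{\mu\rho} \delta^{\nu\sigma} + \delta^{\mu\sigma} \delta^{\rho\nu}).
\]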
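The integrals are given in terms of $\Gamma$-functions, and we obtain
\begin{align*}
  \Pi^{\rho\sigma}_{\text{1-loop}}(q) = -\frac{4 g^2 \mu^{4 - d}}{(4\pi)^{d/2}} \Gamma\left(2 - \frac{d}{2}\right) \int_0^1 dx\; \frac{1}{\Delta^{2 - d/2}} \Big(&\delta^{\rho\sigma} (-m^2 + x (1 - x) q^2)\\
  &+ \delta^{\rho\sigma} (m^2 + x (1 - x) q^2) - 2 x (1 - x) q^\rho q^\sigma\Big).
\end{align*}
And, looking carefully, we find that the $m^2$ terms cancel, and the result is
\[
  \Pi^{\rho\sigma}_{\text{1-loop}}(q) = q^2 \left(\delta^{\rho\sigma} - \frac{q^\rho q^\sigma}{q^2}\right) \pi_{\text{1-loop}}(q^2),
\]
where
\[
  \pi_{\text{1-loop}}(q^2) = -\frac{8 g^2(\mu)}{(4\pi)^{d/2}} \Gamma\left(2 - \frac{d}{2}\right) \int_0^1 dx\; x (1 - x) \left(\frac{\mu^2}{\Delta}\right)^{2 - d/2}.
\]
The key point is that this diverges when $d = 4$, because $\Gamma$ has a pole at $0$, and so
we need to introduce counterterms.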
We set $d = 4 - \varepsilon$, and $\varepsilon \to 0^+$. We introduce a counterterm
\[
  S^{\mathrm{CT}}[A, \psi, \varepsilon] = \int \left(\frac{1}{4} \delta Z_3 F_{\mu\nu} F^{\mu\nu} + \delta Z_2 \bar{\psi} \slashed{D} \psi + \delta M \bar{\psi} \psi\right) d^d x.
\]
Note that the counterterms multiply gauge invariant contributions. We do not
have separate counterterms for $\bar{\psi} \slashed{\partial} \psi$ and $\bar{\psi} \slashed{A} \psi$. We can argue that this must be
the case because our theory is gauge invariant, or if we want to do it properly,
we can justify this using the Ward identities. This is important, because if we
did this by imposing cutoffs, then this property wouldn’t hold.
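For $\Pi^{\text{1-loop}}_{\mu\nu}$, the appropriate counterterm is $\delta Z_3$. As $\varepsilon \to 0^+$, we have
\[
  \pi_{\text{1-loop}}(q^2) \sim -\frac{g^2(\mu)}{2 \pi^2} \int_0^1 dx\; x (1 - x) \left(\frac{2}{\varepsilon} - \gamma + \log \left(\frac{4\pi \mu^2}{\Delta}\right) + O(\varepsilon)\right).
\]
The counterterm vertex $\times$ comes from the term
\[
  \frac{\delta Z_3}{4} \int F_{\mu\nu} F^{\mu\nu}\; d^4 x
\]
in the action, and must be chosen to remove the $\frac{1}{\varepsilon}$ singularity. In the $\overline{\mathrm{MS}}$ scheme, we also
remove the $-\gamma + \log 4\pi$ piece, because we hate them. So what is left is
\[
  \pi^{\overline{\mathrm{MS}}}(q^2) = +\frac{g^2(\mu)}{2 \pi^2} \int_0^1 dx\; x (1 - x) \log \left(\frac{m^2 + q^2 x (1 - x)}{\mu^2}\right).
\]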
Then this is finite in $d = 4$. As we previously described, this 1-loop correction
contains the term $\sim \log \left(\frac{m^2 + q^2}{\mu^2}\right)$. So it is small when $m^2 + q^2 \sim \mu^2$.
Notice that the log term has a branch point when
\[
  m^2 + q^2 x (1 - x) = 0.
\]
For $x \in [0, 1]$, we have
\[
  x (1 - x) \in \left[0, \frac{1}{4}\right].
\]
So the branch cut is inaccessible with real Euclidean momenta. But in Lorentzian
signature, we have
\[
  q^2 = \mathbf{q}^2 - E^2,
\]
so the branch cut occurs when
\[
  (E^2 - \mathbf{q}^2)\, x (1 - x) \geq m^2,
\]
which can be reached whenever $E^2 \geq (2m)^2$. This is exactly the threshold energy
for creating a real $e^+ e^-$ pair.
The QED β-function
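To relate this “one-loop” exact photon propagator to the $\beta$-function for the
electromagnetic coupling, we rescale back to $A^{\mathrm{old}}_\mu = e A^{\mathrm{new}}_\mu$, where we have
\begin{align*}
  S^{(2)}_{\mathrm{eff}}[A^{\mathrm{old}}] &= \frac{1}{4 g^2} (1 - \pi(0)) \int F_{\mu\nu} F^{\mu\nu}\; d^4 z\\
  &= \frac{1}{4} \left(\frac{1}{g^2(\mu)} - \frac{1}{2 \pi^2} \int_0^1 dx\; x (1 - x) \log \left(\frac{m^2}{\mu^2}\right)\right) \int F_{\mu\nu} F^{\mu\nu}\; d^4 z.
\end{align*}
We call the spacetime parameter $z$ instead of $x$ to avoid confusing it with the
Feynman parameter $x$.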
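Since nothing physical can depend on the arbitrary scale $\mu$, we must have
\[
  0 = \mu \frac{\partial}{\partial \mu} \left(\frac{1}{g^2(\mu)} - \frac{1}{2 \pi^2} \int_0^1 dx\; x (1 - x) \log \left(\frac{m^2}{\mu^2}\right)\right).
\]
Solving this, we find that to lowest order, we have
\[
  \beta(g) = \mu \frac{\partial g}{\partial \mu} = \frac{g^3}{12 \pi^2}.
\]
This integral is easy to do because we are evaluating at $\pi(0)$, and then the
log term does not depend on $x$, so that we only need $\int_0^1 x (1 - x)\; dx = \frac{1}{6}$.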
This tells us that the running couplings are given by
\[
  \frac{1}{g^2(\mu)} = \frac{1}{g^2(\mu')} + \frac{1}{6 \pi^2} \log \left(\frac{\mu'}{\mu}\right).
\]
Now suppose $\mu \sim m_e$, where we measure
\[
  \frac{g^2(m_e)}{4\pi} \approx \frac{1}{137},
\]
the fine structure constant. Then there exists a scale $\mu'$ given by
\[
  \mu' = m_e e^{6 \pi^2/g^2(m_e)} \sim 10^{286}\text{ eV}
\]
(that is, about $10^{277}$ GeV), where $g^2(\mu')$ diverges! This is known as a Landau pole.
Yet again, the belief is that pure QED does not exist as a continuum quantum
field theory. Of course, what we have shown is that our perturbation theory
breaks down, but it turns out more sophisticated regularizations also break
down.
Physics of vacuum polarization
Despite QED not existing, we can still say something interesting about it.
Classically, when we want to do electromagnetism in a dielectric material, the
electromagnetic field causes the dielectric material to become slightly polarized,
which results in a shift in the effective electromagnetic potential. We shall see
that in QED, vacuum itself will act as such a dielectric material, due to effects
of virtual electron-positron pairs.
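Consider the scattering of two (distinguishable) Dirac spinors of charges $e_1$ and
$e_2$. The S matrix for the process is given by
\[
  S(1\; 2 \to 1'\; 2') = \frac{e_1 e_2}{4\pi} \delta^4(p_1 + p_2 - p_{1'} - p_{2'})\, \bar{u}_{1'} \gamma^\mu u_1\, \Delta_{\mu\nu}(q)\, \bar{u}_{2'} \gamma^\nu u_2,
\]
where $q = p_1 - p_1'$ is the momentum of the photon propagator.
[Diagram: particles 1 and 2, with momenta $p_1$ and $p_2$, exchange a photon and emerge as $1'$ and $2'$ with momenta $p_1'$ and $p_2'$.]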
Here we are including the exact photon propagator in the diagram. It is given by
\[
  \frac{\Delta^0_{\mu\nu}(q)}{1 - \pi(q^2)},
\]
and so we can write
\[
  \frac{e_1 e_2}{4\pi} \delta^{(4)}(p_1 + p_2 - p_{1'} - p_{2'})\, \bar{u}_{1'} \gamma^\mu u_1\, \Delta^0_{\mu\nu}\, \bar{u}_{2'} \gamma^\nu u_2\, (1 + \pi(q^2) + \cdots).
\]
So the quantum corrections modify the classical one by a factor of $(1 + \pi(q^2) + \cdots)$.
To evaluate this better, we note that
\[
  \bar{u}_{1'} \gamma^\mu u_1\, \Delta^0_{\mu\nu}\, \bar{u}_{2'} \gamma^\nu u_2 = \bar{u}_{1'} \gamma^\mu u_1\, \bar{u}_{2'} \gamma_\mu u_2\, \frac{1}{q^2}.
\]
In the non-relativistic limit, we expect $|q^2| \approx |\mathbf{q}|^2$ and
\[
  \bar{u}_{1'} \gamma^\mu u_1 \propto \delta^{\mu 0}\, \delta_{m_1 m_1'},
\]
where $m_1, m_{1'}$ are the $\sigma_3$ (spin) angular momentum quantum numbers. This
tells us it is very likely that the angular momentum is conserved.
Consequently, in this non-relativistic limit, we have
\[
  S(1\; 2 \to 1'\; 2') \approx \frac{e_1 e_2}{4\pi |\mathbf{q}|^2} \delta^{(4)}(p_1 + p_2 - p_{1'} - p_{2'}) (1 + \pi(|\mathbf{q}|^2))\, \delta_{m_1 m_1'} \delta_{m_2 m_2'}.
\]
This is what we would get using the Born approximation if we had a potential of
\[
  V(\mathbf{r}) = e_1 e_2 \int \frac{d^3 \mathbf{q}}{(2\pi)^3} \left(\frac{1 + \pi(|\mathbf{q}|^2)}{|\mathbf{q}|^2}\right) e^{i \mathbf{q} \cdot \mathbf{r}}.
\]
In particular, if we cover up the $\pi(|\mathbf{q}|^2)$ piece, then this is just the Coulomb
potential.
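In the regime $|\mathbf{q}|^2 \ll m_e^2$, picking $\mu = m_e$, we obtain
\begin{align*}
  \pi(|\mathbf{q}|^2) &= \pi(0) + \frac{g^2(\mu)}{2 \pi^2} \int_0^1 dx\; x (1 - x) \log \left(1 + \frac{x (1 - x) |\mathbf{q}|^2}{m^2}\right)\\
  &\approx \pi(0) + \frac{g^2(\mu)}{60 \pi^2} \frac{|\mathbf{q}|^2}{m^2}.
\end{align*}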
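The coefficient $\frac{1}{60\pi^2}$ comes from expanding the logarithm to first order and using $\int_0^1 x^2 (1 - x)^2\, dx = \frac{1}{30}$. The sketch below compares the full Feynman-parameter integral with this small-momentum approximation; the function names and sample values are illustrative assumptions.

```python
import math


def polarization_shift(g_squared, q2_over_m2, steps=2000):
    """(g^2 / 2 pi^2) int_0^1 dx x(1-x) log(1 + x(1-x) q^2/m^2), by Simpson's rule."""
    def f(x):
        w = x * (1 - x)
        return w * math.log1p(w * q2_over_m2)

    h = 1.0 / steps  # steps must be even
    total = f(0.0) + f(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return g_squared / (2 * math.pi**2) * total * h / 3


def small_momentum_approx(g_squared, q2_over_m2):
    """Leading term g^2 / (60 pi^2) * q^2 / m^2 for q^2 << m^2."""
    return g_squared / (60 * math.pi**2) * q2_over_m2


if __name__ == "__main__":
    g_squared = 4 * math.pi / 137  # illustrative value of the coupling
    for ratio in (1e-3, 1e-2, 1e-1, 1.0):
        full = polarization_shift(g_squared, ratio)
        approx = small_momentum_approx(g_squared, ratio)
        # The ratio full/approx tends to 1 as q^2/m^2 -> 0.
        print(ratio, full, approx, full / approx)
```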
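Then we find
\begin{align*}
  V(\mathbf{r}) &\approx e_1 e_2 \int \frac{d^3 \mathbf{q}}{(2\pi)^3} \left(\frac{1 + \pi(0)}{\mathbf{q}^2} + \frac{g^2}{60 \pi^2 m^2} + \cdots\right) e^{i \mathbf{q} \cdot \mathbf{r}}\\
  &= e_1 e_2 \left(\frac{1 + \pi(0)}{4\pi r} + \frac{g^2}{60 \pi^2 m^2} \delta^3(\mathbf{r})\right).
\end{align*}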
So we got a short-range modification to the Coulomb potential. We obtained
a $\delta$-function precisely because we made the assumption $|\mathbf{q}|^2 \ll m_e^2$. If we did
more accurate calculations, then we wouldn’t get a $\delta$-function, but we would still get a
contribution that is exponentially suppressed as we go away from $0$.
This modification of the Coulomb force is attributed to “screening”. It leads
to a measured shift in the energy levels of $\ell = 0$ bound states of hydrogen. We
can interpret this as saying that the vacuum itself is some sort of dielectric
medium, and this is the effect of loop diagrams that look like
[Diagram: the same scattering process $1\; 2 \to 1'\; 2'$, with a fermion loop inserted in the exchanged photon propagator.]
The idea is that the (genuine) charges $e_1$ and $e_2$ polarize the vacuum, which
causes virtual particles to pop in and out to repel or attract other particles.
So far, the conclusions of this course are pretty distressing. We looked at
$\phi^4$ theory, and we discovered it doesn’t exist. We then looked at QED, and it
doesn’t exist. Now, we are going to look at Yang–Mills theory, which is believed
to exist.
6 Non-abelian gauge theory
6.1 Bundles, connections and curvature
Vector bundles
To do non-abelian gauge theory (or even abelian gauge theory, really), we need
to know some differential geometry. In this section, vector spaces can be either
real or complex; our theory works fine with either.
So far, we had a universe $M$, and our fields took values in some vector space
$V$. Then a field is just a (smooth/continuous/whatever) function $f: M \to V$.
However, this requires the field to take values in the same vector space at each
point. It turns out it is a more natural thing to assign a vector space $V_x$ for each
$x \in M$, and then a field $\phi$ would be a function
\[
  \phi: M \to \coprod_{x \in M} V_x \equiv E,
\]
where $\coprod$ means, as usual, the disjoint union of the vector spaces. Of course, the
point of doing so is that we shall require
\[
  \phi(x) \in V_x \text{ for each } x. \tag{$*$}
\]
This will open a can of worms, but is needed to do gauge theory properly.
We are not saying that each $x \in M$ will be assigned a completely random
vector space, e.g. one would get $\mathbb{R}^3$ and another would get $\mathbb{R}^{144169}$. In fact,
they will all be isomorphic to some fixed space $V$. So what do we actually achieve?
While it is true that $V_x \cong V$ for all $x$, there is no canonical choice of such an
isomorphism. We will later see that picking such an isomorphism corresponds to
picking a gauge of the field.
Now if we just have a bunch of vector spaces $V_x$ for each $x \in M$, then we
lose the ability to talk about whether a field is differentiable or not, let alone
taking the derivative of a field. So we want to “glue” them together in some way.
We can write
\[
  E = \{(x, v) : x \in M, v \in V_x\},
\]
and we require $E$ to be a manifold as well. We call these vector bundles. There
are some technical conditions we want to impose on $E$ so that it is actually
possible to work on it.
There is a canonical projection map $\pi: E \to M$ that sends $(x, v)$ to $x$. Then
we have $V_x = \pi^{-1}(\{x\})$. A natural requirement is that this map $\pi$ be
smooth. Otherwise, we might find someone making $E$ into a manifold in a
really stupid way.
We now introduce a convenient terminology.
Definition (Section). Let $p: E \to M$ be a map between manifolds. A section
is a smooth map $s: M \to E$ such that $p \circ s = \mathrm{id}_M$.
Now we can rewrite the condition ($*$) as saying $\phi$ is a section of the map
$\pi: E \to M$. In general, our fields will usually be sections of some vector bundle.
Example. Let $M$ be any manifold, and $V$ be any vector space. Then $M \times V$ is
a manifold in a natural way, with a natural projection $\pi_1: M \times V \to M$. This
is called the trivial bundle.
Now trivial bundles are very easy to work with, since a section of the trivial
bundle is in a very natural correspondence with maps $M \to V$. So we know
exactly how to work with them.
The final condition we want to impose on our vector bundles is not that
they are trivial, or else we have achieved absolutely nothing. What we want to
require is that near every point, the vector bundle looks trivial.
Definition (Vector bundle). Let $M$ be a manifold, and $V$ a vector space. A
vector bundle over $M$ with typical fiber $V$ is a manifold $E$ with a map $\pi: E \to M$
such that for all $x \in M$, the fiber $E_x = \pi^{-1}(\{x\})$ is a vector space that is
isomorphic to $V$.
Moreover, we require that for each $x \in M$, there exists an open neighbourhood
$U$ of $x$, and a diffeomorphism $\Phi: U \times V \to \pi^{-1}(U)$ such that $\pi(\Phi(y, v)) = y$ for
all $y$, and $\Phi(y, \cdot): \{y\} \times V \to E_y$ is a linear isomorphism of vector spaces.
Such a $\Phi$ is called a local trivialization of $E$, and $U$ is called a trivializing
neighbourhood.
By definition, each point $x$ is contained in some trivializing neighbourhood.
Thus, we can find a trivializing cover $\{U_\alpha\}$ with a trivialization on each $U_\alpha$ such
that $\bigcup U_\alpha = M$.
There are some philosophical remarks we can make here. On $\mathbb{R}^n$, every
bundle is isomorphic to a trivial bundle. If we only care about Euclidean (or
Minkowski) space, then it seems like we are not actually achieving much. But
morally, requiring that a bundle is just locally trivial, and not globally trivial
in some sense tells us everything we do is “local”. Indeed, we are mere mortals
who can at best observe the observable universe. We cannot “see” anything
outside of the observable universe, and in particular, it is impossible to know
whether bundles remain trivial if we go out of the observable universe. Even
if we observe that everything we find resembles a trivial bundle, we are only
morally allowed to conclude that we have a locally trivial bundle, and are not
allowed to conclude anything about the global geometry of the universe.
Another reason for thinking about bundles instead of just trivial ones is that
while every bundle over $\mathbb{R}^n$ is globally trivial, the choice of trivialization is not
canonical, and there is a lot of choice to be made. Usually, when we have a
vector space $V$ and want to identify it with $\mathbb{R}^n$, then we just pick a basis for
it. But now if we have a vector bundle, then we have to pick a basis at each
point in space. This is a lot of arbitrary choices to be made, and it is often more
natural to study the universe without making such choices. Working on a vector
bundle in general also prevents us from saying things that depend on the way
we trivialize our bundle, and so we “force” ourselves to say sensible things only.
Recall that for a trivial bundle, a section of the bundle is just a map $M \to V$.
Thus, for a general vector bundle, if we have a local trivialization $\Phi$ on $U$, then
under the identification given by $\Phi$, a section defined on $U$ can alternatively
be written as a map $\phi: U \to V$, which we may write, in coordinates, as $\phi^a(x)$,
for $a = 1, \cdots, \dim V$. Note that this is only valid in the neighbourhood $U$, and
also depends on the $\Phi$ we pick.
Example. Let $M$ be any manifold. Then the tangent bundle
\[
  TM = \coprod_{x \in M} T_x M \to M
\]
is a vector bundle. Similarly, the cotangent bundle
\[
  T^*M = \coprod_{x \in M} T_x^* M \to M
\]
is a vector bundle.
Recall that given vector spaces $V$ and $W$, we can form new vector spaces by
taking the direct sum $V \oplus W$ and the tensor product $V \otimes W$. There is a natural
way to extend this to vector bundles, so that if $E, F \to M$ are vector bundles,
then $E \oplus F$ and $E \otimes F$ are vector bundles with fibers $(E \oplus F)_x = E_x \oplus F_x$ and
$(E \otimes F)_x = E_x \otimes F_x$. It is an exercise for the reader to actually construct these
bundles. We can also similarly extend the notion of exterior product $\bigwedge^p V$ to
vector bundles.
In particular, applying these to $TM$ and $T^*M$ gives us tensor product bundles
of the form $(TM)^{\otimes n} \otimes (T^*M)^{\otimes m}$, whose sections are tensor fields.
In more familiar notation, (in a local trivialization) we write sections of
tangent bundles as $X^\mu(x)$, and sections of the cotangent bundle as $Y_\mu(x)$.
Sections of the tensor product are written as $X^{\mu_1, \ldots, \mu_n}{}_{\nu_1, \ldots, \nu_m}$.
Example. There is exactly one non-trivial vector bundle we can visualize.
Consider the circle $S^1$:
[Figure: the circle $S^1$.]
Let’s consider line bundles on $S^1$, i.e. vector bundles with fiber $\cong \mathbb{R}$. There is of
course the trivial bundle, and it looks like this:
[Figure: the trivial line bundle over $S^1$, a cylinder.]
However, we can also introduce a “twist” into this bundle, and obtain the Möbius
band:
[Figure: the Möbius band as a line bundle over $S^1$.]
This is an example of a non-trivial line bundle. How do we know it is non-trivial?
The trivial bundle obviously has a nowhere-vanishing section. However, if we
stare at the Möbius band hard enough, we see that any section of the Möbius
band must vanish somewhere. Thus, this cannot be the trivial line bundle.
In fact, it is a general theorem that a line bundle has a nowhere-vanishing
section if and only if it is trivial.
We introduce a bit more notation which we will use later.
Notation. Let $E \to M$ be a vector bundle. Then we write $\Omega^0_M(E)$ for the
vector space of sections of $E \to M$. Locally, we can write an element of this as
$X^a$, for $a = 1, \cdots, \dim E_x$.
More generally, we write $\Omega^p_M(E)$ for sections of $E \otimes \bigwedge^p T^*M \to M$, where
$\bigwedge^p T^*M$ is the bundle of $p$-forms on $M$. Elements can locally be written as
$X^a_{\mu_1 \ldots \mu_p}$.
If $V$ is a vector space, then $\Omega^p_M(V)$ is a shorthand for $\Omega^p_M(V \times M)$.
Let’s return to the definition of a vector bundle. Suppose we had two
trivializing neighbourhoods $U_\alpha$ and $U_\beta$, and that they have a non-trivial intersection
$U_\alpha \cap U_\beta$. We can then compare the two trivializations on $U_\alpha$ and $U_\beta$:
\[
  (U_\alpha \cap U_\beta) \times V \xrightarrow{\;\Phi_\alpha\;} \pi^{-1}(U_\alpha \cap U_\beta) \xleftarrow{\;\Phi_\beta\;} (U_\alpha \cap U_\beta) \times V.
\]
Composing the maps gives us a map
\[
  t_{\alpha\beta} = \Phi_\alpha^{-1} \circ \Phi_\beta: (U_\alpha \cap U_\beta) \times V \to (U_\alpha \cap U_\beta) \times V
\]
that restricts to a linear isomorphism on each fiber. Thus, this is equivalently a
map $U_\alpha \cap U_\beta \to \mathrm{GL}(V)$. This is called the transition function.
These transition functions satisfy some compatibility conditions. Whenever
$x \in U_\alpha \cap U_\beta \cap U_\gamma$, we have
\[
  t_{\alpha\beta}(x) \cdot t_{\beta\gamma}(x) = t_{\alpha\gamma}(x).
\]
Note that on the left, what we have is the (group) multiplication of $\mathrm{GL}(V)$.
These also satisfy the boring condition $t_{\alpha\alpha} = \mathrm{id}$. These are collectively known
as the cocycle conditions.
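The cocycle conditions follow formally from $t_{\alpha\beta} = \Phi_\alpha^{-1} \circ \Phi_\beta$, and a toy model makes this concrete. In the sketch below, each chart's trivialization of a rank-2 bundle is modelled by a rotation angle $\theta_\alpha(x)$, so that $t_{\alpha\beta}(x)$ is the rotation by $\theta_\beta(x) - \theta_\alpha(x)$. The chart names and angle functions are entirely hypothetical and only serve to exercise the identities.

```python
import math


def rotation(theta):
    """2x2 rotation matrix, an element of SO(2) inside GL(2, R)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Hypothetical trivializations: chart name -> rotation angle of the local frame.
FRAME_ANGLES = {
    "alpha": lambda x: 0.3 * x,
    "beta": lambda x: math.sin(x),
    "gamma": lambda x: x * x,
}


def transition(a, b, x):
    """t_ab(x) = Phi_a(x)^{-1} Phi_b(x), here a rotation by theta_b - theta_a."""
    return rotation(FRAME_ANGLES[b](x) - FRAME_ANGLES[a](x))


def close(m1, m2, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(m1[i][j] - m2[i][j]) < tol for i in range(2) for j in range(2))


if __name__ == "__main__":
    for x in (0.0, 0.7, 2.5):
        lhs = matmul(transition("alpha", "beta", x), transition("beta", "gamma", x))
        print(x, close(lhs, transition("alpha", "gamma", x)),
              close(transition("alpha", "alpha", x), rotation(0.0)))
```

Transition functions of this form describe a bundle that is secretly trivial; a genuinely twisted bundle such as the Möbius band needs transition functions that cannot all be written as $\Phi_\alpha^{-1} \Phi_\beta$ for globally defined frames.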
Exercise. Convince yourself that it is possible to reconstruct the whole vector
bundle just from the knowledge of the transition functions. Moreover, given any
cover $\{U_\alpha\}$ of $M$ and functions $t_{\alpha\beta}: U_\alpha \cap U_\beta \to \mathrm{GL}(V)$ satisfying the cocycle
conditions, we can construct a vector bundle with these transition functions.
This exercise is crucial. It is left as an exercise, instead of being spelt out
explicitly, because it is much easier to imagine what is going on geometrically
in your head than writing it down in words. The idea is that the transition
functions tell us how we can glue different local trivializations together to get a
vector bundle.
Now we want to do better than this. For example, suppose we have $V = \mathbb{R}^n$,
which comes with the Euclidean inner product. Then we want the local
trivializations to respect this inner product, i.e. they are given by orthogonal
maps, rather than just linear isomorphisms. It turns out this is equivalent to
requiring that the transition functions $t_{\alpha\beta}$ actually land in $\mathrm{O}(n)$ instead of just
$\mathrm{GL}(n, \mathbb{R})$. More generally, we can have the following definition:
Definition ($G$-bundle). Let $V$ be a vector space, and $G \leq \mathrm{GL}(V)$ be a Lie
subgroup. Then a $G$-bundle over $M$ is a vector bundle over $M$ with fiber $V$,
equipped with a trivializing cover such that the transition functions take value
in $G$.
Note that it is possible to turn a vector bundle into a $G$-bundle in many
different ways. So the trivializing cover is part of the data needed to specify the
$G$-bundle.
We can further generalize this a bit. Instead of picking a subgroup
$G \leq \mathrm{GL}(V)$, we pick an arbitrary Lie group $G$ with a representation on $V$. The
difference is that now certain elements of $G$ are allowed to act trivially on $V$.
Definition ($G$-bundle). Let $V$ be a vector space, $G$ a Lie group, and
$\rho: G \to \mathrm{GL}(V)$ a representation. Then a $G$-bundle consists of the following data:
(i) A vector bundle $E \to M$.
(ii) A trivializing cover $\{U_\alpha\}$ with transition functions $t_{\alpha\beta}$.
(iii) A collection of maps $\varphi_{\alpha\beta}: U_\alpha \cap U_\beta \to G$ satisfying the cocycle conditions
such that $t_{\alpha\beta} = \rho \circ \varphi_{\alpha\beta}$.
Note that to specify a
G
-bundle, we require specifying an element
ϕ
αβ
(
x
)
G
for each
x M
, instead of just the induced action
ρ
(
ϕ
αβ
(
x
))
GL
(
V
). This
is crucial for our story. We are requiring more information than just how the
elements in
V
transform. Of course, this makes no difference if the representation
ρ
is faithful (i.e. injective), but makes a huge difference when
ρ
is the trivial
representation.
We previously noted that it is possible to recover the whole vector bundle just
from the transition functions. Consequently, the information in (i) and (ii) is
actually redundant, as we can recover $t_{\alpha\beta}$ from $\varphi_{\alpha\beta}$ by composing with $\rho$. Thus,
a $G$-bundle is equivalently a cover $\{U_\alpha\}$ of $M$, and maps
$\varphi_{\alpha\beta}: U_\alpha \cap U_\beta \to G$ satisfying the cocycle condition.
Note that this equivalent formulation does not mention $\rho$ or $V$ at all!
Example. Every $n$-dimensional vector bundle is naturally a $\mathrm{GL}(n)$-bundle:
we take $\rho$ to be the identity map, and $\varphi_{\alpha\beta} = t_{\alpha\beta}$.
Principal G-bundles
We are halfway through our voyage into differential geometry. I promise this
really has something to do with physics.
We just saw that a $G$-bundle can be purely specified by the transition
functions, without mentioning the representation or fibers at all. In some sense,
these transition functions encode the “pure twisting” part of the bundle. Given
this “pure twisting” information, and any object $V$ with a representation of $G$
on $V$, we can construct a bundle with fiber $V$, twisted according to this prescription.
This is what we are going to do with gauge theory. The gauge group is the
group $G$, and the gauge business is encoded in this “twisting” information.
Traditionally, a field is a function $M \to V$ for some vector space $V$. To do gauge
coupling, we pick a representation of $G$ on $V$. Then the twisting information
allows us to construct a vector bundle over $M$ with fiber $V$. Gauge-coupled
fields then correspond to sections of this bundle. Picking different local
trivializations of the vector bundle corresponds to picking different gauges, and
the transition functions are the gauge transformations!
But really, we prefer to work with some geometric object, instead of some
collection of transition functions. We want to find the most “natural” object
for $G$ to act on. It turns out the most natural object with a $G$-action is not a
vector space. It is just $G$ itself!
Definition (Principal $G$-bundle). Let $G$ be a Lie group, and $M$ a manifold.
A principal $G$-bundle is a map $\pi: P \to M$ such that $\pi^{-1}(\{x\}) \cong G$
for each $x \in M$. Moreover, $\pi: P \to M$ is locally trivial, i.e. it locally looks like $U \times G$,
and transition functions are given by left-multiplication by an element of $G$.
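More precisely, we are given an open cover $\{U_\alpha\}$ of $M$ and diffeomorphisms
\[ \Phi_\alpha: U_\alpha \times G \to \pi^{-1}(U_\alpha) \]
satisfying $\pi(\Phi_\alpha(x, g)) = x$, such that the transition functions
\[ \Phi_\alpha^{-1} \circ \Phi_\beta: (U_\alpha \cap U_\beta) \times G \to (U_\alpha \cap U_\beta) \times G \]
are of the form
\[ (x, g) \mapsto (x, t_{\alpha\beta}(x) \cdot g) \]
for some $t_{\alpha\beta}: U_\alpha \cap U_\beta \to G$.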
Theorem. Given a principal $G$-bundle $\pi: P \to M$ and a representation
$\rho: G \to \mathrm{GL}(V)$, there is a canonical way of producing a $G$-bundle $E \to M$ with
fiber $V$. This is called the associated bundle.
Conversely, given a $G$-bundle $E \to M$ with fiber $V$, there is a canonical way
of producing a principal $G$-bundle out of it, and these procedures are mutual
inverses.
Moreover, this gives a correspondence between local trivializations of
$P \to M$ and local trivializations of $E \to M$.
Note that since each fiber of $P \to M$ is a group, and trivializations are
required to respect this group structure, a local trivialization is in fact equivalent
to a local section of the bundle, where we set the section to be the identity.
Proof. If the expression
\[ P \times_G V \to M \]
makes any sense to you, then this proves the first part directly. Otherwise,
just note that both a principal $G$-bundle and a $G$-bundle with fiber $V$ can be
specified just by the transition functions, which do not make any reference to
what the fibers look like.
The proof is actually slightly less trivial than this, because the same vector
bundle can have many choices of trivializing covers, which give us different
transition functions. While these different transition functions patch to give the
same vector bundle, by assumption, it is not immediate that they must give the
same principal $G$-bundle as well, or vice versa.
The way to fix this problem is to figure out explicitly when two collections of
transition functions give the same vector bundle or principal bundle, and the
answer is that this is true if they are cohomologous. Thus, the precise statement
needed to prove this is that both principal $G$-bundles and $G$-bundles with fiber
$V$ biject naturally with the first Čech cohomology set of $M$ with coefficients
in $G$ (a group when $G$ is abelian).
We now get to the physics part of the story. To specify a gauge theory
with gauge group $G$, we supplement our universe $M$ with a principal $G$-bundle
$\pi: P \to M$. In QED, the gauge group is $\mathrm{U}(1)$, and in QCD, the gauge group is
$\mathrm{SU}(3)$. In the standard model, for some unknown reason, the gauge group is
\[ G = \mathrm{SU}(3) \times \mathrm{SU}(2) \times \mathrm{U}(1). \]
Normally, a field with values in a vector space $V$ is given by a smooth map
$\phi: M \to V$. To do gauge coupling, we pick a representation $\rho: G \to \mathrm{GL}(V)$, and
then form the associated bundle to $P \to M$. Then a field is now a section of
this associated bundle.
Example. In Maxwell theory, we have $G = \mathrm{U}(1)$. A complex scalar field is a
function $\phi: M \to \mathbb{C}$. The vector space $\mathbb{C}$ has a very natural action of $\mathrm{U}(1)$ given
by left-multiplication.
We pick our universe to be $M = \mathbb{R}^4$, and then the bundle is trivial. However,
we can consider two different trivializations defined on the whole of $M$. Then
we have a transition function $t: M \to \mathrm{U}(1)$, say $t(x) = e^{i\theta(x)}$. Then under this
change of trivialization, the field would transform as
\[ \phi(x) \mapsto e^{i\theta(x)} \phi(x). \]
This is just the usual gauge transformation we’ve seen all the time!
Example. Recall that a vector bundle $E \to M$ with fiber $\mathbb{R}^n$ is naturally a
$\mathrm{GL}(n)$-bundle. In this case, there is a rather concrete description of the principal
$\mathrm{GL}(n)$-bundle that gives rise to $E$.
At each point $x \in M$, we let $\mathrm{Fr}(E_x)$ be the set of all ordered bases of $E_x$.
We can biject this set with $\mathrm{GL}(n)$ as follows: we first fix a basis $\{e_i\}$ of $E_x$.
Then given any other basis $\{f_i\}$, there is a unique matrix in $\mathrm{GL}(n)$ that sends
$\{e_i\}$ to $\{f_i\}$. This gives us a map $\mathrm{Fr}(E_x) \to \mathrm{GL}(n)$, which is easily seen to be a
bijection. This gives a topology on $\mathrm{Fr}(E_x)$.
The map constructed above obviously depends on the basis $e_i$ chosen. Indeed,
changing the $e_i$ corresponds to multiplying the resulting matrix on the right by some element
of $\mathrm{GL}(n)$. However, we see that at least the induced smooth structure on
$\mathrm{Fr}(E_x)$ is well-defined, since right-multiplication by an element of $\mathrm{GL}(n)$ is a
diffeomorphism.
We can now consider the disjoint union of all such $\mathrm{Fr}(E_x)$. To make this
into a principal $\mathrm{GL}(n)$-bundle, we need to construct local trivializations. Given
any trivialization of $E$ on some neighbourhood $U$, we have a smooth choice
of basis on each fiber, since we have bijected the fiber with $\mathbb{R}^n$, and $\mathbb{R}^n$ has
a standard basis. Thus, performing the above procedure, we have a choice of
bijection between $\mathrm{Fr}(E_x)$ and $\mathrm{GL}(n)$. If we pick a different trivialization,
then the choice of bijection differs by some right-multiplication.
This is almost a principal $\mathrm{GL}(n)$-bundle, but it isn’t quite: to obtain
a principal $\mathrm{GL}(n)$-bundle, we want the transition functions to be given by left
multiplication. To solve this problem, when we identified $\mathrm{Fr}(E_x)$ with $\mathrm{GL}(n)$
back then, we should have sent $\{f_i\}$ to the inverse of the matrix that sends $\{e_i\}$
to $\{f_i\}$.
In fact, we should have expected this. Recall from linear algebra that under
a change of basis, if the coordinates of elements transform by $A$, then the basis
vectors themselves transform by $A^{-1}$. So if we want our principal $\mathrm{GL}(n)$-bundle to have
the same transition functions as $E$, we need this inverse. One can now readily
check that this has the same transition functions as $E$. This bundle is known as
the frame bundle, and is denoted $\mathrm{Fr}(E)$.
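The linear algebra fact used here can be checked numerically. The following is a minimal sketch for $n = 2$ in plain Python; the matrices `E` and `B` and the coordinate vector are hypothetical values chosen only for illustration. It confirms that if the basis changes by `B`, the coordinates of a fixed vector change by the inverse of `B`.

```python
# Pure-Python 2x2 check: if the basis changes by B, coordinates change by B^{-1}.
# All numerical values below are illustrative assumptions, not taken from the notes.

def matmul(P, Q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(P, v):
    """Apply a 2x2 matrix to a length-2 vector."""
    return [sum(P[i][k] * v[k] for k in range(2)) for i in range(2)]

def inverse(P):
    """Inverse of an invertible 2x2 matrix."""
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return [[P[1][1] / det, -P[0][1] / det],
            [-P[1][0] / det, P[0][0] / det]]

E = [[1.0, 1.0], [0.0, 2.0]]   # columns are the old basis vectors e_1, e_2
B = [[2.0, 1.0], [1.0, 1.0]]   # change-of-basis matrix in GL(2)
F = matmul(E, B)               # columns are the new basis vectors f_i

coords_old = [3.0, -1.0]                      # coordinates of v in {e_i}
coords_new = matvec(inverse(B), coords_old)   # coordinates of the same v in {f_i}

v_from_old = matvec(E, coords_old)
v_from_new = matvec(F, coords_new)
```

Both `v_from_old` and `v_from_new` describe the same vector, which is the statement that frames and coordinates transform inversely to each other.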
Note that specifying trivializations already gives a smooth structure on
$\pi: \mathrm{Fr}(E) \to M$. Indeed, on each local trivialization on $U$, we have a bijection
between $\pi^{-1}(U)$ and $U \times \mathrm{GL}(n)$, and this gives a chart on $\pi^{-1}(U)$. The fact
that transition functions are given by smooth maps $U \to \mathrm{GL}(n)$ ensures the
different charts are compatible.
Recall that we previously said there is a bijection between a section of a
principal $G$-bundle and a trivialization of the associated bundle. This is very
clearly true in this case: a section is literally a choice of basis on each fiber!
Connection
Let’s go back to the general picture of vector bundles, and forget about the
structure group $G$ for the moment. Consider a general vector bundle $\pi: E \to M$,
and a section $s: M \to E$. We would like to be able to talk about derivatives of
this section. However, the “obvious” attempt looking like
\[ \frac{s(x + \varepsilon) - s(x)}{|\varepsilon|} \]
doesn’t really make sense, because $s(x + \varepsilon)$ and $s(x)$ live in different vector
spaces, namely $E_{x+\varepsilon}$ and $E_x$.
We have encountered this problem in General Relativity already, where we
realized the “obvious” derivative of, say, a vector field on the universe doesn’t
make sense. We figured that what we needed was a connection, and it turns
out the metric on
M
gives us a canonical choice of the connection, namely the
Levi-Civita connection.
We can still formulate the notion of a connection for a general vector bundle,
but this time, there isn’t a canonical choice of connection. It is some additional
data we have to supply.
Before we meet the full, abstract definition, we first look at an example of a
connection.
Example. Consider a trivial bundle $M \times V \to M$. Then the space of sections
$\Omega^0_M(V)$ is canonically isomorphic to the space of maps $M \to V$. This we know
how to differentiate. There is a canonical map $\mathrm{d}: \Omega^0_M(V) \to \Omega^1_M(V)$ sending
$f \mapsto \mathrm{d}f$, where for any vector $X \in T_pM$, we have
\[ \mathrm{d}f(X) = \frac{\partial f}{\partial X} \in V. \]
This is a one-form with values in $V$ (or $M \times V$) because it takes in a vector $X$
and returns an element of $V$.
In coordinates, we can write this as
\[ \mathrm{d}f = \frac{\partial f}{\partial x^\mu}\, \mathrm{d}x^\mu. \]
We can now define a connection:
Definition (Connection). A connection is a linear map $\nabla: \Omega^0_M(E) \to \Omega^1_M(E)$
satisfying
(i) Linearity:
\[ \nabla(\alpha_1 s_1 + \alpha_2 s_2) = \alpha_1 \nabla(s_1) + \alpha_2 \nabla(s_2) \]
for all $s_1, s_2 \in \Omega^0_M(E)$ and $\alpha_1, \alpha_2$ constants.
(ii) Leibniz property:
\[ \nabla(fs) = (\mathrm{d}f)s + f(\nabla s) \]
for all $s \in \Omega^0_M(E)$ and $f \in C^\infty(M)$, where $\mathrm{d}f$ is the usual exterior
derivative of a function, given in local coordinates by
\[ \mathrm{d}f = \frac{\partial f}{\partial x^\mu}\, \mathrm{d}x^\mu. \]
Given a vector field $V$ on $M$, the covariant derivative of a section in the direction
of $V$ is the map
\[ \nabla_V: \Omega^0_M(E) \to \Omega^0_M(E) \]
defined by
\[ \nabla_V s = V \lrcorner\, \nabla s = V^\mu \nabla_\mu s. \]
In more physics settings, the connection is usually written as $D_\mu$.
Consider any two connections $\nabla, \nabla'$. Their difference is not another connection.
Instead, for any $f \in C^\infty(M)$ and $s \in \Omega^0_M(E)$, we have
\[ (\nabla' - \nabla)(fs) = f(\nabla' - \nabla)(s). \]
So in fact the difference is a map $\Omega^0_M(E) \to \Omega^1_M(E)$ that is linear over functions
in $C^\infty(M)$. Equivalently, it is some element of $\Omega^1_M(\mathrm{End}(E))$, i.e. some matrix-valued
1-form $A_\mu(x) \in \mathrm{End}(E_x)$.
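The linearity over functions follows in one line from the Leibniz property satisfied by each connection separately. A short worked check, using only the definitions in this section:

```latex
\begin{align*}
(\nabla' - \nabla)(fs)
  &= \big((\mathrm{d}f)\,s + f\,\nabla' s\big) - \big((\mathrm{d}f)\,s + f\,\nabla s\big) \\
  &= f\,(\nabla' - \nabla)(s).
\end{align*}
```

The inhomogeneous $(\mathrm{d}f)s$ terms cancel, which is why the difference is tensorial even though neither connection is.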
In particular, consider any open set $U \subseteq M$ equipped with a local trivialization.
Then after picking the trivialization, we can treat the bundle on $U$ as a
trivial one, and our previous example showed that we had a “trivial” connection
given by $\mathrm{d}$. Then any other connection can be expressed as
\[ \nabla s = \mathrm{d}s + As \]
for some $A \in \Omega^1_U(\mathrm{End}(V))$, where the particular $A$ depends on our trivialization.
This is called the connection 1-form, or the gauge field. In the case where $E$ is
the tangent bundle, its components are also known as the Christoffel symbols.
This was all on a general vector bundle. But the case we are really interested
in is a $G$-bundle. Of course, we can still say the same words as above, as any
$G$-bundle is also a vector bundle. But can we tell a bit more about what the
connection looks like? We recall that specifying a $G$-bundle with fiber $V$ is
equivalent to specifying a principal $G$-bundle. What we would like is to have
some notion of “connection on a principal $G$-bundle”.
Theorem. There exists a notion of a connection on a principal $G$-bundle.
Locally on a trivializing neighbourhood $U$, the connection 1-form is an element
$A_\mu(x) \in \Omega^1_U(\mathfrak{g})$, where $\mathfrak{g}$ is the Lie algebra of $G$.
Every connection on a principal $G$-bundle induces a connection on any associated
vector bundle. On local trivializations, the connection on the associated
vector bundle has the “same” connection 1-form $A_\mu(x)$, where $A_\mu(x)$ is regarded
as an element of $\mathrm{End}(V)$ by the action of $G$ on the vector space.
Readers interested in the precise construction of a connection on a principal
$G$-bundle should consult a textbook on differential geometry. Our previous work
doesn’t apply because $G$ is not a vector space.
It is useful to know how the connection transforms under a change of local
trivialization. For simplicity of notation, suppose we are working on two trivializations
on the same open set $U$, with the transition map given by $g: U \to G$.
We write $A$ and $A'$ for the connection 1-forms on the two trivializations. Then
for a section $s$ expressed in the first coordinates, we have
\[ g \cdot (\mathrm{d}s + As) = (\mathrm{d} + A')(g \cdot s). \]
So we find that
\[ A' = gAg^{-1} + g\,\mathrm{d}(g^{-1}). \]
This expression makes sense if $G$ is a matrix Lie group, and so we can canonically
identify both $G$ and $\mathfrak{g}$ as subsets of $\mathrm{GL}(n, \mathbb{R})$ for some $n$. Then we know what
it means to multiply them together. For a more general Lie group, we have to
replace the first term by the adjoint representation of $G$ on $\mathfrak{g}$, and the second by
the Maurer–Cartan form.
Example. In the $\mathrm{U}(1)$ case, our transition functions $g_{\alpha\beta}$ are just multiplication
by complex numbers. So if $g = e^{i\lambda}$, then we have
\[ A_\beta = g\,\mathrm{d}g^{-1} + gA_\alpha g^{-1} = g\,\mathrm{d}g^{-1} + A_\alpha = -i(\mathrm{d}\lambda + iA_\alpha). \]
Note that since $\mathrm{U}(1)$ is abelian, the conjugation by $g$ has no effect on $A$. This is
one of the reasons why abelian gauge theory is simple.
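As a sanity check of the transformation law in the $\mathrm{U}(1)$ case, here is a minimal numerical sketch on a one-dimensional base. The functions `lam`, `s` and `A` are hypothetical choices made only for this illustration, and derivatives of the sections are taken by central finite differences. It verifies $g \cdot (\mathrm{d}s + As) = (\mathrm{d} + A')(g \cdot s)$ with $A' = A - i\,\mathrm{d}\lambda$.

```python
import cmath
import math

def deriv(fn, x, h=1e-5):
    """Central finite-difference derivative of a (possibly complex) function."""
    return (fn(x + h) - fn(x - h)) / (2 * h)

# Hypothetical data on a 1-dimensional base, for illustration only.
lam = math.sin                                    # gauge parameter lambda(x)
dlam = math.cos                                   # its exact derivative
g = lambda x: cmath.exp(1j * lam(x))              # transition function g = e^{i lambda}
s = lambda x: (1 + x * x) * cmath.exp(0.3j * x)   # section in the first trivialization
A = lambda x: 1j * math.cos(2 * x)                # u(1)-valued (imaginary) connection

def A_prime(x):
    # A' = g A g^{-1} + g d(g^{-1}) reduces to A - i dlambda for U(1).
    return A(x) - 1j * dlam(x)

def cov(conn, sec, x):
    """Covariant derivative (d + conn) applied to the section sec at x."""
    return deriv(sec, x) + conn(x) * sec(x)

x0 = 0.7
gs = lambda x: g(x) * s(x)           # the same section in the second trivialization
lhs = g(x0) * cov(A, s, x0)          # g . (ds + A s)
rhs = cov(A_prime, gs, x0)           # (d + A')(g . s)
```

The two sides agree up to finite-difference error, whereas replacing `A_prime` by `A` on the right-hand side would leave an uncancelled $i\,\mathrm{d}\lambda$ term.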
Minimal coupling
So how do we do gauge coupling? Suppose we had an “ordinary” scalar field
$\psi: M \to \mathbb{C}$ on our manifold, and we have the usual action
\[ S[\psi] = \int \frac{1}{2}|\partial\psi|^2 + \frac{1}{2}m^2|\psi|^2 + \cdots. \]
We previously said that to do gauge theory, we pick a representation of our
gauge group $G = \mathrm{U}(1)$ on $\mathbb{C}$, which we can take to be the “obvious” action by
multiplication. Then given a principal $\mathrm{U}(1)$-bundle $P \to M$, we can form the
associated vector bundle, and now our field is a section of this bundle.
But how about the action? As mentioned, the $\partial\psi$ term no longer makes
sense. But in the presence of a connection, we can just replace $\partial$ with $\nabla$! Now,
the action is given by
\[ S[\psi] = \int \frac{1}{2}|\nabla\psi|^2 + \frac{1}{2}m^2|\psi|^2 + \cdots. \]
This is known as minimal coupling.
At this point, one should note that the machinery of principal $G$-bundles
was necessary. We only ever have one principal $G$-bundle $P \to M$, and a single
connection on it. If we have multiple fields, then we use the same connection on
all of them, via the mechanism of associated bundles. Physically, this is important:
it means different charged particles couple to the same electromagnetic field!
This wouldn’t be possible if we only worked with vector bundles; we wouldn’t
be able to compare the connections on different vector bundles.
Curvature
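Given a connection $\nabla$, we can extend it to a map $\Omega^p_M(E) \to \Omega^{p+1}_M(E)$ by requiring
it to satisfy the conditions
\[ \nabla(\alpha_1 s_1 + \alpha_2 s_2) = \alpha_1 (\nabla s_1) + \alpha_2 (\nabla s_2), \]
\[ \nabla(\omega \wedge s) = (\mathrm{d}\omega) \wedge s + (-1)^{\deg \omega}\, \omega \wedge \nabla s, \]
whenever $\omega \in \Omega^q(M)$ and $s \in \Omega^{p-q}_M(E)$.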
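We can think of $\nabla$ as a “covariant generalization” of the de Rham operator
$\mathrm{d}$. However, where the ordinary $\mathrm{d}$ is nilpotent, i.e. $\mathrm{d}^2 = 0$, here this is not
necessarily the case.
What kind of object is $\nabla^2$, then? We can compute
\begin{align*}
\nabla^2(\omega \wedge s) &= \nabla(\mathrm{d}\omega \wedge s + (-1)^q\, \omega \wedge \nabla s) \\
 &= \mathrm{d}^2\omega \wedge s + (-1)^{q+1}\, \mathrm{d}\omega \wedge \nabla s + (-1)^q\, \mathrm{d}\omega \wedge \nabla s + (-1)^{2q}\, \omega \wedge \nabla^2 s \\
 &= \omega \wedge \nabla^2 s.
\end{align*}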
So we find that $\nabla^2$ is in fact a map $\Omega^q_M(E) \to \Omega^{q+2}_M(E)$ that is linear over any
forms! Specializing to the case of $q = 0$ only, we can write $\nabla^2$ as
\[ \nabla^2(s) = F_\nabla s, \]
for some $F_\nabla \in \Omega^2_M(\mathrm{End}(E))$. It is an easy exercise to check that the same
formula works for all $q$. In local coordinates, we can write
\[ F_\nabla = \frac{1}{2}(F_{\mu\nu}(x))^a{}_b\, \mathrm{d}x^\mu \wedge \mathrm{d}x^\nu. \]
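Since we have
\[ \nabla s = \mathrm{d}s + As, \]
we find that
\[ \nabla^2 s = \nabla(\mathrm{d}s + As) = \mathrm{d}^2 s + \mathrm{d}(As) + A \wedge (\mathrm{d}s + As) = (\mathrm{d}A + A \wedge A)s. \]
Note that by $A \wedge A$, we mean, locally in coordinates, $A^a{}_b \wedge A^b{}_c$, which is still a
form with values in $\mathrm{End}(V)$.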
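Thus, locally, we have
\begin{align*}
F &= \mathrm{d}A + A \wedge A \\
 &= (\partial_\mu A_\nu + A_\mu A_\nu)\, \mathrm{d}x^\mu \wedge \mathrm{d}x^\nu \\
 &= \frac{1}{2}(\partial_\mu A_\nu - \partial_\nu A_\mu + A_\mu A_\nu - A_\nu A_\mu)\, \mathrm{d}x^\mu \wedge \mathrm{d}x^\nu \\
 &= \frac{1}{2}(\partial_\mu A_\nu - \partial_\nu A_\mu + [A_\mu, A_\nu])\, \mathrm{d}x^\mu \wedge \mathrm{d}x^\nu.
\end{align*}
Of course, in the case of $\mathrm{U}(1)$ theory, the bracket vanishes, and this is
just the usual field strength tensor. Unsurprisingly, this is what will go into the
Lagrangian for a general gauge theory.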
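To see the bracket term at work, consider a constant gauge field, for which the derivative terms drop out and $F_{xy} = [A_x, A_y]$. The sketch below is a plain-Python illustration; the specific matrices are hypothetical choices ($i\sigma_1$ and $i\sigma_2$ for an $\mathfrak{su}(2)$-valued field, and $1 \times 1$ imaginary matrices for $\mathfrak{u}(1)$), not configurations taken from the notes.

```python
def matmul(P, Q):
    """Product of two square matrices given as nested lists."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(P, Q):
    """Matrix commutator [P, Q] = PQ - QP."""
    PQ, QP = matmul(P, Q), matmul(Q, P)
    n = len(P)
    return [[PQ[i][j] - QP[i][j] for j in range(n)] for i in range(n)]

# Hypothetical constant su(2)-valued components: A_x = i*sigma_1, A_y = i*sigma_2.
A_x = [[0, 1j], [1j, 0]]
A_y = [[0, 1], [-1, 0]]

# For constant fields the derivative terms vanish, so F_xy = [A_x, A_y].
F_xy_su2 = commutator(A_x, A_y)

# Hypothetical constant u(1)-valued components, written as 1x1 matrices.
F_xy_u1 = commutator([[0.3j]], [[-1.2j]])
```

Here `F_xy_su2` equals $-2i\sigma_3$, so a constant non-abelian gauge field can carry curvature, while `F_xy_u1` vanishes, matching the abelian statement above.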
Crucially, in the non-abelian theory, the bracket term is there, and is non-zero.
This is important. Our $F$ is no longer linear in $A$. When we do Yang–Mills later,
the action will contain an $F^2$ term, and then expanding this out will give $A^3$
and $A^4$ terms. This causes interaction of the gauge field with itself, even without
the presence of matter!
We end by noting a seemingly-innocent identity. Note that we can compute
\[ \nabla^3 s = \nabla(\nabla^2 s) = \nabla(F s) = (\nabla F) \wedge s + F \wedge (\nabla s). \]
On the other hand, we also have
\[ \nabla^3 s = \nabla^2(\nabla s) = F \wedge \nabla s. \]
These two ways of thinking about $\nabla^3$ must be consistent. This implies we have
\[ \nabla(F_\nabla) \equiv 0. \]
This is known as the Bianchi identity.
6.2 Yang–Mills theory
At the classical level, Yang–Mills is an example of non-abelian gauge theory
defined by the action
\[ S[\nabla] = \frac{1}{2g_{YM}^2} \int_M (F_{\mu\nu}, F^{\mu\nu})\, \sqrt{g}\, \mathrm{d}^d x, \]
where $(\cdot, \cdot)$ denotes the Killing form on the Lie algebra $\mathfrak{g}$ of the gauge group,
and $g_{YM}^2$ is the coupling constant. For flat space, we have $\sqrt{g} = 1$, and we will
drop that term.
For example, if $G = \mathrm{SU}(n)$, we usually choose a basis such that
\[ (t^a, t^b) = \frac{1}{2}\delta^{ab}, \]
and on a local $U \subseteq M$, we have
\[ S[\nabla] = \frac{1}{4g_{YM}^2} \int F^a_{\mu\nu} F^{b,\mu\nu} \delta_{ab}\, \mathrm{d}^d x, \]
with
\[ F^a_{\mu\nu} = \partial_\mu A^a_\nu - \partial_\nu A^a_\mu + f^a{}_{bc} A^b_\mu A^c_\nu. \]
Thus, Yang–Mills theory is the natural generalization of Maxwell theory to the
non-Abelian case.
Note that the action is treated as a function of the connection, and not the
curvature, just like in Maxwell’s theory. Schematically, we have
\[ F^2 \sim (\mathrm{d}A + A^2)^2 \sim (\mathrm{d}A)^2 + A^2\, \mathrm{d}A + A^4. \]
So as mentioned, there are non-trivial interactions even in the absence of charged
matter. This self-interaction is proportional to the structure constants $f^a{}_{bc}$. So
in the abelian case, these do not appear.
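The schematic expansion can be made explicit in components. In the sketch below, $K^a_{\mu\nu}$ is a shorthand introduced only for this illustration (it does not appear elsewhere in these notes) for the part of $F^a_{\mu\nu}$ that is linear in $A$:

```latex
% K is a shorthand introduced only for this sketch: the part of F linear in A.
\begin{align*}
K^a_{\mu\nu} &= \partial_\mu A^a_\nu - \partial_\nu A^a_\mu, \qquad
F^a_{\mu\nu} = K^a_{\mu\nu} + f^a{}_{bc} A^b_\mu A^c_\nu, \\
F^a_{\mu\nu} F^{a,\mu\nu}
 &= K^a_{\mu\nu} K^{a,\mu\nu}
  + 2 f^a{}_{bc}\, K^a_{\mu\nu} A^{b,\mu} A^{c,\nu}
  + f^a{}_{bc} f^a{}_{de}\, A^b_\mu A^c_\nu A^{d,\mu} A^{e,\nu}.
\end{align*}
```

The first term is the Maxwell-like kinetic term, the second is the cubic vertex (one power of the structure constants), and the third is the quartic vertex (two powers).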
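At the level of the classical field equations, if we vary our connection by
$\nabla \mapsto \nabla + \delta a$, where $\delta a$ is a matrix-valued 1-form, then
\[ \delta F_\nabla = F_{\nabla + \delta a} - F_\nabla = \nabla_{[\mu} \delta a_{\nu]}\, \mathrm{d}x^\mu \wedge \mathrm{d}x^\nu. \]
In other words,
\[ \delta F_{\mu\nu} = \partial_{[\mu} \delta a_{\nu]} + [A_\mu, \delta a_\nu]. \]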
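The Yang–Mills equation we obtain from extremizing with respect to these
variations is
\[ 0 = \delta S[\nabla] = \frac{1}{g_{YM}^2} \int (\delta F_{\mu\nu}, F^{\mu\nu})\, \mathrm{d}^d x = \frac{1}{g_{YM}^2} \int (\nabla_\mu \delta a_\nu, F^{\mu\nu})\, \mathrm{d}^d x. \]
Integrating by parts, we get the Yang–Mills equation
\[ \nabla^\mu F_{\mu\nu} = \partial^\mu F_{\mu\nu} + [A^\mu, F_{\mu\nu}] = 0. \]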
This is just like Gauss’ equation. Recall we also had the Bianchi identity
\[ \nabla \wedge F = 0, \]
which gives
\[ \nabla_\mu F_{\nu\lambda} + \nabla_\nu F_{\lambda\mu} + \nabla_\lambda F_{\mu\nu} = 0, \]
similar to Maxwell’s equations.
But unlike Maxwell’s equations, these are non-linear PDE’s for $A$. We no
longer have the principle of superposition. This is much more similar to general
relativity. In general relativity, we had some non-linear PDE’s we had to solve
for the metric or the connection.
We all know some solutions to Einstein’s field equations, say black holes and
the Schwarzschild metric. We also know many solutions to Maxwell’s equations.
But most people can’t write down a non-trivial solution to Yang–Mills equations.
This is not because Yang–Mills is harder to solve. If you ask anyone who
does numerics, Yang–Mills is much more pleasant to work with. The real reason
is that electromagnetism and general relativity were around for quite a long time,
and we had a lot of time to understand the solutions. Moreover, these solutions
have very direct relations to things we can observe at everyday energy scales.
However, this is not true for Yang–Mills. It doesn’t really describe everyday
phenomena, and thus fewer people care.
Note that the action contains no mass terms for $A$, i.e. there is no $A^2$ term.
So $A$ should describe a massless particle, and this gives rise to a long-range force,
just like Coulomb or gravitational forces. When Yang–Mills first introduced this
theory, Pauli objected to this, because we are introducing some new long-range
force, but we don’t see any!
To sort-of explain this, the coupling constant $g_{YM}^2$ plays no role in the
classical (pure) theory. Of course, it will play a role if we couple it to matter.
However, in the quantum theory, $g_{YM}^2$ appears together with $\hbar$ as $g_{YM}^2 \hbar$. So
the classical theory is a reasonable approximation to the physics only if
$g_{YM}^2 \sim 0$. Skipping ahead of the story, we will see that $g_{YM}^2$ is marginally
relevant. So at low energies, the classical theory is not a good approximation for
the actual quantum theory.
So let’s look at quantum Yang–Mills theory.
6.3 Quantum Yang–Mills theory
Our first thought to construct a path integral for Yang–Mills may be to compute
the partition function
\[ Z_{\mathrm{naive}} \overset{?}{=} \int_{\mathcal{A}} DA\; e^{-S_{YM}[A]}, \]
where $\mathcal{A}$ is the “space of all connections” on some fixed principal $G$-bundle
$P \to M$. If we were more sophisticated, we might try to sum over all possible
principal $G$-bundles.
But this is not really correct. We claim that no matter how hard we try to
make sense of path integrals, this integral must diverge.
We first think about what this path integral can be. For any two connections
$\nabla, \nabla'$, it is straightforward to check that
\[ \nabla^t = t\nabla + (1 - t)\nabla' \]
is a connection on $P$. Consequently, we can find a path between any two
connections on $P$. Furthermore, we saw before that the difference
$\nabla - \nabla' \in \Omega^1_M(\mathfrak{g})$. This says that $\mathcal{A}$ is an (infinite-dimensional) affine space modelled on
$\Omega^1_M(\mathfrak{g}) \cong T_\nabla \mathcal{A}$. This is like a vector space, but there is no preferred origin 0.
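The check that $\nabla^t$ is again a connection amounts to verifying the Leibniz property, which works because the coefficients $t$ and $1 - t$ sum to one. A short worked version, using only the definition of a connection given earlier:

```latex
\begin{align*}
\nabla^t(fs) &= t\,\nabla(fs) + (1-t)\,\nabla'(fs) \\
 &= t\big((\mathrm{d}f)s + f\,\nabla s\big) + (1-t)\big((\mathrm{d}f)s + f\,\nabla' s\big) \\
 &= (\mathrm{d}f)s + f\,\nabla^t s .
\end{align*}
```

A general combination $a\nabla + b\nabla'$ would instead produce $(a + b)(\mathrm{d}f)s$, so it is a connection only when $a + b = 1$. This is the sense in which $\mathcal{A}$ is affine rather than linear.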
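There is even a flat metric on $\mathcal{A}$, given by
\[ \mathrm{d}s^2_{\mathcal{A}} = \int_M (\delta A_\mu, \delta A^\mu)\, \mathrm{d}^d x, \]
i.e. given any two tangent vectors $a_1, a_2 \in \Omega^1_M(\mathfrak{g}) \cong T_\nabla \mathcal{A}$, we have an inner
product
\[ \langle a_1, a_2 \rangle_\nabla = \int_M (a_{1\mu}, a_2^\mu)\, \mathrm{d}^d x. \]
Importantly, this is independent of the choice of $\nabla$.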
All of this is to say that $\mathcal{A}$ is nice and simple. Heuristically, we
imagine the path integral measure is the natural $L^2$ measure on $\mathcal{A}$ as an affine
space. Of course, this measure doesn’t exist, because the space is infinite dimensional.
But the idea is that this is just the same as working with a scalar field.
Despite this niceness, $S_{YM}[\nabla]$ is degenerate along gauge orbits. The action
is invariant under gauge transformations (i.e. automorphisms of the principal
$G$-bundle), and so we are counting each connection infinitely many times. In
fact, the group $\mathcal{G}$ of all gauge transformations is an infinite dimensional space
that is (locally) $\mathrm{Maps}(M, G)$, and even for compact $M$ and $G$, the volume of
this space diverges.
Instead, we should take the integral over all connections modulo gauge
transformations:
\[ Z_{YM} = \int_{\mathcal{A}/\mathcal{G}} \mathrm{d}\mu\; e^{-S_{YM}[\nabla]/\hbar}, \]
where $\mathcal{A}/\mathcal{G}$ is the space of all connections modulo gauge transformation, and $\mathrm{d}\mu$
is some sort of measure. Note that this means there is no such thing as “gauge
symmetry” in nature. We have quotiented out by the gauge transformations
in the path integral. Rather, gauge transformations are a redundancy in our
description.
But we now have a more complicated problem. We have no idea what the
space $\mathcal{A}/\mathcal{G}$ looks like. Thus, even formally, we don’t understand what $\mathrm{d}\mu$ on this
space should be.
In electromagnetism, we handled this problem by “picking a gauge”. We are
going to do exactly the same here, but the non-linearities of our non-abelian
theory means this is more involved. We need to summon ghosts.
6.4 Faddeev–Popov ghosts
To understand how to do this, we will first consider a particular finite-dimensional
example. Suppose we have a field $(x, y): \{\mathrm{pt}\} \to \mathbb{R}^2$ on a zero-dimensional
universe and an action $S[x, y]$. For simplicity, we will often ignore the existence
of the origin in $\mathbb{R}^2$.
The partition function we are interested in is
\[ \int_{\mathbb{R}^2} \mathrm{d}x\, \mathrm{d}y\; e^{-S[x,y]}. \]
Suppose the action is rotationally invariant. Then we can write the integral as
\[ \int_{\mathbb{R}^2} \mathrm{d}x\, \mathrm{d}y\; e^{-S[x,y]} = \int_0^{2\pi} \mathrm{d}\theta \int_0^\infty r\, \mathrm{d}r\; e^{-S[r]} = 2\pi \int_0^\infty r\, \mathrm{d}r\; e^{-S[r]}. \]
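This reduction is easy to test numerically. The sketch below assumes the hypothetical rotationally invariant action $S = x^2 + y^2$ (chosen only because both sides then equal $\pi$) and compares a brute-force midpoint-rule integral over the plane with $2\pi$ times the radial integral.

```python
import math

def S(r):
    """Hypothetical rotationally invariant action S = x^2 + y^2 = r^2."""
    return r * r

def cartesian_integral(L=6.0, n=400):
    """Midpoint-rule estimate of the integral of e^{-S} over [-L, L]^2."""
    h = 2 * L / n
    total = 0.0
    for i in range(n):
        x = -L + (i + 0.5) * h
        for j in range(n):
            y = -L + (j + 0.5) * h
            total += math.exp(-S(math.hypot(x, y)))
    return total * h * h

def radial_integral(R=6.0, n=4000):
    """Midpoint-rule estimate of 2*pi times the integral of r e^{-S(r)} over [0, R]."""
    h = R / n
    radial = sum((k + 0.5) * h * math.exp(-S((k + 0.5) * h)) for k in range(n)) * h
    return 2 * math.pi * radial

full = cartesian_integral()
reduced = radial_integral()
```

Both `full` and `reduced` come out close to $\pi$. The factor $2\pi$ is the volume of $\mathrm{SO}(2)$ and the extra $r$ in the radial integrand is the non-trivial measure on the orbit space.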
We can try to formulate this result in more abstract terms. Our space $\mathbb{R}^2$ of
fields has an action of the group $\mathrm{SO}(2)$ by rotation. The quotient/orbit space of
this action is
\[ \frac{\mathbb{R}^2 \setminus \{0\}}{\mathrm{SO}(2)} \cong \mathbb{R}_{>0}. \]
Then what we have done is that we have replaced the integral over the whole of
$\mathbb{R}^2$, namely the $(x, y)$ integral, with an integral over the orbit space $\mathbb{R}_{>0}$, namely
the $r$ integral. There are two particularly important things to notice:
- The measure on $\mathbb{R}_{>0}$ is not the “obvious” measure $\mathrm{d}r$. In general, we have
  to do some work to figure out what the correct measure is.
- We had a factor of $2\pi$ sticking out at the front, which corresponds to the
  “volume” of the group $\mathrm{SO}(2)$.
In general, how do we figure out what the quotient space $\mathbb{R}_{>0}$ is, and how do we
find the right measure to integrate against? The idea, as we have always done,
is to “pick a gauge”.
We do so via a gauge fixing function. We specify a function $f: \mathbb{R}^2 \to \mathbb{R}$, and
then our gauge condition will be $f(x) = 0$. In other words, the “space of gauge
orbits” will be
\[ C = \{x \in \mathbb{R}^2 : f(x) = 0\}. \]
[Figure: the curve $C$ in the plane, labelled $f(x) = 0$.]
For this to work out well, we need the following two conditions:
(i) For each $x \in \mathbb{R}^2$, there exists some $R \in \mathrm{SO}(2)$ such that $f(Rx) = 0$.
(ii) $f$ is non-degenerate. Technically, we require that for any $x$ such that
$f(x) = 0$, we have
\[ \Delta_f(x) = \left.\frac{\partial}{\partial\theta} f(R_\theta(x))\right|_{\theta=0} \neq 0, \]
where $R_\theta$ is rotation by $\theta$.
The first condition is an obvious requirement: our function $f$ does pick out a
representative for each gauge orbit. The second condition is technical, but also
crucial. We want the curve to pick out a unique point in each gauge orbit. This
prevents a gauge orbit from looking like
[Figure: a curved gauge slice crossed three times by a dashed circular orbit.]
where the dashed circle intersects the curved line three times. For any curve that
looks like this, we would have a vanishing $\Delta_f(x)$ at the turning points of the
curve. The non-degeneracy condition forces the curve to always move radially
outwards, so that we pick out a good gauge representative.
It is important to note that the non-degeneracy condition in general does not
guarantee that each gauge orbit has a unique representative. In fact, it forces
each gauge orbit to have two representatives instead. Indeed, if we consider a
simple gauge fixing function $f(x, y) = x$, then this is non-degenerate, but the
zero set looks like
[Figure: the zero set of $f(x, y) = x$, which meets each circular orbit twice.]
It is an easy exercise with the intermediate value theorem to show that there
must be at least two representatives in each gauge orbit (one will need to use
the non-degeneracy condition).
This is not a particularly huge problem, since we are just double counting
each gauge orbit, and we know how to divide by 2. Let’s stick with this and
move on.
To integrate over the gauge orbit, it is natural to try the integral
\[ \int_{\mathbb{R}^2} \mathrm{d}x\, \mathrm{d}y\; \delta(f(x))\, e^{-S(x,y)}. \]
Then the $\delta$-function restricts us to the curve $C$, known as the gauge slice.
However, this has a problem. We would want our result to not depend on
how we choose our gauge. But not only does this integral depend on $C$. It in
fact depends on $f$ as well!
To see this, we can simply replace $f$ by $cf$ for some constant $c \in \mathbb{R}$. Then
\[ \delta(f(x)) \mapsto \delta(cf(x)) = \frac{1}{|c|}\delta(f(x)). \]
So our integral changes. It turns out the trick is to include the factor of $\Delta_f$ we
previously defined. Consider the new integral
\[ \int_{\mathbb{R}^2} \mathrm{d}x\, \mathrm{d}y\; \delta(f(x))\, |\Delta_f(x)|\, e^{-S(x)}. \tag{$*$} \]
To analyze how this depends on $f$, we pretend that the zero set $C$ of $f$ actually
looks like this:
[Figure: a zero set that meets each orbit exactly once.]
rather than
[Figure: a zero set that meets each orbit twice.]
Of course, we know the former is impossible, and the zero set must look like the
latter. However, the value of the integral $(*)$ depends only on how $f$ behaves
locally near the zero set, and so we may analyze each “branch” separately,
pretending it looks like the former. This will make it much easier to say what
we want to say.
Theorem. The integral $(*)$ is independent of the choice of $f$ and $C$.
Proof. We first note that if we replace $f(x)$ by $c(r) f(x)$ for some $c > 0$, then we
have
\[ \delta(cf) = \frac{1}{|c|}\delta(f), \quad |\Delta_{cf}(x)| = c(r)\,|\Delta_f|, \]
and so the integral doesn’t change.
Next, suppose we replace $f$ with some $\tilde{f}$, but they have the same zero set.
Now notice that $\delta(f)$ and $|\Delta_f|$ depend only on the first-order behaviour of $f$ at
$C$. In particular, it depends only on $\frac{\partial f}{\partial\theta}$ on $C$. So for all practical purposes,
changing $f$ to $\tilde{f}$ is equivalent to multiplying $f$ by the ratio of their derivatives.
So changing the function $f$ while keeping $C$ fixed doesn’t affect the value of $(*)$.
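Finally, suppose we have two arbitrary $f$ and $\tilde{f}$, with potentially different
zero sets. Now for each value of $r$, we pick a rotation $R_{\theta(r)} \in \mathrm{SO}(2)$ such that
\[ \tilde{f}(x) \propto f(R_{\theta(r)} x). \]
By the previous part, we can rescale $f$ or $\tilde{f}$, and assume we in fact have equality.
We let $x' = R_{\theta(r)} x$. Now since the action only depends on the radius, it
in particular is invariant under the action of $R_{\theta(r)}$. The measure $\mathrm{d}x\, \mathrm{d}y$ is also
invariant, which is particularly clear if we write it as $\mathrm{d}\theta\; r\, \mathrm{d}r$ instead. Then we
have
\begin{align*}
\int_{\mathbb{R}^2} \mathrm{d}x\, \mathrm{d}y\; \delta(f(x))\, |\Delta_f(x)|\, e^{-S(x)}
 &= \int_{\mathbb{R}^2} \mathrm{d}x'\, \mathrm{d}y'\; \delta(f(x'))\, |\Delta_f(x')|\, e^{-S(x')} \\
 &= \int_{\mathbb{R}^2} \mathrm{d}x'\, \mathrm{d}y'\; \delta(\tilde{f}(x))\, |\Delta_{\tilde{f}}(x)|\, e^{-S(x')} \\
 &= \int_{\mathbb{R}^2} \mathrm{d}x\, \mathrm{d}y\; \delta(\tilde{f}(x))\, |\Delta_{\tilde{f}}(x)|\, e^{-S(x)}.
\end{align*}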
Example. We choose $C$ to be the $x$-axis with $f(x) = y$. Then under a rotation
\[ f(x) = y \mapsto y\cos\theta - x\sin\theta, \]
we have
\[ \Delta_f(x) = -x. \]
So we have
\begin{align*}
\int_{\mathbb{R}^2} \mathrm{d}x\, \mathrm{d}y\; \delta(f)\, |\Delta_f(x)|\, e^{-S(x,y)}
 &= \int_{\mathbb{R}^2} \mathrm{d}x\, \mathrm{d}y\; \delta(y)\, |x|\, e^{-S(x,y)} \\
 &= \int_{-\infty}^{\infty} \mathrm{d}x\; |x|\, e^{-S(x,0)} \\
 &= 2\int_0^\infty \mathrm{d}|x|\; |x|\, e^{-S(|x|)} \\
 &= 2\int_0^\infty r\, \mathrm{d}r\; e^{-S(r)}.
\end{align*}
So this gives us back the original integral
\[ \int_0^\infty r\, \mathrm{d}r\; e^{-S(r)} \]
over the orbit space that we wanted, except for the factor of 2. As we mentioned
previously, this is because our gauge fixing condition actually specifies two points
on each gauge orbit, instead of one. This is known as the Gribov ambiguity.
When we do perturbation theory later on, we will not be sensitive to this
global factor, because in perturbation theory, we only try to understand the
curve locally, and the choice of gauge is locally unique.
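The independence of the gauge-fixed integral from the choice of $f$ can also be seen numerically. The following is a rough sketch under stated assumptions: the action is taken to be $S = x^2 + y^2$, the $\delta$-function is replaced by a Gaussian of width `EPS`, and the three gauge-fixing functions are hypothetical linear choices. The rotation generator is taken as $\partial_\theta (x, y) = (-y, x)$, which differs from the convention of the example above only by a sign that the absolute value removes.

```python
import math

EPS = 0.05   # width of the regularized delta function (an assumption of this sketch)

def delta_eps(u):
    """Normalized Gaussian of width EPS standing in for the Dirac delta."""
    return math.exp(-u * u / (2 * EPS * EPS)) / (EPS * math.sqrt(2 * math.pi))

def gauge_fixed_integral(f, grad_f, with_fp=True, L=4.0, n=400):
    """Midpoint-rule estimate of the integral of delta(f) |Delta_f| e^{-S}, S = x^2 + y^2.

    With with_fp=False the factor |Delta_f| is dropped, giving the naive integral.
    """
    h = 2 * L / n
    total = 0.0
    for i in range(n):
        x = -L + (i + 0.5) * h
        for j in range(n):
            y = -L + (j + 0.5) * h
            fx, fy = grad_f(x, y)
            # Delta_f = d/dtheta f(R_theta x) at theta = 0, using d/dtheta (x, y) = (-y, x).
            fp = abs(-y * fx + x * fy) if with_fp else 1.0
            total += delta_eps(f(x, y)) * fp * math.exp(-(x * x + y * y))
    return total * h * h

# Hypothetical gauge-fixing functions, each given with its gradient (df/dx, df/dy).
slices = {
    "f = y":       (lambda x, y: y,           lambda x, y: (0.0, 1.0)),
    "f = 3y":      (lambda x, y: 3 * y,       lambda x, y: (0.0, 3.0)),
    "f = y - x/2": (lambda x, y: y - 0.5 * x, lambda x, y: (-0.5, 1.0)),
}
results = {name: gauge_fixed_integral(f, g) for name, (f, g) in slices.items()}
naive = {name: gauge_fixed_integral(f, g, with_fp=False) for name, (f, g) in slices.items()}
```

For this action $2\int_0^\infty r\, e^{-r^2}\, \mathrm{d}r = 1$, and all three entries of `results` come out close to 1. The entries of `naive` differ from one another, for example rescaling $f$ by 3 divides the naive answer by 3.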
The advantage of looking at the integral
\[ \int_{\mathbb{R}^2} \mathrm{d}x\, \mathrm{d}y\; \delta(f)\, \Delta_f\, e^{-S(x,y)} \]
is that it only refers to functions and measures on the full space $\mathbb{R}^2$, which we
understand well.
More generally, suppose we have a (well-understood) space $X$ with a measure
$\mathrm{d}\mu$. We then have a Lie group $G$ acting on $X$. Suppose locally (near the identity),
we can parametrize elements of $G$ by parameters $\theta^a$ for $a = 1, \cdots, \dim G$. We
write $R_\theta$ for the corresponding element of $G$ (technically, we are passing on to
the Lie algebra level).
To do gauge fixing, we now need many gauge fixing functions, say $f^a$, again
with $a = 1, \cdots, \dim G$. We then let
\[ \Delta_f = \det \left.\frac{\partial f^a(R_\theta x)}{\partial \theta^b}\right|_{\theta=0}. \]
This is known as the Faddeev–Popov determinant.
Then if we have a function $e^{-S[x]}$ that is invariant under the action of $G$,
then to integrate over the gauge orbits, we integrate
\[ \int_X \mathrm{d}\mu\; |\Delta_f| \prod_{a=1}^{\dim G} \delta(f^a(x))\, e^{-S[x]}. \]
Now in Yang–Mills, our spaces and groups are no longer finite-dimensional, and
nothing makes sense. Well, we can manipulate expressions formally. Suppose we
have some gauge fixing condition $f$. Then the expression we want is
\[ Z = \int_{\mathcal{A}/\mathcal{G}} D\mu\; e^{-S_{YM}} = \int_{\mathcal{A}} DA\; \delta[f]\, |\Delta_f(A)|\, e^{-S_{YM}[A]}. \]
Suppose the gauge group is $G$, with Lie algebra $\mathfrak{g}$. We will assume the gauge
fixing condition is pointwise, i.e. we have functions $f^a: \mathfrak{g} \to \mathfrak{g}$, and the gauge
fixing condition is
\[ f(A(x)) = 0 \text{ for all } x \in M. \]
Then writing $n = \dim \mathfrak{g}$, we can write
\[ \delta[f] = \prod_{x \in M} \delta^{(n)}(f(A(x))). \]
We don’t really know what to do with these formal expressions. Our strategy is
to write this integral in terms of more “usual” path integrals, and then we can
use our usual tools of perturbation theory to evaluate the integral.
We first take care of the $\delta[f]$ term. We introduce a new bosonic field
$h \in \Omega^0_M(\mathfrak{g})$, and then we can write
\[ \delta[f] = \int Dh\; \exp\left(-\int i h^a(x) f^a(A(x))\, \mathrm{d}^d x\right). \]
This $h$ acts as a “Lagrange multiplier” that enforces the condition $f(A(x)) = 0$,
and we can justify this by comparing to the familiar result that, up to factors of $2\pi$,
\[ \int e^{-ip \cdot x}\, \mathrm{d}p = \delta(x). \]
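To take care of the determinant term, we wanted to have
\[ \Delta_f = \det \frac{\delta f^a[A^\lambda(x)]}{\delta \lambda^b(y)}, \]
where $\lambda^a(y)$ are our gauge parameters, and $A^\lambda$ is the gauge-transformed field.
Now recall that for a finite-dimensional $n \times n$ matrix $M$, we have
\[ \det(M) = \int \mathrm{d}^n c\, \mathrm{d}^n \bar{c}\; e^{\bar{c} M c}, \]
where $c, \bar{c}$ are $n$-dimensional fermionic (Grassmann) variables. Thus, in our case, we can write the
Faddeev–Popov determinant as the path integral
\[ \Delta_f = \int Dc\, D\bar{c}\; \exp\left(\int_{M \times M} \mathrm{d}^d x\, \mathrm{d}^d y\; \bar{c}_a(x) \frac{\delta f^a(A^\lambda(x))}{\delta \lambda^b(y)} c^b(y)\right), \]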
where $c, \bar{c}$ are fermionic scalars, again valued in $\mathfrak{g}$ under the adjoint action. Since
we assumed that $f^a$ is local, i.e. $f^a(A)(x)$ is a function of $A(x)$ and its derivative
at $x$ only, we can simply write this as
\[ \Delta_f = \int Dc\, D\bar{c}\; \exp\left(\int_M \mathrm{d}^d x\; \bar{c}_a(x) \frac{\delta f^a(A^\lambda)}{\delta \lambda^b}(x)\, c^b(x)\right). \]
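The finite-dimensional identity $\det(M) = \int \mathrm{d}^n c\, \mathrm{d}^n \bar{c}\; e^{\bar{c} M c}$ used here can be checked by brute force with a small Grassmann algebra. The sketch below is an illustration under an explicit sign convention, which is an assumption of this example: the Berezin integral is normalized to pick out the coefficient of the monomial $\bar{c}_1 c_1 \bar{c}_2 c_2 \cdots \bar{c}_n c_n$. The function names are hypothetical.

```python
def gmul(a, b):
    """Multiply two Grassmann polynomials stored as {sorted index tuple: coefficient}."""
    out = {}
    for ma, ca in a.items():
        for mb, cb in b.items():
            if set(ma) & set(mb):
                continue  # a repeated generator squares to zero
            merged = list(ma + mb)
            # Sorting the generators costs a sign (-1)^(number of inversions).
            inv = sum(1 for i in range(len(merged))
                      for j in range(i + 1, len(merged)) if merged[i] > merged[j])
            key = tuple(sorted(merged))
            out[key] = out.get(key, 0) + (-1) ** inv * ca * cb
    return out

def berezin_det(M):
    """Top coefficient of exp(cbar M c), with cbar_i -> generator 2i and c_j -> generator 2j+1."""
    n = len(M)
    Q = {}
    for i in range(n):
        for j in range(n):
            pair = (2 * i, 2 * j + 1)
            sign = 1 if pair[0] < pair[1] else -1   # reorder cbar_i c_j if needed
            key = tuple(sorted(pair))
            Q[key] = Q.get(key, 0) + sign * M[i][j]
    # exp(Q) = sum_k Q^k / k!, and the series terminates at k = n.
    term = {(): 1.0}
    total = {(): 1.0}
    for k in range(1, n + 1):
        term = {m: c / k for m, c in gmul(term, Q).items()}
        for m, c in term.items():
            total[m] = total.get(m, 0) + c
    return total.get(tuple(range(2 * n)), 0.0)

det2 = berezin_det([[1, 2], [3, 4]])
det3 = berezin_det([[2, 0, 1], [1, 3, 0], [0, 1, 4]])
```

With this convention `det2` and `det3` reproduce the ordinary determinants of the two matrices. A different ordering of the measure would change the result by at most an overall sign.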
The fermionic fields $c$ and $\bar{c}$ are known as ghosts and anti-ghosts respectively.
We might find these a bit strange, since they are spin 0 fermionic fields, which
violates the spin-statistics theorem. So, if we do canonical quantization with this,
then we find that we don’t get a Hilbert space, as we get states with negative
norm! Fortunately, there is a subspace of gauge invariant states that do not
involve the ghosts, and the inner product is positive definite in this subspace.
When we focus on these states, and on operators that do not involve ghosts,
then we are left with a good, unitary theory. These $c$ and $\bar{c}$ aren’t “genuine”
fields, and they are just there to get rid of the extra fields we don’t want. The
“physically meaningful” theory is supposed to happen in $\mathcal{A}/\mathcal{G}$, where no ghosts
exist.
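With all factors included, the full action is given by
\[
  S[A, \bar{c}, c, h] = \int d^dx \left(\frac{1}{4g_{YM}^2} F^a_{\mu\nu} F^{a,\mu\nu}
    + i h^a f^a(A) - \bar{c}^a \frac{\delta f^a(A^\lambda)}{\delta \lambda^b} c^b\right),
\]
and the path integral is given by
\[
  Z = \int DA\, Dc\, D\bar{c}\, Dh\; \exp\left(-S[A, \bar{c}, c, h]\right).
\]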
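Example. We often pick Lorenz gauge $f^a(A) = \partial^\mu A^a_\mu$. Under a
gauge transformation, we have $A \mapsto A^\lambda = A + \nabla\lambda$. More
explicitly, we have
\[
  (A^\lambda)^a_\mu = A^a_\mu + \partial_\mu \lambda^a + f^a_{bc} A^b_\mu \lambda^c.
\]
So the matrix appearing in the Faddeev–Popov determinant is
\[
  \frac{\delta f^a(A^\lambda)}{\delta \lambda^b} = \partial^\mu \nabla_\mu.
\]
Thus, the full Yang–Mills action is given by
\[
  S[A, \bar{c}, c, h] = \int d^dx \left(\frac{1}{4g_{YM}^2} F^a_{\mu\nu} F^{a,\mu\nu}
    + i h^a \partial^\mu A^a_\mu - \bar{c}^a \partial^\mu \nabla_\mu c^a\right).
\]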
Why do we have to do this? Why didn’t we have to bother when we did
electrodynamics?
If we did this for an abelian gauge theory, then the structure constants would
be zero. Consequently, all $f^a_{bc}$ terms vanish, so the ghost kinetic operator
does not involve the gauge field. The path integral over the ghosts is then
completely independent of the gauge field, and so as long as we work in a fixed
gauge, we can ignore the ghosts. However, in a non-abelian gauge theory,
this is not true. We cannot just impose a gauge and get away with it. We need
to put back the Jacobian to take into account the change of variables, and this
is where the ghosts come in.
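To make the contrast explicit, here is a short worked comparison in Lorenz gauge.
It is a sketch that only uses the transformation law quoted in the example above,
from which $(\nabla_\mu c)^a = \partial_\mu c^a + f^a_{bc} A^b_\mu c^c$; the
overall normalization of the ghost integral is not tracked.

```latex
% Non-abelian case: the ghost operator depends on the gauge field A.
\bar{c}^a\, \partial^\mu (\nabla_\mu c)^a
  = \bar{c}^a\, \partial^2 c^a
  + f^a_{bc}\, \bar{c}^a\, \partial^\mu \left(A^b_\mu c^c\right).

% Abelian case: f^a_{bc} = 0, so only the free term survives and
\int Dc\, D\bar{c}\; \exp\left(\int d^dx\; \bar{c}\, \partial^2 c\right)
  = \det(\partial^2),
% which is a field-independent constant that factors out of Z.
```

The second term in the first line is precisely the ghost-ghost-gauge coupling,
and it is absent when the structure constants vanish.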
The benefit of doing all this work is that it now looks very familiar. It seems
like something we can tackle using Feynman rules with perturbation theory.
6.5 BRST symmetry and cohomology
In the Faddeev–Popov construction, we have introduced some gauge-fixing terms,
and so naturally, the Lagrangian is no longer gauge invariant. However, we would
like to be able to use gauge symmetries to understand our theory. For example,
we would expect gauge symmetry to restrict the possible terms generated by the
renormalization group flow, but we now can’t do that.
It turns out that our action still has a less obvious symmetry arising from
gauge invariance, known as BRST symmetry. This was discovered by Becchi,
Rouet and Stora, and also independently by Tyutin.
To describe this symmetry, we are going to construct a BRST operator.
Since we want to prove things about this, we have to be more precise about
what space this is operating on.
We let $\mathcal{B}$ be the (complex) space of all polynomial functions in the
fields and their derivatives. More precisely, it is defined recursively as follows:
- Let $\Psi$ be any of $\{A_\mu, c^a, \bar{c}^a, h^a\}$, and $\partial_\alpha$ be
  any differential operator (e.g. $\partial_1 \partial_3^2$). Then
  $\partial_\alpha \Psi \in \mathcal{B}$.
- Any complex $C^\infty$ function on $M$ is in $\mathcal{B}$.
- If $a, b \in \mathcal{B}$, then $a + b, ab \in \mathcal{B}$.
We impose the obvious commutativity relations based on fermionic and bosonic
statistics. Note that by construction, this space only contains polynomial
functions in the fields. This is what we are going to assume when we try to prove
things, and makes things much more convenient because we can now induct on
the degree of the polynomial.
However, for a general gauge-fixing function, we cannot guarantee that the
Yang–Mills Lagrangian lives in $\mathcal{B}$ (even though for the Lorenz gauge,
it does). What we do can be extended to allow for more general functions of the
fields, but this is difficult to make precise and we will not do that.
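This $\mathcal{B}$ is in fact a $\mathbb{Z}/2\mathbb{Z}$-graded algebra, or a
superalgebra, i.e. we can decompose it as
\[
  \mathcal{B} = \mathcal{B}_0 \oplus \mathcal{B}_1,
\]
where $\mathcal{B}_0, \mathcal{B}_1$ are vector subspaces of $\mathcal{B}$. Here
$\mathcal{B}_0$ contains the “purely bosonic” terms, while $\mathcal{B}_1$
contains the purely fermionic terms. These satisfy
\[
  \mathcal{B}_s \mathcal{B}_t \subseteq \mathcal{B}_{s+t},
\]
where the subscripts are taken modulo 2. Moreover, if $y \in \mathcal{B}_s$ and
$x \in \mathcal{B}_t$, then we have the (graded-)commutativity relation
\[
  yx = (-1)^{st} xy.
\]
If $x$ belongs to one of $\mathcal{B}_0$ or $\mathcal{B}_1$, we say it has
definite statistics, and we write $|x| = s$ if $x \in \mathcal{B}_s$.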
Definition (BRST operator). The BRST operator $Q$ is defined by
\[
  QA_\mu = \nabla_\mu c, \quad Q\bar{c} = ih, \quad
  Qc = -\frac{1}{2}[c, c], \quad Qh = 0.
\]
This extends to an operator on $\mathcal{B}$ by sending all constants to 0, and
for $f, g \in \mathcal{B}$ of definite statistics, we set
\[
  Q(fg) = (-1)^{|f|} f\, Qg + (Qf) g, \quad Q(\partial_\mu f) = \partial_\mu Qf.
\]
In other words, $Q$ is a graded derivation.
There are a couple of things to note:
- Even though we like to think of $\bar{c}$ as the “complex conjugate” of $c$,
  formally speaking, they are unrelated variables, and so we have no obligation
  to make $Q\bar{c}$ related to $Qc$.
- The expression $[c, c]$ is defined by
  \[
    [c, c]^a = f^a_{bc} c^b c^c,
  \]
  where $f^a_{bc}$ are the structure constants. This is non-zero, even though the
  Lie bracket is anti-commutative, because the ghosts are fermions.
- The operator $Q$ exchanges fermionic variables with bosonic variables. Thus,
  we can think of this as a “fermionic operator”.
It is an exercise to see that Q is well-defined.
We will soon see that this gives rise to a symmetry of the Yang–Mills action.
To do so, we first need the following fact:
Theorem. We have $Q^2 = 0$.
Proof. We first check that for any field $\Psi$, we have $Q^2\Psi = 0$.
- This is trivial for $h$.
- We have
  \[
    Q^2\bar{c} = Q(ih) = 0.
  \]
- Note that for fermionic $a, b$, we have $[a, b] = [b, a]$. So
  \[
    Q^2 c = -\frac{1}{2} Q[c, c] = -\frac{1}{2}([Qc, c] + [c, Qc])
      = -[Qc, c] = \frac{1}{2}[[c, c], c].
  \]
  It is an exercise to carefully go through the anti-commutativity and see
  that the Jacobi identity implies this vanishes.
- Noting that $G$ acts on the ghost fields by the adjoint action, we have
  \begin{align*}
    Q^2 A_\mu &= Q\nabla_\mu c\\
      &= \nabla_\mu(Qc) + [QA_\mu, c]\\
      &= -\frac{1}{2}\nabla_\mu[c, c] + [\nabla_\mu c, c]\\
      &= -\frac{1}{2}([\nabla_\mu c, c] + [c, \nabla_\mu c]) + [\nabla_\mu c, c]\\
      &= 0.
  \end{align*}
To conclude the proof, it suffices to show that if $a, b \in \mathcal{B}$ are
elements of definite statistics such that $Q^2 a = Q^2 b = 0$, then
$Q^2(ab) = 0$. Then we are done by induction and linearity. Using the fact that
$|Qa| = |a| + 1 \pmod 2$, we have
\[
  Q^2(ab) = (Q^2 a) b + a\, Q^2 b + (-1)^{|a|}(Qa)(Qb) + (-1)^{|a|+1}(Qa)(Qb) = 0.
\]
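The step left as an exercise in the proof, that $[[c, c], c]$ vanishes by the
Jacobi identity, can be checked numerically in a simple case. In components,
$[[c, c], c]^a = f^a_{de} f^d_{bc}\, c^b c^c c^e$, and since the product
$c^b c^c c^e$ of fermions is totally antisymmetric, only the sum of the
coefficient over cyclic permutations of $(b, c, e)$ survives. The sketch below
assumes $\mathfrak{g} = \mathfrak{su}(2)$, with structure constants given by the
Levi-Civita symbol; the function names are hypothetical and chosen only for this
illustration.

```python
from itertools import product


def su2_f(a, b, c):
    """Structure constants of su(2): the Levi-Civita symbol on indices 0, 1, 2."""
    return (a - b) * (b - c) * (c - a) // 2


def double_bracket_coeff(f, dim, a, b, c, e):
    """Coefficient f^a_{de} f^d_{bc} multiplying c^b c^c c^e in [[c, c], c]^a."""
    return sum(f(a, d, e) * f(d, b, c) for d in range(dim))


def cyclic_sum(f, dim, a, b, c, e):
    """Sum of the coefficient over cyclic permutations of (b, c, e).

    The coefficient is already antisymmetric in (b, c), so this is three
    times its totally antisymmetric part, which is all the fermions see.
    """
    return (double_bracket_coeff(f, dim, a, b, c, e)
            + double_bracket_coeff(f, dim, a, c, e, b)
            + double_bracket_coeff(f, dim, a, e, b, c))


dim = 3
all_vanish = all(
    cyclic_sum(su2_f, dim, a, b, c, e) == 0
    for a, b, c, e in product(range(dim), repeat=4)
)
# An individual coefficient need not vanish; only the antisymmetrised sum does.
sample_coeff = double_bracket_coeff(su2_f, dim, 0, 0, 1, 1)
print(all_vanish, sample_coeff)
```

The individual coefficients are non-zero, so the vanishing really does come from
combining the fermionic statistics of $c$ with the Jacobi identity.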
We can now introduce some terminology.
Definition (BRST exact). We say $a \in \mathcal{B}$ is BRST exact if $a = Qb$
for some $b \in \mathcal{B}$.
Definition (BRST closed). We say $a \in \mathcal{B}$ is BRST closed if $Qa = 0$.
By the previous theorem, we know that all BRST exact elements are BRST
closed, but the converse need not be true.
In the canonical picture, as we saw in Michaelmas QFT, the “Hilbert space”
of this quantum theory is not actually a Hilbert space: some states have
negative norm, and some non-zero states have zero norm. In order for things
to work, we need to first restrict to a subspace of non-negative norm, and then
quotient out by states of zero norm, and this gives us the space of “physical”
states.
The BRST operator gives rise to an analogous operator in the canonical
picture, which we shall denote $\hat{Q}$. It turns out that the space of
non-negative norm states is exactly the states $|\psi\rangle$ such that
$\hat{Q}|\psi\rangle = 0$. We can think of this as saying the physical states
must be BRST-invariant. Moreover, the zero norm states are exactly those that
are of the form $\hat{Q}|\phi\rangle$. So we can write
\[
  \mathcal{H}_{\mathrm{phys}} =
    \frac{\text{BRST closed states}}{\text{BRST exact states}}.
\]
This is known as the cohomology of the operator $\hat{Q}$. We will not justify
this or further discuss this, as the canonical picture is not really the main
focus of this course or section.
Let’s return to our main focus, which was to find a symmetry of the Yang–
Mills action.
Theorem. The Yang–Mills Lagrangian is BRST closed. In other words,
$Q\mathcal{L} = 0$.
This theorem is often phrased in terms of the operator $\delta = \varepsilon Q$,
where $\varepsilon$ is a new Grassmannian variable. Since $\varepsilon^2 = 0$, we
know $(\delta a)(\delta b) = 0$ for any $a, b$. So the map $a \mapsto a + \delta a$
is well-defined (or rather, respects multiplication), because
\[
  ab + \delta(ab) = (a + \delta a)(b + \delta b).
\]
So this $\delta$ behaves like an infinitesimal transformation with $\varepsilon$
“small”. This is not true for $Q$ itself.
Then the theorem says this infinitesimal transformation is in fact a symmetry
of the action.
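The claim that $a \mapsto a + \delta a$ respects multiplication can be spelled
out in two lines. This is a sketch that uses only the graded Leibniz rule for $Q$
and the fact that the odd variable $\varepsilon$ picks up a sign $(-1)^{|a|}$
when moved past an element $a$ of definite statistics.

```latex
\begin{align*}
  (a + \delta a)(b + \delta b)
    &= ab + \varepsilon (Qa)\, b + a\, \varepsilon\, Qb
       + \varepsilon (Qa)\, \varepsilon (Qb) \\
    &= ab + \varepsilon \left((Qa)\, b + (-1)^{|a|} a\, Qb\right) + 0
     = ab + \varepsilon\, Q(ab) = ab + \delta(ab).
\end{align*}
```

The last term of the first line vanishes because it contains $\varepsilon^2 = 0$
after reordering. Without $\varepsilon$ the term $(Qa)(Qb)$ would survive, which
is why $Q$ itself does not behave like an infinitesimal transformation.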
Proof. We first look at the $(F_{\mu\nu}, F^{\mu\nu})$ piece of $\mathcal{L}$. We
notice that for the purposes of this term, since $\delta A$ is bosonic, the BRST
transformation for $A$ looks just like a gauge transformation
$A_\mu \mapsto A_\mu + \nabla_\mu \lambda$. So the usual (explicit) proof that
this is invariant under gauge transformations shows that this is also invariant
under BRST transformations.
We now look at the remaining terms. We claim that they are not just BRST
closed, but in fact BRST exact. Indeed, they are just
\[
  Q(\bar{c}^a f^a[A]) = i h^a f^a[A] - \bar{c}^a \frac{\delta f}{\delta \lambda} c.
\]
So we have found a symmetry of the action, and thus if we regularize the path
integral measure appropriately so that it is invariant under BRST symmetries
(e.g. in dimensional regularization), then BRST symmetry will in fact be a
symmetry of the full quantum theory.
Now what does this actually tell us? We first note the following general fact:
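Lemma. Suppose $\delta$ is some operator such that
$\Phi \mapsto \Phi + \delta\Phi$ is a symmetry. Then for all $\mathcal{O}$, we have
\[
  \langle \delta\mathcal{O} \rangle = 0.
\]
Proof.
\[
  \langle \mathcal{O}(\Phi) \rangle = \langle \mathcal{O}(\Phi') \rangle
    = \langle \mathcal{O}(\Phi) + \delta\mathcal{O} \rangle.
\]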
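Corollary. Adding BRST exact terms to the Lagrangian does not affect the
expectation of BRST invariant functions.
Proof. Suppose we add $Qg$ to the Lagrangian, and $\mathcal{O}$ is BRST
invariant. Then the change in $\langle \mathcal{O} \rangle$ is
\[
  \left\langle \int Qg(x)\, d^dx\; \mathcal{O} \right\rangle
    = \int \langle Q(g\mathcal{O}) \rangle\, d^dx = 0.
\]
The first equality uses $Q\mathcal{O} = 0$, so that
$Q(g\mathcal{O}) = (Qg)\mathcal{O}$, and the second is the lemma. If we want to
argue about this more carefully, we should use $\varepsilon Q$ instead of $Q$.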
This has some pretty important consequences. For example, any gauge
invariant term that involves only $A$ is BRST invariant. Also, changing the
gauge-fixing function just corresponds to changing the Lagrangian by a BRST
exact term, $Q(\bar{c}^a f^a[A])$. So this implies that as long as we only care
about gauge-invariant quantities, then all correlation functions are completely
independent of the choice of $f$.
6.6 Feynman rules for Yang–Mills
In general, we cannot get rid of the ghost fields. However, we can use the
previous corollary to get rid of the $h$ field. We add the BRST exact term
\[
  -i\frac{\xi}{2} Q(\bar{c}^a h^a) = \frac{\xi}{2} h^a h^a
\]
to the Lagrangian, where $\xi$ is some arbitrary constant. Then we can complete
the square, and obtain
\[
  i h^a f^a[A] + \frac{\xi}{2} h^a h^a
    = \frac{\xi}{2}\left(h^a + \frac{i}{\xi} f^a[A]\right)^2
    + \frac{1}{2\xi} f^a[A] f^a[A].
\]
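As a quick check of the completed square, one can expand the right-hand side
directly. This is just the arithmetic behind the identity above, written with
$f^a$ as shorthand for $f^a[A]$.

```latex
\begin{align*}
  \frac{\xi}{2}\left(h^a + \frac{i}{\xi} f^a\right)^2 + \frac{1}{2\xi} f^a f^a
    &= \frac{\xi}{2} h^a h^a + i h^a f^a
       + \frac{\xi}{2} \cdot \frac{i^2}{\xi^2} f^a f^a + \frac{1}{2\xi} f^a f^a \\
    &= \frac{\xi}{2} h^a h^a + i h^a f^a
       - \frac{1}{2\xi} f^a f^a + \frac{1}{2\xi} f^a f^a \\
    &= i h^a f^a + \frac{\xi}{2} h^a h^a.
\end{align*}
```

The leftover term $\frac{1}{2\xi} f^a f^a$ is what becomes the familiar
gauge-fixing term once $h$ is integrated out.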
Moreover, in the path integral, for each fixed $A$, we should have
\[
  \int Dh\; \exp\left(-\frac{\xi}{2}\left(h^a + \frac{i}{\xi} f^a[A]\right)^2\right)
    = \int Dh\; \exp\left(-\frac{\xi}{2}(h^a)^2\right),
\]
since we are just shifting all $h$ by a constant. Thus, if we are not interested
in correlation functions involving $h$, then we can simply factor out and
integrate out $h$, and it no longer exists. The complete gauge-fixed Yang–Mills
action in Lorenz gauge with coupling to fermions $\psi$ is then
\[
  S[A, \bar{c}, c, \bar{\psi}, \psi] = \int d^dx \left(
    \frac{1}{4} F^a_{\mu\nu} F^{\mu\nu,a}
    + \frac{1}{2\xi} (\partial^\mu A^a_\mu)(\partial^\nu A^a_\nu)
    - \bar{c}^a \partial^\mu \nabla_\mu c^a
    + \bar{\psi}(\slashed{\nabla} + m)\psi\right),
\]
where we absorbed the factor of $g_{YM}$ into $A$.
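Using this new action, we can now write down our Feynman rules. We have
three propagators, for the gauge field, the fermions and the ghosts
respectively, each carrying momentum $p$:
\begin{align*}
  D^{ab}_{\mu\nu}(p) &= \frac{\delta^{ab}}{p^2}
    \left(\delta_{\mu\nu} - (1 - \xi)\frac{p_\mu p_\nu}{p^2}\right),\\
  S(p) &= \frac{1}{i\slashed{p} + m},\\
  C^{ab}(p) &= \frac{\delta^{ab}}{p^2}.
\end{align*}
We also have interaction vertices. The vertex joining three gauge fields
$A^a_\mu$, $A^b_\nu$, $A^c_\lambda$ with momenta labelled $p$, $q$, $k$ is
\[
  g_{YM} f^{abc}\left((k - p)_\lambda \delta_{\mu\nu} + (p - q)_\mu \delta_{\nu\lambda}
    + (q - k)_\nu \delta_{\mu\lambda}\right).
\]
The vertex joining four gauge fields $A^a_\mu$, $A^b_\nu$, $A^c_\lambda$,
$A^d_\sigma$ is
\begin{align*}
  &- g_{YM}^2 f^{abe} f^{cde} (\delta_{\mu\lambda}\delta_{\nu\sigma}
      - \delta_{\mu\sigma}\delta_{\nu\lambda})\\
  &- g_{YM}^2 f^{ace} f^{bde} (\delta_{\mu\nu}\delta_{\sigma\lambda}
      - \delta_{\mu\sigma}\delta_{\nu\lambda})\\
  &- g_{YM}^2 f^{ade} f^{bce} (\delta_{\mu\nu}\delta_{\sigma\lambda}
      - \delta_{\mu\lambda}\delta_{\nu\sigma}).
\end{align*}
The vertex joining the ghosts $\bar{c}^b$, $c^c$ to a gauge field $A^a_\mu$,
with ghost momentum $p$, is
\[
  g_{YM} f^{abc} p_\mu,
\]
and the vertex joining the fermions $\bar{\psi}$, $\psi$ to a gauge field
$A^a_\mu$ is
\[
  g_{YM} \gamma^\mu t^a_f.
\]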
What is the point of writing all this out? The point is not to use them.
The point is to realize these are horrible! It is a complete pain to work with
these Feynman rules. For example, a $gg \to ggg$ scattering process involves
10000 terms when we expand it in terms of Feynman diagrams, at tree level!
Perturbation theory is not going to be the right way to think about Yang–Mills.
And GR is only worse.
Perhaps this is a sign that our theory is wrong. Surely a “correct” theory
must look nice. But when we try to do actual computations, despite these
horrific expressions, the end results tend to be very nice. So it’s not really
Yang–Mills’ fault. It’s just perturbation theory that is bad.
Indeed, there is no reason to expect perturbation theory to look good.
We formulated Yang–Mills in terms of a very nice geometric picture, about
principal $G$-bundles. From this perspective, everything is very natural and
simple. However, to do perturbation theory, we needed to pick a trivialization,
and then worked with this object $A$. The curvature $F_{\mu\nu}$ was a natural
geometric object to study, but breaking it up into $dA + A \wedge A$ is not. The
individual terms that give rise to the interaction vertices have no geometric
meaning; only $\nabla$ and $F$ do. The brutal butchering of the connection and
curvature into these non-gauge-invariant terms is bound to make our theory look
messy.
Then what is a good way to do Yang–Mills? We don’t know. One rather
successful approach is to use lattice regularization, and use a computer to
actually compute partition functions and correlations directly. But this is very
computationally intensive, and it’s difficult to do complicated computations with
this. Some other people try to understand Yang–Mills in terms of string theory,
or twistor theory. But we don’t really know.
The only thing we can do now is to work with this mess.
6.7 Renormalization of Yang–Mills theory
To conclude the course, we compute the $\beta$-function of Yang–Mills theory. We
are mostly interested in the case of $SU(N)$. While a lot of the derivations (or
lack of) we do are general, every now and then, we use the assumption that
$G = SU(N)$, and that the fermions live in the fundamental representation.
Usually, to do perturbation theory, we expand around a vacuum, where all
fields take value 0. However, this statement doesn’t make much sense for gauge
theory, because $A_\mu$ is only defined up to gauge transformations, and it
doesn’t make sense to “set it to zero”. Instead, what we do is that we fix any
classical solution $A^0_\mu$ to the Yang–Mills equation, and take this as the
“vacuum” for $A_\mu$. This is known as the background field. We write
\[
  A_\mu = A^0_\mu + a_\mu.
\]
Whenever we write $\nabla_\mu$, we will be using the background field, so that
\[
  \nabla_\mu = \partial_\mu + A^0_\mu.
\]
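We can compute
\begin{align*}
  F_{\mu\nu} &= \partial_\mu A_\nu - \partial_\nu A_\mu + [A_\mu, A_\nu]\\
    &= F^0_{\mu\nu} + \partial_\mu a_\nu - \partial_\nu a_\mu + [a_\mu, a_\nu]
       + [A^0_\mu, a_\nu] + [a_\mu, A^0_\nu]\\
    &= F^0_{\mu\nu} + \nabla_\mu a_\nu - \nabla_\nu a_\mu + [a_\mu, a_\nu].
\end{align*}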
Thus, if we compute the partition function, we would expect to obtain
something of the form
\[
  Z \equiv e^{-S_{\mathrm{eff}}[A]}
    = \exp\left(-\frac{1}{2g_{YM}^2} \int (F^0_{\mu\nu}, F^{0,\mu\nu})\, d^dx\right)
      (\text{something}).
\]
A priori, the “something” will be a function of $A$, and also the energy scale
$\mu$. Then since the result shouldn’t actually depend on $\mu$, this allows us
to compute the $\beta$-function.
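We will work in Feynman gauge, where we pick $\xi = 1$. So we have
\begin{align*}
  S[a, \bar{c}, c, \bar{\psi}, \psi] = \int d^dx \bigg(
    &\frac{1}{4g^2}\left(F^0_{\mu\nu} + \nabla_{[\mu} a_{\nu]} + [a_\mu, a_\nu]\right)^2
     + \frac{1}{2g^2}\left(\partial^\mu A_\mu + \partial^\mu a_\mu\right)^2\\
    &- \bar{c}\, \partial^\mu \nabla_\mu c - \bar{c}\, \partial^\mu a_\mu c
     + \bar{\psi}(\slashed{\nabla} + m)\psi + \bar{\psi} \slashed{a} \psi\bigg).
\end{align*}
This allows us to rewrite the original action in terms of the new field $a_\mu$.
We will only compute the $\beta$-function up to 1-loop, and it turns out this
implies we don’t need to know about the whole action. We claim that we only need
to know about quadratic terms.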
Let $L$ be the number of loops, $V$ the number of vertices, and $P$ the number
of propagators. Then, restricting to connected diagrams, Euler’s theorem says
\[
  P - V = L - 1.
\]
By restricting to 1-loop diagrams, this means we only care about diagrams with
$P = V$.
For each vertex $i$, we let $n_{q,i}$ be the number of “quantum” legs coming out
of the vertex, i.e. we do not count background fields. Then since each
propagator connects two vertices, we must have
\[
  2P = \sum_{\text{vertices } i} n_{q,i}.
\]
Also, almost by definition, we have
\[
  V = \sum_{\text{vertices } i} 1.
\]
So this implies we only care about diagrams with
\[
  0 = \sum_{\text{vertices } i} (n_{q,i} - 2).
\]
It can be argued that since $A^0_\mu$ satisfies the Yang–Mills equations, we can
ignore all linear terms. So this means it suffices to restrict to the quadratic
terms.
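The counting argument above can be illustrated on a few small diagrams. The
sketch below is an illustration under stated assumptions: diagrams are encoded as
a number of vertices plus a list of propagators (pairs of vertex labels), only
quantum legs are represented, and the function names and sample diagrams are
hypothetical choices made for this example.

```python
def loop_number(num_vertices, propagators):
    """Count independent loops with a union-find pass over the propagators."""
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    loops = 0
    for u, v in propagators:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            loops += 1  # this propagator closes a loop
        else:
            parent[root_u] = root_v
    return loops


def quantum_leg_excess(num_vertices, propagators):
    """Return sum_i (n_{q,i} - 2), where n_{q,i} counts propagator ends at vertex i."""
    legs = [0] * num_vertices
    for u, v in propagators:
        legs[u] += 1
        legs[v] += 1
    return sum(n - 2 for n in legs)


# (number of vertices, list of propagators) for three connected diagrams.
diagrams = {
    "tadpole": (1, [(0, 0)]),
    "bubble": (2, [(0, 1), (0, 1)]),
    "sunset": (2, [(0, 1), (0, 1), (0, 1)]),
}

results = {}
for name, (V, props) in diagrams.items():
    P = len(props)
    L = loop_number(V, props)
    # Store (loops counted directly, P - V + 1, sum of n_q - 2).
    results[name] = (L, P - V + 1, quantum_leg_excess(V, props))
    print(name, results[name])
```

For the two 1-loop diagrams every vertex has exactly two quantum legs, so the
excess vanishes, while the 2-loop diagram needs vertices with three quantum legs.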
Restricting to the quadratic terms, for each of $c, \psi, a$, we have a term that
looks like, say
\[
  \int Dc\, D\bar{c}\; e^{\int d^dx\; \bar{c} \Delta c}
\]
for some operator $\Delta$. Then the path integral will give $\det \Delta$. If
the field is a boson, then we obtain $\frac{1}{\det \Delta}$ instead, but
ultimately, the goal is to figure out what this $\Delta$ is.
Note that each particle comes with a representation of $SO(d)$ (or rather, the
spin group) and the gauge group $G$. For our purposes, all we need to know about
the representations of $SO(d)$ is the spin, which may be 0 (trivial),
$\frac{1}{2}$ (spinor) or 1 (vector). We will refer to the representation of $G$
as “$R$”, and the spin as $j$. We then define the operator
\[
  \Delta_{R,j} = \nabla^2 + 2\left(\frac{1}{2} F^a_{\mu\nu} J^{\mu\nu}_{(j)}\right) t^a_R,
\]
where $\{t^a_R\}$ are the images of the generators of $\mathfrak{g}$ in the
representation, and $J^{\mu\nu}_{(j)}$ are the generators of $\mathfrak{so}(d)$ in
the spin $j$ representation. In particular,
\[
  J^{\mu\nu}_{(0)} = 0, \quad
  J^{\mu\nu}_{(\frac{1}{2})} = S^{\mu\nu} = \frac{1}{4}[\gamma^\mu, \gamma^\nu], \quad
  J^{\mu\nu}_{(1)} = i(\delta^\rho_\mu \delta^\sigma_\nu - \delta^\rho_\nu \delta^\sigma_\mu).
\]
For simplicity, we will assume the fermion masses $m = 0$. Then we claim that
(up to a constant factor), the $\Delta$ of the $c$, $\psi$ and $a$ fields are
just $\Delta_{\mathrm{adj},0}$, $\sqrt{\Delta_{R,\frac{1}{2}}}$ and
$\Delta_{\mathrm{adj},1}$ respectively. This is just a computation, which we
shall omit.
Thus, if there are $n$ many fermions, then we find that we have
\[
  Z = \exp\left(-\frac{1}{2g_{YM}^2} \int (F^0_{\mu\nu}, F^{0,\mu\nu})\, d^dx\right)
    \frac{(\det \Delta_{\mathrm{adj},0})\, (\det \Delta_{R,\frac{1}{2}})^{n/2}}
         {(\det \Delta_{\mathrm{adj},1})^{1/2}}.
\]
We are going to view these extra terms as being quantum corrections to the
effective action of $A_\mu$, and we will look at the corrections to the coupling
$g_{YM}^2$ they induce. Thus, we are ultimately interested in the logarithm of
these extra terms.
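To proceed, we write $\Delta$ out explicitly:
\[
  \Delta_{R,j} = -\partial^2
    + \underbrace{(\partial^\mu A^a_\mu + A^a_\mu \partial^\mu)\, t^a_{(R)}}_{\Delta^{(1)}}
    + \underbrace{A^{\mu,a} A^b_\mu\, t^a_R t^b_R}_{\Delta^{(2)}}
    + \underbrace{2\left(\frac{1}{2} F^a_{\mu\nu} J^{\mu\nu}_{(j)}\right) t^a_{(R)}}_{\Delta^{(J)}}.
\]
Then we can write the logarithm as
\begin{align*}
  \log\det \Delta_{R,j}
    &= \log\det(-\partial^2 + \Delta^{(1)} + \Delta^{(2)} + \Delta^{(J)})\\
    &= \log\det(-\partial^2)
       + \operatorname{tr}\log\left(1 - (\partial^2)^{-1}
         (\Delta^{(1)} + \Delta^{(2)} + \Delta^{(J)})\right).
\end{align*}
Again, the first term is a constant, and we will ignore it. We want to know the
correction to the coupling $\frac{1}{4g_{YM}^2}$. Since everything is covariant
with respect to the background field $A^0$, it is enough to just compute the
quadratic terms in $A_\mu$, because all the other terms are required to behave
accordingly.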
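We now see what quadratic terms we obtain in the series expansion of $\log$.
In the case of $G = SU(N)$, we have
$\operatorname{tr} t^a = \operatorname{tr} J^{\mu\nu} = 0$. So certain cross
terms vanish, and the only quadratic terms are
\[
  \log\det \Delta_{R,j} \sim
    \underbrace{\operatorname{tr}(-\partial^{-2} \Delta^{(2)})}_{(a)}
    + \underbrace{\frac{1}{2}\operatorname{tr}
        (\partial^{-2}\Delta^{(1)}\partial^{-2}\Delta^{(1)})}_{(b)}
    + \underbrace{\frac{1}{2}\operatorname{tr}
        (\partial^{-2}\Delta^{(J)}\partial^{-2}\Delta^{(J)})}_{(c)},
\]
where
\begin{align*}
  (a) &= \int \frac{d^dk}{(2\pi)^d}\, A^a_\mu(k) A^b_\nu(-k)
         \int \frac{d^dp}{(2\pi)^d}\, \frac{\operatorname{tr}_R(t^a t^b)}{p^2}\,
         \delta^{\mu\nu} d(j),\\
  (b) &= \int \frac{d^dk}{(2\pi)^d} \frac{d^dp}{(2\pi)^d}\,
         \frac{(k + 2p)_\mu (k + 2p)_\nu \operatorname{tr}(t^a t^b)\,
               A^a_\mu(k) A^b_\nu(-k)}{p^2 (p + k)^2},\\
  (c) &= \int \frac{d^dk}{(2\pi)^d} \frac{d^dp}{(2\pi)^d}\,
         A^a_\mu(k) A^b_\nu(-k)\, (k^2 \delta^{\mu\nu} - k^\mu k^\nu)\,
         \frac{1}{p^2 (p + k)^2}\, C(j) \operatorname{tr}(t^a t^b).
\end{align*}
Here $d(j)$ is the number of spin components of the field and $C(j)$ is some
constant depending on $j$. Explicitly, they are given by

          scalar   Dirac   4-vector
  d(j)      1        4        4
  C(j)      0        1        2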
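In terms of Feynman diagrams, we can interpret (a) as being given by the
loop diagram
[Diagram: external legs $A^a_\mu$ and $A^b_\nu$ with momentum $k$, attached at a
single vertex to a loop carrying momentum $p$]
while (b) and (c) are given by diagrams that look like
[Diagram: external legs $A^a_\mu$ and $A^b_\nu$ with momentum $k$, attached at
two separate vertices of a loop]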
We now define the quantity $C(R)$ by
\[
  \operatorname{tr}_R(t^a t^b) = C(R)\, \delta^{ab},
\]
where the trace is to be taken in the representation $R$. For $G = SU(N)$, we
have
\[
  C(\mathrm{adj}) = N, \quad C(\mathrm{fund}) = \frac{1}{2}.
\]
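Then, evaluating all those integrals, we find that, in dimensional
regularization, we have
\[
  \log\det \Delta_{R,j} = \frac{\Gamma(2 - \frac{d}{2})}{4} \int d^dx\;
    \mu^{4-d} (-\partial^2)^{\frac{d-4}{2}} F^a_{\mu\nu} F^{a,\mu\nu} \cdot
    \frac{C(R)}{(4\pi)^2} \left(\frac{d(j)}{3} - 4C(j)\right).
\]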
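Thus, combining all the pieces, we obtain
\begin{align*}
  S_{\mathrm{eff}}[A] ={}& \frac{1}{4g_{YM}^2} \int d^dx\; F^a_{\mu\nu} F^{a,\mu\nu}
    - \frac{\Gamma(2 - \frac{d}{2})}{4} \int d^dx\;
      \mu^{4-d} (-\partial^2)^{\frac{d-4}{2}} F^a_{\mu\nu} F^{\mu\nu}_a\\
    &\times \left(\frac{1}{2} C_{\mathrm{adj},1} - C_{\mathrm{adj},0}
      - \frac{n}{2} C_{R,1/2}\right),
\end{align*}
where
\[
  C_{R,j} = \frac{C(R)}{(4\pi)^2} \left(\frac{d(j)}{3} - 4C(j)\right).
\]
Explicitly, we have
\[
  C_{R,j} = \frac{C(R)}{(4\pi)^2} \times
  \begin{cases}
    \frac{1}{3} & \text{scalars}\\
    -\frac{8}{3} & \text{Dirac}\\
    -\frac{20}{3} & \text{vectors}
  \end{cases}.
\]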
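As always, the $\Gamma$ function diverges as we take $d \to 4$, and we need to
remove the divergence using counterterms. In the MS scheme with scale $\mu$
(with $g_{YM}^2 = \mu^{4-d} g^2(\mu)$), we are left with logarithmic dependence
on $\mu^2$. The independence of $\mu$ gives the condition
\[
  \mu \frac{d}{d\mu} \left(\frac{1}{g^2(\mu)}
    + \log\left(\frac{\mu^2}{\text{something}}\right)
      \left(\frac{1}{2} C_{\mathrm{adj},1} - C_{\mathrm{adj},0}
        - \frac{n}{2} C_{R,\frac{1}{2}}\right)\right) = 0.
\]
So
\[
  -\frac{2}{g^3(\mu)} \beta(g) + 2\left(\frac{1}{2} C_{\mathrm{adj},1}
    - C_{\mathrm{adj},0} - \frac{n}{2} C_{R,\frac{1}{2}}\right) = 0.
\]
In other words, we find
\begin{align*}
  \beta(g) &= g^3(\mu) \left(\frac{1}{2} C_{\mathrm{adj},1} - C_{\mathrm{adj},0}
      - \frac{n}{2} C_{R,\frac{1}{2}}\right)\\
    &= -\frac{g^3(\mu)}{(4\pi)^2}
      \left(\frac{11}{3} C(\mathrm{adj}) - \frac{4n}{3} C(R)\right)\\
    &= -\frac{g^3}{(4\pi)^2} \left(\frac{11}{3} N - \frac{2n}{3}\right).
\end{align*}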
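The arithmetic leading from the $C_{R,j}$ to the final coefficient is easy to
get wrong by a sign, so here is a small exact check. It is a sketch that takes
the values of $d(j)$, $C(j)$, $C(\mathrm{adj}) = N$ and
$C(\mathrm{fund}) = \frac{1}{2}$ from the text, strips off the common factor
$1/(4\pi)^2$, and uses hypothetical function names; the choice $N = 3$, $n = 6$
is only an example.

```python
from fractions import Fraction

# Spin data quoted in the text: number of components d(j) and constant C(j).
D_J = {"scalar": 1, "dirac": 4, "vector": 4}
C_J = {"scalar": 0, "dirac": 1, "vector": 2}


def c_rj(c_rep, spin):
    """C_{R,j} with the common factor 1/(4 pi)^2 stripped off."""
    return Fraction(c_rep) * (Fraction(D_J[spin], 3) - 4 * C_J[spin])


def beta_coefficient(c_adj, c_rep, n_fermions):
    """Return b such that beta(g) = b * g^3 / (4 pi)^2 at one loop."""
    return (Fraction(1, 2) * c_rj(c_adj, "vector")
            - c_rj(c_adj, "scalar")
            - Fraction(n_fermions, 2) * c_rj(c_rep, "dirac"))


# Example values (an assumption of this sketch): SU(3) with 6 fundamental fermions.
N, n = 3, 6
b = beta_coefficient(N, Fraction(1, 2), n)
closed_form = -(Fraction(11, 3) * N - Fraction(2, 3) * n)

# Largest number of fundamental fermions for which the coefficient stays negative.
max_flavours = max(k for k in range(40)
                   if beta_coefficient(N, Fraction(1, 2), k) < 0)
print(b, closed_form, max_flavours)
```

Using exact fractions avoids any rounding, so the agreement with the closed form
$-(\frac{11}{3}N - \frac{2n}{3})$ is an identity rather than an approximation.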
Thus, for $n$ sufficiently small, the $\beta$-function is negative! Hence, at
least for small values of $g$, where we can trust perturbation theory, this
coupling now increases as $\Lambda \to 0$, and decreases as $\Lambda \to \infty$.
The first consequence is that this theory has a sensible continuum limit! The
coupling is large at low energies, and after a long story, this is supposed to lead
to the confinement of quarks.
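The behaviour of the coupling can be made concrete by integrating the 1-loop
equation $\mu\, dg/d\mu = -b_0 g^3$ with
$b_0 = \frac{1}{(4\pi)^2}\left(\frac{11}{3}N - \frac{2n}{3}\right)$, which gives
$1/g^2(\mu) = 1/g^2(\mu_0) + 2 b_0 \log(\mu/\mu_0)$. The sketch below assumes
this 1-loop formula only; the function name, the reference values $g(\mu_0) = 1$
at $\mu_0 = 100$ (in arbitrary units) and the choice $N = 3$, $n = 6$ are
hypothetical inputs for illustration.

```python
import math


def one_loop_coupling(g0, mu0, mu, N=3, n=6):
    """Solve mu dg/dmu = -b0 g^3 with g(mu0) = g0, where
    b0 = (11 N / 3 - 2 n / 3) / (4 pi)^2 is the 1-loop coefficient."""
    b0 = (11 * N / 3 - 2 * n / 3) / (4 * math.pi) ** 2
    inv_g2 = 1 / g0 ** 2 + 2 * b0 * math.log(mu / mu0)
    if inv_g2 <= 0:
        # The 1-loop coupling has diverged; perturbation theory cannot be trusted.
        raise ValueError("coupling outside the perturbative regime")
    return 1 / math.sqrt(inv_g2)


g_ref = 1.0      # assumed coupling at the reference scale
mu_ref = 100.0   # assumed reference scale, arbitrary units
g_high = one_loop_coupling(g_ref, mu_ref, 10 * mu_ref)   # higher energy
g_low = one_loop_coupling(g_ref, mu_ref, mu_ref / 10)    # lower energy
print(g_low, g_ref, g_high)
```

The coupling comes out smaller at the higher scale and larger at the lower
scale, and the function refuses to continue once the 1-loop formula breaks down
at low energies.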
In fact, in 1973, Gross–Coleman showed that the only non-trivial QFTs in
$d = 4$ with a continuum limit are non-abelian gauge theories. The proof was by
exhaustion! They just considered the most general kind of QFT we can have,
and then computed the $\beta$-functions etc., and figured out that the only ones
that existed were non-abelian gauge theories.