Part II Statistical Physics
Based on lectures by H. S. Reall
Notes taken by Dexter Chua
Lent 2017
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
Part IB Quantum Mechanics and “Multiparticle Systems” from Part II Principles of Quantum Mechanics are essential.
Fundamentals of statistical mechanics
Microcanonical ensemble. Entropy, temperature and pressure. Laws of thermodynamics. Example of paramagnetism. Boltzmann distribution and canonical ensemble. Partition function. Free energy. Specific heats. Chemical potential. Grand canonical ensemble. [5]
Classical gases
Density of states and the classical limit. Ideal gas. Maxwell distribution. Equipartition of energy. Diatomic gas. Interacting gases. Virial expansion. Van der Waals equation of state. Basic kinetic theory. [3]
Quantum gases
Density of states. Planck distribution and black body radiation. Debye model of
phonons in solids. Bose–Einstein distribution. Ideal Bose gas and Bose–Einstein
condensation. Fermi-Dirac distribution. Ideal Fermi gas. Pauli paramagnetism. [8]
Thermodynamics
Thermodynamic temperature scale. Heat and work. Carnot cycle. Applications of laws of thermodynamics. Thermodynamic potentials. Maxwell relations. [4]
Phase transitions
Liquid-gas transitions. Critical point and critical exponents. Ising model. Mean field
theory. First and second order phase transitions. Symmetries and order parameters. [4]
Contents
0 Introduction
1 Fundamentals of statistical mechanics
1.1 Microcanonical ensemble
1.2 Pressure, volume and the first law of thermodynamics
1.3 The canonical ensemble
1.4 Helmholtz free energy
1.5 The chemical potential and the grand canonical ensemble
1.6 Extensive and intensive properties
2 Classical gases
2.1 The classical partition function
2.2 Monoatomic ideal gas
2.3 Maxwell distribution
2.4 Diatomic gases
2.5 Interacting gases
3 Quantum gases
3.1 Density of states
3.2 Black-body radiation
3.3 Phonons and the Debye model
3.4 Quantum ideal gas
3.5 Bosons
3.6 Bose–Einstein condensation
3.7 Fermions
3.8 Pauli paramagnetism
4 Classical thermodynamics
4.1 Zeroth and first law
4.2 The second law
4.3 Carnot cycles
4.4 Entropy
4.5 Thermodynamic potentials
4.6 Third law of thermodynamics
5 Phase transitions
5.1 Liquid-gas transition
5.2 Critical point and critical exponents
5.3 The Ising model
5.4 Landau theory
0 Introduction
In all of our previous physics courses, we mostly focused on “microscopic” physics. For example, we used the Schrödinger equation to describe the mechanics of a single hydrogen atom. We had to do quite a bit of hard work to figure out exactly how it behaved. But what if we wanted to study a much larger system? Let's say we have a box of hydrogen gas, consisting of 10^23 molecules. If we tried to solve the Schrödinger equation for this system, then, even numerically, it is completely intractable.
So how can we understand such a system? Of course, given these 10^23 molecules, we are not interested in the detailed dynamics of each molecule. We are only interested in some “macroscopic” phenomena. For example, we might want to know its pressure and temperature. So what we want to do is to describe the whole system using just a few “macroscopic” variables, and still capture the main properties we are interested in.
In the first part of the course, we will approach this subject rather “rigorously”.
We will start from some microscopic laws we know about, and use them to deduce
properties of the macroscopic system. This turns out to be rather successful,
at least if we treat things sufficiently properly. Surprisingly, it even applies to
scenarios where it is absolutely not obvious it should apply!
Historically, this was not how statistical physics was first developed. Instead, we only had the “macroscopic” properties, and tried to understand them on their own terms. Back in the day, we didn't even know that things were made up of atoms! We will try to understand statistical phenomena in purely macroscopic and “wordy” terms, and it turns out we can reproduce the same predictions as before.
Finally, we will turn our attention to something rather different in nature: phase transitions. As we all know, water turns from liquid to gas as we raise the temperature. This transition is an example of a phase transition. Of course, this is still a “macroscopic” phenomenon, fitting in line with the theme of the course.
It doesn’t make sense to talk about a single molecule transitioning from liquid
to gas. Interestingly, when we look at many different systems undergoing phase
transitions, they seem to all behave “in the same way” near the phase transition.
We will figure out why. At least, partially.
Statistical physics is important. As we mentioned, we can use it to make macroscopic predictions from microscopic laws. Thus, this allows us to put our microscopic laws to experimental test. It also turns out the methods of statistical physics have far-reaching applications elsewhere. They can be used to study black holes, or even biology!
1 Fundamentals of statistical mechanics
1.1 Microcanonical ensemble
We begin by considering a rather general system. Suppose we have an isolated system containing N particles, where N is a Large Number™. The canonical example to keep in mind is a box of gas detached from reality.
Definition (Microstate). The microstate of a system is the actual (quantum) state of the system. This gives a complete description of the system.
As one would expect, the microstate is very complicated and infeasible to
describe, especially when we have many particles. In statistical physics, we
observe that many microstates are indistinguishable macroscopically. Thus, we
only take note of some macroscopically interesting quantities, and use these
macroscopic quantities to put a probability distribution on the microstates.
More precisely, we let {|n⟩} be a basis of normalized energy eigenstates, say

Ĥ|n⟩ = E_n |n⟩.

We let p(n) be the probability that the microstate is |n⟩. Note that this probability is not the quantum probability we are used to. It is some probability assigned to reflect our ignorance of the system. Given such probabilities, we can define the expectation of an operator in the least imaginative way:
Definition (Expectation value). Given a probability distribution p(n) on the states, the expectation value of an operator O is

⟨O⟩ = Σ_n p(n) ⟨n|O|n⟩.
If one knows about density operators, we can describe the system as a mixed state with density operator

ρ = Σ_n p(n) |n⟩⟨n|.
There is an equivalent way of looking at this. We can consider an ensemble consisting of W ≫ 1 independent copies of our system such that W p(n) many copies are in the microstate |n⟩. Then the expectation is just the average over the ensemble. For most purposes, how we think about this doesn't really matter.
We shall further assume our system is in equilibrium, i.e. the probability distribution p(n) does not change in time. So in particular ⟨O⟩ is independent of time. Of course, this does not mean the particles stop moving. The particles are still whizzing around. It's just that the statistical distribution does not change. In this course, we will mostly be talking about equilibrium systems. When we get out of equilibrium, things become very complicated.
The idea of statistical physics is that we have some partial knowledge about
the system. For example, we might know its total energy. The microstates that
are compatible with this partial knowledge are called accessible. The fundamental
assumption of statistical mechanics is then
An isolated system in equilibrium is equally likely to be in any of the
accessible microstates.
Thus, different probability distributions, or different ensembles, are distinguished by the partial knowledge we have.
Definition (Microcanonical ensemble). In a microcanonical ensemble, we know the energy is between E and E + δE, where δE is the accuracy of our measuring device. The accessible microstates are those with energy E ≤ E_n ≤ E + δE. We let Ω(E) be the number of such states.
In practice, δE is much much larger than the spacing of energy levels, and so Ω(E) ≫ 1. A priori, it seems like our theory will depend on the value of δE, but as we develop the theory, we will see that this doesn't really matter. It is crucial here that we are working with a quantum system, so the possible states are discrete, and it makes sense to count the number of states. We would need to do quite a bit more work if we wanted to do this classically.
Example. Suppose we have N = 10^23 particles, and each particle can occupy two states |↑⟩ and |↓⟩, which have the same energy ε. Then the total energy is always Nε, and we have

Ω(Nε) = 2^(10^23).

This is a fantastically huge, mind-boggling number. This is the kind of number we are talking about.
By the fundamental assumption, we can write

p(n) = 1/Ω(E) if E ≤ E_n ≤ E + δE, and p(n) = 0 otherwise.

This is the characteristic distribution of the microcanonical ensemble.
It turns out it is not very convenient to work with Ω(E). In particular, Ω(E) is not linear in N, the number of particles. Instead, it scales as an exponential of N. So we take the logarithm.
Definition (Boltzmann entropy). The (Boltzmann) entropy is defined as

S(E) = k log Ω(E),

where k = 1.381 × 10^{−23} J K^{−1} is Boltzmann's constant.
This annoying constant
k
is necessary because when people started doing
thermodynamics, they didn’t know about statistical physics, and picked weird
conventions.
We wrote our expression as S(E), instead of S(E, δE). As promised, the value of δE doesn't really matter. We know that Ω(E) will scale approximately linearly with δE. So if we, say, double δE, then S(E) will increase by k log 2, which is incredibly tiny compared to S(E) = k log Ω(E). So it doesn't matter which value of δE we pick.
Even if you are not so convinced that multiplying 10^(10^23) by a factor of 2 or adding log 2 to 10^23 does not really matter, you should be reassured that at the end, we will rarely talk about Ω(E) or S(E) itself. Instead, we will often divide two different Ω's to get probabilities, or differentiate S to get other interesting quantities. In these cases, the factors really do not matter.
The second nice property of the entropy is that it is additive: if we have two non-interacting systems with energies E^(1) and E^(2), then the total number of states of the combined system is

Ω(E^(1), E^(2)) = Ω_1(E^(1)) Ω_2(E^(2)).

So when we take the logarithm, we find

S(E^(1), E^(2)) = S_1(E^(1)) + S_2(E^(2)).
Of course, this is not very interesting, until we bring our systems together and
let them interact with each other.
Interacting systems
Suppose we bring the two systems together, and let them exchange energy. Then the energy of the individual systems is no longer fixed, and only the total energy

E_total = E^(1) + E^(2)

is fixed. Then we find that

Ω(E_total) = Σ_{E_i} Ω_1(E_i) Ω_2(E_total − E_i),
where we sum over all possible energy levels of the first system. In terms of the
entropy of the system, we have
Ω(E_total) = Σ_{E_i} exp( S_1(E_i)/k + S_2(E_total − E_i)/k ).
We can be a bit more precise with what the sum means. We are not summing
over all eigenstates. Recall that we have previously fixed an accuracy
δE
. So
we can imagine dividing the whole energy spectrum into chunks of size
δE
, and
here we are summing over the chunks.
We know that S_{1,2}/k ∼ N_{1,2} ∼ 10^23, which is a ridiculously large number. So the sum is overwhelmingly dominated by the term with the largest exponent. Suppose this is maximized when E_i = E_*. Then we have

S(E_total) = k log Ω(E_total) ≈ S_1(E_*) + S_2(E_total − E_*).
Again, we are not claiming that only the term coming from E_* has a significant contribution. Maybe one or two energy levels next to E_* are also very significant, but taking these into account will only multiply Ω(E_total) by a (relatively) small constant, hence contributes a small additive factor to S(E_total), which can be neglected.
Now given any E^(1), what is the probability that the actual energy of the first system is E^(1)? For convenience, we write E^(2) = E_total − E^(1) for the energy of the second system. Then the desired probability is

Ω_1(E^(1)) Ω_2(E^(2)) / Ω(E_total) = exp( (1/k) [ S_1(E^(1)) + S_2(E^(2)) − S(E_total) ] ).
Again recall that the numbers at stake are unimaginably huge. So if S_1(E^(1)) + S_2(E^(2)) is even slightly different from S(E_total), then the probability is effectively zero. And by the above, for the two quantities to be close, we need E^(1) = E_*. So for all practical purposes, the value of E^(1) is fixed to E_*.
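To get a feel for how sharply this distribution peaks, here is a small numerical sketch (not part of the lectures) using a toy density of states Ω_i(E) ∝ E^{N_i}, which roughly mimics N_i classical degrees of freedom; the specific form is just an assumption for illustration.

import numpy as np

def relative_width(N1, N2, E_total=1.0, points=20001):
    # Width of the distribution of E^(1) for toy densities of states
    # Omega_1(E) = E^N1 and Omega_2(E) = E^N2 (normalization is irrelevant).
    E1 = np.linspace(1e-9, E_total - 1e-9, points)
    # Work with logs to avoid overflow: log p ~ N1 log E1 + N2 log(E_total - E1).
    logp = N1 * np.log(E1) + N2 * np.log(E_total - E1)
    p = np.exp(logp - logp.max())
    p /= p.sum()
    mean = (p * E1).sum()
    std = np.sqrt((p * (E1 - mean) ** 2).sum())
    return std / E_total

for N in [10, 1000, 100000]:
    print(N, relative_width(N, N))   # relative width shrinks roughly like 1/sqrt(N)

For N of order 10^23 the width is utterly negligible, which is the statement made above.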
Now imagine we prepare two systems separately with energies E^(1) and E^(2) such that E^(1) ≠ E_*, and then bring the systems together. Then we are no longer in equilibrium. E^(1) will change until it takes the value E_*, and the entropy of the system will increase from S_1(E^(1)) + S_2(E^(2)) to S_1(E_*) + S_2(E_total − E_*). In particular, the entropy increases.
Law
(Second law of thermodynamics)
.
The entropy of an isolated system
increases (or remains the same) in any physical process. In equilibrium, the
entropy attains its maximum value.
This prediction is verified by virtually all observations of physics.
While our derivation did not show it is impossible to violate the second law
of thermodynamics, it is very very very very very very very very unlikely to be
violated.
Temperature
Having defined entropy, the next interesting thing we can define is the temperature.
We assume that S is a smooth function of E. Then we can define the temperature as follows:
Definition (Temperature). The temperature is defined to be

1/T = dS/dE.
Why do we call this the temperature? Over the course, we will see that
this quantity we decide to call “temperature” does behave as we would expect
temperature to behave. It is difficult to give further justification of this definition,
because even though we vaguely have some idea what temperature is like in
daily life, those ideas are very far from anything we can concretely write down
or even describe.
One reassuring property we can prove is the following:
Proposition. Two interacting systems in equilibrium have the same temperature.
Proof. Recall that the equilibrium energy E_* is found by maximizing

S_1(E_i) + S_2(E_total − E_i)

over all possible E_i. Thus, at an equilibrium, the derivative of this expression has to vanish, and the derivative is exactly

dS_1/dE |_{E^(1) = E_*} − dS_2/dE |_{E^(2) = E_total − E_*} = 0.

So we need

1/T_1 = 1/T_2.

In other words, we need T_1 = T_2.
Now suppose initially, our systems have different temperature. We would
expect energy to flow from the hotter system to the cooler system. This is indeed
the case.
Proposition. Suppose two systems with initial energies E^(1), E^(2) and temperatures T_1, T_2 are put into contact. If T_1 > T_2, then energy will flow from the first system to the second.

Proof. Since we are not in equilibrium, there must be some energy transfer from one system to the other. Suppose after time δt, the energy changes by

E^(1) ↦ E^(1) + δE,    E^(2) ↦ E^(2) − δE,
keeping the total energy constant. Then the change in entropy is given by

δS = (dS_1/dE) δE^(1) + (dS_2/dE) δE^(2) = (1/T_1 − 1/T_2) δE.

By assumption, we know

1/T_1 − 1/T_2 < 0,

but by the second law of thermodynamics, we know δS must be non-negative. So we must have δE < 0, i.e. energy flows from the first system to the second.
So this notion of temperature agrees with the basic properties of temperature
we expect.
Note that the properties we've derived only depend on the fact that 1/T is a monotonically decreasing function of T. In principle, we could have picked any monotonically decreasing function of T, and set it equal to dS/dE. We will later see that this definition agrees with the other definitions of temperature we have previously seen, e.g. via the ideal gas law, and so this is indeed the “right” one.
Heat capacity
As we will keep on doing later, we can take different derivatives to get different interesting quantities. This time, we are going to get the heat capacity. Recall that T was a function of energy, T = T(E). We will assume that we can invert this function, at least locally, to get E as a function of T.

Definition (Heat capacity). The heat capacity of a system is

C = dE/dT.

The specific heat capacity is

C / (mass of system).
The specific heat capacity is a property of the substance that makes up the
system, and not how much stuff there is, as both
C
and the mass scale linearly
with the size of the system.
This is some quantity we can actually physically measure. We can measure
the temperature with a thermometer, and it is usually not too difficult to see how
much energy we are pumping into a system. Then by measuring the temperature
change, we can figure out the heat capacity.
In doing so, we can indirectly measure the entropy, or at least the changes in
entropy. Note that we have
dS/dT = (dS/dE)(dE/dT) = C/T.

Integrating up, if the temperature changes from T_1 to T_2, we know

∆S = ∫_{T_1}^{T_2} (C(T)/T) dT.

As promised, by measuring heat capacity experimentally, we can measure the change in entropy.
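As a tiny numerical illustration (a Python sketch, not from the lectures; taking a constant heat capacity is just an assumption for the example), the integral reproduces C log(T_2/T_1):

import numpy as np

def delta_S(C, T1, T2, steps=100000):
    # Numerically integrate dS = C(T)/T dT by the midpoint rule.
    T = np.linspace(T1, T2, steps + 1)
    Tmid = 0.5 * (T[:-1] + T[1:])
    return float(np.sum(C(Tmid) / Tmid) * (T2 - T1) / steps)

# For a constant heat capacity C, the change should be C*log(T2/T1).
C_const = lambda T: np.ones_like(T)
print(delta_S(C_const, 300.0, 600.0), np.log(2.0))   # both ~ 0.6931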
The heat capacity is useful in other ways. Recall that to find the equilibrium energy E_*, a necessary condition was that it satisfies

dS_1/dE − dS_2/dE = 0.

However, we only know that the solution is an extremum, and not necessarily a maximum. To figure out if it is the maximum, we take the second derivative. Note that for a single system, we have

d²S/dE² = d/dE (1/T) = −1/(T²C).

Applying this to two systems, one can check that entropy is maximized at E^(1) = E_* if C_1, C_2 > 0. The actual computations are left as an exercise on the first example sheet.
Let’s look at some actual systems and try to calculate these quantities.
Example. Consider a 2-state system, where we have N non-interacting particles with fixed positions. Each particle is either in |↑⟩ or |↓⟩. We can think of these as spins, for example. These two states have different energies

E_↑ = ε,   E_↓ = 0.

We let N_↑ and N_↓ be the number of particles in |↑⟩ and |↓⟩ respectively. Then the total energy of the system is

E = εN_↑.

We want to calculate the quantity Ω(E). Here in this very contrived example, it is convenient to pick δE < ε, so that Ω(E) is just the number of ways of choosing N_↑ particles from N. By basic combinatorics, we have

Ω(E) = N! / (N_↑! (N − N_↑)!),

and

S(E) = k log( N! / (N_↑! (N − N_↑)!) ).
This is not an incredibly useful formula. Since we assumed that N and N_↑ are huge, we can use Stirling's approximation

N! = √(2πN) N^N e^{−N} (1 + O(1/N)).

Then we have

log N! = N log N − N + (1/2) log(2πN) + O(1/N).
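Before using this, here is a quick numerical sanity check (a Python sketch, not part of the notes; math.lgamma computes log N! exactly via the gamma function):

from math import lgamma, log, pi

def log_factorial_exact(N):
    return lgamma(N + 1)            # log(N!) via the gamma function

def log_factorial_stirling(N):
    return N * log(N) - N + 0.5 * log(2 * pi * N)

for N in [10, 100, 10000]:
    exact, approx = log_factorial_exact(N), log_factorial_stirling(N)
    print(N, exact, approx, exact - approx)   # the error is O(1/N)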
We just use the approximation three times to get

S(E) = k ( N log N − N − N_↑ log N_↑ + N_↑ − (N − N_↑) log(N − N_↑) + N − N_↑ )
     = −k ( (N − N_↑) log((N − N_↑)/N) + N_↑ log(N_↑/N) )
     = −kN ( (1 − E/(Nε)) log(1 − E/(Nε)) + (E/(Nε)) log(E/(Nε)) ).
This is better, but we can get much more out of it if we plot it: S(E) vanishes at E = 0 and at E = Nε, and attains its maximum value Nk log 2 at E = Nε/2.
The temperature is

1/T = dS/dE = (k/ε) log( Nε/E − 1 ),

and we can invert to get

N_↑/N = E/(Nε) = 1/(e^{ε/kT} + 1).
Suppose we get to control the temperature of the system, e.g. by putting it in contact with a heat reservoir. What happens as we vary the temperature?

– As T → 0, we have N_↑ → 0. So the particles all try to go to the ground state.
– As T → ∞, we find N_↑/N → 1/2, and E → Nε/2.
The second result is a bit weird. As T → ∞, we might expect all the particles to go to the maximum energy level, and not just half of them.
To confuse ourselves further, we can plot another graph, of 1/T against E: the curve decreases from +∞ at E = 0, passes through zero at E = Nε/2, and becomes negative for larger E.
We see that having energy E > Nε/2 corresponds to negative temperature, and to go from positive temperature to negative temperature, we need to pass through infinite temperature. So in some sense, negative temperature is “hotter” than infinite temperature.
What is going on? By definition, negative T means Ω(E) is a decreasing function of energy. This is a very unusual situation. In this system, all the particles are fixed, and have no kinetic energy. Consequently, the possible energy levels are bounded. If we included kinetic energy in the system, then the kinetic energy could be arbitrarily large. In this case, Ω(E) is usually an increasing function of E.
Negative T has indeed been observed experimentally. This requires setups where the kinetic energy is not so important in the range of energies we are talking about. One particular scenario where this is observed is in nuclear spins of crystals in magnetic fields. If we have a magnetic field, then naturally, most of the spins will align with the field. We now suddenly flip the field, and then most of the spins are anti-aligned, and this gives us a negative temperature state.
Now we can't measure negative temperature by sticking a thermometer into the material and getting a negative answer. Something that can be interestingly measured is the heat capacity

C = dE/dT = (Nε²/(kT²)) · e^{ε/kT}/(e^{ε/kT} + 1)².
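As a sanity check (a Python sketch with k = ε = 1 and a single particle; all values are chosen purely for illustration), a finite-difference derivative of E(T) = Nε/(e^{ε/kT} + 1) reproduces this closed form:

import numpy as np

k, eps, N = 1.0, 1.0, 1.0   # work in units where k = epsilon = 1, one particle

def E(T):
    return N * eps / (np.exp(eps / (k * T)) + 1.0)

def C_exact(T):
    x = np.exp(eps / (k * T))
    return N * eps**2 / (k * T**2) * x / (x + 1.0)**2

for T in [0.2, 0.5, 1.0, 5.0]:
    h = 1e-6
    C_fd = (E(T + h) - E(T - h)) / (2 * h)     # central finite difference dE/dT
    print(T, C_fd, C_exact(T))                 # the two columns agree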
This again exhibits some peculiar properties. We begin by looking at a plot: C vanishes both as T → 0 and as T → ∞, with a maximum around kT ∼ ε.
By looking at the formula, we see that the location of the maximum, kT ∼ ε, is set by the microscopic ε. So if we know the value of k, then we can use the macroscopic observation of C to deduce something about the microscopic ε.
Note that C is proportional to N. As T → 0, we have

C ∝ T^{−2} e^{−ε/kT},
and this is a function that decreases very rapidly as T → 0; in fact this is one of the favourite examples in analysis where all derivatives of the function at 0 vanish. Ultimately, this is due to the energy gap between the ground state and the first excited state.
Another peculiarity of this plot is that the heat capacity vanishes at high
temperature, but this is due to the peculiar property of the system at high
temperature. In a general system, we expect the heat capacity to increase with
temperature.
How much of this is actually physical? The answer is “not much”. This is
not surprising, because we didn’t really do much physics in these computations.
For most solids, the contribution to C from spins is swamped by other effects such as contributions of phonons (quantized vibrations in the solid) or electrons. In this case, C(T) is monotonic in T.
However, there are some very peculiar materials for which we obtain a small local maximum in C(T) at very small T, before C increases monotonically, and this local maximum is due to the contribution of the spins.
1.2 Pressure, volume and the first law of thermodynamics
So far, our system only had one single parameter: the energy. Usually, our systems have other external parameters which can be varied. Recall that our “standard” model of a statistical system is a box of gas. If we allow ourselves to move the walls of the box, then the volume of the system may vary. As we change the volume, the allowed energy eigenstates will change. So now Ω, and hence S, are functions of energy and volume:

S(E, V) = k log Ω(E, V).
We now need to modify our definition of temperature to account for this dependence:

Definition (Temperature). The temperature of a system with variable volume is

1/T = (∂S/∂E)_V,

with V fixed.
But now we can define a different thermodynamic quantity by taking the derivative with respect to V.

Definition (Pressure). We define the pressure of a system with variable volume to be

p = T (∂S/∂V)_E.

Is this thing we call the “pressure” anything like what we used to think of as pressure, namely force per unit area? We will soon see that this is indeed the case.
We begin by deducing some familiar properties of pressure.
Proposition. Consider as before two interacting systems where the total volume V = V_1 + V_2 is fixed, but the individual volumes can vary. Then the entropy of the combined system is maximized when T_1 = T_2 and p_1 = p_2.

Proof. We have previously seen that we need T_1 = T_2. We also want

(∂S/∂V)_E = 0.

So we need

(∂S_1/∂V)_E = (∂S_2/∂V)_E.

Since the temperatures are equal, we know that we also need p_1 = p_2.
For a single system, we can use the chain rule to write

dS = (∂S/∂E)_V dE + (∂S/∂V)_E dV.

Then we can use the definitions of temperature and pressure to write

Proposition (First law of thermodynamics).

dE = T dS − p dV.

This law relates two infinitesimally close equilibrium states. This is sometimes called the fundamental thermodynamic relation.
Example. Consider a box with one side a movable piston of area A. We apply a force F to keep the piston in place. What happens if we move the piston a little bit? If we move it through a distance dx, then the volume of the gas has increased by A dx. We assume S is constant. Then the first law tells us

dE = −pA dx.

This formula should be very familiar to us. The gas does work F dx = pA dx on the piston as it expands, so we must have F = pA. So our definition of pressure in terms of partial derivatives reproduces the mechanics definition of force per unit area.
One has to be cautious here. It is not always true that −p dV can be equated with the work done on a system. For this to be true, we require the change to be reversible, which is a notion we will study in more depth later. For example, this is not true when there is friction.
In the case of a reversible change, if we equate −p dV with the work done, then there is only one possible thing T dS can be: it is the heat supplied to the system.
It is important to remember that the first law holds for any change. It’s just
that this interpretation does not.
Example. Consider an irreversible change, where we have a “free expansion” of gas into a vacuum. We have a box with a partition; on one side there is gas, and on the other side vacuum. We have a valve in the partition, and as soon as we open up the valve, the gas flows to the other side of the box.

In this system, no energy has been supplied. So dE = 0. However, dV > 0, as the volume clearly increased. But there is no work done on or by the gas. So in this case, −p dV is certainly not the work done. Using the first law, we know that

T dS = p dV.

So as the volume increases, the entropy increases as well.
We now revisit the concept of heat capacity. We previously defined it as dE/dT, but now we need to decide what we want to keep fixed. We can keep V fixed, and get

C_V = (∂E/∂T)_V = T (∂S/∂T)_V.

While this is the obvious generalization of what we previously had, it is not a very useful quantity. We do not usually do experiments with fixed volume. For example, if we do a chemistry experiment in a test tube, say, then the volume is not fixed, as the gas in the test tube is free to go around. Instead, what is fixed is the pressure. We can analogously define

C_p = T (∂S/∂T)_p.

Note that we cannot write this as some sort of (∂E/∂T).
1.3 The canonical ensemble
So far, we have been using the microcanonical ensemble. The underlying as-
sumption is that our system is totally isolated, and we know what the energy of
the system is. However, in real life, this is most likely not the case. Even if we
produce a sealed box of gas, and try to do experiments with it, the system is
not isolated. It can exchange heat with the environment.
On the other hand, there is one thing that is fixed: the temperature.
The box is in thermal equilibrium with the environment. If we assume the
environment is “large”, then we can assume that the environment is not really
affected by the box, and so the box is forced to have the same temperature as
the environment.
Let's try to study this property. Consider a system S interacting with a much larger system R. We call R a heat reservoir. Since R is assumed to be large, the energy of S is negligible compared to that of R, and we will assume R always has a fixed temperature T. Then in this set up, the systems can exchange energy without changing T.
As before, we let {|n⟩} be a basis of microstates of S with energies E_n. We suppose we fix a total energy E_total, and we want to find the total number of microstates of the combined system with this total energy. To do so, we fix some state |n⟩ of S, and ask how many states of S + R there are for which S is in |n⟩. We then later sum over all |n⟩.
By definition, we can write this as

Ω_R(E_total − E_n) = exp( k^{−1} S_R(E_total − E_n) ).
By assumption, we know R is a much larger system than S. So we only get significant contributions when E_n ≪ E_total. In these cases, we can Taylor expand to write

Ω_R(E_total − E_n) ≈ exp( k^{−1} S_R(E_total) − k^{−1} (∂S_R/∂E)_V E_n ).
But we know what ∂S_R/∂E is: it is just T^{−1}. So we finally get

Ω_R(E_total − E_n) = e^{k^{−1} S_R(E_total)} e^{−βE_n},

where we define

Notation (β).

β = 1/(kT).
Note that while we derived this formula under the assumption that E_n is small, it is effectively still valid when E_n is large, because both sides are very tiny, and even if they are very tiny in different ways, it doesn't matter when we sum over all states.
Now we can write the total number of microstates of S + R as

Ω(E_total) = Σ_n Ω_R(E_total − E_n) = e^{k^{−1} S_R(E_total)} Σ_n e^{−βE_n}.

Note that we are summing over all states, not over energies.
We now use the fundamental assumption of statistical mechanics that all states of S + R are equally likely. Then we know the probability that S is in state |n⟩ is

p(n) = Ω_R(E_total − E_n) / Ω(E_total) = e^{−βE_n} / Σ_k e^{−βE_k}.

This is called the Boltzmann distribution for the canonical ensemble. Note that at the end, all the details of the reservoir have dropped out apart from the temperature. This describes the energy distribution of a system with fixed temperature.
Note that if E_n ≫ kT = 1/β, then the exponential is small. So only states with E_n ≲ kT have significant probability. In particular, as T → 0, we have β → ∞, and so only the ground state can be occupied.
We now define an important quantity.
Definition (Partition function). The partition function is

Z = Σ_n e^{−βE_n}.
It turns out most of the quantities we are interested in can be expressed in terms of Z and its derivatives. Thus, to understand a general system, what we will do is to compute the partition function and express it in some familiar form. Then we can use standard calculus to obtain the quantities we are interested in. To begin with, we have

p(n) = e^{−βE_n} / Z.
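In code, this is all there is to the canonical ensemble (a Python sketch, not from the lectures; the energy levels are made up for illustration):

import numpy as np

def boltzmann(energies, T, k=1.0):
    # Return the canonical probabilities p(n) = exp(-E_n/kT)/Z and Z itself.
    beta = 1.0 / (k * T)
    weights = np.exp(-beta * (energies - energies.min()))  # shift for numerical stability
    Z_shifted = weights.sum()
    return weights / Z_shifted, Z_shifted * np.exp(-beta * energies.min())

E_n = np.array([0.0, 1.0, 1.0, 2.5])   # some made-up energy levels
p, Z = boltzmann(E_n, T=0.5)
print(p, p.sum())                      # probabilities sum to 1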
Proposition. For two non-interacting systems, we have Z(β) = Z_1(β) Z_2(β).

Proof. Since the systems are not interacting, we have

Z = Σ_{n,m} e^{−β(E_n^{(1)} + E_m^{(2)})} = ( Σ_n e^{−βE_n^{(1)}} )( Σ_m e^{−βE_m^{(2)}} ) = Z_1 Z_2.
Note that in general, the energy is not fixed, but we can compute the average value:

⟨E⟩ = Σ_n p(n) E_n = Σ_n E_n e^{−βE_n}/Z = −∂/∂β log Z.
This partial derivative is taken with all E_n fixed. Of course, in the real world, we don't get to directly change the energy eigenstates and see what happens. However, they do depend on some “external” parameters, such as the volume V, the magnetic field B etc. So when we take this derivative, we have to keep all those parameters fixed. We look at the simple case where V is the only parameter we can vary. Then Z = Z(β, V). We can rewrite the previous formula as

⟨E⟩ = −(∂ log Z/∂β)_V.
This gives us the average, but we also want to know the variance of E. We have

∆E² = ⟨(E − ⟨E⟩)²⟩ = ⟨E²⟩ − ⟨E⟩².

On the first example sheet, we calculate that this is in fact

∆E² = (∂² log Z/∂β²)_V = −(∂⟨E⟩/∂β)_V.
We can now convert β-derivatives to T-derivatives using the chain rule. Then we get

∆E² = kT² (∂⟨E⟩/∂T)_V = kT² C_V.
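These identities are easy to verify numerically (a Python sketch, not from the lectures; the spectrum is made up and k = 1): compare finite-difference derivatives of log Z with the direct averages.

import numpy as np

E_n = np.array([0.0, 0.3, 1.0, 1.7, 2.2])      # arbitrary energy levels

def logZ(beta):
    return np.log(np.sum(np.exp(-beta * E_n)))

beta, h = 1.3, 1e-4
p = np.exp(-beta * E_n); p /= p.sum()
E_avg = np.sum(p * E_n)
E_var = np.sum(p * E_n**2) - E_avg**2

dlogZ  = (logZ(beta + h) - logZ(beta - h)) / (2 * h)
d2logZ = (logZ(beta + h) - 2 * logZ(beta) + logZ(beta - h)) / h**2

print(-dlogZ, E_avg)     # <E> = -d(log Z)/d(beta)
print(d2logZ, E_var)     # ΔE² = d²(log Z)/d(beta)²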
From this, we can learn something important. We would expect ⟨E⟩ ∼ N, the number of particles of the system. But we also know C_V ∼ N. So

∆E/⟨E⟩ ∼ 1/√N.

Therefore, the fluctuations are negligible if N is large enough. This is called the thermodynamic limit N → ∞. In this limit, we can ignore the fluctuations in energy. So we expect the microcanonical ensemble and the canonical ensemble to give the same results. And for all practical purposes, N ∼ 10^23 is a large number. Because of that, we are often going to just write E instead of ⟨E⟩.
Example. Suppose we had particles with

E_↑ = ε,   E_↓ = 0.

So for one particle, we have

Z_1 = Σ_n e^{−βE_n} = 1 + e^{−βε} = 2 e^{−βε/2} cosh(βε/2).
If we have N non-interacting such particles, then since the partition function is multiplicative, we have

Z = Z_1^N = 2^N e^{−βεN/2} cosh^N(βε/2).
From the partition function, we can compute

⟨E⟩ = −d log Z/dβ = (Nε/2)(1 − tanh(βε/2)).

We can check that this agrees with the value we computed with the microcanonical ensemble (where we wrote the result using exponentials directly), but the calculation is much easier.
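A short numerical check (a Python sketch, k = 1, with arbitrary values of N, ε and β) that the tanh formula agrees with the direct single-particle Boltzmann average Nε e^{−βε}/(1 + e^{−βε}):

import numpy as np

N, eps = 100, 1.0
for beta in [0.1, 1.0, 10.0]:
    direct  = N * eps * np.exp(-beta * eps) / (1 + np.exp(-beta * eps))
    formula = N * eps / 2 * (1 - np.tanh(beta * eps / 2))
    print(beta, direct, formula)      # identical up to rounding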
Entropy
When we first began our journey into statistical physics, the starting point of everything was the entropy. When we derived the canonical ensemble, we used the entropy of everything, including that of the reservoir. However, we are not really interested in the reservoir, so we need to come up with an alternative definition of the entropy.

We can motivate our new definition as follows. We use our previous picture of an ensemble. We have W ≫ 1 many worlds, and our probability distribution says there are W p(n) copies of the world living in state |n⟩. We can ask what is the number of ways of picking a state for each copy of the world so as to reach this distribution.

We apply the Boltzmann definition of entropy to this counting:

S = k log Ω.

This time, Ω is given by

Ω = W! / ∏_n (W p(n))!.

We can use Stirling's approximation, and find that

S_ensemble = −kW Σ_n p(n) log p(n).
This suggests that we should define the entropy of a single copy as follows:
Definition (Gibbs entropy). The Gibbs entropy of a probability distribution p(n) is

S = −k Σ_n p(n) log p(n).

If the density operator is given by

ρ̂ = Σ_n p(n) |n⟩⟨n|,

then we have

S = −k Tr(ρ̂ log ρ̂).
We now check that this definition makes sense, in that when we have a microcanonical ensemble, we do get what we expect.

Example. In the microcanonical ensemble, we have

p(n) = 1/Ω(E) if E ≤ E_n ≤ E + δE, and p(n) = 0 otherwise.

Then we have

S = −k Σ_{n : E ≤ E_n ≤ E+δE} (1/Ω(E)) log(1/Ω(E)) = −k Ω(E) · (1/Ω(E)) log(1/Ω(E)) = k log Ω(E).

So the Gibbs entropy reduces to the Boltzmann entropy.
How about the canonical ensemble?
Example. In the canonical ensemble, we have

p(n) = e^{−βE_n}/Z.

Plugging this into the definition, we find that

S = −k Σ_n p(n) log( e^{−βE_n}/Z ) = −k Σ_n p(n)(−βE_n − log Z) = kβ⟨E⟩ + k log Z,

using the fact that Σ_n p(n) = 1.
Using the formula for the expected energy, we find that this is in fact

S = k ∂/∂T (T log Z)_V.

So again, if we want to compute the entropy, it suffices to find a nice closed form for Z.
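As a consistency check (a Python sketch, not from the lectures; the spectrum is arbitrary and k = 1), the Gibbs entropy of the Boltzmann distribution indeed equals kβ⟨E⟩ + k log Z:

import numpy as np

E_n  = np.array([0.0, 0.5, 1.2, 3.0])    # made-up energy levels, k = 1
beta = 0.7
p = np.exp(-beta * E_n); Z = p.sum(); p /= Z

S_gibbs  = -np.sum(p * np.log(p))        # S/k = -sum p log p
S_from_Z = beta * np.sum(p * E_n) + np.log(Z)
print(S_gibbs, S_from_Z)                 # the two agree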
Maximizing entropy
It turns out we can reach the canonical ensemble in a different way. The second
law of thermodynamics suggests we should always seek to maximize entropy. Now
if we take the optimization problem of “maximizing entropy”, what probability
distribution will we end up with?
The answer depends on what constraints we put on the optimization problem. We can try to maximize S_Gibbs over all probability distributions such that p(n) = 0 unless E ≤ E_n ≤ E + δE. Of course, we also have the constraint Σ_n p(n) = 1. Then we can use a Lagrange multiplier α and extremize

k^{−1} S_Gibbs + α( Σ_n p(n) − 1 ).

Differentiating with respect to p(n) and solving, we get

p(n) = e^{α − 1}.

In particular, this is independent of n. So all microstates with energy in this range are equally likely, and this gives the microcanonical ensemble.
What about the canonical ensemble? It turns out this is obtained by maximizing the entropy over all p(n) such that ⟨E⟩ is fixed. The computation is equally straightforward, and is done on the first example sheet.
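For the curious, the constrained maximization can also be done numerically (a sketch assuming scipy is available; the energy levels and the target value of ⟨E⟩ are arbitrary), and the maximizer indeed comes out as a Boltzmann distribution:

import numpy as np
from scipy.optimize import minimize

E_n = np.array([0.0, 1.0, 2.0, 3.0])   # made-up levels, k = 1
E_target = 1.2                         # fixed average energy

def neg_entropy(p):
    p = np.clip(p, 1e-12, None)
    return np.sum(p * np.log(p))

constraints = [
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},
    {"type": "eq", "fun": lambda p: np.dot(p, E_n) - E_target},
]
p0 = np.ones_like(E_n) / len(E_n)
res = minimize(neg_entropy, p0, bounds=[(0, 1)] * len(E_n), constraints=constraints)

p = res.x
# For a Boltzmann distribution, log p(n) is linear in E_n, so these ratios are constant (= -beta).
print(p)
print(np.diff(np.log(p)) / np.diff(E_n))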
1.4 Helmholtz free energy
In the microcanonical ensemble, we discussed the second law of thermodynamics,
namely the entropy increases with time and the maximum is achieved in an
equilibrium.
But this is no longer true in the case of the canonical ensemble, because
we now want to maximize the total entropy of the system plus the heat bath,
instead of just the system itself. Then is there a proper analogous quantity for
the canonical ensemble?
The answer is given by the Helmholtz free energy.
Definition (Helmholtz free energy). The Helmholtz free energy is

F = ⟨E⟩ − TS.

As before, we will often drop the ⟨·⟩.
In general, in an isolated system, S increases, and S is maximized in equilibrium. In a system with a reservoir, F decreases, and F is minimized in equilibrium. In some sense, F captures the competition between entropy and energy.
Now is there anything analogous to the first law

dE = T dS − p dV?

Using this, we can write

dF = dE − d(TS) = −S dT − p dV.
When we wrote down the original first law, we had dS and dV on the right, and thus it is natural to consider energy as a function of entropy and volume (instead of pressure and temperature). Similarly, it is natural to think of F as a function of T and V. Mathematically, the relation between F and E is that F is the Legendre transform of E.
From this expression, we can immediately write down

S = −(∂F/∂T)_V,

and the pressure is

p = −(∂F/∂V)_T.
As always, we can express the free energy in terms of the partition function.
Proposition.

F = −kT log Z.

Alternatively,

Z = e^{−βF}.

Proof. We use the fact that

∂/∂β = −kT² ∂/∂T.

Then we can start from

F = E − TS = −(∂ log Z/∂β)_V − TS = kT² (∂ log Z/∂T)_V − kT ∂/∂T(T log Z)_V = −kT log Z,

and we are done. Good.
1.5 The chemical potential and the grand canonical ensemble
So far we have considered situations where we had fixed energy or fixed volume.
However, there are often other things that are fixed. For example, the number
of particles, or the electric charge of the system would be fixed. If we measure
these things, then this restricts which microstates are accessible.
Let's consider N. This quantity was held fixed in the microcanonical and canonical ensembles. But the quantities we compute do depend on N, and we can write

S(E, V, N) = k log Ω(E, V, N).

Previously, we took this expression, and asked what happens when we varied E, and we got temperature. We then asked what happens when we vary V, and we got pressure. Now we ask what happens when we vary N.
Definition (Chemical potential). The chemical potential of a system is given by

µ = −T (∂S/∂N)_{E,V}.
Why is this significant? Recall that when we only varied
E
, then we figured
that two systems in equilibrium must have equal temperature. Then we varied
V
as well, and found that two systems in equilibrium must have equal temperature
and pressure. Then it shouldn’t be surprising that if we have two interacting
systems that can exchange particles, then we must have equal temperature,
pressure and chemical potential. Indeed, the same argument works.
If we want to consider what happens to the first law when we vary N, we can just straightforwardly write

dS = (∂S/∂E)_{V,N} dE + (∂S/∂V)_{E,N} dV + (∂S/∂N)_{E,V} dN.

Then as before, we get

dE = T dS − p dV + µ dN.
From this expression, we can get some feel for what µ is. It is the energy cost of adding one particle at fixed S, V. We will actually see later that µ is usually negative. This might seem counter-intuitive, because we shouldn't be able to gain energy by putting in particles in general. However, this µ is the cost of adding a particle at fixed entropy and volume. In general, adding a particle will cause the entropy to increase. So to keep S fixed, we will have to take out energy.
Of course, we can do the same thing with other sorts of external variables.
For example, we can change N to Q, the electric charge, and then we use Φ, the electrostatic potential, instead of µ. The theory behaves in exactly the same way.
From the first law, we can write

µ = (∂E/∂N)_{S,V}.

In the canonical ensemble, we have fixed T, but the free energy will still depend on N:

F(T, V, N) = E − TS.

Again, we have

dF = dE − d(TS) = −S dT − p dV + µ dN.

So we have

µ = (∂F/∂N)_{T,V}.
But in this case, the canonical ensemble is not the most natural thing to consider. Instead of putting our system in a heat reservoir, we put it in a “heat and particle” reservoir R. In some sense, this is a completely open system: it can exchange both heat and particles with the external world.
As before, µ and T are fixed by their values in R. We repeat the argument for the canonical ensemble, and we find that the probability that the system is in state |n⟩ is

p(n) = e^{−β(E_n − µN_n)} / Z,

where N_n is the number of particles in |n⟩, and we can define the grand canonical partition function

Z = Σ_n e^{−β(E_n − µN_n)}.
Of course, we can introduce more and more quantities after
V, N
, and then get
more and more terms in the partition function, but they are really just the same.
We can quickly work out how to compute quantities from Z. By writing out the expressions, we have

Proposition.

⟨E⟩ − µ⟨N⟩ = −(∂ log Z/∂β)_{µ,V}.

Proposition.

⟨N⟩ = Σ_n p(n) N_n = (1/β)(∂ log Z/∂µ)_{T,V}.
As in the canonical ensemble, there is a simple formula for the variance:

Proposition.

∆N² = ⟨N²⟩ − ⟨N⟩² = (1/β²)(∂² log Z/∂µ²)_{T,V} = (1/β)(∂⟨N⟩/∂µ)_{T,V} ∼ N.
So we have

∆N/⟨N⟩ ∼ 1/√N.

So again, in the thermodynamic limit, the fluctuations in ⟨N⟩ are negligible.
We can also calculate the Gibbs entropy:
Proposition.

S = k ∂/∂T (T log Z)_{µ,V}.
With the canonical ensemble, we had the free energy. There is an analogous
thing we can define for the grand canonical ensemble.
Definition (Grand canonical potential). The grand canonical potential is

Φ = F − µN = E − TS − µN.

Then we have

Proposition.

dΦ = −S dT − p dV − N dµ.
We thus see that it is natural to view Φ as a function of T, V and µ. Using the formula for ⟨E⟩ − µ⟨N⟩ and the expression for the entropy, we find, just as for the free energy,

Φ = −kT log Z,

which is exactly the form we had for the free energy in the canonical ensemble. In particular, we have

Z = e^{−βΦ}.
1.6 Extensive and intensive properties
So far, we have defined a lot of different quantities: p, V, µ, N, T, S etc. In general, we can separate these into two different types. Quantities such as V, N scale with the size of the system, while µ and p do not.
Definition (Extensive quantity). An extensive quantity is one that scales proportionally to the size of the system.

Definition (Intensive quantity). An intensive quantity is one that is independent of the size of the system.
Example. N, V, E, S are all extensive quantities.
Now note that the entropy is a function of E, V, N. So if we scale a system by λ, we find that

S(λE, λV, λN) = λS(E, V, N).
Example. Recall that we defined

1/T = (∂S/∂E)_{V,N}.

So if we scale the system by λ, then both S and E scale by λ, and so T does not change, i.e. T is intensive. Similarly, p and µ are intensive, since

p = T (∂S/∂V)_{E,N},   µ = −T (∂S/∂N)_{E,V}.
Example. The free energy is defined by

F = E − TS.

Since E and S are both extensive, and T is intensive, we find that F is extensive. So

F(T, λV, λN) = λF(T, V, N).

Similarly, the grand canonical potential is

Φ = F − µN.

Since F and N are extensive and µ is intensive, we know Φ is extensive:

Φ(T, λV, µ) = λΦ(T, V, µ).
This tells us something useful. We see that Φ must be proportional to V. Indeed, differentiating the above equation with respect to λ, we find

V (∂Φ/∂V)_{T,µ}(T, λV, µ) = Φ(T, V, µ).

Now setting λ = 1, we find

Φ(T, V, µ) = V (∂Φ/∂V)_{T,µ} = −pV.

Here p is an intensive quantity, so it cannot depend on V. So we have

Φ(T, V, µ) = −p(T, µ) V.

This is quite a strong result.
2 Classical gases
So far, we have been doing statistical physics starting from quantum mechanics.
But statistical mechanics was invented before quantum mechanics. So we should
be able to do statistical mechanics classically, and we will apply it to the case of
classical gases. This classical theory agrees quite well with experiment, except for a few small things that go wrong, and it turns out we have to solve these problems by going quantum.
To do statistical physics classically, the idea is to figure out what the classical
version of the partition function should be. We can then use this to derive the
different thermodynamic quantities, as we have previously managed to express
them in terms of the partition function.
After figuring out the partition function, we are going to study three types
of classical gases. We begin by looking at monoatomic ideal gases, which are the simplest type of gas one can think of. They do not have internal structure and do not interact with each other.
After understanding monoatomic ideal gases, we will move on to consider
two possible modifications. The first is the case of a diatomic (ideal) gas, where
the gases now have some internal structure, and hence kinetic energy is not the
only possible kind of energy. It turns out the theory works out pretty similarly
to the monoatomic case, except that the average energy of each particle is higher
than the ideal gas version.
Finally, we will consider what happens if we have gas molecules that do inter-
act with each other, and we will do so perturbatively, assuming the interactions
are small (which they are).
2.1 The classical partition function
In the canonical ensemble, the quantum partition function is
Z = Σ_n e^{−βE_n}.
What is the classical analogue? Classically, we can specify the state of a system by a point in phase space, which is the space of all positions and momenta. For example, if we have a single particle, then a point in phase space is just (q(t), p(t)), the position and momentum of the particle. It is conventional to use q instead of x when talking about a point in phase space. So in this case, the phase space is a 6-dimensional space.
The equations of motion determine the trajectory of the particle through phase space, given the initial position in phase space. This suggests that we should replace the sum over states by an integral over phase space. Classically, we also know what the energy is. For a single particle, the energy is given by the Hamiltonian

H = p²/2m + V(q).
So it seems that we know what we need to know to make sense of the partition function classically. We might want to define the partition function as

Z_1 = ∫ d³q d³p e^{−βH(p,q)}.
This seems to make sense, except that we expect the partition function to be dimensionless. The solution is to introduce a quantity h, which has dimensions of length times momentum. Then we have

Definition (Partition function (single particle)). We define the single particle partition function as

Z_1 = (1/h³) ∫ d³q d³p e^{−βH(p,q)}.
We notice that whenever we use the partition function, we usually differentiate the log of Z. So the factor of h³ doesn't really matter for observable quantities. However, recall that the entropy involves log Z itself, and we might worry that the entropy depends on h. But it doesn't matter, because entropy is not actually observable. Only entropy differences are. So we are fine.
The more careful reader might worry that our choice to integrate e^{−βH(p,q)} against d³q d³p is rather arbitrary, and there is no good a priori reason why we shouldn't integrate it against, say, d³q d³p⁵ instead.
However, it is possible to show that this is indeed the “correct” partition function to use, by taking the quantum partition function, and then taking the limit ℏ → 0. Moreover, we find that the correct value of h should be

h = 2πℏ.

We will from now on use this value of h. In the remainder of the chapter, we will mostly spend our time working out the partition function of several different systems as laid out in the beginning of the chapter.
2.2 Monoatomic ideal gas
We now begin considering ideal gases.
Definition
(Ideal gas)
.
An ideal gas is a gas where the particles do not interact
with each other.
Of course, this is never true, but we can hope this is a good approximation
when the particles are far apart.
We begin by considering a monoatomic ideal gas. These gases have no internal structure, and are made up of single atoms. In this case, the only energy we have is the kinetic energy, and we get

H = p²/2m.
We just have to plug this into our partition function and evaluate the integral. We have

Z_1(V, T) = (1/(2πℏ)³) ∫ d³p d³q e^{−βp²/2m} = (V/(2πℏ)³) ∫ d³p e^{−βp²/2m}.

Here V is the volume of the box containing the particle, which is what we obtain when we do the d³q integral.
This remaining integral is just the Gaussian integral. Recall that we have

∫ dx e^{−ax²} = √(π/a).
Using this three times, we find

Proposition. For a monoatomic gas, we have

Z_1(V, T) = V (mkT/(2πℏ²))^{3/2} = V/λ³,

where we define
Definition (Thermal de Broglie wavelength). The thermal de Broglie wavelength of a gas at temperature T is

λ = √(2πℏ²/(mkT)).
If we think of the wavelength as the “size” of the particle, then we see that the partition function counts the number of particles of this size we can fit in the volume V. We notice that this partition function involves ℏ, which is a bit weird since we are working classically, but we will see that ℏ does not appear in the formulas we derive from this.
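To get a sense of scale (a Python sketch with SI values; helium at room temperature and atmospheric pressure is just an illustrative choice, not from the lectures), λ is tiny compared to the typical interparticle spacing, so the classical treatment is self-consistent:

import numpy as np

hbar = 1.0545718e-34      # J s
k    = 1.380649e-23       # J/K
m_He = 6.64e-27           # kg, mass of a helium-4 atom
T, p = 300.0, 1.0e5       # room temperature, atmospheric pressure

lam = np.sqrt(2 * np.pi * hbar**2 / (m_He * k * T))   # thermal de Broglie wavelength
spacing = (k * T / p) ** (1.0 / 3.0)                   # (V/N)^{1/3} from the ideal gas law
print(lam, spacing, lam / spacing)                     # lam ~ 5e-11 m << spacing ~ 3e-9 m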
The generalization to multiple particles is straightforward. If we have N particles, then since the partition function is multiplicative, we have

Z(N, V, T) = Z_1^N = V^N λ^{−3N}.
There is a small caveat at this point. We will later see that this is not quite right. When we think about the quantum version, if the particles are indistinguishable, then we would have counted each state N! times, which would give the wrong answer. Again, this doesn't affect any observable quantity, so we will put this issue aside for the moment, until we get to studying the entropy itself, in which case this N! does matter.
We can similarly define the pressure to be

p = −(∂F/∂V)_T = ∂/∂V (kT log Z)_T.

Then plugging our partition function into this definition, we find

p = NkT/V.
Rearranging, we obtain
Proposition (Ideal gas law).

pV = NkT.

Notice that in this formula, the λ has dropped out, and there is no dependence on ℏ.
Definition (Equation of state). An equation of state is an equation that relates state variables, i.e. variables that depend only on the current state of the system, as opposed to how we obtained this state.
The ideal gas law is an example of an equation of state.
Let's now look at the energy of the ideal gas. We can similarly compute

⟨E⟩ = −(∂ log Z/∂β)_V = (3/2) NkT = 3N · (1/2) kT.
This is a general phenomenon. Our system has N particles, and each particle has three independent directions it can move in. So there are 3N degrees of freedom.
Law (Equipartition of energy). Each degree of freedom of an ideal gas contributes (1/2)kT to the average energy.

In the next section, we will study gases with internal structure, hence internal degrees of freedom, and each such degree of freedom will again contribute (1/2)kT to the average energy.
Of course, this law requires some hidden assumptions we do not make precise. If we add a degree of freedom s with a term s^{5.3} log(2s + 1) in the Hamiltonian, then there is no reason to believe the contribution to the average energy would still be (1/2)kT. We will also see in the next section that if the degree of freedom has some potential energy, then there will be even more contributions to the energy.
There are other quantities of the gas we can compute. We know the average kinetic energy of a single particle is

⟨p²⟩/2m = (3/2) kT.

So we have

⟨p²⟩ ∼ mkT.

Thus, for a single particle, we have

|p| ∼ √(mkT),

and so

λ ∼ h/|p|.

This is the usual formula for the de Broglie wavelength. So our thermal de Broglie wavelength is indeed related to the de Broglie wavelength.
Finally, we can compute the heat capacity

C_V = (∂E/∂T)_V = (3/2) Nk.
Boltzmann’s constant
Recall that Boltzmann's constant is

k = 1.381 × 10^{−23} J K^{−1}.
This number shows that we were terrible at choosing units. If we were to invent physics again, we would pick energy to have the same unit as temperature, so that k = 1. This number k is just a conversion factor between temperature and energy because we chose the wrong units.
But of course, the units were not chosen randomly in order to mess up our thermodynamics. The units were chosen to relate to scales we meet in everyday life. So it is still reasonable to ask why k has such a small value. We look at the ideal gas law

pV/T = Nk.

We would expect that when we plug in some everyday values for the left hand side, the result would be somewhat sensible, because our ancestors were sane when picking units (hopefully).
Indeed, we can put in the numbers

p = 10⁵ N m⁻²,  V = 1 m³,  T = 300 K,

and we find that the LHS is about 300 J K⁻¹.
So what makes k such a tiny number is that N is huge. The number of particles is of the order 10²³. Thus, for Nk to have a sensible value, k must be tiny.

The fact that k is small tells us that everyday lumps of matter contain a lot of particles, and in turn, this tells us that atoms are small.
Entropy
The next thing to study is the entropy of an ideal gas. We previously wrote down

Z = Z_1^N,

and briefly noted that this isn't actually right. In quantum mechanics, we know that if we swap two indistinguishable particles, then we get back the same state, at least up to a sign. Similarly, if we permute any of our particles, which are indistinguishable, then we get the same system. We are over-counting the states. What we really should do is to divide by the number of ways to permute the particles, namely N!:

Z = (1/N!) Z_1^N.
Just as with the constant h in our partition function, this N! doesn't affect any of our observables. In particular, p and ⟨E⟩ are unchanged. However, this N! does affect the entropy

S = ∂/∂T (kT log Z).
Plugging the partition function in and using Stirling's formula, we get

S = Nk ( log( V/(Nλ³) ) + 5/2 ).
This is known as the Sackur-Tetrode equation.
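As a quick numerical application (a Python sketch with SI values; argon at room temperature and atmospheric pressure is an illustrative choice, not from the lectures), the formula gives an entropy per particle of a couple of dozen k:

import numpy as np

hbar, k = 1.0545718e-34, 1.380649e-23
m_Ar = 6.63e-26                # kg, mass of an argon atom
T, p = 300.0, 1.0e5
V_per_N = k * T / p            # V/N from the ideal gas law

lam = np.sqrt(2 * np.pi * hbar**2 / (m_Ar * k * T))
S_per_particle = k * (np.log(V_per_N / lam**3) + 2.5)
print(S_per_particle / k)      # roughly 18-19 (dimensionless entropy per particle)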
Recall that the entropy is an extensive property. If we re-scale the system by a factor of α, then

N ↦ αN,  V ↦ αV.

Since λ depends on T only, it is an intensive quantity, and the entropy indeed scales as S ↦ αS. But for this to work, we really needed the N inside the logarithm, and the reason we have the N inside the logarithm is that we had an N! in the partition function.
When people first studied the statistical mechanics of ideal gases, they didn't know about quantum mechanics, and didn't know they should put in the N!. Then the resulting value of S is no longer extensive. This leads to Gibbs paradox. The actual paradox is as follows:

Suppose we have a box of gas with entropy S. We introduce a partition between the gases, so that the individual partitions have entropy S_1 and S_2. Then the fact that the entropy is not extensive means

S ≠ S_1 + S_2.

This means by introducing or removing a partition, we have increased or decreased the entropy, which violates the second law of thermodynamics.
This N!, which comes from quantum effects, fixes this problem. This is a case where quantum mechanics is needed to understand something that really should be classical.
Grand canonical ensemble
We now consider the case where we have a grand canonical ensemble, so that we can exchange heat and particles. In the case of a gas, we can easily visualize this as a small open box of gas where gas is allowed to freely flow around. The grand canonical ensemble has partition function

Z_ideal(µ, V, T) = Σ_{N=0}^∞ e^{βµN} Z_ideal(N, V, T) = Σ_{N=0}^∞ (1/N!) ( e^{βµ} V/λ³ )^N = exp( e^{βµ} V/λ³ ).
Armed with this, we can now calculate the average number of particles in our system. Doing the same computations as before, we have

N = (1/β)(∂ log Z/∂µ)_{V,T} = e^{βµ} V/λ³.

So we can work out the value of µ:

µ = kT log( λ³N/V ).
Now we can use this to get some idea of what the chemical potential actually means. For a classical gas, we need the wavelength to be significantly less than the average distance between particles, i.e.

λ ≪ (V/N)^{1/3},

so that the particles are sufficiently separated. If this is not true, then quantum effects are important, and we will look at them later. If we plug this into the logarithm, we find that µ < 0.
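Numerically (a Python sketch, again with illustrative room-condition values for an argon-like gas, not from the lectures), µ indeed comes out large and negative in units of kT:

import numpy as np

hbar, k = 1.0545718e-34, 1.380649e-23
m, T, p = 6.63e-26, 300.0, 1.0e5        # argon-like atom at room conditions (illustrative)

lam = np.sqrt(2 * np.pi * hbar**2 / (m * k * T))
N_over_V = p / (k * T)
mu = k * T * np.log(lam**3 * N_over_V)
print(mu / (k * T))                     # about -16, so mu is large and negative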
Remember that µ is defined by

µ = (∂E/∂N)_{S,V}.

It might seem odd that we get energy out when we add a particle. But note that this derivative is taken with S fixed. Normally, we would expect adding a particle to increase the entropy. So to keep the entropy fixed, we must simultaneously take energy out of the system, and so µ is negative.
Continuing our exploration of the grand canonical ensemble, we can look at the fluctuations in N, and find

∆N² = (1/β²) ∂²/∂µ² log Z_ideal = N.

So we find that

∆N/N = 1/√N → 0

as N → ∞. So in the thermodynamic limit, the fluctuations are negligible.
We can now obtain our equation of state. Recall that the grand canonical potential is

Φ = −kT log Z,

and that

pV = −Φ.

Since we know log Z, we can work out what pV is. We find

pV = kT e^{βµ} V/λ³ = NkT.
So the ideal gas law is also true in the grand canonical ensemble. Also, cancelling the V from both sides, we see that this determines p as a function of T and µ:

p = kT e^{βµ}/λ³.
2.3 Maxwell distribution
We have already calculated the average energy of our gas, so we can calculate the average energy of an atom in the gas. But that is just the average energy. We might want to figure out the distribution of energy among the atoms of our gas. Alternatively, what is the distribution of particle speeds in a gas? We can get that fairly straightforwardly from what we've got so far.
We ask what is the probability of a given particle being in a region of phase space of volume d³q d³p centred at (q, p). We know what this is. It is just

C e^{−βp²/2m} d³q d³p
for some normalization constant C, since the kinetic energy of a particle is p²/2m. Now suppose we don't care about the position, and just want to know about the momentum. So we integrate over q, and the probability that the momentum is within d³p of p is

C V d³p e^{−βp²/2m}.
Let’s say we are interested in velocity instead of momentum, so this is equal to
CV m
2
d
3
v e
βmv
2
/2
.
Moreover, we are only interested in the speed, not the velocity itself. So we
introduce spherical polar coordinates (v, θ, φ) for v. Then we get
CV m
3
sin θ dθ dϕ v
2
dv e
mv
2
/(2kT )
.
Now we don’t care about the direction, so we again integrate over all possible
values of θ and φ. Thus, the probability that the speed is within dv of v is
f(v) dv = Nv
2
e
mv
2
/(2kT )
dv,
where we absorbed all our numbers into the constant
N
. Then
f
is the probability
density function of v. We can fix this constant N by normalizing:
Z
0
f(v) dv = 1.
So this fixes
N = 4π
m
2πkT
1/2
.
This f (v) is called the Maxwell distribution.
We can try to see what
f
looks like. We see that for large
v
, it is exponentially
decreasing, and for small
v
, it is quadratic in
v
. We can plot it for a few
monoatomic ideal gases:
T
C
4
He
20
Ne
40
Ar
132
Xe
We see that the energy distribution shifts to the right as we increase the mass.
This is expected, because we know that the energy of the particle is always
1
2
kT
,
and so for lighter particles, we need to have higher energy.
We can sanity-check that the expected value of v is correct. We have
hv
2
i =
Z
0
v
2
f(v) dv =
3kT
m
.
So we find that
hEi =
1
2
mhv
2
i =
3
2
kT.
This agrees with the equipartition of energy, as it must. But now we have an
actual distribution, we can compute other quantities like hv
4
i.
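To make the sanity check concrete, here is a small Python sketch (an illustration, not part of the notes) that numerically verifies the normalization and the value of $\langle v^2\rangle$ for the Maxwell distribution, using an arbitrary illustrative mass and temperature.

```python
import numpy as np
from scipy.integrate import quad

k = 1.380649e-23      # J / K
m = 6.6e-27           # kg, illustrative (roughly a helium atom)
T = 300.0             # K, illustrative

norm = 4 * np.pi * (m / (2 * np.pi * k * T))**1.5

def f(v):
    # Maxwell speed distribution f(v)
    return norm * v**2 * np.exp(-m * v**2 / (2 * k * T))

total, _ = quad(f, 0, np.inf)                        # should be 1
v2_mean, _ = quad(lambda v: v**2 * f(v), 0, np.inf)  # should be 3kT/m

print("normalization:", total)
print("<v^2> numeric :", v2_mean)
print("<v^2> exact   :", 3 * k * T / m)
```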
Note that in these derivations, we assumed we were dealing with a monoatomic
ideal gas, but it happens that in fact this holds for a much more general family
of gases. We will not go much into details.
2.4 Diatomic gases
We now move on to consider more complicated gases. If we have molecules, then
they can contain other forms of energy such as rotational energy.
Everything we are doing is classical, so we need to come up with a model of a
diatomic molecule, instead of studying the actual quantum system of molecules.
For simplicity, we can model them as two point masses joined together by a
massless spring.
[Diagram: two point masses $m_1$ and $m_2$ joined by a massless spring.]

As we know from classical dynamics, we can decompose the motion into different components:
- Translation of the centre of mass. We can view this as a single particle of mass $M = m_1 + m_2$.
- Rotations about the centre of mass. There are two axes of rotation orthogonal to the spring, and these have a moment of inertia $I$. We will ignore the rotation about the axis parallel to the spring because we assume the masses are point masses.
- Vibrations along the axis of symmetry. The important quantity here is the reduced mass
\[ m = \frac{m_1 m_2}{m_1 + m_2}. \]

We will assume all these motions are independent. In reality, the translation is indeed independent from the others, but the rotations and vibrations can couple in complicated ways. But we are lazy, and will make this simplifying assumption. So we have
\[ Z_1 = Z_{\mathrm{trans}} Z_{\mathrm{rot}} Z_{\mathrm{vib}}. \]
We can obtain $Z_{\mathrm{trans}}$ just as the partition function we obtained previously for a single particle, with mass $M$. The remaining two factors depend only on the relative position of the masses. So they do not depend on the position of the molecule as a whole, and are going to be independent of $V$.
Now when we differentiate
log Z
1
with respect to
V
, then the latter two terms
drop out, and thus we deduce that the ideal gas law still holds for diatomic
gases.
We now try to figure out how rotations and vibrations contribute to the partition function. For rotations, we can parametrize the motion using spherical polars $(\theta, \varphi)$. The Lagrangian for this rotational motion is given by
\[ \mathcal{L}_{\mathrm{rot}} = \frac{1}{2} I \left(\dot\theta^2 + \sin^2\theta\, \dot\varphi^2\right). \]
The conjugate momenta are given by
\[ p_\theta = \frac{\partial \mathcal{L}}{\partial \dot\theta} = I\dot\theta, \qquad p_\varphi = \frac{\partial \mathcal{L}}{\partial \dot\varphi} = I \sin^2\theta\, \dot\varphi. \]
So we have
\[ H_{\mathrm{rot}} = \dot\theta p_\theta + \dot\varphi p_\varphi - \mathcal{L}_{\mathrm{rot}} = \frac{p_\theta^2}{2I} + \frac{p_\varphi^2}{2I\sin^2\theta}. \]
We can then work out the partition function
\[ Z_{\mathrm{rot}} = \frac{1}{(2\pi\hbar)^2} \int d\theta\, d\varphi\, dp_\theta\, dp_\varphi\, e^{-\beta H_{\mathrm{rot}}}. \]
We note that the $p_\theta$ and $p_\varphi$ integrals are just Gaussians, and so we get
\[ Z_{\mathrm{rot}} = \frac{1}{(2\pi\hbar)^2} \sqrt{\frac{2\pi I}{\beta}} \int_0^\pi d\theta\, \sqrt{\frac{2\pi I \sin^2\theta}{\beta}} \int_0^{2\pi} d\varphi = \frac{2IkT}{\hbar^2}. \]
Then we get
\[ E_{\mathrm{rot}} = -\frac{\partial}{\partial\beta} \log Z_{\mathrm{rot}} = \frac{1}{\beta} = kT. \]
This agrees with the equipartition of energy, as here we have two degrees of freedom, and each contributes $\frac{1}{2} kT$.

Example. In the case of a system where vibrations are not important, e.g. if the spring is very rigid, then we are done, and we have found
\[ Z_1 = Z_{\mathrm{trans}} Z_{\mathrm{rot}} \propto (kT)^{5/2}. \]
Then the partition function for $N$ particles is
\[ Z = \frac{1}{N!} Z_1^N, \]
and the total energy is
\[ E = -\frac{\partial}{\partial\beta} \log Z = \frac{5}{2} NkT. \]
This is exactly as we expect. There is $\frac{3}{2} NkT$ from translation, and $NkT$ from rotation. From this, we obtain the heat capacity
\[ C_V = \frac{5}{2} Nk. \]

We now put in the vibrations. Since we are modelling it by a spring, we can treat it as a harmonic oscillator, with mass $m$ and frequency $\omega$, which is determined by the bond strength. Then if we let $\zeta$ be the displacement from equilibrium, the Hamiltonian is
\[ H_{\mathrm{vib}} = \frac{p_\zeta^2}{2m} + \frac{1}{2} m\omega^2 \zeta^2. \]
So we have
\[ Z_{\mathrm{vib}} = \frac{1}{2\pi\hbar} \int d\zeta\, dp_\zeta\, e^{-\beta H_{\mathrm{vib}}} = \frac{kT}{\hbar\omega}. \]
From the partition function, we can get the energy of a single molecule, and find it to be
\[ E_{\mathrm{vib}} = kT. \]
This is the average energy in the vibrational motion of a molecule. This looks a bit funny. The vibration is only one degree of freedom, but the equipartition of energy seems to think this has two degrees of freedom. It turns out equipartition of energy behaves differently when we have potential energy. In general, we should think of having one degree of freedom for each quadratic term in the Hamiltonian, and so we have one degree of freedom for kinetic energy and another for potential.

Putting all three types of motion together, we get
\[ E = \frac{7}{2} NkT, \]
and the heat capacity is
\[ C_V = \frac{7}{2} Nk. \]
Note that these results are completely independent of the parameters describing the molecule!

Does this agree with experiments? The answer is no! If we go and measure the heat capacity of, say, molecular hydrogen, we find something like the following:

[Plot of $C_V/Nk$ against $T$ for molecular hydrogen: plateaus at $1.5$, $2.5$ and $3.5$, with the steps at roughly $200\,\mathrm{K}$ and $2000\,\mathrm{K}$.]

So it seems like our prediction only works when we have high enough temperature. At lower temperatures, the vibrational modes freeze out. Then as we further lower the energy, the rotational modes also go away. This is a big puzzle classically! It is explained by quantum effects, which we will discuss later.
2.5 Interacting gases
So far, we have been talking about ideal gases. What happens if there are
interactions?
For a real gas, if it is sufficiently dilute, i.e. $N/V$ is small, then we expect the interactions to be negligible. This suggests that we can try to capture the effects of interactions perturbatively in $\frac{N}{V}$. We can write the ideal gas law as
\[ \frac{p}{kT} = \frac{N}{V}. \]
We can think of this as the first term in an expansion, and add higher order terms:
\[ \frac{p}{kT} = \frac{N}{V} + B_2(T) \frac{N^2}{V^2} + B_3(T) \frac{N^3}{V^3} + \cdots. \]
Note that the coefficients depend on $T$ only, as they should be intensive quantities. This is called the virial expansion, and the coefficients $B_k(T)$ are the virial coefficients. Our goal is to figure out what these $B_k(T)$ are.

We suppose the interaction is given by a potential energy $U(r)$ between two neutral atoms (assuming a monoatomic gas) at separation $r$.

Example. For genuine atoms, at large $r$ (relative to atomic size), we have
\[ U(r) \propto -\frac{1}{r^6}. \]
This comes from dipole-dipole interactions. Heuristically, we can understand this power of 6 as follows: while the expectation value of the electric dipole of an atom vanishes, there is a non-trivial probability that the instantaneous dipole $p_1$ is non-zero. This gives an electric field of
\[ E \sim \frac{p_1}{r^3}. \]
This induces a dipole $p_2$ in atom 2. So we have
\[ p_2 \propto E \sim \frac{p_1}{r^3}. \]
So the resulting potential energy is
\[ U \propto -p_2 E \sim -\frac{p_1^2}{r^6}. \]
This is called the van der Waals interaction. Note that the negative sign means this is an attractive force.

For small $r$, the electron orbitals of the atoms start to overlap, and then we get repulsion due to the Pauli principle. All together, we obtain the Lennard-Jones potential, given by
\[ U(r) = U_0 \left[\left(\frac{r_0}{r}\right)^{12} - \left(\frac{r_0}{r}\right)^{6}\right]. \]
[Sketch of the Lennard-Jones potential $U(r)$ against $r$.]

To make life easy, we will actually use a "hard core repulsion" potential instead, given by
\[ U(r) = \begin{cases} \infty & r < r_0 \\ -U_0 \left(\frac{r_0}{r}\right)^6 & r > r_0 \end{cases}. \]
[Sketch of the hard core potential $U(r)$ against $r$, with the hard wall at $r = r_0$.]
For a general $U$, we can write the Hamiltonian of the gas as
\[ H = \sum_{i=1}^N \frac{p_i^2}{2m} + \sum_{i > j} U(r_{ij}), \]
where
\[ r_{ij} = |\mathbf{r}_i - \mathbf{r}_j| \]
is the separation between particle $i$ and particle $j$.

Because of this interaction, it is no longer the case that the partition function for $N$ particles is the $N$th power of the partition function for one particle. We have
\[ Z(N, V, T) = \frac{1}{N!} \frac{1}{(2\pi\hbar)^{3N}} \int \prod_{i=1}^N d^3 p_i\, d^3 r_i\, e^{-\beta H} \]
\[ = \frac{1}{N!} \frac{1}{(2\pi\hbar)^{3N}} \left(\int \prod_i d^3 p_i\, e^{-\beta p_i^2/2m}\right)\left(\int \prod_i d^3 r_i\, e^{-\beta \sum_{j<k} U(r_{jk})}\right) \]
\[ = \frac{1}{N! \lambda^{3N}} \int \prod_i d^3 r_i\, e^{-\beta \sum_{j<k} U(r_{jk})}, \]
where again
\[ \lambda = \sqrt{\frac{2\pi\hbar^2}{mkT}}. \]
Since we want to get an expansion, we might want to expand this in terms of the potential, but that is not helpful, because the potential is infinite for small $r$. Instead it is useful to consider the Mayer $f$-function:
\[ f(r) = e^{-\beta U(r)} - 1. \]
This function has the property that $f(r) = -1$ for $r < r_0$ (in the case of the hard core repulsion), and $f(r) \to 0$ as $r \to \infty$. So this is a nicer function, as it only varies within a finite range.

We further simplify notation by defining
\[ f_{ij} = f(r_{ij}). \]
Then we have
\[ Z(N, V, T) = \frac{1}{N!\lambda^{3N}} \int \prod_i d^3 r_i \prod_{j<k} (1 + f_{jk}) = \frac{1}{N!\lambda^{3N}} \int \prod_i d^3 r_i \left(1 + \sum_{j<k} f_{jk} + \sum_{j<k}\sum_{\ell<m} f_{jk} f_{\ell m} + \cdots\right). \]
The first term is just
\[ \int \prod_i d^3 r_i = V^N, \]
and this gives the ideal gas term. Each of the second-order terms is the same; e.g. for $j = 1$, $k = 2$, it is
\[ \int \prod_i d^3 r_i\, f_{12} = V^{N-2} \int d^3 r_1\, d^3 r_2\, f_{12} = V^{N-1} I, \]
where we set
\[ I = \int d^3 r\, f(r). \]
Since $f(r) \to 0$ as $r \to \infty$, we might as well integrate over all space. Summing over all such terms, and approximating $N(N-1)/2 \approx N^2/2$, we find that the first two terms of the partition function are
\[ Z(N, V, T) = \frac{V^N}{N!\lambda^{3N}} \left(1 + \frac{N^2}{2V} I + \cdots\right). \]
Up to first order, we can write this as
\[ Z(N, V, T) = \frac{V^N}{N!\lambda^{3N}} \left(1 + \frac{N}{2V} I + \cdots\right)^N = Z_{\mathrm{ideal}} \left(1 + \frac{N}{2V} I + \cdots\right)^N. \]
This pulling of the $N$ up to the exponent might seem a bit arbitrary, but written this way, it is much clearer that $S$, $F$ etc. will be extensive quantities. For example, we can write down the free energy as
\[ F = -kT \log Z = F_{\mathrm{ideal}} - NkT \log\left(1 + \frac{N}{2V} I + \cdots\right). \]
Without actually computing $I$, we can expect that it grows as $I \sim r_0^3$. Since we are expanding in terms of $NI/V$, we need
\[ \frac{N}{V} \ll \frac{1}{r_0^3}. \]
So we know the expansion is valid if the density of the gas is much less than the density of an atom. In real life, to determine the density of an atom, we need to find a system where the atoms are closely packed. Thus, it suffices to measure the density of the substance in liquid or solid form.

Assuming that the gas is indeed not dense, we can further use the approximation
\[ \log(1 + x) \approx x. \]
So we have
\[ p = -\left(\frac{\partial F}{\partial V}\right)_T = \frac{NkT}{V}\left(1 - \frac{N}{2V} I + \cdots\right). \]
So we have
\[ \frac{pV}{NkT} = 1 - \frac{N}{2V} I + \cdots. \]
So we can now read off what the second virial coefficient is:
\[ B_2 = -\frac{1}{2} I. \]
We can consider what we get with different potentials. If we have a completely repulsive potential, then $U(r) > 0$ everywhere. So $f < 0$, and thus $B_2(T) > 0$. In other words, having a repulsive interaction tends to increase the pressure, which makes sense. On the other hand, if we have an attractive potential, then the pressure decreases.

If we have the hard core repulsion, then we have
\[ I = \int_{r=0}^{r=r_0} d^3 r\, (-1) + \int_{r=r_0}^{\infty} d^3 r\, \left(e^{\beta U_0 (r_0/r)^6} - 1\right). \]
The second term is slightly tricky, so we'll restrict to the case of high temperature, hence small $\beta$. We can write
\[ e^{\beta U_0 (r_0/r)^6} \approx 1 + \beta U_0 \left(\frac{r_0}{r}\right)^6 + \cdots. \]
Then we get
\[ I \approx -\frac{4}{3}\pi r_0^3 + \frac{4\pi U_0}{kT} \int_{r_0}^\infty dr\, \frac{r_0^6}{r^4} = \frac{4\pi r_0^3}{3}\left(\frac{U_0}{kT} - 1\right). \]
For large $T$, this is negative, and so the repulsive interaction dominates. We can plug this into our equation of state, and find
\[ \frac{pV}{NkT} \approx 1 - \frac{N}{V}\left(\frac{a}{kT} - b\right), \]
where
\[ a = \frac{2\pi r_0^3}{3} U_0, \qquad b = \frac{2\pi r_0^3}{3}. \]
We can invert this to get
\[ kT = \frac{V}{N}\left(p + \frac{N^2}{V^2} a\right)\left(1 + \frac{N}{V} b\right)^{-1}. \]
Taking the Taylor expansion of the inverse and truncating higher order terms, we obtain
\[ kT = \left(p + \frac{N^2}{V^2} a\right)\left(\frac{V}{N} - b\right). \]
This is the van der Waals equation of state, which is valid at low density ($N r_0^3/V \ll 1$) and high temperature ($\beta U_0 \ll 1$).

We can also write this as
\[ p = \frac{NkT}{V - bN} - a\frac{N^2}{V^2}. \]
In this form, we see that $a$ gives a reduced pressure due to the long distance attractive force, while we can view the $b$ contribution as saying that the atoms take up space, and hence reduce the available volume.
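As an illustration (not part of the notes), we can check the high-temperature approximation $B_2 \approx b - a/kT$ against a direct numerical evaluation of $B_2(T) = -\frac{1}{2}\int d^3r\, f(r)$ for the hard core potential. The parameter values below are arbitrary.

```python
import numpy as np
from scipy.integrate import quad

k = 1.380649e-23     # J / K
r0 = 3.0e-10         # m, illustrative hard-core radius
U0 = 1.0e-21         # J, illustrative well depth

def B2_numeric(T):
    """B2(T) = -I/2 for the hard core potential, evaluated numerically."""
    beta = 1.0 / (k * T)
    # attractive tail: f(r) = exp(beta*U0*(r0/r)^6) - 1 for r > r0
    tail, _ = quad(lambda r: r**2 * (np.exp(beta * U0 * (r0 / r)**6) - 1.0),
                   r0, 50 * r0)
    I = -4.0 * np.pi * r0**3 / 3.0 + 4.0 * np.pi * tail
    return -I / 2.0

def B2_highT(T):
    """High-temperature approximation B2 = b - a/(kT)."""
    a = 2.0 * np.pi * r0**3 * U0 / 3.0
    b = 2.0 * np.pi * r0**3 / 3.0
    return b - a / (k * T)

for T in [100.0, 300.0, 1000.0]:
    print(T, B2_numeric(T), B2_highT(T))
```

The two agree increasingly well as $\beta U_0$ becomes small, as the derivation above requires.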
But why exactly the factor of $bN$? Imagine two atoms living right next to each other.

[Diagram: two touching spheres whose centres are a distance $r_0$ apart.]

The value $r_0$ was chosen to be the distance between the two cores, so the volume of each atom is
\[ \frac{4\pi}{3}\left(\frac{r_0}{2}\right)^3, \]
which is not $b$. We might think the right way to do this problem is to look at how much volume each atom excludes, i.e. the dashed volume. This is
\[ \Omega = \frac{4}{3}\pi r_0^3 = 2b. \]
This is again not right. So we should think more carefully.

Suppose we have $N$ particles. When we put in the first atom, the amount of space available to put it in is $V$. When we now try to put in the second, the available space is $V - \Omega$ (assuming the two atoms are not too close together, so that they don't exclude the same volume). Similarly, when we put in the third, the available space is $V - 2\Omega$.

Dividing by $N!$ for indistinguishability, we find that the total phase space volume available for placing our particles is
\[ \frac{1}{N!} V(V - \Omega)(V - 2\Omega)\cdots(V - (N-1)\Omega) \approx \frac{1}{N!} V^N \left(1 - \frac{N^2 \Omega}{2V} + \cdots\right) \approx \frac{1}{N!}\left(V - \frac{N\Omega}{2}\right)^N, \]
which explains why the reduced volume for each particle is $\Omega/2$ instead of $\Omega$.

Now suppose we want to find higher-order corrections to the partition function in the virial expansion. The obvious thing might be to take them to be the contributions of $\sum\sum f_{jk} f_{\ell m}$. However, this is not quite right. If we try to keep track of how large the terms are more carefully, we will get a different answer.

The way to do it properly is via the cluster expansion, which involves using nice diagrams to figure out the right terms to include, similar to how Feynman diagrams are used to do perturbation theory in quantum field theory. Unfortunately, we do not have the time to go into the details.
3 Quantum gases
We now move on to study quantum gases. As before, we are going to spend a lot of time evaluating the partition function for different systems. Recall that the partition function is defined as a sum over all eigenstates. In the world of classical gases, we had the benefit of working over a continuous phase space, and thus we could replace the sum by integrals. In the quantum world, this is no longer the case. However, most of the time, the states are very closely packed, and we might as well approximate the sum by an integral. Of course, we cannot replace the sum just by $\int dE$. We have to make sure we preserve the "density" of the energy levels.
3.1 Density of states
Consider an ideal gas in a cubic box with side length $L$. So we have $V = L^3$. Since there are no interactions, the wavefunction for multiple particles can be obtained from the wavefunctions of single particle states,
\[ \psi(\mathbf{x}) = \frac{1}{\sqrt{V}} e^{i\mathbf{k}\cdot\mathbf{x}}. \]
We impose periodic boundary conditions, so the wavevectors are quantized by
\[ k_i = \frac{2\pi n_i}{L}, \]
with $n_i \in \mathbb{Z}$. The one-particle energy is given by
\[ E_{\mathbf{n}} = \frac{\hbar^2 k^2}{2m} = \frac{4\pi^2\hbar^2}{2mL^2}\left(n_1^2 + n_2^2 + n_3^2\right). \]
So if we are interested in the one-particle partition function, we have
\[ Z_1 = \sum_{\mathbf{n}} e^{-\beta E_{\mathbf{n}}}. \]
We now note that
\[ \beta E_{\mathbf{n}} \sim \frac{\lambda^2}{L^2} n^2, \]
where
\[ \lambda = \sqrt{\frac{2\pi\hbar^2}{mkT}} \]
is the de Broglie wavelength.

This $\lambda^2/L^2$ gives the spacing between the energy levels. But we know that $\lambda \ll L$. So the energy levels are very finely spaced, and we can replace the sum by an integral. Thus we can write
\[ \sum_{\mathbf{n}} \approx \int d^3 n \approx \frac{V}{(2\pi)^3}\int d^3 k = \frac{4\pi V}{(2\pi)^3}\int_0^\infty d|k|\, |k|^2, \]
where in the last step, we replaced the integral with spherical polars and integrated over angles. The final step is to replace $|k|$ by $E$. We know that
\[ E = \frac{\hbar^2 |k|^2}{2m}. \]
So we get
\[ dE = \frac{\hbar^2 |k|}{m}\, d|k|. \]
Therefore we can write
\[ \sum_{\mathbf{n}} = \frac{4\pi V}{(2\pi)^3}\int_0^\infty dE\, \sqrt{\frac{2mE}{\hbar^2}}\, \frac{m}{\hbar^2}. \]
We then set
\[ g(E) = \frac{V}{4\pi^2}\left(\frac{2m}{\hbar^2}\right)^{3/2} E^{1/2}. \]
This is called the density of states. Approximately, $g(E)\, dE$ is the number of single particle states with energy between $E$ and $E + dE$. Then we have
\[ \sum_{\mathbf{n}} = \int g(E)\, dE. \]
The same result (and derivation) holds if the sides of the box have different lengths. In general, in $d$ dimensions, the density of states is given by
\[ g(E) = \frac{V \operatorname{vol}(S^{d-1})}{2 \cdot \pi^{d/2}} \left(\frac{m}{2\pi\hbar^2}\right)^{d/2} E^{d/2 - 1}. \]
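As a sanity check (an illustration, not part of the notes), we can directly count the lattice points $\mathbf{n} \in \mathbb{Z}^3$ with $E_{\mathbf{n}}$ below a given energy and compare with $\int_0^E g(E')\, dE' = \frac{V}{6\pi^2}\left(\frac{2mE}{\hbar^2}\right)^{3/2}$. Units with $\hbar = 2m = L = 1$ are used purely for convenience.

```python
import numpy as np

# Units with hbar = 2m = L = 1, so E_n = (2*pi)^2 * |n|^2 and V = 1.
E_max = 4000.0
n_max = int(np.sqrt(E_max) / (2 * np.pi)) + 1

# Direct count of states with E_n <= E_max
grid = np.arange(-n_max, n_max + 1)
nx, ny, nz = np.meshgrid(grid, grid, grid, indexing="ij")
E_n = (2 * np.pi)**2 * (nx**2 + ny**2 + nz**2)
count = np.sum(E_n <= E_max)

# Integrated density of states: N(E) = (V/6 pi^2) (2mE/hbar^2)^{3/2} = E^{3/2} / (6 pi^2)
smooth = E_max**1.5 / (6 * np.pi**2)

print("direct count     :", count)
print("integral of g(E) :", smooth)
```

The two numbers agree to within a few per cent, and the agreement improves at larger energies, as expected when the sum is replaced by an integral.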
All these derivations work if we have a non-relativistic free particle, since we assumed the dispersion relation between $E$ and $k$, namely
\[ E = \frac{\hbar^2 |k|^2}{2m}. \]
For a relativistic particle, we instead have
\[ E = \sqrt{\hbar^2 |k|^2 c^2 + m^2 c^4}. \]
In this case, $|k|$ is still quantized as before, and repeating the previous argument, we find that
\[ g(E) = \frac{V E}{2\pi^2 \hbar^3 c^3} \sqrt{E^2 - m^2 c^4}. \]
We will be interested in the special case where $m = 0$. Then we simply have
\[ g(E) = \frac{V E^2}{2\pi^2 \hbar^3 c^3}. \]
We can start doing some physics with this.
3.2 Black-body radiation
Unfortunately, in this chapter, we do not have much opportunity to deal with genuine gases. Instead, we are going to take some system that is not a gas, and pretend it is a gas. Of course, nothing in our previous derivations relied on the system actually being a gas. It just relied on the fact that we had a lot of things. So the results should still hold.

In this chapter, we suppose we have a gas of photons (which is just a fancy way to say "light") in a box with opaque walls. We suppose this box of photons is at equilibrium with temperature $T$. We will use the box of photons as a model of a "black body", i.e. a perfectly absorbing object. The gas of photons inside is called black-body radiation. Later on, we will argue that the photons emitted by a black body must indeed follow the same distribution as the photons inside our box.

We begin by reminding ourselves of some basic properties of photons: they are massless and have energy
\[ E = \hbar\omega, \qquad \omega = \frac{2\pi c}{\lambda}, \]
where $\lambda$ is the (genuine) wavelength of the photon. Interactions between photons are negligible, so we can treat them as an ideal gas.
Photons have two polarization states, since given any direction of travel $\mathbf{k}$, the electric and magnetic fields have to obey
\[ \mathbf{E}\cdot\mathbf{k} = \mathbf{B}\cdot\mathbf{k} = \mathbf{B}\cdot\mathbf{E} = 0, \]
and there are two independent choices for $\mathbf{E}$ and $\mathbf{B}$. This has implications for our counting, as we need an extra factor of 2 in our density of states. So the density of states is given by
\[ g(E)\, dE = \frac{V E^2}{\pi^2 \hbar^3 c^3}\, dE. \]
Using the fact that $E = \hbar\omega$, it is often convenient to instead write this as
\[ g(\omega)\, d\omega = \frac{V \omega^2}{\pi^2 c^3}\, d\omega. \]
This is an abuse of notation, as the two $g$'s we wrote down are different functions, but this is a physics course.
The final important property of the photons is that photon numbers are not
conserved. Indeed, they are absorbed and emitted by walls of the box. So we
must sum over all possible photon numbers, even though we are in the canonical
ensemble. In practice, what we get is the same as the grand canonical ensemble,
but with µ = 0.
We can begin. While we have figured out the density of states, we will not
use that immediately. Instead, we still begin by assuming that we have a discrete
set of states. Only after doing all the manipulations, we replace all remaining
occurrences of sums with integrals so that we can actually evaluate it.
We can work in more generality. Consider a system of non-interacting particles with one-particle states $|i\rangle$ of energies $E_i$. Then a general accessible state can be labelled by
\[ \{n_1, n_2, \cdots\}, \]
where $n_i$ is the number of particles in $|i\rangle$. Since particle numbers are not conserved, we do not impose any restrictions on the possible values of the $n_i$. The energy of such a state is just
\[ \sum_i n_i E_i. \]
As before, in the canonical ensemble, the probability of being in such a state is
\[ p(\{n_i\}) = \frac{1}{Z} e^{-\beta \sum n_j E_j}, \]
and
\[ Z = \sum_{\{n_k\}} e^{-\beta \sum n_j E_j} = \left(\sum_{n_1 = 0}^\infty e^{-\beta n_1 E_1}\right)\left(\sum_{n_2 = 0}^\infty e^{-\beta n_2 E_2}\right)\cdots = \prod_i \frac{1}{1 - e^{-\beta E_i}}. \]
So we find that
\[ \log Z = -\sum_i \log\left(1 - e^{-\beta E_i}\right). \]
Using this, we have
\[ \langle n_i \rangle = \sum_{\{n_k\}} n_i\, p(\{n_k\}) = \sum_{\{n_k\}} n_i \frac{e^{-\beta\sum_j n_j E_j}}{Z} = -\frac{1}{\beta}\frac{\partial}{\partial E_i} \log Z. \]
Using the formula for $\log Z$ we had, we find
\[ \langle n_i \rangle = \frac{1}{e^{\beta E_i} - 1}. \]
Applying this to photons, we replace the sum with an integral, and set $E_i = \hbar\omega$. Using our density of states, we have
\[ \log Z = -\frac{V}{\pi^2 c^3} \int_0^\infty d\omega\, \omega^2 \log\left(1 - e^{-\beta\hbar\omega}\right). \]
Similarly, the average number of photons with frequency between $\omega$ and $\omega + d\omega$ is
\[ n(\omega)\, d\omega = g(\omega)\, d\omega \cdot \frac{1}{e^{\beta\hbar\omega} - 1} = \frac{V\omega^2\, d\omega}{\pi^2 c^3 (e^{\beta\hbar\omega} - 1)}. \]
Thus, the total energy in this range is
\[ E(\omega)\, d\omega = \hbar\omega\, n(\omega)\, d\omega = \frac{V\hbar}{\pi^2 c^3} \frac{\omega^3}{e^{\beta\hbar\omega} - 1}\, d\omega. \]
This is the Planck distribution.

Let's check that this makes sense. Let's try to compute the total energy of all the photons. We have
\[ E = -\left(\frac{\partial \log Z}{\partial \beta}\right)_V = \frac{V\hbar}{\pi^2 c^3} \int_0^\infty d\omega\, \frac{\omega^3}{e^{\beta\hbar\omega} - 1} = \int_0^\infty E(\omega)\, d\omega, \]
as we would expect.

Putting $\omega = \frac{2\pi c}{\lambda}$, we find that the average energy with wavelength between $\lambda$ and $\lambda + d\lambda$ is given by the horrendous formula
\[ \tilde{E}(\lambda)\, d\lambda = \frac{V\hbar (2\pi c)^4}{\pi^2 c^3 \lambda^5} \frac{d\lambda}{e^{\beta\hbar 2\pi c/\lambda} - 1}. \]
We can plot this for some values of $T$:

[Plot of $\tilde{E}(\lambda)$ against $\lambda$ for several temperatures.]

Note that the maximum shifts to the right as the temperature lowers, and the total energy released also decreases. For the maximum to be near the visible range, we need $T \approx 6000\,\mathrm{K}$, which is quite a lot.

In general, if we want to figure out when $E(\omega)$ is maximized, we set
\[ \frac{dE(\omega)}{d\omega} = 0. \]
This gives us
\[ \omega_{\max} = \zeta \frac{kT}{\hbar}, \]
where $\zeta \approx 2.822$ is the solution to
\[ 3 - \zeta = 3e^{-\zeta}. \]
This is known as Wien's displacement law. In particular, we see that $\omega_{\max}$ is linear in $T$.
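As a quick check (an illustration, not from the notes), the transcendental equation $3 - \zeta = 3e^{-\zeta}$ can be solved numerically; the root near $2.822$ gives the proportionality constant in Wien's law.

```python
import numpy as np
from scipy.optimize import brentq

# Maximizing omega^3 / (e^{beta hbar omega} - 1) leads to 3 - zeta = 3 e^{-zeta},
# with zeta = beta * hbar * omega_max.
zeta = brentq(lambda x: 3 - x - 3 * np.exp(-x), 1.0, 5.0)
print("zeta =", zeta)   # approximately 2.822

# omega_max = zeta * k T / hbar, e.g. at an illustrative T = 6000 K
k, hbar, c = 1.380649e-23, 1.054571817e-34, 2.99792458e8
T = 6000.0
omega_max = zeta * k * T / hbar
print("omega_max =", omega_max, "rad/s")
print("corresponding wavelength ~", 2 * np.pi * c / omega_max, "m")
```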
Recall that we haven't actually found the value of $E$. To do so, we have to do the integral. We perform the change of variables $x = \beta\hbar\omega$ to get
\[ E = \frac{V (kT)^4}{\pi^2 c^3 \hbar^3} \int_0^\infty \frac{x^3\, dx}{e^x - 1}. \]
The remaining integral is just some constant, which isn't really that important, but it happens that we can actually evaluate it, and find the value to be
\[ \int_0^\infty \frac{x^3\, dx}{e^x - 1} = \Gamma(4)\zeta(4) = \frac{\pi^4}{15}, \]
where $\zeta$ is the Riemann zeta function.

The energy density of the box is thus
\[ \mathcal{E} = \frac{E}{V} = \frac{\pi^2 k^4}{15\hbar^3 c^3} T^4 \propto T^4. \]
Now if we are experimentalists, then we have no way to check whether the above result is correct, because the box is all closed. To look into the box, we cut a small hole in it. Let's say this hole has area $A$, which is sufficiently small that it doesn't mess with what is going on in the box. How much energy do we expect to leak out of the hole?

For the purposes of this analysis, for a wave vector $\mathbf{k}$, we let $S(\mathbf{k})$ be the set of all photons with wave-vector within $d^3 k$ of $\mathbf{k}$. We further suppose $\mathbf{k}$ makes an angle $\theta$ with the normal to the hole.

[Diagram: the hole, with a wave-vector $\mathbf{k}$ at angle $\theta$ to its normal.]

What we want to know is what proportion of $S(\mathbf{k})$ manages to get out of the hole, say in a time period $dt$. To do so, we look at the following volume:

[Diagram: a slanted column of length $c\,dt$ in the direction of $\mathbf{k}$, ending on the hole.]

The volume of this region is $Ac\cos\theta\, dt$. So the proportion of energy that gets out is simply
\[ \frac{Ac\cos\theta}{V}\, dt. \]
We now write
\[ E(|k|)\, d^3 k = \text{total energy in } S(\mathbf{k}). \]
Then the total energy leaving the hole in time $dt$ is
\[ \int_{\theta \in [0, \pi/2]} d^3 k\, \frac{Ac\cos\theta}{V}\, dt\, E(|k|). \]
To evaluate this integral, we simply have to introduce polar coordinates, and we get
\[ \frac{cA\, dt}{V} \int_0^{2\pi} d\varphi \int_0^{\pi/2} d\theta\, \sin\theta\cos\theta \int_0^\infty d|k|\, |k|^2 E(|k|). \]
The first two integrals give $2\pi$ and $\frac{1}{2}$ respectively, and we can rewrite the last integral back in Cartesian coordinates, as
\[ \frac{1}{4\pi}\int d^3 k\, E(|k|) = \frac{E}{4\pi}. \]
Putting these all together, we find that the amount of energy leaving the hole in time $dt$ is
\[ \frac{cAE}{4V}\, dt = \frac{1}{4} cA\mathcal{E}\, dt. \]
If we define the energy flux as the energy per unit time per unit area leaving the hole, then the energy flux is given by
\[ \frac{c}{4}\mathcal{E} = \sigma T^4, \]
where
\[ \sigma = \frac{\pi^2 k^4}{60 \hbar^3 c^2} \approx 5.67 \times 10^{-8}\,\mathrm{J\,s^{-1}\,m^{-2}\,K^{-4}}. \]
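As a numerical check (an illustration, not from the notes), we can evaluate $\sigma = \pi^2 k^4/(60\hbar^3 c^2)$ directly from the physical constants and compare with the quoted value.

```python
import numpy as np

k = 1.380649e-23        # J / K
hbar = 1.054571817e-34  # J s
c = 2.99792458e8        # m / s

sigma = np.pi**2 * k**4 / (60 * hbar**3 * c**2)
print("sigma =", sigma, "J s^-1 m^-2 K^-4")   # about 5.67e-8

# Flux radiated per unit area by a black body at an illustrative temperature
T = 6000.0
print("flux at 6000 K:", sigma * T**4, "W / m^2")
```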
What was the point of computing this? We don't really care that much about the experimentalists. Suppose we had any black body, and we want to figure out how much radiation it is emitting. We imagine we put it inside that box. Then to the box, the surface of the black body is just like the hole we drilled through the box. So we know that the black body is absorbing $\sigma T^4$ energy per unit time per unit area. But the system is in equilibrium, so the black body must emit the exact same amount of radiation. So what we have derived is in fact how black bodies behave!
The best example of a black body we know of is the cosmic microwave background radiation of the universe. This is black-body radiation to incredibly high accuracy, with a temperature of $T = 2.7\,\mathrm{K}$. This is the temperature of space.
Let's quickly calculate some other thermodynamic quantities we might be interested in. We have
\[ F = -kT\log Z = \frac{VkT}{\pi^2 c^3}\int_0^\infty d\omega\, \omega^2 \log\left(1 - e^{-\beta\hbar\omega}\right). \]
Integrating by parts, we obtain
\[ F = -\frac{V\hbar}{3\pi^2 c^3}\int_0^\infty d\omega\, \frac{\omega^3 e^{-\beta\hbar\omega}}{1 - e^{-\beta\hbar\omega}} = -\frac{V\hbar}{3\pi^2 c^3}\frac{1}{\beta^4\hbar^4}\int_0^\infty \frac{x^3\, dx}{e^x - 1} = -\frac{V\pi^2 k^4}{45\hbar^3 c^3} T^4. \]
The free energy is useful, because we can differentiate it to get the pressure:
\[ p = -\left(\frac{\partial F}{\partial V}\right)_T = \frac{E}{3V} = \frac{1}{3}\mathcal{E} = \frac{4\sigma}{3c} T^4. \]
This is radiation pressure. Since $c$ is a big number, we see that radiation pressure is small.

We can also get the entropy from this. We compute
\[ S = -\left(\frac{\partial F}{\partial T}\right)_V = \frac{16 V\sigma}{3c} T^3. \]
Another quantity we can be interested in is the heat capacity, which is
\[ C_V = \left(\frac{\partial E}{\partial T}\right)_V = \frac{16 V\sigma}{c} T^3. \]
The classical/high temperature limit of black-body radiation happens when $\hbar\omega \ll kT$. In this case, we have
\[ \frac{1}{e^{\beta\hbar\omega} - 1} \approx \frac{1}{\beta\hbar\omega}. \]
So we have
\[ E(\omega)\, d\omega \approx \frac{V\omega^2}{\pi^2 c^3} kT\, d\omega \equiv E_{\mathrm{classical}}(\omega)\, d\omega. \]
This is known as the Rayleigh-Jeans law. Note that there is no $\hbar$ in it. So it is indeed a classical result. It also agrees with the equipartition of energy, as it gives $kT$ per normal mode of the electromagnetic field, with each mode viewed as a harmonic oscillator. We get $kT$ rather than $\frac{1}{2}kT$ because each mode has both a kinetic and a potential quadratic term.

However, this has the obvious problem that $E(\omega)$ is unbounded as $\omega \to \infty$. So if we tried to compute the total energy, we would get infinity. This is called the ultraviolet catastrophe. It showed that classical reasoning doesn't actually work at high energies, and this is what eventually led Planck to come up with Planck's constant.
3.3 Phonons and the Debye model
One could still reasonably imagine a “gas of photons” as a gas. In this section,
we are going to study solids. It turns out, this works.
In the first example sheet, we studied a model of a solid, by treating it
as some harmonic oscillators. We saw it was good at high temperatures, but
rubbish at low temperatures. Here we are going to use a better model.
If we have a crystal lattice, then the atoms can vibrate, and these vibrations give sound waves in the crystal. In quantum mechanics, we know that light is made out of particles, called photons. Similarly, in quantum mechanics, sound is made up of "particles" called phonons. Similar to the case of photons, the energy of a phonon of frequency $\omega$ is
\[ E = \hbar\omega, \]
and the frequency of a phonon is described by a dispersion relation involving the wavevector $\mathbf{k}$:
\[ \omega = \omega(\mathbf{k}). \]
In general, this function is very complicated. We learn how to compute these in the IID Applications of Quantum Mechanics course, but we will not do that here.

Suppose the spacing in the crystal is $a$. If $|k| a \ll 1$, then the dispersion relation becomes linear, and we have
\[ \omega \approx |k| c_s, \]
where $c_s$ is the speed of sound in the solid. This is all very similar to photons, except we replace $c$ with $c_s$.

However, there is an important difference compared to photons. While photons have two polarizations, phonons have 3. Two of these are transverse, while the third is longitudinal. As one may know from the IID Waves course, they travel at different speeds, which we write as $c_T$ and $c_L$ for transverse and longitudinal respectively.

We now want to count the number of phonons, and obtain the density of states. Repeating what we did before, we find
\[ g(\omega)\, d\omega = \frac{V\omega^2}{2\pi^2}\left(\frac{2}{c_T^3} + \frac{1}{c_L^3}\right) d\omega. \]
This is valid for $|k| a \ll 1$. Alternatively, we need $\omega a / c_s \ll 1$. It is convenient to write this as
\[ g(\omega)\, d\omega = \frac{3V\omega^2}{2\pi^2 \bar{c}_s^3}\, d\omega, \tag{$*$} \]
where
\[ \frac{3}{\bar{c}_s^3} = \frac{2}{c_T^3} + \frac{1}{c_L^3} \]
defines $\bar{c}_s$ as the average speed.
The Debye model ignores the $|k| a \ll 1$ assumption, and supposes the linear dispersion is also valid when $\omega a/\bar{c}_s \sim 1$.

There is another difference between photons and phonons: for phonons, $\omega$ cannot be arbitrarily large. We know high $\omega$ corresponds to small wavelength, but if the wavelength gets too short, it is less than the separation of the atoms, which doesn't make sense. So we have a minimum possible wavelength set by the atomic spacing. Consequently, there is a maximum frequency $\omega_D$ ($D$ for Debye).

We would expect the minimum wavelength to be $\sim a \sim \left(\frac{V}{N}\right)^{1/3}$. So we expect
\[ \omega_D \sim \bar{c}_s \left(\frac{N}{V}\right)^{1/3}. \]
Here we are assuming that the different speeds of sound are of similar orders of magnitude.

Is this actually true? Given a cut-off $\omega_0$, the total number of 1-phonon states is
\[ \int_0^{\omega_0} g(\omega)\, d\omega = \frac{V\omega_0^3}{2\pi^2 \bar{c}_s^3}. \]
This is the number of different ways the lattice can vibrate, i.e. the number of normal modes of the lattice. Thus, to find $\omega_0$, it suffices to find the number of normal modes, and equate it with the above quantity.
Claim. There are in fact $3N$ normal modes.

Proof. We consider the big vector
\[ \mathbf{X} = \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \vdots \\ \mathbf{x}_N \end{pmatrix}, \]
where each $\mathbf{x}_i$ is the position of the $i$th atom. Then the Lagrangian is
\[ L = \frac{1}{2}\dot{\mathbf{X}}^2 - V(\mathbf{X}), \]
where $V$ is the interaction potential that keeps the solid a solid. We suppose $V$ is minimized at some particular $\mathbf{X} = \mathbf{X}_0$. We let
\[ \delta\mathbf{X} = \mathbf{X} - \mathbf{X}_0. \]
Then we can write
\[ L = \frac{1}{2}\delta\dot{\mathbf{X}}^2 - V_0 - \frac{1}{2}\delta\mathbf{X}^T \mathcal{V}\, \delta\mathbf{X} + \cdots, \]
where $\mathcal{V}$ is the Hessian of $V$ at $\mathbf{X}_0$, which is a symmetric positive definite matrix since we are at a minimum. The equation of motion is then, to first order,
\[ \delta\ddot{\mathbf{X}} = -\mathcal{V}\,\delta\mathbf{X}. \]
We look for solutions $\delta\mathbf{X} = \operatorname{Re}\left(e^{i\omega t}\mathbf{Q}\right)$ for some $\mathbf{Q}$ and $\omega$, and then this reduces to
\[ \mathcal{V}\mathbf{Q} = \omega^2\mathbf{Q}. \]
This is an eigenvalue equation for $\mathcal{V}$. Since $\mathcal{V}$ is a $3N \times 3N$ symmetric matrix, it has $3N$ independent eigenvectors, and so we are done: these are the $3N$ normal modes. If we wanted to, we could diagonalize $\mathcal{V}$, and then the system becomes $3N$ independent harmonic oscillators.

If we accept that there are $3N$ normal modes, then this tells us
\[ \frac{V\omega_0^3}{2\pi^2 \bar{c}_s^3} = 3N. \]
Using this, we can determine
\[ \omega_0 = \left(\frac{6\pi^2 N}{V}\right)^{1/3} \bar{c}_s, \]
which is of the form we predicted above by handwaving. If we are at a frequency higher than this, then we are exciting the highest frequency phonons. We also define the Debye temperature
\[ T_0 = \frac{\hbar\omega_0}{k}. \]
How high is this temperature? That depends only on $N$, $V$ and $\bar{c}_s$, and these are not hard to find. For example, $T_0$ for lead is $\sim 100\,\mathrm{K}$, since lead is a soft substance and the speed of sound in it is slow. On the other hand, $T_0 \approx 2000\,\mathrm{K}$ for diamond, since it is hard.
We can compute the partition function of this system entirely analogously to that of photons. The only important difference is that we have the cutoff $\omega \leq \omega_0$. What we get is
\[ \log Z = -\int_0^{\omega_0} d\omega\, g(\omega) \log\left(1 - e^{-\beta\hbar\omega}\right). \]
From the partition function, we can get the energy
\[ E = -\left(\frac{\partial \log Z}{\partial \beta}\right)_V = \int_0^{\omega_0} d\omega\, \frac{\hbar\omega\, g(\omega)}{e^{\beta\hbar\omega} - 1} = \frac{3V\hbar}{2\pi^2\bar{c}_s^3}\int_0^{\omega_0} d\omega\, \frac{\omega^3}{e^{\beta\hbar\omega} - 1}. \]
Again we set $x = \beta\hbar\omega$. Then the upper limit of the integral becomes $x = T_0/T$, and
\[ E = \frac{3V(kT)^4}{2\pi^2 (\hbar\bar{c}_s)^3}\int_0^{T_0/T} \frac{x^3\, dx}{e^x - 1}. \]
This is the same as the case of a photon, except that the integral has a limit related to the temperature. This integral is not something we can evaluate in closed form, but we can easily analyze what happens at extreme temperatures.

- If $T \gg T_0$, then $x$ takes a very small range, and we can Taylor expand:
\[ \int_0^{T_0/T} \frac{x^3\, dx}{e^x - 1} = \int_0^{T_0/T} dx\, (x^2 + \cdots) = \frac{1}{3}\left(\frac{T_0}{T}\right)^3. \]
So we find that $E \propto T$, and
\[ C_V = \left(\frac{\partial E}{\partial T}\right)_V = \frac{V k^4 T_0^3}{2\pi^2(\hbar\bar{c}_s)^3} = 3Nk. \]
This agrees reasonably well with experiment, and is called the Dulong-Petit law. It is also predicted by the Einstein model. This is essentially just a consequence of the equipartition of energy, as there are $3N$ degrees of freedom, and we get $kT$ for each.

- If $T \ll T_0$, then we might as well replace the upper limit of the integral by infinity, and we have
\[ \int_0^{T_0/T} \frac{x^3\, dx}{e^x - 1} \approx \int_0^\infty \frac{x^3\, dx}{e^x - 1} = \frac{\pi^4}{15}. \]
So we have $E \propto T^4$, which is exactly like photons. If we work out the heat capacity, then we find
\[ C_V = \frac{2\pi^2 V k^4}{5(\hbar\bar{c}_s)^3} T^3 = Nk\,\frac{12\pi^4}{5}\left(\frac{T}{T_0}\right)^3. \]
Remarkably, this $T^3$ behaviour also agrees with experiment for many substances, unlike the Einstein model.
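To see how the Debye prediction interpolates between these two limits, here is a short Python sketch (an illustration, not from the notes) that evaluates $C_V$ numerically from the energy integral above and checks the Dulong-Petit and $T^3$ limits.

```python
import numpy as np
from scipy.integrate import quad

def energy_per_3NkT0(t):
    """E / (3 N k T0) as a function of t = T/T0, from the Debye energy integral."""
    integral, _ = quad(lambda x: x**3 / np.expm1(x), 0, 1.0 / t)
    return 3.0 * t**4 * integral

def heat_capacity_per_3Nk(t, dt=1e-5):
    # C_V / (3Nk), obtained by numerically differentiating the energy in t = T/T0
    return (energy_per_3NkT0(t + dt) - energy_per_3NkT0(t - dt)) / (2 * dt)

for t in [0.05, 0.1, 0.3, 1.0, 3.0]:
    cv = heat_capacity_per_3Nk(t)
    print(f"T/T0 = {t:4}:  C_V/3Nk = {cv:.4f},  low-T law = {4 * np.pi**4 / 5 * t**3:.4f}")
```

At large $T/T_0$ the output approaches 1 (Dulong-Petit), while at small $T/T_0$ it matches the $\frac{4\pi^4}{5}(T/T_0)^3$ law derived above.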
This pattern is observed for most solids. There is one important exception, which is metals. In metals, we have electrons which are free to move in the lattice. They can also be considered to form a gas, and they are important at low temperatures.

Note that in the model, we made the assumption $\omega \approx |k| c_s$. This is valid for $\omega \ll \omega_D$. Because of this, we would expect the Debye model to work at low temperatures $T \ll T_0$. At high temperatures, we saw it also happens to work, but as we said, this is just equipartition of energy with $3N$ oscillators, and any model that involves $3N$ harmonic oscillators gives the same prediction.
3.4 Quantum ideal gas
Finally, we go back to talk about actual gases. Previously, for an ideal gas, we wrote
\[ Z = Z_1^N, \]
and we found there was a problem with entropy, and then we argued that we should have a factor of $\frac{1}{N!}$ there, because we over-counted identical states. However, it turns out this is still not quite true. It is really just an approximation that is valid in certain circumstances.

We let the single particle states be $|i\rangle$, with energies $E_i$. For simplicity, we assume the particles are bosons. Let's consider the simplest non-trivial example, which is $N = 2$. Since we have two bosons, we know the states of the particles must be symmetric. So the possible states are of the form
\[ |i\rangle|i\rangle, \qquad \frac{1}{\sqrt{2}}\left(|i\rangle|j\rangle + |j\rangle|i\rangle\right), \]
with $i \neq j$. Let's now calculate the partition function by summing over all these states. We have
\[ Z = \sum_i e^{-2\beta E_i} + \frac{1}{2}\sum_{i \neq j} e^{-\beta(E_i + E_j)}, \]
where we had to divide by 2 to avoid double counting. We compare this with
\[ \frac{1}{2!} Z_1^2 = \frac{1}{2}\left(\sum_i e^{-\beta E_i}\right)\left(\sum_j e^{-\beta E_j}\right) = \frac{1}{2}\sum_i e^{-2\beta E_i} + \frac{1}{2}\sum_{i\neq j} e^{-\beta(E_i + E_j)}. \]
We see that the second parts are the same, but the first terms differ by a factor of $\frac{1}{2}$. Thus, for the approximation
\[ Z \approx \frac{Z_1^2}{2!} \]
to be valid, we need the probability of two particles being in the same one-particle state to be negligible. Similarly, for $N$ particles, the approximation
\[ Z = \frac{Z_1^N}{N!} \tag{$\dagger$} \]
is valid if the probability of 2 or more particles being in the same state is negligible. This would be true if $\langle n_i\rangle \ll 1$ for all $i$, where $n_i$ is the number of particles in $|i\rangle$. Under these circumstances, we can indeed use ($\dagger$). This is also true for fermions, but we will not do those computations.

When does this assumption hold? We will somewhat circularly assume that our model is valid, and then derive some relation between $\langle n_i\rangle$ and other known quantities.

Last time, we found
\[ \langle n_i\rangle = -\frac{1}{\beta}\frac{\partial \log Z}{\partial E_i} \approx -\frac{N}{\beta}\frac{\partial \log Z_1}{\partial E_i} = \frac{N}{Z_1} e^{-\beta E_i} \]
for all $i$, where we used the approximation ($\dagger$) in deriving it. For a monoatomic gas, we had
\[ Z_1 = \sum_i e^{-\beta E_i} \approx \int_0^\infty dE\, g(E) e^{-\beta E} = \frac{V}{\lambda^3}, \]
where
\[ \lambda = \sqrt{\frac{2\pi\hbar^2}{mkT}}. \]
Substituting this into our expression for $\langle n_i\rangle$, we need
\[ 1 \gg \frac{N\lambda^3}{V} e^{-\beta E_i} \]
for all $i$. So we need $\lambda \ll \left(\frac{V}{N}\right)^{1/3}$. So we see that this is valid when our gas is not too dense, and by our definition of $\lambda$, at high temperatures.
Recall that when we did diatomic gases classically, we had
\[ H = H_{\mathrm{trans}} + H_{\mathrm{rot}} + H_{\mathrm{vib}}, \qquad Z_1 = Z_{\mathrm{trans}} Z_{\mathrm{rot}} Z_{\mathrm{vib}}. \]
We now re-examine this in the quantum case.

We still have
\[ H_{\mathrm{trans}} = \frac{\mathbf{p}^2}{2m}. \]
So we find that
\[ Z_{\mathrm{trans}} = \int_0^\infty dE\, g(E) e^{-\beta E} = \frac{V}{\lambda^3}, \]
as before. So this part is unchanged.

We now look at the rotational part. We have
\[ H_{\mathrm{rot}} = \frac{\mathbf{J}^2}{2I}. \]
We know what the eigenvalues of $\mathbf{J}^2$ are. The energy levels are
\[ \frac{\hbar^2 j(j+1)}{2I}, \]
where $j = 0, 1, 2, \cdots$, and the degeneracy of each level is $2j + 1$. Using this, we can write down the partition function
\[ Z_{\mathrm{rot}} = \sum_{j=0}^\infty (2j+1) e^{-\beta\hbar^2 j(j+1)/2I}. \]
Let's look at the high and low temperature limits. If
\[ kT \gg \frac{\hbar^2}{2I}, \]
then the exponents up there are small. So we can approximate the sum by an integral,
\[ Z_{\mathrm{rot}} \approx \int_0^\infty dx\, (2x + 1) e^{-\beta\hbar^2 x(x+1)/2I}. \]
This fortunately is an integral we can evaluate, because
\[ \frac{d}{dx}\, x(x+1) = 2x + 1. \]
So we find that
\[ Z_{\mathrm{rot}} = \frac{2I}{\beta\hbar^2}. \]
This is exactly what we saw classically.
But if $kT \ll \frac{\hbar^2}{2I}$, all terms are exponentially suppressed except for $j = 0$. So we find
\[ Z_{\mathrm{rot}} \approx 1. \]
This is what we meant when we said the rotational modes are frozen out at low temperatures.

We can do the same thing for the vibrational degrees of freedom. Recall we described them by harmonic oscillators, with energy levels
\[ E_n = \hbar\omega\left(n + \frac{1}{2}\right). \]
So we can write down the partition function
\[ Z_{\mathrm{vib}} = \sum_{n=0}^\infty e^{-\beta\hbar\omega(n + \frac{1}{2})} = \frac{e^{-\beta\hbar\omega/2}}{1 - e^{-\beta\hbar\omega}} = \frac{1}{2\sinh(\beta\hbar\omega/2)}. \]
We can do the same thing as we did for the rotational motion. Now the relevant energy scale is $\hbar\omega/2$.

At high temperatures $kT \gg \hbar\omega/2$, we can replace $\sinh$ by the first term in its Taylor expansion, and we have
\[ Z_{\mathrm{vib}} \approx \frac{1}{\beta\hbar\omega}. \]
On the other hand, at low temperatures, we have a sum of exponentially small terms, and so
\[ Z_{\mathrm{vib}} \approx e^{-\beta\hbar\omega/2}. \]
So we find that
\[ E_{\mathrm{vib}} = -\frac{\partial}{\partial\beta}\log Z_{\mathrm{vib}} = \frac{\hbar\omega}{2}. \]
So for $N$ molecules, we have
\[ E_{\mathrm{vib}} \approx \frac{N\hbar\omega}{2}. \]
But this is not measurable, as it is independent of $T$. So this doesn't affect the heat capacity. So the vibrational modes freeze out.

We can now understand the phenomenon we saw previously:

[Plot of $C_V/Nk$ against $T$: plateaus at $1.5$, $2.5$ and $3.5$, with the steps occurring at $kT \sim \hbar^2/2I$ and $kT \sim \hbar\omega/2$ respectively.]

Note that in theory it is certainly possible that the vibrational mode comes first, instead of what we drew. However, in practice, this rarely happens.
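A short numerical sketch (an illustration, not from the notes) makes the freeze-out picture explicit: we compute $C_V/Nk = \frac{3}{2} + C_{\mathrm{rot}} + C_{\mathrm{vib}}$ from the partition functions above, truncating the rotational sum at a large $j$, for illustrative values of $\hbar^2/2I$ and $\hbar\omega$.

```python
import numpy as np

def cv_per_Nk(T, E_rot=1.0, E_vib=50.0, jmax=200):
    """C_V / Nk for one diatomic molecule: translation + rotation + vibration.
    E_rot = hbar^2/(2I) and E_vib = hbar*omega in units where k = 1 (illustrative)."""
    beta = 1.0 / T

    # Rotational part: Z_rot = sum_j (2j+1) exp(-beta E_rot j(j+1)); C = Var(E)/T^2
    j = np.arange(jmax)
    Ej = E_rot * j * (j + 1)
    w = (2 * j + 1) * np.exp(-beta * Ej)
    Z = w.sum()
    E_mean = (w * Ej).sum() / Z
    E2_mean = (w * Ej**2).sum() / Z
    c_rot = (E2_mean - E_mean**2) / T**2

    # Vibrational part: quantum harmonic oscillator heat capacity
    x = beta * E_vib
    c_vib = x**2 * np.exp(x) / np.expm1(x)**2

    return 1.5 + c_rot + c_vib

for T in [0.1, 1.0, 5.0, 30.0, 300.0]:
    print(T, cv_per_Nk(T))
```

The output climbs from $1.5$ through $2.5$ to $3.5$ as $T$ passes the rotational and then the vibrational energy scales, reproducing the staircase in the figure.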
3.5 Bosons
In the previous derivation, we needed to assume that
\[ \lambda = \sqrt{\frac{2\pi\hbar^2}{mkT}} \ll \left(\frac{V}{N}\right)^{1/3}. \]
At very low temperatures, this is no longer true. Then quantum effects become important. If there are no interactions between particles, then only one effect is important: quantum statistics.

- Bosons have integer spin, and the states have to be symmetric with respect to interchange of 2 particles. For example, photons are bosons.
- Fermions have spin $\frac{1}{2}$, and the states have to be antisymmetric, e.g. $e, p, n$.

Since spins add up, atoms made from an even (resp. odd) number of electrons, protons and neutrons are bosons (resp. fermions). In an atom, since the charge is neutral, we have the same number of electrons and protons. So we are really counting the number of neutrons. For example, hydrogen has no neutrons, so it is a boson. On the other hand, deuterium has 1 neutron, and so it is a fermion.

We suppose we have bosons, and the single particle states are $|r\rangle$, with energies $E_r$. We let $n_r$ be the number of particles in $|r\rangle$. We assume the particles are indistinguishable, so to specify the state of the $N$-particle system, we just need to specify how many particles are in each 1-particle state by
\[ \{n_1, n_2, n_3, \cdots\}. \]
Once we specify these numbers, the energy is just
\[ \sum_r n_r E_r. \]
So we can write down the partition function in the canonical ensemble by summing over all possible states:
\[ Z = \sum_{\{n_r\}} e^{-\beta \sum_s n_s E_s}. \]
This is exactly the same as what we had for photons. But there is one very important difference. For photons, the photon number is not conserved. But here the atoms cannot be destroyed. So we assume that the particle number is conserved. Thus, we are only summing over $\{n_r\}$ such that
\[ \sum_r n_r = N. \]
This makes it rather difficult to evaluate the sum. The trick is to use the grand canonical ensemble instead of the canonical ensemble. We have
\[ \mathcal{Z} = \sum_{\{n_r\}} e^{-\beta\sum_s n_s(E_s - \mu)}, \]
and now there is no restriction on the allowed values of the $\{n_r\}$. So we can factorize this sum and write it as
\[ \mathcal{Z} = \prod_s \mathcal{Z}_s, \]
where
\[ \mathcal{Z}_s = \sum_{n_s = 0}^\infty e^{-\beta(E_s - \mu)n_s} = \frac{1}{1 - e^{-\beta(E_s - \mu)}}. \]
For this sum to actually converge, we need $\mu < E_s$, and this has to be true for all $s$. So if the ground state energy is $E_0$, which we can always choose to be 0, then we need $\mu < 0$.

Taking the logarithm, we can write
\[ \log\mathcal{Z} = -\sum_r \log\left(1 - e^{-\beta(E_r - \mu)}\right). \]
We can now compute the expectation values $\langle n_r\rangle$. This is
\[ \langle n_r\rangle = -\frac{1}{\beta}\frac{\partial}{\partial E_r}\log\mathcal{Z} = \frac{1}{e^{\beta(E_r - \mu)} - 1}. \]
This is called the Bose–Einstein distribution. It is the expected number of particles in the 1-particle state $|r\rangle$.

We can also ask for the expected total number of particles,
\[ \langle N\rangle = \frac{1}{\beta}\frac{\partial}{\partial\mu}\log\mathcal{Z} = \sum_r \frac{1}{e^{\beta(E_r - \mu)} - 1} = \sum_r \langle n_r\rangle. \]
Similarly, we have
\[ \langle E\rangle = \sum_r E_r \langle n_r\rangle, \]
as expected. As before, we will stop writing the brackets to denote expectation values.
It is convenient to introduce the fugacity
\[ z = e^{\beta\mu}. \]
Since $\mu < 0$, we know $0 < z < 1$. We now replace the sum with an integral using the density of states. Then we obtain
\[ \log\mathcal{Z} = -\int_0^\infty dE\, g(E)\log\left(1 - ze^{-\beta E}\right). \]
The total number of particles is similarly
\[ N = \int_0^\infty dE\, \frac{g(E)}{z^{-1}e^{\beta E} - 1} = N(T, V, \mu). \]
Now we are using the grand canonical ensemble just because it is convenient, not because we like it. So we hope that we can invert this relation to write
\[ \mu = \mu(T, V, N). \tag{$**$} \]
So we can write
\[ E = \int_0^\infty dE\, \frac{E\, g(E)}{z^{-1}e^{\beta E} - 1} = E(T, V, \mu) = E(T, V, N), \]
using ($**$). As before, we have the grand canonical potential
\[ pV = -\Phi = \frac{1}{\beta}\log\mathcal{Z} = -\frac{1}{\beta}\int_0^\infty dE\, g(E)\log\left(1 - ze^{-\beta E}\right). \]
We now focus on the case of monoatomic, non-relativistic particles. Then the density of states is given by
\[ g(E) = \frac{V}{4\pi^2}\left(\frac{2m}{\hbar^2}\right)^{3/2} E^{1/2}. \]
There is a nice trick we can do. We integrate by parts; by integrating $g(E)$ and differentiating the logarithm, we obtain
\[ pV = \frac{2}{3}\int_0^\infty dE\, \frac{E\, g(E)}{z^{-1}e^{\beta E} - 1} = \frac{2}{3}E. \]
So if we can express $E$ as a function of $T, V, N$, then we can use this to give us an equation of state!

So what we want to do now is to actually evaluate those integrals, and then invert the relation we found. Often, the integrals cannot be done exactly. However, we can express them in terms of some rather more standardized functions. We first introduce the Gamma function:
\[ \Gamma(s) = \int_0^\infty t^{s-1} e^{-t}\, dt. \]
This function is pretty well-known in, say, number-theoretic circles. If $n$ is a positive integer, then we have $\Gamma(n) = (n-1)!$, and it also happens that
\[ \Gamma\left(\tfrac{3}{2}\right) = \frac{\sqrt{\pi}}{2}, \qquad \Gamma\left(\tfrac{5}{2}\right) = \frac{3\sqrt{\pi}}{4}. \]
We shall also introduce the seemingly-arbitrary functions
\[ g_n(z) = \frac{1}{\Gamma(n)}\int_0^\infty dx\, \frac{x^{n-1}}{z^{-1}e^x - 1}, \]
where $n$ need not be an integer. It turns out these functions appear every time we try to compute something. For example, we have
\[ \frac{N}{V} = \frac{1}{4\pi^2}\left(\frac{2m}{\hbar^2}\right)^{3/2}\int_0^\infty dE\, \frac{E^{1/2}}{z^{-1}e^{\beta E} - 1} = \frac{1}{4\pi^2}\left(\frac{2m}{\hbar^2}\right)^{3/2}\frac{1}{\beta^{3/2}}\int_0^\infty dx\, \frac{x^{1/2}}{z^{-1}e^x - 1} = \frac{1}{\lambda^3}\, g_{3/2}(z). \]
Similarly, the energy is given by
\[ \frac{E}{V} = \frac{3}{2\lambda^3\beta}\, g_{5/2}(z). \]
It is helpful to obtain a series expansion for the $g_n(z)$ when $0 \leq z < 1$. Then we have
\[ g_n(z) = \frac{1}{\Gamma(n)}\int_0^\infty dx\, \frac{z\, x^{n-1} e^{-x}}{1 - z e^{-x}} = \frac{1}{\Gamma(n)}\int_0^\infty dx\, x^{n-1}\sum_{m=1}^\infty z^m e^{-mx} \]
\[ = \frac{1}{\Gamma(n)}\sum_{m=1}^\infty z^m \int_0^\infty dx\, x^{n-1} e^{-mx} = \frac{1}{\Gamma(n)}\sum_{m=1}^\infty \frac{z^m}{m^n}\int_0^\infty du\, u^{n-1}e^{-u} = \sum_{m=1}^\infty \frac{z^m}{m^n}. \]
Note that it is legal to exchange the integral and the summation because everything is non-negative.
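As a check (an illustration, not from the notes), the integral definition of $g_n(z)$ and the series $\sum_m z^m/m^n$ can be compared numerically:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def g_integral(n, z):
    # g_n(z) = (1/Gamma(n)) * integral of x^{n-1} / (z^{-1} e^x - 1)
    val, _ = quad(lambda x: x**(n - 1) / (np.exp(x) / z - 1), 0, np.inf)
    return val / gamma(n)

def g_series(n, z, terms=2000):
    m = np.arange(1, terms + 1)
    return np.sum(z**m / m**n)

for z in [0.1, 0.5, 0.9]:
    print(z, g_integral(1.5, z), g_series(1.5, z))
```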
In particular, this gives the series expansions
\[ \frac{N}{V} = \frac{z}{\lambda^3}\left(1 + \frac{z}{2\sqrt{2}} + O(z^2)\right), \qquad \frac{E}{V} = \frac{3z}{2\lambda^3\beta}\left(1 + \frac{z}{4\sqrt{2}} + O(z^2)\right). \]
Let's now consider different possible scenarios. In the limit $z \ll 1$, this gives
\[ z \approx \frac{\lambda^3 N}{V}. \]
By the assumption on $z$, this implies that
\[ \lambda \ll \left(\frac{V}{N}\right)^{1/3}, \]
which is our good old classical high-temperature limit. So $z \ll 1$ corresponds to high temperature.

This might seem a bit counter-intuitive, since $z = e^{\beta\mu}$. So if $T \to \infty$, then we should have $\beta \to 0$, and hence $z \to 1$. But this is not a valid argument, because we are fixing $N$, and hence as we change $T$, the value of $\mu$ also varies.

In this high temperature limit, we can invert the $\frac{N}{V}$ equation to obtain
\[ z = \frac{\lambda^3 N}{V}\left(1 - \frac{1}{2\sqrt{2}}\frac{\lambda^3 N}{V} + O\left(\left(\frac{\lambda^3 N}{V}\right)^2\right)\right). \]
Plugging this into our expression for $E/V$, we find that
\[ E = \frac{3N}{2\beta}\left(1 - \frac{1}{2\sqrt{2}}\frac{\lambda^3 N}{V} + \cdots\right)\left(1 + \frac{1}{4\sqrt{2}}\frac{\lambda^3 N}{V} + \cdots\right). \]
So we find that
\[ pV = \frac{2}{3}E = NkT\left(1 - \frac{1}{4\sqrt{2}}\frac{\lambda^3 N}{V} + O\left(\left(\frac{\lambda^3 N}{V}\right)^2\right)\right). \]
We see that at leading order, we recover the ideal gas law. The first correction term is the second virial coefficient. These corrections are not coming from interactions, but from quantum statistics. So Bose statistics reduces the pressure.

This is good, but it is still in the classical limit. What we really want to understand is when things become truly quantum, and $z$ is no longer small. This leads to the phenomenon of Bose–Einstein condensation.
3.6 Bose–Einstein condensation
We can write our previous $\frac{N}{V}$ equation as
\[ g_{3/2}(z) = \frac{N}{V}\lambda^3. \tag{$*$} \]
Recall that $\lambda \propto T^{-1/2}$. As we decrease $T$ with $N/V$ fixed, the right hand side grows. So $g_{3/2}(z)$ must grow as well. Looking at the series expansion, $g_{3/2}$ is evidently an increasing function of $z$. But $z$ cannot grow without bound! It is bounded by $z \leq 1$.

How does $g_{3/2}$ grow as $z \to 1$? By definition, we have
\[ g_n(1) = \zeta(n), \]
the Riemann $\zeta$-function. It is a standard result that this is finite at $n = \frac{3}{2}$. In fact,
\[ \zeta\left(\tfrac{3}{2}\right) \approx 2.612. \]
We have secretly met this $\zeta$ function before. Recall that when we did black-body radiation, we had an integral to evaluate, which was
\[ \int_0^\infty \frac{x^3\, dx}{e^x - 1} = \Gamma(4)\, g_4(1) = \Gamma(4)\zeta(4) = 3!\cdot\frac{\pi^4}{90} = \frac{\pi^4}{15}. \]
Of course, getting the final answer requires knowledge of the value of $\zeta(4)$, which fortunately the number theorists have figured out for us in their monumental special values of $L$-functions programme.

Going back to physics, we have found that when $T$ hits some finite value $T = T_c$, the critical temperature, we have $z = 1$. We can find $T_c$ by inverting the relation ($*$), and obtain
\[ T_c = \frac{2\pi\hbar^2}{km}\left(\frac{1}{\zeta(3/2)}\frac{N}{V}\right)^{2/3}. \]
Then we can express
\[ \frac{T_c}{T} = \left(\frac{\lambda^3}{\zeta(3/2)}\frac{N}{V}\right)^{2/3}. \]
So $T_c$ is the temperature at which $\lambda^3$ becomes comparable to the number density.
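For a feel for the numbers, here is a Python sketch (an illustration, not from the notes) evaluating $T_c$ for a dilute gas of bosonic atoms; the mass and number density used are purely illustrative, of the order of magnitude typical of cold-atom experiments. It also evaluates the condensate fraction $n_0/N = 1 - (T/T_c)^{3/2}$ derived below.

```python
import numpy as np
from scipy.special import zeta

hbar = 1.054571817e-34  # J s
k = 1.380649e-23        # J / K

# Illustrative (hypothetical) parameters: rubidium-like atomic mass, dilute cloud
m = 1.4e-25             # kg
n = 1.0e20              # atoms per m^3, i.e. N/V

T_c = (2 * np.pi * hbar**2 / (k * m)) * (n / zeta(1.5))**(2 / 3)
print("T_c =", T_c, "K")   # of order 10^-7 K for these numbers

# Condensate fraction below T_c (formula derived later in this section)
for frac in [0.25, 0.5, 0.75]:
    print(f"T = {frac} T_c: condensate fraction = {1 - frac**1.5:.3f}")
```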
But physically, there shouldn't be anything that stops us from going below $T_c$. But we can't. What does this mean? Maybe $N$ should decrease, but particles cannot just disappear. So something has gone wrong. What has gone wrong?

The problem happened when we replaced the sum over states with the integral over the density of states:
\[ \sum_{\mathbf{k}} \approx \frac{V(2m)^{3/2}}{4\pi^2\hbar^3}\int_0^\infty dE\, E^{1/2}. \]
The right hand side gives no weight to the ground state $E = 0$. Using the formula for $\langle n_r\rangle$, we would expect the number of particles in the ground state to be
\[ n_0 = \frac{1}{z^{-1} - 1} = \frac{z}{1 - z}. \]
For most values of $z \in [0, 1)$, this is not a problem, as this is just a few particles. However, for $z$ very very very close to 1, this becomes large. This is very unusual. Recall that when we described the validity of the classical approximation, we used the fact that the number of particles in each state is unlikely to be greater than 1. Here the exact opposite happens: a lot of particles try to lump into the ground state.

Now the solution is to manually put back the ground state. This might seem a bit dodgy: do we have to fix the first excited state as well? And the second? However, we shall not worry ourselves with these problems. Just fixing the ground state is a decent approximation.

We now just put
\[ N = \frac{V}{\lambda^3}\, g_{3/2}(z) + \frac{z}{1 - z}. \]
Then there is no problem keeping $N$ fixed as we take $T \to 0$. Then we find
\[ z \approx 1 - \frac{1}{N} \]
as $T \to 0$.

For $T < T_c$, we can set $z = 1$ in the formula for $g_{3/2}(z)$, and approximate
\[ N = \frac{V}{\lambda^3}\, g_{3/2}(1) + \frac{1}{1 - z}, \]
and
\[ n_0 = \frac{1}{1 - z}. \]
If we divide this expression through by $N$, we find
\[ \frac{n_0}{N} = 1 - \frac{V}{N\lambda^3}\zeta\left(\tfrac{3}{2}\right) = 1 - \left(\frac{T}{T_c}\right)^{3/2}. \]
This is the fraction of the particles that are in the ground state, and we see that as $T \to 0$, they all go to the ground state.

For $T < T_c$, there is a macroscopic number of particles that occupy the ground state. This is called a Bose–Einstein condensate. This is a theoretical prediction made in the early 20th century, and was confirmed experimentally much later, in 1995. It took so long because the critical temperature is $\sim 10^{-7}\,\mathrm{K}$, which is very low!

Let's look at the equation of state in this condensed phase. We just have to take our previous expression for the equation of state, but put back in the contribution of the ground state. For example, we had
\[ pV = -\Phi = -\frac{1}{\beta}\sum_r \log\left(1 - e^{-\beta(E_r - \mu)}\right). \]
We previously just replaced the sum with an integral, but this time we shall not forget the ground state term. Then we have
\[ pV = \frac{2}{3}E - \frac{1}{\beta}\log(1 - z). \]
Recall that we had
\[ \frac{E}{V} = \frac{3kT}{2\lambda^3}\, g_{5/2}(z). \tag{$\ddagger$} \]
We don't need to add terms for the ground state since the ground state has zero energy. Substituting this into the equation of state, and using the fact that $z \approx 1$ for $T < T_c$, we obtain
\[ pV = NkT\frac{V}{N\lambda^3}\zeta\left(\tfrac{5}{2}\right) - kT\log(1 - z). \]
For $T < T_c$, the first term is $O(N)$, and the second term is $O(\log N)$, which is much much smaller than $N$. So the second term is negligible. Thus, we can approximate
\[ p = \frac{kT}{\lambda^3}\zeta\left(\tfrac{5}{2}\right) \propto T^{5/2}. \]
There is a remarkable thing about this expression: there is no $V$ or $N$ in here. The pressure is just a function of the temperature, very unlike an ideal gas.

We can also compute the heat capacity. We have
\[ C_V = \left(\frac{\partial E}{\partial T}\right)_{V,N}. \]
Using ($\ddagger$), we find that
\[ \frac{C_V}{V} = \frac{15k}{4\lambda^3}\, g_{5/2}(z) + \frac{3kT}{2\lambda^3}\, g_{5/2}'(z)\left(\frac{\partial z}{\partial T}\right)_{V,N}. \]
When $T < T_c$, we have $z \approx 1$, so the second term is negligible, and
\[ \frac{C_V}{V} = \frac{15k}{4\lambda^3}\zeta\left(\tfrac{5}{2}\right) \propto T^{3/2}. \]
That wasn't too exciting. Now consider the case where $T$ is slightly greater than $T_c$. We want to find an expression for $z$ in terms of $T$. Our previous expansions were around $z = 0$, which aren't very helpful here. Instead, we have to find an expression for $z$ near 1.

Recall that $z$ is determined by
\[ g_{3/2}(z) = \frac{\lambda^3 N}{V}. \]
It turns out the best way to understand the behaviour of $g_{3/2}$ near $z = 1$ is to consider its derivative. Using the series expansion, it is not hard to see that
\[ g_n'(z) = \frac{1}{z}\, g_{n-1}(z). \]
Taking $n = 3/2$, we find
\[ g_{3/2}'(z) = \frac{1}{z}\, g_{1/2}(z). \]
But $g_{1/2}(z)$ is not a very nice function near $z = 1$. If we look at the series expansion, it is clear that it diverges as $z \to 1$. What we want to know is how it diverges. Using the integral formula, we have
\[ g_{1/2}(z) = \frac{1}{\Gamma(1/2)}\int_0^\infty dx\, \frac{x^{-1/2}}{z^{-1}e^x - 1} = \frac{1}{\Gamma(1/2)}\int_0^\varepsilon dx\, \frac{x^{-1/2}}{z^{-1}(1 + x) - 1} + \text{bounded terms} \]
\[ = \frac{z}{\Gamma(1/2)}\int_0^\varepsilon dx\, \frac{x^{-1/2}}{1 - z + x} + \cdots = \frac{2z}{\Gamma(1/2)\sqrt{1 - z}}\int_0^{\sqrt{\varepsilon/(1-z)}} \frac{du}{1 + u^2} + \cdots, \]
where we made the substitution $u = \sqrt{\frac{x}{1-z}}$. As $z \to 1$, the integral tends to $\pi/2$ (it is an arctan), and so we find that
\[ g_{3/2}'(z) = \frac{\pi}{\Gamma(1/2)\sqrt{1-z}} + \text{finite}. \]
We can then integrate this back up and find
\[ g_{3/2}(z) = g_{3/2}(1) - \frac{2\pi}{\Gamma(1/2)}\sqrt{1 - z} + O(1 - z). \]
So from ($*$), we know that
\[ \frac{2\pi}{\Gamma(1/2)}\sqrt{1 - z} \approx \zeta(3/2) - \frac{\lambda^3 N}{V}. \]
Therefore we find
\[ z \approx 1 - \left(\frac{\Gamma(1/2)\zeta(3/2)}{2\pi}\right)^2\left(1 - \frac{\lambda^3 N}{\zeta(3/2)V}\right)^2 = 1 - \left(\frac{\Gamma(1/2)\zeta(3/2)}{2\pi}\right)^2\left(1 - \left(\frac{T_c}{T}\right)^{3/2}\right)^2. \]
We can expand
\[ 1 - \left(\frac{T_c}{T}\right)^{3/2} \approx \frac{3}{2}\frac{T - T_c}{T_c} \]
for $T \approx T_c$. So we find that
\[ z \approx 1 - B\left(\frac{T - T_c}{T_c}\right)^2 \]
for some positive number $B$, and this is valid for $T \to T_c^+$.

We can now use this to compute the quantity appearing in the heat capacity:
\[ \left(\frac{\partial z}{\partial T}\right)_{V,N} \approx -\frac{2B}{T_c^2}(T - T_c). \]
So we find that
\[ \frac{C_V}{V} \approx \frac{15k}{4\lambda^3}\, g_{5/2}(z) - \frac{3Bk}{\lambda^3}\, g_{5/2}'(1)\,\frac{T - T_c}{T_c}. \]
Finally, we can compare this with what we got when $T < T_c$. The first part is the same, but the second term is not; it is only present when $T > T_c$. Since the second term vanishes when $T = T_c$, we see that $C_V$ is continuous at $T = T_c$, but its derivative is not.

[Plot of $C_V$ against $T$: a cusp at $T = T_c$, with $C_V \to \frac{3}{2}Nk$ at high temperature.]

This is an example of a phase transition: a discontinuity in a thermodynamic quantity. Note that this is possible only in the thermodynamic limit $N \to \infty$. If we work with finite $N$, then the partition function is just a finite sum of analytic functions, so all thermodynamic quantities will be analytic.
Superfluid Helium

There are two important isotopes of helium. There is $^4$He, which has two protons, two neutrons and two electrons, hence is a boson. There is also a second isotope, namely $^3$He, which has only one neutron. It therefore consists of 5 fermions, hence is a fermion. Both of these are liquids at low temperatures.

It turns out that when the temperature is very low, at $2.17\,\mathrm{K}$, helium-4 exhibits a phase transition. It then becomes what is known as a superfluid. This is a strange state of matter where the liquid appears to have zero viscosity. It does weird things. For example, if you put it in a cup, it just flows out of the cup. But we don't really care about that. What is relevant to us is that there is a phase transition at this temperature. On the other hand, we see that $^3$He doesn't have one. Thus, we figure that this difference is due to the difference in quantum statistics! One natural question to ask is: is this superfluid a Bose–Einstein condensate?

One piece of evidence that suggests they might be related is that when we look at the heat capacity, we get a similar discontinuity in the derivative. However, when we look at this in detail, there are differences. First of all, we are talking about a liquid, not a gas. Thus, liquid interactions are important. The details of the $C_V$ graph are also different from those of the Bose gas. For example, the low-temperature part of the curve goes like $C_V \sim T^3$. So in some sense, we have found a fluid analogue of a Bose condensate.

It turns out $^3$He also becomes superfluid when the temperature is very, very low, at $T \sim 10^{-3}\,\mathrm{K}$. Why is this? Apparently, there are very weak forces between helium-3 atoms. At very low temperatures, this causes them to pair up, and when they pair up, they become bosons, and these can condense.
3.7 Fermions
We now move on to study fermions. One major application of this is the study of electrons in a metal, which we can model as a gas. But for the moment, we work in complete generality, and suppose we again have a non-interacting fermion gas. Since they are fermions, they obey the Pauli exclusion principle: two fermions cannot be in the same state. Mathematically, this means the occupation numbers $\langle n_r\rangle$ cannot be arbitrary: they are either 0 or 1. As in the case of bosons, let's study these fermions using the grand canonical ensemble. The partition function is
\[ \mathcal{Z} = \prod_r \mathcal{Z}_r, \]
where
\[ \mathcal{Z}_r = \sum_{n_r = 0}^1 e^{-\beta n_r(E_r - \mu)} = 1 + e^{-\beta(E_r - \mu)}. \]
This is just like the bosonic case, but we only have two terms in the sum. We then find that
\[ \log\mathcal{Z} = \sum_r \log\left(1 + e^{-\beta(E_r - \mu)}\right). \]
Then we have
\[ \langle n_r\rangle = -\frac{1}{\beta}\frac{\partial}{\partial E_r}\log\mathcal{Z} = \frac{1}{e^{\beta(E_r - \mu)} + 1}. \]
This is called the Fermi-Dirac distribution. The difference between this and the bosonic case is a difference in sign!

We can get the total number of particles in the usual way,
\[ \langle N\rangle = \frac{1}{\beta}\frac{\partial}{\partial\mu}\log\mathcal{Z} = \sum_r \frac{1}{e^{\beta(E_r - \mu)} + 1} = \sum_r \langle n_r\rangle. \]
As before, we will stop drawing the $\langle\,\cdot\,\rangle$.

There is another difference between this and the bosonic case. In the bosonic case, we required $\mu < 0$ for the sum $\mathcal{Z}_r$ to converge. However, here the sum always converges, as there are just two terms. So there is no restriction on $\mu$.

We assume that our particles are non-relativistic, so the energy is
\[ E = \frac{\hbar^2 k^2}{2m}. \]
Another thing to notice is that our particles have a spin
\[ s \in \mathbb{Z} + \tfrac{1}{2}. \]
This gives a degeneracy of $g_s = 2s + 1$. This means the density of states has to be multiplied by $g_s$. We have
\[ g(E) = \frac{g_s V}{4\pi^2}\left(\frac{2m}{\hbar^2}\right)^{3/2} E^{1/2}. \]
We can now write down the total number of particles in terms of the density of states. We have
\[ N = \int_0^\infty dE\, \frac{g(E)}{z^{-1}e^{\beta E} + 1}, \]
where again $z = e^{\beta\mu}$. Similarly, the energy is given by
\[ E = \int_0^\infty dE\, \frac{E\, g(E)}{z^{-1}e^{\beta E} + 1}. \]
As before, the grand canonical potential gives
\[ pV = kT\log\mathcal{Z} = kT\int_0^\infty dE\, g(E)\log\left(1 + ze^{-\beta E}\right). \]
Just as for bosons, we can integrate this by parts, and we obtain
\[ pV = \frac{2}{3}E. \]
We now look at the high temperature limit. In the case of bosons, we identified this with small $z$, and we can do the same here. This is done on example sheet 3, and we find that to leading order, we have
\[ z \approx \frac{\lambda^3 N}{g_s V}, \]
exactly as for bosons. Again as in the example sheet, we find
\[ pV = NkT\left(1 + \frac{\lambda^3 N}{4\sqrt{2}\, g_s V} + \cdots\right). \]
We see that the sign there is positive. So Fermi statistics increases the pressure. This is the opposite of what we saw for bosons, which makes sense.

We now go to the opposite extreme, with zero temperature. Then $\beta = \infty$. So
\[ \frac{1}{e^{\beta(E - \mu)} + 1} = \begin{cases} 1 & E < \mu \\ 0 & E > \mu \end{cases}. \]
Note that here $\mu$ is evaluated at $T = 0$, as $\mu$ depends on $T$. Clearly the value of $\mu$ at $T = 0$ is important.

Definition (Fermi energy). The Fermi energy is
\[ E_f = \mu(T = 0) = \lim_{T \to 0}\mu(T, V, N). \]
Here we see that at $T = 0$, all states with $E \leq E_f$ are occupied, and states with higher energy are not occupied. Now it is easy to work out what this Fermi energy is, because we know the total number of particles is $N$. We have
\[ N = \int_0^{E_f} dE\, g(E) = \frac{g_s V}{6\pi^2}\left(\frac{2m}{\hbar^2}\right)^{3/2} E_f^{3/2}. \]
Inverting this, we find that
\[ E_f = \frac{\hbar^2}{2m}\left(\frac{6\pi^2}{g_s}\frac{N}{V}\right)^{2/3}. \]
We note that this is very much like the critical temperature of the Bose–Einstein condensate. We can define the characteristic temperature scale
\[ T_f = \frac{E_f}{k}. \]
Now whenever we talk about whether the temperature is high or low, we mean relative to this scale.

How high is this temperature? We saw that for bosons, this temperature is very low.

Example. For $^3$He, the Fermi temperature is
\[ T_f \sim 4.3\,\mathrm{K}. \]
While this is very low compared to everyday temperatures, it is pretty high compared to our previous $T_c$. For electrons in a metal, we have $T_f \sim 10^4\,\mathrm{K}$. This is very high! Thus, when studying metals, we cannot take the classical limit. For electrons in white dwarf stars, we have $T_f \sim 10^7\,\mathrm{K}$! One reason this is big is that $m$ is the electron mass, which is very tiny, so it follows that $E_f$ is very large.
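A quick numerical illustration (not from the notes): for electrons ($g_s = 2$) at a number density typical of a metal, the formula above gives a Fermi temperature of order $10^4$ to $10^5\,\mathrm{K}$. The density used below is a hypothetical, copper-like value.

```python
import numpy as np

hbar = 1.054571817e-34  # J s
k = 1.380649e-23        # J / K
m_e = 9.1093837015e-31  # kg
g_s = 2                 # spin-1/2 electrons

n = 8.5e28              # electrons per m^3, illustrative metallic density

E_f = hbar**2 / (2 * m_e) * (6 * np.pi**2 * n / g_s)**(2 / 3)
T_f = E_f / k
print("E_f =", E_f / 1.602e-19, "eV")   # a few eV
print("T_f =", T_f, "K")                # of order 10^4 - 10^5 K
```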
From the Fermi energy, we can define a momentum scale
~k
f
= (2mE
f
)
1/2
.
While we will not be using this picture, one way to understand this is that in
momentum space, at
T
= 0, all states with
|k| k
f
are occupied, and these
states are known as a Fermi sea. States at the boundary of this region are known
as Fermi surface.
Let’s continue working at zero temperature and work out the equations of
state. We have
E =
Z
E
f
0
dE Eg(E) =
3
5
NE
f
.
This is the total energy at zero temperature. We saw earlier that
pV =
2
3
E =
2
5
NE
f
.
This shows the pressure is non-zero, even at zero temperature! This is called the
degeneracy pressure. This is pressure coming from the Fermi statistics, which is
completely unlike the bosons.
Now let’s see what happens when we are at low, but non-zero temperature,
say
T T
f
. While we call this low temperature, since we saw that
T
f
can be
pretty high, this “low temperature” might still be reasonably high by everyday
standards.
In this case, we have
µ
kT
E
f
kT
=
T
f
T
1.
Therefore we find that z 1.
Our goal now is to compute the heat capacity. We begin by trying to plot the Fermi–Dirac distribution $n(E)$ at different temperature scales based on heuristic arguments. At $T = 0$, it is simply a step: $n(E) = 1$ for $E < E_f$ and $n(E) = 0$ for $E > E_f$. When we increase the temperature a bit, the step gets smoothed out over a width of order $kT$ around $E_f$. Since $kT$ is small compared to $E_f$, that transition region is very small. Only states within $kT$ of $E_f$ are affected by $T$.
We can now give a heuristic argument for what the heat capacity should look like. The number of states in that range is $\sim g(E_f)\, kT$. Relative to where they are at $T = 0$, each of these contributes an excess energy of order $kT$. Therefore the total energy is
\[
E \approx E|_{T = 0} + g(E_f)(kT)^2.
\]
So we expect the heat capacity to be
\[
C_V \sim k^2 g(E_f) T \sim kN \frac{T}{T_f}.
\]
This rather crude argument suggests that the heat capacity is linear in temperature. We shall now show that this is indeed correct by direct calculation. It is the same kind of computation as we did for bosons, except it is slightly more complicated. We have
\[
\frac{N}{V} = \frac{g_s}{4\pi^2}\left(\frac{2m}{\hbar^2}\right)^{3/2}\int_0^\infty dE\, \frac{E^{1/2}}{z^{-1} e^{\beta E} + 1}.
\]
Similarly, we find
\[
\frac{E}{V} = \frac{g_s}{4\pi^2}\left(\frac{2m}{\hbar^2}\right)^{3/2}\int_0^\infty dE\, \frac{E^{3/2}}{z^{-1} e^{\beta E} + 1}.
\]
We again do a substitution $x = \beta E$, and then we find
\[
\frac{N}{V} = \frac{g_s}{\lambda^3} f_{3/2}(z), \qquad \frac{E}{V} = \frac{3}{2}\frac{g_s}{\lambda^3}\, kT\, f_{5/2}(z),
\]
where we similarly have
\[
f_n(z) = \frac{1}{\Gamma(n)}\int_0^\infty dx\, \frac{x^{n-1}}{z^{-1} e^x + 1}.
\]
Recall that we have
\[
\Gamma\left(\tfrac{3}{2}\right) = \frac{\sqrt{\pi}}{2}, \qquad \Gamma\left(\tfrac{5}{2}\right) = \frac{3\sqrt{\pi}}{4}.
\]
We know that $z \gg 1$, and we want to expand $f_n$ for large $z$. This is called the Sommerfeld expansion. We have
\[
\Gamma(n) f_n(z) = \int_0^{\beta\mu} dx\, \frac{x^{n-1}}{z^{-1} e^x + 1} + \underbrace{\int_{\beta\mu}^\infty dx\, \frac{x^{n-1}}{z^{-1} e^x + 1}}_{(2)}.
\]
We can rewrite the first term as
\[
\int_0^{\beta\mu} dx\, x^{n-1}\left(1 - \frac{1}{1 + z e^{-x}}\right) = \frac{(\log z)^n}{n} - \underbrace{\int_0^{\beta\mu} dx\, \frac{x^{n-1}}{1 + z e^{-x}}}_{(1)},
\]
using the fact that $z = e^{\beta\mu}$.
We now see that the two remaining integrals (1) and (2) look very similar. We are going to make a change of variables to make them look more-or-less the same. We let $\eta_1 = \beta\mu - x$ in (1), and $\eta_2 = x - \beta\mu$ in (2). Then we find that
\[
\Gamma(n) f_n(z) = \frac{(\log z)^n}{n} - \int_0^{\beta\mu} d\eta_1\, \frac{(\beta\mu - \eta_1)^{n-1}}{1 + e^{\eta_1}} + \int_0^\infty d\eta_2\, \frac{(\beta\mu + \eta_2)^{n-1}}{e^{\eta_2} + 1}.
\]
Since $\beta\mu \gg 1$, we may approximate the first integral in this expression by $\int_0^\infty$. The error is $\sim e^{-\beta\mu} = \frac{1}{z} \ll 1$, so it is fine.
Calling $\eta_1 = \eta_2 = \eta$, we get
\[
\Gamma(n) f_n(z) = \frac{(\log z)^n}{n} + \int_0^\infty d\eta\, \frac{(\beta\mu + \eta)^{n-1} - (\beta\mu - \eta)^{n-1}}{1 + e^\eta}.
\]
Let's now look at the numerator in this expression. We have
\begin{align*}
(\beta\mu + \eta)^{n-1} - (\beta\mu - \eta)^{n-1}
&= (\beta\mu)^{n-1}\left[\left(1 + \frac{\eta}{\beta\mu}\right)^{n-1} - \left(1 - \frac{\eta}{\beta\mu}\right)^{n-1}\right]\\
&= (\beta\mu)^{n-1}\left[\left(1 + \frac{(n-1)\eta}{\beta\mu} + \cdots\right) - \left(1 - \frac{(n-1)\eta}{\beta\mu} + \cdots\right)\right]\\
&= 2(n-1)(\beta\mu)^{n-2}\,\eta\left(1 + O\!\left(\frac{\eta}{\beta\mu}\right)\right).
\end{align*}
We just plug this into our integral, and we find
\[
\Gamma(n) f_n(z) = \frac{(\log z)^n}{n} + 2(n-1)(\log z)^{n-2}\int_0^\infty d\eta\, \frac{\eta}{e^\eta + 1} + \cdots.
\]
We have to be slightly careful. Our expansion was in $\frac{\eta}{\beta\mu}$, and while $\beta\mu$ is large, $\eta$ can get arbitrarily large in the integral. Fortunately, for large $\eta$, the integrand is exponentially suppressed.
We can compute
\[
\int_0^\infty d\eta\, \frac{\eta}{e^\eta + 1} = \int_0^\infty d\eta\, \frac{\eta e^{-\eta}}{1 + e^{-\eta}} = \int_0^\infty d\eta\, \eta\sum_{m=1}^\infty (-1)^{m+1} e^{-m\eta} = \sum_{m=1}^\infty \frac{(-1)^{m+1}}{m^2} = \frac{\pi^2}{12}.
\]
Note that it was valid to swap the sum and integral because the sum is absolutely convergent.
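For what it is worth, the value $\pi^2/12$ can also be checked numerically; a minimal sketch:

```python
import numpy as np
from scipy.integrate import quad

# Check int_0^infty eta/(e^eta + 1) d eta = pi^2/12 numerically.
value, err = quad(lambda eta: eta / (np.exp(eta) + 1.0), 0, np.inf)
print(value, np.pi**2 / 12)   # both approximately 0.8224670
```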
So we find that
\[
f_n(z) = \frac{(\log z)^n}{\Gamma(n+1)}\left(1 + \frac{\pi^2}{6}\frac{n(n-1)}{(\log z)^2} + \cdots\right).
\]
Hence we find that
\[
\frac{N}{V} = \frac{g_s}{6\pi^2\hbar^3}(2m\mu)^{3/2}\left(1 + \frac{\pi^2}{8}\left(\frac{kT}{\mu}\right)^2 + \cdots\right). \qquad (*)
\]
Recall that the reason we did this is that we didn't like $\mu$. We want to express everything in terms of $N$ instead. It is an exercise on the example sheet to invert this, and find
\[
\mu = E_f\left(1 - \frac{\pi^2}{12}\left(\frac{T}{T_f}\right)^2 + \cdots\right). \qquad (\dagger)
\]
We can plug this into the expression for $\frac{E}{V}$ and obtain
\[
\frac{E}{V} = \frac{g_s}{10\pi^2\hbar^3}(2m)^{3/2}\mu^{5/2}\left(1 + \frac{5\pi^2}{8}\left(\frac{kT}{\mu}\right)^2 + \cdots\right).
\]
Dividing by $(*)$ and using $(\dagger)$, we find
\[
\frac{E}{N} = \frac{3E_f}{5}\left(1 + \frac{5\pi^2}{12}\left(\frac{T}{T_f}\right)^2 + \cdots\right).
\]
We can then differentiate this with respect to $T$, and find the heat capacity to be
\[
C_V = \left(\frac{\partial E}{\partial T}\right)_{V,N} = \frac{\pi^2}{2} Nk\frac{T}{T_f} + \cdots,
\]
and this is valid for $T \ll T_f$.
Heat capacity of metals
In a metal, we have a lattice of positive charges, and electrons that are free to move throughout the lattice. We can model them as a Fermi ideal gas. As we mentioned previously, we have
\[
T_f \sim 10^4\ \mathrm{K}.
\]
So we usually have $T \ll T_f$. In particular, this means any classical treatment of the electrons would be totally rubbish! We have the lattice as well, and we have to think about the phonons. The Debye temperature is
\[
T_D \sim 10^2\ \mathrm{K}.
\]
Now if
\[
T_D \ll T \ll T_f,
\]
then, assuming an equal number of electrons and atoms, we have
\[
C_V \approx \underbrace{\frac{\pi^2}{2} Nk\frac{T}{T_f}}_{\text{electrons}} + 3Nk \approx 3Nk.
\]
This agrees with the Dulong–Petit result.
But if $T \ll T_D, T_f$, then we need to use the low temperature formula for the phonons, and we get
\[
C_V \approx \frac{\pi^2}{2} Nk\frac{T}{T_f} + \frac{12\pi^4}{5} Nk\left(\frac{T}{T_D}\right)^3.
\]
In this case, we cannot just throw either term away. We can ask ourselves when the contributions of the two terms are comparable. This is not hard to figure out. We just have to equate them and solve for $T$, and we find that this happens at
\[
T^2 = \frac{5 T_D^3}{24\pi^2 T_f}.
\]
If we plug in actual values, we find that this is just a few Kelvins.
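As a rough check of that claim, here is a minimal numerical estimate using the round values of $T_D$ and $T_f$ quoted above (both assumed):

```python
import numpy as np

# Crossover temperature where the electronic and phonon heat capacities match,
# T^2 = 5 T_D^3 / (24 pi^2 T_f), for round values of T_D and T_f.
T_D, T_f = 1e2, 1e4   # K (assumed round values from the text)
T_star = np.sqrt(5 * T_D**3 / (24 * np.pi**2 * T_f))
print(T_star)          # of order 1 K; with T_D a few hundred K this becomes a few K
```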
While this agrees with experiments very well, there are some flaws. Most notably, we neglected interactions with the lattice. We need to correct for a periodic potential, and this is not too difficult. This is addressed in the AQM course.
Even worse, we are ignoring the interaction between electrons! It is a puzzle
why our model works despite ignoring these. This is explained by the Landau–
Fermi liquid theory.
White dwarf stars
Another important application of this is to astrophysics. A star is a big ball
of hot gas that is not collapsing under gravity. Why does it not collapse under
gravity?
In our sun, there are nuclear reactions happening in the core of the star,
and these produce pressure that resists gravitational collapse. But this process
eventually ends. As the star runs out of fuel, the star will shrink. As this happens, the density of the star increases, and thus the Fermi energy $E_f$ for the electrons increases. Eventually this becomes so big that it exceeds the energy required to ionize the atoms in the star. This happens at around $T_f \sim 10^7\ \mathrm{K}$.
At this stage, then, the electrons are no longer bound to the atoms, and all
electrons can move through the star. We can therefore attempt to model these
electrons as a low temperature Fermi gas (because $T_f$ is now very high).

One important feature of Fermi gases is that they have pressure even at zero temperature. In a white dwarf star, the degeneracy pressure of this gas supports the star against gravitational collapse, even if the star cools down to $T = 0$! A particularly important example of a star we think will end this way is our own Sun.
Let’s try and understand some properties of these stars. We construct a very
crude model of a white dwarf star by treating it as a ball of constant density. In
addition to the kinds of energies we’ve talked about, there is also gravitational
energy. This is given by
\[
E_{\mathrm{grav}} = -\frac{3GM^2}{5R}.
\]
This is just a statement about any star of constant density. To see this, we ask ourselves what energy it takes to destroy the star by moving shells of mass $\delta M$ off to infinity. The work done is just
\[
\delta W = -PE = \frac{GM}{R}\,\delta M.
\]
So we have
\[
\frac{dW}{dM} = \frac{GM}{R(M)} = \frac{GM}{\left(M/(\frac{4}{3}\pi\rho)\right)^{1/3}},
\]
where $\rho$ is the fixed density. Solving this, and using the expression of $R$ in terms of $M$, gives
\[
E_{\mathrm{grav}} = -W = -\frac{3GM^2}{5R}.
\]
Let's see what stuff we have got in our star. We have $N$ electrons, hence $N$ protons. This gives $\alpha N$ neutrons for some $\alpha \sim 1$. In the third example sheet, we argue that we can ignore the effect of the protons and neutrons, for reasons related to their mass.
Thus, we are left with electrons, and the total energy is
\[
E_{\mathrm{total}} = E_{\mathrm{grav}} + E_{\mathrm{kinetic}}.
\]
We can then ask ourselves what radius minimizes the energy. We find that
\[
R \sim M^{-1/3} \sim N^{-1/3}.
\]
In other words, the more massive the star is, the smaller it is! This is pretty
unusual, because we usually expect more massive things to be bigger.
Now we can ask ourselves how massive the star can be. As the mass goes up, the radius decreases, and the density goes up. So $E_f$ goes up. For more and more massive stars, the Fermi energy is larger and larger. Eventually, this $E_f$ becomes comparable to the rest mass energy of the electron, and relativity becomes important.
Consider the regime $E_f \gg mc^2$. Then we have $E \gg mc^2$ for the relevant states, and we can expand
\[
g(E) \approx \frac{V}{\pi^2\hbar^3 c^3}\left(E^2 - \frac{m^2 c^4}{2} + \cdots\right),
\]
where we plugged in $g_s = 2$. We can work out the number of electrons to be
\[
N \approx \int_0^{E_f} dE\, g(E) = \frac{V}{\pi^2\hbar^3 c^3}\left(\frac{1}{3}E_f^3 - \frac{m^2 c^4}{2}E_f + \cdots\right).
\]
Here we are assuming that we are at "low" energies, because the Fermi temperature is very high. Recall this is the equation that gives us what the Fermi energy is. We can invert this to get
\[
E_f = \hbar c\left(\frac{3\pi^2 N}{V}\right)^{1/3} + O\!\left(\left(\frac{N}{V}\right)^{-1/3}\right).
\]
Similarly, we find
\begin{align*}
E_{\mathrm{kinetic}} &\approx \int_0^{E_f} dE\, E\, g(E) = \frac{V}{\pi^2\hbar^3 c^3}\left(\frac{1}{4}E_f^4 - \frac{m^2 c^4}{4}E_f^2 + \cdots\right)\\
&= \frac{\hbar c}{4\pi^2}\frac{(3\pi^2 N)^{4/3}}{V^{1/3}} + O\!\left(V\left(\frac{N}{V}\right)^{2/3}\right).
\end{align*}
We now note that $V = \frac{4}{3}\pi R^3$, and $m \ll m_p \approx m_n$, where $m, m_p, m_n$ are the masses of an electron, proton and neutron respectively. Then we find that
\[
E_{\mathrm{total}} = \left(\frac{3\hbar c}{4}\left(\frac{9\pi M^4}{4(\alpha + 1)^4 m_p^4}\right)^{1/3} - \frac{3GM^2}{5}\right)\frac{1}{R} + O(R).
\]
This is the total energy once the electrons are very energetic. We can plot what this looks like as a function of $R$. If the coefficient of $\frac{1}{R}$ is positive, then there is a stable minimum at some finite $R$, and this looks good. However, if the coefficient is negative, then we are in trouble: the energy decreases without bound as $R \to 0$, and the star collapses to small $R$. In other words, the degeneracy pressure of the electrons is not enough to withstand the gravitational collapse.
For stability, we need the coefficient of $\frac{1}{R}$ to be positive. This gives us a maximum possible mass
\[
M_c \sim \left(\frac{\hbar c}{G}\right)^{3/2}\frac{1}{m_p^2}.
\]
This is called the Chandrasekhar limit.
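As a rough sketch (ignoring all numerical prefactors and the factor involving $\alpha$, so only the order of magnitude is meaningful), we can put numbers into this estimate:

```python
# Order-of-magnitude estimate of M_c ~ (hbar c / G)^(3/2) / m_p^2.
hbar, c, G, m_p = 1.055e-34, 2.998e8, 6.674e-11, 1.673e-27   # SI units
M_sun = 1.989e30                                             # kg

M_c = (hbar * c / G)**1.5 / m_p**2
print(M_c / M_sun)   # about 2 solar masses, the right ballpark
```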
Should we believe this? The above calculation neglects the pressure. We
assumed the density is constant, but if this were true, the pressure is also
constant, but this is nonsense, because a constant pressure means there is no
force!
A more careful treatment shows that
\[
M_c \approx 1.4\, M_{\mathrm{sun}}.
\]
What if we are bigger than this? In the case of a very very very dense star, we get a neutron star, where it is the neutrons that support the star. In this case, we have protons combining with electrons to get neutrons. Treating this properly would involve actually doing some general relativity, which we will not do.
This neutron star also has a maximum mass, which is now
\[
M_{\mathrm{max}} \approx 25\, M_{\mathrm{sun}}.
\]
If we are bigger than this, then we have to either explode and lose a lot of mass,
or become a black hole.
3.8 Pauli paramagnetism
What happens when we put a gas of electrons (e.g. electrons in a metal) in a magnetic field? There are two effects: the Lorentz force, which is $\sim \mathbf{v}\times\mathbf{B}$, and the coupling of the spin of the electron to $\mathbf{B}$. We will examine the second effect first.

For simplicity, we assume there is a constant $\mathbf{B}$ field in the $z$-direction, and the electron can either be aligned or anti-aligned. The energy given by the spin is
\[
E_{\mathrm{spin}} = \mu_B B s, \qquad s = \begin{cases} +1 & \text{spin up}\\ -1 & \text{spin down}\end{cases}
\]
where
\[
\mu_B = \frac{|e|\hbar}{2mc}
\]
is the Bohr magneton. We see that being aligned gives negative energy, because the electron has negative charge.
Due to the asymmetry, the spin up and spin down states have different occupation numbers. They are no longer degenerate. Again, we have
\[
\frac{N_\uparrow}{V} = \frac{1}{4\pi^2}\left(\frac{2m}{\hbar^2}\right)^{3/2}\int_0^\infty dE\, \frac{E^{1/2}}{e^{\beta(E + \mu_B B - \mu)} + 1} = \frac{1}{\lambda^3} f_{3/2}(z e^{-\beta\mu_B B}).
\]
Similarly, for the down spin, we have
\[
\frac{N_\downarrow}{V} = \frac{1}{\lambda^3} f_{3/2}(z e^{+\beta\mu_B B}).
\]
Compared to what we have been doing before, we have an extra external parameter $B$, which the experimenter can vary. Now in the microcanonical ensemble, we can write
\[
E = E(S, V, N, B).
\]
As before, we can introduce a new quantity.

Definition (Magnetization). The magnetization is
\[
M = -\left(\frac{\partial E}{\partial B}\right)_{S,V,N}.
\]
Using this definition, we can write out a more general form of the first law:
\[
dE = T\, dS - p\, dV + \mu\, dN - M\, dB.
\]
In the canonical ensemble, we worked with the free energy
\[
F = E - TS.
\]
Then we have
\[
dF = -S\, dT - p\, dV + \mu\, dN - M\, dB.
\]
Thus, we find that
\[
M = -\left(\frac{\partial F}{\partial B}\right)_{T,V,N}.
\]
The grand canonical potential is
\[
\Phi = E - TS - \mu N.
\]
So we find
\[
d\Phi = -S\, dT - p\, dV - N\, d\mu - M\, dB.
\]
So we see that in the grand canonical ensemble, we can write
\[
M = -\left(\frac{\partial \Phi}{\partial B}\right)_{T,V,\mu}.
\]
This is what we want, because we are working in the grand canonical ensemble.

How do we compute the magnetization? Recall that we had an expression for $\Phi$ in terms of $\mathcal{Z}$. This gives us
\[
M = kT\left(\frac{\partial \log \mathcal{Z}}{\partial B}\right)_{T,V,\mu}.
\]
So we just have to compute the partition function. We have
\[
\log \mathcal{Z} = \int_0^\infty dE\, g(E)\left[\log\left(1 + z e^{-\beta(E + \mu_B B)}\right) + \log\left(1 + z e^{-\beta(E - \mu_B B)}\right)\right].
\]
To compute the magnetization, we simply have to differentiate this with respect to $B$, and we find
\[
M = \mu_B(N_\downarrow - N_\uparrow) = \frac{\mu_B V}{\lambda^3}\left(f_{3/2}(z e^{\beta\mu_B B}) - f_{3/2}(z e^{-\beta\mu_B B})\right).
\]
So we see that the magnetization counts how many spins are pointing up or down.
As before, this function $f_{3/2}$ is not very friendly. So let's try to approximate it in different limits.

First suppose $z e^{\pm\beta\mu_B B} \ll 1$. Then for small argument, this is essentially what we had in the high temperature limit, and if we do that calculation, we find
\[
f_{3/2}(z e^{\pm\beta\mu_B B}) \approx z e^{\pm\beta\mu_B B}.
\]
Plugging this back in, we find
\[
M \approx \frac{2\mu_B V}{\lambda^3}\, z \sinh(\beta\mu_B B).
\]
This expression still involves $z$, and $z$ is something we don't want to work with. We want to get rid of it and work with the particle number instead. We have
\[
N = N_\uparrow + N_\downarrow \approx \frac{2Vz}{\lambda^3}\cosh(\beta\mu_B B).
\]
Now we can ask what our assumption $z e^{\pm\beta\mu_B B} \ll 1$ means. It implies $z\cosh(\beta\mu_B B)$ is small, and so this means
\[
\frac{\lambda^3 N}{V} \ll 1.
\]
It also requires high temperature, or weak field, as otherwise the quantity would be big for one choice of the sign. In this case, we find
\[
M \approx \mu_B N \tanh(\beta\mu_B B).
\]
There is one useful quantity we can define from $M$, namely the magnetic susceptibility.

Definition (Magnetic susceptibility). The magnetic susceptibility is
\[
\chi = \left(\frac{\partial M}{\partial B}\right)_{T,V,N}.
\]
This tells us how easy it is to magnetize a substance.

Let's evaluate this in the limit $B = 0$. It is just
\[
\chi|_{B=0} = \frac{N\mu_B^2}{kT}.
\]
The fact that $\chi$ is proportional to $\frac{1}{T}$ is called Curie's law.
Let's now look at the opposite limit. Suppose $z e^{\pm\beta\mu_B B} \gg 1$. Recall that for large $z$, we had the result
\[
f_n(z) \approx \frac{(\log z)^n}{\Gamma(n+1)}.
\]
Then this gives
\[
f_{3/2}(z e^{\pm\beta\mu_B B}) \approx \frac{[\beta(\mu \pm \mu_B B)]^{3/2}}{\Gamma(5/2)} = \frac{(\beta\mu)^{3/2}}{\Gamma(5/2)}\left(1 \pm \frac{3\mu_B B}{2\mu} + \cdots\right).
\]
To get $N$, we need to sum these up, and we find
\[
N = N_\uparrow + N_\downarrow \approx \frac{2V}{\lambda^3}\frac{(\beta\mu)^{3/2}}{\Gamma(5/2)}.
\]
So we find that $\mu \approx E_f$. Taking the $\log$ of our assumption $z e^{\pm\beta\mu_B B} \gg 1$, we know
\[
\beta(\mu \pm \mu_B B) \gg 1,
\]
which is equivalent to
\[
T_f \pm \frac{\mu_B B}{k} \gg T.
\]
Since this has to be true for both choices of sign, we must have
\[
T_f - \frac{\mu_B |B|}{k} \gg T.
\]
In particular, we need $T \ll T_f$. So this is indeed a low temperature expansion. We also need
\[
\mu_B |B| < k T_f = E_f.
\]
So the magnetic field cannot be too strong. It is usually the case that $\mu_B|B| \ll E_f$.
In this case, we find that
\[
M \approx \frac{\mu_B V}{\lambda^3}\frac{(\beta\mu)^{3/2}}{\Gamma(5/2)}\frac{3\mu_B B}{\mu} \approx \mu_B^2\frac{V}{2\pi^2}\left(\frac{2m}{\hbar^2}\right)^{3/2}E_f^{1/2}\, B = \mu_B^2\, g(E_f)\, B.
\]
This is the magnetization, and from this we get the susceptibility
\[
\chi \approx \mu_B^2\, g(E_f).
\]
This is the low temperature result. We see that $\chi$ approaches a constant, and doesn't obey Curie's law.
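To see how different the two regimes are numerically, here is a minimal comparison of the high temperature (Curie) and low temperature (Pauli) susceptibilities per electron; it uses the standard identity $g(E_f) = 3N/(2E_f)$ for a $g(E) \propto E^{1/2}$ density of states, and the Fermi temperature below is an assumed typical metallic value:

```python
mu_B = 9.274e-24   # J/T
k    = 1.381e-23   # J/K
T_f  = 8e4         # K, assumed typical metallic Fermi temperature

# Low temperature (Pauli): chi ~ mu_B^2 g(E_f) = 3 mu_B^2 / (2 k T_f) per electron.
chi_pauli = 1.5 * mu_B**2 / (k * T_f)

# High temperature (Curie): chi = mu_B^2 / (k T) per electron.
for T in (300.0, 3000.0, 3e4):
    chi_curie = mu_B**2 / (k * T)
    print(T, chi_curie / chi_pauli)   # Curie's law would overestimate chi by ~ T_f/T
```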
There is a fairly nice heuristic explanation for why this happens. Suppose we
start from a zero
B
field, and we switch it on. Now the spin down electrons have
lower energy than the spin up electrons. So the spin up electrons will want to
become spin down, but they can’t all do that, because of Fermi statistics. It is
only those at the Fermi surface who can do so. Hence the magnetic susceptibility
is proportional to how many things there are on the Fermi surface.
Does this match what we see? The first thing we see is that
χ
is always
non-negative in our expression. Substances with
χ >
0 are called paramagnetic.
These are not permanently magnetic, but whenever we turn on a magnetic
field, they become magnetized. They are weakly attracted by a magnetic field.
Examples of such substances include aluminium.
We can also have paramagnetism coming from other sources, e.g. from the
metal ions. Most paramagnetic substances actually have complicated contribu-
tions from all sorts of things.
Landau diamagnetism*
There is another effect we haven’t discussed yet. The magnetic field induces a
Lorentz force, and this tends to make electrons go around in circles. We will not
go into full details, as this is non-examinable. The Hamiltonian is
\[
H = \frac{1}{2m}\left(\mathbf{p} + \frac{e}{c}\mathbf{A}(\mathbf{x})\right)^2,
\]
where $\mathbf{p}$ is the momentum, $e$ is the charge, and $\mathbf{A}$ is the magnetic vector potential,
\[
\mathbf{B} = \nabla\times\mathbf{A}.
\]
To understand the effect of the $\mathbf{B}$-field, we can look at what happens classically with this $\mathbf{B}$-field. We look at the 1-particle partition function
\[
Z_1 = \frac{1}{(2\pi\hbar)^3}\int d^3x\, d^3p\, e^{-\beta H(\mathbf{x},\mathbf{p})}.
\]
By making a change of variables in this integral,
\[
\mathbf{p}' = \mathbf{p} + \frac{e}{c}\mathbf{A}(\mathbf{x}),
\]
we can get rid of $\mathbf{A}$, and the partition function $Z_1$ is independent of $\mathbf{A}$! Classically, there is no magnetization.
However, when we go to a quantum treatment, this is no longer true. This leads to Landau levels, which are discussed more thoroughly in IID AQM. When we take these into account, and see what happens at $T \ll T_f$, then we find that
\[
M \approx -\frac{\mu_B^2}{3}\, g(E_f)\, B.
\]
This gives a negative susceptibility, an effect opposite to that coming from the spins. But since we have the factor of $\frac{1}{3}$, the net effect is still positive, and we still have $M_{\mathrm{total}}, \chi_{\mathrm{total}} > 0$.

However, there do exist substances that have negative $\chi$. They are repelled by magnetic fields. These are called diamagnetic. Bismuth is an example of a substance with such effects.
4 Classical thermodynamics
In this chapter, we are going to forget about statistical physics, and return to
the early 19th century, when we didn’t know that things are made up of atoms.
Instead, people just studied macroscopic objects and looked at macroscopic
properties only.
Why are we doing this? Now that we have statistical physics, we can start
from microscopic properties and then build up to the macroscopic properties.
The problem is that we can’t always do so. So far, we have been working with
rather simple systems, and managed to derive their macroscopic properties from
the microscopic ones. However, in real life, a lot of systems have very complex
interactions, and we cannot expect to work with them sensibly. However, if we develop the theory only on the macroscopic scale, then we can be sure that it applies to all systems whatsoever.
4.1 Zeroth and first law
We begin by defining some words.
Definition (Wall). A wall is a rigid boundary that matter cannot cross.
Definition
(Adiabatic wall)
.
Adiabatic walls isolate the system completely from
external influences, i.e. the system is insulated.
Definition
(Diathermal wall)
.
A non-adiabatic wall is called diathermal. Sys-
tems separated by a diathermal wall are said to be in thermal contact.
Definition
(Equilibrium)
.
An isolated system with a time-independent state is
said to be in equilibrium.
Two systems are said to be in equilibrium if when they are put in thermal
contact, then the whole system is in equilibrium.
We will assume that a system in equilibrium can be completely specified by
a few macroscopic variables. For our purposes, we assume our system is a gas,
and we will take these variables to be the pressure $p$ and volume $V$. Historically,
these systems are of great interest, because people were trying to build good
steam engines.
There will be 4 laws of thermodynamics, which, historically, were discovered
experimentally. Since we are actually mathematicians, not physicists, we will
not perform, or even talk about such experiments. Instead, we will take these
laws as “axioms” of the subject, and try to derive consequences of them. Also,
back in the days, physicists were secretly programmers. So we start counting at
0.
Law
(Zeroth law of thermodynamics)
.
If systems
A
and
B
are individually in
equilibrium with C, then A and B are in equilibrium.
In other words, “equilibrium” is an equivalence relation (since reflexivity and
symmetry are immediate).
In rather concise terms, this allows us to define temperature.
Definition
(Temperature)
.
Temperature is an equivalence class of systems with
respect to the “equilibrium” relation.
More explicitly, the temperature of a system is a quantity (usually a number)
assigned to each system, such that two systems have the same temperature iff
they are in equilibrium. If we assume any system is uniquely specified by the
pressure and volume, then we can write the temperature of a system as
T
(
p, V
).
We have a rather large freedom in defining what the temperature of a system
is, as a number. If
T
is a valid temperature-assigning function, then so is
f
(
T
(
p, V
)) for any injective function
f
whatsoever. We can impose some further
constraints, e.g. require that
T
is a smooth function in
p
and
V
, but we are still
free to pick almost any function we like.
We will later see there is a rather natural temperature scale to adopt, which
is well defined up to a constant, i.e. a choice of units. But for now, we can just
work with an abstract “temperature” function T .
We can now move on in ascending order, and discuss the first law.
Law
(First law of thermodynamics)
.
The amount of work required to change
an isolated system from one state to another is independent of how the work is
done, and depends only on the initial and final states.
From this, we deduce that there is some function of state
E
(
p, V
) such that
the work done is just the change in E,
W = ∆E.
For example, we can pick some reference system (
p
0
, V
0
), and define
E
(
p, V
) to
be the work done required to get us from (p
0
, V
0
) to (p, V ).
What if the system is not isolated? Then in general $\Delta E \neq W$. We account for the difference by introducing a new quantity $Q$, defined by
\[
\Delta E = Q + W.
\]
It is important to keep in mind which directions these quantities refer to.
Q
is the
heat supplied to the system, and
W
is the work done on the system. Sometimes,
this relation
E
=
Q
+
W
is called the first law instead, but here we take it as
the definition of Q.
It is important to note that $E$ is a function of state: it depends only on $p$ and $V$. However, $Q$ and $W$ are not. They are descriptions of how a state
to say there is some amount of heat and some amount of work in the state.
For an infinitesimal change, we can write
dE = ¯dQ + ¯dW.
Here we write the
¯d
with a slash to emphasize it is not an exact differential in
any sense, because Q and W aren’t “genuine variables”.
Most of the time, we are interested in studying how objects change. We will
assign different labels to different possible changes.
Definition
(Quasi-static change)
.
A change is quasi-static if it is done so slowly
that the system remains in equilibrium throughout the change.
Definition
(Reversible change)
.
A change is reversible if the time-reversal
process is possible.
For example, consider a box of gases with a frictionless piston. We can very
very slowly compress or expand the gas by moving the piston.
If we take pressure to mean the force per unit area on the piston, then the work done on the gas is given by
\[
\bar{d}W = -p\, dV.
\]
Now consider two different reversible paths from $A$ to $B$ in the $p$-$V$ plane.
The change in energy
E
is independent of the path, as it is a function of state.
On the other hand, the work done on the gas is
\[
\int \bar{d}W = -\int p\, dV.
\]
This is path-dependent.
Now if we instead go around a closed cycle from $A$ to $B$ and back, then we have $\Delta E = 0$. So we must have
\[
\oint \bar{d}Q = \oint p\, dV.
\]
In other words, the heat supplied to the gas is equal to the work done by the
gas. Thus, if we repeat this process many times, then we can convert heat to
work, or vice versa. However, there is some restriction on how much of this we
can perform, and this is given by the second law.
4.2 The second law
We want to talk about the second law, but we don’t know about entropy yet.
The “original” versions of the second law were much more primitive and direct.
There are two versions of the second law, which we will show are equivalent.
Law
(Kelvin’s second law)
.
There exists no process whose sole effect is to extract
heat from a heat reservoir and convert it into work.
Law
(Clausius’s second law)
.
There is no process whose sole effect is to transfer
heat from a colder body to a hotter body.
Both of these statements are motivated by experiments, but from the point
of view of our course, we are just treating them as axioms.
Note that the words “sole effect” in the statement of the second laws are
important. For example, a fridge transfers heat from a colder body to a hotter body. But of course, a fridge needs work input: we need to get electricity from somewhere.
Since they are given the same name, the two statements are actually equiv-
alent! This relies on some rather mild assumptions, namely that fridges and
heat engines exist. As mentioned, fridges take in work, and transfer heat from a
colder body to a hotter body. Heat engines transfer heat from a hotter body to a cooler body, and do work in the process. We will later construct some explicit
examples of such machines, but for now we will just take it that they exist.
Proposition. Clausius’s second law implies Kelvin’s second law.
Proof.
Suppose there were some process that violated Kelvin's second law. Let's use it to run a fridge: the "not Kelvin" machine extracts heat $Q_H$ from the hot reservoir and converts it entirely into work $W = Q_H$, and this work drives a fridge, which takes heat $Q_C$ from the cold reservoir and deposits $Q_C + Q_H$ into the hot reservoir. The sole net effect is to transfer heat $Q_C$ from the cold reservoir to the hot reservoir, and this violates Clausius's law.
Similarly, we can prove the other direction.
Proposition. Kelvin’s second law implies Clausius’s second law.
Proof. If we have a "not Clausius" machine, then we can combine it with a heat engine operating between the same reservoirs. The heat engine takes heat $Q_H$ from the hot reservoir, does work $W$, and dumps $Q_H - W$ into the cold reservoir; the "not Clausius" machine then transfers the heat $Q_H - W$ from the cold reservoir back to the hot reservoir. The net effect is to take heat $W$ from the hot reservoir and do work $W$, with no other change. This violates Kelvin's law.
4.3 Carnot cycles
We now construct our “canonical” examples of a heat engine, known as the
Carnot cycle. The important feature of this heat engine is that it is reversible,
so we can run it backwards, and turn it into a fridge.
The system consists of a box of gas that can be expanded or compressed.
The Carnot cycle will involve compressing and expanding this box of gas at
different environments in a clever way in order to extract work out of it. Before
we explain how it works, we can plot the "trajectory" of the system on the $p$-$V$ plane: the cycle runs through the points $A \to B \to C \to D \to A$, with $AB$ lying on the hot isotherm $T = T_H$ and $CD$ on the cold isotherm $T = T_C$.
We start at point
A
, and perform isothermal expansion to get to
B
. In
other words, we slowly expand the gas while in thermal contact with the
heat reservoir at hot temperature
T
H
. As it expands, the gas is doing work,
which we can use to run a TV. Since it remains in constant temperature,
it absorbs some heat Q
H
.
When we reach
B
, we perform adiabatic expansion to get to
C
. To do this,
we isolate the system and allow it to expand slowly. This does work, and
no heat is absorbed.
At
C
, we perform isothermal compression to get to
D
. We put the system
in thermal contact with the cold reservoir at the cold temperature
T
C
.
Then we compress the gas slowly. This time, we are doing work on the gas,
and as we compress it, it gives out heat Q
C
to the cold reservoir.
From $D$ to $A$, we perform adiabatic compression. We compress the gas slowly in an isolated way. We are doing work on the system, and since the process is adiabatic, no heat is given out.
Note that we are not just running in a cycle here. We are also transferring heat
from the hot reservoir to the cold reservoir.
Since these combined things form a cycle, we saw that we can write
\[
\oint \bar{d}W = -\oint \bar{d}Q.
\]
The total amount of work done is
\[
W = Q_H - Q_C.
\]
If we are building a steam engine, we want to find the most efficient way of
doing this. Given a fixed amount of
Q
H
, we want to minimize the heat
Q
C
we
return to the cold reservoir.
Definition (Efficiency). The efficiency of a heat engine is
\[
\eta = \frac{W}{Q_H} = 1 - \frac{Q_C}{Q_H}.
\]
Ideally, we would want to build a system where
η
= 1. This is, of course,
impossible, by Kelvin’s second law. So we can ask what is the best we can
do? The answer is given by Carnot’s theorem.
Theorem.
Of all engines operating between heat reservoirs, reversible engines
are the most efficient. In particular, all reversible engines have the same efficiency, which is just a function of the temperatures of the two reservoirs.
Proof. Consider any other engine, Ivor. We use its work output to drive Carnot (an arbitrary reversible engine) backwards, as a fridge: Ivor takes heat $Q_H'$ from the hot reservoir, does work $W$, and dumps $Q_C'$ into the cold reservoir; the reversed Carnot engine uses this work to take heat $Q_C$ from the cold reservoir and deposit $Q_H$ into the hot reservoir.

Now we have heat $Q_H' - Q_H$ extracted from the hot reservoir, and $Q_C' - Q_C$ deposited in the cold, and we have
\[
Q_H' - Q_H = Q_C' - Q_C.
\]
Then Clausius's law says we must have $Q_H' - Q_H \geq 0$. Then the efficiency of Ivor is
\[
\eta_{\mathrm{Ivor}} = \frac{Q_H' - Q_C'}{Q_H'} = \frac{Q_H - Q_C}{Q_H'} \leq \frac{Q_H - Q_C}{Q_H} = \eta_{\mathrm{Carnot}}.
\]
Now if Ivor is also reversible, then we can swap the argument around, and deduce that $\eta_{\mathrm{Carnot}} \leq \eta_{\mathrm{Ivor}}$. Combining the two inequalities, they must be equal.
In this case, we know $\eta$ is just a function of $T_H$ and $T_C$. We call this $\eta(T_H, T_C)$. We can use this to define $T$.
To do so, we need three heat reservoirs at temperatures $T_1 > T_2 > T_3$, and we run two Carnot engines: one between $T_1$ and $T_2$, taking in heat $Q_1$, doing work $W$ and rejecting heat $Q_2$ into the second reservoir, and one between $T_2$ and $T_3$, taking in that heat $Q_2$, doing work $W'$ and rejecting heat $Q_3$.
Then we have
\[
Q_2 = Q_1(1 - \eta(T_1, T_2)),
\]
and thus
\[
Q_3 = Q_2(1 - \eta(T_2, T_3)) = Q_1(1 - \eta(T_1, T_2))(1 - \eta(T_2, T_3)).
\]
But since this composite Carnot engine is also reversible, we must have
\[
Q_3 = Q_1(1 - \eta(T_1, T_3)).
\]
So we deduce that
\[
1 - \eta(T_1, T_3) = (1 - \eta(T_1, T_2))(1 - \eta(T_2, T_3)). \qquad (*)
\]
We now fix a reference temperature $T_*$. We define
\[
f(T) = 1 - \eta(T, T_*), \qquad g(T) = 1 - \eta(T_*, T).
\]
So we have
\[
1 - \eta(T, T') = f(T)\, g(T').
\]
Plugging this into $(*)$, we find
\[
f(T_1)\, g(T_3) = f(T_1)\, g(T_2)\, f(T_2)\, g(T_3).
\]
So we find that
\[
g(T_2)\, f(T_2) = 1
\]
for any $T_2$. Thus, we can write
\[
1 - \eta(T, T') = f(T)\, g(T') = \frac{f(T)}{f(T')}.
\]
The idea is to use this to define temperature. We now define $T$ such that
\[
f(T) \propto \frac{1}{T}.
\]
In other words, such that a reversible engine operating between $T_H$ and $T_C$ has
\[
\eta = 1 - \frac{T_C}{T_H}.
\]
Of course, this is defined only up to a constant. We can fix this constant by saying the "triple point" of water has
\[
T = 273.16\ \mathrm{K}.
\]
The triple point is a very precisely defined temperature, where water, ice and
water vapour are in equilibrium. We will talk about this later when we discuss
phase transitions.
This is rather remarkable. We started from a rather wordy and vague
statement of the second law, and we came up with a rather canonical way of
defining temperature.
4.4 Entropy
We’ve been talking about the second law in the previous chapter all along, but
we didn’t mention entropy! Where does it come from?
Recall that in the Carnot cycle, we had
\[
\eta = 1 - \frac{Q_C}{Q_H} = 1 - \frac{T_C}{T_H}.
\]
So we get
\[
\frac{Q_H}{T_H} = \frac{Q_C}{T_C}.
\]
But $Q_H$ is the heat absorbed, and $Q_C$ is the heat emitted. This is rather asymmetric. So we set
\[
Q_1 = Q_H, \quad T_1 = T_H, \qquad Q_2 = -Q_C, \quad T_2 = T_C.
\]
Then both $Q_i$ are heat absorbed, and then we get
\[
\sum_{i=1}^2 \frac{Q_i}{T_i} = 0.
\]
We now consider a more complicated engine: we take the Carnot cycle $ABCD$ and cut off its corner at $B$ by a smaller Carnot cycle $EBGF$, with $E$ on the isotherm $AB$ and $FG$ lying on an intermediate isotherm at temperature $T_M$, so that the new cycle is $AEFGCD$. We want to consider $AEFGCD$ as a new engine. We saw that the Carnot cycle $ABCD$ had
\[
\frac{Q_{AB}}{T_H} + \frac{Q_{CD}}{T_C} = 0. \qquad (1)
\]
Similarly, the Carnot cycle $EBGF$ gives
\[
\frac{Q_{EB}}{T_H} + \frac{Q_{GF}}{T_M} = 0. \qquad (2)
\]
But we have
\[
Q_{AB} = Q_{AE} + Q_{EB}, \qquad Q_{GF} = -Q_{FG}.
\]
So we can subtract $(1) - (2)$, and get
\[
\frac{Q_{AE}}{T_H} + \frac{Q_{FG}}{T_M} + \frac{Q_{CD}}{T_C} = 0.
\]
But this is exactly the sum of the $\frac{Q}{T}$ for the cycle $AEFGCD$. So we again find
\[
\sum_{i=1}^3 \frac{Q_i}{T_i} = 0.
\]
If we keep doing these corner cuttings, then we can use them to approximate any simple closed curve in the plane. So we learn that
\[
\oint \frac{\bar{d}Q}{T} = 0
\]
for any reversible cycle. This allows us to define a new function of state, called the entropy.

Definition (Entropy). The entropy of a system at $A = (p, V)$ is given by
\[
S(A) = \int_0^A \frac{\bar{d}Q}{T},
\]
where $0$ is some fixed reference state, and the integral is evaluated along any reversible path.
We next want to see that this entropy we’ve defined actually has the properties
of entropy we’ve found previously.
It is immediate from this definition that we have
\[
T\, dS = \bar{d}Q.
\]
We also see that for reversible changes, we have
\[
-p\, dV = \bar{d}W.
\]
So we know that
\[
dE = T\, dS - p\, dV,
\]
which is what we previously called the first law of thermodynamics.
Since this is a relation between functions of state, it must hold for any
infinitesimal change, whether reversible or not!
Since the same first law holds for the entropy we’ve previously defined, this
must be the same entropy as the one we’ve discussed previously, at least up to a
constant. This is quite remarkable.
Historically, this was how we first defined entropy. And then Boltzmann
came along and explained the entropy is the measure of how many states we’ve
got. No one believed him, and he committed suicide, but we now know he was
correct.
Let's compare an irreversible engine, Ivor, with the Carnot cycle operating between $T_H$ and $T_C$. We assume the Carnot cycle does the same work as Ivor, say $W$. Now we have
\[
W = Q_H' - Q_C' = Q_H - Q_C.
\]
By Carnot's theorem, since Carnot is more efficient, we must have
\[
Q_H' > Q_H.
\]
Note that we have
\begin{align*}
\frac{Q_H'}{T_H} - \frac{Q_C'}{T_C}
&= \frac{Q_H'}{T_H} - \frac{Q_H' - Q_H + Q_C}{T_C}\\
&= \frac{Q_H}{T_H} - \frac{Q_C}{T_C} + (Q_H' - Q_H)\left(\frac{1}{T_H} - \frac{1}{T_C}\right)\\
&= (Q_H' - Q_H)\left(\frac{1}{T_H} - \frac{1}{T_C}\right).
\end{align*}
The first factor is positive, while the second is negative. So we see that the sum of the $\frac{Q}{T}$ is negative.
The same method of "cutting off corners" as above shows that
\[
\oint \frac{\bar{d}Q}{T} \leq 0
\]
for any cycle. This is called the Clausius inequality.
Now consider two paths from $A$ to $B$, say $I$ and $II$. Suppose $I$ is irreversible and $II$ is reversible. Then we can run $II$ backwards, and the Clausius inequality for the combined cycle gives
\[
\int_I \frac{\bar{d}Q}{T} - \int_{II} \frac{\bar{d}Q}{T} = \oint \frac{\bar{d}Q}{T} \leq 0.
\]
So we know that
\[
\int_I \frac{\bar{d}Q}{T} \leq S(B) - S(A).
\]
Let's now assume that $I$ is adiabatic. In other words, the system is isolated and is not absorbing any heat, so $\bar{d}Q = 0$. So the left hand side of the inequality is 0. So we find that
\[
S(B) \geq S(A)
\]
for isolated systems! This is precisely how we previously stated the second law.
If the change is reversible, then we obtain equality. As an example of this, we look at the Carnot cycle. We can plot it in the $T$-$S$ plane, where the Carnot cycle is just a rectangle: $AB$ and $CD$ are horizontal segments of constant $T$, while $BC$ and $DA$ are vertical segments of constant $S$. We see that the periods of constant entropy are exactly those when the system is adiabatic.
We can look at some examples of non-reversible processes.
Example. Suppose we have a box, with gas on the left and vacuum on the right, separated by a partition. If we remove the partition, then the gas moves to fill the whole box. If we insulate this well enough, then the process is adiabatic, but it is clearly irreversible. This is not a quasi-static process.

In this scenario, we have $\bar{d}Q = 0 = \bar{d}W$. So we know that $dE = 0$, i.e. $E$ is constant. Except at low temperatures, we find that the temperature does not change. So we find that $E$ is a function of temperature. This is Joule's law. In the $T$-$S$ plane, the initial and final states are two points at the same temperature, with the final state at larger entropy. Note that it doesn't make sense to draw a line from the initial state to the final state, as the process is not quasi-static.
4.5 Thermodynamic potentials
So far, the state of the gas is specified by $p$ and $V$. These determine our other quantities $T, E, S$. But we could instead use other variables. For example, when drawing the $T$-$S$ plots, we were using $T$ and $S$.

We can choose any 2 of $p, V, E, T, S$ to specify the state. Recall that we had
\[
dE = T\, dS - p\, dV.
\]
So it is natural to regard $E = E(S, V)$, or $S = S(E, V)$. Taking derivatives of these expressions, we find
\[
\left(\frac{\partial E}{\partial S}\right)_V = T, \qquad \left(\frac{\partial E}{\partial V}\right)_S = -p.
\]
In statistical mechanics, we defined $T$ and $p$ this way, but here we derived it as a consequence. It is good that they agree.

We now further differentiate these objects. Note that since mixed partial derivatives commute, we know
\[
\frac{\partial^2 E}{\partial V\,\partial S} = \frac{\partial^2 E}{\partial S\,\partial V}.
\]
This tells us that
\[
\left(\frac{\partial T}{\partial V}\right)_S = -\left(\frac{\partial p}{\partial S}\right)_V.
\]
Equations derived this way are known as Maxwell relations.
There are 3 more such relations. We can, as before, define the Helmholtz free energy by
\[
F = E - TS,
\]
and then we have
\[
dF = -S\, dT - p\, dV.
\]
So we view $F = F(T, V)$.

What does this $F$ tell us? For a reversible change at constant temperature, we have
\[
F(B) - F(A) = -\int_A^B p\, dV.
\]
We now use the fact that the change is reversible to write this as $\int_A^B \bar{d}W$, which is the work done on the system. So the free energy measures the amount of energy available to do work at constant temperature.

Using this, we can express
\[
\left(\frac{\partial F}{\partial T}\right)_V = -S, \qquad \left(\frac{\partial F}{\partial V}\right)_T = -p.
\]
Again taking mixed partials, we find that
\[
\left(\frac{\partial S}{\partial V}\right)_T = \left(\frac{\partial p}{\partial T}\right)_V.
\]
We obtained $F$ from $E$ by subtracting away the conjugate pair $T, S$. We can look at what happens when we mess with another conjugate pair, $p$ and $V$.

We first motivate this a bit. Consider a system in equilibrium with a reservoir $R$ of fixed temperature $T$ and pressure $p$. The volume of the system and the reservoir $R$ can vary, but the total volume is fixed.

We can consider the total entropy
\[
S_{\mathrm{total}}(E_{\mathrm{total}}, V_{\mathrm{total}}) = S_R(E_{\mathrm{total}} - E, V_{\mathrm{total}} - V) + S(E, V).
\]
Again, since the energy of the system is small compared to the total energy of the reservoir, we can Taylor expand $S_R$, and have
\[
S_{\mathrm{total}}(E_{\mathrm{total}}, V_{\mathrm{total}}) = S_R(E_{\mathrm{total}}, V_{\mathrm{total}}) - \left(\frac{\partial S_R}{\partial E}\right)_V E - \left(\frac{\partial S_R}{\partial V}\right)_E V + S(E, V).
\]
Using the definition of $T$ and $p$, we obtain
\[
S_{\mathrm{total}}(E_{\mathrm{total}}, V_{\mathrm{total}}) = S_R(E_{\mathrm{total}}, V_{\mathrm{total}}) - \frac{E + pV - TS}{T}.
\]
The second law of thermodynamics says we want to maximize this expression. But the first term is constant, and we are assuming we are working at fixed temperature and pressure. So we should minimize
\[
G = F + pV = E + pV - TS.
\]
This is the Gibbs free energy. This is important in chemistry, since we usually do experiments in a test tube, which is open to the atmosphere. It is easy to find that
\[
dG = -S\, dT + V\, dp.
\]
So the natural variables for $G$ are $G = G(T, p)$. We again find
\[
S = -\left(\frac{\partial G}{\partial T}\right)_p, \qquad V = \left(\frac{\partial G}{\partial p}\right)_T.
\]
Using the symmetry of mixed partials again, we find
\[
-\left(\frac{\partial S}{\partial p}\right)_T = \left(\frac{\partial V}{\partial T}\right)_p.
\]
We now look at the final possibility, where we just add $pV$ and not subtract $TS$. This is called the enthalpy
\[
H = E + pV.
\]
Then we have
\[
dH = T\, dS + V\, dp.
\]
So we naturally view $H = H(S, p)$. We again have
\[
T = \left(\frac{\partial H}{\partial S}\right)_p, \qquad V = \left(\frac{\partial H}{\partial p}\right)_S.
\]
So we find
\[
\left(\frac{\partial T}{\partial p}\right)_S = \left(\frac{\partial V}{\partial S}\right)_p.
\]
For the benefit of mankind, we collect all the Maxwell relations in the following proposition.

Proposition.
\begin{align*}
\left(\frac{\partial T}{\partial V}\right)_S &= -\left(\frac{\partial p}{\partial S}\right)_V, &
\left(\frac{\partial S}{\partial V}\right)_T &= \left(\frac{\partial p}{\partial T}\right)_V,\\
-\left(\frac{\partial S}{\partial p}\right)_T &= \left(\frac{\partial V}{\partial T}\right)_p, &
\left(\frac{\partial T}{\partial p}\right)_S &= \left(\frac{\partial V}{\partial S}\right)_p.
\end{align*}
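As a small sanity check, one can verify the second of these relations symbolically for the monatomic ideal gas, taking the Helmholtz free energy derived in the classical gases chapter, $F = -NkT[\log(V/(N\lambda^3)) + 1]$, as the assumed input; this is only a sketch of the bookkeeping, not part of the derivation:

```python
import sympy as sp

# Verify (dS/dV)_T = (dp/dT)_V for the monatomic ideal gas.
T, V, N, k, m, hbar = sp.symbols('T V N k m hbar', positive=True)

lam = sp.sqrt(2*sp.pi*hbar**2 / (m*k*T))          # thermal de Broglie wavelength
F = -N*k*T*(sp.log(V / (N*lam**3)) + 1)           # Helmholtz free energy (assumed input)

S = -sp.diff(F, T)                                # S = -(dF/dT)_V
p = -sp.diff(F, V)                                # p = -(dF/dV)_T

print(sp.simplify(sp.diff(S, V) - sp.diff(p, T))) # prints 0
```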
Ideal gases
We now return to gases. We previously defined ideal gases as those whose atoms
don’t interact. But this is a microscopic definition. Before we knew atoms
existed, we couldn’t have made this definition. How can we define it using
classical thermodynamics?
Definition (Ideal gas). An ideal gas is a gas that satisfies Boyle's law, which says $pV$ is just a function of $T$, say $pV = f(T)$, and Joule's law, which says $E$ is a function of $T$.
These are true for real gases at low enough pressure.
Let's try to take this definition and attempt to recover the equation of state. We have
\[
dE = T\, dS - p\, dV.
\]
Since $E$ is just a function of $T$, we know that
\[
\left(\frac{\partial E}{\partial V}\right)_T = 0.
\]
In other words, plugging in the formula for $dE$, we find
\[
T\left(\frac{\partial S}{\partial V}\right)_T - p = 0.
\]
Using the Maxwell relations, we find
\[
T\left(\frac{\partial p}{\partial T}\right)_V - p = 0.
\]
Rearranging this tells us
\[
\left(\frac{\partial p}{\partial T}\right)_V = \frac{p}{T}.
\]
Using Boyle's law $pV = f(T)$, this tells us
\[
\frac{f'(T)}{V} = \frac{f(T)}{TV}.
\]
So we must have
\[
f(T) = CT
\]
for some $C$. So we have
\[
pV = CT.
\]
Since the left hand side is extensive, we must have $C \propto N$. If we don't know about atoms, then we can talk about the number of moles of gas rather than the number of atoms. So we have
\[
C = kN
\]
for some $k$. Then we obtain
\[
pV = NkT,
\]
as desired. In statistical physics, we found the same equation, where $k$ is the Boltzmann constant.
Carnot cycle for the ideal gas
Let’s now look at the Carnot cycle for ideal gases.
On $AB$, we have $dT = 0$. So $dE = 0$, and hence $\bar{d}Q = -\bar{d}W$. Then we have
\[
Q_H = \int_A^B \bar{d}Q = -\int_A^B \bar{d}W = \int_A^B p\, dV,
\]
where we used the fact that the change is reversible in the last equality.

Using the equation of state, we have
\[
Q_H = \int_A^B \frac{NkT}{V}\, dV = NkT_H \log\frac{V_B}{V_A}.
\]
Similarly, we have
\[
Q_C = NkT_C \log\frac{V_C}{V_D}.
\]
Along $BC$, since the system is isolated, we have $\bar{d}Q = 0$. So $dE = \bar{d}W = -p\, dV$. From Joule's law, we know $E$ is just a function of temperature. So we have
\[
E'(T)\, dT = -\frac{NkT}{V}\, dV.
\]
We now have a differential equation involving $T$ and $V$. Dividing by $NkT$, we obtain
\[
\frac{E'(T)}{NkT}\, dT = -\frac{dV}{V}.
\]
Integrating this from $B$ to $C$, we find
\[
\int_{T_H}^{T_C}\frac{E'(T)}{NkT}\, dT = -\int_B^C \frac{dV}{V}.
\]
So we find
\[
\log\frac{V_C}{V_B} = \int_{T_C}^{T_H}\frac{E'(T)}{NkT}\, dT.
\]
Note that the right hand side depends only on the temperatures of the hot and cold reservoirs. So, doing the same calculation for the adiabat $DA$, we have
\[
\log\frac{V_D}{V_A} = \log\frac{V_C}{V_B}.
\]
In other words, we have
\[
\frac{V_D}{V_A} = \frac{V_C}{V_B}.
\]
Alternatively, we have
\[
\frac{V_C}{V_D} = \frac{V_B}{V_A}.
\]
Finally, we calculate the efficiency. We have
\[
\eta = 1 - \frac{Q_C}{Q_H} = 1 - \frac{T_C}{T_H},
\]
as expected. Of course, this must have been the case.
We can again talk about heat capacities. We again define
\[
C_V = \left(\frac{\partial E}{\partial T}\right)_V = T\left(\frac{\partial S}{\partial T}\right)_V.
\]
We can also define
\[
C_p = T\left(\frac{\partial S}{\partial T}\right)_p.
\]
We can use the Maxwell relations to find that
\[
\left(\frac{\partial C_V}{\partial V}\right)_T = T\left(\frac{\partial^2 p}{\partial T^2}\right)_V, \qquad \left(\frac{\partial C_p}{\partial p}\right)_T = -T\left(\frac{\partial^2 V}{\partial T^2}\right)_p.
\]
More interestingly, we find that
\[
C_p - C_V = T\left(\frac{\partial V}{\partial T}\right)_p\left(\frac{\partial p}{\partial T}\right)_V.
\]
For an ideal gas, we have $pV = NkT$. Then we have
\[
\left(\frac{\partial V}{\partial T}\right)_p = \frac{Nk}{p}, \qquad \left(\frac{\partial p}{\partial T}\right)_V = \frac{Nk}{V}.
\]
Plugging this into the expression, we find
\[
C_p - C_V = \frac{T(Nk)^2}{pV} = Nk.
\]
So for any ideal gas, the difference between
C
p
and
C
V
is always the same. Note
that C
V
and C
p
themselves need not be constant.
4.6 Third law of thermodynamics
We end with a quick mention of the third law. The first three laws (starting from
0!) are “fundamental” in some sense, but the third one is rather different.
Law (Third law of thermodynamics). As $T \to 0$, we have
\[
\lim_{T\to 0} S = S_0,
\]
which is independent of other parameters (e.g. $V$, $B$, etc.). In particular, the limit is finite.

Recall that classically, $S$ is only defined up to a constant. So we can set $S_0 = 0$, and fix this constant.
What does this imply for the heat capacities? We have
\[
C_V = T\left(\frac{\partial S}{\partial T}\right)_V.
\]
Integrating this equation up, we find that
\[
S(T_2, V) - S(T_1, V) = \int_{T_1}^{T_2}\frac{C_V(T, V)}{T}\, dT.
\]
Let $T_1 \to 0$. Then the left hand side is finite. But on the right, as $T \to 0$, the $\frac{1}{T}$ factor diverges. For the integral to be finite, we must have $C_V \to 0$ as $T \to 0$.
Similarly, doing this with $C_p$, we find that
\[
S(T_2, p) - S(T_1, p) = \int_{T_1}^{T_2}\frac{C_p(T, p)}{T}\, dT.
\]
Saying the same words, we deduce that the third law says $C_p(T, p) \to 0$ as $T \to 0$.
But for an ideal gas, we know that
C
p
C
V
must be constant! So it cannot,
in particular, tend to 0. So if we believe in the third law, then the approximation
of an ideal gas must break down at low temperatures.
If we go back and check, we find that quantum gases do satisfy the third
law. The statistical mechanical interpretation of this is just that as
T
0, the
number of states goes to 1, as we are forced to the ground state, and so
S
= 0.
So this says the system has a unique ground state.
5 Phase transitions
In the remainder of the course, we are going to study phase transitions. This is
a discrete change in the properties of a system, characterized by discontinuities
in some of the variables describing the system. Macroscopically, these often
correspond to some rather drastic changes, such as freezing and boiling.
We have already met some examples of phase transitions before, such as
Bose–Einstein condensation. However, in this chapter, we are mainly going to
focus on two examples: the liquid-gas transition, and the Ising model. At first
sight, these two examples should be rather unrelated to each other, but it turns
out they behave in a very similar way. In fact, it seems like all phase transitions
look somewhat alike. After examining these systems in some detail, our goal is
to understand where this similarity came from.
5.1 Liquid-gas transition
We are now going to understand the liquid-gas transition. This is sometimes known as boiling and condensation. We will employ the van der Waals equation of state,
\[
p = \frac{kT}{v - b} - \frac{a}{v^2},
\]
where
\[
v = \frac{V}{N}
\]
is the volume per particle. This is only valid for low density gases, but let's now just ignore that, and assume we can use this for any density. This provides a toy model of the liquid-gas phase transition.
Let's look at isotherms of this system in the $p$-$v$ diagram. Note that in the equation of state, there is a minimum value of $v$, namely $v = b$. Depending on the value of $T$, the isotherms can look qualitatively different.
There is a critical temperature $T_C$ that separates the two possible behaviours. If $T > T_C$, then the isotherms are monotonic, whereas when $T < T_C$, there are two turning points. At $T = T_C$, the two extrema merge into a single point, which is a point of inflection. To find this critical point, we have to solve
\[
\left(\frac{\partial p}{\partial v}\right)_T = \left(\frac{\partial^2 p}{\partial v^2}\right)_T = 0.
\]
Using the equation of state, we can determine the critical temperature to be
\[
kT_c = \frac{8a}{27b}.
\]
We are mainly interested in what happens when $T < T_C$. We see that there exists a range of pressures where there are three states on the isotherm with the same $p, T$ but different $v$; call them $L$, $U$ and $G$ in order of increasing volume. At $U$, we have
\[
\left(\frac{\partial p}{\partial v}\right)_T > 0.
\]
Thus, when we increase the volume a bit, the pressure goes up, which makes it expand further. Similarly, if we compress it a bit, then $p$ decreases, and it collapses further. So this state is unstable.

How about $L$? Here we have $v$ close to $b$, so the atoms are closely packed. Also, the magnitude of $\left(\frac{\partial p}{\partial v}\right)_T$ is large, so this is very hard to compress. We call this a liquid.

There is no surprise in guessing what $G$ is. Here we have $v \gg b$, and $\left(\frac{\partial p}{\partial v}\right)_T$ is small. So we get a gas.

Note that once we've made this identification with liquids and gases, we see that above $T_C$, there is no distinction between liquids and gases!
Phase equilibriums
What actually happens when we try to go to this region? Suppose we fix a
pressure
p
and temperature
T
. Ruling out the possibility of living at
U
, we
deduce that we must either be at
L
or at
G
. But actually, there is a third
possibility. It could be that part of the system is at
L
and the other part is at
G
.
We suppose we have two separate systems, namely
L
and
G
. They have the
same
T
and
p
. To determine which of the above possibilities happen, we have to
consider the chemical potentials µ
L
and µ
G
. If we have
µ
L
= µ
G
,
then as we have previously seen, this implies the two systems can live in equilib-
rium. Thus, it follows that we can have any combination of
L
and
G
we like,
and we will see that the actual combination will be dictated by the volume v.
What if they are not equal? Say, suppose
µ
L
> µ
G
. Then essentially by
definition of
µ
, this says we would want to have as little liquid as possible. Of
course, since the number of liquid molecules has to be non-negative, this implies
we have no liquid molecules, i.e. we have a pure gas. Similarly, if
µ
G
> µ
L
, then
we have a pure liquid state.
Now let's try to figure out when these happen. Since we work at fixed $T$ and $p$, the right thermodynamic potential to use is the Gibbs free energy
\[
G = E + pV - TS.
\]
We have previously discussed this for a fixed amount of gas, but we now let it vary. Then
\[
dE = T\, dS - p\, dV + \mu\, dN,
\]
and so we have
\[
dG = -S\, dT + V\, dp + \mu\, dN.
\]
So we have $G = G(T, p, N)$, and we also know $G$ is extensive. As before, we can argue using intensivity and extensivity that we must have
\[
G = f(T, p)\, N
\]
for some function $f$. Then we have
\[
f = \left(\frac{\partial G}{\partial N}\right)_{T,p} = \mu.
\]
Therefore
\[
\mu(T, p) = g(T, p) \equiv \frac{G}{N}.
\]
This is the Gibbs free energy per particle. Of course, this notation is misleading, since we know $T$ and $p$ don't uniquely specify the state. We should have a copy of this equation for $L$, and another for $G$ (and also $U$, but nobody cares).

Using the first law, we get
\[
\left(\frac{\partial \mu}{\partial p}\right)_T = \frac{1}{N}\left(\frac{\partial G}{\partial p}\right)_{T,N} = \frac{V}{N} = v(T, p). \qquad (*)
\]
This allows us to compute $\mu(T, p)$ (up to a constant). To do so, we fix an arbitrary starting point $O$, and then integrate $(*)$ along the isotherm starting at $O$. At some other point $Q$, we can write an equation for the chemical potential
\[
\mu_Q = \mu_O + \int_O^Q dp\, v(T, p).
\]
Geometrically, this integral is just the area between the isotherm and the $p$ axis between the two endpoints.
We will pick $O$ to be a gas state of high volume (and hence low pressure). Moving along the isotherm from $O$ towards smaller volume, through the points $N$, $J$ and $M$ in turn, we see that as we go from $O$ to $N$, the pressure is increasing. So the integral increases. But once we reach $N$, the pressure starts decreasing until we get to $J$. It then keeps increasing again on $JM$.
We can now sketch what $\mu$ looks like as a function of $p$: there is a gas branch and a liquid branch, joined by an unstable branch coming from the part of the isotherm between $N$ and $J$. We see that at a unique point $X$, we have $\mu_L = \mu_G$. We define the vapour pressure $p(T)$ to be the pressure at $X$.
Geometrically, the equilibrium condition is equivalent to saying
\[
\int_G^L dp\, v = 0
\]
along the isotherm. Thus, $p(T)$ is determined by the condition that the two regions enclosed between the isotherm and the horizontal line $p = p(T)$ have equal area. This is known as the Maxwell construction.
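The equal-area condition is easy to implement numerically. Below is a rough sketch for a van der Waals isotherm, in units where $k = a = b = 1$ (an assumed normalization, so $kT_c = 8/27$, $v_c = 3$, $p_c = 1/27$); the temperature and tolerances are illustrative choices:

```python
import numpy as np
from scipy.optimize import brentq

T = 0.9 * 8 / 27   # an assumed temperature below T_c

def p_vdw(v):
    return T / (v - 1) - 1 / v**2

def liquid_gas_roots(p):
    # intersections of the isotherm with pressure p: p v^3 - (p + T) v^2 + v - 1 = 0
    r = np.roots([p, -(p + T), 1.0, -1.0])
    r = np.sort(r[abs(r.imag) < 1e-9].real)
    return r[0], r[-1]

def equal_area(p):
    vL, vG = liquid_gas_roots(p)
    # area under the isotherm between v_L and v_G minus the rectangle p (v_G - v_L)
    return T * np.log((vG - 1) / (vL - 1)) + 1/vG - 1/vL - p * (vG - vL)

# the spinodal volumes (where dp/dv = 0) give a bracket for the vapour pressure
vs = np.roots([T, -2.0, 4.0, -2.0])
vs = np.sort(vs[(abs(vs.imag) < 1e-9) & (vs.real > 1)].real)
p_lo, p_hi = p_vdw(vs[0]), p_vdw(vs[1])

p_vap = brentq(equal_area, p_lo * 1.001, p_hi * 0.999)
print(p_vap / (1 / 27))   # vapour pressure in units of p_c, about 0.65 at T = 0.9 T_c
```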
We write $N_L$ for the number of particles in the liquid phase. Then we have
\[
N_G = N - N_L = \text{number of particles in the gas phase}.
\]
Then we have
\[
G = N_L g_L + (N - N_L) g_G = N_L \mu_L + (N - N_L)\mu_G.
\]
The second law tells us we want to try to minimize $G$. Consider the part of the plot where $p < p(T)$. Since $\mu_G < \mu_L$, we find that $G$ is minimized when $N_L = 0$. What does this mean? If we live on the bit of the liquid curve where $p < p(T)$, namely $JX$, then we are not on an unstable part of the curve, as the liquid obeys the stability condition
\[
\left(\frac{\partial p}{\partial v}\right)_T < 0.
\]
However, it has higher Gibbs free energy than the gas phase. So it seems like we wouldn't want to live there. Thus, the liquid is locally stable, but not globally stable. It is metastable. We can indeed prepare such states experimentally. They are long-lived but delicate, and they evaporate whenever perturbed. This is called a superheated liquid.
Dually, when $p > p(T)$, global stability is achieved when $N_G = 0$. If we have a gas living on $XN$, then this is a supercooled vapour.
The interesting part happens when we look at the case $p = p(T)$. Here, if we look at the equation for $G$, we see that $N_L$ is undetermined. We can have an arbitrary portion of $L$ and $G$. To have a better picture of what is going on, we go back to the phase diagram we had before, and add two curves. We define the coexistence curve to be the line in the $p$-$v$ plane where liquid and gas are in equilibrium, and the spinodal curve to be the line where
\[
\left(\frac{\partial p}{\partial v}\right)_T = 0.
\]
The unstable states are the states inside the spinodal curve, and the metastable ones are those in between the spinodal curve and the coexistence curve. Now suppose we only work with stable states. Then we want to remove the metastable states.
It seems like there are some missing states in the curve. But really, there aren't. What the phase diagram shows is the plots of the pure states, i.e. those that are purely liquid or gas. The missing portion is where $p = p(T)$, and we just saw that at this particular pressure, we can have any mixture of liquid and gas. Suppose the volumes per particle of the liquid and gas phases are
\[
v_L = \lim_{p \to p(T)^+} v(T, p), \qquad v_G = \lim_{p \to p(T)^-} v(T, p),
\]
approaching $p(T)$ from the liquid and gas sides respectively.
Then if we have $N_L$ liquid particles and $N_G$ gas particles, the total volume is
\[
V = V_L + V_G = N_L v_L + N_G v_G,
\]
or equivalently,
\[
v = \frac{N_L}{N} v_L + \frac{N - N_L}{N} v_G.
\]
Since $N_L$ is not fixed, $v$ can take any value between $v_L$ and $v_G$. Thus, we should fill the gap by horizontal lines corresponding to liquid-gas equilibrium at $p = p(T)$. Thus, inside the coexistence curve, the value of $v$ (the size of the container) determines $N_L$.
Practically, we can take our gas, and try to study its behaviour at different
temperatures and pressures, and see how the phase transition behaves. By doing
so, we might be able to figure out
a
and
b
. Once we know their values, we can
use them to predict the value of
T
C
, as well as the temperature needed to liquefy
a gas.
Clausius–Clapeyron equation
We now look at the liquid-gas phase transition in the $(p, T)$ plane. In this plane, the liquid and gas regions are separated by the curve $p = p(T)$, which terminates at the critical point at $T = T_c$.
Crossing the line $p = p(T)$ results in a phase transition. The volume changes discontinuously from $v_L(T)$ to $v_G(T)$, or vice versa. On the other hand, $g = \frac{G}{N}$ changes continuously, since we know $g_L = g_G$.
In general, we can write
\[
dg = \left(\frac{\partial g}{\partial T}\right)_p dT + \left(\frac{\partial g}{\partial p}\right)_T dp.
\]
Using
\[
dG = -S\, dT + V\, dp + \mu\, dN,
\]
we find
\[
dg = -s\, dT + v\, dp,
\]
where $s = \frac{S}{N}$ is the entropy per particle.

Along $p = p(T)$, we know that $g_L = g_G$, and hence
\[
dg_L = dg_G.
\]
Plugging in our expressions for $dg$, we find
\[
-s_L\, dT + v_L\, dp = -s_G\, dT + v_G\, dp,
\]
where $s_G$ and $s_L$ are the entropy (per particle) right below and above the line $p = p(T)$ respectively. We can rearrange this to get
\[
\frac{dp}{dT} = \frac{s_G - s_L}{v_G - v_L}.
\]
This is the Clausius–Clapeyron equation. Alternatively, we can write this as
\[
\frac{dp}{dT} = \frac{S_G - S_L}{V_G - V_L}.
\]
There is another way of writing it, in terms of the latent heat of vaporization
\[
L = T(S_G - S_L),
\]
the heat we need to convert things of entropy $S_L$ to things of entropy $S_G$ at temperature $T$. Then we can write it as
\[
\frac{dp}{dT} = \frac{L}{T(V_G - V_L)}.
\]
This is also called the Clausius–Clapeyron equation.
Example. Suppose $L$ does not depend on temperature, and $V_G \gg V_L$. We further assume that the gas is ideal. Then we have
\[
\frac{dp}{dT} \approx \frac{L}{T V_G} = \frac{Lp}{NkT^2}.
\]
So we have
\[
p = p_0 \exp\left(-\frac{L}{NkT}\right)
\]
for some $p_0$.
5.2 Critical point and critical exponents
Eventually, we want to understand phase transitions in general. To begin, we
will classify phase transitions into different orders. The definition we are using is
a more “modern” one. There is a more old-fashioned way of defining the order,
which we will not use.
Definition
(First-order phase transition)
.
A first-order phase transition is one
with a discontinuity in a first derivative of G (or F ).
For example, if we have a discontinuity in
S
, then this gives rise to latent
heat.
Example.
The liquid-gas phase transition is first-order, provided we stay away
from the critical point.
Definition
(Second-order phase transition)
.
A second-order phase transition is
one with continuous first order derivatives, but some second (or higher) derivative
of G (or F ) exhibits some kind of singularity.
For example, we can look at the isothermal compressibility
\[
\kappa = -\frac{1}{v}\left(\frac{\partial v}{\partial p}\right)_T = -\frac{1}{v}\left(\frac{\partial^2 g}{\partial p^2}\right)_T.
\]
This is a second-order derivative in $G$. We look at how this behaves at the critical point, $(T_c, p(T_c))$. Here we have
\[
\left(\frac{\partial p}{\partial v}\right)_T = 0
\]
at the critical point. Since $\kappa$ is the inverse of this, it diverges at the critical point.
So the liquid-gas transition is second-order at the critical point.
Example. For water, we have T
C
= 647 K, and p(T
C
) = 218 atm.
Experimentally, we observe that liquids and gases are not the only possible phases. For most materials, it happens that solid phases are also possible. In this case, the phase diagram of a normal material in the $(T, p)$ plane has three regions: solid, liquid and gas. There is clearly another special point on the phase diagram, namely the triple point. This is the point where all three phases meet, and this is what we previously used to fix the temperature scale in classical thermodynamics.
Critical point
We are going to spend some more time studying the critical point. Let’s first
carefully figure out where this point is.
If we re-arrange the van der Waals equation of state, we find
\[
p v^3 - (pb + kT) v^2 + a v - ab = 0. \qquad (*)
\]
When $T < T_c$, this has three real roots ($L$, $G$, $U$). At $T > T_C$, we have a single real root, and two complex conjugate roots.

Thus at $T = T_C$, there must be three equal real roots. Hence, the equation must look like
\[
p_C(v - v_C)^3 = 0.
\]
Equating coefficients, we find
\[
kT_C = \frac{8a}{27b}, \qquad v_C = 3b, \qquad p_C = \frac{a}{27b^2} = p(T_C).
\]
We introduce some new dimensionless "reduced" quantities by
\[
\bar{T} = \frac{T}{T_C}, \qquad \bar{p} = \frac{p}{p_C}, \qquad \bar{v} = \frac{v}{v_C}.
\]
At the critical point, all these values are 1. Then the van der Waals equation becomes
\[
\bar{p} = \frac{8}{3}\frac{\bar{T}}{\bar{v} - \frac{1}{3}} - \frac{3}{\bar{v}^2}.
\]
We have gotten rid of all free parameters! This is called the law of corresponding states.
Using this, we can compute the ratio
\[
\frac{p_C v_C}{kT_C} = \frac{3}{8} = 0.375.
\]
This depends on no external parameters. Thus, this is something we can experimentally try to determine, which can put our van der Waals model to the test. Experimentally, we find

    Substance        p_C v_C/(kT_C)
    H2O              0.23
    He, H2           0.3
    Ne, N2, Ar       0.29
So we got the same order of magnitude, but the actual numbers are clearly off.
Indeed, there is no reason to expect van der Waals to give quantitative agreement
at the critical point, where the density is not small.
But something curious happens. The van der Waals model predicted all gases should look more-or-less the same, but we saw van der Waals is wrong. However, it is still the case that all gases look more-or-less the same. In particular, if we try to plot $\bar{T}$ against $\bar{\rho} = \bar{v}^{-1}$, then we find that all substances seem to give the same coexistence curve! This is the famous Guggenheim plot. Why is this the case? There are certainly some pieces we are missing.
Critical exponents
There are more similarities between substances we can talk about. As we tend towards the critical point, the quantities $v_G - v_L$, $T_C - T$ and $p - p_C$ all tend to 0. On the other hand, we saw the isothermal compressibility diverges as $T \to T_C^+$. One natural question to ask is how these tend to 0 or diverge. Often, these are given by some power law, and the exponents are called the critical exponents. Mysteriously, it appears that these exponents for all our systems are all the same.

For example, as the critical point is approached along the coexistence curve, we always have
\[
v_G - v_L \propto (T_C - T)^\beta,
\]
where $\beta \approx 0.32$.
On the other hand, if we approach the critical point along the critical isotherm $T = T_C$, then we get
\[
p - p_C \propto (v - v_C)^\delta,
\]
where $\delta \approx 4.8$.

Finally, we can study the isothermal compressibility
\[
\kappa = -\frac{1}{v}\left(\frac{\partial v}{\partial p}\right)_T
\]
along $v = v_C$ as $T \to T_C^+$, and we find
\[
\kappa \propto (T - T_C)^{-\gamma},
\]
where $\gamma \approx 1.2$.
These numbers
β, γ, δ
are the critical exponents, and appear to be the same
for all substances. The challenge is to understand why we have this universal
behaviour, and, more ambitiously, calculate them.
We start by making a rather innocent assumption. We suppose we can
describe a system near the critical point using some equation of state, which
we assume is analytic. We also assume that it is “non-degenerate”, in the sense
that if there is no reason for the derivative at a point to vanish, then it doesn’t.
If this equation of state gives the same qualitative behaviour as van der Waals, i.e. the critical point is a stationary point of inflection on the critical isotherm $T = T_C$, then we have
$$\left(\frac{\partial p}{\partial v}\right)_T = \left(\frac{\partial^2 p}{\partial v^2}\right)_T = 0$$
at the critical point. Therefore, using the non-degeneracy condition, we find
$$p - p_C \propto (v - v_C)^3$$
near $T = T_C$. So we predict $\delta = 3$, which is wrong.
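For the van der Waals equation itself we can verify this cubic behaviour explicitly. The following sympy sketch (an illustration, not from the lectures) expands the reduced equation of state along the critical isotherm and finds $\bar{p} - 1 = -\tfrac{3}{2}(\bar{v} - 1)^3 + \cdots$, confirming the prediction $\delta = 3$.

```python
import sympy as sp

x = sp.symbols('x')            # x = vbar - 1, displacement from the critical volume
vbar = 1 + x

# Reduced van der Waals equation of state on the critical isotherm Tbar = 1.
pbar = 8 / (3 * vbar - 1) - 3 / vbar**2

# The constant term is 1 and the linear and quadratic terms cancel,
# leaving pbar = 1 - (3/2) x^3 + O(x^4).
print(sp.series(pbar, x, 0, 4))
```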
But let's continue and try to make other predictions. We look at how $\gamma$ behaves. We have
$$\left(\frac{\partial p}{\partial v}\right)_T(T, v_C) \approx -a(T - T_C)$$
for some $a$ near the critical point. Therefore we find
$$\kappa \propto (T - T_C)^{-1}$$
if $v = v_C$ and $T \to T_C^+$, and so $\gamma = 1$. This is again wrong.

This is pretty bad. While we do predict universal behaviour, we are always getting the wrong values of the exponents! We might as well work out what $\beta$ is, which is more complicated, because we have to figure out the coexistence curve. So we just compute it using van der Waals along the coexistence curve
$$\bar{p} = \frac{8\bar{T}}{3\bar{v}_L - 1} - \frac{3}{\bar{v}_L^2} = \frac{8\bar{T}}{3\bar{v}_G - 1} - \frac{3}{\bar{v}_G^2}.$$
Rearranging, we find
$$\bar{T} = \frac{(3\bar{v}_L - 1)(3\bar{v}_G - 1)(\bar{v}_L + \bar{v}_G)}{8\bar{v}_G^2\bar{v}_L^2}. \quad (\dagger)$$
To understand how this behaves as $T \to T_C$, we need to understand the coexistence curve. We use the Maxwell construction, and near the critical point, we write
$$\bar{v}_L = 1 + \delta\bar{v}_L, \quad \bar{v}_G = 1 + \delta\bar{v}_G.$$
In the final example sheet, we find that
$$\delta\bar{v}_L = -\delta\bar{v}_G = -\frac{\varepsilon}{2}$$
for some $\varepsilon$. Then using $(\dagger)$, we find that
$$\bar{T} \approx 1 - \frac{1}{16}\varepsilon^2 = 1 - \frac{1}{16}(\bar{v}_G - \bar{v}_L)^2.$$
So we find that
$$v_G - v_L \propto (T_C - T)^{1/2}.$$
This again doesn’t agree with experiment.
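The van der Waals prediction itself can be checked numerically. The sketch below (illustrative, not from the lectures) implements the Maxwell equal-area construction for the reduced van der Waals equation, computes $\bar{v}_G - \bar{v}_L$ for temperatures just below $\bar{T} = 1$, and fits the exponent, which comes out close to $1/2$ as derived above.

```python
import numpy as np
from scipy.optimize import fsolve

def p(v, T):
    """Reduced van der Waals equation of state."""
    return 8 * T / (3 * v - 1) - 3 / v**2

def coexistence(T):
    """Find (v_L, v_G) from equal pressure and the Maxwell equal-area rule."""
    def eqs(x):
        vL, vG = x
        # integral of p dv from vL to vG, done analytically
        area = (8 * T / 3) * np.log((3 * vG - 1) / (3 * vL - 1)) + 3 / vG - 3 / vL
        return [p(vL, T) - p(vG, T), area - p(vL, T) * (vG - vL)]
    eps = 2 * np.sqrt(1 - T)            # leading-order guess for the half-width
    return fsolve(eqs, [1 - eps, 1 + eps])

Ts = 1 - np.logspace(-3, -1.5, 8)
widths = [np.diff(coexistence(T))[0] for T in Ts]
slope = np.polyfit(np.log(1 - Ts), np.log(widths), 1)[0]
print("beta ~", slope)                   # close to 0.5
```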
Why are all these exponents wrong? The answer is fluctuations. Near the critical point, fluctuations are large. So it is not good enough to work with mean values $\langle v\rangle$, $\langle p\rangle$. Recall that in the grand canonical ensemble, we had
$$\Delta N^2 = \frac{1}{\beta}\left(\frac{\partial N}{\partial \mu}\right)_{T,V} = \frac{1}{\beta}\left(\frac{\partial N}{\partial p}\right)_{T,V}\left(\frac{\partial p}{\partial \mu}\right)_{T,V}.$$
We can try to work out these terms. Recall that we had the grand canonical potential
$$\Phi = E - TS - \mu N,$$
which satisfies the first law-like expression
$$\mathrm{d}\Phi = -S\,\mathrm{d}T - p\,\mathrm{d}V - N\,\mathrm{d}\mu.$$
We also showed before, using extensivity, that
$$\Phi = -pV.$$
So if we plug this into the first law, we find
$$V\,\mathrm{d}p = S\,\mathrm{d}T + N\,\mathrm{d}\mu.$$
Therefore we know that
$$\left(\frac{\partial p}{\partial \mu}\right)_{T,V} = \frac{N}{V}.$$
So we find
$$\Delta N^2 = \frac{N}{\beta V}\left(\frac{\partial N}{\partial p}\right)_{T,V}.$$
To figure out the remaining term, we use the magic identity
$$\left(\frac{\partial x}{\partial y}\right)_z\left(\frac{\partial y}{\partial z}\right)_x\left(\frac{\partial z}{\partial x}\right)_y = -1.$$
This gives us
$$\Delta N^2 = -\frac{N}{\beta V}\frac{1}{\left(\frac{\partial p}{\partial V}\right)_{T,N}\left(\frac{\partial V}{\partial N}\right)_{T,p}} = -\frac{N}{\beta V}\left(\frac{\partial N}{\partial V}\right)_{T,p}\left(\frac{\partial V}{\partial p}\right)_{T,N} = \frac{\kappa N}{\beta}\left(\frac{\partial N}{\partial V}\right)_{T,p}.$$
Recall that the density is
$$\rho(T, p, N) = \frac{N}{V},$$
which is an intensive quantity. So we can write it as $\rho(T, p)$. Since we have $N = \rho V$, we simply have
$$\left(\frac{\partial N}{\partial V}\right)_{T,p} = \rho.$$
Therefore we have
$$\Delta N^2 = \frac{\kappa N}{\beta}\rho.$$
So we find that
$$\frac{\Delta N^2}{N^2} = \frac{\kappa k T}{V}.$$
This is how big the fluctuations of N are compared to N.
Crucially, this is proportional to $\kappa$, which we have already seen diverges at the critical point. This means that near the critical point, we cannot ignore the fluctuations in $N$.
Before we say anything more about this, let’s study another system that
experiences a phase transition.
5.3 The Ising model
The Ising model was invented as a model of a ferromagnet. Consider a $d$-dimensional lattice with $N$ sites. On each of these sites, we have a degree of freedom called the "spin"
$$s_i = \begin{cases} +1 & \text{spin up}\\ -1 & \text{spin down}\end{cases}.$$
The Hamiltonian is given by
$$H = -J\sum_{\langle ij\rangle} s_i s_j - B\sum_i s_i$$
for some $J, B$. We should think of $B$ as the magnetic field, and for simplicity, we just assume that the magnetic moment is 1. The first term describes interactions between the spins themselves, where $\sum_{\langle ij\rangle}$ denotes the sum over nearest neighbours.
For example, in $d = 1$ the lattice is a chain of sites, and the nearest neighbours are the literal neighbours. In $d = 2$, we can have many different lattices. For example, on a square lattice, the nearest neighbours of a site are the four sites directly adjacent to it.

We let $q$ be the number of nearest neighbours of each site. So for example, when $d = 1$, we have $q = 2$; for a $d = 2$ square lattice, $q = 4$.
What is $J$? It is clearly a measure of the strength of the interaction. But the sign matters. The case $J > 0$ corresponds to the case where neighbouring spins prefer to align. This is called a ferromagnet; and the $J < 0$ case corresponds to the case where they prefer to anti-align, and this is called an anti-ferromagnet. It doesn't really matter a lot, but for the sake of definiteness, we assume $J > 0$.
Now if we turn on the $B$ field, then on the basis of energy, the spins will want to align with the $B$ field, and the interactions will further encourage them to align with each other.

But energy is not the only thing that matters. There is also entropy. If all spins align with the field, then the entropy will be zero, which is not very good. So there is a competition between energy, which wants the spins to align, and entropy, which prefers them to be disordered, and the end result depends on how this competition turns out.
To understand this properly, we use the canonical ensemble
$$Z = \sum_{\{s_i\}} e^{-\beta E[\{s_i\}]}.$$
We define the average spin, or the magnetization, to be
$$m = \frac{1}{N}\sum_i \langle s_i\rangle = \frac{1}{N\beta}\left(\frac{\partial \log Z}{\partial B}\right)_T. \quad (*)$$
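To make these definitions concrete, here is a small brute-force illustration (not from the lectures): it enumerates all $2^N$ spin configurations of a tiny one-dimensional ring, builds $Z$, and computes $m$ directly; the parameter values are arbitrary.

```python
import itertools
import numpy as np

# Tiny 1d Ising ring (periodic boundary conditions); parameters are illustrative.
N, J, B, beta = 6, 1.0, 0.2, 1.0

def energy(s):
    """H = -J sum_<ij> s_i s_j - B sum_i s_i, with nearest neighbours on a ring."""
    s = np.array(s)
    return -J * np.sum(s * np.roll(s, 1)) - B * np.sum(s)

configs = list(itertools.product([-1, 1], repeat=N))
weights = np.array([np.exp(-beta * energy(s)) for s in configs])
Z = weights.sum()

# m = (1/N) sum_i <s_i>, which equals (1/(N*beta)) dlogZ/dB.
m = sum(w * np.mean(s) for w, s in zip(weights, configs)) / Z
print(Z, m)
```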
It turns out there is a different possible interpretation of the Ising model. It can also be viewed as a description of a gas! Consider "hard core" particles living on a lattice. Then there is either 0 or 1 particle living on each site $i$. Let's call it $n_i$. We suppose there is an attractive force between the particles, and suppose the kinetic energy is not important. The model is then defined by the Hamiltonian
$$H = -4J\sum_{\langle ij\rangle} n_i n_j.$$
This is a rather crude model for studying gas on a lattice. To understand this system of gas, we use the grand canonical ensemble
$$Z_{\text{gas}} = \sum_{\{n_i\}} e^{-\beta(E[\{n_i\}] - \mu\sum_i n_i)}.$$
We look at the exponent
$$E[\{n_i\}] - \mu\sum_i n_i = -4J\sum_{\langle ij\rangle} n_i n_j - \mu\sum_i n_i.$$
We can compare this with the Ising model, and they look very similar. Indeed, this is equal to $E_{\text{Ising}}[\{s_i\}]$, up to a constant independent of the configuration, if we set
$$n_i = \frac{s_i + 1}{2},$$
and let
$$B = \frac{\mu}{2} + qJ.$$
Then we have
$$Z_{\text{gas}} = Z_{\text{Ising}},$$
up to an overall constant factor which does not affect the physics.
So if we understand one, then we understand the other.
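We can check this identification on a small example. The sketch below (illustrative, not from the lectures) compares the two energies configuration by configuration on a small ring with $q = 2$, and verifies that their difference is the same spin-independent constant for every configuration, so the two partition functions agree up to an overall factor.

```python
import itertools
import numpy as np

# Small ring: each site has q = 2 nearest neighbours; parameters are illustrative.
N, J, mu = 5, 0.7, 0.3
q = 2
B = mu / 2 + q * J

def E_ising(s):
    s = np.array(s)
    return -J * np.sum(s * np.roll(s, 1)) - B * np.sum(s)

def E_gas(n):
    """E[{n_i}] - mu * sum_i n_i for the lattice gas."""
    n = np.array(n)
    return -4 * J * np.sum(n * np.roll(n, 1)) - mu * np.sum(n)

# The difference should be the same constant for every spin configuration.
diffs = {round(E_gas((np.array(s) + 1) // 2) - E_ising(s), 10)
         for s in itertools.product([-1, 1], repeat=N)}
print(diffs)   # a single value
```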
We now concentrate on $Z_{\text{Ising}}$. Can we actually compute it? It turns out it is possible to do it for $d = 1$, which we will do some time later. In the case $d = 2$ with $B = 0$, we can still manage, but in general, we can't. So we want to develop some approximate method to study it.

We'll do so using mean field theory. In general, we would expect $\langle s_i\rangle = m$. So, being good physicists, we expand around this equilibrium, and write
$$s_i s_j = [(s_i - m) + m][(s_j - m) + m] = (s_i - m)(s_j - m) + m(s_j - m) + m(s_i - m) + m^2.$$
We will assume that
$$\sum_{\langle ij\rangle} (s_i - m)(s_j - m)$$
is negligible compared to the other terms in $H$. This is a subtle assumption. It is not true that $\langle (s_i - m)^2\rangle$ is negligible. In fact, since $s_i = \pm 1$, we always have $s_i^2 = 1$, hence $\langle s_i^2\rangle = 1$. Thus,
$$\langle (s_i - m)^2\rangle = \langle s_i^2\rangle - 2m\langle s_i\rangle + m^2 = 1 - m^2,$$
and for small magnetization, this is actually pretty huge. However, for distinct sites, $\langle s_i s_j\rangle$ can behave rather differently from $\langle s_i\rangle\langle s_j\rangle = m^2$. If we make suitable assumptions about the fluctuations, this term can potentially be small.
Assuming we can indeed neglect that term, we have
$$H \approx -J\sum_{\langle ij\rangle}\big(m(s_i + s_j) - m^2\big) - B\sum_i s_i.$$
Now the system decouples! We can simply write this as
$$H = \frac{1}{2}JNqm^2 - (Jqm + B)\sum_i s_i.$$
Now this is just the 2-state model we saw at the very beginning of the course, with an effective magnetic field
$$B_{\text{eff}} = B + Jqm.$$
Thus, we have "averaged out" the interactions, and turned them into a shifted magnetic field. Using this, we get
$$Z = e^{-\frac{1}{2}\beta JNqm^2}\sum_{\{s_i\}} e^{\beta B_{\text{eff}}\sum_i s_i} = e^{-\frac{1}{2}\beta JNqm^2}\big(e^{\beta B_{\text{eff}}} + e^{-\beta B_{\text{eff}}}\big)^N = e^{-\frac{1}{2}\beta JNqm^2}\, 2^N\cosh^N(\beta B + \beta Jqm).$$
Note that the partition function contains a variable $m$, which we do not know. But we can determine $m(T, B)$ using $(*)$, and this gives
$$m = \tanh(\beta B + \beta Jqm). \quad (**)$$
It is not easy to solve this analytically, but we can graph this and understand it
qualitatively:
We first consider the case $B = 0$. Note that for small $x$, we have
$$\tanh x \approx x - \frac{1}{3}x^3.$$
We can then plot both sides of the equation as functions of $m$. If $\beta Jq < 1$, the straight line $m$ meets the curve $\tanh(\beta Jqm)$ only at the origin, so the only solution is $m = 0$. On the other hand, if $\beta Jq > 1$, the two graphs intersect three times, and we have three possible solutions: $m = 0$ and $m = \pm m_0(T)$.
Something interesting happens when $\beta Jq = 1$. We define $T_C$ by
$$kT_C = Jq.$$
Then we have
$$\beta Jq = \frac{T_C}{T}.$$
We can now rephrase our previous observation as follows: if $T > T_C$, then there is only one solution, namely $m = 0$. We interpret this as saying that at high temperature, thermal fluctuations prevent the spins from aligning. In this case, entropy wins.
If $T < T_C$, then there are 3 solutions. One is $m = 0$, and then there are non-trivial solutions $\pm m_0(T)$. Let's look at the $m = 0$ solution first. This we should think of as the analogue of the unstable solution we had for the liquid-gas equation. Indeed, taking the derivative of $(**)$ with respect to $B$, we find
$$\left.\left(\frac{\partial m}{\partial B}\right)_T\right|_{B = 0} = \frac{\beta}{1 - \beta Jq} < 0.$$
So for the same reason as before, this is unstable. So only $\pm m_0(T)$ is physical. In this case the spins align. We have $m_0 \to \pm 1$ as $T \to 0$, and the two signs correspond to pointing up and down.

The interesting part is, of course, what happens near the critical temperature. When we are close to the critical temperature, we can Taylor expand the equation $(**)$. Then we have
$$m_0 \approx \beta Jqm_0 - \frac{1}{3}(\beta Jqm_0)^3.$$
We can rearrange this, and we obtain
$$m_0 \propto (T_C - T)^{1/2}.$$
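As a quick numerical illustration (not from the lectures), we can solve $(**)$ at $B = 0$ for temperatures just below $T_C$ and fit the exponent; units are chosen so that $kT_C = Jq = 1$, and the fitted slope comes out close to $1/2$.

```python
import numpy as np
from scipy.optimize import brentq

Tc = 1.0   # units with k = 1 and Jq = 1, so kTc = Jq = 1

def m0(T):
    """Largest solution of m = tanh(Tc*m/T), i.e. (**) with B = 0."""
    f = lambda m: m - np.tanh(Tc * m / T)
    return brentq(f, 1e-9, 1.0)    # bracket excludes the trivial root m = 0

Ts = Tc - np.logspace(-4, -2, 10)          # temperatures just below Tc
ms = np.array([m0(T) for T in Ts])

# Fit log m0 against log(Tc - T); the slope is the exponent beta.
slope = np.polyfit(np.log(Tc - Ts), np.log(ms), 1)[0]
print("beta ~", slope)                      # ~0.5
```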
We can plot the solutions $m_0(T)$ against $T$: below $T_C$, the two branches $\pm m_0(T)$ grow from 0 and tend to $\pm 1$ as $T \to 0$.
There is a phase transition at $T = T_C$ for $B = 0$. For $T > T_C$, the stable state has $m = 0$, while for $T < T_C$ the stable states have non-zero $m$. Thus, the magnetization disappears when we are above the critical temperature $T_C$.

We now want to understand the order of the phase transition. For general $B$, we can write the free energy as
$$F(T, B) = -kT\log Z = \frac{1}{2}JNqm^2 - NkT\log\big(2\cosh(\beta B + \beta Jqm)\big),$$
where $m = m(T, B)$, which we found by solving $(**)$.

Restricting back to $B = 0$, for $T$ close to $T_C$, we have a small $m$. Then we can expand the expression in powers of $m$, and get
$$F(T, 0) \approx \frac{1}{2}NkT_C\left(1 - \frac{T_C}{T}\right)m^2 - NkT\log 2.$$
The second term does not depend on $m$, and behaves in a non-singular way. So we are mostly interested in the behaviour of the other part. We have
$$F + NkT\log 2 \propto \begin{cases} -(T_C - T)^2 & T < T_C\\ 0 & T > T_C\end{cases}.$$
If we look at the first derivative, then this is continuous at $T = T_C$! However, the second derivative is discontinuous. Hence we have a second-order phase transition at $B = 0$, $T = T_C$.

We've understood what happens when $B = 0$. Now let's turn on a non-zero $B$. This merely shifts the $\tanh$ curve horizontally. We see that we always have a unique solution $m$ of the same sign as $B$. We call this $m(T, B)$. However, when we are at sufficiently low temperature, and $B$ is not too big, then we also have two other solutions $U$ and $M$, both of the opposite sign to $B$.
As before, $U$ is unstable. But $M$ is something new. It is natural to ask which of the two remaining states has lower free energy. It turns out $M$ has higher $F$, and so it is metastable. This state can indeed be realized in very delicate experiments.
This time, at least qualitatively from the pictures, we see that $m(T, B)$ depends smoothly on $T$ and $B$. So there is no phase transition.

We can also see what happens at high temperature. If $\beta Jq \ll 1$ and $\beta|B| \ll 1$, then we can expand $(**)$ and get
$$m \approx \beta B = \frac{B}{kT}.$$
This is Curie's law.

But now let's do something different. We fix $T$, and vary $B$. If $T > T_C$, then $m(T, B)$ depends smoothly on $B$, and in particular
$$\lim_{B \to 0} m(T, B) = 0.$$
However, if we start lower than the critical temperature, and start at $B > 0$, then our magnetization is positive. As we decrease $B$, we have $B \to 0^+$, and so $m \to m_0(T)$. However, if we start with negative $B$, we have a negative magnetization. We have
$$\lim_{B \to 0^+} m(T, B) = m_0(T) \neq -m_0(T) = \lim_{B \to 0^-} m(T, B).$$
So we indeed have a phase transition at $B = 0$ for $T < T_C$.

We now want to determine the order of this transition. We note that we have
$$m = \frac{1}{N\beta}\left(\frac{\partial\log Z}{\partial B}\right)_T = -\frac{1}{N}\left(\frac{\partial F}{\partial B}\right)_T.$$
Since $m$ is discontinuous, we know this first derivative of the free energy is discontinuous. So this is a first-order phase transition.
We can plot our rather unexciting phase diagram: in the $(T, B)$ plane, there is a line of first-order transitions along $B = 0$ for $T < T_C$, terminating at the critical point $T = T_C$. We can compare this with the liquid-gas phase diagram we had, where the coexistence line in the $(T, p)$ plane separating liquid from gas likewise ends at a critical point. It looks roughly the same.
Let's now compute some critical exponents. We saw that for $B = 0$ and $T \to T_C^-$, we had
$$m_0 \propto (T_C - T)^\beta,$$
with $\beta = \frac{1}{2}$.
We can also set $T = T_C$, and let $B \to 0$. Then our equation $(**)$ gives
$$m = \tanh\left(\frac{B}{Jq} + m\right).$$
We can invert this equation by taking $\tanh^{-1}$, and get
$$\frac{B}{Jq} + m = \tanh^{-1} m = m + \frac{1}{3}m^3 + \cdots.$$
So we find
$$m \approx \left(\frac{3B}{Jq}\right)^{1/3}.$$
So we find $B \propto m^\delta$, where $\delta = 3$.

Finally, we can consider the susceptibility
$$\chi = N\left(\frac{\partial m}{\partial B}\right)_T.$$
We set $B = 0$, and let $T \to T_C^+$. We then find that
$$\chi = \frac{N\beta}{1 - \beta Jq} \propto (T - T_C)^{-\gamma},$$
where $\gamma = 1$.
These were exactly the (incorrect) critical exponents we computed for the
liquid-gas phase transition using the van der Waals equation!
In the case of van der Waals, we saw that our result was wrong because we
made the approximation of ignoring fluctuations. Here we made the mean field
approximation. Is this a problem? There are two questions we should ask: is the qualitative picture of the phase transition correct? And are the critical exponents correct?
The answer depends on the dimension.
If $d = 1$, then this is completely wrong! We can solve the Ising model exactly, and the exact solution has no phase transition.

If $d \geq 2$, then the phase diagram is qualitatively correct. Moreover, the critical exponents are correct for $d \geq 4$. To learn more, one should take III Statistical Field Theory.
In $d = 2$, we can indeed solve the Ising model exactly, and the exact solution gives
$$\beta = \frac{1}{8}, \quad \gamma = \frac{7}{4}, \quad \delta = 15.$$
So the mean field approximation is pretty useless at predicting the critical exponents.
In $d = 3$, there is no exact solution known, but there has been a lot of progress on this recently, within the last 2 or 3 years! This led to a very accurate calculation of the critical exponents, using a combination of analytic and numerical methods, which gives
$$\beta = 0.326, \quad \gamma = 1.237, \quad \delta = 4.790.$$
These are actually known to higher accuracy.
These are exactly the same as the measured critical exponents for the
liquid-gas transition!
So this funny model of spins on a lattice is giving us exactly the same numbers as the real-world liquid-gas system! This is evidence for universality. Near the critical point,
the system is losing knowledge of what the individual microscopic description
of the system is, and all systems exhibit the same kind of physics. We say the
critical points of the three-dimensional Ising model and liquid-gas system belong
to the same universality class. This is something described by a conformal field
theory.
One-dimensional Ising model
Let's now solve the one-dimensional Ising model. Here we just have
$$H = -J\sum_{i=1}^N s_i s_{i+1} - \frac{B}{2}\sum_{i=1}^N (s_i + s_{i+1}).$$
To make our study easier, we impose the periodic boundary condition $s_{N+1} \equiv s_1$. We then have
$$Z = \sum_{s_1 = \pm 1}\sum_{s_2 = \pm 1}\cdots\sum_{s_N = \pm 1}\prod_{i=1}^N \exp\left(\beta J s_i s_{i+1} + \frac{\beta B}{2}(s_i + s_{i+1})\right).$$
We define the symmetric $2 \times 2$ matrix $T$ by
$$T_{st} = \exp\left(\beta J st + \frac{\beta B}{2}(s + t)\right),$$
where $s, t = \pm 1$. We can then rewrite $Z$ using matrix multiplication:
$$Z = \sum_{s_1}\cdots\sum_{s_N} T_{s_1 s_2}T_{s_2 s_3}\cdots T_{s_N s_1} = \operatorname{Tr}(T^N).$$
The trace is the sum of the eigenvalues, and if we know the eigenvalues of $T$, we know those of $T^N$. We can find these directly. We have
$$\lambda_\pm = e^{\beta J}\cosh\beta B \pm \sqrt{e^{2\beta J}\cosh^2\beta B - 2\sinh 2\beta J}.$$
This is not terribly attractive, but we can indeed write it down. As expected, these eigenvalues are real, and we picked the labels so that $\lambda_+ > \lambda_-$. Then we have
$$Z = \lambda_+^N + \lambda_-^N.$$
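As a sanity check (not part of the lectures), the transfer matrix result can be compared against a brute-force sum over all configurations for a small chain; the parameters below are arbitrary.

```python
import itertools
import numpy as np

# 1d Ising ring; parameters are illustrative.
N, J, B, beta = 8, 1.0, 0.3, 0.7
spins = [1, -1]

# Transfer matrix T_st = exp(beta*J*s*t + beta*B*(s + t)/2).
T = np.array([[np.exp(beta * J * s * t + beta * B * (s + t) / 2)
               for t in spins] for s in spins])
lam = np.linalg.eigvalsh(T)            # T is real symmetric
Z_transfer = np.sum(lam**N)            # Z = Tr(T^N) = lambda_+^N + lambda_-^N

# Brute force: Z = sum over all 2^N configurations of exp(-beta * H).
def energy(s):
    s = np.array(s)
    return -J * np.sum(s * np.roll(s, 1)) - B * np.sum(s)

Z_brute = sum(np.exp(-beta * energy(s))
              for s in itertools.product(spins, repeat=N))
print(Z_transfer, Z_brute)             # the two agree
```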
We have thus solved the 1d Ising model completely. We can now see if there is a phase transition. We have
$$m = \frac{1}{N\beta}\left(\frac{\partial\log Z}{\partial B}\right)_T = \frac{1}{\beta Z}\left(\lambda_+^{N-1}\left(\frac{\partial\lambda_+}{\partial B}\right)_T + \lambda_-^{N-1}\left(\frac{\partial\lambda_-}{\partial B}\right)_T\right).$$
To evaluate this derivative, we see that the $B$-dependence of $\lambda_\pm$ lives in the $\cosh$ terms. So we have
$$\left(\frac{\partial\lambda_\pm}{\partial B}\right)_T \propto \sinh\beta B.$$
But we only need to evaluate this at $B = 0$, where $\sinh\beta B$ vanishes.
So we know that when $B = 0$, we have $m = 0$ for all $T$! So there is no phase transition at $B = 0$.

More generally, we can look at the free energy
$$F = -\frac{1}{\beta}\log Z = -\frac{1}{\beta}\log\left(\lambda_+^N\left(1 + \left(\frac{\lambda_-}{\lambda_+}\right)^N\right)\right) = -\frac{N}{\beta}\log\lambda_+ - \frac{1}{\beta}\log\left(1 + \left(\frac{\lambda_-}{\lambda_+}\right)^N\right).$$
Recall that we said phase transitions are possible only in the thermodynamic limit. So we want to take $N \to \infty$, and see what we get. In this case, we have
$$\left(\frac{\lambda_-}{\lambda_+}\right)^N \to 0.$$
So we have
$$\frac{F}{N} \to -\frac{1}{\beta}\log\lambda_+.$$
We have $\lambda_+ > 0$, and this depends smoothly on $T$ and $B$, as the expression inside the square root never vanishes for $T > 0$. So there is no phase transition.
In fact, this holds for any 1d system without long-range interactions (Peierls).
5.4 Landau theory
What is it that made the Ising model behave so similarly to the liquid-gas
model? To understand this, we try to fit these into a “general” theory of phase
transitions, known as Landau theory. We will motivate Landau theory by trying
to capture the essential properties of the mean-field Ising model that lead to the
phase transition behaviour. Consequently, Landau theory will predict the same
“wrong” critical exponents, so perhaps it isn’t that satisfactory. Nevertheless,
doing Landau theory will help us understand better “where” the phase transition
came from.
In the mean field approximation for the Ising model, the free energy was given by
$$F(T, B; m) = \frac{1}{2}JNqm^2 - \frac{N}{\beta}\log\big(2\cosh(\beta B + \beta Jqm)\big),$$
where $m = m(T, B)$ is determined by the equation
$$m = \frac{1}{N\beta}\left(\frac{\partial\log Z}{\partial B}\right)_T. \quad (\dagger)$$
It is convenient to distinguish between the function $F$ we wrote above, and the value of $F$ when we set $m = m(T, B)$. So we write
$$\tilde{F}(T, B; m) = \frac{1}{2}JNqm^2 - \frac{N}{\beta}\log\big(2\cosh(\beta B + \beta Jqm)\big), \quad F(T, B) = \tilde{F}(T, B; m(T, B)).$$
The idea of Landau theory is to see what happens when we don't impose $(\dagger)$, and take $\tilde{F}$ seriously. There are some questions we can ask about $\tilde{F}$. For example, we can try to minimize $\tilde{F}$. We set
$$\left(\frac{\partial\tilde{F}}{\partial m}\right)_{T,B} = 0.$$
Then we find
$$m = \tanh(\beta B + \beta Jqm),$$
which is exactly the same thing we get by imposing $(\dagger)$! Thus, another way to view the mean-field Ising model is that we have a system with free parameters $m, T, B$, and the equilibrium states are those where $m$ minimizes the free energy.
To do Landau theory, we need a generalization of $m$ for an arbitrary system. The role of $m$ in the Ising model is that it is the order parameter: if $m \neq 0$, then we are ordered, i.e. the spins are aligned; if $m = 0$, then we are disordered.
After we have identified an order parameter $m$, we need some function $\tilde{F}$ such that the equilibria are exactly the minima of $\tilde{F}$. In the most basic set up, $\tilde{F}$ is an analytic function of $T$ and $m$, but in general, we can incorporate some other external parameters, such as $B$. Finally, we assume that we have a $\mathbb{Z}/2\mathbb{Z}$ symmetry, namely $\tilde{F}(T, m) = \tilde{F}(T, -m)$, and moreover that $m$ is sufficiently small near the critical point that we can analyze $\tilde{F}$ using its Taylor expansion.

Since $\tilde{F}$ is an even function in $m$, we can write the Taylor expansion as
$$\tilde{F}(T, m) = F_0(T) + a(T)m^2 + b(T)m^4 + \cdots.$$
Example. In the Ising model with $B = 0$, we take
$$\tilde{F}_{\text{Ising}}(T, m) = -NkT\log 2 + \frac{NJq}{2}(1 - \beta Jq)m^2 + \frac{N\beta^3 J^4 q^4}{12}m^4 + \cdots.$$
Now of course, the value of $F_0(T)$ doesn't matter. We will assume $b(T) > 0$. Otherwise, we need to look at the $m^6$ terms to see what happens, which will be done in the last example sheet.
There are now two cases to consider. If $a(T) > 0$ as well, then $\tilde{F}$ has a single minimum at $m = 0$. However, if $a(T) < 0$, then $m = 0$ becomes a local maximum, and there are two other minima on either side of it. We call these minima $\pm m_0(T)$.
Thus, we expect a rather discrete change in behaviour when we transition from $a(T) > 0$ to $a(T) < 0$. We let $T_C$ be the temperature such that $a(T_C) = 0$. This is the critical temperature. Since we are only interested in the behaviour near the critical point, we may wlog assume that
$$a(T)\ \begin{cases} > 0 & \text{if } T > T_C\\ = 0 & \text{if } T = T_C\\ < 0 & \text{if } T < T_C\end{cases}.$$
In other words, $a(T)$ has the same sign as $T - T_C$. We will further assume that $a$ has a simple zero at $T = T_C$.
Example. In the Ising model, we have a(T ) = 0 iff kT = Jq.
We can work out what the values of $m_0(T)$ are, assuming we ignore the $O(m^6)$ terms. This is simply given by
$$m_0(T) = \sqrt{\frac{-a}{2b}}.$$
Having determined the minimum, we plug it back into $\tilde{F}$ to determine the free energy $F(T)$. This gives
$$F(T) = \begin{cases} F_0(T) & T > T_C\\ F_0(T) - \dfrac{a^2}{4b} & T < T_C\end{cases}.$$
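These two formulae are a quick exercise in minimizing a quartic; the sympy sketch below (illustrative only, not from the lectures) recovers both the location of the nonzero minima and the value of the free energy there.

```python
import sympy as sp

m, a, b, F0 = sp.symbols('m a b F_0', real=True)
F = F0 + a * m**2 + b * m**4            # truncated Landau free energy, with b > 0

# Stationary points: m = 0 and m = +-sqrt(-a/(2b)) (real only when a < 0).
print(sp.solve(sp.diff(F, m), m))

# Free energy at the nonzero minima, relevant for a < 0 (i.e. T < Tc).
m0 = sp.sqrt(-a / (2 * b))
print(sp.simplify(F.subs(m, m0)))       # F_0 - a**2/(4*b)
```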
Recall that $a$ passes through 0 when $T = T_C$, so this is in fact a continuous function.
Now we want to determine the order of this phase transition. Since we assumed $\tilde{F}$ is analytic, we know in particular that $F_0$, $a$ and $b$ are smooth functions of $T$. Then we can write
$$a(T) = a_0(T - T_C), \quad b(T) = b_0(T_C) > 0$$
near $T = T_C$, where $a_0 > 0$. Then we get
$$F(T) = \begin{cases} F_0(T) & T > T_C\\ F_0(T) - \dfrac{a_0^2}{4b_0}(T - T_C)^2 & T < T_C\end{cases}.$$
So we see that
$$S = -\frac{\mathrm{d}F}{\mathrm{d}T}$$
is continuous. So this is not a first-order phase transition, as $S$ is continuous. However, the second-order derivative
$$C = T\frac{\mathrm{d}S}{\mathrm{d}T}$$
is discontinuous. This is a second-order phase transition. We can calculate critical exponents as well. We have
$$m_0 \approx \sqrt{\frac{a_0}{2b_0}}(T_C - T)^{1/2}.$$
So we have $\beta = \frac{1}{2}$, which is exactly what we saw for the mean field approximation and van der Waals. So Landau theory gives us the same answers, which are still wrong. However, it does give a qualitative description of a phase transition. In particular, we see where the non-analyticity of the result comes from.
One final comment is that we started with a $\tilde{F}$ which has a reflection symmetry in $m$. However, below the critical temperature, the ground state does not have this symmetry. It is either $+m_0$ or $-m_0$, and this breaks the symmetry. This is known as spontaneous symmetry breaking.
Non-symmetric free energy
In our Ising model, we had an external parameter $B$. We saw that when $T < T_C$, we had a first-order phase transition when $B$ passes through 0. How can we capture this behaviour in Landau theory?

The key observation is that in the Ising model, when $B \neq 0$, the function $\tilde{F}$ is not symmetric under $m \to -m$. Instead, it has the symmetry
$$\tilde{F}(T, B, m) = \tilde{F}(T, -B, -m).$$
Consider a general Landau theory, where $\tilde{F}$ has an external parameter $B$ with this same symmetry property. We can expand
$$\tilde{F} = F_0(T, B) + F_1(T, B)m + F_2(T, B)m^2 + F_3(T, B)m^3 + F_4(T, B)m^4 + \cdots.$$
In this case, $F_n(T, B)$ is odd/even in $B$ when $n$ is odd/even respectively. We assume $F_4 > 0$, and ignore the $O(m^5)$ terms. As before, for any fixed $B$, the function $\tilde{F}$ either has a single minimum, or has a single maximum and two minima.
As before, at high temperature, we assume we have a single minimum, so $\tilde{F}$ has a single well. At low temperature, we assume we go to a situation where $\tilde{F}$ has multiple extrema. Then for $B > 0$, there are two local minima with a local maximum in between: the global minimum is the ground state $G$, the other local minimum is a metastable state $M$, and the local maximum is an unstable state $U$. When we go to $B = 0$, this becomes symmetric, and now we have two ground states at $\pm m_0(T)$. When we move to $B < 0$, the ground state shifts to the minimum at negative $m$.
Now we can see a first-order phase transition. Our $m$ is discontinuous at $B = 0$. In particular,
$$\lim_{B \to 0^+} m(T, B) = m_0(T), \quad \lim_{B \to 0^-} m(T, B) = -m_0(T).$$
This is exactly the phenomena we observed previously.
Landau–Ginzburg theory*
The key idea of Landau theory was that we had a single order parameter $m$ that describes the whole system. This ignores fluctuations over large scales, and leads to the "wrong" answers. We now want to understand fluctuations better.
Definition (Correlation function). We define the correlation function
$$G_{ij} = \langle s_i s_j\rangle - \langle s_i\rangle\langle s_j\rangle.$$
Either numerically, or by inspecting the exact solution in low dimensions, we find that
$$G_{ij} \sim e^{-r_{ij}/\xi},$$
where $r_{ij}$ is the separation between the sites, and $\xi$ is some function of $T$ and $B$ known as the correlation length. The key property of this function is that as we approach the critical point $T = T_C$, $B = 0$, we have $\xi \to \infty$. So we have correlations over arbitrarily large distances.
In this case, it doesn't make much sense to use a single $m$ to represent the average over all lattice sites, as long-range correlations mean there is significant variation from site to site that we want to take into account. But we still want to do some averaging. The key idea is to average over short distances only. This is sometimes called coarse-graining.
To do this, we pick some length scale that is large relative to the lattice separation, but small relative to the correlation length. In other words, we pick a $b$ such that
$$\xi \gg b \gg a = \text{lattice spacing}.$$
Let $m_i$ be the average of all $s_j$ such that $|\mathbf{r}_i - \mathbf{r}_j| < b$.

Now the idea is to promote these numbers to a smooth function. Given some set of values $\{m_i\}$, we let $m(\mathbf{r})$ be a smooth function such that
$$m(\mathbf{r}_i) = m_i,$$
with $m$ slowly varying over distances $< b$. Of course, there is no unique way of doing this, but let's just pick a sensible one.

We now want to regard this $m$ as an order parameter, but this is more general than what we did before. This is a spatially varying order parameter. We define a functional $\tilde{F}[T, m]$ by
$$e^{-\beta\tilde{F}[T, m]} = \sum_{\{s_i\}} e^{-\beta E[\{s_i\}]},$$
but this time we sum only over the $\{s_i\}$ giving $\{m_i\}$ such that $m_i = m(\mathbf{r}_i)$. In principle, this determines the Landau functional $\tilde{F}$. In practice, we can't actually evaluate this.
To get the full partition function, we want to sum over all $m(\mathbf{r})$. But $m(\mathbf{r})$ is a function! We are trying to do a sum over all possible choices of a smooth function. We formally write this sum as
$$Z = \int \mathcal{D}m\; e^{-\beta\tilde{F}[T, m(\mathbf{r})]}.$$
This is an example of a functional integral, or a path integral. We shall not mathematically define what we mean by this, because this is a major unsolved problem of theoretical physics. In the context we are talking about here, it isn't really a huge problem. Indeed, our problem was initially discrete, and we started with the finitely many points $m_i$. It made sense to sum over all possible combinations of $\{m_i\}$. It's just that for some reason, we decided to promote this to a function $m(\mathbf{r})$ and cause ourselves trouble. However, writing a path integral like this makes it easier to manipulate, at least formally, and in certain other scenarios in physics, such path integrals are inevitable.

Ignoring the problem, for small $m$, if we have reflection symmetry, then we can expand
$$\tilde{F} = \int\mathrm{d}^d\mathbf{r}\;\big(a(T)m^2 + b(T)m^4 + c(T)(\nabla m)^2 + \cdots\big).$$
To proceed with this mathematically fearsome path integral, we make an approximation, namely the saddle point approximation. This says the path integral should be well-approximated by the value of $\tilde{F}$ at the minimum, so
$$Z \approx e^{-\beta\tilde{F}[T, m]},$$
where $m$ is determined by minimizing $\tilde{F}$, i.e.
$$\frac{\delta\tilde{F}}{\delta m} = 0.$$
To solve this, we have to vary $m$. A standard variational calculus manipulation gives
$$\delta\tilde{F} = \int\mathrm{d}^d\mathbf{r}\;\big(2am\,\delta m + 4bm^3\,\delta m + 2c\nabla m\cdot\nabla(\delta m) + \cdots\big) = \int\mathrm{d}^d\mathbf{r}\;\big(2am + 4bm^3 - 2c\nabla^2 m + \cdots\big)\delta m.$$
So the minimum of $\tilde{F}$ is given by the solution to
$$c\nabla^2 m = am + 2bm^3 + \cdots. \quad (\ddagger)$$
Let's first think about what happens when $m$ is constant. Then this equation just reproduces the equation of Landau theory. This "explains" why Landau theory works. But in this set up, we can do more than that. We can study corrections to the saddle point approximation, and these are the fluctuations. We find that the fluctuations are negligible if we are in $d \geq 4$, so Landau theory is reliable there. However, this is not the case in $d < 4$. It is an unfortunate fact that we live in 3 dimensions, not 4.

But we can also go beyond Landau theory in another way. We can consider the case where $m$ is not constant. Really, we have been doing this all along for the liquid-gas system, because below the critical temperature, we can have a mixture of liquid and gas in our system. We can try to describe it in this set up.

Non-constant solutions are indeed possible for appropriate boundary conditions. For example, suppose $T < T_C$, and assume $m \to m_0(T)$ as $x \to \infty$ and $m \to -m_0(T)$ as $x \to -\infty$. So we have gas on one side and liquid on the other. We can take $m = m(x)$, and then the equation $(\ddagger)$ becomes the second-order ODE
$$\frac{\mathrm{d}^2 m}{\mathrm{d}x^2} = \frac{am}{c} + \frac{2bm^3}{c},$$
and we can check that the solution is given by
$$m = m_0(T)\tanh\left(\sqrt{\frac{-a}{2c}}\,x\right),$$
where
$$m_0 = \sqrt{\frac{-a}{2b}}.$$
This configuration is known as the domain wall that interpolates between the
two ground states. This describes an interface between the liquid and gas phases.
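We can verify this claim symbolically. The sympy sketch below (illustrative, not from the lectures) writes the negative Landau coefficient as $-a$ with $a > 0$, and checks that the proposed profile satisfies $c\,m'' = -a\,m + 2bm^3$.

```python
import sympy as sp

x = sp.symbols('x', real=True)
a, b, c = sp.symbols('a b c', positive=True)     # a stands for |a|; the Landau coefficient is -a

m0 = sp.sqrt(a / (2 * b))
m = m0 * sp.tanh(sp.sqrt(a / (2 * c)) * x)       # proposed domain wall profile

# The saddle point equation with coefficient -a: c m'' = -a m + 2 b m^3.
residual = c * sp.diff(m, x, 2) - (-a * m + 2 * b * m**3)
print(sp.simplify(residual))                     # 0
```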
The profile $m(x)$ rises smoothly from $-m_0$ as $x \to -\infty$ to $+m_0$ as $x \to +\infty$.
To learn more about this, take Part III Statistical Field Theory.