1 Fundamentals of statistical mechanics

1.1 Microcanonical ensemble
We begin by considering a rather general system. Suppose we have an isolated system containing $N$ particles, where $N$ is a Large Number™. The canonical example to keep in mind is a box of gas detached from reality.
Definition (Microstate). The microstate of a system is the actual (quantum) state of the system. This gives a complete description of the system.
As one would expect, the microstate is very complicated and infeasible to
describe, especially when we have many particles. In statistical physics, we
observe that many microstates are indistinguishable macroscopically. Thus, we
only take note of some macroscopically interesting quantities, and use these
macroscopic quantities to put a probability distribution on the microstates.
More precisely, we let $\{|n\rangle\}$ be a basis of normalized eigenstates, say
\[
  \hat{H} |n\rangle = E_n |n\rangle.
\]
We let $p(n)$ be the probability that the microstate is $|n\rangle$. Note that this probability is not the quantum probability we are used to. It is some probability assigned to reflect our ignorance of the system. Given such probabilities, we can define the expectation of an operator in the least imaginative way:
Definition (Expectation value). Given a probability distribution $p(n)$ on the states, the expectation value of an operator $\mathcal{O}$ is
\[
  \langle \mathcal{O} \rangle = \sum_n p(n) \langle n | \mathcal{O} | n \rangle.
\]
If one knows about density operators, we can describe the system as a mixed state with density operator
\[
  \rho = \sum_n p(n) |n\rangle \langle n|.
\]
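To make the definition concrete, here is a minimal numerical sketch for a made-up three-level system; the probabilities $p(n)$ and the operator $\mathcal{O}$ below are arbitrary choices for illustration.

```python
import numpy as np

# Toy 3-level system: made-up ignorance probabilities p(n) and a made-up
# Hermitian observable O, both written in the energy eigenbasis |n>.
p = np.array([0.5, 0.3, 0.2])
O = np.array([[1.0, 0.2, 0.0],
              [0.2, 2.0, 0.1],
              [0.0, 0.1, 3.0]])

# <O> = sum_n p(n) <n|O|n>: only the diagonal elements enter.
expectation = np.sum(p * np.diag(O))

# Equivalently, via the density operator rho = sum_n p(n)|n><n|,
# <O> = Tr(rho O).
rho = np.diag(p)
expectation_via_rho = np.trace(rho @ O)

print(expectation, expectation_via_rho)  # the two agree
```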
There is an equivalent way of looking at this. We can consider an ensemble consisting of $W \gg 1$ independent copies of our system such that $W p(n)$ many copies are in the microstate $|n\rangle$. Then the expectation is just the average over the ensemble. For most purposes, how we think about this doesn't really matter.
We shall further assume our system is in equilibrium, i.e. the probability distribution $p(n)$ does not change in time. So in particular $\langle \mathcal{O} \rangle$ is independent of time. Of course, this does not mean the particles stop moving. The particles are still whizzing around. It's just that the statistical distribution does not change. In this course, we will mostly be talking about equilibrium systems. When we get out of equilibrium, things become very complicated.
The idea of statistical physics is that we have some partial knowledge about
the system. For example, we might know its total energy. The microstates that
are compatible with this partial knowledge are called accessible. The fundamental
assumption of statistical mechanics is then
An isolated system in equilibrium is equally likely to be in any of the
accessible microstates.
Thus, different probability distributions, or different ensembles, are distinguished by the partial knowledge we have.
Definition (Microcanonical ensemble). In a microcanonical ensemble, we know the energy is between $E$ and $E + \delta E$, where $\delta E$ is the accuracy of our measuring device. The accessible microstates are those with energy $E \leq E_n \leq E + \delta E$. We let $\Omega(E)$ be the number of such states.
In practice, $\delta E$ is much much larger than the spacing of energy levels, and so $\Omega(E) \gg 1$. A priori, it seems like our theory will depend on the value of $\delta E$, but as we develop the theory, we will see that this doesn't really matter.
It is crucial here that we are working with a quantum system, so the possible states are discrete, and it makes sense to count the number of states. We need to do quite a bit more work if we want to do this classically.
Example. Suppose we have $N = 10^{23}$ particles, and each particle can occupy two states $|{\uparrow}\rangle$ and $|{\downarrow}\rangle$, which have the same energy $\varepsilon$. Then we always have $N\varepsilon$ total energy, and we have
\[
  \Omega(N\varepsilon) = 2^{10^{23}}.
\]
This is a fantastically huge, mind-boggling number. This is the kind of number we are talking about.
By the fundamental assumption, we can write
\[
  p(n) =
  \begin{cases}
    \frac{1}{\Omega(E)} & \text{if } E \leq E_n \leq E + \delta E\\
    0 & \text{otherwise}
  \end{cases}.
\]
This is the characteristic distribution of the microcanonical ensemble.
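As a small illustration, here is a sketch that builds this distribution for a made-up toy spectrum; the energy levels and the window are arbitrary.

```python
import numpy as np

# Made-up energy levels E_n of a small system, and a window [E, E + dE]
# representing the accuracy of our measuring device.
levels = np.array([0.0, 0.1, 0.1, 0.2, 0.3, 0.3, 0.3, 0.5])
E, dE = 0.25, 0.1

accessible = (levels >= E) & (levels <= E + dE)
Omega = int(accessible.sum())            # Omega(E): number of accessible states

# Equal probability on every accessible microstate, zero on the rest.
p = np.where(accessible, 1.0 / Omega, 0.0)
print(Omega, p)                          # here Omega = 3, each accessible p(n) = 1/3
```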
It turns out it is not very convenient to work with $\Omega(E)$. In particular, $\Omega(E)$ is not linear in $N$, the number of particles. Instead, it scales as an exponential of $N$. So we take the logarithm.
Definition (Boltzmann entropy). The (Boltzmann) entropy is defined as
\[
  S(E) = k \log \Omega(E),
\]
where $k = 1.381 \times 10^{-23}\,\mathrm{J\,K^{-1}}$ is Boltzmann's constant.
This annoying constant $k$ is necessary because when people started doing thermodynamics, they didn't know about statistical physics, and picked weird conventions.
We wrote our expressions as $S(E)$, instead of $S(E, \delta E)$. As promised, the value of $\delta E$ doesn't really matter. We know that $\Omega(E)$ will scale approximately linearly with $\delta E$. So if we, say, double $\delta E$, then $S(E)$ will increase by $k \log 2$, which is incredibly tiny compared to $S(E) = k \log \Omega(E)$. So it doesn't matter which value of $\delta E$ we pick.
Even if you are not so convinced that multiplying $10^{10^{23}}$ by a factor of 2 or adding $\log 2$ to $10^{23}$ does not really matter, you should be reassured that in the end, we will rarely talk about $\Omega(E)$ or $S(E)$ itself. Instead, we will often divide two different $\Omega$'s to get probabilities, or differentiate $S$ to get other interesting quantities. In these cases, the factors really do not matter.
The second nice property of the entropy is that it is additive: if we have two non-interacting systems with energies $E^{(1)}, E^{(2)}$, then the total number of states of the combined system is
\[
  \Omega(E^{(1)}, E^{(2)}) = \Omega_1(E^{(1)})\, \Omega_2(E^{(2)}).
\]
So when we take the logarithm, we find
\[
  S(E^{(1)}, E^{(2)}) = S_1(E^{(1)}) + S_2(E^{(2)}).
\]
Of course, this is not very interesting, until we bring our systems together and
let them interact with each other.
Interacting systems
Suppose we bring the two systems together, and let them exchange energy. Then the energy of the individual systems is no longer fixed, and only the total energy $E_{\mathrm{total}} = E^{(1)} + E^{(2)}$ is fixed. Then we find that
\[
  \Omega(E_{\mathrm{total}}) = \sum_{E_i} \Omega_1(E_i)\, \Omega_2(E_{\mathrm{total}} - E_i),
\]
where we sum over all possible energy levels of the first system. In terms of the entropy of the system, we have
\[
  \Omega(E_{\mathrm{total}}) = \sum_{E_i} \exp\left( \frac{S_1(E_i)}{k} + \frac{S_2(E_{\mathrm{total}} - E_i)}{k} \right).
\]
We can be a bit more precise with what the sum means. We are not summing over all eigenstates. Recall that we have previously fixed an accuracy $\delta E$. So we can imagine dividing the whole energy spectrum into chunks of size $\delta E$, and here we are summing over the chunks.
We know that $S_{1,2}/k \sim N_{1,2} \sim 10^{23}$, which is a ridiculously large number. So the sum is overwhelmingly dominated by the term with the largest exponent. Suppose this is maximized when $E_i = E_*$. Then we have
\[
  S(E_{\mathrm{total}}) = k \log \Omega(E_{\mathrm{total}}) \approx S_1(E_*) + S_2(E_{\mathrm{total}} - E_*).
\]
Again, we are not claiming that only the term coming from $E_*$ has a significant contribution. Maybe one or two energy levels next to $E_*$ are also very significant, but taking these into account will only multiply $\Omega(E_{\mathrm{total}})$ by a (relatively) small constant, hence contributes a small additive amount to $S(E_{\mathrm{total}})$, which can be neglected.
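We can check this dominance numerically in a toy model. The sketch below takes two collections of two-state particles with level energies $0$ and $\varepsilon$ (the example treated in detail later in this section), so that each $\Omega_i$ is a binomial coefficient; the particle numbers are arbitrary. Even for a few thousand particles, the logarithm of the full sum is essentially the logarithm of its largest term.

```python
from math import comb, log

# Two non-interacting collections of two-state particles with level energies
# 0 and eps, so Omega_i(E_i) = C(N_i, n_i) where n_i = E_i / eps is the number
# of up-spins in subsystem i.
N1, N2 = 2000, 3000
n_tot = 2200                      # total energy E_total = n_tot * eps

terms = [comb(N1, n1) * comb(N2, n_tot - n1)
         for n1 in range(max(0, n_tot - N2), min(N1, n_tot) + 1)]

log_sum = log(sum(terms))         # log Omega(E_total), summing over all splittings
log_max = log(max(terms))         # log of the single largest term

print(log_sum, log_max)           # they differ by only a few units out of ~3400
```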
Now given any $E^{(1)}$, what is the probability that the actual energy of the first system is $E^{(1)}$? For convenience, we write $E^{(2)} = E_{\mathrm{total}} - E^{(1)}$, the energy of the second system. Then the probability desired is
\[
  \frac{\Omega_1(E^{(1)})\, \Omega_2(E^{(2)})}{\Omega(E_{\mathrm{total}})} = \exp\left( \frac{1}{k}\left( S_1(E^{(1)}) + S_2(E^{(2)}) - S(E_{\mathrm{total}}) \right) \right).
\]
Again recall that the numbers at stake are unimaginably huge. So if $S_1(E^{(1)}) + S_2(E^{(2)})$ is even slightly different from $S(E_{\mathrm{total}})$, then the probability is effectively zero. And by the above, for the two quantities to be close, we need $E^{(1)} = E_*$. So for all practical purposes, the value of $E^{(1)}$ is fixed at $E_*$.
Now imagine we prepare two systems separately with energies $E^{(1)}$ and $E^{(2)}$ such that $E^{(1)} \neq E_*$, and then bring the systems together. Then we are no longer in equilibrium. $E^{(1)}$ will change until it takes the value $E_*$, and the entropy of the system will increase from $S_1(E^{(1)}) + S_2(E^{(2)})$ to $S_1(E_*) + S_2(E_{\mathrm{total}} - E_*)$. In particular, the entropy increases.
Law (Second law of thermodynamics). The entropy of an isolated system increases (or remains the same) in any physical process. In equilibrium, the entropy attains its maximum value.
This prediction is verified by virtually all observations of physics.
While our derivation did not show it is impossible to violate the second law
of thermodynamics, it is very very very very very very very very unlikely to be
violated.
Temperature
Having defined entropy, the next interesting thing we can define is the temperature. We assume that $S$ is a smooth function of $E$. Then we can define the temperature as follows:
Definition (Temperature). The temperature is defined to be
\[
  \frac{1}{T} = \frac{dS}{dE}.
\]
Why do we call this the temperature? Over the course, we will see that
this quantity we decide to call “temperature” does behave as we would expect
temperature to behave. It is difficult to give further justification of this definition,
because even though we vaguely have some idea what temperature is like in
daily life, those ideas are very far from anything we can concretely write down
or even describe.
One reassuring property we can prove is the following:
Proposition. Two interacting systems in equilibrium have the same temperature.
Proof. Recall that the equilibrium energy $E_*$ is found by maximizing
\[
  S_1(E_i) + S_2(E_{\mathrm{total}} - E_i)
\]
over all possible $E_i$. Thus, at an equilibrium, the derivative of this expression has to vanish, and the derivative is exactly
\[
  \left.\frac{dS_1}{dE}\right|_{E^{(1)} = E_*} - \left.\frac{dS_2}{dE}\right|_{E^{(2)} = E_{\mathrm{total}} - E_*} = 0.
\]
So we need
\[
  \frac{1}{T_1} = \frac{1}{T_2}.
\]
In other words, we need
\[
  T_1 = T_2.
\]
Now suppose initially, our systems have different temperature. We would
expect energy to flow from the hotter system to the cooler system. This is indeed
the case.
Proposition. Suppose two systems with initial energies $E^{(1)}, E^{(2)}$ and temperatures $T_1, T_2$ are put into contact. If $T_1 > T_2$, then energy will flow from the first system to the second.
Proof. Since we are not in equilibrium, there must be some energy transfer from one system to the other. Suppose after time $\delta t$, the energy changes by
\[
  E^{(1)} \mapsto E^{(1)} + \delta E, \qquad E^{(2)} \mapsto E^{(2)} - \delta E,
\]
keeping the total energy constant. Then the change in entropy is given by
\[
  \delta S = \frac{dS_1}{dE}\, \delta E^{(1)} + \frac{dS_2}{dE}\, \delta E^{(2)} = \left( \frac{1}{T_1} - \frac{1}{T_2} \right) \delta E.
\]
By assumption, we know
\[
  \frac{1}{T_1} - \frac{1}{T_2} < 0,
\]
but by the second law of thermodynamics, we know $\delta S$ must be positive. So we must have $\delta E < 0$, i.e. energy flows from the first system to the second.
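As a sanity check, here is a minimal numerical sketch. The entropy functions used are a made-up toy form $S_i(E) = N_i k \log E$, chosen only so that each system has the simple temperature $T_i = E_i/(N_i k)$; they are not the entropy of any system treated in these notes.

```python
import numpy as np

k = 1.381e-23  # Boltzmann's constant in J/K

def S(E, N):
    # Toy entropy, chosen purely so that 1/T = dS/dE = N k / E, i.e. T = E/(N k).
    return N * k * np.log(E)

N1, N2 = 1e23, 1e23
E1, E2 = 2.0, 1.0                      # joules; then T1 = E1/(N1 k) > T2 = E2/(N2 k)
T1, T2 = E1 / (N1 * k), E2 / (N2 * k)

dE = -1e-3                             # a little energy leaves the hotter system 1
dS = S(E1 + dE, N1) + S(E2 - dE, N2) - S(E1, N1) - S(E2, N2)

print(T1, T2, dS)                      # T1 > T2 and dS > 0, as the second law demands
```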
So this notion of temperature agrees with the basic properties of temperature
we expect.
Note that these properties we've derived only depend on the fact that $\frac{1}{T}$ is a monotonically decreasing function of $T$. In principle, we could have picked any monotonically decreasing function of $T$, and set it to $\frac{dS}{dE}$. We will later see that this definition will agree with the other definitions of temperature we have previously seen, e.g. via the ideal gas law, and so this is indeed the “right” one.
Heat capacity
As we will keep on doing later, we can take different derivatives to get different interesting quantities. This time, we are going to get heat capacity. Recall that $T$ was a function of energy, $T = T(E)$. We will assume that we can invert this function, at least locally, to get $E$ as a function of $T$.
Definition (Heat capacity). The heat capacity of a system is
\[
  C = \frac{dE}{dT}.
\]
The specific heat capacity is
\[
  \frac{C}{\text{mass of system}}.
\]
The specific heat capacity is a property of the substance that makes up the system, and not how much stuff there is, as both $C$ and the mass scale linearly with the size of the system.
This is some quantity we can actually physically measure. We can measure
the temperature with a thermometer, and it is usually not too difficult to see how
much energy we are pumping into a system. Then by measuring the temperature
change, we can figure out the heat capacity.
In doing so, we can indirectly measure the entropy, or at least the changes in entropy. Note that we have
\[
  \frac{dS}{dT} = \frac{dS}{dE} \frac{dE}{dT} = \frac{C}{T}.
\]
Integrating up, if the temperature changes from $T_1$ to $T_2$, we know
\[
  \Delta S = \int_{T_1}^{T_2} \frac{C(T)}{T}\, dT.
\]
As promised, by measuring heat capacity experimentally, we can measure the change in entropy.
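Given measured heat-capacity data, this integral can be evaluated numerically. Here is a minimal sketch using a made-up, constant heat capacity, for which the exact answer $\Delta S = C \log(T_2/T_1)$ is available for comparison.

```python
import numpy as np

# Made-up, temperature-independent heat capacity (in J/K), for illustration only.
C0 = 10.0
T1, T2 = 100.0, 300.0

T = np.linspace(T1, T2, 10_001)
integrand = C0 / T                               # C(T)/T with C(T) = C0

# Trapezoidal rule for Delta S = int_{T1}^{T2} C(T)/T dT.
dS_numerical = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(T))
dS_exact = C0 * np.log(T2 / T1)                  # exact result for constant C

print(dS_numerical, dS_exact)                    # the two agree closely
```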
The heat capacity is useful in other ways. Recall that to find the equilibrium energy $E_*$, a necessary condition was that it satisfies
\[
  \frac{dS_1}{dE} - \frac{dS_2}{dE} = 0.
\]
However, we only know that the solution is an extremum, and not necessarily a maximum. To figure out if it is a maximum, we take the second derivative. Note that for a single system, we have
\[
  \frac{d^2 S}{dE^2} = \frac{d}{dE} \left( \frac{1}{T} \right) = -\frac{1}{T^2 C}.
\]
Applying this to two systems, one can check that entropy is maximized at $E^{(1)} = E_*$ if $C_1, C_2 > 0$. The actual computation is left as an exercise on the first example sheet.
Let’s look at some actual systems and try to calculate these quantities.
Example. Consider a 2-state system, where we have $N$ non-interacting particles with fixed positions. Each particle is either in $|{\uparrow}\rangle$ or $|{\downarrow}\rangle$. We can think of these as spins, for example. These two states have different energies
\[
  E_{\uparrow} = \varepsilon, \qquad E_{\downarrow} = 0.
\]
We let $N_{\uparrow}$ and $N_{\downarrow}$ be the number of particles in $|{\uparrow}\rangle$ and $|{\downarrow}\rangle$ respectively. Then the total energy of the system is
\[
  E = \varepsilon N_{\uparrow}.
\]
We want to calculate this quantity $\Omega(E)$. Here in this very contrived example, it is convenient to pick $\delta E < \varepsilon$, so that $\Omega(E)$ is just the number of ways of choosing $N_{\uparrow}$ particles from $N$. By basic combinatorics, we have
\[
  \Omega(E) = \frac{N!}{N_{\uparrow}!\, (N - N_{\uparrow})!},
\]
and
\[
  S(E) = k \log \frac{N!}{N_{\uparrow}!\, (N - N_{\uparrow})!}.
\]
This is not an incredibly useful formula. Since we assumed that $N$ and $N_{\uparrow}$ are huge, we can use Stirling's approximation
\[
  N! = \sqrt{2\pi N}\, N^N e^{-N} \left( 1 + O\left( \frac{1}{N} \right) \right).
\]
Then we have
\[
  \log N! = N \log N - N + \frac{1}{2} \log (2\pi N) + O\left( \frac{1}{N} \right).
\]
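As a quick numerical sanity check of this approximation (the values of $N$ below are arbitrary):

```python
from math import lgamma, log, pi

def log_factorial_exact(N):
    # log N! computed via the log-gamma function (avoids overflow).
    return lgamma(N + 1)

def log_factorial_stirling(N):
    # Stirling: log N! ~ N log N - N + (1/2) log(2 pi N)
    return N * log(N) - N + 0.5 * log(2 * pi * N)

for N in (10, 100, 10_000, 1_000_000):
    exact = log_factorial_exact(N)
    approx = log_factorial_stirling(N)
    print(N, exact, approx, exact - approx)  # the error shrinks like O(1/N)
```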
We just use the approximation three times to get
\begin{align*}
  S(E) &= k \left( N \log N - N - N_{\uparrow} \log N_{\uparrow} + N_{\uparrow} - (N - N_{\uparrow}) \log (N - N_{\uparrow}) + N - N_{\uparrow} \right)\\
       &= -k \left( (N - N_{\uparrow}) \log \frac{N - N_{\uparrow}}{N} + N_{\uparrow} \log \frac{N_{\uparrow}}{N} \right)\\
       &= -kN \left( \left( 1 - \frac{E}{N\varepsilon} \right) \log \left( 1 - \frac{E}{N\varepsilon} \right) + \frac{E}{N\varepsilon} \log \frac{E}{N\varepsilon} \right).
\end{align*}
This is better, but we can get much more out of it if we plot it:
[Plot of $S(E)$ against $E$: the entropy vanishes at $E = 0$ and $E = N\varepsilon$, and attains its maximum value $Nk \log 2$ at $E = N\varepsilon/2$.]
The temperature is
\[
  \frac{1}{T} = \frac{dS}{dE} = \frac{k}{\varepsilon} \log \left( \frac{N\varepsilon}{E} - 1 \right),
\]
and we can invert to get
\[
  \frac{N_{\uparrow}}{N} = \frac{E}{N\varepsilon} = \frac{1}{e^{\varepsilon/kT} + 1}.
\]
Suppose we get to control the temperature of the system, e.g. if we put it in contact with a heat reservoir. What happens as we vary our temperature?
As $T \to 0$, we have $N_{\uparrow} \to 0$. So the states all try to go to the ground state.
As $T \to \infty$, we find $N_{\uparrow}/N \to \frac{1}{2}$, and $E \to N\varepsilon/2$.
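Here is a small sketch evaluating this occupation fraction at a few temperatures; the value of $\varepsilon$ is an arbitrary choice made only to set the temperature scale.

```python
import numpy as np

k = 1.381e-23      # Boltzmann's constant, J/K
eps = 1.0e-21      # arbitrary level spacing (about 0.006 eV), for illustration

def up_fraction(T):
    # N_up / N = 1 / (exp(eps / kT) + 1)
    return 1.0 / (np.exp(eps / (k * T)) + 1.0)

for T in (1.0, 10.0, 100.0, 1e4, 1e7):
    print(T, up_fraction(T))
# The fraction tends to 0 as T -> 0 (everything in the ground state)
# and to 1/2 as T -> infinity.
```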
The second result is a bit weird. As $T \to \infty$, we might expect all things to go to the maximum energy level, and not just half of them.
To confuse ourselves further, we can plot another graph, for $\frac{1}{T}$ vs $E$. The graph looks like
[Plot of $1/T$ against $E$: $1/T$ decreases with $E$, is positive for $E < N\varepsilon/2$, vanishes at $E = N\varepsilon/2$, and is negative for $E > N\varepsilon/2$.]
We see that having energy $E > N\varepsilon/2$ corresponds to negative temperature, and to go from positive temperature to negative temperature, we need to pass through infinite temperature. So in some sense, negative temperature is “hotter” than infinite temperature.
What is going on? By definition, negative $T$ means $\Omega(E)$ is a decreasing function of energy. This is a very unusual situation. In this system, all the particles are fixed, and have no kinetic energy. Consequently, the possible energy levels are bounded. If we included kinetic energy in the system, then the kinetic energy can be arbitrarily large. In this case, $\Omega(E)$ is usually an increasing function of $E$.
Negative $T$ has indeed been observed experimentally. This requires setups
has indeed been observed experimentally. This requires setups
where the kinetic energy is not so important in the range of energies we are
talking about. One particular scenario where this is observed is in nuclear spins
of crystals in magnetic fields. If we have a magnetic field, then naturally, most
of the spins will align with the field. We now suddenly flip the field, and then
most of the spins are anti-aligned, and this can give us a negative temperature
state.
Now we can't measure negative temperature by sticking a thermometer into the material and getting a negative answer. Something that can be interestingly measured is the heat capacity
\[
  C = \frac{dE}{dT} = \frac{N\varepsilon^2}{kT^2} \frac{e^{\varepsilon/kT}}{\left( e^{\varepsilon/kT} + 1 \right)^2}.
\]
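As a quick sketch, we can evaluate $C(T)$ on a grid and locate the peak numerically; the values of $N$ and $\varepsilon$ below are arbitrary. The maximum indeed sits where $kT$ is comparable to $\varepsilon$.

```python
import numpy as np

k = 1.381e-23      # J/K
eps = 1.0e-21      # arbitrary level spacing (J)
N = 1.0e23         # arbitrary particle number

def heat_capacity(T):
    x = eps / (k * T)
    # C = (N eps^2 / (k T^2)) * e^{eps/kT} / (e^{eps/kT} + 1)^2
    return N * eps**2 / (k * T**2) * np.exp(x) / (np.exp(x) + 1.0)**2

T = np.linspace(1.0, 500.0, 100_000)
C = heat_capacity(T)
T_peak = T[np.argmax(C)]

print(T_peak, k * T_peak / eps)   # k T_peak / eps comes out around 0.42
```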
This again exhibits some peculiar properties. We begin by looking at a plot:
[Plot of $C$ against $T$: the heat capacity rises from zero, peaks around $kT \sim \varepsilon$, and falls back to zero at high temperature.]
By looking at the formula, we see that the value of $kT$ at the maximum is related to the microscopic $\varepsilon$. If we know the value of $k$, then we can use the macroscopic observation of $C$ to deduce something about the microscopic $\varepsilon$.
Note that $C$ is proportional to $N$. As $T \to 0$, we have
\[
  C \propto T^{-2} e^{-\varepsilon/kT},
\]
and this is a function that decreases very rapidly as $T \to 0$; in fact, this is one of the favorite examples in analysis where all derivatives of the function at $0$ vanish. Ultimately, this is due to the energy gap between the ground state and the first excited state.
Another peculiarity of this plot is that the heat capacity vanishes at high temperature, but this is due to the peculiar property of this particular system at high temperature. In a general system, we expect the heat capacity to increase with temperature.
How much of this is actually physical? The answer is “not much”. This is not surprising, because we didn't really do much physics in these computations. For most solids, the contribution to $C$ from spins is swamped by other effects such as contributions of phonons (quantized vibrations in the solid) or electrons. In this case, $C(T)$ is monotonic in $T$.
However, there are some very peculiar materials for which we obtain a small local maximum in $C(T)$ for very small $T$, before increasing monotonically, which is due to the contributions of spin:
[Plot of $C$ against $T$ for such a material: a small bump at low $T$ on top of an otherwise monotonically increasing curve.]