1 Fundamentals of statistical mechanics

II Statistical Physics

1.1 Microcanonical ensemble

We begin by considering a rather general system. Suppose we have an isolated system containing $N$ particles, where $N$ is a Large Number™. The canonical example to keep in mind is a box of gas detached from reality.

Definition (Microstate). The microstate of a system is the actual (quantum) state of the system. This gives a complete description of the system.

As one would expect, the microstate is very complicated and infeasible to

describe, especially when we have many particles. In statistical physics, we

observe that many microstates are indistinguishable macroscopically. Thus, we

only take note of some macroscopically interesting quantities, and use these

macroscopic quantities to put a probability distribution on the microstates.

More precisely, we let $\{|n\rangle\}$ be a basis of normalized eigenstates, say

$$\hat{H}|n\rangle = E_n|n\rangle.$$

We let $p(n)$ be the probability that the microstate is $|n\rangle$. Note that this probability is not the quantum probability we are used to. It is some probability assigned to reflect our ignorance of the system. Given such probabilities, we can define the expectation of an operator in the least imaginative way:

Definition (Expectation value). Given a probability distribution $p(n)$ on the states, the expectation value of an operator $O$ is

$$\langle O\rangle = \sum_n p(n)\,\langle n|O|n\rangle.$$

If one knows about density operators, we can describe the system as a mixed state with density operator

$$\rho = \sum_n p(n)\,|n\rangle\langle n|.$$

There is an equivalent way of looking at this. We can consider an ensemble consisting of $W \gg 1$ independent copies of our system such that $W p(n)$ many copies are in the microstate $|n\rangle$. Then the expectation is just the average over the ensemble. For most purposes, how we think about this doesn’t really matter.

We shall further assume our system is in equilibrium, i.e. the probability distribution $p(n)$ does not change in time. So in particular $\langle O\rangle$ is independent of time. Of course, this does not mean the particles stop moving. The particles are still whizzing around. It’s just that the statistical distribution does not change.

In this course, we will mostly be talking about equilibrium systems. When we

get out of equilibrium, things become very complicated.

The idea of statistical physics is that we have some partial knowledge about

the system. For example, we might know its total energy. The microstates that

are compatible with this partial knowledge are called accessible. The fundamental

assumption of statistical mechanics is then

An isolated system in equilibrium is equally likely to be in any of the

accessible microstates.

Thus, different probability distributions, or different ensembles, are distinguished by the partial knowledge we have.

Definition (Microcanonical ensemble). In a microcanonical ensemble, we know the energy is between $E$ and $E + \delta E$, where $\delta E$ is the accuracy of our measuring device. The accessible microstates are those with energy $E \leq E_n \leq E + \delta E$. We let $\Omega(E)$ be the number of such states.

In practice, $\delta E$ is much, much larger than the spacing of energy levels, and so $\Omega(E) \gg 1$. A priori, it seems like our theory will depend on what the value of $\delta E$ is, but as we develop the theory, we will see that this doesn’t really matter.

It is crucial here that we are working with a quantum system, so the possible states are discrete, and it makes sense to count the number of states. We need to do quite a bit more work if we want to do this classically.

Example. Suppose we have $N = 10^{23}$ particles, and each particle can occupy two states $|\uparrow\rangle$ and $|\downarrow\rangle$, which have the same energy $\varepsilon$. Then we always have $N\varepsilon$ total energy, and we have

$$\Omega(N\varepsilon) = 2^{10^{23}}.$$

This is a fantastically huge, mind-boggling number. This is the kind of number we are talking about.

By the fundamental assumption, we can write

$$p(n) = \begin{cases} \dfrac{1}{\Omega(E)} & \text{if } E \leq E_n \leq E + \delta E \\ 0 & \text{otherwise} \end{cases}.$$

This is the characteristic distribution of the microcanonical ensemble.

It turns out it is not very convenient to work with $\Omega(E)$. In particular, $\Omega(E)$ is not linear in $N$, the number of particles. Instead, it scales as an exponential of $N$. So we take the logarithm.

Definition (Boltzmann entropy). The (Boltzmann) entropy is defined as

$$S(E) = k \log \Omega(E),$$

where $k = 1.381 \times 10^{-23}\ \mathrm{J\,K^{-1}}$ is Boltzmann’s constant.

This annoying constant $k$ is necessary because when people started doing thermodynamics, they didn’t know about statistical physics, and picked weird conventions.

We wrote our expressions as $S(E)$, instead of $S(E, \delta E)$. As promised, the value of $\delta E$ doesn’t really matter. We know that $\Omega(E)$ will scale approximately linearly with $\delta E$. So if we, say, double $\delta E$, then $S(E)$ will increase by $k \log 2$, which is incredibly tiny compared to $S(E) = k \log \Omega(E)$. So it doesn’t matter which value of $\delta E$ we pick.

Even if you are not so convinced that multiplying $10^{10^{23}}$ by a factor of 2 or adding $\log 2$ to $10^{23}$ does not really matter, you should be reassured that at the end, we will rarely talk about $\Omega(E)$ or $S(E)$ itself. Instead, we will often divide two different $\Omega$’s to get probabilities, or differentiate $S$ to get other interesting quantities. In these cases, the factors really do not matter.
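To get a feel for the sizes involved, here is a minimal Python sketch. It assumes the illustrative two-state count $\Omega = 2^N$ with $N = 10^{23}$, and that rescaling $\delta E$ by some factor rescales $\Omega$ by the same factor. The function name is hypothetical and not part of the notes.

```python
import math

def relative_entropy_shift(n_particles: float, factor: float = 2.0) -> float:
    """Relative change in S when Omega is multiplied by `factor`.

    Assumes Omega = 2**N (so S/k = N log 2), purely for illustration.
    """
    s_over_k = n_particles * math.log(2.0)  # S/k = log(Omega)
    shift = math.log(factor)                # extra term added to S/k
    return shift / s_over_k

# Doubling delta-E for N = 1e23 particles changes S by a relative amount of order 1e-23.
print(relative_entropy_shift(1e23))
```

The shift $k \log 2$ is independent of $N$, while $S$ itself grows linearly in $N$, which is why the choice of $\delta E$ drops out for macroscopic systems.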

The second nice property of the entropy is that it is additive. If we have two non-interacting systems with energies $E_{(1)}, E_{(2)}$, then the total number of states of the combined system is

$$\Omega(E_{(1)}, E_{(2)}) = \Omega_1(E_{(1)})\,\Omega_2(E_{(2)}).$$

So when we take the logarithm, we find

$$S(E_{(1)}, E_{(2)}) = S_1(E_{(1)}) + S_2(E_{(2)}).$$

Of course, this is not very interesting, until we bring our systems together and let them interact with each other.

Interacting systems

Suppose we bring the two systems together, and let them exchange energy. Then the energy of the individual systems is no longer fixed, and only the total energy

$$E_{\text{total}} = E_{(1)} + E_{(2)}$$

is fixed. Then we find that

$$\Omega(E_{\text{total}}) = \sum_{E_i} \Omega_1(E_i)\,\Omega_2(E_{\text{total}} - E_i),$$

where we sum over all possible energy levels of the first system. In terms of the entropy of the system, we have

$$\Omega(E_{\text{total}}) = \sum_{E_i} \exp\left(\frac{S_1(E_i)}{k} + \frac{S_2(E_{\text{total}} - E_i)}{k}\right).$$

We can be a bit more precise with what the sum means. We are not summing over all eigenstates. Recall that we have previously fixed an accuracy $\delta E$. So we can imagine dividing the whole energy spectrum into chunks of size $\delta E$, and here we are summing over the chunks.

We know that $S_{1,2}/k \sim N_{1,2} \sim 10^{23}$, which is a ridiculously large number. So the sum is overwhelmingly dominated by the term with the largest exponent. Suppose this is maximized when $E_i = E_*$. Then we have

$$S(E_{\text{total}}) = k \log \Omega(E_{\text{total}}) \approx S_1(E_*) + S_2(E_{\text{total}} - E_*).$$

Again, we are not claiming that only the term coming from $E_*$ makes a significant contribution. Maybe one or two energy levels next to $E_*$ are also very significant, but taking these into account will only multiply $\Omega(E_{\text{total}})$ by a (relatively) small constant, and hence contributes a small additive term to $S(E_{\text{total}})$, which can be neglected.
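The claim that the largest term dominates the logarithm of the sum can be checked numerically. The sketch below is only an illustration under an assumed model: two collections of two-state particles (sizes `n1` and `n2`, energies counted in units of the level spacing), so that each $\Omega_i$ is a binomial coefficient. The function names are hypothetical.

```python
import math

def log_binom(n: int, k: int) -> float:
    """log of the binomial coefficient C(n, k), computed via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def log_sum_and_max(n1: int, n2: int, n_up_total: int):
    """Return (log of sum_j Omega_1(j) Omega_2(total - j), log of the largest term)."""
    lo = max(0, n_up_total - n2)
    hi = min(n1, n_up_total)
    log_terms = [log_binom(n1, j) + log_binom(n2, n_up_total - j)
                 for j in range(lo, hi + 1)]
    biggest = max(log_terms)
    # log-sum-exp: factor out the largest term so the exponentials cannot overflow
    log_sum = biggest + math.log(sum(math.exp(t - biggest) for t in log_terms))
    return log_sum, biggest

log_sum, log_max = log_sum_and_max(10_000, 20_000, 9_000)
# Relative error made by keeping only the largest term; it shrinks as the systems grow.
print((log_sum - log_max) / log_sum)
```

Even for these modest system sizes the correction to $\log \Omega$ is a tiny fraction of the total, and for $N \sim 10^{23}$ it is utterly negligible.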

Now given any $E_{(1)}$, what is the probability that the actual energy of the first system is $E_{(1)}$? For convenience, we write $E_{(2)} = E_{\text{total}} - E_{(1)}$, the energy of the second system. Then the probability desired is

$$\frac{\Omega_1(E_{(1)})\,\Omega_2(E_{(2)})}{\Omega(E_{\text{total}})} = \exp\left(\frac{1}{k}\left(S_1(E_{(1)}) + S_2(E_{(2)}) - S(E_{\text{total}})\right)\right).$$

Again recall that the numbers at stake are unimaginably huge. So if $S_1(E_{(1)}) + S_2(E_{(2)})$ is even slightly different from $S(E_{\text{total}})$, then the probability is effectively zero. And by above, for the two quantities to be close, we need $E_{(1)} = E_*$. So for all practical purposes, the value of $E_{(1)}$ is fixed at $E_*$.

Now imagine we prepare two systems separately with energies $E_{(1)}$ and $E_{(2)}$ such that $E_{(1)} \neq E_*$, and then bring the systems together. Then we are no longer in equilibrium. $E_{(1)}$ will change until it takes the value $E_*$, and the entropy of the system will change from $S_1(E_{(1)}) + S_2(E_{(2)})$ to $S_1(E_*) + S_2(E_{\text{total}} - E_*)$. In particular, the entropy increases.

Law (Second law of thermodynamics). The entropy of an isolated system increases (or remains the same) in any physical process. In equilibrium, the entropy attains its maximum value.

This prediction is verified by virtually all observations of physics.

While our derivation did not show it is impossible to violate the second law

of thermodynamics, it is very very very very very very very very unlikely to be

violated.

Temperature

Having defined entropy, the next interesting thing we can define is the temperature. We assume that $S$ is a smooth function in $E$. Then we can define the temperature as follows:

Definition (Temperature). The temperature is defined to be

$$\frac{1}{T} = \frac{\mathrm{d}S}{\mathrm{d}E}.$$

Why do we call this the temperature? Over the course, we will see that

this quantity we decide to call “temperature” does behave as we would expect

temperature to behave. It is difficult to give further justification of this definition,

because even though we vaguely have some idea what temperature is like in

daily life, those ideas are very far from anything we can concretely write down

or even describe.

One reassuring property we can prove is the following:

Proposition. Two interacting systems in equilibrium have the same temperature.

Proof. Recall that the equilibrium energy $E_*$ is found by maximizing

$$S_1(E_i) + S_2(E_{\text{total}} - E_i)$$

over all possible $E_i$. Thus, at an equilibrium, the derivative of this expression has to vanish, and the derivative is exactly

$$\left.\frac{\mathrm{d}S_1}{\mathrm{d}E}\right|_{E_{(1)} = E_*} - \left.\frac{\mathrm{d}S_2}{\mathrm{d}E}\right|_{E_{(2)} = E_{\text{total}} - E_*} = 0.$$

So we need

$$\frac{1}{T_1} = \frac{1}{T_2}.$$

In other words, we need

$$T_1 = T_2.$$
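As a numerical sanity check of this proposition, the following sketch maximizes $S_1(E) + S_2(E_{\text{total}} - E)$ by brute force and compares the two values of $1/T = \mathrm{d}S/\mathrm{d}E$ at the maximum. The entropy $S(E) = k\,a \log E$ used here is a toy form assumed purely for illustration (it is not derived in the notes), as are the function names and the parameter values.

```python
import math

K_B = 1.0  # units in which Boltzmann's constant is 1

def entropy(a: float, energy: float) -> float:
    """Toy entropy S(E) = k a log E (an assumed form, for illustration only)."""
    return K_B * a * math.log(energy)

def inverse_temperature(a: float, energy: float) -> float:
    """1/T = dS/dE for the toy entropy above."""
    return K_B * a / energy

def equilibrium_energy(a1: float, a2: float, e_total: float, steps: int = 200_000) -> float:
    """Grid search for the energy of system 1 that maximizes the total entropy."""
    best_e, best_s = 0.0, -math.inf
    for i in range(1, steps):
        e = e_total * i / steps
        s = entropy(a1, e) + entropy(a2, e_total - e)
        if s > best_s:
            best_e, best_s = e, s
    return best_e

e_star = equilibrium_energy(a1=1.5, a2=3.0, e_total=9.0)
inv_t1 = inverse_temperature(1.5, e_star)
inv_t2 = inverse_temperature(3.0, 9.0 - e_star)
print(e_star, inv_t1, inv_t2)  # the two inverse temperatures agree closely
```

For this toy entropy the maximum can also be found by hand, $E_* = a_1 E_{\text{total}}/(a_1 + a_2)$, which gives a quick check on the grid search.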

Now suppose initially, our systems have different temperatures. We would expect energy to flow from the hotter system to the cooler system. This is indeed the case.

Proposition. Suppose two systems with initial energies $E_{(1)}, E_{(2)}$ and temperatures $T_1, T_2$ are put into contact. If $T_1 > T_2$, then energy will flow from the first system to the second.

Proof. Since we are not in equilibrium, there must be some energy transfer from one system to the other. Suppose after time $\delta t$, the energy changes by

$$E_{(1)} \mapsto E_{(1)} + \delta E, \qquad E_{(2)} \mapsto E_{(2)} - \delta E,$$

keeping the total energy constant. Then the change in entropy is given by

$$\delta S = \frac{\mathrm{d}S_1}{\mathrm{d}E}\,\delta E_{(1)} + \frac{\mathrm{d}S_2}{\mathrm{d}E}\,\delta E_{(2)} = \left(\frac{1}{T_1} - \frac{1}{T_2}\right)\delta E.$$

By assumption, we know

$$\frac{1}{T_1} - \frac{1}{T_2} < 0,$$

but by the second law of thermodynamics, we know the entropy must increase, i.e. $\delta S > 0$. So we must have $\delta E < 0$, i.e. energy flows from the first system to the second.

So this notion of temperature agrees with the basic properties of temperature

we expect.

Note that the properties we’ve derived depend only on the fact that $\frac{1}{T}$ is a monotonically decreasing function of $T$. In principle, we could have picked any monotonically decreasing function of $T$, and set it to $\frac{\mathrm{d}S}{\mathrm{d}E}$. We will later see that this definition will agree with the other definitions of temperature we have previously seen, e.g. via the ideal gas law, and so this is indeed the “right” one.

Heat capacity

As we will keep on doing later, we can take different derivatives to get different interesting quantities. This time, we are going to get heat capacity. Recall that $T$ was a function of energy, $T = T(E)$. We will assume that we can invert this function, at least locally, to get $E$ as a function of $T$.

Definition (Heat capacity). The heat capacity of a system is

$$C = \frac{\mathrm{d}E}{\mathrm{d}T}.$$

The specific heat capacity is

$$\frac{C}{\text{mass of system}}.$$

The specific heat capacity is a property of the substance that makes up the system, and not how much stuff there is, as both $C$ and the mass scale linearly with the size of the system.

This is some quantity we can actually physically measure. We can measure

the temperature with a thermometer, and it is usually not too difficult to see how

much energy we are pumping into a system. Then by measuring the temperature

change, we can figure out the heat capacity.

In doing so, we can indirectly measure the entropy, or at least the changes in entropy. Note that we have

$$\frac{\mathrm{d}S}{\mathrm{d}T} = \frac{\mathrm{d}S}{\mathrm{d}E}\frac{\mathrm{d}E}{\mathrm{d}T} = \frac{C}{T}.$$

Integrating up, if the temperature changes from $T_1$ to $T_2$, we know

$$\Delta S = \int_{T_1}^{T_2} \frac{C(T)}{T}\,\mathrm{d}T.$$

As promised, by measuring heat capacity experimentally, we can measure the change in entropy.
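In practice one would have $C(T)$ as measured data and evaluate the integral numerically. The sketch below does this with a simple trapezoidal rule. The constant heat capacity of 2.0 J/K between 300 K and 600 K is a made-up input chosen so the answer can be checked against $C \log(T_2/T_1)$, and the function name is hypothetical.

```python
import math
from typing import Callable

def entropy_change(heat_capacity: Callable[[float], float],
                   t1: float, t2: float, steps: int = 10_000) -> float:
    """Trapezoidal estimate of the integral of C(T)/T from t1 to t2."""
    h = (t2 - t1) / steps
    total = 0.5 * (heat_capacity(t1) / t1 + heat_capacity(t2) / t2)
    for i in range(1, steps):
        t = t1 + i * h
        total += heat_capacity(t) / t
    return total * h

# Hypothetical input: constant C = 2.0 J/K, heated from 300 K to 600 K.
delta_s = entropy_change(lambda t: 2.0, 300.0, 600.0)
print(delta_s, 2.0 * math.log(2.0))  # numerical and closed-form values agree closely
```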

The heat capacity is useful in other ways. Recall that to find the equilibrium energy $E_*$, a necessary condition was that it satisfies

$$\frac{\mathrm{d}S_1}{\mathrm{d}E} - \frac{\mathrm{d}S_2}{\mathrm{d}E} = 0.$$

However, we only know that the solution is an extremum, and not necessarily a maximum. To figure out if it is the maximum, we take the second derivative. Note that for a single system, we have

$$\frac{\mathrm{d}^2 S}{\mathrm{d}E^2} = \frac{\mathrm{d}}{\mathrm{d}E}\left(\frac{1}{T}\right) = -\frac{1}{T^2 C}.$$

Applying this to two systems, one can check that entropy is maximized at $E_{(1)} = E_*$ if $C_1, C_2 > 0$. The actual computation is left as an exercise on the first example sheet.

Let’s look at some actual systems and try to calculate these quantities.

Example. Consider a 2-state system, where we have $N$ non-interacting particles with fixed positions. Each particle is either in $|\uparrow\rangle$ or $|\downarrow\rangle$. We can think of these as spins, for example. These two states have different energies

$$E_\uparrow = \varepsilon, \quad E_\downarrow = 0.$$

We let $N_\uparrow$ and $N_\downarrow$ be the number of particles in $|\uparrow\rangle$ and $|\downarrow\rangle$ respectively. Then the total energy of the system is

$$E = \varepsilon N_\uparrow.$$

We want to calculate this quantity $\Omega(E)$. Here in this very contrived example, it is convenient to pick $\delta E < \varepsilon$, so that $\Omega(E)$ is just the number of ways of choosing $N_\uparrow$ particles from $N$. By basic combinatorics, we have

$$\Omega(E) = \frac{N!}{N_\uparrow!\,(N - N_\uparrow)!},$$

and

$$S(E) = k \log\left(\frac{N!}{N_\uparrow!\,(N - N_\uparrow)!}\right).$$

This is not an incredibly useful formula. Since we assumed that $N$ and $N_\uparrow$ are huge, we can use Stirling’s approximation

$$N! = \sqrt{2\pi N}\,N^N e^{-N}\left(1 + O\left(\frac{1}{N}\right)\right).$$

Then we have

$$\log N! = N \log N - N + \frac{1}{2}\log(2\pi N) + O\left(\frac{1}{N}\right).$$
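The quality of this approximation is easy to check numerically. The following sketch compares the truncated Stirling expression with $\log N!$ computed via the standard library's log-gamma function, using $\log N! = \log\Gamma(N + 1)$. The helper name is illustrative only.

```python
import math

def stirling_log_factorial(n: float) -> float:
    """N log N - N + (1/2) log(2 pi N), i.e. Stirling without the O(1/N) term."""
    return n * math.log(n) - n + 0.5 * math.log(2.0 * math.pi * n)

for n in (10, 100, 1000):
    exact = math.lgamma(n + 1)  # log(N!)
    # The leftover difference is the O(1/N) term, roughly 1/(12 N).
    print(n, exact - stirling_log_factorial(n))
```

Already at $N = 100$ the error is below $10^{-3}$, so for $N \sim 10^{23}$ the correction is far beyond anything measurable.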

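We just use the approximation three times to get

$$\begin{aligned}
S(E) &= k\left(N \log N - N - N_\uparrow \log N_\uparrow + N_\uparrow - (N - N_\uparrow)\log(N - N_\uparrow) + N - N_\uparrow\right) \\
&= -k\left((N - N_\uparrow)\log\left(\frac{N - N_\uparrow}{N}\right) + N_\uparrow \log\left(\frac{N_\uparrow}{N}\right)\right) \\
&= -kN\left(\left(1 - \frac{E}{N\varepsilon}\right)\log\left(1 - \frac{E}{N\varepsilon}\right) + \frac{E}{N\varepsilon}\log\left(\frac{E}{N\varepsilon}\right)\right).
\end{aligned}$$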

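This is better, but we can get much more out of it if we plot it:

[Figure: plot of $S(E)$ against $E$ for $0 \leq E \leq N\varepsilon$, with $E = N\varepsilon/2$ and $S = Nk \log 2$ marked on the axes.]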
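As a rough check on the Stirling form of $S(E)$ for the two-state system, the sketch below compares it with the exact $\log$ of the binomial coefficient. It works with $S/k$ and with $x = E/(N\varepsilon) = N_\uparrow/N$; the function names and the choice $N = 10^6$ are illustrative assumptions.

```python
import math

def entropy_exact(n: int, n_up: int) -> float:
    """S/k = log( N! / (N_up! (N - N_up)!) ), computed via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(n_up + 1) - math.lgamma(n - n_up + 1)

def entropy_stirling(n: int, n_up: int) -> float:
    """S/k from the leading-order Stirling result, with x = N_up / N."""
    x = n_up / n
    return -n * ((1.0 - x) * math.log(1.0 - x) + x * math.log(x))

n = 10**6
for n_up in (n // 10, n // 2):
    print(n_up, entropy_exact(n, n_up), entropy_stirling(n, n_up))
```

At $N_\uparrow = N/2$ the Stirling form gives exactly $N \log 2$, the maximum of the curve, and the exact value differs from it only by a term of order $\log N$.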

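The temperature is

$$\frac{1}{T} = \frac{\mathrm{d}S}{\mathrm{d}E} = \frac{k}{\varepsilon}\log\left(\frac{N\varepsilon}{E} - 1\right),$$

and we can invert to get

$$\frac{N_\uparrow}{N} = \frac{E}{N\varepsilon} = \frac{1}{e^{\varepsilon/kT} + 1}.$$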

Suppose we get to control the temperature of the system, e.g. if we put it in contact with a heat reservoir. What happens as we vary our temperature?

– As $T \to 0$, we have $N_\uparrow \to 0$. So the particles all try to go to the ground state.

– As $T \to \infty$, we find $N_\uparrow/N \to \frac{1}{2}$, and $E \to N\varepsilon/2$.

The second result is a bit weird. As $T \to \infty$, we might expect all things to go to the maximum energy level, and not just half of them.
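These two limits can be seen directly by evaluating the occupation fraction $N_\uparrow/N = 1/(e^{\varepsilon/kT} + 1)$ at a few temperatures. In this small sketch the temperature is passed as the dimensionless ratio $kT/\varepsilon$; the function name is an illustrative choice.

```python
import math

def fraction_up(kt_over_eps: float) -> float:
    """N_up / N = 1 / (exp(eps/kT) + 1), with temperature given as kT/eps."""
    return 1.0 / (math.exp(1.0 / kt_over_eps) + 1.0)

for kt in (0.01, 0.1, 1.0, 10.0, 1000.0):
    # Tends to 0 at low temperature and approaches 1/2 from below at high temperature.
    print(kt, fraction_up(kt))
```

For any positive temperature the fraction stays strictly below $\frac{1}{2}$, so heating alone never puts more than half the particles in the upper state.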

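To confuse ourselves further, we can plot another graph, for $\frac{1}{T}$ vs $E$. The graph looks like

[Figure: plot of $\frac{1}{T}$ against $E$ for $0 \leq E \leq N\varepsilon$, with $E = N\varepsilon/2$ marked on the $E$ axis.]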

We see that having energy $> N\varepsilon/2$ corresponds to negative temperature, and to go from positive temperature to negative temperature, we need to pass through infinite temperature. So in some sense, negative temperature is “hotter” than infinite temperature.
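The sign change can be read off from the formula $\varepsilon/kT = \log(N\varepsilon/E - 1)$ for the two-state system. The sketch below evaluates it at a few energy fractions $x = E/(N\varepsilon)$; the function name is hypothetical.

```python
import math

def eps_over_kt(x: float) -> float:
    """eps/(kT) = log(1/x - 1) for energy fraction x = E/(N eps), with 0 < x < 1."""
    return math.log(1.0 / x - 1.0)

for x in (0.1, 0.4, 0.5, 0.6, 0.9):
    # Positive below x = 1/2, zero at x = 1/2 (infinite T), negative above.
    print(x, eps_over_kt(x))
```

Note that $1/T$ passes smoothly through zero at $E = N\varepsilon/2$, whereas $T$ itself jumps from $+\infty$ to $-\infty$, which is one reason $1/T$ is the more natural variable here.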

What is going on? By definition, negative $T$ means $\Omega(E)$ is a decreasing function of energy. This is a very unusual situation. In this system, all the particles are fixed, and have no kinetic energy. Consequently, the possible energy levels are bounded. If we included kinetic energy into the system, then kinetic energy can be arbitrarily large. In this case, $\Omega(E)$ is usually an increasing function of $E$.

Negative $T$ has indeed been observed experimentally. This requires setups

where the kinetic energy is not so important in the range of energies we are

talking about. One particular scenario where this is observed is in nuclear spins

of crystals in magnetic fields. If we have a magnetic field, then naturally, most

of the spins will align with the field. We now suddenly flip the field, and then

most of the spins are anti-aligned, and this can give us a negative temperature

state.

Now we can’t measure negative temperature by sticking a thermometer into the material and getting a negative answer. Something that can be interestingly measured is the heat capacity

$$C = \frac{\mathrm{d}E}{\mathrm{d}T} = \frac{N\varepsilon^2}{kT^2}\,\frac{e^{\varepsilon/kT}}{(e^{\varepsilon/kT} + 1)^2}.$$

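This again exhibits some peculiar properties. We begin by looking at a plot:

[Figure: plot of $C$ against $T$, with the position of the maximum, $kT \sim \varepsilon$, marked on the $T$ axis.]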

By looking at the formula, we see that the value of $kT$ at which the maximum occurs is related to the microscopic $\varepsilon$. If we know the value of $k$, then we can use the macroscopic observation of $C$ to deduce something about the microscopic $\varepsilon$.
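To make the relation between the peak and $\varepsilon$ concrete, the following sketch locates the maximum of the two-state heat capacity numerically. It works with the dimensionless quantities $C/(Nk)$ and $t = kT/\varepsilon$; the grid range and the function name are illustrative choices.

```python
import math

def heat_capacity_per_particle(t: float) -> float:
    """C / (N k) for the two-state system, as a function of t = kT/eps."""
    b = 1.0 / t  # b = eps / (kT)
    return b * b * math.exp(b) / (math.exp(b) + 1.0) ** 2

# Scan t from 0.05 up to about 5 in steps of 0.001 and pick the largest value.
grid = [0.05 + 0.001 * i for i in range(5000)]
t_peak = max(grid, key=heat_capacity_per_particle)
print(t_peak, heat_capacity_per_particle(t_peak))
```

The peak sits at a fixed value of $kT/\varepsilon$ of order one, so measuring the temperature of the maximum in $C$ determines $\varepsilon$ up to that known numerical factor.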

Note that $C$ is proportional to $N$. As $T \to 0$, we have

$$C \propto T^{-2} e^{-\varepsilon/kT},$$

and this is a function that decreases very rapidly as $T \to 0$, and in fact this is one of the favorite examples in analysis where all derivatives of the function at 0 vanish. Ultimately, this is due to the energy gap between the ground state and the first excited state.

Another peculiarity of this plot is that the heat capacity vanishes at high

temperature, but this is due to the peculiar property of the system at high

temperature. In a general system, we expect the heat capacity to increase with

temperature.

How much of this is actually physical? The answer is “not much”. This is

not surprising, because we didn’t really do much physics in these computations.

For most solids, the contribution to $C$ from spins is swamped by other effects such as contributions of phonons (quantized vibrations in the solid) or electrons. In this case, $C(T)$ is monotonic in $T$.

However, there are some very peculiar materials for which we obtain a small local maximum in $C(T)$ for very small $T$, before increasing monotonically, which is due to the contributions of spin:

[Figure: plot of $C$ against $T$.]