2Classification of chains and states
IB Markov Chains
2.2 Recurrence or transience
The major focus of this chapter is recurrence and transience. This was something
that came up when we discussed random walks in IA Probability — given a
random walk, say starting at 0, what is the probability that we will return to
0 later on? Recurrence and transience is a qualitative way of answering this
question. As we mentioned, we will mostly focus on irreducible chains. So by
definition there is always a non-zero probability of returning to 0. Hence the
question we want to ask is whether we are going to return to 0 with certainty, i.e.
with probability 1. If we are bound to return, then we say the state is recurrent.
Otherwise, we say it is transient.
It should be clear that this notion is usually interesting only for an infinite
state space. If we have an infinite state space, we might get transience because we
are very likely to drift away to a place far, far away and never return. However,
in a finite state space, this can’t happen. Transience can occur only if we get
stuck in a place and can’t leave, i.e. we are not in an irreducible state space.
These are not interesting.
Notation. For convenience, we will introduce some notations. We write
P
i
(A) = P(A | X
0
= i),
and
E
i
(Z) = E(Z | X
0
= i).
Suppose we start from
i
, and randomly wander around. Eventually, we may
or may not get to
j
. If we do, there is a time at which we first reach
j
. We call
this the first passage time.
Definition (First passage time and probability). The first passage time of
j ∈ S
starting from i is
T
j
= min{n ≥ 1 : X
n
= j}.
Note that this implicitly depends on
i
. Here we require
n ≥
1. Otherwise
T
i
would always be 0.
The first passage probability is
f
ij
(n) = P
i
(T
j
= n).
Definition (Recurrent state). A state i ∈ S is recurrent (or persistent) if
P
i
(T
i
< ∞) = 1,
i.e. we will eventually get back to the state. Otherwise, we call the state transient.
Note that transient does not mean we don’t get back. It’s just that we are
not sure that we will get back. We can show that if a state is recurrent, then
the probability that we return to i infinitely many times is also 1.
Our current objective is to show the following characterization of recurrence.
Theorem. i is recurrent iff
P
n
p
i,i
(n) = ∞.
The technique to prove this would be to use generating functions. We need to
first decide what sequence to work with. For any fixed
i, j
, consider the sequence
p
ij
(n) as a sequence in n. Then we define
P
i,j
(s) =
∞
X
n=0
p
i,j
(n)s
n
.
We also define
F
i,j
(S) =
∞
X
n=0
f
i,j
(n)s
n
,
where
f
i,j
is our first passage probability. For the sake of clarity, we make it
explicit that p
i,j
(0) = δ
i,j
, and f
i,j
(0) = 0.
Our proof would be heavily based on the result below:
Theorem.
P
i,j
(s) = δ
i,j
+ F
i,j
(s)P
j,j
(s),
for −1 < s ≤ 1.
Proof. Using the law of total probability
p
i,j
(n) =
n
X
m=1
P
i
(X
n
= j | T
j
= m)P
i
(T
j
= m)
Using the Markov property, we can write this as
=
n
X
m=1
P(X
n
= j | X
m
= j)P
i
(T
j
= m)
=
n
X
m=1
p
j,j
(n −m)f
i,j
(m).
We can multiply through by s
n
and sum over all n to obtain
∞
X
n=1
p
i,j
(n)s
n
=
∞
X
n=1
n
X
m=1
p
j,j
(n − m)s
n−m
f
i,j
(m)s
m
.
The left hand side is almost the generating function
P
i,j
(
s
), except that we
are missing an
n
= 0 term, which is
p
i,j
(0) =
δ
i,j
. The right hand side is the
“convolution” of the power series P
j,j
(s) and F
i,j
(s), which we can write as the
product P
j,j
(s)F
i,j
(s). So
P
i,j
(s) − δ
i,j
= P
j,j
(s)F
i,j
(s).
Before we actually prove our theorem, we need one helpful result from
Analysis that allows us to deal with power series nicely.
Lemma (Abel’s lemma). Let
u
1
, u
2
, ···
be real numbers such that
U
(
s
) =
P
n
u
n
s
n
converges for 0 < s < 1. Then
lim
s→1
−
U(s) =
X
n
u
n
.
Proof is an exercise in analysis, which happens to be on the first example
sheet of Analysis II.
We now prove the theorem we initially wanted to show
Theorem. i is recurrent iff
P
n
p
ii
(n) = ∞.
Proof. Using j = i in the above formula, for 0 < s < 1, we have
P
i,i
(s) =
1
1 − F
i,i
(s)
.
Here we need to be careful that we are not dividing by 0. This would be a
problem if F
ii
(s) = 1. By definition, we have
F
i,i
(s) =
∞
X
n=1
f
i,i
(n)s
n
.
Also, by definition of f
ii
, we have
F
i,i
(1) =
X
n
f
i,i
(n) = P(ever returning to i) ≤ 1.
So for
|s| <
1,
F
i,i
(
s
)
<
1. So we are not dividing by zero. Now we use our
original equation
P
i,i
(s) =
1
1 − F
i,i
(s)
,
and take the limit as
s →
1. By Abel’s lemma, we know that the left hand side
is
lim
s→1
P
i,i
(s) = P
i,i
(1) =
X
n
p
i,i
(n).
The other side is
lim
s→1
1
1 − F
i,i
(s)
=
1
1 −
P
f
i,i
(n)
.
Hence we have
X
n
p
i,i
(n) =
1
1 −
P
f
i,i
(n)
.
Since
P
f
i,i
(
n
) is the probability of ever returning, the probability of ever
returning is 1 if and only if
P
n
p
i,i
(n) = ∞.
Using this result, we can check if a state is recurrent. However, a Markov
chain has many states, and it would be tedious to check every state individually.
Thus we have the following helpful result.
Theorem. Let C be a communicating class. Then
(i) Either every state in C is recurrent, or every state is transient.
(ii) If C contains a recurrent state, then C is closed.
Proof.
(i)
Let
i ↔ j
and
i
=
j
. Then by definition of communicating, there is some
m
such that
p
i,j
(
m
) =
α >
0, and some
n
such that
p
j,i
(
n
) =
β >
0. So
for each k, we have
p
i,i
(m + k + n) ≥ p
i,j
(m)p
j,j
(k)p
j,i
(n) = αβp
j,j
(k).
So if
P
k
p
j,j
(
k
) =
∞
, then
P
r
p
i,i
(
r
) =
∞
. So
j
recurrent implies
i
recurrent. Similarly, i recurrent implies j recurrent.
(ii)
If
C
is not closed, then there is a non-zero probability that we leave the
class and never get back. So the states are not recurrent.
Note that there is a profound difference between a finite state space and an
infinite state space. A finite state space can be represented by a finite matrix,
and we are all very familiar with a finite matrices. We can use everything we
know about finite matrices from IA Vectors and Matrices. However, infinite
matrices are weirder.
For example, any finite transition matrix
P
has an eigenvalue of 1. This
is since the row sums of a transition matrix is always 1. So if we multiply
P
by e = (1
,
1
, ··· ,
1), then we get e again. However, this is not true for infinite
matrices, since we usually don’t usually allow arbitrary infinite vectors. To
avoid getting infinitely large numbers when multiplying vectors and matrices, we
usually restrict our focus to vectors x such that
P
x
2
i
is finite. In this case the
vector e is not allowed, and the transition matrix need not have eigenvalue 1.
Another thing about a finite state space is that probability “cannot escape”.
Each step of a Markov chain gives a probability distribution on the state space,
and we can imagine the progression of the chain as a flow of probabilities around
the state space. If we have a finite state space, then all the probability flow must
be contained within our finite state space. However, if we have an infinite state
space, then probabilities can just drift away to infinity.
More concretely, we get the following result about finite state spaces.
Theorem. In a finite state space,
(i) There exists at least one recurrent state.
(ii) If the chain is irreducible, every state is recurrent.
Proof.
(ii) follows immediately from (i) since if a chain is irreducible, either all
states are transient or all states are recurrent. So we just have to prove (i).
We first fix an arbitrary i. Recall that
P
i,j
(s) = δ
i,j
+ P
j,j
(s)F
i,j
(s).
If
j
is transient, then
P
n
p
j,j
(
n
) =
P
j,j
(1)
< ∞
. Also,
F
i,j
(1) is the probability
of ever reaching
j
from
i
, and is hence finite as well. So we have
P
i,j
(1)
< ∞
.
By Abel’s lemma, P
i,j
(1) is given by
P
i,j
(1) =
X
n
p
i,j
(n).
Since this is finite, we must have p
i,j
(n) → 0.
Since we know that
X
j∈S
p
i,j
(n) = 1,
if every state is transient, then since the sum is finite, we know
P
p
i,j
(
n
)
→
0
as n → ∞. This is a contradiction. So we must have a recurrent state.
Theorem (P´olya’s theorem). Consider
Z
d
=
{
(
x
1
, x
2
, ··· , x
d
) :
x
i
∈ Z}
. This
generates a graph with
x
adjacent to
y
if
|x −y|
= 1, where
| · |
is the Euclidean
norm.
d = 1
d = 2
Consider a random walk in
Z
d
. At each step, it moves to a neighbour, each
chosen with equal probability, i.e.
P(X
n+1
= j | X
n
= i) =
(
1
2d
|j − i| = 1
0 otherwise
This is an irreducible chain, since it is possible to get from one point to any
other point. So the points are either all recurrent or all transient.
The theorem says this is recurrent iff d = 1 or 2.
Intuitively, this makes sense that we get recurrence only for low dimensions,
since if we have more dimensions, then it is easier to get lost.
Proof.
We will start with the case
d
= 1. We want to show that
P
p
0,0
(
n
) =
∞
.
Then we know the origin is recurrent. However, we can simplify this a bit. It is
impossible to get back to the origin in an odd number of steps. So we can instead
consider
P
p
0,0
(2
n
). However, we can write down this expression immediately.
To return to the origin after 2
n
steps, we need to have made
n
steps to the left,
and n steps to the right, in any order. So we have
p
0,0
(2n) = P(n steps left, n steps right) =
2n
n
1
2
2n
.
To show if this converges, it is not helpful to work with these binomial coefficients
and factorials. So we use Stirling’s formula
n
!
≃
√
2πn
n
e
n
. If we plug this in,
we get
p
0,0
(2n) ∼
1
√
πn
.
This tends to 0, but really slowly, and even more slowly than the harmonic series.
So we have
P
p
0,0
(2n) = ∞.
In the
d
= 2 case, suppose after 2
n
steps, I have taken
r
steps right,
ℓ
steps
left,
u
steps up and
d
steps down. We must have
r
+
ℓ
+
u
+
d
= 2
n
, and we need
r
=
ℓ, u
=
d
to return the origin. So we let
r
=
ℓ
=
m, u
=
d
=
n −m
. So we get
p
0,0
(2n) =
1
4
2n
n
X
m=0
2n
m, m, n − m, n − m
=
1
4
2n
n
X
m=0
(2n)!
(m!)
2
((n − m)!)
2
=
1
4
2n
2n
n
n
X
m=0
n!
m!(n − m)!
2
=
1
4
2n
2n
n
n
X
m=0
n
m
n
n − m
We now use a well-known identity (proved in IA Numbers and Sets) to obtain
=
1
4
2n
2n
n
2n
n
=
"
2n
n
1
2
2n
#
2
∼
1
πn
.
So the sum diverges. So this is recurrent. Note that the two-dimensional
probability turns out to be the square of the one-dimensional probability. This
is not a coincidence, and we will explain this after the proof. However, this does
not extend to higher dimensions.
In the d = 3 case, we have
p
0,0
(2n) =
1
6
2n
X
i+j+k=n
(2n)!
(i!j!k!)
2
.
This time, there is no neat combinatorial formula. Since we want to show this is
summable, we can try to bound this from above. We have
p
0,0
(2n) =
1
6
2n
2n
n
X
n!
i!j!k!
2
=
1
2
2n
2n
n
X
n!
3
n
i!j!k!
2
Why do we write it like this? We are going to use the identity
X
i+j+k=n
n!
3
n
i!j!k!
=
1. Where does this come from? Suppose we have three urns, and throw
n
balls
into it. Then the probability of getting
i
balls in the first,
j
in the second and
k
in the third is exactly
n!
3
n
i!j!k!
. Summing over all possible combinations of
i
,
j
and
k
gives the total probability of getting in any configuration, which is 1. So
we can bound this by
≤
1
2
2n
2n
n
max
n!
3
n
i!j!k!
X
n!
3
n
i!j!k!
=
1
2
2n
2n
n
max
n!
3
n
i!j!k!
To find the maximum, we can replace the factorial by the gamma function and
use Lagrange multipliers. However, we would just argue that the maximum can
be achieved when i, j and k are as close to each other as possible. So we get
≤
1
2
2n
2n
n
n!
3
n
1
⌊n/3⌋!
3
≤ Cn
−3/2
for some constant
C
using Stirling’s formula. So
P
p
0,0
(2
n
)
< ∞
and the chain
is transient. We can prove similarly for higher dimensions.
Let’s get back to why the two dimensional probability is the square of the one-
dimensional probability. This square might remind us of independence. However,
it is obviously not true that horizontal movement and vertical movement are
independent — if we go sideways in one step, then we cannot move vertically.
So we need something more sophisticated.
We write
X
n
= (
A
n
, B
n
). What we do is that we try to rotate our space.
We are going to record our coordinates in a pair of axis that is rotated by 45
◦
.
A
B
(A
n
, B
n
)
V
U
U
n
/
√
2
V
n
/
√
2
We can define the new coordinates as
U
n
= A
n
− B
n
V
n
= A
n
+ B
n
In each step, either
A
n
or
B
n
change by one step. So
U
n
and
V
n
both change by
1. Moreover, they are independent. So we have
p
0,0
(2n) = P(A
n
= B
n
= 0)
= P(U
n
= V
n
= 0)
= P(U
n
= 0)P(V
n
= 0)
=
"
2n
n
1
2
2n
#
2
.