2 Classification of chains and states

IB Markov Chains

2.2 Recurrence or transience

The major focus of this chapter is recurrence and transience. This was something

that came up when we discussed random walks in IA Probability — given a

random walk, say starting at 0, what is the probability that we will return to

0 later on? Recurrence and transience give a qualitative way of answering this

question. As we mentioned, we will mostly focus on irreducible chains. So by

definition there is always a non-zero probability of returning to 0. Hence the

question we want to ask is whether we are going to return to 0 with certainty, i.e.

with probability 1. If we are bound to return, then we say the state is recurrent.

Otherwise, we say it is transient.

It should be clear that this notion is usually interesting only for an infinite

state space. If we have an infinite state space, we might get transience because we

are very likely to drift away to a place far, far away and never return. However,

in a finite state space, this can't happen. Transience can occur only if we get
stuck in a part of the space and can never leave, i.e. the chain is not irreducible.
These cases are not interesting.

Notation. For convenience, we will introduce some notation. We write
\[
  P_i(A) = P(A \mid X_0 = i),
\]
and
\[
  E_i(Z) = E(Z \mid X_0 = i).
\]

Suppose we start from $i$, and randomly wander around. Eventually, we may or may not get to $j$. If we do, there is a time at which we first reach $j$. We call this the first passage time.

Definition (First passage time and probability). The first passage time of $j \in S$ starting from $i$ is
\[
  T_j = \min\{n \geq 1 : X_n = j\}.
\]
Note that this implicitly depends on $i$. Here we require $n \geq 1$. Otherwise $T_i$ would always be 0.

The first passage probability is
\[
  f_{ij}(n) = P_i(T_j = n).
\]
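As a small illustration (not part of the notes), the first passage probabilities can be computed by conditioning on the first step: by the Markov property, $f_{ij}(1) = p_{ij}$ and $f_{ij}(n) = \sum_{k \neq j} p_{ik} f_{kj}(n-1)$. A sketch for a hypothetical two-state chain, where the exact answers are easy to see by hand:

```python
# First passage probabilities f_ij(n) for a small (hypothetical) two-state
# chain. By the Markov property, f_ij(1) = p_ij and
# f_ij(n) = sum over k != j of p_ik * f_kj(n - 1).

P = [[0.5, 0.5],
     [0.25, 0.75]]  # row-stochastic transition matrix

def first_passage(i, j, n, P):
    """Return f_ij(n), the probability that the chain started at i
    first hits j at time exactly n."""
    if n == 1:
        return P[i][j]
    return sum(P[i][k] * first_passage(k, j, n - 1, P)
               for k in range(len(P)) if k != j)

# From state 0, first hitting 1 at time n means staying at 0 for n - 1 steps,
# so f_01(n) = 0.5^(n-1) * 0.5.
print(first_passage(0, 1, 1, P))  # 0.5
print(first_passage(0, 1, 2, P))  # 0.25
# The f_01(n) sum to 1: starting from 0, we hit 1 with certainty.
print(sum(first_passage(0, 1, n, P) for n in range(1, 60)))  # ≈ 1
```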

Definition (Recurrent state). A state $i \in S$ is recurrent (or persistent) if
\[
  P_i(T_i < \infty) = 1,
\]
i.e. we will eventually get back to the state. Otherwise, we call the state transient.

Note that transient does not mean we don't get back. It's just that we are not sure that we will get back. We can show that if a state is recurrent, then the probability that we return to $i$ infinitely many times is also 1.

Our current objective is to show the following characterization of recurrence.

Theorem. $i$ is recurrent iff $\sum_n p_{i,i}(n) = \infty$.

The technique to prove this would be to use generating functions. We need to first decide what sequence to work with. For any fixed $i, j$, consider the sequence $p_{i,j}(n)$ as a sequence in $n$. Then we define
\[
  P_{i,j}(s) = \sum_{n=0}^\infty p_{i,j}(n) s^n.
\]
We also define
\[
  F_{i,j}(s) = \sum_{n=0}^\infty f_{i,j}(n) s^n,
\]
where $f_{i,j}$ is our first passage probability. For the sake of clarity, we make it explicit that $p_{i,j}(0) = \delta_{i,j}$, and $f_{i,j}(0) = 0$.

Our proof will be heavily based on the following result:

Theorem.
\[
  P_{i,j}(s) = \delta_{i,j} + F_{i,j}(s) P_{j,j}(s),
\]
for $-1 < s \leq 1$.

Proof. Using the law of total probability,
\[
  p_{i,j}(n) = \sum_{m=1}^n P_i(X_n = j \mid T_j = m) P_i(T_j = m).
\]
Using the Markov property, we can write this as
\[
  p_{i,j}(n) = \sum_{m=1}^n P(X_n = j \mid X_m = j) P_i(T_j = m) = \sum_{m=1}^n p_{j,j}(n - m) f_{i,j}(m).
\]
We can multiply through by $s^n$ and sum over all $n$ to obtain
\[
  \sum_{n=1}^\infty p_{i,j}(n) s^n = \sum_{n=1}^\infty \sum_{m=1}^n p_{j,j}(n - m) s^{n-m} f_{i,j}(m) s^m.
\]
The left hand side is almost the generating function $P_{i,j}(s)$, except that we are missing an $n = 0$ term, which is $p_{i,j}(0) = \delta_{i,j}$. The right hand side is the "convolution" of the power series $P_{j,j}(s)$ and $F_{i,j}(s)$, which we can write as the product $P_{j,j}(s) F_{i,j}(s)$. So
\[
  P_{i,j}(s) - \delta_{i,j} = P_{j,j}(s) F_{i,j}(s).
\]
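As a numerical sanity check (not part of the notes), we can verify the identity on a small hypothetical three-state chain: compute $p_{i,j}(n)$ from matrix powers, $f_{i,j}(n)$ from the first-step recursion, truncate both power series, and compare the two sides.

```python
# Numerical check of P_ij(s) = delta_ij + F_ij(s) * P_jj(s) on a small
# (hypothetical) three-state chain, truncating the power series at N terms.
# With s = 0.5 the truncation error is at most a geometric tail, which is tiny.

import numpy as np

P = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.1, 0.5],
              [0.3, 0.3, 0.4]])

N, s = 200, 0.5
n_states = P.shape[0]

# p[n] = n-step transition matrix, p[0] = identity (p_ij(0) = delta_ij).
p = [np.eye(n_states)]
for _ in range(N):
    p.append(p[-1] @ P)

# f[n][i, j] = f_ij(n): first-passage probabilities, with f_ij(0) = 0,
# f_ij(1) = p_ij, and f_ij(n) = sum over k != j of p_ik * f_kj(n - 1).
f = [np.zeros((n_states, n_states)), P.copy()]
for n in range(2, N + 1):
    cur = np.zeros((n_states, n_states))
    for j in range(n_states):
        for i in range(n_states):
            cur[i, j] = sum(P[i, k] * f[n - 1][k, j]
                            for k in range(n_states) if k != j)
    f.append(cur)

Pgen = sum(p[n] * s**n for n in range(N + 1))   # truncated P_ij(s)
Fgen = sum(f[n] * s**n for n in range(N + 1))   # truncated F_ij(s)

for i in range(n_states):
    for j in range(n_states):
        lhs = Pgen[i, j]
        rhs = (1 if i == j else 0) + Fgen[i, j] * Pgen[j, j]
        assert abs(lhs - rhs) < 1e-10, (i, j, lhs, rhs)
print("identity verified for all i, j")
```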

Before we actually prove our theorem, we need one helpful result from Analysis that allows us to deal with power series nicely.

Lemma (Abel's lemma). Let $u_1, u_2, \ldots$ be real numbers such that $U(s) = \sum_n u_n s^n$ converges for $0 < s < 1$. Then
\[
  \lim_{s \to 1^-} U(s) = \sum_n u_n.
\]
The proof is an exercise in analysis, which happens to be on the first example sheet of Analysis II.

We now prove the theorem we initially wanted to show.

Theorem. $i$ is recurrent iff $\sum_n p_{i,i}(n) = \infty$.

Proof. Using $j = i$ in the above formula, for $0 < s < 1$, we have
\[
  P_{i,i}(s) = \frac{1}{1 - F_{i,i}(s)}.
\]
Here we need to be careful that we are not dividing by 0. This would be a problem if $F_{i,i}(s) = 1$. By definition, we have
\[
  F_{i,i}(s) = \sum_{n=1}^\infty f_{i,i}(n) s^n.
\]
Also, by definition of $f_{i,i}$, we have
\[
  F_{i,i}(1) = \sum_n f_{i,i}(n) = P(\text{ever returning to } i) \leq 1.
\]
So for $|s| < 1$, we have $F_{i,i}(s) < 1$. So we are not dividing by zero. Now we take the limit $s \to 1$ in our equation above. By Abel's lemma, the left hand side gives
\[
  \lim_{s \to 1} P_{i,i}(s) = P_{i,i}(1) = \sum_n p_{i,i}(n).
\]
The right hand side gives
\[
  \lim_{s \to 1} \frac{1}{1 - F_{i,i}(s)} = \frac{1}{1 - \sum_n f_{i,i}(n)}.
\]
Hence we have
\[
  \sum_n p_{i,i}(n) = \frac{1}{1 - \sum_n f_{i,i}(n)}.
\]
Since $\sum_n f_{i,i}(n)$ is the probability of ever returning, the probability of ever returning is 1 if and only if $\sum_n p_{i,i}(n) = \infty$.

Using this result, we can check if a state is recurrent. However, a Markov

chain has many states, and it would be tedious to check every state individually.

Thus we have the following helpful result.

Theorem. Let C be a communicating class. Then

(i) Either every state in C is recurrent, or every state is transient.

(ii) If C contains a recurrent state, then C is closed.

Proof.

(i) Let $i \leftrightarrow j$ and $i \neq j$. Then by definition of communicating, there is some $m$ such that $p_{i,j}(m) = \alpha > 0$, and some $n$ such that $p_{j,i}(n) = \beta > 0$. So for each $k$, we have
\[
  p_{i,i}(m + k + n) \geq p_{i,j}(m) p_{j,j}(k) p_{j,i}(n) = \alpha\beta p_{j,j}(k).
\]
So if $\sum_k p_{j,j}(k) = \infty$, then $\sum_r p_{i,i}(r) = \infty$. So $j$ recurrent implies $i$ recurrent. Similarly, $i$ recurrent implies $j$ recurrent.

(ii) If $C$ is not closed, then there is a non-zero probability that we leave the class and never get back. So the states are not recurrent.

Note that there is a profound difference between a finite state space and an

infinite state space. A finite state space can be represented by a finite matrix,

and we are all very familiar with finite matrices. We can use everything we know about finite matrices from IA Vectors and Matrices. However, infinite matrices are weirder.

For example, any finite transition matrix $P$ has an eigenvalue of 1. This is since the row sums of a transition matrix are always 1. So if we multiply $P$ by $\mathbf{e} = (1, 1, \cdots, 1)^T$, then we get $\mathbf{e}$ again. However, this is not true for infinite matrices, since we don't usually allow arbitrary infinite vectors. To avoid getting infinitely large numbers when multiplying vectors and matrices, we usually restrict our focus to vectors $x$ such that $\sum x_i^2$ is finite. In this case the vector $\mathbf{e}$ is not allowed, and the transition matrix need not have an eigenvalue of 1.

Another thing about a finite state space is that probability “cannot escape”.

Each step of a Markov chain gives a probability distribution on the state space,

and we can imagine the progression of the chain as a flow of probabilities around

the state space. If we have a finite state space, then all the probability flow must

be contained within our finite state space. However, if we have an infinite state

space, then probabilities can just drift away to infinity.

More concretely, we get the following result about finite state spaces.

Theorem. In a finite state space,

(i) There exists at least one recurrent state.

(ii) If the chain is irreducible, every state is recurrent.

Proof.

(ii) follows immediately from (i) since if a chain is irreducible, either all

states are transient or all states are recurrent. So we just have to prove (i).

We first fix an arbitrary $i$. Recall that
\[
  P_{i,j}(s) = \delta_{i,j} + P_{j,j}(s) F_{i,j}(s).
\]
If $j$ is transient, then $\sum_n p_{j,j}(n) = P_{j,j}(1) < \infty$. Also, $F_{i,j}(1)$ is the probability of ever reaching $j$ from $i$, and is hence finite as well. So we have $P_{i,j}(1) < \infty$. By Abel's lemma, $P_{i,j}(1)$ is given by
\[
  P_{i,j}(1) = \sum_n p_{i,j}(n).
\]
Since this is finite, we must have $p_{i,j}(n) \to 0$.

We also know that
\[
  \sum_{j \in S} p_{i,j}(n) = 1.
\]
If every state is transient, then since the sum has finitely many terms, we get $\sum_{j \in S} p_{i,j}(n) \to 0$ as $n \to \infty$. This is a contradiction. So we must have a recurrent state.
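As a small illustration (not part of the notes), in a finite irreducible chain the $p_{i,i}(n)$ tend to positive limits, so their partial sums grow without bound, matching the theorem. A sketch with a hypothetical three-state chain:

```python
# Illustration: in a finite irreducible (and here aperiodic) chain, p_ii(n)
# tends to a positive limit, so the partial sums of p_ii(n) grow without
# bound and every state is recurrent. The chain below is hypothetical.

import numpy as np

P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.3, 0.3, 0.4]])

M = np.eye(3)                # M = P^n as we iterate
partial = np.zeros(3)        # partial sums of p_ii(n) for each state i
for n in range(1, 2001):
    M = M @ P
    partial += np.diag(M)

print(np.diag(M))   # the diagonal entries approach positive limits
print(partial)      # partial sums already large, growing linearly in n
```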

Theorem (Pólya's theorem). Consider $\mathbb{Z}^d = \{(x_1, x_2, \cdots, x_d) : x_i \in \mathbb{Z}\}$. This generates a graph with $x$ adjacent to $y$ if $|x - y| = 1$, where $|\cdot|$ is the Euclidean norm.

(Figure: the lattice graphs for $d = 1$ and $d = 2$.)

Consider a random walk in $\mathbb{Z}^d$. At each step, it moves to a neighbour, each chosen with equal probability, i.e.
\[
  P(X_{n+1} = j \mid X_n = i) =
  \begin{cases}
    \frac{1}{2d} & |j - i| = 1\\
    0 & \text{otherwise}
  \end{cases}
\]
This is an irreducible chain, since it is possible to get from one point to any other point. So the points are either all recurrent or all transient.

The theorem says this is recurrent iff $d = 1$ or $2$.

Intuitively, this makes sense: we get recurrence only in low dimensions, since with more dimensions, it is easier to get lost.
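Before the proof, a quick Monte Carlo illustration (not part of the argument, and only a finite-horizon approximation): we can simulate walks in $d = 1$ and $d = 3$ and count how often they return to the origin. The horizon and trial counts below are arbitrary choices.

```python
# A quick Monte Carlo illustration (not a proof): estimate the probability
# that a simple random walk returns to the origin within a fixed horizon.
# By Pólya's theorem we expect a rate near 1 for d = 1 and well below 1 for
# d = 3 (where the true return probability is roughly 0.34).

import random

def returns_to_origin(d, n_steps, rng):
    """Simulate a simple random walk on Z^d; report whether it revisits 0."""
    pos = [0] * d
    for _ in range(n_steps):
        axis = rng.randrange(d)
        pos[axis] += rng.choice((-1, 1))
        if not any(pos):        # back at the origin
            return True
    return False

rng = random.Random(0)
trials, horizon = 400, 4000
rates = {}
for d in (1, 3):
    rates[d] = sum(returns_to_origin(d, horizon, rng)
                   for _ in range(trials)) / trials
    print(f"d = {d}: returned within {horizon} steps in {rates[d]:.2f} of runs")
```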

Proof. We will start with the case $d = 1$. We want to show that $\sum_n p_{0,0}(n) = \infty$. Then we know the origin is recurrent. We can simplify this a bit: it is impossible to get back to the origin in an odd number of steps, so we can instead consider $\sum_n p_{0,0}(2n)$. However, we can write down this expression immediately. To return to the origin after $2n$ steps, we need to have made $n$ steps to the left and $n$ steps to the right, in any order. So we have
\[
  p_{0,0}(2n) = P(n \text{ steps left}, n \text{ steps right}) = \binom{2n}{n} \left(\frac{1}{2}\right)^{2n}.
\]
To see whether the sum converges, it is not helpful to work with these binomial coefficients and factorials. So we use Stirling's formula $n! \simeq \sqrt{2\pi n}\left(\frac{n}{e}\right)^n$. If we plug this in, we get
\[
  p_{0,0}(2n) \sim \frac{1}{\sqrt{\pi n}}.
\]
This tends to 0, but really slowly, even more slowly than the terms of the harmonic series. Since the harmonic series diverges, we have $\sum_n p_{0,0}(2n) = \infty$.
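A quick numerical check of the $d = 1$ computation (not part of the proof): the exact $p_{0,0}(2n)$, the Stirling asymptotic, and the growth of the partial sums.

```python
# Check of the d = 1 computation: exact p_00(2n) = C(2n, n) / 2^(2n), the
# Stirling asymptotic p_00(2n) ~ 1/sqrt(pi n), and the growth of partial sums.

import math

def p00_1d(n):
    """Exact p_00(2n) for the simple random walk on Z."""
    return math.comb(2 * n, n) / 4**n

# The ratio p_00(2n) * sqrt(pi n) tends to 1, confirming the asymptotic.
for n in (10, 100, 1000):
    print(n, p00_1d(n) * math.sqrt(math.pi * n))

# Partial sums grow without bound, roughly like 2 * sqrt(N / pi).
# Use the recurrence p_00(2n) = p_00(2n - 2) * (2n - 1) / (2n) for speed.
partial, p = 0.0, 1.0
for n in range(1, 10001):
    p *= (2 * n - 1) / (2 * n)
    partial += p
print(partial)  # of order 2 * sqrt(10000 / pi), i.e. over a hundred
```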

In the $d = 2$ case, suppose that after $2n$ steps, we have taken $r$ steps right, $\ell$ steps left, $u$ steps up and $d$ steps down. We must have $r + \ell + u + d = 2n$, and we need $r = \ell$, $u = d$ to return to the origin. So we let $r = \ell = m$, $u = d = n - m$. So we get
\begin{align*}
  p_{0,0}(2n) &= \left(\frac{1}{4}\right)^{2n} \sum_{m=0}^n \binom{2n}{m, m, n-m, n-m}
  = \left(\frac{1}{4}\right)^{2n} \sum_{m=0}^n \frac{(2n)!}{(m!)^2 ((n-m)!)^2}\\
  &= \left(\frac{1}{4}\right)^{2n} \binom{2n}{n} \sum_{m=0}^n \left(\frac{n!}{m!(n-m)!}\right)^2
  = \left(\frac{1}{4}\right)^{2n} \binom{2n}{n} \sum_{m=0}^n \binom{n}{m}\binom{n}{n-m}.
\end{align*}
We now use a well-known identity (proved in IA Numbers and Sets) to obtain
\[
  p_{0,0}(2n) = \left(\frac{1}{4}\right)^{2n} \binom{2n}{n} \binom{2n}{n} = \left[\binom{2n}{n}\left(\frac{1}{2}\right)^{2n}\right]^2 \sim \frac{1}{\pi n}.
\]
So the sum diverges, and the walk is recurrent. Note that the two-dimensional probability turns out to be the square of the one-dimensional probability. This is not a coincidence, and we will explain this after the proof. However, this does not extend to higher dimensions.
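The squared identity can be checked directly (a sanity check, not part of the proof) by comparing the multinomial sum against the one-dimensional probability in exact rational arithmetic:

```python
# Check that the d = 2 return probability is exactly the square of the d = 1
# one, p_00(2n) = [C(2n, n) / 2^(2n)]^2, using exact rational arithmetic.

from fractions import Fraction
from math import comb, factorial

def p00_2d(n):
    """Exact p_00(2n) for the walk on Z^2, from the multinomial sum."""
    total = sum(factorial(2 * n) // (factorial(m)**2 * factorial(n - m)**2)
                for m in range(n + 1))
    return Fraction(total, 4**(2 * n))

def p00_1d(n):
    """Exact p_00(2n) for the walk on Z."""
    return Fraction(comb(2 * n, n), 4**n)

for n in range(1, 30):
    assert p00_2d(n) == p00_1d(n)**2
print("p_00(2n) in d = 2 is the square of its d = 1 value for n = 1, ..., 29")
```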

In the $d = 3$ case, we have
\[
  p_{0,0}(2n) = \left(\frac{1}{6}\right)^{2n} \sum_{i+j+k=n} \frac{(2n)!}{(i!j!k!)^2}.
\]
This time, there is no neat combinatorial formula. Since we want to show this is summable, we can try to bound it from above. We have
\begin{align*}
  p_{0,0}(2n) &= \left(\frac{1}{6}\right)^{2n} \binom{2n}{n} \sum_{i+j+k=n} \left(\frac{n!}{i!j!k!}\right)^2\\
  &= \left(\frac{1}{2}\right)^{2n} \binom{2n}{n} \sum_{i+j+k=n} \left(\frac{n!}{3^n\, i!j!k!}\right)^2.
\end{align*}
Why do we write it like this? We are going to use the identity
\[
  \sum_{i+j+k=n} \frac{n!}{3^n\, i!j!k!} = 1.
\]
Where does this come from? Suppose we have three urns, and throw $n$ balls into them. Then the probability of getting $i$ balls in the first, $j$ in the second and $k$ in the third is exactly $\frac{n!}{3^n\, i!j!k!}$. Summing over all possible combinations of $i$, $j$ and $k$ gives the total probability of all configurations, which is 1. So we can bound
\begin{align*}
  p_{0,0}(2n) &\leq \left(\frac{1}{2}\right)^{2n} \binom{2n}{n} \left(\max_{i+j+k=n} \frac{n!}{3^n\, i!j!k!}\right) \sum_{i+j+k=n} \frac{n!}{3^n\, i!j!k!}\\
  &= \left(\frac{1}{2}\right)^{2n} \binom{2n}{n} \max_{i+j+k=n} \frac{n!}{3^n\, i!j!k!}.
\end{align*}
To find the maximum, we can replace the factorials by the gamma function and use Lagrange multipliers. However, we will just argue that the maximum is achieved when $i$, $j$ and $k$ are as close to each other as possible. So we get
\[
  p_{0,0}(2n) \leq \left(\frac{1}{2}\right)^{2n} \binom{2n}{n} \frac{n!}{3^n} \left(\frac{1}{\lfloor n/3 \rfloor!}\right)^3 \leq C n^{-3/2}
\]
for some constant $C$, using Stirling's formula. So $\sum_n p_{0,0}(2n) < \infty$, and the chain is transient. We can prove similarly for higher dimensions.
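The $n^{-3/2}$ decay and the resulting convergence can be seen numerically (a sanity check of the bound, not part of the proof), again with exact arithmetic for the individual terms:

```python
# Check of the d = 3 bound: the exact p_00(2n) decay like n^(-3/2), so the
# partial sums of p_00(2n) stay bounded, unlike in d = 1 and d = 2.

from fractions import Fraction
from math import factorial

def p00_3d(n):
    """Exact p_00(2n) for the walk on Z^3, from the multinomial sum."""
    total = 0
    for i in range(n + 1):
        for j in range(n - i + 1):
            k = n - i - j
            total += factorial(2 * n) // (factorial(i) * factorial(j) * factorial(k))**2
    return Fraction(total, 6**(2 * n))

# n^(3/2) * p_00(2n) stays bounded, consistent with the C * n^(-3/2) bound.
for n in (5, 10, 20):
    print(n, float(p00_3d(n)) * n**1.5)

partial = Fraction(1)          # the n = 0 term p_00(0) = 1
for n in range(1, 40):
    partial += p00_3d(n)
print(float(partial))          # bounded partial sums: the full series converges
```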

Let's get back to why the two-dimensional probability is the square of the one-dimensional probability. This square might remind us of independence. However, it is obviously not true that horizontal movement and vertical movement are independent: if we go sideways in one step, then we cannot move vertically in that step. So we need something more sophisticated.

We write $X_n = (A_n, B_n)$. What we do is that we try to rotate our space. We are going to record our coordinates in a pair of axes that is rotated by $45^\circ$.

(Figure: the point $(A_n, B_n)$ in the original $A, B$ axes, with coordinates $U_n/\sqrt{2}$ and $V_n/\sqrt{2}$ along the rotated $U, V$ axes.)

We can define the new coordinates as
\begin{align*}
  U_n &= A_n - B_n,\\
  V_n &= A_n + B_n.
\end{align*}
In each step, either $A_n$ or $B_n$ changes by one step. So $U_n$ and $V_n$ both change by 1. Moreover, they are independent. So we have
\begin{align*}
  p_{0,0}(2n) &= P(A_{2n} = B_{2n} = 0)\\
  &= P(U_{2n} = V_{2n} = 0)\\
  &= P(U_{2n} = 0)\, P(V_{2n} = 0)\\
  &= \left[\binom{2n}{n}\left(\frac{1}{2}\right)^{2n}\right]^2.
\end{align*}
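The independence claim can be verified exhaustively for small walks (a sanity check, not part of the notes): enumerate every path and check that the joint distribution of $(U, V)$ factorizes into its marginals.

```python
# Check of the rotation trick: enumerate all 4^(2n) walks of length 2n on Z^2
# with exact rational arithmetic, and verify that U = A - B and V = A + B are
# genuinely independent, giving P(A = B = 0) = P(U = 0) P(V = 0).

from collections import Counter, defaultdict
from fractions import Fraction
from itertools import product

STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def joint_uv(n_steps):
    """Exact joint distribution of (U, V) after n_steps steps of the walk."""
    counts = Counter()
    for walk in product(STEPS, repeat=n_steps):
        a = sum(s[0] for s in walk)
        b = sum(s[1] for s in walk)
        counts[(a - b, a + b)] += 1
    total = 4**n_steps
    return {uv: Fraction(c, total) for uv, c in counts.items()}

n = 3                  # 2n = 6 steps, 4^6 = 4096 paths: small enough to enumerate
dist = joint_uv(2 * n)

# Marginal distributions of U and V.
pu, pv = defaultdict(Fraction), defaultdict(Fraction)
for (u, v), prob in dist.items():
    pu[u] += prob
    pv[v] += prob

# Independence: the joint distribution factorizes at every point.
assert all(dist.get((u, v), Fraction(0)) == pu[u] * pv[v]
           for u in pu for v in pv)

# In particular p_00(2n) = P(U = 0) P(V = 0) = [C(2n, n) / 2^(2n)]^2.
print(dist[(0, 0)], "=", pu[0], "*", pv[0])  # 25/256 = 5/16 * 5/16
```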