3 Long-run behaviour

IB Markov Chains

3.1 Invariant distributions

We want to look at what happens in the long run. Recall that at the very beginning of the course, we calculated the transition probabilities of the two-state Markov chain, and saw that in the long run, as n → ∞, the probability distribution of the X_n will “converge” to some particular distribution. Moreover, this limit does not depend on where we started. We’d like to see if this is true for all Markov chains.
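This two-state convergence is easy to check numerically. Below is a minimal sketch in plain Python; the transition probabilities α = 0.3, β = 0.6 are illustrative choices, not values from the notes. We raise the transition matrix to a high power and watch both rows approach the same distribution, independent of the starting state.

```python
# Two-state chain: P(0 -> 1) = alpha, P(1 -> 0) = beta (illustrative values).
alpha, beta = 0.3, 0.6
P = [[1 - alpha, alpha],
     [beta, 1 - beta]]

def mat_mul(A, B):
    """Multiply two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Compute P^100 step by step.
Pn = P
for _ in range(99):
    Pn = mat_mul(Pn, P)

# For the two-state chain the limiting distribution is (beta, alpha)/(alpha + beta),
# and both rows of P^n converge to it, so the limit ignores the start state.
limit = [beta / (alpha + beta), alpha / (alpha + beta)]
print(Pn[0], Pn[1], limit)
```

Since the second eigenvalue of P is 1 − α − β = 0.1, the rows of P^100 agree with the limit to far more digits than floating point can show.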

First of all, we want to make it clear what we mean by the chain “converging” to something. When we are dealing with real sequences x_n, we have a precise definition of what it means for x_n → x. How can we define the convergence of a sequence of random variables X_n? These are not proper numbers, so we cannot just apply our usual definitions.

For the purposes of our investigation of Markov chains here, it turns out that the right way to think about convergence is to look at the probability mass function. For each k ∈ S, we ask if P(X_n = k) converges to anything.

In most cases, P(X_n = k) converges to something. Hence, this is not an interesting question to ask. What we would really want to know is whether the limit is a probability mass function. It is, for example, possible that P(X_n = k) → 0 for all k, and the result is not a distribution.

From Analysis, we know there are, in general, two ways to prove something converges — we either “guess” a limit and then prove it does indeed converge to that limit, or we prove the sequence is Cauchy. It would be rather silly to prove that these probabilities form a Cauchy sequence, so we will attempt to guess the limit. The limit will be known as an invariant distribution, for reasons that will become obvious shortly.

The main focus of this section is to study the existence and properties of invariant distributions, and we will provide sufficient conditions for convergence to occur in the next section.

Recall that if we have a starting distribution λ, then the distribution of the nth step is given by λP^n. We have the following trivial identity:

λP^{n+1} = (λP^n)P.

If the distribution converges, then we have λP^n → π for some π, and also λP^{n+1} → π. Hence the limit π satisfies

πP = π.

We call these invariant distributions.

Definition (Invariant distribution). Let X_j be a Markov chain with transition probabilities P. The distribution π = (π_k : k ∈ S) is an invariant distribution if

(i) π_k ≥ 0, ∑_k π_k = 1.

(ii) π = πP.

The first condition just ensures that this is a genuine distribution.

An invariant distribution is also known as an invariant measure, equilibrium

distribution or steady-state distribution.
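To see the fixed-point equation π = πP in action, here is a small numerical sketch; the 3×3 transition matrix is an illustrative example, not from the notes. We iterate an arbitrary starting distribution under P until it stops changing, then check that the result is indeed invariant.

```python
# Illustrative 3-state transition matrix (rows sum to 1).
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]

def step(dist, P):
    """One step of the chain in distribution: (dist P)_j = sum_i dist_i p_{i,j}."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

pi = [1.0, 0.0, 0.0]          # arbitrary starting distribution
for _ in range(1000):         # iterate lambda P^n until it settles
    pi = step(pi, P)

# Invariance: applying P once more should leave pi unchanged.
print(pi, step(pi, P))
```

The iteration converges because the chain is irreducible with all entries of P^2 positive; for such chains the distribution λP^n settles down to the invariant π regardless of λ.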

Theorem. Consider an irreducible Markov chain. Then

(i) There exists an invariant distribution if some state is positive recurrent.

(ii) If there is an invariant distribution π, then every state is positive recurrent, and π_i = 1/µ_i for i ∈ S, where µ_i is the mean recurrence time of i. In particular, π is unique.

Note how we worded the first statement. Recall that we once stated that if

one state is positive recurrent, then all states are positive recurrent, and then

said we would defer the proof for later on. This is where we actually prove it.

In (i), we show that if some state is positive recurrent, then the chain has an invariant distribution. Then (ii) tells us that if there is an invariant distribution, then all states

are positive recurrent. Hence one state being positive recurrent implies all states

being positive recurrent.

Now where did the formula for π come from? We can first think what π_i should be. By definition, we should know that for large m,

P(X_m = i) ∼ π_i.

This means that if we go on to really late times, we would expect to visit the state i a proportion π_i of the time. On the other hand, the mean recurrence time tells us that we are expected to (re)-visit i every µ_i steps. So it makes sense that π_i = 1/µ_i.
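This heuristic is easy to test by simulation. The sketch below uses the two-state chain with illustrative parameters α = 0.3, β = 0.6 (not from the notes): it estimates the mean recurrence time µ_0 by averaging many simulated excursions and compares it with 1/π_0, where π_0 = β/(α + β) is the known invariant probability of state 0 for this chain.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
alpha, beta = 0.3, 0.6          # illustrative transition probabilities
P = [[1 - alpha, alpha],
     [beta, 1 - beta]]

def next_state(s):
    """Sample the next state from row s of P."""
    return 0 if random.random() < P[s][0] else 1

def recurrence_time(i):
    """Number of steps to return to i, starting from i."""
    s, t = next_state(i), 1
    while s != i:
        s, t = next_state(s), t + 1
    return t

trials = 20000
mu_0 = sum(recurrence_time(0) for _ in range(trials)) / trials
pi_0 = beta / (alpha + beta)    # exact invariant probability of state 0
print(mu_0, 1 / pi_0)           # the estimate should be close to 1/pi_0 = 1.5
```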

To put this on more solid ground and actually prove it, we would like to look at some time intervals. For example, we might ask how many times we will hit i in 100 steps. This is not a good thing to do, because we are not given where we are starting, and this probability can depend a lot on where the starting point is.

It turns out the natural thing to do is not to use a fixed time interval, but to use a random time interval. In particular, we fix a state k, and look at the time interval between two consecutive visits of k.

We start by letting X_0 = k. Let W_i denote the number of visits to i before the next visit to k. Formally, we have

W_i = ∑_{m=1}^∞ 1(X_m = i, m ≤ T_k),

where T_k is the recurrence time of k and 1 is the indicator function. In particular, W_i = 1 for i = k (if T_k is finite). We can also write this as

W_i = ∑_{m=1}^{T_k} 1(X_m = i).

This is a random variable. So we can look at its expectation. We define

ρ_i = E_k(W_i).

We will show that this ρ is almost our π_i, up to a constant.

Proposition. For an irreducible recurrent chain and k ∈ S, with ρ = (ρ_i : i ∈ S) defined as above by

ρ_i = E_k(W_i),   W_i = ∑_{m=1}^∞ 1(X_m = i, T_k ≥ m),

we have

(i) ρ_k = 1

(ii) ∑_i ρ_i = µ_k

(iii) ρ = ρP

(iv) 0 < ρ_i < ∞ for all i ∈ S.

Proof.

(i) This follows from the definition of ρ_i, since for m < T_k, X_m ≠ k.

(ii) Note that ∑_i W_i = T_k, since in each step we hit exactly one thing. We have

∑_i ρ_i = ∑_i E_k(W_i) = E_k(∑_i W_i) = E_k(T_k) = µ_k.

Note that we secretly swapped the sum and expectation, which is in general bad because both are potentially infinite sums. However, there is a theorem (monotone convergence) that tells us this is okay whenever the summands are non-negative, which is left as an Analysis exercise.

(iii) We have

ρ_j = E_k(W_j)
    = E_k(∑_{m≥1} 1(X_m = j, T_k ≥ m))
    = ∑_{m≥1} P_k(X_m = j, T_k ≥ m)
    = ∑_{m≥1} ∑_{i∈S} P_k(X_m = j | X_{m−1} = i, T_k ≥ m) P_k(X_{m−1} = i, T_k ≥ m)

We now use the Markov property. Note that T_k ≥ m means X_1, ···, X_{m−1} are all not k. The Markov property thus tells us the condition T_k ≥ m is useless. So we are left with


= ∑_{m≥1} ∑_{i∈S} P_k(X_m = j | X_{m−1} = i) P_k(X_{m−1} = i, T_k ≥ m)
= ∑_{m≥1} ∑_{i∈S} p_{i,j} P_k(X_{m−1} = i, T_k ≥ m)
= ∑_{i∈S} p_{i,j} ∑_{m≥1} P_k(X_{m−1} = i, T_k ≥ m)

The last term looks really like ρ_i, but the indices are slightly off. We shall have faith in ourselves, and show that this is indeed equal to ρ_i.

First we let r = m − 1, and get

∑_{m≥1} P_k(X_{m−1} = i, T_k ≥ m) = ∑_{r=0}^∞ P_k(X_r = i, T_k ≥ r + 1).

Of course this does not fix the problem. We will look at the different possible cases. First, if i = k, then the r = 0 term is 1, since T_k ≥ 1 is always true by definition and X_0 = k, also by construction. On the other hand, the other terms are all zero, since it is impossible for the return time to be greater than or equal to r + 1 if we are at k at time r. So the sum is 1, which is ρ_k.

In the case where i ≠ k, first note that when r = 0 we know that X_0 = k ≠ i. So the term is zero. For r ≥ 1, we know that if X_r = i and T_k ≥ r, then we must also have T_k ≥ r + 1, since it is impossible for the return time to k to be exactly r if we are not at k at time r. So

P_k(X_r = i, T_k ≥ r + 1) = P_k(X_r = i, T_k ≥ r).

So indeed we have

∑_{m≥1} P_k(X_{m−1} = i, T_k ≥ m) = ρ_i.

Hence we get

ρ_j = ∑_{i∈S} p_{i,j} ρ_i.

So done.

(iv) To show that 0 < ρ_i < ∞, first fix our i, and note that ρ_k = 1. We know that ρ = ρP = ρP^n for n ≥ 1. So by expanding the matrix sum, we know that for any m, n,

ρ_i ≥ ρ_k p_{k,i}(n),
ρ_k ≥ ρ_i p_{i,k}(m).

By irreducibility, we now choose m, n such that p_{i,k}(m), p_{k,i}(n) > 0. So we have

ρ_k p_{k,i}(n) ≤ ρ_i ≤ ρ_k / p_{i,k}(m).

Since ρ_k = 1, the result follows.
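The proposition can be sanity-checked by simulation. The sketch below uses the two-state chain with illustrative parameters α = 0.3, β = 0.6 (not from the notes) and k = 0: it estimates ρ_i = E_k(W_i) by averaging excursion counts, then checks ρ_k = 1 and ∑_i ρ_i = µ_k. For this chain one can compute µ_0 = (α + β)/β = 1.5 and ρ_1 = α/β = 0.5.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible
alpha, beta = 0.3, 0.6          # illustrative transition probabilities
P = [[1 - alpha, alpha],
     [beta, 1 - beta]]

def next_state(s):
    """Sample the next state from row s of P."""
    return 0 if random.random() < P[s][0] else 1

def excursion_counts(k):
    """Visits W_i to each state i at times m = 1, ..., T_k of one excursion
    from k back to k."""
    counts = [0, 0]
    s = next_state(k)
    counts[s] += 1
    while s != k:
        s = next_state(s)
        counts[s] += 1
    return counts

k, trials = 0, 20000
totals = [0, 0]
for _ in range(trials):
    w = excursion_counts(k)
    totals = [totals[i] + w[i] for i in range(2)]
rho = [t / trials for t in totals]
print(rho)   # rho[0] is exactly 1; rho[1] estimates alpha/beta = 0.5
```

Note that ρ_k = 1 holds exactly in every excursion: state k is counted precisely once, at the return step m = T_k.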

Now we can prove our initial theorem.

Theorem. Consider an irreducible Markov chain. Then

(i) There exists an invariant distribution if and only if some state is positive recurrent.

(ii) If there is an invariant distribution π, then every state is positive recurrent, and π_i = 1/µ_i for i ∈ S, where µ_i is the mean recurrence time of i. In particular, π is unique.

Proof.

(i) Let k be a positive recurrent state. Then π_i = ρ_i/µ_k satisfies π_i ≥ 0 with ∑_i π_i = 1, and is an invariant distribution.

(ii) Let π be an invariant distribution. We first show that all entries are non-zero. For all n, we have

π = πP^n.

Hence for all i, j ∈ S, n ∈ N, we have

π_i ≥ π_j p_{j,i}(n). (∗)

Since ∑_j π_j = 1, there is some k such that π_k > 0. By (∗) with j = k, we know that

π_i ≥ π_k p_{k,i}(n) > 0

for some n, by irreducibility. So π_i > 0 for all i.

Now we show that all states are positive recurrent. So we need to rule out the cases of transience and null recurrence.

So assume all states are transient. Then p_{j,i}(n) → 0 as n → ∞ for all i, j ∈ S.

However, we know that

π_i = ∑_j π_j p_{j,i}(n).

If our state space is finite, then since p_{j,i}(n) → 0, the sum tends to 0, and we reach a contradiction, since π_i is non-zero. If we have a countably infinite set, we have to be more careful. We have a huge state space S, and we don’t know how to work with it. So we approximate it by a finite F, and split S into F and S \ F. So we get

0 ≤ ∑_j π_j p_{j,i}(n) = ∑_{j∈F} π_j p_{j,i}(n) + ∑_{j∉F} π_j p_{j,i}(n)
  ≤ ∑_{j∈F} p_{j,i}(n) + ∑_{j∉F} π_j
  → ∑_{j∉F} π_j

as we take the limit n → ∞. We now want to take the limit as F → S. We know that ∑_{j∈S} π_j = 1. So as we put more and more things into F, ∑_{j∉F} π_j → 0. So ∑_j π_j p_{j,i}(n) → 0. So we get the desired contradiction.

Hence we know that all states are recurrent.

To rule out the case of null recurrence, recall that in the previous discussion, we said that we “should” have π_i µ_i = 1. So we attempt to prove this. Then this would imply that µ_i is finite, since π_i > 0.

By definition µ_i = E_i(T_i), and we have the general formula

E(N) = ∑_n P(N ≥ n).

So we get

π_i µ_i = ∑_{n=1}^∞ π_i P_i(T_i ≥ n).

Note that P_i is a probability conditional on starting at i. So to work with the expression π_i P_i(T_i ≥ n), it is helpful to let π_i be the probability of starting at i. So suppose X_0 has distribution π. Then

π_i µ_i = ∑_{n=1}^∞ P(T_i ≥ n, X_0 = i).

Let’s work out what the terms are. What is the first term? It is

P(T_i ≥ 1, X_0 = i) = P(X_0 = i) = π_i,

since we know that we always have T_i ≥ 1 by definition.

For other n ≥ 2, we want to compute P(T_i ≥ n, X_0 = i). This is the probability of starting at i, and then not returning to i in the next n − 1 steps. So we have

P(T_i ≥ n, X_0 = i) = P(X_0 = i, X_m ≠ i for 1 ≤ m ≤ n − 1).

Note that all the expressions on the right look rather similar, except that the first term is = i while the others are ≠ i. We can make them look more similar by writing

P(T_i ≥ n, X_0 = i) = P(X_m ≠ i for 1 ≤ m ≤ n − 1) − P(X_m ≠ i for 0 ≤ m ≤ n − 1).

What can we do now? The trick here is to use invariance. Since we started with an invariant distribution, we always live in an invariant distribution. Looking at the time interval 1 ≤ m ≤ n − 1 is the same as looking at 0 ≤ m ≤ n − 2. In other words, the sequence (X_0, ···, X_{n−2}) has the same distribution as (X_1, ···, X_{n−1}). So we can write the expression as

P(T_i ≥ n, X_0 = i) = a_{n−2} − a_{n−1},

where

a_r = P(X_m ≠ i for 0 ≤ m ≤ r).

Now we are summing differences, and when we sum differences everything cancels term by term. Then we have

π_i µ_i = π_i + (a_0 − a_1) + (a_1 − a_2) + ···

Note that we cannot do the cancellation directly, since this is an infinite sum, and infinity behaves weirdly. We have to look at a finite truncation, do the cancellation, and take the limit. So we have

π_i µ_i = lim_{N→∞} [π_i + (a_0 − a_1) + (a_1 − a_2) + ··· + (a_{N−2} − a_{N−1})]
        = lim_{N→∞} [π_i + a_0 − a_{N−1}]
        = π_i + a_0 − lim_{N→∞} a_N.

What is each term? π_i is the probability that X_0 = i, and a_0 is the probability that X_0 ≠ i. So we know that π_i + a_0 = 1. What about lim a_N? We know that

lim_{N→∞} a_N = P(X_m ≠ i for all m).

Since the state is recurrent, the probability of never visiting i is 0. So we get

π_i µ_i = 1.

Since π_i > 0, we get µ_i = 1/π_i < ∞ for all i. Hence we have positive recurrence. We have also proved the formula we wanted.
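As a closing sanity check, the identity π_i µ_i = 1 can be tested numerically for an arbitrary irreducible chain. The sketch below (the 3-state matrix is an illustrative choice, not from the notes) computes π by iterating a distribution under P, estimates each µ_i by simulating return times, and checks that every product π_i µ_i comes out close to 1.

```python
import random

random.seed(2)  # fixed seed so the run is reproducible
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]   # illustrative irreducible transition matrix
n = len(P)

def step(dist):
    """One application of the transition matrix: dist -> dist P."""
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def next_state(s):
    """Sample the next state from row s of P."""
    u, acc = random.random(), 0.0
    for j in range(n):
        acc += P[s][j]
        if u < acc:
            return j
    return n - 1

def mean_recurrence_time(i, trials=20000):
    """Estimate mu_i = E_i(T_i) by averaging simulated return times to i."""
    total = 0
    for _ in range(trials):
        s, t = next_state(i), 1
        while s != i:
            s, t = next_state(s), t + 1
        total += t
    return total / trials

# Invariant distribution via iteration of lambda P^n.
pi = [1.0] + [0.0] * (n - 1)
for _ in range(1000):
    pi = step(pi)

for i in range(n):
    print(i, pi[i] * mean_recurrence_time(i))   # each product should be near 1
```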