3 Long-run behaviour
3.1 Invariant distributions
We want to look at what happens in the long run. Recall that at the very beginning of the course, we calculated the transition probabilities of the two-state Markov chain, and saw that in the long run, as n → ∞, the probability distribution of the X_n will “converge” to some particular distribution. Moreover, this limit does not depend on where we started. We’d like to see if this is true for all Markov chains.
First of all, we want to make it clear what we mean by the chain “converging” to something. When we are dealing with real sequences x_n, we have a precise definition of what it means for x_n → x. How can we define the convergence of a sequence of random variables X_n? These are not proper numbers, so we cannot just apply our usual definitions.
For the purposes of our investigation of Markov chains here, it turns out that the right way to think about convergence is to look at the probability mass function. For each k ∈ S, we ask if P(X_n = k) converges to anything.
In most cases, P(X_n = k) converges to something. Hence, this is not an interesting question to ask. What we would really want to know is whether the limit is a probability mass function. It is, for example, possible that P(X_n = k) → 0 for all k, and the result is not a distribution.
From Analysis, we know there are, in general, two ways to prove something
converges — we either “guess” a limit and then prove it does indeed converge to
that limit, or we prove the sequence is Cauchy. It would be rather silly to prove
that these probabilities form a Cauchy sequence, so we will attempt to guess the
limit. The limit will be known as an invariant distribution, for reasons that will
become obvious shortly.
The main focus of this section is to study the existence and properties of
invariant distributions, and we will provide sufficient conditions for convergence
to occur in the next section.
Recall that if we have a starting distribution λ, then the distribution of the nth step is given by λP^n. We have the following trivial identity:

    λP^{n+1} = (λP^n)P.
If the distribution converges, then we have λP^n → π for some π, and also λP^{n+1} → π. Hence the limit π satisfies
πP = π.
We call these invariant distributions.
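For example, consider a two-state chain like the one from the start of the course, with transition matrix (writing its transition probabilities as α, β ∈ (0, 1), which need not match the notation used earlier)

    P = ( 1 − α     α   )
        (   β     1 − β ).

Solving πP = π together with π_1 + π_2 = 1 gives π_1 α = π_2 β, and hence

    π = ( β/(α + β), α/(α + β) ),

which matches the limit of the n-step transition probabilities for this chain.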
Definition (Invariant distribution). Let X_j be a Markov chain with transition probabilities P. The distribution π = (π_k : k ∈ S) is an invariant distribution if
(i) π_k ≥ 0, Σ_k π_k = 1.
(ii) π = πP.
The first condition just ensures that this is a genuine distribution.
An invariant distribution is also known as an invariant measure, equilibrium
distribution or steady-state distribution.
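For a finite state space, condition (ii) together with the normalisation in (i) is just a linear system, so an invariant distribution can be found numerically. A minimal sketch (the transition matrix below is a made-up example, not one from the course):

    import numpy as np

    # A made-up 3-state transition matrix (rows sum to 1).
    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.4, 0.4, 0.2]])
    n = P.shape[0]

    # pi = pi P, i.e. (P^T - I) pi^T = 0, plus the normalisation sum(pi) = 1.
    # Stack the normalisation row onto the homogeneous system and solve by
    # least squares (the system is consistent, so this recovers pi exactly).
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)

    print(pi)        # the invariant distribution
    print(pi @ P)    # should agree with pi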
Theorem. Consider an irreducible Markov chain. Then
(i) There exists an invariant distribution if some state is positive recurrent.
(ii) If there is an invariant distribution π, then every state is positive recurrent, and

    π_i = 1/µ_i

for i ∈ S, where µ_i is the mean recurrence time of i. In particular, π is unique.
Note how we worded the first statement. Recall that we once stated that if
one state is positive recurrent, then all states are positive recurrent, and then
said we would defer the proof for later on. This is where we actually prove it.
In (i), we show that if some state is positive recurrent, then the chain has an invariant
distribution. Then (ii) tells us if there is an invariant distribution, then all states
are positive recurrent. Hence one state being positive recurrent implies all states
being positive recurrent.
Now where did the formula for π come from? We can first think what π_i should be. By definition, we should know that for large m, P(X_m = i) ∼ π_i. This means that if we go on to really late in time, we would expect to visit the state i a proportion π_i of the time. On the other hand, the mean recurrence time tells us that we are expected to (re)-visit i every µ_i steps. So it makes sense that π_i = 1/µ_i.
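This heuristic is easy to test empirically: run a chain for a long time, record the fraction of time spent in each state and the gaps between successive visits, and compare. A rough simulation sketch, again on a made-up chain:

    import numpy as np

    rng = np.random.default_rng(0)
    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.4, 0.4, 0.2]])
    n_states, n_steps = P.shape[0], 200_000

    x = 0
    time_in_state = np.zeros(n_states)           # visits to each state
    last_visit = {}                               # last time we were in each state
    gaps = {i: [] for i in range(n_states)}       # observed recurrence times

    for t in range(n_steps):
        time_in_state[x] += 1
        if x in last_visit:
            gaps[x].append(t - last_visit[x])
        last_visit[x] = t
        x = rng.choice(n_states, p=P[x])

    for i in range(n_states):
        # long-run fraction of time in i  vs  1 / (empirical mean recurrence time)
        print(i, time_in_state[i] / n_steps, 1 / np.mean(gaps[i]))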
To put this on a more solid ground and actually prove it, we would like to look at some time intervals. For example, we might ask how many times we will hit i in 100 steps. This is not a good thing to do, because we are not given where we are starting, and this probability can depend a lot on where the starting point is.
It turns out the natural thing to do is not to use a fixed time interval, but use a random time interval. In particular, we fix a state k, and look at the time interval between two consecutive visits of k.
We start by letting X_0 = k. Let W_i denote the number of visits to i before the next visit to k. Formally, we have

    W_i = Σ_{m=1}^∞ 1(X_m = i, m ≤ T_k),

where T_k is the recurrence time of k and 1 is the indicator function. In particular, W_i = 1 for i = k (if T_k is finite). We can also write this as

    W_i = Σ_{m=1}^{T_k} 1(X_m = i).
This is a random variable. So we can look at its expectation. We define
    ρ_i = E_k(W_i).

We will show that this ρ_i is almost our π_i, up to a constant.
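Before making this precise, here is a rough Monte Carlo sketch of these quantities: start the chain at k, run it until it returns to k, count the visits to each state along the way, and average over many excursions. The averages estimate ρ_i, their sum estimates µ_k, and (anticipating the theorem) dividing by the sum should give the invariant distribution. The chain is again a made-up example.

    import numpy as np

    rng = np.random.default_rng(1)
    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.4, 0.4, 0.2]])
    k, n_excursions = 0, 50_000
    counts = np.zeros(P.shape[0])

    for _ in range(n_excursions):
        x = rng.choice(P.shape[0], p=P[k])       # X_1: first step of the excursion
        while x != k:
            counts[x] += 1                        # a visit to x with 1 <= m < T_k
            x = rng.choice(P.shape[0], p=P[x])
        counts[k] += 1                            # the visit to k at time T_k itself

    rho = counts / n_excursions                   # estimates rho_i = E_k(W_i)
    print(rho)                                    # rho_k is exactly 1
    print(rho.sum())                              # estimates mu_k
    print(rho / rho.sum())                        # estimates the invariant distribution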
Proposition. For an irreducible recurrent chain and k ∈ S, with ρ = (ρ_i : i ∈ S) defined as above by

    ρ_i = E_k(W_i),    W_i = Σ_{m=1}^∞ 1(X_m = i, T_k ≥ m),

we have
(i) ρ_k = 1
(ii) Σ_i ρ_i = µ_k
(iii) ρ = ρP
(iv) 0 < ρ_i < ∞ for all i ∈ S.
Proof.
(i) This follows from the definition of ρ_i, since for m < T_k, X_m ≠ k.
(ii) Note that Σ_i W_i = T_k, since in each step we hit exactly one thing. We have

    Σ_i ρ_i = Σ_i E_k(W_i) = E_k(Σ_i W_i) = E_k(T_k) = µ_k.

Note that we secretly swapped the sum and expectation, which is in general bad because both are potentially infinite sums. However, there is a theorem (monotone convergence, or Tonelli’s theorem) that tells us this is okay whenever the summands are non-negative, which is left as an Analysis exercise.
(iii) We have

    ρ_j = E_k(W_j)
        = E_k( Σ_{m≥1} 1(X_m = j, T_k ≥ m) )
        = Σ_{m≥1} P_k(X_m = j, T_k ≥ m)
        = Σ_{m≥1} Σ_{i∈S} P_k(X_m = j | X_{m−1} = i, T_k ≥ m) P_k(X_{m−1} = i, T_k ≥ m)
We now use the Markov property. Note that T_k ≥ m means X_1, ··· , X_{m−1} are all not k. The Markov property thus tells us the condition T_k ≥ m is useless. So we are left with
        = Σ_{m≥1} Σ_{i∈S} P_k(X_m = j | X_{m−1} = i) P_k(X_{m−1} = i, T_k ≥ m)
        = Σ_{m≥1} Σ_{i∈S} p_{i,j} P_k(X_{m−1} = i, T_k ≥ m)
        = Σ_{i∈S} p_{i,j} Σ_{m≥1} P_k(X_{m−1} = i, T_k ≥ m).
The last term looks really like ρ_i, but the indices are slightly off. We shall have faith in ourselves, and show that this is indeed equal to ρ_i.
First we let r = m − 1, and get

    Σ_{m≥1} P_k(X_{m−1} = i, T_k ≥ m) = Σ_{r=0}^∞ P_k(X_r = i, T_k ≥ r + 1).
Of course this does not fix the problem. We will look at the different possible cases. First, if i = k, then the r = 0 term is 1, since T_k ≥ 1 is always true by definition and X_0 = k, also by construction. On the other hand, the other terms are all zero, since it is impossible for the return time to be greater or equal to r + 1 if we are at k at time r. So the sum is 1, which is ρ_k.
In the case where i ≠ k, first note that when r = 0 we know that X_0 = k ≠ i. So the term is zero. For r ≥ 1, we know that if X_r = i and T_k ≥ r, then we must also have T_k ≥ r + 1, since it is impossible for the return time to k to be exactly r if we are not at k at time r. So P_k(X_r = i, T_k ≥ r + 1) = P_k(X_r = i, T_k ≥ r). So indeed we have

    Σ_{m≥1} P_k(X_{m−1} = i, T_k ≥ m) = ρ_i.
Hence we get

    ρ_j = Σ_{i∈S} p_{i,j} ρ_i.

So done.
(iv) To show that 0 < ρ_i < ∞, first fix our i, and note that ρ_k = 1. We know that ρ = ρP = ρP^n for n ≥ 1. So by expanding the matrix sum, we know that for any m, n,

    ρ_i ≥ ρ_k p_{k,i}(n),
    ρ_k ≥ ρ_i p_{i,k}(m).

By irreducibility, we now choose m, n such that p_{i,k}(m), p_{k,i}(n) > 0. So we have

    ρ_k p_{k,i}(n) ≤ ρ_i ≤ ρ_k / p_{i,k}(m).

Since ρ_k = 1, the result follows.
Now we can prove our initial theorem.
Theorem. Consider an irreducible Markov chain. Then
(i)
There exists an invariant distribution if and only if some state is positive
recurrent.
(ii) If there is an invariant distribution π, then every state is positive recurrent, and

    π_i = 1/µ_i

for i ∈ S, where µ_i is the mean recurrence time of i. In particular, π is unique.
Proof.
(i) Let k be a positive recurrent state. Then

    π_i = ρ_i / µ_k

satisfies π_i ≥ 0 with Σ_i π_i = 1, and is an invariant distribution.
(ii) Let π be an invariant distribution. We first show that all entries are non-zero. For all n, we have

    π = πP^n.

Hence for all i, j ∈ S, n ∈ N, we have

    π_i ≥ π_j p_{j,i}(n).    (∗)

Since Σ_i π_i = 1, there is some k such that π_k > 0.
By (∗) with j = k, we know that

    π_i ≥ π_k p_{k,i}(n) > 0

for some n, by irreducibility. So π_i > 0 for all i.
Now we show that all states are positive recurrent. So we need to rule out
the cases of transience and null recurrence.
So assume all states are transient. Then p_{j,i}(n) → 0 as n → ∞, for all i, j ∈ S. However, we know that

    π_i = Σ_j π_j p_{j,i}(n).
If our state space is finite, then since p_{j,i}(n) → 0, the sum tends to 0, and we reach a contradiction, since π_i is non-zero. If we have a countably infinite set, we have to be more careful. We have a huge state space S, and we don’t know how to work with it. So we approximate it by a finite F, and split S into F and S \ F. So we get

    0 ≤ Σ_j π_j p_{j,i}(n)
      = Σ_{j∈F} π_j p_{j,i}(n) + Σ_{j∉F} π_j p_{j,i}(n)
      ≤ Σ_{j∈F} p_{j,i}(n) + Σ_{j∉F} π_j
      → Σ_{j∉F} π_j

as we take the limit n → ∞. We now want to take the limit as F → S. We know that Σ_{j∈S} π_j = 1. So as we put more and more things into F, Σ_{j∉F} π_j → 0. So Σ_j π_j p_{j,i}(n) → 0. So we get the desired contradiction.
Hence we know that all states are recurrent.
To rule out the case of null recurrence, recall that in the previous discussion, we said that we “should” have π_i µ_i = 1. So we attempt to prove this. Then this would imply that µ_i is finite, since π_i > 0.
By definition µ_i = E_i(T_i), and we have the general formula (valid for any random variable taking values in {1, 2, . . .})

    E(N) = Σ_{n≥1} P(N ≥ n).
So we get

    π_i µ_i = Σ_{n=1}^∞ π_i P_i(T_i ≥ n).
Note that P_i is a probability conditional on starting at i. So to work with the expression π_i P_i(T_i ≥ n), it is helpful to let π_i be the probability of starting at i. So suppose X_0 has distribution π. Then

    π_i µ_i = Σ_{n=1}^∞ P(T_i ≥ n, X_0 = i).
Let’s work out what the terms are. What is the first term? It is

    P(T_i ≥ 1, X_0 = i) = P(X_0 = i) = π_i,

since we know that we always have T_i ≥ 1 by definition.
For other n ≥ 2, we want to compute P(T_i ≥ n, X_0 = i). This is the probability of starting at i, and then not returning to i in the next n − 1 steps. So we have

    P(T_i ≥ n, X_0 = i) = P(X_0 = i, X_m ≠ i for 1 ≤ m ≤ n − 1).
Note that all the expressions on the right look rather similar, except that the first term is = i while the others are ≠ i. We can make them look more similar by writing

    P(T_i ≥ n, X_0 = i) = P(X_m ≠ i for 1 ≤ m ≤ n − 1) − P(X_m ≠ i for 0 ≤ m ≤ n − 1).
What can we do now? The trick here is to use invariance. Since we started with an invariant distribution, we always live in an invariant distribution. Looking at the time interval 1 ≤ m ≤ n − 1 is the same as looking at 0 ≤ m ≤ n − 2. In other words, the sequence (X_0, ··· , X_{n−2}) has the same distribution as (X_1, ··· , X_{n−1}). So we can write the expression as

    P(T_i ≥ n, X_0 = i) = a_{n−2} − a_{n−1},

where

    a_r = P(X_m ≠ i for 0 ≤ m ≤ r).
Now we are summing differences, and when we sum differences everything
cancels term by term. Then we have
    π_i µ_i = π_i + (a_0 − a_1) + (a_1 − a_2) + ···
Note that we cannot do the cancellation directly, since this is an infinite
sum, and infinity behaves weirdly. We have to look at a finite truncation,
do the cancellation, and take the limit. So we have
    π_i µ_i = lim_{N→∞} [π_i + (a_0 − a_1) + (a_1 − a_2) + ··· + (a_{N−2} − a_{N−1})]
            = lim_{N→∞} [π_i + a_0 − a_{N−1}]
            = π_i + a_0 − lim_{N→∞} a_N.
What is each term? π_i is the probability that X_0 = i, and a_0 is the probability that X_0 ≠ i. So we know that π_i + a_0 = 1. What about lim a_N? We know that

    lim_{N→∞} a_N = P(X_m ≠ i for all m).

Since the state is recurrent, the probability of never visiting i is 0. So we get

    π_i µ_i = 1.
Since π_i > 0, we get µ_i = 1/π_i < ∞ for all i. Hence we have positive recurrence. We have also proved the formula we wanted.
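As a closing sanity check on the theorem, for a finite chain the vector ρ can also be computed exactly rather than by simulation: for j ≠ k, ρ_j is the expected number of visits to j before returning to k, which can be read off from (I − Q)^{-1}, where Q is P with the row and column of k removed. This “expected visits before absorption” identity is a standard fact not derived in these notes; the sketch below uses it on the same made-up chain as before.

    import numpy as np

    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.4, 0.4, 0.2]])
    k = 0
    others = [i for i in range(P.shape[0]) if i != k]

    # Expected visits to j != k before hitting k, starting from i != k:
    # N = (I - Q)^{-1}, with Q the sub-matrix of P on the states other than k.
    Q = P[np.ix_(others, others)]
    N = np.linalg.inv(np.eye(len(others)) - Q)

    # rho_j = sum_i p_{k,i} N_{i,j} for j != k (condition on the first step),
    # and rho_k = 1.
    rho = np.ones(P.shape[0])
    rho[others] = P[k, others] @ N

    mu_k = rho.sum()          # proposition (ii): sum_i rho_i = mu_k
    pi = rho / mu_k           # theorem: pi_i = rho_i / mu_k = 1 / mu_i
    print(pi, pi @ P)         # pi P should equal pi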