1Markov chains

IB Markov Chains



1.2 Transition probability
Recall that we can specify the dynamics of a Markov chain by the one-step
transition probability,
p
i,j
= P(X
n+1
= j | X
n
= i).
However, we don’t always want to take 1 step. We might want to take 2 steps, 3
steps, or, in general, n steps. Hence, we define
Definition (
n
-step transition probability). The
n
-step transition probability
from i to j is
p
i,j
(n) = P(X
n
= j | X
0
= i).
How do we compute these probabilities? The idea is to break this down into
smaller parts. We want to express
p
i,j
(
m
+
n
) in terms of
n
-step and
m
-step
transition probabilities. Then we can iteratively break down an arbitrary
p
i,j
(
n
)
into expressions involving the one-step transition probabilities only.
To compute
p
i,j
(
m
+
n
), we can think of this as a two-step process. We first
go from
i
to some unknown point
k
in
m
steps, and then travel from
k
to
j
in
n
more steps. To find the probability to get from
i
to
j
, we consider all possible
routes from i to j, and sum up all the probability of the paths. We have
p
ij
(m + n) = P(X
m+n
| X
0
= i)
=
X
k
P(X
m+n
= j | X
m
= k, X
0
= i)P(X
m
= k | X
0
= i)
=
X
k
P(X
m+n
= j | X
m
= k)P(X
m
= k | X
0
= i)
=
X
k
p
i,k
(m)p
k,j
(n).
Thus we get
Theorem (Chapman-Kolmogorov equation).
p
i,j
(m + n) =
X
kS
p
i,k
(m)p
k,j
(n).
This formula is suspiciously familiar. It is just matrix multiplication!
Notation. Write P (m) = (p
i,j
(m))
i,jS
.
Then we have
P (m + n) = P(m)P (n)
In particular, we have
P (n) = P (1)P (n 1) = ··· = P (1)
n
= P
n
.
This allows us to easily compute the
n
-step transition probability by matrix
multiplication.
Example. Let S = {1, 2}, with
P =
1 α α
β 1 β
We assume 0 < α, β < 1. We want to find the n-step transition probability.
We can achieve this via diagonalization. We can write P as
P = U
1
κ
1
0
0 κ
2
U,
where the κ
i
are eigenvalues of P , and U is composed of the eigenvectors.
To find the eigenvalues, we calculate
det(P λI) = (1 α λ)(1 β λ) αβ = 0.
We solve this to obtain
κ
1
= 1, κ
2
= 1 α β.
Usually, the next thing to do would be to find the eigenvectors to obtain
U
.
However, here we can cheat a bit and not do that. Using the diagonalization of
P , we have
P
n
= U
1
κ
n
1
0
0 κ
n
2
U.
We can now attempt to compute p
1,2
. We know that it must be of the form
p
1,2
=
n
1
+ Bκ
n
2
= A + B(1 α β)
n
where
A
and
B
are constants coming from
U
and
U
1
. However, we know well
that
p
1,2
(0) = 0, p
12
(1) = α.
So we obtain
A + B = 0
A + B(1 α β) = α.
This is something we can solve, and obtain
p
1,2
(n) =
α
α + β
(1 (1 α β)
n
) = 1 p
1,1
(n).
How about
p
2,1
and
p
2,2
? Well we don’t need additional work. We can obtain
these simply by interchanging α and β. So we obtain
P
n
=
1
α + β
β + α(1 α β)
n
α α(1 α β)
n
β β(1 β α)
n
α + β(1 β α)
n
What happens as n ? We can take the limit and obtain
P
n
1
α + β
β α
β α
We see that the two rows are the same. This means that as time goes on, where
we end up does not depend on where we started. We will later (near the end of
the course) see that this is generally true for most Markov chains.
Alternatively, we can solve this by a difference equation. The recurrence
relation is given by
p
1,1
(n + 1) = p
1,1
(n)(1 α) + p
1,2
(n)β.
Writing in terms of p
11
only, we have
p
1,1
(n + 1) = p
1,1
(n)(1 α) + (1 p
1,1
(n))β.
We can solve this as we have done in IA Differential Equations.
We saw that the Chapman-Kolmogorov equation can be concisely stated as
a rule about matrix multiplication. In general, many statements about Markov
chains can be formulated in the language of linear algebra naturally.
For example, let
X
0
have distribution
λ
. What is the distribution of
X
1
? By
definition, it is
P(X
1
= j) =
X
i
P(X
1
= j | X
0
= i)P(X
0
= i) =
X
i
λ
i
p
i,j
.
Hence this has a distribution
λP
, where
λ
is treated as a row vector. Similarly,
X
n
has the distribution λP
n
.
In fact, historically, Markov chains was initially developed as a branch of
linear algebra, and a lot of the proofs were just linear algebra manipulations.
However, nowadays, we often look at it as a branch of probability theory instead,
and this is what we will do in this course. So don’t be scared if you hate linear
algebra.