1 Markov chains

IB Markov Chains

1.2 Transition probability

Recall that we can specify the dynamics of a Markov chain by the one-step transition probability,
\[
  p_{i,j} = \mathbb{P}(X_{n+1} = j \mid X_n = i).
\]

However, we don’t always want to take 1 step. We might want to take 2 steps, 3

steps, or, in general, n steps. Hence, we define

Definition ($n$-step transition probability). The $n$-step transition probability from $i$ to $j$ is
\[
  p_{i,j}(n) = \mathbb{P}(X_n = j \mid X_0 = i).
\]

How do we compute these probabilities? The idea is to break this down into smaller parts. We want to express $p_{i,j}(m + n)$ in terms of $m$-step and $n$-step transition probabilities. Then we can iteratively break down an arbitrary $p_{i,j}(n)$ into expressions involving the one-step transition probabilities only.

To compute $p_{i,j}(m + n)$, we can think of this as a two-step process. We first go from $i$ to some unknown point $k$ in $m$ steps, and then travel from $k$ to $j$ in $n$ more steps. To find the probability of getting from $i$ to $j$, we consider all possible routes from $i$ to $j$, and sum up the probabilities of the paths. We have

\begin{align*}
  p_{i,j}(m + n) &= \mathbb{P}(X_{m+n} = j \mid X_0 = i)\\
  &= \sum_k \mathbb{P}(X_{m+n} = j \mid X_m = k, X_0 = i)\, \mathbb{P}(X_m = k \mid X_0 = i)\\
  &= \sum_k \mathbb{P}(X_{m+n} = j \mid X_m = k)\, \mathbb{P}(X_m = k \mid X_0 = i)\\
  &= \sum_k p_{i,k}(m)\, p_{k,j}(n),
\end{align*}
where the second equality conditions on the value of $X_m$, and the third uses the Markov property.

Thus we get

Theorem (Chapman-Kolmogorov equation).
\[
  p_{i,j}(m + n) = \sum_{k \in S} p_{i,k}(m)\, p_{k,j}(n).
\]

This formula is suspiciously familiar. It is just matrix multiplication!

Notation. Write $P(m) = (p_{i,j}(m))_{i,j \in S}$.

Then we have
\[
  P(m + n) = P(m)P(n).
\]
In particular, we have
\[
  P(n) = P(1)P(n - 1) = \cdots = P(1)^n = P^n.
\]

This allows us to easily compute the $n$-step transition probability by matrix multiplication.
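As a small illustration, the identity $P(n) = P^n$ can be realized directly in code. This is just a sketch, with example values $\alpha = 0.3$, $\beta = 0.5$ assumed for the two-state chain introduced below; it is not part of the notes.

```python
# Compute n-step transition probabilities P(n) = P^n by repeated
# matrix multiplication (matrices as lists of rows).

def mat_mul(A, B):
    """Multiply two square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def n_step(P, n):
    """Return P^n, the matrix of n-step transition probabilities."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]  # identity matrix = P^0
    for _ in range(n):
        result = mat_mul(result, P)
    return result

# Example values (assumed): a two-state chain with alpha = 0.3, beta = 0.5.
alpha, beta = 0.3, 0.5
P = [[1 - alpha, alpha], [beta, 1 - beta]]
P3 = n_step(P, 3)
# Each row of P^n is a probability distribution, so it sums to 1.
assert abs(sum(P3[0]) - 1) < 1e-12
```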

Example. Let $S = \{1, 2\}$, with
\[
  P =
  \begin{pmatrix}
    1 - \alpha & \alpha\\
    \beta & 1 - \beta
  \end{pmatrix}.
\]
We assume $0 < \alpha, \beta < 1$. We want to find the $n$-step transition probability.

We can achieve this via diagonalization. We can write $P$ as
\[
  P = U^{-1}
  \begin{pmatrix}
    \kappa_1 & 0\\
    0 & \kappa_2
  \end{pmatrix}
  U,
\]
where the $\kappa_i$ are the eigenvalues of $P$, and $U$ is composed of the eigenvectors.

To find the eigenvalues, we calculate
\[
  \det(P - \lambda I) = (1 - \alpha - \lambda)(1 - \beta - \lambda) - \alpha\beta = 0.
\]
We solve this to obtain
\[
  \kappa_1 = 1, \quad \kappa_2 = 1 - \alpha - \beta.
\]

Usually, the next thing to do would be to find the eigenvectors to obtain $U$. However, here we can cheat a bit and not do that. Using the diagonalization of $P$, we have

\[
  P^n = U^{-1}
  \begin{pmatrix}
    \kappa_1^n & 0\\
    0 & \kappa_2^n
  \end{pmatrix}
  U.
\]

We can now attempt to compute $p_{1,2}(n)$. We know that it must be of the form
\[
  p_{1,2}(n) = A\kappa_1^n + B\kappa_2^n = A + B(1 - \alpha - \beta)^n,
\]

where $A$ and $B$ are constants coming from $U$ and $U^{-1}$. However, we know well that
\[
  p_{1,2}(0) = 0, \quad p_{1,2}(1) = \alpha.
\]

So we obtain
\begin{align*}
  A + B &= 0\\
  A + B(1 - \alpha - \beta) &= \alpha.
\end{align*}

This is something we can solve, and obtain
\[
  p_{1,2}(n) = \frac{\alpha}{\alpha + \beta}\left(1 - (1 - \alpha - \beta)^n\right) = 1 - p_{1,1}(n).
\]

How about $p_{2,1}(n)$ and $p_{2,2}(n)$? Well, we don't need additional work. We can obtain these simply by interchanging $\alpha$ and $\beta$. So we obtain

\[
  P^n = \frac{1}{\alpha + \beta}
  \begin{pmatrix}
    \beta + \alpha(1 - \alpha - \beta)^n & \alpha - \alpha(1 - \alpha - \beta)^n\\
    \beta - \beta(1 - \alpha - \beta)^n & \alpha + \beta(1 - \alpha - \beta)^n
  \end{pmatrix}.
\]
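As a sanity check on the closed form, we can compare it numerically against $P^n$ computed by repeated multiplication. This is just a sketch with example values $\alpha = 0.3$, $\beta = 0.5$ assumed; it is not part of the notes.

```python
# Compare the closed-form expression for P^n against the n-step matrix
# obtained by repeated matrix multiplication.

def mat_mul(A, B):
    """Multiply two square matrices (lists of rows)."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

alpha, beta = 0.3, 0.5   # example values (assumed)
P = [[1 - alpha, alpha], [beta, 1 - beta]]
n = 7

Pn = P
for _ in range(n - 1):   # compute P^n
    Pn = mat_mul(Pn, P)

k = (1 - alpha - beta) ** n
s = alpha + beta
closed = [[(beta + alpha * k) / s, (alpha - alpha * k) / s],
          [(beta - beta * k) / s, (alpha + beta * k) / s]]

for i in range(2):
    for j in range(2):
        assert abs(Pn[i][j] - closed[i][j]) < 1e-12
```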

What happens as $n \to \infty$? We can take the limit and obtain
\[
  P^n \to \frac{1}{\alpha + \beta}
  \begin{pmatrix}
    \beta & \alpha\\
    \beta & \alpha
  \end{pmatrix}.
\]
We see that the two rows are the same. This means that as time goes on, where we end up does not depend on where we started. We will later (near the end of the course) see that this is generally true for most Markov chains.
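We can observe this convergence numerically. A minimal sketch, again assuming example values $\alpha = 0.3$, $\beta = 0.5$ (so $|1 - \alpha - \beta| < 1$ and the geometric term dies off quickly):

```python
# For large n, every row of P^n approaches
# (beta/(alpha+beta), alpha/(alpha+beta)).

def mat_mul(A, B):
    """Multiply two square matrices (lists of rows)."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

alpha, beta = 0.3, 0.5   # example values (assumed)
P = [[1 - alpha, alpha], [beta, 1 - beta]]

Pn = P
for _ in range(99):      # compute P^100
    Pn = mat_mul(Pn, P)

limit_row = [beta / (alpha + beta), alpha / (alpha + beta)]
# Both rows have converged to the same limiting distribution.
for row in Pn:
    assert all(abs(x - y) < 1e-12 for x, y in zip(row, limit_row))
```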

Alternatively, we can solve this by a difference equation. The recurrence relation is given by
\[
  p_{1,1}(n + 1) = p_{1,1}(n)(1 - \alpha) + p_{1,2}(n)\beta.
\]

Writing in terms of $p_{1,1}$ only, we have
\[
  p_{1,1}(n + 1) = p_{1,1}(n)(1 - \alpha) + (1 - p_{1,1}(n))\beta.
\]

We can solve this as we have done in IA Differential Equations.
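Even without solving the recurrence analytically, we can iterate it and check that it agrees with the closed form derived above. A sketch, with example values $\alpha = 0.3$, $\beta = 0.5$ assumed:

```python
# Iterate the recurrence
#   p_{1,1}(n+1) = p_{1,1}(n)(1 - alpha) + (1 - p_{1,1}(n)) * beta
# and compare with the closed form p_{1,1}(n) = (beta + alpha(1-alpha-beta)^n)/(alpha+beta).

alpha, beta = 0.3, 0.5   # example values (assumed)

p11 = 1.0                # p_{1,1}(0) = 1: we start in state 1
for _ in range(10):      # advance to p_{1,1}(10)
    p11 = p11 * (1 - alpha) + (1 - p11) * beta

closed = (beta + alpha * (1 - alpha - beta) ** 10) / (alpha + beta)
assert abs(p11 - closed) < 1e-12
```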

We saw that the Chapman-Kolmogorov equation can be concisely stated as

a rule about matrix multiplication. In general, many statements about Markov

chains can be formulated in the language of linear algebra naturally.

For example, let $X_0$ have distribution $\lambda$. What is the distribution of $X_1$? By definition, it is
\[
  \mathbb{P}(X_1 = j) = \sum_i \mathbb{P}(X_1 = j \mid X_0 = i)\, \mathbb{P}(X_0 = i) = \sum_i \lambda_i p_{i,j}.
\]

Hence $X_1$ has distribution $\lambda P$, where $\lambda$ is treated as a row vector. Similarly, $X_n$ has distribution $\lambda P^n$.
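The row-vector picture translates directly into code. A minimal sketch, assuming an example initial distribution $\lambda = (0.2, 0.8)$ and the two-state chain from the example above:

```python
# If X_0 has distribution lambda (a row vector), then X_n has
# distribution lambda P^n: propagate by row-vector-times-matrix products.

def vec_mat(v, M):
    """Multiply a row vector by a matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v)))
            for j in range(len(M[0]))]

alpha, beta = 0.3, 0.5        # example values (assumed)
P = [[1 - alpha, alpha], [beta, 1 - beta]]
lam = [0.2, 0.8]              # an assumed initial distribution for X_0

dist = lam
for _ in range(5):            # the distribution of X_5 is lambda P^5
    dist = vec_mat(dist, P)

# The result is still a probability distribution.
assert abs(sum(dist) - 1) < 1e-12
```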

In fact, historically, the theory of Markov chains was initially developed as a branch of linear algebra, and a lot of the proofs were just linear algebra manipulations. However, nowadays, we often look at it as a branch of probability theory instead, and this is what we will do in this course. So don't be scared if you hate linear algebra.