2 Classification of chains and states

IB Markov Chains

2.4 The strong Markov property and applications

We are going to state the strong Markov property and see applications of it. Before this, we should know what the weak Markov property is. We have, in fact, already seen the weak Markov property. It’s just that we called it the “Markov property” instead.

In probability, we often have “strong” and “weak” versions of things. For example, we have the strong and weak laws of large numbers. The difference is that the weak versions are expressed in terms of probabilities, while the strong versions are expressed in terms of random variables.

Initially, when people first started developing probability theory, they just talked about probability distributions like the Poisson distribution or the normal distribution. However, later it turned out it is often nicer to talk about random variables instead. After messing with random variables, we can just take expectations or evaluate probabilities to get the corresponding statement about probability distributions. Hence usually the “strong” versions imply the “weak” versions, but not the other way round.

In this case, recall that we defined the Markov property in terms of the probabilities at some fixed time. We have some fixed time $t$, and we want to know the probabilities of events after $t$ in terms of what happened before $t$. In the strong Markov property, we will allow the time $T$ to be a random variable, and say something slightly more involved. However, we will not allow $T$ to be any random variable, but just some nice ones.

Definition (Stopping time). Let $X$ be a Markov chain. A random variable $T$ (which is a function $\Omega \to \mathbb{N} \cup \{\infty\}$) is a stopping time for the chain $X = (X_n)$ if for $n \geq 0$, the event $\{T = n\}$ is given in terms of $X_0, \cdots, X_n$.

For example, suppose we are in a casino and gambling. We let $X_n$ be the amount of money we have at time $n$. Then we can set our stopping time as “the time when we have $10 left”. This is a stopping time, in the sense that we can use this as a guide to when to stop — it is certainly possible to set yourself a guide that you should leave the casino when you have $10 left. However, it does not make sense to say “I will leave if the next game will make me bankrupt”, since there is no way to tell if the next game will make you bankrupt (it certainly will not if you win the game!). Hence this is not a stopping time.

Example. The hitting time $H_A$ is a stopping time. This is since
$$\{H_A = n\} = \{X_i \notin A \text{ for } i < n\} \cap \{X_n \in A\}.$$
We also know that $H_A + 1$ is a stopping time, since $\{H_A + 1 = n\}$ depends only on $X_i$ for $i \leq n - 1$. However, $H_A - 1$ is not a stopping time, since $\{H_A - 1 = n\}$ depends on $X_{n+1}$.
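The defining property — that $\{H_A = n\}$ is decided by $X_0, \cdots, X_n$ alone — can be sketched in code. The function below is a hypothetical illustration (not from the notes): it scans a path from the left and stops as soon as it sees a state in $A$, never peeking ahead.

```python
def hitting_time(path, A):
    """First index n with path[n] in A. The decision {H_A = n} inspects
    only X_0, ..., X_n -- no peeking ahead -- which is exactly what
    makes H_A a stopping time."""
    for n, x in enumerate(path):
        if x in A:
            return n
    return float('inf')  # the infimum of the empty set is +infinity

# a hypothetical sample path of a chain on the non-negative integers
print(hitting_time([3, 2, 3, 2, 1, 0, 1], {0}))  # → 5
```

By contrast, a function computing $H_A - 1$ at time $n$ would need to inspect `path[n + 1]`, which is why it fails to be a stopping time.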

We can now state the strong Markov property, which is expressed in a rather complicated manner but can be useful at times.

Theorem (Strong Markov property). Let $X$ be a Markov chain with transition matrix $P$, and let $T$ be a stopping time for $X$. Given $T < \infty$ and $X_T = i$, the chain $(X_{T+k} : k \geq 0)$ is a Markov chain with transition matrix $P$ and initial distribution $X_{T+0} = i$, and this Markov chain is independent of $X_0, \cdots, X_T$.

Proof is omitted.

Example (Gambler’s ruin). Again, this is the Markov chain taking values on the non-negative integers, moving to the right with probability $p$ and left with probability $q = 1 - p$. 0 is an absorbing state, since we have no money left to bet if we are broke.

Instead of computing the probability of hitting zero, we want to find the time it takes to get to 0, i.e.
$$H = \inf\{n \geq 0 : X_n = 0\}.$$
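As a rough illustration (not part of the notes), we can simulate this chain started at $X_0 = 1$ and draw samples of $H$; the `cap` parameter below is a stand-in for $+\infty$, since a simulation cannot run forever.

```python
import random

def sample_H(p, cap=1_000):
    """One draw of H = inf{n >= 0 : X_n = 0} for the gambler's-ruin
    chain started at X_0 = 1; walks surviving `cap` steps are reported
    as infinite (a simulation stand-in for never hitting 0)."""
    x, n = 1, 0
    while x != 0 and n < cap:
        x += 1 if random.random() < p else -1
        n += 1
    return n if x == 0 else float('inf')

random.seed(0)
draws = [sample_H(0.3) for _ in range(10_000)]
# H = 1 exactly when the first step goes left, so P_1(H = 1) = q = 0.7
print(draws.count(1) / len(draws))  # should be near 0.7
```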

Note that the infimum of the empty set is $+\infty$, i.e. if we never hit zero, we say it takes infinite time. What is the distribution of $H$? We define the generating function
$$G_i(s) = E_i(s^H) = \sum_{n=0}^{\infty} s^n P_i(H = n), \quad |s| < 1.$$
Note that we need the requirement that $|s| < 1$, since it is possible that $H$ is infinite, and we would not want to think about whether the sum converges when $s = 1$. However, we know that it does converge for $|s| < 1$.

We have
$$G_1(s) = E_1(s^H) = pE_1(s^H \mid X_1 = 2) + qE_1(s^H \mid X_1 = 0).$$

How can we simplify this? The second term is easy, since if $X_1 = 0$, then we must have $H = 1$. So $E_1(s^H \mid X_1 = 0) = s$. The first term is more tricky. We are now at 2. To get to 0, we need to pass through 1. So the time needed to get to 0 is the time to get from 2 to 1 (say $H'$), plus the time to get from 1 to 0 (say $H''$). We know that $H'$ and $H''$ have the same distribution as $H$, and by the strong Markov property, they are independent. So

$$G_1 = pE_1\big(s^{H' + H'' + 1}\big) + qs = psG_1^2 + qs. \quad (*)$$

Solving this quadratic in $G_1$, we get two solutions
$$G_1(s) = \frac{1 \pm \sqrt{1 - 4pqs^2}}{2ps}.$$

We have to be careful here. This result says that for each value of $s$, $G_1(s)$ is either $\frac{1 + \sqrt{1 - 4pqs^2}}{2ps}$ or $\frac{1 - \sqrt{1 - 4pqs^2}}{2ps}$. It does not say that there is some consistent choice of $+$ or $-$ that works for everything.

However, we know that if we suddenly change the sign, then $G_1(s)$ will be discontinuous at that point, but $G_1$, being a power series, has to be continuous. So the solution must be either $+$ for all $s$, or $-$ for all $s$.

To determine the sign, we can look at what happens when $s = 0$. We see that the numerator becomes $1 \pm 1$, while the denominator is 0. We know that $G_1$ converges at $s = 0$. Hence the numerator must be 0 there. So we must pick $-$, i.e.
$$G_1(s) = \frac{1 - \sqrt{1 - 4pqs^2}}{2ps}.$$
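Since the $-$ root is supposed to satisfy $(*)$, i.e. $G_1 = psG_1^2 + qs$, we can sanity-check the closed form numerically (the function name and the particular values of $p$ and $s$ below are arbitrary choices for illustration):

```python
import math

def G1(s, p):
    """Closed form with the minus sign: the root that stays finite at s = 0."""
    q = 1 - p
    return (1 - math.sqrt(1 - 4 * p * q * s * s)) / (2 * p * s)

p, q, s = 0.3, 0.7, 0.5
lhs = G1(s, p)
rhs = p * s * lhs ** 2 + q * s  # right-hand side of (*)
print(abs(lhs - rhs) < 1e-12)  # → True
```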

We can find $P_1(H = k)$ by expanding the Taylor series.

What is the probability of ever hitting 0? This is
$$P_1(H < \infty) = \sum_{n=1}^{\infty} P_1(H = n) = \lim_{s \to 1} G_1(s) = \frac{1 - \sqrt{1 - 4pq}}{2p}.$$

We can rewrite this using the fact that $q = 1 - p$. So $1 - 4pq = 1 - 4p(1 - p) = (1 - 2p)^2 = |q - p|^2$. So we can write
$$P_1(H < \infty) = \frac{1 - |p - q|}{2p} = \begin{cases} 1 & p \leq q \\ \frac{q}{p} & p > q \end{cases}.$$
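For $p > q$ this gives $P_1(H < \infty) = q/p$, which we can compare against a Monte Carlo estimate. This is only a sketch: truncating at `cap` steps slightly undercounts hits, but negligibly so, since a surviving right-drifting walk is far from 0 by then.

```python
import random

def hits_zero(p, cap=2_000):
    """Does the gambler's-ruin chain from X_0 = 1 hit 0 within `cap` steps?"""
    x = 1
    for _ in range(cap):
        if x == 0:
            return True
        x += 1 if random.random() < p else -1
    return x == 0

random.seed(1)
p = 0.6  # p > q, so P_1(H < infinity) = q/p = 2/3
est = sum(hits_zero(p) for _ in range(5_000)) / 5_000
print(est)  # should be close to 2/3
```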

Using this, we can also find $\mu = E_1(H)$. Firstly, if $p > q$, then it is possible that $H = \infty$. So $\mu = \infty$. If $p \leq q$, we can find $\mu$ by differentiating $G_1(s)$ and evaluating at $s = 1$. Doing this directly would result in horrible and messy algebra, which we want to avoid. Instead, we can differentiate $(*)$ and obtain

$$G_1' = pG_1^2 + 2psG_1G_1' + q.$$

We can rewrite this as
$$G_1'(s) = \frac{pG_1(s)^2 + q}{1 - 2psG_1(s)}.$$
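As an independent numerical check (not in the notes), the rearranged derivative can be compared against a central finite difference of the closed form for $G_1$:

```python
import math

def G1(s, p):
    """Closed form for G_1 with the minus sign."""
    q = 1 - p
    return (1 - math.sqrt(1 - 4 * p * q * s * s)) / (2 * p * s)

def G1_prime(s, p):
    """The rearranged derivative: G_1' = (p G_1^2 + q) / (1 - 2 p s G_1)."""
    q = 1 - p
    g = G1(s, p)
    return (p * g * g + q) / (1 - 2 * p * s * g)

p, s, h = 0.3, 0.5, 1e-6
numeric = (G1(s + h, p) - G1(s - h, p)) / (2 * h)  # central difference
print(abs(G1_prime(s, p) - numeric) < 1e-5)  # → True
```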

By Abel’s lemma, we have
$$\mu = \lim_{s \to 1} G_1'(s) = \begin{cases} \infty & p = \frac{1}{2} \\ \frac{1}{q - p} & p < \frac{1}{2} \end{cases}.$$
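For $p < \frac{1}{2}$, the value $\mu = \frac{1}{q - p}$ can likewise be checked by simulation; e.g. $p = 0.3$ gives $\mu = 1/0.4 = 2.5$. (A sketch: the `cap` below is a safety bound that is essentially never reached, since $H$ is almost surely finite for $p < \frac{1}{2}$.)

```python
import random

def sample_H(p, cap=100_000):
    """Hitting time of 0 for the gambler's-ruin chain started at X_0 = 1
    (almost surely finite when p < 1/2)."""
    x, n = 1, 0
    while x != 0 and n < cap:
        x += 1 if random.random() < p else -1
        n += 1
    return n

random.seed(2)
mean = sum(sample_H(0.3) for _ in range(20_000)) / 20_000
print(mean)  # should be near 1/(q - p) = 2.5
```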