1 Markov chains

IB Markov Chains

1.1 The Markov property

We start with the definition of a Markov chain.

Definition (Markov chain). Let $X = (X_0, X_1, \cdots)$ be a sequence of random variables taking values in some set $S$, the state space. We assume that $S$ is countable (which could be finite).

We say $X$ has the Markov property if for all $n \geq 0$, $i_0, \cdots, i_{n+1} \in S$, we have
\[
  \mathbb{P}(X_{n+1} = i_{n+1} \mid X_0 = i_0, \cdots, X_n = i_n) = \mathbb{P}(X_{n+1} = i_{n+1} \mid X_n = i_n).
\]

If $X$ has the Markov property, we call it a Markov chain.

We say that a Markov chain $X$ is homogeneous if the conditional probabilities $\mathbb{P}(X_{n+1} = j \mid X_n = i)$ do not depend on $n$.

All our chains $X$ will be Markov and homogeneous unless otherwise specified.

Since the state space $S$ is countable, we usually label the states by integers $i \in \mathbb{N}$.

Example.

(i) A random walk is a Markov chain.

(ii) The branching process is a Markov chain.

In general, to fully specify a (homogeneous) Markov chain, we will need two items:

(i) The initial distribution $\lambda_i = \mathbb{P}(X_0 = i)$. We can write this as a vector $\lambda = (\lambda_i : i \in S)$.

(ii) The transition probabilities $p_{i,j} = \mathbb{P}(X_{n+1} = j \mid X_n = i)$. We can write this as a matrix $P = (p_{i,j})_{i,j \in S}$.
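The pair $(\lambda, P)$ is all we need to simulate such a chain: draw $X_0$ from $\lambda$, then repeatedly draw the next state from the current state's row of $P$. A minimal sketch in Python (the two-state chain below is an illustrative choice, not from the notes):

```python
import random

def simulate_chain(lam, P, n_steps, rng=random):
    """Sample a path X_0, ..., X_{n_steps} of a homogeneous Markov chain.

    lam: initial distribution, lam[i] = P(X_0 = i)
    P:   transition matrix, P[i][j] = p_{i,j}
    """
    # Draw X_0 from the initial distribution lambda.
    x = rng.choices(range(len(lam)), weights=lam)[0]
    path = [x]
    for _ in range(n_steps):
        # By homogeneity and the Markov property, X_{n+1} depends
        # only on the current state x, via row x of P.
        x = rng.choices(range(len(P[x])), weights=P[x])[0]
        path.append(x)
    return path

# Example: a two-state chain.
lam = [0.5, 0.5]
P = [[0.9, 0.1],
     [0.2, 0.8]]
path = simulate_chain(lam, P, 20)
```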

We will start by proving a few properties of $\lambda$ and $P$. These let us know whether an arbitrary pair of vector and matrix $(\lambda, P)$ actually specifies a Markov chain.

Proposition.

(i) $\lambda$ is a distribution, i.e. $\lambda_i \geq 0$ and $\sum_i \lambda_i = 1$.

(ii) $P$ is a stochastic matrix, i.e. $p_{i,j} \geq 0$ and $\sum_j p_{i,j} = 1$ for all $i$.

Proof.

(i) Obvious, since $\lambda$ is a probability distribution.

(ii) $p_{i,j} \geq 0$ since $p_{i,j}$ is a probability. We also have
\[
  \sum_j p_{i,j} = \sum_j \mathbb{P}(X_1 = j \mid X_0 = i) = 1,
\]
since $\mathbb{P}(X_1 = \,\cdot \mid X_0 = i)$ is a probability distribution.

Note that we only require each row sum to be 1; the column sums need not be.
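The row-sum condition is easy to check mechanically. A small sketch (function name and tolerance are my own choices):

```python
def is_stochastic(P, tol=1e-9):
    """Check that P is a stochastic matrix: non-negative entries
    and every ROW summing to 1 (column sums are unconstrained)."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1) < tol
        for row in P
    )

P = [[0.9, 0.1],
     [0.2, 0.8]]
# Rows of P sum to 1, so P is stochastic, even though its
# columns sum to 1.1 and 0.9.
```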

We will prove another seemingly obvious fact.

Theorem. Let $\lambda$ be a distribution (on $S$) and $P$ a stochastic matrix. The sequence $X = (X_0, X_1, \cdots)$ is a Markov chain with initial distribution $\lambda$ and transition matrix $P$ iff
\[
  \mathbb{P}(X_0 = i_0, X_1 = i_1, \cdots, X_n = i_n) = \lambda_{i_0} p_{i_0,i_1} p_{i_1,i_2} \cdots p_{i_{n-1},i_n} \tag{$*$}
\]
for all $n, i_0, \cdots, i_n$.

Proof. Let $A_k$ be the event $X_k = i_k$. Then we can write $(*)$ as
\[
  \mathbb{P}(A_0 \cap A_1 \cap \cdots \cap A_n) = \lambda_{i_0} p_{i_0,i_1} p_{i_1,i_2} \cdots p_{i_{n-1},i_n}. \tag{$*$}
\]

We first assume that $X$ is a Markov chain. We prove $(*)$ by induction on $n$.

When $n = 0$, $(*)$ says $\mathbb{P}(A_0) = \lambda_{i_0}$. This is true by definition of $\lambda$.

Assume that it is true for all $n < N$. Then
\begin{align*}
  \mathbb{P}(A_0 \cap A_1 \cap \cdots \cap A_N) &= \mathbb{P}(A_0 \cap \cdots \cap A_{N-1})\, \mathbb{P}(A_N \mid A_0, \cdots, A_{N-1})\\
  &= \lambda_{i_0} p_{i_0,i_1} \cdots p_{i_{N-2},i_{N-1}}\, \mathbb{P}(A_N \mid A_0, \cdots, A_{N-1})\\
  &= \lambda_{i_0} p_{i_0,i_1} \cdots p_{i_{N-2},i_{N-1}}\, \mathbb{P}(A_N \mid A_{N-1})\\
  &= \lambda_{i_0} p_{i_0,i_1} \cdots p_{i_{N-2},i_{N-1}}\, p_{i_{N-1},i_N},
\end{align*}
using the Markov property in the third line. So it is true for $N$ as well. Hence we are done by induction.

Conversely, suppose that $(*)$ holds. Then for $n = 0$, we have $\mathbb{P}(X_0 = i_0) = \lambda_{i_0}$. Otherwise,
\begin{align*}
  \mathbb{P}(X_n = i_n \mid X_0 = i_0, \cdots, X_{n-1} = i_{n-1}) &= \mathbb{P}(A_n \mid A_0 \cap \cdots \cap A_{n-1})\\
  &= \frac{\mathbb{P}(A_0 \cap \cdots \cap A_n)}{\mathbb{P}(A_0 \cap \cdots \cap A_{n-1})}\\
  &= p_{i_{n-1},i_n},
\end{align*}
applying $(*)$ to the numerator and denominator in the last step. This is independent of $i_0, \cdots, i_{n-2}$. So $X$ is Markov.

Often, we do not use the Markov property directly. Instead, we use the following:

Theorem (Extended Markov property). Let $X$ be a Markov chain. For $n \geq 0$, any $H$ given in terms of the past $\{X_i : i < n\}$, and any $F$ given in terms of the future $\{X_i : i > n\}$, we have
\[
  \mathbb{P}(F \mid X_n = i, H) = \mathbb{P}(F \mid X_n = i).
\]

To prove this, we need to stitch together many instances of the Markov property. The actual proof is omitted.
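Although the proof is omitted, the statement is easy to test by simulation: conditioning additionally on a past event $H$ should not change the conditional law of the future. A sketch with $n = 1$, $H = \{X_0 = 0\}$ and $F = \{X_2 = 1\}$ (chain chosen for illustration; both ratios estimate $p_{0,1} = 0.1$):

```python
import random

random.seed(1)
lam = [0.5, 0.5]
P = [[0.9, 0.1],
     [0.2, 0.8]]

def sample_path(n):
    """Sample X_0, ..., X_n from the chain (lam, P)."""
    x = random.choices([0, 1], weights=lam)[0]
    path = [x]
    for _ in range(n):
        x = random.choices([0, 1], weights=P[x])[0]
        path.append(x)
    return path

hits_h = trials_h = 0   # counts given X_1 = 0 AND the past event H = {X_0 = 0}
hits = trials = 0       # counts given X_1 = 0 only
for _ in range(200_000):
    x0, x1, x2 = sample_path(2)
    if x1 == 0:
        trials += 1
        hits += (x2 == 1)
        if x0 == 0:
            trials_h += 1
            hits_h += (x2 == 1)

# Both hits/trials and hits_h/trials_h estimate P(F | X_1 = 0) = p_{0,1}.
```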