1Markov chains

IB Markov Chains

1.1 The Markov property
Definition (Markov chain). Let
X
= (
X
0
, X
1
, ···
) be a sequence of random
variables taking values in some set
S
, the state space. We assume that
S
is
countable (which could be finite).
We say
X
has the Markov property if for all
n
0
, i
0
, ··· , i
n+1
S
, we have
P(X
n+1
= i
n+1
| X
0
= i
0
, ··· , X
n
= i
n
) = P(X
n+1
= i
n+1
| X
n
= i
n
).
If X has the Markov property, we call it a Markov chain.
We say that a Markov chain
X
is homogeneous if the conditional probabilities
P(X
n+1
= j | X
n
= i) do not depend on n.
All our chains
X
will be Markov and homogeneous unless otherwise specified.
Since the state space
S
is countable, we usually label the states by integers
i N.
Example.
(i) A random walk is a Markov chain.
(ii) The branching process is a Markov chain.
In general, to fully specify a (homogeneous) Markov chain, we will need two
items:
(i)
The initial distribution
λ
i
=
P
(
X
0
=
i
). We can write this as a vector
λ = (λ
i
: i S).
(ii)
The transition probabilities
p
i,j
=
P
(
X
n+1
=
j | X
n
=
i
). We can write
this as a matrix P = (p
i,j
)
i,jS
.
We will start by proving a few properties of
λ
and
P
. These let us know
whether an arbitrary pair of vector and matrix (
λ, P
) actually specifies a Markov
chain.
Proposition.
(i) λ is a distribution, i.e. λ
i
0,
P
i
λ
i
= 1.
(ii) P is a stochastic matrix, i.e. p
i,j
0 and
P
j
p
i,j
= 1 for all i.
Proof.
(i) Obvious since λ is a probability distribution.
(ii) p
i,j
0 since p
ij
is a probability. We also have
X
j
p
i,j
=
X
j
P(X
1
= j | X
0
= i) = 1
since P(X
1
= · | X
0
= i) is a probability distribution function.
Note that we only require the row sum to be 1, and the column sum need
not be.
We will prove another seemingly obvious fact.
Theorem. Let
λ
be a distribution (on
S
) and
P
a stochastic matrix. The
sequence
X
= (
X
0
, X
1
, ···
) is a Markov chain with initial distribution
λ
and
transition matrix P iff
P(X
0
= i
0
, X
1
= i
1
, ··· , X
n
= i
n
) = λ
i
0
p
i
0
,i
1
p
i
1
,i
2
···p
i
n1
,i
n
()
for all n, i
0
, ··· , i
n
Proof. Let A
k
be the event X
k
= i
k
. Then we can write () as
P(A
0
A
1
··· A
n
) = λ
i
0
p
i
0
,i
1
p
i
1
,i
2
···p
i
n1
,i
n
. ()
We first assume that X is a Markov chain. We prove () by induction on n.
When n = 0, () says P(A
0
) = λ
i
0
. This is true by definition of λ.
Assume that it is true for all n < N. Then
P(A
0
A
1
··· A
N
) = P(A
0
, ··· , A
N1
)P(A
0
, ··· , A
N
| A
0
, ··· , A
N1
)
= λ
i
0
p
i
0
,i
1
···p
i
N 2
,i
N 1
P(A
N
| A
0
, ··· , A
N1
)
= λ
i
0
p
i
0
,i
1
···p
i
N 2
,i
N 1
P(A
N
| A
N1
)
= λ
i
0
p
i
0
,i
1
···p
i
N 2
,i
N 1
p
i
N 1
,i
N
.
So it is true for N as well. Hence we are done by induction.
Conversely, suppose that (
) holds. Then for
n
= 0, we have
P
(
X
0
=
i
0
) =
λ
i
0
.
Otherwise,
P(X
n
= i
n
| X
0
= i
0
, ··· , X
n1
= i
n1
) = P(A
n
| A
0
··· A
n1
)
=
P(A
0
··· A
n
)
P(A
0
··· A
n1
)
= p
i
n1
,i
n
,
which is independent of i
0
, ··· , i
n2
. So this is Markov.
Often, we do not use the Markov property directly. Instead, we use the
following:
Theorem (Extended Markov property). Let
X
be a Markov chain. For
n
0,
any
H
given in terms of the past
{X
i
:
i < n}
, and any
F
given in terms of the
future {X
i
: i > n}, we have
P(F | X
n
= i, H) = P(F | X
n
= i).
To prove this, we need to stitch together many instances of the Markov
property. Actual proof is omitted.