3.5 Probability generating functions
Consider a random variable $X$, taking values $0, 1, 2, \cdots$. Let $p_r = P(X = r)$.
Definition (Probability generating function (pgf)). The probability generating function (pgf) of $X$ is
\[
  p(z) = E[z^X] = \sum_{r=0}^\infty P(X = r) z^r = p_0 + p_1 z + p_2 z^2 + \cdots = \sum_{r=0}^\infty p_r z^r.
\]
This is a power series (or polynomial), and converges if $|z| \leq 1$, since
\[
  |p(z)| \leq \sum_r p_r |z^r| \leq \sum_r p_r = 1.
\]
We sometimes write it as $p_X(z)$ to indicate which random variable it belongs to.
This definition might seem a bit out of the blue. However, it turns out to be
a rather useful algebraic tool that can concisely summarize information about
the probability distribution.
Example. Consider a fair die. Then $p_r = 1/6$ for $r = 1, \cdots, 6$. So
\[
  p(z) = E[z^X] = \frac{1}{6}(z + z^2 + \cdots + z^6) = \frac{1}{6} \cdot \frac{z(1 - z^6)}{1 - z}.
\]
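As a quick numerical sanity check (a Python sketch, not part of the course material), we can evaluate this pgf directly: $p(1)$ should be $1$, and $p'(1)$ should recover $E[X] = 3.5$ (as justified by Abel's lemma below).

```python
# Sanity check for the fair-die pgf: p(1) should be 1, and p'(1) should be 3.5.

def die_pgf(z):
    return sum(z**r for r in range(1, 7)) / 6

print(die_pgf(1.0))  # 1.0, the total probability

h = 1e-6  # central finite difference to approximate p'(1)
print((die_pgf(1 + h) - die_pgf(1 - h)) / (2 * h))  # approximately 3.5 = E[X]
```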
Theorem. The distribution of $X$ is uniquely determined by its probability generating function.
Proof. By definition, $p_0 = p(0)$, $p_1 = p'(0)$ etc. (where $p'$ is the derivative of $p$). In general,
\[
  \left.\frac{d^i}{dz^i} p(z)\right|_{z=0} = i!\, p_i.
\]
So we can recover $(p_0, p_1, \cdots)$ from $p(z)$.
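The proof is constructive, and we can mimic it computationally. The following sketch (assuming the sympy library is available) recovers the probabilities of a $B(2, 1/2)$ random variable from its pgf $(z/2 + 1/2)^2$ by repeated differentiation at $0$:

```python
# Recovering (p_0, p_1, ...) from a pgf, as in the proof: p_i = p^(i)(0) / i!.
import sympy as sp

z = sp.symbols('z')
pgf = (z / 2 + sp.Rational(1, 2))**2  # pgf of B(2, 1/2)

probs = [sp.diff(pgf, z, i).subs(z, 0) / sp.factorial(i) for i in range(3)]
print(probs)  # [1/4, 1/2, 1/4] = P(X = 0), P(X = 1), P(X = 2)
```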
Theorem (Abel's lemma).
\[
  E[X] = \lim_{z \to 1} p'(z).
\]
If $p'(z)$ is continuous, then simply $E[X] = p'(1)$.
Note that this theorem is trivial if $p'(1)$ exists, as long as we know that we can differentiate power series term by term. What is important here is that even if $p'(1)$ doesn't exist, we can still take the limit and obtain the expected value, e.g. when $E[X] = \infty$.
Proof. For $z < 1$, we have
\[
  p'(z) = \sum_{r=1}^\infty r p_r z^{r-1} \leq \sum_{r=1}^\infty r p_r = E[X].
\]
So we must have
\[
  \lim_{z \to 1} p'(z) \leq E[X].
\]
On the other hand, for any $\varepsilon$, if we pick $N$ large, then
\[
  \sum_{r=1}^N r p_r \geq E[X] - \varepsilon.
\]
So
\[
  E[X] - \varepsilon \leq \sum_{r=1}^N r p_r = \lim_{z \to 1} \sum_{r=1}^N r p_r z^{r-1} \leq \lim_{z \to 1} \sum_{r=1}^\infty r p_r z^{r-1} = \lim_{z \to 1} p'(z).
\]
So $E[X] \leq \lim_{z \to 1} p'(z)$, and the result follows.
Theorem.
\[
  E[X(X - 1)] = \lim_{z \to 1} p''(z).
\]
Proof. Same as above.
Example. Consider the Poisson distribution. Then
\[
  p_r = P(X = r) = \frac{1}{r!} \lambda^r e^{-\lambda}.
\]
Then
\[
  p(z) = E[z^X] = \sum_{r=0}^\infty z^r \frac{1}{r!} \lambda^r e^{-\lambda} = e^{\lambda z} e^{-\lambda} = e^{\lambda(z - 1)}.
\]
We can have a sanity check: $p(1) = 1$, which makes sense, since $p(1)$ is the sum of the probabilities.
We have
\[
  E[X] = \left.\frac{d}{dz} e^{\lambda(z-1)}\right|_{z=1} = \lambda,
\]
and
\[
  E[X(X - 1)] = \left.\frac{d^2}{dz^2} e^{\lambda(z-1)}\right|_{z=1} = \lambda^2.
\]
So
\[
  \operatorname{var}(X) = E[X^2] - E[X]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda.
\]
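The same differentiation can be carried out symbolically. Here is a short sketch (again assuming sympy) recovering $E[X]$, $E[X(X-1)]$ and $\operatorname{var}(X)$ from the Poisson pgf:

```python
# From p(z) = exp(lam*(z - 1)): E[X] = p'(1), E[X(X-1)] = p''(1),
# and var(X) = E[X(X-1)] + E[X] - E[X]^2.
import sympy as sp

z, lam = sp.symbols('z lam', positive=True)
p = sp.exp(lam * (z - 1))

EX = sp.diff(p, z).subs(z, 1)           # lam
EXXm1 = sp.diff(p, z, 2).subs(z, 1)     # lam**2
var = sp.simplify(EXXm1 + EX - EX**2)   # lam
print(EX, EXXm1, var)
```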
Theorem. Suppose $X_1, X_2, \cdots, X_n$ are independent random variables with pgfs $p_1, p_2, \cdots, p_n$. Then the pgf of $X_1 + X_2 + \cdots + X_n$ is $p_1(z) p_2(z) \cdots p_n(z)$.
Proof.
\[
  E[z^{X_1 + \cdots + X_n}] = E[z^{X_1} \cdots z^{X_n}] = E[z^{X_1}] \cdots E[z^{X_n}] = p_1(z) \cdots p_n(z),
\]
where the second equality uses independence.
Example. Let $X \sim B(n, p)$. Then
\[
  p(z) = \sum_{r=0}^n P(X = r) z^r = \sum_r \binom{n}{r} p^r (1 - p)^{n-r} z^r = (pz + (1 - p))^n = (pz + q)^n.
\]
So $p(z)$ is the product of $n$ copies of $pz + q$. But $pz + q$ is the pgf of $Y \sim B(1, p)$. This shows that $X = Y_1 + Y_2 + \cdots + Y_n$ (which we already knew), i.e. a binomial distribution is the sum of Bernoulli trials.
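We can also check this numerically: multiplying pgfs is the same as convolving their coefficient sequences, so multiplying $n$ copies of $q + pz$ should reproduce the $B(n, p)$ probabilities. A sketch (assuming numpy, with arbitrary $n$ and $p$):

```python
# Multiplying n copies of the Bernoulli pgf q + p*z gives the coefficients
# of (pz + q)^n, i.e. the B(n, p) probabilities.
from math import comb

import numpy as np

n, p = 5, 0.3
bernoulli = np.array([1 - p, p])  # coefficients of q + p*z, ascending powers

coeffs = np.array([1.0])
for _ in range(n):                # polynomial product = convolution
    coeffs = np.convolve(coeffs, bernoulli)

binom_pmf = [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]
print(np.allclose(coeffs, binom_pmf))  # True
```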
Example. If $X$ and $Y$ are independent Poisson random variables with parameters $\lambda, \mu$ respectively, then
\[
  E[t^{X+Y}] = E[t^X] E[t^Y] = e^{\lambda(t-1)} e^{\mu(t-1)} = e^{(\lambda + \mu)(t-1)}.
\]
So $X + Y \sim P(\lambda + \mu)$.
We can also do it directly:
\[
  P(X + Y = r) = \sum_{i=0}^r P(X = i, Y = r - i) = \sum_{i=0}^r P(X = i) P(X = r - i),
\]
but this is much more complicated.
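Numerically, the two routes agree. For instance (a sketch with arbitrarily chosen parameter values):

```python
# Check that the convolution of Poisson pmfs matches the pmf of P(lam + mu).
from math import exp, factorial

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

lam, mu, r = 2.0, 3.0, 4
conv = sum(poisson_pmf(i, lam) * poisson_pmf(r - i, mu) for i in range(r + 1))
print(conv, poisson_pmf(r, lam + mu))  # both are about 0.1755
```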
We can use pgf-like functions to obtain some combinatorial results.
Example. Suppose we want to tile a $2 \times n$ bathroom with $2 \times 1$ tiles. We can count the tilings recursively: suppose there are $f_n$ ways to tile a $2 \times n$ grid. Then when we start tiling, the first tile is either vertical, in which case we have $f_{n-1}$ ways to tile the remaining ones; or the first tile is horizontal (forcing a second horizontal tile below it), in which case we have $f_{n-2}$ ways to tile the remaining. So
\[
  f_n = f_{n-1} + f_{n-2},
\]
which is simply the Fibonacci sequence, with $f_0 = f_1 = 1$.
Let
\[
  F(z) = \sum_{n=0}^\infty f_n z^n.
\]
Then from our recurrence relation, we obtain
\[
  f_n z^n = f_{n-1} z^n + f_{n-2} z^n.
\]
So
\[
  \sum_{n=2}^\infty f_n z^n = \sum_{n=2}^\infty f_{n-1} z^n + \sum_{n=2}^\infty f_{n-2} z^n.
\]
Since $f_0 = f_1 = 1$, we have
\[
  F(z) - f_0 - z f_1 = z(F(z) - f_0) + z^2 F(z).
\]
Thus $F(z) = (1 - z - z^2)^{-1}$. If we write
\[
  \alpha_1 = \frac{1}{2}(1 + \sqrt{5}), \quad \alpha_2 = \frac{1}{2}(1 - \sqrt{5}),
\]
then we have
\begin{align*}
  F(z) = (1 - z - z^2)^{-1}
  &= \frac{1}{(1 - \alpha_1 z)(1 - \alpha_2 z)} \\
  &= \frac{1}{\alpha_1 - \alpha_2} \left( \frac{\alpha_1}{1 - \alpha_1 z} - \frac{\alpha_2}{1 - \alpha_2 z} \right) \\
  &= \frac{1}{\alpha_1 - \alpha_2} \left( \alpha_1 \sum_{n=0}^\infty \alpha_1^n z^n - \alpha_2 \sum_{n=0}^\infty \alpha_2^n z^n \right).
\end{align*}
So
\[
  f_n = \frac{\alpha_1^{n+1} - \alpha_2^{n+1}}{\alpha_1 - \alpha_2}.
\]
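A quick Python check (a sketch over the first few terms) confirms that this closed form agrees with the recurrence:

```python
# Compare f_n = (a1^(n+1) - a2^(n+1)) / (a1 - a2) with the recurrence
# f_n = f_{n-1} + f_{n-2}, f_0 = f_1 = 1.
from math import sqrt

a1, a2 = (1 + sqrt(5)) / 2, (1 - sqrt(5)) / 2

f = [1, 1]
for n in range(2, 10):
    f.append(f[-1] + f[-2])

closed = [round((a1**(n + 1) - a2**(n + 1)) / (a1 - a2)) for n in range(10)]
print(f == closed)  # True
```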
Example. A Dyck word is a string of brackets that match, such as (), or ((())()). There is only one Dyck word of length 2, namely (). There are 2 of length 4, (()) and ()(). Similarly, there are 5 Dyck words of length 6.
Let $C_n$ be the number of Dyck words of length $2n$. We can split each Dyck word into $(w_1)w_2$, where $w_1$ and $w_2$ are Dyck words. For a Dyck word of length $2(n + 1)$, the lengths of $w_1$ and $w_2$ must sum up to $2n$, so
\[
  C_{n+1} = \sum_{i=0}^n C_i C_{n-i}. \tag{$*$}
\]
We again use pgf-like functions: let
\[
  c(x) = \sum_{n=0}^\infty C_n x^n.
\]
From $(*)$, multiplying both sides by $x^{n+1}$ and summing over $n$, we can show that
\[
  c(x) = 1 + x c(x)^2.
\]
We can solve this to show that
\[
  c(x) = \frac{1 - \sqrt{1 - 4x}}{2x} = \sum_{n=0}^\infty \binom{2n}{n} \frac{x^n}{n + 1},
\]
noting that $C_0 = 1$ (which selects the root with the minus sign). Then
\[
  C_n = \frac{1}{n + 1} \binom{2n}{n}.
\]
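Again, a short sketch confirms that the closed form satisfies the recurrence $(*)$ for the first few terms:

```python
# Compare C_n = binom(2n, n) / (n + 1) with the recurrence (*).
from math import comb

C = [1]
for n in range(10):
    C.append(sum(C[i] * C[n - i] for i in range(n + 1)))  # C_{n+1} from (*)

closed = [comb(2 * n, n) // (n + 1) for n in range(11)]
print(C == closed)  # True
```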
Sums with a random number of terms
A useful application of generating functions is the sum of a random number of random terms. For example, an insurance company may receive a random number of claims, each demanding a random amount of money. Then we have a sum of a random number of terms, which can be analysed using probability generating functions.
Example. Let $X_1, X_2, \cdots$ be iid with pgf $p(z) = E[z^X]$. Let $N$ be a random variable independent of the $X_i$ with pgf $h(z)$. What is the pgf of $S = X_1 + \cdots + X_N$?
\begin{align*}
  E[z^S] &= E[z^{X_1 + \cdots + X_N}] \\
  &= E_N[\underbrace{E_{X_i}[z^{X_1 + \cdots + X_N} \mid N]}_{\text{assuming fixed } N}] \\
  &= \sum_{n=0}^\infty P(N = n) E[z^{X_1 + X_2 + \cdots + X_n}] \\
  &= \sum_{n=0}^\infty P(N = n) E[z^{X_1}] E[z^{X_2}] \cdots E[z^{X_n}] \\
  &= \sum_{n=0}^\infty P(N = n) (E[z^{X_1}])^n \\
  &= \sum_{n=0}^\infty P(N = n) p(z)^n \\
  &= h(p(z)),
\end{align*}
since $h(x) = \sum_{n=0}^\infty P(N = n) x^n$.
So
\[
  E[S] = \left.\frac{d}{dz} h(p(z))\right|_{z=1} = h'(p(1)) p'(1) = E[N] E[X_1],
\]
since $p(1) = 1$.
To calculate the variance, use the fact that
\[
  E[S(S - 1)] = \left.\frac{d^2}{dz^2} h(p(z))\right|_{z=1}.
\]
Then we can find that
\[
  \operatorname{var}(S) = E[N] \operatorname{var}(X_1) + E[X_1]^2 \operatorname{var}(N).
\]
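Both identities are easy to test by simulation. The following Python sketch (with an arbitrarily chosen example, $N \sim P(\lambda)$ and Bernoulli summands) estimates $E[S]$ and $\operatorname{var}(S)$ and compares them with the formulas:

```python
# Monte Carlo check of E[S] = E[N] E[X_1] and
# var(S) = E[N] var(X_1) + E[X_1]^2 var(N), with N ~ P(lam), X_i ~ B(1, p).
import math
import random

lam, p, trials = 4.0, 0.3, 200_000

def poisson_sample(lam):
    # Knuth's multiplication method; fine for small lam.
    L, k, u = math.exp(-lam), 0, 1.0
    while True:
        u *= random.random()
        if u < L:
            return k
        k += 1

total = total_sq = 0.0
for _ in range(trials):
    n = poisson_sample(lam)
    s = sum(random.random() < p for _ in range(n))  # Bernoulli(p) summands
    total += s
    total_sq += s * s

mean = total / trials
var = total_sq / trials - mean**2
print(mean, lam * p)                          # both about 1.2
print(var, lam * p * (1 - p) + p**2 * lam)    # both about 1.2
```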