3.5 Probability generating functions
Consider a random variable $X$, taking values $0, 1, 2, \cdots$. Let $p_r = P(X = r)$.
Definition (Probability generating function (pgf)). The probability generating function (pgf) of $X$ is
\[
  p(z) = E[z^X] = \sum_{r=0}^\infty P(X = r) z^r = p_0 + p_1 z + p_2 z^2 + \cdots = \sum_{r=0}^\infty p_r z^r.
\]
This is a power series (or polynomial), and converges if $|z| \leq 1$, since
\[
  |p(z)| \leq \sum_r p_r |z^r| \leq \sum_r p_r = 1.
\]
We sometimes write this as $p_X(z)$ to indicate which random variable it belongs to.
This definition might seem a bit out of the blue. However, it turns out to be
a rather useful algebraic tool that can concisely summarize information about
the probability distribution.
Example. Consider a fair die. Then $p_r = 1/6$ for $r = 1, \cdots, 6$. So
\[
  p(z) = E[z^X] = \frac{1}{6}(z + z^2 + \cdots + z^6) = \frac{1}{6} \cdot \frac{z(1 - z^6)}{1 - z}.
\]
Theorem. The distribution of $X$ is uniquely determined by its probability generating function.
Proof. By definition, $p_0 = p(0)$, $p_1 = p'(0)$ etc. (where $p'$ is the derivative of $p$). In general,
\[
  \left.\frac{\mathrm{d}^i}{\mathrm{d}z^i} p(z)\right|_{z=0} = i!\, p_i.
\]
So we can recover $(p_0, p_1, \cdots)$ from $p(z)$.
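This recovery is easy to see in action. Here is a minimal sympy sketch (an illustration, using the fair die pgf from the example above) that reads off each $p_i$ by differentiating at $z = 0$:

```python
# Recover the pmf of a fair die from its pgf via p_i = p^(i)(0) / i!.
import sympy as sp

z = sp.Symbol('z')
p = (z + z**2 + z**3 + z**4 + z**5 + z**6) / 6  # pgf of a fair die

for i in range(8):
    p_i = sp.diff(p, z, i).subs(z, 0) / sp.factorial(i)
    print(i, p_i)  # 1/6 for i = 1, ..., 6; 0 otherwise
```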
Theorem (Abel's lemma).
\[
  E[X] = \lim_{z \to 1} p'(z).
\]
If $p'(z)$ is continuous, then simply $E[X] = p'(1)$.
Note that this theorem is trivial if $p'(1)$ exists, as long as we know that we can differentiate power series term by term. What is important here is that even if $p'(1)$ doesn't exist, we can still take the limit and obtain the expected value, e.g. when $E[X] = \infty$.
Proof. For $z < 1$, we have
\[
  p'(z) = \sum_{r=1}^\infty r p_r z^{r-1} \leq \sum_{r=1}^\infty r p_r = E[X].
\]
So we must have
\[
  \lim_{z \to 1} p'(z) \leq E[X].
\]
On the other hand, for any $\varepsilon$, if we pick $N$ large enough, then
\[
  \sum_{r=1}^N r p_r \geq E[X] - \varepsilon.
\]
So
\[
  E[X] - \varepsilon \leq \sum_{r=1}^N r p_r = \lim_{z \to 1} \sum_{r=1}^N r p_r z^{r-1} \leq \lim_{z \to 1} \sum_{r=1}^\infty r p_r z^{r-1} = \lim_{z \to 1} p'(z).
\]
So $E[X] \leq \lim_{z \to 1} p'(z)$, and the result follows.
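To see why the limit matters, consider the (illustrative, not from the lecture) pmf $p_r = \frac{1}{r(r+1)}$ for $r \geq 1$, which sums to 1 but has $E[X] = \sum \frac{1}{r+1} = \infty$. A numerical sketch shows $p'(z)$ growing without bound as $z \to 1^-$, consistent with Abel's lemma:

```python
# p_r = 1/(r(r+1)) has E[X] = infinity; p'(z) -> infinity as z -> 1^-.
import numpy as np

r = np.arange(1, 2_000_000, dtype=float)
p_r = 1.0 / (r * (r + 1))
for z in [0.9, 0.99, 0.999, 0.9999]:
    dp = np.sum(r * p_r * z**(r - 1))   # truncated p'(z)
    print(z, dp)                        # increases without bound
```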
Theorem.
\[
  E[X(X-1)] = \lim_{z \to 1} p''(z).
\]
Proof. Same as above.
Example. Consider the Poisson distribution. Then
\[
  p_r = P(X = r) = \frac{1}{r!} \lambda^r e^{-\lambda}.
\]
Then
\[
  p(z) = E[z^X] = \sum_{r=0}^\infty z^r \frac{1}{r!} \lambda^r e^{-\lambda} = e^{\lambda z} e^{-\lambda} = e^{\lambda(z-1)}.
\]
We can have a sanity check: $p(1) = 1$, which makes sense, since $p(1)$ is the sum of the probabilities.
We have
\[
  E[X] = \left.\frac{\mathrm{d}}{\mathrm{d}z} e^{\lambda(z-1)}\right|_{z=1} = \lambda,
\]
and
\[
  E[X(X-1)] = \left.\frac{\mathrm{d}^2}{\mathrm{d}z^2} e^{\lambda(z-1)}\right|_{z=1} = \lambda^2.
\]
So
\[
  \operatorname{var}(X) = E[X^2] - E[X]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda.
\]
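These derivatives are easy to confirm with a computer algebra system; a small sympy sketch (with `lam` standing in for $\lambda$):

```python
# Differentiate the Poisson pgf e^{lam (z-1)} at z = 1.
import sympy as sp

z, lam = sp.symbols('z lam', positive=True)
p = sp.exp(lam * (z - 1))

EX   = sp.diff(p, z).subs(z, 1)        # E[X] = lam
EXX1 = sp.diff(p, z, 2).subs(z, 1)     # E[X(X-1)] = lam**2
var  = sp.simplify(EXX1 + EX - EX**2)  # var(X) = lam
print(EX, EXX1, var)
```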
Theorem. Suppose $X_1, X_2, \cdots, X_n$ are independent random variables with pgfs $p_1, p_2, \cdots, p_n$. Then the pgf of $X_1 + X_2 + \cdots + X_n$ is $p_1(z) p_2(z) \cdots p_n(z)$.
Proof.
\[
  E[z^{X_1 + \cdots + X_n}] = E[z^{X_1} \cdots z^{X_n}] = E[z^{X_1}] \cdots E[z^{X_n}] = p_1(z) \cdots p_n(z).
\]
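In coefficient terms, multiplying pgfs is the same as convolving pmfs. A quick numpy sketch with two fair dice (an illustrative choice) confirms this:

```python
# pgf of a sum = product of pgfs, i.e. pmf of a sum = convolution of pmfs.
import numpy as np

die = np.array([0, 1, 1, 1, 1, 1, 1]) / 6    # pmf of one die on {0, ..., 6}
pmf_sum = np.convolve(die, die)              # pmf of X1 + X2

z = 0.7                                      # arbitrary test point
product = np.polyval(die[::-1], z) ** 2      # p1(z) * p2(z)
direct  = np.polyval(pmf_sum[::-1], z)       # pgf of X1 + X2 at z
print(np.isclose(product, direct))           # True
```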
Example. Let $X \sim B(n, p)$. Then
\[
  p(z) = \sum_{r=0}^n P(X = r) z^r = \sum_{r=0}^n \binom{n}{r} p^r (1-p)^{n-r} z^r = (pz + (1-p))^n = (pz + q)^n.
\]
So $p(z)$ is the product of $n$ copies of $pz + q$. But $pz + q$ is the pgf of $Y \sim B(1, p)$. This shows that $X = Y_1 + Y_2 + \cdots + Y_n$ (which we already knew), i.e. a binomial distribution is the sum of Bernoulli trials.
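We can check this decomposition numerically: convolving the Bernoulli pmf $(q, p)$ with itself $n$ times reproduces the $B(n, p)$ pmf (a sketch, with $n$ and $p$ chosen arbitrarily):

```python
# B(n, p) as the n-fold convolution of a Bernoulli(p) pmf.
import numpy as np
from math import comb

n, p = 10, 0.3
q = 1 - p

pmf = np.array([1.0])
for _ in range(n):
    pmf = np.convolve(pmf, [q, p])   # pgf-wise: multiply by (q + p z)

binom = np.array([comb(n, r) * p**r * q**(n - r) for r in range(n + 1)])
print(np.allclose(pmf, binom))       # True
```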
Example. If $X$ and $Y$ are independent Poisson random variables with parameters $\lambda, \mu$ respectively, then
\[
  E[t^{X+Y}] = E[t^X] E[t^Y] = e^{\lambda(t-1)} e^{\mu(t-1)} = e^{(\lambda + \mu)(t-1)}.
\]
So $X + Y \sim P(\lambda + \mu)$.
We can also do it directly:
\[
  P(X + Y = r) = \sum_{i=0}^r P(X = i, Y = r - i) = \sum_{i=0}^r P(X = i) P(X = r - i),
\]
but this is much more complicated.
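For a numerical cross-check of $X + Y \sim P(\lambda + \mu)$, here is a short scipy sketch convolving truncated pmfs (the truncation point $K$ is an arbitrary choice, large enough that the neglected tail is negligible):

```python
# Check X + Y ~ P(lam + mu) by direct convolution of truncated pmfs.
import numpy as np
from scipy.stats import poisson

lam, mu, K = 2.0, 3.0, 60
px = poisson.pmf(np.arange(K), lam)
py = poisson.pmf(np.arange(K), mu)

conv   = np.convolve(px, py)[:K]              # pmf of X + Y, truncated
direct = poisson.pmf(np.arange(K), lam + mu)
print(np.allclose(conv, direct))              # True up to truncation error
```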
We can use pgf-like functions to obtain some combinatorial results.
Example. Suppose we want to tile a $2 \times n$ bathroom with $2 \times 1$ tiles.
We can do it recursively: suppose there are $f_n$ ways to tile a $2 \times n$ grid. Then if we start tiling, the first tile is either vertical, in which case we have $f_{n-1}$ ways to tile the remaining ones; or the first tile is horizontal, in which case it must be paired with a second horizontal tile below it, and we have $f_{n-2}$ ways to tile the remaining. So
\[
  f_n = f_{n-1} + f_{n-2},
\]
which is simply the Fibonacci sequence, with $f_0 = f_1 = 1$.
Let
\[
  F(z) = \sum_{n=0}^\infty f_n z^n.
\]
Then from our recurrence relation, we obtain
\[
  f_n z^n = f_{n-1} z^n + f_{n-2} z^n.
\]
So
\[
  \sum_{n=2}^\infty f_n z^n = \sum_{n=2}^\infty f_{n-1} z^n + \sum_{n=2}^\infty f_{n-2} z^n.
\]
Since $f_0 = f_1 = 1$, we have
\[
  F(z) - f_0 - z f_1 = z(F(z) - f_0) + z^2 F(z).
\]
Thus $F(z) = (1 - z - z^2)^{-1}$. If we write
\[
  \alpha_1 = \frac{1}{2}(1 + \sqrt{5}), \quad \alpha_2 = \frac{1}{2}(1 - \sqrt{5}),
\]
then we have
\begin{align*}
  F(z) = (1 - z - z^2)^{-1} &= \frac{1}{(1 - \alpha_1 z)(1 - \alpha_2 z)}\\
  &= \frac{1}{\alpha_1 - \alpha_2}\left(\frac{\alpha_1}{1 - \alpha_1 z} - \frac{\alpha_2}{1 - \alpha_2 z}\right)\\
  &= \frac{1}{\alpha_1 - \alpha_2}\left(\alpha_1 \sum_{n=0}^\infty \alpha_1^n z^n - \alpha_2 \sum_{n=0}^\infty \alpha_2^n z^n\right).
\end{align*}
So
\[
  f_n = \frac{\alpha_1^{n+1} - \alpha_2^{n+1}}{\alpha_1 - \alpha_2}.
\]
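A short numerical sketch confirms that this closed form agrees with the recurrence:

```python
# Compare f_n = f_{n-1} + f_{n-2} (f_0 = f_1 = 1) with its closed form.
from math import sqrt

a1 = (1 + sqrt(5)) / 2
a2 = (1 - sqrt(5)) / 2

f = [1, 1]
for n in range(2, 20):
    f.append(f[-1] + f[-2])

closed = [(a1**(n + 1) - a2**(n + 1)) / (a1 - a2) for n in range(20)]
print(all(abs(f[n] - closed[n]) < 1e-6 for n in range(20)))  # True
```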
Example. A Dyck word is a string of brackets that match, such as () or ((())()). There is only one Dyck word of length 2, (). There are 2 of length 4, (()) and ()(). Similarly, there are 5 Dyck words of length 6.
Let $C_n$ be the number of Dyck words of length $2n$. We can split each Dyck word into $(w_1)w_2$, where $w_1$ and $w_2$ are Dyck words. For a word of length $2(n+1)$, the lengths of $w_1$ and $w_2$ must sum up to $2n$, so
\[
  C_{n+1} = \sum_{i=0}^n C_i C_{n-i}. \tag{$*$}
\]
We again use pgf-like functions: let
\[
  c(x) = \sum_{n=0}^\infty C_n x^n.
\]
From $(*)$ (multiply both sides by $x^{n+1}$ and sum over $n \geq 0$), we can show that
\[
  c(x) = 1 + x\, c(x)^2.
\]
We can solve to show that
\[
  c(x) = \frac{1 - \sqrt{1 - 4x}}{2x} = \sum_{n=0}^\infty \binom{2n}{n} \frac{x^n}{n+1},
\]
noting that $C_0 = 1$ (which picks out the root with the minus sign). Then
\[
  C_n = \frac{1}{n+1} \binom{2n}{n}.
\]
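Again, a quick sketch checking the recurrence $(*)$ against the closed form:

```python
# Catalan numbers: recurrence C_{n+1} = sum C_i C_{n-i} vs closed form.
from math import comb

C = [1]
for n in range(12):
    C.append(sum(C[i] * C[n - i] for i in range(n + 1)))

closed = [comb(2 * n, n) // (n + 1) for n in range(13)]
print(C == closed)  # True
```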
Sums with a random number of terms
A useful application of generating functions is the sum with a random number
of random terms. For example, an insurance company may receive a random
number of claims, each demanding a random amount of money. Then we have
a sum of a random number of terms. This can be answered using probability
generating functions.
Example. Let $X_1, X_2, \cdots$ be iid with pgf $p(z) = E[z^X]$. Let $N$ be a random variable independent of the $X_i$ with pgf $h(z)$. What is the pgf of $S = X_1 + \cdots + X_N$?
\begin{align*}
  E[z^S] &= E[z^{X_1 + \cdots + X_N}]\\
  &= E_N[\underbrace{E_{X_i}[z^{X_1 + \cdots + X_N} \mid N]}_{\text{assuming fixed } N}]\\
  &= \sum_{n=0}^\infty P(N = n) E[z^{X_1 + X_2 + \cdots + X_n}]\\
  &= \sum_{n=0}^\infty P(N = n) E[z^{X_1}] E[z^{X_2}] \cdots E[z^{X_n}]\\
  &= \sum_{n=0}^\infty P(N = n) (E[z^{X_1}])^n\\
  &= \sum_{n=0}^\infty P(N = n) p(z)^n\\
  &= h(p(z)),
\end{align*}
since $h(x) = \sum_{n=0}^\infty P(N = n) x^n$.
So
\[
  E[S] = \left.\frac{\mathrm{d}}{\mathrm{d}z} h(p(z))\right|_{z=1} = h'(p(1)) p'(1) = E[N] E[X_1].
\]
To calculate the variance, use the fact that
\[
  E[S(S-1)] = \left.\frac{\mathrm{d}^2}{\mathrm{d}z^2} h(p(z))\right|_{z=1}.
\]
Then we can find that
\[
  \operatorname{var}(S) = E[N] \operatorname{var}(X_1) + E[X_1]^2 \operatorname{var}(N).
\]