IA Probability



7 Central limit theorem
Suppose $X_1, \cdots, X_n$ are iid random variables with mean $\mu$ and variance $\sigma^2$. Let $S_n = X_1 + \cdots + X_n$. Then we have previously shown that
\[
  \operatorname{var}(S_n/\sqrt{n}) = \operatorname{var}\left(\frac{S_n - n\mu}{\sqrt{n}}\right) = \sigma^2.
\]
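Theorem (Central limit theorem). Let $X_1, X_2, \cdots$ be iid random variables with $E[X_i] = \mu$, $\operatorname{var}(X_i) = \sigma^2 < \infty$. Define
\[
  S_n = X_1 + \cdots + X_n.
\]
Then for all finite intervals $(a, b)$,
\[
  \lim_{n \to \infty} P\left(a \leq \frac{S_n - n\mu}{\sigma\sqrt{n}} \leq b\right) = \int_a^b \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}t^2}\, dt.
\]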
Note that the integrand on the right is the pdf of a standard normal. We say
\[
  \frac{S_n - n\mu}{\sigma\sqrt{n}} \to_D N(0, 1).
\]
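As a rough numerical illustration of the statement, the sketch below estimates $P(a \leq (S_n - n\mu)/(\sigma\sqrt{n}) \leq b)$ by simulation and compares it with $\Phi(b) - \Phi(a)$. The choice of Uniform(0, 1) summands, the function names, the trial count and the seed are all assumptions made for this illustration and are not part of the notes.

```python
import math
import random
from statistics import NormalDist


def clt_interval_probability(n: int, a: float, b: float,
                             trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(a <= (S_n - n*mu)/(sigma*sqrt(n)) <= b)
    when the X_i are iid Uniform(0, 1) (an illustrative choice)."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1 / 12)  # mean and s.d. of Uniform(0, 1)
    hits = 0
    for _ in range(trials):
        s_n = sum(rng.random() for _ in range(n))
        z = (s_n - n * mu) / (sigma * math.sqrt(n))
        if a <= z <= b:
            hits += 1
    return hits / trials


def normal_interval_probability(a: float, b: float) -> float:
    """The limiting value: integral of the standard normal pdf over (a, b)."""
    std_normal = NormalDist()
    return std_normal.cdf(b) - std_normal.cdf(a)


if __name__ == "__main__":
    for n in (1, 5, 30):
        print(n, clt_interval_probability(n, -1.0, 1.0),
              normal_interval_probability(-1.0, 1.0))
```

The simulated probabilities should move towards the normal value as $n$ grows, up to Monte Carlo noise.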
To show this, we will use the continuity theorem without proof:
Theorem (Continuity theorem). If the random variables $X_1, X_2, \cdots$ have mgf's $m_1(\theta), m_2(\theta), \cdots$ and $m_n(\theta) \to m(\theta)$ as $n \to \infty$ for all $\theta$, then $X_n \to_D$ the random variable with mgf $m(\theta)$.
We now provide a sketch-proof of the central limit theorem:
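Proof. wlog, assume $\mu = 0$, $\sigma^2 = 1$ (otherwise replace $X_i$ with $\frac{X_i - \mu}{\sigma}$).
Then
\begin{align*}
  m_{X_i}(\theta) = E[e^{\theta X_i}] &= 1 + \theta E[X_i] + \frac{\theta^2}{2!} E[X_i^2] + \cdots \\
  &= 1 + \frac{1}{2}\theta^2 + \frac{1}{3!}\theta^3 E[X_i^3] + \cdots
\end{align*}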
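Now consider $S_n/\sqrt{n}$. Then
\begin{align*}
  E[e^{\theta S_n/\sqrt{n}}] &= E[e^{\theta(X_1 + \cdots + X_n)/\sqrt{n}}] \\
  &= E[e^{\theta X_1/\sqrt{n}}] \cdots E[e^{\theta X_n/\sqrt{n}}] \\
  &= \left(E[e^{\theta X_1/\sqrt{n}}]\right)^n \\
  &= \left(1 + \frac{1}{2}\theta^2 \frac{1}{n} + \frac{1}{3!}\theta^3 E[X^3] \frac{1}{n^{3/2}} + \cdots\right)^n \\
  &\to e^{\frac{1}{2}\theta^2}
\end{align*}
as $n \to \infty$ since $(1 + a/n)^n \to e^a$. And this is the mgf of the standard normal. So the result follows from the continuity theorem.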
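The limit in the sketch-proof can be checked numerically in a case where the mgf is known in closed form. The example below assumes each $X_i$ is $\pm 1$ with probability $1/2$ (so $\mu = 0$, $\sigma^2 = 1$ and $E[e^{tX_i}] = \cosh t$); this distribution and the function names are illustrative choices, not taken from the notes.

```python
import math


def mgf_scaled_sum(theta: float, n: int) -> float:
    """mgf of S_n / sqrt(n) when each X_i is +1 or -1 with probability 1/2.

    Since E[exp(t X_i)] = cosh(t), independence gives
    E[exp(theta S_n / sqrt(n))] = cosh(theta / sqrt(n)) ** n.
    """
    return math.cosh(theta / math.sqrt(n)) ** n


def mgf_standard_normal(theta: float) -> float:
    """mgf of N(0, 1), namely exp(theta^2 / 2)."""
    return math.exp(theta ** 2 / 2)


if __name__ == "__main__":
    theta = 1.0
    for n in (1, 10, 100, 10_000):
        print(n, mgf_scaled_sum(theta, n), mgf_standard_normal(theta))
```

For fixed $\theta$, the first column of values should approach the second as $n$ increases, which is the convergence $m_n(\theta) \to m(\theta)$ required by the continuity theorem.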
Note that this is not a very formal proof, since we have to require $E[X^3]$ to be finite. Also, sometimes the moment generating function is not defined. But the argument works for many of the “nice” distributions we will meet.
The proper proof uses the characteristic function
\[
  \chi_X(\theta) = E[e^{i\theta X}].
\]
An important application is to use the normal distribution to approximate a
large binomial.
Let $X_i \sim B(1, p)$. Then $S_n \sim B(n, p)$. So $E[S_n] = np$ and $\operatorname{var}(S_n) = np(1 - p)$. So
\[
  \frac{S_n - np}{\sqrt{np(1 - p)}} \to_D N(0, 1).
\]
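To see how good the approximation is for moderate $n$, one can compare the exact binomial cdf with the normal value $\Phi\big((k - np)/\sqrt{np(1-p)}\big)$. The following is a minimal sketch with hypothetical function names; it applies no continuity correction, and the float arithmetic in the exact sum is only suitable for $n$ up to about a thousand.

```python
import math
from statistics import NormalDist


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(S_n <= k) for S_n ~ B(n, p), by summing the pmf."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(k + 1))


def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """CLT approximation Phi((k - np) / sqrt(np(1 - p))) to P(S_n <= k)."""
    z = (k - n * p) / math.sqrt(n * p * (1 - p))
    return NormalDist().cdf(z)


if __name__ == "__main__":
    n, p = 100, 0.3
    for k in (20, 30, 40):
        print(k, binomial_cdf(k, n, p), normal_approx_cdf(k, n, p))
```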
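Example. Suppose two planes fly a route. Each of $n$ passengers chooses a plane at random. The number of people choosing plane 1 is $S \sim B(n, \frac{1}{2})$. Suppose each plane has $s$ seats. What is
\[
  F(s) = P(S > s),
\]
i.e. the probability that plane 1 is over-booked? We have
\[
  F(s) = P(S > s) = P\left(\frac{S - n/2}{\sqrt{n \cdot \frac{1}{2} \cdot \frac{1}{2}}} > \frac{s - n/2}{\sqrt{n}/2}\right).
\]
Since
\[
  \frac{S - n/2}{\sqrt{n}/2} \sim N(0, 1)
\]
approximately, we have
\[
  F(s) \approx 1 - \Phi\left(\frac{s - n/2}{\sqrt{n}/2}\right).
\]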
For example, if $n = 1000$ and $s = 537$, then
\[
  \frac{s - n/2}{\sqrt{n}/2} \approx 2.34, \quad \Phi(2.34) \approx 0.99,
\]
and $F(s) \approx 0.01$. So with only 74 seats as buffer between the two planes, the probability of overbooking is just 1/100.
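The figures in this example can be reproduced, and compared against the exact binomial tail, with a few lines of code. This is a sketch under the example's assumptions ($S \sim B(n, \frac{1}{2})$); the function names are hypothetical.

```python
import math
from statistics import NormalDist


def overbook_normal(n: int, s: int) -> float:
    """CLT approximation 1 - Phi((s - n/2) / (sqrt(n)/2)) to P(S > s)."""
    z = (s - n / 2) / (math.sqrt(n) / 2)
    return 1 - NormalDist().cdf(z)


def overbook_exact(n: int, s: int) -> float:
    """Exact P(S > s) for S ~ B(n, 1/2).

    The count of favourable outcomes is an exact integer; only the final
    division by 2**n is done in floating point.
    """
    favourable = sum(math.comb(n, k) for k in range(s + 1, n + 1))
    return favourable / 2**n


if __name__ == "__main__":
    print(overbook_normal(1000, 537))  # roughly 0.01, as in the text
    print(overbook_exact(1000, 537))
```

The two values differ slightly because the normal approximation ignores the discreteness of $S$.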
Example. An unknown proportion $p$ of the electorate will vote Labour. It is desired to find $p$ with an error not exceeding $0.005$. How large should the sample be?
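We estimate by
\[
  p' = \frac{S_n}{n},
\]
where $X_i \sim B(1, p)$. Then
\begin{align*}
  P(|p' - p| \leq 0.005) &= P(|S_n - np| \leq 0.005n) \\
  &= P\Bigg(\underbrace{\frac{|S_n - np|}{\sqrt{np(1 - p)}}}_{\approx |N(0, 1)|} \leq \frac{0.005n}{\sqrt{np(1 - p)}}\Bigg).
\end{align*}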
We want $|p' - p| \leq 0.005$ with probability $\geq 0.95$. Then we want
\[
  \frac{0.005n}{\sqrt{np(1 - p)}} \geq \Phi^{-1}(0.975) = 1.96.
\]
(We use $0.975$ instead of $0.95$ since we are doing a two-tailed test.) Since the maximum possible value of $p(1 - p)$ is $1/4$, it suffices to have
\[
  n \geq 38416.
\]
In practice, we don’t have that many samples. Instead, we go by
\[
  P(|p' - p| \leq 0.03) \geq 0.95.
\]
This just requires $n \geq 1068$.
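The two sample sizes follow from rearranging the inequality to $n \geq (z/\varepsilon)^2 \cdot \frac{1}{4}$, where $\varepsilon$ is the allowed error and $z = 1.96$. A small sketch of that arithmetic is below; the function name is hypothetical, and the inputs are passed as decimal strings so that `Fraction` can do the computation exactly, avoiding a floating-point rounding error at the boundary case.

```python
import math
from fractions import Fraction


def sample_size(margin: str, z: str = "1.96") -> int:
    """Smallest n with margin * sqrt(n) / sqrt(p(1 - p)) >= z for every p.

    Uses the worst case p(1 - p) = 1/4, so the condition becomes
    n >= (z / margin)^2 / 4. Arguments are decimal strings, e.g. "0.005".
    """
    ratio = Fraction(z) / Fraction(margin)
    return math.ceil(ratio ** 2 / 4)


if __name__ == "__main__":
    print(sample_size("0.005"))  # 38416
    print(sample_size("0.03"))   # 1068
```

Using the rounded value $1.96$ matches the text; a more precise quantile would change the first answer by one or two.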
Example (Estimating $\pi$ with Buffon's needle). Recall that if we randomly toss a needle of length $\ell$ to a floor marked with parallel lines a distance $L$ apart, the probability that the needle hits the line is $p = \frac{2\ell}{\pi L}$.
[Figure: a needle lying on a floor with parallel lines a distance $L$ apart; labels $X$, $\theta$, $L$.]
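Suppose we toss the pin $n$ times, and it hits the line $N$ times. Then
\[
  N \approx N(np, np(1 - p))
\]
by the Central limit theorem. Write $p'$ for the actual proportion observed. Then
\begin{align*}
  \hat{\pi} &= \frac{2\ell}{(N/n)L} \\
  &= \frac{\pi \cdot 2\ell/(\pi L)}{p'} \\
  &= \frac{\pi p}{p + (p' - p)} \\
  &= \pi\left(1 - \frac{p' - p}{p} + \cdots\right)
\end{align*}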
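Hence
\[
  \hat{\pi} - \pi \approx \pi\,\frac{p - p'}{p}.
\]
We know
\[
  p' \sim N\left(p, \frac{p(1 - p)}{n}\right)
\]
approximately. So we can find
\[
  \hat{\pi} - \pi \sim N\left(0, \frac{\pi^2 p(1 - p)}{np^2}\right) = N\left(0, \frac{\pi^2(1 - p)}{np}\right).
\]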
We want a small variance, and that occurs when $p$ is the largest. Since $p = 2\ell/\pi L$, this is maximized with $\ell = L$. In this case,
\[
  p = \frac{2}{\pi},
\]
and
\[
  \hat{\pi} - \pi \approx N\left(0, \frac{(\pi - 2)\pi^2}{2n}\right).
\]
If we want to estimate $\pi$ to 3 decimal places, then we need
\[
  P(|\hat{\pi} - \pi| \leq 0.001) \geq 0.95.
\]
This is true if and only if
\[
  0.001\sqrt{\frac{2n}{(\pi - 2)(\pi^2)}} \geq \Phi^{-1}(0.975) = 1.96.
\]
So $n \geq 2.16 \times 10^7$. So we can obtain $\pi$ to 3 decimal places just by throwing a stick 20 million times! Isn't that exciting?
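The bound on $n$ comes from squaring the inequality, giving $n \geq (1.96/0.001)^2 \cdot \frac{(\pi - 2)\pi^2}{2}$. The sketch below evaluates this for a general tolerance, assuming $\ell = L$ as above; the function name and parameters are hypothetical.

```python
import math


def buffon_required_tosses(tolerance: float, z: float = 1.96) -> int:
    """Smallest n with tolerance * sqrt(2n / ((pi - 2) * pi^2)) >= z.

    Assumes needle length equals line spacing, so that the approximate
    variance of the estimate of pi is (pi - 2) * pi^2 / (2n).
    """
    variance_times_n = (math.pi - 2) * math.pi ** 2 / 2
    return math.ceil((z / tolerance) ** 2 * variance_times_n)


if __name__ == "__main__":
    print(buffon_required_tosses(0.001))  # about 2.16e7, as in the text
    print(buffon_required_tosses(0.01))   # a hundred times fewer tosses
```

Because the required $n$ scales like $1/\text{tolerance}^2$, each extra decimal place costs roughly a factor of 100 in the number of tosses.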