1 Estimation
IB Statistics
1.5 Confidence intervals
Definition. A $100\gamma\%$ ($0 < \gamma < 1$) confidence interval for $\theta$ is a random interval $(A(\mathbf{X}), B(\mathbf{X}))$ such that $P(A(\mathbf{X}) < \theta < B(\mathbf{X})) = \gamma$, no matter what the true value of $\theta$ may be.
It is also possible to have confidence intervals for vector parameters.
Notice that it is the endpoints of the interval that are random quantities,
while θ is a fixed constant we want to find out.
We can interpret this in terms of repeated sampling. If we calculate $(A(\mathbf{x}), B(\mathbf{x}))$ for a large number of samples $\mathbf{x}$, then approximately $100\gamma\%$ of them will cover the true value of $\theta$.
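The repeated-sampling interpretation can be checked by simulation. The following is a minimal sketch, assuming iid $N(\theta, 1)$ data and the interval $\bar{X} \pm z/\sqrt{n}$ (the standard interval for a normal mean with known unit variance). The function name `coverage`, the value of $\theta$, the sample size and the seed are illustrative choices, not part of the notes.

```python
import random
from statistics import NormalDist, fmean

def coverage(theta, n, gamma, trials, seed=0):
    """Fraction of simulated intervals (A(x), B(x)) that contain theta."""
    rng = random.Random(seed)
    # Upper (1 - gamma)/2 point of N(0, 1); about 1.96 when gamma = 0.95.
    z = NormalDist().inv_cdf((1 + gamma) / 2)
    half_width = z / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample of size n and form its interval.
        xbar = fmean(rng.gauss(theta, 1.0) for _ in range(n))
        if xbar - half_width < theta < xbar + half_width:
            hits += 1
    return hits / trials

print(coverage(theta=3.0, n=10, gamma=0.95, trials=20000))
```

The printed fraction should be close to 0.95. It is the endpoints that change from sample to sample, while `theta` stays fixed throughout.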
It is important to know that having observed some data $\mathbf{x}$ and calculated a 95% confidence interval, we cannot say that $\theta$ has a 95% chance of being within the interval. Apart from the standard objection that $\theta$ is a fixed value that either is or is not in the interval, so that we cannot assign probabilities to this event, we will later construct an example where, even though we have a 50% confidence interval, we are 100% sure that $\theta$ lies in that interval.
Example. Suppose $X_1, \cdots, X_n$ are iid $N(\theta, 1)$. Find a 95% confidence interval for $\theta$.
We know $\bar{X} \sim N(\theta, \frac{1}{n})$, so that $\sqrt{n}(\bar{X} - \theta) \sim N(0, 1)$.
Let $z_1, z_2$ be such that $\Phi(z_2) - \Phi(z_1) = 0.95$, where $\Phi$ is the standard normal distribution function.
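We have $P[z_1 < \sqrt{n}(\bar{X} - \theta) < z_2] = 0.95$, which can be rearranged to give
\[
  P\left[\bar{X} - \frac{z_2}{\sqrt{n}} < \theta < \bar{X} - \frac{z_1}{\sqrt{n}}\right] = 0.95,
\]
so we obtain the following 95% confidence interval:
\[
  \left(\bar{X} - \frac{z_2}{\sqrt{n}}, \bar{X} - \frac{z_1}{\sqrt{n}}\right).
\]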
There are many possible choices for $z_1$ and $z_2$. Since the $N(0, 1)$ density is symmetric, the shortest such interval is obtained by $z_2 = z_{0.025} = -z_1$, where $z_\alpha$ denotes the upper $100\alpha\%$ point of $N(0, 1)$, so $z_{0.025} \approx 1.96$. We can also choose other values such as $z_1 = -\infty$, $z_2 = 1.64$, but we usually choose symmetric end points.
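To make the effect of choosing $z_1, z_2$ concrete, here is a small sketch that computes the interval $(\bar{X} - z_2/\sqrt{n}, \bar{X} - z_1/\sqrt{n})$ for the symmetric and the one-sided choice, and compares the length factor $z_2 - z_1$ for different splits of the 5% tail mass. The data values and the helper names `normal_mean_interval` and `length_factor` are made up for illustration.

```python
from statistics import NormalDist, fmean

std_normal = NormalDist()

def normal_mean_interval(xs, z1, z2):
    """Interval (xbar - z2/sqrt(n), xbar - z1/sqrt(n)) for iid N(theta, 1) data."""
    n = len(xs)
    xbar = fmean(xs)
    return xbar - z2 / n ** 0.5, xbar - z1 / n ** 0.5

def length_factor(lower_tail):
    """z2 - z1 when the lower tail has mass lower_tail and the upper tail 0.05 - lower_tail."""
    z1 = std_normal.inv_cdf(lower_tail)
    z2 = std_normal.inv_cdf(1 - (0.05 - lower_tail))
    return z2 - z1

z_sym = std_normal.inv_cdf(0.975)        # z_{0.025}, about 1.96
z_one_sided = std_normal.inv_cdf(0.95)   # about 1.64

xs = [2.1, 3.4, 2.9, 3.8, 2.6]  # hypothetical observations
sym = normal_mean_interval(xs, -z_sym, z_sym)
one_sided = normal_mean_interval(xs, float("-inf"), z_one_sided)

print(sym)                   # finite interval centred on the sample mean
print(one_sided)             # finite lower end, infinite upper end
print(length_factor(0.025))  # equal tails
print(length_factor(0.01))   # unequal tails give a larger z2 - z1
```

The interval length is $(z_2 - z_1)/\sqrt{n}$, so the equal-tailed split, which minimises `length_factor`, gives the shortest interval.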
The above example illustrates a common procedure for finding confidence
intervals:
– Find a quantity $R(\mathbf{X}, \theta)$ such that the $P_\theta$-distribution of $R(\mathbf{X}, \theta)$ does not depend on $\theta$. This is called a pivot. In our example, $R(\mathbf{X}, \theta) = \sqrt{n}(\bar{X} - \theta)$.
– Write down a probability statement of the form $P_\theta(c_1 < R(\mathbf{X}, \theta) < c_2) = \gamma$.
– Rearrange the inequalities inside $P(\ldots)$ to find the interval.
Usually $c_1, c_2$ are percentage points from a known standardised distribution, often equitailed. For example, we pick the 2.5% and 97.5% points for a 95% confidence interval. We could also use, say, the 0% and 95% points, but this generally results in a wider interval.
Note that if $(A(\mathbf{X}), B(\mathbf{X}))$ is a $100\gamma\%$ confidence interval for $\theta$, and $T(\theta)$ is a monotone increasing function of $\theta$, then $(T(A(\mathbf{X})), T(B(\mathbf{X})))$ is a $100\gamma\%$ confidence interval for $T(\theta)$.
Example. Suppose $X_1, \cdots, X_{50}$ are iid $N(0, \sigma^2)$. Find a 99% confidence interval for $\sigma^2$.
We know that $X_i/\sigma \sim N(0, 1)$. So
\[
  \frac{1}{\sigma^2}\sum_{i=1}^{50} X_i^2 \sim \chi^2_{50}.
\]
So $R(\mathbf{X}, \sigma^2) = \sum_{i=1}^{50} X_i^2/\sigma^2$ is a pivot.
Recall that $\chi^2_n(\alpha)$ is the upper $100\alpha\%$ point of $\chi^2_n$, i.e. $P(\chi^2_n \leq \chi^2_n(\alpha)) = 1 - \alpha$.
So we have $c_1 = \chi^2_{50}(0.995) = 27.99$ and $c_2 = \chi^2_{50}(0.005) = 79.49$.
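So
\[
  P\left(c_1 < \frac{\sum X_i^2}{\sigma^2} < c_2\right) = 0.99,
\]
and hence
\[
  P\left(\frac{\sum X_i^2}{c_2} < \sigma^2 < \frac{\sum X_i^2}{c_1}\right) = 0.99.
\]
Using the remark above, we know that a 99% confidence interval for $\sigma$ is
\[
  \left(\sqrt{\frac{\sum X_i^2}{c_2}}, \sqrt{\frac{\sum X_i^2}{c_1}}\right).
\]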
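As a rough numerical check of this example, the sketch below computes the intervals for $\sigma^2$ and $\sigma$ and estimates their coverage by simulation. The constants are the percentage points quoted above. The function names, the value $\sigma = 2$, the number of trials and the seed are illustrative assumptions.

```python
import random

# Percentage points of chi-squared with 50 degrees of freedom, as quoted in the text.
C1 = 27.99  # chi^2_50(0.995)
C2 = 79.49  # chi^2_50(0.005)

def variance_interval(xs):
    """99% confidence interval for sigma^2 from 50 iid N(0, sigma^2) observations."""
    if len(xs) != 50:
        raise ValueError("C1 and C2 are percentage points for n = 50 only")
    s = sum(x * x for x in xs)
    return s / C2, s / C1

def sd_interval(xs):
    """Interval for sigma: apply the increasing map T(v) = sqrt(v) to both endpoints."""
    lo, hi = variance_interval(xs)
    return lo ** 0.5, hi ** 0.5

def simulated_coverage(sigma, trials, seed=0):
    """Fraction of simulated intervals for sigma^2 that contain the true value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(50)]
        lo, hi = variance_interval(xs)
        hits += lo < sigma ** 2 < hi
    return hits / trials

print(simulated_coverage(sigma=2.0, trials=5000))
```

The printed fraction should be close to 0.99 whatever value of `sigma` is used, which is what it means for $\sum X_i^2/\sigma^2$ to be a pivot.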
Example. Suppose $X_1, \cdots, X_n$ are iid $\mathrm{Bernoulli}(p)$. Find an approximate confidence interval for $p$.
The mle of $p$ is $\hat{p} = \sum X_i/n$.
By the Central Limit theorem, $\hat{p}$ is approximately $N(p, p(1 - p)/n)$ for large $n$.
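So
\[
  \frac{\sqrt{n}(\hat{p} - p)}{\sqrt{p(1 - p)}}
\]
is approximately $N(0, 1)$ for large $n$. So letting $z_{(1-\gamma)/2}$ be the solution to $\Phi(z_{(1-\gamma)/2}) - \Phi(-z_{(1-\gamma)/2}) = \gamma$, we have
\[
  P\left(\hat{p} - z_{(1-\gamma)/2}\sqrt{\frac{p(1 - p)}{n}} < p < \hat{p} + z_{(1-\gamma)/2}\sqrt{\frac{p(1 - p)}{n}}\right) \approx \gamma.
\]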
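But $p$ is unknown! So we approximate it by $\hat{p}$ to get a confidence interval for $p$ when $n$ is large:
\[
  P\left(\hat{p} - z_{(1-\gamma)/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p < \hat{p} + z_{(1-\gamma)/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right) \approx \gamma.
\]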
Note that we have made a lot of approximations here, but it would be difficult
to do better than this.
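One way to see how good these approximations are is to simulate. The sketch below implements the approximate interval $\hat{p} \pm z_{(1-\gamma)/2}\sqrt{\hat{p}(1 - \hat{p})/n}$ and estimates how often it covers the true $p$. The function names and the values $p = 0.3$, $n = 500$ are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist

def approx_proportion_interval(successes, n, gamma=0.95):
    """Approximate 100*gamma% interval for p, plugging p_hat into the variance."""
    p_hat = successes / n
    z = NormalDist().inv_cdf((1 + gamma) / 2)  # z_{(1 - gamma)/2}
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width

def simulated_coverage(p, n, gamma=0.95, trials=2000, seed=0):
    """Fraction of simulated intervals that contain the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Sum of n Bernoulli(p) draws.
        successes = sum(rng.random() < p for _ in range(n))
        lo, hi = approx_proportion_interval(successes, n, gamma)
        hits += lo < p < hi
    return hits / trials

print(simulated_coverage(p=0.3, n=500))
```

For moderate $p$ and large $n$ the estimated coverage should be close to $\gamma$. Trying small $n$, or $p$ near 0 or 1, shows where the approximation starts to degrade.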
Example. Suppose an opinion poll says 20% of the people are going to vote UKIP, based on a random sample of 1,000 people. What might the true proportion be?
We assume we have an observation of $x = 200$ from a $\mathrm{binomial}(n, p)$ distribution with $n = 1{,}000$. Then $\hat{p} = x/n = 0.2$ is an unbiased estimate, and also the mle.
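Now $\operatorname{var}(X/n) = \frac{p(1 - p)}{n} \approx \frac{\hat{p}(1 - \hat{p})}{n} = 0.00016$. So a 95% confidence interval is
\[
  \left(\hat{p} - 1.96\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}, \hat{p} + 1.96\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right) = 0.20 \pm 1.96 \times 0.013 = (0.175, 0.225).
\]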
If we don’t want to make that many approximations, we can note that $p(1 - p) \leq 1/4$ for all $0 \leq p \leq 1$. So a conservative 95% interval is
\[
  \hat{p} \pm 1.96\sqrt{\frac{1}{4n}} \approx \hat{p} \pm \sqrt{\frac{1}{n}}.
\]
So whatever proportion is reported, it will be ‘accurate’ to $\pm 1/\sqrt{n}$.
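The arithmetic for the poll can be reproduced in a few lines. This sketch computes the plug-in interval, the conservative interval based on $p(1 - p) \leq 1/4$, and the $\pm 1/\sqrt{n}$ rule of thumb. The variable names are illustrative.

```python
from statistics import NormalDist

n = 1000
x = 200
p_hat = x / n
z = NormalDist().inv_cdf(0.975)  # about 1.96

# Plug-in interval: replace p by p_hat in the standard error.
se_plugin = (p_hat * (1 - p_hat) / n) ** 0.5
plugin = (p_hat - z * se_plugin, p_hat + z * se_plugin)

# Conservative interval: bound p(1 - p) by 1/4.
se_conservative = (1 / (4 * n)) ** 0.5
conservative = (p_hat - z * se_conservative, p_hat + z * se_conservative)

# Rule of thumb: 1.96 / 2 is roughly 1, so the half-width is about 1/sqrt(n).
rule_of_thumb = (p_hat - 1 / n ** 0.5, p_hat + 1 / n ** 0.5)

print([round(v, 3) for v in plugin])
print([round(v, 3) for v in conservative])
print([round(v, 3) for v in rule_of_thumb])
```

The first line of output matches the interval $(0.175, 0.225)$ above. The other two are slightly wider, as expected from the bound.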
Example. Suppose $X_1, X_2$ are iid from $U(\theta - 1/2, \theta + 1/2)$. What is a sensible 50% confidence interval for $\theta$?
We know that each $X_i$ is equally likely to be less than $\theta$ or greater than $\theta$. So there is a 50% chance that we get one observation on each side, i.e.
\[
  P_\theta(\min(X_1, X_2) \leq \theta \leq \max(X_1, X_2)) = \frac{1}{2}.
\]
So $(\min(X_1, X_2), \max(X_1, X_2))$ is a 50% confidence interval for $\theta$.
But suppose after the experiment, we obtain $|x_1 - x_2| \geq \frac{1}{2}$. For example, we might get $x_1 = 0.2$, $x_2 = 0.9$. Since both observations lie within $1/2$ of $\theta$, we then know that, in this particular case, $\theta$ must lie in $(\min(x_1, x_2), \max(x_1, x_2))$, and we don’t have just 50% “confidence”!
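This distinction between overall coverage and what we know after seeing the data can be illustrated by simulation. The sketch below estimates the overall coverage of $(\min(X_1, X_2), \max(X_1, X_2))$ and the coverage among only those samples with $|x_1 - x_2| \geq 1/2$. The function name, the value of $\theta$, the number of trials and the seed are illustrative assumptions.

```python
import random

def uniform_coverage(theta, trials, seed=0):
    """Return (overall coverage, coverage given |x1 - x2| >= 1/2)."""
    rng = random.Random(seed)
    covered = 0
    far_apart = 0
    far_apart_covered = 0
    for _ in range(trials):
        x1 = rng.uniform(theta - 0.5, theta + 0.5)
        x2 = rng.uniform(theta - 0.5, theta + 0.5)
        lo, hi = min(x1, x2), max(x1, x2)
        hit = lo <= theta <= hi
        covered += hit
        # Track separately the samples whose two observations are far apart.
        if hi - lo >= 0.5:
            far_apart += 1
            far_apart_covered += hit
    return covered / trials, far_apart_covered / far_apart

overall, given_far_apart = uniform_coverage(theta=0.55, trials=100000)
print(overall, given_far_apart)
```

The first number should be close to 0.5, while the second should be 1: the procedure has 50% coverage on average, but for some observed samples we know for certain that the interval contains $\theta$.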
This is why after we calculate a confidence interval, we should not say “there is a $100\gamma\%$ chance that $\theta$ lies in here”. The confidence interval just says that “if we keep making these intervals, $100\gamma\%$ of them will contain $\theta$”. But once we have calculated a particular confidence interval, the probability that that particular interval contains $\theta$ is not $100\gamma\%$.