1 Estimation

IB Statistics

1.5 Confidence intervals

Definition. A $100\gamma\%$ ($0 < \gamma < 1$) confidence interval for $\theta$ is a random interval $(A(\mathbf{X}), B(\mathbf{X}))$ such that $P(A(\mathbf{X}) < \theta < B(\mathbf{X})) = \gamma$, no matter what the true value of $\theta$ may be.

It is also possible to have confidence intervals for vector parameters.

Notice that it is the endpoints of the interval that are random quantities, while $\theta$ is a fixed constant we want to find out.

We can interpret this in terms of repeated sampling. If we calculate $(A(\mathbf{x}), B(\mathbf{x}))$ for a large number of samples $\mathbf{x}$, then approximately $100\gamma\%$ of them will cover the true value of $\theta$.

It is important to know that, having observed some data $\mathbf{x}$ and calculated a 95% confidence interval, we cannot say that $\theta$ has a 95% chance of being within the interval. Apart from the standard objection that $\theta$ is a fixed value and either is or is not in the interval, and hence we cannot assign probabilities to this event, we will later construct an example where even though we have got a 50% confidence interval, we are 100% sure that $\theta$ lies in that interval.

Example. Suppose $X_1, \cdots, X_n$ are iid $N(\theta, 1)$. Find a 95% confidence interval for $\theta$.

We know $\bar{X} \sim N(\theta, \frac{1}{n})$, so that $\sqrt{n}(\bar{X} - \theta) \sim N(0, 1)$.

Let $z_1, z_2$ be such that $\Phi(z_2) - \Phi(z_1) = 0.95$, where $\Phi$ is the standard normal distribution function.

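We have $P[z_1 < \sqrt{n}(\bar{X} - \theta) < z_2] = 0.95$, which can be rearranged to give

\[ P\left[\bar{X} - \frac{z_2}{\sqrt{n}} < \theta < \bar{X} - \frac{z_1}{\sqrt{n}}\right] = 0.95, \]

so we obtain the following 95% confidence interval:

\[ \left(\bar{X} - \frac{z_2}{\sqrt{n}},\ \bar{X} - \frac{z_1}{\sqrt{n}}\right). \]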

There are many possible choices for $z_1$ and $z_2$. Since the $N(0, 1)$ density is symmetric, the shortest such interval is obtained by $z_2 = z_{0.025} = -z_1$. We can also choose other values such as $z_1 = -\infty$, $z_2 = 1.64$, but we usually choose symmetric end points.
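The repeated-sampling interpretation of this interval can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names, the seed, and the choices $\theta = 3$, $n = 25$ are all illustrative assumptions. It builds the symmetric interval $\bar{X} \pm z_{0.025}/\sqrt{n}$ for many simulated samples and reports the fraction that cover $\theta$.

```python
import random
from statistics import NormalDist, fmean

def normal_mean_interval(xs, level=0.95):
    """Equal-tailed interval for the mean of N(theta, 1) data (variance known to be 1)."""
    n = len(xs)
    # Upper (1 - level)/2 point of N(0, 1); about 1.96 when level = 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    xbar = fmean(xs)
    half_width = z / n ** 0.5
    return xbar - half_width, xbar + half_width

def coverage(theta, n, trials, seed=0):
    """Fraction of simulated intervals that contain the true theta."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(theta, 1.0) for _ in range(n)]
        lo, hi = normal_mean_interval(xs)
        hits += lo < theta < hi
    return hits / trials

# Hypothetical settings; the printed fraction should be close to 0.95.
print(coverage(theta=3.0, n=25, trials=4000))
```

Changing `theta` should leave the printed fraction roughly unchanged, which is the "no matter what the true value of $\theta$ may be" part of the definition.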

The above example illustrates a common procedure for finding confidence intervals:

– Find a quantity $R(\mathbf{X}, \theta)$ such that the $P_\theta$-distribution of $R(\mathbf{X}, \theta)$ does not depend on $\theta$. This is called a pivot. In our example, $R(\mathbf{X}, \theta) = \sqrt{n}(\bar{X} - \theta)$.

– Write down a probability statement of the form $P_\theta(c_1 < R(\mathbf{X}, \theta) < c_2) = \gamma$.

– Rearrange the inequalities inside $P(\ldots)$ to find the interval.

Usually $c_1, c_2$ are percentage points from a known standardised distribution, often equitailed. For example, we pick 2.5% and 97.5% points for a 95% confidence interval. We could also use, say, 0% and 95%, but this generally results in a wider interval.
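To see why the equitailed choice is usually preferred, here is a small sketch for a standard normal pivot. The helper `pivot_limits` and the particular tail splits are assumptions made for illustration. It computes $c_1, c_2$ for several ways of splitting the 5% outside the interval and prints $c_2 - c_1$; in the normal example above the interval for $\theta$ has width $(c_2 - c_1)/\sqrt{n}$.

```python
from statistics import NormalDist

def pivot_limits(lower_tail, level=0.95):
    """Return (c1, c2) with P(c1 < Z < c2) = level for Z ~ N(0, 1),
    where `lower_tail` is the probability placed below c1.
    Requires 0 < lower_tail and lower_tail + level < 1."""
    std_normal = NormalDist()
    c1 = std_normal.inv_cdf(lower_tail)
    c2 = std_normal.inv_cdf(lower_tail + level)
    return c1, c2

# 0.025 is the equitailed split; the other two put less mass in the lower tail.
for lower_tail in (0.025, 0.01, 0.001):
    c1, c2 = pivot_limits(lower_tail)
    print(f"lower tail {lower_tail}: c1 = {c1:.3f}, c2 = {c2:.3f}, width = {c2 - c1:.3f}")
```

As the lower tail shrinks towards 0, $c_1$ tends to $-\infty$ and the width grows without bound, which matches the 0% and 95% case mentioned in the text.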

Note that if $(A(\mathbf{X}), B(\mathbf{X}))$ is a $100\gamma\%$ confidence interval for $\theta$, and $T(\theta)$ is a monotone increasing function of $\theta$, then $(T(A(\mathbf{X})), T(B(\mathbf{X})))$ is a $100\gamma\%$ confidence interval for $T(\theta)$.

Example. Suppose $X_1, \cdots, X_{50}$ are iid $N(0, \sigma^2)$. Find a 99% confidence interval for $\sigma^2$.

We know that $X_i/\sigma \sim N(0, 1)$. So

\[ \frac{1}{\sigma^2}\sum_{i=1}^{50} X_i^2 \sim \chi^2_{50}. \]

So $R(\mathbf{X}, \sigma^2) = \sum_{i=1}^{50} X_i^2/\sigma^2$ is a pivot.

Recall that $\chi^2_n(\alpha)$ is the upper $100\alpha\%$ point of $\chi^2_n$, i.e.

\[ P(\chi^2_n \leq \chi^2_n(\alpha)) = 1 - \alpha. \]

So we have $c_1 = \chi^2_{50}(0.995) = 27.99$ and $c_2 = \chi^2_{50}(0.005) = 79.49$.

So

\[ P\left(c_1 < \frac{\sum X_i^2}{\sigma^2} < c_2\right) = 0.99, \]

and hence

\[ P\left(\frac{\sum X_i^2}{c_2} < \sigma^2 < \frac{\sum X_i^2}{c_1}\right) = 0.99. \]

Using the remark above, we know that a 99% confidence interval for $\sigma$ is

\[ \left(\sqrt{\frac{\sum X_i^2}{c_2}},\ \sqrt{\frac{\sum X_i^2}{c_1}}\right). \]
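The two intervals in this example can be sketched in code as follows. This is an illustration under stated assumptions rather than part of the notes: the constants are the percentage points quoted in the text for 50 degrees of freedom, while the function names, the seed and the value $\sigma = 2$ used in the simulation are hypothetical.

```python
import random

# Percentage points of chi-squared with 50 degrees of freedom, as quoted in the text.
C1 = 27.99   # chi^2_50(0.995)
C2 = 79.49   # chi^2_50(0.005)

def variance_interval(xs):
    """99% interval for sigma^2 from 50 iid N(0, sigma^2) observations."""
    if len(xs) != 50:
        raise ValueError("C1 and C2 are the percentage points for n = 50 only")
    total = sum(x * x for x in xs)
    return total / C2, total / C1

def sd_interval(xs):
    """Interval for sigma: apply the increasing map sqrt to both end points."""
    lo, hi = variance_interval(xs)
    return lo ** 0.5, hi ** 0.5

def coverage(sigma, trials=2000, seed=1):
    """Fraction of simulated variance intervals that contain sigma^2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(50)]
        lo, hi = variance_interval(xs)
        hits += lo < sigma ** 2 < hi
    return hits / trials

print(coverage(sigma=2.0))  # should be close to 0.99
```

Because the square root is increasing, `sd_interval` covers $\sigma$ exactly when `variance_interval` covers $\sigma^2$, so both have the same coverage.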

Example. Suppose $X_1, \cdots, X_n$ are iid $\mathrm{Bernoulli}(p)$. Find an approximate confidence interval for $p$.

The mle of $p$ is $\hat{p} = \sum X_i/n$.

By the Central Limit theorem, $\hat{p}$ is approximately $N(p, p(1 - p)/n)$ for large $n$.

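So $\dfrac{\sqrt{n}(\hat{p} - p)}{\sqrt{p(1 - p)}}$ is approximately $N(0, 1)$ for large $n$. So letting $z_{(1-\gamma)/2}$ be the solution to $\Phi(z_{(1-\gamma)/2}) - \Phi(-z_{(1-\gamma)/2}) = \gamma$, we have

\[ P\left(\hat{p} - z_{(1-\gamma)/2}\sqrt{\frac{p(1 - p)}{n}} < p < \hat{p} + z_{(1-\gamma)/2}\sqrt{\frac{p(1 - p)}{n}}\right) \approx \gamma. \]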

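But $p$ is unknown! So we approximate it by $\hat{p}$ to get a confidence interval for $p$ when $n$ is large:

\[ P\left(\hat{p} - z_{(1-\gamma)/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p < \hat{p} + z_{(1-\gamma)/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right) \approx \gamma. \]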

Note that we have made a lot of approximations here, but it would be difficult to do better than this.
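How much the approximations matter can be explored by simulation. The sketch below is illustrative only: `approx_interval`, `coverage`, the seed and the two $(p, n)$ pairs are assumptions, not taken from the notes. It computes the plug-in interval and estimates its actual coverage for one large-$n$ case and one small-$n$ case.

```python
import random
from statistics import NormalDist

def approx_interval(successes, n, gamma=0.95):
    """Approximate 100*gamma% interval for p, with p_hat plugged into the variance."""
    p_hat = successes / n
    z = NormalDist().inv_cdf((1 + gamma) / 2)   # upper (1 - gamma)/2 point of N(0, 1)
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width

def coverage(p, n, trials=2000, seed=2):
    """Fraction of simulated intervals that contain the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        successes = sum(rng.random() < p for _ in range(n))
        lo, hi = approx_interval(successes, n)
        hits += lo < p < hi
    return hits / trials

print(coverage(p=0.3, n=400))   # large n: expected to be near the nominal 0.95
print(coverage(p=0.02, n=20))   # small n, p near 0: expected to fall well short
```

In the second case most samples contain no successes, so $\hat{p} = 0$ and the interval collapses to a single point; this is one concrete way in which the large-$n$ approximation can fail.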

Example. Suppose an opinion poll says 20% of the people are going to vote UKIP, based on a random sample of 1,000 people. What might the true proportion be?

We assume we have an observation of $x = 200$ from a $\mathrm{binomial}(n, p)$ distribution with $n = 1000$. Then $\hat{p} = x/n = 0.2$ is an unbiased estimate, and also the mle.

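Now $\operatorname{var}(X/n) = \frac{p(1 - p)}{n} \approx \frac{\hat{p}(1 - \hat{p})}{n} = 0.00016$. So a 95% confidence interval is

\[ \left(\hat{p} - 1.96\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + 1.96\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right) = 0.20 \pm 1.96 \times 0.013 = (0.175, 0.225). \]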

If we don’t want to make that many approximations, we can note that $p(1 - p) \leq 1/4$ for all $0 \leq p \leq 1$. So a conservative 95% interval is $\hat{p} \pm 1.96\sqrt{1/(4n)} \approx \hat{p} \pm \sqrt{1/n}$.

So whatever proportion is reported, it will be ‘accurate’ to $\pm 1/\sqrt{n}$.
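The arithmetic in this example can be reproduced directly. The variable names below are illustrative; the numbers $x = 200$, $n = 1000$ and the multiplier 1.96 are those used in the text.

```python
n = 1000
x = 200
p_hat = x / n

# Plug-in standard error, using p_hat in place of the unknown p.
std_err = (p_hat * (1 - p_hat) / n) ** 0.5
plug_in = (p_hat - 1.96 * std_err, p_hat + 1.96 * std_err)

# Conservative half-width, using the bound p(1 - p) <= 1/4.
worst_case = 1.96 * (1 / (4 * n)) ** 0.5
conservative = (p_hat - worst_case, p_hat + worst_case)

# The further simplification 1.96 / 2 ~ 1 gives the 1/sqrt(n) rule of thumb.
rule_of_thumb = 1 / n ** 0.5

print("plug-in interval:     ", plug_in)
print("conservative interval:", conservative)
print("1/sqrt(n):            ", rule_of_thumb)
```

The conservative half-width is always slightly smaller than $1/\sqrt{n}$, since $1.96/2 < 1$, so quoting $\pm 1/\sqrt{n}$ errs on the cautious side.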

Example. Suppose $X_1, X_2$ are iid from $U(\theta - 1/2, \theta + 1/2)$. What is a sensible 50% confidence interval for $\theta$?

We know that each $X_i$ is equally likely to be less than $\theta$ or greater than $\theta$. So there is a 50% chance that we get one observation on each side, i.e.

\[ P_\theta(\min(X_1, X_2) \leq \theta \leq \max(X_1, X_2)) = \frac{1}{2}. \]

So $(\min(X_1, X_2), \max(X_1, X_2))$ is a 50% confidence interval for $\theta$.

But suppose after the experiment, we obtain $|x_1 - x_2| \geq \frac{1}{2}$. For example, we might get $x_1 = 0.2$, $x_2 = 0.9$. Then we know that, in this particular case, $\theta$ must lie in $(\min(X_1, X_2), \max(X_1, X_2))$, and we don’t have just 50% “confidence”!

This is why after we calculate a confidence interval, we should not say “there is $100(1 - \alpha)\%$ chance that $\theta$ lies in here”. The confidence interval just says that “if we keep making these intervals, $100(1 - \alpha)\%$ of them will contain $\theta$”. But if we have calculated a particular confidence interval, the probability that that particular interval contains $\theta$ is not $100(1 - \alpha)\%$.
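The uniform example can be illustrated with a short simulation. This is a sketch under assumptions: the function name, the seed and the value $\theta = 2$ are arbitrary choices, not from the notes. It estimates the overall coverage of $(\min(X_1, X_2), \max(X_1, X_2))$ and, separately, the coverage among those samples with $|x_1 - x_2| \geq 1/2$.

```python
import random

def simulate(theta=2.0, trials=20000, seed=3):
    """Return (overall coverage, coverage among samples with |x1 - x2| >= 1/2)."""
    rng = random.Random(seed)
    hits = 0        # intervals that contain theta
    wide = 0        # samples whose two observations are at least 1/2 apart
    wide_hits = 0   # of those, intervals that contain theta
    for _ in range(trials):
        x1 = rng.uniform(theta - 0.5, theta + 0.5)
        x2 = rng.uniform(theta - 0.5, theta + 0.5)
        covered = min(x1, x2) <= theta <= max(x1, x2)
        hits += covered
        if abs(x1 - x2) >= 0.5:
            wide += 1
            wide_hits += covered
    return hits / trials, wide_hits / wide

overall, given_wide = simulate()
print("overall coverage:          ", overall)     # should be close to 0.5
print("coverage when |x1-x2|>=1/2:", given_wide)  # every such interval contains theta
```

The overall figure is the 50% that the procedure guarantees before the data are seen; the conditional figure shows that, once the data are seen, some realised intervals are certain to contain $\theta$.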