2 Hypothesis testing

IB Statistics

2.1 Simple hypotheses

Definition (Critical region). For testing $H_0$ against an alternative hypothesis $H_1$, a test procedure has to partition $\mathcal{X}^n$ into two disjoint exhaustive regions $C$ and $\bar{C}$, such that if $\mathbf{x} \in C$, then $H_0$ is rejected, and if $\mathbf{x} \in \bar{C}$, then $H_0$ is not rejected. $C$ is the critical region.

When performing a test, we may either arrive at a correct conclusion, or

make one of the two types of error:

Definition (Type I and II error).
(i) Type I error: reject $H_0$ when $H_0$ is true.
(ii) Type II error: not rejecting $H_0$ when $H_0$ is false.

Definition (Size and power). When $H_0$ and $H_1$ are both simple, let
\[\alpha = \mathbb{P}(\text{Type I error}) = \mathbb{P}(X \in C \mid H_0 \text{ is true}),\]
\[\beta = \mathbb{P}(\text{Type II error}) = \mathbb{P}(X \not\in C \mid H_1 \text{ is true}).\]
The size of the test is $\alpha$, and $1 - \beta$ is the power of the test to detect $H_1$.

If we have two simple hypotheses, a relatively straightforward test is the

likelihood ratio test.

Definition (Likelihood). The likelihood of a simple hypothesis $H: \theta = \theta^*$ given data $\mathbf{x}$ is
\[L_{\mathbf{x}}(H) = f_X(\mathbf{x} \mid \theta = \theta^*).\]
The likelihood ratio of two simple hypotheses $H_0, H_1$ given data $\mathbf{x}$ is
\[\Lambda_{\mathbf{x}}(H_0; H_1) = \frac{L_{\mathbf{x}}(H_1)}{L_{\mathbf{x}}(H_0)}.\]
A likelihood ratio test (LR test) is one where the critical region $C$ is of the form
\[C = \{\mathbf{x} : \Lambda_{\mathbf{x}}(H_0; H_1) > k\}\]
for some $k$.
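The definitions above translate directly into code. A minimal Python sketch, assuming (hypothetically) iid $N(\mu, \sigma^2)$ data with known $\sigma$; the function names and the example values are illustrative, not from the text.

```python
import math

def likelihood(x, mu, sigma):
    """L_x(H) for the simple hypothesis mu, with iid N(mu, sigma^2) data x."""
    return math.prod(
        math.exp(-(xi - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        for xi in x
    )

def likelihood_ratio(x, mu0, mu1, sigma):
    """Lambda_x(H0; H1) = L_x(H1) / L_x(H0)."""
    return likelihood(x, mu1, sigma) / likelihood(x, mu0, sigma)

def lr_test(x, mu0, mu1, sigma, k):
    """The LR test: reject H0 iff the likelihood ratio exceeds k."""
    return likelihood_ratio(x, mu0, mu1, sigma) > k

x = [0.2, -0.1, 0.4]
print(likelihood_ratio(x, 0.0, 1.0, 1.0))  # prints exp(-1), about 0.3679
```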

It turns out this rather simple test is “the best” in the following sense:

Lemma (Neyman-Pearson lemma). Suppose $H_0: f = f_0$, $H_1: f = f_1$, where $f_0$ and $f_1$ are continuous densities that are nonzero on the same regions. Then among all tests of size less than or equal to $\alpha$, the test with the largest power is the likelihood ratio test of size $\alpha$.

Proof. Under the likelihood ratio test, our critical region is
\[C = \left\{\mathbf{x} : \frac{f_1(\mathbf{x})}{f_0(\mathbf{x})} > k\right\},\]
where $k$ is chosen such that $\alpha = \mathbb{P}(\text{reject } H_0 \mid H_0) = \mathbb{P}(X \in C \mid H_0) = \int_C f_0(\mathbf{x})\, \mathrm{d}\mathbf{x}$. The probability of a Type II error is given by
\[\beta = \mathbb{P}(X \in \bar{C} \mid f_1) = \int_{\bar{C}} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x}.\]

Let $C^*$ be the critical region of any other test with size less than or equal to $\alpha$. Let $\alpha^* = \mathbb{P}(X \in C^* \mid H_0)$ and $\beta^* = \mathbb{P}(X \not\in C^* \mid H_1)$. We want to show $\beta \leq \beta^*$.
We know $\alpha^* \leq \alpha$, i.e.
\[\int_{C^*} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x} \leq \int_C f_0(\mathbf{x})\, \mathrm{d}\mathbf{x}.\]

Also, on $C$ we have $f_1(\mathbf{x}) > kf_0(\mathbf{x})$, while on $\bar{C}$ we have $f_1(\mathbf{x}) \leq kf_0(\mathbf{x})$. So
\[\int_{\bar{C}^* \cap C} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} \geq k\int_{\bar{C}^* \cap C} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x},\]
\[\int_{\bar{C} \cap C^*} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} \leq k\int_{\bar{C} \cap C^*} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x}.\]

Hence
\begin{align*}
\beta - \beta^* &= \int_{\bar{C}} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} - \int_{\bar{C}^*} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x}\\
&= \int_{\bar{C} \cap C^*} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} + \int_{\bar{C} \cap \bar{C}^*} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} - \int_{\bar{C}^* \cap C} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} - \int_{\bar{C} \cap \bar{C}^*} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x}\\
&= \int_{\bar{C} \cap C^*} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} - \int_{\bar{C}^* \cap C} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x}\\
&\leq k\int_{\bar{C} \cap C^*} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x} - k\int_{\bar{C}^* \cap C} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x}\\
&= k\left(\int_{\bar{C} \cap C^*} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x} + \int_{C \cap C^*} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x}\right) - k\left(\int_{\bar{C}^* \cap C} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x} + \int_{C \cap C^*} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x}\right)\\
&= k(\alpha^* - \alpha)\\
&\leq 0.
\end{align*}

[Venn diagram omitted: the regions $C^* \cap \bar{C}$ (where $f_1 \leq kf_0$), $\bar{C}^* \cap C$ (where $f_1 \geq kf_0$) and $C^* \cap C$, illustrating how $\alpha, \beta$ (under $H_0, H_1$) compare with $\alpha^*, \beta^*$.]

Here we assumed that $f_0$ and $f_1$ are continuous densities. However, this assumption is only needed to ensure that the likelihood ratio test of exactly size $\alpha$ exists. Even with non-continuous distributions, the likelihood ratio test is still a good idea. In fact, you will show in the example sheets that for a discrete distribution, as long as a likelihood ratio test of exactly size $\alpha$ exists, the same result holds.
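The optimality claim can also be checked numerically. A Monte Carlo sketch under an assumed setting of $n = 5$ iid $N(\mu, 1)$ observations, testing $H_0: \mu = 0$ against $H_1: \mu = 1$; the competing "look only at the first observation" test, and all numeric choices, are illustrative rather than from the text. Both tests have size $0.05$, but the LR test (which rejects for large $\bar{x}$) has markedly greater power.

```python
import random
import statistics

random.seed(0)

MU1, N = 1.0, 5
Z_ALPHA = 1.6449  # upper 5% point of N(0, 1), from tables

def sample(mu):
    """Draw n iid N(mu, 1) observations."""
    return [random.gauss(mu, 1.0) for _ in range(N)]

def lr_reject(x):
    """LR test: reject H0 when sqrt(n) * xbar > z_alpha (size 0.05 under H0)."""
    return statistics.mean(x) * N ** 0.5 > Z_ALPHA

def naive_reject(x):
    """A competing size-0.05 test that ignores all but the first observation."""
    return x[0] > Z_ALPHA

trials = 20000
power_lr = sum(lr_reject(sample(MU1)) for _ in range(trials)) / trials
power_naive = sum(naive_reject(sample(MU1)) for _ in range(trials)) / trials
print(power_lr, power_naive)  # LR power is roughly 0.72 vs roughly 0.26
```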

Example. Suppose $X_1, \cdots, X_n$ are iid $N(\mu, \sigma_0^2)$, where $\sigma_0^2$ is known. We want to find the best size $\alpha$ test of $H_0: \mu = \mu_0$ against $H_1: \mu = \mu_1$, where $\mu_0$ and $\mu_1$ are known fixed values with $\mu_1 > \mu_0$. Then
\[\Lambda_{\mathbf{x}}(H_0; H_1) = \frac{(2\pi\sigma_0^2)^{-n/2} \exp\left(-\frac{1}{2\sigma_0^2}\sum (x_i - \mu_1)^2\right)}{(2\pi\sigma_0^2)^{-n/2} \exp\left(-\frac{1}{2\sigma_0^2}\sum (x_i - \mu_0)^2\right)} = \exp\left(\frac{\mu_1 - \mu_0}{\sigma_0^2}\, n\bar{x} + \frac{n(\mu_0^2 - \mu_1^2)}{2\sigma_0^2}\right).\]

This is an increasing function of $\bar{x}$, so for any $k$, $\Lambda_{\mathbf{x}} > k \Leftrightarrow \bar{x} > c$ for some $c$. Hence we reject $H_0$ if $\bar{x} > c$, where $c$ is chosen such that $\mathbb{P}(\bar{X} > c \mid H_0) = \alpha$.
Under $H_0$, $\bar{X} \sim N(\mu_0, \sigma_0^2/n)$, so $Z = \sqrt{n}(\bar{X} - \mu_0)/\sigma_0 \sim N(0, 1)$.
Since $\bar{x} > c \Leftrightarrow z > c'$ for some $c'$, the size $\alpha$ test rejects $H_0$ if
\[z = \frac{\sqrt{n}(\bar{x} - \mu_0)}{\sigma_0} > z_\alpha.\]

For example, suppose $\mu_0 = 5$, $\mu_1 = 6$, $\sigma_0 = 1$, $\alpha = 0.05$, $n = 4$ and $\mathbf{x} = (5.1, 5.5, 4.9, 5.3)$. So $\bar{x} = 5.2$.
From tables, $z_{0.05} = 1.645$. We have $z = 0.4$, and this is less than $1.645$. So $\mathbf{x}$ is not in the rejection region.
We do not reject $H_0$ at the 5% level and say that the data are consistent with $H_0$.

Note that this does not mean that we accept $H_0$. While we don't have sufficient reason to believe it is false, we also don't have sufficient reason to believe it is true.

This is called a z-test.
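The worked numbers above are easy to reproduce. A minimal Python sketch; the helper `z_test` and its default critical value are illustrative, not a standard library API.

```python
import math

def z_test(x, mu0, sigma0, z_alpha=1.645):
    """One-sided z-test of H0: mu = mu0 against H1: mu > mu0, with known sigma0.

    Returns the test statistic z and whether H0 is rejected at size alpha.
    """
    n = len(x)
    xbar = sum(x) / n
    z = math.sqrt(n) * (xbar - mu0) / sigma0
    return z, z > z_alpha

z, reject = z_test([5.1, 5.5, 4.9, 5.3], mu0=5.0, sigma0=1.0)
print(z, reject)  # z = 0.4, which is below 1.645, so H0 is not rejected
```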

In this example, LR tests reject $H_0$ if $z > k$ for some constant $k$. The size of such a test is $\alpha = \mathbb{P}(Z > k \mid H_0) = 1 - \Phi(k)$, and is decreasing as $k$ increases.
Our observed value $z$ will be in the rejection region iff $z > k \Leftrightarrow \alpha > p^* = \mathbb{P}(Z > z \mid H_0)$.

Definition ($p$-value). The quantity $p^*$ is called the $p$-value of our observed data $\mathbf{x}$. For the example above, $z = 0.4$ and so $p^* = 1 - \Phi(0.4) = 0.3446$.

In general, the $p$-value is sometimes called the "observed significance level" of $\mathbf{x}$. This is the probability under $H_0$ of seeing data that is "more extreme" than our observed data $\mathbf{x}$. Extreme observations are viewed as providing evidence against $H_0$.
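The $p$-value above can be computed with only the standard library, using the identity $1 - \Phi(z) = \tfrac{1}{2}\operatorname{erfc}(z/\sqrt{2})$; the helper name is illustrative.

```python
import math

def p_value(z):
    """p* = P(Z > z | H0) = 1 - Phi(z) for the one-sided z-test."""
    return 0.5 * math.erfc(z / math.sqrt(2))  # complementary normal CDF

print(round(p_value(0.4), 4))  # 0.3446, matching the example above
```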