2 Hypothesis testing

IB Statistics

2.1 Simple hypotheses

Definition (Critical region). For testing $H_0$ against an alternative hypothesis $H_1$, a test procedure has to partition $\mathcal{X}^n$ into two disjoint exhaustive regions $C$ and $\bar{C}$, such that if $\mathbf{x} \in C$, then $H_0$ is rejected, and if $\mathbf{x} \in \bar{C}$, then $H_0$ is not rejected. $C$ is the critical region.

When performing a test, we may either arrive at a correct conclusion, or

make one of the two types of error:

Definition (Type I and II error).
(i) Type I error: reject $H_0$ when $H_0$ is true.
(ii) Type II error: not rejecting $H_0$ when $H_0$ is false.

Definition (Size and power). When $H_0$ and $H_1$ are both simple, let
\[\alpha = \mathbb{P}(\text{Type I error}) = \mathbb{P}(X \in C \mid H_0 \text{ is true}),\]
\[\beta = \mathbb{P}(\text{Type II error}) = \mathbb{P}(X \not\in C \mid H_1 \text{ is true}).\]
The size of the test is $\alpha$, and $1 - \beta$ is the power of the test to detect $H_1$.

If we have two simple hypotheses, a relatively straightforward test is the

likelihood ratio test.

Definition (Likelihood). The likelihood of a simple hypothesis $H: \theta = \theta^*$ given data $\mathbf{x}$ is
\[L_{\mathbf{x}}(H) = f_X(\mathbf{x} \mid \theta = \theta^*).\]
The likelihood ratio of two simple hypotheses $H_0, H_1$ given data $\mathbf{x}$ is
\[\Lambda_{\mathbf{x}}(H_0; H_1) = \frac{L_{\mathbf{x}}(H_1)}{L_{\mathbf{x}}(H_0)}.\]
A likelihood ratio test (LR test) is one where the critical region $C$ is of the form
\[C = \{\mathbf{x} : \Lambda_{\mathbf{x}}(H_0; H_1) > k\}\]
for some $k$.
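The definitions above translate directly into code. A minimal Python sketch, assuming (hypothetically) iid $N(\mu, \sigma^2)$ data with known $\sigma$; the function names and the example values are illustrative, not from the text.

```python
import math

def likelihood(x, mu, sigma):
    """L_x(H) for the simple hypothesis mu, with iid N(mu, sigma^2) data x."""
    return math.prod(
        math.exp(-(xi - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        for xi in x
    )

def likelihood_ratio(x, mu0, mu1, sigma):
    """Lambda_x(H0; H1) = L_x(H1) / L_x(H0)."""
    return likelihood(x, mu1, sigma) / likelihood(x, mu0, sigma)

def lr_test(x, mu0, mu1, sigma, k):
    """The LR test: reject H0 iff the likelihood ratio exceeds k."""
    return likelihood_ratio(x, mu0, mu1, sigma) > k

x = [0.2, -0.1, 0.4]
print(likelihood_ratio(x, 0.0, 1.0, 1.0))  # prints exp(-1), about 0.3679
```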

It turns out this rather simple test is “the best” in the following sense:

Lemma (Neyman-Pearson lemma). Suppose $H_0: f = f_0$, $H_1: f = f_1$, where $f_0$ and $f_1$ are continuous densities that are nonzero on the same regions. Then among all tests of size less than or equal to $\alpha$, the test with the largest power is the likelihood ratio test of size $\alpha$.

Proof. Under the likelihood ratio test, our critical region is
\[C = \left\{\mathbf{x} : \frac{f_1(\mathbf{x})}{f_0(\mathbf{x})} > k\right\},\]
where $k$ is chosen such that $\alpha = \mathbb{P}(\text{reject } H_0 \mid H_0) = \mathbb{P}(X \in C \mid H_0) = \int_C f_0(\mathbf{x})\, \mathrm{d}\mathbf{x}$. The probability of a Type II error is given by
\[\beta = \mathbb{P}(X \in \bar{C} \mid f_1) = \int_{\bar{C}} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x}.\]

Let $C^*$ be the critical region of any other test with size less than or equal to $\alpha$. Let $\alpha^* = \mathbb{P}(X \in C^* \mid H_0)$ and $\beta^* = \mathbb{P}(X \not\in C^* \mid H_1)$. We want to show $\beta \leq \beta^*$.
We know $\alpha^* \leq \alpha$, i.e.
\[\int_{C^*} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x} \leq \int_C f_0(\mathbf{x})\, \mathrm{d}\mathbf{x}.\]

Also, on $C$ we have $f_1(\mathbf{x}) > kf_0(\mathbf{x})$, while on $\bar{C}$ we have $f_1(\mathbf{x}) \leq kf_0(\mathbf{x})$. So
\[\int_{\bar{C}^* \cap C} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} \geq k\int_{\bar{C}^* \cap C} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x},\]
\[\int_{\bar{C} \cap C^*} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} \leq k\int_{\bar{C} \cap C^*} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x}.\]

Hence
\begin{align*}
\beta - \beta^* &= \int_{\bar{C}} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} - \int_{\bar{C}^*} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x}\\
&= \int_{\bar{C} \cap C^*} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} + \int_{\bar{C} \cap \bar{C}^*} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} - \int_{\bar{C}^* \cap C} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} - \int_{\bar{C} \cap \bar{C}^*} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x}\\
&= \int_{\bar{C} \cap C^*} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} - \int_{\bar{C}^* \cap C} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x}\\
&\leq k\int_{\bar{C} \cap C^*} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x} - k\int_{\bar{C}^* \cap C} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x}\\
&= k\left(\int_{\bar{C} \cap C^*} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x} + \int_{C \cap C^*} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x}\right) - k\left(\int_{\bar{C}^* \cap C} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x} + \int_{C \cap C^*} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x}\right)\\
&= k(\alpha^* - \alpha)\\
&\leq 0.
\end{align*}

[Venn diagram omitted: the regions $C^* \cap \bar{C}$ (where $f_1 \leq kf_0$), $\bar{C}^* \cap C$ (where $f_1 \geq kf_0$) and $C^* \cap C$, illustrating how $\alpha, \beta$ (under $H_0, H_1$) compare with $\alpha^*, \beta^*$.]

Here we assumed that $f_0$ and $f_1$ are continuous densities. However, this assumption is only needed to ensure that the likelihood ratio test of exactly size $\alpha$ exists. Even with non-continuous distributions, the likelihood ratio test is still a good idea. In fact, you will show in the example sheets that for a discrete distribution, as long as a likelihood ratio test of exactly size $\alpha$ exists, the same result holds.
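The optimality claim can also be checked numerically. A Monte Carlo sketch under an assumed setting of $n = 5$ iid $N(\mu, 1)$ observations, testing $H_0: \mu = 0$ against $H_1: \mu = 1$; the competing "look only at the first observation" test, and all numeric choices, are illustrative rather than from the text. Both tests have size $0.05$, but the LR test (which rejects for large $\bar{x}$) has markedly greater power.

```python
import random
import statistics

random.seed(0)

MU1, N = 1.0, 5
Z_ALPHA = 1.6449  # upper 5% point of N(0, 1), from tables

def sample(mu):
    """Draw n iid N(mu, 1) observations."""
    return [random.gauss(mu, 1.0) for _ in range(N)]

def lr_reject(x):
    """LR test: reject H0 when sqrt(n) * xbar > z_alpha (size 0.05 under H0)."""
    return statistics.mean(x) * N ** 0.5 > Z_ALPHA

def naive_reject(x):
    """A competing size-0.05 test that ignores all but the first observation."""
    return x[0] > Z_ALPHA

trials = 20000
power_lr = sum(lr_reject(sample(MU1)) for _ in range(trials)) / trials
power_naive = sum(naive_reject(sample(MU1)) for _ in range(trials)) / trials
print(power_lr, power_naive)  # LR power is roughly 0.72 vs roughly 0.26
```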

Example. Suppose $X_1, \cdots, X_n$ are iid $N(\mu, \sigma_0^2)$, where $\sigma_0^2$ is known. We want to find the best size $\alpha$ test of $H_0: \mu = \mu_0$ against $H_1: \mu = \mu_1$, where $\mu_0$ and $\mu_1$ are known fixed values with $\mu_1 > \mu_0$. Then
\[\Lambda_{\mathbf{x}}(H_0; H_1) = \frac{(2\pi\sigma_0^2)^{-n/2} \exp\left(-\frac{1}{2\sigma_0^2}\sum (x_i - \mu_1)^2\right)}{(2\pi\sigma_0^2)^{-n/2} \exp\left(-\frac{1}{2\sigma_0^2}\sum (x_i - \mu_0)^2\right)} = \exp\left(\frac{\mu_1 - \mu_0}{\sigma_0^2}\, n\bar{x} + \frac{n(\mu_0^2 - \mu_1^2)}{2\sigma_0^2}\right).\]

This is an increasing function of $\bar{x}$, so for any $k$, $\Lambda_{\mathbf{x}} > k \Leftrightarrow \bar{x} > c$ for some $c$. Hence we reject $H_0$ if $\bar{x} > c$, where $c$ is chosen such that $\mathbb{P}(\bar{X} > c \mid H_0) = \alpha$.
Under $H_0$, $\bar{X} \sim N(\mu_0, \sigma_0^2/n)$, so $Z = \sqrt{n}(\bar{X} - \mu_0)/\sigma_0 \sim N(0, 1)$.
Since $\bar{x} > c \Leftrightarrow z > c'$ for some $c'$, the size $\alpha$ test rejects $H_0$ if
\[z = \frac{\sqrt{n}(\bar{x} - \mu_0)}{\sigma_0} > z_\alpha.\]

For example, suppose $\mu_0 = 5$, $\mu_1 = 6$, $\sigma_0 = 1$, $\alpha = 0.05$, $n = 4$ and $\mathbf{x} = (5.1, 5.5, 4.9, 5.3)$. So $\bar{x} = 5.2$.
From tables, $z_{0.05} = 1.645$. We have $z = 0.4$, and this is less than $1.645$. So $\mathbf{x}$ is not in the rejection region.
We do not reject $H_0$ at the 5% level and say that the data are consistent with $H_0$.

Note that this does not mean that we accept $H_0$. While we don't have sufficient reason to believe it is false, we also don't have sufficient reason to believe it is true.

This is called a z-test.
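The worked numbers above are easy to reproduce. A minimal Python sketch; the helper `z_test` and its default critical value are illustrative, not a standard library API.

```python
import math

def z_test(x, mu0, sigma0, z_alpha=1.645):
    """One-sided z-test of H0: mu = mu0 against H1: mu > mu0, with known sigma0.

    Returns the test statistic z and whether H0 is rejected at size alpha.
    """
    n = len(x)
    xbar = sum(x) / n
    z = math.sqrt(n) * (xbar - mu0) / sigma0
    return z, z > z_alpha

z, reject = z_test([5.1, 5.5, 4.9, 5.3], mu0=5.0, sigma0=1.0)
print(z, reject)  # z = 0.4, which is below 1.645, so H0 is not rejected
```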

In this example, LR tests reject $H_0$ if $z > k$ for some constant $k$. The size of such a test is $\alpha = \mathbb{P}(Z > k \mid H_0) = 1 - \Phi(k)$, and is decreasing as $k$ increases.
Our observed value $z$ will be in the rejection region iff $z > k \Leftrightarrow \alpha > p^* = \mathbb{P}(Z > z \mid H_0)$.

Definition ($p$-value). The quantity $p^*$ is called the $p$-value of our observed data $\mathbf{x}$. For the example above, $z = 0.4$ and so $p^* = 1 - \Phi(0.4) = 0.3446$.

In general, the $p$-value is sometimes called the "observed significance level" of $\mathbf{x}$. This is the probability under $H_0$ of seeing data that is "more extreme" than our observed data $\mathbf{x}$. Extreme observations are viewed as providing evidence against $H_0$.
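The $p$-value above can be computed with only the standard library, using the identity $1 - \Phi(z) = \tfrac{1}{2}\operatorname{erfc}(z/\sqrt{2})$; the helper name is illustrative.

```python
import math

def p_value(z):
    """p* = P(Z > z | H0) = 1 - Phi(z) for the one-sided z-test."""
    return 0.5 * math.erfc(z / math.sqrt(2))  # complementary normal CDF

print(round(p_value(0.4), 4))  # 0.3446, matching the example above
```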