2 Hypothesis testing

IB Statistics

2.1 Simple hypotheses

Definition (Critical region). For testing $H_0$ against an alternative hypothesis $H_1$, a test procedure has to partition $\mathcal{X}^n$ into two disjoint exhaustive regions $C$ and $\bar{C}$, such that if $\mathbf{x} \in C$, then $H_0$ is rejected, and if $\mathbf{x} \in \bar{C}$, then $H_0$ is not rejected. $C$ is the critical region.

When performing a test, we may either arrive at a correct conclusion, or

make one of the two types of error:

Definition (Type I and II error).
(i) Type I error: reject $H_0$ when $H_0$ is true.
(ii) Type II error: not rejecting $H_0$ when $H_0$ is false.

Definition (Size and power). When $H_0$ and $H_1$ are both simple, let
\[\alpha = \mathbb{P}(\text{Type I error}) = \mathbb{P}(X \in C \mid H_0 \text{ is true}),\]
\[\beta = \mathbb{P}(\text{Type II error}) = \mathbb{P}(X \not\in C \mid H_1 \text{ is true}).\]
The size of the test is $\alpha$, and $1 - \beta$ is the power of the test to detect $H_1$.

If we have two simple hypotheses, a relatively straightforward test is the

likelihood ratio test.

Definition (Likelihood). The likelihood of a simple hypothesis $H: \theta = \theta^*$ given data $\mathbf{x}$ is
\[L_{\mathbf{x}}(H) = f_X(\mathbf{x} \mid \theta = \theta^*).\]
The likelihood ratio of two simple hypotheses $H_0, H_1$ given data $\mathbf{x}$ is
\[\Lambda_{\mathbf{x}}(H_0; H_1) = \frac{L_{\mathbf{x}}(H_1)}{L_{\mathbf{x}}(H_0)}.\]
A likelihood ratio test (LR test) is one where the critical region $C$ is of the form
\[C = \{\mathbf{x} : \Lambda_{\mathbf{x}}(H_0; H_1) > k\}\]
for some $k$.
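The definitions above translate directly into code. A minimal Python sketch, assuming (hypothetically) iid $N(\mu, \sigma^2)$ data with known $\sigma$; the function names and the example values are illustrative, not from the text.

```python
import math

def likelihood(x, mu, sigma):
    """L_x(H) for the simple hypothesis mu, with iid N(mu, sigma^2) data x."""
    return math.prod(
        math.exp(-(xi - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        for xi in x
    )

def likelihood_ratio(x, mu0, mu1, sigma):
    """Lambda_x(H0; H1) = L_x(H1) / L_x(H0)."""
    return likelihood(x, mu1, sigma) / likelihood(x, mu0, sigma)

def lr_test(x, mu0, mu1, sigma, k):
    """The LR test: reject H0 iff the likelihood ratio exceeds k."""
    return likelihood_ratio(x, mu0, mu1, sigma) > k

x = [0.2, -0.1, 0.4]
print(likelihood_ratio(x, 0.0, 1.0, 1.0))  # prints exp(-1), about 0.3679
```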

It turns out this rather simple test is “the best” in the following sense:

Lemma (Neyman-Pearson lemma). Suppose $H_0: f = f_0$, $H_1: f = f_1$, where $f_0$ and $f_1$ are continuous densities that are nonzero on the same regions. Then among all tests of size less than or equal to $\alpha$, the test with the largest power is the likelihood ratio test of size $\alpha$.

Proof. Under the likelihood ratio test, our critical region is
\[C = \left\{\mathbf{x} : \frac{f_1(\mathbf{x})}{f_0(\mathbf{x})} > k\right\},\]
where $k$ is chosen such that $\alpha = \mathbb{P}(\text{reject } H_0 \mid H_0) = \mathbb{P}(X \in C \mid H_0) = \int_C f_0(\mathbf{x})\, \mathrm{d}\mathbf{x}$. The probability of a Type II error is given by
\[\beta = \mathbb{P}(X \in \bar{C} \mid f_1) = \int_{\bar{C}} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x}.\]

Let $C^*$ be the critical region of any other test with size less than or equal to $\alpha$. Let $\alpha^* = \mathbb{P}(X \in C^* \mid H_0)$ and $\beta^* = \mathbb{P}(X \not\in C^* \mid H_1)$. We want to show $\beta \leq \beta^*$.
We know $\alpha^* \leq \alpha$, i.e.
\[\int_{C^*} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x} \leq \int_C f_0(\mathbf{x})\, \mathrm{d}\mathbf{x}.\]

Also, on $C$ we have $f_1(\mathbf{x}) > kf_0(\mathbf{x})$, while on $\bar{C}$ we have $f_1(\mathbf{x}) \leq kf_0(\mathbf{x})$. So
\[\int_{\bar{C}^* \cap C} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} \geq k\int_{\bar{C}^* \cap C} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x},\]
\[\int_{\bar{C} \cap C^*} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} \leq k\int_{\bar{C} \cap C^*} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x}.\]

Hence
\begin{align*}
\beta - \beta^* &= \int_{\bar{C}} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} - \int_{\bar{C}^*} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x}\\
&= \int_{\bar{C} \cap C^*} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} + \int_{\bar{C} \cap \bar{C}^*} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} - \int_{\bar{C}^* \cap C} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} - \int_{\bar{C} \cap \bar{C}^*} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x}\\
&= \int_{\bar{C} \cap C^*} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x} - \int_{\bar{C}^* \cap C} f_1(\mathbf{x})\, \mathrm{d}\mathbf{x}\\
&\leq k\int_{\bar{C} \cap C^*} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x} - k\int_{\bar{C}^* \cap C} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x}\\
&= k\left(\int_{\bar{C} \cap C^*} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x} + \int_{C \cap C^*} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x}\right) - k\left(\int_{\bar{C}^* \cap C} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x} + \int_{C \cap C^*} f_0(\mathbf{x})\, \mathrm{d}\mathbf{x}\right)\\
&= k(\alpha^* - \alpha)\\
&\leq 0.
\end{align*}

[Venn diagram omitted: the regions $C^* \cap \bar{C}$ (where $f_1 \leq kf_0$), $\bar{C}^* \cap C$ (where $f_1 \geq kf_0$) and $C^* \cap C$, illustrating how $\alpha, \beta$ (under $H_0, H_1$) compare with $\alpha^*, \beta^*$.]

Here we assumed that $f_0$ and $f_1$ are continuous densities. However, this assumption is only needed to ensure that the likelihood ratio test of exactly size $\alpha$ exists. Even with non-continuous distributions, the likelihood ratio test is still a good idea. In fact, you will show in the example sheets that for a discrete distribution, as long as a likelihood ratio test of exactly size $\alpha$ exists, the same result holds.
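The optimality claim can also be checked numerically. A Monte Carlo sketch under an assumed setting of $n = 5$ iid $N(\mu, 1)$ observations, testing $H_0: \mu = 0$ against $H_1: \mu = 1$; the competing "look only at the first observation" test, and all numeric choices, are illustrative rather than from the text. Both tests have size $0.05$, but the LR test (which rejects for large $\bar{x}$) has markedly greater power.

```python
import random
import statistics

random.seed(0)

MU1, N = 1.0, 5
Z_ALPHA = 1.6449  # upper 5% point of N(0, 1), from tables

def sample(mu):
    """Draw n iid N(mu, 1) observations."""
    return [random.gauss(mu, 1.0) for _ in range(N)]

def lr_reject(x):
    """LR test: reject H0 when sqrt(n) * xbar > z_alpha (size 0.05 under H0)."""
    return statistics.mean(x) * N ** 0.5 > Z_ALPHA

def naive_reject(x):
    """A competing size-0.05 test that ignores all but the first observation."""
    return x[0] > Z_ALPHA

trials = 20000
power_lr = sum(lr_reject(sample(MU1)) for _ in range(trials)) / trials
power_naive = sum(naive_reject(sample(MU1)) for _ in range(trials)) / trials
print(power_lr, power_naive)  # LR power is roughly 0.72 vs roughly 0.26
```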

Example. Suppose $X_1, \cdots, X_n$ are iid $N(\mu, \sigma_0^2)$, where $\sigma_0^2$ is known. We want to find the best size $\alpha$ test of $H_0: \mu = \mu_0$ against $H_1: \mu = \mu_1$, where $\mu_0$ and $\mu_1$ are known fixed values with $\mu_1 > \mu_0$. Then
\[\Lambda_{\mathbf{x}}(H_0; H_1) = \frac{(2\pi\sigma_0^2)^{-n/2} \exp\left(-\frac{1}{2\sigma_0^2}\sum (x_i - \mu_1)^2\right)}{(2\pi\sigma_0^2)^{-n/2} \exp\left(-\frac{1}{2\sigma_0^2}\sum (x_i - \mu_0)^2\right)} = \exp\left(\frac{\mu_1 - \mu_0}{\sigma_0^2}\, n\bar{x} + \frac{n(\mu_0^2 - \mu_1^2)}{2\sigma_0^2}\right).\]

This is an increasing function of $\bar{x}$, so for any $k$, $\Lambda_{\mathbf{x}} > k \Leftrightarrow \bar{x} > c$ for some $c$. Hence we reject $H_0$ if $\bar{x} > c$, where $c$ is chosen such that $\mathbb{P}(\bar{X} > c \mid H_0) = \alpha$.
Under $H_0$, $\bar{X} \sim N(\mu_0, \sigma_0^2/n)$, so $Z = \sqrt{n}(\bar{X} - \mu_0)/\sigma_0 \sim N(0, 1)$.
Since $\bar{x} > c \Leftrightarrow z > c'$ for some $c'$, the size $\alpha$ test rejects $H_0$ if
\[z = \frac{\sqrt{n}(\bar{x} - \mu_0)}{\sigma_0} > z_\alpha.\]

For example, suppose $\mu_0 = 5$, $\mu_1 = 6$, $\sigma_0 = 1$, $\alpha = 0.05$, $n = 4$ and $\mathbf{x} = (5.1, 5.5, 4.9, 5.3)$. So $\bar{x} = 5.2$.
From tables, $z_{0.05} = 1.645$. We have $z = 0.4$, and this is less than $1.645$. So $\mathbf{x}$ is not in the rejection region.
We do not reject $H_0$ at the 5% level and say that the data are consistent with $H_0$.

Note that this does not mean that we accept $H_0$. While we don't have sufficient reason to believe it is false, we also don't have sufficient reason to believe it is true.

This is called a z-test.
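The worked numbers above are easy to reproduce. A minimal Python sketch; the helper `z_test` and its default critical value are illustrative, not a standard library API.

```python
import math

def z_test(x, mu0, sigma0, z_alpha=1.645):
    """One-sided z-test of H0: mu = mu0 against H1: mu > mu0, with known sigma0.

    Returns the test statistic z and whether H0 is rejected at size alpha.
    """
    n = len(x)
    xbar = sum(x) / n
    z = math.sqrt(n) * (xbar - mu0) / sigma0
    return z, z > z_alpha

z, reject = z_test([5.1, 5.5, 4.9, 5.3], mu0=5.0, sigma0=1.0)
print(z, reject)  # z = 0.4, which is below 1.645, so H0 is not rejected
```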

In this example, LR tests reject $H_0$ if $z > k$ for some constant $k$. The size of such a test is $\alpha = \mathbb{P}(Z > k \mid H_0) = 1 - \Phi(k)$, and is decreasing as $k$ increases.
Our observed value $z$ will be in the rejection region iff $z > k \Leftrightarrow \alpha > p^* = \mathbb{P}(Z > z \mid H_0)$.

Definition ($p$-value). The quantity $p^*$ is called the $p$-value of our observed data $\mathbf{x}$. For the example above, $z = 0.4$ and so $p^* = 1 - \Phi(0.4) = 0.3446$.

In general, the $p$-value is sometimes called the "observed significance level" of $\mathbf{x}$. This is the probability under $H_0$ of seeing data that is "more extreme" than our observed data $\mathbf{x}$. Extreme observations are viewed as providing evidence against $H_0$.
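The $p$-value above can be computed with only the standard library, using the identity $1 - \Phi(z) = \tfrac{1}{2}\operatorname{erfc}(z/\sqrt{2})$; the helper name is illustrative.

```python
import math

def p_value(z):
    """p* = P(Z > z | H0) = 1 - Phi(z) for the one-sided z-test."""
    return 0.5 * math.erfc(z / math.sqrt(2))  # complementary normal CDF

print(round(p_value(0.4), 4))  # 0.3446, matching the example above
```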