2 Hypothesis testing
IB Statistics
2.3 Tests of goodness-of-fit and independence
2.3.1 Goodness-of-fit of a fully-specified null distribution
So far, we have considered relatively simple cases where we are attempting to figure out, say, the mean. However, in reality, more complicated scenarios arise. For example, we might want to know if a die is fair, i.e. if the probability of getting each number is exactly $\frac{1}{6}$. Our null hypothesis would be that $p_1 = p_2 = \cdots = p_6 = \frac{1}{6}$, while the alternative hypothesis allows any possible values of $p_i$.
In general, suppose the observation space $\mathcal{X}$ is partitioned into $k$ sets, and let $p_i$ be the probability that an observation is in set $i$ for $i = 1, \cdots, k$. We want to test "$H_0$: the $p_i$'s arise from a fully specified model" against "$H_1$: the $p_i$'s are unrestricted (apart from the obvious $p_i \geq 0$, $\sum p_i = 1$)".
Example. The following table lists the birth months of admissions to Oxford
and Cambridge in 2012.
Sep Oct Nov Dec Jan Feb Mar Apr May Jun Jul Aug
470 515 470 457 473 381 466 457 437 396 384 394
Is this compatible with a uniform distribution over the year?
Out of $n$ independent observations, let $N_i$ be the number of observations in the $i$th set. So $(N_1, \cdots, N_k) \sim \text{multinomial}(k; p_1, \cdots, p_k)$.
For a generalized likelihood ratio test of $H_0$, we need to find the maximised likelihood under $H_0$ and $H_1$.
Under $H_1$, $\text{like}(p_1, \cdots, p_k) \propto p_1^{n_1} \cdots p_k^{n_k}$. So the log likelihood is $l = \text{constant} + \sum n_i \log p_i$. We want to maximise this subject to $\sum p_i = 1$. Using the Lagrange multiplier, we will find that the mle is $\hat{p}_i = n_i/n$. Also $|\Theta_1| = k - 1$ (not $k$, since they must sum up to 1).
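The Lagrange-multiplier step is short: maximise $\mathcal{L} = \sum_i n_i \log p_i + \lambda(1 - \sum_i p_i)$ over the $p_i$.

```latex
\frac{\partial \mathcal{L}}{\partial p_i} = \frac{n_i}{p_i} - \lambda = 0
  \quad\Longrightarrow\quad p_i = \frac{n_i}{\lambda};
\qquad
\sum_i p_i = 1 \;\Longrightarrow\; \lambda = \sum_i n_i = n,
\quad\text{so } \hat{p}_i = \frac{n_i}{n}.
```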
Under $H_0$, the values of $p_i$ are specified completely, say $p_i = \tilde{p}_i$. So $|\Theta_0| = 0$.
Using our formula for $\hat{p}_i$, we find that
\[
2\log\Lambda = 2\log\frac{\hat{p}_1^{n_1}\cdots\hat{p}_k^{n_k}}{\tilde{p}_1^{n_1}\cdots\tilde{p}_k^{n_k}} = 2\sum n_i \log\frac{n_i}{n\tilde{p}_i}. \tag{1}
\]
Here $|\Theta_1| - |\Theta_0| = k - 1$. So we reject $H_0$ if $2\log\Lambda > \chi^2_{k-1}(\alpha)$ for an approximate size $\alpha$ test.
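As a concrete sketch of this rejection rule with made-up counts (the data and the helper `log_lr_statistic` are illustrative, not from the notes): suppose a die is rolled $n = 60$ times under $H_0: p_i = 1/6$.

```python
import math

def log_lr_statistic(counts, null_probs):
    """2 log Lambda = 2 * sum n_i log(n_i / (n p_i)) for a fully
    specified null distribution, as in equation (1)."""
    n = sum(counts)
    return 2 * sum(o * math.log(o / (n * p))
                   for o, p in zip(counts, null_probs) if o > 0)

# Hypothetical counts from 60 die rolls; H0: each face has probability 1/6.
counts = [8, 12, 9, 11, 10, 10]
stat = log_lr_statistic(counts, [1/6] * 6)

# k - 1 = 5 degrees of freedom; chi^2_5(0.05) = 11.07 from standard tables.
print(round(stat, 3), stat > 11.07)
```

Since the statistic is far below the critical value, these counts give no evidence against fairness.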
Under $H_0$ (no effect of month of birth), $\tilde{p}_i$ is the proportion of births in month $i$ in 1993/1994 in the whole population; this is not simply proportional to the number of days in each month (or even worse, $\frac{1}{12}$), as there is for example an excess of September births (the "Christmas effect"). Then
\[
2\log\Lambda = 2\sum n_i \log\frac{n_i}{n\tilde{p}_i} = 44.86.
\]
$\mathbb{P}(\chi^2_{11} > 44.86) = 3 \times 10^{-9}$, which is our $p$-value. Since this is certainly less than 0.001, we can reject $H_0$ at the 0.1% level, or can say the result is "significant at the 0.1% level".
The traditional levels for comparison are $\alpha = 0.05, 0.01, 0.001$, roughly corresponding to "evidence", "strong evidence" and "very strong evidence".
A similar common situation has $H_0: p_i = p_i(\theta)$ for some parameter $\theta$ and $H_1$ as before. Now $|\Theta_0|$ is the number of independent parameters to be estimated under $H_0$.
Under $H_0$, we find the mle $\hat{\theta}$ by maximizing $\sum n_i \log p_i(\theta)$, and then
\[
2\log\Lambda = 2\log\left(\frac{\hat{p}_1^{n_1}\cdots\hat{p}_k^{n_k}}{p_1(\hat{\theta})^{n_1}\cdots p_k(\hat{\theta})^{n_k}}\right) = 2\sum n_i \log\left(\frac{n_i}{np_i(\hat{\theta})}\right). \tag{2}
\]
The degrees of freedom are $k - 1 - |\Theta_0|$.
2.3.2 Pearson's chi-squared test
Notice that the two log likelihoods are of the same form. In general, let $o_i = n_i$ (observed number) and let $e_i = n\tilde{p}_i$ or $np_i(\hat{\theta})$ (expected number). Let $\delta_i = o_i - e_i$.
Then
\[
\begin{aligned}
2\log\Lambda &= 2\sum o_i \log\frac{o_i}{e_i} \\
&= 2\sum (e_i + \delta_i)\log\left(1 + \frac{\delta_i}{e_i}\right) \\
&= 2\sum (e_i + \delta_i)\left(\frac{\delta_i}{e_i} - \frac{\delta_i^2}{2e_i^2} + O(\delta_i^3)\right) \\
&= 2\sum \left(\delta_i + \frac{\delta_i^2}{e_i} - \frac{\delta_i^2}{2e_i} + O(\delta_i^3)\right)
\end{aligned}
\]
We know that $\sum \delta_i = 0$ since $\sum e_i = \sum o_i$. So
\[
2\log\Lambda \approx \sum \frac{\delta_i^2}{e_i} = \sum \frac{(o_i - e_i)^2}{e_i}.
\]
This is known as Pearson's chi-squared test.
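Numerically, the agreement between $2\log\Lambda$ and Pearson's statistic is close even for modest counts. A small sketch with made-up observed and expected counts (not from the notes):

```python
import math

# Made-up observed counts and expected counts e_i = n * p_i, with n = 100.
observed = [25, 30, 45]
expected = [30, 30, 40]

# 2 log Lambda and Pearson's chi-squared statistic on the same data.
g = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))
pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# The two agree to first order in delta_i = o_i - e_i.
print(round(g, 3), round(pearson, 3))
```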
Example. Mendel crossed 556 smooth yellow male peas with wrinkled green peas. From the progeny, let
(i) $N_1$ be the number of smooth yellow peas,
(ii) $N_2$ be the number of smooth green peas,
(iii) $N_3$ be the number of wrinkled yellow peas,
(iv) $N_4$ be the number of wrinkled green peas.
We wish to test the goodness of fit of the model
\[
H_0: (p_1, p_2, p_3, p_4) = \left(\tfrac{9}{16}, \tfrac{3}{16}, \tfrac{3}{16}, \tfrac{1}{16}\right).
\]
Suppose we observe $(n_1, n_2, n_3, n_4) = (315, 108, 102, 31)$.
We find $(e_1, e_2, e_3, e_4) = (312.75, 104.25, 104.25, 34.75)$. The actual $2\log\Lambda = 0.618$ and the approximation we had is $\sum \frac{(o_i - e_i)^2}{e_i} = 0.604$.
Here $|\Theta_0| = 0$ and $|\Theta_1| = 4 - 1 = 3$. So we refer the test statistics to $\chi^2_3(\alpha)$.
Since $\chi^2_3(0.05) = 7.815$, we see that neither value is significant at 5%. So there is no evidence against Mendel's theory. In fact, the $p$-value is approximately $\mathbb{P}(\chi^2_3 > 0.6) \approx 0.90$. This is a really good fit, so good that people suspect the numbers were not genuine.
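Both quoted values can be checked directly; here is a minimal sketch using the counts above:

```python
import math

observed = [315, 108, 102, 31]            # Mendel's counts, n = 556
null_probs = [9/16, 3/16, 3/16, 1/16]     # H0
n = sum(observed)
expected = [n * p for p in null_probs]    # (312.75, 104.25, 104.25, 34.75)

g = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))
pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Matches the 0.618 and 0.604 quoted above; both are below chi^2_3(0.05) = 7.815.
print(round(g, 3), round(pearson, 3))
```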
Example. In a genetics problem, each individual has one of the three possible genotypes, with probabilities $p_1, p_2, p_3$. Suppose we wish to test $H_0: p_i = p_i(\theta)$, where
\[
p_1(\theta) = \theta^2, \quad p_2(\theta) = 2\theta(1 - \theta), \quad p_3(\theta) = (1 - \theta)^2
\]
for some $\theta \in (0, 1)$.
We observe $N_i = n_i$. Under $H_0$, the mle $\hat{\theta}$ is found by maximising
\[
\sum n_i \log p_i(\theta) = 2n_1 \log\theta + n_2 \log(2\theta(1 - \theta)) + 2n_3 \log(1 - \theta).
\]
We find that $\hat{\theta} = \frac{2n_1 + n_2}{2n}$. Also, $|\Theta_0| = 1$ and $|\Theta_1| = 2$.
After conducting an experiment, we can substitute $p_i(\hat{\theta})$ into (2), or find the corresponding Pearson's chi-squared statistic, and refer to $\chi^2_1$.
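A short sketch of this procedure with hypothetical genotype counts $(n_1, n_2, n_3) = (10, 50, 40)$ (these numbers are illustrative, not from the notes):

```python
# Hypothetical genotype counts (not from the notes), n = 100.
n1, n2, n3 = 10, 50, 40
n = n1 + n2 + n3

# mle under H0: theta_hat = (2 n1 + n2) / (2 n).
theta = (2 * n1 + n2) / (2 * n)

# Fitted cell probabilities p_i(theta_hat) and expected counts.
probs = [theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2]
expected = [n * p for p in probs]

observed = [n1, n2, n3]
pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Refer to chi^2_1: k - 1 - |Theta_0| = 3 - 1 - 1 = 1 degree of freedom;
# chi^2_1(0.05) = 3.841 from standard tables.
print(theta, round(pearson, 3), pearson > 3.841)
```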
2.3.3 Testing independence in contingency tables
Definition (Contingency table). A contingency table is a table in which observations or individuals are classified according to one or more criteria.
Example. 500 people with recent car changes were asked about their previous and new cars. The results are as follows:

                          New car
                   Large  Medium  Small
  Previous Large      56      52     42
  car      Medium     50      83     67
           Small      18      51     81

This is a two-way contingency table: each person is classified according to the previous car size and new car size.
Consider a two-way contingency table with $r$ rows and $c$ columns. For $i = 1, \cdots, r$ and $j = 1, \cdots, c$, let $p_{ij}$ be the probability that an individual selected from the population under consideration is classified in row $i$ and column $j$ (i.e. in the $(i, j)$ cell of the table).
Let $p_{i+} = \mathbb{P}(\text{in row } i)$ and $p_{+j} = \mathbb{P}(\text{in column } j)$. Then we must have $p_{++} = \sum_i \sum_j p_{ij} = 1$.
Suppose a random sample of $n$ individuals is taken, and let $n_{ij}$ be the number of these classified in the $(i, j)$ cell of the table.
Let $n_{i+} = \sum_j n_{ij}$ and $n_{+j} = \sum_i n_{ij}$. So $n_{++} = n$.
We have
\[
(N_{11}, \cdots, N_{1c}, N_{21}, \cdots, N_{rc}) \sim \text{multinomial}(rc; p_{11}, \cdots, p_{1c}, p_{21}, \cdots, p_{rc}).
\]
We may be interested in testing the null hypothesis that the two classifications are independent. So we test
– $H_0$: $p_{ij} = p_{i+}p_{+j}$ for all $i, j$, i.e. independence of columns and rows.
– $H_1$: $p_{ij}$ are unrestricted.
Of course we have the usual restrictions like $p_{++} = 1$, $p_{ij} \geq 0$.
Under $H_1$, the mles are $\hat{p}_{ij} = \frac{n_{ij}}{n}$.
Under $H_0$, the mles are $\hat{p}_{i+} = \frac{n_{i+}}{n}$ and $\hat{p}_{+j} = \frac{n_{+j}}{n}$.
Write $o_{ij} = n_{ij}$ and $e_{ij} = n\hat{p}_{i+}\hat{p}_{+j} = n_{i+}n_{+j}/n$.
Then
\[
2\log\Lambda = 2\sum_{i=1}^{r}\sum_{j=1}^{c} o_{ij}\log\frac{o_{ij}}{e_{ij}} \approx \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}},
\]
using the same approximating steps as for Pearson's chi-squared test.
We have $|\Theta_1| = rc - 1$, because under $H_1$ the $p_{ij}$'s sum to one. Also, $|\Theta_0| = (r - 1) + (c - 1)$ because $p_{1+}, \cdots, p_{r+}$ must satisfy $\sum_i p_{i+} = 1$ and $p_{+1}, \cdots, p_{+c}$ must satisfy $\sum_j p_{+j} = 1$. So
\[
|\Theta_1| - |\Theta_0| = rc - 1 - (r - 1) - (c - 1) = (r - 1)(c - 1).
\]
Example. In our previous example, we wish to test $H_0$: the new and previous car sizes are independent. The actual data is:

                          New car
                   Large  Medium  Small  Total
  Previous Large      56      52     42    150
  car      Medium     50      83     67    200
           Small      18      51     81    150
           Total     124     186    190    500

while the expected values given by $H_0$ are:

                          New car
                   Large  Medium  Small  Total
  Previous Large    37.2    55.8   57.0    150
  car      Medium   49.6    74.4   76.0    200
           Small    37.2    55.8   57.0    150
           Total     124     186    190    500

Note the margins are the same. It is quite clear that they do not match well, but we can find the $p$-value to be sure.
\[
\sum_i\sum_j \frac{(o_{ij} - e_{ij})^2}{e_{ij}} = 36.20,
\]
and the degrees of freedom are $(3 - 1)(3 - 1) = 4$.
From the tables, $\chi^2_4(0.05) = 9.488$ and $\chi^2_4(0.01) = 13.28$.
So our observed value of 36.20 is significant at the 1% level, i.e. there is strong evidence against $H_0$. So we conclude that the new and present car sizes are not independent.
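The whole calculation for this table can be reproduced in a few lines (a sketch; the 1% point $\chi^2_4(0.01) = 13.28$ is taken from standard tables):

```python
# Observed counts: rows = previous car size, columns = new car size.
observed = [[56, 52, 42],
            [50, 83, 67],
            [18, 51, 81]]

n = sum(sum(row) for row in observed)               # 500
row_totals = [sum(row) for row in observed]         # n_{i+}
col_totals = [sum(col) for col in zip(*observed)]   # n_{+j}

# Expected counts under independence: e_ij = n_{i+} * n_{+j} / n.
expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]

pearson = sum((o - e) ** 2 / e
              for orow, erow in zip(observed, expected)
              for o, e in zip(orow, erow))

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(pearson, 2), df)        # the 36.20 statistic with 4 degrees of freedom
print(pearson > 13.28)              # significant at the 1% level
```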