2.3 Tests of goodness-of-fit and independence
2.3.1 Goodness-of-fit of a fully-specified null distribution
So far, we have considered relatively simple cases where we are attempting to figure out, say, the mean. However, in reality, more complicated scenarios arise. For example, we might want to know if a die is fair, i.e. if the probability of getting each number is exactly $\frac{1}{6}$. Our null hypothesis would be that $p_1 = p_2 = \cdots = p_6 = \frac{1}{6}$, while the alternative hypothesis allows any possible values of the $p_i$.
In general, suppose the observation space $\mathcal{X}$ is partitioned into $k$ sets, and let $p_i$ be the probability that an observation is in set $i$, for $i = 1, \cdots, k$. We want to test $H_0$: "the $p_i$'s arise from a fully specified model" against $H_1$: "the $p_i$'s are unrestricted (apart from the obvious $p_i \geq 0$, $\sum p_i = 1$)".
Example. The following table lists the birth months of admissions to Oxford
and Cambridge in 2012.
Sep Oct Nov Dec Jan Feb Mar Apr May Jun Jul Aug
470 515 470 457 473 381 466 457 437 396 384 394
Is this compatible with a uniform distribution over the year?
Out of $n$ independent observations, let $N_i$ be the number of observations in the $i$th set. So $(N_1, \cdots, N_k) \sim \text{multinomial}(k;\, p_1, \cdots, p_k)$.
For a generalized likelihood ratio test of $H_0$, we need to find the maximised likelihood under $H_0$ and $H_1$.
Under $H_1$, $\operatorname{like}(p_1, \cdots, p_k) \propto p_1^{n_1} \cdots p_k^{n_k}$. So the log likelihood is $l = \text{constant} + \sum n_i \log p_i$. We want to maximise this subject to $\sum p_i = 1$. Using a Lagrange multiplier, we find that the mle is $\hat{p}_i = n_i/n$. Also $|\Theta_1| = k - 1$ (not $k$, since the $p_i$ must sum to 1).
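(For completeness, the Lagrangian calculation is the standard one: maximising $\sum n_i \log p_i - \lambda\left(\sum p_i - 1\right)$ over the $p_i$ gives $\frac{n_i}{p_i} - \lambda = 0$, i.e. $p_i = n_i/\lambda$; summing over $i$ and using $\sum p_i = 1$ forces $\lambda = \sum n_i = n$, hence $\hat{p}_i = n_i/n$.)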
Under $H_0$, the values of $p_i$ are specified completely, say $p_i = \tilde{p}_i$. So $|\Theta_0| = 0$.
Using our formula for $\hat{p}_i$, we find that
\[
2 \log \Lambda = 2 \log \left( \frac{\hat{p}_1^{\,n_1} \cdots \hat{p}_k^{\,n_k}}{\tilde{p}_1^{\,n_1} \cdots \tilde{p}_k^{\,n_k}} \right) = 2 \sum n_i \log\left( \frac{n_i}{n \tilde{p}_i} \right). \tag{1}
\]
Here $|\Theta_1| - |\Theta_0| = k - 1$. So we reject $H_0$ if $2 \log \Lambda > \chi^2_{k-1}(\alpha)$ for an approximate size $\alpha$ test.
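As an illustrative computational sketch (not part of the original notes; the function name and interface are my own), the statistic in (1) and its $\chi^2$ comparison can be coded directly using scipy:

```python
import numpy as np
from scipy.stats import chi2

def glrt_multinomial(counts, p0, alpha=0.05):
    """Generalized LR test of H0: p = p0 (fully specified) for multinomial counts.

    Returns the statistic 2 log Lambda = 2 * sum n_i log(n_i / (n p0_i)),
    the critical value chi^2_{k-1}(alpha), and the approximate p-value.
    """
    counts = np.asarray(counts, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    n = counts.sum()
    expected = n * p0
    nonzero = counts > 0          # cells with n_i = 0 contribute 0 (x log x -> 0)
    stat = 2.0 * np.sum(counts[nonzero] * np.log(counts[nonzero] / expected[nonzero]))
    df = counts.size - 1
    return stat, chi2.ppf(1.0 - alpha, df), chi2.sf(stat, df)
```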
Under $H_0$ (no effect of month of birth), $\tilde{p}_i$ is the proportion of births in month $i$ in 1993/1994 in the whole population. This is not simply proportional to the number of days in each month (or, even worse, $\frac{1}{12}$), as there is for example an excess of September births (the "Christmas effect"). Then
\[
2 \log \Lambda = 2 \sum n_i \log\left(\frac{n_i}{n \tilde{p}_i}\right) = 44.86.
\]
$P(\chi^2_{11} > 44.86) \approx 5 \times 10^{-6}$, which is our $p$-value. Since this is certainly less than 0.001, we can reject $H_0$ at the 0.1% level, or we can say the result is "significant at the 0.1% level".
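As a sanity check on this tail probability (an illustrative aside, not from the notes; the statistic 44.86 and the 11 degrees of freedom come from the example above, while the monthly $\tilde{p}_i$ themselves are not reproduced here):

```python
from scipy.stats import chi2

# Tail probability P(chi^2_11 > 44.86) for the birth-month example.
print(chi2.sf(44.86, df=11))   # ~5.1e-06
```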
The traditional levels for comparison are $\alpha = 0.05, 0.01, 0.001$, roughly corresponding to "evidence", "strong evidence" and "very strong evidence".
A similar common situation has $H_0$: $p_i = p_i(\theta)$ for some parameter $\theta$, and $H_1$ as before. Now $|\Theta_0|$ is the number of independent parameters to be estimated under $H_0$.
Under $H_0$, we find the mle $\hat{\theta}$ by maximising $\sum n_i \log p_i(\theta)$, and then
\[
2 \log \Lambda = 2 \log \left( \frac{\hat{p}_1^{\,n_1} \cdots \hat{p}_k^{\,n_k}}{p_1(\hat{\theta})^{n_1} \cdots p_k(\hat{\theta})^{n_k}} \right) = 2 \sum n_i \log\left( \frac{n_i}{n p_i(\hat{\theta})} \right). \tag{2}
\]
The degrees of freedom are $k - 1 - |\Theta_0|$.
2.3.2 Pearson’s chi-squared test
Notice that the two log likelihood ratio statistics (1) and (2) are of the same form. In general, let $o_i = n_i$ (the observed number) and let $e_i = n\tilde{p}_i$ or $n p_i(\hat{\theta})$ (the expected number). Let $\delta_i = o_i - e_i$.
Then
\[
\begin{aligned}
2 \log \Lambda &= 2 \sum o_i \log\left(\frac{o_i}{e_i}\right) \\
&= 2 \sum (e_i + \delta_i) \log\left(1 + \frac{\delta_i}{e_i}\right) \\
&= 2 \sum (e_i + \delta_i)\left(\frac{\delta_i}{e_i} - \frac{\delta_i^2}{2e_i^2} + O(\delta_i^3)\right) \\
&= 2 \sum \left(\delta_i + \frac{\delta_i^2}{e_i} - \frac{\delta_i^2}{2e_i} + O(\delta_i^3)\right).
\end{aligned}
\]
We know that $\sum \delta_i = 0$ since $\sum e_i = \sum o_i$. So
\[
2 \log \Lambda \approx \sum \frac{\delta_i^2}{e_i} = \sum \frac{(o_i - e_i)^2}{e_i}.
\]
This is known as Pearson's chi-squared test.
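As a small illustrative sketch (not from the notes; the function names are my own), the exact statistic and Pearson's approximation can be compared for any observed and expected counts:

```python
import numpy as np

def exact_2loglambda(observed, expected):
    """Exact GLRT statistic: 2 * sum o_i * log(o_i / e_i)."""
    o = np.asarray(observed, dtype=float)
    e = np.asarray(expected, dtype=float)
    mask = o > 0                      # o_i = 0 terms contribute 0
    return 2.0 * np.sum(o[mask] * np.log(o[mask] / e[mask]))

def pearson_chi2(observed, expected):
    """Pearson's approximation: sum (o_i - e_i)^2 / e_i."""
    o = np.asarray(observed, dtype=float)
    e = np.asarray(expected, dtype=float)
    return np.sum((o - e) ** 2 / e)
```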
Example. Mendel crossed 556 smooth yellow male peas with wrinkled green peas. From the progeny, let
(i) $N_1$ be the number of smooth yellow peas,
(ii) $N_2$ be the number of smooth green peas,
(iii) $N_3$ be the number of wrinkled yellow peas,
(iv) $N_4$ be the number of wrinkled green peas.
We wish to test the goodness of fit of the model
\[
H_0: (p_1, p_2, p_3, p_4) = \left(\tfrac{9}{16}, \tfrac{3}{16}, \tfrac{3}{16}, \tfrac{1}{16}\right).
\]
Suppose we observe $(n_1, n_2, n_3, n_4) = (315, 108, 102, 31)$.
We find $(e_1, e_2, e_3, e_4) = (312.75, 104.25, 104.25, 34.75)$. The actual value is $2 \log \Lambda = 0.618$, and the approximation we derived gives $\sum (o_i - e_i)^2 / e_i = 0.604$.
Here $|\Theta_0| = 0$ and $|\Theta_1| = 4 - 1 = 3$. So we refer our test statistic to $\chi^2_3(\alpha)$.
Since $\chi^2_3(0.05) = 7.815$, we see that neither value is significant at 5%. So there is no evidence against Mendel's theory. In fact, the $p$-value is approximately $P(\chi^2_3 > 0.6) \approx 0.90$. This is a really good fit, so good that people suspect the numbers were not genuine.
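A quick numerical check of these figures (an illustrative aside, using only the observed counts and the $H_0$ probabilities above):

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([315, 108, 102, 31])
expected = observed.sum() * np.array([9, 3, 3, 1]) / 16   # (312.75, 104.25, 104.25, 34.75)

glrt = 2 * np.sum(observed * np.log(observed / expected)) # ~0.618
pearson = np.sum((observed - expected) ** 2 / expected)   # ~0.604
print(glrt, pearson, chi2.sf(glrt, df=3))                 # p-value ~0.9
```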
Example. In a genetics problem, each individual has one of the three possible genotypes, with probabilities $p_1, p_2, p_3$. Suppose we wish to test $H_0$: $p_i = p_i(\theta)$, where
\[
p_1(\theta) = \theta^2, \quad p_2(\theta) = 2\theta(1 - \theta), \quad p_3(\theta) = (1 - \theta)^2,
\]
for some $\theta \in (0, 1)$.
We observe $N_i = n_i$. Under $H_0$, the mle $\hat{\theta}$ is found by maximising
\[
\sum n_i \log p_i(\theta) = 2n_1 \log \theta + n_2 \log(2\theta(1 - \theta)) + 2n_3 \log(1 - \theta).
\]
We find that $\hat{\theta} = \frac{2n_1 + n_2}{2n}$. Also, $|\Theta_0| = 1$ and $|\Theta_1| = 2$.
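(To see this: up to a constant the log likelihood is $(2n_1 + n_2)\log\theta + (n_2 + 2n_3)\log(1 - \theta)$; setting the derivative $\frac{2n_1 + n_2}{\theta} - \frac{n_2 + 2n_3}{1 - \theta}$ to zero and using $n_1 + n_2 + n_3 = n$ gives $2n_1 + n_2 = 2n\hat{\theta}$, i.e. $\hat{\theta} = \frac{2n_1 + n_2}{2n}$.)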
After conducting an experiment, we can substitute $p_i(\hat{\theta})$ into (2), or find the corresponding Pearson's chi-squared statistic, and refer to $\chi^2_1$.
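As an end-to-end sketch of this procedure (the counts below are hypothetical, invented purely to show the mechanics):

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical genotype counts, purely for illustration.
n1, n2, n3 = 40, 100, 60
n = n1 + n2 + n3

theta = (2 * n1 + n2) / (2 * n)                 # mle under H0
p = np.array([theta**2, 2 * theta * (1 - theta), (1 - theta)**2])
o = np.array([n1, n2, n3])
e = n * p                                       # expected counts under H0

stat = 2 * np.sum(o * np.log(o / e))            # statistic (2)
print(stat, chi2.sf(stat, df=1))                # refer to chi^2_1
```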
2.3.3 Testing independence in contingency tables
Definition (Contingency table). A contingency table is a table in which observations or individuals are classified according to one or more criteria.
Example. 500 people with recent car changes were asked about their previous
and new cars. The results are as follows:
                        New car
                Large   Medium   Small
Previous  Large    56       52      42
car       Medium   50       83      67
          Small    18       51      81
This is a two-way contingency table: each person is classified according to the previous car size and the new car size.
Consider a two-way contingency table with $r$ rows and $c$ columns. For $i = 1, \cdots, r$ and $j = 1, \cdots, c$, let $p_{ij}$ be the probability that an individual selected from the population under consideration is classified in row $i$ and column $j$ (i.e. in the $(i, j)$ cell of the table).
Let $p_{i+} = P(\text{in row } i)$ and $p_{+j} = P(\text{in column } j)$. Then we must have $p_{++} = \sum_i \sum_j p_{ij} = 1$.
Suppose a random sample of $n$ individuals is taken, and let $n_{ij}$ be the number of these classified in the $(i, j)$ cell of the table.
Let $n_{i+} = \sum_j n_{ij}$ and $n_{+j} = \sum_i n_{ij}$. So $n_{++} = n$.
We have
\[
(N_{11}, \cdots, N_{1c}, N_{21}, \cdots, N_{rc}) \sim \text{multinomial}(rc;\, p_{11}, \cdots, p_{1c}, p_{21}, \cdots, p_{rc}).
\]
We may be interested in testing the null hypothesis that the two classifications are independent. So we test

$H_0$: $p_{ij} = p_{i+} p_{+j}$ for all $i, j$, i.e. independence of columns and rows;

$H_1$: the $p_{ij}$ are unrestricted.

Of course, we have the usual restrictions $p_{++} = 1$ and $p_{ij} \geq 0$.
Under $H_1$, the mles are $\hat{p}_{ij} = \frac{n_{ij}}{n}$.

Under $H_0$, the mles are $\hat{p}_{i+} = \frac{n_{i+}}{n}$ and $\hat{p}_{+j} = \frac{n_{+j}}{n}$.
Write $o_{ij} = n_{ij}$ and $e_{ij} = n \hat{p}_{i+} \hat{p}_{+j} = n_{i+} n_{+j} / n$.
Then
\[
2 \log \Lambda = 2 \sum_{i=1}^{r} \sum_{j=1}^{c} o_{ij} \log\left(\frac{o_{ij}}{e_{ij}}\right) \approx \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}},
\]
using the same approximation steps as for Pearson's chi-squared test.
We have $|\Theta_1| = rc - 1$, because under $H_1$ the $p_{ij}$'s must sum to one. Also, $|\Theta_0| = (r - 1) + (c - 1)$, because $p_{1+}, \cdots, p_{r+}$ must satisfy $\sum_i p_{i+} = 1$, and $p_{+1}, \cdots, p_{+c}$ must satisfy $\sum_j p_{+j} = 1$. So
\[
|\Theta_1| - |\Theta_0| = rc - 1 - (r - 1) - (c - 1) = (r - 1)(c - 1).
\]
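A compact computational sketch of this test (illustrative; the function name is my own, and scipy's built-in scipy.stats.chi2_contingency performs essentially the same calculation):

```python
import numpy as np
from scipy.stats import chi2

def independence_test(table):
    """Pearson chi-squared test of independence for an r x c contingency table."""
    o = np.asarray(table, dtype=float)
    n = o.sum()
    # Expected counts under H0: e_ij = n_{i+} n_{+j} / n.
    e = np.outer(o.sum(axis=1), o.sum(axis=0)) / n
    stat = np.sum((o - e) ** 2 / e)
    df = (o.shape[0] - 1) * (o.shape[1] - 1)   # (r-1)(c-1)
    return stat, df, chi2.sf(stat, df)
```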
Example. In our previous example, we wish to test $H_0$: the new and previous car sizes are independent. The actual data, with marginal totals, is:

                        New car
                Large   Medium   Small   Total
Previous  Large    56       52      42     150
car       Medium   50       83      67     200
          Small    18       51      81     150
Total             124      186     190     500
while the expected values given by $H_0$ are:

                        New car
                Large   Medium   Small   Total
Previous  Large  37.2     55.8    57.0     150
car       Medium 49.6     74.4    76.0     200
          Small  37.2     55.8    57.0     150
Total             124      186     190     500
Note that the margins are the same. It is quite clear that the observed and expected values do not match well, but we can find the $p$-value to be sure.
\[
\sum_i \sum_j \frac{(o_{ij} - e_{ij})^2}{e_{ij}} = 36.20,
\]
and the degrees of freedom are $(3 - 1)(3 - 1) = 4$.
From the tables, $\chi^2_4(0.05) = 9.488$ and $\chi^2_4(0.01) = 13.28$.
So our observed value of 36.20 is significant at the 1% level, i.e. there is strong evidence against $H_0$. We conclude that the new and previous car sizes are not independent.
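Feeding the observed table into the independence_test sketch above reproduces these numbers:

```python
table = [[56, 52, 42],
         [50, 83, 67],
         [18, 51, 81]]

stat, df, p = independence_test(table)
print(stat, df, p)   # ~36.20, 4, p ~ 2.6e-07
```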