3 Linear models

IB Statistics

3.8 Hypothesis testing

3.8.1 Hypothesis testing

In hypothesis testing, we want to know whether certain variables influence the result. If, say, the variable $x_1$ does not influence $Y$, then we must have $\beta_1 = 0$. So the goal is to test the hypothesis $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$. We will tackle a more general case, where $\beta$ can be split into two vectors $\beta_0$ and $\beta_1$, and we test whether $\beta_1$ is zero.

We start with an obscure lemma, which might seem pointless at first, but

will prove itself useful very soon.

Lemma. Suppose $Z \sim N_n(0, \sigma^2 I_n)$, and $A_1$ and $A_2$ are symmetric, idempotent $n \times n$ matrices with $A_1 A_2 = 0$ (i.e. they are orthogonal). Then $Z^T A_1 Z$ and $Z^T A_2 Z$ are independent.

This is geometrically intuitive, because $A_1$ and $A_2$ being orthogonal means they are concerned with different parts of the vector $Z$.

Proof. Let $X_i = A_i Z$ for $i = 1, 2$, and
$$W = \begin{pmatrix} W_1 \\ W_2 \end{pmatrix} = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix} Z.$$
Then
$$W \sim N_{2n}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},\ \sigma^2 \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix} \right),$$
since the off-diagonal blocks are $\sigma^2 A_1^T A_2 = \sigma^2 A_1 A_2 = 0$. So $W_1$ and $W_2$ are independent, which implies that
$$W_1^T W_1 = Z^T A_1^T A_1 Z = Z^T A_1 A_1 Z = Z^T A_1 Z$$
and
$$W_2^T W_2 = Z^T A_2^T A_2 Z = Z^T A_2 A_2 Z = Z^T A_2 Z$$
are independent.
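The hypotheses of the lemma can be checked numerically for the two matrices it will later be applied to, $I_n - P$ and $P - P_0$. This is only an illustrative sketch: the design matrices below are made-up random data, not anything from the course.

```python
import numpy as np

# Hypothetical design: X0 is an intercept column, X adds one more column.
# P and P0 are the usual projection ("hat") matrices onto their column spaces.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(6), rng.normal(size=6)])
X0 = X[:, :1]

P = X @ np.linalg.inv(X.T @ X) @ X.T        # projection onto col(X)
P0 = X0 @ np.linalg.inv(X0.T @ X0) @ X0.T   # projection onto col(X0)

A1 = np.eye(6) - P   # residual projection
A2 = P - P0          # "extra fit" projection

for A in (A1, A2):
    assert np.allclose(A, A.T)       # symmetric
    assert np.allclose(A @ A, A)     # idempotent
assert np.allclose(A1 @ A2, 0)       # orthogonal: A1 A2 = 0
print("lemma's hypotheses hold for I - P and P - P0")
```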

Now we go to hypothesis testing in general linear models:

Suppose
$$\underset{n \times p}{X} = \begin{pmatrix} \underset{n \times p_0}{X_0} & \underset{n \times (p - p_0)}{X_1} \end{pmatrix} \quad\text{and}\quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix},$$
where $\operatorname{rank}(X) = p$ and $\operatorname{rank}(X_0) = p_0$.

We want to test $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$. Under $H_0$, the term $X_1 \beta_1$ vanishes and
$$Y = X_0 \beta_0 + \varepsilon.$$

Under $H_0$, the maximum likelihood estimators of $\beta_0$ and $\sigma^2$ are
$$\hat{\hat\beta}_0 = (X_0^T X_0)^{-1} X_0^T Y, \qquad \hat{\hat\sigma}^2 = \frac{\mathrm{RSS}_0}{n} = \frac{1}{n}\left(Y - X_0 \hat{\hat\beta}_0\right)^T \left(Y - X_0 \hat{\hat\beta}_0\right),$$
and we have previously shown these are independent.

Note that our poor estimators wear two hats instead of one. We adopt the

convention that the estimators of the null hypothesis have two hats, while those

of the alternative hypothesis have one.

So the fitted values under $H_0$ are
$$\hat{\hat Y} = X_0 (X_0^T X_0)^{-1} X_0^T Y = P_0 Y,$$
where $P_0 = X_0 (X_0^T X_0)^{-1} X_0^T$.

The generalized likelihood ratio test of $H_0$ against $H_1$ is
$$\Lambda_Y(H_0, H_1) = \frac{\left(\frac{1}{\sqrt{2\pi\hat\sigma^2}}\right)^n \exp\left(-\frac{1}{2\hat\sigma^2}(Y - X\hat\beta)^T (Y - X\hat\beta)\right)}{\left(\frac{1}{\sqrt{2\pi\hat{\hat\sigma}^2}}\right)^n \exp\left(-\frac{1}{2\hat{\hat\sigma}^2}(Y - X_0\hat{\hat\beta}_0)^T (Y - X_0\hat{\hat\beta}_0)\right)} = \left(\frac{\hat{\hat\sigma}^2}{\hat\sigma^2}\right)^{n/2} = \left(\frac{\mathrm{RSS}_0}{\mathrm{RSS}}\right)^{n/2} = \left(1 + \frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}}\right)^{n/2}.$$

We reject $H_0$ when $2\log\Lambda$ is large, or equivalently when $\frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}}$ is large.

Using the results in Lecture 8, under $H_0$ we have
$$2\log\Lambda = n\log\left(1 + \frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}}\right),$$
which is approximately a $\chi^2_{p - p_0}$ random variable.

This is a good approximation, but we can in fact obtain an exact null distribution, and hence an exact test.

We have previously shown that $\mathrm{RSS} = Y^T(I_n - P)Y$, and so
$$\mathrm{RSS}_0 - \mathrm{RSS} = Y^T(I_n - P_0)Y - Y^T(I_n - P)Y = Y^T(P - P_0)Y.$$

Now both $I_n - P$ and $P - P_0$ are symmetric and idempotent, and therefore
$$\operatorname{rank}(I_n - P) = n - p$$
and
$$\operatorname{rank}(P - P_0) = \operatorname{tr}(P - P_0) = \operatorname{tr}(P) - \operatorname{tr}(P_0) = \operatorname{rank}(P) - \operatorname{rank}(P_0) = p - p_0.$$

Also,
$$(I_n - P)(P - P_0) = (I_n - P)P - (I_n - P)P_0 = (P - P^2) - (P_0 - PP_0) = 0$$
(we have $P^2 = P$ by idempotence, and $PP_0 = P_0$ since after projecting with $P_0$, we are already in the space of $P$, and applying $P$ has no effect).

Finally,
$$Y^T(I_n - P)Y = (Y - X_0\beta_0)^T(I_n - P)(Y - X_0\beta_0),$$
$$Y^T(P - P_0)Y = (Y - X_0\beta_0)^T(P - P_0)(Y - X_0\beta_0),$$
since $(I_n - P)X_0 = (P - P_0)X_0 = 0$.

If we let $Z = Y - X_0\beta_0$, $A_1 = I_n - P$ and $A_2 = P - P_0$, then we can apply our previous lemma, together with the fact that $Z^T A_i Z \sim \sigma^2\chi^2_r$ with $r = \operatorname{rank}(A_i)$, to get
$$\mathrm{RSS} = Y^T(I_n - P)Y \sim \sigma^2\chi^2_{n-p},$$
$$\mathrm{RSS}_0 - \mathrm{RSS} = Y^T(P - P_0)Y \sim \sigma^2\chi^2_{p-p_0},$$
and these random variables are independent.

So under $H_0$,
$$F = \frac{Y^T(P - P_0)Y/(p - p_0)}{Y^T(I_n - P)Y/(n - p)} = \frac{(\mathrm{RSS}_0 - \mathrm{RSS})/(p - p_0)}{\mathrm{RSS}/(n - p)} \sim F_{p - p_0,\, n - p}.$$

Hence we reject $H_0$ if $F > F_{p - p_0,\, n - p}(\alpha)$.

$\mathrm{RSS}_0 - \mathrm{RSS}$ is the reduction in the sum of squares due to fitting $\beta_1$ in addition to $\beta_0$.
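The F test above can be sketched in a few lines of code. This is a minimal illustration, not a course implementation: the design matrices and response below are made-up data with $H_0$ true.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p0, p = 20, 2, 4
X0 = np.column_stack([np.ones(n), rng.normal(size=n)])  # null design (n x p0)
X1 = rng.normal(size=(n, p - p0))                        # extra columns
X = np.hstack([X0, X1])                                  # full design (n x p)
Y = X0 @ np.array([1.0, 2.0]) + rng.normal(size=n)       # beta_1 = 0 here

def rss(M, y):
    # residual sum of squares after projecting y onto col(M)
    beta = np.linalg.solve(M.T @ M, M.T @ y)
    r = y - M @ beta
    return r @ r

RSS0, RSS = rss(X0, Y), rss(X, Y)
F = ((RSS0 - RSS) / (p - p0)) / (RSS / (n - p))
# Under H0, F ~ F_{p - p0, n - p}; reject H0 when F exceeds the critical value.
print(F)
```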

Source of var.   d.f.        sum of squares                    mean squares                                      F statistic
Fitted model     $p - p_0$   $\mathrm{RSS}_0 - \mathrm{RSS}$   $\frac{\mathrm{RSS}_0 - \mathrm{RSS}}{p - p_0}$   $\frac{(\mathrm{RSS}_0 - \mathrm{RSS})/(p - p_0)}{\mathrm{RSS}/(n - p)}$
Residual         $n - p$     $\mathrm{RSS}$                    $\frac{\mathrm{RSS}}{n - p}$
Total            $n - p_0$   $\mathrm{RSS}_0$

The ratio $\frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}_0}$ is sometimes known as the proportion of variance explained by $\beta_1$, and denoted $R^2$.

3.8.2 Simple linear regression

We assume that
$$Y_i = a' + b(x_i - \bar x) + \varepsilon_i,$$
where $\bar x = \sum x_i / n$ and the $\varepsilon_i$ are i.i.d. $N(0, \sigma^2)$.

Suppose we want to test the hypothesis $H_0: b = 0$, i.e. no linear relationship. We have previously seen how to construct a confidence interval for $b$, and so we could simply see if it includes 0.

Alternatively, under $H_0$, the model is $Y_i \sim N(a', \sigma^2)$, and so $\hat a' = \bar Y$, and the fitted values are $\hat Y_i = \bar Y$.

The observed $\mathrm{RSS}_0$ is therefore
$$\mathrm{RSS}_0 = \sum_i (y_i - \bar y)^2 = S_{yy}.$$

The fitted sum of squares is therefore
$$\mathrm{RSS}_0 - \mathrm{RSS} = \sum_i \left[(y_i - \bar y)^2 - \left(y_i - \bar y - \hat b(x_i - \bar x)\right)^2\right] = \hat b^2 \sum_i (x_i - \bar x)^2 = \hat b^2 S_{xx}.$$
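The identity $\mathrm{RSS}_0 - \mathrm{RSS} = \hat b^2 S_{xx}$ is easy to check numerically. A minimal sketch, with made-up data points:

```python
# Check RSS0 - RSS = b_hat^2 * Sxx for simple linear regression.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
Sxx = sum((x - xbar) ** 2 for x in xs)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
Syy = sum((y - ybar) ** 2 for y in ys)

b_hat = Sxy / Sxx          # least squares slope
RSS0 = Syy                 # null model: all fitted values equal ybar
RSS = sum((y - ybar - b_hat * (x - xbar)) ** 2 for x, y in zip(xs, ys))

assert abs((RSS0 - RSS) - b_hat ** 2 * Sxx) < 1e-9
print(RSS0 - RSS, b_hat ** 2 * Sxx)
```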

Source of var.   d.f.      sum of squares                                     mean squares         F statistic
Fitted model     $1$       $\mathrm{RSS}_0 - \mathrm{RSS} = \hat b^2 S_{xx}$  $\hat b^2 S_{xx}$    $F = \frac{\hat b^2 S_{xx}}{\tilde\sigma^2}$
Residual         $n - 2$   $\mathrm{RSS} = \sum_i (y_i - \hat y_i)^2$         $\tilde\sigma^2$
Total            $n - 1$   $\mathrm{RSS}_0 = \sum_i (y_i - \bar y)^2$

Note that the proportion of variance explained is
$$\hat b^2 S_{xx} / S_{yy} = \frac{S_{xy}^2}{S_{xx} S_{yy}} = r^2,$$

where $r$ is Pearson's product-moment correlation coefficient,
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}.$$

We have previously seen that under $H_0$,
$$\frac{\hat b}{\mathrm{SE}(\hat b)} \sim t_{n-2}, \quad\text{where}\quad \mathrm{SE}(\hat b) = \tilde\sigma/\sqrt{S_{xx}}.$$

So we let
$$t = \frac{\hat b}{\mathrm{SE}(\hat b)} = \frac{\hat b \sqrt{S_{xx}}}{\tilde\sigma}.$$

Checking whether $|t| > t_{n-2}\left(\frac{\alpha}{2}\right)$ is precisely the same as checking whether $t^2 = F > F_{1, n-2}(\alpha)$, since an $F_{1, n-2}$ variable is $t_{n-2}^2$.

Hence the same conclusion is reached regardless of whether we use the $t$-distribution or the $F$ statistic derived from an analysis of variance table.
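The equivalence $t^2 = F$ can be verified directly. A minimal sketch on made-up regression data:

```python
# Verify that the t statistic squared equals the ANOVA F statistic.
import math

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
Sxx = sum((x - xbar) ** 2 for x in xs)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b_hat = Sxy / Sxx

RSS = sum((y - ybar - b_hat * (x - xbar)) ** 2 for x, y in zip(xs, ys))
sigma2_tilde = RSS / (n - 2)   # residual mean square

F = b_hat ** 2 * Sxx / sigma2_tilde
t = b_hat * math.sqrt(Sxx) / math.sqrt(sigma2_tilde)
assert abs(t ** 2 - F) < 1e-9
print(t, F)
```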

3.8.3 One way analysis of variance with equal numbers in each group

Recall that in our wafer example, we made measurements in groups, and want to know if there is a difference between groups. In general, suppose $J$ measurements are taken in each of $I$ groups, and that
$$Y_{ij} = \mu_i + \varepsilon_{ij},$$
where the $\varepsilon_{ij}$ are independent $N(0, \sigma^2)$ random variables, and the $\mu_i$ are unknown constants.

Fitting this model gives
$$\mathrm{RSS} = \sum_{i=1}^{I}\sum_{j=1}^{J}\left(Y_{ij} - \hat\mu_i\right)^2 = \sum_{i=1}^{I}\sum_{j=1}^{J}\left(Y_{ij} - \bar Y_{i.}\right)^2$$
on $n - I$ degrees of freedom, where $n = IJ$.

Suppose we want to test the hypothesis $H_0: \mu_i = \mu$ for all $i$, i.e. no difference between groups.

Under $H_0$, the model is $Y_{ij} \sim N(\mu, \sigma^2)$, and so $\hat\mu = \bar Y$, and the fitted values are $\hat Y_{ij} = \bar Y$.

The observed $\mathrm{RSS}_0$ is therefore
$$\mathrm{RSS}_0 = \sum_{i,j}\left(y_{ij} - \bar y_{..}\right)^2.$$

The fitted sum of squares is therefore
$$\mathrm{RSS}_0 - \mathrm{RSS} = \sum_i \sum_j \left[\left(y_{ij} - \bar y_{..}\right)^2 - \left(y_{ij} - \bar y_{i.}\right)^2\right] = J\sum_i \left(\bar y_{i.} - \bar y_{..}\right)^2.$$

Source of var.   d.f.      sum of squares                             mean squares                                            F statistic
Fitted model     $I - 1$   $J\sum_i (\bar y_{i.} - \bar y_{..})^2$    $\frac{J\sum_i (\bar y_{i.} - \bar y_{..})^2}{I - 1}$   $\frac{J\sum_i (\bar y_{i.} - \bar y_{..})^2}{(I - 1)\tilde\sigma^2}$
Residual         $n - I$   $\sum_i \sum_j (y_{ij} - \bar y_{i.})^2$   $\tilde\sigma^2$
Total            $n - 1$   $\sum_i \sum_j (y_{ij} - \bar y_{..})^2$
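The computation in the table can be sketched as follows. The group measurements below are made up for illustration; they are not the wafer data.

```python
# One-way ANOVA F statistic for I groups of J measurements each.
groups = [
    [4.1, 3.9, 4.3],   # group 1
    [5.0, 5.2, 4.8],   # group 2
    [4.0, 4.2, 3.8],   # group 3
]
I, J = len(groups), len(groups[0])
n = I * J
grand = sum(sum(g) for g in groups) / n      # grand mean y_bar_..
means = [sum(g) / J for g in groups]         # group means y_bar_i.

# Between-groups sum of squares (RSS0 - RSS) and within-groups (RSS).
between = J * sum((m - grand) ** 2 for m in means)
within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

F = (between / (I - 1)) / (within / (n - I))
print(F)   # compare with the F_{I-1, n-I}(alpha) critical value
```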