3.8 Hypothesis testing
3.8.1 Hypothesis testing
In hypothesis testing, we want to know whether certain variables influence the result. If, say, the variable $x_1$ does not influence $Y$, then we must have $\beta_1 = 0$. So the goal is to test the hypothesis $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$. We will tackle a more general case, where $\beta$ can be split into two vectors $\beta_0$ and $\beta_1$, and we test if $\beta_1$ is zero.

We start with an obscure lemma, which might seem pointless at first, but will prove itself useful very soon.
Lemma. Suppose $Z \sim N_n(0, \sigma^2 I_n)$, and $A_1$ and $A_2$ are symmetric, idempotent $n \times n$ matrices with $A_1 A_2 = 0$ (i.e. they are orthogonal). Then $Z^T A_1 Z$ and $Z^T A_2 Z$ are independent.

This is geometrically intuitive, because $A_1$ and $A_2$ being orthogonal means they are concerned with different parts of the vector $Z$.
Proof. Let $W_i = A_i Z$ for $i = 1, 2$, and write
\[
  W = \begin{pmatrix} W_1 \\ W_2 \end{pmatrix} = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix} Z.
\]
Then
\[
  W \sim N_{2n}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \; \sigma^2 \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix} \right),
\]
since the off-diagonal blocks are $\sigma^2 A_1^T A_2 = \sigma^2 A_1 A_2 = 0$.

So $W_1$ and $W_2$ are independent, which implies that
\[
  W_1^T W_1 = Z^T A_1^T A_1 Z = Z^T A_1 A_1 Z = Z^T A_1 Z
\]
and
\[
  W_2^T W_2 = Z^T A_2^T A_2 Z = Z^T A_2 A_2 Z = Z^T A_2 Z
\]
are independent.
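Though not part of the notes, a quick simulation makes the lemma concrete. The sketch below (a minimal illustration, assuming numpy is available; the design matrix and all numbers are made up) takes $A_1 = P$ and $A_2 = I_n - P$ for a projection matrix $P$, so that both are symmetric, idempotent and satisfy $A_1 A_2 = 0$, and checks that the two quadratic forms are essentially uncorrelated with means $\sigma^2 \operatorname{rank}(A_i)$.

# Minimal simulation sketch of the lemma (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 10, 3, 2.0

X = rng.normal(size=(n, p))                        # arbitrary full-rank design
P = X @ np.linalg.inv(X.T @ X) @ X.T               # projection onto col(X)
A1, A2 = P, np.eye(n) - P                          # symmetric, idempotent, A1 A2 = 0

Z = rng.normal(scale=sigma, size=(5000, n))        # 5000 draws of Z ~ N_n(0, sigma^2 I_n)
q1 = np.einsum("ij,jk,ik->i", Z, A1, Z)            # Z^T A1 Z for each draw
q2 = np.einsum("ij,jk,ik->i", Z, A2, Z)            # Z^T A2 Z for each draw

print(np.corrcoef(q1, q2)[0, 1])                   # approximately 0
print(q1.mean() / sigma**2, q2.mean() / sigma**2)  # approximately rank(A1) = p and rank(A2) = n - p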
Now we go to hypothesis testing in general linear models:
Suppose
\[
  \underset{n \times p}{X} = \begin{pmatrix} \underset{n \times p_0}{X_0} & \underset{n \times (p - p_0)}{X_1} \end{pmatrix}
  \quad\text{and}\quad
  \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix},
\]
where $\operatorname{rank}(X) = p$ and $\operatorname{rank}(X_0) = p_0$.
We want to test $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$. Under $H_0$, the term $X_1 \beta_1$ vanishes and
\[
  Y = X_0 \beta_0 + \varepsilon.
\]
Under $H_0$, the mle of $\beta_0$ and $\sigma^2$ are
\[
  \hat{\hat\beta}_0 = (X_0^T X_0)^{-1} X_0^T Y, \qquad
  \hat{\hat\sigma}^2 = \frac{\mathrm{RSS}_0}{n} = \frac{1}{n}\,(Y - X_0 \hat{\hat\beta}_0)^T (Y - X_0 \hat{\hat\beta}_0),
\]
and we have previously shown these are independent.
Note that our poor estimators wear two hats instead of one. We adopt the
convention that the estimators of the null hypothesis have two hats, while those
of the alternative hypothesis have one.
So the fitted values under $H_0$ are
\[
  \hat{\hat Y} = X_0 (X_0^T X_0)^{-1} X_0^T Y = P_0 Y,
\]
where $P_0 = X_0 (X_0^T X_0)^{-1} X_0^T$.
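As a side remark, these fitted values are just an orthogonal projection of $Y$, so they can be computed either from the explicit formula for $P_0$ or by ordinary least squares. A minimal numpy sketch (not from the notes; the design matrix and response are invented purely for illustration):

# Sketch: fitted values under H0 via the projection matrix P0 (illustrative data).
import numpy as np

rng = np.random.default_rng(1)
n, p0 = 8, 2
X0 = rng.normal(size=(n, p0))                 # null-model design matrix
y = rng.normal(size=n)                        # response (arbitrary here)

P0 = X0 @ np.linalg.inv(X0.T @ X0) @ X0.T     # P0 = X0 (X0^T X0)^{-1} X0^T
fitted = P0 @ y                               # fitted values under H0

# P0 is symmetric and idempotent, and agrees with least squares:
assert np.allclose(P0, P0.T) and np.allclose(P0 @ P0, P0)
beta00, *_ = np.linalg.lstsq(X0, y, rcond=None)
assert np.allclose(fitted, X0 @ beta00)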
The generalized likelihood ratio test of $H_0$ against $H_1$ is
\[
  \Lambda_Y(H_0, H_1)
  = \frac{\left(\frac{1}{2\pi\hat\sigma^2}\right)^{n/2} \exp\left(-\frac{1}{2\hat\sigma^2}(Y - X\hat\beta)^T (Y - X\hat\beta)\right)}
         {\left(\frac{1}{2\pi\hat{\hat\sigma}^2}\right)^{n/2} \exp\left(-\frac{1}{2\hat{\hat\sigma}^2}(Y - X_0\hat{\hat\beta}_0)^T (Y - X_0\hat{\hat\beta}_0)\right)}
  = \left(\frac{\hat{\hat\sigma}^2}{\hat\sigma^2}\right)^{n/2}
  = \left(\frac{\mathrm{RSS}_0}{\mathrm{RSS}}\right)^{n/2}
  = \left(1 + \frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}}\right)^{n/2}.
\]
We reject $H_0$ when $2\log\Lambda$ is large, equivalently when $\frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}}$ is large.
Using the results in Lecture 8, under $H_0$ we have
\[
  2\log\Lambda = n\log\left(1 + \frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}}\right),
\]
which is approximately a $\chi^2_{p - p_0}$ random variable.
This is a good approximation, but we can in fact obtain the exact null distribution, and hence an exact test.
We have previously shown that $\mathrm{RSS} = Y^T (I_n - P) Y$, and so
\[
  \mathrm{RSS}_0 - \mathrm{RSS} = Y^T (I_n - P_0) Y - Y^T (I_n - P) Y = Y^T (P - P_0) Y.
\]
Now both $I_n - P$ and $P - P_0$ are symmetric and idempotent, and therefore
\[
  \operatorname{rank}(I_n - P) = n - p
\]
and
\[
  \operatorname{rank}(P - P_0) = \operatorname{tr}(P - P_0) = \operatorname{tr}(P) - \operatorname{tr}(P_0) = \operatorname{rank}(P) - \operatorname{rank}(P_0) = p - p_0.
\]
Also,
\[
  (I_n - P)(P - P_0) = (I_n - P)P - (I_n - P)P_0 = (P - P^2) - (P_0 - P P_0) = 0.
\]
(We have $P^2 = P$ by idempotence, and $P P_0 = P_0$: after projecting with $P_0$ we are already in the column space of $P$, so applying $P$ has no further effect.)
Finally,
\[
  Y^T (I_n - P) Y = (Y - X_0\beta_0)^T (I_n - P)(Y - X_0\beta_0),
\]
\[
  Y^T (P - P_0) Y = (Y - X_0\beta_0)^T (P - P_0)(Y - X_0\beta_0),
\]
since $(I_n - P)X_0 = (P - P_0)X_0 = 0$.
If we let $Z = Y - X_0\beta_0$, $A_1 = I_n - P$ and $A_2 = P - P_0$, and apply our previous lemma together with the fact that $Z^T A_i Z \sim \sigma^2 \chi^2_{r_i}$ with $r_i = \operatorname{rank}(A_i)$, then
\[
  \mathrm{RSS} = Y^T (I_n - P) Y \sim \sigma^2\chi^2_{n - p},
\]
\[
  \mathrm{RSS}_0 - \mathrm{RSS} = Y^T (P - P_0) Y \sim \sigma^2\chi^2_{p - p_0},
\]
and these random variables are independent.
So under $H_0$,
\[
  F = \frac{Y^T (P - P_0) Y / (p - p_0)}{Y^T (I_n - P) Y / (n - p)}
    = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}) / (p - p_0)}{\mathrm{RSS} / (n - p)}
    \sim F_{p - p_0,\, n - p}.
\]
Hence we reject $H_0$ if $F > F_{p - p_0,\, n - p}(\alpha)$.
$\mathrm{RSS}_0 - \mathrm{RSS}$ is the reduction in the sum of squares due to fitting $\beta_1$ in addition to $\beta_0$.
Source of var.   d.f.        sum of squares                     mean squares                                  F statistic
Fitted model     $p - p_0$   $\mathrm{RSS}_0 - \mathrm{RSS}$    $(\mathrm{RSS}_0 - \mathrm{RSS})/(p - p_0)$   $\frac{(\mathrm{RSS}_0 - \mathrm{RSS})/(p - p_0)}{\mathrm{RSS}/(n - p)}$
Residual         $n - p$     $\mathrm{RSS}$                     $\mathrm{RSS}/(n - p)$
Total            $n - p_0$   $\mathrm{RSS}_0$
The ratio $\frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}_0}$ is sometimes known as the proportion of variance explained by $\beta_1$, and denoted $R^2$.
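To make the procedure concrete, here is a rough numpy/scipy sketch of the nested-model F test (not from the notes; the design matrices, coefficients and the helper rss are invented for illustration). It fits the full and null models by least squares, forms the F statistic above, and reads the p-value off the $F_{p - p_0,\, n - p}$ distribution.

# Sketch of the nested-model F test (illustrative simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p0, p = 50, 2, 5

X0 = np.column_stack([np.ones(n), rng.normal(size=n)])   # null design (p0 columns)
X1 = rng.normal(size=(n, p - p0))                         # extra columns
X = np.column_stack([X0, X1])                             # full design (p columns)
y = X0 @ np.array([1.0, 2.0]) + rng.normal(size=n)        # data generated under H0

def rss(design, y):
    """Residual sum of squares after least-squares fitting."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return resid @ resid

RSS0, RSS = rss(X0, y), rss(X, y)
F = ((RSS0 - RSS) / (p - p0)) / (RSS / (n - p))
p_value = stats.f.sf(F, p - p0, n - p)            # P(F_{p - p0, n - p} > F)
R2 = (RSS0 - RSS) / RSS0                          # proportion of variance explained by beta_1

print(F, p_value, R2)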
3.8.2 Simple linear regression
We assume that
\[
  Y_i = a + b(x_i - \bar x) + \varepsilon_i,
\]
where $\bar x = \sum x_i / n$ and the $\varepsilon_i$ are $N(0, \sigma^2)$.
Suppose we want to test the hypothesis $H_0: b = 0$, i.e. there is no linear relationship. We have previously seen how to construct a confidence interval for $b$, and so we could simply check whether it includes 0.
Alternatively, under $H_0$, the model is $Y_i \sim N(a, \sigma^2)$, and so $\hat a = \bar Y$, and the fitted values are $\hat Y_i = \bar Y$.
The observed $\mathrm{RSS}_0$ is therefore
\[
  \mathrm{RSS}_0 = \sum_i (y_i - \bar y)^2 = S_{yy}.
\]
The fitted sum of squares is therefore
\[
  \mathrm{RSS}_0 - \mathrm{RSS} = \sum_i \left[ (y_i - \bar y)^2 - \bigl(y_i - \bar y - \hat b(x_i - \bar x)\bigr)^2 \right] = \hat b^2 \sum_i (x_i - \bar x)^2 = \hat b^2 S_{xx}.
\]
Source of var.   d.f.      sum of squares                                       mean squares         F statistic
Fitted model     $1$       $\mathrm{RSS}_0 - \mathrm{RSS} = \hat b^2 S_{xx}$    $\hat b^2 S_{xx}$    $F = \hat b^2 S_{xx}/\tilde\sigma^2$
Residual         $n - 2$   $\mathrm{RSS} = \sum_i (y_i - \hat y_i)^2$           $\tilde\sigma^2$
Total            $n - 1$   $\mathrm{RSS}_0 = \sum_i (y_i - \bar y)^2$
Note that the proportion of variance explained is
\[
  \hat b^2 S_{xx}/S_{yy} = \frac{S_{xy}^2}{S_{xx} S_{yy}} = r^2,
\]
where $r$ is Pearson's product-moment correlation coefficient
\[
  r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}.
\]
We have previously seen that under $H_0$, $\hat b / \mathrm{SE}(\hat b) \sim t_{n-2}$, where $\mathrm{SE}(\hat b) = \tilde\sigma/\sqrt{S_{xx}}$. So we let
\[
  t = \frac{\hat b}{\mathrm{SE}(\hat b)} = \frac{\hat b \sqrt{S_{xx}}}{\tilde\sigma}.
\]
Checking whether $|t| > t_{n-2}\left(\tfrac{\alpha}{2}\right)$ is precisely the same as checking whether $t^2 = F > F_{1, n-2}(\alpha)$, since an $F_{1, n-2}$ variable is $t_{n-2}^2$. Hence the same conclusion is reached, regardless of whether we use the $t$-distribution or the $F$ statistic derived from an analysis of variance table.
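The equivalence $t^2 = F$ is easy to check numerically. The sketch below (again with simulated data, assuming numpy and scipy are available; all values are illustrative) computes both statistics directly from $\hat b$, $S_{xx}$ and $\tilde\sigma^2$ and confirms that the two p-values coincide.

# Sketch: equivalence of the t test and the F test in simple linear regression.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 30
x = rng.normal(size=n)
y = 1.0 + 0.0 * x + rng.normal(size=n)            # data generated with b = 0

xc = x - x.mean()
Sxx = np.sum(xc**2)
b_hat = np.sum(xc * (y - y.mean())) / Sxx         # least-squares slope
RSS = np.sum((y - y.mean() - b_hat * xc) ** 2)
sigma_tilde2 = RSS / (n - 2)

t = b_hat * np.sqrt(Sxx) / np.sqrt(sigma_tilde2)  # t = b_hat / SE(b_hat)
F = b_hat**2 * Sxx / sigma_tilde2                 # F from the ANOVA table

assert np.isclose(t**2, F)
print(2 * stats.t.sf(abs(t), n - 2))              # two-sided t-test p-value
print(stats.f.sf(F, 1, n - 2))                    # F-test p-value (identical)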
3.8.3 One way analysis of variance with equal numbers in each group
Recall that in our wafer example, we made measurements in groups, and want to know if there is a difference between groups. In general, suppose $J$ measurements are taken in each of $I$ groups, and that
\[
  Y_{ij} = \mu_i + \varepsilon_{ij},
\]
where the $\varepsilon_{ij}$ are independent $N(0, \sigma^2)$ random variables, and the $\mu_i$ are unknown constants.
Fitting this model gives
\[
  \mathrm{RSS} = \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \hat\mu_i)^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \bar Y_{i.})^2
\]
on $n - I$ degrees of freedom, where $n = IJ$.
Suppose we want to test the hypothesis $H_0: \mu_i = \mu$ for all $i$, i.e. there is no difference between the groups.
Under $H_0$, the model is $Y_{ij} \sim N(\mu, \sigma^2)$, and so $\hat\mu = \bar Y$, and the fitted values are $\hat Y_{ij} = \bar Y$.
The observed $\mathrm{RSS}_0$ is therefore
\[
  \mathrm{RSS}_0 = \sum_{i,j} (y_{ij} - \bar y_{..})^2.
\]
The fitted sum of squares is therefore
\[
  \mathrm{RSS}_0 - \mathrm{RSS} = \sum_i \sum_j \left[ (y_{ij} - \bar y_{..})^2 - (y_{ij} - \bar y_{i.})^2 \right] = J \sum_i (\bar y_{i.} - \bar y_{..})^2.
\]
Source of var.   d.f.      sum of squares                              mean squares                                            F statistic
Fitted model     $I - 1$   $J\sum_i (\bar y_{i.} - \bar y_{..})^2$     $\frac{J\sum_i (\bar y_{i.} - \bar y_{..})^2}{I - 1}$   $\frac{J\sum_i (\bar y_{i.} - \bar y_{..})^2}{(I - 1)\,\tilde\sigma^2}$
Residual         $n - I$   $\sum_i \sum_j (y_{ij} - \bar y_{i.})^2$    $\tilde\sigma^2$
Total            $n - 1$   $\sum_i \sum_j (y_{ij} - \bar y_{..})^2$
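To close, a small numerical sketch of this one-way ANOVA (group sizes and measurements simulated purely for illustration, assuming numpy and scipy are available), cross-checked against scipy.stats.f_oneway:

# Sketch: one-way ANOVA with I groups of J observations each (illustrative data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
I, J = 4, 6
n = I * J
Y = rng.normal(loc=10.0, scale=1.0, size=(I, J))          # Y[i, j], generated under H0

group_means = Y.mean(axis=1)                              # group means \bar y_{i.}
grand_mean = Y.mean()                                     # grand mean \bar y_{..}

RSS = np.sum((Y - group_means[:, None]) ** 2)             # within-group sum of squares, n - I d.f.
fitted_SS = J * np.sum((group_means - grand_mean) ** 2)   # RSS_0 - RSS, I - 1 d.f.

F = (fitted_SS / (I - 1)) / (RSS / (n - I))
print(F, stats.f.sf(F, I - 1, n - I))

# Cross-check with scipy's built-in one-way ANOVA.
print(stats.f_oneway(*Y))                                 # unpacks the I rows as separate groups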