3 Linear models

IB Statistics

3.8 Hypothesis testing
3.8.1 Hypothesis testing
In hypothesis testing, we want to know whether certain variables influence the result. If, say, the variable $x_1$ does not influence $Y$, then we must have $\beta_1 = 0$. So the goal is to test the hypothesis $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$. We will tackle a more general case, where $\beta$ can be split into two vectors $\beta_0$ and $\beta_1$, and we test if $\beta_1$ is zero.
We start with an obscure lemma, which might seem pointless at first, but
will prove itself useful very soon.
Lemma. Suppose $Z \sim N_n(0, \sigma^2 I_n)$, and $A_1$ and $A_2$ are symmetric, idempotent $n \times n$ matrices with $A_1 A_2 = 0$ (i.e. they are orthogonal). Then $Z^T A_1 Z$ and $Z^T A_2 Z$ are independent.
This is geometrically intuitive, because $A_1$ and $A_2$ being orthogonal means they are concerned with different parts of the vector $Z$.
Proof. Let $X_i = A_i Z$ for $i = 1, 2$, and
\[
W = \begin{pmatrix} W_1 \\ W_2 \end{pmatrix} = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix} Z.
\]
Then
\[
W \sim N_{2n}\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \sigma^2 \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix}\right),
\]
since the off-diagonal blocks are $\sigma^2 A_1^T A_2 = \sigma^2 A_1 A_2 = 0$.
So $W_1$ and $W_2$ are independent, which implies that
\[
W_1^T W_1 = Z^T A_1^T A_1 Z = Z^T A_1 A_1 Z = Z^T A_1 Z
\]
and
\[
W_2^T W_2 = Z^T A_2^T A_2 Z = Z^T A_2 A_2 Z = Z^T A_2 Z
\]
are independent.
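The lemma can be checked numerically. The following sketch (my own illustration, not part of the notes) takes two orthogonal projection matrices with $A_1 A_2 = 0$, simulates $Z \sim N_n(0, \sigma^2 I_n)$, and confirms that the quadratic forms $Z^T A_1 Z$ and $Z^T A_2 Z$ are uncorrelated, each with mean $\sigma^2 \, \mathrm{rank}(A_i)$:

```python
import numpy as np

# Illustrative check of the lemma: A1, A2 symmetric idempotent with A1 A2 = 0.
rng = np.random.default_rng(0)
n, sigma2, reps = 6, 1.0, 50_000

u = np.eye(n)[:, :1]                      # first standard basis vector
v = np.eye(n)[:, 1:2]                     # second standard basis vector
A1 = u @ u.T                              # projection onto span(e1)
A2 = v @ v.T                              # projection onto span(e2); A1 A2 = 0

Z = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
q1 = np.einsum('ri,ij,rj->r', Z, A1, Z)   # Z^T A1 Z for each replicate
q2 = np.einsum('ri,ij,rj->r', Z, A2, Z)   # Z^T A2 Z for each replicate

# The lemma says q1 and q2 are independent; each is sigma^2 * chi^2_1,
# so the sample correlation should be near 0 and each mean near 1.
print(abs(np.corrcoef(q1, q2)[0, 1]))     # close to 0
print(q1.mean(), q2.mean())               # each close to 1
```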
Now we go to hypothesis testing in general linear models:
Suppose
\[
X = \begin{pmatrix} X_0 & X_1 \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix},
\]
where $X$ is $n \times p$, $X_0$ is $n \times p_0$, $X_1$ is $n \times (p - p_0)$, and $\mathrm{rank}(X) = p$, $\mathrm{rank}(X_0) = p_0$.
We want to test $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$. Under $H_0$, the term $X_1 \beta_1$ vanishes and
\[
Y = X_0 \beta_0 + \varepsilon.
\]
Under $H_0$, the mle of $\beta_0$ and $\sigma^2$ are
\[
\hat{\hat\beta}_0 = (X_0^T X_0)^{-1} X_0^T Y, \qquad \hat{\hat\sigma}^2 = \frac{\mathrm{RSS}_0}{n} = \frac{1}{n} (Y - X_0 \hat{\hat\beta}_0)^T (Y - X_0 \hat{\hat\beta}_0),
\]
and we have previously shown these are independent.
Note that our poor estimators wear two hats instead of one. We adopt the convention that the estimators of the null hypothesis have two hats, while those of the alternative hypothesis have one.
So the fitted values under $H_0$ are
\[
\hat{\hat Y} = X_0 (X_0^T X_0)^{-1} X_0^T Y = P_0 Y,
\]
where $P_0 = X_0 (X_0^T X_0)^{-1} X_0^T$.
The generalized likelihood ratio test of $H_0$ against $H_1$ is
\[
\Lambda_Y(H_0, H_1)
= \frac{\left(\frac{1}{\sqrt{2\pi\hat\sigma^2}}\right)^n \exp\left(-\frac{1}{2\hat\sigma^2} (Y - X\hat\beta)^T (Y - X\hat\beta)\right)}
       {\left(\frac{1}{\sqrt{2\pi\hat{\hat\sigma}^2}}\right)^n \exp\left(-\frac{1}{2\hat{\hat\sigma}^2} (Y - X_0\hat{\hat\beta}_0)^T (Y - X_0\hat{\hat\beta}_0)\right)}
= \left(\frac{\hat{\hat\sigma}^2}{\hat\sigma^2}\right)^{n/2}
= \left(\frac{\mathrm{RSS}_0}{\mathrm{RSS}}\right)^{n/2}
= \left(1 + \frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}}\right)^{n/2}.
\]
We reject $H_0$ when $2 \log \Lambda$ is large, equivalently when $\frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}}$ is large. Using the results in Lecture 8, under $H_0$ we have
\[
2 \log \Lambda = n \log\left(1 + \frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}}\right),
\]
which is approximately a $\chi^2_{p - p_0}$ random variable.
This is a good approximation, but we can in fact obtain the exact null distribution, and hence an exact test.
We have previously shown that $\mathrm{RSS} = Y^T (I_n - P) Y$, and so
\[
\mathrm{RSS}_0 - \mathrm{RSS} = Y^T (I_n - P_0) Y - Y^T (I_n - P) Y = Y^T (P - P_0) Y.
\]
Now both $I_n - P$ and $P - P_0$ are symmetric and idempotent, and therefore $\mathrm{rank}(I_n - P) = n - p$ and
\[
\mathrm{rank}(P - P_0) = \mathrm{tr}(P - P_0) = \mathrm{tr}(P) - \mathrm{tr}(P_0) = \mathrm{rank}(P) - \mathrm{rank}(P_0) = p - p_0.
\]
Also,
\[
(I_n - P)(P - P_0) = (I_n - P)P - (I_n - P)P_0 = (P - P^2) - (P_0 - P P_0) = 0.
\]
(We have $P^2 = P$ by idempotence, and $P P_0 = P_0$ since after projecting with $P_0$, we are already in the column space of $P$, and applying $P$ has no effect.)
Finally,
\[
Y^T (I_n - P) Y = (Y - X_0 \beta_0)^T (I_n - P)(Y - X_0 \beta_0),
\]
\[
Y^T (P - P_0) Y = (Y - X_0 \beta_0)^T (P - P_0)(Y - X_0 \beta_0),
\]
since $(I_n - P) X_0 = (P - P_0) X_0 = 0$.
If we let $Z = Y - X_0 \beta_0$, $A_1 = I_n - P$ and $A_2 = P - P_0$, and apply our previous lemma together with the fact that $Z^T A_i Z \sim \sigma^2 \chi^2_{r_i}$ with $r_i = \mathrm{rank}(A_i)$, then under $H_0$
\[
\mathrm{RSS} = Y^T (I_n - P) Y \sim \sigma^2 \chi^2_{n - p}, \qquad \mathrm{RSS}_0 - \mathrm{RSS} = Y^T (P - P_0) Y \sim \sigma^2 \chi^2_{p - p_0},
\]
and these random variables are independent.
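These two distributional claims can be checked by simulation. The sketch below (my own, with hypothetical dimensions) simulates many response vectors under $H_0$ and verifies that $\mathrm{RSS}/\sigma^2$ and $(\mathrm{RSS}_0 - \mathrm{RSS})/\sigma^2$ have the means $n - p$ and $p - p_0$ of the claimed chi-squared distributions:

```python
import numpy as np

# Simulation check: under H0, RSS ~ sigma^2 chi^2_{n-p} and
# RSS0 - RSS ~ sigma^2 chi^2_{p-p0}, so the scaled means should be
# n - p and p - p0 respectively.
rng = np.random.default_rng(4)
n, p0, p, sigma2, reps = 20, 1, 3, 2.0, 20_000

X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
X0 = X[:, :p0]                               # null-model design (intercept only)
P = X @ np.linalg.inv(X.T @ X) @ X.T         # hat matrix of the full model
P0 = X0 @ np.linalg.inv(X0.T @ X0) @ X0.T    # hat matrix under H0
beta0 = np.array([1.0])                      # H0 is true: only beta_0 nonzero

Y = X0 @ beta0 + rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
rss = np.einsum('ri,ij,rj->r', Y, np.eye(n) - P, Y)   # RSS per replicate
diff = np.einsum('ri,ij,rj->r', Y, P - P0, Y)          # RSS0 - RSS per replicate

print(rss.mean() / sigma2)    # close to n - p = 17
print(diff.mean() / sigma2)   # close to p - p0 = 2
```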
So under $H_0$,
\[
F = \frac{Y^T (P - P_0) Y / (p - p_0)}{Y^T (I_n - P) Y / (n - p)} = \frac{(\mathrm{RSS}_0 - \mathrm{RSS})/(p - p_0)}{\mathrm{RSS}/(n - p)} \sim F_{p - p_0,\, n - p}.
\]
Hence we reject $H_0$ if $F > F_{p - p_0,\, n - p}(\alpha)$.
Here $\mathrm{RSS}_0 - \mathrm{RSS}$ is the reduction in the sum of squares due to fitting $\beta_1$ in addition to $\beta_0$.
Source of var.    d.f.        sum of squares                     mean squares                                       F statistic
Fitted model      $p - p_0$   $\mathrm{RSS}_0 - \mathrm{RSS}$    $\frac{\mathrm{RSS}_0 - \mathrm{RSS}}{p - p_0}$    $\frac{(\mathrm{RSS}_0 - \mathrm{RSS})/(p - p_0)}{\mathrm{RSS}/(n - p)}$
Residual          $n - p$     $\mathrm{RSS}$                     $\tilde\sigma^2 = \frac{\mathrm{RSS}}{n - p}$
Total             $n - p_0$   $\mathrm{RSS}_0$

The ratio $\frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}_0}$ is sometimes known as the proportion of variance explained by $\beta_1$, and denoted $R^2$.
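The whole procedure can be sketched in a few lines. The example below (simulated data; the variable names are mine, not from the notes) builds the two hat matrices $P$ and $P_0$, computes $\mathrm{RSS}$, $\mathrm{RSS}_0$ and the $F$ statistic, and looks up the p-value in the $F_{p - p_0,\, n - p}$ distribution:

```python
import numpy as np
from scipy import stats

# F-test for beta_1 = 0 via projection matrices, on simulated data.
rng = np.random.default_rng(1)
n, p0, p = 50, 2, 4

X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
X0 = X[:, :p0]                               # columns of the null model
beta = np.array([1.0, 2.0, 0.0, 0.0])        # beta_1 = 0, so H0 is true here
Y = X @ beta + rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T         # hat matrix of the full model
P0 = X0 @ np.linalg.inv(X0.T @ X0) @ X0.T    # hat matrix under H0

RSS = Y @ (np.eye(n) - P) @ Y                # RSS = Y^T (I - P) Y
RSS0 = Y @ (np.eye(n) - P0) @ Y              # RSS0 = Y^T (I - P0) Y

F = ((RSS0 - RSS) / (p - p0)) / (RSS / (n - p))
p_value = stats.f.sf(F, p - p0, n - p)       # P(F_{p-p0, n-p} > F)
print(F, p_value)
```

Since $H_0$ holds in this simulation, the p-value should typically be unremarkable (above $\alpha$), though any particular run can of course reject by chance.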
3.8.2 Simple linear regression
We assume that
\[
Y_i = a + b (x_i - \bar x) + \varepsilon_i,
\]
where $\bar x = \sum x_i / n$ and the $\varepsilon_i$ are iid $N(0, \sigma^2)$.
Suppose we want to test the hypothesis $H_0: b = 0$, i.e. no linear relationship.
We have previously seen how to construct a confidence interval, and so we could
simply see if it included 0.
Alternatively, under $H_0$, the model is $Y_i \sim N(a, \sigma^2)$, and so $\hat a = \bar Y$, and the fitted values are $\hat Y_i = \bar Y$. So $\mathrm{RSS}_0$ is
\[
\mathrm{RSS}_0 = \sum_i (y_i - \bar y)^2 = S_{yy}.
\]
The fitted sum of squares is therefore
\[
\mathrm{RSS}_0 - \mathrm{RSS} = \sum_i \left[ (y_i - \bar y)^2 - (y_i - \bar y - \hat b (x_i - \bar x))^2 \right] = \hat b^2 \sum_i (x_i - \bar x)^2 = \hat b^2 S_{xx}.
\]
Source of var.    d.f.      sum of squares                       mean squares         F statistic
Fitted model      $1$       $\hat b^2 S_{xx}$                    $\hat b^2 S_{xx}$    $F = \frac{\hat b^2 S_{xx}}{\tilde\sigma^2}$
Residual          $n - 2$   $\sum_i (y_i - \hat y_i)^2$          $\tilde\sigma^2$
Total             $n - 1$   $\sum_i (y_i - \bar y)^2 = S_{yy}$
Note that the proportion of variance explained is
\[
\hat b^2 S_{xx} / S_{yy} = \frac{S_{xy}^2}{S_{xx} S_{yy}} = r^2,
\]
where $r$ is the Pearson product-moment correlation coefficient
\[
r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}.
\]
We have previously seen that under $H_0$,
\[
\frac{\hat b}{\mathrm{SE}(\hat b)} \sim t_{n-2}, \qquad \text{where } \mathrm{SE}(\hat b) = \tilde\sigma / \sqrt{S_{xx}}.
\]
So we let
\[
t = \frac{\hat b}{\mathrm{SE}(\hat b)} = \frac{\hat b \sqrt{S_{xx}}}{\tilde\sigma}.
\]
Checking whether $|t| > t_{n-2}\left(\frac{\alpha}{2}\right)$ is precisely the same as checking whether $t^2 = F > F_{1, n-2}(\alpha)$, since an $F_{1, n-2}$ random variable is the square of a $t_{n-2}$ random variable.
Hence the same conclusion is reached regardless of whether we use the $t$-distribution or the $F$ statistic derived from an analysis of variance table.
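The equivalence can be verified directly. The sketch below (simulated data of my own) computes both statistics from the summary quantities $S_{xx}$, $S_{xy}$, $S_{yy}$ and confirms that $t^2 = F$ exactly, and that the critical values match:

```python
import numpy as np
from scipy import stats

# Checking t^2 = F in simple linear regression on simulated data.
rng = np.random.default_rng(2)
n = 30
x = rng.uniform(0, 10, size=n)
y = 1.0 + 0.5 * (x - x.mean()) + rng.normal(size=n)

Sxx = np.sum((x - x.mean())**2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
Syy = np.sum((y - y.mean())**2)

b_hat = Sxy / Sxx                           # least-squares slope
RSS = Syy - b_hat**2 * Sxx                  # residual sum of squares
sigma2_tilde = RSS / (n - 2)                # unbiased variance estimate

t = b_hat * np.sqrt(Sxx) / np.sqrt(sigma2_tilde)
F = (b_hat**2 * Sxx) / sigma2_tilde

print(np.isclose(t**2, F))                  # True: the two tests agree
# The critical values also match: F_{1,n-2}(alpha) = t_{n-2}(alpha/2)^2.
alpha = 0.05
print(np.isclose(stats.f.ppf(1 - alpha, 1, n - 2),
                 stats.t.ppf(1 - alpha / 2, n - 2)**2))  # True
```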
3.8.3 One way analysis of variance with equal numbers in each group
Recall that in our wafer example, we made measurements in groups, and want to know if there is a difference between groups. In general, suppose $J$ measurements are taken in each of $I$ groups (so $n = IJ$), and that
\[
Y_{ij} = \mu_i + \varepsilon_{ij},
\]
where the $\varepsilon_{ij}$ are independent $N(0, \sigma^2)$ random variables, and the $\mu_i$ are unknown constants.
Fitting this model gives
\[
\mathrm{RSS} = \sum_{i=1}^I \sum_{j=1}^J (Y_{ij} - \hat\mu_i)^2 = \sum_{i=1}^I \sum_{j=1}^J (Y_{ij} - \bar Y_{i.})^2
\]
on $n - I$ degrees of freedom.
Suppose we want to test the hypothesis $H_0: \mu_i = \mu$ for all $i$, i.e. no difference between groups.
Under $H_0$, the model is $Y_{ij} \sim N(\mu, \sigma^2)$, and so $\hat\mu = \bar Y$, and the fitted values are $\hat Y_{ij} = \bar Y$. So $\mathrm{RSS}_0$ is
\[
\mathrm{RSS}_0 = \sum_{i,j} (y_{ij} - \bar y_{..})^2.
\]
The fitted sum of squares is therefore
\[
\mathrm{RSS}_0 - \mathrm{RSS} = \sum_i \sum_j \left[ (y_{ij} - \bar y_{..})^2 - (y_{ij} - \bar y_{i.})^2 \right] = J \sum_i (\bar y_{i.} - \bar y_{..})^2.
\]
Source of var.    d.f.      sum of squares                             mean squares                                             F statistic
Fitted model      $I - 1$   $J \sum_i (\bar y_{i.} - \bar y_{..})^2$   $\frac{J \sum_i (\bar y_{i.} - \bar y_{..})^2}{I - 1}$   $\frac{J \sum_i (\bar y_{i.} - \bar y_{..})^2}{(I - 1)\,\tilde\sigma^2}$
Residual          $n - I$   $\sum_i \sum_j (y_{ij} - \bar y_{i.})^2$   $\tilde\sigma^2$
Total             $n - 1$   $\sum_i \sum_j (y_{ij} - \bar y_{..})^2$
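The one-way ANOVA table can be computed directly from the group and grand means. The sketch below (my own example with hypothetical group means) does this for simulated data and checks the result against `scipy.stats.f_oneway`:

```python
import numpy as np
from scipy import stats

# One-way ANOVA with I groups of J observations each, computed from the
# table above and checked against scipy.stats.f_oneway.
rng = np.random.default_rng(3)
I, J = 4, 10
n = I * J
mu = np.array([5.0, 5.5, 5.2, 4.8])          # hypothetical group means
y = mu[:, None] + rng.normal(scale=0.5, size=(I, J))

group_means = y.mean(axis=1)                 # ybar_{i.}
grand_mean = y.mean()                        # ybar_{..}

fitted_ss = J * np.sum((group_means - grand_mean)**2)    # RSS0 - RSS
RSS = np.sum((y - group_means[:, None])**2)              # residual SS
sigma2_tilde = RSS / (n - I)

F = (fitted_ss / (I - 1)) / sigma2_tilde     # F statistic from the table
F_scipy, p_scipy = stats.f_oneway(*y)        # rows of y are the groups
print(np.isclose(F, F_scipy))                # True
```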