3.8 Hypothesis testing
3.8.1 Hypothesis testing
In hypothesis testing, we want to know whether certain variables influence the result. If, say, the variable $x_1$ does not influence $Y$, then we must have $\beta_1 = 0$. So the goal is to test the hypothesis $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$. We will tackle a more general case, where $\beta$ can be split into two vectors $\beta_0$ and $\beta_1$, and we test if $\beta_1$ is zero.
We start with an obscure lemma, which might seem pointless at first, but
will prove itself useful very soon.
Lemma. Suppose $Z \sim N_n(0, \sigma^2 I_n)$, and $A_1$ and $A_2$ are symmetric, idempotent $n \times n$ matrices with $A_1 A_2 = 0$ (i.e. they are orthogonal). Then $Z^T A_1 Z$ and $Z^T A_2 Z$ are independent.
This is geometrically intuitive, because $A_1$ and $A_2$ being orthogonal means they are concerned with different parts of the vector $Z$.
Proof. Let $W_i = A_i Z$ for $i = 1, 2$, and let
\[
W = \begin{pmatrix} W_1 \\ W_2 \end{pmatrix} = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix} Z.
\]
Then
\[
W \sim N_{2n}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},\ \sigma^2 \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix} \right),
\]
since the off-diagonal blocks are $\sigma^2 A_1^T A_2 = \sigma^2 A_1 A_2 = 0$.
So $W_1$ and $W_2$ are independent, which implies that
\[
W_1^T W_1 = Z^T A_1^T A_1 Z = Z^T A_1 A_1 Z = Z^T A_1 Z
\]
and
\[
W_2^T W_2 = Z^T A_2^T A_2 Z = Z^T A_2 A_2 Z = Z^T A_2 Z
\]
are independent.
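To see the lemma in action, here is a small numerical sketch (my own illustration, not part of the notes): $A_1$ is taken to be the projection onto the column space of an arbitrary matrix, $A_2$ its orthogonal complement, and we check that the two quadratic forms look uncorrelated across simulated draws of $Z$.

```python
import numpy as np

# Sketch: A1, A2 are orthogonal projections with A1 A2 = 0; check that the
# quadratic forms Z^T A1 Z and Z^T A2 Z are uncorrelated across many draws.
rng = np.random.default_rng(0)
n, sigma = 5, 2.0

X = rng.normal(size=(n, 2))                    # arbitrary matrix (assumption)
P = X @ np.linalg.solve(X.T @ X, X.T)          # projection onto col(X)
A1, A2 = P, np.eye(n) - P                      # symmetric, idempotent, A1 A2 = 0

Z = rng.normal(scale=sigma, size=(10_000, n))  # rows are draws of Z ~ N_n(0, sigma^2 I)
q1 = np.einsum('ij,jk,ik->i', Z, A1, Z)        # Z^T A1 Z for each draw
q2 = np.einsum('ij,jk,ik->i', Z, A2, Z)        # Z^T A2 Z for each draw

print(np.corrcoef(q1, q2)[0, 1])               # close to 0
```

Zero sample correlation is of course weaker than independence, but it is the easiest symptom to check by simulation.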
Now we go to hypothesis testing in general linear models:
Suppose
\[
\underset{n \times p}{X} = \begin{pmatrix} \underset{n \times p_0}{X_0} & \underset{n \times (p - p_0)}{X_1} \end{pmatrix}
\quad\text{and}\quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix},
\]
where $\operatorname{rank}(X) = p$ and $\operatorname{rank}(X_0) = p_0$.
We want to test $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$. Under $H_0$, the term $X_1 \beta_1$ vanishes and
\[
Y = X_0 \beta_0 + \varepsilon.
\]
Under $H_0$, the mle of $\beta_0$ and $\sigma^2$ are
\begin{align*}
\hat{\hat{\beta}}_0 &= (X_0^T X_0)^{-1} X_0^T Y,\\
\hat{\hat{\sigma}}^2 &= \frac{\mathrm{RSS}_0}{n} = \frac{1}{n}\,(Y - X_0 \hat{\hat{\beta}}_0)^T (Y - X_0 \hat{\hat{\beta}}_0),
\end{align*}
and we have previously shown these are independent.
Note that our poor estimators wear two hats instead of one. We adopt the
convention that the estimators of the null hypothesis have two hats, while those
of the alternative hypothesis have one.
So the fitted values under $H_0$ are
\[
\hat{\hat{Y}} = X_0 (X_0^T X_0)^{-1} X_0^T Y = P_0 Y,
\]
where $P_0 = X_0 (X_0^T X_0)^{-1} X_0^T$.
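As a quick sanity check (a sketch with made-up data, not from the notes), the double-hat estimator and the projection $P_0$ can be computed directly with numpy:

```python
import numpy as np

# Sketch (made-up data): fit the null model Y = X0 beta0 + eps and check that
# P0 = X0 (X0^T X0)^{-1} X0^T is a symmetric, idempotent projection giving the
# fitted values.
rng = np.random.default_rng(1)
n = 20
X0 = np.column_stack([np.ones(n), rng.normal(size=n)])     # intercept + one covariate
Y = X0 @ np.array([1.0, 0.5]) + rng.normal(size=n)

beta00 = np.linalg.solve(X0.T @ X0, X0.T @ Y)              # double-hat beta_0
P0 = X0 @ np.linalg.solve(X0.T @ X0, X0.T)                 # projection onto col(X0)

assert np.allclose(P0, P0.T) and np.allclose(P0 @ P0, P0)  # symmetric, idempotent
assert np.allclose(P0 @ Y, X0 @ beta00)                    # P0 Y = X0 beta00
print((Y - P0 @ Y) @ (Y - P0 @ Y) / n)                     # double-hat sigma^2 = RSS0 / n
```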
The generalized likelihood ratio test of $H_0$ against $H_1$ is
\begin{align*}
\Lambda_Y(H_0, H_1)
&= \frac{\left(\frac{1}{\sqrt{2\pi\hat{\sigma}^2}}\right)^n \exp\left(-\frac{1}{2\hat{\sigma}^2}(Y - X\hat{\beta})^T(Y - X\hat{\beta})\right)}
        {\left(\frac{1}{\sqrt{2\pi\hat{\hat{\sigma}}^2}}\right)^n \exp\left(-\frac{1}{2\hat{\hat{\sigma}}^2}(Y - X_0\hat{\hat{\beta}}_0)^T(Y - X_0\hat{\hat{\beta}}_0)\right)}\\
&= \left(\frac{\hat{\hat{\sigma}}^2}{\hat{\sigma}^2}\right)^{n/2}
 = \left(\frac{\mathrm{RSS}_0}{\mathrm{RSS}}\right)^{n/2}
 = \left(1 + \frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}}\right)^{n/2}.
\end{align*}
We reject $H_0$ when $2\log\Lambda$ is large, or equivalently when $\frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}}$ is large.
Using the results in Lecture 8, under $H_0$, we have
\[
2\log\Lambda = n\log\left(1 + \frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}}\right),
\]
which is approximately a $\chi^2_{p - p_0}$ random variable.
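A short simulation sketch (invented design matrix, $p - p_0 = 2$) comparing the simulated null distribution of $2\log\Lambda$ with the $\chi^2_{p - p_0}$ approximation:

```python
import numpy as np
from scipy import stats

# Sketch: simulate 2 log Lambda = n log(1 + (RSS0 - RSS)/RSS) under H0 and
# compare its distribution with chi^2_{p - p0}.  Design and sizes are made up.
rng = np.random.default_rng(2)
n, p0, p = 100, 2, 4
X = rng.normal(size=(n, p))
X0 = X[:, :p0]

def rss(design, y):
    """Residual sum of squares from a least-squares fit on `design`."""
    beta = np.linalg.lstsq(design, y, rcond=None)[0]
    resid = y - design @ beta
    return resid @ resid

stat = []
for _ in range(5000):
    y = X0 @ np.array([1.0, -0.5]) + rng.normal(size=n)   # generated under H0: beta_1 = 0
    stat.append(n * np.log(1 + (rss(X0, y) - rss(X, y)) / rss(X, y)))

# The simulated 95% quantile should be reasonably close to the chi^2_{p - p0}
# quantile for moderate n.
print(np.quantile(stat, 0.95), stats.chi2.ppf(0.95, p - p0))
```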
This is a good approximation, but we can in fact find the exact null distribution and obtain an exact test.
We have previously shown that $\mathrm{RSS} = Y^T(I_n - P)Y$, and so
\[
\mathrm{RSS}_0 - \mathrm{RSS} = Y^T(I_n - P_0)Y - Y^T(I_n - P)Y = Y^T(P - P_0)Y.
\]
Now both $I_n - P$ and $P - P_0$ are symmetric and idempotent, and therefore
\[
\operatorname{rank}(I_n - P) = n - p
\]
and
\[
\operatorname{rank}(P - P_0) = \operatorname{tr}(P - P_0) = \operatorname{tr}(P) - \operatorname{tr}(P_0) = \operatorname{rank}(P) - \operatorname{rank}(P_0) = p - p_0.
\]
Also,
\[
(I_n - P)(P - P_0) = (I_n - P)P - (I_n - P)P_0 = (P - P^2) - (P_0 - PP_0) = 0.
\]
(Here $P^2 = P$ by idempotence, and $PP_0 = P_0$ since after projecting with $P_0$ we are already in the column space of $X_0$, which is contained in the column space of $X$, so applying $P$ has no further effect.)
Finally,
\begin{align*}
Y^T(I_n - P)Y &= (Y - X_0\beta_0)^T(I_n - P)(Y - X_0\beta_0),\\
Y^T(P - P_0)Y &= (Y - X_0\beta_0)^T(P - P_0)(Y - X_0\beta_0),
\end{align*}
since $(I_n - P)X_0 = (P - P_0)X_0 = 0$.
If we let $Z = Y - X_0\beta_0$, $A_1 = I_n - P$ and $A_2 = P - P_0$, and apply our previous lemma together with the fact that $Z^T A_i Z \sim \sigma^2\chi^2_{r_i}$ with $r_i = \operatorname{rank}(A_i)$, then
\begin{align*}
\mathrm{RSS} &= Y^T(I_n - P)Y \sim \sigma^2\chi^2_{n - p},\\
\mathrm{RSS}_0 - \mathrm{RSS} &= Y^T(P - P_0)Y \sim \sigma^2\chi^2_{p - p_0},
\end{align*}
and these random variables are independent.
So under $H_0$,
\[
F = \frac{Y^T(P - P_0)Y/(p - p_0)}{Y^T(I_n - P)Y/(n - p)} = \frac{(\mathrm{RSS}_0 - \mathrm{RSS})/(p - p_0)}{\mathrm{RSS}/(n - p)} \sim F_{p - p_0,\, n - p}.
\]
Hence we reject $H_0$ if $F > F_{p - p_0,\, n - p}(\alpha)$.
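In practice the test is easy to carry out numerically. The following is a sketch (design matrix and data invented for illustration) computing $F$ from the two residual sums of squares and comparing it with the $F_{p - p_0,\, n - p}$ quantile using scipy:

```python
import numpy as np
from scipy import stats

# Sketch of the exact F test of H0: beta_1 = 0 in Y = X0 beta0 + X1 beta1 + eps.
# All numbers are invented for illustration.
rng = np.random.default_rng(3)
n, p0, p = 40, 2, 5
X0 = np.column_stack([np.ones(n), rng.normal(size=n)])
X1 = rng.normal(size=(n, p - p0))
X = np.hstack([X0, X1])
Y = X0 @ np.array([2.0, 1.0]) + rng.normal(size=n)         # data generated under H0

def rss(design, y):
    """Residual sum of squares from a least-squares fit on `design`."""
    beta = np.linalg.lstsq(design, y, rcond=None)[0]
    r = y - design @ beta
    return r @ r

RSS0, RSS = rss(X0, Y), rss(X, Y)
F = ((RSS0 - RSS) / (p - p0)) / (RSS / (n - p))

alpha = 0.05
print("F =", F)
print("critical value F_{p-p0,n-p}(alpha) =", stats.f.ppf(1 - alpha, p - p0, n - p))
print("p-value =", stats.f.sf(F, p - p0, n - p))
```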
$\mathrm{RSS}_0 - \mathrm{RSS}$ is the reduction in the sum of squares due to fitting $\beta_1$ in addition to $\beta_0$.
\begin{tabular}{lllll}
Source of var. & d.f. & sum of squares & mean squares & $F$ statistic\\
Fitted model & $p - p_0$ & $\mathrm{RSS}_0 - \mathrm{RSS}$ & $\dfrac{\mathrm{RSS}_0 - \mathrm{RSS}}{p - p_0}$ & $\dfrac{(\mathrm{RSS}_0 - \mathrm{RSS})/(p - p_0)}{\mathrm{RSS}/(n - p)}$\\
Residual & $n - p$ & $\mathrm{RSS}$ & $\dfrac{\mathrm{RSS}}{n - p}$ & \\
Total & $n - p_0$ & $\mathrm{RSS}_0$ & &
\end{tabular}
The ratio $\dfrac{\mathrm{RSS}_0 - \mathrm{RSS}}{\mathrm{RSS}_0}$ is sometimes known as the proportion of variance explained by $\beta_1$, and denoted $R^2$.
3.8.2 Simple linear regression
We assume that
\[
Y_i = a' + b(x_i - \bar{x}) + \varepsilon_i,
\]
where $\bar{x} = \sum x_i/n$ and the $\varepsilon_i$ are $N(0, \sigma^2)$.
Suppose we want to test the hypothesis $H_0: b = 0$, i.e. no linear relationship. We have previously seen how to construct a confidence interval, and so we could simply see if it includes $0$.
Alternatively, under $H_0$, the model is $Y_i \sim N(a', \sigma^2)$, and so $\hat{a}' = \bar{Y}$, and the fitted values are $\hat{Y}_i = \bar{Y}$.
The observed $\mathrm{RSS}_0$ is therefore
\[
\mathrm{RSS}_0 = \sum_i (y_i - \bar{y})^2 = S_{yy}.
\]
The fitted sum of squares is therefore
\[
\mathrm{RSS}_0 - \mathrm{RSS} = \sum_i \left[(y_i - \bar{y})^2 - \left(y_i - \bar{y} - \hat{b}(x_i - \bar{x})\right)^2\right] = \hat{b}^2 \sum_i (x_i - \bar{x})^2 = \hat{b}^2 S_{xx}.
\]
\begin{tabular}{lllll}
Source of var. & d.f. & sum of squares & mean squares & $F$ statistic\\
Fitted model & $1$ & $\mathrm{RSS}_0 - \mathrm{RSS} = \hat{b}^2 S_{xx}$ & $\hat{b}^2 S_{xx}$ & $F = \dfrac{\hat{b}^2 S_{xx}}{\tilde{\sigma}^2}$\\
Residual & $n - 2$ & $\mathrm{RSS} = \sum_i (y_i - \hat{y}_i)^2$ & $\tilde{\sigma}^2$ & \\
Total & $n - 1$ & $\mathrm{RSS}_0 = \sum_i (y_i - \bar{y})^2$ & &
\end{tabular}
Note that the proportion of variance explained is
\[
\hat{b}^2 S_{xx}/S_{yy} = \frac{S_{xy}^2}{S_{xx} S_{yy}} = r^2,
\]
where $r$ is Pearson's product-moment correlation coefficient,
\[
r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}.
\]
We have previously seen that under $H_0$, $\dfrac{\hat{b}}{\mathrm{SE}(\hat{b})} \sim t_{n-2}$, where $\mathrm{SE}(\hat{b}) = \tilde{\sigma}/\sqrt{S_{xx}}$.
So we let
\[
t = \frac{\hat{b}}{\mathrm{SE}(\hat{b})} = \frac{\hat{b}\sqrt{S_{xx}}}{\tilde{\sigma}}.
\]
Checking whether $|t| > t_{n-2}\!\left(\frac{\alpha}{2}\right)$ is precisely the same as checking whether $t^2 = F > F_{1, n-2}(\alpha)$, since an $F_{1, n-2}$ variable is $t_{n-2}^2$.
Hence the same conclusion is reached regardless of whether we use the $t$-distribution or the $F$ statistic derived from an analysis of variance table.
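A brief numerical check of this equivalence (again a sketch with invented data, assuming numpy and scipy are available):

```python
import numpy as np
from scipy import stats

# Sketch: in simple linear regression the square of the t statistic equals the
# ANOVA F statistic, and the critical values match.  Data are invented.
rng = np.random.default_rng(4)
n, alpha = 25, 0.05
x = rng.uniform(0, 10, size=n)
y = 3.0 + 0.4 * (x - x.mean()) + rng.normal(size=n)

Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
b_hat = Sxy / Sxx
resid = y - y.mean() - b_hat * (x - x.mean())
sigma_tilde2 = np.sum(resid ** 2) / (n - 2)               # tilde sigma^2 = RSS / (n - 2)

t = b_hat * np.sqrt(Sxx) / np.sqrt(sigma_tilde2)
F = b_hat ** 2 * Sxx / sigma_tilde2

print(np.isclose(t ** 2, F))                               # True
print(np.isclose(stats.t.ppf(1 - alpha / 2, n - 2) ** 2,
                 stats.f.ppf(1 - alpha, 1, n - 2)))        # True
```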
3.8.3 One way analysis of variance with equal numbers in each group
Recall that in our wafer example, we made measurements in groups, and want to know if there is a difference between groups. In general, suppose $J$ measurements are taken in each of $I$ groups, and that
\[
Y_{ij} = \mu_i + \varepsilon_{ij},
\]
where the $\varepsilon_{ij}$ are independent $N(0, \sigma^2)$ random variables, and the $\mu_i$ are unknown constants.
Fitting this model gives
\[
\mathrm{RSS} = \sum_{i=1}^{I}\sum_{j=1}^{J} (Y_{ij} - \hat{\mu}_i)^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2
\]
on $n - I$ degrees of freedom.
Suppose we want to test the hypothesis $H_0: \mu_i = \mu$ for all $i$, i.e. no difference between groups.
Under $H_0$, the model is $Y_{ij} \sim N(\mu, \sigma^2)$, and so $\hat{\mu} = \bar{Y}$, and the fitted values are $\hat{Y}_{ij} = \bar{Y}$.
The observed $\mathrm{RSS}_0$ is therefore
\[
\mathrm{RSS}_0 = \sum_{i,j} (y_{ij} - \bar{y}_{..})^2.
\]
The fitted sum of squares is therefore
\[
\mathrm{RSS}_0 - \mathrm{RSS} = \sum_i \sum_j \left[(y_{ij} - \bar{y}_{..})^2 - (y_{ij} - \bar{y}_{i.})^2\right] = J\sum_i (\bar{y}_{i.} - \bar{y}_{..})^2.
\]
\begin{tabular}{lllll}
Source of var. & d.f. & sum of squares & mean squares & $F$ statistic\\
Fitted model & $I - 1$ & $J\sum_i (\bar{y}_{i.} - \bar{y}_{..})^2$ & $\dfrac{J\sum_i (\bar{y}_{i.} - \bar{y}_{..})^2}{I - 1}$ & $\dfrac{J\sum_i (\bar{y}_{i.} - \bar{y}_{..})^2}{(I - 1)\tilde{\sigma}^2}$\\
Residual & $n - I$ & $\sum_i\sum_j (y_{ij} - \bar{y}_{i.})^2$ & $\tilde{\sigma}^2$ & \\
Total & $n - 1$ & $\sum_i\sum_j (y_{ij} - \bar{y}_{..})^2$ & &
\end{tabular}
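To make the table concrete, here is a sketch (invented data, $I$ groups of $J$ observations each) computing the one-way ANOVA $F$ statistic directly from the group means:

```python
import numpy as np
from scipy import stats

# Sketch of one-way ANOVA with equal group sizes: I groups of J observations.
# Data are invented; they are generated with all group means equal (H0 true).
rng = np.random.default_rng(5)
I, J = 4, 6
n = I * J
y = rng.normal(loc=10.0, scale=1.5, size=(I, J))           # y[i, j]

ybar_i = y.mean(axis=1)                                     # group means \bar{y}_{i.}
ybar = y.mean()                                             # grand mean  \bar{y}_{..}

fitted_ss = J * np.sum((ybar_i - ybar) ** 2)                # RSS0 - RSS
RSS = np.sum((y - ybar_i[:, None]) ** 2)                    # within-group sum of squares
sigma_tilde2 = RSS / (n - I)

F = (fitted_ss / (I - 1)) / sigma_tilde2
print("F =", F, " critical value:", stats.f.ppf(0.95, I - 1, n - I))
```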