3 Linear models  IB Statistics

3.7 Expected response at $x^*$
After performing the linear regression, we can now make predictions from it. Suppose that $x^*$ is a new vector of values for the explanatory variables. The expected response at $x^*$ is $\mathbb{E}[Y \mid x^*] = x^{*T}\beta$. We estimate this by $x^{*T}\hat{\beta}$.
Then we have
\[
x^{*T}(\hat{\beta} - \beta) \sim N\big(0, x^{*T}\operatorname{cov}(\hat{\beta})\, x^*\big) = N\big(0, \sigma^2 x^{*T}(X^TX)^{-1}x^*\big).
\]
Let $\tau^2 = x^{*T}(X^TX)^{-1}x^*$. Then
\[
\frac{x^{*T}(\hat{\beta} - \beta)}{\tilde{\sigma}\tau} \sim t_{n-p}.
\]
Then a confidence interval for the expected response $x^{*T}\beta$ has end points
\[
x^{*T}\hat{\beta} \pm \tilde{\sigma}\tau\, t_{n-p}\left(\tfrac{\alpha}{2}\right).
\]
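This interval can be sketched in code. The helper below is illustrative only (the name `expected_response_ci` and the use of NumPy/SciPy are my assumptions, not from the notes); it follows the formula above under the usual assumption that $X$ has full column rank.

```python
import numpy as np
from scipy.stats import t


def expected_response_ci(X, y, x_star, alpha=0.05):
    """Illustrative sketch: CI for the expected response x*^T beta.

    X: (n, p) design matrix, y: length-n response, x_star: length-p vector.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y                     # least squares estimate
    resid = y - X @ beta_hat
    sigma_tilde = np.sqrt(resid @ resid / (n - p))   # tilde-sigma estimate
    tau = np.sqrt(x_star @ XtX_inv @ x_star)         # tau^2 = x*^T (X^T X)^{-1} x*
    centre = x_star @ beta_hat                       # estimated expected response
    half = sigma_tilde * tau * t.ppf(1 - alpha / 2, n - p)
    return centre - half, centre + half
```

The interval is centred at the fitted value $x^{*T}\hat{\beta}$, with half-width $\tilde{\sigma}\tau\, t_{n-p}(\alpha/2)$.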
Example. Previous example continued: Suppose we wish to estimate the time to run 2 miles for a man with an oxygen uptake measurement of 50. Here $x^{*T} = (1, 50 - \bar{x})$, where $\bar{x} = 48.6$. The estimated expected response at $x^*$ is
\[
x^{*T}\hat{\beta} = \hat{a}' + (50 - 48.6) \times \hat{b} = 826.5 - 1.4 \times 12.9 = 808.5,
\]
which is obtained by plugging $x^{*T}$ into our fitted line.
We find
\[
\tau^2 = x^{*T}(X^TX)^{-1}x^* = \frac{1}{n} + \frac{x^{*2}}{S_{xx}} = \frac{1}{24} + \frac{1.4^2}{783.5} = 0.044 = 0.21^2.
\]
So a 95% confidence interval for $\mathbb{E}[Y \mid x^* = 50 - \bar{x}]$ is
\[
x^{*T}\hat{\beta} \pm \tilde{\sigma}\tau\, t_{n-p}\left(\tfrac{\alpha}{2}\right) = 808.5 \pm 55.6 \times 0.21 \times 2.07 = (783.6, 832.2).
\]
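The arithmetic here can be checked numerically. This is just a sanity-check sketch using the figures quoted in the example ($n = 24$, $\tilde{\sigma} = 55.6$, $\bar{x} = 48.6$, $S_{xx} = 783.5$, $t$-quantile $2.07$); the end points differ very slightly from the quoted interval because the notes round intermediate values.

```python
import math

# Figures from the 2-mile running example (n = 24, p = 2).
n = 24
sigma_tilde = 55.6          # residual estimate of sigma
x_centred = 50 - 48.6       # new covariate, centred by x-bar = 48.6
Sxx = 783.5
t_crit = 2.07               # t_{22} quantile at alpha/2 = 0.025

tau2 = 1 / n + x_centred ** 2 / Sxx    # tau^2 = 1/n + x*^2 / Sxx
tau = math.sqrt(tau2)                  # about 0.21
half_width = sigma_tilde * tau * t_crit
centre = 808.5
interval = (centre - half_width, centre + half_width)
```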
Note that this is the confidence interval for the predicted expected value, NOT an interval for the actual observed value.
The predicted response at $x^*$ is $Y^* = x^{*T}\beta + \varepsilon^*$, where $\varepsilon^* \sim N(0, \sigma^2)$, and $Y^*$ is independent of $Y_1, \cdots, Y_n$. Here we have more uncertainty in our prediction: both $\hat{\beta}$ and $\varepsilon^*$ contribute.
A $100(1 - \alpha)\%$ prediction interval for $Y^*$ is an interval $I(\mathbf{Y})$ such that $\mathbb{P}(Y^* \in I(\mathbf{Y})) = 1 - \alpha$, where the probability is taken over the joint distribution of $Y^*, Y_1, \cdots, Y_n$. So $I$ is a random function of the past data $\mathbf{Y}$ that outputs an interval.
First of all, as above, the predicted expected response is $\hat{Y}^* = x^{*T}\hat{\beta}$. This is an unbiased estimator of $Y^*$, since $\hat{Y}^* - Y^* = x^{*T}(\hat{\beta} - \beta) - \varepsilon^*$, and hence
\[
\mathbb{E}[\hat{Y}^* - Y^*] = x^{*T}(\beta - \beta) = 0.
\]
To find the variance, we use the fact that $x^{*T}(\hat{\beta} - \beta)$ and $\varepsilon^*$ are independent, and that the variance of a sum of independent variables is the sum of the variances. So
\[
\operatorname{var}(\hat{Y}^* - Y^*) = \operatorname{var}(x^{*T}\hat{\beta}) + \operatorname{var}(\varepsilon^*) = \sigma^2 x^{*T}(X^TX)^{-1}x^* + \sigma^2 = \sigma^2(\tau^2 + 1).
\]
We can see this as the uncertainty in the regression line, $\sigma^2\tau^2$, plus the wobble about the regression line, $\sigma^2$. So
\[
\hat{Y}^* - Y^* \sim N\big(0, \sigma^2(\tau^2 + 1)\big).
\]
We therefore find that
\[
\frac{\hat{Y}^* - Y^*}{\tilde{\sigma}\sqrt{\tau^2 + 1}} \sim t_{n-p}.
\]
So the interval with endpoints
\[
x^{*T}\hat{\beta} \pm \tilde{\sigma}\sqrt{\tau^2 + 1}\, t_{n-p}\left(\tfrac{\alpha}{2}\right)
\]
is a $100(1 - \alpha)\%$ prediction interval for $Y^*$. We don't call this a confidence interval: confidence intervals are about finding parameters of the distribution, while the prediction interval is about our predictions.
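A companion sketch for the prediction interval (again illustrative: the name `prediction_interval` and the NumPy/SciPy dependency are my assumptions). The only change from the confidence interval for the expected response is the extra $+1$ under the square root, accounting for the new noise $\varepsilon^*$.

```python
import numpy as np
from scipy.stats import t


def prediction_interval(X, y, x_star, alpha=0.05):
    """Illustrative sketch: 100(1-alpha)% prediction interval for Y* at x_star."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma_tilde = np.sqrt(resid @ resid / (n - p))
    tau2 = x_star @ XtX_inv @ x_star
    centre = x_star @ beta_hat
    # The "+ 1" is the wobble epsilon* about the regression line.
    half = sigma_tilde * np.sqrt(tau2 + 1) * t.ppf(1 - alpha / 2, n - p)
    return centre - half, centre + half
```

Because $\sqrt{\tau^2 + 1} > \tau$, this interval is always wider than the corresponding confidence interval for the expected response.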
Example. A 95% prediction interval for $Y^*$ at $x^{*T} = (1, 50 - \bar{x})$ is
\[
x^{*T}\hat{\beta} \pm \tilde{\sigma}\sqrt{\tau^2 + 1}\, t_{n-p}\left(\tfrac{\alpha}{2}\right) = 808.5 \pm 55.6 \times 1.02 \times 2.07 = (691.1, 925.8).
\]
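The half-width can be verified with the example's own figures ($\tilde{\sigma} = 55.6$, $\tau^2 = 0.044$, $t$-quantile $2.07$); the result agrees with the quoted $\pm$ term up to the rounding used in the notes.

```python
import math

# Prediction-interval half-width for the 2-mile example.
sigma_tilde = 55.6
tau2 = 0.044              # tau^2 computed earlier in the example
t_crit = 2.07             # t_{22} quantile at alpha/2 = 0.025
half_width = sigma_tilde * math.sqrt(tau2 + 1) * t_crit
```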
Note that this is much wider than the confidence interval for the expected response! This is because there are three sources of uncertainty: we don't know what $\sigma$ is, what $\hat{b}$ is, and there is the random fluctuation $\varepsilon^*$.
Example. Wafer example continued: Suppose we wish to estimate the expected resistivity of a new wafer in the first instrument. Here $x^{*T} = (1, 0, \cdots, 0)$ (recall that $x$ is an indicator vector indicating which instrument is used).
The estimated response at $x^*$ is
\[
x^{*T}\hat{\mu} = \hat{\mu}_1 = \bar{y}_1 = 124.3.
\]
We find
\[
\tau^2 = x^{*T}(X^TX)^{-1}x^* = \frac{1}{5}.
\]
So a 95% confidence interval for $\mathbb{E}[Y_1^*]$ is
\[
x^{*T}\hat{\mu} \pm \tilde{\sigma}\tau\, t_{n-p}\left(\tfrac{\alpha}{2}\right) = 124.3 \pm \frac{10.4}{\sqrt{5}} \times 2.09 = (114.6, 134.0).
\]
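A quick numerical check of this interval, using the quoted figures ($\bar{y}_1 = 124.3$, pooled $\tilde{\sigma} = 10.4$, $\tau^2 = 1/5$, and $t$-quantile $2.09$, which corresponds to the $n - p = 20$ degrees of freedom of the wafer fit):

```python
import math

y1_bar = 124.3
sigma_tilde = 10.4      # pooled estimate of sigma from all five instruments
t_crit = 2.09           # t_{20} quantile at alpha/2 = 0.025
half_width = sigma_tilde / math.sqrt(5) * t_crit
lo, hi = y1_bar - half_width, y1_bar + half_width
```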
Note that we are using an estimate of $\sigma$ obtained from all five instruments. If we had only used the data from the first instrument, $\sigma$ would be estimated as
\[
\tilde{\sigma}_1 = \sqrt{\frac{\sum_{j=1}^{5} (y_{1,j} - \bar{y}_1)^2}{5 - 1}} = 8.74.
\]
The observed 95% confidence interval for $\mu_1$ would have been
\[
\bar{y}_1 \pm \frac{\tilde{\sigma}_1}{\sqrt{5}}\, t_4\left(\tfrac{\alpha}{2}\right) = 124.3 \pm 3.91 \times 2.78 = (113.5, 135.1),
\]
which is slightly wider. Usually it is much wider, but in this special case we see little difference, since the data from the first instrument are relatively tighter than the others.
A 95% prediction interval for $Y_1^*$ at $x^{*T} = (1, 0, \cdots, 0)$ is
\[
x^{*T}\hat{\mu} \pm \tilde{\sigma}\sqrt{\tau^2 + 1}\, t_{n-p}\left(\tfrac{\alpha}{2}\right) = 124.3 \pm 10.42 \times 1.1 \times 2.07 = (100.5, 148.1).
\]