IB Statistics - Linear models

3 Linear models

3.7 Expected response at $x^*$

After performing the linear regression, we can now make predictions from it.

Suppose that $x^*$ is a new vector of values for the explanatory variables.

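The expected response at $x^*$ is $\mathbb{E}[Y \mid x^*] = x^{*T}\beta$. We estimate this by $x^{*T}\hat{\beta}$. Then we have
\[
  x^{*T}(\hat{\beta} - \beta) \sim N(0, x^{*T} \operatorname{cov}(\hat{\beta}) x^*) = N(0, \sigma^2 x^{*T}(X^TX)^{-1}x^*).
\]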

Let $\tau^2 = x^{*T}(X^TX)^{-1}x^*$. Then
\[
  \frac{x^{*T}(\hat{\beta} - \beta)}{\tilde{\sigma}\tau} \sim t_{n-p}.
\]

Then a $100(1 - \alpha)\%$ confidence interval for the expected response $x^{*T}\beta$ has end points
\[
  x^{*T}\hat{\beta} \pm \tilde{\sigma}\tau\, t_{n-p}\left(\frac{\alpha}{2}\right).
\]
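
The recipe above can be sketched in code. This is a minimal illustration, not the notes' own implementation: it assumes simple linear regression in the centred form $y = a' + b(x - \bar{x}) + \varepsilon$ (so $p = 2$), the function name `mean_response_interval` and the data are hypothetical, and the critical value $t_{n-p}(\alpha/2)$ is passed in as a number looked up from tables.

```python
import math

def mean_response_interval(xs, ys, x_new, t_crit):
    """Confidence interval for the expected response at x_new.

    Assumes the centred simple model y = a' + b (x - xbar) + eps, so p = 2.
    t_crit is the upper alpha/2 point of t_{n-2}, supplied by the caller.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b_hat = sxy / sxx          # least squares slope
    a_hat = ybar               # intercept of the centred model
    rss = sum((y - a_hat - b_hat * (x - xbar)) ** 2 for x, y in zip(xs, ys))
    sigma_tilde = math.sqrt(rss / (n - 2))              # RSS / (n - p)
    tau = math.sqrt(1 / n + (x_new - xbar) ** 2 / sxx)  # x*^T (X^T X)^{-1} x*
    fitted = a_hat + b_hat * (x_new - xbar)             # x*^T beta_hat
    half_width = sigma_tilde * tau * t_crit
    return fitted - half_width, fitted + half_width

# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
# n - p = 3 here; t_3(0.025) is approximately 3.182 from tables.
lo, hi = mean_response_interval(xs, ys, x_new=3.0, t_crit=3.182)
print(lo, hi)
```

The interval is narrowest at `x_new = xbar`, where $\tau^2$ reduces to $1/n$, and widens as `x_new` moves away from the centre of the data.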





Example. Previous example continued:

Suppose we wish to estimate the time to run 2 miles for a man with an oxygen take-up measurement of 50. Here $x^{*T} = (1, 50 - \bar{x})$, where $\bar{x} = 48.6$.

The estimated expected response at $x^{*T}$ is
\[
  x^{*T}\hat{\beta} = \hat{a}' + (50 - 48.6) \times \hat{b} = 826.5 - 1.4 \times 12.9 = 808.5,
\]
which is obtained by plugging $x^{*T}$ into our fitted line.

We find
\[
  \tau^2 = x^{*T}(X^TX)^{-1}x^* = \frac{1}{n} + \frac{x^{*2}}{S_{xx}} = \frac{1}{24} + \frac{1.4^2}{783.5} = 0.044 = 0.21^2.
\]
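
The value of $\tau^2$ follows from the structure of the design matrix. As a sketch, assume the earlier example used the centred model $Y_i = a' + b(x_i - \bar{x}) + \varepsilon_i$, which the notation $\hat{a}'$ suggests but which is not reproduced in this section. Write $u = 50 - \bar{x} = 1.4$ for the second component of $x^*$ (the symbol $u$ is introduced here only for clarity). The two columns of $X$ are orthogonal because $\sum_i (x_i - \bar{x}) = 0$, so:

```latex
\[
  X^T X = \begin{pmatrix} n & 0 \\ 0 & S_{xx} \end{pmatrix},
  \qquad
  S_{xx} = \sum_{i=1}^n (x_i - \bar{x})^2,
\]
\[
  \tau^2
  = \begin{pmatrix} 1 & u \end{pmatrix}
    \begin{pmatrix} 1/n & 0 \\ 0 & 1/S_{xx} \end{pmatrix}
    \begin{pmatrix} 1 \\ u \end{pmatrix}
  = \frac{1}{n} + \frac{u^2}{S_{xx}}.
\]
```

With $S_{xx} = 783.5$, $u = 1.4$ and $n = 24$ (inferred from the 22 degrees of freedom behind the quoted value $2.07$ with $p = 2$), this gives $\tau^2 \approx 0.044$.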

So a 95% confidence interval for $\mathbb{E}[Y \mid x^* = 50 - \bar{x}]$ is
\[
  x^{*T}\hat{\beta} \pm \tilde{\sigma}\tau\, t_{n-p}\left(\frac{\alpha}{2}\right) = 808.5 \pm 55.6 \times 0.21 \times 2.07 = (783.6, 832.2).
\]

Note that this is a confidence interval for the expected response at $x^*$, NOT an interval for the value actually observed there.

The predicted response at $x^*$ is $Y^* = x^{*T}\beta + \varepsilon^*$, where $\varepsilon^* \sim N(0, \sigma^2)$, and $Y^*$ is independent of $Y_1, \cdots, Y_n$. Here we have more uncertainties in our prediction: $\beta$ and $\varepsilon^*$.

A $100(1 - \alpha)\%$ prediction interval for $Y^*$ is an interval $I(\mathbf{Y})$ such that $\mathbb{P}(Y^* \in I(\mathbf{Y})) = 1 - \alpha$, where the probability is over the joint distribution of $Y^*, Y_1, \cdots, Y_n$. So $I$ is a random function of the past data $\mathbf{Y}$ that outputs an interval.

First of all, as above, the predicted expected response is $\hat{Y}^* = x^{*T}\hat{\beta}$. This is an unbiased estimator since $\hat{Y}^* - Y^* = x^{*T}(\hat{\beta} - \beta) - \varepsilon^*$, and hence
\[
  \mathbb{E}[\hat{Y}^* - Y^*] = x^{*T}(\beta - \beta) = 0.
\]

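To find the variance, we use the fact that $x^{*T}(\hat{\beta} - \beta)$ and $\varepsilon^*$ are independent (since $\hat{\beta}$ depends only on $Y_1, \cdots, Y_n$), and the variance of the sum of independent variables is the sum of the variances. So
\[
  \begin{aligned}
    \operatorname{var}(\hat{Y}^* - Y^*) &= \operatorname{var}(x^{*T}\hat{\beta}) + \operatorname{var}(\varepsilon^*)\\
    &= \sigma^2 x^{*T}(X^TX)^{-1}x^* + \sigma^2\\
    &= \sigma^2(\tau^2 + 1).
  \end{aligned}
\]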

We can see this as the uncertainty in the regression line, $\sigma^2\tau^2$, plus the wobble about the regression line, $\sigma^2$. So
\[
  \hat{Y}^* - Y^* \sim N(0, \sigma^2(\tau^2 + 1)).
\]
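
The decomposition $\sigma^2(\tau^2 + 1)$ can be checked numerically. The following is a rough Monte Carlo sketch under stated assumptions: simple linear regression, a made-up fixed design, and made-up true parameters. The function name `simulate_prediction_error_variance` is hypothetical, and the agreement is only approximate because of simulation noise.

```python
import random

def simulate_prediction_error_variance(n_sims=20000, seed=0):
    """Compare the empirical variance of Y_hat* - Y* with sigma^2 (tau^2 + 1)."""
    rng = random.Random(seed)
    xs = [float(i) for i in range(1, 11)]   # hypothetical fixed design
    a, b, sigma = 2.0, 0.5, 1.5             # hypothetical true parameters
    x_new = 12.0                            # hypothetical new point
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    tau2 = 1 / n + (x_new - xbar) ** 2 / sxx

    errors = []
    for _ in range(n_sims):
        # Fresh past data, then a least squares fit.
        ys = [a + b * x + rng.gauss(0, sigma) for x in xs]
        ybar = sum(ys) / n
        b_hat = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
        y_hat_new = ybar + b_hat * (x_new - xbar)
        # Independent new observation at x_new.
        y_new = a + b * x_new + rng.gauss(0, sigma)
        errors.append(y_hat_new - y_new)

    mean = sum(errors) / n_sims
    var = sum((e - mean) ** 2 for e in errors) / (n_sims - 1)
    return mean, var, sigma ** 2 * (tau2 + 1)

mean_err, empirical_var, theoretical_var = simulate_prediction_error_variance()
print(mean_err, empirical_var, theoretical_var)
```

The mean error should be close to zero (unbiasedness) and the empirical variance close to the theoretical value, which includes both the $\sigma^2\tau^2$ term from estimating the line and the $\sigma^2$ term from the new noise $\varepsilon^*$.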

We therefore find that
\[
  \frac{\hat{Y}^* - Y^*}{\tilde{\sigma}\sqrt{\tau^2 + 1}} \sim t_{n-p}.
\]

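So the interval with endpoints
\[
  x^{*T}\hat{\beta} \pm \tilde{\sigma}\sqrt{\tau^2 + 1}\, t_{n-p}\left(\frac{\alpha}{2}\right)
\]
is a $100(1 - \alpha)\%$ prediction interval for $Y^*$ (a 95% interval when $\alpha = 0.05$). We don't call this a confidence interval: confidence intervals are about parameters of the distribution, while a prediction interval is about a future random observation.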

Example. A 95% prediction interval for $Y^*$ at $x^{*T} = (1, 50 - \bar{x})$ is
\[
  x^{*T}\hat{\beta} \pm \tilde{\sigma}\sqrt{\tau^2 + 1}\, t_{n-p}\left(\frac{\alpha}{2}\right) = 808.5 \pm 55.6 \times 1.02 \times 2.07 = (691.1, 925.8).
\]
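
To see how much the extra $+1$ under the square root matters, the two half-widths can be computed side by side. This is a small arithmetic sketch using only the rounded figures quoted in the running example; the variable names are illustrative, and results may differ slightly from the printed intervals because of rounding.

```python
import math

# Figures quoted in the running example (rounded as in the text).
sigma_tilde = 55.6   # estimate of sigma
tau = 0.21           # so tau^2 is about 0.044
t_crit = 2.07        # t_{n-p}(0.025) as quoted

# Confidence interval for the expected response: sigma~ * tau * t
ci_half = sigma_tilde * tau * t_crit
# Prediction interval for a new observation: sigma~ * sqrt(tau^2 + 1) * t
pi_half = sigma_tilde * math.sqrt(tau ** 2 + 1) * t_crit

print(f"confidence interval half-width: {ci_half:.1f}")
print(f"prediction interval half-width: {pi_half:.1f}")
print(f"ratio: {pi_half / ci_half:.2f}")
```

Because $\tau^2 \approx 0.044$ is small compared with 1, almost all of the prediction interval's width comes from the noise in the new observation rather than from uncertainty about the fitted line.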

Note that this is much wider than the confidence interval for the expected response! This is because there are three sources of uncertainty: we don't know what $\sigma$ is, we don't know what $\beta$ is, and there is the random $\varepsilon^*$ fluctuation.

Example. Wafer example continued: Suppose we wish to estimate the expected resistivity of a new wafer in the first instrument. Here $x^{*T} = (1, 0, \cdots, 0)$ (recall that $x$ is an indicator vector to indicate which instrument is used).

The estimated response at $x^{*T}$ is
\[
  x^{*T}\hat{\mu} = \hat{\mu}_1 = \bar{y}_1 = 124.3.
\]

We find
\[
  \tau^2 = x^{*T}(X^TX)^{-1}x^* = \frac{1}{5}.
\]
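
The value $\tau^2 = 1/5$ can be traced to the design matrix. As a sketch, assume the wafer data form a one-way layout with five instruments and five measurements on each. This matches the sum over $j = 1, \cdots, 5$ used for the first instrument and the $n - p = 25 - 5 = 20$ degrees of freedom behind the quoted $2.09$, but the layout itself is stated in the earlier example, not here. Each row of $X$ is then an indicator of one instrument, and:

```latex
\[
  X^T X = 5 I_5,
  \qquad
  (X^T X)^{-1} = \tfrac{1}{5} I_5,
  \qquad
  \tau^2 = x^{*T} (X^T X)^{-1} x^* = \tfrac{1}{5}\, x^{*T} x^* = \tfrac{1}{5},
\]
\[
  \tau = \frac{1}{\sqrt{5}},
  \qquad
  \sqrt{\tau^2 + 1} = \sqrt{1.2} \approx 1.1 .
\]
```

So the confidence interval uses $\tilde{\sigma}/\sqrt{5}$, the usual standard error of a mean of five observations, while the prediction interval uses $\tilde{\sigma}\sqrt{1.2}$.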

So a 95% confidence interval for $\mathbb{E}[Y_1^*]$ is
\[
  x^{*T}\hat{\mu} \pm \tilde{\sigma}\tau\, t_{n-p}\left(\frac{\alpha}{2}\right) = 124.3 \pm \frac{10.4}{\sqrt{5}} \times 2.09 = (114.6, 134.0).
\]

Note that we are using an estimate of $\sigma$ obtained from all five instruments. If we had only used the data from the first instrument, $\sigma$ would be estimated as
\[
  \tilde{\sigma}_1 = \sqrt{\frac{\sum_{j=1}^5 (y_{1,j} - \bar{y}_1)^2}{5 - 1}} = 8.74.
\]

The observed 95% confidence interval for $\mu_1$ would have been
\[
  \bar{y}_1 \pm \frac{\tilde{\sigma}_1}{\sqrt{5}}\, t_4\left(\frac{\alpha}{2}\right) = 124.3 \pm 3.91 \times 2.78 = (113.5, 135.1),
\]
which is slightly wider. Usually it is much wider, but in this special case we see only a small difference, since the data from the first instrument are less spread out than those from the others.
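
The comparison comes down to a trade-off between a smaller estimate of $\sigma$ and a larger $t$ quantile. The following arithmetic sketch uses only the figures quoted above; the variable names are illustrative.

```python
import math

n_per_instrument = 5

# Pooled estimate from all five instruments: n - p = 20 degrees of freedom.
pooled_half = 10.4 / math.sqrt(n_per_instrument) * 2.09

# First instrument only: 5 - 1 = 4 degrees of freedom.
single_half = 8.74 / math.sqrt(n_per_instrument) * 2.78

print(f"pooled half-width:            {pooled_half:.2f}")
print(f"single-instrument half-width: {single_half:.2f}")
```

Although $8.74 < 10.4$, the quantile rises from $2.09$ to $2.78$ when only 4 degrees of freedom are available, and that is enough to make the single-instrument interval the wider of the two.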

A 95% prediction interval for $Y_1^*$ at $x^{*T} = (1, 0, \cdots, 0)$ is
\[
  x^{*T}\hat{\mu} \pm \tilde{\sigma}\sqrt{\tau^2 + 1}\, t_{n-p}\left(\frac{\alpha}{2}\right) = 124.3 \pm 10.42 \times 1.1 \times 2.09 = (100.5, 148.1).
\]