
III Modern Statistical Methods



2 Kernel machines
We are going to start a little bit slowly, and think about our linear model $Y = X\beta^0 + \varepsilon$, where $\mathbb{E}(\varepsilon) = 0$ and $\operatorname{var}(\varepsilon) = \sigma^2 I$. Ordinary least squares gives an unbiased estimator, so let's look at biased estimators.
For a biased estimator $\tilde{\beta}$, we should not study the variance, but the mean squared error
\begin{align*}
\mathbb{E}[(\tilde{\beta} - \beta^0)(\tilde{\beta} - \beta^0)^T]
  &= \mathbb{E}[(\tilde{\beta} - \mathbb{E}\tilde{\beta} + \mathbb{E}\tilde{\beta} - \beta^0)(\tilde{\beta} - \mathbb{E}\tilde{\beta} + \mathbb{E}\tilde{\beta} - \beta^0)^T]\\
  &= \operatorname{var}(\tilde{\beta}) + (\mathbb{E}\tilde{\beta} - \beta^0)(\mathbb{E}\tilde{\beta} - \beta^0)^T,
\end{align*}
where the cross terms vanish because $\mathbb{E}(\tilde{\beta} - \mathbb{E}\tilde{\beta}) = 0$.
The first term is, of course, just the variance, and the second is the squared bias.
So the point is that if we pick a clever biased estimator with a tiny variance,
then this might do better than unbiased estimators with large variance.
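As a quick sanity check of this trade-off, one can simulate the linear model and compare OLS against a deliberately biased shrinkage estimator $c\hat{\beta}_{\mathrm{OLS}}$ with $0 < c < 1$. This is a minimal sketch, not from the notes; the dimensions, noise level, and shrinkage factor below are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo comparison on the model Y = X beta^0 + eps, var(eps) = sigma^2 I:
# OLS is unbiased, while c * beta_hat_OLS is biased but has smaller variance.
# All constants here (n, p, sigma, c, the seed) are illustrative choices.
rng = np.random.default_rng(0)
n, p, sigma = 50, 10, 3.0
beta0 = np.full(p, 0.5)            # true coefficient vector
X = rng.standard_normal((n, p))    # fixed design
c = 0.7                            # shrinkage factor, 0 < c < 1

reps = 2000
mse_ols = mse_shrunk = 0.0
for _ in range(reps):
    y = X @ beta0 + sigma * rng.standard_normal(n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS estimate
    mse_ols += np.sum((beta_hat - beta0) ** 2)           # squared error of OLS
    mse_shrunk += np.sum((c * beta_hat - beta0) ** 2)    # squared error, shrunken

print("OLS MSE:     ", mse_ols / reps)
print("Shrunken MSE:", mse_shrunk / reps)
```

With these settings the shrinkage bias is small relative to the variance it removes, so the biased estimator typically comes out with a lower mean squared error; for other values of $\beta^0$, $\sigma$, or $c$ the comparison can of course go the other way.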