1.2 Mean squared error
Given an estimator, we want to know how good it is. We have just introduced the concept of bias above. However, bias alone is generally not a good measure of the quality of an estimator.
For example, if we run $1000$ random trials $X_1, \cdots, X_{1000}$, we can pick our estimator as $T(X) = X_1$. This is an unbiased estimator, but it is really bad because we have just wasted the data from the other $999$ trials. On the other hand, $T'(X) = 0.01 + \frac{1}{1000}\sum X_i$ is biased (with a bias of $0.01$), but is in general much more trustworthy than $T$.
. In fact, at the end of the section, we will construct
cases where the only possible unbiased estimator is a completely silly estimator
to use.
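To see the difference concretely, here is a minimal simulation sketch in Python. The model (i.i.d. $N(\theta, 1)$ trials) and all parameter values are our own assumptions for illustration; the text itself does not fix a distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0     # assumed true mean; any value works
reps = 10_000   # number of simulated data sets of 1000 trials each

# Assumed model: each trial is N(theta, 1), i.i.d.
X = rng.normal(theta, 1.0, size=(reps, 1000))

T1 = X[:, 0]                  # unbiased but wasteful: uses only X_1
T2 = 0.01 + X.mean(axis=1)    # biased (bias 0.01) but uses all 1000 trials

print(f"T : mean {T1.mean():.3f}, sd {T1.std():.3f}")   # sd about 1
print(f"T': mean {T2.mean():.3f}, sd {T2.std():.3f}")   # sd about 0.03
```

Both estimators centre near the truth (up to the small bias of $T'$), but the spread of $T'$ is smaller by a factor of roughly $\sqrt{1000}$.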
Instead, a commonly used measure is the mean squared error.
Definition (Mean squared error). The mean squared error of an estimator $\hat\theta$ is $\mathbb{E}_\theta[(\hat\theta - \theta)^2]$.
Sometimes, we use the root mean squared error, that is, the square root of the above.
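In computations, the expectation is often approximated by averaging over simulated replications. A minimal sketch (the function names are ours, not standard):

```python
import numpy as np

def empirical_mse(estimates, theta):
    """Monte Carlo approximation of E_theta[(theta_hat - theta)^2]."""
    estimates = np.asarray(estimates, dtype=float)
    return np.mean((estimates - theta) ** 2)

def empirical_rmse(estimates, theta):
    """Root mean squared error: the square root of the MSE."""
    return np.sqrt(empirical_mse(estimates, theta))
```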
We can express the mean squared error in terms of the variance and bias:
\[
\begin{aligned}
\mathbb{E}_\theta[(\hat\theta - \theta)^2]
&= \mathbb{E}_\theta[(\hat\theta - \mathbb{E}_\theta(\hat\theta) + \mathbb{E}_\theta(\hat\theta) - \theta)^2]\\
&= \mathbb{E}_\theta[(\hat\theta - \mathbb{E}_\theta(\hat\theta))^2] + [\mathbb{E}_\theta(\hat\theta) - \theta]^2 + 2\,\mathbb{E}_\theta[\mathbb{E}_\theta(\hat\theta) - \theta]\,\underbrace{\mathbb{E}_\theta[\hat\theta - \mathbb{E}_\theta(\hat\theta)]}_{0}\\
&= \operatorname{var}(\hat\theta) + \operatorname{bias}^2(\hat\theta).
\end{aligned}
\]
If we are aiming for a low mean squared error, sometimes it could be preferable to
have a biased estimator with a lower variance. This is known as the “bias-variance
trade-off”.
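The identity is easy to check numerically. A sketch, using a deliberately biased (shrunk) estimator of a normal mean; the model, shrinkage factor and parameter values are all arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 50, 200_000   # assumed true mean, sample size, replications

X = rng.normal(theta, 1.0, size=(reps, n))
est = 0.9 * X.mean(axis=1)          # deliberately biased: shrinks the mean towards 0

mse = np.mean((est - theta) ** 2)   # empirical E[(theta_hat - theta)^2]
var = est.var()                     # empirical var(theta_hat)
bias = est.mean() - theta           # empirical bias

print(mse, var + bias ** 2)         # the two agree up to Monte Carlo error
```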
For example, suppose $X \sim \mathrm{binomial}(n, \theta)$, where $n$ is given and $\theta$ is to be determined. The standard estimator is $T_U = X/n$, which is unbiased. $T_U$ has variance
\[
\operatorname{var}_\theta(T_U) = \frac{\operatorname{var}_\theta(X)}{n^2} = \frac{\theta(1 - \theta)}{n}.
\]
Hence the mean squared error of the usual estimator is given by
\[
\operatorname{mse}(T_U) = \operatorname{var}_\theta(T_U) + \operatorname{bias}^2(T_U) = \frac{\theta(1 - \theta)}{n}.
\]
Consider an alternative estimator
\[
T_B = \frac{X + 1}{n + 2} = w\,\frac{X}{n} + (1 - w)\,\frac{1}{2},
\]
where $w = n/(n + 2)$. This can be interpreted as a weighted average (by the sample size) of the sample mean and $\frac{1}{2}$. We have
\[
\mathbb{E}_\theta(T_B) - \theta = \frac{n\theta + 1}{n + 2} - \theta = (1 - w)\left(\frac{1}{2} - \theta\right),
\]
so $T_B$ is biased. The variance is given by
\[
\operatorname{var}_\theta(T_B) = \frac{\operatorname{var}_\theta(X)}{(n + 2)^2} = w^2\,\frac{\theta(1 - \theta)}{n}.
\]
Hence the mean squared error is
\[
\operatorname{mse}(T_B) = \operatorname{var}_\theta(T_B) + \operatorname{bias}^2(T_B) = w^2\,\frac{\theta(1 - \theta)}{n} + (1 - w)^2\left(\frac{1}{2} - \theta\right)^2.
\]
We can plot the mean squared error of each estimator for the possible values of $\theta$. Here we plot the case where $n = 10$.
[Plot: mse against $\theta$ for the unbiased estimator $T_U$ and the biased estimator $T_B$, for $0 \leq \theta \leq 1$ and $n = 10$.]
The biased estimator has a smaller MSE unless $\theta$ takes extreme values.
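The closed-form expressions above make this comparison easy to reproduce. A sketch in Python (the grid of $\theta$ values is an arbitrary choice):

```python
import numpy as np

n = 10
w = n / (n + 2)
theta = np.linspace(0, 1, 101)

# Closed-form mean squared errors derived above
mse_U = theta * (1 - theta) / n
mse_B = w**2 * theta * (1 - theta) / n + (1 - w)**2 * (0.5 - theta)**2

# Values of theta for which the biased estimator wins:
better = theta[mse_B < mse_U]
print(better.min(), better.max())   # roughly 0.14 and 0.86 for n = 10
```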
We see that sometimes biased estimators can achieve better mean squared errors. In some cases, not only can unbiased estimators be worse; they can be complete nonsense.
Suppose $X \sim \mathrm{Poisson}(\lambda)$, and for whatever reason, we want to estimate $\theta = [\mathbb{P}(X = 0)]^2 = e^{-2\lambda}$. Then any unbiased estimator $T(X)$ must satisfy $\mathbb{E}_\theta(T(X)) = \theta$, or equivalently,
\[
\mathbb{E}_\lambda(T(X)) = e^{-\lambda}\sum_{x=0}^\infty T(x)\frac{\lambda^x}{x!} = e^{-2\lambda}.
\]
The only function $T$ that can satisfy this equation is $T(X) = (-1)^X$: multiplying both sides by $e^\lambda$ gives $\sum_{x=0}^\infty T(x)\lambda^x/x! = e^{-\lambda} = \sum_{x=0}^\infty (-1)^x \lambda^x/x!$, and comparing coefficients of $\lambda^x$ forces $T(x) = (-1)^x$.
Thus the unbiased estimator would estimate $e^{-2\lambda}$ to be $1$ if $X$ is even, $-1$ if $X$ is odd. This is clearly nonsense.
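One can confirm this numerically: $(-1)^X$ is indeed unbiased for $e^{-2\lambda}$, yet it only ever outputs $\pm 1$. A quick Monte Carlo sketch (the rate $\lambda$ and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 1.5                            # arbitrary Poisson rate
X = rng.poisson(lam, size=1_000_000)

T = (-1.0) ** X                      # the unique unbiased estimator of e^{-2*lam}
print(T.mean(), np.exp(-2 * lam))    # close: unbiasedness holds...
print(np.unique(T))                  # ...but T only takes the values -1.0 and 1.0
```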