1 Estimation

IB Statistics

1.2 Mean squared error

Given an estimator, we want to know how good it is. We have just come up with the concept of the bias above. However, bias alone is generally not a good measure of how good an estimator is.

For example, if we do 1000 random trials $X_1, \cdots, X_{1000}$, we can pick our estimator as $T(X) = X_1$. This is an unbiased estimator, but is really bad because we have just wasted the data from the other 999 trials. On the other hand,
\[
  T'(X) = 0.01 + \frac{1}{1000}\sum X_i
\]
is biased (with a bias of $0.01$), but is in general much more trustworthy than $T$. In fact, at the end of the section, we will construct cases where the only possible unbiased estimator is a completely silly estimator to use.
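We can see this concretely by Monte Carlo. The following sketch (not part of the notes) assumes, for concreteness, that the 1000 trials are i.i.d. Bernoulli($\theta$); the choice $\theta = 0.3$ is arbitrary.

```python
import random

# Compare T(X) = X_1 with T'(X) = 0.01 + (1/1000) * sum(X_i),
# assuming (our assumption, not stated in the notes) that the
# 1000 trials are i.i.d. Bernoulli(theta).
random.seed(0)
theta = 0.3
reps = 2000

t_vals, tp_vals = [], []
for _ in range(reps):
    xs = [1 if random.random() < theta else 0 for _ in range(1000)]
    t_vals.append(xs[0])                   # T(X) = X_1: unbiased, huge variance
    tp_vals.append(0.01 + sum(xs) / 1000)  # T'(X): bias 0.01, tiny variance

def emp_mse(vals, target):
    # Empirical mean squared error of a list of estimates.
    return sum((v - target) ** 2 for v in vals) / len(vals)

# T's mse is about theta*(1 - theta) = 0.21; T''s is about
# 0.01**2 + theta*(1 - theta)/1000, i.e. a few ten-thousandths.
print(emp_mse(t_vals, theta), emp_mse(tp_vals, theta))
```

Despite its bias, $T'$ beats $T$ by several orders of magnitude in mean squared error here.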

Instead, a commonly used measure is the mean squared error.

Definition (Mean squared error). The mean squared error of an estimator $\hat{\theta}$ is $\mathbb{E}_\theta[(\hat{\theta} - \theta)^2]$.

Sometimes, we use the root mean squared error, that is the square root of the above.

We can express the mean squared error in terms of the variance and bias:
\[
\begin{aligned}
\mathbb{E}_\theta[(\hat{\theta} - \theta)^2]
&= \mathbb{E}_\theta[(\hat{\theta} - \mathbb{E}_\theta(\hat{\theta}) + \mathbb{E}_\theta(\hat{\theta}) - \theta)^2]\\
&= \mathbb{E}_\theta[(\hat{\theta} - \mathbb{E}_\theta(\hat{\theta}))^2] + [\mathbb{E}_\theta(\hat{\theta}) - \theta]^2 + 2[\mathbb{E}_\theta(\hat{\theta}) - \theta]\,\underbrace{\mathbb{E}_\theta[\hat{\theta} - \mathbb{E}_\theta(\hat{\theta})]}_{0}\\
&= \operatorname{var}(\hat{\theta}) + \operatorname{bias}^2(\hat{\theta}).
\end{aligned}
\]
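Since the decomposition is an algebraic identity, it also holds exactly for the empirical distribution of any finite list of estimates. A quick sketch (the estimates below are arbitrary illustrative numbers, not data from the notes):

```python
# Check mse = var + bias^2 on the empirical distribution of some
# arbitrary illustrative estimates of theta = 0.5.
theta = 0.5
estimates = [0.41, 0.55, 0.48, 0.62, 0.39, 0.53]

n = len(estimates)
mean = sum(estimates) / n
mse = sum((e - theta) ** 2 for e in estimates) / n
var = sum((e - mean) ** 2 for e in estimates) / n  # population form, matching E_theta
bias = mean - theta

# The two sides agree to floating-point precision.
assert abs(mse - (var + bias ** 2)) < 1e-12
```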

If we are aiming for a low mean squared error, sometimes it could be preferable to

have a biased estimator with a lower variance. This is known as the “bias-variance

trade-off”.

For example, suppose $X \sim \mathrm{binomial}(n, \theta)$, where $n$ is given and $\theta$ is to be determined. The standard estimator is $T_U = X/n$, which is unbiased. $T_U$ has variance
\[
\operatorname{var}_\theta(T_U) = \frac{\operatorname{var}_\theta(X)}{n^2} = \frac{\theta(1-\theta)}{n}.
\]

Hence the mean squared error of the usual estimator is given by
\[
\operatorname{mse}(T_U) = \operatorname{var}_\theta(T_U) + \operatorname{bias}^2(T_U) = \theta(1-\theta)/n.
\]

Consider an alternative estimator
\[
T_B = \frac{X+1}{n+2} = w\,\frac{X}{n} + (1-w)\,\frac{1}{2},
\]
where $w = n/(n+2)$. This can be interpreted to be a weighted average (by the sample size) of the sample mean and $1/2$. We have

\[
\mathbb{E}_\theta(T_B) - \theta = \frac{n\theta + 1}{n+2} - \theta = (1-w)\left(\frac{1}{2} - \theta\right),
\]
so $T_B$ is biased. The variance is given by
\[
\operatorname{var}_\theta(T_B) = \frac{\operatorname{var}_\theta(X)}{(n+2)^2} = w^2\,\frac{\theta(1-\theta)}{n}.
\]

Hence the mean squared error is
\[
\operatorname{mse}(T_B) = \operatorname{var}_\theta(T_B) + \operatorname{bias}^2(T_B) = w^2\,\frac{\theta(1-\theta)}{n} + (1-w)^2\left(\frac{1}{2} - \theta\right)^2.
\]

We can plot the mean squared error of each estimator for possible values of $\theta$. Here we plot the case where $n = 10$.

[Plot: mse against $\theta \in [0, 1]$ for the unbiased and the biased estimator, with the mse axis running from 0 to 0.03.]

This biased estimator has smaller mse unless $\theta$ has extreme values.
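To make the plot's claim concrete, here is a sketch evaluating the two mse formulas above at $n = 10$:

```python
# Evaluate the two mse formulas at n = 10, as in the plot.
n = 10
w = n / (n + 2)

def mse_unbiased(theta):
    return theta * (1 - theta) / n

def mse_biased(theta):
    return w ** 2 * theta * (1 - theta) / n + (1 - w) ** 2 * (0.5 - theta) ** 2

for theta in (0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95):
    better = "biased" if mse_biased(theta) < mse_unbiased(theta) else "unbiased"
    print(f"theta={theta}: {better} estimator wins")
```

With $n = 10$, the biased estimator wins for $\theta$ roughly between $0.14$ and $0.86$, and loses only near the endpoints.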

We see that sometimes biased estimators can give better mean squared errors. In some cases, unbiased estimators are not merely worse; they can be complete nonsense.

Suppose $X \sim \mathrm{Poisson}(\lambda)$, and for whatever reason, we want to estimate $\theta = [\mathbb{P}(X = 0)]^2 = e^{-2\lambda}$. Then any unbiased estimator $T(X)$ must satisfy $\mathbb{E}_\theta(T(X)) = \theta$, or equivalently,
\[
\mathbb{E}_\lambda(T(X)) = e^{-\lambda}\sum_{x=0}^{\infty} T(x)\frac{\lambda^x}{x!} = e^{-2\lambda}.
\]

The only function $T$ that can satisfy this equation is $T(X) = (-1)^X$.

Thus the unbiased estimator would estimate $e^{-2\lambda}$ to be $1$ if $X$ is even, and $-1$ if $X$ is odd. This is clearly nonsense.
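We can confirm the unbiasedness numerically. The sketch below (the Poisson sampler is Knuth's multiplicative algorithm, our choice rather than anything from the notes) shows that $(-1)^X$ averages to $e^{-2\lambda}$, even though every individual estimate is $\pm 1$:

```python
import math
import random

# Check E[(-1)^X] = e^(-2*lambda) for X ~ Poisson(lambda) by simulation.
random.seed(1)
lam = 1.5

def poisson_sample(lam):
    # Knuth's algorithm: multiply uniforms until the product falls below e^-lam.
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p < limit:
            return k
        k += 1

n = 200_000
draws = [poisson_sample(lam) for _ in range(n)]
avg = sum((-1) ** x for x in draws) / n

print(avg)                 # close to exp(-3), i.e. about 0.05
print(math.exp(-2 * lam))  # the target e^(-2*lambda)
# Each individual "estimate" is +1 or -1, which is absurd for a
# quantity that always lies strictly between 0 and 1.
```

The average is indeed near $e^{-2\lambda}$, but no single estimate is anywhere near it, which is exactly the sense in which this unbiased estimator is useless.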