1 Estimation
IB Statistics
1.4 Likelihood
There are many different estimators we can pick, and we have just come up with
some criteria to determine whether an estimator is “good”. However, these do
not give us a systematic way of coming up with an estimator to actually use. In
practice, we often use the maximum likelihood estimator.
Let $X_1, \cdots, X_n$ be random variables with joint pdf/pmf $f_{\mathbf{X}}(\mathbf{x} \mid \theta)$. We observe $\mathbf{X} = \mathbf{x}$.
Definition (Likelihood). For any given $\mathbf{x}$, the likelihood of $\theta$ is $\operatorname{like}(\theta) = f_{\mathbf{X}}(\mathbf{x} \mid \theta)$, regarded as a function of $\theta$. The maximum likelihood estimator (mle) of $\theta$ is an estimator that picks the value of $\theta$ that maximizes $\operatorname{like}(\theta)$.
Often there is no closed form for the mle, and we have to find $\hat\theta$ numerically.
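To make "numerically" concrete, here is a minimal sketch in Python. Everything specific in it is an assumption chosen for illustration and does not come from the notes: the model (iid Gamma with unknown shape $a$ and rate fixed at 1, whose mle has no closed form), the data, and the use of golden-section search, which works here because this log-likelihood is concave in $a$.

```python
import math

def gamma_shape_loglik(a, xs):
    """Log-likelihood of the shape a for iid Gamma(a, rate=1) observations xs."""
    n = len(xs)
    return (a - 1.0) * sum(math.log(x) for x in xs) - sum(xs) - n * math.lgamma(a)

def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)  # left interior point
    d = lo + inv_phi * (hi - lo)  # right interior point
    while hi - lo > tol:
        if f(c) > f(d):
            hi = d  # the maximum lies in [lo, d]
        else:
            lo = c  # the maximum lies in [c, hi]
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2.0

xs = [0.8, 1.9, 2.4, 3.1, 1.2, 2.7]  # made-up illustrative data
a_hat = golden_section_max(lambda a: gamma_shape_loglik(a, xs), 0.01, 50.0)
print(a_hat)
```

The search interval $[0.01, 50]$ is also an arbitrary choice; for a likelihood that is not unimodal, a method like this may return only a local maximum.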
When we can find the mle explicitly, in practice, we often maximize the loglikelihood instead of the likelihood. In particular, if $X_1, \cdots, X_n$ are iid, each with pdf/pmf $f_X(x \mid \theta)$, then
\[
  \operatorname{like}(\theta) = \prod_{i=1}^n f_X(x_i \mid \theta), \quad
  \log \operatorname{like}(\theta) = \sum_{i=1}^n \log f_X(x_i \mid \theta).
\]
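Besides being easier to differentiate, the sum form also behaves better in floating-point arithmetic. The following Python sketch illustrates this with made-up data (2000 Bernoulli observations evaluated at $p = 0.5$, both assumptions for illustration): the product of many small factors underflows to zero, while the sum of logs stays finite.

```python
import math

def bernoulli_pmf(x, p):
    """P(X = x) for X ~ Bernoulli(p), with x in {0, 1}."""
    return p if x == 1 else 1.0 - p

xs = [1, 0] * 1000  # 2000 made-up observations
p = 0.5

# Likelihood as a product of 2000 factors of 0.5: underflows in double precision.
likelihood = 1.0
for x in xs:
    likelihood *= bernoulli_pmf(x, p)

# Log-likelihood as a sum of logs: remains a finite, usable number.
log_likelihood = sum(math.log(bernoulli_pmf(x, p)) for x in xs)

print(likelihood, log_likelihood)
```

Since $\log$ is increasing, the maximizer is the same for both functions, so nothing is lost by working with the sum.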
Example. Let $X_1, \cdots, X_n$ be iid Bernoulli($p$). Then
\[
  l(p) = \log \operatorname{like}(p) = \left(\sum x_i\right) \log p + \left(n - \sum x_i\right) \log(1 - p).
\]
Thus
\[
  \frac{dl}{dp} = \frac{\sum x_i}{p} - \frac{n - \sum x_i}{1 - p}.
\]
This is zero when $p = \sum x_i / n$. So this is the maximum likelihood estimator (and is unbiased).
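As a sanity check on the Bernoulli calculation, the following Python sketch compares the closed-form estimate $\sum x_i / n$ with a brute-force search over a grid of values of $p$. The sample and the grid spacing of $0.001$ are made up for illustration.

```python
import math

def bernoulli_loglik(p, xs):
    """l(p) = (sum x_i) log p + (n - sum x_i) log(1 - p)."""
    s = sum(xs)
    n = len(xs)
    return s * math.log(p) + (n - s) * math.log(1.0 - p)

xs = [1, 0, 0, 1, 1, 0, 1, 1]  # made-up sample
p_hat = sum(xs) / len(xs)      # closed-form mle: the sample mean

# Brute-force check: evaluate l(p) on a grid in (0, 1) and take the best point.
grid = [i / 1000 for i in range(1, 1000)]
best_on_grid = max(grid, key=lambda p: bernoulli_loglik(p, xs))

print(p_hat, best_on_grid)
```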
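Example. Let $X_1, \cdots, X_n$ be iid $N(\mu, \sigma^2)$, and we want to estimate $\theta = (\mu, \sigma^2)$. Then
\[
  l(\mu, \sigma^2) = \log \operatorname{like}(\mu, \sigma^2) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} \sum (x_i - \mu)^2.
\]
This is maximized when
\[
  \frac{\partial l}{\partial \mu} = \frac{\partial l}{\partial \sigma^2} = 0.
\]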
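We have
\[
  \frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2} \sum (x_i - \mu), \quad
  \frac{\partial l}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum (x_i - \mu)^2.
\]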
So the solution, and hence the maximum likelihood estimator, is $(\hat\mu, \hat\sigma^2) = (\bar{x}, S_{xx}/n)$, where $\bar{x} = \frac{1}{n} \sum x_i$ and $S_{xx} = \sum (x_i - \bar{x})^2$.
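The closed form can be checked numerically. The Python sketch below computes $(\bar{x}, S_{xx}/n)$ for a made-up sample and evaluates both partial derivatives of $l(\mu, \sigma^2)$ at that point; the function names and the data are illustrative assumptions.

```python
def normal_mle(xs):
    """Return (mu_hat, sigma2_hat) = (x_bar, S_xx / n) for an iid normal sample."""
    n = len(xs)
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return x_bar, s_xx / n

def score(mu, sigma2, xs):
    """Partial derivatives of the normal log-likelihood w.r.t. mu and sigma^2."""
    n = len(xs)
    d_mu = sum(x - mu for x in xs) / sigma2
    d_sigma2 = -n / (2 * sigma2) + sum((x - mu) ** 2 for x in xs) / (2 * sigma2 ** 2)
    return d_mu, d_sigma2

xs = [4.1, 5.3, 3.8, 6.0, 5.1]  # made-up sample
mu_hat, sigma2_hat = normal_mle(xs)
d_mu, d_sigma2 = score(mu_hat, sigma2_hat, xs)

# Both derivatives should vanish at the mle, up to rounding error.
print(mu_hat, sigma2_hat, d_mu, d_sigma2)
```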
We shall see later that $S_{XX}/\sigma^2 = \frac{n\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-1}$, and so $E(\hat\sigma^2) = \frac{(n-1)\sigma^2}{n}$, i.e. $\hat\sigma^2$ is biased.
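The expectation quoted here follows in one line from the mean of a chi-squared distribution. As a sketch, taking the distributional result $n\hat\sigma^2/\sigma^2 \sim \chi^2_{n-1}$ as given and using the standard fact that a $\chi^2_k$ random variable has mean $k$:

```latex
\begin{align*}
  E(\hat\sigma^2)
    &= \frac{\sigma^2}{n} \, E\left(\frac{n\hat\sigma^2}{\sigma^2}\right)
     = \frac{\sigma^2}{n} \, (n - 1)
     = \frac{(n - 1)\sigma^2}{n}.
\end{align*}
```

So the bias is $-\sigma^2/n$, which tends to $0$ as $n \to \infty$.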
Example (German tank problem). Suppose the American army discovers some German tanks that are sequentially numbered, i.e. the first tank is numbered 1, the second is numbered 2, etc. If $\theta$ tanks are produced, then the probability distribution of the tank number is $U(0, \theta)$ (treating the number as a continuous quantity). Suppose we have discovered $n$ tanks whose numbers are $x_1, x_2, \cdots, x_n$, and we want to estimate $\theta$, the total number of tanks produced. We want to find the maximum likelihood estimator.
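Then
\[
  \operatorname{like}(\theta) = \frac{1}{\theta^n} 1_{[\max x_i \leq \theta]} 1_{[\min x_i \geq 0]}.
\]
So for $\theta \geq \max x_i$, $\operatorname{like}(\theta) = 1/\theta^n$ and is decreasing as $\theta$ increases, while for $\theta < \max x_i$, $\operatorname{like}(\theta) = 0$. Hence the value $\hat\theta = \max x_i$ maximizes the likelihood.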
Is $\hat\theta$ unbiased? First we need to find the distribution of $\hat\theta$. For $0 \leq t \leq \theta$, the cumulative distribution function of $\hat\theta$ is
\[
  F_{\hat\theta}(t) = P(\hat\theta \leq t) = P(X_i \leq t \text{ for all } i) = (P(X_i \leq t))^n = \left(\frac{t}{\theta}\right)^n.
\]
Differentiating with respect to $t$, we find the pdf $f_{\hat\theta}(t) = \frac{n t^{n-1}}{\theta^n}$. Hence
\[
  E(\hat\theta) = \int_0^\theta t \cdot \frac{n t^{n-1}}{\theta^n} \, dt = \frac{n\theta}{n + 1}.
\]
So $\hat\theta$ is biased, but asymptotically unbiased.
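The bias of $\hat\theta = \max x_i$ can also be seen in a small simulation. The Python sketch below draws repeated samples from $U(0, \theta)$ and compares the average of the sample maximum with $n\theta/(n+1)$. The values $\theta = 100$, $n = 5$, the number of trials and the seed are arbitrary choices for illustration.

```python
import random

def simulate_mle_mean(theta, n, trials, seed=0):
    """Average of max(x_1, ..., x_n) over repeated iid U(0, theta) samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(trials):
        total += max(rng.uniform(0.0, theta) for _ in range(n))
    return total / trials

theta, n = 100.0, 5
empirical_mean = simulate_mle_mean(theta, n, trials=100_000)
theoretical_mean = n * theta / (n + 1)  # E(theta_hat) = n * theta / (n + 1)

print(empirical_mean, theoretical_mean)
```

The empirical average should sit close to $n\theta/(n+1)$ and clearly below $\theta$, since the sample maximum can never exceed the true value.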
Example. Smarties come in $k$ equally frequent colours, but suppose we do not know $k$. Assume that we sample with replacement (although this is unhygienic).
Our first four Smarties are Red, Purple, Red, Yellow. Then
\begin{align*}
  \operatorname{like}(k) &= P_k(\text{1st is a new colour}) P_k(\text{2nd is a new colour}) P_k(\text{3rd matches 1st}) P_k(\text{4th is a new colour}) \\
  &= 1 \times \frac{k - 1}{k} \times \frac{1}{k} \times \frac{k - 2}{k} \\
  &= \frac{(k - 1)(k - 2)}{k^3}.
\end{align*}
The maximum likelihood estimate is $\hat{k} = 5$ (by trial and error), even though it is not much likelier than the neighbouring values of $k$.
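The trial and error can be written out as a short Python sketch that evaluates $(k-1)(k-2)/k^3$ exactly for a range of candidate values. The search range $3 \leq k \leq 20$ is an assumption made for illustration; the lower bound reflects the three distinct colours already seen.

```python
from fractions import Fraction

def smarties_likelihood(k):
    """like(k) = (k - 1)(k - 2) / k^3, as an exact fraction."""
    return Fraction((k - 1) * (k - 2), k ** 3)

candidates = range(3, 21)  # at least 3 colours were observed
likelihoods = {k: smarties_likelihood(k) for k in candidates}
k_hat = max(likelihoods, key=likelihoods.get)

for k in (4, 5, 6):
    print(k, likelihoods[k], float(likelihoods[k]))
print("mle:", k_hat)
```

For $k = 4, 5, 6$ the likelihoods are $3/32 \approx 0.094$, $12/125 = 0.096$ and $5/54 \approx 0.093$, which shows how flat the likelihood is around its maximum.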
How does the mle relate to sufficient statistics? Suppose that $T$ is sufficient for $\theta$. Then the likelihood is $g(T(\mathbf{x}), \theta) h(\mathbf{x})$, which depends on $\theta$ only through $T(\mathbf{x})$. To maximize this as a function of $\theta$, we only need to maximize $g$. So the mle $\hat\theta$ is a function of the sufficient statistic.
Note that if $\phi = h(\theta)$ with $h$ injective, then the mle of $\phi$ is given by $h(\hat\theta)$. For example, if the mle of the standard deviation $\sigma$ is $\hat\sigma$, then the mle of the variance $\sigma^2$ is $\hat\sigma^2$. This is rather useful in practice, since we can use this to simplify a lot of computations.
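A small numerical illustration of this invariance, as a Python sketch under assumptions that are not in the notes: the data are made up, the model is $N(0, \sigma^2)$ with known mean $0$, and the maximization is a crude grid search. The likelihood is maximized once over $\sigma$ and once over the variance $v = \sigma^2$, and the two answers are compared.

```python
import math

xs = [1.2, -0.7, 0.3, 2.1, -1.5]  # made-up sample, modelled as N(0, sigma^2)
n = len(xs)
ss = sum(x * x for x in xs)

def loglik_sigma(sigma):
    """Log-likelihood in terms of the standard deviation (constants dropped)."""
    return -n * math.log(sigma) - ss / (2 * sigma ** 2)

def loglik_variance(v):
    """The same log-likelihood, reparametrized by the variance v = sigma^2."""
    return -0.5 * n * math.log(v) - ss / (2 * v)

# Grid search over sigma, and over the image of the same grid under v = sigma^2.
sigma_grid = [i / 1000 for i in range(1, 5001)]
sigma_hat = max(sigma_grid, key=loglik_sigma)
variance_grid = [s * s for s in sigma_grid]
variance_hat = max(variance_grid, key=loglik_variance)

print(sigma_hat, variance_hat, sigma_hat * sigma_hat)
```

Because squaring is injective on $\sigma > 0$, the two searches pick out the same point, so `variance_hat` agrees with `sigma_hat` squared.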