1.4 Likelihood
There are many different estimators we can pick, and we have just come up with
some criteria to determine whether an estimator is “good”. However, these do
not give us a systematic way of coming up with an estimator to actually use. In
practice, we often use the maximum likelihood estimator.
Let $X_1, \cdots, X_n$ be random variables with joint pdf/pmf $f_{\mathbf{X}}(\mathbf{x} \mid \theta)$. We observe $\mathbf{X} = \mathbf{x}$.
Definition (Likelihood). For any given $\mathbf{x}$, the likelihood of $\theta$ is $\operatorname{like}(\theta) = f_{\mathbf{X}}(\mathbf{x} \mid \theta)$, regarded as a function of $\theta$. The maximum likelihood estimator (mle) of $\theta$ is an estimator that picks the value of $\theta$ that maximizes $\operatorname{like}(\theta)$.
Often there is no closed form for the mle, and we have to find $\hat\theta$ numerically.
When we can find the mle explicitly, in practice we often maximize the log-likelihood instead of the likelihood. In particular, if $X_1, \cdots, X_n$ are iid, each with pdf/pmf $f_X(x \mid \theta)$, then
\[
  \operatorname{like}(\theta) = \prod_{i=1}^n f_X(x_i \mid \theta), \qquad \log \operatorname{like}(\theta) = \sum_{i=1}^n \log f_X(x_i \mid \theta).
\]
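When no closed form is available, the log-likelihood can be maximized numerically. The following is an illustrative sketch (not part of the notes): it assumes scipy is available and uses the shape parameter of a Gamma distribution purely as an example of a parameter with no closed-form mle.

```python
# Sketch: numerical maximum likelihood when there is no closed form.
# Assumption (illustrative): data from Gamma(shape=a, scale=1), and we
# estimate a by minimizing the negative log-likelihood.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import gamma

rng = np.random.default_rng(0)
x = rng.gamma(shape=3.0, scale=1.0, size=1000)   # simulated data, true shape 3

def neg_log_like(a):
    # minus the log-likelihood, so minimizing it maximizes like(a)
    return -np.sum(gamma.logpdf(x, a))

res = minimize_scalar(neg_log_like, bounds=(0.1, 20.0), method="bounded")
print("numerical mle of the shape:", res.x)      # close to 3
```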
Example. Let $X_1, \cdots, X_n$ be iid $\mathrm{Bernoulli}(p)$. Then
\[
  l(p) = \log \operatorname{like}(p) = \sum x_i \log p + \left(n - \sum x_i\right) \log(1 - p).
\]
Thus
\[
  \frac{\mathrm{d}l}{\mathrm{d}p} = \frac{\sum x_i}{p} - \frac{n - \sum x_i}{1 - p}.
\]
This is zero when $p = \sum x_i / n$. So this is the maximum likelihood estimator (and is unbiased).
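A quick numerical check of this calculation (an illustrative sketch, not part of the notes): evaluating the Bernoulli log-likelihood on a grid shows that it is maximized at the sample mean.

```python
# Sketch: the Bernoulli log-likelihood is maximized at the sample mean.
import numpy as np

rng = np.random.default_rng(1)
x = rng.binomial(1, 0.3, size=200)               # iid Bernoulli(0.3) sample

def log_like(p):
    return np.sum(x) * np.log(p) + (len(x) - np.sum(x)) * np.log(1 - p)

grid = np.linspace(0.01, 0.99, 999)
p_hat = grid[np.argmax(log_like(grid))]
print(p_hat, x.mean())                           # both close to 0.3
```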
Example. Let $X_1, \cdots, X_n$ be iid $N(\mu, \sigma^2)$, and we want to estimate $\theta = (\mu, \sigma^2)$. Then
\[
  l(\mu, \sigma^2) = \log \operatorname{like}(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum (x_i - \mu)^2.
\]
This is maximized when
\[
  \frac{\partial l}{\partial \mu} = \frac{\partial l}{\partial \sigma^2} = 0.
\]
We have
\[
  \frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2}\sum (x_i - \mu), \qquad \frac{\partial l}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum (x_i - \mu)^2.
\]
So the solution, and hence the maximum likelihood estimator, is $(\hat\mu, \hat\sigma^2) = (\bar x, S_{xx}/n)$, where $\bar x = \frac{1}{n}\sum x_i$ and $S_{xx} = \sum (x_i - \bar x)^2$.

We shall see later that $\frac{S_{XX}}{\sigma^2} = \frac{n\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-1}$, and so $E(\hat\sigma^2) = \frac{(n-1)\sigma^2}{n}$, i.e. $\hat\sigma^2$ is biased.
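The bias can also be seen in a small simulation (an illustrative sketch, not part of the notes): averaging $\hat\sigma^2 = S_{xx}/n$ over many samples gives roughly $(n-1)\sigma^2/n$ rather than $\sigma^2$.

```python
# Sketch: the mle of the variance, Sxx/n, is biased downwards by a
# factor (n-1)/n.
import numpy as np

rng = np.random.default_rng(2)
n, sigma2, reps = 5, 4.0, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
sigma2_hat = samples.var(axis=1)                 # ddof=0, i.e. Sxx/n
print(sigma2_hat.mean(), (n - 1) / n * sigma2)   # both close to 3.2
```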
Example (German tank problem). Suppose the American army discovers some German tanks that are sequentially numbered, i.e. the first tank is numbered 1, the second is numbered 2, etc. If $\theta$ tanks are produced, then the probability distribution of the tank number is $U(0, \theta)$. Suppose we have discovered $n$ tanks whose numbers are $x_1, x_2, \cdots, x_n$, and we want to estimate $\theta$, the total number of tanks produced. We want to find the maximum likelihood estimator.
Then
\[
  \operatorname{like}(\theta) = \frac{1}{\theta^n}\, 1[\max x_i \le \theta]\, 1[\min x_i \ge 0].
\]
So for $\theta \ge \max x_i$, $\operatorname{like}(\theta) = \frac{1}{\theta^n}$, which is decreasing as $\theta$ increases, while for $\theta < \max x_i$, $\operatorname{like}(\theta) = 0$. Hence the value $\hat\theta = \max x_i$ maximizes the likelihood.
Is $\hat\theta$ unbiased? First we need to find the distribution of $\hat\theta$. For $0 \le t \le \theta$, the cumulative distribution function of $\hat\theta$ is
\[
  F_{\hat\theta}(t) = P(\hat\theta \le t) = P(X_i \le t \text{ for all } i) = (P(X_i \le t))^n = \left(\frac{t}{\theta}\right)^n.
\]
Differentiating with respect to $t$, we find the pdf $f_{\hat\theta}(t) = \frac{n t^{n-1}}{\theta^n}$. Hence
\[
  E(\hat\theta) = \int_0^\theta t \cdot \frac{n t^{n-1}}{\theta^n}\,\mathrm{d}t = \frac{n\theta}{n+1}.
\]
So $\hat\theta$ is biased, but asymptotically unbiased.
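A short simulation (an illustrative sketch, not part of the notes) confirms that $\hat\theta = \max x_i$ underestimates $\theta$ on average by a factor $n/(n+1)$, and that the bias vanishes as $n$ grows.

```python
# Sketch: E(max x_i) = nθ/(n+1) for a U(0, θ) sample of size n,
# so the mle is biased but asymptotically unbiased.
import numpy as np

rng = np.random.default_rng(3)
theta, reps = 1000.0, 100_000

for n in (5, 50, 500):
    samples = rng.uniform(0.0, theta, size=(reps, n))
    theta_hat = samples.max(axis=1)
    print(n, theta_hat.mean(), n * theta / (n + 1))   # simulated vs exact mean
```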
Example. Smarties come in $k$ equally frequent colours, but suppose we do not know $k$. Assume that we sample with replacement (although this is unhygienic). Our first four Smarties are Red, Purple, Red, Yellow. Then
\begin{align*}
  \operatorname{like}(k) &= P_k(\text{1st is a new colour})\, P_k(\text{2nd is a new colour})\, P_k(\text{3rd matches 1st})\, P_k(\text{4th is a new colour})\\
  &= 1 \times \frac{k - 1}{k} \times \frac{1}{k} \times \frac{k - 2}{k}\\
  &= \frac{(k - 1)(k - 2)}{k^3}.
\end{align*}
The maximum likelihood estimate is $\hat k = 5$ (by trial and error), even though this value is not much likelier than its neighbours.
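Tabulating the likelihood (an illustrative sketch, not part of the notes) shows both the maximum at $k = 5$ and how flat the likelihood is around it.

```python
# Sketch: like(k) = (k-1)(k-2)/k^3 is maximized at k = 5, but only
# slightly exceeds its neighbours.
def like(k):
    return (k - 1) * (k - 2) / k**3

for k in range(3, 11):
    print(k, round(like(k), 4))
# k = 5 gives 0.096, while k = 4 gives 0.0938 and k = 6 gives 0.0926.
```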
How does the mle relate to sufficient statistics? Suppose that $T$ is sufficient for $\theta$. Then the likelihood is $g(T(\mathbf{x}), \theta) h(\mathbf{x})$, which depends on $\theta$ only through $T(\mathbf{x})$. To maximize this as a function of $\theta$, we only need to maximize $g$. So the mle $\hat\theta$ is a function of the sufficient statistic.
Note that if $\phi = h(\theta)$ with $h$ injective, then the mle of $\phi$ is given by $h(\hat\theta)$. For example, if the mle of the standard deviation $\sigma$ is $\hat\sigma$, then the mle of the variance $\sigma^2$ is $\hat\sigma^2$. This is rather useful in practice, since we can use this to simplify a lot of computations.