2 Hypothesis testing

IB Statistics

2.6 Student’s t-distribution

Definition ($t$-distribution). Suppose that $Z$ and $Y$ are independent, with $Z \sim N(0, 1)$ and $Y \sim \chi_k^2$. Then
\[
  T = \frac{Z}{\sqrt{Y/k}}
\]
is said to have a $t$-distribution on $k$ degrees of freedom, and we write $T \sim t_k$.
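As a sanity check on this definition, one can simulate $T = Z/\sqrt{Y/k}$ directly and compare a tail probability against SciPy's implementation of $t_k$; the degrees of freedom, sample size, and seed below are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

k = 5                                # degrees of freedom (arbitrary choice)
rng = np.random.default_rng(0)
n = 200_000

Z = rng.standard_normal(n)           # Z ~ N(0, 1)
Y = rng.chisquare(k, size=n)         # Y ~ chi^2_k, independent of Z
T = Z / np.sqrt(Y / k)               # T ~ t_k by the definition above

# The empirical tail probability should agree with scipy.stats.t
emp = np.mean(T > 1.0)
theory = stats.t.sf(1.0, df=k)
print(emp, theory)
```

With this many samples the empirical and theoretical tail probabilities should agree to about two decimal places.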

The density of $t_k$ turns out to be
\[
  f_T(t) = \frac{\Gamma((k+1)/2)}{\Gamma(k/2)} \frac{1}{\sqrt{\pi k}} \left(1 + \frac{t^2}{k}\right)^{-(k+1)/2}.
\]
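The density formula above can be checked numerically against SciPy's built-in $t$ density; the degrees of freedom and the evaluation grid are arbitrary.

```python
import numpy as np
from scipy.special import gamma
from scipy import stats

def t_pdf(t, k):
    """Density of t_k, transcribed from the formula above."""
    return (gamma((k + 1) / 2) / gamma(k / 2)
            * (1 / np.sqrt(np.pi * k))
            * (1 + t**2 / k) ** (-(k + 1) / 2))

ts = np.linspace(-4, 4, 9)
for k in (1, 2, 5, 30):
    assert np.allclose(t_pdf(ts, k), stats.t.pdf(ts, df=k))
```

Note that $k = 1$ recovers the Cauchy density $1/(\pi(1+t^2))$, since $\Gamma(1)/\Gamma(1/2) = 1/\sqrt{\pi}$.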

This density is symmetric, bell-shaped, and has a maximum at $t = 0$, rather like the standard normal density. However, it can be shown that $P(T > t) > P(Z > t)$ for $t > 0$, i.e. the $t$-distribution has a “fatter” tail. Also, as $k \to \infty$, $t_k$ approaches the standard normal distribution.
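Both claims are easy to verify numerically: the upper tail of $t_k$ dominates that of $N(0,1)$, and the gap shrinks as $k$ grows. The cutoff $t = 2$ below is an arbitrary choice.

```python
from scipy import stats

t_cut = 2.0
for k in (1, 5, 30):
    # Fatter upper tail: P(T > t) > P(Z > t) for t > 0
    assert stats.t.sf(t_cut, df=k) > stats.norm.sf(t_cut)

# As k grows, the tail probability approaches the normal one
gaps = [stats.t.sf(t_cut, df=k) - stats.norm.sf(t_cut) for k in (2, 10, 100)]
assert gaps[0] > gaps[1] > gaps[2] > 0
print(gaps)
```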

Proposition. If $k > 1$, then $E(T) = 0$. If $k > 2$, then $\operatorname{var}(T) = \frac{k}{k-2}$. If $k = 2$, then $\operatorname{var}(T) = \infty$.

In all other cases the mean and variance are undefined. In particular, the $k = 1$ case has undefined mean and variance; this is known as the Cauchy distribution.
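These moment formulas agree with what SciPy reports for the $t$ family; the particular values of $k$ tried below are arbitrary.

```python
import numpy as np
from scipy import stats

for k in (3, 5, 10):
    mean, var = stats.t.stats(df=k, moments="mv")
    assert abs(mean) < 1e-12                # E(T) = 0 for k > 1
    assert abs(var - k / (k - 2)) < 1e-12   # var(T) = k/(k-2) for k > 2

# k = 2 gives infinite variance; k = 1 is the Cauchy case,
# whose mean and variance are not finite (SciPy reports inf/nan)
assert np.isinf(stats.t.var(df=2))
assert not np.isfinite(stats.t.mean(df=1))
```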

Notation. We write $t_k(\alpha)$ for the upper $100\alpha\%$ point of the $t_k$ distribution, so that $P(T > t_k(\alpha)) = \alpha$.
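In SciPy terms, $t_k(\alpha)$ is the $(1-\alpha)$-quantile of $t_k$; the helper name `t_upper` and the values $k = 9$, $\alpha = 0.025$ below are illustrative choices.

```python
from scipy import stats

def t_upper(k, alpha):
    """Upper 100*alpha% point t_k(alpha), i.e. P(T > t_k(alpha)) = alpha."""
    return stats.t.ppf(1 - alpha, df=k)    # equivalently stats.t.isf(alpha, df=k)

q = t_upper(9, 0.025)
assert abs(stats.t.sf(q, df=9) - 0.025) < 1e-9
print(q)  # roughly 2.26, the familiar two-sided 95% multiplier for n = 10
```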

Why would we define such a weird distribution? The typical application is

to study random samples with unknown mean and unknown variance.

Let $X_1, \cdots, X_n$ be iid $N(\mu, \sigma^2)$. Then $\bar{X} \sim N(\mu, \sigma^2/n)$. So
\[
  Z = \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \sim N(0, 1).
\]

Also, $S_{XX}/\sigma^2 \sim \chi^2_{n-1}$ and is independent of $\bar{X}$, and hence of $Z$. So
\[
  \frac{\sqrt{n}(\bar{X} - \mu)/\sigma}{\sqrt{S_{XX}/((n-1)\sigma^2)}} \sim t_{n-1},
\]
or
\[
  \frac{\sqrt{n}(\bar{X} - \mu)}{\sqrt{S_{XX}/(n-1)}} \sim t_{n-1}.
\]
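This statistic is exactly what a one-sample $t$ test computes, so it can be checked against `scipy.stats.ttest_1samp`; the sample and the hypothesised mean below are made up for illustration.

```python
import numpy as np
from scipy import stats

x = np.array([4.2, 5.1, 3.8, 4.9, 5.4, 4.4, 5.0, 4.6])  # made-up sample
mu0 = 4.0                         # hypothesised mean (arbitrary)
n = len(x)

xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)
sigma_tilde = np.sqrt(Sxx / (n - 1))          # sqrt of the unbiased estimator

T = np.sqrt(n) * (xbar - mu0) / sigma_tilde   # ~ t_{n-1} when mu = mu0

t_scipy, p = stats.ttest_1samp(x, mu0)
assert abs(T - t_scipy) < 1e-12
print(T, p)
```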

We write $\tilde{\sigma}^2 = \frac{S_{XX}}{n-1}$ (note that this is the unbiased estimator). Then a $100(1-\alpha)\%$ confidence interval for $\mu$ is found from
\[
  1 - \alpha = P\left(-t_{n-1}\left(\frac{\alpha}{2}\right) \le \frac{\sqrt{n}(\bar{X} - \mu)}{\tilde{\sigma}} \le t_{n-1}\left(\frac{\alpha}{2}\right)\right).
\]
This has endpoints
\[
  \bar{X} \pm \frac{\tilde{\sigma}}{\sqrt{n}}\, t_{n-1}\left(\frac{\alpha}{2}\right).
\]
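A short sketch of these endpoints in code, using a made-up sample and $\alpha = 0.05$; the result is compared against SciPy's built-in `t.interval` as a cross-check.

```python
import numpy as np
from scipy import stats

x = np.array([4.2, 5.1, 3.8, 4.9, 5.4, 4.4, 5.0, 4.6])  # made-up sample
alpha = 0.05
n = len(x)

xbar = x.mean()
sigma_tilde = np.sqrt(np.sum((x - xbar) ** 2) / (n - 1))

# Endpoints X-bar +/- (sigma-tilde / sqrt(n)) * t_{n-1}(alpha/2)
half = sigma_tilde / np.sqrt(n) * stats.t.ppf(1 - alpha / 2, df=n - 1)
lo, hi = xbar - half, xbar + half

# Cross-check against SciPy's built-in interval
lo2, hi2 = stats.t.interval(1 - alpha, n - 1,
                            loc=xbar, scale=sigma_tilde / np.sqrt(n))
assert np.isclose(lo, lo2) and np.isclose(hi, hi2)
print(lo, hi)
```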