2 Hypothesis testing
IB Statistics
2.6 Student’s t-distribution
Definition ($t$-distribution). Suppose that $Z$ and $Y$ are independent, $Z \sim N(0, 1)$ and $Y \sim \chi_k^2$. Then
\[
  T = \frac{Z}{\sqrt{Y/k}}
\]
is said to have a $t$-distribution on $k$ degrees of freedom, and we write $T \sim t_k$.
The density of $t_k$ turns out to be
\[
  f_T(t) = \frac{\Gamma((k + 1)/2)}{\Gamma(k/2)} \frac{1}{\sqrt{\pi k}} \left(1 + \frac{t^2}{k}\right)^{-(k+1)/2}.
\]
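As a quick sanity check (not part of the notes themselves), the density formula can be evaluated directly from the Gamma functions and compared against a library implementation; SciPy here is an assumption.

```python
# Sketch: checking the t_k density formula numerically against scipy.stats.
# scipy is an assumption; the notes do not rely on any software.
import math
from scipy.stats import t as t_dist

def t_pdf(x, k):
    """Density of t_k, computed directly from the Gamma-function formula."""
    const = math.gamma((k + 1) / 2) / (math.gamma(k / 2) * math.sqrt(math.pi * k))
    return const * (1 + x**2 / k) ** (-(k + 1) / 2)

# The two agree to floating-point precision for a few (x, k) pairs
for k in (1, 3, 10):
    for x in (-2.0, 0.0, 1.5):
        assert abs(t_pdf(x, k) - t_dist.pdf(x, df=k)) < 1e-12
```

Note that at $k = 1$ the constant collapses to $\Gamma(1)/(\Gamma(1/2)\sqrt{\pi}) = 1/\pi$, giving the Cauchy density $1/(\pi(1 + t^2))$.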
This density is symmetric, bell-shaped, and has a maximum at $t = 0$, rather like the standard normal density. However, it can be shown that $P(T > t) > P(Z > t)$, i.e. the $t$-distribution has a "fatter" tail. Also, as $k \to \infty$, $t_k$ approaches the standard normal distribution.
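Both claims, the fatter tail and the $k \to \infty$ limit, are easy to check numerically; this sketch assumes SciPy's `norm` and `t` distributions.

```python
# Sketch comparing tail probabilities of t_k and N(0, 1); scipy is an
# assumption, and x = 2 is an arbitrary illustrative cutoff.
from scipy.stats import norm, t as t_dist

x = 2.0
tail_z = norm.sf(x)                                   # P(Z > 2)
tails_t = {k: t_dist.sf(x, df=k) for k in (1, 5, 30, 10_000)}

# Every t_k puts more mass beyond x than the standard normal does...
assert all(p > tail_z for p in tails_t.values())
# ...but as k grows the excess vanishes.
assert tails_t[10_000] - tail_z < 1e-4
```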
Proposition. If $k > 1$, then $E_k(T) = 0$. If $k > 2$, then $\operatorname{var}_k(T) = \frac{k}{k - 2}$. If $k = 2$, then $\operatorname{var}_k(T) = \infty$.

In all other cases, the values are undefined. In particular, the $k = 1$ case has undefined mean and variance. This is known as the Cauchy distribution.
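The stated moments can be confirmed against SciPy's built-in moment formulas (an assumption, but the values match the proposition):

```python
# Sketch verifying E(T) = 0 for k > 1 and var(T) = k/(k-2) for k > 2;
# scipy is an assumption here.
import math
from scipy.stats import t as t_dist

assert t_dist.mean(df=3) == 0.0                       # mean is 0 when k > 1
for k in (3, 5, 30):
    assert abs(t_dist.var(df=k) - k / (k - 2)) < 1e-12

# The k = 1 (Cauchy) density at 0 is 1/pi, consistent with the density formula
assert abs(t_dist.pdf(0, df=1) - 1 / math.pi) < 1e-12
```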
Notation. We write $t_k(\alpha)$ for the upper $100\alpha\%$ point of the $t_k$ distribution, so that $P(T > t_k(\alpha)) = \alpha$.
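In software these upper points are inverse survival functions; a minimal sketch, assuming SciPy, with arbitrary illustrative values $\alpha = 0.05$, $k = 9$:

```python
# t_k(alpha) is the upper 100*alpha% point: P(T > t_k(alpha)) = alpha.
# In scipy this is isf, equivalently ppf(1 - alpha); scipy is an assumption.
from scipy.stats import t as t_dist

alpha, k = 0.05, 9
upper = t_dist.isf(alpha, df=k)

assert abs(t_dist.sf(upper, df=k) - alpha) < 1e-12    # P(T > t_k(alpha)) = alpha
assert abs(upper - t_dist.ppf(1 - alpha, df=k)) < 1e-10
```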
Why would we define such a weird distribution? The typical application is
to study random samples with unknown mean and unknown variance.
Let $X_1, \cdots, X_n$ be iid $N(\mu, \sigma^2)$. Then $\bar{X} \sim N(\mu, \sigma^2/n)$. So
\[
  Z = \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \sim N(0, 1).
\]
Also, $S_{XX}/\sigma^2 \sim \chi_{n-1}^2$ and is independent of $\bar{X}$, and hence of $Z$. So
\[
  \frac{\sqrt{n}(\bar{X} - \mu)/\sigma}{\sqrt{S_{XX}/((n - 1)\sigma^2)}} \sim t_{n-1},
\]
or
\[
  \frac{\sqrt{n}(\bar{X} - \mu)}{\sqrt{S_{XX}/(n - 1)}} \sim t_{n-1}.
\]
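This pivotal distribution can be illustrated by simulation: draw many normal samples, form the statistic above for each, and compare the empirical distribution with $t_{n-1}$. A Monte Carlo sketch, assuming NumPy and SciPy; $\mu$, $\sigma$ and $n$ are arbitrary illustrative values.

```python
# Monte Carlo sketch of the pivot: for iid N(mu, sigma^2) samples,
# sqrt(n)(Xbar - mu)/sqrt(S_XX/(n-1)) should follow t_{n-1}.
# numpy/scipy are assumptions, as are the illustrative parameter values.
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)
n, mu, sigma = 8, 3.0, 2.0
samples = rng.normal(mu, sigma, size=(20_000, n))

xbar = samples.mean(axis=1)                            # row means
sxx = ((samples - xbar[:, None]) ** 2).sum(axis=1)     # S_XX per sample
t_stats = np.sqrt(n) * (xbar - mu) / np.sqrt(sxx / (n - 1))

# Kolmogorov-Smirnov distance to the t_{n-1} cdf should be tiny
stat, _ = kstest(t_stats, "t", args=(n - 1,))
```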
We write $\tilde{\sigma}^2 = \frac{S_{XX}}{n - 1}$ (note that this is the unbiased estimator). Then a $100(1 - \alpha)\%$ confidence interval for $\mu$ is found from
\[
  1 - \alpha = P\left(-t_{n-1}\left(\frac{\alpha}{2}\right) \leq \frac{\sqrt{n}(\bar{X} - \mu)}{\tilde{\sigma}} \leq t_{n-1}\left(\frac{\alpha}{2}\right)\right).
\]
This has endpoints
\[
  \bar{X} \pm \frac{\tilde{\sigma}}{\sqrt{n}}\, t_{n-1}\left(\frac{\alpha}{2}\right).
\]
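Putting the pieces together, the interval is straightforward to compute; a minimal sketch assuming SciPy, with `t_confidence_interval` and the tiny data set being hypothetical illustrations, not part of the notes.

```python
# Sketch of the 100(1 - alpha)% t confidence interval for the mean,
# with endpoints Xbar +/- (sigma_tilde/sqrt(n)) * t_{n-1}(alpha/2).
# scipy is an assumption; the function name and data are illustrative.
import math
from scipy.stats import t as t_dist

def t_confidence_interval(xs, alpha=0.05):
    """CI for the mean of an iid normal sample, using sigma_tilde^2 = S_XX/(n-1)."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sigma_tilde = math.sqrt(sxx / (n - 1))
    half_width = sigma_tilde / math.sqrt(n) * t_dist.isf(alpha / 2, df=n - 1)
    return xbar - half_width, xbar + half_width

lo, hi = t_confidence_interval([4.9, 5.1, 5.0, 4.8, 5.2])
```

The interval is symmetric about $\bar{X}$, as the endpoint formula requires.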