2.5 Multivariate normal theory
2.5.1 Multivariate normal distribution
So far, we have only worked with scalar random variables or a vector of iid random
variables. In general, we can have a random (column) vector $X = (X_1, \cdots, X_n)^T$, where the $X_i$ are correlated.
The mean of this vector is given by
\[
  \mu = E[X] = (E(X_1), \cdots, E(X_n))^T = (\mu_1, \cdots, \mu_n)^T.
\]
Instead of just the variance, we have the covariance matrix
\[
  \operatorname{cov}(X) = E[(X - \mu)(X - \mu)^T] = (\operatorname{cov}(X_i, X_j))_{ij},
\]
provided they exist, of course.
We can multiply the vector X by an m × n matrix A. Then we have
\[
  E[AX] = A\mu,
\]
and
\[
  \operatorname{cov}(AX) = A \operatorname{cov}(X) A^T. \tag{$*$}
\]
The last one comes from
\begin{align*}
  \operatorname{cov}(AX) &= E[(AX - E[AX])(AX - E[AX])^T]\\
  &= E[A(X - EX)(X - EX)^T A^T]\\
  &= A E[(X - EX)(X - EX)^T] A^T.
\end{align*}
If we have two random vectors $V, W$, we can define the covariance $\operatorname{cov}(V, W)$ to be the matrix with $(i, j)$th entry $\operatorname{cov}(V_i, W_j)$. Then
\[
  \operatorname{cov}(AX, BX) = A \operatorname{cov}(X) B^T.
\]
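These identities are straightforward to check by simulation. The following is a minimal sketch (not part of the notes) using numpy, with arbitrary choices of $\mu$, $\Sigma$, $A$ and $B$: it compares sample estimates of $E[AX]$, $\operatorname{cov}(AX)$ and $\operatorname{cov}(AX, BX)$ against $A\mu$, $A\Sigma A^T$ and $A\Sigma B^T$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary mean vector and (positive definite) covariance matrix for X.
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.0, -0.2],
                  [0.1, -0.2, 0.5]])

# Arbitrary m x n matrices A and B.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])
B = np.array([[0.5, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

# Draw many samples of X ~ N_3(mu, Sigma); rows are samples.
X = rng.multivariate_normal(mu, Sigma, size=200_000)
AX = X @ A.T
BX = X @ B.T

print(np.allclose(AX.mean(axis=0), A @ mu, atol=0.02))                    # E[AX] ~ A mu
print(np.allclose(np.cov(AX, rowvar=False), A @ Sigma @ A.T, atol=0.05))  # cov(AX) ~ A Sigma A^T

# cov(AX, BX): the (i, j) entry is cov((AX)_i, (BX)_j).
cross = np.cov(np.hstack([AX, BX]), rowvar=False)[:2, 2:]
print(np.allclose(cross, A @ Sigma @ B.T, atol=0.05))                     # ~ A Sigma B^T
```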
An important distribution is the multivariate normal distribution.
Definition (Multivariate normal distribution). $X$ has a multivariate normal distribution if, for every $t \in \mathbb{R}^n$, the random variable $t^T X$ (i.e. $t \cdot X$) has a normal distribution. If $E[X] = \mu$ and $\operatorname{cov}(X) = \Sigma$, we write $X \sim N_n(\mu, \Sigma)$.
Note that $\Sigma$ is symmetric and is positive semi-definite, because by $(*)$,
\[
  t^T \Sigma t = \operatorname{var}(t^T X) \geq 0.
\]
So what is the pdf of a multivariate normal? And what is the moment generating function? Recall that a (univariate) normal $X \sim N(\mu, \sigma^2)$ has density
\[
  f_X(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2}\right),
\]
with moment generating function
\[
  M_X(s) = E[e^{sX}] = \exp\left(\mu s + \frac{1}{2}\sigma^2 s^2\right).
\]
Hence for any $t$, the moment generating function of $t^T X$ is given by
\[
  M_{t^T X}(s) = E[e^{s t^T X}] = \exp\left(t^T \mu\, s + \frac{1}{2} t^T \Sigma t\, s^2\right).
\]
Hence $X$ has mgf
\[
  M_X(t) = E[e^{t^T X}] = M_{t^T X}(1) = \exp\left(t^T \mu + \frac{1}{2} t^T \Sigma t\right). \tag{$\dagger$}
\]
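As a quick sanity check (a sketch, not from the notes), one can draw samples from a multivariate normal with arbitrary $\mu$ and $\Sigma$ and confirm that $t^T X$ has mean $t^T\mu$ and variance $t^T \Sigma t$, and that the empirical mgf at $s = 1$ matches $(\dagger)$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary choices of mu, Sigma and t (not from the notes).
mu = np.array([0.5, -1.0])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])
t = np.array([0.3, -0.2])

X = rng.multivariate_normal(mu, Sigma, size=500_000)
tX = X @ t   # samples of t^T X

# t^T X should be N(t^T mu, t^T Sigma t), so its empirical mgf at s = 1
# should match exp(t^T mu + (1/2) t^T Sigma t).
print(tX.mean(), t @ mu)                                         # ~ t^T mu
print(tX.var(), t @ Sigma @ t)                                   # ~ t^T Sigma t
print(np.exp(tX).mean(), np.exp(t @ mu + 0.5 * t @ Sigma @ t))   # empirical vs exact mgf
```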
Proposition.
(i) If $X \sim N_n(\mu, \Sigma)$ and $A$ is an $m \times n$ matrix, then $AX \sim N_m(A\mu, A\Sigma A^T)$.
(ii) If $X \sim N_n(0, \sigma^2 I)$, then
\[
  \frac{|X|^2}{\sigma^2} = \frac{X^T X}{\sigma^2} = \sum_i \frac{X_i^2}{\sigma^2} \sim \chi_n^2.
\]
Instead of writing $|X|^2/\sigma^2 \sim \chi_n^2$, we often just say $|X|^2 \sim \sigma^2 \chi_n^2$.
Proof.
(i) See example sheet 3.
(ii) Immediate from the definition of $\chi_n^2$.
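A small simulation illustrates (ii); this is a sketch assuming numpy and scipy are available, with arbitrary choices of $n$ and $\sigma$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

n, sigma = 5, 2.0
# X ~ N_n(0, sigma^2 I): n iid N(0, sigma^2) coordinates per sample.
X = rng.normal(0.0, sigma, size=(100_000, n))
Q = (X ** 2).sum(axis=1) / sigma ** 2          # |X|^2 / sigma^2

# Compare with chi^2_n: a Kolmogorov-Smirnov test should not reject.
print(stats.kstest(Q, "chi2", args=(n,)))
print(Q.mean(), n)            # chi^2_n has mean n
print(Q.var(), 2 * n)         # and variance 2n
```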
Proposition. Let $X \sim N_n(\mu, \Sigma)$. We split $X$ up into two parts: $X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$, where $X_i$ is an $n_i \times 1$ column vector and $n_1 + n_2 = n$.
Similarly write
\[
  \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad
  \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},
\]
where $\Sigma_{ij}$ is an $n_i \times n_j$ matrix.
Then
(i) $X_i \sim N_{n_i}(\mu_i, \Sigma_{ii})$;
(ii) $X_1$ and $X_2$ are independent iff $\Sigma_{12} = 0$.
Proof.
(i) See example sheet 3.
(ii) Note that by symmetry of $\Sigma$, $\Sigma_{12} = 0$ if and only if $\Sigma_{21} = 0$.
From $(\dagger)$, $M_X(t) = \exp\left(t^T \mu + \frac{1}{2} t^T \Sigma t\right)$ for each $t \in \mathbb{R}^n$. We write $t = \begin{pmatrix} t_1 \\ t_2 \end{pmatrix}$.
Then the mgf is equal to
\[
  M_X(t) = \exp\left(t_1^T \mu_1 + t_2^T \mu_2 + \frac{1}{2} t_1^T \Sigma_{11} t_1 + \frac{1}{2} t_2^T \Sigma_{22} t_2 + \frac{1}{2} t_1^T \Sigma_{12} t_2 + \frac{1}{2} t_2^T \Sigma_{21} t_1\right).
\]
From (i), we know that $M_{X_i}(t_i) = \exp\left(t_i^T \mu_i + \frac{1}{2} t_i^T \Sigma_{ii} t_i\right)$. So $M_X(t) = M_{X_1}(t_1) M_{X_2}(t_2)$ for all $t$ if and only if $\Sigma_{12} = 0$.
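The independence statement can also be illustrated numerically: with a block-diagonal $\Sigma$ (i.e. $\Sigma_{12} = 0$), the joint density factorises into the product of the two marginal densities. The sketch below (not from the notes) checks this at a single point using scipy's multivariate normal; the particular numbers are arbitrary.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Block-diagonal Sigma (Sigma_12 = 0); the particular numbers are arbitrary.
mu1, mu2 = np.array([1.0, 0.0]), np.array([-2.0])
Sigma11 = np.array([[1.0, 0.3],
                    [0.3, 0.5]])
Sigma22 = np.array([[2.0]])

mu = np.concatenate([mu1, mu2])
Sigma = np.block([[Sigma11, np.zeros((2, 1))],
                  [np.zeros((1, 2)), Sigma22]])

joint = multivariate_normal(mu, Sigma)
marg1 = multivariate_normal(mu1, Sigma11)
marg2 = multivariate_normal(mu2, Sigma22)

# With Sigma_12 = 0 the joint density factorises: f_X(x) = f_{X_1}(x_1) f_{X_2}(x_2).
x = np.array([0.7, -0.1, -1.5])
print(joint.pdf(x), marg1.pdf(x[:2]) * marg2.pdf(x[2:]))   # the two numbers agree
```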
Proposition. When $\Sigma$ is positive definite, $X$ has pdf
\[
  f_X(x; \mu, \Sigma) = \frac{1}{|\Sigma|^{1/2}} \left(\frac{1}{\sqrt{2\pi}}\right)^n \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right).
\]
Note that $\Sigma$ is always positive semi-definite. The positive definiteness assumption just rules out the case $|\Sigma| = 0$, since this would lead to division by zero.
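As a check on the formula, the sketch below (assuming numpy and scipy are available; the numbers are arbitrary) evaluates the density directly and compares it with scipy.stats.multivariate_normal for a positive definite $\Sigma$.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, Sigma):
    """Evaluate the multivariate normal density directly from the formula above."""
    n = len(mu)
    d = x - mu
    quad = d @ np.linalg.solve(Sigma, d)       # (x - mu)^T Sigma^{-1} (x - mu)
    return np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(Sigma) * (2 * np.pi) ** n)

# Arbitrary positive definite Sigma for a quick comparison with scipy.
mu = np.array([1.0, -1.0, 0.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.2],
                  [0.0, 0.2, 0.8]])
x = np.array([0.3, -0.7, 0.4])

print(mvn_pdf(x, mu, Sigma))
print(multivariate_normal(mu, Sigma).pdf(x))   # should agree
```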
2.5.2 Normal random samples
We wish to use our knowledge about multivariate normals to study univariate
normal data. In particular, we want to prove the following:
Theorem (Joint distribution of $\bar{X}$ and $S_{XX}$). Suppose $X_1, \cdots, X_n$ are iid $N(\mu, \sigma^2)$ and $\bar{X} = \frac{1}{n} \sum X_i$, and $S_{XX} = \sum (X_i - \bar{X})^2$. Then
(i) $\bar{X} \sim N(\mu, \sigma^2/n)$;
(ii) $S_{XX}/\sigma^2 \sim \chi_{n-1}^2$;
(iii) $\bar{X}$ and $S_{XX}$ are independent.
Proof.
We can write the joint density as $X \sim N_n(\mu, \sigma^2 I)$, where $\mu = (\mu, \mu, \cdots, \mu)^T$.
Let $A$ be an $n \times n$ orthogonal matrix with the first row all $1/\sqrt{n}$ (the other rows are not important). One possible such matrix is
\[
  A =
  \begin{pmatrix}
    \frac{1}{\sqrt{n}} & \frac{1}{\sqrt{n}} & \frac{1}{\sqrt{n}} & \frac{1}{\sqrt{n}} & \cdots & \frac{1}{\sqrt{n}}\\
    \frac{1}{\sqrt{2 \times 1}} & \frac{-1}{\sqrt{2 \times 1}} & 0 & 0 & \cdots & 0\\
    \frac{1}{\sqrt{3 \times 2}} & \frac{1}{\sqrt{3 \times 2}} & \frac{-2}{\sqrt{3 \times 2}} & 0 & \cdots & 0\\
    \vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\
    \frac{1}{\sqrt{n(n-1)}} & \frac{1}{\sqrt{n(n-1)}} & \frac{1}{\sqrt{n(n-1)}} & \frac{1}{\sqrt{n(n-1)}} & \cdots & \frac{-(n-1)}{\sqrt{n(n-1)}}
  \end{pmatrix}.
\]
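(As an aside, not part of the proof: such a matrix is easy to construct numerically and check for orthogonality. The sketch below builds it for a small $n$; the helper name helmert_like is chosen purely for illustration.)

```python
import numpy as np

def helmert_like(n):
    """Build an n x n orthogonal matrix whose first row is all 1/sqrt(n),
    following the pattern of the matrix displayed above."""
    A = np.zeros((n, n))
    A[0, :] = 1.0 / np.sqrt(n)
    for k in range(2, n + 1):                  # rows k = 2, ..., n (1-indexed)
        A[k - 1, :k - 1] = 1.0 / np.sqrt(k * (k - 1))
        A[k - 1, k - 1] = -(k - 1) / np.sqrt(k * (k - 1))
    return A

A = helmert_like(5)
print(np.allclose(A @ A.T, np.eye(5)))    # orthogonal: A A^T = I
print(A[0])                               # first row all 1/sqrt(5)
```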
Now define Y = AX. Then
\[
  Y \sim N_n(A\mu, A\sigma^2 I A^T) = N_n(A\mu, \sigma^2 I).
\]
We have
\[
  A\mu = (\sqrt{n}\mu, 0, \cdots, 0)^T.
\]
So $Y_1 \sim N(\sqrt{n}\mu, \sigma^2)$ and $Y_i \sim N(0, \sigma^2)$ for $i = 2, \cdots, n$. Also, $Y_1, \cdots, Y_n$ are independent, since the covariance matrix $\sigma^2 I$ has every off-diagonal term equal to $0$.
But from the definition of A, we have
\[
  Y_1 = \frac{1}{\sqrt{n}} \sum_{i=1}^n X_i = \sqrt{n} \bar{X}.
\]
So $\sqrt{n}\bar{X} \sim N(\sqrt{n}\mu, \sigma^2)$, or $\bar{X} \sim N(\mu, \sigma^2/n)$. Also,
\begin{align*}
  Y_2^2 + \cdots + Y_n^2 &= Y^T Y - Y_1^2\\
  &= X^T A^T A X - Y_1^2\\
  &= X^T X - n\bar{X}^2\\
  &= \sum_{i=1}^n X_i^2 - n\bar{X}^2\\
  &= \sum_{i=1}^n (X_i - \bar{X})^2\\
  &= S_{XX}.
\end{align*}
So $S_{XX} = Y_2^2 + \cdots + Y_n^2 \sim \sigma^2 \chi_{n-1}^2$.
Finally, since $Y_1$ and $(Y_2, \cdots, Y_n)$ are independent, so are $\bar{X}$ and $S_{XX}$.