
2.5 Multivariate normal theory

2.5.1 Multivariate normal distribution

So far, we have only worked with scalar random variables or a vector of iid random variables. In general, we can have a random (column) vector $X = (X_1, \cdots, X_n)^T$, where the $X_i$ may be correlated.

The mean of this vector is given by
\[ \mu = \mathbb{E}[X] = (\mathbb{E}(X_1), \cdots, \mathbb{E}(X_n))^T = (\mu_1, \cdots, \mu_n)^T. \]

Instead of just the variance, we have the covariance matrix
\[ \operatorname{cov}(X) = \mathbb{E}[(X - \mu)(X - \mu)^T] = (\operatorname{cov}(X_i, X_j))_{ij}, \]
provided these exist, of course.

We can multiply the vector $X$ by an $m \times n$ matrix $A$. Then we have
\[ \mathbb{E}[AX] = A\mu, \]
and
\[ \operatorname{cov}(AX) = A \operatorname{cov}(X) A^T. \tag{$*$} \]

The last one comes from
\begin{align*}
\operatorname{cov}(AX) &= \mathbb{E}[(AX - \mathbb{E}[AX])(AX - \mathbb{E}[AX])^T] \\
&= \mathbb{E}[A(X - \mathbb{E}X)(X - \mathbb{E}X)^T A^T] \\
&= A\, \mathbb{E}[(X - \mathbb{E}X)(X - \mathbb{E}X)^T]\, A^T.
\end{align*}
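As a quick numerical sanity check of $(*)$ — a minimal sketch, assuming numpy is available, with an arbitrary $\Sigma$ and $A$ invented for the illustration — we can compare the empirical covariance of $AX$ against $A \operatorname{cov}(X) A^T$:

```python
# Monte Carlo check of cov(AX) = A cov(X) A^T; Sigma and A are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 3, 2, 200_000

Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])   # cov(X); symmetric positive definite
A = rng.normal(size=(m, n))           # an arbitrary m x n matrix

X = rng.multivariate_normal(np.zeros(n), Sigma, size=N)  # samples as rows
emp = np.cov((X @ A.T).T)             # empirical m x m covariance of AX
print(np.allclose(emp, A @ Sigma @ A.T, atol=0.05))      # expect True
```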

If we have two random vectors $V, W$, we can define the covariance $\operatorname{cov}(V, W)$ to be the matrix with $(i, j)$th element $\operatorname{cov}(V_i, W_j)$. Then
\[ \operatorname{cov}(AX, BX) = A \operatorname{cov}(X) B^T. \]

An important distribution is the multivariate normal distribution.

Definition (Multivariate normal distribution). $X$ has a multivariate normal distribution if, for every $t \in \mathbb{R}^n$, the random variable $t^T X$ (i.e. $t \cdot X$) has a normal distribution. If $\mathbb{E}[X] = \mu$ and $\operatorname{cov}(X) = \Sigma$, we write $X \sim N_n(\mu, \Sigma)$.

Note that $\Sigma$ is symmetric, and it is positive semi-definite because by $(*)$,
\[ t^T \Sigma t = \operatorname{var}(t^T X) \geq 0. \]

So what is the pdf of a multivariate normal? And what is the moment generating function? Recall that a (univariate) normal $X \sim N(\mu, \sigma^2)$ has density
\[ f_X(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2}\right), \]

with moment generating function
\[ M_X(s) = \mathbb{E}[e^{sX}] = \exp\left(\mu s + \frac{1}{2} \sigma^2 s^2\right). \]

Hence for any $t$, the moment generating function of $t^T X$ is given by
\[ M_{t^T X}(s) = \mathbb{E}[e^{s t^T X}] = \exp\left(t^T \mu\, s + \frac{1}{2} t^T \Sigma t\, s^2\right). \]

Hence $X$ has mgf
\[ M_X(t) = \mathbb{E}[e^{t^T X}] = M_{t^T X}(1) = \exp\left(t^T \mu + \frac{1}{2} t^T \Sigma t\right). \tag{$\dagger$} \]
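As an illustration of $(\dagger)$ (a sketch assuming numpy; the particular $\mu$, $\Sigma$ and $t$ below are made up for the demonstration), the Monte Carlo average of $e^{t^T X}$ should approach the closed form:

```python
# Empirical mgf at a fixed t versus the formula exp(t^T mu + t^T Sigma t / 2).
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.5, -1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
t = np.array([0.2, 0.4])

X = rng.multivariate_normal(mu, Sigma, size=500_000)
mc = np.mean(np.exp(X @ t))                   # Monte Carlo E[exp(t^T X)]
exact = np.exp(t @ mu + 0.5 * t @ Sigma @ t)  # formula (†)
print(mc, exact)                              # the two should be close
```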

Proposition.

(i) If $X \sim N_n(\mu, \Sigma)$ and $A$ is an $m \times n$ matrix, then $AX \sim N_m(A\mu, A\Sigma A^T)$.

(ii) If $X \sim N_n(0, \sigma^2 I)$, then
\[ \frac{|X|^2}{\sigma^2} = \frac{X^T X}{\sigma^2} = \frac{\sum X_i^2}{\sigma^2} \sim \chi_n^2. \]

Instead of writing $|X|^2 / \sigma^2 \sim \chi_n^2$, we often just say $|X|^2 \sim \sigma^2 \chi_n^2$.

Proof.

(i) See example sheet 3.

(ii) Immediate from the definition of $\chi_n^2$.
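Part (ii) is also easy to check by simulation. A minimal sketch, assuming numpy and scipy are available (the parameters are arbitrary):

```python
# With X ~ N_n(0, sigma^2 I), the statistic |X|^2 / sigma^2 should be chi^2_n.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, sigma, N = 5, 2.0, 100_000

X = rng.normal(0.0, sigma, size=(N, n))       # N draws of X ~ N_n(0, sigma^2 I)
stat = (X ** 2).sum(axis=1) / sigma ** 2      # |X|^2 / sigma^2 for each draw
print(stats.kstest(stat, stats.chi2(df=n).cdf))  # p-value should not be small
```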

Proposition. Let $X \sim N_n(\mu, \Sigma)$. We split $X$ up into two parts: $X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$, where $X_i$ is an $n_i \times 1$ column vector and $n_1 + n_2 = n$.

Similarly write
\[ \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \]
where $\Sigma_{ij}$ is an $n_i \times n_j$ matrix.

Then

(i) $X_i \sim N_{n_i}(\mu_i, \Sigma_{ii})$.

(ii) $X_1$ and $X_2$ are independent iff $\Sigma_{12} = 0$.

Proof.

(i) See example sheet 3.

(ii) Note that by symmetry of $\Sigma$, $\Sigma_{12} = 0$ if and only if $\Sigma_{21} = 0$.

From $(\dagger)$, $M_X(t) = \exp\left(t^T \mu + \frac{1}{2} t^T \Sigma t\right)$ for each $t \in \mathbb{R}^n$. We write $t = \begin{pmatrix} t_1 \\ t_2 \end{pmatrix}$.

Then the mgf is equal to
\[ M_X(t) = \exp\left(t_1^T \mu_1 + t_2^T \mu_2 + \frac{1}{2} t_1^T \Sigma_{11} t_1 + \frac{1}{2} t_2^T \Sigma_{22} t_2 + \frac{1}{2} t_1^T \Sigma_{12} t_2 + \frac{1}{2} t_2^T \Sigma_{21} t_1\right). \]

From (i), we know that $M_{X_i}(t_i) = \exp\left(t_i^T \mu_i + \frac{1}{2} t_i^T \Sigma_{ii} t_i\right)$. So

$M_X(t) = M_{X_1}(t_1) M_{X_2}(t_2)$ for all $t$ if and only if $\Sigma_{12} = 0$. Since the joint mgf determines the joint distribution, this factorization is equivalent to the independence of $X_1$ and $X_2$.
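To see (ii) in action, here is a small simulation sketch (assuming numpy; the block matrices are invented). With $\Sigma_{12} = 0$, even nonlinear functions of $X_1$ and $X_2$ should be uncorrelated, not just the components themselves:

```python
# Independence check: with Sigma_12 = 0, cov(f(X_1), g(X_2)) ≈ 0 for any f, g.
import numpy as np

rng = np.random.default_rng(3)
Sigma = np.block([
    [np.array([[1.0, 0.4], [0.4, 2.0]]), np.zeros((2, 2))],
    [np.zeros((2, 2)), np.array([[1.5, -0.2], [-0.2, 0.8]])],
])
X = rng.multivariate_normal(np.zeros(4), Sigma, size=200_000)
X1, X2 = X[:, :2], X[:, 2:]
print(np.cov(X1[:, 0] ** 2, X2[:, 0])[0, 1])  # ≈ 0, beyond mere uncorrelatedness
```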

Proposition. When $\Sigma$ is positive definite, $X$ has pdf
\[ f_X(x; \mu, \Sigma) = \frac{1}{|\Sigma|^{1/2}} \left(\frac{1}{\sqrt{2\pi}}\right)^n \exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right). \]

Note that $\Sigma$ is always positive semi-definite. The positive-definiteness condition just rules out the case $|\Sigma| = 0$, which would lead to division by zero.
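As a sanity check of the density formula (a sketch assuming scipy; the point $x$ and the parameters are arbitrary), we can compare it with scipy's built-in multivariate normal pdf:

```python
# Evaluate the pdf formula directly and compare with scipy.stats.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.3, 1.7])

n = len(mu)
quad = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)   # (x-mu)^T Sigma^{-1} (x-mu)
f = np.exp(-0.5 * quad) / (np.sqrt(np.linalg.det(Sigma)) * (2 * np.pi) ** (n / 2))
print(np.isclose(f, multivariate_normal(mu, Sigma).pdf(x)))  # expect True
```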

2.5.2 Normal random samples

We wish to use our knowledge about multivariate normals to study univariate normal data. In particular, we want to prove the following:

Theorem (Joint distribution of $\bar{X}$ and $S_{XX}$). Suppose $X_1, \cdots, X_n$ are iid $N(\mu, \sigma^2)$, and let $\bar{X} = \frac{1}{n} \sum X_i$ and $S_{XX} = \sum (X_i - \bar{X})^2$. Then

(i) $\bar{X} \sim N(\mu, \sigma^2/n)$.

(ii) $S_{XX}/\sigma^2 \sim \chi_{n-1}^2$.

(iii) $\bar{X}$ and $S_{XX}$ are independent.

Proof.

We can write the joint distribution as $X \sim N_n(\mu, \sigma^2 I)$, where $\mu = (\mu, \mu, \cdots, \mu)^T$ is the $n$-vector with every entry $\mu$. Let $A$ be an $n \times n$ orthogonal matrix whose first row is all $1/\sqrt{n}$ (the other rows are not important). One possible such matrix is

\[ A = \begin{pmatrix}
\frac{1}{\sqrt{n}} & \frac{1}{\sqrt{n}} & \frac{1}{\sqrt{n}} & \frac{1}{\sqrt{n}} & \cdots & \frac{1}{\sqrt{n}} \\
\frac{1}{\sqrt{2 \times 1}} & \frac{-1}{\sqrt{2 \times 1}} & 0 & 0 & \cdots & 0 \\
\frac{1}{\sqrt{3 \times 2}} & \frac{1}{\sqrt{3 \times 2}} & \frac{-2}{\sqrt{3 \times 2}} & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{1}{\sqrt{n(n-1)}} & \frac{1}{\sqrt{n(n-1)}} & \frac{1}{\sqrt{n(n-1)}} & \frac{1}{\sqrt{n(n-1)}} & \cdots & \frac{-(n-1)}{\sqrt{n(n-1)}}
\end{pmatrix}. \]
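(This is a Helmert matrix; scipy.linalg also provides a helmert function for a similar construction.) A short numpy sketch of the rows described above, checking orthogonality:

```python
# Build the orthogonal matrix A row by row and verify A A^T = I.
import numpy as np

def helmert(n: int) -> np.ndarray:
    A = np.zeros((n, n))
    A[0] = 1.0 / np.sqrt(n)                    # first row: all 1/sqrt(n)
    for i in range(1, n):
        A[i, :i] = 1.0 / np.sqrt(i * (i + 1))  # i entries of 1/sqrt(i(i+1))
        A[i, i] = -i / np.sqrt(i * (i + 1))    # then -i/sqrt(i(i+1)), then zeros
    return A

A = helmert(5)
print(np.allclose(A @ A.T, np.eye(5)))  # True: A is orthogonal
```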

Now define $Y = AX$. Then
\[ Y \sim N_n(A\mu, A \sigma^2 I A^T) = N_n(A\mu, \sigma^2 I). \]

We have
\[ A\mu = (\sqrt{n}\,\mu, 0, \cdots, 0)^T. \]

So $Y_1 \sim N(\sqrt{n}\,\mu, \sigma^2)$ and $Y_i \sim N(0, \sigma^2)$ for $i = 2, \cdots, n$. Also, $Y_1, \cdots, Y_n$ are independent, since every off-diagonal entry of the covariance matrix is 0.

But from the definition of $A$, we have
\[ Y_1 = \frac{1}{\sqrt{n}} \sum_{i=1}^n X_i = \sqrt{n}\, \bar{X}. \]

So $\sqrt{n}\,\bar{X} \sim N(\sqrt{n}\,\mu, \sigma^2)$, or $\bar{X} \sim N(\mu, \sigma^2/n)$. Also,

\begin{align*}
Y_2^2 + \cdots + Y_n^2 &= Y^T Y - Y_1^2 \\
&= X^T A^T A X - Y_1^2 \\
&= X^T X - n\bar{X}^2 \\
&= \sum_{i=1}^n X_i^2 - n\bar{X}^2 \\
&= \sum_{i=1}^n (X_i - \bar{X})^2 \\
&= S_{XX}.
\end{align*}

So $S_{XX} = Y_2^2 + \cdots + Y_n^2 \sim \sigma^2 \chi_{n-1}^2$.

Finally, since $Y_1$ and $Y_2, \cdots, Y_n$ are independent, so are $\bar{X}$ and $S_{XX}$.
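The whole theorem is easy to check by simulation. A minimal sketch, assuming numpy and scipy are available (the parameters are arbitrary):

```python
# Monte Carlo check: Xbar ~ N(mu, sigma^2/n), S_XX/sigma^2 ~ chi^2_{n-1},
# and Xbar, S_XX empirically uncorrelated (they are in fact independent).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
mu, sigma, n, N = 3.0, 1.5, 10, 100_000

X = rng.normal(mu, sigma, size=(N, n))        # N independent samples of size n
xbar = X.mean(axis=1)
sxx = ((X - xbar[:, None]) ** 2).sum(axis=1)

print(stats.kstest(xbar, stats.norm(mu, sigma / np.sqrt(n)).cdf))  # (i)
print(stats.kstest(sxx / sigma ** 2, stats.chi2(df=n - 1).cdf))    # (ii)
print(np.corrcoef(xbar, sxx)[0, 1])                                # (iii): ≈ 0
```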