5.6 Transformation of random variables
We will now look at what happens when we apply a function to random variables.
We first look at the simple case where there is just one variable, and then move
on to the general case where we have multiple variables and can mix them
together.
Single random variable
Theorem. If X is a continuous random variable with pdf f(x), and h(x) is a continuous, strictly increasing function with h^{−1}(x) differentiable, then Y = h(X) is a random variable with pdf

f_Y(y) = f_X(h^{−1}(y)) (d/dy) h^{−1}(y).
Proof.

F_Y(y) = P(Y ≤ y)
       = P(h(X) ≤ y)
       = P(X ≤ h^{−1}(y))
       = F(h^{−1}(y)).

Take the derivative with respect to y to obtain

f_Y(y) = F_Y'(y) = f(h^{−1}(y)) (d/dy) h^{−1}(y).
It is often easier to redo the proof than to remember the result.
Example. Let X ~ U[0, 1]. Let Y = −log X. Then

P(Y ≤ y) = P(−log X ≤ y)
         = P(X ≥ e^{−y})
         = 1 − e^{−y}.

But this is the cumulative distribution function of E(1). So Y is exponentially distributed with λ = 1.
In general, we get the following result:
Theorem. Let U ~ U[0, 1]. For any strictly increasing distribution function F, the random variable X = F^{−1}(U) has distribution function F.
Proof.

P(X ≤ x) = P(F^{−1}(U) ≤ x) = P(U ≤ F(x)) = F(x).
This condition “strictly increasing” is needed for the inverse to exist. If F is not strictly increasing, we can instead define

F^{−1}(u) = inf{x : F(x) ≥ u},  0 < u < 1,

and the same result holds.
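For concreteness, here is a minimal numerical sketch of this sampling recipe (assuming Python with numpy; the rate λ = 2 is an arbitrary choice), using the exponential distribution E(λ), for which F^{−1}(u) = −log(1 − u)/λ.

    import numpy as np

    rng = np.random.default_rng(0)
    lam = 2.0                      # rate of the target distribution E(lambda)
    u = rng.uniform(size=100_000)  # U ~ U[0, 1]

    # X = F^{-1}(U), where F(x) = 1 - exp(-lam*x) gives F^{-1}(u) = -log(1 - u)/lam
    x = -np.log(1 - u) / lam

    # Sanity check: the sample mean should be close to 1/lam
    print(x.mean(), 1 / lam)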
This can also be done for discrete random variables P(X = x_i) = p_i by letting X = x_j if

∑_{i=0}^{j−1} p_i ≤ U < ∑_{i=0}^{j} p_i,

for U ~ U[0, 1].
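A corresponding sketch for the discrete case (again assuming numpy; the values x_j and probabilities p_j below are arbitrary illustrations):

    import numpy as np

    rng = np.random.default_rng(0)
    xs = np.array([0, 1, 2, 3])          # possible values x_j (illustrative)
    ps = np.array([0.1, 0.2, 0.3, 0.4])  # probabilities p_j, summing to 1

    u = rng.uniform(size=100_000)
    # X = x_j when sum_{i<j} p_i <= U < sum_{i<=j} p_i
    cum = np.cumsum(ps)
    samples = xs[np.searchsorted(cum, u, side='right')]

    # Empirical frequencies should approximate ps
    print(np.bincount(samples) / len(samples))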
Multiple random variables
Suppose X_1, X_2, ··· , X_n are random variables with joint pdf f. Let

Y_1 = r_1(X_1, ··· , X_n)
Y_2 = r_2(X_1, ··· , X_n)
  ⋮
Y_n = r_n(X_1, ··· , X_n).
For example, we might have Y_1 = X_1/(X_1 + X_2) and Y_2 = X_1 + X_2.
Let R ⊆ ℝ^n be such that P((X_1, ··· , X_n) ∈ R) = 1, i.e. R is the set of all values (X_i) can take.
Suppose S is the image of R under the above transformation, and the map R → S is bijective. Then there exists an inverse function

X_1 = s_1(Y_1, ··· , Y_n)
X_2 = s_2(Y_1, ··· , Y_n)
  ⋮
X_n = s_n(Y_1, ··· , Y_n).
For example, if X_1, X_2 are the Cartesian coordinates of a random point, then Y_1, Y_2 might be its polar coordinates.
Definition (Jacobian determinant). Suppose ∂s_i/∂y_j exists and is continuous at every point (y_1, ··· , y_n) ∈ S. Then the Jacobian determinant is

J = ∂(s_1, ··· , s_n)/∂(y_1, ··· , y_n)

  = det ( ∂s_1/∂y_1  ···  ∂s_1/∂y_n )
        (     ⋮       ⋱       ⋮     )
        ( ∂s_n/∂y_1  ···  ∂s_n/∂y_n ).
Take A ⊆ R and B = r(A). Then using results from IA Vector Calculus, we get

P((X_1, ··· , X_n) ∈ A) = ∫_A f(x_1, ··· , x_n) dx_1 ··· dx_n

                        = ∫_B f(s_1(y_1, ··· , y_n), ··· , s_n(y_1, ··· , y_n)) |J| dy_1 ··· dy_n

                        = P((Y_1, ··· , Y_n) ∈ B).
So
Proposition. (Y_1, ··· , Y_n) has density

g(y_1, ··· , y_n) = f(s_1(y_1, ··· , y_n), ··· , s_n(y_1, ··· , y_n)) |J|

if (y_1, ··· , y_n) ∈ S, and 0 otherwise.
Example. Suppose (X, Y) has density

f(x, y) = 4xy   if 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1,
          0     otherwise.

We see that X and Y are independent, with each having a density f(x) = 2x. Define U = X/Y, V = XY. Then we have X = √(UV) and Y = √(V/U). The Jacobian is

det ( ∂x/∂u  ∂x/∂v ) = det (  (1/2)√(v/u)    (1/2)√(u/v)    ) = 1/(2u).
    ( ∂y/∂u  ∂y/∂v )       ( −(1/2)√(v/u³)   (1/2)√(1/(uv)) )

Alternatively, we can find this by considering

det ( ∂u/∂x  ∂u/∂y ) = 2u,
    ( ∂v/∂x  ∂v/∂y )

and then inverting the matrix. So

g(u, v) = 4√(uv) √(v/u) · (1/(2u)) = 2v/u,

if (u, v) is in the image S, and 0 otherwise. So

g(u, v) = (2v/u) I[(u, v) ∈ S].

Since this is not separable (the indicator of S does not factor into a function of u times a function of v), we know that U and V are not independent.
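As a sanity check on the Jacobian computation, a short symbolic sketch (assuming sympy is available):

    import sympy as sp

    u, v = sp.symbols('u v', positive=True)
    x = sp.sqrt(u * v)   # X = sqrt(UV)
    y = sp.sqrt(v / u)   # Y = sqrt(V/U)

    # Jacobian determinant of the inverse map (x, y) as a function of (u, v)
    J = sp.Matrix([[sp.diff(x, u), sp.diff(x, v)],
                   [sp.diff(y, u), sp.diff(y, v)]]).det()
    print(sp.simplify(J))            # expect 1/(2*u)

    # Here J > 0 on S, so |J| = J, and g(u, v) = f(x, y) * J with f(x, y) = 4xy
    print(sp.simplify(4 * x * y * J))  # expect 2*v/u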
In the linear case, life is easy. Suppose Y = (Y_1, ··· , Y_n)^T = AX, where X = (X_1, ··· , X_n)^T. Then X = A^{−1}Y. Then ∂x_i/∂y_j = (A^{−1})_{ij}. So |J| = |det(A^{−1})| = |det A|^{−1}. So

g(y_1, ··· , y_n) = (1/|det A|) f(A^{−1}y).
Example. Suppose X_1, X_2 have joint pdf f(x_1, x_2). Suppose we want to find the pdf of Y = X_1 + X_2. We let Z = X_2. Then X_1 = Y − Z and X_2 = Z. Then

( Y )   ( 1  1 ) ( X_1 )
( Z ) = ( 0  1 ) ( X_2 ) = AX.

Then |J| = 1/|det A| = 1. Then

g(y, z) = f(y − z, z).
So

g_Y(y) = ∫_{−∞}^{∞} f(y − z, z) dz = ∫_{−∞}^{∞} f(z, y − z) dz.
If X_1 and X_2 are independent, f(x_1, x_2) = f_1(x_1)f_2(x_2). Then

g(y) = ∫_{−∞}^{∞} f_1(z)f_2(y − z) dz.
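As a small numerical illustration of this convolution formula (a sketch assuming numpy; the choice of U[0, 1] inputs is illustrative): the sum of two independent U[0, 1] variables has the triangular density g(y) = y on [0, 1] and 2 − y on [1, 2].

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    y = rng.uniform(size=n) + rng.uniform(size=n)  # Y = X1 + X2 with X1, X2 ~ U[0, 1]

    # Convolving f1 = f2 = 1 on [0, 1] gives the triangular density
    # g(y) = y on [0, 1] and g(y) = 2 - y on [1, 2], so P(Y < 1/2) = 1/8.
    print(np.mean(y < 0.5), 1 / 8)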
Non-injective transformations
We previously discussed transformation of random variables by injective maps. What if the mapping is not injective? There is no simple formula for that, and we have to work out each case individually.
Example. Suppose X has pdf f. What is the pdf of Y = |X|?
We use our definition. We have
P(|X| ∈ (a, b)) = ∫_a^b f(x) dx + ∫_{−b}^{−a} f(x) dx = ∫_a^b (f(x) + f(−x)) dx.
So

f_Y(x) = f(x) + f(−x),

which makes sense, since getting |X| = x is equivalent to getting X = x or X = −x.
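For instance, if X is standard normal, the formula gives f_Y(x) = 2φ(x) for x ≥ 0. A quick numerical sketch (assuming numpy; the point x₀ = 1 and the window width are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    y = np.abs(rng.normal(size=200_000))  # Y = |X| with X ~ N(0, 1)

    # Estimate the density of Y near x0 and compare with 2 * phi(x0)
    x0, eps = 1.0, 0.05
    empirical = np.mean((y > x0 - eps) & (y < x0 + eps)) / (2 * eps)
    phi = np.exp(-x0**2 / 2) / np.sqrt(2 * np.pi)
    print(empirical, 2 * phi)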
Example. Suppose X_1 ~ E(λ), X_2 ~ E(µ) are independent random variables. Let Y = min(X_1, X_2). Then

P(Y ≥ t) = P(X_1 ≥ t, X_2 ≥ t)
         = P(X_1 ≥ t)P(X_2 ≥ t)
         = e^{−λt} e^{−µt}
         = e^{−(λ+µ)t}.

So Y ~ E(λ + µ).
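A short simulation sketch of this fact (assuming numpy; the rates λ = 2, µ = 3 are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    lam, mu, n = 2.0, 3.0, 100_000
    x1 = rng.exponential(scale=1 / lam, size=n)  # X1 ~ E(lambda)
    x2 = rng.exponential(scale=1 / mu, size=n)   # X2 ~ E(mu)
    y = np.minimum(x1, x2)                       # Y = min(X1, X2)

    # If Y ~ E(lambda + mu), then P(Y >= t) = exp(-(lambda + mu) t)
    t = 0.3
    print(np.mean(y >= t), np.exp(-(lam + mu) * t))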
Given random variables, we can ask not only for the minimum of the variables, but also for, say, the second smallest one. In general, we define the order statistics as follows:
Definition (Order statistics). Suppose that X_1, ··· , X_n are some random variables, and Y_1, ··· , Y_n is X_1, ··· , X_n arranged in increasing order, i.e. Y_1 ≤ Y_2 ≤ ··· ≤ Y_n. This is the order statistics. We sometimes write Y_i = X_{(i)}.
Assume the X_i are iid with cdf F and pdf f. Then the cdf of Y_n is

P(Y_n ≤ y) = P(X_1 ≤ y, ··· , X_n ≤ y) = P(X_1 ≤ y) ··· P(X_n ≤ y) = F(y)^n.
So the pdf of Y_n is

(d/dy) F(y)^n = nf(y)F(y)^{n−1}.
Also,

P(Y_1 ≥ y) = P(X_1 ≥ y, ··· , X_n ≥ y) = (1 − F(y))^n.
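Both formulas can be checked quickly by simulation (a sketch assuming numpy, using U[0, 1] samples so that F(y) = y):

    import numpy as np

    rng = np.random.default_rng(0)
    n, trials = 5, 100_000
    x = rng.uniform(size=(trials, n))  # each row is X_1, ..., X_n iid U[0, 1]
    y_n = x.max(axis=1)                # Y_n
    y_1 = x.min(axis=1)                # Y_1

    y = 0.7
    print(np.mean(y_n <= y), y**n)        # P(Y_n <= y) = F(y)^n = y^n
    print(np.mean(y_1 >= y), (1 - y)**n)  # P(Y_1 >= y) = (1 - F(y))^n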
What about the joint distribution of Y_1 and Y_n?
G(y_1, y_n) = P(Y_1 ≤ y_1, Y_n ≤ y_n)
            = P(Y_n ≤ y_n) − P(Y_1 ≥ y_1, Y_n ≤ y_n)
            = F(y_n)^n − (F(y_n) − F(y_1))^n.
Then the pdf is

∂²/(∂y_1 ∂y_n) G(y_1, y_n) = n(n − 1)(F(y_n) − F(y_1))^{n−2} f(y_1)f(y_n).
We can think about this result in terms of the multinomial distribution. By definition, the probability that Y_1 ∈ [y_1, y_1 + δ) and Y_n ∈ (y_n − δ, y_n] is approximately g(y_1, y_n)δ².
Suppose that δ is sufficiently small that all other n − 2 X_i's are very unlikely to fall into [y_1, y_1 + δ) and (y_n − δ, y_n]. Then to find the probability required, we can treat the sample space as three bins. We want exactly one X_i to fall into the first and last bins, and n − 2 X_i's to fall into the middle one. There are n!/(1!(n − 2)!1!) = n(n − 1) ways of doing so.
The probability of each thing falling into the middle bin is F(y_n) − F(y_1), and the probabilities of falling into the first and last bins are f(y_1)δ and f(y_n)δ.
Then the probability of Y_1 ∈ [y_1, y_1 + δ) and Y_n ∈ (y_n − δ, y_n] is

n(n − 1)(F(y_n) − F(y_1))^{n−2} f(y_1)f(y_n)δ²,

and the result follows.
We can also find the joint distribution of the order statistics, say g, since it is just given by

g(y_1, ··· , y_n) = n!f(y_1) ··· f(y_n)

if y_1 ≤ y_2 ≤ ··· ≤ y_n, and 0 otherwise. We have this formula because there are n! combinations of x_1, ··· , x_n that produce a given order statistics y_1, ··· , y_n, and the pdf of each combination is f(y_1) ··· f(y_n).
In the case of iid exponential variables, we find a nice distribution for the
order statistic.
Example. Let X_1, ··· , X_n be iid E(λ), and Y_1, ··· , Y_n be the order statistics. Let

Z_1 = Y_1
Z_2 = Y_2 − Y_1
  ⋮
Z_n = Y_n − Y_{n−1}.
These are the distances between the occurrences. We can write this as Z = AY, with

A = (  1   0   0  ···   0   0 )
    ( −1   1   0  ···   0   0 )
    (  ⋮   ⋮   ⋮   ⋱    ⋮   ⋮ )
    (  0   0   0  ···  −1   1 ).
Then det(A) = 1 and hence |J| = 1. Suppose that the pdf of Z_1, ··· , Z_n is, say, h. Then

h(z_1, ··· , z_n) = g(y_1, ··· , y_n) · 1
                  = n!f(y_1) ··· f(y_n)
                  = n!λ^n e^{−λ(y_1 + ··· + y_n)}
                  = n!λ^n e^{−λ(nz_1 + (n−1)z_2 + ··· + z_n)}
                  = ∏_{i=1}^{n} (λi) e^{−(λi)z_{n+1−i}}.
Since h is expressed as a product of n density functions, we have

Z_i ~ E((n + 1 − i)λ),

with all Z_i independent.
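Again, a quick simulation sketch (assuming numpy; λ = 1.5 and n = 4 are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    lam, n, trials = 1.5, 4, 100_000
    x = rng.exponential(scale=1 / lam, size=(trials, n))  # iid E(lambda)
    y = np.sort(x, axis=1)                                # order statistics Y_1 <= ... <= Y_n
    z = np.diff(y, axis=1, prepend=0)                     # Z_1 = Y_1, Z_i = Y_i - Y_{i-1}

    # Z_i ~ E((n + 1 - i) lambda), so E[Z_i] = 1 / ((n + 1 - i) lambda)
    for i in range(1, n + 1):
        print(z[:, i - 1].mean(), 1 / ((n + 1 - i) * lam))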