5.6 Transformation of random variables
We will now look at what happens when we apply a function to random variables.
We first look at the simple case where there is just one variable, and then move
on to the general case where we have multiple variables and can mix them
together.
Single random variable
Theorem. If X is a continuous random variable with pdf f(x), and h(x) is a continuous, strictly increasing function with h^{−1}(x) differentiable, then Y = h(X) is a random variable with pdf

f_Y(y) = f_X(h^{−1}(y)) (d/dy) h^{−1}(y).
Proof.

F_Y(y) = P(Y ≤ y)
       = P(h(X) ≤ y)
       = P(X ≤ h^{−1}(y))
       = F(h^{−1}(y)).

Take the derivative with respect to y to obtain

f_Y(y) = F_Y'(y) = f(h^{−1}(y)) (d/dy) h^{−1}(y).
It is often easier to redo the proof than to remember the result.
Example. Let X ~ U[0, 1]. Let Y = −log X. Then

P(Y ≤ y) = P(−log X ≤ y)
         = P(X ≥ e^{−y})
         = 1 − e^{−y}.

But this is the cumulative distribution function of E(1). So Y is exponentially distributed with λ = 1.
In general, we get the following result:
Theorem. Let U ~ U[0, 1]. For any strictly increasing distribution function F, the random variable X = F^{−1}(U) has distribution function F.
Proof.

P(X ≤ x) = P(F^{−1}(U) ≤ x) = P(U ≤ F(x)) = F(x).
This condition “strictly increasing” is needed for the inverse to exist. If F is not strictly increasing, we can instead define

F^{−1}(u) = inf{x : F(x) ≥ u},  0 < u < 1,

and the same result holds.
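For concreteness, here is a minimal numerical sketch of this sampling recipe (assuming Python with numpy; the rate λ = 2 is an arbitrary choice), using the exponential distribution E(λ), for which F^{−1}(u) = −log(1 − u)/λ.

    import numpy as np

    rng = np.random.default_rng(0)
    lam = 2.0                      # rate of the target distribution E(lambda)
    u = rng.uniform(size=100_000)  # U ~ U[0, 1]

    # X = F^{-1}(U), where F(x) = 1 - exp(-lam*x) gives F^{-1}(u) = -log(1 - u)/lam
    x = -np.log(1 - u) / lam

    # Sanity check: the sample mean should be close to 1/lam
    print(x.mean(), 1 / lam)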
This can also be done for discrete random variables P(X = x_i) = p_i by letting X = x_j if

∑_{i=0}^{j−1} p_i ≤ U < ∑_{i=0}^{j} p_i,

for U ~ U[0, 1].
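A corresponding sketch for the discrete case (again assuming numpy; the values x_j and probabilities p_j below are arbitrary illustrations):

    import numpy as np

    rng = np.random.default_rng(0)
    xs = np.array([0, 1, 2, 3])          # possible values x_j (illustrative)
    ps = np.array([0.1, 0.2, 0.3, 0.4])  # probabilities p_j, summing to 1

    u = rng.uniform(size=100_000)
    # X = x_j when sum_{i<j} p_i <= U < sum_{i<=j} p_i
    cum = np.cumsum(ps)
    samples = xs[np.searchsorted(cum, u, side='right')]

    # Empirical frequencies should approximate ps
    print(np.bincount(samples) / len(samples))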
Multiple random variables
Suppose X_1, X_2, ··· , X_n are random variables with joint pdf f. Let

Y_1 = r_1(X_1, ··· , X_n)
Y_2 = r_2(X_1, ··· , X_n)
  ⋮
Y_n = r_n(X_1, ··· , X_n).
For example, we might have Y_1 = X_1/(X_1 + X_2) and Y_2 = X_1 + X_2.
Let R ⊆ ℝ^n be such that P((X_1, ··· , X_n) ∈ R) = 1, i.e. R is the set of all values (X_i) can take.
Suppose S is the image of R under the above transformation, and the map R → S is bijective. Then there exists an inverse function

X_1 = s_1(Y_1, ··· , Y_n)
X_2 = s_2(Y_1, ··· , Y_n)
  ⋮
X_n = s_n(Y_1, ··· , Y_n).
For example, if X_1, X_2 are the Cartesian coordinates of a random point, then Y_1, Y_2 might be its polar coordinates.
Definition (Jacobian determinant). Suppose ∂s_i/∂y_j exists and is continuous at every point (y_1, ··· , y_n) ∈ S. Then the Jacobian determinant is

J = ∂(s_1, ··· , s_n)/∂(y_1, ··· , y_n)

  = det ( ∂s_1/∂y_1  ···  ∂s_1/∂y_n )
        (     ⋮       ⋱       ⋮     )
        ( ∂s_n/∂y_1  ···  ∂s_n/∂y_n ).
Take A ⊆ R and B = r(A). Then using results from IA Vector Calculus, we get

P((X_1, ··· , X_n) ∈ A) = ∫_A f(x_1, ··· , x_n) dx_1 ··· dx_n

                        = ∫_B f(s_1(y_1, ··· , y_n), ··· , s_n(y_1, ··· , y_n)) |J| dy_1 ··· dy_n

                        = P((Y_1, ··· , Y_n) ∈ B).
So
Proposition. (Y_1, ··· , Y_n) has density

g(y_1, ··· , y_n) = f(s_1(y_1, ··· , y_n), ··· , s_n(y_1, ··· , y_n)) |J|

if (y_1, ··· , y_n) ∈ S, and 0 otherwise.
Example. Suppose (X, Y) has density

f(x, y) = 4xy   if 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1,
          0     otherwise.

We see that X and Y are independent, with each having a density f(x) = 2x. Define U = X/Y, V = XY. Then we have X = √(UV) and Y = √(V/U). The Jacobian is

det ( ∂x/∂u  ∂x/∂v ) = det (  (1/2)√(v/u)    (1/2)√(u/v)    ) = 1/(2u).
    ( ∂y/∂u  ∂y/∂v )       ( −(1/2)√(v/u³)   (1/2)√(1/(uv)) )

Alternatively, we can find this by considering

det ( ∂u/∂x  ∂u/∂y ) = 2u,
    ( ∂v/∂x  ∂v/∂y )

and then inverting the matrix. So

g(u, v) = 4√(uv) √(v/u) · (1/(2u)) = 2v/u,

if (u, v) is in the image S, and 0 otherwise. So

g(u, v) = (2v/u) I[(u, v) ∈ S].

Since this is not separable (the indicator of S does not factor into a function of u times a function of v), we know that U and V are not independent.
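As a sanity check on the Jacobian computation, a short symbolic sketch (assuming sympy is available):

    import sympy as sp

    u, v = sp.symbols('u v', positive=True)
    x = sp.sqrt(u * v)   # X = sqrt(UV)
    y = sp.sqrt(v / u)   # Y = sqrt(V/U)

    # Jacobian determinant of the inverse map (x, y) as a function of (u, v)
    J = sp.Matrix([[sp.diff(x, u), sp.diff(x, v)],
                   [sp.diff(y, u), sp.diff(y, v)]]).det()
    print(sp.simplify(J))            # expect 1/(2*u)

    # Here J > 0 on S, so |J| = J, and g(u, v) = f(x, y) * J with f(x, y) = 4xy
    print(sp.simplify(4 * x * y * J))  # expect 2*v/u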
In the linear case, life is easy. Suppose Y = (Y_1, ··· , Y_n)^T = AX, where X = (X_1, ··· , X_n)^T. Then X = A^{−1}Y. Then ∂x_i/∂y_j = (A^{−1})_{ij}. So |J| = |det(A^{−1})| = |det A|^{−1}. So

g(y_1, ··· , y_n) = (1/|det A|) f(A^{−1}y).
Example. Suppose X_1, X_2 have joint pdf f(x_1, x_2). Suppose we want to find the pdf of Y = X_1 + X_2. We let Z = X_2. Then X_1 = Y − Z and X_2 = Z. Then

( Y )   ( 1  1 ) ( X_1 )
( Z ) = ( 0  1 ) ( X_2 ) = AX.

Then |J| = 1/|det A| = 1. Then

g(y, z) = f(y − z, z).
So

g_Y(y) = ∫_{−∞}^{∞} f(y − z, z) dz = ∫_{−∞}^{∞} f(z, y − z) dz.
If X_1 and X_2 are independent, f(x_1, x_2) = f_1(x_1)f_2(x_2). Then

g(y) = ∫_{−∞}^{∞} f_1(z)f_2(y − z) dz.
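As a small numerical illustration of this convolution formula (a sketch assuming numpy; the choice of U[0, 1] inputs is illustrative): the sum of two independent U[0, 1] variables has the triangular density g(y) = y on [0, 1] and 2 − y on [1, 2].

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    y = rng.uniform(size=n) + rng.uniform(size=n)  # Y = X1 + X2 with X1, X2 ~ U[0, 1]

    # Convolving f1 = f2 = 1 on [0, 1] gives the triangular density
    # g(y) = y on [0, 1] and g(y) = 2 - y on [1, 2], so P(Y < 1/2) = 1/8.
    print(np.mean(y < 0.5), 1 / 8)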
Non-injective transformations
We previously discussed transformation of random variables by injective maps. What if the mapping is not injective? There is no simple formula for that, and we have to work out each case individually.
Example. Suppose X has pdf f. What is the pdf of Y = |X|?
We use our definition. We have
P(|X| ∈ (a, b)) = ∫_a^b f(x) dx + ∫_{−b}^{−a} f(x) dx = ∫_a^b (f(x) + f(−x)) dx.
So

f_Y(x) = f(x) + f(−x),

which makes sense, since getting |X| = x is equivalent to getting X = x or X = −x.
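For instance, if X is standard normal, the formula gives f_Y(x) = 2φ(x) for x ≥ 0. A quick numerical sketch (assuming numpy; the point x₀ = 1 and the window width are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    y = np.abs(rng.normal(size=200_000))  # Y = |X| with X ~ N(0, 1)

    # Estimate the density of Y near x0 and compare with 2 * phi(x0)
    x0, eps = 1.0, 0.05
    empirical = np.mean((y > x0 - eps) & (y < x0 + eps)) / (2 * eps)
    phi = np.exp(-x0**2 / 2) / np.sqrt(2 * np.pi)
    print(empirical, 2 * phi)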
Example. Suppose X_1 ~ E(λ), X_2 ~ E(µ) are independent random variables. Let Y = min(X_1, X_2). Then

P(Y ≥ t) = P(X_1 ≥ t, X_2 ≥ t)
         = P(X_1 ≥ t)P(X_2 ≥ t)
         = e^{−λt} e^{−µt}
         = e^{−(λ+µ)t}.

So Y ~ E(λ + µ).
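A short simulation sketch of this fact (assuming numpy; the rates λ = 2, µ = 3 are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    lam, mu, n = 2.0, 3.0, 100_000
    x1 = rng.exponential(scale=1 / lam, size=n)  # X1 ~ E(lambda)
    x2 = rng.exponential(scale=1 / mu, size=n)   # X2 ~ E(mu)
    y = np.minimum(x1, x2)                       # Y = min(X1, X2)

    # If Y ~ E(lambda + mu), then P(Y >= t) = exp(-(lambda + mu) t)
    t = 0.3
    print(np.mean(y >= t), np.exp(-(lam + mu) * t))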
Given random variables, we can ask not only for the minimum of the variables, but also for, say, the second smallest one. In general, we define the order statistics as follows:
Definition (Order statistics). Suppose that X_1, ··· , X_n are some random variables, and Y_1, ··· , Y_n is X_1, ··· , X_n arranged in increasing order, i.e. Y_1 ≤ Y_2 ≤ ··· ≤ Y_n. This is the order statistics. We sometimes write Y_i = X_{(i)}.
Assume the X_i are iid with cdf F and pdf f. Then the cdf of Y_n is

P(Y_n ≤ y) = P(X_1 ≤ y, ··· , X_n ≤ y) = P(X_1 ≤ y) ··· P(X_n ≤ y) = F(y)^n.
So the pdf of Y_n is

(d/dy) F(y)^n = nf(y)F(y)^{n−1}.
Also,

P(Y_1 ≥ y) = P(X_1 ≥ y, ··· , X_n ≥ y) = (1 − F(y))^n.
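Both formulas can be checked quickly by simulation (a sketch assuming numpy, using U[0, 1] samples so that F(y) = y):

    import numpy as np

    rng = np.random.default_rng(0)
    n, trials = 5, 100_000
    x = rng.uniform(size=(trials, n))  # each row is X_1, ..., X_n iid U[0, 1]
    y_n = x.max(axis=1)                # Y_n
    y_1 = x.min(axis=1)                # Y_1

    y = 0.7
    print(np.mean(y_n <= y), y**n)        # P(Y_n <= y) = F(y)^n = y^n
    print(np.mean(y_1 >= y), (1 - y)**n)  # P(Y_1 >= y) = (1 - F(y))^n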
What about the joint distribution of Y_1 and Y_n?
G(y_1, y_n) = P(Y_1 ≤ y_1, Y_n ≤ y_n)
            = P(Y_n ≤ y_n) − P(Y_1 ≥ y_1, Y_n ≤ y_n)
            = F(y_n)^n − (F(y_n) − F(y_1))^n.
Then the pdf is

∂²/(∂y_1 ∂y_n) G(y_1, y_n) = n(n − 1)(F(y_n) − F(y_1))^{n−2} f(y_1)f(y_n).
We can think about this result in terms of the multinomial distribution. By definition, the probability that Y_1 ∈ [y_1, y_1 + δ) and Y_n ∈ (y_n − δ, y_n] is approximately g(y_1, y_n)δ².
Suppose that δ is sufficiently small that all other n − 2 X_i's are very unlikely to fall into [y_1, y_1 + δ) and (y_n − δ, y_n]. Then to find the probability required, we can treat the sample space as three bins. We want exactly one X_i to fall into the first and last bins, and n − 2 X_i's to fall into the middle one. There are n!/(1!(n − 2)!1!) = n(n − 1) ways of doing so.
The probability of each thing falling into the middle bin is F(y_n) − F(y_1), and the probabilities of falling into the first and last bins are f(y_1)δ and f(y_n)δ.
Then the probability of Y_1 ∈ [y_1, y_1 + δ) and Y_n ∈ (y_n − δ, y_n] is

n(n − 1)(F(y_n) − F(y_1))^{n−2} f(y_1)f(y_n)δ²,

and the result follows.
We can also find the joint distribution of the order statistics, say g, since it is just given by

g(y_1, ··· , y_n) = n!f(y_1) ··· f(y_n)

if y_1 ≤ y_2 ≤ ··· ≤ y_n, and 0 otherwise. We have this formula because there are n! combinations of x_1, ··· , x_n that produce a given order statistics y_1, ··· , y_n, and the pdf of each combination is f(y_1) ··· f(y_n).
In the case of iid exponential variables, we find a nice distribution for the
order statistic.
Example. Let X_1, ··· , X_n be iid E(λ), and Y_1, ··· , Y_n be the order statistics. Let

Z_1 = Y_1
Z_2 = Y_2 − Y_1
  ⋮
Z_n = Y_n − Y_{n−1}.
These are the distances between the occurrences. We can write this as Z = AY, with

A = (  1   0   0  ···   0   0 )
    ( −1   1   0  ···   0   0 )
    (  ⋮   ⋮   ⋮   ⋱    ⋮   ⋮ )
    (  0   0   0  ···  −1   1 ).
Then det(A) = 1 and hence |J| = 1. Suppose that the pdf of Z_1, ··· , Z_n is, say, h. Then

h(z_1, ··· , z_n) = g(y_1, ··· , y_n) · 1
                  = n!f(y_1) ··· f(y_n)
                  = n!λ^n e^{−λ(y_1 + ··· + y_n)}
                  = n!λ^n e^{−λ(nz_1 + (n−1)z_2 + ··· + z_n)}
                  = ∏_{i=1}^{n} (λi) e^{−(λi)z_{n+1−i}}.
Since h is expressed as a product of n density functions, we have

Z_i ~ E((n + 1 − i)λ),

with all Z_i independent.
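Again, a quick simulation sketch (assuming numpy; λ = 1.5 and n = 4 are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    lam, n, trials = 1.5, 4, 100_000
    x = rng.exponential(scale=1 / lam, size=(trials, n))  # iid E(lambda)
    y = np.sort(x, axis=1)                                # order statistics Y_1 <= ... <= Y_n
    z = np.diff(y, axis=1, prepend=0)                     # Z_1 = Y_1, Z_i = Y_i - Y_{i-1}

    # Z_i ~ E((n + 1 - i) lambda), so E[Z_i] = 1 / ((n + 1 - i) lambda)
    for i in range(1, n + 1):
        print(z[:, i - 1].mean(), 1 / ((n + 1 - i) * lam))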