5 Continuous random variables
IA Probability
5.1 Continuous random variables
So far, we have only looked at the case where the outcomes Ω are discrete.
Consider an experiment where we throw a needle randomly onto the ground
and record the angle it makes with a fixed horizontal. Then our sample space is
Ω = {ω ∈ R : 0 ≤ ω < 2π}. Then we have
P(ω ∈ [0, θ]) = θ/(2π),    0 ≤ θ ≤ 2π.
With continuous distributions, we can no longer talk about the probability of
getting a particular number, since this is always zero. For example, we will
almost never get an angle of exactly 0.42 radians.
Instead, we can only meaningfully talk about the probability of X falling into a particular range. To capture the distribution of X, we want to define a function f such that for each x and small δx, the probability of X lying in [x, x + δx] is given by f(x)δx + o(δx). This f is known as the probability density function.
Integrating this, we know that the probability of X ∈ [a, b] is ∫_a^b f(x) dx. We take this as the definition of f.
Definition (Continuous random variable). A random variable X : Ω → R is continuous if there is a function f : R → R_{≥0} such that
P(a ≤ X ≤ b) = ∫_a^b f(x) dx.
We call f the probability density function, which satisfies
– f ≥ 0
– ∫_{−∞}^∞ f(x) dx = 1.
Note that P(X = a) = 0 since it is ∫_a^a f(x) dx. Then we also have
P(⋃_{a∈Q} [X = a]) = 0,
since it is a countable union of probability 0 events (and axiom 3 states that the probability of the countable union is the sum of probabilities, i.e. 0).
Definition (Cumulative distribution function). The cumulative distribution function (or simply distribution function) of a random variable X (discrete, continuous, or neither) is
F(x) = P(X ≤ x).
We can see that F(x) is increasing and F(x) → 1 as x → ∞.
In the case of continuous random variables, we have
F(x) = ∫_{−∞}^x f(z) dz.
Then F is continuous and differentiable. In general, F′(x) = f(x) whenever F is differentiable.
The name of continuous random variables comes from the fact that F(x) is (absolutely) continuous.
The cdf of a continuous random variable might look like this: [figure: continuous curve increasing from 0 to 1]
The cdf of a discrete random variable might look like this: [figure: step function]
The cdf of a random variable that is neither discrete nor continuous might look like this: [figure: curve with both jumps and continuously increasing parts]
Note that we always have
P(a < X ≤ b) = F(b) − F(a).
This will be equal to ∫_a^b f(x) dx in the case of continuous random variables.
Definition (Uniform distribution). The uniform distribution on [a, b] has pdf
f(x) = 1/(b − a) for a ≤ x ≤ b.
Then
F(x) = ∫_a^x f(z) dz = (x − a)/(b − a)
for a ≤ x ≤ b.
If X follows a uniform distribution on [a, b], we write X ∼ U[a, b].
Definition (Exponential random variable). The exponential random variable with parameter λ has pdf
f(x) = λe^{−λx}
and
F(x) = 1 − e^{−λx}
for x ≥ 0. We write X ∼ E(λ).
Then
P(a ≤ X ≤ b) = ∫_a^b f(z) dz = e^{−λa} − e^{−λb}.
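As a quick numerical sanity check of this formula (the values λ = 2, a = 0.5, b = 1.5 are arbitrary choices), we can integrate the pdf over [a, b] with the trapezoidal rule and compare against e^{−λa} − e^{−λb}:

```python
import math

# Check P(a <= X <= b) = e^{-λa} - e^{-λb} for X ~ E(λ) by
# numerically integrating the pdf f(x) = λ e^{-λx} over [a, b].
# λ = 2, a = 0.5, b = 1.5 are arbitrary choices.
lam, a, b = 2.0, 0.5, 1.5

def f(x):
    return lam * math.exp(-lam * x)

n = 100_000                       # number of trapezoids
h = (b - a) / n
integral = h * ((f(a) + f(b)) / 2 + sum(f(a + i * h) for i in range(1, n)))

closed_form = math.exp(-lam * a) - math.exp(-lam * b)
print(integral, closed_form)      # the two agree to many decimal places
```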
Proposition. The exponential random variable is memoryless, i.e.
P(X ≥ x + z | X ≥ x) = P(X ≥ z).
This means that, say, if X measures the lifetime of a light bulb, knowing it has already lasted for 3 hours does not give any information about how much longer it will last.
Recall that the geometric random variable is the discrete memoryless random variable.
Proof.
P(X ≥ x + z | X ≥ x) = P(X ≥ x + z)/P(X ≥ x)
= (∫_{x+z}^∞ f(u) du) / (∫_x^∞ f(u) du)
= e^{−λ(x+z)}/e^{−λx}
= e^{−λz}
= P(X ≥ z).
Similarly, given that you have lived for x days, what is the probability of dying within the next δx days?
P(X ≤ x + δx | X ≥ x) = P(x ≤ X ≤ x + δx)/P(X ≥ x) = λe^{−λx}δx/e^{−λx} = λδx,
to first order in δx.
So it is independent of how old you currently are, assuming your survival follows
an exponential distribution.
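We can check this empirically. The sketch below (parameter values, seed and sample size are arbitrary choices) estimates both sides of the memoryless identity from simulated exponential lifetimes:

```python
import math
import random

# Empirically check memorylessness: P(X >= x + z | X >= x) ≈ P(X >= z)
# for X ~ E(λ). λ = 1, x = 3, z = 2 and the sample size are arbitrary.
random.seed(0)
lam, x, z = 1.0, 3.0, 2.0
samples = [random.expovariate(lam) for _ in range(200_000)]

survived_x = [s for s in samples if s >= x]
cond = sum(s >= x + z for s in survived_x) / len(survived_x)
uncond = sum(s >= z for s in samples) / len(samples)

print(cond, uncond, math.exp(-lam * z))  # all three should be close
```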
In general, we can define the hazard rate to be
h(x) = f(x)/(1 − F(x)).
Then
P(x ≤ X ≤ x + δx | X ≥ x) = P(x ≤ X ≤ x + δx)/P(X ≥ x) = δx f(x)/(1 − F(x)) = δx · h(x).
In the case of the exponential distribution, h(x) is constant and equal to λ.
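A small check that the exponential hazard rate really is constant (λ = 0.7 and the evaluation points are arbitrary choices):

```python
import math

# The hazard rate h(x) = f(x) / (1 - F(x)) for X ~ E(λ) should be the
# constant λ, whatever x we evaluate it at. λ = 0.7 is an arbitrary choice.
lam = 0.7

def f(x):
    return lam * math.exp(-lam * x)

def F(x):
    return 1 - math.exp(-lam * x)

def hazard(x):
    return f(x) / (1 - F(x))

rates = [hazard(x) for x in (0.1, 1.0, 5.0, 20.0)]
print(rates)  # every entry equals λ = 0.7 (up to float rounding)
```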
Similar to discrete variables, we can define the expected value and the
variance. However, we will (obviously) have to replace the sum with an integral.
Definition (Expectation). The expectation (or mean) of a continuous random variable is
E[X] = ∫_{−∞}^∞ x f(x) dx,
provided not both ∫_0^∞ x f(x) dx and ∫_{−∞}^0 x f(x) dx are infinite.
Theorem. If X is a continuous random variable, then
E[X] = ∫_0^∞ P(X ≥ x) dx − ∫_0^∞ P(X ≤ −x) dx.
This result is more intuitive in the discrete case:
∑_{x=0}^∞ x P(X = x) = ∑_{x=0}^∞ ∑_{y=x+1}^∞ P(X = y) = ∑_{x=0}^∞ P(X > x),
where the first equality holds because on both sides, we have x copies of P(X = x) in the sum.
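The discrete identity is easy to verify directly on a small hypothetical pmf (the values below are an arbitrary example distribution on {0, 1, 2, 3}):

```python
# Check the discrete tail-sum formula  Σ x P(X=x) = Σ P(X > x)
# on a small hypothetical distribution on {0, 1, 2, 3}.
pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

mean = sum(x * p for x, p in pmf.items())
tail_sum = sum(sum(p for y, p in pmf.items() if y > x) for x in range(3))

print(mean, tail_sum)  # both come out to 2 (up to float rounding)
```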
Proof.
∫_0^∞ P(X ≥ x) dx = ∫_0^∞ ∫_x^∞ f(y) dy dx
= ∫_0^∞ ∫_0^∞ I[y ≥ x] f(y) dy dx
= ∫_0^∞ (∫_0^∞ I[x ≤ y] dx) f(y) dy
= ∫_0^∞ y f(y) dy.
We can similarly show that ∫_0^∞ P(X ≤ −x) dx = −∫_{−∞}^0 y f(y) dy.
Example. Suppose X ∼ E(λ). Then
P(X ≥ x) = ∫_x^∞ λe^{−λt} dt = e^{−λx}.
So
E[X] = ∫_0^∞ e^{−λx} dx = 1/λ.
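The tail formula can be checked numerically for this example: integrating P(X ≥ x) = e^{−λx} over [0, ∞) should recover 1/λ (λ = 2 and the truncation point are arbitrary choices):

```python
import math

# Check the tail formula E[X] = ∫_0^∞ P(X ≥ x) dx for X ~ E(λ),
# where P(X ≥ x) = e^{-λx}. λ = 2 is an arbitrary choice; the integral
# is truncated at a point where the tail is negligible.
lam = 2.0
upper = 30.0                      # e^{-60} is far below float precision
n = 200_000
h = upper / n

def tail(x):
    return math.exp(-lam * x)

# Trapezoidal rule on [0, upper].
integral = h * ((tail(0) + tail(upper)) / 2 + sum(tail(i * h) for i in range(1, n)))

print(integral, 1 / lam)  # both ≈ 0.5
```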
Definition (Variance). The variance of a continuous random variable is
var(X) = E[(X − E[X])²] = E[X²] − (E[X])².
So we have
var(X) = ∫_{−∞}^∞ x² f(x) dx − (∫_{−∞}^∞ x f(x) dx)².
Example. Let X ∼ U[a, b]. Then
E[X] = ∫_a^b x/(b − a) dx = (a + b)/2.
So
var(X) = ∫_a^b x²/(b − a) dx − (E[X])² = (b − a)²/12.
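A Monte Carlo check of both formulas (the interval [2, 5], the seed, and the sample size are arbitrary choices):

```python
import random

# Monte Carlo check of E[X] = (a+b)/2 and var(X) = (b-a)^2/12 for
# X ~ U[a, b]. The interval [2, 5], seed, and sample size are arbitrary.
random.seed(0)
a, b, n = 2.0, 5.0, 500_000
xs = [random.uniform(a, b) for _ in range(n)]

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

print(mean, var)  # ≈ (a+b)/2 = 3.5 and ≈ (b-a)^2/12 = 0.75
```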
Apart from the mean, or expected value, we can also have other notions of
“average values”.
Definition (Mode and median). Given a pdf f(x), we call x̂ a mode if
f(x̂) ≥ f(x)
for all x. Note that a distribution can have many modes. For example, in the uniform distribution, all x are modes.
We say x̂ is a median if
∫_{−∞}^{x̂} f(x) dx = 1/2 = ∫_{x̂}^∞ f(x) dx.
For a discrete random variable, the median is x̂ such that
P(X ≤ x̂) ≥ 1/2,    P(X ≥ x̂) ≥ 1/2.
Here we have a non-strict inequality since if the random variable, say, always takes value 0, then both probabilities will be 1.
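For example, for X ∼ E(λ), solving F(x̂) = 1 − e^{−λx̂} = 1/2 gives the median x̂ = ln 2/λ. A sketch verifying this by bisection on F (λ = 3 is an arbitrary choice):

```python
import math

# For X ~ E(λ), the median solves F(x̂) = 1 - e^{-λ x̂} = 1/2, i.e.
# x̂ = ln 2 / λ. Verify by bisecting F; λ = 3 is an arbitrary choice.
lam = 3.0

def F(x):
    return 1 - math.exp(-lam * x)

lo, hi = 0.0, 10.0
for _ in range(100):              # bisect until F(mid) = 1/2
    mid = (lo + hi) / 2
    if F(mid) < 0.5:
        lo = mid
    else:
        hi = mid

print(mid, math.log(2) / lam)  # both ≈ 0.2310
```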
Suppose that we have an experiment whose outcome follows the distribution of X. We can perform the experiment many times and obtain many results X_1, ···, X_n. The average of these results is known as the sample mean.
Definition (Sample mean). If X_1, ···, X_n is a random sample from some distribution, then the sample mean is
X̄ = (1/n) ∑_{i=1}^n X_i.
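A quick illustration (using hypothetical E(λ = 2) samples; the seed and sample sizes are arbitrary choices): as n grows, the sample mean settles near the true mean 1/λ = 0.5:

```python
import random

# The sample mean X̄ = (1/n) Σ X_i of n draws settles near the true mean
# as n grows. Here the X_i are hypothetical E(λ = 2) samples, so the true
# mean is 1/λ = 0.5; seed and sample sizes are arbitrary choices.
random.seed(42)
lam = 2.0

for n in (10, 1_000, 100_000):
    xs = [random.expovariate(lam) for _ in range(n)]
    x_bar = sum(xs) / n
    print(n, x_bar)  # drifts toward 0.5 as n grows
```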