IA Probability - Continuous random variables

5 Continuous random variables


5.1 Continuous random variables

So far, we have only looked at the case where the outcomes Ω are discrete.

Consider an experiment where we throw a needle randomly onto the ground and record the angle it makes with a fixed horizontal. Then our sample space is $\Omega = \{\omega \in \mathbb{R} : 0 \le \omega < 2\pi\}$, and we have

$$P(\omega \in [0, \theta]) = \frac{\theta}{2\pi}, \quad 0 \le \theta \le 2\pi.$$
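As a sanity check, one can simulate the needle experiment. The sketch below assumes Python's standard `random` module as the source of uniformly distributed angles; the helper name `needle_probability`, the seed and the sample size are illustrative choices, not part of the notes.

```python
import math
import random

def needle_probability(theta: float, n: int = 200_000, seed: int = 0) -> float:
    """Estimate P(omega in [0, theta]) for an angle drawn uniformly from [0, 2*pi)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.uniform(0, 2 * math.pi) <= theta)
    return hits / n

theta = math.pi / 3
estimate = needle_probability(theta)
exact = theta / (2 * math.pi)  # theta / (2 pi), here 1/6
print(f"simulated {estimate:.4f}, exact {exact:.4f}")
```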

With continuous distributions, we can no longer talk about the probability of getting a particular number, since this is always zero. For example, we will almost never get an angle of exactly 0.42 radians.

Instead, we can only meaningfully talk about the probability of $X$ falling into a particular range. To capture the distribution of $X$, we want to define a function $f$ such that for each $x$ and small $\delta x$, the probability of $X$ lying in $[x, x + \delta x]$ is given by $f(x)\,\delta x + o(\delta x)$. This $f$ is known as the probability density function.
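To see what the $o(\delta x)$ term means, we can compare $P(x \le X \le x + \delta x) = F(x + \delta x) - F(x)$ with $f(x)\,\delta x$ for a distribution whose density and distribution function are known in closed form. This sketch borrows the exponential distribution with $\lambda = 1$ (defined later in this section) purely for illustration; the error divided by $\delta x$ should shrink towards 0 as $\delta x$ does.

```python
import math

# Illustration only: exponential distribution with lambda = 1 (defined later),
# chosen because both f and F have closed forms.
def f(x: float) -> float:
    return math.exp(-x) if x >= 0 else 0.0

def F(x: float) -> float:
    return 1 - math.exp(-x) if x >= 0 else 0.0

x = 0.5
relative_errors = []
for dx in (0.1, 0.01, 0.001):
    exact = F(x + dx) - F(x)   # P(x <= X <= x + dx)
    approx = f(x) * dx         # first-order term f(x) * dx
    relative_errors.append(abs(exact - approx) / dx)
    print(f"dx={dx}: exact={exact:.6f}, f(x)*dx={approx:.6f}")
```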

Integrating this, we know that the probability of $X \in [a, b]$ is $\int_a^b f(x)\,dx$. We take this as the definition of $f$.

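Definition (Continuous random variable). A random variable $X : \Omega \to \mathbb{R}$ is continuous if there is a function $f : \mathbb{R} \to \mathbb{R}_{\ge 0}$ such that

$$P(a \le X \le b) = \int_a^b f(x)\,dx \quad \text{for all } a \le b.$$

We call $f$ the probability density function, which satisfies

– $f \ge 0$
– $\int_{-\infty}^{\infty} f(x)\,dx = 1$.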
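The two conditions on $f$ can be checked numerically for a candidate density. The sketch below uses a simple midpoint rule (a hypothetical helper `midpoint_integral`, not a library function) and truncates the infinite range to a window holding essentially all of the mass; the particular densities and parameters are arbitrary.

```python
import math

def midpoint_integral(g, lo: float, hi: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def uniform_pdf(x: float, a: float = 2.0, b: float = 5.0) -> float:
    return 1 / (b - a) if a <= x <= b else 0.0

def exponential_pdf(x: float, lam: float = 1.5) -> float:
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

# Both densities vanish outside the windows used, apart from the exponential
# tail beyond 40, which has mass e^(-60) and is ignored.
total_uniform = midpoint_integral(uniform_pdf, 0.0, 10.0)
total_exponential = midpoint_integral(exponential_pdf, 0.0, 40.0)
print(total_uniform, total_exponential)
```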

Note that $P(X = a) = 0$, since it is $\int_a^a f(x)\,dx$. Then we also have

$$P\Big(\bigcup_{a \in \mathbb{Q}} [X = a]\Big) = 0,$$

since this is a countable union of disjoint probability-0 events (and axiom 3 states that the probability of a countable union of disjoint events is the sum of their probabilities, i.e. 0).

Definition (Cumulative distribution function). The cumulative distribution function (or simply distribution function) of a random variable $X$ (discrete, continuous, or neither) is

$$F(x) = P(X \le x).$$

We can see that $F(x)$ is non-decreasing, with $F(x) \to 0$ as $x \to -\infty$ and $F(x) \to 1$ as $x \to \infty$.

In the case of continuous random variables, we have

$$F(x) = \int_{-\infty}^{x} f(z)\,dz.$$

Then $F$ is continuous, and by the fundamental theorem of calculus $F'(x) = f(x)$ at every point $x$ where $f$ is continuous.

The name of continuous random variables comes from the fact that $F(x)$ is (absolutely) continuous.

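The cdf of a continuous random variable might look like this:

[Figure: cdf of a continuous random variable]

The cdf of a discrete random variable might look like this:

[Figure: cdf of a discrete random variable]

The cdf of a random variable that is neither discrete nor continuous might look like this:

[Figure: cdf of a random variable that is neither discrete nor continuous]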

Note that we always have

$$P(a < X \le b) = F(b) - F(a).$$

This will be equal to $\int_a^b f(x)\,dx$ in the case of continuous random variables.
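The identity $P(a < X \le b) = F(b) - F(a)$ holds even for a random variable that is neither discrete nor continuous. As a hypothetical example (not from the notes), take $X = 0$ with probability $\frac{1}{2}$ and $X \sim U[0, 1]$ otherwise, so $F$ jumps by $\frac{1}{2}$ at 0. The sketch compares $F(b) - F(a)$ with simulated frequencies; note that the half-open interval $(0, 0.5]$ excludes the atom at 0.

```python
import random

def mixed_cdf(x: float) -> float:
    """cdf of the hypothetical X: X = 0 with probability 1/2, otherwise X ~ U[0, 1]."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.5 + 0.5 * x
    return 1.0

def sample_mixed(rng: random.Random) -> float:
    return 0.0 if rng.random() < 0.5 else rng.random()

rng = random.Random(1)
samples = [sample_mixed(rng) for _ in range(100_000)]

def empirical(a: float, b: float) -> float:
    """Fraction of samples with a < X <= b."""
    return sum(1 for s in samples if a < s <= b) / len(samples)

for a, b in [(-1.0, 0.5), (0.0, 0.5)]:
    print(f"({a}, {b}]: F(b) - F(a) = {mixed_cdf(b) - mixed_cdf(a)}, simulated {empirical(a, b):.4f}")
```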

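Definition (Uniform distribution). The uniform distribution on $[a, b]$ has pdf

$$f(x) = \frac{1}{b - a} \quad \text{for } a \le x \le b,$$

and $f(x) = 0$ otherwise. Then

$$F(x) = \int_a^x f(z)\,dz = \frac{x - a}{b - a} \quad \text{for } a \le x \le b,$$

with $F(x) = 0$ for $x < a$ and $F(x) = 1$ for $x > b$.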

If X follows a uniform distribution on [a, b], we write X ∼ U[a, b].
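A direct transcription of the uniform pdf and cdf, assuming $a < b$. The last lines estimate $F'(x)$ by a central difference at an interior point, where $f$ is continuous, and compare it with $f(x)$; the function names and parameter values are illustrative.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of U[a, b]: 1/(b - a) on [a, b], 0 elsewhere."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x: float, a: float, b: float) -> float:
    """cdf of U[a, b]: 0 left of a, (x - a)/(b - a) on [a, b], 1 right of b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

a, b, h = 2.0, 5.0, 1e-6
x = 3.5
# Central-difference estimate of F'(x), to compare with f(x).
slope = (uniform_cdf(x + h, a, b) - uniform_cdf(x - h, a, b)) / (2 * h)
print(uniform_cdf(x, a, b), slope, uniform_pdf(x, a, b))
```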

Definition (Exponential random variable). The exponential random variable with parameter $\lambda > 0$ has pdf

$$f(x) = \lambda e^{-\lambda x} \quad \text{and} \quad F(x) = 1 - e^{-\lambda x}$$

for $x \ge 0$ (both are 0 for $x < 0$). We write $X \sim E(\lambda)$.

Then, for $0 \le a \le b$,

$$P(a \le X \le b) = \int_a^b f(z)\,dz = e^{-\lambda a} - e^{-\lambda b}.$$
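The closed form can be checked against simulated exponential samples. This sketch assumes Python's `random.expovariate`, whose argument is the rate $\lambda$ (so the mean is $1/\lambda$); the values of $\lambda$, $a$ and $b$ are arbitrary.

```python
import math
import random

lam, a, b = 0.5, 1.0, 3.0
exact = math.exp(-lam * a) - math.exp(-lam * b)   # e^(-lam a) - e^(-lam b)

rng = random.Random(2)
n = 200_000
# random.expovariate takes the rate lam, so samples have mean 1/lam.
hits = sum(1 for _ in range(n) if a <= rng.expovariate(lam) <= b)
simulated = hits / n
print(f"exact {exact:.4f}, simulated {simulated:.4f}")
```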

Proposition. The exponential random variable is memoryless, i.e.

$$P(X \ge x + z \mid X \ge x) = P(X \ge z) \quad \text{for all } x, z \ge 0.$$

This means that if, say, $X$ measures the lifetime of a light bulb, knowing it has already lasted for 3 hours does not give any information about how much longer it will last.

Recall that the geometric random variable is the discrete memoryless random variable.

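Proof.

$$\begin{aligned}
P(X \ge x + z \mid X \ge x) &= \frac{P(X \ge x + z)}{P(X \ge x)} \\
&= \frac{\int_{x+z}^{\infty} f(u)\,du}{\int_{x}^{\infty} f(u)\,du} \\
&= \frac{e^{-\lambda(x+z)}}{e^{-\lambda x}} \\
&= e^{-\lambda z} \\
&= P(X \ge z).
\end{aligned}$$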
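Memorylessness can also be seen empirically: among simulated lifetimes that exceed $x$, the fraction exceeding $x + z$ should be close to the unconditional fraction exceeding $z$, and both should be close to $e^{-\lambda z}$. A minimal sketch with arbitrary parameters, again assuming `random.expovariate` for sampling.

```python
import math
import random

lam, x, z = 0.5, 2.0, 1.5
rng = random.Random(3)
samples = [rng.expovariate(lam) for _ in range(300_000)]

# Condition on survival past x, then look at survival past x + z.
survived_x = [s for s in samples if s >= x]
conditional = sum(1 for s in survived_x if s >= x + z) / len(survived_x)
# Unconditional probability of surviving past z.
unconditional = sum(1 for s in samples if s >= z) / len(samples)
theory = math.exp(-lam * z)
print(f"conditional {conditional:.4f}, unconditional {unconditional:.4f}, e^(-lam z) {theory:.4f}")
```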

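Similarly, given that you have lived for $x$ days, what is the probability of dying within the next $\delta x$ days?

$$\begin{aligned}
P(X \le x + \delta x \mid X \ge x) &= \frac{P(x \le X \le x + \delta x)}{P(X \ge x)} \\
&\approx \frac{\lambda e^{-\lambda x}\,\delta x}{e^{-\lambda x}} \\
&= \lambda\,\delta x.
\end{aligned}$$

So it is independent of how old you currently are, assuming your survival time follows an exponential distribution.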

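In general, we can define the hazard rate to be

$$h(x) = \frac{f(x)}{1 - F(x)}.$$

Then

$$\begin{aligned}
P(x \le X \le x + \delta x \mid X \ge x) &= \frac{P(x \le X \le x + \delta x)}{P(X \ge x)} \\
&\approx \frac{\delta x\, f(x)}{1 - F(x)} \\
&= \delta x \cdot h(x).
\end{aligned}$$

In the case of the exponential distribution, $h(x)$ is constant and equal to $\lambda$.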
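The hazard rate is easy to evaluate once $f$ and $F$ are known. The sketch below (function names are hypothetical) confirms that the exponential hazard is constant and, for contrast, evaluates the hazard of $U[0, 1]$, which is $1/(1 - x)$ and grows as $x \to 1$; that comparison is an added illustration, not part of the notes.

```python
import math

def hazard(f, F, x: float) -> float:
    """Hazard rate h(x) = f(x) / (1 - F(x)); only meaningful where F(x) < 1."""
    return f(x) / (1 - F(x))

lam = 2.0

def exp_pdf(x: float) -> float:
    return lam * math.exp(-lam * x)

def exp_cdf(x: float) -> float:
    return 1 - math.exp(-lam * x)

# Added comparison: U[0, 1] has f(x) = 1 and F(x) = x on [0, 1].
def unif_pdf(x: float) -> float:
    return 1.0

def unif_cdf(x: float) -> float:
    return x

for x in (0.1, 0.5, 0.9):
    print(f"x={x}: exponential h={hazard(exp_pdf, exp_cdf, x):.6f}, "
          f"uniform h={hazard(unif_pdf, unif_cdf, x):.6f}")
```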

As with discrete variables, we can define the expected value and the variance. However, we will (obviously) have to replace the sum with an integral.

Definition (Expectation). The expectation (or mean) of a continuous random variable is

$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx,$$

provided not both $\int_0^{\infty} x f(x)\,dx$ and $\int_{-\infty}^{0} x f(x)\,dx$ are infinite.

Theorem. If $X$ is a continuous random variable, then

$$E[X] = \int_0^{\infty} P(X \ge x)\,dx - \int_0^{\infty} P(X \le -x)\,dx.$$

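This result is more intuitive in the discrete case. If $X$ takes values in $\{0, 1, 2, \ldots\}$, then

$$\sum_{x=0}^{\infty} x P(X = x) = \sum_{x=0}^{\infty} \sum_{y=x+1}^{\infty} P(X = y) = \sum_{x=0}^{\infty} P(X > x),$$

where the first equality holds because on both sides, we have $x$ copies of $P(X = x)$ in the sum.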
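For a concrete check of the discrete identity, take a fair six-sided die (an illustrative choice). Exact arithmetic with `fractions.Fraction` avoids rounding, so both sides should come out as exactly $\frac{7}{2}$.

```python
from fractions import Fraction

# Fair six-sided die: P(X = k) = 1/6 for k = 1, ..., 6.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

mean = sum(k * p for k, p in pmf.items())
tail_sum = sum(
    sum(p for y, p in pmf.items() if y > x)   # P(X > x)
    for x in range(0, max(pmf))               # P(X > x) = 0 once x >= 6
)
print(mean, tail_sum)  # 7/2 7/2
```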

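Proof.

$$\begin{aligned}
\int_0^{\infty} P(X \ge x)\,dx &= \int_0^{\infty} \int_x^{\infty} f(y)\,dy\,dx \\
&= \int_0^{\infty} \int_0^{\infty} I[y \ge x]\, f(y)\,dy\,dx \\
&= \int_0^{\infty} \left( \int_0^{\infty} I[x \le y]\,dx \right) f(y)\,dy \\
&= \int_0^{\infty} y f(y)\,dy,
\end{aligned}$$

where swapping the order of integration is justified because the integrand is non-negative.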

We can similarly show that

$$\int_0^{\infty} P(X \le -x)\,dx = -\int_{-\infty}^{0} y f(y)\,dy.$$

Subtracting the two identities gives the theorem.
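The theorem also handles variables that take negative values. As an illustration (not from the notes), take $X \sim U[-1, 3]$, so $E[X] = 1$: the first integral is $\frac{9}{8}$ and the second is $\frac{1}{8}$. The sketch evaluates both with a midpoint rule, which is exact (up to rounding) for these linear integrands; the helper `midpoint_integral` is hypothetical.

```python
def midpoint_integral(g, lo: float, hi: float, n: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

a, b = -1.0, 3.0  # X ~ U[-1, 3], so E[X] = (a + b) / 2 = 1

def cdf(x: float) -> float:
    return min(max((x - a) / (b - a), 0.0), 1.0)

# P(X >= x) = 1 - F(x) for a continuous X; it vanishes for x >= b.
upper = midpoint_integral(lambda x: 1 - cdf(x), 0.0, b)
# P(X <= -x) = F(-x); it vanishes for x >= -a.
lower = midpoint_integral(lambda x: cdf(-x), 0.0, -a)
print(upper, lower, upper - lower)
```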

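Example. Suppose $X \sim E(\lambda)$. Then

$$P(X \ge x) = \int_x^{\infty} \lambda e^{-\lambda t}\,dt = e^{-\lambda x}$$

for $x \ge 0$. Since $X \ge 0$, the second integral in the theorem vanishes, so

$$E[X] = \int_0^{\infty} e^{-\lambda x}\,dx = \frac{1}{\lambda}.$$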
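The same tail integral can be evaluated numerically. This sketch truncates the range at a point where $e^{-\lambda x}$ is negligible and applies a midpoint rule; $\lambda = 0.25$ is arbitrary, so the result should be close to $1/\lambda = 4$.

```python
import math

lam = 0.25
upper_limit, n = 200.0, 200_000   # e^(-lam * 200) = e^(-50), so the neglected tail is tiny
h = upper_limit / n
# Midpoint rule for the integral of P(X >= x) = e^(-lam x) over [0, upper_limit].
tail_integral = h * sum(math.exp(-lam * (i + 0.5) * h) for i in range(n))
print(tail_integral, 1 / lam)
```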

Definition (Variance). The variance of a continuous random variable is

$$\operatorname{var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2.$$

So we have

$$\operatorname{var}(X) = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \left( \int_{-\infty}^{\infty} x f(x)\,dx \right)^2.$$

Example. Let $X \sim U[a, b]$. Then

$$E[X] = \int_a^b \frac{x}{b - a}\,dx = \frac{a + b}{2},$$

$$\operatorname{var}(X) = \int_a^b \frac{x^2}{b - a}\,dx - (E[X])^2 = \frac{(b - a)^2}{12}.$$
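The uniform example can be checked in exact arithmetic. The sketch below evaluates the two integrals in closed form, using $\int_a^b x\,dx = \frac{b^2 - a^2}{2}$ and $\int_a^b x^2\,dx = \frac{b^3 - a^3}{3}$, with `fractions.Fraction` so that the comparison with $\frac{a+b}{2}$ and $\frac{(b-a)^2}{12}$ is exact; the endpoints are arbitrary.

```python
from fractions import Fraction

def uniform_moments(a: Fraction, b: Fraction):
    """Exact E[X] and var(X) for X ~ U[a, b], from the integrals in the text."""
    mean = (b**2 - a**2) / (2 * (b - a))     # integral of x/(b - a) over [a, b]
    second = (b**3 - a**3) / (3 * (b - a))   # integral of x^2/(b - a) over [a, b]
    return mean, second - mean**2

a, b = Fraction(2), Fraction(7)
mean, var = uniform_moments(a, b)
assert mean == (a + b) / 2
assert var == (b - a) ** 2 / 12
print(mean, var)  # 9/2 25/12
```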

Apart from the mean, or expected value, we can also have other notions of “average values”.

Definition (Mode and median). Given a pdf $f(x)$, we call $\hat{x}$ a mode if

$$f(\hat{x}) \ge f(x)$$

for all $x$. Note that a distribution can have many modes. For example, in the uniform distribution on $[a, b]$, every $x \in [a, b]$ is a mode.

We say $\hat{x}$ is a median if

$$\int_{-\infty}^{\hat{x}} f(x)\,dx = \int_{\hat{x}}^{\infty} f(x)\,dx,$$

i.e. both integrals equal $\frac{1}{2}$.
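For a continuous random variable the median condition says $F(\hat{x}) = \frac{1}{2}$, so a median can be found by bisection on $F$. A sketch for $E(\lambda)$, where solving $1 - e^{-\lambda \hat{x}} = \frac{1}{2}$ gives $\hat{x} = \frac{\ln 2}{\lambda}$, smaller than the mean $\frac{1}{\lambda}$; the helper `median_by_bisection` and its tolerance are illustrative.

```python
import math

def median_by_bisection(F, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Find x with F(x) = 1/2 for a continuous, non-decreasing F, given F(lo) < 1/2 <= F(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

lam = 3.0
m = median_by_bisection(lambda x: 1 - math.exp(-lam * x), 0.0, 10.0)
print(m, math.log(2) / lam)
```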

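For a discrete random variable, a median is any $\hat{x}$ such that

$$P(X \le \hat{x}) \ge \frac{1}{2}, \quad P(X \ge \hat{x}) \ge \frac{1}{2}.$$

Here we have non-strict inequalities rather than equalities, since the two probabilities can both exceed $\frac{1}{2}$: if, say, the random variable always takes value 0, then both probabilities will be 1.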
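The discrete definition can be checked directly from a pmf. In this sketch (the helper `is_median` is hypothetical), a fair die has more than one median: 3, 4 and anything in between satisfy both inequalities. The constant random variable from the text passes with both probabilities equal to 1.

```python
from fractions import Fraction

def is_median(pmf: dict, m) -> bool:
    """Check P(X <= m) >= 1/2 and P(X >= m) >= 1/2 for a pmf given as {value: probability}."""
    at_most = sum(p for x, p in pmf.items() if x <= m)
    at_least = sum(p for x, p in pmf.items() if x >= m)
    return at_most >= Fraction(1, 2) and at_least >= Fraction(1, 2)

die = {k: Fraction(1, 6) for k in range(1, 7)}
print([m for m in (2, 3, 3.5, 4, 5) if is_median(die, m)])  # [3, 3.5, 4]
print(is_median({0: Fraction(1)}, 0))  # True
```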

Suppose that we have an experiment whose outcome follows the distribution of $X$. We can perform the experiment many times and obtain many results $X_1, \cdots, X_n$. The average of these results is known as the sample mean.

Definition (Sample mean). If $X_1, \cdots, X_n$ is a random sample from some distribution, then the sample mean is

X =