3 Integration

II Probability and Measure

3.1 Definition and basic properties

We are now going to work towards defining the integral of a measurable function on a measure space (E, E, µ). Different sources use different notations for the integral. The following notations are all commonly used:

µ(f) = ∫_E f dµ = ∫_E f(x) dµ(x) = ∫_E f(x) µ(dx).

In the case where (E, E, µ) = (R, B, Lebesgue), people often just write this as

µ(f) = ∫_R f(x) dx.

On the other hand, if (E, E, µ) = (Ω, F, P) is a probability space, and X is a random variable, then people write the integral as E[X], the expectation of X.

So how are we going to define the integral? There are two steps. The idea is that we first define the integral on simple functions, and then extend the definition to more general measurable functions by taking limits. For simple functions, it will be obvious that the definition satisfies the nice properties we expect, and we will have to check that these properties are preserved when we take the limit.

Definition (Simple function). A simple function is a measurable function that can be written as a finite non-negative linear combination of indicator functions of measurable sets, i.e.

f = ∑_{k=1}^n a_k 1_{A_k}

for some A_k ∈ E and a_k ≥ 0.

Note that some sources do not assume that a_k ≥ 0, but assuming this makes our life easier.

It is obvious that

Proposition.

A function is simple iff it is measurable, non-negative, and takes

on only finitely-many values.

Definition (Integral of simple function). The integral of a simple function

f = ∑_{k=1}^n a_k 1_{A_k}

is given by

µ(f) = ∑_{k=1}^n a_k µ(A_k).

Note that it can be that µ(A_k) = ∞, but a_k = 0. When this happens, we are just going to declare that 0 · ∞ = 0 (this makes sense because it means we are ignoring all 0 · 1_A terms for any A). After we do this, we can check the integral is well-defined.
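To see the definition in action, here is a small numerical sketch (illustration only): we work with Lebesgue measure on [0, 1] and assume, purely for simplicity, that each A_k is an interval, so that µ(A_k) is just its length.

```python
# Sketch: integrating a simple function f = sum_k a_k 1_{A_k} on ([0,1], Lebesgue).
# Simplifying assumption (for illustration only): each A_k is an interval
# (left, right), so its Lebesgue measure is just right - left.

def simple_integral(terms):
    """terms: list of (a_k, (left, right)) with a_k >= 0 and disjoint intervals.
    Returns mu(f) = sum_k a_k * mu(A_k), with the convention 0 * inf = 0."""
    total = 0.0
    for a_k, (left, right) in terms:
        if a_k == 0:
            continue            # the 0 * mu(A_k) = 0 convention: contributes nothing
        total += a_k * (right - left)   # a_k * (Lebesgue measure of the interval)
    return total

# f = 2 * 1_[0, 1/2] + 3 * 1_(1/2, 3/4]
f = [(2, (0.0, 0.5)), (3, (0.5, 0.75))]
print(simple_integral(f))  # 2 * 0.5 + 3 * 0.25 = 1.75
```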

We are now going to extend this definition to non-negative measurable

functions by a limiting procedure. Once we’ve done this, we are going to extend

the definition to measurable functions by linearity of the integral. Then we

would have a definition of the integral, and we are going to deduce properties of

the integral using approximation.

Definition (Integral). Let f be a non-negative measurable function. We set

µ(f) = sup{µ(g) : g ≤ f, g is simple}.

For arbitrary f, we write

f = f^+ − f^− = (f ∨ 0) + (f ∧ 0),

where f^+ = f ∨ 0 and f^− = −(f ∧ 0). We put |f| = f^+ + f^−. We say f is integrable if µ(|f|) < ∞. In this case, set

µ(f) = µ(f^+) − µ(f^−).

If only one of µ(f^+) and µ(f^−) is finite, then we can still make the above definition, and the result will be infinite.
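To see the sup-definition in action, here is a numerical sketch (illustration only): for f(x) = x² on ([0, 1], Lebesgue), we take the standard dyadic simple functions g_n = 2^{−n}⌊2^n f⌋ ∧ n ≤ f, whose integrals can be computed exactly from the measures of the level sets {f ≥ k/2^n}, and watch them increase towards ∫ x² dx = 1/3.

```python
import math

# Sketch: the sup-definition in action for f(x) = x^2 on ([0,1], Lebesgue).
# The simple functions g_n = 2^{-n} * floor(2^n f) ∧ n satisfy g_n <= f, and
#     mu(g_n) = sum_{k>=1} 2^{-n} * lambda({f >= k/2^n})   (a layer-cake sum).
# For f(x) = x^2 on [0,1]: lambda({f >= t}) = 1 - sqrt(t) for 0 <= t <= 1.

def mu_staircase(n):
    step = 2.0 ** -n
    total = 0.0
    for k in range(1, 2 ** n + 1):   # levels above 1 have measure 0 for this f
        t = k * step
        if t <= 1.0:
            total += step * (1.0 - math.sqrt(t))
    return total

vals = [mu_staircase(n) for n in range(1, 11)]
print(vals[-1])  # increases with n towards the true integral 1/3
```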

In the case where we are integrating over (a subset of) the reals, we call it

the Lebesgue integral.

Proposition. Let f : [0, 1] → R be Riemann integrable. Then it is also Lebesgue integrable, and the two integrals agree.

We will not prove this, but this immediately gives us results like the fundamental theorem of calculus, and also helps us to actually compute the integral. However, note that this does not hold for infinite domains, as you will see in the second example sheet.

On the other hand, Lebesgue integration is strictly more general: a lot of functions are Lebesgue integrable but not Riemann integrable.

Example. Take the standard non-Riemann-integrable function

f = 1_{[0,1]\Q}.

Then f is not Riemann integrable, but it is Lebesgue integrable, since

µ(f) = µ([0, 1] \ Q) = 1.
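The failure of the Riemann integral here can be seen concretely. In the sketch below (illustration only), the tag points are kept exactly rational with the fractions module, so every Riemann sum evaluates f at rationals and returns 0 — even though tags could equally be chosen irrational to give 1. The Lebesgue integral has no such ambiguity, because Q ∩ [0, 1] is countable and hence null.

```python
from fractions import Fraction

# Sketch: why f = 1_{[0,1]\Q} defeats the Riemann integral. Every Riemann sum
# whose tag points are rational evaluates f to 0, no matter how fine the
# partition, while irrational tags would give sums equal to 1. The Lebesgue
# value is unambiguous: Q ∩ [0,1] is countable, hence null, so mu(f) = 1.

def f(x):
    # Here x is an exact rational (a Fraction), so f(x) = 0; on irrational
    # inputs f would be 1, but irrationals are not representable exactly.
    return 0 if isinstance(x, Fraction) else 1

def riemann_sum_rational_tags(n):
    # Partition [0,1] into n pieces, tagging each with its rational left endpoint.
    width = Fraction(1, n)
    return sum(f(Fraction(k, n)) * width for k in range(n))

print(riemann_sum_rational_tags(1000))  # 0, no matter how fine the partition
```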

We are now going to study some basic properties of the integral. We will first

look at the properties of integrals of simple functions, and then extend them to

general integrable functions.

For f, g simple, and α, β ≥ 0, we have that

µ(αf + βg) = αµ(f) + βµ(g).

So the integral is linear.

Another important property is monotonicity — if f ≤ g, then µ(f) ≤ µ(g).

Finally, we have that f = 0 a.e. iff µ(f) = 0. It is absolutely crucial here that we are talking about non-negative functions.

Our goal is to show that these three properties are also satisfied for arbitrary

non-negative measurable functions, and the first two hold for integrable functions.

In order to achieve this, we prove a very important tool — the monotone

convergence theorem. Later, we will also learn about the dominated convergence

theorem and Fatou’s lemma. These are the main and very important results

about exchanging limits and integration.

Theorem (Monotone convergence theorem). Suppose that (f_n), f are non-negative measurable with f_n ↗ f. Then µ(f_n) ↗ µ(f).

In the proof we will use the fact that the integral is monotonic, which we

shall prove later.

Proof. We will split the proof into four steps. We will prove each of the following in turn:

(i) If f_n and f are indicator functions, then the theorem holds.

(ii) If f is an indicator function, then the theorem holds.

(iii) If f is simple, then the theorem holds.

(iv) If f is non-negative measurable, then the theorem holds.

Each part follows rather straightforwardly from the previous one, and the reader is encouraged to try to prove it themselves.

We first consider the case where f_n = 1_{A_n} and f = 1_A. Then f_n ↗ f is true iff A_n ↗ A. On the other hand, µ(f_n) ↗ µ(f) iff µ(A_n) ↗ µ(A).

For convenience, we let A_0 = ∅. We can write

µ(A) = µ(⋃_n (A_n \ A_{n−1})) = ∑_{n=1}^∞ µ(A_n \ A_{n−1}) = lim_{N→∞} ∑_{n=1}^N µ(A_n \ A_{n−1}) = lim_{N→∞} µ(A_N).

So done.

We next consider the case where f = 1_A for some A. Fix ε > 0, and set

A_n = {f_n > 1 − ε} ∈ E.

Then we know that A_n ↗ A, as f_n ↗ f. Moreover, by definition, we have

(1 − ε)1_{A_n} ≤ f_n ≤ f = 1_A.

As A_n ↗ A, we have that

(1 − ε)µ(f) = (1 − ε) lim_{n→∞} µ(A_n) ≤ lim_{n→∞} µ(f_n) ≤ µ(f),

since f_n ≤ f. Since ε is arbitrary, we know that

lim_{n→∞} µ(f_n) = µ(f).

Next, we consider the case where f is simple. We write

f = ∑_{k=1}^m a_k 1_{A_k},

where a_k > 0 and the A_k are pairwise disjoint. Since f_n ↗ f, we know

a_k^{−1} f_n 1_{A_k} ↗ 1_{A_k}.

So we have

µ(f_n) = ∑_{k=1}^m µ(f_n 1_{A_k}) = ∑_{k=1}^m a_k µ(a_k^{−1} f_n 1_{A_k}) → ∑_{k=1}^m a_k µ(A_k) = µ(f).

Finally, suppose f is non-negative measurable, and suppose g ≤ f is a simple function. As f_n ↗ f, we know f_n ∧ g ↗ f ∧ g = g. So by the previous case, we know that

µ(f_n ∧ g) → µ(g).

We also know that

µ(f_n) ≥ µ(f_n ∧ g).

So we have

lim_{n→∞} µ(f_n) ≥ µ(g)

for all simple g ≤ f. This is possible only if

lim_{n→∞} µ(f_n) ≥ µ(f)

by definition of the integral. However, we also know that µ(f_n) ≤ µ(f) for all n, again by definition of the integral. So we must have equality. So we have

µ(f) = lim_{n→∞} µ(f_n).
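The theorem can also be watched numerically. The following sketch (illustration only; both the function and the midpoint grid are chosen for convenience, not taken from the notes) uses f(x) = x^{−1/2} on (0, 1], which is non-negative measurable with µ(f) = 2, and the truncations f_n = f ∧ n, which increase to f. A direct computation gives µ(f_n) = 2 − 1/n, which indeed increases to 2.

```python
import math

# Numerical illustration of monotone convergence (a sketch, not a proof):
# f(x) = x^{-1/2} on (0,1] has Lebesgue integral 2, and f_n = f ∧ n ↗ f.
# Exactly: mu(f_n) = n * (1/n^2) + ∫_{1/n^2}^1 x^{-1/2} dx = 2 - 1/n.

def mu_fn(n, cells=100_000):
    # Midpoint-rule approximation to ∫_0^1 min(n, x^{-1/2}) dx.
    h = 1.0 / cells
    return sum(min(n, 1.0 / math.sqrt((k + 0.5) * h)) * h for k in range(cells))

approx = [mu_fn(n) for n in range(1, 11)]
print(approx)  # increases towards mu(f) = 2, tracking 2 - 1/n
```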

Theorem. Let f, g be non-negative measurable, and α, β ≥ 0. We have that

(i) µ(αf + βg) = αµ(f) + βµ(g).

(ii) f ≤ g implies µ(f) ≤ µ(g).

(iii) f = 0 a.e. iff µ(f) = 0.

Proof.

(i) Let

f_n = 2^{−n}⌊2^n f⌋ ∧ n,    g_n = 2^{−n}⌊2^n g⌋ ∧ n.

Then f_n, g_n are simple with f_n ↗ f and g_n ↗ g. Hence µ(f_n) ↗ µ(f), µ(g_n) ↗ µ(g) and µ(αf_n + βg_n) ↗ µ(αf + βg), by the monotone convergence theorem. As f_n, g_n are simple, we have that

µ(αf_n + βg_n) = αµ(f_n) + βµ(g_n).

Taking the limit as n → ∞, we get

µ(αf + βg) = αµ(f) + βµ(g).

(ii) We shall be careful not to use the monotone convergence theorem. We have

µ(g) = sup{µ(h) : h ≤ g simple} ≥ sup{µ(h) : h ≤ f simple} = µ(f).

(iii) Suppose f ≠ 0 a.e. Let

A_n = {x : f(x) > 1/n}.

Then

{x : f(x) ≠ 0} = ⋃_n A_n.

Since the left-hand set has positive measure, it follows that there is some A_n with positive measure. For that n, we define

h = (1/n) 1_{A_n}.

Then µ(f) ≥ µ(h) > 0. So µ(f) ≠ 0.

Conversely, suppose f = 0 a.e. We let

f_n = 2^{−n}⌊2^n f⌋ ∧ n

be a simple function. Then f_n ↗ f and f_n = 0 a.e. So

µ(f) = lim_{n→∞} µ(f_n) = 0.

We now prove the analogous statement for general integrable functions.

Theorem. Let f, g be integrable, and α, β ≥ 0. We have that

(i) µ(αf + βg) = αµ(f) + βµ(g).

(ii) f ≤ g implies µ(f) ≤ µ(g).

(iii) f = 0 a.e. implies µ(f) = 0.

Note that in the last case, the converse is no longer true, as one can easily

see from the sign function sgn : [−1, 1] → R.

Proof.

(i) We are going to prove these by applying the previous theorem. By definition of the integral, we have µ(−f) = −µ(f). Also, if α ≥ 0, then

µ(αf) = µ(αf^+) − µ(αf^−) = αµ(f^+) − αµ(f^−) = αµ(f).

Combining these two properties, it then follows that if α is a real number, then

µ(αf) = αµ(f).

To finish the proof of (i), we have to show that µ(f + g) = µ(f) + µ(g).

We know that this is true for non-negative functions, so we need to employ a little trick to make this a statement about the non-negative versions. If we let h = f + g, then we can write this as

h^+ − h^− = (f^+ − f^−) + (g^+ − g^−).

We now rearrange this as

h^+ + f^− + g^− = f^+ + g^+ + h^−.

Now everything is non-negative measurable. So applying µ gives

µ(h^+) + µ(f^−) + µ(g^−) = µ(f^+) + µ(g^+) + µ(h^−).

Rearranging, we obtain

µ(h^+) − µ(h^−) = µ(f^+) − µ(f^−) + µ(g^+) − µ(g^−).

This is exactly the same thing as saying

µ(f + g) = µ(h) = µ(f) + µ(g).

(ii) If f ≤ g, then g − f ≥ 0. So µ(g − f) ≥ 0. By (i), we know µ(g) − µ(f) ≥ 0. So µ(g) ≥ µ(f).

(iii) If f = 0 a.e., then f^+, f^− = 0 a.e. So µ(f^+) = µ(f^−) = 0. So

µ(f) = µ(f^+) − µ(f^−) = 0.
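As a sanity check of the decomposition f = f^+ − f^−, here is a numerical sketch (illustration only: a midpoint grid stands in for Lebesgue measure on [0, 1], and the sample functions are chosen purely for convenience). It checks µ(f) = µ(f^+) − µ(f^−) and the additivity µ(f + g) = µ(f) + µ(g) for signed f.

```python
# Sketch: the signed integral via f = f^+ - f^-, checked numerically on
# ([0,1], Lebesgue) with a midpoint grid (illustration only).

CELLS = 10_000
H = 1.0 / CELLS
GRID = [(k + 0.5) * H for k in range(CELLS)]

def mu(func):
    # Midpoint-rule stand-in for the Lebesgue integral over [0,1].
    return sum(func(x) * H for x in GRID)

def pos(func):  # f^+ = f ∨ 0
    return lambda x: max(func(x), 0.0)

def neg(func):  # f^- = -(f ∧ 0), so f = f^+ - f^- and |f| = f^+ + f^-
    return lambda x: max(-func(x), 0.0)

f = lambda x: x - 0.5          # integrates to 0; f^+ and f^- each contribute 1/8
g = lambda x: 3 * x * x        # integrates to 1

print(mu(pos(f)) - mu(neg(f)))    # ≈ mu(f) = 0
print(mu(lambda x: f(x) + g(x)))  # ≈ mu(f) + mu(g) = 1
```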

As mentioned, the converse to (iii) is no longer true. However, we do have

the following partial converse:

Proposition. If A is a π-system with E ∈ A and σ(A) = E, and f is an integrable function such that

µ(f 1_A) = 0

for all A ∈ A, then f = 0 a.e.

Proof. Let

D = {A ∈ E : µ(f 1_A) = 0}.

It follows immediately from the properties of the integral that D is a d-system containing A. So D = E by Dynkin's lemma. Let

A^+ = {x ∈ E : f(x) > 0},    A^− = {x ∈ E : f(x) < 0}.

Then A^± ∈ E, and

µ(f 1_{A^+}) = µ(f 1_{A^−}) = 0.

Since f 1_{A^+} and −f 1_{A^−} are non-negative with zero integral, they vanish a.e. So f vanishes a.e.

Proposition. Suppose that (g_n) is a sequence of non-negative measurable functions. Then we have

µ(∑_{n=1}^∞ g_n) = ∑_{n=1}^∞ µ(g_n).

Proof. We know

∑_{n=1}^N g_n ↗ ∑_{n=1}^∞ g_n

as N → ∞. So by the monotone convergence theorem, we have

∑_{n=1}^N µ(g_n) = µ(∑_{n=1}^N g_n) ↗ µ(∑_{n=1}^∞ g_n).

But we also know that

∑_{n=1}^N µ(g_n) ↗ ∑_{n=1}^∞ µ(g_n)

by definition. So we are done.

So for non-negative measurable functions, we can always switch the order of

integration and summation.

Note that we can consider summation as integration. We let E = N and E = {all subsets of N}. We let µ be the counting measure, so that µ(A) is the size of A. Then integrability (i.e. having µ(|f|) < ∞) is the same as absolute convergence of the series, and when the series converges, we have

∫ f dµ = ∑_{n=1}^∞ f(n).

So we can just view our proposition as proving that we can swap the order of two integrals. The general statement is known as Fubini's theorem.
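To make the counting-measure picture concrete, the following sketch (illustration only, with the infinite series truncated at a finite N) sums the non-negative double array a_{n,k} = 2^{−n−k} in both orders. By the proposition, with µ the counting measure, the two orders must agree — the discrete shadow of Fubini's theorem.

```python
# Sketch: summation as integration against the counting measure on N, and the
# sum-integral swap as a baby Fubini (series truncated at N = 200 terms).

N = 200
# a[n][k] = 2^{-(n+1)-(k+1)}, a non-negative double array
a = [[2.0 ** -(n + k) for k in range(1, N + 1)] for n in range(1, N + 1)]

rows_first = sum(sum(row) for row in a)                             # sum_n sum_k
cols_first = sum(sum(a[n][k] for n in range(N)) for k in range(N))  # sum_k sum_n

print(rows_first, cols_first)  # both ≈ 1 = (sum_n 2^-n) * (sum_k 2^-k)
```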