Part IA Analysis I
Based on lectures by W. T. Gowers
Notes taken by Dexter Chua
Lent 2015
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
Limits and convergence
Sequences and series in
R
and
C
. Sums, products and quotients. Absolute convergence;
absolute convergence implies convergence. The Bolzano-Weierstrass theorem and
applications (the General Principle of Convergence). Comparison and ratio tests,
alternating series test. [6]
Continuity
Continuity of real- and complex-valued functions defined on subsets of
R
and
C
. The
intermediate value theorem. A continuous function on a closed bounded interval is
bounded and attains its bounds. [3]
Differentiability
Differentiability of functions from
R
to
R
. Derivative of sums and products. The chain
rule. Derivative of the inverse function. Rolle’s theorem; the mean value theorem.
One-dimensional version of the inverse function theorem. Taylor’s theorem from
R
to
R; Lagrange’s form of the remainder. Complex differentiation. [5]
Power series
Complex power series and radius of convergence. Exp onential, trigonometric and
hyperbolic functions, and relations between them. *Direct proof of the differentiability
of a power series within its circle of convergence*. [4]
Integration
Definition and basic properties of the Riemann integral. A non-integrable function.
Integrability of monotonic functions. Integrability of piecewise-continuous functions.
The fundamental theorem of calculus. Differentiation of indefinite integrals. Integration
by parts. The integral form of the remainder in Taylor’s theorem. Improper integrals. [6]
Contents
0 Introduction
1 The real number system
2 Convergence of sequences
2.1 Definitions
2.2 Sums, products and quotients
2.3 Monotone-sequences property
2.4 Cauchy sequences
2.5 Limit supremum and infimum
3 Convergence of infinite sums
3.1 Infinite sums
3.2 Absolute convergence
3.3 Convergence tests
3.4 Complex versions
4 Continuous functions
4.1 Continuous functions
4.2 Continuous induction*
5 Differentiability
5.1 Limits
5.2 Differentiation
5.3 Differentiation theorems
5.4 Complex differentiation
6 Complex power series
6.1 Exponential and trigonometric functions
6.2 Differentiating power series
6.3 Hyperbolic trigonometric functions
7 The Riemann Integral
7.1 Riemann Integral
7.2 Improper integrals
0 Introduction
In IA Differential Equations, we studied calculus in a non-rigorous setting. While
we did define differentiation (properly) as a limit, we did not define what a limit
was. We didn’t even have a proper definition of integral, and just mumbled
something about it being an infinite sum.
In Analysis, one of our main goals is to put calculus on a rigorous foundation.
We will provide unambiguous definitions of what it means to take a limit, a
derivative and an integral. Based on these definitions, we will prove the results
we’ve had such as the product rule, quotient rule and chain rule. We will also
rigorously prove the fundamental theorem of calculus, which states that the
integral is the inverse operation to the derivative.
However, this is not all Analysis is about. We will study all sorts of limiting
(“infinite”) processes. We can see integration as an infinite sum, and differenti-
ation as dividing two infinitesimal quantities. In Analysis, we will also study
infinite series such as 1 +
1
4
+
1
9
+
1
16
+
···
, as well as limits of infinite sequences.
Another important concept in Analysis is continuous functions. In some
sense, continuous functions are functions that preserve limit processes. While
their role in this course is just being “nice” functions to work with, they will be
given great importance when we study metric and topological spaces.
This course is continued in IB Analysis II, where we study uniform convergence
(a stronger condition for convergence), calculus of multiple variables and metric
spaces.
1 The real number system
We are all familiar with real numbers they are “decimals” consisting of
infinitely many digits. When we really want to study real numbers, while this
definition is technically valid, it is not a convenient definition to work with.
Instead, we take an axiomatic approach. We define the real numbers to be a set
that satisfies certain properties, and, if we really want to, show that the decimals
satisfy these properties. In particular, we define real numbers to be an ordered
field with the least upper bound property.
We’ll now define what it means to be an “ordered field with the least upper
bound property”.
Definition (Field). A field is a set
F
with two binary operations + and
×
that
satisfies all the familiar properties satisfied by addition and multiplication in
Q
,
namely
(i)
(
a, b, c
)
a
+ (
b
+
c
) = (
a
+
b
) +
c
and
a ×
(
b ×c
) = (
a ×b
)
×c
(associativity)
(ii) (a, b) a + b = b + a and a × b = b × a (commutativity)
(iii) 0, 1 such that (a)a + 0 = a and a × 1 = a (identity)
(iv) a
(
a
) such that
a
+(
a
) = 0. If
a 6
= 0, then
a
1
such that
a×a
1
= 1.
(inverses)
(v) (a, b, c) a × (b + c) = (a × b) + (a ×c) (distributivity)
Example. Q, R, C, integers mod p, {a + b
2 : a, b Z} are all fields.
Definition (Totally ordered set). An (totally) ordered set is a set
X
with a
relation < that satisfies
(i) If x, y, z X, x < y and y < z, then x < z (transitivity)
(ii) If x, y X, exactly one of x < y, x = y, y < x holds (trichotomy)
We call it a totally ordered set, as opposed to simply an ordered set, when
we want to emphasize the order is total, i.e. every pair of elements are related to
each other, as opposed to a partial order, where “exactly one” in trichotomy is
replaced with “at most one”.
Now to define an ordered field, we don’t simply ask for a field with an order.
The order has to interact nicely with the field operations.
Definition (Ordered field). An ordered field is a field
F
with a relation
<
that
makes F into an ordered set such that
(i) if x, y, z F and x < y, then x + z < y + z
(ii) if x, y, z F, x < y and z > 0, then xz < yz
Lemma. Let F be an ordered field and x F. Then x
2
0.
Proof.
By trichotomy, either
x <
0,
x
= 0 or
x >
0. If
x
= 0, then
x
2
= 0. So
x
2
0. If
x >
0, then
x
2
>
0
× x
= 0. If
x <
0, then
x x <
0
x
. So 0
< x
.
But then x
2
= (x)
2
> 0.
Definition (Least upper bound). Let
X
be an ordered set and let
A X
. An
upper bound for
A
is an element
x X
such that (
a A
)
a x
. If
A
has an
upper bound, then we say that A is bounded above.
An upper bound
x
for
A
is a least upper bound or supremum if nothing
smaller that x is an upper bound. That is, we need
(i) (a A) a x
(ii) (y < x)(a A) a > y
We usually write
sup A
for the supremum of
A
when it exists. If
sup A A
,
then we call it max A, the maximum of A.
Example. Let
X
=
Q
. Then the supremum of (0
,
1) is 1. The set
{x
:
x
2
<
2
}
is bounded above by 2, but has no supremum (even though
2
seems like a
supremum, we are in Q and
2 is non-existent!).
max
[0
,
1] = 1 but (0
,
1) has no maximum because the supremum is not in
(0, 1).
We can think of the supremum as a point we can get arbitrarily close to in
the set but cannot pass through.
Definition (Least upper bound property). An ordered set
X
has the least upper
bound property if every non-empty subset of
X
that is bounded above has a
supremum.
Obvious modifications give rise to definitions of lower bound, greatest lower
bound (or infimum) etc. It is simple to check that an ordered field with the least
upper bound property has the greatest lower bound property.
Definition (Real numbers). The real numbers is an ordered field with the least
upper bound property.
Of course, it is very important to show that such a thing exists, or else we
will be studying nothing. It is also nice to show that such a field is unique (up
to isomorphism). However, we will not prove these in the course.
In a field, we can define the “natural numbers” to be 2 = 1 + 1, 3 = 1 + 2
etc. Then an important property of the real numbers is
Lemma (Archimedean property v1)). Let
F
be an ordered field with the least
upper bound property. Then the set {1, 2, 3, ···} is not bounded above.
Proof.
If it is bounded above, then it has a supremum
x
. But then
x
1 is not
an upper bound. So we can find
n {
1
,
2
,
3
, ···}
such that
n > x
1. But then
n + 1 > x, but x is supposed to be an upper bound.
Is the least upper bound property required to prove the Archimedean prop-
erty? It seems like any ordered field should satisfy this even if they do not have
the least upper bound property. However, it turns out there are ordered fields
in which the integers are bounded above.
Consider the field of rational functions, i.e. functions in the form
P (x)
Q(x)
with
P
(
x
)
, Q
(
x
) being polynomials, under the usual addition and multiplication. We
order two functions
P (x)
Q(x)
,
R(x)
S(x)
as follows: these two functions intersect only
finitely many times because
P
(
x
)
S
(
x
) =
R
(
x
)
Q
(
x
) has only finitely many roots.
After the last intersection, the function whose value is greater counts as the
greater function. It can be checked that these form an ordered field.
In this field, the integers are the constant functions 1
,
2
,
3
, ···
, but it is
bounded above since the function x is greater than all of them.
2 Convergence of sequences
Having defined real numbers, the first thing we will study is sequences. We
will want to study what it means for a sequence to converge. Intuitively, we
would like to say that 1
,
1
2
,
1
3
,
1
4
, ···
converges to 0, while 1
,
2
,
3
,
4
, ···
diverges.
However, the actual formal definition of convergence is rather hard to get right,
and historically there have been failed attempts that produced spurious results.
2.1 Definitions
Definition (Sequence). A sequence is, formally, a function
a
:
N R
(or
C
).
Usually (i.e. always), we write
a
n
instead of
a
(
n
). Instead of
a
, we usually write
it as (a
n
), (a
n
)
1
or (a
n
)
n=1
to indicate it is a sequence.
Definition (Convergence of sequence). Let (
a
n
) be a sequence and
` R
. Then
a
n
converges to
`
, tends to
`
, or
a
n
`
, if for all
ε >
0, there is some
N N
such that whenever n > N, we have |a
n
`| < ε. In symbols, this says
(ε > 0)(N )(n N) |a
n
`| < ε.
We say ` is the limit of (a
n
).
One can think of (N)(n N) as saying “eventually always”, or as “from
some point on”. So the definition means, if
a
n
`
, then given any
ε
, there
eventually, everything in the sequence is within ε of `.
We’ll now provide an alternative form of the Archimedean property. This is
the form that is actually useful.
Lemma (Archimedean property v2). 1/n 0.
Proof.
Let
ε >
0. We want to find an
N
such that
|
1
/N
0
|
= 1
/N < ε
. So pick
N
such that
N >
1
. There exists such an
N
by the Archimedean property v1.
Then for all n N, we have 0 < 1/n 1/N < ε. So |1/n 0| < ε.
Note that the red parts correspond to the definition of convergence of a
sequence. This is generally how we prove convergence from first principles.
Definition (Bounded sequence). A sequence (a
n
) is bounded if
(C)(n) |a
n
| C.
A sequence is eventually bounded if
(C)(N)(n N) |a
n
| C.
The definition of an eventually bounded sequence seems a bit daft. Clearly
every eventually bounded sequence is bounded! Indeed it is:
Lemma. Every eventually bounded sequence is bounded.
Proof.
Let
C
and
N
be such that (
n N
)
|a
n
| C
. Then
n N
,
|a
n
|
max{|a
1
|, ··· , |a
N1
|, C}.
The proof is rather trivial. However, most of the time it is simpler to prove
that a sequence is eventually bounded, and this lemma saves us from writing
that long line every time.
2.2 Sums, products and quotients
Here we prove the things that we think are obviously true, e.g. sums and products
of convergent sequences are convergent.
Lemma (Sums of sequences). If a
n
a and b
n
b, then a
n
+ b
n
a + b.
Proof.
Let
ε >
0. We want to find a clever
N
such that for all
n N
,
|a
n
+
b
n
(
a
+
b
)
| < ε
. Intuitively, we know that
a
n
is very close to
a
and
b
n
is
very close to b. So their sum must be very close to a + b.
Formally, since
a
n
a
and
b
n
b
, we can find
N
1
, N
2
such that
n N
1
,
|a
n
a| < ε/2 and n N
2
, |b
n
b| < ε/2.
Now let N = max{N
1
, N
2
}. Then by the triangle inequality, when n N ,
|(a
n
+ b
n
) (a + b)| |a
n
a| + |b
n
b| < ε.
We want to prove that the product of convergent sequences is convergent.
However, we will not do it in one go. Instead, we separate it into many smaller
parts.
Lemma (Scalar multiplication of sequences). Let
a
n
a
and
λ R
. Then
λa
n
λa.
Proof. If λ = 0, then the result is trivial.
Otherwise, let
ε >
0. Then
N
such that
n N
,
|a
n
a| < ε/|λ|
. So
|λa
n
λa| < ε.
Lemma. Let (a
n
) be bounded and b
n
0. Then a
n
b
n
0.
Proof.
Let
C 6
= 0 be such that (
n
)
|a
n
| C
. Let
ε >
0. Then
N
such that
(n N ) |b
n
| < ε/C. Then |a
n
b
n
| < ε.
Lemma. Every convergent sequence is bounded.
Proof.
Let
a
n
l
. Then there is an
N
such that
n N
,
|a
n
l|
1. So
|a
n
| |l| + 1. So a
n
is eventually bounded, and therefore bounded.
Lemma (Product of sequences). Let a
n
a and b
n
b. Then a
n
b
n
ab.
Proof. Let a
n
= a + ε
n
. Then a
n
b
n
= (a + ε
n
)b
n
= ab
n
+ ε
n
b
n
.
Since
b
n
b
,
ab
n
ab
. Since
ε
n
0 and
b
n
is bounded,
ε
n
b
n
0. So
a
n
b
n
ab.
Proof.
(alternative) Observe that
a
n
b
n
ab
= (
a
n
a
)
b
n
+(
b
n
b
)
a
. We know that
a
n
a
0 and
b
n
b
0. Since (
b
n
) is bounded, so (
a
n
a
)
b
n
+ (
b
n
b
)
a
0.
So a
n
b
n
ab.
Note that in this proof, we no longer write “Let
ε >
0”. In the beginning, we
have no lemmas proven. So we must prove everything from first principles and
use the definition. However, after we have proven the lemmas, we can simply
use them instead of using first principles. This is similar to in calculus, where
we use first principles to prove the product rule and chain rule, then no longer
use first principles afterwards.
Lemma (Quotient of sequences). Let (
a
n
) be a sequence such that (
n
)
a
n
6
= 0.
Suppose that a
n
a and a 6= 0. Then 1/a
n
1/a.
Proof. We have
1
a
n
1
a
=
a a
n
aa
n
.
We want to show that this
0. Since
a a
n
0, we have to show that 1
/
(
aa
n
)
is bounded.
Since
a
n
a
,
N
such that
n N
,
|a
n
a| a/
2. Then
n N
,
|a
n
|
|a|/
2. Then
|
1
/
(
a
n
a
)
|
2
/|a|
2
. So 1
/
(
a
n
a
) is bounded. So (
a a
n
)
/
(
aa
n
)
0
and the result follows.
Corollary. If a
n
a, b
n
b, b
n
, b 6= 0, then a
n
/b
n
= a/b.
Proof.
We know that 1
/b
n
1
/b
. So the result follows by the product rule.
Lemma (Sandwich rule). Let (
a
n
) and (
b
n
) be sequences that both converge to
a limit x. Suppose that a
n
c
n
b
n
for every n. Then c
n
x.
Proof.
Let
ε >
0. We can find
N
such that
n N
,
|a
n
x| < ε
and
|b
n
x| < ε
.
Then n N, we have x ε < a
n
c
n
b
n
< x + ε. So |c
n
x| < ε.
Example. 1
/
2
n
0. For every
n
,
n <
2
n
. So 0
<
1
/
2
n
<
1
/n
. The result
follows from the sandwich rule.
Example. We want to show that
n
2
+ 3
(n + 5)(2n 1)
1
2
.
We can obtain this by
n
2
+ 3
(n + 5)(2n 1)
=
1 + 3/n
2
(1 + 5/n)(2 1/n)
1
2
,
by sum rule, sandwich rule, Archimedean property, product rule and quotient
rule.
Example. Let k N and let δ > 0. Then
n
k
(1 + δ)
n
0.
This can be summarized as “exponential growth beats polynomial growth even-
tually”.
By the binomial theorem,
(1 + δ)
n
n
k + 1
δ
k+1
.
Also, for n 2k,
n
k + 1
=
n(n 1) ···(n k)
(k + 1)!
(n/2)
k+1
(k + 1)!
.
So for sufficiently large n,
n
k
(1 + δ)
n
n
k
2
k+1
(k + 1)!
n
k+1
δ
k+1
=
2
k+1
(k + 1)!
δ
k+1
·
1
n
0.
2.3 Monotone-sequences property
Recall that we characterized the least upper bound property. It turns out that
there is an alternative characterization of real number using sequences, known
as the monotone-sequences property. In this section, we will show that the two
characterizations are equivalent, and use the monotone-sequences property to
deduce some useful results.
Definition (Monotone sequence). A sequence (
a
n
) is increasing if
a
n
a
n+1
for all n.
It is strictly increasing if
a
n
< a
n+1
for all
n
. (Strictly) decreasing sequences
are defined analogously.
A sequence is (strictly) monotone if it is (strictly) increasing or (strictly)
decreasing.
Definition (Monotone-sequences property). An ordered field has the monotone
sequences property if every increasing sequence that is bounded above converges.
We want to show that the monotone sequences property is equivalent to the
least upper bound property.
Lemma. Least upper bound property monotone-sequences property.
Proof.
Let (
a
n
) be an increasing sequence and let
C
an upper bound for (
a
n
).
Then
C
is an upper bound for the set
{a
n
:
n N}
. By the least upper bound
property, it has a supremum s. We want to show that this is the limit of (a
n
).
Let
ε >
0. Since
s
=
sup{a
n
:
n N}
, there exists an
N
such that
a
N
> s ε
.
Then since (
a
n
) is increasing,
n N
, we have
s ε < a
N
a
n
s
. So
|a
n
s| < ε.
We first prove a handy lemma.
Lemma. Let (
a
n
) be a sequence and suppose that
a
n
a
. If (
n
)
a
n
x
, then
a x.
Proof.
If
a > x
, then set
ε
=
a x
. Then we can find
N
such that
a
N
> x
.
Contradiction.
Before showing the other way implication, we will need the following:
Lemma. Monotone-sequences property Archimedean property.
Proof. We prove version 2, i.e. that 1/n 0.
Since 1
/n >
0 and is decreasing, by MSP, it converges. Let
δ
be the limit.
By the previous lemma, we must have δ 0.
If
δ >
0, then we can find
N
such that 1
/N <
2
δ
. But then for all
n
4
N
,
we have 1/n 1/(4N) < δ/2. Contradiction. Therefore δ = 0.
Lemma. Monotone-sequences property least upper bound property.
Proof.
Let
A
be a non-empty set that’s bounded above. Pick
u
0
, v
0
such that
u
0
is not an upper bound for
A
and
v
0
is an upper bound. Now do a repeated
bisection: having chosen
u
n
and
v
n
such that
u
n
is not an upper bound and
v
n
is, if (
u
n
+
v
n
)
/
2 is an upper bound, then let
u
n+1
=
u
n
,
v
n+1
= (
u
n
+
v
n
)
/
2.
Otherwise, let u
n+1
= (u
n
+ v
n
)/2, v
n+1
= v
n
.
Then u
0
u
1
u
2
··· and v
0
v
1
v
2
···. We also have
v
n
u
n
=
v
0
u
0
2
n
0.
By the monotone sequences property,
u
n
s
(since (
u
n
) is bounded above by
v
0
). Since v
n
u
n
0, v
n
s. We now show that s = sup A.
If
s
is not an upper bound, then there exists
a A
such that
a > s
. Since
v
n
s
, then there exists
m
such that
v
m
< a
, contradicting the fact that
v
m
is
an upper bound.
To show it is the least upper bound, let
t < s
. Then since
u
n
s
, we can
find
m
such that
u
m
> t
. So
t
is not an upper bound. Therefore
s
is the least
upper bound.
Why do we need to prove the Archimedean property first? In the proof
above, we secretly used the it. When showing that
v
n
u
n
0, we required the
fact that
1
2
n
0. To prove this, we sandwiched it with
1
n
. But to show
1
n
0,
we need the Archimedean property.
Lemma. A sequence can have at most 1 limit.
Proof.
Let (
a
n
) be a sequence, and suppose
a
n
x
and
a
n
y
. Let
ε >
0
and pick
N
such that
n N
,
|a
n
x| < ε/
2 and
|a
n
y| < ε/
2. Then
|x y| |x a
N
|
+
|a
N
y| < ε/
2 +
ε/
2 =
ε
. Since
ε
was arbitrary,
x
must
equal y.
Lemma (Nested intervals property). Let
F
be an ordered field with the monotone
sequences property. Let
I
1
I
2
···
be closed bounded non-empty intervals.
Then
T
n=1
I
n
6= .
Proof.
Let
T
n
= [
a
n
, b
n
] for each
n
. Then
a
1
a
2
···
and
b
1
b
2
···
.
For each
n
,
a
n
b
n
b
1
. So the sequence
a
n
is bounded above. So by the
monotone sequences property, it has a limit
a
. For each
n
, we must have
a
n
a
.
Otherwise, say
a
n
> a
. Then for all
m n
, we have
a
m
a
n
> a
. This implies
that a > a, which is nonsense.
Also, for each fixed
n
, we have that
m n
,
a
m
b
m
b
n
. So
a b
n
.
Thus, for all n, a
n
a b
n
. So a I
n
. So a
T
n=1
I
n
.
We can use this to prove that the reals are uncountable:
Proposition. R is uncountable.
Proof.
Suppose the contrary. Let
x
1
, x
2
, ···
be a list of all real numbers. Find
an interval that does not contain
x
1
. Within that interval, find an interval that
does not contain
x
2
. Continue ad infinitum. Then the intersection of all these
intervals is non-empty, but the elements in the intersection are not in the list.
Contradiction.
A powerful consequence of this is the Bolzano-Weierstrass theorem. This is
formulated in terms of subsequences:
Definition (Subsequence). Let (
a
n
) be a sequence. A subsequence of (
a
n
) is a
sequence of the form a
n
1
, a
n
2
, ···, where n
1
< n
2
< ···.
Example. 1, 1/4, 1/9, 1/16, ··· is a subsequence of 1, 1/2, 1/3, ···.
Theorem (Bolzano-Weierstrass theorem). Let
F
be an ordered field with the
monotone sequences property (i.e. F = R).
Then every bounded sequence has a convergent subsequence.
Proof.
Let
u
0
and
v
0
be a lower and upper bound, respectively, for a sequence
(
a
n
). By repeated bisection, we can find a sequence of intervals [
u
0
, v
0
]
[
u
1
, v
1
]
[
u
2
, v
2
]
···
such that
v
n
u
n
= (
v
0
u
0
)
/
2
n
, and such that each
[u
n
, v
n
] contains infinitely many terms of (a
n
).
By the nested intervals property,
T
n=1
[
u
n
, v
n
]
6
=
. Let
x
belong to the
intersection. Now pick a subsequence
a
n
1
, a
n
2
, ···
such that
a
n
k
[
u
k
, v
k
]. We
can do this because [
u
k
, v
k
] contains infinitely many
a
n
, and we have only picked
finitely many of them. We will show that a
n
k
x.
Let
ε >
0. By the Archimedean property, we can find
K
such that
v
K
u
K
=
(
v
0
u
0
)
/
2
K
ε
. This implies that [
u
K
, v
K
]
(
x ε, x
+
ε
), since
x
[
u
K
, v
K
].
Then
k K
,
a
n
k
[
u
k
, v
k
]
[
u
K
, v
K
]
(
xε, x
+
ε
). So
|a
n
k
x| < ε
.
2.4 Cauchy sequences
The third characterization of real numbers is in terms of Cauchy sequences.
Cauchy convergence is an alternative way of defining convergent sequences
without needing to mention the actual limit of the sequence. This allows us to
say
{
3
,
3
.
1
,
3
.
14
,
3
.
141
,
3
.
1415
, ···}
is Cauchy convergent in
Q
even though the
limit π is not in Q.
Definition (Cauchy sequence). A sequence (
a
n
) is Cauchy if for all
ε
, there is
some
N N
such that whenever
p, q N
, we have
|a
p
a
q
| < ε
. In symbols,
we have
(ε > 0)(N )(p, q N) |a
p
a
q
| < ε.
Roughly, a sequence is Cauchy if all terms are eventually close to each other
(as opposed to close to a limit).
Lemma. Every convergent sequence is Cauchy.
Proof.
Let
a
n
a
. Let
ε >
0. Then
N
such that
n N
,
|a
n
a| < ε/
2.
Then p, q N, |a
p
a
q
| |a
p
a| + |a a
q
| < ε/2 + ε/2 = ε.
Lemma. Let (
a
n
) be a Cauchy sequence with a subsequence (
a
n
k
) that converges
to a. Then a
n
a.
Proof.
Let
ε >
0. Pick
N
such that
p, q N
,
|a
p
a
q
| < ε/
2. Then pick
K
such that n
K
N and |a
n
K
a| < ε/2.
Then n N, we have
|a
n
a| |a
n
a
n
K
| + |a
n
K
a| <
ε
2
+
ε
2
= ε.
An important result we have is that in
R
, Cauchy convergence and regular
convergence are equivalent.
Theorem (The general principle of convergence). Let
F
be an ordered field with
the monotone-sequence property. Then every Cauchy sequence of F converges.
Proof.
Let (
a
n
) be a Cauchy sequence. Then it is eventually bounded, since
N
,
n N
,
|a
n
a
N
|
1 by the Cauchy condition. So it is bounded. Hence by
Bolzano-Weierstrass, it has a convergent subsequence. Then (
a
n
) converges to
the same limit.
Definition (Complete ordered field). An ordered field in which every Cauchy
sequence converges is called complete.
Hence we say that R is a complete ordered field.
However, not every complete ordered field is (isomorphic to)
R
. For example,
we can take the rational functions as before, then take the Cauchy completion
of it (i.e. add all the limits we need). Then it is already too large to be the reals
(it still doesn’t have the Archimedean property) but is a complete ordered field.
To show that completeness implies the monotone-sequences property, we
need an additional condition: the Archimedean property.
Lemma. Let
F
be an ordered field with the Archimedean property such that
every Cauchy sequence converges. The
F
satisfies the monotone-sequences
property.
Proof.
Instead of showing that every bounded monotone sequence converges, and
is hence Cauchy, We will show the equivalent statement that every increasing
non-Cauchy sequence is not bounded above.
Let (a
n
) be an increasing sequence. If (a
n
) is not Cauchy, then
(ε > 0)(N )(p, q > N) |a
p
a
q
| ε.
wlog let p > q. Then
a
p
a
q
+ ε a
N
+ ε.
So for any N, we can find a p > N such that
a
p
a
N
+ ε.
Then we can construct a subsequence a
n
1
, a
n
2
, ··· such that
a
n
k+1
a
n
k
+ ε.
Therefore
a
n
k
a
n
1
+ (k 1)ε.
So by the Archimedean property, (a
n
k
), and hence (a
n
), is unbounded.
Note that the definition of a convergent sequence is
(l)(ε > 0)(N)(n N) |a
n
l| < ε,
while that of Cauchy convergence is
(ε > 0)(N )(p, q N) |a
p
a
q
| < ε.
In the first definition,
l
quantifies over all real numbers, which is uncountable.
However, in the second definition, we only have to quantify over natural numbers,
which is countable (by the Archimedean property, we only have to consider the
cases ε = 1/n).
Since they are equivalent in
R
, the second definition is sometimes preferred
when we care about logical simplicity.
2.5 Limit supremum and infimum
Here we will define the limit supremum and infimum. While these are technically
not part of the course, eventually some lecturers will magically assume students
know this definition. So we might as well learn it here.
Definition (Limit supremum/infimum). Let (
a
n
) be a bounded sequence. We
define the limit supremum as
lim sup
n→∞
a
n
= lim
n→∞
sup
mn
a
m
.
To see that this exists, set
b
n
=
sup
mn
a
m
. Then (
b
n
) is decreasing since we
are taking the supremum of fewer and fewer things, and is bounded below by
any lower bound for (a
n
) since b
n
a
n
. So it converges.
Similarly, we define the limit infimum as
lim inf
n→∞
a
n
= lim
n→∞
inf
mn
a
m
.
Example. Take the sequence
2, 1,
3
2
,
1
2
,
4
3
,
1
3
, ···
Then the limit supremum is 1 and the limit infimum is 0.
Lemma. Let (
a
n
) be a sequence. The following two statements are equivalent:
a
n
a
lim sup a
n
= lim inf a
n
= a.
Proof. If a
n
a, then let ε > 0. Then we can find an n such that
a ε a
m
a + ε for all m n
It follows that
a ε inf
mn
a
m
sup
mn
a
m
a + ε.
Since ε was arbitrary, it follows that
lim inf a
n
= lim sup a
n
= a.
Conversely, if
lim inf a
n
=
lim sup a
n
=
a
, then let
ε >
0. Then we can find
n
such that
inf
mn
a
m
> a ε and sup
mn
a
m
< a + ε.
It follows that m n, we have |a
m
a| < ε.
3 Convergence of infinite sums
In this chapter, we investigate which infinite sums, as opposed to sequences,
converge. We would like to say 1 +
1
2
+
1
4
+
1
8
+
···
= 2, while 1 + 1 + 1 + 1 +
···
does not converge. The majority of the chapter is coming up with different tests
to figure out if infinite sums converge.
3.1 Infinite sums
Definition (Convergence of infinite sums and partial sums). Let (
a
n
) be a real
sequence. For each N, define
S
N
=
N
X
n=1
a
n
.
If the sequence (S
N
) converges to some limit s, then we say that
X
n=1
a
n
= s,
and we say that the series
X
n=1
a
n
converges.
We call S
N
the N th partial sum.
There is an immediate necessary condition for a series to converge.
Lemma. If
X
n=1
a
n
converges. Then a
n
0.
Proof.
Let
X
n=1
a
n
=
s
. Then
S
n
s
and
S
n1
s
. Then
a
n
=
S
n
S
n1
0.
However, the converse is false!
Example (Harmonic series). If a
n
= 1/n, then a
n
0 but
P
a
n
= .
We can prove this as follows:
S
2
n
S
2
n1
=
1
2
n1
+ 1
+ ··· +
1
2
n
2
n1
2
n
=
1
2
.
Therefore S
2
n
S
1
+ n/2. So the partial sums are unbounded.
Example (Geometric series). Let |ρ| < 1. Then
X
n=0
ρ
n
=
1
1 ρ
.
We can prove this by considering the partial sums:
N
X
n=0
ρ
n
=
1 ρ
N+1
1 ρ
.
Since ρ
N+1
0, this tends to 1/(1 ρ).
Example.
X
n=2
1
n(n 1)
converges. This is since
1
n(n 1)
=
1
n 1
1
n
.
So
N
X
n=2
1
n(n 1)
= 1
1
N
1.
Lemma. Suppose that
a
n
0 for every
n
and the partial sums
S
n
are bounded
above. Then
P
n=1
a
n
converges.
Proof.
The sequence (
S
n
) is increasing and bounded above. So the result follows
form the monotone sequences property.
The simplest convergence test we have is the comparison test. Roughly
speaking, it says that if 0
a
n
b
n
for all
n
and
P
b
n
converges, then
P
a
n
converges. However, we will prove a much more general form here for convenience.
Lemma (Comparison test). Let (
a
n
) and (
b
n
) be non-negative sequences, and
suppose that
C, N
such that
n N
,
a
n
Cb
n
. Then if
P
b
n
converges, then
so does
P
a
n
.
Proof.
Let
M > N
. Also for each
R
, let
S
R
=
P
R
n=1
a
n
and
T
R
=
P
R
n=1
b
n
. We
want S
R
to be bounded above.
S
M
S
N
=
M
X
n=N+1
a
n
C
M
X
n=N+1
b
n
C
X
n=N+1
b
n
.
So
M N
,
S
M
S
n
+
C
P
n=N+1
b
n
. Since the
S
M
are increasing and
bounded, it must converge.
Example.
(i)
P
1
n2
n
converges, since
P
1
2
n
converges.
(ii)
P
n
2
n
converges.
If
n
4, then
n
2
n/2
. That’s because 4 = 2
4/2
and for
n
4,
(
n
+ 1)
/n <
2
, so when we increase
n
, we multiply the right side by a
greater number by the left. Hence by the comparison test, it is sufficient
to show that
P
2
n/2
/
2
n
=
P
2
n/2
converges, which it does (geometric
series).
(iii)
P
1
n
diverges, since
1
n
1
n
. So if it converged, then so would
P
1
n
, but
P
1
n
diverges.
(iv)
P
1
n
2
converges, since for
n
2,
1
n
2
1
n(n1)
, and we have proven that
the latter converges.
(v) Consider
X
n=1
n + 5
n
3
7n
2
/2
. We show this converges by noting
n
3
7n
2
2
= n
2
n
7
2
.
So if n 8, then
n
3
7n
2
2
n
3
2
.
Also, n + 5 2n. So
n + 5
n
3
7n
2
/2
4/n
2
.
So it converges by the comparison test.
(vi) If α > 1, then
P
1/n
α
converges.
Let S
N
=
P
N
n=1
1/n
α
. Then
S
2
n
S
2
n1
=
1
(2
n1
+ 1)
α
+ ··· +
1
(2
n
)
α
2
n1
(2
n1
)
α
= (2
n1
)
1α
= (2
1α
)
n1
.
But 2
1α
< 1. So
S
2
n
= (S
2
n
S
2
n1
) + (S
2
n1
S
2
n2
) + ···(S
2
S
1
) + S
1
and is bounded above by comparison with the geometric series 1 + 2
1α
+
(2
1α
)
2
+ ···
3.2 Absolute convergence
Here we’ll consider two stronger conditions for convergence absolute conver-
gence and unconditional convergence. We’ll prove that these two conditions are
in fact equivalent.
Definition (Absolute convergence). A series
P
a
n
converges absolutely if the
series
P
|a
n
| converges.
Example. The series
P
(1)
n+1
n
= 1
1
2
+
1
3
1
4
+
···
converges, but not
absolutely.
To see the convergence, note that
a
2n1
+ a
2n
=
1
2n 1
1
2n
=
1
2n(2n 1)
.
It is easy to compare with 1
/n
2
to get that the partial sums
S
2n
converges. But
S
2n+1
S
2n
= 1/(2n + 1) 0, so the S
2n+1
converges to the same limit.
It does not converge absolutely, because the sum of the absolute values is
the harmonic series.
Lemma. Let
P
a
n
converge absolutely. Then
P
a
n
converges.
Proof.
We know that
P
|a
n
|
converges. Let
S
N
=
P
N
n=1
a
n
and
T
N
=
P
N
n=1
|a
n
|.
We know two ways to show random sequences converge, without knowing
what they converge to, namely monotone-sequences and Cauchy sequences. Since
S
N
is not monotone, we shall try Cauchy sequences.
If p > q, then
|S
p
S
q
| =
p
X
n=q+1
a
n
p
X
n=q+1
|a
n
| = T
p
T
q
.
But the sequence
T
p
converges. So
ε >
0, we can find
N
such that for all
p > q N , we have T
p
T
q
< ε, which implies |S
p
S
q
| < ε.
Definition (Unconditional convergence). A series
P
a
n
converges uncondition-
ally if the series
P
n=1
a
π(n)
converges for every bijection
π
:
N N
, i.e. no
matter how we re-order the elements of a
n
, the sum still converges.
Theorem. If
P
a
n
converges absolutely, then it converges unconditionally.
Proof. Let S
n
=
P
N
n=1
a
π(n)
. Then if p > q,
|S
p
S
q
| =
p
X
n=q+1
a
π(n)
X
n=q+1
|a
π(n)
|.
Let ε > 0. Since
P
|a
n
| converges, pick M such that
P
n=M+1
|a
n
| < ε.
Pick N large enough that {1, ··· , M} {π(1), ··· , π(N)}.
Then if n > N , we have π(n) > M. Therefore if p > q N, then
|S
p
S
q
|
p
X
n=q+1
|a
π(n)
|
X
n=M+1
|a
n
| < ε.
Therefore the sequence of partial sums is Cauchy.
The converse is also true.
Theorem. If
P
a
n
converges unconditionally, then it converges absolutely.
Proof.
We will prove the contrapositive: if it doesn’t converge absolutely, it
doesn’t converge unconditionally.
Suppose that
P
|a
n
|
=
. Let (
b
n
) be the subsequence of non-negative
terms of
a
n
, and (
c
n
) be the subsequence of negative terms. Then
P
b
n
and
P
c
n
cannot both converge, or else
P
|a
n
| converges.
wlog,
P
b
n
=
. Now construct a sequence 0 =
n
0
< n
1
< n
2
< ···
such
that k,
b
n
k1
+1
+ b
n
k1
+2
+ ··· + b
n
k
+ c
k
1,
This is possible because the
b
n
are unbounded and we can get it as large as we
want.
Let π be the rearrangement
b
1
, b
2
, ···b
n
1
, c
1
, b
n
1
+1
, ···b
n
2
, c
2
, b
n
2
+1
, ···b
n
3
, c
3
, ···
So the sum up to c
k
is at least k. So the partial sums tend to infinity.
We can prove an even stronger result:
Lemma. Let
P
a
n
be a series that converges absolutely. Then for any bijection
π : N N,
X
n=1
a
n
=
X
n=1
a
π(n)
.
Proof.
Let
ε >
0. We know that both
P
|a
n
|
and
P
|a
π(n)
|
converge. So let
M
be such that
P
n>M
|a
n
| <
ε
2
and
P
n>M
|a
π(n)
| <
ε
2
.
Now N be large enough such that
{1, ··· , M } {π(1), ··· , π(N)},
and
{π(1), ··· , π(M)} {1, ··· , N}.
Then for every K N ,
K
X
n=1
a
n
K
X
n=1
a
π(n)
K
X
n=M+1
|a
n
| +
K
X
n=M+1
|a
π(n)
| <
ε
2
+
ε
2
= ε.
We have the first inequality since given our choice of
M
and
N
, the first
M
terms of the
P
a
n
and
P
a
π(n)
sums are cancelled by some term in the huge
sum.
So
K N
, the partial sums up to
K
differ by at most
ε
. So
|
P
a
n
P
a
π(n)
| ε.
Since this is true for all ε, we must have
P
a
n
=
P
a
π(n)
.
3.3 Convergence tests
We’ll now come up with a lot of convergence tests.
Lemma (Alternating sequence test). Let (
a
n
) be a decreasing sequence of
non-negative reals, and suppose that
a
n
0. Then
X
n=1
(
1)
n+1
a
n
converges,
i.e. a
1
a
2
+ a
3
a
4
+ ··· converges.
Proof. Let S
N
=
N
X
n=1
(1)
n+1
a
n
. Then
S
2n
= (a
1
a
2
) + (a
3
a
4
) + ··· + (a
2n1
a
2n
) 0,
and (S
2n
) is an increasing sequence.
Also,
S
2n+1
= a
1
(a
2
a
3
) (a
4
a
5
) ··· (a
2n
a
2n+1
),
and (S
2n+1
) is a decreasing sequence. Also S
2n+1
S
2n
= a
2n+1
0.
Hence we obtain the bounds 0
S
2n
S
2n+1
a
1
. It follows from the
monotone sequences property that (S
2n
) and (S
2n+1
) converge.
Since S
2n+1
S
2n
= a
2n+1
0, they converge to the same limit.
Example.
1
1
3
2
+
1
3
3
1
3
4
+ ···converges.
Lemma (Ratio test). We have three versions:
(i) If c < 1 such that
|a
n+1
|
|a
n
|
c,
for all n, then
P
a
n
converges.
(ii) If c < 1 and N such that
(n N )
|a
n+1
|
|a
n
|
c,
then
P
a
n
converges. Note that just because the ratio is always less than
1, it doesn’t necessarily converge. It has to be always less than a fixed
number c. Otherwise the test will say that
P
1/n converges.
(iii) If ρ (1, 1) such that
a
n+1
a
n
ρ,
then
P
a
n
converges. Note that we have the open interval (
1
,
1). If
|a
n+1
|
|a
n
|
1, then the test is inconclusive!
Proof.
(i) |a
n
| c
n1
|a
1
|
. Since
P
c
n
converges, so does
P
|a
n
|
by comparison test.
So
P
a
n
converges absolutely, so it converges.
(ii)
For all
k
0, we have
|a
N+k
| c
k
|a
N
|
. So the series
P
|a
N+k
|
converges,
and therefore so does
P
|a
k
|.
(iii)
If
a
n+1
a
n
ρ
, then
|a
n+1
|
|a
n
|
|ρ|
. So (setting
ε
= (1
|ρ|
)
/
2) there exists
N
such that n N,
|a
n+1
|
|a
n
|
1+|ρ|
2
< 1. So the result follows from (ii).
Example. If |b| < 1, then
P
nb
n
converges, since
a
n+1
a
n
=
(n + 1)b
n+1
nb
n
=
1 +
1
n
b b < 1.
So it converges.
We can also evaluate this directly by considering
X
i=1
X
n=i
b
n
.
The following two tests were taught at the end of the course, but are included
here for the sake of completeness.
Theorem (Condensation test). Let (
a
n
) be a decreasing non-negative sequence.
Then
P
n=1
a
n
< if and only if
X
k=1
2
k
a
2
k
< .
Proof.
This is basically the proof that
P
1
n
diverges and
P
1
n
α
converges for
α < 1 but written in a more general way.
We have
a
1
+ a
2
+ (a
3
+ a
4
) + (a
5
+ ··· + a
8
) + (a
9
+ ··· + a
16
) + ···
a
1
+ a
2
+ 2a
4
+ 4a
8
+ 8a
16
+ ···
So if
P
2
k
a
2
k
diverges,
P
a
n
diverges.
To prove the other way round, simply group as
a
1
+ (a
2
+ a
3
) + (a
4
+ ··· + a
7
) + ···
a
1
+ 2a
2
+ 4a
4
+ ··· .
Example. If a
n
=
1
n
, then 2
k
a
2
k
= 1. So
P
k=1
2
k
a
2
k
= . So
P
n=1
1
n
= .
After we formally define integrals, we will prove the integral test:
Theorem (Integral test). Let
f
: [1
,
]
R
be a decreasing non-negative
function. Then
P
n=1
f(n) converges iff
R
1
f(x) dx < .
3.4 Complex versions
Most definitions in the course so far carry over unchanged to the complex
numbers. e.g. z
n
z iff (ε > 0)(N)(n N) |z
n
z| < ε.
Two exceptions are least upper bound and monotone sequences, because the
complex numbers do not have an ordering! (It cannot be made into an ordered
field because the square of every number in an ordered field has to be positive)
Fortunately, Cauchy sequences still work.
We can prove the complex versions of most theorems so far by looking at the
real and imaginary parts.
Example. Let (
z
n
) be a Cauchy sequence in
C
. Let
z
n
=
x
n
+
iy
n
. Then (
x
n
)
and (
y
n
) are Cauchy. So they converge, from which it follows that
z
n
=
x
n
+
iy
n
converges.
Also, the Bolzano-Weierstrass theorem still holds: If (
z
n
) is bounded, let
z
n
=
x
n
+
y
n
, then (
x
n
) and (
y
n
) are bounded. Then find a subsequence (
x
n
k
)
that converges. Then find a subsequence of (y
n
k
) that converges.
Then nested-intervals property has a “nested-box” property as a complex
analogue.
Finally, the proof that absolutely convergent sequences converge still works.
It follows that the ratio test still works.
Example. If |z| < 1, then
P
nz
n
converges. Proof is the same as above.
However, we do have an extra test for complex sums.
Lemma (Abel’s test). Let
a
1
a
2
···
0, and suppose that
a
n
0. Let
z C such that |z| = 1 and z 6= 1. Then
P
a
n
z
n
converges.
Proof. We prove that it is Cauchy. We have
N
X
n=M
a
n
z
n
=
N
X
n=M
a
n
z
n+1
z
n
z 1
=
1
z 1
N
X
n=M
a
n
(z
n+1
z
n
)
=
1
z 1
N
X
n=M
a
n
z
n+1
N
X
n=M
a
n
z
n
!
=
1
z 1
N
X
n=M
a
n
z
n+1
N1
X
n=M1
a
n+1
z
n+1
!
=
1
z 1
a
N
z
N+1
a
M
z
M
+
N1
X
n=M
(a
n
a
n+1
)z
n+1
!
We now take the absolute value of everything to obtain
N
X
n=M
a
n
z
n
1
|z 1|
a
N
+ a
M
+
N1
X
n=M
(a
n
a
n+1
)
!
=
1
|z 1|
(a
N
+ a
M
+ (a
M
a
M+1
) + ··· + (a
N1
a
N
))
=
2a
M
|z 1|
0.
So it is Cauchy. So it converges
Note that here we transformed the sum
P
a
n
(
z
n+1
z
n
) into
a
N
z
N+1
a
M
z
M
+
P
(
a
n
a
n+1
)
z
n+1
. What we have effectively done is a discrete analogue
of integrating by parts.
Example. The series
P
z
n
/n
converges if
|z| <
1 or if
|z|
= 1 and
z 6
= 1, and it
diverges if z = 1 or |z| > 1.
The cases
|z| <
1 and
|z| >
1 are trivial from the ratio test, and Abel’s test
is required for the |z| = 1 cases.
4 Continuous functions
4.1 Continuous functions
Definition (Continuous function). Let
A R
,
a A
, and
f
:
A R
. Then
f
is continuous at
a
if for any
ε >
0, there is some
δ >
0 such that if
y A
is such
that |y a| < δ, then |f(y) f(a)| < ε. In symbols, we have
(ε > 0)(δ > 0)(y A) |y a| < δ |f(y) f (a)| < ε.
f is continuous if it is continuous at every a A. In symbols, we have
(a A)(ε > 0)(δ > 0)(y A) |y a| < δ |f(y) f(a)| < ε.
Intuitively,
f
is continuous at
a
if we can obtain
f
(
a
) as accurately as we
wish by using more accurate values of
a
(the definition says that if we want to
approximate
f
(
a
) by
f
(
y
) to within accuracy
ε
, we just have to get our
y
to
within δ of a for some δ).
For example, suppose we have the function
f(x) =
(
0 x π
1 x > π
.
Suppose that we don’t know what the function actually is, but we have a
computer program that computes this function. We want to know what
f
(
π
)
is. Since we cannot input
π
(it has infinitely many digits), we can try 3, and it
gives 0. Then we try 3
.
14, and it gives 0 again. If we try 3
.
1416, it gives 1 (since
π
= 3
.
1415926
··· <
3
.
1416). We keep giving more and more digits of
π
, but the
result keeps oscillating between 0 and 1. We have no hope of what
f
(
π
) might
be, even approximately. So this f is discontinuous at π.
However, if we have the function
g
(
x
) =
x
2
, then we can find the (approx-
imate) value of
g
(
π
). We can first try
g
(3) and obtain 9. Then we can try
g
(3
.
14) = 9
.
8596,
g
(3
.
1416) = 9
.
86965056 etc. We can keep trying and obtain
more and more accurate values of g(π). So g is continuous at π.
Example.
Constant functions are continuous.
The function f(x) = x is continuous (take δ = ε).
The definition of continuity of a function looks rather like the definition of
convergence. In fact, they are related by the following lemma:
Lemma. The following two statements are equivalent for a function
f
:
A R
.
f is continuous
If (a
n
) is a sequence in A with a
n
a, then f(a
n
) f (a).
Proof. (i)(ii) Let ε > 0. Since f is continuous at a,
(δ > 0)(y A) |y a| < δ |f(y) f(a)| < ε.
We want
N
such that
n N
,
|f
(
a
n
)
f
(
a
)
| < ε
. By continuity, it is enough
to find N such that n N, |a
n
a| < δ. Since a
n
a, such an N exists.
(ii)
(i) We prove the contrapositive: Suppose
f
is not continuous at
a
. Then
(ε > 0)(δ > 0)(y A) |y a| < δ and |f(y) f(a)| ε.
For each
n
, we can therefore pick
a
n
A
such that
|a
n
a| <
1
n
and
|f
(
a
n
)
f
(
a
)
| ε
. But then
a
n
a
(by Archimedean property), but
f
(
a
n
)
6→ f
(
a
).
Example.
(i)
Let
f
(
x
) =
(
1 x < 0
1 x 0
. Then
f
is not continuous because
1
n
0 but
f(
1
n
) 1 6= f (0).
(ii) Let f : Q R with
f(x) =
(
1 x
2
> 2
0 x
2
< 2
Then
f
is continuous. For every
a Q
, we can find an interval about
a
on
which f is constant. So f is continuous at a.
(iii) Let
f(x) =
(
sin
1
x
x 6= 0
0 x = 0
Then
f
(
a
) is discontinuous. For example, let
a
n
= 1
/
[(2
n
+ 0
.
5)
π
]. Then
a
n
0 and f(a
n
) 1 6= f (0).
We can use this sequence definition as the definition for continuous functions.
This has the advantage of being cleaner to write and easier to work with. In
particular, we can reuse a lot of our sequence theorems to prove the analogous
results for continuous functions.
Lemma. Let A R and f, g : A R be continuous functions. Then
(i) f + g is continuous
(ii) fg is continuous
(iii) if g never vanishes, then f /g is continuous.
Proof.
(i) Let a A and let (a
n
) be a sequence in A with a
n
a. Then
(f + g)(a
n
) = f(a
n
) + g(a
n
).
But f (a
n
) f (a) and g(a
n
) g(a). So
f(a
n
) + g(a
n
) f (a) + g(a) = (f + g)(a).
(ii) and (iii) are proved in exactly the same way.
With this lemma, from the fact that constant functions and
f
(
x
) =
x
are
continuous, we know that all polynomials are continuous. Similarly, rational
functions P (x)/Q(x) are continuous except when Q(x) = 0.
Lemma. Let
A, B R
and
f
:
A B
,
g
:
B R
. Then if
f
and
g
are
continuous, g f : A R is continuous.
Proof. We offer two proofs:
(i)
Let (
a
n
) be a sequence in
A
with
a
n
a A
. Then
f
(
a
n
)
f
(
a
) since
f
is continuous. Then
g
(
f
(
a
n
))
g
(
f
(
a
)) since
g
is continuous. So
g f
is continuous.
(ii)
Let
a A
and
ε >
0. Since
g
is continuous at
f
(
a
), there exists
η >
0 such
that z B, |z f(a)| < η |g(z) g(f(a))| < ε.
Since
f
is continuous at
a
,
δ >
0 such that
y A
,
|y a| < δ
|f(y) f(a)| < η. Therefore |y a| < δ |g(f(y)) g(f (a))| < ε.
There are two important theorems regarding continuous functions the
maximum value theorem and the intermediate value theorem.
Theorem (Maximum value theorem). Let [
a, b
] be a closed interval in
R
and
let
f
: [
a, b
]
R
be continuous. Then
f
is bounded and attains its bounds, i.e.
f(x) = sup f for some x, and f(y) = inf f for some y.
Proof.
If
f
is not bounded above, then for each
n
, we can find
x
n
[
a, b
] such
that f (x
n
) n for all n.
By Bolzano-Weierstrass, since
x
n
[
a, b
] and is bounded, the sequence
(
x
n
) has a convergent subsequence (
x
n
k
). Let
x
be its limit. Then since
f
is
continuous, f(x
n
k
) f (x). But f(x
n
k
) n
k
. So this is a contradiction.
Now let
C
=
sup{f
(
x
) :
x
[
a, b
]
}
. Then for every
n
, we can find
x
n
such that
f
(
x
n
)
C
1
n
. So by Bolzano-Weierstrass, (
x
n
) has a convergent
subsequence (
x
n
k
). Since
C
1
n
k
f
(
x
n
k
)
C
,
f
(
x
n
k
)
C
. Therefore if
x = lim x
n
k
, then f(x) = C.
A similar argument applies if f is unbounded below.
Theorem (Intermediate value theorem). Let
a < b R
and let
f
: [
a, b
]
R
be continuous. Suppose that
f
(
a
)
<
0
< f
(
b
). Then there exists an
x
(
a, b
)
such that f (x) = 0.
Proof. We have several proofs:
(i)
Let
A
=
{x
:
f
(
x
)
<
0
}
and let
s
=
sup A
. We shall show that
f
(
s
) = 0
(this is similar to the proof that
2
exists in Numbers and Sets). If
f
(
s
)
<
0, then setting
ε
=
|f
(
s
)
|
in the definition of continuity, we can find
δ >
0 such that
y
,
|y s| < δ f
(
y
)
<
0. Then
s
+
δ/
2
A
, so
s
is not
an upper bound. Contradiction.
If
f
(
s
)
>
0, by the same argument, we can find
δ >
0 such that
y
,
|y s| < δ f(y) > 0. So s δ/2 is a smaller upper bound.
(ii)
Let
a
0
=
a
,
b
0
=
b
. By repeated bisection, construct nested intervals
[
a
n
, b
n
] such that
b
n
a
n
=
b
0
a
0
2
n
and
f
(
a
n
)
<
0
f
(
b
n
). Then by the
nested intervals property, we can find
x
n=0
[
a
n
, b
n
]. Since
b
n
a
n
0,
a
n
, b
n
x.
Since
f
(
a
n
)
<
0 for every
n
,
f
(
x
)
0. Similarly, since
f
(
b
n
)
0 for every
n, f (x) 0. So f(x) = 0.
It is easy to generalize this to get that, if
f
(
a
)
< c < f
(
b
), then
x
(
a, b
)
such that
f
(
x
) =
c
, by applying the result to
f
(
x
)
c
. Also, we can assume
instead that f(b) < c < f(a) and obtain the same result by looking at f (x).
Corollary. Let
f
: [
a, b
]
[
c, d
] be a continuous strictly increasing function
with f (a) = c, f (b) = d. Then f is invertible and its inverse is continuous.
Proof.
Since
f
is strictly increasing, it is an injection (suppose
x 6
=
y
. wlog,
x < y
. Then
f
(
x
)
< f
(
y
) and so
f
(
x
)
6
=
f
(
y
)). Now let
y
(
c, d
). By the
intermediate value theorem, there exists
x
(
a, b
) such that
f
(
x
) =
y
. So
f
is a
surjection. So it is a bijection and hence invertible.
Let
g
be the inverse. Let
y
[
c, d
] and let
ε >
0. Let
x
=
g
(
y
). So
f
(
x
) =
y
.
Let
u
=
f
(
x ε
) and
v
=
f
(
x
+
ε
) (if
y
=
c
or
d
, make the obvious adjustments).
Then
u < y < v
. So we can find
δ >
0 such that (
y δ, y
+
δ
)
(
u, v
). Then
|z y| < δ g(z) (x ε, x + ε) |g(z) g(y)| < ε.
With this corollary, we can create more continuous functions, e.g.
x.
4.2 Continuous induction*
Continuous induction is a generalization of induction on natural numbers. It
provides an alternative mechanism to prove certain results we have shown.
Proposition (Continuous induction v1). Let
a < b
and let
A
[
a, b
] have the
following properties:
(i) a A
(ii) If x A and x 6= b, then y A with y > x.
(iii) If ε > 0, y A : y (x ε, x], then x A.
Then b A.
Proof.
Since
a A
,
A 6
=
.
A
is also bounded above by
b
. So let
s
=
sup A
.
Then ε > 0, y A such that y > s ε. Therefore, by (iii), s A.
If s 6= b, then by (ii), we can find y A such that y > s.
It can also be formulated as follows:
Proposition (Continuous induction v2). Let A [a, b] and suppose that
(i) a A
(ii) If [a, x] A and x 6= b, then there exists y > x such that [a, y] A.
(iii) If [a, x) A, then [a, x] A.
Then A = [a, b]
Proof.
We prove that version 1
version 2. Suppose
A
satisfies the conditions
of v2. Let A
0
= {x [a, b] : [a, x] A}.
Then
a A
0
. If
x A
0
with
x 6
=
b
, then [
a, x
]
A
. So
y > x
such that
[a, y] A. So y > x such that y A
0
.
If
ε >
0
, y
(
x ε, x
] such that [
a, y
]
A
, then [
a, x
)
A
. So by (iii),
[
a, x
]
A
, so
x A
0
. So
A
0
satisfies properties (i) to (iii) of version 1. Therefore
b A
0
. So [a, b] A. So A = [a, b].
We reprove intermediate value theorem here:
Theorem (Intermediate value theorem). Let
a < b R
and let
f
: [
a, b
]
R
be continuous. Suppose that
f
(
a
)
<
0
< f
(
b
). Then there exists an
x
(
a, b
)
such that f (x) = 0.
Proof.
Assume that
f
is continuous. Suppose
f
(
a
)
<
0
< f
(
b
). Assume that
(x) f(x) 6= 0, and derive a contradiction.
Let
A
=
{x
:
f
(
x
)
<
0
}
Then
a A
. If
x A
, then
f
(
x
)
<
0, and by
continuity, we can find
δ >
0 such that
|y x| < δ f
(
y
)
<
0. So if
x 6
=
b
, then
we can find y A such that y > x.
We prove the contrapositive of the last condition, i.e.
x 6∈ A (δ > 0)(y A) y 6∈ (x δ, x].
If
x 6∈ A
, then
f
(
x
)
>
0 (we assume that
f
is never zero. If not, we’re done).
Then by continuity, δ > 0 such that |y x| < δ f(y) > 0. So y 6∈ A.
Hence by continuous induction, b A. Contradiction.
Now we prove that continuous functions in closed intervals are bounded.
Theorem. Let [
a, b
] be a closed interval in
R
and let
f
: [
a, b
]
R
be continuous.
Then f is bounded.
Proof.
Let
f
: [
a, b
] be continuous. Let
A
=
{x
:
f is bounded on
[
a, x
]
}
. Then
a A
. If
x A, x 6
=
b
, then
δ >
0 such that
|y x| < δ |f
(
y
)
f
(
x
)
| <
1.
So
y > x
(e.g. take
min{x
+
δ/
2
, b}
) such that
f
is bounded on [
a, y
], which
implies that y A.
Now suppose that
ε >
0,
y
(
x, ε, x
] such that
y A
. Again, we can
find
δ >
0 such that
f
is bounded on (
x δ, x
+
δ
), and in particular on (
x δ, x
].
Pick
y
such that
f
is bounded on [
a, y
] and
y > x δ
. Then
f
is bounded on
[a, x]. So x A.
So we are done by continuous induction.
Finally, we can prove a theorem that we have not yet proven.
Definition (Cover of a set). Let
A R
. A cover of
A
by open intervals is a set
{I
γ
: γ Γ} where each I
γ
is an open interval and A
S
γΓ
I
γ
.
A finite subcover is a finite subset
{I
γ
1
, ··· , I
γ
n
}
of the cover that is still a
cover.
Not every cover has a finite subcover. For example, the cover
{
(
1
n
,
1) :
n N}
of (0, 1) has no finite subcover.
Theorem (Heine-Borel*). Every cover of a closed, bounded interval [
a, b
] by
open intervals has a finite subcover. We say closed intervals are compact (cf.
Metric and Topological Spaces).
Proof.
Let
{I
γ
:
γ
Γ
}
be a cover of [
a, b
] by open intervals. Let
A
=
{x
: [
a, x
]
can be covered by finitely many of the I
γ
}.
Then a A since a must belong to some I
γ
.
If
x A
, then pick
γ
such that
x I
γ
. Then if
x 6
=
b
, since
I
γ
is an open
interval, it contains [
x, y
] for some
y > x
. Then [
a, y
] can be covered by finitely
many I
γ
, by taking a finite cover for [a, x] and the I
γ
that contains x.
Now suppose that ε > 0, y A such that y (x ε, x].
Let
I
γ
be an open interval containing
x
. Then it contains (
x ε, x
] for some
ε >
0. Pick
y A
such that
y
(
x ε, x
]. Now combine
I
γ
with a finite
subcover of [a, y] to get a finite subcover of [a, x]. So x A.
Then done by continuous induction.
We can use Heine-Borel to prove that continuous functions on [
a, b
] are
bounded.
Theorem. Let [
a, b
] be a closed interval in
R
and let
f
: [
a, b
]
R
be continuous.
Then
f
is bounded and attains it bounds, i.e.
f
(
x
) =
sup f
for some
x
, and
f(y) = inf f for some y.
Proof. Let f : [a, b] R be continuous. Then by continuity,
(x [a, b])(δ
x
> 0)(y) |y x| < δ
x
|f (y) f (x)| < 1.
Let
γ
= [
a, b
] and for each
x γ
, let
I
x
= (
x δ
x
, x
+
δ
x
). So by Heine-Borel,
we can find x
1
, ··· , x
n
such that [a, b]
S
n
1
(x
i
δ
x
i
, x
i
+ δ
x
i
).
But
f
is bounded in each interval (
x
i
δ
x
i
, x
i
+
δ
x
i
) by
|f
(
x
i
)
|
+ 1. So it is
bounded on [a, b] by max |f(x
i
)| + 1.
5 Differentiability
In the remainder of the course, we will properly develop calculus, and put
differentiation and integration on a rigorous foundation. Every notion will be
given a proper definition which we will use to prove results like the product and
quotient rule.
5.1 Limits
First of all, we need the notion of limits. Recall that we’ve previously had limits
for sequences. Now, we will define limits for functions.
Definition (Limit of functions). Let A R and let f : A R. We say
lim
xa
f(x) = `,
or f (x) ` as x a”, if
(ε > 0)(δ > 0)(x A) 0 < |x a| < δ |f(x) `| < ε.
We couldn’t care less what happens when
x
=
a
, hence the strict inequality
0 < |x a|. In fact, f doesn’t even have to be defined at x = a.
Example. Let
f(x) =
(
x x 6= 2
3 x = 2
Then lim
x2
= 2, even though f(2) = 3.
Example. Let f(x) =
sin x
x
. Then f(0) is not defined but lim
x0
f(x) = 1.
We will see a proof later after we define what sin means.
We notice that the definition of the limit is suspiciously similar to that of
continuity. In fact, if we define
g(x) =
(
f(x) x 6= a
` x = a
Then f (x) ` as x a iff g is continuous at a.
Alternatively,
f
is continuous at
a
if
f
(
x
)
f
(
a
) as
x a
. It follows also
that
f
(
x
)
`
as
x a
iff
f
(
x
n
)
`
for every sequence (
x
n
) in
A
with
x
n
a
.
The previous limit theorems of sequences apply here as well
Proposition. If f(x) ` and g(x) m as x a, then f(x) + g(x) ` + m,
f(x)g(x) `m, and
f(x)
g(x)
`
m
if g and m don’t vanish.
5.2 Differentiation
Similar to what we did in IA Differential Equations, we define the derivative as
a limit.
Definition (Differentiable function).
f
is differentiable at
a
with derivative
λ
if
lim
xa
f(x) f(a)
x a
= λ.
Equivalently, if
lim
h0
f(a + h) f (a)
h
= λ.
We write λ = f
0
(a).
Here we see why, in the definition of the limit, we say that we don’t care
what happens when
x
=
a
. In our definition here, our function is 0/0 when
x = a, and we can’t make any sense out of what happens when x = a.
Alternatively, we write the definition of differentiation as
f(x + h) f (x)
h
= f
0
(x) + ε(h),
where ε(h) 0 as h 0. Rearranging, we can deduce that
f(x + h) = f(x) + hf
0
(x) + (h),
Note that by the definition of the limit, we don’t have to care what value
ε
takes
when
h
= 0. It can be 0,
π
or 10
10
10
. However, we usually take
ε
(0) = 0 so that
ε is continuous.
Using the small-
o
notation, we usually write
o
(
h
) for a function that satisfies
o(h)
h
0 as h 0. Hence we have
Proposition.
f(x + h) = f(x) + hf
0
(x) + o(h).
We can interpret this as an approximation of f (x + h):
f(x + h) = f(x) + hf
0
(x)
| {z }
linear approximation
+ o(h)
|{z}
error term
.
And differentiability shows that this is a very good approximation with small
o(h) error.
Conversely, we have
Proposition. If
f
(
x
+
h
) =
f
(
x
) +
hf
0
(
x
) +
o
(
h
), then
f
is differentiable at
x
with derivative f
0
(x).
Proof.
f(x + h) f (x)
h
= f
0
(x) +
o(h)
h
f
0
(x).
We can take derivatives multiple times, and get multiple derivatives.
Definition (Multiple derivatives). This is defined recursively:
f
is (
n
+ 1)-
times differentiable if it is
n
-times differentiable and its
n
th derivative
f
(n)
is
differentiable. We write
f
(n+1)
for the derivative of
f
(n)
, i.e. the (
n
+ 1)th
derivative of f.
Informally, we will say
f
is
n
-times differentiable if we can differentiate it
n
times, and the nth derivative is f
(n)
.
We can prove the usual rules of differentiation using the small
o
-notation. It
can also be proven by considering limits directly, but the notation will become a
bit more daunting.
Lemma (Sum and product rule). Let
f, g
be differentiable at
x
. Then
f
+
g
and f g are differentiable at x, with
(f + g)
0
(x) = f
0
(x) + g
0
(x)
(fg)
0
(x) = f
0
(x)g(x) + f (x)g
0
(x)
Proof.
(f + g)(x + h) = f(x + h) + g(x + h)
= f(x) + hf
0
(x) + o(h) + g(x) + hg
0
(x) + o(h)
= (f + g)(x) + h(f
0
(x) + g
0
(x)) + o(h)
fg(x + h) = f(x + h)g(x + h)
= [f(x) + hf
0
(x) + o(h)][g(x) + hg
0
(x) + o(h)]
= f(x)g(x) + h[f
0
(x)g(x) + f (x)g
0
(x)]
+ o(h)[g(x) + f(x) + hf
0
(x) + hg
0
(x) + o(h)] + h
2
f
0
(x)g
0
(x)
| {z }
error term
By limit theorems, the error term is o(h). So we can write this as
= fg(x) + h(f
0
(x)g(x) + f (x)g
0
(x)) + o(h).
Lemma (Chain rule). If
f
is differentiable at
x
and
g
is differentiable at
f
(
x
),
then g f is differentiable at x with derivative g
0
(f(x))f
0
(x).
Proof.
If one is sufficiently familiar with the small-
o
notation, then we can
proceed as
g(f(x + h)) = g(f(x) + hf
0
(x) + o(h)) = g(f (x)) + hf
0
(x)g
0
(f(x)) + o(h).
If not, we can be a bit more explicit about the computations, and use
(
h
)
instead of o(h):
(g f)(x + h) = g(f(x + h))
= g[f(x) + hf
0
(x) +
1
(h)
| {z }
the h term
]
= g(f(x)) +
fg
0
(x) +
1
(h)
g
0
(f(x))
+
hf
0
(x) +
1
(h)
ε
2
(hf
0
(x) +
1
(h))
= g f(x) + hg
0
(f(x))f
0
(x)
+ h
h
ε
1
(h)g
0
(f(x)) +
f
0
(x) + ε
1
(h)
ε
2
hf
0
(x) +
1
(h)
i
|
{z }
error term
.
We want to show that the error term is
o
(
h
), i.e. it divided by
h
tends to 0 as
h 0.
But
ε
1
(
h
)
g
0
(
f
(
x
))
0,
f
0
(
x
)+
ε
1
(
h
) is bounded, and
ε
2
(
hf
0
(
x
)+
1
(
h
))
0
because hf
0
(x) +
1
(h) 0 and ε
2
(0) = 0. So our error term is o(h).
We usually don’t write out the error terms so explicitly, and just use heuristics
like
f
(
x
+
o
(
h
)) =
f
(
x
) +
o
(
h
);
o
(
h
) +
o
(
h
) =
o
(
h
); and
g
(
x
)
·o
(
h
) =
o
(
h
) for any
(bounded) function g.
Example.
(i) Constant functions are differentiable with derivative 0.
(ii) f(x) = λx is differentiable with derivative λ.
(iii)
Using the product rule, we can show that
x
n
is differentiable with derivative
nx
n1
by induction.
(iv) Hence all polynomials are differentiable.
Example. Let f(x) = 1/x. If x 6= 0, then
f(x + h) f (x)
h
=
1
x+h
1
x
h
=
h
x(x+h)
h
=
1
x(x + h)
1
x
2
by limit theorems.
Lemma (Quotient rule). If
f
and
g
are differentiable at
x
, and
g
(
x
)
6
= 0, then
f/g is differentiable at x with derivative
f
g
0
(x) =
f
0
(x)g(x) g
0
(x)f(x)
g(x)
2
.
Proof.
First note that 1
/g
(
x
) =
h
(
g
(
x
)) where
h
(
y
) = 1
/y
. So 1
/g
(
x
) is differ-
entiable at x with derivative
1
g(x)
2
g
0
(x) by the chain rule.
By the product rule, f/g is differentiable at x with derivative
f
0
(x)
g(x)
f(x)
g
0
(x)
g(x)
2
=
f
0
(x)g(x) f (x)g
0
(x)
g(x)
2
.
Lemma. If f is differentiable at x, then it is continuous at x.
Proof.
As
y x
,
f(y) f(x)
y x
f
0
(
x
). Since,
y x
0,
f
(
y
)
f
(
x
)
0 by
product theorem of limits. So f(y) f(x). So f is continuous at x.
Theorem. Let
f
: [
a, b
]
[
c, d
] be differentiable on (
a, b
), continuous on [
a, b
],
and strictly increasing. Suppose that
f
0
(
x
) never vanishes. Suppose further that
f
(
a
) =
c
and
f
(
b
) =
d
. Then
f
has an inverse
g
and for each
y
(
c, d
),
g
is
differentiable at y with derivative 1/f
0
(g(y)).
In human language, this states that if
f
is invertible, then the derivative of
f
1
is 1/f
0
.
Note that the conditions will (almost) always require
f
to be differentiable
on open interval (
a, b
), continuous on closed interval [
a, b
]. This is because it
doesn’t make sense to talk about differentiability at
a
or
b
since the definition of
f
0
(a) requires f to be defined on both sides of a.
Proof. g exists by an earlier theorem about inverses of continuous functions.
Let y, y + k (c, d). Let x = g(y), x + h = g(y + k).
Since
g
(
y
+
k
) =
x
+
h
, we have
y
+
k
=
f
(
x
+
h
). So
k
=
f
(
x
+
h
)
y
=
f(x + h) f (x). So
g(y + k) g(y)
k
=
(x + h) x
f(x + h) f (x)
=
f(x + h) f (x)
h
1
.
As k 0, since g is continuous, g(y + k) g(y). So h 0. So
g(y + k) g(y)
k
[f
0
(x)]
1
= [f
0
(g(y)]
1
.
Example. Let f(x) = x
1/2
for x > 0. Then f is the inverse of g(x) = x
2
. So
f
0
(x) =
1
g
0
(f(x))
=
1
2x
1/2
=
1
2
x
1/2
.
Similarly, we can show that the derivative of x
1/q
is
1
q
x
1/q1
.
Then let’s take x
p/q
= (x
1/q
)
p
. By the chain rule, its derivative is
p(x
1/q
)
p1
·
1
q
x
1/q1
=
p
q
x
p1
q
+
1
q
1
=
p
q
x
p
q
1
.
5.3 Differentiation theorems
Everything we’ve had so far is something we already know. It’s just that now we
can prove them rigorously. In this section, we will come up with genuinely new
theorems, including but not limited to Taylor’s theorem, which gives us Taylor’s
series.
Theorem (Rolle’s theorem). Let
f
be continuous on a closed interval [
a, b
] (with
a < b
) and differentiable on (
a, b
). Suppose that
f
(
a
) =
f
(
b
). Then there exists
x (a, b) such that f
0
(x) = 0.
It is intuitively obvious: if you move up and down, and finally return to the
same point, then you must have changed direction some time. Then
f
0
(
x
) = 0
at that time.
Proof. If f is constant, then we’re done.
Otherwise, there exists
u
such that
f
(
u
)
6
=
f
(
a
). wlog,
f
(
u
)
> f
(
a
). Since
f
is continuous, it has a maximum, and since
f
(
u
)
> f
(
a
) =
f
(
b
), the maximum
is not attained at a or b.
Suppose maximum is attained at x (a, b). Then for any h 6= 0, we have
f(x + h) f (x)
h
(
0 h > 0
0 h < 0
since
f
(
x
+
h
)
f
(
x
)
0 by maximality of
f
(
x
). By considering both sides as we
take the limit h 0, we know that f
0
(x) 0 and f
0
(x) 0. So f
0
(x) = 0.
Corollary (Mean value theorem). Let
f
be continuous on [
a, b
] (
a < b
), and
differentiable on (a, b). Then there exists x (a, b) such that
f
0
(x) =
f(b) f(a)
b a
.
Note that
f(b)f (a)
ba
is the slope of the line joining f(a) and f(b).
f(x)
f(a)
f(b)
The mean value theorem is sometimes described as “rotate your head and
apply Rolle’s”. However, if we actually rotate it, we might end up with a
non-function. What we actually want is a shear.
Proof. Let
g(x) = f(x)
f(b) f(a)
b a
x.
Then
g(b) g(a) = f(b) f(a)
f(b) f(a)
b a
(b a) = 0.
So by Rolle’s theorem, we can find x (a, b) such that g
0
(x) = 0. So
f
0
(x) =
f(b) f(a)
b a
,
as required.
We’ve always assumed that if a function has a positive derivative everywhere,
then the function is increasing. However, it turns out that this is really hard to
prove directly. It does, however, follow quite immediately from the mean value
theorem.
Example. Suppose
f
0
(
x
)
>
0 for every
x
(
a, b
). Then for
u, v
in [
a, b
], we can
find w (u, v) such that
f(v) f(u)
v u
= f
0
(w) > 0.
It follows that f(v) > f(u). So f is strictly increasing.
Similarly, if
f
0
(
x
)
2 for every
x
and
f
(0) = 0, then
f
(1)
2, or else we
can find x (0, 1) such that
2 f
0
(x) =
f(1) f(0)
1 0
= f(1).
Theorem (Local version of inverse function theorem). Let
f
be a function with
continuous derivative on (a, b).
Let
x
(
a, b
) and suppose that
f
0
(
x
)
6
= 0. Then there is an open interval
(
u, v
) containing
x
on which
f
is invertible (as a function from (
u, v
) to
f
((
u, v
))).
Moreover, if g is the inverse, then g
0
(f(z)) =
1
f
0
(z)
for every z (u, v).
This says that if
f
has a non-zero derivative, then it has an inverse locally
and the derivative of the inverse is 1/f
0
.
Note that this not only requires
f
to be differentiable, but the derivative
itself also has to be continuous.
Proof.
wlog,
f
0
(
x
)
>
0. By the continuity, of
f
0
, we can find
δ >
0 such that
f
0
(
z
)
>
0 for every
z
(
x δ, x
+
δ
). By the mean value theorem,
f
is strictly
increasing on (
x δ, x
+
δ
), hence injective. Also,
f
is continuous on (
x δ, x
+
δ
)
by differentiability.
Then done by the inverse function theorem.
Finally, we are going to prove Taylor’s theorem. To do so, we will first need
some lemmas.
Theorem (Higher-order Rolle’s theorem). Let
f
be continuous on [
a, b
] (
a < b
)
and n-times differentiable on an open interval containing [a, b]. Suppose that
f(a) = f
0
(a) = f
(2)
(a) = ··· = f
(n1)
(a) = f(b) = 0.
Then x (a, b) such that f
(n)
(x) = 0.
Proof. Induct on n. The n = 0 base case is just Rolle’s theorem.
Suppose we have
k < n
and
x
k
(
a, b
) such that
f
(k)
(
x
k
) = 0. Since
f
(k)
(
a
) = 0, we can find
x
k+1
(
a, x
k
) such that
f
(k+1)
(
x
k+1
) = 0 by Rolle’s
theorem.
So the result follows by induction.
Corollary. Suppose that
f
and
g
are both differentiable on an open interval
containing [
a, b
] and that
f
(k)
(
a
) =
g
(k)
(
a
) for
k
= 0
,
1
, ··· , n
1, and also
f(b) = g(b). Then there exists x (a, b) such that f
(n)
(x) = g
(n)
(x).
Proof. Apply generalised Rolle’s to f g.
Now we shall show that for any
f
, we can find a polynomial
p
of degree at
most
n
that satisfies the conditions for
g
, i.e. a
p
such that
p
(k)
(
a
) =
f
(k)
(
a
) for
k = 0, 1, ··· , n 1 and p(b) = f(b).
A useful ingredient is the observation that if
Q
k
(x) =
(x a)
k
k!
,
then
Q
(j)
k
(a) =
(
1 j = k
0 j 6= k
Therefore, if
Q(x) =
n1
X
k=0
f
(k)
(a)Q
k
(x),
then
Q
(j)
(a) = f
(j)
(a)
for
j
= 0
,
1
, ··· , n
1. To get
p
(
b
) =
f
(
b
), we use our
n
th degree polynomial
term:
p(x) = Q(x) +
(x a)
n
(b a)
n
f(b) Q(b)
.
Then our final term does not mess up our first
n
1 derivatives, and gives
p(b) = f(b).
By the previous corollary, we can find x (a, b) such that
f
(n)
(x) = p
(n)
(x).
That is,
f
(n)
(x) =
n!
(b a)
n
f(b) Q(b)
.
Therefore
f(b) = Q(b) +
(b a)
n
n!
f
(n)
(x).
Alternatively,
f(b) = f(a) + (b a)f
0
(a) + ··· +
(b a)
n1
(n 1)!
f
(n1)
(a) +
(b a)
n
n!
f
(n)
(x).
Setting b = a + h, we can rewrite this as
Theorem (Taylor’s theorem with the Lagrange form of remainder).
f(a + h) = f(a) + hf
0
(a) + ··· +
h
n1
(n 1)!
f
(n1)
(a)
| {z }
(n1)-degree approximation to f near a
+
h
n
n!
f
(n)
(x)
| {z }
error term
.
for some x (a, a + h).
Strictly speaking, we only proved it for the case when
h >
0, but we can
easily show it holds for h < 0 too by considering g(x) = f(x).
Note that the remainder term is not necessarily small, but this often gives
us the best (
n
1)-degree approximation to
f
near
a
. For example, if
f
(n)
is
bounded by C near a, then
h
n
n!
f
(n)
(x)
C
n!
|h|
n
= o(h
n1
).
Example. Let
f
:
R R
be a differentiable function such that
f
(0) = 1 and
f
0
(
x
) =
f
(
x
) for every
x
(intuitively, we know it is
e
x
, but that thing doesn’t
exist!). Then for every x, we have
f(x) = 1 + x +
x
2
2!
+
x
3
3!
+ ··· =
X
n=0
x
n
n!
.
While it seems like we can prove this works by differentiating it and see that
f
0
(
x
) =
f
(
x
), the sum rule only applies for finite sums. We don’t know we can
differentiate a sum term by term. So we have to use Taylor’s theorem.
Since
f
0
(
x
) =
f
(
x
), it follows that all derivatives exist. By Taylor’s theorem,
f(x) = f(0) + f
0
(0)x +
f
(2)
(0)
2!
x
2
+ ··· +
f
(n1)
(0)
(n 1)!
x
n1
+
f
(n)
(u)
n!
x
n
.
for some u between 0 and x. This equals to
f(x) =
n1
X
k=0
x
k
k!
+
f
(n)
(u)
n!
x
n
.
We must show that the remainder term
f
(n)
(u)
n!
x
n
0 as
n
. Note here
that x is fixed, but u can depend on n.
But we know that
f
(n)
(
u
) =
f
(
u
), but since
f
is differentiable, it is continuous,
and is bounded on [0, x]. Suppose |f(u)| C on [0, x]. Then
f
(n)(u)
n!
x
n
C
n!
|x|
n
0
from limit theorems. So it follows that
f(x) = 1 + x +
x
2
2!
+
x
3
3!
+ ··· =
X
n=0
x
n
n!
.
5.4 Complex differentiation
Definition (Complex differentiability). Let
f
:
C C
. Then
f
is differentiable
at z with derivative f
0
(z) if
lim
h0
f(z + h) f(z)
h
exists and equals f
0
(z).
Equivalently,
f(z + h) = f(z) + hf
0
(z) + o(h).
This is exactly the same definition with real differentiation, but has very
different properties!
All the usual rules chain rule, product rule etc. also apply (with the same
proofs). Also the derivatives of polynomials are what you expect. However, there
are some more interesting cases.
Example. f(z) = ¯z is not differentiable.
z + h z
h
=
¯
h
h
=
(
1 h is real
1 h is purely imaginary
If this seems weird, this is because we often think of
C
as
R
2
, but they are
not the same. For example, reflection is a linear map in
R
2
, but not in
C
. A
linear map in
C
is something in the form
x 7→ bx
, which can only be a dilation
or rotation, not reflections or other weird things.
Example.
f
(
z
) =
|z|
is also not differentiable. If it were, then
|z|
2
would be as
well (by the product rule). So would
|z|
2
z
=
¯z
when
z 6
= 0 by the quotient rule.
At
z
= 0, it is certainly not differentiable, since it is not even differentiable on
R
.
6 Complex power series
Before we move on to integration, we first have a look at complex power series.
This will allow us to define the familiar exponential and trigonometric functions.
Definition (Complex power series). A complex power series is a series of the
form
X
n=0
a
n
z
n
.
when z C and a
n
C for all n. When it converges, it is a function of z.
When considering complex power series, a very important concept is the
radius of convergence. To make sense of this concept, we first need the following
lemma:
Lemma. Suppose that
P
a
n
z
n
converges and
|w| < |z|
, then
P
a
n
w
n
converges
(absolutely).
Proof. We know that
|a
n
w
n
| = |a
n
z
n
| ·
w
z
n
.
Since
P
a
n
z
n
converges, the terms a
n
z
n
are bounded. So pick C such that
|a
n
z
n
| C
for every n. Then
0
X
n=0
|a
n
w
n
|
X
n=0
C
w
z
n
,
which converges (geometric series). So by the comparison test,
P
a
n
w
n
converges
absolutely.
It follows that if
P
a
n
z
n
does not converge and
|w| > |z|
, then
P
a
n
w
n
does
not converge.
Now let
R
=
sup{|z|
:
P
a
n
z
n
converges
}
(
R
may be infinite). If
|z| < R
,
then we can find
z
0
with
|z
0
|
(
|z|, R
] such that
P
n
a
n
z
n
0
converges. So by
lemma above,
P
a
n
z
n
converges. If
|z| > R
, then
P
a
n
z
n
diverges by definition
of R.
Definition (Radius of convergence). The radius of convergence of a power series
P
a
n
z
n
is
R = sup
n
|z| :
X
a
n
z
n
converges
o
.
{z : |z| < R} is called the circle of convergence.
a
.
If
|z| < R
, then
P
a
n
z
n
converges. If
|z| > R
, then
P
a
n
z
n
diverges. When
|z| = R, the series can converge at some points and not the others.
Example.
X
n=0
z
n
has radius of convergence of 1. When
|z|
= 1, it diverges
(since the terms do not tend to 0).
Example.
X
n=0
z
n
n
has radius of convergence 1, since the ratio of (
n
+ 1)th term
to nth is
z
n+1
/(n + 1)
z
n
/n
= z ·
n
n + 1
z.
So if
|z| <
1, then the series converges by the ratio test. If
|z| >
1, then eventually
the terms are increasing in modulus.
If
z
= 1, then it diverges (harmonic series). If
|z|
= 1 and
z 6
= 1, it converges
by Abel’s test.
Example. The series
X
n=1
z
n
n
2
converges for |z| 1 and diverges for |z| > 1.
As evidenced by the above examples, the ratio test can be used to find the
radius of convergence. We also have an alternative test based on the nth root.
Lemma. The radius of convergence of a power series
P
a
n
z
n
is
R =
1
lim sup
n
p
|a
n
|
.
Often
n
p
|a
n
| converges, so we only have to find the limit.
Proof.
Suppose
|z| <
1
/ lim sup
n
p
|a
n
|
. Then
|z|lim sup
n
p
|a
n
| <
1. Therefore
there exists N and ε > 0 such that
sup
nN
|z|
n
p
|a
n
| 1 ε
by the definition of lim sup. Therefore
|a
n
z
n
| (1 ε)
n
for every
n N
, which implies (by comparison with geometric series) that
P
a
n
z
n
converges absolutely.
On the other hand, if
|z|lim sup
n
p
|a
n
| >
1, it follows that
|z|
n
p
|a
n
|
1 for
infinitely many
n
. Therefore
|a
n
z
n
|
1 for infinitely many
n
. So
P
a
n
z
n
does
not converge.
Example. The radius of convergence of
z
n
2
n
is 2 because
n
p
|a
n
|
=
1
2
for every
n
.
So lim sup
n
p
|a
n
| =
1
2
. So 1/ lim sup
n
p
|a
n
| = 2.
But often it is easier to find the radius convergence from elementary methods
such as the ratio test, e.g. for
P
n
2
z
n
.
a
Note to pedants: yes it is a disc, not a circle
6.1 Exponential and trigonometric functions
Definition (Exponential function). The exponential function is
e
z
=
X
n=0
z
n
n!
.
By the ratio test, this converges on all of C.
A fundamental property of this function is that
e
z+w
= e
z
e
w
.
Once we have this property, we can say that
Proposition. The derivative of e
z
is e
z
.
Proof.
e
z+h
e
z
h
= e
z
e
h
1
h
= e
z
1 +
h
2!
+
h
2
3!
+ ···
But
h
2!
+
h
2
3!
+ ···
|h|
2
+
|h|
2
4
+
|h|
3
8
+ ··· =
|h|/2
1 |h|/2
0.
So
e
z+h
e
z
h
e
z
.
But we must still prove that e
z+w
= e
z
e
w
.
Consider two sequences (
a
n
)
,
(
b
n
). Their convolution is the sequence (
c
n
)
defined by
c
n
= a
0
b
n
+ a
1
b
n1
+ a
2
b
n2
+ ··· + a
n
b
0
.
The relevance of this is that if you take
N
X
n=0
a
n
z
n
!
N
X
n=0
b
n
z
n
!
and
N
X
n=0
c
n
z
n
,
and equate coefficients of z
n
, you get
c
n
= a
0
b
n
+ a
1
b
n1
+ a
2
b
n2
+ ··· + a
n
b
0
.
Theorem. Let
P
n=0
a
n
and
P
n=0
b
n
be two absolutely convergent series, and
let (
c
n
) be the convolution of the sequences (
a
n
) and (
b
n
). Then
P
n=0
c
n
converges (absolutely), and
X
n=0
c
n
=
X
n=0
a
n
!
X
n=0
b
n
!
.
Proof.
We first show that a rearrangement of
P
c
n
converges absolutely. Hence
it converges unconditionally, and we can rearrange it back to
P
c
n
.
Consider the series
(a
0
b
0
) + (a
0
b
1
+ a
1
b
1
+ a
1
b
0
) + (a
0
b
2
+ a
1
b
2
+ a
2
b
2
+ a
2
b
1
+ a
2
b
0
) + ··· ()
Let
S
N
=
N
X
n=0
a
n
, T
N
=
N
X
n=0
b
n
, U
N
=
N
X
n=0
|a
n
|, V
N
=
N
X
n=0
|b
n
|.
Also let
S
N
S, T
N
T, U
N
U, V
N
V
(these exist since
P
a
n
and
P
b
n
converge absolutely).
If we take the modulus of the terms of (
), and consider the first (
N
+ 1)
2
terms (i.e. the first
N
+1 brackets), the sum is
U
N
V
N
. Hence the series converges
absolutely to UV . Hence () converges.
The partial sum up to (
N
+ 1)
2
of the series (
) itself is
S
N
T
N
, which
converges to ST . So the whole series converges to ST .
Since it converges absolutely, it converges unconditionally. Now consider a
rearrangement:
a
0
b
0
+ (a
0
b
1
+ a
1
b
0
) + (a
0
b
2
+ a
1
b
1
+ a
2
b
0
) + ···
Then this converges to
ST
as well. But the partial sum of the first 1 + 2+
···
+
N
terms is c
0
+ c
1
+ ··· + c
N
. So
N
X
n=0
c
n
ST =
X
n=0
a
n
!
X
n=0
b
n
!
.
Corollary.
e
z
e
w
= e
z+w
.
Proof. By theorem above (and definition of e
z
),
e
z
e
w
=
X
n=0
1 ·
w
n
n!
+
z
1!
w
n1
(n 1)!
+
z
2
2!
w
n2
(n 2)!
+ ··· +
z
n
n!
· 1
e
z
e
w
=
X
n=0
1
n!
w
n
+
n
1
zw
n1
+
n
2
z
2
w
n2
+ ··· +
n
n
z
n
=
X
n=0
(z + w)
n
by the binomial theorem
= e
z+w
.
Note that if (
c
n
) is the convolution of (
a
n
) and (
b
n
), then the convolution
of (
a
n
z
n
) and (
b
n
z
n
) is (
c
n
z
n
). Therefore if both
P
a
n
z
n
and
P
b
n
z
n
converge
absolutely, then their product is
P
c
n
z
n
.
Note that we have now completed the proof that the derivative of e
z
is e
z
.
Now we define sin z and cos z:
Definition (Sine and cosine).
sin z =
e
iz
e
iz
2i
= z
z
3
3!
+
z
5
5!
z
7
7!
+ ···
cos z =
e
iz
+ e
iz
2
= 1
z
2
2!
+
z
4
4!
z
6
6!
+ ··· .
We now prove certain basic properties of
sin
and
cos
, using known properties
of e
z
.
Proposition.
d
dz
sin z =
ie
iz
+ ie
iz
2i
= cos z
d
dz
cos z =
ie
iz
ie
iz
2
= sin z
sin
2
z + cos
2
z =
e
2iz
+ 2 + e
2iz
4
+
e
2iz
2 + e
2iz
4
= 1.
It follows that if x is real, then |cos x| and |sin x| are at most 1.
Proposition.
cos(z + w) = cos z cos w sin z sin w
sin(z + w) = sin z cos w + cos z sin w
Proof.
cos z cos w sin z sin w =
(e
iz
+ e
iz
)(e
iw
+ e
iw
)
4
+
(e
iz
e
iz
)(e
iw
e
iw
)
4
=
e
i(z+w)
+ e
i(z+w)
2
= cos(z + w).
Differentiating both sides wrt z gives
sin z cos w cos z sin w = sin(z + w).
So
sin(z + w) = sin z cos w + cos z sin w.
When
x
is real, we know that
cos x
1. Also
sin
0 = 0, and
d
dx
sin x
=
cos x
1. So for
x
0,
sin x x
, “by the mean value theorem”. Also,
cos
0 = 1,
and
d
dx
cos x
=
sin x
, which, for
x
0, is greater than
x
. From this, it follows
that when
x
0,
cos x
1
x
2
2
(the 1
x
2
2
comes from “integrating”
x
, (or
finding a thing whose derivative is x)).
Continuing in this way, we get that for
x
0, if you take truncate the power
series for
sin x
or
cos x
, it will be
sin x, cos x
if you stop at a positive term,
and if you stop at a negative term. For example,
sin x x
x
3
3!
+
x
5
5!
x
7
7!
+
x
9
9!
x
11
11!
.
In particular,
cos 2 1
2
2
2!
+
2
4
4!
= 1 2 +
2
3
< 0.
Since
cos
0 = 1, it follows by the intermediate value theorem that there exists
some
x
(0
,
2) such that
cos x
= 0. Since
cos x
1
x
2
2
, we can further deduce
that x > 1.
Definition (Pi). Define the smallest x such that cos x = 0 to be
π
2
.
Since
sin
2
z
+
cos
2
z
= 1, it follows that
sin
π
2
=
±
1. Since
cos x >
0 on [0
,
π
2
],
sin
π
2
0 by the mean value theorem. So sin
π
2
= 1.
Thus
Proposition.
cos
z +
π
2
= sin z
sin
z +
π
2
= cos z
cos(z + π) = cos z
sin(z + π) = sin z
cos(z + 2π) = cos z
sin(z + 2π) = sin z
Proof.
cos
z +
π
2
= cos z cos
π
2
sin z sin
π
2
= sin z sin
π
2
= sin z
and similarly for others.
6.2 Differentiating power series
We shall show that inside the circle of convergence, the derivative of
P
n=0
a
n
z
is
given by the obvious formula
P
n=1
na
n
z
n1
.
We first prove some (seemingly arbitrary and random) lemmas to build up
the proof of the above statement. They are done so that the final proof will not
be full of tedious algebra.
Lemma. Let a and b be complex numbers. Then
b
n
a
n
n(b a)a
n1
= (b a)
2
(b
n2
+ 2ab
n3
+ 3a
2
b
n4
+ ···+ (n 1)a
n2
).
Proof. If b = a, we are done. Otherwise,
b
n
a
n
b a
= b
n1
+ ab
n2
+ a
2
b
n3
+ ··· + a
n1
.
Differentiate both sides with respect to a. Then
na
n1
(b a) + b
n
a
n
(b a)
2
= b
n2
+ 2ab
n3
+ ··· + (n 1)a
n2
.
Rearranging gives the result.
Alternatively, we can do
b
n
a
n
= (b a)(b
n1
+ ab
n2
+ ··· + a
n1
).
Subtract n(b a)a
n1
to obtain
(b a)[b
n1
a
n1
+ a(b
n2
a
n2
) + a
2
(b
n3
a
n3
) + ···]
and simplify.
This implies that
(z + h)
n
z
n
nhz
n1
= h
2
((z + h)
n2
+ 2z(z + h)
n3
+ ··· + (n 1)z
n2
),
which is actually the form we need.
Lemma. Let
a
n
z
n
have radius of convergence
R
, and let
|z| < R
. Then
P
na
n
z
n1
converges (absolutely).
Proof.
Pick
r
such that
|z| < r < R
. Then
P
|a
n
|r
n
converges, so the terms
|a
n
|r
n
are bounded above by, say, C. Now
X
n|a
n
z
n1
| =
X
n|a
n
|r
n1
|z|
r
n1
C
r
X
n
|z|
r
n1
The series
P
n
|z|
r
n1
converges, by the ratio test. So
P
n|a
n
z
n1
|
converges,
by the comparison test.
Corollary. Under the same conditions,
X
n=2
n
2
a
n
z
n2
converges absolutely.
Proof. Apply Lemma above again and divide by 2.
Theorem. Let
P
a
n
z
n
be a power series with radius of convergence
R
. For
|z| < R, let
f(z) =
X
n=0
a
n
z
n
and g(z) =
X
n=1
na
n
z
n1
.
Then f is differentiable with derivative g.
Proof. We want f(z + h) f(z) hg(z) to be o(h). We have
f(z + h) f(z) hg(z) =
X
n=2
a
n
((z + h)
n
z
n
hnz
n
).
We started summing from
n
= 2 since the
n
= 0 and
n
= 1 terms are 0. Using
our first lemma, we are left with
h
2
X
n=2
a
n
(z + h)
n2
+ 2z(z + h)
n3
+ ··· + (n 1)z
n2
We want the huge infinite series to be bounded, and then the whole thing is a
bounded thing times h
2
, which is definitely o(h).
Pick
r
such that
|z| < r < R
. If
h
is small enough that
|z
+
h| r
, then the
last infinite series is bounded above (in modulus) by
X
n=2
|a
n
|(r
n2
+ 2r
n2
+ ··· + (n 1)r
n2
) =
X
n=2
|a
n
|
n
2
r
n2
,
which is bounded. So done.
In IB Analysis II, we will prove the same result using the idea of uniform
convergence, which gives a much nicer proof.
Example. The derivative of
e
z
= 1 + z +
z
2
2!
+
z
3
3!
+ ···
is
1 + z +
z
2
2!
+ ··· = e
z
.
So we have another proof that of this fact.
Similarly, the derivatives of sin z and cos z work out as cos z and sin z.
6.3 Hyperbolic trigonometric functions
Definition (Hyperbolic sine and cosine). We define
cosh z =
e
z
+ e
z
2
= 1 +
z
2
2!
+
z
4
4!
+
z
6
6!
+ ···
sinh z =
e
z
e
z
2
= z +
z
3
3!
+
z
5
5!
+
z
7
7!
+ ···
Either from the definition or from differentiating the power series, we get
that
Proposition.
d
dz
cosh z = sinh z
d
dz
sinh z = cosh z
Also, by definition, we have
Proposition.
cosh iz = cos z
sinh iz = i sin z
Also,
Proposition.
cosh
2
z sinh
2
z = 1,
7 The Riemann Integral
Finally, we can get to integrals. There are many ways to define an integral,
which can have some subtle differences. The definition we will use here is the
Riemann integral, which is the simplest definition, but is also the weakest one, in
the sense that many functions are not Riemann integrable but integrable under
other definitions.
Still, the definition of the Riemann integral is not too straightforward, and
requires a lot of preliminary definitions.
7.1 Riemann Integral
Definition (Dissections). Let [
a, b
] be a closed interval. A dissection of [
a, b
] is
a sequence a = x
0
< x
1
< x
2
< ··· < x
n
= b.
Definition (Upper and lower sums). Given a dissection
D
, the upper sum and
lower sum are defined by the formulae
U
D
(f) =
n
X
i=1
(x
i
x
i1
) sup
x[x
i1
,x
i
]
f(x)
L
D
(f) =
n
X
i=1
(x
i
x
i1
) inf
x[x
i1
,x
i
]
f(x)
Sometimes we use the shorthand
M
i
= sup
x[x
i1
,x
i
]
f(x), m
i
= inf
x[x
i1
x
i
]
f(x).
The upper sum is the total area of the red rectangles, while the lower sum is
the total area of the black rectangles:
x
y
a x
1
x
2
x
3
···
x
i
x
i+1
···
···
b
Definition (Refining dissections). If
D
1
and
D
2
are dissections of [
a, b
], we say
that D
2
refines D
1
if every point of D
1
is a point of D
2
.
Lemma. If D
2
refines D
1
, then
U
D
2
f U
D
1
f and L
D
2
f
L
D
1
f.
Using the picture above, this is because if we cut up the dissections into
smaller pieces, the red rectangles can only get chopped into shorter pieces and
the black rectangles can only get chopped into taller pieces.
x
y
x
0
x
1
x
y
x
0
x
1
x
2
x
3
Proof.
Let
D
be
x
0
< x
1
< ··· < x
n
. Let
D
2
be obtained from
D
1
by the
addition of one point z. If z (x
i1
, x
i
), then
U
D
2
f U
D
1
f =
"
(z x
i1
) sup
x[x
i1
,z]
f(x)
#
+
"
(x
i
z) sup
x[z,x
i
]
f(x)
#
(x
i
x
i1
)M
i
.
But
sup
x[x
i1
,z]
f
(
x
) and
sup
x[z,x
i
]
f
(
x
) are both at most
M
i
. So this is at
most M
i
(z x
i1
+ x
i
z (x
i
x
i1
)) = 0. So
U
D
2
f U
D
1
f.
By induction, the result is true whenever D
2
refines D
1
.
A very similar argument shows that L
D
2
f
L
D
1
f.
Definition (Least common refinement). If
D
1
and
D
2
be dissections of [
a, b
].
Then the least common refinement of
D
1
and
D
2
is the dissection made out of
the points of D
1
and D
2
.
Corollary. Let D
1
and D
2
be two dissections of [a, b]. Then
U
D
1
f L
D
2
f.
Proof.
Let
D
be the least common refinement (or indeed any common refinement).
Then by lemma above (and by definition),
U
D
1
f U
D
f L
D
f L
D
2
f.
Finally, we can define the integral.
Definition (Upper, lower, and Riemann integral). The upper integral is
Z
b
a
f(x) dx = inf
D
U
D
f.
The lower integral is
Z
b
a
f(x) dx = sup
D
L
D
f.
If these are equal, then we call their common value the Riemann integral of
f
,
and is denoted
R
b
a
f(x) dx.
If this exists, we say f is Riemann integrable.
We will later prove the fundamental theorem of calculus, which says that
integration is the reverse of differentiation. But why don’t we simply define
integration as anti-differentiation, and prove that it is the area of the curve?
There are things that we cannot find (a closed form of) the anti-derivative of,
like
e
x
2
. In these cases, we wouldn’t want to say the integral doesn’t exist it
surely does according to this definition!
There is an immediate necessary condition for Riemann integrability bound-
edness. If
f
is unbounded above in [
a, b
], then for any dissection
D
, there must
be some
i
such that
f
is unbounded on [
x
i1
, x
i
]. So
M
i
=
. So
U
D
f
=
.
Similarly, if
f
is unbounded below, then
L
D
f
=
−∞
. So unbounded functions
are not Riemann integrable.
Example. Let
f
(
x
) =
x
on [
a, b
]. Intuitively, we know that the integral is
(
b
2
a
2
)
/
2, and we will show this using the definition above. Let
D
=
x
0
<
x
1
< ··· < x
n
be a dissection. Then
U
D
f =
n
X
i=1
(x
i
x
i1
)x
i
We know that the integral is
b
2
a
2
2
. So we put each term of the sum into the
form
x
2
i
x
2
i1
2
plus some error terms.
=
n
X
i=1
x
2
i
2
x
2
i1
2
+
x
2
i
2
x
i1
x
i
+
x
2
i1
2
=
1
2
n
X
i=1
(x
2
i
x
2
i1
+ (x
i
x
i1
)
2
)
=
1
2
(b
2
a
2
) +
1
2
n
X
i=1
(x
i
x
i1
)
2
.
Definition (Mesh). The mesh of a dissection D is max
i
(x
i+1
x
i
).
Then if the mesh is < δ, then
1
2
n
X
i=1
(x
i
x
i1
)
2
δ
2
n
X
i=1
(x
i
x
i1
) =
δ
2
(b a).
So by making δ small enough, we can show that for any ε > 0,
Z
b
a
x dx <
1
2
(b
2
a
2
) + ε.
Similarly,
Z
b
a
x dx >
1
2
(b
2
a
2
) ε.
So
Z
b
a
x dx =
1
2
(b
2
a
2
).
Example. Define f : [0, 1] R by
f(x) =
(
1 x Q
0 x 6∈ Q
.
Let
x
0
< x
1
< ··· < x
n
be a dissection. Then for every
i
, we have
m
i
= 0 (since
there is an irrational in every interval), and
M
i
= 1 (since there is a rational in
every interval). So
U
D
f =
n
X
i=1
M
i
(x
i
x
i1
) =
n
X
i=1
(x
i
x
i1
) = 1.
Similarly, L
D
f = 0. Since D was arbitrary, we have
Z
1
0
f(x) dx = 1,
Z
1
0
f(x) dx = 0.
So f is not Riemann integrable.
Of course, this function is not interesting at all. The whole point of its
existence is to show undergraduates that there are some functions that are not
integrable!
Note that it is important to say that the function is not Riemann integrable.
There are other notions for integration in which this function is integrable. For
example, this function is Lebesgue-integrable.
Using the definition to show integrability is often cumbersome. Most of
the time, we use the Riemann’s integrability criterion, which follows rather
immediately from the definition, but is much nicer to work with.
Proposition (Riemann’s integrability criterion). This is sometimes known as
Cauchy’s integrability criterion.
Let
f
: [
a, b
]
R
. Then
f
is Riemann integrable if and only if for every
ε > 0, there exists a dissection D such that
U
D
L
D
< ε.
Proof.
(
) Suppose that
f
is integrable. Then (by definition of Riemann
integrability), there exist D
1
and D
2
such that
U
D
1
<
Z
b
a
f(x) dx +
ε
2
,
and
L
D
2
>
Z
b
a
f(x) dx
ε
2
.
Let D be a common refinement of D
1
and D
2
. Then
U
D
f L
D
f U
D
1
f L
D
2
f < ε.
() Conversely, if there exists D such that
U
D
f L
D
f < ε,
then
inf U
D
f sup L
D
f < ε,
which is, by definition, that
Z
b
a
f(x) dx
Z
b
a
f(x) dx < ε.
Since ε > 0 is arbitrary, this gives us that
Z
b
a
f(x) dx =
Z
b
a
f(x) dx.
So f is integrable.
The next big result we want to prove is that integration is linear, ie
Z
b
a
(λf(x) + µg(x)) dx = λ
Z
b
a
f(x) dx + µ
Z
b
a
g(x) dx.
We do this step by step:
Proposition. Let
f
: [
a, b
]
R
be integrable, and
λ
0. Then
λf
is integrable,
and
Z
b
a
λf(x) dx = λ
Z
b
a
f(x) dx.
Proof. Let D be a dissection of [a, b]. Since
sup
x[x
i1
,x
i
]
λf(x) = λ sup
x[x
i1
,x
i
]
f(x),
and similarly for inf, we have
U
D
(λf) = λU
D
f
L
D
(λf) = λL
D
f.
So if we choose
D
such that
U
D
f L
D
f < ε/λ
, then
U
D
(
λf
)
L
D
(
λf
)
< ε
. So
the result follows from Riemann’s integrability criterion.
Proposition. Let f : [a, b] R be integrable. Then f is integrable, and
Z
b
a
f(x) dx =
Z
b
a
f(x) dx.
Proof. Let D be a dissection. Then
sup
x[x
i1
,x
i
]
f(x) = inf
x[x
i1
,x
i
]
f(x)
inf
x[x
i1
,x
i
]
f(x) = sup
x[x
i1
,x
i
]
f(x).
Therefore
U
D
(f) =
n
X
i=1
(x
i
x
i1
)(m
i
) = L
D
(f).
Similarly,
L
D
(f) = U
D
f.
So
U
D
(f) L
D
(f) = U
D
f L
D
f.
Hence if
f
is integrable, then
f
is integrable by the Riemann integrability
criterion.
Proposition. Let
f, g
: [
a, b
]
R
be integrable. Then
f
+
g
is integrable, and
Z
b
a
(f(x) + g(x)) dx =
Z
b
a
f(x) dx +
Z
b
a
g(x) dx.
Proof. Let D be a dissection. Then
U
D
(f + g) =
n
X
i=1
(x
i
x
i1
) sup
x[x
i1
,x
i
]
(f(x) + g(x))
n
X
i=1
(x
i
x
i1
)
sup
u[x
i1
,x
i
]
f(u) + sup
v[x
i1
,x
i
]
g(v)
!
= U
D
f + U
D
g
Therefore,
Z
b
a
(f(x) + g(x)) dx
Z
b
a
f(x) dx +
Z
b
a
g(x) dx =
Z
b
a
f(x) dx +
Z
b
a
g(x) dx.
Similarly,
Z
b
a
(f(x) + g(x)) dx
Z
b
a
f(x) dx +
Z
b
a
g(x) dx.
So the upper and lower integrals are equal, and the result follows.
So we now have that
Z
b
a
(λf(x) + µg(x)) dx = λ
Z
b
a
f(x) dx + µ
Z
b
a
g(x) dx.
We will prove more “obvious” results.
Proposition. Let
f, g
: [
a, b
]
R
be integrable, and suppose that
f
(
x
)
g
(
x
)
for every x. Then
Z
b
a
f(x) dx
Z
b
a
g(x) dx.
Proof. Follows immediately from the definition.
Proposition. Let f : [a, b] R be integrable. Then |f| is integrable.
Proof. Note that we can write
sup
x[x
i1
,x
i
]
f(x) inf
x[x
i1
,x
i
]
f(x) = sup
u,v[x
i1
,x
i
]
|f(u) f(v)|.
Similarly,
sup
x[x
i1
,x
i
]
|f(x)| inf
x[x
i1
,x
i
]
|f(x)| = sup
u,v[x
i1
,x
i
]
||f(u)| |f(v)||.
For any pair of real numbers,
x, y
, we have that
||x| |y|| |x y|
by the
triangle inequality. Then for any interval u, v [x
i1
, x
i
], we have
||f(u)| |f(v)|| |f(u) f(v)|.
Hence we have
sup
x[x
i1
,x
i
]
|f(x)| inf
x[x
i1
,x
i
]
|f(x)| sup
x[x
i1
,x
i
]
f(x) inf
x[x
i1
,x
i
]
f(x).
So for any dissection D, we have
U
D
(|f|) L
D
(|f|) U
D
(f) L
D
(f).
So the result follows from Riemann’s integrability criterion.
Combining these two propositions, we get that if
|f(x) g(x)| C,
for every x [a, b], then
Z
b
a
f(x) dx
Z
b
a
g(x) dx
C(b a).
Proposition (Additivity property). Let
f
: [
a, c
]
R
be integrable, and let
b
(
a, c
). Then the restrictions of
f
to [
a, b
] and [
b, c
] are Riemann integrable,
and
Z
b
a
f(x) dx +
Z
c
b
f(x) dx =
Z
c
a
f(x) dx
Similarly, if
f
is integrable on [
a, b
] and [
b, c
], then it is integrable on [
a, c
] and
the above equation also holds.
Proof.
Let
ε >
0, and let
a
=
x
0
< x
1
< ··· < x
n
=
c
be a dissection of
D
of
[a, c] such that
U
D
(f)
Z
c
a
f(x) dx + ε,
and
L
D
(f)
Z
c
a
f(x) dx ε.
Let
D
0
be the dissection made of
D
plus the point
b
. Let
D
1
be the dissection of
[
a, b
] made of points of
D
0
from
a
to
b
, and
D
2
be the dissection of [
b, c
] made of
points of D
0
from b to c. Then
U
D
1
(f) + U
D
2
(f) = U
D
0
(f) U
D
(f),
and
L
D
1
(f) + L
D
2
(f) = L
D
0
(f) L
D
(f).
Since
U
D
(
f
)
L
D
(
f
)
<
2
ε
, and both
U
D
2
(
f
)
L
D
2
(
f
) and
U
D
1
(
f
)
L
D
1
(
f
)
are non-negative, we have
U
D
1
(
f
)
L
D
1
(
f
) and
U
D
2
(
f
)
L
D
2
(
f
) are less than
2
ε
. Since
ε
is arbitrary, it follows that the restrictions of
f
to [
a, b
] and [
b, c
] are
both Riemann integrable. Furthermore,
Z
b
a
f(x) dx +
Z
c
b
f(x) dx U
D
1
(f) + U
D
2
(f) = U
D
0
(f) U
D
(f)
Z
c
a
f(x) dx + ε.
Similarly,
Z
b
a
f(x) dx +
Z
c
b
f(x) dx L
D
1
(f) + L
D
2
(f) = L
D
0
(f) L
D
(f)
Z
c
a
f(x) dx ε.
Since ε is arbitrary, it follows that
Z
b
a
f(x) dx +
Z
c
b
f(x) dx =
Z
c
a
f(x) dx.
The other direction is left as an (easy) exercise.
Proposition. Let f, g : [a, b] R be integrable. Then f g is integrable.
Proof.
Let
C
be such that
|f
(
x
)
|, |g
(
x
)
| C
for every
x
[
a, b
]. Write
L
i
and
`
i
for the
sup
and
inf
of
g
in [
x
i1
, x
i
]. Now let
D
be a dissection, and for each
i, let u
i
and v
i
be two points in [x
i1
, x
i
].
We will pretend that
u
i
and
v
i
are the minimum and maximum when we
write the proof, but we cannot assert that they are, since
fg
need not have
maxima and minima. We will then note that since our results hold for arbitrary
u
i
and v
i
, it must hold when fg is at its supremum and infimum.
We find what we pretend is the difference between the upper and lower sum:
n
X
i=1
x
i
x
i1
)(f(v
i
)g(v
i
) f (u
i
)g(u
i
)
=
n
X
i=1
(x
i
x
i1
)
f(v
i
)(g(v
i
) g(u
i
)) + (f (v
i
) f (u
i
))g(u
i
)
n
X
i=1
C(L
i
`
i
) + (M
i
m
i
)C
= C(U
D
g L
D
g + U
D
f L
D
f).
Since u
i
and v
i
are arbitrary, it follows that
U
D
(fg) L
D
(fg) C(U
D
f L
D
f + U
D
g L
D
g).
Since
C
is fixed, and we can get
U
D
f L
D
f
and
U
D
g L
D
g
arbitrary small
(since
f
and
g
are integrable), we can get
U
D
(
fg
)
L
D
(
fg
) arbitrarily small. So
the result follows.
Theorem. Every continuous function
f
on a closed bounded interval [
a, b
] is
Riemann integrable.
Proof. wlog assume [a, b] = [0, 1].
Suppose the contrary. Let
f
be non-integrable. This means that there exists
some
ε
such that for every dissection
D
,
U
D
L
D
> ε
. In particular, for every
n, let D
n
be the dissection 0,
1
n
,
2
n
, ··· ,
n
n
.
Since
U
D
n
L
D
n
> ε
, there exists some interval
k
n
,
k+1
n
in which
sup f
inf f > ε
. Suppose the supremum and infimum are attained at
x
n
and
y
n
respectively. Then we have |x
n
y
n
| <
1
n
and f (x
n
) f (y
n
) > ε.
By Bolzano Weierstrass, (
x
n
) has a convergent subsequence, say (
x
n
i
). Say
x
n
i
x
. Since
|x
n
y
n
| <
1
n
0, we must have
y
n
i
x
. By continuity, we
must have
f
(
x
n
i
)
f
(
x
) and
f
(
y
n
i
)
f
(
x
), but
f
(
x
n
i
) and
f
(
y
n
i
) are always
apart by ε. Contradiction.
With this result, we know that a lot of things are integrable, e.g. e
x
2
.
To prove this, we secretly used the property of uniform continuity:
Definition (Uniform continuity*). Let
A R
and let
f
:
A R
. Then
f
is
uniformly continuous if
(ε)(δ > 0)(x)(y) |x y| < δ |f(x) f(y)| ε.
This is different from regular continuity. Regular continuity says that at any
point
x
, we can find a
δ
that works for this point. Uniform continuity says that
we can find a δ that works for any x.
It is easy to show that a uniformly continuous function is integrable, since by
uniformly continuity, as long as the mesh of a dissection is sufficiently small, the
difference between the upper sum and the lower sum can be arbitrarily small by
uniform continuity. Thus to prove the above theorem, we just have to show that
continuous functions on a closed bounded interval are uniformly continuous.
Theorem (non-examinable). Let
a < b
and let
f
: [
a, b
]
R
be continuous.
Then f is uniformly continuous.
Proof. Suppose that f is not uniformly continuous. Then
(ε)(δ > 0)(x)(y) |x y| < δ and |f(x) f(y)| ε.
Therefore, we can find sequences (x
n
), (y
n
) such that for every n, we have
|x
n
y
n
|
1
n
and |f (x
n
) f (y
n
)| ε.
Then by Bolzano-Weierstrass theorem, we can find a subsequence (
x
n
k
) converg-
ing to some
x
. Since
|x
n
k
y
n
k
|
1
n
k
,
y
n
k
x
as well. But
|f
(
x
n
k
)
f
(
y
n
k
)
| ε
for every
k
. So
f
(
x
n
k
) and
f
(
y
n
k
) cannot both converge to the same limit. So
f
is not continuous at x.
This proof is very similar to the proof that continuous functions are integrable.
In fact, the proof that continuous functions are integrable is just a fuse of this
proof and the (simple) proof that uniformly continuously functions are integrable.
Theorem. Let f : [a, b] R be monotone. Then f is Riemann integrable.
Note that monotone functions need not be “nice”. It can even have in-
finitely many discontinuities. For example, if
f
: [0
,
1]
R
maps
x
to the
1/(first non-zero digit in the binary expansion of x), with f(0) = 0.
Proof. let ε > 0. Let D be a dissection of mesh less than
ε
f(b)f (a)
. Then
U
D
f L
D
f =
n
X
i=1
(x
i
x
i1
)(f(x
i
) f (x
i1
))
ε
f(b) f(a)
n
X
i=1
(f(x
i
) f (x
i1
))
= ε.
Pictorially, we see that the difference between the upper and lower sums is
total the area of the red rectangles.
x
y
To calculate the total area, we can stack the red areas together to get something
of width
ε
f(b)f (a)
and height f (b) f(a). So the total area is just ε.
Lemma. Let
a < b
and let
f
be a bounded function from [
a, b
]
R
that is
continuous on (a, b). Then f is integrable.
An example where this would apply is
R
1
0
sin
1
x
. It gets nasty near
x
= 0, but
its “nastiness” is confined to
x
= 0 only. So as long as its nastiness is sufficiently
contained, it would still be integrable.
The idea of the proof is to integrate from a point
x
1
very near
a
up to a
point
x
n1
very close to
b
. Since
f
is bounded, the regions [
a, x
1
] and [
x
n1
, b
]
are small enough to not cause trouble.
Proof.
Let
ε >
0. Suppose that
|f
(
x
)
| C
for every
x
[
a, b
]. Let
x
0
=
a
and
pick
x
1
such that
x
1
x
0
<
ε
8C
. Also choose
z
between
x
1
and
b
such that
b z <
ε
8C
.
Then
f
is continuous [
x
1
, z
]. Therefore it is integrable on [
x
1
, z
]. So we can
find a dissection D
0
with points x
1
< x
2
< ··· < x
n1
= z such that
U
D
0
f L
D
0
f <
ε
2
.
Let D be the dissection a = x
0
< x
1
< ··· < x
n
= b. Then
U
D
f L
D
f <
ε
8C
· 2C +
ε
2
+
ε
8C
· 2C = ε.
So done by Riemann integrability criterion.
Example.
f(x) =
(
sin
1
x
x 6= 0
0 x = 0
defined on [1, 1] is integrable.
g(x) =
(
x x 1
x
2
+ 1 x > 1
defined on [0, 1] is integrable.
Corollary. Every piecewise continuous and bounded function on [
a, b
] is inte-
grable.
Proof.
Partition [
a, b
] into intervals
I
1
, ··· , I
k
, on each of which
f
is (bounded
and) continuous. Hence for every
I
j
with end points
x
j1
,
x
j
,
f
is integrable on
[
x
j1
, x
j
] (which may not equal
I
j
, e.g.
I
j
could be [
x
j1
, x
j
)). But then by the
additivity property of integration, we get that f is integrable on [a, b]
We defined Riemann integration in a very general way we allowed arbitrary
dissections, and took the extrema over all possible dissection. Is it possible to
just consider some particular nice dissections instead? Perhaps unsurprisingly,
yes! It’s just that we opt to define it the general way so that we can easily talk
about things like least common refinements.
Lemma. Let
f
: [
a, b
]
R
be Riemann integrable, and for each
n
, let
D
n
be
the dissection
a
=
x
0
< x
1
< ··· < x
n
=
b
, where
x
i
=
a
+
i(ba)
n
for each
i
.
Then
U
D
n
f
Z
b
a
f(x) dx
and
L
D
n
f
Z
b
a
f(x) dx.
Proof.
Let
ε >
0. We need to find an
N
. The only thing we know is that
f
is
Riemann integrable, so we use it:
Since
f
is integrable, there is a dissection
D
, say
u
0
< u
1
< ··· < u
m
, such
that
U
D
f
Z
b
a
f(x) dx <
ε
2
.
We also know that f is bounded. Let C be such that |f(x)| C.
For any n, let D
0
be the least common refinement of D
n
and D. Then
U
D
0
f U
D
f.
Also, the sums
U
D
n
f
and
U
D
0
f
are the same, except that at most
m
of the
subintervals [x
i1
, x
i
] are subdivided in D
0
.
For each interval that gets chopped up, the upper sum decreases by at most
ba
n
· 2C. Therefore
U
D
n
f U
D
0
f
b a
n
2C · m.
Pick n such that 2Cm(b a)/n <
ε
2
. Then
U
D
n
f U
D
f <
ε
2
.
So
U
D
n
f
Z
b
a
f(x) dx < ε.
This is true whenever
n >
4C(ba)m
ε
. Since we also have
U
D
n
f
R
b
a
f
(
x
) d
x
,
therefore
U
D
n
f
Z
b
a
f(x) dx.
The proof for lower sums is similar.
For convenience, we define the following:
Notation. If b > a, we define
Z
a
b
f(x) dx =
Z
b
a
f(x) dx.
We now prove that the fundamental theorem of calculus, which says that
integration is the reverse of differentiation.
Theorem (Fundamental theorem of calculus, part 1). Let
f
: [
a, b
]
R
be
continuous, and for x [a, b], define
F (x) =
Z
x
a
f(t) dt.
Then F is differentiable and F
0
(x) = f(x) for every x.
Proof.
F (x + h) F (x)
h
=
1
h
Z
x+h
x
f(t) dt
Let
ε >
0. Since
f
is continuous, at
x
, then there exists
δ
such that
|y x| < δ
implies |f (y) f (x)| < ε.
If |h| < δ, then
1
h
Z
x+h
x
f(t) dt f(x)
=
1
h
Z
x+h
x
(f(t) f(x)) dt
1
|h|
Z
x+h
x
|f(t) f(x)| dt
ε|h|
|h|
= ε.
Corollary. If f is continuously differentiable on [a, b], then
Z
b
a
f
0
(t) dt = f (b) f (a).
Proof. Let
g(x) =
Z
x
a
f
0
(t) dt.
Then
g
0
(x) = f
0
(x) =
d
dx
(f(x) f(a)).
Since
g
0
(
x
)
f
0
(
x
) = 0,
g
(
x
)
f
(
x
) must be a constant function by the mean
value theorem. We also know that
g(a) = 0 = f(a) f(a)
So we must have
g
(
x
) =
f
(
x
)
f
(
a
) for every
x
, and in particular, for
x
=
b
.
Theorem (Fundamental theorem of calculus, part 2). Let
f
: [
a, b
]
R
be a
differentiable function, and suppose that f
0
is integrable. Then
Z
b
a
f
0
(t) dt = f (b) f (a).
Note that this is a stronger result than the corollary above, since it does not
require that f
0
is continuous.
Proof.
Let
D
be a dissection
x
0
< x
1
< ··· < x
n
. We want to make use of this
dissection. So write
f(b) f(a) =
n
X
i=1
(f(x
i
) f (x
i1
)).
For each
i
, there exists
u
i
(
x
i1
, x
i
) such that
f
(
x
i
)
f
(
x
i1j
) = (
x
i
x
i1
)f
0
(u
i
) by the mean value theorem. So
f(b) f(a) =
n
X
i=1
(x
i
x
i1
)f
0
(u
i
).
We know that
f
0
(
u
i
) is somewhere between
sup
x[x
i
,x
i1
]
f
0
(
x
) and
inf
x[x
i
,x
i1
]
f
0
(
x
)
by definition. Therefore
L
D
f
0
f (b) f (a) U
D
f
0
.
Since
f
0
is integrable and
D
was arbitrary,
L
D
f
0
and
U
D
f
0
can both get arbitrarily
close to
R
b
a
f
0
(t) dt. So
f(b) f(a) =
Z
b
a
f
0
(t) dt.
Note that the condition that
f
0
is integrable is essential. It is possible to find
a differentiable function whose derivative is not integrable! You will be asked to
find it in the example sheet.
Using the fundamental theorem of calculus, we can easily prove integration
by parts:
Theorem (Integration by parts). Let
f, g
: [
a, b
]
R
be integrable such that
everything below exists. Then
Z
b
a
f(x)g
0
(x) dx = f (b)g(b) f (a)g(a)
Z
b
a
f
0
(x)g(x) dx.
Proof. By the fundamental theorem of calculus,
Z
b
a
(f(x)g
0
(x) + f
0
(x)g(x)) dx =
Z
b
a
(fg)
0
(x) dx = f (b)g(b) f (a)g(a).
The result follows after rearrangement.
Recall that when we first had Taylor’s theorem, we said it had the Lagrange
form of the remainder. There are many other forms of the remainder term. Here
we will look at the integral form:
Theorem (Taylor’s theorem with the integral form of the remainder). Let
f
be
n + 1 times differentiable on [a, b] with with f
(n+1)
continuous. Then
f(b) = f(a) + (b a)f
0
(a) +
(b a)
2
2!
f
(2)
(a) + ···
+
(b a)
n
n!
f
(n)
(a) +
Z
b
a
(b t)
n
n!
f
(n+1)
(t) dt.
Proof. Induction on n.
When n = 0, the theorem says
f(b) f(a) =
Z
b
a
f
0
(t) dt.
which is true by the fundamental theorem of calculus.
Now observe that
Z
b
a
(b t)
n
n!
f
(n+1)
(t) dt =
(b t)
n+1
(n + 1)!
f
(n+1)
(t)
b
a
+
Z
b
a
(b t)
n+1
(n + 1)!
f
(n+1)
(t) dt
=
(b a)
n+1
(n + 1)!
f
(n+1)
(a) +
Z
b
a
(b t)
n+1
(n + 1)!
f
(n+2)
(t) dt.
So the result follows by induction.
Note that the form of the integral remainder is rather weird and unexpected.
How could we have come up with it? We might start with the fundamental
theorem of algebra and integrate by parts. The first attempt would be to
integrate 1 to t and differentiate f
0
(t) to f
(2)
(t). So we have
f(b) = f(a) +
Z
b
a
f
0
(t) dt
= f(a) + [tf
0
(t)]
b
a
Z
b
a
tf
(2)
(t) dt
= f(a) + bf
0
(b) af
0
(a)
Z
b
a
tf
(2)
(t) dt
We want something in the form (
b a
)
f
0
(
a
), so we take that out and see what
we are left with.
= f(a) + (b a)f
0
(a) + b(f
0
(b) f
0
(a))
Z
b
a
tf
(2)
(t) dt
Then we note that f
0
(b) f
0
(a) =
R
b
a
f
(2)
(t) dt. So we have
= f(a) + (b a)f
0
(a) +
Z
b
a
(b t)f
(2)
(t) dt.
Then we can see that the right thing to integrate is (
b t
) and continue to obtain
the result.
Theorem (Integration by substitution). Let
f
: [
a, b
]
R
be continuous. Let
g
: [
u, v
]
R
be continuously differentiable, and suppose that
g
(
u
) =
a, g
(
v
) =
b
,
and f is defined everywhere on g([u, v]) (and still continuous). Then
Z
b
a
f(x) dx =
Z
v
u
f(g(t))g
0
(t) dt.
Proof.
By the fundamental theorem of calculus,
f
has an anti-derivative
F
defined on g([u, v]). Then
Z
v
u
f(g(t))g
0
(t) dt =
Z
v
u
F
0
(g(t))g
0
(t) dt
=
Z
v
u
(F g)
0
(t) dt
= F g(v) F g(u)
= F (b) F(a)
=
Z
b
a
f(x) dx.
We can think of “integration by parts” as what you get by integrating the
product rule, and “integration by substitution” as what you get by integrating
the chain rule.
7.2 Improper integrals
It is sometimes sensible to talk about integrals of unbounded functions or
integrating to infinity. But we have to be careful and write things down nicely.
Definition (Improper integral). Suppose that we have a function
f
: [
a, b
]
R
such that, for every
ε >
0,
f
is integrable on [
a
+
ε, b
] and
lim
ε0
R
b
a+ε
f
(
x
) d
x
exists. Then we define the improper integral
Z
b
a
f(x) dx to be lim
ε0
Z
b
a+ε
f(x) dx.
even if the Riemann integral does not exist.
We can do similarly for [a, b ε], or integral to infinity:
Z
a
f(x) dx = lim
b→∞
Z
b
a
f(x) dx.
when it exists.
Example.
Z
1
ε
x
1/2
dx =
h
2x
1/2
i
1
ε
= 2 2ε
1/2
2.
So
Z
1
0
x
1/2
dx = 2,
even though x
1/2
is unbounded on [0, 1].
Note that officially we are required to make
f
(
x
) =
x
1/2
a function with
domain [0
,
1]. So we can assign
f
(0) =
π
, or any number, since it doesn’t matter.
Example.
Z
x
1
1
t
2
dt =
1
t
x
1
= 1
1
x
1 as x
by the fundamental theorem of calculus. So
Z
1
1
x
2
dx = 1.
Finally, we can prove the integral test, whose proof we omitted when we first
began.
Theorem (Integral test). Let
f
: [1
,
]
R
be a decreasing non-negative
function. Then
P
n=1
f(n) converges iff
R
1
f(x) dx < .
Proof. We have
Z
n+1
n
f(x) dx f(n)
Z
n
n1
f(x) dx,
since
f
is decreasing (the right hand inequality is valid only for
n
2). It follows
that
Z
N+1
1
f(x) dx
N
X
n=1
f(n)
Z
N
1
f(x) dx + f(1)
So if the integral exists, then
P
f
(
n
) is increasing and bounded above by
R
1
f(x) dx, so converges.
If the integral does not exist, then
R
N
1
f
(
x
) d
x
is unbounded. Then
P
N
n=1
f
(
n
)
is unbounded, hence does not converge.
Example. Since
R
x
1
1
t
2
dt < , it follows that
P
n=1
1
n
2
converges.