6 Differentiation from $\mathbb{R}^m$ to $\mathbb{R}^n$

IB Analysis II



6.4 Inverse function theorem
Now, we get to the inverse function theorem. This is one of the most important
theorems of the course. This has many interesting and important consequences,
but we will not have time to get to these.
Before we can state the inverse function theorem, we need a definition.
Definition ($C^1$ function). Let $U \subseteq \mathbb{R}^n$ be open. We say $f: U \to \mathbb{R}^m$ is $C^1$ on $U$ if $f$ is differentiable at each $x \in U$ and
\[
  Df: U \to L(\mathbb{R}^n, \mathbb{R}^m)
\]
is continuous.
We write $C^1(U)$ or $C^1(U; \mathbb{R}^m)$ for the set of all $C^1$ maps from $U$ to $\mathbb{R}^m$.
First we get a convenient alternative characterization of $C^1$.
Proposition. Let $U \subseteq \mathbb{R}^n$ be open. Then $f = (f_1, \cdots, f_m): U \to \mathbb{R}^m$ is $C^1$ on $U$ if and only if the partial derivatives $D_j f_i(x)$ exist for all $x \in U$, $1 \leq i \leq m$, $1 \leq j \leq n$, and $D_j f_i: U \to \mathbb{R}$ are continuous.
Proof. ($\Rightarrow$) Differentiability of $f$ at $x$ implies $D_j f_i(x)$ exists and is given by
\[
  D_j f_i(x) = \langle Df(x) e_j, b_i \rangle,
\]
where $\{e_1, \cdots, e_n\}$ and $\{b_1, \cdots, b_m\}$ are the standard bases for $\mathbb{R}^n$ and $\mathbb{R}^m$.
So we know
\[
  |D_j f_i(x) - D_j f_i(y)| = |\langle (Df(x) - Df(y)) e_j, b_i \rangle| \leq \|Df(x) - Df(y)\|
\]
since $e_j$ and $b_i$ are unit vectors. Hence if $Df$ is continuous, so is $D_j f_i$.
($\Leftarrow$) Since the partials exist and are continuous, by our previous theorem, we know that the derivative $Df$ exists. To show $Df: U \to L(\mathbb{R}^n; \mathbb{R}^m)$ is continuous, note the following general fact:
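For any linear map $A \in L(\mathbb{R}^n; \mathbb{R}^m)$ represented by $(a_{ij})$, so that $(Ah)_i = \sum_{j} a_{ij} h_j$, and for $x = (x_1, \cdots, x_n)$, we have
\[
  \|Ax\|^2 = \sum_{i = 1}^m \left(\sum_{j = 1}^n a_{ij} x_j\right)^2.
\]
By Cauchy-Schwarz, we have
\[
  \|Ax\|^2 \leq \sum_{i = 1}^m \left(\sum_{j = 1}^n a_{ij}^2\right)\left(\sum_{j = 1}^n x_j^2\right) = \|x\|^2 \sum_{i = 1}^m \sum_{j = 1}^n a_{ij}^2.
\]
Dividing by $\|x\|^2$ and taking square roots, we know
\[
  \|A\| \leq \sqrt{\sum\sum a_{ij}^2}.
\]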
Applying this to $A = Df(x) - Df(y)$, we get
\[
  \|Df(x) - Df(y)\| \leq \sqrt{\sum\sum (D_j f_i(x) - D_j f_i(y))^2}.
\]
So if all $D_j f_i$ are continuous, then so is $Df$.
If we do not wish to go through all that algebra to show the inequality
\[
  \|A\| \leq \sqrt{\sum\sum a_{ij}^2},
\]
we can instead note that $\sqrt{\sum\sum a_{ij}^2}$ is a norm on $L(\mathbb{R}^n, \mathbb{R}^m)$, since it is just the Euclidean norm if we treat the matrix as a vector written in a funny way. So by the equivalence of norms on finite-dimensional vector spaces, there is some $C$ such that
\[
  \|A\| \leq C \sqrt{\sum\sum a_{ij}^2},
\]
and then the result follows.
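As a quick numerical sanity check of the inequality $\|A\| \leq \sqrt{\sum\sum a_{ij}^2}$, the following Python sketch samples random vectors $x$ and compares the largest observed ratio $\|Ax\|/\|x\|$ with the square root of the sum of squared entries. The matrix `A` and the helper names are hypothetical choices made only for illustration, and sampling only gives a lower estimate of the operator norm, so this illustrates the bound rather than proving it.

```python
import math
import random


def frobenius_norm(a):
    """Square root of the sum of squared entries of a matrix given as a list of rows."""
    return math.sqrt(sum(entry * entry for row in a for entry in row))


def apply(a, x):
    """Matrix-vector product A x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def euclidean_norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in v))


def max_stretch_ratio(a, trials=1000, seed=0):
    """Largest sampled value of ||A x|| / ||x||, a lower estimate of the operator norm."""
    rng = random.Random(seed)
    n = len(a[0])
    best = 0.0
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm_x = euclidean_norm(x)
        if norm_x == 0.0:
            continue  # skip the zero vector, where the ratio is undefined
        best = max(best, euclidean_norm(apply(a, x)) / norm_x)
    return best


# Hypothetical 2x3 example matrix (a linear map R^3 -> R^2).
A = [[1.0, -2.0, 0.5], [3.0, 0.0, -1.0]]
sampled = max_stretch_ratio(A)
bound = frobenius_norm(A)
print(sampled, bound)
```

Every sampled ratio should stay below `bound`, in line with the Cauchy-Schwarz argument.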
Finally, we can get to the inverse function theorem.
Theorem (Inverse function theorem). Let $U \subseteq \mathbb{R}^n$ be open, and $f: U \to \mathbb{R}^n$ be a $C^1$ map. Let $a \in U$, and suppose that $Df(a)$ is invertible as a linear map $\mathbb{R}^n \to \mathbb{R}^n$. Then there exist open sets $V, W \subseteq \mathbb{R}^n$ with $a \in V$, $f(a) \in W$, $V \subseteq U$ such that
\[
  f|_V: V \to W
\]
is a bijection. Moreover, the inverse map $f|_V^{-1}: W \to V$ is also $C^1$.
We have a fancy name for these functions.
Definition (Diffeomorphism). Let $U, U' \subseteq \mathbb{R}^n$ be open. A map $g: U \to U'$ is a diffeomorphism if it is $C^1$ with a $C^1$ inverse.
Note that different people have different definitions for the word “diffeomorphism”. Some require it to be merely differentiable, while others require it to be infinitely differentiable. We will stick with this definition.
Then the inverse function theorem says: if $f$ is $C^1$ and $Df(a)$ is invertible, then $f$ is a local diffeomorphism at $a$.
Before we prove this, we look at the simple case where $n = 1$. Suppose $f'(a) \neq 0$. Then there exists a $\delta$ such that $f'(t) > 0$ or $f'(t) < 0$ in $t \in (a - \delta, a + \delta)$. So $f|_{(a - \delta, a + \delta)}$ is monotone and hence is invertible. This is a triviality. However, this is not a triviality even for $n = 2$.
Proof. By replacing $f$ with $(Df(a))^{-1} f$ (or by rotating our heads and stretching it a bit), we can assume $Df(a) = I$, the identity map. By continuity of $Df$, there exists some $r > 0$ such that
\[
  \|Df(x) - I\| < \frac{1}{2}
\]
for all $x \in \overline{B_r(a)}$. By shrinking $r$ sufficiently, we can assume $\overline{B_r(a)} \subseteq U$. Let $W = B_{r/2}(f(a))$, and let $V = f^{-1}(W) \cap B_r(a)$.
That was just our setup. There are three steps to actually proving the
theorem.
Claim. $V$ is open, and $f|_V: V \to W$ is a bijection.
Since $f$ is continuous, $f^{-1}(W)$ is open. So $V$ is open, being the intersection of two open sets. To show $f|_V: V \to W$ is a bijection, we have to show that for each $y \in W$, there is a unique $x \in V$ such that $f(x) = y$. We are going to use the contraction mapping theorem to prove this. This statement is equivalent to proving that for each $y \in W$, the map $T(x) = x - f(x) + y$ has a unique fixed point $x \in V$.
Let $h(x) = x - f(x)$. Then note that
\[
  Dh(x) = I - Df(x).
\]
So by our choice of $r$, for every $x \in \overline{B_r(a)}$, we must have
\[
  \|Dh(x)\| \leq \frac{1}{2}.
\]
Then for any $x_1, x_2 \in \overline{B_r(a)}$, we can use the mean value inequality to estimate
\[
  \|h(x_1) - h(x_2)\| \leq \frac{1}{2} \|x_1 - x_2\|.
\]
Hence we know
\[
  \|T(x_1) - T(x_2)\| = \|h(x_1) - h(x_2)\| \leq \frac{1}{2} \|x_1 - x_2\|.
\]
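Finally, to apply the contraction mapping theorem, we need to pick the right domain for $T$, namely $\overline{B_r(a)}$.
For any $x \in \overline{B_r(a)}$, we have
\begin{align*}
  \|T(x) - a\| &= \|x - f(x) + y - a\|\\
  &= \|x - f(x) - (a - f(a)) + y - f(a)\|\\
  &\leq \|h(x) - h(a)\| + \|y - f(a)\|\\
  &\leq \frac{1}{2} \|x - a\| + \|y - f(a)\|\\
  &< \frac{r}{2} + \frac{r}{2} = r.
\end{align*}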
So $T: \overline{B_r(a)} \to B_r(a) \subseteq \overline{B_r(a)}$. Since $\overline{B_r(a)}$ is complete, $T$ has a unique fixed point $x \in \overline{B_r(a)}$, i.e. $T(x) = x$. Finally, we need to show $x \in B_r(a)$, since this is where we want to find our fixed point. But this is true, since $T(x) \in B_r(a)$ by above. So we must have $x \in B_r(a)$. Also, since $f(x) = y$, we know $x \in f^{-1}(W)$. So $x \in V$.
So we have shown that for each $y \in W$, there is a unique $x \in V$ such that $f(x) = y$. So $f|_V: V \to W$ is a bijection.
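The fixed-point argument is constructive: iterating $T(x) = x - f(x) + y$ from $x = a$ converges to the solution of $f(x) = y$. Here is a minimal Python sketch of that iteration. The map `f`, the radius `r = 0.5` and the target `y` are hypothetical choices, not taken from the course: `f` has $a = 0$, $f(a) = 0$, $Df(0) = I$, and $\|Df(x) - I\| \leq \frac{1}{4}$ on the closed ball of radius $0.5$, so the hypotheses of the claim hold.

```python
import math


def f(x):
    """Hypothetical C^1 map on R^2 with f(0) = 0 and Df(0) = I."""
    x1, x2 = x
    return (x1 + 0.25 * math.sin(x2), x2 + 0.25 * x1 * x1)


def solve_by_contraction(y, a=(0.0, 0.0), steps=60):
    """Iterate T(x) = x - f(x) + y starting from a and return the last iterate."""
    x = a
    for _ in range(steps):
        fx = f(x)
        x = (x[0] - fx[0] + y[0], x[1] - fx[1] + y[1])
    return x


r = 0.5
y = (0.1, -0.05)  # lies in W = B_{r/2}(f(a)), since its norm is below 0.25
x = solve_by_contraction(y)
fx = f(x)
residual = math.hypot(fx[0] - y[0], fx[1] - y[1])
print(x, residual)
```

The residual $\|f(x) - y\|$ should be at the level of floating-point rounding, and the computed $x$ should lie inside $B_r(a)$, as the proof predicts.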
We have done the hard work now. It remains to show that $f|_V$ is invertible with $C^1$ inverse.
Claim. The inverse map $g = f|_V^{-1}: W \to V$ is Lipschitz (and hence continuous). In fact, we have
\[
  \|g(y_1) - g(y_2)\| \leq 2 \|y_1 - y_2\|.
\]
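For any $x_1, x_2 \in V$, by the triangle inequality, we know
\begin{align*}
  \|x_1 - x_2\| - \|f(x_1) - f(x_2)\| &\leq \|(x_1 - f(x_1)) - (x_2 - f(x_2))\|\\
  &= \|h(x_1) - h(x_2)\|\\
  &\leq \frac{1}{2} \|x_1 - x_2\|.
\end{align*}
Hence, we get
\[
  \|x_1 - x_2\| \leq 2 \|f(x_1) - f(x_2)\|.
\]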
Apply this to $x_1 = g(y_1)$ and $x_2 = g(y_2)$, and note that $f(g(y_j)) = y_j$ to get the desired result.
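The Lipschitz bound can be observed numerically. The Python sketch below computes the local inverse $g$ by the fixed-point iteration $T(x) = x - f(x) + y$ and compares $\|g(y_1) - g(y_2)\|$ with $\|y_1 - y_2\|$. The map `f` and the two sample points are hypothetical choices for illustration, with $a = 0$, $r = 0.5$ and both points inside $B_{r/2}(0)$.

```python
import math


def f(x):
    """Hypothetical C^1 map on R^2 with f(0) = 0 and Df(0) = I."""
    x1, x2 = x
    return (x1 + 0.25 * math.sin(x2), x2 + 0.25 * x1 * x1)


def g(y, steps=60):
    """Local inverse of f near 0, computed by iterating T(x) = x - f(x) + y."""
    x = (0.0, 0.0)
    for _ in range(steps):
        fx = f(x)
        x = (x[0] - fx[0] + y[0], x[1] - fx[1] + y[1])
    return x


def dist(p, q):
    """Euclidean distance between two points of R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Two sample points in W = B_{0.25}(0).
y1, y2 = (0.1, -0.05), (-0.08, 0.12)
ratio = dist(g(y1), g(y2)) / dist(y1, y2)
print(ratio)
```

The printed ratio should not exceed 2, the Lipschitz constant from the claim.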
Claim. $g$ is in fact $C^1$, and moreover, for all $y \in W$,
\[
  Dg(y) = Df(g(y))^{-1}. \tag{$*$}
\]
Note that if $g$ were differentiable, then its derivative must be given by $(*)$, since by definition, we know
\[
  f(g(y)) = y,
\]
and hence the chain rule gives
\[
  Df(g(y)) \cdot Dg(y) = I.
\]
Also, we immediately know $Dg$ is continuous, since it is the composition of continuous functions (the inverse of a matrix is given by polynomial expressions of the components, divided by the determinant). So we only need to check that $Df(g(y))^{-1}$ satisfies the definition of the derivative.
First we check that $Df(x)$ is indeed invertible for every $x \in \overline{B_r(a)}$. We use the fact that
\[
  \|Df(x) - I\| \leq \frac{1}{2}.
\]
If $Df(x) v = 0$, then we have
\[
  \|v\| = \|Df(x) v - v\| \leq \|Df(x) - I\| \|v\| \leq \frac{1}{2} \|v\|.
\]
So we must have $\|v\| = 0$, i.e. $v = 0$. So $\ker Df(x) = \{0\}$. So $Df(g(y))^{-1}$ exists.
Let $x \in V$ be fixed, and $y = f(x)$. Let $k$ be small and
\[
  h = g(y + k) - g(y).
\]
In other words,
\[
  f(x + h) - f(x) = k.
\]
Since $g$ is invertible, whenever $k \neq 0$, $h \neq 0$. Since $g$ is continuous, as $k \to 0$, $h \to 0$ as well.
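We have
\begin{align*}
  \frac{g(y + k) - g(y) - Df(g(y))^{-1} k}{\|k\|} &= \frac{h - Df(x)^{-1} k}{\|k\|}\\
  &= \frac{Df(x)^{-1} (Df(x) h - k)}{\|k\|}\\
  &= \frac{-Df(x)^{-1} (f(x + h) - f(x) - Df(x) h)}{\|k\|}\\
  &= -Df(x)^{-1} \left(\frac{f(x + h) - f(x) - Df(x) h}{\|h\|}\right) \cdot \frac{\|h\|}{\|k\|}\\
  &= -Df(x)^{-1} \left(\frac{f(x + h) - f(x) - Df(x) h}{\|h\|}\right) \cdot \frac{\|g(y + k) - g(y)\|}{\|(y + k) - y\|}.
\end{align*}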
As $k \to 0$, $h \to 0$. The first factor $-Df(x)^{-1}$ is fixed; the second factor tends to $0$ as $h \to 0$ by differentiability of $f$ at $x$; the third factor is bounded by $2$ by the Lipschitz estimate for $g$. So the whole thing tends to $0$. So done.
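The formula $Dg(y) = Df(g(y))^{-1}$ can be checked numerically on a small example. The Python sketch below computes $g$ by the fixed-point iteration from the proof, approximates $Dg(y)$ by central differences, and compares the result with the inverse of a hand-computed Jacobian. The map `f`, its Jacobian `Df`, the sample point `y` and the step `eps` are hypothetical choices for illustration only.

```python
import math


def f(x):
    """Hypothetical C^1 map on R^2 with f(0) = 0 and Df(0) = I."""
    x1, x2 = x
    return (x1 + 0.25 * math.sin(x2), x2 + 0.25 * x1 * x1)


def Df(x):
    """Jacobian of f as a tuple of rows, computed by hand for this example."""
    x1, x2 = x
    return ((1.0, 0.25 * math.cos(x2)), (0.5 * x1, 1.0))


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as a tuple of rows."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return ((s / det, -q / det), (-r / det, p / det))


def g(y, steps=80):
    """Local inverse of f near 0, via the iteration T(x) = x - f(x) + y."""
    x = (0.0, 0.0)
    for _ in range(steps):
        fx = f(x)
        x = (x[0] - fx[0] + y[0], x[1] - fx[1] + y[1])
    return x


def numerical_Dg(y, eps=1e-6):
    """Central-difference approximation to the Jacobian of g at y."""
    cols = []
    for j in range(2):
        y_plus, y_minus = list(y), list(y)
        y_plus[j] += eps
        y_minus[j] -= eps
        g_plus, g_minus = g(tuple(y_plus)), g(tuple(y_minus))
        cols.append(tuple((g_plus[i] - g_minus[i]) / (2 * eps) for i in range(2)))
    # cols[j] is the j-th column of the Jacobian; transpose into rows.
    return ((cols[0][0], cols[1][0]), (cols[0][1], cols[1][1]))


y = (0.1, -0.05)
predicted = inverse_2x2(Df(g(y)))
measured = numerical_Dg(y)
max_error = max(abs(predicted[i][j] - measured[i][j])
                for i in range(2) for j in range(2))
print(max_error)
```

The two matrices should agree up to the finite-difference error, which is tiny for this smooth example.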
Note that in the case where $n = 1$, if $f: (a, b) \to \mathbb{R}$ is $C^1$ with $f'(x) \neq 0$ for every $x$, then $f$ is monotone on the whole domain $(a, b)$, and hence $f: (a, b) \to f((a, b))$ is a bijection. In higher dimensions, this is not true. Even if we know that $Df(x)$ is invertible for all $x \in U$, we cannot say $f|_U$ is a bijection. We still only know there is a local inverse.
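Example. Let $U = \mathbb{R}^2$, and $f: \mathbb{R}^2 \to \mathbb{R}^2$ be given by
\[
  f(x, y) =
  \begin{pmatrix}
    e^x \cos y\\
    e^x \sin y
  \end{pmatrix}.
\]
Then we can directly compute
\[
  Df(x, y) =
  \begin{pmatrix}
    e^x \cos y & -e^x \sin y\\
    e^x \sin y & e^x \cos y
  \end{pmatrix}.
\]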
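Then we have
\[
  \det(Df(x, y)) = e^{2x} \cos^2 y + e^{2x} \sin^2 y = e^{2x} \neq 0
\]
for all $(x, y) \in \mathbb{R}^2$. However, by periodicity, we have
\[
  f(x, y + 2n\pi) = f(x, y)
\]
for all $n \in \mathbb{Z}$. So $f$ is not injective on $\mathbb{R}^2$.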
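Both properties of this example can be confirmed with a few lines of Python. The sketch evaluates the Jacobian determinant from the partial derivatives of $f$ and compares it with $e^{2x}$, then evaluates $f$ at two points whose second coordinates differ by $2\pi$. The sample point `(x0, y0)` is an arbitrary choice made for illustration.

```python
import math


def f(x, y):
    """The map f(x, y) = (e^x cos y, e^x sin y)."""
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))


def det_Df(x, y):
    """Determinant of the Jacobian, built from the partial derivatives of f."""
    a, b = math.exp(x) * math.cos(y), -math.exp(x) * math.sin(y)
    c, d = math.exp(x) * math.sin(y), math.exp(x) * math.cos(y)
    return a * d - b * c


x0, y0 = 0.3, 1.1  # arbitrary sample point
det_value = det_Df(x0, y0)
p = f(x0, y0)
q = f(x0, y0 + 2 * math.pi)
print(det_value, math.exp(2 * x0))
print(p, q)
```

The determinant is nonzero, yet `p` and `q` agree up to rounding, so two distinct points have the same image.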
One major application of the inverse function theorem is to prove the implicit
function theorem. We will not go into details here, but an example of the theorem
can be found on example sheet 4.