6 Differentiation from $\mathbb{R}^m$ to $\mathbb{R}^n$

IB Analysis II



6.1 Differentiation from $\mathbb{R}^m$ to $\mathbb{R}^n$
We are now going to investigate differentiation of functions $f: \mathbb{R}^n \to \mathbb{R}^m$. The hard part is to first come up with a sensible definition of what this means. There is no obvious way to generalize what we had for real functions. After defining it, we will need to do some hard work to come up with easy ways to check if functions are differentiable. Then we can use it to prove some useful results like the mean value inequality. We will always use the usual Euclidean norm.
To define differentiation in $\mathbb{R}^n$, we first need a definition of the limit.
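Definition (Limit of function). Let $E \subseteq \mathbb{R}^n$ and $f: E \to \mathbb{R}^m$. Let $\mathbf{a} \in \mathbb{R}^n$ be a limit point of $E$, and let $\mathbf{b} \in \mathbb{R}^m$. We say
\[ \lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x}) = \mathbf{b} \]
if for every $\varepsilon > 0$, there is some $\delta > 0$ such that
\[ (\forall \mathbf{x} \in E)\quad 0 < \|\mathbf{x} - \mathbf{a}\| < \delta \;\Rightarrow\; \|f(\mathbf{x}) - \mathbf{b}\| < \varepsilon. \]
As in the case of $\mathbb{R}$ in IA Analysis I, we do not impose any requirements on $f$ when $\mathbf{x} = \mathbf{a}$. In particular, we don’t assume that $\mathbf{a}$ is in the domain $E$.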
We would like a definition of differentiation for functions $f: \mathbb{R}^n \to \mathbb{R}$ (or more generally $f: \mathbb{R}^n \to \mathbb{R}^m$) that directly extends the familiar definition on the real line. Recall that if $f: (b, c) \to \mathbb{R}$ and $a \in (b, c)$, we say $f$ is differentiable at $a$ if the limit
\[ Df(a) = f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} \tag{$*$} \]
exists (as a real number). This cannot be extended to higher dimensions directly, since $h$ would become a vector in $\mathbb{R}^n$, and it is not clear what we mean by dividing by a vector. We might try dividing by $\|\mathbf{h}\|$ instead, i.e. require that
\[ \lim_{\mathbf{h} \to \mathbf{0}} \frac{f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a})}{\|\mathbf{h}\|} \]
exists. However, this is clearly wrong, since in the case of $n = 1$, this reduces to the existence of the limit
\[ \lim_{h \to 0} \frac{f(a + h) - f(a)}{|h|}, \]
which almost never exists, e.g. when $f(x) = x$ (the quotient is $+1$ for $h > 0$ and $-1$ for $h < 0$). It is also possible that this exists while the genuine derivative does not, e.g. when $f(x) = |x|$, at $x = 0$. So this is not the definition we want.
Now we are a bit stuck. We need to divide by something, and that thing had better be a scalar. $\|\mathbf{h}\|$ is not exactly what we want. What should we do? The idea is to move $f'(a)$ to the other side of the equation, and $(*)$ becomes
\[ \lim_{h \to 0} \frac{f(a + h) - f(a) - f'(a)h}{h} = 0. \]
Now if we replace the denominator $h$ by $|h|$, nothing changes, since a quantity tends to $0$ if and only if its absolute value does. So this is equivalent to
\[ \lim_{h \to 0} \frac{f(a + h) - f(a) - f'(a)h}{|h|} = 0. \]
In other words, the function $f$ is differentiable at $a$ if there is some $A$ such that
\[ \lim_{h \to 0} \frac{f(a + h) - f(a) - Ah}{|h|} = 0, \]
and we call $A$ the derivative.
We are now in good shape to generalize. Note that if $f: \mathbb{R}^n \to \mathbb{R}$ is a real-valued function, then $f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a})$ is a scalar, but $\mathbf{h}$ is a vector. So $A$ is not just a number, but a (row) vector. In general, if our function $f: \mathbb{R}^n \to \mathbb{R}^m$ is vector-valued, then our $A$ should be an $m \times n$ matrix. Alternatively, $A$ is a linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$.
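Definition (Differentiation in $\mathbb{R}^n$). Let $U \subseteq \mathbb{R}^n$ be open, $f: U \to \mathbb{R}^m$. We say $f$ is differentiable at a point $\mathbf{a} \in U$ if there exists a linear map $A: \mathbb{R}^n \to \mathbb{R}^m$ such that
\[ \lim_{\mathbf{h} \to \mathbf{0}} \frac{\|f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - A\mathbf{h}\|}{\|\mathbf{h}\|} = 0. \]
We call $A$ the derivative of $f$ at $\mathbf{a}$. We write the derivative as $Df(\mathbf{a})$.
This is equivalent to saying
\[ \lim_{\mathbf{x} \to \mathbf{a}} \frac{\|f(\mathbf{x}) - f(\mathbf{a}) - A(\mathbf{x} - \mathbf{a})\|}{\|\mathbf{x} - \mathbf{a}\|} = 0. \]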
Note that this is completely consistent with our usual definition in the case where $n = m = 1$, as we have discussed above, since a linear transformation $\alpha: \mathbb{R} \to \mathbb{R}$ is just given by $\alpha(h) = Ah$ for some real $A \in \mathbb{R}$.
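As a rough numerical illustration of the definition, the sketch below evaluates the ratio $\|f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - A\mathbf{h}\| / \|\mathbf{h}\|$ for shrinking $\mathbf{h}$. The map `f`, the point `a` and the direction of `h` are hypothetical choices made for illustration and do not come from the notes; the candidate $A$ is the matrix of partial derivatives computed by hand.

```python
import math

def f(x, y):
    """Hypothetical example map R^2 -> R^2 (an assumption, not from the notes)."""
    return (x * y, x + y * y)

def candidate_derivative(x, y):
    """Matrix of partial derivatives of f at (x, y), our candidate for Df."""
    return ((y, x), (1.0, 2.0 * y))

def remainder_ratio(a, h):
    """Return ||f(a + h) - f(a) - A h|| / ||h|| for the candidate A at a."""
    A = candidate_derivative(*a)
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    Ah = (A[0][0] * h[0] + A[0][1] * h[1],
          A[1][0] * h[0] + A[1][1] * h[1])
    r = (fah[0] - fa[0] - Ah[0], fah[1] - fa[1] - Ah[1])
    return math.hypot(r[0], r[1]) / math.hypot(h[0], h[1])

a = (1.0, 2.0)
# Shrink h along one fixed direction and watch the ratio shrink with it.
ratios = [remainder_ratio(a, (s, -2.0 * s)) for s in (1e-1, 1e-2, 1e-3, 1e-4)]
for ratio in ratios:
    print(ratio)
```

For this particular `f` the remainder is exactly $(h_1 h_2, h_2^2)$, so the ratio equals $|h_2|$ and shrinks in proportion to $\|\mathbf{h}\|$. Sampling a single direction is only a sanity check, of course: the definition requires the limit over all $\mathbf{h} \to \mathbf{0}$.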
One might instead attempt to define differentiability as follows: for any $f: \mathbb{R}^m \to \mathbb{R}$, we say $f$ is differentiable at $\mathbf{x}$ if $f$ is differentiable when restricted to any line passing through $\mathbf{x}$. However, this is a weaker notion, and we will later see that if we define differentiability this way, then differentiability will no longer imply continuity, which is bad.
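Having defined differentiation, we want to show that the derivative is unique.
Proposition (Uniqueness of derivative). Derivatives are unique.
Proof. Suppose $A, B: \mathbb{R}^n \to \mathbb{R}^m$ both satisfy the condition
\[ \lim_{\mathbf{h} \to \mathbf{0}} \frac{\|f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - A\mathbf{h}\|}{\|\mathbf{h}\|} = 0, \qquad \lim_{\mathbf{h} \to \mathbf{0}} \frac{\|f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - B\mathbf{h}\|}{\|\mathbf{h}\|} = 0. \]
By the triangle inequality, we get
\[ \|(B - A)\mathbf{h}\| \leq \|f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - A\mathbf{h}\| + \|f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - B\mathbf{h}\|. \]
So
\[ \frac{\|(B - A)\mathbf{h}\|}{\|\mathbf{h}\|} \to 0 \]
as $\mathbf{h} \to \mathbf{0}$. Fix any $\mathbf{u} \neq \mathbf{0}$ and set $\mathbf{h} = t\mathbf{u}$ to get
\[ \frac{\|(B - A)t\mathbf{u}\|}{\|t\mathbf{u}\|} \to 0 \]
as $t \to 0$. Since $(B - A)$ is linear, we know
\[ \frac{\|(B - A)t\mathbf{u}\|}{\|t\mathbf{u}\|} = \frac{\|(B - A)\mathbf{u}\|}{\|\mathbf{u}\|}. \]
The right-hand side does not depend on $t$, so it must be $0$. So $(B - A)\mathbf{u} = \mathbf{0}$ for all $\mathbf{u} \in \mathbb{R}^n$. So $B = A$.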
Notation. We write $L(\mathbb{R}^n; \mathbb{R}^m)$ for the space of linear maps $A: \mathbb{R}^n \to \mathbb{R}^m$.
So $Df(\mathbf{a}) \in L(\mathbb{R}^n; \mathbb{R}^m)$.
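To avoid having to write limits and divisions all over the place, we have the following convenient notation:
Notation (Little $o$ notation). For any function $\alpha: B_r(\mathbf{0}) \subseteq \mathbb{R}^n \to \mathbb{R}^m$, write
\[ \alpha(\mathbf{h}) = o(\mathbf{h}) \]
if
\[ \frac{\|\alpha(\mathbf{h})\|}{\|\mathbf{h}\|} \to 0 \text{ as } \mathbf{h} \to \mathbf{0}. \]
In other words, $\alpha \to \mathbf{0}$ faster than $\|\mathbf{h}\|$ as $\mathbf{h} \to \mathbf{0}$.
Note that officially, $\alpha(\mathbf{h}) = o(\mathbf{h})$ as a whole is a piece of notation, and does not represent equality.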
Then the condition for differentiability can be written as: $f: U \to \mathbb{R}^m$ is differentiable at $\mathbf{a} \in U$ if there is some linear map $A$ with
\[ f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - A\mathbf{h} = o(\mathbf{h}). \]
Alternatively,
\[ f(\mathbf{a} + \mathbf{h}) = f(\mathbf{a}) + A\mathbf{h} + o(\mathbf{h}). \]
Note that we require the domain $U$ of $f$ to be open, so that for each $\mathbf{a} \in U$, there is a small ball around $\mathbf{a}$ on which $f$ is defined, so $f(\mathbf{a} + \mathbf{h})$ is defined for sufficiently small $\mathbf{h}$. We could relax this condition and consider “one-sided” derivatives instead, but we will not look into these in this course.
We can interpret the definition of differentiability as saying we can find a “good” linear approximation (technically, it is affine, not linear) to the function $f$ near $\mathbf{a}$.
While the definition of the derivative is good, it is purely existential. This is unlike the definition of differentiability of real functions, where we are asked to compute an explicit limit: if the limit exists, that’s the derivative. If not, the function is not differentiable. In the higher-dimensional world, this is not the case. We have no idea where to find the derivative, even if we know it exists. So we would like an explicit formula for it.
The idea is to look at specific “directions” instead of finding the general derivative. As always, let $f: U \to \mathbb{R}^m$ be differentiable at $\mathbf{a} \in U$. Fix some $\mathbf{u} \in \mathbb{R}^n$, take $\mathbf{h} = t\mathbf{u}$ (with $t \in \mathbb{R}$). Assuming $\mathbf{u} \neq \mathbf{0}$, differentiability tells us
\[ \lim_{t \to 0} \frac{\|f(\mathbf{a} + t\mathbf{u}) - f(\mathbf{a}) - Df(\mathbf{a})(t\mathbf{u})\|}{\|t\mathbf{u}\|} = 0. \]
This is equivalent to saying
\[ \lim_{t \to 0} \frac{\|f(\mathbf{a} + t\mathbf{u}) - f(\mathbf{a}) - tDf(\mathbf{a})\mathbf{u}\|}{|t|\,\|\mathbf{u}\|} = 0. \]
Since $\mathbf{u}$ is fixed, this in turn is equivalent to
\[ \lim_{t \to 0} \frac{f(\mathbf{a} + t\mathbf{u}) - f(\mathbf{a}) - tDf(\mathbf{a})\mathbf{u}}{t} = \mathbf{0}. \]
This, finally, is equivalent to
\[ Df(\mathbf{a})\mathbf{u} = \lim_{t \to 0} \frac{f(\mathbf{a} + t\mathbf{u}) - f(\mathbf{a})}{t}. \]
We derived this assuming $\mathbf{u} \neq \mathbf{0}$, but this is trivially true for $\mathbf{u} = \mathbf{0}$. So this is valid for all $\mathbf{u}$.
This is of the same form as the usual derivative, and it is usually not too difficult to compute this limit. Note, however, that this only says that if the derivative exists, then it is given by the limit above. Even if the limit exists for all $\mathbf{u}$, we still cannot conclude that the derivative exists.
Regardless, even if the derivative does not exist, this limit is still often a useful notion.
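Definition (Directional derivative). We write
\[ D_{\mathbf{u}} f(\mathbf{a}) = \lim_{t \to 0} \frac{f(\mathbf{a} + t\mathbf{u}) - f(\mathbf{a})}{t} \]
whenever this limit exists. We call $D_{\mathbf{u}} f(\mathbf{a})$ the directional derivative of $f$ at $\mathbf{a} \in U$ in the direction of $\mathbf{u} \in \mathbb{R}^n$.
By definition, we have
\[ D_{\mathbf{u}} f(\mathbf{a}) = \left.\frac{\mathrm{d}}{\mathrm{d}t}\right|_{t = 0} f(\mathbf{a} + t\mathbf{u}). \]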
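The relation $Df(\mathbf{a})\mathbf{u} = \lim_{t \to 0} (f(\mathbf{a} + t\mathbf{u}) - f(\mathbf{a}))/t$ can be checked numerically on a simple case. The following is a minimal sketch, assuming a hypothetical polynomial `f` on $\mathbb{R}^2$ (not taken from the notes) whose gradient is computed by hand.

```python
def f(x, y):
    """Hypothetical real-valued example on R^2 (an assumption, not from the notes)."""
    return x * x + 3.0 * x * y

def difference_quotient(a, u, t):
    """(f(a + t u) - f(a)) / t, the quotient whose limit defines D_u f(a)."""
    return (f(a[0] + t * u[0], a[1] + t * u[1]) - f(a[0], a[1])) / t

def derivative_applied(a, u):
    """Df(a)u, using the hand-computed gradient (2x + 3y, 3x)."""
    return (2.0 * a[0] + 3.0 * a[1]) * u[0] + 3.0 * a[0] * u[1]

a, u = (1.0, 2.0), (1.0, 1.0)
quotients = [difference_quotient(a, u, t) for t in (1e-2, 1e-4, 1e-6)]
print(quotients)
print(derivative_applied(a, u))
```

Here the quotient differs from $Df(\mathbf{a})\mathbf{u}$ by a term proportional to $t$, so the printed quotients approach the printed value of `derivative_applied` as $t$ shrinks.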
Often, it is convenient to focus on the special cases where $\mathbf{u} = \mathbf{e}_j$, a member of the standard basis for $\mathbb{R}^n$. This is known as the partial derivative. By convention, this is defined for real-valued functions only, but the same definition works for any $\mathbb{R}^m$-valued function.
Definition (Partial derivative). The $j$th partial derivative of $f: U \to \mathbb{R}$ at $\mathbf{a} \in U$ is
\[ D_{\mathbf{e}_j} f(\mathbf{a}) = \lim_{t \to 0} \frac{f(\mathbf{a} + t\mathbf{e}_j) - f(\mathbf{a})}{t}, \]
when the limit exists. We often write this as
\[ D_{\mathbf{e}_j} f(\mathbf{a}) = D_j f(\mathbf{a}) = \frac{\partial f}{\partial x_j}. \]
Note that these definitions do not require differentiability of $f$ at $\mathbf{a}$. We will see some examples shortly. Before that, we first establish some elementary properties of differentiable functions.
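Proposition. Let $U \subseteq \mathbb{R}^n$ be open, $\mathbf{a} \in U$.
(i) If $f: U \to \mathbb{R}^m$ is differentiable at $\mathbf{a}$, then $f$ is continuous at $\mathbf{a}$.
(ii) If we write $f = (f_1, f_2, \cdots, f_m): U \to \mathbb{R}^m$, where each $f_i: U \to \mathbb{R}$, then $f$ is differentiable at $\mathbf{a}$ if and only if each $f_i$ is differentiable at $\mathbf{a}$.
(iii) If $f, g: U \to \mathbb{R}^m$ are both differentiable at $\mathbf{a}$, then $\lambda f + \mu g$ is differentiable at $\mathbf{a}$ for any $\lambda, \mu \in \mathbb{R}$, with
\[ D(\lambda f + \mu g)(\mathbf{a}) = \lambda Df(\mathbf{a}) + \mu Dg(\mathbf{a}). \]
(iv) If $A: \mathbb{R}^n \to \mathbb{R}^m$ is a linear map, then $A$ is differentiable at any $\mathbf{a} \in \mathbb{R}^n$ with
\[ DA(\mathbf{a}) = A. \]
(v) If $f$ is differentiable at $\mathbf{a}$, then the directional derivative $D_{\mathbf{u}} f(\mathbf{a})$ exists for all $\mathbf{u} \in \mathbb{R}^n$, and in fact
\[ D_{\mathbf{u}} f(\mathbf{a}) = Df(\mathbf{a})\mathbf{u}. \]
(vi) If $f$ is differentiable at $\mathbf{a}$, then all partial derivatives $D_j f_i(\mathbf{a})$ exist for $j = 1, \cdots, n$; $i = 1, \cdots, m$, and are given by
\[ D_j f_i(\mathbf{a}) = Df_i(\mathbf{a})\mathbf{e}_j. \]
(vii) Let $A = (A_{ij})$ be the matrix representing $Df(\mathbf{a})$ with respect to the standard bases for $\mathbb{R}^n$ and $\mathbb{R}^m$, i.e. for any $\mathbf{h} \in \mathbb{R}^n$,
\[ Df(\mathbf{a})\mathbf{h} = A\mathbf{h}. \]
Then $A$ is given by
\[ A_{ij} = \langle Df(\mathbf{a})\mathbf{e}_j, \mathbf{b}_i \rangle = D_j f_i(\mathbf{a}), \]
where $\{\mathbf{e}_1, \cdots, \mathbf{e}_n\}$ is the standard basis for $\mathbb{R}^n$, and $\{\mathbf{b}_1, \cdots, \mathbf{b}_m\}$ is the standard basis for $\mathbb{R}^m$.
The second property is useful, since instead of considering arbitrary $\mathbb{R}^m$-valued functions, we can just look at real-valued functions.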
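Proof.
(i) By definition, if $f$ is differentiable, then as $\mathbf{h} \to \mathbf{0}$, we know
\[ f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - Df(\mathbf{a})\mathbf{h} \to \mathbf{0}. \]
Since $Df(\mathbf{a})\mathbf{h} \to \mathbf{0}$ as well, we must have $f(\mathbf{a} + \mathbf{h}) \to f(\mathbf{a})$.
(ii) Exercise on example sheet 4.
(iii) We just have to check this directly. We have
\[ \frac{(\lambda f + \mu g)(\mathbf{a} + \mathbf{h}) - (\lambda f + \mu g)(\mathbf{a}) - (\lambda Df(\mathbf{a}) + \mu Dg(\mathbf{a}))\mathbf{h}}{\|\mathbf{h}\|} = \lambda \frac{f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - Df(\mathbf{a})\mathbf{h}}{\|\mathbf{h}\|} + \mu \frac{g(\mathbf{a} + \mathbf{h}) - g(\mathbf{a}) - Dg(\mathbf{a})\mathbf{h}}{\|\mathbf{h}\|}, \]
which tends to $\mathbf{0}$ as $\mathbf{h} \to \mathbf{0}$. So done.
(iv) Since $A$ is linear, we always have $A(\mathbf{a} + \mathbf{h}) - A(\mathbf{a}) - A\mathbf{h} = \mathbf{0}$ for all $\mathbf{h}$.
(v) We’ve proved this in the previous discussion.
(vi) This follows from (v), applied to each $f_i$ (which is differentiable by (ii)) with $\mathbf{u} = \mathbf{e}_j$.
(vii) This follows from the general result for linear maps: for any linear map represented by the $m \times n$ matrix $(A_{ij})$, we have
\[ A_{ij} = \langle A\mathbf{e}_j, \mathbf{b}_i \rangle. \]
Apply this with $A = Df(\mathbf{a})$, and note that for any $\mathbf{h} \in \mathbb{R}^n$,
\[ Df(\mathbf{a})\mathbf{h} = (Df_1(\mathbf{a})\mathbf{h}, \cdots, Df_m(\mathbf{a})\mathbf{h}). \]
So done.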
The above says differentiability at a point implies the existence of all directional derivatives, which in turn implies the existence of all partial derivatives. Neither of the converse implications holds.
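Example. Let $f: \mathbb{R}^2 \to \mathbb{R}$ be defined by
\[ f(x, y) = \begin{cases} 0 & xy = 0 \\ 1 & xy \neq 0 \end{cases} \]
Then the partial derivatives are
\[ \frac{\partial f}{\partial x}(0, 0) = \frac{\partial f}{\partial y}(0, 0) = 0. \]
In other directions, say $\mathbf{u} = (1, 1)$, we have
\[ \frac{f(\mathbf{0} + t\mathbf{u}) - f(\mathbf{0})}{t} = \frac{1}{t}, \]
which diverges as $t \to 0$. So the directional derivative does not exist.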
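The example above can be reproduced numerically. This is a small sketch that evaluates the difference quotient at the origin along the two coordinate axes and along $(1, 1)$; the helper name `quotient` and the step sizes are illustrative choices.

```python
def f(x, y):
    """The function from the example: 0 on the coordinate axes, 1 elsewhere."""
    return 0.0 if x * y == 0 else 1.0

def quotient(u, t):
    """Difference quotient (f(0 + t u) - f(0)) / t at the origin."""
    return (f(t * u[0], t * u[1]) - f(0.0, 0.0)) / t

steps = (1e-1, 1e-2, 1e-3)
along_x = [quotient((1.0, 0.0), t) for t in steps]
along_y = [quotient((0.0, 1.0), t) for t in steps]
along_diagonal = [quotient((1.0, 1.0), t) for t in steps]
print(along_x)
print(along_y)
print(along_diagonal)
```

The quotients along the axes are identically zero, matching the partial derivatives, while the quotient along the diagonal is $1/t$ and grows without bound as $t$ shrinks.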
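Example. Let $f: \mathbb{R}^2 \to \mathbb{R}$ be defined by
\[ f(x, y) = \begin{cases} \frac{x^3}{y} & y \neq 0 \\ 0 & y = 0 \end{cases} \]
Then for $\mathbf{u} = (u_1, u_2) \neq \mathbf{0}$ and $t \neq 0$, we can compute
\[ \frac{f(\mathbf{0} + t\mathbf{u}) - f(\mathbf{0})}{t} = \begin{cases} \frac{t u_1^3}{u_2} & u_2 \neq 0 \\ 0 & u_2 = 0 \end{cases} \]
So
\[ D_{\mathbf{u}} f(\mathbf{0}) = \lim_{t \to 0} \frac{f(\mathbf{0} + t\mathbf{u}) - f(\mathbf{0})}{t} = 0, \]
and the directional derivative exists. However, the function is not differentiable at $\mathbf{0}$, since it is not even continuous at $\mathbf{0}$, as
\[ f(\delta, \delta^4) = \frac{1}{\delta} \]
diverges as $\delta \to 0$.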
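A quick numerical look at this example shows both behaviours side by side. The sketch below uses one illustrative direction $\mathbf{u} = (1, 2)$ and a few sample values of $t$ and $\delta$; these choices are assumptions made for the illustration.

```python
def f(x, y):
    """The function from the example: x^3 / y off the x-axis, 0 on it."""
    return x ** 3 / y if y != 0 else 0.0

def quotient(u, t):
    """Difference quotient (f(0 + t u) - f(0)) / t at the origin."""
    return (f(t * u[0], t * u[1]) - f(0.0, 0.0)) / t

u = (1.0, 2.0)
# Along a fixed direction the quotient is t * u_1^3 / u_2, which tends to 0.
quotients = [quotient(u, t) for t in (1e-1, 1e-2, 1e-3)]
# Along the curve (d, d^4) the function values are 1/d, which blow up.
curve_values = [f(d, d ** 4) for d in (1e-1, 1e-2, 1e-3)]
print(quotients)
print(curve_values)
```

The first list shrinks towards $0$ while the second grows like $1/\delta$, so every directional derivative at the origin vanishes even though $f$ is unbounded near the origin.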
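Example. Let $f: \mathbb{R}^2 \to \mathbb{R}$ be defined by
\[ f(x, y) = \begin{cases} \frac{x^3}{x^2 + y^2} & (x, y) \neq (0, 0) \\ 0 & (x, y) = (0, 0) \end{cases}. \]
It is clear that $f$ is continuous at points other than $\mathbf{0}$, and $f$ is also continuous at $\mathbf{0}$ since $|f(x, y)| \leq |x|$. We can compute the partial derivatives as
\[ \frac{\partial f}{\partial x}(0, 0) = 1, \quad \frac{\partial f}{\partial y}(0, 0) = 0. \]
In fact, we can compute the difference quotient in the direction $\mathbf{u} = (u_1, u_2) \neq \mathbf{0}$ to be
\[ \frac{f(\mathbf{0} + t\mathbf{u}) - f(\mathbf{0})}{t} = \frac{u_1^3}{u_1^2 + u_2^2}. \]
So we have
\[ D_{\mathbf{u}} f(\mathbf{0}) = \frac{u_1^3}{u_1^2 + u_2^2}. \]
We can now immediately conclude that $f$ is not differentiable at $\mathbf{0}$, since if it were, then we would have
\[ D_{\mathbf{u}} f(\mathbf{0}) = Df(\mathbf{0})\mathbf{u}, \]
which should be a linear expression in $\mathbf{u}$, but this is not.
Alternatively, if $f$ were differentiable, then we would have
\[ Df(\mathbf{0})\mathbf{h} = \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} h_1 \\ h_2 \end{pmatrix} = h_1. \]
However, we have
\[ \frac{f(\mathbf{0} + \mathbf{h}) - f(\mathbf{0}) - Df(\mathbf{0})\mathbf{h}}{\|\mathbf{h}\|} = \frac{\frac{h_1^3}{h_1^2 + h_2^2} - h_1}{\sqrt{h_1^2 + h_2^2}} = -\frac{h_1 h_2^2}{\sqrt{h_1^2 + h_2^2}^{\,3}}, \]
which does not tend to $0$ as $\mathbf{h} \to \mathbf{0}$. For example, if $\mathbf{h} = (t, t)$, this quotient is $-\frac{1}{2^{3/2}}$ for $t > 0$ (and $+\frac{1}{2^{3/2}}$ for $t < 0$).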
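Both arguments in this example can be illustrated numerically. The sketch below approximates $D_{\mathbf{u}} f(\mathbf{0})$ by a difference quotient with a fixed small step (an illustrative choice; for this $f$ the quotient does not depend on $t$ in exact arithmetic), and evaluates the remainder quotient along $\mathbf{h} = (t, t)$ using the candidate $Df(\mathbf{0})\mathbf{h} = h_1$.

```python
import math

def f(x, y):
    """The function from the example: x^3 / (x^2 + y^2), extended by 0 at the origin."""
    return x ** 3 / (x * x + y * y) if (x, y) != (0.0, 0.0) else 0.0

def directional_derivative(u, t=1e-6):
    """Difference quotient at the origin with a small fixed step t."""
    return (f(t * u[0], t * u[1]) - f(0.0, 0.0)) / t

def remainder_ratio(h):
    """(f(h) - f(0) - h_1) / ||h||, using the candidate Df(0)h = h_1."""
    return (f(h[0], h[1]) - f(0.0, 0.0) - h[0]) / math.hypot(h[0], h[1])

d_e1 = directional_derivative((1.0, 0.0))
d_e2 = directional_derivative((0.0, 1.0))
d_sum = directional_derivative((1.0, 1.0))
# Linearity in u would force d_sum == d_e1 + d_e2.
print(d_e1, d_e2, d_sum)

ratios = [remainder_ratio((t, t)) for t in (1e-1, 1e-3, 1e-5)]
print(ratios)
```

The directional derivative along $(1, 1)$ comes out near $\frac{1}{2}$ rather than $1 + 0$, and the remainder quotient stays near $-2^{-3/2}$ however small $t$ is, in line with the two arguments above.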
To decide if a function is differentiable, the first step would be to compute the partial derivatives. If they don’t exist, then we immediately know the function is not differentiable. However, if they do, then we have a candidate for what the derivative is, and we plug it into the definition to check if it actually is the derivative.
This is a cumbersome thing to do. It turns out that while existence of partial derivatives does not imply differentiability in general, we do get differentiability if we add some mild extra conditions.
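Theorem. Let $U \subseteq \mathbb{R}^n$ be open, $f: U \to \mathbb{R}^m$. Let $\mathbf{a} \in U$. Suppose there exists some open ball $B_r(\mathbf{a}) \subseteq U$ such that
(i) $D_j f_i(\mathbf{x})$ exists for every $\mathbf{x} \in B_r(\mathbf{a})$ and $1 \leq i \leq m$, $1 \leq j \leq n$;
(ii) $D_j f_i$ are continuous at $\mathbf{a}$ for all $1 \leq i \leq m$, $1 \leq j \leq n$.
Then $f$ is differentiable at $\mathbf{a}$.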
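Proof. It suffices to prove this for $m = 1$, by part (ii) of the proposition above. For each $\mathbf{h} = (h_1, \cdots, h_n) \in \mathbb{R}^n$ with $\|\mathbf{h}\| < r$, we have
\[ f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) = \sum_{j=1}^n \left[ f(\mathbf{a} + h_1\mathbf{e}_1 + \cdots + h_j\mathbf{e}_j) - f(\mathbf{a} + h_1\mathbf{e}_1 + \cdots + h_{j-1}\mathbf{e}_{j-1}) \right]. \]
Now for convenience, we can write
\[ \mathbf{h}^{(j)} = h_1\mathbf{e}_1 + \cdots + h_j\mathbf{e}_j = (h_1, \cdots, h_j, 0, \cdots, 0). \]
Then we have
\begin{align*}
f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) &= \sum_{j=1}^n \left[ f(\mathbf{a} + \mathbf{h}^{(j)}) - f(\mathbf{a} + \mathbf{h}^{(j-1)}) \right] \\
&= \sum_{j=1}^n \left[ f(\mathbf{a} + \mathbf{h}^{(j-1)} + h_j\mathbf{e}_j) - f(\mathbf{a} + \mathbf{h}^{(j-1)}) \right].
\end{align*}
Note that in each term, we are just moving along the coordinate axes. Since the partial derivatives exist, the mean value theorem of single-variable calculus applied to
\[ g(t) = f(\mathbf{a} + \mathbf{h}^{(j-1)} + t\mathbf{e}_j) \]
on the interval between $0$ and $h_j$ allows us to write this as
\begin{align*}
f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) &= \sum_{j=1}^n h_j D_j f(\mathbf{a} + \mathbf{h}^{(j-1)} + \theta_j h_j \mathbf{e}_j) \\
&= \sum_{j=1}^n h_j D_j f(\mathbf{a}) + \sum_{j=1}^n h_j \left[ D_j f(\mathbf{a} + \mathbf{h}^{(j-1)} + \theta_j h_j \mathbf{e}_j) - D_j f(\mathbf{a}) \right]
\end{align*}
for some $\theta_j \in (0, 1)$.
Note that $D_j f(\mathbf{a} + \mathbf{h}^{(j-1)} + \theta_j h_j \mathbf{e}_j) - D_j f(\mathbf{a}) \to 0$ as $\mathbf{h} \to \mathbf{0}$ since the partial derivatives are continuous at $\mathbf{a}$. Since $|h_j| \leq \|\mathbf{h}\|$, the second term is $o(\mathbf{h})$. So $f$ is differentiable at $\mathbf{a}$ with
\[ Df(\mathbf{a})\mathbf{h} = \sum_{j=1}^n D_j f(\mathbf{a}) h_j. \]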
This is a very useful result. For example, we can now immediately conclude that the function
\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \mapsto \begin{pmatrix} 3x^2 + 4\sin y + e^{6z} \\ xyze^{14x} \end{pmatrix} \]
is differentiable everywhere, since it has continuous partial derivatives. This is much better than messing with the definition itself.
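As a sanity check on this example, the sketch below assembles the matrix of partial derivatives by hand and evaluates the remainder quotient from the definition along one shrinking direction. The base point `a`, the direction of `h` and the helper names are hypothetical choices for illustration, and the hand-computed partial derivatives in `jacobian` are this sketch's own working rather than something stated in the notes.

```python
import math

def F(x, y, z):
    """The map (x, y, z) -> (3x^2 + 4 sin y + e^{6z}, x y z e^{14x})."""
    return (3.0 * x * x + 4.0 * math.sin(y) + math.exp(6.0 * z),
            x * y * z * math.exp(14.0 * x))

def jacobian(x, y, z):
    """Hand-computed 2 x 3 matrix of partial derivatives of F."""
    e = math.exp(14.0 * x)
    return ((6.0 * x, 4.0 * math.cos(y), 6.0 * math.exp(6.0 * z)),
            (y * z * e * (1.0 + 14.0 * x), x * z * e, x * y * e))

def norm(v):
    """Euclidean norm of a sequence of numbers."""
    return math.sqrt(sum(c * c for c in v))

def remainder_ratio(a, h):
    """||F(a + h) - F(a) - J h|| / ||h|| with J the matrix of partials at a."""
    J = jacobian(*a)
    Fa = F(*a)
    Fah = F(*(ai + hi for ai, hi in zip(a, h)))
    Jh = [sum(row[j] * h[j] for j in range(3)) for row in J]
    return norm([Fah[i] - Fa[i] - Jh[i] for i in range(2)]) / norm(h)

a = (0.1, 0.2, 0.3)
ratios = [remainder_ratio(a, (s, -s, 2.0 * s)) for s in (1e-2, 1e-3, 1e-4, 1e-5)]
for ratio in ratios:
    print(ratio)
```

The ratios shrink roughly in proportion to $\|\mathbf{h}\|$, which is what the theorem predicts. The numerical check only samples one direction, so it supports rather than replaces the argument via continuous partial derivatives.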