6 Differentiation from $\mathbb{R}^m$ to $\mathbb{R}^n$
IB Analysis II
6.5 2nd order derivatives
We’ve done so much work to understand first derivatives. For real functions,
we can immediately know a lot about higher derivatives, since the derivative is
just a normal real function again. Here, it is slightly more complicated, since the
derivative is a linear operator. However, this is not really a problem, since the
space of linear operators is just yet another vector space, so we can essentially
use the same definition.
Definition (2nd derivative). Let $U \subseteq \mathbb{R}^n$ be open, and let $f: U \to \mathbb{R}^m$ be differentiable. Then $Df: U \to L(\mathbb{R}^n; \mathbb{R}^m)$. We say $Df$ is differentiable at $a \in U$ if there exists $A \in L(\mathbb{R}^n; L(\mathbb{R}^n; \mathbb{R}^m))$ such that
\[
  \lim_{h \to 0} \frac{1}{\|h\|} \bigl(Df(a + h) - Df(a) - Ah\bigr) = 0.
\]
For this to make sense, we would need to put a norm on $L(\mathbb{R}^n; \mathbb{R}^m)$ (e.g. the operator norm), but $A$, if it exists, is independent of the choice of the norm, since all norms are equivalent for a finite-dimensional space.
This is, in fact, the same definition as our usual differentiability, since $L(\mathbb{R}^n; \mathbb{R}^m)$ is just a finite-dimensional space, and is isomorphic to $\mathbb{R}^{nm}$. So $Df$ is differentiable if and only if $Df: U \to \mathbb{R}^{nm}$ is differentiable with $A \in L(\mathbb{R}^n; \mathbb{R}^{nm})$.
This allows us to recycle our previous theorems about differentiability. In particular, differentiability of $Df$ at $a$ is implied by the existence of the partial derivatives $D_i(D_j f_k)$ in a neighbourhood of $a$, together with their continuity at $a$, for all $k = 1, \dots, m$ and $i, j = 1, \dots, n$.
Notation. Write
\[
  D_{ij} f(a) = D_i(D_j f)(a) = \frac{\partial^2}{\partial x_i \, \partial x_j} f(a).
\]
Let’s now go back to the initial definition, and try to interpret it. By linear algebra, in general, a linear map $\phi: \mathbb{R}^\ell \to L(\mathbb{R}^n; \mathbb{R}^m)$ induces a bilinear map $\Phi: \mathbb{R}^\ell \times \mathbb{R}^n \to \mathbb{R}^m$ by
\[
  \Phi(u, v) = \phi(u)(v) \in \mathbb{R}^m.
\]
In particular, we know
\begin{align*}
  \Phi(au + bv, w) &= a\Phi(u, w) + b\Phi(v, w), \\
  \Phi(u, av + bw) &= a\Phi(u, v) + b\Phi(u, w).
\end{align*}
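Conversely, if $\Phi: \mathbb{R}^\ell \times \mathbb{R}^n \to \mathbb{R}^m$ is bilinear, then $\phi: \mathbb{R}^\ell \to L(\mathbb{R}^n; \mathbb{R}^m)$ defined by
\[
  \phi(u) = (v \mapsto \Phi(u, v))
\]
is linear. These are clearly inverse operations to each other. So there is a one-to-one correspondence between bilinear maps $\Phi: \mathbb{R}^\ell \times \mathbb{R}^n \to \mathbb{R}^m$ and linear maps $\phi: \mathbb{R}^\ell \to L(\mathbb{R}^n; \mathbb{R}^m)$.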
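This correspondence is what programmers call currying and uncurrying. The following is a minimal Python sketch, assuming vectors are plain lists; the names `curry`, `uncurry` and the example bilinear map `Phi` on $\mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ are hypothetical and chosen purely for illustration.

```python
def curry(Phi):
    """Turn a bilinear map Phi(u, v) into the map u -> (v -> Phi(u, v))."""
    return lambda u: (lambda v: Phi(u, v))


def uncurry(phi):
    """Turn a map phi, where phi(u) is itself a map, into Phi(u, v) = phi(u)(v)."""
    return lambda u, v: phi(u)(v)


# Hypothetical example: Phi(u, v) = u1*v1 + 2*u1*v2 + 3*u2*v2 on R^2 x R^2.
def Phi(u, v):
    return u[0] * v[0] + 2 * u[0] * v[1] + 3 * u[1] * v[1]


phi = curry(Phi)
u, v = [1.0, 2.0], [3.0, -1.0]

# The two operations undo each other on this input.
roundtrip = uncurry(curry(Phi))
same = (roundtrip(u, v) == Phi(u, v)) and (phi(u)(v) == Phi(u, v))
```

The sketch only checks the round trip at one pair of vectors; it does not verify linearity or bilinearity, which are properties of the particular `Phi` supplied.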
In other words, instead of treating our second derivative as a weird linear map in $L(\mathbb{R}^n; L(\mathbb{R}^n; \mathbb{R}^m))$, we can view it as a bilinear map $\mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^m$.
Notation. We define $D^2 f(a): \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^m$ by
\[
  D^2 f(a)(u, v) = D(Df)(a)(u)(v).
\]
We know $D^2 f(a)$ is a bilinear map.
In coordinates, if
\[
  u = \sum_{j=1}^n u_j e_j, \quad v = \sum_{j=1}^n v_j e_j,
\]
where $\{e_1, \dots, e_n\}$ is the standard basis for $\mathbb{R}^n$, then using bilinearity, we have
\[
  D^2 f(a)(u, v) = \sum_{i=1}^n \sum_{j=1}^n D^2 f(a)(e_i, e_j) \, u_i v_j.
\]
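As a small illustration of this expansion, the sketch below rebuilds a bilinear map from its values on pairs of basis vectors. It assumes the scalar-valued case $m = 1$ for simplicity, and the table `B` is a hypothetical stand-in for the values $D^2 f(a)(e_i, e_j)$, not data from any particular function.

```python
def bilinear_from_basis(B):
    """Given B[i][j] = value on (e_i, e_j), return (u, v) -> sum_ij B[i][j] * u_i * v_j."""
    n = len(B)

    def form(u, v):
        return sum(B[i][j] * u[i] * v[j] for i in range(n) for j in range(n))

    return form


# Hypothetical values on basis pairs (n = 2, m = 1).
B = [[2.0, 1.0],
     [1.0, 0.0]]
form = bilinear_from_basis(B)

e = [[1.0, 0.0], [0.0, 1.0]]   # standard basis of R^2
u, v = [1.0, 2.0], [3.0, 4.0]
value = form(u, v)             # 2*1*3 + 1*1*4 + 1*2*3 + 0*2*4 = 16
```

Evaluating `form(e[i], e[j])` returns `B[i][j]` again, which is the sense in which the basis values determine the whole map.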
This is very similar to the case of first derivatives, where the derivative can be
completely specified by the values it takes on the basis vectors.
In the definition of the second derivative, we can again take $h = te_i$. Then we have
\[
  \lim_{t \to 0} \frac{Df(a + te_i) - Df(a) - tD(Df)(a)(e_i)}{t} = 0.
\]
Note that the numerator is a linear map in $L(\mathbb{R}^n; \mathbb{R}^m)$. We can let it act on $e_j$, and obtain
\[
  \lim_{t \to 0} \frac{Df(a + te_i)(e_j) - Df(a)(e_j) - tD(Df)(a)(e_i)(e_j)}{t} = 0
\]
for all $i, j = 1, \dots, n$. Taking the $D^2 f(a)(e_i, e_j) = D(Df)(a)(e_i)(e_j)$ term to the other side, we know
\begin{align*}
  D^2 f(a)(e_i, e_j) &= \lim_{t \to 0} \frac{Df(a + te_i)(e_j) - Df(a)(e_j)}{t} \\
  &= \lim_{t \to 0} \frac{D_{e_j} f(a + te_i) - D_{e_j} f(a)}{t} \\
  &= D_{e_i} D_{e_j} f(a).
\end{align*}
In other words, we have
\[
  D^2 f(a)(e_i, e_j) = \sum_{k=1}^m D_{ij} f_k(a) \, b_k,
\]
where $\{b_1, \dots, b_m\}$ is the standard basis for $\mathbb{R}^m$. So we have
\[
  D^2 f(a)(u, v) = \sum_{i,j=1}^n \sum_{k=1}^m D_{ij} f_k(a) \, u_i v_j b_k.
\]
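The coordinate formula can be checked numerically. The sketch below approximates each $D_{ij} f_k(a)$ by a central finite difference and then assembles $D^2 f(a)(u, v)$ component by component. The step size `h`, the helper names and the example function $f(x, y) = (x^2 y, x y^3)$ are all assumptions made for illustration; finite differences only approximate the derivatives.

```python
def second_partial(f, a, i, j, k, h=1e-4):
    """Approximate D_i D_j f_k(a) by a central difference in directions e_i and e_j."""
    def shifted(si, sj):
        x = list(a)
        x[i] += si * h
        x[j] += sj * h
        return f(x)[k]

    return (shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)


def second_derivative(f, a, u, v, m, h=1e-4):
    """Approximate D^2 f(a)(u, v) = sum_{i,j,k} D_ij f_k(a) u_i v_j b_k as a list of m components."""
    n = len(a)
    return [
        sum(second_partial(f, a, i, j, k, h) * u[i] * v[j]
            for i in range(n) for j in range(n))
        for k in range(m)
    ]


# Hypothetical example f : R^2 -> R^2, f(x, y) = (x^2 * y, x * y^3).
def f(x):
    return [x[0] ** 2 * x[1], x[0] * x[1] ** 3]


a = [1.0, 2.0]
u, v = [1.0, 0.0], [0.0, 1.0]   # u = e_1, v = e_2
approx = second_derivative(f, a, u, v, m=2)
# Exact values for comparison: D_12 f_1 = 2x = 2 and D_12 f_2 = 3y^2 = 12 at a = (1, 2).
```

With $u = e_1$ and $v = e_2$ only the $D_{12} f_k(a)$ terms survive, so `approx` should be close to `[2, 12]` up to discretisation and rounding error.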
We have been very careful to keep the right order of the partial derivatives.
However, in most cases we care about, it doesn’t matter.
Theorem (Symmetry of mixed partials). Let $U \subseteq \mathbb{R}^n$ be open, $f: U \to \mathbb{R}^m$, $a \in U$, and $\rho > 0$ such that $B_\rho(a) \subseteq U$.
Let $i, j \in \{1, \dots, n\}$ be fixed and suppose that $D_i D_j f(x)$ and $D_j D_i f(x)$ exist for all $x \in B_\rho(a)$ and are continuous at $a$. Then in fact
\[
  D_i D_j f(a) = D_j D_i f(a).
\]
The proof is quite short, when we know what to do.
Proof. Without loss of generality, assume $m = 1$. If $i = j$, then there is nothing to prove. So assume $i \neq j$.
Let
\[
  g_{ij}(t) = f(a + te_i + te_j) - f(a + te_i) - f(a + te_j) + f(a).
\]
Then for each fixed $t$, define $\phi: [0, 1] \to \mathbb{R}$ by
\[
  \phi(s) = f(a + ste_i + te_j) - f(a + ste_i).
\]
Then we get
\[
  g_{ij}(t) = \phi(1) - \phi(0).
\]
By the mean value theorem and the chain rule, there is some $\theta \in (0, 1)$ such that
\[
  g_{ij}(t) = \phi'(\theta) = t \bigl( D_i f(a + \theta te_i + te_j) - D_i f(a + \theta te_i) \bigr).
\]
Now apply the mean value theorem to the function
\[
  s \mapsto D_i f(a + \theta te_i + ste_j).
\]
Then there is some $\eta \in (0, 1)$ such that
\[
  g_{ij}(t) = t^2 D_j D_i f(a + \theta te_i + \eta te_j).
\]
We can do the same for $g_{ji}$, and find some $\tilde{\theta}, \tilde{\eta}$ such that
\[
  g_{ji}(t) = t^2 D_i D_j f(a + \tilde{\theta} te_i + \tilde{\eta} te_j).
\]
Since $g_{ij} = g_{ji}$, we get
\[
  t^2 D_j D_i f(a + \theta te_i + \eta te_j) = t^2 D_i D_j f(a + \tilde{\theta} te_i + \tilde{\eta} te_j).
\]
Divide by $t^2$, and take the limit as $t \to 0$. By continuity of the partial derivatives, we get
\[
  D_j D_i f(a) = D_i D_j f(a).
\]
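The key object in the proof, the second difference $g_{ij}(t)$, is easy to evaluate numerically. The sketch below computes $g_{ij}(t)/t^2$ for shrinking $t$ and compares it with the mixed partial computed by hand. The example $f(x, y) = \sin(x) e^{2y}$, the point `a` and the helper name `g` are assumptions chosen for illustration.

```python
import math


def g(f, a, i, j, t):
    """Second difference g_ij(t) = f(a + t e_i + t e_j) - f(a + t e_i) - f(a + t e_j) + f(a)."""
    def shift(si, sj):
        x = list(a)
        x[i] += si * t
        x[j] += sj * t
        return f(x)

    return shift(1, 1) - shift(1, 0) - shift(0, 1) + shift(0, 0)


# Hypothetical smooth example: f(x, y) = sin(x) * exp(2y).
def f(x):
    return math.sin(x[0]) * math.exp(2 * x[1])


a = [0.5, 0.3]
exact = 2 * math.cos(a[0]) * math.exp(2 * a[1])   # D_1 D_2 f(a) = D_2 D_1 f(a), by hand

steps = (1e-1, 1e-2, 1e-3)
ratios = [g(f, a, 0, 1, t) / t ** 2 for t in steps]              # g_12(t) / t^2
gaps = [abs(g(f, a, 0, 1, t) - g(f, a, 1, 0, t)) for t in steps]  # |g_12 - g_21|
```

The entries of `gaps` vanish up to floating-point rounding, reflecting $g_{ij} = g_{ji}$, while `ratios` approaches `exact` as $t$ decreases.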
This is nice. Whenever the second derivatives are continuous, the order does
not matter. We can alternatively state this result as follows:
Proposition. If $f: U \to \mathbb{R}^m$ is differentiable in $U$ such that the $D_i D_j f(x)$ exist in a neighbourhood of $a \in U$ and are continuous at $a$ for all $i, j$, then $Df$ is differentiable at $a$ and
\[
  D^2 f(a)(u, v) = \sum_j \sum_i D_i D_j f(a) \, u_i v_j
\]
is a symmetric bilinear form.
Proof.
This follows from the fact that continuity of second partials implies
differentiability, and the symmetry of mixed partials.
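The continuity hypothesis cannot simply be dropped. A standard example is $f(x, y) = xy(x^2 - y^2)/(x^2 + y^2)$ with $f(0, 0) = 0$, for which a direct computation gives $D_2 D_1 f(0, 0) = -1$ but $D_1 D_2 f(0, 0) = 1$. The sketch below reproduces this with nested central differences; the step sizes `inner` and `outer` are assumptions, chosen so that the inner difference is taken on a much finer scale than the outer one.

```python
def f(x, y):
    """Example function whose mixed second partials are not continuous at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)


inner, outer = 1e-6, 1e-3   # inner step much smaller than outer step


def D1f(x, y):
    """Central-difference approximation of D_1 f."""
    return (f(x + inner, y) - f(x - inner, y)) / (2 * inner)


def D2f(x, y):
    """Central-difference approximation of D_2 f."""
    return (f(x, y + inner) - f(x, y - inner)) / (2 * inner)


# D_2 D_1 f(0, 0): differentiate D_1 f in the second variable.
d2d1 = (D1f(0.0, outer) - D1f(0.0, -outer)) / (2 * outer)
# D_1 D_2 f(0, 0): differentiate D_2 f in the first variable.
d1d2 = (D2f(outer, 0.0) - D2f(-outer, 0.0)) / (2 * outer)
```

Here `d2d1` comes out close to $-1$ and `d1d2` close to $1$, so the two orders of differentiation genuinely disagree at the origin.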
Finally, we conclude with a version of Taylor’s theorem for multivariable
functions.
Theorem (Second-order Taylor’s theorem). Let $f: U \to \mathbb{R}$ be $C^2$, i.e. $D_i D_j f(x)$ are continuous for all $x \in U$. Let $a \in U$ and $B_r(a) \subseteq U$. Then for all $h$ with $\|h\| < r$,
\[
  f(a + h) = f(a) + Df(a)h + \frac{1}{2} D^2 f(a)(h, h) + E(h),
\]
where $E(h) = o(\|h\|^2)$.
Proof. Consider the function
\[
  g(t) = f(a + th).
\]
Then the assumptions tell us $g$ is twice differentiable. By the 1D Taylor’s theorem, we know
\[
  g(1) = g(0) + g'(0) + \frac{1}{2} g''(s)
\]
for some $s \in [0, 1]$.
In other words,
\begin{align*}
  f(a + h) &= f(a) + Df(a)h + \frac{1}{2} D^2 f(a + sh)(h, h) \\
  &= f(a) + Df(a)h + \frac{1}{2} D^2 f(a)(h, h) + E(h),
\end{align*}
where
\[
  E(h) = \frac{1}{2} \bigl( D^2 f(a + sh)(h, h) - D^2 f(a)(h, h) \bigr).
\]
By definition of the operator norm, we get
\[
  |E(h)| \leq \frac{1}{2} \|D^2 f(a + sh) - D^2 f(a)\| \, \|h\|^2.
\]
By continuity of the second derivative, as $h \to 0$, we get
\[
  \|D^2 f(a + sh) - D^2 f(a)\| \to 0.
\]
So $E(h) = o(\|h\|^2)$. So done.
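The statement $E(h) = o(\|h\|^2)$ can be observed numerically. The sketch below takes the hypothetical example $f(x, y) = e^x \sin y$, with first and second derivatives computed by hand, and records $|E(h)| / \|h\|^2$ along a fixed direction as $h$ shrinks. The point `a`, the direction $(1, -2)$ and the scales are assumptions chosen for illustration.

```python
import math


def f(x, y):
    """Hypothetical C^2 example: f(x, y) = exp(x) * sin(y)."""
    return math.exp(x) * math.sin(y)


def taylor2(a, h):
    """Second-order Taylor polynomial f(a) + Df(a)h + (1/2) D^2 f(a)(h, h), derivatives by hand."""
    ex, s, c = math.exp(a[0]), math.sin(a[1]), math.cos(a[1])
    first = ex * s * h[0] + ex * c * h[1]
    second = ex * s * h[0] ** 2 + 2 * ex * c * h[0] * h[1] - ex * s * h[1] ** 2
    return f(*a) + first + 0.5 * second


a = (0.2, 0.4)
ratios = []
for scale in (1e-1, 1e-2, 1e-3):
    h = (scale, -2 * scale)
    norm_sq = h[0] ** 2 + h[1] ** 2
    E = f(a[0] + h[0], a[1] + h[1]) - taylor2(a, h)
    ratios.append(abs(E) / norm_sq)   # should shrink as h -> 0
```

For this smooth example the ratio shrinks roughly in proportion to $\|h\|$, which is faster than the theorem guarantees for a general $C^2$ function.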