6 Differentiation from $\mathbb{R}^m$ to $\mathbb{R}^n$

IB Analysis II

6.5 2nd order derivatives
We've done so much work to understand first derivatives. For real functions, we can immediately say a lot about higher derivatives, since the derivative is just a normal real function again. Here it is slightly more complicated, since the derivative is a linear operator. However, this is not really a problem, since the space of linear operators is just yet another vector space, so we can essentially use the same definition.
Definition (2nd derivative). Let $U \subseteq \mathbb{R}^n$ be open, $f: U \to \mathbb{R}^m$ be differentiable. Then $Df: U \to L(\mathbb{R}^n; \mathbb{R}^m)$. We say $Df$ is differentiable at $a \in U$ if there exists $A \in L(\mathbb{R}^n; L(\mathbb{R}^n; \mathbb{R}^m))$ such that
\[
  \lim_{h \to 0} \frac{1}{\|h\|}(Df(a + h) - Df(a) - Ah) = 0.
\]
For this to make sense, we need to put a norm on $L(\mathbb{R}^n; \mathbb{R}^m)$ (e.g. the operator norm), but $A$, if it exists, is independent of the choice of norm, since all norms on a finite-dimensional space are equivalent.
This is, in fact, the same definition as our usual differentiability, since $L(\mathbb{R}^n; \mathbb{R}^m)$ is just a finite-dimensional space, isomorphic to $\mathbb{R}^{nm}$. So $Df$ is differentiable if and only if $Df: U \to \mathbb{R}^{nm}$ is differentiable, with $A \in L(\mathbb{R}^n; \mathbb{R}^{nm})$.
This allows us to recycle our previous theorems about differentiability. In particular, we know $Df$ is differentiable if the partial derivatives $D_i(D_j f_k)$ exist in a neighbourhood of $a$ and are continuous at $a$, for all $k = 1, \ldots, m$ and $i, j = 1, \ldots, n$.
Notation. Write
\[
  D_{ij} f(a) = D_i(D_j f)(a) = \frac{\partial^2}{\partial x_i\, \partial x_j} f(a).
\]
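To make the notation concrete, here is a short check with sympy (the function $f$ below and the use of the library are my own illustration, not part of the notes): $D_{12} f = D_1(D_2 f)$ means differentiating in $x_2$ first, then in $x_1$.

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = x1**2 * sp.sin(x2)            # illustrative f : R^2 -> R

# D_12 f = D_1(D_2 f): differentiate in x2 first, then in x1
D12 = sp.diff(f, x2, x1)
D21 = sp.diff(f, x1, x2)

print(D12)   # 2*x1*cos(x2)
print(D21)   # 2*x1*cos(x2)
```

That the two orders agree here is no accident; it is the symmetry of mixed partials proved below.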
Let's now go back to the initial definition, and try to interpret it. By linear algebra, in general, a linear map $\phi: \mathbb{R}^n \to L(\mathbb{R}^n; \mathbb{R}^m)$ induces a bilinear map $\Phi: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^m$ by
\[
  \Phi(u, v) = \phi(u)(v) \in \mathbb{R}^m.
\]
In particular, we know
\begin{align*}
  \Phi(au + bv, w) &= a\Phi(u, w) + b\Phi(v, w)\\
  \Phi(u, av + bw) &= a\Phi(u, v) + b\Phi(u, w).
\end{align*}
Conversely, if $\Phi: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^m$ is bilinear, then $\phi: \mathbb{R}^n \to L(\mathbb{R}^n; \mathbb{R}^m)$ defined by
\[
  \phi(u) = (v \mapsto \Phi(u, v))
\]
is linear. These are clearly inverse operations to each other. So there is a one-to-one correspondence between bilinear maps $\Phi: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^m$ and linear maps $\phi: \mathbb{R}^n \to L(\mathbb{R}^n; \mathbb{R}^m)$.
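This correspondence is just currying. A minimal sketch in Python (the matrix and function names are made up for illustration): a bilinear map $\Phi$ given by a matrix induces $\phi$, where $\phi(u)$ is itself a function of $v$.

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [3.0, 4.0]])

def Phi(u, v):
    """An illustrative bilinear map R^2 x R^2 -> R, namely u^T M v."""
    return u @ M @ v

def phi(u):
    """The induced linear map: phi(u) is the function v -> Phi(u, v)."""
    return lambda v: Phi(u, v)

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
assert phi(u)(v) == Phi(u, v)    # the two viewpoints agree
```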
In other words, instead of treating our second derivative as a weird linear map in $L(\mathbb{R}^n; L(\mathbb{R}^n; \mathbb{R}^m))$, we can view it as a bilinear map $\mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^m$.
Notation. We define $D^2 f(a): \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^m$ by
\[
  D^2 f(a)(u, v) = D(Df)(a)(u)(v).
\]
We know $D^2 f(a)$ is a bilinear map.
In coordinates, if
\[
  u = \sum_{j=1}^n u_j e_j, \quad v = \sum_{j=1}^n v_j e_j,
\]
where $\{e_1, \ldots, e_n\}$ is the standard basis for $\mathbb{R}^n$, then using bilinearity, we have
\[
  D^2 f(a)(u, v) = \sum_{i=1}^n \sum_{j=1}^n D^2 f(a)(e_i, e_j)\, u_i v_j.
\]
This is very similar to the case of first derivatives, where the derivative can be
completely specified by the values it takes on the basis vectors.
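For a scalar-valued $f$, the values $D^2 f(a)(e_i, e_j)$ are just the entries of the Hessian matrix, and the expansion above is the familiar $u^\top H v$. A quick numerical sanity check (the matrix and vectors are illustrative, not from the notes):

```python
import numpy as np

# Illustrative Hessian: H[i, j] plays the role of D^2 f(a)(e_i, e_j)
H = np.array([[2.0, 1.0],
              [1.0, 3.0]])

def D2f(u, v):
    return u @ H @ v

u = np.array([1.0, 2.0])
v = np.array([-1.0, 0.5])

# Expand over the standard basis using bilinearity
expanded = sum(H[i, j] * u[i] * v[j] for i in range(2) for j in range(2))
assert np.isclose(D2f(u, v), expanded)
```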
In the definition of the second derivative, we can again take $h = te_i$. Then we have
\[
  \lim_{t \to 0} \frac{Df(a + te_i) - Df(a) - tD(Df)(a)(e_i)}{t} = 0.
\]
Note that the numerator is a linear map in $L(\mathbb{R}^n; \mathbb{R}^m)$. We can let the whole thing act on $e_j$, and obtain
\[
  \lim_{t \to 0} \frac{Df(a + te_i)(e_j) - Df(a)(e_j) - tD(Df)(a)(e_i)(e_j)}{t} = 0
\]
for all $i, j = 1, \ldots, n$. Taking the $D^2 f(a)(e_i, e_j)$ to the other side, we know
\begin{align*}
  D^2 f(a)(e_i, e_j) &= \lim_{t \to 0} \frac{Df(a + te_i)(e_j) - Df(a)(e_j)}{t}\\
  &= \lim_{t \to 0} \frac{D_{e_j} f(a + te_i) - D_{e_j} f(a)}{t}\\
  &= D_{e_i} D_{e_j} f(a).
\end{align*}
In other words, we have
\[
  D^2 f(a)(e_i, e_j) = \sum_{k=1}^m D_{ij} f_k(a)\, b_k,
\]
where $\{b_1, \ldots, b_m\}$ is the standard basis for $\mathbb{R}^m$. So we have
\[
  D^2 f(a)(u, v) = \sum_{i,j=1}^n \sum_{k=1}^m D_{ij} f_k(a)\, u_i v_j\, b_k.
\]
We have been very careful to keep the right order of the partial derivatives.
However, in most cases we care about, it doesn’t matter.
Theorem (Symmetry of mixed partials). Let $U \subseteq \mathbb{R}^n$ be open, $f: U \to \mathbb{R}^m$, $a \in U$, and $\rho > 0$ such that $B_\rho(a) \subseteq U$. Let $i, j \in \{1, \ldots, n\}$ be fixed and suppose that $D_i D_j f(x)$ and $D_j D_i f(x)$ exist for all $x \in B_\rho(a)$ and are continuous at $a$. Then in fact
\[
  D_i D_j f(a) = D_j D_i f(a).
\]
The proof is quite short, when we know what to do.
Proof.
wlog, assume $m = 1$. If $i = j$, then there is nothing to prove. So assume $i \neq j$. Let
\[
  g_{ij}(t) = f(a + te_i + te_j) - f(a + te_i) - f(a + te_j) + f(a).
\]
Then for each fixed $t$, define $\varphi: [0, 1] \to \mathbb{R}$ by
\[
  \varphi(s) = f(a + ste_i + te_j) - f(a + ste_i).
\]
Then we get
\[
  g_{ij}(t) = \varphi(1) - \varphi(0).
\]
By the mean value theorem and the chain rule, there is some $\theta \in (0, 1)$ such that
\[
  g_{ij}(t) = \varphi'(\theta) = t\bigl(D_i f(a + \theta te_i + te_j) - D_i f(a + \theta te_i)\bigr).
\]
Now apply the mean value theorem to the function
\[
  s \mapsto D_i f(a + \theta te_i + ste_j)
\]
to find some $\eta \in (0, 1)$ such that
\[
  g_{ij}(t) = t^2 D_j D_i f(a + \theta te_i + \eta te_j).
\]
We can do the same for $g_{ji}$ (the same expression with the roles of $i$ and $j$ swapped), and find some $\tilde\theta, \tilde\eta$ such that
\[
  g_{ji}(t) = t^2 D_i D_j f(a + \tilde\theta te_j + \tilde\eta te_i).
\]
Since $g_{ij} = g_{ji}$, we get
\[
  t^2 D_j D_i f(a + \theta te_i + \eta te_j) = t^2 D_i D_j f(a + \tilde\theta te_j + \tilde\eta te_i).
\]
Divide by $t^2$, and take the limit as $t \to 0$. By continuity of the partial derivatives, we get
\[
  D_j D_i f(a) = D_i D_j f(a).
\]
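The quantity $g_{ij}(t)/t^2$ in the proof is a computable finite difference, so we can watch it converge to the mixed partial numerically. A sketch (the function and the point are my own choices, not from the notes):

```python
import math

def f(x, y):
    return math.exp(x) * math.sin(y)   # an illustrative smooth function

a = (0.3, 0.7)

def g(t):
    # the second-order difference g_ij(t) from the proof, with e_i, e_j
    # the two coordinate directions
    x, y = a
    return f(x + t, y + t) - f(x + t, y) - f(x, y + t) + f(x, y)

# D_x D_y f = e^x cos y, evaluated at a
exact = math.exp(0.3) * math.cos(0.7)

for t in (1e-1, 1e-2, 1e-3):
    print(t, abs(g(t) / t**2 - exact))   # errors shrink as t -> 0
```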
This is nice. Whenever the second derivatives are continuous, the order does
not matter. We can alternatively state this result as follows:
Proposition. If $f: U \to \mathbb{R}^m$ is differentiable in $U$ such that $D_i D_j f(x)$ exists in a neighbourhood of $a \in U$ and is continuous at $a$, then $Df$ is differentiable at $a$ and
\[
  D^2 f(a)(u, v) = \sum_j \sum_i D_i D_j f(a)\, u_i v_j
\]
is a symmetric bilinear form.
Proof.
This follows from the fact that continuity of second partials implies
differentiability, and the symmetry of mixed partials.
Finally, we conclude with a version of Taylor’s theorem for multivariable
functions.
Theorem (Second-order Taylor's theorem). Let $f: U \to \mathbb{R}$ be $C^2$, i.e. $D_i D_j f(x)$ are continuous for all $x \in U$. Let $a \in U$ and $B_r(a) \subseteq U$. Then
\[
  f(a + h) = f(a) + Df(a)h + \frac{1}{2} D^2 f(a)(h, h) + E(h),
\]
where $E(h) = o(\|h\|^2)$.
Proof. Consider the function
\[
  g(t) = f(a + th).
\]
Then the assumptions tell us $g$ is twice differentiable. By the 1D Taylor's theorem, we know
\[
  g(1) = g(0) + g'(0) + \frac{1}{2} g''(s)
\]
for some $s \in [0, 1]$.
In other words,
\begin{align*}
  f(a + h) &= f(a) + Df(a)h + \frac{1}{2} D^2 f(a + sh)(h, h)\\
  &= f(a) + Df(a)h + \frac{1}{2} D^2 f(a)(h, h) + E(h),
\end{align*}
where
\[
  E(h) = \frac{1}{2}\bigl(D^2 f(a + sh)(h, h) - D^2 f(a)(h, h)\bigr).
\]
By definition of the operator norm, we get
\[
  |E(h)| \leq \frac{1}{2}\, \|D^2 f(a + sh) - D^2 f(a)\|\, \|h\|^2.
\]
By continuity of the second derivative, as $h \to 0$, we get $\|D^2 f(a + sh) - D^2 f(a)\| \to 0$. So $E(h) = o(\|h\|^2)$. So done.
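To see the error bound in action, we can compare $f(a + h)$ against the second-order expansion for a concrete $f$ and watch $E(h)/\|h\|^2$ shrink. In the sketch below, the function, the point, and the hand-computed gradient and Hessian are my own illustration, not part of the notes.

```python
import numpy as np

def f(x):
    return np.exp(x[0]) * np.cos(x[1])   # illustrative C^2 function R^2 -> R

a = np.array([0.2, -0.4])

# Gradient and Hessian of f at a, computed by hand for this particular f
ex, cy, sy = np.exp(a[0]), np.cos(a[1]), np.sin(a[1])
grad = np.array([ex * cy, -ex * sy])
hess = np.array([[ex * cy, -ex * sy],
                 [-ex * sy, -ex * cy]])

direction = np.array([1.0, 2.0]) / np.sqrt(5.0)
for t in (1e-1, 1e-2, 1e-3):
    h = t * direction
    taylor = f(a) + grad @ h + 0.5 * h @ hess @ h
    E = f(a + h) - taylor
    print(t, abs(E) / t**2)   # ratios shrink, so E(h) = o(||h||^2)
```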