IB Analysis II - Differentiation from $\mathbb{R}^m$ to $\mathbb{R}^n$

6 Differentiation from $\mathbb{R}^m$ to $\mathbb{R}^n$


6.1 Differentiation from $\mathbb{R}^m$ to $\mathbb{R}^n$

We are now going to investigate differentiation of functions $f: \mathbb{R}^n \to \mathbb{R}^m$. The hard part is to first come up with a sensible definition of what this means. There is no obvious way to generalize what we had for real functions. After defining it, we will need to do some hard work to come up with easy ways to check if functions are differentiable. Then we can use it to prove some useful results like the mean value inequality. We will always use the usual Euclidean norm.

To define differentiation in $\mathbb{R}^n$, we first need a definition of the limit.

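Definition (Limit of function). Let $E \subseteq \mathbb{R}^n$ and $f: E \to \mathbb{R}^m$. Let $a \in \mathbb{R}^n$ be a limit point of $E$, and let $b \in \mathbb{R}^m$. We say
\[
  \lim_{x \to a} f(x) = b
\]
if for every $\varepsilon > 0$, there is some $\delta > 0$ such that
\[
  (\forall x \in E)\quad 0 < \|x - a\| < \delta \Rightarrow \|f(x) - b\| < \varepsilon.
\]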

As in the case of $\mathbb{R}$ in IA Analysis I, we do not impose any requirements on $f$ when $x = a$. In particular, we don’t assume that $a$ is in the domain $E$.

We would like a definition of differentiation for functions $f: \mathbb{R}^n \to \mathbb{R}$ (or more generally $f: \mathbb{R}^n \to \mathbb{R}^m$) that directly extends the familiar definition on the real line. Recall that if $f: (b, c) \to \mathbb{R}$ and $a \in (b, c)$, we say $f$ is differentiable at $a$ if the limit
\[
  Df(a) = f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} \tag{$*$}
\]
exists (as a real number). This cannot be extended to higher dimensions directly, since $h$ would become a vector in $\mathbb{R}^n$, and it is not clear what we mean by dividing by a vector. We might try dividing by $\|h\|$ instead, i.e. require that
\[
  \lim_{h \to 0} \frac{f(a + h) - f(a)}{\|h\|}
\]
exists. However, this is clearly wrong, since in the case of $n = 1$, this reduces to the existence of the limit
\[
  \lim_{h \to 0} \frac{f(a + h) - f(a)}{|h|},
\]
which almost never exists, e.g. when $f(x) = x$ (the quotient is $+1$ for $h > 0$ and $-1$ for $h < 0$). It is also possible that this exists while the genuine derivative does not, e.g. when $f(x) = |x|$, at $x = 0$. So this is not the right definition.
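The failure of the naive quotient can be checked numerically. The following is only a sketch: the helper name `naive_quotient` and the step size are illustrative choices, and the two test functions are $f(x) = x$ and $f(x) = |x|$ at the point $0$.

```python
def naive_quotient(f, a, h):
    # The "wrong" difference quotient: divide by |h| instead of h.
    return (f(a + h) - f(a)) / abs(h)

step = 1e-6

# f(x) = x: the one-sided values disagree, so the limit does not exist,
# even though f is differentiable in the usual sense.
identity_right = naive_quotient(lambda x: x, 0.0, step)
identity_left = naive_quotient(lambda x: x, 0.0, -step)

# f(x) = |x|: both one-sided values agree, so the limit exists,
# even though f is not differentiable at 0.
abs_right = naive_quotient(abs, 0.0, step)
abs_left = naive_quotient(abs, 0.0, -step)

print(identity_right, identity_left)  # 1.0 -1.0
print(abs_right, abs_left)            # 1.0 1.0
```

So the naive criterion rejects a differentiable function and accepts a non-differentiable one.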

Now we are a bit stuck. We need to divide by something, and that thing had better be a scalar. Dividing by $\|h\|$ alone is not quite what we want. What should we do? The idea is to move $f'(a)$ to the other side of the equation, so that $(*)$ becomes
\[
  \lim_{h \to 0} \frac{f(a + h) - f(a) - f'(a)h}{h} = 0.
\]
Now if we replace the denominator $h$ by $|h|$, nothing changes, since a quantity tends to $0$ if and only if its absolute value does. So this is equivalent to
\[
  \lim_{h \to 0} \frac{f(a + h) - f(a) - f'(a)h}{|h|} = 0.
\]
In other words, the function $f$ is differentiable at $a$ if there is some $A$ such that
\[
  \lim_{h \to 0} \frac{f(a + h) - f(a) - Ah}{|h|} = 0,
\]
and we call $A$ the derivative.

We are now in good shape to generalize. Note that if $f: \mathbb{R}^n \to \mathbb{R}$ is a real-valued function, then $f(a + h) - f(a)$ is a scalar, but $h$ is a vector. So $A$ is not just a number, but a (row) vector. In general, if our function $f: \mathbb{R}^n \to \mathbb{R}^m$ is vector-valued, then our $A$ should be an $m \times n$ matrix. Alternatively, $A$ is a linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$.

Definition (Differentiation in $\mathbb{R}^n$). Let $U \subseteq \mathbb{R}^n$ be open, $f: U \to \mathbb{R}^m$. We say $f$ is differentiable at a point $a \in U$ if there exists a linear map $A: \mathbb{R}^n \to \mathbb{R}^m$ such that
\[
  \lim_{h \to 0} \frac{f(a + h) - f(a) - Ah}{\|h\|} = 0.
\]
We call $A$ the derivative of $f$ at $a$. We write the derivative as $Df(a)$.

This is equivalent to saying
\[
  \lim_{x \to a} \frac{f(x) - f(a) - A(x - a)}{\|x - a\|} = 0.
\]
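To see the definition in action numerically, here is a minimal sketch. The map $f(x, y) = (x^2 + y, xy)$, the point $a = (1, 2)$ and all helper names are illustrative assumptions, not taken from the notes. The candidate $A$ is the matrix of partial derivatives at $a$, and the code evaluates $\|f(a + h) - f(a) - Ah\| / \|h\|$ for shrinking $h$.

```python
import math

def f(x, y):
    # Hypothetical example map f : R^2 -> R^2.
    return (x * x + y, x * y)

def candidate_derivative(a):
    # Candidate for Df(a): the 2 x 2 matrix of partial derivatives at a.
    a1, a2 = a
    return ((2 * a1, 1.0), (a2, a1))

def remainder_ratio(a, h):
    # Computes ||f(a + h) - f(a) - A h|| / ||h||.
    A = candidate_derivative(a)
    fa = f(a[0], a[1])
    fah = f(a[0] + h[0], a[1] + h[1])
    Ah = (A[0][0] * h[0] + A[0][1] * h[1],
          A[1][0] * h[0] + A[1][1] * h[1])
    r = (fah[0] - fa[0] - Ah[0], fah[1] - fa[1] - Ah[1])
    return math.hypot(r[0], r[1]) / math.hypot(h[0], h[1])

a = (1.0, 2.0)
steps = (1e-1, 1e-2, 1e-3)
ratios = [remainder_ratio(a, (s, -s)) for s in steps]
print(ratios)
```

For this particular map the remainder is $(h_1^2, h_1 h_2)$, so the ratio equals $|h_1|$ and shrinks in proportion to the step, which is what the definition requires.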

Note that this is completely consistent with our usual definition in the case where $n = 1$, as we have discussed above, since a linear transformation $\alpha: \mathbb{R} \to \mathbb{R}$ is just given by $\alpha(h) = Ah$ for some real $A \in \mathbb{R}$.

One might instead attempt to define differentiability as follows: for any $f: \mathbb{R}^n \to \mathbb{R}$, we say $f$ is differentiable at $x$ if $f$ is differentiable when restricted to any line passing through $x$. However, this is a weaker notion, and we will later see that if we define differentiability this way, then differentiability will no longer imply continuity, which is bad.

Having defined differentiation, we want to show that the derivative is unique.

Proposition (Uniqueness of derivative). Derivatives are unique.

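Proof. Suppose $A, B: \mathbb{R}^n \to \mathbb{R}^m$ both satisfy the condition
\[
  \lim_{h \to 0} \frac{f(a + h) - f(a) - Ah}{\|h\|} = 0, \qquad
  \lim_{h \to 0} \frac{f(a + h) - f(a) - Bh}{\|h\|} = 0.
\]
By the triangle inequality, we get
\[
  \|(B - A)h\| \leq \|f(a + h) - f(a) - Ah\| + \|f(a + h) - f(a) - Bh\|.
\]
So
\[
  \frac{\|(B - A)h\|}{\|h\|} \to 0
\]
as $h \to 0$. Fix a non-zero $u \in \mathbb{R}^n$ and set $h = tu$ to get
\[
  \frac{\|(B - A)tu\|}{\|tu\|} \to 0
\]
as $t \to 0$. Since $(B - A)$ is linear, we know
\[
  \frac{\|(B - A)tu\|}{\|tu\|} = \frac{\|(B - A)u\|}{\|u\|},
\]
which does not depend on $t$. So $(B - A)u = 0$ for all $u \in \mathbb{R}^n$. So $B = A$.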

Notation. We write $L(\mathbb{R}^n; \mathbb{R}^m)$ for the space of linear maps $A: \mathbb{R}^n \to \mathbb{R}^m$. So $Df(a) \in L(\mathbb{R}^n; \mathbb{R}^m)$.

To avoid having to write limits and divisions all over the place, we have the following convenient notation:

Notation (Little o notation). For any function $\alpha: B_r(0) \subseteq \mathbb{R}^n \to \mathbb{R}^m$, write
\[
  \alpha(h) = o(h)
\]
if
\[
  \frac{\alpha(h)}{\|h\|} \to 0 \text{ as } h \to 0.
\]
In other words, $\alpha \to 0$ faster than $\|h\|$ as $h \to 0$.

Note that officially, $\alpha(h) = o(h)$ as a whole is a piece of notation, and does not represent equality.

Then the condition for differentiability can be written as: $f: U \to \mathbb{R}^m$ is differentiable at $a \in U$ if there is some linear map $A$ with
\[
  f(a + h) - f(a) - Ah = o(h).
\]
Alternatively,
\[
  f(a + h) = f(a) + Ah + o(h).
\]

Note that we require the domain $U$ of $f$ to be open, so that for each $a \in U$, there is a small ball around $a$ on which $f$ is defined, so $f(a + h)$ is defined for sufficiently small $h$. We could relax this condition and consider “one-sided” derivatives instead, but we will not look into these in this course.

We can interpret the definition of differentiability as saying we can find a “good” linear approximation (technically, it is affine, not linear) to the function $f$ near $a$.

While the definition of the derivative is good, it is purely existential. This is unlike the definition of differentiability of real functions, where we are asked to compute an explicit limit: if the limit exists, that’s the derivative; if not, the function is not differentiable. In the higher-dimensional world, this is not the case. The definition gives us no idea where to find the derivative, even if we know it exists. So we would like an explicit formula for it.

The idea is to look at specific “directions” instead of finding the general derivative. As always, let $f: U \to \mathbb{R}^m$ be differentiable at $a \in U$. Fix some $u \in \mathbb{R}^n$, take $h = tu$ (with $t \in \mathbb{R}$). Assuming $u \neq 0$, differentiability tells us
\[
  \lim_{t \to 0} \frac{f(a + tu) - f(a) - Df(a)(tu)}{\|tu\|} = 0.
\]
This is equivalent to saying
\[
  \lim_{t \to 0} \frac{f(a + tu) - f(a) - tDf(a)u}{|t|\|u\|} = 0.
\]
Since $\|u\|$ is fixed, this in turn is equivalent to
\[
  \lim_{t \to 0} \frac{f(a + tu) - f(a) - tDf(a)u}{t} = 0.
\]
This, finally, is equivalent to
\[
  Df(a)u = \lim_{t \to 0} \frac{f(a + tu) - f(a)}{t}.
\]
We derived this assuming $u \neq 0$, but this is trivially true for $u = 0$. So this is valid for all $u$.

This is of the same form as the usual derivative, and it is usually not too difficult to compute this limit. Note, however, that this only says that if the derivative exists, then it is related to the limit as above. Even if the limit exists for all $u$, we still cannot conclude that the derivative exists.

Regardless, even if the derivative does not exist, this limit is still often a useful notion.

Definition (Directional derivative). We write
\[
  D_u f(a) = \lim_{t \to 0} \frac{f(a + tu) - f(a)}{t}
\]
whenever this limit exists. We call $D_u f(a)$ the directional derivative of $f$ at $a \in U$ in the direction of $u \in \mathbb{R}^n$.

By definition, we have
\[
  D_u f(a) = \left.\frac{\mathrm{d}}{\mathrm{d}t}\right|_{t = 0} f(a + tu).
\]
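The directional derivative can be approximated directly from its defining quotient. The sketch below uses a hypothetical example, $f(x, y) = x^2 y$ at $a = (1, 2)$ in the direction $u = (3, 4)$. The function, point, direction and step size are all assumptions made for illustration. For this $f$ the partial derivatives at $a$ are $2xy = 4$ and $x^2 = 1$, so one expects $Df(a)u = 4 \cdot 3 + 1 \cdot 4 = 16$.

```python
def f(x, y):
    # Hypothetical smooth example f : R^2 -> R.
    return x * x * y

def directional_quotient(func, a, u, t):
    # The quotient (f(a + t u) - f(a)) / t from the definition.
    return (func(a[0] + t * u[0], a[1] + t * u[1]) - func(a[0], a[1])) / t

a = (1.0, 2.0)
u = (3.0, 4.0)

# Quotients for shrinking t; they should approach Df(a)u = 16.
quotients = [directional_quotient(f, a, u, t) for t in (1e-2, 1e-4, 1e-6)]
print(quotients)
```

Expanding $f(a + tu)$ by hand gives the quotient $16 + 42t + 36t^2$, which matches the printed values up to rounding.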

Often, it is convenient to focus on the special cases where $u = e_j$, a member of the standard basis for $\mathbb{R}^n$. This is known as the partial derivative. By convention, this is defined for real-valued functions only, but the same definition works for any $\mathbb{R}^m$-valued function.

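Definition (Partial derivative). The $j$th partial derivative of $f: U \to \mathbb{R}$ at $a \in U$ is
\[
  D_{e_j} f(a) = \lim_{t \to 0} \frac{f(a + te_j) - f(a)}{t},
\]
when the limit exists. We often write this as
\[
  D_{e_j} f(a) = D_j f(a) = \frac{\partial f}{\partial x_j}(a).
\]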

Note that these definitions do not require differentiability of $f$ at $a$. We will see some examples shortly. Before that, we first establish some elementary properties of differentiable functions.

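Proposition. Let $U \subseteq \mathbb{R}^n$ be open, $a \in U$.

(i) If $f: U \to \mathbb{R}^m$ is differentiable at $a$, then $f$ is continuous at $a$.

(ii) If we write $f = (f_1, f_2, \cdots, f_m): U \to \mathbb{R}^m$, where each $f_i: U \to \mathbb{R}$, then $f$ is differentiable at $a$ if and only if $f_i$ is differentiable at $a$ for each $i$.

(iii) If $f, g: U \to \mathbb{R}^m$ are both differentiable at $a$, then $\lambda f + \mu g$ is differentiable at $a$ for any $\lambda, \mu \in \mathbb{R}$, with
\[
  D(\lambda f + \mu g)(a) = \lambda Df(a) + \mu Dg(a).
\]

(iv) If $A: \mathbb{R}^n \to \mathbb{R}^m$ is a linear map, then $A$ is differentiable at any $a \in \mathbb{R}^n$ with
\[
  DA(a) = A.
\]

(v) If $f$ is differentiable at $a$, then the directional derivative $D_u f(a)$ exists for all $u \in \mathbb{R}^n$, and in fact
\[
  D_u f(a) = Df(a)u.
\]

(vi) If $f$ is differentiable at $a$, then all partial derivatives $D_j f_i(a)$ exist for $j = 1, \cdots, n$; $i = 1, \cdots, m$, and are given by
\[
  D_j f_i(a) = Df_i(a)e_j.
\]

(vii) If $f$ is differentiable at $a$, let $A = (A_{ij})$ be the matrix representing $Df(a)$ with respect to the standard bases for $\mathbb{R}^n$ and $\mathbb{R}^m$, i.e. for any $h \in \mathbb{R}^n$,
\[
  Df(a)h = Ah.
\]
Then $A$ is given by
\[
  A_{ij} = \langle Df(a)e_j, b_i \rangle = D_j f_i(a),
\]
where $\{e_1, \cdots, e_n\}$ is the standard basis for $\mathbb{R}^n$, and $\{b_1, \cdots, b_m\}$ is the standard basis for $\mathbb{R}^m$.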

The second property is useful, since instead of considering arbitrary $\mathbb{R}^m$-valued functions, we can just look at real-valued functions.

Proof.

(i) By definition, if $f$ is differentiable at $a$, then as $h \to 0$, we know
\[
  f(a + h) - f(a) - Df(a)h \to 0.
\]
Since $Df(a)h \to 0$ as well, we must have $f(a + h) \to f(a)$.

(ii) Exercise on example sheet 4.

(iii) We just have to check this directly. We have
\[
  \frac{(\lambda f + \mu g)(a + h) - (\lambda f + \mu g)(a) - (\lambda Df(a) + \mu Dg(a))h}{\|h\|}
  = \lambda \frac{f(a + h) - f(a) - Df(a)h}{\|h\|} + \mu \frac{g(a + h) - g(a) - Dg(a)h}{\|h\|},
\]
which tends to $0$ as $h \to 0$. So done.

(iv) Since $A$ is linear, we always have $A(a + h) - A(a) - Ah = 0$ for all $h$.

(v) We’ve proved this in the previous discussion.

(vi) This is the special case $u = e_j$ of (v), applied to each component $f_i$, which is differentiable by (ii).

(vii) This follows from the general result for linear maps: for any linear map $A$ represented by $(A_{ij})_{m \times n}$, we have
\[
  A_{ij} = \langle Ae_j, b_i \rangle.
\]
We apply this with $A = Df(a)$, and note that for any $h \in \mathbb{R}^n$,
\[
  Df(a)h = (Df_1(a)h, \cdots, Df_m(a)h).
\]
So $A_{ij} = Df_i(a)e_j = D_j f_i(a)$ by (vi). So done.

The above says differentiability at a point implies the existence of all directional derivatives, which in turn implies the existence of all partial derivatives. The converse does not hold for either of these implications.

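Example. Let $f: \mathbb{R}^2 \to \mathbb{R}$ be defined by
\[
  f(x, y) =
  \begin{cases}
    0 & xy = 0\\
    1 & xy \neq 0
  \end{cases}
\]
Then the partial derivatives are
\[
  \frac{\partial f}{\partial x}(0, 0) = \frac{\partial f}{\partial y}(0, 0) = 0.
\]
In other directions, say $u = (1, 1)$, we have
\[
  \frac{f(0 + tu) - f(0)}{t} = \frac{1}{t},
\]
which diverges as $t \to 0$. So the directional derivative does not exist.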

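Example. Let $f: \mathbb{R}^2 \to \mathbb{R}$ be defined by
\[
  f(x, y) =
  \begin{cases}
    \frac{x^3}{y} & y \neq 0\\
    0 & y = 0
  \end{cases}
\]
Then for $u = (u_1, u_2) \neq 0$ and $t \neq 0$, we can compute
\[
  \frac{f(0 + tu) - f(0)}{t} =
  \begin{cases}
    \frac{t u_1^3}{u_2} & u_2 \neq 0\\
    0 & u_2 = 0
  \end{cases}
\]
So
\[
  D_u f(0) = \lim_{t \to 0} \frac{f(0 + tu) - f(0)}{t} = 0,
\]
and the directional derivative exists. However, the function is not differentiable at $0$, since it is not even continuous at $0$, as
\[
  f(\delta, \delta^4) = \frac{1}{\delta}
\]
diverges as $\delta \to 0$.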
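Both claims in this example can be checked numerically. The sketch below assumes the function is $f(x, y) = x^3 / y$ for $y \neq 0$ and $f(x, 0) = 0$. The formula, the direction $u = (1, 2)$ and the sample values are assumptions made for illustration. Along a fixed direction the difference quotients shrink to $0$, while along the curve $(\delta, \delta^4)$ the values of $f$ blow up.

```python
def f(x, y):
    # Assumed form of the example: x^3 / y off the x-axis, 0 on it.
    return x ** 3 / y if y != 0.0 else 0.0

def difference_quotient(u, t):
    # (f(0 + t u) - f(0)) / t at the origin.
    return (f(t * u[0], t * u[1]) - f(0.0, 0.0)) / t

samples = (1e-1, 1e-2, 1e-3)

# Along u = (1, 2) the quotient is t * u1^3 / u2 = t / 2, which tends to 0.
quotients = [difference_quotient((1.0, 2.0), t) for t in samples]

# Along the curve (d, d^4) the value is 1 / d, which is unbounded near 0.
curve_values = [f(d, d ** 4) for d in samples]

print(quotients)
print(curve_values)
```

The first list decreases towards $0$ and the second grows like $1/\delta$, so all directional derivatives at $0$ can vanish while $f$ still fails to be continuous there.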

Example. Let $f: \mathbb{R}^2 \to \mathbb{R}$ be defined by
\[
  f(x, y) =
  \begin{cases}
    \frac{x^3}{x^2 + y^2} & (x, y) \neq (0, 0)\\
    0 & (x, y) = (0, 0)
  \end{cases}
\]
It is clear that $f$ is continuous at points other than $0$, and $f$ is also continuous at $0$ since $|f(x, y)| \leq |x|$. We can compute the partial derivatives as
\[
  \frac{\partial f}{\partial x}(0, 0) = 1, \quad \frac{\partial f}{\partial y}(0, 0) = 0.
\]
In fact, we can compute the difference quotient in the direction $u = (u_1, u_2) \neq 0$ to be
\[
  \frac{f(0 + tu) - f(0)}{t} = \frac{u_1^3}{u_1^2 + u_2^2}.
\]
So we have
\[
  D_u f(0) = \frac{u_1^3}{u_1^2 + u_2^2}.
\]
We can now immediately conclude that $f$ is not differentiable at $0$, since if it were, then we would have
\[
  D_u f(0) = Df(0)u,
\]
which should be a linear expression in $u$, but this is not.

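Alternatively, if $f$ were differentiable, then we would have
\[
  Df(0)h =
  \begin{pmatrix}
    1 & 0
  \end{pmatrix}
  \begin{pmatrix}
    h_1\\
    h_2
  \end{pmatrix}
  = h_1.
\]
However, we have
\[
  \frac{f(0 + h) - f(0) - Df(0)h}{\|h\|} = \frac{\frac{h_1^3}{h_1^2 + h_2^2} - h_1}{\sqrt{h_1^2 + h_2^2}} = -\frac{h_1 h_2^2}{(h_1^2 + h_2^2)^{3/2}},
\]
which does not tend to $0$ as $h \to 0$. For example, if $h = (t, t)$, this quotient is
\[
  -\frac{1}{2^{3/2}}
\]
for $t > 0$.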

To decide if a function is differentiable, the first step would be to compute the partial derivatives. If they don’t exist, then we immediately know the function is not differentiable. However, if they do, then we have a candidate for what the derivative is, and we plug it into the definition to check if it actually is the derivative.

This is a cumbersome thing to do. It turns out that while existence of partial derivatives does not imply differentiability in general, we can get differentiability if we add some mild extra conditions.

Theorem. Let $U \subseteq \mathbb{R}^n$ be open, $f: U \to \mathbb{R}^m$. Let $a \in U$. Suppose there exists some open ball $B_r(a) \subseteq U$ such that

(i) $D_j f_i(x)$ exists for every $x \in B_r(a)$ and $1 \leq i \leq m$, $1 \leq j \leq n$

(ii) $D_j f_i$ are continuous at $a$ for all $1 \leq i \leq m$, $1 \leq j \leq n$.

Then $f$ is differentiable at $a$.

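Proof. It suffices to prove this for $m = 1$, by part (ii) of the long proposition. For each $h = (h_1, \cdots, h_n) \in \mathbb{R}^n$ with $\|h\| < r$, we have
\[
  f(a + h) - f(a) = \sum_{j = 1}^n \left[ f(a + h_1 e_1 + \cdots + h_j e_j) - f(a + h_1 e_1 + \cdots + h_{j - 1} e_{j - 1}) \right].
\]
Now for convenience, we can write
\[
  h^{(j)} = h_1 e_1 + \cdots + h_j e_j = (h_1, \cdots, h_j, 0, \cdots, 0),
\]
with $h^{(0)} = 0$. Then we have
\begin{align*}
  f(a + h) - f(a) &= \sum_{j = 1}^n \left[ f(a + h^{(j)}) - f(a + h^{(j - 1)}) \right]\\
  &= \sum_{j = 1}^n \left[ f(a + h^{(j - 1)} + h_j e_j) - f(a + h^{(j - 1)}) \right].
\end{align*}
Note that in each term, we are just moving along the coordinate axes. Since the partial derivatives exist, the mean value theorem of single-variable calculus applied to
\[
  g(t) = f(a + h^{(j - 1)} + te_j)
\]
on the interval $t \in [0, h_j]$ (or $[h_j, 0]$ if $h_j < 0$) allows us to write this as
\begin{align*}
  f(a + h) - f(a) &= \sum_{j = 1}^n h_j D_j f(a + h^{(j - 1)} + \theta_j h_j e_j)\\
  &= \sum_{j = 1}^n h_j D_j f(a) + \sum_{j = 1}^n h_j \left[ D_j f(a + h^{(j - 1)} + \theta_j h_j e_j) - D_j f(a) \right]
\end{align*}
for some $\theta_j \in (0, 1)$.

Note that $D_j f(a + h^{(j - 1)} + \theta_j h_j e_j) - D_j f(a) \to 0$ as $h \to 0$ since the partial derivatives are continuous at $a$. Since $|h_j| \leq \|h\|$, the second term is $o(h)$. So $f$ is differentiable at $a$ with
\[
  Df(a)h = \sum_{j = 1}^n D_j f(a) h_j.
\]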

This is a very useful result. For example, we can now immediately conclude that the function
\[
  \begin{pmatrix}
    x\\
    y\\
    z
  \end{pmatrix}
  \mapsto
  \begin{pmatrix}
    3x^2 + 4\sin y + e^{6z}\\
    xyze^{14x}
  \end{pmatrix}
\]
is differentiable everywhere, since it has continuous partial derivatives. This is much better than messing with the definition itself.
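As a sanity check on the formula $A_{ij} = D_j f_i(a)$, the sketch below compares a hand-computed matrix of partial derivatives against difference quotients along the coordinate axes. It assumes the map is $(x, y, z) \mapsto (3x^2 + 4 \sin y + e^{6z},\ xyze^{14x})$, which is an assumed reading of the example above. The sample point, the step size and the function names are illustrative choices.

```python
import math

def f(x, y, z):
    # Assumed reading of the example map R^3 -> R^2.
    return (3 * x ** 2 + 4 * math.sin(y) + math.exp(6 * z),
            x * y * z * math.exp(14 * x))

def jacobian(x, y, z):
    # Hand-computed partial derivatives: entry [i][j] is D_j f_i.
    e = math.exp(14 * x)
    return [[6 * x, 4 * math.cos(y), 6 * math.exp(6 * z)],
            [y * z * e * (1 + 14 * x), x * z * e, x * y * e]]

def finite_difference_jacobian(p, t=1e-6):
    # Approximates each D_j f_i by the quotient (f(p + t e_j) - f(p)) / t.
    base = f(*p)
    columns = []
    for j in range(3):
        q = list(p)
        q[j] += t
        moved = f(*q)
        columns.append([(moved[i] - base[i]) / t for i in range(2)])
    # columns[j][i] holds D_j f_i; transpose into a 2 x 3 matrix.
    return [[columns[j][i] for j in range(3)] for i in range(2)]

p = (0.1, 0.2, 0.3)
exact = jacobian(*p)
approx = finite_difference_jacobian(p)
max_error = max(abs(exact[i][j] - approx[i][j])
                for i in range(2) for j in range(3))
print(max_error)
```

The two matrices should agree up to the discretization error of the one-sided quotient, which is consistent with the derivative being represented by the $2 \times 3$ matrix of partial derivatives.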