1 Derivatives and coordinates

IA Vector Calculus



1.1 Derivative of functions
We used to define a derivative as the limit of a quotient and a function is differ-
entiable if the derivative exists. However, this obviously cannot be generalized
to vector-valued functions, since you cannot divide by vectors. So we want
an alternative definition of differentiation, which can be easily generalized to
vectors.
Recall that if a function f is differentiable at x, then for a small perturbation δx, we have

δf := f(x + δx) − f(x) = f′(x) δx + o(δx),
which says that the resulting change in f is approximately proportional to δx (as opposed to 1/x or something else). It can be easily shown that the converse is true: if f satisfies this relation, then f is differentiable.
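As a quick numerical illustration (a minimal sketch in Python, not from the notes; f(x) = x² is an arbitrary example), the remainder f(x + δx) − f(x) − f′(x)δx does shrink faster than δx:

# Minimal sketch (assumed example): check that the remainder
# f(x + dx) - f(x) - f'(x) dx is o(dx) for f(x) = x^2.
f = lambda x: x**2
fprime = lambda x: 2*x

x = 1.0
for dx in (1e-1, 1e-2, 1e-3):
    remainder = f(x + dx) - f(x) - fprime(x)*dx
    print(dx, remainder/dx)   # ratio tends to 0 as dx -> 0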
This definition is more easily extended to vector functions. We say a function F is differentiable if, when x is perturbed by δx, the resulting change is “something” times δx plus an o(δx) error term. In the most general case, δx will be a vector and that “something” will be a matrix. Then that “something” will be what we call the derivative.
Vector functions R → R^n
We start with the simple case of vector functions.
Definition (Vector function). A vector function is a function F : R → R^n.
This takes in a number and returns a vector. For example, it can map a time
to the velocity of a particle at that time.
Definition (Derivative of vector function). A vector function F(x) is differentiable if

δF := F(x + δx) − F(x) = F′(x) δx + o(δx)

for some F′(x). F′(x) is called the derivative of F(x).
We don’t have anything new and special here, since we might as well have defined F′(x) as

F′ = dF/dx = lim_{δx→0} (1/δx) [F(x + δx) − F(x)],

which is easily shown to be equivalent to the above definition.
Using differential notation, the differentiability condition can be written as

dF = F′(x) dx.
Given a basis e_i that is independent of x, vector differentiation is performed componentwise, i.e.

Proposition. F′(x) = F′_i(x) e_i.
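As an aside (not part of the notes), the componentwise rule is easy to check symbolically; the following is a minimal sketch using sympy, with F(t) = (cos t, sin t, t) as an arbitrary example:

import sympy as sp

t = sp.symbols('t')
F = sp.Matrix([sp.cos(t), sp.sin(t), t])   # example vector function F(t)
Fprime = F.diff(t)                         # differentiates each component
print(Fprime.T)                            # Matrix([[-sin(t), cos(t), 1]])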
Leibniz identities hold for the products of scalar and vector functions.

Proposition.

d(fg)/dt = (df/dt) g + f (dg/dt)
d(g · h)/dt = (dg/dt) · h + g · (dh/dt)
d(g × h)/dt = (dg/dt) × h + g × (dh/dt)

Note that the order of multiplication must be retained in the case of the cross product.
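These identities can be verified componentwise; here is a minimal symbolic check with sympy (the particular g and h below are arbitrary examples, not from the notes):

import sympy as sp

t = sp.symbols('t')
g = sp.Matrix([t, t**2, 1])
h = sp.Matrix([sp.sin(t), sp.cos(t), t**3])

lhs = g.cross(h).diff(t)                          # d/dt (g x h)
rhs = g.diff(t).cross(h) + g.cross(h.diff(t))     # g' x h + g x h'
print(sp.simplify(lhs - rhs))                     # zero vector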
Example. Consider a particle with mass m. It has position r(t), velocity ṙ(t) and acceleration r̈(t). Its momentum is p = m ṙ(t).
Note that derivatives with respect to t are usually denoted by dots instead of dashes.
If F(r) is the force on a particle, then Newton’s second law states that

ṗ = m r̈ = F.
We can define the angular momentum about the origin to be

L = r × p = m r × ṙ.
If we want to know how the angular momentum changes over time, then

L̇ = m ṙ × ṙ + m r × r̈ = m r × r̈ = r × F,

since ṙ × ṙ = 0. This is the torque of F about the origin.
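A symbolic check of this computation (a sketch with sympy, not part of the notes, for a general position r(t)):

import sympy as sp

t, m = sp.symbols('t m')
x, y, z = (sp.Function(name)(t) for name in ('x', 'y', 'z'))
r = sp.Matrix([x, y, z])            # position
p = m * r.diff(t)                   # momentum p = m r'
L = r.cross(p)                      # angular momentum L = r x p
F = p.diff(t)                       # Newton's second law: F = p'
print(sp.simplify(L.diff(t) - r.cross(F)))   # zero vector, so L' = r x F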
Scalar functions R^n → R
We can also define derivatives for a different kind of function:
Definition. A scalar function is a function f : R^n → R.
A scalar function takes in a position and gives you a number, e.g. the potential
energy of a particle at different positions.
Before we define the derivative of a scalar function, we have to first define
what it means to take a limit of a vector.
Definition (Limit of vector). The limit of vectors is defined using the norm. So v → c iff |v − c| → 0. Similarly, f(r) = o(r) means |f(r)|/|r| → 0 as r → 0.
Definition (Gradient of scalar function). A scalar function f(r) is differentiable at r if

δf := f(r + δr) − f(r) = (∇f) · δr + o(δr)

for some vector ∇f, the gradient of f at r.
Here we have a fancy name “gradient” for the derivative. But we will soon
give up on finding fancy names and just call everything the “derivative”!
Note also that here we genuinely need the new notion of derivative, since “dividing by δr” makes no sense at all!
The above definition considers the case where δr comes in all directions. What if we only care about the case where δr is in some particular direction n? For example, maybe f is the potential of a particle that is confined to move in one straight line only.
Then taking δr = hn, with n a unit vector,

f(r + hn) − f(r) = ∇f · (hn) + o(h) = h (∇f · n) + o(h),

which gives

Definition (Directional derivative). The directional derivative of f along n is

n · ∇f = lim_{h→0} (1/h) [f(r + hn) − f(r)].

It refers to how fast f changes when we move in the direction of n.
Using this expression, the directional derivative is maximized when n is in the same direction as ∇f (then n · ∇f = |∇f|). So ∇f points in the direction of greatest slope.
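To see this numerically, here is a minimal sketch (not from the notes) using numpy finite differences, with the arbitrary example f(x, y) = x² + 3y at the point (1, 0), where ∇f = (2, 3):

import numpy as np

f = lambda x, y: x**2 + 3*y
point = np.array([1.0, 0.0])
grad = np.array([2.0, 3.0])                 # grad f at (1, 0)

h = 1e-6
directions = [np.array([1.0, 0.0]),         # along x
              np.array([0.0, 1.0]),         # along y
              grad / np.linalg.norm(grad)]  # along grad f
for n in directions:
    rate = (f(*(point + h*n)) - f(*point)) / h   # finite-difference directional derivative
    print(round(rate, 4), n @ grad)              # matches n . grad f; largest along grad f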
How do we evaluate ∇f? Suppose we have an orthonormal basis e_i. Setting n = e_i in the above equation, we obtain

e_i · ∇f = lim_{h→0} (1/h) [f(r + h e_i) − f(r)] = ∂f/∂x_i.

Hence
Theorem. The gradient is

∇f = (∂f/∂x_i) e_i.
Hence we can write the condition of differentiability as

δf = (∂f/∂x_i) δx_i + o(δx).
In differential notation, we write

df = ∇f · dr = (∂f/∂x_i) dx_i,

which is the chain rule for partial derivatives.
Example. Take f(x, y, z) = x + e^{xy} sin z. Then

∇f = (∂f/∂x, ∂f/∂y, ∂f/∂z) = (1 + y e^{xy} sin z, x e^{xy} sin z, e^{xy} cos z).
At (x, y, z) = (0, 1, 0), ∇f = (1, 0, 1). So f increases/decreases most rapidly for n = ±(1/√2)(1, 0, 1), with a rate of change of ±√2. There is no change in f if n is perpendicular to ±(1/√2)(1, 0, 1).
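The same computation can be reproduced symbolically; a minimal sketch with sympy (not part of the notes):

import sympy as sp

x, y, z = sp.symbols('x y z')
f = x + sp.exp(x*y) * sp.sin(z)

grad = [sp.diff(f, v) for v in (x, y, z)]
print(grad)                                        # [y*exp(x*y)*sin(z) + 1, x*exp(x*y)*sin(z), exp(x*y)*cos(z)]
print([g.subs({x: 0, y: 1, z: 0}) for g in grad])  # [1, 0, 1]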
Now suppose we have a scalar function f(r) and we want to consider the rate of change along a path r(u). A change δu produces a change δr = r′ δu + o(δu), and

δf = ∇f · δr + o(|δr|) = ∇f · r′(u) δu + o(δu).
This shows that f is differentiable as a function of u and

Theorem (Chain rule). Given a function f(r(u)),

df/du = ∇f · dr/du = (∂f/∂x_i) (dx_i/du).
Note that if we drop the du, we simply get

df = ∇f · dr = (∂f/∂x_i) dx_i,

which is what we’ve previously had.
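A symbolic check of the chain rule along a path (a sketch with sympy; the particular f and r(u) are arbitrary examples, not from the notes):

import sympy as sp

u, x, y = sp.symbols('u x y')
f = x**2 * y                              # example scalar function
r = sp.Matrix([sp.cos(u), sp.sin(u)])     # example path r(u)

lhs = f.subs({x: r[0], y: r[1]}).diff(u)  # d/du of f(r(u)), computed directly

grad = sp.Matrix([f.diff(x), f.diff(y)])  # grad f
rhs = (grad.T * r.diff(u))[0].subs({x: r[0], y: r[1]})   # grad f . dr/du on the path

print(sp.simplify(lhs - rhs))             # 0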
Vector fields R^n → R^m
We are now ready to tackle the general case, which is given the fancy name of vector fields.
Definition (Vector field). A vector field is a function F : R^n → R^m.
Definition (Derivative of vector field). A vector field F : R^n → R^m is differentiable if

δF := F(x + δx) − F(x) = M δx + o(δx)

for some m × n matrix M. M is the derivative of F.
As promised, M does not have a fancy name.
Given an arbitrary function F : R^n → R^m that maps x ↦ y and a choice of basis, we can write F as a set of m functions y_j = F_j(x) such that y = (y_1, y_2, ..., y_m). Then

dy_j = (∂F_j/∂x_i) dx_i,
and we can write the derivative as

Theorem. The derivative of F is given by

M_ji = ∂y_j/∂x_i.
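For instance, a minimal sketch with sympy (the particular F below is an arbitrary example, not from the notes) computes exactly this matrix of partial derivatives:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
F = sp.Matrix([x1*x2, x1 + x2, sp.sin(x1)])   # example F : R^2 -> R^3

M = F.jacobian([x1, x2])    # M[j, i] = dy_j / dx_i
print(M)                    # Matrix([[x2, x1], [1, 1], [cos(x1), 0]])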
Note that we could have used this as the definition of the derivative. However, the original definition is superior because it does not require a choice of coordinate system.
Definition. A function is smooth if it can be differentiated any number of times. This requires that all partial derivatives exist and are totally symmetric in i, j and k (i.e. the differential operator is commutative).
The functions we will consider will be smooth except where things obviously
go wrong (e.g. f(x) = 1/x at x = 0).
Theorem (Chain rule). Suppose g : R^p → R^n and f : R^n → R^m. Suppose that the coordinates of the vectors in R^p, R^n and R^m are u_a, x_i and y_r respectively. By the chain rule,

∂y_r/∂u_a = (∂y_r/∂x_i)(∂x_i/∂u_a),
with summation implied. Writing in matrix form,

M(f ∘ g)_ra = M(f)_ri M(g)_ia.
Alternatively, in operator form,

∂/∂u_a = (∂x_i/∂u_a) ∂/∂x_i.
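The matrix form of the chain rule can also be checked symbolically; a minimal sketch with sympy (the particular f and g are arbitrary examples, not from the notes), where M(f) is evaluated at x = g(u):

import sympy as sp

u1, u2, x1, x2 = sp.symbols('u1 u2 x1 x2')

g = sp.Matrix([u1 + u2, u1*u2])             # g : R^2 -> R^2
f = sp.Matrix([x1**2, sp.sin(x2), x1*x2])   # f : R^2 -> R^3

# Jacobian of the composite f(g(u)), computed directly
M_comp = f.subs({x1: g[0], x2: g[1]}).jacobian([u1, u2])

# product of Jacobians, with M(f) evaluated on x = g(u)
M_f = f.jacobian([x1, x2]).subs({x1: g[0], x2: g[1]})
M_g = g.jacobian([u1, u2])

print(sp.simplify(M_comp - M_f * M_g))      # zero matrix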