1 Derivatives and coordinates
1.1 Derivative of functions
We used to define a derivative as the limit of a quotient, and a function is differentiable if the derivative exists. However, this obviously cannot be generalized to vector-valued functions, since you cannot divide by vectors. So we want an alternative definition of differentiation which can be easily generalized to vectors.
Recall that if a function f is differentiable at x, then for a small perturbation δx, we have
\[
  \delta f \overset{\text{def}}{=} f(x + \delta x) - f(x) = f'(x)\,\delta x + o(\delta x),
\]
which says that the resulting change in f is approximately proportional to δx (as opposed to 1/δx or something else). It can be easily shown that the converse is true: if f satisfies this relation, then f is differentiable.
This definition is more easily extended to vector functions. We say a function F is differentiable if, when x is perturbed by δx, the resulting change is “something” times δx plus an o(δx) error term. In the most general case, δx will be a vector and that “something” will be a matrix. That “something” is then what we call the derivative.
Vector functions R → R^n
We start with the simple case of vector functions.
Definition (Vector function). A vector function is a function F : R → R^n.
This takes in a number and returns a vector. For example, it can map a time
to the velocity of a particle at that time.
Definition (Derivative of vector function). A vector function F(x) is differentiable if
\[
  \delta F \overset{\text{def}}{=} F(x + \delta x) - F(x) = F'(x)\,\delta x + o(\delta x)
\]
for some F'(x). We call F'(x) the derivative of F(x).
We don’t have anything new or special here, since we might as well have defined F'(x) as
\[
  F' = \frac{\mathrm{d}F}{\mathrm{d}x} = \lim_{\delta x \to 0} \frac{1}{\delta x}\left[F(x + \delta x) - F(x)\right],
\]
which is easily shown to be equivalent to the above definition.
Using differential notation, the differentiability condition can be written as
\[
  \mathrm{d}F = F'(x)\,\mathrm{d}x.
\]
Given a basis e_i that is independent of x, vector differentiation is performed componentwise, i.e.
Proposition.
\[
  F'(x) = F_i'(x)\, e_i.
\]
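As a quick illustration of componentwise differentiation, here is a minimal sympy sketch; the particular F used is made up for the example.

import sympy as sp

x = sp.symbols('x')
# A made-up example vector function F : R -> R^3.
F = sp.Matrix([sp.sin(x), x**3, sp.exp(2*x)])
# Differentiation acts on each component separately.
print(F.diff(x))   # Matrix([[cos(x)], [3*x**2], [2*exp(2*x)]])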
Leibniz identities hold for the products of scalar and vector functions.
Proposition.
\[
\begin{aligned}
  \frac{\mathrm{d}}{\mathrm{d}t}(fg) &= \frac{\mathrm{d}f}{\mathrm{d}t}\,g + f\,\frac{\mathrm{d}g}{\mathrm{d}t}\\
  \frac{\mathrm{d}}{\mathrm{d}t}(g \cdot h) &= \frac{\mathrm{d}g}{\mathrm{d}t} \cdot h + g \cdot \frac{\mathrm{d}h}{\mathrm{d}t}\\
  \frac{\mathrm{d}}{\mathrm{d}t}(g \times h) &= \frac{\mathrm{d}g}{\mathrm{d}t} \times h + g \times \frac{\mathrm{d}h}{\mathrm{d}t}
\end{aligned}
\]
Note that the order of multiplication must be retained in the case of the cross
product.
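These product rules are easy to verify symbolically. Below is a small sympy sketch; the vector functions g and h are arbitrary examples, not taken from the text.

import sympy as sp

t = sp.symbols('t')
# Arbitrary example vector functions of t.
g = sp.Matrix([sp.sin(t), t**2, sp.exp(t)])
h = sp.Matrix([t, sp.cos(t), t**3])

# Dot product rule: d/dt (g . h) = g' . h + g . h'
lhs_dot = sp.diff(g.dot(h), t)
rhs_dot = g.diff(t).dot(h) + g.dot(h.diff(t))
print(sp.simplify(lhs_dot - rhs_dot))      # 0

# Cross product rule: d/dt (g x h) = g' x h + g x h' (order must be kept)
lhs_cross = g.cross(h).diff(t)
rhs_cross = g.diff(t).cross(h) + g.cross(h.diff(t))
print(sp.simplify(lhs_cross - rhs_cross))  # zero vector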
Example. Consider a particle with mass m. It has position r(t), velocity ṙ(t) and acceleration r̈(t). Its momentum is p = mṙ(t).

Note that derivatives with respect to t are usually denoted by dots instead of dashes.
If F(r) is the force on a particle, then Newton’s second law states that
\[
  \dot{p} = m\ddot{r} = F.
\]
We can define the angular momentum about the origin to be
\[
  L = r \times p = m\,r \times \dot{r}.
\]
If we want to know how the angular momentum changes over time, then
\[
  \dot{L} = m\,\dot{r} \times \dot{r} + m\,r \times \ddot{r} = m\,r \times \ddot{r} = r \times F,
\]
which is the torque of F about the origin. (The first term vanishes because ṙ × ṙ = 0.)
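This computation can also be checked symbolically. The sketch below uses sympy with an arbitrary made-up trajectory r(t); it is only an illustration of the identity L̇ = r × F.

import sympy as sp

t, m = sp.symbols('t m')
# An arbitrary example trajectory, chosen only for illustration.
r = sp.Matrix([sp.cos(t), sp.sin(t), t])
p = m * r.diff(t)                        # momentum p = m rdot
L = r.cross(p)                           # angular momentum L = r x p
F = p.diff(t)                            # Newton's second law: F = pdot
torque = r.cross(F)                      # torque of F about the origin
print(sp.simplify(L.diff(t) - torque))   # zero vector, so Ldot = r x F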
Scalar functions R^n → R
We can also define derivatives for a different kind of function:
Definition. A scalar function is a function f : R^n → R.
A scalar function takes in a position and gives you a number, e.g. the potential
energy of a particle at different positions.
Before we define the derivative of a scalar function, we have to first define
what it means to take a limit of a vector.
Definition (Limit of vector). The limit of vectors is defined using the norm. So v → c iff |v − c| → 0. Similarly, f(r) = o(r) means |f(r)|/|r| → 0 as r → 0.
Definition (Gradient of scalar function). A scalar function f(r) is differentiable at r if
\[
  \delta f \overset{\text{def}}{=} f(r + \delta r) - f(r) = (\nabla f) \cdot \delta r + o(\delta r)
\]
for some vector ∇f, the gradient of f at r.
Here we have a fancy name “gradient” for the derivative. But we will soon
give up on finding fancy names and just call everything the “derivative”!
Note also that here we genuinely need the new notion of derivative, since
“dividing by δr” makes no sense at all!
The above definition considers perturbations δr in all possible directions. What if we only care about the case where δr is in some particular direction n? For example, maybe f is the potential of a particle that is confined to move along one straight line only.

Then taking δr = hn, with n a unit vector,
\[
  f(r + hn) - f(r) = \nabla f \cdot (hn) + o(h) = h\,(\nabla f \cdot n) + o(h),
\]
which gives
Definition (Directional derivative). The directional derivative of f along n is
\[
  n \cdot \nabla f = \lim_{h \to 0} \frac{1}{h}\left[f(r + hn) - f(r)\right].
\]
It measures how fast f changes when we move in the direction of n.
Using this expression, the directional derivative is maximized when n is in the same direction as ∇f (then n · ∇f = |∇f|). So ∇f points in the direction of greatest slope.
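Here is a quick numerical sanity check of these statements, using numpy and a toy function f(x, y, z) = x^2 + yz (not from the text), whose gradient is (2x, z, y).

import numpy as np

def f(r):
    x, y, z = r
    return x**2 + y*z

def grad_f(r):
    x, y, z = r
    return np.array([2*x, z, y])

r = np.array([1.0, 2.0, 3.0])
h = 1e-6
g = grad_f(r)
n = g / np.linalg.norm(g)            # unit vector along the gradient
fd = (f(r + h*n) - f(r)) / h         # finite-difference directional derivative
# All three numbers agree (up to O(h)): the maximal rate of change is |grad f|.
print(fd, n @ g, np.linalg.norm(g))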
How do we evaluate ∇f? Suppose we have an orthonormal basis e_i. Setting n = e_i in the above equation, we obtain
\[
  e_i \cdot \nabla f = \lim_{h \to 0} \frac{1}{h}\left[f(r + h e_i) - f(r)\right] = \frac{\partial f}{\partial x_i}.
\]
Hence
Theorem. The gradient is
\[
  \nabla f = \frac{\partial f}{\partial x_i}\, e_i.
\]
Hence we can write the condition of differentiability as
\[
  \delta f = \frac{\partial f}{\partial x_i}\,\delta x_i + o(\delta x).
\]
In differential notation, we write
\[
  \mathrm{d}f = \nabla f \cdot \mathrm{d}r = \frac{\partial f}{\partial x_i}\,\mathrm{d}x_i,
\]
which is the chain rule for partial derivatives.
Example. Take f(x, y, z) = x + e^{xy} sin z. Then
\[
  \nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right)
           = (1 + y e^{xy} \sin z,\ x e^{xy} \sin z,\ e^{xy} \cos z).
\]
At (x, y, z) = (0, 1, 0), we have ∇f = (1, 0, 1). So f increases/decreases most rapidly for n = ±(1/√2)(1, 0, 1), with a rate of change of ±√2. There is no change in f if n is perpendicular to (1/√2)(1, 0, 1).
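To double-check this example by machine, here is a short, purely illustrative sympy computation.

import sympy as sp

x, y, z = sp.symbols('x y z')
f = x + sp.exp(x*y) * sp.sin(z)           # the example function above
grad = sp.Matrix([f.diff(x), f.diff(y), f.diff(z)])
print(grad.T)                             # (1 + y*exp(x*y)*sin(z), x*exp(x*y)*sin(z), exp(x*y)*cos(z))
g0 = grad.subs({x: 0, y: 1, z: 0})
print(g0.T, g0.norm())                    # (1, 0, 1) and sqrt(2), the maximal rate of change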
Now suppose we have a scalar function f(r) and we want to consider the rate of change along a path r(u). A change δu produces a change δr = r'(u) δu + o(δu), and
\[
  \delta f = \nabla f \cdot \delta r + o(|\delta r|) = \nabla f \cdot r'(u)\,\delta u + o(\delta u).
\]
This shows that f is differentiable as a function of u and
Theorem (Chain rule). Given a function f(r(u)),
\[
  \frac{\mathrm{d}f}{\mathrm{d}u} = \nabla f \cdot \frac{\mathrm{d}r}{\mathrm{d}u} = \frac{\partial f}{\partial x_i}\,\frac{\mathrm{d}x_i}{\mathrm{d}u}.
\]
Note that if we drop the du, we simply get
\[
  \mathrm{d}f = \nabla f \cdot \mathrm{d}r = \frac{\partial f}{\partial x_i}\,\mathrm{d}x_i,
\]
which is what we’ve previously had.
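The chain rule along a path is also easy to verify symbolically. In the sketch below the path r(u) is an arbitrary made-up example; f is the scalar function from the example above.

import sympy as sp

u, x, y, z = sp.symbols('u x y z')
f = x + sp.exp(x*y) * sp.sin(z)                  # scalar function from the example above
path = {x: sp.cos(u), y: sp.sin(u), z: u}        # an arbitrary example path r(u)

lhs = sp.diff(f.subs(path), u)                   # differentiate f(r(u)) directly
rhs = sum(f.diff(v).subs(path) * sp.diff(path[v], u) for v in (x, y, z))
print(sp.simplify(lhs - rhs))                    # 0, confirming df/du = (df/dx_i)(dx_i/du)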
Vector fields R^n → R^m
We are now ready to tackle the general case, which is given the fancy name of vector fields.
Definition (Vector field). A vector field is a function F : R^n → R^m.
Definition (Derivative of vector field). A vector field F : R^n → R^m is differentiable if
\[
  \delta F \overset{\text{def}}{=} F(x + \delta x) - F(x) = M\,\delta x + o(\delta x)
\]
for some m × n matrix M. We call M the derivative of F.
As promised, M does not have a fancy name.
Given an arbitrary function F : R^n → R^m that maps x ↦ y and a choice of basis, we can write F as a set of m functions y_j = F_j(x) such that y = (y_1, y_2, ··· , y_m). Then
\[
  \mathrm{d}y_j = \frac{\partial F_j}{\partial x_i}\,\mathrm{d}x_i,
\]
and we can write the derivative as
Theorem. The derivative of F is given by
\[
  M_{ji} = \frac{\partial y_j}{\partial x_i}.
\]
Note that we could have used this as the definition of the derivative. However, the original definition is superior because it does not require a choice of coordinate system.
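In practice this matrix of partial derivatives is exactly what computer algebra systems compute. A minimal sympy sketch, with an example map F : R^3 → R^2 chosen only for illustration:

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
# Example vector field F : R^3 -> R^2 (made up for illustration).
F = sp.Matrix([x1 * x2, sp.sin(x2) + x3**2])
X = sp.Matrix([x1, x2, x3])
M = F.jacobian(X)    # the 2 x 3 matrix with entries M_ji = dF_j/dx_i
print(M)             # Matrix([[x2, x1, 0], [0, cos(x2), 2*x3]])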
Definition.
A function is smooth if it can be differentiated any number of times.
This requires that all partial derivatives exist and are totally symmetric in i, j and k (i.e. the partial derivative operators commute).
The functions we will consider will be smooth except where things obviously
go wrong (e.g. f(x) = 1/x at x = 0).
Theorem (Chain rule). Suppose g : R^p → R^n and f : R^n → R^m. Suppose that the coordinates of the vectors in R^p, R^n and R^m are u_a, x_i and y_r respectively. By the chain rule,
\[
  \frac{\partial y_r}{\partial u_a} = \frac{\partial y_r}{\partial x_i}\,\frac{\partial x_i}{\partial u_a},
\]
with summation implied. Writing in matrix form,
\[
  M(f \circ g)_{ra} = M(f)_{ri}\, M(g)_{ia}.
\]
Alternatively, in operator form,
\[
  \frac{\partial}{\partial u_a} = \frac{\partial x_i}{\partial u_a}\,\frac{\partial}{\partial x_i}.
\]
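To see the matrix form of the chain rule in action, here is a final sympy sketch. The maps g : R^2 → R^3 and f : R^3 → R^2 are arbitrary examples; the check confirms that the Jacobian of f ∘ g equals the product of the Jacobians.

import sympy as sp

u1, u2 = sp.symbols('u1 u2')
x1, x2, x3 = sp.symbols('x1 x2 x3')

g = sp.Matrix([u1 + u2, u1 * u2, sp.sin(u1)])    # example g : R^2 -> R^3
f = sp.Matrix([x1**2 + x3, x2 * x3])             # example f : R^3 -> R^2

U = sp.Matrix([u1, u2])
X = sp.Matrix([x1, x2, x3])
sub = {x1: g[0], x2: g[1], x3: g[2]}             # substitute x = g(u)

M_comp = f.subs(sub).jacobian(U)                 # Jacobian of the composition f(g(u))
M_prod = f.jacobian(X).subs(sub) * g.jacobian(U) # M(f) evaluated at x = g(u), times M(g)
print(sp.simplify(M_comp - M_prod))              # zero 2 x 2 matrix: M(f o g) = M(f) M(g)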