IB Variational Principles - Multivariate calculus

1Multivariate calculus

IB Variational Principles

1.2 Convex functions

1.2.1 Convexity

Convex functions is an important class of functions that has a lot of nice

properties. For example, stationary points of convex functions are all minima,

and a convex function can have at most one minimum value. To define convex

functions, we need to first define a convex set.

Definition

(Convex set)

A set

S ⊆ R

is convex if for any distinct

x, y ∈

S, t ∈

1), we have (1

−t

)

ty ∈ S

. Alternatively, any line joining two points

in S lies completely within S.

non-convex convex

Definition (Convex function). A function f : R

→ R is convex if

(i) The domain D(f) is convex

(ii) The function f lies below (or on) all its chords, i.e.

f((1 − t)x + ty) ≤ (1 −t)f(x) + tf(y) (∗)

for all x, y ∈ D(f), t ∈ (0, 1).

A function is strictly convex if the inequality is strict, i.e.

f((1 − t)x + ty) < (1 −t)f(x) + tf(y).

x y

(1 − t)x + ty

(1 − t)f(x) + tf(y)

A function f is (strictly) concave iff −f is (strictly) convex.

Example.

(i) f(x) = x

is strictly convex.

(ii) f(x) = |x| is convex, but not strictly.

(iii) f(x) =

defined on x > 0 is strictly convex.

(iv) f

(

) =

defined on

∗

R \{

}

is not convex. Apart from the fact that

∗

is not a convex domain. But even if we defined, like

(0) = 0, it is

not convex by considering the line joining (

−

, −

1) and (1

1) (and in fact

f(x) =

defined on x < 0 is concave).

1.2.2 First-order convexity condition

While the definition of a convex function seems a bit difficult to work with, if

our function is differentiable, it is easy to check if it is convex.

First assume that our function is once differentiable, and we attempt to find

a first-order condition for convexity. Suppose that

is convex. For fixed

x, y

we define the function

h(t) = (1 − t)f(x) + tf(y) − f((1 − t)x + ty).

By the definition of convexity of

, we must have

(

)

≥

0. Also, trivially

h(0) = 0. So

h(t) − h(0)

≥ 0

for any t ∈ (0, 1). So

(0) ≥ 0.

On the other hand, we can also differentiate h directly and evaluate at 0:

(0) = f(y) − f(x) − (y − x) · ∇f(x).

Combining our two results, we know that

f(y) ≥ f (x) + (y − x) ·∇f(x) (†)

It is also true that this condition implies convexity, which is an easy result.

How can we interpret this result? The equation

(

) + (

y − x

)

· ∇f

(

) = 0

defines the tangent plane of

. Hence this condition is saying that a convex

differentiable function lies above all its tangent planes.

We immediately get the corollary

Corollary.

A stationary point of a convex function is a global minimum. There

can be more than one global minimum (e.g. a constant function), but there is at

most one if the function is strictly convex.

Proof. Given x

such that ∇f (x

) = 0, (†) implies that for any y,

f(y) ≥ f (x

) + (y − x

) · ∇f(x

) = f(x

We can write our first-order convexity condition in a different way. We can

rewrite (†) into the form

(y − x) · [∇f(y) − ∇f(x)] ≥ f(x) − f(y) − (x − y) · ∇f(y).

By applying (

†

) to the right hand side (with

and

swapped), we know that

the right hand side is ≥ 0. So we have another first-order condition:

(y − x) · [∇f(y) − ∇f(x)] ≥ 0,

It can be shown that this is equivalent to the other conditions.

This condition might seem a bit weird to comprehend, but all it says is that

∇f

(

) is a non-decreasing function. For example, when

= 1, the equation

states that (

y − x

)(

(

)

− f

(

))

≥

0, which is the same as saying

(

)

≥ f

(

)

whenever y > x.

1.2.3 Second-order convexity condition

We have an even nicer condition when the function is twice differentiable. We

start with the equation we just obtained:

(y − x) · [∇f(y) − ∇f(x)] ≥ 0,

Write y = x + h. Then

h · (∇f(x + h) −∇f(x)) ≥ 0.

Expand the left in Taylor series. Using suffix notation, this becomes

∇

f + O(h

)] ≥ 0.

But ∇

∇

f = H

. So we have

+ O(h

) ≥ 0

This is true for all

if the Hessian

is positive semi-definite (or simply positive),

i.e. the eigenvalues are non-negative. If they are in fact all positive, then we say

H is positive definite.

Hence convexity implies that the Hessian matrix is positive for all

x ∈ D

(

Strict convexity implies that it is positive definite.

The converse is also true — if the Hessian is positive, then it is convex.

Example. Let f(x, y) =

for x, y > 0. Then the Hessian is

H =





The determinant is

det H =

> 0

and the trace is

tr H =





> 0.

So f is convex.

To conclude that

is convex, we only used the fact that

is positive,

instead of

and

being individually positive. Then could we relax the domain

condition to be

xy >

0 instead? The answer is no, because in this case, the

function will no longer be convex!