1Multivariate calculus
IB Variational Principles
1.2 Convex functions
1.2.1 Convexity
Convex functions is an important class of functions that has a lot of nice
properties. For example, stationary points of convex functions are all minima,
and a convex function can have at most one minimum value. To define convex
functions, we need to first define a convex set.
Definition
(Convex set)
.
A set
S ⊆ R
n
is convex if for any distinct
x, y ∈
S, t ∈
(0
,
1), we have (1
−t
)
x
+
ty ∈ S
. Alternatively, any line joining two points
in S lies completely within S.
non-convex convex
Definition (Convex function). A function f : R
n
→ R is convex if
(i) The domain D(f) is convex
(ii) The function f lies below (or on) all its chords, i.e.
f((1 − t)x + ty) ≤ (1 −t)f(x) + tf(y) (∗)
for all x, y ∈ D(f), t ∈ (0, 1).
A function is strictly convex if the inequality is strict, i.e.
f((1 − t)x + ty) < (1 −t)f(x) + tf(y).
x y
(1 − t)x + ty
(1 − t)f(x) + tf(y)
A function f is (strictly) concave iff −f is (strictly) convex.
Example.
(i) f(x) = x
2
is strictly convex.
(ii) f(x) = |x| is convex, but not strictly.
(iii) f(x) =
1
x
defined on x > 0 is strictly convex.
(iv) f
(
x
) =
1
x
defined on
R
∗
=
R \{
0
}
is not convex. Apart from the fact that
R
∗
is not a convex domain. But even if we defined, like
f
(0) = 0, it is
not convex by considering the line joining (
−
1
, −
1) and (1
,
1) (and in fact
f(x) =
1
x
defined on x < 0 is concave).
1.2.2 First-order convexity condition
While the definition of a convex function seems a bit difficult to work with, if
our function is differentiable, it is easy to check if it is convex.
First assume that our function is once differentiable, and we attempt to find
a first-order condition for convexity. Suppose that
f
is convex. For fixed
x, y
,
we define the function
h(t) = (1 − t)f(x) + tf(y) − f((1 − t)x + ty).
By the definition of convexity of
f
, we must have
h
(
t
)
≥
0. Also, trivially
h(0) = 0. So
h(t) − h(0)
t
≥ 0
for any t ∈ (0, 1). So
h
0
(0) ≥ 0.
On the other hand, we can also differentiate h directly and evaluate at 0:
h
0
(0) = f(y) − f(x) − (y − x) · ∇f(x).
Combining our two results, we know that
f(y) ≥ f (x) + (y − x) ·∇f(x) (†)
It is also true that this condition implies convexity, which is an easy result.
How can we interpret this result? The equation
f
(
x
) + (
y − x
)
· ∇f
(
x
) = 0
defines the tangent plane of
f
at
x
. Hence this condition is saying that a convex
differentiable function lies above all its tangent planes.
We immediately get the corollary
Corollary.
A stationary point of a convex function is a global minimum. There
can be more than one global minimum (e.g. a constant function), but there is at
most one if the function is strictly convex.
Proof. Given x
0
such that ∇f (x
0
) = 0, (†) implies that for any y,
f(y) ≥ f (x
0
) + (y − x
0
) · ∇f(x
0
) = f(x
0
).
We can write our first-order convexity condition in a different way. We can
rewrite (†) into the form
(y − x) · [∇f(y) − ∇f(x)] ≥ f(x) − f(y) − (x − y) · ∇f(y).
By applying (
†
) to the right hand side (with
x
and
y
swapped), we know that
the right hand side is ≥ 0. So we have another first-order condition:
(y − x) · [∇f(y) − ∇f(x)] ≥ 0,
It can be shown that this is equivalent to the other conditions.
This condition might seem a bit weird to comprehend, but all it says is that
∇f
(
x
) is a non-decreasing function. For example, when
n
= 1, the equation
states that (
y − x
)(
f
0
(
y
)
− f
0
(
x
))
≥
0, which is the same as saying
f
0
(
y
)
≥ f
0
(
x
)
whenever y > x.
1.2.3 Second-order convexity condition
We have an even nicer condition when the function is twice differentiable. We
start with the equation we just obtained:
(y − x) · [∇f(y) − ∇f(x)] ≥ 0,
Write y = x + h. Then
h · (∇f(x + h) −∇f(x)) ≥ 0.
Expand the left in Taylor series. Using suffix notation, this becomes
h
i
[h
j
∇
j
∇
i
f + O(h
2
)] ≥ 0.
But ∇
j
∇
i
f = H
ij
. So we have
h
i
H
ij
h
j
+ O(h
3
) ≥ 0
This is true for all
h
if the Hessian
H
is positive semi-definite (or simply positive),
i.e. the eigenvalues are non-negative. If they are in fact all positive, then we say
H is positive definite.
Hence convexity implies that the Hessian matrix is positive for all
x ∈ D
(
f
).
Strict convexity implies that it is positive definite.
The converse is also true — if the Hessian is positive, then it is convex.
Example. Let f(x, y) =
1
xy
for x, y > 0. Then the Hessian is
H =
1
xy
2
x
2
1
xy
1
xy
2
y
2
The determinant is
det H =
3
x
4
y
4
> 0
and the trace is
tr H =
2
xy
1
x
2
+
1
y
2
> 0.
So f is convex.
To conclude that
f
is convex, we only used the fact that
xy
is positive,
instead of
x
and
y
being individually positive. Then could we relax the domain
condition to be
xy >
0 instead? The answer is no, because in this case, the
function will no longer be convex!