2 The method of Lagrange multipliers
So how do we solve the problem of constrained optimization? The trick here
is to include the constraints in the objective function, so that things outside
the constraint set will not be thought to be minima.
Suppose the original problem is

  minimize f(x) subject to h(x) = b, x ∈ X.

Call this problem (P).
Definition (Lagrangian). The Lagrangian of the problem (P) is defined as

  L(x, λ) = f(x) − λᵀ(h(x) − b)

for λ ∈ ℝᵐ. λ is known as the Lagrange multiplier.
Note that when the constraint is satisfied, h(x) − b = 0, and so L(x, λ) = f(x).
We could as well have used

  L(x, λ) = f(x) + λᵀ(h(x) − b),

since we just have to switch the sign of λ. So we don't have to worry about
getting the sign of λ wrong when defining the Lagrangian.
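To make the definition concrete, here is a minimal sketch (in Python with sympy; the objective, constraint, and all names are illustrative choices of mine, not from the notes) of building a Lagrangian and checking that it agrees with f on the constraint set:

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam')
f = x1**2 + x2**2            # illustrative objective (not from the notes)
h = x1 + x2                  # a single functional constraint, so m = 1
b = 1

# L(x, lam) = f(x) - lam^T (h(x) - b); with m = 1 the transpose is trivial
L = f - lam * (h - b)

# On the constraint set h(x) = b the multiplier term vanishes, so L = f:
on_constraint = L.subs(x2, b - x1)   # enforce x1 + x2 = 1
assert sp.simplify(on_constraint - f.subs(x2, b - x1)) == 0
```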
If we minimize L over both x and λ, then we will magically find the minimal
solution subject to the constraints. Sometimes.
Theorem (Lagrangian sufficiency). Let x* ∈ X and λ* ∈ ℝᵐ be such that

  L(x*, λ*) = inf_{x∈X} L(x, λ*) and h(x*) = b.

Then x* is optimal for (P).
In words, if x* minimizes L for a fixed λ*, and x* satisfies the constraints,
then x* minimizes f.
This looks like a pretty powerful result, but it turns out that it is quite easy
to prove.
Proof. We first define the "feasible set": let X(b) = {x ∈ X : h(x) = b}, i.e. the
set of all x that satisfy the constraints. Then

  min_{x∈X(b)} f(x) = min_{x∈X(b)} (f(x) − λ*ᵀ(h(x) − b))    since h(x) − b = 0
                    ≥ min_{x∈X} (f(x) − λ*ᵀ(h(x) − b))
                    = f(x*) − λ*ᵀ(h(x*) − b)
                    = f(x*).

Since x* itself lies in X(b), it attains this minimum, so x* is optimal.
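To see the theorem in action, here is a small numerical illustration (in Python; the toy problem and all names are mine, not from the notes):

```python
import numpy as np

# Toy problem: minimize f(x) = x1^2 + x2^2 subject to x1 + x2 = 1.
# For fixed lam, L(x, lam) = f(x) - lam*(x1 + x2 - 1) is minimized where
# grad L = 0, i.e. 2*x1 = 2*x2 = lam, so x*(lam) = (lam/2, lam/2).
lam = 1.0                                  # chosen so that h(x*) = b below
x_star = np.array([lam / 2, lam / 2])      # unconstrained minimizer of L
assert abs(x_star.sum() - 1.0) < 1e-12     # x* is feasible: h(x*) = b

f = lambda x: x[0]**2 + x[1]**2
# Sample feasible points x = (t, 1 - t) and check none beats x*:
ts = np.linspace(-5, 5, 1001)
feasible_values = f(np.array([ts, 1 - ts]))
assert f(x_star) <= feasible_values.min() + 1e-12
```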
How can we interpret this result? To find these values of λ* and x*, we have
to solve

  ∇L = 0
  h(x) = b.

Alternatively, we can write this as

  ∇f = λ∇h
  h(x) = b.
What does this mean? For better visualization, we take the special case where
f and h are functions ℝ² → ℝ. Usually, if we want to minimize f without
restriction, then for small changes in x, there should be no (first-order) change
in f, i.e. df = ∇f · dx = 0. This has to be true for all possible directions of dx.
However, if we are constrained by h(x) = b, this corresponds to forcing x to
lie along a particular path. Hence the restriction df = 0 only has to hold when
dx points along the path. Since we need ∇f · dx = 0, this means that ∇f has to be
perpendicular to dx. Alternatively, ∇f has to be parallel to the normal to the
path. Since the normal to the path is given by ∇h, we obtain the requirement

  ∇f = λ∇h.
This is how we should interpret the condition ∇f = λ∇h. Instead of requiring
that ∇f = 0 as in usual minimization problems, we only require ∇f to point in
directions perpendicular to the allowed space.
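This geometric picture can be checked numerically. Here is a minimal sketch (in Python; the function, circle, and minimizer are my own illustration, not from the notes) verifying that ∇f is parallel to ∇h at a constrained minimum:

```python
import numpy as np

# Minimize f(x, y) = x + y on the circle h(x, y) = x^2 + y^2 = 1; the
# minimizer is (-1/sqrt(2), -1/sqrt(2)).
x = np.array([-1.0, -1.0]) / np.sqrt(2)   # constrained minimizer on the circle
grad_f = np.array([1.0, 1.0])             # gradient of f(x, y) = x + y
grad_h = 2 * x                            # gradient of h(x, y) = x^2 + y^2

# Two 2D vectors are parallel iff their cross product is zero:
cross = grad_f[0] * grad_h[1] - grad_f[1] * grad_h[0]
assert abs(cross) < 1e-12
lam = grad_f[0] / grad_h[0]               # the multiplier, here -1/sqrt(2)
print(lam)
```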
Example. Minimize x₁ − x₂ − 2x₃ subject to

  x₁ + x₂ + x₃ = 5
  x₁² + x₂² = 4.
The Lagrangian is

  L(x, λ) = x₁ − x₂ − 2x₃ − λ₁(x₁ + x₂ + x₃ − 5) − λ₂(x₁² + x₂² − 4)
          = ((1 − λ₁)x₁ − λ₂x₁²) + ((−1 − λ₁)x₂ − λ₂x₂²) + (−2 − λ₁)x₃ + 5λ₁ + 4λ₂.
We want to pick a λ and an x such that L(x, λ) is minimal. Then in particular,
for our λ, L(x, λ) must have a finite minimum.
We note that (−2 − λ₁)x₃ does not have a finite minimum unless λ₁ = −2,
since x₃ can take any value. Also, the terms in x₁ and x₂ do not have a finite
minimum unless λ₂ < 0.
With these in mind, we find a minimum by setting all first derivatives to be
0 (recalling that λ₁ = −2):

  ∂L/∂x₁ = 1 − λ₁ − 2λ₂x₁ = 3 − 2λ₂x₁
  ∂L/∂x₂ = −1 − λ₁ − 2λ₂x₂ = 1 − 2λ₂x₂

Since these must both be 0, we must have

  x₁ = 3/(2λ₂),  x₂ = 1/(2λ₂).
To show that this is indeed a minimum, we look at the Hessian matrix:

  HL = | −2λ₂    0   |
       |   0    −2λ₂ |

which is positive semidefinite when λ₂ < 0, which is the condition we came up
with at the beginning.
Let Y = {λ ∈ ℝ² : λ₁ = −2, λ₂ < 0} be our helpful values of λ.
So we have shown above that for every λ ∈ Y, L(x, λ) has a unique minimum
in x₁ and x₂, at

  x(λ) = (3/(2λ₂), 1/(2λ₂), x₃)ᵀ

with x₃ arbitrary.
Now all we have to do is find λ and x such that x(λ) satisfies the functional
constraints. The second constraint gives

  x₁² + x₂² = 9/(4λ₂²) + 1/(4λ₂²) = 4  ⟹  λ₂ = −√(5/8),

taking the negative root since λ ∈ Y requires λ₂ < 0.
The first constraint gives

  x₃ = 5 − x₁ − x₂.

So the theorem implies that the optimal solution is

  x₁ = −3√(2/5),  x₂ = −√(2/5),  x₃ = 5 + 4√(2/5).
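As a sanity check, here is a short sketch (in Python with sympy; my own verification, not part of the notes) that re-derives this answer by solving the stationarity and constraint equations directly:

```python
import sympy as sp

x1, x2, x3, l1, l2 = sp.symbols('x1 x2 x3 l1 l2', real=True)
f = x1 - x2 - 2*x3
L = f - l1*(x1 + x2 + x3 - 5) - l2*(x1**2 + x2**2 - 4)

# Stationarity of L in x, plus the two functional constraints:
eqs = [sp.diff(L, v) for v in (x1, x2, x3)]
eqs += [x1 + x2 + x3 - 5, x1**2 + x2**2 - 4]

for sol in sp.solve(eqs, [x1, x2, x3, l1, l2], dict=True):
    print(sol, ' f =', sp.simplify(f.subs(sol)))
# The solution with l2 < 0 reproduces the answer above:
# x1 = -3*sqrt(2/5), x2 = -sqrt(2/5), x3 = 5 + 4*sqrt(2/5), l1 = -2.
```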
So far so good. But what if our functional constraint is an inequality? We
will need slack variables.
To minimize f(x) subject to h(x) ≤ b, x ∈ X, we proceed as follows:
(i) Introduce slack variables to obtain the equivalent problem: minimize
f(x) subject to h(x) + z = b, x ∈ X, z ≥ 0.
(ii) Compute the Lagrangian

  L(x, z, λ) = f(x) − λᵀ(h(x) + z − b).
(iii) Find

  Y = {λ : inf_{x∈X, z≥0} L(x, z, λ) > −∞}.
(iv) For each λ ∈ Y, minimize L(x, z, λ), i.e. find x*(λ) ∈ X and z*(λ) ≥ 0
such that

  L(x*(λ), z*(λ), λ) = inf_{x∈X, z≥0} L(x, z, λ).
(v) Find λ* ∈ Y such that

  h(x*(λ*)) + z*(λ*) = b.
Then by the Lagrangian sufficiency condition, x*(λ*) is optimal for the
constrained problem.
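Here is a hedged sketch (in Python with sympy; the toy problem and every name are mine, not from the notes) of steps (i)-(v) applied to a one-variable inequality problem:

```python
import sympy as sp

# Toy problem: minimize f(x) = x^2 subject to -x <= -1 (i.e. x >= 1), x in R.
x, z, lam = sp.symbols('x z lam', real=True)
f = x**2

# (i)  Slack variable: the constraint -x <= -1 becomes -x + z = -1, z >= 0.
# (ii) Lagrangian L(x, z, lam) = f(x) - lam*(h(x) + z - b), h(x) = -x, b = -1:
L = f - lam * (-x + z + 1)

# (iii)/(iv) For fixed lam, L is minimized over x where dL/dx = 0 ...
x_of_lam = sp.solve(sp.diff(L, x), x)[0]      # x*(lam) = -lam/2
# ... and the z-term -lam*z is bounded below on z >= 0 iff lam <= 0,
# with minimum at z*(lam) = 0.  Hence Y = {lam <= 0}.

# (v)  Pick lam* in Y so the original constraint binds: -x*(lam) + 0 = -1.
lam_star = sp.solve(-x_of_lam + 1, lam)[0]    # lam* = -2, which is in Y
x_star = x_of_lam.subs(lam, lam_star)         # x* = 1
print(lam_star, x_star, f.subs(x, x_star))    # -2  1  1: optimal value f = 1
```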