
IB Optimisation

2 The method of Lagrange multipliers

So how do we solve the problem of constrained optimization? The trick here is to incorporate the constraints into the objective function, so that points outside the constraint set will not be mistaken for minima.

Suppose the original problem is

minimize f(x) subject to h(x) = b, x ∈ X.

Call this problem (P).

Definition (Lagrangian). The Lagrangian of a problem (P) is defined as

L(x, λ) = f(x) − λᵀ(h(x) − b)

for λ ∈ R^m. λ is known as the Lagrange multiplier.

Note that when the constraint is satisfied, h(x) − b = 0, and so L(x, λ) = f(x).

We could as well have used

L(x, λ) = f(x) + λᵀ(h(x) − b),

since we just have to switch the sign of λ. So we don't have to worry about getting the sign of λ wrong when defining the Lagrangian.

If we minimize L over both x and λ, then we will magically find the minimal solution subject to the constraints. Sometimes.

Theorem (Lagrangian sufficiency). Let x∗ ∈ X and λ∗ ∈ R^m be such that

L(x∗, λ∗) = inf_{x∈X} L(x, λ∗) and h(x∗) = b.

Then x∗ is optimal for (P).

In words, if x∗ minimizes L for a fixed λ∗, and x∗ satisfies the constraints, then x∗ minimizes f.

This looks like a pretty powerful result, but it turns out that it is quite easy

to prove.

Proof. We first define the “feasible set”: let X(b) = {x ∈ X : h(x) = b}, i.e. the set of all x that satisfy the constraints. Then


min_{x∈X(b)} f(x) = min_{x∈X(b)} (f(x) − λ∗ᵀ(h(x) − b))    since h(x) − b = 0
                  ≥ min_{x∈X} (f(x) − λ∗ᵀ(h(x) − b))
                  = f(x∗) − λ∗ᵀ(h(x∗) − b)
                  = f(x∗).

Since x∗ ∈ X(b), the minimum of f over X(b) is attained at x∗, so x∗ is optimal.

How can we interpret this result? To find these values of

λ

∗

and

x

∗

, we have

to solve

∇L = 0

h(x) = b.

Alternatively, we can write this as

∇f = λ∇h

h(x) = b.
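As a sketch of how this system can be solved in practice, here is a hypothetical toy problem of my own (not one from the notes): minimize x² + y² subject to x + y = 1, solving ∇L = 0 together with h(x) = b symbolically with sympy.

```python
import sympy as sp

# Hypothetical toy problem: minimize f = x^2 + y^2 subject to h = x + y = 1
x, y, lam = sp.symbols('x y lam', real=True)
f = x**2 + y**2
h = x + y
b = 1

# Lagrangian L(x, lam) = f(x) - lam*(h(x) - b)
L = f - lam * (h - b)

# Solve grad L = 0 together with the constraint h(x) = b
eqs = [sp.diff(L, x), sp.diff(L, y), h - b]
sol = sp.solve(eqs, [x, y, lam], dict=True)[0]
print(sol[x], sol[y], sol[lam])  # 1/2 1/2 1
```

The stationarity equations give 2x = λ and 2y = λ, so the minimum sits at the point of the line closest to the origin, as expected.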

What does this mean? For better visualization, we take the special case where f and h are functions R² → R. Usually, if we want to minimize f without restriction, then for small changes in x, there should be no (first-order) change in f, i.e. df = ∇f · dx = 0. This has to be true for all possible directions of dx.

However, if we are constrained by h(x) = b, this corresponds to forcing x to lie along this particular path. Hence the restriction df = 0 only has to hold when dx lies along the path. Since we need ∇f · dx = 0, this means that ∇f has to be perpendicular to dx. Alternatively, ∇f has to be parallel to the normal to the path. Since the normal to the path is given by ∇h, we obtain the requirement ∇f = λ∇h.

This is how we should interpret the condition ∇f = λ∇h. Instead of requiring that ∇f = 0 as in usual minimization problems, we only require ∇f to point in a direction perpendicular to the allowed space.
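To see this condition concretely, here is a small numerical check on a hypothetical 2D example of my own (not from the notes): for f(x, y) = x + 2y on the circle x² + y² = 5, the constrained minimum is at (−1, −2) with λ = −1/2, and there ∇f is indeed parallel to ∇h.

```python
import numpy as np

# Hypothetical example: minimize f(x, y) = x + 2y on the circle h(x, y) = x^2 + y^2 = 5.
# The constrained minimum is at (x, y) = (-1, -2) with lam = -1/2.
x, y, lam = -1.0, -2.0, -0.5

grad_f = np.array([1.0, 2.0])        # grad of x + 2y (constant)
grad_h = np.array([2 * x, 2 * y])    # grad of x^2 + y^2, the normal to the circle

# grad f = lam * grad h at the constrained optimum:
print(np.allclose(grad_f, lam * grad_h))  # True
```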

Example. Minimize x₁ − x₂ − 2x₃ subject to

x₁ + x₂ + x₃ = 5
x₁² + x₂² = 4

The Lagrangian is

L(x, λ) = x₁ − x₂ − 2x₃ − λ₁(x₁ + x₂ + x₃ − 5) − λ₂(x₁² + x₂² − 4)
        = ((1 − λ₁)x₁ − λ₂x₁²) + ((−1 − λ₁)x₂ − λ₂x₂²) + (−2 − λ₁)x₃ + 5λ₁ + 4λ₂

We want to pick a λ∗ and x∗ such that L(x∗, λ∗) is minimal. Then in particular, for our λ∗, L(x, λ∗) must have a finite minimum.

We note that (−2 − λ₁)x₃ does not have a finite minimum unless λ₁ = −2, since x₃ can take any value. Also, the terms in x₁ and x₂ do not have a finite minimum unless λ₂ < 0.

With these in mind, we find a minimum by setting all first derivatives to be 0 (substituting λ₁ = −2 in the second equality):

∂L/∂x₁ = 1 − λ₁ − 2λ₂x₁ = 3 − 2λ₂x₁
∂L/∂x₂ = −1 − λ₁ − 2λ₂x₂ = 1 − 2λ₂x₂

Since these must both be 0, we must have

x₁ = 3/(2λ₂),  x₂ = 1/(2λ₂).

To show that this is indeed a minimum, we look at the Hessian matrix:

HL = ( −2λ₂     0  )
     (   0    −2λ₂ )

which is positive semidefinite when λ₂ < 0, which is the condition we came up with at the beginning.

Let Y = {λ ∈ R² : λ₁ = −2, λ₂ < 0} be our helpful values of λ.

So we have shown above that for every λ ∈ Y, L(x, λ) has a unique minimum in x₁ and x₂ (with x₃ arbitrary) at x(λ) = (3/(2λ₂), 1/(2λ₂), x₃)ᵀ.

Now all we have to do is find λ and x such that x(λ) satisfies the functional constraints. The second constraint gives

x₁² + x₂² = 9/(4λ₂²) + 1/(4λ₂²) = 4 ⇔ λ₂ = −√(5/8).

The first constraint gives x₃ = 5 − x₁ − x₂.

So the theorem implies that

x₁ = −3√(2/5),  x₂ = −√(2/5),  x₃ = 5 + 4√(2/5).

So far so good. But what if our functional constraint is an inequality? We

will need slack variables.

To minimize f(x) subject to h(x) ≤ b, x ∈ X, we proceed as follows:

(i) Introduce slack variables to obtain the equivalent problem: minimize f(x) subject to h(x) + z = b, x ∈ X, z ≥ 0.

(ii) Compute the Lagrangian

L(x, z, λ) = f(x) − λᵀ(h(x) + z − b).

(iii) Find

Y = {λ : inf_{x∈X,z≥0} L(x, z, λ) > −∞}.

(iv) For each λ ∈ Y, minimize L(x, z, λ), i.e. find x∗(λ) ∈ X, z∗(λ) ≥ 0 such that

L(x∗(λ), z∗(λ), λ) = inf_{x∈X,z≥0} L(x, z, λ).

(v) Find λ∗ ∈ Y such that

h(x∗(λ∗)) + z∗(λ∗) = b.

Then by the Lagrangian sufficiency condition, x∗(λ∗) is optimal for the constrained problem.
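The steps above can be sketched on a hypothetical one-dimensional problem of my own (not from the notes): minimize x² subject to x ≥ 1, written as h(x) = −x ≤ b = −1.

```python
import sympy as sp

x, z, lam = sp.symbols('x z lam', real=True)

# Hypothetical problem: minimize x^2 subject to -x <= -1, i.e. x >= 1
f = x**2
h = -x
b = -1

# (i)-(ii): slack variable z >= 0 and Lagrangian
L = f - lam * (h + z - b)

# (iii)-(iv): for fixed lam, L is minimized over x where dL/dx = 0;
# the term -lam*z is bounded below over z >= 0 only if lam <= 0, with minimum at z = 0
x_star = sp.solve(sp.diff(L, x), x)[0]   # x*(lam) = -lam/2
z_star = 0

# (v): pick lam in Y so that h(x*) + z* = b
lam_star = sp.solve(h.subs(x, x_star) + z_star - b, lam)[0]
print(lam_star, x_star.subs(lam, lam_star))  # -2 1
```

Here z∗ = 0, i.e. the inequality is tight at the optimum x∗ = 1, as we would expect for this problem.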