1.4 Lagrange multipliers
At the beginning, we considered the problem of unconstrained maximization. We wanted to maximize $f(x, y)$ where $x, y$ can be any real value. However, sometimes we want to restrict to certain values of $(x, y)$. For example, we might want $x$ and $y$ to satisfy $x + y = 10$.
We take a simple example of a hill. We model it using the function $f(x, y)$ given by the height above the ground. The hilltop would be given by the maximum of $f$, which satisfies
\[
  0 = \mathrm{d}f = \nabla f \cdot \mathrm{d}\mathbf{x}
\]
for any (infinitesimal) displacement $\mathrm{d}\mathbf{x}$. So we need $\nabla f = 0$. This would be a case of unconstrained maximization, since we are considering all possible values of $x$ and $y$.
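For instance, a minimal sympy sketch of this unconstrained case (the hill function here is made up purely for illustration):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)

# Made-up hill: a smooth bump peaked at the origin.
f = sp.exp(-(x**2 + y**2))

# Unconstrained stationary points: solve grad f = 0.
grad = [sp.diff(f, v) for v in (x, y)]
print(sp.solve(grad, [x, y], dict=True))  # [{x: 0, y: 0}]
```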
A problem of constrained maximization would be as follows: we have a path $p$ defined by $p(x, y) = 0$. What is the highest point along the path $p$?

We still need $\nabla f \cdot \mathrm{d}\mathbf{x} = 0$, but now $\mathrm{d}\mathbf{x}$ is not arbitrary. We only consider the $\mathrm{d}\mathbf{x}$ parallel to the path. Alternatively, $\nabla f$ has to be entirely perpendicular to the path. Since we know that the normal to the path is $\nabla p$, our condition becomes
\[
  \nabla f = \lambda \nabla p
\]
for some $\lambda$. Of course, we still have the constraint $p(x, y) = 0$. So what we have to solve is
\[
  \nabla f = \lambda \nabla p, \quad p = 0
\]
for the three variables $x, y, \lambda$.
Alternatively, we can change this into a single problem of unconstrained extremization. We ask for the stationary points of the function $\phi(x, y, \lambda)$ given by
\[
  \phi(x, y, \lambda) = f(x, y) - \lambda\, p(x, y).
\]
When we maximize against the variables $x$ and $y$, we obtain the $\nabla f = \lambda \nabla p$ condition, and maximizing against $\lambda$ gives the condition $p = 0$.
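To make this concrete, here is a small sketch using the constraint $x + y = 10$ from the introduction (the objective $f(x, y) = xy$ is a made-up choice for illustration); it solves the stationary-point system for $\phi$ symbolically:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)

f = x * y          # made-up objective, for illustration only
p = x + y - 10     # the constraint x + y = 10 from the introduction

phi = f - lam * p  # phi(x, y, lambda) = f - lambda * p

# Stationary points: all three partial derivatives vanish.
eqs = [sp.diff(phi, v) for v in (x, y, lam)]
print(sp.solve(eqs, [x, y, lam], dict=True))
# [{lambda: 5, x: 5, y: 5}] -- the stationary point of xy on x + y = 10
```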
Example. Find the radius of the smallest circle centered on the origin that intersects $y = x^2 - 1$.
(i) First do it the easy way: for a circle of radius $R$ to work, $x^2 + y^2 = R^2$ and $y = x^2 - 1$ must have a common solution. Substituting $y = x^2 - 1$ into the circle equation gives
\[
  (x^2)^2 - x^2 + 1 - R^2 = 0,
\]
and hence
\[
  x^2 = \frac{1}{2} \pm \sqrt{R^2 - \frac{3}{4}}.
\]
This has a real solution if and only if $R^2 \geq \frac{3}{4}$. So $R_{\min} = \frac{\sqrt{3}}{2}$.
(ii) We can also view this as a variational problem. We want to minimize $f(x, y) = x^2 + y^2$ subject to the constraint $p(x, y) = 0$ for $p(x, y) = y - x^2 + 1$.

We can solve this directly. We can solve the constraint to obtain $y = x^2 - 1$. Then
\[
  R^2(x) = f(x, y(x)) = (x^2)^2 - x^2 + 1.
\]
We look for stationary points of $R^2$:
\[
  (R^2(x))' = 0 \quad\Leftrightarrow\quad x\left(x^2 - \frac{1}{2}\right) = 0.
\]
So $x = 0$ and $R = 1$; or $x = \pm\frac{1}{\sqrt{2}}$ and $R = \frac{\sqrt{3}}{2}$. Since $\frac{\sqrt{3}}{2}$ is smaller, this is our minimum.
(iii) Finally, we can use Lagrange multipliers. We find stationary points of the function
\[
  \phi(x, y, \lambda) = f(x, y) - \lambda\, p(x, y) = x^2 + y^2 - \lambda(y - x^2 + 1).
\]
The partial derivatives give
\begin{align*}
  \frac{\partial \phi}{\partial x} = 0 &\quad\Leftrightarrow\quad 2x(1 + \lambda) = 0\\
  \frac{\partial \phi}{\partial y} = 0 &\quad\Leftrightarrow\quad 2y - \lambda = 0\\
  \frac{\partial \phi}{\partial \lambda} = 0 &\quad\Leftrightarrow\quad y - x^2 + 1 = 0.
\end{align*}
The first equation gives us two choices:
- $x = 0$. Then the third equation gives $y = -1$. So $R = \sqrt{x^2 + y^2} = 1$.
- $\lambda = -1$. Then the second equation gives $y = -\frac{1}{2}$ and the third gives $x = \pm\frac{1}{\sqrt{2}}$. Hence $R = \frac{\sqrt{3}}{2}$ is the minimum.
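As a sanity check, a short sympy sketch that solves the same stationary-point system and recovers the answers from parts (i) and (ii):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)

# phi = f - lambda * p for f = x^2 + y^2 and p = y - x^2 + 1.
phi = x**2 + y**2 - lam * (y - x**2 + 1)

sols = sp.solve([sp.diff(phi, v) for v in (x, y, lam)],
                [x, y, lam], dict=True)
for s in sols:
    print(s, 'R =', sp.sqrt(s[x]**2 + s[y]**2))
# Recovers (x, y) = (0, -1) with R = 1, and
# (x, y) = (+-1/sqrt(2), -1/2) with R = sqrt(3)/2.
```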
This can be generalized to problems with functions $\mathbb{R}^n \to \mathbb{R}$ using the same logic.
Example. For $\mathbf{x} \in \mathbb{R}^n$, find the minimum of the quadratic form
\[
  f(\mathbf{x}) = x_i A_{ij} x_j
\]
on the surface $|\mathbf{x}|^2 = 1$.
(i) The constraint imposes a normalization condition on $\mathbf{x}$. But if we scale up $\mathbf{x}$, $f(\mathbf{x})$ scales accordingly. So if we define
\[
  \Lambda(\mathbf{x}) = \frac{f(\mathbf{x})}{g(\mathbf{x})}, \quad g(\mathbf{x}) = |\mathbf{x}|^2,
\]
the problem is equivalent to the minimization of $\Lambda(\mathbf{x})$ without constraint. Then
\[
  \partial_i \Lambda(\mathbf{x}) = \frac{2}{g}\left(A_{ij} x_j - \frac{f}{g} x_i\right).
\]
So we need
\[
  A\mathbf{x} = \Lambda \mathbf{x}.
\]
So the extremal values of $\Lambda(\mathbf{x})$ are the eigenvalues of $A$, and $\Lambda_{\min}$ is the lowest eigenvalue.

This answer is intuitively obvious if we diagonalize $A$.
(ii) We can also do it with Lagrange multipliers. We want to find stationary values of
\[
  \phi(\mathbf{x}, \lambda) = f(\mathbf{x}) - \lambda(|\mathbf{x}|^2 - 1).
\]
So
\[
  0 = \frac{\partial \phi}{\partial x_i} \quad\Leftrightarrow\quad A_{ij} x_j = \lambda x_i.
\]
Differentiating with respect to $\lambda$ gives
\[
  \frac{\partial \phi}{\partial \lambda} = 0 \quad\Leftrightarrow\quad |\mathbf{x}|^2 = 1.
\]
So we get the same set of equations.
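A quick numerical illustration (the symmetric matrix $A$ below is made up): the minimum of the quadratic form on the unit sphere should equal the smallest eigenvalue of $A$, attained at the corresponding unit eigenvector:

```python
import numpy as np

# Made-up symmetric matrix, for illustration only.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# eigh returns eigenvalues in ascending order and
# orthonormal eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(A)
x = eigvecs[:, 0]      # unit eigenvector for the lowest eigenvalue

print(eigvals[0])      # Lambda_min
print(x @ A @ x)       # f(x) with |x|^2 = 1; matches Lambda_min
```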
Example. Find the probability distribution $\{p_1, \cdots, p_n\}$ satisfying $\sum_i p_i = 1$ that maximizes the information entropy
\[
  S = -\sum_{i=1}^n p_i \ln p_i.
\]
We look for stationary points of
\[
  \phi(\mathbf{p}, \lambda) = -\sum_{i=1}^n p_i \ln p_i - \lambda \sum_{i=1}^n p_i + \lambda.
\]
We have
\[
  \frac{\partial \phi}{\partial p_i} = -\ln p_i - (1 + \lambda) = 0.
\]
So
\[
  p_i = e^{-(1 + \lambda)}.
\]
It is the same for all $i$. So we must have $p_i = \frac{1}{n}$.
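As a numerical sanity check (the choice $n = 4$ is arbitrary), a constrained maximization with scipy recovers the uniform distribution:

```python
import numpy as np
from scipy.optimize import minimize

n = 4  # arbitrary size, for illustration only

def neg_entropy(p):
    # Minimizing -S is the same as maximizing S.
    return np.sum(p * np.log(p))

constraint = {'type': 'eq', 'fun': lambda p: np.sum(p) - 1}
bounds = [(1e-9, 1.0)] * n             # keep log(p) well defined
p0 = np.random.dirichlet(np.ones(n))   # random starting distribution

res = minimize(neg_entropy, p0, bounds=bounds, constraints=[constraint])
print(res.x)   # approximately [0.25, 0.25, 0.25, 0.25], i.e. 1/n each
```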