3 The Lasso and beyond

III Modern Statistical Methods



3.4 Properties of Lasso solutions
Let’s now apply this to the Lasso. Recall that the Lasso objective was
\[
Q_\lambda(\beta) = \frac{1}{2n} \|Y - X\beta\|_2^2 + \lambda \|\beta\|_1.
\]
We know this is a convex function in $\beta$. So we know $\hat{\beta}^L_\lambda$ is a minimizer of $Q_\lambda(\beta)$ iff $0 \in \partial Q_\lambda(\hat{\beta})$.
Let’s subdifferentiate $Q_\lambda$ and see what this amounts to. We have
\[
\partial Q_\lambda(\hat{\beta}) = -\frac{1}{n} X^T (Y - X\hat{\beta}) + \lambda \left\{ \hat{\nu} \in \mathbb{R}^d : \|\hat{\nu}\|_\infty \le 1,\ \hat{\nu}_{\hat{\rho}^L_\lambda} = \operatorname{sgn}\bigl(\hat{\beta}^L_{\lambda,\hat{\rho}^L_\lambda}\bigr) \right\},
\]
where $\hat{\rho}^L_\lambda = \{j : \hat{\beta}^L_{\lambda,j} \neq 0\}$.
Thus, $0 \in \partial Q_\lambda(\hat{\beta}_\lambda)$ is equivalent to
\[
\hat{\nu} = \frac{1}{\lambda} \cdot \frac{1}{n} X^T (Y - X\hat{\beta}_\lambda)
\]
satisfying $\|\hat{\nu}\|_\infty \le 1$ and $\hat{\nu}_{\hat{\rho}^L_\lambda} = \operatorname{sgn}\bigl(\hat{\beta}^L_{\lambda,\hat{\rho}^L_\lambda}\bigr)$.
These are known as the KKT conditions for the Lasso.
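The KKT conditions can be checked numerically. The following is a sketch, assuming NumPy and scikit-learn are available; scikit-learn's Lasso minimises the same objective $\frac{1}{2n}\|Y - X\beta\|_2^2 + \lambda\|\beta\|_1$, with $\lambda$ passed as alpha. The data here is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d = 100, 10
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:3] = [3.0, -2.0, 1.5]
Y = X @ beta_true + rng.standard_normal(n)

lam = 0.1
# scikit-learn's Lasso minimises (1/(2n))||Y - Xb||_2^2 + alpha * ||b||_1,
# i.e. exactly Q_lambda with alpha playing the role of lambda.
fit = Lasso(alpha=lam, fit_intercept=False, tol=1e-12, max_iter=100_000).fit(X, Y)
beta_hat = fit.coef_

# The subgradient vector from the KKT conditions:
nu_hat = X.T @ (Y - X @ beta_hat) / (n * lam)

# ||nu_hat||_inf <= 1, and nu_hat equals sgn(beta_hat) on the support
# (up to solver tolerance).
support = np.flatnonzero(beta_hat)
print(np.max(np.abs(nu_hat)))
print(np.allclose(nu_hat[support], np.sign(beta_hat[support]), atol=1e-5))
```

On the support of $\hat{\beta}$ the subgradient is pinned to $\pm 1$; on the rest of the coordinates it merely lies in $[-1, 1]$.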
In principle, there could be several solutions to the Lasso. However, at least
the fitted values are always unique.
Proposition. $X\hat{\beta}^L_\lambda$ is unique.
Proof. Fix $\lambda > 0$ and stop writing it. Suppose $\hat{\beta}^{(1)}$ and $\hat{\beta}^{(2)}$ are two Lasso solutions at $\lambda$. Then
\[
Q(\hat{\beta}^{(1)}) = Q(\hat{\beta}^{(2)}) = c.
\]
As $Q$ is convex, we know
\[
c \le Q\left( \tfrac{1}{2} \hat{\beta}^{(1)} + \tfrac{1}{2} \hat{\beta}^{(2)} \right) \le \tfrac{1}{2} Q(\hat{\beta}^{(1)}) + \tfrac{1}{2} Q(\hat{\beta}^{(2)}) = c.
\]
So $\tfrac{1}{2} \hat{\beta}^{(1)} + \tfrac{1}{2} \hat{\beta}^{(2)}$ is also a minimizer.
Since the two terms in $Q(\beta)$ are individually convex, it must be the case that
\[
\left\| \tfrac{1}{2}(Y - X\hat{\beta}^{(1)}) + \tfrac{1}{2}(Y - X\hat{\beta}^{(2)}) \right\|_2^2 = \tfrac{1}{2} \left\| Y - X\hat{\beta}^{(1)} \right\|_2^2 + \tfrac{1}{2} \left\| Y - X\hat{\beta}^{(2)} \right\|_2^2
\]
and
\[
\left\| \tfrac{1}{2} \bigl( \hat{\beta}^{(1)} + \hat{\beta}^{(2)} \bigr) \right\|_1 = \tfrac{1}{2} \bigl\| \hat{\beta}^{(1)} \bigr\|_1 + \tfrac{1}{2} \bigl\| \hat{\beta}^{(2)} \bigr\|_1.
\]
Moreover, since $\| \cdot \|_2^2$ is strictly convex, we can have equality only if $X\hat{\beta}^{(1)} = X\hat{\beta}^{(2)}$. So we are done.
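The proposition can be seen concretely by making the Lasso solution non-unique on purpose. The following sketch (assuming NumPy and scikit-learn; the data is made up) duplicates a column of $X$, so that weight can be redistributed between the two copies without changing the fitted values or the objective:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n = 60
x = rng.standard_normal((n, 3))
X = np.column_stack([x, x[:, 0]])      # column 3 is an exact copy of column 0
Y = 3.0 * x[:, 0] + 0.5 * rng.standard_normal(n)

lam = 0.2
fit = Lasso(alpha=lam, fit_intercept=False, tol=1e-12, max_iter=100_000).fit(X, Y)
b1 = fit.coef_

def Q(b):
    """The Lasso objective Q_lambda(b) = (1/(2n))||Y - Xb||_2^2 + lam*||b||_1."""
    return np.sum((Y - X @ b) ** 2) / (2 * n) + lam * np.sum(np.abs(b))

# Split the weight evenly between the duplicated columns. Both coefficients
# are nonnegative here (they share the same KKT subgradient), so the l1 norm
# is unchanged, and the fitted values agree since the columns are identical.
b2 = b1.copy()
b2[0] = b2[3] = (b1[0] + b1[3]) / 2

print(np.allclose(X @ b1, X @ b2))   # same fitted values
print(abs(Q(b1) - Q(b2)))            # same objective value
```

So $\hat{\beta}$ itself need not be unique, but $X\hat{\beta}$ is, exactly as the proposition says.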
Definition (Equicorrelation set). Define the equicorrelation set $\hat{E}_\lambda$ to be the set of $k$ such that
\[
\frac{1}{n} \left| X_k^T (Y - X\hat{\beta}^L_\lambda) \right| = \lambda,
\]
or equivalently, the $k$ with $\hat{\nu}_k = \pm 1$, which is well-defined since it depends only on the fitted values.
By the KKT conditions, $\hat{E}_\lambda$ contains the set of non-zeroes of any Lasso solution, but may be strictly bigger than that.
Note that if $\operatorname{rk}(X_{\hat{E}_\lambda}) = |\hat{E}_\lambda|$, then the Lasso solution must be unique: any two solutions are supported on $\hat{E}_\lambda$ and have the same fitted values, so
\[
X_{\hat{E}_\lambda} \bigl( \hat{\beta}^{(1)} - \hat{\beta}^{(2)} \bigr) = 0,
\]
and hence $\hat{\beta}^{(1)} = \hat{\beta}^{(2)}$.
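A numerical sketch of the equicorrelation set (again assuming NumPy and scikit-learn, on made-up data): compute $\hat{E}_\lambda$ from the fitted values, check that it contains the support of $\hat{\beta}^L_\lambda$, and check the rank condition that guarantees uniqueness:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, d = 100, 8
X = rng.standard_normal((n, d))
Y = X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(n)

lam = 0.15
fit = Lasso(alpha=lam, fit_intercept=False, tol=1e-12, max_iter=100_000).fit(X, Y)
beta_hat = fit.coef_

# E_hat = {k : |X_k^T (Y - X beta_hat)| / n = lambda}, computed up to
# solver tolerance; it depends on beta_hat only through X @ beta_hat.
corr = np.abs(X.T @ (Y - X @ beta_hat)) / n
E_hat = set(np.flatnonzero(np.isclose(corr, lam, atol=1e-6)))
support = set(np.flatnonzero(beta_hat))

print(support <= E_hat)   # the support is contained in the equicorrelation set

# With generic Gaussian columns and |E_hat| <= n, X restricted to E_hat has
# full column rank, so the Lasso solution here is unique.
XE = X[:, sorted(E_hat)]
print(np.linalg.matrix_rank(XE) == len(E_hat))
```

On generic data $\hat{E}_\lambda$ often coincides with the support, but with highly correlated columns it can be strictly larger.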