Part III Modular Forms and L-functions
Based on lectures by A. J. Scholl
Notes taken by Dexter Chua
Lent 2017
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
Mo dular Forms are classical objects that appear in many areas of mathematics, from
number theory to representation theory and mathematical physics. Most famous is,
of course, the role they played in the proof of Fermat’s Last Theorem, through the
conjecture of Shimura-Taniyama-Weil that elliptic curves are modular. One connection
b etween modular forms and arithmetic is through the medium of
L
-functions, the
basic example of which is the Riemann
ζ
-function. We will discuss various types of
L-function in this course and give arithmetic applications.
Pre-requisites
Prerequisites for the course are fairly modest; from number theory, apart from basic
elementary notions, some knowledge of quadratic fields is desirable. A fair chunk of the
course will involve (fairly 19th-century) analysis, so we will assume the basic theory of
holomorphic functions in one complex variable, such as are found in a first course on
complex analysis (e.g. the 2nd year Complex Analysis course of the Tripos).
Contents
0 Introduction
1 Some preliminary analysis
1.1 Characters of abelian groups
1.2 Fourier transforms
1.3 Mellin transform and Γ-function
2 Riemann ζ-function
3 Dirichlet L-functions
4 The modular group
5 Modular forms of level 1
5.1 Basic definitions
5.2 The space of modular forms
5.3 Arithmetic of
6 Hecke operators
6.1 Hecke operators and algebras
6.2 Hecke operators on modular forms
7 L-functions of eigenforms
8 Modular forms for subgroups of SL
2
(Z)
8.1 Definitions
8.2 The Petersson inner product
8.3 Examples of modular forms
9 Hecke theory for Γ
0
(N)
10 Modular forms and rep theory
0 Introduction
One of the big problems in number theory is the so-called Langlands programme,
which is relates “arithmetic objects” such as representations of the Galois group
and elliptic curves over
Q
, with “analytic objects” such as modular forms and
more generally automorphic forms and representations.
Example. y
2
+
y
=
x
3
x
is an elliptic curve, and we can associate to it the
function
f(z) = q
Y
n1
(1 q
n
)
2
(1 q
11n
)
2
=
X
n=1
a
n
q
n
, q = e
2πiz
,
where we assume
Im z >
0, so that
|q| <
1. The relation between these two
objects is that the number of points of
E
over
F
p
is equal to 1 +
p a
p
, for
p 6
= 11. This strange function
f
is a modular form, and is actually cooked up
from the slightly easier function
η(z) = q
1/24
Y
n=1
(1 q
n
)
by
f(z) = (η(z)η(11z))
2
.
This function
η
is called the Dedekind eta function, and is one of the simplest
examples of a modular forms (in the sense that we can write it down easily).
This satisfies the following two identities:
η(z + 1) = e
/12
η(z), η
1
z
=
r
z
i
η(z).
The first is clear, and the second takes some work to show. These transformation
laws are exactly what makes this thing a modular form.
Another way to link E and f is via the L-series
L(E, s) =
X
n=1
a
n
n
s
,
which is a generalization of the Riemann ζ-function
ζ(s) =
X
n=1
1
n
s
.
We are in fact not going to study elliptic curves, as there is another course
on that, but we are going to study the modular forms and these
L
-series. We are
going to do this in a fairly classical way, without using algebraic number theory.
1 Some preliminary analysis
1.1 Characters of abelian groups
When we were young, we were forced to take some “applied” courses, and learnt
about these beasts known as Fourier transforms. At that time, the hope was
that we can leave them for the engineers, and never meet them ever again.
Unfortunately, it turns out Fourier transforms are also important in “pure”
mathematics, and we must understand them well.
Let’s recall how Fourier transforms worked. We had two separate but closely
related notions. First of all, we can take the Fourier transform of a function
f : R C. The idea is that we wanted to write any such function as
f(x) =
Z
−∞
e
2πiyx
ˆ
f(y) dy.
One way to think about this is that we are expanding
f
in the basis
χ
y
(
x
) =
e
2πiyx
.
We also could take the Fourier series of a periodic function, i.e. a function defined
on R/Z. In this case, we are trying to write our function as
f(x) =
X
n=−∞
c
n
e
2πinx
.
In this case, we are expanding our function in the basis
χ
n
(
x
) =
e
2πinx
. What
is special about these basis {χ
y
} and {χ
y
}?
We observe that
R
and
R/Z
are not just topological spaces, but in fact
abelian topological groups. These
χ
y
and
χ
n
are not just functions to
C
, but
continuous group homomorphisms to U(1)
C
. In fact, these give all continuous
group homomorphisms from R and R/Z to U(1).
Definition
(Character)
.
Let
G
be an abelian topological group. A (unitary)
character of
G
is a continuous homomorphism
χ
:
G
U(1), where U(1) =
{z
C | |z| = 1}.
To better understand Fourier transforms, we must understand characters,
and we shall spend some time doing so.
Example. For any group G, there is the trivial character χ
0
(g) 1.
Example. The product of two characters is a character.
Example. If χ is a character, then so is χ
, and χχ
= 1.
Thus, we see that the collection of all characters form a group under multi-
plication.
Definition
(Character group)
.
Let
G
be a group. The character group (or
Pontryagin dual )
ˆ
G is the group of all characters of G.
It is usually not hard to figure out what the character group is.
Example. Let G = R. For y R, we let χ
y
: R U(1) be
χ
y
(x) = e
2πixy
.
For each
y R
, this is a character, and all characters are of this form. So
ˆ
R
=
R
under this correspondence.
Example.
Take
G
=
Z
with the discrete topology. A character is uniquely
determined by the image of 1, and any element of U(1) can be the image of 1.
So we have
ˆ
G
=
U(1).
Example.
Take
G
=
Z/N Z
. Then the character is again determined by the
image of 1, and the allowed values are exactly the N th roots of unity. So
ˆ
G
=
µ
N
= {ζ C
×
: ζ
N
= 1}.
Example.
Let
G
=
G
1
× G
2
. Then
ˆ
G
=
ˆ
G
1
×
ˆ
G
2
. So, for example,
ˆ
R
n
=
R
n
.
Under this correspondence, a y R
n
corresponds to
χ
y
(x) = e
2πx·y
.
Example. Take G = R
×
. We have
G
=
1} × R
×
>0
=
1} × R,
where we have an isomorphism between
R
×
>0
=
R
by the exponential map. So
we have
ˆ
G
=
Z/2Z × R.
Explicitly, given (ε, σ) Z/2Z × R, then character is given by
x 7→ sgn(x)
ε
|x|
.
Note that
ˆ
G
has a natural topology for which the evaluation maps (
χ
ˆ
G
)
7→ χ
(
g
)
U(1) are all continuous for all
g
. Moreover, evaluation gives us a
map G
ˆ
ˆ
G.
Theorem
(Pontryagin duality)
.
Pontryagin duality If
G
is locally compact,
then G
ˆ
ˆ
G is an isomorphism.
Since this is a course on number theory, and not topological groups, we will
not prove this.
Proposition.
Let
G
be a finite abelian group. Then
|
ˆ
G|
=
|G|
, and
G
and
ˆ
G
are in fact isomorphic, but not canonically.
Proof.
By the classification of finite abelian groups, we know
G
is a product of
cyclic groups. So it suffices to prove the result for cyclic groups
Z/N Z
, and the
result is clear since
\
Z/N Z = µ
N
=
Z/N Z.
1.2 Fourier transforms
Equipped with the notion of characters, we can return to our original goal of
understand Fourier transforms. We shall first recap the familiar definitions of
Fourier transforms in specific cases, and then come up with the definition of
Fourier transforms in full generality. In the mean time, we will get some pesky
analysis out of the way.
Definition
(Fourier transform)
.
Let
f
:
R C
be an
L
1
function, i.e.
R
|f|
d
x <
. The Fourier transform is
ˆ
f(y) =
Z
−∞
e
2πixy
f(x) dx =
Z
−∞
χ
y
(x)
1
f(x) dx.
This is a bounded and continuous function on R.
We will see that the “correct” way to think about the Fourier transform is
to view it as a function on
ˆ
R instead of R.
In general, there is not much we can say about how well-behaved
ˆ
f
will
be. In particular, we cannot expect the “Fourier inversion theorem” to hold for
general
L
1
functions. If we like analysis, then we can figure out exactly how
much we need to assume about
ˆ
f
. But we don’t. We chicken out and only
consider functions that decay rely fast at infinity. This makes our life much
easier.
Definition (Schwarz space). The Schwarz space is defined by
S(R) = {f C
(R) : x
n
f
(k)
(x) 0 as x ±∞ for all k, n 0}.
Example. The function
f(x) = e
πx
2
.
is in the Schwarz space.
One can prove the following:
Proposition. If f S(R), then
ˆ
f S(R), and the Fourier inversion formula
ˆ
ˆ
f = f (x)
holds.
Everything carries over when we replace
R
with
R
n
, as long as we promote
both x and y into vectors.
We can also take the Fourier transform of functions defined on
G
=
R/Z
.
For n Z, we let χ
n
ˆ
G by
χ
n
(x) = e
2πinx
.
These are exactly all the elements of
ˆ
G
, and
ˆ
G
=
Z
. We then define the Fourier
coefficients of a periodic function f : R/Z C by
c
n
(f) =
Z
1
0
e
2πinx
f(x) dx =
Z
R/Z
χ
n
(x)
1
f(x) dx.
Again, under suitable regularity conditions on
f
, e.g. if
f C
(
R/Z
), we have
Proposition.
f(x) =
X
nZ
c
n
(f)e
2πinx
=
X
nZ
=
ˆ
G
c
n
(f)χ
n
(x).
This is the Fourier inversion formula for G = R/Z.
Finally, in the case when G = Z/NZ, we can define
Definition
(Discrete Fourier transform)
.
Given a function
f
:
Z/N Z C
, we
define the Fourier transform
ˆ
f : µ
N
C by
ˆ
f(ζ) =
X
aZ/NZ
ζ
a
f(a).
This time there aren’t convergence problems to worry with, so we can quickly
prove this result:
Proposition. For a function f : Z/NZ C, we have
f(x) =
1
N
X
ζµ
N
ζ
x
ˆ
f(ζ).
Proof.
We see that both sides are linear in
f
, and we can write each function
f
as
f =
X
aZ/NZ
f(a)δ
a
,
where
δ
a
(x) =
(
1 x = a
0 x 6= a
.
So we wlog f = δ
a
. Thus we have
ˆ
f(ζ) = ζ
a
,
and the RHS is
1
N
X
ζµ
N
ζ
xa
.
We now note the fact that
X
ζµ
N
ζ
k
=
(
N k 0 (mod N)
0 otherwise
.
So we know that the RHS is equal to δ
a
, as desired.
It is now relatively clear what the general picture should be, except that we
need a way to integrate functions defined on an abelian group. Since we are not
doing analysis, we shall not be very precise about what we mean:
Definition
(Haar measure)
.
Let
G
be a topological group. A Haar measure
is a left translation-invariant Borel measure on
G
satisfying some regularity
conditions (e.g. being finite on compact sets).
Theorem. Let G be a locally compact abelian group G. Then there is a Haar
measure on G, unique up to scaling.
Example. On G = R, the Haar measure is the usual Lebesgue measure.
Example.
If
G
is discrete, then the Haar measure is the counting measure, so
that
Z
f dg =
X
gG
f(g).
Example. If G = R
×
>0
, then the integral given by the Haar measure is
Z
f(x)
dx
x
,
since
dx
x
is invariant under multiplication of x by a constant.
Now we can define the general Fourier transform.
Definition
(Fourier transform)
.
Let
G
be a locally compact abelian group with
a Haar measure d
g
, and let
f
:
G C
be a continuous
L
1
function. The Fourier
transform
ˆ
f :
ˆ
G C is given by
ˆ
f(χ) =
Z
G
χ(g)
1
f(g) dg.
It is possible to prove the following theorem:
Theorem
(Fourier inversion theorem)
.
Given a locally compact abelian group
G
with a fixed Haar measure, there is some constant
C
such that for “suitable”
f : G C, we have
ˆ
ˆ
f(g) = Cf(g),
using the canonical isomorphism G
ˆ
ˆ
G.
This constant is necessary, because the measure is only defined up to a
multiplicative constant.
One of the most important results of this investigation is the following result:
Theorem (Poisson summation formula). Let f S(R
n
). Then
X
aZ
n
f(a) =
X
bZ
n
ˆ
f(b).
Proof. Let
g(x) =
X
aZ
n
f(x + a).
This is now a function that is invariant under translation of
Z
n
. It is easy to
check this is a well-defined
C
function on
R
n
/Z
n
, and so has a Fourier series.
We write
g(x) =
X
bZ
n
c
b
(g)e
2πib·x
,
with
c
b
(g) =
Z
R
n
/Z
n
e
2πib·x
g(x) dx =
X
aZ
n
Z
[0,1]
n
e
2πib·x
f(x + a) dx.
We can then do a change of variables
x 7→ x a
, which does not change the
exponential term, and get that
c
b
(g) =
Z
R
n
e
2πib·x
f(x) dx =
ˆ
f(b).
Finally, we have
X
aZ
n
f(a) = g(0) =
X
bZ
n
c
b
(x) =
X
bZ
n
ˆ
f(b).
1.3 Mellin transform and Γ-function
We unfortunately have a bit more analysis to do, which we will use a lot later
on. This is the Mellin transform.
Definition (Mellin transform). Let f : R
>0
C be a function. We define
M(f, s) =
Z
0
y
s
f(y)
dy
y
,
whenever this converges.
We want to think of this as an analytic function of
s
. The following lemma
tells us when we can actually do so
Lemma. Suppose f : R
>0
C is such that
y
N
f(y) 0 as y for all N Z
there exists m such that |y
m
y(f)| is bounded as y 0
Then M (f, s) converges and is an analytic function of s for Re(s) > m.
The conditions say
f
is rapidly decreasing at
and has moderate growth at
0.
Proof. We know that for any 0 < r < R < , the integral
Z
R
r
y
s
f(y)
dy
y
is analytic for all s since f is continuous.
By assumption, we know
R
R
0 as
R
uniformly on compact subsets
of C. So we know
Z
r
y
s
f(y)
dy
y
converges uniformly on compact subsets of C.
On the other hand, the integral
R
r
0
as
r
0 converges uniformly on compact
subsets of
{s C
:
Re
(
s
)
> m}
by the other assumption. So the result
follows.
This transform might seem a bit strange, but we can think of this as an
analytic continuation of the Fourier transform.
Example. Suppose we are in the rather good situation that
Z
0
|f|
dy
y
< .
In practice, this will hardly ever be the case, but this is a good place to start
exploring. In this case, the integral actually converges on
iR
, and equals the
Fourier transform of f L
1
(G) = L
1
(R
×
>0
). Indeed, we find
ˆ
G = {y 7→ y
: σ R}
=
R,
and
dy
y
is just the invariant measure on
G
. So the formula for the Mellin
transform is exactly the formula for the Fourier transform, and we can view the
Mellin transform as an analytic continuation of the Fourier transform.
We now move on to explore properties of the Mellin transform. When we
make a change of variables y αy, by inspection of the formula, we find
Proposition.
M(f(αy), s) = α
s
M(f, s)
for α > 0.
The following is a very important example of the Mellin transform:
Definition function). The Γ function is the Mellin transform of
f(y) = e
y
.
Explicitly, we have
Γ(s) =
Z
0
e
y
y
s
dy
y
.
By general theory, we know this is analytic for Re(s) > 0.
If we just integrate by parts, we find
Γ(s) =
Z
0
e
y
y
s1
dy =
e
y
y
s
s
0
+
1
s
Z
0
e
y
y
s
dy =
1
s
Γ(s + 1).
So we find that
Proposition.
sΓ(s) = Γ(s + 1).
Moreover, we can compute
Γ(1) =
Z
0
e
y
dy = 1.
So we get
Proposition. For an integer n 1, we have
Γ(n) = (n 1)!.
In general, iterating the above formula, we find
Γ(s) =
1
s(s + 1) ···(s + N 1)
Γ(s + N).
Note that the right hand side makes sense for
Re
(
s
)
> N
(except at non-
positive integer points). So this allows us to extend Γ(
s
) to a meromorphic
function on {Re(s) > N }, with simple poles at 0, 1, ··· , 1 N of residues
res
s=1N
Γ(s) =
(1)
N1
(N 1)!
.
Of course, since
N
was arbitrary, we know Γ(
s
) extends to a meromorphic
function on C \ Z
0
.
Here are two facts about the Γ function that we are not going to prove,
because, even if the current experience might suggest otherwise, this is not an
analysis course.
Proposition.
(i) The Weierstrass product: We have
Γ(s)
1
= e
γs
s
Y
n1
1 +
s
n
e
s/n
for all
s C
. In particular, Γ(
s
) is never zero. Here
γ
is the Euler-
Mascheroni constant, given by
γ = lim
n→∞
1 +
1
2
+ ··· +
1
n
log n
.
(ii) Duplication and reflection formulae:
π
1
2
Γ(2s) = 2
2s1
Γ(s
s +
1
2
and
Γ(s)Γ(1 s) =
π
sin πz
.
The main reason why we care about the Mellin transform in this course is
that a lot of Dirichlet series are actually Mellin transforms of some functions.
Suppose we have a Dirichlet series
X
n=1
a
n
n
s
,
where the a
n
grow not too quickly. Then we can write
(2π)
s
Γ(s)
X
n=1
a
n
n
s
=
X
n=1
a
n
(2πn)
s
M(e
y
, s)
=
X
n=1
M(e
2πny
, s)
= M(f, s),
where we set
f(y) =
X
n1
a
n
e
2πny
.
Since we know about the analytic properties of the Γ function, by understanding
M(f, s), we can deduce useful properties about the Dirichlet series itself.
2 Riemann ζ-function
We ended the previous chapter by briefly mentioning Dirichlet series. The first
and simplest example one can write down is the Riemann ζ-function.
Definition (Riemann ζ-function). The Riemann ζ-function is defined by
ζ(s) =
X
n1
1
n
s
for Re(s) > 1.
This
ζ
-function is related to prime numbers by the following famous formula:
Proposition (Euler product formula). We have
ζ(s) =
Y
p prime
1
1 p
s
.
Proof.
Euler’s proof was purely formal, without worrying about convergence.
We simply note that
Y
p prime
1
1 p
s
=
Y
p
(1 + p
s
+ (p
2
)
s
+ ···) =
X
n1
n
s
,
where the last equality follows by unique factorization in
Z
. However, to prove
this properly, we need to be a bit more careful and make sure things converge.
Saying the infinite product
Q
p
convergence is the same as saying
P
p
s
converges, by basic analysis, which is okay since we know
ζ
(
s
) converges absolutely
when Re(s) > 1. Then we can look at the difference
ζ(s)
Y
pX
1
1 p
s
= ζ(s)
Y
pX
(1 + p
s
+ p
2s
+ ···)
=
Y
n∈N
X
n
s
,
where
N
X
is the set of all
n
1 such that at least one prime factor is
X
. In
particular, we know
ζ(s)
Y
pX
1
1 p
s
X
nX
|n
s
| 0
as X . So the result follows.
The Euler product formula is the beginning of the connection between the
ζ
-function and the distribution of primes. For example, as the product converges
for
Re
(
s
)
>
1, we know in particular that
ζ
(
s
)
6
= 0 for all
s
when
Re
(
s
)
>
1.
Whether or not
Re
(
s
) vanishes elsewhere is a less straightforward matter, and
this involves quite a lot of number theory.
We will, however, not care about primes in this course. Instead, we look at
some further analytic properties of the
ζ
function. To do so, we first write it as
a Mellin transform.
Theorem. If Re(s) > 1, then
(2π)
s
Γ(s)ζ(s) =
Z
0
y
s
e
2πy
1
dy
y
= M(f, s),
where
f(y) =
1
e
2πy
1
.
This is just a simple computation.
Proof. We can write
f(y) =
e
2πy
1 e
2πy
=
X
n1
e
2πny
for y > 0.
As y 0, we find
f(y)
1
2πy
.
So when Re(s) > 1, the Mellin transform converges, and equals
X
n1
M(e
2πny
, s) =
X
n1
(2πn)
s
M(e
y
, s) = (2π)
s
Γ(s)ζ(s).
Corollary. ζ
(
s
) has a meromorphic continuation to
C
with a simple pole at
s = 1 as its only singularity, and
res
s=1
ζ(s) = 1.
Proof. We can write
M(f, s) = M
0
+ M
=
Z
1
0
+
Z
1
y
s
e
2πy
1
dy
y
.
The second integral
M
is convergent for all
s C
, hence defines a holomorphic
function.
For any fixed N, we can expand
f(y) =
N1
X
n=1
c
n
y
n
+ y
N
g
N
(y)
for some g C
(R), as f has a simple pole at y = 0, and
c
1
=
1
2π
.
So for Re(s) > 1, we have
M
0
=
N1
X
n=1
c
n
Z
1
0
y
n+s1
dy +
Z
N
0
y
N+s1
g
N
(y) dy
=
N1
X
n=1
c
n
s + n
y
s+n
+
Z
1
0
g
N
(y)y
s+N1
dy.
We now notice that this formula makes sense for
Re
(
s
)
> N
. Thus we have
found a meromorphic continuation of
(2π)
s
Γ(s)ζ(s)
to
{Re
(
s
)
> N }
, with at worst simple poles at
s
= 1
N,
2
N, ··· ,
0
,
1. Also,
we know Γ(
s
) has a simple pole at
s
= 0
,
1
,
2
, ···
. So
ζ
(
s
) is analytic at
s = 0, 1, 2, ···. Since c
1
=
1
2π
and Γ(1) = 1, we get
res
s=1
ζ(s) = 1.
Now we note that by the Euler product formula, if there are only finitely
many primes, then ζ(s) is certainly analytic everywhere. So we deduce
Corollary. There are infinitely many primes.
Given a function
ζ
, it is natural to ask what values it takes. In particular,
we might ask what values it takes at integers. There are many theorems and
conjectures concerning the values at integers of
L
-functions (which are Dirichlet
series like the
ζ
-function). These properties relate to subtle number-theoretic
quantities. For example, the values of
ζ
(
s
) at negative integers are closely
related to the class numbers of the cyclotomic fields
Q
(
ζ
p
). These are also
related to early (partial) proofs of Fermat’s last theorem, and things like the
Birch–Swinnerton-Dyer conjecture on elliptic curves.
We shall take a tiny step by figuring out the values of
ζ
(
s
) at negative integers.
They are given by the Bernoulli numbers.
Definition
(Bernoulli numbers)
.
The Bernoulli numbers are defined by a gen-
erating function
X
n=0
B
n
t
n
n!
=
t
e
t
1
=
1 +
t
2!
+
t
2
3!
+ ···
1
.
Clearly, all Bernoulli numbers are rational. We can directly compute
B
0
= 1, B
1
=
1
2
, ··· .
The first thing to note about this is the following:
Proposition. B
n
= 0 if n is odd and n 3.
Proof. Consider
f(t) =
t
e
t
1
+
t
2
=
X
n0,n6=1
B
n
t
n
n!
.
We find that
f(t) =
t
2
e
t
+ 1
e
t
1
= f(t).
So all the odd coefficients must vanish.
Corollary. We have
ζ(0) = B
1
=
1
2
, ζ(1 n) =
B
n
n
for
n >
1. In particular, for all
n
1 integer, we know
ζ
(1
n
)
Q
and vanishes
if n > 1 is odd.
Proof. We know
(2π)
s
Γ(s)ζ(s)
has a simple pole at s = 1 n, and the residue is c
n1
, where
1
e
2πy
1
=
X
n≥−1
c
n
y
n
.
So we know
c
n1
= (2π)
n1
B
n
n!
.
We also know that
res
s=1n
Γ(s) =
(1)
n1
(n 1)!
,
we get that
ζ(1 n) = (1)
n1
B
n
n
.
If
n
= 1, then this gives
1
2
. If
n
is odd but
>
1, then this vanishes. If
n
is even,
then this is
B
n
n
, as desired.
To end our discussion on the
ζ
-function, we shall prove a functional equation,
relating
ζ
(
s
) to
ζ
(1
s
). To do so, we relate the
ζ
-function to another Mellin
transform. We define
Θ(y) =
X
nZ
e
πn
2
y
= 1 + 2
X
n1
e
πn
2
y
.
This is convergent for for y > 0. So we can write
Θ(y) = ϑ(iy),
where
ϑ(z) =
X
nZ
e
πin
2
z
,
which is analytic for
|e
πiz
| <
1, i.e.
Im
(
z
)
>
0. This is Jacobi’s
ϑ
-function. This
function is also important in algebraic geometry, representation theory, and even
applied mathematics. But we will just use it for number theory. We note that
Θ(y) 1
as y , so we can’t take its Mellin transform. What we can do is
Proposition.
M
Θ(y) 1
2
,
s
2
= π
s/2
Γ
s
2
ζ(s).
The proof is again just do it.
Proof. The left hand side is
X
n1
M
e
πn
2
y
,
s
2
=
X
n1
(πn
2
)
s/2
M
e
y
,
s
2
= π
s/2
Γ
s
2
ζ(s).
To produce a functional equation for ζ, we first do it for Θ.
Theorem (Functional equation for Θ-function). If y > 0, then
Θ
1
y
= y
1/2
Θ(y), ()
where we take the positive square root. More generally, taking the branch of
which is positive real on the positive real axis, we have
ϑ
1
z
=
z
i
1/2
ϑ(z).
The proof is rather magical.
Proof. By analytic continuation, it suffices to prove (). Let
g
t
(x) = e
πtx
2
= g
1
(t
1/2
x).
In particular,
g
1
(x) = e
πx
2
.
Now recall that
ˆg
1
=
g
1
. Moreover, the Fourier transform of
f
(
αx
) is
1
α
ˆ
f
(
y
).
So
ˆg
t
(y) = t
1/2
ˆg
1
(t
1/2
y) = t
1/2
g
1
(t
1/2
y) = t
1/2
e
πy
2
/t
.
We now apply the Poisson summation formula:
Θ(t) =
X
nZ
e
πn
2
t
=
X
nZ
g
t
(n) =
X
nZ
ˆg
t
(n) = t
1/2
Θ(1/t).
Before we continue, we notice that most of the time, when we talk about the
Γ-function, there are factors of π floating around. So we can conveniently set
Notation.
Γ
R
(s) = π
s/2
Γ(s/2).
Γ
C
(s) = 2(2π)
s
Γ(s)
These are the real/complex Γ-factors.
We also define
Notation.
Z(s) Γ
R
(s)ζ(s) = π
s/2
Γ
s
2
ζ(s).
The theorem is then
Theorem (Functional equation for ζ-function).
Z(s) = Z(1 s).
Moreover, Z(s) is meromorphic, with only poles at s = 1 and 0.
Proof. For Re(s) > 1, we have
2Z(s) = M
Θ(y) 1,
s
2
=
Z
0
[Θ(y) 1]y
s/2
dy
y
=
Z
1
0
+
Z
1
[Θ(y) 1]y
s/2
dy
y
The idea is that using the functional equation for the Θ-function, we can relate
the
R
1
0
part and the
R
1
part. We have
Z
1
0
(Θ(y) 1)y
s/2
dy
y
=
Z
1
0
(Θ(y) y
1/2
)y
s/2
dy
y
+
Z
1
0
y
s1
2
y
1/2
dy
y
=
Z
1
0
(y
1/2
Θ(1/y) y
1/2
)
dy
y
+
2
s 1
2
s
.
In the first term, we change variables y 1/y, and get
=
Z
1
y
1/2
(Θ(y) 1)y
s/2
dy
y
+
2
s 1
2
s
.
So we find that
2Z(s) =
Z
1
(Θ(y) 1)(y
s/2
+ y
1s
2
)
dy
y
+
2
s 1
2
s
= 2Z(1 s).
Note that what we’ve done by separating out the
y
s1
2
y
s/2
term is that we
separated out the two poles of our function.
Later on, we will come across more
L
-functions, and we will prove functional
equations in the same way.
Note that we can write
Z(s) = Γ
R
(s)
Y
p primes
1
1 p
s
,
and the term Γ
R
(
s
) should be thought of as the Euler factor for
p
=
, i.e. the
Archimedean valuation on Q.
3 Dirichlet L-functions
We now move on to study a significant generalization of
ζ
-functions, namely
Dirichlet
L
-functions. While these are generalizations of the
ζ
-function, it turns
out the
ζ
function is a very particular kind of
L
-function. For example, most
L
-functions are actually analytic on all of
C
, except for (finite multiples of) the
ζ-function.
Recall that a Dirichlet series is a series of the form
X
n=1
a
n
n
s
.
A Dirichlet
L
-function is a Dirichlet series whose coefficients come from Dirichlet
characters.
Definition
(Dirichlet characters)
.
Let
N
1. A Dirichlet character mod
N
is
a character χ : (Z/NZ)
×
C
×
.
As before, we write
\
(Z/N Z)
×
for the group of characters.
Note that in the special case N = 1, we have
Z/N Z = {0 = 1} = (Z/NZ)
×
,
and so
\
(Z/N Z)
×
=
{1}, and the only Dirichlet character is identically 1.
Not all characters are equal. Some are less exciting than others. Suppose
χ
is a character mod
N
, and we have some integer
d >
1. Then we have the
reduction mod N map
(Z/N dZ)
×
(Z/N Z)
×
,
and we can compose
χ
with this to get a character mod
Nd
. This is a rather
boring character, because the value of
x
(
Z/N dZ
)
×
only depends on the value
of x mod N.
Definition
(Primitive character)
.
We say a character
χ
\
(Z/nZ)
×
is primitive
if there is no M < N with M | N with χ
0
\
(Z/M Z)
×
such that
χ = χ
0
(reduction mod M).
Similarly we define
Definition
(Equivalent characters)
.
We say characters
χ
1
\
(Z/N
1
Z)
×
and
χ
2
\
(Z/N
2
Z)
×
are equivalent if for all
x Z
such that (
x, N
1
N
2
) = 1, we have
χ
1
(x mod N
1
) = χ
2
(x mod N
2
).
It is clear that if we produce a new character from an old one via reduction
mod Nd, then they are equivalent.
One can show the following:
Proposition.
If
χ
\
(Z/N Z)
×
, then there exists a unique
M | N
and a primitive
χ
\
(Z/M Z)
×
that is equivalent to χ.
Definition
(Conductor)
.
The conductor of a character
χ
is the unique
M | N
such that there is a primitive χ
\
(Z/M Z)
×
that is equivalent to χ.
Example. Take
χ = χ
0
\
(Z/N Z)
×
,
given by
χ
0
(
x
)
1. If
N >
1, then
χ
0
is not primitive, and the associated
primitive character is the trivial character modulo
M
= 1. So the conductor is 1.
Using these Dirichlet characters, we can define Dirichlet L-series:
Definition
(Dirichlet
L
-series)
.
Let
χ
\
(Z/N Z)
×
be a Dirichlet character. The
Dirichlet L-series of χ is
L(χ, s) =
X
n1
(n,N)=1
χ(n)n
s
.
Since |χ(n)| = 1, we again know this is absolutely convergent for Re(s) > 1.
As
χ
(
mn
) =
χ
(
m
)
χ
(
n
) whenever (
mn, N
) = 1, we get the same Euler product
as we got for the ζ-function:
Proposition.
L(χ, s) =
Y
prime p-N
1
1 χ(p)p
s
.
The proof of convergence is again similar to the case of the ζ-function.
It follows that
Proposition.
Suppose
M | N
and
χ
M
\
(Z/M Z)
×
and
χ
N
\
(Z/N Z)
×
are
equivalent. Then
L(χ
M
, s) =
Y
p-M
p|N
1
1 χ
M
(p)p
s
L(χ
N
, s).
In particular,
L(χ
M
, s)
L(χ
N
, s)
=
Y
p-M
p|N
1
1 χ
M
(p)p
s
is analytic and non-zero for Re(s) > 0.
We’ll next show that
L
(
χ, s
) has a meromorphic continuation to
C
, and is
analytic unless χ = χ
0
.
Theorem.
(i) L
(
χ, s
) has a meromorphic continuation to
C
, which is analytic except for
at worst a simple pole at s = 1.
(ii)
If
χ 6
=
χ
0
(the trivial character), then
L
(
χ, s
) is analytic everywhere. On
the other hand, L(χ
0
, s) has a simple pole with residue
ϕ(N)
N
=
Y
p|N
1
1
p
,
where ϕ is the Euler function.
Proof. More generally, let φ : Z/NZ C be any N -periodic function, and let
L(φ, s) =
X
n=1
φ(n)n
s
.
Then
(2π)
s
Γ(s)L(φ, s) =
X
n=1
φ(n)M(e
2πny
, s) = M(f(y), s),
where
f(y) =
X
n1
φ(n)e
2πny
.
We can then write
f(y) =
N
X
n=1
X
r=0
φ(n)e
2π(n+rN)y
=
N
X
n=1
φ(n)
e
2πny
1 e
2πNy
=
N
X
n=1
φ(n)
e
2π(N n)y
e
2πNy
1
.
As 0 N n < N, this is O(e
2πy
) as y . Copying for ζ(s), we write
M(f, s) =
Z
1
0
+
Z
1
f(y)y
s
dy
y
M
0
(s) + M
(s).
The second term is analytic for all s C, and the first term can be written as
M
0
(s) =
N
X
n=1
φ(n)
Z
1
0
e
2π(Nn)y
e
2πNy
1
y
s
dy
y
.
Now for any L, we can write
e
2π(Nn)y
e
2πNy
1
=
1
2πNy
+
L1
X
r=0
c
r,n
y
r
+ y
L
g
L,n
(y)
for some g
L,n
(y) C
[0, 1]. Hence we have
M
0
(s) =
N
X
n=1
φ(n)
Z
1
0
1
2πNy
y
s
dy
y
+
Z
1
0
L1
X
r=0
c
r,n
y
r+s1
dy
!
+ G(s),
where G(s) is some function analytic for Re(s) > L. So we see that
(2π)
s
Γ(s)L(φ, s) =
N
X
n=1
φ(n)
1
2πN(s 1)
+
c
0,n
s
+ ··· +
c
L1,n
s + L 1
+ G(s).
As Γ(
s
) has poles at
s
= 0
,
1
, ···
, this cancels with all the poles apart from the
one at s = 1.
The first part then follows from taking
φ(n) =
(
χ(n) (n, N) = 1
0 (n, N) 1
.
By reading off the formula, since Γ(1) = 1, we know
res
s=1
L(χ, s) =
1
N
N
X
n=1
φ(n).
If
χ 6
=
χ
0
, then this vanishes by the orthogonality of characters. Otherwise, it is
|(Z/N Z)
×
|/N = ϕ(N )/N.
Note that this is consistent with the result
L(χ
0
, s) =
Y
p|N
(1 p
s
)ζ(s).
So for a non-trivial character, our L-function doesn’t have a pole.
The next big theorem is that in fact
L
(
χ,
1) is non-zero. In number theory,
there are lots of theorems of this kind, about non-vanishing of
L
-functions at
different points.
Theorem. If χ 6= χ
0
, then L(χ, 1) 6= 0.
Proof. The trick is the consider all characters together. We let
ζ
N
(s) =
Y
χ
\
(Z/NZ)
×
L(χ, s) =
Y
p-N
Y
χ
(1 χ(p)p
s
)
1
for
Re
(
s
)
>
1. Now we know
L
(
χ
0
, s
) has a pole at
s
= 1, and is analytic
everywhere else. So if any other
L
(
χ,
1) = 0, then
ζ
N
(
s
) is analytic on
Re
(
s
)
>
0.
We will show that this cannot be the case.
We begin by finding a nice formula for the product of (1
χ
(
p
)
p
s
)
1
over
all characters.
Claim. If p - N, and T is any complex number, then
Y
χ
\
(Z/NZ)
×
(1 χ(p)T ) = (1 T
f
p
)
ϕ(N)/f
p
,
where f
p
is the order of p in (Z/nZ)
×
.
So
ζ
N
(s) =
Y
p-N
(1 p
f
p
s
)
ϕ(N)/f
p
.
To see this, we write f = f
p
, and, for convenience, write
G = (Z/NZ)
×
H = hpi G.
We note that
ˆ
G
naturally contains
[
G/H
=
{χ
ˆ
G
:
χ
(
p
) = 1
}
as a subgroup.
Also, we know that
|
[
G/H| = |G/H| = ϕ(N)/f.
Also, the restriction map
ˆ
G
[
G/H
ˆ
H
is obviously injective, hence an isomorphism by counting orders. So
Y
χ
ˆ
G
(1χ(p)T ) =
Y
χ
ˆ
H
(1χ(p)T )
ϕ(N)/f
=
Y
ζµ
f
(1ζT )
ϕ(N)/f
= (1T
f
)
ϕ(N)/f
.
We now notice that when we expand the product of
ζ
N
, at least formally, then we
get a Dirichlet series with non-negative coefficients. We now prove the following
peculiar property of such Dirichlet series:
Claim. Let
D(s) =
X
n1
a
n
n
s
be a Dirichlet series with real
a
n
0, and suppose this is absolutely convergent
for
Re
(
s
)
> σ >
0. Then if
D
(
s
) can be analytically continued to an analytic
function
˜
D on {Re(s) > 0}, then the series converges for all real s > 0.
Let
ρ > σ
. Then by the analytic continuation, we have a convergent Taylor
series on {|s ρ| < ρ}
D(s) =
X
k0
1
k!
D
(k)
(ρ)(s ρ)
k
.
Moreover, since
ρ > σ
, we can directly differentiate the Dirichlet series to obtain
the derivatives:
D
(k)
(ρ) =
X
n1
a
n
(log n)
k
n
ρ
.
So if 0 < x < ρ, then
D(x) =
X
k0
1
k!
(p x)
k
X
n1
a
n
(log n)
k
n
ρ
.
Now note that all terms in this sum are all non-negative. So the double series
has to converge absolutely as well, and thus we are free to rearrange the sum as
we wish. So we find
D(x) =
X
n1
a
n
n
ρ
X
k0
1
k!
(ρ x)
k
(log n)
l
=
X
n1
a
n
n
ρ
e
(ρx) log n
=
X
n1
a
n
n
ρ
n
ρx
=
X
n1
a
n
n
x
,
as desired.
Now we are almost done, as
ζ
N
(s) = L(χ
0
, s)
Y
χ6=χ
0
L(χ, s).
We saw that
L
(
χ
0
, s
) has a simple pole at
s
= 1, and the other terms are all
holomorphic at
s
= 1. So if some
L
(
χ,
1) = 0, then
ζ
N
(
s
) is holomorphic for
Re
(
s
)
>
0 (and in fact everywhere). Since the Dirichlet series of
η
N
has
0
coefficients, by the lemma, it suffices to find some point on
R
>0
where the
Dirichlet series for ζ
N
doesn’t converge.
We notice
ζ
N
(x) =
Y
p-N
(1 + p
f
p
x
+ p
2f
p
x
+ ···)
ϕ(N)/f
p
X
p-N
p
ϕ(N)x
.
It now suffices to show that
P
p
1
=
, and thus the series for
ζ
N
(
x
) is not
convergent for x =
1
ϕ(N)
.
Claim. We have
X
p prime
p
x
log(x 1)
as x 1
+
. On the other hand, if χ 6= χ
0
is a Dirichlet character mod N, then
X
p-N
χ(p)p
x
is bounded as x 1
+
.
Of course (and crucially, as we will see), the second part is not needed for
the proof, but it is still nice to know.
To see this, we note that for any χ, we have
log L(χ, x) =
X
p-N
log(1 χ(p)p
x
) =
X
p-N
X
r1
χ(p)
r
p
rx
r
.
So
log L(χ, x)
X
p-N
χ(p)p
x
<
X
p-N
X
r2
p
rx
=
X
p-N
p
2x
1 p
x
X
n1
n
2
1/2
,
which is a (finite) constant for C < . When χ = χ
0
, N = 1, then
log ζ(x)
X
p
p
x
is bounded as x 1
+
. But we know
ζ(s) =
1
s 1
+ O(s).
So we have
X
p
p
x
log(x 1).
When
χ 6
=
χ
0
, then
L
(
χ,
1)
6
= 0, as we have just proved! So
log L
(
χ, x
) is
bounded as x 1
+
. and so we are done.
Note that up to a finite number of factors in the Euler product (for
p | N
),
this
ζ
N
(
s
) equals to the Dedekind
ζ
-function of the number field
K
=
Q
(
n
1
),
given by
ζ
K
(s) =
X
ideals 06=I⊆O
K
1
(N(I))
s
.
We can now use what we’ve got to quickly prove Dirichlet’s theorem:
Theorem
(Dirichlet’s theorem on primes in arithmetic progressions)
.
Let
a Z
be such that (
a, N
) = 1. Then there exists infinitely many primes
p a
(mod N).
Proof. We want to show that the series
X
p prime
pa mod N
p
x
is unbounded as
x
1
+
, and in particular must be infinite. We note that for
(x, N) = 1, we have
X
χ
\
(Z/NZ)
×
χ(x) =
(
ϕ(N) x 1 (mod N)
0 otherwise
,
since the sum of roots of unity vanishes. We also know that
χ
is a character, so
χ(a)
1
χ(p) = χ(a
1
p). So we can write
X
p prime
pa mod N
p
x
=
1
ϕ(N)
X
χ(Z/NZ)
×
χ(a)
1
X
all p
χ(p)p
x
,
Now if χ = χ
0
, then the sum is just
X
p-N
p
x
log(x 1)
as x 1
+
. Moreover, all the other sums are bounded as x 1
+
. So
X
pa mod N
p
x
1
ϕ(N)
log(x 1).
So the whole sum must be unbounded as
x
1
+
. So in particular, the sum
must be infinite.
This actually tells us something more. It says
P
pa mod N
p
x
P
all p
p
x
1
ϕ(N)
.
as
x
1
+
. So in some well-defined sense (namely analytic density),
1
ϕ(N)
of the
primes are a (mod N).
In fact, we can prove that
lim
X→∞
|{p X : p a mod N}|
|{p X}|
=
1
ϕ(N)
.
This theorem has many generalizations. In general, let
L/K
be a finite Galois
extension of number fields with Galois group
G
=
Gal
(
L/K
). Then for all
primes
p
of
K
which is unramified in
L
, we can define a Frobenius conjugacy
class [σ
p
] G.
Theorem
(Cebotarev density theorem)
.
Cebotarev density theorem Let
L/K
be a Galois extension. Then for any conjugacy class
C Gal
(
L/K
), there exists
infinitely many p with [σ
p
] = C.
If
L/K
=
Q
(
n
1
)
/Q
, then
G
=
(
Z/N Z
)
×
, and
σ
p
is just the element of
G
given by
p
(
mod N
). So if we fix
a
(
mod N
)
G
, then there are infinitely many
p
with
p a
(
mod N
). So we see the Cebotarev density theorem is indeed a
generalization of Dirichlet’s theorem.
4 The modular group
We now move on to study the other words that appear in the title of the course,
namely modular forms. Modular forms are very special functions defined on the
upper half plane
H = {z C : Im(z) > 0}.
The main property of a modular form is that they transform nicely under
obius transforms. In this chapter, we will first try to understand these obius
transforms. Recall that a matrix
γ =
a b
c d
GL
2
(C)
acts on C by
z 7→ γ(z) =
az + b
cz + d
.
If we further restrict to matrices in
GL
2
(
R
), then this maps
C \R
to
C \R
, and
R {∞} to R {∞}.
We want to understand when this actually fixes the upper half plane. This
is a straightforward computation
Im γ(z) =
1
2i
az + b
cz + d
a¯z + b
c¯z + d
=
1
2i
(ad bc)(z ¯z)
|cz + d|
2
= det(γ)
Im z
|cz + d|
2
.
Thus, we know
Im
(
γ
(
z
)) and
Im
(
z
) have the same sign iff
det
(
γ
)
>
0. We write
Definition (GL
2
(R)
+
).
GL
2
(R)
+
= {γ GL
2
(R) : det γ > 0}.
This is the group of obius transforms that map H to H.
However, note that the action of
GL
2
(
R
)
+
on
H
is not faithful. The kernel
is given by the subgroup
R
×
· I = R
×
·
1 0
0 1
.
Thus, we are naturally led to define
Definition (PGL
2
(R)
+
).
PGL
2
(R)
+
=
GL
2
(R)
+
R
×
· I
.
There is a slightly better way of expressing this. Now note that we can obtain
any matrix in
GL
2
(
R
+
), by multiplying an element of
SL
2
(
R
) with a unit in
R
.
So we have
PGL
2
(R)
+
=
SL
2
(R)/I} PSL
2
(R).
What we have is thus a faithful action of
PSL
2
(
R
) on the upper half plane
H
.
From IA Groups, we know this action is transitive, and the stabilizer of
i
=
1
is SO(2)/I}.
In fact, this group
PSL
2
(
R
) is the group of all holomorphic automorphisms
of H, and the subgroup SO(2) SL
2
(R) is a maximal compact subgroup.
Theorem. The group SL
2
(R) admits the Iwasawa decomposition
SL
2
(R) = KAN = N AK,
where
K = SO(2), A =

r 0
0
1
r

, N =

1 x
0 1

Note that this implies a few of our previous claims. For example, any
z = x + iy C can be written as
z = x + iy =
1 x
0 1
y 0
0
1
y
· i,
using the fact that K = SO(2) fixes i, and this gives transitivity.
Proof.
This is just Gram–Schmidt orthogonalization. Given
g GL
2
(
R
), we
write
ge
1
= e
0
1
, ge
2
= e
0
2
,
By Gram-Schmidt, we can write
f
1
= λ
1
e
0
1
f
2
= λ
2
e
0
1
+ µe
0
2
such that
kf
1
k = kf
2
k = 1, (f
1
, f
2
) = 0.
So we can write
f
1
f
2
=
e
0
1
e
0
2
λ
1
λ
2
0 µ
Now the left-hand matrix is orthogonal, and by decomposing the inverse of
λ
1
λ
2
0 µ
, we can write g =
e
0
1
e
0
2
as a product in KAN.
In general, we will be interested in subgroups Γ
SL
2
(
R
), and their images
¯
Γ in Γ PSL
2
(R), i.e.
¯
Γ =
Γ
Γ I}
.
We are mainly interested in the case Γ = SL
2
(Z), or a subgroup of finite index.
Definition (Modular group). The modular group is
PSL
2
(Z) =
SL
2
(Z)
I}
.
There are two particularly interesting elements of the modular group, given
by
S = ±
0 1
1 0
, T = ±
1 1
0 1
.
Then we have
T
(
z
) =
z
+ 1 and
S
(
z
) =
1
z
. One immediately sees that
T
has
infinite order and S
2
= 1 (in PSL
2
(Z)). We can also compute
T S = ±
1 1
1 0
and
(T S)
3
= 1.
The following theorem essentially summarizes the basic properties of the modular
group we need to know about:
Theorem. Let
D =
z H :
1
2
Re z
1
2
, |z| > 1
{z H : |z| = 1, Re(z) 0}.
1
2
1
1
2
1
ρ = e
πi/3
i
Then
D
is a fundamental domain for the action of
¯
Γ
on
H
, i.e. every orbit
contains exactly one element of D.
The stabilizer of
z D
in Γ is trivial if
z 6
=
i, ρ
, and the stabilizers of
i
and
ρ
are
¯
Γ
i
= hSi
=
Z
2Z
,
¯
Γ
ρ
= hT Si
=
Z
3Z
.
Finally, we have
¯
Γ = hS, T i = hS, T Si.
In fact, we have
¯
Γ = hS, T | S
2
= (T S)
3
= ei,
but we will neither prove nor need this.
The proof is rather technical, and involves some significant case work.
Proof.
Let
¯
Γ
=
hS, T i
¯
Γ
. We will show that if
z H
, then there exists
γ
¯
Γ
such that γ(z) D.
Since
z 6∈ R
, we know
Z
+
Zz
=
{cz
+
d
:
c, d Z}
is a discrete subgroup of
C. So we know
{|cz + d| : c, d Z}
is a discrete subset of
R
, and is in particular bounded away from 0. Thus, we
know
Im γ(z) =
Im(z)
|cz + d|
2
: γ =
a b
c d
¯
Γ
is a discrete subset of
R
>0
and is bounded above. Thus there is some
γ
¯
Γ
with
Im γ
(
z
) maximal. Replacing
γ
by
T
n
γ
for suitable
n
, we may assume
|Re γ(z)|
1
2
.
We consider the different possible cases.
If |γ(z)| < 1, then
Im Sγ(z) = Im
1
γ(z)
=
Im γ(z)
|γ(z)|
2
> Im γ(z),
which is impossible. So we know
|γ
(
z
)
|
1. So we know
γ
(
z
) lives in the
closure of D.
If Re(γ(z)) =
1
2
, then T γ(z) has real part +
1
2
, and so T (γ(z)) D.
If
1
2
< Re
(
z
)
<
0 and
|γ
(
z
)
|
= 1, then
|Sγ
(
z
)
|
= 1 and 0
< Re Sγ
(
z
)
<
1
2
,
i.e. Sγ(z) D.
So we can move it to somewhere in D.
We shall next show that if
z, z
0
D
, and
z
0
=
γ
(
z
) for
γ
¯
Γ
, then
z
=
z
0
.
Moreover, either
γ = 1; or
z = i and γ = S; or
z = ρ and γ = T S or (T S)
2
.
It is clear that this proves everything.
To show this, we wlog
Im(z
0
) =
Im z
|cz + d|
2
Im z
where
γ =
a b
c d
,
and we also wlog c 0.
Therefore we know that |cz + d| 1. In particular, we know
1 Im(cz + d) = c Im(z) c
3
2
since z D. So c = 0 or 1.
If c = 0, then
γ = ±
1 m
0 1
for some
m Z
, and this
z
0
=
z
+
m
. But this is clearly impossible. So we
must have m = 0, z = z
0
, γ = 1 PSL
2
(Z).
If
c
= 1, then we know
|z
+
d|
1. So
z
is at distance 1 from an integer.
As z D, the only possibilities are d = 0 or 1.
If d = 0, then we know |z| = 1. So
γ =
a 1
1 0
for some a Z. Then z
0
= a
1
z
. Then
either a = 0, which forces z = i, γ = S; or
a = 1, and z
0
= 1
1
z
, which implies z = z
0
= ρ and γ = T S.
If d = 1, then by looking at the picture, we see that z = ρ. Then
|cz + d| = |z 1| = 1,
and so
Im z
0
= Im z =
3
2
.
So we have z
0
= ρ as well. So
+ b
ρ 1
= ρ,
which implies
ρ
2
(a + 1)ρ b = 0
So ρ = 1, a = 0, and γ = (T S)
2
.
Note that this proof is the same as the proof of reduction theory for binary
positive definite binary quadratic forms.
What does the quotient
¯
Γ \ N
look like? Each point in the quotient can be
identified with an element in
D
. Moreover,
S
and
T
identify the portions of
the boundary of
D
. Thinking hard enough, we see that the quotient space is
homeomorphic to a disk.
An important consequence of this is that the quotient Γ
\H
has finite invariant
measure.
Proposition. The measure
dµ =
dx dy
y
2
is invariant under
PSL
2
(
R
). If Γ
PSL
2
(
Z
) is of finite index, then
µ
\H
)
<
.
Proof. Consider the 2-form associated to µ, given by
η =
dx dy
y
2
=
idz d¯z
2(Im z)
2
.
We now let
γ =
a b
c d
SL
2
(R).
Then we have
Im γ(z) =
Im z
|cz + d|
2
.
Moreover, we have
dγ(z)
dz
=
a(cz + d) c(az + b)
(cz + d)
2
=
1
(cz + d)
2
.
Plugging these into the formula, we see that η is invariant under γ.
Now if
¯
Γ PSL
2
(
Z
) has finite index, then we can write
PSL
2
(
Z
) as a union
of cosets
PSL
2
(Z) =
n
a
i=1
¯γγ
i
,
where n = (PSL
2
(Z) :
¯
Γ). Then a fundamental domain for
¯
Γ is just
n
[
i=1
γ
i
(D),
and so
µ(
¯
Γ \ H) =
X
µ(γ
i
D) = (D).
So it suffices to show that µ(D) is finite, and we simply compute
µ(D) =
Z
D
dx dy
y
2
Z
x=
1
2
x=
1
2
Z
y=
y=
2/2
dx dy
y
2
< .
It is an easy exercise to show that we actually have
µ(D) =
π
3
.
We end with a bit terminology.
Definition
(Principal congruence subgroup)
.
For
N
1, the principal congru-
ence subgroup of level N is
Γ(N) = {γ SL
2
(Z) : γ I (mod N)} = ker(SL
2
(Z) SL
2
(Z/N Z)).
Any Γ
SL
2
(
Z
) containing some Γ(
N
) is called a congruence subgroup, and its
level is the smallest N such that Γ Γ(N)
This is a normal subgroup of finite index.
Definition
0
(N), Γ
1
(N)). We define
Γ
0
(N) =

a b
c d
SL
2
(Z) : c 0 (mod N )
and
Γ
1
(N) =

a b
c d
SL
2
(Z) : c 0, d 1 (mod N)
.
We similarly define Γ
0
(
N
) and Γ
1
(
N
) to be the transpose of Γ
0
(
N
) and Γ
1
(
N
)
respectively.
Note that “almost all” subgroups of
SL
2
(
Z
) are not congruence subgroups.
On the other hand, if we try to define the notion of congruence subgroups in
higher dimensions, we find that all subgroups of
SL
n
(
Z
) for
n >
2 are congruence!
5 Modular forms of level 1
5.1 Basic definitions
We can now define a modular form. Recall that we have SL
2
(Z) = Γ(1).
Definition
(Modular form of level 1)
.
A holomorphic function
f
:
H C
is a
modular form of weight k Z and level 1 if
(i) For any
γ =
a b
c d
Γ(1),
we have
f(γ(z)) = (cz + d)
k
f(z).
(ii) f is holomorphic at (to be defined precisely later).
What can we deduce about modular forms from these properties? If we take
γ = I, then we get
f(z) = (1)
k
f(z).
So if
k
is odd, then
f
0. So they only exist for even weights. If we have even
weights, then it suffices to consider
¯
Γ = hS, T i. Since
f(z) 7→ (cz + d)
k
f(γ(z))
is a group action of Γ(1) on functions on
H
, it suffices to check that
f
is invariant
under the generators S and T . Thus, (i) is equivalent to
f(z + 1) = f(z), f(1/z) = z
k
f(z).
How do we interpret (ii)? We know
f
is
Z
-periodic. If we write
q
=
e
2πiz
, then
we have
z H
iff 0
< |q| <
1, and moreover, if two different
z
give the same
q
,
then the values of
f
on the two
z
agree. In other words,
f
(
z
) only depends on
q
,
and thus there exists a holomorphic function
˜
f(q) on {0 < |q| < 1} such that
˜
f(e
2πiz
) = f(z).
Explicitly, we can write
˜
f(q) = f
1
2πi
log q
.
By definition,
˜
f
is a holomorphic function on a punctured disk. So we have a
Laurent expansion
˜
f(q) =
X
n=−∞
a
n
(f)q
n
,
called the Fourier expansion or
q
-expansion of
f
. We say
f
is meromorphic
(resp. holomorphic) at if
˜
f is meromorphic (resp. holomorphic) at q = 0.
In other words, it is meromorphic at
if
a
n
(
f
) = 0 for
n
sufficiently negative,
and holomorphic if
a
n
(
f
) = 0 for all
n
0. The latter just says
f
(
z
) is bounded
as Im(z) .
The following definition is also convenient:
Definition
(Cusp form)
.
A modular form
f
is a cusp form if the constant term
a
0
(f) is 0.
We will later see that “almost all” modular forms are cusp forms.
In this case, we have
˜
f =
X
n1
a
n
(f)q
n
.
From now on, we will drop the ˜, which should not cause confusion.
Definition
(Weak modular form)
.
A weak modular form is a holomorphic form
on H satisfying (i) which is meromorphic at .
We will use these occasionally.
The transformation rule for modular forms seem rather strong. So, are there
actually modular forms? It turns out that there are quite a lot of modular forms,
and remarkably, there is a relatively easy way of listing out all the modular
forms.
The main class (and in fact, as we will later see, a generating class) of modular
forms is due to Eisenstein. This has its origin in the theory of elliptic functions,
but we will not go into that.
Definition (Eisenstein series). Let k 4 be even. We define
G
k
(z) =
X
m,nZ
(m,n)6=(0,0)
1
(mz + n)
k
=
X
0
(m,n)Z
2
1
(mz + n)
k
.
Here the
P
0
denotes that we are omitting 0, and in general, it means we
don’t sum over things we obviously don’t want to sum over.
When we just write down this series, it is not clear that it is a modular form,
or even that it converges. This is given by the following theorem:
Theorem. G
k
is a modular form of weight
k
and level 1. Moreover, its
q
-
expansion is
G
k
(z) = 2ζ(k)
1
2k
B
k
X
n1
σ
k1
(n)q
n
, (1)
where
σ
r
(n) =
X
1d|n
d
r
.
Convergence of the series follows from the following more general result. Note
that since z 6∈ R, we know {1, z} is an R-basis for C.
Proposition. Let (e
1
, ··· , e
d
) be some basis for R
d
. Then if r R, the series
X
0
mZ
d
km
1
e
1
+ ··· + m
d
e
d
k
r
converges iff r > d.
Proof. The function
(x
i
) R
d
7→
X
i=1
x
i
e
i
is a norm on
R
d
. As any 2 norms on
R
d
are equivalent, we know this is equivalent
to the sup norm k · k
. So the series converges iff the corresponding series
0
X
mZ
d
kmk
r
converges. But if 1
N Z
, then the number of
m Z
d
such that
kmk
=
N
is (2N + 1)
d
(2N 1)
d
2
d
dN
d1
. So the series converges iff
X
N1
N
r
N
d1
converges, which is true iff r > d.
Proof of theorem.
Then convergence of the Eisenstein series by applying this
to
R
2
=
C
. So the series is absolutely convergent. Therefore we can simply
compute
G
k
(z + 1) =
0
X
m,n
1
(mz + (m + n))
k
= G
k
(z).
Also we can compute
G
k
1
z
=
0
X
m,n
=
z
k
(m + nz)
k
= z
k
G
k
(z).
So
G
k
satisfies the invariance property. To show that
G
k
is holomorphic, and
holomorphic at infinity, we’ll derive the q-expansion (1).
Lemma.
X
n=
1
(n + w)
k
=
(2πi)
k
(k 1)!
X
d=1
d
k1
e
2πidw
for any w H and k 2.
There are (at least) two ways to prove this. One of this is to use the series
for the cotangent, but here we will use Poisson summation.
Proof. Let
f(x) =
1
(x + w)
k
.
We compute
ˆ
f(y) =
Z
−∞
e
2πixy
(x + w)
k
dx.
We replace this with a contour integral. We see that this has a pole at
w
. If
y > 0, then we close the contour downwards, and we have
ˆ
f(y) = 2πi Res
z=w
e
2πiyz
(z + w)
k
= 2πi
(2πiy)
k1
(k 1)!
e
2πiyw
.
If
y
0, then we close in the upper half plane, and since there is no pole, we
have
ˆ
f(y) = 0. So we have
X
n=−∞
1
(n + w)
k
=
X
nZ
f(n) =
X
dZ
ˆ
f(d) =
(2πi)
k
(k 1)!
X
d1
d
k1
e
2πidw
by Poisson summation formula.
Note that when we proved the Poisson summation formula, we required
f
to decrease very rapidly at infinity, and our
f
does not satisfy that condition.
However, we can go back and check that the proof still works in this case.
Now we get back to the Eisenstein series. Note that since
k
is even, we can
drop certain annoying signs. We have
G
k
(z) = 2
X
n1
1
n
k
+ 2
X
m1
X
nZ
1
(n + mz)
k
= 2ζ(k) + 2
X
m1
(2πi)
k
(k 1)!
X
d1
d
k1
q
dm
.
= 2ζ(k) + 2
(2πi)
k
(k 1)!
X
n1
σ
k1
(n)q
n
.
Then the result follows from the fact that
ζ(k) =
1
2
(2πi)
k
B
k
k!
.
So we see that G
k
is holomorphic in H, and is also holomorphic at .
It is convenient to introduce a normalized Eisenstein series
Definition (Normalized Eisenstein series). We define
E
k
(z) = (2ζ(k))
1
G
k
(z)
= 1
2k
B
k
X
n1
σ
k1
(n)q
n
=
1
2
X
(m,n)=1
m,nZ
1
(mz + n)
k
.
The last line follows by taking out any common factor of
m, n
in the series
defining G
k
.
Thus, to figure out the (normalized) Eisenstein series, we only need to know
the Bernoulli numbers.
Example. We have
B
2
=
1
6
, B
4
=
1
30
, B
6
=
1
42
, B
8
=
1
30
B
10
=
5
66
, B
12
=
631
2730
, B
14
=
7
6
.
Using these, we find
E
4
= 1 + 240
X
σ
3
(n)q
n
E
6
= 1 504
X
σ
5
(n)q
n
E
8
= 1 + 480
X
σ
7
(n)q
n
E
10
= 1 264
X
σ
9
(n)q
n
E
12
= 1 +
65520
691
X
σ
11
(n)q
n
E
14
= 1 24
X
σ
13
(n)q
n
.
We notice that there is a simple pattern for k 14, except for k = 12.
For more general analysis of modular forms, it is convenient to consider the
following notation:
Definition (Slash operator). Let
a b
c d
= γ GL
2
(R)
+
, z H,
and f : H C any function. We write
j(γ, z) = cz + d.
We define the slash operator to be
(f |
k
γ)(z) = (det γ)
k/2
j(γ, z)
k
f(γ(z)).
Note that some people leave out the
det γ
k/2
factor, but if we have it, then
whenever γ = Ia, then
f |
k
γ = sgn(a)
k
f,
which is annoying. In this notation, then condition (i) for
f
to be a modular
form is just
f |
k
γ = f
for all γ SL
2
(Z).
To prove things about our j operator, it is convenient to note that
γ
z
1
= j(γ, z)
γ(z)
1
. ()
Proposition.
(i) j(γδ, z) = j(γ, δ(z))j(δ, z) (in fancy language, we say j is a 1-cocycle).
(ii) j(γ
1
, z) = j(γ, γ
1
(z))
1
.
(iii) γ
:
ϕ 7→ f |
k
γ
is a (right) action of
G
=
GL
2
(
R
)
+
on functions on
H
. In
other words,
f |
k
γ |
k
δ = f |
k
(γδ).
Note that this implies that if if Γ GL
2
(R)
+
and Γ = hγ
1
, ··· , γ
m
i then
f |
k
γ = f f |
k
γ
i
= f for all i = 1, ··· , m.
The proof is just a computation.
Proof.
(i) We have
j(γδ, z)
γδ(z)
1
= γδ
z
1
= j(δ, z)γ
δ(z)
1
= j(δ, z)j(γ, δ(z))
z
1
(ii) Take δ = γ
1
.
(iii) We have
((f |
k
γ)|
k
δ)(z) = (det δ)
k/2
j(δ, z)
k
(f |
k
γ)(δ(z))
= (det δ)
k/2
j(δ, z)
k
(det γ)
k/2
j(γ, δ(z))
k
f(γδ(z))
= (det γδ)
k/2
j(γδ, z)
k
f(γδ(z))
= (f |
k
γδ)(z).
Back to the Eisenstein series.
G
k
arise naturally in elliptic functions, which
are coefficients in the series expansion of Weierstrass function.
There is another group-theoretic interpretation, which generalizes in many
ways. Consider
Γ(1)
=
±
1 n
0 1
: n Z
Γ(1) = SL
2
(Z),
which is the stabilizer of . If
δ = ±
1 n
0 1
Γ(1)
,
then we have
j(δγ, z) = j(δ, γ(z))j(γ, z) = ±j(γ, z).
So j(γ, z)
2
depends only on the coset Γ(1)
γ. We can also check that if
γ =
a b
c d
, γ =
a
0
b
0
c
0
d
0
Γ(1),
then Γ(1)
γ = Γ(1)
γ
0
iff (c, d) = ±(c
0
, d
0
).
Moreover, gcd(c, d) = 1 iff there exists a, b such that
a b
c d
= 1.
We therefore have
E
k
(z) =
X
γΓ(1)
\Γ(1)
j(γ, z)
k
,
where we sum over (any) coset representatives of Γ(1)
.
We can generalize this in two ways. We can either replace
j
with some other
appropriate function, or change the groups.
5.2 The space of modular forms
In this section, we are going to find out all modular forms! For
k Z
, we write
M
k
=
M
k
(Γ(1)) for the set of modular forms of weight
k
(and level 1). We have
S
k
S
k
(Γ(1)) containing the cusp forms. These are
C
-vector spaces, and are
zero for odd k.
Moreover, from the definition, we have a natural product
M
k
· M
`
M
k+`
.
Likewise, we have
S
k
· M
`
S
k+`
.
We let
M
=
M
kZ
M
k
, S
=
M
kZ
S
k
.
Then M
is a graded ring and S
is a graded ideal. By definition, we have
S
k
= ker(a
0
: M
k
C).
To figure out what all the modular forms are, we use the following constraints
on the zeroes of a modular form:
Proposition.
Let
f
be a weak modular form (i.e. it can be meromorphic at
)
of weight k and level 1. If f is not identically zero, then
X
z
0
∈D\{i,ρ}
ord
z
0
(f)
+
1
2
ord
i
(f) +
1
3
ord
ρ
f + ord
(f) =
k
12
,
where ord
f is the least r Z such that a
r
(f) 6= 0.
Note that if
γ
Γ(1), then
j
(
γ, z
) =
cz
+
d
is never 0 for
z H
. So it follows
that ord
z
f = ord
γ(z)
f.
We will prove this using the argument principle.
Proof.
Note that the function
˜
f
(
q
) is non-zero for 0
< |q| < ε
for some small
ε
by the principle of isolated zeroes. Setting
ε = e
2πR
,
we know f(z) 6= 0 if Im z R.
In particular, the number of zeroes of
f
in
D
is finite. We consider the
integral along the following contour, counterclockwise.
ρ
ρ
2
i
1
2
+ iR
1
2
+ iR
C
0
C
We assume
f
has no zeroes along the contour. Otherwise, we need to go around
the poles, which is a rather standard complex analytic maneuver we will not go
through.
For ε sufficiently small, we have
Z
Γ
f
0
(z)
f(z)
dz = 2πi
X
z
0
∈D\{i,ρ}
ord
z
0
f
by the argument principle. Now the top integral is
Z
1
2
iR
1
2
+iR
f
0
f
dz =
Z
|q|=ε
d
˜
f
dq
˜
f(q)
dq = 2πi ord
f.
As
f
0
f
has at worst a simple pole at
z
=
i
, the residue is
ord
i
f
. Since we are
integrating along only half the circle, as ε 0, we pick up
πi res = πi ord
i
f.
Similarly, we get
2
3
πi ord
ρ
f coming from ρ and ρ
2
.
So it remains to integrate along the bottom circular arcs. Now note that
S : z 7→
1
z
maps C to C
0
with opposite orientation, and
df(Sz)
f(Sz)
= k
dz
z
+
df(z)
f(z)
as
f(Sz) = z
k
f(z).
So we have
Z
C
+
Z
C
0
f
0
f
dz =
Z
C
0
f
0
f
dz
k
z
dz +
f
0
f
dz
k
Z
C
0
dz
z
k
Z
i
ρ
dz
z
=
πik
6
.
So taking the limit ε 0 gives the right result.
Corollary. If k < 0, then M
k
= {0}.
Corollary. If k = 0, then M
0
= C, the constants, and S
0
= {0}.
Proof.
If
f M
0
, then
g
=
f f
(1). If
f
is not constant, then
ord
i
g
1, so
the LHS is > 0, but the RHS is = 0. So f C.
Of course, a
0
(f) = f. So S
0
= {0}.
Corollary.
dim M
k
1 +
k
12
.
In particular, they are finite dimensional.
Proof.
We let
f
0
, ··· , f
d
be
d
+ 1 elements of
M
k
, and we choose distinct points
z
1
, ··· , z
d
D \ {i, ρ}. Then there exists λ
0
, ··· , λ
d
C, not all 0, such that
f =
d
X
i=0
λ
i
f
i
vanishes at all these points. Now if
d >
k
12
, then LHS is
>
k
12
. So
f
0. So (
f
i
)
are linearly dependent, i.e. dim M
k
< d + 1.
Corollary. M
2
=
{
0
}
and
M
k
=
CE
k
for 4
k
10 (
k
even). We also have
E
8
= E
2
4
and E
10
= E
4
E
6
.
Proof. Only M
2
= {0} requires proof. If 0 6= f M
2
, then this implies
a +
b
2
+
c
3
=
1
6
for integers a, b, c 0, which is not possible.
Alternatively, if
f M
2
, then
f
2
M
4
and
f
3
M
6
. This implies
E
3
4
=
E
2
6
,
which is not the case as we will soon see.
Note that we know
E
8
=
E
2
4
, and is not just a multiple of it, by checking the
leading coefficient (namely 1).
Corollary. The cusp form of weight 12 is
E
3
4
E
2
6
= (1 + 240q + ···)
3
(1 504q + ···)
2
= 1728q + ··· .
Note that 1728 = 12
3
.
Definition (∆ and τ ).
∆ =
E
3
4
E
2
6
1728
=
X
n1
τ(n)q
n
S
12
.
This function
τ
is very interesting, and is called Ramanujan’s
τ
-function. It
has nice arithmetic properties we’ll talk about soon.
The following is a crucial property of ∆:
Proposition. ∆(z) 6= 0 for all z H.
Proof. We have
X
z
0
6=i,ρ
ord
z
0
+
1
2
ord
i
+
1
3
ord
ρ
+ ord
∆ =
k
12
= 1.
Since ord
ρ
∆ = 1, it follows that there can’t be any other zeroes.
It follows from this that
Proposition.
The map
f 7→
f
is an isomorphism
M
k12
(Γ(1))
S
k
(Γ(1))
for all k > 12.
Proof.
Since
S
12
, it follows that if
f M
k1
, then
f S
k
. So the map
is well-defined, and we certainly get an injection
M
k12
S
k
. Now if
g S
k
,
since
ord
= 1
ord
g
and
6
=
H
. So
g
is a modular form of weight
k 12.
Thus, we find that
Theorem.
(i) We have
dim M
k
(Γ(1)) =
0 k < 0 or k odd
k
12
k > 0, k 2 (mod 12)
1 +
k
12
otherwise
(ii) If k > 4 and even, then
M
k
= S
k
CE
k
.
(iii) Every element of M
k
is a polynomial in E
4
and E
6
.
(iv) Let
b =
(
0 k 0 (mod 4)
1 k 2 (mod 4)
.
Then
{h
j
= ∆
j
E
b
6
E
(k12j6b)/4
4
: 0 j < dim M
k
}.
is a basis for M
k
, and
{h
j
: 1 j < dim M
k
}
is a basis for S
k
.
Proof.
(ii) S
k
is the kernel of the homomorphism
M
k
C
sending
f 7→ a
0
(
f
). So the
complement of
S
k
has dimension at most 1, and we know
E
k
is an element
of it. So we are done.
(i)
For
k <
12, this agrees with what we have already proved. By the
proposition, we have
dim M
k12
= dim S
k
.
So we are done by induction and (ii).
(iii)
This is true for
k <
12. If
k
12 is even, then we can find
a, b
0 with
4a + 6b = k. Then E
a
4
E
b
6
M
k
, and is not a cusp form. So
M
k
= CE
a
4
E
b
6
M
k12
.
But is a polynomial in E
4
, E
6
, So we are done by induction on k.
(iv)
By (i), we know
k
12
j
6
k
0 for
j < dim M
k
, and is a multiple of 4.
So
h
j
M
k
. Next note that the
q
-expansion of
h
j
begins with
q
j
. So they
are all linearly independent.
So we have completely determined all modular forms, and this is the end of
the course.
5.3 Arithmetic of
Recall that we had
∆ =
X
τ(n)q
n
,
and we knew
τ(1) = 1, τ(n) Q.
In fact, more is true.
Proposition.
(i) τ(n) Z for all n 1.
(ii) τ(n) = σ
11
(n) (mod 691)
The function
τ
satisfies many more equations, some of which are on the
second example sheet.
Proof.
(i) We have
1728∆ = (1 + 240A
3
(q))
3
(1 504A
5
(q))
2
,
where
A
r
=
X
n1
σ
r
(n)q
n
.
We can write this as
1728∆ = 3 · 240A
3
+ 3 · 240
2
A
2
3
+ 240
3
A
3
3
+ 2 · 504A
5
504
2
A
2
5
.
Now recall the deep fact that 1728 = 12
3
and 504 = 21 · 24.
Modulo 1728, this is equal to
720A
3
+ 1008A
5
.
So it suffices to show that
5σ
3
+ 7σ
5
(n) 0 (mod 12).
In other words, we need
5d
3
+ 7d
5
0 (mod 12),
and we can just check this manually for all d.
(ii) Consider
E
3
4
= 1 +
X
n1
b
n
q
n
with b
n
Z. We also have
E
12
= 1 +
65520
691
X
n1
σ
11
(n)q
n
.
Also, we know
E
12
E
3
4
S
12
.
So it is equal to
λ
for some
λ Q
. So we find that for all
n
1, we have
665520
691
σ
11
(n) b
n
= λτ(n).
In other words,
65520σ
11
(n) 691b
n
= µτ(n)
for some τ Q.
Putting
n
= 1, we know
τ
(1) = 1,
σ
11
(1) = 1, and
b
1
Z
. So
µ Z
and
µ 65520 (mod 691). So for all n 1, we have
65520σ
11
(n) 65520τ (n) (mod 691).
Since 691 and 65520 are coprime, we are done.
This proof is elementary, once we had the structure theorem, but doesn’t
really explain why the congruence is true.
The function
τ
(
n
) was studied extensively by Ramanujan. He proved the 691
congruence (and many others), and (experimentally) observed that if (
m, n
) = 1,
then
τ(mn) = τ(m)τ(n).
Also, he observed that for any prime p, we have
|τ(p)| < 2p
11/2
,
which was a rather curious thing to notice. Both of these things are true, and
we will soon prove that the first is true. The second is also true, but it uses deep
algebraic geometry. It was proved by Deligne in 1972, and he got a fields medal
for proving this. So it’s pretty hard.
We will also prove a theorem of Jacobi:
∆ = q
Y
n=1
(1 q
n
)
24
.
The numbers τ (p) are related to Galois representations.
Rationality and integrality
So far, we have many series that have rational coefficients in them. Given any
subring
R C
, we let
M
k
(
R
) =
M
k
(Γ(1)
, R
) be the set of all
f M
k
such that
all
a
n
(
f
)
R
. Likewise, we define
S
k
(
R
). For future convenience, we will prove
a short lemma about them.
Lemma.
(i)
Suppose
dim M
k
=
d
+ 1
1. Then there exists a basis
{g
j
: 0
j d}
for M
k
such that
g
j
M
k
(Z) for all j {0, ··· , d}.
a
n
(g
j
) = δ
nj
for all j, n {0, ··· , d}.
(ii) For any R, M
k
(R)
=
R
d+1
generated by {g
j
}.
Proof.
(i)
We take our previous basis
h
j
= ∆
j
E
b
6
E
(k12j6b)/4
4
M
k
(
Z
). Then we
have a
n
(h
n
) = 1, and a
j
(h
n
) = 0 for all j < n. Then we just row reduce.
(ii) The isomorphism is given by
M
k
(R) R
d+1
f (a
n
(f))
d
X
j=0
c
j
g
j
(c
n
)
6 Hecke operators
6.1 Hecke operators and algebras
Recall that for f : H C, γ GL
2
(R)
+
and k Z, we defined
(f |
k
γ)(z) = (det γ)
k/2
j(γ, z)
k
f(γ(z)),
where
γ =
a b
c d
, j(γ, z) = cz + d.
We then defined
M
k
= {f : f |
k
γ = f for all γ Γ(1) + holomorphicity condition}.
We showed that these are finite-dimensional, and we found a basis. But there
is more to be said about modular forms. Just because we know polynomials
have a basis 1
, x, x
2
, ···
does not mean there isn’t anything else to say about
polynomials!
In this chapter, we will figure that
M
k
has the structure of a module for
the Hecke algebra. This structure underlies the connection with arithmetic, i.e.
Galois representations etc.
How might we try to get some extra structure on
M
k
? We might try to see
what happens if we let something else in
GL
2
(
R
)
+
act on
f
. Unfortunately, in
general, if
f
is a modular form and
γ GL
2
(
R
)
+
, then
g
=
f |
k
γ
is not a modular
form. Indeed, given a δ Γ(1), then it acts on g by
g|
k
δ = f |
k
γδ = (f |
k
γδγ
1
)γ
and usually
γδγ
1
6∈
Γ(1). In fact the normalizer of Γ(1) in
GL
2
(
R
)
+
is generated
by Γ(1) and aI for a R
.
It turns out we need to act in a smarter way. To do so, we have to develop
quite a lot of rather elementary group theory.
Consider a group
G
, and Γ
G
. The idea is to use the double cosets of Γ
defined by
ΓgΓ = {γgγ
0
: γ, γ
0
Γ}.
One alternative way to view this is to consider the right multiplication action of
G
, hence Γ on the right cosets Γ
g
. Then the double coset Γ
g
Γ is the union of
the orbits of Γg under the action of Γ. We can write this as
ΓgΓ =
a
iI
Γg
i
for some g
i
gΓ G and index set I.
In our applications, we will want this disjoint union to be finite. By the
orbit-stabilizer theorem, the size of this orbit is the index of the stabilizer of Γ
g
in Γ. It is not hard to see that the stabilizer is given by Γ
g
1
Γ
g
. Thus, we
are led to consider the following hypothesis:
Hypothesis (H): For all g G, (Γ : Γ g
1
Γg) < .
Then (
G,
Γ) satisfies (H) iff for any
g
, the double coset Γ
g
Γ is the union of
finitely many cosets.
The important example is the following:
Theorem.
Let
G
=
GL
2
(
Q
), and Γ
SL
2
(
Z
) a subgroup of finite index. Then
(G, Γ) satisfies (H).
Proof. We first consider the case Γ = SL
2
(Z). We first suppose
g =
a b
c d
Mat
2
(Z),
and det g = ±N, N 1. We claim that
g
1
Γg Γ Γ(N ),
from which it follows that
(Γ : Γ g
1
Γg) < .
So given
γ
Γ(
N
), we need to show that
gγg
1
Γ, i.e. it has integer coefficients.
We consider
±N·gγg
1
=
a b
c d
γ
d b
c a
a b
c d
d b
c a
N I 0 (mod N).
So we know that
gγg
1
must have integer entries. Now in general, if
g
0
GL
2
(
Q
),
then we can write
g
0
=
1
M
g
for
g
with integer entries, and we know conjugating by
g
and
g
0
give the same
result. So (G, Γ) satisfies (H).
The general result follows by a butterfly. Recall that if (
G
:
H
)
<
and
(
G
:
H
0
)
<
, then (
G
:
H H
0
)
<
. Now if Γ
Γ(1) =
SL
2
(
Z
) is of finite
index, then we can draw the diagram
Γ(1)
g
1
Γ(1)g
Γ
Γ(1) g
1
Γ(1)g
g
1
Γg
Γ g
1
Γ(1)G Γ(1) g
1
Γg
Γ g
1
Γg
finite
finite
finite
finite
Each group is the intersection of the two above, and so all inclusions are of finite
index.
Note that the same proof works for GL
N
(Q) for any N .
Before we delve into concreteness, we talk a bit more about double cosets.
Recall that cosets partition the group into pieces of equal size. Is this true for
double cosets as well? We can characterize double cosets as orbits of Γ
×
Γ acting
on G by
(γ, δ) · g = γgδ
1
.
So G is indeed the disjoint union of the double cosets of Γ.
However, it is not necessarily the case that all double cosets have the same
size. For example
|
Γ
e
Γ
|
=
|
Γ
|
, but for a general
g
,
|
Γ
g
Γ
|
can be the union of
many cosets of Γ.
Our aim is to define a ring
H
(
G,
Γ) generated by double cosets called the
Hecke algebra. As an abelian group, it is the free abelian group on symbols
g
Γ] for each double coset
g
Γ]. It turns out instead of trying to define a
multiplication for the Hecke algebra directly, we instead try to define an action
of this on interesting objects, and then there is a unique way of giving
H
(
G,
Γ)
a multiplicative structure such that this is a genuine action.
Given a group
G
, a
G
-module is an abelian group with a
Z
-linear
G
-action.
In other words, it is a module of the group ring
ZG
. We will work with right
modules, instead of the usual left modules.
Given such a module and a subgroup Γ G, we will write
M
Γ
= {m M : = m for all γ Γ}.
Notation. For g G and m M
Γ
, we let
m|gΓ] =
n
X
i=1
mg
i
, ()
where
ΓgΓ =
n
a
i=1
Γg
i
.
The following properties are immediate, but also crucial.
Proposition.
(i) m|gΓ] depends only on ΓgΓ.
(ii) m|gΓ] M
Γ
.
Proof.
(i) If g
0
i
= γ
i
g
i
for γ
i
Γ, then
X
mg
0
i
=
X
i
g
i
=
X
mg
i
as m M
Γ
.
(ii) Just write it out, using the fact that {Γg
i
} is invariant under Γ.
Theorem.
There is a product on
H
(
G,
Γ) making it into an associative ring,
the Hecke algebra of (
G,
Γ), with unit
e
Γ] = [Γ], such that for every
G
-module
M, we have M
Γ
is a right H(G, Γ)-module by the operation ().
In the proof, and later on, we will use the following observation: Let
Z
\G
]
be the free abelian group on cosets
g
]. This has an obvious right
G
-action by
multiplication. We know a double coset is just an orbit of Γ acting on a single
coset. So there is an isomorphism between
Θ : H(G, Γ) Z \ G]
Γ
.
given by
gΓ] 7→
X
g
i
],
where
ΓgΓ =
a
Γg
i
.
Proof. Take M = Z \ G], and let
ΓgΓ =
a
Γg
i
ΓhΓ =
a
Γh
j
.
Then
X
i
g
i
] M
Γ
,
and we have
X
i
g
i
]|hΓ] =
X
i,j
g
i
h
j
] M
Γ
,
and this is well-defined. This gives us a well-defined product on
H
(
G,
Γ).
Explicitly, we have
gΓ] · hΓ] = Θ
1
X
i,j
g
i
h
j
]
.
It should be clear that this is associative, as multiplication in
G
is associative,
and [Γ] = [ΓeΓ] is a unit.
Now if M is any right G-module, and m M
Γ
, we have
m|gΓ]|hΓ] =
X
mg
i
|hΓ] =
X
mg
i
h
j
= m([ΓgΓ] · hΓ]).
So M
Γ
is a right H(G, Γ)-module.
Now in our construction of the product, we need to apply the map Θ
1
. It
would be nice to have an explicit formula for the product in terms of double
cosets. To do so, we choose representatives S G such that
G =
a
gS
ΓgΓ.
Proposition. We write
ΓgΓ =
r
a
i=1
Γg
i
ΓhΓ =
s
a
j=1
Γh
j
.
Then
gΓ] · hΓ] =
X
kS
σ(k)[ΓkΓ],
where σ(k) is the number of pairs (i, j) such that Γg
i
h
j
= Γk.
Proof. This is just a simple counting exercise.
Of course, we could have taken this as the definition of the product, but we
have to prove that this is independent of the choice of representatives
g
i
and
h
j
,
and of S, and that it is associative, which is annoying.
6.2 Hecke operators on modular forms
We are now done with group theory. For the rest of the chapter, we take
G
=
GL
2
(
Q
)
+
and Γ(1) =
SL
2
(
Z
). We are going to compute the Hecke algebra
in this case.
The first thing to do is to identify what the single and double cosets are.
Let’s first look at the case where the representative lives in GL
2
(Z)
+
. We let
γ GL
2
(Z)
+
with
det γ = n > 0.
The rows of
γ
generate a subgroup Λ
Z
2
. If the rows of
γ
0
also generate the
same subgroup Λ, then there exists δ GL
2
(Z) with det δ = ±1 such that
γ
0
= δγ.
So we have
deg γ
0
=
±n
, and if
det γ
0
= +
n
, then
δ SL
2
(
Z
) = Γ. This gives a
bijection
cosets Γγ such that
γ Mat
2
(Z), det γ = n
subgroups Λ Z
2
of index n
What we next want to do is to pick representatives of these subgroups¡ hence
the cosets. Consider an arbitrary subgroup Λ Z
2
= Ze
1
Ze
2
. We let
Λ Ze
2
= Z · de
2
for some d 1. Then we have
Λ = hae
1
+ be
2
, de
2
i
for some
a
1,
b Z
such that 0
b < d
. Under these restrictions,
a
and
b
are
unique. Moreover, ad = n. So we can define
Π
n
=

a b
0 d
Mat
2
(Z) : a, d 1, ad = n, 0 b < d
.
Then
n
γ Mat
2
(Z) : det γ = n
o
=
a
γΠ
n
Γγ.
These are the single cosets. How about the double cosets? The left hand side
above is invariant under Γ on the left and right, and is so a union of double
cosets.
Proposition.
(i) Let γ Mat
2
(Z) and det γ = n 1. Then
ΓγΓ = Γ
n
1
0
0 n
2
Γ
for unique n
1
, n
2
1 and n
2
| n
1
, n
1
n
2
= n.
(ii)
n
γ Mat
2
(Z) : det γ = n
o
=
a
Γ
n
1
0
0 n
2
Γ,
where we sum over all 1 n
2
| n
1
such that n = n
1
n
2
.
(iii) Let γ, n
1
, n
2
be as above. if d 1, then
Γ(d
1
γ)Γ = Γ
n
1
/d 0
0 n
2
/d
Γ,
Proof.
This is the Smith normal form theorem, or, alternatively, the fact that
we can row and column reduce.
Corollary. The set

Γ
r
1
0
0 r
2
Γ
: r
1
, r
2
Q
>0
,
r
1
r
2
Z
is a basis for H(G, Γ) over Z.
So we have found a basis. The next goal is to find a generating set. To do so,
we define the following matrices:
For 1 n
2
| n
1
, we define
T (n
1
, n
2
) =
Γ
n
1
0
0 n
2
Γ
For n 1, we write
R(n) =
Γ
n 0
0 n
Γ
=
Γ
n 0
0 n

= T (n, n)
Finally, we define
T (n) =
X
1n
2
|n
1
n
1
n
2
=n
T (n
1
, n
2
)
In particular, we have
T (1, 1) = R(1) = 1 = T (1),
and if n is square-free, then
T (n) = T (n, 1).
Theorem.
(i) R(mn) = R(m)R(n) and R(m)T (n) = T (n)R(m) for all m, n 1.
(ii) T (m)T (n) = T (mn) whenever (m, n) = 1.
(iii) T (p)T (p
r
) = T (p
r+1
) + pR(p)T (p
r1
) of r 1.
Before we prove this theorem, we see how it helps us find a nice generating
set for the Hecke algebra.
Corollary. H
(
G,
Γ) is commutative, and is generated by
{T
(
p
)
, R
(
p
)
, R
(
p
)
1
:
p prime}.
This is rather surprising, because the group we started with was very non-
commutative.
Proof. We know that T (n
1
, n
2
), R(p) and R(p)
1
generate H(G, Γ), because
Γ
p 0
0 p
Γ
Γ
n
1
0
0 n
2
Γ
=
Γ
pn
1
0
0 pn
2
Γ
In particular, when n
2
| n
1
, we can write
T (n
1
, n
2
) = R(n
2
)T
n
1
n
2
, 1
.
So it suffices to show that we can produce any
T
(
n,
1) from the
T
(
m
) and
R
(
m
).
We proceed inductively. The result is immediate when
n
is square-free, because
T (n, 1) = T (n). Otherwise,
T (n) =
X
1n
2
|n
1
n
1
n
2
=n
T (n
1
, n
2
)
=
X
1n
2
|n
1
n
1
n
2
=n
R(n
2
)T
n
1
n
2
, 1
= T (n, 1) +
X
1<n
2
|n
1
n
1
n
2
=n
R(n
2
)T
n
1
n
2
, 1
.
So
{T
(
p
)
, R
(
p
)
, R
(
p
)
1
}
does generate
H
(
G,
Γ), and by the theorem, we know
these generators commute. So H(G, Γ) is commutative.
We now prove the theorem.
Proof of theorem.
(i) We have
Γ
a 0
0 a
Γ
γΓ] =
Γ
a 0
0 a
γΓ
= [ΓγΓ]
Γ
a 0
0 a
Γ
by the formula for the product.
(ii) Recall we had the isomorphism Θ : H(G, Γ) 7→ Z \ G]
Γ
, and
Θ(T (n)) =
X
γΠ
n
γ]
for some Π
n
. Moreover,
{γZ
2
| γ
Π
n
}
is exactly the subgroups of
Z
2
of
index n.
On the other hand,
Θ(T (m)T (n)) =
X
δΠ
m
Π
n
δγ],
and
{δγZ
2
| δ Π
m
} = {subgroups of γZ
2
of index n}.
Since
n
and
m
are coprime, every subgroup Λ
Z
2
of index
mn
is
contained in a unique subgroup of index
n
. So the above sum gives exactly
Θ(T (mn)).
(iii) We have
Θ(T (p
r
)T (p)) =
X
δΠ
p
r
Π
p
δγ],
and for fixed
γ
Π
p
, we know
{δγZ
2
:
δ
Π
p
r
}
are the index
p
r
subgroups
of Z
2
.
On the other hand, we have
Θ(T (p
r+1
)) =
X
εΠ
p
r+1
ε],
where {εZ
2
} are the subgroups of Z
2
of index p
r+1
.
Every Λ =
εZ
2
of index
p
r+1
is a subgroup of some index
p
subgroup
Λ
0
Z
2
of index
p
r
. If Λ
6⊆ pZ
2
, then Λ
0
is unique, and Λ
0
= Λ +
pZ
2
. On
the other hand, if Λ pZ
2
, i.e.
ε =
p 0
0 p
ε
0
for some
ε
0
of determinant
p
r1
, then there are (
p
+1) such Λ
0
corresponding
to the (p + 1) order p subgroups of Z
2
/pZ
2
.
So we have
Θ(T (p
r
)T (p)) =
X
εΠ
p
r+1
\(pIΓ
p
r1
)
ε] + (p + 1)
X
ε
0
Π
p
r1
pIε
0
]
=
X
εΠ
p
r+1
ε] + p
X
ε
0
Π
p
r1
pIε
0
]
= T (p
r+1
) + pR(p)T (p
r1
).
What’s underlying this is really just the structure theorem of finitely generated
abelian groups. We can replace
GL
2
with
GL
N
, and we can prove some analogous
formulae, only much uglier. We can also do this with
Z
replaced with any principal
ideal domain.
Given all these discussion of the Hecke algebra, we let them act on modular
forms! We write
V
k
= {all functions f : H C}.
This has a right G = GL
2
(Q)
+
action on the right by
g : f 7→ f |
k
g.
Then we have M
k
V
Γ
k
. For f V
Γ
k
, and g G, we again write
ΓgΓ =
a
Γg
i
,
and then we have
f |
k
gΓ] =
X
f |
k
g
i
V
Γ
k
.
Recall when we defined the slash operator, we included a determinant in there.
This gives us
f |
k
R(n) = f
for all n 1, so the R(n) act trivially. We also define
T
n
= T
k
n
: V
Γ
k
V
Γ
k
by
T
n
f = n
k/21
f |
k
T (n).
Since
H
(
G,
Γ) is commutative, there is no confusion by writing
T
n
on the left
instead of the right.
Proposition.
(i) T
k
mn
T
k
m
T
k
n
if (m, n) = 1, and
T
k
p
r+1
= T
k
p
r
T
k
p
p
k1
T
k
p
r1
.
(ii) If f M
k
, then T
n
f M
k
. Similarly, if f S
k
, then T
n
f S
k
.
(iii) We have
a
n
(T
m
f) =
X
1d|(m,n)
d
k1
a
mn/d
2
(f).
In particular,
a
0
(T
m
f) = σ
k1
(m)a
0
(f).
Proof.
(i) This follows from the analogous relations for T (n), plus f|R(n) = f.
(ii)
This follows from (iii), since
T
n
clearly maps holomorphic
f
to holomorphic
f.
(iii) If r Z, then
q
r
|
k
T (m) = m
k/2
X
e|m,0b<e
e
k
exp
2πi
mzr
e
2
+ 2πi
br
e
,
where we use the fact that the elements of Π
m
are those of the form
Π
m
=

a b
0 e
: ae = m, 0 b < e
.
Now for each fixed
e
, the sum over
b
vanishes when
r
e
6∈ Z
, and is
e
otherwise. So we find
q
r
|
k
T (m) = m
k/2
X
e|(n,r)
e
1k
q
mr/e
2
.
So we have
T
m
(f) =
X
r0
a
r
(f)
X
e|(m,r)
m
e
k1
q
mr/e
2
=
X
1d|m
e
k1
X
a
ms/d
(f)q
ds
=
X
n0
X
d|(m,n)
d
k1
a
mn/d
2
q
n
,
where we put n = ds.
So we actually have a rather concrete formula for what the action looks like.
We can use this to derive some immediate corollaries.
Corollary. Let f M
k
be such that
T
n
(f) = λf
for some m > 1 and λ C. Then
(i) For every n with (n, m) = 1, we have
a
mn
(f) = λa
n
(f).
If a
0
(f) 6= 0, then λ = σ
k1
(m).
Proof. This just follows from above, since
a
n
(T
m
f) = λa
n
(f),
and then we just plug in the formula.
This gives a close relationship between the eigenvalues of
T
m
and the Fourier
coefficients. In particular, if we have an
f
that is an eigenvector for all
T
m
, then
we have the following corollary:
Corollary. Let 0 6= f M
k
, and k 4 with T
m
f = λ
m
f for all m 1. Then
(i) If f S
k
, then a
1
(f) 6= 0 and
f = a
1
(f)
X
n1
λ
n
q
n
.
(ii) If f 6∈ S
k
, then
f = a
0
(f)E
k
.
Proof.
(i) We apply the previous corollary with n = 1.
(ii)
Since
a
0
(
f
)
6
= 0, we know
a
n
(
f
) =
σ
k1
(
m
)
a
1
(
f
) by (both parts of) the
corollary. So we have
f = a
0
(f) + a
1
(f)
X
n1
σ
k1
(n)q
n
= A + BE
k
.
But since F and E
k
are modular forms, and k 6= 0, we know A = 0.
Definition
(Hecke eigenform)
.
Let
f S
k
\ {
0
}
. Then
f
is a Hecke eigenform
if for all n 1, we have
T
n
f = λ
n
f
for some l
n
C. It is normalized if a
1
(f) = 1.
We now state a theorem, which we cannot prove yet, because there is still
one missing ingredient. Instead, we will give a partial proof of it.
Theorem.
There exists a basis for
S
k
consisting of normalized Hecke eigenforms.
So this is actually typical phenomena!
Partial proof. We know that {T
n
} are commuting operators on S
k
.
Fact. There exists an inner product on S
k
for which {T
n
} are self-adjoint.
Then by linear algebra, the {T
n
} are simultaneously diagonalized.
Example.
We take
k
= 12, and
dim S
12
= 1. So everything in here is an
eigenvector. In particular,
∆(z) =
X
n1
τ(n)q
n
is a normalized Hecke eigenform. So
τ
(
n
) =
λ
n
. Thus, from properties of the
T
n
, we know that
τ(mn) = τ(m)τ(n)
τ(p
r+1
) = τ(p)τ(p
r
) p
11
τ(p
r1
)
whenever (m, n) = 1 and r 1.
We can do this similarly for
k
= 16
,
18
,
20
,
22
,
26, because
dim S
k
= 1, with
Hecke eigenform f = E
k12
∆.
Unfortunately, when
dim S
k
(Γ(1))
>
1, there do not appear to be any
“natural” eigenforms. It seems like we just have to take the space and diagonalize
it by hand. For example,
S
24
has dimension 2, and the eigenvalues of the
T
n
live in the strange field
Q
(
144169
) (note that 144169 is a prime), and not in
Q
.
We don’t seem to find good reasons for why this is true. It appears that the nice
things that happen for small values of
k
happen only because there is no choice.
7 L-functions of eigenforms
Given any modular form, or in fact any function with a q expansion
f =
X
n1
a
n
q
n
S
k
(Γ(1)),
we can form a Dirichlet series
L(f, s) =
X
n1
a
n
n
s
.
Our goal of this chapter is to study the behaviour of this
L
-function. There
are a few things we want to understand. First, we want to figure out when
this series converges. Next, we will come up with an Euler product formula
for the
L
-series. Finally, we come up with an analytic continuation and then a
functional equation.
Afterwards, we will use such analytic methods to understand how
E
2
, which
we figured is not a modular form, transforms under Γ(1), and this in turns gives
us a product formula for ∆(z).
Notation.
We write
|a
n
|
=
O
(
n
k/2
) if there exists
c R
such that for sufficiently
large n, we have |a
n
| cn
k/2
. We will also write this as
|a
n
| n
k/2
.
The latter notation might seem awful, but it is very convenient if we want to
write down a chain of such “inequalities”.
Proposition.
Let
f S
k
(Γ(1)). Then
L
(
f, s
) converges absolutely for
Re
(
s
)
>
k
2
+ 1.
To prove this, it is enough to show that
Lemma. If
f =
X
n1
a
n
q
n
S
k
(Γ(1)),
then
|a
n
| n
k/2
Proof.
Recall from the example sheet that if
f S
k
, then
y
k/2
|f|
is bounded on
the upper half plane. So
|a
n
(f)| =
1
2π
Z
|q|=r
q
n
˜
f(q)
dq
q
for r (0, 1). Then for any y, we can write this as
Z
1
0
e
2πin(x+iy)
f(x + iy)dx
e
2πny
sup
0x1
|f(x + iy)| e
2πny
y
k/2
.
We now pick y =
1
n
, and the result follows.
As before, we can write the
L
-function as an Euler product. This time it
looks a bit more messy.
Proposition. Suppose f is a normalized eigenform. Then
L(f, s) =
Y
p prime
1
1 a
p
p
s
+ p
k12s
.
This is a very important fact, and is one of the links between cusp forms and
algebraic number theory.
There are two parts of the proof a formal manipulation, and then a
convergence proof. We will not do the convergence part, as it is exactly the same
as for ζ(s).
Proof. We look at
(1 a
p
p
s
+ p
k12s
)(1 + a
p
p
s
+ a
p
2
p
2s
+ ···)
= 1 +
X
r2
(a
p
r
+ p
k1
a
p
r2
a
p
a
r1
p
)p
rs
.
Since we have an eigenform, all of those coefficients are zero. So this is just 1.
Thus, we know
1 + a
p
p
s
+ a
p
2
p
2s
+ ··· =
1
1 a
p
p
s
+ p
k12s
.
Also, we know that when (m, n) = 1, we have
a
mn
= a
m
a
n
,
and also a
1
= 1. So we can write
L(f, s) =
Y
p
(1 + a
p
p
s
+ a
p
2
p
2s
+ ···) =
Y
p
1
1 a
p
p
s
+ p
k12s
.
We now obtain an analytic continuation and functional equation for our
L
-functions. It is similar to what we did for the
ζ
-function, but it is easier this
time, because we don’t have poles.
Theorem.
If
f S
k
then,
L
(
f, s
) is entire, i.e. has an analytic continuation to
all of C. Define
Λ(f, s) = (2π)
s
Γ(s)L(f, s) = M(f(iy), s).
Then we have
Λ(f, s) = (1)
k/2
Λ(f, k s).
The proof follows from the following more general fact:
Theorem. Suppose we have a function
0 6= f(z) =
X
n1
a
n
q
n
,
with a
n
= O(n
R
) for some R, and there exists N > 0 such that
f |
k
0 1
N 0
= cf
for some k Z
>0
and c C. Then the function
L(s) =
X
n1
a
n
n
s
is entire. Moreover, c
2
= (1)
k
, and if we set
Λ(s) = (2π)
s
Γ(s)L(s), ε = c · i
k
1},
then
Λ(k s) = εN
sk/2
Λ(s).
Note that the condition is rather weak, because we don’t require
f
to even
be a modular form! If we in fact have
f S
k
, then we can take
N
= 1
, c
= 1,
and then we get the desired analytic continuation and functional equation for
L(f, s).
Proof. By definition, we have
cf(z) = f |
k
0 1
N 0
= N
k/2
z
k
f
1
Nz
.
Applying the matrix once again gives
f |
k
0 1
N 0
|
k
0 1
N 0
= f |
k
N 0
0 N
= (1)
k
f(z),
but this is equal to c
2
f(z). So we know
c
2
= (1)
k
.
We now apply the Mellin transform. We assume Re(s) 0, and then we have
Λ(f, s) = M(f(iy), s) =
Z
0
f(iy)y
s
dy
y
=
Z
1/
N
+
Z
1/
N
0
!
f(iy)y
s
dy
y
.
By a change of variables, we have
Z
1/
N
0
f(iy)y
s
dy
y
=
Z
1/
N
f
i
Ny
N
s
y
s
dy
y
=
Z
1/
N
ci
k
N
k/2s
f(iy)y
ks
dy
y
.
So
Λ(f, s) =
Z
1/
N
f(iy)(y
s
+ εN
k/2s
y
ks
)
dy
y
,
where
ε = i
k
c = ±1.
Since
f
0 rapidly for
y
, this integral is an entire function of
s
, and
satisfies the functional equation
Λ(f, k s) = εN
s
k
2
Λ(f, s).
Sometimes, we absorb the power of N into Λ, and define a new function
Λ
(f, s) = N
s/2
Λ(f, s) = εΛ
(f, k s).
However, we can’t get rid of the ε.
What we have established is a way to go from modular forms to
L
-functions,
and we found that these
L
-functions satisfy certain functional equations. Now is
it possible to go the other way round? Given any
L
-function, does it come from
a modular form? This is known as the converse problem. One obvious necessary
condition is that it should satisfy the functional equation, but is this sufficient?
To further investigate this, we want to invert the Mellin transform.
Theorem
(Mellin inversion theorem)
.
Let
f
: (0
,
)
C
be a
C
function
such that
for all N, n 0, the function y
N
f
(n)
(y) is bounded as y ; and
there exists
k Z
such that for all
n
0, we have
y
n+k
f
(n)
(
y
) bounded
as y 0.
Let Φ(s) = M (f, s), analytic for Re(s) > k. Then for all σ > k, we have
f(y) =
1
2πi
Z
σ+i
σi
Φ(s)y
s
ds.
Note that the conditions can be considerably weakened, but we don’t want
to do so much analysis.
Proof.
The idea is to reduce this to the inversion of the Fourier transform. Fix
a σ > k, and define
g(x) = e
2πσx
f(e
2πx
) C
(R).
Then we find that for any
N, n
0, the function
e
Nx
g
(n)
(
x
) is bounded as
x +. On the other hand, as x −∞, we have
g
(n)
(x)
n
X
j=0
e
2π(σ+j)x
|f
(j)
(e
2πx
)|
n
X
j=0
e
2π(σ+j)x
e
2π(j+k)x
e
2π(σk)x
.
So we find that
g S
(
R
). This allows us to apply the Fourier inversion formula.
By definition, we have
ˆg(t) =
Z
−∞
e
2πσx
f(e
2πx
)e
2πixt
dx
=
1
2π
Z
0
y
σ+it
f(y)
dy
y
=
1
2π
Φ(σ + it).
Applying Fourier inversion, we find
f(y) = y
σ
g
log y
2π
= y
σ
1
2π
Z
−∞
e
2πit(log y/2π)
Φ(σ + it) dt
=
1
2πi
Z
σ+i
σi
Φ(s)y
s
ds.
We can now use this to prove a simple converse theorem.
Theorem. Let
L(s) =
X
n1
a
n
n
s
be a Dirichlet series such that
a
n
=
O
(
n
R
) for some
R
. Suppose there is some
even k 4 such that
L(s) can be analytically continued to {Re(s) >
k
2
ε} for some ε > 0;
|L(s)| is bounded in vertical strips {σ
0
Re s σ
1
} for
k
2
σ
0
< σ
1
.
The function
Λ(s) = (2π)
s
Γ(s)L(s)
satisfies
Λ(s) = (1)
k/2
Λ(k s)
for
k
2
ε < Re s <
k
2
+ ε.
Then
f =
X
n1
a
n
q
n
S
k
(Γ(1)).
Note that the functional equation allows us to continue the Dirichlet series
to the whole complex plane.
Proof.
Holomorphicity of
f
on
H
follows from the fact that
a
n
=
O
(
n
R
), and
since it is given by a
q
series, we have
f
(
z
+ 1) =
f
(
z
). So it remains to show
that
f
1
z
= z
k
f(z).
By analytic continuation, it is enough to show this for
f
i
y
= (iy)
k
f(iy).
Using the inverse Mellin transform (which does apply in this case, even if it
might not meet the conditions of the version we proved), we have
f(iy) =
1
2πi
Z
σ+i
σi
Λ(s)y
s
ds
=
1
2πi
Z
k
2
+i
k
2
i
Λ(s)y
s
ds
=
(1)
k/2
2πi
Z
k
2
+i
k
2
i
Λ(k s)y
s
ds
=
(1)
k/2
2πi
Z
k
2
+i
k
2
i
Λ(s)y
sk
ds
= (1)
k/2
y
k
f
i
y
.
Note that for the change of contour, we need
Z
σ±iT
k
2
±iT
Λ(s)y
s
ds 0
as
T
. To do so, we need the fact that Γ(
σ
+
iT
)
0 rapidly as
T ±∞
uniformly for σ in any compact set, which indeed holds in this case.
This is a pretty result, but not really of much interest at this level. However,
it is a model for other proofs of more interesting things, which we unfortunately
would not go into.
Recall we previously defined the Eisenstein series E
2
, and found that
E
2
(z) = 1 24
X
n1
σ
1
(n)q
n
.
We know this is not a modular form, because there is no modular form of weight
2. However,
E
2
does satisfy
E
2
(
z
+ 1) =
E
(
z
), as it is given by a
q
-expansion.
So we know that E
2
(
1
z
) 6= z
2
E
2
(z). But what is it?
We let
f(y) =
1 E
2
(iy)
24
=
X
n1
σ
1
(n)e
2πny
.
Proposition. We have
M(f, s) = (2π)
s
Γ(s)ζ(s)ζ(s 1).
This is a really useful result, because we understand Γ and ζ well.
Proof. Doing the usual manipulations, it suffices to show that
X
σ
1
(m)m
s
= ζ(s)ζ(s 1).
We know if (m, n) = 1, then
σ
1
(mn) = σ
1
(m)σ
1
(n).
So we have
X
m1
σ
1
(m)m
s
=
Y
p
(1 + (p + 1)p
s
+ (p
2
+ p + 1)p
2s
+ ···).
Also, we have
(1 p
s
)(1 + (p + 1)p
s
+ (p
2
+ p + 1)p
2s
+ ···)
= 1 + p
1s
+ p
22s
+ ··· =
1
1 p
1s
.
Therefore we find
X
σ
1
(m)m
s
= ζ(s)ζ(s 1).
The idea is now to use the functional equation for
ζ
and the inverse Mellin
transform to obtain the transformation formula for
E
2
. This is the reverse of
what we did for genuine modular forms. This argument is due to Weil.
Recall that we defined
Γ
R
(s) = π
s/2
Γ
s
2
, Z(s) = Γ
R
(s)ζ(s).
Then we found the functional equation
Z(s) = Z(1 s).
Similarly, we defined
Γ
C
(s) = 2(2π)
s
Γ(s) = Γ
R
(s
R
(s + 1),
where the last equality follows from the duplication formula. Then we know
(2π)
s
Γ(s) = (2π)
s
(s 1)Γ(s 1) =
s 1
4π
Γ
R
(s
R
(s 1).
This implies we have the functional equation
Proposition.
M(f, s) =
s 1
4π
Z(s)Z(s 1) = M (f, 2 s).
This also tells us the function is holomorphic except for poles at
s
= 0
,
1
,
2,
which are all simple.
Theorem. We have
f(y) + y
2
f
1
y
=
1
24
1
4π
y
1
+
1
24
y
2
.
Proof.
We will apply the Mellin inversion formula. To justify this application,
we need to make sure our
f
behaves sensibly ass
y
0
,
. We use the absurdly
terrible bound
σ
1
(m)
X
1dm
d m
2
.
Then we get
f
(n)
(y)
X
m1
m
2+n
e
2πmy
This is certainly very well-behaved as
y
, and is
y
N
for all
N
. As
y 0, this is
1
(1 e
2πy
)
n+3
y
n3
.
So f satisfies conditions of our Mellin inversion theorem with k = 3.
We pick any σ > 3. Then the inversion formula says
f(y) =
1
2πi
Z
σ+i
σi
M(f, s)y
s
ds.
So we have
f
1
y
=
1
2πi
Z
σ+i
σi
M(f, 2 s)y
s
ds
=
1
2πi
Z
2σ+i
2σi
M(f, s)y
2s
ds
So we have
f(y) + y
2
f
1
y
=
1
2πi
Z
σ+i
σi
Z
2+σ+i
2σi
M(f, s)y
s
ds.
This contour is pretty simple. It just looks like this:
×× ×
210
Using the fact that
M
(
f, s
) vanishes quickly as
|Im
(
s
)
|
, this is just the
sum of residues
f(y) + y
2
f
1
y
=
X
s
0
=0,1,2
res
s=s
0
M(f, s)y
s
0
.
It remains to compute the residues. At s = 2, we have
res
s=2
M(f, s) =
1
4π
Z(2) res
s=1
Z(s) =
1
4π
·
π
6
· 1 =
1
24
.
By the functional equation, this implies
res
s=0
M(f, s) =
1
24
.
Now it remains to see what happens when s = 1. We have
res
s=1
M(f, s) =
1
4π
res
s=1
Z(s) res
s=0
Z(s) =
1
4π
.
So we are done.
Corollary.
E
2
1
z
= z
2
E
2
(z) +
12z
2πi
.
Proof. We have
E
2
(iy) = 1 24f(y)
= 1 24y
2
f
1
y
1 +
6
π
y
1
+ y
2
= y
2
1 24f
1
y

+
6
π
y
1
= y
2
E
1
iy
+
6
π
y
1
.
Then the result follows from setting
z
=
iy
, and then applying analytic con-
tinuiation.
Corollary.
∆(z) = q
Y
m1
(1 q
m
)
24
.
Proof.
Let
D
(
z
) be the right-hand-side. It suffices to show this is a modular
form, since
S
12
(Γ(1)) is one-dimensional. It is clear that this is holomorphic on
H, and D(z + 1) = D(z). If we can show that
D |
12
0 1
1 0
= D,
then we are done. In other words, we need to show that
D
1
z
= z
12
D(z).
But we have
D
0
(z)
D(z)
= 2πi 24
X
m1
2πimq
1 q
m
= 2πi
1 24
X
m,d1
mq
md
= 2πiE
2
(z)
So we know
d
dz
log D
1
z

=
1
z
2
D
0
D
1
z
=
1
z
2
2πiE
2
1
z
=
D
0
D
(z) + 12
d
dz
log z.
So we know that
log D
1
z
= log D + 12 log z + c,
for some locally constant function c. So we have
D
1
z
= z
12
D(z) · C
for some other constant
C
. By trying
z
=
i
, we find that
C
= 1 (since
D
(
i
)
6
= 0
by the infinite product). So we are done.
8 Modular forms for subgroups of SL
2
(Z)
8.1 Definitions
For the rest of the course, we are going to look at modular forms defined on
some selected subgroups of SL
2
(Z).
We fix a subgroup Γ
Γ(1) of finite index. For Γ(1), we defined a modular
form to a holomorphic function
f
:
H C
that is invariant under the action of
Γ(1), and is holomorphic at infinity. For a general subgroup Γ, the invariance
part of the definition works just as well. However, we need something stronger
than just holomorphicity at infinity.
Before we explain what the problem is, we first look at some examples. Recall
that we write
¯
Γ for the image of Γ in PSL
2
(Z).
Lemma.
Let Γ
Γ(1) be a subgroup of finite index, and
γ
1
, ··· , γ
i
be right
coset representatives of
¯
Γ in Γ(1), i.e.
Γ(1) =
d
a
i=1
¯
Γγ
i
.
Then
d
a
i=1
γ
i
D
is a fundamental domain for Γ.
Example. Take
Γ
0
(p) =

a b
c d
SL
2
(Z) : b 0 (mod p)
Recall there is a canonical map
SL
2
(
Z
)
SL
2
(
F
p
) that is surjective. Then Γ
0
(
p
)
is defined to be the inverse image of
H =

a 0
b c

SL
2
(F
p
).
So we know
(Γ(1) : Γ
0
(p)) = (SL
2
(F
q
) : H) =
|SL
2
(F
q
)|
|H|
= p + 1,
where the last equality follows from counting. In fact, we have an explicit choice
of coset representatives
SL
2
(F
p
) =
a
bF
p
0
∗ ∗
1 b
0 1
a
0
∗ ∗
1
1
Thus, we also have coset representatives of Γ
0
(p) by
T
b
=
1 b
0 1
: b F
p
S =
0 1
1 0

.
For example, when
p
= 3, then
b {
0
,
1
,
+1
}
. Then the fundamental domain
is
D
T
1
DT
1
D
SD
So in defining modular forms, we’ll want to control functions as
z
0 (in
some way), as well as when
y
. In fact, what we really need is that the
function has to be holomorphic at all points in
P
1
(
Q
) =
Q {∞}
. It happens
that in the case of Γ(1), the group Γ(1) acts transitively on
Q {∞}
. So by
invariance of
f
under Γ(1), being holomorphic at
ensures we are holomorphic
everywhere.
In general, we will have to figure out the orbits of
Q{∞}
under Γ, and then
pick a representative of each orbit. Before we go into that, we first understand
what the fundamental domain of Γ looks like.
Definition (Cusps). The cusps of Γ (or
¯
Γ) are the orbits of Γ on P
1
(Q).
We would want to say a subgroup of index
n
has
n
many cusps, but this
is obviously false, as we can see from our example above. The problem is that
we should could each cusp with “multiplicity”. We will call this the width. For
example, in the fundamental domain above
D
p =
T
1
DT
1
D
SD
0
In this case, we should count
p
=
three times, and
p
= 0 once. One might
worry this depends on which fundamental domain we pick for Γ. Thus, we will
define it in a different way. From now on, it is more convenient to talk about
¯
Γ
than Γ.
Since
Γ(1)
acts on
P
1
(
Q
) transitively, it actually just suffices to understand
how to define the multiplicity for the cusp of
. The stabilizer of
in
Γ(1)
is
Γ(1)
=
±
1 b
0 1
: b Z
.
For a general subgroup
Γ Γ(1)
, the stabilizer of
is
Γ
=
Γ Γ(1)
. Then
this is a finite index subgroup of Γ(1)
, and hence must be of the form
Γ
=
±
1 m
0 1

for some m 1. We define the width of the cusp to be this m.
More generally, for an arbitrary cusp, we define the width by conjugation.
Definition
(Width of cusp)
.
Let
α Q {∞}
be a representation of a cusp of
Γ. We pick g Γ(1) with g() = α. Then γ(α) = α iff g
1
γg() = . So
g
1
Γ
α
g = (g
1
Γg)
=
±
1 m
α
0 1

for some m
α
1. This m
α
is called the width of the cusp α (i.e. the cusp Γα).
The g above is not necessarily unique. But if g
0
is another, then
g
0
= g
±1 n
0 ±1
for some n Z. So m
α
is independent of the choice of g.
As promised, we have the following proposition:
Proposition. Let Γ have ν cusps of widths m
1
, ··· , m
ν
. Then
ν
X
i=1
m
i
= (
Γ(1) :
¯
Γ).
Proof. There is a surjective map
π :
¯
Γ \ Γ(1) cusps
given by sending
¯
Γ · γ 7→
¯
Γ · γ().
It is then an easy group theory exercise that |π
1
([α])| = m
α
.
Example. Consider the following subgroup
Γ = Γ
0
(p) =

a b
c d
: c 0 (mod p)
.
Then we have
Γ(1) : Γ
0
(p)
=
Γ(1) : Γ
0
(p)
= p + 1.
We can compute
Γ
=
1,
1 1
0 1

= Γ(1)
.
So m
= 1. But we also have
Γ
0
=
1,
1 0
p 1

,
and this gives
m
0
=
p
. Since
p
+ 1 =
p
= 1, these are the only cusps of Γ
0
(
p
).
Likewise, for Γ
0
(p), we have m
= p and m
0
= 1.
Equipped with the definition of a cusp, we can now define a modular form!
Definition
(Modular form)
.
Let Γ
SL
2
(
Z
) be of finite index, and
k Z
. A
modular form of weight k on Γ is a holomorphic function f : H C such that
(i) f |
k
γ = f for all γ Γ.
(ii) f is holomorphic at the cusps of Γ.
If moreover,
(iii) f vanishes at the cusps of Γ,
then we say f is a cusp form.
As before, we have to elaborate a bit more on what we mean by (ii) and (iii).
We already know what it means when the cusp is
(i.e. it is Γ
). Now in
general, we write our cusp as Γα = Γg() for some g Γ(1).
Then we know
¯
Γ
α
= g
±
1 m
0 1

g
1
.
This implies we must have
1 m
0 1
or
1 m
0 1
g
1
Γ
α
g.
Squaring, we know that we must have
1 2m
0 1
g
1
Γ
α
g.
So we have
f |
k
g|
k
(
1 2m
0 1
) = f |
k
g.
So we know
(f |
k
g)(z + 2m) = (f |
k
g)(z).
Thus, we can write
f |
k
g =
˜
f
g
(q) =
X
nZ
(constant)q
n/2m
=
X
nQ
2mnZ
a
g,n
(f)q
n
,
where we define
q
a/b
= e
2πiaz/b
.
Then f is holomorphic at the cusp α = g() if
a
g,n
(f) = 0
for all n < 0, and vanishes at α if moreover
a
g,0
(f) = 0.
Note that if I Γ, then in fact
1 m
0 1
g
1
Γ
α
g.
So the q expansion at α is actually series in q
1/m
.
There is a better way of phrasing this. Suppose we have
g
(
) =
α
=
g
0
(
),
where g
0
GL
2
(Q)
+
. Then we have
g
0
= gh
for some h GL
2
(Q)
+
such that h() = . So we can write
h = ±
a b
0 c
where a, b, d Q and a, d > 0.
Then, we have
f |
k
g
0
= (f |
k
g)|
k
h
=
X
nQ
2mnZ
a
g,n
(f)q
n
|
k
±
a b
0 d
= (±i)
k
X
n
a
g,n
(f)q
an/d
e
2πbn/d
.
In other words, we have
f |
k
g =
X
n0
c
n
q
rn
for some positive r Q. So condition (ii) is equivalent to (ii’):
(ii’) For all g GL
2
(Q)
+
, the function f |
k
g is holomorphic at .
Note that (ii’) doesn’t involve Γ, which is convenient. Likewise, (iii) is
consider to
(iii’) f |
k
g vanishes at for all g GL
2
(Q)
+
.
We can equivalently replace GL
2
(Q)
+
with SL
2
(Z).
Modular form and cusp forms of weight
k
on Γ form a vector space
M
k
(Γ)
S
k
(Γ).
Recall that for Γ = Γ(1) = SL
2
(Z), we knew M
k
= 0 if k is odd, because
f |
k
(I) = (1)
k
f.
More generally, if
I
Γ, then
M
k
= 0 for all odd
k
. But if
I 6∈
Γ, then
usually there can be non-zero forms of odd weight.
Let’s see some examples of such things.
Proposition.
Let Γ
Γ(1) be of finite index, and
g G
=
GL
2
(
Q
)
+
. Then
Γ
0
=
g
1
Γ
g
Γ(1) also has finite index in Γ(1), and if
f M
k
(Γ) or
S
k
(Γ), then
f |
k
g M
k
0
) or S
k
0
) respectively.
Proof.
We saw that (
G,
Γ) has property (H). So this implies the first part. Now
if γ Γ
0
, then gγg
1
Γ. So
f |
k
gγg
1
= f f |
k
g|
k
γ = f |
k
g.
The conditions (ii’) and (iii’) are clear.
This provides a way of producing a lot of modular forms. For example, we
can take Γ = Γ(1), and take
g =
N 0
0 1
.
Then it is easy to see that Γ
0
= Γ
0
(
N
). So if
f
(
z
)
M
k
(Γ(1)), then
f
(
Nz
)
M
k
0
(
N
)). But in general, there are lots of modular forms in Γ
0
(
N
) that
cannot be constructed this way.
As before, we don’t have a lot of modular forms in low dimensions, and there
is also an easy bound for those in higher dimensions.
Theorem. We have
M
k
(Γ) =
(
0 k < 0
C k = 0
,
and
dim
C
M
k
(Γ) 1 +
k
12
(Γ(1) : Γ).
for all k > 0.
In contrast to the case of modular forms of weight 1, we don’t have explicit
generators for this.
Proof. Let
Γ(1) =
d
a
i=1
Γγ
i
.
We let
f M
k
(Γ),
and define
N
f
=
Y
1id
f |
k
γ
i
.
We claim that
N
f
M
kd
(Γ(1)), and
N
f
= 0 iff
f
= 0. The latter is obvious by
the principle of isolated zeroes.
Indeed, f is certainly holomorphic on H, and if γ Γ(1), then
N
f
|
k
γ =
Y
i
f |
k
γ
i
γ = N
f
.
As f M
k
(Γ), each f |
k
γ
i
is holomorphic at .
If k < 0, then N
f
M
kd
(Γ(1)) = 0. So f = 0.
If
k
0, then suppose
dim M
k
(
G
)
> N
. Pick
z
1
, ··· , z
N
D \ {i, ρ}
distinct. Then there exists 0 6= f M
k
(Γ) with
f(z
1
) = ··· = f (z
N
) = 0.
So
N
f
(z
1
) = ··· = N
f
(z
N
) = 0.
Then by our previous formula for zeros of modular forms, we know
N
kd
12
.
So dim M
k
(Γ) 1 +
kd
12
.
If k = 0, then M
0
(Γ) has dimension 1. So M
0
(Γ) = C.
8.2 The Petersson inner product
As promised earlier, we define an inner product on the space of cusp forms.
We let
f, g S
k
(Γ). Then the function
y
k
f
(
z
)
g(z)
is Γ-invariant, and is
bounded on
H
, since
f
and
g
vanish at cusps. Also, recall that
dx dy
y
2
is an
GL
2
(R)
+
-invariant measure. So we can define
hf, gi =
1
v(Γ)
Z
Γ\H
y
k
f(z)g(z)
dx dy
y
z
C,
where
R
Γ\H
means we integrate over any fundamental domain, and
v
(Γ) is the
volume of a fundamental domain,
v(Γ) =
Z
ΓH
dx dy
y
2
= (Γ(1) :
¯
Γ)
Z
D
dx dy
y
2
.
The advantage of this normalization is that if we replace Γ by a subgroup Γ
0
of finite index, then a fundamental domain for Γ
0
is the union of (
¯
Γ
:
¯
Γ
0
) many
fundamental domains for Γ. So the expression (
) is independent of Γ, as long
as both f, g S
k
(Γ).
This is called the Petersson inner product.
Proposition.
(i) h·, ·i is a Hermitian inner product on S
k
(Γ).
(ii) h·, ·i
is invariant under translations by
GL
2
(
Q
)
+
. In other words, if
γ GL
2
(Q)
+
, then
hf |
k
γ, g |
k
γi = hf, gi.
(iii) If f, g S
k
(Γ(1)), then
hT
n
f, gi = hf, T
n
gi.
This completes our previous proof that the
T
n
can be simultaneously diago-
nalized.
Proof.
(i)
We know
hf, gi
is
C
-linear in
f
, and
hf, gi
=
hg, f i
. Also, if
hf, fi
= 0,
then
Z
Γ\H
y
k2
|f|
2
dx dy = 0,
but since
f
is continuous, and
y
is never zero, this is true iff
f
is identically
zero.
(ii) Let f
0
= f |
k
γ and g
0
= g|
k
γ S
k
0
), where Γ
0
= Γ γ
1
Γγ. Then
y
k
f
0
¯g
0
= y
k
(det γ)
k
|cz + d|
2k
· f(γ(z))
g(γ(z)) = (Im γ(z))
k
f(γ(z))g(γ(z)).
Now Im γ(z) is just the y of γ(z). So it follows that Then we have
hf
0
, g
0
i =
1
v
0
)
Z
D
Γ
0
y
k
f ¯g
dx dy
y
2
γ(z)
=
1
v
0
)
Z
γ(D
Γ
0
)
y
k
f ¯g
dx dy
y
2
.
Now
γ
(
D
Γ
0
) is a fundamental domain for
γ
Γ
0
γ
1
=
γ
Γ
γ
1
Γ, and note that
v
0
) = v(γΓ
0
γ
1
) by invariance of measure. So hf
0
, g
0
i = hf, gi.
(iii)
Note that
T
n
is a polynomial with integer coefficients in
{T
p
:
p | n}
. So it
is enough to do it for n = p. We claim that
hT
p
f, gi = p
k
2
1
(p + 1)hf |
k
δ, gi,
where δ Mat
2
(Z) is any matrix with det(δ) = p.
Assuming this, we let
δ
a
=
1
Mat
2
(Z),
which also has determinant p. Now as
g|
k
p 0
0 p
= g,
we know
hT
p
f, gi = p
k
2
1
(p + 1)hf |
k
δ, gi
= p
k
2
1
(p + 1)hf, g|
k
δ
1
i
= p
k
2
1
(p + 1)hf, g|
k
δ
a
i
= hf, T
p
gi
To prove the claim, we let
Γ(1)
p 0
0 1
Γ(1) =
a
0jp
Γ(1)δγ
i
for some γ
i
Γ(1). Then we have
hT
p
f, gi = p
k
2
1
*
X
j
f |
k
δγ
j
, g
+
= p
k
2
1
X
j
hf |
k
δγ
j
, g|
k
γ
j
i
= p
k
2
1
(p + 1)hf |
k
δ, gi,
using the fact that g |
k
γ
j
= g.
8.3 Examples of modular forms
We now look at some examples of (non-trivial) modular forms for different
subgroups. And the end we will completely characterize
M
2
0
(4)). This seems
like a rather peculiar thing to completely characterize, but it turns out this
understanding M
2
0
(4)) can allow us to prove a rather remarkable result!
Eisenstein series
At the beginning of the course, the first example of a modular form we had was
Eisenstein series. It turns out a generalized version of Eisenstein series will give
us more modular forms.
Definition (G
r,k
). Let k 3. Pick any vector r = (r
1
, r
2
) Q
2
. We define
G
r,k
(z) =
X
0
mZ
2
1
((m
1
+ r
1
)z + m
2
+ r
2
)
k
,
where
P
0
means we omit any m such that m + r = 0.
For future purposes, we will consider r as a row vector.
As before, we can check that this converges absolutely for
k
3 and
z H
.
This obviously depends only on r mod Z
2
, and
G
0,k
= G
k
.
Theorem.
(i) If γ Γ(1), then
G
r,k
|
k
γ = G
rγ,k
.
(ii) If Nr Z
2
, then G
r,k
M
k
(Γ(N)).
Proof.
(i) If g GL
2
(R)
+
and u R
2
, then
1
(u
1
z + u
2
)
k
|
k
g =
(det g)
k/2
((au
1
+ cu
2
)z + (bu
1
+ du
2
))
k
=
(det g)
k/2
(v
1
z + v
2
)
k
,
where v = n · g. So
G
r,k
|
k
γ =
X
0
m
1
(((m + r)
1
γ)z + ((m + r)γ)
2
)
k
=
X
m
0
1
((m
0
1
+ r
0
1
)z + m
0
2
+ r
0
2
)
k
= G
rγ,k
(z),
where m
0
= mγ and r
0
= rγ.
(ii)
By absolute convergence,
G
r,k
is holomorphic on the upper half plane.
Now if
Nr Z
2
and
γ
Γ(
N
), then
Nrγ N r
(
mod N
). So
rγ r
(mod Z
2
). So we have
G
r,k
|
k
γ = G
rγ,k
= G
r,k
.
So we get invariance under Γ(
N
). So it is enough to prove
G
r,k
is holo-
morphic at cusps, i.e.
G
r,k
|
k
γ
is holomorphic at
for all
γ
Γ(1). So it is
enough to prove that for all r, G
r,k
is holomorphic at .
We can write
G
r,k
=
X
m
1
+r
1
>0
+
X
m
1
+r
1
=0
+
X
m
1
+r
1
<0
!
1
((m
1
+ r
1
)z + m
2
+ r
2
)
k
.
The first sum is
X
m
1
+r
1
>0
=
X
m
1
>r
1
X
m
2
Z
1
([(m
1
+ r
1
)z + r
2
] + m
2
)
k
.
We know that (
m
1
+
r
1
)
z
+
r
2
H
. So we can write this as a Fourier series
X
m
1
>r
1
X
d1
(2π)
k
(k 1)!
d
k1
e
2πr
2
d
q
(m
1
+r
1
)d
.
We now see that all powers of q are positive. So this is holomorphic.
The sum over m
1
+ r
1
= 0 is just a constant. So it is fine.
For the last term, we have
X
m
1
+r
1
<0
=
X
m
1
<r
1
X
m
2
Z
(1)
k
((m
1
r
1
)z r
2
m
2
)
k
,
which is again a series in positive powers of q
m
1
r
1
.
ϑ functions
Our next example of modular forms is going to come from
ϑ
functions. We
previously defined a ϑ function, and now we are going to call it ϑ
3
:
ϑ
3
(z) = ϑ(z) =
X
nZ
e
πin
2
z
= 1 + 2
X
n1
q
n
2
/2
.
We proved a functional equation for this, which was rather useful.
Unfortunately, this is not a modular form. Applying elements of Γ(1) to ϑ
3
will give us some new functions, which we shall call ϑ
2
and ϑ
4
.
Definition (ϑ
2
and ϑ
4
).
ϑ
2
(z) =
X
nZ
e
πi(n+1/2)
2
z
= q
1/8
X
nZ
q
n(n+1)/2
= 2q
1/8
X
n0
q
n(n+1)/2
ϑ
4
(z) =
X
nZ
(1)
n
e
πin
2
z
= 1 + 2
X
n1
(1)
n
q
n
2
/2
.
Theorem.
(i) ϑ
4
(z) = ϑ
3
(z ± 1) and θ
2
(z + 1) = e
πi/4
ϑ
2
(z).
(ii)
ϑ
3
1
z
=
z
i
1/2
ϑ
3
(z)
ϑ
4
1
z
=
z
i
1/2
ϑ
2
(z)
Proof.
(i) Immediate from definition, e.g. from the fact that e
πi
= 1.
(ii)
The first part we’ve seen already. To do the last part, we use the Poisson
summation formula. Let
h
t
(x) = e
πt(x+1/2)
2
= g
t
x +
1
2
,
where
g
t
(x) = e
πtx
2
.
We previously saw
ˆg
t
(y) = t
1/2
e
πy
2
/t
.
We also have
ˆ
h
t
(y) =
Z
e
2πixy
g
t
x +
1
2
dx
=
Z
e
2πi(x1/2)y
g
t
(x) dx
= e
πiy
ˆg
t
(y).
So by the Poisson summation formula,
ϑ
2
(it) =
X
nZ
h
t
(n) =
X
nZ
ˆ
h
t
(n) =
X
nZ
(1)
n
t
1/2
e
πn
2
/t
= t
1/2
ϑ
4
i
t
.
There is also a
ϑ
1
, but we have
ϑ
1
= 0. Of course, we did not just invent
a funny name for the zero function for no purpose. In general, we can define
functions
ϑ
j
(
u, z
), and the
ϑ
functions we defined are what we get when we set
u = 0. It happens that ϑ
1
has the property that
ϑ
1
(u, z) = ϑ
1
(u, z),
which implies ϑ
1
(z) = 0.
We now see that the action of
SL
2
(
Z
) send us between
ϑ
2
,
ϑ
3
and
ϑ
4
, up to
simple factors.
Corollary.
(i) Let
F =
ϑ
4
2
ϑ
4
3
ϑ
4
4
.
Then
F (z + 1) =
1 0 0
0 0 1
0 1 0
F, z
2
F
1
z
=
0 0 1
0 1 0
1 0 0
F
(ii) ϑ
4
j
M
2
(Γ) for a subgroup Γ
Γ(1) of finite index. In particular,
ϑ
4
j
|
z
γ
is
holomorphic at for any γ GL
2
(Q)
+
.
Proof.
(i) Immediate from the theorem.
(ii)
We know
Γ(1)
=
hS, T i
, where
T
=
±(
1 1
0 1
)
and
S
=
±
0 1
1 0
. So by (i),
there is a homomorphism ρ : Γ(1) GL
3
(Z) and ρ(I) = I with
F |
2
γ = ρ(γ)F,
where
ρ
(
γ
) is a signed permutation. In particular, the image of
ρ
is finite,
so the kernel Γ = ker ρ has finite index, and this is the Γ we want.
It remains to check holomorphicity. But each
ϑ
j
is holomorphic at
.
Since
F |
2
γ
=
ρ
(
γ
)
F
, we know
ϑ
4
j
|
2
is a sum of things holomorphic at
,
and is hence holomorphic at .
It would be nice to be more specific about exactly what subgroup
ϑ
4
j
is
invariant under. Of course, whenever
γ
Γ, then we have
ϑ
4
j
|
2
γ
=
ϑ
4
j
. But in
fact ϑ
4
j
is invariant under a larger group.
To do this, it suffices to do it for
ϑ
4
=
ϑ
4
3
, and the remaining follow by
conjugation.
We introduce a bit of notation
Notation. We write
W
N
=
0 1
N 0
Note that in general, W
N
does not have determinant 1.
Theorem.
Let
f
(
z
) =
ϑ
(2
z
)
4
. Then
f
(
z
)
M
2
0
(4)), and moreover,
f |
2
W
4
=
f.
To prove this, we first note the following lemma:
Lemma. Γ
0
(4) is generated by
I, T =
1 1
0 1
, U =
1 0
4 1
= W
4
1 1
0 1
W
1
4
.
Four is special. This is not a general phenomenon.
Proof. It suffices to prove that Γ
0
(4) is generated by T and U = ±(
1 0
4 1
).
Let
γ = ±
a b
c d
Γ
0
(4).
We let
s(γ) = a
2
+ b
2
.
As
c
is even, we know
a
1 (
mod
2). So
s
(
γ
)
1, and moreover
s
(
γ
) = 1 iff
b = 0, a = ±1, iff γ = T
n
for some n.
We now claim that if
s
(
γ
)
6
= 1, then there exists
δ {T
±1
, U
±1
}
such that
s(γδ) < s(γ). If this is true, then we are immediately done.
To prove the claim, if s(γ) 6= 1, then note that |a| 6= |2b| as a is odd.
If
|a| < |
2
b|
, then
min{|b±a|} < |b|
. This means
s
(
γT
±1
) =
a
2
+(
b±a
)
2
<
s(γ).
If
|a| > |
2
b|
, then
min{|a ±
4
b|} < |a|
, so
s
(
γU
±1
) = (
a ±
4
b
)
2
+
b
2
<
s(γ).
Proof of theorem. It is enough to prove that
f |
2
T = f |
2
U = f.
This is now easy to prove, as it is just a computation. Since
ϑ
(
z
+ 2) =
ϑ
(
z
), we
know
f |
2
T = f(z + 1) = f (z).
We also know that
f |
2
W
4
= 4(4z)
2
f
1
4z
=
1
4z
2
ϑ
1
2z
4
= f(z),
as
ϑ
1
z
=
z
i
1/2
ϑ(z).
So we have
f |
2
U = f |
2
W
4
|
2
T
1
|
2
W
4
= (1)(1)f = f.
We look at Γ
0
(2) and Γ
0
(4) more closely. We have
Γ
0
(2) =

a b
2c d

=
γ Γ(1) : γ =
1
0 1
mod 2
.
We know
|SL
2
(
Z/
2
Z
)
|
= 6, and so (Γ(1) : Γ
0
(2)) = 3. We have coset representa-
tives
I,
0 1
0 1
,
1 0
1 1
.
We also have a map
Γ
0
(2) Z/2Z
a b
2c d
7→ c,
which one can directly check to be a homomorphism. This has kernel Γ
0
(4). So
we know
0
(2) : Γ
0
(4)) = 2, and has coset representatives
I,
1 0
2 1
So
Γ
0
(2) =
T, ±
1 0
2 1
= W
2
1 1
0 1
W
1
2
.
We can draw fundamental domains for Γ
0
(2):
and Γ
0
(4):
We are actually interested in Γ
0
(2) and Γ
0
(4) instead, and their fundamental
domains look “dual”.
Consider
g(z) = E
2
|
2
(
2 0
0 1
) E
2
= 2E
2
(2z) E
2
(z).
Recall that we had
E
2
(z) = 1 24
X
n1
σ
1
(n)q
n
= z
2
E
2
1
z
12
2πiz
.
Proposition. We have g M
2
0
(2)), and g|
2
W
2
= g.
Proof. We compute
g|
2
W
2
=
2
(2z)
2
g
1
2z
=
1
z
2
E
2
1
z
2
(2z)
2
E
2
1
2z
= E
2
(z) +
1
2πiz
2
E
2
(2z) +
12
2πi · 2z
= g(z).
We also have
g|
2
T = g(z + 1) = g(z),
and so
g|
2
(
1 0
2 1
) = g|
2
W
2
T
1
W
1
2
= g.
Moreover,
g
is holomorphic at
, and hence so is
g|
2
W
2
=
g
. So
g
is also
holomorphic at 0 =
W
2
(
). As
has width 1 and 0 has width 2, we see
that these are all the cusps, and so
g
is holomorphic at the cusps. So
g
M
2
0
(2)).
Now we can do this again. We let
h = g(2z) =
1
2
g|
2
(
2 0
0 1
) = 2E
2
(4z) E
2
(2z).
Since g M
2
0
(2)), this implies h M
2
0
(4)) M
0
0
(2)).
The functions
g
and
h
are obviously linearly independent. Recall that we
have
dim M
2
0
(4)) 1 +
k(Γ(1) : Γ
0
(4))
12
= 2.
So the inequality is actually an equality. We have therefore shown that
Theorem.
M
2
0
(4)) = Cg Ch.
Recall we also found an
f
(
z
) =
ϑ
(2
z
)
4
M
2
0
(4)). So we know we must
have
f = ag + bh
for some constants a, b C.
It is easy to find what
a
and
b
are. We just write down the
q
-expansions. We
have
f = ϑ(2z)
4
= (1 + 2q + 2q
4
+ ···)
4
= 1 + 8q + 24q
2
+ 32q
3
+ ···
g = 2E
2
(2z) E
2
(z)
= 1 + 24
X
n1
σ
1
(n)(q
n
2q
2n
)
= 1 + 24q + 24q
2
+ 96q
3
+ ···
h = g(2z)
= 1 + 24q
2
+ ···
By looking at the first two terms, we find that we must have
f =
1
3
g +
2
3
h =
1
3
(4E
2
(4z) E
2
(z)) = 1 + 8
X
k1
σ
1
(n) 4σ
1
n
4

q
n
,
where σ
1
n
4
= 0 if
n
4
6∈ Z.
But recall that
f =
X
nZ
q
n
2
!
4
=
X
a,b,c,dZ
q
a
2
+b
2
+c
2
+d
2
=
X
nN
r
4
(n)q
n
,
where
r
4
(
n
) is the number of ways to write
n
as a sum of 4 squares (where order
matters). Therefore,
Theorem (Lagrange’s 4-square theorem). For all n 1, we have
r
4
(n) = 8
σ
1
(n) 4σ
1
n
4

= 8
X
d|n4-d
d.
In particular, r
4
(n) > 0.
We could imagine ourselves looking at other sums of squares. For example,
we can look instead at
ϑ
2
(2
z
)
2
, which turns out to be an element of
M
1
1
(4)),
one can get a similar formula for the number of ways of writing
n
as a sum of 2
squares.
We can also consider higher powers, and then get approximate formulae for
r
2k
(
n
), because the appropriate Eisenstein series no longer generate
M
k
. There
may be a cusp form as well, which gives an error term.
In general, if we have
γ =
a b
Nc d
Γ
0
(N),
then we find
W
N
γW
1
N
d c
Nb a
Γ
0
(N).
So
W
N
normalizes the group Γ
0
(
N
). Then if
f M
k
0
(
N
)), then
f |
k
W
N
M
k
0
(N)), and this also preserves cusp forms.
Moreover, we have
f |
k
W
2
N
= f |
k
N 0
0 N
= f,
as I Γ
0
(N). So
M
k
0
(N)) = M
k
0
(N))
+
M
k
0
(N))
,
where we split into the (
±
1)-eigenspaces for
W
N
, and the cusp forms decompose
similarly. This
W
N
is the Atkin-Lehner involution. This is the “substitute” for
the the operator S =
0 1
1 0
in Γ(1).
9 Hecke theory for Γ
0
(N)
Note that it is possible to do this for other congruence subgroups. The key case
is
Γ
1
(N) =

a b
c d
SL
2
(Z) : c 0 (mod N ), d, a 1 (mod N)
What is special about this? There are two things
The map
a b
c d
7→ d mod N
is a homomorphism Γ
0
(
N
)
(
Z/nZ
)
×
, and
the kernel is Γ
1
(N).
So we can describe
S
k
1
(N)) =
M
χ
\
(Z/NZ)
×
S
k
1
(N), χ),
where f S
k
1
(n), χ) if
f |
k
a b
c d
= χ(d)f for all
a b
c d
Γ
0
(N).
Of course,
S
k
1
(
N
)
, χ
trivial
) =
S
k
0
(
N
)). In general, everything we can do for
Γ
0
(N) can be done for S
k
1
(N), χ).
But why not study Γ(N ) itself? We can check that
1 0
0 N
Γ(N)
1 0
0 N
1
Γ
1
(N
2
).
So we can go from modular forms on Γ(N) to ones on Γ
1
(N
0
).
For various representation-theoretic reasons, things work much better on
Γ
1
(N).
Last time, we used in various places the matrix
W
N
=
0 1
N 0
.
Then we decomposed
S
k
0
(N)) = S
k
2
(N))
+
S
k
0
(N))
,
according to the
±
-eigenspaces of the operator
W
N
. A while ago, we proved
some theorem about the function equation of L-functions.
Theorem. Let f S
k
0
(N))
ε
, where ε = ±1. Then define
L(f, s) =
X
n1
a
n
n
s
.
Then L(f, s) is am entire function, and satisfies the functional equation
Λ(f, s) = (2π)
s
Γ(s)L(f, s) = ε(N)
k/2
Λ(f, k s).
Proof. We have f |
k
W
N
= εf, and then we can apply our earlier result.
This is a rather remarkable thing. We didn’t really use much about the
properties of f.
Now we want to talk about Hecke operators on Γ
0
(
N
). These are a bit more
complicate. It is much better to understand these in terms of representation
theory instead, but that is outside the scope of the course. So we will just state
the relevant definitions and results.
Recall that a modular form of level 1 is defined by the
q
-expansion, and if
what we have is in fact a Hecke eigenform, then it suffices to know the Hecke
eigenvalues, i.e. the values of
a
p
. We describe this as having “multiplicity one”.
Theorem
(Strong multiplicity one for
SL
2
(
Z
))
.
Let
f, g S
k
(Γ(1)) be normal-
ized Hecke eigenforms, i.e.
f|T
p
= λ
p
f λ
p
= a
p
(f)
g|T
p
= µ
p
g µ
p
= a
p
(g).
Suppose there exists a finite set of primes
S
such that such that for all
p 6∈ S
,
then λ
p
= µ
p
. Then f = g.
Note that since the space of modular forms is finite dimensional, we know
that the modular forms can only depend on finitely many of the coefficients. But
this alone does not give the above result. For example, it might be that
a
2
(
f
) is
absolutely crucial for determining which modular form we are, and we cannot
miss it out. The strong multiplicity one theorem says this does not happen.
Idea of proof. We use the functional equations
Λ(f, k s) = (1)
k/2
Λ(f, s)
Λ(g, k s) = (1)
k/2
Λ(g, s)
So we know
L(f, k s)
L(f, s)
=
L(g, k s)
L(g, s)
.
Since these are eigenforms, we have an Euler product
L(f, s) =
Y
p
(1 λ
p
p
s
+ p
k12s
)
1
,
and likewise for g. So we obtain
Y
p
1 λ
p
p
sk
+ p
2sk1
1 λ
p
p
s
+ p
k12s
=
Y
p
1 µ
p
p
sk
+ p
2sk1
1 µ
p
p
s
+ p
k12s
.
Now we can replace this
Q
p
with
Q
pS
. Then we have some very explicit
rational functions, and then by looking at the appropriate zeroes and poles, we
can actually get λ
p
= µ
p
for all p.
This uses L-functions in an essential way.
The reason we mention this here is that a naive generalization of this theorem
does not hold for, say, Γ
0
(
N
). To even make sense of this statement, we need to
say what the Hecke operators are for Γ
0
(
N
). We are going to write the definition
in a way as short as possible.
Definition (Hecke operators on Γ
0
(N)). If p - N, we define
T
p
f = p
k
2
1
f |
k
p 0
0 1
+
p1
X
k=0
f |
k
1 b
0 p
!
which is the same as the case with Γ(1).
When p | N, then we define
U
p
f = p
k
2
1
p1
X
n=0
f |
k
1 b
0 p
.
Some people call this T
p
instead, and this is very confusing.
We can compute the effect on
q
-expansions as follows when
p - N
, then
we have
a
n
(T
p
f) = a
np
(f) + p
k1
a
n/p
(f),
where the second term is set to 0 if p - n. If p | N, then we have
a
n
(U
p
f) = a
np
(f).
Proposition. T
p
, U
p
send S
k
0
(N)) to S
k
0
(N)), and they all commute.
Proof. T
p
, U
p
do correspond to double coset actions
Γ
0
(N)
1 0
0 p
Γ
0
(N) =
(
Γ
0
(N)
p 0
0 1
q
`
b
Γ
0
(N)
1 b
0 p
p - N
`
b
Γ
0
(N)
1 b
0 p
p | N
.
Commutativity is checked by carefully checking the effect on the
q
-expansions.
However, these do not generate all the Hecke operators. For example, we
have W
N
!
Example. Consider S
12
0
(2)). This contains f = ∆(z) and
g = f |
12
(
2 0
0 1
) = 2
6
∆(2z) = ∆ |
12
W
2
,
using the fact that
|
k
0 1
1 0
= ∆.
So the matrix of W
2
on span{f, g} is
0 1
1 0
.
We can write out
f =
X
τ(n)q
n
= q 24q
2
+ 252q
3
1472q
4
+ ···
g = 2
6
X
τ(n)q
2n
= 2
6
(q
2
+ 24q
4
+ ···)
So we find that
U
2
g = 2
6
f.
It takes a bit more work to see what this does on f . We in fact have
U
2
f =
X
τ(2n)q
n
= 24q 1472q
4
+ ··· = 24f 32g.
So in fact we have
U
2
=
24 64
32 0
.
Now
U
2
and
W
2
certainly do not commute. So the Hecke algebra is not com-
mutative. In fact, generates a two-dimensional representation of the Hecke
algebra.
This makes life much worse. When we did Hecke algebras for Γ(1), all our
representations are 1-dimensional, and we can just work with linear spans. Now
everything has higher dimensional, and things go rather wrong. Similarly, we
can consider ∆(
dz
)
S
12
0
(
N
)) for any
d | N
, and this gives a whole bunch of
things like this.
This turns out to be the only obstruction to the commutativity of the action
of the Hecke algebra. We know S
k
0
(N)) contains
{f(dz) : f S
k
0
(M)), dM | N, M 6= N }.
We let
S
k
0
(
N
))
old
be the span of these. These are all the forms that come
from a smaller level.
Now
S
k
0
(
N
)) has an inner product! So the natural thing to do is to
consider the orthogonal complement of
S
k
0
(
N
))
old
, and call it
S
k
0
(
N
))
new
.
Theorem
(Atkin–Lehner)
.
The Hecke algebra
H
(
G,
Γ
0
(
N
)) fixes
S
k
0
(
N
))
new
and
S
k
0
(
N
))
old
, and on
S
k
0
(
N
))
new
, it acts as a commutative subalgebra
of the endomorphism ring, is closed under adjoint, and hence is diagonalizable.
Moreover, strong multiplicity one holds, i.e. if
S
is a finite set of primes, and we
have
{λ
p
:
p 6∈ S}
given, then there exists at most one
N
1 and at most one
f S
k
0
(N), 1)
new
(up to scaling, obviously) for which
T
p
f = λ
p
f for all p - N, p 6∈ S.
10 Modular forms and rep theory
In this final chapter, we are going to talk about the relation between modular forms and representation theory. The words “representation theory” are a bit vague. We are largely going to talk about automorphic representations, and this is related to the Langlands programme.
Recall that $f$ is a modular form of weight $k$ on $\mathrm{SL}_2(\mathbb{Z})$ if

(i) $f$ is holomorphic as a function $\mathbb{H} \to \mathbb{C}$;

(ii) $f|_k \gamma = (cz + d)^{-k} f(\gamma(z)) = f(z)$ for all $\gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{SL}_2(\mathbb{Z})$;

(iii) it satisfies suitable growth conditions at the cusp $\infty$.
Let’s look at the different properties in turn. The second is the modularity
condition, which is what gave us nice properties like Hecke operators. The
growth condition is some “niceness” condition, and for example this gives the
finite-dimensionality of the space of modular forms.
But how about the first condition? It seems like an “obvious” condition to
impose, because we are working on the complex plane. Practically speaking,
it allows us to use the tools of complex analysis. But what if we dropped this
condition?
Example. Recall that we had an Eisenstein series of weight 2,
\[
E_2(z) = 1 - 24 \sum_{n \geq 1} \sigma_1(n) q^n.
\]
This is not a modular form. Of course, we have $E_2(z) = E_2(z + 1)$, but we saw that
\[
E_2\left(-\frac{1}{z}\right) - z^2 E_2(z) = \frac{12z}{2\pi i} \neq 0.
\]
However, we can get rid of this problem at the expense of making a non-holomorphic modular form. Let's consider the function
\[
f(z) = \frac{1}{y} = \frac{1}{\operatorname{Im}(z)} = f(z + 1).
\]
We then look at
\[
f\left(-\frac{1}{z}\right) - z^2 f(z) = \frac{|z|^2}{y} - \frac{z^2}{y} = \frac{z(\bar{z} - z)}{y} = -2iz.
\]
Aha! This is the same equation as that for $E_2$, apart from a constant factor. So if we let
\[
\tilde{E}_2(z) = E_2(z) - \frac{3}{\pi y},
\]
then this satisfies
\[
\tilde{E}_2(z) = \tilde{E}_2(z + 1) = z^{-2} \tilde{E}_2\left(-\frac{1}{z}\right).
\]
The term $\frac{3}{\pi y}$ certainly tends to 0 rapidly as $y \to \infty$, so if we formulate the growth condition in (iii) without assuming holomorphicity of $f$, then we will find that $\tilde{E}_2$ satisfies (ii) and (iii), but not (i). This is an example of a non-holomorphic modular form of weight 2. Perhaps this is a slightly artificial example, but it is one.
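Since we have the $q$-expansion of $E_2$, we can at least spot-check the transformation law $\tilde{E}_2(z) = z^{-2}\tilde{E}_2(-1/z)$ numerically. A small Python sketch (the truncation point and the sample $z$ are arbitrary):

    import cmath

    def E2(z, N=200):
        # q-expansion E_2 = 1 - 24 sum sigma_1(n) q^n, truncated at q^N
        q = cmath.exp(2j * cmath.pi * z)
        total = 1.0 + 0j
        for n in range(1, N + 1):
            sigma1 = sum(d for d in range(1, n + 1) if n % d == 0)
            total -= 24 * sigma1 * q**n
        return total

    def E2_tilde(z):
        return E2(z) - 3 / (cmath.pi * z.imag)

    z = 0.3 + 0.8j
    print(E2_tilde(z))               # these two agree to high precision:
    print(E2_tilde(-1 / z) / z**2)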
Let's explore what happens when our functions satisfy (ii) and (iii), but not necessarily (i).
Definition (Non-holomorphic modular forms). We let $W_k(\Gamma(1))$ be the set of all $C^\infty$ functions $\mathbb{H} \to \mathbb{C}$ such that

(ii) $f|_k \gamma = f$ for all $\gamma \in \Gamma(1)$;

(iii) $f(x + iy) = O(y^R)$ as $y \to \infty$ for some $R > 0$, and the same holds for all derivatives.
Note that the notation is not standard.
Before we proceed, we need to introduce some notation from complex analysis. As usual, we write $z = x + iy$, and we define the operators
\[
\frac{\partial}{\partial z} = \frac{1}{2}\left(\frac{\partial}{\partial x} - i \frac{\partial}{\partial y}\right), \quad \frac{\partial}{\partial \bar{z}} = \frac{1}{2}\left(\frac{\partial}{\partial x} + i \frac{\partial}{\partial y}\right).
\]
We can check that these operators satisfy
\[
\frac{\partial z}{\partial z} = \frac{\partial \bar{z}}{\partial \bar{z}} = 1, \quad \frac{\partial \bar{z}}{\partial z} = \frac{\partial z}{\partial \bar{z}} = 0.
\]
Moreover, the Cauchy–Riemann equations just say $\frac{\partial f}{\partial \bar{z}} = 0$, and if this holds, then the complex derivative is just $\frac{\partial f}{\partial z}$. Thus, if we are working with potentially non-holomorphic functions on the complex plane, it is often useful to consider the operators $\frac{\partial}{\partial z}$ and $\frac{\partial}{\partial \bar{z}}$ separately.
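These identities, and the fact that $\frac{\partial}{\partial \bar{z}}$ kills holomorphic functions, can be checked mechanically; a small sympy sketch:

    import sympy as sp

    x, y = sp.symbols('x y', real=True)
    z, zbar = x + sp.I * y, x - sp.I * y

    def d_z(f):    return (sp.diff(f, x) - sp.I * sp.diff(f, y)) / 2
    def d_zbar(f): return (sp.diff(f, x) + sp.I * sp.diff(f, y)) / 2

    print(sp.simplify(d_z(z)), sp.simplify(d_zbar(zbar)))    # 1 1
    print(sp.simplify(d_zbar(z)), sp.simplify(d_z(zbar)))    # 0 0
    # a holomorphic function, e.g. exp(z), is killed by d_zbar:
    print(sp.simplify(d_zbar(sp.exp(z))))                    # 0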
Using this notation, given $f \in W_k$, we have
\[
f \in M_k \Longleftrightarrow \frac{\partial f}{\partial \bar{z}} = 0.
\]
So suppose $f$ is not holomorphic; then $\frac{\partial f}{\partial \bar{z}} \neq 0$. We can define a new operator by
\[
L_k^*(f) = -2iy^2 \frac{\partial f}{\partial \bar{z}}.
\]
Note that this is slightly strange, because we have a subscript $k$, but the operator doesn't depend on $k$. Also, we put a star up there for some reason. It turns out there is a related operator called $L_k$, which does depend on $k$, and this $L_k^*$ is a slight modification that happens not to depend on $k$.
This has the following properties:

Proposition.
– We have $L_k^* f = 0$ iff $f$ is holomorphic.
– If $f \in W_k(\Gamma(1))$, then $g \equiv L_k^* f \in W_{k-2}(\Gamma(1))$.

Thus, $L_k^*$ is a “lowering” operator.
Proof. The first part is clear. For the second part, note that we have
\[
f(\gamma(z)) = (cz + d)^k f(z).
\]
We now differentiate both sides with respect to $\bar{z}$. Then (after a bit of analysis), we find that
\[
(c\bar{z} + d)^{-2} \frac{\partial f}{\partial \bar{z}}(\gamma(z)) = (cz + d)^k \frac{\partial f}{\partial \bar{z}}(z).
\]
On the other hand, we have
\[
(\operatorname{Im} \gamma(z))^2 = \frac{y^2}{|cz + d|^4}.
\]
So we find
\[
g(\gamma(z)) = -2i \frac{y^2}{|cz + d|^4} (c\bar{z} + d)^2 (cz + d)^k \frac{\partial f}{\partial \bar{z}}(z) = (cz + d)^{k-2} g(z).
\]
The growth condition is easy to check.
Example. Consider $\tilde{E}_2$ defined previously. Since $E_2$ is holomorphic, we have
\[
L_k^* \tilde{E}_2 = \frac{6i}{\pi} y^2 \frac{\partial}{\partial \bar{z}}\left(\frac{1}{y}\right) = \frac{3}{\pi},
\]
a constant, which is certainly a (holomorphic) modular form of weight 0.
In general, if $L_k^* f$ is actually holomorphic, then it is in $M_{k-2}$. Otherwise, we can just keep going! There are two possibilities:

– For some $0 \leq \ell < \frac{k}{2}$, we have
\[
0 \neq L_{k-2\ell}^* \cdots L_{k-2}^* L_k^* f \in M_{k-2\ell-2}.
\]
– The function $g = L_2^* L_4^* \cdots L_k^* f \in W_0(\Gamma(1))$, and is non-zero. In this case,
\[
g(\gamma(z)) = g(z) \text{ for all } \gamma \in \mathrm{SL}_2(\mathbb{Z}).
\]
What does $W_0(\Gamma(1))$ look like? Since such a function is invariant under $\Gamma(1)$, it is just a $C^\infty$ function on the fundamental domain $\mathcal{D}$ satisfying suitable $C^\infty$ conditions on the boundary. This space is huge. For example, it contains any $C^\infty$ function on $\mathcal{D}$ vanishing in a neighbourhood of the boundary.
This is too big. We want to impose some “regularity” conditions. Previously, we imposed the very strong regularity condition of holomorphicity, but this is too strong, since the only invariant holomorphic functions are constants.

A slightly weaker condition might be to require that $f$ is harmonic, i.e.
\[
\tilde{\Delta} f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0.
\]
However, the maximum principle also implies $f$ must be constant.
A weaker condition would be to require that $f$ is an eigenfunction of $\tilde{\Delta}$, but there is a problem that $\tilde{\Delta}$ is not invariant under $\Gamma(1)$. It turns out we need a slight modification, and take
\[
\Delta = -y^2 \left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right).
\]
It is a straightforward verification that this is indeed invariant under $\mathrm{SL}_2(\mathbb{R})$, i.e.
\[
\Delta(f(\gamma(z))) = (\Delta f)(\gamma(z)).
\]
In fact, this is just the Laplacian under the hyperbolic metric (up to sign).
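Invariance can be checked by hand on the generators $z \mapsto z + 1$ and $z \mapsto -1/z$. For the latter, here is a symbolic verification on a concrete (arbitrarily chosen) test function, writing $-1/z = u + iv$:

    import sympy as sp

    x, y = sp.symbols('x y', real=True, positive=True)

    def lap(g):
        # Delta = -y^2 (d^2/dx^2 + d^2/dy^2)
        return -y**2 * (sp.diff(g, x, 2) + sp.diff(g, y, 2))

    # real and imaginary parts of -1/z, where z = x + iy
    u = -x / (x**2 + y**2)
    v = y / (x**2 + y**2)

    f = x**2 * y                                        # arbitrary test function
    lhs = lap(f.subs({x: u, y: v}, simultaneous=True))  # Delta(f(gamma(z)))
    rhs = lap(f).subs({x: u, y: v}, simultaneous=True)  # (Delta f)(gamma(z))
    print(sp.simplify(lhs - rhs))                       # 0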
Definition (Maass form). A Maass form on $\mathrm{SL}_2(\mathbb{Z})$ is an $f \in W_0(\Gamma(1))$ such that
\[
\Delta f = \lambda f
\]
for some $\lambda \in \mathbb{C}$.
There are interesting things we can prove about these. Recall that our
first examples of modular forms came from Eisenstein series. There are also
non-holomorphic Eisenstein series.
Example. Let $s \in \mathbb{C}$ with $\operatorname{Re}(s) > 1$. We define
\[
E(z, s) = \frac{1}{2} \sum_{\substack{c, d \in \mathbb{Z} \\ (c, d) = 1}} \frac{y^s}{|cz + d|^{2s}} = \sum_{\gamma \in \Gamma_\infty \backslash \mathrm{PSL}_2(\mathbb{Z})} (\operatorname{Im} \gamma(z))^s,
\]
where $\Gamma_\infty = \left\{ \pm \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix} \right\}$ is the stabilizer of $\infty$.
It is easy to see that this converges. From the second expression, we see that $E(z, s)$ is invariant under $\Gamma(1)$, and after some analysis, one checks that it is $C^\infty$ and satisfies the growth condition.
Finally, we check the eigenfunction condition. We can check
\[
\Delta y^s = -y^2 \frac{\partial^2}{\partial y^2}(y^s) = s(1 - s) y^s.
\]
But since $\Delta$ is invariant under $\mathrm{SL}_2(\mathbb{R})$, it follows that we also have
\[
\Delta E(z, s) = s(1 - s) E(z, s).
\]
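The computation $\Delta y^s = s(1 - s) y^s$ is one line of calculus, and can also be checked symbolically:

    import sympy as sp

    y, s = sp.symbols('y s', positive=True)
    expr = -y**2 * sp.diff(y**s, y, 2)
    print(sp.simplify(expr - s * (1 - s) * y**s))   # 0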
In the case of modular forms, we studied the cusp forms in particular. To study similar phenomena here, we look at the Fourier expansion of $f$. We have the periodicity condition
\[
f(x + iy + 1) = f(x + iy).
\]
Since $f$ is not holomorphic, we cannot expand it as a function of $e^{2\pi i z}$. However, we can certainly expand it as a Fourier series in $e^{2\pi i x}$. Thus, we write
\[
f(x + iy) = \sum_{n = -\infty}^{\infty} F_n(y) e^{2\pi i n x}.
\]
This looks pretty horrible, but now recall that we had the eigenfunction condition. Then we have
\[
\lambda f = \Delta f = -y^2 \sum_{n = -\infty}^{\infty} \left( F_n''(y) - 4\pi^2 n^2 F_n(y) \right) e^{2\pi i n x}.
\]
This tells us $F_n(y)$ satisfies the differential equation
\[
y^2 F_n''(y) + (\lambda - 4\pi^2 n^2 y^2) F_n(y) = 0. \quad (*)
\]
It isn't terribly important what exactly the details are, but let's look at what happens in particular when $n = 0$. Then we have
\[
y^2 F_0'' + \lambda F_0 = 0.
\]
This is pretty easy to solve. The general solution is given by
\[
F_0 = A y^s + B y^{s'},
\]
where $s$ and $s' = 1 - s$ are the roots of $s(1 - s) = \lambda$.
What about the other terms? We see that if $y$ is large, then, if we were applied mathematicians, we would say the $\lambda F_n(y)$ term is negligible, and then the equation looks like
\[
F_n''(y) = 4\pi^2 n^2 F_n(y).
\]
This has two independent solutions, and they are $e^{\pm 2\pi n y}$. It is in fact true that the true solutions to the equation grow like $e^{\pm 2\pi |n| y}$ for large $y$. To satisfy the growth condition, we must pick only those solutions that decay like $e^{-2\pi |n| y}$. We call this solution $\kappa_{|n|}(y)$. These are (essentially) the $K$-Bessel functions.
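Up to normalization, the decaying solution is $\kappa_{|n|}(y) = \sqrt{y}\, K_{s - 1/2}(2\pi |n| y)$, where $K_\nu$ is the modified Bessel function of the second kind (a standard fact which we do not prove). Here is a numeric check with mpmath that this solves $(*)$ and decays; the test values of $s$, $n$, $y$ are arbitrary:

    import mpmath as mp

    s = mp.mpc('0.5', '9.5')           # sample spectral parameter, lambda = s(1-s)
    n = 3                               # sample Fourier mode
    lam = s * (1 - s)

    def F(y):
        return mp.sqrt(y) * mp.besselk(s - mp.mpf(1)/2, 2 * mp.pi * n * y)

    y0 = mp.mpf('1.7')                  # arbitrary test point
    residual = y0**2 * mp.diff(F, y0, 2) + (lam - 4 * mp.pi**2 * n**2 * y0**2) * F(y0)
    print(abs(residual) / abs(F(y0)))   # ~ 0: F solves (*) up to numerical error
    print(abs(F(5)) / abs(F(1)))        # rapid decay as y grows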
Thus, we find that we have
\[
f(z) = \underbrace{A y^s + B y^{1-s}}_{\text{“constant term”}} + \sum_{n \neq 0} a_n(f) \kappa_{|n|}(y) e^{2\pi i n x}.
\]
The exact form isn't really that important. The point is that we can separate out this “constant term”. It is then not difficult to define cusp forms.
Definition (Cusp form). A Maass form is a cusp form if $F_0 = 0$, i.e. $A = B = 0$.
Similar to modular forms, we have a theorem classifying Maass cusp forms.

Theorem (Maass). Let $S^{\mathrm{Maass}}(\Gamma(1), \lambda)$ be the space of Maass cusp forms with eigenvalue $\lambda$. This space is finite-dimensional, and is non-zero if and only if $\lambda \in \{\lambda_n : n \geq 0\}$, where $\{\lambda_n\}$ is a sequence satisfying
\[
0 < \lambda_0 < \lambda_1 < \lambda_2 < \cdots.
\]
Given this, we can define Hecke operators just as for holomorphic forms (this is easier as $k = 0$), and most of the theory we developed for modular forms carries over.

Even though we have proved all these very nice properties of these cusp forms, it took people a long time to actually come up with an example of one! Nowadays, we are able to compute these with the aid of computers, and there exist tables of $\lambda$'s and Hecke eigenforms.
Now recall that we had this mysterious operator
\[
L_k^* = -2iy^2 \frac{\partial}{\partial \bar{z}},
\]
which had the property that if $f|_k \gamma = f$, then $(L_k^* f)|_{k-2} \gamma = L_k^* f$.
With a bit of experimentation, we can come up with something that raises the weight instead.

Definition ($R_k$). Define
\[
R_k = 2i \frac{\partial}{\partial z} + \frac{k}{y}.
\]
This has the property that:

Proposition. If $f|_k \gamma = f$, then $(R_k f)|_{k+2} \gamma = R_k f$.

Note that this time, since we are differentiating with respect to $z$, the $cz + d$ term will be affected, and this is where the $\frac{k}{y}$ term comes in.
Suppose we have $f = f_0 \in M_k(\Gamma(1))$. Then we can apply $R_k$ to it to obtain $f_1 = R_k f_0$. We can now try to apply $L_{k+2}^*$ to that. We have
\[
L_{k+2}^* R_k f = -2iy^2 \frac{\partial}{\partial \bar{z}}\left(2i f_0' + \frac{k}{y} f_0\right) = -2iy^2\, k f_0 \frac{\partial}{\partial \bar{z}}\left(\frac{1}{y}\right) = -k f_0.
\]
So we don't get anything new.
But of course, we can continue in the other direction. We can recursively obtain
\[
f_2 = R_{k+2} f_1, \quad f_3 = R_{k+4} f_2, \quad \cdots.
\]
Then we can compute $L_{k+2n}^*$ and $R_{k+2n}$ of these, and we find that
\[
(R L^* - L^* R) f_n = (k + 2n) f_n,
\]
where the operators carry the appropriate weight subscripts.
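In fact, with the conventions above, one can check that $R_{k-2} L_k^* - L_{k+2}^* R_k = k$ as an identity of operators on all smooth functions, which is where the relation above comes from (apply it in weight $k + 2n$). A sympy verification on an arbitrary $C^\infty$ function:

    import sympy as sp

    x, y, k = sp.symbols('x y k', real=True)
    f = sp.Function('f')(x, y)

    def d_z(g):    return (sp.diff(g, x) - sp.I * sp.diff(g, y)) / 2
    def d_zbar(g): return (sp.diff(g, x) + sp.I * sp.diff(g, y)) / 2

    def L_star(g):  return -2 * sp.I * y**2 * d_zbar(g)     # lowering operator
    def R(g, wt):   return 2 * sp.I * d_z(g) + wt * g / y   # raising operator in weight wt

    lhs = R(L_star(f), k - 2) - L_star(R(f, k))
    print(sp.simplify(lhs - k * f))   # 0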
This looks suspiciously like the representation theory of the Lie algebra $\mathfrak{sl}_2$, where we have operators that raise and lower weights. The only slightly non-trivial part is that this is an infinite-dimensional representation, as we can keep on raising and (at least in general) we never hit 0.
It turns out it is much easier to make sense of this by replacing functions on $\mathbb{H}$ with functions on $G = \mathrm{SL}_2(\mathbb{R})$. By the orbit-stabilizer theorem, we can write $\mathbb{H} = G/K$, where
\[
K = \mathrm{SO}(2) = \{g \in \mathrm{SL}_2(\mathbb{R}) : g(i) = i\} = \left\{ r_\theta = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \right\}.
\]
Recall that we defined the function $j(\gamma, z) = cz + d$, where $\gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. This satisfied the property
\[
j(\gamma\delta, z) = j(\gamma, \delta(z))\, j(\delta, z).
\]
The main theorem is the following:

Proposition. For $\Gamma \subseteq \Gamma(1)$, there is a bijection between functions $f: \mathbb{H} \to \mathbb{C}$ such that $f|_k \gamma = f$ for all $\gamma \in \Gamma$, and functions $\Phi: G \to \mathbb{C}$ such that $\Phi(\gamma g) = \Phi(g)$ for all $\gamma \in \Gamma$ and $\Phi(g r_\theta) = e^{ik\theta} \Phi(g)$.
The real reason for this is that such an $f$ is a section of a certain line bundle $\mathcal{L}_k$ on $\Gamma \backslash \mathbb{H} = \Gamma \backslash G / K$. The point is that this line bundle can be trivialized either by pulling back to $\mathbb{H} = G/K$, or to $\Gamma \backslash G$. Of course, to actually prove the proposition, we don't need such fancy language. We just need to write down the map.
Proof. Given an $f$, we define
\[
\Phi(g) = (ci + d)^{-k} f(g(i)) = j(g, i)^{-k} f(g(i)).
\]
We can then check that
\begin{align*}
\Phi(\gamma g) &= j(\gamma g, i)^{-k} f(\gamma(g(i))) \\
&= j(\gamma g, i)^{-k} j(\gamma, g(i))^k f(g(i)) \\
&= \Phi(g).
\end{align*}
On the other hand, using the fact that $r_\theta$ is in the stabilizer of $i$, we obtain
\begin{align*}
\Phi(g r_\theta) &= j(g r_\theta, i)^{-k} f(g r_\theta(i)) \\
&= j(g r_\theta, i)^{-k} f(g(i)) \\
&= j(g, r_\theta(i))^{-k} j(r_\theta, i)^{-k} f(g(i)) \\
&= \Phi(g)\, j(r_\theta, i)^{-k}.
\end{align*}
But $j(r_\theta, i) = -i\sin\theta + \cos\theta = e^{-i\theta}$, so $\Phi(g r_\theta) = e^{ik\theta} \Phi(g)$, and we are done.
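To make this concrete, here is a numeric illustration (not part of the proof), taking $f = \Delta$, $k = 12$, with a truncated product expansion; the matrices $g$, $\gamma$ and the angle $\theta$ below are arbitrary test values:

    import cmath, math

    def mat_mul(A, B):
        (a, b), (c, d) = A
        (e, f_), (g_, h) = B
        return ((a*e + b*g_, a*f_ + b*h), (c*e + d*g_, c*f_ + d*h))

    def delta(z, N=60):
        # Delta = q * prod (1 - q^n)^24, truncated
        q = cmath.exp(2j * cmath.pi * z)
        out = q
        for n in range(1, N + 1):
            out *= (1 - q**n)**24
        return out

    def Phi(g, k=12):
        (a, b), (c, d) = g
        return (c*1j + d)**(-k) * delta((a*1j + b) / (c*1j + d))

    g = ((1.5, 0.3), (0.2, 1.06/1.5))      # a matrix in SL_2(R): det = 1
    gamma = ((2, 1), (1, 1))               # a matrix in SL_2(Z)
    theta = 0.37
    r = ((math.cos(theta), math.sin(theta)), (-math.sin(theta), math.cos(theta)))

    print(Phi(mat_mul(gamma, g)) / Phi(g))   # ~ 1: left Gamma(1)-invariance
    print(Phi(mat_mul(g, r)) / Phi(g))       # ~ e^{12 i theta}:
    print(cmath.exp(12j * theta))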
What we can do with this is to cast everything in the language of these functions on $G$. In particular, what do the lowering and raising operators do? We have our $C^\infty$ function $\Phi: \Gamma \backslash G \to \mathbb{C}$. Now if $X \in \mathfrak{g} = \mathfrak{sl}_2(\mathbb{R})$, then it acts on $\Phi$ by differentiation, since that's how Lie algebras and Lie groups are related. Explicitly, we have
\[
X\Phi = \left.\frac{\mathrm{d}}{\mathrm{d}t}\right|_{t=0} \Phi(g e^{Xt}).
\]
When we compute these things explicitly, we find that, up to conjugacy, $L^*$ and $R$ just correspond to the standard elements
\[
X_- = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad X_+ = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \in \mathfrak{sl}_2,
\]
and we have
\[
[X_+, X_-] = H, \quad H = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\]
The weight $k$ then just corresponds to the eigenvalue of $H$.
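The bracket relations are immediate to verify; the following sympy sketch also records why $X_\pm$ shift the $H$-eigenvalue, i.e. the weight, by $\pm 2$:

    import sympy as sp

    Xp = sp.Matrix([[0, 1], [0, 0]])
    Xm = sp.Matrix([[0, 0], [1, 0]])
    H  = sp.Matrix([[1, 0], [0, -1]])

    print(Xp*Xm - Xm*Xp == H)        # [X+, X-] = H
    print(H*Xp - Xp*H == 2*Xp)       # [H, X+] = +2 X+: X+ raises the weight by 2
    print(H*Xm - Xm*H == -2*Xm)      # [H, X-] = -2 X-: X- lowers the weight by 2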
What does the Laplacian correspond to? It corresponds to a certain quadratic expression in these operators, the Casimir operator, given by
\[
\Omega = X_+ X_- + X_- X_+ + \frac{1}{2} H^2.
\]
This leads to the notion of automorphic forms.

Definition (Automorphic form). An automorphic form on $\Gamma$ is a $C^\infty$ function $\Phi: \Gamma \backslash G \to \mathbb{C}$ such that $\Phi(g r_\theta) = e^{ik\theta} \Phi(g)$ for some $k \in \mathbb{Z}$, such that
\[
\Omega \Phi = \lambda \Phi
\]
for some $\lambda \in \mathbb{C}$, satisfying a growth condition of the form
\[
\left| \Phi \begin{pmatrix} a & b \\ c & d \end{pmatrix} \right| \leq \text{polynomial in } a, b, c, d.
\]
The condition for $\Phi$ to be a cusp form is then
\[
\int_0^1 \Phi\left( \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix} g \right) \mathrm{d}x = 0.
\]
These things turn out to be exactly what we've looked at before.

Proposition. The set of cuspidal automorphic forms bijects with the set of representations of $\mathfrak{sl}_2$ generated by holomorphic cusp forms $f$, their conjugates $\bar{f}$, and Maass cusp forms.

The holomorphic cusp forms $f$ generate representations of $\mathfrak{sl}_2$ with lowest weight; the conjugates of holomorphic cusp forms generate those with highest weight, while the Maass forms generate the rest.
This is now completely susceptible to generalization. We can replace $G$ by any semi-simple Lie group (e.g. $\mathrm{SL}_n(\mathbb{R})$, $\mathrm{Sp}_{2n}(\mathbb{R})$), and $\Gamma$ by a suitable arithmetic subgroup. This leads to the general theory of automorphic forms, and is one half of the Langlands programme.