5 Electromagnetism and relativity
5.1 A review of special relativity
5.1.1 A geometric interlude on (co)vectors
Let’s first look at the normal, Euclidean geometry we know from IA Vectors and
Matrices. We all know what a vector is. A vector is, roughly, a direction. For
example, the velocity of a particle is a vector — it points in the direction where
the particle is moving. On the other hand, position is not quite a vector. It is a
vector only after we pick an “origin” of our space. Afterwards, we can think of
position as a vector pointing in the direction from the origin to where we are.
Perhaps a slightly less familiar notion is that of a covector. A covector is
some mathematical object that takes in a vector and spits out a number, and it
has to do this in a linear way. In other words, given a vector space $V$, a covector is a linear map $V \to \mathbb{R}$.
One prominent example is that the derivative $\mathrm{d}f$ of a function $f: \mathbb{R}^n \to \mathbb{R}$ at (say) the origin is naturally a covector! If you give me a direction, the derivative tells us how fast $f$ changes in that direction. In other words, given a vector $v$, $\mathrm{d}f(v)$ is the directional derivative of $f$ in the direction of $v$.
But, you say, we were taught that the derivative of $f$ is the gradient, which is a vector. Indeed, you've probably never heard of the word “covector” before. If we wanted to compute the directional derivative of $f$ in the direction of $v$, we simply compute the gradient $\nabla f$, and then take the dot product $\nabla f \cdot v$. We don't need to talk about covectors, right?
The key realization is that to make the last statement, we need the notion of
a dot product, or inner product. Once we have an inner product, it is an easy
mathematical fact that every covector $L: V \to \mathbb{R}$ is uniquely of the form
\[ L(v) = v \cdot w \]
for some fixed $w \in V$, and conversely any vector gives a covector this way.
Thus, whenever we have a dot product on our hands, the notion of covector is
redundant — we can just talk about vectors.
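To see this fact in action, here is a minimal numerical sketch (Python with NumPy; the particular map and numbers are our own illustration, not from the notes): given a covector as a black-box linear map and an orthonormal basis, we recover the representing vector $w$ by evaluating the map on the basis vectors.

```python
import numpy as np

# A black-box covector: some linear map L : R^3 -> R (made-up example).
def L(v):
    return 2.0 * v[0] - v[1] + 0.5 * v[2]

# With the standard dot product and the standard (orthonormal) basis,
# the representing vector w has components w_i = L(e_i).
basis = np.eye(3)
w = np.array([L(e) for e in basis])   # [2.0, -1.0, 0.5]

# Check: L(v) == v . w for an arbitrary v.
v = np.array([3.0, 1.0, -4.0])
assert np.isclose(L(v), np.dot(v, w))
```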
In special relativity, we still have an inner product. However, the inner
product is not a “genuine” inner product. For example, the inner product of a
(non-zero) vector with itself might be zero, or even negative! And in this case,
we need to be careful about the distinction between a vector and a covector, and
it is worth understanding the difference more carefully.
Since we eventually want to do computations, it is extremely useful to look at these in coordinates. Suppose we have picked a basis $e_1, \cdots, e_n$. Then by definition of a basis, we can write any vector $v$ as $v = \sum v^i e_i$. These $v^i$ are the coordinates of $v$ in this coordinate system, and by convention, we write the indices with superscript. We also say they have upper indices.
We can also introduce coordinates for covectors. If $L$ is a covector, then we can define its coordinates by
\[ L_i = L(e_i). \]
By convention, the indices are now written as subscripts, or lower indices. Using the summation convention, we have
\[ L(v) = L(v^i e_i) = v^i L(e_i) = v^i L_i. \]
Previously, when we introduced the summation convention, we had a “rule” that
each index can only appear at most twice, and we sum over it when it is repeated.
Here we can refine this rule, and give it a precise meaning:
Rule. We can only contract an upper index with a lower index.
The interpretation is that “contraction” really just means applying a covector
to a vector — a very natural thing to do. It doesn’t make sense to “apply” a
vector to a vector, or a covector to a covector. It also doesn’t make sense to
repeat the same index three times, because we can only apply a single covector
to a single vector.
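As a small illustrative sketch (NumPy; the arrays are made up), the contraction $v^i L_i$ is just a sum over the repeated index, which np.einsum makes explicit:

```python
import numpy as np

v_up = np.array([1.0, 2.0, 3.0])    # vector components v^i (upper index)
L_dn = np.array([0.5, -1.0, 2.0])   # covector components L_i (lower index)

# The contraction L(v) = v^i L_i: one upper index against one lower index.
Lv = np.einsum('i,i->', v_up, L_dn)
print(Lv)   # 1*0.5 + 2*(-1.0) + 3*2.0 = 4.5
```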
It is common to encounter some things that are neither vectors nor covectors, but we still want to apply the summation convention to them. For example, we want to write $v = v^i e_i$, even though $e_i$ is not a covector. It turns out in all cases we encounter, there is one choice of upper or lower indexing that makes it consistent with our summation convention. For example, we should write $e_i$, not $e^i$, so that $v = v^i e_i$ works out.
We said previously that the existence of an inner product allows us to convert
a covector into a vector, and vice versa. So let’s see how inner products work in
a choice of basis.
If the basis we picked were orthonormal, then for any vectors $v$ and $w$, we simply have $v \cdot w = v^T w$. Alternatively, we have $v \cdot w = \sum_i v^i w^i$. If our basis were not orthonormal (which is necessarily the case in SR), we can define the matrix $\eta$ by
\[ \eta_{ij} = e_i \cdot e_j. \]
We will later say that $\eta$ is a $(0, 2)$-tensor, after we define what that means. The idea is that it takes in two vectors, and returns a number (namely the inner product of them). This justifies our choice to use lower indices for both coordinates. For now, we can argue for this choice by noting that the indices on $e_i$ and $e_j$ are already lower.
Using this $\eta$, we can compute
\[ v \cdot w = (v^i e_i) \cdot (w^j e_j) = v^i w^j (e_i \cdot e_j) = v^i w^j \eta_{ij}. \]
In other words, we have
\[ v \cdot w = v^T \eta w. \]
We see that this matrix $\eta$ encodes all the information about the inner product in this basis. This is known as the metric. If we picked an orthonormal basis, then $\eta$ would be the identity matrix.
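Here is a hedged numerical sketch of the claim $v \cdot w = v^T \eta w$ (NumPy; the basis is an arbitrary choice for illustration): build $\eta_{ij} = e_i \cdot e_j$ from a non-orthonormal basis and compare the coordinate computation with the honest dot product.

```python
import numpy as np

# A non-orthonormal basis of R^2 (the columns are the basis vectors e_1, e_2).
E = np.array([[1.0, 1.0],
              [0.0, 2.0]])

# The metric in this basis: eta_ij = e_i . e_j.
eta = E.T @ E

# Coordinates of two vectors in this basis.
v_coord = np.array([1.0, 2.0])
w_coord = np.array([3.0, -1.0])

# The coordinate computation v^T eta w ...
lhs = v_coord @ eta @ w_coord
# ... agrees with the ordinary dot product of the actual vectors.
rhs = np.dot(E @ v_coord, E @ w_coord)
assert np.isclose(lhs, rhs)
```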
Now it is easy to convert a vector into a covector. The covector $(- \cdot w)$ is given by
\[ v \cdot w = v^i (w^j \eta_{ij}). \]
We can then read off the coordinates of the covector to be
\[ w_i = w^j \eta_{ij}. \]
In general, these coordinates $w_i$ are not the same as $w^i$. They agree only if $\eta_{ij}$ is the identity matrix, i.e. the basis is orthonormal. Thus, distinguishing between vectors and covectors now has a practical purpose. Each “vector” has two sets of coordinates — one when you think of it as a vector, and one when you turn it into a covector, and they are different. So the positioning of the indices helps us keep track of which coordinates we are talking about.
We can also turn a covector $w_i$ back to a vector, if we take the inverse of the matrix $\eta$, which we will write as $\eta^{ij}$. Then
\[ w^i = w_j \eta^{ij}. \]
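A short sketch of lowering and raising (NumPy; we use the Minkowski metric from the next subsection as a concrete non-identity $\eta$): contracting with $\eta_{ij}$ gives $w_i$, and contracting with the inverse $\eta^{ij}$ recovers $w^i$.

```python
import numpy as np

# The Minkowski metric (see below) as a concrete example of eta.
eta = np.diag([1.0, -1.0, -1.0, -1.0])
eta_inv = np.linalg.inv(eta)             # the inverse metric eta^ij

w_up = np.array([2.0, 1.0, 0.0, 3.0])    # components w^j

# Lower the index: w_i = eta_ij w^j.  Note the sign flips: w_i != w^i here.
w_dn = np.einsum('ij,j->i', eta, w_up)   # [2, -1, 0, -3]

# Raise it again: w^i = eta^ij w_j recovers the original components.
assert np.allclose(np.einsum('ij,j->i', eta_inv, w_dn), w_up)
```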
5.1.2 Transformation rules
It is often the case that in relativity, we can write down the coordinates of an
object in any suitable basis. However, it need not be immediately clear to us
whether that object should be a vector or covector, or perhaps neither. Thus,
we need a way to identify whether objects are vectors or covectors. To do so, we
investigate how the coordinates of vectors and covectors transform when we change basis.
By definition, if we have a change of basis matrix $P$, then the coordinates of a vector transform as $v \mapsto Pv$, or equivalently
\[ v^i \mapsto v'^i = P^i{}_j v^j. \]
How about covectors? If $L$ is a covector and $v$ is a vector, then $L(v)$ is a number. In particular, its value does not depend on the basis. Thus, we know that the sum $L_i v^i$ must be invariant under any change of basis. Thus, if $L_i \mapsto \tilde{L}_i$, then we know
\[ \tilde{L}_i P^i{}_j v^j = L_j v^j. \]
Thus, we know $\tilde{L}_i P^i{}_j = L_j$. To obtain $\tilde{L}_i$, we have to invert $P^i{}_j$ and multiply $L_j$ with that.
However, our formalism in terms of indices does not provide us with a way
of denoting the inverse of a matrix pleasantly. We can avert this problem by
focusing on orthogonal matrices, i.e. the matrices that preserve the metric. We
say P is orthogonal if
\[ P^T \eta P = \eta, \]
or equivalently,
\[ P^i{}_j \eta_{ik} P^k{}_\ell = \eta_{j\ell}. \]
This implies the inverse of $P$ has coordinates
\[ (P^{-1})^i{}_j = (\eta^{-1} P^T \eta)^i{}_j = \eta^{i\ell} \eta_{jk} P^k{}_\ell, \]
which is the fancy way of describing the “transpose” (which is not the literal transpose unless $\eta = I$). This is just reiterating the fact that the inverse of an orthogonal matrix is its transpose. When we do special relativity, the orthogonal matrices are exactly the Lorentz transformations.
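One can check this formula numerically; a quick sketch (NumPy) with a $2 \times 2$ matrix that preserves $\eta = \operatorname{diag}(1, -1)$:

```python
import numpy as np

eta = np.diag([1.0, -1.0])
a = 0.7                                    # an arbitrary rapidity
P = np.array([[np.cosh(a), np.sinh(a)],
              [np.sinh(a), np.cosh(a)]])   # satisfies P^T eta P = eta

assert np.allclose(P.T @ eta @ P, eta)
# The "fancy transpose" is indeed the inverse: eta^{-1} P^T eta == P^{-1}.
assert np.allclose(np.linalg.inv(eta) @ P.T @ eta, np.linalg.inv(P))
```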
Thus, we find that if $P$ is orthogonal, then covectors transform as
\[ L_i \mapsto L'_i = P_i{}^j L_j, \]
where $P_i{}^j = \eta_{ik} \eta^{j\ell} P^k{}_\ell$ are exactly the coordinates of the inverse we found above.
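Continuing the sketch above, transforming a covector with $P_i{}^j$ indeed keeps the pairing $L_i v^i$ invariant:

```python
import numpy as np

eta = np.diag([1.0, -1.0])
a = 0.7
P = np.array([[np.cosh(a), np.sinh(a)],
              [np.sinh(a), np.cosh(a)]])   # orthogonal: P^T eta P = eta

v = np.array([1.0, 2.0])    # vector components v^i
L = np.array([0.3, -1.5])   # covector components L_i

v_new = P @ v                              # v'^i = P^i_j v^j
# L'_i = P_i^j L_j, with P_i^j = eta_ik eta^jl P^k_l (the inverse transpose).
L_new = eta @ P @ np.linalg.inv(eta) @ L

# The pairing L_i v^i is invariant under the change of basis.
assert np.isclose(L_new @ v_new, L @ v)
```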
We can now write down the “physicists’ ” definition of vectors and covectors.
Before we do that, we restrict to the case of interest in special relativity. The
reason is that we started off this section with the caveat “in any suitable basis”.
We shall not bother to explain what “suitable” means in general, but just do it
in the case of interest.
5.1.3 Vectors and covectors in SR
Recall from IA Dynamics and Relativity that in special relativity, we combine
space and time into one single object. For example, the position and time of an
event are now packed into a single 4-vector in spacetime
\[ X^\mu = \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix}. \]
Here the index $\mu$ ranges from 0 to 3. In special relativity, we use Greek letters (e.g. $\mu, \nu, \rho, \sigma$) to denote the indices. If we want to refer to the spatial components (1, 2, 3) only, we will use Roman letters (e.g. $i, j, k$) to denote them.
As we initially discussed, position is very naturally thought of as a vector, and
we will take this as our starting postulate. We will then use the transformation
rules to identify whether any other thing should be a vector or a covector.
In the “standard” basis, the metric we use is the Minkowski metric, defined by
\[ \eta_{\mu\nu} = \begin{pmatrix} +1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}. \tag{$*$} \]
This is not positive definite, hence not a genuine inner product. However, it is
still invertible, which is what our previous discussion required. This means, for
example,
\[ X \cdot X = (ct)^2 - (x^2 + y^2 + z^2), \]
the spacetime interval.
Definition (Orthonormal basis). An orthonormal basis of spacetime is a basis where the metric takes the form $(*)$. An (orthonormal) coordinate system is a choice of orthonormal basis.
Definition (Lorentz transformations). A Lorentz transformation is a change-of-basis matrix that preserves the inner product, i.e. an orthogonal matrix with respect to the Minkowski metric.
Thus, Lorentz transformations send orthonormal bases to orthonormal bases.
For example, the familiar Lorentz boost
\begin{align*}
ct' &= \gamma\left(ct - \frac{v}{c} x\right)\\
x' &= \gamma\left(x - \frac{v}{c} ct\right)\\
y' &= y\\
z' &= z
\end{align*}
is the Lorentz transformation given by the matrix
\[ \Lambda^\mu{}_\nu = \begin{pmatrix} \gamma & -\gamma v/c & 0 & 0 \\ -\gamma v/c & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \]
Other examples include rotations of the space dimensions, which are given by matrices of the form (in block notation)
\[ \Lambda^\mu{}_\nu = \begin{pmatrix} 1 & 0 \\ 0 & R \end{pmatrix}, \]
with $R$ a $3 \times 3$ rotation matrix.
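The following sketch (NumPy; the helper name lorentz_boost is ours, not from the notes) builds the boost matrix above and checks both that it is orthogonal in the Minkowski sense and that it therefore preserves the spacetime interval.

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])

def lorentz_boost(beta):
    """Boost along x with velocity v = beta * c."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = gamma
    L[0, 1] = L[1, 0] = -gamma * beta
    return L

Lam = lorentz_boost(0.6)

# Lorentz transformations are orthogonal w.r.t. the Minkowski metric.
assert np.allclose(Lam.T @ eta @ Lam, eta)

# Hence the spacetime interval X . X is invariant under the boost.
X = np.array([5.0, 1.0, 2.0, 3.0])   # (ct, x, y, z)
X_new = Lam @ X
assert np.isclose(X_new @ eta @ X_new, X @ eta @ X)
```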
We can now write down our practical definition of vectors and covectors.
Definition (Vectors and covectors). A vector is an assignment of 4 numbers $V^\mu$, $\mu = 0, 1, 2, 3$ to each coordinate system such that under a change of basis by $\Lambda$, the coordinates $V^\mu$ transform as $V^\mu \mapsto \Lambda^\mu{}_\nu V^\nu$.
A covector is an assignment of 4 numbers $V_\mu$, $\mu = 0, 1, 2, 3$ to each coordinate system such that under a change of basis by $\Lambda$, the coordinates $V_\mu$ transform as $V_\mu \mapsto \Lambda_\mu{}^\nu V_\nu$.
Example. By assumption, the position $X^\mu$ is a vector.
Example. Suppose we have a trajectory of a particle $X^\mu(s)$ in spacetime. Then $\frac{\mathrm{d}}{\mathrm{d}s} X^\mu(s)$ is also a vector, as one can check from the transformation rule.
Finally, we would want to be able to talk about tensors. For example, we want to be able to talk about $X^\mu X^\nu$. This is an assignment of 16 numbers indexed by $\mu, \nu = 0, 1, 2, 3$ that transforms as
\[ X^\mu X^\nu \mapsto \Lambda^\mu{}_\rho \Lambda^\nu{}_\sigma X^\rho X^\sigma. \]
We would also like to talk about $\eta_{\mu\nu}$ as a tensor. We can make the following general definition:
Definition (Tensor). A tensor of type $(m, n)$ is a quantity $T^{\mu_1 \cdots \mu_m}{}_{\nu_1 \cdots \nu_n}$ which transforms as
\[ T'^{\mu_1 \cdots \mu_m}{}_{\nu_1 \cdots \nu_n} = \Lambda^{\mu_1}{}_{\rho_1} \cdots \Lambda^{\mu_m}{}_{\rho_m} \Lambda_{\nu_1}{}^{\sigma_1} \cdots \Lambda_{\nu_n}{}^{\sigma_n} \, T^{\rho_1 \cdots \rho_m}{}_{\sigma_1 \cdots \sigma_n}. \]
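As an illustrative sketch (NumPy; random components with no physical meaning), the transformation rule for, say, a type $(1, 1)$ tensor uses one $\Lambda$ for the upper index and one inverse transpose for the lower index; as a sanity check, the upper-lower contraction $T^\mu{}_\mu$ comes out invariant.

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])
gamma, beta = 1.25, 0.6
Lam = np.eye(4)
Lam[0, 0] = Lam[1, 1] = gamma
Lam[0, 1] = Lam[1, 0] = -gamma * beta

Lam_low = eta @ Lam @ np.linalg.inv(eta)   # Lambda_mu^nu, for lower indices

rng = np.random.default_rng(0)
T = rng.random((4, 4))                     # components T^mu_nu of a (1,1) tensor

# T'^mu_nu = Lam^mu_rho  Lam_nu^sigma  T^rho_sigma
T_new = np.einsum('mr,ns,rs->mn', Lam, Lam_low, T)

# Sanity check: the trace T^mu_mu (an upper-lower contraction) is invariant.
assert np.isclose(np.trace(T_new), np.trace(T))
```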
As we saw, we can change the type of a tensor by raising and lowering indices by contracting with $\eta_{\mu\nu}$ or its inverse. However, the total $n + m$ will not be changed.
Finally, we introduce the 4-derivative.
Definition (4-derivative). The 4-derivative is
\[ \partial_\mu = \frac{\partial}{\partial X^\mu} = \left( \frac{1}{c} \frac{\partial}{\partial t}, \nabla \right). \]
As we previously discussed, the derivative ought to be a covector. We can also
verify this by explicit computation using the chain rule. Under a transformation
$X^\mu \mapsto X'^\mu$, we have
\[ \partial_\mu = \frac{\partial}{\partial X^\mu} \mapsto \frac{\partial}{\partial X'^\mu} = \frac{\partial X^\nu}{\partial X'^\mu} \frac{\partial}{\partial X^\nu} = (\Lambda^{-1})^\nu{}_\mu \partial_\nu = \Lambda_\mu{}^\nu \partial_\nu. \]
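We can also check this chain-rule computation numerically; a finite-difference sketch (NumPy; the scalar field f is an arbitrary made-up example): the gradient of the same function, computed in the boosted coordinates, equals the old gradient contracted with $(\Lambda^{-1})^\nu{}_\mu$.

```python
import numpy as np

gamma, beta = 1.25, 0.6
Lam = np.eye(4)
Lam[0, 0] = Lam[1, 1] = gamma
Lam[0, 1] = Lam[1, 0] = -gamma * beta
Lam_inv = np.linalg.inv(Lam)

def f(X):
    """A scalar function of the spacetime point (made-up example)."""
    return np.sin(X[0]) + X[1]**2 - X[2] * X[3]

def grad(g, X, h=1e-6):
    """Central finite-difference gradient, components dg/dX^mu."""
    return np.array([(g(X + h*e) - g(X - h*e)) / (2*h) for e in np.eye(4)])

def f_new(Xp):
    """The same function, viewed in the boosted coordinates X' = Lam X."""
    return f(Lam_inv @ Xp)

X_new = np.array([0.3, -1.0, 2.0, 0.5])   # a point, in coordinates X'^mu

# df/dX'^mu == (Lam^{-1})^nu_mu df/dX^nu, i.e. it transforms as a covector.
lhs = grad(f_new, X_new)
rhs = Lam_inv.T @ grad(f, Lam_inv @ X_new)
assert np.allclose(lhs, rhs, atol=1e-5)
```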