5 Electromagnetism and relativity

IB Electromagnetism

5.1 A review of special relativity

5.1.1 A geometric interlude on (co)vectors

Let’s first look at normal, Euclidean geometry we know from IA Vectors and

Matrices. We all know what a vector is. A vector is, roughly, a direction. For

example, the velocity of a particle is a vector — it points in the direction where

the particle is moving. On the other hand, position is not quite a vector. It is a

vector only after we pick an “origin” of our space. Afterwards, we can think of

position as a vector pointing the direction from the origin to where we are.

Perhaps a slightly less familiar notion is that of a covector. A covector is

some mathematical object that takes in a vector and spits out a number, and it

has to do this in a linear way. In other words, given a vector space V, a covector is a linear map V → R.

One prominent example is that the derivative df of a function f : R^n → R at (say) the origin is naturally a covector! If you give me a direction, the derivative tells us how fast f changes in that direction. In other words, given a vector v, df(v) is the directional derivative of f in the direction of v.

But, you say, we were taught that the derivative of f is the gradient, which is a vector. Indeed, you've probably never heard of the word "covector" before. If we wanted to compute the directional derivative of f in the direction of v, we simply compute the gradient ∇f, and then take the dot product ∇f · v. We don't need to talk about covectors, right?

The key realization is that to make the last statement, we need the notion of a dot product, or inner product. Once we have an inner product, it is an easy mathematical fact that every covector L : V → R is uniquely of the form

    L(v) = v · w

for some fixed w ∈ V, and conversely any vector gives a covector this way. Thus, whenever we have a dot product on our hands, the notion of covector is redundant — we can just talk about vectors.
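This correspondence is easy to see concretely. As a numerical sketch (an illustration only, not part of the notes; the covector and vectors here are chosen arbitrarily), we can recover the representing vector w of a covector L on R^3 with the standard dot product by evaluating L on the basis vectors:

```python
import numpy as np

# A covector on R^3: a linear map V -> R, chosen arbitrarily for the example.
def L(v):
    return 2 * v[0] - v[2]

# The vector w representing L via the dot product, L(v) = v . w.
# Its components are recovered by evaluating L on the standard basis vectors.
w = np.array([L(e) for e in np.eye(3)])

# Check L(v) = v . w on an arbitrary vector.
v = np.array([1.0, 4.0, -2.0])
same = np.isclose(L(v), v @ w)
```

Here w comes out as (2, 0, −1), exactly the coefficients read off from L.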

In special relativity, we still have an inner product. However, the inner

product is not a “genuine” inner product. For example, the inner product of a

(non-zero) vector with itself might be zero, or even negative! And in this case,

we need to be careful about the distinction between a vector and a covector, and

it is worth understanding the difference more carefully.

Since we eventually want to do computations, it is extremely useful to look at these in coordinates. Suppose we have picked a basis e_1, ··· , e_n. Then by definition of a basis, we can write any vector v as v = ∑ v^i e_i. These v^i are the coordinates of v in this coordinate system, and by convention, we write the indices with superscripts. We also say they have upper indices.

We can also introduce coordinates for covectors. If L is a covector, then we can define its coordinates by

    L_i = L(e_i).

By convention, the indices are now written as subscripts, or lower indices. Using the summation convention, we have

    L(v) = L(v^i e_i) = v^i L(e_i) = v^i L_i.

Previously, when we introduced the summation convention, we had a “rule” that

each index can only appear at most twice, and we sum over it when it is repeated.

Here we can refine this rule, and give good meaning to it:

Rule. We can only contract an upper index with a lower index.

The interpretation is that “contraction” really just means applying a covector

to a vector — a very natural thing to do. It doesn’t make sense to “apply” a

vector to a vector, or a covector to a covector. It also doesn’t make sense to

repeat the same index three times, because we can only apply a single covector

to a single vector.

It is common to encounter some things that are neither vectors nor covectors, but we still want to apply the summation convention to them. For example, we want to write v = v^i e_i, even though e_i is not a covector. It turns out in all cases we encounter, there is one choice of upper or lower indexing that makes it consistent with our summation convention. For example, we should write e_i, not e^i, so that v = v^i e_i works out.
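The expansion v = v^i e_i can be checked mechanically. A small sketch (illustrative only, with a basis chosen here just for the example): given a basis, the coordinates v^i are found by solving the linear system v = v^i e_i.

```python
import numpy as np

# A (non-orthonormal) basis of R^2, chosen arbitrarily for illustration.
e1 = np.array([1.0, 1.0])
e2 = np.array([0.0, 2.0])
E = np.column_stack([e1, e2])   # columns are the basis vectors

v = np.array([3.0, 1.0])

# Coordinates v^i solve v = v^1 e1 + v^2 e2, i.e. E @ coords = v.
coords = np.linalg.solve(E, v)

# Reconstruct v from its coordinates: v = v^i e_i.
reconstructed = coords[0] * e1 + coords[1] * e2
```

For this choice the coordinates are (3, −1), and summing v^i e_i recovers v.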

We said previously that the existence of an inner product allows us to convert

a covector into a vector, and vice versa. So let’s see how inner products work in

a choice of basis.

If the basis we picked were orthonormal, then for any vectors v and w, we simply have v · w = v^T w. Alternatively, we have v · w = ∑_i v^i w^i. If our basis were not orthonormal (which is necessarily the case in SR), we can define the matrix η by

    η_ij = e_i · e_j.

We will later say that η is a (0, 2)-tensor, after we define what that means. The idea is that it takes in two vectors, and returns a number (namely the inner product of them). This justifies our choice to use lower indices for both coordinates. For now, we can argue for this choice by noting that the indices on e_i and e_j are already lower.

Using this η, we can compute

    v · w = (v^i e_i) · (w^j e_j) = v^i w^j (e_i · e_j) = v^i w^j η_ij.

In other words, we have

    v · w = v^T η w.

We see that this matrix η encodes all the information about the inner product in this basis. This is known as the metric. If we picked an orthonormal basis, then η would be the identity matrix.
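The identity v · w = v^T η w can be verified numerically (an illustration, with an arbitrarily chosen basis, not part of the notes): computing the dot product via coordinates and η agrees with computing it directly on the actual vectors.

```python
import numpy as np

# An arbitrary (non-orthonormal) basis of R^2.
basis = [np.array([1.0, 1.0]), np.array([0.0, 2.0])]

# The metric eta_ij = e_i . e_j in this basis.
eta = np.array([[ei @ ej for ej in basis] for ei in basis])

# Two vectors given by their coordinates v^i, w^i in this basis.
v_coords = np.array([2.0, -1.0])
w_coords = np.array([1.0, 3.0])

# The actual vectors v = v^i e_i and w = w^j e_j.
v = sum(c * e for c, e in zip(v_coords, basis))
w = sum(c * e for c, e in zip(w_coords, basis))

# v . w computed directly vs via v^T eta w.
direct = v @ w
via_eta = v_coords @ eta @ w_coords
```

Both routes give the same number, as the derivation above demands.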

Now it is easy to convert a vector into a covector. The covector (− · w) is given by v · w = v^i (w^j η_ij). We can then read off the coordinates of the covector to be

    w_i = w^j η_ij.

In general, these coordinates w_i are not the same as w^i. They agree only if η_ij is the identity matrix, i.e. the basis is orthonormal. Thus, distinguishing between vectors and covectors now has a practical purpose. Each "vector" has two sets of coordinates — one when you think of it as a vector, and one when you turn it into a covector, and they are different. So the positioning of the indices helps us keep track of which coordinates we are talking about.

We can also turn a covector w_i back into a vector, if we take the inverse of the matrix η, which we will write as η^ij. Then

    w^i = w_j η^ij.

5.1.2 Transformation rules

It is often the case that in relativity, we can write down the coordinates of an object in any suitable basis. However, it need not be immediately clear to us whether that object should be a vector or covector, or perhaps neither. Thus, we need a way to identify whether objects are vectors or covectors. To do so, we investigate how the coordinates of vectors and covectors transform when we change basis.

By definition, if we have a change of basis matrix P, then the coordinates of a vector transform as v ↦ P v, or equivalently

    v^i ↦ v′^i = P^i_j v^j.

How about covectors? If L is a covector and v is a vector, then L(v) is a number. In particular, its value does not depend on the basis. Thus, we know that the sum L_i v^i must be invariant under any change of basis. Thus, if L_i ↦ L̃_i, then we know

    L̃_i P^i_j v^j = L_j v^j.

Thus, we know L̃_i P^i_j = L_j. To obtain L̃_i, we have to invert P^i_j and multiply L_j with that.

However, our formalism in terms of indices does not provide us with a way of denoting the inverse of a matrix pleasantly. We can avert this problem by focusing on orthogonal matrices, i.e. the matrices that preserve the metric. We say P is orthogonal if

    P^T η P = η,

or equivalently,

    P^i_j η_ik P^k_ℓ = η_jℓ.

This implies that the inverse of P has coordinates

    (P^{−1})^i_j = (η^{−1} P^T η)^i_j = η^{iℓ} η_{jk} P^k_ℓ,

which is the fancy way of describing the "transpose" (which is not the literal transpose unless η = I). This is just reiterating the fact that the inverse of an orthogonal matrix is its transpose. When we do special relativity, the orthogonal matrices are exactly the Lorentz transformations.

Thus, we find that if P is orthogonal, then covectors transform as

    L_i ↦ L′_i = P_i^j L_j.
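The invariance of the pairing L_i v^i under such a transformation can be checked numerically. A sketch (illustrative only, using a Euclidean rotation as the orthogonal matrix, so that P_i^j and P^i_j coincide numerically):

```python
import numpy as np

# An orthogonal matrix P (a rotation, so P^T P = I for the Euclidean metric).
theta = 0.7
P = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v = np.array([1.0, 2.0])    # vector components v^i
L = np.array([3.0, -1.0])   # covector components L_i

# The vector transforms as v^i -> P^i_j v^j; the covector as L_i -> P_i^j L_j.
# With the Euclidean metric (eta = I), both are given by the same matrix P.
v_new = P @ v
L_new = P @ L

pairing_before = L @ v          # L_i v^i in the old basis
pairing_after = L_new @ v_new   # L'_i v'^i in the new basis
```

The pairing comes out the same before and after, as it must, since L(v) is a basis-independent number.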

We can now write down the “physicists’ ” definition of vectors and covectors.

Before we do that, we restrict to the case of interest in special relativity. The

reason is that we started off this section with the caveat “in any suitable basis”.

We shall not bother to explain what “suitable” means in general, but just do it

in the case of interest.

5.1.3 Vectors and covectors in SR

Recall from IA Dynamics and Relativity that in special relativity, we combine space and time into one single object. For example, the position and time of an event is now packed into a single 4-vector in spacetime

    X^μ = (ct, x, y, z)^T.

Here the index μ ranges from 0 to 3. In special relativity, we use Greek letters (e.g. μ, ν, ρ, σ) to denote the indices. If we want to refer to the spatial components (1, 2, 3) only, we will use Roman letters (e.g. i, j, k) to denote them.

As we initially discussed, position is very naturally thought of as a vector, and

we will take this as our starting postulate. We will then use the transformation

rules to identify whether any other thing should be a vector or a covector.

In the "standard" basis, the metric we use is the Minkowski metric, defined by

    η_μν = diag(+1, −1, −1, −1).    (∗)

This is not positive definite, hence not a genuine inner product. However, it is still invertible, which is what our previous discussion required. This means, for example,

    X · X = (ct)^2 − (x^2 + y^2 + z^2),

the spacetime interval.

Definition (Orthonormal basis). An orthonormal basis of spacetime is a basis where the metric takes the form (∗). An (orthonormal) coordinate system is a choice of orthonormal basis.

Definition (Lorentz transformations). A Lorentz transformation is a change-of-basis matrix that preserves the inner product, i.e. orthogonal matrices under the Minkowski metric.

Thus, Lorentz transformations send orthonormal bases to orthonormal bases.

For example, the familiar Lorentz boost

    ct′ = γ(ct − (v/c) x)
    x′  = γ(x − (v/c) ct)
    y′  = y
    z′  = z

is the Lorentz transformation given by the matrix

    Λ^μ_ν = [   γ     −γv/c   0   0
              −γv/c     γ     0   0
                0       0     1   0
                0       0     0   1 ].
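We can check numerically that this boost matrix is indeed orthogonal with respect to the Minkowski metric, i.e. Λ^T η Λ = η, and hence preserves the spacetime interval (an illustration, with β and the event X chosen arbitrarily):

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])    # Minkowski metric

beta = 0.6                                 # v/c, chosen for the example
gamma = 1.0 / np.sqrt(1.0 - beta**2)

# The Lorentz boost matrix Lambda^mu_nu along x.
Lam = np.array([[ gamma,       -gamma*beta, 0.0, 0.0],
                [-gamma*beta,   gamma,      0.0, 0.0],
                [ 0.0,          0.0,        1.0, 0.0],
                [ 0.0,          0.0,        0.0, 1.0]])

# Orthogonality with respect to eta: Lambda^T eta Lambda = eta.
preserves_metric = np.allclose(Lam.T @ eta @ Lam, eta)

# The interval X . X = (ct)^2 - (x^2 + y^2 + z^2) is invariant under the boost.
X = np.array([2.0, 1.0, 0.5, -0.3])
interval = X @ eta @ X
interval_boosted = (Lam @ X) @ eta @ (Lam @ X)
```

Both checks succeed for any value of β with |β| < 1.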

Other examples include rotations of the space dimensions, which are given by matrices of the form

    Λ^μ_ν = [ 1   0
              0   R ],

with R a rotation matrix.

We can now write down our practical definition of vectors and covectors.

Definition (Vectors and covectors). A vector is an assignment of 4 numbers V^μ, μ = 0, 1, 2, 3 to each coordinate system such that under a change of basis by Λ, the coordinates V^μ transform as V^μ ↦ Λ^μ_ν V^ν.

A covector is an assignment of 4 numbers V_μ, μ = 0, 1, 2, 3 to each coordinate system such that under a change of basis by Λ, the coordinates V_μ transform as V_μ ↦ Λ_μ^ν V_ν.

Example. By assumption, the position X^μ is a vector.

Example. Suppose we have a trajectory of a particle X^μ(s) in spacetime. Then (d/ds) X^μ(s) is also a vector, by checking it transforms.

Finally, we would want to be able to talk about tensors. For example, we want to be able to talk about X^μ X^ν. This is an assignment of 16 numbers indexed by μ, ν = 0, 1, 2, 3 that transforms as

    X^μ X^ν ↦ Λ^μ_ρ Λ^ν_σ X^ρ X^σ.
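This transformation rule can be checked directly: contracting the outer product with two copies of Λ agrees with transforming X first and then taking the outer product (a numerical sketch, with β and X chosen arbitrarily):

```python
import numpy as np

beta = 0.5
gamma = 1.0 / np.sqrt(1.0 - beta**2)
# A Lorentz boost along x, as an example of Lambda.
Lam = np.array([[ gamma,      -gamma*beta, 0.0, 0.0],
                [-gamma*beta,  gamma,      0.0, 0.0],
                [ 0.0,         0.0,        1.0, 0.0],
                [ 0.0,         0.0,        0.0, 1.0]])

X = np.array([1.0, 2.0, -1.0, 0.5])

# T^{mu nu} = X^mu X^nu, an outer product of X with itself.
T = np.outer(X, X)

# Transform the tensor: Lam^mu_rho Lam^nu_sigma T^{rho sigma}.
T_transformed = np.einsum('mr,ns,rs->mn', Lam, Lam, T)

# Equivalently, transform X first and then take the outer product.
X_new = Lam @ X
matches = np.allclose(T_transformed, np.outer(X_new, X_new))
```

The two routes agree, which is exactly what the tensor transformation rule encodes.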

We would also like to talk about η_μν as a tensor. We can make the following general definition:

Definition (Tensor). A tensor of type (m, n) is a quantity T^{μ_1 ··· μ_m}_{ν_1 ··· ν_n} which transforms as

    T′^{μ_1 ··· μ_m}_{ν_1 ··· ν_n} = Λ^{μ_1}_{ρ_1} ··· Λ^{μ_m}_{ρ_m} Λ_{ν_1}^{σ_1} ··· Λ_{ν_n}^{σ_n} T^{ρ_1 ··· ρ_m}_{σ_1 ··· σ_n}.

As we saw, we can change the type of a tensor by raising and lowering indices by contracting with η_μν or its inverse. However, the total n + m will not be changed.

Finally, we introduce the 4-derivative.

Definition (4-derivative). The 4-derivative is

    ∂_μ = ∂/∂X^μ = ( (1/c) ∂/∂t , ∇ ).

As we previously discussed, the derivative ought to be a covector. We can also verify this by explicit computation using the chain rule. Under a transformation X^μ ↦ X′^μ, we have

    ∂_μ = ∂/∂X^μ ↦ ∂/∂X′^μ = (∂X^ν/∂X′^μ) ∂/∂X^ν = (Λ^{−1})^ν_μ ∂_ν = Λ_μ^ν ∂_ν.