4Non-cooperative games

IB Optimisation



4.1 Games and Solutions
Definition
(Bimatrix game)
.
A two-player game, or bimatrix game, is given
by two matrices
P, Q R
m×n
. Player 1, or the row player, chooses a row
i
{
1
, · · · , m}
, while player 2, the column player, chooses a column
j {
1
, · · · , n}
.
These are selected without knowledge of the other player’s decisions. The two
players then get payoffs P
ij
and Q
ij
respectively.
Example. A game of rock-paper-scissors can have payoff matrices
P
ij
=
0 1 1
1 0 1
1 1 0
, Q
ij
=
0 1 1
1 0 1
1 1 0
.
Here a victory gives you a payoff of 1, a loss gives a payoff of
1, and a draw
gives a payoff of 1. Also the first row/column corresponds to playing rock, second
corresponds to paper and third corresponds to sciss ors.
Usually, this is not the best way to display the payoff matrices. First of all,
we need two write out two matrices, and there isn’t an easy way to indicate what
row corresponds to what decision. Instead, we usually write this as a table.
R P S
R (0, 0) (1, 1) (1, 1)
P (1, 1) (0, 0) (1, 1)
S (1, 1) (1, 1) (0, 0)
By convention, the first item in the tuple (
1
,
1) indicates the payoff of the row
player, and the second item indicates the payoff of the column player.
Definition
(Strategy)
.
Players are allowed to play randomly. The set of strate-
gies the row player can have is
X = {x R
m
: x 0,
X
x
i
= 1}
and the column player has strategies
Y = {y R
n
: y 0,
X
y
i
= 1}
Each vector corresponds to the probabilities of selecting each row or column.
A strategy profile (
x, y
)
X × Y
induces a lottery, and we write
p
(
x, y
) =
x
T
P y for the expected payoff of the row player.
If x
i
= 1 for some i, i.e. we always pick i, we call x a pure strategy.
Example
(Prisoner’s dilemma)
.
Suppose Alice and Bob commit a crime together,
and are caught by the police. They can choose to remain silent (
S
) or testify
(T ). Different options will lead to different outcomes:
Both keep silent: the police has little evidence and they go to jail for 2
years.
One testifies and one remains silent: the one who testifies gets awarded
and is freed, while the other gets stuck in jail for 10 years.
Both testify: they both go to jail for 5 years.
We can represent this by a payoff table:
S T
S (2, 2) (0, 3)
T (3, 0) (1, 1)
Note that higher payoff is desired, so a longer serving time corresponds to a
lower payoff. Also, payoffs are interpreted relatively, so replacing (0
,
3) with
(0, 100) (and (3, 0) with (100, 0)) in the payoff table would make no difference.
Here we see that regardless of what the other person does, it is always strictly
better to testify than not (unless you want to be nice). We say
T
is a dominant
strategy, and (1, 1) is Pareto dominated by (2, 2).
Example
(Chicken)
.
The game of Chicken is as follows: two people drive their
cars towards each other at high speed. If they collide, they will di.e. Hence they
can decide to chicken out (
C
) or continue driving (
D
). If both don’t chicken,
they die, which is bad. If one chickens and the other doesn’t the person who
chicken looks silly (but doesn’t die). If both chicken out, they both look slightly
silly. This can be represented by the following table:
C D
C (2, 2) (1, 3)
D (3, 1) (0, 0)
Here there is no dominating strategy, so we need a different way of deciding
what to do.
Instead, we define the security level of the row player to be
max
xX
min
yY
p(x, y) = max
xX
min
j∈{1,...,n}
m
X
i=1
x
i
p
ij
.
Such an
x
is the strategy the row player can employ that minimizes the worst
possible loss. This is called the maximin strategy.
We can formulate this as a linear program:
maximize v such that
m
X
i=1
x
i
p
ij
v for all j = 1, · · · , n
m
X
i=1
x
i
= 1
x 0
Here the maximin strategy is to chicken. However, this isn’t really what we
are looking for, since if both players employ this maximin strategy, it would be
better for you to not chicken out.
Definition
(Best response and equilibrium)
.
A strategy
x X
is a best response
to y Y if for all x
0
X
p(x, y) p(x
0
, y)
A pair (
x, y
) is an equilibrium if
x
is the best response against
y
and
y
is a best
response against x.
Example.
In the chicken game, there are two pure equilibrium, (3
,
1) and (1
,
3),
and there is a mixed equilibrium in which the players pick the options with equal
probability.
Theorem (Nash, 1961). Every bimatrix game has an equilibrium.
We are not proving this since it is too hard.