2 Axioms of probability

IA Probability



2.5 Conditional probability
Definition (Conditional probability). Suppose $B$ is an event with $P(B) > 0$. For any event $A \subseteq \Omega$, the conditional probability of $A$ given $B$ is
\[
P(A \mid B) = \frac{P(A \cap B)}{P(B)}.
\]
We interpret this as the probability of $A$ happening given that $B$ has happened.
Note that if $A$ and $B$ are independent, then
\[
P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)P(B)}{P(B)} = P(A).
\]
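The definition can be checked by brute-force enumeration on a small finite sample space. The following is a minimal sketch, assuming two fair six-sided dice with all 36 ordered outcomes equally likely; the dice, the helper names `prob` and `cond_prob`, and the chosen events are illustrative and not part of the notes.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice (an illustrative assumption).
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b):
    """P(a | b) = P(a ∩ b) / P(b); only defined when P(b) > 0."""
    if prob(b) == 0:
        raise ValueError("conditioning event must have positive probability")
    return prob(a & b) / prob(b)

first_is_six = {w for w in omega if w[0] == 6}
second_is_six = {w for w in omega if w[1] == 6}
sum_at_least_10 = {w for w in omega if w[0] + w[1] >= 10}

# Independent events: conditioning on one does not change the other.
print(cond_prob(first_is_six, second_is_six), prob(first_is_six))      # 1/6 1/6
# Dependent events: knowing the first die is a six makes a large sum more likely.
print(cond_prob(sum_at_least_10, first_is_six), prob(sum_at_least_10))  # 1/2 1/6
```

Exact fractions are used so that the independence check is an equality rather than a floating-point comparison.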
Example. In a game of poker, let $A_i = [\text{player } i \text{ gets royal flush}]$. Then
\[
P(A_1) = 1.539 \times 10^{-6}
\]
and
\[
P(A_2 \mid A_1) = 1.969 \times 10^{-6}.
\]
It is significantly bigger, albeit still incredibly tiny. So we say “good hands attract”.
If $P(A \mid B) > P(A)$, then we say that $B$ attracts $A$. Since
\[
\frac{P(A \cap B)}{P(B)} > P(A) \iff \frac{P(A \cap B)}{P(A)} > P(B),
\]
$A$ attracts $B$ if and only if $B$ attracts $A$. We can also say $A$ repels $B$ if $A$ attracts $B^C$.
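The symmetry of attraction can be illustrated on a toy example. This sketch assumes a single fair die; the function names `attracts` and `repels` and the events `even`, `high` and `low` are hypothetical, chosen only to mirror the definitions above.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # one fair die (illustrative assumption)

def prob(event):
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b):
    return prob(a & b) / prob(b)

def attracts(b, a):
    """True if b attracts a, i.e. P(a | b) > P(a)."""
    return cond_prob(a, b) > prob(a)

def repels(a, b):
    """True if a repels b, i.e. a attracts the complement of b."""
    return attracts(a, omega - b)

even = {2, 4, 6}
high = {4, 5, 6}
low = {1, 2, 3}

print(attracts(high, even), attracts(even, high))  # True True: attraction is symmetric
print(repels(even, low))                           # True: even attracts high, the complement of low
```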
Theorem.
(i) $P(A \cap B) = P(A \mid B)P(B)$.
(ii) $P(A \cap B \cap C) = P(A \mid B \cap C)P(B \mid C)P(C)$.
(iii) $P(A \mid B \cap C) = \frac{P(A \cap B \mid C)}{P(B \mid C)}$.
(iv) The function $P(\,\cdot \mid B)$ restricted to subsets of $B$ is a probability function (or measure).
Proof. Proofs of (i), (ii) and (iii) are trivial. So we only prove (iv). To prove this, we have to check the axioms.
(i) Let $A \subseteq B$. Then $P(A \mid B) = \frac{P(A \cap B)}{P(B)} \leq 1$.
(ii) $P(B \mid B) = \frac{P(B)}{P(B)} = 1$.
(iii) Let $A_i$ be disjoint events that are subsets of $B$. Then
\[
P\left(\bigcup_i A_i \,\middle|\, B\right)
= \frac{P\left(\bigcup_i A_i \cap B\right)}{P(B)}
= \frac{P\left(\bigcup_i A_i\right)}{P(B)}
= \sum \frac{P(A_i)}{P(B)}
= \sum \frac{P(A_i \cap B)}{P(B)}
= \sum P(A_i \mid B).
\]
Definition (Partition). A partition of the sample space is a collection of disjoint events $\{B_i\}_{i=1}^\infty$ such that $\bigcup_i B_i = \Omega$.
For example, if the outcome is an integer (say, the roll of a die), then “odd” and “even” partition the sample space into two events.
The following result should be clear:
Proposition. If $B_i$ is a partition of the sample space, and $A$ is any event, then
\[
P(A) = \sum_{i=1}^\infty P(A \cap B_i) = \sum_{i=1}^\infty P(A \mid B_i)P(B_i).
\]
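As a sanity check of the proposition, the two sums can be compared with a direct count on a finite example. The sketch below assumes two fair dice and partitions the sample space by the value of the first die; the example and all names in it are illustrative, not taken from the notes.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))  # two fair dice (illustrative assumption)

def prob(event):
    return Fraction(len(event), len(omega))

# Partition of omega by the value of the first die: B_1, ..., B_6.
partition = [{w for w in omega if w[0] == i} for i in range(1, 7)]
a = {w for w in omega if w[0] + w[1] >= 10}

direct = prob(a)
via_intersections = sum(prob(a & b) for b in partition)
# P(A | B_i) P(B_i), with P(A | B_i) computed from the definition.
via_conditionals = sum(prob(a & b) / prob(b) * prob(b) for b in partition)

print(direct, via_intersections, via_conditionals)  # 1/6 1/6 1/6
```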
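Example. A fair coin is tossed repeatedly. The gambler gets $+1$ for a head and $-1$ for a tail. The game continues until he is broke or reaches \$$a$. Let
\[
p_x = P(\text{goes broke} \mid \text{starts with } \$x),
\]
and let $B_1$ be the event that he gets a head on the first toss. Then
\[
\begin{aligned}
p_x &= P(B_1)p_{x+1} + P(B_1^C)p_{x-1} \\
p_x &= \frac{1}{2}p_{x+1} + \frac{1}{2}p_{x-1}.
\end{aligned}
\]
We have two boundary conditions $p_0 = 1$, $p_a = 0$. The recurrence says that $p_{x+1} - p_x = p_x - p_{x-1}$, so $p_x$ is linear in $x$. Solving the recurrence relation with the boundary conditions, we have
\[
p_x = 1 - \frac{x}{a}.
\]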
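The closed form can be checked in two ways: exactly, by substituting it into the recurrence and boundary conditions, and approximately, by simulating the game. The following is a rough sketch; the function names, the trial count and the fixed seed are arbitrary choices made here for illustration.

```python
import random
from fractions import Fraction

def ruin_probability(x, a):
    """Closed form p_x = 1 - x/a for a fair coin, with 0 <= x <= a."""
    return 1 - Fraction(x, a)

def satisfies_recurrence(a):
    """Check p_0 = 1, p_a = 0 and p_x = (p_{x+1} + p_{x-1}) / 2 for 0 < x < a."""
    p = [ruin_probability(x, a) for x in range(a + 1)]
    interior = all(p[x] == (p[x + 1] + p[x - 1]) / 2 for x in range(1, a))
    return p[0] == 1 and p[a] == 0 and interior

def simulate_ruin(x, a, trials=20_000, seed=0):
    """Monte Carlo estimate of the probability of going broke starting from x."""
    rng = random.Random(seed)
    broke = 0
    for _ in range(trials):
        wealth = x
        while 0 < wealth < a:
            wealth += 1 if rng.random() < 0.5 else -1
        broke += wealth == 0
    return broke / trials

print(satisfies_recurrence(10))   # True
print(ruin_probability(3, 10))    # 7/10
print(simulate_ruin(3, 10))       # should be close to 0.7
```

The simulation only gives an estimate, so it should be read as agreeing with $1 - x/a$ up to sampling error.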
Theorem (Bayes’ formula). Suppose $B_i$ is a partition of the sample space, and $A$ and $B_i$ all have non-zero probability. Then for any $B_i$,
\[
P(B_i \mid A) = \frac{P(A \mid B_i)P(B_i)}{\sum_j P(A \mid B_j)P(B_j)}.
\]
Note that the denominator is simply $P(A)$ written in a fancy way.
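Spelled out, the formula follows by chaining the definition of conditional probability, part (i) of the theorem above, and the proposition on partitions:
\[
P(B_i \mid A) = \frac{P(A \cap B_i)}{P(A)} = \frac{P(A \mid B_i)P(B_i)}{P(A)} = \frac{P(A \mid B_i)P(B_i)}{\sum_j P(A \mid B_j)P(B_j)}.
\]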
Example (Screen test). Suppose we have a screening test that tests whether a patient has a particular disease. We denote positive and negative results as $+$ and $-$ respectively, and $D$ denotes the person having the disease. Suppose that the test is not absolutely accurate, and
\[
\begin{aligned}
P(+ \mid D) &= 0.98 \\
P(+ \mid D^C) &= 0.01 \\
P(D) &= 0.001.
\end{aligned}
\]
So what is the probability that a person has the disease given that he received a positive result?
\[
\begin{aligned}
P(D \mid +) &= \frac{P(+ \mid D)P(D)}{P(+ \mid D)P(D) + P(+ \mid D^C)P(D^C)} \\
&= \frac{0.98 \cdot 0.001}{0.98 \cdot 0.001 + 0.01 \cdot 0.999} \\
&\approx 0.09.
\end{aligned}
\]
So this test is pretty useless. Even if you get a positive result, since the disease is
so rare, it is more likely that you don’t have the disease and get a false positive.
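The arithmetic can be reproduced with a small helper that applies Bayes' formula over a finite partition. This is a sketch only; the function `posterior` and the variable names are hypothetical, and the three input numbers are the ones given in the example.

```python
def posterior(prior, likelihoods):
    """Bayes' formula over a finite partition.

    prior[i] = P(B_i), likelihoods[i] = P(A | B_i); returns the list of P(B_i | A).
    """
    joint = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(joint)  # this is P(A), by the law of total probability
    return [j / total for j in joint]

p_disease = 0.001            # P(D)
sensitivity = 0.98           # P(+ | D)
false_positive_rate = 0.01   # P(+ | D^C)

# The partition is {D, D^C}; the event A is a positive result.
p_d_given_pos, p_not_d_given_pos = posterior(
    [p_disease, 1 - p_disease],
    [sensitivity, false_positive_rate],
)
print(round(p_d_given_pos, 4))      # 0.0893
print(round(p_not_d_given_pos, 4))  # 0.9107
```

Raising `p_disease` in this sketch shows how strongly the answer depends on the prevalence of the disease rather than on the accuracy of the test alone.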
Example. Consider the following two cases:
(i) I have two children, one of whom is a boy.
(ii) I have two children, one of whom is a son born on a Tuesday.
What is the probability that both of them are boys? (Here “one of whom” means at least one.)
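(i) $P(BB \mid BB \cup BG) = \frac{1/4}{1/4 + 2/4} = \frac{1}{3}$.
(ii) Let $B^*$ denote a boy born on a Tuesday, and $B$ a boy not born on a Tuesday. Then
\[
P(B^*B^* \cup B^*B \mid BB^* \cup B^*B^* \cup B^*G)
= \frac{\frac{1}{14} \cdot \frac{1}{14} + 2 \cdot \frac{1}{14} \cdot \frac{6}{14}}{\frac{1}{14} \cdot \frac{1}{14} + 2 \cdot \frac{1}{14} \cdot \frac{6}{14} + 2 \cdot \frac{1}{14} \cdot \frac{1}{2}}
= \frac{13}{27}.
\]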
How can we understand this? It is much easier to have a boy born on a Tuesday
if you have two boys than one boy. So if we have the information that a boy
is born on a Tuesday, it is now less likely that there is just one boy. In other
words, it is more likely that there are two boys.
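Both answers can be confirmed by enumerating families directly. The sketch below assumes each child is independently a boy or a girl with probability 1/2 and is born on each weekday with probability 1/7, so that all 196 ordered pairs of (sex, weekday) are equally likely; the encoding of Tuesday as day 1 and all names are arbitrary choices for illustration.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # arbitrary label for one of the seven weekdays

# A child is a (sex, weekday) pair; a family is an ordered pair of children.
children = list(product("BG", range(7)))
families = list(product(children, repeat=2))  # 196 equally likely families

def cond_prob(event, given):
    """P(event | given) by counting equally likely families."""
    kept = [f for f in families if given(f)]
    return Fraction(sum(1 for f in kept if event(f)), len(kept))

def both_boys(family):
    return all(sex == "B" for sex, _ in family)

def has_boy(family):
    return any(sex == "B" for sex, _ in family)

def has_tuesday_boy(family):
    return any(child == ("B", TUESDAY) for child in family)

print(cond_prob(both_boys, has_boy))          # 1/3
print(cond_prob(both_boys, has_tuesday_boy))  # 13/27
```

The counts behind the second answer are 27 families with at least one Tuesday boy, of which 13 have two boys.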