2 Axioms of probability

IA Probability



2.1 Axioms and definitions
So far, we have semi-formally defined some probabilistic notions. However, what
we had above was rather restrictive. We were only allowed to have a finite
number of possible outcomes, and all outcomes occur with the same probability.
However, most things in the real world do not fit these descriptions. For example,
we cannot use this to model a coin that gives heads with probability $\pi^{-1}$.
In general, “probability” can be defined as follows:
Definition (Probability space). A probability space is a triple $(\Omega, \mathcal{F}, \mathbb{P})$. $\Omega$ is a
set called the sample space, $\mathcal{F}$ is a collection of subsets of $\Omega$, and
$\mathbb{P}: \mathcal{F} \to [0, 1]$ is the probability measure.
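$\mathcal{F}$ has to satisfy the following axioms:
(i) $\emptyset, \Omega \in \mathcal{F}$.
(ii) $A \in \mathcal{F} \Rightarrow A^C \in \mathcal{F}$.
(iii) $A_1, A_2, \cdots \in \mathcal{F} \Rightarrow \bigcup_{i=1}^\infty A_i \in \mathcal{F}$.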
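And $\mathbb{P}$ has to satisfy the following Kolmogorov axioms:
(i) $0 \leq \mathbb{P}(A) \leq 1$ for all $A \in \mathcal{F}$
(ii) $\mathbb{P}(\Omega) = 1$
(iii) For any countable collection of events $A_1, A_2, \cdots$ which are disjoint, i.e.
$A_i \cap A_j = \emptyset$ for all $i \neq j$, we have
\[
  \mathbb{P}\left(\bigcup_i A_i\right) = \sum_i \mathbb{P}(A_i).
\]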
Items in $\Omega$ are known as the outcomes, items in $\mathcal{F}$ are known as the events, and
$\mathbb{P}(A)$ is the probability of the event $A$.
If $\Omega$ is finite (or countable), we usually take $\mathcal{F}$ to be all the subsets of $\Omega$, i.e.
the power set of $\Omega$. However, if $\Omega$ is, say, $\mathbb{R}$, we have to be a bit more careful
and only include nice subsets, or else we cannot have a well-defined $\mathbb{P}$.
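For a finite $\Omega$, the three axioms on $\mathcal{F}$ can be checked mechanically. The following is a minimal sketch, not part of the course material: the helper names `power_set` and `is_sigma_algebra` and the example families are our own illustrative choices. It relies on the fact that, for a finite family, closure under pairwise unions already gives closure under countable unions.

```python
from itertools import chain, combinations

def power_set(omega):
    """Return all subsets of omega as a set of frozensets."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

def is_sigma_algebra(omega, family):
    """Check axioms (i)-(iii) for a family of subsets of a finite omega."""
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    # (i) the empty set and omega both belong to the family
    if frozenset() not in family or omega not in family:
        return False
    # (ii) closed under complements
    if any(omega - a not in family for a in family):
        return False
    # (iii) closed under unions (pairwise suffices for a finite family)
    return all(a | b in family for a in family for b in family)

omega = {1, 2, 3, 4}
full = power_set(omega)                      # the usual choice for finite omega
coarse = [set(), {1, 2}, {3, 4}, omega]      # a smaller valid choice
broken = [set(), {1}, omega]                 # missing the complement {2, 3, 4}

print(is_sigma_algebra(omega, full))
print(is_sigma_algebra(omega, coarse))
print(is_sigma_algebra(omega, broken))
```

The `coarse` family shows that the power set is not the only valid choice of $\mathcal{F}$, while `broken` fails axiom (ii).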
Often it is not helpful to specify the full function $\mathbb{P}$. Instead, in discrete cases,
we just specify the probabilities of each outcome, and use the third axiom to
obtain the full $\mathbb{P}$.
Definition (Probability distribution). Let $\Omega = \{\omega_1, \omega_2, \cdots\}$. Choose non-negative numbers
$p_1, p_2, \cdots$ such that $\sum_{i=1}^\infty p_i = 1$. Let $p(\omega_i) = p_i$. Then define
\[
  \mathbb{P}(A) = \sum_{\omega_i \in A} p(\omega_i).
\]
This $\mathbb{P}(A)$ satisfies the above axioms, and $p_1, p_2, \cdots$ is the probability distribution.
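The construction of $\mathbb{P}$ from the numbers $p_i$ can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name `make_measure` and the three-outcome distribution are hypothetical, and exact fractions are used so that the axioms can be checked without rounding error.

```python
from fractions import Fraction

def make_measure(p):
    """Build P from a dict p mapping each outcome to its probability."""
    if any(v < 0 for v in p.values()):
        raise ValueError("probabilities must be non-negative")
    if sum(p.values()) != 1:
        raise ValueError("probabilities must sum to 1")

    def P(event):
        # P(A) is the sum of p(w) over the outcomes w in A
        return sum((p[w] for w in event), Fraction(0))

    return P

# Hypothetical distribution on a three-point sample space
p = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
P = make_measure(p)

print(P({"a", "c"}))   # probability of the event {a, c}
print(P(p.keys()))     # probability of the whole sample space
```

Here an event is any subset of the keys of `p`, which matches the convention of taking $\mathcal{F}$ to be the power set when $\Omega$ is finite.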
Using the axioms, we can quickly prove a few rather obvious results.
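Theorem.
(i) $\mathbb{P}(\emptyset) = 0$
(ii) $\mathbb{P}(A^C) = 1 - \mathbb{P}(A)$
(iii) $A \subseteq B \Rightarrow \mathbb{P}(A) \leq \mathbb{P}(B)$
(iv) $\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B)$.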
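Proof.
(i) $\Omega$ and $\emptyset$ are disjoint. So $\mathbb{P}(\Omega) + \mathbb{P}(\emptyset) = \mathbb{P}(\Omega \cup \emptyset) = \mathbb{P}(\Omega)$. So $\mathbb{P}(\emptyset) = 0$.
(ii) $\mathbb{P}(A) + \mathbb{P}(A^C) = \mathbb{P}(\Omega) = 1$ since $A$ and $A^C$ are disjoint.
(iii) Write $B = A \cup (B \cap A^C)$. Then
$\mathbb{P}(B) = \mathbb{P}(A) + \mathbb{P}(B \cap A^C) \geq \mathbb{P}(A)$.
(iv) $\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B \cap A^C)$. We also know that
$\mathbb{P}(B) = \mathbb{P}(A \cap B) + \mathbb{P}(B \cap A^C)$. Then the result follows.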
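As a sanity check, the four results can be verified on a concrete example. The sketch below assumes a single roll of a fair die with the uniform measure; the events `A` and `B` are our own illustrative choices.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))   # outcomes of one roll of a fair die

def P(event):
    """Uniform measure on omega: |event| / |omega|."""
    return Fraction(len(event), len(omega))

A = frozenset({2, 4, 6})         # the roll is even
B = frozenset({4, 5, 6})         # the roll is at least 4
A_c = omega - A                  # complement of A

# (i) the empty event has probability 0
print(P(frozenset()))
# (ii) complement rule
print(P(A_c), 1 - P(A))
# (iii) monotonicity: A & B is a subset of B
print(P(A & B) <= P(B))
# (iv) inclusion-exclusion for two events
lhs = P(A | B)
rhs = P(A) + P(B) - P(A & B)
print(lhs, rhs)
```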
From above, we know that $\mathbb{P}(A \cup B) \leq \mathbb{P}(A) + \mathbb{P}(B)$. So we say that $\mathbb{P}$ is a
subadditive function. Also, $\mathbb{P}(A \cap B) + \mathbb{P}(A \cup B) \leq \mathbb{P}(A) + \mathbb{P}(B)$ (in fact both
sides are equal!). We say $\mathbb{P}$ is submodular.
The next theorem is better expressed in terms of limits.
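Definition (Limit of events). A sequence of events $A_1, A_2, \cdots$ is increasing if
$A_1 \subseteq A_2 \subseteq \cdots$. Then we define the limit as
\[
  \lim_{n \to \infty} A_n = \bigcup_{1}^{\infty} A_n.
\]
Similarly, if they are decreasing, i.e. $A_1 \supseteq A_2 \supseteq \cdots$, then
\[
  \lim_{n \to \infty} A_n = \bigcap_{1}^{\infty} A_n.
\]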
Theorem. If $A_1, A_2, \cdots$ is increasing or decreasing, then
\[
  \lim_{n \to \infty} \mathbb{P}(A_n) = \mathbb{P}\left(\lim_{n \to \infty} A_n\right).
\]
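Proof. Suppose the sequence is increasing. Take $B_1 = A_1$, $B_2 = A_2 \setminus A_1$. In general,
\[
  B_n = A_n \setminus \bigcup_{1}^{n-1} A_i.
\]
Then the $B_i$ are pairwise disjoint, and
\[
  \bigcup_{1}^{n} B_i = \bigcup_{1}^{n} A_i, \qquad \bigcup_{1}^{\infty} B_i = \bigcup_{1}^{\infty} A_i.
\]
Then
\begin{align*}
  \mathbb{P}(\lim A_n) &= \mathbb{P}\left(\bigcup_{1}^{\infty} A_i\right)\\
  &= \mathbb{P}\left(\bigcup_{1}^{\infty} B_i\right)\\
  &= \sum_{1}^{\infty} \mathbb{P}(B_i) \quad \text{(Axiom III)}\\
  &= \lim_{n \to \infty} \sum_{i=1}^{n} \mathbb{P}(B_i)\\
  &= \lim_{n \to \infty} \mathbb{P}\left(\bigcup_{1}^{n} A_i\right)\\
  &= \lim_{n \to \infty} \mathbb{P}(A_n),
\end{align*}
where the last step uses $\bigcup_{1}^{n} A_i = A_n$ for an increasing sequence.
The decreasing case is proven similarly (or we can simply apply the above to
$A_i^C$).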
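The theorem can be illustrated numerically on a countable sample space. The sketch below is our own example, not taken from the course: we assume $\Omega = \{1, 2, 3, \cdots\}$ with the hypothetical distribution $p_i = 2^{-i}$ and the increasing events $A_n = \{1, \cdots, n\}$, whose limit is $\Omega$ itself. The theorem then predicts that $\mathbb{P}(A_n)$ tends to $\mathbb{P}(\Omega) = 1$.

```python
from fractions import Fraction

def p(i):
    """Hypothetical distribution on {1, 2, 3, ...}: p_i = 2^(-i)."""
    return Fraction(1, 2**i)

def P_A(n):
    """P(A_n) for the increasing events A_n = {1, ..., n}."""
    return sum((p(i) for i in range(1, n + 1)), Fraction(0))

# The gap 1 - P(A_n) should shrink towards 0 as n grows
gaps = [1 - P_A(n) for n in (1, 5, 10, 20)]
for n, gap in zip((1, 5, 10, 20), gaps):
    print(n, gap)
```

In this example the gap is exactly $2^{-n}$, so the convergence is geometric; in general the theorem only guarantees convergence, not a rate.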