4 Graphical modelling


4.3 The PC algorithm
We now want to recover the structural equation model from data, and in
particular, determine the causal structure. As we previously saw, there is
no hope of determining this completely, even if we know the distribution of
Z exactly. Let us consider the different obstacles to this problem.
Causal minimality
If P is generated by an SEM with DAG G, then from the above, we know that
P is Markov with respect to G. The converse is also true: if P is Markov with
respect to a DAG G, then there exists an SEM with DAG G that generates P.
This immediately implies that P will be Markov with respect to many DAGs.
For example, a DAG whose skeleton is complete will always work. This suggests
the following definition:
Definition (Causal minimality). A distribution P satisfies causal minimality
with respect to G if it is Markov with respect to G, but not with respect to
any proper subgraph of G.
Markov equivalent DAGs
It is natural to aim for finding a causally minimal DAG. However, this does
not give a unique solution, as we saw previously with the two variables that are
always the same.
In general, two different DAGs may satisfy the same set of d-separations,
and then a distribution is Markov with respect to one iff it is Markov with
respect to the other, and we cannot distinguish between the two.
Definition (Markov equivalence). For a DAG G, we let

M(G) = {distributions P such that P is Markov with respect to G}.

We say two DAGs G_1, G_2 are Markov equivalent if M(G_1) = M(G_2).
What is nice is that there is a rather easy way of determining when two
DAGs are Markov equivalent.
Proposition. Two DAGs are Markov equivalent iff they have the same skeleton
and the same set of v-structures.
The set of all DAGs that are Markov equivalent to a given DAG can be
represented by a CPDAG (completed partial DAG), which contains an edge
(j, k) iff some member of the equivalence class does.
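This criterion is easy to check computationally. Here is a small sketch in
Python, where a DAG is represented as a dictionary mapping each node to its
set of parents (this representation and the function names are my own):

```python
def skeleton(pa):
    """Undirected edges of a DAG given as a dict: node -> set of parents."""
    return {frozenset((j, k)) for k, ps in pa.items() for j in ps}

def v_structures(pa):
    """Triples j -> l <- k with j and k non-adjacent."""
    skel = skeleton(pa)
    return {(frozenset((j, k)), l)
            for l, ps in pa.items()
            for j in ps for k in ps
            if j != k and frozenset((j, k)) not in skel}

def markov_equivalent(pa1, pa2):
    return skeleton(pa1) == skeleton(pa2) and v_structures(pa1) == v_structures(pa2)

# The chain 1 -> 2 -> 3 and the fork 1 <- 2 -> 3 are Markov equivalent,
# while the v-structure 1 -> 2 <- 3 is not equivalent to either.
chain = {1: set(), 2: {1}, 3: {2}}
fork  = {1: {2}, 2: set(), 3: {2}}
coll  = {1: set(), 2: {1, 3}, 3: set()}
print(markov_equivalent(chain, fork))   # True
print(markov_equivalent(chain, coll))   # False
```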
Faithfulness
To describe the final issue, consider the SEM

\[
Z_1 = \varepsilon_1,\quad Z_2 = \alpha Z_1 + \varepsilon_2,\quad Z_3 = \beta Z_1 + \gamma Z_2 + \varepsilon_3.
\]

We take ε ∼ N_3(0, I). Then we have Z = (Z_1, Z_2, Z_3) ∼ N(0, Σ), where

\[
\Sigma = \begin{pmatrix}
1 & \alpha & \beta + \alpha\gamma\\
\alpha & \alpha^2 + 1 & \alpha(\beta + \alpha\gamma) + \gamma\\
\beta + \alpha\gamma & \alpha(\beta + \alpha\gamma) + \gamma & \beta^2 + \gamma^2(\alpha^2 + 1) + 1 + 2\alpha\beta\gamma
\end{pmatrix}.
\]
(The DAG of this SEM has edges Z_1 → Z_2, Z_1 → Z_3 and Z_2 → Z_3.)
Now if we pick values of α, β, γ such that

\[
\beta + \alpha\gamma = 0,
\]

then we obtain an extra independence relation Z_1 ⊥ Z_3 in our system. For
example, if we pick β = −1 and α = γ = 1, then

\[
\Sigma = \begin{pmatrix} 1 & 1 & 0\\ 1 & 2 & 1\\ 0 & 1 & 2 \end{pmatrix}.
\]
While there is an extra independence relation, we cannot remove any edge while
still satisfying the Markov property. Indeed:
- If we remove 1 → 2, then this would require Z_1 ⊥ Z_2, but this is not true.

- If we remove 2 → 3, then this would require Z_2 ⊥ Z_3 | Z_1, but we have

  \[
  \operatorname{var}((Z_2, Z_3) \mid Z_1) = \begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} - \begin{pmatrix} 1\\ 0 \end{pmatrix}\begin{pmatrix} 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 1\\ 1 & 2 \end{pmatrix},
  \]

  and this is not diagonal.

- If we remove 1 → 3, then this would require Z_1 ⊥ Z_3 | Z_2, but

  \[
  \operatorname{var}((Z_1, Z_3) \mid Z_2) = \begin{pmatrix} 1 & 0\\ 0 & 2 \end{pmatrix} - \frac{1}{2}\begin{pmatrix} 1\\ 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \end{pmatrix} = \begin{pmatrix} \tfrac12 & -\tfrac12\\ -\tfrac12 & \tfrac32 \end{pmatrix},
  \]

  which is again non-diagonal.
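These computations are easy to verify numerically. Here is a minimal sketch
using numpy (the matrix names are my own), writing the SEM as Z = BZ + ε so
that Σ = (I − B)^{−1}(I − B)^{−T}:

```python
import numpy as np

# SEM coefficients: Z = B Z + eps with eps ~ N(0, I),
# and beta = -1, alpha = gamma = 1, so that beta + alpha*gamma = 0.
alpha, beta, gamma = 1.0, -1.0, 1.0
B = np.array([[0,     0,     0],
              [alpha, 0,     0],
              [beta,  gamma, 0]])

# Z = (I - B)^{-1} eps, hence Sigma = (I - B)^{-1} (I - B)^{-T}.
A = np.linalg.inv(np.eye(3) - B)
Sigma = A @ A.T
print(Sigma)  # [[1, 1, 0], [1, 2, 1], [0, 1, 2]]

# Conditional covariance of (Z_2, Z_3) given Z_1, via the Schur complement.
i, c = [1, 2], [0]
V = Sigma[np.ix_(i, i)] - Sigma[np.ix_(i, c)] @ np.linalg.inv(Sigma[np.ix_(c, c)]) @ Sigma[np.ix_(c, i)]
print(V)      # [[1, 1], [1, 2]] -- not diagonal
```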
So this DAG satisfies causal minimality. However, P can also be generated by
the structural equation model

\[
\tilde Z_1 = \tilde\varepsilon_1,\quad \tilde Z_2 = \tilde Z_1 + \tfrac12 \tilde Z_3 + \tilde\varepsilon_2,\quad \tilde Z_3 = \tilde\varepsilon_3,
\]

where the ε̃_i are independent with ε̃_1 ∼ N(0, 1), ε̃_2 ∼ N(0, 1/2),
ε̃_3 ∼ N(0, 2).
Then this has the DAG with edges Z̃_1 → Z̃_2 and Z̃_3 → Z̃_2.
This is a strictly smaller DAG in terms of the number of edges involved. It is
easy to see that this satisfies causal minimality.
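One can check numerically that this alternative SEM generates the same Σ; a
short sketch with numpy (names are again my own):

```python
import numpy as np

# Alternative SEM: Z1 = e1, Z2 = Z1 + Z3/2 + e2, Z3 = e3,
# with var(e1) = 1, var(e2) = 1/2, var(e3) = 2.
B = np.array([[0, 0, 0  ],
              [1, 0, 0.5],
              [0, 0, 0  ]])
D = np.diag([1.0, 0.5, 2.0])   # noise variances
A = np.linalg.inv(np.eye(3) - B)
print(A @ D @ A.T)             # [[1, 1, 0], [1, 2, 1], [0, 1, 2]] -- the same Sigma
```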
Definition (Faithfulness). We say P is faithful to a DAG G if it is Markov
with respect to G and for all disjoint A, B, S, Z_A ⊥ Z_B | Z_S implies that
A and B are d-separated by S.
Determining the DAG
We shall assume our distribution is faithful to some G^0, and see if we can
figure out G^0 from P, or even better, from data.

To find G^0, the following proposition helps us compute the skeleton:
Proposition. If nodes j and k are adjacent in a DAG G, then no set can
d-separate them. If they are not adjacent, and π is a topological order for
G with π(j) < π(k), then they are d-separated by pa(k).
Proof. Only the last part requires proof. Consider a path j = j_1, . . . , j_m = k.
Start reading the path from k and go backwards. If it starts as j_{m−1} → k,
then j_{m−1} is a parent of k, and hence blocks the path. Otherwise, it looks
like k → j_{m−1}. We keep going along the path until we first see something of
the form

\[
k \to \cdots \to a \leftarrow \cdots.
\]

This must exist, since j is not a descendant of k by the topological ordering.
The collider a blocks the path unless it or one of its descendants lies in the
conditioning set pa(k). So it suffices to show that a does not have a
descendant in pa(k); but if it did, then together with the directed path from
k to a this would form a closed loop.
Finding the v-structures is harder, and at best we can do so up to Markov
equivalence. To do that, observe the following:

Proposition. Suppose we have j − ℓ − k in the skeleton of a DAG.
(i) If j → ℓ ← k, then no S that d-separates j and k can have ℓ ∈ S.
(ii) If there exists an S that d-separates j and k and ℓ ∉ S, then j → ℓ ← k.
Denote the set of nodes adjacent to the vertex k in the graph G by adj(G, k).
We can now describe the first part of the PC algorithm, which finds the
skeleton of the "true DAG":

(i) Set Ĝ to be the complete undirected graph. Set ℓ = −1.
(ii) Repeat the following steps:
    (a) Set ℓ = ℓ + 1.
    (b) Repeat the following steps:
        i. Select a new ordered pair of nodes j, k that are adjacent in Ĝ and
           such that |adj(Ĝ, j) \ {k}| ≥ ℓ.
        ii. Repeat the following steps:
            A. Choose a new S ⊆ adj(Ĝ, j) \ {k} with |S| = ℓ.
            B. If Z_j ⊥ Z_k | Z_S, then delete the edge j − k, and store
               S(k, j) = S(j, k) = S.
            C. Repeat until the edge j − k is deleted or all such S have been
               chosen.
        iii. Repeat until all pairs of adjacent nodes have been inspected.
    (c) Repeat until ℓ ≥ p − 2.
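In code, the skeleton phase might look as follows. This is a sketch, not a
tuned implementation: ci_test stands in for the conditional independence
oracle (or, on data, a statistical test), and all names are my own.

```python
from itertools import combinations

def pc_skeleton(nodes, ci_test):
    """Skeleton phase of the PC algorithm.

    ci_test(j, k, S) returns True when Z_j is judged independent of
    Z_k given Z_S -- an oracle here, a statistical test in practice.
    """
    adj = {v: set(nodes) - {v} for v in nodes}   # (i) complete undirected graph
    sepset = {}
    ell = -1
    # (ii) grow the conditioning-set size while some pair still qualifies
    while any(len(adj[j] - {k}) >= ell + 1 for j in nodes for k in adj[j]):
        ell += 1
        for j in nodes:
            for k in list(adj[j]):               # ordered pairs adjacent in G-hat
                if len(adj[j] - {k}) < ell:
                    continue
                for S in combinations(sorted(adj[j] - {k}), ell):
                    if ci_test(j, k, set(S)):
                        adj[j].discard(k)        # delete the edge j - k
                        adj[k].discard(j)
                        sepset[(j, k)] = sepset[(k, j)] = set(S)
                        break
    return adj, sepset
```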
Suppose P is faithful to a DAG G^0. At each stage of the algorithm, the
skeleton of G^0 will be a subgraph of Ĝ. On the other hand, any edge (j, k)
remaining at termination will be such that Z_j is not independent of Z_k given
Z_S for all S ⊆ adj(Ĝ, j) \ {k} and all S ⊆ adj(Ĝ, k) \ {j}. So they must be
adjacent in G^0, since otherwise pa(k) (or pa(j)) would be such a set that
d-separates them. Thus, Ĝ and G^0 have the same skeleton.
To find the v-structures, we perform:

(i) For all non-adjacent pairs j, k with j − ℓ − k in Ĝ, do:
    (a) If ℓ ∉ S(j, k), then orient j → ℓ ← k.

This gives us the Markov equivalence class, and we may orient the other edges
using other properties such as acyclicity.
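This orientation step is short enough to sketch directly, continuing the
pc_skeleton sketch above (the arrow representation is my own):

```python
def orient_v_structures(adj, sepset):
    """Orient j -> l <- k whenever j, k are non-adjacent, l is a common
    neighbour, and l is not in the stored separating set S(j, k)."""
    arrows = set()
    for (j, k), S in sepset.items():     # sepset only holds non-adjacent pairs
        for l in adj[j] & adj[k]:        # common neighbours of j and k
            if l not in S:
                arrows.add((j, l))       # (a, b) means a -> b
                arrows.add((k, l))
    return arrows
```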
If we want to apply this to data sets, then we need to use conditional
independence tests instead of querying an oracle to decide which conditional
independences hold. However, errors in these tests propagate, and the whole
process may be derailed by mistakes made early on. Moreover, the result of the
algorithm may depend on the order in which we iterate through the nodes. People
have tried many ways to fix these problems, but in general, this method is
rather unstable. Yet, if we have large data sets, it can produce quite decent
results.
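For Gaussian data, one standard choice is a test based on the Fisher
z-transform of the sample partial correlation. A minimal sketch (the function
name, the default level α = 0.01, and the interface are my own):

```python
import numpy as np
from scipy import stats

def gaussian_ci_test(data, j, k, S, alpha=0.01):
    """Fisher z-test of Z_j independent of Z_k given Z_S for Gaussian data.

    data is an (n, p) array; returns True when independence is not rejected.
    Assumes n > len(S) + 3.
    """
    n = data.shape[0]
    idx = [j, k] + sorted(S)
    prec = np.linalg.inv(np.corrcoef(data[:, idx], rowvar=False))
    rho = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])   # partial correlation
    z = 0.5 * np.log((1 + rho) / (1 - rho))                # Fisher z-transform
    return np.sqrt(n - len(S) - 3) * abs(z) <= stats.norm.ppf(1 - alpha / 2)

# This plugs into the pc_skeleton sketch above, e.g.
#   ci = lambda j, k, S: gaussian_ci_test(X, j, k, S)
#   skel, sepset = pc_skeleton(list(range(X.shape[1])), ci)
```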