import numpy as np
import halerium.core as hal
import matplotlib.pyplot as plt


Causal Models - the Basics

Causal Models are models where relationships between parameters have a causal direction.

Correlations and statistics

Let’s start with correlations and statistics.

Consider having two parameters \(a\) and \(b\) and some example data for them.

cov = [
    [1., 0.7],
    [0.7, 1.],
]
data_ab = np.random.multivariate_normal(np.zeros(2), cov, size=(1000,))
data_a = data_ab[:, 0]
data_b = data_ab[:, 1]

The data for \(a\) and \(b\) are correlated, and each has a variance close to 1.

np.corrcoef(data_a, data_b)
array([[1.        , 0.70200685],
       [0.70200685, 1.        ]])
np.var(data_a), np.var(data_b)
(0.9819749563818557, 0.9483082972434826)
plt.scatter(data_a, data_b, alpha=0.5)
plt.xlim([-4, 4])
plt.ylim([-4, 4])

Statistically, all that we care about is the probability distribution \(P(a, b)\). No more. Statistics do not care whether \(a\) causes \(b\) or \(b\) causes \(a\) (or maybe both are caused by another parameter).

With Bayes’ theorem we can construct both splits,

\(P(a,b) = P(a) P(b|a)\)


\(P(a,b) = P(b) P(a|b)\).
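Both factorizations can be checked numerically with a plain-numpy sketch (the helper functions and the test point below are illustrative, not part of the notebook's Halerium code): at any point, the joint Gaussian density equals the product of marginal and conditional in either order.

```python
import numpy as np

rho = 0.7  # correlation between a and b

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def joint_pdf(a, b):
    """Density of the bivariate normal with unit variances and correlation rho."""
    det = 1 - rho ** 2
    quad = (a ** 2 - 2 * rho * a * b + b ** 2) / det
    return np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(det))

a, b = 0.3, -1.2  # an arbitrary test point

split_1 = normal_pdf(a, 0, 1) * normal_pdf(b, rho * a, 1 - rho ** 2)  # P(a) P(b|a)
split_2 = normal_pdf(b, 0, 1) * normal_pdf(a, rho * b, 1 - rho ** 2)  # P(b) P(a|b)

print(np.allclose(joint_pdf(a, b), split_1), np.allclose(joint_pdf(a, b), split_2))
```

Both comparisons print True: the two splits describe exactly the same joint distribution.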

We can see this by creating two different Bayesian networks.

Split 1 - P(a) P(b|a)

graph_1 = hal.Graph("graph_1")
with graph_1:
    a = hal.Variable("a", mean=0, variance=1)
    b = hal.Variable("b", mean=0.7 * a, variance=1 - 0.7**2)
data_from_1_a, data_from_1_b = hal.get_generative_model(graph_1, data=hal.DataLinker(n_data=1000)).get_example(
    [graph_1.a, graph_1.b])
graph_1.get_all_variable_dependencies()
{'graph_1/a': set(), 'graph_1/b': {'graph_1/a'}}
np.corrcoef(data_from_1_a, data_from_1_b)
array([[1.        , 0.71458369],
       [0.71458369, 1.        ]])
np.var(data_from_1_a), np.var(data_from_1_b)

Split 2 - P(b) P(a|b)

graph_2 = hal.Graph("graph_2")
with graph_2:
    b = hal.Variable("b", mean=0, variance=1)
    a = hal.Variable("a", mean=0.7 * b, variance=1 - 0.7**2)
data_from_2_a, data_from_2_b = hal.get_generative_model(graph_2, data=hal.DataLinker(n_data=1000)).get_example(
    [graph_2.a, graph_2.b])
graph_2.get_all_variable_dependencies()
{'graph_2/b': set(), 'graph_2/a': {'graph_2/b'}}
np.corrcoef(data_from_2_a, data_from_2_b)
array([[1.        , 0.70663465],
       [0.70663465, 1.        ]])
np.var(data_from_2_a), np.var(data_from_2_b)

Scatter plot comparison

fig, axs = plt.subplots(1, 2, figsize=(16, 7.5))
for ax, title, data in zip(axs, ["graph_1", "graph_2"],
                           [[data_from_1_a, data_from_1_b], [data_from_2_a, data_from_2_b]]):
    data_a, data_b = data
    ax.scatter(data_a, data_b, alpha=0.5)
    ax.set_xlim([-4, 4])
    ax.set_ylim([-4, 4])
    ax.set_title(title)
Text(0.5, 1.0, 'graph_2')

We can see that both splits produce the same distribution pattern, so the statistics they describe are equivalent.

If both splits are statistically equal, what’s the difference?

Statistically there is no difference. There is only a difference if we interpret the split to be along causal directions.

Split 1: a causes b

Split 2: b causes a

Once we commit to this interpretation and use the “DO” operation to model interventions, a difference appears.

The DO operation

The DO operation cuts all incoming arrows to a parameter.

Help on function do_operation in module halerium.core.causal_calculus.do:

do_operation(scopetor, variables, inplace=False, strict=True)
    Perform Do operation of variable(s) in scopetor.

    Do operations are required to accurately model interventions.

    The Do operation cuts all dependencies the variables have
    by setting all of the distribution parameters to None.
    This way values that are fed to the variable as data
    can only propagate forwards in the dependency structure.

    Parameters
    ----------
    scopetor : Scopetor (Graph, Entity, Variable)
        The scopetor in which the Do operation is applied.
        Must be a top-level scopetor, i.e. a scopetor without
        a parent scope.
    variables : VariableBase, str or iterable
        The variable(s) to which the Do operation is applied.
        If the variable(s) are not scopees in scopetor,
        their equivalents in scopetor will be located and the
        do operation will be applied to the equivalents instead.
    inplace : bool, optional
        Whether or not to modify scopetor (True) or to leave
        scopetor unchanged and return a modified copy (False).
        The default is False.
    strict : bool, optional
        Only has an effect if inplace=False.
        If True the scopetor has to be self-contained.
        If False not self-contained parts are left out and will be
        set to None in the copied instance.
        The default is True.

    Returns
    -------
    modified_scopetor : Scopetor
        The modified scopetor.

Applied to graph_1, \(P(a)P(b|a)\), the do operation only has an effect if it is applied to the variable \(b\).

graph_1.get_all_variable_dependencies()
{'graph_1/a': set(), 'graph_1/b': {'graph_1/a'}}
hal.do_operation(graph_1, graph_1.a).get_all_variable_dependencies()
{'graph_1/a': set(), 'graph_1/b': {'graph_1/a'}}
hal.do_operation(graph_1, graph_1.b).get_all_variable_dependencies()
{'graph_1/a': set(), 'graph_1/b': set()}

Applied to graph_2, \(P(b)P(a|b)\), the do operation only has an effect if it is applied to the variable \(a\).

graph_2.get_all_variable_dependencies()
{'graph_2/b': set(), 'graph_2/a': {'graph_2/b'}}
hal.do_operation(graph_2, graph_2.a).get_all_variable_dependencies()
{'graph_2/b': set(), 'graph_2/a': set()}
hal.do_operation(graph_2, graph_2.b).get_all_variable_dependencies()
{'graph_2/b': set(), 'graph_2/a': {'graph_2/b'}}

The difference between observation and intervention


You condition a statistical model (\(P(a,b)\) is a statistical model) on an observation by constructing

\(P(unknown | observed)\).

Say you observe \(a=1\). Then you are asking the question:

\(P(b | a=1)\)

“How does my state of information about \(b\) change if I observe \(a=1\)?”
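For the jointly Gaussian \(a\) and \(b\) used throughout this notebook, this conditional is available in closed form. A small numpy sketch (assuming the same covariance as above; not part of the Halerium API) evaluates it:

```python
import numpy as np

cov = np.array([[1.0, 0.7],
                [0.7, 1.0]])  # joint covariance of (a, b)
a_observed = 1.0

# Gaussian conditioning formulas for b given a:
#   E[b | a]   = cov_ba / cov_aa * a
#   Var[b | a] = cov_bb - cov_ba**2 / cov_aa
cond_mean = cov[1, 0] / cov[0, 0] * a_observed
cond_var = cov[1, 1] - cov[1, 0] ** 2 / cov[0, 0]

print(cond_mean, cond_var)  # close to 0.7 and 0.51
```

Observing \(a=1\) thus shifts the expected value of \(b\) to 0.7 and shrinks its variance from 1 to 0.51.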


You change your causal model (\(P(a) P(b|a)\) can be interpreted as a causal model) to reflect an intervention by constructing

\(P(unknown | DO \ intervention)\).

Say you do \(a=1\) (you set it to 1). Then you are asking the question:

\(P(b | DO \ a=1)\)

“What will happen to \(b\) if I set \(a=1\)?”

Minimal example

Let’s see the effect on our two graphs

a_given = 1.

Graph 1


g = graph_1
hal.Predictor(g, data={g.a: [a_given]})(g.b)


g = hal.do_operation(graph_1, graph_1.a)
hal.Predictor(g, data={g.a: [a_given]})(g.b)

For graph 1 there is no difference between intervention and observation because \(a\) is at the beginning of the causal chain.

Graph 2


g = graph_2
hal.Predictor(g, data={g.a: [a_given]})(g.b)


g = hal.do_operation(graph_2, graph_2.a)
hal.Predictor(g, data={g.a: [a_given]})(g.b)

For graph 2 there is a fundamental difference between DO \(a=1\) and observe \(a=1\), because \(a\) is caused by \(b\).

Information from an observation travels backwards along causal directions. Information from an intervention only travels forwards.
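This asymmetry can be reproduced with a plain numpy simulation of graph_2’s causal story, where \(b\) causes \(a\) (a sketch outside Halerium; sample size and seed below are arbitrary): conditioning on \(a \approx 1\) shifts our belief about \(b\), while forcing \(a = 1\) leaves \(b\) untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
rho = 0.7

# causal story of graph_2: b causes a
b = rng.normal(0.0, 1.0, size=n)
a = rho * b + rng.normal(0.0, np.sqrt(1 - rho ** 2), size=n)

# observation: keep only samples where a happens to fall near 1;
# information flows backwards from a to b
observed = b[np.abs(a - 1.0) < 0.05]
print("E[b | a=1]     ~", observed.mean())  # close to rho * 1 = 0.7

# intervention: setting a to 1 by hand does not change b at all;
# information from an intervention only travels forwards
print("E[b | do(a=1)] ~", b.mean())         # close to 0
```

The conditioned mean lands near 0.7, matching the Gaussian conditioning formula, while the intervened mean stays near the prior mean of 0.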

The InterventionPredictor objective

The operational chain

g = hal.do_operation(graph_2, graph_2.a)
hal.Predictor(g, data={g.a: [a_given]})(g.b)

is conveniently accessible with the InterventionPredictor objective.

hal.InterventionPredictor(graph_2,
                          interventions={graph_2.a: [a_given]})(g.b)