Causal Structures - Creation & Training

This example illustrates how to create and train a CausalStructure. We start with the imports.

[1]:
import numpy as np
import pandas as pd
import pylab as pl

from halerium import CausalStructure

By using this Application, You are agreeing to be bound by the terms and conditions of the Halerium End-User License Agreement that can be downloaded here: https://erium.de/halerium-eula.txt

The artificial data

We create an artificial data set with three parameters a, b, and c, in which b depends on a and c depends on both a and b.
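Written as equations, the generative model simulated in the next cell is the following (all noise terms are independent). Substituting the equation for b into the one for c is a useful check of what the data contain:

```latex
\begin{aligned}
a &\sim \mathcal{N}\!\left(5,\ 0.1^2\right) \\
b &= -35\,a + 150 + \varepsilon_b, \qquad \varepsilon_b \sim \mathcal{N}(0,\ 1) \\
c &= 10.5\,a + 0.5\,b + \varepsilon_c, \qquad \varepsilon_c \sim \mathcal{N}\!\left(0,\ 0.1^2\right) \\[4pt]
\Rightarrow\; c &= (10.5 - 0.5 \cdot 35)\,a + 0.5 \cdot 150 + 0.5\,\varepsilon_b + \varepsilon_c
                 = -7\,a + 75 + 0.5\,\varepsilon_b + \varepsilon_c
\end{aligned}
```

So a enters the equation for c with coefficient 10.5 (its direct effect), while its total effect on c, including the path through b, is -7.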

[2]:
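n_data = 100
np.random.seed(100)  # fixed seed so the table below is reproducible
# (a): root parameter, normally distributed around 5 with standard deviation 0.1
parameter_a = 5 + np.random.randn(n_data) * 0.1
# (b|a): linear in a plus noise with standard deviation 1
parameter_b = parameter_a * (-35) + 150 + np.random.randn(n_data) * 1.
# (c|a,b): linear in a and b plus noise with standard deviation 0.1
parameter_c = parameter_a * 10.5 + parameter_b * (.5) + np.random.randn(n_data) * 0.1

data = pd.DataFrame(data={"(a)": parameter_a,
                          "(b|a)": parameter_b,
                          "(c|a,b)": parameter_c})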
data
[2]:
         (a)       (b|a)    (c|a,b)
0   4.825023  -20.580472  40.432953
1   5.034268  -27.335642  39.101290
2   5.115304  -32.008941  37.765420
3   4.974756  -24.083157  40.149657
4   5.098132  -28.683511  39.198809
..       ...         ...        ...
95  5.000302  -24.361388  40.234901
96  4.992398  -24.908699  39.977586
97  5.000396  -23.996587  40.410045
98  4.981499  -24.952434  39.641407
99  4.751285  -14.718803  42.547324

100 rows × 3 columns

We deliberately chose column names that are not valid Python variable names.
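Python's built-in str.isidentifier() confirms this for each of the three names; a minimal check:

```python
# Column names from the example. None of them is a valid Python identifier,
# so none could be used as a variable name or for attribute access (data.a).
column_names = ["(a)", "(b|a)", "(c|a,b)"]

for name in column_names:
    print(f"{name!r}: isidentifier() -> {name.isidentifier()}")
```

Since the dependencies refer to columns only by their string names, such labels pose no problem in this example.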

Creating the causal structure

Now that we have our data, we can formulate the set of dependencies that define our causal structure.

[3]:
dependencies = [
    ["(a)", "(b|a)"], # the column '(b|a)' depends on '(a)'
    [["(a)", "(b|a)"], "(c|a,b)"], # the column '(c|a,b)' depends on '(a)' and '(b|a)'
]

Next, we create the causal structure from these dependencies.

[4]:
causal_structure = CausalStructure(dependencies)
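The dependency list follows a simple convention: each entry pairs the parent column(s) with the dependent column, and a single parent may be given as a bare string. The pure-Python sketch below does not use Halerium; the helper to_parent_map is hypothetical and only illustrates the data structure by normalizing the list into a child-to-parents mapping and deriving an order in which every column comes after its parents. How CausalStructure parses the list internally is not shown here.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# The dependency list from the example: each entry is [parents, child],
# where parents is either a single column name or a list of names.
dependencies = [
    ["(a)", "(b|a)"],
    [["(a)", "(b|a)"], "(c|a,b)"],
]

def to_parent_map(deps):
    """Hypothetical helper: normalize [parents, child] pairs into {child: set(parents)}."""
    parents_of = defaultdict(set)
    for parents, child in deps:
        if isinstance(parents, str):
            parents = [parents]  # a single parent given as a bare string
        parents_of[child].update(parents)
        for p in parents:
            parents_of.setdefault(p, set())  # make sure root nodes appear as well
    return dict(parents_of)

parent_map = to_parent_map(dependencies)
# TopologicalSorter takes {node: predecessors} and yields parents before children.
order = list(TopologicalSorter(parent_map).static_order())
print(parent_map)
print(order)
```

If the list contained a cycle, static_order() would raise graphlib.CycleError, because no parents-first order would exist.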

Training

We can train the causal structure directly:

[5]:
causal_structure.train(data)

What this call actually triggered in the background was:

1. causal_structure.scaling_data = data
2. causal_structure.build_graph()
3. causal_structure.train(data)

In the first step, the data are provided to the causal structure as scaling data. The causal structure uses the scaling data to set the locations and scales of the variables it creates. These are important for defining sensible a priori statistics, i.e. useful regularization.
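Why locations and scales matter can be seen from the data themselves: from the generative model, (a) has a standard deviation of about 0.1, while (b|a) has one of about 3.6, so a prior of fixed width cannot suit both raw columns. The sketch below assumes, for illustration only, that location means the column mean and scale means the column standard deviation; Halerium may use different statistics.

```python
import numpy as np
import pandas as pd

# Recreate the artificial data from the example (same seed and formulas).
np.random.seed(100)
n_data = 100
parameter_a = 5 + np.random.randn(n_data) * 0.1
parameter_b = parameter_a * (-35) + 150 + np.random.randn(n_data) * 1.
parameter_c = parameter_a * 10.5 + parameter_b * (.5) + np.random.randn(n_data) * 0.1
data = pd.DataFrame({"(a)": parameter_a, "(b|a)": parameter_b, "(c|a,b)": parameter_c})

# Assumption: location = column mean, scale = column standard deviation.
location = data.mean()
scale = data.std()

# After standardization every column is centred at 0 with unit spread,
# so a single generic prior width becomes a plausible choice for all of them.
standardized = (data - location) / scale
print(pd.DataFrame({"location": location, "scale": scale}))
print(standardized.describe().loc[["mean", "std"]])
```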

In the second step, the graph is built using default arguments. With these defaults, each dependency is modelled mathematically as a quadratic regression with unknown variance.
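To make the model form concrete, here is a sketch for the dependency (c|a,b). It assumes that "quadratic" means a full second-order polynomial in the parents (intercept, linear terms, squares, and pairwise products) with Gaussian noise of unknown standard deviation sigma. It fits this model with ordinary least squares from NumPy; that is not how Halerium trains its graph, and the helper quadratic_features is hypothetical.

```python
import numpy as np
from itertools import combinations_with_replacement

# Recreate the artificial data from the example (same seed and formulas).
np.random.seed(100)
n_data = 100
parameter_a = 5 + np.random.randn(n_data) * 0.1
parameter_b = parameter_a * (-35) + 150 + np.random.randn(n_data) * 1.
parameter_c = parameter_a * 10.5 + parameter_b * (.5) + np.random.randn(n_data) * 0.1

def quadratic_features(*parents):
    """Hypothetical helper: intercept, linear terms, and all squares/pairwise products."""
    cols = [np.ones_like(parents[0])]
    cols += list(parents)
    cols += [x * y for x, y in combinations_with_replacement(parents, 2)]
    return np.column_stack(cols)

# Model for (c|a,b): c = f(a, b) + eps, eps ~ N(0, sigma^2), with sigma unknown.
# Parents are standardized first (cf. the scaling step) to keep the fit well conditioned.
a_s = (parameter_a - parameter_a.mean()) / parameter_a.std()
b_s = (parameter_b - parameter_b.mean()) / parameter_b.std()
X = quadratic_features(a_s, b_s)  # columns: 1, a, b, a^2, a*b, b^2
coef, *_ = np.linalg.lstsq(X, parameter_c, rcond=None)

# Estimate the unknown noise scale from the residuals.
residuals = parameter_c - X @ coef
dof = n_data - X.shape[1]
sigma_hat = np.sqrt(residuals @ residuals / dof)
print("coefficients:", np.round(coef, 3))
print("estimated sigma:", round(sigma_hat, 3))
```

Because the artificial data were generated by a purely linear relation, the fitted squared and product terms should come out close to zero, and the estimated sigma should be close to the true noise level of 0.1.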

In the third step, the graph is trained using default arguments.

In the next section, we will use the trained causal structure to make predictions.