Causal Structures - Creation & Training

In this example, we illustrate how the CausalStructure class is used. We start with the imports.

[1]:

import numpy as np
import pandas as pd
import pylab as pl

from halerium import CausalStructure


By using this Application, You are agreeing to be bound by the terms and conditions of the Halerium End-User License Agreement that can be downloaded here: https://erium.de/halerium-eula.txt

The artificial data

We create an artificial data set with 100 samples of three parameters: a, b depending on a, and c depending on both a and b.

[2]:

n_data = 100
np.random.seed(100)
parameter_a = 5 + np.random.randn(n_data) * 0.1
parameter_b = parameter_a * (-35) + 150 + np.random.randn(n_data) * 1.
parameter_c = parameter_a * 10.5 + parameter_b * (.5) + np.random.randn(n_data) * 0.1

data = pd.DataFrame(data={"(a)": parameter_a,
                          "(b|a)": parameter_b,
                          "(c|a,b)": parameter_c})
data

[2]:

	(a)	(b|a)	(c|a,b)
0	4.825023	-20.580472	40.432953
1	5.034268	-27.335642	39.101290
2	5.115304	-32.008941	37.765420
3	4.974756	-24.083157	40.149657
4	5.098132	-28.683511	39.198809
...	...	...	...
95	5.000302	-24.361388	40.234901
96	4.992398	-24.908699	39.977586
97	5.000396	-23.996587	40.410045
98	4.981499	-24.952434	39.641407
99	4.751285	-14.718803	42.547324

100 rows × 3 columns
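For reference, the code cell above samples from the following generating process, where the noise terms are independent standard normal draws:

```latex
\begin{aligned}
a &= 5 + 0.1\,\varepsilon_a, \\
b &= -35\,a + 150 + \varepsilon_b, \\
c &= 10.5\,a + 0.5\,b + 0.1\,\varepsilon_c,
\end{aligned}
\qquad \varepsilon_a, \varepsilon_b, \varepsilon_c \sim \mathcal{N}(0, 1).
```

Substituting b into c gives c = -7a + 75 + 0.5 ε_b + 0.1 ε_c, so for a ≈ 5 the values of c cluster around 40, as seen in the table.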

We deliberately chose column names that are not valid Python variable names.
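Such names work fine as pandas column labels; they only rule out attribute-style access, so the columns have to be selected with brackets. A quick check with str.isidentifier on a small frame built from the first two rows of the table:

```python
import pandas as pd

# First two rows of the artificial data, copied from the table above
df = pd.DataFrame({"(a)": [4.825023, 5.034268],
                   "(b|a)": [-20.580472, -27.335642]})

# None of these column names is a valid Python identifier ...
print([name.isidentifier() for name in df.columns])  # [False, False]

# ... so something like df.(b|a) is a syntax error;
# bracket indexing with the full name works as usual.
b_values = df["(b|a)"]
print(b_values.tolist())
```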

Creating the causal structure

Now that we have our data, we can formulate the set of dependencies that define our causal structure.

[3]:

dependencies = [
    ["(a)", "(b|a)"], # the column '(b|a)' depends on '(a)'
    [["(a)", "(b|a)"], "(c|a,b)"], # the column '(c|a,b)' depends on '(a)' and '(b|a)'
]
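Each entry of the dependency list reads as a pair [inputs, output], where inputs is either a single column name or a list of names (our reading of the comments above). As an illustration, the hypothetical helper parents_from_dependencies below, which is not part of Halerium, turns that list into a parent map and uses Python's standard graphlib module (Python 3.9+) to derive an order in which the columns can be evaluated; a cyclic set of dependencies would raise graphlib.CycleError.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

dependencies = [
    ["(a)", "(b|a)"],
    [["(a)", "(b|a)"], "(c|a,b)"],
]

def parents_from_dependencies(deps):
    """Hypothetical helper: map each output column to its list of input columns."""
    parents = {}
    for inputs, output in deps:
        # A single input may be given as a plain string instead of a list.
        if isinstance(inputs, str):
            inputs = [inputs]
        parents.setdefault(output, []).extend(inputs)
    return parents

parents = parents_from_dependencies(dependencies)
print(parents)  # {'(b|a)': ['(a)'], '(c|a,b)': ['(a)', '(b|a)']}

# static_order() yields every column after all of its inputs.
order = list(TopologicalSorter(parents).static_order())
print(order)  # ['(a)', '(b|a)', '(c|a,b)']
```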

and create the causal structure.

[4]:

causal_structure = CausalStructure(dependencies)

Training

We can train the causal structure directly with a single call:

[5]:

causal_structure.train(data)

In the background, this call actually performed three steps:

1. causal_structure.scaling_data = data
2. causal_structure.build_graph()
3. causal_structure.train(data)

In the first step, scaling data are provided to the causal structure. The causal structure uses them to set the locations and scales of the variables it creates. These locations and scales are needed to define sensible a priori statistics (prior distributions), i.e. useful regularization.
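A minimal sketch of what locations and scales could look like, assuming location = column mean and scale = column standard deviation. This is not Halerium's code, and it may well use other estimators; the point is only that after this transformation every column lives on a comparable, unit-free scale, so a generic prior on the standardized quantities is reasonable regardless of the original units.

```python
import numpy as np
import pandas as pd

# Regenerate the artificial data from the example above.
np.random.seed(100)
n_data = 100
a = 5 + np.random.randn(n_data) * 0.1
b = a * (-35) + 150 + np.random.randn(n_data) * 1.
c = a * 10.5 + b * .5 + np.random.randn(n_data) * 0.1
data = pd.DataFrame({"(a)": a, "(b|a)": b, "(c|a,b)": c})

# Assumed choice: location = mean, scale = (sample) standard deviation per column.
location = data.mean()
scale = data.std()

# Standardized columns have mean 0 and standard deviation 1, so e.g. a
# N(0, 1) prior on them is neither absurdly narrow nor absurdly wide.
standardized = (data - location) / scale
print(pd.DataFrame({"location": location, "scale": scale}))
```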

In the second step, the graph is built using default arguments. Each dependency is modelled mathematically as a quadratic regression with unknown variance.
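To make "quadratic regression with unknown variance" concrete, here is a plain NumPy least-squares analogue. It is a sketch under our own assumptions, not Halerium's implementation: we assume "quadratic" means intercept, linear, squared and pairwise cross terms of the inputs, and we simply estimate the unknown variance from the residuals, whereas the causal structure presumably assigns it a prior and learns it during training. The helper names quadratic_features and fit_quadratic_regression are hypothetical.

```python
import itertools

import numpy as np

def quadratic_features(Z):
    """Intercept, linear, squared and pairwise cross terms of the columns of Z (shape (n, k))."""
    n, k = Z.shape
    columns = [np.ones(n)]
    columns += [Z[:, i] for i in range(k)]
    columns += [Z[:, i] * Z[:, j]
                for i, j in itertools.combinations_with_replacement(range(k), 2)]
    return np.column_stack(columns)

def fit_quadratic_regression(X, y):
    """Least-squares fit of y on quadratic features of X plus a residual-variance estimate."""
    # Standardizing the inputs improves conditioning; it does not change the
    # fitted values because the quadratic feature space spans the same functions.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    F = quadratic_features(Z)
    coef, *_ = np.linalg.lstsq(F, y, rcond=None)
    residuals = y - F @ coef
    dof = len(y) - F.shape[1]
    variance = residuals @ residuals / dof  # estimate of the unknown noise variance
    return coef, variance

# Regenerate the artificial data and fit both dependencies.
np.random.seed(100)
n_data = 100
a = 5 + np.random.randn(n_data) * 0.1
b = a * (-35) + 150 + np.random.randn(n_data) * 1.
c = a * 10.5 + b * .5 + np.random.randn(n_data) * 0.1

coef_b, var_b = fit_quadratic_regression(a[:, None], b)             # (b|a)
coef_c, var_c = fit_quadratic_regression(np.column_stack([a, b]), c)  # (c|a,b)
print(np.sqrt(var_b), np.sqrt(var_c))
```

Since the true relations are linear and therefore contained in the quadratic model, the two printed noise levels should land close to the values used to generate the data (1.0 for b and 0.1 for c).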

In the third step, the graph is trained using default arguments.

In the next section, we will use the trained causal structure to make predictions.