Causal Structures - Creation & Training#
In this example we want to illustrate how the CausalStructure class is used. We start with the imports.
[1]:
import numpy as np
import pandas as pd
import pylab as pl
from halerium import CausalStructure
By using this Application, You are agreeing to be bound by the terms and conditions of the Halerium End-User License Agreement that can be downloaded here: https://erium.de/halerium-eula.txt
The artificial data#
We create an artificial data set containing three parameters.
[2]:
n_data = 100
np.random.seed(100)
parameter_a = 5 + np.random.randn(n_data) * 0.1
parameter_b = parameter_a * (-35) + 150 + np.random.randn(n_data) * 1.
parameter_c = parameter_a * 10.5 + parameter_b * (.5) + np.random.randn(n_data) * 0.1
data = pd.DataFrame(data={"(a)": parameter_a,
                          "(b|a)": parameter_b,
                          "(c|a,b)": parameter_c})
data
[2]:
 | (a) | (b\|a) | (c\|a,b) |
---|---|---|---|
0 | 4.825023 | -20.580472 | 40.432953 |
1 | 5.034268 | -27.335642 | 39.101290 |
2 | 5.115304 | -32.008941 | 37.765420 |
3 | 4.974756 | -24.083157 | 40.149657 |
4 | 5.098132 | -28.683511 | 39.198809 |
... | ... | ... | ... |
95 | 5.000302 | -24.361388 | 40.234901 |
96 | 4.992398 | -24.908699 | 39.977586 |
97 | 5.000396 | -23.996587 | 40.410045 |
98 | 4.981499 | -24.952434 | 39.641407 |
99 | 4.751285 | -14.718803 | 42.547324 |
100 rows × 3 columns
We purposely chose column names that cannot be used as Python variable names.
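A quick way to confirm this, as a minimal sketch using only the Python standard library: `str.isidentifier()` reports whether a string is a syntactically valid Python name. The column names are the ones from the data frame above.

```python
# Column names from the artificial data set above.
column_names = ["(a)", "(b|a)", "(c|a,b)"]

# str.isidentifier() is True only for strings that are valid Python names.
valid = {name: name.isidentifier() for name in column_names}
print(valid)
# prints {'(a)': False, '(b|a)': False, '(c|a,b)': False}
```

Such columns therefore have to be referred to by their string names rather than as attributes or variables.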
Creating the causal structure#
Now that we have our data, we can formulate the set of dependencies that define our causal structure.
[3]:
dependencies = [
["(a)", "(b|a)"], # the column '(b|a)' depends on '(a)'
[["(a)", "(b|a)"], "(c|a,b)"], # the column '(c|a,b)' depends on '(a)' and '(b|a)'
]
We then create the causal structure from these dependencies.
[4]:
causal_structure = CausalStructure(dependencies)
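To make the dependency format easier to read, the following sketch turns the list into a mapping from each dependent column to its inputs. The helper `parents_by_target` is hypothetical and not part of Halerium; it only assumes the `[inputs, output]` layout shown above, where the inputs are either a single column name or a list of names.

```python
# Dependencies in the same [inputs, output] format as above.
dependencies = [
    ["(a)", "(b|a)"],
    [["(a)", "(b|a)"], "(c|a,b)"],
]


def parents_by_target(dependencies):
    """Hypothetical helper: map each output column to the list of its input columns."""
    parents = {}
    for inputs, output in dependencies:
        # A single input may be given as a bare string instead of a list.
        if isinstance(inputs, str):
            inputs = [inputs]
        parents.setdefault(output, []).extend(inputs)
    return parents


parents = parents_by_target(dependencies)
print(parents)
# prints {'(b|a)': ['(a)'], '(c|a,b)': ['(a)', '(b|a)']}
```

The column '(a)' does not appear as a key because nothing in the list declares it as dependent on another column.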
Training#
We can directly train the causal_structure using
[5]:
causal_structure.train(data)
What this triggered in the background was actually:
1. causal_structure.scaling_data = data
2. causal_structure.build_graph()
3. causal_structure.train(data)
In the first step, scaling data are provided to the causal structure. The causal structure uses the scaling data to set the locations and scales of the variables it creates. These matter because they allow the a priori statistics to be defined correctly, which in turn provides useful regularization.
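To make "locations and scales" concrete, the sketch below computes the per-column mean and standard deviation of the artificial data and standardizes the columns with them. That the causal structure uses exactly these statistics is an assumption made for illustration only; the text above states merely that the scaling data set the locations and scales.

```python
import numpy as np
import pandas as pd

# Recreate the artificial data set from above.
n_data = 100
np.random.seed(100)
parameter_a = 5 + np.random.randn(n_data) * 0.1
parameter_b = parameter_a * (-35) + 150 + np.random.randn(n_data) * 1.
parameter_c = parameter_a * 10.5 + parameter_b * 0.5 + np.random.randn(n_data) * 0.1
data = pd.DataFrame({"(a)": parameter_a,
                     "(b|a)": parameter_b,
                     "(c|a,b)": parameter_c})

# Assumed stand-ins for "location" and "scale": column mean and standard deviation.
locations = data.mean()
scales = data.std()

# After standardization every column has mean 0 and standard deviation 1,
# even though the raw columns live on very different scales.
standardized = (data - locations) / scales
print(pd.DataFrame({"location": locations, "scale": scales}))
```

Without such a rescaling, a single default prior width would be far too wide for '(a)' and possibly too narrow for '(b|a)', which is one plausible reason the scaling data matter for regularization.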
In the second step the graph is built using default arguments. Each dependency will be mathematically modelled as a quadratic regression with unknown variance.
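As a rough illustration of what a quadratic regression with unknown variance means for a single dependency, the sketch below fits '(b|a)' as a second-order polynomial in '(a)' by ordinary least squares with NumPy and estimates the noise standard deviation from the residuals. This is a classical point-estimate stand-in, not Halerium's training procedure, and the variable names are chosen here for illustration.

```python
import numpy as np

# Recreate the first two columns of the artificial data set from above.
n_data = 100
np.random.seed(100)
a = 5 + np.random.randn(n_data) * 0.1
b = a * (-35) + 150 + np.random.randn(n_data) * 1.

# Centre the input so the quadratic fit is numerically well conditioned.
a_centred = a - a.mean()

# Least-squares fit; np.polyfit returns coefficients highest degree first.
beta_2, beta_1, beta_0 = np.polyfit(a_centred, b, deg=2)

# The variance is not given, so it is estimated from the residuals.
# Three fitted coefficients use up three degrees of freedom.
residuals = b - np.polyval([beta_2, beta_1, beta_0], a_centred)
sigma = np.sqrt(np.sum(residuals**2) / (n_data - 3))

print(f"linear term: {beta_1:.2f}, quadratic term: {beta_2:.2f}, noise std: {sigma:.2f}")
```

Because the data were generated from a linear relation with unit noise, the linear term should come out near -35 and the noise estimate near 1, while the quadratic term is poorly constrained by the narrow range of '(a)'.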
In the third step the graph is trained using default arguments.
In the next section we will use the trained causal structure to make predictions.