High-level usage - Code Overview#
At a high level, using Halerium involves creating, training, and evaluating causal structures.
Causal Structures#
A CausalStructure
is a collection of dependencies between parameters,
with each parameter being defined by a string. The most convenient way to use
a causal structure is in conjunction with a pandas DataFrame, where the parameter names
of the causal structure match the column names of the DataFrame.
For example:
>>> data = pandas.DataFrame(columns=["a", "b", "c", "d"])
>>> CausalStructure([[{"a", "b"}, {"c", "d"}]])
CausalStructure([[{'a', 'b'}, 'c'],
[{'a', 'b'}, 'd']])
Here the causal structure defines that c and d each depend on a and b.
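The expansion of a set-valued pair into one dependency per target can be sketched in plain Python. The helper expand_dependencies is hypothetical and not part of Halerium; it only mirrors the representation printed in the example, under the assumption that every target depends on every listed feature.

```python
def expand_dependencies(spec):
    """Expand [features, targets] pairs into one (features, target) pair per target.

    Hypothetical helper for illustration only; not part of Halerium.
    """
    expanded = []
    for features, targets in spec:
        # Accept either a single name or a collection of names on both sides.
        feature_set = {features} if isinstance(features, str) else set(features)
        target_list = [targets] if isinstance(targets, str) else sorted(targets)
        for target in target_list:
            # Copy the feature set so the resulting pairs do not share state.
            expanded.append((set(feature_set), target))
    return expanded


pairs = expand_dependencies([[{"a", "b"}, {"c", "d"}]])
for features, target in pairs:
    print(sorted(features), "->", target)
# ['a', 'b'] -> c
# ['a', 'b'] -> d
```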
Internally, a causal structure contains a Dependencies instance.
The causal structure converts its Dependencies instance to the necessary
core objects (namely the Graph) to evaluate various objectives.
Dependencies#
Internally causal structures utilize instances of Dependency
and Dependencies
. These classes manage the dependencies that make up
the causal structure and are in charge of checking that the dependency tree is acyclic, i.e. that following
a chain of dependencies cannot lead back to the first parameter in the chain.
>>> Dependency(feature="a", target="a")
CyclicDependencyError: Cyclic dependency detected for 'a'.
>>> Dependencies([["a", "b"],
["b", "c"],
["c", "a"]])
CyclicDependencyError: Cyclic dependency detected for {'b'}.
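The acyclicity check described above can be illustrated with a depth-first search over (feature, target) pairs. This is a minimal sketch of the general technique, not Halerium's implementation; the function find_cycle and its return format are assumptions made for this example.

```python
def find_cycle(dependencies):
    """Return a list of parameters forming a cycle, or None if there is none.

    `dependencies` is a list of (feature, target) name pairs.
    Hypothetical helper for illustration only; not Halerium's implementation.
    """
    # Map each parameter to the parameters that directly depend on it.
    children = {}
    for feature, target in dependencies:
        children.setdefault(feature, []).append(target)
        children.setdefault(target, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {name: WHITE for name in children}

    def visit(name, path):
        state[name] = GREY
        path.append(name)
        for child in children[name]:
            if state[child] == GREY:
                # The chain has led back to a parameter already on the path.
                return path[path.index(child):] + [child]
            if state[child] == WHITE:
                cycle = visit(child, path)
                if cycle:
                    return cycle
        path.pop()
        state[name] = BLACK
        return None

    for name in children:
        if state[name] == WHITE:
            cycle = visit(name, [])
            if cycle:
                return cycle
    return None


print(find_cycle([("a", "b"), ("b", "c")]))              # None
print(find_cycle([("a", "b"), ("b", "c"), ("c", "a")]))  # ['a', 'b', 'c', 'a']
print(find_cycle([("a", "a")]))                          # ['a', 'a']
```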
The user does not have to create dependencies explicitly. The dependencies argument to the __init__
of the CausalStructure class is used to create the Dependencies instance automatically.
The user can, however, also create the Dependencies instance themselves:
>>> dependencies = Dependencies([[{"a", "b"}, {"c", "d"}]])
>>> causal_structure = CausalStructure(dependencies)
>>> causal_structure
CausalStructure([[{'a', 'b'}, 'c'],
[{'a', 'b'}, 'd']])
Basic Methods#
The most important methods of the CausalStructure
class are the following:
train()
: This method trains the causal structure (or rather
its internal Graph
) with a training data set. After training the
causal structure can be used to make predictions or to evaluate objectives.
>>> data = pandas.DataFrame(columns=["a", "b", "c", "d"],
>>> data=[[0, 0, 0, 0],
>>> [1, 0, 1, 2],
>>> [0, 1, -2, 1],
>>> [1, 1, -1, 3]])
>>> causal_structure.train(data)
predict()
: This method makes a prediction using the
internal trained graph and an input data set.
>>> data_in = pandas.DataFrame(columns=["a", "b"],
>>> data=[[ -1, -1],
>>> [0.5, 0.5]])
>>> causal_structure.predict(data_in)
a b c d
0 -1.0 -1.0 0.976529 -2.953433
1 0.5 0.5 -0.476481 1.490052
evaluate_objective()
: This method evaluates an objective
class using the internal trained graph and additional arguments to the
objective class. See the Objectives section.
Advanced Methods#
The CausalStructure
class offers a number of advanced methods
that allow the user to influence how the internal graph is built or to
modify and/or utilize the graph with the core package.
The most important of these methods are the following:
build_graph()
: This method converts the dependencies into
a Graph
instance (see the core-package for details).
The method is automatically called when the get_graph()
or
train()
methods are called. With the explicit call the
user can modify the build arguments.
get_graph()
: This method returns the Graph
instance
that was created from the dependencies. If no graph has been built yet,
build_graph()
is triggered first.
The user can modify the returned graph in-place using the
core package. Alternatively, a modified graph can be used to replace
the graph
attribute.
get_trained_graph()
: This method returns the Graph
instance
that was created by the train()
method.
If no training has taken place yet, an Exception is raised.
The user can modify the returned graph in-place using the
core package. Alternatively, a modified graph can be used to replace
the trained_graph
attribute.
get_data_linker()
: This method creates a
DataLinker
instance (see the core-package documentation
for details) compatible with the internal graph from a provided data set.
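The call order described for these methods (an explicit or implicit build, then training, then access to the trained graph) can be illustrated with a toy class. LazyStructure is a hypothetical stand-in written for this sketch: it stores plain dictionaries rather than a core Graph, and the use of RuntimeError and of the build argument some_option are assumptions, since the text only states that an Exception is raised and does not list the build arguments.

```python
class LazyStructure:
    """Toy stand-in that mimics the build/get/train call order described above.

    Hypothetical class for illustration; the real CausalStructure builds a
    core Graph instead of the placeholder dictionaries used here.
    """

    def __init__(self, dependencies):
        self.dependencies = dependencies
        self.graph = None
        self.trained_graph = None

    def build_graph(self, **build_args):
        # An explicit call lets the caller pass custom build arguments.
        self.graph = {"dependencies": self.dependencies, "build_args": build_args}
        return self.graph

    def get_graph(self):
        # Build with default arguments only if no graph exists yet.
        if self.graph is None:
            self.build_graph()
        return self.graph

    def train(self, data):
        # Training implicitly triggers the build through get_graph().
        graph = self.get_graph()
        self.trained_graph = dict(graph, n_rows=len(data))

    def get_trained_graph(self):
        if self.trained_graph is None:
            raise RuntimeError("train() must be called first")
        return self.trained_graph


structure = LazyStructure([("a", "c")])
structure.build_graph(some_option=True)      # hypothetical build argument
print(structure.get_graph()["build_args"])   # {'some_option': True}
```

Because get_graph() only builds when no graph exists, an earlier explicit build_graph() call with custom arguments is preserved in this sketch.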
Examples#
The CausalStructure
, Dependency
and Dependencies
classes are further explained
in the following examples:
Real data applications of the CausalStructure
class are shown
in the following examples:
Objectives#
Objectives are special classes that answer specific questions.
The answer is based on the trained graph and the additional arguments
to the objective (e.g. data).
After the causal structure has been trained with the
train()
method, objectives can be evaluated by
calling the evaluate_objective()
method. The first argument to the evaluate_objective()
method is the objective class. The available classes are:
Predictor
: The predictor answers the question
what the values of all parameters could be given the values
of a subset of the parameters as data.
>>> causal_structure.evaluate_objective(Predictor, data=data_in, measure="mean")
a b c d
0 -1.0 -1.0 0.976529 -2.953433
1 0.5 0.5 -0.476481 1.490052
>>> causal_structure.evaluate_objective(Predictor, data=data_in, measure="std")
a b c d
0 0.0 0.0 9.443231 10.300908
1 0.0 0.0 0.875485 0.982973
Evaluator
: The evaluator answers the question
how well the predictions perform on a test data set.
>>> data_test = pandas.DataFrame(columns=["a", "b", "c", "d"],
>>> data=[[-1, -1, 1, -3],
>>> [ 0, -1, 2, -1],
>>> [-1, 0, -1, -2],
>>> [ 2, 1, 0, 5],
>>> [ 1, 2, -3, 4]])
>>> causal_structure.evaluate_objective(Evaluator, data=data_test,
>>> inputs=["a", "b"], metric="r2")
{'a': None, 'b': None, 'c': 0.9997590212263663, 'd': 0.9998194827898427}
InfluenceEstimator
: The influence estimator answers the question
how much a certain target is influenced by the other parameters.
>>> causal_structure.evaluate_objective(InfluenceEstimator, target="d")
{'a': 0.7127661538640915, 'b': 0.4127766942443396, 'c': 0.0, 'd': 1.0}
OutlierDetector
: The outlier detector answers the question which
data points in a given data set are outliers
(i.e. are very incompatible with the trained graph).
>>> data_test = pandas.DataFrame(columns=["a", "b", "c", "d"],
>>> data=[[1.5, 1.0, -0.5, 4.0],
>>> [1.5, 1.0, -0.5, 40.0]])
>>> causal_structure.evaluate_objective(OutlierDetector, data=data_test)
a b c d graph
0 1.0 0.0 0.0 0.0 0.0
1 1.0 0.0 1.0 1.0 1.0
RankEstimator
: The rank estimator is the continuous analogue of
the outlier detector. It answers the question of how many comparison data
points would be more likely than the data point in question.
>>> causal_structure.evaluate_objective(RankEstimator, data=data_test)
a b c d graph
0 0.04 0.31 0.60 0.52 0.22
1 0.04 0.31 0.00 0.00 0.00
ProbabilityEstimator
: The probability estimator answers the
question of what the logarithmic probability density of the data
point in question is.
>>> causal_structure.evaluate_objective(ProbabilityEstimator, data=data_test)
a b c d graph
0 -2.225791 -0.725791 -2.117248 -2.287631 -7.388386
1 -2.225791 -0.725791 -2.117248 -997.881033 -1222.759525
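The Evaluator example above reports one score per parameter for metric="r2". Assuming that "r2" denotes the standard coefficient of determination (the text does not define it), the metric can be sketched as follows. The function r2_score and the numbers are made up for illustration and do not reproduce Halerium's computation or its output.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Illustrative helper; assumes "r2" means the standard definition.
    """
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares between observations and predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up numbers for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(r2_score(observed, predicted))  # 0.8
```

A score of 1.0 corresponds to perfect predictions, and a score of 0.0 corresponds to always predicting the mean of the observed values.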
To answer questions that are not covered by these objective classes, the user has to use the low-level functionality of the core package.
Examples#
The objectives are used with the CausalStructure
class
in the following examples: