.. currentmodule:: halerium

================================
High-level usage - Code Overview
================================

On a high level, the usage of Halerium involves the creation, training, and
evaluation of causal structures.

Causal Structures
=================

A :class:`~CausalStructure` is a collection of dependencies between parameters,
with each parameter being identified by a string.
The most convenient way to use a causal structure is in conjunction with a pandas
DataFrame, where the parameter names of the causal structure match the column
names of the DataFrame.

For example: ::

    >>> import pandas
    >>> from halerium import CausalStructure
    >>> data = pandas.DataFrame(columns=["a", "b", "c", "d"])
    >>> CausalStructure([[{"a", "b"}, {"c", "d"}]])
    CausalStructure([[{'a', 'b'}, 'c'], [{'a', 'b'}, 'd']])

Here the causal structure defines that c and d each depend on a and b.

Internally, a causal structure contains a :class:`~core.Dependencies` instance.
The causal structure converts its :class:`~core.Dependencies` instance to the necessary
core objects (namely the :class:`~core.Graph`) to evaluate various objectives.

Dependencies
------------

Internally, causal structures utilize instances of :class:`~core.Dependency` and
:class:`~core.Dependencies`. These classes manage the dependencies that make up the
causal structure and are in charge of checking that the dependency graph is acyclic,
i.e. that following a chain of dependencies cannot lead back to the first parameter
in the chain. ::

    >>> from halerium.core import Dependency, Dependencies
    >>> Dependency(feature="a", target="a")
    CyclicDependencyError: Cyclic dependency detected for 'a'.
    >>> Dependencies([["a", "b"], ["b", "c"], ["c", "a"]])
    CyclicDependencyError: Cyclic dependency detected for {'b'}.

The user does not have to create dependencies explicitly. The ``dependencies`` argument
to the ``__init__`` of the :class:`~CausalStructure` class
is used to create the :class:`~core.Dependencies` instance automatically.
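
The acyclicity check described in this section can be illustrated without Halerium.
The following is a minimal sketch in plain Python and not the implementation used by
:class:`~core.Dependencies`. The helper names ``expand`` and ``find_cycle_member`` are
hypothetical, and the expansion of set-valued entries only mirrors the behaviour
visible in the ``CausalStructure`` example output.

.. code-block:: python

    from collections import defaultdict
    from itertools import product


    def expand(dependencies):
        """Expand [features, targets] entries into (feature, target) pairs.

        Hypothetical helper; a bare string stands for a single parameter name.
        """
        pairs = []
        for features, targets in dependencies:
            features = {features} if isinstance(features, str) else set(features)
            targets = {targets} if isinstance(targets, str) else set(targets)
            pairs.extend(product(sorted(features), sorted(targets)))
        return pairs


    def find_cycle_member(pairs):
        """Return a parameter that lies on a cycle, or None if there is none."""
        children = defaultdict(set)
        for feature, target in pairs:
            children[feature].add(target)

        UNVISITED, ON_PATH, DONE = 0, 1, 2
        state = defaultdict(int)  # every parameter starts as UNVISITED

        def visit(node):
            state[node] = ON_PATH
            for child in sorted(children[node]):
                if state[child] == ON_PATH:  # the chain leads back onto itself
                    return child
                if state[child] == UNVISITED:
                    found = visit(child)
                    if found is not None:
                        return found
            state[node] = DONE
            return None

        for node in sorted(children):
            if state[node] == UNVISITED:
                found = visit(node)
                if found is not None:
                    return found
        return None


    # c and d each depend on a and b: no chain leads back to its start.
    print(find_cycle_member(expand([[{"a", "b"}, {"c", "d"}]])))  # None

    # a -> b -> c -> a closes a loop, so some parameter on it is reported.
    print(find_cycle_member(expand([["a", "b"], ["b", "c"], ["c", "a"]])))

Which parameter of a cycle the sketch reports depends on the traversal order and need
not match the parameter named in a ``CyclicDependencyError`` message.
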
The user can, however, also create the :class:`~core.Dependencies` instance themselves: ::

    >>> dependencies = Dependencies([[{"a", "b"}, {"c", "d"}]])
    >>> causal_structure = CausalStructure(dependencies)
    >>> causal_structure
    CausalStructure([[{'a', 'b'}, 'c'], [{'a', 'b'}, 'd']])

Basic Methods
-------------

The most important methods of the :class:`~CausalStructure` class are the following:

:meth:`~CausalStructure.train`:
    This method trains the causal structure (or rather its internal :class:`~core.Graph`)
    with a training data set. After training, the causal structure can be used
    to make predictions or to evaluate objectives. ::

        >>> data = pandas.DataFrame(columns=["a", "b", "c", "d"],
        ...                         data=[[0, 0,  0, 0],
        ...                               [1, 0,  1, 2],
        ...                               [0, 1, -2, 1],
        ...                               [1, 1, -1, 3]])
        >>> causal_structure.train(data)

:meth:`~CausalStructure.predict`:
    This method makes a prediction using the internal trained graph and an input
    data set. ::

        >>> data_in = pandas.DataFrame(columns=["a", "b"],
        ...                            data=[[ -1,  -1],
        ...                                  [0.5, 0.5]])
        >>> causal_structure.predict(data_in)
             a    b         c         d
        0 -1.0 -1.0  0.976529 -2.953433
        1  0.5  0.5 -0.476481  1.490052

:meth:`~CausalStructure.evaluate_objective`:
    This method evaluates an objective class using the internal trained graph
    and additional arguments to the objective class. See the `Objectives`_ section.

Advanced Methods
----------------

The :class:`~CausalStructure` class offers a number of advanced methods that allow the
user to influence how the internal graph is built or to modify and/or utilize the graph
with the :ref:`core package`.

The most important of these methods are the following:

:meth:`~CausalStructure.build_graph`:
    This method converts the dependencies into a :class:`~core.Graph` instance
    (see the core package for details). The method is called automatically when the
    :meth:`~CausalStructure.get_graph` or :meth:`~CausalStructure.train` methods
    are called. By calling it explicitly, the user can modify the build arguments.
:meth:`~CausalStructure.get_graph`:
    This method returns the :class:`~core.Graph` instance that was created from the
    dependencies. If no graph has been built yet, :meth:`~CausalStructure.build_graph`
    is triggered first.
    The user can modify the returned graph in-place using the core package.
    Alternatively, a modified graph can be used to replace the
    :attr:`~CausalStructure.graph` attribute.

:meth:`~CausalStructure.get_trained_graph`:
    This method returns the :class:`~core.Graph` instance that was created by the
    :meth:`~CausalStructure.train` method.
    If no training has taken place yet, an exception is raised.
    The user can modify the returned graph in-place using the core package.
    Alternatively, a modified graph can be used to replace the
    :attr:`~CausalStructure.trained_graph` attribute.

:meth:`~CausalStructure.get_data_linker`:
    This method creates a :class:`~core.DataLinker` instance
    (see the core package documentation for details) that is compatible with the
    internal graph from a provided data set.

Examples
--------

The :class:`CausalStructure`, :class:`~core.Dependency` and :class:`~core.Dependencies`
classes are further explained in the following examples:

.. toctree::
    :maxdepth: 1

    examples/04_causal_structure/01-causal_structure_dependency_basics
    examples/04_causal_structure/02-01-creation_and_training
    examples/04_causal_structure/02-02-prediction

Real data applications of the :class:`CausalStructure` are in the following examples:

.. toctree::
    :maxdepth: 1

    examples/04_causal_structure/03-causal_structures_calschool


.. _highlevel objectives:

Objectives
==========

Objectives are special classes that answer specific questions.
The answer is based on the trained graph and the additional arguments to the objective
(e.g. data).

After the causal structure has been trained with the :meth:`~CausalStructure.train`
method, objectives can be evaluated by calling the
:meth:`~CausalStructure.evaluate_objective` method.
The first argument to the :meth:`~CausalStructure.evaluate_objective` method is the
objective class. The available classes are:

:class:`~Predictor`:
    The predictor answers the question of what the values of all parameters could be,
    given the values of a subset of the parameters as data. ::

        >>> from halerium import Predictor
        >>> causal_structure.evaluate_objective(Predictor, data=data_in, measure="mean")
             a    b         c         d
        0 -1.0 -1.0  0.976529 -2.953433
        1  0.5  0.5 -0.476481  1.490052
        >>> causal_structure.evaluate_objective(Predictor, data=data_in, measure="std")
             a    b         c          d
        0  0.0  0.0  9.443231  10.300908
        1  0.0  0.0  0.875485   0.982973

:class:`~Evaluator`:
    The evaluator answers the question of how well the predictions perform on a
    test data set. ::

        >>> from halerium import Evaluator
        >>> data_test = pandas.DataFrame(columns=["a", "b", "c", "d"],
        ...                              data=[[-1, -1,  1, -3],
        ...                                    [ 0, -1,  2, -1],
        ...                                    [-1,  0, -1, -2],
        ...                                    [ 2,  1,  0,  5],
        ...                                    [ 1,  2, -3,  4]])
        >>> causal_structure.evaluate_objective(Evaluator, data=data_test,
        ...                                     inputs=["a", "b"], metric="r2")
        {'a': None, 'b': None, 'c': 0.9997590212263663, 'd': 0.9998194827898427}

:class:`~InfluenceEstimator`:
    The influence estimator answers the question of how much a certain target is
    influenced by the other parameters. ::

        >>> from halerium import InfluenceEstimator
        >>> causal_structure.evaluate_objective(InfluenceEstimator, target="d")
        {'a': 0.7127661538640915, 'b': 0.4127766942443396, 'c': 0.0, 'd': 1.0}

:class:`~OutlierDetector`:
    The outlier detector answers the question of which data points in a given data set
    are outliers (i.e. are very incompatible with the trained graph). ::

        >>> from halerium import OutlierDetector
        >>> data_test = pandas.DataFrame(columns=["a", "b", "c", "d"],
        ...                              data=[[1.5, 1.0, -0.5,  4.0],
        ...                                    [1.5, 1.0, -0.5, 40.0]])
        >>> causal_structure.evaluate_objective(OutlierDetector, data=data_test)
             a    b    c    d  graph
        0  1.0  0.0  0.0  0.0    0.0
        1  1.0  0.0  1.0  1.0    1.0

:class:`~RankEstimator`:
    The rank estimator is the continuous analogue of the outlier detector.
    It answers the question of how many comparison data points would be more likely
    than the data point in question.
    ::

        >>> from halerium import RankEstimator
        >>> causal_structure.evaluate_objective(RankEstimator, data=data_test)
              a     b     c     d  graph
        0  0.04  0.31  0.60  0.52   0.22
        1  0.04  0.31  0.00  0.00   0.00

:class:`~ProbabilityEstimator`:
    The probability estimator answers the question of what the logarithmic
    probability density of the data point in question is. ::

        >>> from halerium import ProbabilityEstimator
        >>> causal_structure.evaluate_objective(ProbabilityEstimator, data=data_test)
                  a         b         c           d        graph
        0 -2.225791 -0.725791 -2.117248   -2.287631    -7.388386
        1 -2.225791 -0.725791 -2.117248 -997.881033 -1222.759525

To answer questions that are not covered by these objective classes, the user will have
to utilize the low-level functionalities of the :ref:`core package`.

Examples
--------

The objectives are used with the :class:`CausalStructure` class in the following
examples:

.. toctree::
    :maxdepth: 1

    examples/04_causal_structure/02-03-objectives_intro
    examples/04_causal_structure/02-04-evaluation
    examples/04_causal_structure/02-05-outlier_detection
    examples/04_causal_structure/02-06-influence_estimation
    examples/04_causal_structure/02-07-rank_estimation
    examples/04_causal_structure/02-08-probability_estimation
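
As a supplement to the :class:`~Evaluator` example in this section, which requests the
metric ``"r2"``, the following sketch shows how such a score is commonly computed.
It assumes that ``"r2"`` denotes the usual coefficient of determination; this page does
not define the metric, so this is an assumption, and the function ``r2_score`` below is
a hypothetical plain-Python helper rather than part of Halerium.

.. code-block:: python

    def r2_score(y_true, y_pred):
        """Coefficient of determination: 1 - SS_res / SS_tot.

        Hypothetical helper, assuming the standard definition of R^2.
        Requires at least two distinct values in y_true.
        """
        mean = sum(y_true) / len(y_true)
        # Squared error of the predictions.
        ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
        # Squared error of always predicting the mean of the observed values.
        ss_tot = sum((t - mean) ** 2 for t in y_true)
        return 1.0 - ss_res / ss_tot


    observed = [1, 2, 3]
    print(r2_score(observed, [1, 2, 3]))  # 1.0: perfect predictions
    print(r2_score(observed, [2, 2, 2]))  # 0.0: no better than the mean

Under this definition a value close to 1 means the predictions explain almost all of
the variation in the test data.
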