.. currentmodule:: halerium.core

.. _core overview:

===============================
Low-level usage - Code Overview
===============================

This is the code overview for the Halerium core subpackage.
With the core subpackage the user can create and modify Halerium structures.
The objects in the high-level Halerium package, e.g. the :class:`~CausalStructure`,
can be viewed as factories for Halerium core code.

Structures
==========

Halerium structures are built out of graphs, entities and variables.
The building blocks can be nested to build deep and hierarchical structures
in a convenient way with the scoping mechanism.

Scoping
=======

A *scope* is the context in a Halerium structure.
Scopes are managed via entering and exiting Python `with` blocks.

A *scopetor* is an object that can provide such a scope.
The Halerium scopetor classes are :class:`~Graph`, :class:`~Entity`,
:class:`~Variable`, and :class:`~StaticVariable`.

A *scopee* is an object that becomes a *child* of the scope in which it is created.
All scopetors are also scopees, as are all operators.

For example:

::

    >>> with Entity("s") as s:
    ...     e = Entity("e")
    >>> print(e.scope)

More details can be found in

.. toctree::
   :maxdepth: 1

   scope

Graphs
======

A :class:`~Graph` instance can contain other :class:`~Graph` instances as well as
:class:`~Entity`, :class:`~Variable`, and :class:`~StaticVariable` instances as children.
Additionally, it has the special children ``inputs`` and ``outputs``, which can only
contain :class:`~Entity` instances.

In the Halerium platform, graphs can be displayed for interactive inspection
with the :func:`~.show` method, see

.. toctree::
   :maxdepth: 1

   gui

Entities
========

An :class:`~Entity` instance can contain other :class:`~Entity` instances as well as
:class:`~Variable` and :class:`~StaticVariable` instances as children.

Variables
=========

Dynamic Variables
-----------------

The shape of dynamic variables scales with the amount of data.
A dynamic variable with a shape of ``(3, 5)`` will correspond to an array of shape
``(11, 3, 5)`` in a model with ``n_data=11``.

A :class:`~Variable` instance can contain other :class:`~Variable` instances as well as
:class:`~StaticVariable` instances as children.

Static Variables
----------------

The shape of static variables does not scale with the amount of data.
Their value is therefore universal, which makes them suitable for parameters
that are to be learned from a set of training data.

A :class:`~StaticVariable` instance can only contain other :class:`~StaticVariable`
instances as children.

Printing child trees
====================

The function :func:`~print_child_tree` can be applied to any scopetor
to see the scopetor's child tree.

::

    >>> e = Entity("e")
    >>> with e:
    ...     ee = Entity("ee")
    ...     with ee:
    ...         Variable("v")
    ...     Variable("w")
    >>> print_child_tree(e)
    e
    ├─ee
    │ └─v
    └─w

Operations
==========

Mathematical operations can be conveniently created with the functions in the
:mod:`halerium.core.math` module.
All math functions are available at the top-module level, e.g.

::

    halerium.core.exp(halerium.core.constant(1.))
    # equivalent to
    halerium.core.math.exp(halerium.core.math.constant(1.))

For basic arithmetic, use the overloaded Python operators ``+``, ``-``, ``*``, ``/``, ``**``.
``abs()`` and ``numpy``-style slicing are also supported.

The math functions are designed to mimic their numpy counterparts as closely as possible.

Floats and numpy arrays are automatically cast to :class:`~operator.Const` operators
when included in a Halerium operation, e.g.

::

    halerium.core.exp(1.)
    # equivalent to
    halerium.core.exp(halerium.core.constant(1.))

Mathematical operations create *operators*, which are scopees.
See their documentation in the :mod:`halerium.core.operator` module.
All operators that may be used when defining Halerium structures are accessible therein, e.g.

::

    halerium.core.operator.Add(1., 1.)

Printing operand trees
----------------------

The function :func:`~print_operand_tree` can be applied to any operator
to see the operators that lead to it.

::

    >>> import halerium.core as hal
    >>> a = hal.constant(0.)
    >>> b = hal.constant(1.)
    >>> c = a + b
    >>> d = c * a
    >>> print_operand_tree(d)
    ├─
    │ ├─
    │ └─
    └─

Links
=====

Links connect entities or variables. A link is created by calling the function

::

    halerium.core.link(source, target)

within the scope of a graph.

If ``target`` and ``source`` are variables, ``target`` will refer to ``source``.
If they are entities, the ``target`` entity's variables will refer to the
``source`` entity's variables.

A typical use of a link is to connect an output entity of one graph
to the input entity of another graph.

The full set of rules defining valid ``source`` and ``target`` pairs can be found in

.. toctree::
   :maxdepth: 1

   links

Data
====

Data can be linked to :class:`~Variable` or :class:`~StaticVariable` instances
in a Halerium structure.
To link data, provide a dictionary with the variables as keys and numpy arrays as values
as the ``data`` argument of, e.g., a model factory.

::

    data={graph.var1: np.zeros((10, 4)),
          graph.var2: np.zeros((10, 2, 3)),
          graph.static_var_1: np.zeros((3,))}
    model = get_generative_model(graph=graph, data=data)

Alternatively, a `DataLinker` instance can be created from a data dictionary
with :func:`~get_data_linker`

::

    dl = get_data_linker(data={graph.var1: np.zeros((10, 4)),
                               graph.var2: np.zeros((10, 2, 3)),
                               graph.static_var_1: np.zeros((3,))})

The `DataLinker` instance is the explicit representation of the data links.
It too can be provided as the ``data`` argument.

::

    model = get_generative_model(graph=graph, data=dl)

Models
======

Creating Models
---------------

Models can only be created from Graph instances.
Models are created by combining a Halerium graph with data.
Models implement a specific algorithm/solution strategy in order to do
actual numerical calculations.
The common way to get a model instance is by calling either
:func:`~get_generative_model`, :func:`~get_posterior_model`, or :func:`~get_optimizer_model`.

:func:`~get_generative_model` will return an instance of :class:`~model.ForwardModel`.
This model is purely for generating data in a feed-forward fashion.
Information from data only flows forwards along the dependencies of the structure.

:func:`~get_posterior_model` will return an instance of either
:class:`~model.MAPFisherModel`, :class:`~model.MAPModel`, :class:`~model.ADVIModel`,
or :class:`~model.MGVIModel`.
These models calculate estimates for the variables in the structure that take all data
and all possible directions of information flow into account.
Which model class (and therefore solution strategy) is used depends on the
keyword argument ``method``.

:func:`~.get_optimizer_model` will return an instance of :class:`~model.ForwardOptimizerModel`.
This model is used to calculate the optimal values for a set of variables
that minimize a cost function that depends on a set of (different) variables in the graph.

Evaluating Models
-----------------

Models need to be solved with their ``solve`` method before they can be evaluated.
By default, models created using the ``get_..._model`` functions come in a trained state.

A trained model can

- generate a single sample of variables with the ``get_example`` method
- generate samples of variables with the ``get_samples`` method
- calculate the mean of variables with the ``get_means`` method
- calculate the standard deviation of variables with the ``get_standard_deviations`` method
- calculate the variance of variables with the ``get_variances`` method.

From posterior models you can also extract a posterior graph by calling the
``get_posterior_graph`` method.
The posterior graph will contain updated probability distributions for all
:class:`~StaticVariable` instances in its graph.

Further explanations on creating and solving/training models can be found in

.. toctree::
   :maxdepth: 1

   examples/01_introduction/04_more_on_training_models

Training Graphs
---------------

To directly get a posterior graph from the combination of a :class:`~Graph` instance
and data, the :class:`~Trainer` class can be used.
Upon instantiation, the class will create and solve a model.
When called, the Trainer returns the posterior graph.

::

    >>> trainer = Trainer(graph=graph, data=train_data)
    >>> trained_graph = trainer()

For a more detailed example see

.. toctree::
   :maxdepth: 1

   examples/01_introduction/03_trainer

To understand the mathematical background of the training process see

.. toctree::
   :maxdepth: 1

   examples/01_introduction/05_what_happens_during_training

Objectives
==========

The objective classes introduced in the :ref:`main package overview` can also be used
on the core level.
Here the objective class is instantiated with a (trained) graph instance and
additional arguments like data.
When the instantiated objective is called, the objective result is returned
for the provided scopetors.

::

    >>> predictor = Predictor(graph=graph, data=prediction_input_data)
    >>> predictor(graph.y)
    array([1., 2., 3., 4.])

The available objectives are

- :class:`~halerium.Predictor`
- :class:`~halerium.Evaluator`
- :class:`~halerium.InfluenceEstimator`
- :class:`~halerium.OutlierDetector`
- :class:`~halerium.RankEstimator`
- :class:`~halerium.ProbabilityEstimator`

They are explained in detail in

.. toctree::
   :maxdepth: 1

   examples/02_objectives/01_probability_estimator
   examples/02_objectives/02_influence_estimator
   examples/02_objectives/03_predictor
   examples/02_objectives/04_evaluator
   examples/02_objectives/05_rank_estimator
   examples/02_objectives/06_outlier_detector

Distributions
=============

Every :class:`~StaticVariable` or :class:`~Variable` in a Halerium structure has an
underlying probability distribution.
By default this is the :class:`~distribution.NormalDistribution`.
The user can create variables with different types of distributions by providing the
distribution class as the ``distribution`` argument when creating a variable.

::

    v = Variable(name="v", distribution=LogNormalDistribution)

Each distribution class supports different defining parameters.
For the :class:`~distribution.NormalDistribution` these are ``mean`` and ``variance``,
which are commonly used with Halerium variables with the default distribution.
For the :class:`~distribution.LogNormalDistribution` used in the example above
they are ``mean_log`` and ``variance_log``.

::

    v.mean_log = 0.
    v.variance_log = 1.

The Halerium distributions are explained in more detail in

.. toctree::
   :maxdepth: 1

   examples/01_introduction/06_distributions

Below is a short overview of the available distributions and their parameters.

Supported distributions
-----------------------

Currently, Halerium supports the following distributions:

- :class:`~distribution.NormalDistribution` with the parameters ``mean`` and ``variance``,
- :class:`~distribution.LogNormalDistribution` with the parameters ``mean_log`` and ``variance_log``,
- :class:`~distribution.UniformDistribution` with the parameters ``center`` and ``width``,
- :class:`~distribution.BernoulliDistribution` with the parameters ``logit`` and ``mean``,
- :class:`~distribution.DiracDistribution` with the parameter ``mean``.

Furthermore, a variable can have the :class:`~distribution.NoDistribution`,
in which case it is not a random variable at all, but rather acts as a placeholder for data.
Consequently, a variable with the :class:`~distribution.NoDistribution` has to be
completely determined by data when creating a model.

Regression factories
====================

Regression factories help the user to connect :class:`~Variable` instances by
parametrized mathematical formulas with the parameters being :class:`~StaticVariable` instances.
Currently, Halerium provides the following factories for creating the static variables
and creating the result of the formula applied to the inputs:

- :func:`~regression.linear_regression` for linear regression
  (see https://en.wikipedia.org/wiki/Linear_regression ),
- :func:`~regression.polynomial_regression` for polynomial regression
  (see https://en.wikipedia.org/wiki/Polynomial_regression ),
- :func:`~regression.gaussian_process_regression` for regression using Gaussian processes
  (see https://en.wikipedia.org/wiki/Kriging ).

For even more convenience, :func:`~regression.connect_via_regression` and
:func:`~regression.connect_via_gaussian_process` directly set distribution parameters
of desired output variables to the regression results.

For further explanations see

.. toctree::
   :maxdepth: 1

   examples/01_introduction/07_regression
   examples/01_introduction/08_gaussian_process_regression
   examples/01_introduction/09_logistic_regression

Causal Calculus
===============

Causal calculus is realized via the do operation, :func:`~do_operation`.
With this operation a Graph or other scopetor can be modified to make variables of choice
independent of all other variables.
This is required to model interventions and to distinguish the effect of interventions
from observations.
The modified Graph can then be utilized further, e.g. to make predictions.

The combination of a do operation and prediction is conveniently accessible in the
:class:`~halerium.InterventionPredictor`.

For a basic introduction to the do operation see

.. toctree::
   :maxdepth: 1

   examples/05_causal_inference/01-Basics

Time Series
===========

Halerium :class:`~Variable` instances are expanded with the data dimension at model creation time.
This way a :class:`~Graph` can be formulated irrespective of the amount of data,
and the conditional independence along the data axis is ensured by construction.

If we use the data axis as a time axis, this conditional independence is not desirable.
The data values do not represent independent and identically distributed samples,
but a time series.
In this time series a particular value is conditionally independent of its
corresponding future values, but it can depend on the past.

To access the past values of a :class:`~Variable` or any other dynamic operator,
we can utilize the :class:`~operator.TimeShift` operator and the ``TimeIndex`` singleton.

::

    v = Variable("v")
    past_v = v[TimeIndex-3]
    # does the same as
    past_v = TimeShift(operand=v, shift=-3, initial_values=0.)

Only past values of a dynamic operator are available.
A positive shift would lead to an error.

::

    >>> future_v = v[TimeIndex+1]
    RuntimeError: Positive shifts are not supported.

A shift of 0 has the same effect as the :class:`~operator.Identity` operator.

For further explanations see

.. toctree::
   :maxdepth: 1

   examples/06_time_series/01-creating-arma-graphs
   examples/06_time_series/02-fitting-a-sarima-model
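
The shifting behaviour described in the Time Series section can be illustrated without Halerium.
The following is a minimal plain-Python sketch, not Halerium's implementation.
It assumes that a shift of ``-3`` places the value from step ``t - 3`` at step ``t`` and that
the first entries, which have no past, are filled with the initial value.
The helper name ``time_shift`` and its ``initial_value`` argument are hypothetical.

```python
def time_shift(values, shift, initial_value=0.0):
    """Delay a sequence by -shift steps along its (time) axis.

    Hypothetical helper: mimics the assumed semantics of a non-positive
    time shift on a plain list, padding the start with `initial_value`.
    """
    if shift > 0:
        # Only past values are available, so future access is rejected.
        raise RuntimeError("Positive shifts are not supported.")
    delay = -shift
    if delay == 0:
        # A shift of 0 leaves the series unchanged (identity).
        return list(values)
    # Entries without a past are filled with the initial value.
    padding = [initial_value] * min(delay, len(values))
    return padding + list(values[:len(values) - len(padding)])


series = [1.0, 2.0, 3.0, 4.0, 5.0]
shifted = time_shift(series, shift=-3)
print(shifted)  # [0.0, 0.0, 0.0, 1.0, 2.0]
```

Under these assumptions, the value at position ``t`` of ``shifted`` is the value of ``series``
three steps earlier, which is the kind of dependence on the past that an autoregressive
graph needs.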