{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Causal Structures - Creation & Training" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example we want to illustrate how the CausalStructure class is used.\n", "We start with the imports." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "By using this Application, You are agreeing to be bound by the terms and conditions of the Halerium End-User License Agreement that can be downloaded here: https://erium.de/halerium-eula.txt\n" ] } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "import pylab as pl\n", "\n", "from halerium import CausalStructure" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The artificial data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We create an artificial data set containing three parameters." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>(a)</th><th>(b|a)</th><th>(c|a,b)</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>4.825023</td><td>-20.580472</td><td>40.432953</td></tr>\n",
"    <tr><th>1</th><td>5.034268</td><td>-27.335642</td><td>39.101290</td></tr>\n",
"    <tr><th>2</th><td>5.115304</td><td>-32.008941</td><td>37.765420</td></tr>\n",
"    <tr><th>3</th><td>4.974756</td><td>-24.083157</td><td>40.149657</td></tr>\n",
"    <tr><th>4</th><td>5.098132</td><td>-28.683511</td><td>39.198809</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>95</th><td>5.000302</td><td>-24.361388</td><td>40.234901</td></tr>\n",
"    <tr><th>96</th><td>4.992398</td><td>-24.908699</td><td>39.977586</td></tr>\n",
"    <tr><th>97</th><td>5.000396</td><td>-23.996587</td><td>40.410045</td></tr>\n",
"    <tr><th>98</th><td>4.981499</td><td>-24.952434</td><td>39.641407</td></tr>\n",
"    <tr><th>99</th><td>4.751285</td><td>-14.718803</td><td>42.547324</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>100 rows × 3 columns</p>\n",
"</div>
" ], "text/plain": [ " (a) (b|a) (c|a,b)\n", "0 4.825023 -20.580472 40.432953\n", "1 5.034268 -27.335642 39.101290\n", "2 5.115304 -32.008941 37.765420\n", "3 4.974756 -24.083157 40.149657\n", "4 5.098132 -28.683511 39.198809\n", ".. ... ... ...\n", "95 5.000302 -24.361388 40.234901\n", "96 4.992398 -24.908699 39.977586\n", "97 5.000396 -23.996587 40.410045\n", "98 4.981499 -24.952434 39.641407\n", "99 4.751285 -14.718803 42.547324\n", "\n", "[100 rows x 3 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n_data = 100\n", "np.random.seed(100)\n", "parameter_a = 5 + np.random.randn(n_data) * 0.1\n", "parameter_b = parameter_a * (-35) + 150 + np.random.randn(n_data) * 1.\n", "parameter_c = parameter_a * 10.5 + parameter_b * (.5) + np.random.randn(n_data) * 0.1\n", "\n", "data = pd.DataFrame(data={\"(a)\": parameter_a,\n", " \"(b|a)\": parameter_b,\n", " \"(c|a,b)\": parameter_c})\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We purposely chose column names which cannot be used as python variable names." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating the causal structure\n", "\n", "Now that we have our data, we can formulate the set of dependencies that define our causal structure. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "dependencies = [\n", " [\"(a)\", \"(b|a)\"], # the column '(b|a)' depends on '(a)'\n", " [[\"(a)\", \"(b|a)\"], \"(c|a,b)\"], # the column '(c|a,b)' depends on '(a)' and '(b|a)'\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and create the causal structure." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "causal_structure = CausalStructure(dependencies)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Training" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can directly train the causal_structure using" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "causal_structure.train(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What this triggered in the background was actually\n", " 1. ``causal_structure.scaling_data = data``\n", " 2. ``causal_structure.build_graph()``\n", " 3. ``causal_structure.train(data)``\n", " \n", "In the first step scaling data are provided to the causal structure. Scaling data are what the causal structure uses to set the locations and scales of the created variables. These are important to allow for the correct definition of a priori statistics, i.e. useful regularization.\n", "\n", "In the second step the graph is built using default arguments. Each dependency will be mathematically modelled as a quadratic regression with unknown variance.\n", "\n", "In the third step the graph is trained using default arguments." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In [next section](./02-02-prediction.ipynb) we will use the trained causal structure to make predictions." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 4 }