{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Causal Structures - Creation & Training" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example we want to illustrate how the CausalStructure class is used.\n", "We start with the imports." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "By using this Application, You are agreeing to be bound by the terms and conditions of the Halerium End-User License Agreement that can be downloaded here: https://erium.de/halerium-eula.txt\n" ] } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "import pylab as pl\n", "\n", "from halerium import CausalStructure" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The artificial data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We create an artificial data set containing three parameters." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>(a)</th><th>(b|a)</th><th>(c|a,b)</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>4.825023</td><td>-20.580472</td><td>40.432953</td></tr>\n",
"    <tr><th>1</th><td>5.034268</td><td>-27.335642</td><td>39.101290</td></tr>\n",
"    <tr><th>2</th><td>5.115304</td><td>-32.008941</td><td>37.765420</td></tr>\n",
"    <tr><th>3</th><td>4.974756</td><td>-24.083157</td><td>40.149657</td></tr>\n",
"    <tr><th>4</th><td>5.098132</td><td>-28.683511</td><td>39.198809</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>95</th><td>5.000302</td><td>-24.361388</td><td>40.234901</td></tr>\n",
"    <tr><th>96</th><td>4.992398</td><td>-24.908699</td><td>39.977586</td></tr>\n",
"    <tr><th>97</th><td>5.000396</td><td>-23.996587</td><td>40.410045</td></tr>\n",
"    <tr><th>98</th><td>4.981499</td><td>-24.952434</td><td>39.641407</td></tr>\n",
"    <tr><th>99</th><td>4.751285</td><td>-14.718803</td><td>42.547324</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>100 rows × 3 columns</p>\n",
"</div>
" ], "text/plain": [ " (a) (b|a) (c|a,b)\n", "0 4.825023 -20.580472 40.432953\n", "1 5.034268 -27.335642 39.101290\n", "2 5.115304 -32.008941 37.765420\n", "3 4.974756 -24.083157 40.149657\n", "4 5.098132 -28.683511 39.198809\n", ".. ... ... ...\n", "95 5.000302 -24.361388 40.234901\n", "96 4.992398 -24.908699 39.977586\n", "97 5.000396 -23.996587 40.410045\n", "98 4.981499 -24.952434 39.641407\n", "99 4.751285 -14.718803 42.547324\n", "\n", "[100 rows x 3 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n_data = 100\n", "np.random.seed(100)\n", "parameter_a = 5 + np.random.randn(n_data) * 0.1\n", "parameter_b = parameter_a * (-35) + 150 + np.random.randn(n_data) * 1.\n", "parameter_c = parameter_a * 10.5 + parameter_b * (.5) + np.random.randn(n_data) * 0.1\n", "\n", "data = pd.DataFrame(data={\"(a)\": parameter_a,\n", " \"(b|a)\": parameter_b,\n", " \"(c|a,b)\": parameter_c})\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We purposely chose column names which cannot be used as python variable names." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating the causal structure\n", "\n", "Now that we have our data, we can formulate the set of dependencies that define our causal structure. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "dependencies = [\n", " [\"(a)\", \"(b|a)\"], # the column '(b|a)' depends on '(a)'\n", " [[\"(a)\", \"(b|a)\"], \"(c|a,b)\"], # the column '(c|a,b)' depends on '(a)' and '(b|a)'\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and create the causal structure." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "causal_structure = CausalStructure(dependencies)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Training" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can directly train the causal_structure using" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "causal_structure.train(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What this triggered in the background was actually\n", " 1. ``causal_structure.scaling_data = data``\n", " 2. ``causal_structure.build_graph()``\n", " 3. ``causal_structure.train(data)``\n", " \n", "In the first step scaling data are provided to the causal structure. Scaling data are what the causal structure uses to set the locations and scales of the created variables. These are important to allow for the correct definition of a priori statistics, i.e. useful regularization.\n", "\n", "In the second step the graph is built using default arguments. Each dependency will be mathematically modelled as a quadratic regression with unknown variance.\n", "\n", "In the third step the graph is trained using default arguments." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In [next section](./02-02-prediction.ipynb) we will use the trained causal structure to make predictions." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 4 }