{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Estimate influences with the Influence Estimator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `InfluenceEstimator` is an objective class in Halerium for estimating the influence of a variable, entity, or (sub)graph on the variance of a specified target.\n", "\n", "For a given graph, target, and graph element, the influence estimator calculates the relative reduction of the target's variance when that element is held fixed, compared with the target's variance when all variables are unconstrained." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ingredients" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To estimate the influences, we need the following ingredients:\n", " - a graph, and\n", " - a target.\n", " \n", "The target can be a variable or an entity in the graph, or a function that takes the graph as input." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We import the packages, classes, and functions required for the code examples below." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# for handling data:\n", "import numpy as np\n", "\n", "# for building graphs:\n", "from halerium.core import Graph, Entity, Variable, StaticVariable, show\n", "\n", "# for building models:\n", "from halerium.core import get_generative_model, get_posterior_model\n", "\n", "# for estimating influences:\n", "from halerium import InfluenceEstimator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The graph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a graph first."
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "g = Graph(\"g\")\n", "with g:\n", "    e = Entity(\"e\")\n", "    with e:\n", "        Variable(\"a\", mean=0, variance=2)\n", "        Variable(\"b\", mean=0, variance=3)\n", "        Variable(\"c\", mean=0, variance=4)\n", "\n", "    # the mean of d is the sum of the three variables in entity e\n", "    Variable(\"d\", mean=e.a + e.b + e.c, variance=1)\n", "\n", "show(g)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this graph, the mean of variable `d` is given by the sum of the variables `e.a`, `e.b`, and `e.c`.\n", "The variances of `e.a`, `e.b`, and `e.c` are two, three, and four times as large as the variance of `d` for any fixed set of values of `e.a`, `e.b`, and `e.c`.\n", "When `e.a`, `e.b`, and `e.c` are not fixed, the total variance of `d` is simply the sum of all four variances, 2 + 3 + 4 + 1 = 10, which the sampled estimate below reproduces approximately." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total variance of d = [10.10254441]\n" ] } ], "source": [ "# sample d from the generative model and estimate its variance\n", "generative_model = get_generative_model(g)\n", "d_total_variance = generative_model.get_variances(g.d, n_samples=1000)\n", "print(\"total variance of d =\", d_total_variance)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The influence estimator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's create an influence estimator for the graph `g` and the target `g.d`."
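, "\n", "\n", "Before doing so, note that `g` is linear and Gaussian, so the expected answer can be checked independently. The following sketch uses plain NumPy rather than Halerium. It assumes that the influence of `e.a` is the relative variance reduction `1 - Var(d | a fixed) / Var(d)`, and it estimates both variances by sampling `d` once with all variables free and once with `a` held at an arbitrarily chosen value:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plain-NumPy sketch, independent of Halerium. It mimics graph g with\n", "# a ~ N(0, 2), b ~ N(0, 3), c ~ N(0, 4) and d ~ N(a + b + c, 1),\n", "# where the second argument of N is a variance.\n", "import numpy as np\n", "\n", "rng = np.random.default_rng(0)\n", "n = 100_000  # number of Monte Carlo samples\n", "\n", "def sample_d(a):\n", "    # draw b, c, and the noise of d; a is an array of samples or a fixed scalar\n", "    b = rng.normal(0.0, np.sqrt(3.0), n)\n", "    c = rng.normal(0.0, np.sqrt(4.0), n)\n", "    return rng.normal(a + b + c, 1.0)\n", "\n", "# variance of d with all variables free (exact value: 2 + 3 + 4 + 1 = 10)\n", "var_free = sample_d(rng.normal(0.0, np.sqrt(2.0), n)).var()\n", "\n", "# variance of d with a held at an arbitrary fixed value (exact value: 3 + 4 + 1 = 8);\n", "# in this linear model the result does not depend on the chosen value\n", "var_a_fixed = sample_d(1.5).var()\n", "\n", "# relative variance reduction, assumed here to be what an influence measures\n", "influence_a = 1.0 - var_a_fixed / var_free\n", "print(var_free, var_a_fixed, influence_a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The printed values should come out close to 10, 8, and 0.2. We now pose the same question to the influence estimator:"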
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "ie = InfluenceEstimator(graph=g, \n", " target=g.d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now query the influence estimator by calling it with some graph element(s) as argument.\n", "For example, we can ask it to estimate the influence of the variable `g.e.a` with:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.22220185608099374" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ie(g.e.a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The number returned is the relative reduction of the target's variance when `g.e.a` is held fixed, compared with the full variance of the target when no variable is held fixed. In the graph considered here, `g.e.a` contributes 2 of the 10 units of variance of the target `g.d`, i.e. 20%. Thus the estimated relative variance reduction should be near 0.2; the small deviation from 0.2 is due to the finite number of samples." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can ask for the influences of all graph elements by omitting the call argument:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'g': 1.0,\n", " 'g/e': 0.9159407995866957,\n", " 'g/e/a': 0.22220185608099374,\n", " 'g/e/b': 0.302906770122022,\n", " 'g/e/c': 0.39083217338368,\n", " 'g/d': 1.0}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ie()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The graph `g` as a whole accounts for all the variance in its member `g.d`, so its influence on `g.d`, expressed as a relative variance reduction, is unity.\n", "\n", "The same holds for the target itself: fixing its value removes all of its variance, so the relative variance reduction is 100%.\n", "\n", "90% of the variance in `g.d` stems from variables in the entity `g.e`. 
Thus, the estimated influence of `g.e` on `g.d` is close to 0.9. Resolving the constituents of `g.e`, one finds that `g.e.a`, `g.e.b`, and `g.e.c` influence the target `g.d` by about 20%, 30%, and 40%, respectively. The remaining 10% of the variance of `g.d` is irreducible noise." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualizing the influence estimator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To add the objective's information to the graph visualization, call `show` on the influence estimator and then activate the objective's button in the bottom right of the canvas." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "show(ie)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Options" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Target" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As in the example above, the target can be a variable in the graph." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'g': 1.0,\n", " 'g/e': 0.8942984478404212,\n", " 'g/e/a': 0.199091176983294,\n", " 'g/e/b': 0.29995665619246964,\n", " 'g/e/c': 0.3952506146646576,\n", " 'g/d': 1.0}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ie = InfluenceEstimator(graph=g, \n", " target=g.d)\n", "ie()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The target can also be an entity in the graph.\n", "Let's make a graph with two entities."
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "g2 = Graph(\"g2\")\n", "with g2:\n", "    e1 = Entity(\"e1\")\n", "    with e1:\n", "        Variable(\"a\", mean=0, variance=2)\n", "        Variable(\"b\", mean=0, variance=3)\n", "        Variable(\"c\", mean=0, variance=4)\n", "\n", "    e2 = Entity(\"e2\")\n", "    with e2:\n", "        Variable(\"d1\", variance=1)\n", "        Variable(\"d2\", variance=1)\n", "        # the means of d1 and d2 depend on variables of entity e1\n", "        e2.d1.mean = e1.a + e1.b\n", "        e2.d2.mean = e1.a + e1.c" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we take the second entity as the target and estimate the influence of all graph elements on its variance." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'g2': 0.7793190412540283,\n", " 'g2/e1': 0.8799107327311396,\n", " 'g2/e1/a': 0.4582248784911632,\n", " 'g2/e1/b': 0.17902348299773718,\n", " 'g2/e1/c': 0.2426623712422392,\n", " 'g2/e2': 0.7793190412540283,\n", " 'g2/e2/d1': 0.36816158978351676,\n", " 'g2/e2/d2': 0.41115745147051147}" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ie = InfluenceEstimator(graph=g2, \n", " target=g2.e2)\n", "ie()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One can also use functions as targets. Such a function must take a graph as its argument and return an operation, such as an arithmetic combination of the graph's variables. 
As an example, we define the function:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "def f(graph):\n", "    # the target is the sum of the two variables in entity e2\n", "    return graph.e2.d1 + graph.e2.d2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we pass the function as the target:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'g2': 0.7506897555952952,\n", " 'g2/e1': 0.8536942019034909,\n", " 'g2/e1/a': 0.44848251047837473,\n", " 'g2/e1/b': 0.16228885840705484,\n", " 'g2/e1/c': 0.24292283301806125,\n", " 'g2/e2': 0.7506897555952952,\n", " 'g2/e2/d1': 0.32748271279832525,\n", " 'g2/e2/d2': 0.42320704279696997}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ie = InfluenceEstimator(graph=g2, \n", " target=f)\n", "ie()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The visualization works in the same way as before." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "show(ie)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Speed vs. Accuracy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The influence estimator class takes an argument `n_samples`, which sets how many samples are drawn when computing the variances. 
Fewer samples make the estimator faster but less accurate, whereas more samples require more computing time but yield more accurate results.\n", "\n", "Let's try a fast but inaccurate estimator:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'g': 1.0,\n", " 'g/e': 0.8174850455328879,\n", " 'g/e/a': 0.2531955170898377,\n", " 'g/e/b': 0.301689813971006,\n", " 'g/e/c': 0.26259971447204417,\n", " 'g/d': 1.0}" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ie = InfluenceEstimator(graph=g, \n", " target=g.d,\n", " n_samples=30)\n", "ie()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now try a slower but more accurate estimator:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'g': 1.0,\n", " 'g/e': 0.9032381973643302,\n", " 'g/e/a': 0.20475754552666683,\n", " 'g/e/b': 0.2999833333235057,\n", " 'g/e/c': 0.3984973185141577,\n", " 'g/d': 1.0}" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ie = InfluenceEstimator(graph=g, \n", " target=g.d,\n", " n_samples=10000)\n", "ie()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" } }, "nbformat": 4, "nbformat_minor": 4 }