Distributions

Imports

First let us import the required packages, classes, and functions.

[1]:
import numpy as np
import matplotlib.pyplot as plt

import halerium.core as hal

from halerium.core import Variable, Graph
from halerium.core.distribution import (
    BernoulliDistribution, DiracDistribution, LogNormalDistribution,
    NormalDistribution, UniformDistribution
)

from halerium.core import get_generative_model

Variables and distributions

Halerium variables can follow various kinds of parametrized distributions.

For a variable, the class of the distribution that it follows is fixed upon creation of the variable instance. By default, variables have a normal distribution (which is likely the right choice in most cases):

[2]:
v = Variable("v")
type(v.distribution)
[2]:
halerium.core.distribution.normal_distribution.NormalDistribution

The distribution can also be explicitly stated when creating a variable.

This creates a log-normal distributed variable:

[3]:
v = Variable("v", distribution=LogNormalDistribution)
type(v.distribution)
[3]:
halerium.core.distribution.log_normal_distribution.LogNormalDistribution

A variable’s distribution and data type have to be compatible:

[4]:
for distribution in ("BernoulliDistribution", "DiracDistribution", "NormalDistribution"):
    for dtype in ("bool", "float"):
        try:
            v = Variable("v", distribution=distribution, dtype=dtype)
            print(f" {distribution} and {dtype} are compatible.")
        except Exception:
            print(f" {distribution} and {dtype} are not compatible.")

 BernoulliDistribution and bool are compatible.
 BernoulliDistribution and float are not compatible.
 DiracDistribution and bool are compatible.
 DiracDistribution and float are compatible.
 NormalDistribution and bool are not compatible.
 NormalDistribution and float are compatible.

While the variable’s distribution class has to be decided at variable creation, the variable’s distribution parameters can be set either at variable creation, e.g.

[5]:
v = Variable("v", distribution=NormalDistribution, mean=0, variance=1)

or at a later point:

[6]:
v = Variable("v", distribution=NormalDistribution)
v.mean = 0
v.variance = 1

However, the distribution parameters need to be set eventually (unless a variable is fully determined by data). Creating a model from a graph containing a variable with a missing distribution parameter may raise an exception:

[7]:
with Graph("g") as g:
    Variable("v", distribution=NormalDistribution)

try:
    get_generative_model(graph=g)
except Exception as e:
    print("Error:", e)
Error: Variable <halerium.Variable 'g/v'> with NormalDistribution without mean is not fully determined by data. Variables with distribution type NormalDistribution must have a mean or be fully determined by data.

Distribution classes

Normal distribution

This creates a variable with a normal distribution:

[8]:
v = Variable("v", distribution=NormalDistribution, mean=0, variance=1)

The parameters of a NormalDistribution are ‘mean’ and ‘variance’, i.e. the mean and variance of the distribution, respectively:

[9]:
v.distribution.parameter_names
[9]:
{'mean', 'variance'}

The mean can take any real value. The variance must be a positive number; otherwise the variable’s value is not-a-number (NaN).

The data type of a normally distributed variable is ‘float’:

[10]:
v.dtype
[10]:
'float'

Log-normal distribution

This creates a variable with a log-normal distribution:

[11]:
v = Variable("v", distribution=LogNormalDistribution, mean_log=0, variance_log=1)

The parameters of a LogNormalDistribution are ‘mean_log’ and ‘variance_log’, i.e. the mean and variance of the underlying normal distribution, respectively:

[12]:
v.distribution.parameter_names
[12]:
{'mean_log', 'variance_log'}

The parameter mean_log can take any real value. The parameter variance_log must be a positive number; otherwise the variable’s value is not-a-number (NaN).
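
Because mean_log and variance_log describe the underlying normal distribution, they are not the mean and variance of the log-normal variable itself. The following standalone sketch uses the standard log-normal formulas to convert between the two; the helper name `log_normal_moments` is hypothetical and not part of Halerium:

```python
import math


def log_normal_moments(mean_log, variance_log):
    """Return (mean, median, variance) of a log-normal variable.

    mean_log and variance_log are the mean and variance of the
    underlying normal distribution. This helper is illustrative only
    and is not part of the Halerium API.
    """
    if variance_log <= 0:
        raise ValueError("variance_log must be positive")
    mean = math.exp(mean_log + variance_log / 2.0)
    median = math.exp(mean_log)
    variance = (math.exp(variance_log) - 1.0) * math.exp(2.0 * mean_log + variance_log)
    return mean, median, variance


mean, median, variance = log_normal_moments(mean_log=0.0, variance_log=1.0)
print(mean, median, variance)
```

With mean_log=0 and variance_log=1, the median is 1 while the mean is exp(0.5), which illustrates how the right tail pulls the mean above the median.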

The data type of a log-normally distributed variable is ‘float’:

[13]:
v.dtype
[13]:
'float'

Uniform distribution

This creates a variable with a uniform distribution:

[14]:
v = Variable("v", distribution=UniformDistribution, center=0, width=1)

The parameters of a UniformDistribution are ‘center’ and ‘width’:

[15]:
v.distribution.parameter_names
[15]:
{'center', 'width'}

The center can be any real number; the width must be positive. The distribution’s possible values then lie in the interval [center - width/2, center + width/2].
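
The mapping from ‘center’ and ‘width’ to the interval bounds can be sketched in plain Python. The helper name `uniform_bounds` is hypothetical and not part of Halerium; it only mirrors the interval formula stated above:

```python
import random


def uniform_bounds(center, width):
    """Return (lower, upper) for a uniform distribution given center and width.

    Illustrative helper only; not part of the Halerium API.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    half_width = width / 2.0
    return center - half_width, center + half_width


lower, upper = uniform_bounds(center=0, width=1)
print(lower, upper)

# Draw a few samples with the standard library and check they fall in the interval.
samples = [random.uniform(lower, upper) for _ in range(5)]
print(all(lower <= s <= upper for s in samples))
```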

The data type of a uniformly distributed variable is ‘float’:

[16]:
v.dtype
[16]:
'float'

Bernoulli distribution

This creates a variable with a Bernoulli distribution:

[17]:
v = Variable("v", distribution=BernoulliDistribution, mean=0.5)

The parameter of a BernoulliDistribution is either ‘logit’ or ‘mean’:

[18]:
v.distribution.parameter_names
[18]:
{'logit', 'mean'}

The logit can be any real number. The mean must be between 0 and 1. The logit and mean are not independent parameters, so only one of them can be specified; the other is then automatically set to the corresponding value.
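
The relationship between the two parameters can be illustrated with the standard definitions, logit = log(mean / (1 - mean)) and its inverse, the logistic function. This is a minimal standalone sketch; the function names are hypothetical, and it is an assumption that Halerium uses exactly this convention internally:

```python
import math


def mean_to_logit(mean):
    """Convert a Bernoulli mean (probability of True) to its logit.

    Illustrative helper only; not part of the Halerium API.
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1 for a finite logit")
    return math.log(mean / (1.0 - mean))


def logit_to_mean(logit):
    """Convert a logit to the Bernoulli mean via the logistic function."""
    # Branch on the sign so that math.exp never receives a large positive argument.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    exp_logit = math.exp(logit)
    return exp_logit / (1.0 + exp_logit)


print(mean_to_logit(0.5))   # a mean of 0.5 corresponds to a logit of 0
print(logit_to_mean(2.0))   # positive logits give means above 0.5
```

Any real logit maps to a mean strictly inside (0, 1) up to floating-point rounding, which is why the logit is unconstrained while the mean is bounded.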

The data type of a Bernoulli distributed variable is ‘bool’:

[19]:
v.dtype
[19]:
'bool'

Dirac distribution

This creates a variable with a Dirac distribution:

[20]:
v = Variable("v", distribution=DiracDistribution, mean=0.5)

The parameter of a DiracDistribution is the ‘mean’:

[21]:
v.distribution.parameter_names
[21]:
{'mean'}

The only possible value of a Dirac distributed variable is the value of its mean:

[22]:
v.evaluate()
[22]:
array(0.5)

By default, the data type of a Dirac distributed variable is ‘float’:

[23]:
v.dtype
[23]:
'float'

For variables with data type ‘float’, the mean can be any real number. For variables with data type ‘bool’, the mean must be boolean, too.

This creates a Dirac distributed boolean:

[24]:
v = Variable("v", distribution=DiracDistribution, dtype='bool', mean=True)
v.evaluate()
[24]:
array(True)