The OutlierDetector class

Aliases

halerium.OutlierDetector
halerium.core.OutlierDetector
halerium.core.objectives.OutlierDetector
class OutlierDetector(graph, data, compiler=None, n_samples=1000, outlier_threshold=0.05, method='upsampled', name='OutlierDetector', description=None, copy_graph=True)

The outlier detector class.

This class classifies events as outliers. A rank is estimated for each data point individually and compared against a threshold. If the rank falls below the threshold, the event is classified as an outlier. The rank of a data point is defined as the fraction of random samples expected to have a probability equal to or lower than that of the data point in question. Rank values therefore range from 0 to 1.
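The rank definition can be illustrated with a small Monte Carlo sketch for a single standard normal variable, matching the example further down this page. It uses only the Python standard library and is not halerium's implementation; the helper names (`normal_log_density`, `estimate_ranks`, `classify_outliers`) are hypothetical. Each data point's log density is compared against the log densities of `n_samples` random draws, and the rank is the fraction of draws that are no more probable than the data point.

```python
import math
import random


def normal_log_density(x, mean=0.0, variance=1.0):
    """Log density of a univariate normal distribution at x."""
    return -0.5 * (math.log(2 * math.pi * variance) + (x - mean) ** 2 / variance)


def estimate_ranks(data, n_samples=1000, mean=0.0, variance=1.0, seed=0):
    """Monte Carlo estimate of each data point's rank under N(mean, variance).

    The rank is the fraction of random samples whose log density is less
    than or equal to the log density of the data point.
    """
    rng = random.Random(seed)
    sample_log_densities = [
        normal_log_density(rng.gauss(mean, math.sqrt(variance)), mean, variance)
        for _ in range(n_samples)
    ]
    ranks = []
    for x in data:
        log_density = normal_log_density(x, mean, variance)
        # Count samples that are at most as probable as the data point.
        n_not_more_probable = sum(1 for s in sample_log_densities if s <= log_density)
        ranks.append(n_not_more_probable / n_samples)
    return ranks


def classify_outliers(ranks, outlier_threshold=0.05):
    """An event is an outlier if its rank falls below the threshold."""
    return [rank < outlier_threshold for rank in ranks]


ranks = estimate_ranks([0.0, 1.0, 5.0])
print(ranks)
print(classify_outliers(ranks))
```

A value of 0 sits at the mode, so every sample is at most as probable and its rank is 1. A value of 5 lies so far in the tail that essentially no sample is less probable, so its rank falls below the 0.05 threshold.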

Parameters:
  • graph (halerium.core.Graph) – The graph that defines the dependencies and probabilities of the variables.

  • data (dict, halerium.core.DataLinker) – The data in which to identify outliers. Either dictionary with variables as keys and data arrays as values, or a DataLinker holding links to the variables in graph.

  • compiler (halerium.core.compiler.compiler_base.CompilerBase, optional) – The backend compiler to be used. The default is the TensorFlow compiler.

  • n_samples (int, optional) – The number of samples used to estimate the probabilities. The default is 1000.

  • outlier_threshold (float, optional) – The rank value below which an event is classified as an outlier. The default is 0.05.

  • method (str, optional) – The method with which the probability is estimated. Either “marginalized” or “upsampled”. “marginalized” marginalizes the missing values, so that the result represents a probability density only over the variables that have data. “upsampled” samples the missing values, so that the result represents a probability density over all variables. The “marginalized” method is slower and more memory intensive. The default is “upsampled”.

  • name (str, optional) – The name of the objective. The default is “OutlierDetector”.

  • description (str, optional) – The description of the objective.

  • copy_graph (bool, optional) – Whether the objective should make a copy of the graph for its own use, or just keep the graph itself as an attribute. Users should leave this at the default, True, unless they are certain that neither they nor other code will alter the graph.
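The difference between the two `method` options can be sketched with a toy model of two independent standard normal variables, `x` (observed) and `y` (missing). This is an assumption-laden illustration of the descriptions above, written with the Python standard library only; it is not how halerium computes its estimates, and all function names are hypothetical.

```python
import math
import random


def log_std_normal(z):
    """Log density of a standard normal distribution at z."""
    return -0.5 * (math.log(2 * math.pi) + z * z)


def rank(log_density, sample_log_densities):
    """Fraction of samples whose log density is <= log_density."""
    hits = sum(1 for s in sample_log_densities if s <= log_density)
    return hits / len(sample_log_densities)


def marginalized_rank(x, n_samples=1000, rng=None):
    # Density only over the observed variable: rank log p(x) against samples of x.
    if rng is None:
        rng = random.Random(0)
    samples = [log_std_normal(rng.gauss(0.0, 1.0)) for _ in range(n_samples)]
    return rank(log_std_normal(x), samples)


def upsampled_rank(x, n_samples=1000, rng=None):
    # Fill in the missing y by sampling it (p(y | x) = p(y) here, because x and y
    # are assumed independent), then rank the joint density p(x, y) against
    # joint samples of (x, y).
    if rng is None:
        rng = random.Random(0)
    y = rng.gauss(0.0, 1.0)
    samples = [
        log_std_normal(rng.gauss(0.0, 1.0)) + log_std_normal(rng.gauss(0.0, 1.0))
        for _ in range(n_samples)
    ]
    return rank(log_std_normal(x) + log_std_normal(y), samples)


for x in (0.0, 1.0, 5.0):
    print(x, marginalized_rank(x), upsampled_rank(x))
```

With the marginalized variant, only `x` enters the density, so `x = 0` always receives rank 1. With the upsampled variant, the randomly drawn `y` also enters the joint density, so the rank of the same value depends on that draw and can be noticeably lower; an extreme value such as `x = 5` is flagged either way.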

Examples

Calling the instance with the graph as argument returns a boolean array with one entry per data point, indicating whether that data point is classified as an outlier:

>>> from halerium.core import Graph, Variable, OutlierDetector
>>> with Graph("g") as g:
...     Variable("v", mean=0, variance=1)
>>> g_data = {g.v: [0., 1., 5.]}
>>> outlier_detector = OutlierDetector(g, data=g_data)
>>> outlier_detector(g)
array([False, False, True])

__call__(fetches=None)

Detect outliers.

Return, for each element in fetches and each data point, whether the event is classified as an outlier.

Parameters:

fetches (halerium.core.scope.Scopetor, dict, list, tuple, optional) – The scopetors for which to return the outlier classifications. If no fetches are provided, the default is to return results for the graph itself and all its elements.

Returns:

ranks – Whether the event is an outlier for each element in fetches.

Return type:

array, dict, list or tuple
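The listed return types suggest that the result mirrors the structure of fetches: a single scopetor gives an array, while a dict, list, or tuple of scopetors gives a container of the same kind. The sketch below shows that assumed mapping with a hypothetical helper `mirror_fetches` and plain strings standing in for scopetors; it does not call halerium.

```python
def mirror_fetches(fetches, evaluate):
    """Apply `evaluate` to each fetch while keeping the container structure.

    Hypothetical helper: it assumes the return value of
    OutlierDetector.__call__ mirrors the structure of `fetches`.
    """
    if isinstance(fetches, dict):
        return {key: mirror_fetches(value, evaluate) for key, value in fetches.items()}
    if isinstance(fetches, (list, tuple)):
        # Rebuild the same container type (list or tuple) from the results.
        return type(fetches)(mirror_fetches(item, evaluate) for item in fetches)
    # A single fetch (e.g. one scopetor) maps to one sequence of outlier flags.
    return evaluate(fetches)


# Stand-in results keyed by hypothetical variable names.
flags = {"v": [False, False, True], "w": [False, True, False]}
print(mirror_fetches("v", flags.get))
print(mirror_fetches({"first": "v", "second": "w"}, flags.get))
print(mirror_fetches(("v", "w"), flags.get))
```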

dump_dict(value_postprocessor=None)

Dump a dict with information on the objective.

The returned dict contains the name, the description, and the values resulting from a call of the objective. Any additional keys are used by the GUI to display the results of the objective appropriately.

Parameters:

value_postprocessor (optional) – A function to apply to the values returned by the call of the objective. The default is None, in which case no post-processing is done.

Returns:

result – A dictionary containing the name, description, etc. of the objective.

Return type:

dict
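A typical use of value_postprocessor is making the returned values JSON-serializable. The function below is a hypothetical example, not part of halerium; it assumes the values are array-like objects exposing a `tolist()` method (such as NumPy arrays), dicts, lists, tuples, or plain Python scalars. It would be passed as `outlier_detector.dump_dict(value_postprocessor=to_json_friendly)`.

```python
import json


def to_json_friendly(values):
    """Recursively convert objective results into JSON-serializable Python objects.

    Array-like objects exposing a ``tolist()`` method are converted to
    (nested) lists; dicts, lists and tuples are traversed recursively.
    """
    if hasattr(values, "tolist"):
        return values.tolist()
    if isinstance(values, dict):
        # Keys such as graph elements may not be JSON-serializable, so stringify them.
        return {str(key): to_json_friendly(value) for key, value in values.items()}
    if isinstance(values, (list, tuple)):
        return [to_json_friendly(item) for item in values]
    return values


print(json.dumps(to_json_friendly({"g/v": (False, False, True)})))
```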