Performance Evaluation

[1]:
%%capture
# execute the creation & training notebook first
%run "02-01-creation_and_training.ipynb"

After training, we might want to know how well our causal structure predicts data. We can measure its predictive power with the .evaluate method.

For this we will need some test data. We create this artificial data now.

[2]:
np.random.seed(42)
n_data = 100
parameter_a = 5 + np.random.randn(n_data) * 0.1
parameter_b = parameter_a * (-35) + 150 + np.random.randn(n_data) * 1.
parameter_c = parameter_a * 10.5 + parameter_b * (.5) + np.random.randn(n_data) * 0.01

test_data = pd.DataFrame(data={"(a)": parameter_a,
                               "(b|a)": parameter_b,
                               "(c|a,b)": parameter_c})

Apart from the test data, we have to specify which parameter(s) serve as prediction inputs. Let’s start with ‘(a)’ as the only input. This means we evaluate how well the values of the other parameters are predicted from the value of ‘(a)’.

[3]:
evaluation = causal_structure.evaluate(data=test_data,
                                       inputs=["(a)"])
evaluation
[3]:
(a)             NaN
(b|a)      0.927296
(c|a,b)    0.697470
Name: r2, dtype: float64

By default, .evaluate computes the R2-score. We see that based on ‘(a)’ the model achieves a prediction score of ~0.93 on ‘(b|a)’ and ~0.70 on ‘(c|a,b)’. For ‘(a)’ we get NaN, since it was part of the inputs.
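As a reminder of what the R2-score measures, here is a minimal NumPy sketch of the standard definition, 1 - SS_res / SS_tot. The helper name `r2_score` is hypothetical and chosen for illustration only; this is not how .evaluate is implemented internally, just the textbook formula.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Textbook coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot


y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r2_score(y, y)                           # predictions equal the targets
baseline = r2_score(y, np.full_like(y, y.mean()))  # always predicting the mean
print(perfect, baseline)
```

A score of 1 means the predictions match the data exactly, while a score of 0 means they do no better than always predicting the mean of the test data.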

If we change the inputs to ‘(a)’ and ‘(b|a)’, we expect the score for ‘(c|a,b)’ to increase. We can also pass further arguments to the evaluate method. Say that, in addition to the R2-score, we want to know the root mean squared error (“rmse”).

[4]:
evaluation = causal_structure.evaluate(data=test_data,
                                       inputs=["(a)", "(b|a)"],
                                       metric=("r2", "rmse"))
evaluation
[4]:
       (a) (b|a)   (c|a,b)
r2    None  None  0.998541
rmse  None  None  0.032054

As expected, the R2-score for ‘(c|a,b)’ increased significantly, from ~0.70 to ~0.999. Additionally, we have the root mean squared error, which is very low (~0.03).
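The jump in the score follows from how the test data were generated: ‘(c|a,b)’ depends on ‘(b|a)’ with very little additional noise, whereas ‘(a)’ alone leaves the noise in ‘(b|a)’ unexplained. The sketch below illustrates this with plain NumPy least-squares fits on data generated in the same way as the test data. It is an assumption-laden stand-in, not the causal structure's own prediction: the helper `ols_scores` is hypothetical, and the fits are scored on the same data they were fitted to, so the numbers will not match the .evaluate output above.

```python
import numpy as np

# Same generating process as the test data above.
np.random.seed(42)
n_data = 100
a = 5 + np.random.randn(n_data) * 0.1
b = a * (-35) + 150 + np.random.randn(n_data) * 1.0
c = a * 10.5 + b * 0.5 + np.random.randn(n_data) * 0.01


def ols_scores(features, target):
    """Fit a linear model with intercept; return (R2, RMSE) on the same data."""
    design = np.column_stack([np.ones(len(target))] + list(features))
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = target - design @ coef
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((target - target.mean()) ** 2)
    rmse = np.sqrt(np.mean(residual ** 2))
    return r2, rmse


r2_a, rmse_a = ols_scores([a], c)       # predict c from a only
r2_ab, rmse_ab = ols_scores([a, b], c)  # predict c from a and b
print(f"inputs (a):        r2={r2_a:.3f}  rmse={rmse_a:.3f}")
print(f"inputs (a), (b|a): r2={r2_ab:.3f}  rmse={rmse_ab:.3f}")
```

With only ‘(a)’ as input, the noise added when generating ‘(b|a)’ enters ‘(c|a,b)’ with a factor of 0.5 and cannot be predicted. Once ‘(b|a)’ is known, only the small noise term with standard deviation 0.01 remains.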

For further details about the Evaluator see the corresponding section in the core-documentation.

In the next section we will have a look at outlier detection.