# Performance Evaluation

```
[1]:
```

```
%%capture
# execute the creation & training notebook first
%run "02-01-creation_and_training.ipynb"
```

After training, we might want to know how well our causal structure can predict data. We can evaluate its predictive power using the `.evaluate` method.

For this we need some test data, which we now create artificially.

```
[2]:
```

```
import numpy as np
import pandas as pd

np.random.seed(42)  # fixed seed for reproducible test data
n_data = 100
# (a) is a root parameter; (b|a) depends on (a); (c|a,b) depends on both
parameter_a = 5 + np.random.randn(n_data) * 0.1
parameter_b = parameter_a * (-35) + 150 + np.random.randn(n_data) * 1.
parameter_c = parameter_a * 10.5 + parameter_b * (.5) + np.random.randn(n_data) * 0.01
test_data = pd.DataFrame(data={"(a)": parameter_a,
                               "(b|a)": parameter_b,
                               "(c|a,b)": parameter_c})
```

Apart from the test data, we have to specify which parameter(s) serve as prediction inputs. Let’s start with ‘(a)’ as the only input. This means we evaluate how well the other parameters’ values are predicted from the value of ‘(a)’.

```
[3]:
```

```
evaluation = causal_structure.evaluate(data=test_data,
                                       inputs=["(a)"])
evaluation
```

```
[3]:
```

```
(a) NaN
(b|a) 0.927296
(c|a,b) 0.697470
Name: r2, dtype: float64
```

By default, `.evaluate` computes the R2-score. We see that, based on ‘(a)’ alone, the model achieves a prediction score of ~0.9 on ‘(b|a)’ and ~0.7 on ‘(c|a,b)’. For ‘(a)’ we get `NaN`, since it was part of the inputs.
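As a reminder of what the score measures, the following is a minimal sketch of the standard R2-score (coefficient of determination) in plain NumPy. It is not taken from the library and may differ from how `.evaluate` computes the score internally; the function name `r2_score_sketch` is hypothetical.

```python
import numpy as np


def r2_score_sketch(y_true, y_pred):
    """Standard coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot


y_true = np.array([1.0, 2.0, 3.0, 4.0])

# A perfect prediction yields a score of 1.
perfect = r2_score_sketch(y_true, y_true)

# Always predicting the mean of the data yields a score of 0.
mean_only = r2_score_sketch(y_true, np.full(y_true.shape, y_true.mean()))
```

A score close to 1 therefore means that almost all of the variation in the test data is reproduced by the prediction, while a score near 0 means the prediction is no better than the mean.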

If we change the inputs to ‘(a)’ and ‘(b|a)’, we expect the score for ‘(c|a,b)’ to increase. We can also pass further arguments to the `evaluate` method. Say that, in addition to the R2-score, we want to know the root mean square error (“rmse”).

```
[4]:
```

```
evaluation = causal_structure.evaluate(data=test_data,
                                       inputs=["(a)", "(b|a)"],
                                       metric=("r2", "rmse"))
evaluation
```

```
[4]:
```

| | (a) | (b\|a) | (c\|a,b) |
|---|---|---|---|
| r2 | None | None | 0.998541 |
| rmse | None | None | 0.032054 |

As expected, the R2-score for ‘(c|a,b)’ increased significantly, from ~0.7 to ~0.998. Additionally, we get the root mean square error, which is low at ~0.03.
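Why the extra input helps can be illustrated without the library. The sketch below regenerates data with the same recipe as the test data and fits an ordinary least-squares model as a stand-in for the causal structure. This stand-in is an assumption made for illustration: it is fitted and scored on the same data, so its numbers will not match the table above. The helper `fit_and_score` is hypothetical.

```python
import numpy as np

# Same recipe as the test data in this notebook.
np.random.seed(42)
n_data = 100
a = 5 + np.random.randn(n_data) * 0.1
b = a * (-35) + 150 + np.random.randn(n_data) * 1.
c = a * 10.5 + b * 0.5 + np.random.randn(n_data) * 0.01


def fit_and_score(features, target):
    """Fit least squares with an intercept; return (r2, rmse) on the same data."""
    X = np.column_stack([np.ones(len(target))] + list(features))
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residual = target - X @ coef
    rmse = np.sqrt(np.mean(residual ** 2))
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((target - target.mean()) ** 2)
    return r2, rmse


# Predicting c from a alone: the noise in b stays unexplained.
r2_a, rmse_a = fit_and_score([a], c)

# Predicting c from a and b: only the small noise in c remains.
r2_ab, rmse_ab = fit_and_score([a, b], c)

print(f"c from a only:  r2={r2_a:.3f}, rmse={rmse_a:.3f}")
print(f"c from a and b: r2={r2_ab:.3f}, rmse={rmse_ab:.3f}")
```

With only ‘(a)’ known, the noise added to ‘(b|a)’ (standard deviation 1, entering ‘(c|a,b)’ with a factor of 0.5) cannot be predicted and limits the score. Once ‘(b|a)’ is an input as well, only the noise of standard deviation 0.01 on ‘(c|a,b)’ itself remains.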

For further details about the `Evaluator`, see the corresponding section in the core documentation.

In the next section we will have a look at outlier detection.