Causal Inference Example - Coupon Case#

Your data driven results may be wrong!#

Numbers don’t lie - but they sure do tell a lot of half-truths. In the following example we explore one such case, where the difference between correlation and causation is crucial. Can we even separate the two? Yes, of course, but we need to take a step beyond pure statistics.

In our example case we model the buying behavior of households. For a certain year we know their income, how much they spend on gasoline, how often they use coupons for groceries, and how much revenue they generate at the grocery store.

So far, households look for coupons themselves (pull principle). Your task is to assess the potential of a push principle.

Analyze Data#

[1]:

import pandas as pd
data = pd.read_csv("coupon_data.csv")
data = data.rename(
    columns={"household_income": "income", "gas_expenditure": "gas",
             "coupon_fraction": "coupon", "store_revenue": "revenue"})

A common way to get first impressions of the data is the correlation matrix. Evaluating our data with this approach yields an interesting result:

[2]:

data.corr().style.background_gradient(cmap="RdYlGn", vmin=-1, vmax=1)

[2]:

	income	gas	coupon	revenue
income	1.000000	0.700277	-0.751953	0.698591
gas	0.700277	1.000000	-0.519725	0.473422
coupon	-0.751953	-0.519725	1.000000	-0.490240
revenue	0.698591	0.473422	-0.490240	1.000000

There is a negative correlation between the usage of coupons and the revenue. That’s odd!

Doesn’t common sense dictate that coupons incentivize people to buy more?
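One way to see how such a negative correlation can arise is a small simulation. The sketch below does not use the tutorial's data; it draws synthetic households from a made-up linear model in which income lowers coupon usage and raises revenue, while coupons themselves raise revenue. All coefficients are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process (all coefficients are made up):
# income lowers coupon usage and raises revenue; coupons raise revenue.
income = rng.normal(60_000, 15_000, n)
coupon = np.clip(0.8 - 8e-6 * income + rng.normal(0, 0.05, n), 0, 1)
revenue = 0.06 * income + 1_000 * coupon + rng.normal(0, 300, n)

# Raw correlation: dominated by the common cause "income".
raw_corr = np.corrcoef(coupon, revenue)[0, 1]

# A least-squares fit of revenue on coupon AND income separates the two paths.
X = np.column_stack([np.ones(n), coupon, income])
beta, *_ = np.linalg.lstsq(X, revenue, rcond=None)
coupon_effect = beta[1]

print(f"raw correlation coupon/revenue:  {raw_corr:+.2f}")
print(f"coupon coefficient given income: {coupon_effect:+.0f}")
```

In this toy setting the raw correlation is negative although the coupon coefficient is positive once income is held fixed. Whether the real data behaves the same way is exactly what the models in the following sections are meant to find out.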

We will have to take a step beyond mere correlations and use a structured model with causal calculus.

In the following we will pit this against the standard machine learning approach – a black-box model.

[3]:

from halerium import CausalStructure


By using this Application, You are agreeing to be bound by the terms and conditions of the Halerium End-User License Agreement that can be downloaded here: https://erium.de/halerium-eula.txt

Black-Box Model#

Define a Black-Box Model#

Machine learning usually relies on black-box models. All inputs are assumed to be independent and there are no prior assumptions about their effects on the outputs. Here our inputs are the household income, the gas expenditure, and the coupon usage. Our output is the store revenue.

[4]:

cs_bb = CausalStructure([[
  ['gas', 'income', 'coupons'], 'revenue']])
cs_bb.train(data)

As we later want to predict revenue from gas expenditure and coupon usage, we could also omit income as an input. There is no reason to prefer one variant over another in the black box model.

Predict with the Black-Box Model#

[5]:

cs_bb.predict({
     'gas': [1000.0],
     'coupon': [0.30]
})

[5]:

	coupons	income	gas	revenue
0	0.0	60328.630556	1000.0	4211.810692

The Black-Box model ignores the information that gas expenditure and coupon usage carry about the income. Therefore, the missing value for the income can only be imputed as the mean income. The revenue prediction thus suffers from a systematic bias.
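The difference between the two imputation strategies can be sketched on synthetic data. The example below is not how Halerium imputes values internally; it uses a plain least-squares fit as a stand-in for "using the information that gas and coupon carry about income", and every coefficient in it is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical households: income drives both gas expenditure and coupon usage.
income = rng.normal(60_000, 15_000, n)
gas = 0.02 * income + rng.normal(0, 200, n)
coupon = 0.8 - 8e-6 * income + rng.normal(0, 0.05, n)

# Strategy 1: ignore the observed columns and fall back to the mean income.
mean_imputed = np.full(n, income.mean())

# Strategy 2: estimate E[income | gas, coupon] with a linear least-squares fit.
X = np.column_stack([np.ones(n), gas, coupon])
beta, *_ = np.linalg.lstsq(X, income, rcond=None)
conditional_imputed = X @ beta


def rmse(estimate):
    """Root-mean-square error of an income estimate against the true income."""
    return float(np.sqrt(np.mean((estimate - income) ** 2)))


rmse_mean = rmse(mean_imputed)
rmse_conditional = rmse(conditional_imputed)
print(f"RMSE mean imputation:        {rmse_mean:,.0f}")
print(f"RMSE conditional imputation: {rmse_conditional:,.0f}")
```

Mean imputation assigns every household the same income, so any revenue prediction built on it inherits that error for households far from the average.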

Interventions on the Black-Box Model are Pointless#

We now want to assess a different couponing strategy:

“What if we just gave people coupons independent of their income via a push principle (e.g. via email or an app)?”

[6]:

cs_bb.predict_interventions(
    data={
       'gas': [1000.0],
    },
    interventions={
      'coupon': [0.30]
    }
)

[6]:

	coupons	income	gas	revenue
0	1.387779e-19	60328.630556	1000.0	4214.449284

In the Black-Box model, coupon is an independent input, so an intervention on the coupon changes nothing. We get practically the same revenue as for the prediction (about 4214 versus 4212).

The Black-Box Model cannot distinguish between the two couponing strategies.

Structured Model#

Define a Structured Model#

In a Structured Model we can define the causal relationships between different parameters. Those relationships can reflect physical laws, an expert’s experience or even just assumptions. Here we assume that the frequency with which people use coupons depends on their income. Rich people do not bother looking for coupons (on average). Furthermore, we assume that wealthier households will drive more and with less fuel-efficient cars.

[7]:

cs_st = CausalStructure([
  ['income', ['gas', 'revenue', 'coupon']],
  ['coupon', 'revenue']])
cs_st.train(data)
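To make the assumed structure explicit, it can also be written down as plain parent lists. The sketch below is independent of Halerium; it assumes that each pair in the definition above reads as [cause, effect], which matches the description in the text. The helper names `ancestors` and `intervene` are hypothetical.

```python
# Structured model as "child -> list of direct causes" (assumed reading of cell [7]).
parents = {
    "income": [],
    "gas": ["income"],
    "coupon": ["income"],
    "revenue": ["income", "coupon"],
}


def ancestors(graph, node):
    """Collect all direct and indirect causes of `node`."""
    seen = set()
    stack = list(graph[node])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph[parent])
    return seen


def intervene(graph, variable):
    """Return a copy of the graph in which `variable` no longer has any causes."""
    cut = {child: list(causes) for child, causes in graph.items()}
    cut[variable] = []
    return cut


# Common causes of coupon and revenue in the observed (pull) setting.
common_before = ancestors(parents, "coupon") & ancestors(parents, "revenue")

# Common causes after setting coupon by hand (push setting).
pushed = intervene(parents, "coupon")
common_after = ancestors(pushed, "coupon") & ancestors(pushed, "revenue")

print("common causes when observing: ", common_before)
print("common causes when intervening:", common_after)
```

In the observed graph, income is a common cause of coupon and revenue. Once coupon is set from the outside, that shared cause disappears, which is the idea behind the interventions discussed further below.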

Predict with the Structured Model#

[8]:

cs_st.predict({
     'gas': [1000.0],
     'coupon': [0.30]
})

[8]:

	income	gas	coupon	revenue
0	63430.673214	1000.0	0.3	4455.597629

In the Structured Model, income influences gas and coupon, so the missing income is imputed from the information about coupon usage and gas expenditure. The model has learned that a higher income causes lower coupon usage and imputes the income accordingly. This is correct! If you observe that a household never uses coupons, your best bet is to assume that it has an above-average income and therefore spends more money at the store.

Interventions on the Structured Model are Illuminating#

We now want to assess a different couponing strategy:

“What if we just gave people coupons independent of their income via a push principle (e.g. via email or an app)?”

[9]:

cs_st.predict_interventions(
    data={
       'gas': [1000.0],
    },
    interventions={
      'coupon': [0.30]
    }
)

[9]:

	income	gas	coupon	revenue
0	55289.109428	1000.0	0.3	3606.484593

The structured model knows that the coupon ratio depends on the income. With an intervention we "cut away" this influence: a coupon ratio that we set ourselves no longer tells us anything about the household's income, so the income is imputed from the gas expenditure alone. In contrast to the observation, revenue now increases with the usage of coupons, as we would expect.
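The gap between observing and setting the coupon ratio can be reproduced with a few lines of simulation. This is a sketch on a made-up linear model, not the trained Halerium model; the function `simulate` and all coefficients are assumptions for illustration.

```python
import numpy as np


def simulate(n, rng, do_coupon=None):
    """Sample a hypothetical model: income -> coupon, income -> revenue, coupon -> revenue."""
    income = rng.normal(60_000, 15_000, n)
    if do_coupon is None:
        # Pull principle: coupon usage follows from income.
        coupon = 0.8 - 8e-6 * income + rng.normal(0, 0.05, n)
    else:
        # Push principle: coupon usage is set from outside, income has no say.
        coupon = np.full(n, do_coupon)
    revenue = 0.06 * income + 1_000 * coupon + rng.normal(0, 300, n)
    return income, coupon, revenue


rng = np.random.default_rng(2)
n = 200_000

# Observation: compare households that happen to use many or few coupons.
_, coupon, revenue = simulate(n, rng)
obs_high = revenue[coupon > 0.45].mean()
obs_low = revenue[coupon < 0.20].mean()

# Intervention: assign a high or low coupon ratio to everyone.
do_high = simulate(n, rng, do_coupon=0.45)[2].mean()
do_low = simulate(n, rng, do_coupon=0.20)[2].mean()

print(f"observed:   high coupon {obs_high:,.0f}  vs  low coupon {obs_low:,.0f}")
print(f"intervened: high coupon {do_high:,.0f}  vs  low coupon {do_low:,.0f}")
```

In this toy model, households observed with many coupons generate less revenue than those with few, while assigning more coupons raises the average revenue. The same variable thus points in opposite directions depending on whether it is observed or set.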

So do coupons decrease or increase revenue?#

The correlation analysis said decrease, the black-box model said increase. Only the structured model with causal calculus lets us understand the issue. If we observe that a household uses lots of coupons (in a pull principle), its expected revenue is low, but if we actively supply it with coupons (push principle), its expected revenue will increase. This difference between observation and intervention can only be handled with causal inference.

Appendix - Alternative Black-Box Model#

As we mentioned above, the black-box model could be set up differently. We could omit income as an input, since we later want to make predictions without it. This changes the behavior of the black-box model quite a bit.

[10]:

cs_bb2 = CausalStructure([[
  ['gas', 'coupons'], 'revenue']])
cs_bb2.train(data)

[11]:

cs_bb2.predict({
     'gas': [1000.0],
     'coupon': [0.30]
})

[11]:

	coupons	gas	revenue
0	0.0	1000.0	3431.674476

The predictions now behave more like the ones of the structured model. With a higher coupon rate the revenue now decreases. The black-box model has implicitly learned that coupons and revenue are also connected to each other via the income.

Since it is a black-box model, interventions make no difference.

[12]:

cs_bb2.predict_interventions(
    data={
       'gas': [1000.0],
    },
    interventions={
      'coupon': [0.30]
    }
)

[12]:

	coupons	gas	revenue
0	0.0	1000.0	3608.108074
