Counterfactual Samples giving invalid values of effect in dowhy gcm #1241

Open
PMK1991 opened this issue Aug 21, 2024 · 1 comment
Labels
bug: Something isn't working

Comments

PMK1991 commented Aug 21, 2024

I am working with a heart disease dataset in which the treatment is a continuous variable (thalach) and the outcome is categorical (target). The treatment also affects another categorical variable, exang, which takes the values 0 and 1. The two are inversely related, i.e., when thalach increases, exang should decrease. However, when I intervene on thalach using counterfactual_samples, I get invalid values (-1) for exang.

image

samples = gcm.counterfactual_samples(causal_model, {'thalach': lambda thalach: thalach * 1.1}, observed_data=df_risk)

Before and after intervention:

image

I have a workaround that clips the values, but I would like to know whether there is something built in to constrain the effect of an intervention.
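For reference, the clipping workaround could look roughly like the sketch below. It assumes the samples come back as a pandas DataFrame; the helper name `clip_to_range` and the toy `samples` frame are hypothetical stand-ins, not the real output from the model.

```python
import pandas as pd


def clip_to_range(samples: pd.DataFrame, column: str, low: float, high: float) -> pd.DataFrame:
    """Return a copy of `samples` with `column` forced into [low, high]."""
    out = samples.copy()
    out[column] = out[column].clip(lower=low, upper=high)
    return out


# Toy stand-in for the counterfactual output, including an invalid -1.
samples = pd.DataFrame({"exang": [-1, 0, 1, 1]})
clipped = clip_to_range(samples, "exang", 0, 1)
print(clipped["exang"].tolist())
```

This only hides the symptom: the out-of-range values are still produced by the model and are simply overwritten afterwards.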

PMK1991 added the bug label Aug 21, 2024
bloebp (Member) commented Aug 21, 2024

Hi, it looks like the categorical variables are stored as numbers here. Can you try converting them to strings, or to bools if they are binary? Otherwise, the models will interpret them as discrete (ordered) or continuous.
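A minimal sketch of such a conversion with pandas, assuming the columns contain no missing values. The helper name `mark_as_categorical` and the toy `df_risk` frame are hypothetical and only illustrate the idea; the column names are taken from the description above.

```python
import pandas as pd


def mark_as_categorical(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy where numerically coded categories become bools or strings."""
    out = df.copy()
    for col in columns:
        if set(out[col].unique()) <= {0, 1}:
            # Binary 0/1 codes become False/True.
            out[col] = out[col].astype(bool)
        else:
            # Any other coding becomes string labels.
            out[col] = out[col].astype(str)
    return out


# Toy stand-in for the heart disease data.
df_risk = pd.DataFrame({
    "thalach": [150.0, 172.0, 131.0],
    "exang": [0, 1, 0],
    "target": [1, 0, 1],
})
df_cat = mark_as_categorical(df_risk, ["exang", "target"])
print(df_cat.dtypes)
```

After the conversion, the causal mechanisms would presumably need to be reassigned and refit on the converted frame (for example with gcm.auto.assign_causal_mechanisms and gcm.fit) before drawing samples again.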

Once you convert them to categorical values (strings/bools), however, there is a catch: we only support point-wise counterfactual estimates (in Pearl's sense). You might need to look at interventional samples instead, which also work with categorical non-root nodes. This would be:

samples = gcm.interventional_samples(causal_model, {'thalach': lambda thalach: thalach * 1.1}, observed_data=df_risk)

Note, however, that these are sampled from the interventional distribution, i.e., running it twice will give you slightly different values.
