Before diving into causality, it is good to start with the following papers:
This is a nice article written by Judea Pearl and Dana Mackenzie in the WSJ. Link
The current data-crunching approach to machine learning misses an essential element of human intelligence.
Put simply, today’s machine-learning programs can’t tell whether a crowing rooster makes the sun rise, or the other way around. Whatever volumes of data a machine analyzes, it cannot understand what a human gets intuitively. From the time we are infants, we organize our experiences into causes and effects. The questions “Why did this happen?” and “What if I had acted differently?” are at the core of the cognitive advances that made us human, and so far are missing from machines.
Suppose, for example, that a drugstore decides to entrust its pricing to a machine learning program that we’ll call Charlie. The program reviews the store’s records and sees that past variations of the price of toothpaste haven’t correlated with changes in sales volume. So Charlie recommends raising the price to generate more revenue. A month later, the sales of toothpaste have dropped—along with dental floss, cookies and other items. Where did Charlie go wrong? Charlie didn’t understand that the previous (human) manager varied prices only when the competition did. When Charlie unilaterally raised the price, dentally price-conscious customers took their business elsewhere. The example shows that historical data alone tells us nothing about causes—and that the direction of causation is crucial.
Machine-learning systems have made astounding progress at analyzing data patterns, but that is the low-hanging fruit of artificial intelligence. To reach the higher fruit, AI needs a ladder, which we call the Ladder of Causation. Its rungs represent three levels of reasoning.
This paper was published in 2021 in the European Journal of Epidemiology. Link
Causal and prediction research usually require different methods, and yet their findings may get conflated when reported and interpreted. The aim of the current study is to quantify the frequency of conflation between etiological and prediction research, to discuss common underlying mistakes and provide recommendations on how to avoid these.
A nice paper published in 2022: Link
We describe and contrast two distinct problem areas for statistical causality:
- studying the likely effects of an intervention (effects of causes), and
- studying whether there is a causal link between the observed exposure and outcome in an individual case (causes of effects).
If two random variables are dependent (associated), learning the value of one changes the distribution we assign to the other; association alone, however, does not tell us which one (if either) is the cause.
Let $A$ be a dichotomous treatment variable (1: treated, 0: untreated) and $Y$ a dichotomous outcome variable (1: death, 0: survival).
We can now provide a formal definition of a causal effect for an individual: the treatment $A$ has a causal effect on an individual's outcome $Y$ if $Y^{a=1} \neq Y^{a=0}$ for that individual.
Epidemiologists, statisticians, economists, and other social scientists refer to the action $a$ as an intervention, an exposure, or a treatment.
The variables $Y^{a=1}$ and $Y^{a=0}$ are referred to as potential outcomes or counterfactual outcomes; the superscript denotes the outcome that would be observed under treatment value $a$.
For each individual, one of the counterfactual outcomes (the one that corresponds to the treatment value that the individual did receive) is actually factual. For example, because Zeus was actually treated ($A=1$), his counterfactual outcome under treatment, $Y^{a=1}$, equals his observed outcome $Y$.
Consistency: if $A = a$ for an individual, then $Y^{a} = Y^{A} = Y$ for that individual.
Individual causal effects are defined as a contrast of the values of counterfactual outcomes, but only one of those outcomes is observed for each individual: the one corresponding to the treatment value actually experienced by the individual. All other counterfactual outcomes remain unobserved.
Assume, for example, that $A$ indicates teeth whitening and $Y$ indicates smoking,
with the counterfactual outcomes $Y^{a=1}$ and $Y^{a=0}$ defined as above.
Then $Y^{a=1} = Y^{a=0}$ for every individual (the sharp causal null holds),
and hence $\Pr[Y^{a=1}=1] = \Pr[Y^{a=0}=1]$
(no matter how much we whiten someone’s teeth, this will not have any effect on this person’s smoking habits); similarly, the causal risk ratio $\Pr[Y^{a=1}=1]/\Pr[Y^{a=0}=1]$ equals 1.
The proof of the population-level statements is immediate: if the two counterfactual outcomes coincide for every individual, they have the same distribution.
When the proportion of individuals who develop the outcome in the treated, $\Pr[Y=1 \mid A=1]$, equals the proportion of individuals who develop the outcome in the untreated, $\Pr[Y=1 \mid A=0]$, we say that treatment $A$ and outcome $Y$ are independent, i.e., that $A$ is not associated with $Y$.
Causal inference requires data like the hypothetical data in Table 1.1, but all we can ever expect to have is real world data like those in Table 1.2. The question is then under which conditions real world data can be used for causal inference. The next chapter provides one answer: conduct a randomized experiment.
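As a minimal sketch (with made-up numbers, not the book's Tables 1.1 and 1.2), the contrast looks like this: with full counterfactual data we can compute individual and average causal effects, while real-world data only lets us compute associational contrasts.

```python
# Minimal sketch (made-up data): with full counterfactual data we can compute
# causal contrasts; with real-world data we only see the outcome under the
# treatment actually received.
import pandas as pd

# Hypothetical "god-like" data: both potential outcomes known for everyone.
full = pd.DataFrame({
    "A":    [1, 1, 0, 0, 1, 0],   # treatment actually received
    "Y_a0": [0, 1, 0, 1, 1, 0],   # outcome had the individual not been treated
    "Y_a1": [1, 1, 0, 0, 1, 1],   # outcome had the individual been treated
})

# Causal quantities (need both counterfactuals).
ite = full["Y_a1"] - full["Y_a0"]          # individual causal effects
ate = ite.mean()                           # average treatment effect

# Real-world data: only the factual outcome is observed (consistency).
full["Y"] = full["Y_a1"].where(full["A"] == 1, full["Y_a0"])
assoc_rd = (full.loc[full["A"] == 1, "Y"].mean()
            - full.loc[full["A"] == 0, "Y"].mean())  # associational risk difference

print(f"ATE = {ate:.2f}, associational risk difference = {assoc_rd:.2f}")
```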
Suppose you know that carrying a lighter $A$ has no causal effect (causative or preventive) on anyone's risk of lung cancer $Y$, and that cigarette smoking $L$ has a causal effect on both carrying a lighter $A$ and lung cancer $Y$.
The lack of an arrow between $A$ and $Y$ in the causal graph indicates that carrying a lighter does not have a causal effect on lung cancer; $L$ is drawn as a common cause of $A$ and $Y$.
Association, unlike causation, is a symmetric relationship between two variables (an edge without direction); thus, when present, association flows between two variables regardless of the direction of the causal arrows.
We know that carrying a lighter $A$ is associated with lung cancer $Y$ because the two share the common cause smoking $L$: association flows from $A$ to $Y$ through $L$.
We learn that Hera is carrying a lighter. But if Hera is carrying a lighter ($A=1$), then she is more likely to be a smoker ($L=1$), and therefore she has a higher than average risk of developing lung cancer ($Y=1$).
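A quick simulation of this fork structure (numbers are made up): smoking is the only cause of lung cancer in the model, yet carrying a lighter is associated with cancer, and the association disappears once we condition on smoking.

```python
# Sketch: simulate the lighter example. Smoking L causes both carrying a
# lighter A and lung cancer Y; A has no causal effect on Y, yet A and Y
# are associated through the common cause L.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
L = rng.binomial(1, 0.3, n)                       # smoker
A = rng.binomial(1, np.where(L == 1, 0.8, 0.1))   # carries a lighter
Y = rng.binomial(1, np.where(L == 1, 0.2, 0.02))  # lung cancer (depends on L only)

print("Pr[Y=1 | A=1] =", Y[A == 1].mean())   # noticeably higher...
print("Pr[Y=1 | A=0] =", Y[A == 0].mean())   # ...than this, despite no causal effect
# Conditioning on the common cause removes the association:
print("Pr[Y=1 | A=1, L=0] =", Y[(A == 1) & (L == 0)].mean())
print("Pr[Y=1 | A=0, L=0] =", Y[(A == 0) & (L == 0)].mean())
```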
Suppose you know that a certain genetic haplotype $A$ has no causal effect on anyone's risk of becoming a cigarette smoker $Y$, and that both the haplotype $A$ and cigarette smoking $Y$ have a causal effect on the risk of heart disease $L$.
The lack of an arrow between $A$ and $Y$ indicates that the haplotype has no causal effect on smoking; $L$ is a collider on the path $A \rightarrow L \leftarrow Y$.
Now let's check whether $A$ and $Y$ are associated.
Learning about the haplotype $A$ does not improve our ability to predict smoking status $Y$: the two are marginally independent because neither causes the other and they share no common cause.
Causal graph theory again confirms our intuition: colliders, unlike other variables, block the flow of association along the path on which they lie.
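A matching simulation of the collider structure (again with made-up numbers): the haplotype and smoking are marginally independent, but conditioning on heart disease, the collider, induces an association.

```python
# Sketch: simulate the collider example. Haplotype A and smoking Y are
# independent causes of heart disease L. Marginally A and Y are independent;
# conditioning on the collider L induces an association (collider bias).
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
A = rng.binomial(1, 0.2, n)                        # haplotype
Y = rng.binomial(1, 0.3, n)                        # smoker
L = rng.binomial(1, 0.05 + 0.3 * A + 0.4 * Y, n)   # heart disease (collider)

print("marginal:    Pr[Y=1|A=1] - Pr[Y=1|A=0] =",
      Y[A == 1].mean() - Y[A == 0].mean())         # ~0
print("given L=1:   Pr[Y=1|A=1] - Pr[Y=1|A=0] =",
      Y[(A == 1) & (L == 1)].mean() - Y[(A == 0) & (L == 1)].mean())  # != 0
```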
Now suppose we obtain an additional piece of information: aspirin $A$ affects the outcome $Y$ because it reduces platelet aggregation $B$, so $B$ is a mediator on the path $A \rightarrow B \rightarrow Y$. Conditioning on the mediator $B$ blocks the flow of association between $A$ and $Y$.
Confounding is the bias due to common causes of treatment and outcome. Below is the graph of a treatment $A$, an outcome $Y$, and their common cause $L$ ($A \leftarrow L \rightarrow Y$).
The conditional probability of treatment given the covariates, $\Pr[A=1 \mid L]$, is known as the propensity score.
We can estimate the probability of treatment given the covariates $L$ with a model such as logistic regression.
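A minimal sketch on simulated data: fit a logistic regression for $\Pr[A=1 \mid L]$ and use the fitted propensity scores as inverse-probability weights (one of several ways to use a propensity score).

```python
# Sketch: estimate the propensity score Pr[A=1 | L] with logistic regression
# and use inverse-probability weighting (IPW) to estimate the average
# treatment effect on simulated confounded data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 50_000
L = rng.normal(size=(n, 2))                                 # measured confounders
p_treat = 1 / (1 + np.exp(-(L[:, 0] - 0.5 * L[:, 1])))      # true propensity
A = rng.binomial(1, p_treat)
Y = 1.0 * A + L[:, 0] + 0.5 * L[:, 1] + rng.normal(size=n)  # true effect = 1

ps = LogisticRegression().fit(L, A).predict_proba(L)[:, 1]  # estimated propensity
w = np.where(A == 1, 1 / ps, 1 / (1 - ps))                  # IP weights

naive = Y[A == 1].mean() - Y[A == 0].mean()                 # confounded contrast
ipw = (np.average(Y, weights=w * (A == 1))
       - np.average(Y, weights=w * (A == 0)))               # ~1
print(f"naive = {naive:.2f}, IPW estimate = {ipw:.2f}")
```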
The instrumental variable approach can handle unmeasured confounding: it introduces a variable $Z$ (the instrument) that affects the treatment $A$, has no direct effect on the outcome $Y$ (it affects $Y$ only through $A$), and shares no common causes with $Y$.
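A toy simulation of the idea, assuming a binary instrument and a constant treatment effect, using the simple Wald ratio as the IV estimator:

```python
# Sketch: a binary instrument Z affects treatment A but influences the
# outcome Y only through A, and is independent of the unmeasured confounder U.
# The Wald estimator recovers the effect despite the unmeasured confounding.
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
U = rng.normal(size=n)                                      # unmeasured confounder
Z = rng.binomial(1, 0.5, n)                                 # instrument (e.g., randomized encouragement)
A = rng.binomial(1, 1 / (1 + np.exp(-(1.5 * Z + U - 1))))   # treatment
Y = 2.0 * A + U + rng.normal(size=n)                        # true effect of A on Y is 2

naive = Y[A == 1].mean() - Y[A == 0].mean()                 # biased by U
wald = (Y[Z == 1].mean() - Y[Z == 0].mean()) / (A[Z == 1].mean() - A[Z == 0].mean())
print(f"naive = {naive:.2f}, Wald IV estimate = {wald:.2f}")
```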
Causality is identification, not prediction.
Predictive models can simply ignore the causal structure: any variable that improves prediction is useful, whether it is a cause, an effect, or merely correlated with the outcome.
A mediator variable explains the process through which two variables are related, while a moderator variable affects the strength and direction of that relationship.
https://psychdrop.com/2020/04/05/mediation-versus-moderation-whats-the-difference/
(add picture from the link)
Mediators mediate the relationship between X and Y: X affects M, and M in turn affects Y; this pathway is called the indirect effect. The direct effect is the relationship between X and Y in the presence of the mediator. Mediation occurs when (1) there is a statistically significant indirect effect and (2) the direct effect is smaller than the total effect.
Moderator variables modify the relationship between X and Y. They affect the strength and direction of the relationship between X and Y. That means that X‘s effect on Y can change depending on the moderator.
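A small simulated illustration of both ideas (variable names and numbers are mine, not from the linked post): the indirect effect shows up as the gap between the total and direct regression coefficients, and the moderator shows up as an interaction term.

```python
# Sketch: simulated data illustrating a mediator (X -> M -> Y) and a
# moderator W (the X -> Y slope changes with W). Plain least squares is used.
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
X = rng.normal(size=n)
# Mediation: X affects M, M affects Y, plus a direct path X -> Y.
M = 0.8 * X + rng.normal(size=n)
Y = 0.5 * X + 0.6 * M + rng.normal(size=n)

def ols(y, *cols):
    """Return least-squares coefficients for y on an intercept plus cols."""
    Xmat = np.column_stack([np.ones_like(y), *cols])
    return np.linalg.lstsq(Xmat, y, rcond=None)[0]

total = ols(Y, X)[1]        # total effect of X on Y      (~0.5 + 0.8*0.6 = 0.98)
direct = ols(Y, X, M)[1]    # direct effect, adjusting M  (~0.5)
print(f"total = {total:.2f}, direct = {direct:.2f}, indirect = {total - direct:.2f}")

# Moderation: the effect of X on Y2 depends on W (interaction term).
W = rng.binomial(1, 0.5, n)
Y2 = (0.2 + 0.7 * W) * X + rng.normal(size=n)
b = ols(Y2, X, W, X * W)    # coefficients: intercept, X, W, X*W
print(f"slope of X when W=0: {b[1]:.2f}, extra slope when W=1: {b[3]:.2f}")
```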
A confounder is a third variable that affects variables of interest and makes them seem related when they are not. In contrast, a mediator is the mechanism of a relationship between two variables: it explains the process by which they are related.
Confounders are often demographic variables such as age, gender, and race that typically cannot be changed in an experimental design. Mediators are by definition capable of being changed and are often selected based on malleability. Suppressor variables may or may not be malleable.
According to research and data, the mortality rate of developed countries is lower than in developing countries because of advanced healthcare facilities. So, here, being a developed country is the independent variable, the mortality rate is the dependent variable, and the mediator is better healthcare facilities, which mediate the relationship between the two.
Being a developed country does not influence the mortality rate directly. But after introducing the mediator, better healthcare facilities, we can see how being a developed country leads to a lower mortality rate.
https://www.statisticshowto.com/mediator-variable/
https://en.wikipedia.org/wiki/Causal_system
"Elements of Causal Inference" book has a good example in Figure 5.1.
- average treatment effect (ATE)
- average treatment effect on the treated (ATT)
- treatment-on-the-treated (TOT) effect (another name for the ATT)
- average treatment effect on the untreated (ATU)
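For reference, in the potential-outcome notation used above (treatment $A$, counterfactual outcomes $Y^{a}$), these estimands are:

$$\text{ATE} = E[Y^{a=1} - Y^{a=0}], \qquad \text{ATT (TOT)} = E[Y^{a=1} - Y^{a=0} \mid A=1], \qquad \text{ATU} = E[Y^{a=1} - Y^{a=0} \mid A=0].$$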
Two important questions:
- The Cause Question: What is a cause of a given effect?
- The Effect Question: What is an effect of a given cause?
In the context of a randomized controlled trial, the ATT can be estimated by comparing the outcomes of individuals who received the treatment to the outcomes of individuals who did not receive the treatment, but who were otherwise similar in all relevant respects.
In observational studies, where the assignment of treatment is not random, the ATT can be estimated using methods such as propensity score matching or regression analysis that adjust for confounding factors that may affect both the treatment assignment and the outcome.
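A minimal sketch of the matching approach on simulated data: estimate propensity scores, match each treated unit to the control with the closest score, and average the matched outcome differences.

```python
# Sketch: estimate the ATT by 1-nearest-neighbour matching on an estimated
# propensity score (one of the adjustment strategies mentioned above).
# Simulated data; in practice you would also check overlap and balance.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(5)
n = 20_000
L = rng.normal(size=(n, 3))                                  # measured confounders
A = rng.binomial(1, 1 / (1 + np.exp(-L[:, 0])))              # treatment assignment
Y = 1.5 * A + L[:, 0] + 0.5 * L[:, 1] + rng.normal(size=n)   # true effect = 1.5

ps = LogisticRegression().fit(L, A).predict_proba(L)[:, 1]
treated, control = np.where(A == 1)[0], np.where(A == 0)[0]

# For every treated unit, find the control unit with the closest propensity score.
nn = NearestNeighbors(n_neighbors=1).fit(ps[control].reshape(-1, 1))
_, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
matched_controls = control[idx[:, 0]]

att = (Y[treated] - Y[matched_controls]).mean()
print(f"ATT estimate = {att:.2f}")
```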
- Causal Inference: What If, Miguel A. Hernán and James M. Robins, December 31, 2020. https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
- Quasi-Experimentation: A Guide to Design and Analysis, Charles S. Reichardt
- Elements of Causal Inference: Foundations and Learning Algorithms, Jonas Peters, Dominik Janzing, and Bernhard Schölkopf
Judea Pearl, Professor, Computer Science Department, Cognitive Systems Lab, UCLA
http://bayes.cs.ucla.edu/jp_home.html
Miguel Hernán, Kolokotrones Professor of Biostatistics and Epidemiology at Harvard and the Broad Institute
https://www.hsph.harvard.edu/profile/miguel-hernan/
https://www.youtube.com/watch?v=gRkUhg9Wb-I&ab_channel=MITOpenCourseWare
https://www.youtube.com/watch?v=zvrcyqcN9Wo&ab_channel=BroadInstitute
https://github.com/MIT-LCP/mimic-code
https://github.com/uber/causalml
They use ML for uplift modeling and do not care about the causal graph; they are interested in the treatment effect. The model is heavily dependent on the uplift-modeling assumptions.
It allows the user to estimate the Conditional Average Treatment Effect (CATE) or Individual Treatment Effect (ITE) from experimental or observational data. Essentially, it estimates the causal impact of intervention $T$ on outcome $Y$ for users with observed features $X$, without strong assumptions on the model form.
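A hand-rolled sketch of one of the simplest meta-learners (a T-learner) on simulated data; causalml ships ready-made implementations of this and other learners, so this is only to illustrate the idea:

```python
# Sketch: a hand-rolled "T-learner" for CATE -- fit one outcome model per
# treatment arm and take the difference of predictions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(6)
n = 30_000
X = rng.normal(size=(n, 4))                     # observed user features
T = rng.binomial(1, 0.5, n)                     # randomized treatment
tau = 0.5 + X[:, 0]                             # true heterogeneous effect
Y = X[:, 1] + tau * T + rng.normal(size=n)

mu1 = GradientBoostingRegressor().fit(X[T == 1], Y[T == 1])   # E[Y | X, T=1]
mu0 = GradientBoostingRegressor().fit(X[T == 0], Y[T == 0])   # E[Y | X, T=0]
cate = mu1.predict(X) - mu0.predict(X)                        # per-user uplift

print("estimated ATE:", cate.mean(), " (true:", tau.mean(), ")")
```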
https://en.wikipedia.org/wiki/Uplift_modelling
The uplift of a marketing campaign is usually defined as the difference in response rate between a treated group and a randomized control group.
However, many marketers define lift (rather than uplift) as the difference in response rate between treatment and control, so uplift modeling can be defined as improving (upping) lift through predictive modeling.
There are 4 groups:
- The Persuadables : customers who only respond to the marketing action because they were targeted
- The Sure Things : customers who would have responded whether they were targeted or not
- The Lost Causes : customers who will not respond irrespective of whether or not they are targeted
- The Do Not Disturbs or Sleeping Dogs : customers who are less likely to respond because they were targeted
The only segment that provides true incremental responses is the Persuadables.
Uplift modelling provides a scoring technique that can separate customers into the groups described above. (How?)
Traditional response modelling often targets the Sure Things, since it is unable to distinguish them from the Persuadables.
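One common answer to the "(How?)" above, sketched with hypothetical scores: predict each customer's response probability under treatment and under control (e.g., with two models or a T-learner as above), then bucket customers by the two scores. The threshold below is arbitrary.

```python
# Sketch: bucket customers into the four uplift segments from two
# model-based response probabilities.
import numpy as np

def segment(p_treated: np.ndarray, p_control: np.ndarray, thr: float = 0.5):
    """Label customers given response probabilities under treatment/control."""
    labels = np.empty(len(p_treated), dtype=object)
    labels[(p_treated >= thr) & (p_control < thr)] = "Persuadable"
    labels[(p_treated >= thr) & (p_control >= thr)] = "Sure Thing"
    labels[(p_treated < thr) & (p_control < thr)] = "Lost Cause"
    labels[(p_treated < thr) & (p_control >= thr)] = "Sleeping Dog"
    return labels

# Example: probabilities would come from two response models (or a T-learner).
p_t = np.array([0.9, 0.8, 0.1, 0.2])
p_c = np.array([0.1, 0.9, 0.0, 0.7])
print(segment(p_t, p_c))   # ['Persuadable' 'Sure Thing' 'Lost Cause' 'Sleeping Dog']
```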
- CausalML: Python Package for Causal Machine Learning https://arxiv.org/pdf/2002.11631
- Uplift Modeling for Multiple Treatments with Cost Optimization https://arxiv.org/pdf/1908.05372
https://github.com/py-why/dowhy
https://www.microsoft.com/en-us/research/group/alice/
https://github.com/Microsoft/EconML
https://www.cs.ubc.ca/labs/lci/mlrg/slides/doCalc.pdf
There is observational data (seeing) and interventional data (doing). Usually the DAG is designed for observational data, but that ignores the possibility of unobserved variables; moreover, without interventional data you cannot distinguish the direction of causality.
Simplest external intervention: a single variable is forced to take some fixed value (in the graph, remove the arrows entering that variable).
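A small sketch of that graph surgery with networkx (the edge list is made up):

```python
# Sketch: "graph surgery" for the simplest intervention do(X = x) -- drop all
# arrows entering the intervened variable. Uses a networkx DiGraph.
import networkx as nx

g = nx.DiGraph([("Z", "X"), ("U", "X"), ("X", "Y"), ("U", "Y")])

def intervene(graph: nx.DiGraph, node: str) -> nx.DiGraph:
    """Return the mutilated graph in which `node` has no incoming edges."""
    g_do = graph.copy()
    g_do.remove_edges_from(list(graph.in_edges(node)))
    return g_do

g_do_x = intervene(g, "X")
print(sorted(g.edges()))       # [('U', 'X'), ('U', 'Y'), ('X', 'Y'), ('Z', 'X')]
print(sorted(g_do_x.edges()))  # [('U', 'Y'), ('X', 'Y')]
```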
Every path is built from three elementary structures: the "chain" ($X \rightarrow Z \rightarrow Y$), the "fork" ($X \leftarrow Z \rightarrow Y$), and the "v-structure" or "collider" ($X \rightarrow Z \leftarrow Y$).
Two sets of variables $X$ and $Y$ are d-separated given a set $Z$ if $Z$ blocks every path between them; d-separation in the graph implies the conditional independence $X \perp Y \mid Z$. If $X$ and $Y$ are d-connected (not d-separated) given $Z$, they may be dependent given $Z$.
The do() operator marks an action or an intervention in the model. In an algebraic (structural-equation) model, the intervention do(X = x) replaces the structural equation for X with the constant x.
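A tiny simulated structural model showing the difference between conditioning on $X$ (seeing) and intervening with do($X$) (doing); the equations are made up for illustration:

```python
# Sketch: in a structural (algebraic) model, do(X = x) means replacing the
# assignment for X with the constant x while keeping every other equation.
import numpy as np

rng = np.random.default_rng(7)
n = 500_000

def simulate(do_x=None):
    z = rng.normal(size=n)                    # Z := N_Z
    x = z + rng.normal(size=n) if do_x is None else np.full(n, do_x)  # X := Z + N_X (or constant)
    y = 2 * x + z + rng.normal(size=n)        # Y := 2X + Z + N_Y
    return z, x, y

_, x_obs, y_obs = simulate()                  # observational ("seeing")
_, _, y_do = simulate(do_x=1.0)               # interventional ("doing")

# Observational conditioning and intervention generally differ:
print("E[Y | X near 1] =", y_obs[np.abs(x_obs - 1) < 0.05].mean())  # ~2.5 (Z leaks in)
print("E[Y | do(X=1)]  =", y_do.mean())                             # ~2.0
```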
- Rule 1 (Insertion/deletion of observations)
- Rule 2 (Action/observation exchange)
- Rule 3 (Insertion/deletion of actions)
The goal is to generate probabilistic formulas for the effect of interventions in terms of the observed probabilities.
Not all models are acyclic. See for example Modeling Discrete Interventional Data Using Directed Cyclic Graphical Models (UAI 2009) by Mark Schmidt and Kevin Murphy
The do-calculus is an axiomatic system for replacing probability formulas containing the do operator with ordinary conditional probabilities.
https://www.andrewheiss.com/blog/2021/09/07/do-calculus-backdoors/
- Rule 1: Decide if we can ignore an observation
- Rule 2: Decide if we can treat an intervention as an observation
- Rule 3: Decide if we can ignore an intervention
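For reference, the three rules in Pearl's notation, where $G_{\overline{X}}$ is the graph with arrows into $X$ removed, $G_{\underline{X}}$ the graph with arrows out of $X$ removed, and $Z(W)$ the set of $Z$-nodes that are not ancestors of any $W$-node in $G_{\overline{X}}$:

- Rule 1: $P(y \mid do(x), z, w) = P(y \mid do(x), w)$ if $(Y \perp Z \mid X, W)$ holds in $G_{\overline{X}}$
- Rule 2: $P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w)$ if $(Y \perp Z \mid X, W)$ holds in $G_{\overline{X}\underline{Z}}$
- Rule 3: $P(y \mid do(x), do(z), w) = P(y \mid do(x), w)$ if $(Y \perp Z \mid X, W)$ holds in $G_{\overline{X}\overline{Z(W)}}$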