Hierarchical Causal Models #236

adamrupe · 2024-09-06T16:17:50Z

Closes #278

This PR implements Hierarchical Causal Models (Weinstein and Blei, 2024).

This PR will be ready for review when the following items have been implemented and tested:

- Algorithm 1: Graphical algorithm for collapsing a hierarchical causal graphical model (HCGM). This algorithm transforms the graph of a hierarchical causal model (HCM) into the graph of its collapsed model, following Definition 4.
- Algorithm 2: Graphical algorithm for augmenting a collapsed model. This algorithm adds an augmentation variable to a collapsed HCGM, following Definition 6.
- Algorithm 3: Graphical algorithm for marginalizing an augmented model. This algorithm marginalizes out parent(s) of an augmentation variable (Section 5.2).
- Causal query pipeline: Uses Algorithms 1-3 (as needed) to check whether a causal query is identifiable in the HCM. Whether Algorithms 2 and 3 are needed depends on the causal query, i.e., whether a variable needs to be augmented in (Algorithm 2) and then whether another variable needs to be marginalized out (Algorithm 3).
- HSCM tests
- High-level example (with real-world motivation) that shows how to run a causal query on an HCM
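For readers new to Q variables, here is a minimal, illustrative sketch of the intuition behind an augmentation variable (Algorithm 2), assuming discrete subunit variables and assuming the augmentation variable is a per-unit distribution computed from existing Q variables. By the law of total probability, a unit's marginal outcome distribution Q_Y is fully determined by its treatment distribution Q_A and its conditional outcome distribution Q_{Y|A}. The helper `marginalize_q` and the tutoring numbers are hypothetical; the sketch illustrates only the probability identity, not the graph operation implemented in `src/y0/hierarchical.py`.

```python
"""Illustrative sketch: a unit-level Q variable computed from other Q variables.

Assumes discrete subunit variables. Q distributions are plain dicts mapping
values to probabilities; all names and numbers here are hypothetical.
"""

from __future__ import annotations


def marginalize_q(
    q_treatment: dict[str, float],
    q_outcome_given_treatment: dict[str, dict[str, float]],
) -> dict[str, float]:
    """Return Q_Y(y) = sum over a of Q_A(a) * Q_{Y|A}(y | a) for one unit."""
    q_outcome: dict[str, float] = {}
    for treatment, p_treatment in q_treatment.items():
        for outcome, p_outcome in q_outcome_given_treatment[treatment].items():
            # Accumulate P(a) * P(y | a) into P(y).
            q_outcome[outcome] = q_outcome.get(outcome, 0.0) + p_treatment * p_outcome
    return q_outcome


# Hypothetical school: 30% of students are tutored.
q_tutoring = {"tutored": 0.3, "not_tutored": 0.7}
# Hypothetical pass rates among tutored and untutored students.
q_pass_given_tutoring = {
    "tutored": {"pass": 0.9, "fail": 0.1},
    "not_tutored": {"pass": 0.6, "fail": 0.4},
}

q_pass = marginalize_q(q_tutoring, q_pass_given_tutoring)
print(q_pass)  # pass ≈ 0.69, fail ≈ 0.31
```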

codecov · 2024-09-06T19:42:27Z

Codecov Report

Attention: Patch coverage is 88.12500% with 19 lines in your changes missing coverage. Please review.

Project coverage is 81.27%. Comparing base (05a9456) to head (3af8c66).
Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/y0/hierarchical.py | 88.12% | 9 Missing and 10 partials ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #236      +/-   ##
==========================================
+ Coverage   80.87%   81.27%   +0.39%     
==========================================
  Files          50       51       +1     
  Lines        4135     4314     +179     
  Branches      845      981     +136     
==========================================
+ Hits         3344     3506     +162     
- Misses        668      670       +2     
- Partials      123      138      +15
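The numbers in this report are consistent with Codecov's coverage ratio, which counts partially covered lines as not covered: coverage = hits / (hits + misses + partials). The sketch below re-derives the reported percentages. `coverage_percent` is a hypothetical helper, and the patch size of 160 lines (141 hits) is inferred from the reported 88.125% and 19 uncovered lines rather than stated in the report.

```python
def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Codecov-style coverage: partially covered lines count against coverage."""
    return 100 * hits / (hits + misses + partials)


# Project totals from the coverage diff (main vs. #236).
base = coverage_percent(hits=3344, misses=668, partials=123)
head = coverage_percent(hits=3506, misses=670, partials=138)
print(f"{base:.2f}% -> {head:.2f}%")  # 80.87% -> 81.27%

# Patch: 19 uncovered lines (9 missing + 10 partials). Assuming a 160-line
# patch, 141 hits reproduce the reported 88.125%.
patch = coverage_percent(hits=141, misses=9, partials=10)
print(f"{patch:.3f}%")  # 88.125%
```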


cthoyt · 2024-09-10T10:22:47Z

hi @adamrupe - can you add a checklist into the PR description with the tasks to complete for this PR before it needs review?

Copilot

Copilot reviewed 18 out of 19 changed files in this pull request and generated 1 comment.

Files not reviewed (1)

tox.ini: Language not supported

src/y0/hierarchical.py

pyproject.toml

tests/test_hierarchical.py

cthoyt

I did a major refactor to address the software issues from the last round. The next steps for @adamrupe and @djinnome are:

- Read through the code and familiarize yourselves with the new interface
- Comment on / address all TODOs I left in the code (there aren't many)
- Tests
  - Either implement tests for conversion to HSCM or delete the conversion code
  - Test augment_collapsed_model
- Check the notebook, which used to raise some exceptions; I replaced those with the high-level identify_outcomes API. Please review to make sure that the places where no estimand is produced because the graph has a single c-component are all correct.
- Create a high-level, real-world example that demonstrates using all of the code in a story-driven workflow (i.e., do not explain the math; explain only which of the functions you implemented solve the problem). Use https://github.com/y0-causal-inference/y0/blob/main/notebooks/Counterfactual%20Transportability.ipynb as the gold standard for what a great notebook with applications looks like.

Along the way, please make sure that you check the CI/CD system for automated, objective feedback on code quality. @adamrupe, if you're not familiar with how to do this, I am happy to show you.

adamrupe · 2025-02-05T22:46:56Z

@cthoyt What's your recommendation for handling merge conflicts with Jupyter notebooks? I need to do this before I can pull your updates. I'm also not familiar with the CI/CD system, so if you could talk me through it that would be great.

cthoyt · 2025-02-05T22:58:06Z

@adamrupe before merging, copy your local notebook to your desktop. While merging, throw away everything in your repository's copy and overwrite it with the remote version. Then you can manually compare the copy on your desktop with the new version from the remote repository, side by side.

The best way to avoid this kind of thing is to never leave changes unpushed when you finish working, and to always pull before you start working again.
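For the side-by-side inspection step, a raw text diff of two `.ipynb` files is hard to read because outputs and metadata change on every run. Below is a minimal sketch, assuming only the Python standard library, that compares just the cell sources of the two copies. `load_notebook`, `cell_sources`, and `diff_notebook_sources` are hypothetical helpers, not part of y0.

```python
from __future__ import annotations

import difflib
import json
from pathlib import Path

CELL_SEPARATOR = "# ---- next cell ----"


def load_notebook(path: Path) -> dict:
    """Read an .ipynb file as plain JSON (no third-party packages needed)."""
    return json.loads(path.read_text(encoding="utf-8"))


def cell_sources(notebook: dict) -> list[str]:
    """Return each cell's source text, ignoring outputs and metadata."""
    sources = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # The notebook format stores a cell's source as a string or a list of lines.
        sources.append(source if isinstance(source, str) else "".join(source))
    return sources


def diff_notebook_sources(local: dict, remote: dict) -> list[str]:
    """Unified diff of cell sources: local copy (-) versus remote version (+)."""
    local_lines = f"\n{CELL_SEPARATOR}\n".join(cell_sources(local)).splitlines()
    remote_lines = f"\n{CELL_SEPARATOR}\n".join(cell_sources(remote)).splitlines()
    return list(
        difflib.unified_diff(
            local_lines, remote_lines, fromfile="local", tofile="remote", lineterm=""
        )
    )
```

For example, pass `load_notebook(...)` of the copy saved on your desktop and of the merged copy in the repository (both paths are up to you), then read the returned lines: `-` lines exist only in your local copy, `+` lines only in the remote version.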

The short explanation of how to use the CI/CD system is: you can always scroll to the bottom of this pull request (#236) and look at the feedback given by GitHub running our unit tests, linting, and code quality checks.

This is what it looks like to me right now:

[Screenshot: CI status checks at the bottom of this pull request]

You can click on any of the rows with a red x, and it will bring you to the page that ran the tests for you. Right now, you will be able to see all of the output from running pytest. You have to scroll up a bit because pytest unfortunately reports timings and warnings after the test failures, but you can see https://github.com/y0-causal-inference/y0/actions/runs/13134504627/job/36646756591?pr=236#step:6:69 for the currently failing test.

Similarly, while you're still getting used to code quality checks, you will probably see that the linting or type checking scripts also give errors, which you can view in the same way.

In a team setting, the expectation is that you push often and check the feedback CI gives each time. This will help you iteratively improve your code with fully objective feedback that you don't have to wait for someone else to give you. As an alternative to CI/CD on GitHub, you can run tox locally, which also gives a reproducible run of the entire test suite.

There's documentation in the README on how to use all of the nice development tools built into this repo at https://github.com/y0-causal-inference/y0?tab=readme-ov-file#%EF%B8%8F-for-developers

If you get stuck on any parts of this that aren't self-explanatory, I'm happy to plan a video chat tomorrow, or sometime next week before 6 PM German time.

adamrupe · 2025-02-06T20:13:43Z

Awesome, thanks @cthoyt! That makes sense, and Jeremy and Richard have already shown me how to use tox a bit. I've pulled your changes and I'm going through them now. I'll add a test_to_hscm.

cthoyt · 2025-03-01T01:28:31Z

@adamrupe you should be unblocked on the CI/CD pipeline now. Looking forward to seeing a nice case study notebook, then we can finish this PR!

adamrupe · 2025-03-17T23:54:48Z

@cthoyt @djinnome I've changed the name of the previous notebook to HCM Manuscript Figures.ipynb and added a new case study notebook called Hierarchical Causal Models.ipynb.

cthoyt · 2025-03-18T10:08:52Z

I'd like you to consider what makes https://github.com/y0-causal-inference/y0/blob/main/notebooks/Surrogate%20Outcomes.ipynb a joy to read, and try to take some lessons from it to improve the HCM notebook.

- Write your notebook keeping in mind that you are the last human being who ever has to understand the math behind the implementation you wrote.
- Imagine that all users of y0 want to solve a real problem, and they are reading the documentation to understand how they can model their problem using the data structures and algorithms in y0. They do not appreciate:
  - Abstract headings. Name each case study by the actual problem it's about, not the archetypical HCM as named by the paper. E.g., "Confounder" -> "After-school Tutoring and Test Scores"
  - Prose written like a mathematician. Rather than writing "Consider a school district that is interested in understanding how effective after-school tutoring is at raising test scores.", write "A school district is interested in understanding how effective after-school tutoring is at raising test scores." Use this simple and straightforward language to tell a story, not drag the user through a convoluted proof. Avoid words like "suppose" and "consider".
  - Abstract examples. Give something concrete in any place you're tempted to use a variable to represent a high-level concept. Explain the concrete reason you need to make the modeling choice based on the case study; after that, you may explain the theory that corresponds to that choice.
  - Cryptic variable names. Any time you use a one-letter variable name, you make it harder for readers to follow the example. Why are average test scores using the variable $y$? Call it score!
  - Cryptic notation. Do the bars on top of the variable help readers understand what's going on? If we're not using individual test scores, then what does this add? Further, why are we subscripting with i? The explanation for the subscripting comes way too late.
  - Mixture of typography. I think that it's better to reproduce Figure 1 inside the notebook rather than using a mixture of visual styles and fonts.
- Use Python naming conventions for all variables. Scores, Tutoring, and UnitConfounder should be scores, tutoring, and unit_confounder.
- I'm not sure you need to explain the process of collapsing and augmenting. Isn't y0 able to abstract this away? Is there a logical reason the reader needs to know this happens in the background, given "here's the problem, here's how to model it, and here's the algorithm to apply to get an answer"? Maybe you can reuse some of the thoughts from above to frame this in a way that it's about the problem instead of about the math, but I think
- If you want to include math, you do have to address the difference between continuous integrals and the discrete estimands that come out of y0 functions. Further, try to match your hand-written notation to the output of y0 (e.g., use capital P for probability distributions and capital Q for Q variables).
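To make the integral-versus-sum point concrete, here is one way to write the same identity in both forms, using capital P and capital Q as suggested above. The example (a school's distribution of test scores obtained from its distribution of tutoring hours and the conditional distribution of scores given tutoring) is illustrative, and the symbols are assumptions rather than notation taken from the notebook or from y0's output.

```latex
% Continuous tutoring hours t: the unit-level Q variable as an integral
Q_{\text{score}}(s) = \int Q_{\text{tutoring}}(t)\, Q_{\text{score} \mid \text{tutoring}}(s \mid t)\, dt
% Discrete tutoring levels t: the same identity as a sum, which is the form a discrete estimand takes
Q_{\text{score}}(s) = \sum_{t} Q_{\text{tutoring}}(t)\, Q_{\text{score} \mid \text{tutoring}}(s \mid t)
```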

adamrupe · 2025-03-21T19:06:22Z

@cthoyt thanks for the suggestions. Here are some thoughts, in no particular order:

- We can add our own renderings instead of displaying Figure 1 from the paper. But there will still be a mix of typography, because the hierarchical models require pygraphviz. It would be nice to have a pygraphviz backend for NxMixedGraph.draw(): the styles would then match, and pygraphviz generally produces nice visuals.
- On a related note, one reason for using abstract variable names like Y and A is that they render nicely with NxMixedGraph.draw(). More verbose names like scores do not fit in the nodes when visualized.
- Speaking of variable names, I used capitalized names for y0 Variables, like Scores, to follow the same convention used in other y0 notebooks, including the Surrogate Outcomes notebook you suggested (e.g., Cancer, Smoking, Tar).
- Something very important to note is that the ideas and algorithms of hierarchical causal modeling are still in development. As yet, there is no general and sound algorithm to determine whether a given causal query is identifiable from a hierarchical causal model. Therefore, to use the graphical algorithms we have implemented for hierarchical causal models, users need some baseline understanding of the algorithms and the ideas behind them, like what a Q variable is. I've tried to distill this baseline understanding from the (quite long) paper into the notebook. We can try to cut back on the math a little, but not too much.
- For example, you asked whether the overbars are necessary to convey averaged quantities. In the first Confounder example, the distinction between quantities averaged over the students in a school and quantities for each individual student is hugely important. The causal query is not identifiable when using the average values (which is not a hierarchical problem), but it does become identifiable when we have the individual student data (which makes it a hierarchical problem). A user who wants to apply these algorithms to their own hierarchical causal problems needs to understand why this is the case.
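A toy illustration of why the averaged (overbar) quantities lose information that student-level data keeps: two hypothetical schools can have identical school-level averages for tutoring and scores while the relationship between tutoring and scores inside each school is completely different. The data and helper names below are made up for illustration. The sketch only shows the information loss; it does not run any identification algorithm.

```python
from statistics import mean

# Hypothetical student records per school: (tutored, score).
schools = {
    "school_a": [(1, 80), (1, 80), (0, 60), (0, 60)],
    "school_b": [(1, 70), (1, 70), (0, 70), (0, 70)],
}


def school_averages(students):
    """Unit-level summary: fraction tutored and mean score (the 'overbar' view)."""
    return mean(t for t, _ in students), mean(s for _, s in students)


def within_school_gap(students):
    """Subunit-level view: mean score of tutored minus untutored students."""
    tutored = [s for t, s in students if t == 1]
    untutored = [s for t, s in students if t == 0]
    return mean(tutored) - mean(untutored)


for name, students in schools.items():
    print(name, school_averages(students), within_school_gap(students))
```

Both schools report the same averages, so averaged data cannot tell them apart, whereas the student-level records reveal a 20-point gap in one school and none in the other. The gap is only a within-school association, not a causal effect.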

cthoyt · 2025-04-19T13:46:20Z

FYI @adamrupe @djinnome, I'll be starting my new job at RWTH Aachen, probably in the third week of June, but as soon as I sign the contract (sometime between now and then), I plan to submit the y0 paper to JOSS. If we want to credit the implementation of HCMs in that paper, I would like this PR to be finished by then. Please let me know if the expectations for finishing and merging this PR are not clear.

djinnome · 2025-04-19T14:27:44Z

Congrats on the new position! June is a good deadline for us. @adamrupe is not at PNNL anymore, but I have some time in May to finish off the remaining items.


djinnome · 2025-06-18T23:37:42Z

Referring back to #236 (comment)

I think we should help set reasonable expectations for the user.
The hierarchical causal model paper does not describe a sound and complete algorithm.
Certain decisions are still left to the user in terms of how to augment a hierarchical causal model so that it can be identified.
The confounder, instrumental variables, and interference scenarios all demonstrate how to identify causal queries over hierarchical models that cannot be identified in a non-hierarchical model.
With that said, we are very much open to adjusting the notation and examples so that they are clearer to the user, even if the mechanics of the algorithms can't be completely ignored by the user.

adamrupe linked an issue Sep 6, 2024 that may be closed by this pull request

Implement hierarchical causal models from figure 2 in pygraphviz #232

Closed

cthoyt added the hierarchical causal models label Oct 29, 2024

djinnome assigned adamrupe Jan 24, 2025

djinnome requested review from cthoyt and Copilot January 24, 2025 00:00

Copilot AI reviewed Jan 24, 2025


src/y0/hierarchical.py (outdated, resolved)

djinnome marked this pull request as ready for review January 24, 2025 00:01

cthoyt reviewed Jan 24, 2025


pyproject.toml (outdated, resolved)

This comment was marked as outdated.


cthoyt reviewed Jan 27, 2025


tests/test_hierarchical.py (outdated, resolved)

cthoyt force-pushed the HCM-fig2 branch 2 times, most recently from 3237de1 to ef54579, February 3, 2025 08:42

cthoyt requested changes Feb 3, 2025


cthoyt force-pushed the HCM-fig2 branch from a53f77f to 18d2ae4, February 3, 2025 10:07

cthoyt mentioned this pull request Feb 5, 2025

Add tutorial on working in a team cthoyt/cookiecutter-snekpack#41

Open

adamrupe added 10 commits February 27, 2025 18:43

functions for creating and querying HCMs with pygraphviz

3c27650

functionality to collapse phygraphviz HCMs to nxmixedgraphs

3975ba4

added __all__

134202c

added test_hierarchical

e643395

pygraphviz dep and lint exception for hierarchical in pyproject

4079a0d

auto lint hierarchical

e8fb669

auto lint test_hierarchical

c76a6f9

added docstrings to hierarchical.py

9134417

added docstrings to test_hierarchical.py

833ed32

more ignores in pyproject.toml

6828af3

adamrupe and others added 12 commits March 5, 2025 14:25

raise exception if collapsing HCM with unobserved subunits

7006267

allow strings for marginal_parents in marginalize_augmented_model

ee87ec5

minor clean to HCM nb

4ccd6ff

add tests for augment_collapsed_model and lint

9d9563f

switch Q variables to upper case to match paper

b25a8a8

Update index.rst

b9a8918

Merge branch 'main' into HCM-fig2

4fe51fb

Ruff

60fb658

Update tests.yml

0633c6f

rename current HCM nb

c59a7f9

add HCM nb

a80607d

lint

4dafe73

adamrupe and others added 3 commits March 17, 2025 21:35

update HCM Figs nb with new Q variable format

478d8fc

Update Hierarchical Causal Models.ipynb

b9756ff

Cleanup unused code

e7ed6fd

Rerun

8811ed0

cthoyt mentioned this pull request May 9, 2025

JOSS manuscript second draft #287

Merged

Started the process of converting unit-level and subunit-level variables to english names that capture the unit-level (school) and subunit (student) hierarchical distributions

ae91b8b

cthoyt added 2 commits June 20, 2025 16:46

Merge branch 'main' into HCM-fig2

3864813

Update notebook

4c3926a

cthoyt enabled auto-merge (squash) June 20, 2025 14:46

cthoyt added 2 commits June 20, 2025 16:51

Renames

428be21

Update README.md

f485b00
