
More robust CounterfactualGenerator #212

Open

dskarbrevik wants to merge 20 commits into cvs-health:release-branch/v0.7.2 from dskarbrevik:develop

Conversation

@dskarbrevik
Contributor

@dskarbrevik commented Sep 24, 2025

Description

  • Improves the boundary detection used when parsing prompts, allowing better identification of protected attributes. For example, "caucasian student" was previously parsed as containing "asian student" because the substring "asian" inside "caucasian" was matched.
  • Adds an "llm_ftu" parameter to CounterfactualGenerator.generate_responses() that lets you pass a langchain BaseChatModel to perform the FTU check, giving a more robust protected-attribute detection mechanism (see the usage sketch after this list).
  • Adds retry logic for cases where the LLM response format is incorrect.
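Below is a minimal usage sketch of the new option. Only the llm_ftu parameter comes from this PR's description; the constructor argument, model choice, and async calling convention are assumptions based on how LangFair generators are typically used and may differ from the final implementation.

```python
# Hedged sketch only: llm_ftu is the new parameter described above; everything
# else (model, constructor args, awaiting the call) is assumed, not verified.
import asyncio
from langchain_openai import ChatOpenAI
from langfair.generator import CounterfactualGenerator

async def main():
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # any BaseChatModel
    cdg = CounterfactualGenerator(langchain_llm=llm)

    responses = await cdg.generate_responses(
        prompts=["The caucasian man was looking at a tree."],
        attribute="race",
        llm_ftu=llm,  # new in this PR: LLM-based FTU check instead of word lists only
    )
    print(responses)

asyncio.run(main())
```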

Contributor License Agreement

Tests

  • no new tests required
  • new tests added
  • existing tests adjusted

Documentation

  • no documentation changes needed
  • README updated
  • API docs added or updated
  • example notebook added or updated

Screenshots

Improved parsing for word list counterfactuals

Here's an example prompt that previously had parsing issues:
"The caucasian man was looking at a tree." (previously this got hits for both "caucasian" and "asian man" due to improper boundary handling).

Now:
(screenshot: corrected parsing of the prompt)
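The underlying issue can be illustrated with plain word-boundary matching. This is only an illustration of the boundary-detection idea, not the library's actual parsing code:

```python
import re

prompt = "The caucasian man was looking at a tree."

# Naive substring matching finds "asian" inside "caucasian" (the old false positive):
print("asian" in prompt.lower())                          # True

# Matching on word boundaries avoids that false positive:
print(bool(re.search(r"\basian\b", prompt.lower())))      # False
print(bool(re.search(r"\bcaucasian\b", prompt.lower())))  # True
```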

LLM based counterfactuals

Here's an example prompt with multiple race attributes mentioned:
"The asian car mechanic is fixing the white car of that white guy and that other black guy."

We can see that the word-list-based approach catches "white guy" and "black guy" but isn't robust enough to catch "asian mechanic":
(screenshot: word-list-based detection results)

With the LLM-based approach, we get all race attribute mentions for this case:
(screenshot: LLM-based detection results)
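For reference, LLM-based detection with retry-on-bad-format could look roughly like the sketch below. The prompt wording, helper name, and retry limit are illustrative assumptions, not the PR's actual implementation.

```python
# Hedged sketch of LLM-based attribute detection with retry on malformed output.
import json
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def find_race_mentions(prompt: str, max_retries: int = 3) -> list[str]:
    instruction = (
        "List every race-related mention in the following text as a JSON "
        f"array of strings. Return only the JSON array.\n\nText: {prompt}"
    )
    for _ in range(max_retries):
        reply = llm.invoke(instruction).content
        try:
            mentions = json.loads(reply)
            if isinstance(mentions, list):
                return mentions
        except json.JSONDecodeError:
            continue  # malformed response: retry
    return []

print(find_race_mentions(
    "The asian car mechanic is fixing the white car of that white guy "
    "and that other black guy."
))
# e.g. ["asian car mechanic", "white guy", "black guy"]
```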

...

Here's a test with a bigger group of prompts:
(screenshot: larger set of test prompts)

Here are the terms picked up:
(screenshot: terms picked up)

And finally, here are the generated counterfactuals from this set of prompts:
(screenshot: generated counterfactual prompts)

Two interesting things to point out from the above prompts:

  1. For the prompt: "That guy is white and she is black and that person over there is asian. They all drive white cars."
    We see the asian_prompt (counterfactual generation):
    "That guy is asian and she is asian and that person over there is asian. They all drive white cars."

So you can see that the LLM correctly found all of the race words, avoided the car-color reference, and substituted them all appropriately.

  2. For the prompt: "The black| flight attendant was looking at a tree."
    We see that the LLM was able to handle this odd punctuation (the | character) and correctly generate counterfactuals.

@dskarbrevik dskarbrevik marked this pull request as ready for review September 24, 2025 22:29
@dylanbouchard dylanbouchard linked an issue Sep 27, 2025 that may be closed by this pull request
@dylanbouchard dylanbouchard changed the base branch from develop to release-branch/v0.7.2 September 30, 2025 20:02

Development

Successfully merging this pull request may close these issues.

Improve robustness of CounterfactualGenerator
