Enabling sexual orientation attribute #233

Open
PS5138 wants to merge 1 commit into cvs-health:main from PS5138:sexual-orientation-attribute

PS5138 commented Feb 15, 2026

Closes #142

Hi! This is my second open-source contribution, so I appreciate your patience. I've done my best to follow the existing code patterns and contributing guidelines, but if I've missed anything or if there are changes you'd like me to make, please let me know and I'll be happy to work on it.

Description

This PR adds sexual orientation as a third protected attribute to the CounterfactualGenerator, alongside the existing gender and race attributes. The implementation follows the same substitution strategy used for race (all-to-one replacement) with four groups: heterosexual, gay, lesbian, and bisexual.

Changes in detail

langfair/constants/word_lists.py

  • Added SEXUAL_ORIENTATION_WORDS_NOT_REQUIRING_CONTEXT (13 terms: homosexual, heterosexual, bisexual, lesbian, queer, lgbtq, etc.)
  • Added SEXUAL_ORIENTATION_WORDS_REQUIRING_CONTEXT (3 terms: gay, straight, pride; these only match when followed by a person word, to avoid false positives like "straight line" or "go straight")
  • Word lists are informed by the HRC Glossary of Terms
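
To illustrate the context rule described above, here is a minimal sketch of how a context-requiring term could be matched only when a person word follows it. The list contents, the `PERSON_WORDS` list, and the `find_orientation_terms` helper are hypothetical and abbreviated for illustration; they are not the lists or code from this PR.

```python
import re
from typing import List

# Hypothetical, abbreviated lists for illustration only.
WORDS_NOT_REQUIRING_CONTEXT = ["heterosexual", "bisexual", "lesbian"]
WORDS_REQUIRING_CONTEXT = ["gay", "straight"]
PERSON_WORDS = ["man", "men", "woman", "women", "person", "people"]


def find_orientation_terms(text: str) -> List[str]:
    """Return the terms found in `text`, applying the context rule."""
    lowered = text.lower()
    found = []

    # Unambiguous terms match anywhere, as whole words.
    for word in WORDS_NOT_REQUIRING_CONTEXT:
        if re.search(rf"\b{re.escape(word)}\b", lowered):
            found.append(word)

    # Ambiguous terms match only when directly followed by a person word.
    person_alt = "|".join(re.escape(p) for p in PERSON_WORDS)
    for word in WORDS_REQUIRING_CONTEXT:
        if re.search(rf"\b{re.escape(word)}\s+(?:{person_alt})\b", lowered):
            found.append(word)

    return found
```

With these assumed lists, "Draw a straight line" yields no matches, while "A straight man and a lesbian" yields both `lesbian` and `straight`.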

langfair/generator/counterfactual.py

  • Built STRICT_SEXUAL_ORIENTATION_WORDS mappings (mirroring the race pattern)
  • Added sexual_orientation to attribute_to_word_lists, group_mapping, and validation
  • Added _get_sexual_orientation_subsequences, _counterfactual_sub_sexual_orientation, and _replace_sexual_orientation helper methods (mirroring _get_race_subsequences, _counterfactual_sub_race, and _replace_race)
  • Replacement sorts by length (longest first) to prevent partial matches (e.g., "homosexual" inside "homosexuals")
  • Added sexual_orientation support to neutralize_tokens (uses [MASK], same as race)
  • Updated all relevant docstrings
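
The longest-first ordering and the `[MASK]` neutralization can be sketched as follows. This is an illustration of the general technique using a single-pass regular expression, not the code in this PR; the `substitute` and `neutralize` helpers and the example mapping are hypothetical.

```python
import re
from typing import Dict, Iterable


def _longest_first_pattern(terms: Iterable[str]):
    # Regex alternation tries alternatives left to right, so placing longer
    # terms first keeps "homosexual" from matching inside "homosexuals".
    ordered = sorted(terms, key=len, reverse=True)
    return re.compile("|".join(re.escape(t) for t in ordered), flags=re.IGNORECASE)


def substitute(text: str, mapping: Dict[str, str]) -> str:
    """Replace each term with its counterpart; mapping keys must be lowercase."""
    pattern = _longest_first_pattern(mapping)
    return pattern.sub(lambda m: mapping[m.group(0).lower()], text)


def neutralize(text: str, terms: Iterable[str], mask: str = "[MASK]") -> str:
    """Replace every listed term with a mask token."""
    return _longest_first_pattern(terms).sub(mask, text)
```

For example, with the hypothetical mapping `{"homosexual": "bisexual person", "homosexuals": "bisexual people"}`, the plural form is replaced as a whole rather than as the singular form plus a trailing "s".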

langfair/auto/auto.py

  • Added sexual_orientation to Protected_Attributes
  • Fixed protected_words initialization to derive dynamically from Protected_Attributes instead of being hardcoded to just race and gender
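
The difference between a hardcoded and a derived initialization can be sketched as below. The counter-dictionary shape and the `init_protected_words` helper are assumptions made for illustration; the actual structure in langfair/auto/auto.py may differ.

```python
from typing import Dict, List

# Hypothetical stand-in for the Protected_Attributes list after this PR.
PROTECTED_ATTRIBUTES: List[str] = ["race", "gender", "sexual_orientation"]


def init_protected_words(attributes: List[str]) -> Dict[str, int]:
    # Derive one zeroed counter per attribute, so a newly added attribute
    # is picked up without editing a hardcoded dictionary.
    return {attribute: 0 for attribute in attributes}


protected_words = init_protected_words(PROTECTED_ATTRIBUTES)
```

Deriving the dictionary from the attribute list keeps the two in sync when further attributes are added.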

tests/test_counterfactualgenerator.py

  • Added test_counterfactual_sexual_orientation covering parse_texts, create_prompts, generate_responses, check_ftu, neutralize_tokens, and validation error handling

Contributor License Agreement

Tests

  • no new tests required
  • new tests added
  • existing tests adjusted

All tests passed.

Documentation

  • no documentation changes needed
  • README updated
  • API docs added or updated
  • example notebook added or updated

Screenshots

N/A


Development

Successfully merging this pull request may close these issues.

Enable sexual orientation attribute for CounterfactualGenerator