Incomplete gender perturbations in the subject dataset #84

ege-erdogan · 2024-09-23T08:48:23Z

Hi, and thanks for the nice work. We've found some samples in the subject dataset (downloaded from OSF) where the male and female versions of the same sentence do not clearly differ in the subject's gender, and where the ground truth labels do not cover all the words corresponding to the subject. Two examples (bold words are part of the subject but stay the same):

(MALE) Because his father works with horses , **Matilda** demands the definition of a horse .
(FEMALE) Because her father works with horses , **Matilda** demands the definition of a horse .

and

(MALE) Zain seeks escape in an ultimate manner by committing suicide , drowning **herself** in the waters of the Gulf of Mexico.
(FEMALE) Chloe seeks escape in an ultimate manner by committing suicide , drowning **herself** in the waters of the Gulf of Mexico.

This appears to be a human labeling error, going by A.1.1 in the paper, but we wanted to notify you and ask whether you were aware of it or have already updated the dataset to fix it.

Edit: to clarify, in the second example 'herself' is not part of the grammatical subject, but it refers to the subject and so should be modified to stay consistent, while in the first sentence 'Matilda' is the subject.
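
For anyone who wants to look for similar cases, here is a minimal sketch of the kind of check that flags the second example. It assumes the two versions are whitespace-tokenized and token-aligned, as in the examples above. The word lists and the function name `unchanged_gendered_tokens` are hypothetical and not part of the dataset or the paper's code.

```python
# Hypothetical word lists for illustration; not taken from the dataset.
MALE_WORDS = {"he", "him", "his", "himself"}
FEMALE_WORDS = {"she", "her", "hers", "herself"}


def unchanged_gendered_tokens(male_sentence, female_sentence):
    """Return (position, token) pairs where both versions keep the same gendered word."""
    male_tokens = male_sentence.split()
    female_tokens = female_sentence.split()
    if len(male_tokens) != len(female_tokens):
        raise ValueError("sentence versions are not token-aligned")
    gendered = MALE_WORDS | FEMALE_WORDS
    return [
        (i, m)
        for i, (m, f) in enumerate(zip(male_tokens, female_tokens))
        # Strip trailing punctuation so "herself." would still match.
        if m == f and m.lower().strip(".,") in gendered
    ]
```

Run on the second pair above, this reports 'herself' as a gendered word shared by both versions. It does not catch the first pair, since 'Matilda' is a name and a check like this would need a name list to flag it.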

Best,
Ege

rickwg · 2024-09-25T13:11:00Z

Hey Ege, thanks for bringing that to our attention - great catch! We definitely could've been clearer about how we put together the 'subject' dataset. Let me break it down:
For the 'subject' dataset, we're only labeling the first part of the grammatical subject. If there's a second part, we're leaving it out. As for those sentences you pointed out, we're altering the bold words specifically for the 'all' dataset.
Just so you know, we're actually in the process of updating our datasets. We've realized that using names to determine gender is a critical weakness in our current setup, so we're working on fixing that.
I'll make sure to close this issue once we've got the updated versions published and ready to go.
Really appreciate you flagging this. If you have any other questions or spot anything else, feel free to let me know.
