Transcript review before deposit: 16 Milan. Spanish #23
Comments
I'll pick this up |
Done! |
It would be good to do so, I think. @iramosp, can you work with Riva to
make it safer?
On Thu, 6 Mar 2025 at 22:07, Riva Quiroga wrote:
Done!
There are some linguistic features that make it possible to infer the
country of origin of the interviewee. I'm not sure if you want to anonymize
that too
rivaquiroga left a comment (open-life-science/open-seeds-impact-paper#23)
|
@rivaquiroga, do you have suggestions on how to do this? You're our expert linguist, so I'm happy to follow your lead on this. Sorry I said it wasn't needed before. |
The thing with anonymization is that it is usually the sum of small linguistic features that seem trivial on their own, plus random pieces of information, that allows you to reconstruct who the person is. For example, I think I have a pretty good guess of who this person might be, considering gender markers, their variety of Spanish, a mention of a specific programming language, the fact that they are not in the capital city of their country, and a mention of something related to their project (which I suggested anonymizing). |
Agree! |
You need to remove the features that are characteristic of that variety of Spanish, not change them. In most cases, these include filler words, certain adjectives, nouns, etc. For example, in Chilean Spanish, I could say something like: "Llegué a mi casa y me comí un pan batido con palta." ("I got home and ate a pan batido with avocado." Some people might even be able to identify the city where I'm based if they have knowledge of Chilean breads.) The edited version could be: "Llegué a mi casa y comí [producto local]." ("I got home and ate [local product].") If you change it to your variety, it would be: "Llegué a mi casa y me comí un pan con aguacate." ("I got home and ate bread with avocado.") But that is not what I said. So the approach is to de-identify, not to modify. |
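For anyone who wants a first automated pass before the manual review, the "de-identify, not modify" idea above could be sketched roughly as below. This is only an illustration under assumptions: the glossary `REGIONAL_TERMS` and the function `deidentify` are hypothetical names, and the single glossary entry is taken from the example sentence in the comment above, not from any real transcript.

```python
import re

# Hypothetical glossary of regionally marked expressions mapped to a neutral
# placeholder. The entry is illustrative; a real list would come from a
# linguist's review of each transcript.
REGIONAL_TERMS = {
    "un pan batido con palta": "[producto local]",
}


def deidentify(text, glossary=REGIONAL_TERMS):
    """Replace each glossary term with its placeholder (longest term first).

    Terms are masked with a placeholder, never swapped for an equivalent
    word from another variety of Spanish.
    """
    for term in sorted(glossary, key=len, reverse=True):
        placeholder = glossary[term]
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        # A function replacement keeps the placeholder literal.
        text = pattern.sub(lambda match: placeholder, text)
    return text


original = "Llegué a mi casa y me comí un pan batido con palta."
print(deidentify(original))
```

The sketch only masks terms that are already in the glossary. It does not touch "me comí", which the manually edited version above changes to "comí", so a pass like this would complement the line-by-line review and not replace it.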
@rivaquiroga, if you still have some time, it'd be good to anonymize this too. I tried doing it for the other Spanish transcripts, and will give them another pass after looking at your suggestions for this one. |
@iramosp, I'll work on that this week and let you know when I'm done |
@iramosp, done! |
¡Muchas gracias! This was a very meticulous review, and I learned a lot from it 🙌 |
Transcript Language:
Spanish
Task: