Multi-page text annotation #7099

Open
EmilyAlsentzer opened this issue Feb 17, 2025 · 3 comments

Comments

@EmilyAlsentzer

I am trying to annotate multiple pages of text data for a QA task, highlighting the span of text where the answer was found. You can do this in a clunky way using the QA template with multiple text boxes, but that doesn't scale well when you have 50+ pages of text with metadata for each page. Each page is free text, not a PDF, so the multi-page PDF template doesn't work either. Is there some way to highlight spans across many pages of text, each with its own metadata? Ideally, I'd like the user to be able to filter by each page's metadata as well. Thanks.

@heidi-humansignal
Collaborator

Hello Emily,

Could you tell us how many pages of free text you expect in a single task? Could you also provide some sample data?

Thank you,
Abu

Comment by Abubakar Saad

@EmilyAlsentzer
Author

Hi Abu,

We're working with electronic health record data, which includes unstructured clinical notes. Each patient record can have anywhere from 1 to several hundred notes. Each note is time-stamped and can be from a few sentences up to >10k tokens. The task is QA over the longitudinal, timestamped data. We want to be able to annotate the spans in the longitudinal data where the answer can be found.

I unfortunately can't provide sample data, as the data contains health information, but the simplest version of the input is typically a CSV with the columns patient_id, note_id, note_timestamp, and note_text. All of the notes belonging to a single patient_id should be in one task.

Thanks!
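
To make the input shape concrete, here is a minimal sketch of grouping a CSV with those four columns into one task per patient_id. The `build_tasks` helper, the `notes` key, and the sample rows are all hypothetical (the rows are synthetic, not real records). The outer `{"data": ...}` wrapper follows Label Studio's JSON task import layout; whether any existing labeling template can render a list of notes like this is not established in this thread.

```python
import csv
import io
import json
from collections import defaultdict


def build_tasks(csv_file):
    """Group note rows by patient_id into one task dict per patient."""
    notes_by_patient = defaultdict(list)
    for row in csv.DictReader(csv_file):
        notes_by_patient[row["patient_id"]].append(
            {
                "note_id": row["note_id"],
                "note_timestamp": row["note_timestamp"],
                "note_text": row["note_text"],
            }
        )
    # One task per patient. The "notes" key is an illustrative name,
    # not something a particular labeling template requires.
    return [
        {"data": {"patient_id": pid, "notes": notes}}
        for pid, notes in notes_by_patient.items()
    ]


# Synthetic stand-in for the real CSV, which cannot be shared.
sample = io.StringIO(
    "patient_id,note_id,note_timestamp,note_text\n"
    "p1,n1,2024-01-05T09:00:00,Synthetic note A\n"
    "p2,n2,2024-01-06T10:30:00,Synthetic note B\n"
    "p1,n3,2024-02-01T14:15:00,Synthetic note C\n"
)
tasks = build_tasks(sample)
print(json.dumps(tasks, indent=2))
```

Keeping the per-note metadata (note_id, note_timestamp) next to each note's text means any later filtering can work from the task itself.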

@EmilyAlsentzer
Author

I'll also add that some sort of temporal filtering would be nice to have, but at minimum we could handle that by presorting by timestamp before loading the data into Label Studio.
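
As a rough sketch of that presorting step, the snippet below orders one patient's notes by timestamp and applies a simple date-window filter before the data is loaded. It assumes ISO 8601 timestamps; the `sort_notes` and `notes_in_window` helpers and the sample notes are hypothetical, and the real timestamp format may differ.

```python
from datetime import datetime


def parse_ts(note):
    """Parse a note's timestamp, assuming ISO 8601 strings."""
    return datetime.fromisoformat(note["note_timestamp"])


def sort_notes(notes):
    """Return the notes ordered from oldest to newest."""
    return sorted(notes, key=parse_ts)


def notes_in_window(notes, start, end):
    """Keep notes with start <= timestamp < end, preserving input order."""
    return [n for n in notes if start <= parse_ts(n) < end]


# Synthetic notes for a single patient, deliberately out of order.
notes = [
    {"note_id": "n3", "note_timestamp": "2024-02-01T14:15:00"},
    {"note_id": "n1", "note_timestamp": "2024-01-05T09:00:00"},
    {"note_id": "n2", "note_timestamp": "2024-01-20T08:45:00"},
]
ordered = sort_notes(notes)
january = notes_in_window(ordered, datetime(2024, 1, 1), datetime(2024, 2, 1))
print([n["note_id"] for n in ordered])
print([n["note_id"] for n in january])
```

Parsing to `datetime` rather than sorting the raw strings avoids misordering when timestamps are not zero-padded or mix formats.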
