
Is it possible to access terminals for unparsed text? #58

Open
abigailalice opened this issue Jan 11, 2025 · 0 comments

I'm trying to render the original sentences that I parse with Greynir verbatim as HTML, while also inserting the additional data (e.g. lemmas, parts of speech) into the HTML. However, it's not clear whether all of the original text is recoverable from Greynir's results. For instance, if a sentence contains multiple consecutive spaces, tidy_text reduces them to a single space, and terminals doesn't show spaces at all.

For comparison, spaCy lets you recover the input text from its output. Is there a way to do this here, so that I can iterate over terminals and unparsed text together? I did see that periods are stored as a terminal with no category, so unparsed text could presumably be stored the same way, but judging from tidy_text's behaviour I assume the data might not be stored at all.

As a workaround, I'm looking at inserting the missing context back into the results myself; I'm just curious whether there are any methods or attributes that already give me the information I need.
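
For reference, here is a rough sketch of the kind of workaround I have in mind. It is plain Python and does not call Greynir at all: `interleave_with_gaps` is a hypothetical helper name, and it assumes that each token's surface text appears verbatim, and in order, in the original string. That assumption may not hold if the tokenizer normalizes or merges tokens.

```python
from typing import Iterable, Iterator, Tuple


def interleave_with_gaps(
    original: str, token_texts: Iterable[str]
) -> Iterator[Tuple[str, str]]:
    """Yield ("gap", text) and ("token", text) pieces that join back to `original`.

    Hypothetical helper: assumes every token text occurs verbatim and in
    order in `original`; raises ValueError otherwise.
    """
    pos = 0
    for text in token_texts:
        start = original.find(text, pos)
        if start == -1:
            raise ValueError(f"token {text!r} not found after offset {pos}")
        if start > pos:
            # Whatever sits between two tokens (spaces, tabs, newlines) is a gap.
            yield ("gap", original[pos:start])
        yield ("token", text)
        pos = start + len(text)
    if pos < len(original):
        # Trailing text after the last token.
        yield ("gap", original[pos:])


# Example input with irregular spacing; the token list is written by hand
# here and would come from the parser's terminals in practice.
original = "Hún  las   bókina ."
tokens = ["Hún", "las", "bókina", "."]
pieces = list(interleave_with_gaps(original, tokens))
```

With this, the "token" pieces could be wrapped in HTML carrying the lemma and part of speech, and the "gap" pieces emitted unchanged, so the rendered text matches the input character for character.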
