detect_language_mixed(): R Session Crashing when running on empty entries #3

Open
TimBMK opened this issue Jun 19, 2021 · 4 comments

@TimBMK

TimBMK commented Jun 19, 2021

Hey!

I have a large dataset of mixed-language entries (100k+) that I want to run cld3's language detection on in order to find non-English snippets. However, the R session aborts with a fatal error as soon as I run it over certain entries. I isolated the problem: as soon as detect_language_mixed() hits an empty entry (""), it fails and takes the whole session down with it. Neither cld2::detect_language_mixed() nor cld3::detect_language() seems to have that issue, so I assume it would be an easy fix to catch these entries and return NA. It took me a while to figure out, so implementing this in the next update might save others quite a bit of heartache. I'm running the latest cld3 release from CRAN (1.4.1).

Also, thanks for the great package! It's really helpful, since it seems to deal better with multi-language entries than cld2.

@jeroen
Member

jeroen commented Jun 19, 2021

Can you try to create a minimal reproducible example?

@TimBMK
Author

TimBMK commented Jun 19, 2021

test <- ""
cld3::detect_language_mixed(test)

@jeroen
Member

jeroen commented Jun 19, 2021

oh wow haha that is embarrassing

@TimBMK
Author

TimBMK commented Jun 19, 2021

Probably just a little slip-up somewhere, haha. When I remove the empty entries, it runs like a charm!
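
For anyone hitting the same crash, here is a minimal sketch of that workaround. It assumes the crash is triggered only by empty strings, as in the example above, and it also skips NA entries as a precaution. The helper name `safe_detect_mixed` and its `detector` argument are hypothetical and not part of cld3.

```r
# Hypothetical helper (not part of cld3): skip empty or missing entries
# and return NA for them instead of passing them to the detector.
safe_detect_mixed <- function(texts, detector = cld3::detect_language_mixed, ...) {
  texts <- as.character(texts)
  lapply(texts, function(x) {
    # Assumption: "" is what crashes the session; NA is skipped as well.
    if (is.na(x) || !nzchar(x)) {
      return(NA)
    }
    detector(x, ...)
  })
}

# Usage: only runs if cld3 is installed.
if (requireNamespace("cld3", quietly = TRUE)) {
  entries <- c("This is an English sentence.", "", NA)
  results <- safe_detect_mixed(entries)
}
```

The result is a list with one element per entry: NA for skipped entries, and whatever the detector returns for the rest. Passing the detector as an argument keeps the guard reusable for other functions.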
