Some Japanese text returns language as null #2
Comments
Hi, this would get solved by using the largest database.
In the next version of ELD, the large database will be the default. Still, this could happen with other combinations of Japanese/Chinese characters in very short texts. There is a possible improvement for a future version of ELD, as stated at the end of the readme:
So I leave it up to you whether to close the issue, depending on whether you think using ngramsL60 is a good enough fix or you would like to see the further improvements mentioned above.
This makes sense; I just wasn't expecting it to return null with no guess. Switching to the largest database does work for "終了". You are correct that there are still examples, such as "拒否", where it returns null. Testing it across all my data with ngramsL60, I also found examples in other languages, such as "undo". So there are certain words it won't detect on their own (or when combined with other words it cannot detect).
That is interesting feedback, thanks. A shorter-term solution for single words that go undetected would be to do an internal re-detect and search for the word as a prefix, suffix, or infix in other words; for example, in the current database "undo" appears as a suffix and an infix for English. It might be better a
I want to ask: do you have any suggestion about what to return in case of no detection? Maybe the returned object could have an
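The prefix/suffix/infix re-detect idea described in this comment could look roughly like the sketch below. It is only an illustration: `TOY_ENTRIES`, `classify_match`, and `redetect` are hypothetical names, and the word list is made-up toy data, not ELD's actual database format or API.

```python
from collections import Counter

# Hypothetical toy data mapping known entries to a language code.
# This is NOT the real ELD n-gram database, only a stand-in for the idea.
TOY_ENTRIES = {
    "undone": "en",
    "undoing": "en",
    "undoes": "en",
    "mundo": "es",
}


def classify_match(word, entry):
    """Return how `word` occurs inside `entry`: prefix, suffix, infix, or None."""
    if word == entry or word not in entry:
        return None
    if entry.startswith(word):
        return "prefix"
    if entry.endswith(word):
        return "suffix"
    return "infix"


def redetect(word, entries=TOY_ENTRIES):
    """Vote for the language whose entries most often contain `word`."""
    votes = Counter()
    for entry, language in entries.items():
        if classify_match(word, entry) is not None:
            votes[language] += 1
    if not votes:
        return None  # still undetected
    return votes.most_common(1)[0][0]


print(redetect("undo"))
```

A real implementation would presumably weight matches by n-gram score rather than counting them equally; the plain vote here is an assumption made to keep the sketch short.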
print(ELDdetector.detect("終了"))
{"": {"language": null, "scores()": {}, "is_reliable()": false}}