Japanese text being identified as Kurmanji #7
Hi, @ftkurt. Thank you for identifying this issue! This package doesn't support Japanese yet, but it's easy to add a language. Would you like to submit a pull request? The documentation on how to add a language is here: https://github.com/DanielJDufour/language-detector/blob/master/CONTRIBUTING.md
I briefly looked at Japanese character sets, and it seems Japanese is a bit different from other languages because it uses multiple sets. Therefore, I would prefer that someone knowledgeable about Japanese do that. However, I am currently working on collecting Sorani and Kurmanji datasets. I might be able to add more data for those two Kurdish dialects in the coming days. I think this will help make this package more reliable.
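For reference, the "multiple sets" are hiragana, katakana, and kanji, each of which sits in its own Unicode block. The snippet below is only a rough sketch of how text could be checked against those blocks. The names `SCRIPT_RANGES`, `scripts_in`, and `looks_japanese` are hypothetical and are not part of language-detector, and the kanji range covers only the main CJK Unified Ideographs block, which is shared with Chinese.

```python
# Hypothetical helper for illustration; not part of language-detector.
# Ranges are the standard Unicode blocks for each script.
SCRIPT_RANGES = {
    "hiragana": (0x3040, 0x309F),
    "katakana": (0x30A0, 0x30FF),
    "kanji": (0x4E00, 0x9FFF),  # CJK Unified Ideographs, shared with Chinese
}


def scripts_in(text):
    """Return the names of the scripts above that occur in text."""
    found = set()
    for ch in text:
        code = ord(ch)
        for name, (low, high) in SCRIPT_RANGES.items():
            if low <= code <= high:
                found.add(name)
    return found


def looks_japanese(text):
    """Treat any kana as a sign of Japanese; kanji alone is ambiguous."""
    return bool(scripts_in(text) & {"hiragana", "katakana"})
```

A check like this only flags kana; text written entirely in kanji would still need frequency data or similar to tell it apart from Chinese.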
That would be great! Thank you!
It's probably because of this line.
language-detector/language_detector/prep/char_language.txt
Line 73 in e960b59
See the following texts: