Boundary-value is not tokenized properly #36
Comments
😅
This happens only when the input text is larger than the inner buffer size and contains no punctuation (as in this example).
I wanted to fix this, but for now I have no idea how...
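To make the failure mode described above concrete, here is a toy sketch in Python. It is not this project's code: `DICTIONARY`, `PUNCTUATION`, `longest_match`, `split_chunks`, and `tokenize` are all hypothetical names, and the assumption that the tokenizer cuts the input after punctuation and falls back to a hard cut at the buffer limit is only inferred from the comment above. It shows how a word that straddles the buffer boundary gets broken into smaller dictionary entries when no punctuation is available to cut at.

```python
# Toy illustration only: hypothetical names, not the project's real tokenizer.
PUNCTUATION = "、。"
DICTIONARY = {"その他", "そ", "の", "他"}  # hypothetical mini-dictionary


def longest_match(text):
    """Greedy longest-match; characters not in DICTIONARY become single tokens."""
    tokens, i = [], 0
    max_len = max(len(word) for word in DICTIONARY)
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if n == 1 or text[i:i + n] in DICTIONARY:
                tokens.append(text[i:i + n])
                i += n
                break
    return tokens


def split_chunks(text, buffer_size):
    """Cut text into chunks of at most buffer_size characters.

    Assumed behaviour: prefer cutting right after the last punctuation mark
    in the window; if there is none, cut blindly at the buffer limit.
    """
    chunks, start = [], 0
    while start < len(text):
        end = min(start + buffer_size, len(text))
        if end < len(text):
            cut = max(text.rfind(p, start, end) for p in PUNCTUATION)
            if cut >= start:
                end = cut + 1
        chunks.append(text[start:end])
        start = end
    return chunks


def tokenize(text, buffer_size):
    """Tokenize each chunk independently, as a buffered reader would."""
    return [tok for chunk in split_chunks(text, buffer_size)
            for tok in longest_match(chunk)]


# No punctuation and a small buffer: the word is cut in the middle.
print(tokenize("ああああその他", 5))
# Same text with a buffer large enough to hold everything.
print(tokenize("ああああその他", 100))
# Punctuation gives the splitter a safe place to cut.
print(tokenize("あああ。その他", 5))
```

If the real cause is similar, one possible direction would be to carry the unconsumed tail of each chunk over into the next buffer instead of tokenizing it at the cut point, but that is a guess and would need checking against the actual implementation.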
When tests/text_large.txt is given as input, I think the token 「その他 名詞,代名詞,一般,,,*,その他,ソノタ,ソノタ」 should be returned, but the following tokens are returned instead:
そ 名詞,特殊,助動詞語幹,,,,そ,ソ,ソ
の 助詞,連体化,,,,,の,ノ,ノ
他 名詞,非自立,副詞可能,,,,他,ホカ,ホカ