Non-English tokenizers #464
Labels: enhancement, help wanted, question
Describe the solution you'd like
For CJK languages such as Chinese, words are not separated by spaces, so a tokenizer is usually needed to split sentences into word stems, for example this one: https://github.com/yanyiwu/cppjieba (see the sketch below for one possible pre-tokenization workaround).
Is this currently doable in PISA? If not, are there any plans to add this feature in the future?
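One possible workaround, sketched below under the assumption that the collection can be pre-processed before indexing: segment the Chinese text with cppjieba and emit space-separated tokens, so that a downstream whitespace-based tokenizer sees ordinary "words". This follows cppjieba's documented `Jieba`/`Cut` API; the dictionary paths are placeholders for wherever cppjieba's data files live on your system.

```cpp
// Sketch: pre-segment Chinese text with cppjieba, one input line at a time,
// printing the tokens separated by single spaces.
#include <iostream>
#include <string>
#include <vector>

#include "cppjieba/Jieba.hpp"

int main() {
    // Dictionary files ship with cppjieba; adjust these paths to your installation.
    cppjieba::Jieba jieba("dict/jieba.dict.utf8",
                          "dict/hmm_model.utf8",
                          "dict/user.dict.utf8",
                          "dict/idf.utf8",
                          "dict/stop_words.utf8");

    std::string line;
    std::vector<std::string> words;
    while (std::getline(std::cin, line)) {
        words.clear();
        jieba.Cut(line, words, /*hmm=*/true);  // segment the line into tokens
        for (size_t i = 0; i < words.size(); ++i) {
            if (i > 0) std::cout << ' ';
            std::cout << words[i];
        }
        std::cout << '\n';
    }
    return 0;
}
```

This only covers pre-tokenization of the collection itself; queries would need the same segmentation step applied before they are issued so that query terms match the indexed tokens.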
Additional context