Vietnamese support #651

Answered by micheleriva
hckhanh asked this question in Q&A
Mar 1, 2024 · 1 comment · 2 replies

Hi!
I think we could probably add official support for Vietnamese via https://github.com/vunb/vntk or a similar toolkit.

This means:

  1. Exposing a Vietnamese tokenizer
  2. Exposing a Vietnamese stemmer
  3. Exposing a list of Vietnamese stop-words

You can find a reference implementation here: https://github.com/askorama/orama/blob/main/packages/tokenizers/src/tokenizer-mandarin/src/tokenizer.ts
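
To make the three pieces above concrete, here is a minimal sketch of how they might fit together. It is not Orama's actual tokenizer interface and it does not use vntk: `SketchTokenizer`, `createVietnameseTokenizer`, and `SAMPLE_STOP_WORDS` are hypothetical names invented for illustration, and the stop-word list is a tiny sample. It also splits on whitespace and punctuation, so it works at the syllable level; grouping syllables into multi-syllable words would need a proper word segmenter. The stemmer defaults to the identity function, on the assumption that Vietnamese words are largely uninflected.

```typescript
// Hypothetical tokenizer shape for illustration only; not Orama's real interface.
interface SketchTokenizer {
  language: string;
  tokenize(raw: string): string[];
}

// Tiny illustrative sample of common function words. A real list would be
// much longer and should be curated by a native speaker.
const SAMPLE_STOP_WORDS: ReadonlySet<string> = new Set(
  ["và", "là", "của", "các", "những"].map((w) => w.normalize("NFC")),
);

// Matches runs of anything that is not a letter, combining mark, or digit.
// Built with the RegExp constructor so the Unicode property escapes are
// evaluated at runtime.
const SEPARATORS = new RegExp("[^\\p{L}\\p{M}\\p{N}]+", "u");

function createVietnameseTokenizer(
  stopWords: ReadonlySet<string> = SAMPLE_STOP_WORDS,
  // Assumption: identity stemmer, since Vietnamese is largely uninflected.
  stem: (token: string) => string = (token) => token,
): SketchTokenizer {
  return {
    language: "vietnamese",
    tokenize(raw: string): string[] {
      return raw
        .normalize("NFC") // compose diacritics so lookups match the stop-word set
        .toLowerCase()
        .split(SEPARATORS) // syllable-level split, not true word segmentation
        .filter((token) => token.length > 0 && !stopWords.has(token))
        .map(stem);
    },
  };
}

const tokenizer = createVietnameseTokenizer();
console.log(tokenizer.tokenize("Hà Nội là thủ đô của Việt Nam"));
// → [ 'hà', 'nội', 'thủ', 'đô', 'việt', 'nam' ]
```

Passing the stop-word set and stemmer as parameters keeps the sketch easy to swap out once a native speaker supplies a real list or a segmenter-backed implementation.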

It would really help if a native speaker could contribute here. We'd do our best to support any contribution along those lines.

Answer selected by hckhanh