Report: newmm bug #893

Open
wannaphong opened this issue Dec 18, 2023 · 1 comment
Labels
bug bugs in the library
Projects

Comments

@wannaphong
Member

wannaphong commented Dec 18, 2023

newmm uses a maximum matching algorithm, constrained by Thai Character Cluster (TCC) boundaries with improved TCC rules. It has a bug: text with ambiguous breaking points can make tokenization slow or very slow. newmm-safe works around the bug in most cases, but in rare cases it does not fix it either.
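To illustrate why ambiguous breaking points can be costly, here is a minimal toy sketch. It is not the newmm implementation and does not use TCC; the function `count_segmentations` and the two-word dictionary are hypothetical, chosen only to show how quickly the number of valid segmentations can grow when many breaking points are ambiguous.

```python
def count_segmentations(text, words):
    """Count the ways `text` can be split entirely into dictionary words.

    Toy illustration only: this is NOT how newmm works internally.
    """
    max_len = max((len(w) for w in words), default=0)
    # ways[i] = number of ways to segment text[:i] into dictionary words
    ways = [0] * (len(text) + 1)
    ways[0] = 1  # the empty prefix has exactly one (empty) segmentation
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            if ways[start] and text[start:end] in words:
                ways[end] += ways[start]
    return ways[len(text)]


# Hypothetical dictionary in which every position is an ambiguous break.
toy_dict = frozenset({"a", "aa"})
for n in (5, 10, 20, 30):
    print(n, count_segmentations("a" * n, toy_dict))
```

With this toy dictionary the count follows the Fibonacci sequence, so a tokenizer that explores candidate paths without pruning or memoization would face exponential work on such input.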

This issue collects sample texts for testing, so that the bug can be fixed in the future.

@wannaphong
Member Author

wannaphong commented Dec 19, 2023

error-1.txt https://drive.google.com/drive/folders/1iFEfUqwsg3xozifT5o0OfRojfSoVF7LD?usp=sharing

The file is very large (131 MB). newmm can't handle it because the text contains ambiguous breaking points.
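One way to narrow down a slow input like this is to time the tokenizer on growing prefixes of the text. The sketch below is only a suggestion: `time_tokenizer` is a hypothetical helper, and `whitespace_tokenize` is a stand-in so the example runs without any extra dependency.

```python
import time


def time_tokenizer(tokenize, text, sizes):
    """Time `tokenize` on prefixes of `text`.

    Returns a list of (prefix_length, token_count, seconds) tuples.
    """
    results = []
    for size in sizes:
        sample = text[:size]
        start = time.perf_counter()
        tokens = tokenize(sample)
        elapsed = time.perf_counter() - start
        results.append((len(sample), len(tokens), elapsed))
    return results


def whitespace_tokenize(text):
    """Stand-in tokenizer; replace with the engine under test."""
    return text.split()


sample_text = "one two three four five six seven eight"
for length, count, seconds in time_tokenizer(
    whitespace_tokenize, sample_text, [10, 20, 40]
):
    print(f"{length} chars -> {count} tokens in {seconds:.6f}s")
```

Assuming PyThaiNLP is installed, the stand-in could be swapped for `lambda t: word_tokenize(t, engine="newmm")` or `engine="newmm-safe"` (with `from pythainlp.tokenize import word_tokenize`) to compare the two engines on the same prefixes.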

@bact added the bug (bugs in the library) label on Feb 11, 2024
@bact added this to To do in PyThaiNLP on Feb 11, 2024
Projects
PyThaiNLP
  
To do
Development

No branches or pull requests

2 participants