Add advanced validations for dictionary building #168

Open
eiennohito opened this issue Dec 17, 2021 · 0 comments

@eiennohito
Collaborator

eiennohito commented Dec 17, 2021

We plan to introduce dictionary build warnings. A warning will not abort the dictionary build, but will report that something in the input looks wrong.

Warning-producing checks will be optional, but enabled by default.
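
A minimal sketch of how non-fatal, optional checks could be wired together. It assumes nothing about the actual builder API: `BuildReport`, `run_checks`, the `disabled` parameter, and the toy `empty_surface` check are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BuildReport:
    # Hypothetical container: warnings are collected, never raised.
    warnings: list = field(default_factory=list)


def run_checks(entries, checks, disabled=frozenset()):
    """Run every check that is not explicitly disabled and collect its warnings.

    Checks are enabled by default. A warning never raises, so the build continues.
    """
    report = BuildReport()
    for name, check in checks.items():
        if name in disabled:
            continue  # the user opted out of this check
        report.warnings.extend(f"[{name}] {msg}" for msg in check(entries))
    return report


def empty_surface(entries):
    # Toy check, present only to demonstrate the mechanism.
    for i, entry in enumerate(entries):
        if not entry["surface"]:
            yield f"entry {i} has an empty surface"


entries = [{"surface": "東京"}, {"surface": ""}]
checks = {"empty-surface": empty_surface}

report = run_checks(entries, checks)  # enabled by default
skipped = run_checks(entries, checks, disabled={"empty-surface"})  # opted out
```

Here `report.warnings` holds one message and `skipped.warnings` is empty, while neither call interrupts the build.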

Proposed list of warnings:

  • Surface forms are not normalized. Words with such surfaces cannot be looked up via the Trie index (this is the current behavior), and this problem appears fairly often in user dictionaries.
  • Word splitting is inconsistent. Concatenating the surfaces of a word's split should produce the surface of the original word.
  • Dictionary entries are indistinguishable (same left/right connection IDs and surface). In this case the entry with the highest cost wins; otherwise, the last dictionary entry wins. All other entries will be removed from the index.
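
The three checks above can be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: entries are plain dicts with hypothetical keys (`surface`, `left_id`, `right_id`, `cost`, `split`), splits are given directly as lists of surfaces, NFKC plus lower-casing stands in for the real normalization rules, and "otherwise" is read as "when costs are tied".

```python
import unicodedata
from collections import defaultdict


def normalize(surface):
    # Assumption: NFKC plus lower-casing stands in for the real normalization rules.
    return unicodedata.normalize("NFKC", surface).lower()


def check_normalized(entries):
    """Warn about surfaces that differ from their normalized form."""
    return [
        f"entry {i}: surface {e['surface']!r} is not normalized"
        for i, e in enumerate(entries)
        if e["surface"] != normalize(e["surface"])
    ]


def check_splits(entries):
    """Warn when the concatenated split surfaces differ from the word's surface."""
    warnings = []
    for i, e in enumerate(entries):
        parts = e.get("split")
        if parts and "".join(parts) != e["surface"]:
            warnings.append(f"entry {i}: split {parts} does not spell {e['surface']!r}")
    return warnings


def dedupe(entries):
    """Keep one entry per (surface, left_id, right_id) key.

    Highest cost wins; on a tie (an assumed reading of "otherwise")
    the later entry wins.
    """
    groups = defaultdict(list)
    for i, e in enumerate(entries):
        groups[(e["surface"], e["left_id"], e["right_id"])].append(i)
    keep, warnings = [], []
    for key, idxs in groups.items():
        winner = max(idxs, key=lambda i: (entries[i]["cost"], i))
        keep.append(winner)
        if len(idxs) > 1:
            warnings.append(f"{key}: entries {idxs} are indistinguishable, keeping {winner}")
    return [entries[i] for i in sorted(keep)], warnings


entries = [
    # Full-width Latin letters: not in normalized form.
    {"surface": "ＡＢＣ", "left_id": 1, "right_id": 1, "cost": 100},
    {"surface": "東京都", "left_id": 2, "right_id": 2, "cost": 200, "split": ["東京", "都"]},
    # Same key and cost as the previous entry, and its split drops a character.
    {"surface": "東京都", "left_id": 2, "right_id": 2, "cost": 200, "split": ["東", "都"]},
]

not_normalized = check_normalized(entries)
bad_splits = check_splits(entries)
kept, duplicate_warnings = dedupe(entries)
```

With this sample data, the first entry triggers the normalization warning, the third triggers the split warning, and `dedupe` keeps the first and third entries because the two `東京都` entries tie on cost and the later one wins.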
@eiennohito eiennohito self-assigned this Dec 17, 2021
@eiennohito eiennohito added this to the 0.6.0 milestone Dec 24, 2021