Integrate nb Wikidata into Unicode Inflection #58

Open
grhoten opened this issue Jan 22, 2025 · 1 comment

grhoten commented Jan 22, 2025

The revised dictionary-parser can parse Wikidata, but some issues need to be resolved.

The initial issues include:

  • The dictionary-parser output (listed below) needs to be addressed.
  • The unit tests need to be fixed.

Tool output that needs to be addressed:

Line 54516: Q11655558 is not a known part of speech grammeme for L448781(om)
Line 54680: Q11655558 is not a known part of speech grammeme for L450068(fordi)
Line 70572: Q11655558 is not a known part of speech grammeme for L586421(hvorvidt)
Line 70779: Q11655558 is not a known part of speech grammeme for L588077(likesom)
Line 139027: Q11655558 is not a known part of speech grammeme for L1141679(da)
Line 226488: Q11655558 is not a known part of speech grammeme for L448822(som)
Line 256655: Q11655558 is not a known part of speech grammeme for L702577(såfremt)
Line 282555: Q11655558 is not a known part of speech grammeme for L908229(så)
Line 343201: Q11655558 is not a known part of speech grammeme for L1400551(enda om)
Line 422911: Q11655558 is not a known part of speech grammeme for L656353(enn)
Line 428473: Q11655558 is not a known part of speech grammeme for L702574(såframt)
Line 514334: Q11655558 is not a known part of speech grammeme for L1400533(selv om)
Line 514336: Q11655558 is not a known part of speech grammeme for L1400537(om enn)
Line 569358: Q11655558 is not a known part of speech grammeme for L449151(mens)
Line 653761: Q11655558 is not a known part of speech grammeme for L1139106(innen)
Line 685773: Q11655558 is not a known part of speech grammeme for L1400543(uaktet)
Line 741838: Q11655558 is not a known part of speech grammeme for L455338(viss)
Line 757221: Q11655558 is not a known part of speech grammeme for L588076(liksom)
Line 771104: Q11655558 is not a known part of speech grammeme for L702575(så framt)
Line 775794: Q11655558 is not a known part of speech grammeme for L740679(etter)
Line 857052: Q11655558 is not a known part of speech grammeme for L1400541(fordi om)
Line 912050: Q11655558 is not a known part of speech grammeme for L448828(å)
Line 912080: Q11655558 is not a known part of speech grammeme for L449141(enn)
Line 912797: Q11655558 is not a known part of speech grammeme for L454959(ettersom)
Line 912850: Q11655558 is not a known part of speech grammeme for L455337(dersom)
Line 912851: Q11655558 is not a known part of speech grammeme for L455339(hvis)
Line 1083763: Q11655558 is not a known part of speech grammeme for L448782(at)
Line 1089454: Q11655558 is not a known part of speech grammeme for L494760(skjønt)
Line 1103761: Q11655558 is not a known part of speech grammeme for L618919(idet)
Line 1114128: Q11655558 is not a known part of speech grammeme for L702576(så fremt)
Line 1119103: Q11655558 is not a known part of speech grammeme for L743825(for di)
Line 1371397: Q11655558 is not a known part of speech grammeme for L1400550(enda)
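
To triage output like this, it can help to group the affected lexemes by the unrecognized grammeme ID. The following is a minimal sketch, not part of the dictionary-parser: the function name `summarize_messages` and the regular expression are assumptions based only on the message format shown above.

```python
import re
from collections import defaultdict

# Assumed message shape, taken from the tool output above, e.g.:
# "Line 54516: Q11655558 is not a known part of speech grammeme for L448781(om)"
MESSAGE_RE = re.compile(
    r"^Line (?P<line>\d+): (?P<qid>Q\d+) is not a known part of speech grammeme "
    r"for (?P<lexeme>L\d+)\((?P<lemma>.+)\)$"
)


def summarize_messages(lines):
    """Group (lexeme ID, lemma) pairs by the unrecognized grammeme QID.

    Lines that do not match the assumed format are skipped.
    """
    by_qid = defaultdict(list)
    for raw in lines:
        match = MESSAGE_RE.match(raw.strip())
        if match:
            by_qid[match["qid"]].append((match["lexeme"], match["lemma"]))
    return dict(by_qid)


# Two lines copied from the output above; lemmas may contain spaces.
sample = [
    "Line 54516: Q11655558 is not a known part of speech grammeme for L448781(om)",
    "Line 343201: Q11655558 is not a known part of speech grammeme for L1400551(enda om)",
]

for qid, lexemes in summarize_messages(sample).items():
    print(qid, len(lexemes), sorted(lemma for _, lemma in lexemes))
```

Run against the full output, a summary like this would show at a glance whether every message refers to the same grammeme ID, which is the case for all lines listed here.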

Here are the currently generated lexical dictionary files for debugging the test failures.

nb.zip

grhoten added this to the 0.1 milestone on Jan 22, 2025

grhoten commented Jan 28, 2025

belte needs further investigation: there appear to be 2 choices for the plural definite form.

makaber needs to be added.

The grammatical gender of mor is incompletely defined.

These are the options used:

--language nb
