Integrate sv Wikidata into Unicode Inflection #57

Open
grhoten opened this issue Jan 22, 2025 · 1 comment
grhoten commented Jan 22, 2025

The revised dictionary-parser can parse Wikidata, but some issues need to be resolved.

The initial issues include:

  • The dictionary-parser output needs to be addressed.
  • The unit tests need to be fixed.

Tool output that needs to be addressed:

Line 160523: Q113330682 is not a known part of speech grammeme for L1314077(hålla på halster)
Line 348594: Q5483481 is not a known grammeme for L35644(ung)
Line 430967: Q11655558 is not a known part of speech grammeme for L722755(närhelst)
Line 692150: Q41719 is not a known grammeme for L43343(vara)
Line 692580: Q11655558 is not a known part of speech grammeme for L46789(medan)
Line 862830: Q192161 is not a known grammeme for L38989(besluta)
Line 862873: Q10650049 is not a known grammeme for L39314(vetenskaplig)
Line 1034600: Q5155633 is not a known grammeme for L39839(dum)
Line 1035386: Q1522423 is not a known grammeme for L46779(vad)
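
To help triage these warnings, the sketch below groups them by Wikidata Q-ID so that each unmapped grammeme can be reviewed once. It is not part of the dictionary-parser. The regular expression is inferred only from the lines quoted above, and the names `WARNING` and `group_by_qid` are hypothetical.

```python
import re
from collections import defaultdict

# Pattern inferred from the warning lines quoted in this issue.
# It is an assumption, not a documented output format of the tool.
WARNING = re.compile(
    r"Line (?P<line>\d+): (?P<qid>Q\d+) is not a known "
    r"(?P<kind>part of speech grammeme|grammeme) "
    r"for (?P<lexeme>L\d+)\((?P<lemma>.*)\)$"
)


def group_by_qid(lines):
    """Map each unknown Q-ID to the (lexeme ID, lemma, kind) entries that use it."""
    grouped = defaultdict(list)
    for text in lines:
        match = WARNING.match(text.strip())
        if match:  # silently skip lines that do not look like a warning
            grouped[match["qid"]].append(
                (match["lexeme"], match["lemma"], match["kind"])
            )
    return dict(grouped)


# The nine warnings from this issue, copied verbatim.
SAMPLE = [
    "Line 160523: Q113330682 is not a known part of speech grammeme for L1314077(hålla på halster)",
    "Line 348594: Q5483481 is not a known grammeme for L35644(ung)",
    "Line 430967: Q11655558 is not a known part of speech grammeme for L722755(närhelst)",
    "Line 692150: Q41719 is not a known grammeme for L43343(vara)",
    "Line 692580: Q11655558 is not a known part of speech grammeme for L46789(medan)",
    "Line 862830: Q192161 is not a known grammeme for L38989(besluta)",
    "Line 862873: Q10650049 is not a known grammeme for L39314(vetenskaplig)",
    "Line 1034600: Q5155633 is not a known grammeme for L39839(dum)",
    "Line 1035386: Q1522423 is not a known grammeme for L46779(vad)",
]

if __name__ == "__main__":
    for qid, entries in group_by_qid(SAMPLE).items():
        lexemes = ", ".join(f"{lex}({lemma})" for lex, lemma, _ in entries)
        print(f"{qid}: {lexemes}")
```

Grouping this way shows that Q11655558 is the only ID reported for more than one lexeme (närhelst and medan), so a single part-of-speech mapping would likely clear both of those warnings.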

Here are the currently generated lexical dictionary files for debugging the test failures.

sv.zip

@grhoten grhoten added this to the 0.1 milestone Jan 22, 2025
grhoten commented Jan 28, 2025

It seems that lemma is missing, and röd is incomplete. Fixing those issues should allow the tests to pass.

These are the options that I used.

--language sv --inflection-types noun,adjective
