Integrate da Wikidata into Unicode Inflection #59

Open
grhoten opened this issue Jan 22, 2025 · 3 comments

@grhoten
Member

grhoten commented Jan 22, 2025

The revised dictionary-parser can parse Wikidata, but some issues need to be resolved.

The initial issues include:

  • The dictionary-parser output needs to be addressed.
  • The unit tests need to be fixed.

Tool output that needs to be addressed:

Line 28512: Q177634 is not a known grammeme for L237294(lokal)
Line 148700: Q24238356 is not a known part of speech grammeme for L1218844(det ved den søde grød)
Line 177076: Q527205 is not a known grammeme for L39189(leve)
Line 177077: Q527205 is not a known grammeme for L39191(ske)
Line 181314: Q177634 is not a known grammeme for L73159(bred)
Line 515722: Q55064750 is not a known part of speech grammeme for L3064(der)
Line 519939: Q106322767 is not a known grammeme for L37094(fed)
Line 520853: Q55064750 is not a known part of speech grammeme for L45364(her)
Line 596497: Q10535365 is not a known part of speech grammeme for L678570(at)
Line 675371: Q113330682 is not a known part of speech grammeme for L1313692(sove på det)
Line 866973: Q177634 is not a known grammeme for L73165(vid)
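
Several of the warnings above repeat the same Wikidata item, so grouping them by QID shows how many distinct grammeme mappings are actually missing. The following is a minimal sketch of that grouping, assuming the log format shown above stays stable. The function name `group_unknown_grammemes` and the regular expression are illustrative and not part of the dictionary-parser.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   Line 28512: Q177634 is not a known grammeme for L237294(lokal)
# The format is inferred from the tool output above (an assumption).
LINE_RE = re.compile(
    r"^Line (?P<line>\d+): (?P<qid>Q\d+) is not a known "
    r"(?P<kind>part of speech grammeme|grammeme) for "
    r"(?P<lexeme>L\d+)\((?P<lemma>.*)\)$"
)


def group_unknown_grammemes(log_text):
    """Map each (kind, QID) pair to the (lexeme ID, lemma) pairs that use it."""
    groups = defaultdict(list)
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:
            key = (match["kind"], match["qid"])
            groups[key].append((match["lexeme"], match["lemma"]))
    return dict(groups)


SAMPLE = """\
Line 28512: Q177634 is not a known grammeme for L237294(lokal)
Line 148700: Q24238356 is not a known part of speech grammeme for L1218844(det ved den søde grød)
Line 177076: Q527205 is not a known grammeme for L39189(leve)
Line 177077: Q527205 is not a known grammeme for L39191(ske)
Line 181314: Q177634 is not a known grammeme for L73159(bred)
Line 515722: Q55064750 is not a known part of speech grammeme for L3064(der)
Line 519939: Q106322767 is not a known grammeme for L37094(fed)
Line 520853: Q55064750 is not a known part of speech grammeme for L45364(her)
Line 596497: Q10535365 is not a known part of speech grammeme for L678570(at)
Line 675371: Q113330682 is not a known part of speech grammeme for L1313692(sove på det)
Line 866973: Q177634 is not a known grammeme for L73165(vid)
"""

if __name__ == "__main__":
    for (kind, qid), lexemes in sorted(group_unknown_grammemes(SAMPLE).items()):
        lemmas = ", ".join(lemma for _, lemma in lexemes)
        print(f"{qid} ({kind}): {len(lexemes)} lexeme(s): {lemmas}")
```

Grouped this way, the eleven warnings reduce to seven distinct items: three reported as unknown grammemes and four as unknown part of speech grammemes.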

Here are the currently generated lexical dictionary files for debugging the test failures.

da.zip

grhoten added this to the 0.1 milestone on Jan 22, 2025
@nciric
Contributor

nciric commented Jan 24, 2025

There are 71781 nouns in Danish. See this query.

@grhoten
Member Author

grhoten commented Jan 28, 2025

The only remaining test failure seems to be that Ajax is missing from the dictionary. Either the lemma should be added to the dictionary, or the test should be changed.

-------------------------------------------------------------------------------
InflectionTest#testInflections
-------------------------------------------------------------------------------
/Users/grhoten/Development/inflection/inflection/test/src/inflection/dialog/InflectionTest.cpp:133
...............................................................................

/Users/grhoten/Development/inflection/inflection/test/src/inflection/dialog/InflectionTest.cpp:112: FAILED:
  CHECK( expectedStr == resultStr )
with expansion:
  "SS[Ajaxt]" == "SS[Ajax]"
with message:
  locale=da source=SS[Ajax] {definiteness=indefinite,gender=neuter,number=
  singular,pos=adjective}

/Users/grhoten/Development/inflection/inflection/test/src/inflection/dialog/InflectionTest.cpp:112: FAILED:
  CHECK( expectedStr == resultStr )
with expansion:
  "SS[Ajaxe]" == "SS[Ajax]"
with message:
  locale=da source=SS[Ajax] {definiteness=indefinite,gender=common,number=
  plural,pos=adjective}

/Users/grhoten/Development/inflection/inflection/test/src/inflection/dialog/InflectionTest.cpp:112: FAILED:
  CHECK( expectedStr == resultStr )
with expansion:
  "SS[Ajaxe]" == "SS[Ajax]"
with message:
  locale=da source=SS[Ajax] {definiteness=indefinite,gender=neuter,number=
  plural,pos=adjective}

/Users/grhoten/Development/inflection/inflection/test/src/inflection/dialog/InflectionTest.cpp:112: FAILED:
  CHECK( expectedStr == resultStr )
with expansion:
  "SS[Ajaxe]" == "SS[Ajax]"
with message:
  locale=da source=SS[Ajax] {definiteness=definite,gender=common,number=
  singular,pos=adjective}

/Users/grhoten/Development/inflection/inflection/test/src/inflection/dialog/InflectionTest.cpp:112: FAILED:
  CHECK( expectedStr == resultStr )
with expansion:
  "SS[Ajaxe]" == "SS[Ajax]"
with message:
  locale=da source=SS[Ajax] {definiteness=definite,gender=neuter,number=
  singular,pos=adjective}

/Users/grhoten/Development/inflection/inflection/test/src/inflection/dialog/InflectionTest.cpp:112: FAILED:
  CHECK( expectedStr == resultStr )
with expansion:
  "SS[Ajaxe]" == "SS[Ajax]"
with message:
  locale=da source=SS[Ajax] {definiteness=definite,gender=common,number=plural,
  pos=adjective}

/Users/grhoten/Development/inflection/inflection/test/src/inflection/dialog/InflectionTest.cpp:112: FAILED:
  CHECK( expectedStr == resultStr )
with expansion:
  "SS[Ajaxe]" == "SS[Ajax]"
with message:
  locale=da source=SS[Ajax] {definiteness=definite,gender=neuter,number=plural,
  pos=adjective}

===============================================================================
test cases:   253 |   252 passed | 1 failed
assertions: 19659 | 19652 passed | 7 failed
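
The seven expected strings in the failures above follow the regular Danish adjective pattern: the bare lemma for the indefinite common singular, a -t suffix for the indefinite neuter singular, and an -e suffix for every definite or plural form. The sketch below only restates that pattern so the expected values can be checked at a glance. It is an illustration under that assumption, not the library's implementation, and `expected_adjective_form` is a hypothetical helper name.

```python
def expected_adjective_form(lemma, definiteness, gender, number):
    """Apply the regular Danish adjective suffixes to an unknown lemma.

    Illustrative only: irregular adjectives and spelling changes
    (for example consonant doubling) are deliberately ignored.
    """
    if definiteness == "definite" or number == "plural":
        return lemma + "e"
    if gender == "neuter":
        return lemma + "t"
    return lemma


# (definiteness, gender, number, expected) taken from the failing assertions above.
FAILING_CASES = [
    ("indefinite", "neuter", "singular", "Ajaxt"),
    ("indefinite", "common", "plural", "Ajaxe"),
    ("indefinite", "neuter", "plural", "Ajaxe"),
    ("definite", "common", "singular", "Ajaxe"),
    ("definite", "neuter", "singular", "Ajaxe"),
    ("definite", "common", "plural", "Ajaxe"),
    ("definite", "neuter", "plural", "Ajaxe"),
]

if __name__ == "__main__":
    for definiteness, gender, number, expected in FAILING_CASES:
        actual = expected_adjective_form("Ajax", definiteness, gender, number)
        print(definiteness, gender, number, actual, actual == expected)
```

The one combination missing from the failures, indefinite common singular, leaves the lemma unchanged, which is consistent with the actual result `SS[Ajax]` passing for that case.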

@grhoten
Member Author

grhoten commented Jan 29, 2025

This seems like a good set of options to use:

--language da --inflection-types noun,adjective --ignore-entries-with-grammemes abbreviation --ignore-property countable --ignore-property spelling --ignore-property oblique --ignore-property dative --ignore-property accusative
