Comma sensitivity in sentences #18

Open
jokull opened this issue May 13, 2020 · 7 comments

Comments

@jokull
Contributor

jokull commented May 13, 2020

I’m not familiar with the parsing pipeline but I thought I would share an instance of where the parser tripped in a (to me) surprising way:

greynir.parse_single('Sótt er um leyfi til að byggja 50 leiguíbúðir fyrir námsmenn á lóð við Austurhlíð.').lemmas

This is fine and gives me the right lemmas for leiguíbúð, námsmaður etc.

greynir.parse_single('Sótt er um leyfi til að byggja 50 leiguíbúðir fyrir námsmenn, á lóð við Austurhlíð.').lemmas

The comma before "á lóð" gives me the "eiga" lemma for "á" instead of just "á".
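To pin down exactly which lemmas change between the two variants, a small comparison helper can be useful. The following is a minimal sketch, not part of Greynir: `lemma_diff` is a hypothetical helper, and the two short lists are hand-written stand-ins that reflect only the difference reported above ("á" versus "eiga"), not actual parser output. In practice the lists would come from the `.lemmas` results of the two `parse_single` calls.

```python
from difflib import SequenceMatcher


def lemma_diff(a, b, ignore=(",",)):
    """Return (lemmas_a, lemmas_b) pairs for the spans where two lemma lists differ.

    Tokens listed in `ignore` (here the comma) are dropped first, so that the
    punctuation itself is not reported as a difference.
    """
    a = [x for x in a if x not in ignore]
    b = [x for x in b if x not in ignore]
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hand-written stand-ins (assumption), mirroring only the reported difference.
without_comma = ["námsmaður", "á", "lóð"]
with_comma = ["námsmaður", ",", "eiga", "lóð"]

print(lemma_diff(without_comma, with_comma))
```

An empty result would mean the comma had no effect on the lemmas; any pair in the result points at the tokens whose analysis changed.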

Sorry if GitHub issues is the wrong place. I’m mainly curious about the roadmap, design and limitations. I assume Greynir uses commas to fragment sentences to keep down the parse pathways.

BTW this is a real world example.

Loving Greynir and following your progress! ✨

EDIT: Screenshot might help

[Image: Screenshot 2020-05-13 at 22 50 49]

@vthorsteinsson
Member

Thanks! This is indeed the right place to submit issues such as this one, which describes a bona fide bug. It looks like we need to tweak the grammar for this syntactic structure; it is incorrectly preferring to see ", á lóð" as a continuation of a previous verb phrase, instead of a new prepositional phrase. We will look into this.

@jokull
Contributor Author

jokull commented May 14, 2020

Great!

Here is another potential bug I spotted.

Málinu er vísað til umsagnar skipulagsfulltrúa vegna svala.

"svala" here seems to be lemmatized as the noun "svali", not as "svalir" (balcony).

Balcony is perhaps the more common reading, so maybe a grammar file tweak can help here. I'm not even sure what "svali" means.

@sveinbjornt
Member

:)

[Image: svali]

@vthorsteinsson
Member

By the way, in the earlier bug ("Sótt er um leyfi...") Greynir is recognizing "Sótt" as the noun, not the verb; it then constructs a double verb phrase with the verbs "er" and "á" hanging off the subject "Sótt".

@vthorsteinsson
Member

...and "svala" can also be a bird (a swallow), i.e. a feminine noun, in addition to the two masculine ones ("svalur" and "svali").

@vthorsteinsson
Member

A fix for "svala" is ready and will be in the next commit to the config file Prefs.conf.
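For readers unfamiliar with the idea, here is a rough sketch of how a preference list can break a tie between several candidate lemmas for one word form. It does not reproduce Greynir's actual mechanism or the syntax of Prefs.conf; `CANDIDATES`, `PREFERRED` and `pick_lemma` are hypothetical names, and the candidate list is simply the set of readings mentioned in this thread.

```python
# Candidate lemmas for the word form "svala", as mentioned in this thread
# (illustrative only; a real lexicon lookup would supply these).
CANDIDATES = {"svala": ["svali", "svalur", "svala", "svalir"]}

# Hypothetical preference table: lemmas to favour for a given word form,
# in order of priority.
PREFERRED = {"svala": ["svalir"]}


def pick_lemma(word_form, candidates, preferred):
    """Pick a lemma for word_form, favouring any preferred lemma that is a candidate."""
    options = candidates.get(word_form, [])
    if not options:
        return None
    for lemma in preferred.get(word_form, []):
        if lemma in options:
            return lemma
    # No preference applies: fall back to the first candidate.
    return options[0]


print(pick_lemma("svala", CANDIDATES, PREFERRED))
```

With an empty preference table the function falls back to the first candidate, which corresponds to the behaviour reported above where "svali" was chosen.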
