A terminal that doesn't match specific strings #123

Open
stuartlangridge opened this issue Jul 7, 2020 · 2 comments
@stuartlangridge

stuartlangridge commented Jul 7, 2020

I'd like to define a terminal that matches words except specific words.

This is why: trying this code

import parglare

grammar = r"""
Sentence: The? object_name=Identifier "is" A Identifier DOT;
Identifier: IdentifierWord+;

terminals

The: /(?i)The/;
A: /(?i)An?/;
IdentifierWord: /\w+/;
DOT: ".";
"""

text = """The apple is a fruit."""

g = parglare.Grammar.from_string(grammar)
p = parglare.Parser(g, debug=True, consume_input=False)
result = p.parse(text)
print(result)

fails, expectedly, with Can't disambiguate between: <IdentifierWord(The)> or <The(The)>, because IdentifierWord matches everything. So what I'd like to do is have IdentifierWord not match certain things, such as "the" and "a". However, when I try this, by changing the definition of the IdentifierWord terminal to IdentifierWord: /(?!The|a)\w+/; so that it uses a negative lookahead to exclude certain words from matching, then the above code fails with

Error at 2:4:"\nThe **> apple is a" => Expected: IdentifierWord but found <A(a)>

I don't understand why this is. It's finding the "a" at the beginning of "apple" and treating it as an "a". I don't know if I'm solving this the best way; is there some other way I should be structuring this sort of grammar, or maybe some better way of defining a terminal that matches all words except certain ones?

@igordejanovic
Owner

The word apple is not matched by (?!The|a)\w+. That is because the pattern inside the negative assertion, a, matches the first letter of apple, so the assertion rejects the whole word. What you need to do is make sure that the negative assertion takes the word boundary into account. Try this: (?!(The|a)\b)\w+.
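
The difference between the two patterns can be checked outside of parglare with Python's standard re module. This is only a minimal sketch: the helper name matched and the sample word list are made up for illustration, and it tests each word on its own rather than reproducing how parglare scans the input.

```python
import re

# Pattern from the original attempt: rejects any word that merely
# starts with "The" or "a".
naive = re.compile(r"(?!The|a)\w+")

# Suggested pattern: rejects a word only when "The" or "a" is followed
# by a word boundary, i.e. when it is the whole word.
bounded = re.compile(r"(?!(The|a)\b)\w+")


def matched(pattern, word):
    """Return the text matched at the start of word, or None (hypothetical helper)."""
    m = pattern.match(word)
    return m.group(0) if m else None


words = ["The", "apple", "is", "a", "fruit", "Theory"]
naive_results = {w: matched(naive, w) for w in words}
bounded_results = {w: matched(bounded, w) for w in words}

for w in words:
    print(f"{w!r:10} naive={naive_results[w]!r:10} bounded={bounded_results[w]!r}")
```

With the boundary in place, apple and Theory are accepted while The and a are still rejected. Note that the lookahead is case-sensitive, as written in this thread, so a lowercase "the" would still be matched by IdentifierWord even though the The terminal uses (?i).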

@stuartlangridge
Author

aha! Again, much appreciated; I understand now what I was doing wrong. Thank you!
