-
-
Notifications
You must be signed in to change notification settings - Fork 417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Superfluous identical ambiguities in Earley #1436
Comments
P.S. looks like this is happening because of empty rules. For example this is working correctly: grammar1 ="""
start: (/b/ a?)*
a: /a/
"""
grammar2 ="""
start: (/b/ /a/?)*
"""
for g in (grammar1, grammar2):
t = Lark(g, ambiguity="explicit").parse('ba')
print(t.pretty()) So, perhaps it's related to #1312 |
All of the derivations are distinct. They just happen to produce identical trees after tree shaping. This is most easily seen with the from lark import Lark
from lark.parsers.earley_forest import TreeForestTransformer
grammar1 ="""
start: (a?)*
a: /a/
"""
grammar2 ="""
start: (/a/?)*
"""
transformer = TreeForestTransformer(resolve_ambiguity=False)
for g in (grammar1, grammar2):
node = Lark(g, ambiguity="forest", debug=True).parse('a')
print(transformer.transform(node).pretty()) Output:
Note that the "identical" ambiguity behavior can also result from user-defined inlined rules. Example Grammar: start: _a1 | _a2
_a1: /a/
_a2: /a/ IMO, it's not worthwhile for us to detect and filter out identical tree structures in general. Perhaps we could have special logic for the empty star/plus rule case if desired. |
I just checked, and the second example returns two derivations instead of four, now that we added caching of the nodes. That's progress anyhow :) # the new result
_ambig
start a
start a I wonder if we can solve the remaining ambiguity using such a caching mechanism, but calculate the hash in a way that ignores specific rules (i.e. __rules) ?
That's a very good point! But, the repetition operators (+/*) are a core functionality, and Lark kind of pretends that they are actual operators (when in reality they are macros). So it's better not to break this illusion, if we can avoid it. |
Below are two simple grammars. The first one produces 2 identical derivations, and the second one produces 4 identical derivations.
@chanicpanic If you have the time, maybe you could take a look? It looks like it's happening inside the earley-forest.
Output:
The text was updated successfully, but these errors were encountered: