Some unexpected behavior was discovered when experimenting with layout on an ableC extension for Prolog-style logic programming. The following is a somewhat simplified grammar that seems to exhibit the same issue.
Error at line 3, column 0 in file
(parser state: 3; real character index: 12):
Expected a token of one of the following types:
[copper_features:test_layout:lookahead:ext:ExtComment_t,
'(',
'%',
'+',
')',
';',
copper_features:test_layout:lookahead:host:WhiteSpace_t]
Input currently matches:
['}']
This is a rather unexpected result because the introduction of an extension causes unrelated, existing code to suddenly break, without any sort of lexical ambiguity being raised. If I correctly understand what is going on here, this happens because DFA states differing only in lookahead are merged, resulting in the layout terminal dominating due to maximal munch?
This behavior seems rather undesirable, and should at least emit some sort of warning. Or would it even be possible to modify the LALR(1) parser construction algorithm so that it does not merge states that have different layout?
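To illustrate the suspected mechanism, here is a minimal Python sketch of longest-match scanning restricted to a parser state's valid terminals. The terminal names and regexes are hypothetical stand-ins, not the grammar from this issue, and the sketch does not model Copper's actual scanner. It only shows how adding a layout terminal to the valid set can change which token wins.

```python
import re

# Hypothetical terminals for illustration only; not the issue's grammar.
TERMINALS = {
    "Percent": re.compile(r"%"),
    "Plus": re.compile(r"\+"),
    "Semi": re.compile(r";"),
    "WhiteSpace": re.compile(r"[ \t\n]+"),   # host layout
    "ExtComment": re.compile(r"%[^\n]+"),    # extension layout (line comment)
}

def longest_match(text, pos, valid):
    """Return (terminal, lexeme) for the longest match among `valid`, or None."""
    best = None
    for name in sorted(valid):
        m = TERMINALS[name].match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

# Valid set in a host-only parser state that expects an operator.
HOST_VALID = {"Percent", "Plus", "Semi", "WhiteSpace"}
# Assumed valid set once the extension's layout is also allowed in that state.
MERGED_VALID = HOST_VALID | {"ExtComment"}

text = "% b;\n}"
print(longest_match(text, 0, HOST_VALID))    # the operator is scanned
print(longest_match(text, 0, MERGED_VALID))  # the rest of the line is taken as layout
```

Under these assumptions the second call consumes `% b;` as a comment, so the parser next sees `}` on the following line while still expecting an operator or `;`, which is consistent with the shape of the error above.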
In theory, there are many ways to modify LR(1) state merging -- one example being IELR(1), which does not merge any states where the merging would cause a parse table conflict.
In practice, since Copper builds its LALR(1) DFAs by first building an LR(0) DFA and then adding lookahead as a post-process, no explicit merging is ever done and generating separate states of this sort would be very complicated (not to mention potentially very expensive in terms of state counts).
For adding a warning, one could conceivably check scanner states to see whether, in any state where a non-layout terminal is in the accept set, a layout terminal is in the possible set.
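The check described above could be sketched roughly as follows. This is only an illustration in Python: the `ScannerState` record, its `accept` and `possible` fields, and the example states are assumptions made for the sketch, not Copper's internal data structures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScannerState:
    # Hypothetical model of one scanner DFA state.
    name: str
    accept: frozenset    # terminals accepted if scanning stops here
    possible: frozenset  # terminals that could still match on longer input

def layout_shadowing_warnings(states, layout):
    """Flag states where a non-layout terminal is accepted while a
    layout terminal could still win by matching more input."""
    warnings = []
    for s in states:
        shadowed = s.accept - layout     # non-layout terminals accepted here
        shadowing = s.possible & layout  # layout terminals still reachable
        if shadowed and shadowing:
            warnings.append((s.name, sorted(shadowed), sorted(shadowing)))
    return warnings

# Made-up example states and layout set.
LAYOUT = frozenset({"WhiteSpace", "ExtComment"})
STATES = [
    ScannerState("s1", frozenset({"Percent"}), frozenset({"ExtComment"})),
    ScannerState("s2", frozenset({"Plus"}), frozenset()),
    ScannerState("s3", frozenset({"WhiteSpace"}), frozenset({"WhiteSpace"})),
]

for name, shadowed, shadowing in layout_shadowing_warnings(STATES, LAYOUT):
    print(f"warning: in {name}, {shadowed} may be overridden by layout {shadowing}")
```

A real implementation would presumably need to run this per parser state, using that state's valid terminal and layout sets, so that only combinations that can actually occur are reported.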
Grammar "host":
Grammar "ext":
Using a parser built only from "host" the string
parses successfully. However, using a parser containing both host and ext (generated Copper spec for reference: Parser_copper_features_test_layout_lookahead_parse_ext.copper) on the same string results in the parse error quoted above.