Layout on extension productions can cause unexpected parse errors in host code #47

Open
krame505 opened this issue Feb 14, 2020 · 1 comment


@krame505
Member

Some unexpected behavior was discovered while experimenting with layout in an ableC extension for Prolog-style logic programming. The following is a somewhat simplified grammar that seems to exhibit the same issue.

Grammar "host":

ignore terminal WhiteSpace_t /[\t\r\n ]+/;

terminal Plus_t '+' association=left, precedence=0;
terminal Mod_t '%' association=left, precedence=1;

terminal LParen_t '(';
terminal RParen_t ')';
terminal LCurly_t '{';
terminal RCurly_t '}';
terminal Semi_t ';';

terminal Id_t /[a-zA-Z]+/;

nonterminal Stmt;
concrete productions top::Stmt
| Expr ';' {}
| '{' Stmt '}' {}

nonterminal Expr;
concrete productions top::Expr
| '(' Expr ')' {}
| Expr '+' Expr {}
| Expr '%' Expr {}
| Id_c '(' ')' {}
| Id_c {}

nonterminal Id_c;
concrete productions top::Id_c
| Id_t {}

Grammar "ext":

terminal ExtComment_t /% .*/;

marking terminal Ext_t 'ext' dominates Id_t;
terminal Dot_t '.';

concrete production extProd
top::Stmt ::= 'ext' '{' id::Id_c '(' ')' '.' '}'
layout { ExtComment_t }
{}

Using a parser built only from "host", the string

{
  a % b;
}

parses successfully. However, a parser built from both host and ext (generated Copper spec for reference: Parser_copper_features_test_layout_lookahead_parse_ext.copper) rejects the same string with the following parse error:

   Error at line 3, column 0 in file 
         (parser state: 3; real character index: 12):
  Expected a token of one of the following types:
   [copper_features:test_layout:lookahead:ext:ExtComment_t,
    '(',
    '%',
    '+',
    ')',
    ';',
    copper_features:test_layout:lookahead:host:WhiteSpace_t]
   Input currently matches:
   ['}']

This is a rather unexpected result: introducing an extension causes unrelated, existing code to suddenly break, without any lexical ambiguity being reported. If I understand correctly what is going on here, this happens because DFA states that differ only in lookahead are merged, so the layout terminal ExtComment_t (/% .*/) becomes valid after the identifier a and, by maximal munch, consumes "% b;" as a comment instead of the one-character '%' token?
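The maximal-munch half of this hypothesis can be reproduced with a toy model. The Python sketch below is a simplification, not Copper's scanner: it assumes the scanner, positioned just after `a`, picks the longest match among a fixed set of valid terminals and discards any match that is layout. The two valid sets (`HOST_VALID`, `MERGED_VALID`) are hypothetical, read off the expected-token list in the error message with and without `ExtComment_t`.

```python
import re

# Terminals from the issue as Python regexes: name -> (pattern, is_layout).
TERMINALS = {
    "WhiteSpace_t": (r"[\t\r\n ]+", True),
    "ExtComment_t": (r"% .*", True),  # '.' stops at a newline
    "Mod_t": (r"%", False),
    "Plus_t": (r"\+", False),
    "LParen_t": (r"\(", False),
    "RParen_t": (r"\)", False),
    "Semi_t": (r";", False),
    "Id_t": (r"[a-zA-Z]+", False),
}

def longest_match(text, pos, valid):
    """Maximal munch: the longest match at pos among the valid terminals."""
    best = None
    for name in sorted(valid):  # sorted so ties break deterministically
        m = re.compile(TERMINALS[name][0]).match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

def next_token(text, pos, valid):
    """Skip layout, then return (token, new_pos); token is None on a scan error."""
    while True:
        hit = longest_match(text, pos, valid)
        if hit is None:
            return None, pos
        name, lexeme = hit
        if not TERMINALS[name][1]:
            return (name, lexeme), pos + len(lexeme)
        pos += len(lexeme)  # layout: discard it and keep scanning

SRC = "{\n  a % b;\n}"
AFTER_A = 5  # index just past the identifier a
HOST_VALID = {"WhiteSpace_t", "LParen_t", "Mod_t", "Plus_t", "RParen_t", "Semi_t"}
MERGED_VALID = HOST_VALID | {"ExtComment_t"}

print(next_token(SRC, AFTER_A, HOST_VALID))    # (('Mod_t', '%'), 7)
print(next_token(SRC, AFTER_A, MERGED_VALID))  # (None, 11): stuck at '}'
```

With `ExtComment_t` in the valid set, `% b;` is swallowed as layout and the next thing the scanner sees is `}` at line 3, column 0, which matches the reported error position.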

This behavior seems rather undesirable; at the very least, it should produce some sort of warning. Alternatively, would it be possible to modify the LALR(1) parser construction algorithm so that it does not merge states that have different layout?

@schwerdf
Contributor

In theory, there are many ways to modify LR(1) state merging; one example is IELR(1), which does not merge any states where the merging would cause a parse table conflict.
In practice, since Copper builds its LALR(1) DFAs by first building an LR(0) DFA and then adding lookahead as a post-process, no explicit merging is ever done and generating separate states of this sort would be very complicated (not to mention potentially very expensive in terms of state counts).
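The LR(0)-first construction also explains why host code and extension code end up sharing layout even though nothing is merged explicitly. This Python sketch uses a textbook LR(0) closure/goto construction (not Copper's implementation; the tuple encoding and helper names are hypothetical) on the simplified grammar from the issue, and follows the shifts for host code `{ a` and for extension code `ext { a`.

```python
# Simplified grammar from the issue; "S'" is an added start symbol.
# Any symbol that is not a key of GRAMMAR is treated as a terminal.
GRAMMAR = {
    "S'": [("Stmt",)],
    "Stmt": [("Expr", ";"), ("{", "Stmt", "}"),
             ("ext", "{", "Id_c", "(", ")", ".", "}")],  # extProd
    "Expr": [("(", "Expr", ")"), ("Expr", "+", "Expr"), ("Expr", "%", "Expr"),
             ("Id_c", "(", ")"), ("Id_c",)],
    "Id_c": [("Id_t",)],
}

def closure(items):
    """LR(0) closure; an item is (lhs, rhs, dot_position)."""
    result = set(items)
    work = list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def goto(state, symbol):
    """Move the dot over `symbol` wherever possible, then close."""
    return closure({(lhs, rhs, dot + 1) for lhs, rhs, dot in state
                    if dot < len(rhs) and rhs[dot] == symbol})

def state_after(symbols):
    """LR(0) state reached from the start state by shifting `symbols`."""
    state = closure({("S'", ("Stmt",), 0)})
    for sym in symbols:
        state = goto(state, sym)
    return state

host = state_after(["{", "Id_t"])        # host code:      { a
ext = state_after(["ext", "{", "Id_t"])  # extension code: ext { a
print(host == ext)   # True
print(sorted(host))  # [('Id_c', ('Id_t',), 1)]
```

The states before `Id_t` differ, but both paths end in the single item set `{Id_c ::= Id_t .}`. Any lookahead computed afterwards for that state, and presumably the layout derived from it, is therefore the union of both contexts.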
For adding a warning, one could conceivably check scanner states to see whether, in any state where a non-layout terminal is in the accept set, a layout terminal is in the possible set.
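A rough sketch of that check in Python. Real scanner DFA states are approximated here by input prefixes, the "possible set" by a bounded search over short extensions drawn from a small alphabet, and the terminal table is taken from this issue. All names are hypothetical; a real implementation would walk Copper's scanner DFA and restrict the check to terminals that are valid in each parse state.

```python
import itertools
import re

# Terminal table from the issue: name -> (pattern, is_layout).
TERMINALS = {
    "WhiteSpace_t": (r"[\t\r\n ]+", True),
    "ExtComment_t": (r"% .*", True),
    "Mod_t": (r"%", False),
    "Plus_t": (r"\+", False),
    "Id_t": (r"[a-zA-Z]+", False),
}
# Small alphabet used to approximate "could still match with more input".
ALPHABET = " %+ab;\n"

def accept_set(prefix):
    """Terminals whose pattern matches the whole prefix."""
    return {name for name, (rx, _) in TERMINALS.items() if re.fullmatch(rx, prefix)}

def possible_set(prefix, max_extra=2):
    """Terminals matching some non-empty extension of prefix (bounded search)."""
    found = set()
    for k in range(1, max_extra + 1):
        for tail in itertools.product(ALPHABET, repeat=k):
            found |= accept_set(prefix + "".join(tail))
    return found

def layout_shadowing_warnings(prefixes):
    """Flag prefixes where a non-layout terminal is accepted but a layout
    terminal could still match longer input (and so win by maximal munch)."""
    warnings = []
    for p in prefixes:
        accepted = sorted(n for n in accept_set(p) if not TERMINALS[n][1])
        layout = sorted(n for n in possible_set(p) if TERMINALS[n][1])
        if accepted and layout:
            warnings.append((p, accepted, layout))
    return warnings

print(layout_shadowing_warnings(["%", "+", "a", " "]))
# [('%', ['Mod_t'], ['ExtComment_t'])]
```

The only flagged prefix is `%`, where `Mod_t` is accepted while `ExtComment_t` can still match, which is exactly the situation that broke `a % b;`.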
