Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Semantic actions on layout terminals are delayed #52

Open
krame505 opened this issue Nov 5, 2020 · 10 comments
Open

Semantic actions on layout terminals are delayed #52

krame505 opened this issue Nov 5, 2020 · 10 comments

Comments

@krame505
Copy link
Member

krame505 commented Nov 5, 2020

For context, this problem showed up in ableC, where we are trying to use the C preprocessor tags to control the disambiguation of identifiers, based on whether we are currently parsing a system header file. These preprocessor tags are recognized as a layout terminal, with a semantic action that sets a parser attribute which is later referenced in a disambiguation function.

The following is a simple example in Silver:

grammar layoutactions;

parser attribute isMarked :: Boolean action { isMarked = false; };

ignore terminal Marker_t '#'
  action {
    print "Shifted Marker";
    isMarked = true;
  };

terminal Id1_t /[a-zA-Z]+/ action { print "Shifted Id1"; };
terminal Id2_t /[a-zA-Z]+/ action { print "Shifted Id2"; };

disambiguate Id1_t, Id2_t {
  print "Disambiguating Id";
  pluck if isMarked then Id1_t else Id2_t;
}

nonterminal Foo;
concrete productions top::Foo
| Id1_t {} action { print "Reduced 1"; }
| Id2_t {} action { print "Reduced 2"; }

parser parse :: Foo {  layoutactions; }
function main 
IOVal<Integer> ::= args::[String] ioin::IO
{
  local result::ParseResult<Foo> = parse(head(args), "test");
  return
    if null(args) then ioval(print("no argument\n", ioin), 1)
    else if !result.parseSuccess then ioval(print(result.parseErrors ++ "\n", ioin), 0)
    else ioval(ioin, 0);
}

When run on the input #a, this prints

Disambiguating Id
Shifted Marker
Shifted Id2
Reduced 2

The disambiguation function is being called before the semantic action for '#' has been run, thus giving the wrong result.

As a (very hacky) workaround, one can introduce a second, ambiguous layout terminal and use a disambiguation function to update the parser attribute in the disambiguation function:

ignore terminal Marker2_t '#';
disambiguate Marker_t, Marker2_t {
  print "Disambiguating Marker";
  isMarked = true;
  pluck Marker_t;
}

This gives the desired output of

Disambiguating Marker
Disambiguating Id
Shifted Marker
Shifted Id1
Reduced 1

This workaround does work in the context of ableC but is something that we would greatly wish to avoid.

Since we evidently know which layout terminal is going to be shifted as soon as the disambiguation function is run, why aren't the semantic actions executed immediately, before attempting to disambiguate the next terminal? I vaguely remember there being a reason for this, but I don't remember what it was.

@schwerdf
Copy link
Contributor

schwerdf commented Nov 6, 2020

As a general rule, semantic actions are run when a terminal is consumed by the parser, rather than when it is originally scanned (potentially as yet-unconsumed lookahead). Since all the layout preceding a terminal is scanned at the same time as the terminal, the parser sees it all as one unit, and they are all considered "consumed" at the same time as the terminal (i.e., when the terminal is shifted).

@krame505
Copy link
Member Author

krame505 commented Nov 6, 2020

Would it be reasonable to consider adding a second type of semantic action for terminals that is run as soon as the terminal is scanned? This is the behavior that we want for recognizing C preprocessor tags to use them in disambiguation. As I mentioned we can achieve this in a hacky way with a bogus lexical ambiguity and disambiguation function, but this is rather suboptimal.

@schwerdf
Copy link
Contributor

schwerdf commented Nov 9, 2020

That sounds like it would just be sweeping the hackiness under the rug -- adding an extra wrinkle to the tool to straighten out the representation of this grammar. We had to add disambiguation functions because Copper could not parse C without them; but in this case, since Copper can do what needs to be done, it does not seem worth adding that kind of complication.

@ericvanwyk
Copy link
Contributor

@schwerdf - do you think the current behavior is the correct one? I think what Copper does now makes sense given our early understanding of things, but if shifting a token, or dropping a layout one, has a semantic action that affects how the following characters are processed it seems that we should be executing these actions as the tokens are processed instead of waiting until a batch is given to the parser. Or am I missing something here?

@schwerdf
Copy link
Contributor

On terminals in traditional parsing frameworks, one has scanner actions (applied when the terminal is matched in the scanner) and parser actions (applied when the terminal is shifted in the parser). In Copper, disambiguation functions have to be scanner actions, but the semantic actions on non-layout terminals have to be parser actions, since they return result values that go into the parse tree. This leaves actions on layout terminals (normally invisible to the parser) in a gray area.

Historically, we had no choice but to run these actions as a group at the next shift -- scans could potentially be re-run after a reduce action, and ambiguities were allowed between layout terminals. There are still three reasons to do it this way: (1) to prevent complications when re-running the scan in the event of a syntax error; (2) to avoid tightening the screws on the behavior of a compatible parser, by providing tighter guarantees of the relative times at which these actions will be run; and (3) to maintain the former behavior that may be relied on by existing grammars.

If these concerns are surmountable, then it should not be very complicated to rearrange things to run the actions on layout terminals when they are matched.

@ericvanwyk
Copy link
Contributor

Thanks, August. Let me look into these concerns a bit. We at least have hacky solution to the immediate problem.

@schwerdf
Copy link
Contributor

On a closer examination of the parser engine class, scratch concern (1): the "disjoint scan" after an error used to re-scan preceding layout as well, but now starts at the exact point of failure, so the layout is never re-scanned.

@ericvanwyk
Copy link
Contributor

That is good. Thanks.

@krame505
Copy link
Member Author

So I found a reason why we don't want to simply change the behavior of semantic actions on layout terminals to be scanner actions instead of parser actions - we were using the action block on CPP tag terminals to update the location information, which should happen in-order with the actions of other non-layout terminals to ensure they receive the correct location (moving this entire action block into a disambiguation function introduced a bug with location tracking - fixed in melt-umn/ableC#171.)

So it seems for this case we really do want to be able to define seperate scanner and parser actions for terminals.

@schwerdf
Copy link
Contributor

schwerdf commented Nov 19, 2020

The change as I envisioned it would not alter the order in which the actions are run relative to each other; the only noticeable change would be relative to disambiguation functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants