GLR returning iterator of possible parses #88

SupraSummus · 2018-11-15T19:27:50Z

(First of all I don't even know if this is theoretically possible.)

Idea is to have GLRParser.parse(...) return iterator, and each value in that iterator is computed on demand. Parses are sorted for example by order of alternatives chosen during the parse.

Advantages:

this solution is superset of current one. User can always do list(parses) to get evaluated list of all parses.
Possibility to get relatively quick parse on very ambiguous grammar and then if a parse is not desired take next one.
Possibility to apply a crude disambiguation technique by placing production alternatives in right order, and then selecting first parse.

The text was updated successfully, but these errors were encountered:

igordejanovic · 2018-11-16T09:12:24Z

I'm quite sure that it's not theoretically possible :)

GLR constructs trees in parallel, not sequential order like some top-down approaches might be doing. All possible sub-trees are constructed as the input is consumed. Furthermore, all trees are packed and common sub-tree sharing is used as it is natural to do so even with an ambiguous grammars memory consumption increases moderately (just to account for the difference). LR parsing (GLR is just an extension) uses CFG where choice operator is unordered by design so it doesn't matter in what order you place it. CFGs are declarative in nature. Calculated parsing tables will not depend on the order of productions, so the parser wouldn't see any difference.

Iterators/generators would be possible for top-down approaches with backtracking for example (like PEG based parsers which naturally use ordered choice thus are more imperative in nature). E.g. Arpeggio and textX are using this parsing approach so it should be possible to do.
For example, you get a first tree and to get the next you continue parsing as if the last match is not successful. The parser do backtrack and continue with the next. As an extra bonus the trees would be returned in a deterministic order. PEGs are always unambiguous so are not meant to be used for the definition of ambiguous languages, but with this hack I think it could be done. The question is would it be usable?

SupraSummus · 2018-11-18T21:25:52Z

GLR must construct trees in parallel when the input is a non-seekable stream. Parglare isn't based on this assumption - it is not a stream parser and it has random access to the input. This allows to process live heads in any order. For example you can take one head and feed input to it until it forks or succeeds or dies. In case of fork take leftmost head and continue with it. "Sleeping heads" must be stored of course as save-points for "backtracking" (or whatever this is called in case of this GLR variant) along with context info - the position in input sequence.

As for parse ordering: the only requirement to make parses ordered is to order actions in each state. And this is, I believe, possible.

Use case for "ordered, lazy GLR" is parsing programming languages, which has their ambiguity resolved by specifying that first parse is preferred and parses are ordered by order of chosen alternatives. Dhall is an example.

igordejanovic · 2018-11-18T23:01:24Z

That wouldn't be GLR parser anymore. It would be a backtracking parser. The power of GLR comes from its "online" style of work where it takes a token and do as much work as possible. GLR heads don't only fork, they merge if found in the same state which makes GLR efficient. Worst case is O(n^3) while for nearly deterministic languages runs in O(n). Backtracking parsers could go exponential for certain grammars.

PEG based parsers would do what you need: backtracking and disambiguation by choice ordering. This very difference between PEG parsing and GLR is one of the reason why I started this project.
By using alternative ordering for disambiguation you loose the declarative nature of CFG. You don't specify the language anymore, you specify the parser, i.e. tell parser how to parse.
You can read more about motivation in my blog post.

Furthermore, GLR is based on DFA and shift-reduce, so the relation between production order and the shift/reduce state actions is something that needs investigation.

I still think that this is something that would be artificial for GLR. It is more natural to be done on recursive descent parsers. But, if you have time to research/hack in this direction I don't mind :) I would just like, if planed to be integrated in this fork, to be made backward compatible, and doen't degrade performance, so it wont disturb existing users.

SupraSummus · 2018-11-18T23:15:52Z

PEG seems like a way to go for my use case, indeed.

igordejanovic · 2021-06-19T19:22:53Z

In version 0.14.0 GLR parser returns the forest object which is iterable producing lazy trees. It doesn't work as requested here as it wouldn't be possible but from the user point of view it doesn't differ much. The parser will parse the complete input and then build trees lazily during iteration. See the docs.

igordejanovic added the discussion label Nov 16, 2018

igordejanovic added the proposal label Nov 16, 2018

igordejanovic changed the title ~~Proposal: GLR returning iterator of possible parses~~ GLR returning iterator of possible parses Nov 16, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GLR returning iterator of possible parses #88

GLR returning iterator of possible parses #88

SupraSummus commented Nov 15, 2018 •

edited

igordejanovic commented Nov 16, 2018 •

edited

SupraSummus commented Nov 18, 2018

igordejanovic commented Nov 18, 2018

SupraSummus commented Nov 18, 2018

igordejanovic commented Jun 19, 2021

GLR returning iterator of possible parses #88

GLR returning iterator of possible parses #88

Comments

SupraSummus commented Nov 15, 2018 • edited

igordejanovic commented Nov 16, 2018 • edited

SupraSummus commented Nov 18, 2018

igordejanovic commented Nov 18, 2018

SupraSummus commented Nov 18, 2018

igordejanovic commented Jun 19, 2021

SupraSummus commented Nov 15, 2018 •

edited

igordejanovic commented Nov 16, 2018 •

edited