Porting from pyparsing match_previous_literal #1437
Comments
LALR only parses context-free grammars. However, a function like match_previous_literal() is context-sensitive, so the parser can't help you there. But a regular expression should be capable of matching this construct. It sounds like the best solution is to create a regexp terminal that matches the entire expression, and then in post-processing run the regexp again to parse the string (using groups).
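A minimal sketch of that idea using only Python's `re` module, outside of Lark. The pattern and the helper name `split_subst` are assumptions made for illustration; the group `(.)` captures the delimiter and `\2` refers back to it. The same pattern text could serve as the regexp terminal, with the groups read in post-processing.

```python
import re

# Hypothetical pattern for ${var:S<delim>from<delim>to<delim>flags}.
# Group 2 captures the delimiter; \2 requires the same character again.
SUBST = re.compile(r"\$\{(\w+):S(.)(.*?)\2(.*?)\2(\w*)\}")

def split_subst(text):
    """Return the parts of a substitution expression, or None if it does not match."""
    m = SUBST.fullmatch(text)
    if m is None:
        return None
    var, delim, old, new, flags = m.groups()
    return {"var": var, "delim": delim, "old": old, "new": new, "flags": flags}

print(split_subst("${var:S/from/to/g}"))
print(split_subst("${var:S#from#to#g}"))
```

Note that the lazy `.*?` splits at the first occurrence of the delimiter, so this sketch assumes the delimiter does not appear inside the "from" or "to" parts.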
Hi Erez, thanks for pointing me into the right direction with RE. I confirm that it is as easy as
to back-reference the matched first character in the RE. The complete expression is handed to the post-processor, which then still needs to take it apart using Python code. Thx for this nice tool.
Update: The error message is The workaround is to use a named capturing group, which results in a grammar of
This is agnostic to the lexer adding additional groups.
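To illustrate why a named group is more robust, here is a small sketch in plain Python `re`. The `combine` helper is a hypothetical stand-in for a lexer that embeds a terminal's pattern in a larger regex with extra groups; it is not meant to reproduce Lark's internals. Once another group precedes the terminal, `\1` no longer points at the delimiter, while `(?P=delim)` still does.

```python
import re

# The same terminal written with a numbered and with a named back-reference.
NUMBERED = r":S(.).*?\1.*?\1"
NAMED = r":S(?P<delim>.).*?(?P=delim).*?(?P=delim)"

def combine(pattern):
    """Hypothetical lexer stand-in: put another group in front of the terminal."""
    return re.compile(r"(?P<WS>\s+)|(?P<SUBST>" + pattern + ")")

text = ":S#from#to#"

# \1 now refers to the WS group, which never matched, so the terminal fails.
numbered_match = combine(NUMBERED).fullmatch(text)

# The named back-reference is unaffected by the shifted group numbers.
named_match = combine(NAMED).fullmatch(text)

print(numbered_match is None)
print(named_match.group("delim"))
```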
That makes sense. Maybe there's a way we could make it work, by fixing indexes for example. Although requiring names for back-references almost sounds like a win :)
What is your question?
I am porting from pyparsing to Lark because I expect better performance. Initial tests look very promising.
One of the few constructs that I could not work out how to express in Lark is match_previous_literal. It allows matching dynamically based on a previously matched literal (see pyparsing-docs.readthedocs.io).
I need this functionality to match a sed-like search-and-replace construct, ${var:S/from/to/g}, including all of its equivalents, e.g. ${var:S#from#to#g}.
I would like to match the character after the signaling token :S and use that as the delimiter for the expression. The delimited content must be parsed as well, i.e. may contain constructs like ${var2}.
If you're having trouble with your code or grammar
Currently I am using a workaround based on templates that simply lists some commonly used separators, but this is neither exhaustive nor generic. Almost any character can be used as the separator.
I wonder if there is any possibility to define this generically in the grammar. I could live with an interim solution that retrieves the delimited content and runs a parse on it in a post-processing step.
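A rough sketch of what such a post-processing step might look like for one delimited fragment. The helper name `parse_fragment` and the simple `${name}` pattern are assumptions for illustration; in practice the fragment would more likely be handed back to the real parser instead of a regex.

```python
import re

# Hypothetical pattern for a plain variable reference such as ${var2}.
VAR_REF = re.compile(r"\$\{(\w+)\}")

def parse_fragment(fragment):
    """Split a delimited fragment into ("text", ...) and ("var", ...) pieces."""
    parts = []
    pos = 0
    for m in VAR_REF.finditer(fragment):
        if m.start() > pos:
            # Literal text between the previous reference and this one.
            parts.append(("text", fragment[pos:m.start()]))
        parts.append(("var", m.group(1)))
        pos = m.end()
    if pos < len(fragment):
        # Trailing literal text after the last reference.
        parts.append(("text", fragment[pos:]))
    return parts

print(parse_fragment("a${var2}b"))
```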