The Problem
I am trying to apply a general grammar to various types of text files, specifically code and documentation files in languages such as Python, C, and LaTeX. These languages all use different comment characters, and since my grammar has a keen interest in comments, the COMMENT_CHAR terminal must be set to the right value for each file I need to parse.
I recommend that Lark allow users to change or set terminal values. Specifically,
Alternatives
I have already tried using edit_terminals to produce this behavior. However, edit_terminals runs after terminal values have been processed and compressed into a minimal set of tokens. One could modify the resulting complex regexes, but that requires the user to understand how Lark works behind the scenes, and it would be very clunky and prone to errors.
One could also modify the input grammar before passing it to Lark, but text processing should be Lark's responsibility. Having to parse a grammar and modify it so that Lark can parse the user's files seems a bit backwards.
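For reference, the grammar-rewriting workaround mentioned above could look roughly like the sketch below. It only rewrites the grammar text using the standard library. The helper `set_comment_char`, the `COMMENT_CHARS` mapping, and the shortened grammar fragment are illustrative assumptions, not part of Lark.

```python
import re

def set_comment_char(grammar_text: str, comment_char: str) -> str:
    """Return grammar_text with the COMMENT_CHAR definition line replaced."""
    # Escape backslashes and double quotes so the value stays a valid string literal.
    escaped = comment_char.replace("\\", "\\\\").replace('"', '\\"')
    new_line = f'COMMENT_CHAR: "{escaped}"'
    # Replace the whole definition line. A function replacement keeps re.subn
    # from interpreting backslashes inside new_line.
    result, count = re.subn(
        r"^COMMENT_CHAR\s*:.*$",
        lambda _match: new_line,
        grammar_text,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError(f"expected exactly one COMMENT_CHAR definition, found {count}")
    return result

# Shortened, hypothetical fragment of the grammar, used only for illustration.
GRAMMAR = r'''
PREFIX: _IWS* COMMENT_CHAR _IWS*
LINE: _IWS* COMMENT_CHAR? _SENTENCE? _EOL
COMMENT_CHAR: "PLACEHOLDER"
'''

# Hypothetical mapping from language to comment marker.
COMMENT_CHARS = {"python": "#", "c": "//", "latex": "%"}

python_grammar = set_comment_char(GRAMMAR, COMMENT_CHARS["python"])
# python_grammar could then be passed to Lark in place of the original text.
```

This works, but it means maintaining a small text-rewriting layer on top of the grammar, which is the part that feels backwards.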
Context
This is my original post, which has an example describing the problem:
I'm having an issue using edit_terminals: I'm finding that Lark compresses the terminals I've defined before edit_terminals is applied.
This is my grammar; the terminal I am looking to modify is COMMENT_CHAR to support multiple languages:
start: (snippet|LINE)*
snippet: snippet_marker LINE*
// TODO: Evaluate if the prefix for each token should be in or out of the token.
snippet_marker.1: PREFIX MARKER _IWS* /.+/ SUFFIX
PREFIX: _IWS* COMMENT_CHAR _IWS*
MARKER: _SPIDER
SUFFIX: _EOL
_DEFINITION_TOKENS: CONTEXT _IWS* TOPIC _IWS* CONTENT_TYPE
_BOOLEAN_FLAGS: _DEFINITION_TOKENS? (LINK|EMBEDDING) // Makes the definition tokens filtering options
CONTEXT: "#" /\w{1,16}/ // What's the general theme?
TOPIC: "@" /\w{1,16}/ // What is this snippet about?
CONTENT_TYPE: "$" /[DRAC]/ // Documentation, Reasoning, API, Code
LINK: "?"
EMBEDDING: "!"
// Resources
_SPIDER: "//\(oo)/\\"
_BORING: "SNIPPET"
_ROBOT: "[o_o]"
LINE: _IWS* COMMENT_CHAR? _SENTENCE? _EOL
_SENTENCE: (_WORD _IWS+)* _WORD
_WORD: /\S+/
COMMENT_CHAR: "TO BE OVERRIDE BY PROGRAM- DO NOT REMOVE - DO NOT USE ANOTHER TOKEN FOR COMMENT CHAR"
_EOL: _IWS* _NL
_IWS: /[\t]/
_NL: /\r?\n/
Clearly, COMMENT_CHAR was absorbed by PREFIX, which makes it difficult to edit consistently.
There's a possibility I'm not using this right, but I also feel the edit_terminals option should be applied before compression; otherwise users need to predict the compression to use it consistently.
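To make the ordering argument concrete, here is a toy model of the situation. It is not how Lark stores or compiles terminals. The `TERMINALS` table, the `{NAME}` reference syntax, and the `inline` helper are all made up for illustration. It only shows why an edit made before inlining is a one-line change, while an edit made afterwards means patching a flattened pattern.

```python
import re

# Toy terminal table: values are regex fragments, and {NAME} refers to
# another terminal. This models the idea of inlining only.
TERMINALS = {
    "IWS": r"[ \t]",
    "COMMENT_CHAR": "PLACEHOLDER",
    "PREFIX": r"{IWS}*{COMMENT_CHAR}{IWS}*",
}

def inline(terminals, name):
    """Recursively expand {NAME} references into one flat regex string."""
    return re.sub(
        r"\{(\w+)\}",
        lambda m: "(?:" + inline(terminals, m.group(1)) + ")",
        terminals[name],
    )

# Editing before inlining: one assignment, and PREFIX picks it up.
early = dict(TERMINALS, COMMENT_CHAR=re.escape("#"))
prefix_early = inline(early, "PREFIX")

# Editing after inlining: COMMENT_CHAR is already buried inside PREFIX,
# so the only option is textual surgery on the flattened pattern.
prefix_flat = inline(TERMINALS, "PREFIX")
prefix_late = prefix_flat.replace("PLACEHOLDER", re.escape("#"))
```

Both routes end at the same pattern here, but the late edit only works because the placeholder text is easy to find in the flattened regex, which is exactly the kind of prediction a user should not have to make.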
Thank you,
Clement