problem with highlighting #11

Open
Metatr0n opened this issue Jun 19, 2018 · 3 comments

@Metatr0n

Hi, Nikos!

Could you help with a problem? I made a JSON file with a grammar that includes Cyrillic symbols.
Everything works fine, but the tokens "и" and "И" behave strangely =)
In the word "Привет", the symbol "и" gets highlighted, although in this case it's just a letter ((

Grammar:
`// 1. a partial javascript grammar in simple JSON format

var ldsl_grammar = // a partial javascript grammar in simple JSON format
{

// prefix ID for regular expressions used in the grammar
"RegExpID" : "RE::",

// Style model
"Style" : {
 "comment"                      : "comment"
,"atom"                         : "atom"
,"keyword"                      : "keyword"
,"builtin"                      : "builtin"
,"operator"                     : "operator"
,"identifier"                   : "variable"
,"property"                     : "attribute"
,"number"                       : "number"
,"string"                       : "string"
},

// Lexical model
"Lex" : {
 "comment"                      : {"type":"comment","tokens":[
                                // line comment
                                // start, end delims  (null matches end-of-line)
                                [  "//",  null ],
                                // block comments
                                // start,  end    delims
                                [  "/*",   "*/" ]
                                ]}
,"identifier"                   : "RE::/[_A-Za-z$][_A-Za-z0-9$]*/"
,"property"                     : "RE::/[_A-Za-z$][_A-Za-z0-9$]*/"
,"number"                       : [
                                // floats
                                "RE::/\\d*\\.\\d+(e[\\+\\-]?\\d+)?/",
                                "RE::/\\d+\\.\\d*/",
                                "RE::/\\.\\d+/",
                                // integers
                                // decimal
                                "RE::/[1-9]\\d*(e[\\+\\-]?\\d+)?L?/",
                                // just zero
                                "RE::/0(?![\\dx])/"
                                ]
,"string"                       : {"type":"escaped-block","escape":"\\","tokens":
                                // start, end of string (can be the matched regex group ie. 1 )
                                [ "RE::/(['\"])/",   1 ]
                                }
,"operator"                     : {"tokens":[
                                "+", "-", "*", "/", "не"
                                ]}
,"delimiter"                    : {"tokens":[
                                "(", ")", "[", "]", ",", ":"
                                ]}
,"atom"                         : {"autocomplete":true,"tokens":[
                                "ИСТИНА", "ЛОЖЬ", "и", "или", "ИЛИ", "И"
                                ]}
,"keyword"                      : {"autocomplete":true,"tokens":[
                                "если", "Если", "ЕСЛИ", "в случае", "иначе", "тогда", "содержит", "то", "хоть один из", "ни одного из"
                                ]}
,"builtin"                      : {"autocomplete":true,"tokens":[
                                "КАЖДЫЙ", "ЛЮБОЙ", "ИМПЕРАТИВ", "ТРЕБОВАНИЕ", "Требование"
                                ]}
},

// Syntax model (optional)
"Syntax" : {
"ldsl"                           : "comment | number | string | keyword | operator | atom | builtin | identifier"
},

// what to parse and in what order
"Parser" : [ ["ldsl"] ]

};`

@foo123
Owner

foo123 commented Jun 19, 2018

I guess it is highlighted as an atom, as per the grammar. However, check the combine option for the tokens in the atom Lex part (see "simple tokens" in the grammar documentation). This option combines the tokens using a different separator than the default, or does not combine them at all. I think this is the problem. The combine option takes a string that can be a regular expression.

I don't know whether the word separator \b treats Unicode characters as non-word characters, so try using a custom separator that suits your needs and/or make each token a custom regular expression and do not combine them into one large regular expression at all. For example:

"atom" : {"autocomplete":true,"tokens":[/* tokens here */],"combine":"\\s"} // combine using a whitespace separator
// "combine":false does not combine the tokens into one regular expression at all. Instead, each token is matched on its own, and it will match even if it is merely a prefix of a longer word.
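
To illustrate the doubt about \b above: in plain JavaScript, \b is defined in terms of \w, which covers only ASCII letters, digits and underscore, so Cyrillic letters count as non-word characters. The sketch below shows this, together with one possible Unicode-aware alternative based on lookarounds. It is standalone JavaScript, not the grammar library's API; whether the grammar's "RE::/.../" notation accepts the `u` flag or lookbehind is not confirmed in this thread.

```javascript
// \b is based on \w = [A-Za-z0-9_], so it never sees a boundary
// between a space and a Cyrillic letter.
const asciiBoundary = /\bи\b/;

// Hypothetical alternative: treat any Unicode letter, digit or underscore
// as a "word" character. Needs the "u" flag and lookbehind (ES2018+).
const unicodeBoundary = /(?<![\p{L}\p{N}_])и(?![\p{L}\p{N}_])/u;

console.log(asciiBoundary.test("а и б"));    // false: \b does not help here
console.log(unicodeBoundary.test("а и б"));  // true: standalone "и"
console.log(unicodeBoundary.test("Привет")); // false: "и" is inside a word
```

If the lookaround form cannot be used inside the grammar, the same idea can still guide the choice of a custom "combine" separator.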

PS: Some of the keyword tokens also contain spaces. These will be hard to match if they are combined into one regular expression, so try using a separate regular expression for each token that is more than one word and contains spaces.

,"keyword"                      : {"autocomplete":true,"tokens":[
                                "если", "Если", "ЕСЛИ", "в случае", "иначе", "тогда", "содержит", "то", "хоть один из", "ни одного из"

Try something like this:

,"keyword"                      : {"autocomplete":[/*put tokens here*/],"tokens":[
                                "RE::/если\\b/", "RE::/в\\sслучае\\b/" // do the same for the other tokens. Unicode characters are a pain in the neck!
                                ]}

@Metatr0n
Author

Metatr0n commented Jun 20, 2018

Thanks for the answer, but the problem still exists.
I replaced tokens like "если" with "^если\s" and "\sесли\s".
"^если\s" - works fine
"\sесли\s" - ='(

I tried this on https://regexr.com/. Both patterns work well there (for the string "привет если привет").
Could you please help with it?

@foo123
Owner

foo123 commented Jun 20, 2018

Try something like:

"keyword": {"autocomplete":["если"],"tokens":[
"RE::/^(если)\\s/" // do similar for other keyword tokens
]}
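
One possible reason why "^если\s" works while "\sесли\s" does not, even though both match on regexr: a tokenizer usually tests a pattern only at the current position in the line, typically after whitespace has already been consumed, whereas regexr searches the whole string. This is an assumption about how the library matches, not something confirmed in this thread. The helper `matchesAtPosition` below is hypothetical and only emulates that behavior:

```javascript
// Assumption: the tokenizer looks only at the rest of the line from the
// current position and accepts a match only if it starts right there.
function matchesAtPosition(source, text, pos) {
  const rest = text.slice(pos);
  const m = new RegExp(source).exec(rest);
  return m !== null && m.index === 0;
}

const line = "привет если привет";
const keywordStart = line.indexOf("если"); // position of the keyword

console.log(matchesAtPosition("^если\\s", line, keywordStart));       // true
console.log(matchesAtPosition("\\sесли\\s", line, keywordStart));     // false: the space before it is already behind us
console.log(matchesAtPosition("\\sесли\\s", line, keywordStart - 1)); // true: only when starting on the space
```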

Alternatively, try using Unicode escape codes in the regular expressions instead of the actual characters. You can use the online tool I pointed to in the other issue to generate the escape codes from a Unicode string.
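
If the online tool is not at hand, the escape codes can also be generated with a few lines of JavaScript. This is a minimal sketch; `toUnicodeEscapes` is a hypothetical helper name, and it works per UTF-16 code unit, which is enough for Cyrillic.

```javascript
// Convert each UTF-16 code unit of a string to a \uXXXX escape sequence.
function toUnicodeEscapes(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += "\\u" + text.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return out;
}

const escaped = toUnicodeEscapes("если");
console.log(escaped); // \u0435\u0441\u043b\u0438

// The escaped form matches the same text as the literal characters.
const keywordFromEscapes = new RegExp("^(" + escaped + ")\\s");
console.log(keywordFromEscapes.test("если то")); // true
```

Inside a JSON grammar string each backslash has to be doubled, in the same way as the "\\s" in the examples above.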

NOTE: the regex above matches the keyword only if it is followed by whitespace, but the keyword might instead be followed by a parenthesis or another delimiter such as a bracket, or by the end of the line ($). You can include all of these in the regex to be sure. Just make sure the keyword itself is in its own group 1 of the regex, otherwise everything the regex matched will be highlighted. For example, a regex like "RE::/^(если)(\\b|\\s|\\(|{|$)/"
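
The effect of keeping the keyword in group 1 can be checked in plain JavaScript. The sketch below uses the same pattern as above, outside the grammar and with the brace escaped for safety; `keywordAtStart` is a hypothetical helper, and how the library uses group 1 internally is taken from the note above rather than verified here.

```javascript
// Group 1 holds the keyword, group 2 the delimiter that follows it.
const keywordRe = /^(если)(\b|\s|\(|\{|$)/;

// Return only the keyword part of the match, or null if there is no match.
function keywordAtStart(text) {
  const m = keywordRe.exec(text);
  return m ? m[1] : null;
}

console.log(keywordAtStart("если (x)")); // если (followed by whitespace)
console.log(keywordAtStart("если(x)"));  // если (followed by a parenthesis)
console.log(keywordAtStart("если"));     // если (followed by end of line)
console.log(keywordAtStart("еслибы"));   // null (prefix of a longer word)

// The whole match includes the delimiter, group 1 does not.
console.log(keywordRe.exec("если (x)")[0].length); // 5
```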
