[HTML] Add separate keywords for attributes #251

zufuliu · 2024-05-28T13:28:24Z

Created for https://sourceforge.net/p/scintilla/bugs/514/, copy my last post:

I think a simple fix is just split attribute list from tag list keywords (this is what Notepad4 did since 2018):

case 6:
	wordListN = &keywordsAttr;
	break;

Also, those keywords<n> fields needs better describable names.

Application wants better support (handle attributes for each tag) can opt-in new sub-style classifiers added in Lexilla 5.3.2, and using keywordsAttr for global attributes (see https://html.spec.whatwg.org/multipage/dom.html#global-attributes).

The text was updated successfully, but these errors were encountered:

nyamatongwe · 2024-05-28T22:28:41Z

split attribute list from tag list

That change would not be completely compatible leading to unhighlighted attributes.

zufuliu · 2024-05-29T10:42:39Z

I think it's a compatible change: if app set keywordsAttr (word list 6) to non-empty, we know it want to using separator tag list and attribute list, and then using it instead of word list 0 (tag list) for classifyAttribHTML(). A more compatible change would let app to set which word list is attribute list instead of hardcoded 6. Lightweight app can put all attributes and event handlers into the word list similar to Notepad4 did, e.g. SciTE would need to update:

keywordclass.hypertext=\
$(hypertext.elements) $(hypertext.attributes) $(html5.elements) $(html5.attributes) public !doctype

# attributes
keywords7.$(file.patterns.xxx)=

zufuliu · 2024-06-04T12:14:46Z

Here are two patches to fix this:
html-0604-1.patch

diff --git a/lexers/LexHTML.cxx b/lexers/LexHTML.cxx
index 0bff582d..6191e99e 100644
--- a/lexers/LexHTML.cxx
+++ b/lexers/LexHTML.cxx
@@ -1006,6 +1006,8 @@ class LexerHTML : public DefaultLexer {
 	WordList keywords4;
 	WordList keywords5;
 	WordList keywords6; // SGML (DTD) keywords
+	WordList keywords7;
+	const WordList *attributeList;
 	OptionsHTML options;
 	OptionSetHTML osHTML;
 	std::set<std::string> nonFoldingTags;
@@ -1019,6 +1021,7 @@ public:
 			isXml_ ?  std::size(lexicalClassesXML) : std::size(lexicalClassesHTML)),
 		isXml(isXml_),
 		isPHPScript(isPHPScript_),
+		attributeList(&keywords),
 		osHTML(isPHPScript_),
 		nonFoldingTags(std::begin(tagsThatDoNotFold), std::end(tagsThatDoNotFold)) {
 	}
@@ -1115,6 +1118,10 @@ Sci_Position SCI_METHOD LexerHTML::WordListSet(int n, const char *wl) {
 	case 5:
 		wordListN = &keywords6;
 		break;
+	case 6:
+		wordListN = &keywords7;
+		attributeList = wl[0] ? wordListN : &keywords;
+		break;
 	default:
 		break;
 	}
@@ -2022,7 +2029,7 @@ void SCI_METHOD LexerHTML::Lex(Sci_PositionU startPos, Sci_Position length, int
 			break;
 		case SCE_H_ATTRIBUTE:
 			if (!setAttributeContinue.Contains(ch)) {
-				isLanguageType = classifyAttribHTML(inScriptType, styler.GetStartSegment(), i - 1, keywords, classifierAttributes, styler, lastTag);
+				isLanguageType = classifyAttribHTML(inScriptType, styler.GetStartSegment(), i - 1, *attributeList, classifierAttributes, styler, lastTag);
 				if (ch == '>') {
 					styler.ColourTo(i, SCE_H_TAG);
 					if (inScriptType == eNonHtmlScript) {
diff --git a/test/examples/hypertext/SciTE.properties b/test/examples/hypertext/SciTE.properties
index e3aa3c28..eb53d607 100644
--- a/test/examples/hypertext/SciTE.properties
+++ b/test/examples/hypertext/SciTE.properties
@@ -1,7 +1,6 @@
 lexer.*=hypertext
-# Tags and attributes
-keywords.*=b br body content encoding head href html img language li link meta \
-name p rel runat script src strong title type ul version xml xmlns
+# Tags
+keywords.*=b br body head html img li link meta p script strong title ul xml
 # JavaScript
 keywords2.*=function var
 # Basic
@@ -12,6 +11,8 @@ keywords4.*=import pass
 keywords5.*=echo __file__ __line__
 # SGML
 keywords6.*=ELEMENT
+# Attributes
+keywords7.*=content encoding href language name rel runat src type version xmlns
 
 # Tag
 substyles.hypertext.1=1

html-0604-2.patch

diff --git a/lexers/LexHTML.cxx b/lexers/LexHTML.cxx
index 0bff582d..bbb5fbbc 100644
--- a/lexers/LexHTML.cxx
+++ b/lexers/LexHTML.cxx
@@ -1006,6 +1006,8 @@ class LexerHTML : public DefaultLexer {
 	WordList keywords4;
 	WordList keywords5;
 	WordList keywords6; // SGML (DTD) keywords
+	WordList keywords7;
+	const WordList *attributeList;
 	OptionsHTML options;
 	OptionSetHTML osHTML;
 	std::set<std::string> nonFoldingTags;
@@ -1019,6 +1021,7 @@ public:
 			isXml_ ?  std::size(lexicalClassesXML) : std::size(lexicalClassesHTML)),
 		isXml(isXml_),
 		isPHPScript(isPHPScript_),
+		attributeList(&keywords),
 		osHTML(isPHPScript_),
 		nonFoldingTags(std::begin(tagsThatDoNotFold), std::end(tagsThatDoNotFold)) {
 	}
@@ -1115,6 +1118,9 @@ Sci_Position SCI_METHOD LexerHTML::WordListSet(int n, const char *wl) {
 	case 5:
 		wordListN = &keywords6;
 		break;
+	case 6:
+		wordListN = &keywords7;
+		break;
 	default:
 		break;
 	}
@@ -1122,6 +1128,9 @@ Sci_Position SCI_METHOD LexerHTML::WordListSet(int n, const char *wl) {
 	if (wordListN) {
 		if (wordListN->Set(wl)) {
 			firstModification = 0;
+			if (n == 6) {
+				attributeList = wordListN->Length() ? wordListN : &keywords;
+			}
 		}
 	}
 	return firstModification;
@@ -2022,7 +2031,7 @@ void SCI_METHOD LexerHTML::Lex(Sci_PositionU startPos, Sci_Position length, int
 			break;
 		case SCE_H_ATTRIBUTE:
 			if (!setAttributeContinue.Contains(ch)) {
-				isLanguageType = classifyAttribHTML(inScriptType, styler.GetStartSegment(), i - 1, keywords, classifierAttributes, styler, lastTag);
+				isLanguageType = classifyAttribHTML(inScriptType, styler.GetStartSegment(), i - 1, *attributeList, classifierAttributes, styler, lastTag);
 				if (ch == '>') {
 					styler.ColourTo(i, SCE_H_TAG);
 					if (inScriptType == eNonHtmlScript) {
diff --git a/test/examples/hypertext/SciTE.properties b/test/examples/hypertext/SciTE.properties
index e3aa3c28..eb53d607 100644
--- a/test/examples/hypertext/SciTE.properties
+++ b/test/examples/hypertext/SciTE.properties
@@ -1,7 +1,6 @@
 lexer.*=hypertext
-# Tags and attributes
-keywords.*=b br body content encoding head href html img language li link meta \
-name p rel runat script src strong title type ul version xml xmlns
+# Tags
+keywords.*=b br body head html img li link meta p script strong title ul xml
 # JavaScript
 keywords2.*=function var
 # Basic
@@ -12,6 +11,8 @@ keywords4.*=import pass
 keywords5.*=echo __file__ __line__
 # SGML
 keywords6.*=ELEMENT
+# Attributes
+keywords7.*=content encoding href language name rel runat src type version xmlns
 
 # Tag
 substyles.hypertext.1=1

html-0604-1.patch set attribute list when parameter is not an empty string (decouple from tag list without set anything):

attributeList = wl[0] ? wordListN : &keywords;

html-0604-2.patch set attribute list only when parameter contains word:

if (n == 6) {
	attributeList = wordListN->Length() ? wordListN : &keywords;
}

nyamatongwe · 2024-06-04T23:46:39Z

This appears to duplicate behaviour available from substyles added with 1c6e3e5 so isn't necessary.

zufuliu · 2024-06-05T10:41:08Z

I don't think it's duplicate. Even with substyles, body is still classified as SCE_H_ATTRIBUTE:

lexilla/lexers/LexHTML.cxx

Lines 259 to 269 in 85d1d67

    
           bool classifyAttribHTML(script_mode inScriptType, Sci_PositionU start, Sci_PositionU end, const WordList &keywords, const WordClassifier &classifier, Accessor &styler, const std::string &tag) { 
        
           	int chAttr = SCE_H_ATTRIBUTEUNKNOWN; 
        
           	bool isLanguageType = false; 
        
           	if (IsNumberChar(styler[start])) { 
        
           		chAttr = SCE_H_NUMBER; 
        
           	} else { 
        
           		const std::string s = styler.GetRangeLowered(start, end+1); 
        
           		if (keywords.InList(s)) { 
        
           			chAttr = SCE_H_ATTRIBUTE; 
        
           		} else { 
        
           			int subStyle = classifier.ValueFor(s);

So, there need a way to not use keywords (word list 0) to classify attributes. The patch provided a method that's easy for app to migration than implement substyles.

nyamatongwe added the html Caused by the hypertext lexer label May 28, 2024

zufuliu changed the title ~~[HTML] Add new keywords for global attributes~~ [HTML] Add separator keywords for attributes May 29, 2024

nyamatongwe changed the title ~~[HTML] Add separator keywords for attributes~~ [HTML] Add separate keywords for attributes Jun 4, 2024

This comment was marked as duplicate.

Sign in to view

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[HTML] Add separate keywords for attributes #251

[HTML] Add separate keywords for attributes #251

zufuliu commented May 28, 2024

nyamatongwe commented May 28, 2024

zufuliu commented May 29, 2024

zufuliu commented Jun 4, 2024

nyamatongwe commented Jun 4, 2024

zufuliu commented Jun 5, 2024

This comment was marked as duplicate.

[HTML] Add separate keywords for attributes #251

[HTML] Add separate keywords for attributes #251

Comments

zufuliu commented May 28, 2024

nyamatongwe commented May 28, 2024

zufuliu commented May 29, 2024

zufuliu commented Jun 4, 2024

nyamatongwe commented Jun 4, 2024

zufuliu commented Jun 5, 2024

This comment was marked as duplicate.