Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HTML] Add separate keywords for attributes #251

Open
zufuliu opened this issue May 28, 2024 · 6 comments
Open

[HTML] Add separate keywords for attributes #251

zufuliu opened this issue May 28, 2024 · 6 comments
Labels
html Caused by the hypertext lexer

Comments

@zufuliu
Copy link
Contributor

zufuliu commented May 28, 2024

Created for https://sourceforge.net/p/scintilla/bugs/514/, copy my last post:

I think a simple fix is just split attribute list from tag list keywords (this is what Notepad4 did since 2018):

case 6:
	wordListN = &keywordsAttr;
	break;

Also, those keywords<n> fields needs better describable names.

Application wants better support (handle attributes for each tag) can opt-in new sub-style classifiers added in Lexilla 5.3.2, and using keywordsAttr for global attributes (see https://html.spec.whatwg.org/multipage/dom.html#global-attributes).

@nyamatongwe nyamatongwe added the html Caused by the hypertext lexer label May 28, 2024
@nyamatongwe
Copy link
Member

split attribute list from tag list

That change would not be completely compatible leading to unhighlighted attributes.

@zufuliu zufuliu changed the title [HTML] Add new keywords for global attributes [HTML] Add separator keywords for attributes May 29, 2024
@zufuliu
Copy link
Contributor Author

zufuliu commented May 29, 2024

I think it's a compatible change: if app set keywordsAttr (word list 6) to non-empty, we know it want to using separator tag list and attribute list, and then using it instead of word list 0 (tag list) for classifyAttribHTML(). A more compatible change would let app to set which word list is attribute list instead of hardcoded 6. Lightweight app can put all attributes and event handlers into the word list similar to Notepad4 did, e.g. SciTE would need to update:

keywordclass.hypertext=\
$(hypertext.elements) $(hypertext.attributes) $(html5.elements) $(html5.attributes) public !doctype

# attributes
keywords7.$(file.patterns.xxx)=

@zufuliu
Copy link
Contributor Author

zufuliu commented Jun 4, 2024

Here are two patches to fix this:
html-0604-1.patch

diff --git a/lexers/LexHTML.cxx b/lexers/LexHTML.cxx
index 0bff582d..6191e99e 100644
--- a/lexers/LexHTML.cxx
+++ b/lexers/LexHTML.cxx
@@ -1006,6 +1006,8 @@ class LexerHTML : public DefaultLexer {
 	WordList keywords4;
 	WordList keywords5;
 	WordList keywords6; // SGML (DTD) keywords
+	WordList keywords7;
+	const WordList *attributeList;
 	OptionsHTML options;
 	OptionSetHTML osHTML;
 	std::set<std::string> nonFoldingTags;
@@ -1019,6 +1021,7 @@ public:
 			isXml_ ?  std::size(lexicalClassesXML) : std::size(lexicalClassesHTML)),
 		isXml(isXml_),
 		isPHPScript(isPHPScript_),
+		attributeList(&keywords),
 		osHTML(isPHPScript_),
 		nonFoldingTags(std::begin(tagsThatDoNotFold), std::end(tagsThatDoNotFold)) {
 	}
@@ -1115,6 +1118,10 @@ Sci_Position SCI_METHOD LexerHTML::WordListSet(int n, const char *wl) {
 	case 5:
 		wordListN = &keywords6;
 		break;
+	case 6:
+		wordListN = &keywords7;
+		attributeList = wl[0] ? wordListN : &keywords;
+		break;
 	default:
 		break;
 	}
@@ -2022,7 +2029,7 @@ void SCI_METHOD LexerHTML::Lex(Sci_PositionU startPos, Sci_Position length, int
 			break;
 		case SCE_H_ATTRIBUTE:
 			if (!setAttributeContinue.Contains(ch)) {
-				isLanguageType = classifyAttribHTML(inScriptType, styler.GetStartSegment(), i - 1, keywords, classifierAttributes, styler, lastTag);
+				isLanguageType = classifyAttribHTML(inScriptType, styler.GetStartSegment(), i - 1, *attributeList, classifierAttributes, styler, lastTag);
 				if (ch == '>') {
 					styler.ColourTo(i, SCE_H_TAG);
 					if (inScriptType == eNonHtmlScript) {
diff --git a/test/examples/hypertext/SciTE.properties b/test/examples/hypertext/SciTE.properties
index e3aa3c28..eb53d607 100644
--- a/test/examples/hypertext/SciTE.properties
+++ b/test/examples/hypertext/SciTE.properties
@@ -1,7 +1,6 @@
 lexer.*=hypertext
-# Tags and attributes
-keywords.*=b br body content encoding head href html img language li link meta \
-name p rel runat script src strong title type ul version xml xmlns
+# Tags
+keywords.*=b br body head html img li link meta p script strong title ul xml
 # JavaScript
 keywords2.*=function var
 # Basic
@@ -12,6 +11,8 @@ keywords4.*=import pass
 keywords5.*=echo __file__ __line__
 # SGML
 keywords6.*=ELEMENT
+# Attributes
+keywords7.*=content encoding href language name rel runat src type version xmlns
 
 # Tag
 substyles.hypertext.1=1

html-0604-2.patch

diff --git a/lexers/LexHTML.cxx b/lexers/LexHTML.cxx
index 0bff582d..bbb5fbbc 100644
--- a/lexers/LexHTML.cxx
+++ b/lexers/LexHTML.cxx
@@ -1006,6 +1006,8 @@ class LexerHTML : public DefaultLexer {
 	WordList keywords4;
 	WordList keywords5;
 	WordList keywords6; // SGML (DTD) keywords
+	WordList keywords7;
+	const WordList *attributeList;
 	OptionsHTML options;
 	OptionSetHTML osHTML;
 	std::set<std::string> nonFoldingTags;
@@ -1019,6 +1021,7 @@ public:
 			isXml_ ?  std::size(lexicalClassesXML) : std::size(lexicalClassesHTML)),
 		isXml(isXml_),
 		isPHPScript(isPHPScript_),
+		attributeList(&keywords),
 		osHTML(isPHPScript_),
 		nonFoldingTags(std::begin(tagsThatDoNotFold), std::end(tagsThatDoNotFold)) {
 	}
@@ -1115,6 +1118,9 @@ Sci_Position SCI_METHOD LexerHTML::WordListSet(int n, const char *wl) {
 	case 5:
 		wordListN = &keywords6;
 		break;
+	case 6:
+		wordListN = &keywords7;
+		break;
 	default:
 		break;
 	}
@@ -1122,6 +1128,9 @@ Sci_Position SCI_METHOD LexerHTML::WordListSet(int n, const char *wl) {
 	if (wordListN) {
 		if (wordListN->Set(wl)) {
 			firstModification = 0;
+			if (n == 6) {
+				attributeList = wordListN->Length() ? wordListN : &keywords;
+			}
 		}
 	}
 	return firstModification;
@@ -2022,7 +2031,7 @@ void SCI_METHOD LexerHTML::Lex(Sci_PositionU startPos, Sci_Position length, int
 			break;
 		case SCE_H_ATTRIBUTE:
 			if (!setAttributeContinue.Contains(ch)) {
-				isLanguageType = classifyAttribHTML(inScriptType, styler.GetStartSegment(), i - 1, keywords, classifierAttributes, styler, lastTag);
+				isLanguageType = classifyAttribHTML(inScriptType, styler.GetStartSegment(), i - 1, *attributeList, classifierAttributes, styler, lastTag);
 				if (ch == '>') {
 					styler.ColourTo(i, SCE_H_TAG);
 					if (inScriptType == eNonHtmlScript) {
diff --git a/test/examples/hypertext/SciTE.properties b/test/examples/hypertext/SciTE.properties
index e3aa3c28..eb53d607 100644
--- a/test/examples/hypertext/SciTE.properties
+++ b/test/examples/hypertext/SciTE.properties
@@ -1,7 +1,6 @@
 lexer.*=hypertext
-# Tags and attributes
-keywords.*=b br body content encoding head href html img language li link meta \
-name p rel runat script src strong title type ul version xml xmlns
+# Tags
+keywords.*=b br body head html img li link meta p script strong title ul xml
 # JavaScript
 keywords2.*=function var
 # Basic
@@ -12,6 +11,8 @@ keywords4.*=import pass
 keywords5.*=echo __file__ __line__
 # SGML
 keywords6.*=ELEMENT
+# Attributes
+keywords7.*=content encoding href language name rel runat src type version xmlns
 
 # Tag
 substyles.hypertext.1=1

html-0604-1.patch set attribute list when parameter is not an empty string (decouple from tag list without set anything):

attributeList = wl[0] ? wordListN : &keywords;

html-0604-2.patch set attribute list only when parameter contains word:

if (n == 6) {
	attributeList = wordListN->Length() ? wordListN : &keywords;
}

@nyamatongwe
Copy link
Member

This appears to duplicate behaviour available from substyles added with 1c6e3e5 so isn't necessary.

@nyamatongwe nyamatongwe changed the title [HTML] Add separator keywords for attributes [HTML] Add separate keywords for attributes Jun 4, 2024
@zufuliu
Copy link
Contributor Author

zufuliu commented Jun 5, 2024

I don't think it's duplicate. Even with substyles, body is still classified as SCE_H_ATTRIBUTE:

lexilla/lexers/LexHTML.cxx

Lines 259 to 269 in 85d1d67

bool classifyAttribHTML(script_mode inScriptType, Sci_PositionU start, Sci_PositionU end, const WordList &keywords, const WordClassifier &classifier, Accessor &styler, const std::string &tag) {
int chAttr = SCE_H_ATTRIBUTEUNKNOWN;
bool isLanguageType = false;
if (IsNumberChar(styler[start])) {
chAttr = SCE_H_NUMBER;
} else {
const std::string s = styler.GetRangeLowered(start, end+1);
if (keywords.InList(s)) {
chAttr = SCE_H_ATTRIBUTE;
} else {
int subStyle = classifier.ValueFor(s);

So, there need a way to not use keywords (word list 0) to classify attributes. The patch provided a method that's easy for app to migration than implement substyles.

1 similar comment
@zufuliu

This comment was marked as duplicate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
html Caused by the hypertext lexer
Projects
None yet
Development

No branches or pull requests

2 participants