feat: better structured headings #134

clason · 2024-06-01T14:54:55Z

Problem: Header titles cannot be extracted easily (e.g., for generating a table of contents).

Solution: Expose nodes for separators (`separator`) and actual heading text (`heading`).

Closes #133

clason · 2024-06-01T15:06:43Z

@justinmk that should simplify some header handling in gen_help_html quite a bit (I hope).

justinmk · 2024-06-01T15:41:39Z

grammar.js

-      seq(
-        token.immediate(field('delimiter', /============+[\t ]*\n/)),
-        repeat1($._atom),
+      prec(1, seq(


I believe TS considers declaration order as part of precedence. So possibly we could avoid prec(1, ...) by declaring these headings before block. But that doesn't need to block this PR for now.

I could make it tighter, but some precedence is needed to resolve the conflict between heading and possible taglinks.

I believe TS considers declaration order as part of precedence.

Only for terminals, e.g. string literals and regex patterns; strings take precedence over regex patterns by default.

justinmk · 2024-06-01T15:46:37Z

grammar.js

+        alias(repeat1($._atom), $.heading),
+        optional(seq($.tag, repeat($._atom))),


Everything before the first tag is heading (edit: oh, I see this is consistent with h3)? Should we name it text and use that as the pseudo-convention for exposing the "text content" of complex captures? I was experimenting with this but only used it for "fields" (see e.g. url).

Depends. If we use a common node (not name!) for text nodes (that aggregate words?), then we need a field again to distinguish them.

And yes, I tried to be consistent across headings (even column headings, which as usual is a massive headache).
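To make the "everything before the first tag is heading" reading concrete, here is a minimal sketch in plain JavaScript. It is not the grammar and does not use the parser's API: the `{ type, text }` token shape and the `splitHeadingLine` helper are hypothetical, and the real grammar also relies on precedence to decide what counts as part of the heading.

```javascript
// Rough model of the rule shape discussed above: atoms up to (not including)
// the first tag form the heading; the tag and anything after it are the rest.
// The { type, text } token shape is hypothetical, not the parser's node API.
function splitHeadingLine(atoms) {
  const firstTag = atoms.findIndex((atom) => atom.type === 'tag');
  const end = firstTag === -1 ? atoms.length : firstTag;
  return { heading: atoms.slice(0, end), rest: atoms.slice(end) };
}

const line = [
  { type: 'word', text: 'Quick' },
  { type: 'word', text: 'start' },
  { type: 'tag', text: '*quickstart*' },
];
const { heading, rest } = splitHeadingLine(line);

// Join only the heading atoms; the tag stays out of the title.
const title = heading.map((atom) => atom.text).join(' ');
console.log(title, rest.length);
```

A consumer that only wants the title text can then ignore `rest` entirely, which is the point of exposing a dedicated `heading` node.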

justinmk · 2024-06-01T15:49:38Z

test/corpus/arguments.txt

-        (word)
-        (word)
+        (separator)
+        (heading
+          (word)
+          (word))


Nice improvement. Though I think the separator previously was just not captured; is it useful to capture it?

It was not exposed, correct. I think it's useful if people want to give it a different highlight group and -- especially -- conceal (replace with proper line-drawing character).
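As a rough illustration of why the new tree shape helps with a table of contents, the sketch below walks a plain-object stand-in for the parsed tree and collects heading titles. Only the `separator`, `heading`, and `word` node names come from the corpus output above; the object shape, the `help_file` and `h1` parents, and the `headingTitles` helper are assumptions for illustration, and real Tree-sitter nodes expose a different API.

```javascript
// Hypothetical plain-object stand-in for a parsed tree; real Tree-sitter
// nodes expose a different API. Only `separator`, `heading`, and `word`
// come from the corpus output above; the parent types are illustrative.
const tree = {
  type: 'help_file',
  children: [
    {
      type: 'h1',
      children: [
        { type: 'separator', text: '============' },
        {
          type: 'heading',
          children: [
            { type: 'word', text: 'Getting' },
            { type: 'word', text: 'started' },
          ],
        },
      ],
    },
  ],
};

// Collect the text of every `heading` node, depth first.
function headingTitles(node, titles = []) {
  if (node.type === 'heading') {
    titles.push((node.children || []).map((child) => child.text).join(' '));
    return titles;
  }
  for (const child of node.children || []) {
    headingTitles(child, titles);
  }
  return titles;
}

console.log(headingTitles(tree));
```

Because the separator is its own node, the walk never has to strip the `====` line from the title; it simply skips anything that is not a `heading`.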

clason · 2024-06-01T16:22:14Z

grammar.js

@@ -135,14 +140,14 @@ module.exports = grammar({
      '>',
      choice(
        alias(token.immediate(/[a-z0-9]+\n/), $.language),
-        token.immediate('\n')),
+        token.immediate(/\n/)),


JS regexes are for parsing purposes; only use literals if you want to expose them as anonymous nodes to query.

regex has lower priority than literals (oh but that's "only for terminals")

It's not about priority; it's what gets exposed to queries. "Hiding" stuff from queries is the primary way of keeping parser size down (and performance up).

"Hiding" stuff from queries is the primary way of keeping parser size down (and performance up).

Nice. Would be useful to add that tip here:

tree-sitter-vimdoc/grammar.js

Lines 2 to 5 in ce5ea84

// - Match Specificity: Tree-sitter will prefer a token that is specified in
// the grammar as a String instead of a RegExp.
// - Rule Order: Tree-sitter will prefer the token that appears earlier in the
// grammar.

Tips at home: https://github.com/nvim-treesitter/nvim-treesitter/wiki/Parser-Development

regex has lower priority than literals (oh but that's "only for terminals")

Oh, I see what you mean here. But here it's fine since the token.immediate takes care of it. (I tested it.)

justinmk

nicely done

clason · 2024-06-04T13:40:00Z

I'll let this cook on nvim-treesitter a bit, then I'll make a release and update the bundled parser in Neovim.

justinmk · 2024-06-19T14:28:30Z

Is it expected that this didn't change any of the h1/h2 test cases?

tree-sitter-vimdoc/test/corpus/heading1_2.txt

Line 2 in 2249c44

h1 h2 heading

Problem: vimdoc grammar added new forms that are not handled in our HTML generator. neovim/tree-sitter-vimdoc#134 Solution: Update `gen_help_html.lua`. Fixes neovim#29277

clason requested a review from justinmk June 1, 2024 14:55

justinmk reviewed Jun 1, 2024

grammar.js (outdated, resolved)

justinmk reviewed Jun 1, 2024

test/corpus/arguments.txt (resolved)

justinmk reviewed Jun 1, 2024

clason commented Jun 1, 2024

clason requested a review from justinmk June 1, 2024 16:54

justinmk approved these changes Jun 3, 2024

clason force-pushed the feat/headings branch 3 times, most recently from 478eca0 to 16197b6 June 3, 2024 15:50

feat: better structured headings

43e57d9

Problems: Header titles cannot be extracted easily (e.g., for generating a table of contents). Solution: Expose nodes for separators (`separator`) and actual heading text (`heading`).

clason force-pushed the feat/headings branch from 16197b6 to 43e57d9 June 3, 2024 15:55

justinmk reviewed Jun 4, 2024

grammar.js (outdated, resolved)

justinmk approved these changes Jun 4, 2024

Update grammar.js

7d85029

Co-authored-by: Justin M. Keyes <[email protected]>

clason merged commit 1b177bd into master Jun 4, 2024
3 checks passed

clason deleted the feat/headings branch June 4, 2024 13:39

justinmk mentioned this pull request Jun 19, 2024

fix(gen_help_html): handle delimiter, heading neovim/neovim#29415

Merged

OXY2DEV mentioned this pull request Aug 18, 2024

Vimdoc parser without PR#134 causes heading=nil OXY2DEV/helpview.nvim#4

Closed

This was referenced Oct 20, 2024

ci: update main workflow #139

Closed

fix(tests): adapt expected to heading changes #140

Merged
