Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: better structured headings #134

Merged
merged 2 commits into from
Jun 4, 2024
Merged

feat: better structured headings #134

merged 2 commits into from
Jun 4, 2024

Conversation

clason
Copy link
Member

@clason clason commented Jun 1, 2024

Problems: Header titles cannot be extracted easily (e.g., for generating a table of contents).

Solution: Expose nodes for separators (separator) and actual heading text (heading).

Closes #133

@clason clason requested a review from justinmk June 1, 2024 14:55
@clason
Copy link
Member Author

clason commented Jun 1, 2024

@justinmk that should simplify some header handling in gen_help_html quite a bit (I hope).

seq(
token.immediate(field('delimiter', /============+[\t ]*\n/)),
repeat1($._atom),
prec(1, seq(
Copy link
Member

@justinmk justinmk Jun 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe TS considers declaration order as part of precedence. So possibly we could avoid prec(1,...) by declaring these headings before block. But doesn't need to block this for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could make it tighter, but some precedence is needed to resolve the conflict between heading and possible taglinks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe TS considers declaration order as part of precedence.

Only for terminals, e.g. string literals and regex patterns, and strings are higher than regex patterns by default

grammar.js Outdated Show resolved Hide resolved
Comment on lines +188 to +201
alias(repeat1($._atom), $.heading),
optional(seq($.tag, repeat($._atom))),
Copy link
Member

@justinmk justinmk Jun 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything before the first tag is heading (edit: oh I see this is consistent with h3)? should we name it text and use that as the pseudo-convention for exposing the "text content" of complex captures? I was experimenting with this but only used it for "fields" (see e.g. url).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends. If we use a common node (not name!) for text nodes (that aggregate words?), then we need a field again to distinguish them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yes, I tried to be consistent across headings (even column headings, which as usual is a massive headache).

Comment on lines 208 to 211
(word)
(word)
(separator)
(heading
(word)
(word))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice improvement. though i think the separator previously was just not captured, is it useful to capture it ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was not exposed, correct. I think it's useful if people want to give it a different highlight group and -- especially -- conceal (replace with proper line-drawing character).

@@ -135,14 +140,14 @@ module.exports = grammar({
'>',
choice(
alias(token.immediate(/[a-z0-9]+\n/), $.language),
token.immediate('\n')),
token.immediate(/\n/)),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JS regex are for parsing purposes; only use literals if you want to expose them as anonymous nodes to query.

Copy link
Member

@justinmk justinmk Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regex has lower priority than literals (oh but that's "only for terminals")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not about priority; it's what gets exposed to queries. "Hiding" stuff from queries is the primary way of keeping parser size down (and performance up).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Hiding" stuff from queries is the primary way of keeping parser size down (and performance up).

Nice. Would be useful to add that tip here:

// - Match Specificity: Tree-sitter will prefer a token that is specified in
// the grammar as a String instead of a RegExp.
// - Rule Order: Tree-sitter will prefer the token that appears earlier in the
// grammar.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@clason clason Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regex has lower priority than literals (oh but that's "only for terminals")

Oh, I see what you mean here. But here it's fine since the token.immediate takes care of it. (I tested it.)

@clason clason requested a review from justinmk June 1, 2024 16:54
Copy link
Member

@justinmk justinmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nicely done

@clason clason force-pushed the feat/headings branch 3 times, most recently from 478eca0 to 16197b6 Compare June 3, 2024 15:50
Problems: Header titles cannot be extracted easily (e.g., for generating
a table of contents).

Solution: Expose nodes for separators (`separator`) and actual heading
text (`heading`).
grammar.js Outdated Show resolved Hide resolved
Co-authored-by: Justin M. Keyes <[email protected]>
@clason clason merged commit 1b177bd into master Jun 4, 2024
3 checks passed
@clason clason deleted the feat/headings branch June 4, 2024 13:39
@clason
Copy link
Member Author

clason commented Jun 4, 2024

I'll let this cook on nvim-treesitter a bit, then I'll make a release and update the bundled parser in Neovim.

@justinmk
Copy link
Member

Is it expected that this didn't change any of the h1/h2 test cases?

justinmk added a commit to justinmk/neovim that referenced this pull request Jun 19, 2024
Problem:
vimdoc grammar added new forms that are not handled in our HTML
generator. neovim/tree-sitter-vimdoc#134

Solution:
Update `gen_help_html.lua`.

Fixes neovim#29277
justinmk added a commit to justinmk/neovim that referenced this pull request Jun 19, 2024
Problem:
vimdoc grammar added new forms that are not handled in our HTML
generator. neovim/tree-sitter-vimdoc#134

Solution:
Update `gen_help_html.lua`.

Fixes neovim#29277
justinmk added a commit to neovim/neovim that referenced this pull request Jun 19, 2024
Problem:
vimdoc grammar added new forms that are not handled in our HTML
generator. neovim/tree-sitter-vimdoc#134

Solution:
Update `gen_help_html.lua`.

Fixes #29277
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

More structured headings
3 participants