-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: better structured headings #134
Conversation
@justinmk that should simplify some header handling in |
seq( | ||
token.immediate(field('delimiter', /============+[\t ]*\n/)), | ||
repeat1($._atom), | ||
prec(1, seq( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe TS considers declaration order as part of precedence. So possibly we could avoid prec(1,...)
by declaring these headings before block
. But doesn't need to block this for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could make it tighter, but some precedence is needed to resolve the conflict between heading and possible taglinks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe TS considers declaration order as part of precedence.
Only for terminals, e.g. string literals and regex patterns, and strings are higher than regex patterns by default
alias(repeat1($._atom), $.heading), | ||
optional(seq($.tag, repeat($._atom))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
everything before the first tag is heading
(edit: oh I see this is consistent with h3
)? should we name it text
and use that as the pseudo-convention for exposing the "text content" of complex captures? I was experimenting with this but only used it for "fields" (see e.g. url
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends. If we use a common node (not name!) for text
nodes (that aggregate word
s?), then we need a field again to distinguish them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yes, I tried to be consistent across headings (even column headings, which as usual is a massive headache).
test/corpus/arguments.txt
Outdated
(word) | ||
(word) | ||
(separator) | ||
(heading | ||
(word) | ||
(word)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice improvement. though i think the separator previously was just not captured, is it useful to capture it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was not exposed, correct. I think it's useful if people want to give it a different highlight group and -- especially -- conceal (replace with proper line-drawing character).
@@ -135,14 +140,14 @@ module.exports = grammar({ | |||
'>', | |||
choice( | |||
alias(token.immediate(/[a-z0-9]+\n/), $.language), | |||
token.immediate('\n')), | |||
token.immediate(/\n/)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JS regex are for parsing purposes; only use literals if you want to expose them as anonymous nodes to query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regex has lower priority than literals (oh but that's "only for terminals")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not about priority; it's what gets exposed to queries. "Hiding" stuff from queries is the primary way of keeping parser size down (and performance up).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Hiding" stuff from queries is the primary way of keeping parser size down (and performance up).
Nice. Would be useful to add that tip here:
Lines 2 to 5 in ce5ea84
// - Match Specificity: Tree-sitter will prefer a token that is specified in | |
// the grammar as a String instead of a RegExp. | |
// - Rule Order: Tree-sitter will prefer the token that appears earlier in the | |
// grammar. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regex has lower priority than literals (oh but that's "only for terminals")
Oh, I see what you mean here. But here it's fine since the token.immediate
takes care of it. (I tested it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nicely done
478eca0
to
16197b6
Compare
Problems: Header titles cannot be extracted easily (e.g., for generating a table of contents). Solution: Expose nodes for separators (`separator`) and actual heading text (`heading`).
Co-authored-by: Justin M. Keyes <[email protected]>
I'll let this cook on nvim-treesitter a bit, then I'll make a release and update the bundled parser in Neovim. |
Is it expected that this didn't change any of the h1/h2 test cases?
|
Problem: vimdoc grammar added new forms that are not handled in our HTML generator. neovim/tree-sitter-vimdoc#134 Solution: Update `gen_help_html.lua`. Fixes neovim#29277
Problem: vimdoc grammar added new forms that are not handled in our HTML generator. neovim/tree-sitter-vimdoc#134 Solution: Update `gen_help_html.lua`. Fixes neovim#29277
Problem: vimdoc grammar added new forms that are not handled in our HTML generator. neovim/tree-sitter-vimdoc#134 Solution: Update `gen_help_html.lua`. Fixes #29277
Problems: Header titles cannot be extracted easily (e.g., for generating a table of contents).
Solution: Expose nodes for separators (
separator
) and actual heading text (heading
).Closes #133