rewrite grammar, remove scanner
Problem:
Hand-written C scanner is hard to maintain, slow, and hangs on files
like `filetype.txt` and `usr_24.txt`.

Solution:
Delete hand-written C scanner, define grammar fully in `grammar.js`

- introduce `url`
- introduce `block`, a group of lines. (does not support nesting yet)
- introduce `line_li` for listitems. (does not support nesting yet)
- keycodes #1
- `[range]` #1

fix #1
fix #7
fix #9
fix #10
fix #11
fix #14
fix #12 (except nested)
fix #13 (except nested)
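
Illustration only (not part of the grammar): a minimal Python sketch of what a `block` groups, assuming blocks are runs of non-blank lines separated by blank lines, as the README's note on `block_end` suggests. The function name `group_blocks` is hypothetical; the real rule lives in `grammar.js`.

```python
def group_blocks(text):
    """Group lines into blocks: runs of non-blank lines split by blank lines."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            # Non-blank line: it belongs to the block being built.
            current.append(line)
        elif current:
            # Blank line: close the current block, if there is one.
            blocks.append(current)
            current = []
    if current:
        # Last block with no trailing blank line (the `block_end` case).
        blocks.append(current)
    return blocks
```

Nesting is not modelled here either, matching the "does not support nesting yet" caveat above.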
justinmk committed Sep 26, 2022
1 parent d1900d9 commit dcc85f1
Showing 14 changed files with 994 additions and 376 deletions.
45 changes: 45 additions & 0 deletions README.md
@@ -0,0 +1,45 @@
tree-sitter-vimdoc
==================

This grammar intentionally supports a subset of the vimdoc "spec"; predictable
results are the primary goal, so that _output_ formats (e.g. HTML) are
well-formed; the _input_ (vimdoc) is secondary. The first step should always be
to try to fix the input (within reason) rather than insist on a grammar that
handles vimdoc's endless quirks.

Notes
-----

- vimdoc format "spec":
  - [:help help-writing](https://neovim.io/doc/user/helphelp.html#help-writing)
  - https://github.com/nanotee/vimdoc-notes
- `(code_block)` is contained by `(line)` because `>` can start a code block at the end of a line.

Known issues
------------

- `line` in a `code_block` does not contain `word` atoms, it's just the full
raw text line including whitespace. This is somewhat dictated by its
"preformatted" nature; parsing the contents implies loading a "child"
language (injection). See [#2](https://github.com/vigoux/tree-sitter-vimdoc/issues/2).
- `url` doesn't handle _surrounding_ parens. E.g. `(https://example.com/#yay)` yields `word`
- `url` doesn't handle _nested_ parens. E.g. `(https://example.com/(foo)#yay)`
- Ideally `block_end` should consume the last block of the document _only_ if that
  block is missing a trailing blank line or EOL ("\n").
  - TODO: consider simply _not supporting_ docs without EOL?
- Ideally `line_noeol` should consume the last line of the document _only_ if
  that line is missing EOL ("\n").
  - TODO: consider simply _not supporting_ docs without EOL?
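
The surrounding-parens limitation of `url` can be pictured with a small Python sketch. It is an illustration under assumptions, not the grammar's actual rule: the pattern `URL_TOKEN` and both helper functions are hypothetical, and the sketch assumes a token counts as a URL only when the whole token starts with a scheme.

```python
import re

# Hypothetical pattern: the whole token must begin with an http(s) scheme.
URL_TOKEN = re.compile(r"https?://\S+")

def classify(token):
    """Return "url" if the entire token looks like a URL, else "word"."""
    return "url" if URL_TOKEN.fullmatch(token) else "word"

def classify_stripping_parens(token):
    """Same check, after removing one pair of surrounding parentheses."""
    if token.startswith("(") and token.endswith(")"):
        return classify(token[1:-1])
    return classify(token)
```

Under these assumptions `classify("(https://example.com/#yay)")` gives `"word"`, because the leading `(` prevents the scheme from matching at the start of the token. In line with the goal stated at the top of this README, the simplest remedy is often to adjust the input, for example by putting whitespace between the parentheses and the URL.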

TODO
----

- `line_noeol` is a special-case to support documents that don't end in EOL.
Grammar could be a bit simpler if we just require EOL at end of document.
- `line_modeline` (only at EOF)
- `column_heading` should not allow hotlinks. This is sometimes used in old help files to show results of a code example, e.g. in `usr_41.txt`:
```
List concatenation is done with +: >
:echo alist + ['foo', 'bar']
< ['foo', 'bar', 'foo', 'bar'] ~
```
53 changes: 37 additions & 16 deletions corpus/arguments.txt
@@ -1,31 +1,52 @@
================================================================================
Simple argument
simple argument
================================================================================
This in an argument: {arg}
--------------------------------------------------------------------------------

(help_file
(line
(word)
(word)
(word)
(word)
(argument
(word))))
(block
(line
(word)
(word)
(word)
(word)
(argument
(word)))))

================================================================================
Multiple arguments on the same line
multiple arguments on the same line
================================================================================

{foo} {bar} {baz}

--------------------------------------------------------------------------------

(help_file
(line
(argument
(word))
(argument
(word))
(argument
(block
(line
(argument
(word))
(argument
(word))
(argument
(word)))))

================================================================================
NOT an argument
================================================================================
{foo "{bar}" `{baz}` |{baz| }

--------------------------------------------------------------------------------

(help_file
(block
(line
(argument
(word)
(ERROR))
(word)
(backtick
(word))
(hotlink
(word))
(word))))
93 changes: 70 additions & 23 deletions corpus/backtick.txt
Original file line number Diff line number Diff line change
@@ -1,45 +1,92 @@
================================================================================
Simple backtick
simple backtick
================================================================================

`foobar`
a `foobar` b `:echo`

--------------------------------------------------------------------------------

(help_file
(line
(backtick
(word))))
(block
(line
(word)
(backtick
(word))
(word)
(backtick
(word)))))

================================================================================
Backtick in text
backtick in text
================================================================================

Hello `world`, I am a markup language
Hello `world`, I am `markup language`. But `this is
an error`.

--------------------------------------------------------------------------------

(help_file
(line
(word)
(backtick
(word))
(word)
(word)
(word)
(word)
(word)
(word)))
(block
(line
(word)
(backtick
(word))
(word)
(word)
(word)
(backtick
(word))
(word)
(word)
(backtick
(word)
(MISSING "`")))
(line
(word)
(word))))

================================================================================
Backtick with command inside
NOT a codespan / backtick
================================================================================

`:echo`
*'* *'a* *`* *`a*
'{a-z} `{a-z} Jump to the mark.
*g'* *g'a* *g`* *g`a*
g'{mark} g`{mark}

--------------------------------------------------------------------------------

(help_file
(line
(backtick
(word))))
(block
(line
(tag
(word))
(tag
(word))
(tag
(word))
(tag
(word)))
(ERROR)
(line
(argument
(word))
(word)
(word)
(word)
(word))
(line
(tag
(word))
(tag
(word))
(tag
(word))
(tag
(word)))
(line
(word)
(argument
(word))
(word)
(argument
(word)))))