Allow using alternative markdown renderers #455

huonw · 2020-05-06T02:58:25Z

Shelling out to pandoc twice for each markdown cell can be slow. Using a single-process/in-memory renderer for any markdown can be noticably faster.

This adds support for using commonmark-py via `nbsphinx_markdown_renderer = "commonmark".

For instance, builds of https://github.com/stellargraph/stellargraph's docs go from ~2 minutes to 40 seconds (stellargraph/stellargraph#1517).

This is a draft PR because there's several open questions, that would be good to collaborate on/clarify before I spend more effort on this:

is this a useful feature at all?
is there an alternative (or additional) Python library we can use? Preferably one that covers more common extensions like tables (Support Tables readthedocs/commonmark.py#28) and equations (Math equations readthedocs/commonmark.py#49). I briefly investigated:
- mistune: the rST support and table extensions seem to only be available in the 2.0.0 alphas, but this cannot be installed as nbconvert depends on version 0.8.4
- Python-Markdown: has a lot of extensions, but seems to only support targeting HTML
- m2r: not maintained
- m2rr: maybe not popular enough?
- recommonmark: unclear to me how to use this for this particular case
is it worth generalising this even more? e.g. this could allow nbsphinx_markdown_renderer to be set to any subclass of nbsphinx.MarkdownRenderer (in addition to a str name) to let users customise

(NB. the diff here is much smaller ignoring whitespace, to skip over the changed indentation of the markdown2rst: https://github.com/spatialaudio/nbsphinx/pull/455/files?w=1)

Shelling out to pandoc twice for each markdown cell can be very slow. Using a single-process/in-memory renderer for any markdown can be noticably faster. For instance, builds of https://github.com/stellargraph/stellargraph's docs go from ~2 minutes to 40 seconds.

pep8speaks · 2020-05-06T02:58:30Z

Hello @huonw! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file src/nbsphinx.py:

Line 752:80: E501 line too long (102 > 79 characters)
Line 1307:80: E501 line too long (170 > 79 characters)
Line 1314:80: E501 line too long (80 > 79 characters)
Line 1322:80: E501 line too long (81 > 79 characters)
Line 1323:80: E501 line too long (84 > 79 characters)
Line 1363:80: E501 line too long (82 > 79 characters)
Line 1414:1: E305 expected 2 blank lines after class or function definition, found 1
Line 1414:80: E501 line too long (97 > 79 characters)

chrisjsewell · 2020-05-06T03:19:04Z

I would point out: https://github.com/executablebooks/markdown-it-py, which has tables, among other extensions, and is already used in https://myst-parser.readthedocs.io, https://myst-nb.readthedocs.io/en/latest/ and https://jupyterbook.org 😁

mgeier · 2020-05-06T14:32:33Z

is this a useful feature at all?

Yes, definitely!
But I'd like to do it a bit differently.

Instead of converting from Markdown to reST and then from reST to a docutils "doctree" object, I would like to do the conversion in one step, see #36.

is there an alternative (or additional) Python library we can use? Preferably one that covers more common extensions like tables (readthedocs/commonmark.py#28) and equations (readthedocs/commonmark.py#49).

Ideally this should be a generic extension point, so we wouldn't have to support a limited set of parsers.

is it worth generalising this even more? e.g. this could allow nbsphinx_markdown_renderer to be set to any subclass of nbsphinx.MarkdownRenderer (in addition to a str name) to let users customise

Yes, I think it should have at least that level of generality.

I'm not sure a class is the best API, though. I think a function should suffice.

I think that's what nbsphinx should do in the future:

create an empty "doctree"
iterate through all cells of the notebook
- if it's a code cell: add the necessary nodes to the doctree
- if it's a Markdown cell: call the configurable Markdown parser, which adds the necessary nodes to the doctree

This means, instead of Markdown "renderers" which generate reST code, we would need Markdown "parsers" that can append to a doctree.

A very important feature of those parsers would be to support piecemeal parsing, while not destroying the section hierarchy.

For example, one Markdown cell might "open" a section of a certain level.
If a code cell follows, this code cell would have to be added to the same section level.
And if another Markdown cell follows after that which has only text (and no section headers), this text should also be added to the right section level.

I don't know if the existing Markdown/CommonMark parsers with "doctree" output do support this feature.

@chrisjsewell Can MyST-Parser parse partial Markdown documents?

I think the most promising option for a parser would be something between markdown-it-py and MyST-Parser. Something that supports parsing into docutils doctrees but doesn't have the MyST extensions enabled by default.

huonw · 2020-05-10T23:17:24Z

Instead of converting from Markdown to reST and then from reST to a docutils "doctree" object, I would like to do the conversion in one step, see #36.

That issue has been open for a while, has there been more progress other than the comments on it? If not, is it worth pursuing this PR as an intermediate step?

One concern that I guess you might have is migration: at the moment I guess switching to docutils directly would be unobservable to a user (except for differences to Pandoc's rendering), whereas with this change (with the arbitrary-function version) it would be observable to users who have a custom def nbsphinx_markdown_renderer(markdown): ... function?

Ideally this should be a generic extension point, so we wouldn't have to support a limited set of parsers.

If this is generalised, would you want built-in support for any parsers other than Pandoc? At least with the current MD -> reST approach, the function required for other libraries is tiny (e.g. the core of the CommonMark one is two lines, and mistune would be similar).

I would point out: https://github.com/executablebooks/markdown-it-py, which has tables, among other extensions, and is already used in https://myst-parser.readthedocs.io, https://myst-nb.readthedocs.io/en/latest/ and https://jupyterbook.org 😁

@chrisjsewell I think this doesn't work with the current setup of nbsphinx, for the same reason as Python-Markdown (and mistune < 2): no reStructuredText rendering?

mgeier · 2020-05-11T18:34:50Z

Instead of converting from Markdown to reST and then from reST to a docutils "doctree" object, I would like to do the conversion in one step, see #36.

That issue has been open for a while, has there been more progress other than the comments on it?

Not that I know of, except the things already discussed in this PR.

We would need an appropriate Markdown parser before #36 can be tackled.

The MyST/markdown-it-py project looks promising, but I think it doesn't have all the needed features (yet?).

If not, is it worth pursuing this PR as an intermediate step?

No, not when it exposes an implementation detail like the intermediate reST representation.

But if it just "invisibly" switches the Markdown parser, I'm all for it!

One concern that I guess you might have is migration: at the moment I guess switching to docutils directly would be unobservable to a user (except for differences to Pandoc's rendering), whereas with this change (with the arbitrary-function version) it would be observable to users who have a custom def nbsphinx_markdown_renderer(markdown): ... function?

Exactly.

Ideally this should be a generic extension point, so we wouldn't have to support a limited set of parsers.

If this is generalised, would you want built-in support for any parsers other than Pandoc?

I guess there should be some parser selected by default. I don't care which one, as long as it has the right features.
And ideally, Pandoc wouldn't be involved at all in the default workflow.

At least with the current MD -> reST approach, the function required for other libraries is tiny (e.g. the core of the CommonMark one is two lines, and mistune would be similar).

You would still have to take care of the Markdown extensions used in Jupyter.

Do you already have a feature-complete parser to replace Pandoc?

I have no problem with switching from Pandoc to another parser, as long as we don't expose the intermediate reST representation.

huonw mentioned this pull request May 6, 2020

Try using new nbsphinx_markdown_renderer option for doc builds stellargraph/stellargraph#1517

Closed

mgeier mentioned this pull request Aug 23, 2020

DOC: Avoid nested markup #485

Merged

mgeier mentioned this pull request Oct 17, 2020

Custom markdown cell rendering #502

Closed

mgeier mentioned this pull request Apr 25, 2021

Force loading MathJax on HTML pages generated from notebooks #551

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow using alternative markdown renderers #455

Allow using alternative markdown renderers #455

huonw commented May 6, 2020 •

edited

Loading

pep8speaks commented May 6, 2020

chrisjsewell commented May 6, 2020 •

edited

Loading

mgeier commented May 6, 2020

huonw commented May 10, 2020

mgeier commented May 11, 2020

Allow using alternative markdown renderers #455

Are you sure you want to change the base?

Allow using alternative markdown renderers #455

Conversation

huonw commented May 6, 2020 • edited Loading

pep8speaks commented May 6, 2020

chrisjsewell commented May 6, 2020 • edited Loading

mgeier commented May 6, 2020

huonw commented May 10, 2020

mgeier commented May 11, 2020

huonw commented May 6, 2020 •

edited

Loading

chrisjsewell commented May 6, 2020 •

edited

Loading