-
Notifications
You must be signed in to change notification settings - Fork 198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace client-side highlight.js by server-side syntect highlighting #1842
Conversation
1880e7c
to
c1dcb38
Compare
Screenshots (original on left, updated on right): TOML and Rust rendered as part of a readme, https://docs.rs/crate/bpaf/0.6.1: The same section of readme in markdown source, https://docs.rs/crate/bpaf/0.6.1/source/README.md : Unsupported filetype, https://docs.rs/crate/bpaf/0.6.1/source/.gitignore : |
Wouldn't it be better to store the rendered content directly? |
I couldn't find a better alternative than submodules for the assets. |
I'm not sure if you're answering my question. In case you are, I'll try to reword it better: Instead of highlighting every time the page is requested, why not store the rendered page to not recompute it every time? |
That would mean that any improvements to the highlighting wouldn't apply to old files. Highlighting is very cheap, and these pages aren't requested that often compared to rustdoc pages. |
I see, makes sense. Thanks for the explanation! 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this!
Parts of the restructuring could be a chance to write some unit-tests for these ( like from_path_and_mime
or web::markdown
, but that's not a blocker.
I did some manual testing, comparing the current source view with this one.
Looking at: https://docs.rs/crate/regex/1.6.0
- the code examples in the readme are not all hilighted (example: "Usage: Avoid compiling the same regex in a loop")
Looking at: https://docs.rs/crate/itertools/0.10.3/source/
Files that are currently highlited but aren't with this PR any more:
.rustfmt.toml
Cargo.lock
Cargo.toml.orig
.github/workflows/ci.yml
(here I'm not sure which highlighting is applied currently, could not yaml)
looking at https://docs.rs/crate/regex/1.6.0/source/
test
Of course we could decide not to highlight certain file-types, though I would prefer to kepe highlighting for commonly used things (cargo.lock is a good example for this )
Fixed the readme block, |
Feel free to ping me / request a review when I should have another look. Thank you! |
Extracted the full list of languages from the version of highlight.js we had: The languages included in the base sublime syntax packages: The major missing languages I can see are: |
To load all the syntaxes it turns out YAML (https://docs.rs/crate/itertools/0.10.3/source/.github/workflows/ci.yml): JSON (https://docs.rs/crate/itertools/0.10.3/source/.cargo_vcs_info.json): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did more testing with the last state:
not hilighted:
- http://localhost:3000/crate/itertools/0.10.1 (the code in the readme), though I see that the readme was a RST file in that version. How did we support that before this change? ( rust-itertools/itertools@59fda6b )
I don't know how we are even finding that readme to render, it's not set via Line 265 in 52932db
highlight.js is detecting that codeblock as |
Ah, it's not the readme, that's the crate-level docs being very hackily extracted: Lines 300 to 326 in 52932db
We should probably stop doing that, it doesn't make sense when those docs are being rendered as part of the documentation anyway... |
now that's a neat edge-case :D
I agree, IMO we don't have to show more than crates.io here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really hate manually managing submodules; can we clone them in the build script instead of requiring people to deal with it themselves?
Other than that this seems reasonable; @jsha asked a while ago why are we even syntax highlighting in the first place, and that seems good to keep in the back of our minds, but the new highlighting looks reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While doing a last test I locally saw some higher response times with bigger source files ( > 2 seconds, for example from the regex crate), but this was better in the release build.
Also these pages are candidates for caching anyways when that's live.
I doubt these will get more than 1 update per year, the overhead of working out how to manage them seems higher than the overhead of running |
@Nemo157 I saw you rebased / added a new commit, feel free to ping me when this is ready again |
Finally got round to implementing this and rebasing to resolve the workflow conflicts. (Then removed it again after the following changes 😁)
It's ~1.3k files/2.8MB, which I guess is actually smaller than the font-awesome files. I've switched to using |
This is setup to only support the same languages we targeted before: Rust, Markdown and TOML.
This requires passing the full filename into the highlighter so that if specific files have an overridden syntax definition they will be chosen. We also need to edit the syntax definition we load so that it includes all the filenames we need.
This is setup to only support the same languages we targeted before: Rust, Markdown and TOML.
fixes #1652
TODO in this PR:
Future extensions:
Packages
subrepository, or there's a collection of third-party syntaxes at https://github.com/sharkdp/bat/tree/master/assets/syntaxes/02_Extra