Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor html output functions #529

Merged
merged 2 commits into from
Feb 17, 2025
Merged

Refactor html output functions #529

merged 2 commits into from
Feb 17, 2025

Conversation

digitalmoksha
Copy link
Collaborator

Refactored the html formatter to handle each node via a separate functions.

@digitalmoksha
Copy link
Collaborator Author

Went a little nuts today 😬

Copy link
Contributor

github-actions bot commented Feb 5, 2025

Command Mean [ms] Min [ms] Max [ms] Relative
./bench.sh ./comrak-8c939a6 315.5 ± 1.5 313.1 318.5 2.93 ± 0.02
./bench.sh ./comrak-main 316.7 ± 6.3 312.1 346.0 2.95 ± 0.06
./bench.sh ./pulldown-cmark 107.5 ± 0.5 106.2 108.7 1.00
./bench.sh ./cmark-gfm 114.1 ± 1.1 112.4 116.6 1.06 ± 0.01
./bench.sh ./markdown-it 482.1 ± 4.8 476.2 497.2 4.48 ± 0.05

Run on Wed Feb 5 01:40:02 UTC 2025

@digitalmoksha
Copy link
Collaborator Author

let ... else is only stable in 1.65, and I forgot I was running 1.73 locally.

I was trying to keep the function signature the same (always the same args) for all the render_ functions. But maybe there is no real benefit to that, and I'm making this more difficult than it needs to be. It made sense in my head yesterday 🤷

@charlottia
Copy link
Collaborator

let ... else is only stable in 1.65, and I forgot I was running 1.73 locally.

D'oh! Maybe we should think about an MSRV bumping policy? Right now we're sitting at 1.62.1 maybe forever, whereas some people out there are using N-2 (!).

I was trying to keep the function signature the same (always the same args) for all the render_ functions.

The advantage that sticks out in my head is they become more amenable to generated (macro) code; we can't inspect the target function signatures at runtime, but if they all share a signature it'd be possible to do some kind of programmatically generated dispatch!

@digitalmoksha
Copy link
Collaborator Author

The advantage that sticks out in my head is they become more amenable to generated (macro) code; we can't inspect the target function signatures at runtime, but if they all share a signature it'd be possible to do some kind of programmatically generated dispatch!

That's kind of what I was thinking, maybe setting us up for that possibility.

Maybe we should think about an MSRV bumping policy? Right now we're sitting at 1.62.1 maybe forever, whereas some people out there are using N-2 (!).

I think you're right. I'm not sure what exactly the policy should be. One part it's nice not to require a version that's too bleeding edge, so that larger organizations don't need to upgrade their rust version as quickly. For example GitLab is still on 1.73.

But we also don't want to completely limit ourselves on the ability to use certain features. Maybe it makes sense, for now, to evaluate when we run into something like the let...else and consider it, if it doesn't push us too far forward. In this case we would need to step up only to 1.65. But we could also consider going to 1.70 or 1.73 🤔

@digitalmoksha
Copy link
Collaborator Author

I've gone ahead and added a commit to bump the MSRV to 1.65. It's a pretty minimal bump.

I could rewrite the code to not use the let...else syntax, but it really kind of bloats it.

wdyt? 🤔

@digitalmoksha
Copy link
Collaborator Author

digitalmoksha commented Feb 13, 2025

Hey @kivikakk. So I've made the HtmlFormatter have a public interface. The purpose of this is to provide a way to override certain node formatters while being able to use the default versions.

https://gitlab.com/gitlab-org/ruby/gems/gitlab-glfm-markdown/-/merge_requests/76/diffs?commit_id=eb78621bfe8f47b6a41652c68ed2e95d5cfc0883 is a hacked together way in which it might be used.

I made all the render_xxx... functions public. But just now thinking about it, it might not be necessary. The normal format_node handles default behavior. Even if I wanted to override a particular node, say NodeValue::BlockQuote to tweak a value in that node, and then want the default behavior, I could still call format_node rather than render_block_quote directly.

So maybe I wouldn't need to make all those public.

This relates to the discussion we had on this topic.

I admit I posed the question to an AI and it lead me to this

is there a way to override a function in a rust crate?

In Rust, there isn't a direct way to "override" a function in a crate as you might in object-oriented languages. However, there are several approaches you can use to achieve similar results:

  1. Trait Implementation: If the function is part of a trait, you can implement that trait for your own type and provide your own implementation of the function.

  2. Wrapper Types: Create a new type that wraps the original type from the crate. Implement methods on your wrapper type that either add new functionality or call the wrapped type's methods.

  3. Extension Traits: Define a new trait with the method you want to "override", and implement it for the type from the crate.

  4. Macros: In some cases, you might be able to use macros to redefine or extend functionality, though this approach can be complex.

  5. Fork the Crate: As a last resort, you could fork the crate and modify it directly. This is generally not recommended unless absolutely necessary.

Here's a simple example using an extension trait:

// Original crate
struct OriginalType;

impl OriginalType {
    fn method(&self) {
        println!("Original method");
    }
}

// Your code
trait ExtendedBehavior {
    fn method(&self);
}

impl ExtendedBehavior for OriginalType {
    fn method(&self) {
        println!("Extended method");
    }
}

fn main() {
    let obj = OriginalType;
    obj.method(); // Prints: "Original method"
    ExtendedBehavior::method(&obj); // Prints: "Extended method"
}

So I basically used a wrapper type in my hacked attempt.

@kivikakk
Copy link
Owner

I'm happy with making one or all the renderer functions public, tbh! But maybe better to just do the main one (so fewer changes are "breaking").

One part it's nice not to require a version that's too bleeding edge, so that larger organizations don't need to upgrade their rust version as quickly.

Yes, indeed; same goes for distro package maintainers.

I've gone ahead and added a commit to bump the MSRV to 1.65. It's a pretty minimal bump.

Works for me! :)

@digitalmoksha digitalmoksha force-pushed the bw-refactor-html-output branch from 4930706 to 7336534 Compare February 17, 2025 20:36
@digitalmoksha
Copy link
Collaborator Author

Ok, I know better than this - always keep PRs as small as possible. So this just has the refactoring of the render functions.

In another PR I will attempt to make HtmlFormatter#format_node overridable. In my current implementation, I don't like the fact that I basically have to copy the entire format function into the main program. Maybe we can leverage the adapter plugin mechanism we already have, by having an HTMLFormatNodeAdapter or something.

This allows us to use the `let...else` style syntax
@digitalmoksha digitalmoksha force-pushed the bw-refactor-html-output branch from 7336534 to 545419e Compare February 17, 2025 22:22
@digitalmoksha digitalmoksha merged commit 532d6e7 into main Feb 17, 2025
38 checks passed
@digitalmoksha digitalmoksha deleted the bw-refactor-html-output branch February 17, 2025 22:27
@@ -400,7 +400,7 @@ where
plain
} else {
stack.push((node, false, Phase::Post));
self.format_node(node, true)?
!self.format_node(node, true)?
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting change; can you elaborate on it?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also, to be clear, if it's a mistaken change, then the real mistake is that no test failure resulted from it! And if it isn't, we should add a regression test! Either way it's interesting.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, I meant to comment on that change.

All the new render_ functions return a io::Result<bool>, and return an Ok(true) when the function is successful. The only render function that can return an Ok(false) is render_image. And that false needs to trigger some additional work, just like it did before.

Originally format_node was returning Ok(false) by default, which would set new_plain in format.

But with the render functions returning a true for success, then it made more sense to set a processing_complete boolean with value of the render functions, and return that. So now we had to negate the return value in the line above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests do fail pretty spectacularly if the ! is removed.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, got it; I misread the diff and thought the trues and falses hadn't in fact switched positions. I might change those to a binary enum that actually make clear what work they're doing.

@kivikakk
Copy link
Owner

In my current implementation, I don't like the fact that I basically have to copy the entire format function into the main program. Maybe we can leverage the adapter plugin mechanism we already have, by having an HTMLFormatNodeAdapter or something.

You may again wish to consider the approach I outlined in the discussion: #454 (reply in thread); with an exposed format_node, that could be the default case, and with the macro piecing together a HtmlFormatter (and indeed, could be used to create the actual default HtmlFormatter too). Lmk if you want me to try putting something like that together for you to look over.

@digitalmoksha
Copy link
Collaborator Author

Lmk if you want me to try putting something like that together for you to look over.

Yeah that would be great. I haven't done anything with macros in Rust, and didn't quite follow what you were suggesting at the time.

@kivikakk
Copy link
Owner

Will do in a bit, then.

@kivikakk
Copy link
Owner

I have pushed an example of what I mean here: https://github.com/kivikakk/comrak/compare/custom-html-formatter?w=1

See the added test for a sample use.

I'm extremely low on cycles and I basically hacked this together in the fastest way possible to demonstrate the idea. There's a lot of cleanup that can be done; I'm exposing a whole bunch of things here which it'd be nicer not to expose.

I am 100% sure this can be done cleaner, and I wouldn't want to merge this as-is — it's just to give you an idea of how a macro could help produce the outcome you want. Note how the baseline html::HtmlFormatter is created in the very last line of html.rs.

@digitalmoksha
Copy link
Collaborator Author

Oh awesome, thanks! I'll look it over in the next few days

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants