Minify all regexes using the new oniguruma-parser optimizer #125

Open
slevithan opened this issue Mar 3, 2025 · 3 comments · May be fixed by #133

Comments

@slevithan

slevithan commented Mar 3, 2025

I've recently split oniguruma-to-es's parser/tokenizer/traverser into the new dedicated library oniguruma-parser. It includes lots of improvements, including a new Oniguruma optimizer that I think is a perfect fit for this library (in fact I built it primarily for Shiki). 😊

As of now, the optimizer includes 18 optimization transforms, and many more can be added in the future. tm-grammars will get continually improving regex minification just by bumping the dependency.

Here's how you'd want to call it in this library:

import { optimize } from 'oniguruma-parser/optimizer'

const pattern = '...'
const optimized = optimize(pattern, {
  rules: {
    // Follow `vscode-oniguruma` which enables this Oniguruma option by default
    captureGroup: true,
  },
}).pattern

Some notes:

  • Although oniguruma-parser supports a few more Oniguruma features than oniguruma-to-es, there are still a handful of features it doesn't yet support. So if optimize throws an error, you should probably ignore it and leave the original pattern unaltered. As of now, I think this will only happen for one regex in the Swift grammar.
  • Any existing code in this library that modifies the regexes can (and should) be removed when you start using optimize. It takes care of the existing minification steps, like removing comments and the free-spacing enabled by flag x.
  • Since the optimizer and the Oniguruma code generator it uses are both brand new, it probably makes sense to introduce this in a PR so I can review the regex changes before you land them.
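
The fallback described in the first note could be sketched roughly as below. This is a minimal sketch, not this library's actual code: `optimizeOrKeep` is a hypothetical helper name, and the optimizer is passed in as a parameter so the wrapper doesn't depend on how it is imported. It assumes only what the snippet above shows, namely that `optimize(pattern, options)` returns an object with a `pattern` string and may throw on unsupported syntax.

```javascript
// Hypothetical helper (not part of oniguruma-parser or tm-grammars).
// `optimize` is expected to behave like the function imported from
// 'oniguruma-parser/optimizer' in the snippet above.
function optimizeOrKeep(pattern, optimize) {
  try {
    return optimize(pattern, {
      rules: {
        // Follow `vscode-oniguruma`, which enables this option by default
        captureGroup: true,
      },
    }).pattern;
  } catch {
    // Unsupported Oniguruma feature: leave the original pattern unaltered
    return pattern;
  }
}
```

It would be called as `optimizeOrKeep(pattern, optimize)` with the `optimize` import from above. Logging the patterns that reach the `catch` branch may also help to see which grammars are affected.
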
@antfu
Member

antfu commented Mar 3, 2025

That's awesome, thank you! Would you like to send the PR to swap the current minifier with yours?

@slevithan
Author

No rush, but I'd prefer for you to submit the initial PR / integration. After it's in place, I can submit PRs that might be needed for any future releases. 😊

@slevithan
Author

slevithan commented Mar 8, 2025

Heads up that I now have this working locally and discovered that it changes highlighting for 6 grammars. Your existing tests for this were great! I thought it would be kind of risky to integrate a new system like this, but since this library already compares Oniguruma results for all samples before and after minification, the change can be made with a lot more confidence!! 😊

I'll investigate the problem with the 6 grammars, fix, and send a PR to integrate the optimizer when that's done.
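
For anyone following along, the kind of before/after check mentioned above could look roughly like this. It is only a sketch under assumptions, not this library's actual test code: `findChangedSamples` and the `tokenize` callback are hypothetical names, and `tokenize(grammar, sample)` stands in for whatever runs a sample through Oniguruma and returns JSON-serializable tokens.

```javascript
// Hypothetical sketch: report the samples whose tokens differ between
// the original grammar and the minified grammar.
function findChangedSamples(samples, tokenize) {
  const changed = [];
  for (const { name, original, minified, sample } of samples) {
    // Serialize both results so nested token structures compare by value
    const before = JSON.stringify(tokenize(original, sample));
    const after = JSON.stringify(tokenize(minified, sample));
    if (before !== after) changed.push(name);
  }
  return changed;
}
```

An empty result would mean minification left the highlighting of every sample unchanged.
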
