Vite 8.0.14 dep-optimizer corrupts lone-surrogate Unicode escapes (\uD800-\uDBFF) in regex strings #22500

@CluesOverride

Describe the bug

Starting with Vite 8.0.14 (which bumps rolldown 1.0.0-rc.18 → 1.0.2 and lands #22342 "pass oxc jsx options to transformSync in dependency scan"), the dep-optimizer corrupts lone-surrogate Unicode escape sequences in CommonJS-imported string literals.

For an input regex string like:

// node_modules/@vscode/markdown-it-katex/.../katex.js (CJS, transitively loads KaTeX)
var tokenRegexString = "([!-\\[\\]-‧‪-퟿豈-�][̀-ͯ]*|[\uD800-\uDBFF][\uDC00-\uDFFF][̀-ͯ]*|..."

The dep-optimized output at node_modules/.vite/deps/@vscode_markdown-it-katex.js is:

"([!-\\[\\]-‧‪-퟿豈-�][̀-ͯ]*|[<FFFD>d800-<FFFD>dbff][<FFFD>dc00-<FFFD>dfff][̀-ͯ]*..."

The escapes whose decoded values are valid BMP characters (the five in the first character class, such as U+2027 and U+F900) survived. But the escapes whose decoded values are lone surrogates (\uD800, \uDBFF, \uDC00, \uDFFF) were each turned into a U+FFFD REPLACEMENT CHARACTER followed by the literal text d800 etc. The JS regex engine parses the resulting [<FFFD>d800-<FFFD>dbff] as a character class containing the literal characters <FFFD>, d, 8, 0, b, f plus the range 0-<FFFD>. It no longer matches only surrogate code units: it matches almost any code unit from U+0030 upward, including the backslash.
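To see how the regex engine reads the corrupted class, the sketch below compares an intact high-surrogate class with one built from the corrupted text. The variable names are illustrative only, and the patterns are reduced to the single class in question rather than the full KaTeX regex.

```javascript
// Intact class from the source: matches only high-surrogate code units.
const intact = new RegExp("[\uD800-\uDBFF]");

// Corrupted class: U+FFFD followed by the literal text "d800", and so on.
// Inside the class, "0-\uFFFD" is read as a range from U+0030 to U+FFFD.
const corrupted = new RegExp("[\uFFFDd800-\uFFFDdbff]");

console.log(intact.test("\uD83D"));  // true: a high surrogate
console.log(intact.test("\\"));      // false: backslash is not a surrogate
console.log(corrupted.test("\\"));   // true: U+005C lies inside 0-U+FFFD
console.log(corrupted.test("q"));    // true: U+0071 lies inside 0-U+FFFD
console.log(corrupted.test("!"));    // false: U+0021 is below U+0030
```

The corrupted class therefore accepts ordinary ASCII letters and the backslash, which the original class never did.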

For KaTeX's lexer (which uses this regex), the practical impact is that every control word longer than one letter gets truncated: \sqrt tokenizes as \s (red error) plus the letters qrt as math italics, breaking all rendered math.
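The truncation can be reproduced with a reduced tokenizer. This is a sketch, not KaTeX's lexer: `firstToken` is a hypothetical helper, and the pattern keeps only two alternatives (the surrogate-pair alternative followed by a control-word alternative), on the assumption that their relative order matches the original regex.

```javascript
// Build a two-alternative regex: surrogate pair first, control word second.
function firstToken(surrogatePairPattern, input) {
  const re = new RegExp("(" + surrogatePairPattern + "|\\\\[a-zA-Z@]+)");
  const m = re.exec(input);
  return m ? m[1] : null;
}

const intactPair = "[\uD800-\uDBFF][\uDC00-\uDFFF]";
const corruptedPair = "[\uFFFDd800-\uFFFDdbff][\uFFFDdc00-\uFFFDdfff]";

console.log(firstToken(intactPair, "\\sqrt{x}"));    // \sqrt
console.log(firstToken(corruptedPair, "\\sqrt{x}")); // \s
```

With the corrupted classes, the first alternative matches any two code units in the range U+0030 to U+FFFD, so it consumes the backslash and the following letter before the control-word alternative is tried.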

This appears to be a UTF-8 round-trip step in the dep-optimizer's minification path that doesn't handle lone surrogates (UTF-8 explicitly cannot represent them; the encoder substitutes U+FFFD).
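The substitution itself is easy to observe with the standard TextEncoder and TextDecoder globals. The sketch below is a generic stand-in for a "serialize through UTF-8" step; it is not the actual Rolldown or Oxc code path, and `utf8RoundTrip` is a hypothetical helper name.

```javascript
// Encode a JS string to UTF-8 bytes and decode it back.
function utf8RoundTrip(s) {
  return new TextDecoder().decode(new TextEncoder().encode(s));
}

const loneHigh = "\uD800";         // one UTF-16 code unit with no low surrogate
const validPair = "\uD83D\uDE00";  // a well-formed surrogate pair (U+1F600)

console.log(utf8RoundTrip(loneHigh) === "\uFFFD");   // true: replaced by U+FFFD
console.log(utf8RoundTrip(validPair) === validPair); // true: pairs survive
```

A well-formed pair survives the round trip, while a lone surrogate comes back as U+FFFD, which is consistent with the corruption seen in the optimized output.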

Reproduction

https://github.com/CluesOverride/td148-vite-lone-surrogate-repro

Steps to reproduce

  1. git clone https://github.com/CluesOverride/td148-vite-lone-surrogate-repro && cd td148-vite-lone-surrogate-repro
  2. npm install
  3. npx vite dev
  4. Open node_modules/.vite/deps/@vscode_markdown-it-katex.js and search for dbff — you'll see <U+FFFD>d800-<U+FFFD>dbff instead of \uD800-\uDBFF.
  5. Open the browser to http://localhost:5173/. The page's self-check table will show SOURCE PRESERVED? NO, and \sqrt{x} renders as \s (red) + literal math-italic qrtx.
  6. Re-install with npm install vite@8.0.13 --save-exact and re-run npx vite dev: the cache file contains preserved \uD800 escapes with zero U+FFFD occurrences, and KaTeX renders correctly.
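The manual search in step 4 can also be scripted. The helper below is a hypothetical sketch (`inspectBundleText` is not part of the repro repository); it counts U+FFFD characters and surviving surrogate escape text in a string, and is run here on inline samples modeled on the two outputs described above.

```javascript
// Count U+FFFD characters and surviving \uD800..\uDFFF escape text.
function inspectBundleText(text) {
  const replacementChars = (text.match(/\uFFFD/g) || []).length;
  // Matches the six-character escape text, not a decoded code unit.
  const surrogateEscapes =
    (text.match(/\\u[dD][89a-fA-F][0-9a-fA-F]{2}/g) || []).length;
  return { replacementChars, surrogateEscapes };
}

const corruptedSample = "[\uFFFDd800-\uFFFDdbff][\uFFFDdc00-\uFFFDdfff]";
const preservedSample = "[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]";

console.log(inspectBundleText(corruptedSample)); // 4 replacement chars, 0 escapes
console.log(inspectBundleText(preservedSample)); // 0 replacement chars, 4 escapes
```

To check the real cache file, pass it the result of fs.readFileSync("node_modules/.vite/deps/@vscode_markdown-it-katex.js", "utf8") from Node's fs module.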

The bug is specific to the dev-mode dep-optimizer: vite build preserves the escapes (the build path uses a different minifier code path).

Expected behavior

The optimized bundle preserves the original string semantics. Lone-surrogate escape sequences in JS string literals should either:

  • Remain as \uXXXX escape sequences in the output (safest), OR
  • Be preserved as round-tripping JavaScript string values (each lone surrogate is one UTF-16 code unit; "\uD800".charCodeAt(0) === 0xD800 is well-defined).
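Both options can be illustrated in plain JavaScript. The sketch below uses JSON.stringify only as an example of a serializer that keeps lone surrogates as escapes; it does not claim that Vite, Rolldown, or Oxc use it.

```javascript
const lone = "\uD800";

// A lone surrogate is a well-defined one-unit string value.
console.log(lone.length);                   // 1
console.log(lone.charCodeAt(0) === 0xD800); // true

// JSON.stringify emits lone surrogates as \uXXXX escapes instead of raw
// code units, so the serialized text is safe to write out as UTF-8.
const literal = JSON.stringify(lone);
console.log(literal);                       // "\ud800"
console.log(JSON.parse(literal) === lone);  // true: the value round-trips
```

Emitting the escape keeps the output file valid UTF-8 while preserving the original string value once it is parsed again.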

Vite 8.0.11 / 8.0.12 / 8.0.13 (rolldown 1.0.0-rc.18) preserve the strings correctly. The behavior change in 8.0.14 (rolldown 1.0.2 + #22342) appears to assume strings can be re-encoded via a UTF-8 round-trip, which silently corrupts lone surrogates.

Actual behavior

The \uD800-\uDBFF and \uDC00-\uDFFF ranges become <U+FFFD>d800-<U+FFFD>dbff and <U+FFFD>dc00-<U+FFFD>dfff respectively. The JS regex engine parses the corrupted character classes differently than the source intended, and downstream consumers (KaTeX's lexer) fail. Six U+FFFD occurrences appear in node_modules/.vite/deps/@vscode_markdown-it-katex.js under 8.0.14, zero under 8.0.13.

System Info

  System:
    OS: macOS 26.5
    CPU: (14) arm64 Apple M4 Pro
    Memory: 396.00 MB / 48.00 GB
    Shell: 5.9 - /bin/zsh
  Binaries:
    Node: 22.22.2 - /Users/austinfee/.nvm/versions/node/v22.22.2/bin/node
    Yarn: 1.22.22
    npm: 10.9.7
    pnpm: 11.2.2
  Browsers:
    Chrome: 148.0.7778.179
    Firefox: 149.0
    Safari: 26.5
  npmPackages:
    vite: 8.0.14 => 8.0.14
    rolldown (transitive): 1.0.2

Used Package Manager

npm

Logs

Not applicable — bug is in the optimized bundle bytes on disk, not in vite --debug console output.

Validations

  • Code of Conduct
  • Contributing Guidelines
  • Docs
  • No existing duplicate issue (searched lone-surrogate / surrogate / unicode-escape-regex / KaTeX-dep-optimizer / 8.0.14-regex / Oxc-transformSync / Rolldown-minify-string)
  • Vite issue (may also be a Rolldown 1.0.2 or Oxc transformSync bug — cross-filing as appropriate may help)
  • Concrete bug
  • Minimal reproduction provided
