Vite 8.0.14 dep-optimizer corrupts lone-surrogate Unicode escapes (\uD800-\uDBFF) in regex strings

### Describe the bug

Starting with Vite **8.0.14** (which bumps `rolldown` 1.0.0-rc.18 → 1.0.2 and lands [#22342](https://github.com/vitejs/vite/pull/22342) "pass oxc jsx options to transformSync in dependency scan"), the dep-optimizer corrupts lone-surrogate Unicode escape sequences in CommonJS-imported string literals.

For an input regex string like:

```js
// node_modules/@vscode/markdown-it-katex/.../katex.js (CJS, transitively loads KaTeX)
var tokenRegexString = "([!-\\[\\]-‧‪-퟿豈-￿][̀-ͯ]*|[\uD800-\uDBFF][\uDC00-\uDFFF][̀-ͯ]*|..."
```

The dep-optimized output at `node_modules/.vite/deps/@vscode_markdown-it-katex.js` is:

```js
"([!-\\[\\]-‧‪-퟿豈-￿][̀-ͯ]*|[<FFFD>d800-<FFFD>dbff][<FFFD>dc00-<FFFD>dfff][̀-ͯ]*..."
```

The escapes whose decoded values are **valid BMP characters** (`\u2027`, `\u202A`, `\uD7FF`, `\uF900`, `\uFFFF`) survived. But the escapes whose decoded values are **lone surrogates** (`\uD800`, `\uDBFF`, `\uDC00`, `\uDFFF`) were each turned into a U+FFFD REPLACEMENT CHARACTER followed by the literal hex digits (`d800` etc.). The JS regex engine parses the resulting `[<FFFD>d800-<FFFD>dbff]` as a character class containing U+FFFD, the literal characters `d`, `8`, `0`, `b`, `f`, and the range from `0` to U+FFFD created by the hyphen. The class no longer means "high surrogate code unit"; it accepts nearly every BMP code unit from `0` upward, including the backslash.

For KaTeX's lexer (which uses this regex), the practical impact is that every control word longer than one letter gets truncated: `\sqrt` tokenizes as `\s` (rendered as a red error) followed by the letters `qrt` in math italics, breaking all rendered math.
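The truncation can be reproduced without Vite or KaTeX. The sketch below is a simplified stand-in, not KaTeX's actual token regex: it assumes only two alternatives (a surrogate pair, then a control word) and compares the intended pattern with the corrupted one. The variable names are illustrative.

```js
// Simplified stand-in for two alternatives of the token regex:
// a surrogate pair, or a control word such as \sqrt.
// Assumption: the real regex has more alternatives; only these two are modeled.
const original = new RegExp(
  "[\uD800-\uDBFF][\uDC00-\uDFFF]|\\\\[a-zA-Z@]+"
);

// The pattern as it appears in the dep-optimized file:
// U+FFFD followed by the literal hex digits.
const corrupted = new RegExp(
  "[\uFFFDd800-\uFFFDdbff][\uFFFDdc00-\uFFFDdfff]|\\\\[a-zA-Z@]+"
);

const input = "\\sqrt{x}";
const originalToken = input.match(original)[0];
const corruptedToken = input.match(corrupted)[0];

// Inside the corrupted class, "0-\uFFFD" is a range, and that range
// includes the backslash (U+005C) and every ASCII letter.
const backslashInClass = /[\uFFFDd800-\uFFFDdbff]/.test("\\");

console.log({ originalToken, corruptedToken, backslashInClass });
```

With the corrupted pattern, the "surrogate pair" alternative consumes the backslash plus one letter before the control-word alternative is tried, which matches the `\s` + `qrt` rendering described above.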

This appears to be a UTF-8 round-trip step in the dep-optimizer's minification path that doesn't handle lone surrogates (UTF-8 explicitly cannot represent them; the encoder substitutes U+FFFD).
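As a small illustration of why a UTF-8 round trip is lossy for these values, the sketch below uses the standard `TextEncoder` and `TextDecoder` globals. It is not the code path Vite, Rolldown, or Oxc actually takes. Note that a plain round trip yields only U+FFFD, without the trailing hex digits seen in the optimized output, so the real mechanism may differ in detail.

```js
// A lone high surrogate is a valid JS string value (one UTF-16 code unit).
const lone = "\uD800";

// Encoding to UTF-8 and decoding back cannot preserve it:
// TextEncoder replaces unpaired surrogates with U+FFFD.
const roundTripped = new TextDecoder().decode(new TextEncoder().encode(lone));

console.log(lone.charCodeAt(0).toString(16));         // d800
console.log(roundTripped.charCodeAt(0).toString(16)); // fffd
console.log(roundTripped.length);                     // 1
```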

### Reproduction

https://github.com/CluesOverride/td148-vite-lone-surrogate-repro

### Steps to reproduce

1. `git clone https://github.com/CluesOverride/td148-vite-lone-surrogate-repro && cd td148-vite-lone-surrogate-repro`
2. `npm install`
3. `npx vite dev`
4. Open `node_modules/.vite/deps/@vscode_markdown-it-katex.js` and search for `dbff` — you'll see `<U+FFFD>d800-<U+FFFD>dbff` instead of `\uD800-\uDBFF`.
5. Open the browser to `http://localhost:5173/`. The page's self-check table will show `SOURCE PRESERVED? NO`, and `\sqrt{x}` renders as `\s` (red) + literal math-italic `qrtx`.
6. Re-install with `npm install vite@8.0.13 --save-exact` and re-run `npx vite dev`. The cache file now contains preserved `\uD800` escapes with zero U+FFFD characters, and KaTeX renders correctly.

The bug is specific to the dev-mode dep-optimizer: `vite build` preserves the escapes (the build path uses a different minifier code path).

### Expected behavior

The optimized bundle preserves the original string semantics. Lone-surrogate escape sequences in JS string literals should either:
- Remain as `\uXXXX` escape sequences in the output (safest), OR
- Be preserved as round-tripping JavaScript string values (each lone surrogate is one UTF-16 code unit; `"\uD800".charCodeAt(0) === 0xD800` is well-defined).

Vite 8.0.11, 8.0.12, and 8.0.13 (rolldown 1.0.0-rc.18) preserve the strings correctly. The behavior change in 8.0.14 (rolldown 1.0.2 + #22342) appears to assume strings can be re-encoded via UTF-8 round-trip, which silently corrupts lone surrogates.

### Actual behavior

Each `\uD800-\uDBFF` and `\uDC00-\uDFFF` range becomes `<U+FFFD>d800-<U+FFFD>dbff` etc. The JS regex engine parses the corrupted character class differently than the source intended, and downstream consumers (KaTeX's lexer) fail. Six U+FFFD characters appear in `node_modules/.vite/deps/@vscode_markdown-it-katex.js` under 8.0.14, zero under 8.0.13.
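The U+FFFD count can be checked with a short Node script. This is a minimal sketch, assuming it is run from the project root after the dev server has populated the dep cache. `countReplacementChars` is a hypothetical helper name, and the file path is the one from this report.

```js
const fs = require("node:fs");

// Count U+FFFD REPLACEMENT CHARACTER occurrences in a string.
function countReplacementChars(text) {
  return (text.match(/\uFFFD/g) || []).length;
}

// Path taken from this report; it only exists after `npx vite dev` has run.
const depFile = "node_modules/.vite/deps/@vscode_markdown-it-katex.js";

if (fs.existsSync(depFile)) {
  const count = countReplacementChars(fs.readFileSync(depFile, "utf8"));
  console.log(`${depFile}: ${count} U+FFFD character(s)`);
} else {
  console.log(`${depFile} not found; start the dev server first`);
}
```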

### System Info

```shell
System:
    OS: macOS 26.5
    CPU: (14) arm64 Apple M4 Pro
    Memory: 396.00 MB / 48.00 GB
    Shell: 5.9 - /bin/zsh
  Binaries:
    Node: 22.22.2 - /Users/austinfee/.nvm/versions/node/v22.22.2/bin/node
    Yarn: 1.22.22
    npm: 10.9.7
    pnpm: 11.2.2
  Browsers:
    Chrome: 148.0.7778.179
    Firefox: 149.0
    Safari: 26.5
  npmPackages:
    vite: 8.0.14 => 8.0.14
    rolldown (transitive): 1.0.2
```

### Used Package Manager

npm

### Logs

Not applicable — bug is in the optimized bundle bytes on disk, not in `vite --debug` console output.

### Validations

- [x] Code of Conduct
- [x] Contributing Guidelines
- [x] Docs
- [x] No existing duplicate issue (searched lone-surrogate / surrogate / unicode-escape-regex / KaTeX-dep-optimizer / 8.0.14-regex / Oxc-transformSync / Rolldown-minify-string)
- [x] Vite issue (may also be a Rolldown 1.0.2 or Oxc `transformSync` bug — cross-filing as appropriate may help)
- [x] Concrete bug
- [x] Minimal reproduction provided

