fix(es/parser): fix panic on JSX surrogate entities#11803

Draft
cuyl wants to merge 2 commits into swc-project:main from cuyl:jsx-entity

@cuyl cuyl commented Apr 17, 2026

This commit fixes a panic (RuntimeError: unreachable) that occurred when the JSX parser
encountered numeric entities whose values are UTF-16 surrogate code units (U+D800 to U+DFFF).

The previous implementation used char::from_u32(...).expect(...), which panics if the
value is a surrogate or exceeds the Unicode range. This is now replaced with a safer
parse_from_code helper that returns an error instead of panicking.
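
To illustrate the difference, here is a minimal sketch of a non-panicking conversion. The name `code_to_char` and the error type are hypothetical and are not taken from the PR; the only library behavior relied on is that `char::from_u32` returns `None` for surrogates and for values above U+10FFFF.

```rust
/// Hypothetical helper (not the PR's `parse_from_code`): converts a numeric
/// entity value to a `char`, reporting an error instead of panicking.
fn code_to_char(code: u32) -> Result<char, String> {
    // `char::from_u32` yields `None` for 0xD800..=0xDFFF and for > 0x10FFFF,
    // so `ok_or_else` turns those cases into an `Err` rather than a panic.
    char::from_u32(code).ok_or_else(|| format!("invalid code point in entity: {code:#X}"))
}
```

With this shape, a caller can decide whether to emit a diagnostic, substitute a replacement character, or keep the raw entity text.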

Additionally, the lexer now supports merging consecutive JSX entities that form a
UTF-16 surrogate pair into a single Unicode character. This matches the behavior of
other JSX parsers and ensures that characters outside the Basic Multilingual Plane
are correctly decoded when represented as surrogate pairs.
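
The merge step described above follows the standard UTF-16 decoding formula. The sketch below assumes the two entity values are already parsed as `u32`; the function name `combine_surrogates` is hypothetical and does not come from the PR.

```rust
/// Hypothetical sketch: merges a high and a low surrogate into one scalar
/// value. Returns `None` if either input is outside its surrogate range.
fn combine_surrogates(high: u32, low: u32) -> Option<char> {
    let is_high = (0xD800..=0xDBFF).contains(&high);
    let is_low = (0xDC00..=0xDFFF).contains(&low);
    if !is_high || !is_low {
        return None;
    }
    // Standard UTF-16 formula: 10 bits from each unit, offset by 0x10000.
    let code = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    char::from_u32(code)
}
```

For example, the values 0xD83D and 0xDE00 combine to U+1F600.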

Changes:

  • Replaced panicking char::from_u32 with safe numeric entity parsing.
  • Updated read_jsx_entity to return String to support multi-code-unit characters.
  • Implemented merging logic for UTF-16 surrogate pairs across consecutive entities.
  • Added test cases covering surrogate pairs in both JSX attributes and text.

Fixes #11802.

@cuyl cuyl requested a review from a team as a code owner April 17, 2026 23:08
changeset-bot Bot commented Apr 17, 2026

⚠️ No Changeset found

Latest commit: b044768

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9c8856281f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread crates/swc_ecma_parser/src/lexer/mod.rs Outdated
format!("&{s};"),
));
}
} else if prev_result != NO_PREV_RESULT {

[P2] Preserve lone low-surrogate entities instead of dropping them

Low surrogates are only handled when prev_result is already set. When a low surrogate appears first (for example, a single numeric entity in the range U+DC00 to U+DFFF), this condition is false and the function falls through with an empty value, so callers append nothing and the entity silently disappears from JSX text and attribute values. Even for invalid entities, silently deleting source content is a parsing regression; it should be replaced with explicit fallback behavior (e.g., a replacement character or raw preservation).
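
As one possible shape for the "raw preservation" fallback mentioned in this comment, the sketch below keeps the original entity text whenever the value is not a valid scalar. The name `decode_or_raw` and its signature are assumptions for illustration, not code from the PR.

```rust
/// Hypothetical sketch: decodes one numeric entity value, falling back to
/// the raw source text (e.g. "&#xDE00;") when the value is not a valid
/// Unicode scalar, so nothing is silently dropped.
fn decode_or_raw(code: u32, raw: &str) -> String {
    match char::from_u32(code) {
        Some(c) => c.to_string(),
        // Lone surrogates and out-of-range values land here.
        None => raw.to_string(),
    }
}
```

Substituting `char::REPLACEMENT_CHARACTER` in the `None` arm would give the other fallback the comment suggests.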

github-actions Bot commented Apr 17, 2026

Binary Sizes

| File | Size |
| --- | --- |
| swc.linux-x64-gnu.node | 27M (27778120 bytes) |

Commit: 118153a

codspeed-hq Bot commented Apr 17, 2026

Merging this PR will degrade performance by 2.06%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

⚡ 1 improved benchmark
❌ 1 regressed benchmark
✅ 217 untouched benchmarks
⏩ 31 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- |
| es/large/minify/libraries/terser | 511.2 ms | 521.9 ms | -2.06% |
| es/lints/libs/three | 50.5 ms | 49.3 ms | +2.26% |

Comparing cuyl:jsx-entity (b94cb60) with main (6028240) [2]

Footnotes

  1. 31 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

  2. No successful run was found on main (6f07c6c) during the generation of this report, so 6028240 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b94cb6033b

Comment thread crates/swc_ecma_parser/src/lexer/mod.rs Outdated
Comment on lines 1467 to 1469
} else {
let value = char::from_u32(result as u32).unwrap().to_string();
return Ok((value, format!("&{s};")));

[P2] Flush buffered high surrogate before returning next code point

If the lexer buffers a high surrogate from one entity and the following &#...; is a non-surrogate scalar (for example, a high-surrogate entity followed by an entity that decodes to `a`), this branch returns only the second character and drops the buffered surrogate entirely. That silently deletes source content in JSX text and attributes for mixed surrogate/non-surrogate sequences instead of preserving or replacing the unmatched surrogate.
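
A minimal sketch of the flushing behavior this comment asks for, assuming the entity values have already been parsed into a slice of `u32` and that an unmatched surrogate becomes U+FFFD. The function `decode_entity_codes` and the choice of replacement character are illustrative assumptions, not the PR's implementation.

```rust
/// Hypothetical sketch: decodes consecutive numeric entity values, merging
/// surrogate pairs and emitting U+FFFD for any unmatched surrogate.
fn decode_entity_codes(codes: &[u32]) -> String {
    let mut out = String::new();
    let mut pending_high: Option<u32> = None;
    for &code in codes {
        let is_low = (0xDC00..=0xDFFF).contains(&code);
        if let Some(high) = pending_high.take() {
            if is_low {
                // Valid pair: combine into a single scalar value.
                let merged = 0x10000 + ((high - 0xD800) << 10) + (code - 0xDC00);
                out.push(char::from_u32(merged).unwrap_or(char::REPLACEMENT_CHARACTER));
                continue;
            }
            // Flush the buffered high surrogate instead of dropping it.
            out.push(char::REPLACEMENT_CHARACTER);
        }
        if (0xD800..=0xDBFF).contains(&code) {
            pending_high = Some(code);
        } else {
            // Lone low surrogates and out-of-range values become U+FFFD.
            out.push(char::from_u32(code).unwrap_or(char::REPLACEMENT_CHARACTER));
        }
    }
    if pending_high.is_some() {
        // A trailing high surrogate with no partner is flushed as well.
        out.push(char::REPLACEMENT_CHARACTER);
    }
    out
}
```

The key design point is that every input value contributes at least one output character, so no path leaves the result empty.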

@cuyl cuyl marked this pull request as draft April 18, 2026 07:15

cuyl commented Apr 22, 2026

I changed some generated code manually (crates/swc_ecma_visit/src/generated.rs); I don't actually know how to generate that code.
I did some research and tried to generate it by running cargo codegen --input-dir crates/swc_ecma_ast --output crates/swc_ecma_visit/src/generated.rs.
But the generated code has too large a diff and does not build.
@kdy1

Development

Successfully merging this pull request may close these issues.

RuntimeError (panic) in JSX parser when encountering surrogate entitie

1 participant