[EXPERIMENT] Add node-id based ECMA semantics#11818
[EXPERIMENT] Add node-id based ECMA semantics#11818kdy1 wants to merge 2 commits intoswc-project:mainfrom
Conversation
|
|
I wanted to profile this approach using Codspeed |
There was a problem hiding this comment.
Pull request overview
Introduces experimental node-id–based semantic analysis across SWC’s ECMA AST by adding NodeId to Ident, generating visitor/hook support, and wiring resolver/minifier/rename entrypoints to assign node ids and enable symbol-based renaming.
Changes:
- Add
NodeId(occurrence id) toswc_ecma_ast::Ident, plus visitor/hook generated support for traversing it. - Introduce
swc_ecma_transforms_base::semanticswith node-id assignment and scope/symbol/reference side tables + new semantics fixture tests. - Wire resolver and minifier analysis entrypoints to assign fresh node ids; add SymbolId-based rename adapter APIs.
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/swc_typescript/src/fast_dts/enum.rs | Populate new Ident.node_id field for synthesized identifiers. |
| crates/swc_estree_compat/src/swcify/expr.rs | Populate new Ident.node_id field during ESTree → SWC conversion. |
| crates/swc_ecma_visit/src/generated.rs | Generated visitor/fold/path plumbing for NodeId and Ident.node_id. |
| crates/swc_ecma_utils/src/ident.rs | Extend IdentLike with node_id() and implement it for common identifier-like types. |
| crates/swc_ecma_transforms_base/tests/semantics.rs | Add harness to snapshot-dump new semantics output from fixtures. |
| crates/swc_ecma_transforms_base/tests/node-id-semantics/unresolved/input.js | Fixture exercising unresolved references. |
| crates/swc_ecma_transforms_base/tests/node-id-semantics/unresolved/output.txt | Expected snapshot output for unresolved fixture. |
| crates/swc_ecma_transforms_base/tests/node-id-semantics/ts-type-value/input.ts | Fixture exercising TS type/value namespaces. |
| crates/swc_ecma_transforms_base/tests/node-id-semantics/ts-type-value/output.txt | Expected snapshot output for TS type/value fixture. |
| crates/swc_ecma_transforms_base/tests/node-id-semantics/shadow/input.js | Fixture exercising shadowing across scopes. |
| crates/swc_ecma_transforms_base/tests/node-id-semantics/shadow/output.txt | Expected snapshot output for shadowing fixture. |
| crates/swc_ecma_transforms_base/tests/node-id-semantics/imports-exports/input.js | Fixture exercising imports/exports. |
| crates/swc_ecma_transforms_base/tests/node-id-semantics/imports-exports/output.txt | Expected snapshot output for imports/exports fixture. |
| crates/swc_ecma_transforms_base/tests/node-id-semantics/class-function-expr/input.js | Fixture exercising named class/function expressions. |
| crates/swc_ecma_transforms_base/tests/node-id-semantics/class-function-expr/output.txt | Expected snapshot output for class/function expr fixture. |
| crates/swc_ecma_transforms_base/tests/node-id-semantics/catch-param-var/input.js | Fixture exercising catch param vs var-hoist interaction. |
| crates/swc_ecma_transforms_base/tests/node-id-semantics/catch-param-var/output.txt | Expected snapshot output for catch/var fixture. |
| crates/swc_ecma_transforms_base/src/semantics.rs | New node-id assignment + semantic analysis implementation (scopes/symbols/references). |
| crates/swc_ecma_transforms_base/src/resolver/mod.rs | Assign node ids once at resolver entrypoints. |
| crates/swc_ecma_transforms_base/src/rename/ops.rs | Add a SymbolOperator that renames using Semantics symbol resolution. |
| crates/swc_ecma_transforms_base/src/rename/mod.rs | Expose SymbolRenameMap and rename_symbols API. |
| crates/swc_ecma_transforms_base/src/lib.rs | Export new semantics module. |
| crates/swc_ecma_transformer/src/es2016/exponentiation_operator.rs | Populate new Ident.node_id field for synthesized Math ident. |
| crates/swc_ecma_minifier/src/pass/mangle_props.rs | Assign node ids before property-mangler analysis entrypoint. |
| crates/swc_ecma_minifier/src/lib.rs | Assign node ids at minifier optimize entrypoint. |
| crates/swc_ecma_minifier/src/compress/mod.rs | Assign node ids before minifier ProgramData analysis passes. |
| crates/swc_ecma_hooks/src/generated.rs | Generated hook plumbing for NodeId enter/exit events. |
| crates/swc_ecma_ast/src/node_id.rs | New NodeId type and invariants (0 reserved invalid). |
| crates/swc_ecma_ast/src/lib.rs | Export NodeId and include new module. |
| crates/swc_ecma_ast/src/ident.rs | Add node_id: NodeId field to Ident and update constructors/conversions. |
| crates/swc_bundler/src/id.rs | Extend IdentLike impl for bundler Id with node_id() returning invalid. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if self.options.hoist_vars || self.options.hoist_fns { | ||
| assign_node_ids(n); | ||
| let data = analyze(&*n, Some(self.marks), false); | ||
|
|
There was a problem hiding this comment.
assign_node_ids(n) adds a full AST traversal immediately before program_data::analyze, but ProgramData analysis currently keys off (Atom, SyntaxContext) and does not use Ident.node_id. If node ids are only needed for the new experimental semantics path, consider guarding this behind a config/feature or deferring assignment until a node-id consumer is actually used, to avoid unnecessary overhead.
|
|
||
| assign_node_ids(m); | ||
| let mut data = analyze(&*m, None, true); | ||
|
|
There was a problem hiding this comment.
assign_node_ids(m) adds an extra full AST traversal here, but the subsequent program_data::analyze/property atom collection does not currently use Ident.node_id (it relies on symbol + SyntaxContext). If this is only for the upcoming node-id semantics migration, consider gating it behind a flag or moving it to the code path that actually consumes node ids to avoid the extra walk.
|
|
||
| pub trait IdentLike: Sized + Send + Sync + 'static { | ||
| fn from_ident(i: &Ident) -> Self; | ||
| fn node_id(&self) -> NodeId; |
There was a problem hiding this comment.
IdentLike gained a new required method (node_id) without a default implementation, which is a breaking change for downstream crates implementing this public trait. Consider providing a default implementation (e.g. returning an invalid NodeId) so existing implementors keep compiling, and only override it where meaningful.
| fn node_id(&self) -> NodeId; | |
| fn node_id(&self) -> NodeId { | |
| NodeId::invalid() | |
| } |
| self.scope_stack | ||
| .iter() | ||
| .find(|frame| frame.id == scope) | ||
| .and_then(|frame| frame.decls.get(&(name.clone(), namespace)).copied()) |
There was a problem hiding this comment.
find_in_scope clones Atom just to perform a map lookup (name.clone()), which can add noticeable overhead in large ASTs. Consider restructuring decls so lookups can be done by borrowing (e.g. nested maps keyed by Atom/namespace, or a custom key type with Borrow) to avoid per-lookup cloning.
| .and_then(|frame| frame.decls.get(&(name.clone(), namespace)).copied()) | |
| .and_then(|frame| { | |
| frame | |
| .decls | |
| .iter() | |
| .find(|((decl_name, decl_namespace), _)| { | |
| decl_name == name && *decl_namespace == namespace | |
| }) | |
| .map(|(_, symbol)| *symbol) | |
| }) |
| fn resolve_symbol(&self, name: &Atom, namespace: SymbolNamespace) -> Option<SymbolId> { | ||
| for frame in self.scope_stack.iter().rev() { | ||
| if let Some(symbol) = frame.decls.get(&(name.clone(), namespace)) { | ||
| return Some(*symbol); | ||
| } |
There was a problem hiding this comment.
resolve_symbol clones Atom on every scope-frame lookup (name.clone() inside the loop). This makes resolution O(depth) and allocates/clones per frame. Consider changing the decls keying strategy to allow borrowed lookups so resolution doesn't require cloning for each frame.
| fn visit_import_decl(&mut self, import: &ImportDecl) { | ||
| self.predeclare_import(import); |
There was a problem hiding this comment.
Imports are predeclared in predeclare_module_decl, but visit_import_decl also calls predeclare_import, which re-walks and re-declares the same bindings. This is redundant work for every import; consider making one of these paths a no-op (e.g. only predeclare in the initial scan, and skip during the main visit).
| fn visit_import_decl(&mut self, import: &ImportDecl) { | |
| self.predeclare_import(import); | |
| fn visit_import_decl(&mut self, _import: &ImportDecl) { | |
| // Import bindings are predeclared during the initial module scan. | |
| // Skip redeclaring them during the main visit. |
| typed!(visit_mut_ts_namespace_export_decl, TsNamespaceExportDecl); | ||
|
|
||
| fn visit_mut_program(&mut self, program: &mut Program) { | ||
| self.assign_node_ids_once(program); |
There was a problem hiding this comment.
Resolver now performs an extra full-tree traversal to assign NodeIds before doing its normal work. If most resolver callers don't need node ids, consider making node-id assignment opt-in (config flag/feature) or exposing it as a separate pass, to avoid doubling traversal cost by default.
| self.assign_node_ids_once(program); | |
| #[cfg(feature = "resolver_node_ids")] | |
| { | |
| self.assign_node_ids_once(program); | |
| } |
Binary Sizes
Commit: ba84514 |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d3f14b71ec
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| &decl.ident, | ||
| self.current_scope(), | ||
| SymbolKind::Class, | ||
| SymbolNamespace::Value, | ||
| ); |
There was a problem hiding this comment.
Register class symbols in type namespace
With SemanticConfig { typescript: true }, class names are valid type references (for example let x: C), but this declaration path only records Decl::Class in SymbolNamespace::Value. Later type-resolution paths (such as visit_ts_type_ref) look up SymbolNamespace::Type, so class type uses are marked unresolved and downstream SymbolId-based passes can miss those bindings. Add a corresponding type-namespace declaration for class symbols when TypeScript analysis is enabled.
Useful? React with 👍 / 👎.
| let new_sym = new_id.atom(); | ||
| if ident.sym.eq(new_sym) { | ||
| return; | ||
| } |
There was a problem hiding this comment.
Apply ctxt-only renames in SymbolOperator
This early return skips updates whenever the target atom matches, but RenamedVariable also carries SyntaxContext. If a caller needs to retag a symbol without changing its text (same atom, different ctxt), the rename silently becomes a no-op and references keep the old binding identity. The condition should account for ctxt too (or still assign ctxt when atoms are equal).
Useful? React with 👍 / 👎.
Merging this PR will degrade performance by 13.2%
|
| Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|
| ❌ | es/large/minify/libraries/antd |
2.6 s | 2.7 s | -3.12% |
| ❌ | es/large/minify/libraries/d3 |
544 ms | 556.6 ms | -2.27% |
| ❌ | es/large/all/es2018 |
120 ms | 123.1 ms | -2.48% |
| ❌ | es/large/all/es2016 |
121.7 ms | 124.8 ms | -2.54% |
| ❌ | es/large/minify/libraries/echarts |
2.1 s | 2.2 s | -3% |
| ❌ | es/large/minify/libraries/three |
894.7 ms | 925.1 ms | -3.28% |
| ❌ | es/large/minify/libraries/terser |
511.9 ms | 530.1 ms | -3.44% |
| ❌ | es/large/minify/libraries/typescript |
5 s | 5.2 s | -3.03% |
| ❌ | es/large/minify/libraries/victory |
1.2 s | 1.3 s | -2.94% |
| ❌ | es/minifier/libs/antd |
2 s | 2.1 s | -4.12% |
| ❌ | es/minifier/libs/three |
668.5 ms | 696 ms | -3.94% |
| ❌ | es/minifier/libs/victory |
895.1 ms | 935.2 ms | -4.29% |
| ❌ | es/minifier/libs/typescript |
4 s | 4.1 s | -3.87% |
| ❌ | es/minifier/libs/echarts |
1.6 s | 1.7 s | -3.93% |
| ❌ | es/minifier/libs/terser |
375.1 ms | 387.8 ms | -3.28% |
| ⚡ | es/full-target/es2018 |
588.8 µs | 575.8 µs | +2.26% |
| ❌ | es/full-target/es2017 |
629.6 µs | 647.5 µs | -2.76% |
| ❌ | es/resolver_with_hygiene/typescript |
758 ms | 795.9 ms | -4.75% |
| ❌ | es/resolver/typescript |
248.2 ms | 286 ms | -13.2% |
Comparing kdy1:kdy1/node-id-semantics (d3f14b7) with main (fe38342)
Footnotes
-
31 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
Summary
NodeIdtoswc_ecma_ast::Identas an occurrence id withNodeId(0)reserved as invalid/default.swc_ecma_transforms_base::semanticswith fresh node-id assignment plusScopeId,SymbolId, andReferenceIdside tables.IdentLike.Notes
This is intentionally experimental. The AST and semantics foundation is in place, but the minifier's internal
ProgramDatavariable keys are not fully migrated from legacyIdtoSymbolIdyet.Test Plan
git submodule update --init --recursivecargo test -p generate-code test_ecmascript -- --nocapturecargo check -p swc_ecma_ast -p swc_ecma_visit -p swc_ecma_hookscargo check -p swc_ecma_transforms_base -p swc_ecma_utils -p swc_ecma_minifierUPDATE=1 cargo test -p swc_ecma_transforms_base --test semantics -- --ignoredcargo test -p swc_ecma_transforms_base --test semantics -- --ignoredcargo test -p swc_ecma_transforms_basecargo test -p swc_ecma_utilscargo test -p swc_ecma_minifier(cd crates/swc_ecma_minifier && UPDATE=1 ./scripts/test.sh)(cd crates/swc_ecma_minifier && ./scripts/test.sh)(cd crates/swc_ecma_minifier && ./scripts/exec.sh)cargo check -p swc_ecma_transformer -p swc_typescript -p swc_ecma_transforms_modulecargo fmt --allcargo clippy --all --all-targets -- -D warningscargo test -p swc_ecma_ast -p swc_ecma_visit -p swc_ecma_hooks -p swc_ecma_transformer -p swc_typescript -p swc_estree_compatcargo test -p swc_bundler --libcargo test -p swc_bundler ...with integration tests was attempted, butswc_bundler --test denorequires a localdenoexecutable and failed withNo such file or directory.