perf: parallelize HashedPostStateSorted::from_reverts hashing/sorting #20148
Summary
Introduces a parallelized path for HashedPostStateSorted::from_reverts using rayon, gated behind a new parallel-from-reverts feature flag.
This optimization targets CPU bottlenecks caused by deep reorgs or blocks with skewed storage distributions. It implements an account count threshold (2,500) to ensure no regressions on small or standard blocks.
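As a rough illustration of what feature-gating a parallel path can look like, the sketch below compiles a rayon-backed sort only when the parallel-from-reverts feature is enabled and falls back to the standard library otherwise. The helper name sort_hashed_keys and the [u8; 32] key type are hypothetical and not taken from this PR; the sketch also assumes rayon is declared as an optional dependency that the feature turns on.

```rust
/// Hypothetical helper (not from the PR): sorts 32-byte hashed keys.
/// This variant is compiled only when the `parallel-from-reverts`
/// feature is enabled, which is assumed to pull in `rayon`.
#[cfg(feature = "parallel-from-reverts")]
pub fn sort_hashed_keys(keys: &mut [[u8; 32]]) {
    use rayon::prelude::*;
    // `par_sort_unstable` comes from rayon's `ParallelSliceMut` trait.
    keys.par_sort_unstable();
}

/// Sequential fallback used when the feature is disabled.
#[cfg(not(feature = "parallel-from-reverts"))]
pub fn sort_hashed_keys(keys: &mut [[u8; 32]]) {
    keys.sort_unstable();
}
```

Callers use sort_hashed_keys the same way in both configurations, so builds without the feature never compile or link the rayon code.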
Motivation
Closes #20049
Processing large state reverts involves two distinct phases: a DB walk and a CPU-bound hashing/sorting phase.
While the DB walk must remain sequential, the sorting phase becomes a bottleneck when thousands of accounts (or accounts with very large numbers of storage slots) need processing. This PR parallelizes the CPU-bound sorting phase, reducing wall-clock time for heavy blocks.
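The threshold-based dispatch can be sketched roughly as follows. This is not the PR's code: to stay dependency-free it swaps rayon for std::thread::scope with fixed-size chunks, and the names HashedKey, Slots, PARALLEL_ACCOUNT_THRESHOLD and sort_account_slots are hypothetical. Only the 2,500 account threshold comes from the description.

```rust
use std::thread;

/// Hypothetical stand-ins for hashed addresses/slots and per-account storage.
type HashedKey = [u8; 32];
type Slots = Vec<(HashedKey, u64)>;

/// Account count threshold quoted in the PR description.
const PARALLEL_ACCOUNT_THRESHOLD: usize = 2_500;

/// Sorts each account's slots by hashed key, then sorts the accounts.
/// Below the threshold everything runs on the calling thread.
fn sort_account_slots(accounts: &mut [(HashedKey, Slots)]) {
    if accounts.len() < PARALLEL_ACCOUNT_THRESHOLD {
        // Small input: the cost of spawning threads would outweigh the gain.
        for (_, slots) in accounts.iter_mut() {
            slots.sort_unstable_by_key(|(k, _)| *k);
        }
    } else {
        // Large input: split the accounts into one chunk per worker thread.
        let workers = thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1);
        let chunk = (accounts.len() + workers - 1) / workers;
        thread::scope(|s| {
            for part in accounts.chunks_mut(chunk) {
                s.spawn(move || {
                    for (_, slots) in part.iter_mut() {
                        slots.sort_unstable_by_key(|(k, _)| *k);
                    }
                });
            }
        }); // all spawned threads are joined when the scope ends
    }
    // The final account-level sort stays sequential in this sketch.
    accounts.sort_unstable_by_key(|(addr, _)| *addr);
}
```

Fixed chunks balance poorly when a few accounts hold most of the slots. A work-stealing pool such as rayon's spreads that load more evenly, which is presumably why the PR relies on it for skewed blocks.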
Implementation Details
- parallel-from-reverts (opt-in).
- par_sort_unstable) introduced too much overhead for typical slot counts (< 1,000).
- < 2,500: Always sequential (avoids Rayon overhead).
- >= 2,500: Parallel (distributes load, handles skewed accounts).
Benchmarks
1. Micro-Benchmarks (sorting_par_exp)
Measured purely the sorting/hashing overhead (excluding DB reads).
Scenarios: Acc_Low, Acc_Med, Acc_High, Skewed_10k.
Skewed distribution: 95% of accounts have 4 slots and 5% have 2,000 slots.
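To make the skew concrete, the sketch below builds slot counts with the 95% / 5% split described above and computes how much of the total work sits in the heavy accounts. The function names are hypothetical, and reading Skewed_10k as 10,000 accounts is an assumption based on the scenario name alone.

```rust
/// Hypothetical generator: every 20th account (5%) gets 2,000 slots,
/// the remaining 95% get 4 slots each.
fn skewed_slot_counts(accounts: usize) -> Vec<usize> {
    (0..accounts)
        .map(|i| if i % 20 == 0 { 2_000 } else { 4 })
        .collect()
}

/// Fraction of all slots that belong to the heavy (2,000-slot) accounts.
fn heavy_share(counts: &[usize]) -> f64 {
    let total: usize = counts.iter().sum();
    let heavy: usize = counts.iter().filter(|&&c| c == 2_000).sum();
    heavy as f64 / total as f64
}
```

With 10,000 accounts this gives 500 heavy accounts holding 1,000,000 of 1,038,000 slots, so about 96% of the sorting work is concentrated in 5% of the accounts.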
2. Integration Benchmarks (integration_bench)
Measured the full lifecycle: DB read -> hashing -> sorting -> allocation.
Scenarios: Small, Large_Uniform, Large_Skewed.
Since the total runtime is dominated by DB I/O, this actually represents a solid optimization of the available CPU-bound work. The threshold ensures no regression for small blocks.
Checklist
- rayon dependency (optional).
- parallel-from-reverts feature flag.