Description
After the architectural migration from ConsistentDbView + TrieInput to OverlayStateProviderFactory + LazyOverlay (around v1.10.x), state root computation performance has significantly regressed on high-throughput chains (e.g., BSC with 0.45-second block time and 1000+ TPS).
Observed behavior:
- `reth_sync_block_validation_state_root` reaches ~7 seconds at 1000 TPS (~450 tx/block)
- The same workload ran at 1800 TPS on the previous architecture (pre-v1.10, using `ConsistentDbView` + `TrieInput`)
- State root must complete within 0.45s to keep up with chain tip, but takes ~15x longer than the budget
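The figures in the list above follow from simple arithmetic. A minimal sketch, with hypothetical helper names, that reproduces the ~450 tx/block and ~15x numbers from the reported TPS, block time, and state root duration:

```rust
/// Transactions a block must carry to sustain `tps` at a given block time.
fn tx_per_block(tps: f64, block_time_secs: f64) -> f64 {
    tps * block_time_secs
}

/// How many times the observed state root duration exceeds the per-block budget,
/// taking the budget to be one block time.
fn budget_overrun(state_root_secs: f64, block_time_secs: f64) -> f64 {
    state_root_secs / block_time_secs
}
```

With the values from this report, `tx_per_block(1000.0, 0.45)` gives 450 and `budget_overrun(7.0, 0.45)` gives roughly 15.6.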
Root Cause Analysis
1. Removal of the prefix_sets guard for StateRootTask
In the previous architecture, there was a critical guard in validate_block_with_state:
```rust
// Old code (pre-OverlayFactory migration)
let trie_input = self.compute_trie_input(persisting_kind, ...);
// Use state root task only if prefix sets are empty, otherwise proof generation is
// too expensive because it requires walking over the paths in the prefix set in every proof.
if trie_input.prefix_sets.is_empty() {
    self.payload_processor.spawn(/* StateRootTask */)
} else {
    use_state_root_task = false;
    self.payload_processor.spawn_cache_exclusive(/* fallback to Parallel */)
}
```
This guard was added to fix #14683 (closed by #14729), which identified that large prefix_sets from ancestor blocks cause each multiproof request to walk all paths in the prefix set, making proof generation extremely expensive.
In the current architecture, plan_state_root_computation() no longer checks prefix sets:
```rust
// Current code (v1.10+)
const fn plan_state_root_computation(&self) -> StateRootStrategy {
    if self.config.skip_state_root_validation() || self.config.state_root_fallback() {
        StateRootStrategy::Synchronous
    } else if self.config.use_state_root_task() {
        StateRootStrategy::StateRootTask // Always used, no prefix_sets check
    } else {
        StateRootStrategy::Parallel
    }
}
```
The StateRootTask is now always selected when use_state_root_task() is true, regardless of how many unpersisted ancestor blocks exist in memory.
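For illustration, here is a standalone sketch of how the planner might look with the guard restored. It is not reth code: `PlannerConfig` is a hypothetical flattened view of the config flags, and `overlay_is_empty` is a hypothetical input standing in for the old `trie_input.prefix_sets.is_empty()` check. Only the enum variant names come from the current code.

```rust
/// Simplified stand-in for the strategy enum (variant names taken from the issue).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum StateRootStrategy {
    StateRootTask,
    Parallel,
    Synchronous,
}

/// Hypothetical flattened view of the config flags the planner reads.
struct PlannerConfig {
    skip_state_root_validation: bool,
    state_root_fallback: bool,
    use_state_root_task: bool,
}

/// `overlay_is_empty` is a hypothetical input that plays the role of the old
/// `trie_input.prefix_sets.is_empty()` check.
fn plan_state_root_computation(
    config: &PlannerConfig,
    overlay_is_empty: bool,
) -> StateRootStrategy {
    if config.skip_state_root_validation || config.state_root_fallback {
        StateRootStrategy::Synchronous
    } else if config.use_state_root_task && overlay_is_empty {
        StateRootStrategy::StateRootTask
    } else {
        // Either the task is disabled or ancestor trie data is pending:
        // take the parallel path, as the old guard did.
        StateRootStrategy::Parallel
    }
}
```

How the real planner would obtain `overlay_is_empty` without resolving the overlay early is left open here.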
2. Per-worker provider creation overhead
The old ProofTaskManager shared ConsistentDbView read transactions across proof workers and used pre-computed nodes_sorted/state_sorted from TrieInput via Arc.
The new ProofWorkerHandle creates a new OverlayStateProviderFactory::database_provider_ro() for each proof worker, which involves:
- Creating a new DB read transaction
- Resolving the `LazyOverlay` (computing trie data from in-memory ancestor blocks)
- Constructing the overlay state provider
With ~450 transactions per block at 0.45s block time generating many proof requests, this overhead becomes significant.
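The cost difference between per-worker resolution and a shared, pre-resolved overlay can be sketched with a toy model. Everything below is hypothetical (`ResolvedOverlay`, `resolve_overlay`, and both runner functions are illustrative names, not reth APIs). The sketch only counts how many times the expensive resolution step runs in each setup.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the trie data a resolved overlay would hold.
struct ResolvedOverlay {
    trie_nodes: Vec<u64>,
}

/// Simulates the expensive resolution step and counts how often it runs.
fn resolve_overlay(ancestor_blocks: usize, resolutions: &AtomicUsize) -> ResolvedOverlay {
    resolutions.fetch_add(1, Ordering::SeqCst);
    ResolvedOverlay { trie_nodes: (0..ancestor_blocks as u64).collect() }
}

/// Resolves once, then hands each worker a cheap `Arc` clone.
/// Returns the number of resolutions performed.
fn run_workers_shared(workers: usize, ancestor_blocks: usize) -> usize {
    let resolutions = AtomicUsize::new(0);
    let overlay = Arc::new(resolve_overlay(ancestor_blocks, &resolutions));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let overlay = Arc::clone(&overlay);
            thread::spawn(move || overlay.trie_nodes.len())
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    resolutions.load(Ordering::SeqCst)
}

/// Each worker resolves its own overlay, mirroring the per-worker cost described above.
/// Returns the number of resolutions performed.
fn run_workers_independent(workers: usize, ancestor_blocks: usize) -> usize {
    let resolutions = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let resolutions = Arc::clone(&resolutions);
            thread::spawn(move || resolve_overlay(ancestor_blocks, &resolutions).trie_nodes.len())
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    resolutions.load(Ordering::SeqCst)
}
```

In this model, the shared variant performs one resolution regardless of worker count, while the independent variant performs one per worker.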
3. Why high-throughput chains are disproportionately affected
| Parameter | Ethereum Mainnet | BSC (current) |
|---|---|---|
| Block time | 12 seconds | 0.45 seconds |
| TX per block | ~200 | ~450 (at 1000 TPS) |
| Blocks per second | ~0.08 | ~2.2 |
| Unpersisted ancestor blocks | Few | Very many (persistence can't keep up) |
| State root time budget | 12s | 0.45s |
With 0.45s block time, the chain produces ~2.2 blocks/second. The persistence service cannot write to disk fast enough, causing many blocks to accumulate in memory. Each block's overlay data compounds, making the OverlayStateProviderFactory's LazyOverlay resolution increasingly expensive.
The new architecture works well for Ethereum mainnet's workload (~200 tx/block, 12s block time). But on chains with sub-second block times, the accumulated overlay overhead and per-worker provider creation cost become a critical bottleneck.
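The backlog effect can be made concrete with a rough model. This sketch assumes a constant persistence rate, which is a simplification, and the rate used in the usage note is purely illustrative rather than a measured value. The function names are hypothetical.

```rust
/// Blocks produced per second for a given block time.
fn blocks_per_second(block_time_secs: f64) -> f64 {
    1.0 / block_time_secs
}

/// Unpersisted blocks held in memory after `elapsed_secs`, assuming a constant
/// (hypothetical) persistence rate. Clamped at zero: when persistence keeps up,
/// no backlog builds.
fn unpersisted_backlog(block_time_secs: f64, persisted_per_sec: f64, elapsed_secs: f64) -> f64 {
    let net_rate = blocks_per_second(block_time_secs) - persisted_per_sec;
    net_rate.max(0.0) * elapsed_secs
}
```

For example, if persistence managed one block per second (an assumed figure), a 0.45s block time would leave about 73 blocks unpersisted after one minute, while a 12s block time would leave none.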
Relationship to Previous Issues
- #14683: "`TrieInput` with large prefix sets slows down State Root Task multiproofs". This was the original bug, fixed by adding the prefix_sets guard in #14729.
- #14417: "State Root Task has >500ms spikes of `newPayload` latency on Base". Tracking issue for the same class of problems.
The fix in #14729 was effective, but when TrieInput was replaced by OverlayStateProviderFactory, the guard was not ported to the new architecture.
Suggested Solutions
- Reintroduce a prefix_sets-like guard: When the overlay contains significant trie data from ancestor blocks (many unpersisted blocks), fall back to `StateRootStrategy::Parallel` instead of `StateRootTask`.
- Optimize `OverlayStateProviderFactory` for proof workers: Cache the resolved overlay and share it across proof workers instead of creating independent providers per worker. The old `ConsistentDbView` approach was more efficient because it shared data via `Arc`.
- Add a configurable threshold: Allow chains to configure when to use `StateRootTask` vs `Parallel` based on expected transaction throughput or block time.
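One possible shape for the configurable threshold, as a minimal sketch. `StateRootTaskLimits` and `max_unpersisted_ancestors` are hypothetical names, not existing reth options, and counting unpersisted ancestors is only one candidate signal among those mentioned above.

```rust
/// Reduced strategy enum for this sketch (variant names taken from the issue).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum StateRootStrategy {
    StateRootTask,
    Parallel,
}

/// Hypothetical per-chain setting; not an existing reth option.
struct StateRootTaskLimits {
    /// Largest number of unpersisted ancestor blocks for which the
    /// state root task is still preferred.
    max_unpersisted_ancestors: usize,
}

/// Picks the task while the in-memory backlog is within the configured limit,
/// and the parallel strategy once it grows past it.
fn choose_strategy(
    limits: &StateRootTaskLimits,
    unpersisted_ancestors: usize,
) -> StateRootStrategy {
    if unpersisted_ancestors <= limits.max_unpersisted_ancestors {
        StateRootStrategy::StateRootTask
    } else {
        StateRootStrategy::Parallel
    }
}
```

A chain with sub-second blocks could set a low limit, while Ethereum mainnet could keep a high one and retain the current behavior.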
Platform
- reth version: v1.10.2 (via bnb-chain/reth fork), confirmed same pattern in v1.11.1 main branch
- Chain: BSC (BNB Smart Chain), 0.45-second block time
- Workload: 1000 TPS benchmark (~450 tx/block)
- Hardware: Standard validator-grade server
Metrics
```
# At 1000 TPS on BSC (0.45s block time, ~450 tx/block):
reth_sync_block_validation_state_root_duration: ~7s (budget: 0.45s)
# Previous version (pre-OverlayFactory, with prefix_sets guard):
# Successfully handled 1800 TPS on same hardware
```