Description
Describe the bug
During and after backfill (pipeline) sync, RPC queries that resolve state via history_by_block_hash can silently return stale or incorrect state for blocks whose headers have been downloaded but whose execution has not yet completed. This affects any RPC method that resolves state by block hash — including eth_getBalance, eth_call, eth_getStorageAt, and all debug_trace* methods.
Root cause
There are two interacting bugs:
Bug 1: history_by_block_hash lacks the ensure_canonical_block guard
history_by_block_number correctly calls ensure_canonical_block(block_number) before returning state, rejecting queries for blocks that haven't been fully synced. However, history_by_block_hash skips this check entirely:

```rust
// blockchain_provider.rs L567-569: no guard
fn history_by_block_hash(&self, block_hash: BlockHash) -> ProviderResult<StateProviderBox> {
    self.consistent_provider()?.into_state_provider_at_block_hash(block_hash)
}
```

Critically, history_by_block_hash is the path used by most queries, not history_by_block_number. state_by_block_number_or_tag(Number(num)) converts the number to a hash first, then routes through state_by_block_hash → history_by_block_hash, bypassing the guarded history_by_block_number path. The same applies to state_by_block_id(BlockId::Hash(...)), which routes directly to history_by_block_hash. As a result, history_by_block_number (with its guard) is effectively dead code for most queries; it is only used for the Earliest tag.
The debug_trace* methods are also affected: debug_traceBlockByNumber resolves the block, then calls spawn_with_state_at_block(block.parent_hash().into(), ...), which routes through state_at_block_id(BlockId::Hash(parent_hash)) → history_by_block_hash — the unguarded path.
Bug 2: ConsistentProvider::best_block_number() returns header height, not execution height
Even if bug 1 is fixed by adding ensure_canonical_block to history_by_block_hash, the guard itself has a gap: ensure_canonical_block checks against best_block_number(), which falls back to self.last_block_number() when head_block is None:

```rust
// consistent.rs L762-763
fn best_block_number(&self) -> ProviderResult<BlockNumber> {
    self.head_block.as_ref().map(|b| Ok(b.number())).unwrap_or_else(|| self.last_block_number())
}
```

last_block_number() reads the highest entry in the CanonicalHeaders table, i.e. the header download frontier, which runs far ahead of execution during staged sync. ensure_canonical_block therefore compares against headers-downloaded rather than state-executed, so it passes for blocks whose state doesn't exist yet.
head_block is None after every backfill sync because on_backfill_sync_finished calls clear_state() (line 1743), which empties the in-memory state. Backfill is triggered when the gap exceeds MIN_BLOCKS_FOR_PIPELINE_RUN = EPOCH_SLOTS = 32 blocks.
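To make the gap concrete, here is a small hedged sketch (illustrative names and numbers; not reth's actual types) of the fallback behavior mid-backfill: headers downloaded to 1,000, execution at 960, in-memory state cleared, and a query for block 990:

```rust
// Illustrative model (NOT reth's actual API) of the best_block_number()
// fallback. After backfill, head_block is None, so the buggy fallback
// returns the header frontier and the guard passes for unexecuted blocks.
struct ConsistentProvider {
    head_block: Option<u64>,    // in-memory head; None after clear_state()
    canonical_headers_tip: u64, // highest CanonicalHeaders entry (download frontier)
    finish_checkpoint: u64,     // StageId::Finish checkpoint (execution progress)
}

impl ConsistentProvider {
    // Buggy fallback: header height.
    fn best_block_number_buggy(&self) -> u64 {
        self.head_block.unwrap_or(self.canonical_headers_tip)
    }

    // Fixed fallback: Finish stage checkpoint.
    fn best_block_number_fixed(&self) -> u64 {
        self.head_block.unwrap_or(self.finish_checkpoint)
    }

    fn ensure_canonical_block(&self, number: u64, best: u64) -> Result<(), String> {
        if number > best {
            Err(format!("block {number} not yet canonical (best: {best})"))
        } else {
            Ok(())
        }
    }
}

fn main() {
    // Mid-backfill: headers at 1_000, execution at 960, in-memory state cleared.
    let p = ConsistentProvider {
        head_block: None,
        canonical_headers_tip: 1_000,
        finish_checkpoint: 960,
    };
    // Query block 990: the header exists, the state does not.
    assert!(p.ensure_canonical_block(990, p.best_block_number_buggy()).is_ok()); // wrongly passes
    assert!(p.ensure_canonical_block(990, p.best_block_number_fixed()).is_err()); // correctly rejected
    println!("buggy fallback admits unexecuted block 990");
}
```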
Observable symptoms
- eth_getBalance / eth_getStorageAt: returns stale balances for recent blocks. For native balances, the mempool rejects transactions based on stale state (a loud failure). For ERC20 balances, the stale values propagate silently into EVM execution.
- debug_traceBlockByNumber: re-executes block transactions against stale parent state. Manifests as arithmetic overflow panics (Panic(0x11)) in DEX router calls where reserves are stale, or as phantom transactions appearing/disappearing in traces compared to stored receipts.
- eth_call / eth_estimateGas: returns incorrect results when called against recent block numbers.
Suggested fix
Both bugs need to be addressed together:
Fix 1: Add the ensure_canonical_block guard to history_by_block_hash:

```rust
fn history_by_block_hash(&self, block_hash: BlockHash) -> ProviderResult<StateProviderBox> {
    let provider = self.consistent_provider()?;
    let block_number = provider
        .block_number(block_hash)?
        .ok_or(ProviderError::BlockHashNotFound(block_hash))?;
    provider.ensure_canonical_block(block_number)?;
    provider.into_state_provider_at_block_hash(block_hash)
}
```

Fix 2: Change the best_block_number() fallback from last_block_number() (header height) to storage_provider.best_block_number() (the Finish stage checkpoint, i.e. actual execution progress):
```rust
fn best_block_number(&self) -> ProviderResult<BlockNumber> {
    self.head_block
        .as_ref()
        .map(|b| Ok(b.number()))
        .unwrap_or_else(|| self.storage_provider.best_block_number())
}
```

The database provider's best_block_number() returns the StageId::Finish checkpoint, which reflects the highest block at which all pipeline stages (including Execution) have completed.
Downstream report and fix
This was discovered and is being fixed downstream in bnb-chain/reth (BSC fork):
- Issue: bnb-chain/reth-bsc#273
- Fix PR: bnb-chain/reth#101 (includes both fixes)
Steps to reproduce
- Run a reth node on a fast-block chain (e.g., BSC at 0.45s blocks)
- Wait for the node to sync and begin serving traffic
- Introduce brief network lag (>14 seconds for BSC, >32 blocks for the target chain) to trigger a backfill sync cycle
- During or immediately after backfill, query eth_getBalance or debug_traceBlockByNumber for a recent block
- Compare the result against a fully-synced reference node
The stale state is silent — no error is returned, just incorrect data.
Node logs
Platform(s)
No response
Container Type
Kubernetes
What version/commit are you on?
v1.11.1
What database version are you on?
2
Which chain / network are you on?
mainnet
What type of node are you running?
Archive (default)
What prune config do you use, if any?
No response
If you've built Reth from source, provide the full command you used
No response
Code of Conduct
- I agree to follow the Code of Conduct