Description
Describe the bug
During and after backfill (pipeline) sync, RPC queries that resolve state via history_by_block_hash can silently return stale or incorrect state for blocks whose headers have been downloaded but whose execution has not yet completed. This affects any RPC method that resolves state by block hash — including eth_getBalance, eth_call, eth_getStorageAt, and all debug_trace* methods.
Root cause
There are two interacting bugs:
Bug 1: history_by_block_hash lacks the ensure_canonical_block guard
history_by_block_number correctly calls ensure_canonical_block(block_number) before returning state, rejecting queries for blocks that haven't been fully synced. However, history_by_block_hash skips this check entirely:

```rust
// blockchain_provider.rs L567-569: no guard
fn history_by_block_hash(&self, block_hash: BlockHash) -> ProviderResult<StateProviderBox> {
    self.consistent_provider()?.into_state_provider_at_block_hash(block_hash)
}
```

Critically, history_by_block_hash is the path used by most queries, not history_by_block_number. state_by_block_number_or_tag(Number(num)) converts the number to a hash first, then routes through state_by_block_hash → history_by_block_hash, bypassing the guarded history_by_block_number path. The same applies to state_by_block_id(BlockId::Hash(...)), which routes directly to history_by_block_hash. As a result, history_by_block_number (with its guard) is effectively dead code for most queries; it is only used for the Earliest tag.
The debug_trace* methods are also affected: debug_traceBlockByNumber resolves the block, then calls spawn_with_state_at_block(block.parent_hash().into(), ...), which routes through state_at_block_id(BlockId::Hash(parent_hash)) → history_by_block_hash — the unguarded path.
Bug 2: ConsistentProvider::best_block_number() returns header height, not execution height
Even if bug 1 is fixed by adding ensure_canonical_block to history_by_block_hash, the guard itself has a gap: ensure_canonical_block checks against best_block_number(), which falls back to self.last_block_number() when head_block is None:

```rust
// consistent.rs L762-763
fn best_block_number(&self) -> ProviderResult<BlockNumber> {
    self.head_block.as_ref().map(|b| Ok(b.number())).unwrap_or_else(|| self.last_block_number())
}
```

last_block_number() reads the highest entry in the CanonicalHeaders table, i.e. the header download frontier, which runs far ahead of execution during staged sync. ensure_canonical_block therefore compares against headers-downloaded rather than state-executed, so it passes for blocks whose state doesn't exist yet.
head_block is None after every backfill sync because on_backfill_sync_finished calls clear_state() (line 1743), which empties the in-memory state. Backfill is triggered when the gap exceeds MIN_BLOCKS_FOR_PIPELINE_RUN = EPOCH_SLOTS = 32 blocks.
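To make the gap concrete, here is a small hedged sketch (illustrative names and numbers; not reth's actual types) of the fallback behavior mid-backfill: headers downloaded to 1,000, execution at 960, in-memory state cleared, and a query for block 990:

```rust
// Illustrative model (NOT reth's actual API) of the best_block_number()
// fallback. After backfill, head_block is None, so the buggy fallback
// returns the header frontier and the guard passes for unexecuted blocks.
struct ConsistentProvider {
    head_block: Option<u64>,    // in-memory head; None after clear_state()
    canonical_headers_tip: u64, // highest CanonicalHeaders entry (download frontier)
    finish_checkpoint: u64,     // StageId::Finish checkpoint (execution progress)
}

impl ConsistentProvider {
    // Buggy fallback: header height.
    fn best_block_number_buggy(&self) -> u64 {
        self.head_block.unwrap_or(self.canonical_headers_tip)
    }

    // Fixed fallback: Finish stage checkpoint.
    fn best_block_number_fixed(&self) -> u64 {
        self.head_block.unwrap_or(self.finish_checkpoint)
    }

    fn ensure_canonical_block(&self, number: u64, best: u64) -> Result<(), String> {
        if number > best {
            Err(format!("block {number} not yet canonical (best: {best})"))
        } else {
            Ok(())
        }
    }
}

fn main() {
    // Mid-backfill: headers at 1_000, execution at 960, in-memory state cleared.
    let p = ConsistentProvider {
        head_block: None,
        canonical_headers_tip: 1_000,
        finish_checkpoint: 960,
    };
    // Query block 990: the header exists, the state does not.
    assert!(p.ensure_canonical_block(990, p.best_block_number_buggy()).is_ok()); // wrongly passes
    assert!(p.ensure_canonical_block(990, p.best_block_number_fixed()).is_err()); // correctly rejected
    println!("buggy fallback admits unexecuted block 990");
}
```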
Observable symptoms
- eth_getBalance / eth_getStorageAt: returns stale balances for recent blocks. For native balances, the mempool rejects transactions based on stale state (a loud failure). For ERC20 balances, the stale values propagate silently into EVM execution.
- debug_traceBlockByNumber: re-executes block transactions against stale parent state. Manifests as arithmetic overflow panics (Panic(0x11)) in DEX router calls where reserves are stale, or as phantom transactions appearing/disappearing in traces compared to stored receipts.
- eth_call / eth_estimateGas: returns incorrect results when called against recent block numbers.
Suggested fix
Both bugs need to be addressed together:
Fix 1: Add the ensure_canonical_block guard to history_by_block_hash:

```rust
fn history_by_block_hash(&self, block_hash: BlockHash) -> ProviderResult<StateProviderBox> {
    let provider = self.consistent_provider()?;
    let block_number = provider
        .block_number(block_hash)?
        .ok_or(ProviderError::BlockHashNotFound(block_hash))?;
    provider.ensure_canonical_block(block_number)?;
    provider.into_state_provider_at_block_hash(block_hash)
}
```

Fix 2: Change the best_block_number() fallback from last_block_number() (header height) to storage_provider.best_block_number() (the Finish stage checkpoint, i.e. actual execution progress):
```rust
fn best_block_number(&self) -> ProviderResult<BlockNumber> {
    self.head_block
        .as_ref()
        .map(|b| Ok(b.number()))
        .unwrap_or_else(|| self.storage_provider.best_block_number())
}
```

The database provider's best_block_number() returns the StageId::Finish checkpoint, which reflects the highest block at which all pipeline stages (including Execution) have completed.
Downstream report and fix
This was discovered and is being fixed downstream in bnb-chain/reth (BSC fork):
- Issue: bnb-chain/reth-bsc#273
- Fix PR: bnb-chain/reth#101 (includes both fixes)
Steps to reproduce
- Run a reth node on a fast-block chain (e.g., BSC at 0.45s blocks)
- Wait for the node to sync and begin serving traffic
- Introduce brief network lag (>14 seconds for BSC, >32 blocks for the target chain) to trigger a backfill sync cycle
- During or immediately after backfill, query eth_getBalance or debug_traceBlockByNumber for a recent block
- Compare the result against a fully-synced reference node
The stale state is silent — no error is returned, just incorrect data.
Node logs
Platform(s)
No response
Container Type
Kubernetes
What version/commit are you on?
v1.11.1
What database version are you on?
2
Which chain / network are you on?
mainnet
What type of node are you running?
Archive (default)
What prune config do you use, if any?
No response
If you've built Reth from source, provide the full command you used
No response
Code of Conduct
- I agree to follow the Code of Conduct