
Conversation


@yongkangc yongkangc commented Dec 2, 2025

Problem

When state updates arrive faster than proof computation, each message triggers a separate proof task, causing redundant trie traversals for the same accounts.

Solution

Batch consecutive same-type messages before dispatching to workers.

Batching Logic

Dispatch-then-batch:

  1. Dispatch first message immediately — Don't wait; start proof computation right away
  2. Accumulate while workers process — Drain channel for more same-type messages (same source for StateUpdate)
  3. Process accumulated batch — Merge targets and dispatch as single work item

Different-type messages encountered during drain are stored in pending_msg and processed in the next iteration to preserve ordering.
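
A minimal sketch of this drain logic (simplified stand-ins for MultiProofMessage, the merge rules, and the worker dispatch, not reth's actual types; for simplicity the first message is folded into the batch, as the final revision of this PR does):

```rust
use std::sync::mpsc::Receiver;

#[derive(Debug)]
enum Msg {
    StateUpdate { source: u64, accounts: Vec<u32> },
    Prefetch { targets: Vec<u32> },
}

/// Same-type check: prefetches batch together, state updates only per source.
fn can_batch(a: &Msg, b: &Msg) -> bool {
    match (a, b) {
        (Msg::StateUpdate { source: s1, .. }, Msg::StateUpdate { source: s2, .. }) => s1 == s2,
        (Msg::Prefetch { .. }, Msg::Prefetch { .. }) => true,
        _ => false,
    }
}

/// Merge `from` into `into`; callers check `can_batch` first.
fn merge(into: &mut Msg, from: Msg) {
    match (into, from) {
        (Msg::StateUpdate { accounts, .. }, Msg::StateUpdate { accounts: mut a, .. }) => {
            accounts.append(&mut a)
        }
        (Msg::Prefetch { targets }, Msg::Prefetch { targets: mut t }) => targets.append(&mut t),
        _ => unreachable!("can_batch is checked before merging"),
    }
}

/// Stand-in for handing a work item to the proof workers.
fn dispatch(work: Msg) {
    println!("dispatching {work:?}");
}

fn process_message(first: Msg, rx: &Receiver<Msg>, pending: &mut Option<Msg>) {
    // Start the batch from the first message, then drain the channel for
    // consecutive messages that can be merged into it.
    let mut batch = first;
    while let Ok(next) = rx.try_recv() {
        if can_batch(&batch, &next) {
            merge(&mut batch, next);
        } else {
            // A different message type (or a StateUpdate from another source)
            // stops the drain; it is kept in `pending` and handled on the next
            // loop iteration, preserving the original ordering.
            *pending = Some(next);
            break;
        }
    }
    // Dispatch the accumulated messages as a single work item.
    dispatch(batch);
}
```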

Changes

  • Add process_multiproof_message() with dispatch-then-batch logic
  • Add pending_msg field to preserve message ordering across iterations

Expected Impact

  • Fewer proof computations under load (N messages merged into 1 batch)
  • Smaller pending message queue and lower per-message dispatch overhead

Benches


Detailed flow diagrams

fixes #19168

@github-project-automation github-project-automation bot moved this to Backlog in Reth Tracker Dec 2, 2025
@yongkangc yongkangc changed the title from "fix(engine): preserve message ordering in multiproof batching" to "perf(engine): batch multiproof messages" Dec 2, 2025
@yongkangc yongkangc self-assigned this Dec 2, 2025
@yongkangc yongkangc requested a review from Copilot December 2, 2025 07:56
Copilot finished reviewing on behalf of yongkangc December 2, 2025 08:02

Copilot AI left a comment

Pull request overview

This PR introduces message batching optimization for the multiproof task to improve performance, while simultaneously fixing a critical ordering issue. The previous approach of sending non-matching messages back to the channel moved them to the end of the queue, breaking StateUpdate ordering required by ProofSequencer. The solution stores pending messages locally and handles them immediately after batch processing.

Key changes:

  • Add batching logic for consecutive PrefetchProofs messages to reduce processing overhead
  • Add batching logic for consecutive StateUpdate messages to merge state updates efficiently
  • Fix message ordering by using local pending_msg buffer instead of re-sending to channel



yongkangc commented Dec 3, 2025

Flow Diagrams

Before: No Batching

Channel: [StateUpdate1, StateUpdate2, StateUpdate3, PrefetchProofs1]

Loop:
  recv StateUpdate1   -->  dispatch proof #1
  recv StateUpdate2   -->  dispatch proof #2
  recv StateUpdate3   -->  dispatch proof #3
  recv PrefetchProofs1  -->  dispatch proof #4

Result: 4 messages, 4 proof tasks, 4 trie traversals

After: With Batching

Channel: [StateUpdate1, StateUpdate2, StateUpdate3, PrefetchProofs1]
State:   pending_msg = None

Loop iteration 1:
  pending_msg = None, so select recv StateUpdate1
  |
  +-- process_multiproof_message(StateUpdate1)
        |
        +-- try_recv() StateUpdate2, same type, MERGE
        +-- try_recv() StateUpdate3, same type, MERGE  
        +-- try_recv() PrefetchProofs1, different type
        |         |
        |         +-- pending_msg = Some(PrefetchProofs1)
        |
        +-- dispatch ONE merged batch as proof #1

Loop iteration 2:
  pending_msg = Some(PrefetchProofs1), skip select
  |
  +-- process_multiproof_message(PrefetchProofs1)
        |
        +-- dispatch proof #2

Result: 4 messages, 2 proof tasks, 2 trie traversals

Message Type Handling

Message Type         | Batching
StateUpdate          | Merge: combine state fields
PrefetchProofs       | Merge: extend targets
FinishedStateUpdates | No batch (signal)
EmptyProof           | No batch (forward as-is)

@yongkangc yongkangc requested a review from Copilot December 3, 2025 11:42
@yongkangc yongkangc force-pushed the yk/batch-multiproof-fix branch from 1427209 to a8031e6 Compare December 3, 2025 11:43
Copilot finished reviewing on behalf of yongkangc December 3, 2025 11:46

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.



yongkangc and others added 8 commits December 3, 2025 14:31
When batching PrefetchProofs or StateUpdate messages, if a different
message type is encountered during try_recv(), it was being sent back
to the channel via self.tx.send(). This puts the message at the END
of the queue instead of preserving its position, breaking the ordering
of StateUpdate messages which is critical for ProofSequencer.

Fix: Store the pending message in a local buffer and handle it
immediately after processing the batch, before the next select_biased
iteration.
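
In sketch form, the resulting loop shape (illustrative names only; in reth the channel recv is one arm of a select_biased! over several message sources):

```rust
use std::sync::mpsc::Receiver;

fn run_loop<M>(rx: Receiver<M>, mut handle: impl FnMut(M, &mut Option<M>)) {
    let mut pending_msg: Option<M> = None;
    loop {
        let msg = match pending_msg.take() {
            // A message stashed during the previous batch drain is handled
            // first, so it keeps its place in the stream.
            Some(m) => m,
            // Otherwise wait for the next message from the channel.
            None => match rx.recv() {
                Ok(m) => m,
                Err(_) => break, // channel closed
            },
        };
        // `handle` may drain further messages and stash a non-matching one
        // back into `pending_msg` for the next iteration.
        handle(msg, &mut pending_msg);
    }
}
```
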
Introduces a new method to process multiproof messages, allowing for the batching of consecutive same-type messages. This change improves efficiency by reducing redundant trie traversals when messages arrive in quick succession. The implementation ensures that if a different message type is encountered, it is stored for processing in the next iteration, preserving the order of messages. Additionally, tests have been added to verify that message ordering is maintained during batching.

Fixes clippy warning about unused-mut in multiproof tests.
@yongkangc yongkangc force-pushed the yk/batch-multiproof-fix branch from a8031e6 to 6a38e49 Compare December 4, 2025 03:08
Introduces constants for maximum batch targets and messages to improve multiproof message processing efficiency. Adds an estimation function for target counts during batching, ensuring that the number of processed messages does not exceed defined limits. Additionally, new histograms are implemented to track the sizes of prefetch and state update batches, enhancing observability of the batching process.
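
A minimal sketch of such a cap check (the constant values are the defaults this thread later settles on; the function name and signature are hypothetical):

```rust
const DEFAULT_MAX_BATCH_TARGETS: usize = 500;
const DEFAULT_MAX_BATCH_MESSAGES: usize = 16;

/// Returns true if one more message, estimated to add `next_targets` proof
/// targets, still fits within the batch limits.
fn fits_in_batch(batched_messages: usize, batched_targets: usize, next_targets: usize) -> bool {
    batched_messages < DEFAULT_MAX_BATCH_MESSAGES
        && batched_targets + next_targets <= DEFAULT_MAX_BATCH_TARGETS
}
```
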
…lism

Decreased DEFAULT_MAX_BATCH_TARGETS from 500 to 50 and DEFAULT_MAX_BATCH_MESSAGES from 16 to 2 to improve processing efficiency and maintain pipeline parallelism during multiproof message handling.
…ciency

Reverted DEFAULT_MAX_BATCH_TARGETS to 500 and DEFAULT_MAX_BATCH_MESSAGES to 16 to improve processing efficiency. Implemented a dispatch-then-batch strategy for handling PrefetchProofs and StateUpdate messages, allowing immediate dispatch of the first message while accumulating additional messages for batch processing. This change aims to maintain pipeline flow and optimize message handling.
Add metrics to understand why chunking is skipped:
- available_storage_workers_at_dispatch_histogram: workers available when dispatching
- available_account_workers_at_dispatch_histogram: workers available when dispatching
- chunking_skipped_busy: counter for when chunking skipped due to busy workers
- chunking_skipped_below_threshold: counter for when targets below chunking threshold
- chunking_performed: counter for successful chunking

These metrics help diagnose the vicious cycle where chunking is disabled
exactly when it's most needed (when workers are busy with large proofs).
Problem:
The batching feature caused an 80% regression in newPayload latency. Root cause
analysis revealed three issues:

1. **Vicious cycle in chunking logic**: The `should_chunk` condition checked if
   workers were available before chunking. When workers were busy (available=0),
   chunking was skipped, creating larger proofs that kept workers busy longer.
   Metrics showed proof chunks = 1 (no chunking) despite large batches.

2. **Priority inversion with pending_msg**: The main loop processed pending
   messages before checking proof results, violating select_biased! priority
   and starving worker completion signals.

3. **estimate_evm_state_targets undercount**: The estimate was 1 less per
   account with storage slots, allowing batches to exceed the 500 target limit.

Solution:
- Remove worker availability check from chunking decision - always chunk if
  targets exceed threshold. Workers will pick up chunks as they become free.
- Drain proof results with try_recv() BEFORE processing pending messages to
  maintain correct priority.
- Fix estimate to match HashedPostState::chunking_length: 1 + changed_slots
  instead of 1 + changed_slots.saturating_sub(1).

Changes:
- Remove unused `max_targets_for_chunking` field and constant
- Remove `chunking_skipped_busy` counter (no longer applicable)
- Add priority drain loop for proof results before pending_msg processing
- Fix estimate_evm_state_targets formula

Expected Impact:
- Chunking will now happen regardless of worker availability
- Proof results will be processed with proper priority
- Batch sizes will stay within limits
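
For illustration, a sketch of the corrected estimate (hypothetical signature; the real function walks reth's EvmState):

```rust
use std::collections::HashMap;

type Address = [u8; 20];

/// Each changed account counts as one target plus one target per changed
/// storage slot, matching HashedPostState::chunking_length.
fn estimate_evm_state_targets(changed_slots_per_account: &HashMap<Address, usize>) -> usize {
    changed_slots_per_account
        .values()
        // previously: 1 + changed_slots.saturating_sub(1), which undercounted
        // by one for every account with storage changes
        .map(|&changed_slots| 1 + changed_slots)
        .sum()
}
```
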
- Introduced a new function `same_state_change_source` to determine if two state change sources originate from the same source.
- Updated the batching logic in `MultiProofTask` to utilize this function, ensuring that state updates from different sources are correctly separated during processing.
- Added a test case to verify that the batching mechanism correctly handles and separates state updates from different sources, maintaining the integrity of the batching process.
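
A hedged sketch of what such a check might look like (the enum variants are illustrative stand-ins, not reth's actual StateChangeSource definition):

```rust
enum StateChangeSource {
    /// State changes produced by the transaction at this index in the block.
    Transaction(usize),
    /// State changes applied after transaction execution.
    PostBlock,
}

/// Two StateUpdate messages are only merged when their sources match; a
/// mismatch is treated like a different message type and ends the drain.
fn same_state_change_source(a: &StateChangeSource, b: &StateChangeSource) -> bool {
    match (a, b) {
        (StateChangeSource::Transaction(i), StateChangeSource::Transaction(j)) => i == j,
        (StateChangeSource::PostBlock, StateChangeSource::PostBlock) => true,
        _ => false,
    }
}
```
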
Copilot finished reviewing on behalf of yongkangc December 5, 2025 13:58

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 8 comments.



…ogic

- Eliminated several unused metrics related to worker availability and chunking performance from `MultiProofTaskMetrics`.
- Updated the batching logic to simplify the handling of state updates, ensuring that the first message is dispatched immediately while accumulating additional messages for batch processing.
- Improved code clarity by consolidating conditional checks for batching state updates.
- Added descriptive comments to several test cases in `multiproof.rs` to clarify their purpose and expected behavior.
- Improved documentation for tests related to batching of prefetch proofs, state updates, and message ordering, ensuring better understanding and maintainability of the test suite.
@yongkangc
Member Author

Mainnet Bench on reth4 1000 blocks

[benchmark results image]

}
},
recv(self.rx) -> message => {
match message {

Member Author

this is extracted to process_multiproof_message

Copilot finished reviewing on behalf of yongkangc December 5, 2025 14:26

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

crates/engine/tree/src/tree/payload_processor/multiproof.rs:642

  • The documentation states "number of messages merged" but due to the batch size recording issue (see previous comments), the actual values recorded are inconsistent. When the first message is dispatched immediately and no additional messages are batched, it records 1. However, when additional messages are batched after the first immediate dispatch, it only records the count of the additional messages, not including the first one.

Update the documentation to clarify the actual semantics, or fix the recording logic to be consistent.

    /// Histogram of sparse trie update durations.
    pub sparse_trie_update_duration_histogram: Histogram,
    /// Histogram of sparse trie final update durations.
    pub sparse_trie_final_update_duration_histogram: Histogram,


@yongkangc yongkangc marked this pull request as ready for review December 5, 2025 14:32
@yongkangc yongkangc moved this from Backlog to In Progress in Reth Tracker Dec 5, 2025
@yongkangc yongkangc moved this from In Progress to In Review in Reth Tracker Dec 5, 2025
self.multiproof_manager.proof_worker_handle.available_account_workers();
let available_storage_workers =
self.multiproof_manager.proof_worker_handle.available_storage_workers();
let outcome = dispatch_with_chunking(

Collaborator

nit: it looks like chunk got renamed to outcome; if we're still returning the number of chunks here, then chunks or num_chunks would be a clearer name I think

self.multiproof_manager.proof_worker_handle.available_account_workers();
let available_storage_workers =
self.multiproof_manager.proof_worker_handle.available_storage_workers();
let outcome = dispatch_with_chunking(

Collaborator

same nit here as with the prewarm case

debug!(target: "engine::tree::payload_processor::multiproof", "Started state root calculation");
}

// Dispatch first message immediately

Collaborator

Did you do some testing that confirmed that doing this is impactful? My intuition says that it wouldn't matter a whole lot, since it's just the results of one tx; more likely than not it would only occupy a single worker while the rest are waiting for further batched messages.

fn process_multiproof_message(
&mut self,
msg: MultiProofMessage,
pending_msg: &mut Option<MultiProofMessage>,

Collaborator

It'd be nice to pull out a lot (or all) of these &muts into a separate struct, like MultiproofBatchCtx or something. It seems like a lot are just for metrics; if so, maybe two structs: MultiproofBatchCtx and MultiproofBatchMetrics.

Addresses reviewer nit: the variable returned from dispatch_with_chunking
represents number of chunks, so the name should reflect that.

Extract &mut parameters from process_multiproof_message into:
- MultiproofBatchCtx: core processing state (pending_msg, timing, updates_finished)
- MultiproofBatchMetrics: counters for proofs processed/requested

This improves code organization and reduces function parameter count.
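
Roughly, the shape such a split might take (field names follow the commit message; the types are simplified stand-ins, not reth's actual definitions):

```rust
use std::time::Instant;

/// Core batching state threaded through process_multiproof_message.
struct MultiproofBatchCtx<M> {
    /// Non-matching message preserved for the next loop iteration.
    pending_msg: Option<M>,
    /// Timing information used when recording batch durations.
    last_update: Option<Instant>,
    /// Set once FinishedStateUpdates has been observed.
    updates_finished: bool,
}

/// Counters recorded while batching.
#[derive(Default)]
struct MultiproofBatchMetrics {
    proofs_processed: u64,
    proofs_requested: u64,
}
```
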
Remove the "dispatch first message immediately" optimization per reviewer
feedback. The first message is now included in the batch and dispatched
together with accumulated messages, simplifying the code.

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

perf: Batch multiproof tasks at MultiProofTask message loop

4 participants