perf(engine): batch multiproof messages #20066
Conversation
Pull request overview
This PR introduces a message batching optimization for the multiproof task to improve performance, while also fixing a critical ordering issue. The previous approach of sending non-matching messages back to the channel moved them to the end of the queue, breaking the StateUpdate ordering required by ProofSequencer. The new approach stores pending messages locally and handles them immediately after batch processing.
Key changes:
- Add batching logic for consecutive PrefetchProofs messages to reduce processing overhead
- Add batching logic for consecutive StateUpdate messages to merge state updates efficiently
- Fix message ordering by using a local pending_msg buffer instead of re-sending to the channel (see the sketch below)
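A minimal sketch of the drain-and-batch idea, assuming a simplified `Msg` enum and `std::sync::mpsc` in place of the actual `MultiProofMessage` type and channel used in reth; all names here are illustrative only:

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

// Illustrative stand-in for reth's MultiProofMessage; not the real type.
enum Msg {
    PrefetchProofs(Vec<u64>),
    StateUpdate(Vec<(u64, u64)>),
    FinishedStateUpdates,
}

/// Drain consecutive PrefetchProofs messages into one batch. A message of a
/// different type is NOT sent back to the channel (that would move it to the
/// end of the queue); it is held in `pending` and handled on the next loop
/// iteration, preserving its original position in the stream.
fn drain_prefetch_batch(
    rx: &Receiver<Msg>,
    first: Vec<u64>,
    pending: &mut Option<Msg>,
) -> Vec<u64> {
    let mut batch = first;
    loop {
        match rx.try_recv() {
            Ok(Msg::PrefetchProofs(targets)) => batch.extend(targets),
            Ok(other) => {
                *pending = Some(other);
                break;
            }
            Err(TryRecvError::Empty | TryRecvError::Disconnected) => break,
        }
    }
    batch
}
```

The same shape applies to consecutive StateUpdate messages, with the batch accumulating merged state updates instead of prefetch targets.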
Flow diagrams (collapsed): Before: No Batching; After: With Batching; Message Type Handling.
Force-pushed from 1427209 to a8031e6.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
When batching PrefetchProofs or StateUpdate messages, if a different message type was encountered during try_recv(), it was sent back to the channel via self.tx.send(). This put the message at the END of the queue instead of preserving its position, breaking the ordering of StateUpdate messages, which is critical for ProofSequencer. Fix: store the pending message in a local buffer and handle it immediately after processing the batch, before the next select_biased iteration.
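A sketch of the main-loop shape after the fix, reusing a simplified message type; the actual reth loop selects over several channels and is more involved:

```rust
use std::sync::mpsc::Receiver;

// Illustrative message type; the real MultiProofMessage has more variants.
enum Msg {
    StateUpdate(u64),
    PrefetchProofs(u64),
}

/// Main-loop shape after the fix (a sketch, not the actual reth loop): a message
/// buffered during batching is handled before anything new is pulled from the
/// channel, so it keeps its position relative to later StateUpdate messages.
fn run(rx: Receiver<Msg>, mut handle: impl FnMut(Msg, &mut Option<Msg>)) {
    let mut pending_msg: Option<Msg> = None;
    loop {
        let msg = if let Some(m) = pending_msg.take() {
            m
        } else {
            match rx.recv() {
                Ok(m) => m,
                Err(_) => break, // channel closed, all senders dropped
            }
        };
        handle(msg, &mut pending_msg);
    }
}
```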
Introduces a new method to process multiproof messages, allowing for the batching of consecutive same-type messages. This change improves efficiency by reducing redundant trie traversals when messages arrive in quick succession. The implementation ensures that if a different message type is encountered, it is stored for processing in the next iteration, preserving the order of messages. Additionally, tests have been added to verify that message ordering is maintained during batching.
Fixes a clippy warning about unused_mut in the multiproof tests.
Force-pushed from a8031e6 to 6a38e49.
Introduces constants for maximum batch targets and messages to improve multiproof message processing efficiency. Adds an estimation function for target counts during batching, ensuring that the number of processed messages does not exceed defined limits. Additionally, new histograms are implemented to track the sizes of prefetch and state update batches, enhancing observability of the batching process.
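A rough sketch of how such limits and a target estimate could interact. The constant values and names below follow the commit messages in this PR (500 targets, 16 messages); the actual constants, estimate, and histogram registration live in reth's multiproof module:

```rust
// Values as described in the surrounding commit messages.
const DEFAULT_MAX_BATCH_TARGETS: usize = 500;
const DEFAULT_MAX_BATCH_MESSAGES: usize = 16;

/// Rough per-update target estimate: one target per changed account plus one per
/// changed storage slot (illustrative; the exact formula is discussed in a later commit).
fn estimate_targets(changed_accounts: &[(u64, usize)]) -> usize {
    changed_accounts.iter().map(|(_, changed_slots)| 1 + changed_slots).sum()
}

/// Stop accumulating further messages once either limit is reached, so a single
/// batch cannot grow without bound.
fn batch_is_full(batched_messages: usize, batched_targets: usize) -> bool {
    batched_messages >= DEFAULT_MAX_BATCH_MESSAGES || batched_targets >= DEFAULT_MAX_BATCH_TARGETS
}
```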
Decreased DEFAULT_MAX_BATCH_TARGETS from 500 to 50 and DEFAULT_MAX_BATCH_MESSAGES from 16 to 2 to improve processing efficiency and maintain pipeline parallelism during multiproof message handling.
Reverted DEFAULT_MAX_BATCH_TARGETS to 500 and DEFAULT_MAX_BATCH_MESSAGES to 16 to improve processing efficiency. Implemented a dispatch-then-batch strategy for handling PrefetchProofs and StateUpdate messages, allowing immediate dispatch of the first message while accumulating additional messages for batch processing. This change aims to maintain pipeline flow and optimize message handling.
Add metrics to understand why chunking is skipped:
- available_storage_workers_at_dispatch_histogram: storage workers available when dispatching
- available_account_workers_at_dispatch_histogram: account workers available when dispatching
- chunking_skipped_busy: counter for chunking skipped due to busy workers
- chunking_skipped_below_threshold: counter for targets below the chunking threshold
- chunking_performed: counter for successful chunking
These metrics help diagnose the vicious cycle where chunking is disabled exactly when it's most needed (when workers are busy with large proofs).
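A plain-struct sketch of the diagnostics listed above; the real metrics are registered through reth's metrics infrastructure rather than stored as plain fields like this:

```rust
/// Illustrative container for the chunking diagnostics described in this commit.
#[derive(Default)]
struct ChunkingDiagnostics {
    /// Histogram samples of storage workers available at dispatch time.
    available_storage_workers_at_dispatch: Vec<usize>,
    /// Histogram samples of account workers available at dispatch time.
    available_account_workers_at_dispatch: Vec<usize>,
    /// Chunking skipped because all workers were busy.
    chunking_skipped_busy: u64,
    /// Chunking skipped because the target count was below the threshold.
    chunking_skipped_below_threshold: u64,
    /// Chunking actually performed.
    chunking_performed: u64,
}
```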
Problem: The batching feature caused an 80% regression in newPayload latency. Root cause analysis revealed three issues:
1. **Vicious cycle in chunking logic**: The `should_chunk` condition checked if workers were available before chunking. When workers were busy (available = 0), chunking was skipped, creating larger proofs that kept workers busy longer. Metrics showed proof chunks = 1 (no chunking) despite large batches.
2. **Priority inversion with pending_msg**: The main loop processed pending messages before checking proof results, violating select_biased! priority and starving worker completion signals.
3. **estimate_evm_state_targets undercount**: The estimate was 1 less per account with storage slots, allowing batches to exceed the 500 target limit.

Solution:
- Remove the worker availability check from the chunking decision - always chunk if targets exceed the threshold. Workers will pick up chunks as they become free.
- Drain proof results with try_recv() BEFORE processing pending messages to maintain correct priority.
- Fix the estimate to match HashedPostState::chunking_length: 1 + changed_slots instead of 1 + changed_slots.saturating_sub(1).

Changes:
- Remove unused `max_targets_for_chunking` field and constant
- Remove `chunking_skipped_busy` counter (no longer applicable)
- Add priority drain loop for proof results before pending_msg processing
- Fix estimate_evm_state_targets formula

Expected Impact:
- Chunking will now happen regardless of worker availability
- Proof results will be processed with proper priority
- Batch sizes will stay within limits
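Two of the fixes above, sketched with placeholder types (`ProofResult` and `Msg` are illustrative, not reth types):

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

// Placeholder types for the sketch.
struct ProofResult;
enum Msg { StateUpdate, PrefetchProofs }

/// Corrected per-account target estimate: one target for the account plus one per
/// changed storage slot, matching HashedPostState::chunking_length (previously
/// `1 + changed_slots.saturating_sub(1)`, which undercounted by one).
fn estimate_account_targets(changed_slots: usize) -> usize {
    1 + changed_slots
}

/// Priority-drain sketch: completed proof results are pulled with try_recv()
/// before the buffered pending message is touched, so worker completion signals
/// are not starved by the batching path.
fn drain_results_then_pending(
    proof_results: &Receiver<ProofResult>,
    pending_msg: &mut Option<Msg>,
    mut on_result: impl FnMut(ProofResult),
    mut on_msg: impl FnMut(Msg),
) {
    loop {
        match proof_results.try_recv() {
            Ok(result) => on_result(result),
            Err(TryRecvError::Empty | TryRecvError::Disconnected) => break,
        }
    }
    if let Some(msg) = pending_msg.take() {
        on_msg(msg);
    }
}
```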
- Introduced a new function `same_state_change_source` to determine if two state change sources originate from the same source.
- Updated the batching logic in `MultiProofTask` to utilize this function, ensuring that state updates from different sources are correctly separated during processing.
- Added a test case to verify that the batching mechanism correctly handles and separates state updates from different sources, maintaining the integrity of the batching process.
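One plausible reading of the predicate, sketched with a made-up source enum; the actual `StateChangeSource` in reth and the exact comparison it performs may differ:

```rust
use std::mem::discriminant;

// Made-up source enum for the sketch; reth's StateChangeSource is richer.
enum StateChangeSource {
    Transaction(usize),
    PostBlock,
}

/// Treat two sources as "the same" when they are the same kind of source, so a
/// batch never mixes, e.g., transaction-derived updates with post-block updates.
fn same_state_change_source(a: &StateChangeSource, b: &StateChangeSource) -> bool {
    discriminant(a) == discriminant(b)
}
```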
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 8 comments.
- Eliminated several unused metrics related to worker availability and chunking performance from `MultiProofTaskMetrics`.
- Updated the batching logic to simplify the handling of state updates, ensuring that the first message is dispatched immediately while accumulating additional messages for batch processing.
- Improved code clarity by consolidating conditional checks for batching state updates.
- Added descriptive comments to several test cases in `multiproof.rs` to clarify their purpose and expected behavior.
- Improved documentation for tests related to batching of prefetch proofs, state updates, and message ordering, ensuring better understanding and maintainability of the test suite.
```rust
        }
    },
    recv(self.rx) -> message => {
        match message {
```
this is extracted to process_multiproof_message
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
crates/engine/tree/src/tree/payload_processor/multiproof.rs:642
The documentation states "number of messages merged" but due to the batch size recording issue (see previous comments), the actual values recorded are inconsistent. When the first message is dispatched immediately and no additional messages are batched, it records 1. However, when additional messages are batched after the first immediate dispatch, it only records the count of the additional messages, not including the first one.
Update the documentation to clarify the actual semantics, or fix the recording logic to be consistent.
```rust
/// Histogram of sparse trie update durations.
pub sparse_trie_update_duration_histogram: Histogram,
/// Histogram of sparse trie final update durations.
pub sparse_trie_final_update_duration_histogram: Histogram,
```
```rust
        self.multiproof_manager.proof_worker_handle.available_account_workers();
    let available_storage_workers =
        self.multiproof_manager.proof_worker_handle.available_storage_workers();
    let outcome = dispatch_with_chunking(
```
nit: it looks like `chunk` got renamed to `outcome`; if we're still returning the number of chunks here then `chunks` or `num_chunks` would be a clearer name I think
```rust
        self.multiproof_manager.proof_worker_handle.available_account_workers();
    let available_storage_workers =
        self.multiproof_manager.proof_worker_handle.available_storage_workers();
    let outcome = dispatch_with_chunking(
```
same nit here as with the prewarm case
```rust
    debug!(target: "engine::tree::payload_processor::multiproof", "Started state root calculation");
}

// Dispatch first message immediately
```
Did you do some testing that confirmed that doing this is impactful? My intuition says that it wouldn't matter a whole lot, since it's just the results of one tx, more likely than not it would only occupy a single worker while the rest are waiting for further batched messages.
```rust
fn process_multiproof_message(
    &mut self,
    msg: MultiProofMessage,
    pending_msg: &mut Option<MultiProofMessage>,
```
It'd be nice to pull out a lot (or all) of these `&mut`s into a separate struct, like `MultiproofBatchCtx` or something. It seems like a lot are just for metrics; if so, maybe two structs: `MultiproofBatchCtx` and `MultiproofBatchMetrics`.
Addresses reviewer nit: the variable returned from dispatch_with_chunking represents the number of chunks, so the name should reflect that.
Extract &mut parameters from process_multiproof_message into:
- MultiproofBatchCtx: core processing state (pending_msg, timing, updates_finished)
- MultiproofBatchMetrics: counters for proofs processed/requested
This improves code organization and reduces the function parameter count.
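Hypothetical shapes of the two structs described in this commit; the field names are inferred from the commit message and are not the actual reth definitions:

```rust
use std::time::Instant;

// Placeholder for reth's MultiProofMessage in this sketch.
enum MultiProofMessage {
    StateUpdate,
    PrefetchProofs,
}

/// Core batching state threaded through process_multiproof_message.
struct MultiproofBatchCtx {
    /// Message of a different type encountered while draining a batch.
    pending_msg: Option<MultiProofMessage>,
    /// Timing of the last processed update, used for batch bookkeeping.
    last_update_at: Option<Instant>,
    /// Whether state updates have finished.
    updates_finished: bool,
}

/// Counters previously passed as individual &mut parameters.
#[derive(Default)]
struct MultiproofBatchMetrics {
    proofs_processed: u64,
    proofs_requested: u64,
}
```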
Remove the "dispatch first message immediately" optimization per reviewer feedback. The first message is now included in the batch and dispatched together with accumulated messages, simplifying the code.

Problem
When state updates arrive faster than proof computation, each message triggers a separate proof task, causing redundant trie traversals for the same accounts.
Solution
Batch consecutive same-type messages before dispatching to workers.
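A tiny sketch of why merging helps, assuming account state is keyed by an integer for simplicity: consecutive updates touching the same account collapse into one proof target, so the trie path for that account is traversed once per batch instead of once per message.

```rust
use std::collections::HashMap;

/// Merge one state update into an accumulated batch; later writes to the same
/// account overwrite earlier ones, leaving a single proof target per account.
fn merge_update(batch: &mut HashMap<u64, u64>, update: HashMap<u64, u64>) {
    for (account, value) in update {
        batch.insert(account, value);
    }
}
```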
Batching Logic
Dispatch-then-batch:
Different-type messages encountered during the drain are stored in pending_msg and processed in the next iteration to preserve ordering.
Changes
- process_multiproof_message() with dispatch-then-batch logic
- pending_msg field to preserve message ordering across iterations
Expected Impact
Benches
Detailed flow diagrams
fixes #19168