clabby
Collaborator

@clabby clabby commented Sep 21, 2025

Overview

Warning

WIP: open for early review, but not feature complete. See TODOs.

Integrates commonware-coding as a default within commonware-consensus' marshal by adding a new shim between consensus and the marshal::Actor. This layer broadcasts the erasure-coded chunks of Blocks sent by the proposer and reconstructs the Blocks as their chunks arrive.

Example usage of the new API in alto: commonwarexyz/alto#149

TODO

  • The test_finalize_good_links test seems to be pseudo-passing: all validators eventually reconstruct the proposed blocks and receive the same finalized chain, but the test fails the determinism check. Need to investigate.
  • When notarization / finalization votes are received by marshal, we need some way to self-identify as the submitter of these votes. Right now I’m naively sending all of the chunks that the validator has locally, which is inefficient.
  • Take a second look at the method for mapping block digest <> coding commitment. Right now we wait for enough chunks to arrive before anything (since we must have the digest), which seems right, but there could be a better method.
  • Use durable / prunable stores in the ShardLayer; prune when the Orchestrator reports.
  • Support subscriptions in ShardLayer.
  • Support subscriptions to individual chunks. The Automaton needs this in order to verify individual chunks prior to sending notarization votes.
  • Wait to receive the minimum number of valid shards. Currently we only wait for the minimum number of shards, without validity checks.
  • Consider eager reconstruction for performance. Right now, blocks are reconstructed on demand. If the ShardMailbox had a way of being notified when its buffered::Mailbox received shards, it could register a listener when Automaton::verify initially verifies the local shard for the notarization vote. This would (hopefully) mean that by the time a proposer wants to build on the notarized parent, the block is already present for them.
  • Handle the "couldn't reconstruct" case. This will require marshal to store chunks for blocks that cannot be reconstructed, in order to convince peers. CodingAdapter will need to ensure that it never asks the application to build on a "bad" parent; the application should not need to know this detail about the consensus chain. (Edit: or, a more elegant solution: [consensus::simplex] Allow for a second, post-notarization-certificate, verify(...) hook to the Automaton, called finalizable(...) #1767)
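As an illustration of the "minimum number of valid shards" item above, here is a minimal sketch of a collector that counts only distinct shards that pass a validity check. The names `Shard` and `ShardCollector` and the `verify` predicate are hypothetical and are not taken from this PR; in the real ShardLayer the predicate would presumably check the shard's proof against the coding commitment.

```rust
use std::collections::BTreeMap;

/// Hypothetical shard type: a position in the coded block plus its payload.
#[derive(Clone, Debug, PartialEq)]
struct Shard {
    index: usize,
    data: Vec<u8>,
}

/// Buffers shards for a single commitment and reports when at least
/// `min_shards` distinct shards have passed the `verify` predicate.
struct ShardCollector<F: Fn(&Shard) -> bool> {
    min_shards: usize,
    verify: F,
    valid: BTreeMap<usize, Shard>,
}

impl<F: Fn(&Shard) -> bool> ShardCollector<F> {
    fn new(min_shards: usize, verify: F) -> Self {
        Self {
            min_shards,
            verify,
            valid: BTreeMap::new(),
        }
    }

    /// Stores the shard only if it is valid and its index has not been seen.
    /// Returns true once enough valid shards are buffered.
    fn insert(&mut self, shard: Shard) -> bool {
        if (self.verify)(&shard) {
            self.valid.entry(shard.index).or_insert(shard);
        }
        self.ready()
    }

    /// True when reconstruction could be attempted.
    fn ready(&self) -> bool {
        self.valid.len() >= self.min_shards
    }
}
```

Keying the buffer by shard index means duplicate or invalid shards never count toward the threshold, which is the gap the TODO item describes.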

Viz

sequenceDiagram
    participant P as Proposer / App
    participant E as Erasure Encoder
    participant N1 as Participant 1
    participant N2 as Participant 2
    participant N3 as Participant 3
    participant Nr as Participant r
    participant V as Consensus
    participant S as Shard Layer
    participant M as Marshal
    
    Note over P,M: Block Proposal & Encoding Phase
    P->>E: Original block data
    E->>E: Split into k chunks + generate r parity chunks
    E->>P: Return k+r encoded chunks
    
    Note over P,M: Distribution Phase
    P->>N1: Send chunk 1 + merkle proof
    P->>N2: Send chunk 2 + merkle proof  
    P->>N3: Send chunk 3 + merkle proof
    P->>Nr: Send chunk k+r + merkle proof
    
    Note over P,M: Notarization Phase
    N1->>N1: Validate chunk integrity
    
    N2->>N2: Validate chunk integrity
    N3->>N3: Validate chunk integrity
    Nr->>Nr: Validate chunk integrity
    
    N1->>V: Attest chunk validity + vote
    N2->>V: Attest chunk validity + vote
    N3->>V: Attest chunk validity + vote
    Nr->>V: Attest chunk validity + vote
    V->>V: Notarization
    
    Note over P,M: Reconstruction Phase (parallel w/ finalization)
    S->>S: Collect ≥k valid chunks
    S->>S: Reconstruct original block
    S->>S: Verify commitment
    S->>M: Send block

    Note over P,M: Finalization Phase (parallel w/ reconstruction)
    N1->>V: Vote
    N2->>V: Vote
    N3->>V: Vote
    Nr->>V: Vote
    V->>V: Finalization

    Note over P,M: Reporting Phase
    M->>P: Send finalized block
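The encode and reconstruct steps in the diagram can be illustrated with a toy code. This sketch deliberately swaps in a single XOR parity chunk (k data chunks, r = 1) in place of whatever scheme commonware-coding actually uses, so it tolerates only one missing chunk. The function names `encode` and `decode` are hypothetical.

```rust
/// Splits `data` into `k` equal-length chunks (zero-padded) and appends
/// one XOR parity chunk, returning k + 1 chunks in total.
fn encode(data: &[u8], k: usize) -> Vec<Vec<u8>> {
    assert!(k > 0, "need at least one data chunk");
    let chunk_len = ((data.len() + k - 1) / k).max(1);
    let mut chunks: Vec<Vec<u8>> = (0..k)
        .map(|i| {
            let start = (i * chunk_len).min(data.len());
            let end = ((i + 1) * chunk_len).min(data.len());
            let mut chunk = data[start..end].to_vec();
            chunk.resize(chunk_len, 0);
            chunk
        })
        .collect();
    let mut parity = vec![0u8; chunk_len];
    for chunk in &chunks {
        for (p, b) in parity.iter_mut().zip(chunk) {
            *p ^= *b;
        }
    }
    chunks.push(parity);
    chunks
}

/// Rebuilds the original bytes from k + 1 slots of which at most one is
/// missing. Returns None if too many chunks are missing.
fn decode(mut chunks: Vec<Option<Vec<u8>>>, original_len: usize) -> Option<Vec<u8>> {
    if chunks.is_empty() {
        return None;
    }
    let missing: Vec<usize> = chunks
        .iter()
        .enumerate()
        .filter(|(_, c)| c.is_none())
        .map(|(i, _)| i)
        .collect();
    if missing.len() > 1 {
        return None;
    }
    if let Some(&m) = missing.first() {
        // XOR of all present chunks equals the missing chunk.
        let chunk_len = chunks.iter().flatten().next()?.len();
        let mut rebuilt = vec![0u8; chunk_len];
        for chunk in chunks.iter().flatten() {
            for (r, b) in rebuilt.iter_mut().zip(chunk) {
                *r ^= *b;
            }
        }
        chunks[m] = Some(rebuilt);
    }
    let k = chunks.len() - 1;
    let mut out: Vec<u8> = chunks.into_iter().take(k).flatten().flatten().collect();
    out.truncate(original_len);
    Some(out)
}
```

A production code would tolerate up to r missing chunks rather than one, and the reconstructed block would still need to be checked against the commitment, as in the "Verify commitment" step of the diagram.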

[Image: honest validator shard distribution]

Meta

closes #1520

@clabby clabby self-assigned this Sep 21, 2025
@clabby clabby force-pushed the cl/consensus-erasure-coding branch 3 times, most recently from b48f6e0 to 39ef3e4 Compare September 22, 2025 17:20
@patrick-ogrady
Contributor

> Right now we wait for enough chunks to arrive before anything (since we must have the digest), which seems right, but there could be a better method.

In this case, you are referring to marshal, right? If so, yes. In consensus, we should send a notarize as soon as we observe that our fragment is correctly placed in the BMT.
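To make the "correctly placed in the BMT" check concrete, here is a minimal sketch of verifying one fragment against a binary Merkle tree root. It assumes a simple tree that duplicates the last node on odd levels, and it uses the standard library's `DefaultHasher` purely as a non-cryptographic stand-in for a real hash function. None of these function names or the proof layout come from the commonware codebase.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Leaf hash, domain-separated from inner nodes (stand-in hash, NOT secure).
fn hash_leaf(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    0u8.hash(&mut h);
    data.hash(&mut h);
    h.finish()
}

/// Inner-node hash over the two child digests.
fn hash_node(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    1u8.hash(&mut h);
    left.hash(&mut h);
    right.hash(&mut h);
    h.finish()
}

/// Builds every level of the tree, from leaf hashes up to the root.
/// An unpaired node at the end of a level is hashed with itself.
fn build_levels(leaves: &[Vec<u8>]) -> Vec<Vec<u64>> {
    let mut levels = vec![leaves.iter().map(|l| hash_leaf(l)).collect::<Vec<u64>>()];
    while levels.last().unwrap().len() > 1 {
        let prev = levels.last().unwrap();
        let next: Vec<u64> = prev
            .chunks(2)
            .map(|pair| hash_node(pair[0], *pair.last().unwrap()))
            .collect();
        levels.push(next);
    }
    levels
}

/// Root digest, or None for an empty tree.
fn root(levels: &[Vec<u64>]) -> Option<u64> {
    levels.last()?.first().copied()
}

/// Sibling digests from the leaf at `index` up to (not including) the root.
fn prove(levels: &[Vec<u64>], mut index: usize) -> Vec<u64> {
    let mut proof = Vec::new();
    for level in &levels[..levels.len() - 1] {
        let sibling = (index ^ 1).min(level.len() - 1);
        proof.push(level[sibling]);
        index /= 2;
    }
    proof
}

/// Checks that `fragment` sits at position `index` under `root`.
fn verify(root: u64, index: usize, fragment: &[u8], proof: &[u64]) -> bool {
    let mut acc = hash_leaf(fragment);
    let mut idx = index;
    for sibling in proof {
        acc = if idx % 2 == 0 {
            hash_node(acc, *sibling)
        } else {
            hash_node(*sibling, acc)
        };
        idx /= 2;
    }
    idx == 0 && acc == root
}
```

Under this sketch, a validator holding only its own fragment and proof can run `verify` without waiting for any other chunk, which is why the notarize vote does not need to wait for reconstruction.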

@clabby clabby force-pushed the cl/consensus-erasure-coding branch 4 times, most recently from e44f142 to 9b114d1 Compare September 23, 2025 17:13
@clabby clabby force-pushed the cl/consensus-erasure-coding branch 7 times, most recently from b5abca8 to 590cdd2 Compare September 27, 2025 18:42
@clabby clabby changed the base branch from main to cronokirby/coding-api-v2 September 27, 2025 18:43
@clabby clabby force-pushed the cl/consensus-erasure-coding branch 9 times, most recently from a1657ec to acd884d Compare September 28, 2025 18:06
@clabby clabby added this to Tracker Sep 28, 2025
@clabby clabby moved this to In Progress in Tracker Sep 28, 2025
@clabby clabby force-pushed the cl/consensus-erasure-coding branch from acd884d to 5f67c8b Compare September 28, 2025 20:08
@clabby clabby deleted the branch commonwarexyz:main September 29, 2025 23:19
@clabby clabby closed this Sep 29, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in Tracker Sep 29, 2025
@clabby clabby reopened this Sep 30, 2025
@clabby clabby changed the base branch from cronokirby/coding-api-v2 to main September 30, 2025 00:32
@clabby clabby force-pushed the cl/consensus-erasure-coding branch 8 times, most recently from ecd698a to d902967 Compare October 1, 2025 21:27
clabby added 14 commits October 8, 2025 17:24
Adds type definitions for erasure coding.
Adds a new wrapper around the `buffered::Mailbox` that shards messages
and recovers blocks upon retrieval of enough shards from the network.
Integrates the new `ShardedMailbox` into marshal, allowing for the
broadcast and reconstruction of `CodedBlock`s.
Defines a new interface for applications that employ erasure coding
Introduces a wrapper around the new `Application` trait that minimizes
the boilerplate required to implement applications that utilize sharded
broadcast.

This wrapper is possible because when using erasure coding, the verify
and broadcast steps are identical for all applications.
[consensus/marshal] Append `CodingConfig` to consensus commitment
Broadcasts the local shard on verification, rather than on every notarization vote received from consensus.
Reduces pressure on the marshal control loop
@clabby clabby force-pushed the cl/consensus-erasure-coding branch from d902967 to 3a1bf45 Compare October 8, 2025 21:24

codecov bot commented Oct 8, 2025

Codecov Report

❌ Patch coverage is 23.62768% with 640 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.53%. Comparing base (87fff07) to head (3a1bf45).

Files with missing lines                        Patch %   Lines
consensus/src/marshal/coding/actor.rs           0.00%     244 Missing ⚠️
consensus/src/marshal/coding/application.rs     0.00%     136 Missing ⚠️
consensus/src/marshal/coding/types.rs           60.80%    98 Missing ⚠️
consensus/src/marshal/actor.rs                  0.00%     74 Missing ⚠️
consensus/src/marshal/coding/mailbox.rs         0.00%     57 Missing ⚠️
consensus/src/types.rs                          51.11%    22 Missing ⚠️
consensus/src/marshal/ingress/orchestrator.rs   0.00%     4 Missing ⚠️
consensus/src/marshal/mocks/block.rs            78.57%    3 Missing ⚠️
consensus/src/marshal/finalizer.rs              0.00%     2 Missing ⚠️
@@            Coverage Diff             @@
##             main    #1680      +/-   ##
==========================================
- Coverage   92.28%   90.53%   -1.76%     
==========================================
  Files         304      307       +3     
  Lines       79179    79315     +136     
==========================================
- Hits        73072    71807    -1265     
- Misses       6107     7508    +1401     
Files with missing lines                        Coverage Δ
coding/src/lib.rs                               98.26% <ø> (+4.62%) ⬆️
consensus/src/marshal/cache.rs                  0.00% <ø> (-94.10%) ⬇️
consensus/src/marshal/ingress/handler.rs        82.19% <100.00%> (-10.96%) ⬇️
consensus/src/marshal/ingress/mailbox.rs        0.00% <ø> (-84.00%) ⬇️
consensus/src/marshal/finalizer.rs              0.00% <0.00%> (-88.64%) ⬇️
consensus/src/marshal/mocks/block.rs            80.00% <78.57%> (-20.00%) ⬇️
consensus/src/marshal/ingress/orchestrator.rs   0.00% <0.00%> (-88.89%) ⬇️
consensus/src/types.rs                          72.52% <51.11%> (-20.96%) ⬇️
consensus/src/marshal/coding/mailbox.rs         0.00% <0.00%> (ø)
consensus/src/marshal/actor.rs                  0.00% <0.00%> (-88.71%) ⬇️
... and 3 more

... and 15 files with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
