
Conversation

@clabby clabby commented Oct 3, 2025

Overview

Adds an example for resharing over a replicated log, rather than using a trusted arbiter (à la commonware-vrf).

Makes progress on #1708; to also reconfigure after performing a reshare, we'll need to have the epocher example in.

@clabby clabby added the example An example of how to use the library code label Oct 3, 2025
@clabby clabby force-pushed the cl/reshare-example branch from 2889336 to d5ed983 Compare October 3, 2025 19:46

cloudflare-workers-and-pages bot commented Oct 3, 2025

Deploying monorepo with Cloudflare Pages

Latest commit: 3c94581
Status: ✅  Deploy successful!
Preview URL: https://cc148058.monorepo-eu0.pages.dev
Branch Preview URL: https://cl-reshare-example.monorepo-eu0.pages.dev


@clabby clabby force-pushed the cl/reshare-example branch 2 times, most recently from afc14ba to 3a4b100 Compare October 3, 2025 19:50
@clabby clabby requested a review from patrick-ogrady October 3, 2025 19:56
@clabby clabby added this to Tracker Oct 3, 2025
@clabby clabby moved this to Ready for Review in Tracker Oct 3, 2025
@clabby clabby requested a review from BrendanChou October 3, 2025 19:56
@clabby clabby force-pushed the cl/reshare-example branch from 3a4b100 to 06c99fe Compare October 3, 2025 20:33
@patrick-ogrady patrick-ogrady left a comment

The core cryptography is used correctly, but I suggest a much tighter coupling with block height (it will make this much simpler/easier to follow -> we can't be "optimistic" with this, so we shouldn't design the code assuming we want to be).


// --- `serde` helpers ---

mod serde_hex {
Contributor

We should use commonware_utils::hex?

Contributor

If we need to reach for serde for some reason, we should expand the scope of what utils can do.

@clabby clabby commented Oct 5, 2025

commonware_utils::hex / commonware_utils::from_hex are incompatible with serde without additional wrapping. You have to type-erase your hex data as a String, and you end up with something like:

#[derive(Serialize, Deserialize)]
pub struct MySerdeStuff {
    /// A `Public<MinSig>` that's type-erased as a hex string
    pub poly: String,
}

// 2-hop serialization
let poly: Poly<MinSig> = ...;
let t = MySerdeStuff { poly: hex(poly.encode().as_ref()) };
let ser_t = serde_json::to_string(&t).unwrap();

// ...

// 2-hop deserialization
let deser_t: MySerdeStuff = serde_json::from_str(&ser_t).unwrap();
let poly_bytes = from_hex(&deser_t.poly).unwrap();
let poly = Poly::<MinSig>::decode(&poly_bytes).unwrap();

It's common to have some serde feature flags in crates so that you can directly serialize / deserialize without intermediate steps or type erasure, i.e.:

#[derive(Serialize, Deserialize)]
pub struct MySerdeStuff {
    pub poly: Public<MinSig>,
}

let poly: Public<MinSig> = ...;
let t = MySerdeStuff { poly };

// one-hop serialization
let ser_t = serde_json::to_string(&t).unwrap();

// one-hop deserialization
let deser_t: MySerdeStuff = serde_json::from_str(&ser_t).unwrap();

See for example how we did it over in kona.
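
(For context, the kind of "additional wrapping" involved is roughly a #[serde(with = ...)] helper like the sketch below. The commonware_utils signatures assumed here are hex(&[u8]) -> String and from_hex(&str) -> Option<Vec<u8>>, and hex_bytes / DealOutcomeJson are illustrative names, not code from this PR.)

use serde::{Deserialize, Deserializer, Serialize, Serializer};

mod hex_bytes {
    use super::*;
    use commonware_utils::{from_hex, hex};

    pub fn serialize<S: Serializer>(bytes: &[u8], serializer: S) -> Result<S::Ok, S::Error> {
        // Hex-encode the raw bytes as a JSON string.
        serializer.serialize_str(&hex(bytes))
    }

    pub fn deserialize<'de, D: Deserializer<'de>>(deserializer: D) -> Result<Vec<u8>, D::Error> {
        // Decode the JSON string back into raw bytes.
        let s = String::deserialize(deserializer)?;
        from_hex(&s).ok_or_else(|| serde::de::Error::custom("invalid hex"))
    }
}

#[derive(Serialize, Deserialize)]
pub struct DealOutcomeJson {
    /// The encoded public polynomial, hex-encoded on the wire.
    #[serde(with = "hex_bytes")]
    pub poly: Vec<u8>,
}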

@clabby clabby force-pushed the cl/reshare-example branch 3 times, most recently from e24f8cd to 19f84ee Compare October 5, 2025 19:45
@clabby clabby self-assigned this Oct 5, 2025
@clabby clabby force-pushed the cl/reshare-example branch from 19f84ee to 151e66d Compare October 5, 2025 20:12
&mut receiver,
);

while let Some(message) = self.mailbox.next().await {
Contributor

I'm tempted to suggest merging this with application because we only ever interact with it from application (that's the "gut check" I've been using for when an actor should be merged or standalone)?

I think the dkg logic could remain in its own function, but it seems clearer to have this run inside the application instead of as another actor that is "driven" by blocks?

We can debate tomorrow.

@clabby clabby commented Oct 6, 2025

Yeah, we could certainly do that. I mainly separated them as a best practice for the example; hopefully we can steer people away from doing expensive computation in the application control loop (soon, that'll be foolproof with the post-coding abstraction).

Anything that can slow down building (and, by extension, consensus) == bad imo

Collaborator Author

^ i.e., the way I think about "application" is that it's an incredibly hot control loop; in a well-designed CW app, it seems like you should never be able to hold it up for more than ~a few ms (and all computation done there should be critical to making progress to the next view).
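
(As a sketch of that split, with tokio channels standing in for the example's runtime and mailbox types, and with hypothetical message and handler names: the proposal path only pays for a cheap channel round-trip, while the expensive DKG work stays in the dkg actor's own task.)

use tokio::sync::{mpsc, oneshot};

/// Hypothetical request sent from the application to the dkg actor.
enum DkgMessage {
    GetOutcome {
        response: oneshot::Sender<Option<Vec<u8>>>,
    },
}

/// The application's hot path: ask for whatever the dkg actor has and await
/// only the channel round-trip, never the DKG computation itself.
async fn propose(dkg_mailbox: &mpsc::Sender<DkgMessage>) -> Option<Vec<u8>> {
    let (tx, rx) = oneshot::channel();
    dkg_mailbox
        .send(DkgMessage::GetOutcome { response: tx })
        .await
        .ok()?;
    rx.await.ok().flatten()
}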

Contributor

fair retort

let genesis_digest = genesis.digest();
let built = Arc::new(Mutex::new(None));

while let Some(message) = self.mailbox.next().await {
Contributor

Looking forward to the actor-less abstraction 👀

@clabby clabby force-pushed the cl/reshare-example branch 4 times, most recently from 61297c7 to c883fdb Compare October 7, 2025 14:35
@clabby clabby marked this pull request as draft October 8, 2025 01:50
@clabby clabby force-pushed the cl/reshare-example branch from 9b4aa44 to 6dd7e29 Compare October 8, 2025 01:50
@clabby clabby force-pushed the cl/reshare-example branch 11 times, most recently from 7e40088 to 9aa01bd Compare October 13, 2025 17:03
@patrick-ogrady patrick-ogrady left a comment

All I could do before my internet cut out. Will keep reviewing.

}

// Ask the DKG actor for a result to include
let reshare = context
Contributor

I still don't feel great about this timeout here.

The intention with the consensus process dropping the response channel when it no longer needed something was to have a clean abstraction: the application builds ASAP with whatever it has and then returns. If dkg doesn't have a commitment, it should just be able to return immediately, no?
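
(A sketch of the pattern described here, with tokio's oneshot standing in for the example's channel type and with hypothetical names: the actor replies immediately with whatever it has cached, and if consensus has already moved on and dropped its receiver, the send is a harmless no-op.)

use tokio::sync::oneshot;

struct DkgActor {
    /// The most recently completed deal outcome, if any.
    outcome: Option<Vec<u8>>,
}

impl DkgActor {
    fn handle_get_outcome(&self, response: oneshot::Sender<Option<Vec<u8>>>) {
        // Reply right away with the cached value (or None); never block on
        // in-progress work. If the receiver was dropped, the error is ignored.
        let _ = response.send(self.outcome.clone());
    }
}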

Collaborator Author

dkg::Actor can be doing work when the request for a commitment comes in, and may not be able to immediately return None.

I originally had the work that DkgManager does in a separate thread to make sure that the control loop for the dkg::Actor could always issue snappy responses, though it led to the code being a bit less clear to follow (from your initial review @ #1796 (comment)).

Now, since actions are dispatched inline, it's possible that the reshare process could hold up block building without this timeout.
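
(A sketch of the timeout guard being described, with tokio's timer standing in for the example's runtime clock, a made-up budget, and hypothetical names.)

use std::time::Duration;
use tokio::sync::oneshot;

async fn request_reshare(rx: oneshot::Receiver<Option<Vec<u8>>>) -> Option<Vec<u8>> {
    // If the dkg actor is mid-computation and can't answer within the budget,
    // build the block without a reshare rather than stalling consensus.
    match tokio::time::timeout(Duration::from_millis(50), rx).await {
        Ok(Ok(outcome)) => outcome,
        _ => None,
    }
}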

match relative_height.cmp(&(BLOCKS_PER_EPOCH / 2)) {
Ordering::Less => {
// Continuously distribute shares to any players who haven't acknowledged
// receipt yet.
manager.distribute(epoch).await;
// Process any incoming messages from other dealers/players.
manager.process_messages(epoch).await;
}
Ordering::Equal => {
// Process any final messages from other dealers/players.
manager.process_messages(epoch).await;
// At the midpoint of the epoch, construct the deal outcome for inclusion.
manager.construct_deal_outcome(epoch).await;
}
Ordering::Greater => {
// Process any incoming deal outcomes from dealing contributors.
manager.process_block(epoch, block).await;
}
}

return;
}

// Verify the block
Contributor

We need to check this before the re-proposal logic. Otherwise, someone could go right to the last height and re-propose (skipping the inner heights).

Collaborator Author

If we checked this first, we'd never be able to check the re-propose case.

Both block.height != parent.height + 1 and block.parent != parent.digest() are always true when re-proposing, which would cause verification to always fail.

Note the order of checks here:

  1. If re-proposing (parent commitment == proposed commitment), the block's height must be the last height in the epoch to validate.
  2. Else, the block must:
    • Have a height that increases by 1 relative to its parent.
    • Have its parent digest set to the digest of the parent.
    • Be in the same epoch as the round from the consensus Context.

Because you cannot re-propose without building on top of a previously verified parent, and the standard rules are followed prior to the last height, it should not be possible to skip the inner heights here (see the sketch below).

tl;dr - this saves us:

if block.height == get_last_height(round.epoch()) {
    // ... special-case re-proposal handling ...
}
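
(A compact sketch of the check ordering described above. Block, digest, and the epoch/height arithmetic are simplified stand-ins for the example's real types and helpers, not code from this PR.)

struct Block {
    height: u64,
    parent: [u8; 32],
    commitment: u64,
}

fn digest(_block: &Block) -> [u8; 32] {
    // Stand-in for the real block digest.
    [0u8; 32]
}

fn verify(block: &Block, parent: &Block, round_epoch: u64, blocks_per_epoch: u64) -> bool {
    // 1. Re-proposal (parent commitment == proposed commitment): only valid at
    //    the last height of the epoch.
    if block.commitment == parent.commitment {
        let last_height = (round_epoch + 1) * blocks_per_epoch - 1;
        return block.height == last_height;
    }
    // 2. Otherwise, the standard rules apply: height increments by one, the
    //    parent digest matches, and the block is in the round's epoch.
    block.height == parent.height + 1
        && block.parent == digest(parent)
        && block.height / blocks_per_epoch == round_epoch
}
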
}

impl<V: Variant, P: PublicKey> p2p::Coordinator for Supervisor<V, P> {
Contributor

Given we are changing consensus participants but not peers, I wonder if we should separate this?

Collaborator Author

Good idea - we can if you'd prefer! Not a huge deal in the example; it's easy enough to construct a supervisor for the sake of the P2P resolver that marshal needs.


if inactive_participants.len() < self.num_participants_per_epoch {
// Choose some random active participants to also be players if there are not enough.
let mut rng = StdRng::seed_from_u64(0);
Contributor

use context's rng?

Collaborator Author

Is there a way to do seeded RNG with Context? We need this to be deterministically seeded so that all participants grab the exact same set of initial players.
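
(One way to keep this deterministic without a hard-coded constant, sketched below: seed the RNG from state every participant already agrees on, e.g. the epoch number. The function and its parameters are illustrative, not code from this PR.)

use rand::{rngs::StdRng, seq::SliceRandom, SeedableRng};

fn select_extra_players(active: &[u32], needed: usize, epoch: u64) -> Vec<u32> {
    // Every participant derives the same seed from the same epoch, so they all
    // draw the exact same pseudo-random subset.
    let mut rng = StdRng::seed_from_u64(epoch);
    active.choose_multiple(&mut rng, needed).copied().collect()
}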

let (mut network, mut oracle) = discovery::Network::new(context.with_label("network"), p2p_cfg);

// Register all possible peers
oracle.register(0, peer_config.all_peers()).await;
Contributor

Given we don't update oracle in subsequent epochs, we should remove the p2p oracle impl from supervisor? (see earlier comment)


}

// If the metric ends with _processed_height, ensure it is at least required_container
if metric.ends_with("_processed_height") {
Contributor

We should ensure this runs over a few epochs (it may already, but just calling it out to be sure).

Collaborator Author

Yeah it does! We've got test_1k that goes over 10 epochs.

@clabby clabby force-pushed the cl/reshare-example branch 2 times, most recently from 51a7d14 to 4b5b815 Compare October 13, 2025 19:59
@clabby clabby force-pushed the cl/reshare-example branch from 4b5b815 to 3c94581 Compare October 13, 2025 23:55

codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 58.97436% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.20%. Comparing base (9c7abdc) to head (3c94581).
⚠️ Report is 3 commits behind head on main.

Files with missing lines      Patch %   Lines
storage/src/store/mod.rs      0.00%     16 Missing ⚠️
@@            Coverage Diff             @@
##             main    #1796      +/-   ##
==========================================
+ Coverage   92.10%   92.20%   +0.09%     
==========================================
  Files         307      308       +1     
  Lines       79648    81273    +1625     
==========================================
+ Hits        73362    74938    +1576     
- Misses       6286     6335      +49     
Files with missing lines                      Coverage Δ
consensus/src/threshold_simplex/engine.rs     96.92% <100.00%> (+4.73%) ⬆️
cryptography/src/bls12381/dkg/arbiter.rs      95.58% <ø> (ø)
storage/src/store/mod.rs                      88.14% <0.00%> (-2.36%) ⬇️

... and 10 files with indirect coverage changes



Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9c7abdc...3c94581.

