[examples] Reshare over replicated log #1796
base: main
Conversation
Deploying monorepo with Cloudflare Pages

Latest commit: 3c94581
Status: ✅ Deploy successful!
Preview URL: https://cc148058.monorepo-eu0.pages.dev
Branch Preview URL: https://cl-reshare-example.monorepo-eu0.pages.dev
The core cryptography is used correctly, but I'd suggest a much tighter coupling with block height (it will make this much simpler and easier to follow). We can't be "optimistic" with this, so we shouldn't design the code assuming we want to be.
```rust
// --- `serde` helpers ---

mod serde_hex {
```
We should use commonware_utils::hex?
If we need to reach for serde for some reason, we should expand the scope of what utils can do.
`commonware_utils::hex` / `commonware_utils::from_hex` are incompatible with serde without additional wrapping. You have to type-erase your hex data as a `String`, and you end up with something like:
```rust
#[derive(Serialize, Deserialize)]
pub struct MySerdeStuff {
    /// A `Public<MinSig>` that's type-erased as a hex string
    pub poly: String,
}

// 2-hop serialization
let poly: Poly<MinSig> = ...;
let t = MySerdeStuff { poly: hex(poly.encode().as_ref()) };
let ser_t = serde_json::to_string(&t).unwrap();
// ...
// 2-hop deserialization
let deser_t: MySerdeStuff = serde_json::from_str(&ser_t).unwrap();
let poly_bytes = from_hex(&deser_t.poly).unwrap();
let poly = Poly::<MinSig>::decode(&poly_bytes).unwrap();
```
It's common to have `serde` feature flags lying around in crates so that you can directly serialize / deserialize without intermediate steps or type erasure, i.e.:
```rust
#[derive(Serialize, Deserialize)]
pub struct MySerdeStuff {
    pub poly: Public<MinSig>,
}

let poly: Poly<MinSig> = ...;
let t = MySerdeStuff { poly };
// one-hop serialization
let ser_t = serde_json::to_string(&t).unwrap();
// one-hop deserialization
let deser_t: MySerdeStuff = serde_json::from_str(&ser_t).unwrap();
```
See for example how we did it over in kona.
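For reference, the wrapping itself doesn't have to be large. A minimal sketch of a `serde(with = ...)` adapter over `commonware_utils::{hex, from_hex}` (assuming `hex(&[u8]) -> String` and `from_hex(&str) -> Option<Vec<u8>>`; not necessarily what this PR's `serde_hex` module does):

```rust
mod serde_hex {
    use commonware_utils::{from_hex, hex};
    use serde::{Deserialize, Deserializer, Serializer};

    // Serialize raw bytes as a hex string.
    pub fn serialize<S: Serializer>(bytes: &[u8], serializer: S) -> Result<S::Ok, S::Error> {
        serializer.serialize_str(&hex(bytes))
    }

    // Deserialize a hex string back into raw bytes.
    pub fn deserialize<'de, D: Deserializer<'de>>(deserializer: D) -> Result<Vec<u8>, D::Error> {
        let s = String::deserialize(deserializer)?;
        from_hex(&s).ok_or_else(|| serde::de::Error::custom("invalid hex"))
    }
}

// Usage: annotate a `Vec<u8>` field with `#[serde(with = "serde_hex")]`.
```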
examples/reshare/src/dkg/actor.rs (Outdated)
```rust
    &mut receiver,
);

while let Some(message) = self.mailbox.next().await {
```
I'm tempted to suggest merging this with `application`, because we only ever interact with it (what I've been using as the "gut check" for when an actor should be merged/standalone)?

I think the dkg logic could remain in its own function, but it seems clearer to have this run inside the application instead of as another actor that is "driven" by blocks? We can debate tomorrow.
Yeah, could certainly do that. I mainly separated them as a best practice for the example; hopefully, we can steer people away from doing expensive computation in the application control loop (soon, that'll be foolproof with the post-coding abstraction).

Anything that can slow down building (and, by extension, consensus) == bad imo
^ i.e., the way I think about "application" is that it's an incredibly hot control loop; in a well-designed CW app, you should never be able to hold it up for more than ~a few ms (and all computation done there should be critical to making progress to the next view).
fair retort
```rust
let genesis_digest = genesis.digest();
let built = Arc::new(Mutex::new(None));

while let Some(message) = self.mailbox.next().await {
```
Looking forward to the actor-less abstraction 👀
All I could do before internet cut out. Will keep reviewing.
```rust
}

// Ask the DKG actor for a result to include
let reshare = context
```
I still don't feel great about this `timeout` here.

The intention with the consensus process dropping the response channel when it no longer needed something was to have a clean abstraction for the application to build ASAP with whatever it had and then return. If `dkg` doesn't have a commitment, it should just be able to return, no?
`dkg::Actor` can be doing work when the request for a commitment comes in, and may not be able to immediately return `None`.

I originally had the work that `DkgManager` does in a separate thread to make sure that the control loop for the `dkg::Actor` could always issue snappy responses, though it led to the code being a bit less clear to follow (from your initial review @ #1796 (comment)).

Now, since actions are dispatched inline, it's possible that the reshare process could hold up block building without this timeout.
monorepo/examples/reshare/src/dkg/actor.rs (Lines 279 to 299 in 9aa01bd)
```rust
match relative_height.cmp(&(BLOCKS_PER_EPOCH / 2)) {
    Ordering::Less => {
        // Continuously distribute shares to any players who haven't acknowledged
        // receipt yet.
        manager.distribute(epoch).await;
        // Process any incoming messages from other dealers/players.
        manager.process_messages(epoch).await;
    }
    Ordering::Equal => {
        // Process any final messages from other dealers/players.
        manager.process_messages(epoch).await;
        // At the midpoint of the epoch, construct the deal outcome for inclusion.
        manager.construct_deal_outcome(epoch).await;
    }
    Ordering::Greater => {
        // Process any incoming deal outcomes from dealing contributors.
        manager.process_block(epoch, block).await;
    }
}
```
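To illustrate the trade-off being debated, here is a rough sketch of the ask-with-timeout pattern, using tokio primitives as stand-ins for the runtime's equivalents (hypothetical names, not this PR's code):

```rust
use std::time::Duration;
use tokio::{sync::oneshot, time::timeout};

// Stand-in for the reshare commitment type.
type Commitment = Vec<u8>;

// The builder hands the sender half to the DKG actor's mailbox, then waits a
// bounded amount of time for a commitment; if the actor is busy dispatching
// inline work, the block is built without one.
async fn reshare_for_block(pending: oneshot::Receiver<Commitment>) -> Option<Commitment> {
    match timeout(Duration::from_millis(50), pending).await {
        Ok(Ok(commitment)) => Some(commitment),
        _ => None, // timed out or the actor dropped the responder
    }
}
```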
```rust
    return;
}

// Verify the block
```
We need to check this before the re-proposal logic. Otherwise, someone could go right to the last height and re-propose (skipping the inner heights).
If we checked this first, we'd never be able to check the re-propose case. `block.height != parent.height + 1` and `block.parent != parent.digest()` are always true when re-proposing, which would cause verification to always fail.

Note the order of checks here:
- If re-proposing (parent commitment == proposed commitment), the block's height must be the last height in the epoch to validate.
- Else, the block must:
  - Have a height that increases by 1 relative to its parent.
  - Have its parent digest set to the digest of the parent.
  - Be in the same epoch as the round from the consensus `Context`.

Because you cannot re-propose without building on top of a previously verified parent, and the standard rules are followed prior to the last height, it should not be possible to skip the inner heights here.

tl;dr - this ordering saves us:
```rust
if block.height == get_last_height(round.epoch()) {
```
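A rough sketch of that ordering, with hypothetical minimal types and helpers standing in for the PR's real ones (illustrative only):

```rust
const BLOCKS_PER_EPOCH: u64 = 100; // illustrative value

fn get_last_height(epoch: u64) -> u64 {
    (epoch + 1) * BLOCKS_PER_EPOCH - 1
}

fn epoch_of(height: u64) -> u64 {
    height / BLOCKS_PER_EPOCH
}

struct Block {
    height: u64,
    parent: [u8; 32],
    commitment: Vec<u8>,
}

fn verify(block: &Block, parent: &Block, parent_digest: [u8; 32], epoch: u64) -> bool {
    // Re-proposal case first: same commitment as the parent is only valid at
    // the last height of the epoch.
    if block.commitment == parent.commitment {
        return block.height == get_last_height(epoch);
    }
    // Otherwise the standard rules apply: height increments by one, the parent
    // digest matches, and the block sits in the same epoch as the round.
    block.height == parent.height + 1
        && block.parent == parent_digest
        && epoch_of(block.height) == epoch
}
```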
```rust
    }
}

impl<V: Variant, P: PublicKey> p2p::Coordinator for Supervisor<V, P> {
```
Given we are changing consensus participants but not peers, I wonder if we should separate this?
Good idea - we can if you'd prefer! Not a huge deal in the example; easy enough to construct a supervisor for the sake of the P2P resolver that marshal needs.
examples/reshare/src/dkg/actor.rs (Outdated)
```rust
if inactive_participants.len() < self.num_participants_per_epoch {
    // Choose some random active participants to also be players if there are not enough.
    let mut rng = StdRng::seed_from_u64(0);
```
use context's rng?
Is there a way to do seeded RNG with `Context`? We need this to be deterministic (seeded) so that all participants grab the exact same set of initial players.
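For illustration, a hypothetical sketch of keeping the selection deterministic across participants while still varying it per epoch (not this PR's code; assumes `rand`'s `StdRng` and `SliceRandom`):

```rust
use rand::{rngs::StdRng, seq::SliceRandom, SeedableRng};

// Every participant derives the same seed from shared state (the epoch), so
// everyone selects the exact same fill-in players.
fn pick_fill_ins(active: &[u64], needed: usize, epoch: u64) -> Vec<u64> {
    let mut rng = StdRng::seed_from_u64(epoch);
    active.choose_multiple(&mut rng, needed).copied().collect()
}
```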
```rust
let (mut network, mut oracle) = discovery::Network::new(context.with_label("network"), p2p_cfg);

// Register all possible peers
oracle.register(0, peer_config.all_peers()).await;
```
Given we don't update oracle in subsequent epochs, we should remove the p2p oracle impl from supervisor? (see earlier comment)
```rust
}

// If ends with contiguous_height, ensure it is at least required_container
if metric.ends_with("_processed_height") {
```
We should ensure this runs over a few epochs (it may already but just calling out to be sure)
Yeah it does! We've got `test_1k` that goes over 10 epochs.
Codecov Report

❌ Patch coverage is

```diff
@@            Coverage Diff             @@
##             main    #1796      +/-   ##
==========================================
+ Coverage   92.10%   92.20%   +0.09%
==========================================
  Files         307      308       +1
  Lines       79648    81273    +1625
==========================================
+ Hits        73362    74938    +1576
- Misses       6286     6335      +49
```

... and 10 files with indirect coverage changes.
Overview

Adds an example for resharing over a replicated log, rather than using a trusted arbiter (à la commonware-vrf).

Makes progress on #1708 - to also reconfigure after performing a reshare, we'll need to have the epocher example in.