-
Notifications
You must be signed in to change notification settings - Fork 15
chore: remove role from backup operator #296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
e29ccd7 to
155b31b
Compare
Consolidated Tests Results 2025-12-01 - 20:32:09Test ResultsDetails
test-reporter: Run #194
🎉 All tests passed!TestsView All Tests
🍂 No flaky tests in this run. Github Test Reporter by CTRF 💚 🔄 This comment has been updated |
155b31b to
ccfb057
Compare
ccfb057 to
c7fb7ba
Compare
c7fb7ba to
452d1f2
Compare
7cd1c0a to
7808453
Compare
Actually bc2wrap does not really follow a standard format. For now I would try to avoid having diverging ways of serializing An alternative approach is to use a standard key format instead of bc2wrap or |
by bc2wrap I mean the sec1 format, actually. since bincode doesn't add any metadata when serializing byte vectors |
2b2a601 to
e4d74e3
Compare
jot2re
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Mainly some minor comments, in particular whether there are certain places where it makes sense to use a hashmap with the verification key as the key-value instead of an index.
I see you made a todo for backwards compatibility tests, are you continuing with making a PR for that?
core-client/src/lib.rs
Outdated
|
|
||
| let mut req_tasks = JoinSet::new(); | ||
| for (core_idx, ce) in core_endpoints.iter() { | ||
| for (endpoint_idx, ce) in core_endpoints.iter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this double loop is the best approach to handle this. Would it not be less fragile to instead have the core_endpoint be a hashmap of the verification key to the CoreServiceEndpointClient? Then it is possible to look up the end point directly for the key in a custodian_recovery_outputs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I think it would be better if we remove the party ID from the core-client code completely since going forward there will be no more unique, long-term praty IDs. but I think that will need to be in another PR as that's used everywhere, in all the other core-client functions at the moment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, I think so too! Can I get you to make an issue for this and link it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the issue link in cd1a0ef
core/service/src/backup/tests.rs
Outdated
| engine::base::derive_request_id, | ||
| }; | ||
| use aes_prng::AesRng; | ||
| use alloy_primitives::Address; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if we are still relying too much on arrays in this file, since now hashmaps with verification keys might make more sense in certain places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the verification key cannot be the key for the hashmap as it does not implement Ord. I wrote it in a comment here
https://github.com/zama-ai/kms/pull/296/files#diff-d494ecdbb919122886fc25291a0e4a910989bcd9355e234ad5e0d50c4c379fbcR593
in core/service/src/backup/tests.rs
we could implement Ord I suppose but these hashmaps are only used for managing our tests, so I was hesitant to add something in an important type like PublicSigKey that will only be used for testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I missed that.
I don't think there is an issue in adding Ord as it is just additional functionality that does not change anything with the previous code.
Alternatively .verf_key_id() can be used to get a byte array that is a unique representation of the PublicSigKey. Maybe that is the best to use as index instead of the entire key?
TBH I think either approach is totally fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated it to use verf_key_id here e0541e6
core/service/src/backup/tests.rs
Outdated
| ) -> ( | ||
| BTreeMap< | ||
| Role, | ||
| Address, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the address used here instead of the verification key itself like in other files?
I agree both work. But I might be more inclined to the verification key since then no changes will be needed in case we change the signing scheme
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #296 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in this case verf_key_id() would be more suitable as it is meant to be more general than the Address
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed it here e0541e6
| async fn run_custodian_recovery_init( | ||
| kms_clients: &HashMap<u32, CoreServiceEndpointClient<Channel>>, | ||
| ) -> Vec<RecoveryRequest> { | ||
| ) -> Vec<(u32, RecoveryRequest)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the use of role index and not verification key here? I think this leads back to a previous comment about storing kms_clients in a hash map keyed with the verification key instead of role index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
answered here #296 (comment)
I had to add this u32 to link the request to the key used in the core-client map
| let backup_id = recovery_requests[0].backup_id.clone().unwrap(); | ||
| let mut outputs_for_operators = HashMap::new(); | ||
| ) -> HashMap<Address, (u32, CustodianRecoveryRequest)> { | ||
| use kms_grpc::kms::v1::CustodianRecoveryOutput; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: could you move the import to the top of the file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handled it in e0541e6
| cur_op_role, | ||
| ); | ||
| parsed_custodian_rec.insert(cur_op_role, old_val); | ||
| match parsed_custodian_rec.entry(output.custodian_role) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: I think this can be done a bit simpler using or_insert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think i prefer the current approach since it logs a weird behaviour
| CustodianRecoveryOutput { | ||
| custodian_role, | ||
| operator_role, | ||
| operator_verification_key: bc2wrap::serialize(&operator_verification_key).unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just add a comment that this is a legacy approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added reference to the issue about investigating key serialization in e0541e6
| root_path: &Path, | ||
| threshold: usize, | ||
| operator_role: Role, | ||
| operator_id: usize, // not actual operator ID, just for managing where the files go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it maybe be a string representing a dir instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this operator_id isn't used for anything other than specifying the path
I think it's better like this because this ID is used for the functions later on as we need to do a nested iteration over both operator ID and custodian ID. so it's cleaner for the directory structure when both IDs are in the path
I am a bit surprised about that, because I did some experiments with the different serializations we had for verification keys and I found the result of bc2wrap to be different from the |
you're right that the serialization is different, it was my misunderstanding. we'll track this issue in zama-ai/kms-internal/issues/2836 |
4b7e4dc to
f9d1e3c
Compare
f9d1e3c to
aaa4f33
Compare
Description of changes
One question/discussion, should we use bc2wrap for PublicSigKey like everywhere else in the code or we should use tfhe::safe_serialization? My understanding is that the custodian CLI is more of an independent component so it should be using more more "standard" formats, that's why I went with bc2wrap. But we can easily change it into
tfhe::safe_serializationif we need to.Issue ticket number and link
Closes zama-ai/kms-internal#2826
PR Checklist
I attest that all checked items are satisfied. Any deviation is clearly justified above.
chore: ...).TODO(#issue).unwrap/expect/paniconly in tests or for invariant bugs (documented if present).devopslabel + infra notified + infra-team reviewer assigned.!and affected teams notified.Zeroize+ZeroizeOnDropimplemented.unsafe; if unavoidable: minimal, justified, documented, and test/fuzz covered.Dependency Update Questionnaire (only if deps changed or added)
Answer in the
Cargo.tomlnext to the dependency (or here if updating):More details and explanations for the checklist and dependency updates can be found in CONTRIBUTING.md