Skip to content

Conversation

@kc1212
Copy link
Contributor

@kc1212 kc1212 commented Nov 26, 2025

Description of changes

One question/discussion, should we use bc2wrap for PublicSigKey like everywhere else in the code or we should use tfhe::safe_serialization? My understanding is that the custodian CLI is more of an independent component so it should be using more more "standard" formats, that's why I went with bc2wrap. But we can easily change it into tfhe::safe_serialization if we need to.

Issue ticket number and link

Closes zama-ai/kms-internal#2826

PR Checklist

I attest that all checked items are satisfied. Any deviation is clearly justified above.

  • Title follows conventional commits (e.g. chore: ...).
  • Tests added for every new pub item and test coverage has not decreased.
  • Public APIs and non-obvious logic documented; unfinished work marked as TODO(#issue).
  • unwrap/expect/panic only in tests or for invariant bugs (documented if present).
  • No dependency version changes OR (if changed) only minimal required fixes.
  • No architectural protocol changes OR linked spec PR/issue provided.
  • No breaking deployment config changes OR devops label + infra notified + infra-team reviewer assigned.
  • No breaking gRPC / serialized data changes OR commit marked with ! and affected teams notified.
  • No modifications to existing versionized structs OR backward compatibility tests updated.
  • No critical business logic / crypto changes OR ≥2 reviewers assigned.
  • No new sensitive data fields added OR Zeroize + ZeroizeOnDrop implemented.
  • No new public storage data OR data is verifiable (signature / digest).
  • No unsafe; if unavoidable: minimal, justified, documented, and test/fuzz covered.
  • Strongly typed boundaries: typed inputs validated at the edge; no untyped values or errors cross modules.
  • Self-review completed.

Dependency Update Questionnaire (only if deps changed or added)

Answer in the Cargo.toml next to the dependency (or here if updating):

  1. Ownership changes or suspicious concentration?
  2. Low popularity?
  3. Unusual version jump?
  4. Lacking documentation?
  5. Missing CI?
  6. No security / disclosure policy?
  7. Significant size increase?

More details and explanations for the checklist and dependency updates can be found in CONTRIBUTING.md

@kc1212 kc1212 requested a review from a team as a code owner November 26, 2025 08:58
@cla-bot cla-bot bot added the cla-signed The CLA has been signed. label Nov 26, 2025
@kc1212 kc1212 force-pushed the kc1212/chore/2826/remove-backup-operator-role branch 4 times, most recently from e29ccd7 to 155b31b Compare November 26, 2025 14:57
@github-actions
Copy link

github-actions bot commented Nov 26, 2025

Consolidated Tests Results 2025-12-01 - 20:32:09

Test Results

passed 25 passed

Details

tests 25 tests
clock not captured
tool junit-to-ctrf
build kind-testing arrow-right test-reporter link #194
pull-request chore: remove role from backup operator link #296

test-reporter: Run #194

Tests 📝 Passed ✅ Failed ❌ Skipped ⏭️ Pending ⏳ Other ❓ Flaky 🍂 Duration ⏱️
25 25 0 0 0 0 0 not captured

🎉 All tests passed!

Tests

View All Tests
Test Name Status Flaky Duration
identifiers::tests::request_id_ordering 3ms
identifiers::tests::test_id_type_conversions 3ms
identifiers::tests::test_invalid_hex_length 3ms
identifiers::tests::test_invalid_id_all_zeros 3ms
identifiers::tests::test_invalid_v1_request_id 3ms
identifiers::tests::test_key_id_from_str 3ms
identifiers::tests::test_key_id_protobuf_conversion 3ms
identifiers::tests::test_request_id_random 3ms
identifiers::tests::test_v1_request_id_conversion 3ms
identifiers::tests::test_v1_request_id_to_u128_conversion 3ms
identifiers::tests::test_v1_request_id_with_prefix 3ms
identifiers::tests::test_v1_request_id_with_whitespace 3ms
identifiers::tests::test_valid_hex_characters 3ms
rpc_types::tests::idempotent_plaintext 3ms
rpc_types::tests::test_abi_encoding_fhevm 3ms
rpc_types::tests::test_eip712_verification 3ms
rpc_types::tests::test_enum_default 3ms
rpc_types::tests::test_old_fhe_type_enum_compatibility 3ms
rpc_types::tests::test_request_id 3ms
rpc_types::tests::test_request_id_raw_string 3ms
rpc_types::tests::test_types_plaintext_ser 3ms
full_gen_tests_default_threshld_sequential_crs 2m 20s
test_threshld_insecure 3m 21s
full_gen_tests_default_centralzd_sequential_crs 1.8s
test_centralzd_insecure 1m 57s

🍂 No flaky tests in this run.

Github Test Reporter by CTRF 💚

🔄 This comment has been updated

@kc1212 kc1212 force-pushed the kc1212/chore/2826/remove-backup-operator-role branch from 155b31b to ccfb057 Compare November 26, 2025 14:58
@kc1212 kc1212 self-assigned this Nov 26, 2025
@kc1212 kc1212 force-pushed the kc1212/chore/2826/remove-backup-operator-role branch from ccfb057 to c7fb7ba Compare November 26, 2025 15:28
@kc1212 kc1212 force-pushed the kc1212/chore/2826/remove-backup-operator-role branch from c7fb7ba to 452d1f2 Compare November 26, 2025 16:05
@kc1212 kc1212 force-pushed the kc1212/chore/2826/remove-backup-operator-role branch from 7cd1c0a to 7808453 Compare November 27, 2025 13:14
@jot2re
Copy link
Collaborator

jot2re commented Nov 27, 2025

One question/discussion, should we use bc2wrap for PublicSigKey like everywhere else in the code or we should use tfhe::safe_serialization? My understanding is that the custodian CLI is more of an independent component so it should be using more more "standard" formats, that's why I went with bc2wrap. But we can easily change it into tfhe::safe_serialization if we need to.

Actually bc2wrap does not really follow a standard format. For now I would try to avoid having diverging ways of serializing PublicSigKey. Still, what we should probably do is to move to safe_serialization like the rest of the files in the system, but then implement the legacy_serialization interface to allow to still use the old approach to avoid compatibility issues.

An alternative approach is to use a standard key format instead of bc2wrap or safe_serialization. Still, I don't feel that is the best approach since we serialize all the data we need with versioning and out own format everywhere else. Instead we should just make it possible to do a standard export.

@kc1212
Copy link
Contributor Author

kc1212 commented Nov 28, 2025

Actually bc2wrap does not really follow a standard format. For now I would try to avoid having diverging ways of serializing PublicSigKey. Still, what we should probably do is to move to safe_serialization like the rest of the files in the system, but then implement the legacy_serialization interface to allow to still use the old approach to avoid compatibility issues.

by bc2wrap I mean the sec1 format, actually. since bincode doesn't add any metadata when serializing byte vectors

@kc1212 kc1212 force-pushed the kc1212/chore/2826/remove-backup-operator-role branch from 2b2a601 to e4d74e3 Compare November 28, 2025 09:12
Copy link
Collaborator

@jot2re jot2re left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Mainly some minor comments, in particular whether there are certain places where it makes sense to use a hashmap with the verification key as the key-value instead of an index.

I see you made a todo for backwards compatibility tests, are you continuing with making a PR for that?


let mut req_tasks = JoinSet::new();
for (core_idx, ce) in core_endpoints.iter() {
for (endpoint_idx, ce) in core_endpoints.iter() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this double loop is the best approach to handle this. Would it not be less fragile to instead have the core_endpoint be a hashmap of the verification key to the CoreServiceEndpointClient? Then it is possible to look up the end point directly for the key in a custodian_recovery_outputs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I think it would be better if we remove the party ID from the core-client code completely since going forward there will be no more unique, long-term praty IDs. but I think that will need to be in another PR as that's used everywhere, in all the other core-client functions at the moment

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I think so too! Can I get you to make an issue for this and link it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added the issue link in cd1a0ef

engine::base::derive_request_id,
};
use aes_prng::AesRng;
use alloy_primitives::Address;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we are still relying too much on arrays in this file, since now hashmaps with verification keys might make more sense in certain places

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the verification key cannot be the key for the hashmap as it does not implement Ord. I wrote it in a comment here
https://github.com/zama-ai/kms/pull/296/files#diff-d494ecdbb919122886fc25291a0e4a910989bcd9355e234ad5e0d50c4c379fbcR593

in core/service/src/backup/tests.rs

we could implement Ord I suppose but these hashmaps are only used for managing our tests, so I was hesitant to add something in an important type like PublicSigKey that will only be used for testing

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed that.
I don't think there is an issue in adding Ord as it is just additional functionality that does not change anything with the previous code.
Alternatively .verf_key_id() can be used to get a byte array that is a unique representation of the PublicSigKey. Maybe that is the best to use as index instead of the entire key?
TBH I think either approach is totally fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated it to use verf_key_id here e0541e6

) -> (
BTreeMap<
Role,
Address,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the address used here instead of the verification key itself like in other files?
I agree both work. But I might be more inclined to the verification key since then no changes will be needed in case we change the signing scheme

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this case verf_key_id() would be more suitable as it is meant to be more general than the Address

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed it here e0541e6

async fn run_custodian_recovery_init(
kms_clients: &HashMap<u32, CoreServiceEndpointClient<Channel>>,
) -> Vec<RecoveryRequest> {
) -> Vec<(u32, RecoveryRequest)> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the use of role index and not verification key here? I think this leads back to a previous comment about storing kms_clients in a hash map keyed with the verification key instead of role index

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

answered here #296 (comment)

I had to add this u32 to link the request to the key used in the core-client map

let backup_id = recovery_requests[0].backup_id.clone().unwrap();
let mut outputs_for_operators = HashMap::new();
) -> HashMap<Address, (u32, CustodianRecoveryRequest)> {
use kms_grpc::kms::v1::CustodianRecoveryOutput;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: could you move the import to the top of the file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handled it in e0541e6

cur_op_role,
);
parsed_custodian_rec.insert(cur_op_role, old_val);
match parsed_custodian_rec.entry(output.custodian_role) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: I think this can be done a bit simpler using or_insert

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think i prefer the current approach since it logs a weird behaviour

CustodianRecoveryOutput {
custodian_role,
operator_role,
operator_verification_key: bc2wrap::serialize(&operator_verification_key).unwrap(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just add a comment that this is a legacy approach

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added reference to the issue about investigating key serialization in e0541e6

root_path: &Path,
threshold: usize,
operator_role: Role,
operator_id: usize, // not actual operator ID, just for managing where the files go
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it maybe be a string representing a dir instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this operator_id isn't used for anything other than specifying the path

I think it's better like this because this ID is used for the functions later on as we need to do a nested iteration over both operator ID and custodian ID. so it's cleaner for the directory structure when both IDs are in the path

@jot2re
Copy link
Collaborator

jot2re commented Nov 28, 2025

by bc2wrap I mean the sec1 format, actually. since bincode doesn't add any metadata when serializing byte vectors

I am a bit surprised about that, because I did some experiments with the different serializations we had for verification keys and I found the result of bc2wrap to be different from the to_sec1_bytes(). But maybe you mean bc2wrap directly on the byte vector from to_sec1_bytes() and not the PublicSigKey?

@kc1212
Copy link
Contributor Author

kc1212 commented Dec 1, 2025

by bc2wrap I mean the sec1 format, actually. since bincode doesn't add any metadata when serializing byte vectors

I am a bit surprised about that, because I did some experiments with the different serializations we had for verification keys and I found the result of bc2wrap to be different from the to_sec1_bytes(). But maybe you mean bc2wrap directly on the byte vector from to_sec1_bytes() and not the PublicSigKey?

you're right that the serialization is different, it was my misunderstanding. we'll track this issue in zama-ai/kms-internal/issues/2836

@kc1212 kc1212 force-pushed the kc1212/chore/2826/remove-backup-operator-role branch from 4b7e4dc to f9d1e3c Compare December 1, 2025 18:45
@kc1212 kc1212 force-pushed the kc1212/chore/2826/remove-backup-operator-role branch from f9d1e3c to aaa4f33 Compare December 1, 2025 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed The CLA has been signed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants