Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[commonware-storage/mmr] less restrictive in-mem-mmr forgetting #484

Merged
merged 5 commits into from
Feb 12, 2025

Conversation

roberto-bayardo
Copy link
Collaborator

@roberto-bayardo roberto-bayardo commented Feb 12, 2025

Addresses #459

@roberto-bayardo roberto-bayardo force-pushed the improve-mmr-forgetting branch 10 times, most recently from a2d6750 to bf64d3e Compare February 12, 2025 02:32
@roberto-bayardo roberto-bayardo changed the title [WIP] [commonware-storage/mmr] less restrictive in-mem-mmr forgetting [commonware-storage/mmr] less restrictive in-mem-mmr forgetting Feb 12, 2025
@roberto-bayardo roberto-bayardo marked this pull request as ready for review February 12, 2025 02:46
oldest_remembered_node_pos: u64,
) -> u64 {
for (peak_pos, height) in peak_iterator {
if peak_pos >= oldest_remembered_node_pos {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I like the idea of continuing on peak_pos < oldest_remembered_node_pos to reduce nesting

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

me too, changed.

return oldest_remembered_node_pos;
}
}
0 // mmr is empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we should be more careful about this case? Couldn't you also hit here if each peak_pos is < oldest_remembered_node_pos? I think we should handle those differently.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a valid MMR, the very last node is always a peak. So this wouldn't happen, unless the MMR is empty, or oldest_remembered_node_pos is specified incorrectly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm referring to the latter case. It "shouldn't" happen but I think we should consider returning an error if it does.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I have made it panic(), is that OK vs returning error? I think of this like an array out of bounds condition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it should never ever happen, I think assert/panic makes sense.

0 // mmr is empty
}

/// Returns the position of the oldest node whose hash will be required to prove inclusion of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit whose "digest'

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

/// Forgetting this position will render the node with position `provable_pos` unprovable.
pub(crate) fn oldest_required_proof_pos(peak_iterator: PeakIterator, provable_pos: u64) -> u64 {
for (peak_pos, height) in peak_iterator {
if peak_pos >= provable_pos {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: again prefer continue on <

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

return provable_pos;
}
}
0 // mmr is empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like above, want to be careful with this condition when peak_iterator is non-zero

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

responded above


// The hashes of the MMR's peaks that are older than oldest_remembered_pos, keyed by their
// position.
old_peaks: HashMap<u64, H::Digest>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok so I was on-track with my questions about separating peak tracking 👍

/// Forgotten nodes will no longer be provable, nor will some nodes that follow them in some
/// cases. Use forget_max() to guarantee a desired node (and all that follow it) will remain
/// provable after forgetting.
pub fn forget_to_pos(&mut self, pos: u64) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we want this one to be public (could be called forget_unchecked). Seems better to corral users to either forget_all (which I think should be called forget_max) or forget_max (which I think should be called forget_to_pos).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find forget_all to be more descriptive/literal than forget_max for the case where we forget everything. I think forget_max may just be too ambiguous. Still trying to think of a better name though...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright I've changed "forget_max" to just plain old "forget", left forget_to_pos named as is but reduced its visiblity to internal to the crate only. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its definitely an improvement. 👍

Not sure how I feel about forget_all vs forget_max but that's your call. Definitely "good enough" now.

@@ -85,7 +85,7 @@ impl<B: Blob, E: RStorage<B>, H: Hasher> Mmr<B, E, H> {
let item = journal.read(i).await?;
vec.push(item);
}
let mem_mmr = MemMmr::init(vec, oldest_peak);
let mem_mmr = MemMmr::init(vec, oldest_peak, vec![]);
Copy link
Contributor

@patrick-ogrady patrick-ogrady Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh hmm, not in this case, since we're bootstrapping with all nodes starting from the very oldest peak. But we probably don't want to do this, and instead bootstrap in a "forget_all" state.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh how weird, seems I managed to edit your comment with my reply.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to bootstrap in the "forget_all" state (in which case there are old peaks).

@@ -85,7 +85,7 @@ impl<B: Blob, E: RStorage<B>, H: Hasher> Mmr<B, E, H> {
let item = journal.read(i).await?;
vec.push(item);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend calling vec something more descriptive.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have reworked this code to bootstrap in the "forget_all" state, with more descriptive names.

@roberto-bayardo roberto-bayardo force-pushed the improve-mmr-forgetting branch 2 times, most recently from 0818788 to c31e780 Compare February 12, 2025 17:56
if parent_pos == provable_pos {
// If we hit the node we are trying to prove while walking the path, then no
// older nodes are required to prove it.
println!("HELLO!!!!");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hehe, you think? Leftover from my coverage debugging...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

return oldest_remembered_node_pos;
}
}
0 // mmr is empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm referring to the latter case. It "shouldn't" happen but I think we should consider returning an error if it does.

@roberto-bayardo roberto-bayardo merged commit 8d2d837 into main Feb 12, 2025
7 checks passed
@roberto-bayardo roberto-bayardo deleted the improve-mmr-forgetting branch February 12, 2025 23:15
@roberto-bayardo roberto-bayardo restored the improve-mmr-forgetting branch February 12, 2025 23:15
@roberto-bayardo roberto-bayardo deleted the improve-mmr-forgetting branch February 12, 2025 23:15
Copy link

codecov bot commented Feb 12, 2025

Codecov Report

Attention: Patch coverage is 98.24561% with 4 lines in your changes missing coverage. Please review.

Project coverage is 94.30%. Comparing base (ccf2f88) to head (7ea6d0b).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
storage/src/mmr/iterator.rs 96.55% 2 Missing ⚠️
storage/src/mmr/mem.rs 97.40% 2 Missing ⚠️
@@            Coverage Diff             @@
##             main     #484      +/-   ##
==========================================
+ Coverage   94.25%   94.30%   +0.05%     
==========================================
  Files          97       97              
  Lines       28213    28372     +159     
==========================================
+ Hits        26591    26756     +165     
+ Misses       1622     1616       -6     
Files with missing lines Coverage Δ
storage/src/mmr/journaled.rs 99.06% <100.00%> (+0.02%) ⬆️
storage/src/mmr/verification.rs 95.40% <100.00%> (+0.64%) ⬆️
storage/src/mmr/iterator.rs 98.69% <96.55%> (-1.31%) ⬇️
storage/src/mmr/mem.rs 95.25% <97.40%> (+1.72%) ⬆️

... and 2 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ccf2f88...7ea6d0b. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants