Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: introduce DecoratorSpan to reduce MastNode size #1621

Draft
wants to merge 1 commit into
base: next
Choose a base branch
from

Conversation

yasonk
Copy link
Contributor

@yasonk yasonk commented Jan 12, 2025

This is my attempt to introduce a DecoratorSpan. It seems to be more complicated than I thought it would be, so I decided to open a draft PR to make sure I'm on the right path before I start try to test the changes. Also if somebody just wants to take over, it is fine with me.

I'll add some comments inline to highlight some difficulties or decisions points.

@yasonk yasonk marked this pull request as draft January 12, 2025 23:57
@@ -151,6 +148,8 @@ impl MastForestMerger {
let new_decorator_id = if let Some(existing_decorator) =
self.decorators_by_hash.get(&merging_decorator_hash)
{
// This implies that in some cases it is possible to have intersecting decorator sets
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implies that in some cases it is possible to have intersecting decorator sets, meaning a DecoratorSpan will potentially need to be split up into multiple DecoratorSpan objects. Not sure if this is a real scenario or what prevents it from becoming real.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the original proposal didn't account for this. We definitely don't want to split a DecoratorSpan into multiple spans for a single node/operation. So, I think we have 2 options here:

  1. Either we drop the requirement that decorators in the decorator list are unique. Then, if two nodes use the same decorator, we'd have such decorators duplicated so that each of them has a different ID.
  2. Or, we add a level of indirection where DecoratorSpan points to a list of decorator IDs, and then these IDs point to a unique list of decorators.

Option 1 could make sense if we expect vast majority of decorators to be unique anyway - so, duplicating a small number of them may not be an issue. On the other hand, option 2 is a more generic approach and is probably less work. So, my intuition is that option 2 is probably a better approach here (assuming in fact is is not more complicated to implement than options 1).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bobbinth I don't fully understand Option 2. But just to understand the picture better.
This is my understanding:
When you add nodes and operations each decorator is assigned a unique DecoratorID. Therefore there is no overlap of DecoratorId between nodes (I didn't see the check for existing decorators when adding decorators).

The only place where I found a potential of overlap is this code checking for existing decorators when merging.
I'm not familiar with how impactful or frequent merging is, but if we remove the code to check for existing decorator, it will mean decorator spans for an operation are contiguous.

If they are not contiguous and we need another list to track that, then the initial goal of reducing memory footprint will not be met. (We were trying to replace a Vec with a smaller structure. Or does Option 2 simply push the memory burden away from MastForest?

If we have DecoratorSpan -> Vec(oldDecoratorID, newDecoratorId) -> DecoratorID

I don't understand where the middle part would live, and why would this be better than what is currently in main.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you add nodes and operations each decorator is assigned a unique DecoratorID.

This is not exactly correct: only unique decorators get uniq decorator IDs. That is, if we want to add two identical decorators to the same MastForest, both of them would get the same ID. This happens in MastForestBuilder::ensure_decorator().

If they are not contiguous and we need another list to track that, then the initial goal of reducing memory footprint will not be met. (We were trying to replace a Vec with a smaller structure. Or does Option 2 simply push the memory burden away from MastForest?

If we have DecoratorSpan -> Vec(oldDecoratorID, newDecoratorId) -> DecoratorID

I don't understand where the middle part would live, and why would this be better than what is currently in main.

The main idea of option 2 is to have a vector of decorator IDs which could contain duplicate IDs. So, MastForest could look like so:

pub struct MastForest {
    nodes: Vec<MastNode>,
    roots: Vec<MastNodeId>,
    decorator_ids: Vec<DecoratorId>, // new filed, open to use a different name for it
    decorators: Vec<Decorator>,
    advice_map: AdviceMap,
}

Then, DecoratorSpan would point to the decorator_ids field, and then we could use individual IDs from there to look up decorators.

To illustrate this, let's say we have a MastForest with 2 nodes. Node 1 would have decorators a and b attached to it, and node 2 would have decorators a and c attached to it. Then, this would result in the following:

  • decorators would look something like this: [a, b, c] - i.e., ID of a is 0, ID of b is 1, and ID of c is 2.
  • decorator_ids would look as follows [0, 1, 0, 2].
  • The decorator span for node 1 would then be (0, 2), and for node 2 it would be (2, 2).

And so, when we want to look up decorators for node 1, we look at its decorator span that says that we need to grab the first two IDs from the decorator_ids. These IDs are 0 and 1 and so then we can look up these decorators in the decorators vector.

This does increase the size of the MastForest somewhat and also creates an extra lookup "hop". But the main thing we are able to avoid is creating 2 vectors per node (which would require 24 bytes per vector, not counting the actual vector data). Instead, each node now can contain 2 decorator spans (which requires 16 bytes total - and we can compress this to 10). So, we still get some memory savings, but again, the main benefit is avoiding creating thousands (and potentially millions) of vectors - which is a significant bottleneck.

@@ -456,14 +456,18 @@ impl MastForestBuilder {
}

pub fn set_before_enter(&mut self, node_id: MastNodeId, decorator_ids: Vec<DecoratorId>) {
self.mast_forest[node_id].set_before_enter(decorator_ids);
// Does this need to have a Vec<DecoratorSpan>?
let span = DecoratorSpan::new_collection(decorator_ids).into_iter().next().unwrap();
Copy link
Contributor Author

@yasonk yasonk Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure having a collection of spans is needed, but new_collection method creates a collection because it seems that it may be possible to have non-contiguous decorator spans. See this comment: https://github.com/0xPolygonMiden/miden-vm/pull/1621/files#r1912588483


let new_node_fingerprint = self.fingerprint_for_node(&self[node_id]);
self.hash_by_node_id.insert(node_id, new_node_fingerprint);
}

pub fn set_after_exit(&mut self, node_id: MastNodeId, decorator_ids: Vec<DecoratorId>) {
self.mast_forest[node_id].set_after_exit(decorator_ids);
// Does this need to have a Vec<DecoratorSpan>?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure having a collection of spans is needed, but new_collection method creates a collection because it seems that it may be possible to have non-contiguous decorator spans. See this comment: https://github.com/0xPolygonMiden/miden-vm/pull/1621/files#r1912588483

@@ -642,10 +642,13 @@ impl Assembler {
)?;

if let Some(decorator_ids) = block_builder.drain_decorators() {
// This just picks off the first decorator span, but should there be more than one?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again not sure if there will ever be a case for more than one span here


self.decorators = new_decorators;
pub fn prepend_decorators(&mut self, decorator_ids: DecoratorSpan) {
// At which point do we want to create a decorator span? Do we convert the vector into a single span?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the calling function logic depends on whether the spans are contiguous, but in this case the assumption is that they are, and therefore prepend_decorators accepts a single DecorartorSpan.

@@ -95,7 +95,7 @@ impl<'a> DecoratorIterator<'a> {
/// Returns the next decorator but only if its position matches the specified position,
/// otherwise, None is returned.
#[inline(always)]
pub fn next_filtered(&mut self, pos: usize) -> Option<&DecoratorId> {
pub fn next_filtered(&mut self, pos: usize) -> Option<&DecoratorSpan> {
Copy link
Contributor Author

@yasonk yasonk Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where I'm currently having difficulties. This should probably keep returning Option<&DecoratorId>, but then I need to create a method to iterate over vector of DecoratorSpan and then iterate over individual DecoratorSpan objects to get DecoratorId.

I tried to modify DecoratorIterator similar to below, but couldn't figure out how to set types for inner, outer, and DecoratorSpan::iter():

pub struct DecoratorIterator<'a> {
    outer: slice::Iter<'a, (usize, DecoratorSpan<'a>)>, // Iterator over DecoratorList
    inner: Option<slice::Iter<'a, DecoratorId>>,       // Specific type for inner iterator
}

impl<'a> DecoratorIterator<'a> {
    pub fn new(decorators: &'a [(usize, DecoratorSpan<'a>)]) -> Self {
        Self {
            outer: decorators.iter(),
            inner: None,
        }
    }
}

impl<'a> Iterator for DecoratorIterator<'a> {
    type Item = DecoratorId;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            if let Some(inner_iter) = &mut self.inner {
                if let Some(decorator_id) = inner_iter.next() {
                    return Some(*decorator_id);
                }
                // Inner iterator is exhausted
                self.inner = None;
            }

            // Fetch the next DecoratorSpan from the outer iterator
            if let Some((_index, decorator_span)) = self.outer.next() {
                self.inner = Some(decorator_span.iter());
            } else {
                // Outer iterator is exhausted
                return None;
            }
        }
    }
}

The problem is that if I use concrete types, then DecoratorSpan.iter() has to return a concrete iterator type, which seems strange. But if I use a generic type, then anyone who uses DecoratorIterator has to pass in a generic parameter, which seems to complicate the code unnecessarily, and also seems strange.

Any suggestions would be appreciated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think out of these options, returning a concrete type from DecoratorSpan.iter() seems like a better one (and I don't see a significant problem with it). But also, this approach won't work any more if we go with option 2 from #1621 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants