feat: introduce DecoratorSpan to reduce MastNode size #1621

yasonk · 2025-01-12T23:57:17Z

This is my attempt to introduce a DecoratorSpan. It turned out to be more complicated than I expected, so I decided to open a draft PR to make sure I'm on the right path before I start testing the changes. Also, if somebody just wants to take over, that is fine with me.

I'll add some inline comments to highlight difficulties and decision points.

yasonk · 2025-01-13T01:20:25Z

core/src/mast/merger/mod.rs

@@ -151,6 +148,8 @@ impl MastForestMerger {
            let new_decorator_id = if let Some(existing_decorator) =
                self.decorators_by_hash.get(&merging_decorator_hash)
            {
+                // This implies that in some cases it is possible to have intersecting decorator sets


This implies that in some cases it is possible to have intersecting decorator sets, meaning a DecoratorSpan will potentially need to be split up into multiple DecoratorSpan objects. Not sure if this is a real scenario or what prevents it from becoming real.

Yeah, I think the original proposal didn't account for this. We definitely don't want to split a DecoratorSpan into multiple spans for a single node/operation. So, I think we have 2 options here:

Either we drop the requirement that decorators in the decorator list are unique. Then, if two nodes use the same decorator, we'd have such decorators duplicated so that each of them has a different ID.

Or, we add a level of indirection where DecoratorSpan points to a list of decorator IDs, and then these IDs point to a unique list of decorators.

Option 1 could make sense if we expect the vast majority of decorators to be unique anyway, so duplicating a small number of them may not be an issue. On the other hand, option 2 is a more generic approach and is probably less work. So, my intuition is that option 2 is probably the better approach here (assuming it is in fact not more complicated to implement than option 1).

@bobbinth I don't fully understand Option 2. But just to understand the picture better.
This is my understanding:
When you add nodes and operations each decorator is assigned a unique DecoratorID. Therefore there is no overlap of DecoratorId between nodes (I didn't see the check for existing decorators when adding decorators).

The only place where I found a potential for overlap is this code, which checks for existing decorators when merging.
I'm not familiar with how impactful or frequent merging is, but if we remove the check for an existing decorator, the decorator span for an operation will always be contiguous.

If they are not contiguous and we need another list to track that, then the initial goal of reducing memory footprint will not be met. (We were trying to replace a Vec with a smaller structure.) Or does Option 2 simply push the memory burden away from MastForest?

If we have DecoratorSpan -> Vec(oldDecoratorID, newDecoratorId) -> DecoratorID

I don't understand where the middle part would live, and why would this be better than what is currently in main.

When you add nodes and operations each decorator is assigned a unique DecoratorID.

This is not exactly correct: only unique decorators get unique decorator IDs. That is, if we add two identical decorators to the same MastForest, both of them get the same ID. This happens in MastForestBuilder::ensure_decorator().
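To make the deduplication concrete, here is a minimal sketch of the behavior described above. It is not the actual MastForestBuilder code: the Builder type, its string-valued decorators, and the id_by_decorator map are hypothetical stand-ins chosen only for illustration.

```rust
use std::collections::HashMap;

/// Stand-in for the real `DecoratorId`: a plain index into the decorator list.
type DecoratorId = u32;

/// Hypothetical, simplified builder; decorators are plain strings here.
#[derive(Default)]
struct Builder {
    decorators: Vec<String>,
    id_by_decorator: HashMap<String, DecoratorId>,
}

impl Builder {
    /// Returns the existing ID for an identical decorator, or allocates a new one.
    fn ensure_decorator(&mut self, decorator: &str) -> DecoratorId {
        if let Some(&id) = self.id_by_decorator.get(decorator) {
            return id;
        }
        let id = self.decorators.len() as DecoratorId;
        self.decorators.push(decorator.to_string());
        self.id_by_decorator.insert(decorator.to_string(), id);
        id
    }
}

/// Node 1 uses decorators `a` and `b`; node 2 uses `a` and `c`.
fn ids_for_two_nodes() -> (Vec<DecoratorId>, Vec<DecoratorId>) {
    let mut builder = Builder::default();
    let node1 = vec![builder.ensure_decorator("a"), builder.ensure_decorator("b")];
    let node2 = vec![builder.ensure_decorator("a"), builder.ensure_decorator("c")];
    (node1, node2)
}
```

With this kind of deduplication, node 2 ends up with IDs 0 and 2, which is not a contiguous range, so a (start, length) span pointing directly into the decorator list cannot describe it.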

If they are not contiguous and we need another list to track that, then the initial goal of reducing memory footprint will not be met. (We were trying to replace a Vec with a smaller structure.) Or does Option 2 simply push the memory burden away from MastForest?

If we have DecoratorSpan -> Vec(oldDecoratorID, newDecoratorId) -> DecoratorID

I don't understand where the middle part would live, and why would this be better than what is currently in main.

The main idea of option 2 is to have a vector of decorator IDs which could contain duplicate IDs. So, MastForest could look like so:

    pub struct MastForest {
        nodes: Vec<MastNode>,
        roots: Vec<MastNodeId>,
        decorator_ids: Vec<DecoratorId>, // new field, open to using a different name for it
        decorators: Vec<Decorator>,
        advice_map: AdviceMap,
    }

Then, DecoratorSpan would point to the decorator_ids field, and then we could use individual IDs from there to look up decorators.

To illustrate this, let's say we have a MastForest with 2 nodes. Node 1 would have decorators a and b attached to it, and node 2 would have decorators a and c attached to it. Then, this would result in the following:

decorators would look something like this: [a, b, c] - i.e., ID of a is 0, ID of b is 1, and ID of c is 2.

decorator_ids would look as follows [0, 1, 0, 2].

The decorator span for node 1 would then be (0, 2), and for node 2 it would be (2, 2).

And so, when we want to look up decorators for node 1, we look at its decorator span that says that we need to grab the first two IDs from the decorator_ids. These IDs are 0 and 1 and so then we can look up these decorators in the decorators vector.
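The two-hop lookup described above could be sketched roughly as follows. The DecoratorSpan layout (a start offset plus a length), the decorators_for method, and the string-valued decorators are assumptions made for this example, not the PR's actual types.

```rust
type DecoratorId = u32;

/// Hypothetical span into `MastForest::decorator_ids`: (start offset, length).
#[derive(Clone, Copy)]
struct DecoratorSpan {
    start: u32,
    len: u32,
}

/// Simplified forest; decorators are string slices for illustration only.
struct MastForest {
    decorator_ids: Vec<DecoratorId>,
    decorators: Vec<&'static str>,
}

impl MastForest {
    /// Resolves a span in two hops: span -> decorator IDs -> decorators.
    fn decorators_for(&self, span: DecoratorSpan) -> Vec<&'static str> {
        let start = span.start as usize;
        let end = start + span.len as usize;
        self.decorator_ids[start..end]
            .iter()
            .map(|&id| self.decorators[id as usize])
            .collect()
    }
}

/// Builds the two-node example: node 1 has (a, b), node 2 has (a, c).
fn example_forest() -> MastForest {
    MastForest {
        decorators: vec!["a", "b", "c"],
        decorator_ids: vec![0, 1, 0, 2],
    }
}
```

Calling decorators_for with the span (0, 2) yields a and b, and with (2, 2) it yields a and c, even though decorator a is stored only once.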

This does increase the size of the MastForest somewhat and also creates an extra lookup "hop". But the main thing we are able to avoid is creating 2 vectors per node (which would require 24 bytes per vector, not counting the actual vector data). Instead, each node now can contain 2 decorator spans (which requires 16 bytes total - and we can compress this to 10). So, we still get some memory savings, but again, the main benefit is avoiding creating thousands (and potentially millions) of vectors - which is a significant bottleneck.
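The size arithmetic can be checked with std::mem::size_of. The sketch below assumes a span made of two u32 fields. The packed variant (a u32 offset plus a u8 length) is only one guess at how the 10-byte figure might be reached; the thread does not specify the compressed layout.

```rust
use std::mem::size_of;

/// Hypothetical span layout: two `u32` fields (offset and length), 8 bytes.
#[allow(dead_code)]
struct DecoratorSpan {
    start: u32,
    len: u32,
}

/// One guess at a compressed layout: `u32` offset plus `u8` length, packed to 5 bytes.
#[allow(dead_code)]
#[repr(packed)]
struct PackedSpan {
    start: u32,
    len: u8,
}

/// Returns per-node bytes for (two Vec headers, two spans, two packed spans).
fn per_node_overhead() -> (usize, usize, usize) {
    (
        2 * size_of::<Vec<u32>>(),
        2 * size_of::<DecoratorSpan>(),
        2 * size_of::<PackedSpan>(),
    )
}
```

On a 64-bit target a Vec header is three pointer-sized words (24 bytes), so two of them cost 48 bytes per node before any heap allocation, compared with 16 bytes for two spans.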

yasonk · 2025-01-13T01:22:09Z

assembly/src/assembler/mast_forest_builder.rs

@@ -456,14 +456,18 @@ impl MastForestBuilder {
    }

    pub fn set_before_enter(&mut self, node_id: MastNodeId, decorator_ids: Vec<DecoratorId>) {
-        self.mast_forest[node_id].set_before_enter(decorator_ids);
+        // Does this need to have a Vec<DecoratorSpan>?
+        let span = DecoratorSpan::new_collection(decorator_ids).into_iter().next().unwrap();


Not sure having a collection of spans is needed, but the new_collection method creates a collection because it seems that it may be possible to have non-contiguous decorator spans. See this comment: https://github.com/0xPolygonMiden/miden-vm/pull/1621/files#r1912588483

yasonk · 2025-01-13T01:23:07Z

assembly/src/assembler/mast_forest_builder.rs


        let new_node_fingerprint = self.fingerprint_for_node(&self[node_id]);
        self.hash_by_node_id.insert(node_id, new_node_fingerprint);
    }

    pub fn set_after_exit(&mut self, node_id: MastNodeId, decorator_ids: Vec<DecoratorId>) {
-        self.mast_forest[node_id].set_after_exit(decorator_ids);
+        // Does this need to have a Vec<DecoratorSpan>?


Not sure having a collection of spans is needed, but the new_collection method creates a collection because it seems that it may be possible to have non-contiguous decorator spans. See this comment: https://github.com/0xPolygonMiden/miden-vm/pull/1621/files#r1912588483

yasonk · 2025-01-13T01:24:15Z

assembly/src/assembler/mod.rs

@@ -642,10 +642,13 @@ impl Assembler {
                    )?;

                    if let Some(decorator_ids) = block_builder.drain_decorators() {
+                        // This just picks off the first decorator span, but should there be more than one?


Again, not sure if there will ever be a case for more than one span here.

yasonk · 2025-01-13T01:34:08Z

core/src/mast/node/basic_block_node/mod.rs

-
-        self.decorators = new_decorators;
+    pub fn prepend_decorators(&mut self, decorator_ids: DecoratorSpan) {
+        // At which point do we want to create a decorator span? Do we convert the vector into a single span?


Here the calling function's logic depends on whether the spans are contiguous, but in this case the assumption is that they are, and therefore prepend_decorators accepts a single DecoratorSpan.

yasonk · 2025-01-13T02:46:09Z

core/src/operations/decorators/mod.rs

@@ -95,7 +95,7 @@ impl<'a> DecoratorIterator<'a> {
    /// Returns the next decorator but only if its position matches the specified position,
    /// otherwise, None is returned.
    #[inline(always)]
-    pub fn next_filtered(&mut self, pos: usize) -> Option<&DecoratorId> {
+    pub fn next_filtered(&mut self, pos: usize) -> Option<&DecoratorSpan> {


This is where I'm currently having difficulties. This should probably keep returning Option<&DecoratorId>, but then I need a method that iterates over a vector of DecoratorSpan objects and, within each DecoratorSpan, over the individual DecoratorId values.

I tried to modify DecoratorIterator similar to below, but couldn't figure out how to set types for inner, outer, and DecoratorSpan::iter():

    pub struct DecoratorIterator<'a> {
        outer: slice::Iter<'a, (usize, DecoratorSpan<'a>)>, // Iterator over DecoratorList
        inner: Option<slice::Iter<'a, DecoratorId>>, // Specific type for inner iterator
    }

    impl<'a> DecoratorIterator<'a> {
        pub fn new(decorators: &'a [(usize, DecoratorSpan<'a>)]) -> Self {
            Self {
                outer: decorators.iter(),
                inner: None,
            }
        }
    }

    impl<'a> Iterator for DecoratorIterator<'a> {
        type Item = DecoratorId;

        fn next(&mut self) -> Option<Self::Item> {
            loop {
                if let Some(inner_iter) = &mut self.inner {
                    if let Some(decorator_id) = inner_iter.next() {
                        return Some(*decorator_id);
                    }
                    // Inner iterator is exhausted
                    self.inner = None;
                }

                // Fetch the next DecoratorSpan from the outer iterator
                if let Some((_index, decorator_span)) = self.outer.next() {
                    self.inner = Some(decorator_span.iter());
                } else {
                    // Outer iterator is exhausted
                    return None;
                }
            }
        }
    }

The problem is that if I use concrete types, then DecoratorSpan.iter() has to return a concrete iterator type, which seems strange. But if I use a generic type, then anyone who uses DecoratorIterator has to pass in a generic parameter, which seems to complicate the code unnecessarily, and also seems strange.

Any suggestions would be appreciated.

I think out of these options, returning a concrete type from DecoratorSpan.iter() seems like a better one (and I don't see a significant problem with it). But also, this approach won't work any more if we go with option 2 from #1621 (comment).
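For reference, a minimal sketch of the concrete-type variant follows. It assumes DecoratorSpan<'a> simply borrows a contiguous slice of DecoratorId values, which the lifetime parameter in the draft suggests but the thread does not confirm. The ids field and the tuple-struct DecoratorId are hypothetical, and this shape would not carry over unchanged if spans became offsets into a shared decorator_ids vector.

```rust
use std::slice;

#[derive(Clone, Copy, Debug, PartialEq)]
struct DecoratorId(u32);

/// Assumed shape: a span borrowing a contiguous run of decorator IDs.
struct DecoratorSpan<'a> {
    ids: &'a [DecoratorId],
}

impl<'a> DecoratorSpan<'a> {
    /// Returns a concrete iterator type, so callers need no generic parameter.
    fn iter(&self) -> slice::Iter<'a, DecoratorId> {
        self.ids.iter()
    }
}

/// Flattens `(op_index, span)` pairs into individual decorator IDs.
struct DecoratorIterator<'a> {
    outer: slice::Iter<'a, (usize, DecoratorSpan<'a>)>,
    inner: Option<slice::Iter<'a, DecoratorId>>,
}

impl<'a> DecoratorIterator<'a> {
    fn new(decorators: &'a [(usize, DecoratorSpan<'a>)]) -> Self {
        Self { outer: decorators.iter(), inner: None }
    }
}

impl<'a> Iterator for DecoratorIterator<'a> {
    type Item = DecoratorId;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // Drain the current span first, if there is one.
            if let Some(id) = self.inner.as_mut().and_then(|it| it.next()) {
                return Some(*id);
            }
            // Inner iterator is missing or exhausted: advance to the next span,
            // or finish when the outer iterator has no more spans.
            let (_pos, span) = self.outer.next()?;
            self.inner = Some(span.iter());
        }
    }
}
```

Because both fields name std::slice::Iter directly, DecoratorIterator<'a> keeps its single lifetime parameter and existing call sites would not need a new generic argument.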

feat: introduce SpanDecorator

5824f4f

yasonk marked this pull request as draft January 12, 2025 23:57


bobbinth Jan 14, 2025

yasonk Jan 16, 2025

bobbinth Jan 18, 2025
