Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scroll StateRootProvider implementation #4

Closed
Tracked by #16
frisitano opened this issue Oct 1, 2024 · 5 comments
Closed
Tracked by #16

Scroll StateRootProvider implementation #4

frisitano opened this issue Oct 1, 2024 · 5 comments
Assignees
Milestone

Comments

@frisitano
Copy link
Collaborator

frisitano commented Oct 1, 2024

Describe the feature

Overview

Scroll uses a binary Markle Patricia trie (BMPT) to represent it's state. This is different to the standard MPT that ethereum and reth uses. As such we will need to introduce a new implementation of the DatabaseStateRoot to support scrolls state model. This will also involve modifications to associated types and hashing schemes.

Design

There are a number of data structures, components and algorithms that are relevant in the context of the state root calculation. We will document those which require modification to support the BMPT state root calculation.

Introduction of StateTypes

To enable configurability of types required for modified state root calculation we will follow a similar pattern to what is used for EngineTypes, as such we will introduce a trait for StateTypes as follows:

pub trait StateTypes {
    /// The state root type.
    type StateRoot<'a, TX: DbTx + 'a>: DatabaseStateRoot<'a, TX>;
    /// The storage root type.
    type StorageRoot<'a, TX: DbTx + 'a>: DatabaseStorageRoot<'a, TX>;
    /// The state proof type.
    type StateProof<'a, TX: DbTx + 'a>: DatabaseProof<'a, TX>;
    /// The state witness type.
    type StateWitness<'a, TX: DbTx + 'a>: DatabaseTrieWitness<'a, TX>;
    /// The key hasher type.
    type KeyHasher: reth_trie::KeyHasher;
}

The StateKeyHasher trait can be defined as follows:

/// Trait for hashing state keys.
pub trait StateKeyHasher: Clone + Sync + Send + Default {
    /// Hashes the given bytes into a 256-bit hash.
    fn hash_key<T: AsRef<[u8]>>(bytes: T) -> B256;
}

As example of how this would be implemented for ethereum would be:

pub struct EthereumStateTypes;

impl StateTypes for EthereumStateTypes {
    type StateRoot = StateRoot;
    ...
    type StorageRoot = StorageRoot;
    type KeyHasher = KeccakKeyHasher;
}

Where the StateRoot and StorageRoot correspond to the types linked and KeccakKeyHasher is a type that will use the default keccak256 function from alloy.

We can then integrate this with NodeTypes by introducing a NodeTypesWithState trait and implementing it following the same pattern used for Engine types as seen here. This pattern will allow for configurability of the state root calculation by specifying the appropriate StateTypes object when configuring the node.

HashedPostState and HashedStorage

The HashedPostState and HashedStorage store the state diff associated with a block (or a number of blocks in the case of pipelining). By default both of these data structures leverage keccak256 to hash keys associated with accounts and storage slots. I propose that we introduce a generic on these data structures for key hashing.

We will make the following methods on HashedPostState and HashedStorage generic over the StateKeyHasher as follows:

HashedPostState::from_bundle_state<'a, K: StateKeyHasher>(...) -> Self
HashedPostState::from_cache_state<'a, KH: StateKeyHasher>(...) -> Self
HashedStorage::from_plain_storage<'a, KH: StateKeyHasher>(...) -> Self

Providers

The StateRootProvider and the StorageRootProvider will be made generic over the StateTypes trait. Parent types will also be impacted and appropriate changes will need to be made.

ParallelStateRoot

The ParallelStateRoot type leverages the StorageRoot and has an implementation of the StateRoot. The storage roots are computed in parallel and then integrated into the state root calculation. I suggest we modify the DatabaseStateRoot interface to allow for pre-computed storage roots to be passed in. As such we remove the bespoke storage root implementation in the ParallelStateRoot object and it just becomes a driver for parallel storage root and state root computation.

BranchNodeCompact

We should be able to reuse the BranchNodeCompact data structure to store our merkle nodes in the database. As scroll uses a binary merkle trie structure only two children would be expected in a branch node.

Stack based BMPT root algorithm

We may want to implement a stack based bmpt root algorithm that follows a similar pattern to what reth natively does for the standard Patricia Merkle trie. More research is required to understand the implications here.

@frisitano frisitano added this to the Milestone 1 milestone Oct 1, 2024
@frisitano frisitano changed the title Scroll StateRootProvider Scroll StateRootProvider implementation Oct 1, 2024
@frisitano frisitano added this to Reth Oct 1, 2024
@frisitano frisitano moved this to Todo in Reth Oct 1, 2024
@Thegaram
Copy link

Thegaram commented Oct 3, 2024

As I recall you mentioned another alternative ("deep integration with mdbx") that we might want to do in the future. Could you add some more context about that here?

@frisitano frisitano mentioned this issue Oct 8, 2024
10 tasks
@clabby
Copy link

clabby commented Oct 9, 2024

This is really cool! Hope to see some of the abstractions that come out of this work land upstream for other extensions of reth to use.

Have you guys looked into using liburkel here? It could potentially be a nice alternative to MDBX when working with base2 radix tries.

@frisitano
Copy link
Collaborator Author

This is really cool! Hope to see some of the abstractions that come out of this work land upstream for other extensions of reth to use.

For sure, thats the plan! It would be great to get a review / feedback from potential consumers of the abstractions as development progresses. I can keep you updated if that's ok with you?

Have you guys looked into using liburkel here? It could potentially be a nice alternative to MDBX when working with base2 radix tries.

I've not looked into it but it certainly seems interesting! There's also some promising research being done with nomt (blog post), it would be good to explore potential integrations with reth and benchmark to compare performance.

@frisitano
Copy link
Collaborator Author

frisitano commented Oct 20, 2024

Stack-Based BMPT State Root Calculation

Our ambition is to provide an implementation of scrolls BMPT within reth. We intend to leverage as much pre-existing architecture from scrolls default MPT implementation as possible to minimize the scope of this change. Leveraging the pre-existing reth types will result in a less efficient state size however the simplicity and practicality of this approach outweigh the cons.

HashBuilder

The algorithm that is used to build the state root and associated TrieUpdates is located within the HashBuilder object in alloy-trie.

We need to implement an analogous object with an analogous interface that implements an algorithm that builds a BMPT as defined in the scroll zk_trie spec here. We can name this object BmptHashBuilder or something similar to represent that the object builds a binary Merkle Patricia trie.

StateRoot

/// `StateRoot` is used to compute the root node of a state trie.
#[derive(Debug)]
pub struct StateRoot<T, H> {
/// The factory for trie cursors.
pub trie_cursor_factory: T,
/// The factory for hashed cursors.
pub hashed_cursor_factory: H,
/// A set of prefix sets that have changed.
pub prefix_sets: TriePrefixSets,
/// Previous intermediate state.
previous_state: Option<IntermediateStateRootState>,
/// The number of updates after which the intermediate progress should be returned.
threshold: u64,
#[cfg(feature = "metrics")]
/// State root metrics.
metrics: StateRootMetrics,
}

We can leverage the pre-existing StateRoot object which is responsible for iterating over nodes and providing them to the HashBuilder for state root calculation. In the first instance, we should create a code copy of the objects but in the future, we can consider introducing abstractions by the introduction of a generic over StateCommitmentHashBuilder trait defined below on the upstream StateRoot object.

StorageRoot

/// `StorageRoot` is used to compute the root node of an account storage trie.
#[derive(Debug)]
pub struct StorageRoot<T, H> {
/// A reference to the database transaction.
pub trie_cursor_factory: T,
/// The factory for hashed cursors.
pub hashed_cursor_factory: H,
/// The hashed address of an account.
pub hashed_address: B256,
/// The set of storage slot prefixes that have changed.
pub prefix_set: PrefixSet,
/// Storage root metrics.
#[cfg(feature = "metrics")]
metrics: TrieRootMetrics,
}

We will take a similar approach as described above with the StateRoot and then in the future look to introduce the StateCommitmentHashBuilder abstraction.

StateCommitmentHashBuilder

The logic for constructing state commitments is encapsulated within a single object. Currently in reth it is in the HashBuilder referenced above. With the introduction of scroll's version of the BmptHashBuilder we should consider introducing a trait for this object such as the following:

pub trait StateCommitmentHashBuilder: Default {
    fn add_branch(&mut self, key: Nibbles, value: B256, children_are_in_trie: bool);
    fn add_leaf(&mut self, key: Nibbles, value: &[u8]);
    fn root(&self) -> B256;
    fn updates_len(&self) -> usize;
    fn split(self) -> (Self, StorageTrieUpdates);
}

This will allow us to make parent objects generic over this trait:

  • StateRoot
  • StorageRoot
  • TrieWitness
  • Proof

Opened as a separate issue: #24

@frisitano frisitano self-assigned this Oct 20, 2024
@frisitano
Copy link
Collaborator Author

closed by #36

@github-project-automation github-project-automation bot moved this from In Progress to Done in Reth Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

3 participants