DatabaseState abstraction #11786

Closed
wants to merge 13 commits into from

@frisitano commented on Oct 16, 2024

Overview

The objective of this pull request is to provide abstractions over the state types in reth, specifically:

  • StateRoot
  • StorageRoot
  • StateProof
  • StateWitness
  • KeyHasher

These types should be easily configurable when defining the NodeTypes and configuring a node. The motivation behind this abstraction is that at Scroll we use a BMPT, the spec of which can be found here. We would like to introduce these abstractions so that we can provide concrete types that work with a BMPT state representation for Scroll's reth.

Implementation

DatabaseState Trait

We introduce a DatabaseState trait whose associated types name the concrete types that implement the required state operations. The trait is defined as follows:

/// Database state trait.
pub trait DatabaseState: std::fmt::Debug + Send + Sync + Unpin + 'static {
    /// The state root type.
    type StateRoot<'a, TX: DbTx + 'a>: DatabaseStateRoot<'a, TX>;
    /// The storage root type.
    type StorageRoot<'a, TX: DbTx + 'a>: DatabaseStorageRoot<'a, TX>;
    /// The state proof type.
    type StateProof<'a, TX: DbTx + 'a>: DatabaseProof<'a, TX>;
    /// The state witness type.
    type StateWitness<'a, TX: DbTx + 'a>: DatabaseTrieWitness<'a, TX>;
    /// The key hasher type.
    type KeyHasher: reth_trie::KeyHasher;
}
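
To illustrate how provider code could stay generic over this trait, here is a minimal sketch reduced to the StateRoot associated type only. `DbTx`, `DatabaseStateRoot`, `TaggedRoot`, `MptLike`, `BinaryLike`, `MemTx` and `state_root` are simplified or hypothetical stand-ins written for this example, not the reth definitions (the real root computation is fallible and walks trie cursors).

```rust
use std::fmt::Debug;

/// Stand-in for `B256` so the sketch needs no external crates.
pub type B256 = [u8; 32];

/// Hypothetical, much-reduced stand-in for reth's `DbTx`.
pub trait DbTx {
    fn account_count(&self) -> u64;
}

/// Simplified stand-in for `DatabaseStateRoot<'a, TX>` (assumption: infallible here).
pub trait DatabaseStateRoot<'a, TX>: Sized {
    fn from_tx(tx: &'a TX) -> Self;
    fn root(self) -> B256;
}

/// Reduced to one associated type; the others would follow the same pattern.
pub trait DatabaseState: Debug + Send + Sync + Unpin + 'static {
    type StateRoot<'a, TX: DbTx + 'a>: DatabaseStateRoot<'a, TX>;
}

/// Toy root calculator: byte 0 is a scheme tag, the last 8 bytes hold the account count.
pub struct TaggedRoot<'a, TX, const TAG: u8> {
    tx: &'a TX,
}

impl<'a, TX: DbTx, const TAG: u8> DatabaseStateRoot<'a, TX> for TaggedRoot<'a, TX, TAG> {
    fn from_tx(tx: &'a TX) -> Self {
        Self { tx }
    }

    fn root(self) -> B256 {
        let mut out = [0u8; 32];
        out[0] = TAG;
        out[24..].copy_from_slice(&self.tx.account_count().to_be_bytes());
        out
    }
}

/// Two hypothetical commitment schemes that differ only in the types they name.
#[derive(Debug)]
pub struct MptLike;
impl DatabaseState for MptLike {
    type StateRoot<'a, TX: DbTx + 'a> = TaggedRoot<'a, TX, 1>;
}

#[derive(Debug)]
pub struct BinaryLike;
impl DatabaseState for BinaryLike {
    type StateRoot<'a, TX: DbTx + 'a> = TaggedRoot<'a, TX, 2>;
}

/// Generic caller: it names only `DS`, never a concrete root type.
pub fn state_root<'a, DS: DatabaseState, TX: DbTx + 'a>(tx: &'a TX) -> B256 {
    <DS::StateRoot<'a, TX> as DatabaseStateRoot<'a, TX>>::from_tx(tx).root()
}

/// In-memory transaction used only to exercise the sketch.
pub struct MemTx {
    pub accounts: u64,
}

impl DbTx for MemTx {
    fn account_count(&self) -> u64 {
        self.accounts
    }
}
```

Calling `state_root::<MptLike, MemTx>(&tx)` and `state_root::<BinaryLike, MemTx>(&tx)` on the same transaction runs two different root calculators without any change to the caller.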

Here DatabaseStateRoot, DatabaseStorageRoot, DatabaseProof and DatabaseTrieWitness are pre-existing traits in reth, and KeyHasher is a new trait defined as follows:

/// Trait for hashing keys in state.
pub trait KeyHasher: Default + Clone + Send + Sync + 'static {
    /// Hashes the given bytes into a 256-bit hash.
    fn hash_key<T: AsRef<[u8]>>(bytes: T) -> B256;
}
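
As a rough sketch of what implementing this trait could look like, the example below defines two toy hashers. `toy_digest`, `ToyHasherA`, `ToyHasherB` and `hashed_slot` are hypothetical names, `B256` is replaced by a plain byte array, and the digest is a non-cryptographic FNV-1a construction used only so the example runs without external crates; a real implementation would call the chain's actual hash function.

```rust
/// Stand-in for `B256` so the sketch needs no external crates.
pub type B256 = [u8; 32];

/// The trait from the PR description, reproduced for the sketch.
pub trait KeyHasher: Default + Clone + Send + Sync + 'static {
    /// Hashes the given bytes into a 256-bit hash.
    fn hash_key<T: AsRef<[u8]>>(bytes: T) -> B256;
}

/// Toy 256-bit digest made of four FNV-1a lanes. NOT cryptographic.
fn toy_digest(domain: u8, bytes: &[u8]) -> B256 {
    let mut out = [0u8; 32];
    for lane in 0..4u8 {
        let prefix = [domain, lane];
        let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
        for b in prefix.iter().chain(bytes.iter()) {
            h ^= u64::from(*b);
            h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
        }
        let start = usize::from(lane) * 8;
        out[start..start + 8].copy_from_slice(&h.to_be_bytes());
    }
    out
}

/// Hypothetical hasher standing in for the default scheme.
#[derive(Default, Clone, Debug)]
pub struct ToyHasherA;

impl KeyHasher for ToyHasherA {
    fn hash_key<T: AsRef<[u8]>>(bytes: T) -> B256 {
        toy_digest(0x0A, bytes.as_ref())
    }
}

/// Hypothetical hasher standing in for an alternative scheme.
#[derive(Default, Clone, Debug)]
pub struct ToyHasherB;

impl KeyHasher for ToyHasherB {
    fn hash_key<T: AsRef<[u8]>>(bytes: T) -> B256 {
        toy_digest(0x0B, bytes.as_ref())
    }
}

/// `hash_key` has no `self` parameter, so callers select a hasher purely by type.
pub fn hashed_slot<KH: KeyHasher>(slot: [u8; 32]) -> B256 {
    KH::hash_key(slot)
}
```

Because hash_key is an associated function, no hasher value needs to be stored or passed around; `hashed_slot::<ToyHasherA>(slot)` is enough to pick the scheme.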

NodeTypes Extension

We want the concrete DatabaseState type to be easily configurable by the user when configuring the node, so we introduce an associated type on NodeTypes to enable this:

pub trait NodeTypes: Send + Sync + Unpin + 'static {
    /// The node's primitive types, defining basic operations and structures.
    type Primitives: NodePrimitives;
    /// The type used for configuration of the EVM.
    type ChainSpec: EthChainSpec;
    /// The types that define the state based operations of the node.
    type State: DatabaseState;
}
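
A minimal sketch of how a node definition might select its state types through this associated type. The traits below are cut down for illustration (Primitives and ChainSpec are omitted, and KeyHasher takes a plain slice), and `XorFoldLeft`, `XorFoldRight`, `DefaultState`, `CustomState`, `DefaultNode`, `CustomNode` and `hashed_address` are hypothetical names.

```rust
pub type B256 = [u8; 32];

/// Simplified stand-in for the PR's `KeyHasher`.
pub trait KeyHasher: 'static {
    fn hash_key(bytes: &[u8]) -> B256;
}

/// Reduced stand-in for `DatabaseState`: only the key hasher is kept.
pub trait DatabaseState: Send + Sync + Unpin + 'static {
    type KeyHasher: KeyHasher;
}

/// Reduced stand-in for `NodeTypes`: only the new `State` associated type is kept.
pub trait NodeTypes: Send + Sync + Unpin + 'static {
    type State: DatabaseState;
}

/// Toy "hashers" that fold input bytes into the output from opposite ends.
pub struct XorFoldLeft;
impl KeyHasher for XorFoldLeft {
    fn hash_key(bytes: &[u8]) -> B256 {
        let mut out = [0u8; 32];
        for (i, b) in bytes.iter().enumerate() {
            out[i % 32] ^= *b;
        }
        out
    }
}

pub struct XorFoldRight;
impl KeyHasher for XorFoldRight {
    fn hash_key(bytes: &[u8]) -> B256 {
        let mut out = [0u8; 32];
        for (i, b) in bytes.iter().enumerate() {
            out[31 - (i % 32)] ^= *b;
        }
        out
    }
}

/// Two hypothetical state configurations.
pub struct DefaultState;
impl DatabaseState for DefaultState {
    type KeyHasher = XorFoldLeft;
}

pub struct CustomState;
impl DatabaseState for CustomState {
    type KeyHasher = XorFoldRight;
}

/// Two hypothetical node definitions; the state scheme is a one-line choice.
pub struct DefaultNode;
impl NodeTypes for DefaultNode {
    type State = DefaultState;
}

pub struct CustomNode;
impl NodeTypes for CustomNode {
    type State = CustomState;
}

/// Node-generic code reaches the hasher through the associated types.
pub fn hashed_address<N: NodeTypes>(address: &[u8; 20]) -> B256 {
    <<N::State as DatabaseState>::KeyHasher as KeyHasher>::hash_key(address)
}
```

The point of the design is that swapping `type State = ...` in the node definition changes every state operation downstream, while code written against `N: NodeTypes` is untouched.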

HashedPostState and PrefixSetLoader modifications

Both HashedPostState and PrefixSetLoader have methods that generate keys by hashing data. We make these methods generic over the KeyHasher defined above:

PrefixSetLoader::load<KH: KeyHasher>(self, range: RangeInclusive<BlockNumber>) -> Result<TriePrefixSets, DatabaseError>;
HashedPostState::from_bundle_state<'a, KH: KeyHasher>(state: impl IntoParallelIterator<Item = (&'a Address, &'a BundleAccount)>) -> Self;
HashedPostState::from_cache_state<'a, KH: KeyHasher>(state: impl IntoParallelIterator<Item = (&'a Address, &'a CacheAccount)>) -> Self;
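
The call-site effect of these signatures can be sketched with a toy constructor. Assumptions: `HashedPostState` here is a bare map from hashed address to balance, `from_accounts` and `PaddingHasher` are hypothetical, and a sequential `IntoIterator` replaces rayon's `IntoParallelIterator` so the example needs only the standard library.

```rust
use std::collections::BTreeMap;

pub type B256 = [u8; 32];
pub type Address = [u8; 20];

/// Simplified stand-in for the PR's `KeyHasher`.
pub trait KeyHasher {
    fn hash_key<T: AsRef<[u8]>>(bytes: T) -> B256;
}

/// Toy hasher: right-aligns the input in a 32-byte key (no real hashing).
pub struct PaddingHasher;

impl KeyHasher for PaddingHasher {
    fn hash_key<T: AsRef<[u8]>>(bytes: T) -> B256 {
        let b = bytes.as_ref();
        let n = b.len().min(32);
        let mut out = [0u8; 32];
        out[32 - n..].copy_from_slice(&b[..n]);
        out
    }
}

/// Toy post-state: hashed address -> balance.
#[derive(Debug, Default, PartialEq)]
pub struct HashedPostState {
    pub accounts: BTreeMap<B256, u64>,
}

impl HashedPostState {
    /// Same shape as `from_bundle_state`: the hasher is a type parameter, not an argument.
    pub fn from_accounts<'a, KH: KeyHasher>(
        state: impl IntoIterator<Item = (&'a Address, &'a u64)>,
    ) -> Self {
        let accounts = state
            .into_iter()
            .map(|(address, balance)| (KH::hash_key(address), *balance))
            .collect();
        Self { accounts }
    }
}
```

Callers pick the hasher with a turbofish, for example `HashedPostState::from_accounts::<PaddingHasher>(&plain_accounts)`, which is the main ergonomic cost of this approach.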

Providers

We modify the relevant provider objects so that they are generic over the DatabaseState, using phantom data:

/// A provider struct that fetches data from the database.
/// Wrapper around [`DbTx`] and [`DbTxMut`]. Example: [`HeaderProvider`] [`BlockHashReader`]
#[derive(Debug)]
pub struct DatabaseProvider<TX, Spec, DS> {
    /// Database transaction.
    tx: TX,
    /// Chain spec
    chain_spec: Arc<Spec>,
    /// Static File provider
    static_file_provider: StaticFileProvider,
    /// Pruning configuration
    prune_modes: PruneModes,
    /// The state types.
    _state_types: PhantomData<DS>,
}

/// State provider for the latest state.
#[derive(Debug)]
pub struct LatestStateProvider<TX: DbTx, DS> {
    /// database transaction
    db: TX,
    /// Static File provider
    static_file_provider: StaticFileProvider,
    /// The state types
    _state_types: PhantomData<DS>,
}

/// State provider over latest state that takes tx reference.
#[derive(Debug)]
pub struct LatestStateProviderRef<'b, TX: DbTx, DS: DatabaseState> {
    /// database transaction
    tx: &'b TX,
    /// Static File provider
    static_file_provider: StaticFileProvider,
    /// The state types.
    _state_types: PhantomData<DS>,
}

/// State provider for a given block number.
/// For more detailed description, see [`HistoricalStateProviderRef`].
#[derive(Debug)]
pub struct HistoricalStateProvider<TX: DbTx, DS: DatabaseState> {
    /// Database transaction
    tx: TX,
    /// State at the block number is the main indexer of the state.
    block_number: BlockNumber,
    /// Lowest blocks at which different parts of the state are available.
    lowest_available_blocks: LowestAvailableBlocks,
    /// Static File provider
    static_file_provider: StaticFileProvider,
    /// The database state types.
    _state_types: PhantomData<DS>,
}

pub struct HistoricalStateProviderRef<'b, TX: DbTx, DS: DatabaseState> {
    /// Transaction
    tx: &'b TX,
    /// Block number is main index for the history state of accounts and storages.
    block_number: BlockNumber,
    /// Lowest blocks at which different parts of the state are available.
    lowest_available_blocks: LowestAvailableBlocks,
    /// Static File provider
    static_file_provider: StaticFileProvider,
    /// The database state types.
    _state_types: PhantomData<DS>,
}
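
For readers less familiar with the PhantomData pattern used above, here is a small sketch of a provider that carries its state types as a zero-sized marker. `Provider`, `MerklePatricia`, `BinaryTrie` and the `NAME` constant are hypothetical and exist only for this illustration.

```rust
use std::marker::PhantomData;

/// Reduced stand-in for `DatabaseState`; `NAME` is a hypothetical addition for the demo.
pub trait DatabaseState {
    const NAME: &'static str;
}

pub struct MerklePatricia;
impl DatabaseState for MerklePatricia {
    const NAME: &'static str = "mpt";
}

pub struct BinaryTrie;
impl DatabaseState for BinaryTrie {
    const NAME: &'static str = "binary";
}

/// The provider stores no value of type `DS`; the marker only ties the type to the struct.
pub struct Provider<TX, DS> {
    tx: TX,
    _state_types: PhantomData<DS>,
}

impl<TX, DS: DatabaseState> Provider<TX, DS> {
    pub fn new(tx: TX) -> Self {
        Self { tx, _state_types: PhantomData }
    }

    /// Behaviour is selected by the type parameter, not by runtime data.
    pub fn commitment_name(&self) -> &'static str {
        DS::NAME
    }

    pub fn tx(&self) -> &TX {
        &self.tx
    }
}
```

PhantomData<DS> is zero-sized, so the extra parameter adds no memory to the provider. It does participate in auto traits and derives: the struct is Send or Sync only if DS is, and `#[derive(Debug)]` requires `DS: Debug`, which is consistent with the Debug + Send + Sync bounds on DatabaseState.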

HashedPostStateProvider

We want to encapsulate the DatabaseState and its associated types within the state providers so that these types do not leak out of the provider. To that end we introduce a HashedPostStateProvider trait, defined as follows:

/// Trait that provides the `HashedPostState` from various sources.
#[auto_impl::auto_impl(&, Box, Arc)]
pub trait HashedPostStateProvider {
    /// Returns the `HashedPostState` of the `BundleState`.
    fn hashed_post_state_from_bundle_state(&self, bundle_state: &BundleState) -> HashedPostState;

    /// Returns the `HashedPostState` for the given block number.
    fn hashed_post_state_from_reverts(
        &self,
        block_number: BlockNumber,
    ) -> ProviderResult<HashedPostState>;
}
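
A sketch of the encapsulation this trait is meant to give: the provider is generic over a hasher, but callers see only a non-generic trait. The types are toy stand-ins (`BundleState` is a plain address-to-balance map, only the first trait method is kept), and `StateProvider`, `PadLeft`, `PadRight` and `first_hashed_key` are hypothetical.

```rust
use std::collections::BTreeMap;
use std::marker::PhantomData;

pub type B256 = [u8; 32];
pub type Address = [u8; 20];

/// Toy stand-ins: address -> balance, and hashed address -> balance.
pub type BundleState = BTreeMap<Address, u64>;

#[derive(Debug, Default, PartialEq)]
pub struct HashedPostState(pub BTreeMap<B256, u64>);

/// Simplified stand-in for the PR's `KeyHasher`.
pub trait KeyHasher: 'static {
    fn hash_key(bytes: &[u8]) -> B256;
}

/// Toy hasher: right-aligns the input in the 32-byte key.
pub struct PadLeft;
impl KeyHasher for PadLeft {
    fn hash_key(bytes: &[u8]) -> B256 {
        let n = bytes.len().min(32);
        let mut out = [0u8; 32];
        out[32 - n..].copy_from_slice(&bytes[..n]);
        out
    }
}

/// Toy hasher: left-aligns the input in the 32-byte key.
pub struct PadRight;
impl KeyHasher for PadRight {
    fn hash_key(bytes: &[u8]) -> B256 {
        let n = bytes.len().min(32);
        let mut out = [0u8; 32];
        out[..n].copy_from_slice(&bytes[..n]);
        out
    }
}

/// Reduced version of the trait: no generic parameters appear in its signature.
pub trait HashedPostStateProvider {
    fn hashed_post_state_from_bundle_state(&self, bundle_state: &BundleState) -> HashedPostState;
}

/// The provider knows its hasher; nothing outside it has to.
pub struct StateProvider<KH> {
    _key_hasher: PhantomData<KH>,
}

impl<KH> StateProvider<KH> {
    pub fn new() -> Self {
        Self { _key_hasher: PhantomData }
    }
}

impl<KH: KeyHasher> HashedPostStateProvider for StateProvider<KH> {
    fn hashed_post_state_from_bundle_state(&self, bundle_state: &BundleState) -> HashedPostState {
        HashedPostState(
            bundle_state
                .iter()
                .map(|(address, balance)| (KH::hash_key(address), *balance))
                .collect(),
        )
    }
}

/// Caller with no generic parameters: the state types stay inside the provider.
pub fn first_hashed_key(
    provider: &dyn HashedPostStateProvider,
    bundle: &BundleState,
) -> Option<B256> {
    let hashed = provider.hashed_post_state_from_bundle_state(bundle);
    hashed.0.keys().next().copied()
}
```

Since the trait has no type parameters, differently configured providers can even sit behind the same `Box<dyn HashedPostStateProvider>`, which is presumably what keeps downstream code free of a DatabaseState parameter.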

Default implementation of DatabaseState

We introduce a default implementation of DatabaseState that uses the pre-existing state types as seen below:

impl DatabaseState for () {
    type StateRoot<'a, TX: DbTx + 'a> =
        StateRoot<DatabaseTrieCursorFactory<'a, TX>, DatabaseHashedCursorFactory<'a, TX>>;
    type StorageRoot<'a, TX: DbTx + 'a> =
        StorageRoot<DatabaseTrieCursorFactory<'a, TX>, DatabaseHashedCursorFactory<'a, TX>>;
    type StateProof<'a, TX: DbTx + 'a> =
        Proof<DatabaseTrieCursorFactory<'a, TX>, DatabaseHashedCursorFactory<'a, TX>>;
    type StateWitness<'a, TX: DbTx + 'a> =
        TrieWitness<DatabaseTrieCursorFactory<'a, TX>, DatabaseHashedCursorFactory<'a, TX>>;
    type KeyHasher = KeccakKeyHasher;
}

TODO

Introduce appropriate abstractions for parallel state root.

Member

@onbjerg left a comment

one general comment here is that DatabaseState is a pretty vague name, any way we can find something a bit more descriptive? e.g. something related to state commitment

@frisitano
Contributor Author

I agree. How about StateCommitmentTypes or StateCommitment?

Member

@Rjected left a comment

I like StateCommitment a lot as a name, another note is that this should probably be split up - I think a good strategy for doing this would be:

  • PRs for each trait
  • Then, PRs for the implementation of each trait by the existing state commitment types
    (The above two can be combined in the same PR if it's easy.)
  • Finally, the integration with NodeTypes, full integration of trait StateCommitment, etc.

Let me know if that makes sense as a general direction! Splitting it up will make it much easier to review and much more likely that these changes can actually get in

@frisitano
Contributor Author

Yes, this makes sense. I recognise that it is impractical to review and integrate a PR of this size. I will create an issue with a plan for decomposing this into bite-sized units of work, following your suggestion.

@Rjected added the C-enhancement (New feature or request) and A-trie (Related to Merkle Patricia Trie implementation) labels on Oct 30, 2024
@frisitano
Contributor Author

The intention of this PR was to serve as a proof of concept. The production implementation is tracked in #11830.

@frisitano closed this on Nov 27, 2024
Labels: A-sdk (Related to reth's use as a library), A-trie (Related to Merkle Patricia Trie implementation), C-enhancement (New feature or request)