
feat: purge ephemeral storage #1729

Open: ogenev wants to merge 1 commit into master from purge-ephemeral-storage

Conversation

@ogenev ogenev (Member) commented Mar 24, 2025

What was wrong?

This PR adds intelligent content purging to the EphemeralV1Store based on historical summary updates in the beacon chain. It ensures that ephemeral content (headers, bodies, receipts) older than the last historical summary event is automatically purged.

How was it fixed?

  • Content is purged based on the slot when the last historical summary was added
  • Background task runs every epoch (384 seconds) to check for new purge opportunities
  • Purge conditions are triggered when transitioning to epochs that are multiples of 256 (see the sketch after this list)
  • The implementation follows the same logic as the beacon state's historical summaries update mechanism
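
For context, the purge boundary can be derived directly from beacon-chain constants. Below is a minimal sketch of that condition, assuming the usual values (32 slots per epoch, 12-second slots, historical summaries appended every 256 epochs); the function names are illustrative and not the PR's actual code.

```rust
// Illustrative sketch only; names like `purge_cutoff_slot` are hypothetical.
const SLOTS_PER_EPOCH: u64 = 32;
const SECONDS_PER_SLOT: u64 = 12;
const EPOCHS_PER_HISTORICAL_SUMMARY: u64 = 256; // i.e. every 8192 slots

/// Seconds between background purge checks (one epoch, as in the PR description).
const PURGE_INTERVAL_SECS: u64 = SLOTS_PER_EPOCH * SECONDS_PER_SLOT; // 384

/// Slot at which the most recent historical summary was appended: the latest
/// multiple of 8192 slots at or below `current_slot`.
fn last_historical_summary_slot(current_slot: u64) -> u64 {
    let slots_per_summary = EPOCHS_PER_HISTORICAL_SUMMARY * SLOTS_PER_EPOCH;
    (current_slot / slots_per_summary) * slots_per_summary
}

/// Ephemeral content at slots strictly below this cutoff is eligible for purging.
fn purge_cutoff_slot(current_slot: u64) -> u64 {
    last_historical_summary_slot(current_slot)
}

fn main() {
    let current_slot = 11_000_000;
    println!(
        "check every {PURGE_INTERVAL_SECS}s; purge ephemeral content below slot {}",
        purge_cutoff_slot(current_slot)
    );
}
```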

Added tests for:

  • Initial purge during store initialization
  • Background task purging

To-Do

@ogenev ogenev force-pushed the purge-ephemeral-storage branch 3 times, most recently from b9d6979 to 9f11b37 Compare March 25, 2025 09:53
@ogenev ogenev force-pushed the purge-ephemeral-storage branch from 9f11b37 to 88845e8 Compare March 25, 2025 10:02
@ogenev ogenev marked this pull request as ready for review March 25, 2025 10:25
@ogenev ogenev requested review from morph-dev and KolbyML March 25, 2025 10:26
@morph-dev morph-dev (Collaborator) commented Mar 26, 2025

I will start by saying that maybe I don't have the high-level picture in mind, so my way of viewing things may be completely wrong. I'd be happy to have a call and discuss this. I'll provide feedback anyway so we can have a more productive conversation.


I'm not sure we want a background task that just goes and deletes ephemeral content, because we probably want to migrate bodies and receipts into permanent storage.

I also don't think that this should be driven by the system clock. Instead, we should do it when new information is available (e.g. new historical summaries or a new finalized block). We should probably use OverlayEvent for this (in some way).

With that in mind, I see this store as a simple structure that allows us to easily access/modify content in the db, but it shouldn't drive this logic. For example, we could expose the following function and call it from the outside:

```rust
/// Deletes and returns canonical content items from a slot range.
///
/// This function will also delete all content items that are associated with slots before
/// `start_slot` or that are within the range but are not canonical.
fn delete_content(
    &self,
    start_slot: u64,
    end_slot: u64,
    non_canonical_slots: &[u64],
) -> Result<Vec<(TContentKey, RawContentValue)>, ContentStoreError>;
```

This is just an example. Depending on what information we have available, this can/should be adapted to the use case (maybe even split into multiple functions). But if we don't have a clear picture of how it will be used, we can just go with this and modify it as we go, or we can skip this part until we have a better understanding of what is needed.
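
To make the idea of driving this from the outside concrete, here is a rough sketch of how a higher-level component (such as the subnet storage object discussed further down) might call a function like this and move the returned items into permanent storage. The trait names and stand-in types below are assumptions for illustration, not trin's actual API.

```rust
// Hypothetical stand-ins for trin's content key/value and error types.
type ContentKey = Vec<u8>;
type RawContentValue = Vec<u8>;

#[derive(Debug)]
struct ContentStoreError(String);

trait EphemeralStore {
    /// Deletes and returns canonical content items from a slot range,
    /// dropping anything before `start_slot` or non-canonical within the range.
    fn delete_content(
        &self,
        start_slot: u64,
        end_slot: u64,
        non_canonical_slots: &[u64],
    ) -> Result<Vec<(ContentKey, RawContentValue)>, ContentStoreError>;
}

trait PermanentStore {
    fn put(&self, key: ContentKey, value: RawContentValue) -> Result<(), ContentStoreError>;
}

/// Called by the higher-level storage layer when new finality information
/// arrives (rather than on a timer): drain the finalized range from the
/// ephemeral store and write it into the permanent store.
fn migrate_finalized<E: EphemeralStore, P: PermanentStore>(
    ephemeral: &E,
    permanent: &P,
    start_slot: u64,
    finalized_slot: u64,
    non_canonical_slots: &[u64],
) -> Result<usize, ContentStoreError> {
    let items = ephemeral.delete_content(start_slot, finalized_slot, non_canonical_slots)?;
    let migrated = items.len();
    for (key, value) in items {
        permanent.put(key, value)?;
    }
    Ok(migrated)
}
```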

@ogenev ogenev (Member, Author) commented Mar 26, 2025

> I'm not sure we want a background task that just goes and deletes ephemeral content, because we probably want to migrate bodies and receipts into permanent storage.

My idea is to migrate bodies and receipts in the same task (or in a different task), but I'm adding features incrementally, and this is just the start (this PR only implements purging).

I will think about the system clock vs. an event-driven approach, and we can discuss it (it seems like a good idea).

@morph-dev morph-dev (Collaborator) commented Mar 26, 2025

> My idea is to migrate bodies and receipts in the same task (or in a different task), but I'm adding features incrementally, and this is just the start (this PR only implements purging).

Then this store would have to somehow know about the permanent store as well, and I don't think it should.

In that case, the task should run at a higher level (most likely the subnet storage object), extract/delete data from the ephemeral store, and write (or drop) it into the permanent store.


Also, I think that in the long run we want headers to expire at a different point from bodies and receipts. More precisely, headers should be ephemeral for 8192 slots, while bodies and receipts should be ephemeral only until they are finalized (64-96 slots).

It's probably fine to handle both of these at the 8192-slot boundary for now, but it's something we should keep in mind for the long run.
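
Purely for illustration, the two retention windows mentioned above might look like the constants below; the names are hypothetical and the numbers come straight from the comment (8192 slots for headers, 64-96 slots of finality lag for bodies and receipts).

```rust
// Hypothetical sketch of per-content-type retention windows; not trin code.
enum EphemeralContent {
    Header,
    Body,
    Receipts,
}

/// Number of slots a content type would stay ephemeral in the long run.
/// Bodies and receipts follow finality, which normally lags the head by
/// 64-96 slots (two to three epochs); 96 is used here as the conservative bound.
fn retention_slots(content: &EphemeralContent) -> u64 {
    match content {
        EphemeralContent::Header => 8192,
        EphemeralContent::Body | EphemeralContent::Receipts => 96,
    }
}

fn main() {
    println!("header window:   {} slots", retention_slots(&EphemeralContent::Header));
    println!("body window:     {} slots", retention_slots(&EphemeralContent::Body));
    println!("receipts window: {} slots", retention_slots(&EphemeralContent::Receipts));
}
```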

@ogenev ogenev (Member, Author) commented Mar 26, 2025

> Then this store would have to somehow know about the permanent store as well, and I don't think it should.
>
> In that case, the task should run at a higher level (most likely the subnet storage object), extract/delete data from the ephemeral store, and write (or drop) it into the permanent store.

We can have something like a StoreManager which contains the indexed and ephemeral storage and can run tasks to purge and transfer data between stores. Then every subnetwork would hold a shared reference to this manager and could trigger commands. How do you feel about such an architecture?
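
A rough sketch of what such a StoreManager might look like is below; it is purely hypothetical (the names and locking strategy are assumptions, not proposed code) and simply shows one object owning both stores, shared across subnetworks, acting when a command or event triggers it.

```rust
use std::sync::{Arc, Mutex};

struct EphemeralStore; // stand-in for EphemeralV1Store
struct IndexedStore;   // stand-in for the permanent/indexed store

struct StoreManager {
    ephemeral: Mutex<EphemeralStore>,
    indexed: Mutex<IndexedStore>,
}

impl StoreManager {
    fn new() -> Arc<Self> {
        Arc::new(Self {
            ephemeral: Mutex::new(EphemeralStore),
            indexed: Mutex::new(IndexedStore),
        })
    }

    /// Triggered by a subnetwork (or an overlay event) when new finality
    /// information arrives: move finalized content from the ephemeral store
    /// into the indexed store, then drop anything past its retention window.
    fn purge_and_migrate(&self, _finalized_slot: u64) {
        let _ephemeral = self.ephemeral.lock().expect("lock poisoned");
        let _indexed = self.indexed.lock().expect("lock poisoned");
        // ... move finalized bodies/receipts, delete expired headers ...
    }
}

fn main() {
    // Each subnetwork would hold a clone of this Arc and trigger commands on it.
    let manager = StoreManager::new();
    let history_subnet_handle = Arc::clone(&manager);
    history_subnet_handle.purge_and_migrate(11_000_000);
}
```

Note that the coarse mutexes in this sketch are exactly the kind of cross-subnetwork blocking the next comment warns about; a real design would need finer-grained locking or message passing.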

@morph-dev morph-dev (Collaborator) commented

> We can have something like a StoreManager which contains the indexed and ephemeral storage and can run tasks to purge and transfer data between stores. Then every subnetwork would hold a shared reference to this manager and could trigger commands. How do you feel about such an architecture?

It could work. But we would have to be careful about how we do it; for example, if there is some mutex, other subnetworks shouldn't be blocked when not needed, or for potentially longer than necessary, etc.

I'm still not sure that a background task using the system clock is the best way to do this. How would it know which content is finalized/valid and which is not? There might be a way to make this work, but an event-driven approach makes more sense to me.

@ogenev ogenev (Member, Author) commented Mar 27, 2025

Added issue #1731 to discuss the storage architecture with the broader team.
