Remove executed_in_epoch table. #21477

Merged 4 commits into main on Mar 13, 2025

Conversation

mystenmark (Contributor):

The table is replaced with:

  • An in-memory "dirty set" which holds executed but un-checkpointed
    transaction digests. Transactions are removed from the dirty set by
    CheckpointExecutor.
  • An additional bounded cache intended to reduce the number of db reads
    by CheckpointBuilder.
  • Last-resort reads go to the `executed_transactions_to_checkpoint` table.

The only purpose of the table was to allow CheckpointBuilder to prune
transaction dependencies from prior epochs, and the above approach
suffices while removing a surprisingly expensive source of db writes.
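
To make the new read path concrete, here is a minimal Rust sketch of the three-tier lookup described above. All names here (`ExecutedInEpochIndex`, the `db_contains` callback) are hypothetical stand-ins rather than the actual sui-core types, and the real structures must also be concurrency-safe, which this sketch ignores:

```rust
use std::collections::{HashSet, VecDeque};

// Stand-in for the transaction digest type.
type TxDigest = [u8; 32];

/// Hypothetical illustration of the dirty set + bounded cache + db fallback.
struct ExecutedInEpochIndex {
    /// Executed but not yet checkpointed; entries are removed once
    /// CheckpointExecutor processes the containing checkpoint.
    dirty: HashSet<TxDigest>,
    /// Bounded cache of recently checkpointed digests, to spare
    /// CheckpointBuilder a db read in the common case.
    cache: HashSet<TxDigest>,
    /// FIFO eviction order for the bounded cache.
    cache_order: VecDeque<TxDigest>,
    cache_capacity: usize,
}

impl ExecutedInEpochIndex {
    fn new(cache_capacity: usize) -> Self {
        Self {
            dirty: HashSet::new(),
            cache: HashSet::new(),
            cache_order: VecDeque::new(),
            cache_capacity,
        }
    }

    /// Record a transaction as executed in this epoch (no db write needed).
    fn insert_executed(&mut self, digest: TxDigest) {
        self.dirty.insert(digest);
    }

    /// Move a digest from the dirty set into the bounded cache once its
    /// checkpoint is executed, evicting the oldest cached entry when full.
    fn mark_checkpointed(&mut self, digest: TxDigest) {
        if self.dirty.remove(&digest) {
            if self.cache_order.len() == self.cache_capacity {
                if let Some(old) = self.cache_order.pop_front() {
                    self.cache.remove(&old);
                }
            }
            self.cache.insert(digest);
            self.cache_order.push_back(digest);
        }
    }

    /// Lookup order: dirty set, then cache, then (last resort) the
    /// executed_transactions_to_checkpoint table, modeled here as a closure.
    fn is_executed_in_epoch(
        &self,
        digest: &TxDigest,
        db_contains: impl Fn(&TxDigest) -> bool,
    ) -> bool {
        self.dirty.contains(digest) || self.cache.contains(digest) || db_contains(digest)
    }
}

fn main() {
    let mut idx = ExecutedInEpochIndex::new(2);
    let d = [1u8; 32];
    idx.insert_executed(d);
    assert!(idx.is_executed_in_epoch(&d, |_| false)); // hit in dirty set
    idx.mark_checkpointed(d);
    assert!(idx.is_executed_in_epoch(&d, |_| false)); // hit in cache
}
```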

mystenmark requested review from aschran and andll March 12, 2025 21:49

}
batch.write()?;

if !matches!(tx_key, TransactionKey::Digest(_)) {
Contributor:

can this now be combined with above?

}

// TODO: should this be debug_fatal? It's potentially very serious in that it could
Contributor:

On the question of "do we want this to halt the network?", my initial/gut feeling is yes, so that further damage does not continue to accrue and make it even harder to recover while we are trying to debug what happened and verify that it will not happen again.

mystenmark (Author):

At the point at which this happens we are already most of the way through reconfig. The only damage will be that a transaction which achieved settlement finality has been dropped. Which is very bad for the sender, but it doesn't actually "damage" the chain in any other respect.

I'm comfortable leaving this as fatal because I believe the tests will catch such a bug before it makes it to prod. But if it did happen, I'm pretty sure the only thing we could do would be to push a fix which removes this `fatal!`. So that makes me wonder if there's any point.
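
For readers unfamiliar with the two macros being debated, here is a simplified stand-in that captures the trade-off, assuming the usual semantics (always panic vs. panic only in debug/test builds); this is not Sui's actual implementation:

```rust
// Always halts this node, regardless of build profile.
#[allow(unused_macros)]
macro_rules! fatal {
    ($($arg:tt)*) => { panic!($($arg)*) };
}

// Panics in debug/test builds so the tests catch the bug before prod;
// in release builds it logs the error and lets the node keep running.
macro_rules! debug_fatal {
    ($($arg:tt)*) => {
        if cfg!(debug_assertions) {
            panic!($($arg)*);
        } else {
            eprintln!("debug_fatal: {}", format_args!($($arg)*));
        }
    };
}

fn main() {
    // In a release build this line logs and continues; in a debug build
    // (or under test) it panics, halting the node.
    debug_fatal!("executed transaction missing from checkpoint at epoch end");
}
```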

self.executed_in_epoch_cache.insert(tx_digest, ());
}

// Called by CheckpointExecutor
Contributor:

nit: sort of? Proximately it's called by AuthorityPerEpochStore. I think this comment makes it a bit confusing: you expect to see a call to this fn in CheckpointExecutor, but it's not there.

mystenmark (Author):

Well, I guess I meant to indicate which component has a need for this method. I can rewrite the comment to be clearer.

andll (Contributor) commented Mar 13, 2025:

Looking at this PR from the perspective of "how can we test it better": should we try to reduce the cache size to, say, 1-2 elements for the simtest runs, to make sure we exercise the path with cache eviction? It feels like with 50k cache capacity it might never even hit this code path.

mystenmark (Author):

> Looking at this PR from the perspective of "how can we test it better": should we try to reduce the cache size to, say, 1-2 elements for the simtest runs, to make sure we exercise the path with cache eviction? It feels like with 50k cache capacity it might never even hit this code path.

Very good point.
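
A minimal sketch of what andll's suggestion might look like, assuming Sui's `msim` cfg flag for simtest builds; the function name and the exact capacities are hypothetical:

```rust
// Choose the cache capacity at construction time. Shrinking it under
// simtest builds forces constant eviction, so every simtest run exercises
// the fallback db-read path instead of always hitting the cache.
#[allow(unexpected_cfgs)] // `msim` is set via build flags in simtest runs
fn executed_in_epoch_cache_capacity() -> usize {
    if cfg!(msim) {
        2 // tiny capacity: eviction on nearly every insert
    } else {
        50_000 // production-sized cache
    }
}

fn main() {
    println!("cache capacity: {}", executed_in_epoch_cache_capacity());
}
```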

mystenmark force-pushed the mlogan-remove-executed-in-epoch branch from 270ae79 to eb5eaee on March 13, 2025 20:40
mystenmark enabled auto-merge (squash) March 13, 2025 20:41
mystenmark force-pushed the mlogan-remove-executed-in-epoch branch from eb5eaee to 7947247 on March 13, 2025 21:13
mystenmark merged commit efdbe99 into main Mar 13, 2025
46 of 47 checks passed
mystenmark deleted the mlogan-remove-executed-in-epoch branch March 13, 2025 21:40