Write latest finalized state in chunks #9026
base: master
Looks like a great idea! However:
From my understanding the ideal approach would be to make
@Nashatyrev
If you only care about the atomicity of a single state record, one option is to stream all chunks except the first one and write the first chunk at the very end. You would then either have the first chunk committed along with all the others, or have no first chunk at all, which would be treated as the state being absent.
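A minimal Java sketch of this "first chunk last" idea. It assumes a hypothetical store where each individual put or remove is atomic but there are no multi-key transactions, modeled here as a HashMap. `FirstChunkLastStore`, `writeBody` and `commit` are illustrative names, not Teku code. Chunk 0 carries the chunk count and acts as the commit marker. The old marker is removed before the new chunks are written, so an interrupted write reads back as "no state" rather than a mix of old and new chunks. As the reply below points out, this also means the previous state is lost.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical store: each put/remove is atomic on its own, but there are no
// multi-key transactions (a HashMap stands in for the database).
class FirstChunkLastStore {
  private final Map<String, byte[]> kv = new HashMap<>();

  private static String key(String id, int index) {
    return id + "/" + index;
  }

  void write(String id, byte[][] chunks) {
    writeBody(id, chunks);
    commit(id, chunks);
  }

  // Step 1: drop the old marker, then write every chunk except chunk 0.
  // A crash after this step leaves the state absent, not half old/half new.
  void writeBody(String id, byte[][] chunks) {
    kv.remove(key(id, 0));
    for (int i = 1; i < chunks.length; i++) {
      kv.put(key(id, i), chunks[i]);
    }
  }

  // Step 2: chunk 0, prefixed with the chunk count (assumed < 256), is the
  // single write that makes the whole state visible.
  void commit(String id, byte[][] chunks) {
    byte[] first = new byte[chunks[0].length + 1];
    first[0] = (byte) chunks.length;
    System.arraycopy(chunks[0], 0, first, 1, chunks[0].length);
    kv.put(key(id, 0), first);
  }

  // No chunk 0 means "no state", regardless of which other chunks exist.
  Optional<byte[][]> read(String id) {
    byte[] first = kv.get(key(id, 0));
    if (first == null) {
      return Optional.empty();
    }
    int count = first[0] & 0xFF;
    byte[][] chunks = new byte[count][];
    chunks[0] = Arrays.copyOfRange(first, 1, first.length);
    for (int i = 1; i < count; i++) {
      chunks[i] = kv.get(key(id, i));
    }
    return Optional.of(chunks);
  }
}
```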
Well, it is not only about atomicity when writing the state. We update that state within a bigger transaction that contains several updates: teku/storage/src/main/java/tech/pegasys/teku/storage/server/kvstore/KvStoreDatabase.java, line 1110 (at commit 2550254).
Could it be generalized into a statement like this: if there are
Review threads on storage/src/main/java/tech/pegasys/teku/storage/server/leveldb/CustomJniDBFactory.java: one fixed, two outdated and resolved.
This would work from the atomicity standpoint, but it won't be transactional. Once we start writing there won't be any rollback, i.e. if the client shuts down in the middle of the sequence, we lose the previous state too and the client won't be able to recover. One approach could be to keep two versions of a given variable (let's say A and B) plus an additional meta value tracking which one is "current" (updated only after all chunks have been written), and to "ping-pong" between A and B on each write. It's doable (maybe complicated?), but if we end up moving away from LevelDB and focusing on RocksDB again as the primary DB, it could be a waste of time.
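The A/B scheme can be sketched as follows. Again, a hypothetical HashMap stands in for a store that only guarantees atomic single-key puts, and `PingPongVariable` and its key format are made up for illustration. An interrupted write only touches the inactive slot, so readers keep seeing the previous state until the single meta-value put flips the pointer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical A/B ("ping-pong") variable on top of a store that only
// guarantees atomic single-key puts (a HashMap stands in for the database).
class PingPongVariable {
  private final Map<String, byte[]> kv = new HashMap<>();
  private final String id;

  PingPongVariable(String id) {
    this.id = id;
  }

  private String metaKey() {
    return id + "/current";
  }

  private String countKey(char slot) {
    return id + "/" + slot + "/count";
  }

  private String chunkKey(char slot, int index) {
    return id + "/" + slot + "/" + index;
  }

  // The meta value records which slot ('A' or 'B') readers should use.
  private Optional<Character> currentSlot() {
    byte[] meta = kv.get(metaKey());
    return meta == null ? Optional.empty() : Optional.of((char) meta[0]);
  }

  private char inactiveSlot() {
    return currentSlot().orElse('B') == 'A' ? 'B' : 'A';
  }

  void write(byte[][] chunks) {
    writeInactive(chunks);
    flip();
  }

  // Step 1: fill the inactive slot; readers still see the current one.
  void writeInactive(byte[][] chunks) {
    char target = inactiveSlot();
    for (int i = 0; i < chunks.length; i++) {
      kv.put(chunkKey(target, i), chunks[i]);
    }
    kv.put(countKey(target), new byte[] {(byte) chunks.length});
  }

  // Step 2: one put switches readers over to the freshly written slot.
  void flip() {
    kv.put(metaKey(), new byte[] {(byte) inactiveSlot()});
  }

  Optional<byte[][]> read() {
    Optional<Character> slot = currentSlot();
    if (!slot.isPresent()) {
      return Optional.empty();
    }
    char s = slot.get();
    int count = kv.get(countKey(s))[0] & 0xFF;
    byte[][] chunks = new byte[count][];
    for (int i = 0; i < count; i++) {
      chunks[i] = kv.get(chunkKey(s, i));
    }
    return Optional.of(chunks);
  }
}
```

Compared with the "first chunk last" approach, this keeps the old state readable during a write, at the cost of storing up to two full copies of the variable.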
Branch updated from f9100dd to 16e0b33.
Oh, right! I forgot it's the DB variable.
The other problem is that we currently assume the data in the DB (hot and finalized) is consistent, because we update it in one big transaction (updating tables and variables all together).
I would suggest the following approach for chunked variable transactions:
Looks pretty atomic and transactional to me 🤔 The only drawback would be unmanaged data left in the DB if the transaction (steps 1+2) is aborted in the middle. The same could happen with concurrent transactions. But since that should be rare, it could be neglected imo.
Yeah, I like that model, and it could be extended to values in columns too.
We could even think about writing those chunks directly as files on the FS and storing the full path to them as values (instead of the actual data). This would turn the DB into just an index. We could easily abstract this as … This would even let us offload blobs to the FS pretty transparently.
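A rough sketch of the "DB as an index" idea. It assumes a hypothetical `FileBackedStore` in which a HashMap stands in for the key-value database. Each value is first written to a new, uniquely named file, and only then is its path recorded, so the DB write that makes the new value visible stays tiny. Cleanup of files that are no longer referenced is not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical "DB as an index": values live in files, the DB stores paths.
// A HashMap stands in for the key-value database.
class FileBackedStore {
  private final Path dir;
  private final Map<String, String> index = new HashMap<>();

  FileBackedStore(Path dir) {
    this.dir = dir;
  }

  static FileBackedStore inTempDir() {
    try {
      return new FileBackedStore(Files.createTempDirectory("kv-chunks"));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  void put(String key, byte[] value) {
    try {
      // 1. Write the payload to a new, uniquely named file; nothing points to it yet.
      Path file = Files.createTempFile(dir, key + "-", ".ssz");
      Files.write(file, value);
      // 2. Only the (small) path goes into the DB, so this is the step that
      //    would run inside the regular DB transaction.
      index.put(key, file.toString());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  Optional<byte[]> get(String key) {
    String path = index.get(key);
    if (path == null) {
      return Optional.empty();
    }
    try {
      return Optional.of(Files.readAllBytes(Paths.get(path)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```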
- `SszByteArrayChunksWriter` (currently hardcoded to 8 MiB chunks)
- `KvStoreUnchunkedVariable`, which stores chunks similarly to what we do with columns:
  - `(id) -> byte`: stores a byte with the number of chunks (max 255 chunks)
  - `(id, chunkIndex) -> chunk bytes`
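A minimal sketch of the layout listed above, including the backward-compatibility rule from the PR description. `ChunkedVariableSketch` and its key format are hypothetical and are not the actual `SszByteArrayChunksWriter` or KvStore classes. A HashMap stands in for the store, and the chunk size is a constructor parameter (8 MiB in the PR) so the example stays small.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the chunked-variable layout; not Teku's real classes.
// A HashMap stands in for the key-value store.
class ChunkedVariableSketch {
  static final int DEFAULT_CHUNK_SIZE = 8 * 1024 * 1024; // 8 MiB, as in the PR
  static final int MAX_CHUNKS = 255; // the count must fit in one unsigned byte

  private final Map<String, byte[]> kv = new HashMap<>();
  private final int chunkSize;

  ChunkedVariableSketch(int chunkSize) {
    this.chunkSize = chunkSize;
  }

  private static String chunkKey(String id, int index) {
    return id + "#" + index; // stands in for the (id, chunkIndex) key
  }

  void put(String id, byte[] value) {
    int count = value.length == 0 ? 1 : (value.length - 1) / chunkSize + 1;
    if (count > MAX_CHUNKS) {
      throw new IllegalArgumentException("value needs " + count + " chunks, max is " + MAX_CHUNKS);
    }
    for (int i = 0; i < count; i++) {
      int from = i * chunkSize;
      int to = Math.min(value.length, from + chunkSize);
      kv.put(chunkKey(id, i), Arrays.copyOfRange(value, from, to)); // (id, chunkIndex) -> chunk bytes
    }
    kv.put(id, new byte[] {(byte) count}); // (id) -> number of chunks
  }

  // Simulates a value written by the previous, unchunked storage mode.
  void putLegacy(String id, byte[] value) {
    kv.put(id, value);
  }

  Optional<byte[]> get(String id) {
    byte[] base = kv.get(id);
    if (base == null) {
      return Optional.empty();
    }
    // Backward compatibility: the PR treats a base value bigger than one byte as a
    // single big chunk; this sketch also treats an empty value that way (assumption).
    if (base.length != 1) {
      return Optional.of(base);
    }
    int count = base[0] & 0xFF;
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int i = 0; i < count; i++) {
      byte[] chunk = kv.get(chunkKey(id, i));
      out.write(chunk, 0, chunk.length);
    }
    return Optional.of(out.toByteArray());
  }
}
```

One caveat of this read rule: a legacy value that happens to be exactly one byte long would be misread as a chunk count. For a serialized state that should not occur.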
There is automatic backward compatibility: if the base `(id) -> byte` entry returns something bigger than a single byte, it is treated as one big chunk (essentially the previous storage mode).

TODO:
Fixes #9018