RFC: Persistent Cache Storage #8646
LingyuCoder started this conversation in RFC
Feature Name: Design the storage module of persistent cache
Start Date: 2024/12/09
Summary
Persistent cache storage module design:
The storage module is not concerned with how the data is generated and consumed (the job of the occasion modules) or how it is serialized and deserialized (the job of the serializers). It focuses only on how the data is organized in, read from, and written to the file system.
Motivation
Cached data consists of KV pairs, and isolated sets of cached data are generated during the make, seal, and other phases. Therefore, the storage system needs the following capabilities:
Isolated storage areas (scope) for storing the caches generated/consumed at each phase of the build process.
Glossary
Storage: A storage instance mounted on the compiler, so in dev mode it persists across multiple compilations.
Scope: A storage area distinguished by name, used by the different build phases.
Pack: A storage block that maps to a file on the file system and contains multiple KV data items.
Meta: A metafile that manages the storage packs of a scope; it is used to read and verify the packs of that scope.
User Guide
For the upper-level persistent cache modules, a storage instance exposes the following interfaces:
load(): Asynchronously reads all data items under a scope.
set(): Adds or modifies the specified data item under a scope. The change is not written immediately; all changes are processed together at the last idle.
remove(): Removes the specified data item under a scope. The removal is not written immediately; it is processed together with the other changes at the last idle.
trigger_save(): Notifies the storage to write all data. The write runs on an independent thread, and its result is reported through a oneshot channel.
Note that how the caller handles this channel at idle depends on the mode:
compiler.build(): should wait for the result and add any storage error to the diagnostics.
compiler.watch(): does not need to wait. The data is updated in the scopes when saved, and the write tasks are executed in order through a queue. If a write fails, the error received from the channel should be printed to stderr.
Detailed design
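As a concrete starting point, here is a minimal Rust sketch of the storage interfaces described in the User Guide. It rests on several assumptions that do not come from the RFC. Scopes are plain in-memory HashMaps of byte keys and values. A hypothetical FakeFs map stands in for the pack and meta files. std::sync::mpsc stands in for the oneshot channel. wait_for_save is a hypothetical helper for the compiler.build() path. The sketch shows set() and remove() being buffered until trigger_save(), and the write thread holding the scopes lock so that load() waits instead of reading the scopes mid-write.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;

/// One scope: KV items that the serializers have already turned into bytes.
pub type Scope = HashMap<Vec<u8>, Vec<u8>>;

/// Hypothetical stand-in for the file system: scope name -> persisted items.
/// A real implementation would write pack files and a meta file instead.
pub type FakeFs = Arc<Mutex<HashMap<String, Vec<(Vec<u8>, Vec<u8>)>>>>;

enum Change {
    Set(Vec<u8>, Vec<u8>),
    Remove(Vec<u8>),
}

pub struct Storage {
    /// In-memory scopes; the write thread holds this lock while it writes.
    scopes: Arc<Mutex<HashMap<String, Scope>>>,
    /// Changes recorded by set()/remove(); nothing is applied until trigger_save().
    pending: HashMap<String, Vec<Change>>,
    fs: FakeFs,
}

impl Storage {
    pub fn new(fs: FakeFs) -> Self {
        Storage { scopes: Arc::new(Mutex::new(HashMap::new())), pending: HashMap::new(), fs }
    }

    /// Returns a copy of every item in `scope`; blocks while a write holds the lock.
    pub fn load(&self, scope: &str) -> Scope {
        self.scopes.lock().unwrap().get(scope).cloned().unwrap_or_default()
    }

    pub fn set(&mut self, scope: &str, key: Vec<u8>, value: Vec<u8>) {
        self.pending.entry(scope.to_string()).or_default().push(Change::Set(key, value));
    }

    pub fn remove(&mut self, scope: &str, key: Vec<u8>) {
        self.pending.entry(scope.to_string()).or_default().push(Change::Remove(key));
    }

    /// Applies the pending changes, then writes all scopes on a separate thread.
    /// The returned Receiver plays the role of the oneshot channel.
    pub fn trigger_save(&mut self) -> Receiver<Result<(), String>> {
        {
            let mut scopes = self.scopes.lock().unwrap();
            for (name, changes) in self.pending.drain() {
                let scope = scopes.entry(name).or_default();
                for change in changes {
                    match change {
                        Change::Set(key, value) => {
                            scope.insert(key, value);
                        }
                        Change::Remove(key) => {
                            scope.remove(&key);
                        }
                    }
                }
            }
        }
        let (tx, rx) = mpsc::channel();
        let scopes = Arc::clone(&self.scopes);
        let fs = Arc::clone(&self.fs);
        thread::spawn(move || {
            // Holding the scopes lock for the whole write keeps load() from
            // observing the scopes while the write task is using them.
            let scopes = scopes.lock().unwrap();
            let mut fs = fs.lock().unwrap();
            for (name, scope) in scopes.iter() {
                let mut items: Vec<_> = scope.iter().map(|(k, v)| (k.clone(), v.clone())).collect();
                items.sort();
                fs.insert(name.clone(), items);
            }
            // A real write could fail here; this sketch always reports success.
            let _ = tx.send(Ok(()));
        });
        rx
    }
}

/// compiler.build(): wait for the write and record any failure as a diagnostic.
pub fn wait_for_save(rx: Receiver<Result<(), String>>, diagnostics: &mut Vec<String>) {
    match rx.recv() {
        Ok(Ok(())) => {}
        Ok(Err(err)) => diagnostics.push(err),
        Err(_) => diagnostics.push("storage write task ended without a result".to_string()),
    }
}
```

For compiler.build(), the caller blocks in wait_for_save and surfaces errors as diagnostics. For compiler.watch(), the caller would not block. Per the User Guide, writes would instead run in order through a queue, with failures printed to stderr. This sketch does not model that path.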
Modules
The storage module includes several parts:
Interaction
The APIs are handled as follows:
get_all(): After idle has been called, the latest data can be obtained through load(). However, because the asynchronous write task takes ownership of the scopes, the scopes are locked during the write so that load() does not read empty values in the meantime.
Strategy
When an Rspack project builds, the vast majority of modules usually come from node_modules. These modules rarely change, and only a small number of user modules are modified frequently; this is especially evident in dev mode. Therefore, we consider using buckets to write incrementally within a limited range:
Validation
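The Glossary states only that Meta is used to read and verify the packs of a scope. The following is one possible shape for that check. It assumes the meta stores a path and a content hash for each pack. FNV-1a is used purely for illustration, and PackMeta, ScopeMeta, ValidationError, and validate are hypothetical names.

```rust
use std::collections::HashMap;

/// 64-bit FNV-1a: a small, deterministic content hash, used here only for illustration.
pub fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Hypothetical meta entry: a pack file of the scope and the hash it had when written.
pub struct PackMeta {
    pub path: String,
    pub hash: u64,
}

/// Hypothetical meta file content for one scope.
pub struct ScopeMeta {
    pub packs: Vec<PackMeta>,
}

#[derive(Debug, PartialEq)]
pub enum ValidationError {
    MissingPack(String),
    HashMismatch(String),
}

/// Checks every pack listed in the meta against the bytes actually read from disk.
pub fn validate(meta: &ScopeMeta, files: &HashMap<String, Vec<u8>>) -> Result<(), ValidationError> {
    for pack in &meta.packs {
        let content = files
            .get(&pack.path)
            .ok_or_else(|| ValidationError::MissingPack(pack.path.clone()))?;
        if fnv1a(content) != pack.hash {
            return Err(ValidationError::HashMismatch(pack.path.clone()));
        }
    }
    Ok(())
}
```

Under this assumption, any Err from validate would make load() fall back to an empty scope, as described under Exception handling.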
Updating
Set 80% of a pack's maximum size as the stable size limit:
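The bucket strategy and the stable size limit might combine roughly as in the sketch below. It relies on assumptions not stated in the RFC. Keys are assigned to buckets by a hypothetical bucket_of hash, and only buckets that received changes are processed. Pack size is measured in bytes of key and value data against a hypothetical MAX_PACK_SIZE. A pack at or above the stable limit that contains no changed key is kept untouched, while all other items in the bucket are merged with the changes and repacked.

```rust
use std::collections::BTreeMap;

/// Hypothetical maximum pack size, in bytes of key + value data.
pub const MAX_PACK_SIZE: usize = 100;
/// Stable size limit: 80% of the maximum pack size.
pub const STABLE_SIZE: usize = MAX_PACK_SIZE * 80 / 100;

#[derive(Clone, Debug, PartialEq)]
pub struct Pack {
    pub items: BTreeMap<String, String>,
}

impl Pack {
    pub fn size(&self) -> usize {
        self.items.iter().map(|(k, v)| k.len() + v.len()).sum()
    }
}

/// Hypothetical bucket assignment: sum of key bytes modulo the bucket count.
pub fn bucket_of(key: &str, buckets: usize) -> usize {
    key.bytes().map(usize::from).sum::<usize>() % buckets
}

/// Updates the packs of one dirty bucket.
/// `changes` maps a key to Some(value) for set() or None for remove().
/// Returns the new pack list and how many packs had to be (re)written.
pub fn update_bucket(
    packs: Vec<Pack>,
    changes: &BTreeMap<String, Option<String>>,
) -> (Vec<Pack>, usize) {
    let mut result = Vec::new();
    let mut loose = BTreeMap::new();
    for pack in packs {
        let touched = pack.items.keys().any(|key| changes.contains_key(key));
        if !touched && pack.size() >= STABLE_SIZE {
            // Stable and unchanged: keep the existing pack file as-is.
            result.push(pack);
        } else {
            // Changed or not yet full: its items are merged and rewritten.
            loose.extend(pack.items);
        }
    }
    for (key, change) in changes {
        match change {
            Some(value) => {
                loose.insert(key.clone(), value.clone());
            }
            None => {
                loose.remove(key);
            }
        }
    }
    // Greedily refill packs up to MAX_PACK_SIZE.
    let mut written = Vec::new();
    let mut current = Pack { items: BTreeMap::new() };
    let mut current_size = 0;
    for (key, value) in loose {
        let item_size = key.len() + value.len();
        if !current.items.is_empty() && current_size + item_size > MAX_PACK_SIZE {
            written.push(std::mem::replace(&mut current, Pack { items: BTreeMap::new() }));
            current_size = 0;
        }
        current_size += item_size;
        current.items.insert(key, value);
    }
    if !current.items.is_empty() {
        written.push(current);
    }
    let rewritten = written.len();
    result.extend(written);
    (result, rewritten)
}
```

In this reading, the threshold lets nearly full packs stay on disk unchanged. A small edit then rewrites only the changed or partially filled packs of one bucket, instead of every pack in the scope.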
Exception handling
Any error that occurs during caching should not block the main process; in the worst case, it should only make the cache unavailable. Therefore, errors generated during this process should be treated as warnings.
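The sketch below shows this policy for load(). It assumes a hypothetical StorageError type and a read callback that performs the actual file reading and validation. A failure is downgraded to a warning, and the scope starts out empty, matching the fallback described for load() below.

```rust
use std::collections::HashMap;
use std::fmt;

/// Items of one scope (already deserialized for this sketch).
pub type Scope = HashMap<String, String>;

/// Hypothetical error type for reading and validating a scope.
#[derive(Debug)]
pub enum StorageError {
    Io(String),
    Validation(String),
}

impl fmt::Display for StorageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StorageError::Io(msg) => write!(f, "I/O error: {msg}"),
            StorageError::Validation(msg) => write!(f, "validation failed: {msg}"),
        }
    }
}

/// Loads a scope with `read`; any failure becomes a warning plus an empty scope,
/// so a broken cache degrades to a cold build instead of failing it.
pub fn load_scope<F>(name: &str, read: F, warnings: &mut Vec<String>) -> Scope
where
    F: FnOnce(&str) -> Result<Scope, StorageError>,
{
    match read(name) {
        Ok(scope) => scope,
        Err(err) => {
            warnings.push(format!("cache scope `{name}` is unavailable: {err}"));
            Scope::new()
        }
    }
}
```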
load(): If validation fails or the read fails, it falls back to treating the cache as unavailable, leaving an empty scope in memory.
Unresolved Questions & TODOs
Arc is still needed. Perhaps there is a better way.