-
Notifications
You must be signed in to change notification settings - Fork 276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GC mechanism based on full scan #155
Comments
My simple flow:
And the clients will access the |
Your flow is almost right. The current approach under consideration is to scan, mark, and periodically collect. The marking occurs either when the object or chunk is scanned by the GC operation or when it is used. This process needs further refinement. For example, the current mechanism for updating the "last-access-time" when a chunk is used is missing, and this should be taken into account as well. The GC (Garbage Collection) work should ideally be executed quietly during the idle time of the cyfs-stack, making it "unnoticeable" to external users. |
I see, it's great, I have used it to clear the timeout cache( I learned it theoretically today. |
I think some pseudocode is needed to show and help application developers understand this process. And I believe, such as this process is related to the implementation of RootState. It should be because before there is no concept of Root State, we have no way to accurately distinguish between Object saving and caching. Since it is strongly related to the operation of Root State, there are two types of GC TYPE I : post-scan based The post-scan is also similar to the current Root State backup, starting from the root path, traversing all paths and marking the corresponding objects as "valid". Objects not marked as valid can be deleted after the scan is complete. During the scan, the system is read-only. TYPE II: based on real-time operation We have APIs for adding objects and deleting objects to the Root State, and in these two APIs, the reference counts of objects are operated (can be asynchronous). When the system has no unscanned operations to add objects, it can execute the GC function to delete objects whose reference count is less than 1. Perhaps we can further illustrate the difference between these two directions through pseudocode. The system itself has strong and consistent synchronization and backup requirements, and it already has the implementation of TYPE I. The implementation of TYPE II is related to the transactional nature of the Root State API. Optimization can achieve "only scan the changed part". It seems that it can reduce the time of system frozen, but there may be some insurmountable difficulties? |
I learn a word does it mean as the |
roots in the GC mechanism is a very key role, you can refer to the following introduction document
In our GC system, roots include all global-state roots and a list of free objects (which may be outside of the global-state) |
The impact of rmeta and object metaAnother key concept in GC is rmeta and object meta, which will directly affect the result of GC.
So whether it is a full scan or an incremental scan based on global-state change, it can be done based on the above meta information, which is where the complexity of GC lies GC mechanism based on generationDuring GC full scan, it is not necessary to set the whole global-state to read-only mode: on the one hand, the full scan may take a long time, and setting it to read-only will have too much impact on the whole system; on the other hand, it is not necessary, because the full scan is based on the snapshot of global-state, and if changes occur during the scan, it should not affect the result of the GC. However, there is a point to note is that based on the result of one GC scan, all unrelated objects can not be cleared, but should be based on the garbage tagging and recycling of generations, there are several principles need to be observed as follows:
GC scan based on global-state real-time changesFrom the reference-based GC principle mentioned above, the object level should not be based on reference count, but on whether the generation marker and last access time are scanned; incremental GC scan based on global-state real-time changes can be a supplement to the above full-scan GC, but only the new sub tree needs to be scanned, and there is no need to scan the removed sub tree and decrement the reference count. |
Consider adding a GC mechanism, which has been planned for a long time and is a relatively complicated mechanism, so first complete the first version based on a full scan
GC mainly includes two types of data:
Involves named-object-cache related modules
Involves named-data-cache/tracker/chunk-manager related modules
The relevant points are as follows:
If a path is marked as virtual, it will not enter the next level of scanning, so during the scanning process, it is necessary to apply rmeta to each level of path to determine whether the path is virtual
The default non-objectmap leaf node object scans at most one-level reference objects, but this default configuration can be modified through object meta, so during the scanning process, object meta also needs to be applied to non-objectmap objects to determine the references that need to be scanned object depth
The text was updated successfully, but these errors were encountered: