AEP: Caching with remote data #35
Thanks @unkcpz for describing the problems and some solutions. I think this AEP would benefit from some slight restructuring, just to make it a bit more clear what the current problems are and what exactly you are addressing. I also think we should add the problem of cache invalidation when a `RemoteData` is cleaned. I propose you adopt a structure like the following:
Problems
- Shallow copy on `RemoteData.clone()`
- Invalidate cache after `RemoteData._clean()`
- Hash calculation of `RemoteData`
- Prospective workchain caching
Now describe each in more detail in the following subheaders.
1. Shallow copy
The contents of the `RemoteData` should really be cloned on the remote as well, not just the reference in AiiDA's database.
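To make the problem concrete, here is a rough sketch of what such a deep clone might look like. This is illustrative only, not a proposed implementation: the helper name `clone_with_remote_copy`, the explicit destination path, and the direct use of `get_authinfo().get_transport()` are all assumptions, and opening the transport directly like this is exactly the naive approach discussed further down.

```python
from aiida.orm import RemoteData


def clone_with_remote_copy(source: RemoteData, destination_path: str) -> RemoteData:
    """Clone a RemoteData node and also copy its folder contents on the remote."""
    clone = source.clone()  # clones only the database node, not the remote folder

    # Copy the actual contents on the remote machine. Opening the transport
    # directly is the naive approach; inside a daemon worker this should
    # instead go through the transport queue (see below).
    with source.get_authinfo().get_transport() as transport:
        transport.copy(source.get_remote_path(), destination_path)

    clone.set_remote_path(destination_path)
    return clone.store()
```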
2. Invalidate cache after clean
When `RemoteData._clean()` is called, an attribute or extra should be set which will cause it to no longer be considered as a valid cache source.
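A rough sketch of this idea could look as follows. This is not the actual aiida-core implementation: the subclass, the extra key `cleaned`, and the assumption that the caching machinery consults the `is_valid_cache` property (as in the aiida-core 1.x API) are all illustrative.

```python
from aiida.orm import RemoteData


class InvalidatingRemoteData(RemoteData):
    """RemoteData that stops being a cache source once its folder is cleaned."""

    def _clean(self, *args, **kwargs):
        super()._clean(*args, **kwargs)
        self.set_extra('cleaned', True)  # record that the remote folder is gone

    @property
    def is_valid_cache(self):
        # A cleaned node must no longer be used as a cache source.
        return super().is_valid_cache and not self.get_extra('cleaned', False)
```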
3. Hash calculation of RemoteData
The hash of a `RemoteData` is computed based on the absolute filepath on the remote file system. This means that two nodes that have identical contents, but have different base folders, will have different hashes. Ideally, the hash should be calculated based on the hash of all contents, independent of the location on the remote.
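A content-based hash could in principle be computed by walking the remote folder over the transport, for example along these lines. This is a rough sketch only: `hash_remote_folder` is a hypothetical helper, and a real implementation would have to worry about large files, symlinks, and transport efficiency.

```python
import hashlib
import os
import tempfile


def hash_remote_folder(transport, path: str) -> str:
    """Hash the contents of a remote folder, independent of its absolute location."""
    digest = hashlib.sha256()
    for name in sorted(transport.listdir(path)):
        full = os.path.join(path, name)
        digest.update(name.encode())  # relative name only, never the absolute path
        if transport.isdir(full):
            digest.update(hash_remote_folder(transport, full).encode())
        else:
            # Fetch the file to a temporary location and hash its bytes.
            with tempfile.TemporaryDirectory() as tmpdir:
                localpath = os.path.join(tmpdir, name)
                transport.getfile(full, localpath)
                with open(localpath, 'rb') as handle:
                    digest.update(handle.read())
    return digest.hexdigest()
```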
4. Prospective workchain caching
If an entire workchain has already been run once, in principle it is not necessary to run its individual calcjobs again and we can simply cache everything. Currently, the engine will still execute each step and consider individually for each calcjob whether it should be cached. If one of the remote folders of the cached jobs has been cleaned in the meantime, the workchain will fail, even though all results are essentially known. One might think that we could just add a check: if a workchain with the same inputs exists, we simply clone the entire matched workchain, including everything it ran, without running anything. This assumes that a workchain's execution shows no variability beyond what is determined by its inputs, but I am not 100% sure this is guaranteed.
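As a rough illustration of that check (hedged: AiiDA does not currently store a cache hash for workchain nodes the way it does for calcjobs, so the `_aiida_hash` extra on workchains and the `find_equivalent_workchain` helper below are assumptions for the sake of the example):

```python
from aiida.orm import QueryBuilder, WorkChainNode


def find_equivalent_workchain(node_hash: str):
    """Return a previously finished workchain with the same hash, if one exists."""
    builder = QueryBuilder()
    builder.append(
        WorkChainNode,
        filters={
            'extras._aiida_hash': node_hash,          # assumes workchains get hashed
            'attributes.process_state': 'finished',   # only reuse finished runs
        },
    )
    result = builder.first()
    return result[0] if result else None
```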
Proposed solution
Describe which of the problems you address and how.
I think you should definitely spend some time thinking about how you would address problem 1. It might seem easy, but the cloning will often be performed by a daemon worker, so the opening of the transport should go through the transport queue. However, the call for the clone comes from somewhere inside `Node.store()` and it is not evident how to get access to the transport queue in a "nice" non-hacky way. I think this might be more tricky to do properly than it seems on the surface.
Hi @sphuber, I have updated the description of the AEP, but I am not sure about how to implement it. I think we need to set up a discussion.