
AEP: Caching with remote data #35

Draft
wants to merge 4 commits into base: master
Conversation

unkcpz
Member

@unkcpz unkcpz commented Jun 29, 2022

  • Used AEP template from AEP 0
  • Status is draft
  • Added type & status labels to PR
  • Added AEP to README.md
  • Provided github handles for authors

@unkcpz unkcpz changed the title Caching with remote data AEP: Caching with remote data Jun 29, 2022
@unkcpz unkcpz marked this pull request as draft June 29, 2022 09:59
Contributor

@sphuber sphuber left a comment


Thanks @unkcpz for describing the problems and some solutions. I think this AEP would benefit from some slight restructuring, just to make it a bit clearer what the current problems are and what exactly you are addressing. I also think we should add the problem of cache invalidation when a RemoteData is cleaned. I propose you adopt a structure like the following:

Problems

  1. Shallow copy on RemoteData.clone()
  2. Invalidate cache after RemoteData.clean()
  3. Hash calculation of RemoteData
  4. Prospective workchain caching

Then describe each problem in more detail under the following subheaders.

1. Shallow copy

Contents of the RemoteData should really be cloned on the remote as well, not just the reference in AiiDA's database.
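A minimal sketch of what a deep clone could look like, assuming a transport can be opened synchronously (the helper name and the target_path argument are hypothetical; see the transport-queue caveat at the end of this comment):

```python
from aiida import orm


def clone_with_remote_copy(node: orm.RemoteData, target_path: str) -> orm.RemoteData:
    """Clone a RemoteData together with its contents on the remote.

    Hypothetical helper: RemoteData.clone() today only copies the node,
    i.e. the path reference, not the remote folder itself.
    """
    clone = node.clone()  # unstored copy of the node
    authinfo = node.get_authinfo()
    with authinfo.get_transport() as transport:
        # Recursively copy the remote folder so the clone owns its own copy.
        transport.copytree(node.get_remote_path(), target_path)
    clone.set_remote_path(target_path)
    return clone
```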

2. Invalidate cache after clean

When RemoteData._clean() is called, an attribute or extra should be set that causes the node to no longer be considered a valid cache source.
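A rough sketch of that marker, assuming the aiida-core 2.x extras API and an illustrative extra key 'cleaned'; the caching query would then additionally have to filter on this extra:

```python
from aiida import orm


class CleanAwareRemoteData(orm.RemoteData):
    """Illustrative subclass; in practice the change would live in RemoteData itself."""

    def _clean(self, transport=None):
        super()._clean(transport=transport)
        # Flag the node so the caching machinery can exclude it as a cache source.
        self.base.extras.set('cleaned', True)
```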

3. Hash calculation of RemoteData

The hash of a RemoteData is computed based on the absolute filepath on the remote file system. This means that two nodes with identical contents but different base folders will have different hashes. Ideally, the hash should be calculated from the contents themselves, independent of their location on the remote.
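As a thought experiment, a content-based hash could look roughly like the following; the function is hypothetical, assumes a flat remote folder for brevity, and ignores how such a hash would be wired into aiida-core's actual hashing machinery:

```python
import hashlib
import tempfile

from aiida import orm


def content_based_hash(node: orm.RemoteData) -> str:
    """Hash relative file names plus contents, ignoring the absolute base path."""
    hasher = hashlib.sha256()
    base = node.get_remote_path()
    authinfo = node.get_authinfo()
    with authinfo.get_transport() as transport, tempfile.TemporaryDirectory() as tmp:
        # Sorted for determinism; a real implementation would recurse into
        # subdirectories and stream contents instead of downloading files.
        for name in sorted(transport.listdir(base)):
            local = f'{tmp}/{name}'
            transport.getfile(f'{base}/{name}', local)
            hasher.update(name.encode())
            with open(local, 'rb') as handle:
                hasher.update(handle.read())
    return hasher.hexdigest()
```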

4. Prospective workchain caching

If an entire workchain has already been run once, in principle it is not necessary to run its individual calcjobs again and we can simply cache everything. Currently, however, the engine will still execute each step and consider for each calcjob individually whether it can be taken from the cache. If one of the remote folders of the cached jobs has been cleaned in the meantime, the workchain will fail, even though all results are essentially known. One might think we could simply add a check: if a workchain with the same inputs already exists, clone the entire matched workchain, including everything it ran, without running anything. This assumes that a workchain's execution is fully determined by its inputs, but I am not 100% sure that is guaranteed.
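To make the "check first" idea concrete, the lookup could be sketched with the QueryBuilder as below; this assumes the workchain node carries a stored hash extra (which depends on hashing being enabled for workflow nodes) and that equal hashes really do imply equivalent runs:

```python
from aiida import orm


def find_equivalent_workchain(node: orm.WorkChainNode):
    """Return a previously finished workchain with the same hash, if any (sketch only)."""
    qb = orm.QueryBuilder()
    qb.append(
        orm.WorkChainNode,
        filters={
            'extras._aiida_hash': node.base.caching.get_hash(),
            'attributes.process_state': 'finished',
        },
    )
    return qb.first(flat=True)
```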

Proposed solution

Describe which of the problems you address and how.

I think you should definitely spend some time thinking about how you would address problem 1. It might seem easy, but the cloning will often happen on a daemon worker, so the opening of the transport should go through the transport queue. However, the call to clone comes from somewhere inside Node.store(), and it is not evident how to get access to the transport queue in a "nice", non-hacky way. I think this might be trickier to do properly than it seems on the surface.
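To illustrate the problem: inside the engine, a transport is acquired through the queue roughly as follows (simplified; transport_queue would be the runner's transport queue, which is precisely what a synchronous Node.store() cannot easily reach):

```python
# Simplified sketch of how the engine acquires a transport via the queue.
async def copy_remote_contents(transport_queue, authinfo, source, target):
    with transport_queue.request_transport(authinfo) as request:
        transport = await request  # resolves once the safe open interval allows it
        transport.copytree(source, target)
```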

@unkcpz
Member Author

unkcpz commented Aug 26, 2022

Hi @sphuber,

I updated the description of the AEP, but I am not sure yet how to implement it. I think we need to set up a discussion.
