Similar to causal-learn, and in the quest for feature parity so that we converge on a best-of-both implementation, we want to add caching of CI test values as a base feature for all skeleton learning algorithms. Starting an issue to track this...
We want to cache the explicit p-values so that users can re-run the entire algorithm with a set of different alpha values; more generally, any re-run of the algorithm then becomes trivial.
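To make the benefit concrete, here is a minimal sketch (the cache contents and the `(x_var, y_var, conditioning_set)` keys are purely illustrative): once p-values are stored rather than accept/reject decisions, sweeping alpha is just a re-thresholding pass over the cache.

```python
# Illustrative cache contents: (x_var, y_var, conditioning_set) -> p-value.
cached_pvalues = {
    ("x", "y", ()): 0.20,
    ("x", "y", ("z",)): 0.03,
}

# Re-running the skeleton step at a new alpha is a lookup, not a re-test:
for alpha in (0.01, 0.05):
    independent = {key for key, p in cached_pvalues.items() if p > alpha}
```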
Implementation Thoughts
Caching can be implemented with joblib. We want caching to be a function of the dataset, so we would first compute a hash of the dataset and use it as the cache folder location: `.dodiscover/<dataset_hash>`. The cache would then save to a private folder (similar to what many packages do), and we can easily clear it through the joblib API.
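A minimal sketch of the dataset-keyed cache location, assuming a `pandas.DataFrame` input and using `joblib.hash` to fingerprint it (the helper name `get_dataset_cache` is hypothetical):

```python
from pathlib import Path

import joblib
import pandas as pd
from joblib import Memory


def get_dataset_cache(data: pd.DataFrame) -> Memory:
    """Return a joblib ``Memory`` rooted at ``.dodiscover/<dataset_hash>``."""
    # joblib.hash produces a deterministic fingerprint of the dataset
    # contents, so the same data always maps to the same cache folder.
    dataset_hash = joblib.hash(data)
    return Memory(location=str(Path(".dodiscover") / dataset_hash), verbose=0)
```

Clearing the cache then goes through the joblib API, e.g. `get_dataset_cache(data).clear(warn=False)`.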
Then, as a function of `x_var`, `y_var`, `conditioning_set`, and the conditioning test used, we would let `joblib.Memory` cache the p-values for us. However, another problem we need to figure out is how best to parallelize the existing CI tests using `joblib.Parallel`; this is not trivial from what I've seen so far.
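A sketch of how the p-value caching could look, assuming the helper above and a DataFrame `data`. Here `_run_ci_test` is a hypothetical stand-in (a bare-bones Fisher-z partial-correlation test) for whatever test the skeleton learner actually dispatches to. Note `ignore=["data"]`: the `Memory` location already encodes the dataset hash, so re-hashing the full dataset on every call would be redundant.

```python
import numpy as np
from scipy import stats


def _run_ci_test(data, x_var, y_var, conditioning_set, test_name):
    """Hypothetical CI test; a real version would dispatch on ``test_name``."""
    cols = [x_var, y_var, *conditioning_set]
    corr = np.corrcoef(data[cols].to_numpy(), rowvar=False)
    prec = np.linalg.pinv(corr)
    # Partial correlation of (x, y) given the conditioning set.
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    z = 0.5 * np.log((1 + r) / (1 - r))
    stat = np.sqrt(len(data) - len(conditioning_set) - 3) * abs(z)
    return 2 * (1 - stats.norm.cdf(stat))


memory = get_dataset_cache(data)
# Cached on (x_var, y_var, conditioning_set, test_name); the dataset itself
# is excluded from the key since the cache folder is already dataset-specific.
cached_ci_test = memory.cache(_run_ci_test, ignore=["data"])
```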
Assuming we can implement the parallelization, the joblib caching would come almost for free, and the two would work well together without us having to write any of the "file saving and file opening" code ourselves; that is all abstracted away.
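Following the joblib example linked below, a sketch of how the two compose (the task list is purely illustrative):

```python
from joblib import Parallel, delayed

# Candidate (x_var, y_var, conditioning_set) triples from one skeleton level.
tasks = [("x", "y", ()), ("x", "y", ("z1",)), ("x", "y", ("z2",))]

# The first run computes and persists each p-value; subsequent runs (e.g.
# after changing alpha) hit the on-disk cache instead of re-running the tests.
pvalues = Parallel(n_jobs=-1)(
    delayed(cached_ci_test)(data, x, y, S, "fisherz") for (x, y, S) in tasks
)
```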
xref: https://joblib.readthedocs.io/en/latest/auto_examples/nested_parallel_memory.html#sphx-glr-auto-examples-nested-parallel-memory-py
cc: @jaron-lee