Incorporate async-compatible, process-subset-compatible barrier function. With this API, barrier names must be completely unique, and many places must be adjusted to take this into account. There is also an inconsistency between `jax.process_index()` and `distributed.client.process_id`, which is resolved by `initialize_runtime_to_distributed_ids`. This function will be removed as soon as the discrepancy is resolved.
#879
Adds a test decorator to allow testing code that uses this barrier, since barrier name collisions between test cases exercising the same code caused problems.
Sets async checkpointing to true by default for `emergency.CheckpointManager`.
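
For reviewers unfamiliar with the uniqueness requirement, here is a minimal sketch of how unique barrier names and a per-test namespace could be produced. It is not the code in this PR: `unique_barrier_name` and `with_barrier_namespace` are hypothetical names, and the sketch assumes every participating process requests names in the same order, so that all of them derive the same name for the same barrier.

```python
import functools
import itertools
import threading

# Illustrative module state; none of these names come from the PR itself.
_lock = threading.Lock()
_counters = {}      # (prefix, base) -> itertools.count
_test_prefix = ""   # set by the test decorator below


def unique_barrier_name(base: str) -> str:
    """Return `base` plus a sequence number so repeated calls never collide."""
    with _lock:
        prefix = _test_prefix
        counter = _counters.setdefault((prefix, base), itertools.count())
        seq = next(counter)
    return f"{prefix}.{base}.{seq}" if prefix else f"{base}.{seq}"


def with_barrier_namespace(test_fn):
    """Hypothetical test decorator: prefix barrier names with the test name."""
    @functools.wraps(test_fn)
    def wrapper(*args, **kwargs):
        global _test_prefix
        previous = _test_prefix
        _test_prefix = test_fn.__name__
        try:
            return test_fn(*args, **kwargs)
        finally:
            # Restore the previous prefix even if the test raises.
            _test_prefix = previous
    return wrapper
```

Two test cases that exercise the same code path then get distinct barrier names because each runs under its own prefix, while repeated runs of one test keep advancing that test's counter.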