Currently, dist.ddp and dist.spmd are basically identical (the latter is a lightweight wrapper around the former). They could also be named more explicitly: dist.ddp doesn't actually involve Distributed Data Parallel, it just calls torchrun.
Motivation/Background
All else equal, simplification and explicit naming are good. For example, users leveraging Fully Sharded Data Parallel instead of DDP may find it confusing that they should be using dist.ddp.
Detailed Proposal
Refactor components/dist.py by combining the methods for ddp and spmd into one method called torchrun. Update docs, tests, examples, and callsites as appropriate.
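To make the proposal concrete, here is a rough sketch of what the merged entry point could look like. It is not the actual contents of components/dist.py: the signature, the parameter names (`script`, `nnodes`, `nproc_per_node`), the helper `_deprecated_alias`, and the plain list-of-strings return value are all assumptions for illustration. Keeping `ddp` and `spmd` as deprecated aliases is one possible migration path and is not something the proposal itself requires.

```python
import warnings
from typing import Callable, List


def torchrun(
    *script_args: str,
    script: str,
    nnodes: int = 1,
    nproc_per_node: int = 1,
) -> List[str]:
    """Hypothetical unified component: builds a torchrun command line.

    A real component would presumably return an app definition object;
    a list of strings keeps this sketch self-contained.
    """
    return [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--nproc_per_node={nproc_per_node}",
        script,
        *script_args,
    ]


def _deprecated_alias(old_name: str) -> Callable[..., List[str]]:
    """Return a wrapper that warns and then forwards to torchrun (hypothetical helper)."""

    def alias(*args: str, **kwargs) -> List[str]:
        warnings.warn(
            f"dist.{old_name} is deprecated; use dist.torchrun instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return torchrun(*args, **kwargs)

    alias.__name__ = old_name
    return alias


# Old names keep working during a transition period, but point at one implementation.
ddp = _deprecated_alias("ddp")
spmd = _deprecated_alias("spmd")
```

With something like this, existing callsites using dist.ddp or dist.spmd would continue to run and emit a DeprecationWarning, while docs, tests, and examples are moved over to dist.torchrun.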
Alternatives
Leave things as-is.
Remove ddp by rolling it into spmd and keep the spmd method, so dist.spmd is the only available command and it has a "good enough" name.
schmidt-ai changed the title from "Combine / rename dist.ddp and dist.spmd to dist.torchrun" to "Combine / rename dist.ddp and dist.spmd into dist.torchrun" on Dec 8, 2023
Additional context/links
@danielbear