Is parallel GPU training support possible? We would like to try this with a fairly large (multi-GB) dataset, but to make training time reasonable it would need to be done in parallel. Single node parallelism with DataParallel() would probably work for our use case, although the PyTorch documentation suggests that DistributedDataParallel() is preferred even for a single node.
Part of the motivation for this is that a large dataset needs a lot of memory, which in a cloud environment means a large, multi-GPU instance. It is very expensive to run such a large instance for weeks with all but one of the GPUs idle.
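For context, here is a rough sketch of what a single-node `DistributedDataParallel()` setup typically looks like, launched with `torchrun`. The model, dataset, and hyperparameters are placeholders for illustration, not this project's actual code:

```python
# Hypothetical single-node DDP sketch; launch with:
#   torchrun --nproc_per_node=NUM_GPUS train_ddp.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # torchrun sets LOCAL_RANK, RANK, and WORLD_SIZE for each spawned process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; the real model would be the library's own network.
    model = nn.Linear(128, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # Placeholder dataset; DistributedSampler gives each process a distinct shard.
    dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 10, (1024,)))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()  # gradients are all-reduced across processes here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The main cost over `DataParallel()` is the process-per-GPU launch and sharded data loading, which is why the PyTorch docs still recommend it: each GPU gets its own Python process instead of being fed from a single one.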
Currently we do not support parallel GPU training, I'm sorry. DataParallel shouldn't be too difficult to set up, but I'll have to look into it more since I don't have much experience writing multi-GPU PyTorch programs.
I'll give it a try next week, and see if (and how) it works.
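For reference, wrapping a model in `DataParallel()` is roughly a one-line change; a minimal sketch, assuming a standard PyTorch training loop (the model and dummy data below are placeholders, not this project's code):

```python
# Hypothetical DataParallel sketch: one wrapper line splits each batch across GPUs.
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(128, 10)

    def forward(self, x):
        return self.fc(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Net()
if torch.cuda.device_count() > 1:
    # Scatters each input batch across the visible GPUs and gathers the outputs.
    model = nn.DataParallel(model)
model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One hypothetical training step on random data.
inputs = torch.randn(64, 128, device=device)
targets = torch.randint(0, 10, (64,), device=device)
optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()
```

The rest of the training loop stays unchanged, which is why `DataParallel()` is usually the quickest thing to try first, even if `DistributedDataParallel()` scales better.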