Would it be possible to support parallel GPU training? We would like to try this with a fairly large (multi-GB) dataset, but to keep training time reasonable it would need to run in parallel. Single-node parallelism with DataParallel() would probably work for our use case, although the PyTorch documentation suggests that DistributedDataParallel() is preferred even on a single node.
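For context, something along the lines of the following single-node DistributedDataParallel() setup is what we have in mind. This is only a sketch: the linear model and random tensors are placeholders for our actual model and dataset, and the port/spawn details would need adapting.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def train(rank, world_size):
    # One process per GPU; NCCL is the usual backend for GPU training.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Placeholder model and dataset standing in for the real ones.
    model = torch.nn.Linear(16, 1).to(rank)
    model = DDP(model, device_ids=[rank])

    dataset = TensorDataset(torch.randn(1024, 16), torch.randn(1024, 1))
    # DistributedSampler gives each process a disjoint shard of the data.
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.to(rank), y.to(rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # gradients are all-reduced across GPUs here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(train, args=(world_size,), nprocs=world_size)
```

The main appeal of this approach over DataParallel() is that each GPU gets its own process and its own shard of the data, which avoids the single-process bottleneck and keeps per-process memory needs lower.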
Part of the motivation for this is that a large dataset needs a lot of memory, which in a cloud environment means a large, multi-GPU instance. It is very expensive to run such a large instance for weeks with all but one of the GPUs idle.