I am trying to increase the number of workers used by the dataloader but have been encountering issues. I saw issues 625 and 626, which include the warning message, but I cannot find an example vignette showing how to properly implement the parallel dataloader. Would it be possible to have a brief example for this?
When torch creates a parallel dataloader (num_workers > 0) it will create new R processes using callr and then copy the dataset you passed into each one of those processes. It will then run .getitem() in each of these processes.
Problems can arise when copying the dataset into those processes, for example:

- the dataset contains torch_tensors as attributes. torch tensors are not serializable using saveRDS(), thus it's hard to reliably move them between processes. The alternative in this case is to not have any dataset attribute that is a tensor (see the sketch after this list).
- the dataset has very large attributes: they will be copied into each worker process, potentially using a lot of memory.
- the dataset has other kinds of objects that are not copiable using saveRDS(), e.g. connections, XML objects, anything that wraps a pointer.
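For the tensor-attribute case, here is a minimal sketch of the workaround (the dataset and field names are made up for illustration): keep the data as plain R objects and only create tensors inside `.getitem()`, which runs in the worker processes:

```r
library(torch)

# Store plain R matrices/vectors (serializable with saveRDS()) as
# attributes, and create torch tensors only inside .getitem().
my_dataset <- dataset(
  name = "my_dataset",
  initialize = function(x, y) {
    # plain R objects: safe to copy into each worker process
    self$x <- x
    self$y <- y
  },
  .getitem = function(i) {
    # tensors are created here, inside the worker, so no tensor
    # ever needs to be serialized across processes
    list(
      x = torch_tensor(self$x[i, ]),
      y = torch_tensor(self$y[i])
    )
  },
  .length = function() {
    nrow(self$x)
  }
)

ds <- my_dataset(matrix(runif(100 * 10), ncol = 10), runif(100))
dl <- dataloader(ds, batch_size = 16, num_workers = 2)
```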
Here's a small example running the mnist dataset in parallel:
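A minimal sketch of what that can look like, assuming the torchvision package is installed (it provides `mnist_dataset()` and `transform_to_tensor`):

```r
library(torch)
library(torchvision)

# mnist_dataset() stores its images as plain R arrays, so the
# dataset can be copied safely into the worker processes
ds <- mnist_dataset(
  root = tempfile(),
  download = TRUE,
  transform = transform_to_tensor
)

# num_workers > 0 spawns background R processes via callr;
# .getitem() runs in those workers
dl <- dataloader(ds, batch_size = 32, num_workers = 2)

# grab a single batch to verify everything works
it <- dataloader_make_iter(dl)
batch <- dataloader_next(it)
batch$x$shape  # e.g. [32, 1, 28, 28]
```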