I am trying to increase the number of workers used by the dataloader but have been encountering issues. I saw issues 625 and 626, which include the warning message, but I cannot find an example vignette showing how to properly implement the parallel dataloader. Would it be possible to have a brief example for this?
When torch creates a parallel dataloader (num_workers > 0) it will create new R processes using callr and then copy the dataset you passed into each one of those processes. It will then run .getitem() in each of these processes.
Problems can arise when copying the dataset into those processes, for example:

- the dataset contains torch_tensors as attributes. torch tensors are not serializable using saveRDS(), thus it's hard to reliably move them between processes. The alternative in this case is to not have any dataset attribute that is a tensor (see the sketch after this list).
- the dataset has very large attributes: they will be copied into each worker process, potentially using a lot of memory.
- the dataset has other kinds of objects that are not copiable using saveRDS(), e.g. connections, XML objects, anything that wraps a pointer.
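For the tensor-attribute case, here is a minimal sketch of the workaround (the dataset and field names are made up for illustration): keep the data as plain R objects and only create tensors inside `.getitem()`, which runs in the worker processes:

```r
library(torch)

# Store plain R matrices/vectors (serializable with saveRDS()) as
# attributes, and create torch tensors only inside .getitem().
my_dataset <- dataset(
  name = "my_dataset",
  initialize = function(x, y) {
    # plain R objects: safe to copy into each worker process
    self$x <- x
    self$y <- y
  },
  .getitem = function(i) {
    # tensors are created here, inside the worker, so no tensor
    # ever needs to be serialized across processes
    list(
      x = torch_tensor(self$x[i, ]),
      y = torch_tensor(self$y[i])
    )
  },
  .length = function() {
    nrow(self$x)
  }
)

ds <- my_dataset(matrix(runif(100 * 10), ncol = 10), runif(100))
dl <- dataloader(ds, batch_size = 16, num_workers = 2)
```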
Here's a small example running the mnist dataset in parallel:
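A minimal sketch of what that can look like, assuming the torchvision package is installed (it provides `mnist_dataset()` and `transform_to_tensor`):

```r
library(torch)
library(torchvision)

# mnist_dataset() stores its images as plain R arrays, so the
# dataset can be copied safely into the worker processes
ds <- mnist_dataset(
  root = tempfile(),
  download = TRUE,
  transform = transform_to_tensor
)

# num_workers > 0 spawns background R processes via callr;
# .getitem() runs in those workers
dl <- dataloader(ds, batch_size = 32, num_workers = 2)

# grab a single batch to verify everything works
it <- dataloader_make_iter(dl)
batch <- dataloader_next(it)
batch$x$shape  # e.g. [32, 1, 28, 28]
```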