Implement async loading of the data #110

@albangossard

In #109 we implemented a simple dataloader that loads data sequentially. This is far from optimal: the model does not train while data is being loaded, so the loading should be asynchronous.
The idea is that at iteration n, while the model is being evaluated on the current data, other processes run in the background to prepare and load the data for iteration n+1.

The goal of this task is to implement a prefetcher that handles all the mechanics of creating the processes and loading the data in them at the right moment relative to where the main process (which trains the model) is in its execution.
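To make the prefetching idea concrete, here is a minimal Python sketch, not the project's implementation. The names `prefetch_batches`, `load_batch`, and `batch_ids` are hypothetical. The function accepts any `concurrent.futures` executor; the usage example runs with a `ThreadPoolExecutor` so it works anywhere, and a `ProcessPoolExecutor` could be swapped in for real background processes, assuming `load_batch` is picklable and the script has a main guard.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Any, Callable, Iterable, Iterator


def prefetch_batches(
    load_batch: Callable[[Any], Any],  # hypothetical: loads one batch from its id
    batch_ids: Iterable[Any],          # hypothetical: ids of the batches, in order
    executor: Executor,
) -> Iterator[Any]:
    """Yield batches in order, loading batch n+1 while the caller uses batch n."""
    ids = iter(batch_ids)
    try:
        # Start loading the first batch right away.
        pending = executor.submit(load_batch, next(ids))
    except StopIteration:
        return  # nothing to load
    for next_id in ids:
        # Submit batch n+1 *before* handing batch n to the caller, so the
        # load overlaps with whatever the caller does with batch n.
        upcoming = executor.submit(load_batch, next_id)
        yield pending.result()
        pending = upcoming
    # Last batch: nothing left to prefetch.
    yield pending.result()


# Usage: a toy loader standing in for real I/O.
with ThreadPoolExecutor(max_workers=1) as pool:
    batches = list(prefetch_batches(lambda i: [i, i + 1], range(3), pool))
```

This sketch keeps only one batch in flight. Prefetching several batches ahead would need a bounded queue of futures instead of a single `pending` slot.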

When instantiating the dataloader, we want to be able to choose between the existing implementation, which loads data sequentially, and this new one, which uses multiprocessing.
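One possible shape for this selection is a `mode` argument on the dataloader, sketched below in Python. This is an assumption about the interface, not the API from #109: the class name `DataLoader`, the `mode` values, and `load_batch` are all hypothetical. The asynchronous branch uses a background thread as a stand-in for the worker processes so the example stays self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterator, Sequence


class DataLoader:
    """Hypothetical loader with a switch between sequential and async loading."""

    def __init__(
        self,
        load_batch: Callable[[Any], Any],
        batch_ids: Sequence[Any],
        mode: str = "sequential",
    ) -> None:
        if mode not in ("sequential", "async"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.load_batch = load_batch
        self.batch_ids = batch_ids
        self.mode = mode

    def __iter__(self) -> Iterator[Any]:
        if self.mode == "sequential":
            return self._iter_sequential()
        return self._iter_async()

    def _iter_sequential(self) -> Iterator[Any]:
        # Existing behavior: load each batch only when it is requested.
        for batch_id in self.batch_ids:
            yield self.load_batch(batch_id)

    def _iter_async(self) -> Iterator[Any]:
        # Stand-in for worker processes: one background thread that loads
        # the next batch while the caller consumes the current one.
        with ThreadPoolExecutor(max_workers=1) as pool:
            pending = None
            for batch_id in self.batch_ids:
                upcoming = pool.submit(self.load_batch, batch_id)
                if pending is not None:
                    yield pending.result()
                pending = upcoming
            if pending is not None:
                yield pending.result()


# Usage: both modes yield the same batches in the same order.
sequential = list(DataLoader(lambda i: i * 2, [1, 2, 3]))
prefetched = list(DataLoader(lambda i: i * 2, [1, 2, 3], mode="async"))
```

Keeping both modes behind the same iteration interface means the training loop does not change when the flavor is switched.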
