Currently, resampling of observations, as in MLJ's `evaluate!`, `TunedModel`, and `IteratedModel`, does not work with out-of-memory data formats, even though Tables.jl supports some of them, because random access to observations is achieved by materialising the data into RAM. Additionally, all models currently materialise data in memory for training (because that is how the algorithms are implemented, not because of a limitation of MLJ's API).
In many NN use-cases, however, one is less interested in automated resampling and more interested in training with data that does not fit into memory. See also this discussion on batching. It would be good to adapt MLJFlux models to handle such cases.
One suggestion is to use DataLoaders.jl for this, which is what FastAI does.
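To illustrate the pattern DataLoaders.jl builds on, here is a toy, self-contained sketch of lazy observation access with minibatch iteration. All names (`LazyDataset`, `numobs`, `getobs`, `eachbatch`) are illustrative stand-ins, not the actual DataLoaders.jl API, which additionally provides parallel, buffered loading:

```julia
# Toy sketch of the lazy observation-access pattern that DataLoaders.jl
# builds on. All names here are illustrative, not the real package API.

# A "dataset" that only materialises observations on demand:
struct LazyDataset
    n::Int
end
numobs(d::LazyDataset) = d.n
# Pretend each observation is loaded from disk; here we just compute it.
getobs(d::LazyDataset, i::Int) = fill(Float32(i), 3)   # a 3-vector "image"

# Minimal batch iterator: yields one materialised batch at a time,
# so only `batchsize` observations ever sit in memory together.
function eachbatch(d, batchsize)
    idxs = 1:numobs(d)
    (reduce(hcat, (getobs(d, i) for i in batch))
        for batch in Iterators.partition(idxs, batchsize))
end

d = LazyDataset(10)
batches = collect(eachbatch(d, 4))
length(batches)    # → 3 (batch sizes 4, 4, 2)
size(batches[1])   # → (3, 4)
```

A training loop would consume `eachbatch(d, batchsize)` directly instead of calling `collect`, keeping memory bounded by the batch size.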
There are minor issues around scitypes. For example, if the input `X` is not literally a vector of images but only a proxy for lazy loading, then `scitype` will need to be overloaded to recognise it as having the same scitype.
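The scitype issue can be illustrated with a self-contained toy (the `scitype` generic and `GrayImageSci` below are stand-ins, not the actual ScientificTypes.jl extension point): the lazy proxy should report the same scientific type as the in-memory data it stands for, so input checks still pass.

```julia
# Toy illustration only: these names mimic, but are not, ScientificTypes.jl.
abstract type Scientific end
struct GrayImageSci <: Scientific end   # stand-in for a GrayImage scitype

# Default: a plain vector of image arrays has the image-vector scitype.
scitype(::AbstractVector{<:AbstractMatrix{<:Real}}) = AbstractVector{GrayImageSci}

# A proxy that lazily loads images from file paths:
struct LazyImages
    paths::Vector{String}
end

# Overload: the proxy claims the same scitype as the materialised data,
# so a model's input scitype check treats both identically.
scitype(::LazyImages) = AbstractVector{GrayImageSci}

X = LazyImages(["img1.png", "img2.png"])
scitype(X) == scitype([rand(2, 2), rand(2, 2)])   # → true
```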
I have not had time to look into the details of how this might work, but I wanted to flag it, and I am happy to provide guidance to anyone wanting to investigate further.
It may also be that the "data front end" that MLJ models can implement will be of some help here: https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/#Implementing-a-data-front-end. But we should also keep #97 in mind when implementing this. Currently we cache data for warm-restart purposes, but implementing the data front end could render this unnecessary.
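The data front end contract can be sketched with toy stand-ins (the `ImageClassifier` type and the unqualified `reformat`/`selectrows` functions below are illustrative, not `MLJModelInterface` itself): `reformat` converts user data to a model-specific representation once, and `selectrows` resamples within that representation, so resampling need not re-materialise the original data.

```julia
# Toy sketch of the MLJ data front end contract (see linked docs).
struct ImageClassifier end      # hypothetical MLJFlux-style model

# reformat: convert user-facing data into the representation the model
# trains on. Here: stack observation vectors into a matrix, one column
# per observation.
reformat(::ImageClassifier, X::Vector{<:Vector}) = (reduce(hcat, X),)

# selectrows: index observations *in the model-specific representation*,
# here cheaply via a view rather than a copy.
selectrows(::ImageClassifier, I, Xmatrix) = (view(Xmatrix, :, I),)

model = ImageClassifier()
X = [rand(3) for _ in 1:5]
(Xmat,) = reformat(model, X)            # one-off conversion
(Xtrain,) = selectrows(model, 1:3, Xmat)
size(Xtrain)   # → (3, 3)
```

For out-of-memory data, the representation returned by `reformat` could itself be a lazy container, with `selectrows` restricting which observations get loaded.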
@lorenzoh @deyandyankov @Leonardbcm @ayush-1506