Add support for data not fitting in memory #163

Open

ablaom opened this issue Jun 15, 2021 · 0 comments

ablaom (Collaborator) commented Jun 15, 2021

Currently, resampling of observations, as in MLJ's evaluate!, TunedModel and IteratedModel, does not work with out-of-memory data formats, even though Tables.jl supports some of these, because random access to observations is achieved by materializing the data into RAM. Moreover, all models currently materialize data in memory for training (because that is how the algorithms are implemented, not because of a limitation of MLJ's API).

In many neural-network use cases, however, one is less interested in automated resampling and more interested in training on data that does not fit into memory. See also this discussion on batching. It would be good to adapt MLJFlux models to handle such cases.

One suggestion is to use DataLoaders.jl for this, which is what FastAI does.
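The essential pattern DataLoaders.jl supports is a container that knows how many observations it has and how to fetch each one on demand (via the `getobs`/`nobs` interface), with batches assembled lazily. Below is a minimal self-contained sketch of that pattern in plain Julia, with no external packages, so the container and iterator names (`LazyImageDataset`, `eachbatch`) are hypothetical stand-ins, not DataLoaders.jl API; DataLoaders.jl layers buffering and multithreaded loading on top of the same access pattern.

```julia
# A lazy observation container that never holds the full dataset in RAM.
struct LazyImageDataset
    paths::Vector{String}   # file paths; "images" are produced on demand
end

Base.length(d::LazyImageDataset) = length(d.paths)

# In a real implementation this would read and decode an image file from
# disk; here we fabricate a 2×2 Float32 "image" from the index so the
# sketch runs anywhere.
Base.getindex(d::LazyImageDataset, i::Integer) = fill(Float32(i), 2, 2)

# Minimal sequential minibatch iterator: only `batchsize` observations
# are ever materialized at once.
function eachbatch(d, batchsize)
    return ([d[i] for i in batch]
            for batch in Iterators.partition(1:length(d), batchsize))
end

d = LazyImageDataset(["img$(i).png" for i in 1:5])
batches = collect(eachbatch(d, 2))
# 5 observations at batchsize 2 → batches of size 2, 2, 1
```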

It may also be that the "data front end" that MLJ models can implement will be of some help here: https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/#Implementing-a-data-front-end. We should also keep #97 in mind when implementing this: currently we cache data for warm-restart purposes, but implementing the data front end could render this unnecessary.
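For reference, the data front end contract has two parts: `reformat`, which converts user-facing data into a model-specific representation once, up front, and `selectrows`, which takes row subsets of that representation so resampling never touches the original table again. The sketch below uses local stand-in functions rather than the real `MLJModelInterface.reformat`/`MLJModelInterface.selectrows` methods, so it runs without MLJ installed; `MyModel` and the exact return shapes are illustrative assumptions, not the documented API.

```julia
# Local stand-ins for the MLJModelInterface data front end methods.
struct MyModel end

# `reformat`: one-off conversion from a user-facing column table
# (here a NamedTuple of vectors) to the model-specific representation
# (here a Float64 matrix), returned as a tuple.
reformat(::MyModel, X::NamedTuple) = (hcat(values(X)...),)

# `selectrows`: cheap row subsetting of the model-specific
# representation — a view, so no copy of the data is made.
selectrows(::MyModel, I, Xmatrix) = (view(Xmatrix, I, :),)

X = (x1 = [1.0, 2.0, 3.0], x2 = [4.0, 5.0, 6.0])
model = MyModel()

data = reformat(model, X)               # expensive step, done once
fold = selectrows(model, 1:2, data...)  # repeated cheaply per resample
```

With this split in place, resampling machinery only ever calls `selectrows` on the already-converted representation, which is exactly the hook an out-of-memory representation would need.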

There are minor issues around scitypes. For example, if the input X is not literally a vector of images, but only a proxy for lazy loading, then scitype will need to be overloaded to recognise the proxy as still having the same scitype.
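In MLJ this would mean overloading `scitype` (from ScientificTypes) for the proxy type; the sketch below uses a local `scitype` function and hypothetical `LazyImageVector`/`GrayImage` names so it is self-contained, and is only meant to show the shape of the overload, not the real ScientificTypes machinery.

```julia
# Stand-in for a scientific type such as ScientificTypes' image types.
abstract type GrayImage end

# A lazy proxy: concretely just a vector of file paths, but
# scientifically a vector of images.
struct LazyImageVector
    paths::Vector{String}   # images stay on disk until accessed
end

# Local stand-in for the ScientificTypes `scitype` function, overloaded
# so that the proxy reports the scitype of the data it stands for:
scitype(::LazyImageVector) = AbstractVector{GrayImage}

X = LazyImageVector(["a.png", "b.png"])
scitype(X)   # reports AbstractVector{GrayImage}, not a vector of strings
```

With such an overload in place, a model whose input_scitype expects a vector of images would accept the lazy proxy in its data checks.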

I have not had time to look into the details of how this might work, but I wanted to flag it, and am happy to provide guidance to anyone wanting to investigate further.

@lorenzoh @deyandyankov @Leonardbcm @ayush-1506
