Is there support for online training, and is it a full update for the embedding of the DLRM model or a little bit of an update? #648
Hey! Thanks very much for your interest in Modyn. If I understand your question correctly, you are asking whether we support online training of a DLRM model? If so, yes: this is precisely one of the use cases Modyn is built for. We call it model training on growing datasets, and we think of it in terms of finetuning jobs in chunks. For an introduction to our ML pipeline model, I'd recommend reading at least the first sections of the Modyn paper. We ran DLRM training for the SIGMOD paper, but only from a throughput perspective, not an accuracy one. We use the DLRM implementation that NVIDIA provides; you can find our version here. I am not sure what you mean by "a little bit of an update". Modyn itself is, so to speak, an execution engine: you configure your pipelines to your requirements, i.e., you define what finetuning means for you. If you could elaborate on what you want to achieve, I am more than happy to go into more detail. I think a good first step would be reading the modeling and design sections of the paper to get an understanding of how Modyn operates.
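To make "finetuning jobs in chunks" concrete, here is a minimal sketch in Python. It is not Modyn's API: `run_pipeline`, `toy_train`, and the dict-based `Model` are hypothetical stand-ins, and the rule "start a new training job every `trigger_every` samples" is just one assumed way to cut a growing dataset into chunks. Each job warm-starts from the previous model version and trains only on the newly collected chunk.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-ins for illustration only; these are not Modyn APIs.
Model = Dict[str, float]  # toy "weights": a running mean and a sample count


def toy_train(model: Model, batch: List[float]) -> Model:
    """Toy 'training step': fold the new batch into a running mean."""
    count = model.get("count", 0.0)
    mean = model.get("mean", 0.0)
    for x in batch:
        count += 1
        mean += (x - mean) / count
    return {"mean": mean, "count": count}


def run_pipeline(
    stream: Sequence[float],
    trigger_every: int,
    train: Callable[[Model, List[float]], Model],
) -> List[Model]:
    """Cut a growing stream into chunks and finetune one model version per chunk."""
    versions: List[Model] = []
    current: Model = {}        # empty (untrained) initial model
    chunk: List[float] = []    # data collected since the last training job
    for sample in stream:
        chunk.append(sample)
        if len(chunk) == trigger_every:            # enough new data: start a job
            current = train(dict(current), chunk)  # warm-start from previous model
            versions.append(current)
            chunk = []                             # next chunk starts empty
    return versions


if __name__ == "__main__":
    for i, m in enumerate(run_pipeline([1, 2, 3, 4, 5, 6, 7], 3, toy_train), start=1):
        print(f"model {i}: {m}")
```

For the stream `[1, 2, 3, 4, 5, 6, 7]` with `trigger_every=3`, this yields two model versions (after samples 3 and 6), while sample 7 waits for the next chunk. Replacing `toy_train` with a real training step, or training on all data seen so far instead of only the new chunk, is the kind of choice you would define in your own pipeline configuration.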
Thanks for the reply. Our school environment does not allow Docker. Can Modyn be run with only an Anaconda environment?
I have tested it in an Anaconda environment and it works, but it needs some modifications to the CMakeLists.txt files.
I have read the paper, but I am curious about the model storage part. DLRM online training systems such as EasyRec and DeepRec support incremental updates of embedding tables without interrupting model training. Does Modyn support this, or does it load the entire model and all embedding tables onto the GPU whenever new data arrives?
Hey @freshduer, you have to think about the model training process in chunks. This is what we tried to formalize with triggers and trigger training sets. You start with an empty (randomly initialized) model. Then you collect N data points and run a regular training (forward + backward pass), which produces model 1. You continue to collect M new data points. Depending on your training config (

The model storage that you mention defines how the model is persisted to disk (e.g., only the diff, compressed, etc.). That is completely orthogonal to how you train the model.
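As an illustration of why storage is orthogonal to training, the sketch below (plain Python with hypothetical helpers, not Modyn's model storage code) first does a sparse update of a toy embedding table, where only rows for ids seen in the new data change, and then, as a separate step, persists just the changed rows as a diff against the previous version. The training step always yields a complete new table; whether the storage layer writes the full table or only the diff does not affect the training result.

```python
from typing import Dict, List

# Hypothetical helpers for illustration only; not Modyn's model storage code.
Table = List[List[float]]  # toy embedding table: one row per categorical id


def sparse_sgd_step(table: Table, ids: List[int], grads: List[List[float]], lr: float) -> Table:
    """Training side: return a new full table in which only rows for `ids` change."""
    new = [row[:] for row in table]  # copy so the previous version stays intact
    for i, g in zip(ids, grads):
        new[i] = [w - lr * gi for w, gi in zip(new[i], g)]
    return new


def diff_rows(old: Table, new: Table) -> Dict[int, List[float]]:
    """Storage side: keep only the rows that differ between two versions.

    Assumes both versions have the same number of rows (fixed vocabulary).
    """
    return {i: n[:] for i, (o, n) in enumerate(zip(old, new)) if o != n}


def apply_diff(base: Table, diff: Dict[int, List[float]]) -> Table:
    """Rebuild the newer version from the stored base version plus its diff."""
    restored = [row[:] for row in base]
    for i, row in diff.items():
        restored[i] = row[:]
    return restored
```

For example, with a 4-row table and a batch that touches only ids 1 and 3, `diff_rows` returns just those two rows, and `apply_diff` on the old version reproduces the full new table. Compressing the diff or the table would likewise be a storage-layer decision, not a change to the training loop.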