Adding Resume Training Functionality #83

yousefelsharkawy · 2024-10-03T12:13:09Z

Saves the data loader and optimizer states so that training can resume from where the model stopped.
There are functions to save and load the RNG states, but they seem to work on CPU only (see https://pytorch.org/docs/stable/random.html#torch.random.fork_rng). However, saving and loading the model, optimizer, and data loader states seems to be sufficient to capture the training state: I trained for N steps in a single run, then repeated the same N steps split across multiple runs using the resume functionality, and got the same loss and norms.
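
For reviewers who want to see the idea behind that check in isolation, here is a minimal, dependency-free sketch. It is not the code in this PR: the toy model (a single weight `w`), the momentum update, the helper names (`make_batches`, `new_state`, `train`, `save_checkpoint`, `load_checkpoint`) and the JSON checkpoint format are all assumptions made for illustration. In PyTorch the model and optimizer parts would normally go through `state_dict()` / `load_state_dict()`.

```python
import json
import os
import tempfile


def make_batches(num_batches):
    # Deterministic toy data: every sample satisfies y = 3 * x.
    return [[(float(i + j), 3.0 * (i + j)) for j in range(4)]
            for i in range(num_batches)]


def new_state():
    # Everything needed to continue a run: the model weight, the optimizer's
    # momentum buffer, the data loader position, and the step counter.
    return {"w": 0.0, "momentum": 0.0, "position": 0, "step": 0}


def train(state, batches, num_steps, lr=0.001, beta=0.9):
    # SGD with momentum on y_hat = w * x; returns the loss seen at each step.
    losses = []
    for _ in range(num_steps):
        batch = batches[state["position"] % len(batches)]
        loss = sum((state["w"] * x - y) ** 2 for x, y in batch) / len(batch)
        grad = sum(2 * (state["w"] * x - y) * x for x, y in batch) / len(batch)
        state["momentum"] = beta * state["momentum"] + grad
        state["w"] -= lr * state["momentum"]
        state["position"] += 1
        state["step"] += 1
        losses.append(loss)
    return losses


def save_checkpoint(state, path):
    with open(path, "w") as f:
        json.dump(state, f)


def load_checkpoint(path):
    with open(path) as f:
        return json.load(f)


batches = make_batches(8)

# Reference: 6 steps in one uninterrupted run.
full_state = new_state()
full_losses = train(full_state, batches, 6)

# Same 6 steps, split into two runs with a checkpoint in between.
path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
first_run = new_state()
resumed_losses = train(first_run, batches, 3)
save_checkpoint(first_run, path)
second_run = load_checkpoint(path)
resumed_losses += train(second_run, batches, 3)

# What happens if the optimizer state is dropped on resume.
no_optimizer = load_checkpoint(path)
no_optimizer["momentum"] = 0.0
no_optimizer_losses = train(no_optimizer, batches, 3)

print(resumed_losses == full_losses)
print(no_optimizer_losses == full_losses[3:])
```

The split run matches the uninterrupted run only because the checkpoint carries the optimizer buffer and the data position along with the weight; resetting the momentum on resume makes the losses drift apart. This toy loop has no randomness, so it says nothing about whether RNG state matters in a real run.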

Adding Resume Training Functionality

73cf8d7
