Add start_epoch variable to base trainer train() #92

Open

wants to merge 3 commits into main
Conversation

liamchalcroft (Contributor)

As far as I can tell, there is currently no way to resume training without the trainer starting again from epoch 1. Adding a start_epoch argument to train() should avoid overwrites and allow schedulers to continue correctly (not tested).

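For illustration, a minimal toy sketch of the idea (not the actual pythae BaseTrainer code; the model, optimizer, and scheduler setup here are assumptions made only for the example):

```python
import torch

class MinimalTrainer:
    """Toy trainer used only to illustrate the start_epoch idea."""

    def __init__(self, model, optimizer, scheduler, num_epochs):
        self.model = model
        self.optimizer = optimizer
        self.scheduler = scheduler
        self.num_epochs = num_epochs

    def train(self, start_epoch: int = 1):
        # Starting the loop at start_epoch (instead of a hard-coded 1) keeps
        # epoch numbering, and hence checkpoint names and scheduler steps,
        # consistent when a run is resumed.
        for epoch in range(start_epoch, self.num_epochs + 1):
            self.optimizer.zero_grad()
            loss = self.model(torch.randn(8, 4)).pow(2).mean()
            loss.backward()
            self.optimizer.step()
            self.scheduler.step()
            print(f"epoch {epoch}: loss = {loss.item():.4f}")


model = torch.nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)
trainer = MinimalTrainer(model, optimizer, scheduler, num_epochs=20)
trainer.train(start_epoch=11)  # resume from epoch 11 rather than restarting at 1
```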
@clementchadebec (Owner)

Hi @liamchalcroft,

Sorry for the late reply. I do think this is a useful feature, and I will integrate it in the near future. Nonetheless, I was thinking of a method called resume_training_from_folder that takes as input the path to a folder containing the checkpoints of the model, the optimizer and the scheduler. It would then reload their state_dicts and resume the training as you propose.
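A rough sketch of how such a method might fit together, assuming the start_epoch argument proposed in this PR. The checkpoint file name, dictionary keys, and trainer attributes below are illustrative assumptions, not the actual pythae API:

```python
import os
import torch

def resume_training_from_folder(trainer, folder: str):
    # Sketch only: the checkpoint file name and keys are assumed for this
    # example and are not the pythae checkpoint layout.
    checkpoint = torch.load(os.path.join(folder, "checkpoint.pt"))
    trainer.model.load_state_dict(checkpoint["model_state_dict"])
    trainer.optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    trainer.scheduler.load_state_dict(checkpoint["scheduler_state_dict"])
    # Resume at the epoch following the last completed one, relying on the
    # start_epoch argument added to train().
    trainer.train(start_epoch=checkpoint["epoch"] + 1)
```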
