
Early stopping criteria is only checked for the ReduceLROnPlateau scheduler #215

Open
mcognetta opened this issue Aug 25, 2023 · 5 comments
Labels: enhancement (New feature or request), work in process (We are now working on this issue)

Comments

@mcognetta
Contributor

mcognetta commented Aug 25, 2023

Describe the bug
Early stopping and patience are only considered by the ReduceLROnPlateau scheduler. For example, using bleu as an early-stopping criterion with the WarmupInverseSquareRootScheduler will not stop training early, even if validation performance does not improve for more than patience iterations.

The issue is in these lines:

joeynmt/joeynmt/training.py

Lines 532 to 540 in 0968187

        if self.stats.is_min_lr or self.stats.is_max_update:
            break  # exit the inner (batch) loop

    if self.stats.is_min_lr or self.stats.is_max_update:
        log_str = (f"minimum lr {self.learning_rate_min}"
                   if self.stats.is_min_lr else
                   f"maximum num. of updates {self.max_updates}")
        logger.info("Training ended since %s was reached.", log_str)
        break  # exit the outer (epoch) loop

There should be an additional stats variable checking whether validation performance has failed to improve for N iterations, so that early stopping can trigger regardless of the scheduler.
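
For illustration only, here is a minimal self-contained sketch of the kind of check meant above: count consecutive validations without improvement and stop once a patience budget is exhausted. The EarlyStopper class and all names below are hypothetical, not joeynmt's API.

    class EarlyStopper:
        """Signal a stop after `patience` consecutive validations without improvement."""

        def __init__(self, patience: int, minimize: bool = False):
            self.patience = patience
            self.minimize = minimize      # True for loss/ppl, False for bleu/accuracy
            self.best = None
            self.bad_validations = 0

        def should_stop(self, score: float) -> bool:
            improved = (self.best is None
                        or (score < self.best if self.minimize else score > self.best))
            if improved:
                self.best = score
                self.bad_validations = 0
            else:
                self.bad_validations += 1
            return self.bad_validations >= self.patience

    # Example: bleu stops improving after the third validation, so with
    # patience=3 the stop fires on the sixth validation.
    stopper = EarlyStopper(patience=3)
    for bleu in [10.2, 14.8, 15.1, 15.0, 14.9, 15.1]:
        if stopper.should_stop(bleu):
            print("early stopping triggered")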

To Reproduce
Steps to reproduce the behavior:

  1. Use any of the basic training configurations.
  2. Change the scheduling parameter to scheduling: "warmupinversesquareroot"
  3. Run training.
  4. Observe that training does not stop until the maximum number of epochs is reached, even if validation performance stops improving.

Logged output
If possible, please add the training log and validation reports or excerpts of them.

Expected behavior
Training stops early when validation stops improving.

System (please complete the following information):

  • OS: Linux
  • CPU & GPU
  • Python 3.10

Side note: Sorry for the influx of issues (there will be a few more in coming days). I am getting ramped up on this repo for a project of mine, so I have been doing lots of exploring. I will help to contribute fixes to anything that I report here.

@may-
Collaborator

may- commented Sep 4, 2023

@mcognetta
Thank you for reporting!
I feel it's not a bug, but expected behaviour.

Actually, we initially introduced the patience parameter for the plateau scheduler only.

patience: 5 # specific to plateau scheduler: wait for this many validations without improvement before decreasing the learning rate

The patience term in the plateau scheduler doesn't directly control early stopping, but the LR reduction. Early stopping is triggered when the LR reaches the specified minimum value.
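
To make that distinction concrete, here is a small self-contained sketch using plain PyTorch's ReduceLROnPlateau rather than joeynmt's wrapper (all numbers are illustrative): patience only delays each LR reduction, and the stop condition fires only once the LR has decayed below a chosen minimum.

    import torch

    model = torch.nn.Linear(4, 4)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # `patience` waits this many validations before *reducing* the LR, not before stopping.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="max", factor=0.5, patience=2)

    learning_rate_min = 1e-6  # the actual early-stopping threshold

    for step in range(1, 101):
        scheduler.step(15.0)  # validation bleu never improves
        current_lr = optimizer.param_groups[0]["lr"]
        if current_lr <= learning_rate_min:
            print(f"stop after {step} stagnant validations (lr is now {current_lr:.1e})")
            break
    # With these settings the LR is halved every 3 stagnant validations, so the
    # stop only fires after roughly 30 validations without improvement.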

Although Joeynmt doesn't stop the iteration, it skips saving the weights if the validation performance has not improved.

joeynmt/joeynmt/training.py

Lines 654 to 658 in 0968187

    # save checkpoints
    is_better = (self.stats.is_better(ckpt_score, self.ckpt_queue)
                 if len(self.ckpt_queue) > 0 else True)
    if self.num_ckpts < 0 or is_better:  # num_ckpts < 0 keeps every checkpoint
        self._save_checkpoint(new_best, ckpt_score)
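
As a standalone illustration of that behaviour (not joeynmt's actual implementation; names and numbers are made up), a "keep the k best checkpoints" policy naturally skips saving when validation stagnates:

    import heapq

    keep_best = 3
    kept = []  # min-heap of (score, path), for metrics where higher is better

    def maybe_save(score: float, path: str) -> bool:
        """Save only if the score beats the worst kept checkpoint."""
        if len(kept) < keep_best:
            heapq.heappush(kept, (score, path))
            return True
        if score > kept[0][0]:              # better than the worst kept one
            heapq.heapreplace(kept, (score, path))
            return True
        return False                        # stagnant validation: skip saving

    for step, bleu in enumerate([20.0, 21.5, 21.4, 21.6, 21.3]):
        print(step, "saved" if maybe_save(bleu, f"ckpt_{step}.pt") else "skipped")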

I'm not sure how many users wish to stop the iteration instead of just skipping the checkpoint. I'm happy to discuss further!!

@mcognetta
Contributor Author

Hi, thanks for the background info.

IMO, the main reason one would want a validation-based early-stopping mechanism that isn't tied directly to the learning rate is when you are training several models on a corpus for which you don't know a priori how long training will take to converge. For example, when I start on a new corpus, I would like to be able to just set max_epochs to 100 and let it run until convergence (maybe it converges in 30 epochs and I don't want to waste 70 epochs' worth of compute). Tying it to the learning rate works for the plateau scheduler (and, in theory, for the inverse sqrt one, but it would take a ton of steps to decay the LR enough to stop training), but not for the other schedulers. This is particularly helpful when you are running many jobs in parallel and don't want to waste a bunch of compute on runs that are diverging, or have to manually inspect the validation scores for all of them throughout training.
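
As a rough self-contained illustration of that last point, using the generic inverse-square-root decay formula and made-up numbers (not joeynmt defaults):

    # How long until an inverse-square-root schedule reaches a minimum LR?
    # Decay after warmup: lr(t) ~= peak_lr * sqrt(warmup / t).
    peak_lr = 7e-4
    warmup = 4000
    learning_rate_min = 1e-8

    # Solve peak_lr * sqrt(warmup / t) <= learning_rate_min for t:
    steps_needed = warmup * (peak_lr / learning_rate_min) ** 2
    print(f"{steps_needed:.1e} updates until the min-LR stop would fire")
    # -> about 2e13 updates, i.e. it effectively never fires in practice.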

I know this library has different goals than fairseq (and, in fact, this is why I am so interested in this library), but it should be noted that this functionality is the default in fairseq and most other NMT packages I have used and this meaning of patience is pretty common in the literature.

I have a fork where I implemented it in just a few lines (I kept patience the same for the plateau scheduler and introduced a new config variable that implements this idea). I will open a PR if this is not a hard no (ofc, happy to discuss further).

@may-
Collaborator

may- commented Sep 4, 2023

I understand there are such situations, and it would be nice to have.

My concern is how essential the functionality is, though. You know, I also implement many small "nice-to-have"s, but I keep them in my personal fork and don't merge them into the main branch.
To me, this sounds more like a job for a hyperparameter tuning tool such as optuna (they have pruning callbacks) and not necessarily something that needs to be coded in official joey. But yeah, maybe it's just my preference. If joey users are happy with it, I of course have no objection at all 😄

Thank you again for your effort and deep consideration! I really appreciate that. I'm so glad to see our joey is still growing 🐨

@mcognetta
Contributor Author

Understood, of course I will defer to your judgement. However, do you mind if I open (and then immediately close) a PR here, linked to this issue, so that others can use it in the future if they need it? Adding the relevant code snippets here is a little cumbersome, and it would be good for posterity. It may also serve as a place for discussion about whether this feature is wanted by many users or not.

Thanks!

@may-
Collaborator

may- commented Sep 15, 2023

Yes, please open a PR! I'm more than happy with a democratic decision.
Honestly, I'm sometimes afraid that this project has some bias towards my personal preferences. It's always nice to have fresh ideas and many eyes on it. Thank you for your effort!!
