
[R-package] De-couple evaluation monitoring from booster verbosity #6172

Open · wants to merge 2 commits into base: master
Conversation

@david-cortes (Contributor)

Fixes #6162

This PR decouples the verbosity of boosters from the printing of evaluation metrics during cross-validation when calling lgb.cv.
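For illustration, a minimal sketch of the behavior at issue (using the agaricus example data shipped with the package; see #6162 for the full report). Before this change, silencing the booster via verbose = -1L also silenced the per-iteration metric printing in lgb.cv():

    library(lightgbm)
    data(agaricus.train, package = "lightgbm")
    dtrain <- lgb.Dataset(agaricus.train$data, label = agaricus.train$label)

    # Before this PR: verbose = -1L suppresses the booster's own log
    # messages, but also suppresses the per-iteration evaluation metrics
    # that lgb.cv() would otherwise print every `eval_freq` iterations,
    # so there was no way to keep the metric lines while silencing the
    # booster's warnings.
    cv_result <- lgb.cv(
        params = list(objective = "binary", metric = "auc", verbose = -1L)
      , data = dtrain
      , nrounds = 10L
      , nfold = 3L
      , eval_freq = 2L
    )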

@jameslamb (Collaborator) left a comment

Thanks for bringing this up; I support separating the configuration of when these logs get printed.

But instead of introducing a new keyword argument in this interface, I think it'd be better to handle "print eval metrics every eval_freq iterations" via callbacks.

The same thing was done in the Python package: the keyword argument verbose_eval, whose purpose was very similar to the print_metrics argument proposed in this PR, was deprecated in #4574 and removed in #4878 in favor of users passing the lgb.log_evaluation() callback.

XGBoost does something like that for xgb.cv() in its R package, exporting a callback cb.print.evaluation() ...

https://github.com/dmlc/xgboost/blob/6c0a190f6d12d2ba6a1cabd7741881ea1913d433/R-package/NAMESPACE#L23

... and also optionally adding that callback if it wasn't provided but the keyword argument verbose is passed:

https://github.com/dmlc/xgboost/blob/6c0a190f6d12d2ba6a1cabd7741881ea1913d433/R-package/R/xgb.cv.R#L164-L166
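For example, something like this works there (a sketch based on the xgboost R API at the commit linked above, untested here): booster verbosity is off, but per-iteration evaluation printing is requested explicitly via the callback.

    library(xgboost)
    data(agaricus.train, package = "xgboost")
    dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

    # verbose = FALSE silences xgb.cv()'s automatic printing, but the
    # explicitly-supplied callback still prints metrics every 2 rounds.
    cv <- xgb.cv(
        params = list(objective = "binary:logistic", eval_metric = "auc")
      , data = dtrain
      , nrounds = 10
      , nfold = 3
      , verbose = FALSE
      , callbacks = list(cb.print.evaluation(period = 2))
    )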

So rather than introduce a new R-package-only parameter that users and maintainers have to think about, I think it'd be preferable to:

  1. export the callback cb_print_evaluation() (changing its keyword argument period to be named eval_freq for consistency):

     cb_print_evaluation <- function(period) {

  2. make callbacks = list(cb_print_evaluation(1L)) the default in lgb.cv()
  3. remove any logic in lgb.cv() that conditionally adds that callback to the list of callbacks
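Usage would then look something like this (a hypothetical sketch: cb_print_evaluation() isn't exported today, and the eval_freq argument name is the rename proposed in item 1):

    # Hypothetical: assumes cb_print_evaluation() were exported with
    # `period` renamed to `eval_freq`.
    library(lightgbm)
    data(agaricus.train, package = "lightgbm")
    dtrain <- lgb.Dataset(agaricus.train$data, label = agaricus.train$label)

    cv_result <- lgb.cv(
        params = list(objective = "binary", metric = "auc", verbose = -1L)
      , data = dtrain
      , nrounds = 10L
      , nfold = 3L
      , callbacks = list(cb_print_evaluation(eval_freq = 5L))
    )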

What do you think? Also tagging @mayer79 @jmoralez.

If you support that path but aren't interested in doing the work of exporting that callback, let me know and I'll happily do it.


Tagging related issue: #2479.

@david-cortes (Contributor, Author)

Regarding the exporting of callbacks, I think a first step would be to document their structure and what they are passed. Currently, it's very unclear to me what exactly a callback should conform to, and what the variable it gets passed contains at each point. I'll leave the exporting and documentation to the package maintainers here.

Regarding the proposed logic: I am not sure that's the best way. Currently, there are two separate callbacks, for 'record' and for 'eval_freq', which feel like they should be doing the same calculation, but with the printing one potentially emitting logs less frequently. I think it'd be easier to understand if there was only one such callback with a printing frequency argument and perhaps a flag for signaling an early stop.
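A rough sketch of the shape I have in mind (purely illustrative: the env$iteration and env$eval_list fields shown here are assumed names, not the package's actual callback protocol, which is undocumented as noted above):

    # Illustrative only: one callback that both records metrics and
    # prints them, rather than two callbacks doing the same calculation.
    cb_evaluation <- function(eval_freq = 1L, print_metrics = TRUE) {
      function(env) {
        # always record the metrics for this iteration
        # (the recording logic currently living in the 'record' callback)
        if (print_metrics && (env$iteration %% eval_freq == 0L)) {
          # emit the metric lines only every `eval_freq` iterations
          print(env$eval_list)
        }
        # an early-stopping flag could be checked / signaled here as well
      }
    }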

@jameslamb (Collaborator)

Regarding the exporting of callbacks, I think a first step would be to document their structure and what they are passed.

I agree! I can work on that (tracked in #2479).

I think it'd be easier to understand if there was only one such callback with a printing frequency argument and perhaps a flag for signaling an early stop.

That's great feedback, thank you.

Thankfully, these aren't yet really part of the R package's public API, so we have time to consider alternative designs.

Successfully merging this pull request may close these issues.

[R-package] lgb.cv mixes booster verbosity with evaluation metric monitoring