-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R-package] skip integer categorical feature check when building dataset subset (fixes #6412) #6442
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! The fix looks great. I really appreciate the thorough description... makes perfect sense to me.
I just left some comments about testing.
Could you please also add one more test in test_dataset.R
checking that the expected error message is actually raised under one of the conditions in that if
statement being modified here?
Similar to this one:
LightGBM/R-package/tests/testthat/test_dataset.R
Lines 620 to 622 in 6e78e69
expect_error({ | |
lgb.Dataset(raw_mat, categorical_feature = 2L)$construct() | |
}, regexp = "supplied a too large value in categorical_feature: 2 but only 1 features") |
@jmoralez if you have time in the next week or so, could you update this to latest |
The R-package 3.6 jobs are failing with this message (sample logs):
It seems evaluate 0.24.0 was published today which requires R >= 4.0.0 |
@jameslamb should I add something like LightGBM/.ci/test_r_package.sh Line 112 in 4401401
|
Minor remark @jameslamb: Dropping support for R < 4.0.0 should not be a too big deal. The main branch of XGBoost even requires R >= 4.3.0. |
Thanks for investigating it @jmoralez . I think we should do what we can to preserve R 3.x support for this release (including manually installing more packages in CI if necessary). The first release of R 4.x was only 4 years ago (https://cran.r-project.org/bin/windows/base/old/). By comparison, the oldest version of Python LightGBM supports (3.7) was first released about 6 years ago. (https://devguide.python.org/versions/) Could you try to get this PR working with R 3.6 if it's not too much effort, and write up an I do hear what you're saying @mayer79 , but I I've worked in organizations where a major-version upgrade of a language happens on a slightly longer timescale than 4 years. If the cost of keeping R 3.x support for the v4.4.0 release is "have to install a few more libraries manually in CI", I think it's worth it to do that. |
I can look into the MSVC issues right now. I'll do that over in a separate PR (#6476), in case you want to use this PR to debug ... but if I find a fix, I'll push it here. |
👀 I just noticed that all the Windows R-package jobs passed 11 hours ago, on this PR that has nothing to do with the R package:
Maybe we're gonna get lucky and the failing MSVC job here was just something temporary with GitHub Actions? I just restarted the failing job here, let's see. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes look great, and thanks for the tests!
The job passed! Ok I'm treating this as just something temporary, possibly fixed by the latest updates to the |
Fixes #6412
When creating the folds for cv we slice the dataset, which just calls the constructor passing the same arguments and the
used_indices
. Iffree_raw_data = TRUE
then by the time this constructor is called there isn't raw data anymore, however, we also don't need to check if the categorical feature indices are out of bounds because this has already been checked when constructing the original dataset (and the categorical features can't be changed once the dataset has been constructed).This adds a check to see if we're building a subset and skips the validation. We can rely on that condition because it's checked afterwards
LightGBM/R-package/R/lgb.Dataset.R
Lines 202 to 212 in 6e78e69