Warnings regarding scikit-learn version: Version mismatch might lead to invalid results #22

bockthom · 2023-06-12T09:23:00Z

Up to now, I used BoDeGHa with scikit-learn version 0.22, as stated in requirements.txt:

scikit-learn == 0.22

However, a fresh installation of BoDeGHa uses scikit-learn version 1.0.1, since this is the version given in setup.py:

BoDeGHa/setup.py

Line 34 in ac8a5d6

'scikit-learn == 1.0.1',

But using 1.0.1 leads to warnings when running BoDeGHa, as the pretrained model was trained with 0.22:

bodegha/lib/python3.10/site-packages/sklearn/base.py:324: UserWarning: 
Trying to unpickle estimator DecisionTreeClassifier from version 0.22 when using version 1.0.1. 
This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations
warnings.warn(
bodegha/lib/python3.10/site-packages/sklearn/base.py:324: UserWarning: 
Trying to unpickle estimator RandomForestClassifier from version 0.22 when using version 1.0.1. 
This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations
warnings.warn(
bodegha/lib/python3.10/site-packages/sklearn/base.py:438: UserWarning: 
X has feature names, but RandomForestClassifier was fitted without feature names
warnings.warn(
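Until the versions are aligned, the version-mismatch warning shown above can be escalated to an error so that it cannot be overlooked in a pipeline. This is a minimal sketch using only the standard `warnings` module; the helper name `load_strictly` is hypothetical, and the message prefix and the `UserWarning` category are taken from the log above.

```python
import warnings

# Assumption: prefix of the warning text as printed in the log above.
# The leading \s* tolerates a newline before the message.
VERSION_WARNING = r"\s*Trying to unpickle estimator"


def load_strictly(loader, *args, **kwargs):
    """Call loader(*args, **kwargs); raise if it emits the version warning.

    `loader` is any callable that loads the model (for example pickle.load).
    Other warnings keep their normal behaviour.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "error", message=VERSION_WARNING, category=UserWarning
        )
        return loader(*args, **kwargs)
```

Assuming the model is stored as a pickle file, usage would look like `load_strictly(pickle.load, file_handle)`; the feature-name warning is unaffected by this filter.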

So the scikit-learn versions pinned in your repository contradict each other, and this needs to be fixed: using a pretrained model with a scikit-learn version other than the one it was trained with could lead to wrong results.
To fix this, either set the scikit-learn version in setup.py back to 0.22, or provide a new pretrained model for 1.0.1 in the repository.
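A small consistency check could catch this kind of drift between the two files early. The following is only a sketch: the helper names are hypothetical, the two strings mirror the pins quoted above, and only exact `==` pins are handled.

```python
import re

# Matches an exact pin such as "scikit-learn == 0.22" and captures the version.
PIN = re.compile(r"scikit-learn\s*==\s*([0-9][0-9A-Za-z.]*)")


def pinned_versions(text):
    """Return the set of scikit-learn versions pinned with '==' in text."""
    return set(PIN.findall(text))


def pins_agree(*texts):
    """True if all given file contents pin at most one distinct version."""
    versions = set()
    for text in texts:
        versions |= pinned_versions(text)
    return len(versions) <= 1


# The two pins quoted in this issue.
requirements_txt = "scikit-learn == 0.22\n"
setup_py = "    'scikit-learn == 1.0.1',\n"

print(pins_agree(requirements_txt, setup_py))  # the pins differ
```

In practice the two strings would be read from requirements.txt and setup.py, for example in a CI step.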

I tried to set the version of scikit-learn in setup.py back to 0.22, but without success: scikit-learn 0.22 is no longer compatible with the current version of numpy (AttributeError: module 'numpy' has no attribute 'float'. `np.float` was a deprecated alias for the builtin 'float'). Downgrading numpy to version 1.19.5 (the last version before the deprecation of np.float) was not possible, as numpy 1.19.5 does not work with Python 3.10. Installing numpy 1.21.2 (which is compatible with Python 3.10) results in another error (ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject). I also tried other numpy versions between 1.19.5 and 1.21.2, also without success.
So, in the end, I did not manage to install scikit-learn 0.22, the version with which your pretrained model was trained, on Python 3.10.

Could you please update the pretrained model in this repository to work with scikit-learn 1.0.1? Alternatively, could you show that using your 0.22-pretrained model with 1.0.1 still gives correct results, and suppress the corresponding warnings?
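One way to check whether the old model still behaves the same would be to record its predictions on a fixed set of inputs in an environment where scikit-learn 0.22 still installs (an assumption: presumably an older Python version), and compare them with the predictions obtained under 1.0.1. A minimal sketch of the comparison step; the helper name and the example labels are purely illustrative:

```python
def prediction_mismatches(reference, candidate):
    """Return the indices at which two prediction lists disagree.

    reference: predictions recorded with the training-time version
    candidate: predictions for the same inputs with the newer version
    """
    if len(reference) != len(candidate):
        raise ValueError("prediction lists must have the same length")
    return [
        index
        for index, (ref, cand) in enumerate(zip(reference, candidate))
        if ref != cand
    ]


# Illustrative labels only, not real BoDeGHa output.
old_env = ["Bot", "Human", "Human"]
new_env = ["Bot", "Human", "Bot"]
print(prediction_mismatches(old_env, new_env))  # indices that differ
```

An empty result on a representative input set would be evidence, though not a proof, that the version change does not alter the results.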

Thanks in advance! This would help a lot, and your tool would be more reliable to use once these risk warnings disappear 😉


AlexandreDecan · 2023-06-17T08:42:10Z

@mehdigolzadeh can you take care of this? Is there an easy way to convert the model to the more recent version of sklearn?

AlexandreDecan · 2023-06-17T08:45:42Z

https://scikit-learn.org/stable/model_persistence.html#interoperable-formats

bockthom · 2023-07-29T11:18:54Z

Is there any news regarding this issue?

AlexandreDecan · 2023-07-30T14:37:39Z

I sent an email to the maintainer. I think he changed his email address, which would explain why he's not even aware of this issue :-) Let's wait a few days for him to react.

As a side note, a researcher in my team is currently working on a new approach to detect bots in repositories hosted on GitHub, based on the various activities they perform. The main difference compared to BoDeGHa is that the new model/tool will rely on a limited number of queries to the GitHub API, so it should be much faster at detecting bots in practice. So far, we have no insight into the accuracy of this approach, but we are confident it will be at least comparable to BoDeGHa's accuracy. That said, do not expect the tool to be released before October/November :-)

AlexandreDecan · 2023-08-10T13:12:46Z

@mehdigolzadeh Any update?

bockthom · 2023-08-28T08:32:03Z

Still no reaction from the maintainer?
@AlexandreDecan Have you been able to successfully contact him via email?

(And also thanks for your side note. Nevertheless, I would like to stay with BoDeGHa, at least for a certain time, as it is already part of my toolchain, and changing tools always implies additional effort...)

AlexandreDecan · 2023-08-28T10:43:55Z

He reacted by mail saying he would give some feedback "soon"... :-) I've just sent another email.

mehdigolzadeh · 2023-08-30T22:06:48Z

I apologize for the delayed response; I've been swamped with numerous tasks. Unfortunately, I couldn't find the time to run and train a new model, but I did come up with a quick temporary fix. The warning is still present, but I've ignored it because the model is functioning without any problems. I plan to train the model using the new version of scikit-learn as soon as I have some free time.

AlexandreDecan · 2023-08-31T06:21:15Z

If the model is still working with the new version of sklearn, would it be possible to load it in the new version and to export it with the new model format?
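The load-and-export round trip suggested here could look like the sketch below. It assumes the model is stored as a plain pickle file; the function name and paths are hypothetical, and a joblib-based file would need `joblib.load` and `joblib.dump` instead.

```python
import pickle


def reexport(src_path, dst_path):
    """Load a pickled model and write it back under the installed versions.

    Run inside the environment with the newer scikit-learn. Loading still
    emits the version warning once; the re-exported file records the
    installed version, so later loads no longer warn.
    """
    with open(src_path, "rb") as handle:
        model = pickle.load(handle)  # only unpickle files from a trusted source
    with open(dst_path, "wb") as handle:
        pickle.dump(model, handle)
    return model
```

Note that this only changes the version recorded in the file. It does not by itself show that the re-exported model predicts the same as under the original version, so a comparison of predictions is still worthwhile.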

mehdigolzadeh · 2023-08-31T08:00:54Z

I did this. Now, the model is exported using the new version of scikit-learn. However, I couldn't resolve the warning because the parameter needs to be passed during training.
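The remaining "X has feature names" warning arises when a table with column names is passed to an estimator that was fitted on a plain array. Short of retraining, one possible workaround is to strip the column names before predicting. A minimal sketch with a hypothetical helper; it assumes the input exposes a `to_numpy()` method (as pandas DataFrames do) and that the columns are already in the order used during training:

```python
def without_feature_names(X):
    """Return X as a plain array if it offers to_numpy(), else X unchanged.

    Passing a plain array means the estimator sees no column names, so it
    has nothing to compare against the (missing) names from training.
    """
    to_numpy = getattr(X, "to_numpy", None)
    return to_numpy() if callable(to_numpy) else X
```

A call such as `model.predict(without_feature_names(features))` would then replace `model.predict(features)`; the variable names here are illustrative.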

AlexandreDecan assigned mehdigolzadeh Jun 17, 2023
