-
Notifications
You must be signed in to change notification settings - Fork 634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The fixed value for feel_zeroes in get_binned_data may lead to deviation in some case. #334
Comments
Hi @chi2liu, Thanks for reporting it! We will not be able to address it immediately as the ongoing API update is currently a priority, but I added it to the list of feature requests we regularly review. Will get there! We also welcome a contribution if you want to fix it! |
Wow, I managed to mess up a bit with this issue by adding many different references in my local commits! I added opened a pull request for this with a temporal change. I added an if statement:
So, I basically did what you mentioned @chi2liu but instead of one-tenth I added 10^6. This is just to make it as close to zero (the actual value). For your case (and other similar) this will increase the Kullback Leibler divergence drift score. |
@AntonisCSt I don't think it's meaningful to use 0.0001 as threshold. Maybe it should be taken as 10^6 of the minimum value for all. Because we want to make it as close to zero (the actual value) as you mentioned. Why is the exception for greater than 0.0001? |
@chi2liu yes good point. That definitely needs a discussion. I kept it so it could contain the initial logic of the creator. But I agree. I will put an additional argument for my fix of why 10^6 and not 10^9. (hehe) |
* fix for issue #334 feel zeroes of stattests utils * black fix
Hi @emeli-dral hope you are well. thanks! |
Definitely makes more sense now ^^ |
In the implemntation of get_binned_data
evidently/src/evidently/calculations/stattests/utils.py
Line 33 in 4e4637d
By default, 0.0001 is used to fill the value with a percent of 0. It is valid when the minimum percent of non-zero is greater than 0.0001. But when the minimum percent in the data is less than or equal to 0.0001, like 0.00001. After we filling zeros, the original zero percent is now 0.0001, which is larger than the existing minimum percent(0.00001), and the data distribution is wrong after filling zeros.
The default value of fill zeros in this place, I think, should be set dynamically according to the percentage of data, rather than being fixed directly. How to set dynamically needs further discussion. For example, it should be set to one tenth of the minimum percent, or other values, but it must be guaranteed to be smaller than the minimum percent, otherwise it will be wrong.
I found this problem because when I tried to calculate Kullback Leibler divergence drift score on my data, I found that the minimum percent of my data was 0.00001, which was smaller than the default fill zeros, which resulted in incorrect Kullback Leibler divergence drift score. For example, the default calculated Kullback Leibler divergence drift score in my data is 0.41. After I change fill zeros to one tenth of the minimum percent, the calculated Kullback Leibler divergence drive score is 1.9.
The text was updated successfully, but these errors were encountered: