Hello,
First of all, thank you for building this amazing product. I use it all the time in my job and find it very useful and performant.
Recently, while working on a complex multi-class problem (8 classes) with a few high-dimensional categorical features, I noticed that wrapping CatBoostClassifier in sklearn.multiclass.OneVsRestClassifier improves my prediction performance by ~10%.
This improvement holds at equal total model size: for example, a single CatBoostClassifier with 4000 predictors trained to predict all 8 classes is outperformed by 8 binary CatBoostClassifier models with 500 predictors each (all other hyperparameters being the same).
Given this performance difference, and the fact that the sklearn wrapper is quite slow and inefficient, I was wondering whether a one-vs-rest multi-class approach could be implemented natively in CatBoostClassifier, as this would probably be much faster and better optimized.
I could find no reference to this feature in the docs, so I assume it is not implemented. If it already is, please let me know.
Thanks,
Kind regards,
Francesco