
Feature importance / model inspection #403

Open · baggepinnen opened this issue Dec 20, 2019 · 12 comments
Labels: design discussion, enhancement

baggepinnen commented Dec 20, 2019

It would be nice to have some integrated tools for model inspection and feature importance (FI). Below are some links to resources and what's available in scikit-learn.

Scikit-learn exposes a number of tools for understanding the relative importance of features in a dataset. These tools are general in the sense that they can be made to work with many different kinds of models. They are organized in a module called "inspection", which I find fitting, since they all allow the user to understand or inspect the result of fitting a model in ways other than simply measuring error/accuracy. Some of them are linked below.

tlienart (Collaborator) commented

Thanks for this.

For bagging ensembles it's reasonably straightforward. Some models we interface with also have it natively (e.g. XGBoost), so it can just be surfaced through the interface (via the report); I'll have a look into this for XGBoost.

Support for permutation/drop-column FI seems reasonably easy; the only real question is where the implementation would live, perhaps a dedicated "model inspection" module or package in MLJ, or something of the sort.

The rest of your suggestions are a bit trickier. LIME is nice, but it is basically an entire package in itself; likewise SHAP, which is cited in the article in your last point.
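
To make the permutation idea concrete, here is a rough sketch of what a model-agnostic implementation over a fitted MLJ machine could look like. Nothing below is an existing MLJ API: `permutation_importance`, its keyword arguments, and the assumption of a deterministic predictor are all mine.

```julia
# Rough sketch only: permutation feature importance over a fitted MLJ machine.
# Hypothetical function, not an MLJ API; assumes a deterministic predictor.
using MLJ, Tables, Random, Statistics

function permutation_importance(mach, X, y; measure = rms, nrounds = 10,
                                rng = Random.GLOBAL_RNG)
    cols = Tables.columntable(X)                 # named tuple of column vectors
    baseline = measure(predict(mach, cols), y)   # loss on the intact data
    importances = Dict{Symbol,Float64}()
    for name in keys(cols)
        drops = map(1:nrounds) do _
            shuffled = shuffle(rng, collect(cols[name]))
            Xperm = merge(cols, [name => shuffled])  # swap in the shuffled column
            measure(predict(mach, Xperm), y) - baseline
        end
        importances[name] = mean(drops)          # mean increase in loss
    end
    return importances
end
```

Drop-column FI would be the same loop, but refitting the machine with one column removed each time, which is why it is considerably more expensive.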

tlienart added the enhancement label Dec 20, 2019
ablaom (Member) commented Dec 30, 2019

Feature importance is an interesting one, because most of the measures out there are rather ad hoc and model-dependent. That is, the very definition of feature importance depends on the model (e.g., the absolute value of a coefficient in a linear model makes no sense for a decision tree). And for certain models, e.g. trees and random forests, there are several inequivalent methods in common use. The paper cited above on SHAP describes an approach that is genuinely model-independent; unless someone is aware of another such approach, I suggest any generic MLJ tool follow that one. There is already an implementation of SHAP in Python, if I remember correctly.
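
For reference, what makes the Shapley approach model-independent is that it only ever queries the fitted predictor. The attribution of feature $i$ is its average marginal contribution over subsets $S$ of the remaining features:

$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!}\,\bigl[f_x(S \cup \{i\}) - f_x(S)\bigr],$$

where $F$ is the full feature set and $f_x(S)$ is the prediction at $x$ with only the features in $S$ known (the rest marginalized out). The sum is exponential in $|F|$, so implementations approximate it, typically by Monte Carlo sampling.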

tlienart (Collaborator) commented Feb 2, 2020

The recently created https://github.com/nredell/ShapML.jl may also be a very nice addition (it is already compatible with MLJ, as far as I can see); cc @nredell.
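
For concreteness, here is a hedged sketch of how ShapML might be driven from an MLJ machine. The keyword API of `ShapML.shap` below follows my reading of its README and may have changed, and the dataset and model are just placeholders.

```julia
# Hedged sketch: stochastic Shapley values via ShapML.jl for an MLJ machine.
# The ShapML.shap keyword names are an assumption based on its README.
using MLJ, ShapML, DataFrames

X, y = @load_boston                     # any regression dataset as (table, vector)
Tree = @load DecisionTreeRegressor pkg=DecisionTree
mach = fit!(machine(Tree(), X, y))

# ShapML expects a prediction wrapper returning a one-column DataFrame.
predict_function(model, data) = DataFrame(y_pred = predict(model, data))

explain   = DataFrame(X)[1:100, :]      # instances to explain
reference = DataFrame(X)                # background distribution

data_shap = ShapML.shap(explain = explain,
                        reference = reference,
                        model = mach,
                        predict_function = predict_function,
                        sample_size = 60,   # Monte Carlo samples per instance
                        seed = 1)
```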

nredell commented Feb 17, 2020

My plans for ShapML can be found on the Discourse site (https://discourse.julialang.org/t/ml-feature-importance-in-julia/17196/12), but I'm posting here for posterity's sake.

Just sitting down for the first round of refactoring/feature additions today. I'll code with these guidelines in mind (https://github.com/invenia/BlueStyle), as well as take a trip through the MLJ code base. And if a general feature importance package pops up in the future, I wouldn't be opposed to helping fold ShapML into it, provided it's up to par and hasn't expanded too much by then.

ablaom (Member) commented Apr 29, 2020

cc @sjvollmer (for the summer FAIRness student, if not already aware)

ablaom (Member) commented Apr 29, 2020

My current inclination is to see whether this can be satisfactorily addressed with third-party packages, such as the Shapley one. A POC would make a great MLJ tutorial.

If something more integrated makes sense, though, I'm interested to hear about it.

ablaom added the design discussion label Apr 29, 2020
vishalhedgevantage commented
Any update on feature importance integration?

ablaom (Member) commented Jul 4, 2021

@vishalhedgevantage There are some GSoC students working on better integration of interpretable machine learning (LIME/Shapley). And there is this issue, which I opened to support recursive feature elimination. However, the volunteer who had expressed an interest in the latter must have gotten busy with other things...

Moelf commented Aug 9, 2021

> For bagging ensembles it's reasonably straightforward. Some models we interface with also have it natively (e.g. XGBoost), so it can just be surfaced through the interface (via the report); I'll have a look into this for XGBoost.

Any movement on this?

ablaom (Member) commented Aug 12, 2021

@Moelf Feel free to open a request at XGBoost.jl to expose feature importance there.

To be honest, current priorities for MLJ favour pure Julia solutions. I'm pretty sure EvoTrees.jl (which has an MLJ interface) exposes feature importances in the report. Perhaps you want to check out that well-maintained tree-boosting package.
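
To make that concrete, here is a minimal sketch of reading importances out of an EvoTrees machine via MLJ; the exact report field name is an assumption on my part, so check the EvoTrees docs.

```julia
# Hedged sketch: fit EvoTrees through MLJ and read feature importances
# from the report. The `feature_importances` field name is an assumption.
using MLJ

EvoTree = @load EvoTreeRegressor pkg=EvoTrees
X, y = make_regression(200, 5)               # synthetic data from MLJ
mach = fit!(machine(EvoTree(nrounds = 50), X, y))

fi = report(mach).feature_importances        # gain-based importance per feature
```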

Moelf commented Aug 12, 2021

I think XGBoost is SOTA for many things (especially in my line of work; as it turns out, XGBoost was born in this field, amazingly enough). Of course a Julia-native XGBoost would be ideal and very cool, but I don't think it's on anyone's priority list.

ablaom (Member) commented Aug 13, 2021

EvoTrees.jl is a pure-Julia gradient tree-boosting package that already has much of the functionality found in XGBoost and, as far as I can tell, implements essentially the same algorithm. It does not have all the bells and whistles, but it is being actively developed.
