
Feature importance / model inspection #403

Open · baggepinnen opened this issue Dec 20, 2019 · 12 comments
Labels: design discussion, enhancement

baggepinnen commented Dec 20, 2019

It would be nice to have some integrated tools for model inspection and feature importance (FI). Below are some links to resources and what's available in scikit-learn.

Scikit-learn exposes a number of tools for understanding the relative importance of features in a dataset. These tools are general in the sense that they can be made to work with many different kinds of models. They are organized in a module called "inspection", which I find fitting, since they all allow the user to understand or inspect the result of fitting a model in ways other than simply measuring error/accuracy. Some of them are linked below.

tlienart (Collaborator) commented

Thanks for this.

For bagging ensembles it's reasonably straightforward. Some models we interface with also have it natively (e.g. XGBoost), so it can just be surfaced through the interface (via the report); I'll have a look into this for XGBoost.

Support for permutation/drop-column FI seems reasonably easy; the only real question is where the implementation would live, perhaps a dedicated "model inspection" module or package in MLJ, or something of the sort.

The rest of your suggestions are a bit trickier. LIME is nice, but it is basically an entire package in itself; likewise SHAP, which is cited in the article in your last point.
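
To make the permutation idea concrete, here is a rough sketch of what a model-agnostic implementation over a fitted MLJ machine could look like. Nothing below is an existing MLJ API: `permutation_importance`, its keyword arguments, and the assumption of a deterministic predictor are all mine.

```julia
# Rough sketch only: permutation feature importance over a fitted MLJ machine.
# Hypothetical function, not an MLJ API; assumes a deterministic predictor.
using MLJ, Tables, Random, Statistics

function permutation_importance(mach, X, y; measure = rms, nrounds = 10,
                                rng = Random.GLOBAL_RNG)
    cols = Tables.columntable(X)                 # named tuple of column vectors
    baseline = measure(predict(mach, cols), y)   # loss on the intact data
    importances = Dict{Symbol,Float64}()
    for name in keys(cols)
        drops = map(1:nrounds) do _
            shuffled = shuffle(rng, collect(cols[name]))
            Xperm = merge(cols, [name => shuffled])  # swap in the shuffled column
            measure(predict(mach, Xperm), y) - baseline
        end
        importances[name] = mean(drops)          # mean increase in loss
    end
    return importances
end
```

Drop-column FI would be the same loop, but refitting the machine with one column removed each time, which is why it is considerably more expensive.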

tlienart added the enhancement label Dec 20, 2019
ablaom (Member) commented Dec 30, 2019

Feature importance is an interesting one, because most of the measures out there are rather ad hoc and model-dependent. That is, the very definition of feature importance depends on the model (e.g., the absolute value of a coefficient in a linear model makes no sense for a decision tree). And for certain models, e.g. trees and random forests, there are several inequivalent methods in common use. The paper cited above on SHAP describes an approach that is genuinely model-independent; unless someone is aware of another such approach, I suggest any generic MLJ tool follow that one. There is already an implementation of SHAP in Python, if I remember correctly.
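
For reference, what makes the Shapley approach model-independent is that it only ever queries the fitted predictor. The attribution of feature $i$ is its average marginal contribution over subsets $S$ of the remaining features:

$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!}\,\bigl[f_x(S \cup \{i\}) - f_x(S)\bigr],$$

where $F$ is the full feature set and $f_x(S)$ is the prediction at $x$ with only the features in $S$ known (the rest marginalized out). The sum is exponential in $|F|$, so implementations approximate it, typically by Monte Carlo sampling.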

tlienart (Collaborator) commented Feb 2, 2020

The recently created https://github.com/nredell/ShapML.jl may also be a very nice addition (it is already compatible with MLJ, as far as I can see); cc @nredell.
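
For concreteness, here is a hedged sketch of how ShapML might be driven from an MLJ machine. The keyword API of `ShapML.shap` below follows my reading of its README and may have changed, and the dataset and model are just placeholders.

```julia
# Hedged sketch: stochastic Shapley values via ShapML.jl for an MLJ machine.
# The ShapML.shap keyword names are an assumption based on its README.
using MLJ, ShapML, DataFrames

X, y = @load_boston                     # any regression dataset as (table, vector)
Tree = @load DecisionTreeRegressor pkg=DecisionTree
mach = fit!(machine(Tree(), X, y))

# ShapML expects a prediction wrapper returning a one-column DataFrame.
predict_function(model, data) = DataFrame(y_pred = predict(model, data))

explain   = DataFrame(X)[1:100, :]      # instances to explain
reference = DataFrame(X)                # background distribution

data_shap = ShapML.shap(explain = explain,
                        reference = reference,
                        model = mach,
                        predict_function = predict_function,
                        sample_size = 60,   # Monte Carlo samples per instance
                        seed = 1)
```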

nredell commented Feb 17, 2020

My plans for ShapML can be found on the Discourse site (https://discourse.julialang.org/t/ml-feature-importance-in-julia/17196/12), but I'm posting here for posterity's sake.

Just sitting down for the first round of refactoring/feature additions today. I'll code with these guidelines in mind (https://github.com/invenia/BlueStyle), as well as take a trip through the MLJ code base. And if a general feature importance package pops up in the future, I wouldn't be opposed to helping fold ShapML into it, provided it's up to par and hasn't expanded too much by then.

ablaom (Member) commented Apr 29, 2020

cc @sjvollmer (for the summer FAIRness student, if not already aware)

ablaom (Member) commented Apr 29, 2020

My current inclination is to see whether this can be satisfactorily addressed with third-party packages, such as the Shapley one. A POC would make a great MLJ tutorial.

If something more integrated makes sense, though, I'm interested to hear about it.

ablaom added the design discussion label Apr 29, 2020
vishalhedgevantage commented
Any update on feature importance integration?

ablaom (Member) commented Jul 4, 2021

@vishalhedgevantage There are some GSoC students working on better integration of interpretable machine learning (LIME/Shapley). And there is this issue, which I opened to support recursive feature elimination. However, the volunteer who had expressed an interest in the latter must have gotten busy with other things...

Moelf commented Aug 9, 2021

> For bagging ensembles it's reasonably straightforward. Some models we interface with also have it natively (e.g. XGBoost), so it can just be surfaced through the interface (via the report); I'll have a look into this for XGBoost.

Any movement on this?

ablaom (Member) commented Aug 12, 2021

@Moelf Feel free to open a request at XGBoost.jl to expose feature importance there.

To be honest, current priorities for MLJ favour pure Julia solutions. I'm pretty sure EvoTrees.jl (which has an MLJ interface) exposes feature importances in the report. Perhaps you want to check out that well-maintained tree-boosting package.
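
To make that concrete, here is a minimal sketch of reading importances out of an EvoTrees machine via MLJ; the exact report field name is an assumption on my part, so check the EvoTrees docs.

```julia
# Hedged sketch: fit EvoTrees through MLJ and read feature importances
# from the report. The `feature_importances` field name is an assumption.
using MLJ

EvoTree = @load EvoTreeRegressor pkg=EvoTrees
X, y = make_regression(200, 5)               # synthetic data from MLJ
mach = fit!(machine(EvoTree(nrounds = 50), X, y))

fi = report(mach).feature_importances        # gain-based importance per feature
```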

Moelf commented Aug 12, 2021

I think XGBoost is SOTA for many things (especially in my line of work; as it turns out, XGBoost was born in this field, amazingly enough). Of course a Julia-native XGBoost would be ideal and very cool, but I don't think it's on anyone's priority list.

ablaom (Member) commented Aug 13, 2021

EvoTrees.jl is a pure-Julia gradient tree-boosting package that already has much of the functionality found in XGBoost and, as far as I can tell, implements essentially the same algorithm. It does not have all the bells and whistles, but it is being actively developed.
