Improved feature importance support #747

Open
8 of 13 tasks
ablaom opened this issue Feb 28, 2021 · 5 comments

@ablaom
Member

ablaom commented Feb 28, 2021

The MLJ model API only says that models reporting feature importances
should report them in the report output by fit. It says
nothing about the actual format of this output, and I can see
inconsistencies in the implementations. Feature importances are used
by some meta-algorithms, such as RecursiveFeatureElimination (#426), so this
might be worth sorting out.
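To illustrate why a fixed format matters to a meta-algorithm, here is a minimal sketch of one elimination step that something like RecursiveFeatureElimination might perform. It assumes importances arrive as a vector of `Symbol => score` pairs (the format suggested later in this thread). The function name `drop_least_important` and the features `x1`, `x2`, `x3` are hypothetical and not part of MLJ.

```julia
# Hypothetical helper (not part of MLJ): one step of recursive feature
# elimination, given importances as a vector of Symbol => score pairs.
function drop_least_important(importances::AbstractVector{<:Pair{Symbol}})
    isempty(importances) && throw(ArgumentError("no features to eliminate"))
    # `findmin` returns (smallest score, its index).
    _, i = findmin(last.(importances))
    # Keep every feature name except the one with the smallest score.
    kept = [first(p) for (j, p) in enumerate(importances) if j != i]
    return kept, first(importances[i])
end

# Illustrative scores for made-up features x1, x2, x3.
importances = [:x1 => 0.23, :x2 => 0.4, :x3 => 0.1]
kept, dropped = drop_least_important(importances)
println(dropped)  # x3
println(kept)     # [:x1, :x2]
```

Without an agreed convention, each meta-algorithm would need model-specific code to locate and parse the scores in the report before it could take even this simple step.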

I propose adding a new method feature_importance(model::Model, report) to the model API that reports the scores according to some
fixed convention. Some models (e.g., LightGBM models) report multiple
types of importance score, so I propose that this method return a named
tuple keyed on the type, whose values are Float64 vectors.
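A minimal sketch of this proposal, written against stand-in types rather than the real MLJ API. It assumes, following the training_losses pattern, that a generic fallback returns `nothing` for models without importances. `Model`, `ToyBooster`, `ToyRegressor`, and the report fields `gain` and `split` are all hypothetical.

```julia
# Hypothetical stand-ins for the MLJ types; not the real MLJ API.
abstract type Model end

# Fallback: models that do not compute feature importances return `nothing`.
feature_importance(model::Model, report) = nothing

# A hypothetical model whose `fit` report stores two kinds of scores.
struct ToyBooster <: Model end

function feature_importance(::ToyBooster, report)
    # Named tuple keyed on the importance type, with Float64 vectors as values.
    return (gain = Float64.(report.gain), split = Float64.(report.split))
end

# A hypothetical model that reports no importances.
struct ToyRegressor <: Model end

report = (gain = [3.0, 1.5], split = [10, 4])
fi = feature_importance(ToyBooster(), report)
println(fi.gain)   # [3.0, 1.5]
println(fi.split)  # [10.0, 4.0]
println(feature_importance(ToyRegressor(), report) === nothing)  # true
```

Converting every score vector to `Float64` gives callers a single element type to handle, even when a model stores integer counts (as the `split` field does here).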

Edit: See the suggested format below.

Edit: The proposal follows the same interface pattern that we already have for training_losses.

Thoughts anyone?

TODO:

@ablaom ablaom changed the title Improved feature importance support Meta-tracking issue: Improved feature importance support Mar 1, 2021
@ablaom
Member Author

ablaom commented Mar 1, 2021

cc @boliu-christine

@ablaom
Member Author

ablaom commented Dec 21, 2021

Here's an update on my suggestion for the format of feature importances, as returned by the proposed method feature_importances(model, report).

I think allowing models to expose multiple types of feature importance is overkill and excessively complicated. Of course, multiple scores can still be declared in the report itself.

So I suggest a vector of name => float pairs, where name is a symbol:

v = [:gender => 0.23, :height => …, :weight => 0.1]
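A minimal sketch of how a model might expose this pair-vector format while still keeping several scores in its report, using stand-in types rather than the real MLJ API. `Model`, `ToyBooster`, and the report fields `feature_names`, `gain`, and `split` are hypothetical, and choosing `gain` as the exposed score is an arbitrary assumption for illustration.

```julia
# Hypothetical sketch; `Model`, `ToyBooster` and the report fields are assumptions.
abstract type Model end
struct ToyBooster <: Model end

# Generic fallback for models that do not report importances.
feature_importances(model::Model, report) = nothing

function feature_importances(::ToyBooster, report)
    # The report may hold several kinds of scores; only one is exposed here.
    scores = report.gain
    # Pair each feature name (a Symbol) with its score as a Float64.
    return [name => Float64(s) for (name, s) in zip(report.feature_names, scores)]
end

report = (feature_names = [:x1, :x2, :x3],
          gain = [0.23, 0.4, 0.1],
          split = [12, 30, 5])
v = feature_importances(ToyBooster(), report)

# Callers can rank features without knowing anything model-specific.
ranked = sort(v; by = last, rev = true)
println(first(ranked))
```

The full set of scores (`split` here) remains available in the report for users who want it, while `feature_importances` exposes one score per feature in a uniform format.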

@zsz00
Contributor

zsz00 commented Jan 29, 2022

What is the current state of this??
I need feature importance support!

@OkonSamuel
Member

> What is the current state of this?? I need feature importance support!

I am still working on this. It will be done soon.

@zsz00
Contributor

zsz00 commented Mar 6, 2022

What is the current state of this, @OkonSamuel?

@ablaom ablaom pinned this issue Apr 11, 2022
@ablaom ablaom changed the title Meta-tracking issue: Improved feature importance support Improved feature importance support May 19, 2022