
Add feature_clustering_selection method #209

Open: wants to merge 5 commits into base: master


@brunoleme commented Aug 24, 2022

Status: READY

Todo list

  • Documentation
  • Tests added and passed

Background context

This is a correlation-based feature selection method. Unlike the existing correlation_feature_selection, which has no criterion for choosing among correlated features, feature_clustering_selection first clusters the features using a distance based on absolute correlation, and then selects the feature with the lowest 1-R² metric from each cluster. The 1-R² metric identifies the feature that best preserves the information of the other features in its own cluster (own-cluster R²), penalized by the information it shares with the nearest cluster (nearest-cluster R²).
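A minimal NumPy sketch of the two steps described above. It is not the code in this pull request and does not use fklearn's API. The names cluster_features, first_component, r2 and select_features and the threshold parameter are hypothetical. The sketch makes three assumptions. Clustering is single linkage on the distance 1 - |corr|. Each cluster is represented by its first principal component. The 1-R² metric is taken as the SAS VARCLUS ratio (1 - R²_own) / (1 - R²_nearest), where R²_own is the feature's squared correlation with its own cluster's component and R²_nearest is the highest squared correlation with any other cluster's component.

```python
import numpy as np


def cluster_features(X, threshold=0.5):
    """Label each column of X with a cluster id (hypothetical helper).

    The distance between two features is 1 - |corr|. Features are linked when
    their distance is below `threshold`, and clusters are the connected
    components of that graph (single linkage).
    """
    n_features = X.shape[1]
    dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
    labels = np.full(n_features, -1)
    next_label = 0
    for start in range(n_features):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first flood fill over linked features
            i = stack.pop()
            for j in np.flatnonzero(dist[i] < threshold):
                if labels[j] == -1:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def first_component(Z):
    """Scores of the first principal component of the standardized columns Z."""
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[0]


def r2(a, b):
    """Squared Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1] ** 2


def select_features(X, threshold=0.5):
    """Return one column index per cluster: the one with the lowest 1-R² ratio."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant columns
    labels = cluster_features(Z, threshold)
    clusters = np.unique(labels)
    components = {c: first_component(Z[:, labels == c]) for c in clusters}
    selected = []
    for c in clusters:
        best, best_ratio = None, np.inf
        for f in np.flatnonzero(labels == c):
            r2_own = r2(Z[:, f], components[c])
            # R² with the closest *other* cluster; 0 if there is only one cluster
            r2_nearest = max(
                (r2(Z[:, f], components[k]) for k in clusters if k != c),
                default=0.0,
            )
            ratio = (1.0 - r2_own) / max(1.0 - r2_nearest, 1e-12)
            if ratio < best_ratio:
                best, best_ratio = int(f), ratio
        selected.append(best)
    return selected


# Usage on synthetic data: columns 0-2 share one signal, 3-4 another, 5 is noise.
rng = np.random.default_rng(0)
n = 500
s1, s2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([
    s1 + 0.1 * rng.normal(size=n),
    s1 + 0.1 * rng.normal(size=n),
    s1 + 0.5 * rng.normal(size=n),
    s2 + 0.1 * rng.normal(size=n),
    s2 + 0.1 * rng.normal(size=n),
    rng.normal(size=n),
])
labels = cluster_features(X)
selected = select_features(X)
print(labels, selected)
```

The sketch uses a fixed-threshold single-linkage rule so that it only depends on NumPy. A hierarchical clustering routine with a chosen cut height could be used in its place without changing the per-cluster selection step.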

Description of the changes proposed in the pull request

This pull request adds the feature selection method feature_clustering_selection to fklearn/tuning/model_agnostic_fc.py.

Where should the reviewer start?

The reviewer should start with the feature_clustering_selection method in src/fklearn/tuning/model_agnostic_fc.py.
The test test_feature_clustering_selection in fklearn/tests/tuning/test_model_agnostic_fc.py illustrates how the method is used.

@brunoleme requested a review from a team as a code owner on August 24, 2022, 05:15.