Add feature_clustering_selection method #209
Open
+219
−1
Status
READY
Todo list
Background context
This is a correlation-based feature selection method. Unlike the existing correlation_feature_selection, which has no criterion for choosing among correlated features, feature_clustering_selection first clusters the features, using absolute correlation as the distance metric, and then selects from each cluster the feature with the lowest 1-R2 metric. The 1-R2 metric identifies the feature that best preserves the information carried by the other features in its own cluster (own-cluster R2), penalized by the information it shares with the nearest cluster (nearest-cluster R2).
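To make the idea concrete, here is a minimal pure-Python sketch of the two steps described above. It is not the code in this PR, and several pieces are assumptions made only for illustration: the helper names (`cluster_features`, `one_minus_r2_ratio`, `select_features`), the `max_distance` threshold, the use of single-linkage grouping on the distance 1 - |correlation|, the approximation of each cluster R2 by the mean squared pairwise correlation, and the ratio form (1 - own R2) / (1 - nearest R2). It also assumes no feature is constant.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cluster_features(data, max_distance=0.5):
    """Group features whose distance 1 - |corr| is below max_distance.

    Assumption: single-linkage grouping via union-find; the PR may use
    a different clustering algorithm.
    """
    names = list(data)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if 1 - abs(pearson(data[a], data[b])) < max_distance:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return list(clusters.values())


def mean_r2(feature, others, data):
    """Mean squared correlation of `feature` with `others` (0.0 if empty)."""
    if not others:
        return 0.0
    return sum(pearson(data[feature], data[o]) ** 2 for o in others) / len(others)


def one_minus_r2_ratio(feature, own, clusters, data):
    """(1 - own-cluster R2) / (1 - nearest-cluster R2); lower is better."""
    own_r2 = mean_r2(feature, [f for f in own if f != feature], data)
    nearest_r2 = max(
        (mean_r2(feature, c, data) for c in clusters if c is not own),
        default=0.0,
    )
    return (1 - own_r2) / max(1 - nearest_r2, 1e-12)


def select_features(data, max_distance=0.5):
    """Keep, from each cluster, the feature with the lowest 1-R2 ratio."""
    clusters = cluster_features(data, max_distance)
    return [
        min(c, key=lambda f: one_minus_r2_ratio(f, c, clusters, data))
        for c in clusters
    ]


# Toy data: "a"/"b" are nearly collinear, as are "c"/"d".
example = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "b": [1.1, 1.9, 3.2, 3.8, 5.1, 5.9, 7.2, 7.8],
    "c": [1, -1, -1, 1, 1, -1, -1, 1],
    "d": [1.2, -0.9, -1.1, 0.8, 1.1, -1.2, -0.8, 0.9],
}
print(select_features(example))
```

In a two-feature cluster both members have the same own-cluster R2, so the choice is decided by the nearest-cluster term: the feature less correlated with the other cluster is kept.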
Description of the changes proposed in the pull request
This commit adds the feature selection method feature_clustering_selection to fklearn/tuning/model_agnostic_fc.py.
Where should the reviewer start?
The reviewer should start with the feature_clustering_selection method in src/fklearn/tuning/model_agnostic_fc.py.
The test test_feature_clustering_selection in fklearn/tests/tuning/test_model_agnostic_fc.py illustrates how the method is used.