Port sample/feature selection from equisolve to metatensor-learn
#560
base: master
Conversation
Here is a pre-built version of the code in this pull request: wheels.zip. You can install it locally by unzipping it.
What is the status here? The code looks good, as it is just ported. There seems to be an import issue, though.

For me the code looks good — basically because I wrote it back in good old equisolve times. @Luthaf do you also want to take a look?
The functions need to be added to the docs! It would also be nice to have a tutorial on sample/feature selection.
@@ -0,0 +1,172 @@
import functools
can we avoid creating a bunch of utils just for this feature? Ideally this would use existing TensorMap fixtures/files from the repo.
@@ -136,6 +137,7 @@ deps =
    {[testenv]packaging_deps}
    {[testenv]testing_deps}
    torch=={env:METATENSOR_TESTS_TORCH_VERSION:2.2.*}
    skmatter
should we have one test environment without optional dependencies, to make sure we can run without them?
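One possible shape for such an environment, sketched against the tox conventions visible in the diff above (the environment name and marker-free pytest invocation are illustrative assumptions, not the repo's actual configuration):

```ini
# Hypothetical tox environment: same core test deps, but deliberately
# without optional packages such as skmatter, to verify the package
# still imports and its test suite runs (or skips cleanly) without them.
[testenv:no-optional-deps]
deps =
    {[testenv]packaging_deps}
    {[testenv]testing_deps}
commands =
    pytest {posargs}
```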
self._select_distance = None

@property
def selector_class(self) -> Type[skmatter._selection.GreedySelector]:
why do we need to re-export all these properties?
We don't have to. We could have a very slim version that only exposes `fit`, `transform`, `fit_transform`, and `inverse_transform`. I added them in the past to be API compatible with the numpy versions in skmatter. But especially `selector_class` seems a bit superfluous now that I read this again.
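To make the "slim version" idea concrete, here is a minimal sketch of a wrapper exposing only those four methods. This is illustrative only: a plain list of selected column indices stands in for the fitted skmatter selector, and the class name is made up.

```python
import numpy as np


class SlimSelector:
    """Hypothetical slim selector: only fit/transform/fit_transform/
    inverse_transform are public; everything else stays internal."""

    def __init__(self, column_indices):
        # stands in for the indices a fitted skmatter selector would pick
        self._idx = list(column_indices)
        self._n_features_in = None

    def fit(self, X):
        self._n_features_in = X.shape[1]
        return self

    def transform(self, X):
        # keep only the selected columns
        return X[:, self._idx]

    def fit_transform(self, X):
        return self.fit(X).transform(X)

    def inverse_transform(self, X_reduced):
        # scatter the selected columns back, zero-filling the rest
        out = np.zeros((X_reduced.shape[0], self._n_features_in))
        out[:, self._idx] = X_reduced
        return out
```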
@property
def support(self) -> TensorMap:
    """
    TensorMap containing the support.
If we keep this property, it should have more information on what the support is, and what metadata the TensorMap contains
For each block, the metadata of the relevant axis (i.e. samples or properties,
depending on whether sample or feature selection is being performed) is sorted
and returned according to the Hausdorff distance, in descending order.
This is not very clear. If I do sample selection, I get a tensor with "hausdorff distance" as the property, no components, and the samples sorted? What does the output look like for property selection? I would go for the same thing, except using the input tensor properties as the samples of `get_select_distance`.
Basically you just swap samples and properties. Without looking into the code again, I assume you get one sample "hausdorff distance", no components, and the properties sorted.
@property
def get_select_distance(self) -> TensorMap:
    """
    Returns a TensorMap containing the Hausdorff distances.
Isn't the Hausdorff distance only used for FPS? Why is this function defined at the base class level instead of the FPS class?
It is. I assume it is the same in skmatter, but we can move this to only the FPS classes, or create a new base class that only the FPS selectors inherit from, to avoid duplicating this in both classes.
n_to_select: Union[int, dict],
**selector_arguments,
) -> None:
    self._selector_class = selector_class
Since we are dispatching based on this class, I would assert it is one of the expected classes.
def transform(self, X: TensorMap) -> TensorMap:
    """
    Reduce X to the selected features.
Suggested change:
- Reduce X to the selected features.
+ Reduce X to the selected samples/features.
class FPS(GreedySelector):
    """
    Transformer that performs Greedy Feature Selection using Farthest Point Sampling.
this is quite unclear to a user ...
Suggested change:
- Transformer that performs Greedy Feature Selection using Farthest Point Sampling.
+ Perform feature selection using Farthest Point Sampling (FPS).
I would also add a link to some FPS paper or a short explanation of what this is doing. Same thing for all other selector classes!
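For reference, the idea behind FPS can be sketched in a few lines of numpy: greedily pick the point farthest (in Euclidean distance) from everything selected so far, tracking each point's distance to its nearest selected point (the "Hausdorff distance" discussed above). This is illustrative only; the actual selection in this PR is delegated to skmatter.

```python
import numpy as np


def fps(X, n_to_select, first=0):
    """Toy Farthest Point Sampling over the rows of X."""
    selected = [first]
    # hausdorff[i]: distance from point i to the closest selected point
    hausdorff = np.linalg.norm(X - X[first], axis=1)
    for _ in range(n_to_select - 1):
        # greedily take the point farthest from the current selection
        new = int(np.argmax(hausdorff))
        selected.append(new)
        hausdorff = np.minimum(hausdorff, np.linalg.norm(X - X[new], axis=1))
    return selected, hausdorff
```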
If `n_to_select` is a dict, it must have keys that are tuples corresponding to the
key values of each block. In this case, the values of the `n_to_select` dict can be
ints that specify a different number of features to select for each block.
Do we really want to use a dict here? Or should we do the same trick as the other classes (i.e. `Labels` + `List[T]`)?
And the `Labels` are the keys of the blocks? And why a `List[Tensors]` and not a `List[int]`?
I meant `List[int]`; the `T` above was used as a template parameter! (This is basically `LabelsDict[int]`.)
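Whichever container is chosen, the dict variant currently in the docstring amounts to a small normalization step: expand a single int to all block keys, or validate a per-key dict. A hedged sketch (the helper name and the example key tuples are made up):

```python
def normalize_n_to_select(n_to_select, block_keys):
    """Map every block key to an int number of selections.

    Accepts either a single int (used for every block) or a dict keyed
    by each block's key tuple, as the docstring under review describes.
    """
    if isinstance(n_to_select, int):
        return {key: n_to_select for key in block_keys}
    if isinstance(n_to_select, dict):
        missing = [key for key in block_keys if key not in n_to_select]
        if missing:
            raise ValueError(f"missing n_to_select for keys: {missing}")
        return {key: int(n_to_select[key]) for key in block_keys}
    raise TypeError("n_to_select must be an int or a dict")
```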
Closes #558.
Ports over the sample/feature selection tools (i.e. FPS and CUR) for TensorMaps, from equisolve to metatensor-learn.
Contributor (creator of pull-request) checklist
Reviewer checklist
📚 Documentation preview 📚: https://metatensor--560.org.readthedocs.build/en/560/