
Missing operations transpose, swapaxis and basic arithmetic operation #101

Closed
agoscinski opened this issue Jan 20, 2023 · 7 comments
Labels
Python-API Related to the Python bindings to "core"

Comments

@agoscinski
Contributor

No description provided.

@agoscinski agoscinski added the Python-API Related to the Python bindings to "core" label Jan 20, 2023
@agoscinski agoscinski self-assigned this Jan 20, 2023
@agoscinski
Contributor Author

agoscinski commented Jan 20, 2023

I think we can generalize all functions that reduce the shape, as well as functions that keep the shape fixed. We already have reduce_over_samples_block, but I think this can be generalized even more. A generalized operation could take the following parameters:

  • operation_name: str
  • parameters: list of str, allows applying the operation to a subset of the tensor block (TB) + gradient tensor blocks (GTBs); by default the operation is applied to all of them, i.e. ['values'] + [parameter for parameter, _ in tensor_block.gradients()]
  • axes: list of str, can contain the strings 'samples', 'components', 'properties'; by default ['samples', 'components', 'properties']
  • axes_names: dict of str to list of str, for each axis specified in axes it contains a list of names within that axis over which the reduction should be applied; by default all names in the axis are used

Operations like add that do not change the shape would not need the last two parameters (see the sketch below).
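To make this more concrete, here is a minimal sketch with plain numpy arrays standing in for a TensorBlock's values and gradient data. The function name `apply_operation` and the whole interface are hypothetical (not equistore API), and `axes_names` is omitted for brevity:

```python
from typing import Dict, List, Optional

import numpy as np


def apply_operation(
    values: np.ndarray,
    gradients: Dict[str, np.ndarray],
    operation_name: str,
    parameters: Optional[List[str]] = None,
    axes: Optional[List[str]] = None,
):
    """Apply a named numpy reduction to the values and selected gradients.

    ``parameters=None`` means "values plus every gradient"; ``axes=None``
    means reduce over samples, components and properties alike.
    """
    if parameters is None:
        parameters = ["values"] + list(gradients)
    if axes is None:
        axes = ["samples", "components", "properties"]

    # map the named axes of a (samples, components, properties) array
    # onto numpy axis indices
    axis_index = {"samples": 0, "components": 1, "properties": 2}
    numpy_axes = tuple(axis_index[name] for name in axes)
    operation = getattr(np, operation_name)  # e.g. "sum" or "mean"

    new_values = (
        operation(values, axis=numpy_axes) if "values" in parameters else values
    )
    new_gradients = {
        name: operation(data, axis=numpy_axes) if name in parameters else data
        for name, data in gradients.items()
    }
    return new_values, new_gradients
```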

@Luthaf
Member

Luthaf commented Jan 20, 2023

allows applying the operation to a subset of the TB + GTBs; by default the operation is applied to all of them, i.e. ['values'] + [parameter for parameter, _ in tensor_block.gradients()]

We should never apply operations only on values or only on gradients, otherwise the values and gradients are no longer in sync (and the "gradients" are not actually gradients of the values). Doing this inside a function is fine, as long as when we give back the data to the user it is fully consistent.

but I think this can be generalized even more.

I don't think it is worth generalizing too much in the user-facing API, as this could very easily become confusing. You mention reduce_over_samples_block, but this is an internal function that end users are not supposed to see. I prefer adding functions as we need them, with an API that makes sense for each function.

For example, transpose would be fine: it would swap samples with properties, and invert the components within themselves. But swapaxis only makes sense when applied between two components, or between samples and properties.

In general, not all numpy operations have a direct equivalent in equistore, because we are imposing more structure and meaning than a pure numpy array.

@agoscinski
Contributor Author

Right, the consistency is important. I think I posted too much of a brain dump here, sorry for that. I thought about functions that might be used for standardizing features, but not all of them are needed. I think basic arithmetic would already make this possible together with the current reduce function.

For example, transpose would be fine, and invert samples with properties and components within themself

Doesn't it result in the loss of the meaning of the gradient, because the gradient is the derivative of the properties wrt. the parameter?

@ceriottm
Contributor

Doesn't it result in the loss of the meaning of the gradient, because the gradient is the derivative of the properties wrt. the parameter?

I think that what @Luthaf means is that any operation should be applied so that it leaves properties and gradients in a consistent state. E.g. scalar multiplication should be applied to both values and gradients. Scalar addition of a constant should be applied only to values, etc.

It is not entirely trivial how to formalize this, and it'd be crucial that it's done in a way that is consistent with autograd e.g. in torch.
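As a toy illustration of that rule, with plain numpy arrays standing in for a block (real TensorBlock gradients also carry metadata linking gradient rows to value samples, which is ignored here): for Y = a·X + b the constant shift only affects the values, while the scaling affects both values and gradients.

```python
import numpy as np

# Toy data: values of shape (samples, properties) and one gradient of shape
# (samples, directions, properties).
values = np.random.rand(4, 3)
gradients = {"positions": np.random.rand(4, 3, 3)}

a, b = 2.0, 1.5

# Y = a * X + b: the multiplication touches values and gradients,
# the constant shift touches only the values, since d(aX + b)/dr = a * dX/dr.
new_values = a * values + b
new_gradients = {name: a * data for name, data in gradients.items()}
```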

@Luthaf
Member

Luthaf commented Jan 23, 2023

Doesn't it result in the loss of the meaning of the gradient, because the gradient is the derivative of the properties wrt. the parameter?

You're right!

This is somewhat related to #100: the second argument to dot is "transposed" compared to the first (properties of the first are samples of the second argument), which also explains why having gradients there is harder.

@agoscinski
Contributor Author

agoscinski commented Feb 10, 2023

I split this issue into sub-issues.

I think all these cases share the same problem: sometimes we want to apply the operation to everything in the TensorBlock equally (including the gradients), and sometimes we want to apply the operation consistently with the gradients, so X -> f(X) and ∇X -> ∇f(X) · ∇X (the chain rule). I thought about adding an optional boolean parameter gradient_operation to distinguish these two cases (see the sketch below).
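A minimal sketch of those two modes, again with plain numpy arrays standing in for a TensorBlock. The helper apply_elementwise is hypothetical; only the gradient_operation flag name comes from the comment above, its True/False meaning is an arbitrary choice here, and gradient rows are assumed to line up one-to-one with the value samples:

```python
import numpy as np


def apply_elementwise(values, gradients, f, df, gradient_operation=True):
    """Apply ``f`` to the values and handle the gradients in one of two ways:

    - gradient_operation=True: keep the gradients consistent with the values
      via the chain rule, grad(f(X)) = f'(X) * grad(X)
    - gradient_operation=False: apply ``f`` to every array equally,
      gradients included
    """
    new_values = f(values)
    if gradient_operation:
        # broadcast f'(X) of shape (samples, properties) over the extra
        # gradient axis, assuming gradient rows match the value samples
        new_gradients = {
            name: df(values)[:, None, :] * data
            for name, data in gradients.items()
        }
    else:
        new_gradients = {name: f(data) for name, data in gradients.items()}
    return new_values, new_gradients


# example: f = exp, so f' = exp as well
values = np.random.rand(4, 3)                       # (samples, properties)
gradients = {"positions": np.random.rand(4, 3, 3)}  # (samples, directions, properties)
y, dy = apply_elementwise(values, gradients, np.exp, np.exp)
```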

@Luthaf
Member

Luthaf commented Mar 15, 2023

I'll close this, transpose and dot are tracked by separate issues.

@Luthaf Luthaf closed this as completed Mar 15, 2023