Missing operations transpose, swapaxis and basic arithmetic operation #101
I think we can generalize all the functions that reduce the shape, as well as the functions that keep the shape fixed. We already have reduce_over_samples_block, but I think this can be generalized even more; operations like add that do not change the shape would not need the last two parameters.
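As a rough illustration of what such a generalized shape-preserving operation could look like, here is a minimal sketch. The `Block` class and `map_block` helper are hypothetical stand-ins, not the equistore API; a real implementation would also carry the Labels metadata.

```python
import numpy as np

# Hypothetical stand-in for a TensorBlock: values plus one gradient array per
# parameter. Labels metadata is omitted to keep the sketch short.
class Block:
    def __init__(self, values, gradients=None):
        self.values = values                # ndarray (samples x properties)
        self.gradients = gradients or {}    # parameter name -> ndarray

def map_block(block, operation):
    """Apply the same shape-preserving operation to the values and to every
    gradient. This keeps the block consistent only for linear operations
    (e.g. scalar multiplication), which commute with differentiation."""
    return Block(
        operation(block.values),
        {param: operation(grad) for param, grad in block.gradients.items()},
    )

# Example: multiply everything in the block by 2.
block = Block(np.ones((4, 3)), {"positions": np.ones((4, 3, 3))})
doubled = map_block(block, lambda array: 2.0 * array)
```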
We should never apply operations only on the values or only on the gradients, otherwise the values and gradients are no longer in sync (and the "gradients" are not actually gradients of the values). Doing this inside a function is fine, as long as the data we give back to the user is fully consistent.
I don't think it is worth generalizing too much in the user-facing API; this could easily become confusing. In general, not all numpy operations have a direct equivalent in equistore, because we are imposing more structure and meaning than a pure numpy array carries.
Right, the consistency is important. I think I posted too much of a brain dump here, sorry for that. I was thinking about functions that might be used for standardizing features, but not all of them are needed: basic arithmetic would already make this possible together with the current reduce function.
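For instance, standardization could then be composed from a reduction plus arithmetic. A sketch with a plain numpy array standing in for a block's values (the `mean`/`std` calls here play the role that dedicated sample reductions would play in equistore):

```python
import numpy as np

values = np.random.rand(10, 5)   # (samples, properties) of a single block

mean = values.mean(axis=0)       # what a mean-over-samples reduction would give
std = values.std(axis=0)         # likewise for a std-over-samples reduction

standardized = (values - mean) / std
```

Note that if mean and std are treated as fixed constants, the two arithmetic steps affect gradients differently: the subtraction leaves them unchanged, while the division rescales them by 1/std, which is exactly the consistency question discussed below.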
Doesn't it result in the loss of the meaning of the gradient, because the gradient is the derivative of the properties with respect to the parameter?
I think that what @Luthaf means is that any operation should be applied so that it leaves properties and gradients in a consistent state. E.g. scalar multiplication should be applied to both values and gradients, scalar addition of a constant should be applied only to the values, etc. It is not entirely trivial to formalize this, and it would be crucial to do it in a way that is consistent with autograd, e.g. in torch.
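A worked example of this rule, with numpy arrays standing in for a block's values and its gradient with respect to some parameter (identical shapes are assumed here for simplicity; the real gradient arrays have additional dimensions):

```python
import numpy as np

values = np.random.rand(4, 3)   # block values
grad = np.random.rand(4, 3)     # d(values)/d(parameter)

# Scalar multiplication: y = a * x implies dy/dp = a * dx/dp,
# so it must be applied to both values and gradients.
a = 2.5
values_mul, grad_mul = a * values, a * grad

# Adding a constant: y = x + c implies dy/dp = dx/dp,
# so only the values change and the gradients stay as they are.
c = 1.0
values_add, grad_add = values + c, grad
```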
You're right! This is somewhat related to #100: the second argument to dot is "transposed" compared to the first (the properties of the first are the samples of the second argument), which also explains why having gradients there is harder.
I split this issue into sub-issues.
I think all these cases share the same problem: sometimes we want to apply the operation to everything in the TensorBlock equally (including the gradients), and sometimes we want to apply the operation consistently with the gradients, i.e. X → f(X) and ∇X → f′(X) ∇X. I thought about adding an optional boolean parameter to choose between the two.
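In code, the second mode is just the chain rule applied element-wise; a sketch with f = exp (whose derivative is again exp), using numpy stand-ins rather than the actual equistore types:

```python
import numpy as np

values = np.random.rand(4, 3)
grad = np.random.rand(4, 3)   # ∇X with respect to some parameter

f = np.exp    # X -> f(X)
df = np.exp   # f'(X); for exp the derivative is exp itself

new_values = f(values)
new_grad = df(values) * grad   # ∇X -> f'(X) * ∇X (chain rule)
```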
I'll close this; transpose and dot are tracked by separate issues.