Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement AnglE loss #2437

Closed
tomaarsen opened this issue Jan 23, 2024 · 5 comments · Fixed by #2471
Closed

Implement AnglE loss #2437

tomaarsen opened this issue Jan 23, 2024 · 5 comments · Fixed by #2471

Comments

@tomaarsen
Copy link
Collaborator

tomaarsen commented Jan 23, 2024

Hello!

Context

This issue describes a feature that I am planning to be included in a release before v3, or alternatively, in v3 of Sentence Transformers.

Details

Some recent work by Li & Li, 2023 introduces an AnglE objective. Its inclusion in Sentence Transformers would be beneficial to allow our users to also train using this new loss function. The first author, @SeanLee97, has implemented the objective in https://github.com/SeanLee97/AnglE.

@johneckberg has expressed an interest in contributing this loss.

cc @bwanglzu @ir2718 @johneckberg @aamir-s18 as I know you're interested in my TODO list. John will lead this development, and I will review his PR. Additionally, @SeanLee97 may also have a look if he finds the time.

  • Tom Aarsen
@johneckberg
Copy link
Contributor

I will get this out for review in the coming days!

AnglE loss is CoSENT loss with a novel pairwise similarity metric. In the proposed code for CoSENT loss I allow for the choice of similarity metric in the constructor. In my head, it might make sense to implement the AnglE metric calculation in utils, where a user can specify it as a metric instead of the default pairwise_cos_sim when using CoSENT loss. What do others think of that?

@tomaarsen
Copy link
Collaborator Author

tomaarsen commented Jan 29, 2024

If that similarity metric has a similar signature as e.g. pairwise cosine similarity, then it may make sense to discuss this with @ir2718 who is planning on creating a new Enum with most/all similarity measures in #2441.

(Edit: I see now that the signature is exactly like the pairwise cosine similarity! As far as I'm concerned, feel free to add it to utils for now, and we can always incorporate it into @ir2718 their Enum.)

I would be in favor of this approach.

We can then also introduce an AnglELoss class which is just a subclass of CoSENT, but with the similarity measure fixed to the pairwise similarity metric. How does that sound?

  • Tom Aarsen

@johneckberg
Copy link
Contributor

Sounds great Tom! I'll get started on that soon

@dawnik17
Copy link

The angle difference in util.pairwise_angle_sim is implemented as follows,

loss = logsum(1 + exp(s(k,l) - s(i,j))), where (i,j) and (k,l) are any of the input pairs in the batch such that the expected similarity of (i,j) is greater than (k,l).

But in the paper, it actually is represented as logsum(1 + exp(s(i,j) - s(k,l)))

I'm attaching a screenshot from the paper as well. I might be understanding it incorrectly, and would appreciate some clarification. Thanks in advance :) @tomaarsen @johneckberg

Screenshot 2024-08-22 at 6 26 45 PM

@tomaarsen
Copy link
Collaborator Author

Hello @dawnik17. Apologies for the delay.
If you remember, are you proposing that

scores = scores[:, None] - scores[None, :]

should be

scores = scores[None, :] - scores[:, None]

to match the paper?

If so, the current implementation does give notably better performance when training a simple model.

  • Tom Aarsen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants