Implement AnglE loss #2437

tomaarsen · 2024-01-23T08:21:50Z

Hello!

Context

This issue describes a feature that I plan to include in a release before v3 or, alternatively, in v3 of Sentence Transformers.

Details

Recent work by Li & Li (2023) introduces the AnglE objective. Including it in Sentence Transformers would let our users train with this new loss function. The first author, @SeanLee97, has implemented the objective in https://github.com/SeanLee97/AnglE.

@johneckberg has expressed an interest in contributing this loss.

cc @bwanglzu @ir2718 @johneckberg @aamir-s18 as I know you're interested in my TODO list. John will lead this development, and I will review his PR. Additionally, @SeanLee97 may also have a look if he finds the time.

Tom Aarsen

johneckberg · 2024-01-29T01:26:55Z

I will get this out for review in the coming days!

AnglE loss is CoSENT loss with a novel pairwise similarity metric. In the proposed code for CoSENT loss I allow for the choice of similarity metric in the constructor. In my head, it might make sense to implement the AnglE metric calculation in utils, where a user can specify it as a metric instead of the default pairwise_cos_sim when using CoSENT loss. What do others think of that?

tomaarsen · 2024-01-29T09:52:54Z

If that similarity metric has a similar signature to e.g. pairwise cosine similarity, then it may make sense to discuss this with @ir2718, who is planning on creating a new Enum with most/all similarity measures in #2441.

(Edit: I see now that the signature is exactly like that of the pairwise cosine similarity! As far as I'm concerned, feel free to add it to utils for now, and we can always incorporate it into @ir2718's Enum later.)

I would be in favor of this approach.

We can then also introduce an AnglELoss class which is just a subclass of CoSENTLoss, but with the similarity measure fixed to the AnglE pairwise similarity metric. How does that sound?
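The design discussed here can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the Sentence Transformers code: `CoSENTLossSketch`, `AnglELossSketch`, `pairwise_cos_sim_sketch` and `toy_angle_sim` are hypothetical names, the default `scale` is an assumption, and `toy_angle_sim` is only a simplified stand-in with the same signature as the cosine metric, not the metric from the paper.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
PairwiseSim = Callable[[Sequence[Vector], Sequence[Vector]], List[float]]


def pairwise_cos_sim_sketch(a: Sequence[Vector], b: Sequence[Vector]) -> List[float]:
    """Cosine similarity between a[i] and b[i] for every i."""
    out = []
    for x, y in zip(a, b):
        dot = sum(p * q for p, q in zip(x, y))
        norm_x = math.sqrt(sum(p * p for p in x))
        norm_y = math.sqrt(sum(q * q for q in y))
        out.append(dot / (norm_x * norm_y))
    return out


def toy_angle_sim(a: Sequence[Vector], b: Sequence[Vector]) -> List[float]:
    """Simplified stand-in (hypothetical): treat the first half of each vector
    as real parts and the second half as imaginary parts, divide element-wise
    in complex space, and return |sum of real and imaginary parts|."""
    out = []
    for x, y in zip(a, b):
        half = len(x) // 2
        total = 0.0
        for re_x, im_x, re_y, im_y in zip(x[:half], x[half:], y[:half], y[half:]):
            quotient = complex(re_x, im_x) / complex(re_y, im_y)
            total += quotient.real + quotient.imag
        out.append(abs(total))
    return out


class CoSENTLossSketch:
    """Ranking loss with a pluggable pairwise similarity function."""

    def __init__(self, scale: float = 20.0,
                 similarity_fct: PairwiseSim = pairwise_cos_sim_sketch) -> None:
        self.scale = scale
        self.similarity_fct = similarity_fct

    def __call__(self, a, b, labels: Sequence[float]) -> float:
        scores = [self.scale * s for s in self.similarity_fct(a, b)]
        terms = [0.0]  # exp(0) = 1 supplies the "1 +" inside the log
        for i, label_i in enumerate(labels):
            for j, label_j in enumerate(labels):
                if label_i < label_j:  # pair j is expected to be more similar
                    terms.append(scores[i] - scores[j])
        peak = max(terms)  # numerically stable log-sum-exp
        return peak + math.log(sum(math.exp(t - peak) for t in terms))


class AnglELossSketch(CoSENTLossSketch):
    """Same loss, with the similarity function fixed to the angle-style metric."""

    def __init__(self, scale: float = 20.0) -> None:
        super().__init__(scale=scale, similarity_fct=toy_angle_sim)
```

With this layout the subclass adds no logic of its own: a user who wants the angle-style metric can either pass it to the parent constructor or instantiate the subclass.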

Tom Aarsen

johneckberg · 2024-01-29T12:58:09Z

Sounds great Tom! I'll get started on that soon

dawnik17 · 2024-08-22T13:04:41Z

The angle difference in util.pairwise_angle_sim is implemented as follows:

loss = logsum(1 + exp(s(k,l) - s(i,j))), where (i,j) and (k,l) are any of the input pairs in the batch such that the expected similarity of (i,j) is greater than that of (k,l).

In the paper, however, it is written as logsum(1 + exp(s(i,j) - s(k,l))).

I'm attaching a screenshot from the paper as well. I might be understanding it incorrectly, and would appreciate some clarification. Thanks in advance :) @tomaarsen @johneckberg
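For reference, the two expressions in this comment can be typeset as below. This is only one reading of them: it assumes that `logsum(1 + exp(...))` means the log of one plus a sum over all qualifying pairs, it omits any temperature or scale factor, and $y_{ij}$ is notation introduced here for the expected similarity of pair $(i,j)$.

```latex
% Form described above as the implementation
\mathcal{L}_{\text{impl}}
  = \log\Bigl(1 + \sum_{y_{ij} > y_{kl}} \exp\bigl(s(k,l) - s(i,j)\bigr)\Bigr)

% Form described above as the paper's
\mathcal{L}_{\text{paper}}
  = \log\Bigl(1 + \sum_{y_{ij} > y_{kl}} \exp\bigl(s(i,j) - s(k,l)\bigr)\Bigr)
```

The two differ only in the sign of the exponent, so which one is appropriate depends on whether a larger $s$ is meant to indicate a more similar pair or a less similar one.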

tomaarsen · 2024-09-09T18:28:26Z

Hello @dawnik17. Apologies for the delay.
If you remember, are you proposing that

sentence-transformers/sentence_transformers/losses/CoSENTLoss.py, line 85 in 2e13ee6:

scores = scores[:, None] - scores[None, :]

should be

scores = scores[None, :] - scores[:, None]

to match the paper?

If so, the current implementation does give notably better performance when training a simple model.
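To make the two candidate lines concrete, here is a small pure-Python sketch of what each subtraction computes element-wise. It assumes that the surrounding loss keeps only the entries where `labels[i] < labels[j]` before taking the log-sum-exp; that mask, along with the helper names `pairwise_diff` and `masked_log_sum_exp`, is an assumption made for this illustration rather than something quoted in this thread.

```python
import math


def pairwise_diff(scores):
    """diff[i][j] = scores[i] - scores[j], the element-wise meaning of
    scores[:, None] - scores[None, :]."""
    return [[s_i - s_j for s_j in scores] for s_i in scores]


def masked_log_sum_exp(diff, labels):
    """log(1 + sum of exp(diff[i][j]) over entries with labels[i] < labels[j])."""
    terms = [0.0]  # exp(0) = 1 supplies the "1 +" inside the log
    for i, label_i in enumerate(labels):
        for j, label_j in enumerate(labels):
            if label_i < label_j:  # assumed mask, see the note above
                terms.append(diff[i][j])
    peak = max(terms)  # numerically stable log-sum-exp
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


scores = [0.9, 0.1]  # the model already ranks pair 0 above pair 1
labels = [1.0, 0.0]  # and the gold labels agree with that ranking

current = pairwise_diff(scores)
# scores[None, :] - scores[:, None] is the element-wise negation of the above
swapped = [[-d for d in row] for row in current]

loss_current = masked_log_sum_exp(current, labels)
loss_swapped = masked_log_sum_exp(swapped, labels)
```

Under that assumed mask, `loss_current` is the smaller of the two for this correctly ranked batch, so swapping the subtraction without also flipping the mask would penalise correct rankings. Whether this accounts for the performance difference mentioned above is not established here.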

Tom Aarsen

johneckberg mentioned this issue Feb 5, 2024

AnglE loss #2471

Merged

tomaarsen closed this as completed in #2471 Feb 14, 2024
