Add a scoring function in model configuration #2490
Conversation
Hello! Thanks a bunch for this PR! It's a bit big, so it's taken a bit of time to have a good look at it. I like the direction, and I think it's clever to try and infer the best scoring function from the evaluator if no scoring function was explicitly provided. I'm not sure it's the smartest approach though, as it might be unexpected to see multiple similarly trained models e.g. have different scoring functions. I'm also considering using the scoring function defined in the model to inform the choice of scoring function in the evaluators, but I'm not sure if that's ideal.
Beyond that, I think it would be interesting for the `SimilarityFunction` class to store the two kinds of scoring functions: "normal" and "pairwise".
As for the changes in the evaluators: some of them are slightly problematic, I think. In short: I want all code that works currently to keep working. This means that we can't easily e.g. rename a keyword argument.
This is definitely a tricky PR!
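To make the "normal"/"pairwise" idea concrete, here is a minimal sketch of what such a `SimilarityFunction` could look like. It is purely illustrative: the `to_similarity_fn` and `to_pairwise_similarity_fn` helpers are assumptions for this sketch, not the library's actual API.

```python
from enum import Enum

from sentence_transformers import util


class SimilarityFunction(Enum):
    # Only two members shown; the real class also covers euclidean/manhattan.
    COSINE = "cosine"
    DOT_PRODUCT = "dot"

    @staticmethod
    def to_similarity_fn(name: str):
        # "Normal" variant: full similarity matrix between two sets of embeddings.
        return {"cosine": util.cos_sim, "dot": util.dot_score}[name]

    @staticmethod
    def to_pairwise_similarity_fn(name: str):
        # "Pairwise" variant: one score per (embeddings1[i], embeddings2[i]) pair.
        return {"cosine": util.pairwise_cos_sim, "dot": util.pairwise_dot_score}[name]
```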
To be honest, I didn't think about this. I have a feeling that cosine similarity is the standard score function for most tasks, so I would suggest setting that as the default; users who want the best-performing score function can still set it explicitly.
I think this is better than the current solution. If you think this is a good design choice, I can implement it.
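As a sketch only (the `score_function` argument name and the fallback behaviour are what is being proposed in this thread, not merged behaviour), that default could look roughly like:

```python
from typing import Optional


def resolve_score_function(score_function: Optional[str]) -> str:
    # If the user did not set a score function explicitly, fall back to cosine
    # similarity as the general-purpose default; users who want the
    # best-performing function for their model pass it explicitly, e.g. "dot".
    return score_function if score_function is not None else "cosine"
```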
I don't know why anyone would use different scoring functions for normal and pairwise similarity calculation. Could you give me an example of why this would be an improvement?
Hard agree with this one, I'll add it.
I'm aware I didn't actually implement it for the Reranking evaluator, as I wasn't sure if it makes sense and wanted to hear your input. Can you explain a bit more what you mean by this, just to make sure I'm on the right track?
That's a really cool idea indeed. Seems good. And Cosine is a good default I think.
I think it would eventually be best, but perhaps we can keep the changes in the evaluators small for now.
It would never be different scoring functions, but I mean that sometimes you'd want the "normal" variant (a full similarity matrix between two sets of embeddings) and sometimes the "pairwise" variant (one score per pair), so it's convenient if the class can hand out both.
This is very tricky, but I'd like to keep all changes backwards compatible, i.e. code that used to work still works: if someone wrote a training script against the old keyword arguments, it should keep running unchanged. These changes are very tricky to navigate, and I think it might be best to minimize any changes in the evaluators and primarily change the model configuration side. In particular, I think some of these changes are good (at a glance), while others have some issues, so perhaps we can update the problematic ones accordingly.
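To illustrate the kind of backwards compatibility meant here, below is a common pattern for renaming a keyword argument without breaking old scripts. The names `main_similarity` (old) and `score_function` (new) are placeholders for this sketch, not the exact renames in the PR.

```python
import warnings
from typing import Optional


def evaluator_init(
    score_function: Optional[str] = None,
    main_similarity: Optional[str] = None,  # hypothetical old keyword argument
) -> Optional[str]:
    # Old call sites that pass `main_similarity` keep working, but are nudged
    # towards the new `score_function` keyword via a DeprecationWarning.
    if main_similarity is not None:
        warnings.warn(
            "`main_similarity` is deprecated, use `score_function` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if score_function is None:
            score_function = main_similarity
    return score_function


assert evaluator_init(main_similarity="cosine") == "cosine"  # old style still works
assert evaluator_init(score_function="dot") == "dot"         # new style preferred
```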
Thanks for the clarification, everything makes sense now.
General comment for all files: I would switch to using f-strings instead of the `format` method. I saw that the `format` method is used across the library as well, but this seems to be due to a lack of refactoring when f-strings became available. Maybe @tomaarsen can provide more context.
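For illustration, the suggested refactor is purely stylistic, along these lines (the string itself is made up):

```python
score_function = "cosine"

# Current style found across the library:
message = "Chosen score function: {}".format(score_function)

# Equivalent f-string, as suggested here:
message = f"Chosen score function: {score_function}"
```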
Hello! Is it okay if I take over the development here @ir2718? I have a bit of time this week & I'd like to push this up my priority list 😄
Hey @tomaarsen, sorry for the late response, I've been busy lately. I'm pretty sure I've covered most of the cases and I don't mind you taking over. There are two caveats I'd like to mention:
Feel free to ping me in case you need my help.
Oh, odd. I'll look into that as well. That sounds like a headache :S
PR overview
The PR adds a `score_function` parameter in the model configuration upon saving the model (#2441) and updates the rest of the code accordingly.
Details
Upon loading the model, the user can provide a string value to choose the score function to be used. If the score function is left blank, it is chosen based on the best-performing score function during training.
An important design choice here is that the best-performing score function is chosen solely depending on the chosen evaluator class. This only made sense for some evaluators, e.g. `LabelAccuracyEvaluator` assumes the model outputs a single vector, so there is nothing to compare it against. I've implemented it only for the following evaluators (a sketch of the idea follows the list):
- `BinaryClassificationEvaluator`
- `EmbeddingSimilarityEvaluator`
- `InformationRetrievalEvaluator`
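To make the "chosen depending on the evaluator" idea concrete, here is a rough sketch; the helper name and the numbers are invented for this example, and only the general shape mirrors what evaluators like `EmbeddingSimilarityEvaluator` compute per score function.

```python
def pick_best_score_function(results_per_function: dict) -> str:
    # An evaluator that already computes its main metric once per score function
    # can report the best-performing one, which is then written into the model
    # configuration as the default.
    return max(results_per_function, key=results_per_function.get)


# Example: Spearman correlations per score function (made-up numbers).
best = pick_best_score_function({"cosine": 0.87, "dot": 0.84, "euclidean": 0.85})
print(best)  # -> "cosine"
```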
Another important thing to mention is that some evaluators have predefined score function values:
- `ParaphraseMiningEvaluator`
- `RerankingEvaluator`
- `TranslationEvaluator`
Before adding this, I wanted to ask: should I even add evaluation for different score functions, or should I keep it as is?
The chosen score function is saved as the `score_function_name` parameter in the model configuration file. I would also like to add that this definitely needs more testing before merging.
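For reference, a hypothetical excerpt of what the saved model configuration could contain under this proposal; the key name follows the PR description, while the surrounding file layout is assumed here and other existing fields are omitted.

```python
import json

# Hypothetical excerpt of the model configuration file after saving a model
# with the proposed option; "score_function_name" is the key under discussion.
config = {
    "score_function_name": "cosine",
}
print(json.dumps(config, indent=2))
```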