Hi,
Thanks for the great work on this project. This is a very helpful library for closed domain Q&A.
That said, my experiments suggest that the retriever is the performance bottleneck (the reader performs quite well).
Investigating the code and studying the architecture points to the same conclusion.
The BERT model is only invoked after the initial candidates have been retrieved with TF-IDF. So if TF-IDF or BM25 misses the correct candidate paragraphs, the BERT model will miss the right answer as well. This seems to indicate that the BERT model is completely dependent on the accuracy of the vectorizers.
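To make the failure mode concrete, here is a minimal sketch of a retrieve-then-read first stage. It is a toy, pure-Python TF-IDF ranker and not the library's actual retriever; the corpus, the `retrieve` function, and the `top_k` parameter are all hypothetical names chosen for illustration. It shows that a question sharing no terms with the relevant paragraph scores zero against it, so the reader would never be handed that paragraph on merit.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs only (drops punctuation).
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, docs, top_k=1):
    """Return the top_k (doc_index, score) pairs ranked by TF-IDF cosine similarity."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    doc_vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    # Question terms that never occur in the corpus carry no weight at all.
    q_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(question)).items() if t in idf}
    scored = [(i, cosine(q_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy corpus; paragraph 1 answers both questions below.
paragraphs = [
    "The bank approved a loan for the new factory.",
    "Employees may work remotely two days per week.",
    "The factory opened in 2015 and employs 300 people.",
]

lexical_hits = retrieve("How many days can employees work remotely?", paragraphs, top_k=1)
paraphrase_hits = retrieve("How often can staff telecommute?", paragraphs, top_k=3)

print(lexical_hits)     # paragraph 1 ranks first thanks to shared terms
print(paraphrase_hits)  # every score is 0.0, so the ranking carries no signal
```

With the paraphrased question, all paragraphs tie at zero, so whichever candidates reach the reader are chosen arbitrarily. No reader, however strong, can recover from that.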
Do you have any thoughts on how to improve retriever accuracy, for example by using deep-learning-based information retrieval (perhaps sentence-similarity-based metrics)? Any suggestions for more advanced vectorizers?
Thanks. :)
raghavgurbaxani changed the title from "Retriever is the bottleneck : Improving over TF-IDF & BM25 vectorizers" to "Retriever Vectorizer is the bottleneck : Improving over TF-IDF & BM25 vectorizers" on Feb 20, 2020