
Retriever Vectorizer is the bottleneck : Improving over TF-IDF & BM25 vectorizers #344

Open
raghavgurbaxani opened this issue Feb 20, 2020 · 0 comments

@raghavgurbaxani
Hi,
Thanks for the great work on this project. This is a very helpful library for closed domain Q&A.
That said, my experiments suggest that the retriever is the performance bottleneck (the reader performs quite well).

Looking into the code and the overall architecture also points to the retriever:
[screenshot: retriever → reader pipeline architecture]

The BERT reader is only invoked on the initial candidates returned by TF-IDF or BM25, so if those vectorizers miss the paragraphs that actually contain the answer, the reader has no chance of producing it. In other words, end-to-end accuracy is capped by the retriever's recall, no matter how accurate the reader is.
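To make that concrete, here is a rough sketch of the kind of check I have in mind (the `retrieve` method and the other names are placeholders, not this library's actual API): measure how often the paragraph containing the gold answer appears in the retriever's top-k, since end-to-end exact match can never exceed that number.

```python
# Rough sketch, not this library's API: recall@k of the retriever upper-bounds
# the accuracy of the whole pipeline.
def retriever_recall_at_k(retriever, dataset, k=10):
    """dataset: list of (question, gold_paragraph_id) pairs."""
    hits = 0
    for question, gold_paragraph_id in dataset:
        top_ids = retriever.retrieve(question, top_k=k)  # hypothetical retrieve method
        hits += int(gold_paragraph_id in top_ids)
    return hits / len(dataset)

# End-to-end accuracy <= retriever_recall_at_k(...), which is why the vectorizer
# is the bottleneck even when the BERT reader itself performs well.
```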

Do you have any thoughts on how to improve retriever accuracy, for example by using deep-learning-based information retrieval (perhaps sentence-similarity metrics)? Any suggestions for more advanced vectorizers?
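As an illustration of the sentence-similarity idea, here is a minimal sketch using the external sentence-transformers library (the model name and all variable names are just examples, nothing here is part of this repo): embed the paragraphs and the question, rank paragraphs by cosine similarity, and hand the top-k candidates to the BERT reader as before.

```python
# Illustrative only: dense retrieval via sentence embeddings instead of TF-IDF/BM25.
# Requires the external sentence-transformers package; not part of this project.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bert-base-nli-mean-tokens")  # any sentence encoder would do

paragraphs = ["paragraph one ...", "paragraph two ..."]  # the document store's paragraphs
question = "example question?"

# Encode paragraphs once offline; encode the question at query time.
para_emb = np.asarray(model.encode(paragraphs))
q_emb = np.asarray(model.encode([question]))

# Cosine similarity between the question and every paragraph.
para_emb = para_emb / np.linalg.norm(para_emb, axis=1, keepdims=True)
q_emb = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
scores = (para_emb @ q_emb.T).ravel()

# Take the top-k candidates and pass them to the BERT reader as before.
top_k = np.argsort(-scores)[:10]
candidates = [paragraphs[i] for i in top_k]
```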

Thanks. :)

@raghavgurbaxani raghavgurbaxani changed the title Retriever is the bottleneck : Improving over TF-IDF & BM25 vectorizers Retriever Vectorizer is the bottleneck : Improving over TF-IDF & BM25 vectorizers Feb 20, 2020