Description
Background
Currently, most vector index libraries (Faiss, jVector, Lucene, etc.) support either a pre-filter followed by brute-force KNN, or ANN followed by a post-filter.
With post-filtering, queries can miss relevant results because the ANN result queue is too short to contain all the documents that pass the filter.
For example, consider ANN over a dataset of, say, 1 billion vectors, with a filter that only 30 documents match, where we are only interested in the top-1 result among those matches.
The issue with post-filtering in that case is that documents outside the 30 filter matches may be closer to the query. The intermediate top-K is then filled with those non-matching documents, and post-filtering it can produce an empty result.
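The failure mode can be reproduced on a toy dataset. The sketch below is illustrative only: an exact top-k sort stands in for the ANN index, the data is 1,000 points on a line instead of 1 billion vectors, and the helper names (`top_k`, `post_filter_search`, `pre_filter_search`) are hypothetical rather than taken from any library.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query, ids, vectors, k):
    """Exact top-k by L2 distance; stands in for an ANN index in this sketch."""
    return sorted(ids, key=lambda i: l2(query, vectors[i]))[:k]


def post_filter_search(query, vectors, allowed, k):
    """Search all vectors first, then drop candidates that fail the filter."""
    candidates = top_k(query, range(len(vectors)), vectors, k)
    return [i for i in candidates if i in allowed]


def pre_filter_search(query, vectors, allowed, k):
    """Apply the filter first, then brute-force KNN over the survivors."""
    return top_k(query, sorted(allowed), vectors, k)


# Toy data (assumption): ids 970..999 are the only 30 documents
# that pass the filter, and they are the farthest from the query.
vectors = [(float(i),) for i in range(1000)]
allowed = set(range(970, 1000))
query = (0.0,)

post = post_filter_search(query, vectors, allowed, k=1)
pre = pre_filter_search(query, vectors, allowed, k=1)

print(post)  # [] : the single nearest neighbor (id 0) fails the filter
print(pre)   # [970] : nearest document among the 30 filter matches
```

Post-filtering only recovers id 970 here once k is raised to at least 971, which is the cost the issue describes.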
What solution would you like?
We need a story around proper support for filtered ANN search.
Issues
- [FEATURE] Paginate search results during post-filtering #173
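
One possible reading of paginated post-filtering, not necessarily what #173 proposes, is to keep pulling further pages of nearest neighbors until enough of them pass the filter. The sketch below assumes 1-D points and uses a full sort as a stand-in for an index that can resume a search; `paginated_post_filter` and `page_size` are hypothetical names.

```python
def paginated_post_filter(query, vectors, allowed, k, page_size=100):
    """Fetch ranked candidates page by page until k of them pass the filter.

    Returns the matching ids and the number of pages that were scanned.
    """
    # Stand-in for a resumable ANN search: rank every id by distance once.
    ranked = sorted(range(len(vectors)), key=lambda i: abs(vectors[i] - query))
    results = []
    pages_scanned = 0
    for start in range(0, len(ranked), page_size):
        pages_scanned += 1
        page = ranked[start:start + page_size]
        results.extend(i for i in page if i in allowed)
        if len(results) >= k:
            break
    return results[:k], pages_scanned


# Toy data (assumption): only ids 970..999 pass the filter.
points = [float(i) for i in range(1000)]
matching = set(range(970, 1000))

hits, pages = paginated_post_filter(0.0, points, matching, k=1)
print(hits, pages)  # [970] 10
```

The result is correct, but with a highly selective filter nearly the whole dataset is scanned before the first match appears, so pagination alone does not remove the cost of post-filtering.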