[META] Support Proper Filtering #172

@sam-herman

Background

Currently, most vector index libraries (Faiss, jVector, Lucene, etc.) support either a pre-filter followed by brute-force KNN, or ANN followed by a post-filter.

As a result, queries can miss relevant results: with post-filtering, the ANN result queue is often too short to contain all the documents that match the filter.

For example, consider an ANN search over a dataset of, say, 1 billion vectors, with a filter that only 30 documents match, where we only want the top-1 result among those matches.
With post-filtering, if documents outside those 30 are more relevant to the query, the intermediate top-K contains none of the documents that match the filter, and applying the filter afterwards yields an empty result.
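
The failure mode above can be reproduced at small scale. The following is a minimal sketch, not code from any of the libraries mentioned: an exact nearest-neighbour scan stands in for ANN, the dataset has 10,000 vectors instead of 1 billion, and the filter is deliberately chosen as the 30 vectors farthest from the query so that the miss is guaranteed. All function names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query, vectors, candidate_ids, k):
    """Exact top-k among candidate_ids (a stand-in for a real ANN search)."""
    return sorted(candidate_ids, key=lambda i: squared_distance(query, vectors[i]))[:k]


def post_filter_search(query, vectors, allowed, k):
    """Search the whole dataset for the top-k, then drop non-matching ids."""
    hits = top_k(query, vectors, range(len(vectors)), k)
    return [i for i in hits if i in allowed]


def pre_filter_search(query, vectors, allowed, k):
    """Apply the filter first, then brute-force over the matching ids only."""
    return top_k(query, vectors, sorted(allowed), k)


rng = random.Random(0)
vectors = [[rng.random() for _ in range(8)] for _ in range(10_000)]
query = [0.5] * 8

# Assumption for illustration: the filter matches the 30 vectors farthest
# from the query, so every unfiltered top-k hit is rejected by the filter.
by_distance = top_k(query, vectors, range(len(vectors)), len(vectors))
allowed = set(by_distance[-30:])

post_result = post_filter_search(query, vectors, allowed, k=1)
pre_result = pre_filter_search(query, vectors, allowed, k=1)

print("post-filter:", post_result)  # empty: the global top-1 is not in the filter
print("pre-filter: ", pre_result)   # the closest of the 30 matching vectors
```

Pre-filtering returns the right answer here only because brute force over 30 candidates is cheap; the same approach becomes expensive when the filter matches a large fraction of the dataset.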

What solution would you like?
We need a story around proper support for filtering in ANN search.

Issues

  1. [FEATURE] Paginate search results during post-filtering #173
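
One possible reading of paginated post-filtering is sketched below; it is an assumption about the idea, not the design proposed in #173. The search is repeated with a growing result queue until enough documents pass the filter or the dataset is exhausted. An exact scan stands in for ANN, and `search_top_k`, `paginated_post_filter`, and `page_size` are hypothetical names.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_top_k(query, vectors, k):
    """Exact top-k over the whole dataset (a stand-in for a real ANN search)."""
    ids = range(len(vectors))
    return sorted(ids, key=lambda i: squared_distance(query, vectors[i]))[:k]


def paginated_post_filter(query, vectors, allowed, k, page_size=100):
    """Grow the result queue one page at a time until k hits pass the filter."""
    results = []
    fetched = 0
    while len(results) < k and fetched < len(vectors):
        fetched = min(fetched + page_size, len(vectors))
        # Simplification: re-run the search with a larger queue. A real index
        # would ideally resume the traversal instead of starting over.
        candidates = search_top_k(query, vectors, fetched)
        results = [i for i in candidates if i in allowed][:k]
    return results


# One-dimensional toy data: vector i sits at position i, the query at 0.
vectors = [[float(i)] for i in range(1000)]
query = [0.0]
allowed = {900, 950}  # the filter matches only two distant documents

print(paginated_post_filter(query, vectors, allowed, k=1))
```

In this toy run a single-page post-filter would return nothing, while the paginated loop keeps fetching until document 900 enters the queue. The cost is that a very selective filter can force the loop to walk most of the dataset.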
