Description
For both quantized and non-quantized vectors, if there are many flushes producing tiny segments, it may not be worth building the HNSW graph.
For example, we already fall back to a brute-force query against a segment when the requested k is larger than the number of vectors in it. So, having the graph means nothing at query time, and building it is unnecessary work.
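The brute-force path described above can be sketched as an exact top-k scan over the flat vectors. This is an illustrative stand-alone example, not Lucene's actual query code; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.PriorityQueue;

/**
 * Illustrative sketch: when k is at least the segment's vector count, an
 * exact scan over the flat vectors returns the same top-k as graph search,
 * so the HNSW graph is dead weight for that segment.
 */
public class BruteForceScan {
    /** Exact top-k by dot-product similarity over a flat vector list. */
    static int[] topK(List<float[]> vectors, float[] query, int k) {
        // Min-heap of (score, ordinal) pairs keeps the k best-scoring ords.
        PriorityQueue<float[]> heap =
            new PriorityQueue<>((a, b) -> Float.compare(a[0], b[0]));
        for (int ord = 0; ord < vectors.size(); ord++) {
            float score = dot(vectors.get(ord), query);
            heap.offer(new float[] {score, ord});
            if (heap.size() > k) heap.poll();   // drop the current worst
        }
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = (int) heap.poll()[1];   // emit best-first
        }
        return result;
    }

    static float dot(float[] a, float[] b) {
        float s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }
}
```

If k covers the whole segment, this scan already returns every vector in exact order, which is why graph construction buys nothing there.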
In my mind, this would behave as follows (given some empirically determined threshold N):
- For tiny segments, we would only scalar quantize, or just store in a flat index (if it's a non-quantizing format)
- When merging tiny segments, we determine whether the total number of vectors is > N, and if so, we start building the graph
- During indexing, we don't construct the HNSW graph builder until we have N vectors; then we construct the builder, replay the vectors seen so far, and handle new docs as usual.
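The indexing behavior in the last bullet could be sketched roughly as follows. Everything here is hypothetical (the `GraphBuilder` interface and class names are stand-ins, not Lucene's `HnswGraphBuilder` API); it just shows buffering vectors flat until the threshold N is crossed, then replaying them into a freshly created builder.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the deferred-graph idea: vectors are buffered in a
 * flat list until a threshold N is reached, at which point the graph builder
 * is created, the buffered vectors are replayed into it, and subsequent
 * vectors go straight to the builder.
 */
public class LazyGraphWriter {
    interface GraphBuilder { void addVector(float[] v); }

    private final int threshold;                       // the empirically chosen N
    private final List<float[]> buffered = new ArrayList<>();
    private GraphBuilder builder;                      // null until threshold is hit

    LazyGraphWriter(int threshold) { this.threshold = threshold; }

    void addVector(float[] v) {
        if (builder != null) {
            builder.addVector(v);                      // graph already exists
            return;
        }
        buffered.add(v);                               // flat storage only
        if (buffered.size() >= threshold) {
            builder = newBuilder();
            for (float[] seen : buffered) {            // replay everything seen
                builder.addVector(seen);
            }
        }
    }

    boolean hasGraph() { return builder != null; }

    private GraphBuilder newBuilder() {
        return v -> { /* real code would insert v into the HNSW graph */ };
    }
}
```

Until the threshold is reached, searches over such a segment would use the brute-force/flat path; once the builder exists, the segment behaves like a normal HNSW segment.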
N needs to be determined empirically. The HNSW codecs already have so many configuration options that it would be very nice to simply pick a value where the trade-off makes the most sense; I suspect it is somewhere around 10k.
Also, would N take into account whether Panama Vector is enabled and whether things are quantized or not?
Possibly; we may want to consider the scenario where Panama Vector is disabled, though I am not sure I would suggest anyone use vector search with it off :/.