Is your feature request related to a problem?
Disk-based vector search was added in OpenSearch 2.17. This feature uses binary quantization to decrease the memory footprint of vector search while keeping recall high. While disk-based vector search is successful at keeping recall high for some datasets, other data sets have recall less than 0.9 out of the box. We can use the two techniques discussed in this RFC, Asymmetric Distance Computation (ADC) and Random Rotation (RR), to increase the recall on some of these datasets.
In order to maintain high recall, disk-based search uses two phases. The first phase searches a binary quantized index for an oversampled number of neighbors. The second phase rescores and reduces these oversampled neighbors using exact search.
Currently, the first phase quantizes each 4-byte floating point entry in a vector into a single bit, then computes its distance to neighbors using hamming distance.
While this approach achieves high recall on some datasets, it does not perform well on others.
We can see that for some datasets, the recall for on-disk (32x compression) is above 0.9 out of the box:
Data set | Recall@k |
---|---|
minillm-msmarco | 0.90 |
mpnet-msmarco | 0.92 |
mxbai-msmarco | 0.95 |
snowflake-msmarco | 0.92 |
However, the success of this approach is dataset dependent. There are datasets that are well below 0.9 out of the box:
Data set | Recall@k |
---|---|
sift | 0.39 |
glove | 0.42 |
gist | 0.12 |
cohere-v2-wiki | 0.57 |
cohere-v3-bioasq | 0.71 |
tasb-msmarco | 0.87 |
e5small-msmarco | 0.80 |
clip-flickr-image-queries | 0.81 |
cohere-v2-dbpedia | 0.76 |
In general, recall can be improved by increasing the number of documents we rescore (by increasing `oversample_factor`). However, because rescoring reads from disk with a random access pattern, more rescoring leads to higher latencies. So recall improves, but the recall/search-latency tradeoff does not.
What solution would you like?
Reduce the impact of quantization errors by combining two separate optimizations:
- Asymmetric Distance Computations (ADC)
- Random Rotation (RR)
I am opening this as a single RFC instead of several because the recall benefit of ADC+RR combined is much higher than the benefit of ADC or RR separately. Both techniques also carry a search latency tradeoff.
Asymmetric Distance Computation (ADC)
Binary quantization introduces error at two different stages: 1) when the document vectors are quantized during ingestion, and 2) when the query vectors are quantized during search.
While stage 1) is necessary to achieve memory savings across the many document vectors, stage 2) contributes little to reducing memory requirements. For instance, a full-precision 768-dimension float32 vector uses around 3 KB of memory, while the same binary-quantized vector uses about 100 B. Although this is a 32x difference, it is small compared to the ~3 GB required to index 1M binary-quantized documents (the added overhead of 3 KB vs. 100 B per query is not significant).
We can use ADC to preserve the full-precision vectors during search. This cuts down the error introduced in stage 2).
At a very high level, ADC rescales each query vector entry toward the range between 0 and 1. Each entry `q_d` in the query vector is modified via `q_d = (q_d - x_d) / (y_d - x_d)`, where `x_d` is the mean of all document entries quantized to 0 (the below-threshold mean) and `y_d` is the mean of all document entries quantized to 1 (the above-threshold mean). Intuitively, the query is transformed through a linear interpolation between the below- and above-threshold means: an entry that is very close to the below-threshold mean is mapped to a value near 0, and an entry close to the above-threshold mean is mapped to a value near 1. However, this scheme allows values other than just 0 and 1, which is where the improved accuracy comes from.
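For concreteness, here is a minimal Java sketch of this 1-bit query transformation. It is illustrative only; the method and array names are assumptions rather than the plugin's actual API.

```java
/**
 * Illustrative sketch of the 1-bit ADC query transformation.
 * belowMean[d] is the mean of all document entries in dimension d that quantized to 0,
 * and aboveMean[d] is the mean of those that quantized to 1.
 */
public static float[] transformQueryForAdc(float[] query, float[] belowMean, float[] aboveMean) {
    float[] transformed = new float[query.length];
    for (int d = 0; d < query.length; d++) {
        // q_d = (q_d - x_d) / (y_d - x_d): entries near the below-threshold mean map toward 0,
        // entries near the above-threshold mean map toward 1, and intermediate values are preserved.
        transformed[d] = (query[d] - belowMean[d]) / (aboveMean[d] - belowMean[d]);
        // A real implementation would guard against aboveMean[d] == belowMean[d].
    }
    return transformed;
}
```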
Random Rotation (RR)
One fact about binary quantization (a type of scalar quantization) is that each entry is given equal weight during the quantization process. This leads to undesirable effects when the variance differs across dimensions. For illustration, imagine a 2-dimensional data distribution where the first dimension (x axis) has small variance and the second dimension (y axis) has high variance. The position of a vector in the second dimension carries more "information" than its position in the first dimension, but binary quantization treats each dimension as equally important. If we rotate the distribution, however, some of the variance is "smoothed" from the second dimension into the first dimension. See below for an illustration.
Random rotation can help smooth uneven variance across dimensions, which improves the effectiveness of k-NN search under binary quantization, since binary quantization treats all dimensions equally.
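As a concrete illustration, here is a hedged Java sketch of one way to generate and apply such a random rotation (sampling a Gaussian matrix and orthonormalizing it with Gram-Schmidt, as described in the Random Rotation Design section below). The class and method names are illustrative, not the plugin's actual API.

```java
import java.util.Random;

/** Illustrative sketch: build a random orthonormal matrix and apply it to a vector. */
public final class RandomRotationSketch {

    /** Sample a dim x dim matrix of standard normals, then orthonormalize its rows with Gram-Schmidt. */
    public static float[][] randomRotationMatrix(int dim, long seed) {
        Random rng = new Random(seed);
        float[][] m = new float[dim][dim];
        for (int i = 0; i < dim; i++) {
            for (int j = 0; j < dim; j++) {
                m[i][j] = (float) rng.nextGaussian();
            }
        }
        for (int i = 0; i < dim; i++) {
            // Remove the components of row i that lie along the previous (already unit-norm) rows.
            for (int k = 0; k < i; k++) {
                float dot = 0f;
                for (int j = 0; j < dim; j++) dot += m[i][j] * m[k][j];
                for (int j = 0; j < dim; j++) m[i][j] -= dot * m[k][j];
            }
            // Normalize row i so the final matrix is orthonormal (and therefore norm-preserving).
            float norm = 0f;
            for (int j = 0; j < dim; j++) norm += m[i][j] * m[i][j];
            norm = (float) Math.sqrt(norm);
            for (int j = 0; j < dim; j++) m[i][j] /= norm;
        }
        return m;
    }

    /** Apply the rotation y = Mx; norms are preserved because M is orthonormal. */
    public static float[] rotate(float[][] m, float[] x) {
        float[] y = new float[x.length];
        for (int i = 0; i < x.length; i++) {
            float sum = 0f;
            for (int j = 0; j < x.length; j++) sum += m[i][j] * x[j];
            y[i] = sum;
        }
        return y;
    }
}
```

The same matrix would be applied to document vectors before quantization and to query vectors at search time.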
What alternatives have you considered?
- Increase the oversampling factor to achieve 0.9 recall out of the box for the above datasets.
The issue with this alternative is that it is naive and does not address the fundamental sources of error introduced by binary quantization (it does not account for uneven information across dimensions and still introduces quantization error in two stages). It will also increase the amount of random disk IO for fetching the full-precision vectors. Given that we achieve >0.9 recall on some datasets and <0.9 on others out of the box, I believe that solely increasing the oversampling factor is not the correct next step for improving the recall of disk-based vector search.
Benchmark Results
Benchmark Setup
Three sets of experiments were performed on the datasets we've used to benchmark disk-based vector search in the past:
- Random rotation (RR).
- Asymmetric distance computation (ADC).
- Asymmetric distance computation and random rotation (ADC+RR).

All experiments used the following configuration:
- m=16
- ef_construction=256
- ef_search=256
- oversample_factor = 5.0 if dimensions < 1000 else 3.0
Note that these experiments were conducted on a single-machine cluster with a single shard and a single segment. We expect recall to be higher for multi-segment and multi-shard setups, since more hnsw graphs are constructed in those cases (as opposed to a single hnsw graph).
Recall Changes
The following plots show recall differences between baseline k-NN and enhanced k-NN. From left to right: random rotation vs. baseline, ADC vs. baseline, ADC+RR vs. baseline (click to enlarge).
ADC+RR leads to good recall gains for some datasets, especially those with low baseline recall (where many of the returned neighbors are not true nearest neighbors). However, ADC+RR does not show improvements across the board, and some datasets see small regressions.
Here is an enlarged recall plot for ADC+RR:
Latency Impact
There is variation in search latency numbers from run to run, but the general trend is meaningful: ADC increases search latency. The average search latency impact across the above datasets is 17.6%.
Indexing Impact
The random rotation technique adds some additional overhead during indexing: when we quantize the vectors, we first rotate them. The impact on p90 indexing latencies is mostly negligible. The main performance hit comes when we force merge segments into a single segment, because all vectors must be rotated before being sent to the faiss index. We do not store an extra copy of the rotated vectors due to the memory overhead and engineering complexity. One optimization we could look into is storing rotated vectors in the source and undoing the rotation before returning the vectors. Alternatively, we could store only the rotated vectors (if the user only cares about the document ids and not the vectors themselves). k-NN already does something similar to support cosine similarity in faiss (it stores normalized vectors so that inner product and cosine similarity are equivalent).
Dataset | Mode | Model | Model type | dimension | Space type | Recall@100 (Enhanced) | Recall@100 (Baseline) | p90 query latency (Enhanced) | p90 query latency (Baseline) | P90 Latency Changes % | p90 indexing % change | Force merge time % change | p90 indexing (Enhanced) | p90 indexing (Baseline) | Force merge time (Enhanced) | Force merge time (Baseline) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
coherev2-10m | on_disk | Cohere v2 | Text Embeddings | 768.0 | inner product | 0.52 | 0.57 | 54.7 | 60.2 | -9.1362 | -13.4746 | 177.7311 | 102.1 | 118.0 | 198.3 | 71.4 |
e5_embeddings | on_disk | e5.small | Text Embeddings | 384.0 | l2 | 0.88 | 0.8 | 25.8 | 19.1 | 35.0785 | -12.6792 | 45.0467 | 115.7 | 132.5 | 77.6 | 53.5 |
flickr-image-queries | on_disk | Clip | Image Embeddings | 512.0 | l2 | 0.93 | 0.81 | 57.3 | 51.7 | 10.8317 | -6.5632 | 93.9394 | 78.3 | 83.8 | 64.0 | 33.0 |
gist-960-euclidean | on_disk | Gist descriptor | Image Embeddings | 960.0 | l2 | 0.36 | 0.12 | 30.0 | 40.1 | -25.187 | -8.9844 | 412.7273 | 69.9 | 76.8 | 28.2 | 5.5 |
glove-200-angular | on_disk | Glove word embedding model | Text Embeddings | 200.0 | cosine | 0.57 | 0.42 | 4.5 | 4.8 | -6.25 | -16.1695 | 2.7397 | 53.4 | 63.7 | 7.5 | 7.3 |
marco_tasb | on_disk | tasb | Text Embeddings | 768.0 | inner product | 0.86 | 0.87 | 5.6 | 4.5 | 24.4444 | -31.063 | 300.0 | 66.8 | 96.9 | 23.2 | 5.8 |
mbread_msmarco | on_disk | minillm | Text Embeddings | 384.0 | l2 | 0.95 | 0.9 | 6.7 | 3.36 | 99.4048 | 263.8232 | 89.0 | 35.4 | 9.73 | ||
mini_msmarco | on_disk | mxbai | Text Embeddings | 1024.0 | cosine | 0.96 | 0.9 | 4.0 | 3.9 | 2.5641 | -3.2872 | 45.614 | 55.9 | 57.8 | 8.3 | 5.7 |
mpnet_marco | on_disk | mpnet | Text Embeddings | 768.0 | l2 | 0.98 | 0.95 | 6.8 | 4.0 | 70.0 | -18.7943 | 145.8824 | 68.7 | 84.6 | 20.9 | 8.5 |
sift-128-euclidean | on_disk | Sift descriptor | Image Embeddings | 128.0 | l2 | 0.7 | 0.39 | 3.8 | 3.7 | 2.7027 | -29.927 | 35.7143 | 9.6 | 13.7 | 3.8 | 2.8 |
ADC Design
High Level Design
If ADC is enabled, the following design allows finding the closest neighbors of a full-precision query vector against quantized document vectors. The main component of the design is a new `ADCFlatCodesDistanceComputer` that enables asymmetric searching. Our strategy is to create an altered binary index whose distance calculations are handled by the custom computer. The creation step does not require reindexing data, but it does require loading the preexisting faiss index. Whenever this index is queried from Java code, the hnsw graph search uses the transformed full-precision query vector to determine its closest neighbors via the overridden distance computer.
Indexing Changes
We use the `QuantizerHelper` to calculate the above- and below-threshold means for ADC over the vectors in each segment. These means are stored in the `QuantizationState` so that they can be used to transform the query vector at search time. The faiss index creation and indexing steps are unchanged, as our strategy is to alter the query vectors during searching. The quantized document vectors are not changed at all; we just calculate the `aboveThresholdMeans[]` and `belowThresholdMeans[]` for later searches.
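For illustration, here is a hedged Java sketch of how these per-dimension means could be computed from a segment's vectors. The real computation lives in the `QuantizerHelper`; the method name and the use of explicit per-dimension thresholds are assumptions.

```java
/**
 * Illustrative sketch: compute per-dimension below/above threshold means for 1-bit quantization.
 * thresholds[d] is the per-dimension quantization threshold (entries <= threshold quantize to 0).
 * Returns { belowThresholdMeans, aboveThresholdMeans }.
 */
public static float[][] computeThresholdMeans(float[][] docVectors, float[] thresholds) {
    int dim = thresholds.length;
    float[] belowMean = new float[dim];
    float[] aboveMean = new float[dim];
    int[] belowCount = new int[dim];
    int[] aboveCount = new int[dim];
    for (float[] vector : docVectors) {
        for (int d = 0; d < dim; d++) {
            if (vector[d] <= thresholds[d]) {
                belowMean[d] += vector[d];   // this entry quantizes to 0
                belowCount[d]++;
            } else {
                aboveMean[d] += vector[d];   // this entry quantizes to 1
                aboveCount[d]++;
            }
        }
    }
    for (int d = 0; d < dim; d++) {
        if (belowCount[d] > 0) belowMean[d] /= belowCount[d];
        if (aboveCount[d] > 0) aboveMean[d] /= aboveCount[d];
    }
    return new float[][] { belowMean, aboveMean };
}
```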
Search Changes
We use the `aboveThresholdMeans[]` and `belowThresholdMeans[]` arrays to transform each query vector. The main change to search is that we load an altered faiss index with a new distance computer that handles the asymmetric computation. The red boxes in the diagram below represent the alterations to the faiss index. The main new component is a document storage class, called `FaissIndexBQ`, that overrides the `get_FlatCodesDistanceComputer` method to return the desired `ADCFlatCodesDistanceComputer`. The new computer overrides the distance function used in hnsw graph search. We pass the preexisting loaded id map and hnsw graph structure to the new binary index. We pass the codes vector (the set of quantized documents in our index) from the original storage to our altered storage as a pointer. We do not need to perform a vector copy, since we can assume that the original storage will remain in memory for the lifetime of the altered index.
Whenever this ADC-enabled index is searched, the new distance computer's `set_query` and `distance_to_code` methods are used for asymmetric computation within the hnsw search code.
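To make the asymmetric computation concrete, here is a standalone Java sketch of the distance logic the custom computer performs for an L2 space. The real `ADCFlatCodesDistanceComputer` is C++ code inside the altered faiss index; the method name and the bit ordering within each byte are assumptions.

```java
/**
 * Illustrative sketch: asymmetric L2 distance between a transformed full-precision query
 * and a bit-packed 1-bit document code (8 dimensions packed per byte).
 */
public static float adcL2Distance(float[] transformedQuery, byte[] code) {
    float distance = 0f;
    for (int d = 0; d < transformedQuery.length; d++) {
        // Extract bit d of the packed code (0 or 1); the bit ordering here is an assumption.
        int bit = (code[d / 8] >> (d % 8)) & 1;
        // Compare the real-valued query entry against the quantized document entry.
        float diff = transformedQuery[d] - bit;
        distance += diff * diff;
    }
    return distance;
}
```

By contrast, the symmetric baseline quantizes the query as well and compares the two bit vectors with hamming distance.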
After the search concludes, the path is the same as before, with rescoring handled by preexisting code. Any rescoring is performed with the original non-transformed vector against the non-quantized document vectors. The filtering code path is not affected.
See the LLD section for a sequence diagram covering searching and caching as well as more details about the alteration strategy.
Out of Scope
The following are out of scope for this particular design:
- Using ADC during indexing to build a better hnsw graph. We would need to explore this direction further to see if the recall gains are large enough to offset what would be a large increase in indexing time.
- Automatically deducing when ADC should be enabled/disabled. We elaborate on this option in the alternatives considered section, but this document does not include the `auto` design.
LLD
Faiss Index Alteration Strategy
In faiss, a binary index is implemented in the `faiss::IndexBinary` class. The `faiss::IndexBinaryIDMap`, `faiss::IndexBinaryHNSW`, and `faiss::IndexBinaryFlat` classes encapsulate the vector-to-document-id map, the hnsw structure, and the document storage, respectively. When a faiss index is searched, a specific distance computer is used. The distance computer is attached to the document storage class, so we create a new document storage class, `FaissIndexBQ`, that overrides the `get_FlatCodesDistanceComputer` method to return the desired `ADCFlatCodesDistanceComputer`. As mentioned in the high level design, we pass the preexisting loaded id map and hnsw graph structure to the new binary index. We pass the codes vector (representing the set of quantized documents in our index) from the original storage to our altered storage as a pointer. We do not need to perform a vector copy, since we can assume that the original storage will remain in memory for the lifetime of the altered index. Below is a more detailed diagram illustrating the index alteration strategy.
Search sequence diagram
Transformation Formulae
1 bit: `q_d = (q_d - x_d) / (y_d - x_d)`, where `x_d` is the mean of all document entries quantized to 0 (the below-threshold mean) and `y_d` is the mean of all document entries quantized to 1 (the above-threshold mean).
The multi-bit transformation is slightly more involved. Essentially, instead of splitting each dimension at a single below/above threshold, it splits each dimension at multiple thresholds (for instance 00, 01, 11 thresholds) and scales the query vector according to these thresholds. See the appendix for a full explanation.
Additional Memory Requirements
For 1 bit, per query:
- Need to store (d / 8) * 256 additional floats for the batched distance computation.
- Total cost: (d / 8) * 256 * 4 = 128 * d bytes.

For 2 bits, per query:
- Need to store (d / 8) * 256 * 2 additional floats for the batched distance computation.
- Need to store a copy of the above- and below-threshold means used to transform the query vector (2 * d floats).
- Finally, need the partitions (3 * d floats). This is not strictly necessary, but it makes the implementation easier.
- Total cost: 4 * ((d * 64) + (d * 2) + (d * 3)) = 276 * d bytes.
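As a hedged worked example using the totals above: for d = 768, the 1-bit case adds 128 * 768 = 98,304 bytes (~96 KB) of per-query memory, and the 2-bit case adds 276 * 768 = 211,968 bytes (~207 KB) per query.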
Batching Optimization
We optimize ADC by creating a codebook, exploiting the fact that the document vectors ("codes") are represented as contiguous bits. Due to the bit-packing logic, each byte represents 8 entries of a document code. We can therefore precompute the 2^8 = 256 possible distance contributions (for byte values 0000 0000, 0000 0001, 0000 0011, etc.) for each batch of 8 code entries. This allows us to find the distance to a batch of vector dimensions with a single array lookup.
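Here is a hedged Java sketch of this batching idea for L2 distance. The table has (d / 8) * 256 entries, matching the per-query memory cost listed above; the method names and bit ordering are assumptions, and the actual implementation lives in the C++ distance computer.

```java
/**
 * Illustrative sketch: precompute, for each byte of the packed code, the distance contribution
 * of all 256 possible byte values, so scoring a document takes one table lookup per byte.
 */
public static float[][] buildDistanceLookupTable(float[] transformedQuery) {
    int numBytes = transformedQuery.length / 8;      // 8 dimensions are packed into each byte
    float[][] table = new float[numBytes][256];
    for (int b = 0; b < numBytes; b++) {
        for (int value = 0; value < 256; value++) {
            float contribution = 0f;
            for (int j = 0; j < 8; j++) {
                int bit = (value >> j) & 1;          // bit ordering within the byte is an assumption
                float diff = transformedQuery[b * 8 + j] - bit;
                contribution += diff * diff;         // L2 contribution of this dimension
            }
            table[b][value] = contribution;
        }
    }
    return table;
}

/** Score a bit-packed document code with one lookup per byte. */
public static float batchedDistance(float[][] table, byte[] code) {
    float distance = 0f;
    for (int b = 0; b < table.length; b++) {
        distance += table[b][code[b] & 0xFF];        // mask so the byte indexes 0..255
    }
    return distance;
}
```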
Random Rotation Design
High Level Design
Random rotation is implemented within the quantization code flow in k-NN. The rotation matrix is generated by the following process:
- Sample each entry of the matrix from a standard normal random variable.
- Use Gram-Schmidt to orthonormalize the matrix.
The resulting orthonormal matrix `M` gives us `norm(Mx) = norm(x)` with probability 1. We apply the rotation matrix `M` to our full-precision document vectors before quantizing them, and also to our query vectors.
Low Level Design
Train Flow
Indexing Flow
Search Flow
Open Questions
- Due to merge time regressions, should we store the rotated vectors in memory?
- We do something similar in the cosine space for faiss, where we store the normalized vectors in source.
- Rotation, like normalization, is an invertible operation. We can preserve the option of undoing this rotation by logging the entire random rotation matrix and its transpose (the transpose of the rotation matrix is its inverse, since the matrix is orthogonal by construction). However, this approach may break if data is downloaded from OpenSearch and then re-indexed, since it would be rotated multiple times. This is in contrast to normalization, which only affects the vectors during the first indexing step.
User Experience - Recommended Changes to k-NN Interface
We recommend giving users control over ADC+RR via index mapping parameters. The P0 goal for 3.1 is to introduce low-level configuration changes under the encoder configuration: `enable_adc` and `random_rotation` can be set to `true`/`false` to enable/disable each behavior. They are off by default to avoid unexpected latency increases.
```json
PUT vector-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "vector_field": {
        "type": "knn_vector",
        "dimension": 8,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2",
          "parameters": {
            "encoder": {
              "name": "binary",
              "parameters": {
                "bits": 1,
                "random_rotation": true,
                "enable_adc": true
              }
            }
          }
        }
      }
    }
  }
}
```
In the future we can add a higher-level index mapping parameter named `enhance_search`, `boost_recall`, or `boost_search_recall` with boolean `true`/`false` options. `enhance_search`/`boost_recall` should be `false` by default to prevent unexpected latency increases.
```json
PUT /adc-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "vector_field": {
        "type": "knn_vector",
        "dimension": 768,
        "space_type": "l2",
        "data_type": "float",
        "mode": "on_disk",
        "compression_level": "32x",
        // new enable adc+rr mapping parameter
        "enhance_search": true
      }
    }
  }
}
```
Alternatives Considered
Enable by Default
One alternative is to enable ADC+RR by default. Though this would boost recall on many quantized datasets, it comes at the cost of higher latency. If we enable search accuracy enhancements by default, users will be confused by latency degradation in their preexisting clusters. The advantage of this approach is that recall will increase. However, the desirable recall/latency tradeoff is specific to each use case, so it doesn't make sense to turn this setting on for all users: some will be happy (if they primarily want higher recall) and others will be upset (if they primarily want high performance). Additionally, the impact on recall and performance may be opaque.
Enable by default but decrease oversampling
A large factor in on-disk search performance is the degree of oversampling/rescoring in an index. We may be able to tune the oversampling factor when `enhance_search` is enabled. Right now the `oversample_factor` is set to 5.0 for dimensions < 1000 and to 3.0 for dimensions >= 1000. We could investigate how much we can decrease the `oversample_factor` when `enhance_search` is on. This approach would require a series of experiments, essentially a grid search over oversample factors, to determine whether there is a configuration that boosts recall for all test datasets without a latency penalty. If the community believes this is a promising direction, it can be pursued as a P1 task.
Enable based on characteristics of the index (auto mode)
We found from benchmarking that text embedding data holds up very well to binary quantization but image data (sift, gist) does not. We could further investigate the datasets that perform poorly and try to find characteristics of the index that we can use to toggle `enhance_search` automatically. We could track metrics based on the data distribution of the index, perhaps using the new profiler API; unfortunately, the current profiler metrics of mean and variance are not sufficient for ADC. This investigation will take time and may not reveal any new insights into when ADC should be turned on/off. Fundamentally, a user's decision about whether to use ADC depends on whether they are willing to trade search latency for search accuracy. The exact hit to search time and the boost to accuracy depend on the user's workload; we saw in the experiments that the accuracy boost and performance cost of ADC vary significantly by dataset.
Coauthors
Jack Mazanec (@jmazanec15), Yonatan Naamad (@ynaamad), Vikash Tiwari (@Vikasht34), Tal Wagner (@talwagner)