[FEATURE] Add MUVERA ingest and search processors for multi-vector ANN prefetch #3163

@praveenMprasad

Description

Is your feature request related to a problem?

Multi-vector retrieval models like ColBERT and ColPali produce per-token embeddings that require MaxSim scoring across all token pairs. This is expensive at scale because there's no way to do ANN prefetch on variable-length multi-vector representations — you're forced to either brute-force score every document or rely on external tooling to pre-encode vectors client-side.

Currently, the k-NN plugin supports [lateInteractionScore](https://docs.opensearch.org/latest/query-dsl/specialized/script-score/#late-interaction-score) for MaxSim reranking, but the inner query is typically match_all or a text filter, meaning every matching document gets scored. There's no native way to narrow candidates using the multi-vector embeddings themselves.
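For context, MaxSim scoring (as used by lateInteractionScore) matches each query token against its best document token and sums the results. A minimal numpy sketch, not the plugin's implementation:

```python
import numpy as np

def max_sim(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """MaxSim: for each query token embedding, take the best-matching
    document token (by inner product), then sum across query tokens."""
    # (num_q, dim) @ (dim, num_d) -> (num_q, num_d) similarity matrix
    sims = query_vecs @ doc_vecs.T
    return float(sims.max(axis=1).sum())

q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[0.5, 0.5], [1.0, 0.0], [0.0, 0.2]])
# per-query-token maxima: max(0.5, 1.0, 0.0) = 1.0 and max(0.5, 0.0, 0.2) = 0.5
print(max_sim(q, d))  # 1.5
```

Every candidate document pays this cost over all token pairs, which is why narrowing the candidate set before MaxSim matters.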

What solution would you like?

Add two new processors implementing the MUVERA algorithm (Multi-Vector Retrieval via Fixed Dimensional Encodings, [paper](https://arxiv.org/abs/2405.19504)):

  1. muvera ingest processor — Converts variable-length multi-vector embeddings into a single fixed-dimensional encoding (FDE) vector using SimHash clustering and random projections. The FDE is stored in a knn_vector field for ANN indexing. The original multi-vectors remain in _source for reranking.

  2. muvera_query search request processor — Intercepts script_score queries containing query_vectors in script params, MUVERA-encodes them into an FDE, and replaces the inner match_all with a knn query on the FDE field. The lateInteractionScore script wrapper stays intact for MaxSim reranking on the prefetched candidates.

User flow

Step 1: Create ingest pipeline

PUT _ingest/pipeline/muvera-ingest
{
  "description": "MUVERA FDE encoding for ColBERT vectors",
  "processors": [
    {
      "muvera": {
        "source_field": "colbert_vectors",
        "target_field": "muvera_fde",
        "dim": 128,
        "fde_dimension": 2560
      }
    }
  ]
}

Defaults: k_sim=4, dim_proj=8, r_reps=20, seed=42, giving an FDE dimension of r_reps × 2^k_sim × dim_proj = 20 × 16 × 8 = 2560. The fde_dimension parameter is validated against this computed value so the user explicitly acknowledges the output size.
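To make the FDE construction concrete, here is an illustrative numpy sketch of the SimHash-bucket-plus-random-projection scheme. Parameter names mirror the defaults above, but the plugin's exact construction may differ (e.g. how empty buckets are filled, or document-side averaging vs. query-side summation per the MUVERA paper):

```python
import numpy as np

K_SIM, DIM_PROJ, R_REPS, DIM, SEED = 4, 8, 20, 128, 42

rng = np.random.default_rng(SEED)
hyperplanes = rng.standard_normal((R_REPS, K_SIM, DIM))     # SimHash planes
projections = rng.standard_normal((R_REPS, DIM_PROJ, DIM))  # random projections

def encode_fde(multi_vectors) -> np.ndarray:
    """Collapse a (num_tokens, DIM) multi-vector array into one
    fixed-dimensional encoding of length R_REPS * 2**K_SIM * DIM_PROJ."""
    vecs = np.asarray(multi_vectors, dtype=float)
    fde = np.zeros(R_REPS * (2 ** K_SIM) * DIM_PROJ)
    for rep in range(R_REPS):
        # SimHash bucket id: sign pattern against K_SIM hyperplanes
        bits = (vecs @ hyperplanes[rep].T) > 0               # (tokens, K_SIM)
        buckets = bits @ (1 << np.arange(K_SIM))             # (tokens,)
        reduced = vecs @ projections[rep].T                  # (tokens, DIM_PROJ)
        for b, v in zip(buckets, reduced):
            start = (rep * (2 ** K_SIM) + int(b)) * DIM_PROJ
            fde[start:start + DIM_PROJ] += v                 # aggregate per bucket
    return fde

tokens = np.random.default_rng(0).standard_normal((3, DIM))
print(encode_fde(tokens).shape)  # (2560,)
```

Because the hyperplanes and projections are seeded, the encoding is deterministic — which is why the ingest and search processors must share the same hyperparameters and seed.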

Step 2: Create index

PUT muvera-index
{
  "settings": {
    "index.knn": true,
    "default_pipeline": "muvera-ingest"
  },
  "mappings": {
    "dynamic": false,
    "properties": {
      "muvera_fde": {
        "type": "knn_vector",
        "dimension": 2560,
        "method": {
          "name": "hnsw",
          "space_type": "innerproduct",
          "engine": "faiss"
        }
      },
      "title": { "type": "text" }
    }
  }
}

Note: colbert_vectors is intentionally left unmapped; it stays in _source for reranking but doesn't need its own field mapping.

Step 3: Index documents

POST muvera-index/_doc/1
{
  "title": "example document",
  "colbert_vectors": [
    [0.1, 0.2, ...],
    [0.3, 0.4, ...],
    [0.5, 0.6, ...]
  ]
}

The ingest processor reads colbert_vectors, produces the FDE, and writes it to muvera_fde. Both fields end up in the stored document.

Step 4: Create search pipeline

PUT _search/pipeline/muvera-search
{
  "request_processors": [
    {
      "muvera_query": {
        "target_field": "muvera_fde",
        "dim": 128,
        "fde_dimension": 2560,
        "oversample_factor": 4
      }
    }
  ]
}

The MUVERA hyperparameters must match the ones used in the ingest pipeline. oversample_factor controls how many candidates the knn prefetch retrieves relative to the requested result size.

Step 5: Search

POST muvera-index/_search?search_pipeline=muvera-search
{
  "size": 10,
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": "lateInteractionScore(params.query_vectors, 'colbert_vectors', params._source, params.space_type)",
        "params": {
          "query_vectors": [[0.1, 0.2, ...], [0.3, 0.4, ...]],
          "space_type": "innerproduct"
        }
      }
    }
  }
}

What happens:

  1. Search processor extracts query_vectors from script params
  2. MUVERA-encodes them into a query FDE
  3. Replaces match_all with knn on muvera_fde (k = size × oversample_factor = 40)
  4. lateInteractionScore reranks the 40 candidates using exact MaxSim on original multi-vectors
  5. Top 10 returned to user
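The rewrite in steps 1–3 can be sketched as a plain dict transformation. Function and parameter names here are illustrative, not the plugin's actual API:

```python
def rewrite_script_score(request, target_field, oversample_factor, encode_fde):
    """Sketch of the muvera_query rewrite: pull query_vectors out of the
    script params, encode them into a query FDE, and swap the inner
    match_all for a knn prefetch, keeping the script_score wrapper intact."""
    script_score = request["query"]["script_score"]
    vectors = script_score["script"]["params"]["query_vectors"]
    script_score["query"] = {
        "knn": {
            target_field: {
                "vector": encode_fde(vectors),
                "k": request["size"] * oversample_factor,  # e.g. 10 * 4 = 40
            }
        }
    }
    return request

request = {
    "size": 10,
    "query": {
        "script_score": {
            "query": {"match_all": {}},
            "script": {"params": {"query_vectors": [[0.1, 0.2], [0.3, 0.4]]}},
        }
    },
}
# Stand-in encoder for illustration; the real processor runs MUVERA encoding.
rewritten = rewrite_script_score(request, "muvera_fde", 4, lambda v: [0.0] * 2560)
print(rewritten["query"]["script_score"]["query"]["knn"]["muvera_fde"]["k"])  # 40
```

The script wrapper and its params are untouched, so lateInteractionScore still sees the original query_vectors for the exact MaxSim rerank.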

What alternatives have you considered?

  • Client-side MUVERA encoding (works but requires users to maintain encoding logic outside OpenSearch)
  • Binary quantization of multi-vectors (lossy, doesn't preserve MaxSim structure)
  • Text-based prefetch with BM25 (misses semantic signal from embeddings)

Do you have any additional context?

  • MUVERA is already implemented in [fastembed](https://github.com/qdrant/fastembed) (Python) and used in production with Qdrant
  • We have a working implementation with unit tests, stable across multiple iterations with random seeds
  • The implementation uses only public APIs — no reflection or core OpenSearch modifications required
  • Tested end-to-end on a live cluster: ingest pipeline creates FDE vectors, search pipeline rewrites queries, lateInteractionScore reranking produces correct MaxSim scores
