Add Vespa engine support for multi-engine benchmarking by OVI3D0 · Pull Request #1043 · opensearch-project/opensearch-benchmark

OVI3D0 · 2026-04-21T19:18:27Z

Description

Broken off from #1042, this PR contains only the Vespa-specific changes.

Adds support for Vespa in OSB.

A new --database-type flag lets OSB ingest and query data on search engines other than OpenSearch.

Each engine is a Python module under osbenchmark/engine/<name>/ exposing 5 functions:

| Function | Purpose |
| --- | --- |
| create_client_factory | Builds the engine-native HTTP/gRPC client |
| create_async_client | Creates the async client for worker processes |
| register_runners | Registers engine-specific operation runners |
| wait_for_client | Polls a health check before the benchmark starts |
| on_execute_error | Translates engine exceptions to OSB metrics |
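
As a rough illustration of that interface, the sketch below defines a toy in-memory engine exposing the five functions. Only the five function names come from this PR. Every signature, argument, return value, and the `FakeClient` class are assumptions made for illustration and may not match the actual code.

```python
"""Toy engine module sketch; signatures are assumptions, not OSB's actual API."""
import asyncio
import time


class FakeClient:
    """Hypothetical stand-in for an engine-native client."""

    def __init__(self, hosts):
        self.hosts = hosts

    def ping(self):
        # A real client would call the engine's health endpoint here.
        return True


def create_client_factory(hosts, client_options):
    """Return a zero-argument callable that builds a synchronous client."""
    return lambda: FakeClient(hosts)


def create_async_client(hosts, client_options):
    """Return the client used inside worker processes (same toy client here)."""
    return FakeClient(hosts)


def register_runners(registry):
    """Add engine-specific operation runners to a dict-like registry."""

    async def search(client, params):
        return {"hits": 0, "hosts": client.hosts}

    registry["search"] = search


def wait_for_client(client, max_attempts=40):
    """Poll the client until it reports healthy or attempts run out."""
    for _ in range(max_attempts):
        if client.ping():
            return True
        time.sleep(0.01)
    return False


def on_execute_error(exc):
    """Translate an engine exception into a metrics-friendly dict."""
    return {
        "success": False,
        "error-type": type(exc).__name__,
        "error-description": str(exc),
    }


if __name__ == "__main__":
    client = create_client_factory(["localhost:8080"], {})()
    runners = {}
    register_runners(runners)
    print(wait_for_client(client, max_attempts=1))
    print(asyncio.run(runners["search"](client, {})))
    print(on_execute_error(ValueError("boom")))
```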

A lightweight registry (osbenchmark/engine/__init__.py) dispatches --database-type=vespa to the correct module. OpenSearch remains the default and delegates to the existing osbenchmark/client.py and osbenchmark/worker_coordinator/runner.py; those files are untouched.
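
A minimal sketch of how such a registry could dispatch, assuming engines are resolved by lazy import. The `get_engine` name appears in the review diff; the `ENGINE_MODULES` table, the `importer` parameter, the opensearch module path, and the interface check are assumptions for illustration.

```python
import importlib

# Engine name -> module path. The vespa path follows the layout described in
# this PR; the opensearch path is an assumption.
ENGINE_MODULES = {
    "opensearch": "osbenchmark.engine.opensearch",
    "vespa": "osbenchmark.engine.vespa",
}

REQUIRED_FUNCTIONS = (
    "create_client_factory",
    "create_async_client",
    "register_runners",
    "wait_for_client",
    "on_execute_error",
)


def get_engine(db_type="opensearch", importer=importlib.import_module):
    """Resolve a --database-type value to its engine module.

    The module is imported only when requested, so optional dependencies of
    other engines are never loaded. `importer` is injectable for testing.
    """
    name = db_type.lower()
    if name not in ENGINE_MODULES:
        known = ", ".join(sorted(ENGINE_MODULES))
        raise ValueError(f"Unknown database type [{db_type}]; expected one of: {known}")
    module = importer(ENGINE_MODULES[name])
    # Fail early if the module does not expose the full five-function interface.
    missing = [fn for fn in REQUIRED_FUNCTIONS if not callable(getattr(module, fn, None))]
    if missing:
        raise TypeError(f"Engine [{name}] is missing: {', '.join(missing)}")
    return module
```

Checking the interface at lookup time means a half-implemented engine is reported before any benchmark work begins rather than partway through a run.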

Vespa (osbenchmark/engine/vespa/):

Client: pyvespa httpr (Rust HTTP) for search, aiohttp for document feed
Search: pre-translated YQL required (runners raise BenchmarkError if body["yql"] missing)
Feed: feed_batch() with HTTP/2 multiplexing, configurable max_concurrent
HNSW: hnsw_ef_search mapped to targetHits + hnsw.exploreAdditionalHits
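
The search and HNSW points above can be sketched as two small helpers. This is an illustration under stated assumptions: `BenchmarkError` is redefined locally so the block runs on its own, both helper names are hypothetical, and the split shown (targetHits carries k, hnsw.exploreAdditionalHits carries the remainder of hnsw_ef_search) is one plausible reading of the mapping, not necessarily the PR's exact logic.

```python
class BenchmarkError(Exception):
    """Local stand-in for OSB's BenchmarkError so the sketch runs on its own."""


def hnsw_params(k, hnsw_ef_search):
    """Split an ef_search budget into Vespa's two HNSW query parameters.

    Assumption: targetHits carries k and hnsw.exploreAdditionalHits carries
    the remainder, so their sum equals hnsw_ef_search (and is never below k).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return {
        "targetHits": k,
        "hnsw.exploreAdditionalHits": max(0, hnsw_ef_search - k),
    }


def validate_search_body(body):
    """Reject request bodies that were not pre-translated to YQL."""
    yql = body.get("yql") if isinstance(body, dict) else None
    if not yql:
        raise BenchmarkError(
            "Vespa search requires a pre-translated 'yql' entry in the request body"
        )
    return yql
```

With the settings used in this PR's test run (k=100, hnsw_ef_search=256), this split would give targetHits=100 and hnsw.exploreAdditionalHits=156.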

Usage

Vespa

opensearch-benchmark run \
  --pipeline=benchmark-only \
  --workload-path=<path-to-vectorsearch-workload> \
  --workload-params=params_vespa.json \
  --test-procedure=vespa-search-only \
  --target-hosts=<vespa-host>:8080 \
  --database-type=vespa \
  --client-options="hnsw_ef_search:256"

Example params file (works for all engines — adjust `target_index_name` per engine)

{
  "target_index_name": "target_index",
  "target_field_name": "embedding",
  "target_index_body": "indices/faiss-index.json",
  "target_index_primary_shards": 1,
  "target_index_replica_shards": 0,
  "target_index_dimension": 768,
  "target_index_space_type": "innerproduct",
  "target_index_bulk_size": 500,
  "target_index_bulk_index_data_set_format": "hdf5",
  "target_index_bulk_index_data_set_corpus": "cohere-1m",
  "target_index_bulk_indexing_clients": 10,
  "target_index_max_num_segments": 1,
  "hnsw_ef_construction": 200,
  "hnsw_ef_search": 256,
  "query_k": 100,
  "query_data_set_format": "hdf5",
  "query_data_set_corpus": "cohere-1m",
  "query_count": 10000,
  "search_clients": 1,
  "neighbors_data_set_corpus": "cohere-1m",
  "neighbors_data_set_format": "hdf5"
}

New Dependencies

pyvespa (pip install pyvespa): optional, needed only for the Vespa engine.
It is not required for OpenSearch-only usage (lazy imports, graceful degradation).
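
A minimal sketch of the lazy-import pattern, assuming the dependency is imported only on the Vespa code path. The helper names and the `MissingDependencyError` class are hypothetical. The only facts relied on are that pyvespa is installed with `pip install pyvespa` and imported as `vespa`.

```python
import importlib


class MissingDependencyError(RuntimeError):
    """Raised when an optional engine dependency is not installed (hypothetical name)."""


def load_optional(module_name, install_hint):
    """Import a module on first use and fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise MissingDependencyError(
            f"Module [{module_name}] is required for this engine. "
            f"Install it with: {install_hint}"
        ) from exc


def load_pyvespa():
    # pyvespa is imported as `vespa`; call this only on the Vespa code path so
    # OpenSearch-only installs never attempt the import.
    return load_optional("vespa", "pip install pyvespa")
```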

Issues Resolved

#1000

Testing

New functionality includes testing

Unit tests plus a full E2E test against live Vespa and Milvus nodes.
All results are from cohere-1M (768-dim, innerproduct) on m5.2xlarge (8 vCPU / 32 GB) instances, with a single node per engine: 1M vectors ingested, 10K queries, hnsw_ef_search=256, k=100.

| Engine | QPS (median) | p50 (ms) | p99 (ms) | recall@k | recall@1 |
| --- | --- | --- | --- | --- | --- |
| Vespa 8.660 | 197 | 4.7 | 6.3 | 0.76 | 0.92 |
| Milvus 2.6.13 | 242 | 3.5 | 4.6 | 0.970 | 1.000 |

Ingest throughput

| Engine | ops/s | Protocol |
| --- | --- | --- |
| Milvus | 18,060 | gRPC batch insert |
| Vespa | 1,502 | HTTP/1.1 per-doc |

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Introduces an engine-as-module pattern under osbenchmark/engine/ where each engine exposes a standard interface (create_client_factory, create_async_client, register_runners, wait_for_client, on_execute_error). Adds Vespa support using pyvespa httpr (Rust HTTP) for search and aiohttp for document feed. Vespa runners require pre-translated YQL in the request body. Replaces the previous osbenchmark/database/ abstraction layer with the simpler engine registry pattern. OpenSearch engine is a thin delegation to existing client.py and runner.py (those files are untouched).

Signed-off-by: Michael Oviedo <mikeovi@amazon.com>

github-actions · 2026-04-21T19:19:30Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 55d1e94.

| Path | Line | Severity | Description |
| --- | --- | --- | --- |
| setup.py | 202 | high | New optional dependency 'pyvespa>=0.62.0' added under extras_require['vespa']. Per mandatory flagging rule, all dependency changes must be flagged regardless of apparent legitimacy. Maintainers should verify the pyvespa package origin, publisher, and integrity before merging. |

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 1 | Medium: 0 | Low: 0

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

rishabh6788 · 2026-04-30T21:05:13Z

+        db_type = self.config.opts("database", "type", default_value="opensearch", mandatory=False).lower()
+        engine = get_engine(db_type)
+        self.logger.info("Checking if [%s] is available.", db_type)
+        if engine.wait_for_client(opensearch["default"], max_attempts=40):


So we are assuming each client will have some way to confirm that the DB server is up and running?

Yeah, pretty much. Each engine exposes a health-check endpoint that we can poll before running the benchmark.
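
A hedged sketch of what such a polling health check might look like. Only the `wait_for_client` name and `max_attempts=40` come from the diff in this thread. The `is_healthy()` method, the `delay_seconds` and `sleep` parameters, and the retry-on-exception behaviour are assumptions for illustration.

```python
import time


def wait_for_client(client, max_attempts=40, delay_seconds=3.0, sleep=time.sleep):
    """Poll client.is_healthy() until it succeeds or attempts are exhausted.

    `is_healthy`, `delay_seconds`, and `sleep` are hypothetical names; an
    engine would wrap its own health endpoint behind something similar.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if client.is_healthy():
                return True
        except Exception:
            # Connection errors are expected while the server is still starting.
            pass
        if attempt < max_attempts:
            sleep(delay_seconds)
    return False
```

Injecting `sleep` keeps the function testable without real waiting; production callers would leave it at the default.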

rishabh6788 · 2026-04-30T21:11:03Z

+        # /_plugins/_knn/warmup/) that fail on non-OS engines — the engine's register_runners()
+        # overrides those so each operation type uses the engine-appropriate implementation.
+        db_type = self.config.opts("database", "type", default_value="opensearch", mandatory=False).lower()
+        if db_type != "opensearch":


Why this?
Ideally, if the default engine type is OS, then this should just register the OS runners. Am I missing something?
Okay, I think it is because we are not touching the current runner logic for OS, and the runners for other DB engines will reside in their own implementations. Is that right?

Yeah, OS runners are registered in the existing runner.py file as before, so they are untouched. The call here is only for non-OS engines. If the engine type is OS, this is just a no-op.
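
The override behaviour discussed in this thread can be sketched as follows. This is a toy model, not the PR's code: the `_RUNNERS` dict, the helper names, the `ToyVespaEngine` class, the operation-type strings, and the choice of a no-op warmup runner are all assumptions for illustration.

```python
_RUNNERS = {}


def register_runner(operation_type, runner):
    """Register (or override) the runner for an operation type."""
    _RUNNERS[operation_type] = runner


def register_default_runners():
    # Stand-ins for the OpenSearch runners registered by the existing runner.py.
    register_runner("search", lambda params: "opensearch-search")
    register_runner("warmup-knn-indices", lambda params: "opensearch-warmup")


class ToyVespaEngine:
    """Hypothetical engine whose register_runners() overrides selected defaults."""

    @staticmethod
    def register_runners():
        register_runner("search", lambda params: "vespa-search")
        # The OpenSearch k-NN warmup endpoint fails on other engines, so this
        # sketch replaces it with a no-op (an assumption, not the PR's choice).
        register_runner("warmup-knn-indices", lambda params: None)


def setup_runners(db_type, engine):
    """Register defaults first, then let a non-OpenSearch engine override them."""
    register_default_runners()
    if db_type != "opensearch":
        engine.register_runners()
    return _RUNNERS
```

Because defaults are always registered first, the OpenSearch path is unchanged and any operation type that an engine does not override keeps its default runner.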

OVI3D0 requested review from IanHoang, VijayanB, beaioun, gkamat and rishabh6788 as code owners April 21, 2026 19:18

OVI3D0 mentioned this pull request Apr 28, 2026

Add multi-engine support: Vespa and Milvus benchmarking #1042

Closed

1 task

rishabh6788 reviewed Apr 30, 2026


OVI3D0 mentioned this pull request May 12, 2026

Wire non-OpenSearch backends through the database registry #1065

Open

1 task

OVI3D0 wants to merge 1 commit into opensearch-project:main from OVI3D0:feature/vespa-support

rishabh6788 left two review comments on Apr 30, 2026; OVI3D0 replied to each on May 1, 2026.