Skip to content

Wire non-OpenSearch backends through the database registry#1065

Open
zznate wants to merge 2 commits into
opensearch-project:mainfrom
zznate:database-registry-non-os-routing
Open

Wire non-OpenSearch backends through the database registry#1065
zznate wants to merge 2 commits into
opensearch-project:mainfrom
zznate:database-registry-non-os-routing

Conversation

@zznate
Copy link
Copy Markdown

@zznate zznate commented May 11, 2026

PR #1023 introduced the DatabaseClient abstraction and registry but left several code paths hardcoded to the OpenSearch client. Surface each of these so a non-OpenSearch backend (registered via the existing factory + registry) can run a benchmark end-to-end.

  • benchmark.py: add --database-type to osb run. Choices derive from the DatabaseType enum so new backends become selectable simply by registering. Sets cfg.opts("database", "type") which the worker coordinator already reads.
  • test_run_orchestrator.py: skip the OS distribution-version probe (which builds a raw opensearchpy client and walks OS-shaped health/info routes) when database.type != opensearch.
  • worker_coordinator.py: WorkerCoordinatorActor.create_os_clients routes through DatabaseClientFactory.create_client_factory for non-OpenSearch backends, mirroring the async path at WorkerCoordinator.os_clients. wait_for_rest_api delegates the readiness probe to a factory-provided wait_for_rest_layer when database.type != opensearch; the legacy probe is preserved for OpenSearch.

OpenSearch behaviour is preserved. See #1064 for details.

Closes #1064

Description

[Describe what this change achieves]

Issues Resolved

[List any issues this PR will resolve]

Testing

  • New functionality includes testing

[Describe how this change was tested]


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

PR opensearch-project#1023 introduced the DatabaseClient abstraction and registry but
left several code paths hardcoded to the OpenSearch client. Surface
each of these so a non-OpenSearch backend (registered via the existing
factory + registry) can run a benchmark end-to-end.

- benchmark.py: add `--database-type` to `osb run`. Choices derive from
  the DatabaseType enum so new backends become selectable simply by
  registering. Sets `cfg.opts("database", "type")` which the worker
  coordinator already reads.
- test_run_orchestrator.py: skip the OS distribution-version probe
  (which builds a raw opensearchpy client and walks OS-shaped
  health/info routes) when database.type != opensearch.
- worker_coordinator.py: WorkerCoordinatorActor.create_os_clients
  routes through DatabaseClientFactory.create_client_factory for
  non-OpenSearch backends, mirroring the async path at
  WorkerCoordinator.os_clients. wait_for_rest_api delegates the
  readiness probe to a factory-provided wait_for_rest_layer when
  database.type != opensearch; the legacy probe is preserved for
  OpenSearch.

OpenSearch behaviour is preserved.

Closes opensearch-project#1064

Signed-off-by: zznate <zznate.m@gmail.com>
@github-actions
Copy link
Copy Markdown

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Missing Import

The code references DatabaseClientFactory on lines 981 and 1041 but no import statement for this class is visible in the diff. If DatabaseClientFactory is not imported at the top of this file, this will raise a NameError at runtime when a non-OpenSearch database type is used.

all_client_options = self.config.opts("client", "options").all_client_options
cluster_client_options = dict(all_client_options[cluster_name])
# Use retries to avoid aborts on long living connections for telemetry devices
cluster_client_options["retry-on-timeout"] = True
if database_type.lower() == "opensearch":
    opensearch[cluster_name] = self.os_client_factory(cluster_hosts, cluster_client_options).create()
else:
    # Non-OpenSearch backends route through the registry so the right
    # url_prefix / transport configuration is applied. This path
    # mirrors the async create path at WorkerCoordinator.os_clients.
    db_factory = DatabaseClientFactory.create_client_factory(
        database_type, cluster_hosts, cluster_client_options,
    )
    opensearch[cluster_name] = db_factory.create()
Incomplete Error Handling

When database_type.lower() != "opensearch" and the factory lacks a wait_for_rest_layer method or the method returns False, the code logs an error and raises SystemSetupError. However, if hasattr(db_factory, "wait_for_rest_layer") is False, the function silently returns without verifying readiness. This means a non-OpenSearch backend that does not implement wait_for_rest_layer will proceed without any readiness check, potentially causing failures later when the database is not actually ready.

if database_type.lower() != "opensearch":
    hosts = self.config.opts("client", "hosts").all_hosts["default"]
    client_options = dict(self.config.opts("client", "options").all_client_options["default"])
    db_factory = DatabaseClientFactory.create_client_factory(database_type, hosts, client_options)
    self.logger.info("Checking if non-OS REST layer is available (database_type=%s).", database_type)
    if hasattr(db_factory, "wait_for_rest_layer") and db_factory.wait_for_rest_layer(max_attempts=40):
        self.logger.info("REST layer is available.")
        return
    self.logger.error("Non-OS REST layer is not yet available. Stopping benchmark.")
    raise exceptions.SystemSetupError(f"{database_type} REST layer is not available.")

@github-actions
Copy link
Copy Markdown

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Improve method existence check logic

The condition hasattr(db_factory, "wait_for_rest_layer") and
db_factory.wait_for_rest_layer(...) will raise an error if the factory doesn't have
the method but returns False from wait_for_rest_layer. Consider checking the method
existence separately and handling the case when it doesn't exist.

osbenchmark/worker_coordinator/worker_coordinator.py [1038-1047]

 if database_type.lower() != "opensearch":
     hosts = self.config.opts("client", "hosts").all_hosts["default"]
     client_options = dict(self.config.opts("client", "options").all_client_options["default"])
     db_factory = DatabaseClientFactory.create_client_factory(database_type, hosts, client_options)
     self.logger.info("Checking if non-OS REST layer is available (database_type=%s).", database_type)
-    if hasattr(db_factory, "wait_for_rest_layer") and db_factory.wait_for_rest_layer(max_attempts=40):
-        self.logger.info("REST layer is available.")
-        return
+    if hasattr(db_factory, "wait_for_rest_layer"):
+        if db_factory.wait_for_rest_layer(max_attempts=40):
+            self.logger.info("REST layer is available.")
+            return
     self.logger.error("Non-OS REST layer is not yet available. Stopping benchmark.")
     raise exceptions.SystemSetupError(f"{database_type} REST layer is not available.")
Suggestion importance[1-10]: 3

__

Why: The suggestion improves code clarity by separating the hasattr check from the method call, making the logic more explicit. However, the original code using and short-circuit evaluation is functionally correct and won't raise an error if the method doesn't exist. The improvement is minor and mainly stylistic.

Low

@OVI3D0
Copy link
Copy Markdown
Member

OVI3D0 commented May 12, 2026

thanks for this @zznate ! Just looking at a high level I think I planned to introduce something similar in #1043 along with the introduction of Vespa support, but that PR is still pending review. This LGTM though and solves the hardcoding issue immediately so I think we can just go ahead and merge :)

if database_type.lower() != "opensearch":
hosts = self.config.opts("client", "hosts").all_hosts["default"]
client_options = dict(self.config.opts("client", "options").all_client_options["default"])
db_factory = DatabaseClientFactory.create_client_factory(database_type, hosts, client_options)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually one NIT i have is that DatabaseClientFactory seems to get instantiated twice, once here and once up above on line 991, maybe we can just clean that up?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rad - thanks @OVI3D0

Good catch - i'll do that real quick.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OVI3D0 Any interest in a Quickwit implementation? I can clean it up a bit and submit with a new issue if so.

Caveat: there is a lot of surface area that doesnt correlate, so we would need to plan out a consistent way of handling this. That said, might be a good forcing function for doing this right before other impls start to arrive?

Copy link
Copy Markdown
Member

@OVI3D0 OVI3D0 May 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any interest in a Quickwit implementation?

yeah for sure @zznate ! the hope with this project was to have OSB branch out from just being a benchmarking tool for OpenSearch and have it be used for any compatible search engine :) feel free to open up an issue or even contribute the changes and we can get those in

we would need to plan out a consistent way of handling this

There was a larger plan for this written out here: #998

Here you can read about the proposed architecture for future search engines we had in mind

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perferct - thanks! I'll give that a read. Missed that in my initial pass.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OVI3D0 - do i need to do anything else to get this merged? I see it's blocked still, please let me know if you need anything from my end.

Copy link
Copy Markdown
Member

@OVI3D0 OVI3D0 May 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zznate unfortunately code changes across the org are locked down until this is resolved: https://www.stepsecurity.io/blog/mini-shai-hulud-is-back-a-self-spreading-supply-chain-attack-hits-the-npm-ecosystem

:p

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh gross! Understood tho - thanks!

Addresses review feedback: the non-OpenSearch factory was being
instantiated in both create_os_clients and wait_for_rest_api. Cache
on the actor in the former and reuse in the latter.

Signed-off-by: zznate <zznate.m@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Several OpenSearch-hardcoded call sites bypass the DatabaseClient registry introduced in #1023

2 participants