Skip to content

Add rescore support #1062

Open
Likhoram wants to merge 1 commit into
opensearch-project:mainfrom
Likhoram:rescore-support
Open

Add rescore support #1062
Likhoram wants to merge 1 commit into
opensearch-project:mainfrom
Likhoram:rescore-support

Conversation

@Likhoram
Copy link
Copy Markdown

@Likhoram Likhoram commented May 7, 2026

Description

Add rescore parameter support to VectorSearchPartitionParamSource

When oversample_factor is specified in workload params, injects a rescore block into the k-NN query body. This allows benchmarking the latency/recall tradeoff of disk-based vector search with different oversample factor values.

Includes unit tests.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copilot AI review requested due to automatic review settings May 7, 2026 22:44
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Type Validation Missing

The oversample_factor parameter is retrieved from params without type validation or range checking. If a user passes a non-numeric value (e.g., a string) or an invalid numeric value (e.g., negative number or zero), it will be included in the query body and likely cause a runtime error when the query is executed against the search engine. This can happen when users misconfigure benchmark parameters.

self.oversample_factor = params.get("oversample_factor")

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate oversample_factor parameter value

Validate that oversample_factor is a positive number when provided. Invalid values
could cause query failures or unexpected behavior. Add validation similar to how
other numeric parameters are validated in this class.

osbenchmark/workload/params.py [1115]

-self.oversample_factor = params.get("oversample_factor")
+oversample_factor = params.get("oversample_factor")
+if oversample_factor is not None:
+    if not isinstance(oversample_factor, (int, float)) or oversample_factor <= 0:
+        raise ValueError("oversample_factor must be a positive number")
+self.oversample_factor = oversample_factor
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that oversample_factor should be validated to ensure it's a positive number, preventing potential query failures. However, since this is parameter validation rather than a critical bug fix, and the PR doesn't show evidence of existing validation patterns being violated, it receives a moderate-high score.

Medium

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for OpenSearch k-NN disk-based rescoring configuration in VectorSearchPartitionParamSource by allowing an oversample_factor workload parameter to emit a rescore block in the generated k-NN query. This extends the benchmark harness so tracks can measure the latency/accuracy tradeoff when enabling rescoring.

Changes:

  • Capture oversample_factor from workload params in VectorSearchPartitionParamSource.
  • Inject rescore: { oversample_factor: ... } into the generated k-NN query when provided.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 1113 to 1116
operation_type = parse_string_parameter(self.PARAMS_NAME_OPERATION_TYPE, params,
self.PARAMS_VALUE_VECTOR_SEARCH)
self.oversample_factor = params.get("oversample_factor")
self.query_params = query_params
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Likhoram - can you use class constants similar to self.PARAMS_NAME_NEIGHBORS_DATA_SET_CORPUS when accessing the oversample factor?

Comment thread osbenchmark/workload/params.py Outdated
Comment on lines +1235 to +1239
# Add rescore parameter if oversample_factor is specified
if self.oversample_factor is not None:
query["rescore"] = {
"oversample_factor": self.oversample_factor
}
Comment thread osbenchmark/workload/params.py Outdated
Comment on lines +1235 to +1239
# Add rescore parameter if oversample_factor is specified
if self.oversample_factor is not None:
query["rescore"] = {
"oversample_factor": self.oversample_factor
}
Comment on lines 1113 to 1116
operation_type = parse_string_parameter(self.PARAMS_NAME_OPERATION_TYPE, params,
self.PARAMS_VALUE_VECTOR_SEARCH)
self.oversample_factor = params.get("oversample_factor")
self.query_params = query_params
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Likhoram - can you use class constants similar to self.PARAMS_NAME_NEIGHBORS_DATA_SET_CORPUS when accessing the oversample factor?

Comment thread osbenchmark/workload/params.py Outdated
# Add rescore parameter if oversample_factor is specified
if self.oversample_factor is not None:
query["rescore"] = {
"oversample_factor": self.oversample_factor
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use string constants for rescore and oversample factor

if self.oversample_factor is not None:
query["rescore"] = {
"oversample_factor": self.oversample_factor
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add in UTs for this change. Please refer to 1bf7c2d

Comment thread osbenchmark/workload/params.py Outdated
})

# Add rescore parameter if oversample_factor is specified
if self.oversample_factor is not None:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will allow oversample_factor to be 0, can we add if self.oversample_factor: instead

@VijayanB
Copy link
Copy Markdown
Member

VijayanB commented May 8, 2026

can you signoff the commit as well?

- Add PARAMS_NAME_OVERSAMPLE_FACTOR and PARAMS_NAME_RESCORE class constants
- Add oversample_factor parameter to __init__ method using class constant
- Add rescore block to k-NN query body when oversample_factor is specified
- Use truthy check to reject invalid values (0, None)
- Add unit tests for rescore query building with and without oversample_factor
- Enables disk-based vector search rescoring for benchmarks

Signed-off-by: Wenxin Li <liwenxin@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants