Add rescore support #1062
Pull request overview
Adds support for OpenSearch k-NN disk-based rescoring configuration in VectorSearchPartitionParamSource by allowing an oversample_factor workload parameter to emit a rescore block in the generated k-NN query. This extends the benchmark harness so tracks can measure the latency/accuracy tradeoff when enabling rescoring.
Changes:
- Capture oversample_factor from workload params in VectorSearchPartitionParamSource.
- Inject rescore: { oversample_factor: ... } into the generated k-NN query when provided.
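The two changes listed above can be sketched roughly as follows. This is a standalone illustration rather than the code from the PR: build_knn_query and its arguments are hypothetical names, and placing rescore inside the per-field k-NN clause is an assumption based on the diff under review.

```python
from typing import Any, Dict, List, Optional


def build_knn_query(field_name: str,
                    vector: List[float],
                    k: int,
                    oversample_factor: Optional[float] = None) -> Dict[str, Any]:
    """Build a k-NN query body, adding a rescore block only when requested.

    Hypothetical helper: the real logic lives inside
    VectorSearchPartitionParamSource and may be structured differently.
    """
    clause: Dict[str, Any] = {"vector": vector, "k": k}
    # Mirror the "when provided" behaviour from the overview: an absent
    # oversample_factor leaves the query unchanged.
    if oversample_factor is not None:
        clause["rescore"] = {"oversample_factor": oversample_factor}
    return {"knn": {field_name: clause}}


if __name__ == "__main__":
    print(build_knn_query("target_field", [0.1, 0.2], k=10))
    print(build_knn_query("target_field", [0.1, 0.2], k=10, oversample_factor=2.0))
```

Keeping the rescore block optional means existing workloads that never set oversample_factor should produce the same query as before.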
operation_type = parse_string_parameter(self.PARAMS_NAME_OPERATION_TYPE, params,
                                        self.PARAMS_VALUE_VECTOR_SEARCH)
self.oversample_factor = params.get("oversample_factor")
self.query_params = query_params
@Likhoram - can you use class constants similar to self.PARAMS_NAME_NEIGHBORS_DATA_SET_CORPUS when accessing the oversample factor?
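A minimal sketch of what the request above might look like. ParamSourceSketch is a hypothetical stand-in for VectorSearchPartitionParamSource; the constant names match those later listed in the commit message, while their string values are assumed from the literals in the diff.

```python
from typing import Any, Dict, Optional


class ParamSourceSketch:
    """Hypothetical stand-in for VectorSearchPartitionParamSource."""

    # Names taken from the commit message; values assumed from the diff.
    PARAMS_NAME_OVERSAMPLE_FACTOR = "oversample_factor"
    PARAMS_NAME_RESCORE = "rescore"

    def __init__(self, params: Dict[str, Any]) -> None:
        # Read the workload parameter through the class constant instead of
        # repeating the string literal at each call site.
        self.oversample_factor: Optional[float] = params.get(
            self.PARAMS_NAME_OVERSAMPLE_FACTOR)


if __name__ == "__main__":
    source = ParamSourceSketch({"oversample_factor": 3.0})
    print(source.oversample_factor)
```

With the key defined once, a rename only touches the constant, and tests can reference the same name as the production code.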
# Add rescore parameter if oversample_factor is specified
if self.oversample_factor is not None:
    query["rescore"] = {
        "oversample_factor": self.oversample_factor
    }
Use string constants for rescore and oversample factor
Also add unit tests (UTs) for this change. Please refer to 1bf7c2d
})

# Add rescore parameter if oversample_factor is specified
if self.oversample_factor is not None:
This will allow oversample_factor to be 0. Can we use if self.oversample_factor: instead?
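The difference between the two checks can be seen in a small comparison. The helper names below are hypothetical and exist only for this illustration; they mirror the two conditions discussed in the comment.

```python
from typing import Any, List, Tuple

# Candidate values a workload might supply for oversample_factor.
CANDIDATES = [None, 0, 0.0, 1.5, 3]


def emits_rescore_explicit(value: Any) -> bool:
    """Mirrors `if value is not None:` from the original diff."""
    return value is not None


def emits_rescore_truthy(value: Any) -> bool:
    """Mirrors the suggested `if value:` check."""
    return bool(value)


def compare(values: List[Any]) -> List[Tuple[Any, bool, bool]]:
    """Return (value, explicit_result, truthy_result) for each input."""
    return [(v, emits_rescore_explicit(v), emits_rescore_truthy(v))
            for v in values]


if __name__ == "__main__":
    for value, explicit, truthy in compare(CANDIDATES):
        print(f"{value!r}: is-not-None={explicit}, truthy={truthy}")
```

Only 0 and 0.0 behave differently: the explicit check would emit a rescore block for them, while the truthy check skips it. Neither check rejects negative values, so range validation would need to be handled separately if it matters.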
Can you sign off the commit as well?
- Add PARAMS_NAME_OVERSAMPLE_FACTOR and PARAMS_NAME_RESCORE class constants
- Add oversample_factor parameter to __init__ method using class constant
- Add rescore block to k-NN query body when oversample_factor is specified
- Use truthy check to reject invalid values (0, None)
- Add unit tests for rescore query building with and without oversample_factor
- Enables disk-based vector search rescoring for benchmarks

Signed-off-by: Wenxin Li <liwenxin@amazon.com>
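The unit tests mentioned in the commit message might look something like the sketch below. It is not the test code from the PR: add_rescore is a hypothetical helper that applies the truthy check to a query clause, and the test names are illustrative.

```python
import unittest
from typing import Any, Dict, Optional

# Assumed values for the class constants named in the commit message.
PARAMS_NAME_RESCORE = "rescore"
PARAMS_NAME_OVERSAMPLE_FACTOR = "oversample_factor"


def add_rescore(clause: Dict[str, Any],
                oversample_factor: Optional[float]) -> Dict[str, Any]:
    """Return a copy of clause with a rescore block when the factor is truthy."""
    result = dict(clause)
    if oversample_factor:  # skips both None and 0
        result[PARAMS_NAME_RESCORE] = {
            PARAMS_NAME_OVERSAMPLE_FACTOR: oversample_factor
        }
    return result


class RescoreQueryTest(unittest.TestCase):
    def setUp(self) -> None:
        self.clause = {"vector": [0.1, 0.2], "k": 10}

    def test_with_oversample_factor(self) -> None:
        query = add_rescore(self.clause, 2.0)
        self.assertEqual(query["rescore"], {"oversample_factor": 2.0})

    def test_without_oversample_factor(self) -> None:
        self.assertNotIn("rescore", add_rescore(self.clause, None))

    def test_zero_oversample_factor_is_ignored(self) -> None:
        self.assertNotIn("rescore", add_rescore(self.clause, 0))
```

Saved to a file, the tests can be run with python -m unittest. The third case pins down the behaviour requested in review, where a value of 0 is treated the same as an absent parameter.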
Description
Add rescore parameter support to VectorSearchPartitionParamSource
When oversample_factor is specified in workload params, injects a rescore block into the k-NN query body. This allows benchmarking the latency/recall tradeoff of disk-based vector search with different oversample factor values.
Includes unit tests.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.