add blueprint and tutorial for pre and post process function of Bedrock Rerank API (#3254) #3352

tkykenmt · 2025-01-09T04:19:45Z

Description

Amazon Bedrock has introduced Rerank model support. OpenSearch can already invoke Rerank models on Bedrock through custom pre- and post-processing functions, but pre-built functions are better for performance. This PR adds a blueprint and a tutorial that illustrate how to use these pre-built process functions.

Related Issues

Resolves #3254

Check List

New functionality includes testing.
New functionality has been documented.
API changes companion pull request created.
Commits are signed per the DCO using --signoff.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


ylwu-amzn · 2025-01-09T08:21:29Z

docs/tutorials/rerank/rerank_pipeline_with_Bedrock_Rerank_model.md

+
+A [reranking pipeline](https://opensearch.org/docs/latest/search-plugins/search-relevance/reranking-search-results/) can rerank search results, providing a relevance score for each document in the search results with respect to the search query. The relevance score is calculated by a cross-encoder model.
+
+This tutorial illustrates using the [Amazon Bedrock Rerank API](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_Rerank.html) to rerank search results using a model hosted on Amazon Bedrock.


Can you explain how this differs from https://github.com/opensearch-project/ml-commons/blob/main/docs/tutorials/rerank/rerank_pipeline_with_Amazon_Rerank_model_on_Amazon_Bedrock.md ?

I see both can use the same model, amazon.rerank-v1:0. Which tutorial should customers follow? Any preference?

The previous blueprint requires custom pre- and post-processing functions and calls the Bedrock invoke API, for which users need to set model-specific parameters. The new blueprint doesn't require custom function code. In addition, it adopts the Rerank API, which lets users rerank by specifying only common parameters that are independent of the model. Users can also switch to another model just by changing the model ID.
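To make the "common parameters" point concrete, here is a minimal sketch of a request body shaped like the Bedrock Rerank API input. The helper name `build_rerank_request` and the ARN strings are hypothetical placeholders, not part of the blueprint, and the pre-built `connector.pre_process.bedrock.rerank` function may assemble the body differently.

```python
def build_rerank_request(query_text, documents, model_arn, number_of_results=None):
    """Build a dict shaped like a Bedrock Rerank API request body.

    Illustrative helper only; it is not taken from the blueprint.
    """
    if number_of_results is None:
        number_of_results = len(documents)
    return {
        # One text query that every document is scored against.
        "queries": [{"type": "TEXT", "textQuery": {"text": query_text}}],
        # Inline text documents, in the order they were retrieved.
        "sources": [
            {
                "type": "INLINE",
                "inlineDocumentSource": {
                    "type": "TEXT",
                    "textDocument": {"text": doc},
                },
            }
            for doc in documents
        ],
        # The model is the only model-specific part of the body.
        "rerankingConfiguration": {
            "type": "BEDROCK_RERANKING_MODEL",
            "bedrockRerankingConfiguration": {
                "numberOfResults": number_of_results,
                "modelConfiguration": {"modelArn": model_arn},
            },
        },
    }


# Placeholder ARNs (assumed region and account-less foundation-model form).
AMAZON_ARN = "arn:aws:bedrock:us-west-2::foundation-model/amazon.rerank-v1:0"
OTHER_ARN = "arn:aws:bedrock:us-west-2::foundation-model/another-rerank-model"

docs = ["Carson City is the capital of Nevada.", "Washington, D.C. is the capital of the United States."]
request_a = build_rerank_request("What is the capital city of America?", docs, AMAZON_ARN)
request_b = build_rerank_request("What is the capital city of America?", docs, OTHER_ARN)
```

In this sketch, `request_a` and `request_b` differ only in `modelArn`, which is the sense in which switching models needs no other changes.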

brianf-aws

I agree. I left some minor comments. Can you confirm that you are able to get the results you need by following the tutorial?

brianf-aws · 2025-01-09T18:25:33Z

docs/tutorials/rerank/rerank_pipeline_with_Bedrock_Rerank_model.md

+{
+  "query": {
+    "match": {
+      "passage_text": "What is the capital city of America?"


Maybe we don't need to match by text. We could just use match_all, since you are already passing the text in the rerank context. Please let me know if I am misunderstanding your thought process.

A match_all query doesn't calculate a score based on the query text and the target documents, so I believe a match query is necessary to show why text search without reranking doesn't work well.

brianf-aws · 2025-01-28T23:36:20Z

docs/tutorials/rerank/rerank_pipeline_with_Bedrock_Rerank_model.md

+        "x-amz-content-sha256": "required",
+        "content-type": "application/json"
+      },
+      "pre_process_function": "connector.pre_process.bedrock.rerank",


nit: let's put the functions together (i.e., the pre and post).

Thanks, I'm going to place the post function under the pre function as follows:

"pre_process_function": "connector.pre_process.bedrock.rerank", "post_process_function": "connector.post_process.bedrock.rerank"

updated on 3f2a4f5

brianf-aws · 2025-01-28T23:41:36Z

docs/tutorials/rerank/rerank_pipeline_with_Bedrock_Rerank_model.md

+      },
+      {
+        "_index": "my-test-data",
+        "_id": "1",


I see the documents are indexed starting from 1, but the rerank results are indexed starting from 0. Did you face any issues?

The document ID can differ from the index in the Rerank API result. It can also be a string of alphabetic characters or a UUID. The Rerank API doesn't refer to the document ID when reranking.
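A minimal sketch of why the two numbering schemes don't conflict, assuming a rerank response whose `results` entries carry an `index` (position in the submitted document list) and a `relevanceScore`. The helper `apply_rerank` and the sample hits are hypothetical and are not how ml-commons implements the post-process function.

```python
def apply_rerank(hits, rerank_results):
    """Reorder search hits using rerank results.

    `index` refers to the position of a document in the list that was
    sent for reranking, never to the document's `_id`.
    """
    ordered = sorted(rerank_results, key=lambda r: r["relevanceScore"], reverse=True)
    reranked = []
    for result in ordered:
        hit = dict(hits[result["index"]])  # copy so the input stays unchanged
        hit["_score"] = result["relevanceScore"]
        reranked.append(hit)
    return reranked


# Document IDs can be numbers as strings, arbitrary strings, or UUID-like values.
hits = [
    {"_id": "1", "_score": 2.5},
    {"_id": "doc-b", "_score": 2.1},
    {"_id": "c0ffee-uuid", "_score": 1.3},
]
rerank_results = [
    {"index": 2, "relevanceScore": 0.9},
    {"index": 0, "relevanceScore": 0.5},
    {"index": 1, "relevanceScore": 0.1},
]
reranked_ids = [hit["_id"] for hit in apply_rerank(hits, rerank_results)]
```

Here position 0 in the rerank output maps to the hit with `_id` "1" only because that hit was submitted first.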

brianf-aws · 2025-01-28T23:42:51Z

docs/tutorials/rerank/rerank_pipeline_with_Bedrock_Rerank_model.md

+  "ext": {
+    "rerank": {
+      "query_context": {
+         "query_text_path": "query.match.passage_text.query"


Also, we could just put the question here and use match_all at the top for more flexibility.

As per my previous comment, I think match_all with reranking doesn't align with real use cases.

A match_all query doesn't calculate a score based on the query text and the target documents, so I believe a match query is necessary to show why text search without reranking doesn't work well.

Reranking is a high-cost operation, so it should be invoked on the filtered result of a text query. I think using a filter query instead of a match query is an appropriate approach, because the score calculated on the OpenSearch side may be ignored when reranking is enabled.

updated on 3f2a4f5

brianf-aws · 2025-01-28T23:43:25Z

docs/tutorials/rerank/rerank_pipeline_with_Bedrock_Rerank_model.md

+  "highlight": {
+    "pre_tags": ["<strong>"],
+    "post_tags": ["</strong>"],
+    "fields": {"passage_text": {}}


Can you help me understand why you introduced highlight?

I used highlight to emphasize the query result, but it isn't required to explain the reranking feature. I'll remove the highlight option.

updated on 3f2a4f5


codecov · 2025-01-29T18:58:04Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.76%. Comparing base (f28bb74) to head (3f2a4f5).
Report is 188 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #3352      +/-   ##
============================================
- Coverage     81.31%   80.76%   -0.56%     
- Complexity     6094     6574     +480     
============================================
  Files           573      598      +25     
  Lines         25268    27957    +2689     
  Branches       2666     3072     +406     
============================================
+ Hits          20547    22579    +2032     
- Misses         3601     4062     +461     
- Partials       1120     1316     +196

| Flag | Coverage Δ | |
|---|---|---|
| ml-commons | `80.76% <ø> (-0.56%)` | ⬇️ |

Flags with carried forward coverage won't be shown.

brianf-aws

Hi @tkykenmt, do you mind looking at my comments? I ran your changes on OpenSearch 2.19, but some calls didn't succeed.

We merged your code changes in 2.19 and I was able to get through some of your API calls but not all. Thank you for the contribution!

brianf-aws · 2025-02-07T01:25:57Z

docs/tutorials/rerank/rerank_pipeline_with_Bedrock_Rerank_model.md

+```json
+POST my-test-data/_search?search_pipeline=rerank_pipeline_bedrock
+{
+  "filter": {


Hey, when running this query I got the following error:

{ "error": { "root_cause": [ { "type": "parsing_exception", "reason": "Unknown key for a START_OBJECT in [filter].", "line": 2, "col": 13 } ], "type": "parsing_exception", "reason": "Unknown key for a START_OBJECT in [filter].", "line": 2, "col": 13 }, "status": 400 }

`filter` should be replaced with `query`. Fixed in 654b8a3.
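The parsing error above is consistent with `filter` appearing as a top-level key of the search body, where the query DSL expects `query`. A sketch of a body that keeps the filter-context idea by nesting the clause under `query.bool.filter` follows. The helper name is hypothetical, the field and text values come from the tutorial excerpts in this thread, and the exact body committed in 654b8a3 may differ.

```python
import json


def build_filter_context_search(field, text):
    """Build a search body that filters on `field` without scoring.

    The filter clause lives under query.bool.filter rather than at the
    top level of the body. The rerank query text is passed directly via
    `query_text`, so no path into the query needs to be resolved.
    """
    return {
        "query": {"bool": {"filter": [{"match": {field: text}}]}},
        "ext": {"rerank": {"query_context": {"query_text": text}}},
    }


body = build_filter_context_search("passage_text", "What is the capital city of America?")
print(json.dumps(body, indent=2))
```

A body like this would be sent to `POST my-test-data/_search?search_pipeline=rerank_pipeline_bedrock`, as in the tutorial.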

brianf-aws · 2025-02-07T01:31:13Z

docs/tutorials/rerank/rerank_pipeline_with_Bedrock_Rerank_model.md

+Note: If you don't use score calculated by OpenSearch, you can optimize query latency to use filter context instead. It skips score calculation on OpenSearch side:
+
+```json
+POST my-test-data/_search?search_pipeline=rerank_pipeline_bedrock


For this query I got the following error

{ "error": { "root_cause": [ { "type": "illegal_argument_exception", "reason": "query_text_path must point to a string field" } ], "type": "illegal_argument_exception", "reason": "query_text_path must point to a string field" }, "status": 400 }

not sure if this query works

Sorry for the broken blueprint. I'd like to fix these errors ASAP.

The value of query_text_path was wrong. Fixed in 654b8a3.
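To illustrate how a wrong `query_text_path` can produce this error, here is a simplified resolver that walks a dotted path through a request body and requires a string at the end. It is an illustrative approximation, not OpenSearch's implementation, and the two sample bodies are assumptions based on the excerpts quoted in this thread.

```python
def resolve_query_text(body, path):
    """Follow a dotted path through nested dicts and return the string found.

    Simplified: assumes no key in the path itself contains a dot.
    """
    node = body
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"path segment '{key}' not found in '{path}'")
        node = node[key]
    if not isinstance(node, str):
        raise ValueError("query_text_path must point to a string field")
    return node


QUESTION = "What is the capital city of America?"

# Shorthand match: the field maps directly to the query string.
shorthand_body = {"query": {"match": {"passage_text": QUESTION}}}

# Expanded match: the query string sits one level deeper, under "query".
expanded_body = {"query": {"match": {"passage_text": {"query": QUESTION}}}}

text_from_shorthand = resolve_query_text(shorthand_body, "query.match.passage_text")
text_from_expanded = resolve_query_text(expanded_body, "query.match.passage_text.query")
```

Pointing `query.match.passage_text` at the expanded form lands on an object rather than a string, which is the kind of mismatch the error message describes.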

brianf-aws · 2025-02-07T01:46:19Z

Also, can we add a warning in the tutorial telling users to check where this model is available? I spent about 20 minutes trying to figure out why I couldn't access or see the model in my AWS account.

tkykenmt · 2025-02-10T06:33:16Z

@brianf-aws Thank you for pointing out the need to check model accessibility. I'll add a warning about it.


tkykenmt · 2025-02-10T06:49:00Z

Added guidance on checking model access settings on Bedrock in 654b8a3.

add blueprint and tutorial for pre and post process function of Bedro…

6d2d5a9

…ck Rerank API (opensearch-project#3254) Signed-off-by: tkykenmt <[email protected]>

tkykenmt requested review from b4sjoo, dhrubo-os, jngz-es, model-collapse, rbhavna, ylwu-amzn, zane-neo, Zhangxunmt, austintlee, HenryL27 and xinyual as code owners January 9, 2025 04:19

tkykenmt had a problem deploying to ml-commons-cicd-env-require-approval January 9, 2025 04:20 — with GitHub Actions Failure

tkykenmt temporarily deployed to ml-commons-cicd-env-require-approval January 9, 2025 04:20 — with GitHub Actions Inactive

tkykenmt mentioned this pull request Jan 9, 2025

Add pre and post process functions for Bedrock Rerank API #3254 #3339

Merged

5 tasks

ylwu-amzn reviewed Jan 9, 2025


tkykenmt temporarily deployed to ml-commons-cicd-env-require-approval January 10, 2025 01:39 — with GitHub Actions Inactive

tkykenmt requested a review from ylwu-amzn January 14, 2025 06:09

brianf-aws suggested changes Jan 28, 2025


modified Bedrock rerank model tutorial to reflect review comment open…

3f2a4f5

…search-project#3352 Signed-off-by: tkykenmt <[email protected]>

tkykenmt requested a review from mingshl as a code owner January 29, 2025 09:28

tkykenmt requested a review from brianf-aws January 29, 2025 09:29

tkykenmt temporarily deployed to ml-commons-cicd-env-require-approval January 29, 2025 09:29 — with GitHub Actions Inactive

tkykenmt requested a deployment to ml-commons-cicd-env-require-approval January 29, 2025 18:58 — with GitHub Actions Waiting

brianf-aws suggested changes Feb 7, 2025


fix bedrock rerank blueprint opensearch-project#3352

654b8a3

Signed-off-by: tkykenmt <[email protected]>

tkykenmt had a problem deploying to ml-commons-cicd-env-require-approval February 10, 2025 06:48 — with GitHub Actions Failure

tkykenmt requested a review from brianf-aws February 10, 2025 06:49
