Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add blueprint and tutorial for pre and post process function of Bedrock Rerank API (#3254) #3352

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

tkykenmt
Copy link
Contributor

@tkykenmt tkykenmt commented Jan 9, 2025

Description

Amazon Bedrock introduced Rerank model support. OpenSearch can invoke Rerank models on Bedrock by writing custom pre/post processing function, but pre-built function is good for performance. This PR is for adding blueprint and tutorials to illustrate how to use these process functions.

Related Issues

Resolves #3254

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


A [reranking pipeline](https://opensearch.org/docs/latest/search-plugins/search-relevance/reranking-search-results/) can rerank search results, providing a relevance score for each document in the search results with respect to the search query. The relevance score is calculated by a cross-encoder model.

This tutorial illustrates using the [Amazon Bedrock Rerank API](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_Rerank.html) to rerank search results using a model hosted on Amazon Bedrock.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the difference with https://github.com/opensearch-project/ml-commons/blob/main/docs/tutorials/rerank/rerank_pipeline_with_Amazon_Rerank_model_on_Amazon_Bedrock.md ?

I see both can use same model amazon.rerank-v1:0. Which tutorial cx should follow ? Any preference ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous blueprint requires custom pre and post functions, and calls bedrock invoke API. For invoke API, users need to set model-specific parameters. New blueprint doesn't require custom function code. In addition, new blueprint adopts rerank API. By using Rerank API, users can perform reranking simply by specifying common parameters that are independent of the model. Users can also switch to another model by just changing the model ID.

@tkykenmt tkykenmt temporarily deployed to ml-commons-cicd-env-require-approval January 10, 2025 01:39 — with GitHub Actions Inactive
@tkykenmt tkykenmt requested a review from ylwu-amzn January 14, 2025 06:09
Copy link
Contributor

@brianf-aws brianf-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree left some minor comments. Can you confirm you are able to get the results you need following the tutorial?

{
"query": {
"match": {
"passage_text": "What is the capital city of America?"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we dont need to match by text we could just match all since you are already doing it in the rerank context? Please let me know if I am misunderstanding your thought process

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match all may not calculate score based on query text and target documents. So I believe match query is necessary to explain why text search without reranking does not work well

"x-amz-content-sha256": "required",
"content-type": "application/json"
},
"pre_process_function": "connector.pre_process.bedrock.rerank",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: lets put the functions together (i.e. the pre and post)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'm going to place the post function under the pre function as follows

      "pre_process_function": "connector.pre_process.bedrock.rerank",
      "post_process_function": "connector.post_process.bedrock.rerank"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated on 3f2a4f5

},
{
"_index": "my-test-data",
"_id": "1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this is indexed based on 1 but the rerank returned based on 0. did you face any issues?

Copy link
Contributor Author

@tkykenmt tkykenmt Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document id can be difference from index of rerank API result. It can also be a string of alphabetic characters or a UUID. Reranking API does not refer document id when reranking.

"ext": {
"rerank": {
"query_context": {
"query_text_path": "query.match.passage_text.query"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we can just put it in the question here and have a match all on the top for more flexibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my previous comment, I think match all with reranking does not align real usecase.

match all may not calculate score based on query text and target documents. So I believe match query is necessary to explain why text search without reranking does not work well

Reranking is high-cost operation. It should be invoked with filtered result of text query. I think that using filter query instead of match query is appropriate approach because calculated score on OpenSearch side may be ignored if reranking is enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated on 3f2a4f5

Comment on lines 695 to 698
"highlight": {
"pre_tags": ["<strong>"],
"post_tags": ["</strong>"],
"fields": {"passage_text": {}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand why you introduced highlight?

Copy link
Contributor Author

@tkykenmt tkykenmt Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used highlight to emphasize a result of query, but it isn't required to explain reranking feature. I'll remove highlight option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated on 3f2a4f5

@tkykenmt tkykenmt requested a review from mingshl as a code owner January 29, 2025 09:28
@tkykenmt tkykenmt requested a review from brianf-aws January 29, 2025 09:29
@tkykenmt tkykenmt temporarily deployed to ml-commons-cicd-env-require-approval January 29, 2025 09:29 — with GitHub Actions Inactive
@tkykenmt tkykenmt temporarily deployed to ml-commons-cicd-env-require-approval January 29, 2025 09:29 — with GitHub Actions Inactive
Copy link

codecov bot commented Jan 29, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.76%. Comparing base (f28bb74) to head (3f2a4f5).
Report is 188 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3352      +/-   ##
============================================
- Coverage     81.31%   80.76%   -0.56%     
- Complexity     6094     6574     +480     
============================================
  Files           573      598      +25     
  Lines         25268    27957    +2689     
  Branches       2666     3072     +406     
============================================
+ Hits          20547    22579    +2032     
- Misses         3601     4062     +461     
- Partials       1120     1316     +196     
Flag Coverage Δ
ml-commons 80.76% <ø> (-0.56%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tkykenmt tkykenmt requested a deployment to ml-commons-cicd-env-require-approval January 29, 2025 18:58 — with GitHub Actions Waiting
Copy link
Contributor

@brianf-aws brianf-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @tkykenmt , do you mind looking at my comments? I ran your changes on OS. 2.19 but some calls didn't succeed.

We merged your code changes in 2.19 and I was able to get through some of your API calls but not all. Thank you for the contribution!

```json
POST my-test-data/_search?search_pipeline=rerank_pipeline_bedrock
{
"filter": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey when running this query I got the following error

{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "Unknown key for a START_OBJECT in [filter].",
        "line": 2,
        "col": 13
      }
    ],
    "type": "parsing_exception",
    "reason": "Unknown key for a START_OBJECT in [filter].",
    "line": 2,
    "col": 13
  },
  "status": 400
}

Copy link
Contributor Author

@tkykenmt tkykenmt Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filter should be replaced to query. fixed on 654b8a3

Note: If you don't use score calculated by OpenSearch, you can optimize query latency to use filter context instead. It skips score calculation on OpenSearch side:

```json
POST my-test-data/_search?search_pipeline=rerank_pipeline_bedrock
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this query I got the following error

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "query_text_path must point to a string field"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "query_text_path must point to a string field"
  },
  "status": 400
}

not sure if this query works

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for broken blueprint... I'd like to fix this errors ASAP.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value of query_text_path was wrong. fixed on 654b8a3

@brianf-aws
Copy link
Contributor

Also can we give a warning to users in the tutorial to make sure to check where you can get this model? I spent about 20ish mins trying to figure out why I couldnt access/see the model on my AWS account

@tkykenmt
Copy link
Contributor Author

@brianf-aws Thank you for pointing out to check model-accessibility. I'll add a warning to do it.

@tkykenmt tkykenmt had a problem deploying to ml-commons-cicd-env-require-approval February 10, 2025 06:48 — with GitHub Actions Failure
@tkykenmt tkykenmt had a problem deploying to ml-commons-cicd-env-require-approval February 10, 2025 06:48 — with GitHub Actions Failure
@tkykenmt
Copy link
Contributor Author

Added guidance to check model access settings on Bedrock on 654b8a3

@tkykenmt tkykenmt requested a review from brianf-aws February 10, 2025 06:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Implement BedrockRerank[Pre|Post]ProcessFunction
3 participants