add blueprint and tutorial for pre and post process function of Bedrock Rerank API (#3254) #3352
…ck Rerank API (opensearch-project#3254) Signed-off-by: tkykenmt <[email protected]>
A [reranking pipeline](https://opensearch.org/docs/latest/search-plugins/search-relevance/reranking-search-results/) can rerank search results, providing a relevance score for each document in the search results with respect to the search query. The relevance score is calculated by a cross-encoder model.

This tutorial illustrates using the [Amazon Bedrock Rerank API](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_Rerank.html) to rerank search results using a model hosted on Amazon Bedrock.
Can you explain the difference from https://github.com/opensearch-project/ml-commons/blob/main/docs/tutorials/rerank/rerank_pipeline_with_Amazon_Rerank_model_on_Amazon_Bedrock.md?
I see both can use the same model, `amazon.rerank-v1:0`. Which tutorial should customers follow? Any preference?
The previous blueprint requires custom pre- and post-processing functions and calls the Bedrock invoke API, for which users need to set model-specific parameters. The new blueprint doesn't require custom function code and adopts the Rerank API instead. With the Rerank API, users can rerank simply by specifying common parameters that are independent of the model, and they can switch to another model just by changing the model ID.
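To make "common parameters that are independent of the model" concrete, here is a minimal Python sketch that only builds a request body in the shape described by the Amazon Bedrock Rerank API reference linked in the tutorial. It makes no network call. The helper name `build_rerank_request`, the region in the ARN prefix, the sample passages, and the second model ID are assumptions for illustration; please verify field names against the API reference before relying on them.

```python
import json


def build_rerank_request(query, documents, model_arn, top_n=None):
    """Build a request body shaped like the Amazon Bedrock Rerank API input."""
    return {
        # The query text that every document is scored against.
        "queries": [{"type": "TEXT", "textQuery": {"text": query}}],
        # Candidate documents, passed inline as plain text.
        "sources": [
            {
                "type": "INLINE",
                "inlineDocumentSource": {
                    "type": "TEXT",
                    "textDocument": {"text": doc},
                },
            }
            for doc in documents
        ],
        "rerankingConfiguration": {
            "type": "BEDROCK_RERANKING_MODEL",
            "bedrockRerankingConfiguration": {
                "numberOfResults": top_n if top_n is not None else len(documents),
                # The model ARN is the only model-specific value in the body.
                "modelConfiguration": {"modelArn": model_arn},
            },
        },
    }


# Hypothetical region, passages, and second model ID, for illustration only.
ARN_PREFIX = "arn:aws:bedrock:us-west-2::foundation-model/"
QUESTION = "What is the capital city of America?"
DOCS = [
    "Carson City is the capital city of the American state of Nevada.",
    "Washington, D.C. is the capital of the United States.",
]

amazon_body = build_rerank_request(QUESTION, DOCS, ARN_PREFIX + "amazon.rerank-v1:0")
other_body = build_rerank_request(QUESTION, DOCS, ARN_PREFIX + "cohere.rerank-v3-5:0")

print(json.dumps(amazon_body, indent=2))
```

In this sketch the two bodies differ only in `modelArn`, which is the point of the reply above: switching models does not change the rest of the request.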
I agree. I left some minor comments. Can you confirm that you are able to get the results you need by following the tutorial?
{
  "query": {
    "match": {
      "passage_text": "What is the capital city of America?"
Maybe we don't need to match by text. We could just use match all, since you are already supplying the text in the rerank context? Please let me know if I am misunderstanding your thought process.
A `match_all` query does not calculate a score based on the query text and the target documents, so I believe a `match` query is necessary to explain why text search without reranking does not work well.
  "x-amz-content-sha256": "required",
  "content-type": "application/json"
},
"pre_process_function": "connector.pre_process.bedrock.rerank",
nit: let's put the functions together (i.e., the pre and post functions).
Thanks, I'm going to place the post function directly under the pre function, as follows:
"pre_process_function": "connector.pre_process.bedrock.rerank",
"post_process_function": "connector.post_process.bedrock.rerank"
Updated in 3f2a4f5.
},
{
  "_index": "my-test-data",
  "_id": "1",
I see that the documents are indexed starting from 1, but the rerank results are indexed starting from 0. Did you face any issues?
The document ID can differ from the index in the Rerank API result. It can also be a string of alphabetic characters or a UUID. The Rerank API does not refer to the document ID when reranking.
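The point in the reply above can be sketched in a few lines of Python: rerank results refer to documents by their zero-based position in the list that was sent, so mapping them back to hits never touches `_id`. This is an illustrative sketch, not the code of the built-in post-process function; the helper name `apply_rerank` and the sample hits and scores are made up, and the `index` and `relevanceScore` field names follow the Rerank API response as I understand it.

```python
def apply_rerank(hits, rerank_results):
    """Reorder search hits using position-based rerank results."""
    reranked = []
    for result in rerank_results:
        # "index" is the position of the document in the submitted list,
        # not the OpenSearch document ID.
        hit = dict(hits[result["index"]])
        hit["_score"] = result["relevanceScore"]
        reranked.append(hit)
    # Highest relevance first.
    reranked.sort(key=lambda h: h["_score"], reverse=True)
    return reranked


# Document IDs may start at 1, or be arbitrary strings such as UUIDs.
hits = [
    {"_id": "1", "_score": 2.5},
    {"_id": "doc-b", "_score": 2.1},
    {"_id": "3f0c9a1e-0000-4000-8000-000000000000", "_score": 1.7},
]
# Made-up rerank output: positions 0..2 with new scores.
rerank_results = [
    {"index": 2, "relevanceScore": 0.91},
    {"index": 0, "relevanceScore": 0.40},
    {"index": 1, "relevanceScore": 0.05},
]

for hit in apply_rerank(hits, rerank_results):
    print(hit["_id"], hit["_score"])
```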
"ext": { | ||
"rerank": { | ||
"query_context": { | ||
"query_text_path": "query.match.passage_text.query" |
Also, we could just put the question here and use a match-all query at the top for more flexibility.
As per my previous comment, I don't think `match_all` with reranking reflects a real use case.
A `match_all` query does not calculate a score based on the query text and the target documents, so I believe a `match` query is necessary to explain why text search without reranking does not work well.
Reranking is a high-cost operation, so it should be invoked on the filtered result of a text query. I think using a filter query instead of a `match` query is an appropriate approach, because the score calculated on the OpenSearch side may be ignored when reranking is enabled.
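A minimal Python sketch of the filter-context approach described above, under stated assumptions: the helper name `build_filtered_rerank_search` is hypothetical, and the body passes the text through `query_context.query_text`, which the rerank processor accepts as an alternative to `query_text_path`. It is not necessarily the request body used in the tutorial.

```python
import json


def build_filtered_rerank_search(field, query_text):
    """Build a search body that filters by text and leaves scoring to the reranker."""
    return {
        "query": {
            "bool": {
                # Filter context: documents are matched but not scored,
                # so OpenSearch skips the relevance calculation.
                "filter": [{"match": {field: query_text}}]
            }
        },
        "ext": {
            "rerank": {
                # Text handed to the reranker for scoring each candidate.
                "query_context": {"query_text": query_text}
            }
        },
    }


body = build_filtered_rerank_search(
    "passage_text", "What is the capital city of America?"
)
print(json.dumps(body, indent=2))
```

The body would be sent to `my-test-data/_search?search_pipeline=rerank_pipeline_bedrock`, as in the requests quoted in this thread. Note that `filter` sits inside `query.bool`, not at the top level of the request.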
Updated in 3f2a4f5.
"highlight": {
  "pre_tags": ["<strong>"],
  "post_tags": ["</strong>"],
  "fields": {"passage_text": {}}
Can you help me understand why you introduced highlight?
I used highlight to emphasize the query result, but it isn't required to explain the reranking feature. I'll remove the highlight option.
Updated in 3f2a4f5.
…search-project#3352 Signed-off-by: tkykenmt <[email protected]>
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

Coverage Diff

| | main | #3352 | +/- |
|---|---|---|---|
| Coverage | 81.31% | 80.76% | -0.56% |
| Complexity | 6094 | 6574 | +480 |
| Files | 573 | 598 | +25 |
| Lines | 25268 | 27957 | +2689 |
| Branches | 2666 | 3072 | +406 |
| Hits | 20547 | 22579 | +2032 |
| Misses | 3601 | 4062 | +461 |
| Partials | 1120 | 1316 | +196 |

Flags with carried forward coverage won't be shown.
Hi @tkykenmt, do you mind looking at my comments? I ran your changes on OpenSearch 2.19, but some calls didn't succeed.
We merged your code changes in 2.19, and I was able to get through some of your API calls but not all. Thank you for the contribution!
```json
POST my-test-data/_search?search_pipeline=rerank_pipeline_bedrock
{
  "filter": {
```
Hey, when running this query I got the following error:
{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "Unknown key for a START_OBJECT in [filter].",
        "line": 2,
        "col": 13
      }
    ],
    "type": "parsing_exception",
    "reason": "Unknown key for a START_OBJECT in [filter].",
    "line": 2,
    "col": 13
  },
  "status": 400
}
`filter` should be replaced with `query`. Fixed in 654b8a3.
Note: If you don't use score calculated by OpenSearch, you can optimize query latency to use filter context instead. It skips score calculation on OpenSearch side:

```json
POST my-test-data/_search?search_pipeline=rerank_pipeline_bedrock
```
For this query I got the following error:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "query_text_path must point to a string field"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "query_text_path must point to a string field"
  },
  "status": 400
}
Not sure if this query works.
Sorry for the broken blueprint. I'd like to fix these errors ASAP.
The value of `query_text_path` was wrong. Fixed in 654b8a3.
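For readers hitting the same error, a small Python sketch can check a request body locally before sending it. It walks a dotted `query_text_path` through the body and reports whether it ends at a string. This is only an illustration of the condition named in the error message, not OpenSearch's implementation; the helper names `resolve_path` and `points_to_string` and the sample bodies are hypothetical.

```python
def resolve_path(body, dotted_path):
    """Follow a dotted path through nested dicts; return None if it breaks."""
    node = body
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def points_to_string(body, dotted_path):
    """True only if the path exists and its value is a string."""
    return isinstance(resolve_path(body, dotted_path), str)


QUESTION = "What is the capital city of America?"

# Short-form match query: the field maps directly to the text.
short_form = {"query": {"match": {"passage_text": QUESTION}}}
# Long-form match query: the text lives under a nested "query" key.
long_form = {"query": {"match": {"passage_text": {"query": QUESTION}}}}

path = "query.match.passage_text.query"
print(points_to_string(short_form, path))
print(points_to_string(long_form, path))
print(points_to_string(short_form, "query.match.passage_text"))
```

In this sketch the same path is valid for one shape of the query and invalid for the other, so the path has to be updated whenever the query structure changes, for example when a `match` clause moves under `bool.filter`.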
Also, can we add a warning to the tutorial telling users to check where this model is available? I spent about 20 minutes trying to figure out why I couldn't access or see the model in my AWS account.
@brianf-aws Thank you for pointing out the need to check model accessibility. I'll add a warning about it.
Signed-off-by: tkykenmt <[email protected]>
Added guidance on checking model access settings on Bedrock in 654b8a3.
Description
Amazon Bedrock introduced support for rerank models. OpenSearch can invoke rerank models on Bedrock through custom pre- and post-processing functions, but pre-built functions are better for performance. This PR adds a blueprint and tutorials that illustrate how to use these pre-built processing functions.
Related Issues
Resolves #3254
Check List
--signoff.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.