Conversation

@dbwiddis
Member

WIP: Still need to address bulk response

Description

Avoids serializing and deserializing responses in the (default) client. This prevents parsing issues as well as loss of runtime type information from subclasses.

Issues Resolved

Alternate solution to resolve #132

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dhrubo-os
Contributor

I assume from the consumer side like ML-Common/Flow framework we don't have to make any changes based on this, right?

@dbwiddis
Member Author

> I assume from the consumer side like ML-Common/Flow framework we don't have to make any changes based on this, right?

I've made an effort to be backwards compatible for the current implementation. I'm still debating whether to mark the parser methods as deprecated.

So while you don't "have to" make changes, it would be a Good Thing™️ to replace the pattern:

return response.parser() == null ? null : GetResponse.fromXContent(response.parser());

with

return response.getResponse();
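
The migration described above can be sketched as follows. The types below are simplified stand-ins for the SDK's GetDataObjectResponse and OpenSearch's GetResponse, written only to illustrate the pattern; the real accessor also declares IOException, and the real classes live in opensearch-remote-metadata-sdk and opensearch-core.

```java
// Simplified stand-ins for the SDK types; illustration only, not the real API.
class XContentParser { }

class GetResponse {
    static GetResponse fromXContent(XContentParser parser) {
        return new GetResponse(); // real parsing omitted in this sketch
    }
}

class GetDataObjectResponse {
    private final GetResponse cached;    // typed response from the NodeClient, if any
    private final XContentParser parser; // serialized fallback, if any

    GetDataObjectResponse(GetResponse cached, XContentParser parser) {
        this.cached = cached;
        this.parser = parser;
    }

    XContentParser parser() {
        return parser;
    }

    // New accessor: return the original typed response when present,
    // otherwise fall back to parsing (the real method declares IOException).
    GetResponse getResponse() {
        if (cached != null) {
            return cached; // preserves runtime subclass information
        }
        return parser == null ? null : GetResponse.fromXContent(parser);
    }
}

class Consumer {
    // Old pattern: always re-parse the serialized form, losing subclass info.
    static GetResponse oldPattern(GetDataObjectResponse r) {
        return r.parser() == null ? null : GetResponse.fromXContent(r.parser());
    }

    // New pattern: one call, no lossy serialize/parse round trip.
    static GetResponse newPattern(GetDataObjectResponse r) {
        return r.getResponse();
    }
}
```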

@dhrubo-os
Contributor

> > I assume from the consumer side like ML-Common/Flow framework we don't have to make any changes based on this, right?
>
> I've made an effort to be backwards compatible for the current implementation. I'm still debating whether to mark the parser methods as deprecated.
>
> So while you don't "have to" make changes, it would be a Good Thing™️ to replace the pattern:
>
> return response.parser() == null ? null : GetResponse.fromXContent(response.parser());
>
> with
>
> return response.getResponse();

Will return response.getResponse(); also be compatible with other client implementations, like DDBClient?

@dbwiddis
Member Author

> Will return response.getResponse(); also be compatible with other client implementations, like DDBClient?

Yes, this implementation basically encodes the "if we don't have the NodeClient response, create one from a parser."

The possible IOException will need to be handled (but I think it's already done!)

The only potentially controversial question is whether it is even possible for the response to be null. I don't think so, but the code allows for it.
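
A caller-side sketch of the two points above, handling the declared IOException and defensively null-checking the result. The class names here (SdkGetResult, Caller) are hypothetical stand-ins chosen for illustration, not the SDK's actual types.

```java
import java.io.IOException;

// Stand-in for the typed response; illustration only.
class GetResponse { }

// Stand-in for the SDK wrapper; the real accessor can throw while
// parsing the serialized fallback representation.
class SdkGetResult {
    private final GetResponse response;

    SdkGetResult(GetResponse response) {
        this.response = response;
    }

    GetResponse getResponse() throws IOException {
        return response;
    }
}

class Caller {
    static String handle(SdkGetResult result) {
        try {
            GetResponse response = result.getResponse();
            // Defensive null check: a null response "should never happen,"
            // but the code allows for it.
            return response == null ? "no response" : "ok";
        } catch (IOException e) {
            // The declared IOException must be handled somewhere up the chain.
            return "parse failure: " + e.getMessage();
        }
    }
}
```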

@dbwiddis dbwiddis force-pushed the passthrough-response branch 2 times, most recently from fc86b90 to 80a12b5 Compare April 15, 2025 20:53
@dbwiddis dbwiddis marked this pull request as ready for review April 15, 2025 21:39
@dbwiddis dbwiddis force-pushed the passthrough-response branch 3 times, most recently from 3c4dcc4 to ab9674d Compare April 16, 2025 17:06
@dbwiddis
Member Author

@dhrubo-os @arjunkumargiri this is ready for review

@dbwiddis dbwiddis force-pushed the passthrough-response branch 2 times, most recently from 2bbe751 to 412c8a3 Compare April 16, 2025 23:56
    try {
        return SdkClientUtils.createParser(bulkResponse);
    } catch (IOException | NullPointerException e) {
        return null;
Contributor

What is the cascade effect if we just return null to the downstream?

Member Author

This is one of those “should never happen” situations.

SdkClientUtils null-checks it where it's used.

Otherwise the caller has to null check.

Member Author

The alternative is throwing an unchecked exception the caller doesn't know to catch, or a checked exception that will never happen but that we force the caller to handle.

@dhrubo-os
Contributor

Can we please add more test details in the issue description?

Have we tried an aggregated search with any plugin like FF/ml-commons with these changes?

@dhrubo-os
Contributor

Can we also test bulk, since we made changes there too?

        mapper.serialize(obj, generator);
    }
-   return jsonXContent.createParser(NamedXContentRegistry.EMPTY, DeprecationHandler.IGNORE_DEPRECATIONS, stringWriter.toString());
+   return SdkClientUtils.createParser(stringWriter.toString());
Contributor

I assume the DDB client parsing logic is different, which is why we aren't adding the same changes to the DDB client?

Member Author

Actually DDB client logic is identical and we should make this same change there.

This isn't so much a change as a refactoring, moving the jsonXContent.createParser code to the SdkClientUtils class and delegating the call there.

The reason it was less important for the bug fix is that DDB doesn't implement search.
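
The refactoring described above (each client delegating its inline parser construction to one shared helper) can be sketched roughly as follows. The parser type here is a stand-in so the sketch stays self-contained; the real helper returns an org.opensearch.core.xcontent.XContentParser via jsonXContent.createParser(NamedXContentRegistry.EMPTY, DeprecationHandler.IGNORE_DEPRECATIONS, json), as shown in the diff.

```java
// Stand-in parser type; the real code returns an XContentParser from
// opensearch-core rather than wrapping the raw JSON string.
class ParsedJson {
    final String json;

    ParsedJson(String json) {
        this.json = json;
    }
}

class SdkClientUtils {
    // Shared helper: the duplicated jsonXContent.createParser(...) call from
    // each client implementation moves here, so clients delegate instead of
    // repeating the boilerplate.
    static ParsedJson createParser(String json) {
        return new ParsedJson(json);
    }
}

class LocalClientExample {
    // A client call site after the refactoring: delegate to the shared util.
    static ParsedJson toParser(String serialized) {
        return SdkClientUtils.createParser(serialized);
    }
}
```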

@dbwiddis
Member Author

> Can we also test bulk, since we made changes there too?

There is no change to the functionality of bulk, which is thoroughly unit tested. In fact, the change revealed a bug in the test (wrong type) which became more obvious with the direct pass-through.

@dbwiddis
Member Author

> Can we please add more test details in the issue description?

I started #132 by writing a unit test that failed without the fix and that test is carried over here.

Not clear what you're asking for? A sanity check with Flow Framework? I've done that, as well as run the tenant aware integration tests using the new client.

> Have we tried an aggregated search with any plugin like FF/ml-commons with these changes?

Yes, although there are likely many different combinations.

@dhrubo-os
Contributor

> Not clear what you're asking for? A sanity check with Flow Framework? I've done that, as well as run the tenant aware integration tests using the new client.

Yeah this is what I was asking. If you ran the tenant aware integration tests with the new changes of PR, I'm good with this. Thanks for clarifying.

@dbwiddis
Member Author

> So while you don't "have to" make changes

I should be clearer: you will need to change the search handler's code to the new getResponse() version to fix the aggregation bug in the local client with this PR.

@dbwiddis
Member Author

Added integ test to ML Commons PR opensearch-project/ml-commons#3516.

opensearch-project/ml-commons@0bef7a9 adds an integ test that uses an aggregation to search. This results in the failure stack trace in #132

  2> REPRODUCE WITH: ./gradlew ':opensearch-ml-plugin:integTest' --tests "org.opensearch.ml.rest.RestMLModelTenantAwareIT.testModelCRUD" -Dtests.seed=8663FF7406EA2943 -Dtests.security.manager=false -Dtests.locale=yrl-CO -Dtests.timezone=Australia/Lord_Howe -Druntime.java=21
  2> org.opensearch.client.ResponseException: method [GET], host [http://[::1]:61137], URI [/_plugins/_ml/models/_search], status line [HTTP/1.1 500 Internal Server Error]
    {"error":{"root_cause":[{"type":"status_exception","reason":"Fail to search model version"}],"type":"status_exception","reason":"Fail to search model version"},"status":500}

»  org.opensearch.core.common.ParsingException: Could not parse aggregation keyed as [unique_model_names]
»       at org.opensearch.search.aggregations.Aggregations.fromXContent(Aggregations.java:165) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]

» WARN ][r.suppressed             ] [integTest-0] path: /_plugins/_ml/models/_search, params: {}
»  org.opensearch.OpenSearchStatusException: Fail to search model version

opensearch-project/ml-commons@0e10ea8 updates the calls to SdkClient search to directly return the search response. The integ test passes.

@dbwiddis dbwiddis force-pushed the passthrough-response branch from 412c8a3 to 66a256f Compare April 22, 2025 05:51
@dbwiddis
Member Author

Ran a sanity test against ML Commons PR opensearch-project/ml-commons#3516 branch deployed in OpenSearch 3.0.0-beta1 docker container

Created a connector and registered a model with that connector

POST https://localhost:9200/_plugins/_ml/models/_register
{
  "name": "bedrock:embedding cohere model",
  "function_name": "remote",
  "description": "embedding open ai model",
  "connector_id": "5vYkXJYBhf97Ja-IYBpD"
}
{
    "task_id": "6_YnXJYBhf97Ja-IFhpG",
    "status": "CREATED",
    "model_id": "7PYnXJYBhf97Ja-IFhp9"
}

Deployed the model

POST https://localhost:9200/_plugins/_ml/models/7PYnXJYBhf97Ja-IFhp9/_deploy

{
    "task_id": "7vYtXJYBhf97Ja-I5hqc",
    "task_type": "DEPLOY_MODEL",
    "status": "COMPLETED"
}

Undeployed the model

https://localhost:9200/_plugins/_ml/models/7PYnXJYBhf97Ja-IFhp9/_undeploy

{
    "6vhLaJ3pRDy773z0aZCXPA": {
        "stats": {
            "7PYnXJYBhf97Ja-IFhp9": "undeployed"
        }
    }
}

Undeployed the model again

https://localhost:9200/_plugins/_ml/models/7PYnXJYBhf97Ja-IFhp9/_undeploy

{
    "6vhLaJ3pRDy773z0aZCXPA": {
        "stats": {
            "7PYnXJYBhf97Ja-IFhp9": "not_found"
        }
    }
}

Searched with an aggregation

GET https://localhost:9200/_plugins/_ml/models/_search
{
  "size": 0,
  "aggs": {
    "unique_model_names": {
      "terms": {
        "field": "name.keyword",
        "size": 10000
      }
    }
  }
}

{
    "took": 9,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "unique_model_names": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "bedrock:embedding cohere model",
                    "doc_count": 1
                }
            ]
        }
    }
}

@dbwiddis dbwiddis merged commit 5c7d005 into opensearch-project:main Apr 22, 2025
11 checks passed
@dbwiddis dbwiddis deleted the passthrough-response branch April 22, 2025 17:10
@codecov

codecov bot commented Apr 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (9656e34) to head (66a256f).
Report is 9 commits behind head on main.


@dbwiddis
Member Author

dbwiddis commented Apr 22, 2025

@dhrubo-os thanks for the approval! Some quick follow-up:

  1. I would recommend you cherry-pick opensearch-project/ml-commons@0e10ea8 into a PR on main branch. This will directly pass search response for Local/NodeClient implementations. You may consider doing similar for other wrappers over time, but this one is more "urgent" due to the search aggregation bug.
  2. Without the cherry-pick recommended in (1), the parsing still requires a type hint. We can't pass typed_keys on the ML search REST path because it's not allowed. This may still impact the remote client, and we may want to consider always adding it to the REST path when sending a search request via the client. (Would like to target that for 3.1.0)
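
For context on point (2): typed_keys is OpenSearch's search parameter that prefixes each aggregation name with its type in the response (e.g. sterms#unique_model_names), so a generic parser can recover the concrete aggregation type without a hint. A minimal sketch of the "always add it" idea follows; the request-parameter plumbing here is illustrative, not the SDK's actual code.

```java
import java.util.HashMap;
import java.util.Map;

class SearchRequestParams {
    // Hypothetical helper: copy the caller's query parameters and force
    // typed_keys on, so aggregation types survive the serialize/parse
    // round trip through the client.
    static Map<String, String> withTypedKeys(Map<String, String> params) {
        Map<String, String> copy = new HashMap<>(params);
        copy.put("typed_keys", "true");
        return copy;
    }
}
```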

@dbwiddis dbwiddis added the backport 3.0 3.0.0 Release label Apr 22, 2025
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 22, 2025
Signed-off-by: Daniel Widdis <[email protected]>
(cherry picked from commit 5c7d005)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
dbwiddis pushed a commit that referenced this pull request Apr 22, 2025
(cherry picked from commit 5c7d005)

Signed-off-by: Daniel Widdis <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@opensearch-trigger-bot
Contributor

The backport to 2.19 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/opensearch-remote-metadata-sdk/backport-2.19 2.19
# Navigate to the new working tree
pushd ../.worktrees/opensearch-remote-metadata-sdk/backport-2.19
# Create a new branch
git switch --create backport/backport-141-to-2.19
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 5c7d0053506a7ff3b9e1e8d5e1fa28be5f44f750
# Push it to GitHub
git push --set-upstream origin backport/backport-141-to-2.19
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/opensearch-remote-metadata-sdk/backport-2.19

Then, create a pull request where the base branch is 2.19 and the compare/head branch is backport/backport-141-to-2.19.

dbwiddis added a commit to dbwiddis/opensearch-remote-metadata-sdk that referenced this pull request Apr 28, 2025
dbwiddis added a commit to dbwiddis/opensearch-remote-metadata-sdk that referenced this pull request Apr 28, 2025

Successfully merging this pull request may close these issues.

[BUG]Unable to call aggregation in ml-commons API /_plugins/_ml/models/_search API
