Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(llmobs): trace text-based bedrock converse api #12560

Open
wants to merge 58 commits into
base: main
Choose a base branch
from

Conversation

lievan
Copy link
Contributor

@lievan lievan commented Feb 27, 2025

This PR supports instrumenting LLM spans for bedrock's Converse method. This PR does not touch ConverseStream, but we document it’s behavior in test_llmobs_converse_stream.

Its helpful to review the bedrock request syntax, and the response syntax

Example bedrock code snippet:

  response = bedrock_runtime.converse(
      system=[{
          "text": "You are an app that creates play lists for a radio station that plays rock and pop music. Only return song names and the artist. "
      }],
      modelId=MODEL_ID,
      messages=messages,
      inferenceConfig=…
      toolConfig=…
  )

Manual QA

Example with tool calls
Example without tool calls

Data this PR traces

  • System prompts in meta.input.messages[0].content with system role
  • Text based input in meta.input.messages[i].content with user role
  • Text based output in meta.input.messages[i].content with assistant role
  • Tool call outputs in meta.output.messages[0].tool_calls
  • Inference parameter metadata max_tokens and temperature
  • stop_reason

Implementation details:

We register a separate trace handler for processing bedrock converse responses.

core.on("botocore.bedrock.process_response_converse", _on_botocore_bedrock_process_response_converse)

This is to avoid the code-path that does extra post-processing of invoke model responses before it's ready for llmobs_set_tags.

Converse still relies on the same trace handler for processing 1) request input 2) bedrock exceptions.

Cassettes

I chose to use cassettes since there were some difficulties with mocking out the bedrock calls with respx. There are some authentication steps that happen within the botocore library before the mocked LLM call, leading me to run into errors like:

E           botocore.exceptions.ClientError: An error occurred (UnrecognizedClientException) when calling the Converse operation: The security token included in the request is invalid.
E           botocore.exceptions.ClientError: An error occurred (MissingAuthenticationTokenException) when calling the Converse operation: Missing Authentication Token

This means we needed to mock out or find a way to skip the internal authentication steps, which would cause the test to be dependent on non-bedrock parts of the botocore library which may be subject to change. In my opinion, this makes cassettes the better option.

To Do

  • Support converse stream
  • Support more inference params like top_p and stop_sequences

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

Copy link
Contributor

github-actions bot commented Feb 27, 2025

CODEOWNERS have been resolved as:

.riot/requirements/15f7356.txt                                          @DataDog/apm-python
.riot/requirements/1ecd900.txt                                          @DataDog/apm-python
.riot/requirements/5295cd7.txt                                          @DataDog/apm-python
.riot/requirements/df0b19d.txt                                          @DataDog/apm-python
.riot/requirements/e1342cb.txt                                          @DataDog/apm-python
releasenotes/notes/bedrock-converse-api-20dd255c1ee18cf4.yaml           @DataDog/apm-python
tests/contrib/botocore/bedrock_cassettes/bedrock_converse.yaml          @DataDog/ml-observability
tests/contrib/botocore/bedrock_cassettes/bedrock_converse_error.yaml    @DataDog/ml-observability
tests/contrib/botocore/bedrock_cassettes/bedrock_converse_stream.yaml   @DataDog/ml-observability
tests/snapshots/tests.contrib.botocore.test_bedrock.test_converse.json  @DataDog/apm-python
ddtrace/_trace/trace_handlers.py                                        @DataDog/apm-sdk-api-python
ddtrace/contrib/internal/botocore/patch.py                              @DataDog/apm-core-python @DataDog/apm-idm-python
ddtrace/contrib/internal/botocore/services/bedrock.py                   @DataDog/ml-observability
ddtrace/llmobs/_integrations/bedrock.py                                 @DataDog/ml-observability
ddtrace/llmobs/_integrations/utils.py                                   @DataDog/ml-observability
riotfile.py                                                             @DataDog/apm-python
tests/contrib/botocore/bedrock_utils.py                                 @DataDog/ml-observability
tests/contrib/botocore/test.py                                          @DataDog/apm-core-python @DataDog/apm-idm-python
tests/contrib/botocore/test_bedrock.py                                  @DataDog/ml-observability
tests/contrib/botocore/test_bedrock_llmobs.py                           @DataDog/ml-observability

@pr-commenter
Copy link

pr-commenter bot commented Feb 28, 2025

Benchmarks

Benchmark execution time: 2025-03-12 00:02:27

Comparing candidate commit ef243ab in PR branch evan.li/claude-code-converse-api with baseline commit 8d2f7da in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 468 metrics, 2 unstable metrics.

@lievan lievan changed the title feat(llmobs): trace bedrock converse api feat(llmobs): trace text-based bedrock converse api Feb 28, 2025
@datadog-dd-trace-py-rkomorn
Copy link

Datadog Report

Branch report: evan.li/claude-code-converse-api
Commit report: e700fca
Test service: dd-trace-py

✅ 0 Failed, 43 Passed, 290 Skipped, 49.39s Total duration (5m 5.68s time saved)

continue
role = str(p.get("role", ""))
content = p.get("content", "")
if isinstance(content, list):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if content is a string? Doesn't seem like we append it anywhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the bedrock converse spec states content is always a list of ContentBlock objects (what we parse inside this if block)

https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Message.html

i'll add a comment to clarify this

message = response.get("output", {}).get("message", {})
role = message.get("role", "assistant")
tool_calls_info = []
if message.get("content") and isinstance(message["content"], list):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can content be a string or a non-list type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i do not think so; bedrock docs state content is always a list of "ContentBlock" types: https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html

and this seems to be consistent with the earliest version of botocore that supports bedrock converse

Comment on lines 149 to 151
"name": content_block.get("toolUse", {}).get("name", ""),
"arguments": content_block.get("toolUse", {}).get("input", ""),
"tool_id": content_block.get("toolUse", {}).get("toolUseId", ""),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these values always guaranteed to be a string? should we be defensive/cast to string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name and tool_id are (i'll cast to string), but arguments isn't - thanks for the catch, ill default arguments to {}

"""
Sets LLM usage metrics in the context for LLM Observability.
"""
llmobs_usage = {}
if input_tokens:
if input_tokens is not None and input_tokens != "":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is checking against None and empty string not covered by if input_tokens?

Copy link
Contributor Author

@lievan lievan Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we want to differentiate token counts being 0 vs not present, which i guess is rare but valid. this is differentiate is needed since converse returns tokens in a usage field with integer values

for content_block in content:
if content_block.get("text") and isinstance(content_block.get("text"), str):
content_blocks.append(content_block.get("text", ""))
elif content_block.get("toolUse") and isinstance(content_block.get("toolUse"), dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can a content block contain multiple objects? i.e. should we always check each condition instead of if/elif/else?

Copy link
Contributor Author

@lievan lievan Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

contentblock should only ever have one object type so i think this logic is ok
image

https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html#API_runtime_ContentBlock_Contents

tool_calls_info.append(
{
"name": str(toolUse.get("name", "")),
"arguments": toolUse.get("input", {}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should json dump this value if it's always json serializable, unless we need to pass it in as an actual dict (non-string) to be properly visualized in our UI. Correct me if I'm wrong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe if we dump it here, and then also call safe_json when we encode, the arguments will show up as a raw json string in the UI instead of the whole thing being having nice json formatting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm yea actually we'll drop the output if this isn't a dict since that's how we decode tool calls in the backend

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants