feat(llmobs): trace text-based bedrock converse api #12560

lievan · 2025-02-27T22:42:19Z

This PR supports instrumenting LLM spans for bedrock's Converse method. This PR does not touch ConverseStream, but we document it’s behavior in test_llmobs_converse_stream.

Its helpful to review the bedrock request syntax, and the response syntax

Example bedrock code snippet:

  response = bedrock_runtime.converse(
      system=[{
          "text": "You are an app that creates play lists for a radio station that plays rock and pop music. Only return song names and the artist. "
      }],
      modelId=MODEL_ID,
      messages=messages,
      inferenceConfig=…
      toolConfig=…
  )

Manual QA

Example with tool calls
Example without tool calls

Data this PR traces

System prompts in meta.input.messages[0].content with system role
Text based input in meta.input.messages[i].content with user role
Text based output in meta.input.messages[i].content with assistant role
Tool call outputs in meta.output.messages[0].tool_calls
Inference parameter metadata max_tokens and temperature
stop_reason

Implementation details:

We register a separate trace handler for processing bedrock converse responses.

core.on("botocore.bedrock.process_response_converse", _on_botocore_bedrock_process_response_converse)

This is to avoid the code-path that does extra post-processing of invoke model responses before it's ready for llmobs_set_tags.

Converse still relies on the same trace handler for processing 1) request input 2) bedrock exceptions.

Cassettes

I chose to use cassettes since there were some difficulties with mocking out the bedrock calls with respx. There are some authentication steps that happen within the botocore library before the mocked LLM call, leading me to run into errors like:

E           botocore.exceptions.ClientError: An error occurred (UnrecognizedClientException) when calling the Converse operation: The security token included in the request is invalid.
E           botocore.exceptions.ClientError: An error occurred (MissingAuthenticationTokenException) when calling the Converse operation: Missing Authentication Token

This means we needed to mock out or find a way to skip the internal authentication steps, which would cause the test to be dependent on non-bedrock parts of the botocore library which may be subject to change. In my opinion, this makes cassettes the better option.

To Do

Support converse stream
Support more inference params like top_p and stop_sequences

Checklist

PR author has checked that all the criteria below are met
The PR description includes an overview of the change
The PR description articulates the motivation for the change
The change includes tests OR the PR description describes a testing strategy
The PR description notes risks associated with the change, if any
Newly-added code is easy to change
The change follows the library release note guidelines
The change includes or references documentation updates if necessary
Backport labels are set (if applicable)

Reviewer Checklist

Reviewer has checked that all the criteria below are met
Title is accurate
All changes are related to the pull request's stated goal
Avoids breaking API changes
Testing strategy adequately addresses listed risks
Newly-added code is easy to change
Release note makes sense to a user of the library
If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
Backport labels are set in a manner that is consistent with the release branch maintenance policy

ddtrace/llmobs/_integrations/bedrock.py

ddtrace/contrib/internal/botocore/patch.py

ddtrace/contrib/internal/botocore/services/bedrock.py

ddtrace/contrib/internal/botocore/patch.py

ddtrace/llmobs/_integrations/bedrock.py

github-actions · 2025-02-27T22:42:52Z

CODEOWNERS have been resolved as:

.riot/requirements/15f7356.txt                                          @DataDog/apm-python
.riot/requirements/1ecd900.txt                                          @DataDog/apm-python
.riot/requirements/5295cd7.txt                                          @DataDog/apm-python
.riot/requirements/df0b19d.txt                                          @DataDog/apm-python
.riot/requirements/e1342cb.txt                                          @DataDog/apm-python
releasenotes/notes/bedrock-converse-api-20dd255c1ee18cf4.yaml           @DataDog/apm-python
tests/contrib/botocore/bedrock_cassettes/bedrock_converse.yaml          @DataDog/ml-observability
tests/contrib/botocore/bedrock_cassettes/bedrock_converse_error.yaml    @DataDog/ml-observability
tests/contrib/botocore/bedrock_cassettes/bedrock_converse_stream.yaml   @DataDog/ml-observability
tests/snapshots/tests.contrib.botocore.test_bedrock.test_converse.json  @DataDog/apm-python
ddtrace/_trace/trace_handlers.py                                        @DataDog/apm-sdk-api-python
ddtrace/contrib/internal/botocore/patch.py                              @DataDog/apm-core-python @DataDog/apm-idm-python
ddtrace/contrib/internal/botocore/services/bedrock.py                   @DataDog/ml-observability
ddtrace/llmobs/_integrations/bedrock.py                                 @DataDog/ml-observability
ddtrace/llmobs/_integrations/utils.py                                   @DataDog/ml-observability
riotfile.py                                                             @DataDog/apm-python
tests/contrib/botocore/bedrock_utils.py                                 @DataDog/ml-observability
tests/contrib/botocore/test.py                                          @DataDog/apm-core-python @DataDog/apm-idm-python
tests/contrib/botocore/test_bedrock.py                                  @DataDog/ml-observability
tests/contrib/botocore/test_bedrock_llmobs.py                           @DataDog/ml-observability

pr-commenter · 2025-02-28T16:59:12Z

Benchmarks

Benchmark execution time: 2025-03-12 00:02:27

Comparing candidate commit ef243ab in PR branch evan.li/claude-code-converse-api with baseline commit 8d2f7da in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 468 metrics, 2 unstable metrics.

datadog-dd-trace-py-rkomorn · 2025-02-28T22:26:43Z

Datadog Report

Branch report: evan.li/claude-code-converse-api
Commit report: e700fca
Test service: dd-trace-py

✅ 0 Failed, 43 Passed, 290 Skipped, 49.39s Total duration (5m 5.68s time saved)

…aude-code-converse-api

ddtrace/_trace/trace_handlers.py

ddtrace/contrib/internal/botocore/services/bedrock.py

tests/contrib/botocore/test.py

…aude-code-converse-api

ddtrace/contrib/internal/botocore/services/bedrock.py

ddtrace/llmobs/_integrations/bedrock.py

Yun-Kim · 2025-03-11T19:56:30Z

ddtrace/llmobs/_integrations/bedrock.py

+                continue
+            role = str(p.get("role", ""))
+            content = p.get("content", "")
+            if isinstance(content, list):


What happens if content is a string? Doesn't seem like we append it anywhere

the bedrock converse spec states content is always a list of ContentBlock objects (what we parse inside this if block)

https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Message.html

i'll add a comment to clarify this

Yun-Kim · 2025-03-11T20:00:21Z

ddtrace/llmobs/_integrations/bedrock.py

+            message = response.get("output", {}).get("message", {})
+            role = message.get("role", "assistant")
+            tool_calls_info = []
+            if message.get("content") and isinstance(message["content"], list):


Can content be a string or a non-list type?

i do not think so; bedrock docs state content is always a list of "ContentBlock" types: https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html

and this seems to be consistent with the earliest version of botocore that supports bedrock converse

Yun-Kim · 2025-03-11T20:01:09Z

ddtrace/llmobs/_integrations/bedrock.py

+                                "name": content_block.get("toolUse", {}).get("name", ""),
+                                "arguments": content_block.get("toolUse", {}).get("input", ""),
+                                "tool_id": content_block.get("toolUse", {}).get("toolUseId", ""),


are these values always guaranteed to be a string? should we be defensive/cast to string?

name and tool_id are (i'll cast to string), but arguments isn't - thanks for the catch, ill default arguments to {}

ddtrace/llmobs/_integrations/bedrock.py

releasenotes/notes/bedrock-converse-api-20dd255c1ee18cf4.yaml

ddtrace/_trace/trace_handlers.py

ddtrace/llmobs/_integrations/bedrock.py

ddtrace/llmobs/_integrations/openai.py

Yun-Kim · 2025-03-11T22:43:17Z

ddtrace/contrib/internal/botocore/services/bedrock.py

    """
    Sets LLM usage metrics in the context for LLM Observability.
    """
    llmobs_usage = {}
-    if input_tokens:
+    if input_tokens is not None and input_tokens != "":


Is checking against None and empty string not covered by if input_tokens?

we want to differentiate token counts being 0 vs not present, which i guess is rare but valid. this is differentiate is needed since converse returns tokens in a usage field with integer values

ddtrace/llmobs/_integrations/bedrock.py

Yun-Kim · 2025-03-11T22:48:38Z

ddtrace/llmobs/_integrations/utils.py

+    for content_block in content:
+        if content_block.get("text") and isinstance(content_block.get("text"), str):
+            content_blocks.append(content_block.get("text", ""))
+        elif content_block.get("toolUse") and isinstance(content_block.get("toolUse"), dict):


can a content block contain multiple objects? i.e. should we always check each condition instead of if/elif/else?

contentblock should only ever have one object type so i think this logic is ok

https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html#API_runtime_ContentBlock_Contents

Yun-Kim · 2025-03-11T22:51:37Z

ddtrace/llmobs/_integrations/utils.py

+            tool_calls_info.append(
+                {
+                    "name": str(toolUse.get("name", "")),
+                    "arguments": toolUse.get("input", {}),


Probably should json dump this value if it's always json serializable, unless we need to pass it in as an actual dict (non-string) to be properly visualized in our UI. Correct me if I'm wrong

i believe if we dump it here, and then also call safe_json when we encode, the arguments will show up as a raw json string in the UI instead of the whole thing being having nice json formatting

hm yea actually we'll drop the output if this isn't a dict since that's how we decode tool calls in the backend

prompt 2

ab48b9b

datadog-datadog-prod-us1 bot reviewed Feb 27, 2025

View reviewed changes

lievan added 2 commits February 28, 2025 10:55

clean up

cdd129d

more cleanup

c2510fd

lievan changed the title ~~feat(llmobs): trace bedrock converse api~~ feat(llmobs): trace text-based bedrock converse api Feb 28, 2025

lievan added 2 commits February 28, 2025 13:59

bedrock integration should not need to access tags

0fe6aae

fix token extraction

e700fca

lievan added 13 commits March 2, 2025 17:48

test refactors

a5fd4ab

Merge branch 'main' of github.com:DataDog/dd-trace-py into evan.li/cl…

72a8b07

…aude-code-converse-api

working tests

9e56c32

clean up

acdf649

decouple from span tags

5956d3f

refactor to decouple i/o parsing

209fc8f

make the tests more readable

a97b1e8

rm uneeded changes

45d08b0

default total tokens

128f936

default token tokens

5a050b2

lockfiles

e7241d2

clarify comment

55dc627

rel note

4b4ef98

lievan commented Mar 3, 2025

View reviewed changes

ddtrace/_trace/trace_handlers.py Outdated Show resolved Hide resolved

lievan commented Mar 3, 2025

View reviewed changes

ddtrace/contrib/internal/botocore/services/bedrock.py Outdated Show resolved Hide resolved

lievan added 5 commits March 3, 2025 11:15

clarify commetn

0dd9323

fix bedrock tests

4348eda

add back vcr stub

5ee2e39

reqs

6ea621f

fix nonetype int error

fee7e8a

lievan added 9 commits March 5, 2025 13:00

rm sampling thing

96adf18

merge

c547471

merge

1cdcf8a

fix rel note

d138f0d

keep top p out of this pr

6df9de0

more double backticks

f83dea7

rm some leftover code changes

0630c90

rename snapshot

f61c9e5

try skipping

3b1358b

lievan commented Mar 6, 2025

View reviewed changes

tests/contrib/botocore/test.py Outdated Show resolved Hide resolved

lievan added 5 commits March 6, 2025 08:39

fix system prompt parsing

1c88cb2

fix snapshot

894a4aa

dont skip actually try to fix test

41789bd

Merge branch 'main' of github.com:DataDog/dd-trace-py into evan.li/cl…

7ba36c6

…aude-code-converse-api

add back the ski[

bb0d848

Yun-Kim reviewed Mar 11, 2025

View reviewed changes

lievan added 10 commits March 11, 2025 17:12

address comments

06fd9a6

extract out a common util function

30a6a03

safer tool use

c0b45aa

fix output

359a317

make i/o more consistent

5123f29

token usage cleanup

ed42bed

none checks for tokens

69fcb11

none checks for tokens

a818ad4

make sure catch none/empty string case

ec28d98

clean up usage code

275d0f9

Yun-Kim reviewed Mar 11, 2025

View reviewed changes

lievan added 3 commits March 11, 2025 19:00

remove accidental change

cfbc760

suggestions

d41153e

accidental openai change

ef243ab

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(llmobs): trace text-based bedrock converse api #12560

feat(llmobs): trace text-based bedrock converse api #12560

lievan commented Feb 27, 2025 •

edited

Loading

github-actions bot commented Feb 27, 2025 •

edited

Loading

pr-commenter bot commented Feb 28, 2025 •

edited

Loading

datadog-dd-trace-py-rkomorn bot commented Feb 28, 2025

Yun-Kim Mar 11, 2025

lievan Mar 11, 2025

Yun-Kim Mar 11, 2025

lievan Mar 11, 2025

Yun-Kim Mar 11, 2025

lievan Mar 11, 2025

Yun-Kim Mar 11, 2025

lievan Mar 11, 2025 •

edited

Loading

Yun-Kim Mar 11, 2025

lievan Mar 11, 2025 •

edited

Loading

Yun-Kim Mar 11, 2025

lievan Mar 11, 2025

lievan Mar 11, 2025

feat(llmobs): trace text-based bedrock converse api #12560

Are you sure you want to change the base?

feat(llmobs): trace text-based bedrock converse api #12560

Conversation

lievan commented Feb 27, 2025 • edited Loading

Manual QA

Data this PR traces

Implementation details:

Cassettes

To Do

Checklist

Reviewer Checklist

github-actions bot commented Feb 27, 2025 • edited Loading

pr-commenter bot commented Feb 28, 2025 • edited Loading

Benchmarks

datadog-dd-trace-py-rkomorn bot commented Feb 28, 2025

Datadog Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lievan Mar 11, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lievan Mar 11, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lievan commented Feb 27, 2025 •

edited

Loading

github-actions bot commented Feb 27, 2025 •

edited

Loading

pr-commenter bot commented Feb 28, 2025 •

edited

Loading

lievan Mar 11, 2025 •

edited

Loading

lievan Mar 11, 2025 •

edited

Loading