
Conversation

@DenysMoskalenko
Contributor

Summary

Implements AWS Bedrock prompt caching support (see #3418) by fixing how cache points are sent, documenting the workflow, and extending test coverage to assert cache writes and reads.
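
For context, a hedged usage sketch of what this enables, assuming the same CachePoint marker already used for Anthropic prompt caching; the Bedrock model ID is illustrative:

```python
from pydantic_ai import Agent
from pydantic_ai.messages import CachePoint

agent = Agent('bedrock:us.anthropic.claude-sonnet-4-20250514-v1:0')  # illustrative model ID

result = agent.run_sync([
    'A long, stable document worth caching across runs...',
    CachePoint(),  # content before this marker is written to / read from the prompt cache
    'The actual question for this run.',
])
print(result.output)
```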

Testing

  • uv run pytest tests/models/test_bedrock.py
  • uv run coverage run -m pytest tests/models/test_bedrock.py

Contributor Author


@DouweM It's mostly a duplication of the same documentation we have for the Anthropic CachePoint. What do you think? Maybe we need to move it somewhere shared?

@DenysMoskalenko force-pushed the feature/add_anthropick_prompt_caching_on_bedrock branch from 5263d8a to 6612939 on November 15, 2025 15:55
@DenysMoskalenko
Contributor Author

@DouweM Is there any chance to continue with this PR? We need this feature a lot 🙏.

I read #3453, but I think we can add Bedrock support in the same way and make changes in both places later if needed, instead of just ignoring Bedrock users. What do you think?

@DouweM
Collaborator

DouweM commented Nov 18, 2025

@DenysMoskalenko Thanks for working on this, Denys!

I think we can add Bedrock support in the same way and make changes in both places later if needed, instead of just ignoring Bedrock users. What do you think?

Agreed. Can you please have a look at these issues and address them in case they affect this implementation as well?

@DenysMoskalenko
Contributor Author

DenysMoskalenko commented Nov 19, 2025

Sure:

  1. About the TTL (https://github.com/pydantic/pydantic-ai/pull/3450): we cannot set the TTL for AWS Bedrock's prompt cache; it's fixed at a 5-minute sliding window that resets with each successful cache hit (see https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html#prompt-caching-overview).
  2. About CachePoint stripping. @DouweM Do you think we really need to do this work? The limit of 4 doesn't seem stable (the docs currently show 4 everywhere, but it's in a table, so it might change). I thought the initial idea was to rely on the user: if they hit the "maximum 4 CachePoints" error, it's on them to use fewer than 4 in their code (the same goes for the minimum token count per cache point, which depends on the model and won't work for small inputs). An explicit error is better than implicit magic; in my opinion we should show the user that they are doing something wrong. Anyway, if you require this change, do you think it should be part of this PR? (A sketch of the stripping idea follows this list.)
  3. I don’t think there are any special considerations. I’ll update the tests and re-record the cassette for AWS Nova to confirm everything works correctly.
  4. Same as above.
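
For context, a minimal sketch of the stripping idea from point 2, assuming the CachePoint marker and UserContent alias from pydantic_ai.messages; the helper name and the hard-coded limit are illustrative, not the actual implementation landing in #3442:

```python
from pydantic_ai.messages import CachePoint, UserContent

MAX_CACHE_POINTS = 4  # Bedrock's currently documented limit

def strip_excess_cache_points(content: list[UserContent]) -> list[UserContent]:
    """Drop the oldest CachePoint markers so at most MAX_CACHE_POINTS remain."""
    positions = [i for i, part in enumerate(content) if isinstance(part, CachePoint)]
    excess = set(positions[:-MAX_CACHE_POINTS]) if len(positions) > MAX_CACHE_POINTS else set()
    return [part for i, part in enumerate(content) if i not in excess]
```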

  - Emit cache-point tool entries so Bedrock accepts cached tool definitions
  - Document and test prompt caching (writes + reads) with cassette-body checks
  - Refresh Bedrock cassettes and type annotations to align with the new flow
@DenysMoskalenko force-pushed the feature/add_anthropick_prompt_caching_on_bedrock branch from 783607c to 900d542 on November 19, 2025 10:59
@DenysMoskalenko
Contributor Author

DenysMoskalenko commented Nov 19, 2025

@DouweM
I’ve run tests using AWS Nova models, and prompt caching works well for both user and system prompts. I’ve recorded the cassette for that.

The limitation: it doesn't support tool caching, as specified in the documentation (https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html#prompt-caching-models). In my opinion, it's fine to raise an error when a user tries to do something Bedrock doesn't allow.
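
For reference, a hedged sketch of the raw Converse API payload this maps to, based on the AWS docs linked above; the model ID is illustrative, and boto3 is assumed:

```python
import boto3

client = boto3.client('bedrock-runtime')
response = client.converse(
    modelId='us.amazon.nova-pro-v1:0',  # illustrative Nova model ID
    system=[
        {'text': 'Long, stable system instructions...'},
        {'cachePoint': {'type': 'default'}},  # caches the system prompt prefix
    ],
    messages=[{
        'role': 'user',
        'content': [
            {'text': 'Large document reused across requests...'},
            {'cachePoint': {'type': 'default'}},  # caches the user content prefix
            {'text': 'The actual question.'},
        ],
    }],
)
# A cachePoint entry inside toolConfig.tools is the part Nova rejects per the docs above.
```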

@DouweM
Collaborator

DouweM commented Nov 19, 2025

  • About CachePoint stripping. @DouweM Do you think we really need to do this work? The limit of 4 doesn't seem stable (the docs currently show 4 everywhere, but it's in a table, so it might change). I thought the initial idea was to rely on the user: if they hit the "maximum 4 CachePoints" error, it's on them to use fewer than 4 in their code (the same goes for the minimum token count per cache point, which depends on the model and won't work for small inputs). An explicit error is better than implicit magic; in my opinion we should show the user that they are doing something wrong.

@DenysMoskalenko If the number 4 changes or becomes model-specific we can add it to the model profile.

But I do think we should take care of staying under the limit, because it's not so easy for the user to do so themselves if there are CachePoints in the message history, for example. They could use a history processor to remove older ones, but that still wouldn't be able to know whether there were cache points on the tool defs or instructions. So it's not really "the user is doing something wrong and should fix their code", but "the way we implemented cache points makes it easier to run into this issue than to fix it yourself", so we should fix it.

In this case, "implicit magic" is a bit intentional, because as I wrote on #3453, the goal is for this to be useful to people who don't want to become experts on prompt caching and the limitations Anthropic enforces, not the more advanced users and use cases that need fine-grained control.

In any case, the CachePoint stripping is being implemented in #3442 which I'd expect to merge today or tomorrow, so I'd recommend we wait for that one to merge first, and then adopt the relevant changes here as well. (Likely with a new prompt caching doc, so we don't end up with a ton of duplication)

In my opinion, it’s fine to raise an error when a user tries to do something Bedrock doesn’t allow.

Usually with model settings, we silently ignore them if they're not supported (that's why most of them say "Supported by: ..." in the docstring), so I might prefer to say "Supported by: Anthropic on Bedrock", and then silently ignore it for Nova.

I agree raising errors when the user does something unsupported is usually good, but with model settings we typically do a "best effort" so that as many requests as possible succeed.
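
A minimal sketch of that best-effort behavior, assuming a hypothetical supports_prompt_caching profile flag (not an actual pydantic-ai field) and an illustrative mapping helper:

```python
from pydantic_ai.messages import CachePoint

def map_content_part(part, profile):
    """Best-effort mapping: drop CachePoint for models that don't support it."""
    if isinstance(part, CachePoint):
        if not getattr(profile, 'supports_prompt_caching', False):  # hypothetical flag
            return None  # silently ignore, like other unsupported model settings
        return {'cachePoint': {'type': 'default'}}  # Bedrock Converse cache block
    return {'text': part} if isinstance(part, str) else part
```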
