Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Claude Sonnet 3.7 as a reasoning tool #11445

Merged
merged 41 commits into from
Mar 19, 2025
Merged

Conversation

aubin-tchoi
Copy link
Contributor

@aubin-tchoi aubin-tchoi commented Mar 18, 2025

Description

  • This PR adds support for handling thinking-mode output in Claude Sonnet 3.7: new thinking and redacted_thinking blocks + deltas (AssistantChatMessage has 2 new fields that are optional and can be omitted).
  • Closes https://github.com/dust-tt/tasks/issues/2401.
  • This PR adds a feature flag claude_3_7_reasoning that, when enabled, adds Claude Sonnet 3.7 as a reasoning model.
  • When running a reasoning tool having Claude Sonnet 3.7 as its supporting model, we add the beta header output-128k-2025-02-19 (increases the length of the output) + pass the thinking field in extras (ref), with a hardcoded token budget.

We choose to expose it as a reasoning tool instead of using its thinking mode in the regular multi step process for one main reason:

  • The API requires us to pass back the thinking and redacted_thinking tokens, the latter being encrypted data.
  • This behavior is specific to this API and cannot be reflected in AssistantChatMessage in its current state (we would need an additional field that would store the encrypted data, which does not make a lot of sense).
  • My 2 cents: results are still quite impressive, probably more because of the combination of beta header for bigger outputs than the actual thinking mode.

Tests

  • Tested locally.
  • Tested with the wind question, it gets it right.

Risk

  • Quite high as it touches many key code paths but well tested.
  • Under a FF.

Deploy Plan

  • Deploy front.
  • Deploy core.

@aubin-tchoi aubin-tchoi added the sdk-ack Used to acknowledge that you are not breaking the public API. label Mar 18, 2025
@aubin-tchoi aubin-tchoi changed the title 3.7 thinking experiments Add support for thinking with Claude 3.7 Mar 19, 2025
@aubin-tchoi aubin-tchoi requested a review from spolu March 19, 2025 06:24
@aubin-tchoi aubin-tchoi changed the title Add support for thinking with Claude 3.7 Add support for thinking with Claude Sonnet 3.7 Mar 19, 2025
Copy link
Contributor

@spolu spolu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Directionally LGTM but we don't want to change the ChatMessage interface. Why other solutions do we have?

Good to math options through extras for now.

@@ -78,6 +78,10 @@ pub struct AssistantChatMessage {
#[serde(skip_serializing_if = "Option::is_none")]
pub reasoning_content: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub thinking: Option<String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too overfitted on Anthropic for our general API. We already have things in place to transmit reasoning tokens?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have reasoning_content that was used for deepseek r1 (only other model to expose CoTs AFAIK) but my issue was that I really needed 2 fields

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can use reasoning_content for the regular CoTs and add a field additional_reasoning_content for the non human-parsable part, wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expand a bit on why you need two fields? What is the non_parseable part?

A possibility is to use content with that we then parse front-side to dissociate between different types of streams (as we do with reasoning tokens from claude 3.5 afaict?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they seem to parse back the reasoning tokens (they mention smth about "preserving the reasoning flow and conversation integrity" in the doc) in a pretty strict manner. AFAICT the non parsable part is just encrypted CoTs.

is parsing on the front-end the delimiter on the <thinking> blocks? if so I don't get <thinking>, I really get different structs in the messages streamed by anthropic and want to pass them back accurately

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgot to tell this: we get a 400 from the api if we try to use thinking but don't pass the thinking and/or redacted_thinking blocks (it's actually the main drive behind this change), so if we were to use it as a reasoning tool instead of like this we would need to swallow the thinking tokens probably

@@ -238,6 +238,8 @@ impl TryFrom<&MistralChatMessage> for AssistantChatMessage {
Ok(AssistantChatMessage {
content,
reasoning_content: None,
thinking: None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought so aha

@@ -369,6 +369,8 @@ impl TryFrom<&OpenAICompletionChatMessage> for AssistantChatMessage {
Ok(AssistantChatMessage {
content,
reasoning_content,
thinking: None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope nope ;-)

@aubin-tchoi aubin-tchoi changed the title Add support for thinking with Claude Sonnet 3.7 Add support for Claude Sonnet 3.7 as a reasoning tool Mar 19, 2025
@aubin-tchoi aubin-tchoi requested a review from spolu March 19, 2025 15:15
StreamContent::AnthropicStreamThinking(content)) => {
content.thinking.push_str(delta.thinking.as_str());
if delta.thinking.len() > 0 {
let _ = event_sender.send(json!({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets turned into text? I'm pretty sure we do something else for r1/o1-o3?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per IRL. Out of consistency we want to inject thinking delimiters here

Copy link
Contributor

@spolu spolu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

If you commit to moving to delimiters for reasoning stuff THIS WEEK

@aubin-tchoi aubin-tchoi merged commit dd5a13f into main Mar 19, 2025
7 checks passed
@aubin-tchoi aubin-tchoi deleted the 3.7-thinking branch March 19, 2025 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sdk-ack Used to acknowledge that you are not breaking the public API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants