Add support for Claude Sonnet 3.7 as a reasoning tool #11445

aubin-tchoi · 2025-03-18T13:35:18Z

Description

This PR adds support for handling thinking-mode output in Claude Sonnet 3.7: new thinking and redacted_thinking blocks + deltas (AssistantChatMessage has 2 new fields that are optional and can be omitted).
Closes https://github.com/dust-tt/tasks/issues/2401.
This PR adds a feature flag claude_3_7_reasoning that, when enabled, adds Claude Sonnet 3.7 as a reasoning model.
When running a reasoning tool having Claude Sonnet 3.7 as its supporting model, we add the beta header output-128k-2025-02-19 (increases the length of the output) + pass the thinking field in extras (ref), with a hardcoded token budget.

We choose to expose it as a reasoning tool instead of using its thinking mode in the regular multi step process for one main reason:

The API requires us to pass back the thinking and redacted_thinking tokens, the latter being encrypted data.
This behavior is specific to this API and cannot be reflected in AssistantChatMessage in its current state (we would need an additional field that would store the encrypted data, which does not make a lot of sense).
My 2 cents: results are still quite impressive, probably more because of the combination of beta header for bigger outputs than the actual thinking mode.

Tests

Tested locally.
Tested with the wind question, it gets it right.

Risk

Quite high as it touches many key code paths but well tested.
Under a FF.

Deploy Plan

Deploy front.
Deploy core.

…s named thinking_delta and not thinking)

spolu

Directionally LGTM but we don't want to change the ChatMessage interface. Why other solutions do we have?

Good to math options through extras for now.

spolu · 2025-03-19T08:40:58Z

core/src/providers/chat_messages.rs

@@ -78,6 +78,10 @@ pub struct AssistantChatMessage {
    #[serde(skip_serializing_if = "Option::is_none")]
    pub reasoning_content: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
+    pub thinking: Option<String>,


This is too overfitted on Anthropic for our general API. We already have things in place to transmit reasoning tokens?

we have reasoning_content that was used for deepseek r1 (only other model to expose CoTs AFAIK) but my issue was that I really needed 2 fields

I can use reasoning_content for the regular CoTs and add a field additional_reasoning_content for the non human-parsable part, wdyt?

Can you expand a bit on why you need two fields? What is the non_parseable part?

A possibility is to use content with that we then parse front-side to dissociate between different types of streams (as we do with reasoning tokens from claude 3.5 afaict?)

they seem to parse back the reasoning tokens (they mention smth about "preserving the reasoning flow and conversation integrity" in the doc) in a pretty strict manner. AFAICT the non parsable part is just encrypted CoTs.

is parsing on the front-end the delimiter on the <thinking> blocks? if so I don't get <thinking>, I really get different structs in the messages streamed by anthropic and want to pass them back accurately

forgot to tell this: we get a 400 from the api if we try to use thinking but don't pass the thinking and/or redacted_thinking blocks (it's actually the main drive behind this change), so if we were to use it as a reasoning tool instead of like this we would need to swallow the thinking tokens probably

spolu · 2025-03-19T08:41:12Z

core/src/providers/mistral.rs

@@ -238,6 +238,8 @@ impl TryFrom<&MistralChatMessage> for AssistantChatMessage {
        Ok(AssistantChatMessage {
            content,
            reasoning_content: None,
+            thinking: None,


thought so aha

spolu · 2025-03-19T08:41:20Z

core/src/providers/openai_compatible_helpers.rs

@@ -369,6 +369,8 @@ impl TryFrom<&OpenAICompletionChatMessage> for AssistantChatMessage {
        Ok(AssistantChatMessage {
            content,
            reasoning_content,
+            thinking: None,


Nope nope ;-)

…ssage

…ove them when used as a regular tool

spolu · 2025-03-19T15:33:32Z

core/src/providers/anthropic.rs

+                                                StreamContent::AnthropicStreamThinking(content)) => {
+                                                content.thinking.push_str(delta.thinking.as_str());
+                                                if delta.thinking.len() > 0 {
+                                                    let _ = event_sender.send(json!({


This gets turned into text? I'm pretty sure we do something else for r1/o1-o3?

As per IRL. Out of consistency we want to inject thinking delimiters here

spolu

LGTM

If you commit to moving to delimiters for reasoning stuff THIS WEEK

aubin-tchoi added 7 commits March 18, 2025 12:15

add 3.7 sonnet as a reasoning model behind a FF

89454ac

add Anthropic beta reasoning when a reasoning tool is available

886931f

correctly remove the reasoning tool for claude 3.7 sonnet

d0668af

add a function actionIsClaudeReasoning

2db9904

fix passing extra parameters

01fc2a5

rename the feature flag to remove the dot

aba7aac

fix types

b12d848

aubin-tchoi added the sdk-ack Used to acknowledge that you are not breaking the public API. label Mar 18, 2025

add thinking to the extras

316367c

aubin-tchoi force-pushed the 3.7-thinking branch from 84de25f to c5b5e2a Compare March 18, 2025 14:38

pass thinking in the api call to anthropic

bd5b92a

aubin-tchoi force-pushed the 3.7-thinking branch from c5b5e2a to bd5b92a Compare March 18, 2025 14:38

aubin-tchoi added 2 commits March 18, 2025 15:43

fix var name

629c0b5

override the temperature in thinking mode (not supported)

54866db

aubin-tchoi force-pushed the 3.7-thinking branch from fc8dd59 to 54866db Compare March 18, 2025 14:48

aubin-tchoi added 2 commits March 18, 2025 15:57

remove the top_p too

c658c51

add two missing variants to the enum StreamContent (new blocks)

e9a7671

aubin-tchoi force-pushed the 3.7-thinking branch from a52f838 to e9a7671 Compare March 18, 2025 15:21

aubin-tchoi added 2 commits March 18, 2025 16:31

add missing signature block

21fa03a

fix thinking delta handling (it's actually a != struct bc the field i…

cc04550

…s named thinking_delta and not thinking)

aubin-tchoi force-pushed the 3.7-thinking branch from 4c79228 to cc04550 Compare March 18, 2025 15:37

aubin-tchoi added 9 commits March 18, 2025 16:40

fix handling of the thinking deltas

36af0ea

fix field name

9444d30

reduce the budget_token count

2a1ea5f

handle SignatureDelta

0a354af

fix handling of AnthropicStreamRedactedThinking

0e5070f

remove unneeded TODO

0e74eba

fix how reasoning tokens are passed to the context

13592c2

remove info log

c20f00f

fix handling of redacted thinking in deltas

0352ccf

aubin-tchoi force-pushed the 3.7-thinking branch from ea9281f to 6109159 Compare March 18, 2025 21:31

aubin-tchoi added 2 commits March 18, 2025 22:33

fix parsing in the response sent from core

1b83058

fix rendering of CoTs

c1696c1

aubin-tchoi force-pushed the 3.7-thinking branch from 71213ba to c1696c1 Compare March 18, 2025 21:53

aubin-tchoi added 5 commits March 19, 2025 00:10

fix a content type

c15694b

remove debug log

8150dfd

fix tool use

982c6ed

increase the budget token to 12k

de91256

reduce the token budget

521cb12

aubin-tchoi changed the title ~~3.7 thinking experiments~~ Add support for thinking with Claude 3.7 Mar 19, 2025

aubin-tchoi requested a review from spolu March 19, 2025 06:24

aubin-tchoi changed the title ~~Add support for thinking with Claude 3.7~~ Add support for thinking with Claude Sonnet 3.7 Mar 19, 2025

spolu reviewed Mar 19, 2025

View reviewed changes

aubin-tchoi force-pushed the 3.7-thinking branch from a5775e6 to afaf519 Compare March 19, 2025 14:33

remove the thinking and redacted_thinking fields from AssistantChatMe…

18a9874

…ssage

aubin-tchoi force-pushed the 3.7-thinking branch from afaf519 to 18a9874 Compare March 19, 2025 14:33

aubin-tchoi added 5 commits March 19, 2025 15:42

stop reading the resoning_content

1807eff

remove the thinking content from the AssistantChatMessage

154fd60

pass the extra thinking parameters when used as a reasoning tool, rem…

2e1e17e

…ove them when used as a regular tool

add back the removal of the top_p parameter

f505d4b

fix a typo

ea7ac26

aubin-tchoi changed the title ~~Add support for thinking with Claude Sonnet 3.7~~ Add support for Claude Sonnet 3.7 as a reasoning tool Mar 19, 2025

aubin-tchoi requested a review from spolu March 19, 2025 15:15

aubin-tchoi added 2 commits March 19, 2025 16:16

remove unused function

acb2a79

remove the top_p in reasoning mode

2ef750c

spolu reviewed Mar 19, 2025

View reviewed changes

spolu approved these changes Mar 19, 2025

View reviewed changes

aubin-tchoi merged commit dd5a13f into main Mar 19, 2025
7 checks passed

aubin-tchoi deleted the 3.7-thinking branch March 19, 2025 15:51

aubin-tchoi mentioned this pull request Mar 19, 2025

Wrap Claude 3.7 thinking tokens in <thinking> tags #11486

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for Claude Sonnet 3.7 as a reasoning tool #11445

Add support for Claude Sonnet 3.7 as a reasoning tool #11445

aubin-tchoi commented Mar 18, 2025 •

edited

Loading

spolu left a comment

spolu Mar 19, 2025

aubin-tchoi Mar 19, 2025

aubin-tchoi Mar 19, 2025

spolu Mar 19, 2025

aubin-tchoi Mar 19, 2025

aubin-tchoi Mar 19, 2025

spolu Mar 19, 2025

aubin-tchoi Mar 19, 2025

spolu Mar 19, 2025

spolu Mar 19, 2025

spolu Mar 19, 2025

spolu left a comment •

edited

Loading

Add support for Claude Sonnet 3.7 as a reasoning tool #11445

Add support for Claude Sonnet 3.7 as a reasoning tool #11445

Conversation

aubin-tchoi commented Mar 18, 2025 • edited Loading

Description

Tests

Risk

Deploy Plan

spolu left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

spolu left a comment • edited Loading

Choose a reason for hiding this comment

aubin-tchoi commented Mar 18, 2025 •

edited

Loading

spolu left a comment •

edited

Loading