Fix Claude Extended Thinking Not Working (2026)
Extended thinking gives Claude deeper reasoning capabilities, but misconfigured parameters produce 400 errors or empty thinking blocks. This guide covers every failure mode and the exact fix.
The Error
{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "thinking.budget_tokens: must be >= 1024 and < max_tokens"
  }
}
Quick Fix
- Set budget_tokens to at least 1024 and strictly less than max_tokens.
- When using tools with thinking, set tool_choice to auto or none only.
- Pass thinking blocks back unmodified in multi-turn conversations.
What Causes This
Extended thinking fails when:
- budget_tokens is less than 1024 or greater than or equal to max_tokens.
- tool_choice is set to any or a specific tool name (only auto and none work with thinking).
- Thinking is toggled on or off mid-assistant-turn.
- Thinking blocks are modified or stripped when passing them back in multi-turn conversations.
- Thinking parameters change between turns, invalidating cached messages.
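The causes above that depend only on request parameters can be checked locally before a request is sent. The following is a minimal sketch, not part of the Anthropic SDK: check_thinking_params is a hypothetical helper name, and it encodes only the rules listed in this guide.

```python
from typing import Optional


def check_thinking_params(
    max_tokens: int,
    budget_tokens: int,
    tool_choice: Optional[dict] = None,
) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper: it encodes only the rules described in this guide
    and never calls the API.
    """
    problems = []
    if budget_tokens < 1024:
        problems.append("budget_tokens must be at least 1024")
    if budget_tokens >= max_tokens:
        problems.append("budget_tokens must be strictly less than max_tokens")
    if tool_choice is not None and tool_choice.get("type") not in ("auto", "none"):
        problems.append("tool_choice must be 'auto' or 'none' when thinking is enabled")
    return problems


print(check_thinking_params(8000, 8000))
print(check_thinking_params(8000, 500, {"type": "any"}))
print(check_thinking_params(16000, 10000, {"type": "auto"}))
```

Returning a list rather than raising on the first problem lets a caller report every misconfiguration at once.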
Full Solution
Basic Extended Thinking Setup
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "Solve this step by step: What is 127 * 389?"}]
)

for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking[:200]}...")
    elif block.type == "text":
        print(f"Answer: {block.text}")
Fix budget_tokens Validation
The budget_tokens value must satisfy 1024 <= budget_tokens < max_tokens:
# WRONG: budget_tokens >= max_tokens
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8000,
    thinking={"type": "enabled", "budget_tokens": 8000},  # Error: not < max_tokens
    messages=[...]
)

# WRONG: budget_tokens < 1024
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8000,
    thinking={"type": "enabled", "budget_tokens": 500},  # Error: < 1024
    messages=[...]
)

# CORRECT
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},  # 1024 <= 10000 < 16000
    messages=[...]
)
Fix Tool Choice Conflicts
Extended thinking supports only tool_choice values of auto or none; any and a specific tool name are rejected:
# WRONG: tool_choice "any" with thinking
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    tool_choice={"type": "any"},  # Error!
    tools=[{"name": "calc", "description": "Calculate", "input_schema": {"type": "object", "properties": {}}}],
    messages=[...]
)

# CORRECT: tool_choice "auto" with thinking
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    tool_choice={"type": "auto"},  # OK
    tools=[{"name": "calc", "description": "Calculate", "input_schema": {"type": "object", "properties": {}}}],
    messages=[...]
)
Control Thinking Display
By default, Claude 4 models return summarized thinking. You can control this with the display parameter:
# Summarized thinking (default) -- charged for full tokens, returns summary
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000, "display": "summarized"},
    messages=[...]
)

# Omit thinking content -- returns empty thinking blocks with encrypted signature
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000, "display": "omitted"},
    messages=[...]
)
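Because the display setting changes what a thinking block contains, code that reads the response should not assume the thinking text is non-empty. Here is a minimal sketch, assuming block objects that expose type, thinking, and text attributes as in the examples in this guide. The name describe_blocks is a hypothetical helper, and SimpleNamespace objects stand in for real SDK blocks so the snippet runs without an API call.

```python
from types import SimpleNamespace


def describe_blocks(content) -> list[str]:
    """Turn response content blocks into printable lines.

    Hypothetical helper. It tolerates thinking blocks whose text is empty,
    which is what the "omitted" display setting is described as returning.
    """
    lines = []
    for block in content:
        if block.type == "thinking":
            text = getattr(block, "thinking", "") or ""
            lines.append(f"Thinking: {text[:200]}" if text else "Thinking: (omitted)")
        elif block.type == "text":
            lines.append(f"Answer: {block.text}")
        else:
            # Other block types (for example tool_use) are only labeled here.
            lines.append(f"[{block.type} block]")
    return lines


# Stand-in blocks; a real call would pass response.content instead.
fake_content = [
    SimpleNamespace(type="thinking", thinking="", signature="abc"),
    SimpleNamespace(type="text", text="49403"),
]
for line in describe_blocks(fake_content):
    print(line)
```

An empty thinking block is not itself an error here, so the helper prints a placeholder instead of failing.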
Multi-Turn Thinking Continuity
Pass thinking blocks back unmodified to maintain reasoning continuity:
# First turn
response1 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "What is 127 * 389?"}]
)

# Second turn -- pass ALL content blocks back unmodified
messages = [
    {"role": "user", "content": "What is 127 * 389?"},
    {"role": "assistant", "content": response1.content},  # Includes thinking blocks
    {"role": "user", "content": "Now multiply that result by 2"}
]

response2 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=messages
)
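The multi-turn pattern can be wrapped in a small function so that the assistant turn is always appended exactly as received. This is a sketch under assumptions: extend_conversation is a hypothetical helper, and plain dicts with a dummy signature stand in for real content blocks so the snippet runs offline.

```python
def extend_conversation(messages: list, assistant_content: list, user_text: str) -> list:
    """Return a new message list with the assistant turn and the next user turn.

    Hypothetical helper. The assistant content is appended as the same list
    object it received, so thinking blocks and their signatures are not
    filtered, reordered, or edited.
    """
    return messages + [
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": user_text},
    ]


# Stand-in for response1.content; real blocks come from the API response.
assistant_content = [
    {"type": "thinking", "thinking": "127 * 389 = ...", "signature": "sig"},
    {"type": "text", "text": "49403"},
]
history = [{"role": "user", "content": "What is 127 * 389?"}]
messages = extend_conversation(history, assistant_content, "Now multiply that result by 2")
print([m["role"] for m in messages])
```

Building a new list instead of mutating history keeps earlier turns untouched, which makes it easier to confirm that nothing was stripped between requests.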
TypeScript Example
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 16000,
  thinking: { type: "enabled", budget_tokens: 10000 },
  messages: [{ role: "user", content: "Solve step by step: What is 127 * 389?" }]
});

for (const block of response.content) {
  if (block.type === "thinking") {
    console.log("Thinking:", block.thinking.slice(0, 200));
  } else if (block.type === "text") {
    console.log("Answer:", block.text);
  }
}
Prevention
- Always set max_tokens > budget_tokens + expected output: A good rule is max_tokens = budget_tokens + 4096.
- Default to tool_choice auto: When combining tools with thinking, always use auto.
- Never modify thinking blocks: In multi-turn conversations, return them exactly as received.
- Keep thinking params stable: Changing thinking parameters between turns invalidates cached messages but not cached system prompts or tools.
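The max_tokens rule of thumb from this list can be applied in one place so the two values never drift apart. This is a minimal sketch: max_tokens_for is a hypothetical helper, the 4096 default mirrors the rule above, and it does not check the model's own output limit.

```python
def max_tokens_for(budget_tokens: int, output_headroom: int = 4096) -> int:
    """Compute max_tokens from a thinking budget plus room for the answer.

    Hypothetical helper following the budget_tokens + 4096 rule of thumb.
    It does not know the model's maximum output size; check that separately.
    """
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    if output_headroom < 1:
        raise ValueError("output_headroom must be positive so budget_tokens < max_tokens")
    return budget_tokens + output_headroom


budget = 10000
request_params = {
    "max_tokens": max_tokens_for(budget),
    "thinking": {"type": "enabled", "budget_tokens": budget},
}
print(request_params["max_tokens"])  # 14096
```

The resulting dict could be unpacked into client.messages.create alongside model and messages.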
Frequently Asked Questions
What models support extended thinking?
Extended thinking is available on Claude Opus 4.6 and Claude Sonnet 4.6. It is also supported on several earlier models, starting with Claude Sonnet 3.7 and including Claude Haiku 4.5, but not on Claude 3.5-generation models or earlier. When using Claude Code, the model is selected either by default or through the --model flag. Verify that your model supports thinking before enabling the parameter.
Does extended thinking cost more than regular responses?
Yes. Thinking tokens are billed as output tokens even though they represent internal reasoning rather than the final answer. A request with 10,000 thinking tokens and 2,000 output tokens is billed for 12,000 output tokens total. However, tasks that require multiple retries without thinking often cost more in aggregate than a single successful thinking-enabled attempt.
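The billing arithmetic in this answer can be written out as a short calculation. This is a sketch only: both function names are hypothetical, and the per-million-token price is a placeholder for illustration, not a quoted rate for any model.

```python
def billed_output_tokens(thinking_tokens: int, answer_tokens: int) -> int:
    """Thinking tokens and answer tokens are both billed as output tokens."""
    return thinking_tokens + answer_tokens


def output_cost_usd(thinking_tokens: int, answer_tokens: int, usd_per_million: float) -> float:
    """Estimate output cost from a per-million-token price supplied by the caller."""
    total = billed_output_tokens(thinking_tokens, answer_tokens)
    return total * usd_per_million / 1_000_000


# PLACEHOLDER price for illustration only; check current pricing for your model.
EXAMPLE_USD_PER_MILLION_OUTPUT = 15.0

total = billed_output_tokens(10_000, 2_000)
cost = output_cost_usd(10_000, 2_000, EXAMPLE_USD_PER_MILLION_OUTPUT)
print(total)           # 12000
print(f"${cost:.2f}")  # $0.18
```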
Can I see what Claude is thinking?
By default, Claude 4 models return summarized thinking. You can control this with the display parameter: "summarized" returns a brief summary of the reasoning, while "omitted" returns empty thinking blocks with an encrypted signature for multi-turn continuity. The full internal reasoning is not exposed to prevent prompt injection through reasoning manipulation.
Why does thinking fail silently without producing better results?
If thinking is enabled but the budget is set too low (close to 1024), Claude may not have enough tokens to reason deeply and may fall back to a fast response. Set the thinking budget to at least 5,000 tokens for meaningful reasoning improvement. For complex multi-step problems, use 10,000 or more.
Related Guides
- Claude Extended Thinking API Guide – full tutorial on using extended thinking effectively.
- Claude Tool Use Not Working – debug tool_choice and tool definition issues.
- Claude API Error 400 invalid_request_error Fix – the error type returned for thinking parameter violations.
- Claude Prompt Caching Not Working – understand how thinking changes affect cache invalidation.
- Claude Streaming API Guide – streaming works with extended thinking for real-time output.