Extended Thinking Budget Exceeded — Fix (2026)
The Error
Extended thinking budget exceeded: used 32768 tokens of 16384 allowed
The Fix
# Increase the extended thinking token budget
claude config set thinking_budget 32768
Why This Works
Extended thinking in Claude Code allows Claude to reason through complex problems before responding. When the allocated budget is too low, Claude hits the ceiling mid-reasoning and the request fails. Setting the budget to 32768 tokens, the amount the failed request actually used, gives sufficient room for multi-step reasoning while keeping costs predictable.
If That Doesn’t Work
# Disable extended thinking entirely if budget errors persist
claude config set thinking_budget 0
# Or break your prompt into smaller, simpler sub-tasks
# that require less reasoning depth
If you are on a rate-limited plan, extended thinking tokens count toward your per-minute token limit. Reduce concurrent sessions or wait for the rate window to reset before retrying with a higher budget. You can also run claude config get thinking_budget to confirm that the new value persisted, because some workspace-level configs override global settings.
Prevention
Add to your CLAUDE.md:
Extended thinking budget is set to 32768 tokens. For tasks requiring deep analysis (architecture decisions, complex refactors), use explicit step-by-step instructions to reduce the reasoning depth needed.
Related
- Claude Sonnet 4.5 model guide — Guide to the claude-sonnet-4-5-20250929 model and its capabilities
Related Error Messages
This fix also applies if you see these related error messages:
- TokenLimitExceeded: max tokens reached
- Error: output truncated at max_tokens
- Warning: response may be incomplete due to token limit
- ModelNotFoundError: model 'claude-3-opus' not available
- Error: specified model is deprecated
Frequently Asked Questions
What causes token count mismatches?
Token counts are estimated before sending a request and precisely calculated on the server. The estimation uses a fast local tokenizer that may differ slightly from the server’s tokenizer. Small discrepancies (1-3%) are normal and do not affect functionality.
How do I reduce token consumption in long sessions?
Start new conversations for unrelated tasks. Each message in a conversation resends the full history, so cumulative token use grows roughly quadratically with conversation length rather than linearly. If every message adds about the same number of tokens, a 50-message conversation uses roughly 4 to 5 times the tokens of five separate 10-message conversations.
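The growth described above can be checked with a little arithmetic. The sketch below assumes a deliberately simple model: every message adds the same number of tokens, and each request resends the whole history. The function name and the 500-token average are hypothetical, and real conversations will vary with message size and caching.

```python
def cumulative_input_tokens(num_messages: int, tokens_per_message: int) -> int:
    """Total tokens sent over a conversation when each request resends the full history."""
    # Request k carries k messages, so the total is (1 + 2 + ... + n) messages.
    return tokens_per_message * num_messages * (num_messages + 1) // 2


TOKENS_PER_MESSAGE = 500  # hypothetical average, for illustration only

one_long = cumulative_input_tokens(50, TOKENS_PER_MESSAGE)
five_short = 5 * cumulative_input_tokens(10, TOKENS_PER_MESSAGE)
ratio = one_long / five_short

print(one_long, five_short, round(ratio, 1))  # 637500 137500 4.6
```

Under these assumptions the single long conversation costs about 4.6 times as much. The multiple grows as the conversation gets longer, which is why splitting unrelated work into separate sessions helps.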
Can I see my token usage?
Run claude usage to see your current billing period’s token consumption broken down by model. The Anthropic console at console.anthropic.com provides detailed usage graphs and per-day breakdowns.
Which model does Claude Code use by default?
Claude Code uses the latest Claude model available on your account. You can override the model with claude --model claude-sonnet-4-20250514 or set a default in your configuration with claude config set model claude-sonnet-4-20250514.
Related Guides
- Fix Skill Exceeded Maximum Output
- Context Window Exceeded — Fix (2026)
- Fix Claude Rate Exceeded Error (2026)
- Fix Claude AI Rate Exceeded Error
How Extended Thinking Works
Extended thinking gives Claude more processing time for complex tasks by allowing it to reason through problems step-by-step before producing an output. This is not the same as a longer response – it is additional internal reasoning that improves accuracy on tasks requiring planning, multi-step logic, or complex code generation.
When to use extended thinking: Multi-file refactoring, architecture decisions, debugging complex interactions, writing code that involves multiple interacting systems.
When NOT to use extended thinking: Simple questions, single-file edits, formatting changes, or tasks where speed matters more than depth.
Token cost implications. Extended thinking tokens count toward your usage but are often worthwhile. A task that takes 3 attempts without thinking (90K tokens total) may succeed on the first attempt with thinking (40K tokens including thinking overhead).
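As a rough illustration of that trade-off, the sketch below reuses the figures from the paragraph above and assumes the three plain attempts cost the same (30K tokens each). The helper name is hypothetical, and the numbers are the article's example, not measurements.

```python
def break_even_attempts(tokens_with_thinking: int, tokens_per_plain_attempt: int) -> int:
    """Smallest number of plain attempts whose combined cost exceeds one thinking run."""
    return tokens_with_thinking // tokens_per_plain_attempt + 1


PLAIN_ATTEMPT_TOKENS = 30_000  # assumed: 90K total spread evenly over 3 attempts
THINKING_RUN_TOKENS = 40_000   # one attempt, including thinking overhead

plain_total = 3 * PLAIN_ATTEMPT_TOKENS
savings = plain_total - THINKING_RUN_TOKENS
threshold = break_even_attempts(THINKING_RUN_TOKENS, PLAIN_ATTEMPT_TOKENS)

print(plain_total, savings, threshold)  # 90000 50000 2
```

With these numbers, thinking pays for itself as soon as the task would otherwise need a second attempt.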
Configuring Extended Thinking
Extended thinking is controlled by the thinking parameter of the request and is available on Claude Opus 4.6 and Sonnet 4.6 models. In Claude Code, it activates automatically for complex tasks when using a supported model.
To explicitly request thinking in API calls:
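import anthropic

# The client reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()
prompt = "Plan a refactor of the billing module."  # example prompt

response = client.messages.create(
    model="claude-opus-4-6-20250414",
    max_tokens=16000,  # must be larger than budget_tokens
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": prompt}],
)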
The budget_tokens parameter controls how many tokens Claude can use for internal reasoning. It must be smaller than max_tokens, because thinking tokens count toward that limit. Higher budgets allow deeper analysis but cost more. Start with 5,000 and increase if outputs are incomplete.
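A minimal sketch of the "start small and increase" approach, with the budget checked locally before any request is sent. It assumes the constraints described in the Messages API documentation: a minimum budget of 1,024 tokens and a budget below max_tokens. The helper names and the 4,000-token reserve for the visible answer are hypothetical choices for illustration, not part of any SDK.

```python
MIN_BUDGET_TOKENS = 1024  # assumed minimum, per the Messages API documentation


def thinking_param(budget_tokens: int, max_tokens: int) -> dict:
    """Build the thinking argument, rejecting budgets the API would refuse."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(f"budget_tokens must be at least {MIN_BUDGET_TOKENS}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


def next_budget(current: int, max_tokens: int, answer_reserve: int = 4000) -> int:
    """Double the budget, keeping answer_reserve tokens free for the visible answer."""
    return min(current * 2, max_tokens - answer_reserve)


budget = 5000
print(thinking_param(budget, max_tokens=16000))
budget = next_budget(budget, max_tokens=16000)  # 10000
budget = next_budget(budget, max_tokens=16000)  # capped at 12000
print(budget)
```

The returned dictionary can be passed as the thinking argument of client.messages.create. Capping the budget below max_tokens leaves room for the response itself, so a larger thinking budget does not crowd out the answer.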