Claude Sonnet 4 (20250514): Model Guide (2026)

Last updated: April 20, 2026

claude-sonnet-4-20250514 is Anthropic’s current recommended Sonnet model. Released May 14, 2025, it delivers the best balance of capability and cost in the Claude model lineup. This is the model most developers should use for everyday coding, analysis, and generation tasks.

What the Model ID Means

claude-sonnet-4-20250514
│      │      │ │
│      │      │ └── Release date: May 14, 2025
│      │      └──── Version: 4
│      └───────── Tier: Sonnet (mid-range)
└──────────────── Family: Claude

The full model ID with date suffix ensures you always get this exact model version. Use it in production code to avoid unexpected behavior when Anthropic updates default aliases.
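The naming scheme above can be checked mechanically. The following is a minimal sketch, assuming IDs always follow the `claude-<tier>-<version>-<YYYYMMDD>` shape shown in the diagram; `parse_model_id` and its regular expression are illustrative names, not part of any Anthropic SDK.

```python
import re
from datetime import date

# Illustrative pattern for IDs such as "claude-sonnet-4-20250514" or
# "claude-opus-4-1-20250805". Not an official Anthropic utility.
MODEL_ID_PATTERN = re.compile(
    r"^(?P<family>claude)-(?P<tier>[a-z]+)-"
    r"(?P<version>\d{1,2}(?:-\d{1,2})?)-(?P<date>\d{8})$"
)


def parse_model_id(model_id: str) -> dict:
    """Split a dated model ID into family, tier, version, and release date."""
    match = MODEL_ID_PATTERN.match(model_id)
    if match is None:
        # Aliases such as "sonnet" carry no date suffix and are rejected here.
        raise ValueError(f"Unrecognized model ID format: {model_id!r}")
    raw = match.group("date")
    return {
        "family": match.group("family"),
        "tier": match.group("tier"),
        "version": match.group("version").replace("-", "."),
        "released": date(int(raw[:4]), int(raw[4:6]), int(raw[6:])),
    }


info = parse_model_id("claude-sonnet-4-20250514")
print(info["tier"], info["version"], info["released"].isoformat())
```

A check like this could run at startup to reject undated aliases in production configuration, which is the failure mode the paragraph above warns about.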

Why Sonnet 4 Is the Default Choice

Sonnet 4 is the model Anthropic actively recommends for most use cases. This guide treats it as the standard recommendation over Sonnet 4.5 because:

Best-in-class instruction following: Sonnet 4 is measurably better at following complex, multi-part instructions
Strong coding performance: competitive with Opus on many coding benchmarks at one-fifth the cost
Reliable tool use: consistent and accurate function calling, critical for agent workflows
Good cost-to-quality ratio: $3/$15 per million tokens delivers the most value per dollar

Model Capabilities

Context Window

200,000 tokens input context
Handles full codebases, long documents, and extended multi-turn conversations
Effective context utilization across the full window (no significant quality degradation at the edges)

Coding

Generates, reviews, and debugs code across all major languages
Handles multi-file edits and cross-reference analysis
Strong at test generation and refactoring
Understands framework conventions (React, Django, Rails, Spring, etc.)

Instruction Following

Sonnet 4’s defining strength — reliably follows complex instructions
Adheres to formatting requirements, output constraints, and multi-step processes
Particularly important for CLAUDE.md rules in Claude Code projects

Extended Thinking

Supports extended thinking for complex reasoning tasks
Allocate a thinking budget to improve accuracy on math, logic, and architecture decisions:

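import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,  # must be larger than budget_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # upper bound on thinking tokens
    },
    messages=[{"role": "user", "content": "Design a distributed cache system"}],
)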
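The thinking budget and `max_tokens` interact: as documented at the time of writing, the budget must be at least 1,024 tokens and smaller than `max_tokens`. The sketch below validates those two constraints before a request is sent. The helper name `thinking_request_kwargs` is hypothetical, and the limits should be re-checked against the current API documentation.

```python
# Minimum thinking budget, per Anthropic's documentation at the time of writing.
MIN_THINKING_BUDGET = 1024


def thinking_request_kwargs(max_tokens: int, budget_tokens: int) -> dict:
    """Build the max_tokens/thinking arguments, rejecting invalid combinations."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


kwargs = thinking_request_kwargs(max_tokens=16000, budget_tokens=10000)
# Tokens left for the visible answer if the whole budget is spent on thinking.
print(kwargs["max_tokens"] - kwargs["thinking"]["budget_tokens"])
```

The returned dictionary can be unpacked into `client.messages.create(model=..., messages=..., **kwargs)`.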

Tool Use (Function Calling)

Full support for tool definitions and multi-tool sequences
Can call multiple tools per turn when appropriate
Compatible with the Claude Agent SDK
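To make the tool-use flow concrete, here is a minimal sketch of a tool definition and a local dispatcher that handles several tool calls from one turn. The `get_weather` tool and its stubbed data are hypothetical, and response blocks are simulated as plain dictionaries so the example runs without an API key. The SDK returns typed objects, so a real integration would read attributes instead of keys.

```python
# Tool definition in the shape the Messages API expects: a name, a description,
# and a JSON Schema for the input. The tool itself is hypothetical.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Return the current temperature for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def get_weather(city: str) -> str:
    """Stub implementation backed by fixed sample data."""
    sample = {"Paris": "18 C", "Tokyo": "24 C"}
    return sample.get(city, "unknown")


HANDLERS = {"get_weather": get_weather}


def run_tool_calls(content_blocks: list) -> list:
    """Execute every tool_use block and build the matching tool_result blocks."""
    results = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue  # skip text and other block types
        handler = HANDLERS[block["name"]]
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": handler(**block["input"]),
        })
    return results


# Simulated assistant turn that requests two tool calls at once.
simulated_turn = [
    {"type": "text", "text": "Checking both cities."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Paris"}},
    {"type": "tool_use", "id": "toolu_02", "name": "get_weather", "input": {"city": "Tokyo"}},
]
print(run_tool_calls(simulated_turn))
```

In a live conversation, `WEATHER_TOOL` would be passed in the `tools` list of the request, and the returned `tool_result` blocks would go back to the model as the content of the next user message.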

Vision

Accepts images alongside text
Analyzes screenshots, diagrams, charts, and UI mockups
Useful for front-end development and design review
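Images are sent as content blocks next to the text. The sketch below builds a base64 image block and a user message around it. The helper names `image_block` and `review_message` are hypothetical, while the block layout follows the Messages API format as documented at the time of writing.

```python
import base64

# Media types the API accepts for images, per the documentation at the time of writing.
SUPPORTED_MEDIA_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block(image_bytes: bytes, media_type: str = "image/png") -> dict:
    """Wrap raw image bytes in a base64 image content block."""
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"Unsupported media type: {media_type}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.standard_b64encode(image_bytes).decode("ascii"),
        },
    }


def review_message(image_bytes: bytes, question: str) -> dict:
    """Build a user message with the image first, followed by the question."""
    return {
        "role": "user",
        "content": [image_block(image_bytes), {"type": "text", "text": question}],
    }


# Placeholder bytes stand in for a real screenshot read from disk.
message = review_message(b"\x89PNG\r\n", "What is wrong with this layout?")
print(message["content"][0]["source"]["media_type"])
```

The resulting dictionary can be placed directly in the `messages` list of a request.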

How to Use Everywhere

Python API

import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Write a Python function to parse ISO 8601 dates"}
    ]
)
print(message.content[0].text)

TypeScript API

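import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const message = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 4096,
  messages: [
    { role: "user", content: "Write a Python function to parse ISO 8601 dates" }
  ],
});
// content is a union of block types, so narrow to a text block before reading .text
const first = message.content[0];
if (first.type === "text") {
  console.log(first.text);
}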

Claude Code CLI

# Launch Claude Code with Sonnet 4
claude --model claude-sonnet-4-20250514
# Or use the shorthand alias
claude --model sonnet

CLAUDE.md Configuration

## Model
Default: claude-sonnet-4-20250514
Use this model for all implementation tasks. Escalate to Opus only for
architecture decisions or security reviews.

API Mode (Programmatic)

echo "Refactor the auth module to use middleware" | claude --model claude-sonnet-4-20250514 -p

For details on API mode, see our API mode vs interactive guide.

Pricing

Token Type	Cost per 1M Tokens
Input	$3.00
Output	$15.00
Prompt caching (write)	$3.75
Prompt caching (read)	$0.30
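As a worked example of the table above, the following sketch prices a single uncached request. The token counts are made-up inputs, and `request_cost` is an illustrative helper, not an SDK function.

```python
# Sonnet 4 list prices from the table above, in USD per million tokens.
INPUT_PER_MILLION = 3.00
OUTPUT_PER_MILLION = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request with no prompt caching."""
    total = input_tokens * INPUT_PER_MILLION + output_tokens * OUTPUT_PER_MILLION
    return total / 1_000_000


# Hypothetical request: 20,000 input tokens and 2,000 output tokens.
print(f"${request_cost(20_000, 2_000):.2f}")
```

For these inputs the input side costs $0.06 and the output side $0.03, so the request comes to $0.09. Output tokens are five times the price of input tokens, so long responses dominate the bill faster than long prompts.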

Prompt Caching

Prompt caching is where Sonnet 4 becomes particularly cost-effective. When the same system prompt or context is sent across multiple requests, cached input tokens cost just $0.30 per million — a 90% reduction.

This matters most for:

Agent loops (system prompt + conversation history cached across turns)
Batch processing (same instructions, different data)
Claude Code sessions (CLAUDE.md content cached)
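The savings depend on how often the cached prefix is reused. The sketch below compares a shared 50,000-token prompt sent 20 times with and without caching, using the prices from the table. It assumes the first request pays the cache-write price and every later request is a cache hit, which holds only if requests arrive before the cache entry expires. The function names are illustrative.

```python
# USD per million tokens, from the pricing table.
INPUT = 3.00
CACHE_WRITE = 3.75
CACHE_READ = 0.30


def uncached_cost(prompt_tokens: int, requests: int) -> float:
    """Cost of sending the same prompt on every request without caching."""
    return prompt_tokens * requests * INPUT / 1_000_000


def cached_cost(prompt_tokens: int, requests: int) -> float:
    """Cost when the first request writes the cache and the rest read from it."""
    write = prompt_tokens * CACHE_WRITE
    reads = prompt_tokens * (requests - 1) * CACHE_READ
    return (write + reads) / 1_000_000


plain = uncached_cost(50_000, 20)
cached = cached_cost(50_000, 20)
print(f"uncached ${plain:.4f}, cached ${cached:.4f}, saved {1 - cached / plain:.0%}")
```

Under these assumptions the uncached total is $3.00 and the cached total is about $0.47, a saving of about 84%. That is below the headline 90% because the first request pays the higher write price. For a prompt sent only once, caching costs more than not caching.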

Monthly Cost Estimates

Usage Level	Estimated Monthly Cost
Light (1-2 sessions/day)	$10-30
Moderate (5-10 sessions/day)	$50-150
Heavy (20+ sessions/day)	$200-600
Team (5 developers)	$500-2,000

For precise cost tracking, see our ccusage guide.

Comparisons

vs Sonnet 4.5

Aspect	Sonnet 4	Sonnet 4.5
Instruction following	Better	Good
Coding accuracy	Better	Strong
Creative tasks	Good	Slightly better
Pricing	Same	Same
Status	Current recommended	Available but not recommended

Sonnet 4 supersedes Sonnet 4.5 for nearly all use cases. Use Sonnet 4 for new projects. Only use Sonnet 4.5 if you have tested both and Sonnet 4.5 performs better on your specific workload. See our Sonnet 4.5 model guide for details.

vs Opus 4

Aspect	Sonnet 4	Opus 4
Complex reasoning	Good	Significantly better
Instruction following	Strong	Strong
Multi-step planning	Good	Better
Cost (input/output)	$3/$15	$15/$75
Speed	Faster	Slower
Best for	80% of tasks	Top 20% complexity

Use Sonnet 4 for feature implementation, code generation, bug fixes, test writing, and standard development tasks. This covers the majority of daily work.

Use Opus 4 for system architecture decisions, complex debugging that resists simpler attempts, security audits, and tasks requiring deep multi-step reasoning.

For model routing strategies, see our router guide.

vs Haiku 4.5

Aspect	Sonnet 4	Haiku 4.5
Capability	Full	Limited
Coding	Strong	Basic-adequate
Cost (input/output)	$3/$15	$1/$5
Speed	Fast	Fastest
Best for	General development	Simple tasks, classification

Use Haiku for: typo fixes, formatting, boilerplate generation, tab completion, and any task where speed matters more than depth. At about one-third of Sonnet's price ($1 vs $3 input, $5 vs $15 output, roughly 67% less), Haiku is the right choice for simple operations.

Best Practices

When to Use Sonnet 4

Feature implementation: writing new code, endpoints, components
Bug fixes: diagnosing and fixing reported issues
Code review: analyzing code for quality and correctness
Test generation: writing unit and integration tests
Refactoring: restructuring code while preserving behavior
Documentation: generating code comments, API docs, README content

When to Escalate to Opus

The task involves designing a new system from scratch
Previous Sonnet attempts produced incorrect solutions
The task requires reasoning about security implications across multiple systems
You need to analyze a complex bug involving race conditions or distributed systems

When to Downgrade to Haiku

Fixing a typo or renaming a variable
Generating boilerplate from a template
Running a simple search-and-replace
Quick classification or categorization tasks
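The three lists above amount to a small routing table. Here is a minimal sketch of that logic. The task labels and the `pick_tier` function are hypothetical, and the returned strings are tier names rather than full model IDs, so map them to dated IDs in production code.

```python
# Task labels are illustrative; adapt them to however your workflow tags tasks.
ESCALATE_TO_OPUS = {"system_design", "security_review", "distributed_debugging"}
DOWNGRADE_TO_HAIKU = {"typo_fix", "rename", "boilerplate", "search_replace", "classification"}


def pick_tier(task_type: str, sonnet_failed: bool = False) -> str:
    """Return the model tier suggested by the best-practice lists above."""
    if task_type in ESCALATE_TO_OPUS or sonnet_failed:
        return "opus"  # hard problems, or a retry after Sonnet got it wrong
    if task_type in DOWNGRADE_TO_HAIKU:
        return "haiku"  # simple, speed-sensitive work
    return "sonnet"  # default for everyday development tasks


for task in ("bug_fix", "typo_fix", "security_review"):
    print(task, "->", pick_tier(task))
print("bug_fix after a failed attempt ->", pick_tier("bug_fix", sonnet_failed=True))
```

Checking escalation before downgrade means a task that already failed on Sonnet goes to Opus even if it looked simple at first.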

Frequently Asked Questions

Is Sonnet 4 the same as Sonnet 4.5? No. They are separate models with different model IDs. Sonnet 4 was released in May 2025 and Sonnet 4.5 in September 2025, so the higher version number is the later release. This guide finds Sonnet 4 generally better at instruction following and coding tasks.

Should I always use the full model ID? In production code, yes. Use claude-sonnet-4-20250514 to ensure consistent behavior. In interactive Claude Code sessions, the shorthand sonnet is fine.

Does Sonnet 4 support streaming? Yes. Both the API and Claude Code support streaming responses from Sonnet 4.

What are the rate limits? Rate limits depend on your API usage tier and are applied per model, so the numbers for Sonnet 4 can differ from those for other models. Check console.anthropic.com for your current limits.

Can Sonnet 4 handle a 200K-token input? Yes. Sonnet 4 supports the full 200K context window. Performance remains strong across the full window, though costs scale linearly with input size.

How does Sonnet 4 compare to GPT-4o or Gemini? Sonnet 4 is competitive with GPT-4o on coding benchmarks and generally stronger at instruction following. Direct comparisons depend on the specific task. Test on your own workloads.

Can I use Sonnet 4 with extended thinking and tool use simultaneously?

Yes. Extended thinking and tool use work together. The model can think through a problem before deciding which tools to call.

Is Sonnet 4 available on Amazon Bedrock and Google Vertex AI?

Yes. Sonnet 4 is available through both cloud providers. Check their documentation for the exact model ID format used on each platform.

Does Sonnet 4 support image input in Claude Code?

Yes. Claude Code accepts images that you paste, drag into the terminal, or reference by file path, and passes them to the model. Image input is also available through the API and the Claude.ai web interface.

How often does Anthropic update the Sonnet model?

Anthropic releases new model versions periodically. Always use the full model ID with date suffix in production to avoid unexpected changes when defaults are updated.

Claude Opus 4.1 (20250805) with Thinking-16K

The model ID claude-opus-4-1-20250805-thinking-16k refers to Claude Opus 4.1 with a 16,000-token extended thinking budget. This is a specialized configuration of the Opus model family designed for tasks requiring deep, multi-step reasoning within a constrained thinking window.

What the Model ID Means

claude-opus-4-1-20250805-thinking-16k
│      │     │ │         │
│      │     │ │         └── Extended thinking budget: 16K tokens
│      │     │ └──────────── Release date: August 5, 2025
│      │     └────────────── Version: 4.1
│      └──────────────────── Tier: Opus (highest capability)
└─────────────────────────── Family: Claude

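The -thinking-16k suffix indicates a 16,000-token budget for extended thinking (chain-of-thought reasoning). The model uses this budget to “think through” complex problems before producing its final answer. In the Anthropic API the budget is not part of the model name: you pass the base ID claude-opus-4-1-20250805 and set the budget through the thinking parameter, as the API Usage section does.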

When to Use Opus 4.1 Thinking-16K

This model variant is optimal for:

Complex architecture decisions requiring analysis of multiple interacting systems
Security audits where the model must trace data flow across multiple files and identify subtle vulnerabilities
Mathematical proofs and formal reasoning that benefit from step-by-step derivation
Multi-constraint optimization problems where the model must balance competing requirements
Debugging distributed systems where root cause analysis requires tracing events across services

Pricing and Cost Considerations

Opus 4.1 with thinking-16k costs more per request than standard Opus because the thinking tokens count toward output token billing:

Component	Cost
Input tokens	$15 per million
Output tokens (including thinking)	$75 per million
16K thinking tokens per request	~$1.20 in thinking output alone

The thinking budget is a maximum, not a fixed cost. Simple questions may use only 2K-5K thinking tokens. The full 16K is consumed only on genuinely complex problems.
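The range described above is easy to quantify. This sketch prices the thinking portion of a response at the Opus output rate from the table. `thinking_cost` is an illustrative helper, and the token counts are the examples from the paragraph above.

```python
# Opus 4.1 output price from the table, in USD per million tokens.
OPUS_OUTPUT_PER_MILLION = 75.00


def thinking_cost(thinking_tokens: int, price_per_million: float = OPUS_OUTPUT_PER_MILLION) -> float:
    """Cost in USD of the thinking tokens alone, billed at the output rate."""
    return thinking_tokens * price_per_million / 1_000_000


for used in (2_000, 5_000, 16_000):
    print(f"{used:>6} thinking tokens: ${thinking_cost(used):.3f}")
```

At 2,000 tokens the thinking costs $0.15, at 5,000 tokens about $0.38, and only a request that uses the full 16,000-token budget reaches $1.20. The visible answer is billed on top of this at the same output rate.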

API Usage

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-1-20250805",
    max_tokens=20000,  # must exceed budget_tokens; leaves 4,000 tokens for the answer
    thinking={
        "type": "enabled",
        "budget_tokens": 16000,  # the 16K thinking budget
    },
    messages=[{"role": "user", "content": "Analyze this distributed system for race conditions..."}],
)

Comparison with Sonnet 4 Extended Thinking

Aspect	Opus 4.1 Thinking-16K	Sonnet 4 Extended Thinking
Reasoning depth	Deepest	Strong
Cost per request	~$1.20+ thinking alone	~$0.25 thinking
Speed	Slower	Faster
Best for	Research, architecture, proofs	Daily coding, implementation

For most development tasks, Sonnet 4 with extended thinking provides sufficient reasoning depth at a fraction of the cost. Reserve Opus 4.1 thinking-16k for problems where Sonnet’s reasoning falls short. See our Claude Code router guide for routing strategies between these models.

Related Guides

Sonnet 4.5 model guide: comparison with Sonnet 4.5
Claude Code cost breakdown: pricing across all models
Model routing strategies: when to use which model
Reduce Claude Code costs: save money without losing quality
Claude Agent SDK: build agents on Sonnet 4
Cost tracking with ccusage: monitor spend per model
Claude Code prompt engineering: optimize prompts for Sonnet
The Claude Code Playbook: comprehensive reference
Claude temperature settings guide: configure temperature for Sonnet 4
