Why Smarter Models Cost MORE (Sonnet) (2026)

Last updated: April 22, 2026

What This Means for Claude Code Users

Upgrading from Sonnet 4.5 to Sonnet 4.6 changes more than the per-token price: it also tends to increase the number of tokens generated per response. Smarter models produce more thorough, detailed outputs. A task that consumed 40K output tokens on Sonnet 4.5 may consume 55K-65K output tokens on Sonnet 4.6, with more explanation, more edge case handling, and more comprehensive code. At $15/MTok output for Sonnet 4.6, the extra 15K-25K tokens add $0.23-$0.38 per interaction. This “intelligence tax” is a roughly 38-63% increase in output tokens for the same task, and it applies on top of any headline rate change.

The Concept

The token paradox describes a counterintuitive dynamic: as models become more capable, they tend to produce longer outputs. This happens for several reasons. More capable models generate more comprehensive code with better error handling. They add more comments and documentation. They consider more edge cases. They provide more detailed explanations.

This is not a flaw – it is the model doing a better job. But it has a direct cost implication. Output tokens are the most expensive token category across all models ($15/MTok for Sonnet 4.6, $75/MTok for Opus 4.6). A 30% increase in output tokens on a model that already costs more per token creates a compounding cost increase.

The paradox is most visible when teams upgrade models and see their bills rise by more than the per-token price difference would predict. If Sonnet 4.6 were 20% more expensive per token than 4.5 and generated 30% more tokens, the actual cost increase would be 56%, because the two factors multiply (1.2 × 1.3 = 1.56).
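The compounding effect can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the 20% and 30% figures are the hypothetical values from the paragraph above, not measured data.

```python
def compounded_cost_multiplier(price_multiplier: float, token_multiplier: float) -> float:
    """Overall cost multiplier when both the per-token price and the token count change."""
    return price_multiplier * token_multiplier


def percent_increase(multiplier: float) -> float:
    """Convert a multiplier such as 1.56 into a percentage increase such as 56.0."""
    return round((multiplier - 1.0) * 100, 1)


# Hypothetical inputs: 20% higher per-token price, 30% more output tokens.
overall = compounded_cost_multiplier(1.2, 1.3)
print(f"Overall cost increase: {percent_increase(overall)}%")
```

Adding the two percentages would suggest 50%. Multiplying them gives the larger figure, which is why the bill grows faster than either factor alone.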

How It Works in Practice

Example 1: Measuring Output Verbosity

Compare output length for identical prompts across model versions to quantify the paradox.

# Measure output tokens for the same task on different models
# Task: "Add input validation to the /users POST endpoint using Zod"
# Sonnet 4.5 output: ~800 tokens
# - Generates a Zod schema
# - Updates the route handler
# - Basic error response
# Sonnet 4.6 output: ~1,200 tokens
# - Generates a more detailed Zod schema with custom error messages
# - Updates the route handler with typed error responses
# - Adds input sanitization
# - Adds a comment explaining the validation pattern
# - Suggests a test case
# Output increase: 50% more tokens for the same task
# Cost at Sonnet 4.6 rates: $0.012 vs $0.018 per interaction
# Over 100 interactions/day: $1.20 vs $1.80 -- $0.60/day difference
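
The comparison above can be reproduced with a small helper. This is a minimal sketch, assuming you already have output token counts for the same prompt on both models (for API usage, the `usage.output_tokens` field of a Messages API response is one source). The function names are hypothetical, and the $15/MTok rate is the Sonnet 4.6 output rate quoted in this article.

```python
MTOK = 1_000_000  # tokens per "MTok"


def output_cost(tokens: int, rate_per_mtok: float) -> float:
    """Dollar cost of a given number of output tokens at a per-MTok rate."""
    return tokens * rate_per_mtok / MTOK


def compare_outputs(baseline_tokens: int, candidate_tokens: int,
                    rate_per_mtok: float, interactions_per_day: int) -> dict:
    """Summarize how much more verbose (and costly) the candidate output is."""
    base = output_cost(baseline_tokens, rate_per_mtok)
    cand = output_cost(candidate_tokens, rate_per_mtok)
    return {
        "token_increase_pct": round((candidate_tokens / baseline_tokens - 1) * 100, 1),
        "baseline_cost": round(base, 4),
        "candidate_cost": round(cand, 4),
        "daily_difference": round((cand - base) * interactions_per_day, 2),
    }


# Figures from Example 1: ~800 vs ~1,200 output tokens, 100 interactions/day.
report = compare_outputs(800, 1200, rate_per_mtok=15.0, interactions_per_day=100)
for key, value in report.items():
    print(f"{key}: {value}")
```

Both costs are priced at the same rate on purpose, so the result isolates the verbosity effect from any per-token price difference.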

Example 2: Controlling Output Verbosity with CLAUDE.md Rules

The most effective countermeasure is explicit output constraints in CLAUDE.md.

# CLAUDE.md -- output verbosity control
## Output Rules
- Code changes only: do not explain what the code does unless asked
- No inline comments unless the logic is non-obvious
- Commit messages: one line, under 72 characters
- When editing files, show only the changed lines (use Edit tool, not full file rewrites)
- Do not suggest tests unless the task specifically requests testing
- Do not add TODO comments or future improvement suggestions

# Before CLAUDE.md verbosity rules:
# Agent output per code edit: ~1,000-1,500 tokens
# Includes: explanation, code, follow-up suggestions, test recommendations
# After CLAUDE.md verbosity rules:
# Agent output per code edit: ~400-600 tokens
# Includes: code changes only
# Savings: 50-60% output token reduction
# At Sonnet 4.6 ($15/MTok output): saves $0.006-$0.014 per edit
# Over 50 edits/day: saves $0.30-$0.70/day = $6-$14/month (assuming 20 working days)

Token Cost Impact

The token paradox affects output tokens specifically, which are the most expensive token category. Controlling output verbosity directly targets the highest-cost component of every API call.

Daily coding session (100 interactions):
Without verbosity control:
  Average output: 1,200 tokens/interaction
  Daily output total: 120,000 tokens
  Cost at Sonnet 4.6: $1.80/day output
  Cost at Opus 4.6: $9.00/day output
With verbosity control (CLAUDE.md rules):
  Average output: 500 tokens/interaction
  Daily output total: 50,000 tokens
  Cost at Sonnet 4.6: $0.75/day output
  Cost at Opus 4.6: $3.75/day output
Savings: $1.05/day (Sonnet 4.6) to $5.25/day (Opus 4.6) = $21-$105/month (assuming 20 working days)
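
The same table can be recomputed for your own numbers. The sketch below is illustrative: the function names are hypothetical, the rates are the output rates quoted in this article, and the 20-working-day month is an assumption inferred from the daily and monthly figures above.

```python
# Output rates in $/MTok as quoted in this article.
RATES_PER_MTOK = {"Sonnet 4.6": 15.0, "Opus 4.6": 75.0}
# Assumption: 20 working days per month ($1.05/day -> $21/month).
WORKING_DAYS_PER_MONTH = 20


def daily_output_cost(avg_tokens: int, interactions: int, rate_per_mtok: float) -> float:
    """Daily output spend for a given average output length and interaction count."""
    return avg_tokens * interactions * rate_per_mtok / 1_000_000


def savings_table(before_tokens: int, after_tokens: int, interactions: int) -> dict:
    """Per-model daily cost before and after verbosity control, plus savings."""
    rows = {}
    for model, rate in RATES_PER_MTOK.items():
        before = daily_output_cost(before_tokens, interactions, rate)
        after = daily_output_cost(after_tokens, interactions, rate)
        daily = round(before - after, 2)
        rows[model] = {
            "before": round(before, 2),
            "after": round(after, 2),
            "daily_savings": daily,
            "monthly_savings": round(daily * WORKING_DAYS_PER_MONTH, 2),
        }
    return rows


# Figures from the session above: 1,200 vs 500 tokens, 100 interactions/day.
table = savings_table(before_tokens=1200, after_tokens=500, interactions=100)
for model, row in table.items():
    print(model, row)
```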

Implementation Checklist

Measure current average output tokens per interaction using /cost
Add output verbosity rules to CLAUDE.md (code only, no explanations unless asked)
Set a “max output” guideline: 600 tokens for simple edits, 1,500 for complex implementations
Instruct the agent to use Edit tool instead of full file rewrites (smaller diffs = fewer tokens)
Disable extended thinking for routine tasks (extended thinking inflates output by 2-5x)
Review weekly: if average output/interaction exceeds 800 tokens, tighten CLAUDE.md rules
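
The weekly review step can be automated with a simple check. This is a sketch under stated assumptions: the output token counts are recorded by hand or by your own tooling (the list below is made-up sample data), the function name is hypothetical, and the 800-token threshold is the one from the checklist.

```python
from statistics import mean

# Threshold from the checklist: tighten rules if the average exceeds 800 tokens.
REVIEW_THRESHOLD_TOKENS = 800


def needs_tightening(output_token_counts: list[int],
                     threshold: int = REVIEW_THRESHOLD_TOKENS) -> bool:
    """Return True when average output per interaction exceeds the threshold."""
    if not output_token_counts:
        return False  # nothing recorded, nothing to review
    return mean(output_token_counts) > threshold


# Hypothetical sample: output tokens recorded for five interactions this week.
week = [450, 620, 1300, 980, 700]
print(f"Average output: {mean(week):.0f} tokens")
print("Tighten CLAUDE.md rules" if needs_tightening(week) else "Within budget")
```

A single long response (1,300 tokens here) can pull the average over the threshold, so it is worth checking whether the overage comes from one complex task or from routine edits before changing the rules.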

The CCG Framework Connection

The token paradox is a core concept in the CCG cost framework because it affects every Claude Code user, regardless of their other optimization practices. Even users with perfect context engineering and model routing still overpay if they allow uncontrolled output verbosity. The CCG framework treats output token management as a distinct optimization axis from input token management, with its own set of rules and measurement techniques.

