How to Reduce Claude Code Token Usage (2026)
The Problem
The average Claude Code session consumes 50K-200K tokens. At Opus 4.6 rates ($15/MTok input, $75/MTok output), a heavy day of coding can burn $5-$20 in API costs. Multiply that across a team of five engineers working 20 days a month (100 engineer-days), and the bill reaches $500-$2,000 monthly, before anyone optimizes anything. A 3x reduction in token usage cuts that bill by two-thirds, saving roughly $330-$1,330 per month for that same team.
Quick Wins (Under 5 Minutes)
- Run /compact after every major task – reduces context by 60-80%, saving 30K-160K tokens on the next prompt cycle.
- Switch to Sonnet 4.6 for routine work – at $3/$15 per MTok (input/output), Sonnet costs 5x less than Opus for the same tokens.
- Add file read boundaries to CLAUDE.md – a single rule like “never read files over 500 lines without using line offsets” prevents 10K+ token reads.
- Disable unused MCP servers – each loaded MCP tool definition adds 500-2,000 tokens to every message. Five unused tools waste 2,500-10,000 tokens per turn.
- Start fresh sessions for unrelated tasks – carrying a 150K-token conversation into a new topic means paying for irrelevant context on every subsequent turn.
Deep Optimization Strategies
Strategy 1: Precision File Reading
Claude Code’s Read tool costs approximately 150 tokens of overhead plus the file content itself. Reading a 1,000-line file adds roughly 8,000-12,000 tokens. The optimization: read only what is needed.
# CLAUDE.md rule
When reading files:
- Use offset and limit parameters to read only relevant sections
- Never read an entire file over 200 lines without first checking its length
- For large files, read the first 50 lines to understand structure, then target specific sections
Before: Reading a full 800-line config file = ~9,600 tokens per read. After: Reading a targeted 50-line section = ~750 tokens per read. Savings: 92%.
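The before-and-after figures above can be reproduced with a rough estimator. This is a minimal sketch, not a measurement: the function name `estimate_read_tokens` is hypothetical, and the default of 12 tokens per line is an assumption taken from the upper end of the 8-12 tokens-per-line range implied above. Real files vary.

```python
READ_OVERHEAD_TOKENS = 150  # approximate per-call overhead quoted in this guide


def estimate_read_tokens(lines: int, tokens_per_line: int = 12) -> int:
    """Rough token cost of one Read call: fixed overhead plus file content."""
    return READ_OVERHEAD_TOKENS + lines * tokens_per_line


full_read = estimate_read_tokens(800)     # the whole 800-line file
targeted_read = estimate_read_tokens(50)  # one 50-line section
savings = 1 - targeted_read / full_read

print(f"full: {full_read}, targeted: {targeted_read}, savings: {savings:.0%}")
```

Counting the overhead on both reads puts the full read at about 9,750 tokens rather than 9,600. The saving still rounds to 92%.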
Strategy 2: Model Routing by Task Complexity
Not every task requires Opus 4.6. Route tasks to the cheapest model that can handle them.
# Use Haiku 4.5 for simple file operations and searches
claude --model haiku "Find all TODO comments in src/"
# Use Sonnet 4.6 for standard coding tasks
claude --model sonnet "Refactor the auth middleware to use JWT validation"
# Reserve Opus 4.6 for complex architecture decisions
claude --model opus "Design the event-driven migration from REST to GraphQL"
| Task Type | Recommended Model | Cost per 100K input + 100K output tokens |
|---|---|---|
| File search, grep, simple edits | Haiku 4.5 | $0.48 |
| Standard coding, refactoring | Sonnet 4.6 | $1.80 |
| Architecture, complex debugging | Opus 4.6 | $9.00 |
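Routing 60% of tasks to Sonnet and 20% to Haiku, with the remaining 20% left on Opus, lowers the blended cost at the table's rates from $9.00 to about $2.98, saving roughly 67% on API spend compared with using Opus for everything.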
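The routing arithmetic can be checked with a weighted average. The sketch below assumes the per-model costs from the table and a task mix of 20% Opus, 60% Sonnet, and 20% Haiku. The helper name `blended_cost` is hypothetical, and treating every task as the same size is a simplification.

```python
# Cost in USD per 100K input + 100K output tokens, copied from the table.
COST_PER_UNIT = {"opus": 9.00, "sonnet": 1.80, "haiku": 0.48}


def blended_cost(mix: dict[str, float]) -> float:
    """Weighted average cost for a task mix whose shares sum to 1."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("task shares must sum to 1")
    return sum(COST_PER_UNIT[model] * share for model, share in mix.items())


all_opus = blended_cost({"opus": 1.0})
routed = blended_cost({"opus": 0.2, "sonnet": 0.6, "haiku": 0.2})
savings = 1 - routed / all_opus

print(f"all Opus: ${all_opus:.2f}, routed: ${routed:.2f}, savings: {savings:.0%}")
```

Under these assumptions the saving works out to about 67%. Shifting more of the remaining Opus share to Sonnet pushes it higher.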
Strategy 3: Structured Prompts Over Conversational Ones
Vague prompts cause Claude Code to explore, read multiple files, and iterate – all of which cost tokens. Structured prompts with explicit constraints eliminate discovery overhead.
# Bad prompt (triggers 5-8 tool calls, ~3,000 token overhead):
"Fix the login bug"
# Good prompt (triggers 2-3 tool calls, ~700 token overhead):
"In src/auth/login.ts, the handleLogin function on line 45 throws
'undefined is not a function' when the OAuth token is expired.
Fix the null check on the token.expiresAt field."
Specific prompts reduce tool call overhead from an average of 6 calls (6 x 245 = 1,470 tokens in Bash overhead alone) to 2 calls (490 tokens). Combined with reduced file reading, this produces a 60-75% reduction per interaction.
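One way to make the structured form habitual is to assemble prompts from named fields. The sketch below is illustrative only: `BugReport` and `to_prompt` are hypothetical names, not part of Claude Code, and the fields simply mirror the elements of the good prompt above (file, function, line, symptom, requested fix).

```python
from dataclasses import dataclass


@dataclass
class BugReport:
    """Hypothetical container for the details a precise prompt needs."""

    path: str
    function: str
    line: int
    symptom: str
    fix: str

    def to_prompt(self) -> str:
        # Lead with the exact location so it does not have to be searched for.
        return (
            f"In {self.path}, the {self.function} function on line {self.line} "
            f"{self.symptom}. {self.fix}"
        )


report = BugReport(
    path="src/auth/login.ts",
    function="handleLogin",
    line=45,
    symptom="throws 'undefined is not a function' when the OAuth token is expired",
    fix="Fix the null check on the token.expiresAt field.",
)
print(report.to_prompt())
```

A missing field is a useful signal here: if the file or symptom is unknown, it is probably cheaper to find it manually than to pay for open-ended exploration.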
Strategy 4: CLAUDE.md as a Context Preloader
Instead of re-explaining project context every session, encode it in CLAUDE.md. This file loads once (~200-1,000 tokens) and replaces what would otherwise be 5-10 back-and-forth clarification messages costing 5,000-15,000 tokens.
# CLAUDE.md -- project context section
## Project: PaymentService
- Language: TypeScript, Node 20, pnpm
- Database: PostgreSQL via Prisma ORM
- Test runner: Vitest (run with `pnpm test`)
- API style: REST, OpenAPI 3.1 spec in docs/api.yaml
- Key directories: src/routes/, src/services/, src/models/
- NEVER modify files in src/generated/ -- these are auto-generated from Prisma
This 100-token context block eliminates an average of 3 discovery tool calls per session (3 x 245 = 735 tokens for Bash alone, plus file content read tokens).
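Because CLAUDE.md is paid for in every session, it helps to keep an eye on its size. The sketch below is a rough check, assuming about 4 characters per token, which is a common rule of thumb for English text and not an exact tokenizer. The function names and the 1,000-token default budget (the upper end of the range quoted above) are assumptions for illustration.

```python
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate; use a real tokenizer for exact counts."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def within_budget(text: str, budget: int = 1000) -> bool:
    """True if the estimated size stays at or under the token budget."""
    return estimate_tokens(text) <= budget


context_block = """## Project: PaymentService
- Language: TypeScript, Node 20, pnpm
- Database: PostgreSQL via Prisma ORM
- Test runner: Vitest (run with `pnpm test`)
"""

print(estimate_tokens(context_block), within_budget(context_block))
```

To check a real file, read it with `pathlib.Path("CLAUDE.md").read_text()` and pass the result to `within_budget`.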
Strategy 5: Aggressive Compaction Scheduling
The /compact command reduces context by 60-80%, but most users only run it when hitting the context window limit. Proactive compaction after every 10-15 exchanges keeps the running context lean.
# Check current context usage
/cost
# If context exceeds 80K tokens, compact immediately
/compact
# After compaction, verify the reduction
/cost
A session that grows to 150K tokens before compaction wastes approximately $2.25-$11.25 (depending on model) in redundant context charges over those 15 exchanges. Compacting at 80K tokens saves roughly 40% of that waste.
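The effect of compacting early can be approximated with a small simulation. This is a sketch under stated assumptions, not a model of Claude Code's actual billing: the context grows by a constant 10K tokens per exchange, every turn is charged for the full context as input (prompt caching is ignored), and compaction keeps 30% of the context, the midpoint of the 60-80% reduction quoted above. The function name is hypothetical.

```python
def cumulative_input_tokens(exchanges, growth, compact_at=None, keep_percent=30):
    """Total input tokens billed when every turn resends the full context.

    growth       -- tokens added to the context per exchange (assumed constant)
    compact_at   -- compact once the context reaches this size; None disables it
    keep_percent -- share of the context left after compaction (30 = a 70% cut)
    """
    context = 0
    total = 0
    for _ in range(exchanges):
        context += growth
        total += context  # this turn pays for everything accumulated so far
        if compact_at is not None and context >= compact_at:
            context = context * keep_percent // 100
    return total


baseline = cumulative_input_tokens(15, 10_000)
compacted = cumulative_input_tokens(15, 10_000, compact_at=80_000)
print(baseline, compacted, f"{1 - compacted / baseline:.0%}")
```

With these inputs the total falls from 1.2M to about 0.75M input tokens over 15 exchanges, a reduction of roughly 38%, which is comparable to the 40% estimate above.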
Measuring Your Savings
Track token usage with the built-in /cost command to see per-session totals. For long-term tracking across sessions, use ccusage:
# Install ccusage for historical tracking
npm install -g ccusage
# View daily usage since a start date (YYYYMMDD; pick a date 7 days back)
ccusage daily --since 20260101
# Export to JSON for spreadsheet analysis
ccusage daily --since 20260101 --json > usage-report.json
Compare your weekly totals before and after implementing these strategies. A 3x reduction means your weekly token count should drop to roughly 33% of the baseline.
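The comparison itself is a single division. In the sketch below the weekly totals are hypothetical placeholders to be replaced with your own numbers, and `reduction_factor` is an illustrative helper name.

```python
def reduction_factor(baseline_tokens: int, current_tokens: int) -> float:
    """How many times smaller current usage is than the baseline."""
    if current_tokens <= 0:
        raise ValueError("current_tokens must be positive")
    return baseline_tokens / current_tokens


# Hypothetical weekly totals; substitute the figures from your own reports.
baseline_week = 3_000_000
current_week = 1_000_000

factor = reduction_factor(baseline_week, current_week)
share = current_week / baseline_week
print(f"{factor:.1f}x reduction, now at {share:.0%} of baseline")
```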
Cost Impact Summary
| Technique | Token Savings | Monthly Savings (Solo, Opus) |
|---|---|---|
| Precision file reading | 40-60% on reads | $15-$45 |
| Model routing (60% Sonnet) | 70% on routed tasks | $50-$150 |
| Structured prompts | 60-75% per interaction | $20-$60 |
| CLAUDE.md preloading | 5K-15K tokens/session | $10-$30 |
| Proactive compaction | 40% context waste | $25-$75 |
| Combined | ~67% (3x reduction) | $120-$360 |
For a team of five engineers, multiply the solo figures by 5 for a monthly savings range of $600-$1,800.
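The combined row and the team figure follow from simple addition and scaling, sketched below with the ranges copied from the table. The helper name `combined_savings` is hypothetical. A straight sum implicitly treats the techniques as independent, so if their savings overlap in practice the real total may be lower.

```python
# (low, high) monthly savings in USD per technique, from the table.
SOLO_SAVINGS = {
    "Precision file reading": (15, 45),
    "Model routing": (50, 150),
    "Structured prompts": (20, 60),
    "CLAUDE.md preloading": (10, 30),
    "Proactive compaction": (25, 75),
}


def combined_savings(team_size: int = 1) -> tuple[int, int]:
    """Sum the per-technique ranges and scale by head count."""
    low = sum(lo for lo, _ in SOLO_SAVINGS.values())
    high = sum(hi for _, hi in SOLO_SAVINGS.values())
    return low * team_size, high * team_size


print(combined_savings())   # solo developer
print(combined_savings(5))  # team of five
```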
Related Guides
- Claude Code Compact Command Guide – deep dive on compaction mechanics and timing
- Claude Code Context Window Management – strategies for staying within context limits
- Claude Code Model Selection for Cost: Sonnet vs Haiku vs Opus – full model routing decision framework