How to Reduce Claude Code Token Usage (2026)

The Problem

The average Claude Code session consumes 50K-200K tokens. At Opus 4.6 rates ($15/MTok input, $75/MTok output), a heavy day of coding can burn $5-$20 in API costs. Multiply that across a team of five engineers working 20 days a month, and the bill reaches $2,000-$8,000 monthly – before anyone optimizes anything. A 3x reduction in token usage translates to saving $1,300-$5,300 per month for that same team.

Quick Wins (Under 5 Minutes)

  1. Run /compact after every major task – reduces context by 60-80%, saving 30K-160K tokens on the next prompt cycle.
  2. Switch to Sonnet 4.6 for routine work – at $3/$15 per MTok (input/output), Sonnet costs 5x less than Opus for the same tokens.
  3. Add file read boundaries to CLAUDE.md – a single rule like “never read files over 500 lines without using line offsets” prevents 10K+ token reads.
  4. Disable unused MCP servers – each loaded MCP tool definition adds 500-2,000 tokens to every message. Five unused tools waste 2,500-10,000 tokens per turn.
  5. Start fresh sessions for unrelated tasks – carrying a 150K-token conversation into a new topic means paying for irrelevant context on every subsequent turn.

Deep Optimization Strategies

Strategy 1: Precision File Reading

Claude Code’s Read tool costs approximately 150 tokens of overhead plus the file content itself. Reading a 1,000-line file adds roughly 8,000-12,000 tokens. The optimization: read only what is needed.

# CLAUDE.md rule
When reading files:
- Use offset and limit parameters to read only relevant sections
- Never read an entire file over 200 lines without first checking its length
- For large files, read the first 50 lines to understand structure, then target specific sections

Before: Reading a full 800-line config file = ~9,600 tokens per read. After: Reading a targeted 50-line section = ~750 tokens per read. Savings: 92%.

Strategy 2: Model Routing by Task Complexity

Not every task requires Opus 4.6. Route tasks to the cheapest model that can handle them.

# Use Haiku 4.5 for simple file operations and searches
claude --model haiku "Find all TODO comments in src/"
# Use Sonnet 4.6 for standard coding tasks
claude --model sonnet "Refactor the auth middleware to use JWT validation"
# Reserve Opus 4.6 for complex architecture decisions
claude --model opus "Design the event-driven migration from REST to GraphQL"
Task Type Recommended Model Cost per 100K tokens (in+out)
File search, grep, simple edits Haiku 4.5 $0.48
Standard coding, refactoring Sonnet 4.6 $1.80
Architecture, complex debugging Opus 4.6 $9.00

Routing 60% of tasks to Sonnet and 20% to Haiku instead of using Opus for everything saves roughly 70% on API spend.

Strategy 3: Structured Prompts Over Conversational Ones

Vague prompts cause Claude Code to explore, read multiple files, and iterate – all of which cost tokens. Structured prompts with explicit constraints eliminate discovery overhead.

# Bad prompt (triggers 5-8 tool calls, ~3,000 token overhead):
"Fix the login bug"
# Good prompt (triggers 2-3 tool calls, ~700 token overhead):
"In src/auth/login.ts, the handleLogin function on line 45 throws
'undefined is not a function' when the OAuth token is expired.
Fix the null check on the token.expiresAt field."

Specific prompts reduce tool call overhead from an average of 6 calls (6 x 245 = 1,470 tokens in Bash overhead alone) to 2 calls (490 tokens). Combined with reduced file reading, this produces a 60-75% reduction per interaction.

Strategy 4: CLAUDE.md as a Context Preloader

Instead of re-explaining project context every session, encode it in CLAUDE.md. This file loads once (~200-1,000 tokens) and replaces what would otherwise be 5-10 back-and-forth clarification messages costing 5,000-15,000 tokens.

# CLAUDE.md -- project context section
## Project: PaymentService
- Language: TypeScript, Node 20, pnpm
- Database: PostgreSQL via Prisma ORM
- Test runner: Vitest (run with `pnpm test`)
- API style: REST, OpenAPI 3.1 spec in docs/api.yaml
- Key directories: src/routes/, src/services/, src/models/
- NEVER modify files in src/generated/ -- these are auto-generated from Prisma

This 100-token context block eliminates an average of 3 discovery tool calls per session (3 x 245 = 735 tokens for Bash alone, plus file content read tokens).

Strategy 5: Aggressive Compaction Scheduling

The /compact command reduces context by 60-80%, but most users only run it when hitting the context window limit. Proactive compaction after every 10-15 exchanges keeps the running context lean.

# Check current context usage
/cost
# If context exceeds 80K tokens, compact immediately
/compact
# After compaction, verify the reduction
/cost

A session that grows to 150K tokens before compaction wastes approximately $2.25-$11.25 (depending on model) in redundant context charges over those 15 exchanges. Compacting at 80K tokens saves roughly 40% of that waste.

Measuring Your Savings

Track token usage with the built-in /cost command to see per-session totals. For long-term tracking across sessions, use ccusage:

# Install ccusage for historical tracking
npm install -g ccusage
# View usage for the past 7 days
ccusage --days 7
# Export to JSON for spreadsheet analysis
ccusage --days 30 --format json > usage-report.json

Compare your weekly totals before and after implementing these strategies. A 3x reduction means your weekly token count should drop to roughly 33% of the baseline.

Cost Impact Summary

Technique Token Savings Monthly Savings (Solo, Opus)
Precision file reading 40-60% on reads $15-$45
Model routing (60% Sonnet) 70% on routed tasks $50-$150
Structured prompts 60-75% per interaction $20-$60
CLAUDE.md preloading 5K-15K tokens/session $10-$30
Proactive compaction 40% context waste $25-$75
Combined ~67% (3x reduction) $120-$360

For a team of five engineers, multiply the solo figures by 5 for a monthly savings range of $600-$1,800.

Which model? → Take the 5-question quiz in our Model Selector.

Estimate tokens → Calculate your usage with our Token Estimator.

Try it: Estimate your monthly spend with our Cost Calculator.

See Also