Fix Claude API Rate Limit Errors (HTTP (2026)

TL;DR: HTTP 429 means you have exceeded your request or token rate limit. Implement exponential backoff, check your usage tier limits, and batch requests to stay within bounds.

The Problem

Your Claude API calls start failing with HTTP Claude rate exceeded error fix:

{
 "type": "error",
 "error": {
 "type": "rate_limit_error",
 "message": "Rate limit exceeded. Please retry after X seconds."
 }
}

Why This Happens

Anthropic enforces rate limits across three dimensions:

  • Requests per minute (RPM): How many API calls you can make per minute.
  • Input tokens per minute (ITPM): Total input tokens consumed per minute.
  • Output tokens per minute (OTPM): Total output tokens generated per minute.

Limits vary by usage tier (Tier 1, Tier 2, Tier 3, Tier 4, and Monthly Invoicing) and by model. Opus models typically have lower limits than Sonnet or Haiku. You can view your current tier and limits in the Anthropic Console.

The Fix

Step 1 — Check Your Current Limits and Usage

# Make a request and inspect rate limit headers
curl -s -D - https://api.anthropic.com/v1/messages \
 -H "x-api-key: $ANTHROPIC_API_KEY" \
 -H "anthropic-version: 2023-06-01" \
 -H "content-type: application/json" \
 -d '{"model":"claude-sonnet-4-5-20250514","max_tokens":32,"messages":[{"role":"user","content":"hi"}]}' \
 -o /dev/null 2>&1 | grep -i "rate\|limit\|retry"

Key response headers returned on every request:

anthropic-ratelimit-requests-limit — Your RPM cap
anthropic-ratelimit-requests-remaining — Remaining requests this window
anthropic-ratelimit-requests-reset — When the RPM window resets (ISO 8601)
anthropic-ratelimit-input-tokens-limit — Your ITPM cap
anthropic-ratelimit-input-tokens-remaining
anthropic-ratelimit-input-tokens-reset
anthropic-ratelimit-output-tokens-limit — Your OTPM cap
anthropic-ratelimit-output-tokens-remaining
anthropic-ratelimit-output-tokens-reset
retry-after — Seconds to wait before retrying (on 429 responses)

Step 2 — Implement Proper Retry Logic

Python SDK (automatic retries built in):

import anthropic
import time
client = anthropic.Anthropic(max_retries=5)
def call_with_backoff(messages, max_attempts=5):
 """Call API with manual backoff for rate limits."""
 for attempt in range(max_attempts):
 try:
 return client.messages.create(
 model="claude-sonnet-4-5-20250514",
 max_tokens=1024,
 messages=messages
 )
 except anthropic.RateLimitError as e:
 if attempt == max_attempts - 1:
 raise
 wait = min(2 ** attempt, 60) # Cap at 60s
 print(f"Rate limited. Waiting {wait}s...")
 time.sleep(wait)
 return None # Unreachable but satisfies bounded return

TypeScript SDK:

import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic({ maxRetries: 5 });
async function callWithBackoff(
 messages: Anthropic.MessageParam[],
 maxAttempts = 5
): Promise<Anthropic.Message | null> {
 for (let attempt = 0; attempt < maxAttempts; attempt++) {
 try {
 return await client.messages.create({
 model: "claude-sonnet-4-5-20250514",
 max_tokens: 1024,
 messages,
 });
 } catch (error) {
 if (error instanceof Anthropic.RateLimitError) {
 if (attempt === maxAttempts - 1) throw error;
 const wait = Math.min(2 ** attempt * 1000, 60000);
 await new Promise((r) => setTimeout(r, wait));
 } else {
 throw error;
 }
 }
 }
 return null;
}

Step 3 — Batch Requests Efficiently

If you are processing many items, use a rate-limited queue:

import asyncio
import anthropic
MAX_CONCURRENT = 5 # Stay well below RPM limit
semaphore = asyncio.Semaphore(MAX_CONCURRENT)
client = anthropic.AsyncAnthropic(max_retries=3)
async def process_item(item):
 async with semaphore:
 return await client.messages.create(
 model="claude-haiku-3-5-20241022",
 max_tokens=512,
 messages=[{"role": "user", "content": item}]
 )
async def main():
 items = ["task1", "task2", "task3"] # Your work items
 results = await asyncio.gather(
 *[process_item(item) for item in items],
 return_exceptions=True
 )
 for r in results:
 assert not isinstance(r, Exception), f"Failed: {r}"
 return results

Step 4 — Verify Rate Limits Are Not Exhausted

# Check remaining quota from response headers
curl -s -D - https://api.anthropic.com/v1/messages \
 -H "x-api-key: $ANTHROPIC_API_KEY" \
 -H "anthropic-version: 2023-06-01" \
 -H "content-type: application/json" \
 -d '{"model":"claude-sonnet-4-5-20250514","max_tokens":16,"messages":[{"role":"user","content":"1"}]}' \
 2>&1 | grep "ratelimit-requests-remaining"

Expected output:

anthropic-ratelimit-requests-remaining: 48

Common Variations

Scenario Cause Quick Fix
429 only with Opus Lower per-model RPM/ITPM limits Switch to Sonnet for high-throughput tasks
429 in parallel workers Too many simultaneous requests hitting RPM Add a semaphore / rate limiter
Consistent 429 at start of each minute ITPM or OTPM exhausted before RPM Reduce prompt size or output max_tokens
Need higher limits Current tier caps too low Upgrade usage tier in the Anthropic Console

Prevention

  • Use Haiku for high-volume tasks: Haiku has much higher rate limits than Opus or Sonnet.
  • Cache responses: Avoid redundant API calls by caching deterministic responses locally.
  • Monitor usage: Track your ratelimit-*-remaining headers in dashboards to catch limits before they hit.
  • Read the retry-after header: On a 429 response, wait exactly the number of seconds specified rather than guessing.


I hit this exact error six months ago. Then I wrote a CLAUDE.md that tells Claude my stack, my conventions, and my error handling patterns. Haven't seen it since. I run 5 Claude Max subs, 16 Chrome extensions serving 50K users, and bill $500K+ on Upwork. These CLAUDE.md templates are what I actually use. Not theory — production configs. **[Grab the templates — $99 once, free forever →](https://zovo.one/lifetime?utm_source=ccg&utm_medium=cta-error&utm_campaign=claude-api-rate-limit-fix)** 47/500 founding spots. Price goes up when they're gone.

Which model? → Take the 5-question quiz in our Model Selector.

Try it: Paste your error into our Error Diagnostic for an instant fix.

Last verified: 2026-04-15. Found an issue? Open a GitHub issue.