Anthropic Message Batches API Guide (2026)
The Problem
You need to process hundreds or thousands of Claude API requests, but sending them one at a time is slow and expensive, and it runs into rate limits. Real-time responses are not required for your use case.
Quick Fix
Use the Message Batches API to submit up to 100,000 requests in a single batch at 50% of standard API pricing:
import anthropic
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request
client = anthropic.Anthropic()
message_batch = client.messages.batches.create(
    requests=[
        Request(
            custom_id="request-1",
            params=MessageCreateParamsNonStreaming(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=[{"role": "user", "content": "Summarize this document..."}],
            ),
        ),
    ]
)
print(message_batch.id)
What’s Happening
The Message Batches API processes requests asynchronously. When you submit a batch, Anthropic queues all requests and processes them in parallel, and you collect the results later instead of waiting on each response. Most batches complete within 1 hour. Each request in the batch is handled independently, so one failure does not affect the others.
The key advantage is cost: all batch usage is charged at 50% of standard API prices. For Claude Sonnet 4.6, that means $1.50 per million input tokens and $7.50 per million output tokens instead of $3 and $15 respectively.
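To make the discount concrete, here is a minimal sketch of a cost estimate. The rates are copied from this guide's pricing table and should be checked against current pricing. The function name and the workload figures are hypothetical, chosen only for illustration.

```python
# Batch prices in USD per million tokens (MTok), taken from this guide's
# pricing table. Treat them as a snapshot, not an authoritative source.
BATCH_PRICES_PER_MTOK = {
    "Claude Opus 4.6": (2.50, 12.50),
    "Claude Sonnet 4.6": (1.50, 7.50),
    "Claude Haiku 4.5": (0.50, 2.50),
}


def estimate_batch_cost(model, input_tokens, output_tokens):
    """Return the estimated batch cost in USD for the given token totals."""
    input_price, output_price = BATCH_PRICES_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 10,000 requests at roughly 2,000 input tokens
# and 500 output tokens each.
cost = estimate_batch_cost("Claude Sonnet 4.6", 10_000 * 2_000, 10_000 * 500)
print(f"Estimated batch cost: ${cost:.2f}")
```

Because batch pricing is half the standard rate, the same workload sent through the standard API would cost twice this estimate.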
Step-by-Step Fix
Step 1: Prepare your batch requests
Each request needs a unique custom_id (1-64 alphanumeric characters, hyphens, and underscores) and a params object with standard Messages API parameters:
import anthropic
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request
client = anthropic.Anthropic()
# `documents` is assumed to be a list of strings you have already loaded.
requests = []
for i, doc in enumerate(documents):
    requests.append(
        Request(
            custom_id=f"doc-{i}",
            params=MessageCreateParamsNonStreaming(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=[
                    {"role": "user", "content": f"Summarize: {doc}"}
                ],
            ),
        )
    )
message_batch = client.messages.batches.create(requests=requests)
print(f"Batch ID: {message_batch.id}")
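The custom_id rule from Step 1 can be checked locally before you submit, which avoids finding a malformed or duplicate ID only after the batch is rejected. The sketch below assumes the rule exactly as stated above (1-64 letters, digits, hyphens, and underscores). The helper name is hypothetical and not part of the SDK.

```python
import re

# Pattern derived from the rule stated in Step 1: 1-64 characters drawn
# from ASCII letters, digits, hyphens, and underscores.
CUSTOM_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def find_invalid_custom_ids(custom_ids):
    """Return the IDs that are malformed or repeat an earlier ID."""
    seen = set()
    problems = []
    for cid in custom_ids:
        if not CUSTOM_ID_PATTERN.fullmatch(cid) or cid in seen:
            problems.append(cid)
        seen.add(cid)
    return problems


# "doc-0" appears twice and "bad id" contains a space, so both are reported.
print(find_invalid_custom_ids(["doc-0", "doc-1", "doc-0", "bad id"]))
```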
Step 2: Poll for completion
Check the batch status until processing finishes:
import time
while True:
    batch = client.messages.batches.retrieve(message_batch.id)
    if batch.processing_status == "ended":
        break
    print(f"Status: {batch.processing_status} - "
          f"{batch.request_counts.succeeded} succeeded, "
          f"{batch.request_counts.processing} processing")
    time.sleep(30)
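The loop above polls forever at a fixed interval. One possible variation, sketched here, adds a capped exponential backoff and an overall deadline. The function and its parameters are hypothetical. get_status stands in for any zero-argument callable that returns the batch's processing_status.

```python
import time


def wait_until_ended(get_status, initial_delay=5.0, max_delay=60.0,
                     timeout=24 * 60 * 60, sleep=time.sleep,
                     clock=time.monotonic):
    """Poll get_status() until it returns "ended" or the timeout elapses.

    get_status is a hypothetical zero-argument callable, for example:
    lambda: client.messages.batches.retrieve(batch_id).processing_status
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = get_status()
        if status == "ended":
            return status
        if clock() >= deadline:
            raise TimeoutError(f"batch still {status!r} after {timeout} seconds")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped


# Dry run with canned statuses and a recording sleep function, so no real
# waiting or API call happens here.
statuses = iter(["in_progress", "in_progress", "ended"])
recorded_delays = []
final_status = wait_until_ended(lambda: next(statuses), sleep=recorded_delays.append)
print(final_status, recorded_delays)
```

The default timeout matches the 24-hour expiration window described in this guide. Injecting sleep and clock keeps the helper easy to test.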
Step 3: Retrieve results
Stream results for the completed batch:
for result in client.messages.batches.results(message_batch.id):
    if result.result.type == "succeeded":
        print(f"{result.custom_id}: {result.result.message.content[0].text}")
    elif result.result.type == "errored":
        print(f"{result.custom_id}: Error - {result.result.error}")
    else:
        # Remaining result types: "canceled" and "expired"
        print(f"{result.custom_id}: {result.result.type}")
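Batch results are not guaranteed to come back in the order the requests were submitted, so it usually helps to match them to inputs by custom_id instead of by position. The sketch below groups results by outcome type. It relies only on the custom_id and result.type attributes used in Step 3, and the helper name and stand-in objects are hypothetical.

```python
from types import SimpleNamespace


def index_results(results):
    """Group batch results by outcome type, then by custom_id."""
    by_type = {}
    for entry in results:
        by_type.setdefault(entry.result.type, {})[entry.custom_id] = entry.result
    return by_type


# Stand-in objects that mimic the attributes read in Step 3. In real use,
# pass client.messages.batches.results(batch_id) instead.
fake_results = [
    SimpleNamespace(custom_id="doc-1", result=SimpleNamespace(type="succeeded")),
    SimpleNamespace(custom_id="doc-0", result=SimpleNamespace(type="errored")),
]
indexed = index_results(fake_results)
print(sorted(indexed))
```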
Step 4: Handle errors and expiration
Individual requests can fail without affecting the batch. Batches expire if processing does not complete within 24 hours. Results are available for 29 days after creation.
batch = client.messages.batches.retrieve(message_batch.id)
counts = batch.request_counts
print(f"Succeeded: {counts.succeeded}")
print(f"Errored: {counts.errored}")
print(f"Expired: {counts.expired}")
Step 5: Use with prompt caching for better performance
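Since batches can take time to process, use the 1-hour cache duration for shared context. The default cache lifetime is 5 minutes, so the 1-hour duration has to be requested explicitly with a ttl value. Cache hits in batches are best-effort, because requests are processed concurrently and in any order:
# Inside the same loop as Step 1 (`i` and `doc` come from enumerate(documents)).
requests.append(
    Request(
        custom_id=f"doc-{i}",
        params=MessageCreateParamsNonStreaming(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system=[{
                "type": "text",
                "text": shared_system_prompt,
                # "ttl": "1h" selects the 1-hour cache; without it the 5-minute default applies
                "cache_control": {"type": "ephemeral", "ttl": "1h"}
            }],
            messages=[
                {"role": "user", "content": f"Analyze: {doc}"}
            ],
        ),
    )
)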
Batch limits
- Maximum 100,000 requests or 256 MB per batch, whichever comes first
- Batches expire after 24 hours if not complete
- Results available for 29 days after creation
- All active Claude models are supported
Pricing reference
| Model | Batch Input | Batch Output |
|---|---|---|
| Claude Opus 4.6 | $2.50/MTok | $12.50/MTok |
| Claude Sonnet 4.6 | $1.50/MTok | $7.50/MTok |
| Claude Haiku 4.5 | $0.50/MTok | $2.50/MTok |
Prevention
Design your batch pipelines to handle partial failures. Always check request_counts after processing ends. Implement retry logic for expired or errored requests by resubmitting them in a new batch.
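As a rough sketch of that retry step, the helper below picks out the original requests whose results ended as errored or expired so they can be resubmitted. It assumes the requests are dict-like with a "custom_id" key and that results expose custom_id and result.type as in Step 3. The function name is hypothetical. A request that failed because it was invalid will fail again unless you correct it first, so inspect the errors before retrying blindly.

```python
from types import SimpleNamespace


def build_retry_requests(original_requests, results,
                         retry_types=("errored", "expired")):
    """Return the original requests whose results ended in a retryable state."""
    retry_ids = {r.custom_id for r in results if r.result.type in retry_types}
    return [req for req in original_requests if req["custom_id"] in retry_ids]


# Stand-in data. Real requests would also carry a "params" entry, and real
# results would come from client.messages.batches.results(batch_id).
original = [{"custom_id": "doc-0"}, {"custom_id": "doc-1"}, {"custom_id": "doc-2"}]
outcomes = [
    SimpleNamespace(custom_id="doc-0", result=SimpleNamespace(type="succeeded")),
    SimpleNamespace(custom_id="doc-1", result=SimpleNamespace(type="errored")),
    SimpleNamespace(custom_id="doc-2", result=SimpleNamespace(type="expired")),
]
retry_requests = build_retry_requests(original, outcomes)
print([req["custom_id"] for req in retry_requests])
```

The returned list can be passed to client.messages.batches.create(requests=...) as a new batch.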
For large-scale evaluations, split work into multiple batches under the 100K request limit and process them concurrently for maximum throughput.
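One way to do the splitting is sketched below. It chunks a request list so that each chunk stays under both the request-count and size limits listed in this guide. Measuring size as the JSON-encoded length of each request is an assumption about how the 256 MB limit is counted, and the function name is hypothetical.

```python
import json

MAX_REQUESTS = 100_000
# 256 MB limit from this guide. Counting JSON-encoded bytes per request is
# an approximation, so leave some headroom in practice.
MAX_BYTES = 256 * 1024 * 1024


def split_into_batches(requests, max_requests=MAX_REQUESTS, max_bytes=MAX_BYTES):
    """Split requests into chunks that respect both the count and size limits."""
    batches, current, current_bytes = [], [], 0
    for req in requests:
        size = len(json.dumps(req).encode("utf-8"))
        # Start a new chunk when adding this request would exceed a limit.
        if current and (len(current) >= max_requests
                        or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(req)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


# Small demonstration with an artificially low request limit.
demo_requests = [{"custom_id": f"doc-{i}"} for i in range(5)]
chunks = split_into_batches(demo_requests, max_requests=2)
print([len(chunk) for chunk in chunks])
```

Each chunk can then be submitted as its own batch and polled independently.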
Related Guides
- Claude API Batch Processing Large Datasets Guide
- Claude API Cost Optimization Strategies
- Claude API Rate Limit Fix
- Claude temperature settings guide — How to configure temperature and sampling parameters in Claude
Related Articles
- Message Batches API Tutorial with Cost Examples
- Claude Code for Claude Batch API: Anthropic Workflow Guide
Common Questions
How do I get started with the Anthropic Message Batches API?
Begin with the Quick Fix in this guide. Install the anthropic Python SDK, set your API key in the environment, and test with a small batch before scaling up to full-size workloads.
What are the prerequisites?
You need a working Python environment with the anthropic SDK installed and an Anthropic API key. Familiarity with the standard Messages API is helpful, because each batch request uses the same parameters. No advanced AI knowledge is required.
Can I use this with my existing development workflow?
Yes. Batch submission, polling, and result retrieval are ordinary API calls, so they fit into scheduled jobs and CI/CD pipelines. Start with a single workload that does not need real-time responses and expand once you have verified the savings.
Where can I find more advanced techniques?
Explore the related guides and articles listed above for deeper coverage.