# Claude API Cost Optimization

Save 50-90% on Claude API costs with three techniques verified against official Anthropic documentation.
## Quick Reference

| Technique | Savings | Use When |
|---|---|---|
| Batch API | 50% | Tasks can wait up to 24h |
| Prompt Caching | 90% | Repeated system prompts (>1K tokens) |
| Extended Thinking | ~80% | Complex reasoning tasks |
| Batch + Cache | ~95% | Bulk tasks with shared context |
## Batch API (50% Off)
### When to Use

- Bulk translations
- Daily content generation
- Overnight report processing
- NOT for real-time chat
### Code Example

```python
import anthropic

client = anthropic.Anthropic()

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "task-001",
            "params": {
                "model": "claude-sonnet-4-5",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Task 1"}],
            },
        }
    ]
)

# Results are available within 24h (usually <1h). Wait until the batch's
# processing_status is "ended" before fetching results.
for result in client.messages.batches.results(batch.id):
    print(f"{result.custom_id}: {result.result.message.content[0].text}")
```
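Fetching results before the batch has finished processing will fail. A minimal polling sketch (the `wait_for_batch` helper and the 60-second interval are illustrative, not part of the SDK):

```python
import time


def wait_for_batch(client, batch_id, poll_seconds=60):
    """Poll the Batch API until processing ends, then return the final batch object."""
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        time.sleep(poll_seconds)
```

Once `wait_for_batch` returns, the results iterator shown above can be consumed safely.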
### Key Finding: Bigger Batches = Faster

| Batch Size | Time/Request |
|---|---|
| Large (294 requests) | 0.45 min |
| Small (10 requests) | 9.84 min |

That is a ~22x efficiency difference: batch 100+ requests together whenever possible.
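One way to act on this finding is to build the full request list up front and submit it in large chunks. A sketch assuming plain string prompts (`build_batch_requests` and `chunk_requests` are illustrative helpers, not SDK functions):

```python
def build_batch_requests(prompts, model="claude-sonnet-4-5", max_tokens=1024):
    """Turn a list of prompt strings into Batch API request dicts."""
    return [
        {
            "custom_id": f"task-{i:04d}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]


def chunk_requests(requests, chunk_size=100):
    """Split requests into chunks of chunk_size; submit each chunk as one batch."""
    return [requests[i:i + chunk_size] for i in range(0, len(requests), chunk_size)]


requests = build_batch_requests([f"Translate item {n}" for n in range(250)])
chunks = chunk_requests(requests)  # three chunks: 100, 100, 50 requests
```

Each chunk would then be passed as `requests=` to `client.messages.batches.create`.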
## Prompt Caching (90% Off)
### When to Use

- Long system prompts (>1K tokens)
- Repeated instructions
- RAG with large context
### Code Example

```python
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "Your long system prompt here...",
            "cache_control": {"type": "ephemeral"},  # Enable caching!
        }
    ],
    messages=[{"role": "user", "content": "User question"}],
)
```

- First call: +25% (cache write)
- Subsequent calls: -90% (cache read!)
### Cache Rules

- Minimum cacheable prefix: 1,024 tokens (Sonnet)
- TTL: 5 minutes (refreshes on each use)
## Extended Thinking (~80% Off)
### When to Use

- Complex code architecture
- Strategic planning
- Mathematical reasoning
### Code Example

```python
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # must be less than max_tokens
    },
    messages=[{"role": "user", "content": "Design architecture for..."}],
)
```
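With thinking enabled, the response's `content` list interleaves thinking blocks with the final text blocks. A small helper to separate them (shown operating on dict-shaped blocks for illustration; SDK response objects expose the same fields as attributes):

```python
def split_response(content):
    """Separate extended-thinking blocks from final text blocks."""
    thinking = [b["thinking"] for b in content if b["type"] == "thinking"]
    text = [b["text"] for b in content if b["type"] == "text"]
    return thinking, text


# Sample content in the shape the API returns (values are illustrative).
blocks = [
    {"type": "thinking", "thinking": "Weighing trade-offs...", "signature": "..."},
    {"type": "text", "text": "Recommended architecture: ..."},
]
reasoning, answer = split_response(blocks)
```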
## Decision Flowchart

```
Can it wait 24h? ──Yes──► Batch API (50% off)
        │ No
Repeated prompts >1K tokens? ──Yes──► Prompt Caching (90% off)
        │ No
Complex reasoning? ──Yes──► Extended Thinking
        │ No
Use the normal API
```
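To compare branches numerically, here is a rough cost sketch. The $/MTok prices are placeholders to be replaced with current Anthropic pricing, and the multipliers (50% batch discount, ~10% cache-read rate) follow the figures above:

```python
def estimate_cost(input_tokens, output_tokens,
                  in_price=3.0, out_price=15.0,  # assumed $/MTok; verify current pricing
                  batch=False, cached_input_tokens=0):
    """Rough cost estimate: batch halves the total; cached input reads cost ~10%."""
    uncached = input_tokens - cached_input_tokens
    cost = (uncached * in_price
            + cached_input_tokens * in_price * 0.1
            + output_tokens * out_price) / 1_000_000
    if batch:
        cost *= 0.5  # Batch API: 50% off
    return cost


base = estimate_cost(10_000, 1_000)
combo = estimate_cost(10_000, 1_000, batch=True, cached_input_tokens=9_000)
```

Note that because output tokens are priced higher than input, the realized savings on a given job depend heavily on its input/output ratio.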
## Official Docs

- Batch Processing
- Prompt Caching
- Extended Thinking
Made with 🐾 by Washin Village - Verified against official Anthropic documentation