
Claude API Cost Optimization

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "claude-api-cost-optimization" by sending this command to your AI assistant:

npx skills add sstklen/claude-api-cost-optimization/sstklen-claude-api-cost-optimization-claude-api-cost-optimization


Save 50-90% on Claude API costs with three officially verified techniques

Quick Reference

Technique         | Savings | Use When
------------------|---------|--------------------------------------
Batch API         | 50%     | Tasks can wait up to 24h
Prompt Caching    | 90%     | Repeated system prompts (>1K tokens)
Extended Thinking | ~80%    | Complex reasoning tasks
Batch + Cache     | ~95%    | Bulk tasks with shared context
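The last row stacks the two discounts: each batch request's params can carry a cache_control marker on a shared system prompt, so the 50% batch discount combines with the 90% cache-read discount. A minimal sketch of the request construction only (the shared prompt text, helper name, and custom IDs are placeholders, not part of the Anthropic SDK):

```python
# Hypothetical shared context reused by every request in the batch.
SHARED_SYSTEM = "Your long shared system prompt here..."

def build_batch_requests(tasks):
    """Build Batch API request dicts that all share one cached system prompt."""
    return [
        {
            "custom_id": f"task-{i:03d}",
            "params": {
                "model": "claude-sonnet-4-5",
                "max_tokens": 1024,
                "system": [{
                    "type": "text",
                    "text": SHARED_SYSTEM,
                    "cache_control": {"type": "ephemeral"},  # cache the shared prefix
                }],
                "messages": [{"role": "user", "content": task}],
            },
        }
        for i, task in enumerate(tasks)
    ]

requests = build_batch_requests(["Translate doc A", "Translate doc B"])
print(requests[0]["custom_id"])  # task-000
```

The resulting list is what you would pass as `requests=` to `client.messages.batches.create(...)`.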

  1. Batch API (50% Off)

When to Use

  • Bulk translations

  • Daily content generation

  • Overnight report processing

  • NOT for real-time chat

Code Example

import anthropic

client = anthropic.Anthropic()

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "task-001",
            "params": {
                "model": "claude-sonnet-4-5",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Task 1"}],
            },
        }
    ]
)

# Results available within 24h (usually <1h)
for result in client.messages.batches.results(batch.id):
    print(f"{result.custom_id}: {result.result.message.content[0].text}")

Key Finding: Bigger Batches = Faster!

Batch Size  | Time/Request
------------|-------------
Large (294) | 0.45 min
Small (10)  | 9.84 min

22x efficiency difference! Always batch 100+ requests together.
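Grouping work before submission is plain Python; a sketch of splitting a task list into chunks of at least 100 requests per batch (the helper name and default size are illustrative, not part of the Anthropic SDK):

```python
def chunk_tasks(tasks, batch_size=100):
    """Group tasks into batches of `batch_size` so each Batch API call is large."""
    return [tasks[i:i + batch_size] for i in range(0, len(tasks), batch_size)]

# 294 tasks -> three batch calls (100, 100, 94), not 294 single-request calls.
batches = chunk_tasks(list(range(294)))
print([len(b) for b in batches])  # [100, 100, 94]
```

Each chunk would then become one `client.messages.batches.create(...)` call.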

  2. Prompt Caching (90% Off)

When to Use

  • Long system prompts (>1K tokens)

  • Repeated instructions

  • RAG with large context

Code Example

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": "Your long system prompt here...",
        "cache_control": {"type": "ephemeral"},  # Enable caching!
    }],
    messages=[{"role": "user", "content": "User question"}],
)

First call: +25% (cache write)

Subsequent: -90% (cache read!)

Cache Rules

  • Minimum: 1,024 tokens (Sonnet)

  • TTL: 5 minutes (refreshes on use)
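The +25% write and -90% read figures above make the break-even easy to check with simple arithmetic. A sketch (the function name is illustrative; multipliers are taken directly from the figures above):

```python
def cached_vs_uncached(prompt_tokens, calls, write_mult=1.25, read_mult=0.10):
    """Compare input-token cost units for a repeated prompt, cached vs uncached."""
    uncached = prompt_tokens * calls
    # One cache write on the first call, cache reads on the rest.
    cached = prompt_tokens * (write_mult + read_mult * (calls - 1))
    return uncached, cached

uncached, cached = cached_vs_uncached(prompt_tokens=2000, calls=10)
print(cached / uncached)  # 0.215 -> ~78% savings over 10 calls
```

By this arithmetic caching already pays off on the second call (1.25 + 0.10 = 1.35 < 2.0), provided the calls land within the 5-minute TTL.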

  3. Extended Thinking (~80% Off)

When to Use

  • Complex code architecture

  • Strategic planning

  • Mathematical reasoning

Code Example

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,
    },
    messages=[{"role": "user", "content": "Design architecture for..."}],
)

Decision Flowchart

Can wait 24h?
  → Yes → Batch API (50% off)
  ↓ No
Repeated prompts >1K?
  → Yes → Prompt Caching (90% off)
  ↓ No
Complex reasoning?
  → Yes → Extended Thinking
  ↓ No
Use normal API
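The flowchart can be written as a tiny selection function (the name and argument names are illustrative):

```python
def pick_technique(can_wait_24h, repeated_prompt_over_1k, complex_reasoning):
    """Mirror the decision flowchart: first matching condition wins."""
    if can_wait_24h:
        return "Batch API (50% off)"
    if repeated_prompt_over_1k:
        return "Prompt Caching (90% off)"
    if complex_reasoning:
        return "Extended Thinking"
    return "Normal API"

print(pick_technique(False, True, False))  # Prompt Caching (90% off)
```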

Official Docs

  • Batch Processing

  • Prompt Caching

  • Extended Thinking

Made with 🐾 by Washin Village - Verified against official Anthropic documentation

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
