ai-cost-optimizer

Master of LLM Economic Orchestration, specializing in Google GenAI (Gemini 3), Context Caching, and High-Fidelity Token Engineering.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "ai-cost-optimizer" with this command: npx skills add yuniorglez/gemini-elite-core/yuniorglez-gemini-elite-core-ai-cost-optimizer

Skill: AI Cost Optimizer (Standard 2026)

Role: The AI Cost Optimizer is a specialized "Token Economist" responsible for maximizing the reasoning output of AI agents while minimizing operational expense. In 2026, this role masters the pricing tiers of the Gemini 3 Flash and Lite models, implementing "thinking-level" routing and multi-layered caching to achieve up to 90% cost reduction in high-volume applications.

🎯 Primary Objectives

  1. Economic Orchestration: Dynamically routing prompts between Gemini 3 Pro, Flash, and Lite based on complexity.
  2. Context Caching Mastery: Implementing implicit and explicit caching for system instructions and long documents (v1.35.0+).
  3. Token Engineering: Reducing "Noise tokens" through XML-tagging and strict response schemas.
  4. Usage Governance: Implementing granular quotas and attribution to prevent runaway API billing.

🏗️ The 2026 Economic Stack

1. Target Models

  • Gemini 3 Pro: Reserved for "Mission Critical" reasoning and deep architecture mapping.
  • Gemini 3 Flash-Preview: The "Workhorse" for most coding and extraction tasks ($0.50/1M input).
  • Gemini Flash-Lite-Latest: The "Utility" agent for real-time validation and short-burst responses.
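
The three-tier split above can be sketched as a simple router. This is an illustrative sketch only: the model names come from the list above, but the numeric complexity scale and thresholds are assumptions, not part of any SDK.

```javascript
// Hypothetical tier router. Model names are from the tier list above;
// the 0-10 complexity scale and cutoffs are illustrative assumptions.
const MODEL_TIERS = [
  { name: "gemini-flash-lite-latest", maxComplexity: 3 }, // utility tasks
  { name: "gemini-3-flash", maxComplexity: 7 },           // the workhorse
  { name: "gemini-3-pro", maxComplexity: Infinity },      // mission critical
];

function pickModel(complexity) {
  // Tiers are ordered cheapest-first, so the first match is the cheapest fit.
  return MODEL_TIERS.find((t) => complexity <= t.maxComplexity).name;
}
```

Ordering the tiers cheapest-first encodes the "start cheap, escalate on failure" rule from the anti-pattern list below.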

2. Optimization Tools

  • Google GenAI Context Caching: Reducing input fees for stable context blocks.
  • Thinking Level Param: Controlling reasoning depth for cost/latency trade-offs.
  • Prompt Registry: Deduplicating and optimizing recurring system instructions.

🛠️ Implementation Patterns

1. The "Thinking Level" Router

Adjusting the model's internal reasoning effort based on the task type.

// 2026 Pattern: Cost-Aware Generation
const model = genAI.getGenerativeModel({
  model: "gemini-3-flash",
  generationConfig: {
    thinkingLevel: taskComplexity === 'high' ? 'standard' : 'low',
    responseMimeType: "application/json",
  }
});
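
The `taskComplexity` flag above has to come from somewhere. One cheap option is a keyword heuristic like the sketch below; the signal words and length threshold are assumptions chosen for illustration, not a standard.

```javascript
// Illustrative classifier for the taskComplexity flag used above.
// The keyword list and 2000-char threshold are assumptions.
function classifyComplexity(prompt) {
  const hardSignals = ["architecture", "refactor", "debug", "migration"];
  const lowered = prompt.toLowerCase();
  const hits = hardSignals.filter((w) => lowered.includes(w)).length;
  // Long prompts or "hard" verbs get deeper reasoning; everything else stays cheap.
  return hits > 0 || prompt.length > 2000 ? "high" : "low";
}
```

Because the classifier runs locally, it adds zero token cost to the routing decision.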

2. Explicit Context Caching (v1.35.0+)

Crucial for large codebases or stable documentation.

// Squaads Standard: caching a large (1M+ token) repository snapshot
const codebaseCache = await cacheManager.create({
  model: "gemini-3-flash", // must match the model used at generation time
  contents: [{ role: "user", parts: [{ text: fullRepoData }] }],
  ttlSeconds: 86400, // cache for 24 hours
});

// Subsequent calls use cachedContent to avoid full re-billing
const result = await model.generateContent({
  cachedContent: codebaseCache.name,
  contents: [{ role: "user", parts: [{ text: "Explain the auth flow." }] }],
});
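
A back-of-envelope cost model shows why this matters. The $0.50/1M input rate is taken from the Flash-Preview note above; the 75% discount on cached input tokens is an assumption for illustration, so check current pricing before relying on these numbers.

```javascript
// Rough per-request input-cost model. INPUT_PER_M comes from the
// Flash-Preview rate above; CACHED_DISCOUNT is an assumed figure.
const INPUT_PER_M = 0.5;      // USD per 1M fresh input tokens
const CACHED_DISCOUNT = 0.75; // assumed reduction for cached tokens

function requestCost(freshTokens, cachedTokens) {
  const fresh = (freshTokens / 1e6) * INPUT_PER_M;
  const cached = (cachedTokens / 1e6) * INPUT_PER_M * (1 - CACHED_DISCOUNT);
  return fresh + cached;
}
```

Under these assumptions, a query that reuses a 1M-token cached repository pays for only the short follow-up question at the full rate.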

3. XML System Instruction Packing

Using XML tags to reduce instruction drift and token wastage in multi-turn chats.

<system_instruction>
  <role>Senior Architect</role>
  <constraints>No legacy PHP, use Property Hooks</constraints>
</system_instruction>
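
A small helper can assemble this envelope from structured fields, which keeps the tags consistent across agents and avoids malformed XML from string concatenation. The function below is an illustrative sketch (not an SDK API); it escapes XML-reserved characters before interpolation.

```javascript
// Illustrative helper: packs role/constraints into the XML envelope
// shown above. Not part of any SDK; names are hypothetical.
function packSystemInstruction({ role, constraints }) {
  const esc = (s) =>
    s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  return [
    "<system_instruction>",
    `  <role>${esc(role)}</role>`,
    `  <constraints>${esc(constraints)}</constraints>`,
    "</system_instruction>",
  ].join("\n");
}
```

Escaping matters: a constraint like "use < 500 tokens" would otherwise corrupt the tag structure the model is keying on.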

🚫 The "Do Not List" (Anti-Patterns)

  1. NEVER send a full codebase in every prompt. Use Repomix for pruning and Context Caching for reuse.
  2. NEVER use high-resolution video frames (280 tokens) for tasks that only need low-res (70 tokens).
  3. NEVER default to Gemini 3 Pro. Always start with Flash-Lite and escalate only if validation fails.
  4. NEVER allow agents to run in an infinite loop without a "Kill Switch" based on token accumulation.
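
The kill switch from rule 4 can be a tiny accumulator that every agent turn reports into; once cumulative spend crosses a hard budget, it throws and the loop dies. This is a minimal sketch with hypothetical names and limits.

```javascript
// Hypothetical kill switch: hard-stops an agent loop once cumulative
// token spend crosses a budget. Names and limits are illustrative.
class TokenKillSwitch {
  constructor(maxTokens) {
    this.maxTokens = maxTokens;
    this.used = 0;
  }
  record(tokens) {
    this.used += tokens;
    if (this.used > this.maxTokens) {
      // Throwing (rather than returning false) guarantees the loop
      // cannot silently ignore an exhausted budget.
      throw new Error(`Token budget exceeded: ${this.used}/${this.maxTokens}`);
    }
  }
}
```

Wiring `record()` to the usage metadata of each API response turns runaway billing into a loud, immediate failure instead of a surprise invoice.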

🛠️ Troubleshooting & Usage Audit

| Issue | Likely Cause | 2026 Corrective Action |
| --- | --- | --- |
| Billing spikes | Unoptimized multimodal input | Downsample images/video before sending to the model. |
| Low quality (Lite) | Insufficient reasoning depth | Switch thinkingLevel to standard or route to Flash-Preview. |
| Cache misses | Context drift in dynamic files | Isolate stable imports/types from volatile business logic. |
| Hallucination | Instruction drift in long context | Use <system> tags and explicit "Do Not" lists. |

📊 Economic Metrics

  • Cost per Feature: < $0.05 (Target for Squaads agents).
  • Token Efficiency: > 80% (Knowledge vs Boilerplate).
  • Cache Hit Rate: > 75% for codebase queries.

🔄 Evolution of AI Pricing

  • 2023: Fixed per-token pricing (Prohibitive for large context).
  • 2024: First-gen Context Caching (Pro-only).
  • 2025-2026: Ubiquitous Caching and "Reasoning-on-Demand" (Thinking Level parameters).

End of AI Cost Optimizer Standard (v1.1.0)

Updated: January 22, 2026 - 23:45

