agent-orchestration

Multi-agent orchestration patterns for production deployments. Covers sub-agent QC workflow, model staggering across 5+ models, cross-validation patterns, fallback chains, task routing by model strength, ACPX configuration, and cost optimization. Use when coordinating multiple agents or models for complex workflows. Do NOT use for single-agent prompting, prompt engineering, or fine-tuning — those are separate skills.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "agent-orchestration" with this command: npx skills add samledger67-dotcom/precisionledger-agent-orchestration

Agent Orchestration

Production-tested patterns for coordinating multiple AI agents and models. This skill covers the full spectrum from simple fallback chains to complex multi-model workflows with cross-validation and quality control loops.

When to Use

  • Coordinating 2+ agents or models on a single workflow
  • Building QC loops where one model checks another's work
  • Routing tasks to the right model based on task type
  • Setting up fallback chains for reliability
  • Optimizing cost across subscription and API models
  • Configuring ACPX (Agent Computer Protocol eXtended) for Claude Code and Codex
  • Designing spawn patterns for runtime sub-agents

When NOT to Use

  • Single-agent prompting or prompt engineering (use a prompt-engineering skill)
  • Fine-tuning or training models (different domain entirely)
  • Simple API calls to one model (just call the API)
  • RAG or retrieval pipeline design (use a RAG-specific skill)
  • Agent memory architecture (use the agent-memory-architecture skill)

1. Sub-Agent QC Workflow

The core pattern: Produce → Review → Cross-Check → Incorporate → Deliver.

The Five-Step Loop

┌─────────────┐
│  1. PRODUCE  │  Sonnet 4.6 generates first draft
│  (Grinder)   │  Fast, cost-effective, good enough for 80% of tasks
└──────┬──────┘
       ▼
┌─────────────┐
│  2. REVIEW   │  Same model self-reviews against criteria
│  (Self-QC)   │  Catches obvious errors, formatting issues
└──────┬──────┘
       ▼
┌─────────────┐
│  3. CROSS    │  Different model (GPT-4o / Grok) validates
│  CHECK       │  Catches blind spots, model-specific biases
└──────┬──────┘
       ▼
┌─────────────┐
│  4. INCORP.  │  Opus 4.6 synthesizes feedback
│  (Orchestr.) │  Resolves conflicts, applies judgment
└──────┬──────┘
       ▼
┌─────────────┐
│  5. DELIVER  │  Final output with confidence score
│  (Output)    │  Includes provenance trail
└─────────────┘

Implementation Example

async def qc_workflow(task: str, context: dict) -> dict:
    """Five-step QC workflow with cross-model validation."""

    # Step 1: Produce (Sonnet — fast, cheap)
    draft = await call_model(
        model="claude-sonnet-4-6",
        prompt=f"Complete this task:\n{task}",
        context=context,
        max_tokens=4096
    )

    # Step 2: Self-review (same model, different prompt)
    self_review = await call_model(
        model="claude-sonnet-4-6",
        prompt=f"""Review this output for errors, omissions, and quality:

TASK: {task}
OUTPUT: {draft}

Score 1-10 on: accuracy, completeness, clarity.
List specific issues to fix.""",
        max_tokens=1024
    )

    # Step 3: Cross-check (different model family)
    cross_check = await call_model(
        model="gpt-4o",
        prompt=f"""Independent review. Do NOT assume the draft is correct.

TASK: {task}
DRAFT: {draft}
SELF-REVIEW: {self_review}

Identify: factual errors, logical gaps, missing context, biases.""",
        max_tokens=1024
    )

    # Step 4: Incorporate (Opus — best judgment)
    final = await call_model(
        model="claude-opus-4-6",
        prompt=f"""Synthesize and produce final output.

TASK: {task}
DRAFT: {draft}
SELF-REVIEW: {self_review}
CROSS-CHECK: {cross_check}

Resolve any conflicts. Produce the best possible final output.
Include a confidence score (0-100) and list any unresolved concerns.""",
        max_tokens=4096
    )

    # Step 5: Deliver with metadata
    return {
        "output": final,
        "provenance": {
            "producer": "claude-sonnet-4-6",
            "reviewer": "claude-sonnet-4-6",
            "cross_checker": "gpt-4o",
            "synthesizer": "claude-opus-4-6",
            "steps_completed": 5
        }
    }
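Step 5 calls for a confidence score, and the Step 4 prompt asks the synthesizer to include one, but the workflow above returns the reply without extracting it. A minimal sketch of one way to pull the score out, assuming the reply is available as a string and contains a line such as `Confidence: 85`. The helper name `extract_confidence` and that output format are illustrative assumptions, not part of the skill:

```python
import re
from typing import Optional

# Hypothetical format: "Confidence: 85" or "confidence score = 85/100".
_CONFIDENCE_RE = re.compile(
    r"confidence(?:\s+score)?\s*[:=]\s*(\d{1,3})", re.IGNORECASE
)


def extract_confidence(text: str) -> Optional[int]:
    """Return the first 0-100 confidence score found in text, or None."""
    match = _CONFIDENCE_RE.search(text)
    if match is None:
        return None
    score = int(match.group(1))
    # Reject values outside the range the prompt asked for.
    return score if 0 <= score <= 100 else None
```

The returned value could be stored next to `"provenance"` in the result dict. Asking the model for structured output instead would avoid the parsing step altogether.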

When to Skip Steps

Scenario                          Skip         Rationale
────────────────────────────────────────────────────────────────────────────────
Low-stakes internal task          Steps 3-4    Self-review is sufficient
Time-critical (<30s budget)       Steps 2-4    Single model, accept risk
High-stakes client deliverable    None         Full loop, every time
Coding task with tests            Step 3       Tests serve as cross-check
Creative/subjective work          Step 3       Cross-check adds noise, not signal
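The skip rules can be encoded as data so the workflow decides up front which steps to run. A minimal sketch: the scenario keys and the `steps_for` helper are hypothetical names chosen for illustration, and unknown scenarios are assumed to get the full loop:

```python
FULL_LOOP = ["produce", "review", "cross_check", "incorporate", "deliver"]

# Steps skipped per scenario, mirroring the table above.
# The scenario keys are hypothetical labels.
SKIPPED_STEPS = {
    "low_stakes_internal": {"cross_check", "incorporate"},
    "time_critical": {"review", "cross_check", "incorporate"},
    "high_stakes_client": set(),
    "coding_with_tests": {"cross_check"},
    "creative": {"cross_check"},
}


def steps_for(scenario: str) -> list[str]:
    """Return the QC steps to run, in order; unknown scenarios get the full loop."""
    skipped = SKIPPED_STEPS.get(scenario, set())
    return [step for step in FULL_LOOP if step not in skipped]
```

Defaulting to the full loop is the conservative choice: a mislabeled task costs extra calls, not a skipped review.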

2. Model Staggering

Assign models to tasks based on their demonstrated strengths.

The Model Roster

Model              Strength Zone              Cost Tier    Speed
────────────────────────────────────────────────────────────────
Opus 4.6           Strategy, synthesis,       $$$$$        Slow
                   complex reasoning,
                   judgment calls

Sonnet 4.6         Production work, coding,   $$$          Fast
                   analysis, writing,
                   general-purpose grinder

GPT-4o             Coding, scoring rubrics,   $$$$         Medium
                   structured output,
                   alternative perspective

Grok               X/Twitter analysis,        $$           Fast
                   social media content,
                   real-time commentary

Gemini 2.5 Pro     Deep research, long        $$$          Medium
                   context analysis,
                   multimodal processing

Haiku 4.5          Classification, routing,   $            Very Fast
                   simple extraction,
                   high-volume tasks

Task Routing Rules

routing_rules:
  # Strategic / High-judgment tasks → Opus
  strategy:
    models: [claude-opus-4-6]
    triggers:
      - "requires judgment between competing priorities"
      - "synthesize conflicting information"
      - "make a recommendation with tradeoffs"
      - "review and improve another agent's work"

  # Production work → Sonnet
  production:
    models: [claude-sonnet-4-6]
    triggers:
      - "write code to specification"
      - "generate content from template"
      - "analyze data and report findings"
      - "standard business communication"

  # Coding with scoring → GPT
  coding_and_scoring:
    models: [gpt-4o]
    triggers:
      - "write and debug complex algorithms"
      - "score outputs against rubric"
      - "generate structured JSON/YAML"
      - "cross-validate another model's output"

  # Social / real-time → Grok
  social:
    models: [grok-3]
    triggers:
      - "analyze X/Twitter trends"
      - "generate social media content"
      - "real-time event commentary"
      - "meme-aware communication"

  # Deep research → Gemini
  research:
    models: [gemini-2.5-pro]
    triggers:
      - "analyze documents >100K tokens"
      - "cross-reference multiple long sources"
      - "multimodal analysis (images + text)"
      - "broad research synthesis"

  # High-volume classification → Haiku
  classification:
    models: [claude-haiku-4-5]
    triggers:
      - "classify items into categories"
      - "extract structured fields from text"
      - "route incoming requests"
      - "simple yes/no decisions"

Staggering in Practice

Example: "Write a market analysis report"

1. Gemini 2.5 Pro  → Research phase (long context, web search)
2. Sonnet 4.6      → Draft the report (fast production)
3. GPT-4o          → Score against quality rubric (structured eval)
4. Opus 4.6        → Final synthesis and executive summary (judgment)
5. Haiku 4.5       → Extract key metrics into structured JSON (cheap, fast)
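The staggered sequence can be sketched as a small sequential pipeline. This is an illustration under stated assumptions: `run_staggered`, the stage instructions, and the `fake_call_model` stand-in are hypothetical, and each stage is assumed to see the original task plus all earlier outputs:

```python
import asyncio
from typing import Awaitable, Callable

# (model, instruction) pairs for the market-analysis example above.
STAGES = [
    ("gemini-2.5-pro", "Research the topic"),
    ("claude-sonnet-4-6", "Draft the report from the research"),
    ("gpt-4o", "Score the draft against the quality rubric"),
    ("claude-opus-4-6", "Write the final synthesis and executive summary"),
    ("claude-haiku-4-5", "Extract key metrics as JSON"),
]

ModelCall = Callable[[str, str], Awaitable[str]]


async def run_staggered(task: str, call_model: ModelCall) -> list[dict]:
    """Run the stages in order, passing the task and all prior outputs forward."""
    history = [f"TASK: {task}"]
    trail = []
    for model, instruction in STAGES:
        prompt = instruction + "\n\n" + "\n\n".join(history)
        output = await call_model(model, prompt)
        history.append(f"{model}: {output}")
        trail.append({"model": model, "output": output})
    return trail


async def fake_call_model(model: str, prompt: str) -> str:
    """Stand-in for a real client so the sketch runs offline."""
    return f"[{model}] handled {len(prompt)} chars"
```

Calling `asyncio.run(run_staggered("Write a market analysis report", fake_call_model))` returns one entry per stage, which doubles as a provenance trail.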

3. Fallback Chains

When a model is unavailable, rate-limited, or returns low-quality output, fall through to the next option.

Chain Configuration

fallback_chains:
  # Primary reasoning chain
  reasoning:
    - model: claude-opus-4-6
      timeout: 60s
      retry: 1
    - model: gpt-4o
      timeout: 45s
      retry: 1
    - model: claude-sonnet-4-6
      timeout: 30s
      retry: 2
    - model: gemini-2.5-pro
      timeout: 45s
      retry: 1

  # Fast production chain
  production:
    - model: claude-sonnet-4-6
      timeout: 30s
      retry: 2
    - model: gpt-4o
      timeout: 30s
      retry: 1
    - model: grok-3
      timeout: 20s
      retry: 1

  # Classification chain (optimize for cost)
  classification:
    - model: claude-haiku-4-5
      timeout: 10s
      retry: 3
    - model: claude-sonnet-4-6
      timeout: 15s
      retry: 1

Fallback Decision Logic

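import logging

log = logging.getLogger("orchestration").warning

class RateLimitError(Exception):
    """Placeholder; substitute your client library's rate-limit error."""

class AllModelsFailed(Exception):
    """Raised when no model in a chain returns acceptable output."""

async def call_with_fallback(chain: str, prompt: str) -> dict:
    """Try models in order until one succeeds with acceptable quality.

    Assumes CHAINS holds the parsed fallback_chains config and call_model
    returns a dict that may include a "confidence" key.
    """
    for depth, entry in enumerate(CHAINS[chain]):
        # Config timeouts are strings such as "60s"; convert to seconds
        # (assuming call_model expects a number of seconds).
        timeout = float(str(entry["timeout"]).rstrip("s"))

        for attempt in range(entry["retry"] + 1):
            try:
                result = await call_model(
                    model=entry["model"],
                    prompt=prompt,
                    timeout=timeout,
                )
            except (TimeoutError, RateLimitError) as e:
                log(f"{entry['model']} attempt {attempt + 1} failed: {e}")
                continue  # Retry this model until its retry budget is spent

            # Quality gate: reject low-confidence outputs
            if result.get("confidence", 100) < 30:
                log(f"{entry['model']} returned low confidence, trying next")
                break  # Move to next model, don't retry

            return {
                "output": result,
                "model_used": entry["model"],
                "attempt": attempt + 1,
                "fallback_depth": depth,
            }

    raise AllModelsFailed(f"No model in chain '{chain}' produced acceptable output")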

4. ACPX Configuration

ACPX (Agent Computer Protocol eXtended) enables tool-using agents to coordinate. This section covers configuration for Claude Code and Codex environments.

Claude Code Configuration

In your project's CLAUDE.md:

# Agent Orchestration

## Sub-agent Spawning
When a task requires cross-model validation:
1. Use the Agent tool to spawn a sub-agent for the secondary task
2. The sub-agent inherits the project context but gets its own conversation
3. Results flow back to the orchestrator via the Agent tool response

## Model Selection
- Use claude-opus-4-6 for: architectural decisions, code review, complex debugging
- Use claude-sonnet-4-6 for: implementation, test writing, documentation
- Use claude-haiku-4-5 for: linting, formatting, simple refactors

## Tool Permissions
Sub-agents may: read files, search code, run tests
Sub-agents may NOT: push to git, modify CI/CD, delete files without confirmation

MCP Server Setup

{
  "mcpServers": {
    "orchestrator": {
      "command": "node",
      "args": ["./orchestrator-server.js"],
      "env": {
        "ANTHROPIC_API_KEY": "${ANTHROPIC_API_KEY}",
        "OPENAI_API_KEY": "${OPENAI_API_KEY}",
        "MAX_CONCURRENT_AGENTS": "5",
        "DEFAULT_CHAIN": "production"
      }
    }
  }
}

Codex Integration

# codex.yaml
agents:
  orchestrator:
    model: claude-opus-4-6
    role: "Route tasks and synthesize results"
    tools: [spawn_agent, review_output, merge_results]

  grinder:
    model: claude-sonnet-4-6
    role: "Execute implementation tasks"
    tools: [read_file, write_file, run_tests, search_code]

  validator:
    model: gpt-4o
    role: "Cross-validate outputs"
    tools: [read_file, run_tests, score_output]

5. Cost Optimization

Subscription vs API Economics

Subscription Models ($20-200/month flat):
  Claude Pro/Max    → Best for: daily interactive use, long sessions
  ChatGPT Plus      → Best for: GPT-4o access, plugins
  Grok Premium      → Best for: X integration, real-time
  Gemini Advanced   → Best for: Google ecosystem, long context

API Models (per-token):
  claude-opus-4-6   → $15/M input, $75/M output
  claude-sonnet-4-6 → $3/M input, $15/M output
  claude-haiku-4-5  → $0.80/M input, $4/M output
  gpt-4o            → $2.50/M input, $10/M output

$0 Marginal Cost Routing

When you have active subscriptions, route interactive and exploratory work through subscriptions (zero marginal cost) and reserve API for automated/batch workflows.

Decision Tree:
  Is this interactive/exploratory?
    YES → Route through subscription (Claude Code, ChatGPT, etc.)
    NO  → Is this batch/automated?
      YES → Use API with cheapest adequate model
      NO  → Is this high-volume (>1000 calls/day)?
        YES → Use Haiku via API ($0.80/M input)
        NO  → Use Sonnet via API ($3/M input)
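The decision tree maps directly onto a short function. A minimal sketch; the function name, parameter names, and returned labels are hypothetical:

```python
def choose_route(interactive: bool, automated: bool, calls_per_day: int) -> str:
    """Walk the decision tree above and return a routing label."""
    if interactive:
        return "subscription"
    if automated:
        return "api:cheapest-adequate"
    if calls_per_day > 1000:
        return "api:claude-haiku-4-5"
    return "api:claude-sonnet-4-6"
```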

Cost Tracking Template

Monthly AI Spend:
  Subscriptions (fixed):
    Claude Max            $200.00
    ChatGPT Plus           $20.00
    Grok Premium           $30.00
    Gemini Advanced        $20.00
  Subtotal Fixed          $270.00

  API Usage (variable):
    Opus 4.6         42K tokens    $3.78
    Sonnet 4.6      380K tokens    $6.84
    Haiku 4.5     1.2M tokens      $1.76
    GPT-4o          95K tokens     $1.19
  Subtotal Variable                $13.57

  Total                           $283.57
  Cost per task (avg)               $0.28
  Tasks completed                  1,013
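The totals in the template can be recomputed from the line items, which is a useful check when the numbers are maintained by hand. A minimal sketch using `Decimal` to avoid floating-point rounding on currency; the `monthly_summary` helper is a hypothetical name:

```python
from decimal import Decimal

# Line items from the template above, in USD.
FIXED = {
    "Claude Max": "200.00",
    "ChatGPT Plus": "20.00",
    "Grok Premium": "30.00",
    "Gemini Advanced": "20.00",
}
VARIABLE = {
    "Opus 4.6": "3.78",
    "Sonnet 4.6": "6.84",
    "Haiku 4.5": "1.76",
    "GPT-4o": "1.19",
}


def subtotal(items: dict) -> Decimal:
    """Sum a dict of string amounts exactly."""
    return sum((Decimal(v) for v in items.values()), Decimal("0"))


def monthly_summary(fixed: dict, variable: dict, tasks: int) -> dict:
    """Return the subtotals, total, and average cost per task."""
    total = subtotal(fixed) + subtotal(variable)
    return {
        "fixed": subtotal(fixed),
        "variable": subtotal(variable),
        "total": total,
        "per_task": (total / tasks).quantize(Decimal("0.01")),
    }
```

With the template's figures and 1,013 tasks, this reproduces $270.00 fixed, $13.57 variable, $283.57 total, and $0.28 per task. Fixed subscriptions account for most of the per-task cost at this volume.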

6. Spawn Patterns

Pattern 1: Runtime Sub-Agent (Within Claude Code)

Use the Agent tool to spawn sub-agents that inherit project context.

Orchestrator (Opus)
  ├── Agent: "Research the API surface" (Explore subagent)
  ├── Agent: "Implement the endpoint" (general-purpose subagent)
  └── Agent: "Write tests" (general-purpose subagent)

Best for: tasks where sub-agents need file system access and project context.

Pattern 2: API-Spawned Agent (External)

Call model APIs directly for tasks that don't need project context.

import asyncio

async def parallel_validate(content: str) -> list:
    """Ask three model families to review the same content concurrently."""
    prompt = f"Review for accuracy:\n{content}"
    tasks = [
        call_model("claude-sonnet-4-6", prompt),
        call_model("gpt-4o", prompt),
        call_model("gemini-2.5-pro", prompt),
    ]
    # return_exceptions=True keeps one failed validator from discarding the
    # others; failed entries come back as exception objects in the list.
    return await asyncio.gather(*tasks, return_exceptions=True)

Best for: cross-validation, scoring, classification — tasks that are self-contained.

Pattern 3: Orchestrator-Grinder Split

The orchestrator plans and delegates. Grinders execute. Never let a grinder make strategic decisions.

ORCHESTRATOR (Opus 4.6):
  - Reads the task requirements
  - Breaks into subtasks
  - Assigns each subtask to appropriate grinder
  - Reviews grinder outputs
  - Synthesizes final deliverable
  - Makes judgment calls on conflicts

GRINDER (Sonnet 4.6 / GPT-4o):
  - Receives specific, scoped subtask
  - Executes without strategic decisions
  - Returns output with confidence score
  - Flags uncertainty rather than guessing

Anti-Patterns to Avoid

Anti-Pattern                         Problem                                Fix
──────────────────────────────────────────────────────────────────────────────────────────────────────────
Grinder makes strategic calls        Inconsistent decisions, wasted work    Escalate to orchestrator
Orchestrator does grinder work       Slow, expensive, bottleneck            Delegate production tasks
No quality gate between steps        Errors compound through pipeline       Add review step after each stage
Same model reviews its own work      Blind spots persist                    Cross-model validation
Spawning agents for trivial tasks    Overhead exceeds task cost             Direct call for simple tasks
Infinite retry loops                 Cost explosion                         Max 3 retries, then escalate
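The fix for infinite retry loops, a hard cap followed by escalation, can be sketched as a small wrapper. The names `run_with_retry_cap` and `EscalateToOrchestrator` are hypothetical, and a real client would catch narrower exception types than `Exception`:

```python
import asyncio
from typing import Awaitable, Callable

MAX_RETRIES = 3  # From the table: max 3 retries, then escalate.


class EscalateToOrchestrator(Exception):
    """Hypothetical signal that a grinder gave up and needs a judgment call."""


async def run_with_retry_cap(step: Callable[[], Awaitable[str]]) -> str:
    """Run step, retrying up to MAX_RETRIES times before escalating."""
    last_error = None
    for _ in range(MAX_RETRIES + 1):  # first attempt plus MAX_RETRIES retries
        try:
            return await step()
        except Exception as exc:  # narrow this in real code
            last_error = exc
    raise EscalateToOrchestrator(
        f"gave up after {MAX_RETRIES} retries"
    ) from last_error
```

Raising a dedicated exception keeps the decision about what to do next (switch model, rescope the subtask, abort) with the orchestrator, not the grinder.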

7. Orchestrator vs Grinder Principle

This principle underlies every pattern in this skill.

The Rule

The orchestrator thinks. The grinder does. Never confuse the two.

Role Definitions

ORCHESTRATOR                          GRINDER
─────────────────────────────────     ─────────────────────────────────
Decides WHAT to do                    Decides HOW to do it
Chooses which model/tool              Uses the tools it's given
Reviews and judges quality            Produces and reports confidence
Resolves conflicts between agents     Flags conflicts for resolution
Owns the final output                 Owns its subtask output
Expensive, slow, high-judgment        Cheap, fast, high-throughput
1 per workflow                        N per workflow

Decision Framework

"Should this be an orchestrator or grinder decision?"

Ask: "If two reasonable people disagreed on this, would it matter?"
  YES → Orchestrator decision (judgment required)
  NO  → Grinder decision (execution, not judgment)

Ask: "Does this affect the overall workflow direction?"
  YES → Orchestrator decision
  NO  → Grinder decision

Ask: "Could a junior employee do this with clear instructions?"
  YES → Grinder task
  NO  → Orchestrator task
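The three questions can be combined into a single check. The framework does not say how to weigh them against each other, so this sketch assumes the conservative reading: if any one question points to the orchestrator, the decision goes to the orchestrator. The function and parameter names are hypothetical:

```python
def decision_owner(
    disagreement_matters: bool,
    affects_direction: bool,
    junior_can_do: bool,
) -> str:
    """Return "orchestrator" if any question points there, else "grinder"."""
    if disagreement_matters or affects_direction or not junior_can_do:
        return "orchestrator"
    return "grinder"
```

For instance, formatting citations answers no, no, yes and lands with a grinder, while choosing the deliverable structure affects workflow direction and goes to the orchestrator.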

Example Workflow: Client Deliverable

ORCHESTRATOR (Opus):
  1. Read client brief → decide deliverable structure
  2. Break into sections → assign to grinders
  3. Review all sections → identify gaps
  4. Resolve quality issues → request rewrites
  5. Synthesize → produce final deliverable
  6. Generate executive summary → deliver

GRINDER 1 (Sonnet): Write Section A per outline
GRINDER 2 (Sonnet): Write Section B per outline
GRINDER 3 (GPT-4o): Generate data tables and charts
GRINDER 4 (Gemini): Research background for Section C
GRINDER 5 (Haiku): Format citations and references

Total cost: 1 Opus call (synthesis) + 5 cheaper calls (production), versus 6 Opus calls if everything runs in Opus, where each Opus call costs roughly 5x a Sonnet call at the per-token prices listed in section 5.
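A worked comparison makes the saving concrete. This sketch rests on loud assumptions: every call uses 2,000 input and 1,000 output tokens, and all five grinder calls are priced like Sonnet (the GPT-4o, Gemini, and Haiku grinders would differ). The prices are the ones listed in section 5:

```python
# USD per million tokens as (input, output), from the API price list in section 5.
OPUS = (15.00, 75.00)
SONNET = (3.00, 15.00)

# Assumption: every call uses the same token counts.
IN_TOKENS, OUT_TOKENS = 2_000, 1_000


def call_cost(price: tuple, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one call at the given (input, output) per-million rates."""
    return (input_tokens * price[0] + output_tokens * price[1]) / 1_000_000


opus_call = call_cost(OPUS, IN_TOKENS, OUT_TOKENS)      # 0.105
sonnet_call = call_cost(SONNET, IN_TOKENS, OUT_TOKENS)  # 0.021

split_cost = opus_call + 5 * sonnet_call  # 1 Opus + 5 grinders
all_opus_cost = 6 * opus_call             # everything in Opus
```

Under these assumptions the split workflow costs about $0.21 against about $0.63 for all-Opus. Each Opus call is 5x a Sonnet call, but the workflow as a whole is about 3x cheaper, because the single Opus call accounts for half of the split total.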
