smart-router

Intelligent task routing between local and cloud Ollama LLM instances. Use when the user wants cost-efficient AI responses by routing simple tasks to a local Ollama model and complex tasks to a more powerful remote/cloud Ollama instance. Automatically classifies task complexity, detects system capabilities, and delegates to the appropriate model tier. Use for any request where you want to balance latency vs capability, or when explicitly asked to use smart routing, local-first routing, or Ollama model selection.


Install the "smart-router" skill with: npx skills add simoncatbot/ollama-smart-router

Smart Router

Routes tasks between a local Ollama instance (fast, cheap) and a remote/cloud Ollama instance (more capable) based on task complexity classification and system capabilities.

Quick Start

# 1. Profile your system
python scripts/system_profiler.py

# 2. Check endpoints are healthy
python scripts/health_check.py

# 3. Route a task
python scripts/route.py "What is quantum computing?"

How It Works

User Request
    ↓
System Profiler (detects compatible models)
    ↓
Health Check (verifies endpoints are up)
    ↓
Classify Task (1-5 complexity score)
    ↓
├─ Specialist trigger → Dedicated model
├─ Score 1-2 → Local Ollama (fast, cheap)
└─ Score 3-5 → Cloud Ollama (powerful)
    ↓
Verify model available (fallback if not)
    ↓
Stream Response
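
The last stage is a streaming call against the winning endpoint. For illustration, a minimal sketch of that call using Ollama's documented /api/generate endpoint (this is a sketch of the Ollama API, not the skill's actual execute.py):

import json
import requests

def stream_ollama(base_url, model, prompt):
    """Yield text chunks from an Ollama /api/generate stream."""
    resp = requests.post(
        f"{base_url}/api/generate",
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True,
        timeout=60,
    )
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)          # one JSON object per line (NDJSON)
        yield chunk.get("response", "")   # the newly generated text
        if chunk.get("done"):
            break

for piece in stream_ollama("http://localhost:11434", "llama3.2", "Hello"):
    print(piece, end="", flush=True)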

Classification Scale

Score | Complexity | Examples                                    | Routed To
------|------------|---------------------------------------------|----------
1     | Simple     | "What is 2+2?", "Define entropy"            | Local
2     | Basic      | "Write hello world in Python"               | Local
3     | Complex    | "Debug this error", "Compare X vs Y"        | Cloud
4     | Deep       | "Design a system", "Research topic"         | Cloud
5     | Expert     | "Build from scratch", "Multi-file project"  | Cloud

File Structure

smart-router/
├── SKILL.md                      # This file
├── __init__.py                   # Python package interface
├── requirements.txt              # Dependencies
│
├── config/
│   ├── router.yaml               # Main configuration
│   └── system_profile.json       # Auto-generated system specs
│
├── scripts/
│   ├── classify.py               # Task complexity classifier
│   ├── execute.py                # Ollama API client
│   ├── route.py                  # Main routing logic
│   ├── system_profiler.py        # Hardware detection
│   └── health_check.py           # Endpoint health verification
│
├── tests/
│   └── test_classifier.py        # Test suite
│
└── references/
    └── classifier-prompt.txt     # LLM fallback prompt

Configuration

Edit config/router.yaml:

# Local Ollama (your machine)
local:
  model: "llama3.2"
  base_url: "http://localhost:11434"

# Cloud Ollama (remote server)
cloud:
  model: "qwen2.5:14b"
  base_url: "http://192.168.1.100:11434"

# Tasks scoring >= this go to cloud
threshold: 3

# Domain specialists (checked first)
specialists:
  code:
    model: "codellama:34b"
    base_url: "http://192.168.1.100:11434"
    triggers: ["code review", "refactor"]

# Performance settings
performance:
  timeout_seconds: 60
  stream_responses: true
  retry_attempts: 2

# Caching
cache:
  enabled: true
  db_path: "cache/router.db"
  ttl_seconds: 86400
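
For reference, a minimal sketch of loading this file, assuming PyYAML is among the dependencies in requirements.txt (the shipped loader in route.py may differ):

import yaml

def load_config(path="config/router.yaml"):
    """Load router.yaml and apply defaults for optional keys."""
    with open(path) as f:
        config = yaml.safe_load(f)
    config.setdefault("threshold", 3)     # scores >= threshold go to cloud
    config.setdefault("specialists", {})  # no specialists unless configured
    return config

config = load_config()
print(config["cloud"]["model"])  # e.g. "qwen2.5:14b"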

Usage

CLI

# Basic routing
python scripts/route.py "What is the capital of France?"

# With profiling (updates system profile)
python scripts/route.py "Debug this error" --profile

# Custom config
python scripts/route.py "Design a system" --config config/my-router.yaml

# No streaming (wait for full response)
python scripts/route.py "Summarize this" --no-stream

# Health check all endpoints
python scripts/health_check.py

# Manual classification
python scripts/classify.py "Write a function"
# Output: "2:basic-task"

Python API

from smart_router import SmartRouter

# Initialize
router = SmartRouter()

# Route with streaming
for chunk in router.route("Explain quantum computing"):
    print(chunk, end='')

# Classify only
score, reason = router.classify("Debug this code")
print(f"Complexity: {score}/5, Reason: {reason}")

# Get configuration
config = router.get_config()
print(f"Local model: {config['local']['model']}")

Workflow

1. System Profiling

Run once (or when hardware changes):

python scripts/system_profiler.py

This creates config/system_profile.json with:

  • Total/available RAM
  • GPU detection (VRAM, name)
  • CPU cores
  • Compatible model list
  • Recommended local model
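
A sketch of how such a profile can be collected, assuming psutil for RAM/CPU and nvidia-smi for GPU detection (the shipped system_profiler.py may use different probes):

import json
import shutil
import subprocess

import psutil  # assumed dependency for RAM/CPU detection

def profile_system():
    """Gather the hardware facts that drive model selection."""
    mem = psutil.virtual_memory()
    profile = {
        "ram_total_gb": round(mem.total / 1e9, 1),
        "ram_available_gb": round(mem.available / 1e9, 1),
        "cpu_cores": psutil.cpu_count(logical=False),
        "gpu_vram_gb": None,
    }
    if shutil.which("nvidia-smi"):  # NVIDIA GPU present?
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            text=True,
        )
        profile["gpu_vram_gb"] = round(int(out.splitlines()[0]) / 1024, 1)  # MiB -> GiB
    return profile

with open("config/system_profile.json", "w") as f:
    json.dump(profile_system(), f, indent=2)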

2. Health Check

Verify endpoints before use:

python scripts/health_check.py

Checks:

  • Ollama version
  • Available models
  • Response latency
  • Connection status
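
A sketch of one such probe, using Ollama's /api/version and /api/tags endpoints (the output format mirrors the sample under Health Monitoring below; the shipped health_check.py may differ):

import time
import requests

def check_endpoint(name, base_url):
    """Probe one Ollama endpoint and print a one-line status report."""
    try:
        start = time.perf_counter()
        requests.get(f"{base_url}/api/version", timeout=5).raise_for_status()
        latency_ms = (time.perf_counter() - start) * 1000
        models = requests.get(f"{base_url}/api/tags", timeout=5).json()["models"]
        print(f"✓ {name:<9} | Status: healthy | Latency: {latency_ms:.0f}ms "
              f"| Models: {len(models)}")
    except requests.RequestException as exc:
        print(f"✗ {name:<9} | Status: unreachable | Error: {exc}")

check_endpoint("local", "http://localhost:11434")
check_endpoint("cloud", "http://192.168.1.100:11434")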

3. Routing

When you submit a task:

  1. Specialist check — Match against specialist triggers
  2. Classification — Pattern-based scoring (1-5)
  3. Model selection — Local (1-2) or Cloud (3-5)
  4. Availability check — Verify model exists in Ollama
  5. Fallback — Use compatible model if preferred unavailable
  6. Execution — Stream response from selected model
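
Steps 1-3 reduce to a small decision function. A sketch, assuming a classify() helper like the one classify.py exposes (names here are illustrative):

def pick_destination(task, config):
    """Return the endpoint/model dict that should handle this task."""
    # 1. Specialist check: trigger phrases win before any scoring
    for spec in config.get("specialists", {}).values():
        if any(t in task.lower() for t in spec["triggers"]):
            return spec
    # 2-3. Classify, then compare the score against the cloud threshold
    score, _reason = classify(task)  # pattern scorer, sketched under Features
    tier = "cloud" if score >= config["threshold"] else "local"
    return config[tier]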

Features

Pattern-Based Classification

Uses regex patterns (not LLM calls) for speed:

  • 30ms classification time
  • Zero token cost
  • Handles false positives ("zip code" ≠ code task)
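
A condensed sketch of the approach; these patterns and reason labels are illustrative, not the shipped COMPLEXITY_PATTERNS table:

import re

COMPLEXITY_PATTERNS = {
    1: [r"^what\s+is\b", r"\bdefine\b"],
    2: [r"\bhello\s+world\b", r"\bwrite\s+a\s+function\b"],
    3: [r"\bdebug\b", r"\bcompare\b"],
    4: [r"\bdesign\s+a\s+system\b", r"\bresearch\b"],
    5: [r"\bfrom\s+scratch\b", r"\bmulti-file\b"],
}
FALSE_POSITIVES = [r"\bzip\s+code\b"]  # "zip code" must not look like a code task

def classify(task):
    """Return (score, reason) for the highest-scoring matching pattern."""
    text = task.lower()
    for fp in FALSE_POSITIVES:
        text = re.sub(fp, "", text)  # strip known red herrings first
    for score in sorted(COMPLEXITY_PATTERNS, reverse=True):
        if any(re.search(p, text) for p in COMPLEXITY_PATTERNS[score]):
            return score, f"pattern-level-{score}"
    return 1, "no-match-default"  # unmatched tasks stay local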

System-Aware Model Selection

Automatically detects what your system can run:

  • No GPU → Filters to CPU-compatible models
  • 8GB RAM → Excludes 70B models
  • GPU available → Prioritizes GPU-accelerated models
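
A sketch of the filtering idea; the per-model memory footprints here are illustrative guesses, not the skill's actual compatibility table:

# Approximate memory footprint in GB; illustrative numbers only
MODEL_FOOTPRINT_GB = {
    "llama3.2": 4,
    "qwen2.5:14b": 12,
    "codellama:34b": 24,
    "llama3.1:70b": 48,
}

def compatible_models(profile):
    """Keep only models whose footprint fits the detected hardware."""
    budget = profile.get("gpu_vram_gb") or profile["ram_available_gb"]
    return [m for m, need in MODEL_FOOTPRINT_GB.items() if need <= budget]

# An 8 GB, CPU-only machine drops the 34B and 70B entries:
print(compatible_models({"ram_available_gb": 8, "gpu_vram_gb": None}))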

Health Monitoring

Pre-flight checks prevent routing to dead endpoints:

✓ local     | Status: healthy | Latency: 45ms | Models: 5
✗ cloud     | Status: unreachable | Error: Connection refused

Automatic Fallbacks

  1. Model fallback — If configured model unavailable, picks compatible alternative
  2. Endpoint fallback — If cloud fails, retries with local
  3. Error handling — Never crashes; always returns a response or an explicit error
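
The model-level fallback (1) can be a single lookup against /api/tags. A sketch (the matching rule is an assumption):

import requests

def resolve_model(base_url, preferred):
    """Return the preferred model if installed, else the first available one."""
    tags = requests.get(f"{base_url}/api/tags", timeout=5).json()["models"]
    installed = [m["name"] for m in tags]  # e.g. ["llama3.2:latest", ...]
    for name in installed:
        if name.split(":")[0] == preferred.split(":")[0]:
            return name  # exact or same-family match
    return installed[0]  # any working model beats a hard failure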

Cost Tracking

Ollama itself has no per-token fees, so the logs track routing decisions and latency instead:

[2024-01-15T10:30:00] task: '...' -> local | model: llama3.2 | latency: 0.85s
[2024-01-15T10:30:45] task: '...' -> cloud | model: qwen2.5:14b | latency: 3.2s
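
A sketch of producing such a line (the log path is an assumption):

import time
from datetime import datetime

def log_route(task, tier, model, started):
    """Append one routing decision, in the format above, to the log file."""
    latency = time.perf_counter() - started
    stamp = datetime.now().strftime("%Y-%m-%dT%H:%M:%S")
    line = (f"[{stamp}] task: '{task[:20]}...' -> {tier} "
            f"| model: {model} | latency: {latency:.2f}s")
    with open("cache/router.log", "a") as f:  # path is illustrative
        f.write(line + "\n")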

Testing

# Run classifier tests
python tests/test_classifier.py

# Expected output:
# ✓ PASS [1] Simple factual question
# ✓ PASS [1] Zip code (not code)
# ✓ PASS [3] Debugging
# ...
# Results: X passed, Y failed

Troubleshooting

"Cannot connect to Ollama"

# Check if Ollama is running
ollama serve

# Verify endpoint
curl http://localhost:11434/api/tags

"Model not found"

# Pull the model
ollama pull llama3.2

# Or let router auto-fallback to available model

"Classification seems wrong"

Check pattern in scripts/classify.py:

# Add new pattern
COMPLEXITY_PATTERNS[2].append(r'your\s+pattern\s+here')

"Cloud endpoint slow"

# In config/router.yaml
performance:
  timeout_seconds: 30  # Reduce timeout

Requirements

  • Python 3.8+
  • Ollama (local or remote)
  • pip install -r requirements.txt

Architecture Decision Records

Why Pattern Matching vs LLM?

Approach           | Latency | Cost      | Accuracy | Verdict
-------------------|---------|-----------|----------|-----------------
Pattern matching   | 30ms    | 0 tokens  | 90%      | ✅ Used
LLM classification | 500ms   | 50 tokens | 95%      | Optional (--llm)

Pattern matching wins on speed/cost. Accuracy is good enough for routing.

Why Not Cloud APIs (Claude, GPT-4)?

An Ollama-only design keeps everything:

  • Private — No data leaves your infrastructure
  • Free — Server costs only, no per-token fees
  • Customizable — Run fine-tuned models

Future Enhancements

  • Adaptive threshold learning from feedback
  • Conversation context (multi-turn routing)
  • Cost/latency budget enforcement
  • Automatic model downloading
  • Metrics dashboard
