# Semantic Model Router

Smart LLM router that saves up to 99% on inference costs by routing each request to the cheapest model that can handle it. Classification is powered by a pre-trained ML classifier and semantic embeddings; no external calls and no API keys are needed.
## Install

```bash
openclaw plugins install @rayray1218/semantic-model-router
```
## Quick Start

```python
from scripts.model_router import ModelRouter

router = ModelRouter()
res = router.route("Design a distributed caching layer for a fintech platform.")
print(res["report"])
# [ClawRouter] anthropic/claude-sonnet-4-6 (ELITE, ml, conf=0.97)
# Cost: $3.0/M | Baseline: $10.0/M | Saved: 70.0%
```
## How Routing Works

Queries are classified into one of three tiers by a three-stage pipeline:

- ML Classifier (primary): a logistic regression model trained on 6,000+ labeled queries. Runs in under 1 ms from weights embedded in scripts/model_weights.py.
- Semantic Embeddings (fallback): cosine similarity to per-tier intent vectors via sentence-transformers.
- Keyword Rules (last resort): pattern matching with no dependencies.
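The three stages above form a simple cascade: each stage only runs if the previous one abstains. A minimal sketch of that control flow follows; the stage functions here are illustrative stubs, not the plugin's real API (the actual classifier loads trained weights from scripts/model_weights.py).

```python
# Hypothetical stage implementations that only illustrate the fallback cascade.
ELITE_KEYWORDS = ("architecture", "security", "compiler", "encryption")

def classify_ml(query: str):
    """Stage 1: pretend the ML classifier abstains on very short queries."""
    if len(query.split()) < 3:
        return None  # low confidence -> fall through to the next stage
    return "ELITE" if any(k in query.lower() for k in ELITE_KEYWORDS) else "BALANCED"

def classify_semantic(query: str):
    """Stage 2: embedding similarity fallback (stubbed out here)."""
    return None

def classify_keywords(query: str):
    """Stage 3: last-resort keyword rules with no dependencies."""
    return "BASIC"

def route_tier(query: str) -> str:
    """Try each stage in order; the first non-None answer wins."""
    for stage in (classify_ml, classify_semantic, classify_keywords):
        tier = stage(query)
        if tier is not None:
            return tier
    return "BASIC"
```

Because the keyword stage always returns a tier, every query resolves even when the heavier stages are unavailable.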
| Tier | Default Model | Typical Workload | Cost/1M | vs Baseline |
|---|---|---|---|---|
| BASIC | deepseek/deepseek-chat | Greetings, simple Q&A, chit-chat | $0.14 | 99% saved |
| BALANCED | openai/gpt-4o-mini | Summaries, translations, explanations | $0.15 | 99% saved |
| ELITE | anthropic/claude-sonnet-4-6 | Complex coding, architecture, security | $3.00 | 70% saved |
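The "vs Baseline" column is the relative cost reduction against the $10.00/M baseline price shown in the Quick Start report above. A one-line sketch of that calculation (the function name is illustrative, not part of the plugin's API):

```python
BASELINE = 10.00  # $/1M tokens, the reference price used in the router's reports

def savings(tier_cost: float, baseline: float = BASELINE) -> float:
    """Percent saved relative to the baseline per-million-token price."""
    return round((1 - tier_cost / baseline) * 100, 1)
```

For example, the BASIC tier at $0.14/M yields a 98.6% reduction, matching the smoke-test output below.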
## Supported Models (16 total, verified Feb 2026)

### Anthropic
| Model | Input /1M | Output /1M |
|---|---|---|
| anthropic/claude-sonnet-4-6 | $3.00 | $15.00 ★ ELITE default |
| anthropic/claude-opus-4-5 | $5.00 | $25.00 |
| anthropic/claude-haiku-4-5 | $0.80 | $4.00 |
### OpenAI
| Model | Input /1M | Output /1M |
|---|---|---|
| openai/gpt-5 | $1.25 | $10.00 |
| openai/gpt-4o | $2.50 | $10.00 |
| openai/gpt-4o-mini | $0.15 | $0.60 ★ BALANCED default |
| openai/o3 | $2.00 | $8.00 |
| openai/o4-mini | $1.10 | $4.40 |
### Google
| Model | Input /1M | Output /1M |
|---|---|---|
| google/gemini-3.0-pro | $1.25 | $10.00 |
| google/gemini-2.5-pro | $1.25 | $10.00 |
| google/gemini-2.5-flash | $0.30 | $2.50 |
| google/gemini-2.5-flash-lite | $0.10 | $0.40 |
### DeepSeek
| Model | Input /1M | Output /1M |
|---|---|---|
| deepseek/deepseek-chat (V3.2) | $0.28 | $0.42 ★ BASIC default |
| deepseek/deepseek-reasoner (V3.2) | $0.28 | $0.42 |
### xAI (Grok)
| Model | Input /1M | Output /1M |
|---|---|---|
| xai/grok-3 | $3.00 | $15.00 |
| xai/grok-3-mini | $0.30 | $0.50 |
Pricing source: Official API docs of each provider, verified Feb 2026.
## Override Models at Runtime

```python
# Use GPT-5.2 for ELITE, Gemini Flash for BALANCED, Gemini Flash Lite for BASIC
router = ModelRouter(
    elite_model="openai/gpt-5.2",
    balanced_model="google/gemini-2.5-flash",
    basic_model="google/gemini-2.5-flash-lite",
)

# Swap a tier's model without recreating the router
router.set_model("ELITE", "anthropic/claude-opus-4-5")
```
## List All Available Models (CLI)

```bash
python3 scripts/model_router.py --list-models
```
## CLI Usage

```bash
# Route a single query
python3 scripts/model_router.py "Implement AES encryption from scratch"

# Override the ELITE model
python3 scripts/model_router.py --elite openai/gpt-5.2 "Write a compiler"

# Run the full smoke test
python3 scripts/model_router.py
```
## Dynamic Keyword Expansion

```python
router.add_keywords("ELITE", ["cryptographic proof", "zero-knowledge"])
```
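Added keywords feed the last-resort rule stage described above. A minimal sketch of how such a stage might store and match them (this class is hypothetical, not the plugin's actual implementation):

```python
class KeywordRules:
    """Illustrative sketch of a dependency-free keyword-matching stage."""

    def __init__(self):
        self.keywords = {"BASIC": set(), "BALANCED": set(), "ELITE": set()}

    def add_keywords(self, tier: str, words):
        """Register extra trigger phrases for a tier (case-insensitive)."""
        self.keywords[tier].update(w.lower() for w in words)

    def match(self, query: str):
        """Return the first tier with a matching phrase, or None to abstain."""
        q = query.lower()
        # Check tiers from most to least capable so specialist terms win.
        for tier in ("ELITE", "BALANCED", "BASIC"):
            if any(w in q for w in self.keywords[tier]):
                return tier
        return None
```

Checking ELITE first means a query containing both a generic and a specialist phrase still routes to the stronger model.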
## Example Output

```text
Query                                           Predicted  Expected  ✓  Cost Info
────────────────────────────────────────────────────────────────────────────────────
How are you doing today?                        BASIC      BASIC     ✓  $0.14/M  saved 98.6%
Summarize this article in three bullet points.  BALANCED   BALANCED  ✓  $0.15/M  saved 98.5%
Implement a thread-safe LRU cache in Python.    ELITE      ELITE     ✓  $3.0/M   saved 70.0%
```
## Security & Privacy

- Zero external calls: all classification runs locally.
- No API keys: the router itself needs none.
- Transparent weights: all model parameters live in scripts/model_weights.py and are fully auditable.
Save costs, route smarter. Built for the OpenClaw community.