Xrouter
Xrouter is an open-source inference router that sits between OpenClaw and your LLM providers. It uses a fast, hardware-aware classifier to route each request to the most cost-effective model that can handle the task.
This project is released under the MIT License.
Core Features
- OpenAI-compatible reverse proxy at `POST /v1/chat/completions`.
- 3-tier classifier (0 = cheap, 1 = medium, 2 = frontier) with early stream cutoff.
- Hardware detection helper that recommends a local inference engine.
- Provider selection wizard to choose local and cloud endpoints.
- Cache layer with Redis or in-memory LRU fallback.
- Full cloud mode when local inference is not viable.
- Token tracking dashboard at `/dashboard`.
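As an illustration of the in-memory cache fallback, a minimal LRU can be built on `Map` insertion order. This is a sketch, not the actual `src/cache.js` implementation; the class name and default capacity are hypothetical.

```javascript
// Minimal LRU cache sketch: used when REDIS_URL is not set.
// A Map preserves insertion order, so the first key is the least recently used.
class LruCache {
  constructor(capacity = 128) {
    this.capacity = capacity;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // evict the least recently used entry (first key in insertion order)
      this.map.delete(this.map.keys().next().value);
    }
  }
}
```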
Workflow
```mermaid
flowchart TD
    A["Client / OpenClaw request"] --> B["Router (OpenAI-compatible)"]
    B --> C{"Classifier enabled?"}
    C -->|No| F["Route to Frontier provider"]
    C -->|Yes| D["Classifier (0 / 1 / 2)"]
    D --> E{"Decision"}
    E -->|0| G["Route to Cheap provider"]
    E -->|1| M["Route to Medium provider"]
    E -->|2| F
    G --> H["Provider adapter (auto or explicit)"]
    M --> H
    F --> H
    H --> I["Upstream API call"]
    I --> J["Stream/Response back to client"]
```
Repository Layout
- `src/server.js`: router and streaming proxy.
- `src/classifier.js`: classifier call and retry logic.
- `src/config.js`: configuration and env parsing.
- `src/cache.js`: Redis + LRU cache.
- `src/token_tracker.js`: token tracking.
- `scripts/check_hw.js`: hardware detection.
- `scripts/configure_providers.js`: interactive provider setup.
Requirements
- Node.js 20+.
- Local classifier engine (optional).
- A frontier provider endpoint (required).
Quickstart
- Install dependencies.
- (Optional) Start a local model server.
- Run the configuration wizard.
- Start the router.
```bash
npm install
npm run configure
npm run dev
```
How To Use
- Start your local model server (optional but recommended).
- Run the wizard to configure providers and models.
- Start the router.
- Send OpenAI-compatible requests to the router.
- Inspect routing decisions in response headers or the dashboard.
Example local setup (Ollama):
```bash
ollama pull llama3.1
ollama run llama3.1
```
Run the wizard:
```bash
npm run configure
```
Start the router:
```bash
npm run dev
```
Test a request:
```bash
curl -i http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"any","messages":[{"role":"user","content":"Fix this sentence: I has a apple."}]}'
```
Look for these headers:
- `X-Xrouter-decision`: `0`, `1`, or `2`.
- `X-Xrouter-upstream`: `cheap`, `medium`, or `frontier`.
Open the dashboard:
http://localhost:3000/dashboard
Raw usage JSON:
http://localhost:3000/usage
Provider Selection (Terminal Wizard)
Run:
```bash
npm run configure
```
The wizard:
- Scans hardware and recommends a local engine.
- Suggests a local classifier model.
- Lets you choose provider base URLs, API keys, and model overrides for cheap/medium/frontier routes.
- Writes `upstreams.json` and optionally updates `.env`.
Quick Start Mode
- If your machine can run a local model, you can choose Quick Start.
- Quick Start auto-configures the local classifier.
- The cheap route always uses the same local model as the classifier to avoid Ollama model swapping.
- You only need to choose medium and frontier providers/models.
- On Apple Silicon (Ollama), the wizard lists installed Ollama models and can auto-download a recommended model.
Routing Behavior
- The classifier is called for each uncached request.
- The first `0`, `1`, or `2` token returned decides the route.
- If classification fails, the router defaults to the frontier route.
- When the classifier is enabled, cheap, medium, and frontier routes must be configured.
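The routing rules above can be sketched as follows. This is a simplified illustration; the function name is hypothetical and the actual `src/server.js` logic may differ.

```javascript
// Map the classifier's first token to an upstream route.
// Any failure (timeout, unparseable token) falls back to frontier.
const ROUTES = { '0': 'cheap', '1': 'medium', '2': 'frontier' };

function pickRoute({ classifierEnabled, firstToken }) {
  if (!classifierEnabled) return 'frontier'; // classifier off: everything goes frontier
  const route = ROUTES[(firstToken ?? '').trim()];
  return route ?? 'frontier'; // classification failed: default to frontier
}
```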
Compatibility
- The router accepts OpenAI-style requests and translates when needed.
- Provider type can be explicit (
xrouter,openai_compatible,openai,anthropic,gemini,cohere,azure_openai,mistral,groq,together,perplexity) orauto. autoinfers the provider adapter from the base URL or API key.- Providers that expose OpenAI-compatible endpoints use the
openai_compatibleadapter. - Anthropic/Gemini/Cohere streaming is translated into OpenAI-style SSE chunks.
- Non-OpenAI adapters currently support text-only messages and basic sampling params (temperature/top_p/stop).
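As a rough illustration of `auto` inference, a heuristic like the following could map base URLs to adapters. The hosts and ordering here are assumptions for illustration, not the actual `src/config.js` logic.

```javascript
// Rough sketch of `auto` provider inference from a base URL.
// The real heuristics (and the hosts they match) may differ.
function inferProvider(baseUrl) {
  const host = new URL(baseUrl).hostname;
  if (host.endsWith('anthropic.com')) return 'anthropic';
  if (host.endsWith('googleapis.com')) return 'gemini';
  if (host.endsWith('cohere.com')) return 'cohere';
  if (host.endsWith('openai.azure.com')) return 'azure_openai';
  if (host.endsWith('openai.com')) return 'openai';
  return 'openai_compatible'; // default: treat unknown endpoints as OpenAI-compatible
}
```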
Token Tracking Dashboard
- `GET /usage`: returns cumulative token usage for `cheap`, `medium`, and `frontier`.
- `GET /dashboard`: UI that displays token split and totals.
- Local usage is counted under `cheap` when the cheap route uses the local model.
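The cumulative per-route counters behind `/usage` can be sketched as a simple accumulator. This is illustrative only; the field names are assumptions, not the actual `src/token_tracker.js` schema.

```javascript
// Accumulate token usage per route, as a /usage-style summary might.
function makeTracker() {
  const usage = { cheap: 0, medium: 0, frontier: 0 };
  return {
    record(route, promptTokens, completionTokens) {
      usage[route] += promptTokens + completionTokens;
    },
    summary() {
      const total = usage.cheap + usage.medium + usage.frontier;
      return { ...usage, total };
    },
  };
}
```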
Environment Summary
- `HOST`: bind host, default `0.0.0.0`.
- `PORT`: bind port, default `3000`.
- `ROUTER_API_KEY`: require `Authorization: Bearer <key>`.
- `LOG_LEVEL`: log level (`debug`/`info`/`warn`/`error`).
- `LOG_TO_FILE`: set `true` to write logs to files.
- `LOG_DIR`: directory for log files (default `./logs`).
- `CLASSIFIER_ENABLED`: set `false` to disable local classification.
- `CLASSIFIER_BASE_URL`: OpenAI-compatible classifier endpoint.
- `CLASSIFIER_MODEL`: classifier model name.
- `CLASSIFIER_SYSTEM_PROMPT`: classifier prompt (single line).
- `CLASSIFIER_TIMEOUT_MS`: classifier timeout.
- `CLASSIFIER_FORCE_STREAM`: force streaming classifier request.
- `CLASSIFIER_WARMUP`: warm the classifier on server start.
- `CLASSIFIER_WARMUP_DELAY_MS`: delay before warmup request (ms).
- `CLASSIFIER_KEEP_ALIVE_MS`: keep-alive interval for classifier warmup (ms).
- `CLASSIFIER_LOADING_RETRY_MS`: delay between retries when the model is loading.
- `CLASSIFIER_LOADING_MAX_RETRIES`: max retries when the model is loading.
- `CHEAP_BASE_URL`: optional, defaults to classifier base URL.
- `CHEAP_API_KEY`: cheap provider API key.
- `CHEAP_MODEL`: optional model override for cheap route.
- `CHEAP_PROVIDER`: provider type for cheap route (`auto` if empty).
- `CHEAP_HEADERS`: optional JSON headers for cheap provider (stringified object).
- `CHEAP_DEPLOYMENT`: Azure deployment override for cheap route.
- `CHEAP_API_VERSION`: Azure API version override for cheap route.
- `MEDIUM_BASE_URL`: required when classifier is enabled.
- `MEDIUM_API_KEY`: medium provider API key.
- `MEDIUM_MODEL`: optional model override for medium route.
- `MEDIUM_PROVIDER`: provider type for medium route (`auto` if empty).
- `MEDIUM_HEADERS`: optional JSON headers for medium provider (stringified object).
- `MEDIUM_DEPLOYMENT`: Azure deployment override for medium route.
- `MEDIUM_API_VERSION`: Azure API version override for medium route.
- `FRONTIER_BASE_URL`: OpenAI-compatible frontier endpoint.
- `FRONTIER_API_KEY`: frontier API key.
- `FRONTIER_MODEL`: optional model override for frontier route.
- `FRONTIER_PROVIDER`: provider type for frontier route (`auto` if empty).
- `FRONTIER_HEADERS`: optional JSON headers for frontier provider (stringified object).
- `FRONTIER_DEPLOYMENT`: Azure deployment override for frontier route.
- `FRONTIER_API_VERSION`: Azure API version override for frontier route.
- `REDIS_URL`: if set, enables Redis cache.
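A minimal `.env` sketch combining the variables above (only the variable names come from this document; all values are placeholders):

```
# Local classifier (Ollama example); the cheap route defaults to the same base URL
CLASSIFIER_BASE_URL=http://localhost:11434
CLASSIFIER_MODEL=llama3.1

# Medium and frontier routes (placeholder URLs and keys)
MEDIUM_BASE_URL=https://medium.example.invalid/v1
MEDIUM_API_KEY=sk-...
FRONTIER_BASE_URL=https://frontier.example.invalid/v1
FRONTIER_API_KEY=sk-...

# Optional
PORT=3000
REDIS_URL=redis://localhost:6379
```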
Local Model Installation & Run Guides
Ollama (best for Mac, easiest cross-platform)
- Install: Ollama Quickstart
- Pull a model: `ollama pull llama3.1`
- Run: `ollama run llama3.1`
- Base URL: `http://localhost:11434`
- Router config: `CLASSIFIER_BASE_URL=http://localhost:11434`, `CLASSIFIER_MODEL=llama3.1`
vLLM (NVIDIA GPU)
- OpenAI server: vLLM OpenAI Server
- Example: `vllm serve NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123`
- Base URL: `http://localhost:8000`
- Router config: `CLASSIFIER_BASE_URL=http://localhost:8000`, `CLASSIFIER_MODEL=NousResearch/Meta-Llama-3-8B-Instruct`
TensorRT-LLM (NVIDIA, max speed)
- Repo: TensorRT-LLM
- Server: trtllm-serve
- Base URL: `http://<host>:<port>`
- Router config: `CLASSIFIER_BASE_URL=http://<host>:<port>`, `CLASSIFIER_MODEL=<your model>`
llama.cpp (CPU/AMD fallback)
- Repo: llama.cpp
- Example: `llama-server -m model.gguf --port 8080`
- Base URL: `http://localhost:8080`
- Router config: `CLASSIFIER_BASE_URL=http://localhost:8080`, `CLASSIFIER_MODEL=<gguf model name>`
Docker
Build and run the router with Redis:
```bash
docker compose -f deploy/docker-compose.yml up --build
```
Hardware Detection
Run:
```bash
npm run check-hw
```
This prints the recommended engine:
- `tensorrt-llm` for large NVIDIA GPUs.
- `vllm` for standard NVIDIA GPUs.
- `mlx` for Apple Silicon.
- `llama.cpp` for CPU/AMD fallback.
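A sketch of that mapping, with a hypothetical VRAM threshold; the actual probes in `scripts/check_hw.js` may use different cutoffs and signals.

```javascript
// Sketch of engine recommendation from detected hardware.
// The 24 GB VRAM threshold for "large" GPUs is an illustrative assumption.
function recommendEngine({ platform, hasNvidiaGpu, vramGb }) {
  if (hasNvidiaGpu) {
    return vramGb >= 24 ? 'tensorrt-llm' : 'vllm';
  }
  if (platform === 'darwin') return 'mlx'; // Apple Silicon
  return 'llama.cpp';                      // CPU / AMD fallback
}
```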
Model List Fetching
- The wizard queries provider model list endpoints when possible.
- OpenAI-compatible:
/v1/models - Anthropic:
/v1/models - Gemini:
/v1beta/models - Cohere:
/v1/models - If listing fails, the wizard falls back to
scripts/cloud_model_catalog.json.
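The per-provider endpoints above can be captured in a small lookup table. This is a sketch of the idea; the wizard's internals may differ.

```javascript
// Model-list endpoint per provider type. A null result means the provider
// cannot be listed and the bundled catalog (scripts/cloud_model_catalog.json)
// should be used instead.
const MODEL_LIST_PATHS = {
  openai_compatible: '/v1/models',
  anthropic: '/v1/models',
  gemini: '/v1beta/models',
  cohere: '/v1/models',
};

function modelListPath(provider) {
  return MODEL_LIST_PATHS[provider] ?? null; // null => catalog fallback
}
```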