openrouter

Expert OpenRouter API assistant for AI agents. Use when making API calls to OpenRouter's unified API for 400+ AI models. Covers chat completions, streaming, tool calling, structured outputs, web search, embeddings, multimodal inputs, model selection, routing, and error handling.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "openrouter" with this command: npx skills add dimitrigilbert/ai-skills/dimitrigilbert-ai-skills-openrouter

OpenRouter API for AI Agents

Expert guidance for AI agents integrating with OpenRouter API - unified access to 400+ models from 90+ providers.

When to use this skill:

  • Making chat completions via OpenRouter API
  • Selecting appropriate models and variants
  • Implementing streaming responses
  • Using tool/function calling
  • Enforcing structured outputs
  • Integrating web search
  • Handling multimodal inputs (images, audio, video, PDFs)
  • Managing model routing and fallbacks
  • Handling errors and retries
  • Optimizing cost and performance

API Basics

Making a Request

Endpoint: POST https://openrouter.ai/api/v1/chat/completions

Headers (required):

{
  'Authorization': `Bearer ${apiKey}`,
  'Content-Type': 'application/json',
  // Optional: for app attribution
  'HTTP-Referer': 'https://your-app.com',
  'X-Title': 'Your App Name'
}

Minimal request structure:

const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      { role: 'user', content: 'Your prompt here' }
    ]
  })
});

Response Structure

Non-streaming response:

{
  "id": "gen-abc123",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Response text here"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  },
  "model": "anthropic/claude-3.5-sonnet"
}

Key fields:

  • choices[0].message.content - The assistant's response
  • choices[0].finish_reason - Why generation stopped (stop, length, tool_calls, etc.)
  • usage - Token counts and cost information
  • model - Actual model used (may differ from requested)

When to Use Streaming vs Non-Streaming

Use streaming (stream: true) when:

  • Real-time responses needed (chat interfaces, interactive tools)
  • Latency matters (user-facing applications)
  • Large responses expected (long-form content)
  • Want to show progressive output

Use non-streaming when:

  • Processing in background (batch jobs, async tasks)
  • Need complete response before processing
  • Serving an API endpoint that returns complete responses
  • Response is short (few tokens)

Streaming basics:

const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: { /* ... */ },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: '...' }],
    stream: true
  })
});

const decoder = new TextDecoder();
let buffer = '';

for await (const chunk of response.body) {
  buffer += decoder.decode(chunk, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep any partial line until the next chunk arrives

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6); // Remove 'data: '
    if (data === '[DONE]') break;

    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      // Accumulate or display content
    }
  }
}

Model Selection

Model Identifier Format

Format: provider/model-name[:variant]

Examples:

  • anthropic/claude-3.5-sonnet - Specific model
  • openai/gpt-4o:online - With web search enabled
  • google/gemini-2.0-flash:free - Free tier variant

Model Variants and When to Use Them

| Variant | Use When | Tradeoffs |
|---|---|---|
| :free | Cost is primary concern, testing, prototyping | Rate limits, lower-quality models |
| :online | Need current information, real-time data | Higher cost, web search latency |
| :extended | Large context window needed | May be slower, higher cost |
| :thinking | Complex reasoning, multi-step problems | Higher token usage, slower |
| :nitro | Speed is critical | May have quality tradeoffs |
| :exacto | Need specific provider | No fallbacks, may be less available |
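The provider/model-name[:variant] format is easy to handle with a small helper. The functions below are an illustrative sketch, not part of any OpenRouter SDK:

```javascript
// Illustrative helpers for the provider/model-name[:variant] format.
// Not part of any official SDK.
function parseModelId(id) {
  const [path, variant] = id.split(':');
  const [provider, model] = path.split('/');
  return { provider, model, variant: variant ?? null };
}

function withVariant(id, variant) {
  // Strip any existing variant, then append the requested one
  return variant ? `${id.split(':')[0]}:${variant}` : id;
}
```

For example, withVariant('openai/gpt-4o', 'online') yields 'openai/gpt-4o:online'.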

Default Model Choices by Task

General purpose: anthropic/claude-3.5-sonnet or openai/gpt-4o

  • Balanced quality, speed, cost
  • Good for most tasks

Coding: anthropic/claude-3.5-sonnet or openai/gpt-4o

  • Strong code generation and understanding
  • Good reasoning

Complex reasoning: anthropic/claude-opus-4:thinking or openai/o3

  • Deep reasoning capabilities
  • Higher cost, slower

Fast responses: openai/gpt-4o-mini:nitro or google/gemini-2.0-flash

  • Minimal latency
  • Good for real-time applications

Cost-sensitive: google/gemini-2.0-flash:free or meta-llama/llama-3.1-70b:free

  • No cost with limits
  • Good for high-volume, lower-complexity tasks

Current information: anthropic/claude-3.5-sonnet:online or google/gemini-2.5-pro:online

  • Web search built-in
  • Real-time data

Large context: anthropic/claude-3.5-sonnet:extended or google/gemini-2.5-pro:extended

  • 200K+ context windows
  • Document analysis, codebase understanding
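The recommendations above can be captured in a simple lookup. This is a hypothetical mapping based on the guidance here — tune it for your own workload and re-check availability over time:

```javascript
// Hypothetical task → default model mapping based on the guidance above.
const DEFAULT_MODELS = {
  general: 'anthropic/claude-3.5-sonnet',
  coding: 'anthropic/claude-3.5-sonnet',
  reasoning: 'anthropic/claude-opus-4:thinking',
  fast: 'google/gemini-2.0-flash',
  cheap: 'google/gemini-2.0-flash:free',
  current: 'anthropic/claude-3.5-sonnet:online',
  largeContext: 'anthropic/claude-3.5-sonnet:extended',
};

function pickModel(task) {
  // Fall back to the general-purpose default for unknown tasks
  return DEFAULT_MODELS[task] ?? DEFAULT_MODELS.general;
}
```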

Provider Routing Preferences

Default behavior: OpenRouter automatically selects best provider

Explicit provider order:

{
  provider: {
    order: ['anthropic', 'openai', 'google'],
    allow_fallbacks: true,
    sort: 'price' // 'price', 'latency', or 'throughput'
  }
}

When to set provider order:

  • Have preferred provider arrangements
  • Need to optimize for specific metric (cost, speed)
  • Want to exclude certain providers
  • Have BYOK (Bring Your Own Key) for specific providers

Model Fallbacks

Automatic fallback - try multiple models in order:

{
  models: [
    'anthropic/claude-3.5-sonnet',
    'openai/gpt-4o',
    'google/gemini-2.0-flash'
  ]
}

When to use fallbacks:

  • High reliability required
  • Multiple providers acceptable
  • Want graceful degradation
  • Avoid single point of failure

Fallback behavior:

  • Tries the first model
  • Falls back to the next on error (5xx, 429, timeout)
  • Uses whichever succeeds
  • Reports the model actually used in the response's model field
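Because the response's model field reports which model actually served the request, you can detect when a fallback kicked in. A small illustrative check:

```javascript
// Illustrative: compare the requested fallback list against the model
// reported in the response to see whether (and how far) routing fell back.
function fallbackDepth(requestedModels, servedModel) {
  return requestedModels.indexOf(servedModel);
  // 0 = primary served; >0 = fallback position; -1 = unexpected model
}

// Usage sketch:
// const depth = fallbackDepth(requestBody.models, data.model);
// if (depth > 0) console.warn(`Fallback #${depth} served this request`);
```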

Parameters You Need

Core Parameters

model (string, optional)

  • Which model to use
  • Default: user's default model
  • Always specify for consistency

messages (Message[], required)

  • Conversation history
  • Structure: { role: 'system'|'user'|'assistant'|'tool', content: string | ContentPart[] }
  • For multimodal: content can be array of text and image_url parts

stream (boolean, default: false)

  • Enable Server-Sent Events streaming
  • Use for real-time responses

temperature (float, 0.0-2.0, default: 1.0)

  • Controls randomness
  • 0.0-0.3: Deterministic, factual responses (code, precise answers)
  • 0.4-0.7: Balanced (general use)
  • 0.8-1.2: Creative (brainstorming, creative writing)
  • 1.3-2.0: Highly creative, unpredictable (experimental)

max_tokens (integer, optional)

  • Maximum tokens to generate
  • Always set to control cost and prevent runaway responses
  • Typical: 100-500 for short, 1000-2000 for long responses
  • Model limit: context_length - prompt_length

top_p (float, 0.0-1.0, default: 1.0)

  • Nucleus sampling - limits to top probability mass
  • Use instead of temperature when you want predictable diversity
  • 0.9-0.95: Common settings for quality

top_k (integer, 0+, default: 0/disabled)

  • Limit to K most likely tokens
  • 1: Always most likely (deterministic)
  • 40-50: Balanced
  • Not available for OpenAI models

Sampling Strategy Guidelines

  • Code generation: temperature: 0.1-0.3, top_p: 0.95
  • Factual responses: temperature: 0.0-0.2
  • Creative writing: temperature: 0.8-1.2
  • Brainstorming: temperature: 1.0-1.5
  • Chat: temperature: 0.6-0.8
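These guidelines can be encoded as presets. The exact values below are illustrative picks from the ranges above, not official defaults:

```javascript
// Illustrative sampling presets derived from the ranges above.
const SAMPLING_PRESETS = {
  code: { temperature: 0.2, top_p: 0.95 },
  factual: { temperature: 0.1 },
  creative: { temperature: 1.0 },
  brainstorm: { temperature: 1.2 },
  chat: { temperature: 0.7 },
};

function withSampling(request, task) {
  // Merge the preset into the request body; default to chat settings
  return { ...request, ...(SAMPLING_PRESETS[task] ?? SAMPLING_PRESETS.chat) };
}
```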

Tool Calling Parameters

tools (Tool[], default: [])

  • Available functions for model to call
  • Structure:
{
  type: 'function',
  function: {
    name: 'function_name',
    description: 'What it does',
    parameters: { /* JSON Schema */ }
  }
}

tool_choice (string | object, default: 'auto')

  • Control when tools are called
  • 'auto': Model decides (default)
  • 'none': Never call tools
  • 'required': Must call a tool
  • { type: 'function', function: { name: 'specific_tool' } }: Force specific tool

parallel_tool_calls (boolean, default: true)

  • Allow multiple tools simultaneously
  • Set false for sequential execution

When to use tools:

  • Need to query external APIs (weather, search, database)
  • Need to perform calculations or data processing
  • Building agentic systems
  • Need structured data extraction
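Putting the pieces together, a complete request body with one tool might look like this. get_weather is a hypothetical function name used for illustration; the shape follows the structure above:

```javascript
// A request body with one hypothetical tool (get_weather).
const toolRequest = {
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'What is the weather in Paris?' }],
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get current weather for a city',
      parameters: {
        type: 'object',
        properties: {
          city: { type: 'string', description: 'City name' },
        },
        required: ['city'],
      },
    },
  }],
  tool_choice: 'auto', // let the model decide when to call the tool
};
```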

Structured Output Parameters

response_format (object, optional)

  • Enforce specific output format

JSON object mode:

{ type: 'json_object' }
  • Model returns valid JSON
  • Must also instruct model in system message

JSON Schema mode (strict):

{
  type: 'json_schema',
  json_schema: {
    name: 'schema_name',
    strict: true,
    schema: { /* JSON Schema */ }
  }
}
  • Model returns JSON matching exact schema
  • Use when structure is critical (APIs, data processing)

When to use structured outputs:

  • Need predictable response format
  • Integrating with systems (APIs, databases)
  • Data extraction
  • Form filling

Web Search Parameters

Enable via model variant (simplest):

{ model: 'anthropic/claude-3.5-sonnet:online' }

Enable via plugin:

{
  plugins: [{
    id: 'web',
    enabled: true,
    max_results: 5
  }]
}

When to use web search:

  • Need current information (news, prices, events)
  • User asks about recent developments
  • Need factual verification
  • Topic requires real-time data

Other Important Parameters

user (string, optional)

  • Stable identifier for end-user
  • Set when you have user IDs
  • Helps with abuse detection and caching

session_id (string, optional)

  • Group related requests
  • Set for conversation tracking
  • Improves caching and observability

metadata (Record<string, string>, optional)

  • Custom metadata (max 16 key-value pairs)
  • Use for analytics and tracking
  • Keys: max 64 chars, Values: max 512 chars
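The documented limits (at most 16 pairs, keys up to 64 chars, values up to 512 chars) are easy to enforce before sending. An illustrative guard:

```javascript
// Illustrative guard for the documented metadata limits:
// at most 16 pairs, keys ≤ 64 chars, values ≤ 512 chars.
function validateMetadata(metadata) {
  const entries = Object.entries(metadata);
  if (entries.length > 16) return false;
  return entries.every(
    ([key, value]) =>
      typeof value === 'string' && key.length <= 64 && value.length <= 512
  );
}
```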

stop (string | string[], optional)

  • Stop sequences to halt generation
  • Common: ['\n\n', '###', 'END']

Handling Responses

Non-Streaming Responses

Extract content:

const response = await fetch(/* ... */);
const data = await response.json();

const content = data.choices[0].message.content;
const finishReason = data.choices[0].finish_reason;
const usage = data.usage;

Check for tool calls:

const toolCalls = data.choices[0].message.tool_calls;
if (toolCalls) {
  // Model wants to call tools
  for (const toolCall of toolCalls) {
    const { name, arguments: args } = toolCall.function;
    const parsedArgs = JSON.parse(args);
    // Execute tool...
  }
}

Streaming Responses

Process SSE stream:

let fullContent = '';
let buffer = '';
const response = await fetch(/* ... */);

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep any partial line for the next read

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6);
    if (data === '[DONE]') break;

    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      fullContent += content;
      // Process incrementally...
    }

    // Usage arrives in the final chunk
    if (parsed.usage) {
      console.log('Usage:', parsed.usage);
    }
  }
}

Handle streaming tool calls:

// Tool calls stream across multiple chunks.
// `chunks` here stands for the parsed SSE events from the loop above.
let currentToolCall = null;
let toolArgs = '';

for (const parsed of chunks) {
  const toolCallChunk = parsed.choices?.[0]?.delta?.tool_calls?.[0];

  if (toolCallChunk?.function?.name) {
    currentToolCall = { id: toolCallChunk.id, ...toolCallChunk.function };
  }

  if (toolCallChunk?.function?.arguments) {
    toolArgs += toolCallChunk.function.arguments;
  }

  if (parsed.choices?.[0]?.finish_reason === 'tool_calls' && currentToolCall) {
    // Complete tool call
    currentToolCall.arguments = toolArgs;
    // Execute tool...
  }
}

Usage and Cost Tracking

const { usage } = data;
console.log(`Prompt: ${usage.prompt_tokens}`);
console.log(`Completion: ${usage.completion_tokens}`);
console.log(`Total: ${usage.total_tokens}`);

// Cost (if available)
if (usage.cost) {
  console.log(`Cost: $${usage.cost.toFixed(6)}`);
}

// Detailed breakdown
console.log(usage.prompt_tokens_details);
console.log(usage.completion_tokens_details);

Error Handling

Common HTTP Status Codes

400 Bad Request

  • Invalid request format
  • Missing required fields
  • Parameter out of range
  • Fix: Validate request structure and parameters

401 Unauthorized

  • Missing or invalid API key
  • Fix: Check API key format and permissions

402 Payment Required

  • Insufficient credits
  • Fix: Add credits to account

403 Forbidden

  • Insufficient permissions
  • Model not allowed
  • Fix: Check guardrails, model access, API key permissions

408 Request Timeout

  • Request took too long
  • Fix: Reduce prompt length, use streaming, try simpler model

429 Rate Limited

  • Too many requests
  • Fix: Implement exponential backoff, reduce request rate

502 Bad Gateway

  • Provider error
  • Fix: Use model fallbacks, retry with different model

503 Service Unavailable

  • Service overloaded
  • Fix: Retry with backoff, use fallbacks

Retry Strategy

Exponential backoff:

// `options` is the full fetch init object: { method, headers, body }
async function requestWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetch(url, options);

      if (response.ok) {
        return await response.json();
      }

      // Retry on timeout, rate limit, or server errors
      if (response.status === 408 || response.status === 429 || response.status >= 500) {
        const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }

      // Don't retry client errors (400, 401, 402, 403) - return for caller to inspect
      return response;
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
      const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  throw new Error(`Request failed after ${maxRetries} attempts`);
}

Retryable status codes: 408, 429, 502, 503
Do not retry: 400, 401, 402, 403

Graceful Degradation

Use model fallbacks:

{
  models: [
    'anthropic/claude-3.5-sonnet',  // Primary
    'openai/gpt-4o',                // Fallback 1
    'google/gemini-2.0-flash'        // Fallback 2
  ]
}

Handle partial failures:

  • Log errors but continue
  • Fall back to simpler features
  • Use cached responses when available
  • Provide degraded experience rather than failing completely

Advanced Features

When to Use Tool Calling

Good use cases:

  • Querying external APIs (weather, stock prices, databases)
  • Performing calculations or data processing
  • Extracting structured data from unstructured text
  • Building agentic systems with multiple steps
  • When decisions require external information

Implementation pattern:

  1. Define tools with clear descriptions and parameters
  2. Send request with tools array
  3. Check if tool_calls present in response
  4. Execute tools with parsed arguments
  5. Send tool results back in a new request
  6. Repeat until model provides final answer

See: references/ADVANCED_PATTERNS.md for complete agentic loop implementation
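The steps above can be sketched as a loop. In this sketch, callModel and the tools registry are stand-ins you would wire to an actual fetch against the chat completions endpoint and to your real tool implementations:

```javascript
// Sketch of an agentic loop. `callModel` stands in for the actual OpenRouter
// request; `tools` maps tool names to local async implementations.
async function runAgent(callModel, tools, messages, maxTurns = 5) {
  for (let turn = 0; turn < maxTurns; turn++) {
    const message = await callModel(messages);
    messages.push(message);

    if (!message.tool_calls) return message.content; // final answer

    for (const call of message.tool_calls) {
      const args = JSON.parse(call.function.arguments);
      const result = await tools[call.function.name](args);
      // Send the tool result back as a 'tool' role message
      messages.push({
        role: 'tool',
        tool_call_id: call.id,
        content: JSON.stringify(result),
      });
    }
  }
  throw new Error('Agent did not finish within maxTurns');
}
```

The maxTurns cap prevents a misbehaving model from looping forever on tool calls.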

When to Use Structured Outputs

Good use cases:

  • API responses (need specific schema)
  • Data extraction (forms, documents)
  • Configuration files (JSON, YAML)
  • Database operations (structured queries)
  • When downstream processing requires specific format

Implementation pattern:

  1. Define JSON Schema for desired output
  2. Set response_format: { type: 'json_schema', json_schema: { ... } }
  3. Instruct model to produce JSON (system or user message)
  4. Validate response against schema
  5. Handle parsing errors gracefully

Add response healing for robustness:

{
  response_format: { /* ... */ },
  plugins: [{ id: 'response-healing' }]
}
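A worked example of the pattern: define a schema, attach it via response_format, and keep a minimal runtime check for the parsed output. The person schema here is hypothetical, and the inline check is a stand-in for a real JSON Schema validator:

```javascript
// A hypothetical extraction schema; strict mode asks the model to match it exactly.
const personSchema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    age: { type: 'number' },
  },
  required: ['name', 'age'],
  additionalProperties: false,
};

const structuredRequest = {
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Extract: "Ada Lovelace, 36"' }],
  response_format: {
    type: 'json_schema',
    json_schema: { name: 'person', strict: true, schema: personSchema },
  },
};

// Minimal runtime check; use a real JSON Schema validator in production.
function isPerson(value) {
  return (
    value !== null &&
    typeof value === 'object' &&
    typeof value.name === 'string' &&
    typeof value.age === 'number'
  );
}
```

Always validate even in strict mode — parsing errors and provider quirks still happen.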

When to Use Web Search

Good use cases:

  • User asks about recent events, news, or current data
  • Need verification of facts
  • Questions with time-sensitive information
  • Topic requires up-to-date information
  • User explicitly requests current information

Simple implementation (variant):

{
  model: 'anthropic/claude-3.5-sonnet:online'
}

Advanced implementation (plugin):

{
  model: 'openrouter/auto',
  plugins: [{
    id: 'web',
    enabled: true,
    max_results: 5,
    engine: 'exa' // or 'native'
  }]
}

When to Use Multimodal Inputs

Images (vision):

  • OCR, image understanding, visual analysis
  • Models: openai/gpt-4o, anthropic/claude-3.5-sonnet, google/gemini-2.5-pro

Audio:

  • Speech-to-text, audio analysis
  • Models with audio support

Video:

  • Video understanding, frame analysis
  • Models with video support

PDFs:

  • Document parsing, content extraction
  • Requires file-parser plugin

Implementation: See references/ADVANCED_PATTERNS.md for multimodal patterns


Best Practices for AI

Default Model Selection

Start with: anthropic/claude-3.5-sonnet or openai/gpt-4o

  • Good balance of quality, speed, cost
  • Strong at most tasks
  • Wide compatibility

Switch based on needs:

  • Need speed → openai/gpt-4o-mini:nitro or google/gemini-2.0-flash
  • Complex reasoning → anthropic/claude-opus-4:thinking
  • Need web search → :online variant
  • Large context → :extended variant
  • Cost-sensitive → :free variant

Default Parameters

{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [...],
  temperature: 0.6,  // Balanced creativity
  max_tokens: 1000,   // Reasonable length
  top_p: 0.95        // Common for quality
}

Adjust based on task:

  • Code: temperature: 0.2
  • Creative: temperature: 1.0
  • Factual: temperature: 0.0-0.3

When to Prefer Streaming

Always prefer streaming when:

  • User-facing (chat, interactive tools)
  • Response length unknown
  • Want progressive feedback
  • Latency matters

Use non-streaming when:

  • Batch processing
  • Need complete response before acting
  • Building API endpoints
  • Very short responses (< 50 tokens)

When to Enable Specific Features

  • Tools: enable when you need external data or actions
  • Structured outputs: enable when response format matters
  • Web search: enable when current information is needed
  • Streaming: enable for user-facing, real-time responses
  • Model fallbacks: enable when reliability is critical
  • Provider routing: enable when you have provider preferences or constraints

Cost Optimization Patterns

Use free models for:

  • Testing and prototyping
  • Low-complexity tasks
  • High-volume, low-value operations

Use routing to optimize:

{
  provider: {
    order: ['openai', 'anthropic'],
    sort: 'price',  // Optimize for cost
    allow_fallbacks: true
  }
}

  • Set max_tokens to prevent runaway responses
  • Use caching via the user and session_id parameters
  • Enable prompt caching when supported

Performance Optimization

Reduce latency:

  • Use :nitro variants for speed
  • Use streaming for perceived speed
  • Set user ID for caching benefits
  • Choose faster models (mini, flash) when quality allows

Increase throughput:

  • Use provider routing with sort: 'throughput'
  • Parallelize independent requests
  • Use streaming to reduce wait time

Optimize for specific metrics:

{
  provider: {
    sort: 'latency'  // or 'price' or 'throughput'
  }
}

Progressive Disclosure

For detailed reference information, consult:

Parameters Reference

File: references/PARAMETERS.md

  • Complete parameter reference (50+ parameters)
  • Types, ranges, defaults
  • Parameter support by model
  • Usage examples

Error Codes Reference

File: references/ERROR_CODES.md

  • All HTTP status codes
  • Error response structure
  • Error metadata types
  • Native finish reasons
  • Retry strategies

Model Selection Guide

File: references/MODEL_SELECTION.md

  • Model families and capabilities
  • Model variants explained
  • Selection criteria by use case
  • Model capability matrix
  • Provider routing preferences

Routing Strategies

File: references/ROUTING_STRATEGIES.md

  • Model fallbacks configuration
  • Provider selection patterns
  • Auto router setup
  • Routing by use case (cost, latency, quality)

Advanced Patterns

File: references/ADVANCED_PATTERNS.md

  • Tool calling with agentic loops
  • Structured outputs implementation
  • Web search integration
  • Multimodal handling
  • Streaming patterns
  • Framework integrations

Working Examples

File: references/EXAMPLES.md

  • TypeScript patterns for common tasks
  • Python examples
  • cURL examples
  • Advanced patterns
  • Framework integration examples

Ready-to-Use Templates

Directory: templates/

  • basic-request.ts - Minimal working request
  • streaming-request.ts - SSE streaming with cancellation
  • tool-calling.ts - Complete agentic loop with tools
  • structured-output.ts - JSON Schema enforcement
  • error-handling.ts - Robust retry logic

Quick Reference

Minimal Request

{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Your prompt' }]
}

With Streaming

{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  stream: true
}

With Tools

{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  tools: [{ type: 'function', function: { name, description, parameters } }],
  tool_choice: 'auto'
}

With Structured Output

{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'system', content: 'Output JSON only...' }],
  response_format: { type: 'json_object' }
}

With Web Search

{
  model: 'anthropic/claude-3.5-sonnet:online',
  messages: [{ role: 'user', content: '...' }]
}

With Model Fallbacks

{
  models: ['anthropic/claude-3.5-sonnet', 'openai/gpt-4o'],
  messages: [{ role: 'user', content: '...' }]
}

Remember: OpenRouter is OpenAI-compatible. Use the OpenAI SDK with baseURL: 'https://openrouter.ai/api/v1' for a familiar experience.
