AI Rankings Leaderboard Skill
Description
A skill for querying AI model and application rankings from several public sources. It covers LLM performance, popularity, pricing, and value metrics.
Data Sources
| Source | URL | Focus |
|---|---|---|
| Artificial Analysis | https://artificialanalysis.ai/ | Intelligence Index, Speed, Price benchmarks |
| LLM Leaderboard | https://artificialanalysis.ai/leaderboards/models | Model comparison (100+ models) |
| LLM API Providers | https://artificialanalysis.ai/leaderboards/providers | API Provider comparison (500+ endpoints) |
| Image & Video Leaderboards | https://artificialanalysis.ai/ (Image & Video section) | Image/Video model ELO rankings |
| OpenRouter Rankings | https://openrouter.ai/rankings | Model usage & popularity |
| OpenRouter Apps | https://openrouter.ai/apps | AI applications ranking |
| OpenRouter Models | https://openrouter.ai/models | All available models with pricing |
| OpenRouter Free Models | https://openrouter.ai/models?q=free | Free models only |
| Pinchbench | https://pinchbench.com/ | Model benchmark (Success Rate, Speed, Cost, Value) |
Features
1. Artificial Analysis LLM Leaderboard
Intelligence Index
- Artificial Analysis Intelligence Index v4.0: Comprehensive model intelligence score
- 10 evaluation dimensions: Multiple independent assessment criteria
- Frontier Models: Top intelligence models (Gemini 3.1 Pro, GPT-5.4, Claude Opus 4.6, etc.)
- Reasoning Models: Identifies models with reasoning capabilities
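Artificial Analysis Coding Index
- URL: https://artificialanalysis.ai/?intelligence=coding-index
- Evaluates model performance on programming tasks
- Aggregates multiple code evaluation benchmarks
Artificial Analysis Agentic Index
- URL: https://artificialanalysis.ai/?intelligence=agentic-index
- Evaluates a model's autonomous agent capabilities
- Covers tool use, multi-step reasoning, task completion, and similar skills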
Performance Metrics
| Metric | Description |
|---|---|
| Intelligence Index | Overall model intelligence score (higher is better) |
| Speed | Output tokens per second (tokens/s) |
| Blended Price | Combined USD per million tokens (3:1 input/output ratio) |
| Input Price | Price per million input tokens (USD) |
| Output Price | Price per million output tokens (USD) |
| Latency (TTFT) | Time to First Token in seconds |
| Context Window | Maximum context length supported |
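As a rough illustration of the Blended Price row, the sketch below assumes the 3:1 ratio means a weighted average of three parts input price to one part output price. The function name `blended_price` and the sample prices are hypothetical, not values taken from Artificial Analysis.

```python
def blended_price(input_usd_per_m: float, output_usd_per_m: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output prices, in USD per 1M tokens.

    Assumption: "3:1 input/output ratio" means input is weighted 3, output 1.
    """
    total_weight = input_weight + output_weight
    weighted_sum = input_weight * input_usd_per_m + output_weight * output_usd_per_m
    return weighted_sum / total_weight


# Hypothetical prices: $1.00 per 1M input tokens, $5.00 per 1M output tokens.
# (3 * 1.00 + 1 * 5.00) / 4 = 2.00
example = blended_price(1.00, 5.00)
```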
Model Comparison Table Columns
| Column | Description |
|---|---|
| Features | Model features (reasoning badge, etc.) |
| Model | Model name with logo |
| Context Window | Max context length |
| Creator | Provider/Company |
| Intelligence Index | AI intelligence score |
| Blended USD/1M Tokens | Combined input/output price |
| Median Tokens/s | Median output speed |
| Latency First Chunk (s) | Time to first token |
| Further Analysis | Link to detailed analysis |
Filters Available
| Filter | Options |
|---|---|
| Frontier Models | On/Off |
| Open Weights | On/Off (open-weights models) |
| Size Class | Small, Medium, Large, etc. |
| Reasoning | On/Off (filter for reasoning models) |
| Model Status | Current, Preview, Discontinued |
2. Artificial Analysis LLM API Providers Leaderboard
Comparison of 500+ AI Model Endpoints
| Column | Description |
|---|---|
| API Provider | Provider name (Cerebras, Groq, Fireworks, etc.) |
| Model | Model name |
| Context Window | Max context length |
| License | Model license |
| Intelligence Index | Model intelligence score |
| Blended USD/1M Tokens | Combined price |
| Median Tokens/s | Output speed |
| Median First Chunk (s) | Latency (TTFT) |
| Total Response (s) | End-to-end response time |
| Reasoning Time (s) | Reasoning model computation time |
| End-to-End Response Time | Full request-response cycle |
Key Providers
- Cerebras
- Eigen AI
- Fireworks
- SambaNova
- Together.ai
- Hyperbolic
- Nebius Fast
- Google Vertex
- Groq
- Azure OpenAI
- AWS Bedrock
- OpenAI Direct
- Anthropic Direct
- And 10+ more...
3. Artificial Analysis Image & Video Leaderboards
Text-to-Image Leaderboard
- ELO scores from blind preference votes
- 95% confidence intervals displayed
- Top models: GPT Image 1.5, Imagen 4 Ultra, Gemini Image models, etc.
Video Leaderboards
| Category | Description |
|---|---|
| Text to Video (with Audio) | Text generates video with sound |
| Text to Video (without Audio) | Text generates silent video |
| Image to Video (with Audio) | Image + text generates video with sound |
| Image to Video (without Audio) | Image + text generates silent video |
| Image Editing | Edit existing images with AI |
Evaluation Method
- ELO scoring system (blind preference voting)
- 95% confidence intervals
- Real user preference data
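To make the ELO idea concrete, here is a minimal sketch of a textbook Elo update from a single blind preference vote. The leaderboard's exact rating procedure is not described here, so the update rule, the K-factor of 32, and the starting ratings are all assumptions for illustration only.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, a_won: bool,
                   k: float = 32.0) -> Tuple[float, float]:
    """Apply one vote: the winner gains what the loser gives up."""
    actual = 1.0 if a_won else 0.0
    delta = k * (actual - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two hypothetical models start level at 1200; model A wins one vote.
new_a, new_b = update_ratings(1200.0, 1200.0, a_won=True)
```

Because equal ratings give an expected score of 0.5, a single win moves each model by half the K-factor in opposite directions.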
4. OpenRouter Model Rankings
- LLM Leaderboard: Overall model usage rankings
- Market Share: Market share by model provider
- Categories: Rankings by use case
- Languages: Natural language support rankings
- Programming: Programming language support
- Context Length: Long context handling
- Tool Calls: Tool calling capabilities
- Images: Image processing volume
5. OpenRouter App Rankings
- Most Popular: Top apps by token usage
- Trending: Fastest growing apps this week
- Categories: Coding Agents, Productivity, Creative, Entertainment
6. OpenRouter Model Catalog
- All Models: Complete list of available models on OpenRouter
- Free Models: Models with $0 pricing (free to use)
- Model ID: The exact model parameter to use when calling the OpenRouter API
- Pricing Info: Input/output token pricing
7. Pinchbench Benchmarks
- Success Rate: Task completion success percentage
- Speed: Response time performance
- Cost: Cost per run analysis
- Value: Price-performance ratio
Trigger Keywords
General AI Rankings
- "AI rankings" / "AI 排行榜"
- "LLM leaderboard" / "LLM 排行"
- "model comparison" / "模型对比"
- "best AI models" / "最好的 AI 模型"
- "AI apps ranking" / "AI 应用排行"
- "model benchmark" / "模型评测"
Artificial Analysis Specific
- "Artificial Analysis" / "artificialanalysis"
- "AI intelligence index" / "AI 智力指数"
- "intelligence index" / "智力指数"
- "模型速度排行" / "speed ranking"
- "模型价格对比" / "price comparison"
- "fastest models" / "最快模型"
- "cheapest models" / "最便宜模型"
- "tokens per second" / "t/s" / "tokens/s"
- "latency" / "TTFT" / "首 token 延迟"
- "Artificial Analysis Intelligence Index"
- "AAII" / "AA Intelligence"
- "API providers" / "API 提供商"
- "LLM providers" / "LLM 提供商"
- "Cerebras" / "Groq" / "Fireworks"
- "open weights" / "开源权重"
- "reasoning models" / "推理模型"
- "elo score" / "ELO 评分"
- "image arena" / "图生图"
- "text to image" / "文生图"
- "text to video" / "文生视频"
- "image to video" / "图生视频"
OpenRouter Specific
- "free models" / "免费模型" / "free AI models"
- "OpenRouter models" / "OpenRouter 免费模型"
- "OpenRouter rankings" / "OpenRouter 排行"
- "Pinchbench"
- "OpenRouter model ID" / "OpenRouter 模型 ID"
- "查找 OpenRouter" / "OpenRouter 上的模型"
- "model ID for [模型名]" / "[模型名] model ID"
- "OpenRouter 上 [模型名]" / "OpenRouter [模型名] 模型"
- "OpenRouter model parameter"
- "调用量排行" / "使用量排行" / "top models" / "top 模型"
- "OpenRouter 调用量" / "OpenRouter 使用量"
Runtime Tools
This skill requires:
- execute_command: Execute shell commands and scripts
- use_skill: Load the browser-automation skill for JavaScript-rendered pages
- web_fetch: Fallback for simple HTTP requests
Installation
Required CLI Dependency: agent-browser
The agent-browser CLI must be installed before using this skill. Install via:
npm install -g agent-browser
# or
npx agent-browser --version
This skill calls agent-browser via subprocess with hardcoded argument arrays (no shell injection risk).
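The argument-array approach can be sketched as follows. This is not the skill's actual script: the function names `run_cli` and `fetch_page_text` and the injectable `runner` parameter are hypothetical, and only the agent-browser subcommands already shown in this document are used.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def run_cli(args: Sequence[str]) -> str:
    """Run a command from an argument list (no shell involved) and return stdout."""
    result = subprocess.run(list(args), capture_output=True, text=True,
                            check=True, timeout=120)
    return result.stdout


def fetch_page_text(url: str, runner: Runner = run_cli) -> str:
    """Open a page, wait for it to settle, and return its rendered text."""
    # The URL is passed as a single list element, so it is never shell-parsed.
    runner(["agent-browser", "open", url])
    try:
        runner(["agent-browser", "wait", "--load", "networkidle"])
        return runner(["agent-browser", "eval", "document.body.innerText"])
    finally:
        # Always close the browser, even if waiting or evaluation fails.
        runner(["agent-browser", "close"])
```

Passing a list to `subprocess.run` without `shell=True` means no shell ever interprets the arguments, which is what rules out shell injection. The `runner` parameter exists only so the flow can be exercised without a real browser.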
Note on browser eval: The agent-browser eval command executes document.body.innerText or similar DOM queries on the remote page to extract rendered content. This is standard web scraping behavior for JavaScript-rendered pages and is limited to reading page content only.
Browser Automation Support
For JavaScript-rendered pages (OpenRouter Rankings, Artificial Analysis), this skill uses browser automation:
- Load the browser-automation skill first:
  use_skill("browser-automation")
- Navigate to the rankings page:
  agent-browser open "https://artificialanalysis.ai/leaderboards/models"
  agent-browser wait --load networkidle
  agent-browser eval "document.body.innerText"
- Key pages requiring a browser:
  - https://artificialanalysis.ai/leaderboards/models - LLM comparison (100+ models)
  - https://artificialanalysis.ai/leaderboards/providers - API providers (500+ endpoints)
  - https://artificialanalysis.ai/ - Image & Video leaderboards
  - https://openrouter.ai/rankings - Model usage rankings (JS rendered)
  - https://openrouter.ai/apps - App rankings (JS rendered)
Artificial Analysis Page Structure
LLM Leaderboard Page (/leaderboards/models):
LLM Leaderboard - Comparison of over 100 AI models
├── HIGHLIGHTS section
│ ├── Intelligence: Gemini 3.1 Pro Preview, GPT-5.4 (xhigh)
│ ├── Speed: Mercury 2 (943 t/s), NVIDIA Nemotron 3 Super (462 t/s)
│ └── Price: Gemma 3n E4B (cheapest)
├── Filters:
│ ├── Frontier Models | Open Weights | Size Class | Reasoning | Model Status
├── Comparison table columns:
│ ├── Features | Model | Context Window | Creator
│ ├── Intelligence Index | Blended USD/1M | Median Tokens/s | Latency
│ └── Further Analysis
└── Key definitions (expandable)
├── Context window
├── Output Speed (tokens/s)
├── Latency (Time to First Token)
├── Price (3:1 blended)
├── Output Price
└── Input Price
LLM API Providers Page (/leaderboards/providers):
LLM API Providers Leaderboard - 500+ endpoints
├── Filters (same as LLM Leaderboard)
├── Comparison table columns:
│ ├── API Provider | Model | Context Window | License
│ ├── Intelligence Index | Blended USD/1M | Median Tokens/s
│ ├── Median First Chunk (s) | Total Response (s) | Reasoning Time (s)
│ └── Further Analysis
└── 24+ Providers: Cerebras, Groq, Fireworks, SambaNova, etc.
Image & Video Leaderboards (on homepage):
Image & Video Leaderboards
├── Tabs:
│ ├── Text to Image (ELO scores, 95% CI)
│ ├── Image Editing
│ ├── Text to Video (with Audio)
│ ├── Text to Video (without Audio)
│ ├── Image to Video (with Audio)
│ └── Image to Video (without Audio)
└── Top models with ELO rankings
OpenRouter Page Structure (Reminder)
OpenRouter Rankings Page (/rankings):
https://openrouter.ai/rankings
├── Top Models (chart header)
├── LLM Leaderboard ← THIS is the usage ranking (parse this!)
│ ├── 1. MiniMax M2.5 (1.75T tokens)
│ ├── 2. Step 3.5 Flash (1.34T tokens)
│ └── [Show more] button
├── Market Share (different metric - don't mix!)
└── ...
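One way to keep the LLM Leaderboard and Market Share sections apart when parsing extracted page text is sketched below. It assumes the text contains entries shaped like "1. MiniMax M2.5 (1.75T tokens)" under a line starting with "LLM Leaderboard"; the real rendered text may be laid out differently, and `parse_usage_ranking` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Assumed entry shape: "<rank>. <model name> (<number><unit> tokens)"
_ENTRY = re.compile(r"^(\d+)\.\s+(.+?)\s+\(([\d.]+)([KMBT]?)\s+tokens\)$")
_UNITS = {"": 1.0, "K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}


def parse_usage_ranking(text: str) -> List[Tuple[int, str, float]]:
    """Return (rank, model, tokens) rows found inside the LLM Leaderboard section."""
    rows: List[Tuple[int, str, float]] = []
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("LLM Leaderboard"):
            in_section = True
        elif stripped.startswith("Market Share"):
            in_section = False  # different metric, so stop collecting
        elif in_section:
            match = _ENTRY.match(stripped)
            if match:
                rank, name, number, unit = match.groups()
                rows.append((int(rank), name, float(number) * _UNITS[unit]))
    return rows
```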
Usage Examples
Query Artificial Analysis Intelligence Index
User: "What are the top models on Artificial Analysis Intelligence Index?"
-> Fetches Artificial Analysis LLM Leaderboard and displays top models by intelligence
Query Model Speed Rankings
User: "Which AI models are the fastest in terms of output speed?"
-> Fetches Artificial Analysis data and lists models by tokens/second
Query API Providers
User: "Compare LLM API providers like Cerebras and Groq"
-> Fetches Artificial Analysis Providers Leaderboard and compares speed/price
Query Image/Video Models
User: "What are the best text-to-image models?"
-> Fetches Artificial Analysis Image Arena leaderboard with ELO scores
Query Model Rankings (OpenRouter)
User: "What are the top 10 AI models right now?"
-> Fetches OpenRouter rankings and displays top models with usage stats
Query Free Models
User: "What free models are available on OpenRouter?"
-> Fetches https://openrouter.ai/models?q=free and lists all free models with their model IDs
Get Model ID for API Calls
User: "What's the model ID for GPT-4o on OpenRouter?"
-> Fetches https://openrouter.ai/models and returns the exact model parameter to use
Compare Model Performance
User: "Compare GPT-4 and Claude on Pinchbench"
-> Fetches Pinchbench data and compares success rate, speed, cost
Output Format
Artificial Analysis Intelligence Index
==================================================
Artificial Analysis Intelligence Index
==================================================
Top 10 Models by Intelligence:
| Rank | Model | Intelligence | Speed (t/s) | Price ($/M) |
|------|-------|--------------|-------------|-------------|
| 1 | Gemini 3.1 Pro Preview | 57 | ~50 | $1.25 |
| 2 | GPT-5.4 (xhigh) | 57 | ~60 | $15.00 |
| 3 | Claude Opus 4.6 (max) | 53 | ~80 | $18.00 |
| 4 | Claude Sonnet 4.6 (max) | 52 | ~85 | $4.50 |
| 5 | GLM-5 | 50 | ~45 | $0.50 |
...
Fastest Models: Mercury 2 (943 t/s), NVIDIA Nemotron 3 Super (462 t/s)
Best Price: Gemma 3n E4B, Granite 4.0 H Small
Data Source: Artificial Analysis (artificialanalysis.ai)
==================================================
API Providers Comparison
==================================================
LLM API Providers Leaderboard
==================================================
| Provider | Model | Speed (t/s) | Price ($/M) | Latency (s) |
|----------|-------|-------------|-------------|-------------|
| Cerebras | Llama 3.1 70B | 2143 | $0.12 | 0.08 |
| Groq | Llama 3.1 70B | 943 | $0.59 | 0.15 |
| Fireworks | Llama 3.1 70B | 562 | $0.90 | 0.22 |
...
Data Source: Artificial Analysis Providers
==================================================
Image Arena (ELO Rankings)
==================================================
Text-to-Image Leaderboard (ELO)
==================================================
| Rank | Model | ELO Score | 95% CI |
|------|-------|-----------|--------|
| 1 | GPT Image 1.5 (high) | 1342 | ±12 |
| 2 | Imagen 4 Ultra | 1289 | ±15 |
| 3 | Gemini 3.1 Flash Image | 1245 | ±18 |
...
Data Source: Artificial Analysis Image Arena
==================================================
OpenRouter Model Rankings
==================================================
AI Model Rankings (OpenRouter)
==================================================
Top 10 Models by Usage:
| Rank | Model | Provider | Tokens | Growth |
|------|-------|----------|--------|--------|
| 1 | MiniMax M2.5 | minimax | 1.75T | +15% |
| 2 | Step 3.5 Flash | step | 1.34T | +22% |
...
Data Source: OpenRouter (Weekly Rankings)
==================================================
Free Models List
==================================================
Free Models on OpenRouter
==================================================
| Model Name | Model ID (for API) | Context |
|------------|-------------------|---------|
| GPT-4o Mini | openai/gpt-4o-mini | 128K |
| Llama 3.3 70B | meta-llama/llama-3.3-70b-instruct | 128K |
| DeepSeek V3 | deepseek/deepseek-chat | 64K |
...
💡 Usage: Set model parameter to the Model ID value
Example: model="openai/gpt-4o-mini"
Data Source: OpenRouter Models
==================================================
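The report layouts above share one shape: a rule line, a title, a pipe table, and a data-source footer. A minimal sketch of a renderer for that shape follows; `render_report` is a hypothetical helper, and the 50-character rule width is read off the examples above rather than from any specification.

```python
from typing import Sequence


def render_report(title: str, headers: Sequence[str],
                  rows: Sequence[Sequence[object]], source: str) -> str:
    """Render a titled pipe table framed by '=' rules, with a source footer."""
    rule = "=" * 50
    lines = [rule, title, rule]
    lines.append("| " + " | ".join(headers) + " |")
    # Separator dashes span each header plus its two padding spaces.
    lines.append("|" + "|".join("-" * (len(h) + 2) for h in headers) + "|")
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    lines.append(f"Data Source: {source}")
    lines.append(rule)
    return "\n".join(lines)
```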
Execution Instructions
Method 1: Browser Automation for Rankings (Recommended)
Artificial Analysis and OpenRouter rankings pages require JavaScript rendering:
# Step 1: Load browser-automation skill (REQUIRED)
use_skill("browser-automation")
# Step 2: Navigate to Artificial Analysis LLM Leaderboard
agent-browser open "https://artificialanalysis.ai/leaderboards/models"
agent-browser wait --load networkidle
# Step 3: Wait for content to load, then extract
agent-browser wait 3000
agent-browser eval "document.body.innerText"
# Step 4: Close browser when done
agent-browser close
Method 2: Python Script for OpenRouter Model Catalog
Use the query_leaderboard.py script to fetch model data via OpenRouter API (no JavaScript needed):
# List free models
python3 "${SKILL_DIR}/query_leaderboard.py" --free
# Search models by name
python3 "${SKILL_DIR}/query_leaderboard.py" -s glm
python3 "${SKILL_DIR}/query_leaderboard.py" -s gpt
# Get specific model info
python3 "${SKILL_DIR}/query_leaderboard.py" --id openai/gpt-4o
# List all models with limit
python3 "${SKILL_DIR}/query_leaderboard.py" --all --limit 50
Method 3: Web Fetch (Fallback)
When browser/Python is not available, use web_fetch:
- For Artificial Analysis: Fetch https://artificialanalysis.ai/leaderboards/models
- For the OpenRouter model catalog: Use the OpenRouter API at https://openrouter.ai/api/v1/models
- For benchmarks: Fetch https://pinchbench.com/
Note: Rankings pages require JavaScript rendering - use browser automation (Method 1).
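For the OpenRouter model catalog, a plain HTTP request is enough. The sketch below is not the bundled query_leaderboard.py; it assumes the endpoint returns JSON of the form {"data": [{"id": ..., "pricing": {"prompt": "0", "completion": "0"}}]}, with prices encoded as strings, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List

MODELS_URL = "https://openrouter.ai/api/v1/models"


def fetch_models(url: str = MODELS_URL, timeout: float = 30.0) -> List[Dict[str, Any]]:
    """Download the model catalog (assumed shape: {"data": [...]})."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload.get("data", [])


def is_free(model: Dict[str, Any]) -> bool:
    """Treat a model as free when both prompt and completion prices are zero."""
    pricing = model.get("pricing") or {}
    try:
        prompt = float(pricing.get("prompt", "1"))
        completion = float(pricing.get("completion", "1"))
    except (TypeError, ValueError):
        return False  # unparseable price: do not claim the model is free
    return prompt == 0.0 and completion == 0.0


def free_model_ids(models: List[Dict[str, Any]]) -> List[str]:
    """Return the sorted IDs of all free models."""
    return sorted(m["id"] for m in models if "id" in m and is_free(m))


# Usage (performs a network request):
#   print(free_model_ids(fetch_models()))
```

Missing pricing fields default to a non-zero price so that incomplete records are never reported as free.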
Notes
- Data is updated regularly (Artificial Analysis, OpenRouter weekly, Pinchbench near real-time)
- Artificial Analysis Intelligence Index is based on 10 independent evaluations
- ELO scores are from blind preference voting with 95% confidence intervals
- Pinchbench disclaimer: "For entertainment purposes only, should not be relied upon for critical decisions"
- OpenRouter rankings reflect actual usage data from millions of users
- Free models have $0.00 pricing on OpenRouter
- Model ID format: Use the exact string (e.g., openai/gpt-4o-mini) as the model parameter in API calls
Artificial Analysis API Patterns
Based on observed page structure, Artificial Analysis provides:
- Model comparison data: https://artificialanalysis.ai/leaderboards/models
- Provider comparison: https://artificialanalysis.ai/leaderboards/providers
- Image/Video arenas: Embedded on homepage with tab navigation
- Model-specific provider data: /models/{model-id}/providers endpoint pattern
Example model providers API:
/models/gpt-oss-120b/providers
/models/gemini-3-1-pro-preview/providers
/models/claude-opus-4-6-adaptive/providers
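The path pattern can be turned into a full URL with a small helper. This sketch assumes the paths are relative to https://artificialanalysis.ai, which is inferred from the other URLs in this document; `providers_url` is a hypothetical name, and the model slug must be copied from the site rather than derived from a display name.

```python
from urllib.parse import quote

BASE_URL = "https://artificialanalysis.ai"  # assumed base for the path pattern


def providers_url(model_slug: str) -> str:
    """Build the per-model providers page URL from a model slug."""
    # Percent-encode the slug so it always stays a single path segment.
    return f"{BASE_URL}/models/{quote(model_slug, safe='')}/providers"


url = providers_url("gpt-oss-120b")
```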
OpenRouter API Usage
When calling the OpenRouter API (for chat completions), pass the Model ID as the model parameter. Note: This skill's scripts (fetch_rankings.py, query_leaderboard.py) only read public leaderboard data and do NOT require API authentication.
# The "model" value is the Model ID returned by this skill
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
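The same request can be sketched in Python with only the standard library. The helper names `build_chat_request` and `send_chat_request` are hypothetical; the URL, headers, and body fields mirror the curl example, and the response is assumed to be JSON.

```python
import json
import urllib.request

CHAT_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_chat_request(model_id: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the given Model ID."""
    body = json.dumps({
        "model": model_id,  # Model ID as listed by this skill
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(CHAT_URL, data=body, headers=headers, method="POST")


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (network call)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (requires a real key, e.g. read from the OPENROUTER_API_KEY variable):
#   import os
#   req = build_chat_request("openai/gpt-4o-mini", "Hello", os.environ["OPENROUTER_API_KEY"])
#   reply = send_chat_request(req)
```

Building the JSON body with `json.dumps` avoids the hand-written quoting that shell one-liners need.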