# Claude Intel Monitor
Track LLM intelligence degradation over time. Runs a 30-question benchmark daily and detects when models get dumber. Supports Anthropic Claude, OpenAI GPT, and DeepSeek.
## When to Use
- Claude or GPT seems worse today than yesterday
- You suspect a silent model downgrade
- You want proof before switching providers
- You need a baseline for comparing models (DeepSeek scored 90.0%)
## Quick Start

```bash
# Run the full benchmark suite (30 questions)
claude-intel-monitor run --provider anthropic

# Compare against historical baselines
claude-intel-monitor compare --baseline 2026-04-15

# Show the score trend over time
claude-intel-monitor trend --days 30

# Alert if the score drops below a threshold
claude-intel-monitor alert --threshold 80
```
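The alert check boils down to simple percentage arithmetic. A minimal sketch of that logic, assuming the helper names below (they are illustrative, not the project's actual internals):

```python
# Hypothetical sketch of the alert logic: convert a raw correct-answer
# count into a percentage and flag runs below the alert threshold.

def score_percent(correct: int, total: int = 30) -> float:
    """Return the benchmark score as a percentage of the 30-question suite."""
    return round(100.0 * correct / total, 1)

def should_alert(correct: int, threshold: float = 80.0, total: int = 30) -> bool:
    """True when the day's score drops below the alert threshold."""
    return score_percent(correct, total) < threshold

print(should_alert(27))  # False: 27/30 = 90.0%, above the 80% default
print(should_alert(23))  # True: 23/30 = 76.7%, below threshold
```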
## Benchmark Categories
| Category | Questions | Example |
|---|---|---|
| Math | 10 | Calculus, probability, number theory |
| Reasoning | 10 | Logic puzzles, formal deduction |
| Code | 10 | Algorithm design, debugging, refactoring |
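The question format itself isn't documented above; as a sketch, each item might pair a category and prompt with a reference answer used for grading. The schema below is an assumption for illustration, not the project's actual data model:

```python
from dataclasses import dataclass

@dataclass
class BenchmarkQuestion:
    category: str  # one of "math", "reasoning", "code"
    prompt: str
    expected: str  # reference answer used for grading

# Two illustrative entries; the real suite has 10 questions per category.
SUITE = [
    BenchmarkQuestion("math", "What is 7 * 8?", "56"),
    BenchmarkQuestion(
        "reasoning",
        "If all bloops are razzies and all razzies are lazzies, are all bloops lazzies?",
        "yes",
    ),
]
```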
## Baseline Results
| Model | Score | Date |
|---|---|---|
| DeepSeek V3 | 90.0% (27/30) | 2026-04-17 |
| Claude 3.5 Sonnet | 93.3% (28/30) | 2026-03-01 |
| GPT-4o | 90.0% (27/30) | 2026-03-01 |
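The `trend` view presumably smooths day-to-day noise in daily scores. A simple moving average is one way to do that; the function and the sample data below are made up for illustration, not the tool's implementation:

```python
# Illustrative trend smoothing: a simple moving average over daily scores.

def moving_average(scores: list[float], window: int = 7) -> list[float]:
    """Average each score with up to `window - 1` preceding days."""
    out = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - window + 1): i + 1]
        out.append(round(sum(chunk) / len(chunk), 1))
    return out

# Made-up daily scores showing a gradual decline.
daily = [93.3, 90.0, 90.0, 86.7, 83.3]
print(moving_average(daily, window=3))
```

A sustained drop in the smoothed series, rather than a single bad day, is the signal worth acting on.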
## Install

```bash
git clone https://github.com/minirr890112-byte/claude-intel-monitor.git
cd claude-intel-monitor
pip install -e .
```
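Since the tool is meant to run daily, a cron entry is one way to schedule it. This fragment assumes `claude-intel-monitor` is on the cron user's `PATH` and the provider API key is available in the environment:

```shell
# crontab -e: run the benchmark every day at 06:00, then check the alert
# threshold. Assumes the CLI and API keys are visible to cron's environment.
0 6 * * * claude-intel-monitor run --provider anthropic && claude-intel-monitor alert --threshold 80
```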
## Source

[github.com/minirr890112-byte/claude-intel-monitor](https://github.com/minirr890112-byte/claude-intel-monitor)