zouroboros-bench

Benchmark harness for AI memory systems. Runs the LongMemEval, LoCoMo, and ConvoMem datasets against any memory backend through the zouroboros-memory CLI, and includes a Mimir judge for catching architectural drift.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy the line below and send it to your AI assistant so it can learn the skill:

Install skill "zouroboros-bench" with this command: npx skills add marlandoj.zo.computer/zouroboros-bench

Usage

Install: npm install zouroboros-bench zouroboros-memory

Run all benchmarks

npx zouroboros-bench --limit 50

Run specific benchmark

npx zouroboros-bench --benchmarks longmemeval --limit 100 --judge

Generate report

npx zouroboros-bench-report --runs ./data/runs/
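If you drive the harness from a Node script instead of typing the commands, a minimal sketch like the following assembles the argument lists for the commands above. It uses only the flags shown in this section (`--benchmarks`, `--limit`, `--judge`, `--runs`). The helper names `buildBenchArgs` and `buildReportArgs` are hypothetical and not part of the package. Because this page does not say whether `--benchmarks` accepts more than one name, the sketch passes a single name.

```typescript
// Hypothetical helpers (not part of zouroboros-bench): they only assemble
// argv arrays for the documented commands; nothing is executed here.

interface BenchOptions {
  benchmarks?: string; // e.g. "longmemeval"; omit to run all benchmarks
  limit?: number;      // value passed to --limit
  judge?: boolean;     // adds --judge when true
}

function buildBenchArgs(opts: BenchOptions): string[] {
  const args: string[] = ["zouroboros-bench"];
  if (opts.benchmarks !== undefined) {
    args.push("--benchmarks", opts.benchmarks);
  }
  if (opts.limit !== undefined) {
    args.push("--limit", String(opts.limit));
  }
  if (opts.judge) {
    args.push("--judge");
  }
  return args;
}

function buildReportArgs(runsDir: string): string[] {
  return ["zouroboros-bench-report", "--runs", runsDir];
}

// Mirrors "npx zouroboros-bench --benchmarks longmemeval --limit 100 --judge"
const benchArgs = buildBenchArgs({ benchmarks: "longmemeval", limit: 100, judge: true });
console.log(["npx", ...benchArgs].join(" "));
// prints "npx zouroboros-bench --benchmarks longmemeval --limit 100 --judge"
console.log(["npx", ...buildReportArgs("./data/runs/")].join(" "));
// prints "npx zouroboros-bench-report --runs ./data/runs/"
```

To actually run a command, you could pass the array to Node's `child_process.spawnSync("npx", benchArgs, { stdio: "inherit" })`. Keeping argument construction separate from execution makes it easy to log or unit-test the exact command line before any benchmark starts.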

Environment Variables

  • ZOUROBOROS_MEMORY_CLI — Path to memory CLI binary (default: zouroboros-memory)
  • ZOUROBOROS_MEMORY_DB — Path to the SQLite database used for benchmarks
  • OPENAI_API_KEY — Required for the GPT-4o judge
  • OLLAMA_URL — URL of the Ollama server for a local LLM (default: http://localhost:11434)
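The defaults above can be captured in a small resolver. This is a minimal sketch, not the package's own configuration code: `resolveBenchEnv`, `assertGpt4oJudgeReady`, and the `./bench.sqlite` path are hypothetical, and treating an empty string the same as an unset variable is an assumption. The function takes an environment-like object rather than reading `process.env` directly, so it can be tested without touching the real environment.

```typescript
// Hypothetical resolver (not part of zouroboros-bench): applies the defaults
// documented in the list above to an environment-like object.

type Env = Record<string, string | undefined>;

interface BenchEnv {
  memoryCli: string;     // ZOUROBOROS_MEMORY_CLI, default "zouroboros-memory"
  memoryDb?: string;     // ZOUROBOROS_MEMORY_DB, no documented default
  openaiApiKey?: string; // OPENAI_API_KEY, only needed for the GPT-4o judge
  ollamaUrl: string;     // OLLAMA_URL, default "http://localhost:11434"
}

// Assumption: unset and empty/whitespace-only values are treated the same way.
function readVar(env: Env, name: string): string | undefined {
  const value = env[name];
  return value !== undefined && value.trim() !== "" ? value : undefined;
}

function resolveBenchEnv(env: Env): BenchEnv {
  return {
    memoryCli: readVar(env, "ZOUROBOROS_MEMORY_CLI") ?? "zouroboros-memory",
    memoryDb: readVar(env, "ZOUROBOROS_MEMORY_DB"),
    openaiApiKey: readVar(env, "OPENAI_API_KEY"),
    ollamaUrl: readVar(env, "OLLAMA_URL") ?? "http://localhost:11434",
  };
}

// Fails fast when the GPT-4o judge is wanted but no API key is available.
function assertGpt4oJudgeReady(cfg: BenchEnv): void {
  if (cfg.openaiApiKey === undefined) {
    throw new Error("OPENAI_API_KEY is required for the GPT-4o judge");
  }
}

// "./bench.sqlite" is a hypothetical database path for illustration.
const cfg = resolveBenchEnv({ ZOUROBOROS_MEMORY_DB: "./bench.sqlite" });
console.log(cfg.memoryCli); // prints "zouroboros-memory"
console.log(cfg.ollamaUrl); // prints "http://localhost:11434"
```

In a real script you would call `resolveBenchEnv(process.env)`. Checking the API key before launching a judged run surfaces a missing credential immediately, instead of partway through a long benchmark.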

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

Vnsh Skill

Securely share files using encrypted, expiring vnsh.dev links with the vnsh CLI for uploading and decrypting shared content.

Registry Source · Recently Updated
Coding

Notion

Notion API for creating and managing pages, databases, blocks, relations, rollups, and multi-workspace profiles via the notioncli CLI tool.

Registry Source · Recently Updated
Coding

Lybic Sandbox

Lybic Sandbox is a cloud sandbox built for agents and automation workflows. Think of it as a disposable cloud computer you can spin up on demand. Agents can perform GUI actions such as seeing the screen, clicking, typing, and handling pop-ups, which makes it a good fit for legacy apps and complex flows where APIs are missing or incomplete. It is designed for control and observability: you can monitor execution in real time, stop it when needed, and use logs and replay to debug, reproduce runs, and evaluate reliability. For long-running tasks, iterative experimentation, or sensitive environments, sandboxed execution helps reduce risk and operational overhead.

Registry Source · Recently Updated
1.2K0aenjoy
Coding

Homeassistant Skill

Control Home Assistant devices and automations via REST API. 25 entity domains including lights, climate, locks, presence, weather, calendars, notifications, scripts, and more. Use when the user asks about their smart home, devices, or automations.

Registry Source · Recently Updated
5.1K7anotb