Agent Benchmark
Evaluates AI agent capability across 12 standardized tasks covering file operations, data processing, system operations, robustness, and code quality, with automatic scoring and report generation.
Run AI agent evaluations via EvalPal: trigger eval runs, check results, and list available evaluations.
This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.
Install the "Evalpal" skill with this command: npx skills add evalpal
This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
Related skills, matched by shared tags or category signals:
Safety monitoring and tripwire detection for AI agents. Protects against unauthorized file access, dangerous commands, and excessive activity. Auto-halts on...
Generate detailed QA test plans with coverage matrices, test cases, bug severity, automation ROI, release checklists, and metrics dashboards for engineering...
Multi-agent validation framework — 6 independent AI critics evaluate artifacts against rubrics with evidence-grounded findings.