evaluation-benchmark

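Agent evaluation and testing assistant. It designs evaluation metrics, builds test sets, and generates reports. Use cases: (1) designing evaluation metrics, (2) building test sets, (3) running evaluation tests, (4) analyzing evaluation results.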

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "evaluation-benchmark" with this command: npx skills add sky-lv/skylv-evaluation-benchmark

Evaluation & Benchmark: Agent Evaluation Assistant

Overview

Evaluates and tests agent performance.

Usage

1. Evaluation metrics

User: How do I evaluate an agent's effectiveness?
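
The listing does not say which metrics the skill proposes, so the following is only a minimal sketch of two common ones, task success rate and mean latency. The `RunRecord` type, its field names, and the sample values are hypothetical and are not taken from SKILL.md.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunRecord:
    """One agent run on one task (hypothetical schema, for illustration)."""
    task_id: str
    passed: bool
    latency_s: float


def success_rate(records):
    """Fraction of runs that passed; 0.0 for an empty list."""
    if not records:
        return 0.0
    return sum(r.passed for r in records) / len(records)


def mean_latency(records):
    """Average wall-clock latency in seconds; 0.0 for an empty list."""
    if not records:
        return 0.0
    return mean(r.latency_s for r in records)


# Made-up sample data, not real measurements.
records = [
    RunRecord("t1", True, 1.2),
    RunRecord("t2", False, 3.4),
    RunRecord("t3", True, 2.0),
]
print(f"success rate: {success_rate(records):.2f}")
print(f"mean latency: {mean_latency(records):.2f}s")
```

Which metrics matter depends on the agent's task; success rate and latency are just a starting point that most evaluations can share.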

2. Test set design

User: Build a test set for code generation
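
The listing does not describe the test set format the skill produces. As one possible shape, a code-generation test set can be stored as JSON Lines, one case per line. The `CodeGenCase` schema and the sample case below are assumptions made for illustration only.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CodeGenCase:
    """One code-generation test case (hypothetical schema)."""
    case_id: str
    prompt: str        # instruction given to the agent
    entry_point: str   # name of the function the agent must define
    tests: list        # list of [args, expected] pairs


def to_jsonl(cases):
    """Serialize cases as JSON Lines, one case per line."""
    return "\n".join(json.dumps(asdict(c), ensure_ascii=False) for c in cases)


def from_jsonl(text):
    """Parse JSON Lines back into CodeGenCase objects, skipping blank lines."""
    return [CodeGenCase(**json.loads(line))
            for line in text.splitlines() if line.strip()]


# Made-up example case.
cases = [
    CodeGenCase(
        case_id="add-001",
        prompt="Write add(a, b) that returns the sum of a and b.",
        entry_point="add",
        tests=[[[1, 2], 3], [[-1, 1], 0]],
    )
]
jsonl_text = to_jsonl(cases)
print(jsonl_text)
```

Keeping each case on its own line makes the set easy to diff, append to, and sample from.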

3. Running the evaluation

User: Run the evaluation tests
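
How the skill actually executes an evaluation is not shown in the listing. A minimal sketch of an evaluation loop, assuming the agent can be called as a plain function from prompt string to output string and that exact string match is an acceptable check, might look like this. `run_eval`, `echo_upper_agent`, and the case dictionaries are hypothetical.

```python
from typing import Callable


def run_eval(agent: Callable[[str], str], cases: list) -> list:
    """Run the agent on every case and record pass/fail by exact match."""
    results = []
    for case in cases:
        try:
            output = agent(case["input"])
            passed = output.strip() == case["expected"].strip()
            error = None
        except Exception as exc:
            # A crash counts as a failure but must not stop the whole run.
            output, passed, error = "", False, repr(exc)
        results.append(
            {"id": case["id"], "passed": passed, "output": output, "error": error}
        )
    return results


def echo_upper_agent(prompt: str) -> str:
    """Stand-in for a real agent: returns the prompt in upper case."""
    return prompt.upper()


# Made-up cases for illustration.
cases = [
    {"id": "c1", "input": "hi", "expected": "HI"},
    {"id": "c2", "input": "ok", "expected": "no"},
]
results = run_eval(echo_upper_agent, cases)
for r in results:
    print(r["id"], "PASS" if r["passed"] else "FAIL")
```

Exact match is the simplest possible check; tasks with free-form output usually need a looser comparison or a task-specific scorer.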

4. Result analysis

User: Analyze the results of this evaluation
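
The listing does not specify what the skill's analysis or report contains. One simple form of analysis, sketched here under the assumption that each result carries a `category` label and a `passed` flag (both hypothetical field names), is a per-category pass rate plus a list of failed case IDs.

```python
from collections import defaultdict


def summarize(results):
    """Return pass counts and pass rate per category."""
    counts = defaultdict(lambda: [0, 0])  # category -> [passed, total]
    for r in results:
        counts[r["category"]][1] += 1
        if r["passed"]:
            counts[r["category"]][0] += 1
    return {
        cat: {"passed": p, "total": t, "pass_rate": p / t}
        for cat, (p, t) in counts.items()
    }


def failed_ids(results):
    """IDs of failed cases, in their original order."""
    return [r["id"] for r in results if not r["passed"]]


# Made-up results for illustration.
results = [
    {"id": "c1", "category": "strings", "passed": True},
    {"id": "c2", "category": "strings", "passed": False},
    {"id": "c3", "category": "math", "passed": True},
]
summary = summarize(results)
for cat, stats in summary.items():
    print(f"{cat}: {stats['passed']}/{stats['total']} passed")
print("failed:", failed_ids(results))
```

Breaking results down by category tends to show where an agent is weak more clearly than a single overall score.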

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
