agent-evals

Create repeatable checks so agent behavior improves safely over time.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "agent-evals" with this command: npx skills add bagelhole/devops-security-agent-skills/bagelhole-devops-security-agent-skills-agent-evals

Agent Evals

Create repeatable checks so agent behavior improves safely over time.

Evaluation Layers

Unit evals: prompt-level correctness
Tool evals: API/tool call decision quality
End-to-end evals: realistic multi-step tasks
Safety evals: prompt injection and data leak resistance

CI/CD Integration

Example eval pipeline steps

make evals-smoke make evals-regression make evals-safety

Best Practices

Version datasets with expected outputs.
Track pass rates and score drift over time.
Block deploys on critical safety regressions.

Related Skills

github-actions - Eval automation in CI
ai-agent-security - Security-focused eval cases

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Open in GitHub Open in ClawHub

Related Skills

Related by shared tags or category signals.

Security

sops-encryption

No summary provided by upstream source.

Repository SourceNeeds Review

-31

bagelhole

Security

linux-administration

No summary provided by upstream source.

Repository SourceNeeds Review

-29

bagelhole

Security

linux-hardening

No summary provided by upstream source.

Repository SourceNeeds Review

-26

bagelhole