agent-evals

Create repeatable checks so agent behavior improves safely over time.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "agent-evals" with this command: npx skills add bagelhole/devops-security-agent-skills/bagelhole-devops-security-agent-skills-agent-evals

Agent Evals

Create repeatable checks so agent behavior improves safely over time.

Evaluation Layers

  • Unit evals: prompt-level correctness

  • Tool evals: API/tool call decision quality

  • End-to-end evals: realistic multi-step tasks

  • Safety evals: prompt injection and data leak resistance

CI/CD Integration

Example eval pipeline steps

make evals-smoke make evals-regression make evals-safety

Best Practices

  • Version datasets with expected outputs.

  • Track pass rates and score drift over time.

  • Block deploys on critical safety regressions.

Related Skills

  • github-actions - Eval automation in CI

  • ai-agent-security - Security-focused eval cases

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Security

sops-encryption

No summary provided by upstream source.

Repository SourceNeeds Review
Security

linux-administration

No summary provided by upstream source.

Repository SourceNeeds Review
Security

linux-hardening

No summary provided by upstream source.

Repository SourceNeeds Review