agent-safety

Ensure agent safety - guardrails, content filtering, monitoring, and compliance

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "agent-safety" with this command: npx skills add pluginagentmarketplace/custom-plugin-ai-agents/pluginagentmarketplace-custom-plugin-ai-agents-agent-safety

Agent Safety

Implement safety systems for responsible AI agent deployment.

When to Use This Skill

Invoke this skill when:

  • Adding input/output guardrails
  • Implementing content filtering
  • Setting up rate limiting
  • Ensuring compliance (GDPR, SOC2)

Parameter Schema

| Parameter | Type | Required | Description | Default |
|---|---|---|---|---|
| task | string | Yes | Safety goal | - |
| risk_level | enum | No | One of strict, moderate, permissive | strict |
| filters | list | No | Filter types to enable | ["injection", "pii", "toxicity"] |

Quick Start

# Illustrative guardrails-ai usage; validator names and import paths
# vary by library version, so check the current API before copying
from guardrails import Guard
from guardrails.validators import ToxicLanguage, PIIFilter

guard = Guard.from_validators([
    ToxicLanguage(threshold=0.8, on_fail="exception"),
    PIIFilter(on_fail="fix")
])

# Validate the model response before returning it
validated = guard.validate(llm_response)

Guardrail Types

Input Guardrails

# Prompt injection detection
INJECTION_PATTERNS = [
    r"ignore (previous|all) instructions",
    r"you are now",
    r"forget everything"
]
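Wired into a check with re.search, these patterns give a minimal first-pass detector. This is a sketch: detect_injection is a hypothetical helper, and regex matching alone is easy to bypass.

```python
import re

# Patterns repeated here so the sketch is self-contained
INJECTION_PATTERNS = [
    r"ignore (previous|all) instructions",
    r"you are now",
    r"forget everything",
]

def detect_injection(text: str) -> bool:
    """Return True if the input matches any known injection pattern."""
    return any(re.search(p, text, re.IGNORECASE) for p in INJECTION_PATTERNS)
```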

Output Guardrails

# Content filtering
filters = [
    ToxicityFilter(),
    PIIRedactor(),
    HallucinationDetector()
]
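The filter objects above are placeholders; one way to chain such filters is a simple pipeline where each filter transforms the text in turn. The PIIRedactor here is a toy email-masking sketch, not a complete redactor.

```python
import re

class PIIRedactor:
    """Toy filter: masks email addresses only (illustrative)."""
    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

    def apply(self, text: str) -> str:
        return self.EMAIL.sub("[REDACTED]", text)

def run_output_filters(text: str, filters) -> str:
    # Order matters: e.g. redact PII before any filter that logs text
    for f in filters:
        text = f.apply(text)
    return text
```

A real deployment would give every filter the same apply-style interface so new checks can be appended without touching the pipeline.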

Rate Limiting

import time

class RateLimiter:
    def __init__(self, rpm=60, tpm=100000):
        self.rpm, self.tpm = rpm, tpm      # requests/min, tokens/min
        self.buckets = {}  # user_id -> (requests_left, tokens_left, last_refill)

    def check(self, user_id, tokens):
        # Token bucket: refill continuously, deny when either budget is empty
        now = time.monotonic()
        req, tok, last = self.buckets.get(user_id, (self.rpm, self.tpm, now))
        req = min(self.rpm, req + (now - last) * self.rpm / 60)
        tok = min(self.tpm, tok + (now - last) * self.tpm / 60)
        allowed = req >= 1 and tok >= tokens
        self.buckets[user_id] = (req - 1, tok - tokens, now) if allowed else (req, tok, now)
        return allowed

Troubleshooting

| Issue | Solution |
|---|---|
| False positives | Tune thresholds |
| Injection bypass | Add LLM-based detection |
| PII leakage | Add secondary validation |
| Performance hit | Cache filter results |
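Caching filter results assumes the checks are deterministic for a given input; if so, a memoized wrapper avoids re-scoring repeated text. Here is_toxic and its keyword check are stand-ins for a real classifier call.

```python
from functools import lru_cache

@lru_cache(maxsize=4096)
def is_toxic(text: str) -> bool:
    # Stand-in for an expensive moderation/classifier call; identical
    # inputs (retries, common greetings) hit the cache instead
    return "hate" in text.lower()
```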

Best Practices

  • Defense in depth (multiple layers)
  • Fail-safe defaults (deny by default)
  • Audit everything
  • Regular red team testing
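Put together, the practices above amount to layered checks around every model call. In this sketch, looks_like_injection and redact_pii are hypothetical stand-ins for real guardrails.

```python
def looks_like_injection(text: str) -> bool:
    return "ignore all instructions" in text.lower()

def redact_pii(text: str) -> str:
    return text.replace("@", "[at]")  # toy redaction

def safe_invoke(user_input: str, llm_call):
    # Layer 1: input guardrail -- deny by default on detection
    if looks_like_injection(user_input):
        raise PermissionError("blocked: possible prompt injection")
    # Layer 2: the model call itself
    response = llm_call(user_input)
    # Layer 3: output guardrail before anything reaches the user
    return redact_pii(response)
```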

Compliance Checklist

  • Input validation active
  • Output filtering enabled
  • Audit logging configured
  • Rate limits set
  • PII handling compliant
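For the audit-logging item, append-only structured records are usually enough to start; the field names below are illustrative, not a standard.

```python
import json
import time

def audit_log(event: str, user_id: str, detail: dict, sink) -> None:
    # One JSON line per guardrail decision, written append-only
    record = {
        "ts": time.time(),
        "event": event,      # e.g. "injection_blocked", "pii_redacted"
        "user_id": user_id,
        "detail": detail,
    }
    sink.write(json.dumps(record) + "\n")
```

In production, sink would be an append-only file or log shipper rather than an in-memory buffer.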

Related Skills

  • tool-calling - Input validation
  • llm-integration - API security
  • multi-agent - Per-agent permissions

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
