guardrails & safety

Guardrails & Safety

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "guardrails & safety" with this command: npx skills add lauraflorentin/skills-marketplace/lauraflorentin-skills-marketplace-guardrails-safety

Guardrails & Safety

Guardrails are the firewall of an AI system. They sit between the user and the agent (Input Guardrail) and between the agent and the user (Output Guardrail). They enforce policy, security, and tone. Unlike the main agent, which tries to be helpful, the guardrail tries to be safe and compliant.

When to Use

  • Jailbreak Prevention: Stopping users from tricking the model ("Ignore previous instructions...").

  • PII Protection: Detecting and redacting phone numbers, emails, or credit cards.

  • Topic Adherence: Ensuring a customer support bot doesn't discuss politics or religion.

  • Brand Safety: preventing the model from generating offensive or competitor-promoting content.

Use Cases

  • Input Filter: Blocking prompts that violate usage policies.

  • Output Filter: Blocking model responses that contain hate speech or hallucinations.

  • Sandboxing: Ensuring code generated by the agent acts within safe bounds (e.g., no network access).

Implementation Pattern

def guarded_execution(user_input): # Layer 1: Input Guardrail # Check for prompt injection or policy violations if not safety_agent.check_input(user_input).safe: return "I cannot answer that request."

# Layer 2: Main Execution
response = main_agent.run(user_input)

# Layer 3: Output Guardrail
# Check for PII or harmful content in the response
if not safety_agent.check_output(response).safe:
    log_violation(user_input, response)
    return "Response withheld due to safety policy."
    
return response

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

human-in-the-loop

No summary provided by upstream source.

Repository SourceNeeds Review
General

reflection

No summary provided by upstream source.

Repository SourceNeeds Review
General

planning

No summary provided by upstream source.

Repository SourceNeeds Review
General

adaptation

No summary provided by upstream source.

Repository SourceNeeds Review