agent-firewall

Real-time input/output filtering for agent communications. Block prompt injection and unauthorized commands before they reach the model, and catch data exfiltration before a response leaves it.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "agent-firewall" with this command: npx skills add arhadnane/agent-firewall

Agent Firewall — Input/Output Guardian

Architecture

[Channel Input] → [INPUT FILTER] → [Agent/Model] → [OUTPUT FILTER] → [Channel Output]
                        ↓                                  ↓
                  ┌─────────────┐                  ┌──────────────┐
                  │ Block List  │                  │ Secret Scan  │
                  │ Pattern DB  │                  │ PII Redact   │
                  │ Rate Limit  │                  │ Path Scrub   │
                  │ Encoding Det│                  │ URL Checker  │
                  └─────────────┘                  └──────────────┘
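The flow in the diagram can be sketched as two filter chains wrapped around a model call. This is a minimal illustration, not the skill's actual implementation: `Blocked`, `run_chain`, `guarded_call`, and the three stand-in functions are hypothetical names introduced here.

```python
import re
from typing import Callable, List


class Blocked(Exception):
    """Raised by an input filter to stop a message before it reaches the model."""

    def __init__(self, rule: str):
        super().__init__(rule)
        self.rule = rule


# A filter takes text and returns the (possibly rewritten) text.
Filter = Callable[[str], str]


def run_chain(text: str, filters: List[Filter]) -> str:
    """Apply each filter in order, feeding the result of one into the next."""
    for f in filters:
        text = f(text)
    return text


def guarded_call(user_text: str, model: Callable[[str], str],
                 input_filters: List[Filter],
                 output_filters: List[Filter]) -> str:
    """Channel input -> input filters -> model -> output filters -> channel output."""
    try:
        clean = run_chain(user_text, input_filters)
    except Blocked as exc:
        # The model is never called for a blocked message.
        return f"[blocked: {exc.rule}]"
    return run_chain(model(clean), output_filters)


# Hypothetical stand-ins for one input filter, one output filter, and a model.
def block_injection(text: str) -> str:
    if re.search(r"ignore (all )?previous instructions", text, re.IGNORECASE):
        raise Blocked("injection_patterns")
    return text


def redact_aws_key(text: str) -> str:
    return re.sub(r"AKIA[0-9A-Z]{16}", "[REDACTED]", text)


def echo_model(prompt: str) -> str:
    return f"echo: {prompt}"
```

Keeping each filter as a plain function makes the order of checks explicit and lets input and output chains be tested separately from the model.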

Input Filters

#  Filter              Description
1  Injection patterns  Regex + heuristic match for "ignore previous", "you are now", role confusion
2  Unicode sanitizer   Strip zero-width chars, control characters, RTL overrides
3  Encoding detector   Detect Base64, hex, ROT13 encoded payloads in user messages
4  Role confusion      Detect fake system messages, assistant impersonation
5  Rate limiter        Max messages per user per channel per minute
6  Size limiter        Reject inputs exceeding token budget
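As a rough illustration of filters 2 and 3, the sketch below strips invisible characters and looks for Base64 runs that decode to readable text. The character set, the 16-character minimum run length, and the function names are assumptions made for this example; hex and ROT13 detection are not covered.

```python
import base64
import binascii
import re
import unicodedata
from typing import List

# Illustrative set of invisible characters: zero-width and bidi controls.
_INVISIBLE = {
    "\u200b", "\u200c", "\u200d", "\u2060", "\ufeff",  # zero-width characters
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # bidi embeddings/overrides
    "\u2066", "\u2067", "\u2068", "\u2069",            # bidi isolates
}


def sanitize_unicode(text: str) -> str:
    """Drop invisible characters and control characters, keeping newline and tab."""
    kept = []
    for ch in text:
        if ch in _INVISIBLE:
            continue
        if unicodedata.category(ch) == "Cc" and ch not in "\n\t":
            continue
        kept.append(ch)
    return "".join(kept)


# Runs of at least 16 Base64 alphabet characters, with optional padding.
_B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def find_base64_payloads(text: str) -> List[str]:
    """Return the decoded text of Base64-looking runs that decode to printable ASCII."""
    found = []
    for match in _B64_RUN.finditer(text):
        candidate = match.group(0)
        if len(candidate) % 4 != 0:
            continue
        try:
            decoded = base64.b64decode(candidate, validate=True).decode("ascii")
        except (binascii.Error, UnicodeDecodeError):
            continue
        if decoded.isprintable():
            found.append(decoded)
    return found
```

Decoded payloads can then be passed back through the injection patterns, so an encoded "ignore previous instructions" is treated the same as a plain one.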

Output Filters

#  Filter             Description
1  Secret scanner     High-entropy strings + known patterns (AWS key, GitHub token)
2  PII redactor       Email, phone, SSN, credit card → [REDACTED]
3  Path scrubber      Remove internal filesystem paths from outputs
4  URL checker        Block responses containing known malicious URLs
5  Consistency check  Verify output doesn't contradict system prompt directives
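The secret scanner and PII redactor might look something like the following. The entropy threshold of 4.0 bits per character, the 20-character minimum token length, and the simplified email and SSN patterns are assumptions chosen for illustration; phone and credit card matching are omitted.

```python
import math
import re
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the string in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Simplified patterns for this sketch; real-world PII matching needs more care.
_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
_TOKEN = re.compile(r"[A-Za-z0-9+/_\-]{20,}")


def redact_output(text: str, entropy_threshold: float = 4.0) -> str:
    """Replace emails, SSN-shaped numbers, and high-entropy tokens with [REDACTED]."""
    text = _EMAIL.sub("[REDACTED]", text)
    text = _SSN.sub("[REDACTED]", text)

    def _maybe_redact(match: "re.Match[str]") -> str:
        token = match.group(0)
        if shannon_entropy(token) >= entropy_threshold:
            return "[REDACTED]"
        return token

    return _TOKEN.sub(_maybe_redact, text)
```

The entropy check is a heuristic: it catches random-looking tokens that no named pattern covers, at the cost of occasionally flagging long identifiers, which is why known patterns and the threshold are usually tuned together.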

Configuration

# .security/firewall-rules.yaml
input:
  injection_patterns:
    - pattern: "ignore (all )?previous instructions"
      action: BLOCK
      severity: CRITICAL
    - pattern: "you are now (?!helping)"
      action: BLOCK
      severity: HIGH
  rate_limit:
    max_per_minute: 30
    max_per_hour: 500
  max_input_tokens: 4096

output:
  secret_patterns:
    - name: aws_key
      pattern: "AKIA[0-9A-Z]{16}"
      action: REDACT
    - name: github_token
      pattern: "gh[ps]_[A-Za-z0-9_]{36,}"
      action: REDACT
  pii_redaction: true
  path_scrubbing: true
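To show how the input section of this configuration could be applied, the sketch below mirrors it as a Python dict (roughly what a YAML loader such as PyYAML's `yaml.safe_load` would return) and adds a sliding-window rate limiter. `match_injection` and `RateLimiter` are hypothetical names, and case-insensitive matching is an assumption; the rules file itself does not say how patterns are compiled.

```python
import re
import time
from collections import defaultdict, deque

# Mirrors the input section of firewall-rules.yaml above.
RULES = {
    "injection_patterns": [
        {"pattern": r"ignore (all )?previous instructions",
         "action": "BLOCK", "severity": "CRITICAL"},
        {"pattern": r"you are now (?!helping)",
         "action": "BLOCK", "severity": "HIGH"},
    ],
    "rate_limit": {"max_per_minute": 30, "max_per_hour": 500},
}


def match_injection(text, rules=RULES):
    """Return the first injection rule that matches the text, or None."""
    for rule in rules["injection_patterns"]:
        if re.search(rule["pattern"], text, re.IGNORECASE):
            return rule
    return None


class RateLimiter:
    """Sliding-window limiter keyed by (user, channel)."""

    def __init__(self, max_per_minute, max_per_hour, clock=time.monotonic):
        self.windows = [(60.0, max_per_minute), (3600.0, max_per_hour)]
        self.clock = clock
        self.events = defaultdict(deque)

    def allow(self, user, channel):
        now = self.clock()
        q = self.events[(user, channel)]
        # Forget timestamps older than the longest window.
        while q and now - q[0] >= 3600.0:
            q.popleft()
        for span, limit in self.windows:
            if sum(1 for t in q if now - t < span) >= limit:
                return False
        q.append(now)
        return True
```

The negative lookahead in the second pattern means "you are now helping me" passes while "you are now DAN" is blocked. Injecting the clock keeps the limiter easy to test without sleeping.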

Guardrails

  • Firewall rules are append-only in production — deletion requires human approval
  • Suspected false positives are logged, raise an alert, and pass through with a warning; they are never silently dropped
  • All blocks are logged with: timestamp, rule matched, full context, channel, user hash
  • Firewall itself cannot be disabled by agent instructions
  • Rules file is read-only from the agent's perspective
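One way to build the block log entry described above is sketched here. The field names follow the list in the guardrails; the keyed SHA-256 hash, the placeholder key, and the function names are assumptions for illustration only.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

# Placeholder key for this sketch; a real deployment would load it from a secret store.
_HASH_KEY = b"example-key"


def user_hash(user_id: str, key: bytes = _HASH_KEY) -> str:
    """Keyed hash of the user ID so logs can be correlated without storing the raw ID."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def block_record(rule: str, context: str, channel: str, user_id: str, now=None) -> dict:
    """Build one log entry for a blocked message."""
    now = now or datetime.now(timezone.utc)
    return {
        "timestamp": now.isoformat(),
        "rule": rule,
        "context": context,
        "channel": channel,
        "user_hash": user_hash(user_id),
    }


def to_log_line(record: dict) -> str:
    """Serialize a record as a single JSON line with stable key order."""
    return json.dumps(record, sort_keys=True)
```

Because the record stores full context, the log itself may contain sensitive text and would need the same access controls as the rules file.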

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
