prompt-injection-defense

Prompt Injection Defense

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "prompt-injection-defense" with this command: npx skills add bagelhole/devops-security-agent-skills/bagelhole-devops-security-agent-skills-prompt-injection-defense

Prompt Injection Defense

Mitigate direct and indirect prompt injection across chat apps, agentic workflows, and RAG pipelines.

Attack Surface

  • User input attempting to override system instructions

  • Untrusted documents/web pages in retrieval context

  • Tool output that smuggles malicious instructions

  • Cross-tenant leakage via shared context windows

Defense-in-Depth Pattern

  • Instruction hierarchy enforcement: system > developer > user > tool output.

  • Context segregation: isolate untrusted text from control instructions.

  • Tool permissioning: explicit allow-list per task and tenant.

  • Output policy checks: validate schema, redact secrets, block unsafe actions.

  • Human approval: required for high-impact operations.

Implementation Controls

  • Strip or label untrusted content blocks before generation.

  • Disable autonomous tool chaining for sensitive workflows.

  • Use deterministic parsers (JSON schema) before tool execution.

  • Reject requests containing high-risk exfiltration patterns.

  • Add canary tokens to detect data exfil attempts.

Red-Team Test Cases

  • "Ignore previous instructions" style direct override

  • Retrieval payload containing hidden policy bypass text

  • Tool output instructing follow-up privileged command

  • Prompt that asks for secrets from memory or env vars

Security Metrics

  • Prompt injection detection rate

  • Unsafe tool invocation prevention rate

  • Time-to-containment for injection attempts

  • False positive rate on blocked safe prompts

Related Skills

  • ai-agent-security - Agent threat model and controls

  • llm-app-security - End-to-end LLM app hardening

  • security-automation - Automated policy response workflows

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Security

linux-administration

No summary provided by upstream source.

Repository SourceNeeds Review
Security

sops-encryption

No summary provided by upstream source.

Repository SourceNeeds Review
Security

linux-hardening

No summary provided by upstream source.

Repository SourceNeeds Review
Security

vpn-setup

No summary provided by upstream source.

Repository SourceNeeds Review