prompt-injection-defense | V50.AI

prompt-injection-defense

Prompt Injection Defense

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "prompt-injection-defense" with this command: npx skills add bagelhole/devops-security-agent-skills/bagelhole-devops-security-agent-skills-prompt-injection-defense

Prompt Injection Defense

Mitigate direct and indirect prompt injection across chat apps, agentic workflows, and RAG pipelines.

Attack Surface

User input attempting to override system instructions
Untrusted documents/web pages in retrieval context
Tool output that smuggles malicious instructions
Cross-tenant leakage via shared context windows

Defense-in-Depth Pattern

Instruction hierarchy enforcement: system > developer > user > tool output.
Context segregation: isolate untrusted text from control instructions.
Tool permissioning: explicit allow-list per task and tenant.
Output policy checks: validate schema, redact secrets, block unsafe actions.
Human approval: required for high-impact operations.

Implementation Controls

Strip or label untrusted content blocks before generation.
Disable autonomous tool chaining for sensitive workflows.
Use deterministic parsers (JSON schema) before tool execution.
Reject requests containing high-risk exfiltration patterns.
Add canary tokens to detect data exfil attempts.

Red-Team Test Cases

"Ignore previous instructions" style direct override
Retrieval payload containing hidden policy bypass text
Tool output instructing follow-up privileged command
Prompt that asks for secrets from memory or env vars

Security Metrics

Prompt injection detection rate
Unsafe tool invocation prevention rate
Time-to-containment for injection attempts
False positive rate on blocked safe prompts

Related Skills

ai-agent-security - Agent threat model and controls
llm-app-security - End-to-end LLM app hardening
security-automation - Automated policy response workflows

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Open in GitHub Open in ClawHub

Related Skills

Related by shared tags or category signals.

Security

linux-administration

No summary provided by upstream source.

Repository SourceNeeds Review

Security

sops-encryption

No summary provided by upstream source.

Repository SourceNeeds Review

Security

linux-hardening

No summary provided by upstream source.

Repository SourceNeeds Review

Security

vpn-setup

No summary provided by upstream source.

Repository SourceNeeds Review