secure-ai

Secures AI integrations against prompt injection, privilege escalation, and data leakage. Use when implementing defense-in-depth for LLM pipelines, applying zero-trust controls to autonomous agents, hardening server actions that interact with AI services, validating model outputs, securing MCP tool integrations, or auditing AI system access patterns and identity management. Use for OWASP LLM Top 10, prompt injection defense, agentic security, AI supply chain.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "secure-ai" with this command: npx skills add oakoss/agent-skills/oakoss-agent-skills-secure-ai

Secure AI

Overview

Secures AI integration layers through multi-layered defense, structural isolation, and zero-trust orchestration. Covers prompt injection defense, model output validation, agentic security, secure server actions, supply chain integrity, MCP tool security, and audit protocols for applications that interact with LLMs.

Aligned with the OWASP Top 10 for LLM Applications 2025 and the NIST AI Risk Management Framework (AI RMF 1.0). Provides coverage for all ten OWASP LLM risks with concrete defense patterns.

When to use: Securing LLM-powered features against prompt injection, validating and sanitizing model outputs before downstream use, implementing zero-trust for autonomous agents, hardening server actions for AI endpoints, securing MCP tool integrations, managing AI supply chain risks, auditing AI access patterns.

When NOT to use: General web application security without AI components, frontend-only security concerns, non-AI API hardening, basic authentication or authorization without AI involvement.

Quick Reference

PatternApproachKey Points
Structural isolationSeparate system/user message rolesNever mix instructions and user data in one string
Input boundariesDelimit user data with markersHelps models identify where untrusted data begins/ends
Guardian modelPre-scan input with a fast classifierDetect injection patterns before main reasoning model
Output validationTreat LLM output as untrusted inputContext-aware encoding, parameterized queries, CSP headers
Least privilegeCapability-based scopes per sub-taskAgents get only the tools needed for current work
Human-in-the-loopRequire human sign-off for destructive actionsFinancial or data-altering events need approval
Non-human identityOIDC-based agent authenticationVerifiable identity for every agent, rotate keys regularly
Server-only AI logicserver-only imports for all AI codeKeys and reasoning never leak to client bundle
Input validationZod schemas on all AI-facing server actionsNever pass raw user input to AI services
Rate limitingPer-user/IP token budget via RedisPrevent denial-of-wallet attacks on AI endpoints
Stream scrubbingFilter sensitive strings from AI output streamsRemove internal IDs, secrets before reaching client
MCP tool securityAllowlist tools, validate inputs/outputsTreat MCP servers as untrusted, enforce least privilege
Supply chain integrityVerify model provenance, maintain AI-BOMTrack models, datasets, and dependencies with checksums
Secret managementEnvironment variables with CI leak scanningUse gitleaks in CI to prevent committed secrets

Core Security Principles

  1. Isolation is absolute -- user data must never be treated as system instruction
  2. LLM output is untrusted -- treat all model responses as potentially malicious input before downstream use
  3. Least privilege for agents -- grant only the tools needed for the current sub-task, revoke after completion
  4. Human verification of destruction -- destructive or irreversible actions require a human signature
  5. No secrets in client -- all AI logic and keys reside in server-only environments
  6. Adversarial mindset -- assume both users and agents will attempt to bypass rules
  7. Defense in depth -- layer defenses so that bypassing one layer does not compromise the system
  8. Supply chain verification -- verify provenance and integrity of all models, datasets, and AI tools

OWASP LLM Top 10 (2025) Coverage

OWASP RiskReference
LLM01 Prompt InjectionPrompt Injection Defense
LLM02 Sensitive Information DisclosureSecure Server Actions (stream scrubbing, output filtering)
LLM03 Supply ChainSupply Chain and MCP Security
LLM04 Data and Model PoisoningSupply Chain and MCP Security
LLM05 Improper Output HandlingOutput Validation and Encoding
LLM06 Excessive AgencyAgentic Zero-Trust Security (least privilege, HITL)
LLM07 System Prompt LeakagePrompt Injection Defense (non-extractable prompts)
LLM08 Vector and Embedding WeaknessesOutput Validation and Encoding (RAG sanitization)
LLM09 MisinformationOutput Validation and Encoding (semantic filtering)
LLM10 Unbounded ConsumptionSecure Server Actions (rate limiting, token budgets)

Common Mistakes

MistakeCorrect Pattern
Mixing user input and system instructions in the same prompt fieldUse structural isolation with separate system and user message roles
Trusting LLM output and passing it directly to exec, eval, or SQLTreat all model output as untrusted; use parameterized queries and context-aware encoding
Giving agents unlimited tool access for all tasksApply capability-based scopes granting only tools needed per sub-task
Using static API keys for AI service authenticationUse OIDC with dynamic key rotation and short-lived tokens
Loading third-party models without provenance checksVerify model checksums, use signed artifacts, maintain AI-BOM
Granting MCP servers broad permissions without validationAllowlist MCP tools, validate all inputs/outputs, enforce human approval for sensitive actions
Passing raw user input directly to AI servicesValidate all input with Zod schemas before AI processing
Streaming AI responses without output filteringScrub sensitive strings from streams before they reach the client

Key Frameworks

  • OWASP Top 10 for LLM Applications 2025 -- industry standard for LLM vulnerability classification
  • NIST AI Risk Management Framework (AI RMF 1.0) -- four-function framework (Govern, Map, Measure, Manage) for AI risk
  • NIST Cybersecurity Framework Profile for AI (NISTIR 8596) -- guidelines for secure AI adoption
  • OWASP MCP Security Cheat Sheet -- practical guide for securing third-party MCP server integrations
  • CycloneDX 1.6 / SPDX 3.0 -- standards supporting AI Bill of Materials (ML-BOM)

Delegation

  • Scan codebase for prompt injection vulnerabilities: Use Explore agent to search for user data flowing into system prompts and unvalidated inputs
  • Implement zero-trust agent orchestration: Use Task agent to add identity verification, sandboxing, and human approval gates
  • Audit model output handling: Use Explore agent to find LLM outputs passed to exec, eval, SQL, or rendered as HTML without sanitization
  • Review MCP tool configurations: Use Explore agent to check MCP server permissions, tool allowlists, and authentication setup
  • Design secure AI integration architecture: Use Plan agent to map trust boundaries, agent scopes, and audit requirements
  • Assess supply chain risks: Use Explore agent to inventory third-party models, datasets, and MCP servers with provenance records
  • Set up security monitoring: Use Task agent to configure audit logging, anomaly detection, and incident response alerts

For general application security (OWASP Top 10, auth patterns, security headers, input validation), use the application-security skill. For database-layer security (RLS policies, audit trails, Postgres hardening), use the database-security skill.

References

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Security

application-security

No summary provided by upstream source.

Repository SourceNeeds Review
Security

database-security

No summary provided by upstream source.

Repository SourceNeeds Review
Security

quality-auditor

No summary provided by upstream source.

Repository SourceNeeds Review