agent:secure

Performs a security audit of an AI agent system. Applies patterns 18-21 from "Patterns for Building AI Agents" (Bhagwat & Gienow, 2025): preventing the lethal trifecta, sandboxing code execution, granular access control, and input/output guardrails.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "agent:secure" with this command: npx skills add ikatsuba/skills/ikatsuba-skills-agent-secure

Agent Security Audit


When to use

Use this skill when the user needs to:

  • Audit an existing agent for security vulnerabilities

  • Design security controls for a new agent

  • Prevent prompt injection and data exfiltration

  • Set up sandboxing for code execution

  • Design access control and guardrails

Instructions

Step 1: Understand the Agent

Use the AskUserQuestion tool to gather context:

  • What does the agent do?

  • Does it access private/sensitive data? (user data, internal docs, credentials)

  • Does it process untrusted input? (public content, user uploads, external APIs)

  • Can it communicate externally? (send emails, create PRs, call APIs, write files)

  • Does it execute code? (run scripts, shell commands, code generation)

  • What authentication/authorization exists today?

Read any existing spec documents (.specs/<spec-name>/) before proceeding.

Step 2: Lethal Trifecta Analysis (Pattern 18)

The "lethal trifecta" (coined by Simon Willison) is the combination of:

  • Access to private data — agent can read sensitive information

  • Exposure to untrusted content — agent processes external/user-generated input

  • Exfiltration capability — agent can send data outside the system

When all three are present, prompt injection attacks become possible: malicious instructions hidden in external content trick the agent into accessing private data and sending it to an attacker.
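The trifecta status used in the analysis below can be sketched as a small helper. This is an illustrative classification, not code from the book; the function name and the "two legs = AT RISK" rule are assumptions:

```python
def trifecta_status(private_data: bool, untrusted_input: bool, exfiltration: bool) -> str:
    """Classify an agent against the lethal trifecta.

    All three legs present -> VULNERABLE (injection can exfiltrate data).
    Two legs -> AT RISK (one new capability away from vulnerable).
    Fewer -> SAFE with respect to this specific attack chain.
    """
    legs = sum([private_data, untrusted_input, exfiltration])
    if legs == 3:
        return "VULNERABLE"
    if legs == 2:
        return "AT RISK"
    return "SAFE"
```

Note that SAFE here means safe from this particular attack chain only; the remaining steps still apply.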

Analyze the agent:

Lethal Trifecta Analysis

Leg 1: Private Data Access

  • Reads user PII
  • Accesses internal documents
  • Has database read access
  • Can read credentials/secrets
  • Accesses private repositories

Risk level: [None / Low / Medium / High]

Leg 2: Untrusted Content Exposure

  • Processes user-generated content
  • Reads public web pages
  • Parses uploaded files
  • Ingests third-party API responses
  • Reads public issues/tickets/comments

Risk level: [None / Low / Medium / High]

Leg 3: Exfiltration Capability

  • Can send emails
  • Can create PRs/issues
  • Can call external APIs
  • Can write to public endpoints
  • Can modify shared state

Risk level: [None / Low / Medium / High]

Trifecta Status: [SAFE / AT RISK / VULNERABLE]

If all three legs are present: The agent is VULNERABLE. Recommend removing at least one leg:

  • Easiest: remove exfiltration — constrain agent actions after processing untrusted input

  • Alternative: isolate data access — use separate agents for private data vs. untrusted content

  • Alternative: sanitize input — add middleware to intercept and clean untrusted content before it reaches the LLM
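The "remove exfiltration" option can be sketched as a session-level taint flag: once the agent has processed any untrusted content, outbound tools are disabled for the rest of the session. The tool names here are hypothetical placeholders for the agent's real tool registry:

```python
# Hypothetical outbound tool names; adapt to the agent's actual tools.
EXFILTRATION_TOOLS = {"send_email", "create_pr", "http_post"}

class SessionPolicy:
    """Disable exfiltration-capable tools after untrusted input is seen."""

    def __init__(self):
        self.tainted = False  # becomes True after any untrusted input

    def ingest_untrusted(self, content: str) -> str:
        self.tainted = True
        return content

    def may_call(self, tool: str) -> bool:
        if self.tainted and tool in EXFILTRATION_TOOLS:
            return False
        return True
```

This removes one leg of the trifecta at the cost of capability; the isolation and sanitization alternatives trade differently.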

Use AskUserQuestion to recommend and confirm the mitigation approach.

Step 3: Sandbox Assessment (Pattern 19)

If the agent executes code, audit the sandbox:

Code Execution Sandbox

Current State

  • Code runs in isolated container
  • Network access restricted
  • File system access restricted
  • Resource limits set (CPU, memory, time)
  • No access to production credentials
  • No access to host file system

Threats

| Threat | Risk | Mitigation |
|---|---|---|
| Secret exfiltration | [Risk] | [Mitigation] |
| Environment deletion | [Risk] | [Mitigation] |
| Resource abuse (crypto mining) | [Risk] | [Mitigation] |
| Accidental resource hogging | [Risk] | [Mitigation] |

Recommendations

  • Runtime: [Docker / E2B / Daytona / other]
    • Note: Docker has 10-20s cold starts; consider agentic runtimes for sub-second startup
  • Resource limits: CPU: [X], Memory: [X], Timeout: [X]
  • Network policy: [Allow-list specific endpoints / Block all / etc.]
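Until a proper container or agentic runtime is in place, the resource limits above can at least be enforced at the process level. A minimal POSIX-only sketch (function name and defaults are illustrative); note this bounds resource abuse but is NOT a sandbox, since network and filesystem isolation still require a container:

```python
import resource
import subprocess

def run_limited(cmd: list[str], cpu_seconds: int = 5,
                mem_bytes: int = 512 * 1024 * 1024,
                timeout: int = 10) -> subprocess.CompletedProcess:
    """Run untrusted code with CPU, memory, and wall-clock limits."""

    def apply_limits():
        # Applied in the child process just before exec (POSIX only).
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    return subprocess.run(cmd, preexec_fn=apply_limits,
                          capture_output=True, timeout=timeout)
```

The `timeout` covers wall-clock hangs, while `RLIMIT_CPU` covers busy loops; both are needed.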

If the agent does NOT execute code, note this and skip to Step 4.

Step 4: Access Control Review (Pattern 20)

Agents need MORE granular access control than humans because they are:

  • Infinitely diligent — they will probe every resource they can reach, so security by obscurity doesn't work

  • Ephemeral — sessions are short-lived, so credentials must be scoped and short-lived too

  • Unpredictable — LLM behavior is nondeterministic, so permissions must assume mistakes

Access Control Review

Authentication

  • Agent has its own identity (not using a shared service account)
  • OAuth flow implemented for user-delegated access
  • Credentials are scoped to specific operations
  • Credentials are short-lived / rotated

Authorization

| Tool/Action | Current Access | Recommended Access | Justification |
|---|---|---|---|
| [Database read] | [Full access] | [Read-only, filtered by user] | [Least privilege] |
| [API call X] | [Admin] | [Scoped to operation] | [Least privilege] |
| [File write] | [Unrestricted] | [Specific directory only] | [Blast radius reduction] |

Permission Modes

  • Planning mode — agent has reduced permissions during reasoning
    • Restrict: UPDATE, DELETE, external API calls
    • Allow: SELECT, read-only operations
  • Execution mode — elevated permissions only for confirmed actions
    • Requires: explicit user approval or automated policy check

Just-in-Time Access

  • Credentials granted per-task, not per-session
  • Access scoped to specific user context
  • Unused permissions revoked after task completion
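The planning/execution split and the approval requirement above can be sketched as a simple policy check. Mode and tool names here are assumptions for illustration, not a prescribed schema:

```python
# Hypothetical tool names; map these to the agent's real tool registry.
WRITE_TOOLS = {"update", "delete", "call_external_api"}

MODE_POLICIES = {
    "planning": {"select", "read_file", "search_docs"},
    "execution": {"select", "read_file", "search_docs"} | WRITE_TOOLS,
}

def authorize(mode: str, tool: str, user_approved: bool = False) -> bool:
    """Return True if a tool call is allowed in the current mode.

    Execution-mode writes additionally require explicit user approval,
    mirroring 'elevated permissions only for confirmed actions'.
    """
    allowed = MODE_POLICIES.get(mode, set())
    if tool not in allowed:
        return False
    if mode == "execution" and tool in WRITE_TOOLS:
        return user_approved
    return True
```

An unknown mode yields an empty allow-list, so the check fails safe by default.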

Step 5: Guardrails Design (Pattern 21)

Design input and output guardrails — live, low-latency checks that prevent harm in real time.

Guardrails

Input Guardrails

Intercept incoming inputs BEFORE they reach the LLM.

| Guard | Description | Action on Trigger |
|---|---|---|
| Prompt Injection | Detect attempts to override system instructions | Block + return default message |
| Jailbreak Detection | Detect attempts to bypass safety constraints | Block + log + alert |
| PII Detection | Detect sensitive personal information in input | Redact or block |
| Off-Topic | Detect requests outside the agent's domain | Redirect to appropriate handler |
| On-Brand | Ensure input aligns with acceptable use | Block inappropriate content |
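A rule-based version of the Prompt Injection and PII guards might look like the sketch below. The patterns are illustrative and far from exhaustive; a production system would layer a small trained classifier on top of rules like these:

```python
import re

def input_guard(text: str) -> tuple[str, str]:
    """Classify an incoming message; returns (action, reason)."""
    # Prompt injection: crude phrase match, illustrative only.
    if re.search(r"ignore (all )?(previous|prior) instructions", text, re.I):
        return ("block", "prompt_injection")
    # PII: US SSN shape, illustrative only.
    if re.search(r"\b\d{3}-\d{2}-\d{4}\b", text):
        return ("redact", "pii")
    return ("allow", "clean")
```

Returning a reason alongside the action makes every trigger loggable, which the implementation notes below rely on.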

Output Guardrails

Screen generated output BEFORE it reaches the user or tools.

| Guard | Description | Action on Trigger |
|---|---|---|
| Data Leakage | Detect private data in output | Redact + log |
| Hallucination Check | Verify factual claims against source data | Flag for review |
| Toxicity | Detect harmful, biased, or inappropriate content | Block + regenerate |
| Format Validation | Ensure output matches expected schema | Retry with format instructions |
| Action Validation | Verify tool calls are within authorized scope | Block unauthorized actions |
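A rule-based Data Leakage guard can be sketched as output redaction over secret-shaped strings. The patterns shown are illustrative placeholders, not a complete secret-detection ruleset:

```python
import re

# Secret-shaped patterns; illustrative, not exhaustive.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),               # API-key-like token
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM header
]

def redact_output(text: str) -> tuple[str, bool]:
    """Redact secret-shaped strings from model output before delivery.

    Returns the cleaned text and whether anything was redacted,
    so each trigger can be logged for monitoring and tuning.
    """
    leaked = False
    for pattern in SECRET_PATTERNS:
        if pattern.search(text):
            leaked = True
            text = pattern.sub("[REDACTED]", text)
    return text, leaked
```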

Implementation Notes

  • Guardrails must be LOW LATENCY — they run on every request
  • Use specialized lightweight models or rule-based systems for speed
  • Log all guardrail triggers for monitoring and tuning
  • Guardrails complement evals — evals are after-the-fact, guardrails are real-time

Use AskUserQuestion to prioritize which guardrails to implement first based on the agent's risk profile.

Step 6: Generate Security Report

Compile all outputs into .specs/<spec-name>/agent-security.md:

Agent Security Audit: [System Name]

Executive Summary

Overall Risk: [Low / Medium / High / Critical]
Lethal Trifecta: [SAFE / AT RISK / VULNERABLE]
Immediate Actions Required: [Count]

Lethal Trifecta Analysis

[From Step 2]

Sandbox Assessment

[From Step 3]

Access Control

[From Step 4]

Guardrails

[From Step 5]

Priority Actions

| # | Action | Severity | Effort |
|---|---|---|---|
| 1 | [Action] | Critical | [Low/Med/High] |
| 2 | [Action] | High | [Low/Med/High] |

Step 7: Offer Next Steps

Use AskUserQuestion to offer:

  • Implement top-priority fix — start with the highest-severity action item

  • Full review — run agent:review to validate against all 22 patterns

  • Re-audit — run agent:secure again after implementing fixes

Arguments

  • <args> — optional spec name or path to agent code

  • <spec-name> — reads existing agent design from .specs/<spec-name>/

  • <path> — analyzes agent code at the given path

Examples:

  • agent:secure customer-support — audit the customer-support agent

  • agent:secure src/agents/ — audit agent code in the given directory
