ai-red-teaming

Continuously test AI applications like an adversary to discover exploitable failure modes before attackers do.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "ai-red-teaming" with this command: npx skills add bagelhole/devops-security-agent-skills/bagelhole-devops-security-agent-skills-ai-red-teaming

AI Red Teaming

Continuously test AI applications like an adversary to discover exploitable failure modes before attackers do.

Program Design

  • Define threat scenarios: jailbreaks, policy evasion, prompt injection, model abuse.

  • Build reusable attack suites by domain (support bot, coding agent, RAG assistant).

  • Include multilingual and obfuscated attack prompts.

  • Track results in a risk register with severity and exploitability.

Test Categories

  • Jailbreak robustness: bypassing safety instructions.

  • Data exfiltration: extracting secrets, system prompts, tenant data.

  • Tool abuse: unauthorized API calls or command execution.

  • Social engineering: inducing unsafe business actions.

  • Availability abuse: token amplification and DoS-style prompts.

Exercise Cadence

  • Pre-release blocking red-team gate.

  • Monthly deep-dive campaigns.

  • Post-incident targeted retests.

Scoring Model

  • Likelihood (1-5)

  • Impact (1-5)

  • Detectability (1-5)

  • Control maturity (low/medium/high)

Use scores to prioritize fixes and define SLA for remediation.

Reporting Essentials

  • Reproducible prompt traces

  • Model/version and config used

  • Successful attack chain narrative

  • Recommended mitigations + verification steps

Related Skills

  • agent-evals - Convert findings into regression tests

  • prompt-injection-defense - Implement injection countermeasures

  • penetration-testing - Broader offensive security process

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Security

linux-administration

No summary provided by upstream source.

Repository SourceNeeds Review
Security

sops-encryption

No summary provided by upstream source.

Repository SourceNeeds Review
Security

linux-hardening

No summary provided by upstream source.

Repository SourceNeeds Review
Security

vpn-setup

No summary provided by upstream source.

Repository SourceNeeds Review