Security Suite

Purpose: Provide composable, repeatable security/internal-testing primitives for authorized binaries and repo-managed prompt surfaces.

This skill separates concerns into primitives so security workflows stay testable and reusable.

Guardrails

Use only on binaries you own or are explicitly authorized to assess.
Do not use this workflow to bypass legal restrictions or extract third-party proprietary content without authorization.
Prefer behavioral assurance and policy gating over ad-hoc one-off reverse-engineering.

Primitive Model

collect-static — file metadata, runtime heuristics, linked libraries, embedded archive signatures.
collect-dynamic — sandboxed execution trace (processes, file changes, network endpoints).
collect-contract — machine-readable behavior contract from help-surface probing.
compare-baseline — current vs baseline contract drift (added/removed commands, runtime change).
enforce-policy — allowlist/denylist gates and severity-based verdict.
collect-redteam — offline repo-surface attack-pack scan for prompt-injection, tool-misuse, secret-exfiltration, and unsafe-shell regressions.
run — thin binary orchestrator that composes primitives and writes suite summary.

Quick Start

Single run (default dynamic command is --help):

python3 skills/security-suite/scripts/security_suite.py run \
  --binary "$(command -v ao)" \
  --out-dir .tmp/security-suite/ao-current

Baseline regression gate:

python3 skills/security-suite/scripts/security_suite.py run \
  --binary "$(command -v ao)" \
  --out-dir .tmp/security-suite/ao-current \
  --baseline-dir .tmp/security-suite/ao-baseline \
  --fail-on-removed

Policy gate:

python3 skills/security-suite/scripts/security_suite.py run \
  --binary "$(command -v ao)" \
  --out-dir .tmp/security-suite/ao-current \
  --policy-file skills/security-suite/references/policy-example.json \
  --fail-on-policy-fail

Repo-surface redteam:

python3 skills/security-suite/scripts/prompt_redteam.py scan \
  --repo-root . \
  --pack-file skills/security-suite/references/agentops-redteam-pack.json \
  --out-dir .tmp/security-suite-redteam

For OWASP Top 10 code-level review, see references/owasp-checklist.md.

Recommended Workflow

Capture baseline on known-good release.
Run suite on candidate binary in CI.
Compare against baseline and enforce policy.
Block promotion on failing verdict.

Output Contract

All outputs are written under --out-dir:

static/static-analysis.json
dynamic/dynamic-analysis.json
contract/contract.json
compare/baseline-diff.json (when baseline supplied)
policy/policy-verdict.json (when policy supplied)
suite-summary.json
redteam/redteam-results.json (when repo-surface redteam is run)

This output structure is intentionally machine-consumable for CI gates.

Policy Model

Use skills/security-suite/references/policy-example.json as a starting point.

Supported checks:

required_top_level_commands
deny_command_patterns
max_created_files
forbid_file_path_patterns
allow_network_endpoint_patterns
deny_network_endpoint_patterns
block_if_removed_commands
min_command_count

Redteam Pack Model

Use agentops-redteam-pack.json as the starting point for offline repo-surface redteam checks.

Supported target fields:

globs
require_groups
forbidden_any
applies_if_any

Each case expresses a concrete adversarial prompt or operator-bypass attempt and binds it to one or more repo-owned files. The first shipped pack covers instruction precedence, context overexposure, destructive git misuse, security gate bypass, and unsafe shell or secret-handling regressions.

Technique Coverage

This suite is designed for broad binary classes, not just CLI metadata:

static runtime/library fingerprinting
sandboxed behavior observation
command/contract capture
drift classification
policy enforcement and CI verdicting
repo-surface redteam checks for prompt and operator-contract regressions

It is intentionally modular so you can add deeper primitives later (syscall tracing, SBOM attestation verification, fuzz harnesses) without rewriting the workflow.

Validation

Run:

bash skills/security-suite/scripts/validate.sh
bash tests/scripts/test-security-suite-redteam.sh

Smoke test (recommended):

python3 skills/security-suite/scripts/security_suite.py run \
  --binary "$(command -v ao)" \
  --out-dir .tmp/security-suite-smoke \
  --policy-file skills/security-suite/references/policy-example.json

Repo-surface smoke test:

python3 skills/security-suite/scripts/prompt_redteam.py scan \
  --repo-root . \
  --pack-file skills/security-suite/references/agentops-redteam-pack.json \
  --out-dir .tmp/security-suite-redteam-smoke

Examples

Scenario: Capture a Baseline and Gate a New Release

User says: /security-suite run --binary $(command -v ao) --out-dir .tmp/security-suite/ao-v2.4

What happens:

The suite runs static analysis (file metadata, linked libraries, embedded archive signatures), dynamic tracing (sandboxed --help execution observing processes, file changes, network endpoints), and contract capture against the ao binary.
It writes static/static-analysis.json, dynamic/dynamic-analysis.json, contract/contract.json, and suite-summary.json under the output directory.

Result: A complete baseline snapshot is captured for ao v2.4, ready to be used as --baseline-dir for future release comparisons.

Scenario: CI Regression Gate With Baseline and Policy

User says: /security-suite run --binary ./bin/ao-candidate --out-dir .tmp/ao-candidate --baseline-dir .tmp/security-suite/ao-v2.4 --policy-file skills/security-suite/references/policy-example.json --fail-on-removed --fail-on-policy-fail

What happens:

The suite runs all three collection primitives on the candidate binary, then compares the resulting contract against the v2.4 baseline to produce compare/baseline-diff.json with any added, removed, or changed commands.
It evaluates the policy file checks (required commands, denied patterns, network allowlists, file limits) and writes policy/policy-verdict.json with a pass/fail verdict.

Result: The suite exits non-zero if any commands were removed or a policy check failed, blocking the candidate from promotion in the CI pipeline.

Scenario: Offline Redteam the Repo's Prompt and Skill Surfaces

User says: /security-suite collect-redteam --repo-root .

What happens:

The redteam scanner loads the attack pack from agentops-redteam-pack.json and evaluates repo-owned control surfaces against concrete attack cases.
It writes redteam/redteam-results.json and redteam/redteam-results.md under the chosen output directory, then exits non-zero if a fail-severity case is not resisted.

Result: The repo gets a deterministic redteam verdict for prompt-injection, tool misuse, context overexposure, secret-handling, and unsafe-shell regressions without needing hosted model scanning.

Troubleshooting

Problem	Cause	Solution
Suite exits non-zero with no clear finding	`--fail-on-removed` or `--fail-on-policy-fail` triggered on a legitimate change	Review `compare/baseline-diff.json` and `policy/policy-verdict.json` to identify the specific delta, then update the baseline or policy file accordingly.
`dynamic/dynamic-analysis.json` is empty or minimal	Binary requires arguments beyond `--help`, or sandbox blocked execution	Supply a custom dynamic command if supported, or verify the binary runs in the sandboxed environment (check permissions, missing shared libraries).
`contract/contract.json` shows zero commands	The binary does not expose a `--help` surface or uses a non-standard help flag	Verify the binary supports `--help`; for binaries with unusual help interfaces, run `collect-contract` separately with the correct invocation.
Policy verdict fails on `deny_command_patterns`	A new subcommand matches a deny regex in the policy file	Either rename the subcommand or update `deny_command_patterns` in your policy JSON to exclude the legitimate pattern.
`baseline-diff.json` not generated	`--baseline-dir` was not provided or points to a missing directory	Ensure the baseline directory exists and contains a valid `contract/contract.json` from a prior run.
Redteam scan fails after a wording cleanup	The attack pack no longer matches the intended guardrail language in target files	Review `redteam/redteam-results.json`, confirm whether the control regressed or the regex is too brittle, then update the target file or the pack intentionally.

security-suite

Safety Notice

Copy this and send it to your AI assistant to learn

Security Suite

Guardrails

Primitive Model

Quick Start

Recommended Workflow

Output Contract

Policy Model

Redteam Pack Model

Technique Coverage

Validation

Examples

Scenario: Capture a Baseline and Gate a New Release

Scenario: CI Regression Gate With Baseline and Policy

Scenario: Offline Redteam the Repo's Prompt and Skill Surfaces

Troubleshooting

Reference Documents

Source Transparency

Related Skills

security

council

swarm

bug-hunt