security-suite

Composable security suite for binary and prompt-surface assurance, static analysis, dynamic tracing, repo-native redteam scans, contract capture, baseline drift, and policy gating. Triggers: "binary security", "reverse engineer binary", "black-box binary test", "behavioral trace", "baseline diff", "prompt redteam", "security suite".

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "security-suite" with this command: npx skills add boshu2/agentops/boshu2-agentops-security-suite

Security Suite

Purpose: Provide composable, repeatable security/internal-testing primitives for authorized binaries and repo-managed prompt surfaces.

This skill separates concerns into primitives so security workflows stay testable and reusable.

Guardrails

  • Use only on binaries you own or are explicitly authorized to assess.
  • Do not use this workflow to bypass legal restrictions or extract third-party proprietary content without authorization.
  • Prefer behavioral assurance and policy gating over ad-hoc one-off reverse-engineering.

Primitive Model

  1. collect-static — file metadata, runtime heuristics, linked libraries, embedded archive signatures.
  2. collect-dynamic — sandboxed execution trace (processes, file changes, network endpoints).
  3. collect-contract — machine-readable behavior contract from help-surface probing.
  4. compare-baseline — current vs baseline contract drift (added/removed commands, runtime change).
  5. enforce-policy — allowlist/denylist gates and severity-based verdict.
  6. collect-redteam — offline repo-surface attack-pack scan for prompt-injection, tool-misuse, secret-exfiltration, and unsafe-shell regressions.
  7. run — thin binary orchestrator that composes primitives and writes suite summary.

Quick Start

Single run (default dynamic command is --help):

python3 skills/security-suite/scripts/security_suite.py run \
  --binary "$(command -v ao)" \
  --out-dir .tmp/security-suite/ao-current

Baseline regression gate:

python3 skills/security-suite/scripts/security_suite.py run \
  --binary "$(command -v ao)" \
  --out-dir .tmp/security-suite/ao-current \
  --baseline-dir .tmp/security-suite/ao-baseline \
  --fail-on-removed

Policy gate:

python3 skills/security-suite/scripts/security_suite.py run \
  --binary "$(command -v ao)" \
  --out-dir .tmp/security-suite/ao-current \
  --policy-file skills/security-suite/references/policy-example.json \
  --fail-on-policy-fail

Repo-surface redteam:

python3 skills/security-suite/scripts/prompt_redteam.py scan \
  --repo-root . \
  --pack-file skills/security-suite/references/agentops-redteam-pack.json \
  --out-dir .tmp/security-suite-redteam

Recommended Workflow

  1. Capture baseline on known-good release.
  2. Run suite on candidate binary in CI.
  3. Compare against baseline and enforce policy.
  4. Block promotion on failing verdict.

Output Contract

All outputs are written under --out-dir:

  • static/static-analysis.json
  • dynamic/dynamic-analysis.json
  • contract/contract.json
  • compare/baseline-diff.json (when baseline supplied)
  • policy/policy-verdict.json (when policy supplied)
  • suite-summary.json
  • redteam/redteam-results.json (when repo-surface redteam is run)

This output structure is intentionally machine-consumable for CI gates.

Policy Model

Use skills/security-suite/references/policy-example.json as a starting point.

Supported checks:

  • required_top_level_commands
  • deny_command_patterns
  • max_created_files
  • forbid_file_path_patterns
  • allow_network_endpoint_patterns
  • deny_network_endpoint_patterns
  • block_if_removed_commands
  • min_command_count

Redteam Pack Model

Use agentops-redteam-pack.json as the starting point for offline repo-surface redteam checks.

Supported target fields:

  • globs
  • require_groups
  • forbidden_any
  • applies_if_any

Each case expresses a concrete adversarial prompt or operator-bypass attempt and binds it to one or more repo-owned files. The first shipped pack covers instruction precedence, context overexposure, destructive git misuse, security gate bypass, and unsafe shell or secret-handling regressions.

Technique Coverage

This suite is designed for broad binary classes, not just CLI metadata:

  • static runtime/library fingerprinting
  • sandboxed behavior observation
  • command/contract capture
  • drift classification
  • policy enforcement and CI verdicting
  • repo-surface redteam checks for prompt and operator-contract regressions

It is intentionally modular so you can add deeper primitives later (syscall tracing, SBOM attestation verification, fuzz harnesses) without rewriting the workflow.

Validation

Run:

bash skills/security-suite/scripts/validate.sh
bash tests/scripts/test-security-suite-redteam.sh

Smoke test (recommended):

python3 skills/security-suite/scripts/security_suite.py run \
  --binary "$(command -v ao)" \
  --out-dir .tmp/security-suite-smoke \
  --policy-file skills/security-suite/references/policy-example.json

Repo-surface smoke test:

python3 skills/security-suite/scripts/prompt_redteam.py scan \
  --repo-root . \
  --pack-file skills/security-suite/references/agentops-redteam-pack.json \
  --out-dir .tmp/security-suite-redteam-smoke

Examples

Scenario: Capture a Baseline and Gate a New Release

User says: /security-suite run --binary $(command -v ao) --out-dir .tmp/security-suite/ao-v2.4

What happens:

  1. The suite runs static analysis (file metadata, linked libraries, embedded archive signatures), dynamic tracing (sandboxed --help execution observing processes, file changes, network endpoints), and contract capture against the ao binary.
  2. It writes static/static-analysis.json, dynamic/dynamic-analysis.json, contract/contract.json, and suite-summary.json under the output directory.

Result: A complete baseline snapshot is captured for ao v2.4, ready to be used as --baseline-dir for future release comparisons.

Scenario: CI Regression Gate With Baseline and Policy

User says: /security-suite run --binary ./bin/ao-candidate --out-dir .tmp/ao-candidate --baseline-dir .tmp/security-suite/ao-v2.4 --policy-file skills/security-suite/references/policy-example.json --fail-on-removed --fail-on-policy-fail

What happens:

  1. The suite runs all three collection primitives on the candidate binary, then compares the resulting contract against the v2.4 baseline to produce compare/baseline-diff.json with any added, removed, or changed commands.
  2. It evaluates the policy file checks (required commands, denied patterns, network allowlists, file limits) and writes policy/policy-verdict.json with a pass/fail verdict.

Result: The suite exits non-zero if any commands were removed or a policy check failed, blocking the candidate from promotion in the CI pipeline.

Scenario: Offline Redteam the Repo's Prompt and Skill Surfaces

User says: /security-suite collect-redteam --repo-root .

What happens:

  1. The redteam scanner loads the attack pack from agentops-redteam-pack.json and evaluates repo-owned control surfaces against concrete attack cases.
  2. It writes redteam/redteam-results.json and redteam/redteam-results.md under the chosen output directory, then exits non-zero if a fail-severity case is not resisted.

Result: The repo gets a deterministic redteam verdict for prompt-injection, tool misuse, context overexposure, secret-handling, and unsafe-shell regressions without needing hosted model scanning.

Troubleshooting

ProblemCauseSolution
Suite exits non-zero with no clear finding--fail-on-removed or --fail-on-policy-fail triggered on a legitimate changeReview compare/baseline-diff.json and policy/policy-verdict.json to identify the specific delta, then update the baseline or policy file accordingly.
dynamic/dynamic-analysis.json is empty or minimalBinary requires arguments beyond --help, or sandbox blocked executionSupply a custom dynamic command if supported, or verify the binary runs in the sandboxed environment (check permissions, missing shared libraries).
contract/contract.json shows zero commandsThe binary does not expose a --help surface or uses a non-standard help flagVerify the binary supports --help; for binaries with unusual help interfaces, run collect-contract separately with the correct invocation.
Policy verdict fails on deny_command_patternsA new subcommand matches a deny regex in the policy fileEither rename the subcommand or update deny_command_patterns in your policy JSON to exclude the legitimate pattern.
baseline-diff.json not generated--baseline-dir was not provided or points to a missing directoryEnsure the baseline directory exists and contains a valid contract/contract.json from a prior run.
Redteam scan fails after a wording cleanupThe attack pack no longer matches the intended guardrail language in target filesReview redteam/redteam-results.json, confirm whether the control regressed or the regex is too brittle, then update the target file or the pack intentionally.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Security

security

No summary provided by upstream source.

Repository SourceNeeds Review
-155
boshu2
Automation

council

No summary provided by upstream source.

Repository SourceNeeds Review
-1.5K
boshu2
Automation

swarm

No summary provided by upstream source.

Repository SourceNeeds Review
-281
boshu2