Validate Feature

Deploy the feature locally with DEBUG logging, run security scans and behavioral tests against live services, check CI/CD status, and verify OpenSpec spec compliance. Produces a structured validation report and posts it to the PR.

Arguments

$ARGUMENTS - OpenSpec change-id (required), optionally followed by flags:

--skip-e2e or --skip-playwright — skip the Playwright E2E phase
--skip-ci — skip the CI/CD status check
--skip-security — skip the Security Scan phase
--phase <name>[,<name>] — run only specified phases (e.g., --phase smoke,security)

Valid phase names: deploy, smoke, security, e2e, architecture, spec, logs, ci

Prerequisites

Feature branch openspec/<change-id> exists with implementation commits
Docker/docker-compose installed and running (for Deploy phase)
Approved OpenSpec proposal exists at openspec/changes/<change-id>/
Run /implement-feature first if no implementation exists

OpenSpec Execution Preference

Use OpenSpec-generated runtime assets first, then CLI fallback:

Claude: .claude/commands/opsx/*.md or .claude/skills/openspec-*/SKILL.md
Codex: .codex/skills/openspec-*/SKILL.md
Gemini: .gemini/commands/opsx/*.toml or .gemini/skills/openspec-*/SKILL.md
Fallback: direct openspec CLI commands

Coordinator Integration (Optional)

Use docs/coordination-detection-template.md as the shared detection preamble.

Detect transport and capability flags at skill start
Execute hooks only when the matching CAN_* flag is true
If coordinator is unavailable, continue with standalone behavior

Steps

0. Detect Coordinator and Recall Memory

At skill start, run the coordination detection preamble and set:

COORDINATOR_AVAILABLE
COORDINATION_TRANSPORT (mcp|http|none)
CAN_LOCK, CAN_QUEUE_WORK, CAN_HANDOFF, CAN_MEMORY, CAN_GUARDRAILS

If CAN_MEMORY=true, recall relevant validation history:

MCP path: recall
HTTP path: scripts/coordination_bridge.py try_recall(...)

On recall failure/unavailability, continue with validation and log informationally.

1. Determine Change ID and Configuration

# Parse change-id from argument or current branch
BRANCH=$(git branch --show-current)
CHANGE_ID=${ARGUMENTS%% --*}  # Everything before first flag
CHANGE_ID=${CHANGE_ID:-$(echo $BRANCH | sed 's/^openspec\///')}

# Detect worktree context and resolve OpenSpec path
eval "$(python3 scripts/worktree.py detect)"
PROJECT_ROOT="${MAIN_REPO:-$(git rev-parse --show-toplevel)}"

Parse flags from $ARGUMENTS:

--skip-e2e or --skip-playwright → set SKIP_E2E=true
--skip-ci → set SKIP_CI=true
--skip-security → set SKIP_SECURITY=true
--phase <names> → set PHASES to comma-separated list; only run those phases

If --phase is provided, only the listed phases execute. If --phase includes phases other than deploy, assume services are already running (skip deploy and teardown).

2. Verify Prerequisites

# Verify on feature branch
git branch --show-current  # Should be openspec/<change-id>

# Verify proposal exists
openspec show $CHANGE_ID

# Verify implementation commits exist
COMMIT_COUNT=$(git log --oneline main..HEAD | wc -l)
if [ "$COMMIT_COUNT" -eq 0 ]; then
  echo "ERROR: No implementation commits found on this branch."
  echo "Run /implement-feature $CHANGE_ID first."
  exit 1
fi

# Check Docker availability (only if Deploy phase will run)
if docker info > /dev/null 2>&1; then
  echo "Docker is available"
else
  echo "ERROR: Docker is not available. Install Docker Desktop or start the Docker daemon."
  echo "  macOS: brew install --cask docker"
  echo "  Linux: sudo systemctl start docker"
  exit 1
fi

If not on the feature branch, check out openspec/<change-id>. If no implementation commits exist, abort with guidance.

2.5. Prepare Validation Artifacts

Preferred path:

Use runtime-native verify/continue workflow (opsx:verify equivalent) for artifact guidance.

CLI fallback path:

openspec instructions validation-report --change "$CHANGE_ID"
openspec instructions architecture-impact --change "$CHANGE_ID"
openspec status --change "$CHANGE_ID"

Ensure validation-report.md and architecture-impact.md are updated in the change directory as part of this validation run.

3. Deploy Phase

Phase name: deploy Criticality: Critical (stops validation on failure)

# Find docker-compose file
COMPOSE_FILE=$(find "$PROJECT_ROOT" -maxdepth 2 -name "docker-compose.yml" | head -1)

if [ -z "$COMPOSE_FILE" ]; then
  echo "SKIP: No docker-compose.yml found. Skipping Deploy phase."
  echo "  Smoke tests will run against already-running services."
  DEPLOY_SKIPPED=true
else
  COMPOSE_DIR=$(dirname "$COMPOSE_FILE")
  LOG_FILE="/tmp/validate-feature-${CHANGE_ID}-$(date +%s).log"

  echo "Starting services with DEBUG logging..."
  echo "  Compose file: $COMPOSE_FILE"
  echo "  Log file: $LOG_FILE"

  # Start services with DEBUG logging, redirect output to log file
  AGENT_COORDINATOR_DB_PORT=${AGENT_COORDINATOR_DB_PORT:-54322} \
  AGENT_COORDINATOR_REST_PORT=${AGENT_COORDINATOR_REST_PORT:-3000} \
  AGENT_COORDINATOR_REALTIME_PORT=${AGENT_COORDINATOR_REALTIME_PORT:-4000} \
  LOG_LEVEL=DEBUG docker-compose -f "$COMPOSE_FILE" up -d 2>&1 | tee "$LOG_FILE"

  # Wait for health checks
  echo "Waiting for services to be healthy..."
  docker-compose -f "$COMPOSE_FILE" ps

  # Wait for PostgreSQL health check (up to 30 seconds)
  for i in $(seq 1 30); do
    if docker-compose -f "$COMPOSE_FILE" exec -T postgres pg_isready -U postgres > /dev/null 2>&1; then
      echo "PostgreSQL is ready"
      break
    fi
    sleep 1
  done

  # Wait for REST API (up to 15 seconds)
  for i in $(seq 1 15); do
    if curl -s http://localhost:${AGENT_COORDINATOR_REST_PORT:-3000}/ > /dev/null 2>&1; then
      echo "REST API is ready"
      break
    fi
    sleep 1
  done

  # Collect running container logs in background
  docker-compose -f "$COMPOSE_FILE" logs -f >> "$LOG_FILE" 2>&1 &
  LOG_PID=$!

  DEPLOY_RESULT="pass"
fi

If Deploy fails, report the failure with Docker logs and skip to Teardown.

4. Smoke Phase

Phase name: smoke Criticality: Critical (stops validation on failure)

Run the reusable pytest smoke test suite against the live services. The suite is configurable via environment variables so it works with any deployed HTTP API.

# Configure for the target API (adjust per project)
export API_BASE_URL="${API_BASE_URL:-http://localhost:8000}"
export API_HEALTH_ENDPOINT="${API_HEALTH_ENDPOINT:-/health}"
export API_READY_ENDPOINT="${API_READY_ENDPOINT:-/ready}"
export API_AUTH_HEADER="${API_AUTH_HEADER:-X-Admin-Key}"
export API_AUTH_VALUE="${API_AUTH_VALUE:-$ADMIN_API_KEY}"
export API_PROTECTED_ENDPOINT="${API_PROTECTED_ENDPOINT:-/api/v1/settings/prompts}"
export API_CORS_ORIGIN="${API_CORS_ORIGIN:-http://localhost:5173}"

# Run smoke tests
SKILL_DIR="$(git rev-parse --show-toplevel)/skills/validate-feature"
pytest "$SKILL_DIR/scripts/smoke_tests/" -v --tb=short 2>&1
SMOKE_EXIT=$?

if [ $SMOKE_EXIT -eq 0 ]; then
  SMOKE_RESULT="pass"
elif [ $SMOKE_EXIT -eq 5 ]; then
  # Exit code 5 = no tests collected (services not running, all skipped)
  SMOKE_RESULT="skip"
  echo "SKIP: Services not running — smoke tests auto-skipped"
else
  SMOKE_RESULT="fail"
  SMOKE_FAILED=true
fi

The smoke tests cover:

Health: Health and readiness endpoints respond with 2xx
Auth enforcement: No credentials → 401/403, valid credentials → 2xx, garbage credentials rejected
CORS: Preflight returns correct Access-Control-* headers (skipped if CORS not configured)
Error sanitization: Error responses don't leak filesystem paths, stack traces, internal IPs, or credentials
Security headers: Content-Type set correctly, Server header not overly detailed, no X-Powered-By

If Smoke fails (SMOKE_EXIT != 0 and != 5), stop validation and skip to Teardown.

5. Security Phase

Phase name: security Criticality: Non-critical (continues on failure)

Run security scanners (OWASP Dependency-Check and ZAP) against the live deployment using the existing security-review orchestrator.

# Skip if --skip-security flag was provided
if [ "$SKIP_SECURITY" = true ]; then
  echo "SKIP: Security phase skipped (--skip-security flag)"
  SECURITY_RESULT="skip"
else
  echo "Running security scans against live deployment..."

  # Invoke the security-review orchestrator with the live API target
  python3 skills/security-review/scripts/main.py \
    --repo . \
    --out-dir docs/security-review \
    --zap-target "http://localhost:${AGENT_COORDINATOR_REST_PORT:-3000}" \
    --change "$CHANGE_ID" \
    --allow-degraded-pass 2>&1
  SECURITY_EXIT=$?

  if [ $SECURITY_EXIT -eq 0 ]; then
    SECURITY_RESULT="pass"
    echo "Security: PASS — No threshold findings"
  elif [ $SECURITY_EXIT -eq 10 ]; then
    SECURITY_RESULT="fail"
    echo "Security: FAIL — Threshold findings detected"
  elif [ $SECURITY_EXIT -eq 11 ]; then
    SECURITY_RESULT="degraded"
    echo "Security: INCONCLUSIVE — Scanners degraded (check prerequisites)"
  else
    SECURITY_RESULT="fail"
    echo "Security: ERROR — Unexpected exit code $SECURITY_EXIT"
  fi
fi

The Security phase reuses the /security-review skill's scripts without requiring a separate invocation. The --allow-degraded-pass flag ensures missing prerequisites (Java, container runtime) degrade gracefully instead of blocking validation.

6. E2E Phase

Phase name: e2e Criticality: Non-critical (continues on failure)

# Skip if --skip-e2e flag was provided
if [ "$SKIP_E2E" = true ]; then
  echo "SKIP: E2E phase skipped (--skip-e2e flag)"
else
  # Check if pytest-playwright is installed
  if python -c "import playwright" 2>/dev/null; then
    PLAYWRIGHT_AVAILABLE=true
  else
    PLAYWRIGHT_AVAILABLE=false
  fi

  # Check if E2E tests exist
  E2E_DIR=$(find "$PROJECT_ROOT" -path "*/tests/e2e" -type d | head -1)

  if [ -z "$E2E_DIR" ]; then
    echo "SKIP: No tests/e2e/ directory found. Skipping E2E phase."
  elif [ "$PLAYWRIGHT_AVAILABLE" = false ]; then
    echo "SKIP: pytest-playwright not installed. To install:"
    echo "  pip install pytest-playwright"
    echo "  playwright install chromium"
  else
    echo "Running E2E tests from $E2E_DIR..."
    pytest "$E2E_DIR" -v --tb=short 2>&1
    E2E_EXIT=$?

    if [ $E2E_EXIT -eq 0 ]; then
      E2E_RESULT="pass"
    else
      E2E_RESULT="fail"
    fi
  fi
fi

6b. Architecture Diagnostics Phase

Phase name: architecture Criticality: Non-critical (continues on failure)

Run architecture flow validation against the changed files:

# Get changed files relative to main
CHANGED_FILES=$(git diff --name-only main...HEAD | tr '\n' ',')

if [ -f "scripts/validate_flows.py" ] && [ -f "docs/architecture-analysis/architecture.graph.json" ]; then
  echo "Running architecture validation on changed files..."
  python scripts/validate_flows.py \
    --graph docs/architecture-analysis/architecture.graph.json \
    --output docs/architecture-analysis/architecture.diagnostics.json \
    --files "$CHANGED_FILES" 2>&1
  ARCH_EXIT=$?

  if [ $ARCH_EXIT -eq 0 ]; then
    ARCH_RESULT="pass"
    ARCH_ERRORS=$(python -c "import json; d=json.load(open('docs/architecture-analysis/architecture.diagnostics.json')); print(d['summary']['errors'])" 2>/dev/null || echo 0)
    ARCH_WARNINGS=$(python -c "import json; d=json.load(open('docs/architecture-analysis/architecture.diagnostics.json')); print(d['summary']['warnings'])" 2>/dev/null || echo 0)
    if [ "$ARCH_ERRORS" -gt 0 ]; then
      ARCH_RESULT="fail"
    elif [ "$ARCH_WARNINGS" -gt 0 ]; then
      ARCH_RESULT="warn"
    fi
  else
    ARCH_RESULT="fail"
  fi
else
  echo "SKIP: Architecture validation not available (missing scripts or artifacts)"
  echo "  Run 'make architecture' to generate architecture artifacts"
  ARCH_RESULT="skip"
fi

Report architecture diagnostics including broken flows, missing test coverage, orphaned code, and disconnected endpoints.

7. Spec Compliance Phase

Phase name: spec Criticality: Non-critical (continues on failure)

Read OpenSpec spec deltas for the change and verify each scenario against the live system:

Read $OPENSPEC_PATH/changes/<change-id>/specs/ to find all spec delta files
For each delta file, extract #### Scenario: blocks
For each scenario, interpret the WHEN/THEN/AND clauses and verify against the live system:
- API scenarios: Make HTTP requests to the running service and verify responses
- MCP tool scenarios: Invoke MCP tools via the Python module and check results
- Database scenarios: Query PostgreSQL directly and verify state
- Configuration scenarios: Check file existence, content, or environment variables
Record pass/fail per scenario with details on any mismatches

Example verification pattern:

# For a scenario like:
# WHEN agent calls discover_agents(capability?, status?)
# THEN system returns array of {agent_id, agent_type, capabilities, status, ...}

import httpx
rest_port = os.environ.get("AGENT_COORDINATOR_REST_PORT", "3000")
response = httpx.get(f"http://localhost:{rest_port}/agent_sessions?status=eq.active")
assert response.status_code == 200
data = response.json()
# Verify response shape matches scenario expectations

Report results in a structured table:

Spec Compliance Results:
  ✓ Session Continuity > Agent writes handoff document
  ✓ Session Continuity > Agent reads previous handoff
  ✗ Agent Discovery > No matching agents: Expected empty array, got 500
  ✓ Heartbeat > Agent heartbeat updates timestamp

8. Log Analysis Phase

Phase name: logs Criticality: Non-critical (continues on failure)

Scan the collected log file for warning signs:

if [ -f "$LOG_FILE" ]; then
  echo "Analyzing logs: $LOG_FILE"
  echo "Log file size: $(wc -l < "$LOG_FILE") lines"

  # Count by severity
  WARNINGS=$(grep -c -i "WARNING" "$LOG_FILE" 2>/dev/null || echo 0)
  ERRORS=$(grep -c -i "ERROR" "$LOG_FILE" 2>/dev/null || echo 0)
  CRITICALS=$(grep -c -i "CRITICAL" "$LOG_FILE" 2>/dev/null || echo 0)

  # Check for specific patterns
  DEPRECATIONS=$(grep -c -i "deprecat" "$LOG_FILE" 2>/dev/null || echo 0)
  STACK_TRACES=$(grep -c "Traceback" "$LOG_FILE" 2>/dev/null || echo 0)
  UNHANDLED=$(grep -c "unhandled\|uncaught" "$LOG_FILE" 2>/dev/null || echo 0)

  echo "  Warnings: $WARNINGS"
  echo "  Errors: $ERRORS"
  echo "  Critical: $CRITICALS"
  echo "  Deprecations: $DEPRECATIONS"
  echo "  Stack traces: $STACK_TRACES"
  echo "  Unhandled exceptions: $UNHANDLED"

  # Show context for errors and critical entries
  if [ "$ERRORS" -gt 0 ] || [ "$CRITICALS" -gt 0 ]; then
    echo ""
    echo "Error/Critical entries with context:"
    grep -n -i -B2 -A2 "ERROR\|CRITICAL" "$LOG_FILE" | head -50
  fi

  # Show deprecation warnings
  if [ "$DEPRECATIONS" -gt 0 ]; then
    echo ""
    echo "Deprecation notices:"
    grep -n -i "deprecat" "$LOG_FILE" | head -20
  fi
else
  echo "SKIP: No log file available (Deploy phase was skipped or no services started)"
fi

Categorize findings by severity:

Critical: CRITICAL log entries, unhandled exceptions, stack traces
Warning: WARNING entries, deprecation notices
Info: Unusual patterns, high log volume from specific components

9. CI/CD Status Phase

Phase name: ci Criticality: Non-critical (continues on failure)

# Skip if --skip-ci flag was provided
if [ "$SKIP_CI" = true ]; then
  echo "SKIP: CI/CD check skipped (--skip-ci flag)"
else
  # Check if GitHub remote is configured
  if git remote get-url origin > /dev/null 2>&1; then
    # Check if PR exists for this branch
    PR_URL=$(gh pr view "openspec/$CHANGE_ID" --json url --jq '.url' 2>/dev/null)

    if [ -n "$PR_URL" ]; then
      echo "PR found: $PR_URL"
      echo ""
      echo "CI/CD Check Status:"
      gh pr checks "openspec/$CHANGE_ID" 2>/dev/null || echo "  No CI checks configured yet"
    else
      echo "No PR found for openspec/$CHANGE_ID"
      echo "Checking latest workflow runs..."
      gh run list --branch "openspec/$CHANGE_ID" --limit 3 2>/dev/null || echo "  No workflow runs found"
    fi
  else
    echo "SKIP: No GitHub remote configured"
  fi
fi

10. Teardown

Stop services and clean up:

# Only teardown if we started services (Deploy phase ran)
if [ "$DEPLOY_SKIPPED" != true ] && [ -n "$COMPOSE_FILE" ]; then
  echo "Stopping services..."

  # Stop background log collection
  if [ -n "$LOG_PID" ]; then
    kill $LOG_PID 2>/dev/null
  fi

  # Stop docker-compose services
  docker-compose -f "$COMPOSE_FILE" down

  echo "Services stopped"
fi

# Handle log file
if [ -f "$LOG_FILE" ]; then
  if [ "$ALL_PHASES_PASSED" = true ]; then
    rm "$LOG_FILE"
    echo "Log file removed (all phases passed)"
  else
    echo "Log file preserved for inspection: $LOG_FILE"
  fi
fi

11. Validation Report

Produce a structured summary of all phases:

## Validation Report: <change-id>

**Date**: YYYY-MM-DD HH:MM:SS
**Commit**: <short SHA>
**Branch**: openspec/<change-id>

### Phase Results

✓ Deploy: Services started (N containers, DEBUG logging enabled)
✓ Smoke: All health checks passed (API, MCP, database)
✓ Security: PASS — No threshold findings (dependency-check: ok, zap: ok)
✗ E2E: 3/5 tests passed, 2 failures
  - test_login_flow: TimeoutError on /api/auth
  - test_dashboard_load: Element not found: #stats-panel
✓ Architecture: No broken flows (2 warnings: orphaned functions)
✓ Spec Compliance: 8/8 scenarios verified
⚠ Log Analysis: 3 warnings found
  - [WARNING] Deprecated function call: old_api_handler (line 142)
✓ CI/CD: All checks passing

### Result

**PASS** — Ready for `/cleanup-feature <change-id>`

_or_

**FAIL** — Address findings, then re-run `/validate-feature` or `/iterate-on-implementation`

Use these symbols:

✓ — Phase passed
✗ — Phase failed
⚠ — Phase passed with warnings
○ — Phase skipped

12. Persist Report

Write the validation report to the OpenSpec change directory:

REPORT_FILE="$OPENSPEC_PATH/changes/$CHANGE_ID/validation-report.md"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
COMMIT_SHA=$(git rev-parse --short HEAD)

# Write report (overwrites previous)
cat > "$REPORT_FILE" << EOF
# Validation Report: $CHANGE_ID

**Date**: $TIMESTAMP
**Commit**: $COMMIT_SHA
**Branch**: openspec/$CHANGE_ID

## Phase Results

<phase results from Step 10>

## Result

<PASS or FAIL with guidance>
EOF

echo "Report written to: $REPORT_FILE"

13. PR Comment

Post the validation report as a PR comment:

PR_NUMBER=$(gh pr view "openspec/$CHANGE_ID" --json number --jq '.number' 2>/dev/null)

if [ -n "$PR_NUMBER" ]; then
  gh pr comment "$PR_NUMBER" --body "$(cat <<EOF
## 🔍 Automated Validation Report

<contents of validation report from Step 10>

---
_Generated by \`/validate-feature $CHANGE_ID\` at $TIMESTAMP_
EOF
)"
  echo "Report posted to PR #$PR_NUMBER"
else
  echo "SKIP: No PR found for openspec/$CHANGE_ID — report not posted"
  echo "  Create a PR first, then re-run to post the report"
fi

After Validation

If all phases PASS:

Ready for cleanup:
/cleanup-feature <change-id>

If phases FAIL:

Option 1: Fix findings and re-validate:
/iterate-on-implementation <change-id>
/validate-feature <change-id>

Option 2: Re-run specific failing phases:
/validate-feature <change-id> --phase smoke,spec

Option 3: Skip non-critical failures and proceed:
/cleanup-feature <change-id>

Present the validation report and let the user decide the next step.

Output

Validation report printed to console
Report persisted to openspec/changes/<change-id>/validation-report.md
Report posted as PR comment (if PR exists)
Services cleaned up (if Deploy phase ran)
Log file preserved (if failures occurred) or removed (if all passed)

If CAN_MEMORY=true, remember validation outcomes (phase pass/fail, key regressions, and next actions):

MCP path: remember
HTTP path: scripts/coordination_bridge.py try_remember(...)

Next Step

After validation passes:

/cleanup-feature <change-id>

linear-validate-feature

Safety Notice

Copy this and send it to your AI assistant to learn