Testing Expected Results
Run real commands and verify they produce the ACTUAL side effects and outputs you expect - not just "exit code 0." This catches the dangerous cases where commands "succeed" without doing what they claim.
When to use me
Use this skill when:
- A command returns 0 but you're not sure it actually worked
- You need to verify side effects (files created, data changed, services running)
- Exit code checking gives false confidence
- "It ran without error" isn't enough proof
- Commands have complex side effects across multiple systems
- You're debugging "why did the deploy succeed but the app is down?"
What I do
1. Capture Pre-State
Before running the command, capture:
- Filesystem state (files, directories, permissions)
- Database state (records, schema)
- Process state (running services)
- Network state (ports, connections)
- Environment variables
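The filesystem part of pre-state capture can be sketched with GNU find. The directory and file below are throwaway examples, not real targets:

```shell
# Create a stand-in directory to snapshot
rm -rf /tmp/verify_demo && mkdir -p /tmp/verify_demo/data
echo "hello" > /tmp/verify_demo/data/a.txt

# One line per file: path, size, mtime -- sorted so later diffs are stable
find /tmp/verify_demo/data -type f -printf '%p %s %T@\n' | sort \
  > /tmp/verify_demo/pre_state.txt
```

The same idea extends to the other categories: dump process lists, open ports, and relevant environment variables to files before the command runs.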
2. Run the Command
Execute the actual command with:
- Timeout protection
- Resource limits
- Security sandboxing
- Output capture (stdout, stderr)
- Exit code capture
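A minimal sketch of guarded execution using coreutils `timeout`, with stdout, stderr, and the exit code captured separately (the echo command stands in for the real command under test):

```shell
mkdir -p /tmp/verify_run

# Kill the command if it runs longer than 30 seconds; timeout itself
# exits 124 when it had to kill the command
timeout 30 sh -c 'echo "backup complete"; echo "warn: slow disk" >&2' \
  > /tmp/verify_run/stdout.txt 2> /tmp/verify_run/stderr.txt
echo "$?" > /tmp/verify_run/exit_code.txt
```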
3. Capture Post-State
After the command completes, capture the same state.
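Steps 1-3 together reduce to snapshot, run, snapshot, diff. A minimal illustration (GNU find; /tmp/verify_diff is a throwaway example, and the echo stands in for the command under test):

```shell
rm -rf /tmp/verify_diff && mkdir -p /tmp/verify_diff/data
echo "old" > /tmp/verify_diff/data/existing.txt

# Pre-state snapshot
find /tmp/verify_diff/data -type f -printf '%p %s\n' | sort \
  > /tmp/verify_diff/pre.txt

echo "new" > /tmp/verify_diff/data/created.txt   # the "command"

# Post-state snapshot
find /tmp/verify_diff/data -type f -printf '%p %s\n' | sort \
  > /tmp/verify_diff/post.txt

# Lines prefixed '>' were created/changed, '<' were removed.
# diff exits 1 when the files differ, so swallow that status.
diff /tmp/verify_diff/pre.txt /tmp/verify_diff/post.txt \
  > /tmp/verify_diff/changes.txt || true
```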
4. Smart Comparison
Compare actual results against expected results, choosing the right strategy for the data:
- Exact match - For deterministic output
- Pattern match - For variable content (timestamps, UUIDs)
- Range match - For numeric values (response time, file size)
- Structure match - For JSON/XML (ignore key order)
- Semantic match - For content meaning (not just bytes)
- Existence check - For "should exist" / "should not exist"
- Delta check - For "should have changed by X"
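A few of these strategies can be sketched as shell predicates. The values are illustrative stand-ins for captured output:

```shell
actual_status="completed at 2024-03-08T10:30:15Z"
actual_ms=142

# Exact match -- only safe for fully deterministic output
[ "$actual_ms" = "142" ] && echo "exact: pass"

# Pattern match -- tolerate the variable timestamp
echo "$actual_status" \
  | grep -Eq 'completed at [0-9]{4}-[0-9]{2}-[0-9]{2}T' \
  && echo "pattern: pass"

# Range match -- numeric value within bounds
[ "$actual_ms" -ge 50 ] && [ "$actual_ms" -le 500 ] && echo "range: pass"
```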
5. Side Effect Verification
Verify specific side effects:
- Filesystem - File created/modified/deleted, permissions changed
- Database - Records inserted/updated, schema migrated
- Processes - Service started/stopped/restarted
- Network - Port bound, connection made, API called
- External - Cloud resources created, messages queued
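A filesystem side-effect check might look like this sketch, using GNU stat to confirm the file exists, is non-empty, and was modified after the command started (the path and "command" are stand-ins):

```shell
start_time=$(date +%s)
echo "backup payload" > /tmp/verify_effect.tar   # stand-in side effect

[ -f /tmp/verify_effect.tar ]      || { echo "missing"; exit 1; }
size=$(stat -c %s /tmp/verify_effect.tar)
mtime=$(stat -c %Y /tmp/verify_effect.tar)
[ "$size" -gt 0 ]                  || { echo "empty"; exit 1; }
[ "$mtime" -ge "$start_time" ]     || { echo "stale file"; exit 1; }
echo "side effect verified"
```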
6. Async/Delayed Effect Handling
For commands with eventual consistency:
- Poll with configurable intervals
- Wait for specific conditions
- Timeout handling
- Retry logic
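These four pieces combine into a polling loop with a deadline. Here a background sleep stands in for a slow, eventually-consistent effect:

```shell
rm -f /tmp/verify_async.done
( sleep 2; touch /tmp/verify_async.done ) &   # delayed "side effect"

deadline=$(( $(date +%s) + 30 ))              # overall timeout
until [ -f /tmp/verify_async.done ]; do
  if [ "$(date +%s)" -ge "$deadline" ]; then
    echo "timed out waiting for effect"; exit 1
  fi
  sleep 1                                     # poll interval
done
echo "async effect observed"
```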
Examples
# Verify a backup actually created a valid backup
bash scripts/verify.sh \
  --command "./backup.sh --source=/data --dest=/backups" \
  --expected 'file_exists:/backups/backup-$(date +%Y%m%d).tar.gz' \
  --expected 'file_size:>100MB' \
  --expected 'file_integrity:sha256' \
  --negative 'file_modified:/data' \
  --timeout 300

# Verify a deployment actually started the service
bash scripts/verify.sh \
  --command "./deploy.sh --version=v2.0.0" \
  --expected 'process_running:my-service' \
  --expected 'port_listening:8080' \
  --expected 'http_healthy:http://localhost:8080/health' \
  --poll-interval 5 --timeout 120

# Verify a database migration actually changed the schema
bash scripts/verify.sh \
  --command "./migrate.sh up" \
  --expected 'db_table_exists:new_table' \
  --expected 'db_column_exists:new_table.new_column' \
  --expected 'db_constraint:unique_on_email' \
  --db-connection "postgresql://localhost/mydb"

# Verify an export actually produced correct data
bash scripts/verify.sh \
  --command "./export.sh --format=csv --output=/exports/users.csv" \
  --expected 'file_exists:/exports/users.csv' \
  --expected 'file_contains:"user_id,email,name"' \
  --expected 'line_count:>1000' \
  --expected 'csv_valid:yes' \
  --negative 'file_contains:ERROR'

# Verify negative side effects (what shouldn't happen)
bash scripts/verify.sh \
  --command "./cleanup.sh --days=30" \
  --expected 'file_deleted:/tmp/old_stuff' \
  --negative 'file_exists:/important/data' \
  --negative 'file_deleted:/critical/config'
Verification Types
Filesystem Effects
file_exists:
  path: /path/to/file
  optional:
    - min_size: 100MB        # File must be at least this big
    - max_size: 1GB          # File must be at most this big
    - permissions: 644       # Specific permissions
    - owner: appuser         # Specific owner
    - modified_after: now    # Modified after command started
    - content_type: text     # MIME type or magic number

file_contains:
  path: /path/to/file
  pattern: "string or regex"
  optional:
    - count: 1               # Must appear exactly N times
    - line_number: 5         # Must be on a specific line

file_hash:
  path: /path/to/file
  algorithm: sha256          # sha256, md5, sha512
  expected: abc123...        # Expected hash; omit to only record the hash

directory_structure:
  path: /path/to/dir
  expected: |
    dir/
    dir/file1.txt
    dir/subdir/
    dir/subdir/file2.txt
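The directory_structure check could be implemented roughly like this: list the actual tree, sort it, and diff against the expected listing (example tree under /tmp):

```shell
rm -rf /tmp/verify_tree && mkdir -p /tmp/verify_tree/dir/subdir
touch /tmp/verify_tree/dir/file1.txt /tmp/verify_tree/dir/subdir/file2.txt

# Expected listing: relative paths, one per line, sorted
printf '%s\n' dir dir/file1.txt dir/subdir dir/subdir/file2.txt \
  > /tmp/verify_tree/expected.txt

# Actual listing, produced the same way
( cd /tmp/verify_tree && find dir | sort ) > /tmp/verify_tree/actual.txt

diff /tmp/verify_tree/expected.txt /tmp/verify_tree/actual.txt \
  && echo "structure matches"
```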
Database Effects
db_table_exists:
  name: users
  connection: ${DB_URL}

db_column_exists:
  table: users
  column: email
  type: varchar(255)
  nullable: false

db_row_count:
  table: users
  where: "created_at > NOW() - INTERVAL '1 day'"
  expected: 100
  tolerance: +/- 10          # Allow 90-110

db_query_result:
  query: "SELECT COUNT(*) FROM users WHERE active = true"
  expected: "> 1000"
Process Effects
process_running:
  name: my-service
  optional:
    - user: appuser
    - cpu_percent: < 50
    - memory_mb: < 1024
    - uptime_seconds: > 60

port_listening:
  port: 8080
  protocol: tcp              # tcp, udp
  optional:
    - interface: 0.0.0.0     # Specific bind address
    - process_name: app      # Must be owned by this process
Network Effects
http_request:
  url: http://localhost:8080/health
  method: GET
  expected_status: 200
  optional:
    - timeout: 5
    - expected_body: '{"status": "healthy"}'
    - expected_headers: 'Content-Type: application/json'
    - retry: 3

tcp_connect:
  host: localhost
  port: 5432
  timeout: 5
Content Verification
csv_valid:
  file: /path/to/file.csv
  expected_columns: id,name,email
  row_count: "> 100"

json_valid:
  file: /path/to/file.json
  schema: /path/to/schema.json   # JSON Schema validation
  required_paths:
    - $.status
    - $.data.users[0].name
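A naive csv_valid-style check can be sketched with awk: confirm the header and require every row to have the same field count. This assumes no quoted commas, which a real CSV parser would handle:

```shell
cat > /tmp/verify_users.csv <<'EOF'
user_id,email,name
1,a@example.com,Alice
2,b@example.com,Bob
EOF

# Header must match exactly
head -n1 /tmp/verify_users.csv | grep -qx 'user_id,email,name' \
  || { echo "bad header"; exit 1; }

# Every row must have as many fields as the header
awk -F, 'NR==1 { n=NF } NF!=n { bad=1 } END { exit bad }' \
  /tmp/verify_users.csv && echo "csv ok"
```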
Comparison Strategies
Handling Non-Determinism
Timestamps:
  # Match any ISO 8601 timestamp
  --expected 'file_contains:{{TIMESTAMP}}'
  # Match a timestamp within a range
  --expected 'file_contains:{{TIMESTAMP_RANGE:2024-01-01,2024-12-31}}'

UUIDs:
  # Match any UUID
  --expected 'file_contains:{{UUID}}'
  # Match the UUID pattern and validate it
  --expected 'file_contains:{{UUID_FORMAT}}'
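The {{UUID}} and {{TIMESTAMP}} placeholders presumably expand to regexes along these lines (the exact patterns are assumptions, shown here applied with grep -E):

```shell
line='job 1a2b3c4d-1111-2222-3333-444455556666 finished 2024-03-08T10:30:15Z'

uuid_re='[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
ts_re='[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z'

echo "$line" | grep -Eq "$uuid_re" && echo "uuid matched"
echo "$line" | grep -Eq "$ts_re"   && echo "timestamp matched"
```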
Order-Independent:
  # For JSON arrays, sets, etc.
  --expected 'json_path:$.data.items contains [1,2,3] (any order)'
Partial Matching
# File must contain ALL of these patterns
--expected 'file_contains_all:["success", "completed", "exit 0"]'

# File must contain AT LEAST ONE of these
--expected 'file_contains_any:["success", "done", "finished"]'

# File must contain a pattern EXACTLY N times
--expected 'file_contains:"ERROR" count:0'   # No errors
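One subtlety when counting matches in shell: grep -c counts matching LINES, not occurrences. A sketch on a throwaway log file:

```shell
printf 'ok\nERROR one ERROR two\nok\n' > /tmp/verify_log.txt

lines=$(grep -c 'ERROR' /tmp/verify_log.txt)               # 1 matching line
occurrences=$(grep -o 'ERROR' /tmp/verify_log.txt | wc -l) # 2 occurrences

# A count:0 assertion -- grep -o emits nothing when there is no match
fatal=$(grep -o 'FATAL' /tmp/verify_log.txt | wc -l)
[ "$fatal" -eq 0 ] && echo "no FATAL entries"
echo "lines=$lines occurrences=$occurrences"
```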
Security
Sandboxing:
  # Run in a container
  bash scripts/verify.sh --sandbox container ...
  # Run with limited permissions
  bash scripts/verify.sh --sandbox chroot --chroot-dir /tmp/sandbox ...
  # Resource limits
  bash scripts/verify.sh --max-memory 1GB --max-cpu 50% --timeout 300 ...

Secret Masking:
  # Automatically mask common secret patterns in output
  bash scripts/verify.sh --mask-secrets ...
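Secret masking could be approximated with sed before output is logged. The patterns below are illustrative only, not an exhaustive secret detector:

```shell
# Replace the value of key=value pairs for secret-looking keys
mask() {
  sed -E 's/(password|token|secret)=[^[:space:]]+/\1=****/g'
}

echo 'connecting with password=hunter2 token=abc123' | mask
# prints: connecting with password=**** token=****
```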
Output Format
Verification Report
===================
Command: ./backup.sh --source=/data --dest=/backups
Exit Code: 0
Duration: 45.2s

Pre-State Captured:
  Files: 1,247
  Database tables: 23
  Processes: 12

Post-State Captured:
  Files: 1,248 (+1)
  Database tables: 23 (unchanged)
  Processes: 12 (unchanged)

Expected Results Verification:
  ✅ file_exists:/backups/backup-20240308.tar.gz
     - Path exists: yes
     - Size: 1.2GB (expected: >100MB) ✅
     - Permissions: 644 ✅
     - Created: 2024-03-08T10:30:15Z (after command start) ✅
     - Hash (sha256): abc123... ✅
  ❌ file_integrity (custom check)
     - Can extract archive: yes
     - Can restore from backup: FAILED
     - Error: "table users has wrong schema version"

Negative Assertions:
  ✅ file_modified:/data - No changes detected
  ✅ file_deleted:/important - No deletions detected

Async Effects:
  ✅ service_health (after 30s polling)
     - Service responsive: yes
     - Health check passed: yes

Result: FAILED

Discrepancy Analysis:
  The backup file was created with the correct size and permissions,
  but the integrity check reveals it cannot be restored. The schema
  version mismatch suggests the backup captured incompatible data.

Recommendations:
  1. Run schema migration before backup
  2. Add a schema version check to the backup script
  3. Include a test restore in backup verification

Timing:
  Pre-state capture: 0.5s
  Command execution: 42.1s
  Post-state capture: 0.4s
  Verification: 2.2s
  Total: 45.2s
Limitations
What we CAN'T verify:
- In-memory state changes (caches, variables)
- Browser/client-side state
- Side effects in systems we can't access
- Changes that happen after verification timeout
- Non-deterministic behavior that changes between runs
- Effects in distributed systems with eventual consistency (we can poll but may miss)
Known Issues:
- Time-of-check-time-of-use (TOCTOU) race conditions
- State changes between capture and verification
- Verification is only as good as the expected results definition
Trust But Verify
This skill implements "trust but verify" - we trust the command ran, but we verify it did what it claimed. Always remember:
- Exit code 0 doesn't mean success
- Success doesn't mean correctness
- Correctness doesn't mean safety
- Safety doesn't mean completeness
Use multiple verification layers for critical operations.
Notes
- Verification adds overhead. Use selectively for critical commands.
- Define expected results exhaustively - partial verification gives false confidence.
- Include negative assertions (what shouldn't happen) alongside positive ones.
- For long-running commands, use async verification with polling.
- Always include timeout to prevent hanging on failed commands.
- Use semantic matchers (structure, pattern) over exact string comparison when possible.
- Document in your expected results WHY each check matters - future you will thank you.