Evidence Verification

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "evidence-verification" with this command: npx skills add yonatangross/orchestkit/yonatangross-orchestkit-evidence-verification

Ensures all claims are backed by executable proof: test results, coverage metrics, build success, and deployment verification.

Key Principle: Show, don't tell. No task is complete without verifiable evidence.

Overview

Auto-Activate Triggers

  • Completing code implementation

  • Finishing code review

  • Marking tasks complete in Squad mode

  • Before agent handoff

  • Production deployment verification

Manual Activation

  • When user requests "verify this works"

  • Before creating pull requests

  • During quality assurance reviews

  • When troubleshooting failures

Core Concepts

  1. Evidence Types

Test Evidence

  • Exit code (must be 0 for success)

  • Test suite results (passed/failed/skipped)

  • Coverage percentage (if available)

  • Test duration

Build Evidence

  • Build exit code (0 = success)

  • Compilation errors/warnings

  • Build artifacts created

  • Build duration

Deployment Evidence

  • Deployment status (success/failed)

  • Environment deployed to

  • Health check results

  • Rollback capability verified

Code Quality Evidence

  • Linter results (errors/warnings)

  • Type checker results

  • Security scan results

  • Accessibility audit results
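
The four evidence types above share a common core (a command, an exit code, a duration, a timestamp) plus type-specific details. A minimal sketch of one possible record shape follows; the names `Evidence` and `EvidenceKind` and all field names are illustrative assumptions, since the skill does not prescribe a schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class EvidenceKind(Enum):
    TEST = "test"
    BUILD = "build"
    DEPLOYMENT = "deployment"
    CODE_QUALITY = "code_quality"


@dataclass
class Evidence:
    """One verification result. Field names are hypothetical, not part of the skill."""

    kind: EvidenceKind
    command: str
    exit_code: int
    duration_seconds: float
    # Type-specific extras, e.g. coverage, artifacts, health check results.
    details: dict = field(default_factory=dict)
    # Recorded in UTC at creation time.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    @property
    def passed(self) -> bool:
        # Exit code 0 is the only value treated as success.
        return self.exit_code == 0
```

For example, `Evidence(EvidenceKind.TEST, "npm test", 0, 4.2, {"coverage": 89}).passed` evaluates to `True`.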

  2. Evidence Collection Protocol

Evidence Collection Steps

  1. Identify Verification Points

    • What needs to be proven?
    • What could go wrong?
    • What does "complete" mean?
  2. Execute Verification

    • Run tests
    • Run build
    • Run linters
    • Check deployments
  3. Capture Results

    • Record exit codes
    • Save output snippets
    • Note timestamps
    • Document environment
  4. Store Evidence

    • Add to shared context
    • Reference in task completion
    • Link to artifacts
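
The execute and capture steps above can be sketched with the standard library. This is one possible approach, not the skill's own implementation; the function name `collect_evidence` and the keys of the returned dictionary are assumptions.

```python
import platform
import subprocess
import sys
import time
from datetime import datetime, timezone


def collect_evidence(command: list[str], snippet_lines: int = 10) -> dict:
    """Run a command and capture exit code, output snippet, timestamp, and environment."""
    started = time.monotonic()
    result = subprocess.run(command, capture_output=True, text=True)
    duration = time.monotonic() - started
    output = (result.stdout + result.stderr).splitlines()
    return {
        "command": " ".join(command),
        "exit_code": result.returncode,
        "passed": result.returncode == 0,  # 0 = pass, non-zero = fail
        "duration_seconds": round(duration, 3),
        "output_snippet": output[:snippet_lines],
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "environment": f"Python {platform.python_version()}, {platform.system()}",
    }


if __name__ == "__main__":
    # Stand-in command; in a real project this might be ["npm", "test"] or ["pytest"].
    print(collect_evidence([sys.executable, "-c", "print('hello')"]))
```

Passing the command as a list avoids shell quoting issues. The record is built the same way whether the command passes or fails, so a failing run still leaves evidence behind.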

  3. Verification Standards

Minimum Evidence Requirements:

  • ✅ At least ONE verification type executed

  • ✅ Exit code captured (0 = pass, non-zero = fail)

  • ✅ Timestamp recorded

  • ✅ Evidence stored in context

Production-Grade Requirements:

  • ✅ Tests run with exit code 0

  • ✅ Coverage ≥70% (or project standard)

  • ✅ Build succeeds with exit code 0

  • ✅ No critical linter errors

  • ✅ Security scan passes

Evidence Collection Templates

Template 1: Test Evidence

Use this template when running tests:

Test Evidence

Command: npm test (or equivalent)
Exit Code: 0 ✅ / non-zero ❌
Duration: X seconds
Results:

  • Tests passed: X
  • Tests failed: X
  • Tests skipped: X
  • Coverage: X%

Output Snippet:

[First 10 lines of test output]

Timestamp: YYYY-MM-DD HH:MM:SS
Environment: Node vX.X.X, OS, etc.
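
Filling in Template 1 by hand invites transcription errors, so it may help to render it from captured values. The sketch below assumes a hypothetical record dictionary; the key names are illustrative and not defined by the skill.

```python
def format_test_evidence(ev: dict, snippet_lines: int = 10) -> str:
    """Render a test-evidence record in the layout of Template 1."""
    exit_code = ev["exit_code"]
    status = "0 ✅" if exit_code == 0 else f"{exit_code} ❌"
    lines = [
        "Test Evidence",
        f"Command: {ev['command']}",
        f"Exit Code: {status}",
        f"Duration: {ev['duration_seconds']} seconds",
        "Results:",
        f"  • Tests passed: {ev['passed']}",
        f"  • Tests failed: {ev['failed']}",
        f"  • Tests skipped: {ev['skipped']}",
        f"  • Coverage: {ev['coverage']}%",
        "Output Snippet:",
        # Only the first few lines of output are kept, as the template suggests.
        *ev["output"][:snippet_lines],
        f"Timestamp: {ev['timestamp']}",
        f"Environment: {ev['environment']}",
    ]
    return "\n".join(lines)
```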

Template 2: Build Evidence

Use this template when building:

Build Evidence

Command: npm run build (or equivalent)
Exit Code: 0 ✅ / non-zero ❌
Duration: X seconds
Artifacts Created:

  • dist/bundle.js (245 KB)
  • dist/styles.css (18 KB)

Errors: X
Warnings: X

Output Snippet:

[First 10 lines of build output]

Timestamp: YYYY-MM-DD HH:MM:SS

Template 3: Code Quality Evidence

Use this template for linting and type checking:

Code Quality Evidence

Linter: ESLint / Ruff / etc.
Command: npm run lint
Exit Code: 0 ✅ / non-zero ❌
Errors: X
Warnings: X

Type Checker: TypeScript / mypy / etc.
Command: npm run typecheck
Exit Code: 0 ✅ / non-zero ❌
Type Errors: X

Timestamp: YYYY-MM-DD HH:MM:SS

Quality Standards

Minimum Acceptable

  • ✅ Tests executed with captured exit code
  • ✅ Timestamp recorded
  • ✅ Evidence stored in context

Production-Grade

  • ✅ Tests pass (exit code 0)
  • ✅ Coverage ≥70% (or project standard)
  • ✅ Build succeeds (exit code 0)
  • ✅ No critical linter errors
  • ✅ Type checker passes
  • ✅ Security scan shows no critical issues

Gold Standard

  • ✅ All production-grade requirements
  • ✅ Coverage ≥80%
  • ✅ No linter warnings
  • ✅ Performance benchmarks within thresholds
  • ✅ Accessibility audit passes (WCAG 2.1 AA)
  • ✅ Integration tests pass
  • ✅ Deployment verification complete
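
The three tiers above can be checked mechanically once the results are collected. The following is a sketch under stated assumptions: the dictionary keys are hypothetical, each tier is treated as including the one below it, and the coverage thresholds are parameters so a project standard can replace the defaults.

```python
def quality_tier(ev: dict, min_coverage: float = 70.0,
                 gold_coverage: float = 80.0) -> str:
    """Return 'gold', 'production', 'minimum', or 'insufficient'."""
    minimum = (
        ev.get("tests_exit_code") is not None  # tests ran and the code was captured
        and bool(ev.get("timestamp"))
        and bool(ev.get("stored_in_context"))
    )
    if not minimum:
        return "insufficient"

    production = (
        ev["tests_exit_code"] == 0
        and ev.get("coverage", 0) >= min_coverage
        and ev.get("build_exit_code") == 0
        and ev.get("critical_lint_errors") == 0
        and ev.get("type_check_exit_code") == 0
        and ev.get("critical_vulnerabilities") == 0
    )
    if not production:
        return "minimum"

    gold = (
        ev.get("coverage", 0) >= gold_coverage
        and ev.get("lint_warnings") == 0
        and ev.get("benchmarks_within_thresholds") is True
        and ev.get("accessibility_audit_passed") is True
        and ev.get("integration_exit_code") == 0
        and ev.get("deployment_verified") is True
    )
    return "gold" if gold else "production"
```

A missing key never counts as a pass: a check that was not run leaves the record at the lower tier.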

Common Pitfalls

❌ Don't Skip Evidence Collection

Bad:

"I've implemented the login feature. It should work correctly."

Good:

"I've implemented the login feature. Evidence:

  • Tests: Exit code 0, 12 tests passed, 0 failed
  • Build: Exit code 0, no errors
  • Coverage: 89%

Task complete with verification."

❌ Don't Fake Evidence

Bad:

"Tests passed" (without actually running them)

Good:

"Tests passed.
Exit code: 0
Command: npm test
Output: Test Suites: 3 passed, 3 total
Timestamp: 2025-11-02 10:30:15"

❌ Don't Ignore Failed Evidence

Bad:

"Build failed with exit code 1, but the code looks correct so marking complete."

Good:

"Build failed with exit code 1. Errors:

  • TypeError: Cannot read property 'id' of undefined (line 42)

Fixing the error now before marking complete."

Quick Reference

Evidence Collection Checklist

Before marking task complete:

  • Tests executed
  • Test exit code captured (0 = pass)
  • Build executed (if applicable)
  • Build exit code captured (0 = pass)
  • Code quality checks run (linter, types)
  • Evidence documented with timestamp
  • Evidence added to shared context
  • Evidence summary in completion message
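
The checklist above can act as a gate before a task is marked complete. This is a minimal sketch, assuming a hypothetical mapping from check names to exit codes. Treating every non-zero exit code as a blocker is this sketch's interpretation of "Don't Ignore Failed Evidence", not a rule the skill states in these words.

```python
REQUIRED_CHECKS = ("tests", "lint", "typecheck")


def completion_blockers(exit_codes: dict, documented: bool,
                        build_applicable: bool = True) -> list:
    """List the reasons a task cannot be marked complete yet (empty list = ready)."""
    required = list(REQUIRED_CHECKS)
    if build_applicable:
        required.append("build")

    blockers = []
    for name in required:
        code = exit_codes.get(name)
        if code is None:
            blockers.append(f"{name}: not executed or exit code not captured")
        elif code != 0:
            blockers.append(f"{name}: failed with exit code {code}")
    if not documented:
        blockers.append("evidence not documented with timestamp in shared context")
    return blockers
```

An empty result means every checklist item is satisfied; anything else is a list that can go straight into the status message.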

Common Commands by Language/Framework

JavaScript/TypeScript:

npm test            # Run tests
npm run build       # Build project
npm run lint        # Run ESLint
npm run typecheck   # Run TypeScript compiler

Python:

pytest              # Run tests
pytest --cov        # Run tests with coverage (requires the pytest-cov plugin)
ruff check .        # Run linter
mypy .              # Run type checker

Remember: Evidence-first development guards against hallucinated claims of success, supports production quality, and builds confidence. When in doubt, collect more evidence, not less.

Related Skills

  • unit-testing: Unit test patterns for generating test evidence

  • integration-testing: Integration test patterns for component verification

  • security-scanning: Security scan evidence collection (npm audit, pip-audit)

  • test-standards-enforcer: Enforce evidence collection standards

Key Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Minimum Coverage | 70% | Common baseline for production-grade code |
| Exit Code Requirement | 0 = pass | Unix convention for success/failure indication |
| Gold Standard Coverage | 80% | Higher bar for critical paths |
| Retry Before Block | 2 attempts | Allow fix attempts before escalation |
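
The "Retry Before Block" decision can be sketched as a small loop. The source does not say whether "2 attempts" means two verification runs or two fix attempts; this sketch assumes two verification runs with one fix in between. The callables `run_check` and `fix` are hypothetical hooks.

```python
from typing import Callable


def verify_with_retry(run_check: Callable[[], int],
                      fix: Callable[[int], None],
                      max_attempts: int = 2) -> dict:
    """Run a check up to max_attempts times, calling fix() between failed runs."""
    exit_codes = []
    for attempt in range(1, max_attempts + 1):
        code = run_check()
        exit_codes.append(code)
        if code == 0:
            return {"status": "passed", "exit_codes": exit_codes}
        if attempt < max_attempts:
            # Give the caller a chance to repair the failure before the next run.
            fix(code)
    # Every attempt failed: escalate instead of marking the task complete.
    return {"status": "blocked", "exit_codes": exit_codes}
```

Every exit code is kept, so the evidence record shows the failed run as well as the eventual pass.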

Capability Details

exit-code-validation

Keywords: exit code, return code, success, failure, status, $?, exit 0, non-zero

Solves:

  • How do I verify that a command succeeded?

  • Check exit codes for evidence (0 = pass)

  • Validate build/test success with exit codes

  • Capture command exit status in evidence

test-evidence

Keywords: test results, test output, coverage report, test evidence, jest, pytest, test suite, passed, failed

Solves:

  • How do I capture test evidence?

  • Record test results in session state

  • Prove tests passed with exit code 0

  • Document test coverage percentage

  • Capture passed/failed/skipped counts
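
Passed/failed/skipped counts can often be pulled from the runner's summary line. The sketch below assumes a summary in the style printed by pytest or Jest (for example "1 failed, 12 passed, 2 skipped in 3.21s"). It is meant for a single summary line, and the function name `parse_counts` is hypothetical.

```python
import re


def parse_counts(summary_line: str) -> dict:
    """Extract passed/failed/skipped counts from one test summary line."""
    counts = {"passed": 0, "failed": 0, "skipped": 0}
    # Matches fragments such as "12 passed" or "1 failed" in any order.
    for number, label in re.findall(r"(\d+) (passed|failed|skipped)\b", summary_line):
        counts[label] = int(number)
    return counts
```

The exit code remains the authoritative pass/fail signal; parsed counts only add detail to the evidence record.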

build-evidence

Keywords: build log, build output, compile, bundle, webpack, vite, cargo build, npm build

Solves:

  • How do I capture build evidence?

  • Record build success with exit code

  • Verify compilation without errors

  • Document build artifacts created

  • Track build duration and warnings

code-quality-evidence

Keywords: linter, lint, eslint, ruff, type check, mypy, typescript, code quality, warnings, errors

Solves:

  • How do I capture code quality evidence?

  • Run linter and capture results

  • Execute type checker and record errors

  • Document linter errors and warnings count

  • Prove code quality checks passed

deployment-evidence

Keywords: deployment, deploy, production, staging, health check, rollback, deployment status

Solves:

  • How do I verify deployment succeeded?

  • Check health endpoints after deploy

  • Verify application started successfully

  • Document deployment status and environment

  • Confirm rollback capability exists
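
A health check after deploy can be recorded the same way as any other evidence. This is a minimal sketch using the standard library; the function name, the result keys, and the injectable `opener` parameter (included so the check can be exercised without a network) are assumptions, and any health endpoint URL is project-specific.

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone


def check_health(url: str, timeout: float = 5.0,
                 opener=urllib.request.urlopen) -> dict:
    """Request a health endpoint and record status, outcome, and timestamp."""
    record = {
        "url": url,
        "status": None,
        "healthy": False,
        "error": None,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    try:
        with opener(url, timeout=timeout) as response:
            record["status"] = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        record["status"] = exc.code
        record["error"] = str(exc)
    except OSError as exc:
        # Covers URLError, refused connections, timeouts, DNS failures.
        record["error"] = str(exc)
    else:
        record["healthy"] = 200 <= record["status"] < 300
    return record
```

A failed check returns a record instead of raising, so the failure itself is captured as evidence.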

security-scan-evidence

Keywords: security, vulnerability, npm audit, pip-audit, security scan, cve, critical vulnerabilities

Solves:

  • How do I capture security scan results?

  • Run npm audit or pip-audit

  • Document critical vulnerabilities found

  • Record security scan exit code

  • Prove no critical security issues
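
Critical-vulnerability counts can be read from a machine-readable scan report. The sketch below assumes the JSON produced by `npm audit --json`, with severity counts under `metadata.vulnerabilities`; that layout is an assumption to verify against the npm version in use. The function name is hypothetical.

```python
import json


def summarize_npm_audit(report_json: str) -> dict:
    """Summarize severity counts from an npm audit JSON report (assumed layout)."""
    report = json.loads(report_json)
    # Assumed location of the severity counts; missing keys count as zero.
    counts = report.get("metadata", {}).get("vulnerabilities", {})
    critical = int(counts.get("critical", 0))
    return {
        "critical": critical,
        "high": int(counts.get("high", 0)),
        "total": int(counts.get("total", 0)),
        # Mirrors the "no critical issues" requirement of the production tier.
        "passes": critical == 0,
    }
```

Because missing keys count as zero here, the scan's own exit code should still be recorded alongside this summary.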

evidence-storage

Keywords: session state, state.json, evidence storage, record evidence, save results, quality_evidence, context 2.0

Solves:

  • How do I store evidence in context?

  • Update session/state.json with results

  • Structure evidence data properly

  • Add timestamp to evidence records

  • Link to evidence log files
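
Storing evidence in a state file might look like the sketch below. The keywords above mention `state.json` and `quality_evidence`, but the actual schema is not shown in this listing, so the nesting used here (one entry per evidence kind under a `quality_evidence` key) is an assumption.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def store_evidence(state_path: Path, kind: str, record: dict) -> dict:
    """Merge one evidence record into a JSON state file and return the new state."""
    if state_path.exists():
        state = json.loads(state_path.read_text(encoding="utf-8"))
    else:
        state = {}
    entry = dict(record)
    # Add a timestamp only if the caller did not supply one.
    entry.setdefault("timestamp", datetime.now(timezone.utc).isoformat())
    state.setdefault("quality_evidence", {})[kind] = entry
    state_path.parent.mkdir(parents=True, exist_ok=True)
    state_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return state


if __name__ == "__main__":
    # Demonstration in a throwaway directory; a real path is project-specific.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "session" / "state.json"
        print(store_evidence(path, "tests", {"exit_code": 0, "log": "logs/test.log"}))
```

The optional `log` field in the demonstration shows one way to link a record to its log file.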

combined-evidence-report

Keywords: evidence report, task completion, verification summary, proof of completion, comprehensive evidence

Solves:

  • How do I create a complete evidence report?

  • Combine test, build, and quality evidence

  • Create task completion evidence summary

  • Document all verification checks run

  • Provide comprehensive proof of completion

evidence-collection-workflow

Keywords: evidence workflow, verification steps, evidence protocol, collection process, verification checklist

Solves:

  • What steps do I follow to collect evidence?

  • Follow evidence collection protocol

  • Run all necessary verification checks

  • Complete evidence checklist before marking done

  • Ensure minimum evidence requirements met

quality-standards

Keywords: quality standards, minimum requirements, production-grade, gold standard, evidence thresholds

Solves:

  • What evidence is required to pass?

  • Understand minimum vs production-grade standards

  • Meet gold standard evidence requirements

  • Know when evidence is sufficient

  • Validate evidence meets project standards

evidence-pitfalls

Keywords: evidence mistakes, common errors, skip evidence, fake evidence, ignore failures

Solves:

  • What evidence mistakes should I avoid?

  • Never skip evidence collection

  • Don't fake evidence results

  • Don't ignore failed evidence

  • Always re-collect after changes

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
