completion-evaluator

A technology-agnostic skill for evaluating deliverables against acceptance criteria and achievement indicators.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "completion-evaluator" with this command: npx skills add masanao-ohba/claude-manifests/masanao-ohba-claude-manifests-completion-evaluator

Completion Evaluator

A technology-agnostic skill for evaluating deliverables against acceptance criteria and achievement indicators.

Core Purpose

Provide objective, evidence-based evaluation of deliverables to determine:

  • Whether acceptance criteria are met

  • Pass/fail verdicts with justification

  • Specific feedback for improvements

  • Next action recommendations

Evaluation Framework

Achievement Indicator Categories

completeness_indicators: description: "What must be present in the deliverable" examples: - "All required functions implemented" - "Error handling present" - "Edge cases covered" - "Documentation included"

quality_indicators: description: "How the deliverable should be built" examples: - "Code follows project standards" - "Test coverage meets threshold" - "No security vulnerabilities" - "Performance within limits"

functionality_indicators: description: "What the deliverable must do" examples: - "Business logic correct" - "User acceptance criteria met" - "Integration points working" - "No regressions introduced"

Evaluation Process

step_1_load_context: actions: - "Read acceptance criteria/indicators" - "Identify all criteria to evaluate" - "Prepare evaluation checklist"

step_2_evaluate_each_indicator: for_each_criterion: - "Gather evidence from deliverable" - "Compare to threshold/target" - "Determine PASS/FAIL" - "Document evidence"

step_3_calculate_verdict: rules: all_pass: "Overall PASS" any_fail: "Overall FAIL" note: "No CONDITIONAL - binary verdict only"

step_4_determine_next_action: if_pass: - "Deliverable accepted" - "Return to workflow-orchestrator" if_fail: - "Identify failed indicators" - "Determine responsible agent" - "Provide specific feedback"

Indicator Evaluation

Evidence Requirements

evidence_types: code_presence: method: "File/function existence check" example: "Function X exists in file Y"

test_results: method: "Test execution output" example: "All 15 tests pass"

static_analysis: method: "Linter/analyzer output" example: "No errors in cs-check"

coverage_metrics: method: "Coverage tool output" example: "82% line coverage"

performance_metrics: method: "Benchmark results" example: "Response time: 150ms"

manual_verification: method: "Code review observations" example: "Error handling present in all external calls"

Threshold Comparison

threshold_types: boolean: example: "Feature X implemented: true" pass_if: "value == expected"

numeric: example: "Test coverage >= 80%" pass_if: "value >= threshold"

count: example: "Critical bugs == 0" pass_if: "count <= max_allowed"

presence: example: "Documentation exists" pass_if: "file/section exists"

Verdict Generation

Pass Criteria

pass_requirements:

  • "ALL completeness indicators PASS"
  • "ALL quality indicators PASS"
  • "ALL functionality indicators PASS"
  • "No critical issues found"

pass_verdict: overall: "PASS" action: "Proceed to commit/completion" delegate_to: "workflow-orchestrator"

Fail Criteria

fail_triggers:

  • "ANY indicator fails"
  • "Critical issue found"
  • "Blocking problem identified"

fail_verdict: overall: "FAIL" action: "Return for rework" delegate_to: "<agent responsible for failed area>" feedback: "<specific issues and remediation>"

Iteration Management

Iteration Tracking

iteration_limits: max_iterations: 3 focus_by_iteration: iteration_1: "Critical issues" iteration_2: "Major improvements" iteration_3: "Final polish"

escalation: trigger: "After 3 failed iterations" action: "Consult user for decision" options: - "Accept current state" - "Modify requirements" - "Abandon task"

Feedback Quality

feedback_requirements: specific: bad: "Code quality issues" good: "Missing error handling in getUserById() at line 45"

actionable: bad: "Improve test coverage" good: "Add tests for edge case: null user input in login()"

prioritized: bad: "Fix all issues" good: "Critical: SQL injection at line 23; Major: Missing validation"

agent_assigned: bad: "Fix the code" good: "code-developer: Add input validation to processPayment()"

Output Format

evaluation_result: overall_verdict: PASS|FAIL

indicator_results: completeness: - indicator: "All required functions implemented" status: PASS|FAIL evidence: "<proof>" target: "<what was required>" actual: "<what was achieved>"

quality:
  - indicator: "Code follows standards"
    status: PASS|FAIL
    evidence: "&#x3C;proof>"
    target: "&#x3C;standard>"
    actual: "&#x3C;measurement>"

functionality:
  - indicator: "Business logic correct"
    status: PASS|FAIL
    evidence: "&#x3C;proof>"
    target: "&#x3C;expected behavior>"
    actual: "&#x3C;observed behavior>"

summary: passed_count: <number> failed_count: <number> pass_rate: "<percentage>"

feedback: critical: - issue: "<description>" location: "<file:line>" remediation: "<how to fix>" assign_to: "<agent>"

major:
  - issue: "&#x3C;description>"
    location: "&#x3C;file:line>"
    remediation: "&#x3C;how to fix>"
    assign_to: "&#x3C;agent>"

minor:
  - issue: "&#x3C;description>"
    suggestion: "&#x3C;improvement>"

next_action: action: complete|rework reason: "<why>" assigned_agent: "<agent for rework>" specific_tasks: - "<task 1>" - "<task 2>"

iteration: current: <number> max: 3 escalate_if_exceeded: true

Evaluation Checklist Template

checklist: completeness: - [ ] All specified functions exist - [ ] Error handling implemented - [ ] Edge cases covered - [ ] Required documentation present

quality: - [ ] Code follows project standards - [ ] No linting errors - [ ] Test coverage adequate - [ ] No security vulnerabilities

functionality: - [ ] Core business logic works - [ ] Acceptance criteria met - [ ] No regressions introduced - [ ] Integration points functional

Integration

Used By Agents

primary_users:

  • deliverable-evaluator: "Core evaluation skill"

secondary_users:

  • workflow-orchestrator: "Completion checks"
  • quality-reviewer: "Quality assessment"

Acceptance Criteria Sources

criteria_sources: from_goal_clarifier: when: "All tasks" format: "Structured acceptance criteria"

from_user_request: when: "Explicit criteria provided" format: "User-defined acceptance criteria"

default_indicators: when: "No explicit criteria provided" use: "Standard quality checklist"

Best Practices

  • Evidence-Based: Every verdict must cite specific evidence

  • Objective: Apply same standards consistently

  • Actionable Feedback: Every failure includes remediation path

  • Clear Assignments: Failed areas specify responsible agent

  • Iteration Awareness: Track progress across iterations

  • Escalation Ready: Know when to involve user

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

requirement-analyzer

No summary provided by upstream source.

Repository SourceNeeds Review
General

test-case-designer

No summary provided by upstream source.

Repository SourceNeeds Review
General

functional-designer

No summary provided by upstream source.

Repository SourceNeeds Review
General

test-validator

No summary provided by upstream source.

Repository SourceNeeds Review