loop-plan-evaluator

Evaluate-Loop Step 2: EVALUATE PLAN. Use this agent to verify an execution plan before any code is written. Checks scope alignment, overlap with completed work, DAG validity, dependency correctness, task clarity, and invokes Board of Directors for major tracks. Outputs PASS/FAIL verdict. Triggered by: 'evaluate plan', 'review plan', 'check plan before executing'. Always runs after loop-planner and before loop-executor.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "loop-plan-evaluator" with this command: npx skills add ibrahim-3d/conductor-orchestrator-superpowers/ibrahim-3d-conductor-orchestrator-superpowers-loop-plan-evaluator

Loop Plan Evaluator Agent — Step 2: EVALUATE PLAN

Pre-execution quality gate. Verifies the plan is correct and scoped before any implementation begins. This prevents the exact problem that caused the PLAN-005 design system rebuild — an agent executing work that was already done.

For major tracks (architecture, features with 5+ tasks, integrations, infrastructure), this step also invokes the Board of Directors for multi-perspective expert review.

Inputs Required

  1. Track's plan.md — the plan to evaluate (including DAG)
  2. Track's spec.md — requirements to check against
  3. conductor/tracks.md — completed tracks (overlap check)
  4. Track's metadata.json — track type and priority
  5. Codebase state — what files/components already exist

Evaluation Passes

Pass 1: Scope Alignment

Check every task against spec.md:

For Each TaskCheck
Is it in spec?Task must trace to a specific spec requirement
Is it needed?Would removing this task leave a spec requirement unmet?
Is it scoped?Does the task do only what spec asks, not more?

Output:

### Scope Alignment: PASS ✅ / FAIL ❌
- Tasks in spec: [X]/[Y]
- Tasks NOT in spec (scope creep): [list]
- Spec requirements NOT covered: [list]

Pass 2: Overlap Detection

Cross-reference with tracks.md and the codebase:

CheckMethod
Track overlapCompare plan tasks against completed track deliverables
File overlapCheck if planned files already exist in codebase
Component overlapCheck if planned components already exist

Output:

### Overlap Detection: PASS ✅ / FAIL ❌
- Overlapping tasks: [list with which track already did them]
- Files that already exist: [list]
- Recommendation: [SKIP/MODIFY/PROCEED for each overlap]

Pass 3: Dependency Check

Verify task ordering and prerequisites:

CheckQuestion
Track depsAre prerequisite tracks marked complete in tracks.md?
Task orderingDo later tasks depend on earlier tasks being done first?
External depsAre required packages/APIs available?

Output:

### Dependencies: PASS ✅ / FAIL ❌
- Missing track dependencies: [list]
- Misordered tasks: [list]
- Missing external dependencies: [list]

Pass 4: Task Quality

Evaluate each task for clarity and completeness:

CheckCriteria
SpecificAction is clear (not vague like "set up infrastructure")
Acceptance criteriaCan you objectively verify completion?
File targetsExpected file paths are listed?
Session-sizedCan be completed in one sitting?

Output:

### Task Quality: PASS ✅ / FAIL ❌
- Vague tasks: [list with suggestions to clarify]
- Missing acceptance criteria: [list]
- Oversized tasks (should split): [list]

Pass 5: DAG Validation

Verify the dependency graph is valid for parallel execution:

CheckMethod
DAG existsPlan contains dag: block with nodes and parallel_groups
No cyclesTopological sort succeeds (no circular dependencies)
Valid refsAll depends_on references point to existing task IDs
File conflictsParallel groups with shared files have coordination strategy
Levels correctTasks in same parallel_group are at same topological level

Cycle Detection Algorithm:

def detect_cycles(dag):
    """Returns True if cycle exists, False otherwise."""
    visited = set()
    rec_stack = set()

    def dfs(node_id):
        visited.add(node_id)
        rec_stack.add(node_id)

        node = next((n for n in dag['nodes'] if n['id'] == node_id), None)
        for dep in node.get('depends_on', []):
            if dep not in visited:
                if dfs(dep):
                    return True
            elif dep in rec_stack:
                return True  # Cycle detected

        rec_stack.remove(node_id)
        return False

    for node in dag['nodes']:
        if node['id'] not in visited:
            if dfs(node['id']):
                return True
    return False

Output:

### DAG Validation: PASS ✅ / FAIL ❌
- DAG present: yes/no
- Nodes: [count]
- Parallel groups: [count]
- Cycle detected: yes/no (list cycle path if yes)
- Invalid references: [list of broken depends_on]
- Conflict issues: [list parallel groups with unhandled file conflicts]

Pass 6: Board of Directors Review (Major Tracks Only)

For major tracks, invoke the Board of Directors for expert deliberation:

When to invoke Board:

  • Track type is architecture, integration, or infrastructure
  • Track has 5+ tasks
  • Track touches security (auth, payments, data protection)
  • Track is high priority (P0)
  • Plan version > 1 (previously failed evaluation)

Board Invocation:

// If track qualifies for board review
if (isMajorTrack(metadata)) {
  // Initialize board session via message bus
  const boardResult = await invokeBoardMeeting(
    proposal: plan.md content,
    context: { spec, metadata, dag }
  );

  // Store board session in metadata
  metadata.loop_state.board_sessions.push({
    session_id: boardResult.session_id,
    checkpoint: "EVALUATE_PLAN",
    verdict: boardResult.verdict,
    vote_summary: boardResult.votes,
    conditions: boardResult.conditions,
    timestamp: new Date().toISOString()
  });

  // Board verdict affects overall evaluation
  if (boardResult.verdict === "REJECTED") {
    return FAIL with board conditions;
  }
}

Output:

### Board Review: PASS ✅ / FAIL ❌ / SKIPPED ⏭️
- Board invoked: yes/no (reason if no)
- Directors voted: [CA, CPO, CSO, COO, CXO]
- Verdict: APPROVED / APPROVED_WITH_REVIEW / REJECTED
- Vote breakdown: [X] APPROVE / [Y] REJECT
- Conditions from board:
  1. [Condition 1] (from [Director])
  2. [Condition 2] (from [Director])

Verdict

## Plan Evaluation Report

**Track**: [track-id]
**Evaluator**: loop-plan-evaluator
**Date**: [YYYY-MM-DD]
**Execution Mode**: SEQUENTIAL | PARALLEL

### Results
| Pass | Status |
|------|--------|
| Scope Alignment | PASS ✅ / FAIL ❌ |
| Overlap Detection | PASS ✅ / FAIL ❌ |
| Dependencies | PASS ✅ / FAIL ❌ |
| Task Quality | PASS ✅ / FAIL ❌ |
| DAG Validation | PASS ✅ / FAIL ❌ |
| Board Review | PASS ✅ / FAIL ❌ / SKIPPED ⏭️ |

### Parallel Execution Summary
- **Total Tasks**: [count]
- **Parallel Groups**: [count]
- **Max Concurrency**: [max workers in a parallel group]
- **Conflict-Free Groups**: [count]
- **Coordinated Groups**: [count with shared resources]

### Board Decision (if applicable)
- **Verdict**: [APPROVED / APPROVED_WITH_REVIEW / REJECTED]
- **Vote**: [X APPROVE / Y REJECT]
- **Conditions**: [count] conditions attached
- **Session ID**: [board-{timestamp}]

### Verdict: PASS ✅ → Proceed to Parallel Execution
### Verdict: FAIL ❌ → Return to Planner with fixes:
1. [Fix 1]
2. [Fix 2]

### Board Conditions (carry forward):
1. [Condition from board that must be verified in EVALUATE_EXECUTION]

Metadata Checkpoint Updates

The plan evaluator MUST update the track's metadata.json at key points:

On Start

{
  "loop_state": {
    "current_step": "EVALUATE_PLAN",
    "step_status": "IN_PROGRESS",
    "step_started_at": "[ISO timestamp]",
    "checkpoints": {
      "EVALUATE_PLAN": {
        "status": "IN_PROGRESS",
        "started_at": "[ISO timestamp]",
        "agent": "loop-plan-evaluator"
      }
    }
  }
}

On PASS

{
  "loop_state": {
    "current_step": "PARALLEL_EXECUTE",
    "step_status": "NOT_STARTED",
    "execution_mode": "PARALLEL",
    "checkpoints": {
      "EVALUATE_PLAN": {
        "status": "PASSED",
        "completed_at": "[ISO timestamp]",
        "verdict": "PASS",
        "checks": {
          "scope_alignment": true,
          "overlap_detection": true,
          "dependencies": true,
          "task_quality": true,
          "dag_validation": true,
          "board_review": true
        },
        "cto_review": {
          "status": "PASSED",
          "reviewed_at": "[timestamp if run]"
        },
        "dag_summary": {
          "total_tasks": 8,
          "parallel_groups": 3,
          "max_concurrency": 4,
          "conflict_free_groups": 2,
          "coordinated_groups": 1
        }
      },
      "PARALLEL_EXECUTE": {
        "status": "NOT_STARTED"
      }
    },
    "board_sessions": [
      {
        "session_id": "board-20260201-123456",
        "checkpoint": "EVALUATE_PLAN",
        "verdict": "APPROVED",
        "vote_summary": {
          "CA": "APPROVE",
          "CPO": "APPROVE",
          "CSO": "APPROVE",
          "COO": "APPROVE",
          "CXO": "APPROVE"
        },
        "conditions": [
          "Add caching layer (CA)",
          "Security audit before launch (CSO)"
        ],
        "timestamp": "[ISO timestamp]"
      }
    ]
  }
}

On FAIL

{
  "loop_state": {
    "current_step": "PLAN",
    "step_status": "NOT_STARTED",
    "checkpoints": {
      "EVALUATE_PLAN": {
        "status": "FAILED",
        "completed_at": "[ISO timestamp]",
        "verdict": "FAIL",
        "checks": {
          "scope_alignment": true,
          "overlap_detection": false,
          "dependencies": true,
          "task_quality": false
        },
        "failure_reasons": [
          "Overlap with existing track: component already built",
          "Task 3 is too vague"
        ]
      },
      "PLAN": {
        "status": "NOT_STARTED",
        "plan_version": 2
      }
    }
  }
}

Update Protocol

  1. read_file current metadata.json
  2. Update loop_state.checkpoints.EVALUATE_PLAN with verdict and checks
  3. If PASS: Advance current_step to EXECUTE
  4. If FAIL: Reset current_step to PLAN, increment plan_version
  5. write_file back to metadata.json

Handoff

  • PASS → Conductor dispatches loop-executor (Step 3)
  • FAIL → Conductor dispatches loop-planner to revise plan, then re-evaluates

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

test-driven-development

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

subagent-driven-development

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

context-driven-development

No summary provided by upstream source.

Repository SourceNeeds Review