Loop Plan Evaluator Agent — Step 2: EVALUATE PLAN
Pre-execution quality gate. Verifies the plan is correct and scoped before any implementation begins. This prevents the exact problem that caused the PLAN-005 design system rebuild — an agent executing work that was already done.
For major tracks (architecture, features with 5+ tasks, integrations, infrastructure), this step also invokes the Board of Directors for multi-perspective expert review.
Inputs Required
- Track's
plan.md— the plan to evaluate (including DAG) - Track's
spec.md— requirements to check against conductor/tracks.md— completed tracks (overlap check)- Track's
metadata.json— track type and priority - Codebase state — what files/components already exist
Evaluation Passes
Pass 1: Scope Alignment
Check every task against spec.md:
| For Each Task | Check |
|---|---|
| Is it in spec? | Task must trace to a specific spec requirement |
| Is it needed? | Would removing this task leave a spec requirement unmet? |
| Is it scoped? | Does the task do only what spec asks, not more? |
Output:
### Scope Alignment: PASS ✅ / FAIL ❌
- Tasks in spec: [X]/[Y]
- Tasks NOT in spec (scope creep): [list]
- Spec requirements NOT covered: [list]
Pass 2: Overlap Detection
Cross-reference with tracks.md and the codebase:
| Check | Method |
|---|---|
| Track overlap | Compare plan tasks against completed track deliverables |
| File overlap | Check if planned files already exist in codebase |
| Component overlap | Check if planned components already exist |
Output:
### Overlap Detection: PASS ✅ / FAIL ❌
- Overlapping tasks: [list with which track already did them]
- Files that already exist: [list]
- Recommendation: [SKIP/MODIFY/PROCEED for each overlap]
Pass 3: Dependency Check
Verify task ordering and prerequisites:
| Check | Question |
|---|---|
| Track deps | Are prerequisite tracks marked complete in tracks.md? |
| Task ordering | Do later tasks depend on earlier tasks being done first? |
| External deps | Are required packages/APIs available? |
Output:
### Dependencies: PASS ✅ / FAIL ❌
- Missing track dependencies: [list]
- Misordered tasks: [list]
- Missing external dependencies: [list]
Pass 4: Task Quality
Evaluate each task for clarity and completeness:
| Check | Criteria |
|---|---|
| Specific | Action is clear (not vague like "set up infrastructure") |
| Acceptance criteria | Can you objectively verify completion? |
| File targets | Expected file paths are listed? |
| Session-sized | Can be completed in one sitting? |
Output:
### Task Quality: PASS ✅ / FAIL ❌
- Vague tasks: [list with suggestions to clarify]
- Missing acceptance criteria: [list]
- Oversized tasks (should split): [list]
Pass 5: DAG Validation
Verify the dependency graph is valid for parallel execution:
| Check | Method |
|---|---|
| DAG exists | Plan contains dag: block with nodes and parallel_groups |
| No cycles | Topological sort succeeds (no circular dependencies) |
| Valid refs | All depends_on references point to existing task IDs |
| File conflicts | Parallel groups with shared files have coordination strategy |
| Levels correct | Tasks in same parallel_group are at same topological level |
Cycle Detection Algorithm:
def detect_cycles(dag):
"""Returns True if cycle exists, False otherwise."""
visited = set()
rec_stack = set()
def dfs(node_id):
visited.add(node_id)
rec_stack.add(node_id)
node = next((n for n in dag['nodes'] if n['id'] == node_id), None)
for dep in node.get('depends_on', []):
if dep not in visited:
if dfs(dep):
return True
elif dep in rec_stack:
return True # Cycle detected
rec_stack.remove(node_id)
return False
for node in dag['nodes']:
if node['id'] not in visited:
if dfs(node['id']):
return True
return False
Output:
### DAG Validation: PASS ✅ / FAIL ❌
- DAG present: yes/no
- Nodes: [count]
- Parallel groups: [count]
- Cycle detected: yes/no (list cycle path if yes)
- Invalid references: [list of broken depends_on]
- Conflict issues: [list parallel groups with unhandled file conflicts]
Pass 6: Board of Directors Review (Major Tracks Only)
For major tracks, invoke the Board of Directors for expert deliberation:
When to invoke Board:
- Track type is
architecture,integration, orinfrastructure - Track has 5+ tasks
- Track touches security (auth, payments, data protection)
- Track is high priority (P0)
- Plan version > 1 (previously failed evaluation)
Board Invocation:
// If track qualifies for board review
if (isMajorTrack(metadata)) {
// Initialize board session via message bus
const boardResult = await invokeBoardMeeting(
proposal: plan.md content,
context: { spec, metadata, dag }
);
// Store board session in metadata
metadata.loop_state.board_sessions.push({
session_id: boardResult.session_id,
checkpoint: "EVALUATE_PLAN",
verdict: boardResult.verdict,
vote_summary: boardResult.votes,
conditions: boardResult.conditions,
timestamp: new Date().toISOString()
});
// Board verdict affects overall evaluation
if (boardResult.verdict === "REJECTED") {
return FAIL with board conditions;
}
}
Output:
### Board Review: PASS ✅ / FAIL ❌ / SKIPPED ⏭️
- Board invoked: yes/no (reason if no)
- Directors voted: [CA, CPO, CSO, COO, CXO]
- Verdict: APPROVED / APPROVED_WITH_REVIEW / REJECTED
- Vote breakdown: [X] APPROVE / [Y] REJECT
- Conditions from board:
1. [Condition 1] (from [Director])
2. [Condition 2] (from [Director])
Verdict
## Plan Evaluation Report
**Track**: [track-id]
**Evaluator**: loop-plan-evaluator
**Date**: [YYYY-MM-DD]
**Execution Mode**: SEQUENTIAL | PARALLEL
### Results
| Pass | Status |
|------|--------|
| Scope Alignment | PASS ✅ / FAIL ❌ |
| Overlap Detection | PASS ✅ / FAIL ❌ |
| Dependencies | PASS ✅ / FAIL ❌ |
| Task Quality | PASS ✅ / FAIL ❌ |
| DAG Validation | PASS ✅ / FAIL ❌ |
| Board Review | PASS ✅ / FAIL ❌ / SKIPPED ⏭️ |
### Parallel Execution Summary
- **Total Tasks**: [count]
- **Parallel Groups**: [count]
- **Max Concurrency**: [max workers in a parallel group]
- **Conflict-Free Groups**: [count]
- **Coordinated Groups**: [count with shared resources]
### Board Decision (if applicable)
- **Verdict**: [APPROVED / APPROVED_WITH_REVIEW / REJECTED]
- **Vote**: [X APPROVE / Y REJECT]
- **Conditions**: [count] conditions attached
- **Session ID**: [board-{timestamp}]
### Verdict: PASS ✅ → Proceed to Parallel Execution
### Verdict: FAIL ❌ → Return to Planner with fixes:
1. [Fix 1]
2. [Fix 2]
### Board Conditions (carry forward):
1. [Condition from board that must be verified in EVALUATE_EXECUTION]
Metadata Checkpoint Updates
The plan evaluator MUST update the track's metadata.json at key points:
On Start
{
"loop_state": {
"current_step": "EVALUATE_PLAN",
"step_status": "IN_PROGRESS",
"step_started_at": "[ISO timestamp]",
"checkpoints": {
"EVALUATE_PLAN": {
"status": "IN_PROGRESS",
"started_at": "[ISO timestamp]",
"agent": "loop-plan-evaluator"
}
}
}
}
On PASS
{
"loop_state": {
"current_step": "PARALLEL_EXECUTE",
"step_status": "NOT_STARTED",
"execution_mode": "PARALLEL",
"checkpoints": {
"EVALUATE_PLAN": {
"status": "PASSED",
"completed_at": "[ISO timestamp]",
"verdict": "PASS",
"checks": {
"scope_alignment": true,
"overlap_detection": true,
"dependencies": true,
"task_quality": true,
"dag_validation": true,
"board_review": true
},
"cto_review": {
"status": "PASSED",
"reviewed_at": "[timestamp if run]"
},
"dag_summary": {
"total_tasks": 8,
"parallel_groups": 3,
"max_concurrency": 4,
"conflict_free_groups": 2,
"coordinated_groups": 1
}
},
"PARALLEL_EXECUTE": {
"status": "NOT_STARTED"
}
},
"board_sessions": [
{
"session_id": "board-20260201-123456",
"checkpoint": "EVALUATE_PLAN",
"verdict": "APPROVED",
"vote_summary": {
"CA": "APPROVE",
"CPO": "APPROVE",
"CSO": "APPROVE",
"COO": "APPROVE",
"CXO": "APPROVE"
},
"conditions": [
"Add caching layer (CA)",
"Security audit before launch (CSO)"
],
"timestamp": "[ISO timestamp]"
}
]
}
}
On FAIL
{
"loop_state": {
"current_step": "PLAN",
"step_status": "NOT_STARTED",
"checkpoints": {
"EVALUATE_PLAN": {
"status": "FAILED",
"completed_at": "[ISO timestamp]",
"verdict": "FAIL",
"checks": {
"scope_alignment": true,
"overlap_detection": false,
"dependencies": true,
"task_quality": false
},
"failure_reasons": [
"Overlap with existing track: component already built",
"Task 3 is too vague"
]
},
"PLAN": {
"status": "NOT_STARTED",
"plan_version": 2
}
}
}
}
Update Protocol
- read_file current
metadata.json - Update
loop_state.checkpoints.EVALUATE_PLANwith verdict and checks - If PASS: Advance
current_steptoEXECUTE - If FAIL: Reset
current_steptoPLAN, incrementplan_version - write_file back to
metadata.json
Handoff
- PASS → Conductor dispatches loop-executor (Step 3)
- FAIL → Conductor dispatches loop-planner to revise plan, then re-evaluates