Execution Lifecycle Manager
Centralized state management for running DAG executions with graceful shutdown patterns.
When to Use
✅ Use for:
- Implementing execution start/stop/pause/resume controls
- Graceful process termination (SIGTERM → SIGKILL)
- Tracking active executions across the system
- Cleaning up orphaned processes
- Implementing abort handlers with cost tracking
❌ NOT for:
- Cost estimation or pricing calculations (use cost-accrual-tracker)
- Building or modifying DAG structures
- Skill matching or selection
- Process spawning (use the executor directly)
Core Patterns
- Graceful Shutdown Pattern
Always use SIGTERM first, then escalate to SIGKILL:
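    // CORRECT: Two-phase shutdown
    import type { ChildProcess } from 'node:child_process';

    const GRACEFUL_TIMEOUT_MS = 2000;

    async function terminateProcess(proc: ChildProcess): Promise<void> {
      proc.kill('SIGTERM');

      const forceKillTimer = setTimeout(() => {
        // proc.killed turns true as soon as SIGTERM is delivered, so it cannot
        // tell us whether the process is still alive. Check the exit state instead.
        if (proc.exitCode === null && proc.signalCode === null) {
          proc.kill('SIGKILL');
        }
      }, GRACEFUL_TIMEOUT_MS);

      await waitForExit(proc); // helper assumed to resolve once the process has exited
      clearTimeout(forceKillTimer);
    }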
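The snippet above calls `waitForExit` without defining it. The following is a minimal sketch of what such a helper might look like, assuming it only needs to resolve once the child has exited. `ExitSource` and `ExitInfo` are hypothetical names introduced here; `ExitSource` lists only the members the helper reads (`exitCode`, `signalCode`, and `once('exit', ...)`), all of which Node's `ChildProcess` exposes.

```typescript
// Hypothetical minimal view of a child process: only what this helper reads.
interface ExitSource {
  exitCode: number | null;
  signalCode: string | null;
  once(
    event: 'exit',
    listener: (code: number | null, signal: string | null) => void,
  ): unknown;
}

// Hypothetical result shape describing how the process ended.
interface ExitInfo {
  code: number | null;
  signal: string | null;
}

// Resolves when the process exits, or immediately if it has already exited.
function waitForExit(proc: ExitSource): Promise<ExitInfo> {
  if (proc.exitCode !== null || proc.signalCode !== null) {
    return Promise.resolve({ code: proc.exitCode, signal: proc.signalCode });
  }
  return new Promise<ExitInfo>((resolve) => {
    proc.once('exit', (code, signal) => resolve({ code, signal }));
  });
}
```

Checking the exit state first matters: if the process has already exited, the `exit` event will not fire again and a listener-only version would never resolve.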
- AbortController Pattern
Use AbortController for cancellation propagation:
    // Parent (DAGExecutor)
    const abortController = new AbortController();

    // Pass signal to child executors
    await executor.execute({
      ...request,
      abortSignal: abortController.signal,
    });

    // To abort all children:
    abortController.abort();
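When a child executor needs its own cancellation scope (for example, to abort one node without stopping the whole DAG), it can derive a controller from the parent's signal. The sketch below shows one way to do that; `createChildController` is a hypothetical helper, not part of the source design.

```typescript
// Hypothetical helper: derive a child AbortController from a parent signal.
// Aborting the parent cascades to the child; aborting the child leaves the parent alone.
function createChildController(parentSignal: AbortSignal): AbortController {
  const child = new AbortController();

  if (parentSignal.aborted) {
    // Parent was aborted before the child started: abort right away.
    child.abort(parentSignal.reason);
    return child;
  }

  const onParentAbort = () => child.abort(parentSignal.reason);
  parentSignal.addEventListener('abort', onParentAbort, { once: true });

  // If the child is aborted on its own, drop the listener held by the parent.
  child.signal.addEventListener(
    'abort',
    () => parentSignal.removeEventListener('abort', onParentAbort),
    { once: true },
  );

  return child;
}

const root = new AbortController();
const nodeA = createChildController(root.signal);
const nodeB = createChildController(root.signal);

nodeA.abort('node_failed'); // aborts only nodeA
root.abort('user_abort');   // cascades to nodeB
```

The abort reason travels with the signal, so a child can distinguish a user abort from a local failure by reading `signal.reason`.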
- Execution Registry Pattern
Track active executions for monitoring and cleanup:
    interface ActiveExecution {
      executionId: string;
      abortController: AbortController;
      status: 'running' | 'stopping' | 'stopped' | 'completed' | 'failed';
      startedAt: number;
      stoppedAt?: number;
    }

    class ExecutionManager {
      private executions: Map<string, ActiveExecution> = new Map();

      create(id: string): ActiveExecution { /* ... */ }
      stop(id: string, reason: string): Promise<StopResult> { /* ... */ }
      listActive(): ActiveExecution[] { /* ... */ }
    }
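The method bodies above are elided. As an illustration only, here is a minimal in-memory sketch of how they could be filled in. The class name `InMemoryExecutionManager`, the `StopResult` shape, and the `awaitCleanup` hook are all assumptions made for this example; the source names `StopResult` but does not define it.

```typescript
type ExecutionStatus = 'running' | 'stopping' | 'stopped' | 'completed' | 'failed';

interface ActiveExecution {
  executionId: string;
  abortController: AbortController;
  status: ExecutionStatus;
  startedAt: number;
  stoppedAt?: number;
}

// Hypothetical result shape (not defined in the source).
interface StopResult {
  executionId: string;
  status: ExecutionStatus;
  reason: string;
  stoppedAt: number;
}

class InMemoryExecutionManager {
  private executions = new Map<string, ActiveExecution>();
  // Assumed hook: resolves once the execution's processes have exited.
  private awaitCleanup: (e: ActiveExecution) => Promise<void>;

  constructor(awaitCleanup: (e: ActiveExecution) => Promise<void> = async () => {}) {
    this.awaitCleanup = awaitCleanup;
  }

  create(id: string): ActiveExecution {
    if (this.executions.has(id)) throw new Error(`Execution ${id} already exists`);
    const execution: ActiveExecution = {
      executionId: id,
      abortController: new AbortController(),
      status: 'running',
      startedAt: Date.now(),
    };
    this.executions.set(id, execution);
    return execution;
  }

  async stop(id: string, reason: string): Promise<StopResult> {
    const execution = this.executions.get(id);
    if (!execution) throw new Error(`Unknown execution ${id}`);
    if (execution.status === 'running') {
      execution.status = 'stopping';
      execution.abortController.abort(reason); // propagate to child executors
      await this.awaitCleanup(execution);
      execution.status = 'stopped';
      execution.stoppedAt = Date.now();
    }
    // Calls made while already stopping or finished just report the current status.
    return {
      executionId: id,
      status: execution.status,
      reason,
      stoppedAt: execution.stoppedAt ?? Date.now(),
    };
  }

  listActive(): ActiveExecution[] {
    return Array.from(this.executions.values()).filter(
      (e) => e.status === 'running' || e.status === 'stopping',
    );
  }
}
```

In this sketch `listActive()` counts `stopping` executions as active, so a server shutdown still waits for executions that are mid-termination.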
Anti-Patterns
SIGKILL Without SIGTERM
Novice thinking: "Just kill it immediately"
Reality: SIGKILL doesn't allow cleanup. Processes can't:
- Flush buffers to disk
- Close network connections gracefully
- Release locks
- Save partial progress
Timeline:
- Always: send SIGTERM first so the process can shut down gracefully
- If still running after 2-5s: escalate to SIGKILL
Correct approach: Always SIGTERM first, SIGKILL as fallback.
Missing Abort Signal Propagation
Novice thinking: "Just track the top-level execution"
Reality: Without signal propagation, child processes become orphans:
- Parent dies, children keep running
- Resources leak
- Costs continue accruing
Correct approach: Pass the AbortSignal through the entire execution tree.
Synchronous Stop Handler
Novice thinking: "Stop should return immediately"
Reality: Stopping is async - processes need time to terminate:
- Network requests need to time out
- File handles need to close
- Costs need final calculation
Correct approach: Return a Promise that resolves with the final state after cleanup completes.
State Machine
             ┌──────────┐
             │   idle   │
             └────┬─────┘
                  │ start()
                  ▼
             ┌──────────┐
        ┌───►│ running  │◄───┐
        │    └────┬─────┘    │
        │         │          │ resume()
        │         │ pause()  │
        │         ▼          │
        │    ┌──────────┐    │
        │    │  paused  │────┘
        │    └────┬─────┘
        │         │ stop()
        │         ▼
        │    ┌──────────┐
        └────│ stopping │ (transitional - 2-10s)
             └────┬─────┘
                  │
         ┌────────┴────────┐
         ▼                 ▼
    ┌──────────┐      ┌──────────┐
    │ stopped  │      │  failed  │
    └──────────┘      └──────────┘
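The diagram can be encoded as a transition table so that invalid control calls are rejected in one place. This is a sketch under two stated assumptions: the connector on the left of the diagram is read as `running` moving to `stopping` via `stop()`, and the choice between `stopped` and `failed` depends on whether cleanup succeeded. The names `TRANSITIONS`, `transition`, and `settle` are hypothetical.

```typescript
type State = 'idle' | 'running' | 'paused' | 'stopping' | 'stopped' | 'failed';
type Action = 'start' | 'pause' | 'resume' | 'stop';

// Transitions driven by control calls. The running -> stopping edge is an
// assumed reading of the diagram's left-hand connector.
const TRANSITIONS: Record<State, Partial<Record<Action, State>>> = {
  idle: { start: 'running' },
  running: { pause: 'paused', stop: 'stopping' },
  paused: { resume: 'running', stop: 'stopping' },
  stopping: {},
  stopped: {},
  failed: {},
};

// Returns the next state, or throws if the action is not valid from `state`.
function transition(state: State, action: Action): State {
  const next = TRANSITIONS[state][action];
  if (next === undefined) {
    throw new Error(`Cannot ${action}() while ${state}`);
  }
  return next;
}

// `stopping` ends on its own once cleanup finishes; no control call is involved.
// Mapping cleanup success to stopped/failed is an assumption of this sketch.
function settle(state: State, cleanupSucceeded: boolean): State {
  if (state !== 'stopping') throw new Error(`Cannot settle from ${state}`);
  return cleanupSucceeded ? 'stopped' : 'failed';
}
```

Keeping `stopping` free of outgoing actions means a second `stop()` or a late `pause()` during termination fails loudly instead of silently changing state.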
API Design
Stop Endpoint Response
    interface StopResponse {
      status: 'stopped';
      executionId: string;
      reason: string; // 'user_abort' | 'timeout' | 'error'
      finalCostUsd: number;
      stoppedAt: number;
      summary: {
        nodesCompleted: number;
        nodesFailed: number;
        nodesTotal: number;
        durationMs: number;
      };
    }
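As a sketch of how a stop handler might assemble this payload, the function below derives the summary counts from per-node outcomes. `NodeOutcome` and `buildStopResponse` are hypothetical; the source does not say how node results are stored. The final cost is passed in rather than computed, on the assumption that it comes from cost-accrual-tracker.

```typescript
interface StopResponse {
  status: 'stopped';
  executionId: string;
  reason: string; // 'user_abort' | 'timeout' | 'error'
  finalCostUsd: number;
  stoppedAt: number;
  summary: {
    nodesCompleted: number;
    nodesFailed: number;
    nodesTotal: number;
    durationMs: number;
  };
}

// Hypothetical per-node record used only for this example.
interface NodeOutcome {
  status: 'completed' | 'failed' | 'pending';
}

function buildStopResponse(
  executionId: string,
  reason: string,
  startedAt: number,
  stoppedAt: number,
  finalCostUsd: number, // assumed to be supplied by cost-accrual-tracker
  nodes: NodeOutcome[],
): StopResponse {
  return {
    status: 'stopped',
    executionId,
    reason,
    finalCostUsd,
    stoppedAt,
    summary: {
      nodesCompleted: nodes.filter((n) => n.status === 'completed').length,
      nodesFailed: nodes.filter((n) => n.status === 'failed').length,
      // Nodes that never ran still count toward the total.
      nodesTotal: nodes.length,
      durationMs: stoppedAt - startedAt,
    },
  };
}
```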
Cleanup on Server Shutdown
    // In server.ts
    process.on('SIGINT', async () => {
      console.log('Shutting down...');

      // Stop all active executions gracefully
      const active = executionManager.listActive();
      await Promise.all(
        active.map(e => executionManager.stop(e.executionId, 'server_shutdown'))
      );

      server.close();
    });
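Signals can arrive more than once (a second Ctrl+C, or SIGTERM from a process supervisor after SIGINT), which would start the cleanup above twice. One possible guard, sketched here with a hypothetical `createShutdownHandler` factory, is to make shutdown idempotent so that repeated calls share a single run.

```typescript
// Hypothetical factory: wraps shutdown work so repeated signals share one run.
function createShutdownHandler(
  stopAll: () => Promise<unknown>,
  closeServer: () => void,
): () => Promise<void> {
  let inFlight: Promise<void> | undefined;

  return () => {
    if (!inFlight) {
      inFlight = (async () => {
        try {
          await stopAll();
        } finally {
          // Close the server even if stopping an execution failed.
          closeServer();
        }
      })();
    }
    return inFlight;
  };
}
```

With the names from the snippet above, `stopAll` would wrap the `Promise.all(...)` over `executionManager.listActive()` and `closeServer` would call `server.close()`. The returned function can then be registered for both `SIGINT` and `SIGTERM`. If `stopAll` rejects, the returned promise rejects too, so the signal listener should catch and log that error.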
Integration Points
| Component | Responsibility |
| --- | --- |
| ExecutionManager | Tracks executions, coordinates stop |
| DAGExecutor | Owns AbortController, orchestrates waves |
| ProcessExecutor | Spawns processes, handles SIGTERM/SIGKILL |
| /api/execute/stop | HTTP interface for stop requests |
References
See /references/process-signals.md for Unix signal handling details.