Load Balancing Coordinator Skill
Overview
This skill balances load across a swarm of agents. It combines work-stealing, dynamic task distribution, priority queue management, and adaptive resource allocation.
When to Use
- Distributing tasks across multiple agents efficiently
- Preventing agent overload while maximizing utilization
- Implementing fair scheduling across different task priorities
- Optimizing throughput in distributed systems
- Handling variable workloads with adaptive balancing
- Migrating tasks from overloaded to underloaded agents
Quick Start
Initialize load balancer
npx claude-flow agent spawn load-balancer --type coordinator
Start load balancing
npx claude-flow load-balance --swarm-id <id> --strategy adaptive
Monitor load distribution
npx claude-flow agent-metrics --type load-balancer
Adjust balancing parameters
npx claude-flow config-manage --action update --config '{"stealThreshold": 5, "agingBoost": 10}'
Architecture
+-----------------------------------------------------------+
|                Load Balancing Coordinator                 |
+-----------------------------------------------------------+
| Work Stealer   | Load Balancer   | Queue Manager          |
+----------------+-----------------+------------------------+
        |                  |                   |
        v                  v                   v
+---------------+ +-----------------+ +------------------+
| Victim Select | | Agent Capacity  | | Priority Queues  |
| - Heaviest    | | - Load Tracking | | - Critical       |
| - Threshold   | | - Performance   | | - High/Normal    |
| - Locality    | | - Migration     | | - Low/Background |
+---------------+ +-----------------+ +------------------+
        |                  |                   |
        v                  v                   v
+-----------------------------------------------------------+
|               Resource Optimization Engine                |
+-----------------------------------------------------------+
Core Capabilities
- Work-Stealing Algorithm
Efficiently redistributes work from overloaded agents:
    // Work-stealing configuration
    const workStealing = {
      stealThreshold: 5,            // Steal when queue > 5 tasks
      stealPercentage: 0.5,         // Take 50% of victim's queue
      victimSelection: 'heaviest',  // Steal from busiest agent
      localityAware: true           // Prefer nearby agents
    };

    // Victim selection strategies:
    // - heaviest: Steal from agent with most tasks
    // - random: Random selection for fairness
    // - locality: Prefer agents in same topology region
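The configuration above can be read as a small algorithm: an idle agent looks for the busiest peer whose queue exceeds `stealThreshold` and takes `stealPercentage` of its tasks. The following is a minimal sketch of that reading, not the skill's actual implementation. The agent shape (`{ id, queue }`) and the function names `selectVictim` and `stealWork` are assumptions made for illustration, and only the `heaviest` strategy is shown.

```javascript
// Hypothetical in-memory model: each agent is { id, queue: [...tasks] }.

// 'heaviest' strategy: pick the busiest agent whose queue exceeds the threshold.
function selectVictim(agents, thiefId, stealThreshold) {
  let victim = null;
  for (const agent of agents) {
    if (agent.id === thiefId) continue;                  // never steal from yourself
    if (agent.queue.length <= stealThreshold) continue;  // not overloaded enough
    if (victim === null || agent.queue.length > victim.queue.length) {
      victim = agent;
    }
  }
  return victim;
}

// Moves a share of the victim's queue to the thief; returns how many tasks moved.
function stealWork(agents, thief, { stealThreshold = 5, stealPercentage = 0.5 } = {}) {
  const victim = selectVictim(agents, thief.id, stealThreshold);
  if (victim === null) return 0;
  const count = Math.floor(victim.queue.length * stealPercentage);
  // Take from the tail so the victim keeps the tasks it would run next.
  const stolen = victim.queue.splice(victim.queue.length - count, count);
  thief.queue.push(...stolen);
  return stolen.length;
}

const agents = [
  { id: 'a', queue: [1, 2, 3, 4, 5, 6, 7, 8] },
  { id: 'b', queue: [9, 10, 11, 12, 13, 14] },
  { id: 'c', queue: [] },
];
const moved = stealWork(agents, agents[2]);
console.log(moved, agents.map((a) => a.queue.length)); // 4 tasks move from 'a' to 'c'
```

Stealing from the tail of the victim's queue is a common choice because it leaves the victim's next tasks untouched; whether the real coordinator does the same is not stated in this document.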
- Dynamic Load Balancing
Real-time load distribution:
| Strategy | Description | Best For |
|----------|-------------|----------|
| Round Robin | Sequential distribution | Uniform tasks |
| Weighted | Based on agent capacity | Heterogeneous agents |
| Least Connections | To least loaded agent | Variable task duration |
| Adaptive | ML-based optimization | Complex workloads |
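To make the first three strategies concrete, here is a rough sketch of how each might pick the next agent. The agent shape (`{ id, capacity, active }`, where `active` counts in-flight tasks) and all function names are assumptions for illustration. The weighted picker shown is a capacity-relative least-load rule, which is only one of several ways to weight by capacity. The adaptive strategy is omitted because the document does not describe its model.

```javascript
// Hypothetical agent shape: { id, capacity, active }.

// Round robin: hand out agents in a fixed cycle.
function createRoundRobin(agents) {
  let next = 0;
  return () => {
    const agent = agents[next];
    next = (next + 1) % agents.length;
    return agent;
  };
}

// Least connections: the agent with the fewest in-flight tasks.
function pickLeastConnections(agents) {
  return agents.reduce((best, a) => (a.active < best.active ? a : best));
}

// Weighted: the agent with the lowest load relative to its capacity.
function pickWeighted(agents) {
  return agents.reduce((best, a) =>
    (a.active / a.capacity < best.active / best.capacity ? a : best));
}

const pool = [
  { id: 'a', capacity: 4, active: 2 },
  { id: 'b', capacity: 1, active: 1 },
  { id: 'c', capacity: 8, active: 3 },
];
const nextAgent = createRoundRobin(pool);
console.log([nextAgent().id, nextAgent().id, nextAgent().id, nextAgent().id]); // a, b, c, a
console.log(pickLeastConnections(pool).id); // b
console.log(pickWeighted(pool).id);         // c
```

Note how the last two disagree on the same pool: agent `b` has the fewest tasks but is already at capacity, which is the case the weighted strategy is meant to handle.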
- Queue Management
Multi-level feedback queue scheduling:
| Priority Level | Weight | Use Case |
|----------------|--------|----------|
| Critical | 40% | System-critical tasks |
| High | 30% | User-facing operations |
| Normal | 20% | Standard processing |
| Low | 10% | Background tasks |
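One way to honor these weights is a smooth weighted round-robin across the four levels, so that out of every ten dequeues roughly four are critical, three high, two normal, and one low. The sketch below assumes that interpretation. It is not a full multi-level feedback queue (there is no demotion or aging), and `createPriorityScheduler` and its methods are hypothetical names.

```javascript
const LEVELS = [
  { name: 'critical', weight: 40 },
  { name: 'high', weight: 30 },
  { name: 'normal', weight: 20 },
  { name: 'low', weight: 10 },
];

function createPriorityScheduler() {
  const queues = new Map(LEVELS.map((l) => [l.name, []]));
  const credit = new Map(LEVELS.map((l) => [l.name, 0]));

  return {
    enqueue(level, task) {
      queues.get(level).push(task);
    },
    // Smooth weighted round-robin over the levels that currently hold tasks.
    dequeue() {
      const active = LEVELS.filter((l) => queues.get(l.name).length > 0);
      if (active.length === 0) return undefined;
      let total = 0;
      let best = active[0];
      for (const l of active) {
        credit.set(l.name, credit.get(l.name) + l.weight); // earn credit each round
        total += l.weight;
        if (credit.get(l.name) > credit.get(best.name)) best = l;
      }
      credit.set(best.name, credit.get(best.name) - total); // pay for being served
      return queues.get(best.name).shift();
    },
  };
}

const scheduler = createPriorityScheduler();
for (const { name } of LEVELS) {
  for (let i = 0; i < 10; i++) scheduler.enqueue(name, `${name}-${i}`);
}
const served = { critical: 0, high: 0, normal: 0, low: 0 };
for (let i = 0; i < 10; i++) {
  const task = scheduler.dequeue();
  served[task.split('-')[0]] += 1;
}
console.log(served); // { critical: 4, high: 3, normal: 2, low: 1 }
```

Because low-priority work still earns credit every round, it cannot be starved outright, which is the main reason to prefer weights over strict priority ordering.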
- Resource Allocation
Multi-objective optimization:
- Minimize latency
- Maximize utilization
- Balance load
- Minimize cost
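The four objectives above pull in different directions, so some rule is needed to trade them off. A simple and common option is a weighted sum over normalized metrics, sketched below. The weights, the metric names, and the assumption that every metric is scaled to the range 0 to 1 are all illustrative; the document does not specify how the skill combines objectives.

```javascript
// Hypothetical weights; each metric is assumed to be normalized to [0, 1].
const objectiveWeights = { latency: 0.4, utilization: 0.3, balance: 0.2, cost: 0.1 };

// Higher is better: utilization and balance are rewarded,
// latency and cost are penalized.
function scoreAllocation(m, w = objectiveWeights) {
  return w.utilization * m.utilization + w.balance * m.balance
    - w.latency * m.latency - w.cost * m.cost;
}

// Returns the candidate allocation with the highest score.
function chooseAllocation(candidates) {
  return candidates.reduce((best, c) =>
    (scoreAllocation(c.metrics) > scoreAllocation(best.metrics) ? c : best));
}

const candidates = [
  { name: 'A', metrics: { latency: 0.2, utilization: 0.9, balance: 0.8, cost: 0.5 } },
  { name: 'B', metrics: { latency: 0.6, utilization: 0.95, balance: 0.9, cost: 0.3 } },
];
const chosen = chooseAllocation(candidates);
console.log(chosen.name); // A: slightly lower utilization, but much lower latency
```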
Scheduling Algorithms
Earliest Deadline First (EDF)
    // EDF for real-time task scheduling
    const edfScheduler = {
      // Order tasks by ascending deadline without mutating the caller's array
      schedule(tasks) {
        return [...tasks].sort((a, b) => a.deadline - b.deadline);
      },

      // Liu & Layland utilization bound: admit only while total utilization <= 1
      admissionControl(newTask, existingTasks) {
        const utilization = [...existingTasks, newTask]
          .reduce((sum, t) => sum + (t.executionTime / t.period), 0);
        return utilization <= 1.0;
      }
    };
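A short worked example may help show what the admission test does. Each periodic task contributes `executionTime / period` to total utilization, and a new task is admitted only if the sum stays at or below 1. The sketch below restates the two operations as standalone functions so it can run on its own; the task names and numbers are made up for illustration.

```javascript
// Standalone restatement of the EDF ordering and admission test.
function edfOrder(tasks) {
  return [...tasks].sort((a, b) => a.deadline - b.deadline);
}

function utilization(tasks) {
  return tasks.reduce((sum, t) => sum + t.executionTime / t.period, 0);
}

function admit(newTask, existingTasks) {
  return utilization([...existingTasks, newTask]) <= 1.0;
}

// Hypothetical periodic tasks (times in arbitrary units).
const running = [
  { id: 'render', executionTime: 2, period: 8, deadline: 8 }, // utilization 0.25
  { id: 'sync', executionTime: 3, period: 6, deadline: 6 },   // utilization 0.50
];

const lightTask = { id: 'index', executionTime: 1, period: 8, deadline: 8 };  // +0.125
const heavyTask = { id: 'backup', executionTime: 3, period: 8, deadline: 8 }; // +0.375

console.log(admit(lightTask, running)); // true: 0.875 <= 1
console.log(admit(heavyTask, running)); // false: 1.125 > 1
console.log(edfOrder(running).map((t) => t.id)); // [ 'sync', 'render' ]
```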
Completely Fair Scheduler (CFS)
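    // CFS for fair task distribution
    const cfsScheduler = {
      virtualRuntime: new Map(),  // task id -> accumulated virtual runtime
      weights: new Map(),         // task id -> weight (defaults to 1)

      schedule() {
        // Select task with minimum virtual runtime
        return this.getMinVirtualRuntimeTask();
      },

      // Returns the id of the tracked task with the smallest virtual runtime,
      // or undefined when no task has been recorded yet
      getMinVirtualRuntimeTask() {
        let minId;
        let minRuntime = Infinity;
        for (const [id, vruntime] of this.virtualRuntime) {
          if (vruntime < minRuntime) {
            minRuntime = vruntime;
            minId = id;
          }
        }
        return minId;
      },

      updateVirtualRuntime(task, elapsedTime) {
        const weight = this.weights.get(task.id) || 1;
        const vruntime = this.virtualRuntime.get(task.id) || 0;
        // Heavier tasks accumulate virtual runtime more slowly, so they run more often
        this.virtualRuntime.set(task.id, vruntime + (elapsedTime / weight));
      }
    };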
Weighted Fair Queuing (WFQ)
Proportional bandwidth allocation based on agent weights.
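The document gives no code for WFQ, so the following is only a sketch of the general idea. Each queued task receives a virtual finish time of `start + cost / weight`, and the task with the smallest finish time is served first. For brevity it uses a simplified self-clocked variant, where virtual time advances to the finish tag of the last served task, rather than the exact WFQ virtual clock. The names `createWfq`, `enqueue`, and `dequeue` are hypothetical.

```javascript
// weights: { agentId: weight }. Higher weight means a larger share of service.
function createWfq(weights) {
  let virtualTime = 0;          // finish tag of the most recently served task
  let seq = 0;                  // arrival counter, used to break ties (FIFO)
  const lastFinish = new Map(); // agentId -> finish tag of its latest task
  const pending = [];

  return {
    enqueue(agentId, task, cost = 1) {
      const start = Math.max(virtualTime, lastFinish.get(agentId) ?? 0);
      const finish = start + cost / weights[agentId];
      lastFinish.set(agentId, finish);
      pending.push({ agentId, task, finish, seq: seq++ });
    },
    // Serve the pending task with the smallest virtual finish time.
    dequeue() {
      if (pending.length === 0) return undefined;
      let idx = 0;
      for (let i = 1; i < pending.length; i++) {
        const a = pending[i];
        const b = pending[idx];
        if (a.finish < b.finish || (a.finish === b.finish && a.seq < b.seq)) idx = i;
      }
      const [item] = pending.splice(idx, 1);
      virtualTime = item.finish;
      return item;
    },
  };
}

// Agent 'a' has twice the weight of agent 'b'.
const wfq = createWfq({ a: 2, b: 1 });
for (let i = 0; i < 4; i++) wfq.enqueue('a', `a${i}`);
for (let i = 0; i < 2; i++) wfq.enqueue('b', `b${i}`);

const serviceOrder = [];
for (let item = wfq.dequeue(); item !== undefined; item = wfq.dequeue()) {
  serviceOrder.push(item.agentId);
}
console.log(serviceOrder.join('')); // aabaab
```

The linear scan in `dequeue` keeps the sketch short; a real queue would more likely keep `pending` in a min-heap keyed on the finish time.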
MCP Integration
    // MCP load balancing integration
    // `mcp` is assumed to be an MCP tool client already in scope;
    // calculateRebalancing() is not defined here and must be supplied.
    const loadBalancingIntegration = {
      // Real-time metrics collection
      async collectMetrics() {
        const [performance, bottlenecks, tokenUsage] = await Promise.all([
          mcp.performance_report({ format: 'json' }),
          mcp.bottleneck_analyze({}),
          mcp.token_usage({})
        ]);

        return { performance, bottlenecks, tokenUsage, timestamp: Date.now() };
      },

      // Execute load balancing
      async coordinateLoadBalancing(swarmId) {
        const agents = await mcp.agent_list({ swarmId });
        const metrics = await mcp.agent_metrics({});

        const rebalancing = this.calculateRebalancing(agents, metrics);

        // Only call the load balancer when a rebalance is actually needed
        if (rebalancing.required) {
          await mcp.load_balance({
            swarmId,
            tasks: rebalancing.taskMigrations
          });
        }

        return rebalancing;
      }
    };
Circuit Breaker Pattern
Protect against cascade failures:
    const circuitBreaker = {
      state: 'CLOSED',       // CLOSED, OPEN, HALF_OPEN
      failureThreshold: 5,   // Open after 5 failures
      successThreshold: 3,   // Close after 3 successes
      timeout: 60000,        // Recovery timeout (ms)

      // shouldAttemptReset(), onSuccess(), and onFailure() are not shown here
      async execute(operation, fallback) {
        // While open, short-circuit to the fallback until the timeout elapses
        if (this.state === 'OPEN' && !this.shouldAttemptReset()) {
          return fallback ? await fallback() : null;
        }

        try {
          const result = await operation();
          this.onSuccess();
          return result;
        } catch (error) {
          this.onFailure();
          if (fallback) return fallback();
          throw error;
        }
      }
    };
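The object above calls `shouldAttemptReset()`, `onSuccess()`, and `onFailure()` without defining them. The sketch below shows one plausible way to fill in those state transitions, written as a standalone factory so it runs on its own. The factory name `createCircuitBreaker`, the injectable `now` clock, and the exact transition rules (for example, reopening on any failure while half-open) are assumptions, not the skill's documented behavior.

```javascript
function createCircuitBreaker({
  failureThreshold = 5,
  successThreshold = 3,
  timeout = 60000,
  now = Date.now, // injectable clock, convenient for tests
} = {}) {
  let state = 'CLOSED';
  let failures = 0;   // consecutive failures while CLOSED
  let successes = 0;  // consecutive successes while HALF_OPEN
  let openedAt = 0;

  function shouldAttemptReset() {
    return now() - openedAt >= timeout;
  }

  function onSuccess() {
    if (state === 'HALF_OPEN') {
      successes += 1;
      if (successes >= successThreshold) {
        state = 'CLOSED';
        failures = 0;
      }
    } else {
      failures = 0; // a success while CLOSED resets the failure streak
    }
  }

  function onFailure() {
    failures += 1;
    // Any failure while probing, or too many while CLOSED, opens the circuit.
    if (state === 'HALF_OPEN' || failures >= failureThreshold) {
      state = 'OPEN';
      openedAt = now();
    }
  }

  return {
    get state() {
      return state;
    },
    async execute(operation, fallback) {
      if (state === 'OPEN') {
        if (!shouldAttemptReset()) return fallback ? fallback() : null;
        state = 'HALF_OPEN'; // timeout elapsed: let a probe request through
        successes = 0;
      }
      try {
        const result = await operation();
        onSuccess();
        return result;
      } catch (error) {
        onFailure();
        if (fallback) return fallback();
        throw error;
      }
    },
  };
}
```

A caller would wrap each remote operation, for example `breaker.execute(() => callAgent(task), () => queueLocally(task))`, where `callAgent` and `queueLocally` stand in for whatever the application actually does.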
Key Metrics
Performance Indicators
| Metric | Description | Target |
|--------|-------------|--------|
| Load Distribution Variance | Balance across agents | < 0.1 |
| Task Migration Rate | Work-stealing frequency | < 5% |
| Queue Latency | Time in queue | < 100ms |
| Utilization Efficiency | Resource usage | > 80% |
| Fairness Index | Jain's fairness index | > 0.9 |
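Jain's fairness index for per-agent loads x_1..x_n is (sum of x_i)^2 / (n * sum of x_i^2). It equals 1 when every agent carries the same load and falls toward 1/n as the load concentrates on a single agent. A small sketch follows; treating an empty or all-idle swarm as perfectly fair is an assumption made here to avoid dividing by zero.

```javascript
// Jain's fairness index: (sum x)^2 / (n * sum x^2), in the range (0, 1].
function jainFairness(loads) {
  const n = loads.length;
  const sum = loads.reduce((s, x) => s + x, 0);
  const sumSq = loads.reduce((s, x) => s + x * x, 0);
  if (n === 0 || sumSq === 0) return 1; // assumption: nothing to be unfair about
  return (sum * sum) / (n * sumSq);
}

console.log(jainFairness([10, 10, 10, 10])); // 1    (perfectly even)
console.log(jainFairness([40, 0, 0, 0]));    // 0.25 (one agent does everything)
console.log(jainFairness([12, 9, 10, 9]) > 0.9); // true (meets the target above)
```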
Benchmarking
    // Load balancer benchmarks
    // distributeAndExecute() and distributeLoad() are not defined here
    const benchmarks = {
      async throughputTest(taskCount, agentCount) {
        const startTime = performance.now();
        await this.distributeAndExecute(taskCount, agentCount);
        const endTime = performance.now();

        return {
          throughput: taskCount / ((endTime - startTime) / 1000),  // tasks per second
          averageLatency: (endTime - startTime) / taskCount        // ms per task
        };
      },

      async loadBalanceEfficiency(tasks, agents) {
        // distribution is expected to hold one task count per agent
        const distribution = await this.distributeLoad(tasks, agents);
        const idealLoad = tasks.length / agents.length;

        const variance = distribution.reduce((sum, load) =>
          sum + Math.pow(load - idealLoad, 2), 0) / agents.length;

        return {
          efficiency: 1 / (1 + variance),  // 1 when perfectly balanced
          loadVariance: variance
        };
      }
    };
Commands Reference
Real-time load monitoring
npx claude-flow performance-report --format detailed
Bottleneck analysis
npx claude-flow bottleneck-analyze --component swarm-coordination
Resource utilization tracking
npx claude-flow metrics-collect --components '["load-balancer", "task-queue"]'
Configure load balancing strategy
npx claude-flow config-manage --action update --config '{"strategy": "adaptive", "threshold": 0.8}'
Integration Points
| Integration | Purpose |
|-------------|---------|
| Performance Monitor | Real-time metrics for load decisions |
| Topology Optimizer | Coordinate topology changes with load |
| Resource Allocator | Optimize resource distribution |
| Task Orchestrator | Receive load-balanced assignments |
Best Practices
- Gradual Migration: Move tasks incrementally to avoid oscillation
- Locality Awareness: Prefer local task execution to minimize latency
- Priority Preservation: Maintain task priorities during migration
- Monitoring: Track load balance metrics continuously
- Adaptive Thresholds: Adjust thresholds based on workload patterns
- Circuit Breakers: Protect against cascade failures
Example: Adaptive Load Balancing
    // Adaptive load balancing strategy
    const adaptiveBalancer = {
      config: {
        checkInterval: 5000,       // Check every 5 seconds
        migrationThreshold: 0.3,   // Migrate if imbalance > 30%
        cooldownPeriod: 30000,     // Wait 30s between migrations
        maxMigrations: 5           // Max 5 migrations per cycle
      },

      async balance(swarm) {
        const loads = await this.getAgentLoads(swarm);
        if (loads.length === 0) return;  // Nothing to balance

        const average = loads.reduce((a, b) => a + b, 0) / loads.length;
        const { migrationThreshold } = this.config;

        // Agents more than 30% above or below the average load
        const overloaded = loads.filter(l => l > average * (1 + migrationThreshold));
        const underloaded = loads.filter(l => l < average * (1 - migrationThreshold));

        if (overloaded.length > 0 && underloaded.length > 0) {
          await this.migrateTasks(overloaded, underloaded);
        }
      }
    };
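The example above declares `cooldownPeriod` and `maxMigrations` but leaves `migrateTasks` unspecified. One way those two settings could be applied is sketched below: a planner that moves one task at a time from the most loaded to the least loaded agent until the imbalance clears or the per-cycle cap is reached, plus a small cooldown gate. The agent shape (`{ id, load }`) and the names `planMigrations` and `createCooldown` are hypothetical.

```javascript
const balancerConfig = { migrationThreshold: 0.3, cooldownPeriod: 30000, maxMigrations: 5 };

// agents: [{ id, load }]. Returns a list of single-task moves: [{ from, to }].
function planMigrations(agents, cfg = balancerConfig) {
  if (agents.length === 0) return [];
  const loads = new Map(agents.map((a) => [a.id, a.load]));
  const average = agents.reduce((s, a) => s + a.load, 0) / agents.length;
  const high = average * (1 + cfg.migrationThreshold);
  const low = average * (1 - cfg.migrationThreshold);

  const plan = [];
  while (plan.length < cfg.maxMigrations) {
    const sorted = [...loads.entries()].sort((a, b) => b[1] - a[1]);
    const [fromId, fromLoad] = sorted[0];
    const [toId, toLoad] = sorted[sorted.length - 1];
    // Stop once no agent is both overloaded and paired with an underloaded one.
    if (fromLoad <= high || toLoad >= low) break;
    loads.set(fromId, fromLoad - 1);
    loads.set(toId, toLoad + 1);
    plan.push({ from: fromId, to: toId });
  }
  return plan;
}

// Returns a function that yields true at most once per cooldown period.
function createCooldown(periodMs, now = Date.now) {
  let last = -Infinity;
  return () => {
    if (now() - last < periodMs) return false;
    last = now();
    return true;
  };
}

const plan = planMigrations([
  { id: 'a', load: 20 },
  { id: 'b', load: 10 },
  { id: 'c', load: 0 },
]);
console.log(plan.length, plan[0]); // 5 moves, each from 'a' to 'c'
```

Capping the moves per cycle and gating cycles with a cooldown are both ways to apply the gradual-migration advice from the best practices: the swarm converges over several cycles instead of swinging between extremes.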
Related Skills
- optimization-monitor - Real-time performance monitoring
- optimization-resources - Resource allocation and scaling
- optimization-topology - Network topology optimization
- optimization-benchmark - Performance validation
Version History
- 1.0.0 (2026-01-02): Initial release - converted from load-balancer agent with work-stealing, queue management, scheduling algorithms, circuit breaker pattern, and adaptive balancing