AgentDB Learning Plugins
What This Skill Does
Provides access to 9 reinforcement learning algorithms via AgentDB's plugin system. Create, train, and deploy learning plugins for autonomous agents that improve through experience. Includes offline RL (Decision Transformer), value-based learning (Q-Learning), policy gradients (Actor-Critic), and advanced techniques.
Performance: Train models 10-100x faster with WASM-accelerated neural inference.
Prerequisites
- Node.js 18+
- AgentDB v1.0.7+ (via agentic-flow)
- Basic understanding of reinforcement learning (recommended)
Quick Start with CLI
Create Learning Plugin
```bash
# Interactive wizard
npx agentdb@latest create-plugin

# Use a specific template
npx agentdb@latest create-plugin -t decision-transformer -n my-agent

# Preview without creating
npx agentdb@latest create-plugin -t q-learning --dry-run

# Custom output directory
npx agentdb@latest create-plugin -t actor-critic -o ./plugins
```
List Available Templates
```bash
# Show all plugin templates
npx agentdb@latest list-templates
```
Available templates:
- decision-transformer (sequence modeling RL - recommended)
- q-learning (value-based learning)
- sarsa (on-policy TD learning)
- actor-critic (policy gradient with baseline)
- curiosity-driven (exploration-based)
Manage Plugins
```bash
# List installed plugins
npx agentdb@latest list-plugins

# Get plugin information (shows algorithm, configuration, training status)
npx agentdb@latest plugin-info my-agent
```
Quick Start with API
```typescript
import { createAgentDBAdapter } from 'agentic-flow/reasoningbank';

// Initialize with learning enabled
const adapter = await createAgentDBAdapter({
  dbPath: '.agentdb/learning.db',
  enableLearning: true, // Enable learning plugins
  enableReasoning: true,
  cacheSize: 1000,
});

// Store training experience
// (computeEmbedding is assumed to be your own embedding function)
await adapter.insertPattern({
  id: '',
  type: 'experience',
  domain: 'game-playing',
  pattern_data: JSON.stringify({
    embedding: await computeEmbedding('state-action-reward'),
    pattern: {
      state: [0.1, 0.2, 0.3],
      action: 2,
      reward: 1.0,
      next_state: [0.15, 0.25, 0.35],
      done: false,
    },
  }),
  confidence: 0.9,
  usage_count: 1,
  success_count: 1,
  created_at: Date.now(),
  last_used: Date.now(),
});

// Train learning model
const metrics = await adapter.train({
  epochs: 50,
  batchSize: 32,
});

console.log('Training Loss:', metrics.loss);
console.log('Duration:', metrics.duration, 'ms');
```
Available Learning Algorithms (9 Total)
1. Decision Transformer (Recommended)

Type: Offline Reinforcement Learning
Best For: Learning from logged experiences, imitation learning
Strengths: No online interaction needed, stable training
```bash
npx agentdb@latest create-plugin -t decision-transformer -n dt-agent
```
Use Cases:
- Learn from historical data
- Imitation learning from expert demonstrations
- Safe learning without environment interaction
- Sequence modeling tasks
Configuration:
{ "algorithm": "decision-transformer", "model_size": "base", "context_length": 20, "embed_dim": 128, "n_heads": 8, "n_layers": 6 }
2. Q-Learning

Type: Value-Based RL (Off-Policy)
Best For: Discrete action spaces, sample efficiency
Strengths: Proven, simple, works well for small/medium problems
```bash
npx agentdb@latest create-plugin -t q-learning -n q-agent
```
Use Cases:
- Grid worlds, board games
- Navigation tasks
- Resource allocation
- Discrete decision-making
Configuration:
{ "algorithm": "q-learning", "learning_rate": 0.001, "gamma": 0.99, "epsilon": 0.1, "epsilon_decay": 0.995 }
3. SARSA

Type: Value-Based RL (On-Policy)
Best For: Safe exploration, risk-sensitive tasks
Strengths: More conservative than Q-Learning, better for safety
```bash
npx agentdb@latest create-plugin -t sarsa -n sarsa-agent
```
Use Cases:
- Safety-critical applications
- Risk-sensitive decision-making
- Online learning with exploration
Configuration:
{ "algorithm": "sarsa", "learning_rate": 0.001, "gamma": 0.99, "epsilon": 0.1 }
4. Actor-Critic

Type: Policy Gradient with Value Baseline
Best For: Continuous actions, variance reduction
Strengths: Stable, works for continuous/discrete actions
```bash
npx agentdb@latest create-plugin -t actor-critic -n ac-agent
```
Use Cases:
- Continuous control (robotics, simulations)
- Complex action spaces
- Multi-agent coordination
Configuration:
{ "algorithm": "actor-critic", "actor_lr": 0.001, "critic_lr": 0.002, "gamma": 0.99, "entropy_coef": 0.01 }
5. Active Learning

Type: Query-Based Learning
Best For: Label-efficient learning, human-in-the-loop
Strengths: Minimizes labeling cost, focuses on uncertain samples
Use Cases:
- Human feedback incorporation
- Label-efficient training
- Uncertainty sampling
- Annotation cost reduction
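A minimal uncertainty-sampling sketch, assuming a hypothetical `predictConfidence` model call and a labeling budget; neither is an AgentDB API:

```typescript
// Uncertainty sampling: label only the samples the model is least sure about.
// `predictConfidence` is a hypothetical stand-in for your model's scorer.
async function selectForLabeling(
  pool: number[][], // unlabeled states
  predictConfidence: (state: number[]) => Promise<number>,
  budget: number
): Promise<number[][]> {
  const scored = await Promise.all(
    pool.map(async (state) => ({ state, conf: await predictConfidence(state) }))
  );
  // Lowest-confidence samples are the most informative ones to annotate.
  scored.sort((a, b) => a.conf - b.conf);
  return scored.slice(0, budget).map((x) => x.state);
}
```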
6. Adversarial Training

Type: Robustness Enhancement
Best For: Safety, robustness to perturbations
Strengths: Improves model robustness, adversarial defense
Use Cases:
- Security applications
- Robust decision-making
- Adversarial defense
- Safety testing
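As a rough illustration of the idea, adversarial training augments the data with small worst-case perturbations of each state (FGSM-style). The gradient sign is assumed to come from your own model; this is not AgentDB's implementation:

```typescript
// FGSM-flavored state perturbation: x_adv = x + epsilon * sign(dLoss/dx).
// `gradSign` (sign of the loss gradient w.r.t. the state) is assumed given
// by the model; training on both clean and perturbed states improves
// robustness to small input changes.
function perturbState(
  state: number[],
  gradSign: number[], // entries in {-1, 0, +1}
  epsilon = 0.01      // perturbation budget
): number[] {
  return state.map((x, i) => x + epsilon * (gradSign[i] ?? 0));
}
```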
7. Curriculum Learning

Type: Progressive Difficulty Training
Best For: Complex tasks, faster convergence
Strengths: Stable learning, faster convergence on hard tasks
Use Cases:
- Complex multi-stage tasks
- Hard exploration problems
- Skill composition
- Transfer learning
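One way this could look with the adapter API shown earlier: train in stages over progressively harder experience pools. The stage domains, the `collectExperiences` helper, and the convergence threshold are all illustrative assumptions:

```typescript
// Curriculum sketch: easy -> medium -> hard. Hypothetical domains and helper;
// adapter.train() is assumed to consume the patterns inserted so far.
const stages = ['task-easy', 'task-medium', 'task-hard'];

for (const domain of stages) {
  await collectExperiences(domain); // hypothetical: insert patterns for stage
  const metrics = await adapter.train({ epochs: 20, batchSize: 32 });
  // Advance only once the current difficulty is roughly mastered.
  if (metrics.loss > 0.1) {
    console.warn(`Stage ${domain} has not converged; consider more epochs`);
  }
}
```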
8. Federated Learning

Type: Distributed Learning
Best For: Privacy, distributed data
Strengths: Privacy-preserving, scalable
Use Cases:
- Multi-agent systems
- Privacy-sensitive data
- Distributed training
- Collaborative learning
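The core aggregation step, shown abstractly: each participant trains locally and shares only parameters, which a coordinator averages (FedAvg). Plain arrays stand in for model weights here; this is not an AgentDB API:

```typescript
// FedAvg-style aggregation sketch: average locally trained weight vectors.
// Raw experiences never leave their owners; only parameters are shared.
function federatedAverage(localWeights: number[][]): number[] {
  const n = localWeights.length;
  const dim = localWeights[0].length;
  const global = new Array<number>(dim).fill(0);
  for (const w of localWeights) {
    for (let i = 0; i < dim; i++) global[i] += w[i] / n;
  }
  return global; // broadcast back to every agent as the new shared model
}
```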
9. Multi-Task Learning

Type: Transfer Learning
Best For: Related tasks, knowledge sharing
Strengths: Faster learning on new tasks, better generalization
Use Cases:
- Task families
- Transfer learning
- Domain adaptation
- Meta-learning
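With the adapter API from the Quick Start, multi-task training could amount to storing experiences under per-task domains and running one shared training pass. The task names and experience source below are illustrative, and the pattern fields are abbreviated:

```typescript
// Multi-task sketch: per-task domains, one shared model. Hypothetical
// getExperiencesFor helper; pattern fields abbreviated from the Quick Start.
const tasks = ['navigation', 'manipulation', 'planning'];

for (const task of tasks) {
  for (const exp of getExperiencesFor(task)) {
    await adapter.insertPattern({
      id: '',
      type: 'experience',
      domain: `multi-task/${task}`, // task identity preserved in the domain
      pattern_data: JSON.stringify(exp),
      confidence: 0.8,
      usage_count: 1,
      success_count: 1,
      created_at: Date.now(),
      last_used: Date.now(),
    });
  }
}

// One training run over the pooled, multi-task experience set.
await adapter.train({ epochs: 50, batchSize: 64 });
```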
Training Workflow
1. Collect Experiences
```typescript
// Store experiences during agent execution
for (let i = 0; i < numEpisodes; i++) {
  const episode = runEpisode();

  for (const step of episode.steps) {
    await adapter.insertPattern({
      id: '',
      type: 'experience',
      domain: 'task-domain',
      pattern_data: JSON.stringify({
        embedding: await computeEmbedding(JSON.stringify(step)),
        pattern: {
          state: step.state,
          action: step.action,
          reward: step.reward,
          next_state: step.next_state,
          done: step.done,
        },
      }),
      confidence: step.reward > 0 ? 0.9 : 0.5,
      usage_count: 1,
      success_count: step.reward > 0 ? 1 : 0,
      created_at: Date.now(),
      last_used: Date.now(),
    });
  }
}
```
2. Train Model
```typescript
// Train on collected experiences
const trainingMetrics = await adapter.train({
  epochs: 100,
  batchSize: 64,
  learningRate: 0.001,
  validationSplit: 0.2,
});

console.log('Training Metrics:', trainingMetrics);
// {
//   loss: 0.023,
//   valLoss: 0.028,
//   duration: 1523,
//   epochs: 100
// }
```
3. Evaluate Performance
```typescript
// Retrieve similar successful experiences
const testQuery = await computeEmbedding(JSON.stringify(testState));
const result = await adapter.retrieveWithReasoning(testQuery, {
  domain: 'task-domain',
  k: 10,
  synthesizeContext: true,
});

// Evaluate action quality
const suggestedAction = result.memories[0].pattern.action;
const confidence = result.memories[0].similarity;

console.log('Suggested Action:', suggestedAction);
console.log('Confidence:', confidence);
```
Advanced Training Techniques
Experience Replay
```typescript
// Store experiences in a buffer
const replayBuffer = [];

// Sample a random batch for training
const batch = sampleRandomBatch(replayBuffer, 32); // batch size 32

// Train on the batch
await adapter.train({
  data: batch,
  epochs: 1,
  batchSize: 32,
});
```
Prioritized Experience Replay
```typescript
// Store experiences with priority (TD error)
await adapter.insertPattern({
  // ... standard fields
  confidence: tdError, // Use TD error as confidence/priority
  // ...
});

// Retrieve high-priority experiences
const highPriority = await adapter.retrieveWithReasoning(queryEmbedding, {
  domain: 'task-domain',
  k: 32,
  minConfidence: 0.7, // Only high TD-error experiences
});
```
Multi-Agent Training
```typescript
// Collect experiences from multiple agents
for (const agent of agents) {
  const experience = await agent.step();

  await adapter.insertPattern({
    // ... store experience with agent ID
    domain: `multi-agent/${agent.id}`,
  });
}

// Train shared model
await adapter.train({
  epochs: 50,
  batchSize: 64,
});
```
Performance Optimization
Batch Training
```typescript
// Collect a batch of experiences
const experiences = collectBatch(1000); // batch size 1000

// Batch insert (500x faster)
for (const exp of experiences) {
  await adapter.insertPattern({ /* ... */ });
}

// Train on the batch
await adapter.train({
  epochs: 10,
  batchSize: 128, // Larger batch for efficiency
});
```
Incremental Learning
```typescript
// Train incrementally as new data arrives
setInterval(async () => {
  const newExperiences = getNewExperiences();

  if (newExperiences.length > 100) {
    await adapter.train({
      epochs: 5,
      batchSize: 32,
    });
  }
}, 60000); // Every minute
```
Integration with Reasoning Agents
Combine learning with reasoning for better performance:
```typescript
// Train learning model
await adapter.train({ epochs: 50, batchSize: 32 });

// Use reasoning agents for inference
const result = await adapter.retrieveWithReasoning(queryEmbedding, {
  domain: 'decision-making',
  k: 10,
  useMMR: true,            // Diverse experiences
  synthesizeContext: true, // Rich context
  optimizeMemory: true,    // Consolidate patterns
});

// Make a decision based on learned experiences + reasoning
const decision = result.context.suggestedAction;
const confidence = result.memories[0].similarity;
```
CLI Operations
```bash
# Create plugin
npx agentdb@latest create-plugin -t decision-transformer -n my-plugin

# List plugins
npx agentdb@latest list-plugins

# Get plugin info
npx agentdb@latest plugin-info my-plugin

# List templates
npx agentdb@latest list-templates
```
Troubleshooting
Issue: Training not converging
```typescript
// Reduce the learning rate
await adapter.train({
  epochs: 100,
  batchSize: 32,
  learningRate: 0.0001, // Lower learning rate
});
```
Issue: Overfitting
```typescript
// Use a validation split
await adapter.train({
  epochs: 50,
  batchSize: 64,
  validationSplit: 0.2, // 20% validation
});

// Enable memory optimization
await adapter.retrieveWithReasoning(queryEmbedding, {
  optimizeMemory: true, // Consolidate patterns, reduce overfitting
});
```
Issue: Slow training
- Enable quantization for faster inference
- Use binary quantization (32x faster)
Learn More
- Algorithm Papers: See docs/algorithms/ for detailed papers
- GitHub: https://github.com/ruvnet/agentic-flow/tree/main/packages/agentdb
- MCP Integration: npx agentdb@latest mcp
- Website: https://agentdb.ruv.io
Category: Machine Learning / Reinforcement Learning
Difficulty: Intermediate to Advanced
Estimated Time: 30-60 minutes