ml-evolution-agent

Auto-evolving ML competition agent. It learns from each experiment, accumulates HCC multi-layer memory, and continuously improves leaderboard (LB) scores. Inspired by the ML-Master methodology that ranked #1 on MLE-Bench.


ML Evolution Agent 🤖

Auto-evolving ML competition agent that learns from every experiment.

What This Skill Does

  1. Auto-evolves ML models for Kaggle-style competitions
  2. HCC Multi-layer Memory - Episodic, Pattern, Knowledge, Strategic layers
  3. Continuous improvement - Each phase learns from previous failures/successes
  4. Resource-aware - Respects system limits (time, memory, API quotas)

When to Use

  • User mentions Kaggle competition
  • Tabular data classification/regression tasks
  • Need to beat a target LB score
  • User wants automated ML experimentation

Quick Start

# Initialize
from ml_evolution import MLEvolutionAgent

agent = MLEvolutionAgent(
    competition="playground-series-s6e2",
    target_lb=0.95400,
    data_dir="./data"
)

# Run evolution
agent.evolve(max_phases=10)

HCC Memory Architecture

Layer 1: Episodic Memory
├── Experiment logs (phase, CV, LB, features, params)
├── Success/failure records
└── Resource usage tracking

Layer 2: Pattern Memory
├── What works (success patterns)
├── What fails (failure patterns)
└── When to use each approach

Layer 3: Knowledge Memory
├── Feature engineering techniques
├── Model configurations
├── Hyperparameter knowledge
└── Domain-specific features

Layer 4: Strategic Memory
├── Auto-evolution rules
├── Resource management rules
├── Exploration-exploitation balance
└── Competition-specific strategies
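The four layers above can be sketched as plain data structures. This is a hypothetical illustration of how the layered state might be organized (the class and field names are assumptions, not the skill's real API); the skill itself persists this state in memory.json.

```python
from dataclasses import dataclass, field

@dataclass
class EpisodicMemory:
    """Layer 1: raw per-experiment logs."""
    experiments: list = field(default_factory=list)  # dicts: phase, cv, lb, features, params

@dataclass
class PatternMemory:
    """Layer 2: distilled success/failure patterns."""
    success_patterns: list = field(default_factory=list)
    failure_patterns: list = field(default_factory=list)

@dataclass
class KnowledgeMemory:
    """Layer 3: reusable techniques and model configurations."""
    feature_techniques: dict = field(default_factory=dict)
    model_configs: dict = field(default_factory=dict)

@dataclass
class StrategicMemory:
    """Layer 4: high-level rules that steer evolution."""
    evolution_rules: list = field(default_factory=list)
    resource_limits: dict = field(default_factory=dict)

@dataclass
class HCCMemory:
    """All four layers bundled into one persistable state object."""
    episodic: EpisodicMemory = field(default_factory=EpisodicMemory)
    pattern: PatternMemory = field(default_factory=PatternMemory)
    knowledge: KnowledgeMemory = field(default_factory=KnowledgeMemory)
    strategic: StrategicMemory = field(default_factory=StrategicMemory)

memory = HCCMemory()
memory.episodic.experiments.append({"phase": 1, "cv": 0.9540, "lb": 0.95347})
```

Lower layers hold raw observations; higher layers hold increasingly distilled rules, which is why a phase can consult Strategic Memory without replaying every experiment log.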

Proven Techniques (from real competitions)

Feature Engineering

| Technique              | Effect      | Best For                  |
|------------------------|-------------|---------------------------|
| Target Statistics      | +0.00018 LB | All tabular data          |
| Frequency Encoding     | +0.00005 LB | High-cardinality features |
| Smooth Target Encoding | +0.00003 LB | Prevent overfitting       |
| Medical Indicators     | +0.00006 CV | Health data               |
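Two of the techniques in the table can be sketched in a few lines. This is a minimal illustration with made-up rows and column names ("city", "target"), not the skill's actual feature pipeline.

```python
from collections import Counter

# Toy dataset: one categorical column and a binary target.
rows = [
    {"city": "A", "target": 1}, {"city": "A", "target": 0},
    {"city": "A", "target": 1}, {"city": "B", "target": 0},
]

# Frequency encoding: replace a category with its relative occurrence count.
freq = Counter(r["city"] for r in rows)
for r in rows:
    r["city_freq"] = freq[r["city"]] / len(rows)

# Smoothed target encoding: blend the per-category target mean with the
# global mean so rare categories are pulled toward it (less overfitting).
global_mean = sum(r["target"] for r in rows) / len(rows)
m = 10  # smoothing strength; larger means rare categories stay closer to global mean
sums, counts = Counter(), Counter()
for r in rows:
    sums[r["city"]] += r["target"]
    counts[r["city"]] += 1
for r in rows:
    c = r["city"]
    r["city_te"] = (sums[c] + m * global_mean) / (counts[c] + m)
```

Note the smoothing term is what separates "Smooth Target Encoding" from raw target statistics: with only one "B" row, the encoding lands near the global mean instead of a hard 0.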

Model Configurations

| Model    | Best Params                             | Weight |
|----------|-----------------------------------------|--------|
| CatBoost | iter=1000-1200, lr=0.04-0.05, depth=6-7 | 50%    |
| XGBoost  | n_est=1000-1200, lr=0.04, max_depth=6   | 25-30% |
| LightGBM | n_est=1000-1200, lr=0.04, leaves=40     | 20-25% |
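The weight column implies a weighted blend of the three models' predictions. Below is a minimal sketch of that blend with hard-coded placeholder predictions; the dict keys and values are illustrative stand-ins, not real model outputs.

```python
# Ensemble weights from the table (XGBoost/LightGBM picked from their ranges
# so the weights sum to 1.0).
weights = {"catboost": 0.50, "xgboost": 0.28, "lightgbm": 0.22}
assert abs(sum(weights.values()) - 1.0) < 1e-9

# Placeholder per-model predictions for three rows (stand-ins for real
# out-of-fold or test predictions).
preds = {
    "catboost": [0.91, 0.12, 0.55],
    "xgboost":  [0.89, 0.15, 0.50],
    "lightgbm": [0.93, 0.10, 0.58],
}

n_rows = len(preds["catboost"])
# Weighted average across models, row by row.
blend = [
    sum(weights[m] * preds[m][i] for m in weights)
    for i in range(n_rows)
]
```

Giving CatBoost the largest weight matches the "CatBoost is king" learning below: the blend leans on the consistently strongest model while the other two add diversity.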

Resource Limits

  • Features: < 60 (avoids timeout)
  • Iterations: < 1200 (avoids SIGKILL)
  • Training time: < 20 min (system limit)
  • Submissions: 10/day (Kaggle quota)
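A pre-flight guard over these limits can be sketched as below. The function name and return shape are hypothetical; the numeric limits are the ones listed above.

```python
# Limits from the list above.
LIMITS = {
    "max_features": 60,
    "max_iterations": 1200,
    "max_train_minutes": 20,
    "max_daily_submissions": 10,
}

def check_resources(n_features, n_iterations, est_minutes, submissions_today):
    """Return a list of limit violations; empty list means the run may proceed."""
    violations = []
    if n_features >= LIMITS["max_features"]:
        violations.append("too many features (timeout risk)")
    if n_iterations >= LIMITS["max_iterations"]:
        violations.append("too many iterations (SIGKILL risk)")
    if est_minutes >= LIMITS["max_train_minutes"]:
        violations.append("training time over system limit")
    if submissions_today >= LIMITS["max_daily_submissions"]:
        violations.append("Kaggle daily submission quota exhausted")
    return violations
```

Running the check before training (rather than catching a SIGKILL after 20 minutes) is what makes the agent "resource-aware": a violated limit skips the phase instead of wasting the budget.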

Evolution Rules

# Auto-evolution decision tree
if phase_improved:
    keep_features()
    try_similar_approach()
elif phase_degraded > 0.0001:
    rollback()
    try_new_direction()
else:
    fine_tune_params()

# Overfitting detection
if cv_lb_gap > 0.002:
    increase_regularization()
    reduce_features()
    simplify_model()
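The two decision trees above can be folded into one runnable function. This is a sketch under the stated thresholds; the action names are illustrative strings, and `next_action` is not part of the skill's documented API.

```python
DEGRADE_EPS = 0.0001  # LB drop beyond this triggers rollback
OVERFIT_GAP = 0.002   # CV-vs-LB gap beyond this triggers regularization

def next_action(lb_delta, cv_lb_gap):
    """Map the latest phase's LB delta and CV/LB gap to planned actions."""
    actions = []
    # Overfitting check runs regardless of whether the phase improved.
    if cv_lb_gap > OVERFIT_GAP:
        actions += ["increase_regularization", "reduce_features", "simplify_model"]
    # Auto-evolution decision tree.
    if lb_delta > 0:
        actions += ["keep_features", "try_similar_approach"]
    elif lb_delta < -DEGRADE_EPS:
        actions += ["rollback", "try_new_direction"]
    else:
        actions += ["fine_tune_params"]
    return actions
```

For example, a phase that lost 0.0005 LB while showing a 0.003 CV/LB gap would both roll back and tighten regularization before the next attempt.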

Files Structure

ml-evolution-agent/
├── SKILL.md              # This file
├── HCC_MEMORY.md         # Memory architecture details
├── FEATURE_ENGINEERING.md # Feature techniques library
├── MODEL_CONFIGS.md      # Optimal model configurations
├── EVOLUTION_RULES.md    # Auto-evolution decision rules
└── templates/
    ├── train_baseline.py # Baseline training script
    ├── train_evolved.py  # Evolution training script
    └── memory.json       # Example memory state

Example Results

Playground S6E2 (Feb 2026)

  • Started: LB 0.95347
  • Best: LB 0.95365 (+0.00018)
  • Phases: 14
  • Success rate: 36%
  • Target beaten: Yes (0.95361 → 0.95365)

Key Learnings

  1. Simple > Complex - Target stats beat complex feature engineering
  2. Resource limits matter - Too many features = timeout
  3. CatBoost is king - Consistently best for tabular data
  4. Daily quota awareness - Kaggle limits submissions

Installation

clawhub install ml-evolution-agent

Built from real competition experience. Evolved through 14 phases of experimentation.

