claw-sergeant

Train autonomous OpenClaw AI agents through LLM-guided curriculum design and multi-turn dialogue evaluation. Use this skill whenever the user wants to train, improve, or evaluate an OpenClaw agent's capabilities, design a training curriculum for an AI agent, run a training session with iterative feedback loops, or test an agent's readiness across specific skill areas. Also use when the user mentions "ClawSergeant", "agent training", "openclaw training", or wants to strengthen an AI agent's performance in areas like programming, writing, analysis, or communication.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

To install, copy the command below and send it to your AI assistant:

npx skills add myismyname/clawsergeant

ClawSergeant: Boosting OpenClaw Agents from AI Feedback

ClawSergeant trains OpenClaw agents through a structured, LLM-driven pipeline. A Trainer LLM designs curriculum, generates training tasks, and adapts its teaching dynamically based on the agent's responses. A separate Evaluator LLM objectively scores each response, creating a feedback loop that drives iterative improvement.

Architecture Overview

User Intent ──────────────────────→ LLM (Curriculum Designer)
                                          ↓
                                   Curriculum JSON (stages, tasks, criteria)
                                          ↓
Training Session Loop:
    Trainer LLM → crafts message → openclaw CLI → Claw Agent → reply
                                                      ↓
                                          Evaluator LLM → score + feedback
                                                      ↓
                              record to .claw_sergeant_accumulated_lessons/ ←──┘
                                          ↓
                                  (if failed) → Trainer LLM retries with feedback
                                          ↓
                                  (if stage passed) → stage summary for memory consolidation
                                          ↓
                    [Curriculum Pattern] → record to .claw_sergeant_accumulated_lessons/

Training Pipeline

Phase 1: Curriculum Design

The user's training intent is passed directly as input. The LLM generates a multi-stage curriculum as structured JSON based on this intent. The user reviews and approves the curriculum before training begins.

Each curriculum contains:

  • Title and overview of the training program
  • Target persona describing the ideal agent after training
  • 3–5 stages, each with:
    • Name, description, and learning objectives
    • 2–4 training tasks with scenario descriptions and expected behaviors
    • Evaluation criteria with passing standards
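A generated curriculum might look like the sketch below. Only `title`, `overview`, `target_persona`, and the `task_id` format are confirmed by the results schema later in this document; the remaining field names are illustrative assumptions.

```json
{
  "title": "Rigorous Programming Assistant",
  "overview": "...",
  "target_persona": "...",
  "stages": [
    {
      "name": "Code Review Fundamentals",
      "description": "...",
      "objectives": ["..."],
      "tasks": [
        {
          "task_id": "1.1",
          "scenario": "...",
          "expected_behaviors": ["..."]
        }
      ],
      "evaluation_criteria": ["..."]
    }
  ]
}
```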

Phase 2: Training Execution

For each stage and task, the system runs a dialogue loop:

  1. Trainer LLM generates a task message tailored to the agent (it never sees hardcoded prompts — everything is dynamically composed)
  2. Message is sent to the Claw Agent via openclaw agent CLI
  3. Agent's reply is captured and fed back to the Trainer's conversation context
  4. Evaluator LLM scores the reply (1–10) and reports strengths, weaknesses, and improvement suggestions
  5. If the task is not passed and retries remain, the Trainer generates a follow-up message incorporating the evaluation feedback
  6. After a stage passes, the agent receives a summary prompt to internalize lessons learned
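The per-task portion of this loop (steps 1–5) can be sketched as follows. The trainer, agent, and evaluator are injected as callables so the control flow is visible without a live LLM; the names and the passing bar are illustrative, not the actual ClawSergeant API.

```python
from dataclasses import dataclass

@dataclass
class Evaluation:
    score: int      # 1-10, as reported by the Evaluator LLM
    feedback: str   # improvement suggestions fed back to the Trainer

    @property
    def passed(self) -> bool:
        return self.score >= 7  # assumed passing bar for illustration

def run_task(trainer, agent, evaluator, task: str, max_attempts: int = 2) -> Evaluation:
    """Run one training task with retry-on-feedback, mirroring steps 1-5."""
    feedback = None
    evaluation = None
    for _ in range(max_attempts):
        message = trainer(task, feedback)    # step 1: compose task message
        reply = agent(message)               # steps 2-3: send, capture reply
        evaluation = evaluator(task, reply)  # step 4: score the reply
        if evaluation.passed:
            return evaluation
        feedback = evaluation.feedback       # step 5: retry with feedback
    return evaluation
```

With stub callables, a passing task returns after the first attempt, while a failing one exhausts `max_attempts` and returns the last evaluation for logging.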

Environment Setup

Create a .env file in the project root with:

LLM_API_KEY=<your-api-key>          # Required: API key for the LLM
LLM_BASE_URL=https://api.openai.com/v1  # Optional: OpenAI-compatible endpoint
LLM_MODEL=gpt-4o                    # Optional: model identifier
CLAW_RECIPIENT=+15555550123         # Required: target agent's address
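A minimal sketch of reading these variables at startup (the real project loads `.env` via python-dotenv first; the `load_settings` function name and dict shape are assumptions):

```python
import os

def load_settings() -> dict:
    """Read ClawSergeant settings from the environment, applying the
    documented defaults for the optional variables."""
    api_key = os.getenv("LLM_API_KEY")
    if not api_key:
        raise RuntimeError("LLM_API_KEY is required - set it in .env")
    return {
        "api_key": api_key,
        "base_url": os.getenv("LLM_BASE_URL", "https://api.openai.com/v1"),
        "model": os.getenv("LLM_MODEL", "gpt-4o"),
        "recipient": os.environ["CLAW_RECIPIENT"],  # required: raises KeyError if unset
    }
```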

Running the Training

Full Training Session

python main.py "An efficient, rigorous programming assistant"

The training intent is passed as a command-line argument. ClawSergeant designs a curriculum, presents it for approval, and runs the training session automatically. Results are saved to training_results.json.

Phase-by-Phase Testing

Use test_phases.py to verify each component independently before running a full session:

python test_phases.py 1    # Verify LLM API connectivity
python test_phases.py 2    # Test curriculum generation
python test_phases.py 3    # Test Claw agent communication
python test_phases.py 4    # Run a single-task training round
python test_phases.py all  # Run all phases sequentially

Always start with phase 1 to confirm the LLM connection works, then progress through subsequent phases.

Configuration

All training parameters are centralized in config.py:

| Parameter | Default | Purpose |
|---|---|---|
| STAGE_COUNT_MIN / MAX | 3 / 5 | Number of training stages |
| TASKS_PER_STAGE_MIN / MAX | 2 / 4 | Tasks per stage |
| CURRICULUM_TEMPERATURE | 0.4 | LLM temperature for curriculum design |
| TRAINER_TEMPERATURE | 0.7 | LLM temperature for training messages |
| EVALUATOR_TEMPERATURE | 0.2 | LLM temperature for evaluation (low = strict) |
| MAX_ATTEMPTS_PER_TASK | 2 | Retries per task before moving on |
| STAGE_PASS_THRESHOLD | 0.6 | Fraction of tasks needed to pass a stage |

Adjust STAGE_PASS_THRESHOLD higher (e.g., 0.8) for stricter training, or lower temperatures for more deterministic evaluations.
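Based on the table above, config.py plausibly amounts to a set of module-level constants (the grouping shown here is an assumption; the names and defaults come from the table):

```python
# Curriculum shape
STAGE_COUNT_MIN, STAGE_COUNT_MAX = 3, 5
TASKS_PER_STAGE_MIN, TASKS_PER_STAGE_MAX = 2, 4

# LLM temperatures per role
CURRICULUM_TEMPERATURE = 0.4  # fairly deterministic curriculum structure
TRAINER_TEMPERATURE = 0.7     # more varied training messages
EVALUATOR_TEMPERATURE = 0.2   # strict, repeatable scoring

# Loop control
MAX_ATTEMPTS_PER_TASK = 2
STAGE_PASS_THRESHOLD = 0.6    # e.g. 3 of 4 tasks (0.75) clears a 0.6 threshold
```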

Key Components

| File | Role |
|---|---|
| main.py | Entry point — orchestrates curriculum design → approval → training execution |
| trainer.py | Training session controller — manages dialogue loop and captures per-task/stage learnings |
| curriculum.py | Curriculum data model and LLM-based generation |
| claw_agent.py | Wraps openclaw agent CLI for agent communication |
| llm_handler.py | Async LLM client with conversation history management |
| learning_logger.py | Structured experience logger — records training insights and writes to OpenClaw MEMORY.md |
| config.py | Centralized training parameters |
| test_phases.py | Step-by-step pipeline verification |

Training Results

After a session completes, training_results.json contains:

{
  "curriculum": {
    "title": "...",
    "overview": "...",
    "target_persona": "...",
    "stages_total": 4,
    "stages_passed": 3
  },
  "stage_reports": [
    {
      "stage_id": 1,
      "stage_name": "...",
      "passed": true,
      "overall_feedback": "...",
      "tasks": [
        {
          "task_id": "1.1",
          "passed": true,
          "score": 8,
          "strengths": ["..."],
          "weaknesses": ["..."],
          "feedback": "..."
        }
      ]
    }
  ]
}

Experience Recording

Training experiences are automatically recorded throughout the session. Every task evaluation, stage result, and infrastructure error is logged to .claw_sergeant_accumulated_lessons/ as structured markdown entries for future reference.

After the session completes, a summary is written to ~/.openclaw/workspace/MEMORY.md containing the training timestamp, curriculum details, stage pass/fail results, and a pointer to the full logs. This allows the Claw agent to reference its training history in future sessions. If the OpenClaw workspace is not found, this step is silently skipped.

Troubleshooting

  • LLM connection fails: Run python test_phases.py 1 to verify API key and endpoint. Check LLM_BASE_URL points to a valid OpenAI-compatible API.
  • Claw agent timeout: The default timeout is 120 seconds. If the agent is slow to respond, check network connectivity and the openclaw CLI installation.
  • Curriculum has no stages: The LLM may have returned malformed JSON. Try lowering CURRICULUM_TEMPERATURE or switching to a more capable model.
  • All tasks fail: Review evaluation criteria — they may be too strict. Lower STAGE_PASS_THRESHOLD or increase MAX_ATTEMPTS_PER_TASK in config.py.

Dependencies

  • Python 3.11+
  • httpx — async HTTP client for LLM API calls
  • loguru — structured logging
  • python-dotenv — environment variable management
  • openclaw CLI — must be installed and accessible in PATH

