Ralph TDD Loop

Naming: The skill and the script are both called ralph-tdd (the capability); the script file is ralph-tdd.sh. Ralph is designed to run AFK (away from keyboard).

Ralph runs AI coding agents in an AFK loop. The agent picks tasks from a backlog, implements with TDD, verifies test quality with mutation testing, and commits. You come back to working code.

TDD: Use the mattpocock/skills/tdd skill for red-green-refactor and vertical slicing (one test → one impl). Install with: npx skills add mattpocock/skills@tdd. Ralph adds the backlog loop and the mutation gate on top.

Architecture

┌──────────────────────────────────────────────────────┐
│  RALPH OUTER LOOP (per task)                         │
│                                                      │
│  1. Read .ralph/progress.md + .ralph/lessons.md      │
│  2. Read backlog (Linear, GitHub Issues, PRD, etc.)  │
│  3. Pick highest-priority unfinished task            │
│  4. TDD red-green-refactor (see ref below)           │
│  5. Run feedback loops (types, lint, tests)          │
│  6. Verify: "Would a staff engineer approve this?"   │
│                                                      │
│  ┌────────────────────────────────────────────────┐  │
│  │  MUTATION QUALITY GATE (see ref below)         │  │
│  │  7. Run incremental mutation testing           │  │
│  │  8. Kill survivors on touched files            │  │
│  │  9. Repeat until score >= 95%                  │  │
│  └────────────────────────────────────────────────┘  │
│                                                      │
│  10. Mark task done, append to .ralph/progress.md    │
│      Update .ralph/lessons.md if anything learned    │
│  11. Commit                                          │
└──────────────────────────────────────────────────────┘

Outer loop = Ralph picking tasks. Inner loop = mutation quality gate. The gate prevents "green but useless" tests — a constraint the AI can't cheat its way out of.
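The outer loop can be sketched in shell roughly as follows. This is a minimal illustration under stated assumptions, not the contents of ralph-tdd.sh: the function name `ralph_loop`, the prompt wording, and passing the agent command as trailing arguments are all hypothetical.

```shell
# Hypothetical sketch of the outer loop; the real ralph-tdd.sh may differ.

# ralph_loop ITERATIONS AGENT_CMD...
# Runs the agent command once per iteration, passing the prompt as the
# final argument. Stops early if the agent exits non-zero.
ralph_loop() {
  local iterations="$1"; shift
  local prompt="Read .ralph/progress.md and .ralph/lessons.md, pick the highest-priority unfinished task, implement it with TDD, pass the mutation gate, update the progress file, then commit."
  local i=1
  while [ "$i" -le "$iterations" ]; do
    echo "== Ralph iteration $i/$iterations =="
    # Each iteration is a fresh agent run; state carries over through
    # the .ralph/ files and the git history, not through the process.
    if ! "$@" "$prompt"; then
      echo "agent exited non-zero on iteration $i; stopping" >&2
      return 1
    fi
    i=$((i + 1))
  done
}
```

A call might look like `ralph_loop 5 claude -p --dangerously-skip-permissions`, which appends the prompt to the command listed for the Claude Code runtime.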

Mutation quality gate (steps 7–9)

After tests pass: run npm run test:mutate:incremental (or project equivalent). For each surviving mutant on files you changed, write a test that would fail with the mutation, then re-run until mutation score ≥ 95% on those files. Full workflow and setup: use the mutation-testing skill (this repo; install with Ralph stack).
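As a rough sketch of the gate logic, the helpers below compare a score against the 95% threshold and retry up to a fixed number of rounds. The function names and the round limit are hypothetical, and the sketch assumes some command prints the mutation score for the touched files as a plain number; how that number is extracted from the mutation tool's report is not covered here.

```shell
# Hypothetical helpers; names and the retry bound are illustrative.

# score_passes SCORE [THRESHOLD]
# Exit status 0 when SCORE >= THRESHOLD (default 95).
# awk is used so fractional scores such as 94.87 compare correctly.
score_passes() {
  awk -v s="$1" -v t="${2:-95}" 'BEGIN { exit !(s + 0 >= t + 0) }'
}

# mutation_gate MAX_ROUNDS GET_SCORE_CMD...
# Re-runs GET_SCORE_CMD until the score passes or the rounds run out.
# GET_SCORE_CMD is assumed to print the score as a number on stdout.
mutation_gate() {
  local max_rounds="$1"; shift
  local round=1
  local score
  while [ "$round" -le "$max_rounds" ]; do
    score="$("$@")" || return 2
    if score_passes "$score"; then
      echo "mutation score $score: gate passed"
      return 0
    fi
    echo "mutation score $score is below threshold (round $round); kill survivors and retry" >&2
    round=$((round + 1))
  done
  return 1
}
```

Bounding the number of rounds is a design choice of this sketch: it keeps an unattended run from spinning forever on a mutant that cannot be killed.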

Reference guide

Everything except progress format comes from installed skills (install with Ralph stack). Project-specific commands: use package.json scripts and config (vitest.config, playwright.config).

| Topic | Use | Load when |
|---|---|---|
| TDD | mattpocock/skills@tdd | Red-green-refactor, vertical slices, good vs bad tests |
| Vitest | antfu/skills@vitest | Unit tests, Vitest API |
| Mutation testing | mutation-testing skill (this repo) | Stryker, survivors, setup |
| E2E | wshobson/agents@e2e-testing-patterns | E2E/Playwright patterns |
| AGENTS.md | create-agents-md skill (this repo) | Creating AGENTS.md when missing |
| Progress format | references/progress-format.md | Appending to .ralph/progress.md or .ralph/lessons.md (Ralph-specific) |

Pre-Flight Checklist

Before going AFK, gather all of this. Ask the user until every item is answered.

| # | Question | Default |
|---|---|---|
| 1 | Project name and working directory | — |
| 2 | Backlog source (Linear team, GitHub repo, local PRD file) | — |
| 3 | Tasks to skip or focus on? | Priority order |
| 4 | How many iterations? | 5 |
| 5 | Agent runtime (see Agent Runtimes) | Codex |
| 6 | Permission mode (see Permission Modes) | Full auto |
| 7 | Feedback commands: typecheck, lint, test, mutation | Auto-detect |
| 8 | Does AGENTS.md exist? If not, the Ralph script will prompt the agent to run the create-agents-md skill first. | — |
| 9 | Start fresh .ralph/progress.md or continue existing? | Fresh |
| 10 | Does .ralph/lessons.md exist? Create if not (persists across sprints). | — |
| 11 | Commit per task, or batch? | Per task |
| 12 | Branch: current or create new? | Current |
| 13 | Anything off-limits? | None |

After gathering answers, confirm back:

Ready to go AFK:

Project: [name] on branch [branch]
Backlog: [source] — [N] iterations, priority order
Agent: [runtime] with [permission mode]
Feedback: tsc → biome → vitest → stryker (incremental)
Commit after each task

Anything to change?

Only start after user confirms.
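For the "Auto-detect" default on feedback commands, one possible approach is to look for matching script names in package.json. The sketch below is an assumption about how detection could work, not how the Ralph script does it: the helper names are hypothetical, and the script names `typecheck`, `lint`, and `test` are guesses at common conventions (only `test:mutate:incremental` appears in this document).

```shell
# has_npm_script NAME [PACKAGE_JSON]
# Crude text match for a "NAME": key. It does not parse JSON, so a key
# with the same name outside "scripts" would also match.
has_npm_script() {
  grep -Eq "\"$1\"[[:space:]]*:" "${2:-package.json}" 2>/dev/null
}

# detect_feedback_cmds [PACKAGE_JSON]
# Prints one runnable command per line, in the order
# typecheck, lint, test, mutation. Missing scripts are skipped.
detect_feedback_cmds() {
  local name
  for name in typecheck lint test test:mutate:incremental; do
    if has_npm_script "$name" "${1:-package.json}"; then
      echo "npm run $name"
    fi
  done
}
```

Anything the sketch fails to detect should still be confirmed with the user before the loop starts.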

Agent Runtimes

The Ralph TDD script supports multiple agent CLIs. Set AGENT_CMD in the script.

| Runtime | Command | Notes |
|---|---|---|
| Codex (default) | codex --approval-mode full-auto -q | OpenAI Codex CLI. -q for quiet/non-interactive. |
| Claude Code | claude -p --dangerously-skip-permissions | Full auto. Best for AFK. |
| Claude Code (semi) | claude -p --permission-mode acceptEdits | Allows edits, blocks shell. May stall AFK. |

For true AFK, use full-auto permission modes. Semi-auto modes may prompt for approval and stall the loop.
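One way to map a runtime name to the commands listed above is a small `case` lookup. This is a sketch only: the function name, the runtime keys (`codex`, `claude`, `claude-semi`), and the `RALPH_AGENT` variable are hypothetical, while the command strings are the ones given in this section.

```shell
# agent_cmd_for RUNTIME
# Prints the command line for the given runtime, or fails for unknown names.
agent_cmd_for() {
  case "$1" in
    codex)       echo "codex --approval-mode full-auto -q" ;;
    claude)      echo "claude -p --dangerously-skip-permissions" ;;
    claude-semi) echo "claude -p --permission-mode acceptEdits" ;;
    *)           echo "unknown agent runtime: $1" >&2; return 1 ;;
  esac
}

# RALPH_AGENT is a made-up variable name; Codex is the documented default.
AGENT_CMD="$(agent_cmd_for "${RALPH_AGENT:-codex}")"
```

Failing loudly on an unknown name is deliberate here: a typo in the runtime should stop the run before going AFK rather than silently fall back to a default.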

Permission Modes

| Mode | Claude Code Flag | Codex Flag | Risk | Best For |
|---|---|---|---|---|
| Full auto | --dangerously-skip-permissions | --approval-mode full-auto | Agent can run any command | Trusted repos, overnight runs |
| Accept edits | --permission-mode acceptEdits | --approval-mode auto-edit | Blocks on shell commands | Semi-trusted, may stall |
| Default | (none) | --approval-mode suggest | Blocks on everything | Not suitable for AFK |

Recommendation: Use full-auto for AFK. The mutation testing quality gate and test suite act as safety nets. If tests pass and mutations are killed, the code is likely correct regardless of what commands ran.

Setup

Run the Ralph TDD script

See scripts/ralph-tdd.sh and run it directly from the skills repo (no copy required).

Make executable:

chmod +x /Users/jonathanmumm/src/skills/ralph-tdd/scripts/ralph-tdd.sh

Run:

/Users/jonathanmumm/src/skills/ralph-tdd/scripts/ralph-tdd.sh \
  --project /abs/path/to/your-repo \
  --iterations 5

Optional:

# Use Claude runtime instead of Codex
/Users/jonathanmumm/src/skills/ralph-tdd/scripts/ralph-tdd.sh \
  --project /abs/path/to/your-repo \
  --iterations 5 \
  --agent claude

Typically run AFK.

Create .ralph/progress.md

Ralph scripts write progress and lessons under .ralph/ and ensure .ralph/ is in the project’s .gitignore so these files are not committed.
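The directory and .gitignore handling described above could look something like the following. It is a sketch with a hypothetical function name, not code taken from the Ralph script; the point it illustrates is that the step should be idempotent so repeated runs never add duplicate entries.

```shell
# ensure_ralph_dir PROJECT_DIR
# Creates .ralph/ and adds it to .gitignore exactly once.
ensure_ralph_dir() {
  local project="$1"
  local ignore="$project/.gitignore"
  mkdir -p "$project/.ralph"
  touch "$ignore"
  # -x matches the whole line, -F treats the pattern as a literal string.
  if ! grep -qxF '.ralph/' "$ignore"; then
    # If the file does not end in a newline, add one first so the new
    # entry does not get glued onto the last existing line.
    if [ -n "$(tail -c 1 "$ignore")" ]; then
      printf '\n' >> "$ignore"
    fi
    printf '%s\n' '.ralph/' >> "$ignore"
  fi
}
```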

# Progress

Agent working memory. Delete after sprint.

See references/progress-format.md for entry format and promise tags (COMPLETE, BLOCKED, DECIDE).

Create AGENTS.md (if missing)

The agent's onboarding doc — project description, tech stack, feedback commands, conventions, off-limits. If AGENTS.md doesn't exist, the Ralph script instructs the agent to run the create-agents-md skill (this repo) to create it from the template, then continue.
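The AGENTS.md check can be sketched as a function that decides which instruction goes at the top of the agent prompt. The function name and the wording of both messages are illustrative assumptions; only the fallback to the create-agents-md skill comes from this document.

```shell
# agents_md_preamble PROJECT_DIR
# Prints the instruction to place at the top of the agent prompt.
agents_md_preamble() {
  if [ -f "$1/AGENTS.md" ]; then
    echo "Read AGENTS.md before picking a task."
  else
    echo "AGENTS.md is missing. Run the create-agents-md skill to create it, then continue."
  fi
}
```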

Create .ralph/lessons.md

# Lessons

Patterns and rules learned during development. Review at the start of each iteration.

The agent updates this file after any failed approach, mistake, or course correction. Unlike .ralph/progress.md (what was done), .ralph/lessons.md captures what to avoid — it persists across iterations and prevents repeating the same class of mistake.

See references/progress-format.md for entry format and promise tags (COMPLETE, BLOCKED, DECIDE).
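The different lifetimes of the two files (progress is per sprint, lessons persist) can be made concrete with a small initializer. This is a sketch under assumptions: the function name and the `fresh`/`continue` argument are hypothetical, and the file headers are taken from the templates shown in this section.

```shell
# init_ralph_files PROJECT_DIR [fresh|continue]
init_ralph_files() {
  local dir="$1/.ralph"
  local mode="${2:-fresh}"
  mkdir -p "$dir"
  # Progress is per-sprint working memory: overwrite it when starting fresh,
  # and create it if it does not exist yet.
  if [ "$mode" = "fresh" ] || [ ! -f "$dir/progress.md" ]; then
    printf '# Progress\n\nAgent working memory. Delete after sprint.\n' > "$dir/progress.md"
  fi
  # Lessons persist across sprints: never overwrite an existing file.
  if [ ! -f "$dir/lessons.md" ]; then
    printf '# Lessons\n\nPatterns and rules learned during development. Review at the start of each iteration.\n' > "$dir/lessons.md"
  fi
}
```

Calling `init_ralph_files /abs/path/to/your-repo continue` would keep an existing progress file, matching the "continue existing" answer in the pre-flight checklist.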

Task Prioritization

1. Architectural decisions: these cascade through the entire codebase
2. Integration points: these reveal incompatibilities early
3. Unknowns / spikes: fail fast
4. Features: implementation work
5. Polish: save for last
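The ordering above can be expressed as a rank lookup plus a stable sort. The sketch below assumes each task is tagged with one of five category keys (`architecture`, `integration`, `spike`, `feature`, `polish`); those keys, the tab-separated input format, and the function names are hypothetical.

```shell
# priority_rank CATEGORY
# Prints 1 (do first) through 5 (do last); unknown categories sort last.
priority_rank() {
  case "$1" in
    architecture) echo 1 ;;
    integration)  echo 2 ;;
    spike)        echo 3 ;;
    feature)      echo 4 ;;
    polish)       echo 5 ;;
    *)            echo 9 ;;
  esac
}

# sort_tasks
# Reads "category<TAB>title" lines on stdin and prints them in priority
# order. The sort is stable, so tasks in the same category keep their
# original backlog order.
sort_tasks() {
  local tab
  tab="$(printf '\t')"
  while IFS="$tab" read -r category title; do
    printf '%s\t%s\t%s\n' "$(priority_rank "$category")" "$category" "$title"
  done | sort -s -t "$tab" -k1,1n | cut -f2-
}
```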

Task Sources & Work Tracking

Use Linear for tracking work when the backlog is a Linear team: mark the current task in-progress when starting, and mark it done when the task is complete (before committing). Use Linear MCP or linear CLI. Same idea for GitHub Issues or a local PRD — update status so progress is visible.

| Source | How |
|---|---|
| Linear | MCP or CLI. Mark issue in-progress → implement → mark done. Preferred when available. |
| GitHub Issues | gh issue list, gh issue close (or update labels/state) |
| PRD file | Local prd.md with checklist; tick off items as done |
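For the local PRD option, picking and ticking off checklist items might look like this. The sketch assumes the checklist uses Markdown task-list syntax (`- [ ]` and `- [x]`), which this document does not specify, and the function names are hypothetical.

```shell
# next_prd_task PRD_FILE
# Prints the text of the first unchecked item; fails when none remain.
next_prd_task() {
  local task
  task="$(sed -n 's/^[[:space:]]*- \[ ] //p' "$1" | head -n 1)"
  [ -n "$task" ] || return 1
  printf '%s\n' "$task"
}

# mark_prd_task_done PRD_FILE TASK
# Ticks off the first unchecked item whose text equals TASK exactly.
# awk compares plain strings, so TASK needs no regex escaping (a TASK
# containing backslashes would still be altered by awk -v).
mark_prd_task_done() {
  local file="$1" task="$2" tmp
  tmp="$(mktemp)"
  awk -v task="$task" '
    { pos = index($0, "- [ ] ") }
    !done && pos > 0 && substr($0, pos + 6) == task {
      sub(/- \[ \] /, "- [x] ")
      done = 1
    }
    { print }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

In a loop, the agent would call `next_prd_task prd.md` at the start of an iteration and `mark_prd_task_done prd.md "$task"` once the task is complete, before committing.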

Optional: Critical work before backlog

Some setups (e.g. pro-ralph) use a STEERING.md (or similar) file that the agent must complete before picking backlog tasks: one-time env fixes, install deps, install Playwright browsers, start dev server, etc. You can add a step in your prompt: "Check .agent/STEERING.md (or PROJECT_ROOT/STEERING.md); complete items in sequence and remove when done. Only then proceed to the backlog." This avoids burning iterations on broken env.
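A loop script could check for pending steering items before adding that step to the prompt. The helper below is a sketch with a hypothetical name; it treats a non-empty file as "items remain", which assumes the agent deletes or empties the file when it finishes.

```shell
# steering_file PROJECT_ROOT
# Prints the path of the first steering file that still has content,
# checking .agent/STEERING.md before PROJECT_ROOT/STEERING.md.
# Fails when neither file exists or both are empty.
steering_file() {
  local candidate
  for candidate in "$1/.agent/STEERING.md" "$1/STEERING.md"; do
    # -s is true when the file exists and is not empty.
    if [ -s "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```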

Alternative Loop Types

Same Ralph pattern works for non-feature work:

| Loop | Focus |
|---|---|
| Mutation Score | Kill surviving mutants across codebase |
| Test Coverage | Write tests for uncovered lines |
| Lint | Fix lint errors one at a time |
| Refactor | Code smells → extract, simplify |
