# Test-Driven Development

## The Essence
TDD is a design workflow, not a testing technique. Writing a test is an interface design act — you decide how a behavior should be called. Making it pass is a learning act — you discover the simplest implementation. Refactoring is an implementation design act — you improve internal structure.
Every behavior is born from this cycle:
Describe the behavior in a test → Make it real → Clean up
A test that errors on import is not a failing test. A cycle that stops at RED is not a cycle.
## Workflow Overview
- Detect project context (test framework, conventions)
- Confirm intent with user (strict TDD vs legacy mode)
- Test List — enumerate behavioral scenarios (alive, evolves during coding)
- Cycle — for each item: Write Test → Make Pass → Refactor → Update List
- Verify test quality and isolation
## Step 0: Detect Project Context

Run `scripts/detect_test_env.sh` from the project root. If the script is unavailable, manually check:

- Test framework (Jest, Vitest, pytest, Go test, cargo test, etc.)
- Test file pattern (`.test.ts`, `.spec.ts`, `_test.go`, `test_*.py`)
- Test execution command (`package.json` scripts, `Makefile`, etc.)
- Existing test directory structure

Adapt all subsequent commands to the detected framework. Never assume `npm test`.
## Step 1: Confirm User Intent

Strict TDD (default for new features/bug fixes):

- Write failing test first, then implement

Legacy mode (existing code without tests):

- See `references/legacy-mode.md`

Not applicable — skip TDD for:

- Configuration files, auto-generated code, declarative CSS, throwaway prototypes
## Step 2: Test List (Dynamic)

Create a list of behaviors this change needs to support. This is behavioral analysis.

GOOD (behaviors):

- adds two positive numbers
- returns 0 for 0 + 0
- handles negative results
- rejects non-numeric input

BAD (implementation steps):

- create Calculator class
- implement add() method
- add validation logic
- handle edge cases

Rules:

- Write entries in plain language, not code
- Each entry describes ONE observable behavior
- Order from simplest/most central to complex/edge-case
- Share with user, then start coding — do NOT wait for exhaustive approval
- This list is ALIVE — add, remove, reorder items as you learn from each cycle
- See `references/test-case-derivation.md` for systematic discovery techniques
## Step 3: TDD Cycles
One cycle = one behavior. A cycle is NOT complete until GREEN.
Pick one item from the test list. Execute this cycle:
DO NOT write all tests first, then all implementation.
WRONG (horizontal): test1, test2, test3 → impl1, impl2, impl3
RIGHT (vertical): test1→impl1 → test2→impl2 → test3→impl3
### WRITE THE TEST (Interface Design Happens Here)
Write a test for the chosen behavior. As you write, you are designing the interface:
- Function name, parameters, return type, error format
- The test IS the first client of the API — design for the caller
Use Arrange-Act-Assert. Your assertion must express a CONCRETE expected value. Never compute the expected value with the same logic you plan to implement.
See `references/test-quality.md` for good/bad test patterns.
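As a concrete illustration, here is a minimal pytest-style sketch of the Arrange-Act-Assert shape, using the hypothetical `add()` example that runs through this document. The inline definition stands in for the real module import only so the snippet is self-contained.

```python
# Minimal Arrange-Act-Assert sketch (pytest style) for the hypothetical
# add() behavior. The inline definition replaces a real
# `from calculator import add` purely so this snippet runs on its own.
def add(a, b):
    return a + b

def test_adds_two_positive_numbers():
    # Arrange: concrete, hand-chosen inputs
    a, b = 2, 3
    # Act: call through the public interface being designed
    result = add(a, b)
    # Assert: a literal expected value, never recomputed with the
    # same logic the implementation will use
    assert result == 5
```

Note how the test name reads as a behavior specification, and how writing the call site first forces a decision about the function's name, parameters, and return type.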
### MAKE THE TEST RUNNABLE (This Is Not RED Yet)

Before the test can fail meaningfully, it must RUN. Create scaffolding:

```python
# Python: create calculator.py
def add(a, b):
    pass
```

```typescript
// TypeScript: create calculator.ts
export function add(a: number, b: number): number {
  return undefined as any;
}
```

```go
// Go: create calculator.go
package calculator

func Add(a, b int) int {
	return 0
}
```

These stubs are NOT production code. They are scaffolding so the test runner can execute your test and reach the assertion.
### RED — Confirm the Test Fails for the Right Reason
Run the test. Classify the result:
VALID RED — assertion fails with wrong value:
✗ Expected 5 but received 0
✗ Expected "confirmed" but received undefined
✗ Expected function to throw but it did not
→ Proceed to GREEN.
INVALID — infrastructure error (test never reached the assertion):
✗ Cannot find module './calculator'
✗ TypeError: add is not a function
✗ SyntaxError: Unexpected token
→ Fix scaffolding (create file, add stub). Re-run. Loop until you get a VALID RED.
INVALID — test passes immediately: → Test is wrong. It tests existing behavior or has weak assertions. Rewrite.
The rule: your assertion line must EXECUTE and FAIL.
### GREEN — Make It Pass with Minimal Code
Write just enough code to make THIS test pass. All previous tests must also pass.
Three strategies (choose based on confidence):

1. Fake It (default when unsure) — return a hardcoded value. Test: `expect(add(2, 3)).toBe(5)` → Code: `return 5;` (literally this). The NEXT test will force generalization.
2. Triangulation — when 2+ tests demand different hardcoded values, NOW generalize. Not before. This is how TDD drives you from specific to general.
3. Obvious Implementation — if the correct general solution is immediately clear AND trivially simple, write it. If you hesitate, Fake It instead.

No speculative features (YAGNI). No refactoring yet.
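The Fake It → Triangulation progression can be sketched with the same hypothetical `add()` example. The first cycle's hardcoded constant is shown in comments; the second test is what forces the general form.

```python
# Cycle 1 (Fake It): with only test_adds_2_and_3, this sufficed:
#     def add(a, b):
#         return 5
#
# Cycle 2 (Triangulation): a second test demands a different value,
# so the hardcoded constant can no longer survive and the code
# generalizes — no sooner than the tests require.
def add(a, b):
    return a + b  # forced by two tests wanting different results

def test_adds_2_and_3():
    assert add(2, 3) == 5

def test_adds_10_and_4():
    assert add(10, 4) == 14  # the test that forced generalization
```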
### REFACTOR (Only When Green)
All tests pass. Now improve the code:
- Remove duplication (but duplication is a hint, not a command)
- Improve names, extract helpers, simplify structure
- Run tests after EVERY change — stay GREEN
- Never add behavior during refactor (new return value or exception = new behavior = new test first)
- See `references/design-and-refactoring.md`
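A behavior-preserving refactor might look like the following sketch (a hypothetical `receipt_line` example, not from this document): a helper is extracted and names improve, while every observable result stays identical, so the suite stays GREEN.

```python
# Before the refactor: formatting logic buried inline.
def receipt_line_before(name, cents):
    return f"{name}: ${cents // 100}.{cents % 100:02d}"

# After the refactor: helper extracted, clearer structure, SAME behavior.
def format_dollars(cents):
    return f"${cents // 100}.{cents % 100:02d}"

def receipt_line(name, cents):
    return f"{name}: {format_dollars(cents)}"

def test_refactor_preserved_behavior():
    # Rerun after the change: same inputs, same outputs, still GREEN.
    assert receipt_line("tea", 350) == "tea: $3.50"
    assert receipt_line("tea", 350) == receipt_line_before("tea", 350)
```

If the "after" version had started returning a different string, or raising, that would be new behavior, and by the rule above it would need a new failing test first.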
### UPDATE TEST LIST AND REPEAT
After each cycle:
- Did you discover a new case? Add it to the list.
- Is an item no longer relevant? Remove it.
- Pick the next item and repeat until the list is empty.
## Mocking Rules
Mock ONLY at system boundaries: external APIs, databases (prefer test DB), time, randomness.
Never mock your own classes or internal collaborators.
See `references/mocking-guidelines.md`.
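For example, time is a system boundary: inject the clock instead of mocking internal collaborators. A minimal sketch (hypothetical `greeting` function, plain dependency injection, no mock library needed):

```python
from datetime import datetime, timezone

def greeting(now=None):
    # The clock is a system boundary; injecting it keeps tests
    # deterministic without mocking any internal collaborator.
    current = now or datetime.now(timezone.utc)
    return "good morning" if current.hour < 12 else "good afternoon"

def test_greeting_in_the_morning():
    fixed = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    assert greeting(now=fixed) == "good morning"

def test_greeting_in_the_afternoon():
    fixed = datetime(2024, 1, 1, 15, 0, tzinfo=timezone.utc)
    assert greeting(now=fixed) == "good afternoon"
```

Production code simply omits the argument and gets the real clock; only the boundary is substituted, never the code under test.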
## Per-Cycle Checklist (all must be true before reporting to user)

- [ ] Test describes behavior, not implementation
- [ ] Test uses public interface only
- [ ] Assertion executed and failed with WRONG VALUE (not import/type error)
- [ ] Wrote minimal code to make test pass (Fake It / Triangulation / Obvious)
- [ ] ALL tests pass (including pre-existing)
- [ ] No speculative features added
- [ ] Reported result AFTER GREEN, not after RED
## Completion Checklist

- [ ] Every behavior has a test that was seen failing (assertion failure) first
- [ ] Edge cases and error paths covered
- [ ] All tests pass with clean output
- [ ] Tests run independently (no order dependency)
- [ ] Test names read as behavior specifications
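Order independence usually comes down to fresh state per test. A sketch with a hypothetical `Cart`: each test builds its own instance instead of mutating shared module-level state, so the tests pass in any order.

```python
class Cart:
    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)

def make_cart():
    # Fresh fixture per test: no shared mutable state, no order dependency.
    return Cart()

def test_new_cart_is_empty():
    assert make_cart().items == []

def test_adding_an_item_stores_it():
    cart = make_cart()
    cart.add("apple")
    assert cart.items == ["apple"]
```

In pytest this per-test construction is typically expressed as a fixture; the principle is the same: nothing a test observes should depend on which tests ran before it.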
## When Stuck
| Problem | Solution |
|---|---|
| Don't know how to test | Write the API you wish existed. Assert first. Ask user. |
| Test too complicated | Design too coupled. Simplify the interface. |
| Must mock everything | Code too coupled. Use dependency injection. |
| Test passes immediately | Strengthen assertions. Verify it tests NEW behavior. |
| Import error on first run | Create stub file/function first, then re-run. |
| Tempted to skip TDD | See references/discipline.md |
## Resources

- `references/test-quality.md` — Good vs bad tests, naming, AAA pattern
- `references/test-case-derivation.md` — Systematic test case discovery
- `references/mocking-guidelines.md` — When/how to mock, test doubles
- `references/design-and-refactoring.md` — Interface design, deep modules, refactoring
- `references/discipline.md` — Common rationalizations, red flags
- `references/legacy-mode.md` — Adding tests to existing code
- `scripts/detect_test_env.sh` — Auto-detect test framework and conventions