Harness Engineering Playbook
Use this skill to operationalize the practices from OpenAI's Harness Engineering guide in a repo that agents can run against repeatedly and safely.
What To Load
- Use references/openai-harness-practices.md for the full practice-to-artifact mapping.
- Use references/rollout-checklist.md for phased adoption in active repos.
- Use references/wizard-cli.md for Typer wizard command flows.
- Use assets/templates/ when creating or updating harness files.
Inputs
- Target repository path.
- Existing command surface (make, npm, cargo, pytest, etc.).
- Existing CI workflows and branch protections.
Workflow
- Baseline the repo and detect existing workflows.
- Bootstrap harness artifacts and templates.
- Apply all nine Harness Engineering practices.
- Run harness audit checks and repair gaps.
- Iterate after real agent runs.
Step 1: Baseline The Repo
- Identify language/toolchain and canonical entrypoints.
- Inventory existing checks, scripts, and CI jobs.
- Record current pain points for agent runs: setup drift, unclear docs, flaky tests, missing trace IDs, slow loops.

Use a short baseline note inside PLANS.md so decisions remain durable.
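
The inventory step can be partly automated. The following is a minimal sketch, not part of the skill's scripts: the marker-to-tool mapping and both function names are hypothetical, and treating pyproject.toml as a sign of pytest is only an assumption.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical mapping from a marker file at the repo root to the command
# surface it usually implies. Adjust to the repo being baselined.
MARKERS = {
    "Makefile": "make",
    "package.json": "npm",
    "Cargo.toml": "cargo",
    "pyproject.toml": "pytest",  # assumption: Python projects here use pytest
}


def detect_command_surface(repo: Path) -> list[str]:
    """Return the tools whose marker files exist at the repo root."""
    return sorted(tool for marker, tool in MARKERS.items() if (repo / marker).is_file())


def detect_ci_workflows(repo: Path) -> list[str]:
    """List GitHub Actions workflow file names, or an empty list if none exist."""
    workflows = repo / ".github" / "workflows"
    if not workflows.is_dir():
        return []
    return sorted(p.name for p in workflows.iterdir() if p.suffix in {".yml", ".yaml"})
```

The results of both calls are the kind of facts worth pasting into the baseline note in PLANS.md.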
Step 2: Bootstrap Harness Artifacts
Preferred entrypoint:
python3 scripts/harness_wizard.py init <repo-path> --profile control
Profiles:
- baseline: only core harness artifacts.
- control: baseline + control-system primitives.
- full: control + entropy controls (nightly audit + entropy checks).
Direct shell fallback:
./scripts/bootstrap_harness.sh <repo-path>
This script installs safe defaults from assets/templates/:
- AGENTS.md
- PLANS.md
- docs/ARCHITECTURE.md
- docs/OBSERVABILITY.md
- Makefile.harness (plus an -include Makefile.harness line in Makefile)
- scripts/audit_harness.sh
- scripts/harness/{smoke,test,lint,typecheck}.sh
- .github/workflows/harness.yml

By default, existing files are not overwritten. Pass --force to replace template-managed files.
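
The no-overwrite default can be illustrated with a short sketch. This is not the code of bootstrap_harness.sh; install_templates is a hypothetical helper, and unlike the real script it makes no attempt to distinguish template-managed files from user-authored ones.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def install_templates(template_dir: Path, repo: Path, force: bool = False) -> dict[str, list[str]]:
    """Copy every file under template_dir into repo, preserving relative paths.

    Files that already exist in repo are skipped unless force is True.
    """
    result: dict[str, list[str]] = {"installed": [], "skipped": []}
    for src in sorted(p for p in template_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(template_dir)
        dest = repo / rel
        if dest.exists() and not force:
            # Leave the existing file untouched and report it.
            result["skipped"].append(rel.as_posix())
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
        result["installed"].append(rel.as_posix())
    return result
```

Returning the skipped paths makes it easy to show which files would need explicit approval before a forced run.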
Step 3: Apply The Nine Practices
Implement each practice directly in repo artifacts.
- Make Easy To Do Hard Thing
  - Ensure hard, high-value tasks are one command away (make smoke, make check, make ci).
  - Keep setup and cleanup scripted.
  - Make smoke checks cheap enough for frequent use.
- Communicate Actionable Constraints With Compact Docs
  - Keep AGENTS.md short, concrete, and command-first.
  - Document non-obvious constraints and guardrails.
  - Keep docs close to code and update them when behavior changes.
- Structure Codebase With Strict Boundaries And Flow
  - Define module boundaries in docs/ARCHITECTURE.md.
  - Parse and validate data at boundaries; use typed contracts for internal flow.
  - Prefer one abstraction per module and one clear ownership path.
- Build Observability In From Day 1
  - Emit structured logs/events with correlation IDs.
  - Capture key transitions in long-running workflows.
  - Define minimum observable fields in docs/OBSERVABILITY.md.
- Optimize For Agent Flow, Not Human Flow
  - Treat context as a first-class system dependency.
  - Use PLANS.md for multi-step/multi-hour tasks.
  - Front-load durable context (scope, constraints, checkpoints) so restarts stay cheap.
- Bring Your Own Harness
  - Standardize repo-local wrappers (Makefile.harness, scripts/harness/).
  - Wrap local infra actions in deterministic scripts.
  - Make agent behavior reproducible across machines and runs.
- Prototype In Natural Language First
  - Draft logic and tests in prose before coding.
  - Review edge cases in prose and lock acceptance criteria.
  - Translate approved prose into code and tests.
- Invest In Static Analysis And Linting
  - Pin formatter/linter/typechecker versions where practical.
  - Enforce checks in both local workflow and CI.
  - Run static checks before long tests to shorten failure loops.
- Manage Entropy
  - Add periodic audits for docs drift, flaky checks, and dead scripts.
  - Keep templates synchronized with real workflows.
  - Remove stale abstractions quickly to keep agent context clean.
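
As an illustration of "parse and validate data at boundaries; use typed contracts for internal flow", here is a minimal sketch. RunRequest and parse_run_request are hypothetical names; only the three profile names come from this playbook, and defaulting to control is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

# Profile names as listed in Step 2.
VALID_PROFILES = {"baseline", "control", "full"}


@dataclass(frozen=True)
class RunRequest:
    """Typed contract passed around internally once input has been validated."""

    repo_path: str
    profile: str


def parse_run_request(raw: dict) -> RunRequest:
    """Validate untrusted input once, at the boundary; raise ValueError on bad data."""
    repo_path = raw.get("repo_path")
    if not isinstance(repo_path, str) or not repo_path:
        raise ValueError("repo_path must be a non-empty string")
    profile = raw.get("profile", "control")  # assumption: control is the default
    if not isinstance(profile, str) or profile not in VALID_PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    return RunRequest(repo_path=repo_path, profile=profile)
```

Code past the boundary accepts only RunRequest, so it never has to re-check raw dictionaries.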
For a detailed artifact matrix, load references/openai-harness-practices.md .
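
The observability practice (structured logs with correlation IDs) might look like the sketch below, which uses only the Python standard library. The field names level, event, and correlation_id are hypothetical; the authoritative list belongs in docs/OBSERVABILITY.md.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "event": record.getMessage(),
            # Set by the LoggerAdapter below; None if logged without it.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(payload, sort_keys=True)


def make_logger(name: str, stream) -> logging.LoggerAdapter:
    """Return a logger that stamps one fresh correlation ID on every record it emits."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output through the root logger
    return logging.LoggerAdapter(logger, {"correlation_id": uuid.uuid4().hex})
```

Creating one adapter per agent run and logging key transitions through it (for example log.info("run.started")) lets every line of that run be grouped by its correlation ID.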
Step 4: Validate
Run:
python3 scripts/harness_wizard.py audit <repo-path>
Treat any MISSING or FAIL result as blocking before calling harness setup complete.
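
A gate on the audit result could be sketched as follows. The audit's output format is not specified here, so this assumes one result per line with the literal tokens MISSING or FAIL; blocking_results is a hypothetical helper, not part of the wizard.

```python
from __future__ import annotations

import re

# Assumption: problems are marked with the standalone tokens MISSING or FAIL.
# The real audit output may differ.
BLOCKING = re.compile(r"\b(MISSING|FAIL)\b")


def blocking_results(audit_output: str) -> list[str]:
    """Return the audit lines that contain a MISSING or FAIL token."""
    return [line for line in audit_output.splitlines() if BLOCKING.search(line)]
```

An empty return value means nothing blocks completion; any other result lists exactly what to repair before re-running the audit.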
Step 5: Iterate On Real Runs
- Observe one full agent run from clean checkout to merged change.
- Patch harness gaps immediately.
- Re-run audit.
- Keep AGENTS.md, PLANS.md, and architecture docs aligned with current behavior.
Adaptation Rules
- Preserve existing project conventions and replace templates incrementally.
- Do not overwrite user-authored files without explicit approval.
- Keep command names stable; change internals behind wrappers.
- Favor deterministic, scriptable workflows over ad-hoc interactive steps.