autoresearch-create

Set up and run an autonomous experiment loop for any optimization target. Gathers what to optimize, then starts the loop immediately. Use when asked to "run autoresearch", "optimize X in a loop", "set up autoresearch for X", or "start experiments".

Install skill "autoresearch-create" with this command: npx skills add davebcn87/pi-autoresearch/davebcn87-pi-autoresearch-autoresearch-create

Autoresearch

Autonomous experiment loop: try ideas, keep what works, discard what doesn't, never stop.

Tools

  • init_experiment — configures the session (name, metric, unit, direction). Call it again to re-initialize with a new baseline when the optimization target changes.
  • run_experiment — runs the command, times it, and captures its output.
  • log_experiment — records the result. A keep outcome auto-commits; discard, crash, and checks_failed auto-revert code changes (autoresearch files are preserved). Always include a dict of secondary metrics. Dashboard: ctrl+x.
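
For example, a single iteration might flow through the tools like this (the argument names are assumptions inferred from the bullets above, not a confirmed schema):

init_experiment   name="reduce-parse-time", metric="parse_time", unit="ms", direction="lower"
run_experiment    command="./autoresearch.sh"
log_experiment    result="keep", secondary_metrics={"peak_rss_mb": 312, "loc_delta": -14}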

Setup

  1. Ask (or infer): Goal, Command, Metric (+ direction), Files in scope, Constraints.
  2. git checkout -b autoresearch/<goal>-<date>
  3. Read the source files. Understand the workload deeply before writing anything.
  4. Write autoresearch.md and autoresearch.sh (see below). Commit both.
  5. init_experiment → run baseline → log_experiment → start looping immediately.
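
For step 2 and the baseline in step 5, the shell side might look like this (the goal name and date are illustrative):

git checkout -b autoresearch/reduce-parse-time-2025-01-15
git add autoresearch.md autoresearch.sh
git commit -m "autoresearch: session scaffold"
# Then init_experiment, run ./autoresearch.sh once via run_experiment,
# and log that run with log_experiment as the baseline before changing any code.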

autoresearch.md

This is the heart of the session. A fresh agent with no context should be able to read this file and run the loop effectively. Invest time making it excellent.

# Autoresearch: <goal>

## Objective
<Specific description of what we're optimizing and the workload.>

## Metrics
- **Primary**: <name> (<unit>, lower/higher is better)
- **Secondary**: <name>, <name>, ...

## How to Run
`./autoresearch.sh` — outputs `METRIC name=number` lines.

## Files in Scope
<Every file the agent may modify, with a brief note on what it does.>

## Off Limits
<What must NOT be touched.>

## Constraints
<Hard rules: tests must pass, no new deps, etc.>

## What's Been Tried
<Update this section as experiments accumulate. Note key wins, dead ends,
and architectural insights so the agent doesn't repeat failed approaches.>

Update autoresearch.md periodically — especially the "What's Been Tried" section — so resuming agents have full context.

autoresearch.sh

Bash script (set -euo pipefail) that pre-checks fast (catch syntax errors in under a second), runs the benchmark, and outputs `METRIC name=number` lines. Keep it fast — every second is multiplied across hundreds of runs. Update it during the loop as needed.
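
A minimal sketch for a hypothetical Python workload; the file names, pre-check, and benchmark command are all assumptions to adapt:

#!/bin/bash
set -euo pipefail
# Pre-check: fail in under a second on syntax errors before paying for the full benchmark.
python -m py_compile src/worker.py
# Benchmark: time the workload and emit the METRIC lines the harness parses.
# (date +%s%N assumes GNU date; substitute another timer on macOS.)
start=$(date +%s%N)
python src/worker.py --bench > /dev/null
end=$(date +%s%N)
echo "METRIC wall_time_ms=$(( (end - start) / 1000000 ))"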

autoresearch.config.json (optional)

JSON config file that lives in the pi session's working directory (ctx.cwd). Supported fields:

  • maxIterations (number) — maximum number of experiments before the loop auto-stops.
  • workingDir (string) — overrides the directory for all autoresearch operations: file I/O (autoresearch.jsonl, autoresearch.md, autoresearch.sh, autoresearch.checks.sh, autoresearch.ideas.md), command execution, and git operations. Accepts absolute or relative paths (relative paths resolve against ctx.cwd). The config file itself always stays in ctx.cwd. Fails if the directory doesn't exist.

Example:
{
  "workingDir": "/path/to/project",
  "maxIterations": 50
}

autoresearch.checks.sh (optional)

Bash script (set -euo pipefail) for backpressure/correctness checks: tests, types, lint, etc. Only create this file when the user's constraints require correctness validation (e.g., "tests must pass", "types must check").

When this file exists:

  • Runs automatically after every passing benchmark in run_experiment.
  • If checks fail, run_experiment reports it clearly — log as checks_failed.
  • Its execution time does NOT affect the primary metric.
  • You cannot keep a result when checks have failed.
  • Has a separate timeout (default 300s, configurable via checks_timeout_seconds).

When this file does not exist, everything behaves exactly as before — no changes to the loop.

Keep output minimal. Only the last 80 lines of checks output are fed back to the agent on failure. Suppress verbose progress/success output and let only errors through. This keeps context lean and helps the agent pinpoint what broke.

#!/bin/bash
set -euo pipefail
# Example: run tests and typecheck — suppress success output, let only errors through.
pnpm test --run --reporter=dot 2>&1 | tail -50
# Capture typecheck output; on failure, print its error lines and exit nonzero
# (piping straight into `grep ... || true` would swallow the failing exit code).
if ! out=$(pnpm typecheck 2>&1); then
  echo "$out" | grep -i error || true
  exit 1
fi

Loop Rules

LOOP FOREVER. Never ask "should I continue?" — the user expects autonomous work.

  • Primary metric is king. Improved → keep. Worse/equal → discard (see the sketch at the end of this section). Secondary metrics rarely affect this.
  • Simpler is better. Removing code for equal perf = keep. Ugly complexity for tiny gain = probably discard.
  • Don't thrash. Repeatedly reverting the same idea? Try something structurally different.
  • Crashes: fix if trivial, otherwise log and move on. Don't over-invest.
  • Think longer when stuck. Re-read source files, study the profiling data, reason about what the CPU is actually doing. The best ideas come from deep understanding, not from trying random variations.
  • Resuming: if autoresearch.md exists, read it + git log, continue looping.

NEVER STOP. The user may be away for hours. Keep going until interrupted.
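
To make the keep/discard rule concrete, a hedged sketch of comparing a fresh METRIC line against the best result kept so far (the tools normally do this bookkeeping; names and numbers are illustrative, with direction = lower):

best=1840   # best wall_time_ms kept so far
current=$(./autoresearch.sh | sed -n 's/^METRIC wall_time_ms=//p')
# Strict improvement keeps; worse or equal discards.
if awk -v c="$current" -v b="$best" 'BEGIN { exit !(c < b) }'; then
  echo keep
else
  echo discard
fi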

Ideas Backlog

When you discover complex but promising optimizations that you won't pursue right now, append them as bullets to autoresearch.ideas.md. Don't let good ideas get lost.

On resume (context limit, crash), check autoresearch.ideas.md — prune stale/tried entries, experiment with the rest. When all paths are exhausted, delete the file and write a final summary.

User Messages During Experiments

If the user sends a message while an experiment is running, finish the current run_experiment + log_experiment cycle first, then incorporate their feedback in the next iteration. Don't abandon a running experiment.
