SKILL: smart-wake

Objective

Ensure long-running tasks do not lose progress when subagent timeout occurs.

Standard mechanism:

Auto-save state before timeout / reaching timeout threshold.
Cron wake to re-invoke agent at scheduled time.
Resume task from latest checkpoint using sessions_spawn (or session_spawn depending on runtime) with appropriate wakeMode.

Activation Triggers

Use smart-wake when any of the following conditions are met:

Task duration exceeds default subagent timeout.
Requires periodic polling (by minute/hour) to continue processing.
Multi-step workflow that may pause mid-execution but must resume accurately.
Running overnight / off-hours but still needs to auto-continue.

Available Tools (Built-in)

sessions_spawn / session_spawn (spawn new work session)
Gateway cron jobs (schedule wake events)
Memory/slot for state persistence (state checkpoint)

Do not create new tools. Only use existing mechanisms.

State Contract (MANDATORY)

Each task must implement checkpointing with minimum schema:

{
  "task_id": "smartwake_<slug>_<timestamp>",
  "goal": "Final objective",
  "status": "running|waiting|blocked|done|failed",
  "current_step": "step_name",
  "progress_pct": 0,
  "last_completed": ["step_a", "step_b"],
  "next_actions": ["action_1", "action_2"],
  "artifacts": ["path/file1", "path/file2"],
  "errors": [],
  "retry_count": 0,
  "updated_at": "ISO_TIMESTAMP"
}

Requirements:

Update checkpoint after each completed step.
Final checkpoint before timeout.
Resume always reads latest checkpoint; never re-run blindly.

Standard Workflow

Step 1 — Preflight (Estimate Timeout)

Determine subagent timeout and total task duration.
If estimated_duration > 70% timeout: enable smart-wake immediately.
Create task_id and initial checkpoint.

Step 2 — Execute by Small Chunks

Divide task into 5–15 minute chunks.
After each chunk:
- Write progress to checkpoint,
- Record artifact/path,
- Update next_actions.

Step 3 — Pre-timeout Auto-save

When ~10–20% timeout remains:
- Flush final checkpoint,
- Set status = waiting,
- Prepare resume payload (task_id + next_actions).

Step 4 — Schedule Cron Wake

Register Gateway cron job to wake after appropriate interval (e.g., 2–10 minutes depending on load).
Cron payload MUST contain:
- task_id
- resume_from=latest_checkpoint
- reason=timeout_recovery

Step 5 — Spawn Resume Session

At wake time, call sessions_spawn (or session_spawn) with wakeMode enabling auto-resume via cron.
Pass concise context:
- objective,
- latest checkpoint,
- next step,
- completion criteria.

Step 6 — Resume + Idempotency Guard

New session must:
1. Read checkpoint,
2. Verify artifacts exist,
3. Skip completed steps,
4. Continue exact next_actions.
When entire task completes: set status = done, cancel remaining cron wakes.

Sample Resume Packet (Reference)

{
  "task": "Resume long-running task",
  "task_id": "smartwake_repo_scan_20260301T120000Z",
  "wakeMode": "cron",
  "resume": {
    "from": "latest_checkpoint",
    "current_step": "collect_phase_2",
    "next_actions": ["fetch page 6-10", "dedupe", "export report"]
  },
  "done_criteria": [
    "output file generated",
    "validation passed",
    "status marked done"
  ]
}

Note: Field names may vary by Gateway/OpenClaw version, but checkpoint + wakeMode + resume context principles are mandatory.

Operational Rules

Never resume without valid checkpoint.
Never overwrite artifacts without checksum/timestamp verification.
No infinite wake loops: limit retry_count (e.g., max 5).
Every wake must have clear reason (timeout_recovery, dependency_ready, scheduled_progress).
Clean up on done: completed task must clear related cron jobs.

Output Format After Each Wake Cycle

{
  "task_id": "smartwake_<...>",
  "status": "running|waiting|done|failed",
  "progress_pct": 65,
  "current_step": "...",
  "resumed_from_checkpoint": true,
  "next_wake_scheduled": true,
  "next_wake_at": "ISO_TIMESTAMP|null",
  "notes": "concise, auditable"
}

Anti-patterns (FORBIDDEN)

Running long tasks without periodic checkpointing.
Waking without passing task_id/resume context.
Re-running entire pipeline from start when only 1 step is missing.
Not canceling cron after done.

Expected Outcomes

No state loss on subagent timeout.
Long tasks progress steadily via cron wake.
Full system auditability: checkpoint → wake → resume → done.

smart-wake

Safety Notice

Copy this and send it to your AI assistant to learn