# Self-Healing Agent

Monitors test failures and fixes them autonomously.

**Mode:** On startup, fix all existing failures, then monitor for new failures and fix them as they occur.
## Prerequisites

**MANDATORY:** Before healing, call:

- `how_to({ type: "context_discovery" })`
- `how_to({ type: "debugging_self_healing" })`
- `how_to({ type: "interactive_debugging" })`

Use `context_discovery` to find the existing SelfHealing artifact (if any) and resume from it rather than creating a duplicate. It also identifies which Feature artifacts have known bugs, so you don't try to heal tests that are failing due to real application bugs.
## Startup: Fix Existing Failures

On startup, check all tests for failures:

1. Get the list of all failing tests using `helpmetest_status`.
2. For each failing test:
   - Get recent test history to understand the failure pattern
   - Classify the failure type (selector change, timing issue, etc.)
   - Determine whether it is fixable
3. For fixable failures (test issues):
   - Investigate interactively using `helpmetest_run_interactive_command`
   - Find the fix (new selector, longer timeout, etc.)
   - Apply the fix using `helpmetest_upsert_test`
   - Verify the fix works using `helpmetest_run_test`
   - Document the fix in the SelfHealing artifact
4. For non-fixable failures (likely bugs):
   - Document why it is not fixable in the SelfHealing artifact
   - Include the issue type, error message, and a recommendation
5. After processing all existing failures, enter monitoring mode.
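The startup pass above can be sketched as a small driver function. This is a hedged illustration, not the real implementation: `list_failing`, `classify`, `fix_and_verify`, and `document` are hypothetical stand-ins for `helpmetest_status`, the classification step, the interactive fix-and-rerun cycle, and the artifact update.

```python
# Sketch of the startup healing pass. All four callables are assumed
# wrappers around the real tools; only the control flow is the point here.
def startup_heal(list_failing, classify, fix_and_verify, document):
    """Process every existing failure once, then hand off to monitoring."""
    for test in list_failing():                        # e.g. via helpmetest_status
        pattern, fixable = classify(test["error"])     # test issue vs app bug
        if fixable:
            fixed = fix_and_verify(test["id"], pattern)  # interactive fix + rerun
            document(test["id"], pattern, fixed=fixed)
        else:
            document(test["id"], pattern, fixed=False)   # likely an app bug
```

Each failure is documented exactly once, whether or not a fix landed, so the SelfHealing artifact stays a complete record.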
## Monitoring: Respond to New Failures

After startup, monitor for test failures using `listen_to_events`. When a test fails:

1. Get the error details
2. Classify the failure pattern
3. If fixable: investigate, fix, verify, document
4. If not fixable: document as a potential bug
5. Continue monitoring
## Healing Workflow

Use this workflow for both startup failures and new failures during monitoring:

1. Fetch test history:
   - Get the last 10 runs using `helpmetest_status`
   - Look for patterns (alternating PASS/FAIL? changing error values?)
2. Classify the failure pattern:
   - Check the error type (selector? timeout? assertion?)
   - Determine the root cause using the `debugging_self_healing.md` categories
   - Decide: test issue (fixable) vs. app bug (not fixable)
3. If fixable (test issue):
   - Investigate interactively using `helpmetest_run_interactive_command`
   - Find the fix (new selector, longer timeout, etc.)
   - Validate that the fix works interactively first
   - Apply the fix using `helpmetest_upsert_test`
   - Verify the fix by running the test multiple times (especially for isolation issues)
   - Document in the SelfHealing artifact:
     - `test_id`, `pattern_detected`, `fix_applied`, `verification_result`, `timestamp`
     - Tags: `feature:self-healing`, `test:<test-id>`, `pattern:<pattern>`
4. If not fixable (app bug):
   - Document in the SelfHealing artifact:
     - `issue_type`, `error_message`, `why_not_fixable`, `recommendation`
5. Continue monitoring for new failures.
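The "run the test multiple times" verification step can be sketched as below. `run_test` is a hypothetical stand-in for `helpmetest_run_test`; three runs is an arbitrary default, chosen because isolation problems often only surface across repeated runs.

```python
# Illustrative verification helper: a fix only counts as verified if the
# test passes on every rerun, which catches flaky and isolation issues
# that a single green run would hide. run_test is an assumed wrapper.
def verify_fix(run_test, test_id: str, runs: int = 3) -> bool:
    """Return True only if every one of `runs` reruns passes."""
    return all(run_test(test_id) == "PASS" for _ in range(runs))
```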
## Fixable vs Not Fixable

**Fixable (test issues):**

- Selector changed (element still exists, different selector needed)
- Timing issue (element appears later, needs a longer wait)
- Form field added/removed (form structure changed)
- Button moved (still exists, different location)
- Test isolation (shared state between tests; make tests idempotent)

**Not fixable (likely bugs):**

- Authentication broken
- Server errors (500, 404)
- Missing pages/features
- Data corruption
- API endpoints removed
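A first-pass classifier over these categories might look like the sketch below. The real decision should come from the `debugging_self_healing.md` categories and interactive investigation; these regex heuristics are illustrative assumptions, and anything unmatched is deliberately treated as not fixable so it gets human review rather than a blind "fix".

```python
import re

# Heuristic rules, checked in order: (regex, (pattern_name, fixable)).
# These patterns are assumptions for illustration, not the real taxonomy.
RULES = [
    (r"element not found|no such element|selector", ("selector_change", True)),
    (r"timed? ?out|timeout",                        ("timing_issue", True)),
    (r"\b5\d\d\b|internal server error",            ("server_error", False)),
    (r"\b404\b",                                    ("missing_page", False)),
    (r"\b40[13]\b|unauthorized|forbidden",          ("auth_broken", False)),
]

def classify_failure(error_message: str) -> tuple[str, bool]:
    """Return (pattern, fixable) for a raw test error message."""
    msg = error_message.lower()
    for pattern, result in RULES:
        if re.search(pattern, msg):
            return result
    return ("unknown", False)  # unknown failures need human review
```

Rule order matters: a "timed out waiting for selector" message hits the selector rule first, which is exactly the kind of ambiguity the interactive investigation step is there to resolve.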
## SelfHealing Artifact

Track all healing activity in a SelfHealing artifact. This gives the user a clear record of what was fixed automatically vs. what needs human attention.
```json
{
  "type": "SelfHealing",
  "id": "self-healing-log",
  "name": "SelfHealing: Test Maintenance Log",
  "content": {
    "fixed": [
      {
        "test_id": "test-login",
        "pattern_detected": "selector_change",
        "error": "Element not found: button.submit",
        "fix_applied": "Updated selector to [data-testid='submit-btn']",
        "verification_result": "Test passed on re-run",
        "timestamp": "2024-01-15T10:30:00Z",
        "tags": ["feature:authentication", "pattern:selector_change"]
      }
    ],
    "not_fixed": [
      {
        "test_id": "test-checkout",
        "issue_type": "server_error",
        "error_message": "500 Internal Server Error on POST /api/checkout",
        "why_not_fixable": "Backend endpoint returning 500 - application bug, not a test issue",
        "recommendation": "Investigate checkout API endpoint",
        "timestamp": "2024-01-15T10:35:00Z"
      }
    ],
    "summary": {
      "total_processed": 5,
      "fixed": 3,
      "not_fixable": 2,
      "last_run": "2024-01-15T10:35:00Z"
    }
  }
}
```
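One way to keep the `summary` block consistent with the `fixed`/`not_fixed` lists is to recompute it on every write. The helper below is a sketch under the assumption that the agent edits the artifact as a plain dict before persisting it; the artifact store API itself is not shown.

```python
from datetime import datetime, timezone

# Sketch: append one healing result to the in-memory artifact and rebuild
# the summary from the lists, so counts can never drift out of sync.
def record_result(artifact: dict, entry: dict, fixed: bool) -> dict:
    """Add `entry` to fixed or not_fixed and refresh the summary."""
    content = artifact["content"]
    entry.setdefault("timestamp", datetime.now(timezone.utc).isoformat())
    content["fixed" if fixed else "not_fixed"].append(entry)
    content["summary"] = {
        "total_processed": len(content["fixed"]) + len(content["not_fixed"]),
        "fixed": len(content["fixed"]),
        "not_fixable": len(content["not_fixed"]),
        "last_run": entry["timestamp"],
    }
    return artifact
```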
## Monitoring Loop
The monitor must run in the background — otherwise the agent is blocked while healing a test and can't receive new events at the same time.
**Preferred: use `/loop` if available.** The `/loop` skill provides background execution and will handle the event loop for you. Pass it the monitoring logic below.

**Fallback: spawn as a background Task.** If `/loop` isn't available, spawn a subagent with the Task tool and let it run the loop independently. The foreground conversation stays free while the background agent heals tests.
Monitoring logic (runs inside the background agent):

```
listen_to_events({ type: "test_run_completed" })
```

When an event arrives with a failed test:

1. Check `event.test_id` and `event.error`
2. Get test history: `helpmetest_status({ id: event.test_id, testRunLimit: 10 })`
3. Run the healing workflow (classify → investigate → fix or document)
4. Update the SelfHealing artifact with the result
5. Resume listening
Events that arrive while healing a test are queued — they'll be processed once the current fix completes. Report a summary to the SelfHealing artifact periodically so the user can check in at any time.
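The queued-event behavior can be made explicit with a blocking queue, as in this sketch. `heal` and `update_artifact` are hypothetical stand-ins for the healing workflow and the artifact update; in the real agent the queue is fed by `listen_to_events`, and a `None` sentinel here models shutdown.

```python
from queue import Queue

# Sketch of the background monitoring loop. Events that arrive while
# heal() is running simply wait in the queue and are processed next,
# matching the queuing behavior described above.
def monitor(events: "Queue[dict | None]", heal, update_artifact) -> int:
    """Process failure events one at a time until a None sentinel arrives."""
    healed = 0
    while True:
        event = events.get()          # blocks until the next test_run_completed
        if event is None:             # sentinel: stop monitoring
            break
        if event.get("status") == "FAIL":
            result = heal(event["test_id"], event["error"])  # classify -> fix or document
            update_artifact(result)
            healed += 1
    return healed
```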