flaky-test-detective

Detect, diagnose, and fix flaky tests. Identify tests with non-deterministic outcomes by analyzing CI history, test timing, shared state, race conditions, and environment dependencies — then provide targeted fixes.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "flaky-test-detective" with this command: npx skills add charlie-morrison/flaky-test-detective

Flaky Test Detective

Hunt down flaky tests — the ones that pass sometimes and fail sometimes with no code change. Analyze CI run history, detect timing-sensitive tests, find shared mutable state, identify race conditions, and generate targeted fixes for each flaky test.

Use when: "find flaky tests", "test keeps failing randomly", "CI is unreliable", "non-deterministic test failures", "test stability", "intermittent failures", or when builds fail but pass on retry.

Commands

1. detect — Find Flaky Tests

Step 1: Analyze CI History

# GitHub Actions — get recent test failures
gh run list --limit 30 --json conclusion,headBranch,createdAt,databaseId | \
  python3 -c "
import json, sys
runs = json.load(sys.stdin)
failures = [r for r in runs if r['conclusion'] == 'failure']
retried = [r for r in runs if r['headBranch'] == runs[0]['headBranch'] and r['conclusion'] != failures[0]['conclusion']]
print(f'Total runs: {len(runs)}')
print(f'Failures: {len(failures)} ({len(failures)/len(runs)*100:.0f}%)')
if retried:
    print(f'Flaky signal: same branch has both pass and fail')
"

# Download test results (JUnit XML)
gh run download <RUN_ID> --name test-results 2>/dev/null

Step 2: Statistical Detection

Run the test suite multiple times and track results:

# Run tests N times and collect results
RESULTS_FILE="/tmp/flaky-results.json"
echo '[]' > "$RESULTS_FILE"

for i in $(seq 1 5); do
  echo "=== Run $i/5 ==="
  # Capture per-test results (adjust for your framework)
  npm test -- --json 2>/dev/null | python3 -c "
import json, sys
try:
    data = json.load(sys.stdin)
    for suite in data.get('testResults', []):
        for test in suite.get('testResults', []):
            print(f'{test[\"status\"]}\t{test[\"fullName\"]}')
except: pass
" >> "/tmp/run-$i.txt"
done

# Find tests with inconsistent results across runs
python3 -c "
import os, collections
results = collections.defaultdict(list)
for i in range(1, 6):
    path = f'/tmp/run-{i}.txt'
    if os.path.exists(path):
        for line in open(path):
            parts = line.strip().split('\t', 1)
            if len(parts) == 2:
                results[parts[1]].append(parts[0])

for test, outcomes in sorted(results.items()):
    unique = set(outcomes)
    if len(unique) > 1:
        pass_rate = outcomes.count('passed') / len(outcomes) * 100
        print(f'🎯 FLAKY ({pass_rate:.0f}% pass rate): {test}')
        print(f'   Outcomes: {\" → \".join(outcomes)}')
"

Step 3: Pattern Analysis

For each flaky test, analyze the failure to classify the root cause:

Timing-dependent:

# Check for setTimeout, sleep, waitFor with hardcoded timeouts
rg "setTimeout|sleep|waitFor|delay|\.timeout" --type ts --type js -g '*test*' -g '*spec*' 2>/dev/null

Shared state:

# Check for global variables, singletons, shared fixtures
rg "beforeAll|before\(|global\.|singleton|shared" --type ts --type js -g '*test*' -g '*spec*' 2>/dev/null

Test order dependency:

# Run tests in random order
npm test -- --randomize 2>/dev/null
pytest --randomly 2>/dev/null
go test -shuffle=on ./... 2>/dev/null

Environment dependency:

# Check for hardcoded ports, paths, dates, timezones
rg "localhost:[0-9]+|/tmp/|/var/|new Date\(\)|Date\.now\(\)" --type ts --type js -g '*test*' 2>/dev/null

Network dependency:

# Check for real HTTP calls in tests
rg "fetch\(|axios\.|http\.get|requests\.(get|post)" --type ts --type js --type py -g '*test*' 2>/dev/null

Step 4: Generate Report

# Flaky Test Report

## Summary
- Tests analyzed: 342
- Flaky tests found: 7
- Test suite reliability: 98% (target: 99.9%)

## Flaky Tests

### 1. `UserService.test.ts` — "should send welcome email"
- **Pass rate:** 60% (3/5 runs)
- **Root cause:** Timing — test waits 100ms but email service sometimes takes 200ms
- **Fix:** Replace `setTimeout(100)` with `waitFor(() => expect(emailSent).toBe(true))`
- **Category:** ⏱️ Timing

### 2. `OrderController.test.ts` — "should calculate total"
- **Pass rate:** 80% (4/5 runs)
- **Root cause:** Shared state — previous test modifies global discount config
- **Fix:** Reset `DiscountConfig` in `beforeEach()` or isolate with `jest.isolateModules()`
- **Category:** 🔗 Shared State

### 3. `DateFormatter.test.ts` — "should format date correctly"
- **Pass rate:** 40% (2/5 runs)
- **Root cause:** Timezone — test assumes UTC but CI runs in different timezone
- **Fix:** Use `new Date('2024-01-15T00:00:00Z')` instead of `new Date('2024-01-15')`
- **Category:** 🌐 Environment

2. fix — Generate Fixes for Flaky Tests

For each root cause category, apply the appropriate fix pattern:

  • Timing: Replace fixed delays with polling/retry assertions
  • Shared state: Add setup/teardown, use test isolation
  • Order dependency: Make tests independent, reset state
  • Network: Mock external calls, use test fixtures
  • Environment: Pin timezone, use deterministic dates, avoid temp paths

3. quarantine — Manage Flaky Test Quarantine

Move known-flaky tests to a quarantine suite that runs separately:

  • Tag with @flaky or skip with documentation
  • Track quarantined tests in a manifest file
  • Alert when quarantine grows beyond threshold
  • Automatically un-quarantine after fix is merged and verified stable (5 consecutive passes)

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

Anglo American

Global mining leader Anglo American operates diversified assets including De Beers diamonds, platinum in South Africa, and copper in Chile.

Registry SourceRecently Updated
General

Cn Text Summary

长文本智能摘要工具。提取文章要点、关键词,生成可配置的摘要长度。支持新闻、论文、邮件等多种文本类型。

Registry SourceRecently Updated
General

Cn Todo Today

Todo Today skill. 自动生成,无人工审查。

Registry SourceRecently Updated
General

Barrick Gold

全球第二大金矿公司,加拿大企业,拥有内华达顶级矿产并与沙特PIF合作开发中东非洲矿藏。

Registry SourceRecently Updated