Test Architect
Design test strategies, analyze coverage gaps, identify edge cases, diagnose flaky tests, and audit test suite architecture.
Scope: Test design and analysis only. NOT for running tests or CI/CD (devops-engineer), code review (honest-review), or TDD workflow.
Dispatch
| $ARGUMENTS | Action |
|---|---|
| design <feature/module> | Design test strategy and pyramid for a feature or module |
| generate <file/function> | Generate test cases (strategy text or actual test code based on context) |
| gaps | Analyze coverage gaps from coverage reports |
| edge-cases <function> | Systematic edge case identification for a function |
| flaky | Diagnose flaky tests from logs and code |
| review | Audit test suite architecture |
| Empty | Show mode menu with examples |
Canonical Vocabulary
Use these terms exactly throughout all modes:
| Term | Definition |
|---|---|
| test pyramid | Layered test distribution: unit (base), integration (middle), e2e (top) |
| coverage gap | Code path with no test coverage, weighted by complexity risk |
| edge case | Input at boundary conditions, null/empty, type coercion, overflow, unicode, concurrent |
| flaky test | Test with non-deterministic pass/fail behavior across identical runs |
| mutation score | Percentage of injected mutations detected by the test suite |
| test strategy | Document defining what to test, how, at what layer, with what tools |
| property-based test | Test asserting invariants over generated inputs (Hypothesis/fast-check) |
| test isolation | Guarantee that tests do not share mutable state or execution order dependencies |
| fixture | Reusable test setup/teardown providing controlled state |
| test surface | Set of public interfaces, code paths, and states requiring test coverage |
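The "property-based test" entry can be illustrated without a library. This is a minimal hand-rolled sketch of the idea (asserting an invariant over generated inputs); real tools such as Hypothesis or fast-check add input shrinking and failure minimization. The encode/decode pair is hypothetical:

```python
import random

# Hypothetical round-trip pair: serialize a list of ints, then parse it back.
def encode(xs: list[int]) -> str:
    return ",".join(str(x) for x in xs)

def decode(s: str) -> list[int]:
    return [int(x) for x in s.split(",")] if s else []

def check_roundtrip(trials: int = 200, seed: int = 0) -> bool:
    """Property: decode(encode(xs)) == xs for any generated list."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 20))]
        assert decode(encode(xs)) == xs, xs
    return True
```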
Mode 1: Design
/test-architect design <feature/module>
Surface Analysis
- Read the feature/module code. Map the test surface: public API, internal paths, state transitions, error conditions.
- Classify complexity: simple (pure functions), moderate (I/O, state), complex (distributed, concurrent, multi-service).
Pyramid Design
- Design test pyramid:
- Unit layer: Pure logic, transformations, validators. Target: 70-80% of tests.
- Integration layer: Database, API, file I/O, service boundaries. Target: 15-25%.
- E2E layer: Critical user flows only. Target: 5-10%.
- For each layer, list specific test cases with: description, input, expected output, rationale.
- Recommend framework and tooling based on language/ecosystem.
- Output: structured strategy document with pyramid diagram, case list, and priority order.
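The layer targets above can be turned into a mechanical balance check. A sketch using exactly the percentage bands listed (the function name and return shape are illustrative):

```python
def pyramid_balance(unit: int, integration: int, e2e: int) -> dict[str, str]:
    """Classify a suite's layer distribution against the pyramid targets."""
    total = unit + integration + e2e
    shares = {"unit": unit / total, "integration": integration / total, "e2e": e2e / total}
    targets = {"unit": (0.70, 0.80), "integration": (0.15, 0.25), "e2e": (0.05, 0.10)}
    return {
        layer: "ok" if lo <= shares[layer] <= hi
        else ("over" if shares[layer] > hi else "under")
        for layer, (lo, hi) in targets.items()
    }
```

For example, a 75/20/5 split reports every layer "ok", while an inverted pyramid such as 40/40/20 flags the unit layer as "under" and the other two as "over".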
Reference: read references/test-pyramid.md for layer guidance.
Mode 2: Generate
/test-architect generate <file/function>
- Read the target file/function. Identify signature, dependencies, side effects.
- Determine output format:
- If test file exists for target: generate actual test code matching existing patterns.
- If no test file exists: generate test strategy text with case descriptions.
- If user specifies --code: always generate test code.
- Generate test cases covering:
- Happy path (expected inputs and outputs)
- Error path (invalid inputs, exceptions, timeouts)
- Edge cases (run edge-case-generator.py if function has typed parameters)
- Boundary conditions (min/max values, empty collections, null)
- Follow framework conventions: read references/framework-patterns.md for pytest/jest/vitest patterns.
- Output: test cases or test code with clear section headers per category.
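For reference, a generated-output sketch in pytest style for a hypothetical parse_port function, with section headers per category as described above:

```python
def parse_port(value: str) -> int:
    """Hypothetical target function: parse a TCP port from a string."""
    port = int(value)  # raises ValueError on non-numeric input
    if not 0 < port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port

# --- Happy path ---
def test_parse_port_valid():
    assert parse_port("8080") == 8080

# --- Error path ---
def test_parse_port_non_numeric():
    try:
        parse_port("http")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")

# --- Boundary conditions ---
def test_parse_port_boundaries():
    assert parse_port("1") == 1
    assert parse_port("65535") == 65535
    for bad in ("0", "65536", ""):
        try:
            parse_port(bad)
        except ValueError:
            pass
        else:
            raise AssertionError(f"expected ValueError for {bad!r}")
```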
Mode 3: Gaps
/test-architect gaps
- Locate coverage reports. Search for:
  - coverage.json, coverage.xml, .coverage (Python/coverage.py)
  - lcov.info, coverage/lcov.info (JS/lcov)
  - htmlcov/, coverage/ directories
- Run coverage analyzer:
  uv run python skills/test-architect/scripts/coverage-analyzer.py <report-path>
- Parse JSON output. Rank gaps by complexity-weighted risk.
- For each gap, assess:
- What code is untested and why it matters
- Complexity score (cyclomatic complexity proxy)
- Recommended test type (unit/integration/e2e)
- Priority (P0: security/auth, P1: core logic, P2: utilities, P3: cosmetic)
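One way to express complexity-weighted risk from the complexity and priority fields above (an illustrative scoring, not necessarily the analyzer's exact formula):

```python
# Illustrative weights: each priority tier doubles the risk multiplier.
PRIORITY_WEIGHT = {"P0": 8, "P1": 4, "P2": 2, "P3": 1}

def rank_gaps(gaps: list[dict]) -> list[dict]:
    """Sort coverage gaps by complexity x priority weight, highest risk first."""
    return sorted(
        gaps,
        key=lambda g: g["complexity"] * PRIORITY_WEIGHT[g["priority"]],
        reverse=True,
    )

gaps = [
    {"path": "auth/token.py", "complexity": 6, "priority": "P0"},   # risk 48
    {"path": "utils/fmt.py", "complexity": 12, "priority": "P2"},   # risk 24
    {"path": "core/rules.py", "complexity": 9, "priority": "P1"},   # risk 36
]
```

Note the ranking puts the auth gap first even though the utility module has twice its complexity — prioritization by risk, not by line count.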
- Render dashboard if 10+ gaps:
  - Copy templates/dashboard.html to a temporary file
  - Inject gap data JSON into the <script id="data"> tag
  - Open in browser
- Output: prioritized gap list with recommended actions.
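The rendering steps could be scripted roughly as follows. The placeholder convention inside templates/dashboard.html (an empty script tag with id "data") is an assumption here:

```python
import json
import tempfile
from pathlib import Path

def render_dashboard(template_path: str, gaps: list[dict]) -> str:
    """Copy the dashboard template to a temp file with gap data injected
    into the <script id="data"> tag, and return the temp file's path."""
    html = Path(template_path).read_text()
    # Assumed template convention: an empty <script id="data"></script> tag.
    html = html.replace(
        '<script id="data"></script>',
        f'<script id="data" type="application/json">{json.dumps(gaps)}</script>',
    )
    out = Path(tempfile.mkdtemp()) / "dashboard.html"
    out.write_text(html)
    return str(out)

# Then open in browser, e.g. webbrowser.open(f"file://{render_dashboard(...)}")
```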
Reference: read references/coverage-analysis.md for interpretation guidance.
Mode 4: Edge Cases
/test-architect edge-cases <function>
- Read the function. Extract parameter types, return types, and constraints.
- Run edge case generator:
  uv run python skills/test-architect/scripts/edge-case-generator.py --name "<function_name>" --params "<param1:type,param2:type>"
- Parse JSON output. Review generated categories:
- Null/empty: None, "", [], {}, 0, False
- Boundary: min/max int, float limits, string length limits
- Type coercion: "123" vs 123, True vs 1, None vs "null"
- Overflow: large numbers, deep nesting, long strings
- Unicode: emoji, RTL text, zero-width chars, combining marks
- Concurrent: race conditions, deadlocks, stale reads
- For each edge case, provide: input value, expected behavior, rationale.
- Flag cases where current code would likely fail (no guard, no validation).
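As an example of flagging a likely failure, two of the unicode categories applied to a hypothetical username normalizer. The zero-width case is exactly the kind of guard gap to flag, because str.strip() does not treat U+200B as whitespace:

```python
import unicodedata

def normalize_username(name: str) -> str:
    """Hypothetical function under test: trim and NFC-normalize a username."""
    return unicodedata.normalize("NFC", name.strip())

# Combining marks: "é" as one code point vs "e" + U+0301 must compare
# equal after NFC normalization -- rationale: dedup and login lookups.
def test_combining_marks():
    assert normalize_username("caf\u00e9") == normalize_username("cafe\u0301")

# Zero-width space: not whitespace to str.strip(), so it silently survives --
# rationale for flagging: uniqueness checks on usernames would miss the duplicate.
def test_zero_width_space_survives():
    assert "\u200b" in normalize_username("user\u200bname")
```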
Reference: read references/edge-case-heuristics.md for category details.
Mode 5: Flaky
/test-architect flaky
Log Collection
- Locate test result logs. Search for:
  - CI logs, pytest output, jest output
  - .pytest_cache/, test-results/
- Ask user for log path if not found
- Run flaky test analyzer:
uv run python skills/test-architect/scripts/flaky-test-analyzer.py <log-path>
Root Cause Classification
- Parse JSON output. For each flaky test:
- Failure count vs pass count
- Failure pattern (timing, ordering, resource, state)
- Likely root cause classification:
- Timing: sleep/timeout dependencies, race conditions
- Ordering: test execution order dependencies
- Resource: external service, database, file system
- State: shared mutable state between tests
- Environment: platform-specific, timezone, locale
- Recommend fix strategy per root cause.
- Prioritize by failure frequency and blast radius.
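For the timing category, the standard fix strategy is replacing a fixed sleep with a bounded poll. A minimal sketch (names illustrative):

```python
import time

def wait_for(predicate, timeout: float = 2.0, interval: float = 0.01) -> bool:
    """Poll a condition until it holds or the deadline passes -- a
    deterministic replacement for `time.sleep(0.5); assert condition()`."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()  # final check in case the condition flipped at the deadline
```

The test then waits only as long as needed on fast runs, while slow runs get the full timeout instead of failing nondeterministically.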
Reference: read references/flaky-diagnosis.md for root cause patterns.
Mode 6: Review
/test-architect review
- Scan the test suite. Map: test file count, framework(s), directory structure.
- Assess architecture dimensions:
- Pyramid balance: ratio of unit:integration:e2e tests
- Isolation: shared state, global fixtures, test ordering dependencies
- Naming: consistency, descriptiveness, convention adherence
- Coverage distribution: even vs clustered coverage
- Fixture health: duplication, complexity, setup/teardown balance
- Assertion quality: specific assertions vs generic assertTrue
- Speed: identify slow tests (>1s unit, >10s integration)
- Determinism: potential flakiness indicators
- Run coverage analyzer if reports exist.
- Cross-reference with source code:
- Untested public APIs
- Tests for deleted/renamed code (orphaned tests)
- Missing negative test cases
- Output: architecture audit report with scores per dimension, findings, and recommendations.
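The assertion-quality dimension can be illustrated with a pair of tests for a hypothetical discount function. Both pass, but only the specific one pins behavior and yields a useful failure message:

```python
def apply_discount(price: float, pct: float) -> float:
    """Hypothetical function under audit."""
    return round(price * (1 - pct / 100), 2)

# Generic: a failure reports only a bare False, hiding the computed value,
# and this would still pass even if the discount math were completely wrong.
def test_discount_generic():
    assert apply_discount(100.0, 10.0) > 0

# Specific: a failure shows expected vs actual and pins the rounding behavior.
def test_discount_specific():
    assert apply_discount(100.0, 10.0) == 90.0
    assert apply_discount(19.99, 15.0) == 16.99
```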
Reference: read references/test-suite-audit.md for scoring criteria.
Reference Files
Load ONE reference at a time. Do not preload all references into context.
| File | Content | Read When |
|---|---|---|
| references/test-pyramid.md | Test pyramid layers, distribution targets, anti-patterns | Mode 1 (Design) |
| references/framework-patterns.md | pytest, jest, vitest patterns and conventions | Mode 2 (Generate), Mode 6 (Review) |
| references/coverage-analysis.md | Coverage report interpretation, complexity weighting | Mode 3 (Gaps) |
| references/edge-case-heuristics.md | Edge case categories by data type, generation strategies | Mode 4 (Edge Cases) |
| references/flaky-diagnosis.md | Flaky test root causes, fix strategies, prevention patterns | Mode 5 (Flaky) |
| references/test-suite-audit.md | Test architecture scoring rubric, quality dimensions | Mode 6 (Review) |
| references/property-testing.md | Property-based testing with Hypothesis and fast-check | Mode 1 (Design), Mode 2 (Generate) |
| references/mutation-testing.md | Mutation testing plan design, tool integration | Mode 1 (Design), Mode 6 (Review) |
| Script | When to Run |
|---|---|
| scripts/coverage-analyzer.py | Mode 3 (Gaps) -- parse coverage reports |
| scripts/edge-case-generator.py | Mode 4 (Edge Cases) -- generate edge cases from function signature |
| scripts/flaky-test-analyzer.py | Mode 5 (Flaky) -- parse test logs for flaky indicators |
| Template | When to Render |
|---|---|
| templates/dashboard.html | Mode 3 (Gaps) with 10+ gaps -- coverage gap visualization |
Critical Rules
- Never run tests -- design and analyze only. Suggest commands but do not execute.
- Never modify source code -- test architecture is advisory, not implementation.
- Always recommend the correct test layer (unit/integration/e2e) for each test case.
- Edge cases must include rationale -- "why this matters" not just "try this input."
- Coverage gaps must be prioritized by risk, not by line count.
- Flaky test diagnosis must identify root cause category before recommending fixes.
- Framework recommendations must match the project's existing stack.
- Property-based testing is recommended only when invariants are identifiable.
- Load ONE reference file at a time -- do not preload all references.
- Every finding must cite the specific file and function it applies to.
- Test generation must follow existing test patterns in the project when present.
- Dashboard rendering requires 10+ gaps -- do not render for small gap sets.