Testing Strategy

Analyzes a web project and produces a testing strategy that answers the three questions developers struggle with most: what to test, with which type of test, and in what order of priority. When asked, also generates the tests themselves.

Before You Start

Stack and Framework Detection

Scan the project for configuration files, dependency manifests, and existing test files to determine:

Language(s) and runtime
Web framework(s)
Existing testing framework(s) (Jest, Vitest, pytest, JUnit, RSpec, PHPUnit, etc.)
Existing test files — their location, naming convention, and approximate count
Mocking libraries already in use
Test runner configuration (if any)
CI/CD test pipeline (if detectable)

If the project already has tests, the strategy should build on what exists rather than propose a replacement. If no testing framework is present, recommend one based on the detected stack.

Record the detected stack and testing setup at the top of the strategy document.
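
The detection step can be sketched as a simple file scan. This is a minimal sketch, not a complete detector: the `MARKERS` table, the `TEST_GLOBS` patterns, and the `detect_stack` name are all illustrative assumptions, and a real scan would also read the dependency manifests themselves.

```python
from pathlib import Path

# Hypothetical marker table: file name -> what its presence suggests.
# A real project scan needs a longer list and should parse the manifests too.
MARKERS = {
    "package.json": "Node.js project",
    "pyproject.toml": "Python project",
    "pom.xml": "Java project (Maven)",
    "Gemfile": "Ruby project",
    "composer.json": "PHP project",
    "jest.config.js": "Jest configured",
    "vitest.config.ts": "Vitest configured",
    "pytest.ini": "pytest configured",
    "phpunit.xml": "PHPUnit configured",
}

# Common test-file naming patterns (illustrative, not exhaustive).
TEST_GLOBS = ("test_*.py", "*_test.py", "*.test.ts", "*.test.js", "*.spec.ts", "*.spec.js")


def detect_stack(root: Path) -> dict:
    """Return the markers found at the project root and a count of test files."""
    signals = sorted(hint for name, hint in MARKERS.items() if (root / name).is_file())
    test_files = set()
    for pattern in TEST_GLOBS:
        for path in root.rglob(pattern):
            # Skip vendored dependencies so their tests are not counted.
            if "node_modules" not in path.relative_to(root).parts:
                test_files.add(path)
    return {"signals": signals, "test_file_count": len(test_files)}
```

The returned dictionary is the raw material for the "detected stack" header of the strategy document; the test-file count gives a first impression of how much already exists.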

Scope Assessment

Identify the codebase areas that are testable:

Business logic — Pure functions, calculations, transformations, rules
Data mutation handlers — Server actions, controllers, API routes, views
Data access layer — Repositories, services, ORM queries
UI components — Interactive components with user-facing behavior
Utilities — Helpers, formatters, validators, parsers
Integrations — External API calls, third-party services, database operations
Critical user flows — Authentication, checkout, data submission, onboarding

Map each area to the test types it needs. Read references/test-type-guide.md for detailed decision criteria.

The Testing Pyramid

The testing pyramid is the foundation of the strategy. It defines the ratio of test types:

        /  E2E  \          Few, slow, expensive — critical flows only
       /----------\
      / Component  \       Moderate — UI behavior and interaction
     /--------------\
    /  Integration   \     More — module boundaries, APIs, data access
   /------------------\
  /     Unit Tests     \   Many, fast, cheap — business logic and utilities
 /______________________\

The pyramid exists because of economics: unit tests are fast, cheap, and precise, while E2E tests are slow, expensive, and brittle. In its classic form, a healthy strategy has many unit tests, fewer integration and component tests, and very few E2E tests.

Recommended ratios (adapt based on project type):

Project Type             Unit   Integration   Component   E2E
API / Backend            60%    30%           n/a         10%
Fullstack web app        40%    25%           25%         10%
SPA with external API    30%    15%           40%         15%
Static site with forms   20%    10%           50%         20%
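
To make the ratios concrete, they can be turned into target counts for a planned number of tests, or compared against the mix a project already has. The sketch below copies the percentages from the table; the function names `target_counts` and `drift` are illustrative assumptions, and the missing Component value for API / Backend is treated as 0.

```python
# Percentages per test type, copied from the table above.
RATIOS = {
    "API / Backend": {"unit": 60, "integration": 30, "component": 0, "e2e": 10},
    "Fullstack web app": {"unit": 40, "integration": 25, "component": 25, "e2e": 10},
    "SPA with external API": {"unit": 30, "integration": 15, "component": 40, "e2e": 15},
    "Static site with forms": {"unit": 20, "integration": 10, "component": 50, "e2e": 20},
}


def target_counts(project_type: str, planned_tests: int) -> dict:
    """Split a planned number of tests across test types.

    Counts are rounded to whole tests, so they may not sum exactly to the plan.
    """
    ratios = RATIOS[project_type]
    return {kind: round(planned_tests * pct / 100) for kind, pct in ratios.items()}


def drift(project_type: str, actual: dict) -> dict:
    """Percentage-point gap between the actual test mix and the recommended one."""
    total = sum(actual.values())
    if total == 0:
        raise ValueError("no tests to compare")
    ratios = RATIOS[project_type]
    return {
        kind: round(100 * actual.get(kind, 0) / total - pct, 1)
        for kind, pct in ratios.items()
    }
```

For example, 200 planned tests for a fullstack web app split into 80 unit, 50 integration, 50 component, and 20 E2E tests. A large positive `drift` value for E2E signals an inverted pyramid.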

This adaptive model reconciles the classic testing pyramid with the Testing Trophy (Kent C. Dodds), which argues that integration tests deliver the most confidence per dollar invested. Both models are valid. An API-heavy backend leans toward the pyramid; a frontend-heavy SPA leans toward the trophy. Treat the ratios above as guidelines, not rules: the right mix depends on where the project's complexity lives.

Decision Framework

For each piece of code in the project, use this decision tree to determine the right test type:

Is it a pure function with no dependencies? → Unit test. Fast, isolated, test inputs and outputs.

Does it coordinate multiple internal modules? → Integration test. Test the modules working together with real (or realistic) dependencies.

Does it call an external service, database, or API? → Integration test with mocked external boundaries. The external service is mocked, but internal logic runs for real.

Is it a UI component with user interaction? → Component test. Render the component, simulate user actions, assert on visible output. Don't test implementation details.

Is it a critical end-to-end user flow? → E2E test. Only for flows where failure means business impact: authentication, checkout, data submission, onboarding.

Is it glue code, configuration, or trivial logic? → Don't test it. Testing configuration files, simple getters, or framework boilerplate adds cost without value.
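
The questions above can be sketched as a small classifier. The `CodeUnit` flags and the order in which they are checked are assumptions made for illustration: the flags are filled in by whoever analyzes the code, and the function returns only the primary test type even though some code warrants more than one.

```python
from dataclasses import dataclass


@dataclass
class CodeUnit:
    """Hypothetical description of a piece of code, filled in during analysis."""
    name: str
    is_trivial: bool = False           # glue code, configuration, simple getters
    is_critical_flow: bool = False     # authentication, checkout, onboarding, ...
    is_ui_component: bool = False      # interactive, user-facing component
    calls_external: bool = False       # external service, database, or API
    coordinates_modules: bool = False  # orchestrates several internal modules


def choose_test_type(unit: CodeUnit) -> str:
    """Map a code unit to its primary test type (one possible ordering)."""
    if unit.is_trivial:
        return "none"
    if unit.is_critical_flow:
        return "e2e"
    if unit.is_ui_component:
        return "component"
    if unit.calls_external:
        return "integration (external boundaries mocked)"
    if unit.coordinates_modules:
        return "integration"
    # No dependencies flagged: treat it as a pure function.
    return "unit"
```

Checking "trivial" first keeps configuration and boilerplate out of the plan before any effort is spent classifying them.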

Read references/test-type-guide.md for detailed criteria, examples, and edge cases for each test type.

Coverage Prioritization

When time is limited (it always is), test in this order:

Priority 1 — Security and data integrity:

Authentication and authorization logic
Data mutation handlers (create, update, delete)
Input validation (especially server-side)
Payment or financial calculations

Priority 2 — Core business logic:

Domain rules and calculations
Data transformations
State machines and workflow logic
Complex conditional logic

Priority 3 — Integration boundaries:

API endpoint contracts (request/response shapes)
Database queries for critical operations
External service interactions

Priority 4 — User-facing behavior:

Interactive UI components (forms, modals, wizards)
Error states and loading states
Accessibility-critical interactions

Priority 5 — Utilities and edge cases:

Helper functions and formatters
Edge cases in already-tested code
Error message accuracy
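
The priority order can be applied mechanically once each test target has been assigned a category. A minimal sketch, assuming a hypothetical `Target` record and category keys that mirror the five priority levels above:

```python
from dataclasses import dataclass

# Priority levels from the list above; a lower number means test sooner.
PRIORITY = {
    "security": 1,     # auth, mutation handlers, validation, payments
    "business": 2,     # domain rules, transformations, state machines
    "integration": 3,  # API contracts, critical queries, external services
    "ui": 4,           # forms, modals, error and loading states
    "utility": 5,      # helpers, formatters, remaining edge cases
}


@dataclass(frozen=True)
class Target:
    path: str       # hypothetical file path
    category: str   # one of the PRIORITY keys
    tested: bool    # whether meaningful tests already exist


def plan(targets: list[Target]) -> list[Target]:
    """Untested targets ordered by priority; ties are broken by path for stable output."""
    untested = [t for t in targets if not t.tested]
    return sorted(untested, key=lambda t: (PRIORITY[t.category], t.path))
```

Taking the first few entries of `plan(...)` gives a candidate list of quick wins.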

Read references/coverage-strategy.md for coverage targets, metrics guidance, and how to assess existing coverage gaps.

Mocking Strategy

Mocking is necessary but dangerous. Over-mocking means testing mocks instead of code. Under-mocking means slow, flaky tests.

Mock at the boundary, not inside the unit:

Mock external services (APIs, databases, file systems, email providers)
Mock time-dependent operations (dates, timers, randomness)
Don't mock the code you're testing
Don't mock internal collaborators unless they're expensive to set up
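
As an illustration of mocking at the boundary, the sketch below replaces only the external API client and pins the clock, while the calculation itself runs for real. The `trial_days_left` function and the `fetch_trial_start` method are hypothetical names invented for this example; `Mock` comes from Python's standard `unittest.mock` module.

```python
from datetime import datetime, timedelta, timezone
from unittest.mock import Mock

TRIAL_LENGTH = timedelta(days=14)


def trial_days_left(user_id, user_api, now):
    """Hypothetical code under test: the internal logic runs for real."""
    started = user_api.fetch_trial_start(user_id)  # external boundary
    remaining = (started + TRIAL_LENGTH) - now
    return max(remaining.days, 0)


def test_trial_days_left_counts_down():
    # Arrange: mock only the external API and pass in a fixed "now".
    user_api = Mock()
    user_api.fetch_trial_start.return_value = datetime(2024, 1, 1, tzinfo=timezone.utc)
    now = datetime(2024, 1, 10, tzinfo=timezone.utc)
    # Act
    days = trial_days_left("u1", user_api, now)
    # Assert: the trial ends on Jan 15, so 5 days remain.
    assert days == 5


def test_trial_days_left_never_goes_negative():
    user_api = Mock()
    user_api.fetch_trial_start.return_value = datetime(2024, 1, 1, tzinfo=timezone.utc)
    now = datetime(2024, 2, 1, tzinfo=timezone.utc)
    assert trial_days_left("u1", user_api, now) == 0
```

Passing `now` as an argument rather than patching the clock is a design choice: it removes the time dependency without any mocking library at all.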

Use the lightest mock that works:

Stub — Returns predetermined data. Use when you need to control what a dependency returns.
Spy — Records calls. Use when you need to verify a dependency was called correctly.
Fake — Working implementation with shortcuts. Use for databases (in-memory DB) or APIs (local server).
Full mock — Replaces everything. Use only when nothing lighter works.
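
The difference between these test doubles is easiest to see side by side. In the sketch below, `register`, `FakeUserStore`, and the `send_welcome` method are hypothetical. Note that Python's `Mock` object can act as either a stub (via `return_value`) or a call recorder; a strict spy would wrap a real object, for example with `Mock(wraps=...)`.

```python
from unittest.mock import Mock


class FakeUserStore:
    """Fake: a working in-memory stand-in for a database-backed store."""

    def __init__(self):
        self._rows = {}

    def save(self, user_id, email):
        self._rows[user_id] = email

    def get(self, user_id):
        return self._rows.get(user_id)


def register(user_id, email, store, mailer):
    """Hypothetical code under test."""
    if store.get(user_id) is not None:
        return False
    store.save(user_id, email)
    mailer.send_welcome(email)
    return True


def test_register_sends_welcome_once():
    store = FakeUserStore()  # fake: real behavior, no database
    mailer = Mock()          # used as a spy: records how it was called
    assert register("u1", "a@example.com", store, mailer) is True
    assert register("u1", "a@example.com", store, mailer) is False
    mailer.send_welcome.assert_called_once_with("a@example.com")


def test_register_rejects_existing_user():
    store = Mock()
    store.get.return_value = "taken@example.com"  # stub: predetermined answer
    mailer = Mock()
    assert register("u1", "a@example.com", store, mailer) is False
    store.save.assert_not_called()
```

The fake lets the first test exercise the real duplicate check across two calls, which a stub returning a fixed value could not do.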

Treat test difficulty as design feedback: if a function is hard to test because it has too many dependencies, the problem is usually the design, not the tests. Consider refactoring the code rather than adding more mocks.
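
One common refactoring that follows from this feedback is to pull the decision out of the I/O. The example below is hypothetical: the commented "before" version is invented to show the problem, and `needs_reminder` is one possible shape for the extracted rule.

```python
from datetime import date

# Before (hypothetical): the rule is buried in I/O, so a test would need
# a database, a clock, and a mail server, or a mock for each.
#
# def send_renewal_reminders():
#     for sub in db.load_subscriptions():
#         if (sub.expires - date.today()).days <= 7 and not sub.auto_renew:
#             mailer.send(sub.email, "Your plan expires soon")


def needs_reminder(expires: date, auto_renew: bool, today: date, window_days: int = 7) -> bool:
    """After: the decision is a pure function, testable with plain values."""
    days_left = (expires - today).days
    return not auto_renew and 0 <= days_left <= window_days


def test_reminder_is_due_inside_the_window():
    assert needs_reminder(date(2024, 3, 10), False, today=date(2024, 3, 5)) is True


def test_no_reminder_when_auto_renew_is_on():
    assert needs_reminder(date(2024, 3, 10), True, today=date(2024, 3, 5)) is False
```

The remaining I/O shell (load, loop, send) becomes thin enough to cover with a single integration test.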

Anti-Patterns

Read references/anti-patterns.md for detailed descriptions and fixes. The most critical ones:

Testing implementation, not behavior — Tests break when you refactor, even though behavior didn't change
Over-mocking — Tests pass but code is broken because mocks hide real bugs
Shared mutable state — Tests pass individually but fail when run together
Slow tests — External calls in unit tests, no test isolation
Testing trivial code — Wasting effort on getters, setters, and configuration

Output Modes

This skill operates in two modes:

Strategy Mode (default)

When the user asks for a testing strategy, analysis, or recommendations, produce a strategy document that includes:

Detected stack and testing setup
Current coverage assessment (if tests exist)
Testing pyramid recommendation with ratios for this project
Prioritized test plan — what to test, in what order, with which test type
Mocking recommendations — what to mock and how
Suggested testing framework and tools (if not already in place)
Quick wins — 3-5 highest-impact tests to write first

Generation Mode

When the user asks to generate, write, or create tests, switch to generation mode:

Read the strategy (generate one first if none exists)
Identify the target code to test
Determine the appropriate test type from the decision framework
Generate tests following the project's existing conventions (naming, file location, framework)
Apply the AAA pattern (Arrange-Act-Assert) for every test
Include happy path, error cases, and critical edge cases
Use the mocking strategy from the recommendations
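
A generated test file following these steps might look like the sketch below, written with Python's standard `unittest` module purely for illustration; in a real project the framework and naming follow whatever the project already uses. The `parse_quantity` function is a hypothetical piece of code under test.

```python
import unittest


def parse_quantity(raw: str) -> int:
    """Hypothetical code under test: parse a positive order quantity from form input."""
    value = int(raw.strip())  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError("quantity must be positive")
    return value


class ParseQuantityTest(unittest.TestCase):
    def test_returns_integer_for_valid_input(self):  # happy path
        # Arrange
        raw = "3"
        # Act
        result = parse_quantity(raw)
        # Assert
        self.assertEqual(result, 3)

    def test_rejects_non_numeric_input(self):  # error case
        raw = "three"
        with self.assertRaises(ValueError):
            parse_quantity(raw)

    def test_rejects_zero(self):  # edge case: boundary of the valid range
        raw = "0"
        with self.assertRaises(ValueError):
            parse_quantity(raw)

    def test_ignores_surrounding_whitespace(self):  # edge case: messy input
        raw = "  7 "
        result = parse_quantity(raw)
        self.assertEqual(result, 7)
```

Each test name states the expected behavior, so a failure report reads as a description of what broke.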

If the project has no test conventions yet, place test files adjacent to the code they test and use the framework's standard naming convention.

Critical Rules

Strategy before code. Always understand what needs testing before writing tests. A strategy document, even a brief one, prevents wasted effort.
Behavior over implementation. Tests should verify what code does, not how it does it. If a refactor breaks tests without changing behavior, the tests are wrong.
One reason to fail. Each test should fail for exactly one reason. If a test has multiple assertions testing different behaviors, split it.
Tests are documentation. A well-named test tells the next developer what the code is supposed to do. Invest in test names.
Don't chase 100%. 100% coverage is not the goal. Meaningful coverage of critical paths is worth more than exhaustive coverage of trivial code.
Adapt to the stack. Use the project's native testing vocabulary. If it's pytest, say "fixtures" not "beforeEach". If it's JUnit, say "@BeforeEach" not "setup".
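
The "one reason to fail" and "tests are documentation" rules reinforce each other, as this sketch suggests. The `apply_discount` function and the `SAVE10` code are hypothetical; the point is that three behaviors get three named tests instead of one test with three unrelated assertions.

```python
def apply_discount(total: float, code: str) -> float:
    """Hypothetical code under test."""
    if code == "SAVE10":
        return round(total * 0.9, 2)
    if code:
        raise ValueError(f"unknown code: {code}")
    return total


# Instead of a single test_apply_discount() asserting all three behaviors,
# each behavior gets its own test whose name documents the expectation.
def test_save10_takes_ten_percent_off():
    assert apply_discount(200.0, "SAVE10") == 180.0


def test_empty_code_leaves_total_unchanged():
    assert apply_discount(200.0, "") == 200.0


def test_unknown_code_is_rejected():
    try:
        apply_discount(200.0, "BOGUS")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an unknown code")
```

When one of these fails, the test name alone identifies which rule was broken.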

Edge Cases

No tests exist yet: Start with the strategy document. Recommend a testing framework. Identify the 5 highest-priority test targets. Generate those first.
Tests exist but no strategy: Analyze existing tests for patterns, coverage gaps, and anti-patterns. Build the strategy around what exists.
Microservices: Each service gets its own strategy. Tests that exercise several deployed services together count as E2E; checks of a single service's API contract stay at the integration level.
Legacy code without dependency injection: Acknowledge that some code is hard to test without refactoring first. Recommend testing at the integration level and refactoring incrementally.
Project with only E2E tests: This is an inverted pyramid. The strategy should propose adding unit and integration tests to reduce reliance on slow E2E tests.
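
For the legacy-code case, one way to test at the integration level before refactoring is a characterization test: record what the code currently does and assert that it stays that way. This is a sketch under assumptions; `legacy_report` stands in for real legacy code, and `capture` is a small helper invented for the example.

```python
import io
from contextlib import redirect_stdout


def legacy_report(orders):
    """Stand-in for hard-to-test legacy code: computes and prints in one step."""
    total = 0
    for order in orders:
        total += order["qty"] * order["price"]
        print(f"{order['sku']}: {order['qty']} x {order['price']:.2f}")
    print(f"TOTAL {total:.2f}")


def capture(func, *args):
    """Run func and return everything it printed to standard output."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func(*args)
    return buffer.getvalue()


def test_legacy_report_output_is_unchanged():
    orders = [
        {"sku": "A1", "qty": 2, "price": 9.5},
        {"sku": "B2", "qty": 1, "price": 3.0},
    ]
    # Expected text was recorded from the current behavior, not from a spec.
    expected = "A1: 2 x 9.50\nB2: 1 x 3.00\nTOTAL 22.00\n"
    assert capture(legacy_report, orders) == expected
```

With this safety net in place, the calculation can be extracted into a pure function and given unit tests, after which the characterization test can be retired or narrowed.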