Writing Tests
Core Philosophy: Test user-observable behavior with real dependencies. Tests should survive refactoring when behavior is unchanged.
Iron Laws:
Testing Trophy Model
Write tests in this priority order:
- Integration Tests (PRIMARY) - Multiple units with real dependencies
- E2E Tests (SECONDARY) - Complete workflows across the stack
- Unit Tests (RARE) - Pure functions only (no dependencies)
Default to integration tests. Only drop to unit tests for pure utility functions.
Pre-Test Workflow
BEFORE writing any tests, copy this checklist and track your progress:
Test Writing Progress:
- Step 1: Review project standards (check existing tests)
- Step 2: Understand behavior (what should it do? what can fail?)
- Step 3: Choose test type (Integration/E2E/Unit)
- Step 4: Identify dependencies (real vs mocked)
- Step 5: Write failing test first (TDD)
- Step 6: Implement minimal code to pass
- Step 7: Verify coverage (happy path, errors, edge cases)
Before writing any tests:
- Review project standards - Check existing test files, testing docs, or project conventions
- Understand behavior - What should this do? What can go wrong?
- Choose test type - Integration (default), E2E (critical workflows), or Unit (pure functions)
- Identify dependencies - What needs to be real vs mocked?
Test Type Decision
Is this a complete user workflow? → YES: E2E test
Is this a pure function (no side effects/dependencies)? → YES: Unit test
Everything else: → Integration test (with real dependencies)
Mocking Guidelines
Default: Don't mock. Use real dependencies.
Only Mock These
- External HTTP/API calls
- Time-dependent operations (timers, dates)
- Randomness (random numbers, UUIDs)
- File system I/O
- Third-party services (payments, analytics, email)
- Network boundaries
Never Mock These
- Internal modules/packages
- Database queries (use test database)
- Business logic
- Data transformations
- Your own code calling your own code
Why: Mocking internal dependencies creates brittle tests that break during refactoring.
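
As an illustration of mocking only at the allowed boundaries, here is a minimal Python sketch that pins time and randomness while leaving the code under test untouched. The function `new_session` and its record shape are hypothetical, invented for this example; `unittest.mock.patch` is the standard-library patcher.

```python
import time
import uuid
from unittest import mock


def new_session(ttl_seconds=3600):
    """Hypothetical function under test: builds a session record."""
    now = time.time()
    return {"id": str(uuid.uuid4()), "expires_at": now + ttl_seconds}


def test_new_session_expires_after_ttl():
    # Arrange: pin only the non-deterministic boundaries (clock and UUIDs).
    fixed_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
    with mock.patch("time.time", return_value=1_000.0), \
         mock.patch("uuid.uuid4", return_value=fixed_id):
        # Act: run the real function; none of its own logic is mocked.
        session = new_session(ttl_seconds=60)

    # Assert: check the observable result, not how it was computed.
    assert session == {
        "id": "12345678-1234-5678-1234-567812345678",
        "expires_at": 1_060.0,
    }
```

The patches replace the clock and the UUID generator only, so the test stays deterministic without hiding any business logic.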
Before Mocking, Ask:
- "What side effects does this method have?"
- "Does my test depend on those side effects?"
- If yes → Mock at a lower level (the slow/external operation, not the method the test needs)
- Unsure? → Run with the real implementation first, observe what's needed, THEN add minimal mocking
Mock Red Flags
- "I'll mock this to be safe"
- "This might be slow, better mock it"
- Can't explain why mock is needed
- Mock setup longer than test logic
- Test fails when removing mock
Integration Test Pattern
    describe("Feature Name", () => {
      setup(initialState)

      test("should produce expected output when action is performed", () => {
        // Arrange: Set up preconditions
        // Act: Perform the action being tested
        // Assert: Verify observable output
      })
    })
Key principles:
- Use real state/data, not mocks
- Assert on outputs users/callers can observe
- Test the behavior, not the implementation
For language-specific patterns, see the Language-Specific Patterns section.
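
The generic pattern above could look like this in Python. This is a sketch only: `Catalog` and `Cart` are hypothetical classes invented for the example, and prices are integer cents to avoid floating-point noise. The point is that two real units are wired together and the test asserts only on the total a caller would see.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical price lookup; a real collaborator, not a mock."""
    prices: dict

    def price_of(self, sku):
        return self.prices[sku]


@dataclass
class Cart:
    """Hypothetical unit under test that depends on a real Catalog."""
    catalog: Catalog
    items: list = field(default_factory=list)

    def add(self, sku, quantity=1):
        self.items.append((sku, quantity))

    def total_cents(self):
        return sum(self.catalog.price_of(sku) * qty for sku, qty in self.items)


def test_cart_total_reflects_catalog_prices():
    # Arrange: real catalog and real cart, no mocks.
    catalog = Catalog(prices={"pen": 250, "pad": 100})
    cart = Cart(catalog=catalog)

    # Act: perform the actions a caller would perform.
    cart.add("pen", quantity=2)
    cart.add("pad")

    # Assert: verify the observable output only.
    assert cart.total_cents() == 600
```

If `Cart` later caches prices or stores items differently, this test keeps passing as long as the total is unchanged.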
Async Waiting Patterns
When tests involve async operations, avoid arbitrary timeouts:
    // BAD: Guessing at timing
    sleep(500)
    assert result == expected

    // GOOD: Wait for the actual condition
    wait_for(lambda: result == expected)
When to use condition-based waiting:
- Tests use sleep, setTimeout, or arbitrary delays
- Tests are flaky (pass locally, fail in CI)
- Tests time out when run in parallel
- Waiting for async operations to complete
Delegate to skill: When you encounter these patterns, invoke Skill(ce:condition-based-waiting) for detailed guidance on implementing proper condition polling and fixing flaky tests.
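
The `wait_for` helper used in the example above is not defined in this document. One possible implementation is sketched here, assuming a simple polling loop; the `timeout` and `interval` parameters and their defaults are assumptions, not part of any referenced skill.

```python
import threading
import time


def wait_for(condition, timeout=2.0, interval=0.01):
    """Poll `condition` until it returns truthy or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


def test_background_job_publishes_result():
    # Arrange: a hypothetical background job that finishes at some later point.
    state = {"result": None}
    job = threading.Timer(0.05, lambda: state.update(result="done"))

    # Act: start the job, then wait for the condition rather than a fixed delay.
    job.start()
    wait_for(lambda: state["result"] == "done")

    # Assert: the observable result is in place.
    assert state["result"] == "done"
    job.join()
```

The test finishes as soon as the condition holds and fails with a clear `TimeoutError` if it never does, which a fixed sleep cannot offer.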
Assertion Strategy
Principle: Assert on observable outputs, not internal state.
| Context | Assert On | Avoid |
|---------|-----------|-------|
| UI | Visible text, accessibility roles, user-visible state | CSS classes, internal state, test IDs |
| API | Response body, status code, headers | Internal DB state directly |
| CLI | stdout/stderr, exit code | Internal variables |
| Library | Return values, documented side effects | Private methods, internal state |
Why: Tests that assert on implementation details break when you refactor, even if behavior is unchanged.
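
To make the CLI row concrete, the sketch below runs a stand-in command as a subprocess and asserts only on stdout, stderr, and the exit code. The `greet` script is a hypothetical one-file CLI invented for this example; a real project would invoke its own entry point instead.

```python
import subprocess
import sys

# Hypothetical CLI used only for illustration.
GREET_SCRIPT = """
import sys
if len(sys.argv) != 2:
    sys.exit("usage: greet NAME")
print(f"hello {sys.argv[1]}")
"""


def run_greet(*args):
    """Run the stand-in CLI and capture what a user would see."""
    return subprocess.run(
        [sys.executable, "-c", GREET_SCRIPT, *args],
        capture_output=True,
        text=True,
        timeout=30,
    )


def test_greet_prints_greeting():
    result = run_greet("ada")
    assert result.returncode == 0
    assert result.stdout.strip() == "hello ada"


def test_greet_without_name_reports_usage():
    result = run_greet()
    assert result.returncode != 0
    assert "usage" in result.stderr
```

Neither test inspects variables inside the script, so the CLI can be restructured freely as long as its output and exit codes stay the same.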
Test Data Management
Use source constants and fixtures, not hard-coded values:
    // Good - References actual constant or fixture
    expected_message = APP_MESSAGES.SUCCESS
    assert response.message == expected_message

    // Bad - Hard-coded, breaks when copy changes
    assert response.message == "Action completed successfully!"
Why: When product copy changes, you want one place to update, not every test file.
Anti-Patterns to Avoid
Testing Mock Behavior
    // BAD: Testing that the mock was called, not real behavior
    mock_service.assert_called_once()

    // GOOD: Test the actual outcome
    assert user.is_active == True
    assert len(sent_emails) == 1
Gate: Before asserting on mock calls, ask "Am I testing real behavior or mock interactions?" If testing mocks → Stop, test the actual outcome instead.
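
A runnable version of the "GOOD" case might look like the following sketch. `User`, `InMemoryMailer`, and `activate_user` are hypothetical names; the in-memory mailer stands in for the third-party email service (an allowed boundary) and records what was sent, so the test can assert on outcomes rather than on call counts.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    email: str
    is_active: bool = False


@dataclass
class InMemoryMailer:
    """Hypothetical stand-in for an email service; keeps an inspectable outbox."""
    sent_emails: list = field(default_factory=list)

    def send(self, to, subject):
        self.sent_emails.append({"to": to, "subject": subject})


def activate_user(user, mailer):
    """Hypothetical production code under test."""
    user.is_active = True
    mailer.send(to=user.email, subject="Welcome")


def test_activation_enables_user_and_sends_welcome_email():
    user = User(email="ada@example.com")
    mailer = InMemoryMailer()

    activate_user(user, mailer)

    # Outcomes a caller can observe, not mock interactions.
    assert user.is_active
    assert len(mailer.sent_emails) == 1
    assert mailer.sent_emails[0]["to"] == "ada@example.com"
```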
Test-Only Methods in Production
    // BAD: destroy() only used in tests - pollutes production code
    class Session:
        def destroy(self):  # Only exists for test cleanup
            ...

    // GOOD: Test utilities handle cleanup
    # In test_utils.py
    def cleanup_session(session):
        # Access internals here, not in production code
        ...
Gate: Before adding methods to production code, ask "Is this only for tests?" Yes → Put in test utilities.
Mocking Without Understanding
    // BAD: Mock prevents side effect test actually needs
    mock(database.save)  # Now duplicate detection won't work!

    add_item(item)
    add_item(item)  # Should fail as duplicate, but won't

    // GOOD: Mock at correct level
    mock(external_api.validate)  # Mock slow external call only

    add_item(item)  # DB save works, duplicate detected
    add_item(item)  # Fails correctly
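
The "GOOD" case above can be sketched in runnable Python as follows. It assumes an in-memory SQLite database serves as the test database and that duplicate detection relies on a primary-key constraint; `ExternalValidator`, `DuplicateItemError`, and this `add_item` signature are hypothetical.

```python
import sqlite3
from unittest import mock


class ExternalValidator:
    """Hypothetical client for a slow third-party validation API."""

    def validate(self, name):
        raise RuntimeError("would hit the network")


class DuplicateItemError(Exception):
    pass


def add_item(db, validator, name):
    """Hypothetical code under test: validate externally, then save for real."""
    if not validator.validate(name):
        raise ValueError(f"invalid item: {name}")
    try:
        with db:  # commits on success, rolls back on error
            db.execute("INSERT INTO items (name) VALUES (?)", (name,))
    except sqlite3.IntegrityError as exc:
        raise DuplicateItemError(name) from exc


def test_adding_same_item_twice_is_rejected():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (name TEXT PRIMARY KEY)")
    validator = ExternalValidator()

    # Mock only the slow external call; the database save stays real.
    with mock.patch.object(validator, "validate", return_value=True):
        add_item(db, validator, "widget")
        try:
            add_item(db, validator, "widget")
            rejected = False
        except DuplicateItemError:
            rejected = True

    assert rejected
    assert db.execute("SELECT COUNT(*) FROM items").fetchone()[0] == 1
```

Had the save been mocked instead, the second call would succeed silently and the test would pass without exercising duplicate detection at all.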
Incomplete Mocks
    // BAD: Partial mock - missing fields downstream code needs
    mock_response = {
      status: "success",
      data: {...}
      // Missing: metadata.request_id that downstream code uses
    }

    // GOOD: Mirror real API completely
    mock_response = {
      status: "success",
      data: {...},
      metadata: {request_id: "...", timestamp: ...}
    }
Gate: Before creating mocks, check "What does the real thing return?" Include ALL fields.
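
One way to keep mocks complete is to build every mock response from a single factory that mirrors the real payload. This is a sketch under assumptions: the field values, the contents of `data`, and the `summarize` consumer are placeholders invented for the example, since the real API shape is not given here.

```python
import copy

# Placeholder payload; in practice, copy the shape from a recorded real response.
_COMPLETE_RESPONSE = {
    "status": "success",
    "data": {"items": []},
    "metadata": {"request_id": "req-test-1", "timestamp": "2024-01-01T00:00:00Z"},
}


def make_api_response(**overrides):
    """Return a full response, changing only the top-level fields a test cares about."""
    response = copy.deepcopy(_COMPLETE_RESPONSE)
    response.update(overrides)
    return response


def summarize(response):
    """Hypothetical downstream code that needs metadata.request_id."""
    return f"{response['status']} ({response['metadata']['request_id']})"


def test_summary_includes_request_id_for_errors():
    response = make_api_response(status="error")
    assert summarize(response) == "error (req-test-1)"
```

Because each test overrides only what it needs, fields such as `metadata.request_id` are always present and downstream code does not fail on a missing key.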
TDD Prevents Anti-Patterns
- Write test first → Think about what you're testing (not mocks)
- Watch it fail → Confirms the test exercises real behavior
- Minimal implementation → No test-only methods creep in
- Real dependencies first → See what the test needs before mocking
If you are testing mock behavior, you violated TDD: you added mocks without watching the test fail against real code.
Language-Specific Patterns
For detailed framework and language-specific patterns:
- JavaScript/React: See references/javascript-react.md for React Testing Library queries, Jest/Vitest setup, Playwright E2E, and component testing patterns
- Python: See references/python.md for pytest fixtures, polyfactory, respx mocking, testcontainers, and FastAPI testing
- Go: See references/go.md for table-driven tests, testify/go-cmp assertions, testcontainers-go, and interface fakes
Quality Checklist
Before completing tests, verify:
- Happy path covered
- Error conditions handled
- Edge cases considered
- Real dependencies used (minimal mocking)
- Async waiting uses conditions, not arbitrary timeouts
- Tests survive refactoring (no implementation details)
- No test-only methods added to production code
- No assertions on mock existence or call counts
- Test names describe behavior, not implementation
What NOT to Test
- Internal state
- Private methods
- Function call counts
- Implementation details
- Mock existence
- Framework internals
Test behavior users/callers observe, not code structure.
Quick Reference
| Test Type | When | Dependencies |
|-----------|------|--------------|
| Integration | Default choice | Real (test DB, real modules) |
| E2E | Critical user workflows | Real (full stack) |
| Unit | Pure functions only | None |

| Anti-Pattern | Fix |
|--------------|-----|
| Testing mock existence | Test actual outcome instead |
| Test-only methods in production | Move to test utilities |
| Mocking without understanding | Understand dependencies, mock minimally |
| Incomplete mocks | Mirror real API completely |
| Tests as afterthought | TDD - write tests first |
| Arbitrary timeouts/sleeps | Use condition-based waiting |