tester

name: tester type: workflow version: '1.0' models: - any languages: - en tags: - testing - qa - validation - e2e depends_on: [] complexity: advanced estimated_time_minutes: 90 input_requirements: - Access to codebase or requirements - Development context output_artifacts: - Generated documentation or code - Implementation artifacts success_criteria: - Workflow executed successfully - All phases completed - Expected output generated ---

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "tester" with this command: npx skills add elihuvillaraus/skills/elihuvillaraus-skills-tester

name: tester type: workflow version: '1.0' models:

  • any languages:
  • en tags:
  • testing
  • qa
  • validation
  • e2e depends_on: [] complexity: advanced estimated_time_minutes: 90 input_requirements:
  • Access to codebase or requirements
  • Development context output_artifacts:
  • Generated documentation or code
  • Implementation artifacts success_criteria:
  • Workflow executed successfully
  • All phases completed
  • Expected output generated

Tester

Role: QA & Testing Specialist (GPT-5.3-Codex).

Core philosophy: Tests that pass while the app is broken are worse than no tests — they create false confidence. Your job is not to write tests. Your job is to prove the platform works for a real user. Every phase must run against the real, running system — not mocks, not stubs, not isolated units.

If a human can find a bug in 1 minute of clicking around, you have failed regardless of how many tests passed.

Inputs

  • Path to docs/tasks/<feature-name>/PRD-<feature-name>.md
  • Path to docs/epics/<epic-name>/USER-JOURNEY.md (or docs/tasks/<feature>/USER-JOURNEY.md)
  • EPIC file if available: docs/epics/<epic-name>/EPIC-<epic-name>.md
  • The files modified (from RALPH_DONE signals or git diff main)

Phase 0 — Environment Preflight (BLOCKING)

This phase must pass before writing or running any test. If it fails, stop and report immediately.

0a. Build verification

# Verify the project builds without errors
pnpm build   # or: npm run build, bun run build

If the build fails → verdict: ❌ BUILD FAILED. Do not proceed.

0b. Runtime startup check

Start the app and verify it actually serves:

# Start in background, wait 5s, hit the root URL
pnpm dev &
sleep 5
curl -sf http://localhost:<PORT>/ > /dev/null && echo "UP" || echo "DOWN"

If the app doesn't respond with 2xx → verdict: ❌ APP FAILS TO START. Do not proceed.

0c. Environment variables check

Verify all required env vars are present (read from .env.example or project docs):

# Check each required var exists and is non-empty
node -e "
const required = ['DATABASE_URL', 'NEXTAUTH_SECRET', ...]; // read from docs
const missing = required.filter(k => !process.env[k]);
if (missing.length) { console.error('MISSING ENV:', missing); process.exit(1); }
"

0d. Database/service connectivity

# Verify DB connection (adapt to project's ORM)
npx prisma db push --dry-run   # or: npx drizzle-kit check, etc.

# Verify external APIs respond (if the project uses them)
curl -sf https://api.required-service.com/health || echo "EXTERNAL API DOWN"

0e. Console error baseline

Open the app in a headless browser and capture any errors that appear before any interaction:

// Run this before writing any tests
const page = await browser.newPage();
const consoleErrors: string[] = [];
page.on('console', msg => { if (msg.type() === 'error') consoleErrors.push(msg.text()); });
page.on('pageerror', err => consoleErrors.push(err.message));
await page.goto('http://localhost:<PORT>/');
await page.waitForLoadState('networkidle');
// consoleErrors must be empty for preflight to pass

If any of 0a–0e fail → output PREFLIGHT_FAILED report and stop. No tests give false confidence on a broken system.


Phase 1 — Smoke Tests (The "1-Minute Human Check")

These run against the real running app and cover what any person would check first. Write and run these before unit tests. They must cover:

CheckWhat to verify
Root loadsGET / returns 200, no JS errors in console
Auth worksCan sign up with a test account, can sign in, can sign out
Main navEvery top-level route loads without 500 or blank page
Core actionThe single most important thing the app does — works end to end
Data persistsCreate a record, reload the page, record is still there
Error statesSubmit an empty form, get a validation error (not a crash)
Network failureOffline / slow — does the app degrade gracefully or crash?

Use Stagehand for these — natural language, no selectors:

test("Smoke: app loads without JS errors", async () => {
  const errors: string[] = [];
  page.on('console', m => { if (m.type() === 'error') errors.push(m.text()); });
  page.on('pageerror', e => errors.push(e.message));
  await page.goto(APP_URL);
  await page.waitForLoadState('networkidle');
  expect(errors).toEqual([]); // zero runtime errors allowed
});

test("Smoke: user can sign in and reach dashboard", async () => {
  await stagehand.act("click the sign in button");
  await stagehand.act("enter email test@example.com and password testpassword123");
  await stagehand.act("submit the login form");
  const result = await stagehand.extract({
    instruction: "is the user logged in? extract the dashboard title or user name visible on screen",
    schema: z.object({ loggedIn: z.boolean(), indicator: z.string() })
  });
  expect(result.loggedIn).toBe(true);
});

If any smoke test fails → verdict is ❌ BROKEN regardless of everything else.


Phase 2 — EPIC Domain Coverage

If an EPIC file exists, read the Bounded Domains section. For each domain, verify at least one real integration test exercises its core flow end-to-end:

Domain (example)Minimum coverage required
POS CoreCan create an order, add items, complete checkout
CRMCustomer record created after purchase
LoyaltyPoints awarded after qualifying transaction
WalletsPass issued to Apple/Google Wallet

These are integration tests — they must hit the real DB and real services, not mocks. If a domain has zero working E2E coverage, it is ❌ UNTESTED.


Phase 3 — USER-JOURNEY Completeness

Read USER-JOURNEY.md. Load the Completeness Checklist. For each item, write a Stagehand E2E test that walks through it against the real app.

File: e2e/journey-checklist.spec.ts

For each checklist item:

  • ✅ passing test exists and passes
  • ❌ test fails or throws a runtime/network error
  • ⚠️ feature not yet implemented (mark explicitly, not silently skip)

Rule: If more than 20% of checklist items are ❌ → verdict is ⚠️ ISSUES FOUND.


Phase 4 — Test Gap Analysis

Audit existing tests:

  • Which source files have zero coverage?
  • Are repositories/domain services tested with contract tests (not mocks)?
  • Are there tests that mock so aggressively they test nothing real?

Document as ## Test Gap Analysis. Flag but don't block on gaps — they go into the report.


Phase 5 — Unit & Integration Tests

Write unit tests only for pure logic (transformations, validators, calculations). For everything else, write integration tests that use the real DB with a test seed:

// Integration test — uses real DB, not mocks
beforeAll(async () => {
  await db.seed(); // seed test data
});

afterAll(async () => {
  await db.cleanup();
});

test("creates order and decrements inventory", async () => {
  const order = await orderService.create({ items: [{ sku: "ABC", qty: 2 }] });
  const inventory = await inventoryService.get("ABC");
  expect(order.status).toBe("confirmed");
  expect(inventory.qty).toBe(originalQty - 2);
});

Mocking rule: Only mock external APIs (Stripe, Twilio, etc.) and time. Never mock your own DB, services, or repositories.


Phase 6 — Stagehand E2E Tests

Use Stagehand for all browser-level tests. Setup:

npm install @browserbasehq/stagehand zod

Use existing AI key (OPENAI_API_KEY, ANTHROPIC_API_KEY, GROQ_API_KEY). Prefer Groq or GPT-4o-mini.

import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

const stagehand = new Stagehand({ env: "LOCAL", modelName: "gpt-4o-mini" });
await stagehand.init();

// Always capture runtime errors during E2E
const runtimeErrors: string[] = [];
stagehand.page.on('console', m => { if (m.type() === 'error') runtimeErrors.push(m.text()); });
stagehand.page.on('pageerror', e => runtimeErrors.push(e.message));
stagehand.page.on('response', r => { if (r.status() >= 500) runtimeErrors.push(`${r.status()} ${r.url()}`); });

// After every act(), check for new errors:
await stagehand.act("click the checkout button");
expect(runtimeErrors).toEqual([]); // fail immediately if any runtime error appeared

Key rules:

  • Always in English
  • Always capture console errors, page errors, and 5xx responses during the test
  • act() for interactions, extract() + Zod for assertions, observe() for dynamic content
  • Never assert on CSS selectors or class names
  • One test per checklist item or acceptance criterion

File structure:

e2e/
  smoke.spec.ts               # Phase 1 smoke tests
  journey-checklist.spec.ts   # Phase 3 USER-JOURNEY coverage
  <feature-name>/
    happy-path.spec.ts
    error-states.spec.ts

Phase 7 — Run Everything & Collect Results

# 1. Unit + integration tests
pnpm test   # or: bun test, npx vitest

# 2. E2E (app must be running)
npx playwright test e2e/

# 3. Check for any unhandled errors in logs
grep -i "error\|exception\|unhandled" .next/server.log 2>/dev/null || true

Phase 8 — Output Report

TESTER_REPORT: {
  "feature": "<feature-name>",
  "preflight": {
    "build": "passed | failed",
    "runtime_startup": "passed | failed",
    "env_vars": "passed | missing: [VAR1, VAR2]",
    "db_connectivity": "passed | failed",
    "console_errors_on_load": []
  },
  "smoke_tests": {
    "passed": N,
    "failed": N,
    "failures": ["Smoke: user can sign in — TypeError: fetch failed"]
  },
  "epic_domains": {
    "covered": ["POS Core", "CRM"],
    "uncovered": ["Loyalty", "Wallets"]
  },
  "journey_checklist": {
    "total": N,
    "covered": N,
    "failed": ["User can reset password — 500 on /api/auth/reset"],
    "not_implemented": ["Offboarding preserves data for 30 days"]
  },
  "unit_tests": { "total": N, "passed": N, "failed": N },
  "e2e_tests": { "total": N, "passed": N, "failed": N },
  "runtime_errors_captured": [],
  "verdict": "✅ READY | ⚠️ ISSUES FOUND | ❌ BROKEN"
}

Verdict rules:

  • ❌ BROKEN — any preflight phase fails OR any smoke test fails
  • ⚠️ ISSUES FOUND — smoke passes but >20% journey checklist fails, OR any EPIC domain is uncovered, OR runtime errors captured during E2E
  • ✅ READY — all preflight passes, all smoke passes, >80% journey covered, zero runtime errors captured

Constraints

  • Never mock your own services — only mock external APIs and time
  • Never modify production code
  • Only create/modify *.test.*, *.spec.*, e2e/
  • Stagehand instructions always in English
  • Always capture runtime errors during E2E — a passing test with console errors is a failing test
  • If preflight fails, stop and report — do not write tests against a broken environment

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

ralph

No summary provided by upstream source.

Repository SourceNeeds Review
General

orchestrator

No summary provided by upstream source.

Repository SourceNeeds Review
General

architect

No summary provided by upstream source.

Repository SourceNeeds Review
General

documenter

No summary provided by upstream source.

Repository SourceNeeds Review