e2e-test-reviewer

E2E Test Scenario Quality Review

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "e2e-test-reviewer" with this command: npx skills add dididy/e2e-test-reviewer/dididy-e2e-test-reviewer-e2e-test-reviewer

E2E Test Scenario Quality Review

Systematic checklist for reviewing E2E spec files AND Page Object Model (POM) files. Framework-agnostic principles; code examples show Playwright, Cypress, and Puppeteer where they differ.

Phase 1: Automated Grep Checks (Run First)

Once the review target files are determined, run the following Bash commands before LLM analysis to mechanically detect known anti-patterns.

echo "=== E2E Mechanical Anti-Pattern Check ===" echo ""

3. Error Swallowing — .catch(() => {}) / .catch(() => false)

echo "--- #3 Error Swallowing ---" grep -rn '.catch(\s*() =>' e2e/ --include='.ts' --include='.js' --include='.cy.' | grep -v node_modules | grep -v '// justified'

4. Always-Passing — assertion that can never fail

echo "--- #4 Always-Passing ---" grep -rn -E 'toBeGreaterThanOrEqual(0)|should(.(gte|greaterThan).0)' e2e/ --include='.ts' --include='.js' --include='.cy.'

5. Boolean Trap — toBeTruthy on non-boolean values (Locator, ElementHandle, selector result)

Excludes cases where the value is already boolean (e.g., response.ok(), isVisible(), isChecked())

echo "--- #5 Boolean Trap ---" grep -rn -E 'expect(.).toBeTruthy()|should(.(be.truthy|be.true))' e2e/ --include='.spec.' --include='.test.' --include='.cy.' | grep -v -E '.(ok|isVisible|isChecked|isDisabled|isEnabled|isEditable|isHidden)()'

6. Conditional Bypass — expect inside if(isVisible)

echo "--- #6 Conditional Bypass ---" grep -rn -E "if.(isVisible|is(.:visible.))" e2e/ --include='.spec.' --include='.test.' --include='.cy.*'

7. Raw DOM — document.querySelector in spec/test files

echo "--- #7 Raw DOM in specs ---" grep -rn 'document.querySelector' e2e/ --include='.spec.' --include='.test.' --include='.cy.'

12. Hard-coded Timeout — waitForTimeout / cy.wait(number)

echo "--- #12 Hard-coded Timeout ---" grep -rn -E 'waitForTimeout|cy.wait(\d' e2e/ --include='.ts' --include='.js' --include='.cy.'

13b. Network dependency — goto/visit without route/intercept setup nearby

echo "--- #13b Missing Network Mock ---" grep -rn -E 'page.goto|cy.visit' e2e/ --include='.spec.' --include='.test.' --include='.cy.' | grep -v 'route.|intercept|mock' | head -20

echo "" echo "=== Done ==="

Interpreting results:

  • Zero hits → no mechanical issues found, proceed to Phase 2

  • Any hit → report each line as an issue (includes file:line)

  • Lines with // justified comments are excluded (intentional usage)

Output Phase 1 results as-is. The LLM must not reinterpret them.

Phase 2: LLM Review (Subjective Checks Only)

Patterns already detected in Phase 1 (#3, #4, #5 partial, #6, #7, #12, #13b partial) are skipped. The LLM performs only these checks:

Check Reason

1 Name-Assertion Alignment Requires semantic interpretation

2 Missing Then Requires logic flow analysis

8 Render-Only Requires test value judgment

9 Duplicate Scenarios Requires similarity comparison

10 Misleading Names Requires semantic interpretation

11 Over-Broad Assertions + Subject-Inversion Requires domain context

13 Flaky Patterns (partial) Requires context judgment for nth(), animation, network patterns

14 YAGNI in POM Requires usage grep then judgment

Phase 3: Coverage Gap Analysis (After Review)

After completing Phase 1 + 2, identify scenarios the test suite does NOT cover. Scan the page/feature under test and flag missing:

Gap Type What to look for

Error paths Form validation errors, API failure states, network offline, 404/500 pages

Edge cases Empty state, max-length input, special characters, concurrent actions

Accessibility Keyboard navigation, screen reader labels, focus management after modal/dialog

Auth boundaries Unauthorized access redirects, expired session handling, role-based visibility

Output: List up to 5 highest-value missing scenarios as suggestions, not requirements. Format:

Coverage Gaps (Suggestions)

  1. [Error path] No test for form submission with server error — add API mock returning 500
  2. [Edge case] No test for empty list state — verify empty state message shown

Review Checklist

Run each check against every non-skipped test and every changed POM file.

Important: test.skip() with a reason comment or reason string is intentional — do NOT flag or remove these. Only flag mid-test conditional skips that hide failures (see #6).

Tier 1 — P0/P1 (always check)

  1. Name-Assertion Alignment [LLM-only]

Symptom: Test name promises something the assertions don't verify.

// BAD — name says "status" but only checks visibility test('should display paragraph status', () => { await expect(status).toBeVisible(); // no status content check });

Rule: Every noun in the test name must have a corresponding assertion. Add it or rename.

Procedure:

  • Extract all nouns from the test name (e.g., "should display paragraph status")

  • For each noun, search the test body for expect() that verifies it

  • Missing noun → add assertion or remove noun from name

Common patterns: "should display X" with only toBeVisible() (no content check), "should update X and Y" with assertion for X but not Y, "should validate form" with only happy-path assertion.

  1. Missing Then [LLM-only]

Symptom: Test acts but doesn't verify the final expected state.

// BAD — toggles but doesn't verify the dismissed state test('should cancel edit on Escape', () => { await input.click(); await page.keyboard.press('Escape'); await expect(text).toBeVisible(); // input still hidden? });

Rule: For toggle/cancel/close actions, verify both the restored state AND the dismissed state.

Procedure:

  • Identify the action verb (toggle, cancel, close, delete, submit, undo)

  • List the expected state changes (element appears/disappears, text changes, count changes)

  • Check that BOTH sides of the state change are asserted

Common patterns: Cancel/Escape without verifying input is hidden, delete without verifying count decreased, submit without verifying form resets, tab switch without verifying previous tab content is hidden.

  1. Error Swallowing [grep-detectable]

Symptom (spec): try/catch wrapping assertions — test passes on error.

Symptom (POM): .catch(() => {}) or .catch(() => false) on awaited operations — caller never sees the failure.

// BAD spec — silent pass try { await expect(header).toBeVisible(); } catch { console.log('skipped'); }

// BAD POM — caller thinks execution succeeded await runningIndicator.waitFor({ state: 'detached' }).catch(() => {});

Rule (spec): Never wrap assertions in try/catch. Use test.skip() in beforeEach if the test can't run.

Rule (POM): Remove .catch(() => {}) / .catch(() => false) from wait/assertion methods. If the operation can legitimately fail, the caller should decide how to handle it. Only keep catch for UI stabilization like editor.click({ force: true }).catch(() => textArea.focus()) .

  1. Always-Passing Assertions [grep-detectable]

Symptom: Assertion that can never fail.

// BAD — count >= 0 is always true expect(count).toBeGreaterThanOrEqual(0);

Rule: Search for toBeGreaterThanOrEqual(0) , toBeTruthy() on always-truthy strings, || chains that accept defaults as valid.

  1. Boolean Trap Assertions [grep-detectable]

Symptom (spec): expect(locator).toBeTruthy() on a Locator/ElementHandle object — always passes because objects are always truthy regardless of whether the element exists in the DOM.

NOT a boolean trap: expect(response.ok()).toBeTruthy() or expect(await el.isVisible()).toBe(true) — these operate on actual boolean return values. While toBe(true) is slightly more precise than toBeTruthy() for booleans, this is a style preference, not a bug. Only flag as P1 when the value is a non-boolean object (Locator, ElementHandle, Promise).

Symptom (POM): Method returns Promise<boolean> instead of exposing an element handle — forces spec into boolean trap.

// BAD — boolean return forces spec into trap async isEditorVisible(index = 0): Promise<boolean> { return await paragraph.locator('code-editor').isVisible(); } expect(await page.isEditorVisible(0)).toBe(true);

Rule (spec): Use the framework's built-in assertion instead of extracting a boolean first:

  • Playwright: await expect(locator).toBeVisible()

  • Cypress: cy.get(selector).should('be.visible')

  • Puppeteer: await page.waitForSelector(selector, { visible: true })

Rule (POM): Expose the element handle (Locator / selector string) instead of returning Promise<boolean> . Let specs use framework assertions directly.

  1. Conditional Bypass (Silent Pass / Hidden Skip) [grep-detectable]

Symptom: expect() inside if block, or mid-test test.skip() — test silently passes when feature is broken.

// BAD — if spinner never appears, assertion never runs if (await spinner.isVisible()) { await expect(spinner).toBeHidden({ timeout: 5000 }); }

Rule: Every test path must contain at least one expect() . Move environment checks to beforeEach or declaration-level test.skip() .

  1. Raw DOM Queries (Bypassing Framework API) [grep-detectable]

Symptom: Test drops into raw document.querySelector* / document.getElementById via evaluate() when the framework's element lookup API could do the same job.

// BAD — no auto-wait, returns stale boolean const has = await page.evaluate((i) => { return !!document.querySelectorAll('.para')[i]?.querySelector('.result'); }, 0); expect(has).toBe(true);

Why it matters: No auto-waiting, no retry, boolean trap, framework error messages lost.

Rule: Use the framework's element API instead of raw DOM:

  • Playwright: page.locator()
  • web-first assertions
  • Cypress: cy.get() / cy.find() — avoid cy.window().then(win => win.document.querySelector(...))

  • Puppeteer: page.$() / page.waitForSelector() — avoid page.evaluate(() => document.querySelector(...))

Only use evaluate /waitForFunction when the framework API can't express the condition (getComputedStyle , cross-element DOM relationships). In POM, add a comment explaining why.

Tier 2 — P1/P2 (check when time permits)

  1. Render-Only Tests (Low E2E Value) [LLM-only]

Symptom: Test only calls toBeVisible() with no interaction or content assertion.

Rule: Add at least one of: content assertion (not.toBeEmpty() , toContainText() ), count assertion (toHaveCount(n) ), or sibling element assertion.

  1. Duplicate Scenarios (DRY) [LLM-only]

Symptom: Two tests share >70% of their steps with minor variations.

Rule (within file): Merge tests that differ only in setup or a single assertion. Use the richer verification set from both.

Rule (cross-file): After reviewing all files in scope, cross-check tests with similar names across different spec files. If test A in feature-settings.spec.ts is a subset of test B in feature-form-validation.spec.ts , delete A and strengthen B.

Procedure:

  • List all test names in the file — look for similar prefixes or overlapping verbs

  • For each pair with >70% step overlap, compare their assertion sets

  • If one is a subset of the other, delete the weaker test and keep the richer one

Common patterns: "should add item" and "should add item and verify count" (subset), "should open dialog" in file A and "should open dialog and fill form" in file B (cross-file subset), parameterizable tests written as separate cases.

  1. Misleading Test Names (KISS) [LLM-only]

Symptom: Name implies UI interaction but test uses API/REST, or name implies feature X but tests feature Y.

Rule: If the test uses REST API, reload, or indirect methods, the name must make that explicit.

  1. Over-Broad Assertions (KISS) [LLM-only]

Symptom: Assertion too loose to catch regressions.

// BAD — any string containing '%' passes expect(content.includes('%')).toBe(true);

Rule: Prefer exact matches or explicit value lists over .includes() or loose regex when valid values are known and small.

11b. Subject-Inversion [LLM-only]

Symptom: Expected values placed in expect() instead of the actual value — failure messages become confusing.

// BAD — subject is the expected values array, not the actual result // failure message: "Expected [200, 202] to contain 204" (confusing) expect([200, 202]).toContain(deleteResponse.status());

// GOOD — actual value as subject, clear failure message const status = deleteResponse.status(); expect(status === 200 || status === 202).toBe(true);

Rule: The value under test (actual) must always be the argument to expect() . Expected values go in the matcher. If the matcher doesn't support multi-value checks natively, use a boolean expression with toBe(true) rather than inverting the subject.

  1. Hard-coded Timeouts [grep-detectable]

Symptom: waitForTimeout() or magic timeout numbers scattered across tests and POM.

// BAD — arbitrary sleep await page.waitForTimeout(2000);

// BAD — magic number, no explanation await element.waitFor({ state: 'visible', timeout: 30000 });

Rule: Never use explicit sleep (waitForTimeout / cy.wait(ms) ) — rely on framework auto-wait or retry mechanisms. For custom timeouts, extract named constants with comments explaining why the default isn't sufficient.

  1. Flaky Patterns [LLM-only + grep]

Symptom: Test passes locally but fails intermittently in CI due to timing, ordering, or environment assumptions.

Sub-patterns:

13a. Positional selectors — nth() , first() , last() without comment.

// BAD — breaks if DOM order changes await expect(items.nth(2)).toContainText('Settings');

Rule: Prefer data-testid , role-based, or attribute selectors. If nth() is unavoidable, add a comment explaining why.

13b. Network dependency without mock — Test relies on real API responses without route.fulfill() / cy.intercept() .

// BAD — fails if API is slow or returns different data await page.goto('/dashboard'); await expect(page.locator('.user-count')).toHaveText('42');

Rule: For data-dependent assertions, mock the network response or assert on structure (element exists, is not empty) rather than exact values.

13c. Animation race — Assertion runs before CSS transition or animation completes.

// BAD — modal may still be animating await button.click(); await expect(modal).toBeVisible(); // passes await expect(modal.locator('.content')).toHaveText('Done'); // flaky — content not rendered yet

Rule: After triggering animations, wait for the final state element, not the container. Use waitForSelector with stable content or toHaveCSS('opacity', '1') for fade-ins.

  1. YAGNI in Page Objects [LLM-only]

Symptom: POM has locators/methods never referenced by any spec.

Procedure:

  • List all public members of each changed POM file

  • Grep each member across all test files and other POMs

  • Classify: USED / INTERNAL-ONLY (private ) / UNUSED (delete)

Common patterns: Convenience wrappers (clickEdit() when specs use editButton.click() ), getter methods (getCount() when specs use toHaveCount() ), state checkers (isEditMode() when specs assert on elements directly), pre-built "just in case" locators.

Rule: Delete unused members. Make internal-only members private . When creating new shared utils, ensure they will be used by 2+ specs. Do not delete existing util files/classes that are actively imported and used by specs — only flag unused individual members within them.

Output:

FileMemberUsed InStatus
page.tsaddLinks(none)DELETE
page.tssearchDialoginternal onlyPRIVATE

Output Format

Present findings grouped by severity:

[P0/P1/P2] Task N: [filename] — [issue type]

N-1. [test name or POM method]

  • Issue: [description]
  • Fix: [name change / assertion addition / merge / deletion]
  • Code:
    // concrete code to add or change
    

After all findings, append a summary table:

## Review Summary

| Sev | Count | Top Issue | Affected Files |
|-----|-------|-----------|----------------|
| P0  | 3     | Missing Then | auth.spec.ts, form.spec.ts |
| P1  | 5     | Duplicate Scenarios | settings.spec.ts |
| P2  | 2     | Render-Only | dashboard.spec.ts |

**Total: 10 issues across 4 files. Fix P0 first.**

Severity classification:

- P0 (Must fix): Test silently passes when the feature is broken — no real verification happening

- P1 (Should fix): Test works but gives poor diagnostics, wastes CI time, or misleads developers

- P2 (Nice to fix): Weak but not wrong — maintenance and robustness improvements

Quick Reference

#
Check
Sev
Phase
Detection Signal

1
Name-Assertion
P0
LLM
Noun in name with no matching expect()

2
Missing Then
P0
LLM
Action without final state verification

3
Error Swallowing
P0
grep
try/catch
 in spec, .catch(() => {})
 in POM

4
Always-Passing
P0
grep
>=0
, truthy on non-empty, ||
 defaults

5
Boolean Trap
P1
grep
expect(locator).toBeTruthy()
 on non-boolean objects; skip when value is actual boolean (.ok()
, .isVisible()
)

6
Conditional Bypass
P0
grep
expect()
 inside if
, mid-test test.skip()

7
Raw DOM Queries
P1
grep
document.querySelector
 in evaluate

8
Render-Only
P2
LLM
Only toBeVisible()
, no content/count

9
Duplicate
P1
LLM
>70% shared steps, cross-file overlap

10
Misleading Name
P1
LLM
API/reload in "should [UI verb]" test

11
Over-Broad
P2
LLM
.includes()
 where enum values known

11b
Subject-Inversion
P1
LLM
expect([expected]).toContain(actual)
 — confusing failure messages

12
Hard-coded Timeout
P2
grep
waitForTimeout()
, magic numbers

13
Flaky Patterns
P1
LLM+grep
nth()
, missing network mock, animation race

14
YAGNI in POM
P2
LLM
Public member not referenced in any spec

Suppression

When a grep-detected pattern is intentional, add a // justified: [reason]
 comment to the line. Phase 1 will exclude it.

Example: await editor.click({ force: true }).catch(() => textArea.focus()); // justified: UI stabilization fallback

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

cypress-debugger

No summary provided by upstream source.

Repository SourceNeeds Review
General

playwright-debugger

No summary provided by upstream source.

Repository SourceNeeds Review
General

e2e-reviewer

No summary provided by upstream source.

Repository SourceNeeds Review