E2E Test Scenario Quality Review

Systematic checklist for reviewing E2E spec files AND Page Object Model (POM) files. Framework-agnostic principles; code examples show Playwright, Cypress, and Puppeteer where they differ.

Phase 1: Automated Grep Checks (Run First)

Once the review target files are determined, run the following Bash commands before LLM analysis to mechanically detect known anti-patterns.

echo "=== E2E Mechanical Anti-Pattern Check ===" echo ""

3. Error Swallowing — .catch(() => {}) / .catch(() => false)

echo "--- #3 Error Swallowing ---" grep -rn '.catch(\s*() =>' e2e/ --include='.ts' --include='.js' --include='.cy.' | grep -v node_modules | grep -v '// justified'

4. Always-Passing — assertion that can never fail

echo "--- #4 Always-Passing ---" grep -rn -E 'toBeGreaterThanOrEqual(0)|should(.(gte|greaterThan).0)' e2e/ --include='.ts' --include='.js' --include='.cy.'

5. Boolean Trap — toBeTruthy on non-boolean values (Locator, ElementHandle, selector result)

Excludes cases where the value is already boolean (e.g., response.ok(), isVisible(), isChecked())

6. Conditional Bypass — expect inside if(isVisible)

echo "--- #6 Conditional Bypass ---" grep -rn -E "if.(isVisible|is(.:visible.))" e2e/ --include='.spec.' --include='.test.' --include='.cy.*'

7. Raw DOM — document.querySelector in spec/test files

echo "--- #7 Raw DOM in specs ---" grep -rn 'document.querySelector' e2e/ --include='.spec.' --include='.test.' --include='.cy.'

12. Hard-coded Timeout — waitForTimeout / cy.wait(number)

echo "--- #12 Hard-coded Timeout ---" grep -rn -E 'waitForTimeout|cy.wait(\d' e2e/ --include='.ts' --include='.js' --include='.cy.'

13b. Network dependency — goto/visit without route/intercept setup nearby

echo "" echo "=== Done ==="

Interpreting results:

Zero hits → no mechanical issues found, proceed to Phase 2
Any hit → report each line as an issue (includes file:line)
Lines with // justified comments are excluded (intentional usage)

Output Phase 1 results as-is. The LLM must not reinterpret them.

Phase 2: LLM Review (Subjective Checks Only)

Patterns already detected in Phase 1 (#3, #4, #5 partial, #6, #7, #12, #13b partial) are skipped. The LLM performs only these checks:

Check Reason

1 Name-Assertion Alignment Requires semantic interpretation

2 Missing Then Requires logic flow analysis

8 Render-Only Requires test value judgment

9 Duplicate Scenarios Requires similarity comparison

10 Misleading Names Requires semantic interpretation

11 Over-Broad Assertions + Subject-Inversion Requires domain context

13 Flaky Patterns (partial) Requires context judgment for nth(), animation, network patterns

14 YAGNI in POM Requires usage grep then judgment

Phase 3: Coverage Gap Analysis (After Review)

After completing Phase 1 + 2, identify scenarios the test suite does NOT cover. Scan the page/feature under test and flag missing:

Gap Type What to look for

Error paths Form validation errors, API failure states, network offline, 404/500 pages

Edge cases Empty state, max-length input, special characters, concurrent actions

Accessibility Keyboard navigation, screen reader labels, focus management after modal/dialog

Auth boundaries Unauthorized access redirects, expired session handling, role-based visibility

Output: List up to 5 highest-value missing scenarios as suggestions, not requirements. Format:

Coverage Gaps (Suggestions)

[Error path] No test for form submission with server error — add API mock returning 500
[Edge case] No test for empty list state — verify empty state message shown

Review Checklist

Run each check against every non-skipped test and every changed POM file.

Important: test.skip() with a reason comment or reason string is intentional — do NOT flag or remove these. Only flag mid-test conditional skips that hide failures (see #6).

Tier 1 — P0/P1 (always check)

Name-Assertion Alignment [LLM-only]

Symptom: Test name promises something the assertions don't verify.

// BAD — name says "status" but only checks visibility test('should display paragraph status', () => { await expect(status).toBeVisible(); // no status content check });

Rule: Every noun in the test name must have a corresponding assertion. Add it or rename.

Procedure:

Extract all nouns from the test name (e.g., "should display paragraph status")
For each noun, search the test body for expect() that verifies it
Missing noun → add assertion or remove noun from name

Common patterns: "should display X" with only toBeVisible() (no content check), "should update X and Y" with assertion for X but not Y, "should validate form" with only happy-path assertion.

Missing Then [LLM-only]

Symptom: Test acts but doesn't verify the final expected state.

// BAD — toggles but doesn't verify the dismissed state test('should cancel edit on Escape', () => { await input.click(); await page.keyboard.press('Escape'); await expect(text).toBeVisible(); // input still hidden? });

Rule: For toggle/cancel/close actions, verify both the restored state AND the dismissed state.

Procedure:

Identify the action verb (toggle, cancel, close, delete, submit, undo)
List the expected state changes (element appears/disappears, text changes, count changes)
Check that BOTH sides of the state change are asserted

Common patterns: Cancel/Escape without verifying input is hidden, delete without verifying count decreased, submit without verifying form resets, tab switch without verifying previous tab content is hidden.

Error Swallowing [grep-detectable]

Symptom (spec): try/catch wrapping assertions — test passes on error.

Symptom (POM): .catch(() => {}) or .catch(() => false) on awaited operations — caller never sees the failure.

// BAD spec — silent pass try { await expect(header).toBeVisible(); } catch { console.log('skipped'); }

// BAD POM — caller thinks execution succeeded await runningIndicator.waitFor({ state: 'detached' }).catch(() => {});

Rule (spec): Never wrap assertions in try/catch. Use test.skip() in beforeEach if the test can't run.

Rule (POM): Remove .catch(() => {}) / .catch(() => false) from wait/assertion methods. If the operation can legitimately fail, the caller should decide how to handle it. Only keep catch for UI stabilization like editor.click({ force: true }).catch(() => textArea.focus()) .

Always-Passing Assertions [grep-detectable]

Symptom: Assertion that can never fail.

// BAD — count >= 0 is always true expect(count).toBeGreaterThanOrEqual(0);

Rule: Search for toBeGreaterThanOrEqual(0) , toBeTruthy() on always-truthy strings, || chains that accept defaults as valid.

Boolean Trap Assertions [grep-detectable]

Symptom (spec): expect(locator).toBeTruthy() on a Locator/ElementHandle object — always passes because objects are always truthy regardless of whether the element exists in the DOM.

NOT a boolean trap: expect(response.ok()).toBeTruthy() or expect(await el.isVisible()).toBe(true) — these operate on actual boolean return values. While toBe(true) is slightly more precise than toBeTruthy() for booleans, this is a style preference, not a bug. Only flag as P1 when the value is a non-boolean object (Locator, ElementHandle, Promise).

Symptom (POM): Method returns Promise<boolean> instead of exposing an element handle — forces spec into boolean trap.

// BAD — boolean return forces spec into trap async isEditorVisible(index = 0): Promise<boolean> { return await paragraph.locator('code-editor').isVisible(); } expect(await page.isEditorVisible(0)).toBe(true);

Rule (spec): Use the framework's built-in assertion instead of extracting a boolean first:

Playwright: await expect(locator).toBeVisible()
Cypress: cy.get(selector).should('be.visible')
Puppeteer: await page.waitForSelector(selector, { visible: true })

Rule (POM): Expose the element handle (Locator / selector string) instead of returning Promise<boolean> . Let specs use framework assertions directly.

Conditional Bypass (Silent Pass / Hidden Skip) [grep-detectable]

Symptom: expect() inside if block, or mid-test test.skip() — test silently passes when feature is broken.

// BAD — if spinner never appears, assertion never runs if (await spinner.isVisible()) { await expect(spinner).toBeHidden({ timeout: 5000 }); }

Rule: Every test path must contain at least one expect() . Move environment checks to beforeEach or declaration-level test.skip() .

Raw DOM Queries (Bypassing Framework API) [grep-detectable]

Symptom: Test drops into raw document.querySelector* / document.getElementById via evaluate() when the framework's element lookup API could do the same job.

// BAD — no auto-wait, returns stale boolean const has = await page.evaluate((i) => { return !!document.querySelectorAll('.para')[i]?.querySelector('.result'); }, 0); expect(has).toBe(true);

Why it matters: No auto-waiting, no retry, boolean trap, framework error messages lost.

Rule: Use the framework's element API instead of raw DOM:

Playwright: page.locator()

web-first assertions

Cypress: cy.get() / cy.find() — avoid cy.window().then(win => win.document.querySelector(...))
Puppeteer: page.$() / page.waitForSelector() — avoid page.evaluate(() => document.querySelector(...))

Only use evaluate /waitForFunction when the framework API can't express the condition (getComputedStyle , cross-element DOM relationships). In POM, add a comment explaining why.

Tier 2 — P1/P2 (check when time permits)

Render-Only Tests (Low E2E Value) [LLM-only]

Symptom: Test only calls toBeVisible() with no interaction or content assertion.

Rule: Add at least one of: content assertion (not.toBeEmpty() , toContainText() ), count assertion (toHaveCount(n) ), or sibling element assertion.

Duplicate Scenarios (DRY) [LLM-only]

Symptom: Two tests share >70% of their steps with minor variations.

Rule (within file): Merge tests that differ only in setup or a single assertion. Use the richer verification set from both.

Rule (cross-file): After reviewing all files in scope, cross-check tests with similar names across different spec files. If test A in feature-settings.spec.ts is a subset of test B in feature-form-validation.spec.ts , delete A and strengthen B.

Procedure:

List all test names in the file — look for similar prefixes or overlapping verbs
For each pair with >70% step overlap, compare their assertion sets
If one is a subset of the other, delete the weaker test and keep the richer one

Common patterns: "should add item" and "should add item and verify count" (subset), "should open dialog" in file A and "should open dialog and fill form" in file B (cross-file subset), parameterizable tests written as separate cases.

Misleading Test Names (KISS) [LLM-only]

Symptom: Name implies UI interaction but test uses API/REST, or name implies feature X but tests feature Y.

Rule: If the test uses REST API, reload, or indirect methods, the name must make that explicit.

Over-Broad Assertions (KISS) [LLM-only]

Symptom: Assertion too loose to catch regressions.

// BAD — any string containing '%' passes expect(content.includes('%')).toBe(true);

Rule: Prefer exact matches or explicit value lists over .includes() or loose regex when valid values are known and small.

11b. Subject-Inversion [LLM-only]

Symptom: Expected values placed in expect() instead of the actual value — failure messages become confusing.

// BAD — subject is the expected values array, not the actual result // failure message: "Expected [200, 202] to contain 204" (confusing) expect([200, 202]).toContain(deleteResponse.status());

// GOOD — actual value as subject, clear failure message const status = deleteResponse.status(); expect(status === 200 || status === 202).toBe(true);

Rule: The value under test (actual) must always be the argument to expect() . Expected values go in the matcher. If the matcher doesn't support multi-value checks natively, use a boolean expression with toBe(true) rather than inverting the subject.

Hard-coded Timeouts [grep-detectable]

Symptom: waitForTimeout() or magic timeout numbers scattered across tests and POM.

// BAD — arbitrary sleep await page.waitForTimeout(2000);

// BAD — magic number, no explanation await element.waitFor({ state: 'visible', timeout: 30000 });

Rule: Never use explicit sleep (waitForTimeout / cy.wait(ms) ) — rely on framework auto-wait or retry mechanisms. For custom timeouts, extract named constants with comments explaining why the default isn't sufficient.

Flaky Patterns [LLM-only + grep]

Symptom: Test passes locally but fails intermittently in CI due to timing, ordering, or environment assumptions.

Sub-patterns:

13a. Positional selectors — nth() , first() , last() without comment.

// BAD — breaks if DOM order changes await expect(items.nth(2)).toContainText('Settings');

Rule: Prefer data-testid , role-based, or attribute selectors. If nth() is unavoidable, add a comment explaining why.

13b. Network dependency without mock — Test relies on real API responses without route.fulfill() / cy.intercept() .

// BAD — fails if API is slow or returns different data await page.goto('/dashboard'); await expect(page.locator('.user-count')).toHaveText('42');

Rule: For data-dependent assertions, mock the network response or assert on structure (element exists, is not empty) rather than exact values.

13c. Animation race — Assertion runs before CSS transition or animation completes.

// BAD — modal may still be animating await button.click(); await expect(modal).toBeVisible(); // passes await expect(modal.locator('.content')).toHaveText('Done'); // flaky — content not rendered yet

Rule: After triggering animations, wait for the final state element, not the container. Use waitForSelector with stable content or toHaveCSS('opacity', '1') for fade-ins.

YAGNI in Page Objects [LLM-only]

Symptom: POM has locators/methods never referenced by any spec.

Procedure:

List all public members of each changed POM file
Grep each member across all test files and other POMs
Classify: USED / INTERNAL-ONLY (private ) / UNUSED (delete)

Common patterns: Convenience wrappers (clickEdit() when specs use editButton.click() ), getter methods (getCount() when specs use toHaveCount() ), state checkers (isEditMode() when specs assert on elements directly), pre-built "just in case" locators.

Rule: Delete unused members. Make internal-only members private . When creating new shared utils, ensure they will be used by 2+ specs. Do not delete existing util files/classes that are actively imported and used by specs — only flag unused individual members within them.

Output:

File	Member	Used In	Status
page.ts	addLinks	(none)	DELETE
page.ts	searchDialog	internal only	PRIVATE

Output Format

Present findings grouped by severity:

[P0/P1/P2] Task N: [filename] — [issue type]

N-1. `[test name or POM method]`

Issue: [description]
Fix: [name change / assertion addition / merge / deletion]
Code:
```
// concrete code to add or change
```

After all findings, append a summary table:

## Review Summary

| Sev | Count | Top Issue | Affected Files |
|-----|-------|-----------|----------------|
| P0  | 3     | Missing Then | auth.spec.ts, form.spec.ts |
| P1  | 5     | Duplicate Scenarios | settings.spec.ts |
| P2  | 2     | Render-Only | dashboard.spec.ts |

**Total: 10 issues across 4 files. Fix P0 first.**

Severity classification:

- P0 (Must fix): Test silently passes when the feature is broken — no real verification happening

- P1 (Should fix): Test works but gives poor diagnostics, wastes CI time, or misleads developers

- P2 (Nice to fix): Weak but not wrong — maintenance and robustness improvements

Quick Reference

#
Check
Sev
Phase
Detection Signal

1
Name-Assertion
P0
LLM
Noun in name with no matching expect()

2
Missing Then
P0
LLM
Action without final state verification

3
Error Swallowing
P0
grep
try/catch
 in spec, .catch(() => {})
 in POM

4
Always-Passing
P0
grep
>=0
, truthy on non-empty, ||
 defaults

5
Boolean Trap
P1
grep
expect(locator).toBeTruthy()
 on non-boolean objects; skip when value is actual boolean (.ok()
, .isVisible()
)

6
Conditional Bypass
P0
grep
expect()
 inside if
, mid-test test.skip()

7
Raw DOM Queries
P1
grep
document.querySelector
 in evaluate

8
Render-Only
P2
LLM
Only toBeVisible()
, no content/count

9
Duplicate
P1
LLM
>70% shared steps, cross-file overlap

10
Misleading Name
P1
LLM
API/reload in "should [UI verb]" test

11
Over-Broad
P2
LLM
.includes()
 where enum values known

11b
Subject-Inversion
P1
LLM
expect([expected]).toContain(actual)
 — confusing failure messages

12
Hard-coded Timeout
P2
grep
waitForTimeout()
, magic numbers

13
Flaky Patterns
P1
LLM+grep
nth()
, missing network mock, animation race

14
YAGNI in POM
P2
LLM
Public member not referenced in any spec

Suppression

When a grep-detected pattern is intentional, add a // justified: [reason]
 comment to the line. Phase 1 will exclude it.

Example: await editor.click({ force: true }).catch(() => textArea.focus()); // justified: UI stabilization fallback

e2e-test-reviewer

Safety Notice

Copy this and send it to your AI assistant to learn

3. Error Swallowing — .catch(() => {}) / .catch(() => false)

4. Always-Passing — assertion that can never fail

5. Boolean Trap — toBeTruthy on non-boolean values (Locator, ElementHandle, selector result)

Excludes cases where the value is already boolean (e.g., response.ok(), isVisible(), isChecked())

6. Conditional Bypass — expect inside if(isVisible)

7. Raw DOM — document.querySelector in spec/test files

12. Hard-coded Timeout — waitForTimeout / cy.wait(number)

13b. Network dependency — goto/visit without route/intercept setup nearby

Coverage Gaps (Suggestions)

[P0/P1/P2] Task N: [filename] — [issue type]

N-1. `[test name or POM method]`

Source Transparency

Related Skills

cypress-debugger

playwright-debugger

e2e-reviewer

e2e-test-reviewer

Safety Notice

Copy this and send it to your AI assistant to learn

3. Error Swallowing — .catch(() => {}) / .catch(() => false)

4. Always-Passing — assertion that can never fail

5. Boolean Trap — toBeTruthy on non-boolean values (Locator, ElementHandle, selector result)

Excludes cases where the value is already boolean (e.g., response.ok(), isVisible(), isChecked())

6. Conditional Bypass — expect inside if(isVisible)

7. Raw DOM — document.querySelector in spec/test files

12. Hard-coded Timeout — waitForTimeout / cy.wait(number)

13b. Network dependency — goto/visit without route/intercept setup nearby

Coverage Gaps (Suggestions)

[P0/P1/P2] Task N: [filename] — [issue type]

N-1. [test name or POM method]

Source Transparency

Related Skills

cypress-debugger

playwright-debugger

e2e-reviewer

N-1. `[test name or POM method]`