# Visual QA — AI-Powered Visual Testing Skill
You perform visual quality assurance by walking through a web app in the browser, recording every action with structured captions, and sending the evidence to Gemini for automated review. You catch what traditional unit/integration tests miss: misaligned layouts, confusing UX, broken visual states, and edge cases.
## CRITICAL RULES
- Take a screenshot BEFORE and AFTER every action. This creates the visual evidence chain.
- Log every action with the structured caption format. Every click, type, scroll, and wait must have an ACTION, INTENT, and EXPECT block. Read resources/caption-format.md.
- Never skip the edge case checklist. After the happy path, run through edge cases. Read resources/edge-cases.md.
- The GIF recording must be started BEFORE the first action and stopped AFTER the last.
- Gemini reviews the FULL evidence — screenshots + captions + GIF. Not just one piece.
## How It Works
```
┌─────────────┐    ┌──────────────┐    ┌─────────────┐    ┌──────────────┐
│ 1. Plan the │───▶│ 2. Walk the  │───▶│ 3. Collect  │───▶│ 4. Send to   │
│   test run  │    │    app       │    │   evidence  │    │    Gemini    │
└─────────────┘    └──────────────┘    └─────────────┘    └──────────────┘
  What to test       Click, type,       Screenshots,       AI reviews
  and expect         scroll, wait       GIF, captions      actual vs expected
```
## Workflow
### Phase 1 — Plan the Test Run
Before touching the browser, define the test plan:
- What app? Get the URL from the user
- What flows? List the user journeys to test (e.g., "create a record, edit it, delete it")
- What to check? Define expected visual states for each step
Write the test plan as a caption script. Read resources/caption-format.md for the format.
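For illustration, a single caption script entry might look like the sketch below. The exact field layout is defined in resources/caption-format.md; this example only assumes the three required blocks named above.

```text
ACTION: Click the "New Record" button in the toolbar
INTENT: Open the record creation form
EXPECT: A modal appears with empty form fields and a disabled Save button
```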
### Phase 2 — Execute the Test Run
Use the Claude in Chrome tools to walk through the app:
1. Navigate to the app URL (mcp__claude-in-chrome__navigate)
2. Start GIF recording (mcp__claude-in-chrome__gif_creator: start_recording)
3. Take initial screenshot (mcp__claude-in-chrome__computer: screenshot)
4. For each test step:
   a. Log the caption (ACTION, INTENT, EXPECT)
   b. Take a "before" screenshot
   c. Perform the action (click, type, scroll)
   d. Wait for the page to settle
   e. Take an "after" screenshot
   f. Note any discrepancies from expected behavior
5. Stop GIF recording (mcp__claude-in-chrome__gif_creator: stop_recording)
6. Export GIF (mcp__claude-in-chrome__gif_creator: export)
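The per-step loop can be sketched in Python. The `screenshot`, `perform`, and `settle` callables stand in for the corresponding Claude in Chrome tool invocations; they are hypothetical placeholders, not a real API. A minimal sketch of the evidence-chain bookkeeping only:

```python
from dataclasses import dataclass, field

@dataclass
class StepEvidence:
    caption: dict                         # ACTION / INTENT / EXPECT fields
    before: bytes                         # "before" screenshot
    after: bytes                          # "after" screenshot
    notes: list = field(default_factory=list)  # observed discrepancies

def run_step(caption, screenshot, perform, settle):
    """Execute one test step, capturing the before/after evidence pair."""
    before = screenshot()    # evidence BEFORE the action
    perform()                # click / type / scroll
    settle()                 # wait for the page to settle
    after = screenshot()     # evidence AFTER the action
    return StepEvidence(caption=caption, before=before, after=after)
```

In practice each callable would wrap the matching `mcp__claude-in-chrome__*` tool call; keeping them injectable makes the loop easy to dry-run without a browser.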
### Phase 3 — Edge Case Testing
After the happy path, run through the edge case checklist in resources/edge-cases.md.
For each applicable edge case:
- Attempt the action
- Screenshot the result
- Log whether it passed or failed
### Phase 4 — Compile Evidence
Gather all evidence into a structured report:
- The caption script (expected behavior)
- Screenshots at each step (actual behavior)
- The GIF recording (full flow)
- Edge case results
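One way to bundle the four evidence types is a JSON manifest that points at each artifact. A sketch under assumed file names and field names (nothing here is prescribed by the skill):

```python
import json

def build_manifest(app_name, steps, gif_path, edge_results):
    """Assemble the evidence bundle as a JSON-serializable manifest."""
    return {
        "app": app_name,
        "caption_script": [s["caption"] for s in steps],     # expected behavior
        "screenshots": [
            {"step": i, "before": s["before"], "after": s["after"]}
            for i, s in enumerate(steps, start=1)
        ],                                                   # actual behavior
        "gif": gif_path,                                     # full flow
        "edge_cases": edge_results,                          # pass/fail per case
    }

manifest = build_manifest(
    "Demo App",
    [{"caption": "ACTION: click Save", "before": "01-before.png", "after": "01-after.png"}],
    "run.gif",
    [{"case": "empty state", "passed": True}],
)
print(json.dumps(manifest, indent=2))
```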
### Phase 5 — Gemini Review (Optional)
If the user has a Gemini API key, send the evidence for AI review.
Read resources/gemini-review.md for the integration approach.
Gemini analyzes:
- Visual alignment (are elements properly positioned?)
- Content accuracy (do labels/values match expectations?)
- State consistency (do UI states match the action taken?)
- Accessibility issues (contrast, text size, touch targets)
- Missing feedback (loading states, error messages, confirmations)
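The review request could be built as below: a pure prompt-builder that folds the caption script and the five analysis criteria into one instruction, with the actual multimodal send shown as a hypothetical comment (model name, prompt wording, and the use of the `google-generativeai` client are assumptions; resources/gemini-review.md defines the real integration):

```python
def build_review_prompt(captions):
    """Render the caption script into a review prompt for Gemini."""
    checks = (
        "visual alignment, content accuracy, state consistency, "
        "accessibility (contrast, text size, touch targets), "
        "and missing feedback (loading states, error messages, confirmations)"
    )
    steps = "\n".join(
        f"Step {i}: ACTION={c['ACTION']} | INTENT={c['INTENT']} | EXPECT={c['EXPECT']}"
        for i, c in enumerate(captions, start=1)
    )
    return (
        f"Review the attached screenshots and GIF against this test script, "
        f"checking {checks}.\n"
        f"For each step, compare EXPECT against what the screenshots show.\n\n"
        f"{steps}"
    )

# Hypothetical send (requires GEMINI_API_KEY and the google-generativeai package):
# import google.generativeai as genai
# genai.configure(api_key=os.environ["GEMINI_API_KEY"])
# model = genai.GenerativeModel("gemini-1.5-flash")
# response = model.generate_content([build_review_prompt(captions), *screenshot_images])
```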
### Phase 6 — Report
Present findings in this format:
```
## Visual QA Report — [App Name]
Date: [date]
Flows Tested: [count]
Edge Cases Checked: [count]

### Results Summary
PASS: [count]  FAIL: [count]  PARTIAL: [count]

### Findings
FINDING #1 [SEVERITY: Critical]
STEP: [which step in the flow]
EXPECTED: [what should have happened]
ACTUAL: [what actually happened]
SCREENSHOT: [reference to screenshot]
RECOMMENDATION: [how to fix]
```
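A small helper can render findings into that FINDING block; a sketch whose dict keys simply mirror the template fields (the key names are assumptions for illustration):

```python
def render_finding(n, f):
    """Format one finding dict into the report's FINDING block."""
    return (
        f"FINDING #{n} [SEVERITY: {f['severity']}]\n"
        f"STEP: {f['step']}\n"
        f"EXPECTED: {f['expected']}\n"
        f"ACTUAL: {f['actual']}\n"
        f"SCREENSHOT: {f['screenshot']}\n"
        f"RECOMMENDATION: {f['recommendation']}"
    )
```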
## Agent Team Mode (Optional)
For large apps, spawn a team for parallel test coverage:
| Role | Agent Name | Tests |
|---|---|---|
| Happy Path Tester | happy-path | Core user flows, CRUD operations |
| Edge Case Hunter | edge-hunter | Empty states, long text, permissions, error handling |
| Visual Inspector | visual-inspector | Layout, alignment, responsive, accessibility |
Each agent walks the app independently and produces its own findings. The Lead merges results into a single report.
Read resources/team-testing.md for agent team test orchestration.
## Without Claude in Chrome
If the Chrome extension isn't available, the skill can still generate:
- A structured test plan with the caption format
- An edge case checklist customized to the app
- A manual testing script the user can follow
The user would then record their own screen and send the video + captions to Gemini.