Playwright Screenshot Inspector
LLM-powered visual testing expert for automated screenshot capture, analysis, and UI verification using Playwright with multimodal AI inspection.
Activation Triggers
Activate on:
-
"screenshot test", "visual test", "screenshot inspection"
-
"playwright headless", "playwright screenshot"
-
"UI verification", "visual regression"
-
"theme compliance test", "dark mode test", "light mode test"
-
"automated screenshot", "capture and analyze"
-
"compare screenshots", "visual diff"
NOT for:
-
Simple one-off screenshots (use browser DevTools)
-
Pixel-perfect comparison without AI (use native Playwright toHaveScreenshot )
-
Non-web UI testing (use platform-specific tools)
-
Performance testing (use Lighthouse/WebPageTest)
Core Philosophy
Traditional visual testing compares pixels. LLM-powered visual testing understands semantics.
Instead of "these 50 pixels changed", LLM inspection answers:
-
"Is the content actually rendered?"
-
"Does the theme switch correctly?"
-
"Are interactive elements visible and properly styled?"
-
"What's broken vs. what's just different?"
The Screenshot Inspection Loop
┌─────────────────────────────────────────────────────────────┐ │ LLM SCREENSHOT INSPECTION │ ├─────────────────────────────────────────────────────────────┤ │ │ │ 1. CAPTURE (Playwright) │ │ └─► Wait for React hydration, not just network │ │ │ │ 2. READ (Claude vision) │ │ └─► Pass screenshot to LLM with specific questions │ │ │ │ 3. ANALYZE (Structured response) │ │ └─► Extract: content present? theme correct? errors? │ │ │ │ 4. ACT (Conditional logic) │ │ └─► Pass/fail based on semantic understanding │ │ │ └─────────────────────────────────────────────────────────────┘
Critical: Waiting for React Content
The #1 failure mode: Taking screenshots before React hydrates.
Anti-Pattern: Network Idle Alone
❌ WRONG - React may not have rendered yet
page.goto(url) page.wait_for_load_state('networkidle') page.screenshot(path='broken.png') # Often blank!
Correct Pattern: Wait for Actual Content
✅ CORRECT - Wait for React to mount
page.goto(url, wait_until='domcontentloaded') page.wait_for_load_state('networkidle')
Give React time to hydrate
import time time.sleep(0.5)
Wait for actual content selector
page.wait_for_selector('.main-content, h1, [data-testid="app"]', state='visible', timeout=10000)
Verify content exists
body_text = page.locator('body').inner_text() if len(body_text) < 50: time.sleep(2) # Extra wait for slow hydration
page.screenshot(path='good.png', full_page=True)
Content Verification Function
def wait_for_react_content(page, selectors, timeout=10000): """Wait for React to hydrate by checking for actual content.""" page.wait_for_load_state('domcontentloaded') page.wait_for_load_state('networkidle') time.sleep(0.5) # React hydration buffer
for selector in selectors.split(','):
try:
locator = page.locator(selector.strip())
if locator.count() > 0:
locator.first.wait_for(state='visible', timeout=timeout)
return True
except:
continue
# Fallback: wait for substantial body content
try:
page.wait_for_function(
'document.body.innerText.length > 100',
timeout=timeout
)
return True
except:
return False
Headless Mode: Preventing Window Spam
Always use headless=True to prevent browser windows from spawning:
from playwright.sync_api import sync_playwright
with sync_playwright() as p: # CRITICAL: headless=True prevents visible browser windows browser = p.chromium.launch(headless=True)
context = browser.new_context(
viewport={'width': 1280, 'height': 800},
color_scheme='dark' # Initial theme
)
page = context.new_page()
# ... your test logic ...
browser.close() # Always clean up
Theme Testing Pattern
Dark mode screenshot
page.emulate_media(color_scheme='dark') # Note: on PAGE, not context page.goto(url) wait_for_react_content(page, '.app-container, main, h1') page.screenshot(path='dark.png', full_page=True)
Light mode screenshot
page.emulate_media(color_scheme='light') page.reload() wait_for_react_content(page, '.app-container, main, h1') page.screenshot(path='light.png', full_page=True)
LLM Screenshot Analysis Patterns
Pattern 1: Content Verification
Prompt: "Analyze this screenshot. Answer:
- Is the main content rendered (not blank/loading)?
- What major UI elements are visible?
- Are there any error states or broken layouts?
- Rate content completeness: FULL / PARTIAL / EMPTY"
Pattern 2: Theme Compliance
Prompt: "This is a {dark/light} mode screenshot. Verify:
- Background color matches expected theme (dark bg for dark mode)
- Text has sufficient contrast against background
- Interactive elements are visible and styled correctly
- No theme leakage (dark elements on light bg or vice versa)"
Pattern 3: Comparison Analysis
Prompt: "Compare these two screenshots (before/after). Identify:
- What changed between them?
- Are changes intentional (theme switch) or bugs?
- Is any content missing in the 'after' version?
- Rate similarity: IDENTICAL / MINOR_DIFF / MAJOR_DIFF / BROKEN"
Pattern 4: Accessibility Check
Prompt: "Evaluate this screenshot for visual accessibility:
- Is text readable (sufficient size and contrast)?
- Are interactive elements clearly identifiable?
- Is there visual hierarchy (headings, sections)?
- Any elements that would fail WCAG contrast requirements?"
Complete Test Script Template
#!/usr/bin/env python3 """ LLM-Powered Screenshot Test Suite Captures screenshots and uses Claude vision for semantic analysis. """
from playwright.sync_api import sync_playwright import os import time
PAGES_TO_TEST = [ # (path, name, content_selectors) ('/', 'Home', '.hero, main, h1'), ('/about', 'About', '.about-content, main, h1'), ('/dashboard', 'Dashboard', '.dashboard, .stats, h1'), ]
BASE_URL = 'http://localhost:5173' SCREENSHOT_DIR = '/tmp/visual-tests'
def wait_for_content(page, selectors, timeout=10000): """Wait for React/Vue/Svelte to hydrate.""" page.wait_for_load_state('domcontentloaded') page.wait_for_load_state('networkidle') time.sleep(0.5)
for selector in selectors.split(','):
try:
loc = page.locator(selector.strip())
if loc.count() > 0:
loc.first.wait_for(state='visible', timeout=timeout)
return True
except:
continue
try:
page.wait_for_function('document.body.innerText.length > 100', timeout=timeout)
return True
except:
return False
def capture_themed_screenshots(page, url, name, selectors): """Capture both dark and light mode screenshots.""" safe_name = name.lower().replace(' ', '-') results = {'name': name, 'url': url}
for theme in ['dark', 'light']:
page.emulate_media(color_scheme=theme)
if theme == 'dark':
page.goto(url, wait_until='domcontentloaded')
else:
page.reload(wait_until='domcontentloaded')
content_loaded = wait_for_content(page, selectors)
if not content_loaded:
print(f" ⚠️ {theme} mode: Content slow to load, waiting...")
time.sleep(2)
screenshot_path = f'{SCREENSHOT_DIR}/{safe_name}-{theme}.png'
page.screenshot(path=screenshot_path, full_page=True)
# Check content length
body_text = page.locator('body').inner_text().strip()
results[f'{theme}_screenshot'] = screenshot_path
results[f'{theme}_content_length'] = len(body_text)
results[f'{theme}_has_content'] = len(body_text) > 50
print(f" {theme}: {'✅' if results[f'{theme}_has_content'] else '❌'} ({len(body_text)} chars)")
return results
def run_tests(): """Run visual tests on all pages.""" os.makedirs(SCREENSHOT_DIR, exist_ok=True)
with sync_playwright() as p:
browser = p.chromium.launch(headless=True)
context = browser.new_context(
viewport={'width': 1280, 'height': 800},
color_scheme='dark'
)
page = context.new_page()
# Capture console errors
errors = []
page.on('console', lambda m: errors.append(m.text) if m.type == 'error' else None)
results = []
for path, name, selectors in PAGES_TO_TEST:
print(f"Testing {name}...")
url = f'{BASE_URL}{path}'
result = capture_themed_screenshots(page, url, name, selectors)
result['errors'] = list(errors)
errors.clear()
results.append(result)
browser.close()
# Summary
print("\n" + "=" * 50)
print("VISUAL TEST SUMMARY")
print("=" * 50)
passed = sum(1 for r in results
if r.get('dark_has_content') and r.get('light_has_content'))
print(f"\nPassed: {passed}/{len(results)}")
print(f"Screenshots: {SCREENSHOT_DIR}")
return results
if name == 'main': run_tests()
MCP vs Native Playwright Decision Tree
What are you doing? │ ├─ Interactive debugging / exploring │ └─► Playwright MCP (see live browser) │ ├─ Automated test suite │ └─► Native Python Playwright (headless) │ ├─ CI/CD pipeline │ └─► Native Python Playwright (headless) │ ├─ Screenshot capture for LLM analysis │ └─► Native Python Playwright (headless) │ └─ One-off inspection └─► Either works, MCP is convenient
Common Failures and Fixes
Failure: Blank Screenshots
Cause: Screenshot taken before React hydrates Fix: Wait for content selectors, add hydration buffer
Failure: "Reconnecting..." Badge Visible
Cause: HMR/WebSocket not connected (cosmetic in tests) Fix: This is often fine - focus on actual content
Failure: Theme Not Applied
Cause: emulate_media called on context instead of page Fix: Use page.emulate_media(color_scheme='dark')
Failure: Browser Windows Spawning
Cause: headless=False or using MCP instead of native Fix: Use p.chromium.launch(headless=True)
Failure: Timeout on Content
Cause: Wrong selectors or page actually broken Fix: Verify selectors exist, check console errors
Integration with Claude Code
When Claude reads screenshots captured by this pattern:
-
Request specific analysis: Don't just show screenshot - ask targeted questions
-
Provide context: "This should be dark mode" or "This is the login page"
-
Compare systematically: Before/after, dark/light, desktop/mobile
-
Trust semantic analysis: LLM can tell "blank page" from "content loaded"
References
Research Papers
-
Using Vision LLMs For UI Testing - University of Washington
-
Vision-driven Automated Mobile GUI Testing - Multimodal LLM approach
-
ScreenLLM: Stateful Screen Schema - UI understanding framework
Tools & Integrations
-
Building an AI QA Engineer with Claude + Playwright
-
AI-Powered Visual Testing in Playwright
-
Playwright Visual Regression Testing Guide
Official Documentation
- Playwright Visual Comparisons
Version History
-
2026-01-23: Initial skill creation
-
Researched multimodal LLM screenshot analysis best practices
-
Documented React hydration waiting patterns
-
Added headless mode requirements
-
Created complete test script template
Core Insight: The difference between useless and useful screenshot tests is waiting for content, not just network. LLMs can analyze semantics, but only if there's actually content to analyze.