Visual Testing
Evidential Frame Activation
Source verification mode is active.
[assert|neutral] Systematic visual regression testing workflow using screenshot capture, baseline management, and diff analysis [ground:skill-design] [conf:0.92] [state:confirmed]
Overview
Visual Testing specializes in detecting unintended UI changes through screenshot-based comparison. Unlike browser-automation, which focuses on interaction sequences, this skill prioritizes pixel-level visual validation across multiple viewports and device configurations.
Philosophy: Visual bugs often escape unit and integration tests because those tests check behavior, not appearance. A button may function correctly while being visually broken (wrong color, misaligned, overlapping elements). Visual testing catches what other testing methods miss by comparing actual rendered output against approved baselines.
Methodology: Six-phase workflow with baseline management:
- PLAN Phase: Sequential-thinking MCP decomposes visual test cases with viewport configurations
- NAVIGATE Phase: Position page in correct state for capture
- CAPTURE Phase: Multi-viewport screenshot collection with zoom for detail inspection
- COMPARE Phase: Pixel-level diff against baseline (if exists)
- REPORT Phase: Generate visual regression report with highlighted changes
- BASELINE Phase: Update golden images (with approval) or flag regression
Value Proposition: Reduce visual bug escapes (target: 85%) through systematic screenshot comparison. Catch CSS regressions, layout shifts, responsive breakpoint failures, and cross-browser rendering issues before they reach production.
Key Differentiation from browser-automation:
| Aspect | browser-automation | visual-testing |
| --- | --- | --- |
| Focus | Interaction sequences | Visual state capture |
| Output | Workflow completion | Diff reports |
| Validation | Functional success | Pixel comparison |
| Artifacts | Execution logs | Baseline images + diffs |
| Primary Use | E2E workflows | Regression detection |
When to Use This Skill
Trigger Thresholds:
| Scenario | Recommendation |
| --- | --- |
| Single page screenshot | Use computer tool directly (too simple) |
| 2-5 page visual checks | Consider this skill |
| Multi-viewport responsive testing | Mandatory use |
| Baseline comparison needed | Mandatory use |
| Design system validation | Mandatory use |
Primary Use Cases:
- CSS regression detection after style changes
- Responsive layout validation across breakpoints
- Component library visual testing
- Design system compliance checking
- Cross-browser rendering comparison
- Animation and transition capture (via GIF)
- Before/after deployment comparison
Apply When:
- Deploying UI changes that may affect multiple pages
- Validating responsive breakpoints work correctly
- Ensuring design system tokens apply consistently
- Comparing staging vs production appearance
- Documenting UI states for handoff
When NOT to Use This Skill
- Functional testing without visual validation (use e2e-test)
- Simple navigation workflows (use browser-automation)
- API testing or data validation (no visual component)
- Performance testing (use load-test skills)
- Accessibility audits (use specialized a11y tools)
Core Principles
Visual Testing operates on 5 fundamental principles:
Principle 1: Baseline-First Approach
Golden images (baselines) serve as the source of truth. Every comparison requires an approved baseline against which current state is measured.
Rationale: Without baselines, visual testing becomes subjective screenshot collection. Baselines make quality objective and measurable.
In Practice:
- Capture initial baselines for all critical pages/viewports
- Store baselines in Memory MCP with project/page/viewport keys
- Version baselines (ISO8601 timestamps) for rollback capability
- Require explicit approval before baseline updates
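The keying and versioning practices above can be sketched as follows. This is a minimal illustration, not the skill's actual storage code: `baselineKey` and `newBaselineVersion` are hypothetical helpers, the slug rules are an assumption, and the namespace prefix is taken from the Phase 4 description in this document.

```javascript
// Hypothetical helper: builds a storage key for one baseline.
// The namespace layout follows the Phase 4 pattern in this document;
// the page-slug rules are an assumption.
function baselineKey(project, page, viewport) {
  // "/" becomes "root"; "/pricing/teams" becomes "pricing_teams"
  const pageSlug = page.replace(/^\/+|\/+$/g, "").replace(/\//g, "_") || "root";
  return `visual-testing/baselines/${project}/${pageSlug}/${viewport}`;
}

// Hypothetical record shape: the ISO 8601 timestamp doubles as the version id.
// Refuses to create a version without a named approver.
function newBaselineVersion(key, imageId, approvedBy, now = new Date()) {
  if (!approvedBy) {
    throw new Error("Baseline updates require explicit approval");
  }
  return { key, imageId, approvedBy, capturedAt: now.toISOString() };
}

console.log(baselineKey("shop", "/pricing/teams", "iphone_14"));
```

Making the approver a required argument is one way to keep silent baseline updates from slipping through; a real workflow would likely record more context than a name.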
Principle 2: Multi-Viewport Coverage
Test across multiple viewport configurations to catch responsive regressions that only appear at specific breakpoints.
Rationale: Most visual bugs manifest at edge cases - unusual screen widths, portrait vs landscape, mobile vs desktop. Single-viewport testing misses these.
In Practice:
- Always test at minimum 3 viewports (mobile, tablet, desktop)
- Include both portrait and landscape orientations
- Use standardized viewport presets for consistency
- Document viewport matrix in test plan
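One way to turn these practices into a concrete matrix is sketched below. `expandMatrix` is a hypothetical helper; the three presets reuse dimensions from this document's preset table, and the choice to leave laptop/desktop presets unrotated is an assumption.

```javascript
// Subset of presets; dimensions copied from this document's viewport presets.
const PRESETS = {
  iphone_14: { width: 390, height: 844 },
  ipad_pro_11: { width: 834, height: 1194 },
  desktop_hd: { width: 1920, height: 1080 },
};

// Hypothetical helper: expands preset names into portrait and landscape
// entries. Laptop/desktop presets are kept as-is (assumption: not rotated).
function expandMatrix(names, presets = PRESETS) {
  const out = [];
  for (const name of names) {
    const p = presets[name];
    if (!p) throw new Error(`Unknown viewport preset: ${name}`);
    const short = Math.min(p.width, p.height);
    const long = Math.max(p.width, p.height);
    if (name.startsWith("desktop") || name.startsWith("laptop")) {
      out.push({ name, orientation: "landscape", width: p.width, height: p.height });
    } else {
      out.push({ name, orientation: "portrait", width: short, height: long });
      out.push({ name, orientation: "landscape", width: long, height: short });
    }
  }
  return out;
}

console.log(expandMatrix(["iphone_14", "desktop_hd"]));
```

Swapping width and height only approximates a rotated device (browser chrome differs between orientations), so treat the landscape entries as a starting point.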
Principle 3: Threshold-Based Comparison
Not every pixel difference is a regression. Configure tolerance thresholds to distinguish intentional changes from bugs.
Rationale: Anti-aliasing, font rendering, and timing-dependent animations create non-deterministic pixel variations. Zero-tolerance comparison produces false positives.
In Practice:
- Set default threshold at 0.1% pixel difference (99.9% match required)
- Use higher thresholds for animation-heavy pages (1-2%)
- Ignore specific regions known for dynamic content (timestamps, ads)
- Track threshold effectiveness and tune over time
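As a rough illustration of threshold-based comparison, the sketch below classifies a diff against a named profile. The profile values mirror the threshold configuration defined later in this document; `evaluateDiff` itself is a hypothetical helper.

```javascript
// Threshold values (percent of differing pixels) mirror the profiles
// defined in this document's diff threshold configuration.
const PROFILES = { strict: 0.01, default: 0.1, relaxed: 1.0, animation: 5.0 };

// Hypothetical helper: classify a diff against a named profile.
function evaluateDiff(diffPixels, totalPixels, profile = "default") {
  if (!Object.prototype.hasOwnProperty.call(PROFILES, profile)) {
    throw new Error(`Unknown profile: ${profile}`);
  }
  if (totalPixels <= 0) throw new Error("totalPixels must be positive");
  const diffPercent = (diffPixels / totalPixels) * 100;
  const threshold = PROFILES[profile];
  return { diffPercent, threshold, passed: diffPercent <= threshold };
}

// A 1920x1080 capture with 1,500 differing pixels is about 0.072%:
// it passes the default profile but fails the strict one.
console.log(evaluateDiff(1500, 1920 * 1080, "default").passed);
console.log(evaluateDiff(1500, 1920 * 1080, "strict").passed);
```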
Principle 4: Element-Level Zoom for Precision
Use the zoom tool for detailed inspection of specific UI elements when full-page screenshots are insufficient.
Rationale: Small elements (icons, badges, indicators) may have regressions invisible at full-page scale. Zoomed captures reveal micro-regressions.
In Practice:
- Capture full page first, then zoom to critical elements
- Define zoom regions in test plan (coordinates or element refs)
- Compare zoomed regions independently
- Document element-level baselines separately
Principle 5: GIF Recording for Interactions
Static screenshots miss animation and transition regressions. Use GIF recording to capture temporal UI behavior.
Rationale: CSS animations, hover states, loading sequences, and micro-interactions are visible only in motion. GIF recording captures these temporal aspects.
In Practice:
- Record GIFs for pages with significant animations
- Capture hover/focus/active states in sequence
- Use GIFs for documenting before/after comparisons
- Store GIFs with interaction metadata
Production Guardrails
MCP Preflight Check Protocol
Before executing visual tests, validate required MCPs:
Preflight Sequence:
async function visualTestPreflight() {
  const checks = {
    sequential_thinking: false,
    claude_in_chrome: false,
    memory_mcp: false
  };

  // Check sequential-thinking MCP (required for planning)
  try {
    await mcp__sequential-thinking__sequentialthinking({
      thought: "Visual test preflight - verifying MCP availability",
      thoughtNumber: 1,
      totalThoughts: 1,
      nextThoughtNeeded: false
    });
    checks.sequential_thinking = true;
  } catch (error) {
    throw new Error("CRITICAL: sequential-thinking MCP required for visual test planning");
  }

  // Check claude-in-chrome MCP (required for capture)
  try {
    const context = await mcp__claude-in-chrome__tabs_context_mcp({});
    checks.claude_in_chrome = true;
  } catch (error) {
    throw new Error("CRITICAL: claude-in-chrome MCP required for screenshot capture");
  }

  // Check memory-mcp (required for baseline storage)
  try {
    // Memory MCP check (placeholder: no probe call is made here,
    // so this step cannot fail as written)
    checks.memory_mcp = true;
  } catch (error) {
    throw new Error("CRITICAL: memory-mcp required for baseline storage");
  }

  return checks;
}
Viewport Preset Configuration
Standard Viewport Matrix:
const VIEWPORT_PRESETS = {
  // Mobile Devices
  iphone_se: { width: 375, height: 667, name: "iPhone SE" },
  iphone_14: { width: 390, height: 844, name: "iPhone 14" },
  iphone_14_pro_max: { width: 430, height: 932, name: "iPhone 14 Pro Max" },
  pixel_7: { width: 412, height: 915, name: "Pixel 7" },

  // Tablets
  ipad_mini: { width: 768, height: 1024, name: "iPad Mini" },
  ipad_pro_11: { width: 834, height: 1194, name: "iPad Pro 11" },
  ipad_pro_12: { width: 1024, height: 1366, name: "iPad Pro 12.9" },

  // Desktop
  laptop_sm: { width: 1280, height: 720, name: "Laptop Small (720p)" },
  laptop_md: { width: 1440, height: 900, name: "Laptop Medium" },
  desktop_hd: { width: 1920, height: 1080, name: "Desktop Full HD" },
  // 2560x1440 is QHD (1440p), not 4K, so key and name both say QHD
  desktop_qhd: { width: 2560, height: 1440, name: "Desktop QHD (1440p)" }
};

// Standard test matrix (most common)
const STANDARD_MATRIX = ["iphone_14", "ipad_pro_11", "desktop_hd"];

// Extended test matrix (comprehensive)
const EXTENDED_MATRIX = [
  "iphone_se", "iphone_14_pro_max", "pixel_7",
  "ipad_mini", "ipad_pro_12",
  "laptop_sm", "desktop_hd", "desktop_qhd"
];
Diff Threshold Configuration
const DIFF_THRESHOLDS = {
  // Strict (design system components)
  strict: {
    pixelDiff: 0.01, // 0.01% tolerance (nearly pixel-perfect)
    description: "For design system components requiring exact match"
  },

  // Default (most pages)
  default: {
    pixelDiff: 0.1, // 0.1% tolerance
    description: "Standard threshold for most UI testing"
  },

  // Relaxed (dynamic content)
  relaxed: {
    pixelDiff: 1.0, // 1% tolerance
    description: "For pages with minor dynamic variations"
  },

  // Animation (high variance)
  animation: {
    pixelDiff: 5.0, // 5% tolerance
    description: "For animation captures with timing variance"
  }
};
Error Handling Framework
Error Categories:
| Category | Example | Recovery Strategy |
| --- | --- | --- |
| MCP_UNAVAILABLE | claude-in-chrome offline | ABORT - cannot proceed |
| NAVIGATION_FAILED | Page timeout/404 | Retry 3x with backoff |
| CAPTURE_FAILED | Screenshot error | Retry with fresh tab |
| BASELINE_MISSING | No golden image | Prompt for baseline creation |
| COMPARISON_FAILED | Diff computation error | Log and skip, flag for review |
| THRESHOLD_EXCEEDED | Visual regression detected | Generate report, flag issue |
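The "retry 3x with backoff" strategy for NAVIGATION_FAILED could look roughly like the sketch below. The exponential schedule and the 500 ms base delay are assumptions (the table does not specify them), and `withRetry` and `backoffDelays` are hypothetical helpers.

```javascript
// Hypothetical helper: delay schedule for the retries.
// Exponential growth and the base delay are assumptions.
function backoffDelays(retries = 3, baseMs = 500) {
  return Array.from({ length: retries }, (_, i) => baseMs * 2 ** i);
}

// Runs `fn` once, then retries up to `retries` times, waiting between attempts.
// `sleep` is injectable so callers (and tests) can skip real waiting.
async function withRetry(fn, { retries = 3, baseMs = 500, sleep } = {}) {
  const wait = sleep || ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  const delays = backoffDelays(retries, baseMs);
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < retries) await wait(delays[attempt]);
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}

console.log(backoffDelays(3, 500)); // [ 500, 1000, 2000 ]
```

A caller would wrap the navigation step, for example `withRetry(() => navigateTo(url))`, where `navigateTo` stands in for whatever performs the page load.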
Main Workflow
Phase 1: Test Planning (MANDATORY)
Purpose: Define visual test scope using sequential-thinking decomposition.
Process:
- Invoke sequential-thinking MCP
- Identify target pages/URLs
- Select viewport configurations
- Define capture regions (full page, element-specific)
- Set comparison thresholds
- Plan interaction sequences for state-dependent captures
Input Contract:
inputs:
  target_url: string             # URL to test
  pages: list[string]            # Page paths to capture
  viewport_matrix: list[string]  # Viewport presets to use
  capture_mode: string           # "full_page" | "element" | "both"
  threshold_profile: string      # "strict" | "default" | "relaxed"
  interaction_sequence: list     # Optional: actions before capture
Output Contract:
outputs:
  test_plan:
    pages: list[PagePlan]
    viewports: list[ViewportConfig]
    capture_points: list[CapturePoint]
    threshold: number
Phase 2: Navigation & State Setup
Purpose: Navigate to target page and establish correct state for capture.
Process:
- Get/create tab context (tabs_context_mcp, tabs_create_mcp)
- Navigate to target URL
- Wait for page load completion
- Execute interaction sequence if needed (login, scroll, hover)
- Verify page state ready for capture
Agent: Task("Setup page state", "Acting as browser-specialist: Navigate to URL, wait for full load, execute any required interactions to reach target state", "general-purpose")
Phase 3: Multi-Viewport Capture
Purpose: Capture screenshots across all configured viewports.
Process:
For each viewport in viewport_matrix:
- Resize window (resize_window)
- Wait for reflow (wait 500ms)
- Capture full page (computer screenshot)
- Capture zoomed regions if configured (computer zoom)
- Store capture with viewport/page metadata
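The loop above can be sketched as a planner that emits an ordered step list before anything is executed. Tool names (`navigate`, `resize_window`, `computer`) come from this document; `buildCaptureSteps` and the step object shape are hypothetical, chosen only for illustration.

```javascript
// Hypothetical planner: turns pages x viewports into an ordered step list.
// The step object shape is an assumption, not the tools' real call format.
function buildCaptureSteps(pages, viewports, zoomRegions = []) {
  const steps = [];
  for (const page of pages) {
    steps.push({ tool: "navigate", page });
    for (const vp of viewports) {
      steps.push({ tool: "resize_window", width: vp.width, height: vp.height });
      // 0.5 s pause so layout reflow finishes before capture
      steps.push({ tool: "computer", action: "wait", duration: 0.5 });
      steps.push({ tool: "computer", action: "screenshot", page, viewport: vp.name });
      for (const region of zoomRegions) {
        steps.push({ tool: "computer", action: "zoom", region, page, viewport: vp.name });
      }
    }
  }
  return steps;
}

const demoViewports = [
  { name: "iphone_14", width: 390, height: 844 },
  { name: "desktop_hd", width: 1920, height: 1080 },
];
console.log(buildCaptureSteps(["/"], demoViewports).length); // 7
```

Planning the steps as data first makes it straightforward to review the matrix, or to checkpoint and resume partway through a long run.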
Key Tools:
- resize_window: Set viewport dimensions
- computer (screenshot): Full page capture
- computer (zoom): Element-level detail capture
- gif_creator: For interaction sequences
Phase 4: Baseline Comparison
Purpose: Compare current captures against stored baselines.
Process:
- Query Memory MCP for baseline (namespace: visual-testing/baselines/{project}/{page}/{viewport})
- If baseline exists:
  - Compute pixel diff percentage
  - Generate diff visualization (highlight changed pixels)
  - Apply threshold comparison
- If baseline missing:
  - Flag as "new baseline needed"
  - Prompt for approval
Comparison Algorithm:
function compareScreenshots(current, baseline, threshold) {
  const totalPixels = current.width * current.height;
  let diffPixels = 0;

  for (let y = 0; y < current.height; y++) {
    for (let x = 0; x < current.width; x++) {
      if (!pixelsMatch(current.getPixel(x, y), baseline.getPixel(x, y))) {
        diffPixels++;
      }
    }
  }

  const diffPercent = (diffPixels / totalPixels) * 100;
  return {
    passed: diffPercent <= threshold,
    diffPercent: diffPercent,
    diffPixels: diffPixels,
    totalPixels: totalPixels
  };
}
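The comparison algorithm above calls `pixelsMatch` without defining it and assumes both images share the same dimensions. A minimal sketch of both pieces follows; the `[r, g, b, a]` pixel representation, the per-channel tolerance of 2, and the `assertSameSize` guard are assumptions rather than part of the original algorithm.

```javascript
// Assumption: pixels are [r, g, b, a] arrays with 0-255 channels.
// A small per-channel tolerance absorbs anti-aliasing noise.
function pixelsMatch(a, b, channelTolerance = 2) {
  for (let i = 0; i < 4; i++) {
    if (Math.abs(a[i] - b[i]) > channelTolerance) return false;
  }
  return true;
}

// Guard to run before the pixel loop: diffing images of different
// sizes (for example, captures from different viewports) is invalid.
function assertSameSize(current, baseline) {
  if (current.width !== baseline.width || current.height !== baseline.height) {
    throw new Error(
      `Size mismatch: ${current.width}x${current.height} vs ${baseline.width}x${baseline.height}`
    );
  }
}

console.log(pixelsMatch([10, 10, 10, 255], [11, 12, 10, 255])); // true
console.log(pixelsMatch([10, 10, 10, 255], [14, 10, 10, 255])); // false
```

Calling `assertSameSize(current, baseline)` at the top of `compareScreenshots` would turn a viewport mismatch into an explicit error instead of a misleading diff percentage.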
Phase 5: Report Generation
Purpose: Generate comprehensive visual regression report.
Process:
- Aggregate comparison results across all pages/viewports
- Generate summary (pass/fail counts, worst regressions)
- Create diff visualizations (side-by-side, overlay, diff-only)
- Include metadata (timestamps, viewport configs, thresholds)
- Store report in Memory MCP
Report Structure:
visual_regression_report:
  timestamp: ISO8601
  project: string
  summary:
    total_captures: number
    passed: number
    failed: number
    new_baselines: number
  failures:
    - page: string
      viewport: string
      diff_percent: number
      threshold: number
      baseline_timestamp: ISO8601
      current_capture_id: string
  metadata:
    viewports_tested: list
    threshold_profile: string
    duration_ms: number
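The aggregation step that fills the `summary` and `failures` fields might look like the sketch below. The input shape (one result per capture, with `comparison` set to `null` when no baseline existed) is an assumption modeled on the skill output contract in this document; `summarize` is a hypothetical helper.

```javascript
// Hypothetical input: one result per page/viewport capture.
// `comparison` is null when no baseline existed for that capture.
function summarize(results) {
  const summary = { total_captures: results.length, passed: 0, failed: 0, new_baselines: 0 };
  const failures = [];
  for (const r of results) {
    if (r.comparison === null) {
      summary.new_baselines++;
    } else if (r.comparison.passed) {
      summary.passed++;
    } else {
      summary.failed++;
      failures.push({
        page: r.page,
        viewport: r.viewport,
        diff_percent: r.comparison.diff_percent,
        threshold: r.comparison.threshold,
      });
    }
  }
  // Worst regressions first
  failures.sort((a, b) => b.diff_percent - a.diff_percent);
  return { summary, failures };
}

const exampleResults = [
  { page: "/", viewport: "iphone_14", comparison: { passed: true, diff_percent: 0.02, threshold: 0.1 } },
  { page: "/", viewport: "desktop_hd", comparison: { passed: false, diff_percent: 0.4, threshold: 0.1 } },
  { page: "/new", viewport: "iphone_14", comparison: null },
];
console.log(summarize(exampleResults).summary);
```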
Phase 6: Baseline Management
Purpose: Update baselines when changes are intentional.
Process:
- For failed comparisons, determine if change is intentional
- If intentional: Update baseline with approval
- If regression: Flag for fix
- For new pages: Create initial baseline with approval
- Version old baselines (keep 5 most recent)
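Retaining only the five most recent versions can be sketched as a pure function over version records. The record shape (`capturedAt` as an ISO 8601 UTC string) and the `pruneVersions` helper are assumptions for illustration.

```javascript
// Hypothetical helper: keep the `keep` newest versions, return the rest
// so the caller can delete or archive them.
// ISO 8601 UTC timestamps in one format sort chronologically as plain strings.
function pruneVersions(versions, keep = 5) {
  const sorted = [...versions].sort((a, b) => {
    if (a.capturedAt < b.capturedAt) return 1;
    if (a.capturedAt > b.capturedAt) return -1;
    return 0;
  });
  return { kept: sorted.slice(0, keep), removed: sorted.slice(keep) };
}

const exampleVersions = [1, 2, 3, 4, 5, 6, 7].map((day) => ({
  version: day,
  capturedAt: `2024-01-0${day}T00:00:00.000Z`,
}));
const pruned = pruneVersions(exampleVersions);
console.log(pruned.kept.map((v) => v.version));    // [ 7, 6, 5, 4, 3 ]
console.log(pruned.removed.map((v) => v.version)); // [ 2, 1 ]
```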
Baseline Storage Schema:
baseline:
  namespace: "visual-testing/baselines/{project}/{page}/{viewport}"
  data:
    image_id: string        # Reference to stored screenshot
    captured_at: ISO8601
    approved_by: string
    threshold_used: number
    viewport: object
    url: string
    version: number
  tags:
    WHO: "visual-testing:1.0.0"
    WHEN: ISO8601
    PROJECT: string
    WHY: "baseline-capture"
LEARNED PATTERNS
High Confidence [conf:0.90+]
No patterns recorded yet. This section will be updated through Loop 1.5 reflection.
Medium Confidence [conf:0.70-0.89]
No patterns recorded yet.
Low Confidence [conf:0.50-0.69]
No patterns recorded yet.
Pattern Recognition
Different visual testing scenarios require different approaches:
Responsive Layout Testing
Patterns: "responsive", "breakpoint", "mobile", "tablet", "desktop", "viewport"
Common Characteristics:
- Multiple viewport configurations required
- Layout shifts are primary concern
- Element visibility/hiding at breakpoints
- Text wrapping and overflow behavior
Key Focus:
- Breakpoint transitions (where layouts shift)
- Navigation collapse/expand behavior
- Grid/flex layout stability
- Touch target sizing on mobile
Approach: Use extended viewport matrix, focus on breakpoint edge cases (width +/- 10px from breakpoint)
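The "width +/- 10px from breakpoint" idea can be made concrete with a small generator. `breakpointEdgeWidths` is a hypothetical helper, and the breakpoint values in the usage line (768 and 1024) are example inputs rather than values prescribed by this document.

```javascript
// Hypothetical helper: widths to test just below, at, and just above
// each CSS breakpoint. Duplicates are removed and results are sorted.
function breakpointEdgeWidths(breakpoints, delta = 10) {
  const widths = new Set();
  for (const bp of breakpoints) {
    widths.add(bp - delta);
    widths.add(bp);
    widths.add(bp + delta);
  }
  return [...widths].sort((a, b) => a - b);
}

console.log(breakpointEdgeWidths([768, 1024]));
// [ 758, 768, 778, 1014, 1024, 1034 ]
```

Each width would then be paired with a fixed height and passed to `resize_window` in the capture phase.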
Component Visual Testing
Patterns: "component", "button", "card", "form", "modal", "dropdown"
Common Characteristics:
- Isolated element testing
- State variations (default, hover, active, disabled, error)
- Strict threshold requirements
- Design token compliance
Key Focus:
- Color accuracy (design tokens)
- Spacing consistency
- Typography rendering
- Border/shadow rendering
Approach: Use zoom tool for detailed capture, strict threshold, capture all states via interaction sequence
Animation/Transition Testing
Patterns: "animation", "transition", "hover", "loading", "skeleton"
Common Characteristics:
- Temporal behavior (not single frame)
- GIF recording required
- Higher diff thresholds due to timing variance
- Performance-sensitive
Key Focus:
- Animation timing correctness
- Transition smoothness
- Loading state appearance
- Skeleton to content transition
Approach: Use gif_creator for recording, relaxed/animation threshold profile, capture key frames
Cross-Environment Comparison
Patterns: "staging vs production", "before after", "compare", "deploy validation"
Common Characteristics:
- Two distinct environments/states
- Side-by-side comparison needed
- May have expected differences (content)
- Focus on structural consistency
Key Focus:
- Layout structure stability
- Component presence/absence
- Style application consistency
- No unexpected visual changes
Approach: Capture both states, generate side-by-side diff, use relaxed threshold for content areas
Advanced Techniques
Audience-Specific Testing
Different stakeholders need different visual test outputs:
Developers: Technical diffs with pixel coordinates, DOM structure comparison, CSS property changes
Designers: Visual overlays, color accuracy reports, spacing measurements, design token compliance
QA Team: Pass/fail summaries, regression counts, trend reports, baseline approval queues
Executives: High-level dashboards, regression trends, release readiness indicators
Ignore Regions Configuration
For pages with dynamic content, configure ignore regions to prevent false positives:
const IGNORE_REGIONS = {
  common: [
    { selector: "[data-testid='timestamp']", reason: "Dynamic timestamp" },
    { selector: ".ad-container", reason: "Third-party ads" },
    { selector: ".live-chat-widget", reason: "Chat widget state varies" }
  ],
  page_specific: {
    "/dashboard": [
      { selector: ".metric-value", reason: "Live metrics" },
      { selector: ".user-avatar", reason: "User-specific content" }
    ]
  }
};
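How ignore regions feed into the diff is not spelled out above. One possible approach, sketched here, is to resolve each selector to a pixel rectangle and skip those pixels when counting differences. The rectangle shape, the `differs` predicate, and both helper names are assumptions.

```javascript
// Assumption: each ignore selector has already been resolved to a pixel
// rectangle { x, y, width, height } (for example, via the `find` tool).
function isIgnored(x, y, rects) {
  return rects.some(
    (r) => x >= r.x && x < r.x + r.width && y >= r.y && y < r.y + r.height
  );
}

// Counts differing pixels outside the ignored rectangles.
// `differs(x, y)` is a caller-supplied predicate (hypothetical).
function countDiffOutsideIgnored(width, height, rects, differs) {
  let diffPixels = 0;
  let comparedPixels = 0;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (isIgnored(x, y, rects)) continue;
      comparedPixels++;
      if (differs(x, y)) diffPixels++;
    }
  }
  return { diffPixels, comparedPixels };
}

// 10x10 image whose left half is ignored; every pixel "differs".
const leftHalf = [{ x: 0, y: 0, width: 5, height: 10 }];
console.log(countDiffOutsideIgnored(10, 10, leftHalf, () => true));
// { diffPixels: 50, comparedPixels: 50 }
```

Dividing by `comparedPixels` rather than the full image area keeps the diff percentage meaningful when large regions are masked.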
Multi-Model Validation
For critical visual tests, use LLM Council for consensus:
// When visual diff is borderline (threshold +/- 0.5%)
async function multiModelVisualValidation(current, baseline, diff) {
  const prompt = `
    Analyze this visual comparison:
    - Diff percentage: ${diff.diffPercent}%
    - Changed pixels: ${diff.diffPixels}
    - Threshold: ${diff.threshold}%

    Is this change:
    A) Intentional design update (approve new baseline)
    B) Unintentional regression (flag for fix)
    C) Acceptable variation (pass with note)

    Provide reasoning.
  `;

  // Route to Gemini for image analysis capability
  return await geminiAnalyze(current, baseline, prompt);
}
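The "borderline" condition mentioned in the comment above (threshold +/- 0.5%) could gate escalation roughly as follows. `isBorderline` and `routeDiff` are hypothetical helpers; treating a zero diff as an unconditional pass is an added assumption.

```javascript
// Hypothetical gate: a diff is borderline when it sits within `band`
// percentage points of the threshold (0.5 per the comment above).
function isBorderline(diffPercent, threshold, band = 0.5) {
  return Math.abs(diffPercent - threshold) <= band;
}

// Hypothetical router: decide whether to pass, fail, or escalate to
// multi-model review. A zero diff always passes (assumption).
function routeDiff(diffPercent, threshold, band = 0.5) {
  if (diffPercent === 0) return "pass";
  if (isBorderline(diffPercent, threshold, band)) return "escalate";
  return diffPercent <= threshold ? "pass" : "fail";
}

console.log(routeDiff(1.3, 1.0)); // "escalate"
console.log(routeDiff(2.0, 1.0)); // "fail"
console.log(routeDiff(0.2, 1.0)); // "pass"
```

With small thresholds such as the default 0.1%, a fixed 0.5-point band covers every passing diff, so a band proportional to the threshold may suit those profiles better.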
Common Anti-Patterns
Avoid these common mistakes:
Capture Anti-Patterns
| Anti-Pattern | Problem | Solution |
| --- | --- | --- |
| No wait after resize | Captures before reflow complete | Add 500ms wait after resize_window |
| Ignoring async content | Missing dynamically loaded elements | Wait for network idle or specific selectors |
| Single viewport only | Missing responsive regressions | Use minimum 3 viewports (mobile, tablet, desktop) |
| Capturing during animation | Non-deterministic frames | Wait for animations or use GIF |
Comparison Anti-Patterns
| Anti-Pattern | Problem | Solution |
| --- | --- | --- |
| Zero tolerance | False positives from anti-aliasing | Use minimum 0.01% threshold |
| No baseline versioning | Cannot rollback bad baseline | Version baselines with timestamps |
| Comparing different viewports | Invalid diff | Validate viewport match before compare |
| No ignore regions | Dynamic content causes failures | Configure ignore regions for timestamps, ads |
Workflow Anti-Patterns
| Anti-Pattern | Problem | Solution |
| --- | --- | --- |
| Skip planning phase | Missing edge cases | ALWAYS use sequential-thinking first |
| No interaction before capture | Missing auth/state-dependent pages | Plan interaction sequences |
| Silent baseline updates | Regressions approved accidentally | Require explicit approval |
| No cleanup | Orphaned tabs accumulate | Close tabs after test completion |
Practical Guidelines
Full vs Quick Mode
Full Mode (comprehensive):
- All viewports in extended matrix
- All pages in sitemap
- Element-level zoom captures
- GIF recording for animations
- Duration: 5-15 minutes
Quick Mode (smoke test):
- Standard matrix (3 viewports)
- Critical pages only
- Full-page captures only
- Skip animations
- Duration: 1-3 minutes
Checkpoint Strategy
For large test suites (20+ pages):
- Save progress every 5 pages
- Store partial results in Memory MCP
- Enable resume on failure
- Timeout individual captures at 30 seconds
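A minimal sketch of the batching and resume logic behind this checkpoint strategy follows. `chunk` and `remainingPages` are hypothetical helpers; how the completed-page list is persisted (for example, in Memory MCP) is left out.

```javascript
// Hypothetical helper: split pages into checkpoint batches (5 per batch).
function chunk(items, size = 5) {
  if (size < 1) throw new Error("size must be >= 1");
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Resume support: given the pages recorded in the last checkpoint,
// return the pages that still need to be captured, in original order.
function remainingPages(allPages, completed) {
  const done = new Set(completed);
  return allPages.filter((page) => !done.has(page));
}

const examplePages = Array.from({ length: 22 }, (_, i) => `/page-${i + 1}`);
console.log(chunk(examplePages).map((batch) => batch.length)); // [ 5, 5, 5, 5, 2 ]
console.log(remainingPages(examplePages, examplePages.slice(0, 20)));
// [ '/page-21', '/page-22' ]
```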
Trade-offs
| Decision | Option A | Option B | Guidance |
| --- | --- | --- | --- |
| Threshold strictness | Strict (0.01%) | Relaxed (1%) | Strict for design system, relaxed for content-heavy |
| Viewport coverage | Extended (8+) | Standard (3) | Extended for responsive-focused apps |
| Capture mode | Full page | Element zoom | Full page default, zoom for component testing |
| Baseline storage | Local | Memory MCP | Memory MCP for cross-session persistence |
Cross-Skill Coordination
Visual Testing works with other skills in the ecosystem:
Upstream Skills (provide input)
| Skill | When to Use First | What It Provides |
| --- | --- | --- |
| intent-analyzer | Always first | Detect visual testing need, extract URLs |
| browser-automation | For complex page states | Navigation + interaction to reach state |
| prompt-architect | For test plan optimization | Structured test specifications |
Downstream Skills (use output)
| Skill | When to Use After | What It Does |
| --- | --- | --- |
| fix-bug | On regression detection | Fix visual bugs identified |
| documenter | For test reports | Generate visual test documentation |
| deployment | Before deploy | Gate deployment on visual test pass |
Parallel Skills (run alongside)
| Skill | When to Run Together | How They Coordinate |
| --- | --- | --- |
| e2e-test | Same page coverage | Visual captures run alongside functional tests |
| browser-automation | Page state setup | Automation provides capture-ready state |
| code-review-assistant | CSS changes | Visual test validates review findings |
MCP Integration
Required MCPs:
| MCP | Purpose | Tools Used |
| --- | --- | --- |
| sequential-thinking | Test planning | sequentialthinking |
| claude-in-chrome | Screenshot capture | navigate, resize_window, computer (screenshot, zoom), gif_creator, tabs_context_mcp, tabs_create_mcp |
| memory-mcp | Baseline storage | memory_store, vector_search, memory_query |
Tool-Specific Usage:
| Tool | Purpose in Visual Testing |
| --- | --- |
| tabs_context_mcp | Get/verify browser context before tests |
| tabs_create_mcp | Create clean tab for test isolation |
| resize_window | Set viewport dimensions |
| navigate | Load target URL |
| computer (screenshot) | Capture full page state |
| computer (zoom) | Capture specific region with magnification |
| computer (wait) | Pause for reflow/animation completion |
| gif_creator | Record interaction sequences |
| read_page | Verify page structure before capture |
| find | Locate elements for region capture |
Memory Namespace
Pattern: skills/tooling/visual-testing/{type}/{project}/{page}/{viewport}
Types:
- baselines/: Golden images (approved screenshots)
- captures/: Current test captures
- reports/: Visual regression reports
- diffs/: Generated diff visualizations
Store:
- Baseline screenshots with approval metadata
- Test execution reports
- Diff visualizations
- Configuration (viewports, thresholds, ignore regions)
Retrieve:
- Baseline for comparison by page/viewport key
- Historical reports for trend analysis
- Previous configs for consistency
Tagging:
{
  "WHO": "visual-testing:1.0.0",
  "WHEN": "ISO8601_timestamp",
  "PROJECT": "{project_name}",
  "WHY": "visual-regression-testing",
  "page": "{page_path}",
  "viewport": "{viewport_name}",
  "threshold_profile": "{profile}",
  "passed": true
}
Input/Output Contracts
Skill Input
visual_test_request:
  required:
    target_url: string             # Base URL to test
  optional:
    pages: list[string]            # Specific paths (default: ["/"])
    viewport_matrix: list[string]  # Preset names (default: STANDARD_MATRIX)
    capture_mode: string           # "full_page" | "element" | "both" (default: "full_page")
    threshold_profile: string      # "strict" | "default" | "relaxed" (default: "default")
    compare_baseline: boolean      # Whether to compare (default: true)
    update_baseline: boolean       # Whether to update on approval (default: false)
    interaction_sequence: list     # Actions before capture
    ignore_regions: list           # Selectors to ignore
Skill Output
visual_test_result:
  summary:
    status: "passed" | "failed" | "new_baselines"
    total_captures: number
    passed: number
    failed: number
    new_baselines: number
    execution_time_ms: number
  captures:
    - page: string
      viewport: string
      capture_id: string
      baseline_id: string | null
      comparison:
        passed: boolean
        diff_percent: number
        threshold: number
  failures:
    - page: string
      viewport: string
      diff_percent: number
      reason: string
  report_id: string  # Memory MCP reference to full report
Recursive Improvement Integration
Role in Meta-Loop
| Loop | Visual Testing Role |
| --- | --- |
| Loop 1 | Execute visual tests as part of validation |
| Loop 1.5 | Capture learnings about threshold tuning, false positives |
| Loop 2 | Quality validation of test coverage |
| Loop 3 | Aggregate patterns for threshold optimization |
Eval Harness Integration
Visual testing supports evaluation via:
- Test pass rate tracking
- False positive rate monitoring
- Threshold effectiveness metrics
- Baseline update frequency
Learning Signal Sources
| Signal | Confidence | Learning |
| --- | --- | --- |
| User approves new baseline | HIGH (0.90) | Threshold was appropriate |
| User rejects false positive | HIGH (0.90) | Threshold too strict for context |
| User flags missed regression | HIGH (0.90) | Threshold too relaxed |
| Same page fails repeatedly | MEDIUM (0.75) | Investigate dynamic content issue |
Examples
Example 1: Responsive Layout Validation
Complexity: Medium (3 viewports, 1 page)
Task: Validate homepage responsive behavior across mobile, tablet, desktop
Planning Output (sequential-thinking):
Thought 1/6: Need to validate responsive breakpoints for homepage
Thought 2/6: Viewports: iPhone 14 (390px), iPad Pro 11 (834px), Desktop HD (1920px)
Thought 3/6: Capture sections: hero, features, pricing, footer
Thought 4/6: Use default threshold (0.1%) for static content
Thought 5/6: Check baseline existence, compare if present
Thought 6/6: Generate report with pass/fail per viewport
Execution:
// 1. Create test tab
await tabs_create_mcp() // -> tabId: 123

// 2. Navigate to homepage
await navigate({ url: "https://example.com/", tabId: 123 })

// 3. Mobile viewport (iPhone 14)
await resize_window({ width: 390, height: 844, tabId: 123 })
await computer({ action: "wait", duration: 0.5, tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // -> capture_mobile.png

// 4. Tablet viewport (iPad Pro 11)
await resize_window({ width: 834, height: 1194, tabId: 123 })
await computer({ action: "wait", duration: 0.5, tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // -> capture_tablet.png

// 5. Desktop viewport (Desktop HD)
await resize_window({ width: 1920, height: 1080, tabId: 123 })
await computer({ action: "wait", duration: 0.5, tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // -> capture_desktop.png

// 6. Compare each against baseline from Memory MCP
// 7. Generate report
Result: 3/3 viewports passed, no regressions detected
Execution Time: 45 seconds
Example 2: Component State Testing (Buttons)
Complexity: Medium (4 states per button, zoom captures)
Task: Validate primary button visual states (default, hover, active, disabled)
Planning Output:
Thought 1/8: Testing primary button component visual states
Thought 2/8: States to capture: default, hover, active, disabled
Thought 3/8: Use zoom tool for detailed button capture
Thought 4/8: Strict threshold (0.01%) for design system component
Thought 5/8: Capture default state first
Thought 6/8: Use hover action for hover state
Thought 7/8: Use mouse down for active state
Thought 8/8: Navigate to disabled example for disabled state
Execution:
// 1. Navigate to component library
await navigate({ url: "https://storybook.example.com/button", tabId: 123 })

// 2. Find button element
const button = await find({ query: "primary button", tabId: 123 })

// 3. Zoom capture default state
await computer({
  action: "zoom",
  region: [button.x, button.y, button.x + 200, button.y + 50],
  tabId: 123
})

// 4. Hover state capture
await computer({
  action: "hover",
  coordinate: [button.x + 100, button.y + 25],
  tabId: 123
})
await computer({
  action: "zoom",
  region: [button.x, button.y, button.x + 200, button.y + 50],
  tabId: 123
})

// ... continue for active, disabled states
Result: 4/4 states passed strict threshold
Execution Time: 30 seconds
Example 3: Animation Recording (Loading Sequence)
Complexity: High (GIF recording, temporal comparison)
Task: Capture and validate skeleton-to-content loading animation
Planning Output:
Thought 1/6: Need to capture loading animation as GIF
Thought 2/6: Trigger reload to capture full sequence
Thought 3/6: Start GIF recording before reload
Thought 4/6: Wait for content load completion
Thought 5/6: Stop recording and export GIF
Thought 6/6: Use animation threshold (5%) for comparison
Execution:
// 1. Start GIF recording
await gif_creator({ action: "start_recording", tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // Initial frame

// 2. Trigger reload
await navigate({ url: "https://example.com/dashboard", tabId: 123 })

// 3. Wait for load sequence
await computer({ action: "wait", duration: 3, tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // Final frame

// 4. Stop recording and export
await gif_creator({ action: "stop_recording", tabId: 123 })
await gif_creator({
  action: "export",
  download: true,
  filename: "loading-animation.gif",
  tabId: 123
})
Result: Animation captured successfully, 2.3% diff from baseline (within 5% animation threshold)
Execution Time: 15 seconds
Troubleshooting
Common Issues and Solutions
| Issue | Cause | Solution |
| --- | --- | --- |
| Screenshots are blank/black | Page not fully loaded | Add wait after navigation, check for lazy loading |
| Diff always fails | Threshold too strict | Increase threshold or configure ignore regions |
| Viewport resize not working | Tab permission issue | Create new tab with tabs_create_mcp |
| GIF not recording | Recording not started | Call gif_creator start_recording before actions |
| Baseline not found | Wrong namespace key | Verify page/viewport in Memory MCP query |
| Zoom captures wrong region | Coordinates shifted | Recalculate region after viewport resize |
Debug Mode
Enable verbose output for troubleshooting:
const DEBUG_MODE = true;

if (DEBUG_MODE) {
  console.log("Viewport:", viewport);
  console.log("Page URL:", url);
  console.log("Capture timestamp:", new Date().toISOString());
  console.log("Baseline exists:", baselineExists);
  console.log("Diff result:", diffResult);
}
Conclusion
Visual Testing provides systematic screenshot-based regression detection that complements functional testing. By comparing actual rendered output against approved baselines across multiple viewports, this skill catches UI regressions that unit and integration tests miss.
The key differentiators are:
- Baseline management: Versioned golden images with explicit approval workflow
- Multi-viewport coverage: Responsive testing across mobile, tablet, and desktop
- Threshold-based comparison: Configurable tolerance to balance sensitivity and false positives
- Zoom capabilities: Element-level precision for design system validation
- GIF recording: Temporal capture for animation and interaction testing
When integrated with the CI/CD pipeline, visual testing serves as a deployment gate that prevents visual regressions from reaching production. Combined with Memory MCP for persistent baselines, the system maintains consistent quality across releases.
Success Criteria
Quality Thresholds:
- All configured viewports captured successfully
- Baseline comparison completed for all captures (or flagged as new)
- Report generated with pass/fail status per page/viewport
- No orphaned tabs after test completion
- Execution time within 2x estimated duration
Failure Indicators:
- Screenshot capture fails (blank/timeout)
- Comparison fails with system error (not threshold failure)
- Memory MCP unavailable for baseline storage
- Tab context lost during multi-viewport capture
Completion Verification
- YAML frontmatter with full description and triggers
- Overview explains philosophy and methodology
- Core Principles section has 5 principles with practical guidance
- When to Use has clear use/don't-use criteria
- Main Workflow has 6 phases with contracts
- Pattern Recognition covers 4 testing patterns
- Advanced Techniques includes multi-model and ignore regions
- Common Anti-Patterns has 3 tables (capture, comparison, workflow)
- Cross-Skill Coordination documents upstream/downstream/parallel
- MCP Requirements explains all required tools
- Input/Output Contracts clearly specified in YAML
- LEARNED PATTERNS section present (empty for future updates)
- Examples include 3 concrete scenarios
- Troubleshooting addresses common issues
- Conclusion summarizes skill value
- Memory namespace documented with tagging
VISUAL_TESTING_VERILINGUA_VERIX_COMPLIANT