Screenshotter Skill
Description
A Playwright-based skill that captures full-page screenshots of any URL at a desktop-sized viewport, with settings tuned for reliability on slow pages, and returns the image as base64-encoded PNG data.
Features
- Desktop-sized viewport (1920x1080 CSS pixels)
- Full-page screenshot capture
- Timeout error handling
- Page reload for stability
- Base64 encoding of screenshot data
- Extended timeout (120 seconds) for slow-loading pages
Configuration
- Viewport: 1920x1080 pixels
- Device Scale Factor: 0.5
- Timeout: 120 seconds
- Wait Strategy: domcontentloaded
- Screenshot Type: Full page
Wait Strategies
Choose the appropriate wait strategy based on your needs:
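- domcontentloaded (used by this skill): Fast; waits only for the HTML to be parsed. Good for most pages.
- load: Waits for the load event, i.e. images, stylesheets, and other subresources. More reliable but slower. This is Playwright's own default for page.goto().
- networkidle: Waits until there have been no network connections for at least 500 ms. Useful for JavaScript-heavy pages, but it can stall on pages that poll or stream continuously, and Playwright's documentation discourages it for tests.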
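The configuration values and wait strategies above can be kept in one place. The sketch below is a minimal illustration under our own naming: `ScreenshotConfig`, `new_page_kwargs` and `goto_kwargs` are hypothetical helpers, not part of Playwright or of this skill. It converts the 120-second timeout into the milliseconds Playwright expects, rejects unknown wait strategies, and passes `device_scale_factor` as its own keyword argument to `browser.new_page()`, which is how Playwright for Python documents it (the `viewport` dict only carries `width` and `height`).

```python
from dataclasses import dataclass

# Values Playwright accepts for wait_until in page.goto() / page.reload().
VALID_WAIT_STRATEGIES = {"load", "domcontentloaded", "networkidle", "commit"}


@dataclass(frozen=True)
class ScreenshotConfig:
    """Hypothetical settings container; defaults mirror the Configuration list."""

    width: int = 1920
    height: int = 1080
    device_scale_factor: float = 0.5
    timeout_s: float = 120.0
    wait_until: str = "domcontentloaded"
    full_page: bool = True

    def __post_init__(self) -> None:
        # Fail early on a typo instead of at navigation time.
        if self.wait_until not in VALID_WAIT_STRATEGIES:
            raise ValueError(f"unknown wait strategy: {self.wait_until!r}")

    def new_page_kwargs(self) -> dict:
        # device_scale_factor is a separate keyword, not a key of the viewport dict.
        return {
            "viewport": {"width": self.width, "height": self.height},
            "device_scale_factor": self.device_scale_factor,
        }

    def goto_kwargs(self) -> dict:
        # Playwright timeouts are expressed in milliseconds.
        return {"wait_until": self.wait_until, "timeout": int(self.timeout_s * 1000)}
```

With a config in hand, the calls become `page = await browser.new_page(**cfg.new_page_kwargs())` and `await page.goto(url, **cfg.goto_kwargs())`; switching to `ScreenshotConfig(wait_until="networkidle")` changes the strategy without touching the capture code.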
Python Implementation
import asyncio
import base64

from playwright.async_api import TimeoutError as PlaywrightTimeoutError
from playwright.async_api import async_playwright


async def get_screenshot(url):
    """
    Capture a full-page screenshot of a given URL using Playwright.

    Args:
        url (str): The URL to capture

    Returns:
        str: Base64-encoded PNG screenshot data
    """
    print("in get_screenshot", url)
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            # device_scale_factor is its own argument, not a viewport key
            page = await browser.new_page(
                viewport={"width": 1920, "height": 1080},
                device_scale_factor=0.5,
            )
            try:
                await page.goto(url, wait_until="domcontentloaded", timeout=120000)
            except PlaywrightTimeoutError:
                print(f"TimeoutError: Failed to load {url} within the specified timeout.")
            await asyncio.sleep(2)
            # Reload page for stability
            await page.reload(wait_until="domcontentloaded", timeout=120000)
            # Capture full-page screenshot; Playwright also returns the PNG bytes
            data = await page.screenshot(path="screenshot.png", full_page=True)
        finally:
            await browser.close()

    print(f"Screenshot of size {len(data)} bytes")
    # Encode the PNG bytes as a base64 string
    return base64.b64encode(data).decode("utf-8")
Usage Example
import asyncio

# Basic usage
async def main():
    url = "https://example.com"
    screenshot_data = await get_screenshot(url)
    print(f"Screenshot captured and encoded: {len(screenshot_data)} characters")

# Run the async function
asyncio.run(main())
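`get_screenshot()` returns bare base64 without a `data:` prefix. If a caller needs a data URL for an `<img src=...>` tag, or needs the PNG bytes back, two small helpers cover it. This is a minimal sketch; `to_data_url` and `decode_screenshot` are hypothetical names, not part of the skill. The decoder checks the 8-byte PNG signature so that a truncated or non-PNG payload fails loudly.

```python
import base64

# Every PNG file starts with these 8 bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
DATA_URL_PREFIX = "data:image/png;base64,"


def to_data_url(encoded: str) -> str:
    """Wrap the base64 string returned by get_screenshot() in a PNG data URL."""
    return f"{DATA_URL_PREFIX}{encoded}"


def decode_screenshot(encoded_or_url: str) -> bytes:
    """Return raw PNG bytes from a bare base64 string or a PNG data URL."""
    if encoded_or_url.startswith(DATA_URL_PREFIX):
        encoded_or_url = encoded_or_url[len(DATA_URL_PREFIX):]
    # validate=True rejects characters outside the base64 alphabet.
    data = base64.b64decode(encoded_or_url, validate=True)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG")
    return data
```

For example, `Path("copy.png").write_bytes(decode_screenshot(screenshot_data))` (with `pathlib.Path`) restores the image from the encoded string.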
Advanced Usage
Save to Custom Path
import asyncio
import base64

from playwright.async_api import TimeoutError as PlaywrightTimeoutError
from playwright.async_api import async_playwright


async def get_screenshot_custom_path(url, output_path="screenshot.png"):
    """Capture a full-page screenshot and save it to output_path."""
    print("in get_screenshot_custom_path", url)
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page(
                viewport={"width": 1920, "height": 1080},
                device_scale_factor=0.5,
            )
            try:
                await page.goto(url, wait_until="domcontentloaded", timeout=120000)
            except PlaywrightTimeoutError:
                print(f"TimeoutError: Failed to load {url} within the specified timeout.")
            await asyncio.sleep(2)
            await page.reload(wait_until="domcontentloaded", timeout=120000)
            data = await page.screenshot(path=output_path, full_page=True)
        finally:
            await browser.close()

    print(f"Screenshot of size {len(data)} bytes")
    return base64.b64encode(data).decode("utf-8")
Batch Screenshots
async def capture_multiple_screenshots(urls):
    """
    Capture screenshots of multiple URLs, one at a time.

    Args:
        urls (list): List of URLs to capture

    Returns:
        dict: Dictionary mapping URLs to their base64-encoded screenshots
              (None for URLs that failed)
    """
    results = {}
    for url in urls:
        try:
            results[url] = await get_screenshot(url)
        except Exception as e:
            print(f"Error capturing {url}: {e}")
            results[url] = None
    return results
# Usage
urls = ["https://example.com", "https://another-site.com"]
results = asyncio.run(capture_multiple_screenshots(urls))
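The batch helper above captures URLs strictly one after another. A concurrent variant can cap how many browsers run at once with `asyncio.Semaphore`. This is a sketch under stated assumptions: `capture_concurrently` is a hypothetical name, and the `capture` coroutine you pass in must not share an output file. `get_screenshot()` always writes to `screenshot.png`, so concurrent calls to it would overwrite each other's file; a wrapper around `get_screenshot_custom_path()` that gives each URL its own path avoids that.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional


async def capture_concurrently(
    urls: List[str],
    capture: Callable[[str], Awaitable[str]],
    max_concurrency: int = 3,
) -> Dict[str, Optional[str]]:
    """Run capture(url) for every URL with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(url: str) -> Optional[str]:
        async with semaphore:
            try:
                return await capture(url)
            except Exception as exc:
                # Same policy as capture_multiple_screenshots: log and store None.
                print(f"Error capturing {url}: {exc}")
                return None

    # gather() preserves input order, so zip() pairs each URL with its result.
    results = await asyncio.gather(*(one(u) for u in urls))
    return dict(zip(urls, results))
```

Each concurrent capture launches its own Chromium process, so keep `max_concurrency` small on memory-constrained machines.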
Wait for Full Page Load
import base64

from playwright.async_api import async_playwright


async def get_screenshot_full_load(url):
    """Wait for all resources to load before taking the screenshot."""
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page(
                viewport={"width": 1920, "height": 1080},
                device_scale_factor=0.5,
            )
            # Wait for the load event (images, stylesheets, other subresources)
            await page.goto(url, wait_until="load", timeout=120000)
            data = await page.screenshot(path="screenshot.png", full_page=True)
        finally:
            await browser.close()
    return base64.b64encode(data).decode("utf-8")
Wait for Network Idle (Dynamic Content)
import base64

from playwright.async_api import async_playwright


async def get_screenshot_network_idle(url):
    """Wait for the network to go idle; intended for JavaScript-heavy sites."""
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page(
                viewport={"width": 1920, "height": 1080},
                device_scale_factor=0.5,
            )
            # Wait for network idle (no network connections for at least 500 ms)
            await page.goto(url, wait_until="networkidle", timeout=120000)
            # Optional: wait for a specific element to become visible
            await page.wait_for_selector("body", state="visible")
            data = await page.screenshot(path="screenshot.png", full_page=True)
        finally:
            await browser.close()
    return base64.b64encode(data).decode("utf-8")
Cloudflare Bypass
For sites protected by Cloudflare, standard Playwright sessions are often detected. Use these techniques to bypass detection:
Installation
Node.js (JavaScript):
npm install playwright playwright-extra puppeteer-extra-plugin-stealth
Python:
pip install playwright playwright-stealth
Stealth Mode Setup (JavaScript)
const { chromium } = require('playwright-extra');
const stealth = require('puppeteer-extra-plugin-stealth')();

// CRITICAL: Must register the stealth plugin BEFORE launching the browser
chromium.use(stealth);

// Launch with stealth enabled
const browser = await chromium.launch({
  headless: false // Headed mode reduces detection
});
Browser Fingerprint Randomization
Randomize viewport, user-agent, locale, and timezone to avoid fingerprinting:
const context = await browser.newContext({
  viewport: {
    width: 1280 + Math.floor(Math.random() * 100), // Randomize
    height: 720 + Math.floor(Math.random() * 100)
  },
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36',
  locale: 'en-US',
  timezoneId: 'America/New_York'
});
Persistent Sessions
Reuse cookies and localStorage to appear as a returning user:
const userDataDir = './session-profile';

// launchPersistentContext returns a BrowserContext (not a Browser) stored in userDataDir
const context = await chromium.launchPersistentContext(userDataDir, {
  headless: false,
  args: ['--start-maximized']
});
Proxy Rotation
Rotate proxies to distribute requests and avoid IP-based blocking:
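const browser = await chromium.launch({
  headless: false,
  // Chromium's --proxy-server flag does not accept credentials; Playwright's proxy option does
  proxy: {
    server: 'http://proxy-ip:port',
    username: 'username',
    password: 'password'
  }
});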
CAPTCHA Detection
// Check for CAPTCHA iframe
const isCaptchaPresent = await page.$('iframe[src*="captcha"]');
if (isCaptchaPresent) {
  console.log('CAPTCHA detected – solve or switch proxy');
}
CAPTCHA Solving (Optional)
For reCAPTCHA, use 2Captcha service:
const RecaptchaPlugin = require('@extra/recaptcha');

chromium.use(
  RecaptchaPlugin({
    provider: { id: '2captcha', token: 'YOUR_2CAPTCHA_API_KEY' },
    visualFeedback: true
  })
);

await page.solveRecaptchas();
Session Cookie Management
Save and restore cookies for continuity:
const fs = require('fs');

// Save cookies after a successful scrape
const cookies = await context.cookies();
fs.writeFileSync('./cookies.json', JSON.stringify(cookies, null, 2));

// Restore cookies on the next run
const savedCookies = JSON.parse(fs.readFileSync('./cookies.json', 'utf8'));
await context.addCookies(savedCookies);
Complete Cloudflare Bypass Example
const { chromium } = require('playwright-extra');
const stealth = require('puppeteer-extra-plugin-stealth')();
const fs = require('fs');

chromium.use(stealth);

async function screenshotWithCloudflareBypass(url, proxy = null) {
  const args = proxy ? [`--proxy-server=${proxy}`] : [];
  const browser = await chromium.launch({ headless: false, args: args });

  const context = await browser.newContext({
    viewport: {
      width: 1280 + Math.floor(Math.random() * 100),
      height: 720 + Math.floor(Math.random() * 100)
    },
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    locale: 'en-US',
    timezoneId: 'America/New_York'
  });
  const page = await context.newPage();

  // Load page and wait for Cloudflare checks
  await page.goto(url, { waitUntil: "domcontentloaded" });
  await page.waitForTimeout(5000); // Let Cloudflare finish background checks

  // Check for CAPTCHA
  const captchaPresent = await page.$('iframe[src*="captcha"]');
  if (captchaPresent) {
    console.log('CAPTCHA detected');
    // Handle CAPTCHA or switch proxy
  }

  // Capture screenshot
  await page.screenshot({ path: "screenshot.png", fullPage: true });

  // Save cookies for next visit
  const cookies = await context.cookies();
  fs.writeFileSync('./cookies.json', JSON.stringify(cookies, null, 2));

  await browser.close();
}
Best Practices
- Use headed mode (headless: false): reduces detection
- Rotate proxies: avoid IP-based blocking
- Randomize fingerprints: viewport, user-agent, timezone
- Persist sessions: reuse cookies to appear as a returning user
- Wait for Cloudflare: add delays for background JS checks
- Monitor CAPTCHAs: detect and handle challenges
- Limit reuse: don't reuse the same proxy/UA combination too often
Dependencies
Python:
pip install playwright
playwright install chromium
Node.js (with Cloudflare bypass):
npm install playwright playwright-extra puppeteer-extra-plugin-stealth
Error Handling
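The skill handles the following failure modes:
- Navigation timeouts: if page.goto() does not finish within 120 seconds, the TimeoutError is caught and logged, and the skill still reloads the page and captures whatever has rendered.
- Other navigation failures: only timeouts are caught during navigation; errors such as an unresolvable host propagate to the caller. capture_multiple_screenshots catches them per URL and records None.
- Browser cleanup: the browser is shut down when the async_playwright() context exits, even if an exception is raised.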
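Transient failures (a flaky proxy, a briefly unreachable host) surface as exceptions and are not retried. A small wrapper can retry a capture with exponential backoff. This is a minimal sketch, not part of the skill: `with_retries` is a hypothetical helper, and retrying on every exception is a simplification; in practice you may want to retry only network-related errors.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    capture: Callable[[str], Awaitable[T]],
    url: str,
    attempts: int = 3,
    base_delay_s: float = 2.0,
) -> T:
    """Call capture(url), retrying with exponential backoff on any exception."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return await capture(url)
        except Exception as exc:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Delays grow as base, 2*base, 4*base, ...
            delay = base_delay_s * 2 ** (attempt - 1)
            print(f"Attempt {attempt} for {url} failed ({exc}); retrying in {delay:.1f}s")
            await asyncio.sleep(delay)
    raise AssertionError("unreachable")  # The loop always returns or raises.
```

Usage: `data = await with_retries(get_screenshot, "https://example.com")`. With the defaults, a URL that keeps failing is attempted three times with 2 s and 4 s pauses between attempts.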
Performance Notes
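- The viewport is 1920x1080 CSS pixels with a device scale factor of 0.5: the page is laid out as on a 1920-pixel-wide desktop, but the PNG is rendered at half density (960 pixels wide), which keeps files small at the cost of sharpness
- Full-page screenshots may take longer for very long pages
- The page reload step gives slow or partially loaded pages a second chance to render; it does not guarantee that late-loading dynamic content is present
- Each capture also writes a PNG file (screenshot.png unless a custom path is given), overwriting the previous one; the returned value is the base64 encoding of that PNG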
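A quick way to confirm that the 0.5 scale factor actually took effect is to read the pixel size from the PNG header and compare the width with the CSS viewport width times the scale factor. The helpers below are a hypothetical sketch (`png_dimensions` and `expected_width` are our names); they rely only on the PNG format, where the first chunk is IHDR and holds width and height as big-endian 32-bit integers. If a capture comes back 1920 pixels wide, the scale factor was not applied; Playwright's documented way to set it in Python is the separate `device_scale_factor` keyword of `new_page()`.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> Tuple[int, int]:
    """Read (width, height) in pixels from a PNG's IHDR chunk."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        raise ValueError("not a PNG with a leading IHDR chunk")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def expected_width(css_width: int, device_scale_factor: float) -> int:
    """Output width we expect: CSS viewport width times the device scale factor."""
    return round(css_width * device_scale_factor)
```

For example, `png_dimensions(Path("screenshot.png").read_bytes())[0] == expected_width(1920, 0.5)` checks a capture on disk; the height varies with page length for full-page shots, so only the width is compared.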
Use Cases
- Automated website monitoring
- Visual regression testing
- Web scraping with visual confirmation
- Documentation generation
- Archiving web pages
- Quality assurance workflows