Steel Browsing Skill (CDP-first)
This Skill enables reliable web browsing + automation via Steel.
Core operating principle (what we learned)
Default to CDP (Playwright/Puppeteer) for anything interactive. Use Computer API actions only as a tactical fallback for “visual” recovery (focus traps, stubborn overlays, browser-native-ish prompts, CAPTCHAs, etc.).
Default run profile (apply unless user overrides)
-
Assume anti-bot defenses may exist.
-
Prefer Steel session + CDP for most web tasks, even when scrape might appear possible.
-
Start sessions with anti-captcha/stealth settings enabled.
-
Keep one authoritative SESSION_ID and reuse it through retries.
-
Stop and report a blocker only after the bounded retry ladder (defined later) is exhausted.
Session contract (before any task)
-
Use exactly one source of truth for state: SESSION_ID, SESSION_WS, SESSION_VIEWER_URL.
-
SESSION_ID is the id returned by POST /v1/sessions.
-
SESSION_WS is the websocketUrl returned by create-session, with &apiKey=<URL-encoded STEEL_API_KEY> appended for CDP connections.
-
SESSION_VIEWER_URL is for human verification only.
-
Reuse the same SESSION_ID and SESSION_WS for all retries.
-
Replace all session variables only when the previous session is proven gone (Session not found or confirmed hard expiration).
-
Replace the session at most once per task unless explicit expiration is confirmed.
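The contract above can be held in one small object so that no code path carries a stale ID. The following is a minimal sketch, assuming the create-session response has already been parsed into a dict with the `id`, `websocketUrl` and `sessionViewerUrl` fields named in this document. `SessionState` and its methods are hypothetical names and are not part of any Steel SDK.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class SessionState:
    """Single source of truth for one task's session (hypothetical helper)."""
    session_id: str
    session_ws: str        # websocketUrl with the URL-encoded API key appended
    viewer_url: str        # for human verification only
    replacements: int = 0  # how many times the session has been replaced

    @classmethod
    def from_response(cls, response: dict, api_key: str, replacements: int = 0):
        ws = response["websocketUrl"] + "&apiKey=" + quote(api_key, safe="")
        return cls(response["id"], ws, response["sessionViewerUrl"], replacements)

    def replace_with(self, response: dict, api_key: str, proven_gone: bool,
                     expiration_confirmed: bool = False):
        """Swap all three variables together, only for a session proven gone."""
        if not proven_gone:
            raise RuntimeError("session still exists; keep reusing it")
        if self.replacements >= 1 and not expiration_confirmed:
            raise RuntimeError("replacement budget for this task is exhausted")
        return SessionState.from_response(response, api_key, self.replacements + 1)
```

Because a replacement returns a whole new object, SESSION_ID, SESSION_WS and SESSION_VIEWER_URL can never drift apart.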
Golden template (default hard mode)
Use this as the default flow for any new site:
POST /v1/sessions { "url": "https://target.site", "timeout": 900000, "solveCaptcha": true, "stealthConfig": { "humanizeInteractions": true, "autoCaptchaSolving": true }, "deviceConfig": { "device": "desktop" }, "region": "iad", "useProxy": false }
-
If using Playwright/Puppeteer, connect CDP with: websocketUrl + "&apiKey=" + encodeURIComponent(STEEL_API_KEY)
-
Run the interaction with selector-based waits and DOM verification.
-
If blocked/hung, do bounded fallback via POST /v1/sessions/{id}/computer (Esc / close overlay / small scroll), then retry once.
-
Always release in finally, even on failure.
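The shape of that flow, with release guaranteed on every path, might look like the sketch below. The three callables are assumptions standing in for your own wrappers around POST /v1/sessions, the CDP script, and the release endpoint; `run_task` is a hypothetical name.

```python
from typing import Callable


def run_task(create: Callable[[], str],
             automate: Callable[[str], object],
             release: Callable[[str], None]):
    """Create one session, run the task, and always release the session."""
    session_id = create()
    try:
        return automate(session_id)
    finally:
        # Runs on success and on failure, so sessions are never leaked.
        release(session_id)
```

Keeping creation outside the `try` means a failed create never triggers a release call for a session that does not exist.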
Golden runbook (single-task template)
Use this exact sequence for each interactive task:
- Preflight
: "${STEEL_API_KEY:?missing STEEL_API_KEY}"
command -v curl >/dev/null || exit 1
command -v jq >/dev/null || exit 1
- Create one session and export state
RESPONSE=$(curl -sS -X POST https://api.steel.dev/v1/sessions \
  -H "Content-Type: application/json" \
  -H "steel-api-key: $STEEL_API_KEY" \
  --data-raw '{"url":"https://target.site","timeout":900000,"solveCaptcha":true,"stealthConfig":{"humanizeInteractions":true,"autoCaptchaSolving":true},"deviceConfig":{"device":"desktop"},"region":"iad","useProxy":false}')
SESSION_ID=$(echo "$RESPONSE" | jq -r .id)
# URL-encode the key so special characters cannot break the query string
SESSION_WS=$(echo "$RESPONSE" | jq -r --arg key "$STEEL_API_KEY" '.websocketUrl + "&apiKey=" + ($key | @uri)')
SESSION_VIEWER_URL=$(echo "$RESPONSE" | jq -r .sessionViewerUrl)
export SESSION_ID SESSION_WS SESSION_VIEWER_URL
-
Run CDP automation (single runtime path)
-
Use one runtime only (Playwright JS or Python Playwright).
-
Pass SESSION_WS and TARGET_URL as env vars.
-
On any recoverable exception, run one longer-timeout retry before fallback.
-
Verify post-condition
-
URL changed to target destination OR
-
expected success selector visible OR
-
expected state/text changed.
-
Release
curl -sS -X POST https://api.steel.dev/v1/sessions/"$SESSION_ID"/release \
  -H "steel-api-key: $STEEL_API_KEY" || true
-
Bounded fallback
-
If blocked: one Computer recovery pass (take_screenshot, press_key ["Escape"], click outside, scroll), then one final CDP retry.
-
If still blocked: stop and report blocker reason.
Optional scripts for repetitive steps (non-mandatory)
Use these local helpers when you want fast, low-risk execution:
-
scripts/create_steel_session.sh – create session and export SESSION_ID, SESSION_WS, SESSION_VIEWER_URL, TARGET_URL.
-
scripts/release_steel_session.sh – idempotent release helper.
-
scripts/cdp_template.js – compact Playwright-CDP interaction scaffold.
Examples:
- examples/runbook.md for one-shot copy/paste flow using the helper scripts.
Why CDP-first:
-
CDP gives deterministic navigation + selectors + robust waits and verifications.
-
Computer actions are slower and more fragile (they depend on coordinates), but they make an excellent escape hatch.
Security & Setup
API key handling (mandatory policy)
-
Do not ask the user to paste API keys into chat.
-
Expect STEEL_API_KEY in the environment.
Example header (bash/curl):
-H "steel-api-key: $STEEL_API_KEY"
Base URL: https://api.steel.dev
Runtime preflight (before first request)
-
If command -v jq fails, install jq or fall back to a safe alternative for handling JSON in the shell.
-
If command -v node fails, switch to the Python-only CDP path.
-
If command -v python fails, use the Node-only path.
-
If Playwright is not installed for the chosen runtime, install it before any interaction or switch to the Python Playwright package.
-
At session creation time, validate that the payload sets timeout and, for interactive targets, includes the anti-bot flags.
-
Set STEEL_API_KEY and never print request headers containing it.
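The runtime checks above can be expressed as a small preflight. This is a sketch under assumptions: the function names are hypothetical, and preferring Node when both runtimes exist is an arbitrary choice made here, since the list above only says to switch when one is missing. The `which` parameter defaults to `shutil.which` and is injectable so the logic can be exercised without touching the real PATH.

```python
import shutil
from typing import Callable, Optional

Which = Callable[[str], Optional[str]]


def missing_tools(which: Which = shutil.which, required=("curl", "jq")) -> list:
    """Return the required command-line tools that are not on PATH."""
    return [tool for tool in required if not which(tool)]


def choose_runtime(which: Which = shutil.which) -> str:
    """Pick exactly one CDP runtime for the whole task."""
    if which("node"):
        return "node"    # assumption: prefer Node when both are available
    if which("python3") or which("python"):
        return "python"  # Node is missing, so use the Python-only path
    raise RuntimeError("neither node nor python found; cannot run CDP automation")
```

Whether the Playwright package itself is importable still has to be checked inside the chosen runtime.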
Standard session variable setup
-
Set and reuse: export SESSION_ID=<id>
-
Set and reuse: export SESSION_WS="<websocketUrl>&apiKey=${STEEL_API_KEY}"
-
Set and reuse: export SESSION_VIEWER_URL=<sessionViewerUrl>
-
Treat a missing SESSION_WS as a hard failure before any CDP code runs.
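A script can enforce the hard-failure rule and keep the key out of logs at the same time. The sketch below assumes the three variables are read from an environment mapping such as `os.environ`; `require_session_env` and `redact` are hypothetical helper names.

```python
import re

SESSION_VARS = ("SESSION_ID", "SESSION_WS", "SESSION_VIEWER_URL")


def require_session_env(env) -> dict:
    """Return the session variables, failing hard if SESSION_WS is missing."""
    if not env.get("SESSION_WS"):
        raise RuntimeError("SESSION_WS is missing; refusing to start CDP code")
    return {name: env.get(name, "") for name in SESSION_VARS}


def redact(text: str) -> str:
    """Mask the apiKey query value so a URL can be logged safely."""
    return re.sub(r"(apiKey=)[^&\s]+", r"\1***", text)
```

Usage: call `require_session_env(os.environ)` once at startup and pass any URL through `redact` before printing it.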
Quick Decision Tree
Use Stateless endpoints when:
-
You only need page content, a screenshot, or a PDF
-
No login/multi-step flow required
✅ Use:
-
POST /v1/scrape
-
POST /v1/screenshot
-
POST /v1/pdf
Use Sessions when:
-
Login required
-
Multi-step interaction
-
Form submissions
-
JS-heavy apps
-
You need cookies/localStorage persistence
✅ Use:
-
POST /v1/sessions (create; always set timeout)
-
CDP (preferred) using websocketUrl from session response
-
POST /v1/sessions/{id}/computer (fallback / recovery)
-
GET /v1/sessions/{id}/context (cookies/storage)
-
POST /v1/sessions/{id}/release (always)
Mode 1: Stateless (One-shot)
- Scrape
Use for clean text extraction and planning selectors.
Endpoint: POST /v1/scrape
Example:
{ "url": "https://example.com", "format": ["markdown"], "screenshot": false, "pdf": false }
Formats:
-
markdown (best for summarization)
-
cleaned_html (best for parsing + finding forms/selectors)
-
html (raw)
Tip: For form automation, scrape first and record:
-
input selectors (input[name=email], input[type=email], etc.)
-
submit button selector
-
success message text/element to verify completion
- Screenshot
Endpoint: POST /v1/screenshot
Example:
{ "url": "https://example.com", "fullPage": true }
- PDF
Endpoint: POST /v1/pdf
Example:
{ "url": "https://example.com" }
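All three stateless endpoints take a JSON body with a `url`, so one builder can cover them. The following sketch uses only the Python standard library and builds the request without sending it. `stateless_request` is a hypothetical helper, and the base URL is taken from the curl examples in this document.

```python
import json
import urllib.request

BASE_URL = "https://api.steel.dev"  # as used by the curl examples in this document


def stateless_request(kind: str, url: str, api_key: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST for /v1/scrape, /v1/screenshot or /v1/pdf."""
    if kind not in ("scrape", "screenshot", "pdf"):
        raise ValueError(f"unknown stateless endpoint: {kind}")
    body = json.dumps({"url": url, **options}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/{kind}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "steel-api-key": api_key},
    )


# Same payload as the scrape example above; "dummy-key" is a placeholder.
req = stateless_request("scrape", "https://example.com", "dummy-key",
                        format=["markdown"], screenshot=False, pdf=False)
```

To execute it, pass `req` to `urllib.request.urlopen(req, timeout=60)` and parse the response body; in real use, read the key from the STEEL_API_KEY environment variable.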
Mode 2: Sessions (Stateful)
Session lifecycle (critical)
Sessions expire if you don’t set a long enough timeout. Common failure symptom: Session ... not found.
Rules:
-
Always set timeout for anything non-trivial.
-
Track the active SESSION_ID in one place and don’t mix IDs.
-
Reuse the same session for retries; don’t create a new session for each selector tweak.
-
Bound session creation attempts (for example: max 2 per task) to avoid session sprawl.
-
Always release when done.
-
If release returns Session not found after successful work, treat it as already-ended/idempotent cleanup.
Create session
Endpoint: POST /v1/sessions
Minimal:
{ "timeout": 600000 }
Common options:
{ "url": "https://example.com", "timeout": 900000, "solveCaptcha": true, "stealthConfig": { "humanizeInteractions": true, "autoCaptchaSolving": true }, "deviceConfig": { "device": "desktop" }, "region": "iad", "useProxy": false }
For most sites, the minimum anti-bot-safe session is:
{ "timeout": 900000, "solveCaptcha": true, "stealthConfig": { "humanizeInteractions": true, "autoCaptchaSolving": true }, "region": "iad" }
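A payload can be checked against that minimum before it is sent. This is a minimal sketch; `validate_session_payload` is a hypothetical helper, and the field names are the ones used in the payloads in this document. The unit of `timeout` is not stated here, so the check only requires a positive integer.

```python
def validate_session_payload(payload: dict, interactive: bool = True) -> list:
    """Return a list of problems; an empty list means the payload passes."""
    problems = []
    timeout = payload.get("timeout")
    if isinstance(timeout, bool) or not isinstance(timeout, int) or timeout <= 0:
        problems.append("timeout must be a positive integer")
    if interactive:
        if payload.get("solveCaptcha") is not True:
            problems.append("solveCaptcha should be true")
        stealth = payload.get("stealthConfig") or {}
        for flag in ("humanizeInteractions", "autoCaptchaSolving"):
            if stealth.get(flag) is not True:
                problems.append(f"stealthConfig.{flag} should be true")
    return problems
```

Returning a list rather than raising lets the caller report every missing flag in one pass.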
The response typically includes:
-
id
-
websocketUrl (use for CDP)
-
sessionViewerUrl / debugUrl (use for human verification)
Step 2A (Preferred): Control the session via CDP (Playwright/Puppeteer)
When to use CDP
Use CDP for:
-
navigation (goto)
-
selector-based clicks and fills
-
robust waits and assertions
-
reliable verification (URL/text/DOM)
How to connect
Use the websocketUrl returned by POST /v1/sessions. (Do not guess the URL pattern; Steel returns the correct one for your session.)
Important auth note from field use:
-
For some environments, connectOverCDP requires appending apiKey in the WS query string.
-
Safe default:
const wsUrl = `${session.websocketUrl}&apiKey=${encodeURIComponent(process.env.STEEL_API_KEY)}`;
const browser = await chromium.connectOverCDP(wsUrl);
Stable CDP script pattern (copy-safe)
Use one runtime and export the required variables.
Node (Playwright JS):
import { chromium } from "playwright";
// SESSION_WS already includes &apiKey=... (see the session variable setup), so use it as-is.
const wsUrl = process.env.SESSION_WS;
const target = process.env.TARGET_URL || "https://example.com";
(async () => {
  if (!wsUrl) throw new Error("SESSION_WS is not set");
  const browser = await chromium.connectOverCDP(wsUrl);
  try {
    const context = browser.contexts()[0];
    const page = context.pages()[0] || (await context.newPage());
    await page.goto(target, { waitUntil: "domcontentloaded", timeout: 60000 });
    await page.waitForSelector("body", { timeout: 30000 });
    // run deterministic interactions here
  } finally {
    await browser.close();
  }
})();
Python (Playwright):
import asyncio
import os
from playwright.async_api import async_playwright
async def run():
    # SESSION_WS already includes &apiKey=... (see the session variable setup), so use it as-is.
    ws_url = os.environ["SESSION_WS"]
    target = os.environ.get("TARGET_URL", "https://example.com")
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(ws_url)
        try:
            context = browser.contexts[0]
            # contexts and pages are properties in the Python API, not methods
            page = context.pages[0] if context.pages else await context.new_page()
            await page.goto(target, wait_until="domcontentloaded", timeout=60000)
            await page.wait_for_selector("body", timeout=30000)
            # run deterministic interactions here
        finally:
            await browser.close()
asyncio.run(run())
Recommended CDP workflow
-
Create one session and keep its SESSION_ID as the single source of truth
-
Run a CDP handshake preflight (connectOverCDP) before deeper task logic
-
page.goto(url) (or rely on session url at creation)
-
Wait for stable UI (waitForLoadState, waitForSelector)
-
Interact using selectors (fill, click)
-
Verify success via DOM (preferred), or via scrape + known success text
-
Release session
Failure handling inside CDP flow
-
If a CDP operation throws, wait and retry once with longer timeouts.
-
If the same selector fails twice, use one backup selector and retry once.
-
Do not recreate the session after a single transient timeout.
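These three rules translate into a short bounded loop. The sketch below assumes `page` exposes `click(selector, timeout=...)` in milliseconds, as a Playwright page does. `click_with_fallback` is a hypothetical helper, and the pause between attempts is left out for brevity.

```python
def click_with_fallback(page, primary: str, backup: str,
                        timeout_ms: int = 10000, longer_ms: int = 30000) -> str:
    """Click `primary`, retry once with a longer timeout, then try `backup` once.

    Returns the selector that worked, or re-raises the last error so the
    caller can move on to Computer recovery. The session is never recreated.
    """
    attempts = [(primary, timeout_ms), (primary, longer_ms), (backup, longer_ms)]
    last_error = None
    for selector, timeout in attempts:
        try:
            page.click(selector, timeout=timeout)
            return selector
        except Exception as error:  # in real code, narrow this to Playwright's TimeoutError
            last_error = error
    raise last_error
```

The attempt list is fixed at three entries, so the loop cannot turn into an open-ended selector hunt.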
Example (Playwright-style pseudo)
// connect to session websocketUrl
// const wsUrl = `${session.websocketUrl}&apiKey=${encodeURIComponent(process.env.STEEL_API_KEY)}`;
// const browser = await chromium.connectOverCDP(wsUrl);
// const context = browser.contexts()[0];
// const page = context.pages()[0] ?? (await context.newPage());
await page.goto("https://example.com");
await page.waitForLoadState("domcontentloaded");
await page.fill('input[name="email"]', "test@test.com");
await page.click('button[type="submit"]');
// verify success
await page.waitForSelector("text=Thanks for subscribing", { timeout: 10000 });
Prefer CDP-native solutions before falling back to Computer actions:
-
JS dialogs: handle via dialog listeners
-
File uploads: setInputFiles (avoid OS file picker)
-
Permissions: grant at browser context level when possible
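In Playwright for Python these three cases map onto `page.on("dialog", ...)`, `page.set_input_files(...)` and `context.grant_permissions(...)`. The sketch below wraps them in two hypothetical helpers. Dismissing every dialog and granting geolocation are illustrative defaults, not a recommendation for every site.

```python
def prepare_page(page, context, origin: str, permissions=("geolocation",)):
    """Install CDP-native handlers so common blockers never need Computer actions."""
    # JS dialogs (alert/confirm/prompt): dismiss automatically instead of hanging.
    page.on("dialog", lambda dialog: dialog.dismiss())
    # Permissions: grant up front at the browser-context level for this origin.
    context.grant_permissions(list(permissions), origin=origin)


def upload_file(page, input_selector: str, path: str):
    """Set the file directly on the file input, so no OS file picker opens."""
    page.set_input_files(input_selector, path)
```

Call `prepare_page` once, right after obtaining the page and before navigating, so the dialog listener is already in place when the first script runs.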
Step 2B (Fallback): Computer API (mouse/keyboard actions)
Use Computer actions when:
-
CDP selectors fail repeatedly and you need a visual “nudge”
-
You’re blocked by a stubborn overlay/focus trap
-
A browser-native-ish prompt is blocking progress
-
You need quick recovery (Esc, click outside, scroll, etc.)
Endpoint: POST /v1/sessions/{id}/computer
Hard-learned schema rules (avoid validation errors)
-
There is no navigate action.
-
press_key requires keys as an array (NOT key)
-
scroll uses delta_y / delta_x (NOT direction/amount)
Action reference (safe subset)
take_screenshot: { "action": "take_screenshot" }
click_mouse: { "action": "click_mouse", "button": "left", "coordinates": [x,y], "screenshot": true }
type_text: { "action": "type_text", "text": "...", "screenshot": true }
press_key: { "action": "press_key", "keys": ["Enter"], "screenshot": true }
scroll: { "action": "scroll", "delta_y": 800, "coordinates": [x,y], "screenshot": true }
wait: { "action": "wait", "duration": 2000, "screenshot": true }
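Building these payloads through small functions catches the schema mistakes above before the request is sent. A minimal sketch follows; the builder names are hypothetical, and the default coordinates reuse the [960, 540] value from the scroll example in the troubleshooting section.

```python
def press_key(keys, screenshot: bool = True) -> dict:
    """Build a press_key action; `keys` must be a list such as ["Enter"]."""
    if isinstance(keys, str) or not isinstance(keys, (list, tuple)):
        raise TypeError('keys must be an array, for example ["Enter"]')
    return {"action": "press_key", "keys": list(keys), "screenshot": screenshot}


def scroll(delta_y: int = 0, delta_x: int = 0, coordinates=(960, 540),
           screenshot: bool = True) -> dict:
    """Build a scroll action using delta_y / delta_x, never direction/amount."""
    if delta_y == 0 and delta_x == 0:
        raise ValueError("scroll needs a non-zero delta_y or delta_x")
    payload = {"action": "scroll", "coordinates": list(coordinates),
               "screenshot": screenshot}
    if delta_y:
        payload["delta_y"] = delta_y
    if delta_x:
        payload["delta_x"] = delta_x
    return payload


def click_mouse(x: int, y: int, button: str = "left", screenshot: bool = True) -> dict:
    """Build a click_mouse action at absolute coordinates."""
    return {"action": "click_mouse", "button": button,
            "coordinates": [x, y], "screenshot": screenshot}
```

Each returned dict is the JSON body for one POST /v1/sessions/{id}/computer call.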
Computer-first recovery playbook (fast unstick)
-
take_screenshot
-
press_key → ["Escape"]
-
click outside modal area
-
scroll a bit (delta_y)
-
screenshot again
-
retry CDP approach once the blocker is gone
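The Computer part of this playbook is a fixed sequence, so it can be sent in one bounded pass. In the sketch below, `send` is an assumed callable that posts one action to POST /v1/sessions/{id}/computer and returns its result. The coordinates and the scroll distance are placeholders; pick real ones from the first screenshot.

```python
def recovery_pass(send, outside_xy=(10, 10), center_xy=(960, 540)) -> list:
    """Run one Computer recovery pass and return the result of each action."""
    steps = [
        {"action": "take_screenshot"},
        {"action": "press_key", "keys": ["Escape"], "screenshot": True},
        # Click somewhere assumed to be outside the modal.
        {"action": "click_mouse", "button": "left",
         "coordinates": list(outside_xy), "screenshot": True},
        # Small scroll to nudge lazy overlays and sticky banners.
        {"action": "scroll", "delta_y": 200,
         "coordinates": list(center_xy), "screenshot": True},
        {"action": "take_screenshot"},
    ]
    return [send(step) for step in steps]
```

After the pass returns, go back to CDP for the single retry; the function deliberately does not loop.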
Anti-bot / blocker detection and response
-
Cloudflare or anti-bot challenge wording appears ("Just a moment", "Checking your browser", etc.): wait, capture a screenshot, then run one Computer recovery pass.
-
Repeated click interception or overlay coverage persists: screenshot, press_key ["Escape"], click outside the modal, scroll, screenshot.
-
Repeated wait-for-selector timeouts on the same element: inspect the blocker state before changing selectors.
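A rough classifier for these three situations might look like the following sketch. The phrase list contains only the two examples quoted above, the threshold of two for "repeated" is an assumption, and `classify_blocker` with its category names is hypothetical.

```python
CHALLENGE_PHRASES = ("just a moment", "checking your browser")


def classify_blocker(page_text: str, click_interceptions: int = 0,
                     selector_timeouts: int = 0) -> str:
    """Map observed symptoms to one blocker category."""
    text = page_text.lower()
    if any(phrase in text for phrase in CHALLENGE_PHRASES):
        return "anti_bot_challenge"     # wait, screenshot, one recovery pass
    if click_interceptions >= 2:
        return "overlay"                # Esc, click outside, scroll
    if selector_timeouts >= 2:
        return "inspect_blocker_state"  # look at the page before changing selectors
    return "none"
```

The page text can come from the CDP side, for example the body text of the current page, while the two counters are tallied by the calling script.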
Navigating without CDP (fallback)
Since there is no navigate action, emulate it:
-
Click address bar area (top center)
-
type_text URL
-
press_key ["Enter"]
-
wait
-
take_screenshot
Step 3: CAPTCHA handling
Best default:
- set solveCaptcha: true when creating a session
If stuck:
-
use viewer URL for human-in-the-loop
-
try computer recovery steps (scroll/hover/click checkbox) only if needed
Step 4: Extract session context (cookies/storage)
Endpoint: GET /v1/sessions/{id}/context
Use to:
-
persist login state
-
debug whether session stored cookies/localStorage
-
export state for follow-up tasks
Note: if cookies/storage are empty, it may mean:
-
you never actually logged in
-
the page is blocked
-
you’re in a different origin than expected
-
the session expired or you queried the wrong ID
Step 5: Release session (always)
Endpoint: POST /v1/sessions/{id}/release
Rule:
-
Release as soon as you’ve verified success or determined you can’t proceed.
-
If release returns Session not found after success verification, treat as completed.
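The idempotent-release rule can be captured in a few lines. This sketch assumes a `post(path)` helper that returns a `(status_code, body_text)` pair. That helper, the function name, and the returned labels are all hypothetical, and matching on the words "not found" mirrors the error text quoted in this document rather than a documented response format.

```python
def release_session(session_id: str, post, success_verified: bool) -> str:
    """Release a session, treating "not found" after verified success as done."""
    status, body = post(f"/v1/sessions/{session_id}/release")
    if 200 <= status < 300:
        return "released"
    if success_verified and "not found" in body.lower():
        return "already-ended"  # cleanup already happened; not an error
    raise RuntimeError(f"release failed with status {status}")
```

When success was not verified, a "not found" response still raises, because it may mean the task ran against the wrong session ID.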
Recipes
Recipe: Newsletter signup (CDP-first)
POST /v1/scrape to find:
-
email input selector
-
submit selector
-
success message text (for verification)
Create session with long enough timeout:
{ "url": "https://site.com", "timeout": 600000 }
-
Use CDP:
-
goto
-
fill
-
click submit
-
verify success text/element
-
Release session.
Recipe: Login flow (CDP-first)
-
Create session with timeout (optionally solveCaptcha: true)
-
CDP:
-
goto(login)
-
fill(username/password)
-
click(sign in)
-
wait for logged-in selector
-
Verify via DOM (profile avatar / logout button / dashboard URL)
-
Optionally GET /v1/sessions/{id}/context to confirm cookies exist
-
Release
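The CDP part of this recipe might be sketched as below, using Playwright's synchronous Python method names (`goto`, `fill`, `click`, `wait_for_selector`). The `login` function and the keys of the `selectors` dict are hypothetical; real selectors come from scraping the login page first.

```python
def login(page, login_url: str, username: str, password: str,
          selectors: dict, timeout_ms: int = 30000) -> bool:
    """Fill and submit a login form, then wait for a logged-in marker.

    `selectors` needs the keys "username", "password", "submit" and
    "logged_in" (for example a logout button or profile avatar).
    """
    page.goto(login_url, wait_until="domcontentloaded", timeout=timeout_ms)
    page.fill(selectors["username"], username)
    page.fill(selectors["password"], password)
    page.click(selectors["submit"])
    # DOM verification: raises if the marker never appears within the timeout.
    page.wait_for_selector(selectors["logged_in"], timeout=timeout_ms)
    return True
```

Success is reported only after the logged-in marker appears, never merely because the click went through.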
Recipe: Stuck on an overlay (hybrid)
-
CDP attempts fail due to overlay/click interception
-
Use Computer API:
-
screenshot
-
press Esc
-
click close “X”
-
scroll slightly
-
screenshot
-
Return to CDP and continue with selectors
-
Verify + release
Troubleshooting (Error → Fix)
invalid_union / “No matching discriminator”
Cause: unsupported action or wrong payload shape. Fix:
-
Use only the documented Computer actions
-
Remove any navigate action usage
Invalid input: expected array … path: keys
Cause: you used key instead of keys. Fix:
{ "action": "press_key", "keys": ["Enter"] }
Scroll does nothing / “Scrolled up by 0 at (0,0)”
Cause: using direction/amount or missing delta_y . Fix:
{ "action": "scroll", "delta_y": 800, "coordinates": [960, 540] }
Session ... not found
Cause: session expired/released OR you used an old ID. Fix:
-
Create a new session with a longer timeout
-
Update the stored SESSION_ID everywhere
-
Don’t mix multiple sessions unless necessary
-
If this happens on release after successful verification, treat cleanup as already complete
connectOverCDP ... 502 Bad Gateway (to wss://connect.steel.dev/)
Cause: the WS connection is missing the required auth in its query string in this environment. Fix:
const wsUrl = `${session.websocketUrl}&apiKey=${encodeURIComponent(process.env.STEEL_API_KEY)}`;
await chromium.connectOverCDP(wsUrl);
Curl errors like “blank argument where content is expected”
Cause: broken shell quoting / multiline JSON issues. Fix:
-
Use one-line JSON with --data-raw
-
Or build payload with jq -n and pass it safely
SyntaxError / malformed page.evaluate script
Cause: mixed quoting or invalid JS embedded in shell/JSON. Fix:
-
Keep JS scripts short and pass as raw heredocs or files.
-
Validate escaping before embedding script text in one-liners.
-
Fall back to one clean script per run instead of incremental inline patches.
Cannot find module 'playwright' or runtime import failures
Cause: missing playwright package in the execution environment. Fix:
-
Use one runtime per task and confirm module availability first.
-
Install dependency before running or switch to a Python Playwright path consistently.
write_stdin failed: stdin is closed
Cause: writing to a terminated subprocess. Fix:
-
Keep task state in the Steel session rather than in a long-lived interactive subprocess, so a dead process does not strand the task.
-
Treat closed stdin as terminal for that branch; proceed with command-based rerun.
Best Practices (to prevent the exact failures from the logs)
CDP-first by default
-
Use CDP for navigation + selectors + verification
-
Only use Computer actions as an escape hatch
Always verify
For “submit” tasks:
-
Prefer DOM verification (CDP wait for success)
-
Or re-scrape and look for success text / state change
-
Don’t claim success based on “click happened”
Verification contract:
Require at least one of the following before completion:
-
Expected URL change.
-
Visible success element.
-
Expected text or state change.
If no post-condition is met, continue the retry ladder or return a blocker reason.
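The verification contract reduces to one function that reports which post-condition held. The sketch assumes a Playwright-style page in Python with a `url` property and `is_visible(selector)` and `inner_text(selector)` methods. `verify_postcondition` is a hypothetical name, and the prefix match on the URL is a simplification.

```python
from typing import Optional


def verify_postcondition(page, expected_url: Optional[str] = None,
                         success_selector: Optional[str] = None,
                         expected_text: Optional[str] = None) -> Optional[str]:
    """Return the name of the first post-condition met, or None if none is."""
    if expected_url and page.url.startswith(expected_url):
        return "url"
    if success_selector and page.is_visible(success_selector):
        return "selector"
    if expected_text and expected_text in page.inner_text("body"):
        return "text"
    return None
```

A `None` result means the task is not done: continue the retry ladder or return a blocker reason.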
Bound your retries (avoid spirals)
Suggested retry ladder:
-
CDP attempt (selectors + waits)
-
CDP attempt (adjust selectors, wait longer)
-
Computer recovery (Esc/click outside/scroll)
-
One final CDP attempt
If still blocked: stop and report what’s blocking progress.
Standardized stop conditions:
-
No more than 4 total retry loops per task.
-
Session replacement only if expiration is confirmed (Session not found).
-
At most one Computer recovery pass unless a new blocker category is observed.
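The ladder and its stop conditions can be made explicit in code so that a task cannot spiral. In this sketch, `cdp_attempt(adjusted)` is an assumed callable that returns True once a post-condition is verified, and `computer_recovery()` is an assumed callable that runs one recovery pass. The function and step names are hypothetical.

```python
def run_retry_ladder(cdp_attempt, computer_recovery):
    """Run the four-step ladder and return (success, trace of steps taken)."""
    trace = []
    # Steps 1 and 2: plain CDP attempt, then one with adjusted selectors/timeouts.
    for adjusted in (False, True):
        trace.append("cdp-adjusted" if adjusted else "cdp")
        if cdp_attempt(adjusted):
            return True, trace
    # Step 3: exactly one Computer recovery pass.
    trace.append("computer-recovery")
    computer_recovery()
    # Step 4: one final CDP attempt, then stop regardless of outcome.
    trace.append("cdp-final")
    if cdp_attempt(True):
        return True, trace
    return False, trace
```

When the function returns `False`, the trace records what was tried, which gives the blocker report a concrete starting point.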
Session hygiene
-
Set timeout
-
Reuse a single session per task whenever possible
-
Release sessions
-
Keep a single authoritative SESSION_ID
-
Treat release -> Session not found as non-fatal if success was already verified
Secret hygiene
-
Never request/paste keys
-
Never echo keys in logs
-
Prefer env vars
Summary
-
Stateless endpoints for quick extraction/screenshots/PDFs.
-
Sessions + CDP for reliable multi-step automation.
-
Computer actions as a fallback to break through blockers or recover from stuck UI.
-
Always verify outcomes and manage session lifecycles correctly.