steel-browsing-skill

Steel Browsing Skill (CDP-first)

This Skill enables reliable web browsing + automation via Steel.

Core operating principle (what we learned)

Default to CDP (Playwright/Puppeteer) for anything interactive. Use Computer API actions only as a tactical fallback for “visual” recovery (focus traps, stubborn overlays, browser-native-ish prompts, CAPTCHAs, etc.).

Default run profile (apply unless user overrides)

Assume anti-bot defenses may exist.
Prefer a Steel session + CDP for most web tasks, even when a stateless scrape looks sufficient.
Start sessions with anti-captcha/stealth settings enabled.
Keep one authoritative SESSION_ID and reuse it through retries.
Fail only after the bounded retry ladder (defined later) is exhausted.

Session contract (before any task)

Use exactly one source of truth for state: SESSION_ID, SESSION_WS, SESSION_VIEWER_URL.
SESSION_ID comes from /v1/sessions.
SESSION_WS is websocketUrl returned by create-session.
SESSION_VIEWER_URL is for human verification only.
Reuse the same SESSION_ID and SESSION_WS for all retries.
Replace all session variables only when the previous session is proven gone (Session not found or confirmed hard expiration).
Maximum 1 replacement of a session per task unless explicit expiration is confirmed.
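
A minimal sketch of the session contract above, assuming the create-session response is available as a Python dict. SessionState and replace_with are hypothetical names used for illustration, not part of any Steel SDK, and the confirmed_expiration flag is one possible reading of the "unless explicit expiration is confirmed" rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionState:
    """Single source of truth for one task's session variables (hypothetical helper)."""

    session_id: str
    session_ws: str
    viewer_url: Optional[str] = None
    replacements_used: int = 0

    @classmethod
    def from_response(cls, response: dict) -> "SessionState":
        # Field names (id, websocketUrl, sessionViewerUrl) follow this document.
        session_id = response.get("id")
        session_ws = response.get("websocketUrl")
        if not session_id or not session_ws:
            raise ValueError("create-session response lacks id or websocketUrl")
        return cls(session_id, session_ws, response.get("sessionViewerUrl"))

    def replace_with(self, response: dict, proven_gone: bool,
                     confirmed_expiration: bool = False) -> None:
        """Swap in a new session only when the previous one is proven gone."""
        if not proven_gone:
            raise RuntimeError("previous session may still be alive; reuse it")
        # Assumption: a confirmed expiration does not count against the budget of 1.
        if self.replacements_used >= 1 and not confirmed_expiration:
            raise RuntimeError("replacement budget (1 per task) is exhausted")
        fresh = SessionState.from_response(response)
        self.session_id = fresh.session_id
        self.session_ws = fresh.session_ws
        self.viewer_url = fresh.viewer_url
        self.replacements_used += 1
```

Keeping all three values on one object makes it harder to mix an old SESSION_ID with a new SESSION_WS after a replacement.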

Golden template (default hard mode)

Use this as the default flow for any new site:

POST /v1/sessions { "url": "https://target.site", "timeout": 900000, "solveCaptcha": true, "stealthConfig": { "humanizeInteractions": true, "autoCaptchaSolving": true }, "deviceConfig": { "device": "desktop" }, "region": "iad", "useProxy": false }

If using Playwright/Puppeteer, connect CDP with: websocketUrl + "&apiKey=" + encodeURIComponent(STEEL_API_KEY)
Run the interaction with selector-based waits and DOM verification.
If blocked/hung, do bounded fallback via POST /v1/sessions/{id}/computer (Esc / close overlay / small scroll), then retry once.
Always release in finally, even on failure.

Golden runbook (single-task template)

Use this exact sequence before each interactive task:

Preflight

: "${STEEL_API_KEY:?missing STEEL_API_KEY}"
command -v curl >/dev/null || exit 1
command -v jq >/dev/null || exit 1

Create one session and export state

RESPONSE=$(curl -sS -X POST https://api.steel.dev/v1/sessions \
  -H "Content-Type: application/json" \
  -H "steel-api-key: $STEEL_API_KEY" \
  --data-raw '{"url":"https://target.site","timeout":900000,"solveCaptcha":true,"stealthConfig":{"humanizeInteractions":true,"autoCaptchaSolving":true},"deviceConfig":{"device":"desktop"},"region":"iad","useProxy":false}')

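SESSION_ID=$(echo "$RESPONSE" | jq -r .id)
# URL-encode the key with jq's @uri filter before appending it to the WebSocket URL.
SESSION_WS=$(echo "$RESPONSE" | jq -r --arg key "$STEEL_API_KEY" '.websocketUrl + "&apiKey=" + ($key | @uri)')
SESSION_VIEWER_URL=$(echo "$RESPONSE" | jq -r .sessionViewerUrl)
export SESSION_ID SESSION_WS SESSION_VIEWER_URL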

Run CDP automation (single runtime path)

Use one runtime only (Playwright JS or Python Playwright).
Pass SESSION_WS and TARGET_URL as env vars.
On any recoverable exception, run one longer-timeout retry before fallback.

Verify post-condition

URL changed to target destination OR
expected success selector visible OR
expected state/text changed.

Release

curl -sS -X POST "https://api.steel.dev/v1/sessions/$SESSION_ID/release" \
  -H "steel-api-key: $STEEL_API_KEY" || true

Bounded fallback

If blocked: one Computer recovery pass (take_screenshot, press_key ["Escape"], click outside, scroll), then one final CDP retry.
If still blocked: stop and report the blocker reason.

Optional scripts for repetitive steps (non-mandatory)

Use these local helpers when you want fast, low-risk execution:

scripts/create_steel_session.sh – create session and export SESSION_ID, SESSION_WS, SESSION_VIEWER_URL, TARGET_URL.
scripts/release_steel_session.sh – idempotent release helper.
scripts/cdp_template.js – compact Playwright-CDP interaction scaffold.

Examples:

examples/runbook.md for one-shot copy/paste flow using the helper scripts.

Why:

CDP gives deterministic navigation + selectors + robust waits and verifications.
Computer actions are slower and fragile (coordinates), but excellent as an escape hatch.

Security & Setup

API key handling (mandatory policy)

Do not ask the user to paste API keys into chat.
Expect STEEL_API_KEY in the environment.

Example header (bash/curl):

-H "steel-api-key: $STEEL_API_KEY"

Base URL:

https://api.steel.dev

Runtime preflight (before first request)

If jq is missing (command -v jq fails), install it or fall back to building and parsing JSON safely in the shell.
If node is missing, switch to the Python-only CDP path.
If python is missing, use the Node-only path.
If the Playwright package is missing for the chosen runtime, install it before interacting or switch to the Python Playwright package.
At session creation, validate that the payload sets timeout and, for interactive targets, includes the anti-bot flags.
Set STEEL_API_KEY in the environment and never print request headers that contain it.

Standard session variable setup

Set and reuse export SESSION_ID=<id>.
Set and reuse export SESSION_WS="<websocketUrl>&apiKey=${STEEL_API_KEY}".
Set and reuse export SESSION_VIEWER_URL=<sessionViewerUrl>.
Treat missing SESSION_WS as hard failure before CDP code execution.
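
The variable setup above can be sketched in Python as follows. build_session_ws is a hypothetical helper name. Percent-encoding the key mirrors the encodeURIComponent call used elsewhere in this document. Choosing "?" when the URL has no query string yet is an assumption, since the document only shows the "&" form.

```python
from urllib.parse import quote


def build_session_ws(websocket_url: str, api_key: str) -> str:
    """Return the CDP WebSocket URL with the API key appended as a query parameter."""
    if not websocket_url:
        # Hard failure: never start CDP code without a WebSocket URL.
        raise RuntimeError("SESSION_WS unavailable: no websocketUrl in session response")
    if not api_key:
        raise RuntimeError("STEEL_API_KEY is not set")
    # Assumption: use "&" when a query string already exists, "?" otherwise.
    separator = "&" if "?" in websocket_url else "?"
    # safe="" also encodes "/" so the key cannot alter the URL structure.
    return f"{websocket_url}{separator}apiKey={quote(api_key, safe='')}"
```

In practice the key would come from os.environ["STEEL_API_KEY"], and the resulting value should never be logged because it embeds the secret.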

Quick Decision Tree

Use Stateless endpoints when:

You only need page content, a screenshot, or a PDF
No login/multi-step flow required

✅ Use:

POST /v1/scrape
POST /v1/screenshot
POST /v1/pdf

Use Sessions when:

Login required
Multi-step interaction
Form submissions
JS-heavy apps
You need cookies/localStorage persistence

✅ Use:

POST /v1/sessions (create; always set timeout)
CDP (preferred) using websocketUrl from session response
POST /v1/sessions/{id}/computer (fallback / recovery)
GET /v1/sessions/{id}/context (cookies/storage)
POST /v1/sessions/{id}/release (always)
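
As a rough illustration, the decision tree above reduces to a single predicate. choose_mode and its parameter names are hypothetical, and the default run profile earlier in this document may still favor a session when anti-bot defenses are suspected.

```python
def choose_mode(needs_login: bool = False, multi_step: bool = False,
                submits_form: bool = False, js_heavy: bool = False,
                needs_persistence: bool = False) -> str:
    """Return "session" if any stateful requirement applies, else "stateless"."""
    stateful = (needs_login, multi_step, submits_form, js_heavy, needs_persistence)
    return "session" if any(stateful) else "stateless"
```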

Mode 1: Stateless (One-shot)

Scrape

Use for clean text extraction and planning selectors.

Endpoint: POST /v1/scrape

Example:

{ "url": "https://example.com", "format": ["markdown"], "screenshot": false, "pdf": false }

Formats:

markdown (best for summarization)
cleaned_html (best for parsing + finding forms/selectors)
html (raw)

Tip: For form automation, scrape first and record:

input selectors (input[name=email], input[type=email], etc.)
submit button selector
success message text/element to verify completion
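
A minimal sketch of the scrape call described above, using only the Python standard library. build_scrape_request is a hypothetical helper. The endpoint, header name, and payload fields come from this document, and nothing is assumed about the shape of the response.

```python
import json
import urllib.request

BASE_URL = "https://api.steel.dev"
SCRAPE_FORMATS = ("markdown", "cleaned_html", "html")


def build_scrape_request(url: str, api_key: str,
                         fmt: str = "markdown") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/scrape request."""
    if fmt not in SCRAPE_FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    payload = {"url": url, "format": [fmt], "screenshot": False, "pdf": False}
    return urllib.request.Request(
        f"{BASE_URL}/v1/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "steel-api-key": api_key},
        method="POST",
    )
```

The request can be sent with urllib.request.urlopen(request, timeout=60). For selector planning, fmt="cleaned_html" is the better choice because form markup survives.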

Screenshot

Endpoint: POST /v1/screenshot

Example:

{ "url": "https://example.com", "fullPage": true }

PDF

Endpoint: POST /v1/pdf

Example:

{ "url": "https://example.com" }

Mode 2: Sessions (Stateful)

Session lifecycle (critical)

Sessions expire if you don’t set a long enough timeout. Common failure symptom: Session ... not found.

Rule:

Always set timeout for anything non-trivial.
Track the active SESSION_ID in one place and don’t mix IDs.
Reuse the same session for retries; don’t create a new session for each selector tweak.
Bound session creation attempts (for example: max 2 per task) to avoid session sprawl.
Always release when done.
If release returns Session not found after successful work, treat it as already-ended/idempotent cleanup.

Create session

Endpoint: POST /v1/sessions

Minimal:

{ "timeout": 600000 }

Common options:

{ "url": "https://example.com", "timeout": 900000, "solveCaptcha": true, "stealthConfig": { "humanizeInteractions": true, "autoCaptchaSolving": true }, "deviceConfig": { "device": "desktop" }, "region": "iad", "useProxy": false }

For most sites, the minimum anti-bot-safe session is:

{ "timeout": 900000, "solveCaptcha": true, "stealthConfig": { "humanizeInteractions": true, "autoCaptchaSolving": true }, "region": "iad" }
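
One way to build and check the create-session payload before sending it, shown as a hedged sketch. build_session_payload and payload_problems are hypothetical helpers. The field names come from the JSON above, and treating timeout as milliseconds is an assumption based on the values used in this document.

```python
import copy
from typing import List, Optional

ANTI_BOT_FLAGS = {
    "solveCaptcha": True,
    "stealthConfig": {"humanizeInteractions": True, "autoCaptchaSolving": True},
}


def build_session_payload(url: Optional[str] = None, timeout_ms: int = 900_000,
                          region: str = "iad", interactive: bool = True) -> dict:
    """Return a create-session payload with the anti-bot flags for interactive targets."""
    if not isinstance(timeout_ms, int) or timeout_ms <= 0:
        raise ValueError("timeout_ms must be a positive integer")
    payload = {"timeout": timeout_ms, "region": region}
    if url:
        payload["url"] = url
    if interactive:
        # Deep copy so callers cannot mutate the shared defaults.
        payload.update(copy.deepcopy(ANTI_BOT_FLAGS))
    return payload


def payload_problems(payload: dict, interactive: bool = True) -> List[str]:
    """List what is missing; an empty list means the payload passes preflight."""
    problems = []
    if "timeout" not in payload:
        problems.append("timeout is missing")
    if interactive:
        if payload.get("solveCaptcha") is not True:
            problems.append("solveCaptcha is not enabled")
        stealth = payload.get("stealthConfig") or {}
        for flag in ("humanizeInteractions", "autoCaptchaSolving"):
            if stealth.get(flag) is not True:
                problems.append(f"stealthConfig.{flag} is not enabled")
    return problems
```

Serializing the result with json.dumps also sidesteps the shell-quoting problems that hand-written one-line JSON tends to cause.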

The response typically includes:

id
websocketUrl (use for CDP)
sessionViewerUrl / debugUrl (use for human verification)

Step 2A (Preferred): Control the session via CDP (Playwright/Puppeteer)

When to use CDP

Use CDP for:

navigation (goto)
selector-based clicks and fills
robust waits and assertions
reliable verification (URL/text/DOM)

How to connect

Use the websocketUrl returned by POST /v1/sessions. (Do not guess the URL pattern; Steel returns the correct one for your session.)

Important auth note from field use:

For some environments, connectOverCDP requires appending apiKey in the WS query string.
Safe default:

const wsUrl = `${session.websocketUrl}&apiKey=${encodeURIComponent(process.env.STEEL_API_KEY)}`;
const browser = await chromium.connectOverCDP(wsUrl);

Stable CDP script pattern (copy-safe)

Use one runtime and export required variables.

Node.js (Playwright):

import { chromium } from "playwright";

// SESSION_WS already includes the apiKey query parameter, so use it as-is.
const wsUrl = process.env.SESSION_WS;
const target = process.env.TARGET_URL || "https://example.com";

(async () => {
  const browser = await chromium.connectOverCDP(wsUrl);
  const context = browser.contexts()[0];
  const page = context.pages()[0] || (await context.newPage());

  await page.goto(target, { waitUntil: "domcontentloaded", timeout: 60000 });
  await page.waitForSelector("body", { timeout: 30000 });
  // run deterministic interactions here
  await browser.close();
})();

Python (Playwright):

import asyncio
import os

from playwright.async_api import async_playwright


async def run():
    # SESSION_WS already includes the apiKey query parameter, so use it as-is.
    ws_url = os.environ["SESSION_WS"]
    target = os.environ.get("TARGET_URL", "https://example.com")
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(ws_url)
        context = browser.contexts[0]
        # pages is a property in the Python API, not a method.
        page = context.pages[0] if context.pages else await context.new_page()
        await page.goto(target, wait_until="domcontentloaded", timeout=60000)
        await page.wait_for_selector("body", timeout=30000)
        # run deterministic interactions here
        await browser.close()


asyncio.run(run())

Recommended CDP workflow

Create one session and keep its SESSION_ID as the single source of truth
CDP handshake preflight (connectOverCDP) before deeper task logic
page.goto(url) (or rely on session url at creation)
Wait for stable UI (waitForLoadState, waitForSelector)
Interact using selectors (fill, click)
Verify success via DOM (preferred), or via scrape + known success text
Release session

Failure handling inside CDP flow

If a CDP operation throws, wait and retry once with longer timeouts.
If the same selector fails twice, use one backup selector and retry once.
Do not recreate the session after a single transient timeout.

Example (Playwright-style pseudo)

import { chromium } from "playwright";

// Connect to the session; SESSION_WS is the websocketUrl with the apiKey parameter appended.
const browser = await chromium.connectOverCDP(process.env.SESSION_WS);
const context = browser.contexts()[0];
const page = context.pages()[0] ?? (await context.newPage());

await page.goto("https://example.com");
await page.waitForLoadState("domcontentloaded");
await page.fill('input[name="email"]', "test@test.com");
await page.click('button[type="submit"]');

// Verify success before reporting completion.
await page.waitForSelector("text=Thanks for subscribing", { timeout: 10000 });
await browser.close();

Prefer CDP-native solutions before falling back to Computer actions:

JS dialogs: handle via dialog listeners
File uploads: setInputFiles (avoid OS file picker)
Permissions: grant at browser context level when possible

Step 2B (Fallback): Computer API (mouse/keyboard actions)

Use Computer actions when:

CDP selectors fail repeatedly and you need a visual “nudge”
You’re blocked by a stubborn overlay/focus trap
A browser-native-ish prompt is blocking progress
You need quick recovery (Esc, click outside, scroll, etc.)

Endpoint: POST /v1/sessions/{id}/computer

Hard-learned schema rules (avoid validation errors)

There is no navigate action.
press_key requires keys as an array (NOT key).
scroll uses delta_y / delta_x (NOT direction/amount).

Action reference (safe subset)

take_screenshot: { "action": "take_screenshot" }

click_mouse: { "action": "click_mouse", "button": "left", "coordinates": [x,y], "screenshot": true }

type_text: { "action": "type_text", "text": "...", "screenshot": true }

press_key: { "action": "press_key", "keys": ["Enter"], "screenshot": true }

scroll: { "action": "scroll", "delta_y": 800, "coordinates": [x,y], "screenshot": true }

wait: { "action": "wait", "duration": 2000, "screenshot": true }
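
The schema rules and action reference above can be enforced before a request is sent. The builders below are a hypothetical sketch in Python; the action names and field names are taken from this section, and nothing beyond that subset is assumed about the Computer API.

```python
from typing import Sequence

ALLOWED_ACTIONS = {"take_screenshot", "click_mouse", "type_text",
                   "press_key", "scroll", "wait"}


def press_key(keys: Sequence[str], screenshot: bool = True) -> dict:
    """Build a press_key payload; keys must be a list, never a bare string."""
    if isinstance(keys, str) or not keys:
        raise TypeError("keys must be a non-empty list such as ['Enter']")
    return {"action": "press_key", "keys": list(keys), "screenshot": screenshot}


def scroll(delta_y: int, coordinates: Sequence[int], delta_x: int = 0,
           screenshot: bool = True) -> dict:
    """Build a scroll payload using delta_y / delta_x rather than direction/amount."""
    payload = {"action": "scroll", "delta_y": delta_y,
               "coordinates": list(coordinates), "screenshot": screenshot}
    if delta_x:
        payload["delta_x"] = delta_x
    return payload


def check_action(payload: dict) -> None:
    """Raise ValueError for payload shapes this document reports as rejected."""
    action = payload.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    if action == "press_key" and not isinstance(payload.get("keys"), list):
        raise ValueError("press_key requires keys as an array")
    if action == "scroll" and "delta_y" not in payload and "delta_x" not in payload:
        raise ValueError("scroll requires delta_y and/or delta_x")
```

Running check_action on every payload catches the navigate, key, and direction/amount mistakes locally instead of as API validation errors.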

Computer-first recovery playbook (fast unstick)

take_screenshot
press_key → ["Escape"]
click outside modal area
scroll a bit (delta_y)
screenshot again
retry CDP approach once the blocker is gone
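
The playbook above can be expressed as an ordered list of payloads to POST one at a time to /v1/sessions/{id}/computer. This is a sketch under assumptions: recovery_pass is a hypothetical helper, the 1920x1080 viewport is a guess consistent with the [960, 540] center used elsewhere in this document, and the "outside the modal" point near the left edge will not suit every layout.

```python
from typing import List, Tuple


def recovery_pass(viewport: Tuple[int, int] = (1920, 1080),
                  nudge: int = 300) -> List[dict]:
    """Return the Computer API payloads for one bounded recovery pass."""
    width, height = viewport
    center = [width // 2, height // 2]
    # Assumption: a point near the left edge lies outside a centered modal.
    outside = [10, height // 2]
    return [
        {"action": "take_screenshot"},
        {"action": "press_key", "keys": ["Escape"], "screenshot": True},
        {"action": "click_mouse", "button": "left",
         "coordinates": outside, "screenshot": True},
        {"action": "scroll", "delta_y": nudge,
         "coordinates": center, "screenshot": True},
        {"action": "take_screenshot"},
    ]
```

After the final screenshot, control returns to CDP for a single retry; the pass itself is not repeated.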

Anti-bot / blocker detection and response

Cloudflare or anti-bot challenge wording appears (Just a moment, Checking your browser, etc.): wait, capture screenshot, then one Computer recovery pass.
Repeated click interception or overlay coverage persists: screenshot, press_key ["Escape"], click outside modal, scroll, screenshot.
Repeated wait-for-selector on same element: inspect blocker state first before changing selectors.
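
A hedged sketch of how the detection rules above might be encoded. classify_blocker is a hypothetical helper, the marker list contains only the two phrases quoted in this section, and reading "repeated" as two or more occurrences is an assumption.

```python
CHALLENGE_MARKERS = ("just a moment", "checking your browser")


def classify_blocker(page_text: str, click_interceptions: int = 0,
                     selector_timeouts: int = 0) -> str:
    """Map observed symptoms to a blocker category."""
    lowered = page_text.lower()
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        return "challenge"        # wait, screenshot, one Computer recovery pass
    if click_interceptions >= 2:
        return "overlay"          # Escape, click outside, scroll, screenshot
    if selector_timeouts >= 2:
        return "inspect_blocker"  # inspect page state before changing selectors
    return "none"
```

Challenge text is checked first because a challenge page also produces selector timeouts, and changing selectors would not help there.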

Navigating without CDP (fallback)

Since there is no navigate action, emulate it:

Click address bar area (top center)
type_text URL
press_key ["Enter"]
wait
screenshot

Step 3: CAPTCHA handling

Best default:

set solveCaptcha: true when creating a session

If stuck:

use viewer URL for human-in-the-loop
try computer recovery steps (scroll/hover/click checkbox) only if needed

Step 4: Extract session context (cookies/storage)

Endpoint: GET /v1/sessions/{id}/context

Use to:

persist login state
debug whether session stored cookies/localStorage
export state for follow-up tasks

Note: if cookies/storage are empty, it may mean:

you never actually logged in
the page is blocked
you’re in a different origin than expected
session expired and you queried the wrong ID

Step 5: Release session (always)

Endpoint: POST /v1/sessions/{id}/release

Rule:

Release as soon as you’ve verified success or determined you can’t proceed.
If release returns Session not found after success verification, treat as completed.
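
The release rule above amounts to a small decision function. In this sketch release_succeeded is a hypothetical name, and matching on the text "not found" is an assumption drawn from the error wording quoted in this document rather than from a documented status code.

```python
def release_succeeded(http_ok: bool, body_text: str, success_verified: bool) -> bool:
    """Decide whether cleanup can be treated as complete."""
    if http_ok:
        return True
    session_missing = "not found" in body_text.lower()
    # A missing session after verified work already ended: idempotent cleanup.
    return session_missing and success_verified
```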

Recipes

Recipe: Newsletter signup (CDP-first)

POST /v1/scrape to find:

email input selector
submit selector
success message text (for verification)

Create session with long enough timeout:

{ "url": "https://site.com", "timeout": 600000 }

Use CDP:
goto
fill
click submit
verify success text/element
Release session.

Recipe: Login flow (CDP-first)

Create session with timeout

optionally solveCaptcha

CDP:
goto(login)
fill(username/password)
click(sign in)
wait for logged-in selector
Verify via DOM (profile avatar / logout button / dashboard URL)
Optionally GET /context to confirm cookies exist
Release

Recipe: Stuck on an overlay (hybrid)

CDP attempts fail due to overlay/click interception
Use Computer API:
screenshot
press Esc
click close “X”
scroll slightly
screenshot
Return to CDP and continue with selectors
Verify + release

Troubleshooting (Error → Fix)

invalid_union / “No matching discriminator”

Cause: unsupported action or wrong payload shape. Fix:

Use only the documented Computer actions
Remove any navigate action usage

Invalid input: expected array … path: keys

Cause: you used key instead of keys. Fix:

{ "action": "press_key", "keys": ["Enter"] }

Scroll does nothing / “Scrolled up by 0 at (0,0)”

Cause: using direction/amount or missing delta_y. Fix:

{ "action": "scroll", "delta_y": 800, "coordinates": [960, 540] }

Session ... not found

Cause: session expired/released OR you used an old ID. Fix:

Create a new session with a longer timeout
Update the stored SESSION_ID everywhere
Don’t mix multiple sessions unless necessary
If this happens on release after successful verification, treat cleanup as already complete

connectOverCDP ... 502 Bad Gateway (to wss://connect.steel.dev/)

Cause: WS connection missing required auth in query string in this environment. Fix:

const wsUrl = `${session.websocketUrl}&apiKey=${encodeURIComponent(process.env.STEEL_API_KEY)}`;
await chromium.connectOverCDP(wsUrl);

Curl errors like “blank argument where content is expected”

Cause: broken shell quoting / multiline JSON issues. Fix:

Use one-line JSON with --data-raw
Or build payload with jq -n and pass it safely

SyntaxError / malformed page.evaluate script

Cause: mixed quoting or invalid JS embedded in shell/JSON. Fix:

Keep JS scripts short and pass as raw heredocs or files.
Validate escaping before embedding script text in one-liners.
Fall back to one clean script per run instead of incremental inline patches.

Cannot find module 'playwright' or runtime import failures

Cause: missing playwright package in the execution environment. Fix:

Use one runtime per task and confirm module availability first.
Install dependency before running or switch to a Python Playwright path consistently.

write_stdin failed: stdin is closed

Cause: writing to a terminated subprocess. Fix:

Keep task state in the Steel session rather than in a long-lived interactive subprocess, so a dead process does not lose progress.
Treat closed stdin as terminal for that branch; rerun the step as a fresh command.

Best Practices (to prevent the exact failures from the logs)

CDP-first by default

Use CDP for navigation + selectors + verification
Only use Computer actions as an escape hatch

Always verify

For “submit” tasks:

Prefer DOM verification (CDP wait for success)
Or re-scrape and look for success text / state change
Don’t claim success based on “click happened”

Verification contract:

Require at least one of the following before completion: an expected URL change, a visible success element, or an expected text or state change.
If no post-condition is met, continue the retry ladder or return a blocker reason.
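
A minimal sketch of the verification contract, assuming the automation has already collected what it observed after acting. Observation and postcondition_met are hypothetical names. In a real run the fields would be filled from CDP (page URL, selector visibility, page text).

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Observation:
    """What the automation saw after acting (hypothetical container)."""

    url: str
    visible_selectors: Set[str] = field(default_factory=set)
    text: str = ""


def postcondition_met(obs: Observation,
                      expected_url_prefix: Optional[str] = None,
                      success_selector: Optional[str] = None,
                      expected_text: Optional[str] = None) -> bool:
    """Return True if at least one configured post-condition holds."""
    checks = []
    if expected_url_prefix is not None:
        checks.append(obs.url.startswith(expected_url_prefix))
    if success_selector is not None:
        checks.append(success_selector in obs.visible_selectors)
    if expected_text is not None:
        checks.append(expected_text in obs.text)
    # With no criteria configured this is False: a click alone is never success.
    return any(checks)
```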

Bound your retries (avoid spirals)

Suggested retry ladder:

CDP attempt (selectors + waits)
CDP attempt (adjust selectors, wait longer)
Computer recovery (Esc/click outside/scroll)
One final CDP attempt

If still blocked: stop and report what’s blocking progress.

Standardized stop conditions:

No more than 4 total retry loops per task.
Session replacement only if expiration is confirmed (Session not found).
At most one Computer recovery pass unless a new blocker category is observed.
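
The ladder and stop conditions above can be sketched as a bounded driver. run_retry_ladder is a hypothetical helper: cdp_attempt stands for whatever CDP routine performs the task and returns True once a post-condition is verified, and the 30 s and 60 s timeouts are illustrative values, not taken from Steel documentation.

```python
from typing import Callable, Optional


def run_retry_ladder(cdp_attempt: Callable[[int], bool],
                     computer_recovery: Callable[[], None]) -> Optional[str]:
    """Return None on verified success, else a blocker description."""

    def attempt(timeout_ms: int) -> bool:
        # Any exception counts as a failed rung, never as a reason to loop again.
        try:
            return bool(cdp_attempt(timeout_ms))
        except Exception:
            return False

    if attempt(30_000):      # rung 1: selectors + waits
        return None
    if attempt(60_000):      # rung 2: adjusted selectors, longer waits
        return None
    try:
        computer_recovery()  # rung 3: one Computer recovery pass
    except Exception:
        pass                 # a failed pass still leaves the final attempt
    if attempt(60_000):      # rung 4: one final CDP attempt
        return None
    return "still blocked after 3 CDP attempts and 1 Computer recovery pass"
```

Because the function has no loop, it cannot exceed four steps per task, and the same session is reused throughout.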

Session hygiene

Set timeout
Reuse a single session per task whenever possible
Release sessions
Keep a single authoritative SESSION_ID
Treat release -> Session not found as non-fatal if success was already verified

Secret hygiene

Never request/paste keys
Never echo keys in logs
Prefer env vars

Summary

Stateless endpoints for quick extraction/screenshots/PDFs.
Sessions + CDP for reliable multi-step automation.
Computer actions as a fallback to break through blockers or recover from stuck UI.
Always verify outcomes and manage session lifecycles correctly.
