browser

Browser automation via Vercel's agent-browser CLI. Runs headless by default; use --headed for visible window. Uses ref-based selection (@e1, @e2) from accessibility snapshots.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "browser" with this command: npx skills add gmickel/gmickel-claude-marketplace/gmickel-gmickel-claude-marketplace-browser

Browser Automation

Browser automation via Vercel's agent-browser CLI. Runs headless by default; use --headed for visible window. Uses ref-based selection (@e1, @e2) from accessibility snapshots.

Setup & Version Check

Check installed + print version

command -v agent-browser >/dev/null 2>&1 && agent-browser --version || echo "MISSING: npm i -g agent-browser && agent-browser install"

Always run the version check at the start of a browser session. agent-browser iterates quickly — check for updates if the version is more than a week old:

npm view agent-browser version # Latest published

Core Workflow

  • Open URL

  • Snapshot to get refs

  • Interact via refs

  • Re-snapshot after DOM changes

agent-browser open https://example.com agent-browser snapshot -i # Interactive elements with refs agent-browser click @e1 agent-browser wait --load networkidle # Wait for SPA to settle agent-browser snapshot -i # Re-snapshot after change

Command Chaining

Commands can be chained with && in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.

Chain open + wait + snapshot

agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i

Chain multiple interactions

agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3

When to chain: Use && when you don't need intermediate output (e.g., open + wait + screenshot). Run separately when you need to parse output first (e.g., snapshot to discover refs, then interact).

Essential Commands

Navigation

agent-browser open <url> # Navigate (aliases: goto, navigate) agent-browser back # Go back agent-browser forward # Go forward agent-browser reload # Reload agent-browser close # Close browser (aliases: quit, exit)

Snapshots

agent-browser snapshot # Full accessibility tree agent-browser snapshot -i # Interactive only (recommended) agent-browser snapshot -i -C # Include cursor-interactive (divs with onclick, cursor:pointer) agent-browser snapshot -i --json # JSON for parsing agent-browser snapshot -c # Compact (remove empty) agent-browser snapshot -d 3 # Limit depth agent-browser snapshot -s "#main" # Scope to selector

Interactions

agent-browser click @e1 # Click agent-browser click @e1 --new-tab # Click and open in new tab agent-browser dblclick @e1 # Double-click agent-browser fill @e1 "text" # Clear + fill input agent-browser type @e1 "text" # Type without clearing agent-browser press Enter # Key press agent-browser press Control+a # Key combination agent-browser keydown Shift # Hold key down agent-browser keyup Shift # Release key agent-browser hover @e1 # Hover agent-browser check @e1 # Check checkbox agent-browser uncheck @e1 # Uncheck agent-browser select @e1 "option" # Dropdown agent-browser select @e1 "a" "b" # Multi-select agent-browser scroll down 500 # Scroll direction + pixels agent-browser scrollintoview @e1 # Scroll element visible agent-browser drag @e1 @e2 # Drag and drop agent-browser upload @e1 file.pdf # Upload files

Get Info

agent-browser get text @e1 # Element text agent-browser get value @e1 # Input value agent-browser get html @e1 # Element HTML agent-browser get attr href @e1 # Attribute agent-browser get title # Page title agent-browser get url # Current URL agent-browser get count "button" # Count matches agent-browser get box @e1 # Bounding box (x, y, width, height) agent-browser get styles @e1 # Computed styles (font, color, bg)

Check State

agent-browser is visible @e1 # Check visibility agent-browser is enabled @e1 # Check enabled agent-browser is checked @e1 # Check checkbox state

Wait

agent-browser wait @e1 # Wait for element visible agent-browser wait 2000 # Wait milliseconds agent-browser wait --text "Success" # Wait for text (-t) agent-browser wait --url "**/dashboard" # Wait for URL pattern (-u) agent-browser wait --load networkidle # Wait for network idle (-l) agent-browser wait --fn "window.ready" # Wait for JS condition (-f)

Screenshots & Capture

agent-browser screenshot # Viewport to temp dir agent-browser screenshot out.png # Save to file agent-browser screenshot --full # Full page agent-browser screenshot --annotate # Annotated with numbered element labels agent-browser pdf out.pdf # Save as PDF

Diff (Compare Page States)

Compare accessibility tree or visual state before/after changes:

Snapshot diff: compare current vs last snapshot

agent-browser snapshot -i # Baseline agent-browser click @e2 # Action agent-browser diff snapshot # See what changed

Snapshot diff: compare vs saved file

agent-browser diff snapshot --baseline before.txt

Visual pixel diff

agent-browser diff screenshot --baseline before.png

Compare two URLs

agent-browser diff url https://staging.example.com https://prod.example.com agent-browser diff url <url1> <url2> --wait-until networkidle agent-browser diff url <url1> <url2> --selector "#main" agent-browser diff url <url1> <url2> --screenshot # Visual diff

diff snapshot uses + /- like git diff. diff screenshot produces a diff image with changed pixels in red + mismatch percentage.

Semantic Locators

Alternative when you know the element (no snapshot needed):

agent-browser find role button click --name "Submit" agent-browser find text "Sign In" click agent-browser find text "Sign In" click --exact # Exact match only agent-browser find label "Email" fill "user@test.com" agent-browser find placeholder "Search" fill "query" agent-browser find alt "Logo" click agent-browser find title "Close" click agent-browser find testid "submit-btn" click agent-browser find first ".item" click agent-browser find last ".item" click agent-browser find nth 2 "a" hover

Annotated Screenshots (Vision Mode)

Use --annotate to take a screenshot with numbered labels overlaid on interactive elements. Each label [N] maps to ref @eN . Also caches refs — interact immediately without separate snapshot.

agent-browser screenshot --annotate

Output includes image path + legend:

[1] @e1 button "Submit"

[2] @e2 link "Home"

[3] @e3 textbox "Email"

agent-browser click @e2 # Use ref from annotated screenshot

Use when: unlabeled icon buttons, visual-only elements, canvas/charts (invisible to text snapshots), or spatial reasoning needed.

JavaScript Evaluation

Use eval to run JS in the browser. Shell quoting can corrupt complex expressions — use --stdin or -b to avoid issues.

Simple expressions: regular quoting OK

agent-browser eval 'document.title' agent-browser eval 'document.querySelectorAll("img").length'

Complex JS: use --stdin with heredoc (RECOMMENDED)

agent-browser eval --stdin <<'EVALEOF' JSON.stringify( Array.from(document.querySelectorAll("img")) .filter(i => !i.alt) .map(i => ({ src: i.src.split("/").pop(), width: i.width })) ) EVALEOF

Alternative: base64 encoding (bypasses all shell escaping)

agent-browser eval -b "$(echo -n 'Array.from(document.querySelectorAll("a")).map(a => a.href)' | base64)"

Rules of thumb:

  • Single-line, no nested quotes → eval 'expression' with single quotes

  • Nested quotes, arrow functions, template literals, multiline → eval --stdin <<'EVALEOF'

  • Programmatic/generated scripts → eval -b with base64

Sessions

Parallel isolated browsers (see auth.md for multi-user auth):

agent-browser --session test1 open site-a.com agent-browser --session test2 open site-b.com agent-browser session list

Session Persistence

Auto-save/restore cookies and localStorage across browser restarts:

agent-browser --session-name myapp open https://app.example.com/login

... login flow ...

agent-browser close # State auto-saved to ~/.agent-browser/sessions/

Next time: state auto-loaded

agent-browser --session-name myapp open https://app.example.com/dashboard

Encrypt state at rest

export AGENT_BROWSER_ENCRYPTION_KEY=$(openssl rand -hex 32) agent-browser --session-name secure open https://app.example.com

Manage saved states

agent-browser state list agent-browser state show myapp-default.json agent-browser state clear myapp agent-browser state clean --older-than 7

Connect to Existing Chrome

Auto-discover running Chrome with remote debugging

agent-browser --auto-connect open https://example.com agent-browser --auto-connect snapshot

Or explicit CDP port

agent-browser --cdp 9222 snapshot

Local Files

agent-browser --allow-file-access open file:///path/to/document.pdf agent-browser --allow-file-access open file:///path/to/page.html agent-browser screenshot output.png

iOS Simulator (Mobile Safari)

List available iOS simulators

agent-browser device list

Launch Safari on specific device

agent-browser -p ios --device "iPhone 16 Pro" open https://example.com

Same workflow: snapshot, interact, re-snapshot

agent-browser -p ios snapshot -i agent-browser -p ios tap @e1 # Tap (alias for click) agent-browser -p ios fill @e2 "text" agent-browser -p ios swipe up # Mobile gesture agent-browser -p ios screenshot mobile.png agent-browser -p ios close

Requires: macOS with Xcode, Appium (npm install -g appium && appium driver install xcuitest ). Real devices: Use --device "<UDID>" (UDID from xcrun xctrace list devices ).

Configuration File

Create agent-browser.json in project root for persistent settings:

{ "headed": true, "proxy": "http://localhost:8080", "profile": "./browser-data" }

Priority (lowest→highest): ~/.agent-browser/config.json < ./agent-browser.json < env vars < CLI flags. Use --config <path> or AGENT_BROWSER_CONFIG for custom path. All CLI options map to camelCase keys (--executable-path → "executablePath" ).

Timeouts and Slow Pages

Default Playwright timeout is 60s. For slow pages, use explicit waits:

agent-browser wait --load networkidle # Best for slow pages agent-browser wait "#content" # Wait for specific element agent-browser wait @e1 # Wait for ref agent-browser wait --url "**/dashboard" # Wait after redirects agent-browser wait --fn "document.readyState === 'complete'" agent-browser wait 5000 # Fixed duration (last resort)

Use wait --load networkidle after open for consistently slow sites.

JSON Output

Add --json for machine-readable output:

agent-browser snapshot -i --json agent-browser get text @e1 --json agent-browser is visible @e1 --json

Recording & Profiling

Video recording

agent-browser record start demo.webm

... actions ...

agent-browser record stop agent-browser record restart take2.webm # Stop current + start new

Chrome DevTools profiling

agent-browser profiler start

... actions ...

agent-browser profiler stop trace.json

See debugging.md for details.

Examples

Form Submission

agent-browser open https://example.com/form agent-browser snapshot -i agent-browser fill @e1 "user@example.com" agent-browser fill @e2 "password123" agent-browser click @e3 agent-browser wait --load networkidle agent-browser snapshot -i # Verify result

Auth with Saved State

Login once

agent-browser open https://app.example.com/login agent-browser snapshot -i agent-browser fill @e1 "username" agent-browser fill @e2 "password" agent-browser click @e3 agent-browser wait --url "**/dashboard" agent-browser state save auth.json

Later: reuse saved auth

agent-browser state load auth.json agent-browser open https://app.example.com/dashboard

More auth patterns in auth.md.

Token Auth (Skip Login)

agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}' agent-browser snapshot -i --json

Debugging

agent-browser --headed open example.com # Show browser window agent-browser console # View console messages agent-browser errors # View page errors agent-browser highlight @e1 # Highlight element agent-browser --debug open example.com # Verbose output

See debugging.md for traces, profiling, video, common issues.

Session Cleanup

Always close sessions when done to avoid leaked processes:

agent-browser close # Close default session agent-browser --session name close # Close specific session

If previous session not closed properly, daemon may still be running. agent-browser close cleans it up.

Troubleshooting

"Browser not launched" error: Daemon stuck. Kill and retry:

pkill -f agent-browser && agent-browser open <url>

--headed not showing window: Daemon reuse bug. If daemon started headless, --headed is ignored. Kill daemon first:

agent-browser close pkill -f "node.*daemon.js.*AGENT_BROWSER" pkill -f "Google Chrome for Testing" sleep 1 agent-browser open <url> --headed

Window exists but not visible (macOS):

osascript -e 'tell application "Google Chrome for Testing" to activate'

Element not found: Re-snapshot after page changes. DOM may have updated.

Ref lifecycle: Refs (@e1 , @e2 ) are invalidated when the page changes. Always re-snapshot after clicks that navigate, form submissions, or dynamic content loading.

References

Topic File

Full command reference commands.md

Snapshot refs, lifecycle, troubleshooting snapshot-refs.md

Auth, OAuth, 2FA, state persistence auth.md

Sessions, parallel browsers, state session-management.md

Debugging, profiling, video recording debugging.md

Proxy, geo-testing, rotating proxies proxy.md

Network mocking, tabs, frames, dialogs, settings advanced.md

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

flow-next-plan-review

No summary provided by upstream source.

Repository SourceNeeds Review
General

flow-next

No summary provided by upstream source.

Repository SourceNeeds Review
General

flow-interview

No summary provided by upstream source.

Repository SourceNeeds Review
General

flow-plan-review

No summary provided by upstream source.

Repository SourceNeeds Review