Browser Automation

Browser automation via Vercel's agent-browser CLI. It runs headless by default; pass --headed for a visible window. Elements are selected by refs (@e1, @e2) taken from accessibility snapshots.

Setup & Version Check

# Check installed + print version
if command -v agent-browser >/dev/null 2>&1; then agent-browser --version; else echo "MISSING: npm i -g agent-browser && agent-browser install"; fi

Always run the version check at the start of a browser session. agent-browser releases frequently, so check for updates if the installed version is more than a week old:

npm view agent-browser version # Latest published
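The two checks above can be combined into one helper. This is a minimal sketch, assuming `agent-browser --version` prints a dotted x.y.z version somewhere in its output (the exact format is not stated here). `extract_semver` and `check_agent_browser_update` are hypothetical helper names, not part of the CLI.

```shell
# Hypothetical helper: print the first x.y.z token found in the given text.
# Assumption: the --version output contains such a token.
extract_semver() {
  printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Hypothetical helper: compare the installed version with the latest on npm.
check_agent_browser_update() {
  installed=$(extract_semver "$(agent-browser --version 2>/dev/null)")
  latest=$(extract_semver "$(npm view agent-browser version 2>/dev/null)")
  if [ -z "$installed" ] || [ -z "$latest" ]; then
    echo "unknown"
    return 1
  fi
  if [ "$installed" = "$latest" ]; then
    echo "up-to-date ($installed)"
  else
    echo "update available: $installed -> $latest"
  fi
}
```

Call `check_agent_browser_update` at the start of a session. It only compares for equality, so a locally newer pre-release would also be reported as an available update.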

Core Workflow

1. Open URL
2. Snapshot to get refs
3. Interact via refs
4. Re-snapshot after DOM changes

agent-browser open https://example.com
agent-browser snapshot -i  # Interactive elements with refs
agent-browser click @e1
agent-browser wait --load networkidle  # Wait for SPA to settle
agent-browser snapshot -i  # Re-snapshot after change

Command Chaining

Commands can be chained with && in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.

# Chain open + wait + snapshot
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i

# Chain multiple interactions
agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3

When to chain: Use && when you don't need intermediate output (e.g., open + wait + screenshot). Run separately when you need to parse output first (e.g., snapshot to discover refs, then interact).

Essential Commands

Navigation

agent-browser open <url>  # Navigate (aliases: goto, navigate)
agent-browser back  # Go back
agent-browser forward  # Go forward
agent-browser reload  # Reload
agent-browser close  # Close browser (aliases: quit, exit)

Snapshots

agent-browser snapshot  # Full accessibility tree
agent-browser snapshot -i  # Interactive only (recommended)
agent-browser snapshot -i -C  # Include cursor-interactive (divs with onclick, cursor:pointer)
agent-browser snapshot -i --json  # JSON for parsing
agent-browser snapshot -c  # Compact (remove empty)
agent-browser snapshot -d 3  # Limit depth
agent-browser snapshot -s "#main"  # Scope to selector

Interactions

agent-browser click @e1  # Click
agent-browser click @e1 --new-tab  # Click and open in new tab
agent-browser dblclick @e1  # Double-click
agent-browser fill @e1 "text"  # Clear + fill input
agent-browser type @e1 "text"  # Type without clearing
agent-browser press Enter  # Key press
agent-browser press Control+a  # Key combination
agent-browser keydown Shift  # Hold key down
agent-browser keyup Shift  # Release key
agent-browser hover @e1  # Hover
agent-browser check @e1  # Check checkbox
agent-browser uncheck @e1  # Uncheck
agent-browser select @e1 "option"  # Dropdown
agent-browser select @e1 "a" "b"  # Multi-select
agent-browser scroll down 500  # Scroll direction + pixels
agent-browser scrollintoview @e1  # Scroll element visible
agent-browser drag @e1 @e2  # Drag and drop
agent-browser upload @e1 file.pdf  # Upload files

Get Info

agent-browser get text @e1  # Element text
agent-browser get value @e1  # Input value
agent-browser get html @e1  # Element HTML
agent-browser get attr href @e1  # Attribute
agent-browser get title  # Page title
agent-browser get url  # Current URL
agent-browser get count "button"  # Count matches
agent-browser get box @e1  # Bounding box (x, y, width, height)
agent-browser get styles @e1  # Computed styles (font, color, bg)

Check State

agent-browser is visible @e1  # Check visibility
agent-browser is enabled @e1  # Check enabled
agent-browser is checked @e1  # Check checkbox state

Wait

agent-browser wait @e1  # Wait for element visible
agent-browser wait 2000  # Wait milliseconds
agent-browser wait --text "Success"  # Wait for text (-t)
agent-browser wait --url "**/dashboard"  # Wait for URL pattern (-u)
agent-browser wait --load networkidle  # Wait for network idle (-l)
agent-browser wait --fn "window.ready"  # Wait for JS condition (-f)

Screenshots & Capture

agent-browser screenshot  # Viewport to temp dir
agent-browser screenshot out.png  # Save to file
agent-browser screenshot --full  # Full page
agent-browser screenshot --annotate  # Annotated with numbered element labels
agent-browser pdf out.pdf  # Save as PDF

Diff (Compare Page States)

Compare accessibility tree or visual state before/after changes:

# Snapshot diff: compare current vs last snapshot
agent-browser snapshot -i  # Baseline
agent-browser click @e2  # Action
agent-browser diff snapshot  # See what changed

# Snapshot diff: compare vs saved file
agent-browser diff snapshot --baseline before.txt

# Visual pixel diff
agent-browser diff screenshot --baseline before.png

# Compare two URLs
agent-browser diff url https://staging.example.com https://prod.example.com
agent-browser diff url <url1> <url2> --wait-until networkidle
agent-browser diff url <url1> <url2> --selector "#main"
agent-browser diff url <url1> <url2> --screenshot  # Visual diff

diff snapshot marks changes with +/- like git diff. diff screenshot produces a diff image with changed pixels highlighted in red, plus a mismatch percentage.
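When only the size of a change matters, the +/- markers can be counted instead of read. The sketch below assumes added lines start with `+` and removed lines start with `-`, as in git diff; any header lines that also begin with those characters would be counted too. `summarize_diff` is a hypothetical helper name.

```shell
# Hypothetical helper: read diff text on stdin, print "+<added> -<removed>".
# Assumption: added lines begin with "+" and removed lines with "-".
summarize_diff() {
  awk '/^\+/ {added++} /^-/ {removed++} END {printf "+%d -%d\n", added, removed}'
}

# Intended use (needs a running browser session):
#   agent-browser diff snapshot | summarize_diff
```

A result of `+0 -0` after an action suggests the action had no visible effect on the accessibility tree.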

Semantic Locators

Alternative when you know the element (no snapshot needed):

agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find text "Sign In" click --exact  # Exact match only
agent-browser find label "Email" fill "user@test.com"
agent-browser find placeholder "Search" fill "query"
agent-browser find alt "Logo" click
agent-browser find title "Close" click
agent-browser find testid "submit-btn" click
agent-browser find first ".item" click
agent-browser find last ".item" click
agent-browser find nth 2 "a" hover

Annotated Screenshots (Vision Mode)

Use --annotate to take a screenshot with numbered labels overlaid on interactive elements. Each label [N] maps to ref @eN. The command also caches the refs, so you can interact immediately without a separate snapshot.

agent-browser screenshot --annotate
# Output includes image path + legend:
#   [1] @e1 button "Submit"
#   [2] @e2 link "Home"
#   [3] @e3 textbox "Email"
agent-browser click @e2  # Use ref from annotated screenshot

Use when: unlabeled icon buttons, visual-only elements, canvas/charts (invisible to text snapshots), or spatial reasoning needed.
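The legend can also be parsed in a script to look up a ref by its label. This is a minimal sketch that assumes each legend line has the shape `[N] @eN role "name"` shown above; `legend_ref` is a hypothetical helper name.

```shell
# Hypothetical helper: read legend text on stdin and print the ref (@eN)
# of the first entry whose quoted name equals $1. Prints nothing if absent.
legend_ref() {
  awk -v want="$1" '
    /^\[[0-9]+\] @e[0-9]+ / {
      if (index($0, "\"" want "\"") > 0) { print $2; exit }
    }'
}

# Intended use (needs a running browser session):
#   ref=$(agent-browser screenshot --annotate | legend_ref "Home")
#   [ -n "$ref" ] && agent-browser click "$ref"
```

Lines that do not match the legend shape, such as the image path, are ignored.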

JavaScript Evaluation

Use eval to run JS in the browser. Shell quoting can corrupt complex expressions, so use --stdin or -b for anything beyond a simple expression.

# Simple expressions: regular quoting OK
agent-browser eval 'document.title'
agent-browser eval 'document.querySelectorAll("img").length'

# Complex JS: use --stdin with heredoc (RECOMMENDED)
agent-browser eval --stdin <<'EVALEOF'
JSON.stringify(
  Array.from(document.querySelectorAll("img"))
    .filter(i => !i.alt)
    .map(i => ({ src: i.src.split("/").pop(), width: i.width }))
)
EVALEOF

# Alternative: base64 encoding (bypasses all shell escaping)
# printf avoids non-portable "echo -n"; tr removes the line wraps some base64 tools insert
agent-browser eval -b "$(printf '%s' 'Array.from(document.querySelectorAll("a")).map(a => a.href)' | base64 | tr -d '\n')"

Rules of thumb:

Single-line, no nested quotes → eval 'expression' with single quotes
Nested quotes, arrow functions, template literals, multiline → eval --stdin <<'EVALEOF'
Programmatic/generated scripts → eval -b with base64
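For the generated-script case, the encoding step can be wrapped in a small function. This is a sketch only: `encode_js` is a hypothetical helper name, and it assumes `eval -b` expects a single unwrapped base64 string.

```shell
# Hypothetical helper: base64-encode a JS snippet as one line.
# printf '%s' avoids a trailing newline in the input; tr strips the line
# wraps that some base64 implementations add to long output.
encode_js() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Intended use (needs a running browser session):
#   script='Array.from(document.querySelectorAll("a")).map(a => a.href)'
#   agent-browser eval -b "$(encode_js "$script")"
```

Keeping the script in a variable means it is quoted once, where it is defined, and never re-parsed by the shell.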

Sessions

Parallel isolated browsers (see auth.md for multi-user auth):

agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list
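Because sessions are isolated, several can be opened concurrently from one script. The sketch below is one way to do that with shell background jobs. `open_in_sessions` and its `name=url` argument format are hypothetical, and it is an assumption that concurrent `open` calls in different sessions do not interfere with each other.

```shell
# Hypothetical helper: open one URL per session in parallel.
# Each argument has the form "<session>=<url>".
open_in_sessions() {
  for pair in "$@"; do
    session=${pair%%=*}   # text before the first "="
    url=${pair#*=}        # text after the first "="
    agent-browser --session "$session" open "$url" &
  done
  wait                    # block until every background open has finished
}

# Intended use:
#   open_in_sessions test1=https://site-a.com test2=https://site-b.com
```

Splitting on the first `=` only keeps URLs with query strings intact.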

Session Persistence

Auto-save/restore cookies and localStorage across browser restarts:

agent-browser --session-name myapp open https://app.example.com/login
# ... login flow ...
agent-browser close  # State auto-saved to ~/.agent-browser/sessions/

# Next time: state auto-loaded
agent-browser --session-name myapp open https://app.example.com/dashboard

# Encrypt state at rest
export AGENT_BROWSER_ENCRYPTION_KEY=$(openssl rand -hex 32)
agent-browser --session-name secure open https://app.example.com

# Manage saved states
agent-browser state list
agent-browser state show myapp-default.json
agent-browser state clear myapp
agent-browser state clean --older-than 7

Connect to Existing Chrome

# Auto-discover running Chrome with remote debugging
agent-browser --auto-connect open https://example.com
agent-browser --auto-connect snapshot

# Or explicit CDP port
agent-browser --cdp 9222 snapshot

Local Files

agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.html
agent-browser screenshot output.png

iOS Simulator (Mobile Safari)

# List available iOS simulators
agent-browser device list

# Launch Safari on specific device
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com

# Same workflow: snapshot, interact, re-snapshot
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1  # Tap (alias for click)
agent-browser -p ios fill @e2 "text"
agent-browser -p ios swipe up  # Mobile gesture
agent-browser -p ios screenshot mobile.png
agent-browser -p ios close

Requires: macOS with Xcode and Appium (npm install -g appium && appium driver install xcuitest). Real devices: use --device "<UDID>" (UDID from xcrun xctrace list devices).

Configuration File

Create agent-browser.json in project root for persistent settings:

{
  "headed": true,
  "proxy": "http://localhost:8080",
  "profile": "./browser-data"
}

Priority (lowest→highest): ~/.agent-browser/config.json < ./agent-browser.json < env vars < CLI flags. Use --config <path> or AGENT_BROWSER_CONFIG for a custom path. All CLI options map to camelCase keys (--executable-path → "executablePath").
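The flag-to-key mapping is mechanical, so it can be scripted when generating a config file from existing command lines. This is a minimal sketch of the camelCase rule as described above; `flag_to_key` is a hypothetical helper name, not part of the CLI.

```shell
# Hypothetical helper: convert a long CLI flag to its camelCase config key,
# e.g. --executable-path -> executablePath.
flag_to_key() {
  printf '%s\n' "$1" | awk -F '-' '{
    out = ""
    for (i = 1; i <= NF; i++) {
      if ($i == "") continue                 # empty fields come from the leading "--"
      if (out == "") out = $i                # first word stays lowercase
      else out = out toupper(substr($i, 1, 1)) substr($i, 2)
    }
    print out
  }'
}
```

For example, `flag_to_key --executable-path` prints `executablePath`, matching the mapping given in the text.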

Timeouts and Slow Pages

Default Playwright timeout is 60s. For slow pages, use explicit waits:

agent-browser wait --load networkidle  # Best for slow pages
agent-browser wait "#content"  # Wait for specific element
agent-browser wait @e1  # Wait for ref
agent-browser wait --url "**/dashboard"  # Wait after redirects
agent-browser wait --fn "document.readyState === 'complete'"
agent-browser wait 5000  # Fixed duration (last resort)

Use wait --load networkidle after open for consistently slow sites.

JSON Output

Add --json for machine-readable output:

agent-browser snapshot -i --json
agent-browser get text @e1 --json
agent-browser is visible @e1 --json

Recording & Profiling

# Video recording
agent-browser record start demo.webm
# ... actions ...
agent-browser record stop
agent-browser record restart take2.webm  # Stop current + start new

# Chrome DevTools profiling
agent-browser profiler start
# ... actions ...
agent-browser profiler stop trace.json

See debugging.md for details.

Examples

Form Submission

agent-browser open https://example.com/form
agent-browser snapshot -i
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i  # Verify result

Auth with Saved State

# Login once
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json

# Later: reuse saved auth
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard

More auth patterns in auth.md.

Token Auth (Skip Login)

agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'
agent-browser snapshot -i --json

Debugging

agent-browser --headed open example.com  # Show browser window
agent-browser console  # View console messages
agent-browser errors  # View page errors
agent-browser highlight @e1  # Highlight element
agent-browser --debug open example.com  # Verbose output

See debugging.md for traces, profiling, video, common issues.

Session Cleanup

Always close sessions when done to avoid leaked processes:

agent-browser close  # Close default session
agent-browser --session name close  # Close specific session

If a previous session was not closed properly, the daemon may still be running. agent-browser close cleans it up.
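In scripts, a failing step can skip the close call and leave the daemon running. The sketch below is one simple way to guard against that. `run_with_cleanup` is a hypothetical helper name, and it does not cover the script being killed by a signal.

```shell
# Hypothetical helper: run a command (or shell function), then always close
# the default session, and return the command's own exit status.
run_with_cleanup() {
  "$@"
  rc=$?
  agent-browser close >/dev/null 2>&1
  return "$rc"
}

# Intended use, where my_browser_flow is your own function:
#   run_with_cleanup my_browser_flow || echo "flow failed, browser closed"
```

Call it in an `||` or `if` context when the script runs under `set -e`, so a failing flow still reaches the close step.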

Troubleshooting

"Browser not launched" error: The daemon is stuck. Kill it and retry:

pkill -f agent-browser
agent-browser open <url>

--headed not showing window: Daemon reuse bug. If the daemon was started headless, --headed is ignored. Kill the daemon first:

agent-browser close
pkill -f "node.*daemon.js.*AGENT_BROWSER"
pkill -f "Google Chrome for Testing"
sleep 1
agent-browser open <url> --headed

Window exists but not visible (macOS):

osascript -e 'tell application "Google Chrome for Testing" to activate'

Element not found: Re-snapshot after page changes; the DOM may have updated.

Ref lifecycle: Refs (@e1, @e2) are invalidated when the page changes. Always re-snapshot after clicks that navigate, form submissions, or dynamic content loading.

References

Topic: File
Full command reference: commands.md
Snapshot refs, lifecycle, troubleshooting: snapshot-refs.md
Auth, OAuth, 2FA, state persistence: auth.md
Sessions, parallel browsers, state: session-management.md
Debugging, profiling, video recording: debugging.md
Proxy, geo-testing, rotating proxies: proxy.md
Network mocking, tabs, frames, dialogs, settings: advanced.md
