agent-browser

Browser automation CLI for AI agents, built on Vercel's agent-browser. It targets elements through deterministic refs taken from the accessibility tree instead of fragile selectors, and is optimized for LLMs with a fast Rust CLI, JSON output, and purpose-built AI workflows. Use it when you need reliable, scriptable browser automation.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "agent-browser" with this command: npx skills add clawdbrunner/skill-agent-browser/clawdbrunner-skill-agent-browser-agent-browser

agent-browser Skill

Browser automation that actually works for AI agents. Built by Vercel Labs specifically for LLM-driven workflows.

Why This Works Better Than Alternatives

1. Deterministic Refs (The Game-Changer)

Problem with traditional tools:

  • CSS selectors break when websites change
  • XPath is brittle and unreadable
  • Coordinate-based clicking fails on responsive layouts
  • Vision-based approaches are slow and expensive

The agent-browser solution:

# 1. Get snapshot with stable refs
agent-browser snapshot -i --json
# Output: - button "Submit" [ref=e2]

# 2. Use that ref while the snapshot is current; it points to the EXACT element
agent-browser click @e2
  • Refs are deterministic: @e2 always points to the same element from your snapshot
  • No DOM re-query: a direct reference is faster and more reliable
  • AI-optimized: LLMs parse the accessibility tree naturally, not CSS soup
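
As a minimal sketch of how an agent might collect refs from a text snapshot, the helper below pulls every `[ref=...]` marker out of snapshot text and prints it in the `@eN` form that `click` and `fill` accept. The line format is assumed from the sample output above, and `list_refs` is a hypothetical helper name, not part of agent-browser.

```shell
# list_refs: read snapshot text on stdin, print one @ref per line.
# Assumes each element line carries a marker like [ref=e2], as in the
# sample output above. Hypothetical helper, not an agent-browser command.
list_refs() {
  sed -n 's/.*\[ref=\([A-Za-z0-9]*\)].*/@\1/p'
}
```

Under those assumptions, `agent-browser snapshot -i | list_refs` would print the refs available for the current page, one per line.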

2. Accessibility Trees > Screenshots/HTML

Traditional tools give you raw HTML (noisy) or screenshots (require vision models).

agent-browser gives you the accessibility tree — a clean, semantic representation of what a human (or screen reader) would perceive:

- heading "Billing" [level=1]
- link "Make a payment" [ref=e10]
- button "Submit" [ref=e2]
- textbox "Email" [ref=e3]
  • Semantic roles (button, link, textbox, heading)
  • Human-readable labels
  • Hierarchical structure
  • Perfect for LLM comprehension
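
Because each line pairs a role and a label with a ref, looking up an element can be a plain text match. The sketch below assumes snapshot lines shaped exactly like the sample above; `ref_for` is a hypothetical helper name, and a real agent would more likely read the `--json` output.

```shell
# ref_for ROLE NAME: read snapshot text on stdin and print the @ref of the
# first element whose role and accessible name match exactly.
# Assumes lines shaped like: - button "Submit" [ref=e2]
# Hypothetical helper, not an agent-browser command.
ref_for() {
  grep -F -e "- $1 \"$2\" [ref=" |
    sed -n 's/.*\[ref=\([A-Za-z0-9]*\)].*/@\1/p' |
    head -n 1
}
```

With the sample tree above on stdin, `ref_for textbox Email` would print `@e3`; an element that is not in the snapshot produces no output.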

3. Built for AI Agents

| Feature | Traditional Tools | agent-browser |
| --- | --- | --- |
| Element targeting | Fragile selectors | Deterministic refs |
| Page understanding | Raw HTML | Accessibility tree |
| Output format | Text logs | Structured JSON |
| Speed | Slow (full browser per command) | Fast (daemon persists) |
| AI integration | Afterthought | Purpose-built |

4. Fast Architecture

  • Rust CLI — Native binary, instant command parsing
  • Node.js Daemon — Browser stays warm between commands
  • First command: ~2s (daemon startup)
  • Subsequent commands: ~100ms

Prerequisites

npm install -g agent-browser
agent-browser install  # Download Chromium (~30s)

Core AI Workflow

The workflow designed for LLM agents:

# Step 1: Navigate
agent-browser open https://example.com

# Step 2: Get structured snapshot (the AI "sees" the page)
agent-browser snapshot -i --json

# Step 3: AI picks refs from JSON, execute actions
agent-browser click @e2
agent-browser fill @e3 "test@example.com"

# Step 4: Re-snapshot after changes (state verification)
agent-browser snapshot -i --json

# Step 5: Done
agent-browser close
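
When the steps above run from a script, it helps to stop at the first failed step rather than act on a page that never loaded. The following is a sketch only: it assumes the CLI exits non-zero on failure, and `AB_CMD`, `ab`, and `open_and_snapshot` are hypothetical names introduced for illustration.

```shell
# AB_CMD names the binary to run; overriding it is a hypothetical hook
# that makes the wrapper testable without a browser.
AB_CMD="${AB_CMD:-agent-browser}"

# ab: run one agent-browser command; on failure, report the step on stderr
# and return non-zero so the caller can stop.
# Assumes the CLI exits non-zero when a command fails.
ab() {
  "$AB_CMD" "$@" || {
    echo "agent-browser step failed: $*" >&2
    return 1
  }
}

# open_and_snapshot URL: steps 1 and 2 of the workflow, chained with && so
# the snapshot only runs if navigation succeeded.
open_and_snapshot() {
  ab open "$1" && ab snapshot -i --json
}
```

The later steps can be chained the same way, for example `ab click @e2 && ab snapshot -i --json`.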

Commands

Navigation

agent-browser open example.com
agent-browser open example.com --json            # JSON response
agent-browser open example.com --headed          # Visible browser

Snapshot (The Killer Feature)

agent-browser snapshot                           # Full accessibility tree
agent-browser snapshot -i                        # Interactive only (faster)
agent-browser snapshot -i --json                 # JSON for AI parsing
agent-browser snapshot -i -c -d 5 --json         # Compact, depth-limited

Interaction (Using Deterministic Refs)

agent-browser click @e2                          # Click element @e2
agent-browser fill @e3 "text"                    # Clear field, then fill
agent-browser type @e3 "text"                    # Type without clearing
agent-browser press Enter                        # Press key
agent-browser hover @e4                          # Hover

State Verification

agent-browser get text @e1                       # Get element text
agent-browser get url                            # Current URL
agent-browser is visible @e2                     # Check visibility
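
These getters can back simple assertions in a script. The sketch below assumes `get text` prints only the element's text on stdout, which is not confirmed here; `expect_text` and `AB_CMD` are hypothetical names.

```shell
# AB_CMD names the binary to run (hypothetical override hook for testing).
AB_CMD="${AB_CMD:-agent-browser}"

# expect_text REF EXPECTED: succeed only if `get text REF` prints exactly
# EXPECTED. Assumes the command writes the bare text to stdout.
expect_text() {
  actual=$("$AB_CMD" get text "$1") || return 1
  [ "$actual" = "$2" ]
}
```

A script could then guard the next action with something like `expect_text @e1 "Billing" || exit 1`.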

Session Management

agent-browser --session login open site.com      # Isolated session
agent-browser --profile ~/.myprofile open site   # Persistent cookies
agent-browser close                              # Clean up
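
To keep several isolated sessions apart in one script, the `--session` flag can be wrapped once instead of repeated on every line. This is a sketch; `in_session` and `AB_CMD` are hypothetical names, and only the `--session <name>` flag comes from the commands above.

```shell
# AB_CMD names the binary to run (hypothetical override hook for testing).
AB_CMD="${AB_CMD:-agent-browser}"

# in_session NAME ARGS...: run one command inside the named session by
# prepending --session NAME to the arguments.
in_session() {
  session_name=$1
  shift
  "$AB_CMD" --session "$session_name" "$@"
}
```

For example, `in_session login open site.com` followed by `in_session login snapshot -i --json` keeps both commands in the same session.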

Selector Strategies (Ranked by Reliability)

1. Refs (Best - Use These)

# From snapshot output — deterministic and stable
agent-browser click @e2
agent-browser fill @e3 "text"

2. Semantic Locators (Good)

agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"

3. CSS Selectors (Okay for static sites)

agent-browser click "#submit"
agent-browser click ".btn-primary"

4. Text/XPath (Last resort)

agent-browser click "text=Submit"
agent-browser click "xpath=//button[1]"
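
The ranking above can be applied as a fallback chain: try the most reliable target first and move down only when a click fails. The sketch assumes `click` exits non-zero when the target cannot be found; `click_first` and `AB_CMD` are hypothetical names.

```shell
# AB_CMD names the binary to run (hypothetical override hook for testing).
AB_CMD="${AB_CMD:-agent-browser}"

# click_first TARGET...: try each target in the order given (most reliable
# first) and stop at the first click that succeeds.
# Assumes `click` exits non-zero when the target is missing.
click_first() {
  for target in "$@"; do
    if "$AB_CMD" click "$target" 2>/dev/null; then
      echo "clicked via $target"
      return 0
    fi
  done
  echo "no target could be clicked" >&2
  return 1
}
```

A call such as `click_first @e2 "#submit" "text=Submit"` would try the ref, then the CSS selector, then the text selector.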

Snapshot Options

Control what the AI "sees":

| Flag | Purpose |
| --- | --- |
| -i | Interactive elements only (buttons, links, inputs); recommended |
| -C | Include cursor-interactive elements (onclick, cursor:pointer) |
| -c | Compact (remove empty structural elements) |
| -d <n> | Limit tree depth |
| -s <sel> | Scope to CSS selector (e.g., #main) |
| --json | Machine-readable JSON output; essential for AI |

Recommended AI command:

agent-browser snapshot -i -c --json
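
The flags combine freely, so a small wrapper can keep snapshots narrow on large pages. The sketch below only uses flags from the table above; `scoped_snapshot` and `AB_CMD` are hypothetical names.

```shell
# AB_CMD names the binary to run (hypothetical override hook for testing).
AB_CMD="${AB_CMD:-agent-browser}"

# scoped_snapshot SELECTOR [DEPTH]: interactive, compact JSON snapshot
# limited to one region of the page; DEPTH is optional.
scoped_snapshot() {
  if [ -n "${2:-}" ]; then
    "$AB_CMD" snapshot -i -c -d "$2" -s "$1" --json
  else
    "$AB_CMD" snapshot -i -c -s "$1" --json
  fi
}
```

For instance, `scoped_snapshot "#main" 5` would request only the interactive elements under `#main`, five levels deep.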

Options

| Flag | Description |
| --- | --- |
| --json | JSON output with success/data/error structure |
| --headed | Show browser window (for debugging) |
| --session <name> | Isolated browser session |
| --profile <path> | Persistent profile for cookies/logins |
| --cdp <port> | Connect to existing Chrome via DevTools Protocol |
| --headers <json> | Set auth headers per origin |
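
Given the success/data/error structure of `--json` output, a script can branch on the `success` field before trusting the rest of the response. The sketch below is a plain-text match rather than a JSON parser, and it assumes `success` is a boolean field as the table suggests; `json_ok` is a hypothetical helper name.

```shell
# json_ok: read a --json response on stdin; succeed only if it contains
# "success": true. A text match, not a real JSON parser, so it is only
# suitable as a quick gate. The exact response shape is an assumption.
json_ok() {
  grep -q '"success"[[:space:]]*:[[:space:]]*true'
}
```

A typical use would be `agent-browser open example.com --json | json_ok || echo "open failed" >&2`.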

Example: Complete Login Flow

# Start
agent-browser open https://portal.aeronetpr.com

# Get page structure
SNAPSHOT=$(agent-browser snapshot -i --json)
# AI parses JSON: sees textbox @e1 (Username), textbox @e2 (Password), button @e3 (Login)

# Execute login
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3

# Verify success (wait for navigation, re-snapshot)
sleep 2
agent-browser snapshot -i --json

# Done
agent-browser close
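
A fixed `sleep 2` can be too short on a slow login and wasteful on a fast one. One alternative, sketched here, polls `get url` until the address changes. It assumes `get url` prints the URL on stdout; `wait_for_url`, `AB_CMD`, and the `/dashboard` path in the usage note are hypothetical.

```shell
# AB_CMD names the binary to run (hypothetical override hook for testing).
AB_CMD="${AB_CMD:-agent-browser}"

# wait_for_url TEXT [TRIES] [DELAY]: poll `get url` until the current URL
# contains TEXT. Defaults: 10 tries, 1 second apart. Returns 1 on timeout.
wait_for_url() {
  tries=${2:-10}
  delay=${3:-1}
  while [ "$tries" -gt 0 ]; do
    case "$("$AB_CMD" get url 2>/dev/null)" in
      *"$1"*) return 0 ;;
    esac
    tries=$((tries - 1))
    # Pause only if another attempt is coming.
    if [ "$tries" -gt 0 ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

In the login flow above, `wait_for_url /dashboard` could stand in for `sleep 2`, with the path adjusted to wherever the site lands after a successful login.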

Tips for AI Agents

  1. Always use --json: structured output is easier to parse than text
  2. Use the -i flag: interactive-only snapshots are smaller, faster, cleaner
  3. Re-snapshot after actions: verify that state changed as expected
  4. Trust refs over selectors: @e2 from a snapshot > #id that might change
  5. Use semantic locators when refs expire: find role button click is robust
  6. Session persistence: one open, many commands, one close

Comparison to Other Tools

| Tool | Best For | Why agent-browser Wins |
| --- | --- | --- |
| Puppeteer/Playwright | Dev testing | Built for humans; brittle selectors |
| Selenium | Legacy testing | Slow, heavy, selector-based |
| browser-use | Python agents | agent-browser has better refs system |
| Screenshot + Vision | Visual tasks | agent-browser is 10x faster, 100x cheaper |
| OpenClaw browser tool | Simple tasks | agent-browser handles complex flows better |

When to Use This Skill

Use agent-browser when:

  • You are automating multi-step web workflows
  • You are filling complex forms
  • You need reliable, repeatable automation
  • You are working with dynamic/modern web apps
  • Cost matters (no vision API calls)

Use OpenClaw's built-in browser tool when:

  • You only need a simple single-page check
  • You need a quick screenshot
  • You already have an authenticated session in Chrome

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
