agent-browser CLI
A headless browser automation CLI designed for AI agents, with fast Rust-based execution and element refs optimized for LLM reasoning.
Overview
agent-browser provides programmatic browser control through a CLI that's purpose-built for AI agent workflows. It uses deterministic element references (@e1, @e2, etc.) from accessibility trees, making it ideal for LLM-based automation where consistent element targeting is critical.
Key Features:
- Fast Rust-based CLI with Node.js fallback
- Element refs (
@e1,@e2) for stable LLM reasoning - Session isolation for parallel browser instances
- JSON output for programmatic parsing
- Accessibility tree snapshots for element discovery
- CDP connection support for existing browser instances
When to Use
- Automating web interactions (form filling, clicking, navigation)
- Scraping web content with accessibility tree parsing
- Testing web applications programmatically
- Multi-agent scenarios requiring isolated browser sessions
- Screenshot capture and PDF generation
- Monitoring web page state changes
Prerequisites
- Node.js >= 18 or Bun runtime
- Chromium (installed via
agent-browser install)
Installation
# Install globally
bun install -g agent-browser
# Download Chromium
agent-browser install
# Linux with system dependencies
agent-browser install --with-deps
Verify installation:
agent-browser --version
Quick Start
# Navigate to a page
agent-browser open https://example.com
# Get accessibility snapshot with element refs
agent-browser snapshot -i # -i = interactive elements only
# Click an element by ref
agent-browser click @e2
# Fill a form field
agent-browser fill @e3 "test@example.com"
# Take a screenshot
agent-browser screenshot output.png
Core Workflow
1. Open Page and Snapshot
# Open URL
agent-browser open https://example.com
# Get interactive elements with refs
agent-browser snapshot -i --json
The snapshot returns elements like:
@e1 link "Home"
@e2 textbox "Email"
@e3 button "Submit"
2. Interact Using Refs
# Click by ref
agent-browser click @e2
# Type into focused element
agent-browser type @e2 "hello@example.com"
# Fill (clears first, then types)
agent-browser fill @e2 "hello@example.com"
# Press keyboard key
agent-browser press Enter
3. Verify State
# Check element visibility
agent-browser is visible @e3
# Get element text
agent-browser get text @e1
# Get page URL
agent-browser get url
Command Reference
Navigation
| Command | Description |
|---|---|
open <url> | Navigate to URL |
back | Go back in history |
forward | Go forward in history |
reload | Reload current page |
close | Close browser |
Interactions
| Command | Description |
|---|---|
click <ref> | Click element |
dblclick <ref> | Double-click element |
type <ref> <text> | Type text into element |
fill <ref> <text> | Clear and fill element |
press <key> | Press keyboard key (Enter, Tab, Control+a) |
hover <ref> | Hover over element |
focus <ref> | Focus element |
check <ref> | Check checkbox |
uncheck <ref> | Uncheck checkbox |
select <ref> <val> | Select dropdown option |
scroll <dir> [px] | Scroll (up/down/left/right) |
wait <ref|ms> | Wait for element or milliseconds |
Information
| Command | Description |
|---|---|
snapshot | Get accessibility tree with refs |
snapshot -i | Interactive elements only |
snapshot -c | Compact (remove empty elements) |
snapshot -d <n> | Limit tree depth |
get text <ref> | Get element text content |
get html <ref> | Get element HTML |
get value <ref> | Get input value |
get url | Get current page URL |
get title | Get page title |
screenshot [path] | Take screenshot |
screenshot --full | Full page screenshot |
pdf <path> | Save page as PDF |
State Checks
| Command | Description |
|---|---|
is visible <ref> | Check if element is visible |
is enabled <ref> | Check if element is enabled |
is checked <ref> | Check if checkbox is checked |
Find Elements
# Find by role and click
agent-browser find role button click --name Submit
# Find by text
agent-browser find text "Sign In" click
# Find by label
agent-browser find label "Email" fill "test@example.com"
# Find by placeholder
agent-browser find placeholder "Search..." type "query"
Session Management
Sessions provide isolated browser instances for parallel execution:
# Use named session
agent-browser --session login-flow open https://app.com
# Different session for another task
agent-browser --session checkout open https://app.com/cart
# List active sessions
agent-browser session list
# Environment variable (persistent across commands)
export AGENT_BROWSER_SESSION=my-session
agent-browser open https://example.com
JSON Output
Use --json for machine-readable output:
# Snapshot as JSON
agent-browser snapshot -i --json
# Get element info as JSON
agent-browser get text @e1 --json
# Parse in scripts
agent-browser get url --json | jq -r '.url'
Advanced Features
CDP Connection
Connect to an existing Chrome instance:
# Start Chrome with debugging
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
# Connect agent-browser
agent-browser --cdp 9222 snapshot
Video Recording
# Start recording
agent-browser record start recording.webm https://example.com
# Perform actions...
agent-browser click @e1
agent-browser fill @e2 "test"
# Stop and save
agent-browser record stop
Network Interception
# Block requests to URL pattern
agent-browser network route "**/analytics/*" --abort
# Mock API response
agent-browser network route "**/api/user" --body '{"name": "Test"}'
# View captured requests
agent-browser network requests
Custom Headers
# Set auth headers
agent-browser --headers '{"Authorization": "Bearer token123"}' open https://api.example.com
Browser Extensions
# Load extension
agent-browser --extension /path/to/extension open https://example.com
AI Agent Patterns
Form Automation
#!/bin/bash
# Login automation script
agent-browser open https://app.com/login
# Get form elements
agent-browser snapshot -i --json > elements.json
# Fill credentials
agent-browser fill @e2 "user@example.com"
agent-browser fill @e3 "password123"
# Submit
agent-browser click @e4
# Wait for navigation
agent-browser wait 2000
# Verify login
agent-browser get url --json | jq -r '.url'
Data Extraction
#!/bin/bash
# Scrape product data
agent-browser open https://shop.com/products
# Get page content
agent-browser snapshot --json > snapshot.json
# Extract specific element text
agent-browser get text '[data-testid="price"]' --json
# Screenshot for verification
agent-browser screenshot products.png
Multi-Page Workflow
#!/bin/bash
SESSION="checkout-flow"
# Step 1: Add to cart
agent-browser --session $SESSION open https://shop.com/product/123
agent-browser --session $SESSION click @add-to-cart-button
agent-browser --session $SESSION wait 1000
# Step 2: Go to checkout
agent-browser --session $SESSION open https://shop.com/checkout
agent-browser --session $SESSION snapshot -i
# Step 3: Fill shipping
agent-browser --session $SESSION fill @shipping-name "John Doe"
agent-browser --session $SESSION fill @shipping-address "123 Main St"
# Cleanup
agent-browser --session $SESSION close
Options Reference
| Option | Description |
|---|---|
--session <name> | Isolated browser session |
--json | JSON output format |
--headed | Show browser window (not headless) |
--cdp <port> | Connect via Chrome DevTools Protocol |
--headers <json> | HTTP headers for requests |
--proxy <url> | Proxy server |
--executable-path <path> | Custom browser executable |
--extension <path> | Load browser extension |
--full, -f | Full page screenshot |
--debug | Debug output |
Environment Variables
| Variable | Description |
|---|---|
AGENT_BROWSER_SESSION | Default session name |
AGENT_BROWSER_EXECUTABLE_PATH | Custom browser path |
AGENT_BROWSER_STREAM_PORT | WebSocket streaming port |
Troubleshooting
"Browser not found"
# Install Chromium
agent-browser install
Element ref not found
# Refresh snapshot after page changes
agent-browser snapshot -i
# Check if element is visible
agent-browser is visible @e1
Session conflicts
# List active sessions
agent-browser session list
# Use unique session names
agent-browser --session unique-$(date +%s) open https://example.com
Headless issues
# Run with visible browser for debugging
agent-browser --headed open https://example.com
Timeout waiting for element
# Explicit wait before interaction
agent-browser wait 2000
agent-browser wait @e1 # Wait for element to appear
Best Practices
- Always snapshot first: Get element refs before interacting
- Use
--jsonfor parsing: Machine-readable output for scripts - Session isolation: Use
--sessionfor parallel workflows - Explicit waits: Add
waitcommands after navigation/clicks - Interactive snapshots: Use
-iflag to reduce noise - Verify state: Check element visibility before clicking
- Screenshot on failure: Capture state when automation fails