Agent Browser Automation
Guide to using the agent-browser CLI to automate web browsing tasks in Claude Code.
Quick Start
Installation Check
Before using agent-browser, verify installation:
# Check if installed
agent-browser --version
# If not installed, install globally
npm install -g agent-browser
agent-browser install # Download Chromium
Windows Note: If you encounter /bin/sh errors on Windows, use:
npx agent-browser <command>
See troubleshooting.md for platform-specific issues.
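The installation check and the npx fallback above can be folded into one small helper. This is a minimal sketch, not part of agent-browser: `resolve_agent_browser` is a hypothetical function name, and it only tests whether the binary is on PATH.

```shell
# Hypothetical helper (not shipped with agent-browser): print the command
# prefix to use. Falls back to npx when the global binary is not on PATH.
resolve_agent_browser() {
  if command -v agent-browser >/dev/null 2>&1; then
    printf '%s\n' "agent-browser"
  else
    printf '%s\n' "npx agent-browser"
  fi
}

# Usage (left unquoted on purpose so "npx agent-browser" splits into two words):
#   AB=$(resolve_agent_browser)
#   $AB --version
```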
Core Workflow
agent-browser uses a refs-based system where page elements get unique identifiers (like @e1, @e2) that you can use for interactions.
Basic Pattern
- Open a page
- Get snapshot with refs
- Interact using refs
- Repeat as needed
# 1. Navigate to page
agent-browser open example.com
# 2. Get page snapshot with interactive elements
agent-browser snapshot -i --json
# 3. Use refs from snapshot to interact
agent-browser click @e5
agent-browser fill @e3 "search query"
# 4. Take screenshot or get results
agent-browser screenshot result.png
Essential Commands
Navigation
agent-browser open <url> # Open URL
agent-browser goto <url> # Navigate to URL
agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload page
Getting Page Information
agent-browser snapshot # Get accessibility tree
agent-browser snapshot -i # Interactive elements only
agent-browser snapshot -i --json # JSON format (best for AI)
agent-browser screenshot <file> # Take screenshot
agent-browser get text @e1 # Get element text
agent-browser get html # Get page HTML
agent-browser get url # Get current URL
Interacting with Elements
agent-browser click @e2 # Click element by ref
agent-browser dblclick @e2 # Double click
agent-browser fill @e3 "text" # Fill input field
agent-browser type @e3 "text" # Type text (slower, more realistic)
agent-browser press Enter # Press keyboard key
agent-browser check @e4 # Check checkbox
agent-browser select @e5 "option" # Select dropdown option
agent-browser upload @e6 file.pdf # Upload file
Semantic Locators (Find Commands)
When you don't have refs, use semantic locators:
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@example.com"
agent-browser find placeholder "Search..." type "query"
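Refs and semantic locators can be combined so that a stale ref does not stop the task. The sketch below uses only the `click` and `find text` commands listed above. `click_ref_or_text` is a hypothetical name, and the fallback assumes agent-browser exits non-zero when a ref cannot be resolved, which this guide does not confirm.

```shell
# Hypothetical helper: click by ref, falling back to a text locator.
# Assumption: agent-browser returns a non-zero exit status on a failed click.
click_ref_or_text() {
  # $1: ref such as @e5, $2: visible text to fall back on
  agent-browser click "$1" || agent-browser find text "$2" click
}

# Usage:
#   click_ref_or_text @e5 "Sign In"
```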
Waiting
agent-browser wait @e1 # Wait for element
agent-browser wait --text "Done" # Wait for text
agent-browser wait --url /success # Wait for URL change
agent-browser wait --load # Wait for page load
Session Management
Use sessions to run multiple isolated browser instances:
# Start different sessions
agent-browser --session task1 open site-a.com
agent-browser --session task2 open site-b.com
# Each session has separate:
# - Cookies and storage
# - Authentication state
# - Navigation history
# List active sessions
agent-browser session list
# Close specific session
agent-browser --session task1 close
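Because sessions are isolated, independent tasks can run side by side as shell background jobs. The following is a sketch under assumptions: `snapshot_in_session` is a hypothetical helper, and it assumes `--session` can prefix `snapshot` the same way it prefixes `open` and `close` above.

```shell
# Hypothetical helper: open a URL in its own session, save a snapshot,
# then close the session even if an earlier step failed.
snapshot_in_session() {
  # $1: session name, $2: URL, $3: output file
  status=0
  {
    agent-browser --session "$1" open "$2" &&
      agent-browser --session "$1" snapshot -i --json > "$3"
  } || status=$?
  agent-browser --session "$1" close
  return "$status"
}

# Usage: run two sessions in parallel. The final `wait` is the shell builtin
# that waits for background jobs, not the agent-browser wait command.
#   snapshot_in_session task1 site-a.com a.json &
#   snapshot_in_session task2 site-b.com b.json &
#   wait
```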
AI Agent Workflow
For AI-driven automation, follow this pattern:
- Navigate and snapshot
agent-browser open https://example.com
agent-browser snapshot -i --json > page.json
- Parse JSON to understand page structure
  - Identify interactive elements and their refs
  - Understand page layout and available actions
- Execute actions using refs
agent-browser click @e2
agent-browser fill @e5 "input data"
- Get new snapshot after page changes
agent-browser snapshot -i --json > updated.json
- Repeat until task complete
See workflows.md for detailed AI workflow patterns.
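The act-then-resnapshot loop above can be wrapped so that every action is followed by a fresh snapshot. This is a minimal sketch: `act_and_snapshot` is a hypothetical name, and it simply chains commands already shown in this guide.

```shell
# Hypothetical helper: run one agent-browser action, then refresh the snapshot
# so the next step reads refs that match the current page.
act_and_snapshot() {
  # $1: snapshot output file; remaining args: the action, e.g. click @e2
  out=$1
  shift
  agent-browser "$@" && agent-browser snapshot -i --json > "$out"
}

# Usage:
#   act_and_snapshot updated.json click @e2
#   act_and_snapshot updated.json fill @e5 "input data"
```

The snapshot is skipped when the action fails, so the previous snapshot file stays in place for inspection.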
Advanced Features
Network Interception
# Block requests
agent-browser route --block "*.ads.com/*"
# Mock responses
agent-browser route --mock "/api/data" response.json
State Persistence
# Save authentication state
agent-browser save-state auth.json
# Load state in new session
agent-browser load-state auth.json
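Saved state is most useful when a script logs in only if no state file exists yet. The sketch below assumes the `save-state` and `load-state` commands behave as listed above. `ensure_logged_in` and the login callback are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: reuse saved auth state when the file exists;
# otherwise run the caller-supplied login function and save the result.
ensure_logged_in() {
  # $1: state file, $2: name of a shell function that performs the login
  if [ -f "$1" ]; then
    agent-browser load-state "$1"
  else
    "$2" && agent-browser save-state "$1"
  fi
}

# Usage (do_login is a function you define with your own login steps):
#   ensure_logged_in auth.json do_login
```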
Debugging
# Enable console logs
agent-browser --console open example.com
# Highlight elements
agent-browser highlight @e3
# Enable tracing
agent-browser --trace trace.zip open example.com
Best Practices
- Use -i --json for snapshots - Reduces noise, easier for AI to parse
- Prefer refs over selectors - More reliable than CSS/XPath
- Use sessions for parallel tasks - Isolate different workflows
- Wait for elements - Use wait commands to handle dynamic content
- Take screenshots - Visual confirmation of state
- Use semantic locators as fallback - When refs aren't available
Common Patterns
Form Filling
agent-browser open https://form.example.com
agent-browser snapshot -i --json
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser click @e3 # Submit button
agent-browser wait --url /success
Data Extraction
agent-browser open https://data.example.com
agent-browser snapshot -i --json > structure.json
agent-browser get text @e5 > data.txt
agent-browser screenshot evidence.png
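When several elements hold the data of interest, the `get text` step can be looped over a list of refs. This is a sketch only: `extract_refs` is a hypothetical helper, and the refs passed to it must come from a snapshot of the current page.

```shell
# Hypothetical helper: write the text of each given ref to one file,
# one command output after another, stopping at the first failure.
extract_refs() {
  # $1: output file; remaining args: refs such as @e5 @e6
  out=$1
  shift
  : > "$out"
  for ref in "$@"; do
    agent-browser get text "$ref" >> "$out" || return 1
  done
}

# Usage:
#   extract_refs data.txt @e5 @e6 @e7
```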
Multi-Step Workflow
# Login
agent-browser open https://app.example.com/login
agent-browser snapshot -i --json # Get refs for the login form first
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
# Navigate to target
agent-browser wait --url /dashboard
agent-browser goto https://app.example.com/data
# Extract data
agent-browser snapshot -i --json > results.json
Platform-Specific Notes
Windows
- Use npx agent-browser if global command fails
- PowerShell may require quotes around URLs with special characters
- See troubleshooting.md for /bin/sh errors
Linux
- Install with dependencies: agent-browser install --with-deps
- May need to install Playwright system dependencies manually
macOS
- Works out of the box after npm install -g agent-browser
Reference Documentation
- commands.md - Complete command reference
- workflows.md - AI workflow patterns and examples
- troubleshooting.md - Common issues and solutions
Architecture
agent-browser is built on Playwright with:
- Fast Rust CLI implementation (with Node.js fallback)
- Accessibility tree parsing for AI-friendly page representation
- Reference system (@e1, @e2) for stable element targeting
- Chrome DevTools Protocol (CDP) for persistent sessions
When to Use agent-browser
✅ Use agent-browser when:
- Automating web browsing tasks
- Scraping data from websites
- Filling and submitting forms
- Testing web applications
- Interacting with dynamic web pages
- Targeting elements in an AI-friendly way
❌ Don't use agent-browser when:
- Simple HTTP requests suffice (use curl/fetch instead)
- API endpoints are available (use API directly)
- The task doesn't require browser rendering