Browser Automation with agent-browser
Core Workflow
Every browser automation follows this pattern:
-
Navigate: agent-browser open <url>
-
Snapshot: agent-browser snapshot -i (get element refs like @e1 , @e2 )
-
Interact: Use refs to click, fill, select
-
Re-snapshot: After navigation or DOM changes, get fresh refs
agent-browser open https://example.com/form agent-browser snapshot -i
Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
agent-browser fill @e1 "user@example.com" agent-browser fill @e2 "password123" agent-browser click @e3 agent-browser wait --load networkidle agent-browser snapshot -i # Check result
Command Chaining
Commands can be chained with && in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.
Chain open + wait + snapshot in one call
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
Chain multiple interactions
agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3
Navigate and capture
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
When to chain: Use && when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs).
Essential Commands
Navigation
agent-browser open <url> # Navigate (aliases: goto, navigate) agent-browser close # Close browser
Snapshot
agent-browser snapshot -i # Interactive elements with refs (recommended) agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer) agent-browser snapshot -s "#selector" # Scope to CSS selector
Interaction (use @refs from snapshot)
agent-browser click @e1 # Click element agent-browser click @e1 --new-tab # Click and open in new tab agent-browser fill @e2 "text" # Clear and type text agent-browser type @e2 "text" # Type without clearing agent-browser select @e1 "option" # Select dropdown option agent-browser check @e1 # Check checkbox agent-browser press Enter # Press key agent-browser keyboard type "text" # Type at current focus (no selector) agent-browser keyboard inserttext "text" # Insert without key events agent-browser scroll down 500 # Scroll page agent-browser scroll down 500 --selector "div.content" # Scroll within a specific container
Get information
agent-browser get text @e1 # Get element text agent-browser get url # Get current URL agent-browser get title # Get page title
Wait
agent-browser wait @e1 # Wait for element agent-browser wait --load networkidle # Wait for network idle agent-browser wait --url "**/page" # Wait for URL pattern agent-browser wait 2000 # Wait milliseconds
Downloads
agent-browser download @e1 ./file.pdf # Click element to trigger download agent-browser wait --download ./output.zip # Wait for any download to complete agent-browser --download-path ./downloads open <url> # Set default download directory
Capture
agent-browser screenshot # Screenshot to temp dir agent-browser screenshot --full # Full page screenshot agent-browser screenshot --annotate # Annotated screenshot with numbered element labels agent-browser pdf output.pdf # Save as PDF
Diff (compare page states)
agent-browser diff snapshot # Compare current vs last snapshot agent-browser diff snapshot --baseline before.txt # Compare current vs saved file agent-browser diff screenshot --baseline before.png # Visual pixel diff agent-browser diff url <url1> <url2> # Compare two pages agent-browser diff url <url1> <url2> --wait-until networkidle # Custom wait strategy agent-browser diff url <url1> <url2> --selector "#main" # Scope to element
Common Patterns
Form Submission
agent-browser open https://example.com/signup agent-browser snapshot -i agent-browser fill @e1 "Jane Doe" agent-browser fill @e2 "jane@example.com" agent-browser select @e3 "California" agent-browser check @e4 agent-browser click @e5 agent-browser wait --load networkidle
Authentication with State Persistence
Login once and save state
agent-browser open https://app.example.com/login agent-browser snapshot -i agent-browser fill @e1 "$USERNAME" agent-browser fill @e2 "$PASSWORD" agent-browser click @e3 agent-browser wait --url "**/dashboard" agent-browser state save auth.json
Reuse in future sessions
agent-browser state load auth.json agent-browser open https://app.example.com/dashboard
Data Extraction
agent-browser open https://example.com/products agent-browser snapshot -i agent-browser get text @e5 # Get specific element text agent-browser get text body > page.txt # Get all page text
JSON output for parsing
agent-browser snapshot -i --json agent-browser get text @e1 --json
Ref Lifecycle (Important)
Refs (@e1 , @e2 , etc.) are invalidated when the page changes. Always re-snapshot after:
-
Clicking links or buttons that navigate
-
Form submissions
-
Dynamic content loading (dropdowns, modals)
agent-browser click @e5 # Navigates to new page agent-browser snapshot -i # MUST re-snapshot agent-browser click @e1 # Use new refs
Annotated Screenshots (Vision Mode)
Use --annotate to take a screenshot with numbered labels overlaid on interactive elements.
agent-browser screenshot --annotate
Output includes the image path and a legend:
[1] @e1 button "Submit"
[2] @e2 link "Home"
[3] @e3 textbox "Email"
agent-browser click @e2 # Click using ref from annotated screenshot
JavaScript Evaluation
Simple expressions
agent-browser eval 'document.title'
Complex JS: use --stdin with heredoc (RECOMMENDED)
agent-browser eval --stdin <<'EVALEOF' JSON.stringify( Array.from(document.querySelectorAll("img")) .filter(i => !i.alt) .map(i => ({ src: i.src.split("/").pop(), width: i.width })) ) EVALEOF
Session Management and Cleanup
Always close your browser session when done:
agent-browser close # Close default session agent-browser --session agent1 close # Close specific session