Browser Automation with agent-browser

Core Workflow

Every browser automation follows this pattern:

Navigate: agent-browser open <url>
Snapshot: agent-browser snapshot -i (get element refs like @e1, @e2)
Interact: Use refs to click, fill, select
Re-snapshot: After navigation or DOM changes, get fresh refs

agent-browser open https://example.com/form
agent-browser snapshot -i

Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i    # Check result

Command Chaining

Commands can be chained with && in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.

Chain open + wait + snapshot in one call

agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i

Chain multiple interactions

agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3

Navigate and capture

agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png

When to chain: Use && when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs).
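The "run separately" case can be sketched as a small helper that reads captured snapshot text and picks out a ref before the next command runs. This is only an illustration: find_ref is a hypothetical function, not part of agent-browser, and it assumes the snapshot prints one element per line with the ref on that line, as in the example output shown earlier. The real snapshot format may differ.

```shell
# find_ref is a hypothetical helper, not part of agent-browser.
# Assumption: one element per snapshot line, each carrying a ref like @e3.
find_ref() {
  # $1 = snapshot text, $2 = literal label to search for
  printf '%s\n' "$1" | grep -F -- "$2" | grep -o '@e[0-9][0-9]*' | head -n 1
}

# Sample text modeled on the example output shown earlier.
sample='@e1 [input type="email"]
@e2 [input type="password"]
@e3 [button] "Submit"'

find_ref "$sample" "Submit"

# In a real run the text would come from the CLI instead of a sample:
#   snapshot=$(agent-browser snapshot -i)
#   agent-browser click "$(find_ref "$snapshot" "Submit")"
```

Because the ref is only known after the snapshot output has been read, the snapshot and the click cannot be joined with && in one chain.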

Essential Commands

Navigation

agent-browser open <url>    # Navigate (aliases: goto, navigate)
agent-browser close         # Close browser

Snapshot

agent-browser snapshot -i                # Interactive elements with refs (recommended)
agent-browser snapshot -i -C             # Include cursor-interactive elements (divs with onclick, cursor:pointer)
agent-browser snapshot -s "#selector"    # Scope to CSS selector

Interaction (use @refs from snapshot)

agent-browser click @e1                                   # Click element
agent-browser click @e1 --new-tab                         # Click and open in new tab
agent-browser fill @e2 "text"                             # Clear and type text
agent-browser type @e2 "text"                             # Type without clearing
agent-browser select @e1 "option"                         # Select dropdown option
agent-browser check @e1                                   # Check checkbox
agent-browser press Enter                                 # Press key
agent-browser keyboard type "text"                        # Type at current focus (no selector)
agent-browser keyboard inserttext "text"                  # Insert without key events
agent-browser scroll down 500                             # Scroll page
agent-browser scroll down 500 --selector "div.content"    # Scroll within a specific container

Get information

agent-browser get text @e1    # Get element text
agent-browser get url         # Get current URL
agent-browser get title       # Get page title

Wait

agent-browser wait @e1                   # Wait for element
agent-browser wait --load networkidle    # Wait for network idle
agent-browser wait --url "**/page"       # Wait for URL pattern
agent-browser wait 2000                  # Wait milliseconds

Downloads

agent-browser download @e1 ./file.pdf                   # Click element to trigger download
agent-browser wait --download ./output.zip              # Wait for any download to complete
agent-browser --download-path ./downloads open <url>    # Set default download directory

Capture

agent-browser screenshot               # Screenshot to temp dir
agent-browser screenshot --full        # Full page screenshot
agent-browser screenshot --annotate    # Annotated screenshot with numbered element labels
agent-browser pdf output.pdf           # Save as PDF

Diff (compare page states)

agent-browser diff snapshot                                      # Compare current vs last snapshot
agent-browser diff snapshot --baseline before.txt                # Compare current vs saved file
agent-browser diff screenshot --baseline before.png              # Visual pixel diff
agent-browser diff url <url1> <url2>                             # Compare two pages
agent-browser diff url <url1> <url2> --wait-until networkidle    # Custom wait strategy
agent-browser diff url <url1> <url2> --selector "#main"          # Scope to element

Common Patterns

Form Submission

agent-browser open https://example.com/signup
agent-browser snapshot -i
agent-browser fill @e1 "Jane Doe"
agent-browser fill @e2 "jane@example.com"
agent-browser select @e3 "California"
agent-browser check @e4
agent-browser click @e5
agent-browser wait --load networkidle

Authentication with State Persistence

Login once and save state

agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "$USERNAME"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json

Reuse in future sessions

agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
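The two halves of this pattern can be combined into one function that loads saved state when the file exists and logs in otherwise. The following is a minimal sketch under stated assumptions: ensure_session, AB, and STATE_FILE are hypothetical names introduced here, and the refs @e1 to @e3 are taken from the login example above, so they would need to be confirmed from a real snapshot.

```shell
# AB and STATE_FILE are hypothetical variables for this sketch, not CLI features.
# Setting AB=echo gives a dry run that prints the arguments instead of running them.
AB="${AB:-agent-browser}"
STATE_FILE="${STATE_FILE:-auth.json}"

# ensure_session is a hypothetical helper: reuse saved state or log in once.
ensure_session() {
  if [ -f "$STATE_FILE" ]; then
    # Saved state exists: load it and skip the login form.
    "$AB" state load "$STATE_FILE"
  else
    # No saved state: log in, then save it for future sessions.
    # Assumption: @e1, @e2, @e3 are the username, password, and submit refs.
    "$AB" open https://app.example.com/login &&
      "$AB" snapshot -i &&
      "$AB" fill @e1 "$USERNAME" &&
      "$AB" fill @e2 "$PASSWORD" &&
      "$AB" click @e3 &&
      "$AB" wait --url "**/dashboard" &&
      "$AB" state save "$STATE_FILE"
  fi
}
```

A caller would run ensure_session and then open https://app.example.com/dashboard as shown above.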

Data Extraction

agent-browser open https://example.com/products
agent-browser snapshot -i
agent-browser get text @e5                 # Get specific element text
agent-browser get text body > page.txt     # Get all page text

JSON output for parsing

agent-browser snapshot -i --json
agent-browser get text @e1 --json

Ref Lifecycle (Important)

Refs (@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after:

Clicking links or buttons that navigate
Form submissions
Dynamic content loading (dropdowns, modals)

agent-browser click @e5      # Navigates to new page
agent-browser snapshot -i    # MUST re-snapshot
agent-browser click @e1      # Use new refs
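One way to make the re-snapshot hard to forget is to wrap the click in a helper that always waits and snapshots afterwards. This is a sketch, not a built-in feature: click_and_resnapshot and AB are hypothetical names, and waiting for networkidle is an assumption about what "page has settled" means for a given site.

```shell
# AB is a hypothetical variable for this sketch; AB=echo gives a dry run.
AB="${AB:-agent-browser}"

# click_and_resnapshot is a hypothetical helper, not an agent-browser command.
# It clicks a ref, waits for the page to settle, then prints fresh refs.
click_and_resnapshot() {
  # $1 = ref to click, e.g. @e5
  "$AB" click "$1" &&
    "$AB" wait --load networkidle &&
    "$AB" snapshot -i
}
```

After calling click_and_resnapshot @e5, only the refs printed by that call should be used for the next interaction.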

Annotated Screenshots (Vision Mode)

Use --annotate to take a screenshot with numbered labels overlaid on interactive elements.

agent-browser screenshot --annotate

Output includes the image path and a legend:

[1] @e1 button "Submit"
[2] @e2 link "Home"
[3] @e3 textbox "Email"

agent-browser click @e2 # Click using ref from annotated screenshot

JavaScript Evaluation

Simple expressions

agent-browser eval 'document.title'

Complex JS: use --stdin with heredoc (RECOMMENDED)

agent-browser eval --stdin <<'EVALEOF'
JSON.stringify(
  Array.from(document.querySelectorAll("img"))
    .filter(i => !i.alt)
    .map(i => ({ src: i.src.split("/").pop(), width: i.width }))
)
EVALEOF

Session Management and Cleanup

Always close your browser session when done:

agent-browser close                     # Close default session
agent-browser --session agent1 close    # Close specific session
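In scripts, the close step is easy to skip when an earlier command fails. A minimal sketch of one way to guard against that, assuming the default session: with_browser and AB are hypothetical names, not part of agent-browser.

```shell
# AB is a hypothetical variable for this sketch; AB=echo gives a dry run.
AB="${AB:-agent-browser}"

# with_browser is a hypothetical helper: run any command, then always close
# the default session, and return the exit status of the command that was run.
with_browser() {
  ab_rc=0
  "$@" || ab_rc=$?
  "$AB" close
  return "$ab_rc"
}
```

For example, with_browser agent-browser open https://example.com would open the page and then close the session whether or not the open succeeded.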
