agent-browser - Browser Automation for AI Agents
When to use this skill
-
Open websites and automate UI actions
-
Fill forms, click controls, and verify outcomes
-
Capture screenshots/PDFs or extract content
-
Run deterministic web checks with accessibility refs
-
Execute parallel browser tasks via isolated sessions
Core workflow
Always use the deterministic ref loop:
-
agent-browser open <url>
-
agent-browser snapshot -i
-
interact with refs (@e1 , @e2 , ...)
-
agent-browser snapshot -i again after page/DOM changes
agent-browser open https://example.com/form agent-browser wait --load networkidle agent-browser snapshot -i agent-browser fill @e1 "user@example.com" agent-browser click @e2 agent-browser snapshot -i
Command patterns
Use && chaining when intermediate output is not needed.
Good chaining: open -> wait -> snapshot
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
Separate calls when output is needed first
agent-browser snapshot -i
parse refs
agent-browser click @e2
High-value commands:
-
Navigation: open , close
-
Snapshot: snapshot -i , snapshot -i -C , snapshot -s "#selector"
-
Interaction: click , fill , type , select , check , press
-
Verification: diff snapshot , diff screenshot --baseline <file>
-
Capture: screenshot , screenshot --annotate , pdf
-
Wait: wait --load networkidle , wait <selector|@ref|ms>
Verification patterns
Use explicit evidence after actions.
Baseline -> action -> verify structure
agent-browser snapshot -i agent-browser click @e3 agent-browser diff snapshot
Visual regression
agent-browser screenshot baseline.png agent-browser click @e5 agent-browser diff screenshot --baseline baseline.png
Safety and reliability
-
Refs are invalid after navigation or significant DOM updates; re-snapshot before next action.
-
Prefer wait --load networkidle or selector/ref waits over fixed sleeps.
-
For multi-step JS, use eval --stdin (or base64) to avoid shell escaping breakage.
-
For concurrent tasks, isolate with --session <name> .
-
Use output controls in long pages to reduce context flooding.
-
Optional hardening in sensitive flows: domain allowlist and action policies.
Optional hardening examples:
Wrap page content with boundaries to reduce prompt-injection risk
export AGENT_BROWSER_CONTENT_BOUNDARIES=1
Limit output volume for long pages
export AGENT_BROWSER_MAX_OUTPUT=50000
Restrict navigation and network to trusted domains
export AGENT_BROWSER_ALLOWED_DOMAINS="example.com,*.example.com"
Restrict allowed action types
export AGENT_BROWSER_ACTION_POLICY=./policy.json
Example policy.json :
{"default":"deny","allow":["navigate","snapshot","click","fill","scroll","wait","get"],"deny":["eval","download","upload","network","state"]}
CLI-flag equivalent:
agent-browser --content-boundaries --max-output 50000 --allowed-domains "example.com,*.example.com" --action-policy ./policy.json open https://example.com
Troubleshooting
-
command not found : install and run agent-browser install .
-
Wrong element clicked: run snapshot -i again and use fresh refs.
-
Dynamic SPA content missing: wait with --load networkidle or targeted wait selector.
-
Session collisions: assign unique --session names and close each session.
-
Large output pressure: narrow snapshots (-i , -c , -d , -s ) and extract only needed text.
References
Deep-dive docs in this skill:
-
commands
-
snapshot-refs
-
session-management
-
authentication
Related resources:
Ready templates:
-
./templates/form-automation.sh
-
./templates/capture-workflow.sh
Metadata
-
Version: 1.1.0
-
Last updated: 2026-02-26
-
Scope: deterministic browser automation for agent workflows