Browser Automation Skill
Web browser automation using agent-browser with AI-optimized snapshots. Reduces context by 93% using element refs (@e1, @e2) instead of full DOM.
Core Workflow
1. Navigate to page
agent-browser open <url>
2. Get accessibility tree with element refs
agent-browser snapshot -i # -i = interactive elements only
3. Interact using refs from snapshot
agent-browser click @e2 agent-browser fill @e3 "text"
4. Re-snapshot after page changes
agent-browser snapshot -i
Quick Reference
Navigation
Command Description
open <url>
Navigate to URL
back
Go back
forward
Go forward
reload
Reload page
close
Close browser
Snapshots (AI-Optimized)
Command Description
snapshot
Full accessibility tree
snapshot -i
Interactive elements only (buttons, links, inputs)
snapshot -c
Compact (remove empty elements)
snapshot -d 3
Limit depth to 3 levels
screenshot [path]
Capture screenshot (base64 if no path)
Interaction
Command Description
click <sel>
Click element
fill <sel> <text>
Clear and fill input
type <sel> <text>
Type with key events
press <key>
Press key (Enter, Tab, etc.)
hover <sel>
Hover element
select <sel> <val>
Select dropdown option
check/uncheck <sel>
Toggle checkbox
scroll <dir> [px]
Scroll page
Get Info
Command Description
get text <sel>
Get text content
get html <sel>
Get innerHTML
get value <sel>
Get input value
get attr <sel> <attr>
Get attribute
get title
Get page title
get url
Get current URL
Wait
Command Description
wait <selector>
Wait for element
wait <ms>
Wait milliseconds
wait --text "text"
Wait for text
wait --url "pattern"
Wait for URL
wait --load networkidle
Wait for load state
Sessions
Command Description
--session <name>
Use isolated session
session list
List active sessions
Selectors
Element Refs (Recommended)
Get refs from snapshot
agent-browser snapshot -i
Output: button "Submit" [ref=e2]
Use ref to interact
agent-browser click @e2
CSS Selectors
agent-browser click "#submit" agent-browser fill ".email-input" "test@test.com"
Semantic Locators
agent-browser find role button click --name "Submit" agent-browser find label "Email" fill "test@test.com" agent-browser find testid "login-btn" click
Examples
Login Flow
agent-browser open https://example.com/login agent-browser snapshot -i agent-browser fill @e2 "user@example.com" agent-browser fill @e3 "password123" agent-browser click @e4 agent-browser wait --url "**/dashboard"
Form Submission
agent-browser open https://example.com/contact agent-browser snapshot -i agent-browser fill @e1 "John Doe" agent-browser fill @e2 "john@example.com" agent-browser fill @e3 "Hello, this is my message" agent-browser click @e4 agent-browser wait --text "Thank you"
Data Extraction
agent-browser open https://example.com/products agent-browser snapshot -i
Iterate through product refs
agent-browser get text @e1 # Product name agent-browser get text @e2 # Price agent-browser get attr @e3 href # Link
Multi-Session (Swarm)
Session 1: Navigator
agent-browser --session nav open https://example.com agent-browser --session nav state save auth.json
Session 2: Scraper (uses same auth)
agent-browser --session scrape state load auth.json agent-browser --session scrape open https://example.com/data agent-browser --session scrape snapshot -i
Integration with Claude Flow
MCP Tools
All browser operations are available as MCP tools with browser/ prefix:
-
browser/open
-
browser/snapshot
-
browser/click
-
browser/fill
-
browser/screenshot
-
etc.
Memory Integration
Store successful patterns
npx @claude-flow/cli memory store --namespace browser-patterns --key "login-flow" --value "snapshot->fill->click->wait"
Retrieve before similar task
npx @claude-flow/cli memory search --query "login automation"
Hooks
Pre-browse hook (get context)
npx @claude-flow/cli hooks pre-edit --file "browser-task.ts"
Post-browse hook (record success)
npx @claude-flow/cli hooks post-task --task-id "browse-1" --success true
Tips
-
Always use snapshots - They're optimized for AI with refs
-
Prefer -i flag - Gets only interactive elements, smaller output
-
Use refs, not selectors - More reliable, deterministic
-
Re-snapshot after navigation - Page state changes
-
Use sessions for parallel work - Each session is isolated