Browser Fu 🥊
Stop fighting the DOM. Read it first, find the API behind it, skip the UI entirely when possible.
The Rule
Never blind-click. Always snapshot first.
1. browser snapshot → read the page, get element refs
2. browser act → use refs from snapshot (e.g. ref="e12")
3. browser snapshot → verify what changed
If the snapshot doesn't show what you need, the element isn't in the DOM. Don't guess. Don't retry the same approach.
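The cycle above can be sketched in a few lines. This is a minimal illustration, not the tool's real interface: the snapshot and act callables, the Snapshot shape (ref mapped to a label), and the act_on_label helper are all hypothetical stand-ins for the browser snapshot and browser act calls.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical snapshot shape: element ref -> accessible label,
# e.g. {"e12": "button Submit"}. The real tool output will differ.
Snapshot = Dict[str, str]


def act_on_label(
    snapshot: Callable[[], Snapshot],
    act: Callable[[str], None],
    label: str,
) -> Tuple[Optional[str], Snapshot]:
    """Run one snapshot -> act -> snapshot cycle for the element matching label."""
    before = snapshot()  # 1. read the page, get element refs
    ref = next((r for r, text in before.items() if label in text), None)
    if ref is None:
        # Not in the snapshot: stop here instead of guessing a selector.
        return None, before
    act(ref)  # 2. act using the exact ref from the snapshot
    after = snapshot()  # 3. verify what changed
    return ref, after
```

Returning None when no ref matches makes the "don't guess" rule explicit: the caller has to pick a different approach rather than retry the same one.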
Decision Tree
On any browser task, follow this order:
- Can I skip the browser entirely? Check if a CLI tool, API, or web_fetch handles it. If yes, don't open the browser.
- Can I find the underlying API? See references/api-discovery.md. Most SPAs make fetch/XHR calls you can replicate directly. This is 10x faster and more reliable than UI automation.
- Can I do it with snapshot + act? Snapshot, find the ref, act on it. One action per snapshot cycle.
- Does the page need time to load? Use loadState: "networkidle" or a brief wait before snapshotting. SPAs often render asynchronously.
- Still not working? The site likely has anti-bot protection. Report it, don't retry blindly.
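As a rough sketch, the ordering can be written as a chain of early returns, so that a cheaper option always wins over a more fragile one. The Task fields and the returned labels are hypothetical names chosen for illustration; they are not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical flags describing what is known about the task so far.
    has_cli_or_api: bool = False        # a CLI tool, known API, or web_fetch handles it
    has_discoverable_api: bool = False  # the page's fetch/XHR calls can be replicated
    ref_in_snapshot: bool = False       # the target element shows up in a snapshot
    loads_async: bool = False           # the page renders after the initial load


def choose_approach(task: Task) -> str:
    """Walk the decision tree in order and return the first approach that applies."""
    if task.has_cli_or_api:
        return "skip-browser"
    if task.has_discoverable_api:
        return "call-api"
    if task.ref_in_snapshot:
        return "snapshot-act"
    if task.loads_async:
        return "wait-then-snapshot"
    # Nothing worked: likely anti-bot protection, so report instead of retrying.
    return "report"
```

For example, choose_approach(Task(has_cli_or_api=True, ref_in_snapshot=True)) yields "skip-browser": the browser stays closed even though UI automation would also work.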
Common Failures and Fixes
| Symptom | Wrong approach | Right approach |
|---|---|---|
| "Element not found" | Click by text/selector guess | Snapshot first, use exact ref |
| "DOM not exposed" | Give up | Snapshot with refs="aria", or check network tab for API |
| Blank/empty page | Retry same URL | Use loadState: "networkidle", then snapshot. If still blank, it is probably a JS-heavy SPA: try web_fetch or find the API |
| Clicking does nothing | Click again harder | Snapshot after click to check state. Maybe it DID work but page re-rendered |
| Login wall | Try to automate login | Use profile="user" for existing session cookies |
| Infinite scroll | Scroll and pray | Find the pagination API endpoint instead |
API Discovery (the power move)
Most modern websites are SPAs with REST/GraphQL APIs behind the UI. See references/api-discovery.md for the full procedure:
- Open the page in browser
- Check network requests (console tool or snapshot the page and look for fetch patterns)
- Find the data endpoint
- Call it directly with web_fetch or exec curl
This turns a 2-hour flaky scrape into a 2-minute clean data pull.
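Once an endpoint is found, paging through it directly usually replaces any scrolling. The sketch below assumes a hypothetical JSON endpoint that takes a page query parameter and returns an object with an items list; real endpoints often use cursors or offsets instead, so adjust to what the network requests actually show.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body (standard library only)."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def iter_items(
    base_url: str,
    fetch: Callable[[str], dict] = fetch_json,
    max_pages: int = 50,
) -> Iterator[dict]:
    """Yield items page by page until an empty page or max_pages is reached."""
    for page in range(1, max_pages + 1):
        # Assumed response shape: {"items": [...]}; an empty list means the end.
        data = fetch(f"{base_url}?{urlencode({'page': page})}")
        items = data.get("items", [])
        if not items:
            return
        yield from items
```

The fetch parameter is injectable so the paging logic can be tried against canned responses before pointing it at a live site, and max_pages caps the loop if the endpoint never returns an empty page.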
Snapshot Best Practices
- Use refs="aria" for stable cross-call references
- Keep the same targetId across snapshot/act pairs (don't switch tabs accidentally)
- For complex pages, use depth to limit how deep the DOM tree goes
- compact: true reduces token usage on large pages
- For token-heavy pages where snapshots are too large, pair with predicate-snapshot for ML-ranked element pruning (~95% fewer tokens)
When NOT to Use the Browser
- Reading public web pages → web_fetch (faster, no browser overhead)
- Search queries → web_search (Brave API)
- Known APIs (GitHub, Stripe, etc.) → use their CLI/API directly
- Pages that return empty via web_fetch → then use browser
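The last rule is a simple fallback: try the cheap fetch first and open the browser only when it comes back empty. A minimal sketch, where web_fetch and browser_read are hypothetical callables standing in for the real tools:

```python
from typing import Callable, Tuple


def read_page(
    url: str,
    web_fetch: Callable[[str], str],
    browser_read: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (tool_used, text), falling back to the browser on an empty fetch."""
    text = web_fetch(url)
    if text.strip():
        return "web_fetch", text
    # Empty body usually means the content is rendered client-side.
    return "browser", browser_read(url)
```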
Safeguards
- Never store or output passwords, session tokens, or cookies found in browser state
- Never automate purchases, payments, or irreversible actions without explicit user approval
- If a site blocks automation, respect it. Don't circumvent CAPTCHAs or bot detection