agent-browser

Headless browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "agent-browser" with this command: npx skills add itechmeat/llm-code/itechmeat-llm-code-agent-browser

Works with: Claude Code, Cursor, GitHub Copilot, OpenAI Codex, Google Gemini, opencode.

Quick Navigation

Topic           Reference
Installation    installation.md
Commands        commands.md
Refs            refs.md
Advanced        advanced.md

When to Use

  • Automating browser tasks in AI agent workflows

  • Web scraping with AI-friendly output

  • Testing web applications with LLM agents

  • Managing multiple browser sessions with isolated auth

Core Concepts

Refs (Element References)

The snapshot command returns an accessibility tree in which each element has a unique ref such as @e1 or @e2. Refs are:

  • Deterministic - ref points to exact element from snapshot

  • Fast - no DOM re-query needed

  • AI-friendly - LLMs can reliably parse and use refs
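As a rough illustration of how an agent might pull refs out of a snapshot, the sketch below extracts every ref from snapshot text. The `[ref=eN]` line format in the sample and the helper name `extract_refs` are assumptions made for this example; check the actual output of `agent-browser snapshot` before relying on the pattern.

```shell
# Sample snapshot text. The "[ref=eN]" line format is an assumption for
# illustration, not taken from this document.
sample_snapshot='- heading "Example Domain" [ref=e1] [level=1]
- button "Submit" [ref=e2]
- textbox "Email" [ref=e3]'

# extract_refs (hypothetical helper): read snapshot text on stdin and print
# one @eN ref per line, in the order the elements appear.
extract_refs() {
  grep -oE '\[ref=e[0-9]+\]' | sed -E 's/^\[ref=(e[0-9]+)\]$/@\1/'
}

# Prints @e1, @e2 and @e3, one per line.
printf '%s\n' "$sample_snapshot" | extract_refs
```

Each printed ref can then be passed directly to commands such as click, fill, or get text.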

Architecture

Client-daemon architecture:

  • Rust CLI - parses commands, communicates with daemon

  • Daemon - runs the browser automation engine:

      • Default: Node.js daemon (Playwright)

      • Native (v0.16.0+): native Rust daemon (direct Chrome DevTools Protocol), enabled via --native, AGENT_BROWSER_NATIVE=1, or "native": true

      • Lightpanda (v0.17.0+): Lightpanda engine, selected via --engine lightpanda or AGENT_BROWSER_ENGINE=lightpanda (implies native mode)

Daemon starts automatically and persists between commands. Startup errors are surfaced directly (v0.17.0+).

v0.8.6 improves daemon reliability by cleaning stale socket/PID files and retrying transient connection errors.
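The engine options above can be summarized in a small dry-run sketch that prints, without running, the command for each mode. The helper names and the mode labels (default, native, lightpanda) are made up for this example, and placing the flag before the subcommand is an assumption based on how other global flags appear in this document.

```shell
# engine_flags (hypothetical helper): map a mode name to the CLI flags
# listed above. The mode names are labels made up for this sketch.
engine_flags() {
  case "$1" in
    default)    : ;;                                   # Node.js daemon, no flag
    native)     printf '%s\n' '--native' ;;            # or AGENT_BROWSER_NATIVE=1
    lightpanda) printf '%s\n' '--engine lightpanda' ;; # or AGENT_BROWSER_ENGINE=lightpanda
    *)          echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# build_open_cmd (hypothetical helper): print, but do not run, the command
# that would open a URL in the given mode.
build_open_cmd() {
  flags=$(engine_flags "$1") || return 1
  if [ -n "$flags" ]; then
    echo "agent-browser $flags open $2"
  else
    echo "agent-browser open $2"
  fi
}

# Print one command line per mode; no browser is launched.
for mode in default native lightpanda; do
  build_open_cmd "$mode" example.com
done
```

Setting the matching environment variable once in the shell profile is likely more convenient than repeating the flag when every command should use the same engine.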

Quick Example

Navigate and get snapshot

agent-browser open example.com
agent-browser snapshot                      # Get accessibility tree with refs
agent-browser click @e2                     # Click by ref from snapshot
agent-browser fill @e3 "test@example.com"   # Fill input by ref
agent-browser get text @e1                  # Get text by ref
agent-browser screenshot page.png           # Save screenshot
agent-browser close

AI Workflow Pattern

Optimal workflow for AI agents:

1. Navigate and get snapshot

agent-browser open example.com
agent-browser snapshot -i --json   # AI parses tree and refs

2. AI identifies target refs from snapshot

3. Execute actions using refs

agent-browser click @e2
agent-browser fill @e3 "input text"

4. Get new snapshot if page changed

agent-browser snapshot -i --json
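Steps 3 and 4 can be folded into one helper so that every action is followed by a fresh snapshot. This is a minimal sketch, not part of agent-browser: the helper name `act_then_resnapshot` and the `AB` variable are assumptions introduced here, and `AB` exists only so the sketch can be dry-run with a stand-in command.

```shell
# AB is the command to run. It defaults to the real CLI; overriding it
# (for example AB=echo) gives a dry run. AB is an assumption of this
# sketch, not a variable that agent-browser itself reads.
AB="${AB:-agent-browser}"

# act_then_resnapshot (hypothetical helper): run one ref-based action,
# then take a fresh snapshot so later steps do not reuse stale refs.
# If the action fails, no snapshot is taken and the helper returns 1.
act_then_resnapshot() {
  $AB "$@" || return 1
  $AB snapshot -i --json
}
```

For example, `act_then_resnapshot click @e2` runs the click and then prints the new snapshot for the agent to parse. With `AB=echo` it only prints the two argument lists it would have run.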

Headed Mode (Debugging)

agent-browser open example.com --headed

Local File Access (v0.9.1)

agent-browser open file:///path/to/doc.pdf --allow-file-access

Cursor-Aware Snapshots (v0.9.1)

agent-browser snapshot -C
agent-browser snapshot --cursor

Session Persistence (v0.10.0)

Automatically save and restore cookies/localStorage across restarts with a named session:

agent-browser --session-name myapp open myapp.com
agent-browser --session-name myapp open myapp.com

State management commands:

agent-browser state list
agent-browser state show myapp
agent-browser state rename myapp myapp-prod
agent-browser state clear myapp-prod
agent-browser state cleanup

Release Updates (v0.12.0–v0.14.0)

  • Added keyboard commands for raw keyboard input at the currently focused element (no selector needed).

  • Added persistent color scheme selection via --color-scheme and AGENT_BROWSER_COLOR_SCHEME.

  • Improved IPC reliability (EAGAIN/backpressure-aware writes) and lowered default Playwright timeout to 25s (configurable via AGENT_BROWSER_DEFAULT_TIMEOUT).

  • Improved CDP reconnection and fixed state load when no browser is running.

  • Reduced --annotate warning noise when the flag isn’t explicitly passed.

New Tab Clicks (v0.10.0)

agent-browser click @e12 --new-tab

Mobile Safari (iOS)

agent-browser -p ios device list
agent-browser -p ios open https://example.com --device "iPhone 15"
agent-browser tap 200 400
agent-browser swipe 200 600 200 200 500

JSON Output

Use --json for machine-readable output:

agent-browser snapshot --json
agent-browser get text @e1 --json
agent-browser is visible @e2 --json

Critical Prohibitions

  • Do not use CSS/XPath selectors when refs are available (use @e1, @e2, etc.)

  • Do not forget to close sessions when done

  • Do not assume element positions without taking a fresh snapshot

  • Do not use old refs after page navigation or content changes (re-snapshot)
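One way to avoid leaving a session open is to wrap the work in a helper that always closes the browser, even when a step fails. The sketch below is illustrative only: `with_session` and the `AB` variable are hypothetical names introduced here, and `AB` exists so the helper can be dry-run with a stand-in command.

```shell
# AB is the command to run; override it (for example AB=echo) for a dry run.
# AB is an assumption of this sketch, not a variable agent-browser reads.
AB="${AB:-agent-browser}"

# with_session (hypothetical helper): open a URL, run the given command,
# and always close the browser afterwards, even if the command fails.
# Returns the exit status of the given command.
with_session() {
  url=$1
  shift
  $AB open "$url" || return 1
  status=0
  "$@" || status=$?
  $AB close
  return "$status"
}
```

A call such as `with_session example.com $AB screenshot page.png` opens the page, saves the screenshot, and closes the browser whether or not the screenshot step succeeded.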

Common Commands

Navigation

agent-browser open <url>
agent-browser back
agent-browser forward
agent-browser reload
agent-browser close

Interaction

agent-browser click <sel>
agent-browser click <sel> --new-tab
agent-browser fill <sel> <text>
agent-browser press <key>
agent-browser hover <sel>
agent-browser select <sel> <val>
agent-browser download <sel> <path>   # v0.7+

Info

agent-browser get text <sel>
agent-browser get url
agent-browser get title
agent-browser is visible <sel>

Snapshots & Screenshots

agent-browser snapshot -i --json
agent-browser screenshot [path]

Links

  • Documentation

  • Changelog

  • GitHub

  • npm

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
