agent-desktop

Desktop automation via native OS accessibility trees using the agent-desktop CLI. Use when an AI agent needs to observe, interact with, or automate desktop applications (click buttons, fill forms, navigate menus, read UI state, toggle checkboxes, scroll, drag, type text, take screenshots, manage windows, use clipboard). Covers 54 commands across observation, interaction, keyboard/mouse, app lifecycle, clipboard, and wait. Triggers on: "click button", "fill form", "open app", "read UI", "automate desktop", "accessibility tree", "snapshot app", "type into field", "navigate menu", "toggle checkbox", "take screenshot", "desktop automation", "agent-desktop", or any desktop GUI interaction task. Supports macOS (Phase 1), with Windows and Linux planned.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "agent-desktop" with this command: npx skills add lahfir/agent-desktop

agent-desktop

CLI tool enabling AI agents to observe and control desktop applications via native OS accessibility trees.

Core principle: agent-desktop is NOT an AI agent. It is a tool that AI agents invoke. It outputs structured JSON with ref-based element identifiers. The observation-action loop lives in the calling agent.

Installation

npm install -g agent-desktop
# or
bun install -g --trust agent-desktop

Requires macOS 12+ with Accessibility permission granted to your terminal.

Reference Files

Detailed documentation is split into focused reference files. Read them as needed:

ReferenceContents
references/commands-observation.mdsnapshot, find, get, is, screenshot, list-surfaces — all flags, output examples
references/commands-interaction.mdclick, type, set-value, select, toggle, scroll, drag, keyboard, mouse — choosing the right command
references/commands-system.mdlaunch, close, windows, clipboard, wait, batch, status, permissions, version
references/workflows.md12 common patterns: forms, menus, dialogs, scroll-find, drag-drop, async wait, anti-patterns
references/macos.mdmacOS permissions/TCC, AX API internals, smart activation chain, surfaces, Notification Center, troubleshooting

The Observe-Act Loop

Every automation follows this pattern:

1. OBSERVE  → agent-desktop snapshot --app "App Name" -i
2. REASON   → Parse JSON, find target element by ref (@e1, @e2...)
3. ACT      → agent-desktop click @e5  (or type, select, toggle...)
4. VERIFY   → agent-desktop snapshot again to confirm state change
5. REPEAT   → Continue until task is complete

Always snapshot before acting. Refs are snapshot-scoped and become stale after UI changes.

Ref System

  • Refs assigned depth-first: @e1, @e2, @e3...
  • Only interactive elements get refs: button, textfield, checkbox, link, menuitem, tab, slider, combobox, treeitem, cell
  • Static text, groups, containers remain in tree for context but have no ref
  • Refs are deterministic within a snapshot but NOT stable across snapshots if UI changed
  • After any action that changes UI, run snapshot again for fresh refs

JSON Output Contract

Every command returns a JSON envelope on stdout:

Success: { "version": "1.0", "ok": true, "command": "snapshot", "data": { ... } } Error: { "version": "1.0", "ok": false, "command": "click", "error": { "code": "STALE_REF", "message": "...", "suggestion": "..." } }

Exit codes: 0 success, 1 structured error, 2 argument error.

Error Codes

CodeMeaningRecovery
PERM_DENIEDAccessibility permission not grantedGrant in System Settings > Privacy > Accessibility
ELEMENT_NOT_FOUNDRef not in current refmapRe-run snapshot, use fresh ref
APP_NOT_FOUNDApp not runningLaunch it first
ACTION_FAILEDAX action rejectedTry alternative approach or coordinate-based click
ACTION_NOT_SUPPORTEDElement can't do thisUse different command
STALE_REFRef from old snapshotRe-run snapshot
WINDOW_NOT_FOUNDNo matching windowCheck app name, use list-windows
TIMEOUTWait condition not metIncrease --timeout
INVALID_ARGSBad argumentsCheck command syntax

Command Quick Reference (54 commands)

Observation

agent-desktop snapshot --app "App" -i           # Accessibility tree with refs
agent-desktop screenshot --app "App" out.png    # PNG screenshot
agent-desktop find --app "App" --role button    # Search elements
agent-desktop get @e1 --property text           # Read element property
agent-desktop is @e1 --property enabled         # Check element state
agent-desktop list-surfaces --app "App"         # Available surfaces

Interaction

agent-desktop click @e5                         # Click element
agent-desktop double-click @e3                  # Double-click
agent-desktop triple-click @e2                  # Triple-click (select line)
agent-desktop right-click @e5                   # Right-click (context menu)
agent-desktop type @e2 "hello"                  # Type text into element
agent-desktop set-value @e2 "new value"         # Set value directly
agent-desktop clear @e2                         # Clear element value
agent-desktop focus @e2                         # Set keyboard focus
agent-desktop select @e4 "Option B"             # Select dropdown option
agent-desktop toggle @e6                        # Toggle checkbox/switch
agent-desktop check @e6                         # Idempotent check
agent-desktop uncheck @e6                       # Idempotent uncheck
agent-desktop expand @e7                        # Expand disclosure
agent-desktop collapse @e7                      # Collapse disclosure
agent-desktop scroll @e1 --direction down       # Scroll element
agent-desktop scroll-to @e8                     # Scroll into view

Keyboard & Mouse

agent-desktop press cmd+c                       # Key combo
agent-desktop press return --app "App"          # Targeted key press
agent-desktop key-down shift                    # Hold key
agent-desktop key-up shift                      # Release key
agent-desktop hover @e5                         # Cursor to element
agent-desktop hover --xy 500,300                # Cursor to coordinates
agent-desktop drag --from @e1 --to @e5          # Drag between elements
agent-desktop mouse-click --xy 500,300          # Click at coordinates
agent-desktop mouse-move --xy 100,200           # Move cursor
agent-desktop mouse-down --xy 100,200           # Press mouse button
agent-desktop mouse-up --xy 300,400             # Release mouse button

App & Window

agent-desktop launch "System Settings"          # Launch and wait
agent-desktop close-app "TextEdit"              # Quit gracefully
agent-desktop close-app "TextEdit" --force      # Force kill
agent-desktop list-windows --app "Finder"       # List windows
agent-desktop list-apps                         # List running GUI apps
agent-desktop focus-window --app "Finder"       # Bring to front
agent-desktop resize-window --app "App" --width 800 --height 600
agent-desktop move-window --app "App" --x 0 --y 0
agent-desktop minimize --app "App"
agent-desktop maximize --app "App"
agent-desktop restore --app "App"

Notifications

agent-desktop list-notifications                # List all notifications
agent-desktop list-notifications --app "Slack"  # Filter by app
agent-desktop list-notifications --text "deploy" --limit 5  # Filter by text
agent-desktop dismiss-notification 1            # Dismiss by index
agent-desktop dismiss-all-notifications         # Dismiss all
agent-desktop dismiss-all-notifications --app "Slack"  # Dismiss all from app
agent-desktop notification-action 1 --action "Reply"   # Click action button

Clipboard

agent-desktop clipboard-get                     # Read clipboard
agent-desktop clipboard-set "text"              # Write to clipboard
agent-desktop clipboard-clear                   # Clear clipboard

Wait

agent-desktop wait 1000                         # Pause 1 second
agent-desktop wait --element @e5 --timeout 5000 # Wait for element
agent-desktop wait --window "Title"             # Wait for window
agent-desktop wait --text "Done" --app "App"    # Wait for text
agent-desktop wait --menu --app "App"           # Wait for context menu
agent-desktop wait --menu-closed --app "App"    # Wait for menu dismissal
agent-desktop wait --notification --app "App"   # Wait for new notification

System

agent-desktop status                            # Health check
agent-desktop permissions                       # Check permission
agent-desktop permissions --request             # Trigger permission dialog
agent-desktop version --json                    # Version info
agent-desktop batch '[...]' --stop-on-error     # Batch commands

Key Principles for Agents

  1. Always snapshot first. Never assume UI state.
  2. Use -i flag. Filters to interactive elements only, reducing tokens.
  3. Refs are ephemeral. Snapshot again after any UI-changing action.
  4. Prefer refs over coordinates. click @e5 > mouse-click --xy 500,300.
  5. Use wait for async UI. After launch/dialog triggers, wait for expected state.
  6. Check permissions first. Run permissions on first use.
  7. Handle errors. Parse error.code and follow error.suggestion.
  8. Use find for targeted searches. Faster than full snapshot when you know role/name.
  9. Use surfaces for menus. snapshot --surface menu captures open menus.
  10. Batch for performance. Multiple commands in one invocation.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

Agent Cli Orchestrator

Orchestrates multiple AI CLI tools by auto-detecting, prioritizing, and switching between them for stable, fallback-enabled automated coding workflows.

Registry SourceRecently Updated
029
Profile unavailable
Coding

airq

Check air quality, AQI, PM2.5, PM10, pollution levels for any city from the terminal using airq CLI. Installs, configures, and runs air quality queries. Use...

Registry SourceRecently Updated
056
Profile unavailable
Coding

Bit URL Shortener

Explain bit-cli skill purpose, installation, required setup, and troubleshooting.

Registry SourceRecently Updated
055
Profile unavailable