clawdcursor

AI desktop agent — control any app on Windows/macOS from your OpenClaw agent. Send natural language tasks to the Clawd Cursor API and it handles everything: opening apps, clicking buttons, typing text, navigating browsers, filling forms. If you can click it, your agent can too.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "clawdcursor" with this command: npx skills add amrdab/clawd-cursor/amrdab-clawd-cursor-clawdcursor

Clawd Cursor

If a human can do it on a screen, you can too. No API? No integration? No problem. Every app already has a UI — Clawd Cursor gives you eyes and hands to use all of them.

No task is impossible. If it's on the screen, it's within reach.

IMPORTANT — READ THIS BEFORE ANYTHING ELSE

You have access to a desktop AI agent that operates the GUI on the user's machine. This means:

  1. You CAN open apps, click buttons, type text, fill forms
  2. You CAN see what's on screen (via Clawd Cursor's screenshot pipeline)
  3. You CAN interact with applications — browser, desktop apps, settings
  4. You MUST respect safety tiers — 🟢 Auto actions run freely, 🔴 Confirm actions require user approval
  5. You MUST ask before accessing sensitive apps (email, banking, private messages, password managers)
  6. You SHOULD prefer direct tools (file read, web_fetch, CLI) over desktop automation when available

Think of Clawd Cursor as your hands and eyes on the user's computer — used with their permission.


When to Use This (AGENTS: READ THIS)

You have a desktop UI agent available. Use it when the user asks you to interact with their desktop, or when no other tool can accomplish the task.

Tool vs Skill (OpenClaw terminology)

  • Tool = direct capability (API call, filesystem, shell, web fetch, browser command).
  • Skill = packaged workflow/domain logic that may call one or more tools.
  • This skill (Clawd Cursor) = GUI execution skill. Use it only after ruling out OpenClaw tools/skills that can complete the same work without the GUI.

Use Clawd Cursor for (examples, not limits):

Clawd Cursor can perform any action that is visible and interactable in the GUI (subject to safety policy).

  • User-requested desktop tasks — "open Gmail and send an email", "check my calendar"
  • Read a webpage — when web_fetch or browser tools aren't available
  • Interact with desktop apps — click buttons, fill forms, read results
  • Browser tasks — search, navigate, fill forms (when browser tool unavailable)
  • Visual verification — did the page load? what does the UI show?
  • Cross-app workflows — copy from one app, paste in another
  • Settings changes — when the user explicitly asks

⚠️ Sensitive App Policy

Always ask the user before accessing:

  • Email clients (Gmail, Outlook)
  • Banking or financial apps
  • Private messaging (WhatsApp, Signal, Telegram)
  • Password managers
  • Admin panels or cloud consoles
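
The policy above can be sketched as a pre-check that runs before any task is sent. This is a minimal heuristic, not part of Clawd Cursor: the function name `sensitiveCategory` and the keyword lists are assumptions for illustration, and a keyword match only means "ask the user first".

```javascript
// Hypothetical keyword lists derived from the categories in the policy above.
// They are illustrative only and will produce false positives and misses.
const SENSITIVE_KEYWORDS = {
  email: ['gmail', 'outlook', 'email'],
  banking: ['bank', 'banking'],
  messaging: ['whatsapp', 'signal', 'telegram'],
  passwords: ['password manager'],
  admin: ['admin panel', 'cloud console'],
};

// Returns the first matching category name, or null when nothing matches.
function sensitiveCategory(taskText) {
  const text = taskText.toLowerCase();
  for (const [category, keywords] of Object.entries(SENSITIVE_KEYWORDS)) {
    if (keywords.some((keyword) => text.includes(keyword))) return category;
  }
  return null;
}
```

If the function returns a category, ask the user for permission before sending the task; a null result does not prove the task is harmless.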

Don't use Clawd Cursor when:

  • You can do it with a direct API call or CLI command (faster)
  • The task is purely computational (math, text generation, code writing)
  • You can already read/write the file directly
  • The browser tool or web_fetch can handle it

OpenClaw + Clawd Cursor Routing Contract (Avoid Overlap)

Clawd Cursor should be treated as OpenClaw's GUI execution layer, not a competing planner.

Route tasks in this order:

  1. OpenClaw native tools first (filesystem, API, shell, provider-native skills)
  2. Browser-native automation next (Playwright/CDP direct) for browser-only reads/clicks
  3. Clawd Cursor API task (POST /task) only when desktop/UI-level interaction is required

Practical rule

  • If OpenClaw already has a reliable skill/tool for the domain, use it.
  • Use Clawd Cursor to bridge gaps where no API/tool exists or when the user explicitly asks for GUI interaction.

This keeps behavior predictable, lowers latency/cost, and avoids duplicated logic between the main OpenClaw agent and this skill.
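
The routing order and the practical rule can be sketched as a small decision function. The flags `userRequestedGui`, `hasNativeTool`, and `browserOnly` are hypothetical; OpenClaw does not necessarily describe tasks this way.

```javascript
// Pick an execution path for a task, following the routing order above.
// The task descriptor shape is an assumption made for this sketch.
function chooseRoute(task) {
  // The user explicitly asked for GUI interaction: go straight to Clawd Cursor.
  if (task.userRequestedGui) return 'clawd-cursor';
  // 1. OpenClaw native tools first (filesystem, API, shell, native skills).
  if (task.hasNativeTool) return 'openclaw-native';
  // 2. Browser-native automation for browser-only reads/clicks.
  if (task.browserOnly) return 'browser-direct';
  // 3. Everything else needs desktop/UI-level interaction.
  return 'clawd-cursor';
}
```

For example, `chooseRoute({ hasNativeTool: true })` yields `'openclaw-native'`, while a task with no flags set falls through to `'clawd-cursor'`.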

Universal task pattern

For broad "get it done" requests, split into three phases:

  1. Plan in OpenClaw: break work into API/CLI/browser/GUI subtasks.
  2. Execute cheap paths first: API + CLI + browser direct.
  3. Escalate only residual UI steps to Clawd Cursor.

Think: "OpenClaw decides, Clawd Cursor acts on GUI when needed."

Direct Browser Access (Fast Path)

For quick page reads without a full task, connect to Chrome via Playwright CDP:

const pw = require('playwright');
(async () => {
  const browser = await pw.chromium.connectOverCDP('http://127.0.0.1:9222');
  const pages = browser.contexts()[0].pages();
  const text = await pages[0].innerText('body');
  console.log(text);
})();

Use this when you just need page content — faster than sending a task.

| Scenario | Use | Why |
| --- | --- | --- |
| Read page content/text | CDP Direct | Instant, free |
| Fill a web form | API task (POST /task) | Clawd handles multi-step planning |
| Check if a page loaded | CDP Direct | Just read the title/URL |
| Click through a complex UI flow | API task (POST /task) | Clawd handles planning |
| Get a list of elements on page | CDP Direct | Fast DOM query |
| Interact with a desktop app | API task (POST /task) | CDP is browser-only |

REST API Reference

Base URL: http://127.0.0.1:3847

Note: On Windows PowerShell, use curl.exe (with .exe) or Invoke-RestMethod. Bare curl is aliased to Invoke-WebRequest which behaves differently.

Pre-flight Check

Before your first task, verify Clawd Cursor is running:

curl.exe -s http://127.0.0.1:3847/health

Expected: {"status":"ok","version":"0.6.0"}
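
The same pre-flight check can be done from Node.js. This is a minimal sketch that assumes Node 18+ (global `fetch`); the function name `isClawdCursorRunning` and the injectable `fetchImpl` parameter are illustrative, and only the `status` field from the expected response is checked.

```javascript
// Returns true when GET /health answers with {"status":"ok", ...}.
// Any network error (for example, connection refused) counts as "not running".
async function isClawdCursorRunning(baseUrl = 'http://127.0.0.1:3847', fetchImpl = fetch) {
  try {
    const res = await fetchImpl(`${baseUrl}/health`);
    if (!res.ok) return false;
    const body = await res.json();
    return body.status === 'ok';
  } catch {
    return false;
  }
}
```

A false result is the cue to start the server as described next.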

If the connection is refused, start it yourself (don't ask the user):

# Find the skill directory and start the server
Start-Process -FilePath "node" -ArgumentList "dist/index.js","start" -WorkingDirectory "<clawd-cursor-directory>" -WindowStyle Hidden
Start-Sleep 3
# Verify it's running
curl.exe -s http://127.0.0.1:3847/health

The skill directory is wherever SKILL.md lives (the directory that contains this file). Use that path as the working directory.

Sending a Task (Async — Returns Immediately)

POST /task accepts the task and returns immediately. The task runs in the background. You must poll /status to know when it's done.

curl.exe -s -X POST http://127.0.0.1:3847/task -H "Content-Type: application/json" -d "{\"task\": \"YOUR_TASK_HERE\"}"

PowerShell:

Invoke-RestMethod -Uri http://127.0.0.1:3847/task -Method POST -ContentType "application/json" -Body '{"task": "YOUR_TASK_HERE"}'

Polling Pattern (Follow This)

1. POST /task → get accepted
2. Wait 2 seconds
3. GET /status
4. If status is "idle" → done
5. If status is "waiting_confirm" → ASK THE USER, then POST /confirm based on their answer
6. If still running → wait 2 more seconds, go to step 3
7. If 60+ seconds → POST /abort and retry with clearer instructions
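
The steps above can be sketched as one loop. This is an illustration under stated assumptions, not the project's client: it assumes Node 18+ (global `fetch`), and `runTask`, `askUser`, and the options object are hypothetical names. `askUser` must put the question to the real user and resolve to true or false.

```javascript
const BASE_URL = 'http://127.0.0.1:3847';
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runTask(task, askUser, { fetchImpl = fetch, pollMs = 2000, timeoutMs = 60000 } = {}) {
  // POST helper: sends a JSON body only when one is given (/abort takes none).
  const post = (path, body) => fetchImpl(`${BASE_URL}${path}`, body === undefined
    ? { method: 'POST' }
    : { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify(body) });

  // Step 1: submit the task and make sure it was accepted.
  const accepted = await (await post('/task', { task })).json();
  if (!accepted.accepted) throw new Error(`Task not accepted: ${JSON.stringify(accepted)}`);

  let deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    await sleep(pollMs);                                               // Steps 2 and 6: wait
    const state = await (await fetchImpl(`${BASE_URL}/status`)).json(); // Step 3: GET /status
    if (state.status === 'idle') return 'done';                        // Step 4: finished
    if (state.status === 'waiting_confirm') {                          // Step 5: never self-approve
      const approved = await askUser(state.currentStep);
      await post('/confirm', { approved });
      deadline = Date.now() + timeoutMs; // do not count the user's thinking time
    }
  }
  await post('/abort');                                                // Step 7: give up
  return 'aborted';
}
```

When the function returns `'aborted'`, retry with clearer instructions rather than resending the same text.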

Checking Status

curl.exe -s http://127.0.0.1:3847/status

Confirming Safety-Gated Actions

Some actions (sending messages, deleting) require approval. 🔴 NEVER self-approve these. Always ask the user for confirmation before POST /confirm. These exist to protect the user — do not bypass them.

curl.exe -s -X POST http://127.0.0.1:3847/confirm -H "Content-Type: application/json" -d "{\"approved\": true}"

Aborting a Task

curl.exe -s -X POST http://127.0.0.1:3847/abort

Reading Logs (Debugging)

curl.exe -s http://127.0.0.1:3847/logs

Returns last 200 log entries. Check for error or warn entries when tasks fail.

Response States

| State | Response | What to do |
| --- | --- | --- |
| Accepted | {"accepted": true, "task": "..."} | Start polling |
| Running | {"status": "acting", "currentTask": "...", "stepsCompleted": 2} | Keep polling |
| Waiting confirm | {"status": "waiting_confirm", "currentStep": "..."} | Ask the user, then POST /confirm |
| Done | {"status": "idle"} | Task complete |
| Busy | {"error": "Agent is busy", "state": {...}} | Wait or POST /abort first |
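
The table above can be read as a mapping from a parsed JSON response to the next step. The sketch below assumes the response shapes shown in the table; the function name `nextAction` and the returned labels are made up for illustration, and any status other than "idle" or "waiting_confirm" is treated as still running.

```javascript
// Map a parsed response from /task or /status to the next step.
function nextAction(response) {
  if (response.error === 'Agent is busy') return 'wait-or-abort';
  if (response.accepted === true) return 'start-polling';
  if (typeof response.status === 'string') {
    if (response.status === 'idle') return 'done';
    if (response.status === 'waiting_confirm') return 'ask-user';
    return 'keep-polling'; // e.g. "acting"
  }
  return 'unknown'; // shape not covered by the table; check /logs
}
```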

CDP Direct Reference

Chrome must be running with --remote-debugging-port=9222.

Quick check:

curl.exe -s http://127.0.0.1:9222/json/version

If this returns JSON, Chrome is ready.

Connecting via Playwright:

const { chromium } = require('playwright');

// Wrapped in an async function so that await works in a plain Node.js script
(async () => {
  const browser = await chromium.connectOverCDP('http://127.0.0.1:9222');
  const context = browser.contexts()[0];
  const page = context.pages()[0];

  // Read page content
  const title = await page.title();
  const url = page.url();
  const text = await page.textContent('body');

  // Click by role
  await page.getByRole('button', { name: 'Submit' }).click();

  // Fill a field
  await page.getByLabel('Email').fill('user@example.com');

  // Read specific elements
  const buttons = await page.$$eval('button', els => els.map(e => e.textContent));
})();

Task Writing Guidelines

  1. Be specific — include app names, URLs, exact text to type, button names
  2. One task at a time — wait for completion before sending the next
  3. Describe the goal, not the clicks — say "Send an email to john@example.com about the meeting" not "click compose, click to field..."
  4. Check status if a task seems to hang
  5. Don't include credentials in task text — tasks are logged
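
Some of these guidelines can be checked mechanically before a task is sent. The sketch below is a rough heuristic with made-up names (`reviewTaskText` and its warning labels); the regular expressions are assumptions and will miss many cases, so treat an empty result as "nothing obvious found", not as approval.

```javascript
// Returns a list of warning labels for a task string (empty when none apply).
function reviewTaskText(task) {
  const warnings = [];
  // Guideline 1: very short tasks rarely name an app, URL, or exact text.
  if (task.trim().length < 10) warnings.push('too-vague');
  // Guideline 3: repeated "click ..." suggests a click script instead of a goal.
  if (/\bclick\b.*\bclick\b/i.test(task)) warnings.push('describes-clicks');
  // Guideline 5: tasks are logged, so flag anything that looks like a credential.
  if (/\b(password|passwd|api[_ -]?key|secret|token)\b\s*[:=]/i.test(task)) {
    warnings.push('possible-credential');
  }
  return warnings;
}
```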

Task Examples

| Goal | Task to send |
| --- | --- |
| Simple navigation | Open Chrome and go to github.com |
| Read screen content | What text is currently displayed in Notepad? |
| Cross-app workflow | Copy the email address from the Chrome tab and paste it into the To field in Outlook |
| Form filling | In the open Chrome tab, fill the contact form: name "John Doe", email "john@example.com" |
| App interaction | Open Spotify and play the Discover Weekly playlist |
| Settings change | Open Windows Settings and turn on Dark Mode |
| Data extraction | Read the stock price shown in the Bloomberg tab in Chrome |
| Complex browser | Open YouTube, search for "Adele Hello", and play the first video result |
| Verification | Check if the deployment succeeded: look at the Vercel dashboard in Chrome |
| Send email | Open Gmail, compose email to john@example.com, subject: Meeting Tomorrow, body: Confirming 2pm. Best regards. |
| Take screenshot | Take a screenshot |

Error Recovery

| Problem | Solution |
| --- | --- |
| Connection refused on :3847 | Start Clawd Cursor: cd clawd-cursor && npm start |
| Connection refused on :9222 | Start Chrome with CDP: Start-Process chrome -ArgumentList "--remote-debugging-port=9222" |
| Agent returns "busy" | Poll /status and wait for idle, or POST /abort |
| Task fails with no details | Check /logs for error entries |
| Task completes but wrong result | Rephrase with more specifics: exact app name, button text, field labels |
| Same task fails repeatedly | Break into smaller tasks (one action per task) |
| Safety confirmation pending | Ask the user, then POST /confirm with {"approved": true} or {"approved": false} |
| Task hangs > 60 seconds | POST /abort, then retry with simpler phrasing |

How It Works — 5-Layer Pipeline

| Layer | What | Speed | Cost |
| --- | --- | --- | --- |
| 0: Browser Layer | URL detection → direct navigation | Instant | Free |
| 1: Action Router + Shortcuts | Regex + UI Automation + keyboard shortcuts | Instant | Free |
| 1.5: Smart Interaction | 1 LLM plan → CDP/UIDriver executes | ~2-5s | 1 LLM call |
| 2: Accessibility Reasoner | UI tree → text LLM decides | ~1s | Cheap |
| 3: Computer Use | Screenshot → vision LLM | ~5-8s | Expensive |

Layer 1 includes keyboard shortcuts — common actions execute as direct keystrokes (0 LLM calls).

At least 80% of tasks are handled by Layers 0 and 1 (free, instant). The vision model is a last resort.
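
The cheapest-first idea behind the pipeline can be sketched as a chain of handlers, each of which either handles the task or passes it on. This is not Clawd Cursor's actual implementation: `runThroughLayers`, the `tryHandle` convention (return null to decline), and the two toy layers are assumptions for illustration.

```javascript
// Try each layer in order (cheapest first); the first non-null result wins.
async function runThroughLayers(task, layers) {
  for (const layer of layers) {
    const result = await layer.tryHandle(task);
    if (result !== null) return { handledBy: layer.name, result };
  }
  throw new Error('No layer could handle the task');
}

// Two toy layers: a URL detector and a catch-all standing in for the vision model.
const exampleLayers = [
  { name: 'browser', tryHandle: async (task) => (/^https?:\/\//.test(task) ? `navigate:${task}` : null) },
  { name: 'computer-use', tryHandle: async (task) => `vision:${task}` },
];
```

With `exampleLayers`, a task that is just a URL is handled by the first layer without reaching the expensive one.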

Safety Tiers

| Tier | Actions | Behavior |
| --- | --- | --- |
| 🟢 Auto | Navigation, reading, opening apps | Runs immediately |
| 🟡 Preview | Typing, form filling | Logs before executing |
| 🔴 Confirm | Sending messages, deleting | Pauses; ask the user before POST /confirm. Never self-approve. |
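
The tiers can be pictured as a lookup from action kind to tier. The action labels below (`navigate`, `fill_form`, and so on) are hypothetical names for the actions listed in the table, and falling back to the most restrictive tier for unknown actions is an assumption of this sketch, not documented behavior.

```javascript
// Hypothetical action labels grouped by the tiers in the table above.
const SAFETY_TIERS = {
  auto: ['navigate', 'read', 'open_app'],
  preview: ['type', 'fill_form'],
  confirm: ['send_message', 'delete'],
};

// Returns 'auto', 'preview', or 'confirm' for an action label.
function tierFor(action) {
  for (const [tier, actions] of Object.entries(SAFETY_TIERS)) {
    if (actions.includes(action)) return tier;
  }
  return 'confirm'; // assumption: unknown actions get the strictest tier
}
```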

Security & Privacy

Network Isolation

  • API binds to 127.0.0.1 only — not network accessible. Verify: netstat -an | findstr 3847 should show 127.0.0.1:3847
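
The netstat check can also be scripted. The sketch below parses the text output of `netstat -an` as printed on Windows (columns: protocol, local address, foreign address, state); the function names are illustrative, and other platforms format this output differently.

```javascript
// Collect the local addresses of TCP sockets listening on the given port.
function listeningAddresses(netstatOutput, port) {
  return netstatOutput
    .split(/\r?\n/)
    .map((line) => line.trim().split(/\s+/))
    .filter((cols) => cols[0] === 'TCP' && cols[3] === 'LISTENING' && cols[1].endsWith(`:${port}`))
    .map((cols) => cols[1]);
}

// True only if something listens on the port and every listener is on loopback.
function isLoopbackOnly(netstatOutput, port) {
  const addresses = listeningAddresses(netstatOutput, port);
  return addresses.length > 0
    && addresses.every((a) => a.startsWith('127.0.0.1:') || a.startsWith('[::1]:'));
}
```

A listener on `0.0.0.0:3847` makes `isLoopbackOnly` return false, which would mean the API is reachable from the network.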
  • Screenshots stay in memory, never saved to disk (unless --debug)
  • No telemetry, no analytics, no phone-home calls

Data Flow

  • With Ollama (local): 100% offline — zero external network calls. No data leaves the machine.
  • With cloud providers: screenshots/text are sent to the user's chosen provider API only. No data goes to skill authors, ClawHub, or third parties.
  • OpenClaw users: credentials auto-discovered from local config files — no keys stored in skill directory.
  • The user controls data flow by choosing their provider. Ollama = fully private.

Agent Autonomy Controls

  • 🟢 Auto actions (navigation, reading, opening apps) run without prompting
  • 🟡 Preview actions (typing, form filling) are logged before executing
  • 🔴 Confirm actions (sending messages, deleting, purchases) always pause for user approval
  • Agents must ask the user before accessing sensitive apps (email, banking, messaging, passwords)
  • Agents must never self-approve 🔴 Confirm actions

Setup (User Reference)

Setup is handled by the user. If Clawd Cursor isn't running, start it yourself using the exec tool:

Start-Process -FilePath "node" -ArgumentList "dist/index.js","start" -WorkingDirectory "<skill-directory>" -WindowStyle Hidden

Only ask the user if you cannot start it (e.g., Node.js is not installed or the build is missing). In that case, the user can install from source:

git clone https://github.com/AmrDab/clawd-cursor.git
cd clawd-cursor
npm install && npm run build
npx clawd-cursor doctor    # auto-detects and configures everything
npm start                  # starts on port 3847

macOS: Grant Accessibility permission to your terminal app: System Settings → Privacy & Security → Accessibility

| Provider | Setup | Cost |
| --- | --- | --- |
| Ollama (free) | ollama pull <model> | $0 (fully offline) |
| Any cloud provider | Set AI_API_KEY=your-key | Varies by provider |
| OpenClaw users | Automatic, no setup needed | Uses configured provider |

Performance Optimization

The following optimizations have been applied to reduce task execution latency and LLM API costs. Reference files are in perf/references/patches/.

Applied Optimizations

| # | Name | Impact |
| --- | --- | --- |
| 1 | Screenshot hash cache | 90% fewer LLM calls on static screens |
| 2 | Parallel screenshot+a11y | 30-40% per-step latency cut |
| 3 | A11y context cache (2s TTL) | Eliminates redundant PS spawns |
| 4 | Screenshot compression | 52% smaller payload (58KB vs 120KB) |
| 5 | Async debug writes | 94% less event loop blocking |
| 6 | Streaming LLM responses | 1-3s faster per LLM call |
| 7 | Trimmed system prompts | ~60% fewer prompt tokens |
| 8 | A11y tree filtering | Interactive elements only, 3000 char cap |
| 9 | Combined PS script | 1 spawn instead of 3 |
| 10 | Taskbar cache (30s TTL) | Skip expensive taskbar query |
| 11 | Delay reduction | 50-150ms vs 200-1500ms |
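
The TTL caches in items 3 and 10 follow a common pattern: keep the last result and reuse it until it is older than the TTL. The sketch below shows that pattern only; it is not the code in perf/references/patches/, and `createTtlCache` and its injectable `now` clock are hypothetical names.

```javascript
// Cache a single value for ttlMs milliseconds; `load` runs only when the entry is stale.
function createTtlCache(ttlMs, now = Date.now) {
  let entry = null; // { value, expiresAt } once something has been loaded
  return async function get(load) {
    if (entry !== null && now() < entry.expiresAt) return entry.value;
    const value = await load();
    entry = { value, expiresAt: now() + ttlMs };
    return value;
  };
}
```

For a 2-second accessibility cache, create it once with `createTtlCache(2000)` and pass the expensive query as `load` on every step; calls inside the window reuse the stored value.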

Benchmarks (2560x1440)

| Metric | v0.3 (VNC) | v0.4 (Native) | v0.4.1+ (Optimized) |
| --- | --- | --- | --- |
| Screenshot capture | ~850ms | ~50ms | ~57ms |
| Screenshot size | ~200KB | ~120KB | ~58KB |
| A11y context (uncached) | N/A | ~600ms | ~462ms |
| A11y context (cached) | N/A | 0ms | 0ms (2s TTL) |
| Delays (per step) | N/A | 200-1500ms | 50-600ms |
| System prompt tokens | N/A | ~800 | ~300 |

Perf Tools

  • perf/apply-optimizations.ps1 — apply all patches
  • perf/perf-test.ts — benchmark harness (npx ts-node perf/perf-test.ts)

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
