Clawd Cursor
If a human can do it on a screen, you can too. No API? No integration? No problem. Every app already has a UI — Clawd Cursor gives you eyes and hands to use all of them.
No task is impossible. If it's on the screen, it's within reach.
IMPORTANT — READ THIS BEFORE ANYTHING ELSE
You have a GUI-level desktop AI agent running on the user's machine. This means:
- You CAN open apps, click buttons, type text, fill forms
- You CAN see what's on screen (via Clawd Cursor's screenshot pipeline)
- You CAN interact with applications — browser, desktop apps, settings
- You MUST respect safety tiers — 🟢 Auto actions run freely, 🔴 Confirm actions require user approval
- You MUST ask before accessing sensitive apps (email, banking, private messages, password managers)
- You SHOULD prefer direct tools (file read, web_fetch, CLI) over desktop automation when available
Think of Clawd Cursor as your hands and eyes on the user's computer — used with their permission.
When to Use This (AGENTS: READ THIS)
You have a desktop UI agent available. Use it when the user asks you to interact with their desktop, or when no other tool can accomplish the task.
Tool vs Skill (OpenClaw terminology)
- Tool = direct capability (API call, filesystem, shell, web fetch, browser command).
- Skill = packaged workflow/domain logic that may call one or more tools.
- This skill (Clawd Cursor) = GUI execution skill. Use it only after ruling out OpenClaw tools/skills that can do the same work without a GUI.
Use Clawd Cursor for (examples, not limits):
Clawd Cursor can perform any action that is visible and interactable in the GUI (subject to safety policy).
- User-requested desktop tasks — "open Gmail and send an email", "check my calendar"
- Read a webpage — when web_fetch or browser tools aren't available
- Interact with desktop apps — click buttons, fill forms, read results
- Browser tasks — search, navigate, fill forms (when browser tool unavailable)
- Visual verification — did the page load? what does the UI show?
- Cross-app workflows — copy from one app, paste in another
- Settings changes — when the user explicitly asks
⚠️ Sensitive App Policy
Always ask the user before accessing:
- Email clients (Gmail, Outlook)
- Banking or financial apps
- Private messaging (WhatsApp, Signal, Telegram)
- Password managers
- Admin panels or cloud consoles
Don't use Clawd Cursor when:
- You can do it with a direct API call or CLI command (faster)
- The task is purely computational (math, text generation, code writing)
- You can already read/write the file directly
- The browser tool or web_fetch can handle it
OpenClaw + Clawd Cursor Routing Contract (Avoid Overlap)
Clawd Cursor should be treated as OpenClaw's GUI execution layer, not a competing planner.
Route tasks in this order:
- OpenClaw native tools first (filesystem, API, shell, provider-native skills)
- Browser-native automation next (Playwright/CDP direct) for browser-only reads/clicks
- Clawd Cursor API task (POST /task) only when desktop/UI-level interaction is required
Practical rule
- If OpenClaw already has a reliable skill/tool for the domain, use it.
- Use Clawd Cursor to bridge gaps where no API/tool exists or when the user explicitly asks for GUI interaction.
This keeps behavior predictable, lowers latency/cost, and avoids duplicated logic between the main OpenClaw agent and this skill.
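The routing order and practical rule above can be sketched as a small decision function. This is a hypothetical illustration. The task flags (`userAskedForGui`, `hasNativeTool`, `isBrowserOnly`, `needsDesktopUi`) and the route names are invented for the example. Checking an explicit user request for GUI interaction first is one reading of the rule, not documented Clawd Cursor behavior.

```javascript
// Hypothetical routing helper. Flag and route names are illustrative only;
// neither OpenClaw nor Clawd Cursor exposes this function.
function chooseRoute(task) {
  // Assumption: an explicit user request for GUI interaction wins.
  if (task.userAskedForGui) return 'clawd-cursor';
  // 1. OpenClaw native tools (filesystem, API, shell, provider-native skills).
  if (task.hasNativeTool) return 'openclaw-native';
  // 2. Browser-only reads/clicks go straight to Playwright over CDP.
  if (task.isBrowserOnly) return 'cdp-direct';
  // 3. Desktop/UI-level work is the only case that needs POST /task.
  if (task.needsDesktopUi) return 'clawd-cursor';
  // Purely computational work never needs the GUI.
  return 'openclaw-native';
}

console.log(chooseRoute({ isBrowserOnly: true })); // cdp-direct
```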
Universal task pattern
For broad "get it done" requests, split into three phases:
- Plan in OpenClaw: break work into API/CLI/browser/GUI subtasks.
- Execute cheap paths first: API + CLI + browser direct.
- Escalate only residual UI steps to Clawd Cursor.
Think: "OpenClaw decides, Clawd Cursor acts on GUI when needed."
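Here is a minimal sketch of the three-phase split. It assumes a hypothetical plan format in which each subtask carries a `kind` of `'api'`, `'cli'`, `'browser'`, or `'gui'`. It only separates the cheap paths from the residual GUI steps and does not model dependencies between subtasks.

```javascript
// Hypothetical plan shape: [{ kind: 'api' | 'cli' | 'browser' | 'gui', description }]
const CHEAP_KINDS = new Set(['api', 'cli', 'browser']);

function splitPlan(subtasks) {
  // Phase 2: everything OpenClaw can do without the GUI, in original order.
  const cheapFirst = subtasks.filter(s => CHEAP_KINDS.has(s.kind));
  // Phase 3: whatever is left over (the 'gui' steps) goes to Clawd Cursor.
  const escalate = subtasks.filter(s => !CHEAP_KINDS.has(s.kind));
  return { cheapFirst, escalate };
}

const plan = [
  { kind: 'api', description: 'Fetch the invoice data' },
  { kind: 'gui', description: 'Enter the totals in the legacy desktop app' },
  { kind: 'cli', description: 'Archive the source files' },
];
const { cheapFirst, escalate } = splitPlan(plan);
console.log(cheapFirst.length, escalate.length); // 2 1
```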
Direct Browser Access (Fast Path)
For quick page reads without a full task, connect to Chrome via Playwright CDP:
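const { chromium } = require('playwright');
(async () => { // CommonJS has no top-level await, so wrap the calls
  const browser = await chromium.connectOverCDP('http://127.0.0.1:9222');
  const pages = browser.contexts()[0].pages(); // tabs in the default context
  const text = await pages[0].innerText('body');
  console.log(text);
})();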
Use this when you just need page content — faster than sending a task.
| Scenario | Use | Why |
|---|---|---|
| Read page content/text | CDP Direct | Instant, free |
| Fill a web form | API task (POST /task) | Clawd handles multi-step planning |
| Check if a page loaded | CDP Direct | Just read the title/URL |
| Click through a complex UI flow | API task (POST /task) | Clawd handles planning |
| Get a list of elements on page | CDP Direct | Fast DOM query |
| Interact with a desktop app | API task (POST /task) | CDP is browser-only |
REST API Reference
Base URL: http://127.0.0.1:3847
Note: On Windows PowerShell, use curl.exe (with .exe) or Invoke-RestMethod. Bare curl is aliased to Invoke-WebRequest, which behaves differently.
Pre-flight Check
Before your first task, verify Clawd Cursor is running:
curl.exe -s http://127.0.0.1:3847/health
Expected: {"status":"ok","version":"0.6.0"}
If the connection is refused, start it yourself (don't ask the user):
# Find the skill directory and start the server
Start-Process -FilePath "node" -ArgumentList "dist/index.js","start" -WorkingDirectory "<clawd-cursor-directory>" -WindowStyle Hidden
Start-Sleep 3
# Verify it's running
curl.exe -s http://127.0.0.1:3847/health
The skill directory is the directory containing SKILL.md (this file). Use that path as the working directory.
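The same pre-flight check can be done from Node.js. This sketch assumes Node 18+ for the global `fetch`. It treats any connection error or a `status` other than `"ok"` as "not running". The function name and the injectable `fetchImpl` parameter are illustrative; `fetchImpl` lets you exercise the logic without a live server.

```javascript
// Pre-flight check against GET /health (Node 18+, global fetch assumed).
// fetchImpl is injectable so the logic can be tested with a stub.
async function isClawdCursorUp(fetchImpl = fetch, baseUrl = 'http://127.0.0.1:3847') {
  try {
    const res = await fetchImpl(`${baseUrl}/health`);
    if (!res.ok) return false;
    const body = await res.json();
    return body.status === 'ok'; // expected shape: {"status":"ok","version":"..."}
  } catch {
    // "Connection refused" surfaces as a rejected fetch promise.
    return false;
  }
}
```

If this resolves to `false`, start the server with the Start-Process command and check again.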
Sending a Task (Async — Returns Immediately)
POST /task accepts the task and returns immediately. The task runs in the background. You must poll /status to know when it's done.
curl.exe -s -X POST http://127.0.0.1:3847/task -H "Content-Type: application/json" -d "{\"task\": \"YOUR_TASK_HERE\"}"
PowerShell:
Invoke-RestMethod -Uri http://127.0.0.1:3847/task -Method POST -ContentType "application/json" -Body '{"task": "YOUR_TASK_HERE"}'
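From Node.js, building the request body with `JSON.stringify` avoids the shell-quoting pitfalls of the one-liners above. This sketch assumes Node 18+ `fetch`, and `submitTask` is a hypothetical helper name. It treats any response without `accepted: true` as a failure, based on the Accepted and Busy shapes listed under Response States.

```javascript
// Hypothetical helper for POST /task (Node 18+, global fetch assumed).
async function submitTask(task, fetchImpl = fetch, baseUrl = 'http://127.0.0.1:3847') {
  const res = await fetchImpl(`${baseUrl}/task`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // JSON.stringify handles quotes and backslashes inside the task text.
    body: JSON.stringify({ task }),
  });
  const body = await res.json();
  if (body.accepted !== true) {
    // e.g. {"error": "Agent is busy", "state": {...}}: poll /status or POST /abort first.
    throw new Error(body.error || 'Task was not accepted');
  }
  return body; // {"accepted": true, "task": "..."}; now start polling /status
}
```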
Polling Pattern (Follow This)
1. POST /task → get accepted
2. Wait 2 seconds
3. GET /status
4. If status is "idle" → done
5. If status is "waiting_confirm" → ASK THE USER, then POST /confirm based on their answer
6. If still running → wait 2 more seconds, go to step 3
7. If 60+ seconds have passed → POST /abort and retry with clearer instructions
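The polling pattern above can be sketched as a single Node.js function. The sketch rests on these assumptions:
- Node 18+ provides the global `fetch`.
- `askUser` is a hypothetical callback that must return the user's own yes/no answer, so the function never approves anything by itself.
- `sleep` and `now` are injectable so the timing can be tested.
- The response bodies of `/confirm` and `/abort` are not documented, so the sketch does not parse them.

```javascript
// Sketch of the polling pattern (Node 18+, global fetch assumed).
// askUser(currentStep) is a hypothetical callback that must return the USER's
// answer (true/false); this function never approves a 🔴 action by itself.
async function runTaskAndWait(task, {
  askUser,
  fetchImpl = fetch,
  baseUrl = 'http://127.0.0.1:3847',
  intervalMs = 2000,
  timeoutMs = 60000,
  sleep = ms => new Promise(resolve => setTimeout(resolve, ms)),
  now = () => Date.now(),
} = {}) {
  const postJson = (path, payload) => fetchImpl(`${baseUrl}${path}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });

  // Step 1: submit; a busy agent answers with {"error": "Agent is busy", ...}.
  const accepted = await (await postJson('/task', { task })).json();
  if (accepted.accepted !== true) throw new Error(accepted.error || 'Task not accepted');

  const startedAt = now();
  for (;;) {
    await sleep(intervalMs);                                           // steps 2 and 6
    const state = await (await fetchImpl(`${baseUrl}/status`)).json(); // step 3
    if (state.status === 'idle') return { outcome: 'done' };           // step 4
    if (state.status === 'waiting_confirm') {                          // step 5
      const approved = (await askUser(state.currentStep)) === true;
      await postJson('/confirm', { approved });
    } else if (now() - startedAt >= timeoutMs) {                       // step 7
      await fetchImpl(`${baseUrl}/abort`, { method: 'POST' });
      return { outcome: 'aborted' };
    }
  }
}
```

A caller that gets `{ outcome: 'aborted' }` back would then retry with clearer, more specific task text, as step 7 suggests.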
Checking Status
curl.exe -s http://127.0.0.1:3847/status
Confirming Safety-Gated Actions
Some actions (sending messages, deleting) require approval. 🔴 NEVER self-approve these. Always ask the user for confirmation before POST /confirm. These exist to protect the user — do not bypass them.
curl.exe -s -X POST http://127.0.0.1:3847/confirm -H "Content-Type: application/json" -d "{\"approved\": true}"
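One way to keep approval tied to an explicit user answer is to derive the `/confirm` body only from the user's reply. This is a hypothetical helper. The accepted spellings of "yes" are an illustrative choice, and anything else, including no reply at all, becomes `{"approved":false}`. A real agent might prefer to ask again on an ambiguous reply instead.

```javascript
// Hypothetical helper: the /confirm body is derived only from the user's reply.
// Only an explicit yes approves; anything else, including no reply, refuses.
const YES_REPLIES = new Set(['y', 'yes', 'approve', 'approved']);

function confirmBodyFromUserReply(reply) {
  const normalized = String(reply ?? '').trim().toLowerCase();
  return JSON.stringify({ approved: YES_REPLIES.has(normalized) });
}

console.log(confirmBodyFromUserReply(' Yes ')); // {"approved":true}
console.log(confirmBodyFromUserReply('maybe')); // {"approved":false}
```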
Aborting a Task
curl.exe -s -X POST http://127.0.0.1:3847/abort
Reading Logs (Debugging)
curl.exe -s http://127.0.0.1:3847/logs
Returns the last 200 log entries. Check for error or warn entries when tasks fail.
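To pull out only the problem entries, a filter like the one below can help. Note the loud assumption: this document does not specify the JSON shape of `GET /logs`. The sketch accepts either an array of entries or an object with a `logs` array, and it assumes each entry has a `level` field. Check a real response before relying on it.

```javascript
// ASSUMPTION: the shape of GET /logs is not documented here. This sketch accepts
// either an array of entries or an object with a `logs` array, and assumes each
// entry has a `level` field. Verify against a real response first.
function problemEntries(body) {
  const entries = Array.isArray(body) ? body : (body && body.logs) || [];
  return entries.filter(e => e && (e.level === 'error' || e.level === 'warn'));
}

// Usage (Node 18+):
// fetch('http://127.0.0.1:3847/logs').then(r => r.json()).then(b => console.log(problemEntries(b)));
```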
Response States
| State | Response | What to do |
|---|---|---|
| Accepted | {"accepted": true, "task": "..."} | Start polling |
| Running | {"status": "acting", "currentTask": "...", "stepsCompleted": 2} | Keep polling |
| Waiting confirm | {"status": "waiting_confirm", "currentStep": "..."} | POST /confirm |
| Done | {"status": "idle"} | Task complete |
| Busy | {"error": "Agent is busy", "state": {...}} | Wait or POST /abort first |
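The table can be turned into a small dispatcher that maps a response to the agent's next step. The action names are hypothetical. Treating unknown errors and unrecognized bodies as "check the logs" mirrors the advice to inspect /logs when tasks fail. Any status other than `idle` or `waiting_confirm` (for example `acting`) is treated as still running.

```javascript
// Maps a Clawd Cursor response to the next step, following the Response States table.
// Action names are illustrative; they are not part of the Clawd Cursor API.
function nextAction(response) {
  if (response.accepted === true) return 'start_polling';
  if (response.error) {
    return response.error === 'Agent is busy' ? 'wait_or_abort' : 'check_logs';
  }
  switch (response.status) {
    case 'idle':
      return 'done';
    case 'waiting_confirm':
      return 'ask_user_then_confirm'; // never self-approve
    case undefined:
      return 'check_logs';
    default:
      return 'keep_polling'; // e.g. "acting"
  }
}
```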
CDP Direct Reference
Chrome must be running with --remote-debugging-port=9222.
Quick check:
curl.exe -s http://127.0.0.1:9222/json/version
If this returns JSON, Chrome is ready.
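The same check can be run from Node.js, assuming Node 18+ `fetch`. Chrome's `/json/version` endpoint returns browser metadata that includes a `webSocketDebuggerUrl`, the browser-level DevTools WebSocket endpoint. This sketch treats the presence of that field as "ready". The helper name and the injectable `fetchImpl` are illustrative.

```javascript
// Checks Chrome's DevTools HTTP endpoint (Node 18+, global fetch assumed).
async function isChromeCdpReady(fetchImpl = fetch, endpoint = 'http://127.0.0.1:9222') {
  try {
    const res = await fetchImpl(`${endpoint}/json/version`);
    const info = await res.json(); // non-JSON responses throw and land in catch
    // A debuggable Chrome reports a browser-level WebSocket URL here.
    return typeof info.webSocketDebuggerUrl === 'string';
  } catch {
    return false; // connection refused, or something other than Chrome on the port
  }
}
```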
Connecting via Playwright:
const { chromium } = require('playwright');
// CommonJS has no top-level await, so run the calls inside an async function.
(async () => {
  const browser = await chromium.connectOverCDP('http://127.0.0.1:9222');
  const context = browser.contexts()[0]; // default context of the running Chrome
  const page = context.pages()[0];       // first open tab
  // Read page content
  const title = await page.title();
  const url = page.url(); // synchronous, no await needed
  const text = await page.textContent('body');
  // Click by role
  await page.getByRole('button', { name: 'Submit' }).click();
  // Fill a field
  await page.getByLabel('Email').fill('user@example.com');
  // Read specific elements
  const buttons = await page.$$eval('button', els => els.map(e => e.textContent));
  console.log({ title, url, textLength: text?.length, buttons });
})();
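The snippet above assumes the first tab of the first context is the one you want. When several tabs are open, a hypothetical helper like this one picks a tab by URL instead. It relies only on `browser.contexts()`, `context.pages()`, and the synchronous `page.url()`.

```javascript
// Hypothetical helper: find an open tab whose URL contains `needle`,
// searching every browser context instead of assuming pages()[0].
function findPageByUrl(browser, needle) {
  for (const context of browser.contexts()) {
    const match = context.pages().find(page => page.url().includes(needle));
    if (match) return match;
  }
  return null;
}

// Usage, once `browser` and `context` are available:
// const page = findPageByUrl(browser, 'vercel.com') ?? context.pages()[0];
```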
Task Writing Guidelines
- Be specific — include app names, URLs, exact text to type, button names
- One task at a time — wait for completion before sending the next
- Describe the goal, not the clicks — say "Send an email to john@example.com about the meeting" not "click compose, click to field..."
- Check status if a task seems to hang
- Don't include credentials in task text — tasks are logged
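Because task text is logged, an agent could run a quick check before sending. This is a hypothetical heuristic, not a Clawd Cursor feature. The regular expressions are illustrative examples that catch obvious `password:` or `api_key=` style text, not a complete secret detector.

```javascript
// Hypothetical pre-send check for the "tasks are logged" guideline.
// The patterns are illustrative examples, not a complete secret detector.
const SECRET_PATTERNS = [
  /\bpass(word|wd)?\s*[:=]/i,
  /\b(api[_-]?key|token|secret)\s*[:=]/i,
];

function mayContainCredentials(taskText) {
  return SECRET_PATTERNS.some(re => re.test(taskText));
}

console.log(mayContainCredentials('Log in with password: hunter2')); // true
console.log(mayContainCredentials('Open Chrome and go to github.com')); // false
```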
Task Examples
| Goal | Task to send |
|---|---|
| Simple navigation | Open Chrome and go to github.com |
| Read screen content | What text is currently displayed in Notepad? |
| Cross-app workflow | Copy the email address from the Chrome tab and paste it into the To field in Outlook |
| Form filling | In the open Chrome tab, fill the contact form: name "John Doe", email "john@example.com" |
| App interaction | Open Spotify and play the Discover Weekly playlist |
| Settings change | Open Windows Settings and turn on Dark Mode |
| Data extraction | Read the stock price shown in the Bloomberg tab in Chrome |
| Complex browser | Open YouTube, search for "Adele Hello", and play the first video result |
| Verification | Check if the deployment succeeded — look at the Vercel dashboard in Chrome |
| Send email | Open Gmail, compose email to john@example.com, subject: Meeting Tomorrow, body: Confirming 2pm. Best regards. |
| Take screenshot | Take a screenshot |
Error Recovery
| Problem | Solution |
|---|---|
| Connection refused on :3847 | Start Clawd Cursor with the Start-Process command from Pre-flight Check, or run npm start in the clawd-cursor directory |
| Connection refused on :9222 | Start Chrome with CDP: Start-Process chrome -ArgumentList "--remote-debugging-port=9222" |
| Agent returns "busy" | Poll /status — wait for idle, or POST /abort |
| Task fails with no details | Check /logs for error entries |
| Task completes but wrong result | Rephrase with more specifics: exact app name, button text, field labels |
| Same task fails repeatedly | Break into smaller tasks (one action per task) |
| Safety confirmation pending | POST /confirm with {"approved": true} or {"approved": false} |
| Task hangs > 60 seconds | POST /abort, then retry with simpler phrasing |
How It Works — 5-Layer Pipeline
| Layer | What | Speed | Cost |
|---|---|---|---|
| 0: Browser Layer | URL detection → direct navigation | Instant | Free |
| 1: Action Router + Shortcuts | Regex + UI Automation + keyboard shortcuts | Instant | Free |
| 1.5: Smart Interaction | 1 LLM plan → CDP/UIDriver executes | ~2-5s | 1 LLM call |
| 2: Accessibility Reasoner | UI tree → text LLM decides | ~1s | Cheap |
| 3: Computer Use | Screenshot → vision LLM | ~5-8s | Expensive |
Layer 1 includes keyboard shortcuts — common actions execute as direct keystrokes (0 LLM calls).
Layers 0-1 handle 80%+ of tasks (free, instant). The vision model is a last resort only.
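The escalation idea behind the table, cheapest layer first, can be sketched as a loop over handlers. This illustrates the pattern under stated assumptions and is not Clawd Cursor's actual code. Each layer object (`name`, `tryHandle`) is hypothetical, and `tryHandle` resolves to `null` when that layer cannot handle the step.

```javascript
// Hypothetical layer interface: { name, tryHandle(step) -> Promise<result | null> }.
// Layers are tried in order, so cheap ones run first and the vision layer last.
async function runThroughLayers(step, layers) {
  for (const layer of layers) {
    const result = await layer.tryHandle(step);
    if (result !== null) return { layer: layer.name, result };
  }
  throw new Error(`No layer could handle: ${step}`);
}

// Illustrative stand-ins for two of the layers in the table.
const layers = [
  { name: '0: Browser Layer', tryHandle: async s => (/^go to \S+$/i.test(s) ? 'navigated' : null) },
  { name: '3: Computer Use', tryHandle: async () => 'handled by vision model' },
];
```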
Safety Tiers
| Tier | Actions | Behavior |
|---|---|---|
| 🟢 Auto | Navigation, reading, opening apps | Runs immediately |
| 🟡 Preview | Typing, form filling | Logs before executing |
| 🔴 Confirm | Sending messages, deleting | Pauses — ask the user before POST /confirm. Never self-approve. |
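An agent that tags its own planned actions could map them onto these tiers with a lookup table. The action-kind names below are hypothetical, and `purchase` comes from the Agent Autonomy Controls list. Defaulting unrecognized actions to the strictest tier (fail closed) is a design choice of this sketch.

```javascript
// Hypothetical action kinds mapped to the Safety Tiers table.
const TIER_BY_ACTION = new Map([
  ['navigate', 'auto'], ['read', 'auto'], ['open_app', 'auto'],
  ['type', 'preview'], ['fill_form', 'preview'],
  ['send_message', 'confirm'], ['delete', 'confirm'], ['purchase', 'confirm'],
]);

function tierFor(actionKind) {
  // Fail closed: anything unrecognized is treated as needing user confirmation.
  return TIER_BY_ACTION.get(actionKind) ?? 'confirm';
}
```

A Map is used instead of a plain object so inherited keys such as `constructor` cannot accidentally match.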
Security & Privacy
Network Isolation
- API binds to 127.0.0.1 only (not network accessible). Verify: netstat -an | findstr 3847 should show 127.0.0.1:3847
- Screenshots stay in memory and are never saved to disk (unless --debug)
- No telemetry, no analytics, no phone-home calls
Data Flow
- With Ollama (local): 100% offline — zero external network calls. No data leaves the machine.
- With cloud providers: screenshots/text are sent to the user's chosen provider API only. No data goes to skill authors, ClawHub, or third parties.
- OpenClaw users: credentials are auto-discovered from local config files; no keys are stored in the skill directory.
- The user controls data flow by choosing their provider. Ollama = fully private.
Agent Autonomy Controls
- 🟢 Auto actions (navigation, reading, opening apps) run without prompting
- 🟡 Preview actions (typing, form filling) are logged before executing
- 🔴 Confirm actions (sending messages, deleting, purchases) always pause for user approval
- Agents must ask the user before accessing sensitive apps (email, banking, messaging, passwords)
- Agents must never self-approve 🔴 Confirm actions
Setup (User Reference)
Initial setup is handled by the user. If Clawd Cursor is installed but not running, start it yourself using the exec tool:
Start-Process -FilePath "node" -ArgumentList "dist/index.js","start" -WorkingDirectory "<skill-directory>" -WindowStyle Hidden
Only ask the user if you cannot start it (e.g., Node.js is not installed or the build is missing). First-time installation, for reference:
git clone https://github.com/AmrDab/clawd-cursor.git
cd clawd-cursor
npm install && npm run build
npx clawd-cursor doctor # auto-detects and configures everything
npm start # starts on port 3847
macOS: Grant Accessibility permission to terminal: System Settings → Privacy & Security → Accessibility
| Provider | Setup | Cost |
|---|---|---|
| Ollama (free) | ollama pull <model> | $0 (fully offline) |
| Any cloud provider | Set AI_API_KEY=your-key | Varies by provider |
| OpenClaw users | Automatic — no setup needed | Uses configured provider |
Performance Optimization
Optimizations applied to reduce task execution latency and LLM API costs. Reference patch files are in perf/references/patches/.
Applied Optimizations
| # | Name | Impact |
|---|---|---|
| 1 | Screenshot hash cache | 90% fewer LLM calls on static screens |
| 2 | Parallel screenshot+a11y | 30-40% per-step latency cut |
| 3 | A11y context cache (2s TTL) | Eliminates redundant PowerShell spawns |
| 4 | Screenshot compression | 52% smaller payload (58KB vs 120KB) |
| 5 | Async debug writes | 94% less event loop blocking |
| 6 | Streaming LLM responses | 1-3s faster per LLM call |
| 7 | Trimmed system prompts | ~60% fewer prompt tokens |
| 8 | A11y tree filtering | Interactive elements only, 3000 char cap |
| 9 | Combined PowerShell script | 1 spawn instead of 3 |
| 10 | Taskbar cache (30s TTL) | Skip expensive taskbar query |
| 11 | Delay reduction | 50-150ms vs 200-1500ms |
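Optimizations 1, 3, and 10 are caching patterns. The sketch below shows the general idea, not the actual patches in perf/references/patches/. It has two parts:
- A content-hash cache that skips the expensive call when the screenshot bytes are unchanged. Using SHA-256 via Node's `crypto` is an assumption.
- A small time-to-live cache like the 2s and 30s TTLs in the table.

`analyze` and `load` are hypothetical stand-ins for the LLM call and the expensive accessibility or taskbar query.

```javascript
const { createHash } = require('crypto');

// Optimization 1 (sketch): reuse the previous answer when the screenshot bytes
// are identical. `analyze` is a hypothetical stand-in for the vision-LLM call.
function makeScreenshotCache(analyze) {
  let lastHash = null;
  let lastResult;
  return async function analyzeIfChanged(imageBuffer) {
    const hash = createHash('sha256').update(imageBuffer).digest('hex');
    if (hash === lastHash) return lastResult; // static screen: skip the LLM call
    lastResult = await analyze(imageBuffer);
    lastHash = hash;
    return lastResult;
  };
}

// Optimizations 3 and 10 (sketch): keep a loaded value for ttlMs milliseconds.
// `load` is a hypothetical stand-in for the expensive accessibility/taskbar query.
function makeTtlCache(load, ttlMs, now = () => Date.now()) {
  let value;
  let expiresAt = -Infinity;
  return async function get() {
    if (now() < expiresAt) return value;
    value = await load();
    expiresAt = now() + ttlMs;
    return value;
  };
}
```

Hashing exact bytes only helps when the screen is truly static. A blinking cursor or an on-screen clock changes the hash and forces a fresh call.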
Benchmarks (2560x1440)
| Metric | v0.3 (VNC) | v0.4 (Native) | v0.4.1+ (Optimized) |
|---|---|---|---|
| Screenshot capture | ~850ms | ~50ms | ~57ms |
| Screenshot size | ~200KB | ~120KB | ~58KB |
| A11y context (uncached) | N/A | ~600ms | ~462ms |
| A11y context (cached) | N/A | 0ms | 0ms (2s TTL) |
| Delays (per step) | N/A | 200-1500ms | 50-600ms |
| System prompt tokens | N/A | ~800 | ~300 |
Perf Tools
- perf/apply-optimizations.ps1: apply all patches
- perf/perf-test.ts: benchmark harness (npx ts-node perf/perf-test.ts)