# Desktop Automation Skill v2.0


Install from the ClawHub registry:

```bash
npx skills add JordaneParis/desktop-automation-ultra
```

License: MIT

Complete desktop automation for Windows/macOS/Linux. Zero-error edition.


## ⚠️ Privacy & Security

CRITICAL: This skill captures ALL keyboard and mouse events.

  • NEVER record while entering passwords, credit cards, or secrets
  • Recorded macros are stored as JSON in the recorded_macro/ directory
  • Always use dry_run=true to test before actual execution
  • Store macros in secure locations only
  • Keep safe mode enabled (it is on by default)

## 🎯 What It Does

Automate desktop interactions without APIs:

  • ✅ Click, type, drag, scroll
  • ✅ Capture screenshots
  • ✅ Recognize images (OpenCV template matching)
  • ✅ Extract text (Tesseract OCR)
  • ✅ Record and replay macros
  • ✅ Find windows by title
  • ✅ Clipboard operations
  • ✅ Safe mode with dry_run for testing

## 🔐 Safety Features (Built-In)

1. Safe Mode (Default: ON)

Blocks dangerous actions when enabled:

  • type, press_key, click, and drag are monitored
  • Parameters are scanned for dangerous patterns: `rm `, `del `, `C:\Windows\`, `/etc/`, `sudo`, etc.
  • Blocked actions are logged
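A minimal sketch of this scan, using illustrative names (`is_blocked`, `DANGEROUS_PATTERNS`) rather than the skill's actual internals:

```python
# Hypothetical safe-mode scan: patterns and monitored actions taken from
# this document; the function itself is a sketch, not the skill's code.
DANGEROUS_PATTERNS = ["rm ", "del ", "C:\\Windows\\", "/etc/", "sudo"]
DANGEROUS_ACTIONS = {"type", "press_key", "click", "drag"}

def is_blocked(action: str, params: dict, safe_mode: bool = True) -> bool:
    """Return True when safe mode should block (and log) this action."""
    if not safe_mode or action not in DANGEROUS_ACTIONS:
        return False
    # Scan every string-valued parameter for a dangerous substring.
    return any(
        pattern in value
        for value in params.values() if isinstance(value, str)
        for pattern in DANGEROUS_PATTERNS
    )

print(is_blocked("type", {"text": "sudo rm -rf /"}))  # True: blocked
print(is_blocked("type", {"text": "Hello World"}))    # False: allowed
```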

2. Dry-Run Mode

All actions support dry_run=true:

  • Action is logged but NOT executed
  • Use for testing before running real automation
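Conceptually, a dry-run guard can look like the following (`run_action` is a hypothetical name, not the skill's API):

```python
# Illustrative dry-run guard: the action is logged and echoed back,
# but nothing touches the desktop.
import logging

logging.basicConfig(level=logging.INFO)

def run_action(action: str, params: dict, dry_run: bool = False) -> dict:
    if dry_run:
        logging.info("DRY RUN: %s %s", action, params)
        return {"status": "dry_run", "action": action}
    # ... real execution (pyautogui etc.) would happen here ...
    return {"status": "ok", "action": action}

result = run_action("click", {"x": 100, "y": 100}, dry_run=True)
print(result["status"])  # dry_run
```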

3. Audit Logging

Every action is logged to `~/.openclaw/skills/desktop-automation-logs/automation_YYYY-MM-DD.log`
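The file name and line layout can be reproduced like this (format inferred from the log examples in this document; function names are illustrative):

```python
# Sketch of the audit-log conventions: one file per day, one
# "[timestamp] [LEVEL] ActionManager: message" line per action.
import time
from pathlib import Path

def audit_log_path(base: str = "~/.openclaw/skills/desktop-automation-logs") -> Path:
    """Daily log file, e.g. .../automation_2026-03-15.log."""
    return Path(base).expanduser() / time.strftime("automation_%Y-%m-%d.log")

def format_entry(level: str, message: str) -> str:
    """One audit line in the layout shown in the Logging section."""
    ts = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"[{ts}] [{level}] ActionManager: {message}"

print(format_entry("INFO", "Clicked at (100, 50) with left button"))
```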

4. Thread Safety

All modules use locks to prevent race conditions.
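The pattern is ordinary lock-guarded access; a generic sketch (not the skill's code):

```python
# One lock serialises desktop actions issued from multiple threads, so
# concurrent callers cannot interleave clicks or keystrokes.
import threading

_action_lock = threading.Lock()
counter = 0  # stands in for shared automation state

def guarded_action() -> None:
    """Simulated desktop action; the lock makes the update atomic."""
    global counter
    with _action_lock:
        counter += 1

threads = [threading.Thread(target=guarded_action) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 8
```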


## 📦 Installation

1. Extract Files

Place desktop-automation-ultra-local/ in:

  • Windows: C:\Users\<User>\.openclaw\workspace\skills\
  • Linux/macOS: ~/.openclaw/workspace/skills/

2. Install Dependencies

```bash
pip install -r requirements.txt
```

3. Optional: Tesseract for OCR

Required for find_text_on_screen and the other OCR actions. Install the Tesseract binary for your platform, for example:

  • Linux (Debian/Ubuntu): sudo apt-get install tesseract-ocr
  • macOS: brew install tesseract
  • Windows: use an official installer (e.g. the UB Mannheim builds)

4. Restart OpenClaw

```bash
openclaw gateway restart
```

## 🚀 Quick Start

Basic Click

```yaml
action: click
params:
  x: 100
  y: 100
  dry_run: true  # Test first!
```

Type Text

```yaml
action: type
params:
  text: "Hello World"
  interval: 0.05  # Delay between keys
  dry_run: false
```

Find Image

```yaml
action: find_image
params:
  template_path: "templates/button.png"
  confidence: 0.95
```

Extract Text (OCR)

```yaml
action: read_text_ocr
params:
  lang: "fra"  # French
```

## 📖 Core Actions

Mouse & Keyboard

| Action | Parameters | Returns |
| --- | --- | --- |
| `click` | `x, y, button="left", dry_run` | `{status, x, y}` |
| `type` | `text, interval=0.05, dry_run` | `{status, text}` |
| `press_key` | `key, dry_run` | `{status, key}` |
| `move_mouse` | `x, y, duration=0.5, dry_run` | `{status, x, y}` |
| `scroll` | `amount=5, dry_run` | `{status, amount}` |
| `drag` | `start_x, start_y, end_x, end_y, duration=0.5, dry_run` | `{status}` |
| `copy_to_clipboard` | `text, dry_run` | `{status}` |
| `paste_from_clipboard` | `dry_run` | `{status, length}` |

Screenshots & Windows

| Action | Parameters | Returns |
| --- | --- | --- |
| `screenshot` | `path="~/Desktop/screenshot.png", dry_run` | `{status, path}` |
| `get_active_window` | `dry_run` | `{status, title, x, y, width, height}` |
| `list_windows` | `dry_run` | `{status, windows[], count}` |
| `activate_window` | `title_substring, dry_run` | `{status, title}` |

Image Recognition (requires OpenCV)

| Action | Parameters | Returns |
| --- | --- | --- |
| `find_image` | `template_path, confidence=0.9, dry_run` | `{status, x, y, confidence}` |
| `find_image_multiscale` | `template_path, confidence, scale_factors, dry_run` | `{status, x, y, confidence, scale}` |
| `wait_for_image` | `template_path, timeout=30.0, interval=0.5, confidence=0.9, dry_run` | `{status, x, y, confidence}` |

OCR / Text Recognition (requires Tesseract)

| Action | Parameters | Returns |
| --- | --- | --- |
| `find_text_on_screen` | `text, lang="fra", dry_run` | `{status, locations[], count}` |
| `find_all_text_on_screen` | `text, lang="fra", dry_run` | `{status, data[], count}` |
| `read_text_ocr` | `lang="fra", dry_run` | `{status, text, length}` |
| `read_text_region` | `x, y, width, height, lang="fra", dry_run` | `{status, text, length}` |
| `extract_screen_data` | `region={}, output_format="json", lang="fra", dry_run` | `{status, data[], count}` |

Macros

| Action | Parameters | Returns |
| --- | --- | --- |
| `play_macro` | `macro_path, speed=1.0, dry_run` | `{status, executed, total, errors[]}` |
| `stop_macro` | — | `{status}` |
| `play_macro_with_subroutines` | `macro_path, speed=1.0, sub_macros_dir, dry_run` | `{status, executed, total, errors[]}` |

Safety Management

| Action | Parameters | Returns |
| --- | --- | --- |
| `set_safe_mode` | `enabled=true` | `{status, safe_mode}` |
| `get_safety_status` | — | `{status, safe_mode_enabled, dangerous_patterns, dangerous_actions[]}` |

## 📝 Macro Format

Recorded macros are JSON with this structure:

```json
{
  "events": [
    {
      "action": "click",
      "params": {"x": 100, "y": 50},
      "wait": 500
    },
    {
      "action": "type",
      "params": {"text": "Hello"},
      "wait": 200
    },
    {
      "action": "press_key",
      "params": {"key": "return"},
      "wait": 100
    }
  ]
}
```

  • action — action name
  • params — action parameters
  • wait — milliseconds to wait before the next action
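A minimal player for this format might look like the following sketch (`play_macro` here is illustrative, not the skill's macro_player.py):

```python
# Sketch: walk the "events" list, honouring each per-event "wait" (in
# milliseconds) scaled by a playback speed factor. Real dispatch of the
# action is stubbed out.
import json
import time

def play_macro(macro: dict, speed: float = 1.0, dry_run: bool = True) -> dict:
    executed, errors = 0, []
    for event in macro["events"]:
        if not dry_run:
            pass  # dispatch event["action"] with event["params"] here
        executed += 1
        time.sleep(event.get("wait", 0) / 1000.0 / speed)  # ms -> s, scaled
    return {"status": "ok", "executed": executed,
            "total": len(macro["events"]), "errors": errors}

macro = json.loads("""
{"events": [
  {"action": "click", "params": {"x": 100, "y": 50}, "wait": 10},
  {"action": "type", "params": {"text": "Hello"}, "wait": 10}
]}
""")
print(play_macro(macro, speed=2.0)["executed"])  # 2
```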

## 🔧 Advanced: Mouse Move Debouncing

To avoid recording hundreds of move_mouse events during a smooth drag, the recorder debounces mouse movement:

  • While the mouse is moving, move events are suppressed
  • Once the mouse has been still for N seconds (default: 1 second), the final position is recorded
  • This reduces macro size dramatically while preserving the intended end positions
  • The debounce time is configurable via the GUI (0.1–10 seconds)

Example:

  • Fast horizontal line → 1 move_mouse event (end coordinates)
  • Slow, stop-and-go → multiple move_mouse events (one per "stop")
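The rule above can be sketched as a pure function over timestamped move events (illustrative only; the actual recorder works on a live event stream):

```python
# Debouncing sketch: keep a move event only when the gap to the NEXT move
# exceeds the debounce window, i.e. the pointer "stopped" there. The last
# event is always kept (the intended end position).
def debounce_moves(moves, debounce_s=1.0):
    """moves: list of (timestamp_s, x, y); returns recorded (x, y) stops."""
    kept = []
    for i, (t, x, y) in enumerate(moves):
        is_last = i == len(moves) - 1
        if is_last or moves[i + 1][0] - t >= debounce_s:
            kept.append((x, y))  # pointer rested here long enough
    return kept

# A fast 5-point horizontal sweep collapses to its end position:
sweep = [(0.0, 0, 0), (0.1, 10, 0), (0.2, 20, 0), (0.3, 30, 0), (0.4, 40, 0)]
print(debounce_moves(sweep))  # [(40, 0)]
```

A slow, stop-and-go drag (gaps longer than the window) keeps one event per stop, matching the example above.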

## 🧪 Testing

Run the unit test suite:

```bash
python scripts/test_automation.py
```

Expected output:

```
test_dry_run_click ... ok
test_get_active_window ... ok
test_safe_mode_blocks_dangerous ... ok
...
Ran 13 tests
OK
```

## 📊 Logging

All actions are logged to `~/.openclaw/skills/desktop-automation-logs/automation_YYYY-MM-DD.log`

Example:

```
[2026-03-15 10:23:45] [INFO] ActionManager: ActionManager initialized with safe_mode=True
[2026-03-15 10:23:46] [INFO] ActionManager: Clicked at (100, 50) with left button
[2026-03-15 10:23:47] [INFO] ActionManager: Typed: Hello World
```

## ⚙️ Configuration

Environment Variables

```bash
# Override log directory
export AUTOMATION_LOG_DIR=~/my_logs

# Disable safe mode globally (NOT recommended)
export AUTOMATION_SAFE_MODE=false
```
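A sketch of how these variables might be resolved (variable names and defaults are from this document; the parsing logic itself is an assumption):

```python
# Hypothetical config loader: safe mode stays ON unless the variable is
# explicitly set to "false"; the log directory falls back to the default.
import os
from pathlib import Path

def load_config(env=None) -> dict:
    env = os.environ if env is None else env
    log_dir = Path(
        env.get("AUTOMATION_LOG_DIR", "~/.openclaw/skills/desktop-automation-logs")
    ).expanduser()
    safe_mode = env.get("AUTOMATION_SAFE_MODE", "true").strip().lower() != "false"
    return {"log_dir": log_dir, "safe_mode": safe_mode}

print(load_config({})["safe_mode"])                                  # True
print(load_config({"AUTOMATION_SAFE_MODE": "false"})["safe_mode"])   # False
```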

## 🐛 Troubleshooting

"pyautogui failsafe triggered"

PyAutoGUI deliberately aborts automation when the mouse is moved into a screen corner; this is its built-in emergency stop. Slam the mouse into a corner to halt a runaway macro, and keep playback away from the corners if the failsafe fires unintentionally.

OCR returns empty text

  • Ensure Tesseract is installed correctly
  • Check image quality (high contrast helps)
  • Try read_text_ocr instead of find_text_on_screen

Image recognition not finding template

  • Ensure template image exists and is correct format (PNG, JPG)
  • Try lower confidence threshold (e.g., 0.85 instead of 0.95)
  • Use find_image_multiscale to detect at different scales

Actions blocked by safe mode

This is intentional. To run dangerous actions:

```yaml
action: set_safe_mode
params:
  enabled: false
```

Then execute your action, and re-enable safe mode immediately afterwards:

```yaml
action: set_safe_mode
params:
  enabled: true
```

## 📄 License

MIT License. See LICENSE file.


## 📚 File Structure

```
desktop-automation-ultra-local/
├── SKILL.md                          (This file)
├── requirements.txt                  (Python dependencies)
├── lib/
│   ├── actions.py                   (Core click/type/drag actions)
│   ├── image_recognition.py         (OpenCV template matching)
│   ├── ocr_engine.py                (Tesseract OCR)
│   ├── macro_player.py              (Record/playback macros)
│   ├── safety_manager.py            (Safe mode, blocking)
│   └── utils.py                     (Logging, helpers)
├── scripts/
│   └── test_automation.py           (Unit tests)
└── recorded_macro/                  (Output: saved macros)
```

## Validation Checklist

  • All modules have proper error handling
  • Thread safety implemented (locks)
  • Safe mode enabled by default
  • Dry-run mode on all actions
  • Comprehensive logging
  • Unit tests (13 tests)
  • UTF-8 encoding for all text
  • No hardcoded paths (uses expanduser)
  • Graceful fallbacks for missing dependencies
  • Documentation complete

Status: PRODUCTION READY


Last updated: 2026-03-15 · Version: 2.0.0

Registry SourceRecently Updated