Skill: desktop-control

When to Use

Use this skill when the user asks to:

Click somewhere on the screen
Move the mouse to a position
Type text into an application
Press keyboard shortcuts or hotkeys
Read what's on the current screen (accessibility tree)
Get information about the frontmost window
Automate desktop interactions
Control the computer (mouse, keyboard, screen)
Scroll up/down in an application
Drag and drop elements

IMPORTANT: This skill requires Accessibility permissions for the application that runs the scripts (your terminal or IDE). On macOS, go to System Settings > Privacy & Security > Accessibility and enable that application.

Bundled Scripts

Script	Type	Description
`scripts/mouse.py`	Python	Mouse movement, clicking, dragging, scrolling
`scripts/keyboard.py`	Python	Text typing, key presses, hotkeys
`scripts/screen.py`	Python	Screen info, capture, accessibility tree reading

All scripts auto-install pyautogui if needed.
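How the scripts perform the auto-install is not shown here. A minimal sketch of one common pattern, assuming they check whether the package is importable and fall back to `pip`, might look like this. The helper name `ensure_package` and its `installer` hook are hypothetical and not taken from the bundled scripts.

```python
import importlib.util
import subprocess
import sys


def ensure_package(name, installer=None):
    """Install `name` with pip if it cannot be imported.

    Returns True if an install was attempted, False if the package
    was already available. `installer` is a hypothetical hook so the
    pip call can be swapped out (for example, in tests).
    """
    if importlib.util.find_spec(name) is not None:
        return False  # already importable, nothing to do
    if installer is None:
        def installer(pkg):
            # Use the current interpreter's pip so the package lands
            # in the same environment the script runs in.
            subprocess.check_call([sys.executable, "-m", "pip", "install", pkg])
    installer(name)
    return True
```

Under this assumption, a script would call `ensure_package("pyautogui")` once before its `import pyautogui` line.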

Mouse Control

Input Parameters

Parameter	Required	Description	Example
`action`	Yes	`move`, `click`, `doubleclick`, `rightclick`, `drag`, `scroll`, `position`	click
`x`	All except `position`; optional for `scroll`	X coordinate (pixels from left)	500
`y`	All except `position`; optional for `scroll`	Y coordinate (pixels from top)	300
`button`	No	Mouse button: `left` (default), `right`, `middle`	left
`to_x`	For drag	Destination X coordinate	700
`to_y`	For drag	Destination Y coordinate	400
`amount`	For scroll	Scroll amount (positive=up, negative=down)	-3

Script Usage

# Move mouse
python3 skills/desktop-control/scripts/mouse.py move --x 500 --y 300

# Click at position
python3 skills/desktop-control/scripts/mouse.py click --x 500 --y 300

# Double click
python3 skills/desktop-control/scripts/mouse.py doubleclick --x 500 --y 300

# Right click
python3 skills/desktop-control/scripts/mouse.py rightclick --x 500 --y 300

# Drag from one position to another
python3 skills/desktop-control/scripts/mouse.py drag --x 100 --y 100 --to-x 500 --to-y 500

# Scroll down 3 clicks
python3 skills/desktop-control/scripts/mouse.py scroll --amount -3

# Scroll up 5 clicks at specific position
python3 skills/desktop-control/scripts/mouse.py scroll --x 500 --y 300 --amount 5

# Get current mouse position
python3 skills/desktop-control/scripts/mouse.py position
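When these commands are issued from another Python program, it can help to build the argument list in one place and check the required parameters first. The sketch below is an assumption-laden wrapper, not part of the skill: `mouse_command` and `REQUIRED` are hypothetical names, and the required-parameter rules are inferred from the table and usage lines above.

```python
MOUSE_SCRIPT = "skills/desktop-control/scripts/mouse.py"

# Parameters each action needs, inferred from the table and usage lines above.
REQUIRED = {
    "move": ("x", "y"),
    "click": ("x", "y"),
    "doubleclick": ("x", "y"),
    "rightclick": ("x", "y"),
    "drag": ("x", "y", "to_x", "to_y"),
    "scroll": ("amount",),
    "position": (),
}


def mouse_command(action, **params):
    """Build the argv list for one mouse.py call (does not run it)."""
    if action not in REQUIRED:
        raise ValueError(f"unknown action: {action}")
    missing = [name for name in REQUIRED[action] if params.get(name) is None]
    if missing:
        raise ValueError(f"{action} requires: {', '.join(missing)}")
    cmd = ["python3", MOUSE_SCRIPT, action]
    for name in ("x", "y", "to_x", "to_y", "amount"):
        if params.get(name) is not None:
            # to_x becomes --to-x, matching the flags used above
            cmd += ["--" + name.replace("_", "-"), str(params[name])]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; for example, `mouse_command("drag", x=100, y=100, to_x=500, to_y=500)` reproduces the drag command shown above.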

Keyboard Control

Input Parameters

Parameter	Required	Description	Example
`action`	Yes	`type`, `press`, `hotkey`	type
`text`	For type	Text to type	Hello World
`key`	For press	Key name to press	enter
`keys`	For hotkey	Key combination, plus-separated	command+c
`interval`	No	Delay between keystrokes in seconds (default: 0.02)	0.05

Script Usage

# Type text
python3 skills/desktop-control/scripts/keyboard.py type --text "Hello World"

# Type slowly
python3 skills/desktop-control/scripts/keyboard.py type --text "Hello" --interval 0.1

# Press a single key
python3 skills/desktop-control/scripts/keyboard.py press --key enter
python3 skills/desktop-control/scripts/keyboard.py press --key tab
python3 skills/desktop-control/scripts/keyboard.py press --key escape

# Keyboard shortcuts (hotkeys)
python3 skills/desktop-control/scripts/keyboard.py hotkey --keys "command+c"
python3 skills/desktop-control/scripts/keyboard.py hotkey --keys "command+shift+s"
python3 skills/desktop-control/scripts/keyboard.py hotkey --keys "alt+tab"
python3 skills/desktop-control/scripts/keyboard.py hotkey --keys "command+space"

Common Key Names

enter, return, tab, space, backspace, delete, escape, up, down, left, right, home, end, pageup, pagedown, f1-f12, command, ctrl, alt, shift, capslock
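The plus-separated `keys` value has to be split into individual key names before the combination can be sent. A minimal sketch of that step follows, assuming `keyboard.py` hands the parts to `pyautogui.hotkey`; the script's internals are not shown in this document, and `parse_hotkey` is a hypothetical helper.

```python
def parse_hotkey(keys):
    """Split a plus-separated combination such as "command+shift+s"."""
    parts = [part.strip().lower() for part in keys.split("+")]
    if not all(parts):
        # Catches inputs like "command+" or "command++c"
        raise ValueError(f"empty key name in hotkey: {keys!r}")
    return parts
```

With pyautogui installed, `pyautogui.hotkey(*parse_hotkey("command+c"))` would press the keys in order and release them in reverse. Note that this simple split cannot express a literal `+` key.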

Screen Reading

Input Parameters

Parameter	Required	Description	Example
`action`	Yes	`info`, `capture`, `read-ui`	read-ui
`output`	For capture	Screenshot output path	/tmp/screen.png
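`x`, `y`, `width`, `height`	For capture region	Region to capture: top-left corner and size in pixels	0, 0, 800, 600
`depth`	No	Maximum depth of the accessibility tree to read (for `read-ui`)	3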

Script Usage

# Get screen size and mouse position
python3 skills/desktop-control/scripts/screen.py info

# Take a screenshot
python3 skills/desktop-control/scripts/screen.py capture --output /tmp/screen.png

# Capture a specific region
python3 skills/desktop-control/scripts/screen.py capture --x 0 --y 0 --width 800 --height 600 --output /tmp/region.png

# Read the accessibility tree of the frontmost application (MOST USEFUL)
python3 skills/desktop-control/scripts/screen.py read-ui

# Read accessibility tree with depth limit
python3 skills/desktop-control/scripts/screen.py read-ui --depth 3

The read-ui command uses AppleScript to read the accessibility tree of the frontmost application, returning window titles, buttons, text fields, menus, and other UI elements. This is the primary way to understand what's on screen before interacting.
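To illustrate what a depth limit does to a tree of UI elements, here is a small sketch over a hypothetical nested structure. The real output format of `read-ui` is not documented here, so the `role`, `title`, and `children` keys, the sample `window`, and the `render_tree` helper are all assumptions made for illustration.

```python
def render_tree(node, max_depth=None, depth=0):
    """Return indented lines for `node`, keeping only `max_depth` levels."""
    if max_depth is not None and depth >= max_depth:
        return []
    label = node["role"]
    if node.get("title"):
        label += f' "{node["title"]}"'
    lines = ["  " * depth + label]
    for child in node.get("children", []):
        lines.extend(render_tree(child, max_depth, depth + 1))
    return lines


# Hypothetical tree; real read-ui output may be structured differently.
window = {
    "role": "window",
    "title": "Untitled",
    "children": [
        {"role": "toolbar", "children": [{"role": "button", "title": "Save"}]},
        {"role": "text field", "title": "Search"},
    ],
}

for line in render_tree(window, max_depth=2):
    print(line)
```

With `max_depth=2` the nested Save button is omitted, which is the kind of trimming a depth limit is useful for when an application exposes a very large tree.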

Typical Workflow

1. Read the screen to understand what's visible:

python3 skills/desktop-control/scripts/screen.py read-ui

2. Identify targets from the accessibility tree output.

3. Interact using mouse/keyboard:

python3 skills/desktop-control/scripts/mouse.py click --x 500 --y 300
python3 skills/desktop-control/scripts/keyboard.py type --text "search query"
python3 skills/desktop-control/scripts/keyboard.py press --key enter

4. Verify by reading the screen again.
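The read, interact, verify loop above can be sketched as a small Python driver. This is only an illustration under stated assumptions: `run_script` and `search` are hypothetical helpers, the scripts are assumed to print their results to stdout, and the `runner` argument exists so the subprocess call can be replaced when trying the code without touching the desktop.

```python
import subprocess

SCRIPTS = "skills/desktop-control/scripts"


def run_script(script, *args, runner=subprocess.run):
    """Run one bundled script and return its captured stdout."""
    cmd = ["python3", f"{SCRIPTS}/{script}", *map(str, args)]
    result = runner(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def search(query, x, y, runner=subprocess.run):
    """Read the UI, click, type, press Enter, then read again to verify."""
    before = run_script("screen.py", "read-ui", runner=runner)
    run_script("mouse.py", "click", "--x", x, "--y", y, runner=runner)
    run_script("keyboard.py", "type", "--text", query, runner=runner)
    run_script("keyboard.py", "press", "--key", "enter", runner=runner)
    after = run_script("screen.py", "read-ui", runner=runner)
    return before, after
```

Comparing `before` and `after` is left to the caller, since what counts as a successful change depends on the application being automated.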

Example Requests

click on the search bar
type "hello" into the text field
press command+s to save
what's on the screen right now
read the UI elements of the current window
move the mouse to the center of the screen
scroll down in this window
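A request such as "move the mouse to the center of the screen" needs the screen size turned into coordinates first. A small sketch follows; because the output format of `screen.py info` is not documented here, the width and height are passed in directly, and `center_of` is a hypothetical helper.

```python
def center_of(width, height):
    """Return the integer pixel coordinates of the screen center."""
    return width // 2, height // 2


x, y = center_of(1920, 1080)
print(f"python3 skills/desktop-control/scripts/mouse.py move --x {x} --y {y}")
```

For a 1920 by 1080 screen this yields a move to (960, 540).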
