ubuntu-desktop-control

Control the Ubuntu desktop GUI with semantic element targeting, using the AT-SPI accessibility tree with an OCR fallback. Intended for wallet automation, browser extensions, and other GUI tasks that Playwright can't reach.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "ubuntu-desktop-control" with this command: npx skills add lommaj/ubuntu-desktop-control/lommaj-ubuntu-desktop-control-ubuntu-desktop-control

Desktop Control Skill

Control the desktop GUI using semantic element targeting. Find and click UI elements by name instead of coordinates.

Key Features:

  • AT-SPI - Primary method using accessibility tree (knows element roles, states, actions)
  • OCR Fallback - Tesseract-based text finding when AT-SPI can't find the element
  • Wait Utilities - Poll for elements to appear with exponential backoff
  • Click Verification - Optional OCR check of the target before clicking

Prerequisites

Install dependencies:

bash install.sh

Or manually:

# System packages
sudo apt-get install -y xdotool scrot imagemagick \
    at-spi2-core libatk-adaptor python3-gi gir1.2-atspi-2.0 \
    tesseract-ocr tesseract-ocr-eng python3-pip

# Python packages
pip3 install -r requirements.txt

For headless Xvfb sessions, enable accessibility support before launching the applications you want to control:

export GTK_MODULES=gail:atk-bridge
export QT_LINUX_ACCESSIBILITY_ALWAYS_ON=1
/usr/lib/at-spi2-core/at-spi-bus-launcher &
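The exports above only reach applications started from that same shell. If you launch target apps from Python instead, a minimal sketch could pass the same variables explicitly. `accessible_env` and `launch` are hypothetical helpers (not part of desktop.py), and `:10.0` simply mirrors the skill's default display.

```python
import os
import subprocess
from typing import Mapping, Optional


def accessible_env(display: str = ":10.0",
                   base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy an environment and enable AT-SPI for GTK and Qt apps."""
    env = dict(os.environ if base is None else base)
    env["DISPLAY"] = display
    # Load the ATK bridge into GTK apps (gail matters for GTK 2 apps).
    env["GTK_MODULES"] = "gail:atk-bridge"
    # Ask Qt to expose its accessibility tree unconditionally.
    env["QT_LINUX_ACCESSIBILITY_ALWAYS_ON"] = "1"
    return env


def launch(argv: list) -> subprocess.Popen:
    """Start an application, e.g. launch(["gnome-calculator"])."""
    return subprocess.Popen(argv, env=accessible_env())
```

The bus launcher still has to be running; this only covers the per-application variables.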

Commands

All commands use DISPLAY=:10.0 by default. Override it with the --display flag.


find-element

Find UI element via AT-SPI with OCR fallback.

python3 scripts/desktop.py find-element --name "Confirm" [--role button] [--app Firefox]
| Parameter | Type | Required | Description |
|---|---|---|---|
| --name, -n | string | No | Element name/text to find |
| --role, -r | string | No | Element role (button, entry, etc.) |
| --app, -a | string | No | Application name filter |
| --all | flag | No | Find all matches |
| --clickable | flag | No | Only clickable elements |
| --max-results | int | No | Maximum results (default: 50) |

Returns:

{
  "element": {
    "name": "Confirm",
    "bounds": { "x": 400, "y": 300, "width": 100, "height": 30 },
    "center": { "x": 450, "y": 315 },
    "role": "push button",
    "source": "atspi",
    "visible": true,
    "enabled": true,
    "clickable": true
  }
}
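The center in this output follows from the bounds: x + width/2 and y + height/2. A small sketch for a caller that reads find-element's JSON and decides where to click; `click_point` is a hypothetical helper, and rounding down for odd sizes is an assumption.

```python
import json
from typing import Tuple


def click_point(element: dict) -> Tuple[int, int]:
    """Return the (x, y) point to click for an element from find-element."""
    if not (element.get("visible", True) and element.get("enabled", True)):
        raise ValueError(f"{element.get('name')!r} is hidden or disabled")
    b = element["bounds"]
    # Top-left corner plus half the width and height (integer division).
    return b["x"] + b["width"] // 2, b["y"] + b["height"] // 2


result = json.loads('{"element": {"name": "Confirm", '
                    '"bounds": {"x": 400, "y": 300, "width": 100, "height": 30}, '
                    '"visible": true, "enabled": true}}')
print(click_point(result["element"]))  # (450, 315)
```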

find-text

Find text on screen via OCR only.

python3 scripts/desktop.py find-text "I have an existing wallet" [--exact] [--all]
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Text to find |
| --exact | flag | No | Require exact match |
| --case-sensitive | flag | No | Case-sensitive matching |
| --all | flag | No | Find all occurrences |
| --max-results | int | No | Maximum results (default: 50) |

Returns:

{
  "match": {
    "name": "I have an existing wallet",
    "bounds": { "x": 200, "y": 400, "width": 180, "height": 20 },
    "center": { "x": 290, "y": 410 },
    "source": "ocr",
    "confidence": 95.2
  }
}
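How desktop.py matches phrases is not documented here. As an illustration only, this sketch assumes word-level OCR output shaped like pytesseract's `image_to_data(..., output_type=Output.DICT)` (parallel lists under "text", "left", "top", "width", "height", "conf") and shows one way --exact and --case-sensitive could behave, merging the matched words into a single box. `find_phrase` and its loose-matching rule are hypothetical. It does not restrict a match to one text line, which a real matcher would likely do with Tesseract's line_num field.

```python
from typing import Optional


def find_phrase(data: dict, phrase: str, exact: bool = False,
                case_sensitive: bool = False) -> Optional[dict]:
    """Find `phrase` in word-level OCR output and merge the word boxes."""
    norm = (lambda s: s) if case_sensitive else (lambda s: s.lower())
    target = norm(phrase).split()
    if not target:
        return None
    # Indices of non-empty words, in Tesseract's reading order.
    words = [i for i, t in enumerate(data["text"]) if t.strip()]
    for start in range(len(words) - len(target) + 1):
        idx = words[start:start + len(target)]
        seen = [norm(data["text"][i].strip()) for i in idx]
        if exact:
            ok = seen == target
        else:
            # Loose match: each query word occurs inside the OCR word,
            # which tolerates trailing punctuation such as "wallet.".
            ok = all(t in s for t, s in zip(target, seen))
        if not ok:
            continue
        left = min(data["left"][i] for i in idx)
        top = min(data["top"][i] for i in idx)
        right = max(data["left"][i] + data["width"][i] for i in idx)
        bottom = max(data["top"][i] + data["height"][i] for i in idx)
        # Average the per-word confidences (Tesseract reports 0-100).
        conf = sum(float(data["conf"][i]) for i in idx) / len(idx)
        return {
            "name": " ".join(data["text"][i].strip() for i in idx),
            "bounds": {"x": left, "y": top,
                       "width": right - left, "height": bottom - top},
            "center": {"x": (left + right) // 2, "y": (top + bottom) // 2},
            "source": "ocr",
            "confidence": round(conf, 1),
        }
    return None
```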

click-element

Click element by name/role (finds element first, then clicks at center).

python3 scripts/desktop.py click-element --name "Next" [--role button] [--verify]
| Parameter | Type | Required | Description |
|---|---|---|---|
| --name, -n | string | No | Element name/text |
| --role, -r | string | No | Element role |
| --app, -a | string | No | Application name filter |
| --right | flag | No | Right click |
| --double | flag | No | Double click |
| --verify | flag | No | OCR verify before click |

click-element requires at least one selector: --name or --role. With --verify, OCR must be available and there must be text to look for on screen, which normally comes from --name.

Returns:

{
  "clicked": {
    "element": { "name": "Next", "..." },
    "x": 450,
    "y": 315,
    "button": "left",
    "double": false
  }
}
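The output records the click point, button, and double-click flag. Since xdotool is in the prerequisites, one plausible way to issue such a click is sketched below; whether desktop.py builds its command this way is an assumption, and `click_argv` and `click` are hypothetical helpers.

```python
import os
import subprocess

# X11 button numbers used by xdotool: 1 = left, 2 = middle, 3 = right.
BUTTONS = {"left": "1", "middle": "2", "right": "3"}


def click_argv(x: int, y: int, button: str = "left", double: bool = False) -> list:
    """Build one chained xdotool command: move the pointer, then click."""
    argv = ["xdotool", "mousemove", str(x), str(y), "click"]
    if double:
        argv += ["--repeat", "2"]
    argv.append(BUTTONS[button])
    return argv


def click(x: int, y: int, display: str = ":10.0", **kwargs) -> None:
    """Run the command against the given X display."""
    env = dict(os.environ, DISPLAY=display)
    subprocess.run(click_argv(x, y, **kwargs), env=env, check=True)
```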

wait-for

Wait for element or text to appear (with timeout and exponential backoff).

python3 scripts/desktop.py wait-for --name "Success" --timeout 30
python3 scripts/desktop.py wait-for --text "Transaction complete" --timeout 60
python3 scripts/desktop.py wait-for --name "Loading" --gone --timeout 30
| Parameter | Type | Required | Description |
|---|---|---|---|
| --name, -n | string | No | Element name (AT-SPI + OCR) |
| --role, -r | string | No | Element role (AT-SPI only) |
| --app, -a | string | No | Application filter |
| --text, -t | string | No | Text to find (OCR only) |
| --exact | flag | No | Exact text match |
| --gone | flag | No | Wait until disappears |
| --timeout | float | No | Timeout in seconds (default: 30) |

Use either --text or element selectors (--name, --role, --app) for a single call, not both. For element waits (with or without --gone), provide at least one of --name or --role.
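These two rules can be written as a small check, for example in a wrapper that validates arguments before calling the script. `check_wait_args` is hypothetical and not part of desktop.py; --gone is omitted because the rules are the same with or without it.

```python
from typing import Optional


def check_wait_args(text: Optional[str] = None, name: Optional[str] = None,
                    role: Optional[str] = None, app: Optional[str] = None) -> str:
    """Return "text" or "element" for a valid wait-for call, else raise."""
    has_selectors = any(v is not None for v in (name, role, app))
    if text is not None and has_selectors:
        raise ValueError("use either --text or --name/--role/--app, not both")
    if text is not None:
        return "text"
    # --app alone only narrows the search; it cannot identify an element.
    if name is None and role is None:
        raise ValueError("element waits need --name or --role")
    return "element"
```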

Returns:

{ "found": { "name": "Success", "..." } }
// or
{ "gone": true, "name": "Loading" }
// or
{ "error": "Element not found within 30s", "timeout": true }
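A minimal sketch of polling with exponential backoff that produces result shapes like those above. The actual delays in desktop.py are not documented: the 0.25 s starting delay, doubling, and 2 s cap are assumptions, and `probe` stands in for one AT-SPI or OCR lookup that returns a match or None. The clock and sleep are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Optional


def wait_for(probe: Callable[[], Optional[dict]], timeout: float = 30.0,
             gone: bool = False, initial_delay: float = 0.25,
             max_delay: float = 2.0, factor: float = 2.0,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until `probe` finds a match (or, with gone=True, stops finding one)."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        match = probe()  # None means "not on screen right now"
        if not gone and match is not None:
            return {"found": match}
        if gone and match is None:
            return {"gone": True}
        remaining = deadline - clock()
        if remaining <= 0:
            # The gone-case wording is a placeholder, not desktop.py's message.
            what = "Element still present after" if gone else "Element not found within"
            return {"error": f"{what} {timeout:g}s", "timeout": True}
        # Sleep, but never past the deadline, then grow the delay geometrically.
        sleep(min(delay, remaining))
        delay = min(delay * factor, max_delay)
```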

list-elements

List interactive elements (buttons, inputs, links, etc.).

python3 scripts/desktop.py list-elements [--app Firefox] [--role button]
| Parameter | Type | Required | Description |
|---|---|---|---|
| --app, -a | string | No | Application name filter |
| --role, -r | string | No | Filter by role |
| --include-hidden | flag | No | Include hidden elements |
| --max-results | int | No | Maximum results (default: 100) |

Returns:

{
  "elements": [
    { "name": "Sign In", "role": "push button", "..." },
    { "name": "Email", "role": "entry", "..." }
  ],
  "count": 2
}
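Note that the examples pass --role button while results report "push button". The sketch below applies list-elements style filters to element dictionaries and treats a role as matching when it equals the element's role or is one of its words; that matching rule, and `filter_elements` itself, are assumptions rather than documented behavior.

```python
from typing import Iterable, Optional


def role_matches(wanted: str, actual: str) -> bool:
    """'button' matches 'push button'; 'entry' matches 'entry'."""
    wanted, actual = wanted.lower(), actual.lower()
    return wanted == actual or wanted in actual.split()


def filter_elements(elements: Iterable[dict], role: Optional[str] = None,
                    include_hidden: bool = False,
                    max_results: int = 100) -> dict:
    """Filter elements and build a {"elements": [...], "count": n} result."""
    out = []
    for el in elements:
        if role is not None and not role_matches(role, el.get("role", "")):
            continue
        if not include_hidden and not el.get("visible", True):
            continue
        out.append(el)
        if len(out) >= max_results:
            break  # stop walking once the cap is reached
    return {"elements": out, "count": len(out)}
```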

status

Check AT-SPI and OCR availability.

python3 scripts/desktop.py status

Returns:

{
  "atspi": {
    "available": true,
    "applications": ["Firefox", "gnome-calculator"]
  },
  "ocr": { "available": true },
  "display": ":10.0"
}
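A caller can use this output to pick an approach before automating an app. The sketch below prefers AT-SPI when it is available and lists the target application, falls back to OCR, and otherwise to screenshots plus coordinates. `pick_strategy` and its exact-name application check are hypothetical.

```python
from typing import Optional


def pick_strategy(status: dict, app: Optional[str] = None) -> str:
    """Return "semantic", "ocr", or "coordinates" based on status output."""
    atspi = status.get("atspi", {})
    apps = atspi.get("applications", [])
    if atspi.get("available") and (app is None or app in apps):
        return "semantic"     # find-element / click-element
    if status.get("ocr", {}).get("available"):
        return "ocr"          # find-text / wait-for --text
    return "coordinates"      # screenshot + click X Y
```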

Original Commands

These original commands, mostly coordinate-based, are still available:

| Command | Description |
|---|---|
| screenshot [--output PATH] | Take screenshot |
| click X Y [--right] [--double] | Click at coordinates |
| type "TEXT" [--type-delay MS] | Type text |
| key "KEYS" | Press key combination |
| move X Y | Move mouse |
| active | Get active window info |
| find-window "NAME" | Find windows by name |
| focus "NAME" | Focus window by name |
| position | Get mouse position |
| windows | List all windows |

Example Workflows

MetaMask Transaction (Semantic)

# Wait for and click Confirm button
python3 scripts/desktop.py wait-for --name "Confirm" --role button --timeout 30
python3 scripts/desktop.py click-element --name "Confirm" --role button

# Wait for success message
python3 scripts/desktop.py wait-for --text "Transaction submitted" --timeout 60
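The same three steps can be driven from Python so that a timeout stops the workflow instead of letting it continue. This sketch assumes desktop.py prints one JSON object to stdout and signals failure with an "error" key, as the wait-for timeout example does; `run_desktop` and `confirm_transaction` are hypothetical, and the runner is injectable for testing.

```python
import json
import subprocess
from typing import Callable, List


def run_desktop(args: List[str]) -> dict:
    """Run scripts/desktop.py and parse its JSON output (assumed on stdout)."""
    proc = subprocess.run(["python3", "scripts/desktop.py", *args],
                          capture_output=True, text=True)
    return json.loads(proc.stdout)


def confirm_transaction(run: Callable[[List[str]], dict] = run_desktop) -> dict:
    steps = [
        ["wait-for", "--name", "Confirm", "--role", "button", "--timeout", "30"],
        ["click-element", "--name", "Confirm", "--role", "button"],
        ["wait-for", "--text", "Transaction submitted", "--timeout", "60"],
    ]
    result: dict = {}
    for args in steps:
        result = run(args)
        # Stop at the first step that reports an error (e.g. a timeout).
        if "error" in result:
            raise RuntimeError(f"{args[0]} failed: {result['error']}")
    return result
```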

Phantom Wallet Import (Semantic)

# Click "I already have a wallet"
python3 scripts/desktop.py click-element --name "I already have a wallet"

# Wait for seed phrase input
python3 scripts/desktop.py wait-for --role entry --timeout 10

# Type seed phrase
python3 scripts/desktop.py type "word1 word2 word3..."

# Click Import
python3 scripts/desktop.py click-element --name "Import"

Hybrid Approach (Semantic + Coordinates)

# Use semantic for known buttons
python3 scripts/desktop.py click-element --name "Settings"

# Fall back to coordinates for unlabeled icons
python3 scripts/desktop.py screenshot --output /tmp/screen.png
# (analyze screenshot to get coordinates)
python3 scripts/desktop.py click 850 120
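The hybrid pattern can also be automated: try the semantic click first and use fixed coordinates only if it fails. In this sketch `run` is any callable that executes desktop.py with the given arguments and returns its parsed JSON; that callable, `click_with_fallback`, and the assumption that click-element reports failure with an "error" key are all hypothetical.

```python
from typing import Callable, List, Tuple


def click_with_fallback(run: Callable[[List[str]], dict], name: str,
                        fallback_xy: Tuple[int, int]) -> dict:
    """Try a semantic click by name; on error, click fixed coordinates."""
    result = run(["click-element", "--name", name])
    if "error" not in result:
        return result
    x, y = fallback_xy  # e.g. read off a screenshot beforehand
    return run(["click", str(x), str(y)])
```

Coordinates go stale when windows move or resize, so keeping them as the fallback path rather than the default preserves the robustness of the semantic route.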

Tips

  1. Prefer semantic commands - click-element and wait-for are more robust than coordinates
  2. Check status first - Run status to verify AT-SPI and OCR are available
  3. Use --role for precision - Distinguish between buttons and text with the same name
  4. Fall back to OCR - If AT-SPI doesn't expose an element, find-text can still locate its text via OCR
  5. Wait instead of sleep - wait-for is more reliable than fixed delays
  6. Use --verify for critical clicks - Adds an OCR check before clicking

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
