gui-agent

GUI automation via visual detection: clicking, typing, reading content, navigating menus, and filling forms, all through a screenshot → detect → act workflow. Supports macOS and Linux.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "gui-agent" with this command: npx skills add alfredjamesli/gui-claw

GUI Agent

STEP 0: Activate Platform (MANDATORY FIRST STEP)

Before any GUI operation, run:

python3 {baseDir}/scripts/activate.py

The script detects your OS, sets up the matching action commands, and prints platform context. After it runs, {baseDir}/actions/_actions.yaml contains the commands for your platform.
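As a rough illustration of the kind of OS detection this step describes, the sketch below maps the value of Python's `platform.system()` to a platform label. It is not the contents of `activate.py`: the function name `detect_platform` and the labels `macos` and `linux` are assumptions made for this example.

```python
import platform

# Hypothetical labels; the real activate.py may name platforms differently.
SUPPORTED = {"Darwin": "macos", "Linux": "linux"}


def detect_platform(system_name=None):
    """Return a platform label for a supported OS, or raise for anything else."""
    # platform.system() returns "Darwin" on macOS and "Linux" on Linux.
    name = system_name if system_name is not None else platform.system()
    if name not in SUPPORTED:
        raise RuntimeError(f"Unsupported platform: {name!r}")
    return SUPPORTED[name]
```

Failing loudly on an unsupported OS, rather than falling back to a default, matches the listing's claim that only macOS and Linux are supported.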

Workflow

OBSERVE → LEARN → ACT → VERIFY → SAVE

OBSERVE: Take a screenshot, run OCR and the detector, and understand the current state. Read {baseDir}/skills/gui-observe/SKILL.md.
LEARN: First time with an app? Save its components to memory. Read {baseDir}/skills/gui-learn/SKILL.md. learn_from_screenshot() automatically outputs app tips if any are available.
ACT: Pick a target and execute the action using the commands in _actions.yaml. Read {baseDir}/skills/gui-act/SKILL.md, and {baseDir}/actions/_actions.yaml for the available commands.
VERIFY: Take another screenshot and confirm the action succeeded.
SAVE: Record state transitions to memory. Read {baseDir}/skills/gui-memory/SKILL.md for the memory structure.
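The five steps above can be sketched as a single loop iteration. This is a minimal illustration of the control flow only, not the skill's code: `run_step`, its callback parameters, and the `"app"` key in the state dictionary are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, Set


def run_step(
    observe: Callable[[], Dict],
    learn: Callable[[Dict], None],
    act: Callable[[Dict], None],
    verify: Callable[[Dict, Dict], bool],
    save: Callable[[Dict, Dict], None],
    known_apps: Set[str],
) -> Dict:
    """Run one OBSERVE -> LEARN -> ACT -> VERIFY -> SAVE pass (illustrative only)."""
    before = observe()                 # OBSERVE: screenshot + OCR/detector
    app = before.get("app")
    if app not in known_apps:          # LEARN: only the first time an app is seen
        learn(before)
        known_apps.add(app)
    act(before)                        # ACT: justified by the observed state
    after = observe()                  # VERIFY: observe again, then check
    if not verify(before, after):
        raise RuntimeError("action could not be verified")
    save(before, after)                # SAVE: record the state transition
    return after
```

Passing the steps in as callbacks keeps the ordering explicit and makes it easy to see that nothing is saved unless verification passes.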

Core Rules

Coordinates come from detection only: OCR or GPA-GUI-Detector, NEVER guessing
Look before you act: every action must be justified by what you observed
image tool = understanding only: use it to decide WHAT to click, and get WHERE from OCR or the detector
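To make the first rule concrete, here is a small sketch of deriving a click point strictly from detection output. The `Detection` shape, the `(left, top, width, height)` box layout, and the `click_point` helper are assumptions for illustration; the skill's actual OCR and detector output format is not shown in this listing.

```python
from typing import List, Tuple, TypedDict


class Detection(TypedDict):
    text: str
    box: Tuple[int, int, int, int]  # assumed layout: (left, top, width, height) in pixels


def click_point(detections: List[Detection], label: str) -> Tuple[int, int]:
    """Return the center of the single detected element whose text matches label."""
    wanted = label.strip().lower()
    matches = [d for d in detections if d["text"].strip().lower() == wanted]
    if len(matches) != 1:
        # No match or an ambiguous match: refuse rather than guess coordinates.
        raise LookupError(f"expected exactly one match for {label!r}, found {len(matches)}")
    left, top, width, height = matches[0]["box"]
    return (left + width // 2, top + height // 2)
```

For example, with a detected "Save" button whose box is `(100, 20, 60, 30)`, `click_point(detections, "save")` yields `(130, 35)`, the center of that box.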

Sub-Skills Reference

Sub-Skill	When to read
`skills/gui-observe/SKILL.md`	Before screenshots or detection
`skills/gui-learn/SKILL.md`	Before learning a new app
`skills/gui-act/SKILL.md`	Before any click/type action
`skills/gui-memory/SKILL.md`	For memory structure details
`skills/gui-workflow/SKILL.md`	For multi-step navigation
`skills/gui-setup/SKILL.md`	For first-time machine setup
`skills/gui-report/SKILL.md`	For task performance reporting

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
