gui-agent-mobile

GUI Agent Mobile Skill

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "gui-agent-mobile" with this command: npx skills add ugorange/gui_agent_skill/ugorange-gui-agent-skill-gui-agent-mobile

This skill wraps the gui_agent_skill CLI so that Codex can execute complex Android GUI workflows.

When To Use

  • User asks to control an Android phone/emulator UI.

  • User asks for multi-step mobile automation with session continuation.

  • User asks to inspect current device/app screen state.

Command Workflow

  • New task: python -m gui_agent_skill.cli execute --task "<task>" [--provider <provider>] [--device-id <id>] [--max-steps <n>] [--timeout-sec <sec>] [--stateless]

  • Continue task: python -m gui_agent_skill.cli continue [--session-id <id>] [--reply "<text>"] [--task "<task>"] [--device-id <id>] [--max-steps <n>] [--timeout-sec <sec>]

  • Status: python -m gui_agent_skill.cli status [--device-id <id>]

  • Providers: python -m gui_agent_skill.cli providers

  • Direct coordinate tap (no model planning): python -m gui_agent_skill.cli tap --x <x> --y <y> [--coord-space auto|pixel|ratio] [--device-id <id>] [--post-delay-ms <ms>] [--timeout-sec <sec>]

Fallback when module import fails:

  • python cli.py execute ...

  • python cli.py continue ...

  • python cli.py status ...

  • python cli.py providers

  • python cli.py tap ...
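
The command forms above can also be assembled programmatically. The following is a minimal sketch, assuming only the execute flags listed under Command Workflow; the helper names cli_prefix and build_execute_argv are hypothetical and are not part of gui_agent_skill. "Module import fails" is approximated here by checking whether the package can be found on the import path.

```python
import importlib.util
import sys


def cli_prefix(module_available=None):
    """Return the command prefix: module form, or the cli.py fallback."""
    if module_available is None:
        # Assumption: an import failure is approximated by the package
        # not being discoverable on the current import path.
        module_available = importlib.util.find_spec("gui_agent_skill") is not None
    if module_available:
        return [sys.executable, "-m", "gui_agent_skill.cli"]
    return [sys.executable, "cli.py"]


def build_execute_argv(task, provider=None, device_id=None, max_steps=None,
                       timeout_sec=None, stateless=False, module_available=None):
    """Build the argument list for a new task (hypothetical helper)."""
    argv = cli_prefix(module_available) + ["execute", "--task", task]
    if provider is not None:
        argv += ["--provider", provider]
    if device_id is not None:
        argv += ["--device-id", device_id]
    if max_steps is not None:
        argv += ["--max-steps", str(max_steps)]
    if timeout_sec is not None:
        argv += ["--timeout-sec", str(timeout_sec)]
    if stateless:
        argv.append("--stateless")
    return argv
```

The resulting list can be passed to subprocess.run(argv, capture_output=True, text=True). Building a list instead of a shell string avoids quoting problems when the task text contains spaces or quotes.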

Response Handling

  • Always parse the returned JSON and report the success field.

  • Preserve and surface session_id for follow-up turns.

  • Respect timeout controls: pass --timeout-sec for bounded runtime and check timed_out in error responses.

  • When terminated_subprocesses is present, report that forced cleanup happened (timeout/interruption/tail cleanup).

  • Use next_action to drive interaction:

      • continue: proceed with next step

      • needs_reply: ask user for explicit reply content

      • complete: close task

  • Include caption and screenshot_path when available.

  • Check session_mode and continuation_supported:

      • session_mode=stateful: normal execute -> continue

      • session_mode=stateless: do not call continue; run a new execute --stateless instead

  • If error=tap_only_mode_enabled, switch to tap/click; do not retry execute/continue.
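
The handling rules above can be sketched as a small parsing helper. This is an illustration under assumptions: it treats the CLI output as a single JSON object whose top-level keys are the field names mentioned in this section, and the function name summarize_response is hypothetical.

```python
import json


def summarize_response(raw):
    """Parse CLI JSON output and collect the fields named in this section."""
    # Assumption: the CLI prints one JSON object with these top-level keys.
    data = json.loads(raw)
    notes = []
    if data.get("timed_out"):
        notes.append("run timed out")
    if data.get("terminated_subprocesses"):
        notes.append("forced cleanup of subprocesses happened")
    if data.get("error") == "tap_only_mode_enabled":
        notes.append("tap-only mode: use tap, do not retry execute/continue")
    return {
        "success": bool(data.get("success")),
        "session_id": data.get("session_id"),    # keep for follow-up turns
        "next_action": data.get("next_action"),  # continue / needs_reply / complete
        "caption": data.get("caption"),
        "screenshot_path": data.get("screenshot_path"),
        "notes": notes,
    }
```

Missing fields come back as None, so the caller can include caption and screenshot_path only when they are present.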

Execution Modes (Direct vs Planner-Controlled)

Use two complementary modes based on task complexity:

  • Direct execution mode (default): GUI Agent can receive and execute a single complex task with multiple actions/clicks.

  • Planner-controlled mode (for complex global tasks): Codex/Claude acts as planner and GUI Agent acts as executor.

When to switch to planner-controlled mode:

  • Long-horizon tasks with many dependent steps.

  • High-branching tasks where each screen state changes the next action.

  • Tasks that need precise, low-risk, step-by-step control.

Planner-controlled workflow:

  • Start one global session with execute, then iteratively use continue.

  • Planner inspects each new screenshot/state and decides the next micro-steps.

  • Executor receives explicit, concrete commands (UI element identity, relative position, row/column/layer description, buttons, sequence) and performs them.

  • Repeat inspect -> plan -> execute until task completion.
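
The inspect -> plan -> execute cycle above can be sketched as a loop. This is a minimal sketch, not the skill's implementation: execute, continue_, and plan_next are hypothetical callables supplied by the caller (for example, thin wrappers around the CLI), and responses are assumed to be dicts carrying success, session_id, and next_action.

```python
def run_planner_loop(execute, continue_, plan_next, task, max_turns=10):
    """Drive inspect -> plan -> execute using injected callables."""
    # Start one global session.
    response = execute(task)
    for _ in range(max_turns):
        # Inspect: stop when the task is complete or the last call failed.
        if response.get("next_action") == "complete":
            break
        if not response.get("success", False):
            break
        # Plan: the planner turns the latest state into a concrete instruction.
        instruction = plan_next(response)
        # Execute: the executor performs it within the same session.
        response = continue_(response.get("session_id"), instruction)
    return response
```

The max_turns bound is an added safeguard so that a planner that never reaches complete cannot loop indefinitely.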

Direct coordinate mode:

  • Use tap only when the user explicitly asks for coordinate-based control.

  • This path skips adapter/model planning and sends adb shell input tap directly.

  • Prefer --coord-space ratio when user gives normalized coordinates, or auto for mixed input.

  • After each tap, inspect returned screenshot_path and coordinate fields before the next action.

  • This is the only available control path when tap_only_mode=true (for example: installed with python install.py --tap-only).

Stateless Mode

Use stateless mode for short, incremental actions where each call must start a new conversation without resetting the phone environment:

python -m gui_agent_skill.cli execute --task "<task>" --stateless [--device-id <id>] [--provider <provider>]

Behavior:

  • Starts a fresh adapter conversation for each call.

  • Skips local session persistence in gui_agent_skill.

  • Keeps current app/screen context (no forced Home reset in local/gelab path).

  • Best for minimal one-turn tasks.
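
Choosing between continue and a fresh stateless execute can be sketched as follows. The function name follow_up_args is hypothetical, and the response is assumed to be a dict exposing session_mode, continuation_supported, and session_id as top-level keys.

```python
def follow_up_args(response, task):
    """Choose the CLI subcommand arguments for the next turn."""
    # Assumption: continuation_supported is a boolean when present.
    stateless = (response.get("session_mode") == "stateless"
                 or response.get("continuation_supported") is False)
    if stateless:
        # Stateless sessions cannot be continued; start a new call instead.
        return ["execute", "--task", task, "--stateless"]
    args = ["continue", "--task", task]
    session_id = response.get("session_id")
    if session_id:
        args += ["--session-id", session_id]
    return args
```

The returned list is meant to be appended to the command prefix (python -m gui_agent_skill.cli or python cli.py).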

The planner-controlled pattern is generic and applies to games and non-game global workflows alike.

Instruction style requirement in planner-controlled mode:

  • Do not use coordinate-based commands.

  • Use semantic location language (for example: "top row middle grass tile", "leftmost tile in the second row", "bottom toolbar shuffle button").

  • To improve efficiency, planner can issue one or multiple semantic actions in one turn.

Instruction style requirement in direct coordinate mode:

  • Coordinate commands are allowed.

  • Verify coordinate conversion using returned coordinate.screen_size, coordinate.computed, and coordinate.tap.
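
A coordinate check of this kind might look like the sketch below. It is an illustration only: the rounding rule is one plausible ratio-to-pixel conversion and may differ from what the CLI does, the exact shape of the coordinate fields is not documented here, and both function names are hypothetical. The helpers take plain numbers so they can be fed from whatever structure the response actually uses.

```python
def ratio_to_pixel(x_ratio, y_ratio, width, height):
    """Convert normalized coordinates to pixels (assumed rounding rule)."""
    if not (0.0 <= x_ratio <= 1.0 and 0.0 <= y_ratio <= 1.0):
        raise ValueError("ratio coordinates must be within [0, 1]")
    # Clamp so that a ratio of 1.0 still lands on the last valid pixel.
    x = min(int(round(x_ratio * width)), width - 1)
    y = min(int(round(y_ratio * height)), height - 1)
    return x, y


def tap_matches(expected, reported, tolerance=1):
    """Compare an expected (x, y) pixel with the one the CLI reports."""
    return (abs(expected[0] - reported[0]) <= tolerance
            and abs(expected[1] - reported[1]) <= tolerance)
```

A mismatch beyond the tolerance suggests the wrong --coord-space was applied, which is worth resolving before issuing the next tap.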

Safety Notes

  • execute/continue can operate real devices; confirm intent for risky actions.

  • If a command fails, check ADB connectivity first; then check provider configuration unless running in tap-only mode.
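
The ADB connectivity check can be sketched by parsing the text printed by adb devices. This is a minimal sketch that is independent of gui_agent_skill; the function names are hypothetical, and the parser assumes the usual layout of a header line followed by one serial and state per line.

```python
def parse_adb_devices(output):
    """Return {serial: state} parsed from `adb devices` output."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and adb daemon status messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def ready_devices(output):
    """Serials whose state is 'device', i.e. connected and authorized."""
    return [serial for serial, state in parse_adb_devices(output).items()
            if state == "device"]
```

The output text can be captured with subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout. An empty result, or a device listed as unauthorized or offline, points to a connectivity problem, not a provider problem.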

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
