Computer Use Playbook
Overview
Use this skill for end-to-end computer automation across browser and desktop surfaces. Browser use is a major track, but not the only one. Prefer deterministic methods first, then escalate to visual/native automation only when required. For browser MCP workflows, treat tab_id as a required handle for all stateful actions.
Execution Mode (Default: Lesson-Lock)
When a matching topic already exists under references/learnings/<topic-slug>/lessons.md, run in lesson-lock mode.
- Execute the lesson checklist as written before trying any novel approach.
- Do not create a new topic slug when an existing topic clearly matches.
- Do not publish until all pre-publish lesson gates pass.
- If a lesson step fails, try only fallbacks documented in the same lessons.md first.
- Use experience-log.md only to fill missing detail, not to override lesson rules.
- If documented lesson paths fail and the task is not human-gated, run bounded self-learning attempts, then codify the winning pattern.
Precedence Ladder (No Ambiguity)
Always follow this order:
- Reuse existing lessons.md for the resolved topic slug.
- If a lesson step fails, use only fallbacks already documented in that same lessons.md.
- If no documented fallback works and no human gate is present, perform bounded self-learning to discover a reliable path.
- Once a reliable path is found, update lessons.md and experience-log.md so the next run is mechanical.
- Request human intervention only for login/2FA/CAPTCHA/security/policy gates or true hard blocks.
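The ladder above can be read as a small decision function. The following is a minimal sketch, not part of the playbook itself: the names NextMove and next_move, and the boolean inputs, are hypothetical and only illustrate the ordering.

```python
from enum import Enum


class NextMove(Enum):
    RUN_LESSON = "run the lessons.md checklist"
    DOCUMENTED_FALLBACK = "use a fallback documented in the same lessons.md"
    SELF_LEARN = "run bounded self-learning, then update lessons.md and experience-log.md"
    ASK_HUMAN = "request human intervention"


def next_move(step_failed: bool, fallbacks_left: int, human_gated: bool,
              self_learning_exhausted: bool = False) -> NextMove:
    """Pick the next rung of the ladder from the current run state (hypothetical inputs)."""
    if human_gated:
        # Login/2FA/CAPTCHA/security/policy gates always go to a human.
        return NextMove.ASK_HUMAN
    if not step_failed:
        return NextMove.RUN_LESSON
    if fallbacks_left > 0:
        return NextMove.DOCUMENTED_FALLBACK
    if not self_learning_exhausted:
        return NextMove.SELF_LEARN
    # Nothing documented or discovered works: treat as a true hard block.
    return NextMove.ASK_HUMAN
```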
Playbook Structure
- Browser use (primary for web tasks): browser MCP tools, DOM snapshots, scripts, screenshots.
- Filesystem use: shell-native operations for deterministic file/process work.
- Native desktop use: coordinate and window automation only when DOM/shell are insufficient.
- Human-in-the-loop checkpoints: login, CAPTCHA, security prompts, or policy-gated steps.
Decision Order
- Identify the active surface: browser page, filesystem/process, or native desktop UI.
- For browser pages, use browser MCP tools first and keep a strict tab_id contract.
- For filesystem/process work, use shell/system tools first (rg, ls, find, etc.).
- Escalate to vision or native UI automation only when deterministic methods are insufficient.
- If blocked by login, CAPTCHA, or security gates, switch to human-in-the-loop flow.
- Verify each critical step with state checks plus screenshot evidence.
Browser Automation (Major Track)
Use browser tools + DOM-first for browser flows. Avoid jumping to native desktop clicks while the target is still reachable by browser tools.
Preferred sequence:
- open_tab and capture the returned tab_id.
- navigate_to(tab_id, url) for explicit page transitions.
- dom_snapshot(tab_id, ...) or run_script(tab_id, ...) to identify the target.
- run_script(tab_id, ...) to perform the action (click/type/submit).
- read_page(tab_id, ...) / run_script(tab_id, ...) to verify URL/title/content.
- screenshot(tab_id, ...) as evidence.
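As a rough illustration of the sequence above, the sketch below threads one tab_id through every call. The tool names come from this playbook, but the `call` adapter, the keyword argument names (url, script, path), and the return shapes are assumptions; the real invocation mechanism depends on the MCP client in use.

```python
from typing import Any, Callable

# Hypothetical adapter: takes a tool name plus keyword arguments and performs
# one browser MCP tool call. Argument names below are assumptions.
ToolCall = Callable[..., Any]


def run_click_flow(call: ToolCall, url: str, click_js: str,
                   expect_in_title: str, shot_path: str) -> dict:
    """Open, navigate, inspect, act, verify, and screenshot on a single tab_id."""
    tab_id = call("open_tab")                               # capture the handle
    call("navigate_to", tab_id=tab_id, url=url)             # explicit transition
    call("dom_snapshot", tab_id=tab_id)                     # identify the target
    call("run_script", tab_id=tab_id, script=click_js)      # perform the action
    title = call("run_script", tab_id=tab_id, script="document.title")  # verify
    call("screenshot", tab_id=tab_id, path=shot_path)       # visual evidence
    return {"tab_id": tab_id, "title": title,
            "verified": expect_in_title in str(title)}
```

Passing the adapter in as a parameter keeps the flow testable with a fake and makes it hard to issue a call without naming the tab.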
Session behavior guidance:
- always pass tab_id for navigate_to, read_page, screenshot, dom_snapshot, run_script, and close_tab.
- never rely on implicit active-tab behavior.
- if a click opens a new tab/window, call list_tabs, detect the new tab_id, and continue explicitly on that tab_id.
- keep a local map of purpose -> tab_id when handling multiple tabs.
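The tab bookkeeping described above might look like the following sketch. It assumes list_tabs results can be reduced to a list of tab_id strings (the actual return shape is not specified here); find_new_tabs and TabRegistry are hypothetical helper names.

```python
def find_new_tabs(before: list[str], after: list[str]) -> list[str]:
    """Return tab ids present in `after` but not in `before`, in listing order."""
    known = set(before)
    return [tab_id for tab_id in after if tab_id not in known]


class TabRegistry:
    """Local purpose -> tab_id map so every call names its tab explicitly."""

    def __init__(self) -> None:
        self._tabs: dict[str, str] = {}

    def register(self, purpose: str, tab_id: str) -> None:
        self._tabs[purpose] = tab_id

    def get(self, purpose: str) -> str:
        # Raise instead of guessing: there is no implicit active tab.
        if purpose not in self._tabs:
            raise KeyError(f"no tab registered for purpose {purpose!r}")
        return self._tabs[purpose]
```

Taking a list_tabs snapshot before the click and another after it, then diffing the two, avoids depending on whichever tab the browser considers active.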
Escalation triggers:
- dynamic overlays not stable via selectors,
- canvas/rendered controls,
- consent dialogs where selector path is inconsistent,
- native picker launched from browser (file upload dialog).
Do not overuse fallback:
- if a browser tool can do it, stay in browser tools.
- use native automation only for cross-app boundaries (OS dialogs, non-DOM UI).
File Explorer and Filesystem Automation
Prefer shell-native methods before GUI clicking.
Use shell when possible:
- search files: rg --files, find
- move/copy/rename: mv, cp, mkdir
- inspect metadata: ls -la, stat
Use native UI only when the workflow is GUI-only:
- OS file picker from browser/app,
- drag-drop interactions not scriptable via API,
- app-specific explorer panes.
Native UI Automation
Use native UI automation for interactions outside application DOM/API.
Typical tools:
- xdotool for key/click/type,
- xprop/xwininfo for window targeting.
Guidelines:
- ensure window focus before typing,
- prefer keyboard-driven deterministic paths,
- keep retries bounded and observable,
- re-check application state after each action.
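One way to keep retries bounded and observable is a small wrapper that re-checks state after every attempt. This is a sketch under assumptions: the function name, the default of three attempts, and the delay are illustrative, and the action and state check are supplied by the caller.

```python
import time
from typing import Callable


def act_with_bounded_retries(action: Callable[[], None],
                             state_ok: Callable[[], bool],
                             max_attempts: int = 3,
                             delay_s: float = 0.5) -> int:
    """Run `action`, re-check state after each try; return attempts used or raise."""
    for attempt in range(1, max_attempts + 1):
        print(f"attempt {attempt}/{max_attempts}")  # keep retries observable
        action()
        if state_ok():
            return attempt
        time.sleep(delay_s)
    raise RuntimeError(f"state check still failing after {max_attempts} attempts")
```

In practice the action could wrap a keyboard-driven command such as subprocess.run(["xdotool", "key", "ctrl+l"], check=True), with the state check reading the window title or a screenshot.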
Human-in-the-Loop Rules
Pause and ask for user intervention when blocked by:
- login/2FA challenges,
- CAPTCHA or anti-bot checkpoints,
- legal/security confirmation screens that require explicit human intent.
When waiting for user action:
- explain exactly what the user must do and where.
- issue an audible notification using speak so the user notices immediately.
- wait, then re-check state (url, title, element visibility, screenshot) before continuing.
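The notify, wait, and re-check loop above can be sketched as a polling helper. Everything here is illustrative: `notify` stands in for a wrapper around the speak tool, `state_cleared` for whatever URL/title/element check applies, and the poll interval and timeout are assumed values.

```python
import time
from typing import Callable


def wait_for_human(notify: Callable[[str], None],
                   state_cleared: Callable[[], bool],
                   instructions: str,
                   poll_s: float = 5.0,
                   timeout_s: float = 300.0,
                   sleep: Callable[[float], None] = time.sleep,
                   now: Callable[[], float] = time.monotonic) -> bool:
    """Tell the user what to do, then poll state until cleared or timed out."""
    notify(instructions)  # say exactly what to do and where
    deadline = now() + timeout_s
    while now() < deadline:
        if state_cleared():
            return True
        sleep(poll_s)
    # One final check so a late completion is not missed.
    return state_cleared()
```

Returning False on timeout, rather than continuing, keeps the run from proceeding on an unverified page state.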
Special Cases
Consent dialogs
- DOM-first click (Accept all/Reject all/localized variants).
- if selector fails but button is visible, use coordinate/native fallback.
- confirm modal is not visible and main interaction path works.
CAPTCHA / anti-bot challenges
- do not attempt bypass logic.
- capture evidence and report blocked state clearly.
- require human-in-the-loop completion.
- notify user with speak when intervention is required.
Login and account security gates
- try normal DOM steps first for username/password field fill and submit.
- if SSO, passkey, device approval, or 2FA requires human action, pause and request user action.
- after user confirms completion, re-snapshot and continue from verified page state.
File uploads
- use DOM file input assignment if available.
- if native picker opens, switch to native UI automation.
- verify upload appears in page/app state.
Verification Standard
Every important step should end with both:
- state evidence (URL/title/content/element state), and
- visual evidence (screenshot path).
If blocked, report:
- attempted method,
- blocker reason,
- evidence collected,
- next safe fallback.
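The evidence pair and the blocked report above could be captured in two small records. This is a sketch only: the class and field names are hypothetical, chosen to mirror the bullet points, and the rendered layout is one possible format.

```python
from dataclasses import dataclass, field


@dataclass
class StepEvidence:
    step: str
    state: dict                 # e.g. URL, title, content, element state
    screenshot_path: str = ""   # path to the saved screenshot

    def is_complete(self) -> bool:
        # A step counts as verified only with BOTH kinds of evidence.
        return bool(self.state) and bool(self.screenshot_path)


@dataclass
class BlockedReport:
    attempted_method: str
    blocker_reason: str
    next_safe_fallback: str
    evidence: list = field(default_factory=list)  # list of StepEvidence

    def render(self) -> str:
        shots = ", ".join(e.screenshot_path for e in self.evidence
                          if e.screenshot_path) or "none"
        return "\n".join([
            f"attempted method: {self.attempted_method}",
            f"blocker reason: {self.blocker_reason}",
            f"evidence collected: {shots}",
            f"next safe fallback: {self.next_safe_fallback}",
        ])
```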
Learning Library Structure
Use references/learnings/ as the canonical knowledge base.
- references/learnings/index.md: topic registry and folder convention.
- references/learnings/general/: cross-task fallback logs.
- references/learnings/<topic-slug>/: topic-specific lessons and experience log.
Known canonical topic slugs:
- x-posting (X / Twitter / Expost publishing)
- linkedin-posting (LinkedIn posting/comments)
- google-flow
- xiaohongshu-posting
Topic folder convention:
- lessons.md for stable workflow rules.
- experience-log.md for incremental run learnings.
Continuous Learning Loop (Required)
Treat each real run as training data for future runs.
Priority contract:
- lessons.md is the source of truth for execution.
- experience-log.md is supporting evidence used to refine or extend lessons.
- If lesson and experience conflict, follow lessons.md and then update logs/lessons after the run.
Before starting similar work:
- Load references/learnings/index.md.
- Resolve the task to a canonical topic slug:
  - X/Twitter/Expost -> x-posting
  - LinkedIn/linkedin.com -> linkedin-posting
  - Google Flow/labs.google/fx/tools/flow -> google-flow
  - Xiaohongshu -> xiaohongshu-posting
- If the canonical topic exists, use it directly and do not create a variant slug.
- Load topic lessons.md first when present: references/learnings/<topic-slug>/lessons.md
- Load topic experience-log.md second when present: references/learnings/<topic-slug>/experience-log.md
- Load references/learnings/general/experience-log.md only as fallback context when topic files are missing or incomplete.
- If no topic folder exists, create it with lessons.md and experience-log.md, then run bounded self-learning to establish initial reliable lessons.
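Slug resolution and load order, as described above, might be sketched like this. The slugs and paths come from this playbook; the regular expressions, function names, and matching strategy are assumptions, and a bare "x" matched as a whole word can still misfire on unrelated text.

```python
import re
from pathlib import Path
from typing import Optional

# Patterns follow the mapping listed above; the regexes are illustrative only.
CANONICAL_SLUGS = [
    ("x-posting", r"\b(?:x|twitter|expost)\b"),
    ("linkedin-posting", r"\blinkedin\b"),
    ("google-flow", r"google flow|labs\.google/fx/tools/flow"),
    ("xiaohongshu-posting", r"\bxiaohongshu\b"),
]


def resolve_slug(task: str) -> Optional[str]:
    """Return the canonical slug for a task description, or None if none matches."""
    text = task.lower()
    for slug, pattern in CANONICAL_SLUGS:
        if re.search(pattern, text):
            return slug
    return None


def load_order(root: Path, slug: str) -> list[Path]:
    """Files to consult, in priority order: index, lessons, topic log, general log."""
    topic = root / slug
    return [
        root / "index.md",
        topic / "lessons.md",
        topic / "experience-log.md",
        root / "general" / "experience-log.md",  # fallback context only
    ]
```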
Before execution:
- Extract an ordered run checklist from topic lessons.md with step IDs.
- Execute step-by-step and mark each step pass/fail using state evidence.
- For publish flows, allow one publish action only after all pre-publish gates are passed.
- Use experience logs only to fill gaps not covered by lessons.
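A checklist runner with a publish gate could look like the sketch below. The Step record and run_checklist function are hypothetical; the sketch treats every checklist step as a pre-publish gate, which is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    step_id: str
    description: str
    check: Callable[[], bool]  # True when state evidence confirms the step


def run_checklist(steps: list[Step],
                  publish: Callable[[], None]) -> tuple[dict[str, bool], bool]:
    """Run steps in order; call `publish` once, only if every step passed."""
    results: dict[str, bool] = {}
    for step in steps:
        passed = step.check()
        results[step.step_id] = passed
        if not passed:
            return results, False  # stop at the first failing gate
    publish()  # reached only after all gates passed, and called exactly once
    return results, True
```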
During execution:
- Capture failure signal and the exact checklist step where it appears.
- If blocked, apply only documented lesson fallback paths before any new approach.
- Keep one-action-at-a-time execution where UI state is fragile.
- If no documented fallback works and the issue is not human-gated, run bounded self-learning:
  - try at most 2-3 alternative deterministic paths,
  - verify each attempt with explicit state evidence,
  - stop once one reliable path is found.
- If blocked by login/2FA/CAPTCHA/security/policy gates, pause and request human intervention with evidence.
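The bounded self-learning rule above can be expressed as a capped loop. This is a minimal sketch: the function name is hypothetical, the cap of three reflects the upper end of the "2-3" guidance, and each candidate is assumed to perform its attempt and return whether state evidence confirmed it.

```python
from typing import Callable, Optional


def bounded_self_learning(candidates: list[tuple[str, Callable[[], bool]]],
                          max_attempts: int = 3) -> Optional[str]:
    """Try up to `max_attempts` candidate paths; return the first verified one."""
    for name, attempt_and_verify in candidates[:max_attempts]:
        if attempt_and_verify():
            return name  # stop once one reliable path is found
    return None  # nothing verified within the budget
```

A None result is the signal to stop exploring and report the blocked state rather than keep improvising.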
After completion (or meaningful failure):
- Append a short run note to references/learnings/<topic-slug>/experience-log.md.
- Include: date, context, failure signal, root cause, fix pattern, reusable rule.
- If a new reliable rule or fallback was discovered, promote it into topic lessons.md immediately.
- Keep entries concise and deduplicated by updating prior rules instead of adding noisy repeats.
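A run note with the fields listed above could be formatted and appended as follows. The on-disk layout of experience-log.md is not specified in this playbook, so the bullet layout and the helper names here are assumptions.

```python
from datetime import date
from pathlib import Path

REQUIRED = ("context", "failure_signal", "root_cause", "fix_pattern", "reusable_rule")


def format_run_note(day: date, **fields: str) -> str:
    """Build one log entry; refuse to write a note with missing fields."""
    missing = [name for name in REQUIRED if not fields.get(name)]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    lines = [f"- date: {day.isoformat()}"]
    lines += [f"  {name.replace('_', ' ')}: {fields[name]}" for name in REQUIRED]
    return "\n".join(lines) + "\n"


def append_run_note(log_path: Path, note: str) -> None:
    """Append the note, creating the topic folder if it does not exist yet."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(note)
```

Deduplication against earlier entries is left out of this sketch; it would need a comparison on the reusable rule before appending.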
References
Load references/computer-use-techniques.md for command snippets and fallback templates.
Load references/learnings/index.md to select the right topic folder.
Load topic references/learnings/<topic-slug>/lessons.md first.
Load topic references/learnings/<topic-slug>/experience-log.md second.
Load references/learnings/general/experience-log.md only as fallback for cross-task patterns.
Load references/learnings/x-posting/lessons.md for all X/Twitter/Expost publishing.
Load references/learnings/linkedin-posting/lessons.md for all LinkedIn publishing.
Load references/learnings/google-flow/lessons.md when automating Google Flow video creation.
Load references/learnings/google-flow/experience-log.md after lessons for incremental learnings.