pinchtab

Use this skill when a task needs browser automation through PinchTab: open a website, inspect interactive elements, click through flows, fill out forms, scrape page text, log into sites with a persistent profile, export screenshots or PDFs, manage multiple browser instances, or fall back to the HTTP API when the CLI is unavailable. Prefer this skill for token-efficient browser work driven by stable accessibility refs such as `e5` and `e12`.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "pinchtab" with this command: npx skills add pinchtab/pinchtab/pinchtab-pinchtab-pinchtab

Browser Automation with PinchTab

CLI-first browser skill. Use pinchtab commands.

Core Workflow

  1. Ensure the right profile/instance is selected when needed.
  2. Navigate: pinchtab nav <url> --snap — auto-starts the local server if needed, then returns tab ID + interactive snapshot in one call.
  3. Interact: pinchtab click <ref> --snap-diff — returns OK + only changed elements (most token-efficient).
  4. For read-only observation: pinchtab text when you won't act on refs.

Key optimization: Use --snap-diff on click, fill, select, back, forward, reload to get only added/changed/removed elements — most token-efficient for multi-step flows. Use --snap when you need the full snapshot (e.g., first navigation, or after major page changes). Use --text when you need prose content for verification (skips snap, returns page text directly).

--snap-diff returns the same compact format as snap, but with change markers and a header showing counts:

# Page Title | URL | 57 nodes | +2 ~1 -0
e0:link "Home"
e5:button "Submit" [+]
e12:textbox val="updated" [~]
# removed: e99

[+] = added, [~] = changed, removed refs listed at end. All valid refs are shown — no need to remember previous snapshot. Do not follow with redundant snap; only call text when you need prose content.

Fallback observation (when --snap wasn't used):

  • pinchtab snap — interactive elements + headings in compact format (default).
  • pinchtab snap [selector] — scope the current-tab snapshot to one element.
  • pinchtab snap --full — all nodes as JSON (for debugging).
  • pinchtab text — content only (use when snap is missing prose you need).

Rules: only nav <url> auto-starts the default local server; snap, text, html, find, and action commands operate on an already-running server/current tab. Explicit --server targets are never auto-started. Never act on stale refs; screenshots only for visual/debug; choose the instance/profile up front for parallel or multi-site work.

Selectors

Unified selectors accepted by any element-targeting command:

  • Ref: e5 — from snapshot cache (fastest).
  • CSS: #login, .btn, [data-testid="x"]document.querySelector.
  • XPath: xpath://button[@id="submit"] — CDP search.
  • Text: text:Sign In — visible text match.
  • Semantic: find:login button — natural language via /find.

Auto-detection: bare eN→ref, #/./[...]→CSS, //→XPath. Use explicit css:/xpath:/text:/find: prefixes when ambiguous. HTTP API uses the same syntax in the selector field (legacy ref still accepted).

Command Chaining

&& when you don't need intermediate output (pinchtab nav <url> --snap && pinchtab click e3 --snap-diff). Run separately when you must read refs before acting.

Challenge Solving

Pages showing "Just a moment..." etc.: POST /solve {"maxAttempts":3} (or /tabs/TAB_ID/solve). Best with stealthLevel:"full". Safe to call speculatively — returns immediately if no challenge. See api.md.

Authentication and State

Patterns: (1) one-off pinchtab instance start; (2) reuse profile instance start --profile work --mode headed, switch to headless after login; (3) HTTP POST /profiles then POST /profiles/<name>/start; (4) human-assisted headed login, agent reuses headless. Agent sessions: pinchtab session create --agent-id <id> or POST /sessions → set PINCHTAB_SESSION=ses_....

Essential Commands

Server and targeting

pinchtab server | daemon install | health
pinchtab instances | profiles
pinchtab --server http://localhost:9868 snap -i -c  # target a specific instance

Navigation and tabs

pinchtab nav <url>                                  # auto-starts default local server; flags: --snap, --new-tab, --tab <id>, --block-images, --block-ads, --print-tab-id
pinchtab back | forward | reload                    # all support --snap, --snap-diff, --text
pinchtab tab                                        # list tabs
pinchtab tab <tab-id>                               # focus tab
pinchtab nav <url> --new-tab                        # force another tab
pinchtab tab close <tab-id>
pinchtab instance navigate <instance-id> <url>

Tab state is automatic: nav persists the tab ID to a state file, and subsequent commands read it. Just run pinchtab nav URL then pinchtab snap -i -c — the tab is remembered. For explicit control, use --tab <id>.

Parallel agents: when multiple agents share the same browser instance, the automatic tab state file becomes a race condition — one agent's nav overwrites another's tab ID. Each agent must manage its own tab ID explicitly: open with nav <url> --new-tab, capture the tab ID from the output, and pass --tab <id> on every subsequent command (snap, click, fill, text, eval, press, scroll, frame, etc.).

Observation

pinchtab snap [selector]                            # default: compact + interactive; flags: --full (JSON), -d (diff), --selector <css>, --max-tokens <n>
pinchtab text                                       # Readability-filtered page text
pinchtab text --full                                # raw document.body.innerText (alias: --raw)
pinchtab text <selector>                            # ref / -s CSS / xpath:... — text from one element
pinchtab text --json                                # full JSON (url/title/truncated)
pinchtab find <query>                               # semantic search; --ref-only for just the ref

Guidance:

  • snap — default observation (compact + interactive). Returns interactive elements + headings. Prefer this over separate text calls.
  • snap --full — all nodes as JSON; for debugging or when you need the full tree.
  • snap -d — standalone diff from previous snapshot. Use only when you need a diff without performing an action; for any click/fill/select/back/forward/reload, --snap-diff on the action itself already gives you the authoritative post-action state.
  • text — reading articles/dashboards when you won't act on refs. Falls back to --full when Readability drops content you need.
  • text <selector> — read one element without pulling the whole page.
  • find <query> — skip the snapshot when you can describe the target in a phrase. --ref-only pipes straight into click/fill/type.
  • Refs from snap -i and full snap are numbered differently — do not mix; re-snapshot before acting if you switched modes.
  • Use --block-images on nav for read-heavy tasks. Reserve screenshots/PDFs for visual verification.

Interaction

All interaction commands accept unified selectors (see Selectors above).

pinchtab click <selector>                           # flags: --snap, --snap-diff, --text, --wait-nav, --x/--y (coords), --dialog-action accept|dismiss [--dialog-text "..."]
pinchtab dblclick <selector>
pinchtab mouse move|down|up <selector|x y>          # --button left|middle|right
pinchtab mouse wheel <ms> --dx <n> --dy <n>
pinchtab drag <from> <to>                           # or: drag <selector> --drag-x <n> --drag-y <n>
pinchtab type <selector> <text>                     # keystroke events
pinchtab fill <selector> <text>                     # set value directly; flags: --snap, --snap-diff, --text
pinchtab press <key>                                # Enter, Tab, Escape, ...
pinchtab hover <selector>
pinchtab select <selector> <value|text>             # flags: --snap, --snap-diff, --text; matches value attr, falls back to visible text
pinchtab scroll <pixels|direction|selector>         # `scroll 1500`, `scroll down`, `scroll '#footer'`

Rules:

  • Default output is OK; use --json for recovery metadata. Errors go to stderr as ERROR: <cmd>: <reason>.
  • Prefer --snap-diff with click, fill, select, back, forward, reload — returns OK + only changed elements. Use --snap when you need the full snapshot (first nav, major page change).
  • Prefer fill for form entry; type only when the site depends on keystroke events.
  • click --wait-nav when a click navigates. May return {"success":true} or Error 409: unexpected page navigation — treat 409 as success and verify with fresh snap/text.
  • Use low-level mouse only for drag handles, canvas widgets, or exact pointer sequences.
  • JS dialogs: --dialog-action accept|dismiss, --dialog-text for prompt() responses.
  • HTTP scroll action: "scrollX"/"scrollY" for pixel deltas, "selector" to scroll into view — x/y are viewport coords, not deltas.
  • HTTP GET /download?url=... returns JSON {contentType, data (base64), size, url}; only http/https; private/internal hosts blocked unless in security.downloadAllowedDomains.

Waiting

Use for async DOM settling (spinners, toasts, XHR).

pinchtab wait <selector>                            # default: visible; --state hidden to wait for disappear
pinchtab wait --text "..." | --not-text "..."       # text appear / disappear
pinchtab wait --url "**/dashboard"                  # glob: **, *, ?
pinchtab wait --load ready-state|content-loaded|network-idle
pinchtab wait --fn "window.dataReady === true"      # requires security.allowEvaluate
pinchtab wait 500                                   # fixed ms delay (last resort, max 30000ms)

Timeout 10s default, 30s max via --timeout <ms>. Prefer --not-text/--state hidden over polling.

Export, debug, verification

pinchtab screenshot [-o path.png] [-q <jpeg-quality>]   # format by extension
pinchtab pdf [-o path.pdf] [--landscape]

Advanced (explicit opt-in only)

pinchtab eval "document.title"                      # --await-promise for async
pinchtab download <url> -o /tmp/out.bin
pinchtab upload /absolute/path -s <css>
  • eval: narrow read-only DOM inspection unless user asks for mutation.
  • download: prefer temp/workspace path over arbitrary filesystem.
  • upload: path must be user-provided or clearly approved.

HTTP API fallback

Use curl only when CLI is unavailable. Instance port (e.g. 9867):

  • POST /navigate {"url":"..."}
  • GET /snapshot?filter=interactive&format=compact
  • POST /action {"kind":"fill","selector":"e3","text":"..."} — kinds: click (waitNav:true), fill, type, press, select, hover, scroll (scrollX/scrollY/selector), drag (dragX/dragY).
  • POST /actions — batch in one round-trip. Body: array or {"actions":[...],"stopOnError":true,"tabId":"..."}. Response has per-step {index, success, result?, error?}.
  • GET /text, POST /solve {"maxAttempts":3}.
  • Tab-scoped: /tabs/TAB_ID/<endpoint> for navigate|snapshot|text|action|actions|screenshot|pdf|back|forward|close|wait|download|upload|handoff|resume|solve. Auth: Authorization: Bearer <token>.

Common Patterns

  • Form: nav --snapfill <ref> <text> --snap-diff per field → click --wait-nav --snap-diff submit → verify with text. Always click submit; never press Enter.
  • Multi-step: use click --snap-diff to get only changed refs with each action — most token-efficient for flows with many steps.
  • Direct selectors: skip the snapshot when structure is known — click "text:Accept", fill "#search" "q".

Verification & Gotchas

  • text confirms success messages / navigation outcomes. Default is Readability-filtered; may drop nav, repeated headlines, short-text nodes, or collapse lists. Use text --full (raw document.body.innerText) when verifying list/grid/tab/accordion pages, the marker is short, or a default read came back missing content you saw in snap.
  • Stale refs after a change are expected — fetch fresh refs instead of retrying.
  • {"clicked":true,"submitted":true} means the event fired, not that the server accepted or HTML validation passed. Verify via snap/text — or use --snap-diff on the action itself, which already reflects the post-event page state.
  • Same-origin iframes: pinchtab frame <target> sets a stateful scope inherited by subsequent selector-based snap/action/text calls. Target accepts main, an iframe ref, CSS for the iframe, a frame name, or a URL. Nested iframes need multiple hops. Full snap (no -i) flattens same-origin iframe descendants and ref-based actions work across the boundary. Cross-origin iframes aren't exposed as scopes — fall back to eval against iframe.contentDocument. text --frame <frameId> takes a 32-char hex frameId (from pinchtab frame output), not a CSS selector. One-shot read idiom: FID=$(pinchtab frame '#f' | jq -r .current.frameId); pinchtab frame main; pinchtab text --full --frame "$FID".
  • eval → always IIFE when introducing identifiers. Top-level const/let/class collide across calls in the shared realm (SyntaxError: Identifier 'x' has already been declared). Also needed to project DOMRect into a JSON-serializable object: pinchtab eval "(() => { const r = document.querySelector('#x').getBoundingClientRect(); return {x: r.x, y: r.y, w: r.width, h: r.height}; })()". Single expressions without identifiers (document.title) are fine bare.
  • text reads hidden nodes: both default and --full include display:none / visibility:hidden content because they read raw DOM. To confirm something is actually visible, use snap (accessibility tree respects visibility) or eval against offsetHeight / getComputedStyle().display. Common trap: pre-seeded hidden success <div> reported by text before submission.
  • Compact snap shows <option> by visible text, not value. select accepts either; only eval + Array.from(select.options) to debug a no-match.
  • text:<value> selectors use JS-level search and can flake with DOM Error / context deadline exceeded on large pages. Prefer refs from a fresh snap -i -c — they resolve by backend node IDs.
  • snap -i -c skips non-interactive descendants. For iframe interiors set a frame scope or use full snap.
  • aria-expanded is usually on the outer container of accordions/menus, not the click trigger. Verify via the wrapper's attribute.

References

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

arxiv-paper-writer

Use this skill whenever the user wants Claude Code to write, scaffold, compile, debug, or review an arXiv-style academic paper, especially survey papers with LaTeX, BibTeX citations, TikZ figures, tables, and PDF output. This skill should trigger for requests like writing a full paper, creating an arXiv paper project, turning a research topic into a LaTeX manuscript, reproducing the Paper-Write-Skill-Test agent-survey workflow, or setting up a Windows/Linux Claude Code paper-writing loop.

Archived SourceRecently Updated
Coding

cli-proxy-troubleshooting

排查 CLI Proxy API(codex-api-proxy)的配置、认证、模型注册和请求问题。适用场景包括:(1) AI 请求报错 unknown provider for model, (2) 模型列表中缺少预期模型, (3) codex-api-key/auth-dir 配置不生效, (4) CLI Proxy 启动后 AI 无法调用, (5) 认证成功但请求失败或超时。包含源码级排查方法:模型注册表架构、认证加载链路、 SanitizeCodexKeys 规则、常见错误的真实根因。

Archived SourceRecently Updated
Coding

visual-summary-analysis

Performs AI analysis on input video clips/image content and generates a smooth, natural scene description. | 视觉摘要智述技能,对传入的视频片段/图片内容进行AI分析,生成一段通顺自然的场景描述内容

Archived SourceRecently Updated
Coding

frontend-skill

全能高级前端研发工程师技能。擅长AI时代前沿技术栈(React最新 + shadcn/ui + Tailwind CSS v4 + TypeScript + Next.js),精通动效库与交互特效开发。采用Glue Code风格快速实现代码,强调高质量产品体验与高度友好的UI视觉规范。在组件调用、交互特效、全局Theme上保持高度规范:绝不重复造轮子,相同逻辑出现两次即封装为组件。具备安全意识,防范各类注入攻击。开发页面具有高度自适应能力,响应式设计贯穿始终。当用户无特殊技术栈要求时,默认采用主流前沿技术栈。

Archived SourceRecently Updated