Crab Catch — Research Skill Orchestration
Prerequisites
CLAWBOT must run these commands at session start (not just read them):
which agent-browser || npm install -g agent-browser
agent-browser install
Data Sources
CLAWBOT is the research driver — responsible for analysis, judgment, iteration, and report generation. The APIs, scripts, and tools below are data-fetching methods only.
API Base URL: https://crab-skill.opsat.io
Signature Authentication
All API requests except /api/health require Crab signature headers.
- Run `node skills/scripts/crab-sign.js` once at session start to get the headers JSON. (The first run auto-generates credentials; a cached signature is reused if it is still valid within 24h.)
- Store the output and attach these four headers to all subsequent API requests: `X-Crab-Timestamp`, `X-Crab-Signature`, `X-Crab-Key`, `X-Crab-Address`.
- Only re-run with `--refresh` if the API returns `auth_expired`.
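A minimal sketch of attaching the signature headers, assuming the script prints a flat JSON object keyed by header name (the output shape is not specified here, so treat it as an assumption). `pickCrabHeaders` and `headersFor` are hypothetical helper names.

```javascript
// Sketch only: assumes crab-sign.js output parses to { "X-Crab-Timestamp": "...", ... }.
const REQUIRED_HEADERS = [
  "X-Crab-Timestamp",
  "X-Crab-Signature",
  "X-Crab-Key",
  "X-Crab-Address",
];

// Pick the four Crab headers out of the parsed output; fail loudly if one is missing.
function pickCrabHeaders(signOutput) {
  const headers = {};
  for (const name of REQUIRED_HEADERS) {
    const value = signOutput[name];
    if (typeof value !== "string" || value.length === 0) {
      throw new Error(`missing Crab header: ${name}`);
    }
    headers[name] = value;
  }
  return headers;
}

// /api/health is the only endpoint that does not need the signature headers.
function headersFor(path, crabHeaders) {
  return path === "/api/health" ? {} : { ...crabHeaders };
}
```

At session start the input object would come from `JSON.parse` of the script's stdout; the result is stored once and reused until the API answers `auth_expired`.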
Twitter & Social Data (see twitter-analysis/SKILL.md for full params)
| Category | Key endpoints | Purpose |
|---|---|---|
| Profile | /api/twitter/user, tweets, replies | Basic info, content, interactions |
| Risk signals | /api/twitter/deleted-tweets, follower-events | Removed content, follow/unfollow patterns |
| Reply threads | /api/readx/tweet-detail-conversation-v2 | Primary comment source (fast, raw data) |
| Quote tweets | /api/readx/tweet-quotes | KOL commentary, community opinions with context |
| Engagement data | /api/readx/tweet-detail-v2 | Views/source — detect bot-inflation |
| Deleted content | /api/readx/tweet-results-by-ids | Batch fetch deleted tweet snapshots |
| Long-form | /api/readx/tweet-article | Technical analyses, roadmaps published as articles |
| Relationships | /api/readx/following-light, friendships-show | Inner circle, team relationship verification |
| Credibility | /api/twitter/kol-followers, /api/readx/user-verified-followers | Which credible accounts follow them (verified-followers needs user_id, not username) |
| Search | /api/twitter/search, /api/readx/search2 | Risk signals, disputes, community discussions |
GitHub Code (see github-analysis/SKILL.md)
Local script skills/scripts/github_analyze.js — no external API.
Call convertToMarkdown(url, options) or analyzeRepository(url, options).
On-chain Data (see onchain-audit/SKILL.md)
Binance API — address + chainName (uppercase: BSC/ETHEREUM/BASE/SOLANA):
| Endpoint | Description |
|---|---|
| /api/onchain/audit | Contract audit (dual-source) |
| /api/onchain/token-info | Token metadata and market dynamics |
| /api/onchain/wallet | Wallet positions (BSC/BASE/SOLANA only) |
| /api/onchain/token-search | Token search (requires keyword) |
Bitget API — chain + contract (lowercase: bnb/eth/base/sol):
| Endpoint | Description |
|---|---|
| /api/onchain-2/token-info | Token details |
| /api/onchain-2/token-price | Token price |
| /api/onchain-2/tx-info | Transaction statistics |
| /api/onchain-2/liquidity | Liquidity pool info |
| /api/onchain-2/security-audit | Security audit |
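Because the two on-chain sources spell chain names differently, a small normalizer avoids mismatched parameters. The sketch below pairs the names in the order they are listed (BSC/bnb, ETHEREUM/eth, BASE/base, SOLANA/sol); that pairing and the helper names are assumptions.

```javascript
// Binance-style names (uppercase) mapped to Bitget-style names (lowercase).
const BINANCE_TO_BITGET = {
  BSC: "bnb",
  ETHEREUM: "eth",
  BASE: "base",
  SOLANA: "sol",
};

// Chains the Binance wallet endpoint accepts, per its table row.
const WALLET_CHAINS = new Set(["BSC", "BASE", "SOLANA"]);

// Accept either spelling, in any case, and return the Binance-style chainName.
function toBinanceChain(chain) {
  const upper = String(chain).trim().toUpperCase();
  if (Object.prototype.hasOwnProperty.call(BINANCE_TO_BITGET, upper)) return upper;
  const hit = Object.entries(BINANCE_TO_BITGET).find(
    ([, bitget]) => bitget === upper.toLowerCase()
  );
  if (!hit) throw new Error(`unsupported chain: ${chain}`);
  return hit[0];
}

// Return the Bitget-style chain for either spelling.
function toBitgetChain(chain) {
  return BINANCE_TO_BITGET[toBinanceChain(chain)];
}

// True when /api/onchain/wallet can be queried for this chain.
function supportsWallet(chain) {
  return WALLET_CHAINS.has(toBinanceChain(chain));
}
```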
Onchain Explorer API — chain + address (see API_EXPLORER.md for full params):
| Endpoint | Chain | Description |
|---|---|---|
| /api/explorer/contract | ETH, BSC | Contract ABI, source code, compiler info, proxy detection |
| /api/explorer/token-history | ETH, BSC, SOL | Token transfer history with pagination |
| /api/explorer/sol-address | SOL | SOL/SPL balances + recent transfer records |
Website Content (see agent-browser/SKILL.md)
CLAWBOT uses agent-browser CLI to open and inspect websites.
Language Preference
Output language matches the user's input language; default Chinese (zh-CN). Raw API data (usernames, tickers, addresses, code) stays in original form.
Orchestration Flow
Callback-driven: each module's output triggers queries in other modules. Modules keep feeding each other until no new high-value leads remain.
User provides URL / Ticker / contract address + research intent
│
▼
Step 1 — Parse input, initialize entity queue
Extract: Twitter links, GitHub repos, contract addresses, tickers, chain
Aggregator URLs → extract entities from path (see rules below)
Initialize:
entity_queue = [{ entity, type, depth: 0 }]
processed = set()
claims = [] # official claims to verify later
fund_trace = [] # addresses to trace fund flow
team_members = [] # { handle, role, source }
MAX_DEPTH = 2
│
▼
Step 2 — Multi-module collection
While entity_queue is not empty:
pop → skip if processed or depth > MAX_DEPTH → route by type:
URL → 2a Website
Twitter → 2b Social
GitHub → 2c Code
Contract → 2d Chain
Ticker → 2d token-search first
After each module: extract new entities → queue at depth+1
(see Cross-module Callback Summary below for full routing)
── 2a. Website exploration ──────────────────────────────────
**Use `agent-browser` CLI** (see agent-browser/SKILL.md for commands).
agent-browser renders JS, captures interactive elements, and allows
clicking through pages — essential for DApp testing and dynamic sites.
Fallback to WebFetch only when agent-browser fails (e.g. install issue).
Visit pages in order:
Landing → Docs/Whitepaper → Team/About → DApp → Tokenomics → Footer
Extract from each page:
- Official claims → append to claims[] ("audited by X", "100M supply",
"decentralized", "LP locked", partnerships, etc.)
- Team names + social links → team_members[] + queue 2b
- Contract addresses → queue 2d
- GitHub repos → queue 2c
DApp proactive testing (key investigation step):
- Open DApp via agent-browser, wait for load
- Does the UI render real data or just a mock shell?
- Are core functions visible and interactive?
- Check network requests: broken APIs? Suspicious external calls?
- If DApp shows on-chain values → cross-check against 2d data
- Screenshot as evidence
Security check: SSL, domain age, redirects, suspicious popups.
Fallback: blank/Cloudflare → retry with `--headed`. No website → flag as risk.
── 2b. Social data collection (Twitter) ─────────────────────
Purpose: collect project claims, discover team, find community disputes.
NOT the investigation core — feeds into 2a/2c/2d for verification.
For project official account:
1. /api/twitter/user + tweets + replies + deleted-tweets (parallel)
2. Pick 1-2 high-value tweets → conversation-v2 + quotes
3. /api/readx/following-light → identify team members from following list
(mutual follows, bio mentions project, new account only posts about project)
→ add to team_members[], queue 2b at depth+1
4. Risk search: search2 "{project} scam OR rug OR hack OR exploit"
For team member accounts (depth 1+):
1. /api/twitter/user + tweets (parallel)
2. Only retain project-related tweets → append to claims[]
(team member statements carry same weight as official claims)
3. friendships-show with other known team members
(all isolated = fake team red flag)
── 2c. Code analysis (GitHub) ───────────────────────────────
github-analysis → analyzeRepository / convertToMarkdown
Focus: claim verification + security scan
- "Open source" → repo public? Code complete or stub?
- "Audited" → audit report in repo? Code matches?
- Hardcoded addresses (admin, treasury) → queue 2d + fund_trace[]
- Suspicious patterns: obfuscation, eval(), wallet-draining code,
backdoors, malicious dependencies, clipboard hijacking
- Contributor identities → try to resolve to Twitter handles → team_members[]
- Freshness: last commit, bus factor, fork-of-fork detection
── 2d. On-chain analysis (investigation core) ───────────────
Phase 1 — Token & contract basics (parallel):
Binance: audit, token-info, wallet
Bitget: token-info, token-price, tx-info, liquidity, security-audit
Cross-verify between sources.
Phase 2 — Contract deep inspection (ETH/BSC):
/api/explorer/contract → ABI + source code
- Read ABI: identify owner-only functions (pause, mint, blacklist,
upgrade, setFee, transferOwnership)
- If proxy contract: queue implementation address (recursive 2d)
- If source verified: scan for backdoor patterns in code
- If NOT verified: flag as risk (cannot audit)
Phase 3 — Fund flow tracing:
Triggered by: fund_trace[], deployer discovery, large holder detection
/api/explorer/token-history → trace address transaction history
Tracing logic (recursive within depth limit):
1. Fetch token-history for the address
2. Identify significant transfers:
- Large outflows to unknown wallets → trace recipient
- Inflows from deployer → insider?
- Flows to/from known exchanges → cash-out pattern?
- Circular flows (A→B→C→A) → wash trading?
3. For each significant counterparty:
- New address → add to fund_trace[] at depth+1
- Known exchange → note cash-out
- Mixer/bridge → flag as risk signal
4. Stop when: depth limit / no significant new flows
SOL specific:
- /api/explorer/sol-address → balance snapshot + SPL tokens
- /api/explorer/token-history (SOL) → filter by type/source
SWAP on Jupiter/Raydium = trading; TRANSFER = fund movement
│
▼
Step 3 — Verify claims & resolve contradictions
Goal: every official claim gets a verdict. Contradictions are the story.
If verification needs data not yet collected → callback to Step 2.
Process claims[] collected during Step 2:
| Claim | Verify with | How |
|-------|-------------|-----|
| "Decentralized" | Explorer ABI + on-chain | pause/mint/blacklist? EOA or multisig? |
| "Audited by X" | Website + GitHub + firm | Link valid? Code matches audited version? |
| "Max supply N" | Explorer source code | Uncapped mint()? Owner can mint? |
| "Locked liquidity" | On-chain LP lock | Lock verified? Duration? Amount? |
| "Open source" | GitHub + Explorer | Public? Verified? ABI matches? |
| "Partnerships" | Partner channels (browser) | Partner acknowledges? One-sided? |
Priority: verify claims affecting user funds first.
Mark each: ✅ Verified / ⚠️ Unverified / ❌ Contradicted
For each ❌ or anomaly → dispute analysis:
1. Project claim vs actual data (on-chain, code) → cite both
2. Community analysis → search2 + conversation threads
3. On-chain evidence → tx hashes, fund flow from fund_trace[]
4. Synthesize: claim → reality → community → verdict
🔴 → full analysis / 🟡 → summary only
│
▼
Step 4 — Hypothesis-driven deep dig
Follow high-value leads from Steps 2-3. May callback to any module.
Key hypotheses:
- Contract upgradable → who holds proxy admin?
- Large holder → tokens from deployer? Insider?
- Deleted tweets → timing vs on-chain events?
- Deployer has other contracts → same pattern? Previous rugs?
Team verification:
- Identity: Twitter vs website claims vs GitHub commits
- History: search2 "{name} founder OR CEO", wallet history
- Red flags: account age = project age? No pre-project history?
Any new lead → callback to Step 2 (respecting MAX_DEPTH).
Stop when: no new leads or sufficient for judgment.
─── END OF DATA COLLECTION ───
│
▼
Step 5 — Distill (no fetching)
Rank by impact. Discard noise. Connect dots. Reconstruct timeline.
│
▼
Step 6 — Produce report (see REPORT_TEMPLATE.md)
Curated intelligence, NOT a data dump. Focus on:
1. Contradictions & anomalies
2. Claim verification results
3. Fund flow analysis
4. Proactive test results (DApp, website)
5. Security findings
Omit routine confirmations. [[N]](url) citations required.
Language follows user input; default zh-CN.
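The circular-flow check in Phase 3 of step 2d (A→B→C→A) amounts to finding a directed cycle in the transfer graph. The sketch below is one possible way to do it, assuming transfers have already been reduced to `{ from, to }` pairs; those field names, `findCircularFlow`, and the `maxHops` bound are hypothetical and not part of the explorer API.

```javascript
// transfers: array of { from, to } records (illustrative shape, not the API's).
// Returns the first path that starts and ends at `start`, or null if none is found.
function findCircularFlow(transfers, start, maxHops = 4) {
  // Build an adjacency map: sender -> set of recipients.
  const next = new Map();
  for (const { from, to } of transfers) {
    if (from === to) continue; // ignore self-transfers
    if (!next.has(from)) next.set(from, new Set());
    next.get(from).add(to);
  }

  // Depth-first walk; `path` holds the addresses visited so far, starting with `start`.
  function walk(address, path) {
    if (path.length > maxHops) return null; // one more hop would exceed the bound
    for (const to of next.get(address) ?? []) {
      if (to === start) return [...path, to];
      if (path.includes(to)) continue; // already on this path, skip to avoid looping
      const found = walk(to, [...path, to]);
      if (found) return found;
    }
    return null;
  }

  return walk(start, [start]);
}
```

A returned path is only a lead: a cycle may be wash trading, or an ordinary round trip through a router, so it still needs the judgment described in Step 3.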
Cross-module Callback Summary
Each module feeds discoveries into other modules:
┌──────────┐     handles, claims     ┌──────────┐
│ Website  │ ──────────────────────→ │ Twitter  │
│  (2a)    │ ◀────────────────────── │  (2b)    │
└────┬─────┘    URLs from tweets     └────┬─────┘
     │ contracts, repos                   │ addresses, accusations
     ▼                                    ▼
┌──────────┐     hardcoded addrs     ┌──────────┐
│ GitHub   │ ──────────────────────→ │ On-chain │
│  (2c)    │ ◀── code vs claims ──── │  (2d)    │
└──────────┘                         └────┬─────┘
                                          │ recursive
                                          ▼
                                     (2d again)
| Source | Discovers | Triggers |
|---|---|---|
| Website | Twitter handles, claims, contracts, repos | → team_members[]/claims[]/2b/2c/2d |
| Twitter | URLs, addresses, accusations, team members, statements | → 2a/2d/fund_trace[]/claims[]/team_members[] |
| GitHub | Contributors, hardcoded addrs, code contradictions, trojans | → team_members[]/2d/fund_trace[]/claims[] |
| On-chain | Proxy impl, deployer contracts, large holders, data contradictions | → 2d recursive/fund_trace[]/claims[] |
Depth control: 0 = user input → 1 = discovered → 2 = max, high-value only → beyond: note only
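The queue loop from Step 2 combined with this depth rule can be sketched as follows. `collect` stands in for the real module calls and `highValue` is a hypothetical flag set by whoever discovers the entity; neither name comes from the skill itself.

```javascript
const MAX_DEPTH = 2;

// Route table from Step 2; module labels are plain strings in this sketch.
const ROUTES = { url: "2a", twitter: "2b", github: "2c", contract: "2d", ticker: "2d" };

// seeds: [{ entity, type }] from user input.
// collect(entity, type, module) returns newly discovered { entity, type, highValue? } items.
function runQueue(seeds, collect) {
  const queue = seeds.map((seed) => ({ ...seed, depth: 0 }));
  const processed = new Set();
  const visited = [];   // entities that were actually sent to a module
  const notedOnly = []; // discoveries past the depth rule: recorded, never fetched

  while (queue.length > 0) {
    const item = queue.shift();
    const key = `${item.type}:${item.entity}`;
    if (processed.has(key)) continue;

    // Depth 2 is processed only for high-value leads; anything deeper is noted only.
    const tooDeep = item.depth > MAX_DEPTH;
    const lowValueAtMax = item.depth === MAX_DEPTH && !item.highValue;
    if (tooDeep || lowValueAtMax) {
      notedOnly.push(item);
      continue;
    }

    processed.add(key);
    const module = ROUTES[item.type];
    visited.push({ ...item, module });
    for (const found of collect(item.entity, item.type, module)) {
      queue.push({ ...found, depth: item.depth + 1 });
    }
  }
  return { visited, notedOnly };
}
```

The `processed` set is what keeps the callbacks between modules from looping: an entity rediscovered by another module is skipped instead of being fetched again.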
Failure Handling
| Failure type | Action |
|---|---|
| Timeout / 502-504 | Retry once after 3s |
| 429 (rate limit) | Retry once after Retry-After or 10s |
| 401 / 403 / 400 | Do not retry; skip |
| Other errors | Do not retry; skip |
On failure: skip source, continue. Include Data Coverage note in report. Omit sections with no data; never halt for a single failure.
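The table above maps directly onto a small policy function. This is a sketch under two assumptions: a timeout is represented by the string `"timeout"`, and `Retry-After` has already been parsed into seconds. `retryPolicy` and `requestOnce` are hypothetical names.

```javascript
// Returns { retry, delayMs } for a failed request, following the failure table.
function retryPolicy(status, retryAfterSeconds) {
  if (status === "timeout" || (status >= 502 && status <= 504)) {
    return { retry: true, delayMs: 3000 };
  }
  if (status === 429) {
    const usable = Number.isFinite(retryAfterSeconds) && retryAfterSeconds >= 0;
    return { retry: true, delayMs: (usable ? retryAfterSeconds : 10) * 1000 };
  }
  // 400 / 401 / 403 and every other error: do not retry.
  return { retry: false, delayMs: 0 };
}

// doRequest() resolves to { ok, status, retryAfterSeconds? }.
// Resolves to the successful response, or null when the source should be skipped.
async function requestOnce(doRequest, sleep = (ms) => new Promise((r) => setTimeout(r, ms))) {
  const first = await doRequest();
  if (first.ok) return first;
  const { retry, delayMs } = retryPolicy(first.status, first.retryAfterSeconds);
  if (!retry) return null;
  await sleep(delayMs);
  const second = await doRequest(); // at most one retry
  return second.ok ? second : null;
}
```

Returning `null` instead of throwing keeps the run going; the caller records the gap for the Data Coverage note.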
Entity Extraction Rules
| Entity Type | Identification |
|---|---|
| Twitter profile | x.com/{username} or twitter.com/{username} |
| Twitter post | x.com/{username}/status/{id} |
| GitHub repo | github.com/{owner}/{repo} |
| EVM contract | 0x + 40 hex chars |
| Solana address | base58 32–44 chars + contextual keywords (below) |
| Ticker | $XXX or ticker/symbol/token: XXX |
| Chain | URL domain / path keywords / page text |
Solana keywords (at least one must be present):
solana, sol, raydium, jupiter, orca, meteora, pump.fun,
moonshot, birdeye, solscan, solana.fm, spl token, program id
No keyword → flag as "unresolved address".
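A rough sketch of the address rules above, including the Solana keyword gate. The regular expressions are approximations (a base58 string of 32 to 44 characters is only a candidate), and the function names and result labels are hypothetical.

```javascript
const SOLANA_KEYWORDS = [
  "solana", "sol", "raydium", "jupiter", "orca", "meteora", "pump.fun",
  "moonshot", "birdeye", "solscan", "solana.fm", "spl token", "program id",
];

const EVM_ADDRESS = /\b0x[0-9a-fA-F]{40}\b/g;
// Base58 alphabet excludes 0, O, I and l.
const BASE58_CANDIDATE = /\b[1-9A-HJ-NP-Za-km-z]{32,44}\b/g;

// True when at least one keyword appears as a whole token, so "sol" does not match "solution".
function hasSolanaContext(text) {
  const lower = text.toLowerCase();
  return SOLANA_KEYWORDS.some((keyword) => {
    const escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    return new RegExp(`(?<![a-z0-9])${escaped}(?![a-z0-9])`).test(lower);
  });
}

// Classify every address-like string found in the text.
function extractAddresses(text) {
  const evm = text.match(EVM_ADDRESS) ?? [];
  const base58 = text.match(BASE58_CANDIDATE) ?? [];
  const solanaType = hasSolanaContext(text) ? "solana_address" : "unresolved_address";
  return [
    ...evm.map((address) => ({ type: "evm_contract", address })),
    ...base58.map((address) => ({ type: solanaType, address })),
  ];
}
```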
Aggregator URL Parsing
| Platform | Path | Parsed result |
|---|---|---|
| clawhub.ai | /owner/repo | → GitHub repo (use github-analysis, skip browser) |
| dexscreener.com | /chain/address | → contract + chain |
| dextools.io | /app/chain/pair/address | → contract + chain |
| pump.fun | /address | → Solana contract |
| gmgn.ai | /chain/address | → contract + chain |
| birdeye.so | /token/address | → contract |
| defined.fi | /chain/address | → contract + chain |
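The parsing table can be sketched with the standard `URL` class. The path handling follows the table literally; in particular, reading the dextools.io address from the last path segment and the chain from the segment after `app` is an interpretation, and `parseAggregatorUrl` is a hypothetical name.

```javascript
// Hosts whose path is /{chain}/{address}.
const CHAIN_THEN_ADDRESS = new Set(["dexscreener.com", "gmgn.ai", "defined.fi"]);

// Returns a parsed entity, or null when the URL is not a known aggregator pattern.
function parseAggregatorUrl(raw) {
  const url = new URL(raw);
  const host = url.hostname.replace(/^www\./, "");
  const seg = url.pathname.split("/").filter(Boolean);

  if (host === "clawhub.ai" && seg.length >= 2) {
    return { type: "github_repo", owner: seg[0], repo: seg[1] };
  }
  if (CHAIN_THEN_ADDRESS.has(host) && seg.length >= 2) {
    return { type: "contract", chain: seg[0], address: seg[1] };
  }
  if (host === "dextools.io" && seg[0] === "app" && seg.length >= 4) {
    return { type: "contract", chain: seg[1], address: seg[seg.length - 1] };
  }
  if (host === "pump.fun" && seg.length >= 1) {
    return { type: "contract", chain: "solana", address: seg[seg.length - 1] };
  }
  if (host === "birdeye.so" && seg[0] === "token" && seg.length >= 2) {
    return { type: "contract", address: seg[1] }; // chain is not in the path
  }
  return null;
}
```

A `null` result means the URL should be treated as an ordinary website and routed to 2a.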
Data Display Rules
- Skip any metric that returned an error or timed out — leave it out entirely.
- Do not display API latency unless it was actually measured successfully.
Local Memory & Report Storage
- Save report as PDF to `~/.crab-catch/reports/{project_name}_{YYYY-MM-DD}.pdf`
- Maintain index `~/.crab-catch/reports/index.json`: `{ "project": "name", "date": "YYYY-MM-DD", "file": "filename.pdf", "entry": "original input" }`
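A sketch of building the file name and index entry. It assumes `index.json` holds an array of such entries and that dates are taken in UTC; both are assumptions, as are the helper names and the character set replaced in the file name.

```javascript
// Format a Date as YYYY-MM-DD (UTC is a choice of this sketch).
function isoDay(date) {
  return date.toISOString().slice(0, 10);
}

// Build the index entry for one report; the file name follows {project_name}_{YYYY-MM-DD}.pdf.
function reportRecord(projectName, date, originalInput) {
  // Replace whitespace and characters that are unsafe in file names; other text is kept as-is.
  const safeName = projectName.trim().replace(/[\/\\:*?"<>|\s]+/g, "_");
  const day = isoDay(date);
  return {
    project: projectName,
    date: day,
    file: `${safeName}_${day}.pdf`,
    entry: originalInput,
  };
}

// Return a new index with `record` added, replacing any entry that has the same file name.
function upsertIndex(index, record) {
  return [...index.filter((existing) => existing.file !== record.file), record];
}
```

Re-running a report for the same project on the same day then overwrites its index entry instead of duplicating it.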
Report Output
Use REPORT_TEMPLATE.md as the report structure.
Report philosophy: curated intelligence, not data dump
The report should be concise and decision-oriented. The reader wants to know: is this project trustworthy? What are the risks? Where do the claims fall apart?
Five pillars of the report (in order of importance):
- Contradictions & anomalies — where different sources tell different stories. This is the most valuable content. Twitter says X, website says Y, on-chain shows Z.
- Claim verification — systematic test of every official statement. What the project claims vs what the code/chain actually shows.
- Fund flow analysis — where the money goes. Deployer → holders → exchanges. Insider patterns, circular flows, cash-outs.
- Proactive testing — DApp functionality, website integrity, code security. Does the product work? Is the website legit? Are there backdoors in the code?
- Security findings — contract risks, code trojans, permission hazards. ABI dangerous functions, proxy patterns, obfuscated code.
What to omit: routine data that confirms nothing special. If a metric is normal, don't list it. If a claim checks out cleanly, a single ✅ row is enough — no paragraph. Only expand on findings that change the reader's decision.
Section constraints
Must keep — always present, fixed order:
- Header (project name + timestamp)
- 📌 Basic Information (flexible rows — agent adds/removes based on data, no fixed schema)
- 🧠 Core Findings (with Executive Summary)
- 📝 Conclusion & Verdict
- 📂 References
Default keep — user can request to skip:
- 🛡️ Verification & Cross-Reference (Claim / Contradictions / Disputes / Gaps)
- ⚠️ Risk Warning
Data-dependent — skip if no data:
- 📊 Deep Dive
- 👤 Team & Key Figures
- 💻 GitHub Analysis
- ⛓️ On-chain Security
- 📈 Social Signals
- 📅 Project Timeline
Formatting rules
Citation system (mandatory, like academic papers):
- Every factual claim MUST have a `[[N]](url)` citation
- No source = mark as ⚠️ Unverified, NOT stated as fact
- Sequential numbering, first appearance order
- Bidirectional: every `[[N]]` ↔ References entry
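The numbering and bidirectional rules can be checked mechanically before a report is saved. The sketch below assumes the References section has already been parsed into a list of numbers; `checkCitations` and its result fields are hypothetical names.

```javascript
// body: report text containing [[N]](url) citations.
// referenceNumbers: the N values listed in the References section.
function checkCitations(body, referenceNumbers) {
  const cited = [...body.matchAll(/\[\[(\d+)\]\]\(([^)\s]+)\)/g)].map((m) => Number(m[1]));
  const firstSeen = [...new Set(cited)]; // unique numbers in order of first appearance
  const refs = new Set(referenceNumbers);
  return {
    // Numbers must run 1, 2, 3, ... in the order they first appear.
    sequential: firstSeen.every((n, i) => n === i + 1),
    // Cited in the body but absent from References.
    missingFromReferences: firstSeen.filter((n) => !refs.has(n)),
    // Listed in References but never cited in the body.
    neverCited: referenceNumbers.filter((n) => !firstSeen.includes(n)),
  };
}
```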
Other:
- Numbers: K / M / B; prices: `$` prefix
- Highlight high-risk signals (honeypot, high tax, upgradable contracts)
- Data Coverage note when sources unavailable
- DYOR disclaimer
- Output language matches user input; default zh-CN
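One possible reading of the K / M / B and `$` rules, as a sketch. Two decimal places is an arbitrary choice, and very small token prices (below 0.01) would need a different precision rule than shown here. `compact`, `trimNumber`, and `price` are hypothetical names.

```javascript
// Round to two decimals and drop trailing zeros.
function trimNumber(n) {
  return String(Number(n.toFixed(2)));
}

// Abbreviate large numbers with K / M / B.
function compact(value) {
  const abs = Math.abs(value);
  const units = [[1e9, "B"], [1e6, "M"], [1e3, "K"]];
  for (const [size, suffix] of units) {
    if (abs >= size) return `${trimNumber(value / size)}${suffix}`;
  }
  return trimNumber(value);
}

// Prices get a $ prefix in front of the abbreviated number.
function price(value) {
  return `$${compact(value)}`;
}
```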