arcfetch

Use when working with the arcfetch CLI for URL fetching, article extraction, and cache management. Triggers on fetching URLs, batch processing, managing cached references, promoting or deleting content, extracting or fetching links, or integrating arcfetch into automation pipelines.

Install skill "arcfetch" with this command: npx skills add briansunter/arcfetch/briansunter-arcfetch-arcfetch

arcfetch

Guide for using the arcfetch CLI to fetch web content, extract articles as clean markdown, and manage cached references.

Overview

arcfetch converts web pages to clean markdown with:

  • Automatic Playwright fallback when simple HTTP fetch produces low-quality results
  • Quality scoring (0-100) with boilerplate/error page/login wall/paywall detection
  • Content-to-source ratio analysis to catch JS-rendered or gated content
  • Measures against bot detection (stealth plugin, viewport/timezone rotation, realistic headers)

Installation

# Use via bunx (no install needed)
bunx arcfetch <command>

# Or install globally
bun install -g arcfetch

Commands

fetch

Fetch a URL and save extracted markdown to the temp folder.

arcfetch fetch <url> [options]

Options:

  • -q, --query <text> - Search query (saved as metadata)
  • -o, --output <format> - Output format:
    • text - Plain text, LLM-friendly (default)
    • json - Structured JSON
    • path - Just the filepath
    • summary - slug|filepath format
  • --pretty - Human-friendly output with emojis
  • -v, --verbose - Show detailed output (quality scores, pipeline decisions)
  • --min-quality <n> - Minimum quality score 0-100 (default: 60)
  • --temp-dir <path> - Temp folder (default: .tmp/arcfetch)
  • --docs-dir <path> - Docs folder (default: docs/ai/references)
  • --wait-strategy <mode> - Playwright wait: networkidle (default), domcontentloaded, load
  • --force-playwright - Skip simple fetch, use Playwright directly
  • --refetch - Re-fetch even if URL already cached

Examples:

# Basic fetch (plain text output for LLMs)
arcfetch fetch https://example.com/article

# Get just the filepath
arcfetch fetch https://example.com -o path

# Human-friendly output
arcfetch fetch https://example.com --pretty

# JSON output for scripting
arcfetch fetch https://example.com -o json

# With search query metadata
arcfetch fetch https://example.com -q "search term"

# Verbose mode to see pipeline decisions
arcfetch fetch https://example.com -v

# Force Playwright for JS-heavy sites
arcfetch fetch https://example.com --force-playwright
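The summary format is convenient in shell scripts because the slug and the filepath can be split on the | separator. A minimal sketch, assuming the output is a single slug|filepath line as described in the options; the helper names summary_slug and summary_path are hypothetical, and the sample line is made up for illustration:

```shell
# Hypothetical helpers: split a "slug|filepath" summary line.
# Assumes exactly the slug|filepath shape described for -o summary.
summary_slug() {
  # Everything before the first "|"
  printf '%s\n' "${1%%|*}"
}

summary_path() {
  # Everything after the first "|"
  printf '%s\n' "${1#*|}"
}

# Illustrative value; in practice: line=$(arcfetch fetch <url> -o summary)
line='example-guide|.tmp/arcfetch/example-guide.md'
summary_slug "$line"
summary_path "$line"
```

The slug can then be passed to commands that take a <ref-id>, such as promote or delete.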

Quality Pipeline:

  1. Simple HTTP fetch with browser-like User-Agent
  2. Extract content with Readability + Turndown
  3. Quality score (0-100) with boilerplate detection
  4. Score >= 85: accept as-is
  5. Score 60-84: try Playwright, use whichever scores higher
  6. Score < 60: require Playwright, fail if still below threshold
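The routing in steps 4 to 6 can be sketched as a small function. This is an illustration of the thresholds listed above, not arcfetch's actual code; the function name next_step and its return labels are hypothetical:

```shell
# Sketch of the routing decision in steps 4-6 (not arcfetch's actual code).
# $1 = score from the simple fetch
# $2 = minimum passing score (60 above)
# $3 = accept-as-is threshold (85 above)
next_step() {
  score=$1
  min=${2:-60}
  accept=${3:-85}
  if [ "$score" -ge "$accept" ]; then
    echo "accept"              # good enough, skip Playwright
  elif [ "$score" -ge "$min" ]; then
    echo "compare-playwright"  # retry with Playwright, keep the higher score
  else
    echo "require-playwright"  # simple result unusable; Playwright must pass
  fi
}

next_step 92
next_step 70
next_step 40
```

Running with -v on a real fetch shows which branch the pipeline took.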

list

List all cached references.

arcfetch list [options]

Options:

  • -o, --output <format> - Output format: text, json
  • --pretty - Human-friendly output

Examples:

arcfetch list --pretty
arcfetch list -o json

links

Extract all links from a cached reference.

arcfetch links <ref-id> [options]

Options:

  • -o, --output <format> - Output format: text, json
  • --pretty - Human-friendly output

Examples:

arcfetch links my-article --pretty
arcfetch links my-article -o json

fetch-links

Fetch all links from a cached reference (parallel, max 5 concurrent).

arcfetch fetch-links <ref-id> [options]

Options:

  • --refetch - Force re-fetch even if already cached
  • -o, --output <format> - Output format: text, json
  • --pretty - Human-friendly output

Examples:

arcfetch fetch-links my-article --pretty

promote

Move a reference from the temp folder to the permanent docs folder.

arcfetch promote <ref-id> [options]

Examples:

arcfetch promote my-article --pretty
arcfetch promote my-article -o json

delete

Delete a cached reference.

arcfetch delete <ref-id> [options]

Examples:

arcfetch delete my-article --pretty

config

Show current configuration.

arcfetch config

mcp

Start the MCP server (for Claude Code integration).

arcfetch mcp

Workflow Patterns

Single Article

arcfetch fetch https://example.com/guide --pretty
cat .tmp/arcfetch/example-guide.md  # Review content
arcfetch promote example-guide      # Move to docs if good

Batch Fetch

for url in "url1" "url2" "url3"; do
  arcfetch fetch "$url" --pretty
done
arcfetch list --pretty         # Review all
arcfetch promote my-article    # Promote desired ones

Fetch All Links from a Page

arcfetch fetch https://example.com/resources --pretty
arcfetch links resources --pretty           # See what links exist
arcfetch fetch-links resources --pretty     # Fetch them all

Scripting with JSON Output

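# Fetch once and keep the JSON result for later queries
RESULT=$(arcfetch fetch https://example.com -o json)
REF_ID=$(echo "$RESULT" | jq -r '.refId')
# Fall back to 0 and round down so the integer comparison below is safe
QUALITY=$(echo "$RESULT" | jq -r '(.quality // 0) | floor')

# Promote only results that pass the accept-as-is threshold
if (( QUALITY >= 85 )); then
  arcfetch promote "$REF_ID"
fi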

Cleanup

arcfetch list --pretty
arcfetch delete unwanted-ref

Configuration

Priority Order

  1. CLI arguments
  2. Environment variables
  3. arcfetch.config.json
  4. .arcfetchrc / .arcfetchrc.json
  5. Built-in defaults
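The priority order amounts to "the first source that provides a value wins". A minimal sketch of that rule, assuming each source yields either a value or an empty string; the helper first_set and the variable names are hypothetical, and this is not how arcfetch itself loads configuration:

```shell
# Hypothetical helper: print the first non-empty argument, mirroring the
# priority order above (CLI argument, environment variable, config file
# value, built-in default).
first_set() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Illustrative inputs: no CLI flag, env var set to 70, config file says 65.
cli_min=""
env_min="70"
file_min="65"
first_set "$cli_min" "$env_min" "$file_min" "60"
```

Here the environment value is printed because no CLI argument was given; passing a non-empty first argument would override it.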

Config File (arcfetch.config.json)

{
  "quality": {
    "minScore": 60,
    "jsRetryThreshold": 85
  },
  "paths": {
    "tempDir": ".tmp/arcfetch",
    "docsDir": "docs/ai/references"
  },
  "playwright": {
    "timeout": 30000,
    "waitStrategy": "networkidle"
  }
}

Environment Variables

export SOFETCH_MIN_SCORE=60
export SOFETCH_TEMP_DIR=".tmp/arcfetch"
export SOFETCH_DOCS_DIR="docs/ai/references"

Quality Scoring

The score starts at 100, and each failed check subtracts the deduction shown:

| Check | Deduction | Severity |
|---|---|---|
| Blank content | Score = 0 | Issue |
| Content < 50 chars | -50 | Issue |
| Content < 300 chars | -15 | Warning |
| HTML tags > 100 | -40 | Issue |
| HTML tags > 50 | -20 | Warning |
| HTML tags > 10 | -5 | Warning |
| HTML ratio > 30% | -25 | Issue |
| HTML ratio > 15% | -10 | Warning |
| Table tags > 50 | -30 | Issue |
| Script tags present | -15 | Warning |
| Style tags present | -10 | Warning |
| Extraction ratio < 0.5% (from large page) | -35 | Issue |
| Extraction ratio < 2% (from large page) | -20 | Warning |
| Boilerplate detected (error/login/paywall) | -40 | Issue |
| Excessive newlines | -5 | Warning |
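To make the arithmetic concrete, here is a sketch that applies only the content-length and HTML-tag deductions listed above. It rests on several assumptions that the source does not confirm: within one family of checks only the most severe tier applies, "blank" means zero characters, and the result is floored at 0. The function name sketch_score is hypothetical:

```shell
# Sketch only: applies the content-length and HTML-tag deductions.
# Assumption: within one check family only the most severe tier applies.
# $1 = extracted content length in characters, $2 = number of HTML tags
sketch_score() {
  length=$1
  tags=$2
  if [ "$length" -eq 0 ]; then
    echo 0                       # blank content: score is 0 outright
    return 0
  fi
  score=100
  if [ "$length" -lt 50 ]; then
    score=$((score - 50))
  elif [ "$length" -lt 300 ]; then
    score=$((score - 15))
  fi
  if [ "$tags" -gt 100 ]; then
    score=$((score - 40))
  elif [ "$tags" -gt 50 ]; then
    score=$((score - 20))
  elif [ "$tags" -gt 10 ]; then
    score=$((score - 5))
  fi
  # Assumed floor so stacked deductions never go negative
  if [ "$score" -lt 0 ]; then
    score=0
  fi
  echo "$score"
}

sketch_score 5000 3    # long, clean content: no deductions
sketch_score 200 60    # short content with leftover tags: 100 - 15 - 20
```

Under these assumptions the second call yields 65, which would pass the default minimum of 60 but still trigger a Playwright comparison.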

Boilerplate patterns detected (on short content < 2000 chars):

  • Error pages: "something went wrong", "an error occurred", "unexpected error"
  • 404 pages: "page not found", "404 not found"
  • Login walls: "log in to continue", "please log in", "sign in to continue"
  • Paywalls: "subscribe to continue reading"
  • Bot detection: "are you a robot", "complete the captcha", "verify you are human"
  • Access denied, JS-required, unsupported browser pages
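A check of this kind can be approximated with grep. The sketch below uses the phrases quoted in the list above; the matching method (case-insensitive substring search) and the function name looks_like_boilerplate are assumptions, not arcfetch's implementation:

```shell
# Phrases copied from the list above, one per line.
boilerplate_phrases='something went wrong
an error occurred
unexpected error
page not found
404 not found
log in to continue
please log in
sign in to continue
subscribe to continue reading
are you a robot
complete the captcha
verify you are human'

# Sketch: succeed when short content contains one of the phrases.
looks_like_boilerplate() {
  content=$1
  # Patterns only apply to short content (< 2000 chars)
  [ "${#content}" -lt 2000 ] || return 1
  # -F: fixed strings, -i: ignore case, -q: exit status only
  printf '%s' "$content" | grep -qiF "$boilerplate_phrases"
}

if looks_like_boilerplate "Oops! Something went wrong."; then
  echo "boilerplate"
fi
```

The length guard matters: a long article that merely mentions "page not found" in passing is not flagged.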

Score thresholds:

  • >= 90: Excellent
  • >= 75: Good
  • >= 60: Acceptable (minimum to pass)
  • < 60: Poor (rejected)
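In scripts, the thresholds above map directly onto a chain of comparisons. A minimal sketch, with the hypothetical helper name score_label and an integer score assumed:

```shell
# Map an integer score to the labels listed above.
score_label() {
  if [ "$1" -ge 90 ]; then
    echo "Excellent"
  elif [ "$1" -ge 75 ]; then
    echo "Good"
  elif [ "$1" -ge 60 ]; then
    echo "Acceptable"
  else
    echo "Poor"
  fi
}

score_label 92
score_label 61
```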

MCP Server

The MCP server exposes 6 tools for Claude Code integration:

| Tool | Description |
|---|---|
| fetch_url | Fetch URL, extract markdown, save to temp |
| list_cached | List all cached references |
| promote_reference | Move temp reference to docs folder |
| delete_cached | Delete a cached reference |
| extract_links | Extract links from a cached reference |
| fetch_links | Fetch all links from a cached reference |

Start via CLI: arcfetch mcp

Alternatively, configure Claude Code's MCP settings to run bunx arcfetch mcp as a stdio server.

File Format

Cached files use markdown with YAML frontmatter:

---
title: "Article Title"
source_url: https://example.com/article
fetched_date: 2026-02-06
type: web
status: temporary
query: "optional search query"
---

# Article Title

Extracted markdown content...

  • Ref IDs are slugified titles (e.g., how-to-build-react-apps)
  • Temp storage: .tmp/arcfetch/<slug>.md (status: temporary)
  • Permanent storage: docs/ai/references/<slug>.md (status: permanent, after promote)
  • Duplicate detection: re-fetching the same URL returns the existing ref unless --refetch is passed
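Because the frontmatter is plain key: value text, a script can read a field such as status without a YAML parser. A minimal sketch, assuming single-line fields between the first two --- markers as in the sample above; the helper name frontmatter_field is hypothetical:

```shell
# Hypothetical helper: print one frontmatter field from a cached file.
# $1 = path to the markdown file, $2 = field name (e.g. status)
# Assumes simple "key: value" lines between the first two "---" markers.
frontmatter_field() {
  awk -v key="$2" '
    /^---[[:space:]]*$/ { fence++; next }      # count the --- delimiters
    fence >= 2 { exit }                         # past the frontmatter: stop
    fence == 1 && index($0, key ": ") == 1 {    # line starts with "key: "
      value = substr($0, length(key) + 3)
      gsub(/^"|"$/, "", value)                  # drop surrounding quotes
      print value
      exit
    }
  ' "$1"
}
```

For example, frontmatter_field .tmp/arcfetch/example-guide.md status would print temporary for a reference that has not been promoted yet.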
