jina-web-fetch

Fetch webpage text with a normal HTTP request first, then automatically fall back to jina.ai proxy when direct access fails or returns login/JS-blocked content. Use when extracting content from X (Twitter) or other pages that are hard to read directly.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "jina-web-fetch" with this command: npx skills add xiaojiongqian/skills-hub/xiaojiongqian-skills-hub-jina-web-fetch

Jina Web Fetch

Use this skill to capture content from hard-to-fetch pages while keeping a deterministic workflow:

  1. Try direct fetch first.
  2. If direct fetch fails or looks blocked, retry via jina.ai.

Quick Start

bash "<path-to-skill>/scripts/fetch_with_jina_fallback.sh" "<url>" "<output_file>"

Example:

bash "<path-to-skill>/scripts/fetch_with_jina_fallback.sh" \
  "https://x.com/trq212/status/2027463795355095314" \
  "raw/x-status.txt"

The script prints source=direct or source=jina to stderr so you can see which path was used.

If output_file is omitted, content is printed to stdout.

Default Workflow

  1. Run the script with the target URL.
  2. Save raw output under a traceable path like raw/<slug>.txt.
  3. Parse extracted text/markdown for:
    • main body
    • media links (images/videos)
    • referenced URLs
  4. Keep original URL + raw capture together for auditability.

Blocking Heuristics

The script auto-falls back to jina.ai when direct content looks like:

  • login wall / sign-up prompt
  • JS-required page
  • anti-bot / captcha / access denied page
  • very small shell-like HTML page (default threshold < 800 bytes)

Environment Knobs

  • FETCH_TIMEOUT (default 25)
  • FETCH_CONNECT_TIMEOUT (default 10)
  • FETCH_MIN_BYTES (default 800)
  • JINA_FORCE=1 to skip direct fetch and always use jina.ai

URL Format Note

Fallback URL is built as: https://r.jina.ai/http://<original-host-and-path>

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

git-pr-merge

No summary provided by upstream source.

Repository SourceNeeds Review
General

playwright-mcp

No summary provided by upstream source.

Repository SourceNeeds Review
General

gh-fix-ci

No summary provided by upstream source.

Repository SourceNeeds Review
General

gh-address-comments

No summary provided by upstream source.

Repository SourceNeeds Review