cloudflare-tomarkdown

Convert URLs, images, PDFs, and documents to clean Markdown using Cloudflare APIs. Scraping tool with image AI summarization and JS-rendering fallback.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "cloudflare-tomarkdown" with this command: npx skills add mderazon/agent-skills/mderazon-agent-skills-cloudflare-tomarkdown

Cloudflare Markdown Conversion

Use this skill to convert URLs or local files (PDFs, Images, HTML, CSV, Office docs) into clean, structured Markdown for text analysis, RAG, and LLMs.

Features & Supported Formats

  • Scraping URLs: Extracts HTML, resolves relative links, handles JSON-LD, extracts title/description.
  • Images: Runs object detection and uses an LLM (gemma-3-12b-it) to generate image descriptions. Converts SVG to raster.
  • PDFs: Parses internal StructTree tagging for high-fidelity semantic Markdown extraction.
  • Office Docs: Supports .docx, .xlsx, .csv, .ods, .odt, and more.

Usage

Setup & Authentication

This skill requires CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_API_TOKEN.

Automatic Setup: The script looks for a .env file in the current directory and then in its parent directories, so a .env file in your project root works without further configuration.

Manual Setup: Alternatively, export both variables in your shell or pass them as parameters (--account and --token).

Instruction for the Agent: If the skill fails due to missing environment variables, check if a .env file exists in the workspace root.
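A small pre-flight check can make the missing-variable case easier to diagnose before running the script. The sketch below is not part of the skill: check_cf_env is a hypothetical helper name, and it only reports which of the two required variables are unset in the current shell (it does not read .env files).

```shell
# Hypothetical helper (not part of the skill): report which required
# Cloudflare variables are unset or empty in the current shell.
check_cf_env() {
  missing=""
  for name in CLOUDFLARE_ACCOUNT_ID CLOUDFLARE_API_TOKEN; do
    # Indirect variable lookup that also works in plain POSIX sh
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing:$missing" >&2
    return 1
  fi
  return 0
}
```

A possible use is `check_cf_env || echo "Check for a .env file in the workspace root"` before invoking scripts/render.js.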

Scraping a URL

# Basic usage (defaults to the 'auto' method: Workers AI tomarkdown first, then Browser Rendering)
node scripts/render.js --url "https://example.com"

Scraping with Options (CSS Selectors, etc.)

The html conversion options accept a cssSelector, which restricts extraction to the matching elements, and a hostname, which is used when resolving relative links.

# Only extract the main content container
node scripts/render.js --url "https://developer.cloudflare.com" \
  --options '{"html": {"cssSelector": "main.content"}}'

Converting a Local File (PDFs, Images, Office Docs)

node scripts/render.js --file "report.pdf"

Converting Images with Language Options

Image descriptions are generated via AI. You can specify a desired output language for the description (en, it, de, es, fr, pt).

node scripts/render.js --file "cat.jpeg" \
  --options '{"image": {"descriptionLanguage": "es"}}'

Advanced Options for JS-Heavy Sites

If a site depends on heavy JavaScript rendering or redirects, use the browser method with explicit wait conditions.

# Wait for network to be idle before extracting content
node scripts/render.js --url "https://complex-site.com" --wait "networkidle2"

# Wait for a specific element to appear (e.g. price or main content)
node scripts/render.js --url "https://shop.com/prod" --selector ".product-price"

# Increase timeout for slow pages (in milliseconds)
node scripts/render.js --url "https://slow-site.com" --timeout 60000

Valid --wait options are: load, domcontentloaded (default), networkidle0, and networkidle2.
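When the --wait value comes from user input or another tool, it can help to reject unknown values before launching a browser render. The following is a minimal sketch; is_valid_wait is a hypothetical helper, not a function provided by the skill, and it simply mirrors the four values listed above.

```shell
# Hypothetical helper: accept only the four documented --wait values.
is_valid_wait() {
  case "$1" in
    load|domcontentloaded|networkidle0|networkidle2) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_valid_wait "$WAIT" || WAIT="domcontentloaded"` would fall back to the documented default when the value is not recognized (WAIT is an illustrative variable name).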

How It Works

With --method auto, the script tries two conversion paths in order:

  1. Workers AI tomarkdown (Primary): Suited to documents and standard web pages, including JSON-LD structured data extraction and relative-link resolution. Requests are sent as multipart form data.
  2. Browser Rendering API (Fallback): If the page relies on complex JavaScript (e.g. Single Page Apps) and the AI path cannot see the content, the Browser Rendering engine loads the page in a real headless browser before converting it.
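The two-path flow above can be sketched as a generic "primary, then fallback" wrapper. This is an illustration only, not the logic in scripts/render.js: convert_with_fallback and the converter commands passed to it are hypothetical names, and treating whitespace-only output as "content not visible" is an assumption about how such a check might work.

```shell
# Sketch of a "primary, then fallback" flow. The two converter commands are
# passed in by name; they are placeholders, not the skill's real functions.
convert_with_fallback() {
  primary_cmd="$1"
  fallback_cmd="$2"
  url="$3"

  # Run the primary path; a non-zero exit is treated like empty output.
  output="$("$primary_cmd" "$url" 2>/dev/null)" || output=""

  # Assumption: output with no non-whitespace characters means the
  # primary path could not see the page content.
  if [ -n "$(printf '%s' "$output" | tr -d '[:space:]')" ]; then
    printf '%s\n' "$output"
    return 0
  fi

  # Otherwise hand the same URL to the fallback path.
  "$fallback_cmd" "$url"
}
```

Passing the converters as arguments keeps the wrapper independent of how each path is actually invoked, so either side can be swapped for a real command.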

Calling the REST API Directly (Advanced)

If you'd prefer not to use scripts/render.js, here is the curl equivalent for a local file using the tomarkdown REST API:

curl "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/ai/tomarkdown" \
  -X POST \
  -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
  -F "files=@document.pdf" \
  -F 'conversionOptions={"pdf":{"metadata":false}}'

Note: The tomarkdown REST API does not directly ingest a --data url="https...". To convert a URL, first fetch the page source to a local file with curl, then upload that file as files=@<temp.html>.
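The fetch-then-upload approach for URLs might look like the sketch below. url_to_markdown is a hypothetical wrapper name, the temporary file name page.html is arbitrary, and the function assumes CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_API_TOKEN are already set. It performs a plain HTTP fetch, so pages that need JavaScript rendering would still require the browser path.

```shell
# Sketch: download a page, then upload the saved HTML to the tomarkdown
# endpoint. Assumes CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_API_TOKEN are set.
url_to_markdown() {
  url="$1"
  # Keep an .html suffix so the uploaded file is recognizable as HTML.
  tmp_dir="$(mktemp -d)" || return 1
  tmp_file="$tmp_dir/page.html"
  status=0

  # -f: fail on HTTP errors, -sS: quiet but show errors, -L: follow redirects
  if curl -fsSL "$url" -o "$tmp_file"; then
    curl -sS "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/ai/tomarkdown" \
      -X POST \
      -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
      -F "files=@${tmp_file}" || status=$?
  else
    status=1
  fi

  # Clean up the temporary download regardless of the outcome.
  rm -rf "$tmp_dir"
  return "$status"
}
```

Calling `url_to_markdown "https://example.com"` prints the raw API response. That response is JSON, so a tool such as jq may be useful for pulling out the converted Markdown; the exact response fields are not covered here and should be checked against the Cloudflare documentation.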
