Markdown.New Crawl: Local-First Access
Use markdown.new/crawl for multi-page crawling and async Markdown generation.
Required Behavior
- Prefer local API calls with `curl` (or any suitable alternative tool).
- Assume no authentication is required unless the service behavior changes.
- Start with `POST /crawl` to get a job ID.
- Poll `GET /crawl/status/{jobId}` until the crawl completes (a submit-and-poll sketch follows this list).
- Default to Markdown output; use `?format=json` only when structured output is needed.
- Keep crawl scope explicit (`limit`, `depth`, subdomain/external toggles, include/exclude patterns).
- State the default crawl behavior: same-domain links only, up to `500` pages per job.
- If `/crawl` fails (network, timeout, blocked host), fall back to another method and state the fallback.
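The required flow (submit, poll, fetch, fall back) chains together as a short script. This is a minimal sketch under stated assumptions, not a verified client: it assumes `jq` for JSON parsing, a `.jobId` field in the POST response, a `.status` field that reads `completed`, and a single-page fallback of the form `GET https://markdown.new/{url}`; none of these response shapes are confirmed above.

```bash
#!/usr/bin/env bash
# Minimal submit-and-poll sketch. Assumptions beyond the documented
# endpoints: the POST response exposes the job ID as `.jobId`, and the
# JSON status payload has a `.status` field that reads "completed".
set -euo pipefail

START_URL="https://docs.example.com"

# Start the crawl; fall back to single-page conversion if /crawl fails
# (assumed fallback form: GET https://markdown.new/{url}).
if ! JOB_ID=$(curl -sf -X POST "https://markdown.new/crawl" \
    -H "Content-Type: application/json" \
    -d "{\"url\":\"$START_URL\",\"limit\":50}" | jq -r '.jobId'); then
  echo "crawl unavailable; falling back to single-page fetch" >&2
  curl -sf "https://markdown.new/$START_URL" -o page.md
  exit 0
fi

# Poll until the crawl completes.
until [ "$(curl -sf "https://markdown.new/crawl/status/$JOB_ID?format=json" \
    | jq -r '.status')" = "completed" ]; do
  sleep 5
done

# Default output is concatenated Markdown.
curl -sf "https://markdown.new/crawl/status/$JOB_ID" -o crawl.md
```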
Core Commands
```bash
# Start an async crawl job (returns a job ID)
curl -X POST "https://markdown.new/crawl" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://docs.example.com","limit":50}'

# Poll status / fetch concatenated Markdown
curl "https://markdown.new/crawl/status/<jobId>"

# Same, as per-page structured JSON
curl "https://markdown.new/crawl/status/<jobId>?format=json"

# Browser-style shortcut: start a crawl and return a tracking page
curl "https://markdown.new/crawl/https://docs.example.com"

# Cancel a running crawl (completed pages remain available)
curl -X DELETE "https://markdown.new/crawl/status/<jobId>"
```
API Coverage
- `POST /crawl`: create an async crawl job and return a job ID.
- `GET /crawl/status/{jobId}`: return crawl output and status.
- `DELETE /crawl/status/{jobId}`: cancel a running crawl; completed pages remain available.
- `GET /crawl/{url}`: browser-style shortcut that starts a crawl and returns a tracking page.
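Since cancellation preserves already-crawled pages, stopping a long-running job early and fetching the partial output is a reasonable pattern; a small sketch:

```bash
# Stop a slow crawl, then collect whatever pages finished first.
curl -sf -X DELETE "https://markdown.new/crawl/status/$JOB_ID"
curl -sf "https://markdown.new/crawl/status/$JOB_ID" -o partial.md
```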
Common Crawl Options
- `url` (required): crawl starting URL.
- `limit`: max pages, `1`-`500` (default `500`).
- `depth`: max link depth, `1`-`10` (default `5`).
- `render`: enable JS rendering for SPA pages.
- `source`: URL discovery strategy (`all`, `sitemaps`, `links`).
- `maxAge`: max cache age in seconds (`0`-`604800`, default `86400`).
- `modifiedSince`: UNIX timestamp; crawl pages modified after this time.
- `includeExternalLinks`: include cross-domain links.
- `includeSubdomains`: include subdomains.
- `includePatterns`/`excludePatterns`: wildcard URL filtering.
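As a sketch, a request body combining several of these options could look like the following; the option names come from the list above, but the wildcard syntax shown for `includePatterns`/`excludePatterns` (simple `*` globs) is an assumption worth verifying against live behavior:

```bash
# Scoped crawl: sitemap discovery, JS rendering, one subtree included,
# changelogs excluded (glob syntax assumed, not documented above).
curl -X POST "https://markdown.new/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "limit": 100,
    "depth": 3,
    "render": true,
    "source": "sitemaps",
    "includeSubdomains": false,
    "includePatterns": ["*/guides/*"],
    "excludePatterns": ["*/changelog/*"]
  }'
```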
Output Notes
- `GET /crawl/status/{jobId}` returns concatenated Markdown by default.
- Add `?format=json` for per-page structured records.
- Images are stripped by default; add `?retain_images=true` to keep them.
- Results are retained for 14 days; expired job IDs return errors.
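A hedged example of consuming the structured output: the `?format=json` switch is documented above, but the record layout assumed below (a `pages` array whose entries carry a `url` field) is a guess about the response shape, not a documented contract:

```bash
# List the URLs of all crawled pages (field names assumed).
curl -sf "https://markdown.new/crawl/status/$JOB_ID?format=json" \
  | jq -r '.pages[]? | .url'
```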
Operational Limits
- Rate model (as documented in the crawl page FAQ): each crawl costs 50 units against a 500-unit daily limit (about 10 crawls/day).
- Prefer lower `limit` values for targeted extraction to reduce cost and runtime (see the sketch below).
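For instance, a narrowly scoped crawl of a single docs subtree keeps runtime short and stays far below the page cap; a sketch, with the subtree URL as a placeholder:

```bash
# Targeted extraction: few pages, shallow depth, one subtree only.
curl -X POST "https://markdown.new/crawl" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://docs.example.com/api/","limit":10,"depth":2}'
```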