just-scrape

CLI tool for AI-powered web scraping, data extraction, search, and crawling via ScrapeGraph AI. Use when the user needs to scrape websites, extract structured data from URLs, convert pages to markdown, crawl multi-page sites, search the web for information, automate browser interactions (login, click, fill forms), get raw HTML, discover sitemaps, or generate JSON schemas. Triggers on tasks involving: (1) extracting data from websites, (2) web scraping or crawling, (3) converting webpages to markdown, (4) AI-powered web search with extraction, (5) browser automation, (6) generating output schemas for scraping. The CLI is just-scrape (npm package just-scrape).


Web Scraping with just-scrape

AI-powered web scraping CLI by ScrapeGraph AI. Get an API key at dashboard.scrapegraphai.com.

Setup

Always install or run the @latest version to ensure you have the most recent features and fixes.

npm install -g just-scrape@latest           # npm
pnpm add -g just-scrape@latest              # pnpm
yarn global add just-scrape@latest          # yarn
bun add -g just-scrape@latest               # bun
npx just-scrape@latest --help               # run without installing
bunx just-scrape@latest --help              # run without installing (bun)
export SGAI_API_KEY="sgai-..."

API key resolution order: SGAI_API_KEY env var → .env file → ~/.scrapegraphai/config.json → interactive prompt (saves to config).
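Once the key is set, a quick sanity check confirms it resolves correctly, using the documented validate and credits commands:

```shell
# Export the key for this shell session (or put it in .env / the config file)
export SGAI_API_KEY="sgai-..."

# Confirm the CLI can resolve and use the key
just-scrape validate
just-scrape credits
```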

Command Selection

Need                                            Command
Extract structured data from a known URL        smart-scraper
Search the web and extract from results         search-scraper
Convert a page to clean markdown                markdownify
Crawl multiple pages from a site                crawl
Get raw HTML                                    scrape
Automate browser actions (login, click, fill)   agentic-scraper
Generate a JSON schema from a description       generate-schema
Get all URLs from a sitemap                     sitemap
Check credit balance                            credits
Browse past requests                            history
Validate API key                                validate

Common Flags

All commands support --json for machine-readable output (suppresses banner, spinners, prompts).

Scraping commands share these optional flags:

  • --stealth — bypass anti-bot detection (+4 credits)
  • --headers <json> — custom HTTP headers as JSON string
  • --schema <json> — enforce output JSON schema
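As a sketch of the shared flags in combination, headers are passed as a single JSON string; the header names below are illustrative, not required:

```shell
# Hypothetical example: custom headers plus machine-readable output
just-scrape smart-scraper https://example.com -p "Extract the title" \
  --headers '{"User-Agent":"docs-fetcher/1.0","Accept-Language":"en-US"}' \
  --json
```

Note the single quotes around the JSON string, which keep the inner double quotes intact in the shell.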

Commands

Smart Scraper

Extract structured data from any URL using AI.

just-scrape smart-scraper <url> -p <prompt>
just-scrape smart-scraper <url> -p <prompt> --schema <json>
just-scrape smart-scraper <url> -p <prompt> --scrolls <n>     # infinite scroll (0-100)
just-scrape smart-scraper <url> -p <prompt> --pages <n>       # multi-page (1-100)
just-scrape smart-scraper <url> -p <prompt> --stealth         # anti-bot (+4 credits)
just-scrape smart-scraper <url> -p <prompt> --cookies <json> --headers <json>
just-scrape smart-scraper <url> -p <prompt> --plain-text
# E-commerce extraction
just-scrape smart-scraper https://store.example.com/shoes -p "Extract all product names, prices, and ratings"

# Strict schema + scrolling
just-scrape smart-scraper https://news.example.com -p "Get headlines and dates" \
  --schema '{"type":"object","properties":{"articles":{"type":"array","items":{"type":"object","properties":{"title":{"type":"string"},"date":{"type":"string"}}}}}}' \
  --scrolls 5

# JS-heavy SPA behind anti-bot
just-scrape smart-scraper https://app.example.com/dashboard -p "Extract user stats" \
  --stealth

Search Scraper

Search the web and extract structured data from results.

just-scrape search-scraper <prompt>
just-scrape search-scraper <prompt> --num-results <n>     # sources to scrape (3-20, default 3)
just-scrape search-scraper <prompt> --no-extraction       # markdown only (2 credits vs 10)
just-scrape search-scraper <prompt> --schema <json>
just-scrape search-scraper <prompt> --stealth --headers <json>
# Research across sources
just-scrape search-scraper "Best Python web frameworks in 2025" --num-results 10

# Cheap markdown-only
just-scrape search-scraper "React vs Vue comparison" --no-extraction --num-results 5

# Structured output
just-scrape search-scraper "Top 5 cloud providers pricing" \
  --schema '{"type":"object","properties":{"providers":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"free_tier":{"type":"string"}}}}}}'

Markdownify

Convert any webpage to clean markdown.

just-scrape markdownify <url>
just-scrape markdownify <url> --stealth         # +4 credits
just-scrape markdownify <url> --headers <json>
just-scrape markdownify https://blog.example.com/my-article
just-scrape markdownify https://protected.example.com --stealth
just-scrape markdownify https://docs.example.com/api --json | jq -r '.result' > api-docs.md

Crawl

Crawl multiple pages and extract data from each.

just-scrape crawl <url> -p <prompt>
just-scrape crawl <url> -p <prompt> --max-pages <n>      # default 10
just-scrape crawl <url> -p <prompt> --depth <n>           # default 1
just-scrape crawl <url> --no-extraction --max-pages <n>   # markdown only (2 credits/page)
just-scrape crawl <url> -p <prompt> --schema <json>
just-scrape crawl <url> -p <prompt> --rules <json>        # include_paths, same_domain
just-scrape crawl <url> -p <prompt> --no-sitemap
just-scrape crawl <url> -p <prompt> --stealth
# Crawl docs site
just-scrape crawl https://docs.example.com -p "Extract all code snippets" --max-pages 20 --depth 3

# Filter to blog pages only
just-scrape crawl https://example.com -p "Extract article titles" \
  --rules '{"include_paths":["/blog/*"],"same_domain":true}' --max-pages 50

# Raw markdown, no AI extraction (cheaper)
just-scrape crawl https://example.com --no-extraction --max-pages 10

Scrape

Get raw HTML content from a URL.

just-scrape scrape <url>
just-scrape scrape <url> --stealth          # +4 credits
just-scrape scrape <url> --branding         # extract logos/colors/fonts (+2 credits)
just-scrape scrape <url> --country-code <iso>
just-scrape scrape https://example.com
just-scrape scrape https://store.example.com --stealth --country-code DE
just-scrape scrape https://example.com --branding
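The raw HTML can be piped to a file for offline processing. The `.html` field name below is an assumption about the --json payload shape, so inspect the envelope first and adjust the jq path to match your actual output:

```shell
# Inspect the JSON envelope first to find the HTML field
just-scrape scrape https://example.com --json | jq '.' | head -n 20

# Assuming the payload exposes the page body under .html
just-scrape scrape https://example.com --json | jq -r '.html' > page.html
```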

Agentic Scraper

AI-driven browser automation: login, click, navigate, and fill forms. Pass the steps as a single comma-separated string.

just-scrape agentic-scraper <url> -s <steps>
just-scrape agentic-scraper <url> -s <steps> --ai-extraction -p <prompt>
just-scrape agentic-scraper <url> -s <steps> --schema <json>
just-scrape agentic-scraper <url> -s <steps> --use-session   # persist browser session
# Login + extract dashboard
just-scrape agentic-scraper https://app.example.com/login \
  -s "Fill email with user@test.com,Fill password with secret,Click Sign In" \
  --ai-extraction -p "Extract all dashboard metrics"

# Multi-step form
just-scrape agentic-scraper https://example.com/wizard \
  -s "Click Next,Select Premium plan,Fill name with John,Click Submit"

# Persistent session across runs
just-scrape agentic-scraper https://app.example.com \
  -s "Click Settings" --use-session

Generate Schema

Generate a JSON schema from a natural language description.

just-scrape generate-schema <prompt>
just-scrape generate-schema <prompt> --existing-schema <json>
just-scrape generate-schema "E-commerce product with name, price, ratings, and reviews array"

# Refine an existing schema
just-scrape generate-schema "Add an availability field" \
  --existing-schema '{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"number"}}}'

Sitemap

Get all URLs from a website's sitemap.

just-scrape sitemap <url>
just-scrape sitemap https://example.com --json | jq -r '.urls[]'

History

Browse request history. Interactive by default (arrow keys to navigate, select to view details).

just-scrape history <service>                     # interactive browser
just-scrape history <service> <request-id>        # specific request
just-scrape history <service> --page <n>
just-scrape history <service> --page-size <n>     # max 100
just-scrape history <service> --json

Services: markdownify, smartscraper, searchscraper, scrape, crawl, agentic-scraper, sitemap

just-scrape history smartscraper
just-scrape history crawl --json --page-size 100 | jq '.requests[] | {id: .request_id, status}'

Credits & Validate

just-scrape credits
just-scrape credits --json | jq '.remaining_credits'
just-scrape validate
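A pre-flight guard before an expensive crawl can be sketched from the credits --json output shown above; the threshold of 50 is arbitrary, and the target URL and prompt are placeholders:

```shell
#!/bin/sh
# Abort early if the account is low on credits (threshold is arbitrary)
remaining=$(just-scrape credits --json | jq '.remaining_credits')
if [ "$remaining" -lt 50 ]; then
  echo "Only $remaining credits left; skipping crawl" >&2
  exit 1
fi
just-scrape crawl https://docs.example.com -p "Extract code snippets" --max-pages 20
```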

Common Patterns

Generate schema then scrape with it

just-scrape generate-schema "Product with name, price, and reviews" --json | jq '.schema' > schema.json
just-scrape smart-scraper https://store.example.com -p "Extract products" --schema "$(cat schema.json)"

Pipe JSON for scripting

just-scrape sitemap https://example.com --json | jq -r '.urls[]' | while read url; do
  just-scrape smart-scraper "$url" -p "Extract title" --json >> results.jsonl
done
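For larger URL lists, the same loop can be parallelized with xargs; this is a sketch, the concurrency of 4 is arbitrary, and the per-URL output files (keyed on the URL's basename, which can collide) avoid interleaved writes to a shared file:

```shell
# Parallel variant of the loop above; mind your credit balance
just-scrape sitemap https://example.com --json | jq -r '.urls[]' \
  | xargs -P 4 -I{} sh -c \
      'just-scrape smart-scraper "$1" -p "Extract title" --json > "result-$(basename "$1").json"' _ {}
```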

Protected sites

# JS-heavy SPA behind Cloudflare
just-scrape smart-scraper https://protected.example.com -p "Extract data" --stealth

# With custom cookies/headers
just-scrape smart-scraper https://example.com -p "Extract data" \
  --cookies '{"session":"abc123"}' --headers '{"Authorization":"Bearer token"}'

Credit Costs

Feature                          Extra Credits
--stealth                        +4 per request
--branding (scrape only)         +2
search-scraper extraction        10 per request
search-scraper --no-extraction   2 per request
crawl --no-extraction            2 per page

Environment Variables

SGAI_API_KEY=sgai-...              # API key
JUST_SCRAPE_TIMEOUT_S=300          # Request timeout in seconds (default 120)
JUST_SCRAPE_DEBUG=1                # Debug logging to stderr
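These can also be set per invocation; for example, raising the timeout and enabling debug logging for a single long-running crawl:

```shell
# One-off overrides for a long-running crawl
JUST_SCRAPE_TIMEOUT_S=600 JUST_SCRAPE_DEBUG=1 \
  just-scrape crawl https://docs.example.com -p "Extract all code snippets" --max-pages 50
```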
