Smart Scraper
Intelligent web scraping that understands page structure.
Features
- Auto-detection: Identifies whether a page is a list, article, or table layout
- Smart extraction: Parses prices, dates, and URLs from unstructured text
- Multiple formats: Output as JSON, CSV, or Markdown
- Scroll support: Handles infinite-scroll pages when --scroll is enabled
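
To make the "smart extraction" idea concrete, here is a minimal Python sketch of pulling prices, dates, and URLs out of free text with regular expressions. It is not Smart Scraper's actual implementation: the patterns, the `extract_fields` name, and the choice to recognize only ISO-style dates and three currency symbols are assumptions made for illustration.

```python
import re
from datetime import date
from decimal import Decimal

# Hypothetical patterns for illustration only; the real tool's rules are not documented here.
PRICE_RE = re.compile(r"(?P<currency>[$€£])\s?(?P<amount>\d[\d,]*(?:\.\d{1,2})?)")
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")  # ISO dates (YYYY-MM-DD) only
URL_RE = re.compile(r"https?://[^\s<>]+")


def extract_fields(text):
    """Return the prices, dates, and URLs found in a block of unstructured text."""
    prices = [
        (m["currency"], Decimal(m["amount"].replace(",", "")))
        for m in PRICE_RE.finditer(text)
    ]

    dates = []
    for year, month, day in DATE_RE.findall(text):
        try:
            dates.append(date(int(year), int(month), int(day)))
        except ValueError:
            continue  # skip impossible dates such as 2024-13-40

    # Trim punctuation that commonly trails a URL at the end of a sentence.
    urls = [u.rstrip(".,;:") for u in URL_RE.findall(text)]

    return {"prices": prices, "dates": dates, "urls": urls}


sample = "Widget Pro costs $1,299.00 as of 2024-03-15. Details: https://example.com/widget."
print(extract_fields(sample))
```

A production extractor would need locale-aware number formats and many more date layouts; the sketch only shows the general shape of the approach.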
Usage
# Extract product listings
smart-scraper --url "https://example.com/products" --type list
# Extract article content
smart-scraper --url "https://example.com/article" --type article --format markdown
# Extract table data
smart-scraper --url "https://example.com/data" --type table --format csv
Options
| Option | Description | Default |
|---|---|---|
| --url, -u | Target URL (required) | - |
| --type, -t | Extraction type: list, article, table, auto | auto |
| --format, -f | Output format: json, csv, markdown | json |
| --max, -m | Maximum items to extract | 100 |
| --scroll | Enable auto-scroll for lazy-loaded content | false |
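
The option table above can be read as a small command-line specification. The following Python sketch mirrors it with `argparse` to show how the defaults and allowed values fit together. It is an illustration only, not the tool's real parser: the `build_parser` function is hypothetical, and the help strings are copied from the table.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="smart-scraper")
    parser.add_argument("--url", "-u", required=True, help="Target URL (required)")
    parser.add_argument(
        "--type", "-t",
        choices=["list", "article", "table", "auto"],
        default="auto",
        help="Extraction type",
    )
    parser.add_argument(
        "--format", "-f",
        choices=["json", "csv", "markdown"],
        default="json",
        help="Output format",
    )
    parser.add_argument("--max", "-m", type=int, default=100,
                        help="Maximum items to extract")
    parser.add_argument("--scroll", action="store_true",
                        help="Enable auto-scroll for lazy-loaded content")
    return parser


# Only the URL is given, so every other option falls back to its default.
args = build_parser().parse_args(["-u", "https://example.com/products"])
print(args.type, args.format, args.max, args.scroll)  # prints: auto json 100 False
```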
Examples
Extract up to 10 Hacker News items
smart-scraper -u https://news.ycombinator.com -t list -m 10
Save article as Markdown
smart-scraper -u https://blog.example.com/post -t article -f markdown > article.md
Export table to CSV
smart-scraper -u https://example.com/prices -t table -f csv > prices.csv
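
The same commands can be driven from a script. Below is a minimal Python sketch that assembles the argument list from the documented flags and runs the tool with `subprocess`. It assumes `smart-scraper` is installed on the PATH and writes its result to standard output (as the shell redirections above suggest); `build_command` and `run_scraper` are hypothetical helper names, not part of the tool.

```python
import subprocess


def build_command(url, type_="auto", fmt="json", max_items=None, scroll=False):
    """Assemble a smart-scraper argument list using only the documented flags."""
    cmd = ["smart-scraper", "--url", url, "--type", type_, "--format", fmt]
    if max_items is not None:
        cmd += ["--max", str(max_items)]
    if scroll:
        cmd.append("--scroll")
    return cmd


def run_scraper(url, **options):
    """Run the CLI and return its standard output as text.

    Assumes smart-scraper is on the PATH; raises CalledProcessError on a
    non-zero exit status.
    """
    result = subprocess.run(
        build_command(url, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Equivalent to: smart-scraper --url https://news.ycombinator.com --type list --format json --max 10
print(build_command("https://news.ycombinator.com", type_="list", max_items=10))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a URL contains characters such as `&` or `?`.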