Apify Scrapers
Overview
Scrape content from major social platforms, Google Maps, company websites, and ad libraries using Apify actors. Each platform has its own reference file with settings tuned for cost and quality.
Quick Decision Tree
What do you want to scrape?
│
├── Social Media Posts
│   ├── Twitter/X → references/twitter.md
│   │   └── Script: scripts/scrape_twitter_ai_trends.py
│   │
│   ├── Reddit → references/reddit.md
│   │   └── Script: scripts/scrape_reddit_ai_tech.py
│   │
│   ├── LinkedIn → references/linkedin.md
│   │   └── Script: scripts/scrape_linkedin_posts.py
│   │
│   ├── Instagram → references/instagram.md
│   │   └── Script: scripts/scrape_instagram.py
│   │   └── Modes: profile, posts, hashtag, reels, comments
│   │
│   ├── Facebook → references/facebook.md
│   │   └── Script: scripts/scrape_facebook.py
│   │   └── Modes: page, posts, reviews, groups, marketplace
│   │
│   ├── TikTok → references/multi-platform.md
│   │   └── Script: scripts/scrape_multi_platform.py
│   │
│   └── YouTube → references/multi-platform.md
│       └── Script: scripts/scrape_multi_platform.py
│
├── Business/Places
│   ├── Google Maps businesses → references/google-maps.md
│   │   └── Script: scripts/scrape_google_maps.py
│   │   └── Modes: search, place, reviews
│   │
│   └── Contact info from websites → references/contact-enrichment.md
│       └── Script: scripts/scrape_contact_info.py
│       └── Extract: emails, phone numbers, social profiles
│
├── Auto-detect URL type → references/url-detect.md
│   └── Script: scripts/scrape_content_by_url.py
│
├── Trend Analysis (NEW)
│   └── Enriched trend analysis → workflows/trend-analysis.md
│       └── Script: scripts/analyze_trends.py
│       └── Features: velocity scoring, lifecycle staging, opportunity scoring
│
└── Workflows (multi-step)
    ├── Lead generation → workflows/lead-generation.md
    ├── Influencer discovery → workflows/influencer-discovery.md
    ├── Competitor analysis → workflows/competitor-intel.md
    ├── Trend analysis → workflows/trend-analysis.md
    └── Competitor Ads Intelligence (NEW) → workflows/competitor-ads.md
        └── Script: scripts/scrape_competitor_ads.py
        └── Platforms: Facebook Ads Library, Google Ads Transparency
        └── Features: Spend estimates, creative analysis, benchmarking
Environment Setup
Required in .env
APIFY_TOKEN=apify_api_xxxxx
Get your API token: https://console.apify.com/account/integrations
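How the scripts actually load the token is not shown in this document. As a minimal sketch, assuming a plain KEY=VALUE .env file and no third-party loader, the token could be read as below. The helper names parse_env and load_apify_token are hypothetical.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_apify_token(env_path: str = ".env") -> str:
    """Return APIFY_TOKEN from the environment, falling back to the .env file."""
    token = os.environ.get("APIFY_TOKEN", "")
    path = Path(env_path)
    if not token and path.is_file():
        token = parse_env(path.read_text(encoding="utf-8")).get("APIFY_TOKEN", "")
    if not token:
        raise RuntimeError("APIFY_TOKEN is not set; add it to .env or export it")
    return token
```

Failing early with a clear message is cheaper than discovering a missing token after an actor call has already been attempted.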
Common Usage Patterns
Scrape Twitter Trends
python scripts/scrape_twitter_ai_trends.py --query "AI agents" --max-tweets 50
Scrape Reddit Discussions
python scripts/scrape_reddit_ai_tech.py --subreddits "MachineLearning,LocalLLaMA" --max-posts 100
Scrape LinkedIn Author
python scripts/scrape_linkedin_posts.py author "https://linkedin.com/in/username" --max-posts 30
Auto-detect and Scrape URL
python scripts/scrape_content_by_url.py "https://x.com/user/status/123456"
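The detection rules inside scripts/scrape_content_by_url.py are not shown here. The sketch below illustrates one plausible approach, matching the URL's hostname against a table of known domains. The table and the function name detect_platform are assumptions for illustration, not the script's actual logic.

```python
from urllib.parse import urlparse

# Hypothetical host -> platform table; the real script may use different rules.
PLATFORM_HOSTS = {
    "x.com": "twitter",
    "twitter.com": "twitter",
    "reddit.com": "reddit",
    "linkedin.com": "linkedin",
    "instagram.com": "instagram",
    "facebook.com": "facebook",
    "tiktok.com": "tiktok",
    "youtube.com": "youtube",
    "youtu.be": "youtube",
}


def detect_platform(url: str) -> str:
    """Return a platform label for a URL, or "unknown" if no host matches."""
    host = (urlparse(url).hostname or "").lower()
    for domain, platform in PLATFORM_HOSTS.items():
        # Match the bare domain or any subdomain such as www. or m.
        if host == domain or host.endswith("." + domain):
            return platform
    return "unknown"
```

Matching on the parsed hostname rather than a substring of the URL avoids false positives such as a domain that merely ends in "x.com".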
Scrape Instagram Profile
python scripts/scrape_instagram.py profile "https://instagram.com/username" --max-posts 20
Scrape Instagram Hashtag
python scripts/scrape_instagram.py hashtag "#artificialintelligence" --max-posts 50
Scrape Instagram Reels
python scripts/scrape_instagram.py reels "https://instagram.com/username" --max-reels 30
Scrape Facebook Page
python scripts/scrape_facebook.py page "https://facebook.com/pagename" --max-posts 50
Scrape Facebook Reviews
python scripts/scrape_facebook.py reviews "https://facebook.com/pagename" --max-reviews 100
Scrape Facebook Marketplace
python scripts/scrape_facebook.py marketplace "laptops in san francisco" --max-items 30
Scrape Google Maps Businesses
python scripts/scrape_google_maps.py search "AI consulting firms in New York" --max-results 50
Scrape Google Maps Reviews
python scripts/scrape_google_maps.py reviews "ChIJN1t_tDeuEmsRUsoyG83frY4" --max-reviews 100
Extract Contact Info from Websites
python scripts/scrape_contact_info.py "https://example.com" --depth 2
Bulk Contact Enrichment
python scripts/scrape_contact_info.py --urls-file companies.txt --output contacts.json
Scrape Competitor Ads (Single Competitor)
python scripts/scrape_competitor_ads.py "Nike" --platforms facebook google --country US --days 30
Compare Multiple Competitors' Ads
python scripts/scrape_competitor_ads.py "Nike" "Adidas" "Puma" --compare --output comparison.json
Discover Advertisers by Keyword
python scripts/scrape_competitor_ads.py --search "running shoes" --country US --max-ads 200
Filter Competitor Ads by Media Type
python scripts/scrape_competitor_ads.py "Netflix" "Disney+" --platforms facebook --media-types video --days 7
Analyze Trends (NEW)
Analyze specific topic with enrichments
python scripts/analyze_trends.py "artificial intelligence" --sources google instagram tiktok --days 90
Discover trending topics in category
python scripts/analyze_trends.py --category technology --discover --top 50
Compare multiple trends
python scripts/analyze_trends.py "AI" "blockchain" "metaverse" --compare
Export HTML trend report
python scripts/analyze_trends.py "sustainable fashion" --format html --output trend_report.html
Cost Estimates
| Platform | Actor | Estimated cost (per item unless noted) |
|---|---|---|
| Twitter | kaitoeasyapi/twitter-x-data-tweet-scraper | ~$0.00025 |
| Reddit | trudax/reddit-scraper | ~$0.001-0.005 |
| LinkedIn | harvestapi/linkedin-post-search | ~$0.01-0.05 |
| YouTube | streamers/youtube-scraper | ~$0.01-0.05 |
| TikTok | clockworks/tiktok-scraper | ~$0.005 |
| Instagram (profile) | apify/instagram-profile-scraper | ~$0.005 |
| Instagram (posts) | apify/instagram-post-scraper | ~$0.002-0.005 |
| Instagram (hashtag) | apify/instagram-hashtag-scraper | ~$0.002-0.005 |
| Instagram (reels) | apify/instagram-reel-scraper | ~$0.005-0.01 |
| Instagram (comments) | apify/instagram-comment-scraper | ~$0.001-0.003 |
| Facebook (page) | apify/facebook-pages-scraper | ~$0.005-0.01 |
| Facebook (posts) | apify/facebook-posts-scraper | ~$0.003-0.005 |
| Facebook (reviews) | apify/facebook-reviews-scraper | ~$0.002-0.005 |
| Facebook (groups) | apify/facebook-groups-scraper | ~$0.005-0.01 |
| Facebook (marketplace) | apify/facebook-marketplace-scraper | ~$0.005-0.01 |
| Google Maps (search) | compass/crawler-google-places | ~$0.01-0.02 |
| Google Maps (place) | compass/google-maps-business-scraper | ~$0.01 |
| Google Maps (reviews) | compass/google-maps-reviews-scraper | ~$0.003-0.005 |
| Contact Enrichment | lukaskrivka/contact-info-scraper | ~$0.01-0.03 |
| Google Trends | apify/google-trends-scraper | ~$0.01 |
| Trend Analysis (multi) | Multiple actors | ~$0.50-1.50/run |
| Facebook Ads Library | apify/facebook-ads-scraper | ~$0.75/1K ads |
| Facebook Ads (alt) | curious_coder/facebook-ads-library-scraper | ~$0.50/1K ads |
| Google Ads Transparency | lexis-solutions/google-ads-scraper | ~$1.00/1K ads |
| Google Ads (alt) | xtech/google-ad-transparency-scraper | ~$0.80/1K ads |
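To budget a run before starting it, multiply the per-item range by the number of items requested. The sketch below does that arithmetic for a few rows of the cost table. The dictionary keys and the function name estimate_cost are illustrative, and the prices are the approximate figures listed here, not live Apify pricing.

```python
# Per-item USD ranges (low, high) copied from the cost table in this section.
COST_PER_ITEM = {
    "twitter": (0.00025, 0.00025),
    "reddit": (0.001, 0.005),
    "linkedin": (0.01, 0.05),
    "google_maps_search": (0.01, 0.02),
}


def estimate_cost(platform: str, items: int) -> tuple[float, float]:
    """Return the (low, high) estimated USD cost for scraping `items` items."""
    low, high = COST_PER_ITEM[platform]
    # Round to 4 decimals to hide floating-point noise in tiny amounts.
    return round(low * items, 4), round(high * items, 4)


if __name__ == "__main__":
    for platform, count in [("twitter", 50), ("reddit", 100)]:
        print(platform, count, estimate_cost(platform, count))
```

For the 50-tweet command shown earlier this comes to roughly $0.0125, while 100 Reddit posts land between about $0.10 and $0.50.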
Output Location
All scraped data is saved to .tmp/ with date-stamped or timestamped filenames, for example:
- .tmp/twitter_ai_trends_YYYYMMDD.json
- .tmp/reddit_ai_tech_YYYYMMDD.json
- .tmp/linkedin_posts_YYYYMMDD_HHMMSS.json
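The filename patterns above can be reproduced with strftime. This is a sketch of the naming convention only; the helper name output_path and its with_time flag are assumptions, and the scripts may build their paths differently.

```python
from datetime import datetime
from pathlib import Path


def output_path(prefix: str, now: datetime, with_time: bool = False) -> Path:
    """Build .tmp/<prefix>_YYYYMMDD[_HHMMSS].json for the given moment."""
    stamp = now.strftime("%Y%m%d_%H%M%S" if with_time else "%Y%m%d")
    return Path(".tmp") / f"{prefix}_{stamp}.json"


if __name__ == "__main__":
    path = output_path("linkedin_posts", datetime.now(), with_time=True)
    # Create .tmp/ on first use so the write does not fail.
    path.parent.mkdir(exist_ok=True)
    print(path)
```

Date-only names are overwritten by a second run on the same day, so the HHMMSS variant is the safer choice for scripts that run more than once daily.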
Security Notes
Credential Handling
- Store APIFY_TOKEN in the .env file (never commit it to git)
- Rotate API tokens periodically via Apify Console
- Never log or print API tokens in script output
- Use environment variables, not hardcoded values
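When a script needs to confirm which token it is using, it can log a masked form instead of the raw value. The helper below is a hypothetical sketch, not part of the existing scripts.

```python
def mask_token(token: str, visible: int = 4) -> str:
    """Return a log-safe form that shows only the last few characters."""
    if len(token) <= visible:
        # Too short to reveal anything safely; hide it completely.
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]


if __name__ == "__main__":
    print("Using token:", mask_token("apify_api_abcd1234"))
```

Showing only the tail is usually enough to tell two tokens apart during rotation without exposing a usable credential in logs.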
Data Privacy
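- Scraped data contains only publicly available content
- Social media posts may include PII (names, handles, profile info)
- Scripts store their output locally in the .tmp/ directory
- Apify keeps run data (datasets, logs) on its platform after a run completes, subject to your plan's data-retention period; delete runs or datasets you no longer need in Apify Console
- Practice data minimization: only scrape what you need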
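One way to apply data minimization after a scrape is to keep an explicit allow-list of fields before anything is written to disk. The sketch below assumes items are plain dictionaries. The field names in the example are hypothetical, since each actor defines its own output schema.

```python
def minimize(items: list[dict], keep: list[str]) -> list[dict]:
    """Return copies of the items containing only the allow-listed fields."""
    return [{key: item[key] for key in keep if key in item} for item in items]


if __name__ == "__main__":
    scraped = [{"text": "hello", "author": "a_user", "email": "a@example.com"}]
    # Only the fields needed for analysis survive; the email is dropped.
    print(minimize(scraped, ["text", "author"]))
```

An allow-list is safer than a deny-list here because new PII fields added by an actor update are excluded by default.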
Access Scopes
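- Apify's default personal API token has full account access; Apify Console can also issue tokens with limited permissions, so prefer a scoped token where the scripts allow it
- Use separate Apify accounts for different projects if needed
- Monitor usage via the Apify Console dashboard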
Compliance Considerations
- Terms of Service: Respect each platform's ToS (for example Twitter, Reddit, LinkedIn)
- Rate Limiting: Actors have built-in rate limiting to reduce the risk of bans
- Robots.txt: Some actors may ignore robots.txt; use them responsibly
- GDPR: Scraped PII may be subject to GDPR if it relates to EU residents
- Ethical Use: Only scrape public data; never bypass authentication
- Proxy Ethics: Residential proxies should be used ethically
Troubleshooting
Common Issues
Issue: Actor run failed
Symptoms: Script terminates with "Actor run failed" or a timeout error
Cause: Invalid actor ID, insufficient proxy credits, or an actor configuration issue
Solution:
- Verify the actor ID is correct in the script
- Check Apify Console for actor run logs
- Ensure proxy settings match actor requirements
- Try running with default proxy settings first
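Checking the run status explicitly makes failures easier to trace back to a run log. The sketch below assumes the scripts use the official apify-client Python package, which this document does not confirm. It takes the client as a parameter, and the function name run_actor is hypothetical.

```python
def run_actor(client, actor_id: str, run_input: dict) -> list[dict]:
    """Run an actor, raise on a non-successful status, and return its items.

    `client` is expected to behave like apify_client.ApifyClient, e.g.
    ApifyClient(os.environ["APIFY_TOKEN"]) (assumption: apify-client is installed).
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError(f"{actor_id}: no run information was returned")
    if run.get("status") != "SUCCEEDED":
        # Typical non-success statuses include FAILED, TIMED-OUT and ABORTED.
        raise RuntimeError(
            f"{actor_id}: run {run.get('id')} ended with status {run.get('status')}; "
            "see the run log in Apify Console"
        )
    # Results are stored in the run's default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Including the run ID in the error message lets you jump straight to the matching log in Apify Console.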
Issue: Empty results returned
Symptoms: Script completes but returns 0 items
Cause: Content blocked by the platform, an invalid query, or the proxy being detected
Solution:
- Try a different proxy type (residential vs datacenter)
- Simplify the search query
- Reduce the number of results requested
- Check if the platform is blocking scraping attempts
Issue: Rate limited by platform
Symptoms: Script fails with 429 errors or "rate limited" messages
Cause: Too many requests in a short time period
Solution:
- Add delays between requests (actor settings)
- Reduce concurrent requests
- Use proxy rotation
- Wait and retry after a cooldown period
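The "wait and retry" step can be automated with exponential backoff. This is a generic sketch rather than something the scripts are known to implement. The function name with_backoff, its defaults, and the is_rate_limited predicate are assumptions.

```python
import random
import time


def with_backoff(func, is_rate_limited, retries=4, base_delay=2.0, sleep=time.sleep):
    """Call func(); on a rate-limit error wait base_delay * 2**attempt and retry."""
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not rate limits.
            if attempt == retries or not is_rate_limited(exc):
                raise
            # Exponential delay plus up to 1s of jitter to avoid synchronized retries.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
```

A caller would wrap the scraping call, for example with_backoff(lambda: scrape(), lambda exc: "429" in str(exc)), where scrape is whatever function triggers the actor run.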
Issue: Invalid API token
Symptoms: Authentication error or "invalid token" message
Cause: Token expired, revoked, or incorrectly set
Solution:
- Regenerate the API token in Apify Console
- Verify the token is correctly set in the .env file
- Check for leading/trailing whitespace in the token
- Ensure the APIFY_TOKEN environment variable is loaded
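Several of these checks can be done locally before contacting Apify. The sketch below is a hypothetical helper that inspects the raw value. It assumes tokens carry the apify_api_ prefix shown in the .env example, and it cannot tell whether a well-formed token has been revoked.

```python
def diagnose_token(raw):
    """Return a list of likely problems with a raw APIFY_TOKEN value."""
    if raw is None or raw == "":
        return ["APIFY_TOKEN is not set or is empty"]
    problems = []
    stripped = raw.strip()
    if raw != stripped:
        problems.append("token has leading or trailing whitespace")
    unquoted = stripped.strip("'\"")
    if unquoted != stripped:
        problems.append("token is wrapped in quotes")
    # Assumption: valid tokens use the prefix from the .env example.
    if not unquoted.startswith("apify_api_"):
        problems.append("token does not start with apify_api_")
    return problems


if __name__ == "__main__":
    import os
    for problem in diagnose_token(os.environ.get("APIFY_TOKEN")):
        print("Problem:", problem)
```

An empty result only means the value looks well-formed; an authentication error after that points to an expired or revoked token.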
Issue: Proxy connection errors
Symptoms: Connection timeout or proxy errors
Cause: Proxy pool exhausted or geo-restriction issues
Solution:
- Switch proxy type (basic, residential, or datacenter)
- Verify proxy credit balance in Apify Console
- Try a different proxy country/region
- Disable the proxy to test whether it is the root cause
Resources
Platform References
- references/twitter.md - Twitter/X scraping details
- references/reddit.md - Reddit scraping with subreddit targeting
- references/linkedin.md - LinkedIn post scraping (author or search mode)
- references/instagram.md - Instagram profile, posts, hashtag, reels, and comments scraping
- references/facebook.md - Facebook page, posts, reviews, groups, and marketplace scraping
- references/multi-platform.md - TikTok and YouTube scraping
- references/url-detect.md - Auto-detect URL type and scrape
Business/Places References
- references/google-maps.md - Google Maps business search, place details, and reviews
- references/contact-enrichment.md - Extract emails, phone numbers, and social profiles from websites
Workflow References
- workflows/lead-generation.md - Multi-step lead generation workflow
- workflows/influencer-discovery.md - Find and analyze influencers across platforms
- workflows/competitor-intel.md - Competitive intelligence gathering workflow
- workflows/trend-analysis.md - Enriched multi-platform trend analysis with scoring
Integration Patterns
Scrape and Enrich
Skills: apify-scrapers → parallel-research
Use case: Scrape social media posts, then enrich with deep research
Flow:
- Scrape Twitter/Reddit for mentions of a topic
- Extract company names or URLs from posts
- Use parallel-research to get detailed info on each company
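The middle step of this flow, pulling URLs out of post text, can be sketched with a regular expression. This assumes each scraped item is a dictionary with the post body under a "text" key. That field name is hypothetical and differs between actors, as is the helper name extract_domains.

```python
import re
from urllib.parse import urlparse

# Matches http(s) URLs up to the next whitespace, quote, or closing bracket.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_domains(posts: list[dict], text_field: str = "text") -> list[str]:
    """Return unique domains mentioned in the posts, in first-seen order."""
    domains = []
    for post in posts:
        for url in URL_RE.findall(post.get(text_field) or ""):
            # Trim sentence punctuation that often trails a pasted link.
            host = (urlparse(url.rstrip(".,;:!?")).hostname or "").lower()
            host = host.removeprefix("www.")
            if host and host not in domains:
                domains.append(host)
    return domains
```

The resulting domain list is a natural input for the enrichment step, one lookup per company site.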
Scrape and Summarize
Skills: apify-scrapers → content-generation
Use case: Create newsletter content from social media trends
Flow:
- Scrape trending AI posts from Twitter
- Pass scraped data to content-generation summarize
- Generate a formatted newsletter section
Scrape and Archive
Skills: apify-scrapers → google-workspace
Use case: Save scraped data to Google Drive for team access
Flow:
- Scrape LinkedIn posts from target accounts
- Format data as CSV or JSON
- Upload to Google Drive client folder via google-workspace
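For the formatting step, scraped JSON items can be flattened to CSV with the standard library. The sketch below assumes items are flat dictionaries. The column names passed in are up to the caller, and items_to_csv is a hypothetical helper.

```python
import csv
import io


def items_to_csv(items: list[dict], fields: list[str]) -> str:
    """Serialize the chosen fields of each item as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for item in items:
        # Missing fields become empty cells instead of raising.
        writer.writerow({name: item.get(name, "") for name in fields})
    return buffer.getvalue()
```

Choosing the columns explicitly keeps the shared file stable even if an actor later adds or renames output fields.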
Trend Analysis + Content Strategy
Skills: apify-scrapers (trend-analysis) → content-generation
Use case: Identify trending topics and create content strategy
Flow:
- Run trend analysis: python scripts/analyze_trends.py "AI productivity" --sources all
- Review lifecycle stage and opportunity score
- Use content-generation to create content for high-opportunity trends
- Focus on emerging trends with high velocity scores
Competitive Trend Monitoring
Skills: apify-scrapers (trend-analysis) → parallel-research
Use case: Monitor competitor visibility in trending topics
Flow:
- Analyze industry trends: python scripts/analyze_trends.py --category "your-industry" --discover
- Compare your brand vs competitors in those trends
- Use parallel-research for a deep dive on gaps
- Generate a competitive intelligence report