web-crawler

All-in-one web scraping and social media data extraction. Use when normal web_fetch cannot read a page, when the user needs YouTube content, or when the user wants to scrape/fetch/extract data from social media platforms (TikTok, Instagram, YouTube, Twitter/X, LinkedIn, Facebook, Reddit, Threads, Bluesky, Pinterest, Snapchat, Twitch, Kick, Truth Social, TikTok Shop, Google, and more). Covers profiles, posts, videos, reels, comments, transcripts, followers, ads, hashtags, trending content, and engagement metrics.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "web-crawler" with this command: npx skills add starchild-ai-agent/official-skills/starchild-ai-agent-official-skills-web-crawler

Web crawler

All-in-one scraping skill. Routes requests to the best backend automatically — the user does not need to know which API is used.

Use this when:

  • Normal web_fetch fails, returns boilerplate, or a site blocks basic fetching
  • The user asks for YouTube video content/transcript
  • The user wants to scrape, fetch, or extract data from any social media platform
  • The user mentions social media profiles, posts, comments, transcripts, ads, trending content, or engagement metrics

Prefer native web_fetch first for simple pages; paid fallback calls should be deliberate and scoped.

What each service is for

ScrapeCreators — Social media data extraction (27+ platforms)

Use for any request involving social media profiles, posts, videos, comments, transcripts, search, ads, trending content, or engagement metrics. Covers TikTok, Instagram, YouTube, LinkedIn, Facebook, Twitter/X, Reddit, Threads, Bluesky, Pinterest, Snapchat, Twitch, Kick, Truth Social, TikTok Shop, Google search, and link-in-bio services (Linktree, Komi, Pillar, Linkbio, Linkme, Amazon Shop).

  • Base URL: https://api.scrapecreators.com
  • Auth: x-api-key header set to $SCRAPECREATORS_API_KEY
  • Method: all endpoints use GET requests with query params. Responses are JSON.

SerpApi — YouTube-only retrieval (legacy)

  • engine=youtube to search YouTube and discover candidate videos.
  • engine=youtube_video to fetch video metadata, description, chapters, related videos, and the transcript discovery link.
  • engine=youtube_video_transcript to fetch timestamped transcript segments for AI analysis.

Do not use SerpApi for Google/Bing/general SERP scraping, shopping, maps, news, or any non-YouTube engine. The transparent proxy blocks those engines.
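The YouTube-only rule can be enforced before a request is sent. Below is a minimal sketch that fails fast instead of spending a round trip on a request the proxy would reject with 403. The helper name `check_serpapi_params` is hypothetical. Its allowlist just mirrors the three engines listed above. The proxy remains the real enforcement point.

```python
# Engines this skill may use through the SerpApi proxy (mirrors the list above).
ALLOWED_SERPAPI_ENGINES = frozenset({"youtube", "youtube_video", "youtube_video_transcript"})


def check_serpapi_params(params: dict) -> dict:
    """Raise before sending a SerpApi request whose engine is not YouTube-only.

    Hypothetical helper: it only avoids a wasted call; the transparent proxy
    is what actually blocks disallowed engines.
    """
    engine = params.get("engine")
    if engine not in ALLOWED_SERPAPI_ENGINES:
        raise ValueError(
            f"SerpApi engine {engine!r} is not allowed here; "
            "use youtube, youtube_video, or youtube_video_transcript"
        )
    return params
```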

Firecrawl — Fallback web page scraper

Use Firecrawl only as a fallback scraper for a single web page when ordinary fetching fails. Call POST /v2/scrape with a single url and focused formats such as markdown, html, rawHtml, links, summary, or tightly constrained json/question/highlights extraction.

Do not use Firecrawl crawl/map/search/agent/browser endpoints. Do not request screenshots, audio, branding, images, or browser actions unless the proxy policy is expanded later.
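The same policy can be checked in the payload before calling /v2/scrape. This is a rough sketch, and `validate_scrape_payload` is a hypothetical name. The allowlist copies the formats named above. The sketch also assumes a format entry is either a plain string or an object with a "type" key, as in {"type": "json", ...}. Anything else is rejected, including screenshots, audio, branding, images, and browser actions.

```python
# Formats the text above permits for Firecrawl POST /v2/scrape.
ALLOWED_FORMATS = frozenset({
    "markdown", "html", "rawHtml", "links", "summary",
    "json", "question", "highlights",
})


def validate_scrape_payload(payload: dict) -> dict:
    """Reject payloads outside the single-page fallback policy (hypothetical helper)."""
    url = payload.get("url")
    # Exactly one page: a single http(s) URL string, never a list of URLs.
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        raise ValueError("payload must carry exactly one http(s) url string")
    for fmt in payload.get("formats", []):
        # Assumption: formats are strings or objects like {"type": "json", ...}.
        name = fmt.get("type") if isinstance(fmt, dict) else fmt
        if name not in ALLOWED_FORMATS:
            raise ValueError(f"format {name!r} is not allowed by this skill")
    if payload.get("actions"):
        raise ValueError("browser actions are not allowed")
    return payload
```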


ScrapeCreators — Intent routing

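Map user intent to the right endpoint. Endpoint paths generally follow the pattern /v{version}/{platform}/{action}. Some endpoints use /v2 or /v3, and a few (such as /v1/linktree) have no action segment.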

Important: After selecting an endpoint from the tables below, fetch its OpenAPI spec at https://docs.scrapecreators.com/{path}/openapi.json before making the actual API call. The spec gives full parameter details, types, and an example response. For example: https://docs.scrapecreators.com/v1/tiktok/profile/openapi.json

Profiles / User Info

Platform | Endpoint | Primary Param | Example
TikTok | /v1/tiktok/profile | handle | stoolpresidente
Instagram | /v1/instagram/profile | handle | jane
YouTube | /v1/youtube/channel | handle, channelId, or url | ThePatMcAfeeShow
LinkedIn (person) | /v1/linkedin/profile | url | https://www.linkedin.com/in/parrsam/
LinkedIn (company) | /v1/linkedin/company | url | https://linkedin.com/company/shopify
Facebook | /v1/facebook/profile | url | https://www.facebook.com/mantraindianfolsom
Twitter/X | /v1/twitter/profile | handle | elonmusk
Reddit | /v1/reddit/subreddit/details | subreddit or url | AskReddit
Threads | /v1/threads/profile | handle | zuck
Bluesky | /v1/bluesky/profile | handle | jay.bsky.team
Pinterest | /v1/pinterest/user/boards | handle | pinterest
Truth Social | /v1/truthsocial/profile | handle | realDonaldTrump
Twitch | /v1/twitch/profile | handle | ninja
Snapchat | /v1/snapchat/profile | handle | djkhaled

Posts / Content Feeds

Platform | Endpoint | Primary Param | Example
TikTok videos | /v3/tiktok/profile/videos | handle | stoolpresidente
Instagram posts | /v2/instagram/user/posts | handle | jane
Instagram reels | /v1/instagram/user/reels | handle or user_id | jane or 2700692569
Instagram highlights | /v1/instagram/user/highlights | handle or user_id | jane or 2700692569
YouTube videos | /v1/youtube/channel/videos | handle or channelId | ThePatMcAfeeShow
YouTube shorts | /v1/youtube/channel/shorts | handle or channelId | starterstory
YouTube playlist | /v1/youtube/playlist | playlist_id | PLP32wGpgzmIlInfgKVFfCwVsxgGqZNIiS
LinkedIn posts | /v1/linkedin/company/posts | url | https://linkedin.com/company/shopify
Facebook posts | /v1/facebook/profile/posts | url or pageId | https://www.facebook.com/pacemorby
Facebook reels | /v1/facebook/profile/reels | url | https://www.facebook.com/Spurs
Facebook photos | /v1/facebook/profile/photos | url | https://www.facebook.com/Spurs
Facebook group posts | /v1/facebook/group/posts | url or group_id | 742354120555345
Twitter tweets | /v1/twitter/user/tweets | handle | elonmusk
Reddit posts | /v1/reddit/subreddit | subreddit | AskReddit
Threads posts | /v1/threads/user/posts | handle | zuck
Bluesky posts | /v1/bluesky/user/posts | handle or user_id | jay.bsky.team
Truth Social posts | /v1/truthsocial/user/posts | handle or user_id | realDonaldTrump
Pinterest board | /v1/pinterest/board | url | https://www.pinterest.com/...

Single Post / Video Details

Platform | Endpoint | Primary Param | Example
TikTok | /v2/tiktok/video | url | https://www.tiktok.com/@randomspamvideos25/video/7251387037834595630
Instagram | /v1/instagram/post | url | https://www.instagram.com/reel/DOq6eV6iIgD
Instagram highlight | /v1/instagram/user/highlight/detail | id | 18067016518767507
YouTube | /v1/youtube/video | url | https://www.youtube.com/watch?v=Y2Ah_DFr8cw
YouTube community post | /v1/youtube/community-post | url | https://www.youtube.com/post/Ugkxvj2KoApYAXoqLWnKVr6zZe5JjeHrQeP8
LinkedIn | /v1/linkedin/post | url | https://www.linkedin.com/pulse/being-father-has-made-me-better-leader...
Facebook | /v1/facebook/post | url | https://www.facebook.com/reel/1535656380759655
Twitter/X | /v1/twitter/tweet | url | https://twitter.com/elonmusk/status/...
Twitter/X community | /v1/twitter/community | url | https://twitter.com/i/communities/...
Twitter/X community tweets | /v1/twitter/community/tweets | url | https://twitter.com/i/communities/...
Reddit | /v1/reddit/post/comments | url | https://www.reddit.com/r/AskReddit/comments/...
Threads | /v1/threads/post | url | https://www.threads.net/@zuck/post/...
Bluesky | /v1/bluesky/post | url | https://bsky.app/profile/.../post/...
Truth Social | /v1/truthsocial/post | url | https://truthsocial.com/@realDonaldTrump/posts/...
Pinterest | /v1/pinterest/pin | url | https://www.pinterest.com/pin/...
Twitch clip | /v1/twitch/clip | url | https://clips.twitch.tv/...
Kick clip | /v1/kick/clip | url | https://kick.com/...

Comments

Platform | Endpoint | Primary Param | Example
TikTok | /v1/tiktok/video/comments | url | https://www.tiktok.com/@stoolpresidente/video/7499229683859426602
Instagram | /v2/instagram/post/comments | url | https://www.instagram.com/reel/DOq6eV6iIgD
YouTube | /v1/youtube/video/comments | url | https://www.youtube.com/watch?v=dQw4w9WgXcQ
Facebook | /v1/facebook/post/comments | url or feedback_id | https://www.facebook.com/reel/753347914167361
Reddit | /v1/reddit/post/comments | url | https://www.reddit.com/r/AskReddit/comments/...

Transcripts

Platform | Endpoint | Example | Note
TikTok | /v1/tiktok/video/transcript | url=https://www.tiktok.com/...&lang=en | also via /v2/tiktok/video with get_transcript=true
Instagram | /v2/instagram/media/transcript | url=https://www.instagram.com/reel/... | AI-powered, 10-30 s, under 2 min
YouTube | /v1/youtube/video/transcript | url=https://www.youtube.com/watch?v=bjVIDXPP7Uk | also included in /v1/youtube/video response
Facebook | /v1/facebook/post/transcript | url=https://www.facebook.com/reel/... | under 2 min only
Twitter/X | /v1/twitter/tweet/transcript | url=https://twitter.com/... | AI-powered, slow

Search

Platform | Endpoint | Primary Param | Example
TikTok users | /v1/tiktok/search/users | query | funny
TikTok videos (keyword) | /v1/tiktok/search/keyword | query | funny
TikTok videos (hashtag) | /v1/tiktok/search/hashtag | hashtag | fyp
TikTok top (photos+videos) | /v1/tiktok/search/top | query | funny
Instagram reels | /v2/instagram/reels/search | query | dogs
YouTube | /v1/youtube/search | query | funny
YouTube hashtag | /v1/youtube/search/hashtag | hashtag | funny
Reddit (all) | /v1/reddit/search | query | best programming languages
Reddit (in subreddit) | /v1/reddit/subreddit/search | subreddit + query | AskReddit + funny
Threads posts | /v1/threads/search | query | AI
Threads users | /v1/threads/search/users | query | zuck
Pinterest | /v1/pinterest/search | query | home decor
Google | /v1/google/search | query | best restaurants in NYC

Ad Libraries

Platform | Endpoint | Primary Param | Example
Facebook ads search | /v1/facebook/adLibrary/search/ads | query | running
Facebook company ads | /v1/facebook/adLibrary/company/ads | pageId or companyName | Lululemon
Facebook ad detail | /v1/facebook/adLibrary/ad | id or url | 702369045530963
Facebook find companies | /v1/facebook/adLibrary/search/companies | query | Nike
Google company ads | /v1/google/company/ads | domain or advertiser_id | nike.com
Google ad detail | /v1/google/ad | url | https://adstransparency.google.com/...
Google find advertisers | /v1/google/adLibrary/advertisers/search | query | Nike
LinkedIn ads search | /v1/linkedin/ads/search | company or keyword | Shopify
LinkedIn ad detail | /v1/linkedin/ad | url | https://www.linkedin.com/ad/...
Reddit ads search | /v1/reddit/ads/search | query | gaming
Reddit ad detail | /v1/reddit/ad | id | t3_abc123

Trending / Popular

Content | Endpoint | Param | Example
Trending feed | /v1/tiktok/get-trending-feed | region (required) | US
Popular videos | /v1/tiktok/videos/popular | - | -
Popular creators | /v1/tiktok/creators/popular | - | -
Popular hashtags | /v1/tiktok/hashtags/popular | - | -
Popular songs | /v1/tiktok/songs/popular | - | -
Song details | /v1/tiktok/song | clipId | 7439295283975702544
Videos using song | /v1/tiktok/song/videos | clipId | 7439295283975702544
Trending shorts (YT) | /v1/youtube/shorts/trending | - | -

Followers / Following / Live (TikTok only)

Type | Endpoint | Example
Following | /v1/tiktok/user/following | handle=stoolpresidente
Followers | /v1/tiktok/user/followers | handle=stoolpresidente
Audience demographics | /v1/tiktok/user/audience (26 credits!) | handle=shakira
Live stream | /v1/tiktok/user/live | handle=thejustalex

TikTok Shop

Type | Endpoint | Primary Param | Example
Search products | /v1/tiktok/shop/search | query | shoes
Store products | /v1/tiktok/shop/products | url | https://www.tiktok.com/shop/store/goli-nutrition/7495794203056835079
Product detail | /v1/tiktok/product | url | https://www.tiktok.com/shop/pdp/goli-ashwagandha-gummies.../1729587769570529799
Product reviews | /v1/tiktok/shop/product/reviews | url or product_id | 1731578642912612516
User showcase | /v1/tiktok/user/showcase | handle | mrtiktokreviews

Link-in-Bio / Other

Service | Endpoint | Param | Example
Linktree | /v1/linktree | url | https://linktr.ee/...
Komi | /v1/komi | url | https://komi.io/...
Pillar | /v1/pillar | url | https://pillar.io/...
Linkbio | /v1/linkbio | url | https://linkbio.co/...
Linkme | /v1/linkme | url | https://linkme.bio/...
Amazon Shop | /v1/amazon/shop | url | https://www.amazon.com/shop/...
Instagram basic profile | /v1/instagram/basic/profile | userId | 314216
Instagram embed HTML | /v1/instagram/user/embed | handle | jane
Age/Gender detect | /v1/detect/age-gender | url (social profile) | https://www.tiktok.com/@charlidamelio
Credit balance | /v1/credit/balance | (none) | -

ScrapeCreators pagination

Paginated endpoints return a cursor/token in the response. Pass it back as a query param to get the next page.

Cursor Field | Used By
cursor | TikTok comments/search/song videos, Instagram comments, Reddit subreddit search, Pinterest, Bluesky, Facebook reels/photos/posts/comments, TikTok Shop products/user showcase
max_cursor | TikTok profile videos
min_time | TikTok following/followers
continuationToken | YouTube (all paginated endpoints)
after | Reddit posts, Reddit search
next_max_id | Instagram posts, Truth Social posts
max_id | Instagram reels
page | TikTok popular/shop, Instagram reels search, LinkedIn company posts, TikTok Shop reviews
paginationToken | LinkedIn ads
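Cursor-based pagination can be wrapped in a bounded loop, which keeps feeds from being bulk-scraped. The sketch below makes several assumptions. The next cursor is taken to come back under a top-level key with the same name as the query param, so check the endpoint's OpenAPI spec, because some responses may nest or rename it. `fetch_page` is a hypothetical callable that performs one ScrapeCreators GET. `CURSOR_PARAM` is a partial mapping copied from the table above. Endpoints that use a numeric `page` need an incrementing counter instead.

```python
from typing import Callable, Iterator

# Partial endpoint -> cursor param mapping, taken from the table above.
CURSOR_PARAM = {
    "/v3/tiktok/profile/videos": "max_cursor",
    "/v1/tiktok/user/followers": "min_time",
    "/v1/tiktok/user/following": "min_time",
    "/v1/youtube/channel/videos": "continationToken".replace("contin", "continu"),
    "/v1/reddit/subreddit": "after",
    "/v2/instagram/user/posts": "next_max_id",
    "/v1/instagram/user/reels": "max_id",
}


def paginate(fetch_page: Callable[[dict], dict], params: dict,
             cursor_param: str, max_pages: int = 3) -> Iterator[dict]:
    """Yield at most max_pages responses, passing each page's cursor to the next call.

    Assumption: the next cursor is at page[cursor_param]; verify per endpoint.
    """
    query = dict(params)
    for _ in range(max_pages):
        page = fetch_page(query)
        yield page
        next_cursor = page.get(cursor_param)
        if not next_cursor:  # missing, None, or empty: no further pages
            break
        query = {**params, cursor_param: next_cursor}
```

Keep `max_pages` small. Each page is a separate paid request.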

ScrapeCreators known limitations

  • Handles: pass without the @ symbol. Use charlidamelio not @charlidamelio. Applies to TikTok, Instagram, Twitter, Threads, Bluesky, Snapchat, Twitch, Pinterest, Truth Social
  • YouTube handles: pass without the @ symbol. Use ThePatMcAfeeShow not @ThePatMcAfeeShow. You can also pass a channelId or full URL instead
  • Hashtags: pass without the # symbol. Use fyp not #fyp. Applies to TikTok and YouTube hashtag search endpoints
  • Twitter user tweets: returns ~100 of the user's most popular tweets, not the latest tweets in chronological order
  • Threads: only last 20-30 posts visible publicly
  • Facebook posts: only 3 posts per page (API limitation)
  • Facebook group posts: only 3 posts per page (same limitation)
  • LinkedIn company posts: max 7 pages
  • Instagram play counts: IG-only views (excludes cross-posted FB views)
  • Truth Social: only prominent users (Trump, Vance, etc.) work publicly
  • Transcripts: all transcript endpoints require the video to be under 2 minutes
  • Reddit subreddit names: case-sensitive! Use "AskReddit" not "askreddit"
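The input rules above can be applied with a few small normalizers. This is a minimal sketch, and the function names are hypothetical. Handles lose a leading @, hashtags lose a leading #, and subreddit names keep their original casing because they are case-sensitive. Stripping an "r/" prefix is an extra convenience this sketch assumes, not something the API documentation states.

```python
def normalize_handle(handle: str) -> str:
    """Remove surrounding whitespace and leading '@' (charlidamelio, not @charlidamelio)."""
    return handle.strip().lstrip("@")


def normalize_hashtag(tag: str) -> str:
    """Remove surrounding whitespace and leading '#' (fyp, not #fyp)."""
    return tag.strip().lstrip("#")


def normalize_subreddit(name: str) -> str:
    """Drop an optional '/r/' or 'r/' prefix but never change case."""
    name = name.strip()
    for prefix in ("/r/", "r/"):
        if name.startswith(prefix):
            return name[len(prefix):]
    return name
```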

Access patterns

ScrapeCreators (social media)

Call the ScrapeCreators API directly with curl or WebFetch, authenticating with the x-api-key header.

curl -s "https://api.scrapecreators.com/v1/tiktok/profile?handle=charlidamelio" \
  -H "x-api-key: $SCRAPECREATORS_API_KEY"

Or with fetch:

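const res = await fetch(
  "https://api.scrapecreators.com/v1/tiktok/profile?handle=charlidamelio",
  { headers: { "x-api-key": process.env.SCRAPECREATORS_API_KEY } }
);
// Surface HTTP errors (e.g. 403, 429) instead of parsing an error body as data
if (!res.ok) throw new Error(`ScrapeCreators request failed: HTTP ${res.status}`);
const data = await res.json();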

Each endpoint has its own OpenAPI spec at https://docs.scrapecreators.com/{path}/openapi.json. Always fetch the per-endpoint spec first to get full parameter details before making the actual API call. The full spec is at https://docs.scrapecreators.com/openapi.json (large file — prefer per-endpoint specs).
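The per-endpoint spec lookup can be scripted. This sketch uses only the Python standard library and assumes the spec follows the standard OpenAPI 3 layout, spec["paths"][path]["get"]["parameters"]. Parameters given only as `$ref` entries are skipped. `spec_url`, `list_get_params`, and `fetch_spec` are hypothetical helper names.

```python
import json
import urllib.request

DOCS_BASE = "https://docs.scrapecreators.com"


def spec_url(path: str) -> str:
    """/v1/tiktok/profile -> https://docs.scrapecreators.com/v1/tiktok/profile/openapi.json"""
    return f"{DOCS_BASE}/{path.strip('/')}/openapi.json"


def list_get_params(spec: dict, path: str) -> list[tuple[str, bool]]:
    """Return (name, required) for each inline parameter of GET `path`."""
    operation = spec.get("paths", {}).get(path, {}).get("get", {})
    return [(p["name"], bool(p.get("required", False)))
            for p in operation.get("parameters", []) if "name" in p]


def fetch_spec(path: str) -> dict:
    """Download one endpoint's spec (network call)."""
    with urllib.request.urlopen(spec_url(path), timeout=30) as resp:
        return json.load(resp)
```

Usage: `list_get_params(fetch_spec("/v1/tiktok/profile"), "/v1/tiktok/profile")`.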

Common optional params:

  • trim (boolean): reduces response payload size. Use when you only need key metrics.
  • region (string): 2-letter country code for proxy location. Does NOT filter by region — just routes through that country's proxy.
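The same call can be built in Python with the standard library, including the optional params above. This is a sketch only. `build_request` is a hypothetical helper. Serializing booleans as lowercase "true"/"false" is an assumption about what the API expects, since Python's `urlencode` would otherwise send "True".

```python
import os
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://api.scrapecreators.com"


def build_request(path: str, params: dict,
                  api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET for a ScrapeCreators endpoint."""
    key = api_key or os.environ.get("SCRAPECREATORS_API_KEY", "")
    # Assumption: boolean flags such as trim are expected as "true"/"false".
    query = urllib.parse.urlencode(
        {k: (str(v).lower() if isinstance(v, bool) else v) for k, v in params.items()}
    )
    return urllib.request.Request(
        f"{BASE_URL}{path}?{query}", headers={"x-api-key": key}, method="GET"
    )
```

To send it: `json.load(urllib.request.urlopen(build_request("/v1/tiktok/profile", {"handle": "charlidamelio", "trim": True}), timeout=60))`. Remember that `region` only picks the proxy country. It does not filter results.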

SerpApi (YouTube via transparent proxy)

Use Python scripts with core.http_client.proxied_get, and include a type-prefixed SC-CALLER-ID header (for example chat:youtube-transcript) so costs can be tracked per caller.

from core.http_client import proxied_get

headers = {"SC-CALLER-ID": "chat:youtube-transcript"}

search = proxied_get(
    "https://serpapi.com/search.json",
    params={"engine": "youtube", "search_query": "topic keywords"},
    headers=headers,
).json()

video = proxied_get(
    "https://serpapi.com/search.json",
    params={"engine": "youtube_video", "v": "VIDEO_ID"},
    headers=headers,
).json()

transcript = proxied_get(
    "https://serpapi.com/search.json",
    params={"engine": "youtube_video_transcript", "v": "VIDEO_ID", "language_code": "en"},
    headers=headers,
).json()

Firecrawl (web page fallback via transparent proxy)

from core.http_client import proxied_post

headers = {"SC-CALLER-ID": "chat:web-crawl-fallback"}

page = proxied_post(
    "https://api.firecrawl.dev/v2/scrape",
    json={
        "url": "https://example.com/article",
        "formats": ["markdown", "links"],
        "onlyMainContent": True,
        "timeout": 60000,
    },
    headers=headers,
).json()

Decision rules

Route every request to the right backend. The user should never need to specify which API to use.

Social media request (profile, posts, comments, search, ads, trending, transcripts)

Use ScrapeCreators. Match the user's intent to an endpoint from the routing tables above. Strip @ from handles and # from hashtags before calling. Fetch the per-endpoint OpenAPI spec first for full param details.

YouTube URL or YouTube content request

Use SerpApi for YouTube search/discovery and transcript retrieval via the transparent proxy. Alternatively, ScrapeCreators also covers YouTube (channel info, videos, shorts, playlists, comments, transcripts, search, trending shorts) — use whichever is more convenient for the specific request.

For a YouTube URL, extract the 11-character video id and call youtube_video_transcript first if the user's goal is content analysis, summarization, quote extraction, or topic mining. This returns timestamped transcript segments directly.
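Extracting the video id can be done with `urllib.parse`. The sketch below is a minimal version that handles the common URL shapes (watch?v=, youtu.be/, shorts/, embed/, live/). For anything else it returns None instead of guessing. The helper name `youtube_video_id` is hypothetical.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

VIDEO_ID = re.compile(r"^[A-Za-z0-9_-]{11}$")


def youtube_video_id(url: str) -> Optional[str]:
    """Return the 11-character video id from a YouTube URL, or None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
                candidate = parts[1]
    # Only accept ids that look valid; never pass a guess to the transcript call.
    return candidate if candidate and VIDEO_ID.match(candidate) else None
```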

If the transcript call returns empty, unavailable, or the wrong language, call youtube_video next to inspect title, description, channel, publication date, and any advertised transcript metadata. Try one obvious language_code change only when the desired language is clear. If no transcript exists, summarize from metadata only and say the transcript was not available.
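The fallback order above, sketched as one function that makes at most three calls. `get_json` is a hypothetical callable that sends one SerpApi request through the proxy and returns parsed JSON. In practice it could be `lambda p: proxied_get("https://serpapi.com/search.json", params=p, headers=headers).json()`. The sketch assumes transcript segments arrive in a "transcript" list. It treats only an empty result as failure, so detecting a wrong language is left to the caller.

```python
from typing import Callable, Optional


def transcript_or_metadata(get_json: Callable[[dict], dict], video_id: str,
                           language_code: str = "en",
                           alt_language: Optional[str] = None) -> dict:
    """Transcript first, then metadata, then at most one obvious language change."""
    def transcript(lang: str) -> list:
        resp = get_json({"engine": "youtube_video_transcript", "v": video_id,
                         "language_code": lang})
        return resp.get("transcript") or []  # assumption: segments live here

    segments = transcript(language_code)
    if segments:
        return {"source": "transcript", "language": language_code, "segments": segments}

    # Empty or unavailable: inspect title, description, channel, date.
    metadata = get_json({"engine": "youtube_video", "v": video_id})

    # One language change, only when the desired language is clear.
    if alt_language and alt_language != language_code:
        segments = transcript(alt_language)
        if segments:
            return {"source": "transcript", "language": alt_language,
                    "segments": segments, "metadata": metadata}

    # The answer must say it was built from metadata only.
    return {"source": "metadata_only", "metadata": metadata,
            "note": "Transcript was not available."}
```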

For a YouTube topic query, call engine=youtube, choose relevant video_results, then call metadata/transcript only for the videos needed. Avoid fetching many transcripts by default.
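Keeping the follow-up calls to a small shortlist can look like the sketch below. It assumes each entry in `video_results` has a "link" of the form https://www.youtube.com/watch?v=<id>. Entries without one, such as shorts or channel results, are skipped. Relevance is left to SerpApi's ordering or to your own reading of the titles. `shortlist_video_ids` is a hypothetical name.

```python
from urllib.parse import parse_qs, urlparse


def shortlist_video_ids(search_response: dict, max_videos: int = 3) -> list[str]:
    """First few unique video ids from an engine=youtube response."""
    ids: list[str] = []
    for item in search_response.get("video_results", []):
        vid = parse_qs(urlparse(item.get("link", "")).query).get("v", [None])[0]
        if vid and vid not in ids:
            ids.append(vid)
        if len(ids) >= max_videos:
            break  # stop early: each id may cost further paid calls
    return ids
```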

Blocked or JS-heavy web page

Use Firecrawl once with formats:["markdown","links"] and onlyMainContent:true. Treat the returned Markdown as the extraction substrate, not as final truth: parse the title, price/value fields, specs, body description, image URLs, outbound links, and obvious contact/location hints from the page structure.
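Treating the Markdown as a substrate, without site-specific labels, can start from its structure alone. This rough sketch uses regular expressions to pull the first heading, image URLs, and link URLs. Prices, specs, and location hints still need reading in context. It assumes the Markdown is at `page["data"]["markdown"]` in the Firecrawl v2 response. `summarize_markdown` is a hypothetical name, and nested constructs such as linked images are only approximated.

```python
import re

HEADING = re.compile(r"^#{1,6}[ \t]+(.+)$", re.MULTILINE)
IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")
LINK = re.compile(r"(?<!!)\[([^\]]*)\]\(([^)\s]+)")  # skip ![image](...) syntax


def _unique(items) -> list:
    seen, out = set(), []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def summarize_markdown(markdown: str) -> dict:
    """Generic fields read only from Markdown structure."""
    heading = HEADING.search(markdown)
    return {
        "title": heading.group(1).strip(" #") if heading else None,
        "images": _unique(IMAGE.findall(markdown)),
        "links": _unique(url for _, url in LINK.findall(markdown)),
        "word_count": len(markdown.split()),
    }
```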

General web-page extraction lessons:

  • Many listing/detail pages render important content with JavaScript, image galleries, hidden sections, or repeated UI labels. web_fetch may return boilerplate while Firecrawl can still recover the real main content.
  • Do not hard-code site-specific labels. Convert page text into a generic structured summary: what it is, where it is, key numbers, evidence snippets, media/links, and caveats.
  • Preserve source URLs for images and links when they help verify the page, but do not download or batch-process every media asset unless the user asks.
  • If Markdown misses important layout or structured fields, retry once with rawHtml; use json, question, or highlights only when the user asked for narrow extraction and the schema/prompt is specific.
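The "retry once with rawHtml" rule caps Firecrawl at two paid calls per page, as in the sketch below. `scrape` is a hypothetical callable that posts one payload to /v2/scrape and returns parsed JSON. In practice it could be `lambda p: proxied_post("https://api.firecrawl.dev/v2/scrape", json=p, headers=headers).json()`. The character threshold is an assumed stand-in for the judgment that "Markdown misses important layout or structured fields".

```python
from typing import Callable

FIRST_PASS = {"formats": ["markdown", "links"], "onlyMainContent": True, "timeout": 60000}
RETRY_PASS = {"formats": ["rawHtml"], "timeout": 60000}


def scrape_once_then_raw(scrape: Callable[[dict], dict], url: str,
                         min_chars: int = 200) -> dict:
    """At most two calls: Markdown first, then a single rawHtml retry if it looks thin."""
    first = scrape({"url": url, **FIRST_PASS})
    # Assumption: v2 responses carry content under "data".
    markdown = (first.get("data") or {}).get("markdown") or ""
    if len(markdown.strip()) >= min_chars:
        return first
    return scrape({"url": url, **RETRY_PASS})  # no further retries after this
```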

Cost discipline

ScrapeCreators — most endpoints cost 1 credit per request. Exceptions: /v1/tiktok/user/audience costs 26 credits; /v1/tiktok/video/transcript with use_ai_as_fallback=true costs +10 credits; /v1/google/company/ads with get_ad_details=true costs 25 credits. Warn users before calling expensive endpoints.
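A pre-call credit estimate makes the "warn first" rule mechanical. This sketch encodes only the costs stated above and treats every other endpoint as 1 credit. It assumes the "+10 credits" for AI fallback comes on top of the 1-credit base and is charged in the worst case. The function names are hypothetical.

```python
def _flag(value) -> bool:
    """True for the boolean True or the string "true" (any case)."""
    return value is True or (isinstance(value, str) and value.lower() == "true")


def estimate_credits(path: str, params: dict) -> int:
    """Estimated ScrapeCreators credits for one request, per the costs listed above."""
    if path == "/v1/tiktok/user/audience":
        return 26
    if path == "/v1/google/company/ads" and _flag(params.get("get_ad_details")):
        return 25
    if path == "/v1/tiktok/video/transcript" and _flag(params.get("use_ai_as_fallback")):
        return 1 + 10  # assumption: +10 on top of the usual 1 credit
    return 1  # "most endpoints cost 1 credit per request"


def needs_warning(path: str, params: dict) -> bool:
    """Warn the user before anything costlier than an ordinary request."""
    return estimate_credits(path, params) > 1
```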

SerpApi — billed per successful search-like call, regardless of result count.

Firecrawl — billed per page plus expensive modifiers.

Keep calls tight: one page, one video, or a small shortlist. Never batch-crawl whole websites or bulk-scrape entire feeds with this skill.

If the proxy returns 403, the request is outside the allowed use case. Change the approach instead of retrying.

If the proxy returns 429, back off; do not parallelize around the limit.

If the upstream returns a failure, report the exact failure and avoid repeated paid retries unless one parameter change is clearly justified.
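The status-code rules above can be wrapped around one sequential request. This sketch stops on 403 and backs off exponentially on 429 for a small, fixed number of retries. Any other failure is reported verbatim rather than retried. `do_request` is a hypothetical zero-argument callable returning `(status, body)`. `sleep` is injectable so the policy can be tested without waiting.

```python
import time
from typing import Callable, Tuple


class StopCalling(Exception):
    """The response means 'change the approach or report it', not 'try again'."""


def call_with_policy(do_request: Callable[[], Tuple[int, object]],
                     max_429_retries: int = 2, base_delay: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> object:
    """Apply the 403 / 429 / failure rules to a single sequential request."""
    for attempt in range(max_429_retries + 1):
        status, body = do_request()
        if 200 <= status < 300:
            return body
        if status == 403:
            raise StopCalling("403: request is outside the allowed use case")
        if status == 429 and attempt < max_429_retries:
            sleep(base_delay * 2 ** attempt)  # back off; never parallelize around it
            continue
        # Other failures, or 429 after the last retry: report the exact failure.
        raise StopCalling(f"upstream failure {status}: {body!r}")
    raise StopCalling("max_429_retries must be >= 0")  # only reached if range() was empty
```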

