tiktok-hotspot-monitor

TikTok US women's fashion hotspot monitor. Crawls video metadata via Apify (primary) or Playwright (backup), analyzes trends with heat/coverage scoring, generates static HTML reports. Supports cost-aware 5-window crawl strategy.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "tiktok-hotspot-monitor" with this command: npx skills add tandongtaotao/tiktok-hotspot-monitor

TikTok Hotspot Monitor — Agent Skill

1. Task Boundary (Scope)

Responsible For

  • Crawling TikTok video public metadata (keyword/hashtag/creator/music sources) via Apify cloud Actor (clockworks/tiktok-scraper)
  • Fallback crawling via Playwright browser automation with saved session
  • Offline deduplication, heat scoring, and trend analysis
  • Term extraction: content keywords and TikTok hashtags, with multi-bucket aging
  • Long-term term status based on current-snapshot age distribution, not only previous-snapshot overlap
  • Coverage scoring to surface "broadly appearing" signals vs "single viral" signals
  • Static HTML report generation with dark theme

NOT Responsible For

  • Downloading video/audio files
  • Real-time streaming or WebSocket data
  • TikTok login or session management (must be pre-configured)
  • Sentiment analysis of comments
  • Cross-platform trend comparison
  • Automated social media posting
  • User authentication or authorization
  • Data persistence beyond local JSONL/JSON files

Agent Addition Scope

The agent MAY add new keyword/hashtag sources to the config. The agent MUST NOT modify crawl window weights or add new window types without user approval, as those affect Apify billing.


2. Input Schema

2.1 Main Config (config/tiktok_hotspot_sources.json)

interface CrawlerConfig {
  market: string;                    // default: "US"
  output: {
    base_dir: string;                // default: "data/tiktok_hotspots"
    snapshots_dir: string;           // default: "snapshots"
    logs_dir: string;                // default: "logs"
  };
  provider: {
    type: "apify" | "tiktok_mcp";   // default: "apify"
    actor_id?: string;               // required if type=apify
  };
  defaults: {
    limit: number;                   // default: 10, per-source limit
  };
  sources: Array<{
    type: "keyword" | "hashtag" | "creator" | "music";
    value: string;
    limit?: number;                  // override defaults.limit
    enabled?: boolean;               // default: true
  }>;
  apify?: {
    token_env?: string;              // default: "APIFY_TOKEN"
    actor_id?: string;
    input: {
      defaults: Record<string, any>;
      per_source?: Record<string, any>;
      crawl_windows?: Record<string, CrawlWindow[]>;
    };
  };
  tiktok_mcp?: {
    command?: string;
    args?: string[];
    timeout_seconds?: number;
    reject_simulated?: boolean;
  };
}

interface CrawlWindow {
  name: string;
  label: string;
  weight: number;                    // allocation weight
  input: Record<string, any>;        // searchSorting, searchDatePosted, etc.
}
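
Putting the schema together, a minimal working config could look like the following. The source values and the Actor input field (resultsPerPage) are illustrative assumptions, not shipped defaults — check the Actor's own input schema.

```json
{
  "market": "US",
  "output": {
    "base_dir": "data/tiktok_hotspots",
    "snapshots_dir": "snapshots",
    "logs_dir": "logs"
  },
  "provider": { "type": "apify", "actor_id": "clockworks/tiktok-scraper" },
  "defaults": { "limit": 10 },
  "sources": [
    { "type": "keyword", "value": "summer dress", "limit": 20 },
    { "type": "hashtag", "value": "womensfashion" },
    { "type": "creator", "value": "someboutique", "enabled": false }
  ],
  "apify": {
    "token_env": "APIFY_TOKEN",
    "input": { "defaults": { "resultsPerPage": 10 } }
  }
}
```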

2.2 CLI Arguments

| Argument | Type | Default | Description |
|---|---|---|---|
| --config | Path | config/tiktok_hotspot_sources.json | Config file |
| --once | Flag | - | Run single crawl |
| --schedule | Flag | - | Run continuously |
| --max-sources | int | None | Limit enabled sources |
| --snapshot | Path | latest | JSONL snapshot for analysis |
| --previous-snapshot | Path | auto | Previous snapshot for comparison |
| --top | int | 10 | Items per ranked section |
| --report | Path | latest | Analysis JSON for rendering |

2.3 Environment Variables

| Variable | Required | Description |
|---|---|---|
| APIFY_TOKEN | For Apify mode | Apify API token |
| TIKTOK_PROXY | For Playwright mode | Proxy URL |

3. Output Schema

3.1 Crawl Snapshot (JSONL, one record per line)

interface CrawlRecord {
  crawl_timestamp: string;           // UTC ISO
  source_type: "keyword" | "hashtag" | "creator" | "music";
  source_value: string;
  crawl_window: string;
  crawl_window_label: string;
  crawl_window_limit: number;
  video_id: string | null;
  webpage_url: string | null;
  title: string | null;
  description: string | null;
  uploader: string | null;
  uploader_id: string | null;
  view_count: number | null;
  like_count: number | null;
  comment_count: number | null;
  share_count: number | null;
  collect_count: number | null;
  hashtags: string[] | null;
  music: {
    id: string | null;
    track: string | null;
    artist: string | null;
  };
  upload_date: string | null;        // ISO date
  duration: number | null;
  is_ad: boolean | null;
}
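
Reading a snapshot back is one json.loads per non-empty line; a minimal loader sketch (the helper name is ours, not the skill's):

```python
import json
from typing import Iterable, Iterator


def read_snapshot(lines: Iterable[str]) -> Iterator[dict]:
    """Yield CrawlRecord dicts from JSONL lines, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)
```

The same function works on an open file handle, since iterating a file yields lines.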

3.2 Crawl Log (JSONL)

interface LogEntry {
  crawl_timestamp: string;
  source_type: string;
  source_value: string;
  crawl_window: string;
  crawl_window_limit: number;
  status: "success" | "failed";
  record_count: number;
  error: string | null;
}

Last entry is a CrawlRoundSummary:

interface CrawlRoundSummary {
  event: "crawl_round_summary";
  crawl_timestamp: string;
  provider: string;
  enabled_source_count: number;
  crawl_window_count: number;
  planned_run_count: number;
  requested_total_limit: number;
  completed_run_count: number;
  failed_run_count: number;
  raw_record_count: number;
  unique_video_count: number;
  duplicate_rate: number;            // 0.0 - 1.0
  effective_unique_yield: number;    // unique / requested
  windows: Record<string, WindowMetrics>;
  cost_model_note: string;
}
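
The two derived ratios follow directly from the counts above; a small sketch (helper function is hypothetical, field names are taken from the schema):

```python
def summarize_round(raw_record_count: int, unique_video_count: int,
                    requested_total_limit: int) -> dict:
    """Compute the derived CrawlRoundSummary metrics (hypothetical helper)."""
    duplicate_rate = (
        1.0 - unique_video_count / raw_record_count if raw_record_count else 0.0
    )
    effective_unique_yield = (
        unique_video_count / requested_total_limit if requested_total_limit else 0.0
    )
    return {
        "duplicate_rate": round(duplicate_rate, 4),
        "effective_unique_yield": round(effective_unique_yield, 4),
    }
```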

3.3 Analysis Report (JSON)

interface AnalysisReport {
  generated_at: string;
  snapshot_path: string;
  previous_snapshot_path: string | null;
  analysis_window: {
    current_snapshot_time: string;
    previous_snapshot_time: string | null;
    interval_hours: number | null;
    matched_previous_video_count: number;
  };
  record_count: number;
  unique_video_count: number;
  source_counts: Record<string, number>;
  top_videos: VideoItem[];
  top_rising_videos: VideoItem[];
  recent_videos_by_age: AgeBucket<VideoItem>[];
  recent_signals_by_age: SignalBucket[];
  established_terms: TermItem[];
  established_hashtags: TermItem[];
  top_music: RankedItem[];
  top_creators: RankedItem[];
  crawl_metrics: CrawlRoundSummary | null;
}

3.4 HTML Report

Self-contained static HTML file at data/tiktok_hotspot_analysis/tiktok_hotspot_report_<timestamp>.html. No external dependencies. Dark themed. Machine-readable data embedded as JSON in comments.


4. Tools

4.1 crawl_tiktok_hotspots.py — Metadata Crawler

When to call:

  • User requests data collection
  • Need fresh snapshot for analysis
  • Smoke test / validation run

When NOT to call:

  • User wants to view existing data only (use analyze instead)
  • The config is known to be invalid and has not been fixed yet
  • Apify mode: APIFY_TOKEN not set (check env first)
  • MCP mode: session file missing (run tiktok_login_save_session.py first)

Provider switching: Edit config/tiktok_hotspot_sources.json to switch between providers:

// Apify mode (default, full features)
{ "provider": { "type": "apify", "actor_id": "clockworks/tiktok-scraper" } }

// Local MCP mode (limited, testing only)
{ "provider": { "type": "tiktok_mcp" } }

MCP mode requires:

  1. pip install playwright && playwright install chromium
  2. python scripts/tiktok_login_save_session.py (manual TikTok login)
  3. Config tiktok_mcp.args pointing to scripts/tiktok_search_mcp_adapter.py

Implementation:

# Provider dispatch (sketch; run bodies elided)
if config.provider_type == "apify":
    # Requires APIFY_TOKEN in env
    # Each source × window → one Actor run
    # Supports all 4 source types
    ...
elif config.provider_type == "tiktok_mcp":
    # Requires saved session file
    # Keyword/hashtag only, ~12 items per source
    ...

Error states:

| Error | Recovery |
|---|---|
| Apify token missing | Check env, prompt user to set APIFY_TOKEN |
| Actor run timeout | Retry with same config |
| No videos found | Log as failed window, continue |
| MCP session expired | Prompt re-login via tiktok_login_save_session.py |
| Proxy unreachable | Skip proxy or switch to Apify |
| Snapshot empty | Check sources config, ensure keywords are valid |

Retry policy:

  • Network errors: retry up to 2 times with 5s backoff
  • Actor failures: no retry (Apify handles internally), log and continue
  • MCP browser crash: retry once
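
The network-error branch of this policy can be sketched as a generic wrapper (the function name and retryable exception set are assumptions, not the crawler's actual code):

```python
import time


def with_retries(fn, *, attempts=3, backoff_seconds=5, retryable=(OSError,)):
    """Call fn(), retrying up to (attempts - 1) times on retryable errors.

    Matches the stated policy: network errors are retried up to 2 times
    with a 5 s backoff; any other exception propagates immediately.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == attempts:
                raise
            time.sleep(backoff_seconds)
```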

4.2 analyze_tiktok_hotspots.py — Offline Analyzer

When to call:

  • After crawl completes
  • User has existing snapshot to analyze
  • Need updated report

Implementation steps:

  1. Load snapshot JSONL → validate each record has video_id
  2. Deduplicate by video_id (keep highest heat score)
  3. Compute per-video heat score
  4. Bucket videos by upload age (1d/3d/7d/14d)
  5. Extract content terms and hashtags
  6. Compute cross-bucket novelty (new vs existing terms)
  7. Compute coverage scores
  8. Compare with previous snapshot for growth metrics
  9. Output structured JSON
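
Steps 1-3 might look like the following sketch; the heat weighting here is a placeholder for illustration, not the analyzer's actual formula:

```python
def heat_score(rec: dict) -> float:
    # Placeholder weighting; the real analyzer's formula may differ.
    return (
        (rec.get("view_count") or 0) * 1.0
        + (rec.get("like_count") or 0) * 5.0
        + (rec.get("comment_count") or 0) * 10.0
        + (rec.get("share_count") or 0) * 20.0
    )


def dedupe(records: list[dict]) -> list[dict]:
    """Keep one record per video_id, preferring the highest heat score."""
    best: dict[str, dict] = {}
    for rec in records:
        vid = rec.get("video_id")
        if vid is None:
            continue  # step 1: records without video_id are dropped
        if vid not in best or heat_score(rec) > heat_score(best[vid]):
            best[vid] = rec
    return list(best.values())
```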

4.2.1 Long-term Term Status

Long-term content terms and hashtags are not dropped when they are missing from the previous snapshot. A term enters the long-term section when its oldest matched video is older than 30 days. Its status is then computed from the current snapshot's video-age distribution:

| Status | Condition | Meaning |
|---|---|---|
| spreading | newest video <= 7 days AND recent_7d_count / video_count >= 10% | Still actively spreading |
| mature_or_flat | newest video <= 30 days but the 7-day ratio is too low | Existing signal, activity weakening |
| cooling | newest video > 30 days | No recent new videos; cooling down |

This avoids losing a long-term term simply because the previous crawl did not hit it, while also preventing one recent video among many old videos from falsely marking a term as spreading.
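
The status rules encode directly into a classifier (thresholds as stated; the function name is ours):

```python
def long_term_status(newest_video_age_days: float,
                     recent_7d_count: int,
                     video_count: int) -> str:
    """Classify a long-term term from the current snapshot's age distribution."""
    if newest_video_age_days > 30:
        return "cooling"
    ratio = recent_7d_count / video_count if video_count else 0.0
    if newest_video_age_days <= 7 and ratio >= 0.10:
        return "spreading"
    return "mature_or_flat"
```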

4.3 render_tiktok_hotspot_report.py — HTML Report Generator

When to call:

  • After analysis completes
  • User requests visual output

Output: Valid HTML5, self-contained, no external CSS/JS.

4.4 tiktok_login_save_session.py — Session Setup (optional)

When to call:

  • User wants to use local Playwright mode
  • Session file missing or expired

5. State Machine

IDLE
  │
  ▼
CONFIG_LOAD ──invalid──▶ ERROR (report config issue)
  │
  ▼
CRAWL_PLAN
  ├─ Build requests: enabled_sources × crawl_windows
  ├─ Compute: planned_run_count, requested_total_limit
  └─ Validate: at least 1 enabled source
  │
  ▼
CRAWL_EXECUTE ──fail──▶ PARTIAL_COMPLETE (log failures, continue)
  │                       │
  ▼                       ▼
SNAPSHOT_WRITTEN       PARTIAL_SNAPSHOT
  │                       │
  └───────both────────────▶
  │
  ▼
ANALYZE ──empty_snapshot──▶ ERROR (no records to analyze)
  │
  ▼
REPORT_GENERATE ──fail──▶ ERROR (corrupted analysis JSON)
  │
  ▼
COMPLETE

State management is handled by the Python scripts via:

  • Exit codes: 0 (success), 1 (partial failure), 2 (config/input error)
  • Logs: per-run JSONL entries with status
  • Summary: CrawlRoundSummary as last log entry

6. Error Recovery

6.1 Crawl Phase

| Failure Mode | Detection | Recovery |
|---|---|---|
| Invalid config | load_config() raises ValueError | Report exact field, suggest fix |
| No enabled sources | Config load check | Add at least one source |
| Apify token missing | os.environ.get() returns empty | Message: "Set APIFY_TOKEN in .env" |
| All sources fail | All log entries show failed | Check token, network, actor_id |
| Some sources fail | Log shows mixed success/fail | Continue, report failed count |
| Snapshot empty | 0 records written | Check source keywords/limits |
| Disk full | write() raises OSError | Free disk space, retry |
| MCP browser timeout | asyncio.wait_for raises | Fall back to fewer sources |
| MCP session expired | Actor raises RuntimeError | Run tiktok_login_save_session.py |

6.2 Analyze Phase

| Failure Mode | Detection | Recovery |
|---|---|---|
| Snapshot missing | FileNotFoundError | Run crawl first |
| Corrupted JSONL | json.JSONDecodeError | Check snapshot, re-crawl |
| No video records | All lines lack video_id | Report empty snapshot |
| Previous snapshot missing | valid_snapshots() empty | Run without comparison |
| Division by zero | video_count = 0 | Guard with max(vc, 1) |

6.3 Report Phase

| Failure Mode | Detection | Recovery |
|---|---|---|
| Analysis JSON missing | FileNotFoundError | Run analyze first |
| Corrupted JSON | json.JSONDecodeError | Re-run analyze |
| KeyError in template | report.get(key) missing | Graceful fallback to empty |
| Encoding error | UnicodeEncodeError | Force UTF-8 output |

7. Planning Logic

7.1 Task Decomposition

For a typical hotspot monitoring request, decompose as:

Step 1: Check existing data
  ├─ Is there a recent snapshot? (< 24h old)
  │   └─ Yes → skip crawl, go to Step 3
  │   └─ No → continue to Step 2
  │
Step 2: Crawl
  ├─ Validate APIFY_TOKEN exists
  ├─ Load config
  ├─ Run crawl (with timeout guard)
  └─ Verify snapshot has records
  │
Step 3: Analyze
  ├─ Auto-select latest snapshot
  ├─ Auto-select previous snapshot (if exists)
  ├─ Run analysis
  └─ Verify output JSON has all required fields
  │
Step 4: Generate report
  ├─ Render HTML from analysis JSON
  └─ Verify output is valid HTML

7.2 Decision Tree

User: "check TikTok trends for summer dresses"

Check: Does latest snapshot exist and have records?
├─ YES: Is it < 24h old?
│   ├─ YES: Skip crawl, go to analyze
│   └─ NO: Is user OK waiting 5-30 min for crawl?
│       ├─ YES: Run crawl, then analyze
│       └─ NO: Use existing snapshot, warn about staleness
└─ NO: Must crawl first
    ├─ Is APIFY_TOKEN configured?
    │   ├─ YES: Use Apify provider
    │   └─ NO: Check MCP session
    │       ├─ EXISTS: Use MCP provider (limited data)
    │       └─ MISSING: Ask user to configure one
    └─ Run crawl

8. Guardrails

8.1 Cost Limits

| Guardrail | Value | Enforcement |
|---|---|---|
| Max sources per crawl | 50 | Config validation |
| Max limit per source | 500 | Config validation (positive_int) |
| Max requested total | 5000 | Config validation (project-level) |
| Max planned runs | 250 | 50 sources × 5 windows |
| Apify mode | Required for > 200 records | MCP limited to ~12/source |
| Report HTML size | < 5 MB | Self-limiting (trim if exceeded) |
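
A validation sketch enforcing the per-source and total limits above (function name and error messages are illustrative, not the skill's actual validator):

```python
MAX_SOURCES = 50
MAX_LIMIT_PER_SOURCE = 500
MAX_REQUESTED_TOTAL = 5000


def validate_plan(sources: list[dict], default_limit: int = 10) -> int:
    """Check enabled sources against the cost guardrails; return the total limit."""
    enabled = [s for s in sources if s.get("enabled", True)]
    if len(enabled) > MAX_SOURCES:
        raise ValueError(f"too many sources: {len(enabled)} > {MAX_SOURCES}")
    total = 0
    for s in enabled:
        limit = s.get("limit", default_limit)
        if not (0 < limit <= MAX_LIMIT_PER_SOURCE):
            raise ValueError(f"bad limit {limit} for source {s.get('value')!r}")
        total += limit
    if total > MAX_REQUESTED_TOTAL:
        raise ValueError(f"requested total {total} > {MAX_REQUESTED_TOTAL}")
    return total
```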

8.2 Time Limits

| Operation | Timeout | Enforcement |
|---|---|---|
| Single crawl run | 60 min | Bash timeout parameter |
| Per Apify Actor run | No limit | Apify handles internally |
| Per MCP search | 120 s | tiktok_mcp.timeout_seconds |
| Analysis | 30 s | Python processing (fast) |
| Report render | 10 s | Python processing (fast) |

8.3 Rate Limits

  • No concurrent Apify runs (sequentially dispatched)
  • MCP browser: one at a time (sequential per source)
  • Web fetching: 60s minimum between full re-crawls

8.4 Token / Credit Safety

  • Never commit .env to git
  • Never print API tokens in logs or console
  • APIFY_TOKEN read from environment only
  • MCP session file is local only

9. Evaluation Criteria

9.1 Crawl Success

| Criterion | Passing | Warning | Failing |
|---|---|---|---|
| Run completion | ≥ 90% runs succeed | 70-90% | < 70% |
| Record count | ≥ 80% of requested | 50-80% | < 50% |
| Duplicate rate | < 25% | 25-40% | > 40% |
| Failed windows | 0 | 1-3 | > 3 |
| Unique videos | ≥ 50 | 20-50 | < 20 |

9.2 Analysis Success

| Criterion | Passing | Failing |
|---|---|---|
| Snapshot has records | ≥ 10 unique videos | < 10 |
| Dedup processed | All records checked | Missing video_id |
| Term extraction | ≥ 1 content term found | 0 terms |
| JSON output | All required fields present | Missing required fields |
| Processing time | < 30s | > 60s |

9.3 Report Success

| Criterion | Passing | Failing |
|---|---|---|
| Valid HTML | Closes </html> tag | Missing closing tag |
| Metrics visible | ≥ 4 grid metrics shown | Empty grid |
| Videos rendered | Top list non-empty | Empty list |
| All sections present | 6+ sections | < 4 sections |

9.4 Decision: Proceed to Next Stage

After a validation crawl (target ~500 records):

unique_yield = unique_videos / requested_total_limit

if unique_yield >= 0.6 and duplicate_rate < 0.25:
    ✅ Proceed to pilot (2000 target)
elif unique_yield >= 0.4:
    ⚠️ Proceed with caution, review source quality
else:
    ❌ Block scaling, fix sources/windows first
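
The same gate as plain Python (thresholds as stated above; the return labels are ours):

```python
def scaling_decision(unique_videos: int, requested_total_limit: int,
                     duplicate_rate: float) -> str:
    """Gate the move from a validation crawl (~500) to a pilot (~2000)."""
    unique_yield = (
        unique_videos / requested_total_limit if requested_total_limit else 0.0
    )
    if unique_yield >= 0.6 and duplicate_rate < 0.25:
        return "proceed"
    if unique_yield >= 0.4:
        return "proceed_with_caution"
    return "block"
```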

10. Composability

10.1 Output Consumption

Other skills/agents consume analysis JSON via standard path:

# Example: another agent reads the analysis for downstream processing
import json
from pathlib import Path

analysis_path = Path("data/tiktok_hotspot_analysis/latest_analysis.json")
with analysis_path.open(encoding="utf-8") as f:
    report = json.load(f)

top_signals = [t["name"] for t in report.get("top_videos", [])[:5]]
hot_terms = [t["name"] for t in report.get("established_terms", [])[:10]]

10.2 Pipeline Integration

Data Source Agent
  └─► TikTok Hotspot Monitor Skill
        ├─► crawl → snapshot.jsonl
        │     └─► [External] Apify usage dashboard (cost tracking)
        ├─► analyze → analysis.json
        │     └─► [Downstream] Trend prediction / alerting
        └─► render → report.html
              └─► [Downstream] Static hosting / dashboard

10.3 File-Based Contract

All inter-skill communication is file-based:

| Artifact | Format | Schema | Consumer |
|---|---|---|---|
| Snapshot | JSONL | CrawlRecord | Analysis, ML pipeline |
| Analysis | JSON | AnalysisReport | Report, dashboards |
| Log | JSONL | LogEntry / Summary | Monitoring, cost tracking |
| Report | HTML | Self-contained | Human viewing |

10.4 Exit Codes

# Standard exit codes for script chaining
0: Success (all operations completed)
1: Partial success (some failures, usable results)
2: Configuration error (fix config before retry)

Appendix: Quick Reference

# Full pipeline (one command each)
python scripts/crawl_tiktok_hotspots.py --config config/tiktok_hotspot_sources.json --once
python scripts/analyze_tiktok_hotspots.py
python scripts/render_tiktok_hotspot_report.py

# Smoke test (2 sources)
python scripts/crawl_tiktok_hotspots.py --once --max-sources 2

# Validation run (500 records)
python scripts/crawl_tiktok_hotspots.py --config config/_tiktok_hotspot_apify_500_config.json --once

Apify Cost Note: Verify actual charges at console.apify.com → Usage. Cost depends on Actor pricing, run count, compute duration, memory, proxy usage, retries, add-ons, and account plan — not only requested result count.
