Human Extractor Skill

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "human-extractor" with this command: npx skills add yousufjoyian/claude-skills/yousufjoyian-claude-skills-human-extractor

Description

GPU-accelerated pipeline for detecting, tracking, and classifying humans in dashcam footage. Processes MP4 videos to extract human crops with optional CLIP-based head covering classification, saving all outputs to a unified directory with comprehensive indexing.

Purpose

Extract visual evidence of human presence from dashcam recordings for investigative analysis. Optimized for high throughput using NVDEC decoding, batched YOLOv8 detection, ByteTrack multi-object tracking, and optional CLIP classification.

Usage

Basic Invocation

Extract humans from Park_R videos on October 6, 2025

Advanced Invocation

Scan Park_R\20251006 and 20251007, keep only frames with people, save all outputs in one folder, add one full-frame per timestamp with boxes, use my GPU at max, filter for head-covered individuals at 80% confidence

Input Parameters

Required

  • roots (list[str]): One or more source directories containing MP4 files

  • Example: ["G:\\My Drive\\PROJECTS\\INVESTIGATION\\DASHCAM\\Park_R\\20251006"] (backslashes escaped for JSON)

Core Detection

  • confidence (float, default: 0.35): YOLOv8 detection confidence threshold (0.0-1.0)

  • iou (float, default: 0.50): IoU threshold for NMS (0.0-1.0)

  • yolo_batch (int, default: 64): YOLOv8 batch size (32-128 depending on VRAM)

CLIP Filtering (Optional)

  • clip_filter.enabled (bool, default: false): Enable head covering classification

  • clip_filter.threshold (float, default: 0.80): CLIP confidence threshold

  • clip_filter.batch (int, default: 384): CLIP batch size (256-512)

Hardware Acceleration

  • nvdec (bool, default: true): Use NVIDIA hardware video decoding

  • gpu_id (int, default: 0): CUDA device ID

Output Control

  • single_output_dir (str, default: "parsed\ALL_CROPS"): Unified output directory

  • save_full_frame (bool, default: false): Save one annotated full-frame per timestamp

  • full_frame_maxw (int, default: 1280): Max width for full-frame saves

  • draw_boxes (bool, default: true): Annotate boxes on full-frames

Filename Convention

  • filename_version (str, default: "v1"): Version tag for output filenames

Deduplication

  • dedup.enabled (bool, default: true): Enable similarity deduplication

  • dedup.ssim (float, default: 0.92): SSIM threshold (0.0-1.0)

  • dedup.rate_cap_per_track_per_min (int, default: 12): Max crops per track per minute
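The rate cap above can be sketched as a rolling 60-second window per track. This is an illustrative helper, not the pipeline's actual code; the class name and interface are assumptions.

```python
from collections import defaultdict, deque

class RateCap:
    """Allow at most `cap` crops per track within any rolling window.

    Hypothetical sketch of dedup.rate_cap_per_track_per_min.
    """
    def __init__(self, cap=12, window_ms=60_000):
        self.cap = cap
        self.window_ms = window_ms
        self.history = defaultdict(deque)  # track_id -> recent timestamps (ms)

    def allow(self, track_id, ts_ms):
        q = self.history[track_id]
        # Drop timestamps that have fallen out of the rolling window.
        while q and ts_ms - q[0] >= self.window_ms:
            q.popleft()
        if len(q) < self.cap:
            q.append(ts_ms)
            return True
        return False
```

With the default cap of 12, a track producing one candidate crop per second would have only its first 12 crops of each minute saved.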

Parallel Processing

  • parallel.dates (list[str], optional): Process multiple dates concurrently

  • parallel.max_workers (int, default: 3): Max parallel date workers

Output Format

Success Response

{
  "status": "ok",
  "summary": {
    "videos_processed": 142,
    "crops_saved": 4414,
    "frames_saved": 728,
    "gpu_util_avg": 0.85,
    "processing_time_sec": 2847,
    "errors": 0
  },
  "artifacts": {
    "index_csv": "G:\\My Drive\\PROJECTS\\APPS\\Human_Detection\\parsed\\ALL_CROPS\\INDEX.csv",
    "output_dir": "G:\\My Drive\\PROJECTS\\APPS\\Human_Detection\\parsed\\ALL_CROPS",
    "log_file": "G:\\My Drive\\PROJECTS\\APPS\\Human_Detection\\parsed\\ALL_CROPS\\run_20251006_143022.log"
  },
  "performance": {
    "nvdec_active": true,
    "yolo_batch": 64,
    "clip_batch": 384,
    "avg_fps": 48.3,
    "vram_peak_gb": 9.2
  },
  "notes": [
    "NVDEC hardware decoding active",
    "Batched YOLO=64, CLIP=384",
    "GPU utilization: 85%"
  ]
}

Error Response

{
  "status": "error",
  "error": "CUDA out of memory",
  "suggestion": "Reduce batch sizes: yolo_batch=48, clip_batch=256",
  "partial_results": {
    "videos_processed": 67,
    "crops_saved": 2103
  }
}
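The OOM fallback suggested in the error response can be automated as a retry ladder. This is a sketch: `run` is a hypothetical callable that invokes the pipeline with a parameter dict and returns a response dict shaped like the ones above.

```python
def run_with_backoff(run, params, ladder=((64, 384), (48, 256), (32, 128))):
    """Retry extraction with progressively smaller batch sizes on CUDA OOM,
    mirroring the 'suggestion' field in the error response."""
    for yolo_batch, clip_batch in ladder:
        attempt = {**params, "yolo_batch": yolo_batch, "clip_batch": clip_batch}
        resp = run(attempt)
        if resp["status"] == "ok":
            return resp
        if "out of memory" not in resp.get("error", "").lower():
            return resp  # non-OOM errors are not retried
    return resp  # last OOM response after exhausting the ladder
```

The ladder values follow the document's own tuning advice (64/384 → 48/256 → 32/128).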

Output Structure

Directory Layout

parsed\ALL_CROPS
├── INDEX.csv                          # Global master index
├── INDEX.20251006_pid1234.csv         # Shard (pre-merge)
├── run_20251006_143022.log            # Execution log
│
│   # Crop files (one per person detection)
├── 20251006__20251006142644_070785B__t15234__f365__trk017__x1014y46w266h659__c85__v1.webp
├── 20251006__20251006143844_070787B__t8420__f202__trk003__x234y567w180h420__c92__v1.webp
│
│   # Full-frame files (optional, one per timestamp)
├── 20251006__20251006142644_070785B__t15234__FRAME__v1.webp
└── 20251006__20251006143844_070787B__t8420__FRAME__v1.webp

Filename Convention

Crop Format:

<date>__<video_stem>__t<ts_ms>__f<frame_idx>__trk<track_id>__x<x1>y<y1>w<w>h<h>__c<covered_0to100>__v<ver>.webp

Example: 20251006__20251006142644_070785B__t15234__f365__trk017__x1014y46w266h659__c85__v1.webp

Decoded:

  • Date: 2025-10-06
  • Video: 20251006142644_070785B.MP4
  • Timestamp: 15234 ms
  • Frame: 365
  • Track: 17
  • BBox: x=1014, y=46, w=266, h=659
  • CLIP confidence: 85% (head covering)
  • Version: v1
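The crop filename can be decoded programmatically. Below is a minimal parser for the format above; the regex and helper are illustrative, not part of the pipeline.

```python
import re

# One named group per field of the crop filename convention.
CROP_RE = re.compile(
    r"^(?P<date>\d{8})__(?P<stem>.+?)__t(?P<ts_ms>\d+)__f(?P<frame>\d+)"
    r"__trk(?P<track>\d+)__x(?P<x>\d+)y(?P<y>\d+)w(?P<w>\d+)h(?P<h>\d+)"
    r"__c(?P<covered>\d+)__v(?P<ver>\w+)\.webp$"
)

def parse_crop_name(name):
    """Return the decoded fields of a crop filename, or None if it
    does not match the convention."""
    m = CROP_RE.match(name)
    if not m:
        return None
    d = m.groupdict()
    for k in ("ts_ms", "frame", "track", "x", "y", "w", "h", "covered"):
        d[k] = int(d[k])
    return d
```

Applied to the example filename, this yields track 17, timestamp 15234 ms, and the bounding box x=1014, y=46, w=266, h=659.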

Full-Frame Format:

<date>__<video_stem>__t<ts_ms>__FRAME__v<ver>.webp

Example: 20251006__20251006142644_070785B__t15234__FRAME__v1.webp

INDEX.csv Schema

dataset,date,video_rel,video_stem,frame_idx,ts_ms,track_id,x1,y1,w,h,person_conf,covered_conf,file_type,crop_file,sha1,bboxes_json,annotated,pipeline_ver,yolo_batch,clip_batch,nvdec,created_utc

Example rows:

Park_R,20251006,20251006\20251006142644_070785B.MP4,20251006142644_070785B,365,15234,17,1014,46,266,659,0.92,0.85,crop,20251006__20251006142644_070785B__t15234__f365__trk017__x1014y46w266h659__c85__v1.webp,a3f2c8b9...,,,v1,64,384,1,2025-10-06T14:30:22Z
Park_R,20251006,20251006\20251006142644_070785B.MP4,20251006142644_070785B,365,15234,,,,,,,frame,20251006__20251006142644_070785B__t15234__FRAME__v1.webp,d4e1a2c7...,"[{""x1"":1014,""y1"":46,""w"":266,""h"":659,""conf"":0.92,""track"":17}]",1,v1,64,384,1,2025-10-06T14:30:22Z

Column Definitions:

  • dataset: Source camera (Park_R, Park_F, Movie_F, Movie_R)

  • date: YYYYMMDD

  • video_rel: Relative path from dataset root

  • video_stem: Filename without .MP4 extension

  • frame_idx: Frame number in video

  • ts_ms: Timestamp in milliseconds

  • track_id: ByteTrack ID (empty for FRAME rows)

  • x1,y1,w,h: Bounding box (empty for FRAME rows)

  • person_conf: YOLOv8 detection confidence

  • covered_conf: CLIP head covering confidence (stored as 0.0-1.0 in the CSV, as in the example rows; filenames encode it on a 0-100 scale. Empty if CLIP is disabled)

  • file_type: "crop" or "frame"

  • crop_file: Relative filename

  • sha1: File hash for integrity

  • bboxes_json: All detections in frame (FRAME rows only)

  • annotated: 1 if boxes drawn on frame, 0 otherwise

  • pipeline_ver: Semantic version tag

  • yolo_batch: YOLO batch size used

  • clip_batch: CLIP batch size used (0 if disabled)

  • nvdec: 1 if NVDEC used, 0 otherwise

  • created_utc: ISO 8601 timestamp
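Rows matching the CLIP filter can be pulled from INDEX.csv with the standard csv module. This is a sketch against the schema above; covered_conf is read on the 0-1 scale used in the example rows.

```python
import csv
import io

def covered_crops(index_csv_text, threshold=0.80):
    """Return crop rows whose CLIP head-covering confidence meets
    the threshold. FRAME rows and rows with an empty covered_conf
    (CLIP disabled) are skipped."""
    rows = csv.DictReader(io.StringIO(index_csv_text))
    return [
        r for r in rows
        if r["file_type"] == "crop"
        and r["covered_conf"]  # empty for FRAME rows / CLIP disabled
        and float(r["covered_conf"]) >= threshold
    ]
```

For real use, replace the in-memory text with `open("parsed/ALL_CROPS/INDEX.csv", newline="")`.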

Implementation Details

Processing Pipeline

[MP4 Videos]
    │
    ▼
[NVDEC Decoder (GPU)]       RGB tensor → CUDA Stream A
    │
    ▼
[YOLOv8s Detection]         Batched (64 frames), FP16, conf=0.35
    │
    ▼
[ByteTrack Tracking]        IoU=0.5, max_age=10
    │
    ├──────────────────────► [Full-Frame Saver]
    │                        (optional, downscaled, annotated)
    ▼
[ROI Align (GPU)]           Extract crops on GPU
    │
    ▼
[CLIP Classification] ◄───── (optional) Batched (384 crops), FP16, threshold=0.80
    │
    ▼
[Deduplication Filter]      SSIM ≥ 0.92, rate cap: 12/min/track
    │
    ▼
[Async I/O Thread Pool]     WebP encode (q=85), shard INDEX writes
    │
    ▼
[Final Merge]               INDEX.csv

GPU Optimization Strategy

Dual CUDA Streams:

  • Stream A: YOLOv8 detection

  • Stream B: CLIP classification

  • Overlap compute + memory transfers

Dynamic Batching:

  • Accumulate frames until batch size reached

  • Process immediately on timeout (100ms)

  • Keep GPU pipeline full (80-90% utilization)
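The accumulate-until-full-or-timeout policy above can be sketched as follows; the class name and structure are illustrative assumptions, not the pipeline's code.

```python
import time

class DynamicBatcher:
    """Emit a batch when `batch_size` items have accumulated, or when
    `timeout_s` has elapsed since the first pending item arrived."""
    def __init__(self, batch_size=64, timeout_s=0.1, clock=time.monotonic):
        self.batch_size = batch_size
        self.timeout_s = timeout_s
        self.clock = clock          # injectable for testing
        self.pending = []
        self.first_ts = None

    def add(self, frame):
        """Queue a frame; return a full batch if one is ready, else None."""
        if not self.pending:
            self.first_ts = self.clock()
        self.pending.append(frame)
        if (len(self.pending) >= self.batch_size
                or self.clock() - self.first_ts >= self.timeout_s):
            batch, self.pending = self.pending, []
            return batch  # ready to run through the detector
        return None
```

The 100 ms timeout keeps latency bounded at the tail of a video, when fewer than `batch_size` frames remain.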

Memory Management:

  • Pinned memory for faster CPU↔GPU transfers

  • Pre-allocated tensor buffers

  • Stream-ordered operations

Decoder Priority:

  • NVDEC (GPU hardware decoder) - 5-10x faster

  • CPU fallback (OpenCV) if NVDEC unavailable

  • Multi-threaded DataLoader (8-12 workers)

Performance Targets (RTX 4080 16GB)

Metric          | Target          | Notes
----------------|-----------------|-------------------------------------------
GPU Utilization | 80-90%          | NVDEC + dual streams + large batches
Throughput      | 3-4 videos/min  | Parking videos (2 FPS sampling)
VRAM Usage      | 6-10 GB         | YOLO=64, CLIP=384
Latency         | <30 s per video | Including decode, detect, track, classify

Configuration Tuning

If GPU util < 70%:

  • Increase batch sizes: yolo_batch=80, clip_batch=448

  • Verify NVDEC active (check nvdec_active in response)

  • Increase parallel workers: max_workers=4

If CUDA OOM:

  • Reduce CLIP batch first: clip_batch=256

  • Then reduce YOLO batch: yolo_batch=48

  • Disable full-frame saves: save_full_frame=false

If disk I/O bottleneck:

  • Disable full-frame: save_full_frame=false

  • Reduce quality: full_frame_maxw=960, WebP q=75

  • Use faster storage (NVMe SSD)

CLI Equivalent

Basic usage

python -m src.cli.run_multi_dates ^
  --root "G:\My Drive\PROJECTS\INVESTIGATION\DASHCAM\Park_R" ^
  --out parsed\ALL_CROPS ^
  --dates 20251006 20251007 ^
  --use-nvdec --conf 0.35 --iou 0.5

(The trailing ^ continues the command across lines in Windows cmd.)

Advanced usage with CLIP filtering

python -m src.cli.run_multi_dates ^
  --root "G:\My Drive\PROJECTS\INVESTIGATION\DASHCAM\Park_R" ^
  --out parsed\ALL_CROPS ^
  --dates 20251006 20251007 20251008 ^
  --use-nvdec ^
  --yolo-batch 64 ^
  --clip-batch 384 ^
  --clip-threshold 0.80 ^
  --conf 0.35 ^
  --iou 0.5 ^
  --save-full-frame ^
  --draw-boxes ^
  --parallel 3

Example Interactions

Example 1: Basic Detection

User: "Extract all humans from Park_R videos on October 6"

Skill invokes:

{
  "mode": "extract_humans",
  "roots": ["G:\\My Drive\\PROJECTS\\INVESTIGATION\\DASHCAM\\Park_R\\20251006"],
  "confidence": 0.35,
  "single_output_dir": "parsed\\ALL_CROPS",
  "nvdec": true
}

Example 2: Advanced with CLIP

User: "Scan Park_R for October 6-8, filter for people with head coverings at 80% confidence, save annotated frames, max GPU usage"

Skill invokes:

{
  "mode": "extract_humans",
  "roots": [
    "G:\\My Drive\\PROJECTS\\INVESTIGATION\\DASHCAM\\Park_R\\20251006",
    "G:\\My Drive\\PROJECTS\\INVESTIGATION\\DASHCAM\\Park_R\\20251007",
    "G:\\My Drive\\PROJECTS\\INVESTIGATION\\DASHCAM\\Park_R\\20251008"
  ],
  "confidence": 0.35,
  "iou": 0.50,
  "yolo_batch": 64,
  "clip_filter": { "enabled": true, "threshold": 0.80, "batch": 384 },
  "nvdec": true,
  "save_full_frame": true,
  "draw_boxes": true,
  "single_output_dir": "parsed\\ALL_CROPS",
  "parallel": { "max_workers": 3 }
}

Example 3: Low-Resource Mode

User: "Process Park_R October 6 with minimal GPU memory"

Skill invokes:

{
  "mode": "extract_humans",
  "roots": ["G:\\My Drive\\PROJECTS\\INVESTIGATION\\DASHCAM\\Park_R\\20251006"],
  "confidence": 0.35,
  "yolo_batch": 32,
  "clip_filter": { "enabled": false },
  "nvdec": false,
  "save_full_frame": false,
  "single_output_dir": "parsed\\ALL_CROPS"
}

Safety & Guardrails

Do Not

  • ❌ Move or delete source MP4 files

  • ❌ Infer gender unless explicitly enabled (sensitive, noisy)

  • ❌ Process videos without user consent

  • ❌ Share outputs containing identifiable persons

Do

  • ✅ Verify GPU availability before processing

  • ✅ Enforce longitude sign corrections for GPS overlays

  • ✅ Maintain audit trail in INDEX.csv

  • ✅ Log versions, batches, NVDEC usage

  • ✅ Handle OOM gracefully with suggestions

Resume Safety

  • Idempotent: skip already-processed crops by filename

  • Shard-based: partial runs can resume

  • Index integrity: SHA1 hashes verify file correctness
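The skip-by-filename and SHA1 integrity checks can be sketched as below; these are illustrative helpers, not the pipeline's code.

```python
import hashlib
from pathlib import Path

def should_write(crop_path: Path) -> bool:
    # Idempotent resume: a crop whose deterministic filename already
    # exists on disk was produced by an earlier run and is skipped.
    return not crop_path.exists()

def verify_sha1(path: Path, expected: str) -> bool:
    # Integrity check against the sha1 column in INDEX.csv.
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest() == expected
```

Because every crop filename fully encodes its source video, timestamp, track, and box, the existence check alone is enough to make reruns idempotent.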

Testing & Verification

Pre-Run Checks

GPU availability

import torch

assert torch.cuda.is_available(), "CUDA required"
assert torch.cuda.device_count() > 0, "No GPU found"

Model files

from pathlib import Path

assert Path("models/yolov8s.pt").exists(), "YOLOv8 model missing"

Output directory writable

import os

output_dir = Path("parsed/ALL_CROPS")
output_dir.mkdir(parents=True, exist_ok=True)
assert os.access(output_dir, os.W_OK), "Output dir not writable"

Post-Run Verification

Check outputs exist

from pathlib import Path

assert Path("parsed/ALL_CROPS/INDEX.csv").exists()
assert len(list(Path("parsed/ALL_CROPS").glob("*.webp"))) > 0

Validate INDEX.csv

import pandas as pd

df = pd.read_csv("parsed/ALL_CROPS/INDEX.csv")
assert df['crop_file'].notna().all()
assert df['person_conf'].between(0, 1).all()

Sample roundtrip

sample = df.sample(1).iloc[0]
assert Path(f"parsed/ALL_CROPS/{sample['crop_file']}").exists()

GPU utilization check (gpu_util_avg is taken from the run summary in the response)

assert gpu_util_avg > 0.70, f"Low GPU util: {gpu_util_avg}"

Dependencies

Required

  • Python 3.10+

  • PyTorch 2.0+ with CUDA 11.8+

  • ultralytics (YOLOv8)

  • transformers (CLIP)

  • opencv-python

  • pillow

  • pandas

  • numpy

Optional (Performance)

  • NVIDIA Video Codec SDK (NVDEC)

  • TensorRT (future optimization)

  • nvJPEG (GPU JPEG encoding)

Installation

cd "G:\My Drive\PROJECTS\APPS\Human_Detection"
pip install -r requirements.txt

Troubleshooting

Common Issues

  1. CUDA Out of Memory

Error: CUDA out of memory. Tried to allocate 2.50 GiB
Solution: reduce batch sizes
yolo_batch: 64 → 48 → 32
clip_batch: 384 → 256 → 128

  2. NVDEC Not Available

Warning: NVDEC unavailable, falling back to CPU decode
Solution: check the NVIDIA driver version (≥525.60); the GPU must support the Video Codec SDK.
Verify with: nvidia-smi --query-gpu=name --format=csv

  3. Low GPU Utilization

Warning: GPU util only 45%
Solutions:

  1. Increase batch sizes (if VRAM allows)

  2. Enable NVDEC: nvdec=true

  3. Increase parallel workers: max_workers=4

  4. Check for a CPU bottleneck (use more DataLoader workers)

  4. Slow Processing

Performance: 0.8 videos/min (expected 3-4)
Diagnostics:

  1. Check disk I/O (use an NVMe SSD)
  2. Verify NVDEC is active (5-10x faster than CPU decode)
  3. Profile with: python -m torch.utils.bottleneck script.py

Future Enhancements

Planned

  • TensorRT optimization (2-4x CLIP speedup)

  • Multi-GPU sharding (process different dates on different GPUs)

  • GPU JPEG/WebP encoding (nvJPEG)

  • Real-time streaming mode

Under Consideration

  • Face recognition integration

  • Gender classification (opt-in only, with warnings)

  • Action recognition (walking, standing, etc.)

  • Multi-camera fusion (correlate detections across cameras)

Version History

v1.0 (Current)

  • Initial release

  • YOLOv8s + ByteTrack + CLIP

  • NVDEC support

  • Unified output directory

  • Global INDEX.csv

References

  • YOLOv8 Documentation

  • ByteTrack Paper

  • CLIP Paper

  • NVIDIA Video Codec SDK

Contact & Support

For issues or questions:

  • Check parsed/ALL_CROPS/run_*.log for error details

  • Review GPU diagnostics: nvidia-smi

  • Validate input paths exist and are readable

  • Verify CUDA/PyTorch installation: python -c "import torch; print(torch.cuda.is_available())"

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
