detect-file-type-local

Local, offline AI-powered file type detection — no network, no API keys

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install the skill "detect-file-type-local" with:

npx skills add pgeraghty/detect-file-type-local

Detect File Type - Local

Local-only, offline file type detection. Uses an embedded ML model (Google Magika) to identify 214 file types by content — no network calls, no API keys, no data leaves the machine. All inference runs on-device via ONNX Runtime.

When to Use

  • Identify unknown files by their content (not just extension) — locally, without sending data anywhere
  • Verify that a file's extension matches its actual content
  • Check MIME types before processing uploads or downloads
  • Triage files in a directory by type
  • Detect extension mismatches and masquerading (e.g., .pdf.exe, .xlsx.lnk)
  • Flag suspicious polyglot-style payloads (for example PDF/ZIP or PDF/HTA-style chains)
  • When privacy matters — file bytes never leave the local machine
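Detection ignores extensions, so checking for a mismatch is up to the caller. Below is a minimal sketch over one parsed JSON result, assuming the output schema documented further down this page. `EXPECTED_LABELS` and `extension_mismatch` are hypothetical names, and the mapping covers only a few labels for illustration. Only the final suffix is checked, so a name like `invoice.pdf.exe` is judged by `.exe`.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical mapping from extension to the labels accepted for it.
# Label strings follow the examples on this page (pdf, png, python); extend as needed.
EXPECTED_LABELS = {
    ".pdf": {"pdf"},
    ".png": {"png"},
    ".py": {"python"},
    ".zip": {"zip"},
}


def extension_mismatch(result: dict) -> Optional[bool]:
    """Return True if the detected label disagrees with the file's final extension,
    False if it agrees, or None if the extension is not in EXPECTED_LABELS."""
    suffix = PurePath(result["path"]).suffix.lower()
    expected = EXPECTED_LABELS.get(suffix)
    if expected is None:
        return None  # no opinion about this extension
    return result["label"] not in expected
```

For example, a file named `report.pdf` whose content is detected as `zip` is reported as a mismatch.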

Installation

pip install detect-file-type-local

From source:

pip install -e /path/to/detect-file-type-skill

Usage

Single file

detect_file_type path/to/file

Multiple files

detect_file_type file1.pdf file2.png file3.zip

Recursive directory scan

detect_file_type --recursive ./uploads/
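A scan over many files yields an array of result objects. To triage a directory, those objects can be bucketed by their `group` field. The following is a minimal sketch, assuming `results` is the already-parsed array. `triage_by_group` is a hypothetical helper, not part of the package.

```python
from collections import defaultdict
from typing import Dict, List


def triage_by_group(results: List[dict]) -> Dict[str, List[str]]:
    """Bucket result paths by their "group" field (e.g. document, image, code)."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for r in results:
        # Fall back to "unknown" if an object lacks a group.
        buckets[r.get("group", "unknown")].append(r["path"])
    return dict(buckets)
```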

From stdin

cat mystery_file | detect_file_type -

# Optional best-effort fast path (head only)
cat mystery_file | detect_file_type --stdin-mode head --stdin-max-bytes 1048576 -

Output formats

detect_file_type --json file.pdf    # JSON (default)
detect_file_type --human file.pdf   # Human-readable
detect_file_type --mime file.pdf    # Bare MIME type

As a Python module

python -m detect_file_type path/to/file
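Other Python code can drive the CLI as a subprocess and parse its JSON output. The following is a minimal sketch, assuming the `detect_file_type` console script is on `PATH` and that stdout carries only the JSON document. `run_detect` is a hypothetical helper name.

```python
import json
import subprocess
from typing import Any, Sequence, Tuple


def run_detect(
    paths: Sequence[str],
    cmd: Sequence[str] = ("detect_file_type", "--json"),
) -> Tuple[int, Any]:
    """Run the CLI on `paths`; return (exit code, parsed stdout JSON or None).

    Assumes stdout holds only the JSON document. Pass a different `cmd`, e.g.
    ("python", "-m", "detect_file_type", "--json"), if the script is not on PATH.
    """
    proc = subprocess.run([*cmd, *paths], capture_output=True, text=True)
    # Errors are written to stderr, so they do not reach json.loads here.
    out = proc.stdout.strip()
    return proc.returncode, (json.loads(out) if out else None)
```

For example, `code, data = run_detect(["file1.pdf", "file2.png"])` returns the exit code together with the parsed result array.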

Output Schema (JSON)

A single input produces one JSON object; multiple inputs produce an array of objects.

{
  "path": "document.pdf",
  "label": "pdf",
  "mime_type": "application/pdf",
  "score": 0.99,
  "group": "document",
  "description": "PDF document",
  "is_text": false
}

Fields

Field        Type    Description
path         string  Input path (or - for stdin)
label        string  Detected file type label (e.g., pdf, png, python)
mime_type    string  MIME type (e.g., application/pdf)
score        float   Confidence score (0.0–1.0)
group        string  Category (e.g., document, image, code)
description  string  Human-readable description
is_text      bool    Whether the file is text-based
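The top-level shape depends on how many inputs were given, so callers usually normalize it to a list. The following is a minimal sketch, assuming each object carries the fields listed above. Extra keys are dropped and a missing key raises `KeyError`. `Detection` and `parse_detections` are hypothetical names, not part of the package.

```python
import json
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class Detection:
    """One result object, mirroring the documented JSON fields."""
    path: str
    label: str
    mime_type: str
    score: float
    group: str
    description: str
    is_text: bool


def parse_detections(stdout: str) -> List[Detection]:
    """Parse JSON output: one object for a single input, an array for several."""
    data = json.loads(stdout)
    items = data if isinstance(data, list) else [data]
    names = [f.name for f in fields(Detection)]
    # Keep only the documented fields; a missing one raises KeyError.
    return [Detection(**{n: item[n] for n in names}) for item in items]
```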

Exit Codes

Code  Meaning
0     All files detected successfully
1     Fatal error (no results produced)
2     Partial failure (some files failed, some succeeded)
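Scripts that call the CLI can branch on these codes. Code 2 still means some detections succeeded. The following is a minimal sketch with hypothetical names (`DetectStatus`, `has_results`). It assumes that on a partial failure the successful results are still written to stdout.

```python
import enum


class DetectStatus(enum.IntEnum):
    """Exit codes documented for detect_file_type."""
    OK = 0       # all files detected successfully
    FATAL = 1    # no results produced
    PARTIAL = 2  # some files failed, some succeeded


def has_results(returncode: int) -> bool:
    """True when at least some detections are expected on stdout (codes 0 and 2)."""
    return returncode in (DetectStatus.OK, DetectStatus.PARTIAL)
```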

Error Handling

Errors are printed to stderr. Common cases:

  • File not found: error: path/to/file: No such file or directory
  • Permission denied: error: path/to/file: Permission denied
  • Not a regular file: error: path/to/dir: Not a regular file

When processing multiple files, detection continues for remaining files even if some fail.
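The per-file errors follow a regular "error: <path>: <message>" shape, so a caller can recover which inputs failed. The following is a minimal sketch. It assumes the message itself contains no colon, and it splits on the last ": " so that paths containing ": " still parse. `parse_errors` is a hypothetical helper name.

```python
import re
from typing import Dict

# Matches the documented form "error: <path>: <message>".
# Greedy path match + colon-free message means the last ": " is the separator.
_ERROR_LINE = re.compile(r"^error: (?P<path>.+): (?P<message>[^:]+)$")


def parse_errors(stderr: str) -> Dict[str, str]:
    """Map each failed path to its error message, skipping unrecognized lines."""
    failures: Dict[str, str] = {}
    for line in stderr.splitlines():
        m = _ERROR_LINE.match(line)
        if m:
            failures[m["path"]] = m["message"]
    return failures
```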

Limitations

  • The default stdin mode (spool) writes stdin to a temporary file and runs Magika's path-based detection on it.
  • --stdin-mode head inspects only the leading bytes (capped by --stdin-max-bytes), so it is best effort and may miss signatures near the end of the input.
  • Very small files (< ~16 bytes) may produce low-confidence results.
  • Empty files are detected as empty.
  • Detection is content-based; file extensions are ignored.
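Given these limitations, a caller may want to route weak results to manual review. The following is a minimal sketch, assuming the empty-file label is the string `empty`. `needs_review`, `min_score`, and `min_size` are hypothetical, and the default thresholds are illustrative rather than values taken from the tool.

```python
def needs_review(result: dict, size_bytes: int,
                 min_score: float = 0.5, min_size: int = 16) -> bool:
    """Flag results that are likely unreliable: tiny inputs or low confidence.

    min_score and min_size are illustrative thresholds, not tool defaults.
    """
    if result.get("label") == "empty":
        return False  # empty files are reported as empty; nothing to review
    return size_bytes < min_size or result.get("score", 0.0) < min_score
```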

Security Context

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
