vlm-image-helper

Visual inspection helper for VLM and OCR workflows. Use when an agent needs to help a vision model see an image more clearly before re-analysis: rotate misaligned or sideways text, crop to a relevant region, zoom in on small details, enhance readability, or convert an image for re-input. Trigger especially when the model cannot confidently read text, cannot tell apart similar characters such as O/0 or I/l/1, says the image is unclear, needs to inspect only one area of the image, or would benefit from a second pass on a clearer view. Do not use as a general-purpose image editor.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "vlm-image-helper" with this command: npx skills add testlbin/vlm-image-helper

VLM Image Helper

Treat this skill as a visual aid for the model, not as a general image editor.

Use scripts/image_helper.py to create a clearer intermediate image, then re-run analysis on that result.

Core Workflow

  1. Start from the original image path, a raw base64 string, or a data URI.
  2. Apply the smallest transformation that is likely to remove the ambiguity.
  3. Prefer semantic crop presets over manual coordinates unless the exact box is already known.
  4. Return the processed image as a file or base64, then re-read that result.
  5. If the image is still unclear, iterate once with a tighter crop or stronger zoom instead of stacking many edits at once.
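
The workflow above can be sketched as a small Python helper that builds the command line for a first pass and for a single follow-up pass. This is only an illustration under stated assumptions: the names `build_command` and `plan_passes` and the output file names are hypothetical, the choice of x2 first and x3 on retry is an assumption, and only flags shown on this page (`--crop-preset`, `--scale-preset`, `-o`) are used.

```python
from typing import List, Optional

SCRIPT = "scripts/image_helper.py"  # script path as named in this skill


def build_command(image: str, output: str,
                  crop_preset: Optional[str] = None,
                  scale_preset: Optional[str] = None) -> List[str]:
    """Build an argv list for one pass; nothing is executed here."""
    cmd = ["python", SCRIPT, image]
    if crop_preset:
        cmd += ["--crop-preset", crop_preset]
    if scale_preset:
        cmd += ["--scale-preset", scale_preset]
    cmd += ["-o", output]
    return cmd


def plan_passes(image: str, crop_preset: str) -> List[List[str]]:
    """Plan a first pass and one stronger retry (hypothetical helper).

    Both passes start from the original image, so edits are not stacked.
    """
    first = build_command(image, "pass1.png", crop_preset, "x2")
    retry = build_command(image, "pass2.png", crop_preset, "x3")
    return [first, retry]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the skill's root directory. Run the retry only if the first result is still unclear.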

Quick Commands

# Rotate sideways text
python scripts/image_helper.py image.png --rotate 90 -o rotated.png

# Crop a likely area and zoom it
python scripts/image_helper.py image.png --crop-preset bottom-right --scale-preset x3 -o detail.png

# Improve low-contrast text
python scripts/image_helper.py image.png --auto-enhance -o enhanced.png

# Convert an existing file path directly to base64
python scripts/image_helper.py image.png --base64

Choose the Next Action

  • Text is sideways or upside down: use --rotate.
  • Only one region matters: use --crop-preset first, then add --scale-preset.
  • Small text or icons are hard to read: use --scale-preset x2 or x3.
  • Contrast is weak or edges are fuzzy: use --auto-enhance, or manually tune --contrast and --sharpness.
  • Another tool needs inline image data instead of a file path: add --base64.
  • The source image arrives as raw base64 or a data URI: use --input-mode auto, or force the mode with --input-mode base64 or --input-mode data-uri.
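
The choices above can also be written as a lookup table, which may be convenient when an agent picks the next step programmatically. This is a minimal sketch: the symptom labels and the `flags_for` helper are hypothetical, and the concrete values (`90`, `bottom-right`, `x2`, `x3`) are examples taken from the Quick Commands, not recommendations from the script itself.

```python
from typing import Dict, List

# Keys are hypothetical symptom labels; values use only flags named in this skill.
NEXT_ACTION: Dict[str, List[str]] = {
    "sideways_text": ["--rotate", "90"],
    "single_region": ["--crop-preset", "bottom-right", "--scale-preset", "x2"],
    "small_details": ["--scale-preset", "x3"],
    "low_contrast": ["--auto-enhance"],
    "needs_inline_data": ["--base64"],
}


def flags_for(symptom: str) -> List[str]:
    """Return a copy of the flags for a symptom, or an empty list if unknown."""
    return list(NEXT_ACTION.get(symptom, []))
```

An unknown symptom yields no flags, which corresponds to the passthrough case where the image is left unedited.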

Input and Output Rules

  • Accept a file path, raw base64 string, or data URI as input.
  • Return a file with -o or return inline base64 with --base64.
  • Allow passthrough output with no edits when the only goal is format conversion or path-to-base64 conversion.
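
To make the three input kinds concrete, here is one way a caller might tell them apart before choosing an `--input-mode` value. The actual detection logic in scripts/image_helper.py is not shown on this page, so this is an assumption rather than a description of the script. The function name `classify_input` and the label `"file"` are hypothetical.

```python
import base64
import binascii
import os


def classify_input(value: str) -> str:
    """Return 'data-uri', 'file', or 'base64' for an input string (illustrative only)."""
    if value.startswith("data:"):
        return "data-uri"
    if os.path.exists(value):
        return "file"
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(value, validate=True)
        return "base64"
    except (binascii.Error, ValueError):
        # Not valid base64; treat it as a (possibly missing) file path.
        return "file"
```

Checking the data URI prefix first is a deliberate ordering choice, since a data URI is neither an existing path nor valid raw base64.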

References

  • Full CLI reference: references/cli-reference.md
  • Crop and scale preset table: references/presets.md

Prerequisite

Install Pillow if it is missing:

pip install Pillow
# or
uv pip install Pillow

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
