image-reader

Image recognition and understanding tool. Uses a multimodal model (e.g. doubao-seed-2.0-pro, kimi-k2.5) to analyze image content and supports OCR text extraction and image description. Use this skill when a user sends a screenshot or image and needs the text extracted or the image content understood.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "image-reader" with this command: npx skills add image-reader

Image Reader Skill

Image recognition and understanding tool that leverages Doubao multimodal models to analyze image content.


Features

  • Text Extraction (OCR): Extract text from images, suitable for documents, screenshots, posters, menus, etc.
  • Image Description: Generate detailed descriptions of images, suitable for photos, illustrations, memes, UI screens, etc.
  • General Analysis: Automatically choose the best analysis strategy based on the image type.

API Configuration

ItemValue
API Endpointhttps://ark.cn-beijing.volces.com/api/coding/v3
Modeldoubao-seed-2.0-pro
AuthenticationAPI Key (configured in config.yaml)

Usage

Command Line

# General analysis
python image_reader.py /path/to/image.png

# Extract text (OCR)
python image_reader.py /path/to/image.png -p "Extract all text from the image"

# Describe the image
python image_reader.py /path/to/image.png -p "Describe this image in detail"

OpenClaw Skill Invocation

Once installed, you can invoke it using natural language:

Analyze this image
Extract the text from the image
Describe this screenshot

Output

  • Text-heavy images: Returns all extracted text, preserving original formatting.
  • Non-text images: Returns a detailed scene description, including objects, people, colors, style, etc.
  • Mixed content: Provides both text extraction and a visual description.

Technical Details

  • Uses an OpenAI-compatible API to call Doubao multimodal models
  • Images are sent as base64-encoded data
  • The system prompt adapts to the image type to select the most appropriate analysis strategy

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

Trunkate AI

Semantically optimizes context history and large text blocks via the Trunkate AI API. Includes proactive context pruning hooks for automated token management.

Registry SourceRecently Updated
General

Long-term Task Progress Manager

Manages multi-session, multi-stage projects by maintaining and syncing MISSION.md, PROGRESS.md, and NEXT_STEPS.md for seamless long-term progress tracking.

Registry SourceRecently Updated
General

Event Planner Pro

活动策划助手。活动方案(婚礼/生日/年会)、预算编制、准备清单、邀请函文案、时间轴、供应商清单。Event planner for weddings, birthdays, corporate events with budgets, checklists, invitations, timelines. 活动策...

Registry SourceRecently Updated
General

Trigger

Trigger - command-line tool for everyday use

Registry SourceRecently Updated