vision-bot

Describe images, detect objects, extract text, and analyze webpages. Pass any image URL directly in your task. Responds in your language.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Installation

Install the skill with:

npx skills add unixlamadev-spec/vision-bot

Vision Bot

Analyze images for detailed descriptions, object detection, and OCR text extraction. Pass any image URL directly in your task string — no separate field needed. Auto-detects the right mode from your task — OCR for text extraction, counting for quantity questions, or full description by default. Responds in the language of your task.
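As a sketch of the mode auto-detection described above, here are task strings that would plausibly trigger each mode. The exact trigger heuristics are not documented, so these phrasings are illustrative assumptions:

```shell
# Illustrative task strings for each auto-detected mode (assumed
# phrasings; the skill's actual detection heuristics are not documented).
TASK_OCR="Extract the text from this sign: https://example.com/sign.jpg"
TASK_COUNT="How many people are in this photo? https://example.com/crowd.jpg"
TASK_DESC="Describe this image: https://example.com/photo.jpg"

# Each task carries its image URL inline, so no separate field is needed.
printf '%s\n' "$TASK_OCR" "$TASK_COUNT" "$TASK_DESC"
```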

When to Use

  • Describing image contents for accessibility
  • Extracting text from screenshots, signs, or photos (OCR)
  • Counting objects in images
  • Identifying objects in images
  • Analyzing charts, diagrams, or visual data
  • Analyzing images in any language (Chinese, Spanish, French, etc.)
  • Describing webpage screenshots for audits
  • Analyzing any image by including the URL directly in your task

Usage Flow

  1. Include an image URL directly in your task string
  2. Or provide image_url field separately
  3. Task language sets response language automatically
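The two invocation styles above can be sketched as payload builders. The `image_url` field name follows step 2 of the flow; the full request schema beyond `task`, `rail`, and `spend_token` is an assumption here:

```shell
# Build request payloads for both invocation styles. The image_url
# field name comes from the Usage Flow; treat it as an assumption,
# not a confirmed schema field.
IMAGE="https://example.com/photo.jpg"
TOKEN="${AIPROX_SPEND_TOKEN:-sk_test_placeholder}"  # placeholder if unset

# Style 1: URL embedded directly in the task string
PAYLOAD_INLINE=$(printf '{"task": "Describe this image: %s", "rail": "bitcoin-lightning", "spend_token": "%s"}' \
  "$IMAGE" "$TOKEN")

# Style 2: task plus a separate image_url field
PAYLOAD_FIELD=$(printf '{"task": "Describe this image", "image_url": "%s", "rail": "bitcoin-lightning", "spend_token": "%s"}' \
  "$IMAGE" "$TOKEN")

echo "$PAYLOAD_INLINE"
echo "$PAYLOAD_FIELD"
```

Either payload can then be sent with `curl -d "$PAYLOAD_INLINE" ...` as in the Make Request section below.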

Security Manifest

Permission   Scope                Reason
Network      aiprox.dev           API calls to orchestration endpoint
Env Read     AIPROX_SPEND_TOKEN   Authentication for paid API
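Since the only secret the skill reads is AIPROX_SPEND_TOKEN, a session can check for it before issuing any request. A minimal sketch (the warning text is illustrative):

```shell
# Fail-soft check for the spend token named in the manifest above.
if [ -n "${AIPROX_SPEND_TOKEN:-}" ]; then
  TOKEN_STATUS="set"
else
  TOKEN_STATUS="missing"
  echo "warning: AIPROX_SPEND_TOKEN is not set; requests will fail authentication" >&2
fi
echo "spend token: $TOKEN_STATUS"
```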

Make Request

Chinese task (translates to "Describe the content of this image"); the response comes back in Chinese:

curl -X POST https://aiprox.dev/api/orchestrate \
  -H "Content-Type: application/json" \
  -d '{
    "task": "描述这张图片的内容: https://example.com/photo.jpg",
    "rail": "bitcoin-lightning",
    "spend_token": "'"$AIPROX_SPEND_TOKEN"'"
  }'

English task; the response comes back in English:

curl -X POST https://aiprox.dev/api/orchestrate \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Describe this image: https://example.com/photo.jpg",
    "rail": "bitcoin-lightning",
    "spend_token": "'"$AIPROX_SPEND_TOKEN"'"
  }'

Note the quoting: the single-quoted JSON body is briefly closed around the token so the shell expands $AIPROX_SPEND_TOKEN before sending; left inside single quotes, the literal string would be sent instead.

Response

{
  "description": "A modern office workspace with a standing desk and dual monitors.",
  "objects": ["desk", "monitors", "keyboard", "mouse", "plant", "window", "headphones"],
  "text_found": "Visual Studio Code - main.js"
}
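The response fields can be pulled out without extra tooling. A jq-free sketch using sed on the sample response above (field names taken from the Response section; this relies on the values containing no embedded double quotes):

```shell
# Sample response, copied from the Response section above.
RESPONSE='{"description": "A modern office workspace with a standing desk and dual monitors.", "objects": ["desk", "monitors", "keyboard"], "text_found": "Visual Studio Code - main.js"}'

# Extract scalar fields with sed; each capture stops at the closing quote.
DESC=$(printf '%s' "$RESPONSE" | sed -n 's/.*"description": "\([^"]*\)".*/\1/p')
TEXT=$(printf '%s' "$RESPONSE" | sed -n 's/.*"text_found": "\([^"]*\)".*/\1/p')

echo "$DESC"
echo "$TEXT"
```

For anything beyond a quick check, jq is the safer parser; sed here just avoids the dependency.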

Trust Statement

Vision Bot analyzes images via URL or base64 input. Images are processed transiently using Claude's vision capabilities via LightningProx. No images are stored. Your spend token is used for payment only.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
