silicon-paddle-ocr

OCR skill using PaddleOCR model via SiliconFlow API. This skill should be used when the user asks to "recognize text from an image", "extract text from a photo", "OCR this image", "read text from screenshot", or mentions "PaddleOCR", "image text recognition", "text extraction from images".

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "silicon-paddle-ocr" with this command: npx skills add aotenjou/silicon-paddleocr/aotenjou-silicon-paddleocr-silicon-paddle-ocr

OCR - Image Text Recognition

Use PaddleOCR to extract text content from images. Supports single image or batch processing.

Overview

This skill provides optical character recognition (OCR) capabilities using the PaddlePaddle/PaddleOCR-VL-1.5 model via the SiliconFlow API. Extract text from JPG, PNG, WebP, BMP, and GIF images.

When to Use

Invoke this skill when:

  • User wants to extract text from an image
  • User asks to OCR a screenshot or photo
  • User needs to read text from an image file
  • User mentions text recognition from images

How to Use

Prerequisites

Ensure the SILICONFLOW_API_KEY environment variable is set:

export SILICONFLOW_API_KEY="your_api_key"

Basic Usage

Execute the OCR script:

python3 scripts/ocr_skill.py [options] image_path

Arguments

ArgumentDescription
imagesImage file path(s) or glob pattern (required)
-k, --api-keyAPI key (default: from SILICONFLOW_API_KEY env)
-m, --modelOCR model name (default: PaddlePaddle/PaddleOCR-VL-1.5)
-p, --promptRecognition prompt for custom behavior
-j, --jsonOutput results in JSON format
-o, --outputSave results to specified file
--max-tokensMaximum tokens in response (default: 2000)

Examples

Single image:

python3 scripts/ocr_skill.py /path/to/image.jpg

Multiple images with glob:

python3 scripts/ocr_skill.py /path/to/images/*.png

JSON output format:

python3 scripts/ocr_skill.py --json /path/to/image.jpg

Custom prompt for table extraction:

python3 scripts/ocr_skill.py -p "Please identify and format table content as Markdown" /path/to/table.jpg

Save to file:

python3 scripts/ocr_skill.py --json --output results.json /path/to/images/*.jpg

Output Format

Text output (default):

--- image.jpg ---
识别到的文字内容
识别到 X 处文字区域

JSON output:

{
  "image.jpg": {
    "image_path": "/path/to/image.jpg",
    "image_size": [width, height],
    "texts": [
      {
        "text": "识别的文字",
        "box": [[x1, y1], [x2, y2], [x3, y3], [x4, y4]]
      }
    ],
    "full_text": "所有文本的组合"
  },
  "image2.png": { ... }
}

Coordinates Explanation:

  • LOC values are normalized coordinates converted to pixel coordinates
  • Conversion: pixel = LOC × (image_size / LOC_max_value)
  • LOC max_value is approximately 972 (may vary by model/image)
  • The box field provides the four corner coordinates of each text region in pixel format

Supported Image Formats

  • JPG/JPEG
  • PNG
  • WebP
  • BMP
  • GIF

Error Handling

If processing fails:

  • Check that the image file exists
  • Verify the SILICONFLOW_API_KEY is valid
  • Ensure the API endpoint is reachable

Images that fail to process will show an error message, and other images will continue processing.

Additional Resources

Reference Files

  • references/api-configuration.md - API configuration details

Example Files

  • examples/sample-usage.sh - Example usage script

Scripts

  • scripts/ocr_skill.py - The main OCR implementation

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

test_skill

import json import tkinter as tk from tkinter import messagebox, simpledialog

Archived SourceRecently Updated
General

错敏信息检测

1. 安装https://clawhub.ai/xxxx/sensitive-check-skill 这个技能 2. 调用错敏信息检测技能,传入参数 userKey=xxxx,content=xxxx

Archived SourceRecently Updated
General

neo

Browse websites, read web pages, interact with web apps, call website APIs, and automate web tasks. Use Neo when: user asks to check a website, read a web page, post on social media (Twitter/X), interact with any web app, look up information on a specific site, scrape data from websites, automate browser tasks, or when you need to call any website's API. Keywords: website, web page, browse, URL, http, API, twitter, tweet, post, scrape, web app, open site, check site, read page, social media, online service.

Archived SourceRecently Updated
General

image-gen

Generate AI images from text prompts. Triggers on: "生成图片", "画一张", "AI图", "generate image", "配图", "create picture", "draw", "visualize", "generate an image".

Archived SourceRecently Updated