tesseract-image-ocr

Extract text from images using Tesseract.js (OCR). Supports recognition in multiple languages (including Chinese and English), region-restricted recognition, character whitelist filtering, and text orientation detection. Runs in a Node.js environment.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "tesseract-image-ocr" with this command: npx skills add openlark/tesseract-image-ocr

Tesseract OCR Image Text Extraction

Extract text content from images based on Tesseract.js (the WebAssembly port of the Tesseract OCR engine).

Use Cases

Use when users need "image to text," "OCR recognition," "extract text from images," "screenshot character recognition," "scan to text," or "image text orientation detection."

Core Capabilities

  • Recognize text from local images or image URLs
  • Support for 100+ languages, with the ability to specify multiple languages simultaneously (e.g., ['eng', 'chi_sim'])
  • Support for specifying recognition regions (--rectangle) and character whitelists (--whitelist)
  • Support for text orientation and script detection (--detect)
  • Support for switching page segmentation modes (--psm) and OCR engine modes (--oem)
  • Output formats: text (default), hocr, blocks (JSON), tsv

Limitations

  • Does not support PDF files
  • Uses the stock Tesseract recognition models as-is; it does not retrain or fine-tune them to improve accuracy
  • Requires a Node.js environment (this Skill uses Node.js scripts)

Workflow

1. Confirm Environment

node -v && npm ls tesseract.js 2>/dev/null || echo "tesseract.js not installed"

If not installed:

cd /root/.openclaw/workspace/skills/tesseract-ocr && npm init -y > /dev/null 2>&1 && npm install tesseract.js

2. Execute Recognition

Basic command:

node scripts/ocr.js <image-path-or-url> [--options]

Parameter Descriptions:

| Parameter | Type | Default | Description |
|---|---|---|---|
| <image> | string | (required) | Local path or HTTPS URL |
| --lang | string | eng | Language code(s); join multiple codes with +, e.g., eng+chi_sim |
| --psm | number | | Page segmentation mode (see the PSM table below) |
| --oem | number | | OCR engine mode (see the OEM table below) |
| --whitelist | string | | Character whitelist, e.g., 0123456789 to recognize only digits |
| --rectangle | string | | Recognition region, in the format top,left,width,height |
| --output | string | text | Output format: text, hocr, blocks, or tsv |
| --detect | flag | | Detect text orientation and script only (no OCR is performed) |
| --dpi | number | | Manually specify the image DPI |
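To make the flag semantics concrete, here is a hypothetical parser for the options in the table above. It is a sketch of one way to validate them, not the actual code of scripts/ocr.js, which may behave differently (for example, in how it reports errors).

```javascript
const OUTPUT_FORMATS = new Set(['text', 'hocr', 'blocks', 'tsv']);

// Hypothetical: turns ['receipt.png', '--rectangle', '50,100,400,200', ...]
// into a typed options object, throwing on malformed input.
function parseOcrArgs(argv) {
  const opts = { image: null, lang: 'eng', output: 'text', detect: false };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (!arg.startsWith('--')) {
      if (opts.image !== null) throw new Error(`Unexpected argument: ${arg}`);
      opts.image = arg; // local path or HTTPS URL
      continue;
    }
    const name = arg.slice(2);
    if (name === 'detect') { opts.detect = true; continue; } // flag, no value
    const value = argv[++i];
    if (value === undefined) throw new Error(`Missing value for --${name}`);
    switch (name) {
      case 'lang': opts.lang = value; break; // e.g. 'chi_sim+eng'
      case 'whitelist': opts.whitelist = value; break;
      case 'psm': case 'oem': case 'dpi': {
        const n = Number(value);
        if (!Number.isInteger(n) || n < 0) throw new Error(`--${name} expects a non-negative integer`);
        opts[name] = n;
        break;
      }
      case 'rectangle': {
        // Format from the table: top,left,width,height
        const parts = value.split(',').map(Number);
        if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0)) {
          throw new Error('--rectangle expects top,left,width,height');
        }
        const [top, left, width, height] = parts;
        opts.rectangle = { top, left, width, height };
        break;
      }
      case 'output':
        if (!OUTPUT_FORMATS.has(value)) throw new Error(`Unknown --output: ${value}`);
        opts.output = value;
        break;
      default:
        throw new Error(`Unknown option: --${name}`);
    }
  }
  if (opts.image === null) throw new Error('Missing <image> argument');
  return opts;
}
```

For example, parseOcrArgs(['receipt.png', '--rectangle', '50,100,400,200']) yields a rectangle of { top: 50, left: 100, width: 400, height: 200 } with the defaults lang 'eng' and output 'text'.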

Common Examples:

# Basic mixed Chinese-English recognition
node scripts/ocr.js photo.jpg --lang chi_sim+eng

# Recognize digits only (license plates, CAPTCHAs, etc.)
node scripts/ocr.js captcha.png --whitelist 0123456789

# Single column of variable-size text (for a block of vertical text, PSM 5 is the dedicated mode)
node scripts/ocr.js scroll.jpg --lang chi_sim --psm 4

# Specify a region for recognition
node scripts/ocr.js receipt.png --rectangle 50,100,400,200

# Detect image text orientation
node scripts/ocr.js rotated.jpg --detect

# Output structured data
node scripts/ocr.js doc.png --output blocks

# Manually specify DPI (avoids the "Invalid resolution 0 dpi" warning)
node scripts/ocr.js scan.png --dpi 300
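The examples above map naturally onto Tesseract.js calls. The sketch below assumes the tesseract.js v5 API (createWorker(langs, oem), worker.setParameters, and worker.recognize(image, options, output)). It also assumes that --dpi is forwarded as Tesseract's user_defined_dpi parameter and that --output selects one key of the output object. buildTesseractConfig and runOcr are hypothetical names, and --detect is not covered here.

```javascript
// Hypothetical mapping from CLI-style options to tesseract.js v5 arguments.
function buildTesseractConfig(opts) {
  const parameters = {}; // passed to worker.setParameters
  if (opts.psm !== undefined) parameters.tessedit_pageseg_mode = String(opts.psm);
  if (opts.whitelist) parameters.tessedit_char_whitelist = opts.whitelist;
  if (opts.dpi !== undefined) parameters.user_defined_dpi = String(opts.dpi);

  const recognizeOptions = {}; // second argument of worker.recognize
  if (opts.rectangle) recognizeOptions.rectangle = opts.rectangle;

  const format = opts.output || 'text';
  return {
    lang: opts.lang || 'eng',
    oem: opts.oem === undefined ? 1 : opts.oem, // 1 = LSTM, the default in the OEM table
    parameters,
    recognizeOptions,
    output: { [format]: true }, // third argument of worker.recognize
  };
}

// Runs one recognition. tesseract.js is required lazily, so the mapping
// above can be used (and tested) without the library installed.
async function runOcr(image, opts) {
  const { createWorker } = require('tesseract.js');
  const cfg = buildTesseractConfig(opts);
  const worker = await createWorker(cfg.lang, cfg.oem);
  try {
    if (Object.keys(cfg.parameters).length > 0) await worker.setParameters(cfg.parameters);
    const { data } = await worker.recognize(image, cfg.recognizeOptions, cfg.output);
    return data[opts.output || 'text']; // string, or an array for blocks
  } finally {
    await worker.terminate();
  }
}
```

Keeping the pure mapping separate from the I/O makes it easy to inspect which Tesseract parameters a given command line would set.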

3. Batch Recognition of Multiple Images

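To process several images, reuse a single Worker instead of creating one per image. The AI should write an inline script such as:

const { createWorker } = require('tesseract.js');
(async () => {
  // Create one worker and reuse it for every image; starting a worker
  // (loading the WASM core and language data) is comparatively expensive.
  const worker = await createWorker('eng');
  try {
    for (const img of ['a.png', 'b.png', 'c.png']) {
      const { data: { text } } = await worker.recognize(img);
      console.log(img, '→', text);
    }
  } finally {
    await worker.terminate(); // release the worker even if an image fails
  }
})().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});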

PSM — Page Segmentation Modes

The --psm parameter controls how Tesseract analyzes page layout:

| PSM | Name | Description |
|---|---|---|
| 0 | OSD_ONLY | Orientation and script detection only |
| 1 | AUTO_OSD | Automatic page segmentation + orientation detection |
| 2 | AUTO_ONLY | Automatic page segmentation, no orientation detection |
| 3 | AUTO | Fully automatic page segmentation (default) |
| 4 | SINGLE_COLUMN | Single column of text of variable size |
| 5 | SINGLE_BLOCK_VERT_TEXT | Single block of vertical text |
| 6 | SINGLE_BLOCK | Single block of text |
| 7 | SINGLE_LINE | Single line of text |
| 8 | SINGLE_WORD | Single word |
| 9 | CIRCLE_WORD | Single word arranged in a circle |
| 10 | SINGLE_CHAR | Single character |
| 11 | SPARSE_TEXT | Sparse text (find as much text as possible) |
| 12 | SPARSE_TEXT_OSD | Sparse text + orientation detection |
| 13 | RAW_LINE | Raw line (image treated as a single text line) |
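Since the table lists both numbers and names, a script might accept either form. The following is a hypothetical helper (resolvePsm is an illustrative name) that returns the string form used for tessedit_pageseg_mode. As far as we know, tesseract.js's own exported PSM constant also uses string values such as '4'. The list below is a standalone copy of the table, so the library does not need to be loaded.

```javascript
// Index in this array = PSM number, following the table above.
const PSM_NAMES = [
  'OSD_ONLY', 'AUTO_OSD', 'AUTO_ONLY', 'AUTO', 'SINGLE_COLUMN',
  'SINGLE_BLOCK_VERT_TEXT', 'SINGLE_BLOCK', 'SINGLE_LINE', 'SINGLE_WORD',
  'CIRCLE_WORD', 'SINGLE_CHAR', 'SPARSE_TEXT', 'SPARSE_TEXT_OSD', 'RAW_LINE',
];

// Accepts 4, '4', or 'SINGLE_COLUMN' (case-insensitive) and returns '4'.
// Throws for numbers outside 0-13 and for unknown names.
function resolvePsm(value) {
  const text = String(value).trim();
  if (/^\d+$/.test(text)) {
    const n = Number(text);
    if (n >= PSM_NAMES.length) throw new Error(`PSM out of range 0-13: ${value}`);
    return String(n);
  }
  const index = PSM_NAMES.indexOf(text.toUpperCase());
  if (index === -1) throw new Error(`Unknown PSM: ${value}`);
  return String(index);
}
```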

OEM — OCR Engine Modes

| OEM | Description |
|---|---|
| 0 | Legacy engine |
| 1 | LSTM neural network engine (default) |
| 2 | Legacy + LSTM |
| 3 | Default (automatically selected based on current configuration) |
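Modes 0 and 2 use the Legacy engine, which (as noted under Advanced Usage) needs the legacyCore and legacyLang worker options in Tesseract.js. The following hypothetical helper builds createWorker arguments accordingly. Treating OEM 3 like the LSTM modes, with no legacy data loaded, is an assumption.

```javascript
// Hypothetical: returns [lang, oem, options] suitable for
// createWorker(...workerArgsForOem(lang, oem)) in tesseract.js v5.
function workerArgsForOem(lang, oem = 1) {
  if (![0, 1, 2, 3].includes(oem)) throw new Error(`OEM must be 0-3, got ${oem}`);
  // Legacy engine modes need the legacy core and legacy language data.
  const usesLegacy = oem === 0 || oem === 2;
  const options = usesLegacy ? { legacyCore: true, legacyLang: true } : {};
  return [lang, oem, options];
}
```

Usage, with tesseract.js installed: const worker = await createWorker(...workerArgsForOem('eng', 0));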

Language Code Quick Reference

| Language | Code |
|---|---|
| English | eng |
| Simplified Chinese | chi_sim |
| Traditional Chinese | chi_tra |
| Japanese | jpn |
| Korean | kor |
| French | fra |
| German | deu |
| Spanish | spa |
| Russian | rus |
| Arabic | ara |
| Hindi | hin |

Full list: tesseract_lang_list.md
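The CLI takes languages joined with + (chi_sim+eng), while the capabilities list shows an array form (['eng', 'chi_sim']). A hypothetical helper (normalizeLangs) can accept either form and return both. It removes duplicates while keeping the order given. It checks only that each code is made of letters, digits, and underscores; it does not check codes against the full language list.

```javascript
// Hypothetical: normalizes 'chi_sim+eng' or ['chi_sim', 'eng'] to
// { list: ['chi_sim', 'eng'], joined: 'chi_sim+eng' }.
function normalizeLangs(input) {
  const list = (Array.isArray(input) ? input : String(input).split('+'))
    .map((code) => String(code).trim())
    .filter((code) => code.length > 0);
  if (list.length === 0) throw new Error('At least one language code is required');
  for (const code of list) {
    if (!/^[A-Za-z0-9_]+$/.test(code)) throw new Error(`Invalid language code: ${code}`);
  }
  const unique = [...new Set(list)]; // Set keeps first-seen order
  return { list: unique, joined: unique.join('+') };
}
```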

Advanced Usage

The following scenarios require the AI to write inline scripts directly rather than using scripts/ocr.js:

  • Switching languages on an existing Worker: Use worker.reinitialize(langs, oem)
  • Setting Tesseract parameters: Use worker.setParameters({ tessedit_pageseg_mode: ... })
  • Detecting text orientation (requires Legacy engine): Call worker.detect(image) after createWorker('eng', 0, { legacyCore: true, legacyLang: true })
  • Processing large numbers of images in parallel: Use createScheduler() + multiple Workers
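Two of these scenarios are sketched below, assuming the tesseract.js v5 API (worker.reinitialize, createScheduler, scheduler.addWorker, scheduler.addJob, scheduler.terminate). The function names and the default of two workers are illustrative choices, not recommendations from the Skill. The library is required inside each function, so the file parses even where tesseract.js is not installed.

```javascript
// Recognize an image in English, then switch the same worker to
// Simplified Chinese with reinitialize instead of creating a new worker.
async function recognizeTwoLanguages(image) {
  const { createWorker } = require('tesseract.js');
  const worker = await createWorker('eng');
  try {
    const english = (await worker.recognize(image)).data.text;
    await worker.reinitialize('chi_sim'); // load new language data, keep the worker
    const chinese = (await worker.recognize(image)).data.text;
    return { english, chinese };
  } finally {
    await worker.terminate();
  }
}

// Recognize many images in parallel: the scheduler hands each job to
// whichever of its workers is free.
async function recognizeInParallel(images, lang = 'eng', workerCount = 2) {
  const { createWorker, createScheduler } = require('tesseract.js');
  const scheduler = createScheduler();
  try {
    for (let i = 0; i < workerCount; i++) {
      scheduler.addWorker(await createWorker(lang));
    }
    const results = await Promise.all(
      images.map((image) => scheduler.addJob('recognize', image)),
    );
    return results.map((r, i) => ({ image: images[i], text: r.data.text }));
  } finally {
    await scheduler.terminate(); // also terminates the workers added above
  }
}
```

Each worker holds its own copy of the WASM core and language data, so memory use grows with workerCount. The number of CPU cores is a natural upper bound.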

For complete API reference, see references/api.md.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
