OCR - Image Text Recognition (Local)
Extract text from images using Tesseract.js. Recognition runs entirely on the local machine, and no API key is required. Supports Chinese and English.
Quick start
node {baseDir}/scripts/ocr.js /path/to/image.jpg
node {baseDir}/scripts/ocr.js /path/to/image.png --lang chi_sim
node {baseDir}/scripts/ocr.js /path/to/image.jpg --lang chi_tra+eng
Options
- --lang <langs>: Language codes (default: chi_sim+eng)
  - chi_sim - Simplified Chinese
  - chi_tra - Traditional Chinese
  - eng - English
  - Combine with +: chi_sim+eng
- --json: Output as JSON instead of plain text
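The options above could be handled roughly as follows. This is a minimal sketch, not the actual contents of scripts/ocr.js: the function names `parseArgs`, `recognizeImage`, and `formatOutput` are hypothetical, the JSON shape is an assumption, and the `createWorker(langs)` call assumes Tesseract.js v5 or later.

```javascript
// Hypothetical sketch of the option handling; the real scripts/ocr.js may differ.

// Parse CLI arguments (everything after the script name).
function parseArgs(argv) {
  // Defaults mirror the documented ones: chi_sim+eng, plain-text output.
  const opts = { image: null, langs: ['chi_sim', 'eng'], json: false };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === '--lang') {
      const value = argv[++i];
      if (!value) throw new Error('--lang requires a value, e.g. chi_sim+eng');
      // "chi_tra+eng" -> ['chi_tra', 'eng']
      opts.langs = value.split('+').filter(Boolean);
    } else if (arg === '--json') {
      opts.json = true;
    } else if (opts.image === null) {
      opts.image = arg;
    } else {
      throw new Error(`Unexpected argument: ${arg}`);
    }
  }
  return opts;
}

// Run OCR on one image. tesseract.js is required lazily so that
// parseArgs stays usable even when the dependency is not installed.
async function recognizeImage(imagePath, langs) {
  const { createWorker } = require('tesseract.js');
  const worker = await createWorker(langs); // assumes Tesseract.js v5+
  try {
    const { data } = await worker.recognize(imagePath);
    return { text: data.text, confidence: data.confidence };
  } finally {
    await worker.terminate();
  }
}

// Render the result as JSON or plain text (the JSON shape is an assumption).
function formatOutput(result, json) {
  return json ? JSON.stringify(result, null, 2) : result.text;
}
```

A caller would then chain the three, for example `const opts = parseArgs(process.argv.slice(2))`, followed by `recognizeImage(opts.image, opts.langs)` and `formatOutput(result, opts.json)`.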
Examples
# Recognize Chinese screenshot
node {baseDir}/scripts/ocr.js screenshot.png
# Recognize English document
node {baseDir}/scripts/ocr.js document.jpg --lang eng
# Mixed Chinese + English
node {baseDir}/scripts/ocr.js mixed.png --lang chi_sim+eng
Notes
- First run downloads language data (~20MB per language)
- Language data is cached locally, so subsequent runs skip the download
- Works best with clear, high-contrast images
- For handwritten text, accuracy may vary
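To tell in advance whether a run will trigger a download, one option is to check the cache directory for each requested language. The sketch below assumes the data is stored as `<lang>.traineddata` files in a known directory, which matches how Tesseract.js caches in Node as far as I am aware. The helper name `missingLanguageData` and the `cacheDir` parameter are hypothetical, and the location this script actually uses is not stated here.

```javascript
const fs = require('fs');
const path = require('path');

// Return the language codes from a "+"-joined string (e.g. "chi_sim+eng")
// that have no cached data file in cacheDir.
// Assumption: cached files are named "<lang>.traineddata".
function missingLanguageData(langs, cacheDir) {
  return langs
    .split('+')
    .filter(Boolean)
    .filter((lang) => !fs.existsSync(path.join(cacheDir, `${lang}.traineddata`)));
}
```

For example, `missingLanguageData('chi_sim+eng', '.')` lists the languages that would still need to be fetched, at roughly 20MB each, before recognition can start.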