doc-to-text

Document to Text Reborn (Digital Archaeologist)

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "doc-to-text" with this command: npx skills add famaoai-creator/gemini-skills/famaoai-creator-gemini-skills-doc-to-text

Overview

This skill uses a 3-layer extraction model to "excavate" meaning and aesthetics from PDF, Word, Excel, PowerPoint, and image files. It separates pure content from design and metadata so that each layer can be analyzed and reused independently.

3-Layer Extraction Model

  • Content Layer (Soul): High-fidelity text extraction maintaining structural elements like headings and tables (Markdown output).

  • Aesthetic Layer (Mask): Extraction of design parameters, colors, fonts, and layout grid information.

  • Metadata Layer (Context): File properties, authorship, and contextual markers.
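The three layers above can be pictured as three keys on one result object. The following is a minimal sketch of that idea, not the skill's actual schema: the `selectLayers` helper, the `LAYERS` constant, and every field name inside `example` are hypothetical and chosen only for illustration.

```javascript
// Hypothetical layer names, mirroring the three layers described above.
const LAYERS = ["content", "aesthetic", "metadata"];

// Return only the layer requested by `mode`; "all" keeps every layer.
function selectLayers(result, mode = "all") {
  if (mode === "all") {
    return { ...result };
  }
  if (!LAYERS.includes(mode)) {
    throw new Error(`Unknown mode: ${mode}`);
  }
  return { [mode]: result[mode] };
}

// Illustrative result object; the inner field names are assumptions.
const example = {
  content: "# Quarterly Report\n\n| Region | Sales |\n| --- | --- |\n| East | 120 |",
  aesthetic: { fonts: ["Helvetica"], colors: ["#1a1a1a"], grid: { columns: 2 } },
  metadata: { author: "Unknown", pages: 4 },
};

// Keeps only the "content" key of the example object.
console.log(Object.keys(selectLayers(example, "content")));
```

Keeping the layers as separate top-level keys means a caller interested only in text never has to touch design or file-property data.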

Supported Formats

  • PDF: Text and metadata. (Aesthetic: Coordinate-based analysis)

  • Word (.docx): Structural Markdown conversion. (Aesthetic: Style extraction)

  • Excel (.xlsx): Multi-sheet CSV extraction.

  • PowerPoint (.pptx): Slide-based content extraction.

  • Images: OCR supporting English and Japanese.
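One plausible way to route a file to the right extractor is to look at its extension. The sketch below assumes this approach; the skill's real detection logic is not shown in the source. `FORMAT_BY_EXTENSION` and `detectFormat` are hypothetical names, and the image extensions are an assumption because the list above only says "Images".

```javascript
const path = require("node:path");

// Hypothetical mapping from file extension to a format label.
// The image extensions are assumptions, not taken from the skill.
const FORMAT_BY_EXTENSION = {
  ".pdf": "pdf",
  ".docx": "word",
  ".xlsx": "excel",
  ".pptx": "powerpoint",
  ".png": "image",
  ".jpg": "image",
  ".jpeg": "image",
};

// Return the format label for a path, or null when the extension is unsupported.
function detectFormat(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return FORMAT_BY_EXTENSION[ext] ?? null;
}

console.log(detectFormat("brochure.docx"));
console.log(detectFormat("notes.txt"));
```

Returning `null` for unknown extensions lets the caller decide whether to skip the file or report an error.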

Usage

node dist/index.js <file_path> [options]

Options

  • --mode, -m: Extraction mode. Choices: content, aesthetic, metadata, all (default).

  • --out, -o : Save the structural JSON result to a file.
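The options above could be parsed with Node's built-in `util.parseArgs`. This is a sketch under that assumption, since the source does not say which argument parser the skill uses; `parseCliArgs` and `MODES` are hypothetical names.

```javascript
const { parseArgs } = require("node:util");

// Mode choices as listed in the Options section.
const MODES = ["content", "aesthetic", "metadata", "all"];

// Parse an argv-style array into { filePath, mode, out }.
function parseCliArgs(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    options: {
      mode: { type: "string", short: "m" },
      out: { type: "string", short: "o" },
    },
    allowPositionals: true,
  });

  if (positionals.length !== 1) {
    throw new Error("Expected exactly one <file_path> argument");
  }

  // Fall back to "all" when --mode is omitted, as documented.
  const mode = values.mode ?? "all";
  if (!MODES.includes(mode)) {
    throw new Error(`Invalid mode "${mode}". Choices: ${MODES.join(", ")}`);
  }

  return { filePath: positionals[0], mode, out: values.out };
}

// Typical call site inside a CLI entry point:
//   const args = parseCliArgs(process.argv.slice(2));
```

Validating the mode up front gives a clear error message before any file is opened.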

Examples

Extract only text (soul) as Markdown:

node dist/index.js report.pdf --mode content

Extract design/layout DNA (mask):

node dist/index.js brochure.docx --mode aesthetic

Dependencies

  • pdf-parse: Basic PDF text extraction.

  • mammoth: Word-to-Markdown conversion.

  • xlsx: Excel data parsing.

  • tesseract.js: Image OCR.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
