Book Converter Skill
Convert EPUB books into professionally formatted Markdown books with AI-assisted quality improvements.
Overview
This skill converts EPUB files into high-quality Markdown documents by:
- Using pandoc to extract raw Markdown from EPUB
- Creating a structured project directory
- Planning and executing AI-driven formatting fixes
- Producing chapter-by-chapter formatted output
- Generating merged book file with Table of Contents
Quick Start
User provides an EPUB file path:
/Users/username/Downloads/Book.Name.2024.epub
Execute the conversion workflow:
python3 scripts/convert_book.py "/path/to/book.epub"
This initiates the complete conversion process.
Workflow
CRITICAL: Use subagents for all formatting work to avoid polluting main context.
Phase 1: Setup and Extraction (Main Agent)
Run the conversion script:
python3 scripts/convert_book.py "/path/to/book.epub"
This script:
- Verifies EPUB file exists
- Creates project structure:
books/book-name/- Main directorybooks/book-name/raw/- Pandoc outputbooks/book-name/chapters/- Formatted chaptersbooks/book-name/images/- Extracted images
- Runs pandoc to extract Markdown
- Copies formatting standards to project directory
Output: Raw Markdown in books/book-name/raw/book-parsed.md
Phase 2: Analysis and Planning (Script + Subagent)
Step 1: Run the structure analysis script (Main Agent):
python3 books/book-name/analyze_structure.py books/book-name
This script:
- Extracts all headers with line numbers
- Detects formatting issues by sampling
- Suggests chapter boundaries
- Creates
STRUCTURE_ANALYSIS.mdreport (~5-10 KB instead of 35k+ lines)
Step 2: Launch a general subagent to create mapping files:
Task(
subagent_type="general",
description="Create chapter map and formatting plan",
prompt="""Create CHAPTER_MAP.md and FORMATTING_PLAN.md:
1. Read books/book-name/STRUCTURE_ANALYSIS.md (concise report with headers and issues)
2. Read books/book-name/references/chapter-map-template.md for format
3. Read books/book-name/references/formatting-plan-template.md for format
4. Create books/book-name/CHAPTER_MAP.md:
- Use suggested chapter boundaries from analysis
- Verify line ranges make sense
- Create proper slugged filenames
5. Create books/book-name/FORMATTING_PLAN.md:
- Document issues found in analysis
- Add severity and priority
- Note book-specific patterns
6. Update books/book-name/progress.md to mark Phase 2 complete
Return: Summary of chapters found and major issues identified."""
)
Output: CHAPTER_MAP.md, FORMATTING_PLAN.md, and updated progress.md
Phase 3: Chapter Formatting (Use Subagents)
For EACH chapter, launch a separate general subagent:
# Example for Chapter 1
Task(
subagent_type="general",
description="Format Chapter 1",
prompt="""Format Chapter 1 following the chapter formatting workflow.
**Critical Instructions:**
1. Read and follow ALL steps in books/book-name/references/chapter-workflow.md
2. Apply formatting rules from books/book-name/references/formatting-standards.md
3. Use books/book-name/CHAPTER_MAP.md to find line ranges for Chapter 1
4. Read books/book-name/FORMATTING_PLAN.md for known issues to watch for
**Workflow Summary (see chapter-workflow.md for complete details):**
Step 1: Read Standards and Chapter Map
- Read references/formatting-standards.md
- Read CHAPTER_MAP.md for your chapter's line ranges
- Read FORMATTING_PLAN.md for known issues
Step 2: Extract Chapter Content
- Extract Chapter 1 from raw/book-parsed.md using line ranges
Step 3: Identify Issues follow the standards
- Headers using bold instead of #
- Shattered code blocks
- Split paragraphs
- Missing code language identifiers
- Emphasis artifacts [word]
- Corrupted footnotes
- Missing image alt text
- Broken links
Step 4: Apply Formatting Fixes
- Follow the three-pass approach in chapter-workflow.md:
* First pass: Structure (headers, code blocks)
* Second pass: Content (paragraphs, emphasis)
* Third pass: Details (footnotes, images, links)
Step 5: Create Output File
- Write to books/book-name/chapters/chapter-01-title.md
- Use structure from chapter-workflow.md
Step 6: Update Progress
- Update books/book-name/progress.md with completion status
- Document fixes applied
**Quality Checklist (from chapter-workflow.md):**
- [ ] All headers use proper # syntax
- [ ] All code blocks have language identifiers
- [ ] No shattered code blocks remain
- [ ] Text flows naturally without mid-sentence breaks
- [ ] All footnotes have [^N] format with definitions
- [ ] Images have descriptive alt text
Return: Confirmation with summary of fixes applied."""
)
Important:
- Launch subagents in parallel batches (3-5 at a time) for efficiency
- Each subagent must read chapter-workflow.md and formatting-standards.md
- Follow the systematic workflow to ensure consistent quality
Output: Formatted chapters in books/book-name/chapters/
Phase 4: Book Assembly (Main Agent)
The merge_book.py script is already copied to your project directory. Simply run it:
python3 books/book-name/merge_book.py books/book-name
The script will:
- Read
CHAPTER_MAP.mdfor chapter order - Load all formatted chapters from
chapters/ - Extract headers for Table of Contents
- Fix image paths (relative to final location)
- Combine all chapters in order
- Generate comprehensive TOC
- Output to
books/book-name-book.md
Output: books/book-name-book.md with complete formatted book
Note: The merge script is reusable - no need to create it per book!
Critical: Chapter Formatting Requirements
Every subagent in Phase 3 MUST:
- Read chapter-workflow.md first - Contains the complete step-by-step process
- Read formatting-standards.md - Contains all formatting rules (678 lines)
- Follow the workflow systematically - Don't skip steps
- Use the three-pass approach:
- First pass: Fix structure (headers, code blocks)
- Second pass: Fix content (paragraphs, emphasis)
- Third pass: Fix details (footnotes, images, links)
- Complete the quality checklist - Verify all items before finishing
Why this matters:
- Ensures consistent quality across all chapters
- Prevents common mistakes (skipped issues, inconsistent style)
- Proven process from Clean Code Collection (35k+ lines)
- Each chapter is only formatted once - must be thorough
The workflow documents are your complete instructions - trust them!
Subagent Usage Principles
Never process book content in main context. Always use subagents to:
- Keep main context clean: Book content is large and pollutes context
- Enable parallelization: Format multiple chapters simultaneously
- Isolate formatting work: Each chapter gets fresh context
- Avoid token limits: Raw content can exceed context windows
Subagent Selection: Always use subagent_type="general" for all book processing tasks.
Progress Tracking
Create and maintain books/book-name/progress.md:
# Book Name - Conversion Progress
## Phase 1: Setup ✓
- [x] EPUB extracted
- [x] Project structure created
## Phase 2: Planning ✓
- [x] Chapter map created (15 chapters identified)
- [x] Formatting plan documented
## Phase 3: Chapter Formatting (5/15 complete)
- [x] Front Matter
- [x] Chapter 1: Introduction
- [x] Chapter 2: Getting Started
- [x] Chapter 3: Advanced Topics
- [x] Chapter 4: Best Practices
- [ ] Chapter 5: Performance
- [ ] ...
## Phase 4: Assembly
- [ ] Merge script created
- [ ] Final book generated
Update after each subagent completes.
Quality Standards
All formatted output must meet these criteria:
- Headers: Use proper
#syntax, not bold text - Code Blocks: Include language identifiers, merge shattered blocks
- Text Flow: Join split sentences into natural paragraphs
- Emphasis: Use
*italic*and**bold**, not[brackets] - Footnotes: Standard
[^1]format with definitions - Images: Descriptive alt text, not generic filenames
- Links: Clean anchors, no PDF conversion artifacts
Complete standards reference: references/formatting-standards.md
Example Usage
User Request:
"Convert this EPUB to Markdown: /Users/john/Downloads/Effective.Java.3rd.Edition.epub"
Skill Execution:
- Run conversion script to extract content
- Analyze structure and create chapter map
- Format each chapter using AI subagents
- Merge into final book with TOC
- Provide user with
books/effective-java-final.md
Scripts
- convert_book.py: Main conversion script (Phase 1) - Extracts EPUB and sets up project
- analyze_structure.py: Structure analyzer (Phase 2) - Extracts headers and detects issues efficiently
- merge_book.py: Reusable merge script (Phase 4) - Combines all chapters into final book
References
- formatting-standards.md: Complete formatting rules (loaded as needed during formatting)
- chapter-workflow.md: Detailed chapter formatting workflow (loaded as needed)
- progress-template.md: Template for progress tracking file
- chapter-map-template.md: Template for chapter mapping
- formatting-plan-template.md: Template for formatting issue documentation
Notes
- High Quality Focus: Manual AI-driven formatting ensures prose flows naturally
- No Automated Scripts: Formatting requires human-like judgment for line joining
- Preserve Content: Never alter meaning or remove content
- Code Accuracy: Ensure code blocks are syntactically complete