tex2docx — LaTeX to Word Converter

Requirements

pandoc (system install): winget install pandoc on Windows, or download it from pandoc.org
Python packages: pip install python-docx lxml pypandoc_binary

Usage

python scripts/tex2docx.py input.tex [output.docx]

If output.docx is omitted, output is input.docx in the same directory.
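The default output rule can be sketched as follows. This is only an illustration: `default_output_path` is a hypothetical helper name, and the actual script may derive the path differently.

```python
from pathlib import Path
from typing import Optional


def default_output_path(tex_path: str, out_path: Optional[str] = None) -> Path:
    """Return the explicit output path, or input.docx next to input.tex.

    Hypothetical helper: the name and signature are assumptions.
    """
    if out_path:
        return Path(out_path)
    # with_suffix swaps only the final extension: main.tex -> main.docx,
    # and keeps the parent directory unchanged.
    return Path(tex_path).with_suffix(".docx")
```

Calling `default_output_path("paper/main.tex")` yields `paper/main.docx`, in the same directory as the input.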

How It Works (Three Phases)

.tex ──→ [pandoc] ──→ OMML equations (13+ Word-editable formulas)
  │
  └──→ [Custom parser] ──→ Native Word tables ├──→ Final .docx
                           Embedded figures     │   (merged)
                           Formatted refs       │
                           IEEE layout & font  ┘

Phase 1 — Pandoc

Runs pandoc via pypandoc. The input file must be in its own directory (with a figures/ subfolder if images exist). The script changes the working directory to the .tex file's directory before running pandoc so that relative image paths resolve correctly.
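A minimal sketch of this phase, assuming the conversion goes through `pypandoc.convert_file`. The names `working_directory` and `run_pandoc` are hypothetical and not taken from the script.

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def working_directory(path):
    """Temporarily switch the current working directory, then restore it."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


def run_pandoc(tex_path, docx_path):
    """Convert a .tex file to .docx from inside the .tex file's directory.

    Hypothetical wrapper; the real script's structure may differ.
    """
    import pypandoc  # imported lazily; needs pypandoc or pypandoc_binary

    tex = Path(tex_path).resolve()
    out = Path(docx_path).resolve()  # resolve before chdir so it stays valid
    with working_directory(tex.parent):
        # Relative paths such as figures/plot.png now resolve against tex.parent
        pypandoc.convert_file(tex.name, "docx", format="latex", outputfile=str(out))
    return out
```

Restoring the previous directory in a `finally` block keeps later relative paths valid even if pandoc raises an error.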

Phase 2 — Custom LaTeX Parser

RegEx-based extraction of:

Tables: \begin{table} → Word Table objects (full borders, centered, 8pt Times New Roman)
Figures: \includegraphics{} + \caption{} → PNG/PDF embeds with italic captions
References: \thebibliography → formatted entries with hanging indent
Sections: \section{}, \subsection{} → bold headings
Metadata: \title, \author, \abstract, \IEEEkeywords
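As an illustration of the regex-based approach, the sketch below pulls image paths and captions out of figure environments. The patterns and the `extract_figures` name are assumptions, not the script's actual code, and captions containing nested braces are not handled.

```python
import re

# Non-greedy body match so each figure environment is handled separately
FIGURE_ENV = re.compile(r"\\begin\{figure\*?\}(.*?)\\end\{figure\*?\}", re.DOTALL)
# Optional [options] block, then the path argument
GRAPHICS = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^}]*)\}")
# Simple caption match; stops at the first closing brace
CAPTION = re.compile(r"\\caption\{([^}]*)\}")


def extract_figures(tex: str):
    """Return a list of {'path', 'caption'} dicts, one per figure environment."""
    figures = []
    for match in FIGURE_ENV.finditer(tex):
        body = match.group(1)
        graphic = GRAPHICS.search(body)
        caption = CAPTION.search(body)
        if graphic is None:
            continue  # figure without an image: nothing to embed
        figures.append({
            "path": graphic.group(1).strip(),
            "caption": caption.group(1).strip() if caption else "",
        })
    return figures
```

Tables, references, and metadata could be extracted with the same pattern: one regex to isolate the environment or command, then smaller regexes for its parts.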

Phase 3 — Merge

OMML equation paragraphs from pandoc are inserted into the document built in Phase 2. Body paragraphs get a 0.25in first-line indent. All LaTeX commands (\textbf, \toprule, \ref, \cite, \begin{itemize}, etc.) are stripped from text content.
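The command-stripping step might look roughly like the sketch below. It reuses the name `clean()` from the script's function list, but the body is an assumption; in particular, it assumes that \ref and \cite are dropped together with their arguments while formatting commands keep theirs.

```python
import re


def clean(text: str) -> str:
    """Strip common LaTeX markup, keeping the human-readable text (sketch)."""
    # Drop environment markers such as \begin{itemize} / \end{itemize}
    text = re.sub(r"\\(?:begin|end)\{[^}]*\}", "", text)
    # Assumption: references and citations are removed with their arguments
    text = re.sub(r"\\(?:ref|cite|label)\{[^}]*\}", "", text)
    # Unwrap formatting commands, keeping the argument: \textbf{x} -> x.
    # Repeat until stable so nested commands are unwrapped inside-out.
    previous = None
    while previous != text:
        previous = text
        text = re.sub(r"\\[a-zA-Z]+\*?\{([^{}]*)\}", r"\1", text)
    # Remove bare commands with no argument: \toprule, \item, \centering
    text = re.sub(r"\\[a-zA-Z]+\*?", "", text)
    # Collapse the whitespace left behind
    return re.sub(r"\s+", " ", text).strip()
```

The order matters: environment markers and references are removed before the generic unwrap rule, otherwise their arguments (such as `itemize` or a citation key) would leak into the text.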

Output Format

Feature	Detail
Font	Times New Roman (10pt body, 9pt table/figure, 8pt refs)
Layout	A4, two-column IEEE conference style
Equations	OMML (double-click to edit in Word)
Tables	Native Word tables, all borders
Figures	PNG/PDF embedded with "Fig." captions
References	Hanging indent, `[bN]` format
First-line indent	0.25in on body paragraphs

Verification

python scripts/verify.py output.docx

Reports paragraph/table/image/equation counts and checks for LaTeX residue.
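A check of this kind can be approximated with the standard library alone, since a .docx file is a ZIP archive whose main content lives in word/document.xml. The sketch below is not the code of `scripts/verify.py`; `docx_stats` is a hypothetical name and the counting rules are assumptions.

```python
import re
import zipfile


def docx_stats(path):
    """Count structural elements in word/document.xml and flag LaTeX residue."""
    with zipfile.ZipFile(path) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")

    # The character class after the tag name keeps <w:p> from also
    # matching <w:pPr>, and <w:tbl> from matching <w:tblPr>.
    stats = {
        "paragraphs": len(re.findall(r"<w:p[ >/]", xml)),
        "tables": len(re.findall(r"<w:tbl[ >]", xml)),
        "images": len(re.findall(r"<w:drawing[ >]", xml)),
        "equations": len(re.findall(r"<m:oMath[ >]", xml)),
    }
    # Visible text lives in <w:t> runs; a backslash command there is residue
    visible = " ".join(re.findall(r"<w:t(?: [^>]*)?>([^<]*)</w:t>", xml))
    stats["latex_residue"] = re.findall(r"\\[a-zA-Z]+", visible)
    return stats
```

An empty `latex_residue` list suggests that no backslash commands survived into the visible text.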

Chinese (ctex) Support

Fully supports Chinese LaTeX documents using the ctex package:

Chinese section titles (引言, 方法, 实验, 结论, etc.) are recognized
\section*{} (star variant) is supported
Chinese table headers preserved
Chinese text in titles rendered via w:eastAsia font fallback
\title{...} and \author{...} residue paragraphs are filtered
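Heading recognition for both the starred variant and Chinese titles can be sketched with one pattern. The regex and the `extract_headings` name are illustrative assumptions rather than the script's own code.

```python
import re

# Matches \section{...}, \section*{...}, \subsection{...}, \subsection*{...}
HEADING = re.compile(r"\\(sub)?section\*?\{([^}]*)\}")


def extract_headings(tex: str):
    """Return (level, title) pairs; level 1 = section, 2 = subsection."""
    return [(2 if m.group(1) else 1, m.group(2).strip())
            for m in HEADING.finditer(tex)]
```

Python 3 strings are Unicode, so `[^}]*` captures Chinese titles without any special handling, provided the .tex file is read with the correct encoding (typically UTF-8).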

Limitations

Inline math ($...$) becomes italic plain text, not OMML; only \begin{equation}, \begin{align}, and \[...\] become editable equations
No .bib support: references must be in \thebibliography{} environment
PNG images preferred: the script tries a PNG file first, then falls back to PDF
Pandoc path: the system pandoc binary must be discoverable by pypandoc
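The PNG-then-PDF lookup could work along these lines. This is a sketch under assumptions: `resolve_image` is a hypothetical helper, and the script may resolve paths differently.

```python
from pathlib import Path
from typing import Optional


def resolve_image(name: str, base_dir: str) -> Optional[Path]:
    """Find the file for an includegraphics argument, preferring PNG over PDF."""
    path = Path(base_dir) / name
    # Drop a known extension so both formats can be tried from the same stem
    if path.suffix.lower() in (".png", ".pdf"):
        path = path.with_suffix("")
    for ext in (".png", ".pdf"):
        # Append rather than replace, so names containing dots stay intact
        candidate = path.with_name(path.name + ext)
        if candidate.is_file():
            return candidate
    return None  # neither format exists; the caller decides how to report it
```

Returning `None` instead of raising lets the caller skip a missing figure and continue with the rest of the document.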

Script: `scripts/tex2docx.py`

Self-contained (660+ lines). Key internal functions:

Function	Role
`extract_tex()`	Parse all structural elements from .tex
`extract_omml()`	Pull OMML XML from pandoc output
`build_docx()`	Construct final document with all components
`clean()`	Strip LaTeX commands to plain text
`add_table()`	Build Word table with borders
`add_figure()`	Embed image + caption
