Word 文档整理助手
智能识别文档结构,自动应用标准化排版,生成专业目录,让文档整理一键完成。
触发条件
当用户提出以下请求时激活此技能:
- "整理word文档"
- "格式化文档"
- "生成文档目录"
- "统一文档样式"
- "排版优化"
- "修复word格式"
- "规范文档格式"
- "word排版"
参数定义
document_path(必需)
- 类型: string
- 描述: Word文档的完整路径(支持.docx格式)
- 示例: "C:/Users/Desktop/报告.docx", "/home/user/docs/论文.docx"
operations(可选,默认: all)
- 类型: array[string]
- 描述: 要执行的操作列表
- 可选值:
format: 格式化段落(行距、间距、字体统一)toc: 生成文档目录styles: 应用标准样式模板cleanup: 清理空段落和冗余格式all: 执行所有操作(默认)
style_template(可选,默认: academic)
- 类型: string
- 描述: 样式模板类型
- 可选值:
academic: 学术模板(宋体、层级分明,适合论文/报告)business: 商务模板(微软雅黑、现代简洁,适合商业文档)minimal: 极简模板(Arial、紧凑排版,适合笔记/草稿)default: 默认模板(通用设置)
output_path(可选)
- 类型: string
- 描述: 输出文件路径(默认覆盖原文件,建议指定新路径保留原文件)
- 示例: "C:/Users/Desktop/报告_整理版.docx"
执行流程
步骤1: 环境检查与备份
#!/bin/bash
# 检查文件存在性
if [ ! -f "${document_path}" ]; then
echo "错误:文件不存在 ${document_path}"
echo "请检查路径是否正确,或文件是否被移动/删除"
exit 1
fi
# 检查文件扩展名
if [[ ! "${document_path}" =~ \.(docx|doc)$ ]]; then
echo "错误:仅支持 .docx 或 .doc 格式"
exit 1
fi
# 创建时间戳备份
backup_path="${document_path}.backup.$(date +%Y%m%d_%H%M%S)"
cp "${document_path}" "${backup_path}"
echo "已创建备份: ${backup_path}"
步骤2: 安装Python依赖
#!/bin/bash
# 检查python-docx是否已安装
python3 -c "import docx" 2>/dev/null || pip3 install python-docx -q
echo "依赖检查完成"
步骤3: 执行文档整理(核心Python脚本)
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from docx import Document
from docx.shared import Pt, Inches, RGBColor
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.oxml.ns import qn
import sys
import re
import os
# 获取参数
doc_path = "${document_path}"
output_path = "${output_path}" if "${output_path}" else doc_path
template = "${style_template}"
operations = [op.strip() for op in "${operations}".split(",")] if "${operations}" else ["all"]
print(f"处理文件: {doc_path}")
print(f"执行操作: {operations}")
print(f"使用模板: {template}")
# 加载文档
try:
doc = Document(doc_path)
except Exception as e:
print(f"无法打开文档: {e}")
print("提示:请确保文档未被Microsoft Word占用(关闭Word后重试)")
sys.exit(1)
# 模板配置
TEMPLATES = {
"academic": {
"h1_size": 18, "h2_size": 16, "h3_size": 14, "body_size": 12,
"h1_font": "黑体", "h2_font": "黑体", "body_font": "宋体",
"line_spacing": 1.5, "space_after": 6
},
"business": {
"h1_size": 16, "h2_size": 14, "h3_size": 12, "body_size": 11,
"h1_font": "微软雅黑", "h2_font": "微软雅黑", "body_font": "微软雅黑",
"line_spacing": 1.5, "space_after": 6
},
"minimal": {
"h1_size": 14, "h2_size": 12, "h3_size": 11, "body_size": 10.5,
"h1_font": "Arial", "h2_font": "Arial", "body_font": "Arial",
"line_spacing": 1.15, "space_after": 3
},
"default": {
"h1_size": 16, "h2_size": 14, "h3_size": 12, "body_size": 12,
"h1_font": "宋体", "h2_font": "宋体", "body_font": "宋体",
"line_spacing": 1.5, "space_after": 6
}
}
config = TEMPLATES.get(template, TEMPLATES["academic"])
changes_log = []
# 操作1: 格式化
if "format" in operations or "all" in operations:
print("正在格式化文档...")
count = 0
for para in doc.paragraphs:
para.paragraph_format.line_spacing = config["line_spacing"]
para.paragraph_format.space_after = Pt(config["space_after"])
para.paragraph_format.space_before = Pt(0)
if not para.style.name.startswith('Heading'):
for run in para.runs:
run.font.name = config["body_font"]
run._element.rPr.rFonts.set(qn('w:eastAsia'), config["body_font"])
run.font.size = Pt(config["body_size"])
count += 1
changes_log.append(f"格式化 {count} 个段落")
# 操作2: 应用样式
if "styles" in operations or "all" in operations:
print("正在应用样式模板...")
title_count = 0
for para in doc.paragraphs:
text = para.text.strip()
if not text:
continue
if re.match(r'^[第][一二三四五六七八九十\d]+[章\s]|^[\d]+\s*[、..\s]|^[((][一二三四五六七八九十]+[))]', text):
para.style = doc.styles['Heading 1']
for run in para.runs:
run.font.name = config["h1_font"]
run._element.rPr.rFonts.set(qn('w:eastAsia'), config["h1_font"])
run.font.size = Pt(config["h1_size"])
run.font.bold = True
run.font.color.rgb = RGBColor(0, 0, 0)
title_count += 1
elif re.match(r'^\d+\.\d+[\s.、]|^[((][\d一二三四五六七八九十]+[))]', text):
para.style = doc.styles['Heading 2']
for run in para.runs:
run.font.name = config["h2_font"]
run._element.rPr.rFonts.set(qn('w:eastAsia'), config["h2_font"])
run.font.size = Pt(config["h2_size"])
run.font.bold = True
title_count += 1
elif re.match(r'^\d+\.\d+\.\d+|^[((]\d+[))]', text):
para.style = doc.styles['Heading 3']
for run in para.runs:
run.font.size = Pt(config["h3_size"])
run.font.bold = True
title_count += 1
changes_log.append(f"识别并格式化 {title_count} 个标题")
# 操作3: 生成目录
if "toc" in operations or "all" in operations:
print("正在生成目录...")
toc_entries = []
for para in doc.paragraphs:
if para.style.name.startswith('Heading'):
level = int(para.style.name[-1]) if para.style.name[-1].isdigit() else 1
toc_entries.append((level, para.text.strip()))
if toc_entries:
first_para = doc.paragraphs[0]
toc_title = first_para.insert_paragraph_before("目录")
toc_title.alignment = WD_ALIGN_PARAGRAPH.CENTER
for run in toc_title.runs:
run.font.size = Pt(config["h1_size"])
run.font.bold = True
run.font.name = config["h1_font"]
run._element.rPr.rFonts.set(qn('w:eastAsia'), config["h1_font"])
for level, text in toc_entries[:50]:
indent = " " * (level - 1)
entry_para = first_para.insert_paragraph_before(f"{indent}{text}")
entry_para.paragraph_format.left_indent = Inches(0.3 * (level - 1))
for run in entry_para.runs:
run.font.size = Pt(config["body_size"])
separator = first_para.insert_paragraph_before("—" * 30)
separator.alignment = WD_ALIGN_PARAGRAPH.CENTER
changes_log.append(f"生成目录,包含 {len(toc_entries)} 个条目")
else:
changes_log.append("未检测到标题结构,跳过目录生成")
# 操作4: 清理
if "cleanup" in operations or "all" in operations:
print("正在清理冗余内容...")
removed = 0
prev_empty = False
for i in range(len(doc.paragraphs) - 1, -1, -1):
para = doc.paragraphs[i]
is_empty = not para.text.strip()
if is_empty and prev_empty:
p_element = para._element
p_element.getparent().remove(p_element)
removed += 1
prev_empty = is_empty
changes_log.append(f"删除 {removed} 个冗余空段落")
# 保存文档
try:
doc.save(output_path)
print(f"文档已保存: {output_path}")
except Exception as e:
print(f"保存失败: {e}")
sys.exit(1)
print("
" + "="*50)
print("整理报告")
print("="*50)
for log in changes_log:
print(log)
print("="*50)
print("文档整理完成!")
步骤4: 验证输出
#!/bin/bash
if [ -f "${output_path}" ]; then
file_size=$(ls -lh "${output_path}" | awk '{print $5}')
echo "输出文件: ${output_path} (${file_size})"
echo "提示:原文件已备份,如有问题可恢复"
else
echo "错误:输出文件未生成"
exit 1
fi
使用示例
示例1: 全面整理(推荐)
整理文档 C:/Users/Desktop/毕业论文.docx,使用学术模板,执行所有操作,输出到 C:/Users/Desktop/毕业论文_整理版.docx
示例2: 仅格式化和生成目录
格式化 C:/Docs/报告.docx,操作包括format,toc,模板用business
错误处理
| 错误场景 | 处理方式 |
|---|---|
| 文件不存在 | 提示检查路径,退出码1 |
| 文件格式不支持 | 提示仅支持docx/doc,退出码1 |
| 文件被Word占用 | 提示关闭Word后重试,退出码1 |
| 权限不足 | 提示以管理员身份运行,退出码1 |
安全说明
- 本地执行: 所有操作在本地完成,不上传任何文件到云端
- 自动备份: 每次执行自动创建带时间戳的备份
- 原子操作: 失败时不会损坏原文件(先处理再保存)
- 权限最小化: 仅需文件读写权限,无需网络或系统管理权限