Humanize Chinese AI Text v2.0
A command-line toolkit for detecting AI-generated Chinese text and rewriting it so that it reads more naturally.
v2.0 highlights: weighted 0-100 scoring, sentence-level analysis, sentence restructuring (merge/split), context-aware replacement, rhythm variation, vocabulary diversification, 7 style transforms, external pattern config (patterns_cn.json).
Quick Start
# Detect AI patterns (20+ categories, 0-100 score)
python scripts/detect_cn.py text.txt
python scripts/detect_cn.py text.txt -v # verbose + worst sentences
python scripts/detect_cn.py text.txt -s # score only
python scripts/detect_cn.py text.txt -j # JSON output
# Humanize text
python scripts/humanize_cn.py text.txt -o clean.txt
python scripts/humanize_cn.py text.txt --scene social
python scripts/humanize_cn.py text.txt --scene tech -a # aggressive mode
python scripts/humanize_cn.py text.txt --seed 42 # reproducible
# Apply writing styles
python scripts/style_cn.py text.txt --style zhihu -o zhihu.txt
python scripts/style_cn.py text.txt --style xiaohongshu
python scripts/style_cn.py --list
# Compare before/after
python scripts/compare_cn.py text.txt --scene tech -a
python scripts/compare_cn.py text.txt -o clean.txt
Detection System
Scoring
Weighted 0-100 score with 4 severity levels:
| Score | Level | Meaning |
|---|---|---|
| 0-24 | LOW | Likely human-written |
| 25-49 | MEDIUM | Some AI signals |
| 50-74 | HIGH | Probably AI-generated |
| 75-100 | VERY HIGH | Almost certainly AI |
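The bands in the table can be expressed as a small lookup. This is a minimal sketch, not the code in detect_cn.py: the function name `severity_level` is hypothetical, and treating fractional scores such as 24.5 as belonging to the lower band is an assumption.

```python
def severity_level(score: float) -> str:
    """Map a 0-100 score to the level names used in the table above.

    Hypothetical helper: band edges follow the table; how the real
    detector handles fractional scores is not documented here.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 25:
        return "LOW"
    if score < 50:
        return "MEDIUM"
    if score < 75:
        return "HIGH"
    return "VERY HIGH"


print(severity_level(72))  # HIGH
```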
Detection Categories
🔴 Critical (weight: 8)
| Category | Examples |
|---|---|
| Three-Part Structure | 首先...其次...最后, 一方面...另一方面, 其一...其二...其三 |
| Mechanical Connectors | 值得注意的是, 综上所述, 不难发现, 归根结底, 由此可见 |
| Empty Grand Words | 赋能, 闭环, 数字化转型, 协同增效, 全方位, 多维度 |
🟠 High Signal (weight: 4)
| Category | Examples |
|---|---|
| AI High-Frequency Words | 助力, 彰显, 底层逻辑, 抓手, 触达, 沉淀, 复盘 |
| Filler Phrases | 值得一提的是, 众所周知, 毫无疑问 |
| Balanced Arguments | 虽然...但是...同时, 既有...也有...更有 |
| Template Sentences | 随着...的不断发展, 在当今...时代, 作为...的重要组成部分 |
🟡 Medium Signal (weight: 2)
| Category | Examples |
|---|---|
| Hedging Language | 在一定程度上, 某种程度上, 通常情况下 (>5 occurrences) |
| List Addiction | Excessive numbered/bulleted lists |
| Punctuation Overuse | Dense em dashes, semicolons |
| Excessive Rhetoric | Too many antithetical or parallel sentences (对偶/排比) |
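As an illustration of the "(>5 occurrences)" rule in the Hedging Language row, the sketch below counts the three example phrases and flags the text once the combined count exceeds 5. Whether the real detector counts per phrase or in total is not stated, so the combined count is an assumption, and `count_hedges` and `hedging_flagged` are hypothetical names.

```python
# Example phrases taken from the Hedging Language row above.
HEDGES = ["在一定程度上", "某种程度上", "通常情况下"]


def count_hedges(text: str, phrases=HEDGES) -> int:
    """Total non-overlapping occurrences of all hedging phrases."""
    return sum(text.count(p) for p in phrases)


def hedging_flagged(text: str, threshold: int = 5) -> bool:
    """Flag only when the count is strictly greater than the threshold."""
    return count_hedges(text) > threshold
```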
⚪ Style Signal (weight: 1.5)
| Category | Description |
|---|---|
| Uniform Paragraphs | Low CV in paragraph lengths |
| Low Burstiness | Monotonous sentence lengths |
| Emotional Flatness | Lack of emotional/personal expressions |
| Repetitive Starters | Same sentence starters >3 times |
| Low Entropy | Low character-level entropy (predictable text) |
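Three of the style signals are standard statistics, so they can be sketched without knowing the detector's internals. The code below assumes CV means population standard deviation divided by mean, that sentences end at 。!? (or their ASCII forms), and that entropy is Shannon entropy over non-whitespace characters. The function names are hypothetical and the thresholds the tool applies are not shown.

```python
import math
import re
import statistics
from collections import Counter


def coefficient_of_variation(lengths) -> float:
    """Population std dev divided by mean; 0.0 when undefined."""
    if len(lengths) < 2:
        return 0.0
    mean = statistics.fmean(lengths)
    if mean == 0:
        return 0.0
    return statistics.pstdev(lengths) / mean


def sentence_lengths(text: str) -> list:
    """Character length of each sentence, split on sentence-final marks."""
    parts = re.split(r"[。!?!?]+", text)
    return [len(p.strip()) for p in parts if p.strip()]


def char_entropy(text: str) -> float:
    """Shannon entropy in bits per character, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    total = len(chars)
    counts = Counter(chars)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

A low `coefficient_of_variation(sentence_lengths(text))` corresponds to the Low Burstiness row; applying the same function to paragraph lengths corresponds to Uniform Paragraphs.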
Sentence-Level Analysis
With -v (verbose) mode, the detector identifies the most AI-like sentences:
── 最可疑句子 ──
1. [16分] 随着人工智能技术的不断发展,在当今数字化转型时代...
原因: 数字化转型, 深度融合, 模板: 随着.*?的(不断)?发展
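The "原因" line lists the phrases and template regexes that matched a sentence. A rough sketch of how such a list could be assembled follows. It reuses the regex and two phrases from the sample output, but `reasons_for` is a hypothetical helper and the per-sentence point values are not modelled.

```python
import re

# Regex and phrases copied from the sample output above; the real lists
# live in the tool's pattern config and are much longer.
TEMPLATES = [re.compile(r"随着.*?的(不断)?发展")]
PHRASES = ["数字化转型", "深度融合"]


def reasons_for(sentence: str) -> list:
    """Return matched phrases first, then matched template patterns."""
    reasons = [p for p in PHRASES if p in sentence]
    reasons += [f"模板: {t.pattern}" for t in TEMPLATES if t.search(sentence)]
    return reasons


print(reasons_for("随着人工智能技术的不断发展,数字化转型正在加速。"))
```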
Humanization Engine
Transforms (applied in order)
- Structure cleanup — Remove three-part structure (首先/其次/最后)
- Phrase replacement — Context-aware replacement of AI phrases (regex patterns first, then plain text, longest-first matching)
- Sentence merge — Merge overly short consecutive sentences
- Sentence split — Split long sentences at natural breakpoints (但是/不过/同时)
- Punctuation normalization — Reduce excessive semicolons, em dashes
- Vocabulary diversification — Replace repeated words (进行/实现/提供 etc.) with synonyms
- Paragraph rhythm — Vary uniform paragraph lengths (merge short, split long)
- Casual injection — Add human expressions (scene-dependent)
- Paragraph shortening — For social/chat scenes
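The "longest-first matching" in the phrase replacement step matters when one AI phrase is a prefix of another. The sketch below shows only that plain-text step, under the assumption that each phrase maps to a single replacement. The replacement strings and the name `replace_longest_first` are made up for illustration, and the regex pass and random choice among alternatives are left out.

```python
import re


def replace_longest_first(text: str, replacements: dict) -> str:
    """Replace plain-text phrases, preferring the longest match at each position."""
    keys = sorted(replacements, key=len, reverse=True)
    if not keys:
        return text
    # Alternation tries branches left to right, so longer keys win.
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: replacements[m.group(0)], text)


# Hypothetical replacement table, not taken from patterns_cn.json.
table = {"综上": "总的看", "综上所述": "总之"}
print(replace_longest_first("综上所述,方案可行。", table))  # 总之,方案可行。
```

Without the length sort, "综上" could match first and leave a stray "所述" behind.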
Scenes
| Scene | Casualness | Best For |
|---|---|---|
| general | 0.3 | Default, balanced |
| social | 0.7 | Social media, short posts |
| tech | 0.3 | Tech blogs, tutorials |
| formal | 0.1 | Formal articles, reports |
| chat | 0.8 | Conversations, messaging |
Aggressive Mode (-a)
Adds 0.3 to the scene's casualness, uses more colloquial expressions, and restructures sentences more strongly. Typical score reduction: 60-80 points on heavily AI-generated text.
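To make the arithmetic concrete, here is a sketch of how scene casualness and the aggressive bonus might combine. The base values come from the Scenes table. Capping the result at 1.0 is an assumption (chat plus 0.3 would otherwise exceed 1), and `effective_casualness` is a hypothetical name.

```python
# Base values from the Scenes table.
SCENE_CASUALNESS = {
    "general": 0.3,
    "social": 0.7,
    "tech": 0.3,
    "formal": 0.1,
    "chat": 0.8,
}


def effective_casualness(scene: str = "general", aggressive: bool = False) -> float:
    """Scene base value, plus 0.3 in aggressive mode, capped at 1.0 (assumed)."""
    value = SCENE_CASUALNESS[scene]
    if aggressive:
        value += 0.3
    # Round to avoid float noise before capping.
    return min(1.0, round(value, 2))


print(effective_casualness("tech", aggressive=True))  # 0.6
```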
Reproducibility
Use --seed N for reproducible results (same input + seed = same output).
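The usual way to get this behaviour in Python is a dedicated `random.Random(seed)` instance, so that every random choice comes from one seeded stream. The sketch below illustrates the idea with a hypothetical synonym table. It is not the tool's implementation, and `pick_replacements` is an invented name.

```python
import random


def pick_replacements(words, alternatives, seed=None):
    """Choose one alternative per word from a private, seeded generator."""
    rng = random.Random(seed)  # seed=None falls back to system entropy
    return [rng.choice(alternatives.get(w, [w])) for w in words]


# Hypothetical synonym table for illustration only.
alts = {"进行": ["做", "开展", "搞"], "实现": ["做到", "达成"]}
words = ["进行", "实现", "提供"]
assert pick_replacements(words, alts, seed=42) == pick_replacements(words, alts, seed=42)
```

Using a private generator rather than `random.seed()` keeps the result independent of any other code that draws from the global generator.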
Writing Style Transforms
7 specialized Chinese writing styles:
| Style | Name | Description |
|---|---|---|
| casual | 口语化 | Like chatting with friends: natural, relaxed |
| zhihu | 知乎 | Rational, in-depth, personal opinions |
| xiaohongshu | 小红书 | Enthusiastic, emoji-rich, product-focused |
| wechat | 公众号 | Storytelling, engaging, relatable |
| academic | 学术 | Rigorous, precise, no colloquialisms |
| literary | 文艺 | Poetic, imagery-rich, metaphorical |
| weibo | 微博 | Short, opinionated, shareable |
Combine humanize + style
python scripts/humanize_cn.py text.txt --style xiaohongshu -o xhs.txt
This first humanizes (removes AI patterns) then applies the style transform.
External Configuration
All patterns, replacements, and scoring weights are in scripts/patterns_cn.json. Edit this file to:
- Add new AI vocabulary patterns
- Customize replacement alternatives
- Adjust scoring weights per severity
- Add regex patterns for template detection
- Set thresholds for hedging language detection
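Before editing the file it can help to see which top-level sections it contains. The schema of patterns_cn.json is not documented here, so the sketch below makes no assumption about key names. It only loads the JSON and reports each top-level key with the type of its value. `summarize_config` is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize_config(path) -> dict:
    """Return {top-level key: value type name} for a JSON object file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return {key: type(value).__name__ for key, value in data.items()}


if __name__ == "__main__":
    config = Path("scripts/patterns_cn.json")
    if config.exists():
        for key, kind in summarize_config(config).items():
            print(f"{key}: {kind}")
```

Running a JSON load like this after each edit also catches syntax errors (trailing commas, unescaped quotes) before the detector does.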
Scripts Reference
detect_cn.py
python scripts/detect_cn.py [file] [-j] [-s] [-v] [--sentences N]
| Flag | Description |
|---|---|
| -j | JSON output |
| -s | Score only (e.g. "72/100 (high)") |
| -v | Verbose: show worst sentences |
| --sentences N | Number of worst sentences to show (default: 5) |
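When scripting around the -s flag, the score line has to be parsed. The sketch below assumes the output looks like the example in the table, "72/100 (high)". The exact format is not otherwise specified, so the regex is a guess, and `parse_score_line` is a hypothetical helper.

```python
import re

# Matches "<number>/100 (<label>)", following the example in the table above.
SCORE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*/\s*100\s*\((.+?)\)")


def parse_score_line(line: str):
    """Return (score, label) from a line such as '72/100 (high)'."""
    match = SCORE_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized score line: {line!r}")
    return float(match.group(1)), match.group(2).strip()


print(parse_score_line("72/100 (high)"))  # (72.0, 'high')
```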
humanize_cn.py
python scripts/humanize_cn.py [file] [-o output] [--scene S] [--style S] [-a] [--seed N]
| Flag | Description |
|---|---|
| -o | Output file |
| --scene | general/social/tech/formal/chat |
| --style | casual/zhihu/xiaohongshu/wechat/academic/literary/weibo |
| -a | Aggressive mode |
| --seed | Random seed for reproducibility |
style_cn.py
python scripts/style_cn.py [file] --style S [-o output] [--seed N] [--list]
compare_cn.py
python scripts/compare_cn.py [file] [-o output] [--scene S] [--style S] [-a]
Shows score diff, category changes, and metric comparison before/after humanization.
Workflow
# 1. Check AI score
python scripts/detect_cn.py document.txt -v
# 2. Humanize with comparison
python scripts/compare_cn.py document.txt --scene tech -a -o clean.txt
# 3. Verify improvement
python scripts/detect_cn.py clean.txt -s
# 4. Optional: apply specific style
python scripts/style_cn.py clean.txt --style zhihu -o final.txt
Batch Processing
# Scan all files
for f in *.txt; do
  echo "=== $f ==="
  python scripts/detect_cn.py "$f" -s
done
# Transform all markdown
for f in *.md; do
  python scripts/humanize_cn.py "$f" --scene tech -a -o "${f%.md}_clean.md"
done
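The same batch transform can be driven from Python, which is convenient on systems without a POSIX shell. This is a sketch under a few assumptions: it is run from the repository root so that scripts/humanize_cn.py resolves, it uses only the flags documented above, and the helper names are hypothetical. It defaults to a dry run that only returns the commands.

```python
import subprocess
import sys
from pathlib import Path


def clean_name(path: Path) -> Path:
    """notes/a.md -> notes/a_clean.md, like ${f%.md}_clean.md in the shell loop."""
    return path.with_name(f"{path.stem}_clean{path.suffix}")


def humanize_command(path: Path, scene: str = "tech", aggressive: bool = True) -> list:
    """Build the humanize_cn.py argument list for one file."""
    cmd = [sys.executable, "scripts/humanize_cn.py", str(path), "--scene", scene]
    if aggressive:
        cmd.append("-a")
    cmd += ["-o", str(clean_name(path))]
    return cmd


def humanize_all(folder=".", pattern="*.md", dry_run=True) -> list:
    """Collect (and optionally run) one command per matching file."""
    files = sorted(Path(folder).glob(pattern))
    # Skip earlier outputs so a second run does not produce *_clean_clean.md.
    files = [p for p in files if not p.stem.endswith("_clean")]
    commands = [humanize_command(p) for p in files]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Call `humanize_all(dry_run=False)` to execute the commands. Unlike the shell loop, this version skips files that already end in `_clean`.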