GitHub Code Review Report (Chinese Format with AI Analysis)
Overview
Generate structured Chinese-format Excel reports from GitHub pull requests with AI-powered review analysis. Uses GitHub API via gh CLI to fetch PR data, filter by date range, branch, and specific reviewers, then leverages AI models to intelligently analyze review content and generate structured problem descriptions and impact analysis.
Key innovation: Two-phase workflow separates data collection from AI analysis, enabling true AI-powered review summarization instead of simple text truncation.
When to Use
Use when you need to:
- Generate code review reports for GitHub repositories in Chinese format
- Track review activity over time periods (this month, last week, last 15 days)
- Export review data to Chinese-format Excel with 13 specific columns
- Leverage AI for intelligent review analysis (problem description, impact analysis) instead of simple truncation
- Use Excel template file for consistent formatting
DO NOT use for:
- Local git repository analysis (use
git loginstead) - General PR statistics without review summaries
- Issues or other GitHub artifacts
- Simple keyword-based classification without AI understanding
Core Pattern
Before (without skill):
# No AI integration - simple truncation to 50 chars
# No template usage - create new Excel every time
# Inconsistent problem classification based on keyword matching
# No two-phase workflow (data + AI analysis mixed together)
After (with skill):
# Two-phase workflow: Data collection → AI analysis
# AI-powered intelligent summarization
# Template-based Excel generation (consistent formatting)
# Structured problem classification (15 types, severity levels)
Decision Flow
digraph github_review_flow {
rankdir=LR;
node [shape=box, style="rounded,filled", fillcolor=lightgray];
start [label="Start: Generate review report", fillcolor=lightgreen];
parse_time [label="Parse time range\n(this month, last week, etc.)"];
fetch_prs [label="Fetch PRs via\ngh pr list API"];
any_prs [label="Any PRs found?", shape=diamond, fillcolor=lightyellow];
no_prs [label="Report empty\nsuggest alternatives", fillcolor=lightcoral];
filter_reviewer [label="Filter by reviewer patterns\n(include/exclude)"];
fetch_details [label="Fetch PR details via\ngh pr view"];
extract_reviews [label="Extract original review\ncontent (not summarization)"];
ai_analyze [label="AI Analysis:\nAnalyze reviews and generate\nproblem description & impact", shape=diamond, fillcolor=lightblue];
use_template [label="Use Excel template\n(读取模板文件)", shape=diamond, fillcolor=lightblue];
generate_excel [label="Generate Excel with AI-processed data\n基于模板填充数据", fillcolor=lightblue];
done [label="✅ Deliver report", fillcolor=lightgreen];
start -> parse_time -> fetch_prs -> any_prs;
any_prs -> no_prs [label="No"];
any_prs -> filter_reviewer [label="Yes"];
filter_reviewer -> fetch_details -> extract_reviews;
extract_reviews -> ai_analyze;
ai_analyze -> use_template [label="Yes"];
ai_analyze -> generate_excel [label="No"];
use_template -> generate_excel -> done;
}
Key decisions:
AI Integration:
- When to use AI: Only for review analysis (problem description, impact analysis), NOT for data collection
- Data collection phase: Only use GitHub API, preserve original review content
- AI analysis phase: Each review → AI prompt → generate structured fields
- Template usage: Read template first, then append data (preserve formatting/styling)
Template file:
- Location:
resources/模块名-代码走查报告-template.xlsx - Contains: Empty rows with proper columns and formatting
- Purpose: Preserve formatting (width, colors, fonts, borders, header styles)
- Read before generating, write only new data rows
Font specification:
- Required font: "CESI宋体-GB18030" (CESI Songti GB18030)
- Template must use this font for Chinese characters to ensure proper rendering
- When creating template, apply this font to all cells containing Chinese text
- If template doesn't exist, programmatically set font when creating Excel
Two-phase workflow:
- Collect data (GitHub API) → Store original reviews
- AI analyze each review → Generate problem description, impact analysis
- Load template → Append AI-processed data → Save Excel
Template fallback: If template file doesn't exist, create basic template structure programmatically (columns only, no formatting)
Implementation
Step 1: Parse Time Range
Convert relative time expressions to absolute date ranges. Use parse_time_range() in generator.py.
| Input | Conversion |
|---|---|
| "这个月" | First day of current month to today |
| "上周" | Last Monday to last Sunday |
| "这半个月" | 15 days ago to today |
| "这个星期" | Last Monday to today |
| "YYYY-MM-DD to YYYY-MM-DD" | Use exact dates |
Step 2: Fetch PRs (Data Collection Phase)
Use gh pr list with search query:
gh pr list \
--repo linuxdeepin/dde-cooperation \
--state merged \
--search "merged:${start_date}..${end_date} base:master" \
--json number,title,author,createdAt,mergedAt,mergedBy,baseRefName,url,reviews,comments
Filter by branch: append base:master to search query.
Important: This is DATA COLLECTION only. Do NOT summarize or process reviews yet. Store original content.
Filter Version Update Commits:
Skip PRs that are version updates only. Use is_version_update_pr() to filter out:
- PR titles matching:
Update version to X.Y.Z,Bump version to X.Y.Z,Version X.Y.Z - PRs with only version changes in commits
- Commits by
deepin-ci-robot(considered invalid, see Step 5)
def is_version_update_pr(pr):
"""Check if PR is only about version update"""
title = pr.get('title', '').lower()
patterns = [
r'^update version to \d+\.\d+\.\d+',
r'^bump version to \d+\.\d+\.\d+',
r'^version \d+\.\d+\.\d+$',
]
for pattern in patterns:
if re.search(pattern, title):
return True
return False
# Filter out version update PRs
if is_version_update_pr(pr):
continue
Step 3: Filter Reviewers
Use should_include_reviewer() in generator.py with fnmatch patterns:
| Pattern | Matches | Does NOT Match |
|---|---|---|
sourcery-* | sourcery-ai, sourcery-bot | liuzheng |
*-bot | any name ending with "-bot" | sourcery-ai, liuzheng |
Rule: Exclude checked first, then include. Include requires matching an include pattern AND not matching any exclude pattern.
Important: Only validate reviewer identity, NOT review content quality. Keep all valid reviews for AI analysis.
Reviewer Definitions:
- 提出人 (Proposed by): The person who submitted the review comments on the PR
- 解决人 (Resolved by): The person who submitted/merged the PR (mergedBy field)
These fields are distinct - one provides review feedback, the other implements the changes.
Priority: Human over AI: When both human and AI reviews exist on the same PR:
- If a human reviewer (e.g., zhangs, liuzheng) added review comments: ONLY include human reviews, ignore AI reviews
- If only AI reviews exist (e.g., sourcery-ai): Include those
- AI reviewers to exclude: sourcery-ai, sourcery-bot, deepin-ci-robot
- Example: If PR has reviews from zhangs (human) and sourcery-ai (AI), only include zhangs's review
def should_include_reviewer(author):
"""Check if reviewer should be included in report"""
# Always exclude invalid automated accounts
EXCLUDE_REVIEWERS = [
'deepin-ci-robot',
'sourcery-ai',
'sourcery-bot',
]
if author in EXCLUDE_REVIEWERS:
return False
# Include by default for other users
return True
def prioritize_human_reviews(reviews):
"""Filter reviews to prioritize human over AI"""
# Check if any human review exists
has_human = any(
not should_exclude_as_ai(review['author']['login'])
for review in reviews
)
# If human reviews exist, exclude all AI reviews
if has_human:
return [
review for review in reviews
if not should_exclude_as_ai(review['author']['login'])
]
else:
# If only AI reviews, include all
return reviews
Reviewer Definitions:
- 提出人 (Proposed by): The person who submitted the review comments on the PR
- 解决人 (Resolved by): The person who submitted/merged the PR (mergedBy field)
These fields are distinct - one provides review feedback, the other implements the changes.
Step 4: Fetch PR Details (Data Collection Phase)
For each PR, use gh pr view to get full data:
gh pr view ${pr_number} --repo linuxdeepin/dde-cooperation \
--json reviews,comments
Important: Collect ALL reviews (valid and invalid), but only send VALID reviews to AI. This allows AI to judge validity.
Ignore Invalid Commits:
- deepin-ci-robot: All commits by this user are considered invalid and should be excluded
- Version updates: PRs with titles matching version update patterns (see Step 2)
- These should be filtered out before review extraction to avoid cluttering the report
Step 5: Extract Original Reviews
Extract review data WITHOUT summarization. Store original content for AI analysis.
# In extract_review_suggestions():
for review in pr_data.get('reviews', []):
author = review.get('author', {}).get('login', '')
body = review.get('body', '') # Original content, NOT summarized
state = review.get('state', '')
# Apply reviewer filter
if not should_include_reviewer(author):
continue
# Store for AI analysis
valid_reviews.append({
'original_content': body, # Original text
'reviewer': author,
'review_time': review.get('submittedAt', ''),
'pr_number': pr.get('number', ''),
})
Step 6: AI Analysis (AI Integration Phase)
For each valid review, generate AI-powered analysis:
AI Prompt Template:
你是代码review专家。请分析以下GitHub review内容,并提供结构化分析:
**原始Review内容:**
${original_review_content}
**Review上下文:**
- PR标题: ${pr_title}
- Review者: ${reviewer}
- Review时间: ${review_time}
**任务:**
1. 判断这是否为有效的代码review(不是简单的批准或自动评论)
2. 提取核心问题点
3. 分类问题类型(从15种类型中选择)
4. 评估严重程度(严重或一般)
5. 判断问题来源(commit log=1, 代码=2, 注释=3)
- 默认为"代码"(2)
- 只有当review明确提到注释问题才选"注释"(3)
- 只有当review明确提到commit信息/格式问题才选"commit log"(1)
6. 生成中文问题描述(≤50字)
7. 生成影响分析(≤20字)
8. 如果无明显影响,填"无"
**输出格式(JSON):**
```json
{
"is_valid_review": true/false,
"problem_type": 1-15,
"problem_source": 1/2/3,
"problem_description": "中文问题描述≤50字",
"impact_analysis": "中文影响分析≤20字(或"无")",
"confidence": high/medium/low,
"reasoning": "简要说明判断依据"
}
Implementation:
# In generate_review_report():
for review in valid_reviews:
ai_prompt = build_ai_prompt(review, pr_info)
# Call AI model via skill_mcp or subprocess
ai_result = call_ai_analysis(ai_prompt)
# Parse JSON result
result = json.loads(ai_result)
# Add to processed list
processed_reviews.append({
'序号': serial_number,
'包名': module_name,
'仓库地址': f"https://github.com/{repo}",
'代码提交地址': pr.get('url', ''),
'问题来源': PROBLEM_SOURCES.get(result.get('problem_source', 2), '代码'),
'问题描述': result['problem_description'],
'严重程度': result.get('severity', '一般'),
'影响分析': result['impact_analysis'],
'问题类型': PROBLEM_TYPES.get(result['problem_type'], '其他'),
'提出人': review['reviewer'], # Reviewer who submitted the review comment
'提出时间': review_date_only,
'解决人': merged_by, # Person who submitted/merged the PR
'计划解决时间': merged_date_only,
'实际解决时间': merged_date_only,
'提出人确认是否验收通过': "是",
'问题状态': problem_status, # "Close" if merged, "Open" otherwise
})
AI Models Available:
- Claude: Use direct reasoning (for small datasets)
- OpenAI: Use
openaiMCP or API - DeepSeek: Use deepseek API or MCP
- Local LLM: Use local model APIs
Fallback: If AI unavailable, use original content as-is (no analysis)
Step 7: Load Template
# Read template file
template_path = 'resources/模块名-代码走查报告-template.xlsx'
df_template = pd.read_excel(template_path, engine='openpyxl')
# Verify template structure
required_columns = ['序号', '包名', '仓库地址', '代码提交地址', '问题来源', '问题描述', '严重程度', '影响分析', '问题类型', '提出人', '提出时间', '解决人', '计划解决时间', '实际解决时间', '提出人确认是否验收通过', '问题状态']
if set(required_columns) != set(df_template.columns):
print("⚠️ Template column mismatch. Creating basic template.")
df_template = pd.DataFrame(columns=required_columns)
# If creating template, set required font
from openpyxl.styles import Font
# Load workbook to set styles
wb = load_workbook(template_path) if os.path.exists(template_path) else Workbook()
ws = wb.active
# Set font for header and data rows
chinese_font = Font(name='CESI宋体-GB18030', size=11)
for row in ws.iter_rows():
for cell in row:
cell.font = chinese_font
wb.save(template_path)
Template preservation rules:
- Read template with formatting
- Append data using
pd.concat()or direct indexing - Save with
to_excel(index=False, engine='openpyxl') - Template styling (widths, borders, colors, header styles) will be preserved
- Font requirement: Use "CESI宋体-GB18030" for all Chinese text
Step 8: Generate Excel Report
# Combine template data with AI-processed reviews
df_final = pd.concat([df_template, df_processed], ignore_index=True)
# Save to Excel
output_file = f"{module_name}-代码走查报告-{yyyymmdd}.xlsx"
df_final.to_excel(output_file, index=False, engine='openpyxl')
Quick Reference
Two-Phase Workflow
| Phase | Purpose | Tools |
|---|---|---|
| 数据收集 | Fetch PRs via gh pr list/view, extract original reviews | GitHub CLI, generator.py |
| AI分析 | Analyze reviews, generate problem description & impact analysis | AI models (Claude, OpenAI, DeepSeek) |
| 报表生成 | Load template, append data, preserve formatting | pandas, openpyxl |
AI Prompt Template Structure
AI_PROMPT_TEMPLATE = """
你是代码review专家。请分析以下GitHub review内容,并提供结构化分析:
**原始Review内容:**
{original_content}
**Review上下文:**
- PR标题: {pr_title}
- Review者: {reviewer}
- Review时间: {review_time}
**任务:**
1. 判断这是否为有效的代码review(不是简单的批准或自动评论)
2. 提取核心问题点
3. 分类问题类型(从15种类型中选择)
4. 评估严重程度(严重或一般)
5. 判断问题来源(commit log=1, 代码=2, 注释=3)
- 默认为"代码"(2)
- 只有当review明确提到注释问题才选"注释"(3)
- 只有当review明确提到commit信息/格式问题才选"commit log"(1)
6. 生成中文问题描述(≤50字)
7. 生成影响分析(≤20字)
8. 如果无明显影响,填"无"
**输出格式(JSON):**
{{
"is_valid_review": true/false,
"problem_type": 1-15,
"problem_source": 1/2/3,
"problem_description": "中文问题描述≤50字",
"impact_analysis": "中文影响分析≤20字(或"无")",
"confidence": high/medium/low,
"reasoning": "简要说明判断依据"
}
"""
Excel Report Columns (13 total)
| 列名 | 说明 | 来源 |
|---|---|---|
| 序号 | 自增序号(1,2,3...) | Auto-increment |
| 包名 | --module-name 参数 | 用户指定 |
| 仓库地址 | https://github.com/{repo} | GitHub URL |
| 代码提交地址 | pr['url'] | PR链接 |
| 问题来源 | 映射表(1-3,默认为代码2) | commit log(1) / 代码(2) / 注释(3) |
| 问题描述 | AI生成的中文描述(≤50字) | AI分析结果 |
| 严重程度 | 问题类型8(安全)、12(内存)="严重",其他="一般" | AI判断 |
| 影响分析 | AI生成的影响分析(≤20字或"无") | AI分析结果 |
| 问题类型 | 映射表(1-15) | AI分类结果 |
| 提出人 | reviewer登录名(提出review备注的人) | GitHub API |
| 提出时间 | YYYY-MM-DD格式 | review['submittedAt'] |
| 解决人 | mergedBy登录名(提交PR的人) | GitHub API |
| 计划解决时间 | mergedAt YYYY-MM-DD | GitHub API |
| 实际解决时间 | mergedAt YYYY-MM-DD | GitHub API |
| 提出人确认是否验收通过 | 固定值"是" | 硬编码 |
| 问题状态 | merged="Close", not merged="Open" | 判断逻辑 |
Time Range Examples
| Expression | Meaning |
|---|---|
this month | 1st of current month to today |
last month | 1st of last month to last day of last month |
last week | Last Monday to last Sunday |
this week | Last Monday to today |
last 15 days | 15 days ago to today |
gh PR Commands
# Basic query
gh pr list --repo linuxdeepin/dde-cooperation --state merged
# With date and branch filters
gh pr list --repo linuxdeepin/dde-cooperation --state merged \
--search "merged:2026-01-01..2026-01-31 base:master"
# With JSON output
gh pr list --repo linuxdeepin/dde-cooperation --state merged \
--json number,title,author,createdAt,mergedAt,mergedBy,baseRefName,url,reviews,comments
# Fetch single PR details
gh pr view 123 --repo linuxdeepin/dde-cooperation \
--json reviews,comments
Common Mistakes
| Mistake | Why It's Wrong | Fix |
|---|---|---|
| Not using AI for review analysis | AI analysis is the core feature, not optional | Use AI to analyze each review and generate structured fields |
| Creating Excel without template | Loses formatting consistency | Read template first, then append data |
| Summarizing reviews manually | Misses AI's intelligent analysis | Let AI do the analysis |
| Not preserving template formatting | Template has styling for a reason | Use pd.concat() to append, preserve formatting |
| Mixing data collection with AI analysis | Two separate phases | Separate concerns: first collect all data, then analyze with AI |
| Not validating reviews before AI | AI can judge validity itself | Send all to AI, let it determine validity |
| Processing reviews individually | Slows down AI | Batch process: collect all, then send to AI as array |
| Missing error handling | AI may fail | Try-catch and provide meaningful error messages |
| Wrong default problem source value | Default should be "代码"(2), not "注释"(3) | Use result.get('problem_source', 2) and map correctly |
| Using Chinese for "问题状态" | Must be English "Close" or "Open" | Change merged="关闭" to merged="Close" |
| Not filtering version update PRs | Clutters report with non-code changes | Filter out "Update version to X.Y.Z" PRs |
| Including deepin-ci-robot commits | Automated commits are not real reviews | Exclude deepin-ci-robot entirely |
| Not prioritizing human reviews | AI reviews are lower quality than humans | Filter out AI reviews when human reviews exist |
| Not using correct Chinese font | Text rendering may break | Use "CESI宋体-GB18030" for all Chinese text |
Rationalization Table
| Excuse | Reality |
|---|---|
| "AI integration is complex" | Two-phase design is cleaner and more powerful |
| "Template file is unused" | It preserves formatting and ensures consistency |
| "AI might be slow" | AI analysis is worth the quality improvement |
| "Can just use keyword matching" | Simple matching misses nuance |
| "Need to implement AI model" | Use available AI tools or services |
| "JSON parsing is error-prone" | AI returns structured JSON |
| "Chinese text works with default font" | Chinese characters may not render correctly |
| "deepin-ci-robot commits should be included" | They are automated, not human reviews |
| "AI reviews are useful too" | Human reviews have higher quality and priority |
| "Version updates need review too" | They are routine changes, not code review |
Red Flags - STOP and Start Over
- Creating Excel without reading template
- Processing reviews before AI analysis (should be two-phase)
- Generating "问题描述" and "影响分析" without AI
- Not preserving template formatting when using it
- Validating reviews in data collection phase (AI should do it)
- Not batching AI requests (slow and inefficient)
- Using simple truncation instead of AI analysis
- Changing reviewer/column specifications without skill update
- Hardcoding problem source default value to anything other than "代码"(2)
- Not letting AI determine problem source (use AI's judgment, don't assume)
- Using Chinese for "问题状态" (must be "Close" or "Open")
- Not ignoring deepin-ci-robot commits
- Including AI reviews when human reviews exist
- Not using "CESI宋体-GB18030" font for Chinese text
All of these mean: STOP. Re-read skill. Start over with correct two-phase approach.