paper-analysis-evidence

Structured academic paper analysis from local paper files or paper URLs, adapted from a Dify Scheme A workflow. Use when the user asks to analyze PDF/DOCX/text/HTML academic papers; extract title, task, background, problem, method, datasets, baselines, metrics, results, ablations, limitations, or contributions; cite evidence spans; verify consistency against the original paper; or export paper analysis reports. Supports Chinese or English output and saves downloaded inputs, intermediate files, generated JSON, and Markdown, HTML, and DOCX reports under the Ubuntu desktop.

Install skill "paper-analysis-evidence" with this command: npx skills add paper-summary-json

Paper Analysis Evidence

Purpose

Run the Scheme A evidence-enhanced paper analysis workflow: prepare paper inputs, split the paper into key sections, generate structured extraction JSON, verify the extraction against the original text, and render final reports.

This skill is based on the uploaded Dify workflow 论文分析系统_方案A_结构化证据增强版 (Paper Analysis System, Scheme A: structured evidence-enhanced edition).

Runtime file policy

Always save runtime downloads and generated outputs under the Ubuntu desktop unless the user explicitly requests another location:

~/Desktop/paper_analysis_results/<YYYYMMDD_HHMMSS>/

Do not modify the original local paper file. Copy it into the work directory before extraction. Download URL inputs into the same batch work directory.
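
A minimal sketch of creating that batch directory (plain Python; the skill's scripts may already handle this, so treat the snippet as illustrative):

from datetime import datetime
from pathlib import Path

# Timestamped batch directory under the desktop, matching the convention above
batch_dir = (Path.home() / "Desktop" / "paper_analysis_results"
             / datetime.now().strftime("%Y%m%d_%H%M%S"))
batch_dir.mkdir(parents=True, exist_ok=True)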

Inputs

Accept:

  • language: 中文 (Chinese) or 英文 (English); defaults to 中文 when unspecified.
  • paper_files: one or more local paper files, preferably PDF, DOCX, TXT, MD, or HTML.
  • paper_urls: one or more PDF/direct paper URLs, comma-separated or repeated.

If both local files and URLs are empty, stop with this message (i.e., the uploaded files and paper URLs cannot both be empty):

上传的文件和论文URL不能同时为空。

Workflow

1. Prepare inputs and sections

Run:

python scripts/prepare_papers.py --language 中文 --files /path/to/paper.pdf --urls "https://example.com/paper.pdf"

Use only the relevant arguments. For URL-only runs, omit --files; for local-only runs, omit --urls.
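
For example, an English-output, URL-only run (placeholder URLs, comma-separated as described under Inputs):

python scripts/prepare_papers.py --language 英文 --urls "https://example.com/a.pdf,https://example.com/b.pdf"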

The script creates manifest.json and one work directory per paper (a layout sketch follows the list). It performs:

  1. local file copy or URL download,
  2. raw text extraction,
  3. text cleaning,
  4. section splitting into abstract, intro, method, experiment, conclusion, and paper_body,
  5. prompt file generation.
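
Putting the paths from this document together, each batch ends up roughly shaped like this (per-paper directory naming is decided by prepare_papers.py, so the layout is illustrative):

~/Desktop/paper_analysis_results/<YYYYMMDD_HHMMSS>/
  manifest.json
  <paper_work_dir>/
    prompts/01_structured_extraction_prompt.md
    prompts/02_verification_prompt_template.md
    generated/structured_result.json
    generated/verification_result.json
    report/final_report.md
    report/final_report.html
    report/final_report.docx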

2. Generate structured extraction JSON

For each paper in manifest.json, read:

prompts/01_structured_extraction_prompt.md

Send that prompt to the model. Save the model response exactly as JSON-only content to:

generated/structured_result.json
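
Models sometimes wrap JSON in a markdown fence, so a save step that strips one and fails fast on non-JSON can help. A hedged sketch (hypothetical helper, not part of the skill's scripts):

import json
import re
from pathlib import Path

def save_json_only(response_text: str, out_path: str) -> None:
    # Drop a wrapping ```json ... ``` fence if the model added one
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", response_text.strip())
    data = json.loads(text)  # fail fast if the response is not valid JSON
    Path(out_path).write_text(
        json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8"
    )

# Example: save_json_only(model_response, "generated/structured_result.json")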

Required JSON fields:

{
  "title": "",
  "task": "",
  "background": "",
  "problem_statement": "",
  "method_name": "",
  "method_core": "",
  "datasets": [],
  "baselines": [],
  "metrics": [],
  "main_results": [
    {"dataset": "", "metric": "", "value": "", "baseline": "", "improvement": ""}
  ],
  "ablations": [],
  "limitations": [],
  "claims": [],
  "contributions": [],
  "evidence_spans": [
    {"field": "", "claim": "", "evidence": ""}
  ]
}

Extraction rules (a validation sketch follows the list):

  • Only use information present in, or directly inferable from, the paper.
  • Prefer corresponding sections, but fall back to the full paper_body when a section is empty or insufficient.
  • Do not leave datasets, baselines, or metrics empty just because the experiment section is weak; first check paper_body, result text, implementation details, and table-neighboring text.
  • Use empty strings or arrays only when the full paper text truly lacks the information.
  • Provide at least 6 evidence spans. Each evidence span must be a direct quote or a very close paraphrase from the source text.
  • Prioritize numeric results from experiment, results, analysis, implementation details, or table-neighboring text.
  • Keep JSON keys in English. Natural-language values must use the selected output language.
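
A hedged sketch (hypothetical helper, not part of the skill) that checks the saved extraction against the rules above before moving on to verification:

import json
from pathlib import Path

REQUIRED_KEYS = {
    "title", "task", "background", "problem_statement", "method_name",
    "method_core", "datasets", "baselines", "metrics", "main_results",
    "ablations", "limitations", "claims", "contributions", "evidence_spans",
}

def check_structured_result(path: str) -> None:
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = REQUIRED_KEYS - data.keys()
    assert not missing, f"missing fields: {missing}"
    # Rule above: at least 6 evidence spans, each quoting or closely paraphrasing the source
    assert len(data["evidence_spans"]) >= 6, "need at least 6 evidence spans"

check_structured_result("generated/structured_result.json")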

3. Run consistency verification

Open:

prompts/02_verification_prompt_template.md

Replace {{structured_json}} with the actual content of generated/structured_result.json. Send the complete verification prompt to the model and save JSON-only output to:

generated/verification_result.json
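
A minimal sketch of that substitution (plain string replacement, assuming the template contains the literal placeholder {{structured_json}}):

from pathlib import Path

template = Path("prompts/02_verification_prompt_template.md").read_text(encoding="utf-8")
structured = Path("generated/structured_result.json").read_text(encoding="utf-8")
# Fill the placeholder with the raw JSON produced in step 2
verification_prompt = template.replace("{{structured_json}}", structured)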

Required verification JSON:

{
  "overall_score": 0,
  "hallucination_risk": "low/medium/high",
  "issues": [
    {"field": "", "problem": "", "severity": "low/medium/high"}
  ],
  "verified_claims": [
    {"claim": "", "status": "supported/weak/unsupported", "evidence": ""}
  ],
  "final_verdict": ""
}

Verification rules (a sanity-check sketch follows the list):

  • Score 5: nearly no hallucination, strong evidence.
  • Score 4: minor imprecision.
  • Score 3: several claims lack evidence.
  • Score 2: clear inconsistency exists.
  • Score 1: substantial hallucination or misreading.
  • Focus on omitted or incorrect datasets, baselines, metrics, and main results.
  • If the structured extraction uses an empty array/string for information that exists in the original paper, explicitly list that in issues.
  • Provide at least 4 verified claims.
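
A matching sketch (again hypothetical, not part of the skill) to sanity-check the saved verification result against these rules:

import json
from pathlib import Path

v = json.loads(Path("generated/verification_result.json").read_text(encoding="utf-8"))
# Mirror the rubric and rules above
assert v["overall_score"] in {1, 2, 3, 4, 5}, "score must be 1-5"
assert v["hallucination_risk"] in {"low", "medium", "high"}
assert len(v["verified_claims"]) >= 4, "need at least 4 verified claims"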

4. Render reports

After structured_result.json and verification_result.json are saved for every paper, run:

python scripts/render_report.py --manifest ~/Desktop/paper_analysis_results/<YYYYMMDD_HHMMSS>/manifest.json

Outputs per paper:

report/final_report.md
report/final_report.html
report/final_report.docx

The .md file preserves editable Markdown source. The .html file is the rendered visual version. The .docx file is the Word-compatible report.

Report structure

Chinese report sections (the report itself uses the Chinese headings; English glosses in parentheses):

  1. 论文题目 (paper title)
  2. 任务与问题 (task and problem)
  3. 方法概述 (method overview)
  4. 实验要素:数据集、基线方法、评价指标 (experimental setup: datasets, baselines, metrics)
  5. 主要结果 (main results)
  6. 贡献提炼 (contributions)
  7. 消融与局限性 (ablations and limitations)
  8. 证据片段 (evidence spans)
  9. 一致性校验:总评分、幻觉风险、最终结论、已核验结论、发现的问题 (consistency check: overall score, hallucination risk, final verdict, verified claims, issues found)

English reports mirror the same nine-section structure, with headings rendered in English.

References

  • Use references/prompt_templates.md when prompt details are needed.
  • Use references/workflow_mapping.md when checking how the Dify nodes map to this skill.
  • references/dify_scheme_a_source.yml preserves the uploaded Dify DSL source for auditability.
