paperbanana

Generate publication-quality academic diagrams from paper methodology text

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "paperbanana" with this command: npx skills add dwzhu-pku/paperbanana

PaperBanana

Generate publication-quality academic diagrams and pipeline figures from a paper's methodology section and figure caption. PaperBanana orchestrates a multi-agent pipeline (Retriever, Planner, Stylist, Visualizer, Critic) to produce camera-ready figures suitable for venues like NeurIPS, ICML, and ACL.

Environment Setup

cd <repo-root>
uv pip install -r requirements.txt

Set your API key either through an environment variable or in configs/model_config.yaml.

Option 1 (Recommended): OpenRouter API key — one key for both text reasoning and image generation:

export OPENROUTER_API_KEY="sk-or-v1-..."

Option 2: Google API key, for direct access to the Gemini API:

export GOOGLE_API_KEY="your-key-here"

If both keys are configured, OpenRouter is used by default.
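
The precedence rule above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not PaperBanana's actual code: the function name `pick_provider` and the returned labels are hypothetical, and the sketch ignores keys set in configs/model_config.yaml.

```python
import os


def pick_provider(env=None):
    """Return which provider this sketch assumes would be used.

    Hypothetical helper: only the two environment variables named in the
    setup instructions are checked; the YAML config is not read.
    """
    env = os.environ if env is None else env
    if env.get("OPENROUTER_API_KEY"):
        # OpenRouter wins when both keys are present.
        return "openrouter"
    if env.get("GOOGLE_API_KEY"):
        return "google"
    return None  # No key found in the environment.


if __name__ == "__main__":
    print(pick_provider())
```

Calling `pick_provider()` before a long run is a cheap way to confirm that the intended key is visible to the process.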

Usage

python skill/run.py \
  --content "METHOD_TEXT" \
  --caption "FIGURE_CAPTION" \
  --task diagram \
  --output output.png

Parameters

| Parameter | Required | Default | Description |
|---|---|---|---|
| --content | Yes* | | Method section text to visualize |
| --content-file | Yes* | | Path to a file containing the method text (alternative to --content) |
| --caption | Yes | | Figure caption or visual intent |
| --task | No | diagram | Task type: diagram |
| --output | No | output.png | Output image file path |
| --aspect-ratio | No | 21:9 | Aspect ratio: 21:9, 16:9, or 3:2 |
| --max-critic-rounds | No | 3 | Maximum critic refinement iterations |
| --num-candidates | No | 10 | Number of parallel candidates to generate |
| --retrieval-setting | No | auto | Retrieval mode: auto, manual, random, or none |
| --main-model-name | No | gemini-3.1-pro-preview | Main model for VLM agents. Provider auto-detected from configured API key |
| --image-gen-model-name | No | gemini-3.1-flash-image-preview | Model for image generation. Also supports gemini-3-pro-image-preview |
| --exp-mode | No | demo_full | Pipeline: demo_full (with Stylist) or demo_planner_critic (without Stylist) |

*One of --content or --content-file is required.

When --num-candidates > 1, output files are named <stem>_0.png, <stem>_1.png, etc.
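
As a rough illustration of that naming scheme, the expected file names can be computed ahead of time. The helper `candidate_paths` is hypothetical, and two details are assumptions rather than documented behaviour: that the original file extension is kept, and that a single candidate is written to the exact --output path.

```python
from pathlib import Path


def candidate_paths(output, num_candidates):
    """List the output files this sketch expects for a given run."""
    out = Path(output)
    if num_candidates <= 1:
        # Assumption: a single candidate is saved at the path as given.
        return [out]
    # <stem>_0.<ext>, <stem>_1.<ext>, ... next to the requested path.
    return [
        out.with_name(f"{out.stem}_{i}{out.suffix}")
        for i in range(num_candidates)
    ]


if __name__ == "__main__":
    for path in candidate_paths("output.png", 3):
        print(path)
```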

Output

The absolute path of each saved image is printed to stdout, one per line.
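
Because the paths arrive one per line, a calling script can collect them with a small parser. The sketch below assumes stdout contains nothing but those paths; if the script also prints log lines there, extra filtering would be needed. The name `parse_saved_paths` is hypothetical.

```python
from pathlib import Path


def parse_saved_paths(stdout_text):
    """Turn captured stdout into a list of Path objects, skipping blank lines."""
    return [
        Path(line.strip())
        for line in stdout_text.splitlines()
        if line.strip()
    ]


if __name__ == "__main__":
    sample = "/tmp/figs/output_0.png\n/tmp/figs/output_1.png\n"
    for path in parse_saved_paths(sample):
        print(path.name)
```

The text passed in would typically come from `subprocess.run(..., capture_output=True, text=True).stdout`.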

Examples

Diagram

python skill/run.py \
  --content "We propose a transformer-based encoder-decoder architecture. The encoder consists of 12 self-attention layers with residual connections. The decoder uses cross-attention to attend to encoder outputs and generates the target sequence autoregressively." \
  --caption "Figure 1: Overview of the proposed transformer architecture" \
  --task diagram \
  --output architecture.png
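
When the method text is long, it may be easier to launch the same command from Python and pass the text through --content-file. The following is a minimal sketch: `build_command` is a hypothetical helper, it uses only flags listed in the Parameters table, and rejecting the case where both --content and --content-file are given is this sketch's own choice, since the listing only says that one of them is required.

```python
def build_command(caption, content=None, content_file=None,
                  output="output.png", task="diagram"):
    """Build the argument list for skill/run.py (hypothetical wrapper)."""
    # Sketch's choice: insist on exactly one source of method text.
    if (content is None) == (content_file is None):
        raise ValueError("pass exactly one of content or content_file")
    cmd = ["python", "skill/run.py"]
    if content is not None:
        cmd += ["--content", content]
    else:
        cmd += ["--content-file", str(content_file)]
    cmd += ["--caption", caption, "--task", task, "--output", output]
    return cmd


if __name__ == "__main__":
    # "method.txt" is a placeholder file name for illustration.
    print(build_command("Figure 1: Overview", content_file="method.txt"))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from the repository root; passing a list avoids shell quoting problems with long method text.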

Important Notes

  • Runtime: A single candidate typically takes 3-10 minutes, depending on the model and network conditions. With the default 10 candidates running in parallel, expect roughly 10-30 minutes in total. Plan accordingly.
  • API calls: Each candidate involves multiple LLM calls (Retriever + Planner + Stylist + Visualizer + up to 3 Critic rounds). Candidates run in parallel for efficiency.
  • Image generation: The Visualizer agent calls an image generation model (Gemini Image) to render diagrams.
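
For budgeting, the notes above suggest a rough upper bound on model calls per run. The sketch below counts each agent stage and each critic round as exactly one call. That is a simplifying assumption, not something the listing states: a critic round may well trigger additional calls (for example, a re-render), so treat the result as an order-of-magnitude estimate. The function name is hypothetical.

```python
def max_llm_calls(num_candidates=10, max_critic_rounds=3, use_stylist=True):
    """Rough upper bound on model calls, one call per stage (assumption)."""
    stages = ["Retriever", "Planner", "Visualizer"]
    if use_stylist:
        # demo_full includes the Stylist; demo_planner_critic does not.
        stages.append("Stylist")
    per_candidate = len(stages) + max_critic_rounds
    return per_candidate * num_candidates


if __name__ == "__main__":
    print(max_llm_calls())                   # defaults: (4 + 3) * 10 = 70
    print(max_llm_calls(use_stylist=False))  # without Stylist: (3 + 3) * 10 = 60
```

Lowering --num-candidates is the most direct way to shrink this number, since every candidate repeats the full pipeline.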

About

PaperBanana is based on the PaperVizAgent framework, a reference-driven multi-agent system for automated academic illustration. It was developed as part of the research paper:

PaperBanana: Automating Academic Illustration for AI Scientists
Dawei Zhu, Rui Meng, Yale Song, Xiyu Wei, Sujian Li, Tomas Pfister, Jinsung Yoon
arXiv:2601.23265

The framework introduces a collaborative team of five specialized agents — Retriever, Planner, Stylist, Visualizer, and Critic — to transform raw scientific content into publication-quality diagrams. Evaluation is conducted on the PaperBananaBench benchmark.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
