paper-banana

Generates publication-ready academic illustrations using the PaperBanana five-agent pipeline (arXiv:2601.23265). Diagram mode runs 5 Gemini API agents with multimodal in-context learning from curated reference images. Plot mode generates executable Python matplotlib/seaborn code. Use when the user asks for: research paper figures, academic diagrams, methodology illustrations, architecture diagrams, statistical plots, conference-quality visualizations, flowcharts for papers, NeurIPS/ICML/CVPR figures, or improving existing paper figures.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "paper-banana" with this command: npx skills add javidmardanov/paper-banana-skill/javidmardanov-paper-banana-skill-paper-banana

PaperBanana: Academic Illustration Pipeline

Automates publication-ready academic illustrations via 5 specialized agents, each implemented as a separate Gemini API call: Retriever (categorize & select references) -> Planner (multimodal description) -> Stylist (polish) -> Visualizer (render) -> Critic (evaluate & refine).

Two output modes:

  • DIAGRAM MODE: Each agent is a Python script calling Gemini VLM/image APIs. Run scripts/orchestrate.py for end-to-end execution.
  • PLOT MODE: Statistical plots generated as executable Python matplotlib/seaborn code (code-based to eliminate data hallucination).

Requirements: GOOGLE_API_KEY env var (used for VLM calls in retriever/planner/stylist/critic AND image generation in visualizer), Python 3.10+ with google-genai, matplotlib, seaborn, numpy, pillow.

Paper: PaperBanana: Automating Academic Illustrations with Multi-Agent Systems (arXiv:2601.23265, Google/PKU)


Step 1: Determine Output Mode

Decide which track to follow:

  • User provides raw data, a table, or a CSV plus visual intent (bar chart, scatter, etc.) → PLOT MODE
  • User provides methodology text, a description, or a figure caption → DIAGRAM MODE
  • User provides an existing figure to improve → match the original figure's type

Critical rule: PLOT MODE always generates Python code (never image generation for data visualizations). Code-based generation eliminates data hallucination errors that corrupt numerical accuracy in image-based approaches.


Step 2: Execute Pipeline

DIAGRAM MODE — Automated Pipeline

Primary entry point: Run the end-to-end orchestrator:

python scripts/orchestrate.py \
  --methodology-file methodology.txt \
  --caption "Figure 1: Overview of proposed framework" \
  --mode diagram \
  --output output/diagram.png

Or with inline text:

python scripts/orchestrate.py \
  --methodology "Our framework consists of three modules..." \
  --caption "Figure 1: System overview" \
  --mode diagram \
  --output output/diagram.png

The orchestrator chains all 5 agents automatically and handles the Critic's refinement loop (up to 3 iterations). Intermediate outputs are saved to output/work/ for inspection.

Pipeline Details

Read references/DIAGRAM-PROMPTS.md for the actual Gemini prompt templates used by each agent.

Phase 1: RETRIEVER (scripts/retriever.py) — Gemini VLM call

  • Classifies methodology into 1 of 4 categories from references/DIAGRAM-CATEGORIES.md
  • Selects 2 most relevant reference diagrams from the 13 curated examples in assets/references/
  • Identifies visual intent: Framework Overview, Pipeline/Flow, Detailed Module, Architecture Diagram

Phase 2: PLANNER (scripts/planner.py) — Multimodal Gemini VLM call

  • Sends the 2 selected reference images + methodology text to Gemini as a multimodal prompt
  • The VLM "sees" what good methodology diagrams look like (in-context learning from images)
  • Generates an extremely detailed textual description of the target diagram
  • Critical: describe all visual attributes in natural language only; never hex codes or pixel dimensions
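A minimal sketch of how the Planner's text prompt could be assembled; the wording here is hypothetical, and the real templates live in references/DIAGRAM-PROMPTS.md:

```python
def build_planner_prompt(methodology: str, caption: str) -> str:
    """Build the text part of the Planner's multimodal prompt (illustrative wording)."""
    return (
        "Study the reference diagrams above, then write an extremely detailed "
        "description of a diagram for the methodology below. Use natural "
        "language only for visual attributes: no hex codes, no pixel "
        f"dimensions.\n\nCaption: {caption}\n\nMethodology:\n{methodology}"
    )

# Sketch of the multimodal call with google-genai (assumes GOOGLE_API_KEY is set
# and MODEL_NAME / selected_reference_paths are defined elsewhere):
#   from google import genai
#   from PIL import Image
#   client = genai.Client()
#   contents = [Image.open(p) for p in selected_reference_paths]
#   contents.append(build_planner_prompt(methodology, caption))
#   response = client.models.generate_content(model=MODEL_NAME, contents=contents)
```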

Phase 3: STYLIST (scripts/stylist.py) — Gemini VLM call

  • Takes the Planner's description + full NeurIPS 2025 style guide
  • Applies domain-specific styling based on the category from Phase 1
  • Follows 5 critical rules: preserve aesthetics, intervene minimally, respect domain, enrich details, preserve content
  • Outputs the polished description only

Phase 4: VISUALIZER (scripts/generate_image.py) — Gemini Image API call

  • Uses gemini-3-pro-image-preview to generate the diagram image from the styled description
  • Prepends quality prefix (high-res, legible text, clean background, no watermarks)
  • Aspect ratio selected based on visual intent (16:9 for pipelines, 3:2 for modules)

Phase 5: CRITIC (scripts/critic.py) — Multimodal Gemini VLM call

  • Sends the generated image + methodology text to Gemini for multimodal evaluation
  • Scores on 4 dimensions (faithfulness, readability, conciseness, aesthetics)
  • If faithfulness < 7 OR readability < 7: generates revised description → loops to Phase 4
  • Maximum 3 refinement iterations
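The Visualizer/Critic refinement loop can be sketched as below. The helper names `render_image` and `critique` are hypothetical stand-ins for scripts/generate_image.py and scripts/critic.py:

```python
MAX_ITERATIONS = 3  # maximum refinement rounds
THRESHOLD = 7       # minimum acceptable faithfulness and readability

def refine(description: str, methodology: str, render_image, critique):
    """Re-render while the Critic scores faithfulness or readability below 7."""
    image = render_image(description)
    for _ in range(MAX_ITERATIONS):
        scores, revised = critique(image, methodology, description)
        if scores["faithfulness"] >= THRESHOLD and scores["readability"] >= THRESHOLD:
            break
        # Critic produced a revised description: loop back to the Visualizer.
        description = revised
        image = render_image(description)
    return image, description
```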

DIAGRAM MODE — Manual Execution

You can also run each agent individually for more control:

# Phase 1: Retriever
python scripts/retriever.py --methodology-file text.txt --output work/retriever.json

# Phase 2: Planner
python scripts/planner.py --methodology-file text.txt --caption "Figure 1: ..." \
  --references work/retriever.json --output work/planner.json

# Phase 3: Stylist
python scripts/stylist.py --description work/planner.json --output work/stylist.json

# Phase 4: Visualizer (extract styled_description from JSON, pass to generate_image.py)
python scripts/generate_image.py --prompt-file work/styled_desc.txt --output output/diagram.png

# Phase 5: Critic
python scripts/critic.py --image output/diagram.png --methodology-file text.txt \
  --description work/stylist.json --output work/critic.json

PLOT MODE

Read references/PLOT-PROMPTS.md for detailed agent prompts. Read references/PLOT-STYLE-GUIDE.md for aesthetic rules.

Plot mode uses Claude (or the host agent) for reasoning and code generation — no Gemini API calls needed for plot generation itself.

Phase 1: CATEGORIZE (Retriever)

Match data characteristics and visual intent:

  • Categorical comparison → bar chart, grouped bar, stacked bar
  • Continuous trends → line chart, area chart
  • Correlation/distribution → scatter plot, histogram, box plot, violin plot
  • Matrix/similarity → heatmap, confusion matrix
  • Multi-dimensional → radar/spider chart
  • Proportional → pie/donut chart, treemap
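The mapping above can be encoded as a small lookup; this is an illustrative helper, not part of the skill's scripts:

```python
# Mirrors the data-type → plot-type mapping above.
PLOT_TYPES = {
    "categorical comparison": ["bar", "grouped bar", "stacked bar"],
    "continuous trends": ["line", "area"],
    "correlation/distribution": ["scatter", "histogram", "box", "violin"],
    "matrix/similarity": ["heatmap", "confusion matrix"],
    "multi-dimensional": ["radar"],
    "proportional": ["pie", "donut", "treemap"],
}

def candidate_plots(data_type: str) -> list[str]:
    """Return suitable plot types, or an empty list for unknown data types."""
    return PLOT_TYPES.get(data_type.lower(), [])
```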

Phase 2: PLAN (Planner)

Create a detailed specification that explicitly enumerates:

  • Every raw data point with exact coordinates/values
  • Axis ranges, labels, tick marks, scales (linear/log)
  • Color assignments for each series/category
  • Font sizes for title, axis labels, tick labels, legend
  • Line widths, marker sizes, marker shapes
  • Legend placement and formatting
  • Grid style (major/minor, dashed/solid)
  • Figure dimensions and DPI
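A hypothetical Planner output covering the checklist above. The field names are illustrative and are not the actual schema consumed by scripts/plot_generator.py:

```python
# Example specification for a four-model F1 comparison (illustrative schema).
plot_spec = {
    "plot_type": "bar",
    "data": {"BERT": 92.3, "GPT-4": 88.1, "Claude": 95.7, "Gemini": 91.2},
    "y_axis": {"label": "F1 score", "range": [80, 100], "scale": "linear"},
    "colors": {"BERT": "tab:blue", "GPT-4": "tab:orange",
               "Claude": "tab:green", "Gemini": "tab:red"},
    "fonts": {"title": 12, "axis_label": 10, "tick": 9, "legend": 9},
    "grid": {"axis": "y", "style": "dashed", "color": "lightgray"},
    "figure": {"width_in": 4.0, "height_in": 3.0, "dpi": 300},
}
```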

Phase 3: STYLE (Stylist)

Read references/PLOT-STYLE-GUIDE.md for NeurIPS 2025 plot aesthetics.

Key styling rules:

  • White backgrounds only
  • Colorblind-friendly palettes (see assets/palettes/colorblind_safe.json)
  • Sans-serif fonts (Helvetica, Arial, or DejaVu Sans)
  • Markers on line charts for print readability
  • Inward-facing tick marks
  • Subtle grid lines (light gray, dashed)
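These rules translate directly to matplotlib rcParams. A minimal sketch follows; the skill's own sheet is assets/matplotlib_styles/academic_default.mplstyle and may differ in detail:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for script use
import matplotlib.pyplot as plt

# Apply the key styling rules via rcParams.
plt.rcParams.update({
    "figure.facecolor": "white",
    "axes.facecolor": "white",
    "font.family": "sans-serif",
    "font.sans-serif": ["Helvetica", "Arial", "DejaVu Sans"],
    "xtick.direction": "in",   # inward-facing tick marks
    "ytick.direction": "in",
    "axes.grid": True,
    "grid.linestyle": "--",    # subtle dashed grid
    "grid.color": "0.85",      # light gray
    "lines.marker": "o",       # markers on line charts for print readability
})
```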

Phase 4: VISUALIZE (Visualizer — Code Generation)

Generate complete, self-contained Python matplotlib/seaborn code. Use scripts/plot_generator.py as a reference implementation or run it directly with a JSON config:

python scripts/plot_generator.py --config plot_config.json --output figure.pdf

Code requirements:

  • Self-contained: all data defined inline, no external file dependencies
  • Apply .mplstyle from assets/matplotlib_styles/academic_default.mplstyle
  • Set OUTPUT_PATH variable for output file location
  • 300 DPI, bbox_inches='tight'
  • No plt.show() — save only
  • Support both PDF and PNG output

After generating the code, execute it to produce the plot image.
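A minimal example of code meeting these requirements, using the bar-chart data from the quick-start example below; the style-sheet line from the requirements is omitted here to keep the sketch self-contained:

```python
import matplotlib
matplotlib.use("Agg")  # no display needed; save only
import matplotlib.pyplot as plt

OUTPUT_PATH = "figure.png"  # output location, per the code requirements

# All data defined inline: no external file dependencies.
models = ["BERT", "GPT-4", "Claude", "Gemini"]
f1 = [92.3, 88.1, 95.7, 91.2]

fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(models, f1, color="tab:blue")
ax.set_ylabel("F1 score")
ax.set_ylim(80, 100)
ax.grid(axis="y", linestyle="--", color="0.85")
fig.savefig(OUTPUT_PATH, dpi=300, bbox_inches="tight")  # no plt.show()
```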

Phase 5: CRITIQUE (Critic)

Same rubric as diagram mode, plus plot-specific checks:

  • Data fidelity: Every data point correctly plotted
  • Axis accuracy: Ranges, labels, scales match specification
  • Layout: No overlapping labels, legends, or data points
  • Code correctness: Syntax valid, imports available, output saved

If code execution fails, analyze the error, simplify the approach, and regenerate.
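Execution can be wrapped so that errors feed back into regeneration; a sketch with a hypothetical helper:

```python
import subprocess
import sys

def run_plot_script(path: str) -> tuple[bool, str]:
    """Execute a generated plot script; return (success, stderr).

    On failure, the Critic step would feed stderr back into regeneration
    with a simplified approach.
    """
    result = subprocess.run([sys.executable, path],
                            capture_output=True, text=True, timeout=120)
    return result.returncode == 0, result.stderr
```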


Quick Start Examples

Diagram (automated): Run scripts/orchestrate.py with your methodology text file and caption.

Diagram (via agent): "Generate a methodology diagram for my transformer architecture. Here is the methodology section: [paste text]. Caption: Overview of our proposed multi-head attention framework."

Plot: "Create a bar chart comparing model performance. Data: {BERT: 92.3, GPT-4: 88.1, Claude: 95.7, Gemini: 91.2}. Intent: F1 score comparison across language models."

Improve: "Improve the aesthetics of this diagram: [paste existing description or attach current figure]"


File Reference

  • scripts/orchestrate.py: end-to-end pipeline runner (diagram mode primary entry point)
  • scripts/retriever.py: VLM-based reference selection (Phase 1, diagram mode)
  • scripts/planner.py: multimodal description generation (Phase 2, diagram mode)
  • scripts/stylist.py: VLM-based style application (Phase 3, diagram mode)
  • scripts/generate_image.py: Gemini Image API call (Phase 4, diagram mode)
  • scripts/critic.py: VLM-based image evaluation (Phase 5, diagram mode)
  • scripts/plot_generator.py: template-based matplotlib generator (Phase 4, plot mode)
  • scripts/validate_output.py: output validation and dependency check (post-generation)
  • references/DIAGRAM-PROMPTS.md: actual Gemini prompt templates for diagrams (all diagram phases)
  • references/PLOT-PROMPTS.md: agent prompts for plots (all plot phases)
  • references/DIAGRAM-STYLE-GUIDE.md: NeurIPS 2025 diagram aesthetics (Phase 3, Style)
  • references/PLOT-STYLE-GUIDE.md: NeurIPS 2025 plot aesthetics (Phase 3, Style)
  • references/EVALUATION-RUBRIC.md: Critic scoring criteria, 4 dimensions (Phase 5, Critique)
  • references/DIAGRAM-CATEGORIES.md: 4 diagram categories with keywords (Phase 1, Categorize)
  • assets/references/index.json: metadata for the 13 curated reference diagrams (Phase 1, Retriever)
  • assets/references/*.jpg: the 13 curated reference diagram images (Phase 2, Planner multimodal input)
  • assets/palettes/*.json: color palette definitions (Phase 3, Style)
  • assets/matplotlib_styles/*.mplstyle: matplotlib style sheets (Phase 4, plot mode)

Environment Setup

# Required for all Gemini API calls (VLM reasoning + image generation)
export GOOGLE_API_KEY="your-api-key-here"

# Install dependencies
pip install google-genai matplotlib seaborn numpy pillow

Verify setup: python scripts/validate_output.py --check-deps
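A minimal sketch of what such a dependency check could look like; the actual behavior of scripts/validate_output.py may differ:

```python
import importlib.util
import os

# Module names to probe; "PIL" is the import name for the pillow package.
REQUIRED = ["google.genai", "matplotlib", "seaborn", "numpy", "PIL"]

def check_environment() -> list[str]:
    """Return a list of problems; an empty list means the setup looks complete."""
    problems = []
    for mod in REQUIRED:
        try:
            found = importlib.util.find_spec(mod) is not None
        except ModuleNotFoundError:
            # Parent package of a dotted name is missing entirely.
            found = False
        if not found:
            problems.append(f"missing module: {mod}")
    if not os.environ.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set")
    return problems
```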

