Research Paper Figure Skill Factory
This skill is a two-layer research-paper figure Skill Factory.
- Skill Builder layer: build or patch a reusable specialized figure-making skill for one paper-figure class by acquiring lawful source material, extracting figure evidence, building a taxonomy, generating the skill package, testing it, and locking it.
- Figure Production layer: after a specialized skill is locked, use that generated skill to design, compare, render, review, and integrate concrete figures for arbitrary target papers of the same figure class.
Non-Negotiable Contract
First Trigger
On first trigger, output only a startup plan. Do not analyze a paper, build a taxonomy, create candidate schemes, draft prompts, or generate images. The first reply is STARTUP_PLAN_ONLY (TEXT_ONLY).
If the first user message asks for images, record the request as pending only. The first reply must not call Create image, $imagegen, or an image API, and must not include image artifacts.
Specialized-Skill-First Builder Rule
The normal route is:
figure-class goal -> corpus plan -> lawful acquisition/local corpus -> evidence extraction -> taxonomy -> specialized skill blueprint -> generated specialized skill -> tests/patches -> locked skill -> target-paper production.
Do not jump from source papers directly to one concrete figure unless the user explicitly chooses a full production fast-track. If fast-tracking, record the skipped builder steps and fallback skill/taxonomy.
Full-Feasible Corpus Rule
When local PDFs, a paper index, or retrieval manifests exist, enumerate the full relevant candidate set and process as many accessible relevant PDFs as feasible. A small sample can support only a limited/pilot/fallback lock unless the user explicitly accepts that limitation. Representative rendered pages are audit aids only, not the corpus size.
Mandatory Candidate-Image Bridge
Every generated specialized figure-making skill must include a hard workflow bridge after any multi-option text decision:
- TEXT_ONLY candidate text turn: present 4-6 text candidates, normally 6.
- TEXT_ONLY visual candidate setup turn: define candidate count, varied axis, fixed elements, rendering route, and what the user should compare.
- IMAGE_ONLY candidate-board turn: generate/display 4-6 candidate images or schematic candidates, normally 6.
- TEXT_ONLY candidate-review turn: record the previous image batch, compare candidates, recommend one direction, and ask the user to select, revise, or request another board.
This bridge is mandatory after candidate schemes, subtype choices, layout choices, style choices, metaphor choices, density choices, and prompt alternatives. The generated skill must not move directly from 4-6 text candidates to final prompt construction, final image generation, caption writing, or text-only locking unless the user explicitly says to skip image candidates and stay text-only. If skipped, record visual_candidate_board_skipped_by_user: true.
Generated skill lock/test must fail if:
- the workflow lacks a dedicated visual candidate setup step;
- the workflow lacks a dedicated IMAGE_ONLY candidate-board step before direction lock;
- examples show text candidates followed directly by final prompt or final image generation;
- the state footer cannot record visual_candidate_board_status, candidate_image_batch_id, and selected_visual_candidate;
- multi-option next prompts do not ask the user to generate/display multiple candidate images or schematic candidates, normally 6.
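The structural lock conditions above can be sketched as a small checker. This is a minimal sketch, not the factory's actual test harness: it assumes the generated workflow is described as a list of step dicts with hypothetical `role` and `mode` keys, and that the state footer is given as a set of field names. The role names (`text_candidates`, `visual_candidate_setup`, `candidate_board`, `direction_lock`, `final_prompt`, `final_image`) are illustrative labels, and the check on next prompts is left out because it needs the prompt text itself.

```python
from typing import Dict, Iterable, List

# Footer fields named in the lock conditions above.
REQUIRED_FOOTER_FIELDS = {
    "visual_candidate_board_status",
    "candidate_image_batch_id",
    "selected_visual_candidate",
}


def bridge_lock_failures(
    steps: List[Dict[str, str]], footer_fields: Iterable[str]
) -> List[str]:
    """Return the reasons the lock must fail; an empty list means these checks pass."""
    failures: List[str] = []
    roles = [step["role"] for step in steps]  # hypothetical role labels

    if "visual_candidate_setup" not in roles:
        failures.append("no dedicated visual candidate setup step")

    # Position of the first IMAGE_ONLY candidate-board step, if any.
    board_index = next(
        (
            i
            for i, step in enumerate(steps)
            if step["role"] == "candidate_board" and step["mode"] == "IMAGE_ONLY"
        ),
        None,
    )
    lock_index = roles.index("direction_lock") if "direction_lock" in roles else len(roles)
    if board_index is None or board_index > lock_index:
        failures.append("no IMAGE_ONLY candidate-board step before direction lock")

    # Text candidates must never be followed directly by final work.
    for previous, following in zip(roles, roles[1:]):
        if previous == "text_candidates" and following in {"final_prompt", "final_image"}:
            failures.append(f"text candidates lead directly to {following}")

    missing = REQUIRED_FOOTER_FIELDS - set(footer_fields)
    if missing:
        failures.append("state footer cannot record: " + ", ".join(sorted(missing)))
    return failures
```

A lock script could refuse to lock whenever the returned list is non-empty and print the reasons in the test report.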
Strict Text/Image Separation
Every response is exactly one modality:
- TEXT_ONLY: planning, intake, diagnosis, candidate text, candidate-board setup, prompt writing, critique, status, and next prompts.
- IMAGE_ONLY: image generation only. No prose, captions, critique, prompt text, or state footer.
If a reply emits any visible text, it must not generate images in the same response. If the user confirms generation and the state is sufficient, the next assistant response may be an IMAGE_ONLY response.
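The one-modality rule can be expressed as a guard that classifies a reply and rejects mixed output. This is a minimal sketch under assumptions: the `Reply` container and its `text` and `images` fields are hypothetical and stand in for however a host represents an assistant response.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reply:
    """Hypothetical container for one assistant response."""

    text: str = ""
    images: List[str] = field(default_factory=list)


def reply_mode(reply: Reply) -> str:
    """Classify a reply as TEXT_ONLY or IMAGE_ONLY, rejecting mixed or empty replies."""
    has_text = bool(reply.text.strip())
    has_images = bool(reply.images)
    if has_text and has_images:
        raise ValueError("mixed text+image reply violates strict separation")
    if has_images:
        return "IMAGE_ONLY"
    if has_text:
        return "TEXT_ONLY"
    raise ValueError("empty reply has no modality")
```

Treating whitespace-only text as "no visible text" is a design choice of this sketch, not something the contract specifies.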
Rendering Route
For candidate boards, draft candidates, final diagrams, and revisions:
- ChatGPT web must use Create image through ChatGPT Images 2.0.
- Codex must use the $imagegen skill first.
- If $imagegen is unavailable in Codex, use ChatGPT Images 2.0 API or another approved image-generation API.
- Native bitmap outputs such as PNG, JPG, JPEG, and WebP are allowed when produced by the approved image route.
- Do not use SVG, Mermaid, TikZ, Graphviz, HTML/CSS, canvas, matplotlib, filesystem code drawing, or code-rendered/exported figures as candidate images, draft images, final visuals, or fallbacks.
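The routing rules above can be summarized as a small selector. This is a sketch under assumptions: the host labels `chatgpt_web` and `codex` and the returned route strings are illustrative names, not identifiers defined by any real API. The extension check is only a necessary condition, since a bitmap is allowed only when it was produced by the approved image route.

```python
import os

# Bitmap formats listed as allowed above (when produced by the approved route).
ALLOWED_BITMAP_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def choose_route(host: str, imagegen_available: bool = True) -> str:
    """Pick the rendering route for a host; host labels are hypothetical."""
    if host == "chatgpt_web":
        return "Create image (ChatGPT Images 2.0)"
    if host == "codex":
        if imagegen_available:
            return "$imagegen"
        return "ChatGPT Images 2.0 API or another approved image-generation API"
    raise ValueError(f"no approved route defined for host {host!r}")


def has_allowed_extension(filename: str) -> bool:
    """True if the file extension is one of the allowed bitmap formats."""
    return os.path.splitext(filename)[1].lower() in ALLOWED_BITMAP_EXTENSIONS
```

With this shape, a code-rendered `figure.svg` is rejected by `has_allowed_extension`, and an unknown host fails loudly instead of silently falling back to a forbidden renderer.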
Reference Images
Generated specialized skills must support optional sample/reference images. If the user provides multiple images, ask which attributes to borrow from each image: style, layout, panel rhythm, density, content-detail level, labels, color semantics, callout grammar, or negative-reference constraints.
Every Text Reply
Every TEXT_ONLY reply from this factory and from generated specialized skills must include:
- Current execution plan
- substantive work for the current step
- Default recommendation
- Current state and artifacts
- Next, you can ask
The state footer must list all steps plus the current position and the response mode of every step. The first copyable next prompt must use:
Please use **<current skill name>** and, based on the current state, execute the next step: ...
Always include:
Please use **<current skill name>** and, based on the current state, suggest what to ask next.
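A state footer that satisfies these requirements could be rendered as follows. This is only a sketch: the function name, the arrow marker for the current position, and the exact layout are assumptions, and the section labels and prompt strings follow the templates above (rendered in English here).

```python
from typing import List, Tuple


def render_state_footer(
    skill_name: str,
    steps: List[Tuple[str, str]],  # (step id, response mode) for every step
    current: str,
    next_action: str,
) -> str:
    """Render all steps with their modes, mark the current one, and add next prompts."""
    lines = ["Current state and artifacts"]
    for step_id, mode in steps:
        marker = "->" if step_id == current else "  "  # hypothetical position marker
        lines.append(f"{marker} {step_id} [{mode}]")
    lines.append("Next, you can ask")
    # The first copyable prompt executes the next step.
    lines.append(
        f"Please use **{skill_name}** and, based on the current state, "
        f"execute the next step: {next_action}"
    )
    # Always include the prompt that asks for next-question suggestions.
    lines.append(
        f"Please use **{skill_name}** and, based on the current state, "
        "suggest what to ask next."
    )
    return "\n".join(lines)
```

Keeping the footer in one function makes it easier to test that every TEXT_ONLY reply carries both prompts and the full step list.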
Skill Builder Workflow
| Step | Layer | Mode | Purpose | Output |
|---|---|---|---|---|
| S0 | Startup | STARTUP_PLAN_ONLY (TEXT_ONLY) | Show the complete two-layer plan only | Startup plan |
| B1 | Skill Builder | TEXT_ONLY | Define target figure class and generated skill goal | Figure-class brief |
| B2 | Skill Builder | TEXT_ONLY | Define corpus scope, venues, keywords, and lawful acquisition route | Corpus plan |
| B3 | Skill Builder | TEXT_ONLY | Acquire or organize open/user-authorized PDFs and manifests | Local corpus + retrieval manifest |
| B4 | Skill Builder | TEXT_ONLY | Extract paper cards, captions, figure inventory, labels, and visual observations | Evidence artifacts |
| B5 | Skill Builder | TEXT_ONLY | Build evidence-backed figure-class taxonomy | Taxonomy + lineage |
| B6 | Skill Builder | TEXT_ONLY | Convert taxonomy into specialized skill blueprint | Blueprint |
| B7 | Skill Builder | TEXT_ONLY | Generate specialized skill package | Skill folder/package |
| B8 | Skill Builder | TEXT_ONLY | Test and patch startup, state, candidate-board, rendering, and prompt behavior | Test report + patches |
| B9 | Skill Builder | TEXT_ONLY | Lock generated skill for reusable production | Locked skill with version/scope |
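The builder table can be kept as an ordered registry so that a runner always knows the next step. This is a minimal sketch; the `BUILDER_STEPS` list and `next_builder_step` helper are hypothetical names, and only the step ids and order come from the table above.

```python
from typing import Optional

# Step ids in the order given by the Skill Builder table; every step is text-only.
BUILDER_STEPS = ["S0", "B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "B9"]


def next_builder_step(current: str) -> Optional[str]:
    """Return the step after `current`, or None once the skill is locked (B9)."""
    index = BUILDER_STEPS.index(current)  # raises ValueError for unknown steps
    if index + 1 < len(BUILDER_STEPS):
        return BUILDER_STEPS[index + 1]
    return None
```

Because unknown step ids raise `ValueError`, a typo in stored state surfaces immediately rather than restarting the workflow at S0.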
Required Generated Figure-Production Workflow
Every generated specialized figure-making skill must use this expanded production workflow, or a stricter equivalent with the same mandatory candidate-image bridge:
| Step | Mode | Purpose | Output |
|---|---|---|---|
| P1 | TEXT_ONLY | Intake target-paper material, target slot, constraints, and optional sample images | Material status |
| P2 | TEXT_ONLY | Diagnose figure need and multi-label subtype routing | Subtype candidates + default route |
| P3 | TEXT_ONLY | Define reader effect and produce 4-6 text candidate schemes, normally 6 | Text candidates + required visual-candidate next action |
| P4 | TEXT_ONLY | Set up visual candidate board: candidate count, varied axis, fixed content, route, comparison criteria | Candidate-board brief |
| P5 | IMAGE_ONLY | Generate/display 4-6 candidate images or schematic candidates, normally 6 | Image candidates only |
| P6 | TEXT_ONLY | Record the image batch, compare candidates, recommend one, and lock or revise direction | Selected/revised visual direction |
| P7 | TEXT_ONLY | Build final content architecture and formal image brief/prompt for the selected direction | Final image brief |
| P8 | IMAGE_ONLY | Generate formal figure candidate or revision batch through the approved image route | Formal image candidates only |
| P9 | TEXT_ONLY | Review, refine, caption, legend, body insertion, and handoff text | Final paper text package |
Rules for this workflow:
- P3 must not ask the user to choose only from text as the primary route. Its first recommended next prompt must be to generate/display 6 candidate images or schematic candidates.
- P4 is required before P5 unless the immediately preceding user message already confirms the board count, varied axis, fixed elements, and rendering route.
- P5 is not a final figure stage. It is a visual selection stage.
- P6 must happen after P5 and must record the image batch before any final prompt or caption work.
- P7/P8 may only occur after a direction is selected or the user explicitly requests a formal generation despite unresolved candidates.
- Any generated skill may add more domain-specific steps, but it must not remove P4/P5/P6 or collapse them into a mixed text+image response.
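The ordering rules above can be sketched as an entry guard for each production step. This is an illustration under assumptions: the state dict and its keys `completed`, `board_confirmed_by_user`, and `formal_generation_forced_by_user` are hypothetical, while `selected_visual_candidate` is the footer field named earlier in this document.

```python
from typing import Any, Dict

# Response mode of each production step, taken from the table above.
PRODUCTION_MODES = {
    "P1": "TEXT_ONLY", "P2": "TEXT_ONLY", "P3": "TEXT_ONLY",
    "P4": "TEXT_ONLY", "P5": "IMAGE_ONLY", "P6": "TEXT_ONLY",
    "P7": "TEXT_ONLY", "P8": "IMAGE_ONLY", "P9": "TEXT_ONLY",
}


def can_enter(step: str, state: Dict[str, Any]) -> bool:
    """Check the entry rules for P5, P6, P7, and P8; other steps are not gated here."""
    if step not in PRODUCTION_MODES:
        raise ValueError(f"unknown production step {step!r}")
    completed = state.get("completed", set())
    if step == "P5":
        # P4 is required unless the user already confirmed the board setup.
        return "P4" in completed or bool(state.get("board_confirmed_by_user"))
    if step == "P6":
        # The image batch must exist before it can be recorded and reviewed.
        return "P5" in completed
    if step in {"P7", "P8"}:
        # A direction must be selected, or the user explicitly forces generation.
        return state.get("selected_visual_candidate") is not None or bool(
            state.get("formal_generation_forced_by_user")
        )
    return True
```

For example, `can_enter("P7", {"completed": {"P5", "P6"}})` is false until a candidate is selected, which mirrors the rule that final prompt work waits for a locked direction.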
Generated Skill Package Requirements
Generated specialized skills must include the candidate-image bridge in:
- SKILL.md
- metadata.json
- agents/openai.yaml
- references/workflow-and-state-contract.md
- references/visual-style-and-board-protocol.md
- references/prompt-generation-policy.md
- templates/state-footer-template.md
- templates/figure-brief-template.md
- templates/prompt-template.md
- examples, especially startup, text-candidate, visual-board setup, image-only board, and candidate-review examples
- release checklist and starter prompts
The release checklist must include a failing test for the exact bug this patch fixes: “after 4-6 text candidates or layout/style-axis setup, the generated skill still has no separate candidate-image generation step.”
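A first-pass release check can confirm that the named package files exist before anyone inspects their contents. This is a minimal sketch: the helper names are hypothetical, and file presence is only a precondition, since each file must also actually describe the candidate-image bridge.

```python
from pathlib import Path
from typing import Iterable, List, Set

# File paths named in the package requirements above.
REQUIRED_FILES = [
    "SKILL.md",
    "metadata.json",
    "agents/openai.yaml",
    "references/workflow-and-state-contract.md",
    "references/visual-style-and-board-protocol.md",
    "references/prompt-generation-policy.md",
    "templates/state-footer-template.md",
    "templates/figure-brief-template.md",
    "templates/prompt-template.md",
]


def list_package_files(package_dir: str) -> Set[str]:
    """Collect the relative POSIX paths of all files under a package directory."""
    root = Path(package_dir)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def missing_package_files(present: Iterable[str]) -> List[str]:
    """Return required files absent from `present`, in the order listed above."""
    present_set = set(present)
    return [path for path in REQUIRED_FILES if path not in present_set]
```

In use, `missing_package_files(list_package_files("path/to/generated-skill"))` would return an empty list for a structurally complete package; the directory path here is a placeholder.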
Reference Loading Order
Load references as needed:
- references/master-workflow.md
- references/generated-specialized-skill-output-spec.md
- references/generated-skill-multi-candidate-policy.md
- references/visual-first-decision-board-protocol.md
- references/startup-plan-step-output-map.md
- references/planning-state-and-navigation-contract.md
- references/prompt-generation-and-rendering-policy.md
- references/strict-text-image-turn-separation-policy.md
- templates/specialized_skill_blueprint_template.md
- templates/state_footer_template.md
Version Note
Version 1.0.1 makes the candidate-image bridge mandatory in generated figure-making skills. A generated skill must no longer stop at text candidates, layout/style axis decisions, or visual-board suggestions; it must provide explicit steps for candidate-board setup, image-only generation of multiple candidates, and text-only candidate review/selection.