model-migrate-flagos

Migrate a model from the latest vLLM upstream repository into the vllm-plugin-FL project (pinned at vLLM v0.13.0). Use this skill whenever someone wants to add support for a new model to vllm-plugin-FL, port model code from upstream vLLM, or backport a newly released model. Trigger when the user says things like "migrate X model", "add X model support", "port X from upstream vLLM", "make X work with the FL plugin", or simply "/model-migrate-flagos model_name". The model_name argument uses snake_case (e.g. qwen3_5, kimi_k25, deepseek_v4). Do NOT use for models already supported by vLLM 0.13.0 core, or for multimodal-only components that don't need backporting.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Installation

Install the skill: npx skills add flagos-ai/model-migrate-flagos

FL Plugin — Model Migration Skill

Usage

/model-migrate-flagos <model_name> [upstream_folder] [plugin_folder]

| Argument | Required | Default |
| --- | --- | --- |
| model_name | Yes | (none) |
| upstream_folder | No | /tmp/vllm-upstream-ref |
| plugin_folder | No | current working directory |

Execution

Step 1: Parse arguments and validate paths

Extract from user input:

  • {{model_name}} = first argument (required, snake_case)
  • {{upstream_folder}} = second argument or /tmp/vllm-upstream-ref
  • {{plugin_folder}} = third argument or current working directory

If {{upstream_folder}} doesn't exist, ask user whether to clone it. If {{plugin_folder}} doesn't exist, error out.

→ Tell user: Confirm parsed model name and paths.
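
As an illustration only, the argument handling above could be sketched as follows; the skill runner's actual interface may differ, and the error wording here is a placeholder:

```python
from pathlib import Path

DEFAULT_UPSTREAM = Path("/tmp/vllm-upstream-ref")

def parse_skill_args(raw: str) -> dict:
    """Split '/model-migrate-flagos <model_name> [upstream] [plugin]' and apply the documented defaults."""
    parts = raw.split()
    if not parts:
        raise ValueError("model_name is required, e.g. /model-migrate-flagos qwen3_5")
    args = {
        "model_name": parts[0],  # required, snake_case
        "upstream_folder": Path(parts[1]) if len(parts) > 1 else DEFAULT_UPSTREAM,
        "plugin_folder": Path(parts[2]) if len(parts) > 2 else Path.cwd(),
    }
    if not args["plugin_folder"].exists():
        raise FileNotFoundError(f"plugin folder not found: {args['plugin_folder']}")
    if not args["upstream_folder"].exists():
        # Per Step 1, ask the user whether to clone upstream vLLM here instead of failing.
        print(f"upstream folder missing: {args['upstream_folder']} (offer to clone it)")
    return args
```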

Step 2: Load references and resolve placeholders

Read these files (relative to this SKILL.md):

  • references/procedure.md — step-by-step migration procedure
  • references/compatibility-patches.md — 0.13.0 patch catalog
  • references/operational-rules.md — communication, TaskList, bash rules, resilience

The procedure references executable scripts in scripts/:

  • scripts/validate_migration.py — automated code review (Step 6)
  • scripts/benchmark.sh — benchmark verification (Step 9)
  • scripts/serve.sh — serve model locally (Step 10.1, also used for E2E)
  • scripts/request.sh — test request (Step 10.2)
  • scripts/e2e_eval.py — E2E correctness verification (Step 11)
  • scripts/e2e_test_prompts.json — test prompts for E2E (5 text + 5 multimodal)
  • scripts/e2e_config.template.json — E2E config template (copy to e2e_config.json and fill in)
  • scripts/e2e_remote_serve.sh — manage GT server on remote machine via SSH

Then investigate upstream source + HuggingFace to resolve all placeholders:

| Placeholder | How to derive |
| --- | --- |
| {{model_name}} | Direct from argument |
| {{model_name_lower}} | Lowercase of model_name (usually identical, e.g. qwen3_5); used in file paths |
| {{MODEL_DISPLAY_NAME}} | From upstream code or HF model card |
| {{ModelClassName}} | From upstream model class (PascalCase) |
| {{model_type}} | From HF config.json model_type field |
| {{ConfigClassName}} | From upstream or derive from model_type |
| {{skill_root}} | Absolute path to this skill's folder (the directory containing this SKILL.md) |

Naming conventions vary per model — always verify from actual source, never guess.

→ Tell user: Present all resolved values. Use AskUserQuestion if anything is ambiguous.
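
As a rough sketch of the derivation, assuming the HF config.json has been downloaded locally and the upstream file follows the usual vllm/model_executor/models/<model_name>.py layout (the class-name heuristic below is illustrative and must be confirmed against the real source):

```python
import json
import re
from pathlib import Path

def resolve_placeholders(model_name: str, upstream_folder: Path, hf_config_json: Path) -> dict:
    """Derive placeholder values; always confirm against the actual upstream source."""
    model_type = json.loads(hf_config_json.read_text())["model_type"]  # {{model_type}}

    upstream_file = upstream_folder / "vllm" / "model_executor" / "models" / f"{model_name}.py"
    # Heuristic: collect class names defined in the upstream model file as candidates
    # for {{ModelClassName}}; pick the entry-point class (e.g. *ForCausalLM) manually.
    candidates = re.findall(r"^class\s+(\w+)\(", upstream_file.read_text(), flags=re.MULTILINE)

    return {
        "model_name": model_name,
        "model_name_lower": model_name.lower(),
        "model_type": model_type,
        "ModelClassName_candidates": candidates,
    }
```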

Step 3: Execute procedure

With placeholders resolved, execute every step in procedure.md sequentially. Apply patches from compatibility-patches.md during the copy-then-patch step. Follow operational-rules.md throughout.

→ Tell user: Before starting, output a numbered plan. Report progress at each step boundary.

Scripts Reference

| Script | Step | Description |
| --- | --- | --- |
| validate_migration.py | 6 | Automated import/API/registration checks |
| benchmark.sh | 9 | vllm bench throughput with dummy weights |
| serve.sh | 10, 11 | Start local vLLM server (port 8122, VLLM_FL_PREFER_ENABLED=false) |
| request.sh | 10 | Quick smoke-test request |
| e2e_eval.py | 11 | Token-level comparison vs upstream GT server |
| e2e_test_prompts.json | 11 | 5 text + 5 multimodal test prompts |
| e2e_config.template.json | 11 | Config template (GT machine, local port, eval params) |
| e2e_remote_serve.sh | 11 | SSH-based GT server lifecycle (start/stop/status/logs) |
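
e2e_eval.py defines the real procedure; purely as a concept sketch of the token-level comparison, assuming both the local plugin server and the upstream GT server expose an OpenAI-compatible /v1/completions endpoint with logprobs:

```python
import requests

def greedy_tokens(base_url: str, model: str, prompt: str, max_tokens: int = 32) -> list[str]:
    """Fetch a greedy completion and return its generated tokens via logprobs."""
    resp = requests.post(
        f"{base_url}/v1/completions",
        json={"model": model, "prompt": prompt, "max_tokens": max_tokens,
              "temperature": 0.0, "logprobs": 1},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["logprobs"]["tokens"]

def first_divergence(local_url: str, gt_url: str, model: str, prompt: str) -> int | None:
    """Index of the first differing token between the plugin server and the GT server, or None."""
    local = greedy_tokens(local_url, model, prompt)
    gt = greedy_tokens(gt_url, model, prompt)
    for i, (a, b) in enumerate(zip(local, gt)):
        if a != b:
            return i  # early divergence usually points at weight loading or TP sharding
    return None
```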

Examples

Example 1: Typical new model

User says: "/model-migrate-flagos kimi_k25"
Actions:
  1. Parse → model_name=kimi_k25, defaults for upstream/plugin paths
  2. Clone upstream, find vllm/model_executor/models/kimi_k25.py
  3. Discover it wraps DeepseekV2 → follow kimi_k25 (wrapper) pattern
  4. Copy file, apply P1+P2 patches, create config bridge
  5. Register, validate, test, benchmark, serve+request
  6. E2E verification against upstream GT
Result: kimi_k25 fully working in plugin, all 11 steps passed
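
For illustration, the registration in step 5 typically goes through vLLM's ModelRegistry in the plugin's entry point; the architecture name and module path below are placeholders, not the plugin's real ones:

```python
from vllm import ModelRegistry

def register() -> None:
    # Map the HF architecture name (from config.json "architectures") to the
    # migrated implementation. Both strings below are illustrative placeholders.
    ModelRegistry.register_model(
        "KimiK25ForCausalLM",
        "vllm_fl.models.kimi_k25:KimiK25ForCausalLM",
    )
```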

Example 2: Re-run after upstream update

User says: "migrate qwen3_5 again, upstream updated"
Actions:
  1. Idempotent re-run — overwrite existing files with fresh upstream copy
  2. Re-apply patches, re-validate, re-test
  3. Re-run E2E to confirm no regression
Result: qwen3_5 updated to match latest upstream, no regressions

Troubleshooting

General principle: When any runtime error occurs, first compare vLLM upstream code against both the plugin adaptation and the installed 0.13.0 environment. The diff is the fastest path to root cause. See operational-rules.md § Debugging Priority: Upstream-First for the full protocol.
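
A quick way to get that diff programmatically (a minimal sketch; plain `diff -u` works just as well):

```python
import difflib
from pathlib import Path

def diff_against_upstream(upstream_file: Path, plugin_file: Path) -> str:
    """Unified diff of the upstream model file against the plugin adaptation."""
    return "".join(difflib.unified_diff(
        upstream_file.read_text().splitlines(keepends=True),
        plugin_file.read_text().splitlines(keepends=True),
        fromfile=str(upstream_file),
        tofile=str(plugin_file),
    ))
```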

| Problem | Typical Cause | Fix |
| --- | --- | --- |
| ImportError after copy-then-patch | Missing P1 fix (relative → absolute imports) | Verify every `from .xxx` import was converted to `from vllm.*` or `from vllm_fl.*` |
| AttributeError: module 'vllm' has no attribute X | API doesn't exist in 0.13.0 | Check P3 in compatibility-patches.md; stub or remove |
| Config not recognized by vLLM | model_type mismatch or missing config bridge | Verify `_CONFIG_REGISTRY[model_type]` matches HF config.json exactly |
| Registration has no effect | Class name or import path typo | Compare with existing registrations in `__init__.py` |
| Benchmark KeyError on config field | Config bridge missing a field | Compare the upstream config class vs the bridge; add missing fields with defaults |
| Benchmark/serve fails with OOM or "insufficient memory" | GPUs occupied by other processes | Kill GPU processes (`nvidia-smi --query-compute-apps=pid --format=csv,noheader \| xargs -r kill -9`), then retry. Never skip these steps. |
| Model outputs garbled/gibberish text | ColumnParallelLinear used for merged projections with different sub-dimensions (TP sharding mismatch) | Override `__init__` to use `MergedColumnParallelLinear(output_sizes=[...])`. See P8 in compatibility-patches.md and the sketch after this table |
| AssertionError: Duplicate op name | Child class imports a custom op from a different module path than the parent | Use the same import path as the parent module (e.g. vllm_fl.ops.fla, not vllm_fl.models.fla_ops). See P11 |
| AttributeError on `fused_recurrent_*` during CUDA graph warmup | An `__init__` override using `nn.Module.__init__(self)` missed attributes used by the inherited `_forward_core` | Create ALL attributes from the parent's `__init__`, especially custom ops. See P12 |
| E2E: local server not reachable | serve.sh port doesn't match the e2e_config.json local port | Ensure both use the same port (default 8122) |
| E2E: GT server not reachable | GT machine down, or wrong docker/conda env | Check e2e_remote_serve.sh status or SSH in manually |
| E2E: early token divergence (first 5 tokens) | Weight-loading bug or TP sharding error | Check load_weights, stacked_params_mapping, MergedColumnParallelLinear |
| E2E: late minor divergence (token #15+) | Numerical noise from differing op implementations | Usually acceptable; document in the report |
| resolve_op fails with VLLM_FL_PREFER_ENABLED=false | Op not registered in dispatch and no fallback | Add a try/except fallback to flag_gems in the op import code (see the sketch after this table) |
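
For the garbled-output row, the P8 idea is that a fused projection (e.g. gate + up in an MLP) must use MergedColumnParallelLinear so each sub-projection is sharded separately under tensor parallelism. A minimal sketch with illustrative layer names and sizes; the real ones come from the migrated model:

```python
from vllm.model_executor.layers.linear import MergedColumnParallelLinear

def build_gate_up_proj(hidden_size: int, intermediate_size: int, quant_config=None, prefix: str = ""):
    # A single ColumnParallelLinear over the concatenated width shards the fused
    # weight as one block, interleaving the gate/up halves across TP ranks and
    # producing gibberish output. MergedColumnParallelLinear shards each chunk
    # independently via output_sizes.
    return MergedColumnParallelLinear(
        hidden_size,
        [intermediate_size, intermediate_size],  # one entry per fused sub-projection
        bias=False,
        quant_config=quant_config,
        prefix=f"{prefix}.gate_up_proj",
    )
```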
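
For the resolve_op row, the fallback pattern is a try/except at op-import time; every module path and op name below is a placeholder for whatever the plugin's op import code actually uses:

```python
# Placeholder module paths and op names throughout; mirror the plugin's real op import code.
from vllm_fl.ops import resolve_op  # placeholder import path for the dispatcher

try:
    fused_recurrent_op = resolve_op("fused_recurrent_op")  # normal FL dispatch path
except Exception:
    # With VLLM_FL_PREFER_ENABLED=false the FL op may never be registered, so
    # resolve_op has nothing to return; fall back to the flag_gems implementation.
    from flag_gems import fused_recurrent_op  # placeholder: use the real flag_gems op name
```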

Related Skills

Related by shared tags or category signals.


LLM Deploy

Deploy LLM model services (vLLM) on GPU servers. Supports multi-server configuration, automatic checks for GPU and port usage, and one-click deployment of popular open-source large language models.


ROCm vLLM Deployment

Production-ready vLLM deployment on AMD ROCm GPUs. Combines environment auto-check, model parameter detection, Docker Compose deployment, health verification...


Gpu Deploy

Deploy vLLM model services on GPU servers. Supports multi-server configuration, automatic checks for GPU and port usage, and one-click deployment of popular open-source models.
