Fine-Tuning

Fine-tune LLMs with data preparation, provider selection, cost estimation, evaluation, and compliance checks.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Installation

Install the skill with:

npx skills add ivangdavila/fine-tuning

When to Use

User wants to fine-tune a language model, evaluate if fine-tuning is worth it, or debug training issues.

Quick Reference

| Topic | File |
| --- | --- |
| Provider comparison & pricing | providers.md |
| Data preparation & validation | data-prep.md |
| Training configuration | training.md |
| Evaluation & debugging | evaluation.md |
| Cost estimation & ROI | costs.md |
| Compliance & security | compliance.md |

Core Capabilities

  1. Decide fit — Analyze if fine-tuning beats prompting for the use case
  2. Prepare data — Convert raw data to JSONL, deduplicate, validate format
  3. Select provider — Compare OpenAI, Anthropic (Bedrock), Google, open source based on constraints
  4. Estimate costs — Calculate training cost, inference savings, break-even point
  5. Configure training — Set hyperparameters (learning rate, epochs, LoRA rank)
  6. Run evaluation — Compare fine-tuned vs base model on task-specific metrics
  7. Debug failures — Diagnose loss curves, overfitting, catastrophic forgetting
  8. Handle compliance — Scan for PII, configure on-premise training, generate audit logs
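Step 2 above can be sketched as a small validator. This is a minimal sketch assuming OpenAI-style chat-format JSONL records (the exact schema depends on the provider); `validate_and_dedupe` is a hypothetical helper, not part of the skill's scripts:

```python
import hashlib
import json

def validate_and_dedupe(lines):
    """Parse JSONL lines, dropping malformed records and exact duplicates.

    Assumes chat-format records: {"messages": [{"role": ..., "content": ...}, ...]}.
    Returns (clean_records, error_line_numbers).
    """
    seen, clean, errors = set(), [], []
    for i, line in enumerate(lines, 1):
        try:
            rec = json.loads(line)
            msgs = rec["messages"]
            # Every message must carry a recognized role.
            assert msgs and all(m["role"] in {"system", "user", "assistant"} for m in msgs)
        except (json.JSONDecodeError, KeyError, AssertionError, TypeError):
            errors.append(i)
            continue
        # Canonical serialization so key order doesn't defeat deduplication.
        key = hashlib.sha256(json.dumps(rec, sort_keys=True).encode()).hexdigest()
        if key in seen:
            continue  # exact duplicate, skip silently
        seen.add(key)
        clean.append(rec)
    return clean, errors
```

Near-duplicate detection (e.g., embedding similarity) is a separate, harder step; this catches only exact repeats and format errors.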

Decision Checklist

Before recommending fine-tuning, ask:

  • What's the failure mode with prompting? (format, style, knowledge, cost)
  • How many training examples are available? (minimum ~50-100)
  • What's the expected inference volume? (drives the ROI calculation)
  • Are there privacy constraints? (determines which providers are viable)
  • What's the budget for training plus ongoing inference?

Fine-Tune vs Prompt Decision

| Signal | Recommendation |
| --- | --- |
| Format/style inconsistency | Fine-tune ✓ |
| Missing domain knowledge | RAG first, then fine-tune if needed |
| High inference volume (>100K requests/mo) | Fine-tune for cost savings |
| Requirements change frequently | Stick with prompting |
| <50 quality examples | Prompting + few-shot |
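The volume threshold above can be checked with a back-of-the-envelope break-even calculation. `break_even_months` is an illustrative sketch; the prices in the usage example are made up, not any provider's actual rates:

```python
def break_even_months(training_cost, base_cost_per_1k, ft_cost_per_1k,
                      requests_per_month, tokens_per_request=1000):
    """Months until a fine-tune pays for itself through cheaper inference.

    A fine-tuned model often needs far fewer prompt tokens (no long
    instructions or few-shot examples), which is where most savings come from.
    Returns float('inf') when the fine-tuned model is not actually cheaper.
    """
    monthly_tokens = requests_per_month * tokens_per_request
    monthly_savings = (base_cost_per_1k - ft_cost_per_1k) * monthly_tokens / 1000
    if monthly_savings <= 0:
        return float("inf")  # never breaks even
    return training_cost / monthly_savings
```

Example: $500 of training, hypothetical per-1K-token costs of $0.01 (base) vs $0.006 (fine-tuned), at 200K requests/month, breaks even in under a month; at 2K requests/month it takes over five years, which is why low-volume use cases rarely justify fine-tuning on cost alone.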

Critical Rules

  • Data quality > quantity — 100 great examples beat 1000 noisy ones
  • LoRA first — Never jump to full fine-tuning; LoRA is 10-100x cheaper
  • Hold out eval set — Always 80/10/10 split; never peek at test data
  • Same precision — Train and serve at identical precision (e.g., both 4-bit or both 16-bit)
  • Baseline first — Run eval on base model before training to measure actual improvement
  • Expect iteration — First attempt rarely optimal; plan for 2-3 cycles
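The hold-out rule above can be sketched as a seeded 80/10/10 split; `split_dataset` is a hypothetical helper, and the fixed seed makes the split reproducible across runs:

```python
import random

def split_dataset(records, seed=42):
    """Shuffle and split records into 80/10/10 train/val/test slices.

    The test slice should be held out entirely until the final
    comparison against the base model.
    """
    rng = random.Random(seed)  # fixed seed: same split every run
    shuffled = records[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = int(n * 0.8), int(n * 0.1)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

One caveat: if records share a source (e.g., multiple examples from one document), split by source rather than by record, or the eval set leaks training content.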

Common Pitfalls

| Mistake | Fix |
| --- | --- |
| Training on inconsistent data | Manually review 100+ samples before training |
| Learning rate too high | Start with 2e-4 for SFT, 5e-6 for RLHF |
| Expecting new knowledge | Fine-tuning adjusts behavior, not knowledge; use RAG |
| No baseline comparison | Always test the base model on the same eval set |
| Ignoring forgetting | Mix in 20% general data to preserve capabilities |
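The forgetting fix above (blending in roughly 20% general-purpose data) might look like this sketch; `mix_general_data` is a hypothetical helper, and the 20% ratio is a starting point, not a rule:

```python
import random

def mix_general_data(task_examples, general_examples, general_frac=0.2, seed=0):
    """Blend general examples into the task set to reduce catastrophic
    forgetting. With general_frac=0.2, roughly 20% of the final mix is
    general data.
    """
    # Solve n_general / (n_task + n_general) = general_frac for n_general.
    n_general = int(len(task_examples) * general_frac / (1 - general_frac))
    rng = random.Random(seed)
    sampled = rng.sample(general_examples, min(n_general, len(general_examples)))
    mixed = task_examples + sampled
    rng.shuffle(mixed)  # interleave so batches aren't all one type
    return mixed
```

For 100 task examples this samples 25 general ones, giving a 125-example set that is exactly 20% general data.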

