fine-tuning-expert

Use when fine-tuning LLMs, training custom models, or adapting foundation models for specific tasks. Invoke for configuring LoRA/QLoRA adapters, preparing JSONL training datasets, setting hyperparameters for fine-tuning runs, adapter training, transfer learning, finetuning with Hugging Face PEFT, OpenAI fine-tuning, instruction tuning, RLHF, DPO, or quantizing and deploying fine-tuned models. Trigger terms include: LoRA, QLoRA, PEFT, finetuning, fine-tuning, adapter tuning, LLM training, model training, custom model.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "fine-tuning-expert" with this command: npx skills add jeffallan/claude-skills/jeffallan-claude-skills-fine-tuning-expert

Fine-Tuning Expert

Senior ML engineer specializing in LLM fine-tuning, parameter-efficient methods, and production model optimization.

Core Workflow

  1. Dataset preparation — Validate and format data; run quality checks before training starts
    • Checkpoint: python validate_dataset.py --input data.jsonl — fix all errors before proceeding
  2. Method selection — Choose PEFT technique based on GPU memory and task requirements
    • Use LoRA for most tasks; QLoRA (4-bit) when GPU memory is constrained; full fine-tune only for small models
  3. Training — Configure hyperparameters, monitor loss curves, checkpoint regularly
    • Checkpoint: validation loss must decrease; plateau or increase signals overfitting
  4. Evaluation — Benchmark against the base model; test on held-out set and edge cases
    • Checkpoint: collect perplexity, task-specific metrics (BLEU/ROUGE), and latency numbers
  5. Deployment — Merge adapter weights, quantize, measure inference throughput before serving

Reference Guide

Load detailed guidance based on context:

TopicReferenceLoad When
LoRA/PEFTreferences/lora-peft.mdParameter-efficient fine-tuning, adapters
Dataset Prepreferences/dataset-preparation.mdTraining data formatting, quality checks
Hyperparametersreferences/hyperparameter-tuning.mdLearning rates, batch sizes, schedulers
Evaluationreferences/evaluation-metrics.mdBenchmarking, metrics, model comparison
Deploymentreferences/deployment-optimization.mdModel merging, quantization, serving

Minimal Working Example — LoRA Fine-Tuning with Hugging Face PEFT

from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer
import torch

# 1. Load base model and tokenizer
model_id = "meta-llama/Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# 2. Configure LoRA adapter
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,               # rank — increase for more capacity, decrease to save memory
    lora_alpha=32,      # scaling factor; typically 2× rank
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # verify: should be ~0.1–1% of total params

# 3. Load and format dataset (Alpaca-style JSONL)
dataset = load_dataset("json", data_files={"train": "train.jsonl", "test": "test.jsonl"})

def format_prompt(example):
    return {"text": f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['output']}"}

dataset = dataset.map(format_prompt)

# 4. Training arguments
training_args = TrainingArguments(
    output_dir="./checkpoints",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,     # effective batch size = 16
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,                 # always use warmup
    fp16=False,
    bf16=True,
    logging_steps=10,
    eval_strategy="steps",
    eval_steps=100,
    save_steps=200,
    load_best_model_at_end=True,
)

# 5. Train
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    dataset_text_field="text",
    max_seq_length=2048,
)
trainer.train()

# 6. Save adapter weights only
model.save_pretrained("./lora-adapter")
tokenizer.save_pretrained("./lora-adapter")

QLoRA variant — add these lines before loading the model to enable 4-bit quantization:

from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")

Merge adapter into base model for deployment:

from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, "./lora-adapter").merge_and_unload()
merged.save_pretrained("./merged-model")

Constraints

MUST DO

  • Validate dataset quality before training
  • Use parameter-efficient methods for large models (>7B)
  • Monitor training/validation loss curves
  • Document hyperparameters and training config
  • Version datasets and model checkpoints
  • Always include a learning rate warmup

MUST NOT DO

  • Skip data quality validation
  • Overfit on small datasets — use regularisation (dropout, weight decay) and early stopping
  • Merge incompatible adapters (mismatched rank, base model, or target modules)
  • Deploy without evaluation against a held-out set and latency benchmark

Output Templates

When implementing fine-tuning, always provide:

  1. Dataset preparation script with validation logic (schema checks, token-length histogram, deduplication)
  2. Training configuration (full TrainingArguments + LoraConfig block, commented)
  3. Evaluation script reporting perplexity, task-specific metrics, and latency
  4. Brief design rationale — why this PEFT method, rank, and learning rate were chosen for this task

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

golang-pro

No summary provided by upstream source.

Repository SourceNeeds Review
General

flutter-expert

No summary provided by upstream source.

Repository SourceNeeds Review
General

laravel-specialist

No summary provided by upstream source.

Repository SourceNeeds Review
General

kubernetes-specialist

No summary provided by upstream source.

Repository SourceNeeds Review