Fine-Tuning & Customization

Customize LLMs for specific domains using parameter-efficient fine-tuning and alignment techniques.

Unsloth: 7x longer-context RL, FP8 RL on consumer GPUs, rsLoRA support. TRL: OpenEnv integration, vLLM server mode, compatibility with transformers 5.0.0+.

Decision Framework: Fine-Tune or Not?

| Approach | When to Try | Works Best For |
|---|---|---|
| Prompt Engineering | Always, first | Simple tasks, clear instructions |
| RAG | When external knowledge is needed | Knowledge-intensive tasks |
| Fine-Tuning | Last resort | Deep specialization, format control |

Fine-tune ONLY when:

- Prompt engineering has been tried and is insufficient
- RAG doesn't capture domain nuances
- A specific output format is consistently required
- A persona or style must be deeply embedded
- You have ~1,000+ high-quality examples
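The checklist can be encoded as a simple gate. This is a minimal sketch under one interpretation of the list (an assumption, not stated in the source): both cheaper alternatives must have failed, at least one fine-tuning-specific need must exist, and there must be enough data. The `FineTuneSignals` and `should_fine_tune` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FineTuneSignals:
    """Answers to the fine-tuning checklist (hypothetical field names)."""
    prompting_insufficient: bool  # prompt engineering tried and fell short
    rag_misses_nuance: bool       # RAG doesn't capture domain nuances
    strict_format_needed: bool    # specific output format consistently required
    deep_persona_needed: bool     # persona/style must be deeply embedded
    num_quality_examples: int     # count of high-quality training examples


def should_fine_tune(s: FineTuneSignals, min_examples: int = 1000) -> bool:
    """True only if alternatives are exhausted, a concrete need exists,
    and enough high-quality data is available."""
    alternatives_exhausted = s.prompting_insufficient and s.rag_misses_nuance
    has_specific_need = s.strict_format_needed or s.deep_persona_needed
    enough_data = s.num_quality_examples >= min_examples
    return alternatives_exhausted and has_specific_need and enough_data


# Prompting and RAG both fell short, strict format needed, 1,500 examples
print(should_fine_tune(FineTuneSignals(True, True, True, False, 1500)))  # True
```

Treating the conditions as a conjunction keeps fine-tuning as the last resort; loosen it if your team reads the list as "any of".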

LoRA vs QLoRA (Unsloth)

| Criteria | LoRA | QLoRA |
|---|---|---|
| Model fits in VRAM | Use LoRA | |
| Memory constrained | | Use QLoRA |
| Training speed | 39% faster | |
| Memory savings | | 75%+ (dynamic 4-bit quants) |
| Quality | Baseline | ~Same (Unsloth recovers the accuracy loss) |
| 70B LLaMA | | Fits in <48GB VRAM |
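The 70B row follows from simple arithmetic on weight storage. A rough sketch: it counts only the base weights and deliberately ignores quantization scales, activations, KV cache, LoRA adapter weights, gradients, and optimizer state, so real usage is higher than these numbers.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in decimal GB (1e9 bytes).

    Ignores quantization metadata, activations, KV cache, adapters,
    gradients, and optimizer state.
    """
    return num_params * bits_per_param / 8 / 1e9


params_70b = 70e9
fp16 = weight_memory_gb(params_70b, 16)  # 16-bit weights (plain LoRA)
nf4 = weight_memory_gb(params_70b, 4)    # 4-bit weights (QLoRA)
print(f"16-bit: {fp16:.1f} GB, 4-bit: {nf4:.1f} GB")  # 16-bit: 140.0 GB, 4-bit: 35.0 GB
```

At 4 bits the weights take about 35 GB, leaving roughly 13 GB of a 48 GB budget for everything else; at 16 bits they alone exceed any single 48 GB card.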

Quick Reference: LoRA Training

```python
from unsloth import FastLanguageModel  # import unsloth before trl/transformers
from trl import SFTConfig, SFTTrainer

# Load with 4-bit quantization (QLoRA)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,              # Rank (16-64 typical)
    lora_alpha=32,     # Scaling (2x r)
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # Attention
        "gate_proj", "up_proj", "down_proj",     # MLP (QLoRA paper)
    ],
)

# Train (`dataset` is a Hugging Face Dataset prepared beforehand)
trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,  # older TRL releases call this `tokenizer`
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="outputs",
        max_length=2048,  # older TRL releases call this `max_seq_length`
    ),
)
trainer.train()
```
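To see why this fits on modest hardware, count the trainable parameters. Each adapted `d_in x d_out` weight gets two low-rank matrices, A (`r x d_in`) and B (`d_out x r`), adding `r * (d_in + d_out)` parameters. The sketch assumes the published Llama-3.1-8B shapes (hidden size 4096, intermediate size 14336, 32 layers, 8 key/value heads of dimension 128); `PROJ_SHAPES` and `lora_trainable_params` are illustrative names.

```python
# Assumed Llama-3.1-8B dimensions; k/v projections are narrower due to
# grouped-query attention (8 KV heads x 128 dims = 1024).
HIDDEN, INTERMEDIATE, KV_DIM, NUM_LAYERS = 4096, 14336, 8 * 128, 32

# (in_features, out_features) for each projection in one decoder layer
PROJ_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(r: int, target_modules: list[str]) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) per adapted weight per layer."""
    per_layer = sum(r * (PROJ_SHAPES[m][0] + PROJ_SHAPES[m][1]) for m in target_modules)
    return per_layer * NUM_LAYERS


n = lora_trainable_params(16, list(PROJ_SHAPES))
print(f"{n:,}")  # 41,943,040 (about 0.5% of the ~8B base parameters)
```

Doubling `r` doubles this count, which is the "higher = more capacity" trade-off in the hyperparameter table.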

DPO Alignment

```python
from trl import DPOConfig, DPOTrainer

config = DPOConfig(
    learning_rate=5e-6,  # Lower for alignment
    beta=0.1,            # KL penalty coefficient
    per_device_train_batch_size=4,
    num_train_epochs=1,
)

# Preference dataset: {prompt, chosen, rejected}
trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,  # Frozen reference
    args=config,
    train_dataset=preference_dataset,
    processing_class=tokenizer,  # older TRL releases call this `tokenizer`
)
trainer.train()
```
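What `beta` controls is clearest in the loss itself. Below is a minimal sketch of the standard sigmoid DPO objective for a single preference pair, written in plain Python rather than through TRL; the inputs are assumed to be summed log-probabilities of each completion given the prompt, under the trainable policy and the frozen reference.

```python
import math


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one preference pair.

    Each argument is the summed log-probability of a completion given the
    prompt, under the trainable policy or the frozen reference model.
    """
    # Implicit rewards: how far the policy moved from the reference, scaled by beta
    chosen_reward = beta * (policy_chosen - ref_chosen)
    rejected_reward = beta * (policy_rejected - ref_rejected)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) computed stably as softplus(-margin)
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


print(dpo_loss(-12.0, -15.0, -12.0, -15.0))  # policy == reference: log(2), about 0.693
print(dpo_loss(-2.0, -15.0, -12.0, -15.0))   # chosen up 10 nats: margin 1.0, about 0.313
```

Because rewards are measured relative to the reference, the loss only falls when the policy shifts probability toward the chosen answer more than toward the rejected one; a larger `beta` makes each nat of drift count for more.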

Synthetic Data Generation

```python
import json

from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


async def generate_synthetic(topic: str, n: int = 100) -> list[dict]:
    """Generate training examples using a teacher model."""
    examples = []
    for _ in range(n):
        response = await client.chat.completions.create(
            model="gpt-5.2",  # Teacher
            messages=[{
                "role": "system",
                "content": f"Generate a training example about {topic}. "
                           'Respond with a JSON object with "instruction" and "response" keys.',
            }],
            # JSON mode requires the word "JSON" to appear in the messages
            response_format={"type": "json_object"},
        )
        examples.append(json.loads(response.choices[0].message.content))
    return examples
```
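Teacher output still needs filtering before it becomes training data. A minimal sketch, assuming each example is a dict with `instruction` and `response` string keys (an assumption about the teacher's JSON shape); `filter_examples` and its length threshold are hypothetical choices, not a standard.

```python
def filter_examples(examples: list, min_response_chars: int = 20) -> list[dict]:
    """Keep well-formed, non-trivial, non-duplicate instruction/response pairs."""
    seen: set[str] = set()
    kept: list[dict] = []
    for ex in examples:
        if not isinstance(ex, dict):
            continue  # teacher returned a JSON list or scalar
        instruction, response = ex.get("instruction"), ex.get("response")
        if not isinstance(instruction, str) or not isinstance(response, str):
            continue  # missing or non-string fields
        if len(response.strip()) < min_response_chars:
            continue  # too short to teach anything
        key = " ".join(instruction.lower().split())  # normalize case and whitespace
        if key in seen:
            continue  # near-verbatim duplicate instruction
        seen.add(key)
        kept.append({"instruction": instruction.strip(), "response": response.strip()})
    return kept
```

Exact-match deduplication after normalization is the cheapest filter; embedding-based near-duplicate detection or an LLM judge can be layered on top when volume grows.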

Key Hyperparameters

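| Parameter | Recommended | Notes |
|---|---|---|
| Learning rate | 2e-4 | Common LoRA/QLoRA SFT default (DPO uses far lower, e.g. 5e-6) |
| Epochs | 1-3 | More epochs risk overfitting |
| LoRA r | 16-64 | Higher rank = more adapter capacity |
| LoRA alpha | 2x r | Scaling factor |
| Batch size | 4-8 | Per device |
| Warmup | 3% | Fraction of total training steps |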
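The warmup ratio and alpha only make sense against concrete numbers. A rough sketch, assuming one optimizer step per effective batch and rounding the warmup up to a whole step; the rsLoRA variant divides alpha by the square root of r instead of r. Function names are illustrative.

```python
import math


def training_steps(num_examples: int, per_device_batch: int, grad_accum: int = 1,
                   num_devices: int = 1, epochs: int = 1) -> int:
    """Approximate optimizer steps: one step per effective batch."""
    effective_batch = per_device_batch * grad_accum * num_devices
    return math.ceil(num_examples / effective_batch) * epochs


def warmup_steps(total_steps: int, warmup_ratio: float = 0.03) -> int:
    """Warmup length as a fraction of total steps, rounded up."""
    return math.ceil(total_steps * warmup_ratio)


def lora_scale(alpha: float, r: int, rslora: bool = False) -> float:
    """Multiplier on the adapter update: alpha / r, or alpha / sqrt(r) for rsLoRA."""
    return alpha / math.sqrt(r) if rslora else alpha / r


total = training_steps(10_000, per_device_batch=8, epochs=3)  # 1250 steps/epoch x 3
print(total, warmup_steps(total))                            # 3750 113
print(lora_scale(32, 16), lora_scale(32, 16, rslora=True))   # 2.0 8.0
```

With the "alpha = 2x r" rule, standard LoRA keeps the scale fixed at 2.0 regardless of rank, while rsLoRA's scale grows with rank, which is why rsLoRA is pitched at higher ranks.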

Anti-Patterns (FORBIDDEN)

NEVER fine-tune without trying alternatives first

```python
model.fine_tune(data)  # BAD: try prompt engineering & RAG first!
```

NEVER use low-quality training data

```python
data = scrape_random_web()  # BAD: garbage in, garbage out
```

NEVER skip evaluation

```python
trainer.train()
deploy(model)  # BAD: always evaluate before deploying!
```

ALWAYS use separate eval set

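```python
# Hold out 10% for evaluation (Hugging Face datasets.Dataset.train_test_split)
splits = dataset.train_test_split(test_size=0.1, seed=42)
trainer = SFTTrainer(..., train_dataset=splits["train"], eval_dataset=splits["test"])
```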
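An evaluation step can double as a deployment gate. A minimal sketch: exact-match accuracy is a stand-in metric (swap in whatever your task needs), the `generate` callables are hypothetical wrappers around the base and fine-tuned models, and the `min_gain` threshold is an arbitrary illustrative choice.

```python
from typing import Callable


def exact_match_accuracy(generate: Callable[[str], str],
                         eval_set: list[tuple[str, str]]) -> float:
    """Fraction of held-out prompts whose output matches the reference exactly."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    hits = sum(generate(prompt).strip() == expected.strip() for prompt, expected in eval_set)
    return hits / len(eval_set)


def should_deploy(base: Callable[[str], str], tuned: Callable[[str], str],
                  eval_set: list[tuple[str, str]], min_gain: float = 0.02) -> bool:
    """Deploy only if the fine-tuned model beats the base model by min_gain."""
    return exact_match_accuracy(tuned, eval_set) >= exact_match_accuracy(base, eval_set) + min_gain


eval_set = [("2+2", "4"), ("capital of France", "Paris")]
base = lambda p: "4" if p == "2+2" else "London"   # stand-in for the base model
tuned = lambda p: dict(eval_set)[p]                # stand-in for the fine-tuned model
print(should_deploy(base, tuned, eval_set))        # True
```

Comparing against the base model on the same held-out set guards against shipping a fine-tune that merely matches what prompting already achieved.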

Detailed Documentation

| Resource | Description |
|---|---|
| references/lora-qlora.md | Parameter-efficient fine-tuning |
| references/dpo-alignment.md | Direct Preference Optimization |
| references/synthetic-data.md | Training data generation |
| references/when-to-finetune.md | Decision framework |

Related Skills

- llm-evaluation: Evaluate fine-tuned models
- embeddings: When to use embeddings instead
- rag-retrieval: When RAG is better than fine-tuning
- langfuse-observability: Track training experiments

Capability Details

lora-qlora

Keywords: LoRA, QLoRA, PEFT, parameter efficient, adapter, low-rank

Solves:

- Fine-tune large models on consumer hardware
- Configure LoRA hyperparameters
- Choose target modules for adapters

dpo-alignment

Keywords: DPO, RLHF, preference, alignment, human feedback, preference data

Solves:

- Align models to human preferences
- Create preference datasets
- Configure DPO training

synthetic-data

Keywords: synthetic data, data generation, teacher model, distillation

Solves:

- Generate training data with LLMs
- Implement teacher-student training
- Scale up training data without sacrificing quality

when-to-finetune

Keywords: should I fine-tune, fine-tune decision, customize model

Solves:

- Decide when fine-tuning is appropriate
- Evaluate alternatives to fine-tuning
- Assess data requirements
