
Fine-Tuning & Customization

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "fine-tuning-customization" with this command:

`npx skills add yonatangross/orchestkit/yonatangross-orchestkit-fine-tuning-customization`


Customize LLMs for specific domains using parameter-efficient fine-tuning and alignment techniques.

**Unsloth**: 7x longer context RL, FP8 RL on consumer GPUs, rsLoRA support. **TRL**: OpenEnv integration, vLLM server mode, transformers 5.0.0+ compatible.

Decision Framework: Fine-Tune or Not?

| Approach | When to Try | When It Works |
|---|---|---|
| Prompt engineering | Always, first | Simple tasks, clear instructions |
| RAG | External knowledge needed | Knowledge-intensive tasks |
| Fine-tuning | Last resort | Deep specialization, format control |

Fine-tune ONLY when:

  • Prompt engineering tried and insufficient

  • RAG doesn't capture domain nuances

  • Specific output format consistently required

  • Persona/style must be deeply embedded

  • You have ~1000+ high-quality examples
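The checklist collapses to a simple pre-flight gate. A minimal sketch (all names here are illustrative, not from any library):

```python
def should_finetune(
    prompting_insufficient: bool,
    rag_insufficient: bool,
    n_quality_examples: int,
) -> bool:
    """Pre-flight gate encoding the checklist above (illustrative only)."""
    # Fine-tune only when the cheaper alternatives are exhausted
    # and roughly 1000+ curated examples are available.
    return (
        prompting_insufficient
        and rag_insufficient
        and n_quality_examples >= 1000
    )
```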

LoRA vs QLoRA (Unsloth)

| Criteria | LoRA | QLoRA |
|---|---|---|
| Model fits in VRAM | Use LoRA | - |
| Memory constrained | - | Use QLoRA |
| Training speed | 39% faster | - |
| Memory savings | - | 75%+ (dynamic 4-bit quants) |
| Quality | Baseline | ~Same (Unsloth recovers the accuracy loss) |
| 70B LLaMA | - | <48GB VRAM with QLoRA |
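In practice the choice reduces to one flag at load time. A minimal sketch, assuming a ~24GB threshold for an 8B model (the threshold is a rough assumption, not an Unsloth recommendation):

```python
import torch
from unsloth import FastLanguageModel

# Illustrative heuristic, not an Unsloth API: fall back to 4-bit (QLoRA)
# when the GPU cannot comfortably hold the 16-bit weights.
vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=vram_gb < 24,  # QLoRA when memory constrained, LoRA otherwise
)
```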

Quick Reference: LoRA Training

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer

# Load with 4-bit quantization (QLoRA)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,            # Rank (16-64 typical)
    lora_alpha=32,   # Scaling (2x r)
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # Attention
        "gate_proj", "up_proj", "down_proj",     # MLP (QLoRA paper)
    ],
)

# Train
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    max_seq_length=2048,
)
trainer.train()
```
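Once training finishes, adapters can be saved alone or merged into the base weights for serving. A sketch using Unsloth's save helpers (paths are illustrative):

```python
# Save only the LoRA adapter weights (typically tens to hundreds of MB)
model.save_pretrained("lora_adapters")
tokenizer.save_pretrained("lora_adapters")

# Or merge adapters into the base model for standalone 16-bit deployment
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")
```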

DPO Alignment

```python
from trl import DPOTrainer, DPOConfig

config = DPOConfig(
    learning_rate=5e-6,              # Lower for alignment
    beta=0.1,                        # KL penalty coefficient
    per_device_train_batch_size=4,
    num_train_epochs=1,
)

# Preference dataset: {prompt, chosen, rejected}
trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,  # Frozen reference
    args=config,
    train_dataset=preference_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```
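Each row of `preference_dataset` pairs one prompt with a preferred and a rejected completion. An illustrative record:

```python
preference_example = {
    "prompt": "Summarize the refund policy in one sentence.",
    "chosen": "Refunds are issued within 14 days for unused items with a receipt.",
    "rejected": "Our policy covers many situations, please read the full terms.",
}
```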

Synthetic Data Generation

```python
import json

async def generate_synthetic(topic: str, n: int = 100) -> list[dict]:
    """Generate training examples using a teacher model."""
    examples = []
    for _ in range(n):
        response = await client.chat.completions.create(  # client: async OpenAI-compatible client
            model="gpt-5.2",  # Teacher
            messages=[{
                "role": "system",
                "content": f"Generate a training example about {topic}. "
                           "Include instruction and response."
            }],
            response_format={"type": "json_object"}
        )
        examples.append(json.loads(response.choices[0].message.content))
    return examples
```
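A usage sketch, assuming `client` is an async OpenAI-compatible client; generated examples are usually deduplicated and spot-checked before landing in a JSONL training file:

```python
import asyncio
import json

async def main() -> None:
    examples = await generate_synthetic("contract law", n=50)
    with open("train.jsonl", "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")  # one example per line for SFT loaders

asyncio.run(main())
```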

Key Hyperparameters

| Parameter | Recommended | Notes |
|---|---|---|
| Learning rate | 2e-4 | LoRA/QLoRA standard |
| Epochs | 1-3 | More risks overfitting |
| LoRA r | 16-64 | Higher = more capacity |
| LoRA alpha | 2x r | Scaling factor |
| Batch size | 4-8 | Per device |
| Warmup | 3% | Ratio of steps |
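These map directly onto the trainer configuration. A sketch using TRL's `SFTConfig`, which inherits the field names of `transformers.TrainingArguments`; note that LoRA `r` and `lora_alpha` belong to the adapter config passed to `get_peft_model`, not the trainer:

```python
from trl import SFTConfig

config = SFTConfig(
    learning_rate=2e-4,             # LoRA/QLoRA standard
    num_train_epochs=2,             # stay within 1-3 to avoid overfitting
    per_device_train_batch_size=4,
    warmup_ratio=0.03,              # 3% of total steps
    output_dir="outputs",
)
```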

Anti-Patterns (FORBIDDEN)

**NEVER** fine-tune without trying alternatives first:

```python
model.fine_tune(data)  # Try prompt engineering & RAG first!
```

**NEVER** use low-quality training data:

```python
data = scrape_random_web()  # Garbage in, garbage out
```

**NEVER** skip evaluation:

```python
trainer.train()
deploy(model)  # Always evaluate before deploy!
```

**ALWAYS** use a separate eval set:

```python
train_ds, eval_ds = split(data, test_size=0.1)
trainer = SFTTrainer(..., eval_dataset=eval_ds)
```
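A minimal sketch of the evaluate-before-deploy gate; the loss threshold and `deploy` hook are placeholders, not a real API:

```python
metrics = trainer.evaluate()  # runs on eval_dataset, returns e.g. {"eval_loss": ...}

if metrics["eval_loss"] < 1.5:  # placeholder, project-specific threshold
    deploy(model)               # hypothetical deployment hook
else:
    raise RuntimeError(f"Eval loss too high: {metrics['eval_loss']:.3f}")
```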

Detailed Documentation

| Resource | Description |
|---|---|
| references/lora-qlora.md | Parameter-efficient fine-tuning |
| references/dpo-alignment.md | Direct Preference Optimization |
| references/synthetic-data.md | Training data generation |
| references/when-to-finetune.md | Decision framework |

Related Skills

  • llm-evaluation: Evaluate fine-tuned models

  • embeddings: When to use embeddings instead

  • rag-retrieval: When RAG is better than fine-tuning

  • langfuse-observability: Track training experiments

Capability Details

lora-qlora

Keywords: LoRA, QLoRA, PEFT, parameter efficient, adapter, low-rank

Solves:

  • Fine-tune large models on consumer hardware

  • Configure LoRA hyperparameters

  • Choose target modules for adapters

dpo-alignment

Keywords: DPO, RLHF, preference, alignment, human feedback, preference data

Solves:

  • Align models to human preferences

  • Create preference datasets

  • Configure DPO training

synthetic-data

Keywords: synthetic data, data generation, teacher model, distillation

Solves:

  • Generate training data with LLMs

  • Implement teacher-student training

  • Scale training data quality

when-to-finetune

Keywords: should I fine-tune, fine-tune decision, customize model

Solves:

  • Decide when fine-tuning is appropriate

  • Evaluate alternatives to fine-tuning

  • Assess data requirements

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
