implementing-llms-litgpt

LitGPT - Clean LLM Implementations

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "implementing-llms-litgpt" with this command: npx skills add davila7/claude-code-templates/davila7-claude-code-templates-implementing-llms-litgpt


Quick start

LitGPT provides 20+ pretrained LLM implementations with clean, readable code and production-ready training workflows.

Installation:

pip install 'litgpt[extra]'

Load and use any model:

from litgpt import LLM

# Load a pretrained model
llm = LLM.load("microsoft/phi-2")

# Generate text
result = llm.generate(
    "What is the capital of France?",
    max_new_tokens=50,
    temperature=0.7,
)
print(result)

List available models:

litgpt download list

Common workflows

Workflow 1: Fine-tune on custom dataset

Copy this checklist:

Fine-Tuning Setup:

  • Step 1: Download pretrained model
  • Step 2: Prepare dataset
  • Step 3: Configure training
  • Step 4: Run fine-tuning

Step 1: Download pretrained model

Download Llama 3 8B

litgpt download meta-llama/Meta-Llama-3-8B

Download Phi-2 (smaller, faster)

litgpt download microsoft/phi-2

Download Gemma 2B

litgpt download google/gemma-2b

Models are saved to the checkpoints/ directory.
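
To confirm a download landed where expected, you can list the checkpoint directory from Python before starting a run (a small sketch; the path assumes the default checkpoints/ layout described above):

from pathlib import Path

ckpt_dir = Path("checkpoints") / "microsoft" / "phi-2"
if ckpt_dir.is_dir():
    for f in sorted(ckpt_dir.iterdir()):
        print(f.name)
else:
    print(f"{ckpt_dir} not found - run `litgpt download microsoft/phi-2` first")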

Step 2: Prepare dataset

LitGPT supports multiple formats:

Alpaca format (instruction-response):

[ { "instruction": "What is the capital of France?", "input": "", "output": "The capital of France is Paris." }, { "instruction": "Translate to Spanish: Hello, how are you?", "input": "", "output": "Hola, ¿cómo estás?" } ]

Save it as data/my_dataset.json.
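
For larger datasets it is often easier to build the JSON programmatically and validate it before training. A minimal sketch (the example records mirror the format above; everything else is plain standard-library Python):

import json
from pathlib import Path

# Replace with your own instruction/response pairs
examples = [
    {"instruction": "What is the capital of France?", "input": "", "output": "The capital of France is Paris."},
    {"instruction": "Translate to Spanish: Hello, how are you?", "input": "", "output": "Hola, ¿cómo estás?"},
]

# Every record needs the Alpaca-style keys
for record in examples:
    assert {"instruction", "input", "output"} <= set(record)

Path("data").mkdir(exist_ok=True)
with open("data/my_dataset.json", "w", encoding="utf-8") as f:
    json.dump(examples, f, ensure_ascii=False, indent=2)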

Step 3: Configure training

Full fine-tuning (requires 40GB+ GPU for 7B models)

litgpt finetune_full \
  meta-llama/Meta-Llama-3-8B \
  --data JSON \
  --data.json_path data/my_dataset.json \
  --train.max_steps 1000 \
  --train.learning_rate 2e-5 \
  --train.micro_batch_size 1 \
  --train.global_batch_size 16
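
The two batch-size flags control gradient accumulation: only micro_batch_size samples are held in memory at once, and gradients are accumulated until global_batch_size samples have contributed to an optimizer step. A sketch of the arithmetic (this assumes the usual convention of deriving accumulation steps as global divided by micro times devices; LitGPT's exact bookkeeping may differ):

def grad_accumulation_iters(global_batch_size, micro_batch_size, devices=1):
    """Micro-batches accumulated per optimizer step under the assumed convention."""
    assert global_batch_size % (micro_batch_size * devices) == 0
    return global_batch_size // (micro_batch_size * devices)

# With the settings above: 16 // (1 * 1) = 16 accumulated micro-batches per optimizer step
print(grad_accumulation_iters(global_batch_size=16, micro_batch_size=1, devices=1))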

LoRA fine-tuning (efficient, 16GB GPU)

litgpt finetune_lora \
  microsoft/phi-2 \
  --data JSON \
  --data.json_path data/my_dataset.json \
  --lora_r 16 \
  --lora_alpha 32 \
  --lora_dropout 0.05 \
  --train.max_steps 1000 \
  --train.learning_rate 1e-4

Step 4: Run fine-tuning

Training saves checkpoints to out/finetune/ automatically.

Monitor training:

View logs

tail -f out/finetune/logs.txt

TensorBoard (if using --train.logger_name tensorboard)

tensorboard --logdir out/finetune/lightning_logs

Workflow 2: LoRA fine-tuning on single GPU

Most memory-efficient option.

LoRA Training:

  • Step 1: Choose base model
  • Step 2: Configure LoRA parameters
  • Step 3: Train with LoRA
  • Step 4: Merge LoRA weights (optional)

Step 1: Choose base model

For limited GPU memory (12-16GB):

  • Phi-2 (2.7B) - Best quality/size tradeoff

  • Llama 3.2 1B - Smallest, fastest

  • Gemma 2B - Good reasoning

Step 2: Configure LoRA parameters

litgpt finetune_lora \
  microsoft/phi-2 \
  --data JSON \
  --data.json_path data/my_dataset.json \
  --lora_r 16 \
  --lora_alpha 32 \
  --lora_dropout 0.05 \
  --lora_query true \
  --lora_key false \
  --lora_value true \
  --lora_projection true \
  --lora_mlp false \
  --lora_head false

Flag notes:

  • --lora_r: LoRA rank (8-64, higher = more capacity)

  • --lora_alpha: LoRA scaling factor (typically 2×r)

  • --lora_dropout: helps prevent overfitting

  • --lora_query, --lora_value, --lora_projection: apply LoRA to the query, value, and output projections

  • --lora_key, --lora_mlp, --lora_head: usually not needed

LoRA rank guide:

  • r=8 : Lightweight, 2-4MB adapters

  • r=16 : Standard, good quality

  • r=32 : High capacity, use for complex tasks

  • r=64 : Maximum quality, 4× larger adapters
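
To see how rank translates into adapter size, note that each adapted weight matrix gains two low-rank factors, roughly r × (d_in + d_out) extra parameters. A back-of-envelope sketch (the 2048-wide, 22-layer model and the choice of two adapted matrices per layer are illustrative assumptions, not Phi-2's exact dimensions):

def lora_adapter_megabytes(r, d_in, d_out, n_matrices, n_layers, bytes_per_param=2):
    """Rough adapter size: factors A (r x d_in) and B (d_out x r) for every adapted matrix."""
    params = r * (d_in + d_out) * n_matrices * n_layers
    return params * bytes_per_param / 1e6

# bf16 adapters on query and value projections of an illustrative 2048-wide, 22-layer model
for r in (8, 16, 32, 64):
    print(f"r={r}: {lora_adapter_megabytes(r, 2048, 2048, n_matrices=2, n_layers=22):.1f} MB")

Doubling r doubles the adapter size, which matches the rank guide above.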

Step 3: Train with LoRA

litgpt finetune_lora \
  microsoft/phi-2 \
  --data JSON \
  --data.json_path data/my_dataset.json \
  --lora_r 16 \
  --train.epochs 3 \
  --train.learning_rate 1e-4 \
  --train.micro_batch_size 4 \
  --train.global_batch_size 32 \
  --out_dir out/phi2-lora

Memory usage: ~8-12GB for Phi-2 with LoRA

Step 4: Merge LoRA weights (optional)

Merge LoRA adapters into base model for deployment:

litgpt merge_lora \
  out/phi2-lora/final \
  --out_dir out/phi2-merged

Now use merged model:

from litgpt import LLM

llm = LLM.load("out/phi2-merged")

Workflow 3: Pretrain from scratch

Train new model on your domain data.

Pretraining:

  • Step 1: Prepare pretraining dataset
  • Step 2: Configure model architecture
  • Step 3: Set up multi-GPU training
  • Step 4: Launch pretraining

Step 1: Prepare pretraining dataset

LitGPT expects tokenized data. Use prepare_dataset.py:

python scripts/prepare_dataset.py \
  --source_path data/my_corpus.txt \
  --checkpoint_dir checkpoints/tokenizer \
  --destination_path data/pretrain \
  --split train,val

Step 2: Configure model architecture

Edit a config file or use an existing one:

config/pythia-160m.yaml

model_name: pythia-160m
block_size: 2048
vocab_size: 50304
n_layer: 12
n_head: 12
n_embd: 768
rotary_percentage: 0.25
parallel_residual: true
bias: true
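
The same fields are available from Python through LitGPT's Config class, which is handy for sanity-checking an architecture before committing GPU time. A minimal sketch (assumes the Config.from_name and GPT classes exported at the package top level; the model is instantiated on CPU only to count parameters):

from litgpt import GPT, Config

# Load the bundled preset by name, or pass the fields above explicitly
config = Config.from_name("pythia-160m")
print(config.n_layer, config.n_head, config.n_embd, config.block_size)

# Instantiate on CPU and count parameters (fine at this size)
model = GPT(config)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")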

Step 3: Set up multi-GPU training

Single GPU

litgpt pretrain \
  --config config/pythia-160m.yaml \
  --data.data_dir data/pretrain \
  --train.max_tokens 10_000_000_000

Multi-GPU with FSDP

litgpt pretrain \
  --config config/pythia-1b.yaml \
  --data.data_dir data/pretrain \
  --devices 8 \
  --train.max_tokens 100_000_000_000
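
train.max_tokens sets the length of the run: each optimizer step consumes roughly global_batch_size × block_size tokens, so the total step count follows directly. A quick planning sketch (the batch size and block size below are illustrative assumptions, not values set by the commands above):

def pretrain_steps(max_tokens, global_batch_size, block_size):
    """Approximate optimizer steps needed to consume max_tokens."""
    return max_tokens // (global_batch_size * block_size)

# Illustrative: 10B tokens at a global batch of 512 sequences of 2048 tokens each
print(pretrain_steps(10_000_000_000, global_batch_size=512, block_size=2048))  # ~9536 steps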

Step 4: Launch pretraining

For large-scale pretraining on cluster:

Using SLURM

sbatch --nodes=8 --gpus-per-node=8 pretrain_script.sh

pretrain_script.sh content:

litgpt pretrain \
  --config config/pythia-1b.yaml \
  --data.data_dir /shared/data/pretrain \
  --devices 8 \
  --num_nodes 8 \
  --train.global_batch_size 512 \
  --train.max_tokens 300_000_000_000

Workflow 4: Convert and deploy model

Export LitGPT models for production.

Model Deployment:

  • Step 1: Test inference locally
  • Step 2: Quantize model (optional)
  • Step 3: Convert to GGUF (for llama.cpp)
  • Step 4: Deploy with API

Step 1: Test inference locally

from litgpt import LLM

llm = LLM.load("out/phi2-lora/final")

# Single generation
print(llm.generate("What is machine learning?"))

# Streaming
for token in llm.generate("Explain quantum computing", stream=True):
    print(token, end="", flush=True)

# Batch inference
prompts = ["Hello", "Goodbye", "Thank you"]
results = [llm.generate(p) for p in prompts]
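
Before deploying, it is worth timing a generation to check that local inference is fast enough. A small sketch (words per second is only a rough proxy for tokens per second; use the tokenizer for exact counts):

import time
from litgpt import LLM

llm = LLM.load("out/phi2-lora/final")

start = time.perf_counter()
text = llm.generate("What is machine learning?", max_new_tokens=100)
elapsed = time.perf_counter() - start
print(f"{elapsed:.2f}s total, ~{len(text.split()) / elapsed:.1f} words/sec")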

Step 2: Quantize model (optional)

Reduce model size with minimal quality loss:

8-bit quantization (50% size reduction)

litgpt convert_lit_checkpoint \
  out/phi2-lora/final \
  --dtype bfloat16 \
  --quantize bnb.int8

4-bit quantization (75% size reduction)

litgpt convert_lit_checkpoint \
  out/phi2-lora/final \
  --quantize bnb.nf4-dq  # double quantization
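
The size-reduction figures follow from bytes per parameter: 16-bit weights take 2 bytes each, int8 takes 1, and nf4 roughly half a byte. A rough sketch for a Phi-2-sized model (2.7B parameters; quantization metadata and layers kept in higher precision are ignored):

def model_size_gb(n_params, bits_per_param):
    """Approximate weight size, ignoring quantization overhead."""
    return n_params * bits_per_param / 8 / 1e9

n_params = 2.7e9  # Phi-2
print(f"bf16: {model_size_gb(n_params, 16):.1f} GB")  # ~5.4 GB
print(f"int8: {model_size_gb(n_params, 8):.1f} GB")   # ~2.7 GB, about 50% smaller
print(f"nf4 : {model_size_gb(n_params, 4):.1f} GB")   # ~1.4 GB, about 75% smaller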

Step 3: Convert to GGUF (for llama.cpp)

python scripts/convert_lit_checkpoint.py \
  --checkpoint_path out/phi2-lora/final \
  --output_path models/phi2.gguf \
  --model_name microsoft/phi-2

Step 4: Deploy with API

from fastapi import FastAPI
from litgpt import LLM

app = FastAPI()
llm = LLM.load("out/phi2-lora/final")

@app.post("/generate")
def generate(prompt: str, max_tokens: int = 100):
    result = llm.generate(
        prompt,
        max_new_tokens=max_tokens,
        temperature=0.7,
    )
    return {"response": result}

Save the code as api.py and run: uvicorn api:app --host 0.0.0.0 --port 8000
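
Because prompt and max_tokens are declared as plain parameters, FastAPI exposes them as query parameters on this endpoint. A minimal client sketch with the requests library (host and prompt are illustrative):

import requests

resp = requests.post(
    "http://localhost:8000/generate",
    params={"prompt": "What is machine learning?", "max_tokens": 50},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["response"])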

When to use vs alternatives

Use LitGPT when:

  • Want to understand LLM architectures (clean, readable code)

  • Need production-ready training recipes

  • Educational purposes or research

  • Prototyping new model ideas

  • Lightning ecosystem user

Use alternatives instead:

  • Axolotl/TRL: More fine-tuning features, YAML configs

  • Megatron-Core: Maximum performance for >70B models

  • HuggingFace Transformers: Broadest model support

  • vLLM: Inference-only (no training)

Common issues

Issue: Out of memory during fine-tuning

Use LoRA instead of full fine-tuning:

Instead of litgpt finetune_full (requires 40GB+)

litgpt finetune_lora # Only needs 12-16GB

Or reduce per-step memory with gradient accumulation:

litgpt finetune_lora
...
--train.gradient_accumulation_iters 4 # Accumulate gradients

Issue: Training too slow

Flash Attention is built in and enabled automatically on compatible hardware (Ampere or newer GPUs such as the A100 and RTX 30/40 series); no configuration is needed.

Use smaller micro-batch and accumulate:

--train.micro_batch_size 1 \
--train.global_batch_size 32 \
--train.gradient_accumulation_iters 32  # effective batch = 32

Issue: Model not loading

Check model name:

List all available models

litgpt download list

Download if not exists

litgpt download meta-llama/Meta-Llama-3-8B

Verify checkpoints directory:

ls checkpoints/

Should see: meta-llama/Meta-Llama-3-8B/

Issue: LoRA adapters too large

Reduce LoRA rank:

--lora_r 8 # Instead of 16 or 32

Apply LoRA to fewer layers:

--lora_query true \
--lora_value true \
--lora_projection false \
--lora_mlp false  # disable the projection and MLP adapters

Advanced topics

Supported architectures: See references/supported-models.md for complete list of 20+ model families with sizes and capabilities.

Training recipes: See references/training-recipes.md for proven hyperparameter configurations for pretraining and fine-tuning.

FSDP configuration: See references/distributed-training.md for multi-GPU training with Fully Sharded Data Parallel.

Custom architectures: See references/custom-models.md for implementing new model architectures in LitGPT style.

Hardware requirements

  • GPU: NVIDIA (CUDA 11.8+), AMD (ROCm), Apple Silicon (MPS)

  • Memory:

      • Inference (Phi-2): 6GB

      • LoRA fine-tuning (7B): 16GB

      • Full fine-tuning (7B): 40GB+

      • Pretraining (1B): 24GB

  • Storage: 5-50GB per model (depending on size)

Resources
