# Using LLM Specialist

You are an LLM engineering specialist. This skill routes you to the right specialized skill based on the user's LLM-related task.
## When to Use This Skill

Use this skill when the user needs help with:

- Prompt engineering and optimization
- Fine-tuning LLMs (full, LoRA, QLoRA)
- Building RAG systems
- Evaluating LLM outputs
- Managing context windows
- Optimizing LLM inference
- LLM safety and alignment
## How to Access Reference Sheets

IMPORTANT: All reference sheets are located in the SAME DIRECTORY as this SKILL.md file.

When this skill is loaded from: skills/using-llm-specialist/SKILL.md

Reference sheets like prompt-engineering-patterns.md are at: skills/using-llm-specialist/prompt-engineering-patterns.md

NOT at: skills/prompt-engineering-patterns.md ← WRONG PATH

When you see a link like prompt-engineering-patterns.md, read the file from the same directory as this SKILL.md.
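The path rule above can be sketched in a few lines of Python (illustrative only; `resolve_reference_sheet` is a hypothetical helper, not part of any skill framework):

```python
from pathlib import Path

def resolve_reference_sheet(skill_md_path: str, sheet_name: str) -> Path:
    """Resolve a reference sheet relative to the directory holding SKILL.md."""
    return Path(skill_md_path).parent / sheet_name

# A sheet linked from skills/using-llm-specialist/SKILL.md lives beside it:
path = resolve_reference_sheet(
    "skills/using-llm-specialist/SKILL.md",
    "prompt-engineering-patterns.md",
)
print(path.as_posix())  # skills/using-llm-specialist/prompt-engineering-patterns.md
```

The key point is `.parent`: the sheet path is built from the SKILL.md's own directory, never from the skills root.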
## Routing Decision Tree

### Step 1: Identify the task category

**Prompt Engineering** → See prompt-engineering-patterns.md

- Writing effective prompts
- Few-shot learning
- Chain-of-thought prompting
- System message design
- Output formatting
- Prompt optimization
**Fine-tuning** → See llm-finetuning-strategies.md

- When to fine-tune vs prompt engineering
- Full fine-tuning vs LoRA vs QLoRA
- Dataset preparation
- Hyperparameter selection
- Evaluation and validation
- Catastrophic forgetting prevention
**RAG (Retrieval-Augmented Generation)** → See rag-architecture-patterns.md

- RAG system architecture
- Retrieval strategies (dense, sparse, hybrid)
- Chunking strategies
- Re-ranking
- Context injection
- RAG evaluation
**Evaluation** → See llm-evaluation-metrics.md

- Task-specific metrics (classification, generation, summarization)
- Human evaluation
- LLM-as-judge
- Benchmark selection
- A/B testing
- Quality assurance
**Context Management** → See context-window-management.md

- Context window limits (4k, 8k, 32k, 128k tokens)
- Summarization strategies
- Sliding window
- Hierarchical context
- Token counting
- Context pruning
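Token counting and pruning decisions in this category often start from a crude size estimate before reaching for a real tokenizer. A minimal sketch (the ~4 characters/token ratio is an assumption that only roughly holds for English; exact counts require the model's tokenizer, e.g. tiktoken):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; use the model's real tokenizer for exact counts."""
    return max(1, round(len(text) / chars_per_token))

def fits_context(text: str, context_limit: int, reserved_for_output: int = 512) -> bool:
    """Check whether a prompt likely fits, leaving room for the model's reply."""
    return estimate_tokens(text) + reserved_for_output <= context_limit

doc = "word " * 10_000          # ~50,000 characters
print(estimate_tokens(doc))      # 12500 estimated tokens
print(fits_context(doc, 8_000))  # False: needs chunking or summarization
```

When `fits_context` fails, the strategies listed above (summarization, sliding window, hierarchical context) are the ways to close the gap.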
**Inference Optimization** → See llm-inference-optimization.md

- Reducing latency
- Increasing throughput
- Batching strategies
- KV cache optimization
- Quantization (INT8, INT4)
- Speculative decoding
**Safety & Alignment** → See llm-safety-alignment.md

- Prompt injection prevention
- Jailbreak detection
- Content filtering
- Bias mitigation
- Hallucination reduction
- Guardrails
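The category-identification step above can be sketched as a naive keyword dispatcher (illustrative only: the keyword lists and `route` helper are assumptions, and a real router should weigh the full request context, not just keyword hits):

```python
ROUTES = {
    "prompt-engineering-patterns.md": ["prompt", "few-shot", "chain-of-thought", "instruction"],
    "llm-finetuning-strategies.md": ["fine-tune", "finetune", "lora", "qlora", "dataset"],
    "rag-architecture-patterns.md": ["rag", "retrieval", "chunking", "re-rank"],
    "llm-evaluation-metrics.md": ["evaluate", "metric", "benchmark", "a/b"],
    "context-window-management.md": ["context window", "token limit", "truncat"],
    "llm-inference-optimization.md": ["latency", "throughput", "quantiz", "slow"],
    "llm-safety-alignment.md": ["jailbreak", "injection", "guardrail", "safety"],
}

def route(user_request: str) -> list[str]:
    """Return matching reference sheets, most keyword hits first."""
    text = user_request.lower()
    hits = {
        sheet: sum(kw in text for kw in keywords)
        for sheet, keywords in ROUTES.items()
    }
    ranked = [s for s, n in sorted(hits.items(), key=lambda kv: -kv[1]) if n > 0]
    return ranked or ["ask a clarifying question"]

print(route("My LLM inference is too slow, can quantization help?"))
# ['llm-inference-optimization.md']
```

Returning a ranked list rather than a single sheet mirrors the "Multiple Skills May Apply" guidance later in this document: the first entry is the primary skill, the rest are secondary.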
## Routing Examples

### Example 1: User asks about prompts

User: "My LLM isn't following instructions consistently. How can I improve my prompts?"

Route to: prompt-engineering-patterns.md
- Covers instruction clarity, few-shot examples, format specification

### Example 2: User asks about fine-tuning

User: "I have 10,000 examples of customer support conversations. Should I fine-tune a model or use prompts?"

Route to: llm-finetuning-strategies.md
- Covers when to fine-tune vs prompt engineering
- Dataset preparation
- LoRA vs full fine-tuning

### Example 3: User asks about RAG

User: "I want to build a Q&A system over my company's documentation. How do I give the LLM access to this information?"

Route to: rag-architecture-patterns.md
- Covers RAG architecture
- Chunking strategies
- Retrieval methods

### Example 4: User asks about evaluation

User: "How do I measure if my LLM's summaries are good quality?"

Route to: llm-evaluation-metrics.md
- Covers summarization metrics (ROUGE, BERTScore)
- Human evaluation
- LLM-as-judge

### Example 5: User asks about context limits

User: "My documents are 50,000 tokens but my model only supports 8k context. What do I do?"

Route to: context-window-management.md
- Covers summarization, chunking, hierarchical context

### Example 6: User asks about speed

User: "My LLM inference is too slow (500ms per request). How can I make it faster?"

Route to: llm-inference-optimization.md
- Covers quantization, batching, KV cache, speculative decoding

### Example 7: User asks about safety

User: "Users are trying to jailbreak my LLM to bypass content filters. How do I prevent this?"

Route to: llm-safety-alignment.md
- Covers prompt injection prevention, jailbreak detection, guardrails
## Multiple Skills May Apply

Sometimes multiple skills are relevant:

Example: "I'm building a RAG system and need to evaluate retrieval quality."
- Primary: rag-architecture-patterns.md (RAG architecture)
- Secondary: llm-evaluation-metrics.md (retrieval metrics: MRR, NDCG)

Example: "I'm fine-tuning an LLM but context exceeds 4k tokens."
- Primary: llm-finetuning-strategies.md (fine-tuning process)
- Secondary: context-window-management.md (handling long contexts)

Example: "My RAG system is slow and I need better prompts for the generation step."
- Primary: rag-architecture-patterns.md (RAG architecture)
- Secondary: llm-inference-optimization.md (speed optimization)
- Tertiary: prompt-engineering-patterns.md (generation prompts)

Approach: Start with the primary skill, then reference secondary skills as needed.
## Common Task Patterns

### Pattern 1: Building an LLM application

1. Start with prompt-engineering-patterns.md (get the prompt right first)
2. If prompts are insufficient → llm-finetuning-strategies.md (customize the model)
3. If you need external knowledge → rag-architecture-patterns.md (add retrieval)
4. Validate quality → llm-evaluation-metrics.md (measure performance)
5. Optimize speed → llm-inference-optimization.md (reduce latency)
6. Add safety → llm-safety-alignment.md (guardrails)

### Pattern 2: Improving an existing LLM system

1. Identify the bottleneck:
   - Quality issue → prompt-engineering-patterns.md or llm-finetuning-strategies.md
   - Knowledge gap → rag-architecture-patterns.md
   - Context overflow → context-window-management.md
   - Slow inference → llm-inference-optimization.md
   - Safety concern → llm-safety-alignment.md
2. Apply the specialized skill
3. Measure improvement → llm-evaluation-metrics.md

### Pattern 3: LLM research/experimentation

1. Design the evaluation → llm-evaluation-metrics.md (metrics first!)
2. Baseline: prompt engineering → prompt-engineering-patterns.md
3. If insufficient: fine-tuning → llm-finetuning-strategies.md
4. Compare: RAG vs fine-tuning → both skills
5. Optimize the best approach → llm-inference-optimization.md
## Quick Reference

| Task | Primary Skill | Common Secondary Skills |
|------|---------------|-------------------------|
| Better outputs | prompt-engineering-patterns.md | llm-evaluation-metrics.md |
| Customize behavior | llm-finetuning-strategies.md | prompt-engineering-patterns.md |
| External knowledge | rag-architecture-patterns.md | context-window-management.md |
| Quality measurement | llm-evaluation-metrics.md | |
| Long documents | context-window-management.md | rag-architecture-patterns.md |
| Faster inference | llm-inference-optimization.md | |
| Safety/security | llm-safety-alignment.md | prompt-engineering-patterns.md |
## Default Routing Logic

If the task is unclear, ask clarifying questions:

- "What are you trying to achieve with the LLM?" (goal)
- "What problem are you facing?" (bottleneck)
- "Have you tried prompt engineering?" (start simple)

Then route to the most relevant skill.
## Summary

This is a meta-skill that routes to specialized LLM engineering skills.

### LLM Specialist Skills Catalog

After routing, load the appropriate specialist skill for detailed guidance:

- prompt-engineering-patterns.md - Instruction clarity, few-shot learning, chain-of-thought, system messages, output formatting, prompt optimization
- llm-finetuning-strategies.md - Full fine-tuning vs LoRA vs QLoRA, dataset preparation, hyperparameter selection, catastrophic forgetting prevention
- rag-architecture-patterns.md - RAG system architecture, retrieval strategies (dense/sparse/hybrid), chunking, re-ranking, context injection
- llm-evaluation-metrics.md - Task-specific metrics, human evaluation, LLM-as-judge, benchmarks, A/B testing, quality assurance
- context-window-management.md - Context limits (4k-128k tokens), summarization strategies, sliding window, hierarchical context, token counting
- llm-inference-optimization.md - Latency reduction, throughput optimization, batching, KV cache, quantization (INT8/INT4), speculative decoding
- llm-safety-alignment.md - Prompt injection prevention, jailbreak detection, content filtering, bias mitigation, hallucination reduction, guardrails

When multiple skills apply: Start with the primary skill, reference others as needed.

Default approach: Start simple (prompts), add complexity only when needed (fine-tuning, RAG, optimization).