Outlines: Structured Text Generation

Install skill "outlines" with this command: npx skills add davila7/claude-code-templates/davila7-claude-code-templates-outlines

When to Use This Skill

Use Outlines when you need to:

  • Guarantee valid JSON/XML/code structure during generation

  • Use Pydantic models for type-safe outputs

  • Support local models (Transformers, llama.cpp, vLLM)

  • Maximize inference speed with zero-overhead structured generation

  • Generate against JSON schemas automatically

  • Control token sampling at the grammar level

GitHub Stars: 8,000+ | From: dottxt.ai (formerly .txt)

Installation

Base installation

pip install outlines

With specific backends

pip install outlines transformers      # Hugging Face models
pip install outlines llama-cpp-python  # llama.cpp
pip install outlines vllm              # vLLM for high-throughput

Quick Start

Basic Example: Classification

import outlines
from typing import Literal

# Load model
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Generate with type constraint
prompt = "Sentiment of 'This product is amazing!': "
generator = outlines.generate.choice(model, ["positive", "negative", "neutral"])
sentiment = generator(prompt)

print(sentiment)  # "positive" (guaranteed one of these)

With Pydantic Models

from pydantic import BaseModel
import outlines

class User(BaseModel):
    name: str
    age: int
    email: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Generate structured output
prompt = "Extract user: John Doe, 30 years old, john@example.com"
generator = outlines.generate.json(model, User)
user = generator(prompt)

print(user.name)   # "John Doe"
print(user.age)    # 30
print(user.email)  # "john@example.com"

Core Concepts

  1. Constrained Token Sampling

Outlines uses Finite State Machines (FSM) to constrain token generation at the logit level.

How it works:

  • Convert the schema (JSON/Pydantic) to a regular expression; regex constraints are used as given

  • Compile the regex into a Finite State Machine (FSM) over the model's token vocabulary

  • Filter invalid tokens at each step during generation

  • Fast-forward when only one valid token exists

Benefits:

  • Zero overhead: Filtering happens at token level

  • Speed improvement: Fast-forward through deterministic paths

  • Guaranteed validity: Invalid outputs impossible

import outlines
from pydantic import BaseModel

# Pydantic model -> JSON schema -> regex -> FSM
class Person(BaseModel):
    name: str
    age: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Behind the scenes:
# 1. Person -> JSON schema
# 2. JSON schema -> regex
# 3. regex -> FSM
# 4. FSM filters tokens during generation
generator = outlines.generate.json(model, Person)
result = generator("Generate person: Alice, 25")
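The filtering and fast-forward steps can be illustrated with a toy decoder. This is a minimal sketch and not Outlines' implementation: the eight-token vocabulary, the hand-written state machine for the pattern `[0-9]{2}-[0-9]{2}`, and the `fake_logits` function are all hypothetical stand-ins for a real tokenizer, a compiled FSM, and a model forward pass.

```python
import random

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of tokens.
VOCAB = ["0", "1", "7", "42", "-", "a", "hello", "<eos>"]

ACCEPT = 5  # state reached after consuming two digits, a dash, and two digits


def step_char(state, ch):
    """Return the next state for one character, or None if it is not allowed."""
    if state in (0, 1, 3, 4) and ch.isdigit():
        return state + 1
    if state == 2 and ch == "-":
        return 3
    return None


def step_token(state, token):
    """Advance the state machine over every character of a token."""
    if token == "<eos>":
        return ACCEPT if state == ACCEPT else None
    for ch in token:
        state = step_char(state, ch)
        if state is None:
            return None
    return state


# Precompute the allowed tokens for each state once, before generation starts.
ALLOWED = {
    s: [t for t in VOCAB if step_token(s, t) is not None]
    for s in range(ACCEPT + 1)
}


def fake_logits(rng):
    """Stand-in for a model forward pass: one random score per token."""
    return {t: rng.random() for t in VOCAB}


def generate(seed=0):
    """Greedy decoding restricted to the tokens allowed in the current state."""
    rng = random.Random(seed)
    state, out, model_calls = 0, [], 0
    while True:
        allowed = ALLOWED[state]
        if len(allowed) == 1:
            token = allowed[0]  # fast-forward: only one option, skip the model
        else:
            logits = fake_logits(rng)
            model_calls += 1
            token = max(allowed, key=logits.__getitem__)  # masked greedy pick
        if token == "<eos>":
            return "".join(out), model_calls
        out.append(token)
        state = step_token(state, token)


text, model_calls = generate(seed=0)
print(text, model_calls)
```

Every string this loop returns matches the pattern, whatever scores the stand-in model produces, and the dash never costs a model call because it is the only allowed token in its state.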

  2. Structured Generators

Outlines provides specialized generators for different output types.

Choice Generator

# Multiple choice selection
generator = outlines.generate.choice(
    model,
    ["positive", "negative", "neutral"]
)

sentiment = generator("Review: This is great!")
# Result: One of the three choices

JSON Generator

from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool

# Generate valid JSON matching schema
generator = outlines.generate.json(model, Product)
product = generator("Extract: iPhone 15, $999, available")

# Guaranteed valid Product instance
print(type(product))  # <class '__main__.Product'>

Regex Generator

# Generate text matching regex
generator = outlines.generate.regex(
    model,
    r"[0-9]{3}-[0-9]{3}-[0-9]{4}"  # Phone number pattern
)

phone = generator("Generate phone number:")
# Result: "555-123-4567" (guaranteed to match pattern)
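When testing a pipeline built on the regex generator, it can help to assert the pattern independently of the model. The sketch below uses only the standard library; `matches_pattern` is a hypothetical helper name, not part of Outlines.

```python
import re

PHONE_PATTERN = r"[0-9]{3}-[0-9]{3}-[0-9]{4}"


def matches_pattern(text: str, pattern: str = PHONE_PATTERN) -> bool:
    """Return True only if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


print(matches_pattern("555-123-4567"))       # True
print(matches_pattern("555-123-456"))        # False
print(matches_pattern("call 555-123-4567"))  # False
```

`re.fullmatch` is used rather than `re.search` so that extra text around a valid phone number counts as a failure.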

Integer/Float Generators

# Generate specific numeric types
int_generator = outlines.generate.integer(model)
age = int_generator("Person's age:")  # Guaranteed integer

float_generator = outlines.generate.float(model)
price = float_generator("Product price:")  # Guaranteed float

  3. Model Backends

Outlines supports multiple local and API-based backends.

Transformers (Hugging Face)

import outlines

# Load from Hugging Face
model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cuda"  # Or "cpu"
)

# Use with any generator
generator = outlines.generate.json(model, YourModel)

llama.cpp

# Load GGUF model
model = outlines.models.llamacpp(
    "./models/llama-3.1-8b-instruct.Q4_K_M.gguf",
    n_gpu_layers=35
)

generator = outlines.generate.json(model, YourModel)

vLLM (High Throughput)

# For production deployments
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel_size=2  # Multi-GPU
)

generator = outlines.generate.json(model, YourModel)

OpenAI (Limited Support)

# Basic OpenAI support
model = outlines.models.openai(
    "gpt-4o-mini",
    api_key="your-api-key"
)

# Note: Some features are limited with API models
generator = outlines.generate.json(model, YourModel)
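The `api_key="your-api-key"` placeholder above is best filled from the environment instead of a string literal in source code. A minimal sketch, assuming the key is stored in an `OPENAI_API_KEY` environment variable; `get_api_key` is a hypothetical helper, not an Outlines function.

```python
import os


def get_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the API key from the environment and fail loudly if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Usage (commented out so the sketch runs without a key set):
# model = outlines.models.openai("gpt-4o-mini", api_key=get_api_key())
```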

  4. Pydantic Integration

Outlines has first-class Pydantic support with automatic schema translation.

Basic Models

import outlines
from pydantic import BaseModel, Field

class Article(BaseModel):
    title: str = Field(description="Article title")
    author: str = Field(description="Author name")
    word_count: int = Field(description="Number of words", gt=0)
    tags: list[str] = Field(description="List of tags")

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, Article)

article = generator("Generate article about AI")
print(article.title)
print(article.word_count)  # Guaranteed > 0

Nested Models

class Address(BaseModel):
    street: str
    city: str
    country: str

class Person(BaseModel):
    name: str
    age: int
    address: Address  # Nested model

generator = outlines.generate.json(model, Person)
person = generator("Generate person in New York")

print(person.address.city)  # "New York"

Enums and Literals

from enum import Enum
from typing import Literal

class Status(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

class Application(BaseModel):
    applicant: str
    status: Status  # Must be one of enum values
    priority: Literal["low", "medium", "high"]  # Must be one of literals

generator = outlines.generate.json(model, Application)
app = generator("Generate application")

print(app.status)  # Status.PENDING (or APPROVED/REJECTED)
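The set of values an `Enum` or `Literal` allows can be listed with the standard library alone, for example to reuse the same labels in a choice list. A minimal sketch; `Priority` is a hypothetical alias introduced here for the `Literal` type.

```python
from enum import Enum
from typing import Literal, get_args


class Status(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


Priority = Literal["low", "medium", "high"]

# Enum members iterate in definition order; get_args returns the literal values.
status_values = [s.value for s in Status]
priority_values = list(get_args(Priority))

print(status_values)    # ['pending', 'approved', 'rejected']
print(priority_values)  # ['low', 'medium', 'high']
```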

Common Patterns

Pattern 1: Data Extraction

from pydantic import BaseModel
import outlines

class CompanyInfo(BaseModel):
    name: str
    founded_year: int
    industry: str
    employees: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, CompanyInfo)

text = """
Apple Inc. was founded in 1976 in the technology industry.
The company employs approximately 164,000 people worldwide.
"""

prompt = f"Extract company information:\n{text}\n\nCompany:"
company = generator(prompt)

print(f"Name: {company.name}")
print(f"Founded: {company.founded_year}")
print(f"Industry: {company.industry}")
print(f"Employees: {company.employees}")

Pattern 2: Classification

from typing import Literal
from pydantic import BaseModel
import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Binary classification
generator = outlines.generate.choice(model, ["spam", "not_spam"])
result = generator("Email: Buy now! 50% off!")

# Multi-class classification
categories = ["technology", "business", "sports", "entertainment"]
category_gen = outlines.generate.choice(model, categories)
category = category_gen("Article: Apple announces new iPhone...")

# With confidence
class Classification(BaseModel):
    label: Literal["positive", "negative", "neutral"]
    confidence: float

classifier = outlines.generate.json(model, Classification)
result = classifier("Review: This product is okay, nothing special")

Pattern 3: Structured Forms

from pydantic import BaseModel
import outlines

class UserProfile(BaseModel):
    full_name: str
    age: int
    email: str
    phone: str
    country: str
    interests: list[str]

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, UserProfile)

prompt = """
Extract user profile from:
Name: Alice Johnson
Age: 28
Email: alice@example.com
Phone: 555-0123
Country: USA
Interests: hiking, photography, cooking
"""

profile = generator(prompt)
print(profile.full_name)
print(profile.interests)  # ["hiking", "photography", "cooking"]

Pattern 4: Multi-Entity Extraction

from typing import Literal
from pydantic import BaseModel
import outlines

class Entity(BaseModel):
    name: str
    type: Literal["PERSON", "ORGANIZATION", "LOCATION"]

class DocumentEntities(BaseModel):
    entities: list[Entity]

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, DocumentEntities)

text = "Tim Cook met with Satya Nadella at Microsoft headquarters in Redmond."
prompt = f"Extract entities from: {text}"

result = generator(prompt)
for entity in result.entities:
    print(f"{entity.name} ({entity.type})")

Pattern 5: Code Generation

from pydantic import BaseModel
import outlines

class PythonFunction(BaseModel):
    function_name: str
    parameters: list[str]
    docstring: str
    body: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, PythonFunction)

prompt = "Generate a Python function to calculate factorial"
func = generator(prompt)

print(f"def {func.function_name}({', '.join(func.parameters)}):")
print(f'    """{func.docstring}"""')
print(f"    {func.body}")
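The schema guarantees that the four fields exist and are strings, but it says nothing about whether `body` is valid Python or how a multi-line body should be indented. One way to handle both is sketched below using only the standard library; `render_function` is a hypothetical helper and the factorial fields are hand-written sample values, not model output.

```python
import ast
import textwrap


def render_function(function_name, parameters, docstring, body):
    """Assemble function source from the four fields and check that it parses."""
    source = (
        f"def {function_name}({', '.join(parameters)}):\n"
        f'    """{docstring}"""\n'
        + textwrap.indent(textwrap.dedent(body).strip("\n"), "    ")
        + "\n"
    )
    ast.parse(source)  # raises SyntaxError if the assembled code is not valid
    return source


source = render_function(
    "factorial",
    ["n"],
    "Return n! for a non-negative integer n.",
    "result = 1\nfor i in range(2, n + 1):\n    result *= i\nreturn result",
)
print(source)
```

Indenting every line of the body with `textwrap.indent` keeps multi-line bodies inside the function, and `ast.parse` rejects syntactically broken output before it is written to disk or executed.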

Pattern 6: Batch Processing

from pydantic import BaseModel
import outlines

def batch_extract(texts: list[str], schema: type[BaseModel]):
    """Extract structured data from multiple texts."""
    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
    generator = outlines.generate.json(model, schema)

    results = []
    for text in texts:
        result = generator(f"Extract from: {text}")
        results.append(result)

    return results

class Person(BaseModel):
    name: str
    age: int

texts = [
    "John is 30 years old",
    "Alice is 25 years old",
    "Bob is 40 years old"
]

people = batch_extract(texts, Person)
for person in people:
    print(f"{person.name}: {person.age}")
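`batch_extract` loads the model each time it is called. A variant that accepts an already-built generator lets the caller load the model once and reuse it across batches. The sketch below is an assumption-laden illustration: `batch_extract_with` is a hypothetical name, and `fake_generator` is a stand-in for an Outlines generator so the example runs without a model.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def batch_extract_with(generator: Callable[[str], T], texts: Iterable[str]) -> list[T]:
    """Apply an already-built generator to each text, reusing it across calls."""
    return [generator(f"Extract from: {text}") for text in texts]


def fake_generator(prompt: str) -> dict:
    """Stand-in for a structured generator; parses '<name> is <age> years old'."""
    name, _, rest = prompt.removeprefix("Extract from: ").partition(" is ")
    return {"name": name, "age": int(rest.split()[0])}


people = batch_extract_with(
    fake_generator,
    ["John is 30 years old", "Alice is 25 years old"],
)
print(people)  # [{'name': 'John', 'age': 30}, {'name': 'Alice', 'age': 25}]
```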

Backend Configuration

Transformers

import outlines

# Basic usage
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# GPU configuration
model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cuda",
    model_kwargs={"torch_dtype": "float16"}
)

# Popular models
model = outlines.models.transformers("meta-llama/Llama-3.1-8B-Instruct")
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
model = outlines.models.transformers("Qwen/Qwen2.5-7B-Instruct")

llama.cpp

# Load GGUF model
model = outlines.models.llamacpp(
    "./models/llama-3.1-8b.Q4_K_M.gguf",
    n_ctx=4096,       # Context window
    n_gpu_layers=35,  # GPU layers
    n_threads=8       # CPU threads
)

# Full GPU offload
model = outlines.models.llamacpp(
    "./models/model.gguf",
    n_gpu_layers=-1  # All layers on GPU
)

vLLM (Production)

# Single GPU
model = outlines.models.vllm("meta-llama/Llama-3.1-8B-Instruct")

# Multi-GPU
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-70B-Instruct",
    tensor_parallel_size=4  # 4 GPUs
)

# With quantization
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization="awq"  # Or "gptq"
)

Best Practices

  1. Use Specific Types

# ✅ Good: Specific types
class Product(BaseModel):
    name: str
    price: float    # Not str
    quantity: int   # Not str
    in_stock: bool  # Not str

# ❌ Bad: Everything as string
class Product(BaseModel):
    name: str
    price: str     # Should be float
    quantity: str  # Should be int

  2. Add Constraints

from pydantic import BaseModel, Field

# ✅ Good: With constraints
class User(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=0, le=120)
    email: str = Field(pattern=r"^[\w.-]+@[\w.-]+\.\w+$")

# ❌ Bad: No constraints
class User(BaseModel):
    name: str
    age: int
    email: str
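Regex constraints are only as strict as the pattern itself, so it is worth testing a pattern with `re` before attaching it to a field. The sketch below compares an email pattern with an escaped dot against the same pattern with an unescaped one; the names `STRICT`, `LOOSE`, and `is_match` are hypothetical and the sample addresses are made up.

```python
import re

STRICT = r"^[\w.-]+@[\w.-]+\.\w+$"  # literal dot before the top-level domain
LOOSE = r"^[\w.-]+@[\w.-]+.\w+$"    # unescaped dot matches any character


def is_match(pattern: str, text: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


print(is_match(STRICT, "john@example.com"))  # True
print(is_match(STRICT, "john@example"))      # False
print(is_match(LOOSE, "john@example"))       # True
```

The loose pattern accepts an address with no dot in the domain, which a generator constrained by it would also be free to produce.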

  3. Use Enums for Categories

from enum import Enum

# ✅ Good: Enum for fixed set
class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class Task(BaseModel):
    title: str
    priority: Priority

# ❌ Bad: Free-form string
class Task(BaseModel):
    title: str
    priority: str  # Can be anything

  4. Provide Context in Prompts

# ✅ Good: Clear context
prompt = """
Extract product information from the following text.
Text: iPhone 15 Pro costs $999 and is currently in stock.
Product:
"""

# ❌ Bad: Minimal context
prompt = "iPhone 15 Pro costs $999 and is currently in stock."
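To keep the instruction, the source text, and the closing cue consistent across calls, the prompt can be built by a small helper. A minimal sketch; `build_extraction_prompt` and its `target` parameter are hypothetical names chosen for illustration.

```python
def build_extraction_prompt(text: str, target: str = "Product") -> str:
    """Wrap raw text in an instruction and end with a cue for the answer."""
    return (
        f"Extract {target.lower()} information from the following text.\n"
        f"Text: {text.strip()}\n"
        f"{target}:"
    )


prompt = build_extraction_prompt(
    "iPhone 15 Pro costs $999 and is currently in stock."
)
print(prompt)
```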

  5. Handle Optional Fields

from typing import Optional

# ✅ Good: Optional fields for incomplete data
class Article(BaseModel):
    title: str                    # Required
    author: Optional[str] = None  # Optional
    date: Optional[str] = None    # Optional
    tags: list[str] = []          # Default empty list

# Can succeed even if author/date are missing

Comparison to Alternatives

| Feature            | Outlines   | Instructor | Guidance   | LMQL    |
|--------------------|------------|------------|------------|---------|
| Pydantic Support   | ✅ Native  | ✅ Native  | ❌ No      | ❌ No   |
| JSON Schema        | ✅ Yes     | ✅ Yes     | ⚠️ Limited | ✅ Yes  |
| Regex Constraints  | ✅ Yes     | ❌ No      | ✅ Yes     | ✅ Yes  |
| Local Models       | ✅ Full    | ⚠️ Limited | ✅ Full    | ✅ Full |
| API Models         | ⚠️ Limited | ✅ Full    | ✅ Full    | ✅ Full |
| Zero Overhead      | ✅ Yes     | ❌ No      | ⚠️ Partial | ✅ Yes  |
| Automatic Retrying | ❌ No      | ✅ Yes     | ❌ No      | ❌ No   |
| Learning Curve     | Low        | Low        | Low        | High    |

When to choose Outlines:

  • Using local models (Transformers, llama.cpp, vLLM)

  • Need maximum inference speed

  • Want Pydantic model support

  • Require zero-overhead structured generation

  • Control token sampling process

When to choose alternatives:

  • Instructor: Need API models with automatic retrying

  • Guidance: Need token healing and complex workflows

  • LMQL: Prefer declarative query syntax

Performance Characteristics

Speed:

  • Zero overhead: Structured generation as fast as unconstrained

  • Fast-forward optimization: Skips deterministic tokens

  • 1.2-2x faster than post-generation validation approaches

Memory:

  • FSM compiled once per schema (cached)

  • Minimal runtime overhead

  • Efficient with vLLM for high throughput

Accuracy:

  • 100% valid outputs (guaranteed by FSM)

  • No retry loops needed

  • Deterministic token filtering

Resources

See Also

  • references/json_generation.md: Comprehensive JSON and Pydantic patterns

  • references/backends.md: Backend-specific configuration

  • references/examples.md: Production-ready examples
