modal-deployment

Modal is a serverless platform for running Python in the cloud with zero configuration. Define everything in code—no YAML, Docker, or Kubernetes required.


Quick Start

```python
import modal

app = modal.App("my-app")

@app.function()
def hello():
    return "Hello from Modal!"

@app.local_entrypoint()
def main():
    print(hello.remote())
```

Run: modal run app.py

Core Concepts

Functions

Decorate Python functions to run remotely:

```python
@app.function(gpu="H100", memory=32768, timeout=600)
def train_model(data):
    # Runs on an H100 GPU with 32GB RAM and a 10-minute timeout
    return model.fit(data)
```

Images

Define container environments via method chaining:

```python
image = (
    modal.Image.debian_slim(python_version="3.12")
    .apt_install("ffmpeg", "libsndfile1")
    .uv_pip_install("torch", "transformers", "numpy")
    .env({"CUDA_VISIBLE_DEVICES": "0"})
)

app = modal.App("ml-app", image=image)
```

Key image methods:

  • .debian_slim() / .micromamba(): base images

  • .uv_pip_install() / .pip_install(): Python packages

  • .apt_install(): system packages

  • .run_commands(): shell commands

  • .add_local_python_source(): local modules

  • .env(): environment variables

GPUs

Attach GPUs with a single parameter:

```python
@app.function(gpu="H100")                    # Single H100
@app.function(gpu="A100-80GB")               # 80GB A100
@app.function(gpu="H100:4")                  # 4x H100
@app.function(gpu=["H100", "A100-40GB:2"])   # Fallback options, tried in order
```

Available: B200, H200, H100, A100-80GB, A100-40GB, L40S, L4, A10G, T4
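The TYPE:COUNT spec strings above can be unpacked mechanically. A small illustrative helper (not part of the Modal API, which parses these internally):

```python
def parse_gpu_spec(spec: str) -> tuple[str, int]:
    """Split a Modal-style GPU spec such as 'H100:4' into (gpu_type, count).

    Illustrative only -- Modal handles this parsing itself.
    """
    gpu_type, _, count = spec.partition(":")
    return gpu_type, int(count) if count else 1

parse_gpu_spec("H100:4")  # ("H100", 4)
parse_gpu_spec("L4")      # ("L4", 1) -- no count means a single GPU
```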

Classes with Lifecycle Hooks

Load models once at container startup:

```python
@app.cls(gpu="L40S")
class Model:
    @modal.enter()
    def load(self):
        self.model = load_pretrained("model-name")

    @modal.method()
    def predict(self, x):
        return self.model(x)
```

Usage:

```python
Model().predict.remote(data)
```

Web Endpoints

Deploy APIs instantly:

```python
@app.function()
@modal.fastapi_endpoint()
def api(text: str):
    return {"result": process(text)}
```

For complex apps:

```python
@app.function()
@modal.asgi_app()
def fastapi_app():
    from fastapi import FastAPI

    web = FastAPI()

    @web.get("/health")
    def health():
        return {"status": "ok"}

    return web
```

Volumes (Persistent Storage)

```python
volume = modal.Volume.from_name("my-data", create_if_missing=True)

@app.function(volumes={"/data": volume})
def save_file(content: str):
    with open("/data/output.txt", "w") as f:
        f.write(content)
    volume.commit()  # Persist changes
```

Secrets

```python
@app.function(secrets=[modal.Secret.from_name("my-api-key")])
def call_api():
    import os

    key = os.environ["API_KEY"]
```

Create secrets in the dashboard or via the CLI: modal secret create my-secret KEY=value
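At runtime, each key stored in the secret surfaces as an environment variable inside the container. A local sketch of defensive access (the key name API_KEY is illustrative; it must match what you stored in the secret):

```python
import os

# Simulate what Modal injects at runtime: each secret key becomes an
# environment variable in the container. "API_KEY" here is illustrative.
os.environ.setdefault("API_KEY", "local-test-value")

key = os.environ.get("API_KEY")
if key is None:
    # Failing loudly at startup beats a confusing error mid-request
    raise RuntimeError("API_KEY not set -- is the secret attached to the function?")
```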

Dicts (Distributed Key-Value Store)

```python
cache = modal.Dict.from_name("my-cache", create_if_missing=True)

@app.function()
def cached_compute(key: str):
    if key in cache:
        return cache[key]
    result = expensive_computation(key)
    cache[key] = result
    return result
```
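The read-through caching pattern can be exercised locally with a plain dict standing in for modal.Dict, which supports the same mapping-style access (the stand-in and expensive_computation are illustrative):

```python
cache: dict[str, int] = {}   # local stand-in for modal.Dict
calls: list[str] = []        # track which keys were actually computed

def expensive_computation(key: str) -> int:
    calls.append(key)
    return len(key) ** 2

def cached_compute(key: str) -> int:
    if key in cache:                    # hit: skip the expensive call
        return cache[key]
    result = expensive_computation(key)
    cache[key] = result                 # populate for subsequent callers
    return result

cached_compute("abc")  # computes and stores 9
cached_compute("abc")  # served from cache; no second computation
```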

Queues (Distributed FIFO)

```python
queue = modal.Queue.from_name("task-queue", create_if_missing=True)

@app.function()
def producer():
    queue.put_many([{"task": i} for i in range(10)])

@app.function()
def consumer():
    while task := queue.get(timeout=60):
        process(task)
```
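modal.Queue is FIFO, so the producer/consumer handoff behaves like the standard library's queue.Queue, which makes a convenient local stand-in when testing consumer logic:

```python
import queue

q: "queue.Queue[dict]" = queue.Queue()  # local FIFO stand-in for modal.Queue

# Producer side: enqueue tasks
for i in range(3):
    q.put({"task": i})

# Consumer side: drain in arrival order
drained = []
while not q.empty():
    drained.append(q.get())
```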

Parallel Processing

```python
# Map over inputs (auto-parallelized)
results = list(process.map(items))

# Spawn async jobs
calls = [process.spawn(item) for item in items]
results = [call.get() for call in calls]

# Batch processing (up to 1M inputs)
process.spawn_map(range(100_000))
```
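.map() fans inputs out across containers but returns results in input order, much like the standard library's executor map. A local analogue of that contract (process here is a stand-in worker, not a Modal function):

```python
from concurrent.futures import ThreadPoolExecutor

def process(item: int) -> int:
    # Stand-in for a remote Modal function
    return item * 2

# Like process.map(items) on Modal: parallel execution, ordered results
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process, range(5)))
```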

Scheduling

```python
@app.function(schedule=modal.Period(hours=1))
def hourly_job():
    pass

@app.function(schedule=modal.Cron("0 9 * * 1-5"))  # 9am weekdays
def daily_report():
    pass
```
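The Cron string "0 9 * * 1-5" uses the standard five-field format: minute, hour, day-of-month, month, day-of-week. A small parser sketch (illustrative only) that makes the fields readable:

```python
def parse_cron(expr: str) -> dict[str, str]:
    """Name the five fields of a standard cron expression (illustrative only)."""
    names = ["minute", "hour", "day_of_month", "month", "day_of_week"]
    fields = expr.split()
    if len(fields) != len(names):
        raise ValueError(f"expected 5 cron fields, got {len(fields)}")
    return dict(zip(names, fields))

parse_cron("0 9 * * 1-5")
# {'minute': '0', 'hour': '9', 'day_of_month': '*', 'month': '*', 'day_of_week': '1-5'}
```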

CLI Commands

```shell
modal run app.py       # Run a locally-triggered function
modal serve app.py     # Hot-reload web endpoints
modal deploy app.py    # Deploy persistently
modal shell app.py     # Interactive shell in the container
modal app list         # List deployed apps
modal app logs <name>  # Stream logs
modal volume list      # List volumes
modal secret list      # List secrets
```

Common Patterns

LLM Inference

```python
@app.cls(gpu="H100", image=image)
class LLM:
    @modal.enter()
    def load(self):
        from vllm import LLM
        self.llm = LLM("meta-llama/Llama-3-8B")

    @modal.method()
    def generate(self, prompt: str):
        return self.llm.generate(prompt)
```

Download Models at Build Time

```python
def download_model():
    from huggingface_hub import snapshot_download
    snapshot_download("model-id", local_dir="/models")

image = (
    modal.Image.debian_slim()
    .pip_install("huggingface-hub")
    .run_function(download_model)  # Bakes the weights into the image
)
```

Concurrency for I/O-bound Work

```python
@app.function()
@modal.concurrent(max_inputs=100)
async def fetch_urls(url: str):
    import aiohttp  # installed in the image, imported inside the function

    async with aiohttp.ClientSession() as session:
        # Read the body before the session closes
        async with session.get(url) as resp:
            return await resp.text()
```

Memory Snapshots (Faster Cold Starts)

```python
@app.cls(enable_memory_snapshot=True, gpu="A10G")
class FastModel:
    @modal.enter(snap=True)
    def load(self):
        self.model = load_model()  # Snapshot this state
```

Autoscaling

```python
@app.function(
    min_containers=2,       # Always keep 2 warm
    max_containers=100,     # Scale up to 100
    buffer_containers=5,    # Extra buffer for bursts
    scaledown_window=300,   # Keep idle containers for 5 min
)
def serve():
    pass
```
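A back-of-envelope way to reason about those limits: given an estimated per-container throughput, the container count the autoscaler converges toward is roughly demand divided by throughput, clamped to the configured bounds. Illustrative arithmetic only, not Modal's actual scheduler:

```python
import math

def containers_needed(requests_per_s: float, per_container_rps: float,
                      min_containers: int = 2, max_containers: int = 100) -> int:
    """Clamp ceil(demand / throughput) to the configured autoscaling bounds."""
    demand = math.ceil(requests_per_s / per_container_rps)
    return max(min_containers, min(max_containers, demand))

containers_needed(55, 10)   # ~55 req/s at 10 req/s each -> 6 containers
containers_needed(0, 10)    # idle still keeps min_containers=2 warm
```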

Best Practices

  • Put imports inside functions when packages aren't installed locally

  • Use @modal.enter() for expensive initialization (model loading)

  • Pin dependency versions for reproducible builds

  • Use Volumes for model weights and persistent data

  • Use memory snapshots for sub-second cold starts in production

  • Set appropriate timeouts for long-running tasks

  • Use min_containers=1 for production APIs to keep containers warm

  • Use absolute imports with full package paths (not relative imports)

Fast Image Builds with uv_sync

Use .uv_sync() instead of .pip_install() for faster dependency installation:

In pyproject.toml, define dependency groups:

```toml
[dependency-groups]
modal = ["fastapi", "pydantic-ai>=1.0.0", "logfire"]
```

```python
image = (
    modal.Image.debian_slim(python_version="3.12")
    .uv_sync("agent", groups=["modal"], frozen=False)
    .add_local_python_source("agent.src")  # Use dot notation for packages
)
```

Key points:

  • Deploy from project root: modal deploy agent/src/api.py

  • Use dot notation in .add_local_python_source("package.subpackage")

  • Imports must match the deploy root: from agent.src.config import ... (not the relative from .config)

Logfire Observability

Add observability with Logfire (especially for pydantic-ai):

```python
@app.cls(
    image=image,
    secrets=[..., modal.Secret.from_name("logfire")],
    min_containers=1,
)
class Web:
    @modal.enter()
    def startup(self):
        import logfire

        logfire.configure(
            send_to_logfire="if-token-present",
            environment="production",
            service_name="my-agent",
        )
        logfire.instrument_pydantic_ai()
        self.agent = create_agent()
```

Reference Documentation

See references/ for detailed guides on images, functions, GPUs, scaling, web endpoints, storage, dicts, queues, sandboxes, and networking.

Official docs: https://modal.com/docs
