
Modal Knowledge Skill

Comprehensive Modal.com platform knowledge covering all features, pricing, and best practices. Activate this skill when users need detailed information about Modal's serverless cloud platform.

Activation Triggers

Activate this skill when users ask about:

  • Modal.com platform features and capabilities

  • GPU-accelerated Python functions

  • Serverless container configuration

  • Modal pricing and billing

  • Modal CLI commands

  • Web endpoints and APIs on Modal

  • Scheduled/cron jobs on Modal

  • Modal volumes, secrets, and storage

  • Parallel processing with Modal

  • Modal deployment and CI/CD

Platform Overview

Modal is a serverless cloud platform for running Python code, optimized for AI/ML workloads with:

  • Zero Configuration: Everything defined in Python code

  • Fast GPU Startup: ~1 second container spin-up

  • Automatic Scaling: Scale to zero, scale to thousands

  • Per-Second Billing: Only pay for active compute

  • Multi-Cloud: AWS, GCP, Oracle Cloud Infrastructure
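Per-second billing can be estimated directly from the rates published later on this page; a minimal sketch (the CPU and memory rates are the figures from the Pricing section, assumed accurate as of 2025):

```python
# Rough cost estimate for a single Modal call, using the per-second
# CPU/memory rates listed in this page's Pricing section (assumed).
CPU_PER_CORE_SEC = 0.0000131   # $/core/sec
MEM_PER_GIB_SEC = 0.00000222   # $/GiB/sec

def estimate_cost(cores: float, gib: float, seconds: float) -> float:
    """Estimate compute cost in dollars for one function invocation."""
    return seconds * (cores * CPU_PER_CORE_SEC + gib * MEM_PER_GIB_SEC)

# A 2-core, 4 GiB function running for 60 seconds:
print(round(estimate_cost(2, 4, 60), 6))  # → 0.002105
```

GPU time is billed on top of this at the hourly rates in the GPU table below.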

Core Components Reference

Apps and Functions

```python
import modal

app = modal.App("app-name")

@app.function()
def basic_function(arg: str) -> str:
    return f"Result: {arg}"

@app.local_entrypoint()
def main():
    result = basic_function.remote("test")
    print(result)
```

Function Decorator Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `image` | `Image` | Container image configuration |
| `gpu` | `str`/`list` | GPU type(s): `"T4"`, `"A100"`, `["H100", "A100"]` |
| `cpu` | `float` | CPU cores (0.125 to 64) |
| `memory` | `int` | Memory in MB (128 to 262144) |
| `timeout` | `int` | Max execution seconds |
| `retries` | `int` | Retry attempts on failure |
| `secrets` | `list` | Secrets to inject |
| `volumes` | `dict` | Volume mount points |
| `schedule` | `Cron`/`Period` | Scheduled execution |
| `concurrency_limit` | `int` | Max concurrent executions |
| `container_idle_timeout` | `int` | Seconds to keep warm |
| `include_source` | `bool` | Auto-sync source code |
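These parameters combine on a single decorator; a hedged sketch of a fully-specified function (the image packages, volume name, and secret name are placeholders, not real resources):

```python
import modal

app = modal.App("configured-app")
image = modal.Image.debian_slim(python_version="3.11").pip_install("requests")
vol = modal.Volume.from_name("results", create_if_missing=True)

@app.function(
    image=image,
    gpu="T4",
    cpu=2.0,
    memory=4096,            # MB
    timeout=600,            # seconds
    retries=2,
    secrets=[modal.Secret.from_name("api-keys")],
    volumes={"/results": vol},
)
def configured(job_id: str) -> str:
    return f"done: {job_id}"
```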

GPU Reference

Available GPUs

| GPU | Memory | Use Case | ~Cost/hr |
| --- | --- | --- | --- |
| T4 | 16 GB | Small inference | $0.59 |
| L4 | 24 GB | Medium inference | $0.80 |
| A10G | 24 GB | Inference/fine-tuning | $1.10 |
| L40S | 48 GB | Heavy inference | $1.50 |
| A100-40GB | 40 GB | Training | $2.00 |
| A100-80GB | 80 GB | Large models | $3.00 |
| H100 | 80 GB | Cutting-edge | $5.00 |
| H200 | 141 GB | Largest models | $5.00 |
| B200 | 180+ GB | Latest gen | $6.25 |
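The table above can be encoded as data to pick the cheapest GPU that fits a model; a sketch using this page's approximate prices (the memory sizes and rates are the table's figures, not an official Modal API):

```python
# (memory_gb, approx_price_per_hr) per GPU, from the table above (assumed).
GPUS = {
    "T4": (16, 0.59), "L4": (24, 0.80), "A10G": (24, 1.10),
    "L40S": (48, 1.50), "A100-40GB": (40, 2.00), "A100-80GB": (80, 3.00),
    "H100": (80, 5.00), "H200": (141, 5.00),
}

def cheapest_gpu(required_gb: int) -> str:
    """Return the lowest-cost GPU with at least `required_gb` of memory."""
    candidates = [(price, name) for name, (mem, price) in GPUS.items()
                  if mem >= required_gb]
    if not candidates:
        raise ValueError(f"No listed GPU has {required_gb} GB")
    return min(candidates)[1]

print(cheapest_gpu(20))  # → L4
print(cheapest_gpu(60))  # → A100-80GB
```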

GPU Configuration

```python
# Single GPU
@app.function(gpu="A100")

# Specific memory variant
@app.function(gpu="A100-80GB")

# Multi-GPU
@app.function(gpu="H100:4")

# Fallbacks (tries in order)
@app.function(gpu=["H100", "A100", "any"])

# "any" = L4, A10G, or T4
@app.function(gpu="any")
```

Image Building

Base Images

```python
# Debian slim (recommended)
modal.Image.debian_slim(python_version="3.11")

# From Dockerfile
modal.Image.from_dockerfile("./Dockerfile")

# From Docker registry
modal.Image.from_registry("nvidia/cuda:12.1.0-base-ubuntu22.04")
```

Package Installation

```python
# pip (standard)
image.pip_install("torch", "transformers")

# uv (faster: 10-100x)
image.uv_pip_install("torch", "transformers")

# System packages
image.apt_install("ffmpeg", "libsm6")

# Shell commands
image.run_commands("apt-get update", "make install")
```

Adding Files

```python
# Single file
image.add_local_file("./config.json", "/app/config.json")

# Directory
image.add_local_dir("./models", "/app/models")

# Python source
image.add_local_python_source("my_module")

# Environment variables
image.env({"VAR": "value"})
```

Build-Time Function

```python
def download_model():
    from huggingface_hub import snapshot_download
    snapshot_download("model-name")

image.run_function(download_model, secrets=[...])
```

Storage

Volumes

```python
# Create/reference volume
vol = modal.Volume.from_name("my-vol", create_if_missing=True)

# Mount in function
@app.function(volumes={"/data": vol})
def func():
    # Read/write to /data
    vol.commit()  # Persist changes
```

Secrets

```python
# From dashboard (recommended)
modal.Secret.from_name("secret-name")

# From dictionary
modal.Secret.from_dict({"KEY": "value"})

# From local env
modal.Secret.from_local_environ(["KEY1", "KEY2"])

# From .env file
modal.Secret.from_dotenv()
```

Usage

```python
@app.function(secrets=[modal.Secret.from_name("api-keys")])
def func():
    import os
    key = os.environ["API_KEY"]
```

Dict and Queue

```python
# Distributed dict
d = modal.Dict.from_name("cache", create_if_missing=True)
d["key"] = "value"
d.put("key", "value", ttl=3600)

# Distributed queue
q = modal.Queue.from_name("jobs", create_if_missing=True)
q.put("task")
item = q.get()
```
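The `ttl` behavior of `Dict.put` (entries expire after the given number of seconds) can be mimicked locally; a toy in-process analogy, not Modal's implementation:

```python
import time

class TTLDict:
    """Toy analogy for modal.Dict's ttl: entries expire after `ttl` seconds."""
    def __init__(self):
        self._data = {}

    def put(self, key, value, ttl=None):
        # None means the entry never expires.
        expires = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expires)

    def get(self, key, default=None):
        value, expires = self._data.get(key, (default, None))
        if expires is not None and time.monotonic() > expires:
            del self._data[key]
            return default
        return value

cache = TTLDict()
cache.put("key", "value", ttl=3600)
print(cache.get("key"))  # → value
```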

Web Endpoints

FastAPI Endpoint (Simple)

```python
@app.function()
@modal.fastapi_endpoint()
def hello(name: str = "World"):
    return {"message": f"Hello, {name}!"}
```

ASGI App (Full FastAPI)

```python
from fastapi import FastAPI

web_app = FastAPI()

@web_app.post("/predict")
def predict(text: str):
    return {"result": process(text)}

@app.function()
@modal.asgi_app()
def fastapi_app():
    return web_app
```

WSGI App (Flask)

```python
from flask import Flask

flask_app = Flask(__name__)

@app.function()
@modal.wsgi_app()
def flask_endpoint():
    return flask_app
```

Custom Web Server

```python
import subprocess

@app.function()
@modal.web_server(port=8000)
def custom_server():
    # Start the server and return; Modal proxies traffic to the port.
    subprocess.Popen(["python", "-m", "http.server", "8000"])
```

Custom Domains

```python
@modal.asgi_app(custom_domains=["api.example.com"])
```

Scheduling

Cron

```python
# Daily at 8 AM UTC
@app.function(schedule=modal.Cron("0 8 * * *"))

# With timezone
@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York"))
```

Period

```python
# Every 5 hours
@app.function(schedule=modal.Period(hours=5))

# Daily
@app.function(schedule=modal.Period(days=1))
```

Note: Scheduled functions only run with `modal deploy`, not `modal run`.
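To reason about when a Cron schedule fires, its five fields (minute, hour, day-of-month, month, day-of-week) can be matched against a timestamp. A simplified local sketch, supporting only `*` and plain numbers (real cron also allows ranges, lists, and steps):

```python
from datetime import datetime

def cron_matches(expr: str, dt: datetime) -> bool:
    """Check a 5-field cron expression (only `*` and plain numbers) against dt."""
    fields = expr.split()
    # Cron day-of-week uses 0 = Sunday; isoweekday() uses 7 = Sunday.
    values = [dt.minute, dt.hour, dt.day, dt.month, dt.isoweekday() % 7]
    return all(f == "*" or int(f) == v for f, v in zip(fields, values))

# "0 8 * * *" fires daily at 08:00 UTC:
print(cron_matches("0 8 * * *", datetime(2025, 1, 1, 8, 0)))  # → True
print(cron_matches("0 8 * * *", datetime(2025, 1, 1, 9, 0)))  # → False
```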

Parallel Processing

Map

```python
# Parallel execution (up to 1000 concurrent)
results = list(func.map(items))

# Unordered (faster)
results = list(func.map(items, order_outputs=False))
```

Starmap

```python
# Spread args
pairs = [(1, 2), (3, 4)]
results = list(add.starmap(pairs))
```
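`starmap` follows the same convention as Python's built-in `itertools.starmap`: each tuple is unpacked into positional arguments. The semantics can be checked locally without Modal:

```python
from itertools import starmap

def add(a: int, b: int) -> int:
    return a + b

pairs = [(1, 2), (3, 4)]
# Locally equivalent to what Modal's add.starmap(pairs) computes in parallel:
print(list(starmap(add, pairs)))  # → [3, 7]
```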

Spawn

```python
# Async job (returns immediately)
call = func.spawn(data)
result = call.get()  # Get result later

# Spawn many
calls = [func.spawn(item) for item in items]
results = [call.get() for call in calls]
```
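The spawn/get pattern has the same shape as `concurrent.futures`: submission returns a handle immediately, and fetching the result blocks until the job finishes. A local analogy (not Modal code):

```python
from concurrent.futures import ThreadPoolExecutor

def work(x: int) -> int:
    return x * x

with ThreadPoolExecutor() as pool:
    # Like func.spawn(item): submit() returns a future immediately...
    futures = [pool.submit(work, i) for i in range(4)]
    # ...and like call.get(): result() blocks until each job completes.
    results = [f.result() for f in futures]

print(results)  # → [0, 1, 4, 9]
```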

Container Lifecycle (Classes)

```python
@app.cls(gpu="A100", container_idle_timeout=300)
class Server:
    @modal.enter()
    def load(self):
        self.model = load_model()

    @modal.method()
    def predict(self, text):
        return self.model(text)

    @modal.exit()
    def cleanup(self):
        del self.model
```

Concurrency

```python
@modal.concurrent(max_inputs=100, target_inputs=80)
@modal.method()
def batched(self, item):
    pass
```

CLI Commands

Development

```shell
modal run app.py               # Run function
modal serve app.py             # Hot-reload dev server
modal shell app.py             # Interactive shell
modal shell app.py --gpu A100  # Shell with GPU
```

Deployment

```shell
modal deploy app.py       # Deploy
modal app list            # List apps
modal app logs app-name   # View logs
modal app stop app-name   # Stop app
```

Resources

Volumes

```shell
modal volume create name
modal volume list
modal volume put name local remote
modal volume get name remote local
```

Secrets

```shell
modal secret create name KEY=value
modal secret list
```

Environments

```shell
modal environment create staging
```

Pricing (2025)

Plans

| Plan | Price | Containers | GPU Concurrency |
| --- | --- | --- | --- |
| Starter | Free ($30 credits) | 100 | 10 |
| Team | $250/month | 1000 | 50 |
| Enterprise | Custom | Unlimited | Custom |

Compute

  • CPU: $0.0000131/core/sec

  • Memory: $0.00000222/GiB/sec

  • GPUs: See GPU table above

Special Programs

  • Startups: Up to $25k credits

  • Researchers: Up to $10k credits

Best Practices

  • Use @modal.enter() for model loading

  • Use uv_pip_install for faster builds

  • Use GPU fallbacks for availability

  • Set appropriate timeouts and retries

  • Use environments (dev/staging/prod)

  • Download models during build, not runtime

  • Use order_outputs=False when order doesn't matter

  • Set container_idle_timeout to balance cost/latency

  • Monitor costs in Modal dashboard

  • Test with modal run before modal deploy
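The `container_idle_timeout` tradeoff from the list above can be made concrete with arithmetic: keeping a container warm costs its per-second rate for the idle window, in exchange for skipping a cold start on the next request. A sketch with assumed numbers (A100 at ~$2.00/hr, from this page's GPU table):

```python
# Worst-case cost of keeping one warm A100 container idle, using the
# ~$2.00/hr rate from this page's GPU table (assumed, not authoritative).
A100_PER_SEC = 2.00 / 3600

def idle_cost(idle_timeout_s: int) -> float:
    """Dollars spent if the container sits idle for the full timeout."""
    return idle_timeout_s * A100_PER_SEC

# A 300 s idle window costs about 17 cents per scale-down:
print(round(idle_cost(300), 4))  # → 0.1667
```

Weigh that against the latency of reloading model weights on a cold start.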

Common Patterns

LLM Inference

```python
@app.cls(gpu="A100", container_idle_timeout=300)
class LLM:
    @modal.enter()
    def load(self):
        from vllm import LLM
        self.llm = LLM(model="...")

    @modal.method()
    def generate(self, prompt):
        return self.llm.generate([prompt])
```

Batch Processing

```python
@app.function(volumes={"/data": vol})
def process(file):
    # Process file
    vol.commit()

# Parallel
results = list(process.map(files))
```

Scheduled ETL

```python
@app.function(
    schedule=modal.Cron("0 6 * * *"),
    secrets=[modal.Secret.from_name("db")],
)
def daily_etl():
    extract()
    transform()
    load()
```

Quick Reference

| Task | Code |
| --- | --- |
| Create app | `app = modal.App("name")` |
| Basic function | `@app.function()` |
| With GPU | `@app.function(gpu="A100")` |
| With image | `@app.function(image=img)` |
| Web endpoint | `@modal.asgi_app()` |
| Scheduled | `schedule=modal.Cron("...")` |
| Mount volume | `volumes={"/path": vol}` |
| Use secret | `secrets=[modal.Secret.from_name("x")]` |
| Parallel map | `func.map(items)` |
| Async spawn | `func.spawn(arg)` |
| Class pattern | `@app.cls()` with `@modal.enter()` |
