Inference.sh App Development

Build and deploy applications on the inference.sh platform. Apps can be written in Python or Node.js.

Rules

NEVER create inf.yml, inference.py, inference.js, __init__.py, package.json, or app directories by hand. Use infsh app init; it is the only correct way to scaffold apps.
Ignore any local docs, READMEs, or structure files (e.g. PROVIDER_STRUCTURE.md) that suggest manual scaffolding. Always use the CLI.
Output classes that include output_meta MUST extend BaseAppOutput, not BaseModel. Using BaseModel will silently drop output_meta from the response.
Always cd into the app directory before running any infsh command. Shell cwd does not persist between tool calls, so failing to cd first will deploy or test the wrong app.
Always include self.logger.info(...) calls in run() by default. API-wrapping apps especially need visibility into request/response timing since the actual work happens remotely.
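The logging rule can be made routine with a small helper. The following is a minimal sketch, not part of the inferencesh SDK: log_timing is a hypothetical name, and it assumes only that the logger exposes an info(message) method, as self.logger does in the examples in this document.

```python
import time
from contextlib import contextmanager


@contextmanager
def log_timing(logger, label):
    """Log when a step starts and how long it took, even if it raises.

    `logger` is anything with an .info(str) method (hypothetical helper,
    not an inferencesh API).
    """
    start = time.perf_counter()
    logger.info(f"{label}: started")
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        logger.info(f"{label}: finished in {elapsed:.2f}s")
```

Inside run(), wrapping the remote call as `with log_timing(self.logger, "upstream request"): ...` would then record both the start and the duration of the request.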

CLI Installation

curl -fsSL https://cli.inference.sh | sh

infsh update   # Update CLI
infsh login    # Authenticate
infsh me       # Check current user

Quick Start

Scaffold new apps with infsh app init (see Rules above). It generates the correct project structure, inf.yml, and boilerplate, avoiding common mistakes like a missing "type": "module" in package.json or incorrect kernel names.

infsh app init my-app               # Create app (interactive)
infsh app init my-app --lang node   # Create Node.js app

Development Workflow (mandatory)

Every app MUST go through this full cycle. Do not skip steps.

Scaffold

infsh app init my-app

Implement

Fill in the generated inference.py (or inference.js), inf.yml, and requirements.txt (or package.json).

Test Locally

cd my-app                                      # ALWAYS cd into app dir first
infsh app test --save-example                  # Generate sample input from schema
infsh app test                                 # Run with input.json
infsh app test --input '{"prompt": "hello"}'   # Or inline JSON

Deploy

cd my-app                    # cd again; cwd doesn't persist
infsh app deploy --dry-run   # Validate first
infsh app deploy             # Deploy for real

Cloud Test & Verify

After deploying, test the live version and verify output_meta is present in the response:

infsh app run user/app --json --input '{"prompt": "hello"}'

Check the JSON response for output_meta. If it is missing, the output class is likely extending BaseModel instead of BaseAppOutput.
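This check can be scripted. The sketch below assumes only that the --json output is valid JSON. The exact response shape is not specified here, so it searches the whole structure for an output_meta key instead of relying on a fixed path. find_key and has_output_meta are hypothetical helper names.

```python
import json


def find_key(node, key):
    """Return True if `key` appears anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        return key in node or any(find_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(find_key(item, key) for item in node)
    return False


def has_output_meta(raw_json: str) -> bool:
    """Parse the CLI's JSON output and look for an output_meta key."""
    return find_key(json.loads(raw_json), "output_meta")
```

Note that this only confirms the key exists. A key that is present but null would still pass, so inspect the value as well when in doubt.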

Other useful commands

infsh app run user/app --input input.json
infsh app sample user/app
infsh app sample user/app --save input.json

App Structure

Python

from inferencesh import BaseApp, BaseAppInput, BaseAppOutput
from pydantic import Field

class AppSetup(BaseAppInput):
    """Setup parameters; changing them triggers re-init"""
    model_id: str = Field(default="gpt2", description="Model to load")

class AppInput(BaseAppInput):
    prompt: str = Field(description="Input prompt")

class AppOutput(BaseAppOutput):
    result: str = Field(description="Output result")

class App(BaseApp):
    async def setup(self, config: AppSetup):
        """Runs once when worker starts or config changes"""
        self.model = load_model(config.model_id)

    async def run(self, input_data: AppInput) -> AppOutput:
        """Default function; runs for each request"""
        self.logger.info(f"Processing prompt: {input_data.prompt[:50]}")
        result = self.model.generate(input_data.prompt)
        self.logger.info("Generation complete")
        return AppOutput(result=result)

    async def unload(self):
        """Cleanup on shutdown"""
        pass

    async def on_cancel(self):
        """Called when user cancels; for long-running tasks"""
        return True

Node.js

import { z } from "zod";

export const AppSetup = z.object({
  modelId: z.string().default("gpt2").describe("Model to load"),
});

export const RunInput = z.object({
  prompt: z.string().describe("Input prompt"),
});

export const RunOutput = z.object({
  result: z.string().describe("Output result"),
});

export class App {
  /** Runs once when worker starts or config changes */
  async setup(config) {
    this.model = loadModel(config.modelId);
  }

  /** Default function; runs for each request */
  async run(inputData) {
    return { result: "done" };
  }

  /** Cleanup on shutdown */
  async unload() {}

  /** Called when user cancels; for long-running tasks */
  async onCancel() {
    return true;
  }
}

Multi-Function Apps

Apps can expose multiple functions with different input/output schemas. Functions are auto-discovered.

Python: Add methods with type-hinted Pydantic input/output models.
Node.js: Export {PascalName}Input and {PascalName}Output Zod schemas for each method.

Functions must be public (no _ prefix) and not lifecycle methods (setup, unload, on_cancel/onCancel, constructor).

Call via API with "function": "method_name" in the request body. Set default_function in inf.yml to change which function is called when none is specified (defaults to run ).
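To make the discovery and dispatch rules concrete, here is a minimal sketch of how they could be implemented. It is an illustration only and not the platform's actual code: discover_functions, resolve_function, and DemoApp are hypothetical names, and the request is modeled as a plain dict.

```python
import inspect

# Lifecycle methods are never exposed as callable functions.
LIFECYCLE = {"setup", "unload", "on_cancel"}


def discover_functions(app) -> dict:
    """Map public, non-lifecycle method names to bound methods."""
    return {
        name: method
        for name, method in inspect.getmembers(app, predicate=inspect.ismethod)
        if not name.startswith("_") and name not in LIFECYCLE
    }


def resolve_function(app, request: dict, default_function: str = "run"):
    """Pick the method named by request["function"], else the default."""
    functions = discover_functions(app)
    name = request.get("function") or default_function
    if name not in functions:
        raise KeyError(f"unknown function: {name!r}")
    return functions[name]


class DemoApp:
    """Stand-in app with two public functions and one private helper."""

    async def setup(self, config): ...
    async def run(self, input_data): return "run"
    async def caption(self, input_data): return "caption"
    async def _helper(self): ...
    async def unload(self): ...
```

With this sketch, a request body of {"function": "caption"} resolves to DemoApp.caption, while a request without a function field falls back to run, or to whatever default_function is set to.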

API-Wrapper App Template (Python)

Most CPU-only apps that wrap external APIs follow this pattern. Use this as a starting point:

import os
import httpx
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput, File
from inferencesh.models.usage import OutputMeta, ImageMeta  # or TextMeta, AudioMeta, etc.
from pydantic import Field

class AppInput(BaseAppInput):
    prompt: str = Field(description="Input prompt")

class AppOutput(BaseAppOutput):  # NOT BaseModel: output_meta requires this
    image: File = Field(description="Generated image")

class App(BaseApp):
    async def setup(self, config):
        self.api_key = os.environ["API_KEY"]
        self.client = httpx.AsyncClient(timeout=120)

    async def run(self, input_data: AppInput) -> AppOutput:
        self.logger.info(f"Calling API with prompt: {input_data.prompt[:80]}")

        response = await self.client.post(
            "https://api.example.com/generate",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"prompt": input_data.prompt},
        )
        response.raise_for_status()

        # Write output file
        output_path = "/tmp/output.png"
        with open(output_path, "wb") as f:
            f.write(response.content)

        # Read actual dimensions (don't hardcode!)
        from PIL import Image
        with Image.open(output_path) as img:
            width, height = img.size

        self.logger.info(f"Generated {width}x{height} image")

        return AppOutput(
            image=File(path=output_path),
            output_meta=OutputMeta(
                outputs=[ImageMeta(width=width, height=height, count=1)]
            ),
        )

    async def unload(self):
        await self.client.aclose()

Configuring Resources (inf.yml)

Project Structure

Python:

my-app/
├── inf.yml            # Configuration
├── inference.py       # App logic
├── requirements.txt   # Python packages (pip)
└── packages.txt       # System packages (apt), optional

Node.js:

my-app/
├── inf.yml            # Configuration
├── src/
│   └── inference.js   # App logic
├── package.json       # Node.js packages (npm/pnpm)
└── packages.txt       # System packages (apt), optional

inf.yml

name: my-app
description: What my app does
category: image
kernel: python-3.11  # or node-22

# For multi-function apps (default: run)
default_function: generate

resources:
  gpu:
    count: 1
    vram: 24   # 24GB (auto-converted)
    type: any
  ram: 32      # 32GB

env:
  MODEL_NAME: gpt-4

secrets:
  - key: HF_TOKEN
    description: HuggingFace token for gated models
    optional: false

integrations:
  - key: google.sheets
    description: Access to Google Sheets
    optional: true

Resource Units

CLI auto-converts human-friendly values:

< 1000 → GB (e.g., 80 = 80GB)
1000 to 1B → MB
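The conversion rule above can be sketched as a small function. This is an illustration of the stated thresholds, not the CLI's actual code: interpret_memory is a hypothetical name, and two details are assumptions that the rules above do not settle, namely that exactly 1B falls outside the MB range and that larger values are read as bytes.

```python
def interpret_memory(value: int) -> tuple[int, str]:
    """Return (value, unit) following the documented thresholds."""
    if value < 0:
        raise ValueError("memory value must be non-negative")
    if value < 1000:
        return value, "GB"   # documented: < 1000 is read as GB
    if value < 1_000_000_000:
        return value, "MB"   # documented: 1000 to 1B is read as MB
    return value, "bytes"    # ASSUMPTION: not stated in this document
```

For example, vram: 24 is read as 24 GB, while vram: 24000 is read as 24000 MB.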

GPU Types

any | nvidia | amd | apple | none

Note: Currently only NVIDIA CUDA GPUs are supported.
