windows-ai

Windows AI — run local AI on Windows with LLM inference, image generation, and embeddings. A Windows AI server for Llama, Qwen, DeepSeek, Phi, and Mistral. Turn Windows PCs into a Windows AI cluster. No cloud APIs, no subscriptions — Windows AI runs entirely on your hardware. Local Windows AI inference with no cloud dependencies.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install the "windows-ai" skill with this command: npx skills add twinsgeeks/windows-ai

Windows AI — Local AI on Your Windows PCs

Run AI entirely on Windows. No cloud APIs, no subscriptions, no data leaving your network. Windows AI via Ollama Herd routes LLM requests across your Windows machines — your gaming PC, your work desktop, your laptop. One Windows AI endpoint serves them all.

Why Windows AI locally

  • Zero cost — no per-token charges. Your Windows PC runs unlimited AI inference.
  • Privacy — prompts and responses never leave your Windows network.
  • No rate limits — cloud APIs throttle. Your Windows AI hardware doesn't.
  • NVIDIA GPU support — Windows AI uses your RTX GPU via CUDA for fast inference.
  • Fleet routing — multiple Windows PCs share the AI workload automatically.

Windows AI quick start

# Install Windows AI router
pip install ollama-herd

# Start Windows AI on your main PC
herd          # Windows AI router on port 11435
herd-node     # register this Windows AI node

# On other Windows PCs
herd-node     # joins the Windows AI cluster automatically

Windows Firewall: allow inbound TCP on port 11435:

netsh advfirewall firewall add rule name="Windows AI" dir=in action=allow protocol=tcp localport=11435
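Once the router is running, you can sanity-check it from Python. This is a minimal sketch using only the standard library; it assumes the Herd router exposes an OpenAI-compatible /v1/models listing (an assumption based on the OpenAI-compatible endpoint used below, not a documented Herd API).

```python
import json
import urllib.request

HERD_URL = "http://localhost:11435"  # default Herd router port from the quick start


def models_endpoint(base_url: str) -> str:
    """Build the OpenAI-compatible model-list URL for a Herd router."""
    return base_url.rstrip("/") + "/v1/models"


def list_models(base_url: str = HERD_URL) -> list[str]:
    """Fetch the model IDs visible through the router (requires a running cluster)."""
    with urllib.request.urlopen(models_endpoint(base_url), timeout=5) as resp:
        return [m["id"] for m in json.load(resp)["data"]]

# With the router up, list_models() should return the models available on your nodes:
# list_models()
```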

Use Windows AI

OpenAI SDK

from openai import OpenAI

# Your Windows AI endpoint
client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

# Windows AI routes to the best available GPU
response = client.chat.completions.create(
    model="qwen3.5:32b",
    messages=[{"role": "user", "content": "Explain local AI vs cloud AI for Windows users"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
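The streaming loop above can be wrapped in a small helper when you want the full response as one string. This sketch assumes the OpenAI SDK chunk shape (`choices[0].delta.content`), which is what the example above iterates over.

```python
def collect_stream(chunks) -> str:
    """Accumulate streamed chat-completion deltas into one string.

    Works with OpenAI SDK chunk objects, or anything with the same
    choices[0].delta.content shape. Skips empty/None deltas (e.g. the
    final chunk of a stream usually carries no content).
    """
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)

# full_text = collect_stream(response)
```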

Windows AI for coding

# Windows AI code generation
response = client.chat.completions.create(
    model="codestral",
    messages=[{"role": "user", "content": "Write a C# Windows service that monitors GPU temperature"}],
)
print(response.choices[0].message.content)

curl (PowerShell)

# Windows AI chat
curl.exe http://localhost:11435/api/chat -d '{
  "model": "llama3.3:70b",
  "messages": [{"role": "user", "content": "Hello from Windows AI"}],
  "stream": false
}'
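The same /api/chat call can be made from Python without the OpenAI SDK. A standard-library sketch, assuming the Ollama-style response shape (`{"message": {"content": ...}}`) that the non-streaming API returns:

```python
import json
import urllib.request


def chat_payload(model: str, prompt: str, stream: bool = False) -> bytes:
    """Build the JSON body for an Ollama-style /api/chat request."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }).encode()


def chat(base_url: str, model: str, prompt: str) -> str:
    """POST a single-turn chat request (requires a running router)."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/chat",
        data=chat_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["message"]["content"]

# chat("http://localhost:11435", "llama3.3:70b", "Hello from Windows AI")
```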

Windows AI hardware guide

| Windows PC | GPU | RAM | Best Windows AI models |
| --- | --- | --- | --- |
| Gaming desktop | RTX 4090 (24GB) | 32GB+ | llama3.3:70b, qwen3.5:32b — full quality Windows AI |
| Gaming desktop | RTX 4080 (16GB) | 16GB+ | phi4, codestral, qwen3.5:14b |
| Work laptop | RTX 4060 (8GB) | 16GB | phi4-mini, gemma3:4b — fast Windows AI |
| Office desktop | Intel/AMD (no GPU) | 16GB | phi4-mini, gemma3:1b — CPU Windows AI |

Windows AI works with or without a GPU. NVIDIA GPUs dramatically accelerate inference.

Windows AI environment setup

# Optimize Windows AI performance
[System.Environment]::SetEnvironmentVariable("OLLAMA_KEEP_ALIVE", "-1", "User")
[System.Environment]::SetEnvironmentVariable("OLLAMA_MAX_LOADED_MODELS", "-1", "User")
# Restart Ollama from the Windows system tray

Windows AI features

  • 7-signal scoring — picks the best Windows PC for every AI request
  • 15 health checks — monitors all Windows AI nodes in real-time
  • Auto-retry — transparent failover between Windows AI machines
  • vRAM-aware routing — knows which Windows GPU has room for the model
  • Request tagging — track per-project Windows AI usage
  • Web dashboard — http://localhost:11435/dashboard

Windows AI integrations

Works with any OpenAI-compatible tool on Windows:

  • Continue.dev (VS Code) — set endpoint to http://localhost:11435/v1
  • Cursor — Windows AI as local backend
  • LangChain — drop-in OpenAI replacement
  • CrewAI — multi-agent workflows on Windows AI
  • Open WebUI — chat interface for Windows AI
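As a concrete example of the Continue.dev integration, a minimal config sketch pointing at the Herd endpoint. The field names follow Continue's config.json conventions, and the model name is just an example — adjust both to your setup:

```json
{
  "models": [
    {
      "title": "Windows AI (Herd)",
      "provider": "openai",
      "model": "qwen3.5:32b",
      "apiBase": "http://localhost:11435/v1",
      "apiKey": "not-needed"
    }
  ]
}
```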

Also available on Windows AI

Image generation

curl.exe http://localhost:11435/api/generate-image `
  -d '{"model": "z-image-turbo", "prompt": "futuristic Windows desktop", "width": 1024, "height": 1024}'

Embeddings

curl.exe http://localhost:11435/api/embed `
  -d '{"model": "nomic-embed-text", "input": "Windows AI local inference embeddings"}'
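Embeddings are mainly useful for similarity search, so a cosine-similarity helper usually accompanies them. The helper below is plain standard-library Python; the commented-out client call assumes the router also exposes OpenAI-style /v1/embeddings (an assumption — the documented route is /api/embed above):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Assuming the router also serves OpenAI-style embeddings:
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")
# resp = client.embeddings.create(model="nomic-embed-text",
#                                 input=["local AI", "cloud AI"])
# vecs = [d.embedding for d in resp.data]
# similarity = cosine(vecs[0], vecs[1])
```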

Full documentation

Contribute

Ollama Herd is open source (MIT). Windows AI enthusiasts are welcome to contribute.

Guardrails

  • Windows AI model downloads require explicit user confirmation.
  • Windows AI model deletion requires explicit user confirmation.
  • Never delete or modify files in ~/.fleet-manager/.
  • No models are downloaded automatically — all pulls are user-initiated or require opt-in.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

Img2img

Generate images from text descriptions using DALL-E 3 while adhering to usage policies and avoiding realistic human faces.

General

Habitat-GS-Navigator

Navigate and interact with photo-realistic 3DGS environments via the Habitat-GS Bridge. Use when: user asks to explore a 3D scene, perform embodied navigatio...

General

Memory Palace

Persistent memory management. Use when: the user shares personal information, preferences, or habits; you need to remember project state or technical decisions; a completed task yields reusable experience; the user says "remember this", "don't forget", or "watch out for this next time"; or you need to recall earlier conversation content. Supports semantic search and temporal reasoning.

General

Podcast Transcript Mining Authority Positioning

Extract guest appearances, speaking topics, and soundbites from podcast transcripts to build authority portfolios and generate podcast pitch templates. Use w...
