# Ubuntu Ollama — Fleet Routing for Ollama on Ubuntu
Run Ollama on Ubuntu with multi-machine load balancing. Ubuntu Ollama Herd turns your Ubuntu servers and desktops into one smart Ollama endpoint. Install with apt + pip, manage with systemd, monitor with the web dashboard.
## Ubuntu Ollama setup

### Step 1: Install Ollama on Ubuntu

```bash
# Install Ollama on Ubuntu
curl -fsSL https://ollama.ai/install.sh | sh

# Verify Ollama is running on Ubuntu
ollama --version
systemctl status ollama
```

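Optionally, run a quick smoke test before wiring up the fleet; phi4-mini here is just a stand-in for any small model from the recommendations table below:

```bash
# Optional smoke test: pull a small model and run a one-off prompt
ollama pull phi4-mini
ollama run phi4-mini "Say hello from Ubuntu"
```
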
### Step 2: Install Ubuntu Ollama Herd

```bash
# Ubuntu prerequisites
sudo apt update && sudo apt install python3-pip curl -y

# Install Ubuntu Ollama fleet router
pip install ollama-herd
```

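On Ubuntu 23.04 and later the system Python is PEP 668 managed, so a bare `pip install` may be refused. A pipx install or a dedicated venv sidesteps this; the package name is unchanged:

```bash
# Option A: isolated install with pipx
sudo apt install pipx -y
pipx install ollama-herd

# Option B: dedicated virtual environment
python3 -m venv ~/.herd-venv
~/.herd-venv/bin/pip install ollama-herd
```

If you go the venv route, point the `ExecStart` lines in the systemd units below at the venv's binaries.
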
### Step 3: Start Ubuntu Ollama router

On one Ubuntu machine (the router):
```bash
herd       # start Ubuntu Ollama router on port 11435
herd-node  # register this Ubuntu Ollama node
```

On every other Ubuntu machine:
```bash
herd-node  # auto-discovers the Ubuntu Ollama router via mDNS
```

No mDNS? Connect Ubuntu Ollama nodes directly:
```bash
herd-node --router-url http://router-ip:11435
```

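To find the address to use for `--router-url`, print the router machine's IPv4 addresses on the router itself:

```bash
# On the router: list this machine's IPv4 addresses
hostname -I
```
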
### Step 4: Verify Ubuntu Ollama fleet

```bash
curl -s http://localhost:11435/fleet/status | python3 -m json.tool
```

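If the call hangs or errors, first confirm the router process is actually listening on 11435:

```bash
# Confirm something is bound to the router port
ss -tlnp | grep 11435
```
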
## Ubuntu Ollama systemd services

Run Ubuntu Ollama as systemd services for automatic startup:
```bash
# Ubuntu Ollama router service
sudo tee /etc/systemd/system/herd-router.service << 'EOF'
[Unit]
Description=Ubuntu Ollama Router
After=network.target ollama.service

[Service]
Type=simple
ExecStart=/usr/local/bin/herd
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

# Ubuntu Ollama node service
sudo tee /etc/systemd/system/herd-node.service << 'EOF'
[Unit]
Description=Ubuntu Ollama Node
After=network.target ollama.service

[Service]
Type=simple
ExecStart=/usr/local/bin/herd-node
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

# Reload systemd so it picks up the new units, then enable both
sudo systemctl daemon-reload
sudo systemctl enable --now herd-router
sudo systemctl enable --now herd-node
```

If you installed `ollama-herd` with `pip install --user` or in a venv, adjust the `ExecStart` paths to the matching binaries (for example `~/.local/bin/herd`).

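After enabling, a quick status check confirms both units came up (standard systemctl/journalctl, nothing Herd-specific):

```bash
# Check service state and recent node logs
systemctl status herd-router herd-node --no-pager
journalctl -u herd-node -n 20 --no-pager
```
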
## Use Ubuntu Ollama

### OpenAI SDK

```python
from openai import OpenAI

# Your Ubuntu Ollama fleet
client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama3.3:70b",
    messages=[{"role": "user", "content": "Write an Ubuntu cron job for log rotation"}],
    stream=True,
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
```

### curl (Ollama format)

```bash
# Ubuntu Ollama inference
curl http://localhost:11435/api/chat -d '{
  "model": "qwen3.5:32b",
  "messages": [{"role": "user", "content": "Explain Ubuntu apt package management"}],
  "stream": false
}'
```

### curl (OpenAI format)

```bash
curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "phi4", "messages": [{"role": "user", "content": "Hello from Ubuntu Ollama"}]}'
```

## Ubuntu Ollama NVIDIA CUDA setup

```bash
# Install NVIDIA drivers on Ubuntu for Ollama CUDA
sudo apt install nvidia-driver-550 -y
sudo reboot

# Verify Ubuntu NVIDIA CUDA
nvidia-smi

# Ubuntu Ollama automatically uses CUDA when NVIDIA drivers are installed
ollama ps  # should show GPU acceleration
```

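To watch VRAM fill and drain while a model answers prompts, keep nvidia-smi refreshing in a second terminal:

```bash
# Refresh GPU utilization and memory every second
watch -n 1 nvidia-smi

# Or a compact one-shot report
nvidia-smi --query-gpu=name,memory.used,memory.total,utilization.gpu --format=csv
```
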
## Ubuntu Ollama environment

```bash
# Optimize Ollama on Ubuntu via systemd
sudo systemctl edit ollama

# Add under [Service]:
# Environment="OLLAMA_KEEP_ALIVE=-1"
# Environment="OLLAMA_MAX_LOADED_MODELS=-1"
# Environment="OLLAMA_NUM_PARALLEL=2"

sudo systemctl restart ollama

# Verify Ubuntu Ollama settings
systemctl show ollama | grep Environment
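```

To script this instead of going through the interactive editor, write the same drop-in file that `systemctl edit` would create:

```bash
# Non-interactive equivalent of `sudo systemctl edit ollama`
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf << 'EOF'
[Service]
Environment="OLLAMA_KEEP_ALIVE=-1"
Environment="OLLAMA_MAX_LOADED_MODELS=-1"
Environment="OLLAMA_NUM_PARALLEL=2"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
```
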
## Ubuntu Ollama model recommendations

| Ubuntu machine | GPU memory | Best Ubuntu Ollama models |
|---|---|---|
| Ubuntu desktop (RTX 4090) | 24 GB | llama3.3:70b, qwen3.5:32b, deepseek-r1:32b |
| Ubuntu desktop (RTX 4080) | 16 GB | phi4, codestral, qwen3.5:14b |
| Ubuntu Server (A100) | 80 GB | deepseek-v3, qwen3.5:72b |
| Ubuntu Server (no GPU) | CPU only | phi4-mini, gemma3:4b |
| Ubuntu on Raspberry Pi 5 | CPU only | gemma3:1b, phi4-mini |
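
Not sure which tier a machine falls into? These report GPU VRAM and system RAM:

```bash
# GPU VRAM (NVIDIA machines)
nvidia-smi --query-gpu=name,memory.total --format=csv

# System RAM (CPU-only machines)
free -h
```
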
## Ubuntu Ollama firewall

```bash
# Ubuntu UFW
sudo ufw allow 11435/tcp
sudo ufw reload
```

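Opening 11435 everywhere is broad; if the fleet lives on one LAN, scope the rule to that subnet (192.168.1.0/24 is a placeholder, substitute your own):

```bash
# Allow the router port only from the local subnet
sudo ufw allow from 192.168.1.0/24 to any port 11435 proto tcp
sudo ufw reload
```
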
## Monitor Ubuntu Ollama

```bash
# Ubuntu Ollama fleet status
curl -s http://localhost:11435/fleet/status | python3 -m json.tool

# Ubuntu Ollama health: 15 automated checks
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

# Ubuntu Ollama models loaded
curl -s http://localhost:11435/api/ps | python3 -m json.tool

# Ubuntu Ollama logs
journalctl -u herd-router -f
tail -f ~/.fleet-manager/logs/herd.jsonl.$(date +%Y-%m-%d)
```

The dashboard at http://localhost:11435/dashboard provides live Ubuntu Ollama monitoring.
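
For a live terminal view without the dashboard, loop the status endpoint:

```bash
# Refresh fleet status every 5 seconds
watch -n 5 'curl -s http://localhost:11435/fleet/status | python3 -m json.tool'
```
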
## Also available on Ubuntu Ollama

### Image generation

```bash
curl http://localhost:11435/api/generate-image \
  -d '{"model": "z-image-turbo", "prompt": "Ubuntu penguin in space", "width": 1024, "height": 1024}'
```

### Embeddings

```bash
curl http://localhost:11435/api/embed \
  -d '{"model": "nomic-embed-text", "input": "Ubuntu Ollama local inference routing"}'
```

## Full documentation

## Contribute

Ollama Herd is open source (MIT). Contributions from Ubuntu Ollama users are welcome.
## Guardrails

- Ubuntu Ollama model downloads require explicit user confirmation.
- Ubuntu Ollama model deletion requires explicit user confirmation.
- Never delete or modify files in `~/.fleet-manager/`.
- No models are downloaded automatically; all pulls are user-initiated or require opt-in.