linux-ai-server

Linux AI Server: turn Linux servers into a local AI inference cluster. It runs headless under systemd with NVIDIA CUDA and no GUI overhead, serves Llama, Qwen, DeepSeek, Phi, and Mistral models for local inference, and supports clusters on Ubuntu, Debian, RHEL, and Fedora.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "linux-ai-server" with this command: npx skills add twinsgeeks/linux-ai-server

Linux AI Server — Headless AI Inference Cluster

Turn your Linux servers into a distributed AI inference cluster. No GUI, no Docker, no Kubernetes — just Linux + pip install. Your rack-mounted servers, cloud VMs, and spare Linux boxes all serve AI through one endpoint.

Why Linux AI server

  • Zero GUI overhead — headless Linux AI uses all resources for inference, not desktops
  • systemd native — Linux AI server starts on boot, restarts on failure, logs to journald
  • SSH management — manage your Linux AI server cluster entirely over SSH
  • Any Linux distro — Ubuntu, Debian, RHEL, Fedora, Arch, Alpine — if it runs Ollama, it joins the fleet
  • NVIDIA CUDA — Linux AI server uses NVIDIA GPUs natively. No compatibility issues.
  • Fleet routing — multiple Linux AI servers share the load. 7-signal scoring picks the best one.
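
The fleet routing bullet mentions 7-signal scoring but does not name the signals. As a rough illustration only, the sketch below uses three made-up signals (free GPU memory, queue depth, whether the model is already loaded) to show how weighted scoring can pick a node. The class, field names, and weights are all assumptions and are not taken from ollama-herd.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Hypothetical per-node signals; not the router's real schema."""
    name: str
    free_vram_gb: float
    queue_depth: int
    model_loaded: bool


def score(node: NodeStats) -> float:
    """Higher is better. All weights are illustrative assumptions."""
    s = 0.0
    s += 50.0 if node.model_loaded else 0.0   # avoid a cold model load
    s += min(node.free_vram_gb, 80.0) * 0.5   # reward headroom, capped at 80 GB
    s -= node.queue_depth * 10.0              # penalize busy nodes
    return s


def pick_node(nodes: list[NodeStats]) -> NodeStats:
    """Return the highest-scoring node."""
    if not nodes:
        raise ValueError("no nodes available")
    return max(nodes, key=score)


fleet = [
    NodeStats("rack-a100", free_vram_gb=40.0, queue_depth=3, model_loaded=True),
    NodeStats("desk-4090", free_vram_gb=20.0, queue_depth=0, model_loaded=True),
    NodeStats("nuc-cpu", free_vram_gb=0.0, queue_depth=0, model_loaded=False),
]
best = pick_node(fleet)
print(best.name)  # desk-4090: idle and warm beats a busier, larger node
```

In this toy fleet the idle desktop wins even though the rack server has more free memory, because each queued request costs more than the extra headroom gains.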

Linux AI server setup

Quick install on each Linux server

# Install Ollama on Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Install the Linux AI router
pip install ollama-herd

Linux AI server router (pick one server)

herd          # start the Linux AI server router on port 11435 (stays in the foreground)
herd-node     # in a second shell: register this Linux AI server as a node too

Linux AI server nodes (all other servers)

herd-node     # auto-discovers the Linux AI server router
# Or explicit: herd-node --router-url http://router-ip:11435

Linux AI server systemd services

# /etc/systemd/system/herd-router.service
[Unit]
Description=Linux AI Server Router
After=network.target ollama.service

[Service]
Type=simple
ExecStart=/usr/local/bin/herd
Restart=always
RestartSec=5
User=ollama

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/herd-node.service
[Unit]
Description=Linux AI Server Node
After=network.target ollama.service

[Service]
Type=simple
ExecStart=/usr/local/bin/herd-node
Restart=always
RestartSec=5
User=ollama

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload               # pick up the new unit files
sudo systemctl enable --now herd-router    # on the Linux AI router
sudo systemctl enable --now herd-node      # on all Linux AI nodes
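
The two unit files differ only in their description and ExecStart path, and pip does not always install commands into /usr/local/bin (a virtualenv or a user install puts them elsewhere). A minimal sketch, assuming you want to generate the unit text rather than hand-edit it: `render_unit` and `find_binary` are hypothetical helper names, and the unit layout is copied from the files above.

```python
import shutil


def render_unit(description: str, exec_path: str, user: str = "ollama") -> str:
    """Return systemd unit text with the same layout as the units above."""
    return "\n".join([
        "[Unit]",
        f"Description={description}",
        "After=network.target ollama.service",
        "",
        "[Service]",
        "Type=simple",
        f"ExecStart={exec_path}",
        "Restart=always",
        "RestartSec=5",
        f"User={user}",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])


def find_binary(name: str, fallback: str) -> str:
    """Locate an installed command on PATH, else fall back to a fixed path."""
    return shutil.which(name) or fallback


# Print the router unit using wherever `herd` is actually installed.
herd_path = find_binary("herd", "/usr/local/bin/herd")
print(render_unit("Linux AI Server Router", herd_path))
```

Redirect the output to /etc/systemd/system/herd-router.service (as root) and repeat with `herd-node` for the node unit.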

Linux AI server hardware guide

| Linux AI server | GPU memory | RAM | Best Linux AI models |
| --- | --- | --- | --- |
| Rack server (NVIDIA A100) | 80GB | 256GB | deepseek-v3, qwen3.5:72b (frontier) |
| Rack server (NVIDIA L40S) | 48GB | 128GB | llama3.3:70b, qwen3.5:32b |
| Desktop server (RTX 4090) | 24GB | 64GB | llama3.3:70b (Q4), deepseek-r1:32b |
| Mini PC / NUC (no GPU) | CPU only | 32GB | phi4, gemma3:12b (CPU inference) |
| Cloud VM (no GPU) | CPU only | 16GB | phi4-mini, gemma3:4b |
| Raspberry Pi 5 | CPU only | 8GB | gemma3:1b, phi4-mini (edge AI) |

Linux AI server works with NVIDIA CUDA GPUs, AMD ROCm (experimental), and CPU-only inference.
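
When matching a model to a machine in the hardware guide, a common rule of thumb is that the weights take roughly parameters times bits per weight divided by 8. The sketch below applies that arithmetic. The 20% overhead for KV cache and buffers is an assumption, and real quantized files often use somewhat more than their nominal bit width, so treat the result as a lower bound.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8.0


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead: float = 1.2) -> bool:
    """Rough check: weights plus an assumed 20% for KV cache and buffers."""
    return weights_gb(params_billion, bits_per_weight) * overhead <= vram_gb


print(weights_gb(70, 4))          # 35.0
print(fits_in_vram(70, 4, 48))    # True
print(fits_in_vram(70, 4, 24))    # False
print(fits_in_vram(32, 4, 24))    # True
```

When the estimate exceeds GPU memory, Ollama can still run the model by keeping some layers on the CPU, at reduced speed.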

Use your Linux AI server

OpenAI SDK

from openai import OpenAI

# Your Linux AI server endpoint
client = OpenAI(base_url="http://linux-ai-server:11435/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama3.3:70b",
    messages=[{"role": "user", "content": "Write a Terraform module for AWS ECS"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

curl from any machine

# Hit your Linux AI server from anywhere on the network
curl http://linux-ai-server:11435/api/chat -d '{
  "model": "codestral",
  "messages": [{"role": "user", "content": "Write a Dockerfile for a FastAPI app"}],
  "stream": false
}'
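
The same request can be made from Python without the OpenAI SDK. This is a minimal sketch using only the standard library. It assumes the router passes through Ollama's native `/api/chat` response unchanged, where the reply text sits under `message.content`. The function names are illustrative.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST to the /api/chat endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_json: dict) -> str:
    """Pull the assistant text out of an Ollama-style chat response."""
    return response_json.get("message", {}).get("content", "")


def chat(base_url: str, model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the request and return the reply text (performs a network call)."""
    req = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

For example, `chat("http://linux-ai-server:11435", "codestral", "Write a Dockerfile for a FastAPI app")` mirrors the curl call above.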

Linux AI server environment

# Optimize Linux AI server Ollama
sudo systemctl edit ollama
# Add under [Service]:
#   Environment="OLLAMA_KEEP_ALIVE=-1"
#   Environment="OLLAMA_MAX_LOADED_MODELS=-1"
#   Environment="OLLAMA_NUM_PARALLEL=2"
sudo systemctl restart ollama

Linux AI server firewall

# UFW (Ubuntu/Debian)
sudo ufw allow 11435/tcp

# firewalld (RHEL/Fedora)
sudo firewall-cmd --add-port=11435/tcp --permanent && sudo firewall-cmd --reload
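
After opening the port, it can help to confirm from another machine that the router is reachable. A small sketch using a plain TCP connect. `port_open` is a hypothetical helper, and a successful connect only shows that something is listening, not that it is the router.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Running `port_open("linux-ai-server", 11435)` from a client should return True once the firewall rule is active and the router is running.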

Linux AI server monitoring

# Linux AI server fleet status
curl -s http://localhost:11435/fleet/status | python3 -m json.tool

# Linux AI server health — 15 automated checks
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

# Linux AI server traces — recent requests
curl -s "http://localhost:11435/dashboard/api/traces?limit=10" | python3 -m json.tool

# Linux AI server logs
journalctl -u herd-router -f
tail -f ~/.fleet-manager/logs/herd.jsonl.$(date +%Y-%m-%d)

Dashboard at http://linux-ai-server:11435/dashboard — access from any browser on the network.
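
For scripted checks, the three monitoring endpoints can be polled from Python instead of curl. The sketch below takes its paths from the commands above. The response schemas are not documented here, so the JSON is treated as opaque, and `snapshot` is an illustrative name.

```python
import json
import urllib.request

# Paths copied from the curl commands in this section.
ENDPOINTS = {
    "fleet": "/fleet/status",
    "health": "/dashboard/api/health",
    "traces": "/dashboard/api/traces?limit=10",
}


def endpoint_url(base_url: str, name: str) -> str:
    """Join the router base URL with one of the known endpoint paths."""
    return base_url.rstrip("/") + ENDPOINTS[name]


def fetch_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def snapshot(base_url: str) -> dict:
    """Fetch every endpoint; record the error text instead of raising."""
    result = {}
    for name in ENDPOINTS:
        try:
            result[name] = fetch_json(endpoint_url(base_url, name))
        except (OSError, ValueError) as exc:  # network or JSON decode failure
            result[name] = {"error": str(exc)}
    return result
```

Calling `snapshot("http://localhost:11435")` on the router returns one dictionary that a cron job or alerting script can inspect.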

Also available on Linux AI server

Image generation

curl http://localhost:11435/api/generate-image \
  -d '{"model": "z-image-turbo", "prompt": "server rack visualization", "width": 1024, "height": 1024}'

Embeddings

curl http://localhost:11435/api/embed \
  -d '{"model": "nomic-embed-text", "input": "Linux AI server headless inference"}'
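
Embeddings are usually compared with cosine similarity. A minimal sketch, assuming the router returns Ollama's native `/api/embed` response with an `embeddings` list of vectors. The helper names are illustrative.

```python
import json
import math
import urllib.request


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def embed(base_url: str, model: str, texts: list[str]) -> list[list[float]]:
    """POST to /api/embed (network call); assumes an Ollama-style reply."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/embed",
        data=json.dumps({"model": model, "input": texts}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["embeddings"]
```

For example, embed two sentences with `embed("http://localhost:11435", "nomic-embed-text", [s1, s2])` and pass the two resulting vectors to `cosine`.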

Contribute

Ollama Herd is open source (MIT). Contributions from Linux server admins are welcome.

Guardrails

  • Linux AI server model downloads require explicit user confirmation.
  • Linux AI server model deletion requires explicit user confirmation.
  • Never delete or modify files in ~/.fleet-manager/.
  • No models are downloaded automatically — all pulls are user-initiated or require opt-in.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
