qwen-qwen3

Qwen Qwen3 — run Qwen3.5, Qwen3, Qwen3-Coder, Qwen2.5-Coder, and Qwen3-ASR across your local fleet. LLM inference, code generation, and speech-to-text from Alibaba's Qwen family via Ollama Herd. Cross-platform (macOS, Linux, Windows). Zero cloud costs.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn this skill:

Install skill "qwen-qwen3" with this command: npx skills add twinsgeeks/qwen-qwen3

Qwen — Run Qwen Models Across Your Local Fleet

Run Qwen3.5, Qwen3, Qwen3-Coder, Qwen2.5-Coder, and Qwen3-ASR on your own hardware. The fleet router picks the best device for every request — chat, code generation, and speech-to-text from one endpoint.

Supported Qwen models

LLM (Chat & Reasoning)

| Model | Parameters | Ollama name | Best for |
|---|---|---|---|
| Qwen3.5 | 0.8B–397B MoE | qwen3.5 | Latest generation: multimodal, best reasoning |
| Qwen3 | 0.6B–235B MoE | qwen3 | Competitive with GPT-4o |
| Qwen2.5 | 0.5B–72B | qwen2.5 | Proven, stable, multilingual |

Code Generation

| Model | Parameters | Ollama name | Best for |
|---|---|---|---|
| Qwen3-Coder | 30B MoE (3.3B active) | qwen3-coder | Agentic coding workflows |
| Qwen2.5-Coder | 0.5B–32B | qwen2.5-coder | Code generation; matches GPT-4o at 32B |

Speech-to-Text

| Model | Parameters | Tool | Best for |
|---|---|---|---|
| Qwen3-ASR | 0.6B–1.7B | mlx-qwen3-asr | State-of-the-art local transcription |

Setup

pip install ollama-herd
herd              # start the router (port 11435)
herd-node         # run on each machine

# Pull Qwen models
ollama pull qwen3.5:32b
ollama pull qwen3-coder

For speech-to-text:

uv tool install "mlx-qwen3-asr[serve]" --python 3.14
curl -X POST http://localhost:11435/dashboard/api/settings \
  -H "Content-Type: application/json" -d '{"transcription": true}'

Package: ollama-herd | Repo: github.com/geeks-accelerator/ollama-herd
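
To confirm the router is reachable before wiring up clients, a minimal check from Python (a sketch, not part of the ollama-herd CLI; it only assumes the dashboard route from the Dashboard section below responds on port 11435):

import httpx

# Smoke test: once `herd` is running, the dashboard route (see "Dashboard"
# below) should answer on port 11435.
resp = httpx.get("http://localhost:11435/dashboard", timeout=10.0)
resp.raise_for_status()
print("fleet router is up")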

Use Qwen through the fleet

OpenAI SDK

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

# Qwen3.5 for general chat
response = client.chat.completions.create(
    model="qwen3.5:32b",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

Qwen3-Coder for code

response = client.chat.completions.create(
    model="qwen3-coder",
    messages=[{"role": "user", "content": "Write a FastAPI CRUD app with SQLAlchemy"}],
)
print(response.choices[0].message.content)

Qwen3-ASR for transcription

curl http://localhost:11435/api/transcribe -F "audio=@meeting.wav"

Or from Python:

import httpx

def transcribe(audio_path):
    with open(audio_path, "rb") as f:
        resp = httpx.post(
            "http://localhost:11435/api/transcribe",
            files={"audio": (audio_path, f)},
            timeout=300.0,
        )
    resp.raise_for_status()
    return resp.json()["text"]

Ollama API

# Qwen3.5 chat
curl http://localhost:11435/api/chat -d '{
  "model": "qwen3.5:32b",
  "messages": [{"role": "user", "content": "Explain transformers"}],
  "stream": false
}'

# Qwen2.5-Coder
curl http://localhost:11435/api/chat -d '{
  "model": "qwen2.5-coder:32b",
  "messages": [{"role": "user", "content": "Optimize this SQL query: ..."}],
  "stream": false
}'
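
The same /api/chat route works from Python. Here is a minimal, non-streaming sketch with httpx that mirrors the payloads above; it assumes the router returns the standard Ollama chat response shape:

import httpx

# Non-streaming chat against the fleet router's Ollama-style /api/chat route,
# mirroring the curl payloads above. Assumes the standard Ollama response
# shape: {"message": {"role": ..., "content": ...}, ...}.
payload = {
    "model": "qwen2.5-coder:32b",
    "messages": [{"role": "user", "content": "Optimize this SQL query: ..."}],
    "stream": False,
}
resp = httpx.post("http://localhost:11435/api/chat", json=payload, timeout=300.0)
resp.raise_for_status()
print(resp.json()["message"]["content"])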

Hardware recommendations

Cross-platform: These are example configurations. Any device (Mac, Linux, Windows) with equivalent RAM works. The fleet router runs on all platforms.

| Model | Min RAM | Recommended hardware |
|---|---|---|
| qwen3.5:0.8b | 2GB | Any Mac |
| qwen3.5:9b | 8GB | Mac Mini M4 (16GB) |
| qwen3.5:32b | 24GB | Mac Mini M4 Pro (48GB) |
| qwen3.5:122b-a10b | 64GB | Mac Studio M4 Max (128GB) |
| qwen3.5:397b-a17b | 256GB+ | Mac Studio M3 Ultra (512GB) |
| qwen3-coder | 24GB | Mac Mini M4 Pro (48GB) |
| qwen2.5-coder:32b | 24GB | Mac Mini M4 Pro (48GB) |
| Qwen3-ASR (0.6B) | 1.2GB | Any Mac |
| Qwen3-ASR (1.7B) | 3.4GB | Any Mac (8GB+) |
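
If you want to pick a Qwen3.5 variant automatically, a hypothetical helper (not part of ollama-herd) can map the Min RAM column above to an Ollama tag, using psutil to read total memory:

import psutil

# Hypothetical helper: map installed memory to a qwen3.5 tag using the
# "Min RAM" column from the table above. Not part of ollama-herd.
VARIANTS_GB = [  # (min RAM in GB, Ollama tag)
    (256, "qwen3.5:397b-a17b"),
    (64, "qwen3.5:122b-a10b"),
    (24, "qwen3.5:32b"),
    (8, "qwen3.5:9b"),
    (2, "qwen3.5:0.8b"),
]

def suggest_qwen35_variant() -> str:
    total_gb = psutil.virtual_memory().total / 1024**3
    for min_gb, tag in VARIANTS_GB:
        if total_gb >= min_gb:
            return tag
    return "qwen3.5:0.8b"

print(suggest_qwen35_variant())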

Why run Qwen locally

  • Zero cost — no per-token charges for Qwen API
  • Privacy — Chinese and English content stays on your devices
  • Full Qwen family — chat, code, reasoning, and speech-to-text from one fleet
  • No rate limits — Alibaba Cloud throttles API access; local inference has no caps
  • Fleet routing — multiple machines share the load and the router picks the fastest available device

The Qwen advantage on this fleet

Qwen models are uniquely suited for fleet routing:

  • MoE architecture — Qwen3.5 (397B total, 17B active) and Qwen3-Coder (30B total, 3.3B active) use Mixture of Experts. Only a fraction of parameters activate per request, making them fast despite large total size (see the rough memory estimate after this list).
  • Size variety — from 0.6B to 397B, there's a Qwen model for every device in your fleet. Small Macs run the small models, big Macs run the big ones.
  • Code + Chat + STT — Qwen covers three modalities. One vendor, one fleet, three capabilities.
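
A rough back-of-envelope shows why: with 4-bit weights at roughly 0.5 bytes per parameter (an assumption that ignores KV cache and runtime overhead), memory scales with total parameters while per-token compute scales with active parameters.

# Rough rule of thumb: 4-bit quantized weights take about 0.5 bytes per
# parameter. Memory footprint follows TOTAL parameters; work per token
# follows ACTIVE parameters. Ignores KV cache and runtime overhead.
BYTES_PER_PARAM_Q4 = 0.5

for name, total_params, active_params in [
    ("qwen3.5:397b-a17b", 397e9, 17e9),
    ("qwen3-coder (30B MoE)", 30e9, 3.3e9),
]:
    weight_gb = total_params * BYTES_PER_PARAM_Q4 / 1024**3
    active_gb = active_params * BYTES_PER_PARAM_Q4 / 1024**3
    print(f"{name}: ~{weight_gb:.0f} GB of weights in memory, "
          f"~{active_gb:.1f} GB touched per token")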

Also available on this fleet

Other LLM models

Llama 3.3, DeepSeek-V3, DeepSeek-R1, Phi 4, Mistral, Gemma 3 — any Ollama model routes through the same endpoint.

Image generation

curl -o image.png http://localhost:11435/api/generate-image \
  -H "Content-Type: application/json" \
  -d '{"model":"z-image-turbo","prompt":"a sunset","width":1024,"height":1024,"steps":4}'

Embeddings

curl http://localhost:11435/api/embeddings -d '{"model":"nomic-embed-text","prompt":"query"}'
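
The same endpoint is easy to call from Python; this sketch assumes the router mirrors Ollama's /api/embeddings response, which returns an "embedding" array:

import httpx

# Fetch an embedding vector from the fleet router's Ollama-style endpoint.
# Assumes the standard Ollama response shape: {"embedding": [...]}.
resp = httpx.post(
    "http://localhost:11435/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "query"},
    timeout=60.0,
)
resp.raise_for_status()
vector = resp.json()["embedding"]
print(len(vector))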

Dashboard

http://localhost:11435/dashboard — monitor Qwen requests alongside all other models. Per-model latency, token throughput, error rates, health checks.

Full documentation

Agent Setup Guide

Guardrails

  • Never pull or delete Qwen models without user confirmation.
  • Never delete or modify files in ~/.fleet-manager/.
  • If a Qwen model is too large for available memory, suggest a smaller variant or MoE version.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

  • Qwen Qwen3 5 (Coding): Qwen 3.5 by Alibaba — run Qwen 3.5 (the latest and most capable Qwen model) across your local device fleet. Qwen 3.5 rivals GPT-4o and Claude 3.5 on reasonin...
  • Ollama Herd (General): Ollama multimodal model router for Llama, Qwen, DeepSeek, Phi, and Mistral — plus mflux image generation, speech-to-text, and embeddings. Self-hosted Ollama...
  • Deepseek Deepseek Coder (Coding): DeepSeek DeepSeek-Coder — run DeepSeek-V3, DeepSeek-R1, DeepSeek-Coder across your local fleet. 7-signal scoring routes every request to the best device. Cro...
  • Mistral Codestral (Coding): Mistral and Codestral — run Mistral Large, Mistral-Nemo, Codestral, and Mistral-Small locally. Mistral AI's open-source LLMs for code generation and reasonin...