ARENA-001: Multi-Model Council

Parallel execution of multiple local LLMs, with voting strategies to produce higher-quality responses.

Why Multi-Model?

Diversity: Different models bring different perspectives
Robustness: If one model fails, the others still answer
Quality: A consensus answer often beats a single model's answer
Cost: All models run locally, so $0 (vs $0.60 per 1M tokens for cloud)

Quick Start

from scripts.council import council_decide

# Simple usage
result = council_decide(
    "Explain Python decorators",
    models=['nerdsking-3b', 'llama-3.1-8b'],
    strategy="weighted"
)
print(result)

Architecture

User Prompt
    ↓
[Router] → Model A → Response A
         → Model B → Response B
         → Model C → Response C
    ↓
[Voting Engine]
    ↓
Consensus Response
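
The fan-out step in the diagram can be sketched as follows. This is a minimal illustration, not the actual council implementation: `fake_model` and `fan_out` are hypothetical names, and the model calls are simulated with `asyncio.sleep` so the example runs without a server. It shows the prompt going to every model concurrently, with a failed model dropped rather than aborting the whole round.

```python
import asyncio

async def fake_model(name: str, prompt: str, delay: float, fail: bool = False) -> str:
    """Hypothetical stand-in for one model call; a real one would hit the local server."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} is unavailable")
    return f"{name} answer to: {prompt}"

async def fan_out(prompt: str, timeout: float = 1.0) -> dict[str, str]:
    """Send the prompt to every model concurrently and keep only the successes."""
    calls = {
        "model-a": fake_model("model-a", prompt, 0.01),
        "model-b": fake_model("model-b", prompt, 0.02, fail=True),  # simulated failure
        "model-c": fake_model("model-c", prompt, 0.01),
    }
    # return_exceptions=True makes gather return errors as values instead of raising,
    # so one failing or timed-out model does not cancel the others.
    results = await asyncio.gather(
        *(asyncio.wait_for(call, timeout) for call in calls.values()),
        return_exceptions=True,
    )
    return {
        name: result
        for name, result in zip(calls, results)
        if not isinstance(result, BaseException)
    }

responses = asyncio.run(fan_out("Explain Python decorators"))
print(sorted(responses))
```

The surviving responses would then be handed to the voting engine.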

Voting Strategies

1. Majority Vote

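The most common response wins. Responses are compared by exact match, so this strategy works best for short, constrained answers (a label, yes/no, a number) rather than free-form text.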
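
A minimal sketch of majority voting, assuming responses arrive as a dict of model name to text. The function name `majority_vote` is hypothetical, and stripping surrounding whitespace before comparing is an assumption of this sketch rather than documented behavior.

```python
from collections import Counter

def majority_vote(responses: dict[str, str]) -> str:
    """Return the response text given by the most models (exact string match)."""
    if not responses:
        raise ValueError("no responses to vote on")
    # Assumption: ignore leading/trailing whitespace so "yes" and "yes " count together.
    counts = Counter(text.strip() for text in responses.values())
    winner, _ = counts.most_common(1)[0]
    return winner

responses = {"model-a": "yes", "model-b": "no", "model-c": "yes "}
print(majority_vote(responses))
```

On a tie, `Counter.most_common` returns the response that was seen first, so the result depends on the order of the models.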

2. Weighted Vote (default)

Bigger models get more weight:

Model	Weight
Nerdsking 3B	1
Llama 3.1 8B	2
Strand 14B	3
Mistral 24B	4
GLM 4.7	5
Qwen3.5 35B	6
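
One way weighted voting could work is sketched below. It assumes that the weights of all models returning the same text are summed and the highest total wins; the source does not spell out how weights are combined. The dictionary keys are hypothetical identifiers for the models in the table, and the default weight of 1 for unknown models is also an assumption.

```python
from collections import defaultdict

# Weights copied from the table above; the keys are hypothetical identifiers.
WEIGHTS = {
    "nerdsking-3b": 1,
    "llama-3.1-8b": 2,
    "strand-14b": 3,
    "mistral-24b": 4,
    "glm-4.7": 5,
    "qwen3.5-35b": 6,
}

def weighted_vote(responses: dict[str, str], weights: dict[str, int] = WEIGHTS) -> str:
    """Sum the weight behind each distinct response and return the heaviest one."""
    if not responses:
        raise ValueError("no responses to vote on")
    scores: dict[str, int] = defaultdict(int)
    for model, text in responses.items():
        # Assumption: models missing from the table count with weight 1.
        scores[text.strip()] += weights.get(model, 1)
    return max(scores, key=scores.get)

responses = {"nerdsking-3b": "A", "llama-3.1-8b": "A", "mistral-24b": "B"}
print(weighted_vote(responses))
```

Here two small models agree on "A" (total weight 3), but the single larger model answering "B" (weight 4) wins, which is the main difference from a plain majority vote.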

Usage Examples

Basic

from scripts.council import ModelCouncil
import asyncio

async def main():
    async with ModelCouncil() as council:
        answer = await council.decide(
            "Best practice for API design?",
            strategy="weighted"
        )
        print(answer)

asyncio.run(main())

Custom Models

from scripts.council import ModelCouncil

# Use specific models only
council = ModelCouncil(
    active_models=['qwen3.5-35b', 'glm-4.7-flash'],
    timeout=30
)

Raw Responses

# Get all responses without voting (call this from inside an async function)
responses = await council.query_all("Explain async/await")
for model, response in responses.items():
    print(f"{model}: {response}")

Integration with OpenClaw

# In your skill:
from arena_council.scripts.council import council_decide

def handle_query(prompt):
    # Use council instead of single model
    return council_decide(
        prompt,
        strategy="weighted"
    )

Performance

Setup	Time	Cost
Single (Kimi)	2s	$0.12
Council (3 local)	5s	$0
Council (5 local)	8s	$0

Requirements

pip install aiohttp

Models Available

Any model running on LM Studio (http://127.0.0.1:1234/v1). Recommended:

Small/fast (QA, routing): qwen3.5-9b, glm-4.7-flash, nerdsking-3b
Medium (analysis): huihui-qwen3.5-27b-abliterated, huihui-mistral-small-24b-abliterated
Heavy (complex tasks): qwen3.5-35b-uncensored-hauhaucs, holo3-35b

Use curl http://127.0.0.1:1234/v1/models to list what's loaded.
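
The same check can be done from Python using only the standard library. This sketch assumes the endpoint returns an OpenAI-style body of the form `{"data": [{"id": ...}, ...]}`; the helper names `parse_model_ids` and `list_loaded_models` are hypothetical. The example at the bottom parses a sample payload, so it runs even when no server is up.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:1234/v1"  # LM Studio address from the text above

def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /models response body."""
    return [entry["id"] for entry in payload.get("data", [])]

def list_loaded_models(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Fetch the model list from the local server (requires the server to be running)."""
    with urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))

# Offline example with a sample payload in the assumed response shape.
sample = {"object": "list", "data": [{"id": "qwen3.5-9b", "object": "model"}]}
print(parse_model_ids(sample))
```

With the server running, `list_loaded_models()` returns the ids you can pass to the council.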

God Mode Integration

When all council models refuse due to censorship — apply god-mode techniques and retry:

# If council returns no usable responses (all refused)
if all_refused(responses):
    from god_mode import apply_techniques
    modified_prompt = apply_techniques(prompt, methods=["unicode", "prefill"])
    responses = await council.query_all(modified_prompt)

See: god-mode skill for full technique library.

Model Susceptibility Testing

Before using a censored model in council, test it:

Send a borderline request directly
Note: refuses? → needs god-mode wrapper
Document in MODEL-CATALOG.md
